Tuesday, March 31, 2015

Retrieve an average number of chars in XML column in DB2


If you have an XML column content in yourTable and want to compute the average number of characters in that column (or the minimum or maximum), you can use this code.
  1. set schema yourSchema;
  2. SELECT avg(xtab.length)
  3.    FROM yourTable,
  4.        XMLTABLE(
  5.           'for $i in $c
  6.            let $len := fn:string-length($i)
  7.            return <len>{$len}</len>'
  8.           passing yourTable.content as "c"
  9.           COLUMNS
  10.              length INTEGER PATH '.')
  11.       as xtab;
In line 4 we start creating the XMLTABLE.
In line 5 we open the XQuery (') with a for loop over each item of $c, where $c is bound in line 8 (it is just the value from your column).
In line 6 we compute the string length of the item.
In line 7 we define our return value, which must be in XML form, and close the XQuery (').
In line 9 we start defining the columns that the XMLTABLE returns.
In line 10 we name our column length, declare its type as INTEGER, and give the path inside the XML produced in line 7.
In line 11 we name the output.
In line 2 we read the output as an INTEGER column, so we can run the corresponding aggregate operations on it.
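
The same query can return the minimum, maximum, and average in one pass. This is only a sketch: yourSchema, yourTable, and content are placeholder names, and the cast to DOUBLE is my own addition, based on the assumption that DB2 returns an integer (losing the fractional part) when AVG is applied to an INTEGER column.

```sql
-- Sketch only: yourSchema, yourTable and content are placeholder names.
SET SCHEMA yourSchema;

SELECT MIN(xtab.length) AS min_chars,
       MAX(xtab.length) AS max_chars,
       -- Assumption: AVG over INTEGER values yields an integer,
       -- so cast to DOUBLE to keep the fractional part.
       AVG(CAST(xtab.length AS DOUBLE)) AS avg_chars
  FROM yourTable,
       XMLTABLE(
         'for $i in $c
          let $len := fn:string-length($i)
          return <len>{$len}</len>'
         PASSING yourTable.content AS "c"
         COLUMNS
           length INTEGER PATH '.') AS xtab;
```

If an integer average is good enough, the cast can simply be dropped.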

One can refine the code by calculating the average directly in the XQuery using fn:avg.
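
A possible shape of that refinement, as an untested sketch: it assumes the whole column is read with db2-fn:xmlcolumn, whose argument is a case-sensitive SCHEMA.TABLE.COLUMN string. The names in that string are placeholders, and the XQUERY prefix assumes the statement is run from the DB2 command line processor.

```sql
-- Sketch only: 'YOURSCHEMA.YOURTABLE.CONTENT' is a placeholder for
-- the real, case-sensitive SCHEMA.TABLE.COLUMN name.
XQUERY
fn:avg(
  for $i in db2-fn:xmlcolumn('YOURSCHEMA.YOURTABLE.CONTENT')
  return fn:string-length($i)
);
```

Here the lengths are averaged inside the XQuery itself, so no XMLTABLE and no SQL aggregate are needed.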

Friday, July 4, 2014

Patterns of users/communities/media in the Web: list of papers

Hi,

I want to share a list of papers that observe user traces on the Web and extract patterns from them. You can add your own or other papers by commenting, and I will keep updating the list. The citation style is not consistent, and I cannot guarantee that the citations are 100% correct (pages, proceedings, etc.). If you use them, please check them and let me know if anything is wrong.

I also add some comments about the papers.

Last but not least, good luck with your research.



  • 2006
    • Danyel Fisher, Marc Smith, and Howard T. Welser. 2006. You Are Who You Talk To: Detecting Roles in Usenet Newsgroups. In Proceedings of the 39th Annual Hawaii International Conference on System Sciences - Volume 03 (HICSS '06), Vol. 3. IEEE Computer Society, Washington, DC, USA, 59.2-. DOI=10.1109/HICSS.2006.536 http://dx.doi.org/10.1109/HICSS.2006.536

      The authors define a question-answer person, who is characterized by responding mostly to users with low out-degree and only infrequently to users with higher degree. User behavior differs depending on the forum topic. Forums devoted to hobby topics admit outsiders even though they have cores, while in flame forums people reply only to well-connected users.
    • Ralf Klamma, Marc Spaniol, and Dimitar Denev. PALADIN: A Pattern Based Approach to Knowledge Discovery in Digital Social Networks. In Proceedings of I-KNOW ’06, 6th International Conference on Knowledge Management, Graz, Austria, September 6 - 8, 2006, J.UCS (Journal of Universal Computer Science) Proceedings, pages 457–464. Springer, 2006.

      The focus of this paper is a pattern language that describes patterns of digital traces on the Web. After analyzing mailing lists, the authors find patterns such as the no answering person, the spammer, and the troll, based on patterns they describe with the language.
  • 2009
    • R. Dean Malmgren, Jake M. Hofman, Luis A.N. Amaral, and Duncan J. Watts. 2009. Characterizing individual communication patterns. In Proceedings of the 15th ACM SIGKDD international conference on Knowledge discovery and data mining (KDD '09). ACM, New York, NY, USA, 607-616. DOI=10.1145/1557019.1557088 http://doi.acm.org/10.1145/1557019.1557088

      The authors define a model that describes communication patterns (e-mail communication of students). They find that users do not change their behavior over semesters (their membership in the initial clusters is stable), while the users differ from each other.
    • Meeyoung Cha, Juan Antonio Navarro Perez, and Hamed Haddadi. Flash Floods and Ripples: The Spread of Media Content through the Blogosphere. In Proc. of the AAAI Conference on Weblogs and Social Media (ICWSM) Data Challenge Workshop, San Jose, May 2009.

      The paper investigates patterns of content diffusion in blog networks. Most blogs (70%) produce only 30% of the content, while other blogs act like mass media: community-edited blogs, spammed blogs, microblogging blogs, and content aggregators. Two types of content spreading are defined: flash floods, such as news, which spread quickly but do not reappear later, and ripples, such as songs, which can still be discovered a long time after they first appeared.
    • Fabrício Benevenuto, Tiago Rodrigues, Meeyoung Cha, and Virgílio Almeida. 2009. Characterizing user behavior in online social networks. In Proceedings of the 9th ACM SIGCOMM conference on Internet measurement conference (IMC '09). ACM, New York, NY, USA, 49-62. DOI=10.1145/1644893.1644900 http://doi.acm.org/10.1145/1644893.1644900

      The paper characterizes how users actually behave in social networks (SN), based on their recorded activities. Sequences of activities are investigated as well, to find out when and why users perform certain actions, such as viewing pictures.
  • 2012
    • Claudia Wagner, Matthew Rowe, Markus Strohmaier, and Harith Alani. What Catches Your Attention? An Empirical Study of Attention Patterns in Community Forums. ICWSM, The AAAI Press, 2012.

      The study applies LDA (Latent Dirichlet Allocation) to forum texts. The authors consider user, focus, post, title, and community features to find the most relevant features for each community. They discover that the significance of features differs between communities. For example, in communities with no specific topic a post should be short to increase its probability of being answered, while in communities with specific topics newbies get replies quite often.
    • Hillmann, R.; Trier, M. Dissemination Patterns and Associated Network Effects of Sentiments in Social Networks 2012 IEEE/ACM International Conference on Advances in Social Networks Analysis and Mining (ASONAM) 

      The authors study Web communities using dynamic network motif analysis. They find network patterns such as reciprocity (a bi-directional connection) and transitivity (a friend of my friend becomes my friend) in BBC and Digg forum discussions. However, emotions expressed in textual messages, whether positive or negative, have no impact on these classical network patterns.
    • Cha, M., Benevenuto, F., Haddadi, H., and Gummadi, K.  The World of Connections and Information Flow in Twitter. Systems, Man and Cybernetics, Part A: Systems and Humans, IEEE Transactions on 42, no. 4 (2012): 991–998. 

      The authors divide Twitter users into three groups: grassroots, evangelists, and mass media. The division is based on the indegree of the users. The groups are then characterized by, e.g., reciprocity and appearance in events. One of the statements of the paper is that evangelists extend the reach of mass media by up to 25%.
  • 2013
    • Kizilcec, René F., Piech, Chris, and Schneider, Emily. Deconstructing disengagement: analyzing learner subpopulations in massive open online courses. 2013. http://doi.acm.org/10.1145/2460296.2460330.

      The main focus of the paper is to find patterns of learner engagement in Massive Open Online Courses. The authors use demographics, forum activity, overall experience, and some other information to detect clusters, and they explain user disengagement by considering particular classes.
  • 2014
    • Ashton Anderson, Daniel Huttenlocher, Jon Kleinberg, and Jure Leskovec. 2014. Engaging with massive online courses. In Proceedings of the 23rd international conference on World wide web (WWW '14). International World Wide Web Conferences Steering Committee, Republic and Canton of Geneva, Switzerland, 687-698. DOI=10.1145/2566486.2568042 http://dx.doi.org/10.1145/2566486.2568042

      Here the activities of users in MOOCs are analyzed. The authors define user patterns simply by the number of lectures viewed. Grades are then compared with lecture viewing and forum participation. Two types of users submit assignments: hard-working students who are engaged in the course all the time, and students who already know the course material and just submit the assignments. Forum participants consist mostly of forum users. In forum threads, the users who answer are more active in the forum than the thread initiators, or have higher grades.