Showing posts with label PhD topic. Show all posts

Wednesday, September 26, 2012

Users editing articles in several Wikipedias

I was wondering about contributors who post in several Wikipedias. Who are they? What countries are they from?

The English Wikipedia surely leads in the number of contributors, articles and article edits, and the number of users who contribute to both the English and some other Wikipedia is high. I never doubted that. But what about the other Wikipedias? By looking at them, we can find out which nations interact with each other more often than others.

A bit more than 1% of all contributors in our data set edit or create articles in 2 Wikipedias. Only 12 of 4,919,026 users have ever worked in 5 Wikipedias.
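Counts like these come down to grouping edits by user and counting the distinct Wikipedias each user touched. Here is a minimal sketch in Python, assuming the edits are available as (user, wiki) pairs. The function names and the toy data are made up for illustration and are not taken from the actual analysis.

```python
from collections import Counter

def wikis_per_user(edits):
    """Map each user to the number of distinct Wikipedias they edited."""
    seen = {}
    for user, wiki in edits:
        seen.setdefault(user, set()).add(wiki)
    return {user: len(wikis) for user, wikis in seen.items()}

def cross_wiki_distribution(edits):
    """Count how many users edited exactly k Wikipedias, for each k."""
    return Counter(wikis_per_user(edits).values())

# Toy data for illustration only (not from the real data set).
toy_edits = [
    ("alice", "ru"), ("alice", "ja"), ("alice", "ru"),
    ("bob", "es"),
    ("carol", "ca"), ("carol", "es"), ("carol", "uk"),
]
dist = cross_wiki_distribution(toy_edits)
# Share of users who edited at least 2 Wikipedias.
share_multi = sum(n for k, n in dist.items() if k >= 2) / sum(dist.values())
```

Using a set per user means repeated edits in the same Wikipedia are counted only once, which is what matters for the question of how many Wikipedias a user works in.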



Most of the cross-Wikipedia authors are registered; therefore, we cannot identify where these authors come from. Among the anonymous cross-Wikipedia authors, the largest shares come from Germany (15%) and the United States (11%).













The cross-Wikipedia authors contributing to more than 3 Wikipedias are summarized in the following table.



We found 1,718 contributors who made 618,584 edits. The table summarizes the contributors' work: the number of edits and the number of edits per cross-Wikipedia user. Cross-Wikipedia users contributing to 3 Wikipedias often participate in the Russian, the Japanese and one other Wikipedia from the set. The highest activity (more than 100 edits per cross-Wikipedia user) was observed in the Macedonian, Russian, Catalan and Arabic Wikipedia instances. In general, cross-Wikipedia users are more active than contributors working in only one Wikipedia.

Data set:


We analyzed the Wikipedia data from June 30, 2001 to January 1, 2009 and divided this time into 16 equal time intervals to keep the visualizations simple. The 1st interval runs from June 30, 2001 to December 31, 2001, the 2nd interval from January 1, 2002 to June 29, 2002, and so on. Because of hardware limitations, we did not consider all revisions made in this period when constructing the author networks. As the Wikipedia instances have different numbers of articles, we chose the number of revisions for each instance depending on the total number of revisions in that instance. The revisions are picked along each article's timeline. The overall structure of the Wikipedia network and the activities of Wikipedians are not impaired by this data reduction. In the end, we obtained data sets with comparable numbers of revisions.
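For readers who want to reproduce the time slicing, here is a minimal sketch in Python of how an observation period can be cut into equal intervals and how a revision timestamp can be mapped to its interval. The function names are hypothetical and this is not the code used for the analysis. Note that a strictly equal-length split yields boundaries that need not coincide with the calendar-aligned dates quoted above.

```python
from datetime import datetime

def split_into_intervals(start, end, n):
    """Return n contiguous (begin, finish) pairs of equal length covering [start, end]."""
    step = (end - start) / n
    bounds = [start + i * step for i in range(n)] + [end]
    return list(zip(bounds[:-1], bounds[1:]))

def interval_index(timestamp, start, end, n):
    """Return the 0-based index of the interval a revision timestamp falls into."""
    if not start <= timestamp <= end:
        raise ValueError("timestamp outside the observed period")
    step = (end - start) / n
    # The very last instant belongs to the last interval.
    return min(int((timestamp - start) / step), n - 1)

START = datetime(2001, 6, 30)
END = datetime(2009, 1, 1)
intervals = split_into_intervals(START, END, 16)
```

With the intervals in hand, each revision can be assigned to a bucket via `interval_index`, and one author network can then be built per bucket.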

We chose both European and Asian Wikipedias. The instances were selected according to their size: large European Wikipedias (Spanish and Russian), large Asian Wikipedias (Japanese and Turkish), small European Wikipedias (Bulgarian, Catalan, Danish, Greek, Macedonian, and Ukrainian) and small Asian Wikipedias (Arabic, Hindi, and Korean). The list of small European Wikipedias includes Wikipedias in different Slavic languages (Bulgarian, Macedonian, Ukrainian) and the Catalan Wikipedia, the Wikipedia of a minority language group in Spain.

Monday, August 20, 2012

Cultural analysis of Wikipedia


I'm working on a paper devoted to cultural differences in Wikipedia. For that, I did some literature research and found some interesting resources on this topic, which I'm sharing with you below.
Wikipedia was founded in early 2001 and has since been one of the most successful and most referenced sources of knowledge, offering high-quality information[1]. The open encyclopedia can include the opinions of any person, wherever she is located geographically. Wikipedia is organized in such a way that any culture can open its own Wikipedia and use its own language for providing content.
According to Hofstede (1991), "culture is the collective programming of the mind which distinguish[sic] one group of people from the other". Cultural patterns together create a complex cultural structure that can be examined by studying cultural dimensions (Hall, 1976; 1983; Hofstede, 1991; Kluckhohn & Strodtbeck, 1961; Trompenaars & Hampden-Turner, 1998). Hofstede defined 5 dimensions of culture and empirically calculated the dimension scores of many nationalities around the world. These dimensions have recently become a standard framework for cross-cultural research projects. Some works on the cross-cultural nature of Wikipedia use Hofstede's dimensions. Hara et al. (2010) explain courtesy behaviors in the Eastern Wikipedias by a high value of power distance and a preference for collective work. They claimed that authors from Western Wikipedias show more conflict and disagreement behaviors. Moreover, they argue that patterns of author behavior differ with the size of a Wikipedia. They analyzed only 4 Wikipedias of various sizes, focusing on differences between Eastern and Western cultures.
Hofstede's dimensions were also used in measuring the quality of the article "game" (Pfeil et al., 2006). The researchers analyzed this article in the French, German, Japanese and Dutch Wikipedias. They found that even if cultures have positively correlated Hofstede dimensions, the quality criteria of the considered Wikipedias differ. More evidence of cultural influence in Wikipedia was found by Pembe & Bingol (2006) when comparing the linguistic structures of the English and German Wikipedias. Although the English Wikipedia is larger, the German Wikipedia at that point included more words associated with the concept of family.
Rask (2007) analyzed differences between aspects of Wikipedia in developed and developing countries. He considered the number of contributors and the number of edits per article for different countries and concluded that richer countries profit more from the knowledge shared in Wikipedia. Rask compared different Wikipedias from an economic point of view, using the human development index[2] in his observations.
Attempts to define cultural patterns have included the examination of Wikipedia user talk pages and their contents, the contents of articles, and the numbers of edits. In this paper we focus on a still untouched area for cultural patterns in Wikipedia. We observe author networks over the course of time and differentiate between registered and anonymous authors. Moreover, the geographical location of authors and their migrations to other countries are taken into consideration. Furthermore, we find Wikipedia users contributing to several Wikipedias and analyze their behavior and geographical location. These and other findings on Wikipedia growth and editing behaviors are used to define cultural patterns.
Voss (2005) was the first of many researchers to analyze the fundamentals of Wikipedia and its networks. His main focus was on the German Wikipedia and its graph of links. He showed that the Wikipedia network is scale-free (Barabási et al., 1999). Moreover, Voss found that the number of user talk pages is much higher in the Japanese than in the German, Danish or Croatian Wikipedia, although he left questions concerning cultural differences between Wikipedias unanswered.
In the next section, we consider existing research examining Wikipedia networks. Later, we present our methodology, and afterwards explain the data set we are using. The results include findings about authors, their behaviors, author networks, and articles. The paper concludes with a discussion and an outlook on future work.

Dynamic development of Wikipedia networks

The dynamic development of networks has been the focus of many works (Klamma and Haasler, 2008a, 2008b; Capocci et al., 2006; Zlatic et al., 2006). Klamma and Haasler (2008a, 2008b) visualized different Wiki projects (Berlin Wiki, Google Wiki, Aachen Wiki) and observed their changes over the course of time. They found that registered users often serve as connectors in networks of anonymous users. Moreover, they showed that a tiny number of Wikipedia contributors created or edited the majority of articles. Klamma and Haasler created the Wikiwatcher tool, which can be used to retrieve Wikipedia data, visualize the resulting networks and calculate simple SNA values.
Analysis of Wikipedia as a complex network reveals that the growth of Wikipedia follows preferential attachment (Barabási et al., 1999). Capocci et al. (2006) showed similarities between the evolution patterns of the complex networks of the WWW and of different Wikipedias: new nodes are more likely to be connected to existing nodes with a high degree. Zlatic et al. (2006) examined the article networks of 11 Wikipedias. The researchers concentrated on article network measures and their comparison. They argue that the growth of Wikipedia networks is unique for different language versions of Wikipedia.
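To make the idea of preferential attachment concrete, here is a minimal sketch in Python of a generic growth process in which every new node links to existing nodes with probability proportional to their current degree. It is a textbook-style illustration, not the model or code used in any of the cited papers, and the function name, the parameter m and the seed core are assumptions made for the example.

```python
import random

def grow_network(n_nodes, m=2, seed=0):
    """Grow an undirected graph by degree-proportional (preferential) attachment."""
    rng = random.Random(seed)
    # Start from a small fully connected core of m + 1 nodes.
    edges = [(i, j) for i in range(m + 1) for j in range(i)]
    # Every node appears in this list once per incident edge, so a uniform
    # draw from it picks a node with probability proportional to its degree.
    endpoints = [node for edge in edges for node in edge]
    for new in range(m + 1, n_nodes):
        chosen = set()
        while len(chosen) < m:
            chosen.add(rng.choice(endpoints))
        for old in chosen:
            edges.append((new, old))
            endpoints.extend((new, old))
    return edges

edges = grow_network(200, m=2)
degree = {}
for a, b in edges:
    degree[a] = degree.get(a, 0) + 1
    degree[b] = degree.get(b, 0) + 1
```

Sorting `degree` by value typically shows a few early nodes with many links and a long tail of nodes with only m links, which is the kind of skew the works above describe for article networks.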

References:
Barabási, A.-L., Albert, R., & Jeong, H. (1999). Mean-field theory for scale-free random networks.
Hall, E.T. (1976). Beyond culture. Garden City, New York: Doubleday.
Hall, E.T. (1983). The dance of life. Garden City, New York: Doubleday.

Hara, N., Shachaf, P., & Hew, K. (2010). Cross-cultural analysis of the Wikipedia community. Journal of the American Society for Information Science and Technology, 61(10), 2097-2108.
Hofstede, G.H. (1991). Cultures and organizations: Software of the mind. London: McGraw Hill.
Kluckhohn, F.R., & Strodtbeck, F.L. (1961). Variations in value orientations. Evanston, IL: Row, Peterson.
Pembe, F., & Bingol, H. (2006). Complex networks in different languages: A study of an emergent multilingual encyclopedia, Proceedings of Sixth International Conference on Complex Systems, June 25-30, 2006, Boston, MA, USA.
Pfeil, U., Zaphiris, P., & Ang, C.S. (2006). Cultural differences in collaborative authoring of Wikipedia. Journal of Computer-Mediated Communication, 12(1), article 5.
Rask, M. (2007). The Richness and Reach of Wikinomics: Is the Free Web-Based Encyclopedia Wikipedia Only for the Rich Countries? Proceedings of the Joint Conference of The International Society of Marketing Development and the Macromarketing Society, June 2-5, 2007.

Trompenaars, F., & Hampden-Turner, C. (1998). Riding the waves of culture: Understanding cultural diversity in global business. New York: McGraw-Hill.
Voß, J. (2005). Measuring Wikipedia, Proceedings of the 10th ISSI 2005 Conference, July 24-28, 2005, Stockholm, Sweden, 1-12.