Indic language wikipedias – Statistical report – 2012

The content of this blog post has moved to a new domain. Please visit http://shijualex.in/indic-language-wikipedias-statistical-report-2012-2/ to read it.

7 Responses to Indic language wikipedias – Statistical report – 2012

  1. Dear Shiju,
    Another good report! I think it will be very useful to all Indic language Wiki-watchers.

    As I said last time, we should not give merely the article count; we should give at least one other column showing the articles with at least 200 characters. For example, Telugu Wikipedia shows 45K articles (in May 2010), but only 17K articles with 200+ characters. Similarly, Hindi Wikipedia shows 56K (in May 2010), but only 35K articles with 200+ characters. So giving just the article count is quite misleading. Your point, “Do not get obsessed by article counts or readership. These are natural outcomes of community building.”, is indeed the most important one, but there is no harm in providing one more column. Newspapers and the lay media take the raw “article count” and draw all kinds of conclusions, unwelcome and misleading comparisons, and misrepresentations of the situation. None of these ‘quality’ parameters (including size, the number of words, etc.) is actually all that valuable. The important point is that Indic Wikipedias can have a huge impact in spreading knowledge in Indian languages, and we are indeed making progress towards this goal.

    Again, congratulations on bringing out this edition of the report.

    Regards
    Selva
    ————-
    C.R. (Selva) Selvakumar (from Tamil Wikipedia – userpage http://ta.wikipedia.org/s/1lo )

  2. Shiju Alex says:

    Hi Selva,

    I know the importance of such data (articles over 2 kB) when it comes to article counts. Unfortunately, such data does not exist. I guess WMF is not giving importance to that data nowadays. For example, for Tamil I took the data from http://stats.wikimedia.org/EN/SummaryTA.htm and http://stats.wikimedia.org/EN/TablesArticlesTotal.htm. The data you asked for does not exist there.

    The data for articles over 2 kB (http://stats.wikimedia.org/EN/TablesArticlesGt1500Bytes.htm) was last updated sometime in June 2010, so no data exists for 2011 or 2012. The same is the case with other Indic languages.

    Maybe if I were a technical person, I would have been able to generate this data myself by analyzing the dumps. But I have no technical knowledge of this.

    As you rightly said, the number of articles over 2 kB (in my view, for Indic languages it should be 5 kB or 10 kB) will show the actual useful content in the respective language Wikipedia. We might need to find some new tools to generate such data.

  3. Shiju,

    Thanks for the report.

    To compare monthly averages instead of end of year performances will be good as a community’s activity can differ throughout the year.

    Increase in database size, number of most active contributors (making more than 100 edits a month) and page views can be real indicators of growth and activity.

    Any inference based on page views should be normalized based on the population of native speaking people to have a better perspective of the community’s performance compared to its potential / size.

    Ravi

  4. Shiju Alex says:

    \\To compare monthly averages instead of end of year performances will be good as a community’s activity can differ throughout the year. \\

    Yes, Ravi. Now I feel the same too. I will work on this and try to update the user statistics.

    \\Any inference based on page views should be normalized based on the population of native speaking people to have a better perspective of the community’s performance compared to its potential / size. \\

    Hmm. Let me try to analyze this report further and publish a separate post based on suggestions from both of you. The real issue is that some of the data (for example, articles over 2 kB) is not getting updated. Let me find some alternate ways of digging out this data with the help of developers. If that works, I will write a separate blog post taking into account the suggestions from both of you.

    Thanks for visiting.

  5. Pingback: Analysis of the Indic Language Statistical Report 2012 | abundance of the heart

  6. Shiju, please correct the paragraph below. The year shown is wrong; it should be 2012 instead of 2013.

    *With more than 1,04,000 articles, Hindi continues to the biggest indic language wikipedia in terms of the number of articles. Almost 3500 articles were added to Hindi wikipedia in the year 2013.

  7. Shiju Alex says:

    Thanks Irvin, corrected the error.
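
The measures suggested in the comments above (the share of articles above a size threshold, a count of articles over a byte limit, and page views normalized by speaker population) are simple to compute once the inputs are available. Here is a minimal Python sketch. The function names are hypothetical, the article counts are the rounded May 2010 figures quoted in Selva's comment, and any article sizes, page-view totals, or speaker counts passed in would have to come from your own source, such as an analysis of the dumps.

```python
def substantial_share(total_articles, substantial_articles):
    """Fraction of articles that pass a size threshold (e.g. 200+ characters)."""
    if total_articles <= 0:
        raise ValueError("total_articles must be positive")
    return substantial_articles / total_articles


def count_articles_over(sizes_in_bytes, threshold_bytes):
    """Count articles strictly larger than threshold_bytes.

    sizes_in_bytes is any iterable of article sizes; how those sizes are
    extracted from a dump is outside this sketch.
    """
    return sum(1 for size in sizes_in_bytes if size > threshold_bytes)


def views_per_thousand_speakers(page_views, native_speakers):
    """Page views per 1,000 native speakers, for comparing communities by size."""
    if native_speakers <= 0:
        raise ValueError("native_speakers must be positive")
    return 1000 * page_views / native_speakers


# May 2010 figures quoted in Selva's comment: (all articles, 200+ characters).
may_2010 = {
    "Telugu": (45_000, 17_000),
    "Hindi": (56_000, 35_000),
}

for language, (total, substantial) in may_2010.items():
    share = substantial_share(total, substantial)
    print(f"{language}: {share:.0%} of articles have 200+ characters")
```

The same `count_articles_over` call works for a 2 kB, 5 kB, or 10 kB cut-off, so the threshold for Indic languages can be chosen after looking at the numbers.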
