I have compiled a statistical report on the Indic language Wikipedias for the year 2010. My aim is to share my views on the contributions made to the various Indic language Wikipedias during 2010 (that is, between 2010 January 1 and 2010 December 31).
I have been waiting for the past 2 months to publish this report. Since it is based on the statistical information published at http://stats.wikimedia.org, I was waiting for that site to be updated. But due to various technical issues (my assumption), the stats.wikimedia team has not yet been able to generate data for some important parameters. So I have decided to move forward with the available data.
Disclaimer
In this blog post I am not going to select the best Indic language wikipedia after comparing different parameters. My intention is to compare different quality and quantity parameters and present the current status of various Indic language Wikipedias.
Parameters selected
I have included almost all the important parameters for which the required, up-to-date data is available. There are a few more parameters, such as articles with size > 0.5 kb, articles with size > 2 kb, and bytes per article, that I wanted to include in this report, but the data for them is not yet ready. So I am using only those parameters for which complete information is available.
Wikipedias Selected
For the analysis, I have selected the Wikipedias of all the languages spoken in the Indian subcontinent, or more precisely, the languages of India and its neighboring countries.
In the report, the language Wikipedias are divided into 3 groups based on the number of articles that each Wikipedia had on 2010 December 31. This division is done purely to compare apples with apples. There is no point in comparing a Wikipedia that has more than 50,000 articles (for example, Hindi Wikipedia) with one that has fewer than 1,000 articles (for example, Assamese Wikipedia). The groups, and the languages that fall under each group, are as follows.
Group 1 (More than 10,000 articles)
- Nepal Bhasha/Newari
- Hindi
- Telugu
- Marathi
- Bishnupriya Manipuri
- Tamil
- Bengali
- Gujarati
- Urdu
- Malayalam
- Nepali
Group 2 (More than 1,000 articles)
Group 3 (Less than 1,000 articles)
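As a rough sketch of the grouping rule above, the following Python function assigns a group number from an article count. The function name `article_group` is made up for illustration, and the treatment of a wiki with exactly 1,000 or 10,000 articles is an assumption, since the group definitions do not cover those boundary cases.

```python
def article_group(article_count: int) -> int:
    """Return the report's group number (1, 2, or 3) for an article count."""
    if article_count > 10_000:
        return 1
    # Assumption: exactly 10,000 articles falls into Group 2 and exactly
    # 1,000 articles falls into Group 3; the report does not say.
    if article_count > 1_000:
        return 2
    return 3


# Article counts quoted in this report for 2010 December 31.
counts = {"Nepal Bhasha/Newari": 69_557, "Hindi": 58_055}
groups = {language: article_group(n) for language, n in counts.items()}
print(groups)
```

Both of these wikis land in Group 1, which matches the list above.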
Report
This analysis is divided into 2 sections: 1. Article statistics and 2. Editor statistics. I have labeled the first 3 positions of each group using A, B, and C.
——————————————————————
NOTE:
I am forced to place the tables as images in this blog post, since I found it very difficult to handle this many tables in a good format in a WordPress blog. If anyone requires a PDF or ODT version of the tables below, please contact me at shijualexonline@gmail.com
—————————————————————-
Section I: Article statistics
Parameter 1 – Number of Articles
The number of articles in a wiki is always a popular (and fancy) parameter that everyone likes. Most of the time it is shown as an indicator of a successful Wikipedia. But many communities misuse it by concentrating only on increasing the article count using bots (most of those articles are one-liners or copy-pasted from English Wikipedia) and not on building the community.
Like last year, Nepal Bhasha/Newari Wikipedia continues to be at the top with 69,557 articles. Hindi, the biggest language of India, comes in second place with 58,055 articles. Nepali Wikipedia has grown from 3,079 articles to more than 10,000 articles.
Article creation in Bishnupriya Manipuri has almost stopped, and there is now no community to work on it. In my view, this will be the case with all Wikipedias that do not try to build a community. For the Wikipedias that have a good community, article creation and overall wiki activity are almost stable.
The focus of the Bengali wiki community has completely shifted from article creation to improving the quality of the existing articles. I found that more language wiki communities have started focusing on quality rather than on quantity.
Pali, Bhojpuri, Punjabi, and all the Wikipedias in Group 3 are inactive. There needs to be some program to revive each of these communities by reaching out to speakers and interested users of the respective language.
Parameter 2 – Number of Edits
The number of edits shows the overall activity in a wiki.
Malayalam tops the list and Bengali comes second. Malayalam recently crossed the 10 lakh (1,000,000) edit milestone, the first Indian language Wikipedia to do so.
Parameter 3 – Breakup of Edits
This parameter shows the percentage of manual edits and bot edits.
A few experienced Wikipedians use bots to automate repetitive tasks (for example, adding interwiki links, fixing double redirects, and so on). In general it is a good idea to automate repetitive tasks. But when we use bots even to create articles (just for the sake of increasing the article count), we are misusing a good tool that was given to assist us.
By comparing this parameter with the other parameters listed in this blog post, we can see that most of the articles in Nepal Bhasha/Newari Wikipedia are bot-created. The percentage of bot edits in that wiki is more than 90%, which is not good for sustaining an active community. The same is the case with Bishnupriya Manipuri, with almost 97% bot edits. If we continue to create bot articles like this, the community will not grow, and the wiki may eventually become inactive, especially if the language has few speakers. You can already see signs of low activity in Bishnupriya Manipuri Wikipedia: only 10 articles were created in that wiki in 2010, even though it had more than 24,700 articles at the start of 2010. The bpy wiki also has no active editors now. So this is a warning sign for all wiki communities.
Blocking bot edits in a wiki is not a good idea either. We do not want active editors to spend their time fixing double redirects, inserting interwiki links, and so on, which can easily be done by a bot. In the case of Sinhala Wikipedia, most of its articles do not have interwiki links. That is one of the main reasons for its low bot edit percentage. This is also not good for a Wikipedia.
So a balance is required in using bots for wiki editing. My suggestion is to use bots only for the tasks that really need them. Do not let bots take over the tasks that a new user would like to do (a new user always likes to start wiki editing by creating new articles). As per my understanding, a bot edit share of 40-50% is normal in Indic wikis due to the various types of fixes required. But if it goes beyond 50 or 60 percent, that means active editors are moving away from the wiki, which is not good for a Wikipedia.
Remember, Wikipedia is an encyclopedia in the making. We do not need to complete this task today.
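To make the bot edit balance concrete, here is a minimal Python sketch that computes the bot edit percentage and labels it with the rough bands suggested above. The function names, the labels, and the single 50% cut-off are my assumptions for illustration (the text only gives a fuzzy range of "50 or 60 percent"), and the sample edit counts are hypothetical, not taken from the stats tables.

```python
def bot_edit_share(bot_edits: int, total_edits: int) -> float:
    """Return bot edits as a percentage of all edits."""
    if total_edits <= 0:
        raise ValueError("total_edits must be positive")
    return 100.0 * bot_edits / total_edits


def bot_share_label(percent: float) -> str:
    """Label a bot edit percentage using the rough bands from the text."""
    # Assumption: 50% is used as the single cut-off for "too many bot edits".
    if percent > 50.0:
        return "bot-heavy"
    if percent >= 40.0:
        return "normal for Indic wikis"
    return "low bot activity"


# Hypothetical edit counts for one wiki.
share = bot_edit_share(bot_edits=97_000, total_edits=100_000)
print(share, bot_share_label(share))
```

A very low share is not automatically healthy either: as noted for Sinhala, it can simply mean that routine fixes such as interwiki links are not being done.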
The world will not end tomorrow. If it does, we don’t need Wikipedia. If it doesn’t, and if Wikipedia is to be a long-term, respected and reliably informative entity – perhaps even “The Encyclopedia of the Future” – then we can afford to take our time when deciding on what is allowed in, and when we open the door.
Parameter 4 – Edits per article
Edits per article shows the average number of times a Wikipedia article has been edited. For active Wikipedias it is a rough indicator of quality: an article has more encyclopedic value when more people read and edit it, because its neutrality and quality increase as more people work on it.
Malayalam tops the list with almost 30 revisions per article. Bengali and Tamil come in the second and third positions. As you know, all these wiki communities are very active. If you have a large number of active users, the quality of wiki articles will go up. The higher number of edits per article for some of the languages in Group 2 (for example, Pali and Bhojpuri) and all the languages in Group 3 is due to the low article count in those wikis, so the same articles are getting edited all the time.
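The caveat about small wikis can be seen with a little arithmetic. The sketch below divides total edits by article count; the function name and both sets of totals are hypothetical and chosen only to show that a large active wiki and a tiny wiki can end up with the same ratio.

```python
def edits_per_article(total_edits: int, article_count: int) -> float:
    """Average number of edits (revisions) per article."""
    if article_count <= 0:
        raise ValueError("article_count must be positive")
    return total_edits / article_count


# Hypothetical totals, chosen only to illustrate the caveat above.
large_wiki = edits_per_article(total_edits=600_000, article_count=20_000)
small_wiki = edits_per_article(total_edits=9_000, article_count=300)
print(large_wiki, small_wiki)
```

Both ratios come out the same, which is why this parameter is best compared only within a group.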
Section II: Editor statistics
Parameter 5 – Number of active wikipedians (at least 5 edits per month)
Active editors are editors who make at least 5 edits in a month.
Malayalam tops the list with 90 active editors. Tamil and Hindi come in second and third positions with 70 and 46 active editors respectively. From this parameter we can understand that even though most big Indian wikis have more than 20,000 registered users, not even 100 of them are active.
The worst affected wikis are the Bishnupriya Manipuri and Nepal Bhasha wikis. They do not have even a single active user. This is one of the after-effects of creating articles using bots. By doing so we actually blocked many potential users from becoming wiki editors. We need to find some plan to convert a good percentage of newly registered users into active editors.
The alarming part is that some wikis with a huge article count have a very small active user base. Those communities should think about plans to build the community.
Parameter 6 – Number of highly active wikipedians (at least 100 edits per month)
Highly active wikipedians are the editors who make at least 100 edits per month. They are the backbone of a Wikipedia. In fact, we must say that they are the people who are running the respective wiki.
Tamil tops the list with 18 highly active users. Malayalam comes second with 16 and Hindi comes third with 6 highly active users. From the chart you can see that for most of the Indian Wikipedias not even 5 editors are highly active. As mentioned before, the worst affected Wikipedias are Nepal Bhasha and Bishnupriya Manipuri. The interested users need to come back and build the community to save the project.
This parameter will define the future of Wikipedia in the respective language. So it is the duty of the current highly active users to convert many of the active users into highly active users.
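The two thresholds used in Parameters 5 and 6 (at least 5 and at least 100 edits per month) can be sketched as a simple count. This is only an illustration: the user names and monthly edit counts are hypothetical, and `count_editors` is a made-up helper, not how stats.wikimedia computes its figures.

```python
ACTIVE_MIN_EDITS = 5           # Parameter 5 threshold
HIGHLY_ACTIVE_MIN_EDITS = 100  # Parameter 6 threshold


def count_editors(monthly_edits: dict[str, int], min_edits: int) -> int:
    """Count users whose edits in one month reach the given threshold."""
    return sum(1 for edits in monthly_edits.values() if edits >= min_edits)


# Hypothetical edit counts for one month.
monthly_edits = {"UserA": 250, "UserB": 12, "UserC": 5, "UserD": 1}

active = count_editors(monthly_edits, ACTIVE_MIN_EDITS)
highly_active = count_editors(monthly_edits, HIGHLY_ACTIVE_MIN_EDITS)
print(active, highly_active)
```

Anyone with 100 or more edits also has at least 5, so in this sketch the highly active editors are counted among the active editors as well.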
Parameter 7 – Wikipedians who edited at least 10 times since they arrived
This parameter shows how many of the registered users actually turned into wiki editors and made at least a few edits in the wiki.
Malayalam tops the list with 564 wikipedians, which means that out of more than 20,000 registered users in Malayalam Wikipedia, the Malayalam community was able to convert only 564 users into actual wikipedians. Hindi, with almost 40,000 registered users, got only 559 of them as wiki editors. Tamil comes third.
This parameter can be used for the strategic plans of WMF. We should be able to convert at least one third of registered users to wiki editors.
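To see how far the current numbers are from the one-third target, here is a small worked calculation in Python. The editor counts (564 and 559) are from this report; the registered-user totals of 20,000 and 40,000 are rounded assumptions, because the report only says "more than 20,000" and "almost 40,000". The function name is made up for illustration.

```python
import math


def conversion_rate(editors: int, registered_users: int) -> float:
    """Percentage of registered users who became editors (10+ edits)."""
    if registered_users <= 0:
        raise ValueError("registered_users must be positive")
    return 100.0 * editors / registered_users


# Assumption: registered-user totals are rounded to 20,000 and 40,000.
malayalam_rate = conversion_rate(564, 20_000)
hindi_rate = conversion_rate(559, 40_000)

# Editors needed to reach the one-third target for 20,000 registered users.
target_editors = math.ceil(20_000 / 3)

print(f"Malayalam: about {malayalam_rate:.1f}%")
print(f"Hindi: about {hindi_rate:.1f}%")
print(f"One third of 20,000 users: {target_editors} editors")
```

Under these assumptions both wikis convert under 3% of their registered users, against a target of roughly 33%.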
This parameter shows that we need good programs to convert many registered users into actual wikipedians. There lies the future of the wiki project in the respective language. Advocacy programs like wiki workshops, seminars, exhibitions, Wikipedia CDs, wiki meetups, participation in various events, and so on can help to popularize Wikipedia among the speakers of the respective language. Some language communities are already doing this. But we need to replicate that success story across the various Indic languages.
Conclusion
As mentioned at the beginning of this blog post, my intention is not to compare different parameters and find the best Indic language Wikipedia. Each language Wikipedia is good when we take one or another parameter. But by comparing different Wikipedias you can draw your own conclusions.

Nice and exhaustive report ! Thanq Shiju ..
Thank you for the excellent report. It is very useful.
Excellent and very detailed work.
I see improvements in creating community and there are more active users than in the past. For example in Sanskrit, Pali and Bhojpuri.
Nepal Bhasa now has very few contributors. Eukesh used to be the great contributor, and he did his best to bring the Newari Wikipedia “Nepal bhasa” into existence. Had he not been there, there would not have been such a Wikipedia today. His contribution to Hindi Wikipedia and Nepali Wikipedia, along with Pali, Bhojpuri, and Bisnupriya, used to be significant.
Shiju has truly said, “Now there is no community to work on it.”
Let’s hope for the best. Kudos to everyone who has tried their best to retain their community. It’s really a hard job but it should not last long.
User:RajeshPandey
Good report and thanks for the patient work.
Interesting to see growth profile in a year. Hope we can do this every year from now.
Great work Shiju, thanks. I wish there was a way to bring in India-related English Wikipedia statistics.
Building the community is the key. For that, Number of Edits and Active Users are important parameters. Malayalam Wiki is punching above its weight relative to all Indian Wikipedias, especially taking into account the population base. Good educational attainment in Kerala and the community-mindedness of Keralites are key in this regard. Tamil is ok, Hindi is lacking, and Telugu is way behind.
Thank you all for your valuable comments.
As Natkeeran said above, building the community (not increasing the article count) is the key.
Whichever community understands the importance of this factor will succeed. All other communities that concentrate on increasing the article count without bothering about their readers or building the community will lag behind. When this happens to the big Indian language Wikipedias, it will affect the Wikimedia movement in India itself.
Excellent Report Shiju! Especially for me, it was of great interest as I could compare where my Gujarati (please note, it is not Gujarathi) wikipedia stands among other Indian wikipedia. Keep the good work up.
Thanks Dhaval.
I will correct the spelling of Gujarati from next report onwards. I am unable to correct this now as the tables for all parameters are uploaded as images.
Great work Shiju, thanks. Your report is discussed in Russia. Probably, on its basis someone will make the analysis on Wikipedias in the Russian languages.
…on Wikipedias in languages of the Russian Federation.
Thanks Nokolay. I am very happy to hear that.
Comparison of Wikipedias of the same language family or the same region will benefit each of them. It will allow the respective communities to strengthen their language Wikipedias. Otherwise we will be in our own world, thinking that our Wikipedia is the best Wikipedia.
We never know; a neighboring language Wikipedia might be far better and more useful to its readers. So sharing helps everyone.
I am very happy to hear that this report has even helped Russian wikipedians!!!!
I really want to help in Punjabi Wikipedia from my heart. I love Wikipedia – one of the best! on internet. But the problem is currently I do not have much time coz exams are near in these months. But whenever I’ve time, I will surely give some contribution.
Wikipedia Rocks!
Comparison is encouraging everyone of speaking any language. Thank you for ever.
Great report, Shiju. Finally read it properly. A couple of questions:
*The percentage of bot edits seems pretty high on many of the languages: Marathi, Bengali, Urdu, Kannada for example (higher than the 40-50% balance you say is fine). Any thoughts on how this balance can be achieved in practice?
*The number of highly active wikipedians (with 100+ edits per month) has gone down in many of the group 1 languages: eg Hindi, Telugu, Marathi, Bengali, Urdu + others. As you say, these editors are the backbones of that language wikipedia, so this seems a bit worrying to me. Any thoughts what the issues might be – and how this trend might be reversed?
For both queries, the answer is simple.
– Build the community. The wikis of language communities interested in building the community are doing well (for example, Tamil and Malayalam).
– Interact with other language wikimedians and adopt the best practices.
– Consider readers also when designing UI. Provide some tools for them to search for the articles in wiki. A good readership will in turn attract more people to wiki.
It’s a great language. Just love it.