After a long gap of almost 7 months, I have compiled the statistical report for the Indian Language wikipedias.
As all of you know, I recently joined the India Programs of WMF to support the Indic language wiki projects. In the past I interacted with various Indic language wikipedians on community-related and technical matters (for example, the FAQ booklet translated into various Indic languages; the typing tool integrated into several Indic wikipedias, now rechristened the Narayam extension and officially backed by WMF; the Wiki India Newsletter; and so on). From now on I will be able to spend more time on the Indic language wiki projects.
The data for this report is taken from http://stats.wikimedia.org/. I thank Erik Zachte for his support in getting it.
Due to the long processing time, the report at stats.wikimedia.org is generated with a delay of about one month. Hence this report captures the data for 2011 September (data for 2011 August is also given for comparison).
Unlike the previous reports, from now on I will analyze data only for the languages spoken in India. So I have excluded from this report the languages of neighboring countries, such as Sinhala and Burmese (even though I am personally interested in watching the growth of those wikipedias too, since the languages are closely related to one or another language spoken in India).
Following are the languages of India I have selected for preparing this report. The number of speakers for each language is given against each language.
NOTE: I have used the Indian way of denoting large numbers, since that makes more sense in India. For others: a crore is equal to 10 million, and a lakh is 100,000.
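For readers who are not used to these units, the conversion can be sketched in a few lines of Python. This is only an illustration: the function names are made up for this example, and the sample figures (2.9 crore, 77 lakh, 50,000) are the ones quoted elsewhere in this report.

```python
# Indian numbering units, as defined in the note above.
LAKH = 100_000        # 1 lakh  = 100,000
CRORE = 10_000_000    # 1 crore = 10 million = 100 lakh


def crore_to_number(crore):
    """Convert a value given in crore to a plain number."""
    return round(crore * CRORE)


def lakh_to_number(lakh):
    """Convert a value given in lakh to a plain number."""
    return round(lakh * LAKH)


def to_indian_units(n):
    """Express a plain number in crore or lakh where that reads better."""
    if n >= CRORE:
        return f"{n / CRORE:g} crore"
    if n >= LAKH:
        return f"{n / LAKH:g} lakh"
    return f"{n:,}"


print(crore_to_number(2.9))         # 2.9 crore speakers as a plain number
print(lakh_to_number(77))           # 77 lakh page views as a plain number
print(to_indian_units(29_000_000))  # back to crore
print(to_indian_units(50_000))      # below one lakh, printed as-is
```

So 2.9 crore is 29 million, and 77 lakh is 7.7 million.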
Indian languages having wikipedia

In India, Hindi is the language with the largest number of native speakers. There are a few more languages with a huge speaker base. But when it comes to the Wikimedia movement in India, the speaker base does not make much impact.
The number of speakers of Punjabi (the language spoken in the Punjab state of India) is often misquoted in many places, including the WMF stats. The Punjabi language has two variants (Eastern Punjabi and Western Punjabi). According to various Indian statistical reports, the Punjabi language that uses the Gurumukhi script (called Eastern Punjabi on en Wikipedia) is spoken by almost 2.9 crore people. Western Punjabi (spoken by close to 6 crore people in Pakistan) has its own wikipedia (http://pnb.wikipedia.org). I assume the issue is similar to that of Hindi and Urdu, where the languages are closely related but use different scripts for various reasons. Since my interest is in the Punjabi wikipedia (http://pa.wikipedia.org) that uses the Gurumukhi script, I have considered the number of speakers of that language.
Also, I found that the Bhojpuri wikipedia (http://bh.wikipedia.org/) still uses the wrong language code (bh), the code that represents the Bihari language family. Bihari (ISO 639-1 bh) is a language family, and Bhojpuri is just one of the languages in this family (Angika, Fiji Hindi and Maithili are a few others). Bhojpuri has the language code bho. So we need to do two things in the case of the Bhojpuri wikipedia: 1. update the language code to bho, and 2. change the language name to Bhojpuri (instead of Bihari) in the Wikimedia records.
I have included in this report almost all the important parameters for which the required, up-to-date data is available. From the next report onwards I will add a few more relevant parameters. The languages in all the tables of this report are ordered by number of speakers.
Article statistics
Hindi, with more than 1 lakh (100,000) articles, is at the top. Newari wikipedia comes second with 69,826 articles, and Telugu comes third with 48,803.
In the span of 9 months (from 2011 January), Hindi wikipedia has added more than 40,000 articles. But did the community size increase? See the next few parameters for more information.
Odia and Assamese wikipedias have made much progress since my last report. Both the article count and the community strength have increased for both. The article count of Punjabi wikipedia is also going up.
Edits per article
Edits per article shows the average number of times a wikipedia article has been edited. More edits on an article usually means that more people have contributed to it, and that the neutrality of the article is also higher. For active wikipedias it is a rough indicator of quality. A wiki article has more encyclopedic value when more people see and edit it.
Among active wikipedias, Bengali and Malayalam have the most edits per article. This is expected, since both have very active communities. (To see the community strength, refer to the next few tables.)
For languages like Kashmiri and Pali, the edits per article figure is high because these wikipedias have very few articles, and the same articles are edited (mostly by bots) again and again.
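To make the ratio concrete, here is a minimal Python sketch. It assumes that edits per article is simply total edits divided by the number of articles; the exact way stats.wikimedia.org computes it may differ. All the numbers below are hypothetical and chosen only to show the two situations described above.

```python
def edits_per_article(total_edits, article_count):
    """Average number of edits per article (assumed: total edits / articles)."""
    if article_count <= 0:
        raise ValueError("article_count must be positive")
    return total_edits / article_count


# Hypothetical large wiki whose articles were mostly created in bulk:
# many articles, but each one has hardly been touched since creation.
bulk_created = edits_per_article(total_edits=300_000, article_count=60_000)

# Hypothetical tiny wiki where bots keep editing the same few articles:
# the ratio looks high even though there is no real community behind it.
tiny_bot_edited = edits_per_article(total_edits=6_000, article_count=200)

print(bulk_created)     # 5.0
print(tiny_bot_edited)  # 30.0
```

This is why the figure should be read together with the article count and the editor tables, and not on its own.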
Editor and Reader Statistics
Malayalam and Tamil top the list with almost 85 active editors each. But in Malayalam there is a reduction of 14 active editors from the previous month.
As said before, when it comes to the Wikimedia movement, the speaker base does not make much impact.
For example, Sanskrit, with just 50,000 speakers, is making a huge impact in the Wikimedia world. It has 14 active users, a bigger community than many other, much bigger Indic languages. I am impressed by the efforts of the Sanskrit wiki projects, especially with the sister wiki projects (for example, wikisource), their way of interacting and implementing the best practices from other Indic language wikipedias, and so on.
In the past few months I have conducted 3 wiki workshops for Sanskrit. I found that each time the community's vision for the future of the Sanskrit wiki projects had matured. In the last Sanskrit wiki workshop the main focus was on defining the category tree for Sanskrit wikipedia.
When I published the last report, Odia and Assamese wikipedias were inactive. Now the situation has changed, and we have a community working on each of them.
The progress made by the Odia wiki project (http://or.wikipedia.org/) is noteworthy. Kudos to the Odia wiki community for all the online and offline initiatives to build the community and increase the article count. I was actively involved in the community building for Odia wikipedia. I still remember the day (2011 January 15) when I introduced Odia wikipedia and the Odia typing tool (developed by Junaid) to an Odia speaker, Ashuthosh Kar, during the Wiki X celebration at Bangalore. Through him we very soon got a wonderful wikipedian, Subhashish, who is leading the efforts for Odia now. Initially Subhashish and I used to meet at my home and work on the basic things for the Odia wiki. I remember us working on the Odia wikipedia logo, the FAQ booklet, Translate wiki, and so on. Soon we got more members for the team through the few Odia wiki workshops that happened at Bangalore. Along with the workshops, Odia wikipedians translated the FAQ booklet into Odia and worked to integrate the Odia typing solution developed by Junaid into Odia wikipedia. Later, with the support of Dhanada Mishra (the chairman of the Human Development Foundation (http://www.hdf.org.in/)) and a young student Odia wikipedian, Srikanth Kedia, we conducted a wiki workshop at Bhubaneshwar. Odia wikipedians from Bangalore are doing an excellent job, and many of them are now participating in Wikimedia India chapter activities too.
The Odia wiki project picked up not because Odia has a huge speaker base, high literacy, access to computers, or anything else; it became active only because it found volunteers who have the passion and vision to develop a wikipedia in their mother language. We need similar volunteers for each Indic language.
The case is similar for Assamese. I had been trying to get a good volunteer for Assamese wikipedia for the past 3 years. Initially I tried to find volunteers in Bangalore, since Bangalore has a good representation of the Assamese community and it is easy for me to reach people there. But that didn’t work out. Then I tried online outreach, initially through emails. Finally I got connected with Prabhakar, who is a professor at NIT in Silchar, Assam. Together we tried online outreach, first through a Google group (it didn’t work out), then through Facebook. A Facebook page was created for the Assamese wikipedia projects, aimed at bringing together all Assamese people who are on Facebook and interested in Assamese wiki projects. It has more than 460 members now. Prabhakar used that group and his personal contacts effectively to build a community for Assamese wikipedia. That worked. Assamese wikipedia slowly started becoming active. Later Prabhakar started another Facebook group dedicated to NIT Silchar, for promoting Indic language wikipedias among the students (and alumni) of NIT Silchar. Due to all this, in the next few months we are going to see more wiki activity from the Assam state of India and in the Assamese wikipedia. Recently Narayam (the typing solution extension) was integrated into Assamese wikipedia. Thanks to the Assamese wikipedians Chaipu, Prabhakar and the other volunteers who actively worked to make it a reality. A major roadblock to bringing Assamese people to Assamese wikipedia has now been removed.
The Assamese wiki community is currently concentrating on online outreach, but they are planning to start offline outreach activities soon.
It is interesting to note that the community size of smaller languages is equal to, or even larger than, that of much bigger languages. I don’t fully understand this and would like to hear your opinion on it. One hypothesis could be that a larger proportion of the speakers of smaller languages are passionate about their language and are therefore willing to put in additional effort to showcase their mother language. Each Indic language wiki project is waiting for the few users who have vision and passion about the future of that wiki project.
This is despite the fact that smaller languages face more technical issues and other roadblocks.
Number of highly active wikipedians (more than 100 edits per month)
Highly active wikipedians are the editors who do at least 100 edits per month. In fact, we could say that they are the people who run the respective language wiki.
Here also Malayalam and Tamil top the list. In fact, the more highly active editors a wiki has, the more activities (not just article creation) you will see coming out of that wiki community. This is why offline projects, photo events, article writing contests, community quizzes, collaboration with the respective state governments, photo contests, wiki workshops, and many other innovative wiki projects are coming out of these two wiki communities. So ideally we should convert more active editors into highly active editors to make the wiki activism in each language wikipedia more vibrant.
Registered users who edited at least 10 times since they arrived
This parameter shows how many of the registered users actually turned into wiki editors and made at least a few edits in the wiki.
Even though many big languages have more registered users, Malayalam continues to be at the top. Hindi and Tamil come second and third. I wish all wiki communities were able to convert more registered users into active wiki editors.
Newly registered users who edited at least 10 times
This parameter is a subset of the preceding table. It shows how many of the newly registered users turned into wiki editors. The Tamil wiki community is leading here. Among smaller communities, Assamese is also doing well (for the reasons I described elsewhere).
Page Views – Non Mobile (In Lakhs)
This parameter shows how many readers are waiting for us. Are we caring for them? The following table gives the data for non-mobile access (mainly PC).
Even though Hindi lags behind in some of the wikipedia editing/editor metrics, the speakers of Hindi are not lagging behind in using Hindi wikipedia. Hindi tops the list with 77 lakh page views. No other Indic language is near Hindi. This is expected, considering the huge speaker base of Hindi.
Even for inactive wikipedias like Sindhi, Kashmiri, Pali, and so on, we have thousands of people accessing them every month. But do we have enough content to offer these readers? We need to build a community for each of these languages to serve our readers. In fact, we should be converting some of these readers into wikipedians of the respective language wikipedias.
Page Views – Mobile
This parameter shows how many people are accessing each language wikipedia from mobile devices. Unlike the non-mobile data, this data shows an upward trend for all the Indic languages. Even though the rendering of Indic scripts is not good on most mobile devices, many users are accessing the wikipedias this way. This also shows where our future readers are going to come from.
Here again Hindi comes first. Most of the other Indic languages are also increasing their mobile reader base. For some languages the page views more than doubled (a growth of more than 100%) compared to the previous month. I assume this is going to increase in future, and it is our duty to welcome all these new readers and convert some of them into editors.
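For anyone who wants to redo this comparison from the monthly tables, the month-on-month growth can be worked out as in the Python sketch below. The function name is made up for this example and the page view counts are hypothetical, not figures from the tables.

```python
def percent_growth(previous, current):
    """Percentage change from the previous month to the current month."""
    if previous <= 0:
        raise ValueError("previous month's value must be positive")
    return (current - previous) / previous * 100


# Hypothetical mobile page views for one language wikipedia.
august_views = 1_000
september_views = 2_500

growth = percent_growth(august_views, september_views)
print(f"{growth:.0f}% growth over the previous month")  # 150% growth over the previous month
```

A growth of more than 100% means the page views have more than doubled; a month with exactly the same page views as the previous one gives 0%.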
Conclusion
In short, in terms of readers most of the Indian languages are doing well. But when it comes to editors that is not the case. One hypothesis could be that we are more knowledge consumers than knowledge creators. Another, and probably more valid, one is that there are large gaps in basic awareness of the existence of Indic language wikipedias, relatively low use of Indic languages on the Internet, an (expected) lack of familiarity with wiki editing, technical issues with regard to Indic languages, tiny community sizes, and many other things.
As wikipedians can we change this scenario?
For Wikimedia India we have a lot of things to do in terms of building the community, overcoming the technical challenges, creating awareness about wikipedia (more important is creating awareness about Indic language wikipedias), and so on. The challenges are many, but the opportunities are massive. I look forward to working closely with the various language communities on realizing the enormous potential of their respective languages.
Once again, I welcome your views, comments and opinions on the above. Please post them as comments here. You can also reach me at shiju@wikimedia.org in case you want to send a personal mail. Thanks for reading.
Was waiting for this one for a long time.
Newari Wikipedia has the same number of articles even after one month? Or, is there a flaw in the article count?
Even now Newari Wikipedia has only 69,827 articles. http://new.wikipedia.org/wiki/Special:Statistics
Which means there is no active community there.
Feeling nice to see those words for odia wiki. Thanks for the information.
Feeling nice to see those words for Odia wiki and its members. Thank you for the information.
Increase in mobile traffic is the most notable thing among this. Should focus on mobile from here and provide editing tools to mobile users at the earliest.
yes, kindly request the wiki admin to provide mobile editor
Very informative analysis.
Thanks
Great work, Shiju. Keep it up.
Nice work summing up the statistics, Shiju.
Interesting analysis as always. Literacy rate (by State), and Internet availability or penetration (by State) would be helpful in further understanding the situation. Sanskrit community is mostly an academic community, thus I am not surprised by their high involvement in Wikimedia projects, specially Wiki Sources.
Malayalam Wiki is by far doing things right. They should definitely share their experiences, best practices, and strategies. I would have expected Marathi and Bengali to do better. Marathi because of urbanized population and Bengali because of their language movement.
//Literacy rate (by State), and Internet availability or penetration (by State) would be helpful in further understanding the situation. //
In fact, to analyze the situation better we need the literacy rate and Internet availability information by language. But unfortunately I found that such data is not available (at least not publicly).
//Sanskrit community is mostly an academic community, thus I am not surprised by their high involvement in Wikimedia projects, specially Wiki Sources. //
Yeah, there is much happening there (and more will happen in future).
//Malayalam Wiki is by far doing things right. They should definitely share their experiences, best practices, and strategies. //
In fact, I have done this a couple of times in the past. There is no secret strategy. Build community, build community, build community…. That is the only strategy. When we have a good community, new ideas will flow, new wiki projects will be initiated, and a lot of out-of-the-box things will happen.
//I would have expected Marathi and Bengali to do better. Marathi because of urbanized population and Bengali because of their language movement. //
I stayed in Pune for more than 3.5 years. In general I feel that among urban Maharashtrians, Hindi is getting priority over Marathi. The presence of the Bollywood industry might be one of the reasons for this. I have seen many things along similar lines. Adding to that is an earlier decision to use the Devanagari script instead of the Modi script. With that decision the Marathi language lost its uniqueness in writing.
Bengali is not picking up in India mainly because of the awareness issue, I guess.
Very nice analysis, Shiju. Thanks for the stats.
One thing that always bothers me is the route some smaller languages take. For example, there are a few languages in your lists with 50k+ article count. But if you go to the edits per article, you’ll see less than 7 or so edits/article. That’s no surprise since these wikis are mostly written by bots using databases and scripts. Nepal bhasha and Bishnupriya Manipuri wiki have taken this approach. I understand the reason behind this …. there are only a few active editors (Sometimes just 1), and only a small amount of content available digitally / online, so the bot generated content is probably the only way to go for them. But I feel that the rush to increase article count will eventually harm these wikipedias in the long run.
Just for reference and comparison, English-related information could also have been added in the tables.
Thanks for these informative statistics.
In Kannada wikipedia, the participation is low. Unfortunately, certain articles are left unattended, without editing, for a month or so. One article on jagijit sing is still unattended.
I do not have sound knowledge and need lots of support. But I have the flair for work!
Excellent !
Thank you for the time-consuming, intelligent work.
Wow! What a nice article to see? So crisp and clear! You even motivated me to get back to Malayalam Wikipedia. Thanks much for all the good info, Shiju!
Thanks for your comprehensive statistics. Really hard work done.
Great statistics. Your work is well appreciated.
It is interesting to note that W.Punjabi Wikipedia (pnb.wikipedia.org) is not included.