Submissions/Fixing grammar errors semi-automatically

This is an accepted submission for Wikimania 2014.

Submission no. 5022
Title of the submission
Fixing grammar errors semi-automatically
Type of submission (discussion, hot seat, panel, presentation, tutorial, workshop)
presentation
Author of the submission
Daniel Naber, Marcin Miłkowski
E-mail address
daniel.naber@languagetool.org, Marcin.Milkowski@ifispan.waw.pl
Username
dnaber/dnaber, Milek_pl
Country of origin
Germany, Poland
Affiliation, if any (organisation, company etc.)
https://languagetool.org
Personal homepage or blog
http://www.danielnaber.de, http://marcinmilkowski.pl
Abstract (at least 300 words to describe your proposal)

To improve the quality of text on Wikipedia, we developed a system that scans all Wikipedia edits for style and grammar errors. Anyone can correct the errors, often with just a few clicks and no manual editing. The software fetches the Atom feed of changes at least once a minute and runs LanguageTool on the edited paragraphs to find errors that were introduced by each edit. LanguageTool is our open-source style and grammar checker, which supports many languages, including English, French, German, and Polish.
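The idea of reporting only errors introduced by an edit can be sketched as follows. This is a minimal illustration, not the actual implementation: `toy_check` is a hypothetical stand-in for a real checker such as LanguageTool, and `introduced_matches` is an assumed name. The sketch checks the paragraph before and after the edit and keeps only the matches that were not already present.

```python
from collections import Counter
from typing import Callable, List, Tuple

Match = Tuple[str, str]  # (rule id, offending text)


def toy_check(text: str) -> List[Match]:
    """Hypothetical stand-in for a real checker: flags doubled words only."""
    words = text.split()
    return [
        ("DOUBLED_WORD", f"{a} {b}")
        for a, b in zip(words, words[1:])
        if a.lower() == b.lower()
    ]


def introduced_matches(
    old: str, new: str, check: Callable[[str], List[Match]] = toy_check
) -> List[Match]:
    """Return matches found in `new` that were not already found in `old`."""
    before = Counter(check(old))
    introduced = []
    for match in check(new):
        if before[match] > 0:
            before[match] -= 1  # this error existed before the edit
        else:
            introduced.append(match)  # this error is new
    return introduced


if __name__ == "__main__":
    old_paragraph = "This is is a test."
    new_paragraph = "This is is a test of the the tool."
    print(introduced_matches(old_paragraph, new_paragraph))
```

Counting matches, rather than using a plain set, means that a second occurrence of an error that already existed once is still reported as new.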

LanguageTool detects problems that an ordinary spell checker misses. Typical errors include:

  • missing possessive apostrophes: "Download software from the teachers computer" instead of "Download software from the teacher's computer"
  • agreement errors: "He has two brother" instead of "He has two brothers"
  • a vs. an, e.g. "a Indian film" instead of "an Indian film"
  • missing space after a sentence period

The basic approach for finding errors is to search the text for patterns of known errors. Many of the patterns are quite simple, and all patterns are independent of each other. LanguageTool can therefore easily be extended to detect new kinds of potential problems, including ones specific to Wikipedia. For example, the German rules of LanguageTool detect weasel words like "many people say", which are not wrong but are usually not appropriate for Wikipedia. The presentation will give a brief introduction to writing new error detection rules. It will also explain the reasons for false alarms, some of which are due to bugs and some of which are due to the way we extract text from Wikipedia.
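The pattern-based approach can be illustrated with a small sketch. It does not use LanguageTool's own rule format or API; the `Rule` class, the rule ids, and the regular expressions are all assumptions made up for this example. Each rule is an independent pattern, so adding a new check means adding one entry to the list.

```python
import re
from dataclasses import dataclass
from typing import List, Tuple


@dataclass(frozen=True)
class Rule:
    rule_id: str
    pattern: re.Pattern
    message: str


# Illustrative rules only; real rules are far more careful than these.
RULES = [
    # Crude heuristic: "a" followed by a word starting with a vowel letter.
    Rule("A_VS_AN", re.compile(r"\ba (?=[AEIOUaeiou])"),
         'Use "an" before a vowel sound.'),
    # A lowercase letter and a period directly followed by a capital letter.
    Rule("MISSING_SPACE", re.compile(r"[a-z]\.(?=[A-Z])"),
         "Missing space after the sentence period."),
    # A style rule in the spirit of the weasel-word example.
    Rule("WEASEL", re.compile(r"\bmany people say\b", re.IGNORECASE),
         "Weasel words are usually not appropriate for Wikipedia."),
]


def check(text: str, rules: List[Rule] = RULES) -> List[Tuple[str, int, str]]:
    """Apply every rule independently; return (rule id, offset, message)."""
    found = []
    for rule in rules:
        for m in rule.pattern.finditer(text):
            found.append((rule.rule_id, m.start(), rule.message))
    return sorted(found, key=lambda item: item[1])


if __name__ == "__main__":
    for rule_id, offset, message in check("He saw a Indian film.It was long."):
        print(rule_id, offset, message)
```

The sketch also shows how false alarms arise: the vowel-letter heuristic flags "a university" even though that phrase is correct, because the pattern looks at spelling rather than pronunciation.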

Our wish list for the future includes more Wikipedia-specific error detection rules and closer integration with MediaWiki, for example integration into VisualEditor. The presentation will provide some ideas on how this could be achieved.

The Recent Changes check tool is available at http://tools.wmflabs.org/languagetool/feedMatches. LanguageTool is available at https://www.languagetool.org for online use and download.

Track
Technology, Interface & Infrastructure
Length of session (if other than 30 minutes, specify how long)
30 minutes
Will you attend Wikimania if your submission is not accepted?
yes (Daniel), not yet decided (Marcin)
Slides or further information (optional)
Special requests


Interested attendees

If you are interested in attending this session, please sign with your username below. This will help reviewers to decide which sessions are of high interest. Sign with a hash and four tildes. (# ~~~~).

  1. Steven (WMF) (talk) 21:25, 26 March 2014 (UTC)
  2. OwenBlacker (talk) 15:27, 6 April 2014 (UTC)
  3. Ocaasi (talk) 23:06, 6 April 2014 (UTC)
  4. Mervat Salman (talk) 21:03, 24 July 2014 (UTC)
  5. VIGNERON (talk) 19:19, 7 August 2014 (UTC)
  6. Add your username here.