PlEWiC (Polish Language Errors from Wikipedia) was created by automatically extracting language errors from the edit history of Polish Wikipedia. The method is described in:

  • Roman Grundkiewicz, Automatic Extraction of Polish Language Errors from Text Edition History, Proceedings of the 16th International Conference on Text, Speech and Dialogue (TSD 2013), Springer, LNCS, pages 129--136, Czech Republic, September 2013

The corpus contains more than 1.53 million sentences and about 1.71 million naturally occurring language error examples. A presentation describing the corpus is available at:


The first version of PlEWiC is publicly available in YAML format: