No major effort is made to update the links given here. I am working towards publishing my browser bookmarks instead (2005-12-13).
Here are some recommended online and offline tools for corpus work.
For a comprehensive, indexed and annotated set of links, go to David Lee's Bookmarks for Corpus Linguists page.
Systematically updated, comprehensive bibliography of (English) corpus linguistics
LLT corpora bibliography (empty? 2005-12-13)
Learner corpora bibliography (Centre for English Corpus Linguistics, Louvain-la-Neuve)
Sabine Reich's Introduction to corpus linguistics - University of Koeln (1998) (still available from the Internet Archive)
Corpus Linguistics - Web supplement to Wilson & McEnery (1996/2001) Corpus Linguistics book
Kerstin Fischer's Corpus Linguistics Course (2001) [links + basic research questions for students]
Catherine N. Ball's page (2001) and Concordances and Corpora Tutorial
Rūta Marcinkevičienė's Systematic Dictionary of Corpus Linguistics
ICAME (International Computer Archive of Modern English; Norwegian Computing Centre for the Humanities - NCCH)
UCREL Home Page (UCREL = the University Centre for Computer Corpus Research on Language, Lancaster University)
PELCRA (Polish-English Language Corpora for Research and Applications)
David Lee's comprehensive, up-to-date, must-see devoted.to/corpora page
Michael Barlow's Corpus Linguistics page (Rice University, Texas, USA; corpora/parallel corpora)
Adam Kilgarriff's homepage (corpus linguistics and lexicography, downloadable articles, etc.)
The British National Corpus (BNC) and its online demo service
The Bank of English (Collins Cobuild), its concordancer/collocation sampler and WordbanksOnline demo
The new ICAME CD-ROM [written/spoken/historical/tagged/parsed collections (Brown, LOB, FLOB, Frown, Kolhapur, ACE, Wellington Corpus, ICE-EA, London-Lund, Lancaster/IBM, COLT, Helsinki, CEEC, Lampeter, ICAMET, POW)]
WebCorp - great online Web Concordancer (can also drive Google!)
CSLU Speech Corpora (Center for Spoken Language Understanding - free corpora & toolkit)
Danko Sipka's Turbo Lingo (will profile any text)
untagged IPI-PAN / OSU corpus (over 13M words, no Polish diacritics)
Online access to the PICLE corpus (Polish minicorpus to be added soon)
PELCRA (Polish-English Language Corpora for Research and Applications) (downloadable samples)
www.athel.com (software, corpora CD-ROMs, books for linguists and language teachers)
Comet Home Page (Corpus Of Modern English Texts)
Project Gutenberg (a non-copyright text resource, mostly classic literature) and Ronald Reck's searchable PG database
CETH (The Center for Electronic Texts in the Humanities)
CLR (The Consortium for Lexical Research)
Mike Scott's Web (Liverpool University, UK) (WordSmith Tools)
University of Stuttgart - computational linguistics and corpus exploration (IMS Corpus Workbench / Unix - free)
Barlow's MonoConc (free demo) & ParaConc (free beta for Windows)
WinConcord (free concordancer for Win3.x and later)
ConcApp (free concordancer and vocabulary profiler for Windows)
KWiCFinder (free Web concordancer)
LDC's online collocation tester (MI & T-score) (limited non-member access)
The TOSCA Research Group for Corpus Linguistics, Nijmegen, Holland (free deep POS tagger for DOS)
INTEX - free multilingual analyser
Xerox Research Centre (text analysis research: on-line demos of tools for various languages, incl. Polish)
A Xerox site with taggers to experiment with (plus articles for downloading)
Georgetown University Natural Language Processing (including Parser Modularity Demo page)
Generating taggers/lemmatisers. Maintaining lexica (WordManager)
Experimental Part-of-Speech Service (University of Birmingham)
E-mail part-of-speech tagger (University of Leeds)
Other sites discussing parsers: http://www.cl.cam.ac.uk./ftp/nltools; http://www.sil.org/pcpatr (Summer Institute of Linguistics, Inc.)
Tim Johns' DDL (= data-driven learning) page (University of Birmingham)
The Internet Grammar of English (based on ICE-GB Corpus; University College, London)
Language Learning & Technology (special volume 5)
Krajka's IATEFL POLAND COMPUTER SIG JOURNAL (incl. corpora issues)
HLT Magazine (with Michael Rundell's 'Corpora Ideas')
TEACHING AND LANGUAGE CORPORA (TALC'94, 96, 98, 2000) pages
Corpus Linguistics 2001 Conference (Lancaster 2001)
CCAAL 2001 (Challenges in Computer-Assisted Applied Linguistics / PLM33, Poznań)
http://xxx.lanl.gov/cmp-lg/ (papers mainly in computational & mathematical linguistics)
Literary and Linguistic Computing (OUP, abstracts only)
Torbjoern Lager's 'A Logical Approach to Computational Corpus Linguistics'
Excerpts from the book Evolutionary Web Development (Springer London 2001)
ICAME 2002 (Sweden, 22-26 May 2002)
TALC 2002 (Italy, July 2002)
IPI PAN: Lingwistyka komputerowa w Polsce (incl. corpora)
Papers from Fifth Workshop on Computational Natural Language Learning (CoNLL-2001, France)
Book 'Survey of the State of the Art of Human Language Technology' (1997)
On-Line Proceedings of COLING-94 (Fifteenth International Conference on Computational Linguistics)
Richard Chantrill's page (University of Queensland, Australia)
D. Eastment's page (language technology)
Moby project results (frequency lists)
Primer on Computational Lexicology (1992) (159k)
Lexicom: 2001 (workshop in lexicography and lexical computing)
BABEL (multi-language database comprising five of the most widely differing Eastern European languages: Bulgarian, Estonian, Hungarian, Polish and Romanian)
http://www.longman-elt.com (LONGMAN: catalogue; downloadable teaching materials - also http://www.longman.com/northstar)
Edinburgh University Press: http://www.eup.ed.ac.uk
Cambridge University Press: http://cup.cam.ac.uk
Quarterly Newsletter of the Contrastive Grammar Research Group of the University of Gent
Cambridge journals on-line: http://www.journals.cup.org
OUP journals: http://elt.oupjournals.org
Continuum books: http://continuumbooks.com
Last update: 2005-12-13