Date: Mon, 8 Jul 2002 08:50:32 -0400 (EDT)
From: Ronald Reck <>
Subject: [Corpora-List] string frequency reports for Project Gutenberg texts


Hello all,


I have created string frequency reports for 5400+ books (400M words) from Project Gutenberg:


They are searchable here:


The process is described briefly here, with links to all the source code in CVS:
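As a rough illustration only (this is not the code in CVS), a per-book string frequency report could be built by lowercasing the text, splitting it into word-like strings, and counting them. The tokenization rule (a "string" is a run of lowercase letters and apostrophes) and the names `tokenize` and `frequency_report` are assumptions made for this sketch.

```python
import re
from collections import Counter

# Assumption: a "string" is a lowercase run of letters and apostrophes.
TOKEN_RE = re.compile(r"[a-z']+")


def tokenize(text):
    """Lowercase the text and split it into word-like strings."""
    return TOKEN_RE.findall(text.lower())


def frequency_report(text, top_n=10):
    """Return the top_n (string, count) pairs, most frequent first."""
    counts = Counter(tokenize(text))
    return counts.most_common(top_n)


if __name__ == "__main__":
    sample = "It was the best of times, it was the worst of times."
    for string, count in frequency_report(sample, top_n=5):
        print(f"{string}\t{count}")
```

A real report over 5400+ books would also need to strip the Project Gutenberg header and license text before counting, and would likely stream files rather than load them whole.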


I am looking for help improving these string frequency histograms across the archive when they are rendered in SVG:
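The message doesn't say how the SVG graphs are generated. One hedged sketch of a labeled SVG bar chart, using only the Python standard library and giving the chart a title and both axis labels, might look like this. The function name `svg_histogram`, the layout constants, and the toy input are assumptions; this is not the code behind the author's graphs.

```python
from xml.sax.saxutils import escape


def svg_histogram(bars, title, x_label, y_label, width=400, height=250):
    """Render (label, value) pairs as a labeled SVG bar chart string."""
    margin = 40  # space reserved for the title, axis labels and bar labels
    plot_w = width - 2 * margin
    plot_h = height - 2 * margin
    # Scale bars to the tallest value; fall back to 1 to avoid dividing by zero.
    max_value = max((v for _, v in bars), default=0) or 1
    bar_w = plot_w / max(len(bars), 1)
    parts = [
        f'<svg xmlns="http://www.w3.org/2000/svg" width="{width}" height="{height}">',
        f'<text x="{width / 2}" y="20" text-anchor="middle">{escape(title)}</text>',
        f'<text x="{width / 2}" y="{height - 5}" text-anchor="middle">{escape(x_label)}</text>',
        # Rotate the y-axis label so it runs along the left edge.
        f'<text x="12" y="{height / 2}" text-anchor="middle" '
        f'transform="rotate(-90 12 {height / 2})">{escape(y_label)}</text>',
    ]
    for i, (label, value) in enumerate(bars):
        bar_h = plot_h * value / max_value
        x = margin + i * bar_w
        y = margin + plot_h - bar_h  # SVG y grows downward, so bars start at the bottom
        parts.append(
            f'<rect x="{x:.1f}" y="{y:.1f}" width="{bar_w * 0.8:.1f}" '
            f'height="{bar_h:.1f}" fill="steelblue"/>'
        )
        parts.append(
            f'<text x="{x + bar_w * 0.4:.1f}" y="{height - margin + 15}" '
            f'text-anchor="middle" font-size="10">{escape(label)}</text>'
        )
    parts.append("</svg>")
    return "\n".join(parts)


if __name__ == "__main__":
    # Toy counts for illustration, not real Project Gutenberg data.
    print(svg_histogram([("bin 1", 3), ("bin 2", 1)],
                        title="Toy histogram", x_label="Frequency bin",
                        y_label="Number of books"))
```

Emitting the SVG as plain text keeps the sketch dependency-free; a plotting library would handle tick marks and axis scaling more robustly.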


I merged some of the results into an SVG:

(it's worth the plugin hassle)


I also extended the DAML ontology for PG presented here:


and created RDF metadata for the archive, available here:


The metadata is loaded into a specialty RDF backend called Parka. This example query shows how to get RF values for an author's use of certain strings:
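RF is not defined in the message; assuming it means relative frequency (occurrences per million tokens), an author-level value could be computed from per-book counts roughly as follows. This is plain Python over in-memory `Counter` objects, not a Parka or RDF query, and `author_rf` and its input shape are hypothetical.

```python
from collections import Counter


def author_rf(book_counts, strings, per=1_000_000):
    """Relative frequency of each string across one author's books.

    book_counts: iterable of Counter objects, one per book (hypothetical
    input shape; the Parka data model is not shown in this message).
    Returns {string: occurrences per `per` tokens}.
    """
    total = Counter()
    for counts in book_counts:
        total.update(counts)  # pool counts across all of the author's books
    n_tokens = sum(total.values())
    if n_tokens == 0:
        return {s: 0.0 for s in strings}
    return {s: total[s] * per / n_tokens for s in strings}


if __name__ == "__main__":
    books = [Counter({"whale": 3, "sea": 7}), Counter({"sea": 10})]
    print(author_rf(books, ["whale", "sea", "ship"]))
```

Pooling counts weights longer books more heavily; averaging per-book RF values instead would give each book equal weight, which may matter for authors whose output mixes long novels and short pieces.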


Comments and criticisms are very much appreciated.

(I know the PNG graphs aren't labeled well; that will all get fixed in the SVGs.)




Ronald P. Reck