Date: Mon, 8 Jul 2002 08:50:32 -0400 (EDT)

From: Ronald Reck <rreck@iama.rrecktek.com>

To: CORPORA@HD.UIB.NO

Subject: [Corpora-List] string frequency reports for Project Gutenberg texts

 

Hello all,

 

I have created string frequency

reports for 5400+ books (400M words)

from Project Gutenberg:

http://iama.rrecktek.com/text/frequency/
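
For readers who want a feel for what such a report contains, here is a minimal sketch in Python. It is not the code used to build the reports above (that is in CVS, linked below). The tokenization rule, the function name string_frequencies, and the sample sentence are all assumptions made for illustration.

```python
import re
from collections import Counter


def string_frequencies(text):
    """Count how often each word string occurs in a text (case-folded)."""
    # Assumption: a "string" is a run of ASCII letters, optionally with
    # internal apostrophes (e.g. "didn't"). The real reports may differ.
    tokens = re.findall(r"[a-z]+(?:'[a-z]+)*", text.lower())
    return Counter(tokens)


# Hypothetical sample text, not taken from any Project Gutenberg book.
sample = "The cat saw the dog. The dog didn't see the cat."
report = string_frequencies(sample)

# Print the three most frequent strings, one "count<TAB>string" per line.
for word, count in report.most_common(3):
    print(f"{count}\t{word}")
```

Running the same function over each book and writing the sorted counts to a file would give one report per text.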

 

They are searchable here:

http://iama.rrecktek.com/cgi-bin/apps/wordfind/searchpg.pl

 

The process is described briefly here, with links to

all the source in CVS:

http://iama.rrecktek.com/text/

 

I am looking for help improving

these graphs (string frequency histograms across the archive)

when they are rendered in SVG:

http://iama.rrecktek.com/text/frequency/words/seeall.html

 

I merged some of the results into an SVG:

(it's worth the plugin hassle)

http://iama.rrecktek.com/~rreck/samplesvg

 

I also extended the DAML ontology for PG presented here:

http://www.daml.org/ontologies/113

 

and created RDF metadata for the archive here:

http://iama.rrecktek.com/text/frequency/meta/

 

The metadata is loaded into a specialized RDF backend called

Parka. This example query shows how to get RF values for an

author's use of certain strings:

http://iama.rrecktek.com/cgi-bin/apps/parka/parka.pl
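
As a rough illustration of the quantity being queried, here is a sketch in Python, assuming RF stands for relative frequency (occurrences of a string divided by the total number of strings counted for that author). It does not use Parka or the RDF metadata. The function name relative_frequency and the counts are hypothetical.

```python
from collections import Counter


def relative_frequency(counts, string):
    """Return occurrences of `string` divided by the total count, or 0.0 if empty."""
    total = sum(counts.values())
    return counts[string] / total if total else 0.0


# Hypothetical per-author counts, invented for illustration only.
author_counts = Counter({"whale": 3, "sea": 5, "ship": 2})

rf = relative_frequency(author_counts, "whale")
print(f"RF(whale) = {rf:.3f}")
```

Dividing by the author's total makes values comparable between authors whose collected texts differ greatly in length.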

 

Comments and criticisms are much appreciated.

(I know the PNG graphs aren't labeled well; all will get fixed

in the SVGs.)

 

 

----

Ronald P. Reck                          rreck@iama.rrecktek.com