Text Analysis and Conversion Utilities (TACU)
Dmitry V. Khmelev
WARNING! I have recently noticed that some of the Win9X executables of
the programs below crash. They were cross-compiled on a GNU/Linux
system, so the problem appears to originate at that step. For now I
suggest that Win9X users download and install the MinGW system
(http://www.mingw.org), download the sources, open a MinGW shell,
unpack the sources and run make, for example:
gunzip a-0.0.0.tar.gz
tar xvf a-0.0.0.tar
cd a-0.0.0
make -f makefile.w32
I will work on this problem and try to get the Win9X executables
running within two to three weeks. Don't hesitate to contact me if
something does not work. Sorry for the inconvenience. Dima Khmelev, 2003-08-16.
This package of six programs, available via the navigation links, forms
a basic toolkit for the analysis of large text collections:
- location of duplicates (duplicator),
- calculation of relative entropy and R-measure for classification (cross-entropy),
- generation of random text using model text (generator),
- construction of suffix array (suffsort),
- TRansformation Invertible Stream EDitor (trised),
- location and navigation through cross-citations (xcitata).
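For readers unfamiliar with suffix arrays, the idea behind two of the items above can be sketched in a few lines of Python. This is only an illustration and not the code used in suffsort or duplicator: the function names build_suffix_array and longest_repeated_substring are hypothetical, and the naive sort-based construction is far too slow for large collections. It shows that once the suffixes are sorted, repeated passages appear as common prefixes of neighbouring suffixes.

```python
def build_suffix_array(text):
    """Return the start positions of all suffixes of text, in sorted order."""
    # Naive construction by sorting suffix slices; for illustration only.
    return sorted(range(len(text)), key=lambda i: text[i:])


def longest_repeated_substring(text):
    """Return the longest substring that occurs at least twice in text."""
    sa = build_suffix_array(text)
    best = ""
    # Repeated substrings show up as common prefixes of adjacent suffixes.
    for a, b in zip(sa, sa[1:]):
        n = 0
        while a + n < len(text) and b + n < len(text) and text[a + n] == text[b + n]:
            n += 1
        if n > len(best):
            best = text[a:a + n]
    return best


if __name__ == "__main__":
    sample = "banana"
    print(build_suffix_array(sample))          # [5, 3, 1, 0, 4, 2]
    print(longest_repeated_substring(sample))  # ana
```

A practical tool would use a dedicated suffix sorting algorithm and work on bytes rather than Python string slices; the sketch is meant only to convey the data structure.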
Of these programs, only trised is released under the GNU GPL v2 or
later; the others are available for scientific and educational purposes
only, and written permission from the author is required for commercial
use. All programs use a command-line interface, and I hope they will be
a valuable addition to the standard Unix commands for analyzing and
transforming text: grep, head, tail, sed, tr.
When you use these programs, please cite the author:
D. Khmelev, Text Analysis and Conversion Utilities
http://www.math.toronto.edu/dkhmelev/PROGS/tacu/
The author thanks Dr W. J. Teahan for valuable comments.