Programs Sep, 25 >> TACU >> [ duplicator | cross-entropy | generator | suffsort | trised | xcitata ]

Text Analysis and Conversion Utilities (TACU)

Dmitry V. Khmelev

WARNING! I have recently noticed that some Win9X executables of the programs below crash. I should admit that they were cross-compiled on a GNU/Linux system, so the problem most likely arises at that step. For now I suggest that Win9X users download and install MinGW (http://www.mingw.org), download the sources, start the MinGW shell, unpack the sources and run make, for example:

gunzip a-0.0.0.tar.gz
tar xvf a-0.0.0.tar
cd a-0.0.0
make -f makefile.w32

I shall work on this and try to provide working Win9X executables within two to three weeks. Don't hesitate to contact me if something does not work. Sorry for the inconvenience. Dima Khmelev, 2003-08-16.

This package of six programs, available via the navigation links, forms a basic toolkit for the analysis of large text collections. The programs provide:

  • location of duplicates (duplicator),
  • calculation of relative entropy and R-measure for classification (cross-entropy),
  • generation of random text from a model text (generator),
  • construction of a suffix array (suffsort),
  • TRansformation Invertible Stream EDitor (trised),
  • location and navigation through cross-citations (xcitata).
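To illustrate the suffix array mentioned in the list above, here is a minimal Python sketch. It is not the algorithm used by suffsort or duplicator, whose internals are not described on this page. The function names build_suffix_array and longest_repeated_substring are hypothetical, and the naive sort is an assumption chosen for brevity, so it is suitable only for small inputs. It shows why a suffix array helps when looking for repeated passages: after sorting, suffixes that share a long prefix end up next to each other.

```python
def build_suffix_array(text):
    """Return the start positions of all suffixes of text, in sorted suffix order.

    Naive illustration only: each comparison copies a suffix, so this is
    far too slow for large collections.
    """
    return sorted(range(len(text)), key=lambda i: text[i:])


def longest_repeated_substring(text):
    """Return the longest substring that occurs at least twice in text.

    Repeated substrings appear as common prefixes of neighbouring
    suffixes in the suffix array, so only adjacent pairs are compared.
    Occurrences are allowed to overlap.
    """
    sa = build_suffix_array(text)
    best = ""
    for a, b in zip(sa, sa[1:]):
        # Measure the common prefix of two neighbouring suffixes.
        n = 0
        while a + n < len(text) and b + n < len(text) and text[a + n] == text[b + n]:
            n += 1
        if n > len(best):
            best = text[a:a + n]
    return best


if __name__ == "__main__":
    print(build_suffix_array("banana"))
    print(longest_repeated_substring("banana"))
```

For the input "banana" the sorted suffixes are a, ana, anana, banana, na, nana, so the array is [5, 3, 1, 0, 4, 2] and the longest repeated substring is "ana".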

Of these programs, only trised is released under the GNU GPL v2 or later; the others are available for scientific and educational purposes only, and written permission from the author is required for commercial use. All programs use a command-line interface, and I hope they will be a valuable addition to the standard Unix commands for analysis and translation of text: grep, head, tail, sed, tr.

When you use these programs, please cite the author:

D. Khmelev, Text Analysis and Conversion Utilities http://www.math.toronto.edu/dkhmelev/PROGS/tacu/

The author thanks Dr W.J.Teahan for valuable comments.

© 2002-2005 D.Khmelev