Program generator version 0.9.5

Initial revision 2003-01-19; Last revision 2004-05-31

1  Download
2  File readme
3  Usage and options summary
4  Description
5  Project revision history
6  License

1  Download

Sources: src/generator-0.9.5.tgz [31 Kb ]

Win9x-EXE (minGW cross-compiled): mingw/generator.zip [22 Kb ]

2  File readme

generator --- random text generator based on a given model text


SUPPORTED ENVIRONMENTS

http://www.gnu.org    GNU/Linux 
http://www.mingw.org  MinGW --- Minimalist GNU For Windows


COMPILATION

Run make (or gmake) in the directory where the sources reside.


BRIEF INSTRUCTION

This program generates random text from a given model text. The call

generator file.txt

writes random text similar to file.txt to standard output.

License conditions are described in the file LICENSE.txt.


3  Usage and options summary

user@computer$ ./generator --help
Usage: generator [OPTION]... FILE
  -o, --order <num=2>       the maximal order for the model
  -s, --seed <num=1>        seed the model with integer <num>
  -c, --catch-eof <num=1>   stop output on meeting EOF if 1; no stop if 0
  -g, --generator <num=1>   select random number generator 1..3
  -b, --bytes <num>         output <num> bytes (warning: sets -c0)
  -k, --kbytes <num>        output <num>*1024 bytes (warning: sets -c0)
  -r, --randomize           the seed is chosen using the current time
  -n, --naive-sort          use naive sort (decrease memory use but slower)
  -q, --quiet               do not send any messages to stderr
  -h, --help                display this help and exit
  -m, --man                 display complete description
  -v, --version             display version and exit


4  Description

user@computer$ ./generator --man
<Usage information from the previous section is omitted>


This program generates random output using statistics from FILE.
It uses a Markov model of order <order>, set by the --order switch
(2 by default), to choose the next symbol from the current context
of length <order>. If the current context is not present in FILE,
the context is shortened and the program uses a Markov model of
smaller order. In the worst case it arrives at order 0 and outputs a
randomly chosen symbol from FILE. If --catch-eof=0 (-c0), the
program outputs -b bytes or -k kilobytes, or never stops if neither
-b nor -k is specified. If --catch-eof=1, the program stops as soon
as it encounters the context at the end of FILE; for <order>=0, the
program stops with probability 1/(size(FILE)+1) on each output
symbol (this way you can produce outputs of size comparable to
size(FILE)). The initial context is chosen at random.
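
The back-off rule described above can be illustrated with a short Python sketch. It is not the program's implementation: the function name generate and its parameters are hypothetical, it scans the text linearly instead of using a suffix array, and it ignores the --catch-eof logic by always producing a fixed number of symbols.

```python
import random

def generate(text, order=2, length=60, seed=1):
    """Generate `length` symbols from `text` with a back-off Markov model.

    Hypothetical helper for illustration only; not the generator program's code.
    """
    if not text:
        raise ValueError("model text must not be empty")
    rng = random.Random(seed)
    # The initial context is a randomly chosen substring of `order` symbols.
    start = rng.randrange(max(1, len(text) - order + 1))
    context = text[start:start + order]
    out = []
    while len(out) < length:
        ctx = context
        while True:
            if ctx == "":
                # Order 0: every symbol of the text is a candidate.
                followers = list(text)
            else:
                # Symbols that follow an occurrence of ctx in the text.
                followers = [text[i + len(ctx)]
                             for i in range(len(text) - len(ctx))
                             if text[i:i + len(ctx)] == ctx]
            if followers:
                break
            ctx = ctx[1:]  # back off: drop the oldest symbol of the context
        # A uniform pick from the list samples each symbol proportionally
        # to how often it follows ctx.
        symbol = rng.choice(followers)
        out.append(symbol)
        context = (context + symbol)[-order:] if order > 0 else ""
    return "".join(out)

if __name__ == "__main__":
    print(generate("abracadabra", order=2, length=40, seed=1))
```

Because candidates are kept with their multiplicity, frequent continuations are chosen more often, which is what makes the output resemble the model text.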

You can set the initial seed for the random number generator with
-s, or randomize it with the current time using the -r option (in
this case the seed used is printed to STDERR). Three random number
generators are available through the option -g<num>. All of them
were taken from the book "Numerical Recipes in C, 2nd edition":

-g1 (default) "Minimal" random number generator of Park and Miller
    with Bays-Durham shuffle and added safeguards.

-g2 Long period (> 2E18) random number generator of L'Ecuyer with
    Bays-Durham shuffle.

-g3 Knuth's random number generator using the subtractive method;
    see "Seminumerical Algorithms", 2nd edition, vol. 2 of "The Art
    of Computer Programming", sections 3.2-3.3.
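
For orientation, here is a minimal Python sketch of the plain Park-Miller "minimal standard" recurrence that underlies -g1. It is only the core multiplicative congruential step; the Bays-Durham shuffle and the added safeguards mentioned above are deliberately left out, and the name park_miller is a hypothetical choice for this example.

```python
M = 2**31 - 1   # modulus, a Mersenne prime
A = 16807       # multiplier, 7**5

def park_miller(seed=1):
    """Yield successive states x -> (A * x) mod M of the minimal standard generator.

    Illustration only: no shuffle table and no safeguards.
    """
    if not 0 < seed < M:
        raise ValueError("seed must satisfy 0 < seed < 2**31 - 1")
    x = seed
    while True:
        x = (A * x) % M
        yield x

if __name__ == "__main__":
    gen = park_miller(1)
    values = [next(gen) for _ in range(10000)]
    print(values[0])    # 16807
    print(values[-1])   # 1043618065, the check value given by Park and Miller
```

Dividing a state by M gives a float strictly between 0 and 1, which is how such a generator is typically turned into a uniform deviate.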

We use the Larsson-Sadakane algorithm for suffix sorting, described
in "Faster Suffix Sorting" by N. Jesper Larsson (jesper@cs.lth.se)
and Kunihiko Sadakane (sada@is.s.u-tokyo.ac.jp). It requires
9*size(FILE) bytes of memory. Memory requirements can be reduced
with the -n switch, which selects a naive suffix sort using the
system qsort function. In that case memory requirements drop to
5*size(FILE), at the cost of running about 4 times slower. However,
the system qsort may itself require a lot of memory, in particular
on the stack, which might lead to errors in sorting.
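
To show why a suffix sort helps here, the following Python sketch builds a suffix array the naive way (comparison sort over suffixes, analogous in spirit to the -n option) and then finds all occurrences of a context by binary search. It does not implement the Larsson-Sadakane algorithm, and the names naive_suffix_array and context_range are hypothetical.

```python
def naive_suffix_array(data):
    """Return the start positions of all suffixes of `data` in sorted order."""
    return sorted(range(len(data)), key=lambda i: data[i:])

def context_range(data, sa, ctx):
    """Return (lo, hi) such that sa[lo:hi] lists the suffixes starting with ctx."""
    k = len(ctx)
    lo, hi = 0, len(sa)
    while lo < hi:                       # first suffix whose k-prefix is >= ctx
        mid = (lo + hi) // 2
        if data[sa[mid]:sa[mid] + k] < ctx:
            lo = mid + 1
        else:
            hi = mid
    first = lo
    hi = len(sa)
    while lo < hi:                       # first suffix whose k-prefix is > ctx
        mid = (lo + hi) // 2
        if data[sa[mid]:sa[mid] + k] <= ctx:
            lo = mid + 1
        else:
            hi = mid
    return first, lo

if __name__ == "__main__":
    text = "abracadabra"
    sa = naive_suffix_array(text)
    print(sa)                                # [10, 7, 0, 3, 5, 8, 1, 4, 6, 9, 2]
    lo, hi = context_range(text, sa, "br")
    print(sa[lo:hi])                         # [8, 1]
```

All occurrences of a context form one contiguous block of the suffix array, so a lookup costs two binary searches instead of a scan of the whole file. By the figures quoted above, a 10 MB FILE would need about 90 MB with the default sort and about 50 MB with -n.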


5  Project revision history

Files of the project were modified on the following dates:

2003-01-19

2003-02-08

2003-05-16

2003-05-18

2003-08-27

2004-05-31

6  License

generator

Available at http://www.math.toronto.edu/dkhmelev/PROGS/tacu/

Author:

Dmitry V. Khmelev dkhmelev((at))math.toronto.edu [change ((at)) to @ in order to get proper address - antispam]

University of Toronto, Department of Mathematics, 100 St George Street, M5S 3G3 ON, Canada

LICENSING TERMS

This program is granted free of charge for research and education purposes. However you must obtain a license from the author to use it for commercial purposes.

Scientific results produced using the software provided shall acknowledge the use of generator. The proper reference is:

D. Khmelev, Text Analysis and Conversion Utilities http://www.math.toronto.edu/dkhmelev/PROGS/tacu/

Moreover, the author of generator shall be informed about the publication.

The software must not be modified and distributed without prior permission of the author.

By using generator you agree to the licensing terms.

NO WARRANTY

BECAUSE THE PROGRAM IS LICENSED FREE OF CHARGE, THERE IS NO WARRANTY FOR THE PROGRAM, TO THE EXTENT PERMITTED BY APPLICABLE LAW. EXCEPT WHEN OTHERWISE STATED IN WRITING THE COPYRIGHT HOLDERS AND/OR OTHER PARTIES PROVIDE THE PROGRAM "AS IS" WITHOUT WARRANTY OF ANY KIND, EITHER EXPRESSED OR IMPLIED, INCLUDING, BUT NOT LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE. THE ENTIRE RISK AS TO THE QUALITY AND PERFORMANCE OF THE PROGRAM IS WITH YOU. SHOULD THE PROGRAM PROVE DEFECTIVE, YOU ASSUME THE COST OF ALL NECESSARY SERVICING, REPAIR OR CORRECTION.

IN NO EVENT UNLESS REQUIRED BY APPLICABLE LAW OR AGREED TO IN WRITING WILL ANY COPYRIGHT HOLDER, OR ANY OTHER PARTY WHO MAY MODIFY AND/OR REDISTRIBUTE THE PROGRAM, BE LIABLE TO YOU FOR DAMAGES, INCLUDING ANY GENERAL, SPECIAL, INCIDENTAL OR CONSEQUENTIAL DAMAGES ARISING OUT OF THE USE OR INABILITY TO USE THE PROGRAM (INCLUDING BUT NOT LIMITED TO LOSS OF DATA OR DATA BEING RENDERED INACCURATE OR LOSSES SUSTAINED BY YOU OR THIRD PARTIES OR A FAILURE OF THE PROGRAM TO OPERATE WITH ANY OTHER PROGRAMS), EVEN IF SUCH HOLDER OR OTHER PARTY HAS BEEN ADVISED OF THE POSSIBILITY OF SUCH DAMAGES.

© 2002-2005 D.Khmelev