Program tunique version 0.0.8
Initial revision 2003-12-06; Last revision 2004-05-31
1 Download
2 File readme
3 Usage and options summary
4 Description
5 Project revision history
6 License
1 Download
Sources: src/tunique-0.0.8.tgz [45 Kb ]
Win9x-EXE (minGW cross-compiled): mingw/tunique.zip [127 Kb ]
2 File readme
tunique --- make Tags UNIQUE.
SUPPORTED ENVIRONMENTS
GNU/Linux/TMT
http://www.gnu.org GNU/Linux
TMT --- Text Mining Toolkit by Dr W.J.Teahan
COMPILATION
Enter make (or gmake) in the directory where sources reside
BRIEF INSTRUCTION
CONFUSION MODEL DESCRIPTION
1. Simple Rules
The model consists of a sequence of rules. The most general form of
the rule is
source_text_format:[code_length] markup_text_format;
The argument source_text_format is the format of the original text
being corrected; markup_text_format is the format of the text it will
be corrected to; and code_length is the cost in bits of making that
correction when the text is being marked up.
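The general rule form above can be illustrated with a small Python sketch. This is not tunique's own parser: the helper name parse_rule and the regular expression are assumptions made for illustration, backslash escapes are not handled, and a missing code_length is simply reported as None because the readme does not state a default.

```python
import re

# Hypothetical helper, not tunique's real parser. It recognises one rule:
#   source_text_format:[code_length] markup_text_format;
# A field is either double-quoted or a run of characters without ", :, ; or spaces.
_FIELD = r'(?:"[^"]*"|[^":;\s]+)'
_RULE = re.compile(
    r'\s*(?P<src>' + _FIELD + r')\s*:\s*'
    r'(?:\[(?P<cost>[0-9.]+)\]\s*)?'      # optional [code_length] in bits
    r'(?P<dst>' + _FIELD + r')\s*;\s*'
)


def _unquote(field):
    """Drop one pair of surrounding double quotes, if present."""
    if len(field) >= 2 and field[0] == '"' and field[-1] == '"':
        return field[1:-1]
    return field


def parse_rule(line):
    """Return (source_format, code_length_or_None, markup_format)."""
    m = _RULE.fullmatch(line)
    if m is None:
        raise ValueError(f"not a rule: {line!r}")
    cost = float(m.group('cost')) if m.group('cost') is not None else None
    return _unquote(m.group('src')), cost, _unquote(m.group('dst'))


if __name__ == '__main__':
    print(parse_rule('1:a;'))
    print(parse_rule('"x":[1.0]"x%[abc]";'))
```

The sketch only separates the three parts of a rule; interpreting the % formatting characters is a separate step.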
Meaning of source text formatting characters:
%% - this will match the % (percentage) character.
%m - this will match when the predicting model has the same model
number as the corresponding one specified in the argument list.
%w - wildcard symbol: this will match the current symbol in the
context.
%[..] - range symbols: this will match any symbol specified between
the square brackets. Example: %[aeiou] matches vowels.
Meaning of markup text formatting characters:
%% - the % (percentage) character is inserted into the markup text.
%w - wildcard symbol: this will insert the matching symbol from the
context into the markup text.
%r - this will insert the matching range symbol from the context
into the markup text.
%[..] - range symbols: this will generate markup texts for all the
symbols specified between the square brackets.
Example: %[aeiou] generates five distinct markup texts, one for each vowel.
%$ - the sentinel symbol is inserted into the markup text.
%_ - insert the next symbol into the marked up text but do not
encode it (or update the context)
%. - insert the next symbol into the marked up text, update the
context (but do not encode it)
In a script file you can use # for one-line comments.
Examples:
-------------
1:a;
# This generates a single markup that replaces the character 1 with
# the letter a.
-------------
"%w":"%w ";
# This generates a single markup that inserts an extra space after
# each symbol.
-------------
"%[xy]":"%r%[abc]";
#This generates the following markup corrections:
#e.g. "x" generates "xa", "xb" and "xc"
# "y" generates "ya", "yb" and "yc"
-------------
"x":[1.0]"x%[abc]";
"y":[0.5]"y%[abc]";
#This generates the following markup corrections:
#e.g. "x" generates "xa", "xb" and "xc"; with 1 bit correction codelength.
# "y" generates "ya", "yb" and "yc"; with 0.5 bit correction codelength.
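The way range symbols multiply a rule into several corrections can be sketched in Python. The functions expand_markup and expand_rule are hypothetical names, not part of tunique, and the sketch handles only %r and %[..] (no %w, %%, %$, %_ or %.), with at most one range covering the whole source format.

```python
import itertools
import re

# Matches one %[..] range and captures the symbols between the brackets.
RANGE = re.compile(r'%\[([^\]]*)\]')


def expand_markup(markup, matched=None):
    """Return every markup text generated by one markup format string.

    `matched` is the source symbol that a %[..] range in the source matched;
    it replaces %r. Each %[..] in the markup multiplies the alternatives.
    """
    if matched is not None:
        markup = markup.replace('%r', matched)
    # re.split with one capture group alternates: literal, range, literal, ...
    parts = RANGE.split(markup)
    choices = [[p] if i % 2 == 0 else list(p) for i, p in enumerate(parts)]
    return [''.join(combo) for combo in itertools.product(*choices)]


def expand_rule(source, markup):
    """Map each source symbol to the list of markup texts it generates."""
    m = RANGE.fullmatch(source)
    if m is None:
        return {source: expand_markup(markup)}
    return {sym: expand_markup(markup, sym) for sym in m.group(1)}


if __name__ == '__main__':
    for src, targets in expand_rule('%[xy]', '%r%[abc]').items():
        print(src, '->', targets)
```

For the source format %[xy] and the markup format %r%[abc], this reproduces the six corrections listed in the example above.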
2. Metacodes
2a. Symbols with codes >=256 can be inserted into markup using
\DIGITS\ format. Example:
\256\:%[\257\\258\a];
# Code 256 is converted to either code 257, or code 258 or letter 'a'
You cannot use the \DIGITS\ format to insert the following special
symbols with codes <256: ':', ';', '"', '%', '[', ']'. All other codes,
such as \0\, are acceptable.
2b. Backslash can also be used to insert some special symbols. These
symbols are: '\a', '\b', '\f', '\n', '\r', '\t', '\v', '\ ' (for
space), '\\', '\"'.
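A possible reading of the two metacode forms is sketched below in Python. The function decode_symbols is a hypothetical helper, not tunique code, and it assumes that the letter escapes carry their usual C meanings (for example '\n' is code 10), which the readme does not state explicitly. The % range syntax is not interpreted here; only symbol codes are decoded.

```python
# Assumed C-style meanings of the backslash escapes listed in section 2b.
ESCAPES = {'a': 7, 'b': 8, 'f': 12, 'n': 10, 'r': 13, 't': 9, 'v': 11,
           ' ': 32, '\\': 92, '"': 34}

# Special symbols that, per section 2a, must not be written as \DIGITS\.
FORBIDDEN = {ord(c) for c in ':;"%[]'}


def decode_symbols(text):
    """Turn a string with \\DIGITS\\ codes and backslash escapes into symbol codes."""
    codes = []
    i = 0
    while i < len(text):
        if text[i] != '\\':
            codes.append(ord(text[i]))      # ordinary symbol
            i += 1
            continue
        # Try the \DIGITS\ form first: backslash, decimal digits, backslash.
        j = i + 1
        while j < len(text) and text[j] in '0123456789':
            j += 1
        if j > i + 1 and j < len(text) and text[j] == '\\':
            code = int(text[i + 1:j])
            if code in FORBIDDEN:
                raise ValueError(f"code {code} may not be written as \\DIGITS\\")
            codes.append(code)
            i = j + 1
        elif i + 1 < len(text) and text[i + 1] in ESCAPES:
            codes.append(ESCAPES[text[i + 1]])
            i += 2
        else:
            raise ValueError(f"bad escape at position {i}")
    return codes


if __name__ == '__main__':
    print(decode_symbols(r'\257\\258\a'))
```

Applied to the range body of the example in 2a, \257\\258\a, the sketch yields the codes 257 and 258 followed by the code of the letter 'a'.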
3. String concatenation and echoing of space symbols
License conditions are described in file LICENSE.txt
3 Usage and options summary
user@computer$ ./tunique --help
Usage: tunique [OPTION]... INPUT
Markup INPUT according to MODEL, using confusion scheme CONFUSION
Mandatory arguments to long options are mandatory for short options too.
Default values are given in squared brackets
-m, --model=MODEL Set model file name to MODEL (obligatory)
-f, --file=CONFUSION Input confusion scheme from CONFUSION
-o, --output=OUTFILE Output file [INPUT.tun]
-p, --M-max-order=O Model maximal order [5]
-w, --w2w-rule-disable Do not include rule '%w:%w;' by default
-q, --quiet do not send any messages to stderr
-h, --help display this help and exit
-d, --description display complete description
-v, --version display version and exit
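For scripted use, the options listed above can be assembled into an argument list. The Python sketch below is only an illustration: build_command and its parameter names are hypothetical, the default executable path ./tunique is taken from the prompt shown in this section, and only long options that appear in the help text are used.

```python
def build_command(input_path, model, confusion=None, output=None,
                  max_order=None, disable_w2w=False, quiet=False,
                  exe='./tunique'):
    """Build an argv list for tunique from the options in its help text."""
    cmd = [exe, f'--model={model}']          # --model is obligatory
    if confusion is not None:
        cmd.append(f'--file={confusion}')    # confusion scheme file
    if output is not None:
        cmd.append(f'--output={output}')     # default is INPUT.tun
    if max_order is not None:
        cmd.append(f'--M-max-order={max_order}')
    if disable_w2w:
        cmd.append('--w2w-rule-disable')
    if quiet:
        cmd.append('--quiet')
    cmd.append(input_path)                   # INPUT comes last
    return cmd


if __name__ == '__main__':
    # The file names here are placeholders, not files shipped with tunique.
    print(build_command('input.txt', 'english.model', confusion='rules.cnf'))
```

The resulting list can be passed to subprocess.run from the Python standard library, and the process return code can then be compared with the exit codes that tunique documents.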
4 Description
user@computer$ ./tunique --description
<Usage information from the previous section is omitted>
EXIT CODES
The program exits with code 0 only in the case it has done
what it was meant to do. In all other cases the exit code
is non-zero. For example, --help option exits with non-zero code
ERROR_HELP. The errors related to any interaction with the operating
system have codes >=16. In future versions more exit codes
can be added, but the current exit codes will remain as they are.
THE TABLE OF EXIT CODES:
0 ERROR_OK
1 ERROR_VERSION
2 ERROR_HELP
3 ERROR_DESCRIPTION
4 ERROR_NO_ARGUMENT
5 ERROR_WRONG_ARGUMENT
6 ERROR_CONFUSION_SCHEME
16 ERROR_SERIOUS
16 ERROR_INTERNAL
17 ERROR_MALLOC
18 ERROR_STAT_FILE
19 ERROR_EMPTY_FILE
20 ERROR_OPEN_FILE_READ
21 ERROR_OPEN_FILE_WRITE
22 ERROR_OPEN_FILE_TMP
23 ERROR_WRITE_FILE
24 ERROR_TMT
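When tunique is called from another program, the table above can be mirrored as an enumeration. The Python sketch below copies the names and values from the table; the class name ExitCode and the helper is_os_related are assumptions for illustration. Because the table lists both ERROR_SERIOUS and ERROR_INTERNAL with value 16, Python treats the second name as an alias of the first.

```python
from enum import IntEnum


class ExitCode(IntEnum):
    """Exit codes copied from the table printed by `tunique --description`."""
    ERROR_OK = 0
    ERROR_VERSION = 1
    ERROR_HELP = 2
    ERROR_DESCRIPTION = 3
    ERROR_NO_ARGUMENT = 4
    ERROR_WRONG_ARGUMENT = 5
    ERROR_CONFUSION_SCHEME = 6
    ERROR_SERIOUS = 16
    ERROR_INTERNAL = 16      # same value as ERROR_SERIOUS, so this is an alias
    ERROR_MALLOC = 17
    ERROR_STAT_FILE = 18
    ERROR_EMPTY_FILE = 19
    ERROR_OPEN_FILE_READ = 20
    ERROR_OPEN_FILE_WRITE = 21
    ERROR_OPEN_FILE_TMP = 22
    ERROR_WRITE_FILE = 23
    ERROR_TMT = 24


def is_os_related(code):
    """Codes >= 16 are described as related to operating-system interaction."""
    return int(code) >= 16


if __name__ == '__main__':
    code = ExitCode(2)
    print(code.name, is_os_related(code))
```

Since the description allows more codes to be added in future versions, callers may want to handle unknown non-zero values instead of assuming that ExitCode(value) always succeeds.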
5 Project revision history
Files of the project were modified on the following dates:
2003-12-06
2003-12-07
2003-12-08
2003-12-09
2003-12-15
2004-05-31
6 License
tunique
Available at http://www.math.toronto.edu/dkhmelev/PROGS/
Author:
Dmitry V. Khmelev
dkhmelev((at))math.toronto.edu
[change ((at)) to @ in order to get proper address - antispam]
University of Toronto,
Department of Mathematics,
100 St George Street,
M5S 3G3 ON,
Canada
LICENSING TERMS
This program is free software; you can redistribute it and/or modify
it under the terms of the GNU General Public License as published by
the Free Software Foundation; either version 2 of the License, or
(at your option) any later version. You should obtain GNU GPL with
file COPYING in this distribution.
Scientific results produced using the software provided shall
acknowledge the use of this software. The proper reference is:
D. Khmelev,
http://www.math.toronto.edu/dkhmelev/PROGS/
Moreover shall the author of the software be informed about
the publication.
By using this program you agree to the licensing terms.
NO WARRANTY
BECAUSE THE PROGRAM IS LICENSED FREE OF CHARGE, THERE IS NO WARRANTY
FOR THE PROGRAM, TO THE EXTENT PERMITTED BY APPLICABLE LAW. EXCEPT
WHEN OTHERWISE STATED IN WRITING THE COPYRIGHT HOLDERS AND/OR OTHER
PARTIES PROVIDE THE PROGRAM "AS IS" WITHOUT WARRANTY OF ANY KIND,
EITHER EXPRESSED OR IMPLIED, INCLUDING, BUT NOT LIMITED TO, THE
IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR
PURPOSE. THE ENTIRE RISK AS TO THE QUALITY AND PERFORMANCE OF THE
PROGRAM IS WITH YOU. SHOULD THE PROGRAM PROVE DEFECTIVE, YOU ASSUME
THE COST OF ALL NECESSARY SERVICING, REPAIR OR CORRECTION.
IN NO EVENT UNLESS REQUIRED BY APPLICABLE LAW OR AGREED TO IN WRITING
WILL ANY COPYRIGHT HOLDER, OR ANY OTHER PARTY WHO MAY MODIFY AND/OR
REDISTRIBUTE THE PROGRAM, BE LIABLE TO YOU FOR DAMAGES, INCLUDING ANY
GENERAL, SPECIAL, INCIDENTAL OR CONSEQUENTIAL DAMAGES ARISING OUT OF
THE USE OR INABILITY TO USE THE PROGRAM (INCLUDING BUT NOT LIMITED TO
LOSS OF DATA OR DATA BEING RENDERED INACCURATE OR LOSSES SUSTAINED BY
YOU OR THIRD PARTIES OR A FAILURE OF THE PROGRAM TO OPERATE WITH ANY
OTHER PROGRAMS), EVEN IF SUCH HOLDER OR OTHER PARTY HAS BEEN ADVISED
OF THE POSSIBILITY OF SUCH DAMAGES.