Program tunique version 0.0.8

Initial revision 2003-12-06; Last revision 2004-05-31

1  Download
2  File readme
3  Usage and options summary
4  Description
5  Project revision history
6  License

1  Download

Sources: src/tunique-0.0.8.tgz [45 Kb ]

Win9x-EXE (minGW cross-compiled): mingw/tunique.zip [127 Kb ]

2  File readme

tunique --- make Tags UNIQUE.


SUPPORTED ENVIRONMENTS

GNU/Linux/TMT 

http://www.gnu.org    GNU/Linux 
TMT --- Text Mining Toolkit by Dr W.J.Teahan


COMPILATION

Run make (or gmake) in the directory where the sources reside.


BRIEF INSTRUCTION



CONFUSION MODEL DESCRIPTION

1. Simple Rules

The model consists of a sequence of rules. The most general form of
the rule is

   source_text_format:[code_length] markup_text_format;

The argument source_text_format is the format of the original text
being corrected; markup_text_format is the format of the text it will
be corrected to; and code_length is the cost in bits of making that
correction when the text is being marked up.
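
As an illustration of the general form above, the following Python
sketch splits one rule into its three parts. It is not taken from the
tunique sources: the names parse_rule, unquote and RULE are
hypothetical, and the sketch assumes that each format is either a bare
word or a double-quoted string without escaped quotes. A missing
code_length is returned as None, because the default cost is not
stated in this file.

```python
import re

# Hypothetical helper, not part of tunique: a simplified reading of
#   source_text_format:[code_length] markup_text_format;
# Formats are either double-quoted (no escaped quotes) or bare words.
RULE = re.compile(r'''
    \s*(?P<source>"[^"]*"|[^":;\s]+)\s*   # source_text_format
    :\s*
    (?:\[(?P<cost>[0-9.]+)\]\s*)?          # optional code_length in bits
    (?P<markup>"[^"]*"|[^":;\s]+)\s*       # markup_text_format
    ;\s*$
''', re.VERBOSE)


def unquote(text):
    """Drop one pair of surrounding double quotes, if present."""
    if len(text) >= 2 and text[0] == '"' and text[-1] == '"':
        return text[1:-1]
    return text


def parse_rule(line):
    """Return (source_format, code_length_or_None, markup_format)."""
    m = RULE.match(line)
    if m is None:
        raise ValueError(f"not a rule: {line!r}")
    cost = m.group("cost")
    return (
        unquote(m.group("source")),
        float(cost) if cost is not None else None,
        unquote(m.group("markup")),
    )


print(parse_rule('"x":[1.0]"x%[abc]";'))
print(parse_rule('1:a;'))
```

The regular expression is deliberately strict, so a line that does not
end with a semicolon is rejected instead of being silently repaired.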

Meaning of source text formatting characters:

   %% - this will match the % (percentage) character.
   
   %m - this will match when the predicting model has the same model
   number as the corresponding one specified in the argument list.

   %w - wildcard symbol: this will match the current symbol in the
   context.

   %[..] - range symbols: this will match any symbol specified between
   the square brackets. Example: %[aeiou] matches vowels.
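
The source formatting characters listed above can be read as a small
pattern language in which every item consumes one symbol. The Python
sketch below follows that reading; it is an assumption about the
semantics, not the matching code used by tunique. The function name
source_matches is hypothetical, and %m is left out because it depends
on the model numbers passed to the program.

```python
def source_matches(fmt, text):
    """Return True if text matches the source format symbol by symbol."""
    i = 0      # position in the format string
    pos = 0    # position in the text
    while i < len(fmt):
        if pos >= len(text):
            return False                  # the format expects more symbols
        if fmt.startswith("%%", i):
            ok, i = text[pos] == "%", i + 2
        elif fmt.startswith("%w", i):
            ok, i = True, i + 2           # wildcard: any single symbol
        elif fmt.startswith("%[", i):
            end = fmt.index("]", i)       # ValueError if "]" is missing
            ok, i = text[pos] in fmt[i + 2:end], end + 1
        elif fmt.startswith("%m", i):
            raise NotImplementedError("%m needs model numbers")
        else:
            ok, i = text[pos] == fmt[i], i + 1
        if not ok:
            return False
        pos += 1
    return pos == len(text)


print(source_matches("%[aeiou]", "e"))
print(source_matches("%[aeiou]", "x"))
```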


Meaning of markup text formatting characters:

   %% - the % (percentage) character is inserted into the markup text.

   %w - wildcard symbol: this will insert the matching symbol from the
   context into the markup text.

   %r - this will insert the matching range symbol from the context
   into the markup text.

   %[..] - range symbols: this will generate markup texts for all the
   symbols specified between the square brackets.

   example: %[aeiou] generates five distinct markup texts, one for each
   vowel.

   %$ - the sentinel symbol is inserted into the markup text.

   %_ - insert the next symbol into the marked up text but do not
   encode it (or update the context)

   %.  - insert the next symbol into the marked up text, update the
   context (but do not encode it)

In a script file you can use # to start a one-line comment.

Examples:
-------------
1:a; 
# This generates a single markup that replaces the character 1 with
# the letter a.

-------------
"%w":"%w "; 
# This generates a single markup that inserts an extra space after
# each symbol.

-------------
"%[xy]":"%r%[abc]";
#This generates the following markup corrections:
#e.g.    "x" generates "xa", "xb" and "xc"
#        "y" generates "ya", "yb" and "yc"

-------------
"x":[1.0]"x%[abc]";
"y":[0.5]"y%[abc]";
#This generates the following markup corrections:
#e.g. "x" generates "xa", "xb" and "xc"; with 1 bit correction codelength.
#     "y" generates "ya", "yb" and "yc"; with 0.5 bit correction codelength.
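
The expansions listed in these examples can be reproduced with a short
Python sketch. It is a simplified model rather than the tunique
implementation: the names tokenize and expand_rule are hypothetical,
only literals and %[..] are handled on the source side, only literals,
%%, %r and %[..] on the markup side, and %r is assumed to refer to the
first range in the source format.

```python
import itertools
import re

# "%[...]" range, two-character "%x" code, or one literal character.
TOKEN = re.compile(r'%\[([^\]]*)\]|%(.)|(.)', re.DOTALL)


def tokenize(fmt):
    """Split a format string into (kind, value) pairs."""
    tokens = []
    for m in TOKEN.finditer(fmt):
        if m.lastindex == 1:
            tokens.append(("range", m.group(1)))
        elif m.lastindex == 2 and m.group(2) != "%":
            tokens.append(("code", m.group(2)))
        else:
            # "%%" stands for a literal "%"; anything else is itself.
            tokens.append(("lit", m.group(m.lastindex)))
    return tokens


def expand_rule(source, markup):
    """Map every concrete source text to the markup texts it generates."""
    src = tokenize(source)
    if any(kind == "code" for kind, _ in src):
        raise ValueError("only literals and %[..] are handled in the source")
    choices = [list(v) if kind == "range" else [v] for kind, v in src]
    result = {}
    for combo in itertools.product(*choices):
        ranged = [s for (kind, _), s in zip(src, combo) if kind == "range"]
        parts = []
        for kind, value in tokenize(markup):
            if kind == "range":
                parts.append(list(value))      # one markup per symbol
            elif kind == "lit":
                parts.append([value])
            elif value == "r" and ranged:
                parts.append([ranged[0]])      # echo the matched symbol
            else:
                raise ValueError(f"unsupported markup code %{value}")
        result["".join(combo)] = [
            "".join(p) for p in itertools.product(*parts)
        ]
    return result


print(expand_rule("%[xy]", "%r%[abc]"))
# {'x': ['xa', 'xb', 'xc'], 'y': ['ya', 'yb', 'yc']}
```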



2. Metacodes 

2a. Symbols with codes >=256 can be inserted into markup using
\DIGITS\ format. Example:

\256\:%[\257\\258\a];
# Code 256 is converted to either code 257, or code 258 or letter 'a'

You cannot use the \DIGITS\ format to insert the following special
symbols with codes <256: ':', ';', '"', '%', '[', ']'. All other codes
like \0\ are acceptable.

2b. Backslash can also be used to insert some special symbols. These
symbols are: '\a', '\b', '\f', '\n', '\r', '\t', '\v', '\ ' (for
space), '\\', '\"'. 
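
To make the two metacode forms concrete, here is a Python sketch that
turns a format fragment into a list of numeric symbol codes. It is an
illustration under stated assumptions and not the tunique parser: the
names decode_symbols, ESCAPES and FORBIDDEN are hypothetical, the
control escapes are assumed to have their usual C meaning, and the
text is read left to right with the \DIGITS\ form tried first, which
reproduces the example in 2a.

```python
import re

# \DIGITS\ metacode, backslash escape, or one ordinary character.
META = re.compile(r'\\(\d+)\\|\\(.)|(.)', re.DOTALL)

# Escapes from 2b, assuming the usual C values for control characters.
ESCAPES = {"a": 7, "b": 8, "f": 12, "n": 10, "r": 13, "t": 9, "v": 11,
           " ": 32, "\\": 92, '"': 34}

# Special symbols that must not be written in the \DIGITS\ form (2a).
FORBIDDEN = {ord(c) for c in ':;"%[]'}


def decode_symbols(text):
    """Return the list of symbol codes described by text."""
    codes = []
    for m in META.finditer(text):
        if m.lastindex == 1:
            code = int(m.group(1))
            if code in FORBIDDEN:
                raise ValueError(f"code {code} is a special symbol")
            codes.append(code)
        elif m.lastindex == 2:
            if m.group(2) not in ESCAPES:
                raise ValueError(f"unknown escape \\{m.group(2)}")
            codes.append(ESCAPES[m.group(2)])
        else:
            codes.append(ord(m.group(3)))
    return codes


print(decode_symbols(r"\257\\258\a"))
# [257, 258, 97]
```

The example input is the range body from the rule in 2a, so the result
is code 257, code 258 and the letter 'a'.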


3. String concatenation and echoing of space symbols



License conditions are described in file LICENSE.txt


3  Usage and options summary

user@computer$ ./tunique --help
Usage: tunique [OPTION]... INPUT
Markup INPUT according to MODEL, using confusion scheme CONFUSION

Mandatory arguments to long options are mandatory for short options too.
Default values are given in squared brackets
  -m, --model=MODEL         Set model file name to MODEL (obligatory)
  -f, --file=CONFUSION      Input confusion scheme from CONFUSION
  -o, --output=OUTFILE      Output file [INPUT.tun]
  -p, --M-max-order=O       Model maximal order [5]
  -w, --w2w-rule-disable    Do not include rule '%w:%w;' by default

  -q, --quiet               do not send any messages to stderr
  -h, --help                display this help and exit
  -d, --description         display complete description
  -v, --version             display version and exit
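
When tunique is driven from a script, the options above can be
assembled programmatically. The Python sketch below only builds the
argument list and uses nothing beyond the long options shown in the
help text. The function name tunique_command and the file names in
the example are hypothetical.

```python
def tunique_command(input_path, model, confusion=None, output=None,
                    max_order=None, no_w2w_rule=False, quiet=False,
                    exe="./tunique"):
    """Build an argument list for tunique from the documented options."""
    cmd = [exe, f"--model={model}"]           # the model is obligatory
    if confusion is not None:
        cmd.append(f"--file={confusion}")     # confusion scheme file
    if output is not None:
        cmd.append(f"--output={output}")      # default is INPUT.tun
    if max_order is not None:
        cmd.append(f"--M-max-order={max_order}")
    if no_w2w_rule:
        cmd.append("--w2w-rule-disable")
    if quiet:
        cmd.append("--quiet")
    cmd.append(input_path)                    # INPUT comes last
    return cmd


# Hypothetical file names, for illustration only.
print(tunique_command("text.txt", "english.model",
                      confusion="rules.conf", max_order=3, quiet=True))
```

The resulting list can be passed to subprocess.run from the Python
standard library; its returncode attribute then holds the exit code of
the program.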


4  Description

user@computer$ ./tunique --description
<Usage information from the previous section is omitted>


EXIT CODES

The program exits with code 0 only in the case it has done
what it was meant to do. In all other cases the exit code
is non-zero. For example, --help option exits with non-zero code
ERROR_HELP. The errors related to any interaction with the operating
system have codes >=16. In future versions more exit codes
can be added, but the current exit codes will remain as they are.
  THE TABLE OF EXIT CODES:
   0 ERROR_OK
   1 ERROR_VERSION
   2 ERROR_HELP
   3 ERROR_DESCRIPTION
   4 ERROR_NO_ARGUMENT
   5 ERROR_WRONG_ARGUMENT
   6 ERROR_CONFUSION_SCHEME
  16 ERROR_SERIOUS
  16 ERROR_INTERNAL
  17 ERROR_MALLOC
  18 ERROR_STAT_FILE
  19 ERROR_EMPTY_FILE
  20 ERROR_OPEN_FILE_READ
  21 ERROR_OPEN_FILE_WRITE
  22 ERROR_OPEN_FILE_TMP
  23 ERROR_WRITE_FILE
  24 ERROR_TMT
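
A calling script may want to interpret these codes by name. The Python
sketch below copies the table into an enumeration; the helper names
describe and is_system_error are hypothetical. ERROR_SERIOUS and
ERROR_INTERNAL share the value 16, so the enumeration treats the
second name as an alias of the first, and codes missing from the table
are reported as unknown because later versions may add more.

```python
from enum import IntEnum


class ExitCode(IntEnum):
    ERROR_OK = 0
    ERROR_VERSION = 1
    ERROR_HELP = 2
    ERROR_DESCRIPTION = 3
    ERROR_NO_ARGUMENT = 4
    ERROR_WRONG_ARGUMENT = 5
    ERROR_CONFUSION_SCHEME = 6
    ERROR_SERIOUS = 16
    ERROR_INTERNAL = 16      # same value, becomes an alias of ERROR_SERIOUS
    ERROR_MALLOC = 17
    ERROR_STAT_FILE = 18
    ERROR_EMPTY_FILE = 19
    ERROR_OPEN_FILE_READ = 20
    ERROR_OPEN_FILE_WRITE = 21
    ERROR_OPEN_FILE_TMP = 22
    ERROR_WRITE_FILE = 23
    ERROR_TMT = 24


def is_system_error(code):
    """Codes >= 16 relate to interaction with the operating system."""
    return code >= ExitCode.ERROR_SERIOUS


def describe(code):
    """Return the symbolic name of an exit code, if it is in the table."""
    try:
        return ExitCode(code).name
    except ValueError:
        return f"unknown exit code {code}"


print(describe(2), is_system_error(2))
print(describe(21), is_system_error(21))
```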


5  Project revision history

Files of the project were modified on the following dates:

2003-12-06

2003-12-07

2003-12-08

2003-12-09

2003-12-15

2004-05-31

6  License

tunique

Available at http://www.math.toronto.edu/dkhmelev/PROGS/

Author:

Dmitry V. Khmelev dkhmelev((at))math.toronto.edu [change ((at)) to @ in order to get proper address - antispam]

University of Toronto, Department of Mathematics, 100 St George Street, M5S 3G3 ON, Canada

LICENSING TERMS

This program is free software; you can redistribute it and/or modify it under the terms of the GNU General Public License as published by the Free Software Foundation; either version 2 of the License, or (at your option) any later version. You should obtain GNU GPL with file COPYING in this distribution.

Scientific results produced using the software provided shall acknowledge the use of this software. The proper reference is:

D. Khmelev, http://www.math.toronto.edu/dkhmelev/PROGS/

Moreover shall the author of the software be informed about the publication.

By using this program you agree to the licensing terms.

NO WARRANTY

BECAUSE THE PROGRAM IS LICENSED FREE OF CHARGE, THERE IS NO WARRANTY FOR THE PROGRAM, TO THE EXTENT PERMITTED BY APPLICABLE LAW. EXCEPT WHEN OTHERWISE STATED IN WRITING THE COPYRIGHT HOLDERS AND/OR OTHER PARTIES PROVIDE THE PROGRAM "AS IS" WITHOUT WARRANTY OF ANY KIND, EITHER EXPRESSED OR IMPLIED, INCLUDING, BUT NOT LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE. THE ENTIRE RISK AS TO THE QUALITY AND PERFORMANCE OF THE PROGRAM IS WITH YOU. SHOULD THE PROGRAM PROVE DEFECTIVE, YOU ASSUME THE COST OF ALL NECESSARY SERVICING, REPAIR OR CORRECTION.

IN NO EVENT UNLESS REQUIRED BY APPLICABLE LAW OR AGREED TO IN WRITING WILL ANY COPYRIGHT HOLDER, OR ANY OTHER PARTY WHO MAY MODIFY AND/OR REDISTRIBUTE THE PROGRAM, BE LIABLE TO YOU FOR DAMAGES, INCLUDING ANY GENERAL, SPECIAL, INCIDENTAL OR CONSEQUENTIAL DAMAGES ARISING OUT OF THE USE OR INABILITY TO USE THE PROGRAM (INCLUDING BUT NOT LIMITED TO LOSS OF DATA OR DATA BEING RENDERED INACCURATE OR LOSSES SUSTAINED BY YOU OR THIRD PARTIES OR A FAILURE OF THE PROGRAM TO OPERATE WITH ANY OTHER PROGRAMS), EVEN IF SUCH HOLDER OR OTHER PARTY HAS BEEN ADVISED OF THE POSSIBILITY OF SUCH DAMAGES.

© 2002-2005 D.Khmelev