|
Program maxlcp version 0.0.9
Program maxlcp version 0.0.9
Initial revision 2003-08-20; Last revision 2004-05-31
1 Download
2 File readme
3 Usage and options summary
4 Description
5 Project revision history
6 License
1 Download
Sources: src/maxlcp-0.0.9.tgz [31 Kb ]
Win9x-EXE (minGW cross-compiled): mingw/maxlcp.zip [18 Kb ]
2 File readme
maxlcp --- locating maximal repeated substring
SUPPORTED ENVIRONMENTS
http://www.gnu.org GNU/Linux
http://www.mingw.org MinGW --- Minimalist GNU For Windows
COMPILATION
Enter make (or gmake) in the directory where sources reside
BRIEF INSTRUCTION
This program simply outputs the coordinates for longest repeated
substring. Program uses with suffix arrays produced by suffsort
program. The algorithm is naive and, has O(N*N) worst case time
complexity.
It also finds interesting application in compressor compress.pl
(*included in the sources tarball*): a simple funny compressor, which
can be used to compress LARGE, seemingly UNCOMPRESSABLE FILE. The idea
is to find a long *repeated* substring in the FILE, to wipe out one of
the occurencies of repeated substring and to make decompressor with
head/tail commands for restoring the source file.
License conditions are described in file LICENSE.txt
3 Usage and options summary
user@computer$ ./maxlcp --help
Usage: maxlcp [OPTIONS] FILE
Input files: FILE FILE.ary
Output: prints the longest pair of common prefixes
-a <name>, or take suffix array for FILE from <name>
--suffix-array <name>,
-q, --quiet do not send any messages to stderr
-h, --help display this help and exit
-m, --man display complete description
-v, --version display version and exit
4 Description
user@computer$ ./maxlcp --man
<Usage information from the previous section is omitted>
On input program requires the FILE and its suffix array FILE.ary
Suffix array should be constructed using suffsort program from
TACU (Text Analysis and Conversion Utilities) by D.Khmelev
(http://www.math.toronto.edu/dkhmelev/PROGS/tacu/suffsort-eng.html)
or by sary program.
Example of use:
suffsort FILE # produces suffix array in FILE.ary
maxlcp -q FILE # outputs 3 numbers
The program outputs three numbers l,m,n: the length of longest repeated
substring, and beginning positions (1-based) of two repetitions of this
substring.
Using this three numbers you can output repeated fragment of the FILE
by command
tail -c+<m> FILE|head -c<l>
(substitute numerical values instead of <m> and <l>). You can verify
that the outputted fragment is indeed repeated by command
tail -c+<n> FILE|head -c<l>
(notice <n> instead of <m>).
Also, maxlcp outputs the record values to stderr (this can be
suppressed with -q option).
5 Project revision history
Files of the project were modified on the following dates:
2003-08-20
2003-08-23
2003-08-24
2004-05-31
6 License
maxlcp - locating maximal repeated substring
Available at http://www.math.toronto.edu/dkhmelev/PROGS/
Author:
Dmitry V. Khmelev
dkhmelev((at))math.toronto.edu
[change ((at)) to @ in order to get proper address - antispam]
University of Toronto,
Department of Mathematics,
100 St George Street,
M5S 3G3 ON,
Canada
LICENSING TERMS
This program is free software; you can redistribute it and/or modify
it under the terms of the GNU General Public License as published by
the Free Software Foundation; either version 2 of the License, or
(at your option) any later version. You should obtain GNU GPL with
file COPYING in this distribution.
Scientific results produced using the software provided shall
acknowledge the use of this software. The proper reference is:
D. Khmelev,
http://www.math.toronto.edu/dkhmelev/PROGS/
Moreover shall the author of the software should be informed about
the publication.
By using this program you agree to the licensing terms.
NO WARRANTY
BECAUSE THE PROGRAM IS LICENSED FREE OF CHARGE, THERE IS NO WARRANTY
FOR THE PROGRAM, TO THE EXTENT PERMITTED BY APPLICABLE LAW. EXCEPT
WHEN OTHERWISE STATED IN WRITING THE COPYRIGHT HOLDERS AND/OR OTHER
PARTIES PROVIDE THE PROGRAM ÄS IS" WITHOUT WARRANTY OF ANY KIND,
EITHER EXPRESSED OR IMPLIED, INCLUDING, BUT NOT LIMITED TO, THE
IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR
PURPOSE. THE ENTIRE RISK AS TO THE QUALITY AND PERFORMANCE OF THE
PROGRAM IS WITH YOU. SHOULD THE PROGRAM PROVE DEFECTIVE, YOU ASSUME
THE COST OF ALL NECESSARY SERVICING, REPAIR OR CORRECTION.
IN NO EVENT UNLESS REQUIRED BY APPLICABLE LAW OR AGREED TO IN WRITING
WILL ANY COPYRIGHT HOLDER, OR ANY OTHER PARTY WHO MAY MODIFY AND/OR
REDISTRIBUTE THE PROGRAM, BE LIABLE TO YOU FOR DAMAGES, INCLUDING ANY
GENERAL, SPECIAL, INCIDENTAL OR CONSEQUENTIAL DAMAGES ARISING OUT OF
THE USE OR INABILITY TO USE THE PROGRAM (INCLUDING BUT NOT LIMITED TO
LOSS OF DATA OR DATA BEING RENDERED INACCURATE OR LOSSES SUSTAINED BY
YOU OR THIRD PARTIES OR A FAILURE OF THE PROGRAM TO OPERATE WITH ANY
OTHER PROGRAMS), EVEN IF SUCH HOLDER OR OTHER PARTY HAS BEEN ADVISED
OF THE POSSIBILITY OF SUCH DAMAGES.
1 Download
2 File readme
3 Usage and options summary
4 Description
5 Project revision history
6 License
|