|
Program lcp version 0.1.9
Program lcp version 0.1.9
Initial revision 2003-08-23; Last revision 2004-05-31
1 Download
2 File readme
3 Usage and options summary
4 Description
5 Project revision history
6 License
1 Download
Sources: src/lcp-0.1.9.tgz [361 Kb ]
Win9x-EXE (minGW cross-compiled): mingw/lcp.zip [22 Kb ]
2 File readme
lcp --- Longest-Common-Prefix construction from Suffix Arrays
SUPPORTED ENVIRONMENTS
http://www.gnu.org GNU/Linux
http://www.mingw.org MinGW --- Minimalist GNU For Windows
COMPILATION
Enter make (or gmake) in the directory where sources reside
BRIEF INSTRUCTION
See program help
lcp --man
License conditions are described in file LICENSE.txt
3 Usage and options summary
user@computer$ ./lcp --help
USAGE: lcp [OPTIONS] FILE
INPUT files: FILE FILE.ary
OUTPUT: FILE.lcp with information on common prefixes
Mandatory arguments to long options are mandatory for short options too.
GENERAL OPTIONS:
-a, --suffix-array=<name> take suffix array for FILE from <name>
-o, --lcp-array=<lcpname> output lcp array to file <name>
LCP RECONSTRUCTION ALGORITHM
-k, --Kasai-Lee-alg Kasai-Lee (cf [1] in `lcp -m`)
-r<num>, or resilience(use optional parameter
--resilience-alg <num> <num=1> to decrease memory use)
-s, --synthesis-alg (default) synthesis
-n, --naive-alg naive
OUTPUT OPTIONS
-t, --stdout output result to stdout
-d, --digital print lcp's as digital numbers separated by newline
-c, --clocks output clocks of the program (depends on algorithm)
-i, --information output in three lines (max. lcp, sum of lcp, avg.lcp)
INFORMATION
-q, --quiet stop messages to stderr and shorten other messages
-h, --help display this help and exit
-m, --man display complete description
-v, --version display version and exit
4 Description
user@computer$ ./lcp --man
<Usage information from the previous section is omitted>
DESCRIPTION
On input program requires the FILE and its suffix array FILE.ary
Suffix array should be constructed using suffsort program from
TACU (Text Analysis and Conversion Utilities) by D.Khmelev
(http://www.math.toronto.edu/dkhmelev/PROGS/tacu/suffsort-eng.html)
or by sary program.
Example of use:
suffsort FILE # produces suffix array in FILE.ary
lcp FILE # produces FILE.lcp from FILE.ary
Suppose that length(FILE)=size. Denote suffix array by SA[0..size-1].
The lcp[0..size-2] array is defined by relation
lcp[i]=LCP(FILE[SA[i]..size-1],FILE[SA[i+1]..size-1])
This program realizes four algorithms for lcp[] construction from the
suffix array, which can be selected by switches:
Naive algorithm (-n --naive-alg)
Straightforward computing with potential O(size*size) time complexity.
Consumed memory: (12+1)*size bytes
Kasai-Lee algorithm (-k --Kasai-Lee-alg)
Linear time lcp reconstruction algorithm described in [1].
Consumed memory: (12+1)*size bytes
Resilience algorithm (-r --resilience-alg)
Algorithm by D.Khmelev, based on *resilience* property of suffix arrays
Consumed memory: (8+1)*size bytes. If optional factor <num> is specified
e.g.
lcp -r 4 FILE
then the memory consumption became (1+(8/num))*size bytes ((1+8/4)*size or
3*size above). These comes in trade-off with increasing computation time
by factor <num> for highly repetitive files.
Synthesis algorithm (-s --synthesis-alg, default)
The mixture of naive and resilience algorithm. Time behaviour is O(size),
while space consumption is (1+8/64)*size or 1.125*size.
[1] T. Kasai, G.Lee et al. Linear-Time Longest-Common-Prefix Computation
in Suffix Arrays and Its Applications//CPM 2001, LNCS 2089, pp.181--192,
Springer-Verlag Berlin Heidelberg, 2001
DATA REPRESENTATION
We assume that the suffix array SA[0..size-1] of unsigned integers 0..2^32-1
for character array FILE[0..size-1] is kept in FILE.ary in big-endian
format. Array lcp[0..size-2] of unsigned integers 0..2^{32}-1 is outputted to
FILE.lcp in 4-byte big endian format either.
To read an integer from this file use the following macros in C:
// The next definition should be used for reading from binary file "rb"
static unsigned char __cint[4];//REQUIRED SIZEOFINT==4!!!!
#define GETINT(f,fname) (fread(&__cint,1,SIZEOFINT,f)==4?\
(256*(256*(256*__cint[0]+__cint[1])+__cint[2])+__cint[3]):\
(fprintf(stderr,"Failed to read big-endian 4-byte integer from %s\n",fname),\
exit(1),0))
To write an integer in big-endian format we used the following procedure:
void write_bigendian_unsigned_int_to_file(FILE *fp, unsigned int n)
{
fputc((n & 0xFF000000) >> 24, fp);
fputc((n & 0xFF0000) >> 16, fp);
fputc((n & 0xFF00) >> 8, fp);
fputc((n & 0xFF), fp);
}
5 Project revision history
Files of the project were modified on the following dates:
2003-08-23
2003-08-24
2003-08-26
2003-08-27
2003-08-28
2003-08-29
2003-09-13
2003-09-14
2003-09-19
2004-01-19
2004-04-10
2004-05-31
6 License
lcp - Longest-Common-Prefix construction from Suffix Arrays
Available at http://www.math.toronto.edu/dkhmelev/PROGS/
Author:
Dmitry V. Khmelev
dkhmelev((at))math.toronto.edu
[change ((at)) to @ in order to get proper address - antispam]
University of Toronto,
Department of Mathematics,
100 St George Street,
M5S 3G3 ON,
Canada
LICENSING TERMS
This program is free software; you can redistribute it and/or modify
it under the terms of the GNU General Public License as published by
the Free Software Foundation; either version 2 of the License, or
(at your option) any later version. You should obtain GNU GPL with
file COPYING in this distribution.
Scientific results produced using the software provided shall
acknowledge the use of this software. The proper reference is:
D. Khmelev,
http://www.math.toronto.edu/dkhmelev/PROGS/
Moreover shall the author of the software should be informed about
the publication.
By using this program you agree to the licensing terms.
NO WARRANTY
BECAUSE THE PROGRAM IS LICENSED FREE OF CHARGE, THERE IS NO WARRANTY
FOR THE PROGRAM, TO THE EXTENT PERMITTED BY APPLICABLE LAW. EXCEPT
WHEN OTHERWISE STATED IN WRITING THE COPYRIGHT HOLDERS AND/OR OTHER
PARTIES PROVIDE THE PROGRAM ÄS IS" WITHOUT WARRANTY OF ANY KIND,
EITHER EXPRESSED OR IMPLIED, INCLUDING, BUT NOT LIMITED TO, THE
IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR
PURPOSE. THE ENTIRE RISK AS TO THE QUALITY AND PERFORMANCE OF THE
PROGRAM IS WITH YOU. SHOULD THE PROGRAM PROVE DEFECTIVE, YOU ASSUME
THE COST OF ALL NECESSARY SERVICING, REPAIR OR CORRECTION.
IN NO EVENT UNLESS REQUIRED BY APPLICABLE LAW OR AGREED TO IN WRITING
WILL ANY COPYRIGHT HOLDER, OR ANY OTHER PARTY WHO MAY MODIFY AND/OR
REDISTRIBUTE THE PROGRAM, BE LIABLE TO YOU FOR DAMAGES, INCLUDING ANY
GENERAL, SPECIAL, INCIDENTAL OR CONSEQUENTIAL DAMAGES ARISING OUT OF
THE USE OR INABILITY TO USE THE PROGRAM (INCLUDING BUT NOT LIMITED TO
LOSS OF DATA OR DATA BEING RENDERED INACCURATE OR LOSSES SUSTAINED BY
YOU OR THIRD PARTIES OR A FAILURE OF THE PROGRAM TO OPERATE WITH ANY
OTHER PROGRAMS), EVEN IF SUCH HOLDER OR OTHER PARTY HAS BEEN ADVISED
OF THE POSSIBILITY OF SUCH DAMAGES.
1 Download
2 File readme
3 Usage and options summary
4 Description
5 Project revision history
6 License
|