|
Ïðîãðàììà lcp âåðñèÿ 0.1.9
Ïðîãðàììà lcp âåðñèÿ 0.1.9
Ïðîåêò íà÷àò 2003-08-23; Ïîñëåäíèå èçìåíåíèÿ 2004-05-31
1 Çàãðóçêà
2 Ôàéë readme.koi
3 Èñïîëüçîâàíèå è ñâîäêà îïöèé
4 Îïèñàíèå
5 Ðàçâèòèå ïðîãðàììû
6 Ëèöåíçèÿ
1 Çàãðóçêà
Èñõîäíèêè: src/lcp-0.1.9.tgz [361 Êá ]
Èñïîëíèìûé ôàéë äëÿ Win9x/2000/XP (êðîññ-ñêîìïèëèðîâàííûé ïîä minGW): mingw/lcp.zip [22 Êá ]
2 Ôàéë readme.koi
lcp --- ïîñòðîåíèå èíäåêñà LCP èç ñóôôèêñíîãî ìàññèâà
ÏÎÄÄÅÐÆÈÂÀÅÌÛÅ ÎÊÐÓÆÅÍÈß
http://www.gnu.org GNU/Linux
http://www.mingw.org MinGW --- Minimalist GNU For Windows
ÊÎÌÏÈËßÖÈß
Ââåäèòå make (èëè gmake) â äèðåêòîðèè, ãäå íàõîäÿòñÿ èñõîäíûå òåêñòû
ïðîãðàììû.
ÊÐÀÒÊÀß ÈÍÑÒÐÓÊÖÈß
Ñì. ïîìîùü ê ïðîãðàììå:
lcp --help
Óñëîâèÿ èñïîëüçîâàíèÿ îïèñàíû ôàéëå LICENSE.koi
3 Èñïîëüçîâàíèå è ñâîäêà îïöèé
user@computer$ ./lcp --help
USAGE: lcp [OPTIONS] FILE
INPUT files: FILE FILE.ary
OUTPUT: FILE.lcp with information on common prefixes
Mandatory arguments to long options are mandatory for short options too.
GENERAL OPTIONS:
-a, --suffix-array=<name> take suffix array for FILE from <name>
-o, --lcp-array=<lcpname> output lcp array to file <name>
LCP RECONSTRUCTION ALGORITHM
-k, --Kasai-Lee-alg Kasai-Lee (cf [1] in `lcp -m`)
-r<num>, or resilience(use optional parameter
--resilience-alg <num> <num=1> to decrease memory use)
-s, --synthesis-alg (default) synthesis
-n, --naive-alg naive
OUTPUT OPTIONS
-t, --stdout output result to stdout
-d, --digital print lcp's as digital numbers separated by newline
-c, --clocks output clocks of the program (depends on algorithm)
-i, --information output in three lines (max. lcp, sum of lcp, avg.lcp)
INFORMATION
-q, --quiet stop messages to stderr and shorten other messages
-h, --help display this help and exit
-m, --man display complete description
-v, --version display version and exit
4 Îïèñàíèå
user@computer$ ./lcp --man
<Ïðîïóùåíà èíôîðìàöèÿ, ïðèñóòñòâóþùàÿ â ïðåäûäóùåì ðàçäåëå>
DESCRIPTION
On input program requires the FILE and its suffix array FILE.ary
Suffix array should be constructed using suffsort program from
TACU (Text Analysis and Conversion Utilities) by D.Khmelev
(http://www.math.toronto.edu/dkhmelev/PROGS/tacu/suffsort-eng.html)
or by sary program.
Example of use:
suffsort FILE # produces suffix array in FILE.ary
lcp FILE # produces FILE.lcp from FILE.ary
Suppose that length(FILE)=size. Denote suffix array by SA[0..size-1].
The lcp[0..size-2] array is defined by relation
lcp[i]=LCP(FILE[SA[i]..size-1],FILE[SA[i+1]..size-1])
This program realizes four algorithms for lcp[] construction from the
suffix array, which can be selected by switches:
Naive algorithm (-n --naive-alg)
Straightforward computing with potential O(size*size) time complexity.
Consumed memory: (12+1)*size bytes
Kasai-Lee algorithm (-k --Kasai-Lee-alg)
Linear time lcp reconstruction algorithm described in [1].
Consumed memory: (12+1)*size bytes
Resilience algorithm (-r --resilience-alg)
Algorithm by D.Khmelev, based on *resilience* property of suffix arrays
Consumed memory: (8+1)*size bytes. If optional factor <num> is specified
e.g.
lcp -r 4 FILE
then the memory consumption became (1+(8/num))*size bytes ((1+8/4)*size or
3*size above). These comes in trade-off with increasing computation time
by factor <num> for highly repetitive files.
Synthesis algorithm (-s --synthesis-alg, default)
The mixture of naive and resilience algorithm. Time behaviour is O(size),
while space consumption is (1+8/64)*size or 1.125*size.
[1] T. Kasai, G.Lee et al. Linear-Time Longest-Common-Prefix Computation
in Suffix Arrays and Its Applications//CPM 2001, LNCS 2089, pp.181--192,
Springer-Verlag Berlin Heidelberg, 2001
DATA REPRESENTATION
We assume that the suffix array SA[0..size-1] of unsigned integers 0..2^32-1
for character array FILE[0..size-1] is kept in FILE.ary in big-endian
format. Array lcp[0..size-2] of unsigned integers 0..2^{32}-1 is outputted to
FILE.lcp in 4-byte big endian format either.
To read an integer from this file use the following macros in C:
// The next definition should be used for reading from binary file "rb"
static unsigned char __cint[4];//REQUIRED SIZEOFINT==4!!!!
#define GETINT(f,fname) (fread(&__cint,1,SIZEOFINT,f)==4?\
(256*(256*(256*__cint[0]+__cint[1])+__cint[2])+__cint[3]):\
(fprintf(stderr,"Failed to read big-endian 4-byte integer from %s\n",fname),\
exit(1),0))
To write an integer in big-endian format we used the following procedure:
void write_bigendian_unsigned_int_to_file(FILE *fp, unsigned int n)
{
fputc((n & 0xFF000000) >> 24, fp);
fputc((n & 0xFF0000) >> 16, fp);
fputc((n & 0xFF00) >> 8, fp);
fputc((n & 0xFF), fp);
}
5 Ðàçâèòèå ïðîãðàììû
Äàòû èçìåíåíèÿ ôàéëîâ ïðîãðàììû:
2003-08-23
2003-08-24
2003-08-26
2003-08-27
2003-08-28
2003-08-29
2003-09-13
2003-09-14
2003-09-19
2004-01-19
2004-04-10
2004-05-31
6 Ëèöåíçèÿ
lcp - ïîñòðîåíèå èíäåêñà LCP èç ñóôôèêñíîãî ìàññèâà
Ïðîãðàììà äîñòóïíà ñ http://www.math.toronto.edu/dkhmelev/PROGS/
Àâòîð:
Äìèòðèé Âèêòîðîâè÷ Õìåë¸â
dkhmelev((at))math.toronto.edu
[çàìåíèòå ((at)) íà @, ÷òîáû ïîëó÷èòü íàñòîÿùèé àäðåñ - àíòèñïàì]
119 992, Ìîñêâà, Ëåíèíñêèå ãîðû, ÌÃÓ, 1 Ãóì. êîðï.,
Ôèëîëîãè÷åñêèé ôàêóëüòåò,
Ëàáîðàòîðèÿ îáùåé è êîìïüþòåðíîé ëåêñèêîëîãèè è ëåêñèêîãðàôèè.
ÓÑËÎÂÈß ÈÑÏÎËÜÇÎÂÀÍÈß
Ýòà ïðîãðàììà ìîæåò ñâîáîäíî ðàñïðîñòðàíÿòüñÿ íà óñëîâèÿõ ëèöåíçèè GNU
âåðñèè äâà èëè âûøå (ñì. ïðèëàãàåìûé ôàéë COPYING ñ óñëîâèÿìè
ðàñïðîñòðàíåíèÿ).
Ðåçóëüòàòû, ïîëó÷åííûå ñ èñïîëüçîâàíèåì ýòîé ïðîãðàììû äîëæíû
ññûëàòüñÿ íà å¸ èñïîëüçîâàíèå. Ïðèìåð ññûëêè:
Ä.Â. Õìåë¸â
http://www.math.toronto.edu/dkhmelev/PROGS/
Áîëåå òîãî, Âû äîëæíû ïðîèíôîðìèðîâàòü àâòîðà î ïóáëèêàöèè.
Èñïîëüçóÿ ýòó ïðîãðàììó Âû ñîãëàøàåòåñü ñ óñëîâèÿìè
èñïîëüçîâàíèÿ.
ÎÒÑÓÒÑÒÂÈÅ ÃÀÐÀÍÒÈÉÍÛÕ ÎÁßÇÀÒÅËÜÑÒÂ
ÏÎÑÊÎËÜÊÓ ÍÀÑÒÎßÙÀß ÏÐÎÃÐÀÌÌÀ ÐÀÑÏÐÎÑÒÐÀÍßÅÒÑß ÁÅÑÏËÀÒÍÎ, ÃÀÐÀÍÒÈÈ
ÍÀ ÍÅÅ ÍÅ ÏÐÅÄÎÑÒÀÂËßÞÒÑß Â ÒÎÉ ÑÒÅÏÅÍÈ, Â ÊÀÊÎÉ ÝÒÎ ÄÎÏÓÑÊÀÅÒÑß
ÏÐÈÌÅÍÈÌÛÌ ÏÐÀÂÎÌ. ÍÀÑÒÎßÙÀß ÏÐÎÃÐÀÌÌÀ ÏÎÑÒÀÂËßÅÒÑß ÍÀ ÓÑËÎÂÈßÕ "ÊÀÊ
ÅÑÒÜ". ÅÑËÈ ÈÍÎÅ ÍÅ ÓÊÀÇÀÍÎ Â ÏÈÑÜÌÅÍÍÎÉ ÔÎÐÌÅ, ÀÂÒÎÐ È/ÈËÈ ÈÍÎÉ
ÏÐÀÂÎÎÁËÀÄÀÒÅËÜ ÍÅ ÏÐÈÍÈÌÀÅÒ ÍÀ ÑÅÁß ÍÈÊÀÊÈÕ ÃÀÐÀÍÒÈÉÍÛÕ ÎÁßÇÀÒÅËÜÑÒÂ,
ÊÀÊ ßÂÍÎ ÂÛÐÀÆÅÍÍÛÕ, ÒÀÊ È ÏÎÄÐÀÇÓÌÅÂÀÅÌÛÕ,  ÎÒÍÎØÅÍÈÈ ÏÐÎÃÐÀÌÌÛ, Â
ÒÎÌ ×ÈÑËÅ ÏÎÄÐÀÇÓÌÅÂÀÅÌÓÞ ÃÀÐÀÍÒÈÞ ÒÎÂÀÐÍÎÃÎ ÑÎÑÒÎßÍÈß ÏÐÈ ÏÐÎÄÀÆÅ È
ÏÐÈÃÎÄÍÎÑÒÈ ÄËß ÈÑÏÎËÜÇÎÂÀÍÈß Â ÊÎÍÊÐÅÒÍÛÕ ÖÅËßÕ, À ÒÀÊÆÅ ËÞÁÛÅ ÈÍÛÅ
ÃÀÐÀÍÒÈÈ. ÂÑÅ ÐÈÑÊÈ, ÑÂßÇÀÍÍÛÅ Ñ ÊÀ×ÅÑÒÂÎÌ È ÏÐÎÈÇÂÎÄÈÒÅËÜÍÎÑÒÜÞ
ÏÐÎÃÐÀÌÌÛ, ÍÅÑÅÒ ËÈÖÅÍÇÈÀÒ.  ÑËÓ×ÀÅ ÅÑËÈ Â ÏÐÎÃÐÀÌÌÅ ÁÓÄÓÒ ÎÁÍÀÐÓÆÅÍÛ
ÍÅÄÎÑÒÀÒÊÈ, ÂÑÅ ÐÀÑÕÎÄÛ, ÑÂßÇÀÍÍÛÅ Ñ ÒÅÕÍÈ×ÅÑÊÈÌ ÎÁÑËÓÆÈÂÀÍÈÅÌ,
ÐÅÌÎÍÒÎÌ ÈËÈ ÈÑÏÐÀÂËÅÍÈÅÌ ÏÐÎÃÐÀÌÌÛ, ÍÅÑÅÒ ËÈÖÅÍÇÈÀÒ.
ÅÑËÈ ÈÍÎÅ ÍÅ ÏÐÅÄÓÑÌÎÒÐÅÍÎ ÏÐÈÌÅÍßÅÌÛÌ ÏÐÀÂÎÌ ÈËÈ ÍÅ ÑÎÃËÀÑÎÂÀÍÎ
ÑÒÎÐÎÍÀÌÈ Â ÄÎÃÎÂÎÐÅ Â ÏÈÑÜÌÅÍÍÎÉ ÔÎÐÌÅ, ÀÂÒÎÐ È/ÈËÈ ÈÍÎÉ
ÏÐÀÂÎÎÁËÀÄÀÒÅËÜ, ÊÎÒÎÐÛÉ ÌÎÄÈÔÈÖÈÐÓÅÒ È/ÈËÈ ÐÀÑÏÐÎÑÒÐÀÍßÅÒ ÏÐÎÃÐÀÌÌÓ
ÍÀ ÓÑËÎÂÈßÕ ÍÀÑÒÎßÙÅÉ ËÈÖÅÍÇÈÈ, ÍÅ ÍÅÑÅÒ ÎÒÂÅÒÑÒÂÅÍÍÎÑÒÈ ÏÅÐÅÄ
ËÈÖÅÍÇÈÀÒÎÌ ÇÀ ÓÁÛÒÊÈ, ÂÊËÞ×Àß ÎÁÙÈÅ, ÐÅÀËÜÍÛÅ, ÏÐÅÄÂÈÄÈÌÛÅ È
ÊÎÑÂÅÍÍÛÅ ÓÁÛÒÊÈ ( ÒÎÌ ×ÈÑËÅ ÓÒÐÀÒÓ ÈËÈ ÈÑÊÀÆÅÍÈÅ ÈÍÔÎÐÌÀÖÈÈ, ÓÁÛÒÊÈ,
ÏÎÍÅÑÅÍÍÛÅ ËÈÖÅÍÇÈÀÒÎÌ ÈËÈ ÒÐÅÒÜÈÌÈ ËÈÖÀÌÈ, ÍÅÂÎÇÌÎÆÍÎÑÒÜ ÐÀÁÎÒÛ
ÏÐÎÃÐÀÌÌÛ Ñ ËÞÁÎÉ ÄÐÓÃÎÉ ÏÐÎÃÐÀÌÌÎÉ È ÈÍÛÅ ÓÁÛÒÊÈ). ÀÂÒÎÐ È/ÈËÈ ÈÍÎÉ
ÏÐÀÂÎÎÁËÀÄÀÒÅËÜ Â ÑÎÎÒÂÅÒÑÒÂÈÈ Ñ ÍÀÑÒÎßÙÈÌ ÏÓÍÊÒÎÌ ÍÅ ÍÅÑÓÒ
ÎÒÂÅÒÑÒÂÅÍÍÎÑÒÈ ÄÀÆÅ  ÒÎÌ ÑËÓ×ÀÅ, ÎÍÈ ÁÛËÈ ÏÐÅÄÓÏÐÅÆÄÅÍÛ Î
ÂÎÇÌÎÆÍÎÑÒÈ ÂÎÇÍÈÊÍÎÂÅÍÈß ÒÀÊÈÕ ÓÁÛÒÊÎÂ.
1 Çàãðóçêà
2 Ôàéë readme.koi
3 Èñïîëüçîâàíèå è ñâîäêà îïöèé
4 Îïèñàíèå
5 Ðàçâèòèå ïðîãðàììû
6 Ëèöåíçèÿ
|