Ïðîãðàììû >> Ðàçíûå >> [ template | cvswork | img2djvu Îêò, 7 | lcp | longlcp | lar | histogram | maxlcp | pdftodjvu Ñåíò, 25 | polygon | ppmcluster ]

Ïðîãðàììà lcp âåðñèÿ 0.1.9

Ïðîãðàììà lcp âåðñèÿ 0.1.9

Ïðîåêò íà÷àò 2003-08-23; Ïîñëåäíèå èçìåíåíèÿ 2004-05-31

1  Çàãðóçêà
2  Ôàéë readme.koi
3  Èñïîëüçîâàíèå è ñâîäêà îïöèé
4  Îïèñàíèå
5  Ðàçâèòèå ïðîãðàììû
6  Ëèöåíçèÿ

1  Çàãðóçêà

Èñõîäíèêè: src/lcp-0.1.9.tgz [361 Êá ]

Èñïîëíèìûé ôàéë äëÿ Win9x/2000/XP (êðîññ-ñêîìïèëèðîâàííûé ïîä minGW): mingw/lcp.zip [22 Êá ]

2  Ôàéë readme.koi

lcp --- ïîñòðîåíèå èíäåêñà LCP èç ñóôôèêñíîãî ìàññèâà


ÏÎÄÄÅÐÆÈÂÀÅÌÛÅ ÎÊÐÓÆÅÍÈß

http://www.gnu.org    GNU/Linux 
http://www.mingw.org  MinGW --- Minimalist GNU For Windows


ÊÎÌÏÈËßÖÈß

Ââåäèòå make (èëè gmake) â äèðåêòîðèè, ãäå íàõîäÿòñÿ èñõîäíûå òåêñòû
ïðîãðàììû.


ÊÐÀÒÊÀß ÈÍÑÒÐÓÊÖÈß

Ñì. ïîìîùü ê ïðîãðàììå:
  lcp --help


Óñëîâèÿ èñïîëüçîâàíèÿ îïèñàíû ôàéëå LICENSE.koi


3  Èñïîëüçîâàíèå è ñâîäêà îïöèé

user@computer$ ./lcp --help
USAGE: lcp [OPTIONS] FILE 
INPUT files: FILE  FILE.ary
OUTPUT: FILE.lcp with information on common prefixes
Mandatory arguments to long options are mandatory for short options too.
GENERAL OPTIONS:
  -a, --suffix-array=<name>  take suffix array for FILE from <name>
  -o, --lcp-array=<lcpname>        output lcp array to file <name>
LCP RECONSTRUCTION ALGORITHM
  -k, --Kasai-Lee-alg     Kasai-Lee (cf [1] in `lcp -m`)
  -r<num>, or             resilience(use optional parameter
  --resilience-alg <num>  <num=1> to decrease memory use)
  -s, --synthesis-alg     (default) synthesis
  -n, --naive-alg         naive
OUTPUT OPTIONS
  -t, --stdout            output result to stdout
  -d, --digital           print lcp's as digital numbers separated by newline
  -c, --clocks            output clocks of the program (depends on algorithm)
  -i, --information       output in three lines (max. lcp, sum of lcp, avg.lcp)
INFORMATION
  -q, --quiet             stop messages to stderr and shorten other messages
  -h, --help              display this help and exit
  -m, --man               display complete description
  -v, --version           display version and exit


4  Îïèñàíèå

user@computer$ ./lcp --man
<Ïðîïóùåíà èíôîðìàöèÿ, ïðèñóòñòâóþùàÿ â ïðåäûäóùåì ðàçäåëå>


DESCRIPTION

On input program requires the FILE and its suffix array FILE.ary
Suffix array should be constructed using suffsort program from
TACU (Text Analysis and Conversion Utilities) by D.Khmelev
(http://www.math.toronto.edu/dkhmelev/PROGS/tacu/suffsort-eng.html)
or by sary program.
Example of use:
  suffsort FILE       # produces suffix array in FILE.ary
  lcp FILE            # produces FILE.lcp from FILE.ary
Suppose that length(FILE)=size. Denote suffix array by SA[0..size-1].
The lcp[0..size-2] array is defined by relation 
  lcp[i]=LCP(FILE[SA[i]..size-1],FILE[SA[i+1]..size-1])
This program realizes four algorithms for lcp[] construction from the
suffix array, which can be selected by switches:

Naive algorithm (-n --naive-alg)
Straightforward computing with potential O(size*size) time complexity.
Consumed memory: (12+1)*size bytes

Kasai-Lee algorithm (-k --Kasai-Lee-alg)
Linear time lcp reconstruction algorithm described in [1].
Consumed memory: (12+1)*size bytes

Resilience algorithm (-r --resilience-alg)
Algorithm by D.Khmelev, based on *resilience* property of suffix arrays
Consumed memory: (8+1)*size bytes. If optional factor <num> is specified
e.g.
 lcp -r 4 FILE 
then the memory consumption became (1+(8/num))*size bytes ((1+8/4)*size or
3*size above). These comes in trade-off with increasing computation time
by factor <num> for highly repetitive files.

Synthesis algorithm (-s --synthesis-alg, default)
The mixture of naive and resilience algorithm. Time behaviour is O(size),
while space consumption is (1+8/64)*size or 1.125*size.

[1] T. Kasai, G.Lee et al. Linear-Time Longest-Common-Prefix Computation
in Suffix Arrays and Its Applications//CPM 2001, LNCS 2089, pp.181--192,
Springer-Verlag Berlin Heidelberg, 2001

DATA REPRESENTATION

We assume that the suffix array SA[0..size-1] of unsigned integers 0..2^32-1
for character array FILE[0..size-1] is kept in FILE.ary in big-endian
format. Array lcp[0..size-2] of unsigned integers 0..2^{32}-1 is outputted to
FILE.lcp in 4-byte big endian format either.
To read an integer from this file use the following macros in C:

// The next definition should be used for reading from binary file "rb"
static unsigned char __cint[4];//REQUIRED SIZEOFINT==4!!!!
#define GETINT(f,fname) (fread(&__cint,1,SIZEOFINT,f)==4?\
  (256*(256*(256*__cint[0]+__cint[1])+__cint[2])+__cint[3]):\
  (fprintf(stderr,"Failed to read big-endian 4-byte integer from %s\n",fname),\
   exit(1),0))

To write an integer in big-endian format we used the following procedure:

void write_bigendian_unsigned_int_to_file(FILE *fp, unsigned int n)
{
  fputc((n & 0xFF000000) >> 24, fp);
  fputc((n & 0xFF0000) >> 16, fp);
  fputc((n & 0xFF00) >> 8, fp);
  fputc((n & 0xFF), fp);
}


5  Ðàçâèòèå ïðîãðàììû

Äàòû èçìåíåíèÿ ôàéëîâ ïðîãðàììû:

2003-08-23

2003-08-24

2003-08-26

2003-08-27

2003-08-28

2003-08-29

2003-09-13

2003-09-14

2003-09-19

2004-01-19

2004-04-10

2004-05-31

6  Ëèöåíçèÿ

lcp - ïîñòðîåíèå èíäåêñà LCP èç ñóôôèêñíîãî ìàññèâà

Ïðîãðàììà äîñòóïíà ñ http://www.math.toronto.edu/dkhmelev/PROGS/

Àâòîð:

Äìèòðèé Âèêòîðîâè÷ Õìåë¸â dkhmelev((at))math.toronto.edu [çàìåíèòå ((at)) íà @, ÷òîáû ïîëó÷èòü íàñòîÿùèé àäðåñ - àíòèñïàì]

119 992, Ìîñêâà, Ëåíèíñêèå ãîðû, ÌÃÓ, 1 Ãóì. êîðï., Ôèëîëîãè÷åñêèé ôàêóëüòåò, Ëàáîðàòîðèÿ îáùåé è êîìïüþòåðíîé ëåêñèêîëîãèè è ëåêñèêîãðàôèè.

ÓÑËÎÂÈß ÈÑÏÎËÜÇÎÂÀÍÈß

Ýòà ïðîãðàììà ìîæåò ñâîáîäíî ðàñïðîñòðàíÿòüñÿ íà óñëîâèÿõ ëèöåíçèè GNU âåðñèè äâà èëè âûøå (ñì. ïðèëàãàåìûé ôàéë COPYING ñ óñëîâèÿìè ðàñïðîñòðàíåíèÿ).

Ðåçóëüòàòû, ïîëó÷åííûå ñ èñïîëüçîâàíèåì ýòîé ïðîãðàììû äîëæíû ññûëàòüñÿ íà å¸ èñïîëüçîâàíèå. Ïðèìåð ññûëêè:

Ä.Â. Õìåë¸â http://www.math.toronto.edu/dkhmelev/PROGS/

Áîëåå òîãî, Âû äîëæíû ïðîèíôîðìèðîâàòü àâòîðà î ïóáëèêàöèè.

Èñïîëüçóÿ ýòó ïðîãðàììó Âû ñîãëàøàåòåñü ñ óñëîâèÿìè èñïîëüçîâàíèÿ.

ÎÒÑÓÒÑÒÂÈÅ ÃÀÐÀÍÒÈÉÍÛÕ ÎÁßÇÀÒÅËÜÑÒÂ

ÏÎÑÊÎËÜÊÓ ÍÀÑÒÎßÙÀß ÏÐÎÃÐÀÌÌÀ ÐÀÑÏÐÎÑÒÐÀÍßÅÒÑß ÁÅÑÏËÀÒÍÎ, ÃÀÐÀÍÒÈÈ ÍÀ ÍÅÅ ÍÅ ÏÐÅÄÎÑÒÀÂËßÞÒÑß Â ÒÎÉ ÑÒÅÏÅÍÈ,  ÊÀÊÎÉ ÝÒÎ ÄÎÏÓÑÊÀÅÒÑß ÏÐÈÌÅÍÈÌÛÌ ÏÐÀÂÎÌ. ÍÀÑÒÎßÙÀß ÏÐÎÃÐÀÌÌÀ ÏÎÑÒÀÂËßÅÒÑß ÍÀ ÓÑËÎÂÈßÕ "ÊÀÊ ÅÑÒÜ". ÅÑËÈ ÈÍÎÅ ÍÅ ÓÊÀÇÀÍÎ Â ÏÈÑÜÌÅÍÍÎÉ ÔÎÐÌÅ, ÀÂÒÎÐ È/ÈËÈ ÈÍÎÉ ÏÐÀÂÎÎÁËÀÄÀÒÅËÜ ÍÅ ÏÐÈÍÈÌÀÅÒ ÍÀ ÑÅÁß ÍÈÊÀÊÈÕ ÃÀÐÀÍÒÈÉÍÛÕ ÎÁßÇÀÒÅËÜÑÒÂ, ÊÀÊ ßÂÍÎ ÂÛÐÀÆÅÍÍÛÕ, ÒÀÊ È ÏÎÄÐÀÇÓÌÅÂÀÅÌÛÕ,  ÎÒÍÎØÅÍÈÈ ÏÐÎÃÐÀÌÌÛ,  ÒÎÌ ×ÈÑËÅ ÏÎÄÐÀÇÓÌÅÂÀÅÌÓÞ ÃÀÐÀÍÒÈÞ ÒÎÂÀÐÍÎÃÎ ÑÎÑÒÎßÍÈß ÏÐÈ ÏÐÎÄÀÆÅ È ÏÐÈÃÎÄÍÎÑÒÈ ÄËß ÈÑÏÎËÜÇÎÂÀÍÈß Â ÊÎÍÊÐÅÒÍÛÕ ÖÅËßÕ, À ÒÀÊÆÅ ËÞÁÛÅ ÈÍÛÅ ÃÀÐÀÍÒÈÈ. ÂÑÅ ÐÈÑÊÈ, ÑÂßÇÀÍÍÛÅ Ñ ÊÀ×ÅÑÒÂÎÌ È ÏÐÎÈÇÂÎÄÈÒÅËÜÍÎÑÒÜÞ ÏÐÎÃÐÀÌÌÛ, ÍÅÑÅÒ ËÈÖÅÍÇÈÀÒ.  ÑËÓ×ÀÅ ÅÑËÈ Â ÏÐÎÃÐÀÌÌÅ ÁÓÄÓÒ ÎÁÍÀÐÓÆÅÍÛ ÍÅÄÎÑÒÀÒÊÈ, ÂÑÅ ÐÀÑÕÎÄÛ, ÑÂßÇÀÍÍÛÅ Ñ ÒÅÕÍÈ×ÅÑÊÈÌ ÎÁÑËÓÆÈÂÀÍÈÅÌ, ÐÅÌÎÍÒÎÌ ÈËÈ ÈÑÏÐÀÂËÅÍÈÅÌ ÏÐÎÃÐÀÌÌÛ, ÍÅÑÅÒ ËÈÖÅÍÇÈÀÒ.

ÅÑËÈ ÈÍÎÅ ÍÅ ÏÐÅÄÓÑÌÎÒÐÅÍÎ ÏÐÈÌÅÍßÅÌÛÌ ÏÐÀÂÎÌ ÈËÈ ÍÅ ÑÎÃËÀÑÎÂÀÍÎ ÑÒÎÐÎÍÀÌÈ Â ÄÎÃÎÂÎÐÅ Â ÏÈÑÜÌÅÍÍÎÉ ÔÎÐÌÅ, ÀÂÒÎÐ È/ÈËÈ ÈÍÎÉ ÏÐÀÂÎÎÁËÀÄÀÒÅËÜ, ÊÎÒÎÐÛÉ ÌÎÄÈÔÈÖÈÐÓÅÒ È/ÈËÈ ÐÀÑÏÐÎÑÒÐÀÍßÅÒ ÏÐÎÃÐÀÌÌÓ ÍÀ ÓÑËÎÂÈßÕ ÍÀÑÒÎßÙÅÉ ËÈÖÅÍÇÈÈ, ÍÅ ÍÅÑÅÒ ÎÒÂÅÒÑÒÂÅÍÍÎÑÒÈ ÏÅÐÅÄ ËÈÖÅÍÇÈÀÒÎÌ ÇÀ ÓÁÛÒÊÈ, ÂÊËÞ×Àß ÎÁÙÈÅ, ÐÅÀËÜÍÛÅ, ÏÐÅÄÂÈÄÈÌÛÅ È ÊÎÑÂÅÍÍÛÅ ÓÁÛÒÊÈ ( ÒÎÌ ×ÈÑËÅ ÓÒÐÀÒÓ ÈËÈ ÈÑÊÀÆÅÍÈÅ ÈÍÔÎÐÌÀÖÈÈ, ÓÁÛÒÊÈ, ÏÎÍÅÑÅÍÍÛÅ ËÈÖÅÍÇÈÀÒÎÌ ÈËÈ ÒÐÅÒÜÈÌÈ ËÈÖÀÌÈ, ÍÅÂÎÇÌÎÆÍÎÑÒÜ ÐÀÁÎÒÛ ÏÐÎÃÐÀÌÌÛ Ñ ËÞÁÎÉ ÄÐÓÃÎÉ ÏÐÎÃÐÀÌÌÎÉ È ÈÍÛÅ ÓÁÛÒÊÈ). ÀÂÒÎÐ È/ÈËÈ ÈÍÎÉ ÏÐÀÂÎÎÁËÀÄÀÒÅËÜ Â ÑÎÎÒÂÅÒÑÒÂÈÈ Ñ ÍÀÑÒÎßÙÈÌ ÏÓÍÊÒÎÌ ÍÅ ÍÅÑÓÒ ÎÒÂÅÒÑÒÂÅÍÍÎÑÒÈ ÄÀÆÅ  ÒÎÌ ÑËÓ×ÀÅ, ÎÍÈ ÁÛËÈ ÏÐÅÄÓÏÐÅÆÄÅÍÛ Î ÂÎÇÌÎÆÍÎÑÒÈ ÂÎÇÍÈÊÍÎÂÅÍÈß ÒÀÊÈÕ ÓÁÛÒÊÎÂ.

1  Çàãðóçêà
2  Ôàéë readme.koi
3  Èñïîëüçîâàíèå è ñâîäêà îïöèé
4  Îïèñàíèå
5  Ðàçâèòèå ïðîãðàììû
6  Ëèöåíçèÿ

Ïðîãðàììû >> Ðàçíûå >> [ template | cvswork | img2djvu Îêò, 7 | lcp | longlcp | lar | histogram | maxlcp | pdftodjvu Ñåíò, 25 | polygon | ppmcluster ]

- ???????@Mail.ru
© 2002-2005 Ä.Õìåë¸â -