GitHub - rcanovas/Low-LZ78

Low-LZ78

The Low-LZ78 project contains the C++11 and C codes associated with the following reference: D. Arroyuelo, R. Canovas, G. Navarro, and R. Raman. "LZ78 Compression in Low Main Memory Space". To appear in Proc. SPIRE'17

This work includes the following data structures: C++11:

hlz78: A LZ78 data structure that uses a fixed hash table.
ghlz78: A LZ78 data structure that uses a growing hash table.
mhlz78: A LZ78 data structure that uses multi-hash tables.

C (32 bits):

lz78_min: The LZ78 compact representation of Arroyuelo and Navarro ("Space-efficient construction of Lempel-Ziv compressed text indexes". Information and Computation, 209(7):1070–1102, 2011).
lz78_uc: The LZ78 uncompressd baseline representation used by Arroyuelo and Navarro.

Compile

To be able to compile the codes:

Install sdsl-lite. Follow the installation guide here: (https://github.com/simongog/sdsl-lite)
Modify the location of the sdsl library in the CMakeLists.txt if necessary.
Require cmake version 3.5 or higher.
For the C++11 code, go to the build folder (create the folder if it does not exists) and run:
- cmake ..
- make
For the C codes, go to the lz78_c32 folder and run:
- make

Methods

hlz78/ghlz78/mhlz78

-[CompressText] :

Use: ./CompressText file_name <opt> 
  		<file_name>: Name of the file to be compressed 
	<opt> : 
	  	-o output_name:  String containing the name of the output file. Default file_name
	  	-s sigma: Size of the alphabet of the input text. Default s = 256
	  	-f factor: Integer value indicating the overload factor used  for the Hash table. Default factor = 5%
	  	-d D_bits: Number of bits used to store each collision value. Default d=0 (meaning that it is compute internally)
		-w Index_type. Default = 0
    		    ---+--------------------
		     0 | HLZ78 using a map for the displacements 
		     1 | HLZ78 using a hash table and a sublayer for displacements
		     2 | MHLZ78 using maps for the displacements
		     3 | MHLZ78 using a hash table and a sublayer for displacements
		     4 | GHLZ78 using maps for the displacements
		     5 | GHLZ78 using a hash table and a sublayer for displacements
    		     

      	output:  <output_name>.index_type

	Example: ./CompressText file -w 1 -s 97 -f 10 -d 3
	output:  file.hlz78_hash

-[DecompressHLZ] :

Use: ./DecompressHLZ index_type <opt>
	<index_type>: Name of the compressed index file
	<opt> (index details are requiered): 
		-o output_name:  String containing the name of the output file. Default <index_name>_decom
		-w Index_type. Default = 0
		 # | Index_type
		 ---+--------------------
		  0 | HLZ78 using a map for the displacements
		  1 | HLZ78 using a hash table and a sublayer for displacements
		  2 | MHLZ78 using maps for the displacements
		  3 | MHLZ78 using a hash table and a sublayer for displacements
		  4 | GHLZ78 using maps for the displacements
		  5 | GHLZ78 using a hash table and a sublayer for displacements
 
	output:  output_name file

	Example: ./DecompressHLZ file.hlz78_hash -w 1 -o file   
	output: file (uncompress version of file.hlz78_hash)

lz78_min/lz78_uc

-[Clz78_(min/uc)] :

Use: ./Clz78_(min/uc) file_name
	file_name: Name of the file to be compressed 

	output:  file_name.lzt file

	Example: ./Clz78_min file 
	output: file.lzt (Compressed version of file)

-[Dlz78_(min/uc)] :

Use: ./Dlz78_(min/uc) file_name
	file_name: Name of the file to be decompressed (file_name.lzt must exist) 

	output:  file_name file

	Example: ./Dlz78_min file 
	output: file (Decompressed version of file.lzt. Note: the original file is overwrite)

The results presented in the paper "LZ78 Compression in Low Main Memory Space" were generated using the command line:

	./CompressText in_file -w type_index -s sigma -f factor
	using w = {1,3,5} and f = {5,10,20,40,60}, and files (with sigma):
		dblp.xml (s=97)
		proteins (s=27)
		english  (s=237) --> Note: we needed to erasa /0 characters from the original text before compressing
		DNA      (s=51)

For more information about the data structures and the test sets used please refer to the paper.

Note: These codes assume that the computer used has enough RAM to read and store the complete input.

Name		Name	Last commit message	Last commit date
Latest commit History 11 Commits
include		include
lz78_c32		lz78_c32
tests		tests
CMakeLists.txt		CMakeLists.txt
COPYING		COPYING
LICENSE		LICENSE
README.md		README.md

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Licenses found

Repository files navigation

Low-LZ78

Compile

Methods

hlz78/ghlz78/mhlz78

lz78_min/lz78_uc

About

Licenses found

Releases

Packages

Languages

License

Licenses found

rcanovas/Low-LZ78

Folders and files

Latest commit

History

Repository files navigation

Low-LZ78

Compile

Methods

hlz78/ghlz78/mhlz78

lz78_min/lz78_uc

About

Resources

License

Licenses found

Stars

Watchers

Forks

Releases

Packages 0

Languages

Packages