1. Primerjava brezizgubnih algoritmov za stiskanje podatkov s sistemom ALGator
- Author
-
PISAREVIĆ, MILOŠ and Dobravec, Tomaž
- Subjects
decompression ,Lempel-Ziv ,Burrows-Wheeler ,algorithm ,Huffman ,data ,ALGator ,stiskanje ,podatki ,algoritem ,razširjanje ,compression ,Java - Abstract
Namen diplomskega dela je bil predstaviti temeljne algoritme za brezizgubno stiskanje podatkov in jih med seboj testirati. V teoretičnem delu najprej na splošno govorimo o značilnostih kodiranja podatkov. Tukaj razjasnimo razlike med več vrstami kod in se podrobneje osredotočimo na kode VLC. Omenjena je tudi informacijska teorija, pri čemer razložimo, zakaj je pomemben element na področju kodiranja. Nato se poglobimo v razlago dejanskih algoritmov, in sicer začnemo s statističnimi algoritmi, kjer opišemo delovanje Shannon-Fanojevega kodiranja, Huffmanovega kodiranja in aritmetičnega kodiranja. Potem preidemo na algoritme, ki dosegajo stiskanje s pomočjo slovarja, in opišemo prvotna algoritma LZ77 in LZ78 ter njuni varianti LZSS in LZW. Za konec opišemo še algoritem Burrows-Wheeler, ki združuje transformaciji Burrows-Wheeler in premakni-na-začetek z entropijskim kodiranjem. V praktičnem delu začnemo z implementacijo algoritmov Huffmanovega kodiranja, aritmetičnega kodiranja, LZSS, LZW in Bzip2. Algoritme implementiramo z odprtokodnimi implementacijami v programskem jeziku Java in za testiranje uporabimo sistem ALGator. Teste izvajamo na več znanih testnih zbirkah, ki so nastale z namenom testiranja brezizgubnega stiskanja. Te zbirke so Calgary, Canterbury, Silesia in Maximum. Po končanem testiranju rezultate primerjamo med seboj in ugotovimo učinkovitost implementiranih algoritmov. The purpose of the thesis is to present the fundamental lossless data compression algorithms, test and compare them. The theoretical part outlines some of the general characteristics of data coding and explores in more detail the differences between different types of codes, particularly VLC codes. Moreover, it touches on the information theory and explains why it plays an important role in coding. It also thoroughly explains actual algorithms, starting with statistical algorithms to describe the functioning of Shannon-Fano coding, Huffman coding and arithmetic coding. Next are algorithms that achieve compression using a dictionary, i.e. algorithms LZ77 and LZ78 and their respective variants LZSS and LZW. The last algorithm presented is the Burrows-Wheeler algorithm that involves the Burrows-Wheeler transformation and move-to-front transformation with entropy coding. As for the practical part, it deals with the implementation and testing of algorithms for Huffman coding, arithmetic coding, LZSS, LZW and Bzip2. The algorithms are implemented with open source implementations in the Java programming language. The ALGator system is used for testing them. The tests are performed on several widely known test collections that were developed for testing lossless compression. Those collections are Calgary, Canterbury, Silesia and Maximum. The comparison of test results determines the effectiveness of the implemented algorithms.
- Published
- 2016