2023年11月29日发(作者:)

JAVA常⽤压缩算法底层原理性能⽐较

压缩算法

⽬前常⽤的⼏个压缩算法

GZIP,⼀个压缩⽐⾼的慢速算法,压缩后的数据适合长期使⽤。 JDK中的putStream / GZIPOutputStream

便是这个算法的实现。

deflate,zip⽂件⽤的就是这⼀算法。与gzip的不同之处在于,你可以指定算法 的压缩级别,这样你可以在压缩时间和输出⽂件⼤

⼩上进⾏平衡。可选的级别有0(不压缩),以及1(快速压缩)到9(慢速压缩)。它的实现是

erOutputStream / InflaterInputStream。

LZ0算法,⼀个⽤ANSI C语⾔编写的实时压缩解压。解压并不需要内存的⽀持。即使使⽤⾮常⼤的压缩⽐例进⾏缓慢压缩出的数据,

要进⾏Huffman编码,⾸先要把整个⽂件读⼀遍,在读的过程中,统计每个符号(我们把字节的256种值看作是256种符号)的出现

次数。然后根据符号的出现次数,建⽴Huffman树,通过Huffman树得到每个符号的新的编码。

把所有符号看成是⼀个结点,并且该结点的值为它的出现次数。进⼀步把这些结点看成是只有⼀个结点的树。

private static final String GZIP_ENCODE_UTF_8 = "UTF-8";

//GZip

解压缩

public static String gzipUnCompress(String inputString){

byte[] decode = Base64.getDecoder().decode(inputString);

return unCompress(decode, GZIP_ENCODE_UTF_8);

}

public static String unCompress(byte[] bytes, String encoding){

if(bytes == null || bytes.length == 0){

return null;

}

ByteArrayOutputStream out = new ByteArrayOutputStream();

ByteArrayInputStream in = new ByteArrayInputStream();

try{

GZIPInputStream ungzip = new GZIPInputStream(in);

byte[] buffer = new byte[256];

int n;

while((n = ungzip.read(buffer)) >= 0){

out.write(buffer, 0, n);

}

return out.toString(encoding);

}catch (Exception e){

throw new RuntimeException("GzipUnCompressError", e);

}

}

//Gzip

压缩

public static String gzipCompress(String original){

return Base64.getEncoder().encodeToString(compress(original, GZIP_ENCODE_UTF_8));

}

public static byte[] compress(String str, String encoding){

if(str == null || str.length() == 0){

return null;

}

ByteArrayOutputStream out = new ByteArrayOutputStream();

GZIPOutputStream gzip ;

try{

gzip = new GZIPOutputStream(out);

gzip.write(str.getBytes(encoding));

gzip.close();

}catch (Exception e){

throw new RuntimeException("GzipCompressError", e);

}

return out.toByteArray();

}

Deflate算法

deflate是zip压缩⽂件的默认算法. 其实deflate现在不光⽤在zip⽂件中, 在7z, xz等其他的压缩⽂件中都⽤. 实际上deflate只是⼀种压

缩数据流的算法. 任何需要流式压缩的地⽅都可以⽤.

deflate算法就是基于LZ77算法和Huffman编码基础上实现的

算法实现

//deflate

解压缩

public static String deflateUnCompress(String inputString){

byte[] bytes = Base64.getDecoder().decode(inputString);

if(bytes == null || bytes.length == 0){

return null;

}

ByteArrayOutputStream out = new ByteArrayOutputStream();

ByteArrayInputStream in = new ByteArrayInputStream(bytes);

try{

InflaterInputStream inflater = new InflaterInputStream(in);

byte[] buffer = new byte[256];

int n;

while((n = inflater.read(buffer)) >= 0){

out.write(buffer, 0, n);

}

return out.toString("utf-8");

}catch (Exception e){

throw new RuntimeException("DeflaterUnCompressError", e);

}

}

//deflate

压缩

public static String deflateCompress(String original){

if(original == null || original.length() == 0){

return null;

}

ByteArrayOutputStream out = new ByteArrayOutputStream();

DeflaterOutputStream deflater ;

try{

deflater = new DeflaterOutputStream(out);

deflater.write(original.getBytes(StandardCharsets.UTF_8));

deflater.close();

return Base64.getEncoder().encodeToString(out.toByteArray());

}catch (Exception e){

throw new RuntimeException("DeflaterCompressError", e);

}

}

JavaDeflateOutputStreamGZIPOutputStream区别

DeflaterOutputStream实现原始deflate压缩⽅法。GZIPOutputStream为GZIP添加了额外的逻辑: CRC-32 检查,GZIP幻

数,GZIP头,预告⽚等

lzo算法

算法原理

LZO是块压缩算法,它压缩和解压⼀个块数据。压缩和解压的块⼤⼩必须⼀样。

LZO将块数据压缩成匹配数据(滑动字典)和⾮匹配的⽂字序列。LZO对于长匹配和长⽂字序列有专门的处理,这样对于⾼冗余的数据

能够获得很好的效果,这样对于不可压缩的数据,也能得到较好的效果。

底层使⽤的也是LZ77算法。

算法实现

//lzo

解压缩

public static String lzoUnCompress(String str){

LzoDecompressor decompressor = LzoLibrary.getInstance()

.newDecompressor(LzoAlgorithm.LZO1X, null);

try{

ByteArrayOutputStream os = new ByteArrayOutputStream();

ByteArrayInputStream is = new ByteArrayInputStream(Base64.getDecoder().decode(str.getBytes(StandardCharsets.UTF_8)));

LzoInputStream lis = new LzoInputStream(is, decompressor);

int count;

byte[] buffer = new byte[256];

while((count = lis.read(buffer)) != -1){

os.write(buffer, 0, count);

}

return os.toString();

}catch (Exception e){

throw new RuntimeException("lzoUnCompressError", e);

}

}

public static String lzoCompress(String str){

LzoCompressor compressor = LzoLibrary.getInstance().newCompressor(

LzoAlgorithm.LZO1X, null);

ByteArrayOutputStream os = new ByteArrayOutputStream();

LzoOutputStream louts = new LzoOutputStream(os, compressor);

try{

louts.write(str.getBytes(StandardCharsets.UTF_8));

louts.close();

return Base64.getEncoder().encodeToString(os.toByteArray());

}catch (Exception e){

throw new RuntimeException("LzoCompressError", e);

}

}

lz4算法

压缩⽐较差

算法实现

//lz4

解压缩

public static String lz4UnCompress(String str){

byte[] decode = Base64.getDecoder().decode(str.getBytes());

ByteArrayOutputStream baos = new ByteArrayOutputStream();

try{

LZ4BlockInputStream lzis = new LZ4BlockInputStream(

new ByteArrayInputStream(decode));

int count;

byte[] buffer = new byte[2048];

while ((count = lzis.read(buffer)) != -1) {

baos.write(buffer, 0, count);

}

lzis.close();

return baos.toString("utf-8");

}catch (Exception e){

throw new RuntimeException("lz4UnCompressError", e);

}

}

//lz4

压缩

public static String lz4Compress(String str){

LZ4Factory factory = LZ4Factory.fastestInstance();

ByteArrayOutputStream byteOutput = new ByteArrayOutputStream();

LZ4Compressor compressor = factory.fastCompressor();

try{

LZ4BlockOutputStream compressedOutput = new LZ4BlockOutputStream(

byteOutput, 2048, compressor);

compressedOutput.write(str.getBytes(StandardCharsets.UTF_8));

compressedOutput.close();

return Base64.getEncoder().encodeToString(byteOutput.toByteArray());

}catch (Exception e){

throw new RuntimeException("lz4CompressError", e);

}

}

snappy算法

从后⾯的性能测试上可以得出初步结论,snappy的压缩解压性能是⾮常优秀的,压缩⽐也较好。

snappy是google基于LZ77的思路编写的快速数据压缩与解压程序库。它的⽬标并⾮最⼤压缩率或与其他压缩程序库的兼容性,⽽是

⾮常⾼的速度和合理的压缩率。

算法实现

//snappy

解压缩

public static String snappyUnCompress(String str){

ByteArrayOutputStream baos = new ByteArrayOutputStream();

try{

byte[] decode = Base64.getDecoder().decode(str.getBytes());

baos.write(Snappy.uncompress(decode));

return baos.toString();

}catch (Exception e){

throw new RuntimeException("snappyUnCompressError", e);

}

}

//snappy

压缩

public static String snappyCompress(String str){

try{

byte[] compress = Snappy.compress(str);

return Base64.getEncoder().encodeToString(compress);

}catch (Exception e){

throw new RuntimeException("snappyCompressError", e);

}

}

上述实现都验证过解压缩结果正确。

算法性能测试

jdk 1.8.0_144

maven相关依赖

snappy-java

1.1.2.6

4

lz4

1.2.0

lzo-core

1.0.5

commons-logging

commons-logging

代码如下

File ctqRequest = new File("D:Usersj_ties");

StringBuffer sb = new StringBuffer();

try {

FileInputStream fin = new FileInputStream(ctqRequest);

InputStreamReader reader = new InputStreamReader(fin, "UTF-8");

while (reader.ready()) {

sb.append((char) reader.read());

}

fin.close();

fin.close();

} catch (IOException ex) {

ex.printStackTrace();

} finally {

}

//1029265

System.out.println("originalLength:" + sb.toString().length());

/**

* 194292

* 48.23

*/

long start = System.currentTimeMillis();

for(int i = 0;i < 100;i++){

String gzipCompress = ZipUtil.gzipCompress(sb.toString());

ZipUtil.gzipUnCompress(gzipCompress);

System.out.println("gzipLength:" + gzipCompress.length());

}

System.out.println("gzipCostAverageTime:" + (System.currentTimeMillis() - start) / 100.0);

/**

* 194276

* 31.46

*/

start = System.currentTimeMillis();

for(int i = 0;i < 100;i++) {

String deflateCompress = ZipUtil.deflateCompress(sb.toString());

ZipUtil.deflateUnCompress(deflateCompress);

System.out.println("deflateLength:" + deflateCompress.length());

}

System.out.println("deflateCostAverageTime:" + (System.currentTimeMillis() - start) / 100.0);

/**

* 391508

* 20.0

*/

start = System.currentTimeMillis();

for(int i = 0;i < 100;i++) {

String lzoCompress = ZipUtil.lzoCompress(sb.toString());

ZipUtil.lzoUnCompress(lzoCompress);

System.out.println("lzoLength:" + lzoCompress.length());

}

System.out.println("lzoCostAverageTime:" + (System.currentTimeMillis() - start) / 100.0);

/**

* 977604

* 20.01

*/

start = System.currentTimeMillis();

for(int i = 0;i < 100;i++) {

String lzo4Compress = ZipUtil.lz4Compress(sb.toString());

ZipUtil.lz4UnCompress(lzo4Compress);

System.out.println("lz4Length:" + lzo4Compress.length());

}

System.out.println("lz4CostAverageTime:" + (System.currentTimeMillis() - start) / 100.0);

/**

* 360388

* 10.74

*/

start = System.currentTimeMillis();

for(int i = 0;i < 100;i++) {

String snappyCompress = ZipUtil.snappyCompress(sb.toString());

ZipUtil.snappyUnCompress(snappyCompress);

System.out.println("snappyLength:" + snappyCompress.length());

}

System.out.println("snappyCostAverageTime:" + (System.currentTimeMillis() - start) / 100.0);