2023年11月29日发(作者:)
JAVA常⽤压缩算法底层原理性能⽐较
压缩算法
⽬前常⽤的⼏个压缩算法
GZIP,⼀个压缩⽐⾼的慢速算法,压缩后的数据适合长期使⽤。 JDK中的putStream / GZIPOutputStream
便是这个算法的实现。
deflate,zip⽂件⽤的就是这⼀算法。与gzip的不同之处在于,你可以指定算法 的压缩级别,这样你可以在压缩时间和输出⽂件⼤
⼩上进⾏平衡。可选的级别有0(不压缩),以及1(快速压缩)到9(慢速压缩)。它的实现是
erOutputStream / InflaterInputStream。
LZ0算法,⼀个⽤ANSI C语⾔编写的实时压缩解压。解压并不需要内存的⽀持。即使使⽤⾮常⼤的压缩⽐例进⾏缓慢压缩出的数据,
要进⾏Huffman编码,⾸先要把整个⽂件读⼀遍,在读的过程中,统计每个符号(我们把字节的256种值看作是256种符号)的出现
次数。然后根据符号的出现次数,建⽴Huffman树,通过Huffman树得到每个符号的新的编码。
把所有符号看成是⼀个结点,并且该结点的值为它的出现次数。进⼀步把这些结点看成是只有⼀个结点的树。
private static final String GZIP_ENCODE_UTF_8 = "UTF-8";
//GZip
解压缩
public static String gzipUnCompress(String inputString){
byte[] decode = Base64.getDecoder().decode(inputString);
return unCompress(decode, GZIP_ENCODE_UTF_8);
}
public static String unCompress(byte[] bytes, String encoding){
if(bytes == null || bytes.length == 0){
return null;
}
ByteArrayOutputStream out = new ByteArrayOutputStream();
ByteArrayInputStream in = new ByteArrayInputStream();
try{
GZIPInputStream ungzip = new GZIPInputStream(in);
byte[] buffer = new byte[256];
int n;
while((n = ungzip.read(buffer)) >= 0){
out.write(buffer, 0, n);
}
return out.toString(encoding);
}catch (Exception e){
throw new RuntimeException("GzipUnCompressError", e);
}
}
//Gzip
压缩
public static String gzipCompress(String original){
return Base64.getEncoder().encodeToString(compress(original, GZIP_ENCODE_UTF_8));
}
public static byte[] compress(String str, String encoding){
if(str == null || str.length() == 0){
return null;
}
ByteArrayOutputStream out = new ByteArrayOutputStream();
GZIPOutputStream gzip ;
try{
gzip = new GZIPOutputStream(out);
gzip.write(str.getBytes(encoding));
gzip.close();
}catch (Exception e){
throw new RuntimeException("GzipCompressError", e);
}
return out.toByteArray();
}
Deflate算法
deflate是zip压缩⽂件的默认算法. 其实deflate现在不光⽤在zip⽂件中, 在7z, xz等其他的压缩⽂件中都⽤. 实际上deflate只是⼀种压
缩数据流的算法. 任何需要流式压缩的地⽅都可以⽤.
deflate算法就是基于LZ77算法和Huffman编码基础上实现的
算法实现
//deflate
解压缩
public static String deflateUnCompress(String inputString){
byte[] bytes = Base64.getDecoder().decode(inputString);
if(bytes == null || bytes.length == 0){
return null;
}
ByteArrayOutputStream out = new ByteArrayOutputStream();
ByteArrayInputStream in = new ByteArrayInputStream(bytes);
try{
InflaterInputStream inflater = new InflaterInputStream(in);
byte[] buffer = new byte[256];
int n;
while((n = inflater.read(buffer)) >= 0){
out.write(buffer, 0, n);
}
return out.toString("utf-8");
}catch (Exception e){
throw new RuntimeException("DeflaterUnCompressError", e);
}
}
//deflate
压缩
public static String deflateCompress(String original){
if(original == null || original.length() == 0){
return null;
}
ByteArrayOutputStream out = new ByteArrayOutputStream();
DeflaterOutputStream deflater ;
try{
deflater = new DeflaterOutputStream(out);
deflater.write(original.getBytes(StandardCharsets.UTF_8));
deflater.close();
return Base64.getEncoder().encodeToString(out.toByteArray());
}catch (Exception e){
throw new RuntimeException("DeflaterCompressError", e);
}
}
Java中DeflateOutputStream和GZIPOutputStream区别
DeflaterOutputStream实现原始deflate压缩⽅法。GZIPOutputStream为GZIP添加了额外的逻辑: CRC-32 检查,GZIP幻
数,GZIP头,预告⽚等
lzo算法
算法原理
LZO是块压缩算法,它压缩和解压⼀个块数据。压缩和解压的块⼤⼩必须⼀样。
LZO将块数据压缩成匹配数据(滑动字典)和⾮匹配的⽂字序列。LZO对于长匹配和长⽂字序列有专门的处理,这样对于⾼冗余的数据
能够获得很好的效果,这样对于不可压缩的数据,也能得到较好的效果。
底层使⽤的也是LZ77算法。
算法实现
//lzo
解压缩
public static String lzoUnCompress(String str){
LzoDecompressor decompressor = LzoLibrary.getInstance()
.newDecompressor(LzoAlgorithm.LZO1X, null);
try{
ByteArrayOutputStream os = new ByteArrayOutputStream();
ByteArrayInputStream is = new ByteArrayInputStream(Base64.getDecoder().decode(str.getBytes(StandardCharsets.UTF_8)));
LzoInputStream lis = new LzoInputStream(is, decompressor);
int count;
byte[] buffer = new byte[256];
while((count = lis.read(buffer)) != -1){
os.write(buffer, 0, count);
}
return os.toString();
}catch (Exception e){
throw new RuntimeException("lzoUnCompressError", e);
}
}
public static String lzoCompress(String str){
LzoCompressor compressor = LzoLibrary.getInstance().newCompressor(
LzoAlgorithm.LZO1X, null);
ByteArrayOutputStream os = new ByteArrayOutputStream();
LzoOutputStream louts = new LzoOutputStream(os, compressor);
try{
louts.write(str.getBytes(StandardCharsets.UTF_8));
louts.close();
return Base64.getEncoder().encodeToString(os.toByteArray());
}catch (Exception e){
throw new RuntimeException("LzoCompressError", e);
}
}
lz4算法
压缩⽐较差
算法实现
//lz4
解压缩
public static String lz4UnCompress(String str){
byte[] decode = Base64.getDecoder().decode(str.getBytes());
ByteArrayOutputStream baos = new ByteArrayOutputStream();
try{
LZ4BlockInputStream lzis = new LZ4BlockInputStream(
new ByteArrayInputStream(decode));
int count;
byte[] buffer = new byte[2048];
while ((count = lzis.read(buffer)) != -1) {
baos.write(buffer, 0, count);
}
lzis.close();
return baos.toString("utf-8");
}catch (Exception e){
throw new RuntimeException("lz4UnCompressError", e);
}
}
//lz4
压缩
public static String lz4Compress(String str){
LZ4Factory factory = LZ4Factory.fastestInstance();
ByteArrayOutputStream byteOutput = new ByteArrayOutputStream();
LZ4Compressor compressor = factory.fastCompressor();
try{
LZ4BlockOutputStream compressedOutput = new LZ4BlockOutputStream(
byteOutput, 2048, compressor);
compressedOutput.write(str.getBytes(StandardCharsets.UTF_8));
compressedOutput.close();
return Base64.getEncoder().encodeToString(byteOutput.toByteArray());
}catch (Exception e){
throw new RuntimeException("lz4CompressError", e);
}
}
snappy算法
从后⾯的性能测试上可以得出初步结论,snappy的压缩解压性能是⾮常优秀的,压缩⽐也较好。
snappy是google基于LZ77的思路编写的快速数据压缩与解压程序库。它的⽬标并⾮最⼤压缩率或与其他压缩程序库的兼容性,⽽是
⾮常⾼的速度和合理的压缩率。
算法实现
//snappy
解压缩
public static String snappyUnCompress(String str){
ByteArrayOutputStream baos = new ByteArrayOutputStream();
try{
byte[] decode = Base64.getDecoder().decode(str.getBytes());
baos.write(Snappy.uncompress(decode));
return baos.toString();
}catch (Exception e){
throw new RuntimeException("snappyUnCompressError", e);
}
}
//snappy
压缩
public static String snappyCompress(String str){
try{
byte[] compress = Snappy.compress(str);
return Base64.getEncoder().encodeToString(compress);
}catch (Exception e){
throw new RuntimeException("snappyCompressError", e);
}
}
上述实现都验证过解压缩结果正确。
算法性能测试
jdk 1.8.0_144
maven相关依赖
代码如下
File ctqRequest = new File("D:Usersj_ties");
StringBuffer sb = new StringBuffer();
try {
FileInputStream fin = new FileInputStream(ctqRequest);
InputStreamReader reader = new InputStreamReader(fin, "UTF-8");
while (reader.ready()) {
sb.append((char) reader.read());
}
fin.close();
fin.close();
} catch (IOException ex) {
ex.printStackTrace();
} finally {
}
//1029265
System.out.println("originalLength:" + sb.toString().length());
/**
* 194292
* 48.23
*/
long start = System.currentTimeMillis();
for(int i = 0;i < 100;i++){
String gzipCompress = ZipUtil.gzipCompress(sb.toString());
ZipUtil.gzipUnCompress(gzipCompress);
System.out.println("gzipLength:" + gzipCompress.length());
}
System.out.println("gzipCostAverageTime:" + (System.currentTimeMillis() - start) / 100.0);
/**
* 194276
* 31.46
*/
start = System.currentTimeMillis();
for(int i = 0;i < 100;i++) {
String deflateCompress = ZipUtil.deflateCompress(sb.toString());
ZipUtil.deflateUnCompress(deflateCompress);
System.out.println("deflateLength:" + deflateCompress.length());
}
System.out.println("deflateCostAverageTime:" + (System.currentTimeMillis() - start) / 100.0);
/**
* 391508
* 20.0
*/
start = System.currentTimeMillis();
for(int i = 0;i < 100;i++) {
String lzoCompress = ZipUtil.lzoCompress(sb.toString());
ZipUtil.lzoUnCompress(lzoCompress);
System.out.println("lzoLength:" + lzoCompress.length());
}
System.out.println("lzoCostAverageTime:" + (System.currentTimeMillis() - start) / 100.0);
/**
* 977604
* 20.01
*/
start = System.currentTimeMillis();
for(int i = 0;i < 100;i++) {
String lzo4Compress = ZipUtil.lz4Compress(sb.toString());
ZipUtil.lz4UnCompress(lzo4Compress);
System.out.println("lz4Length:" + lzo4Compress.length());
}
System.out.println("lz4CostAverageTime:" + (System.currentTimeMillis() - start) / 100.0);
/**
* 360388
* 10.74
*/
start = System.currentTimeMillis();
for(int i = 0;i < 100;i++) {
String snappyCompress = ZipUtil.snappyCompress(sb.toString());
ZipUtil.snappyUnCompress(snappyCompress);
System.out.println("snappyLength:" + snappyCompress.length());
}
System.out.println("snappyCostAverageTime:" + (System.currentTimeMillis() - start) / 100.0);


发布评论