menu_book Explore the article's raw data

The Design of Fast Delta Encoding for Delta Compression Based Storage Systems

Abstract

Delta encoding is a data reduction technique capable of calculating the differences (i.e., delta) among very similar files and chunks. It is widely used for various applications, such as synchronization replication, backup/archival storage, cache compression, and so on. However, delta encoding is computationally costly due to its time-consuming word-matching operations for delta calculation. Existing delta encoding approaches either run at a slow encoding speed, such as Xdelta and Zdelta, or at a low compression ratio, such as Ddelta and Edelta. In this article, we propose Gdelta, a fast delta encoding approach with a high compression ratio. The key idea behind Gdelta is the combined use of five techniques: (1) employing an improved Gearbased rolling hash to replace Adler32 hash for fast scanning overlapping words of similar chunks, (2) adopting a quick array-based indexing for word-matching, (3) applying a sampling indexing scheme to reduce the cost of traditional building full indexes for base chunks' words, (4) skipping unmatched words to accelerate delta encoding through non-redundant areas, and (5) last but not least, after word-matching, further batch compressing the remainder to improve the compression ratio. Our evaluation results driven by seven real-world datasets suggest that Gdelta achieves encoding/decoding speedups of 3.5X similar to 25X over the classic Xdelta and Zdelta approaches while increasing the compression ratio by about 10%similar to 240%.

article Article
date_range 2024
language English
link Link of the paper
format_quote
Sorry! There is no raw data available for this article.
Loading references...
Loading citations...
Featured Keywords

No keywords available for this article.

Citations by Year

Share Your Research Data, Enhance Academic Impact