Author: "Xia, Wen" / Publisher: ieee / Topic: back up systems - Searchworks@Jio Institute Digital Library Search Results

Your search for "Xia, Wen" returned 4 results.

1. DARE: A Deduplication-Aware Resemblance Detection and Elimination Scheme for Data Reduction with Low Overheads.

Author: Xia, Wen; Jiang, Hong; Feng, Dan; and Tian, Lei
Subjects: *DATA reduction, *DATA structures, *BACK up systems, *REDUNDANCY in engineering, *FEATURE extraction, *RANDOM access memory
Abstract: Data reduction has become increasingly important in storage systems due to the explosive growth of digital data in the world that has ushered in the big data era. One of the main challenges facing large-scale data reduction is how to maximally detect and eliminate redundancy at very low overheads. In this paper, we present DARE, a low-overhead deduplication-aware resemblance detection and elimination scheme that effectively exploits existing duplicate-adjacency information for highly efficient resemblance detection in deduplication-based backup/archiving storage systems. The main idea behind DARE is to employ a scheme, called Duplicate-Adjacency based Resemblance Detection (DupAdj), that considers any two data chunks to be similar (i.e., candidates for delta compression) if their respective adjacent data chunks are duplicates in a deduplication system, and then to further enhance the resemblance detection efficiency with an improved super-feature approach. Our experimental results based on real-world and synthetic backup datasets show that DARE consumes only about 1/4 and 1/2, respectively, of the computation and indexing overheads required by the traditional super-feature approaches while detecting 2-10 percent more redundancy and achieving a higher throughput, by exploiting existing duplicate-adjacency information for resemblance detection and finding the “sweet spot” for the super-feature approach. [ABSTRACT FROM PUBLISHER]
Published: 2016
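The duplicate-adjacency idea in the abstract above can be illustrated with a minimal sketch. It is not the DARE implementation: the function name `dupadj_candidates`, the representation of each backup stream as a list of chunk fingerprints, and the choice to check only the immediate left and right neighbours are assumptions made for illustration.

```python
def dupadj_candidates(old_fps, new_fps):
    """Pair up non-duplicate chunks whose neighbours are duplicates.

    old_fps, new_fps: lists of chunk fingerprints (hypothetical inputs),
    one list per backup stream, in stream order.
    Returns sorted (new_index, old_index) pairs that would be treated as
    resemblance candidates for delta compression.
    """
    # Index the old stream by fingerprint (first occurrence only).
    old_pos = {}
    for j, fp in enumerate(old_fps):
        old_pos.setdefault(fp, j)

    candidates = set()
    for i, fp in enumerate(new_fps):
        j = old_pos.get(fp)
        if j is None:
            continue  # not a duplicate, so it gives no adjacency hint
        # Look at the chunk just before and just after the duplicate pair.
        for step in (-1, 1):
            ni, oj = i + step, j + step
            if 0 <= ni < len(new_fps) and 0 <= oj < len(old_fps):
                if new_fps[ni] not in old_pos:
                    candidates.add((ni, oj))
    return sorted(candidates)
```

With `old_fps = ["a", "b", "c", "d"]` and `new_fps = ["a", "x", "c", "d"]`, the only non-duplicate chunk `"x"` sits between two duplicates, so it is paired with the old chunk `"b"` at the same relative position.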

2. Similarity and Locality Based Indexing for High Performance Data Deduplication.

Author: Xia, Wen; Jiang, Hong; Feng, Dan; and Hua, Yu
Subjects: *DATA transmission systems, *INFORMATION sharing, *BACK up systems, *DATA warehousing, *PROBABILITY theory
Abstract: Data deduplication has gained increasing attention and popularity as a space-efficient approach in backup storage systems. One of the main challenges for centralized data deduplication is the scalability of fingerprint-index search. In this paper, we propose SiLo, a near-exact and scalable deduplication system that effectively and complementarily exploits similarity and locality of data streams to achieve high duplicate elimination, high throughput, and well-balanced load at extremely low RAM overhead. The main idea behind SiLo is to expose and exploit more similarity by grouping strongly correlated small files into a segment and segmenting large files, and to leverage the locality in the data stream by grouping contiguous segments into blocks to capture similar and duplicate data missed by the probabilistic similarity detection. SiLo also employs a locality-based stateless routing algorithm to parallelize and distribute data blocks to multiple backup nodes. By judiciously enhancing similarity through the exploitation of locality and vice versa, SiLo is able to significantly reduce RAM usage for index lookup, achieve near-exact duplicate-elimination efficiency, maintain a high deduplication throughput, and obtain load balance among backup nodes. [ABSTRACT FROM PUBLISHER]
Published: 2015
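The grouping step described in the abstract above (small files packed into a segment, large files split into segments, contiguous segments grouped into blocks) might look roughly like the following sketch. The function names, the byte-based size limit, and the fixed number of segments per block are assumptions for illustration; the abstract does not specify them, and the similarity detection and routing stages are not shown.

```python
def make_segments(files, segment_size):
    """Pack small files together and split large files, in stream order.

    files: list of bytes objects (hypothetical input stream).
    segment_size: target segment size in bytes (assumed parameter).
    """
    segments, current = [], b""
    for data in files:
        if len(data) >= segment_size:
            # Flush pending small files first so stream order is preserved.
            if current:
                segments.append(current)
                current = b""
            # Split the large file into fixed-size pieces.
            for off in range(0, len(data), segment_size):
                segments.append(data[off:off + segment_size])
        else:
            # Accumulate small files until the segment is full.
            current += data
            if len(current) >= segment_size:
                segments.append(current)
                current = b""
    if current:
        segments.append(current)
    return segments


def make_blocks(segments, segments_per_block):
    """Group contiguous segments into blocks to preserve stream locality."""
    return [segments[i:i + segments_per_block]
            for i in range(0, len(segments), segments_per_block)]
```

For example, `make_segments([b"aa", b"bb", b"c" * 10, b"d"], 4)` packs the two small files into one segment and splits the ten-byte file into three, and `make_blocks(..., 2)` then groups neighbouring segments in pairs.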

3. A Fast Asymmetric Extremum Content Defined Chunking Algorithm for Data Deduplication in Backup Storage Systems.

Author: Zhang, Yucheng; Feng, Dan; Jiang, Hong; Xia, Wen; Fu, Min; Huang, Fangting; and Zhou, Yukun
Subjects: *DATA compression, *BACK up systems, *COMPUTER storage devices, *COMPUTER algorithms, *ROBUST control
Abstract: Chunk-level deduplication plays an important role in backup storage systems. Existing Content-Defined Chunking (CDC) algorithms, while robust in finding suitable chunk boundaries, face the key challenges of (1) low chunking throughput that renders the chunking stage a serious deduplication performance bottleneck, (2) large chunk size variance that decreases deduplication efficiency, and (3) being unable to find proper chunk boundaries in low-entropy strings and thus failing to deduplicate these strings. To address these challenges, this paper proposes a new CDC algorithm called the Asymmetric Extremum (AE) algorithm. The main idea behind AE is based on the observation that the extreme value in an asymmetric local range is not likely to be replaced by a new extreme value in dealing with the boundaries-shifting problem. As a result, AE has higher chunking throughput, smaller chunk size variance than the existing CDC algorithms, and is able to find proper chunk boundaries in low-entropy strings. The experimental results based on real-world datasets show that AE improves the throughput performance of the state-of-the-art CDC algorithms by more than $2.3\times$, which is fast enough to remove the chunking-throughput performance bottleneck of deduplication, and accelerates the system throughput by more than 50 percent, while achieving comparable deduplication efficiency. [ABSTRACT FROM AUTHOR]
Published: 2017
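One way to read the asymmetric-extremum idea in the abstract above is sketched below. The details are assumptions, not taken from the abstract: values are compared one byte at a time, the extremum is a maximum, and a boundary is declared once `w` consecutive bytes after the current maximum have failed to exceed it. The names `ae_cut` and `ae_chunks` and the parameter `w` are hypothetical.

```python
def ae_cut(data: bytes, start: int, w: int) -> int:
    """Return the end index (exclusive) of the chunk beginning at `start`.

    The left side of the extremum is variable-length and the right side is
    a fixed window of w bytes, which is what makes the range asymmetric.
    """
    max_val, max_pos = data[start], start
    i = start + 1
    while i < len(data):
        if data[i] <= max_val:
            # The maximum has survived w bytes to its right: cut here.
            if i == max_pos + w:
                return i + 1
        else:
            # A strictly larger byte becomes the new extremum.
            max_val, max_pos = data[i], i
        i += 1
    return len(data)  # no boundary found; the rest is the last chunk


def ae_chunks(data: bytes, w: int):
    """Split data into content-defined chunks using ae_cut."""
    chunks, start = [], 0
    while start < len(data):
        end = ae_cut(data, start, w)
        chunks.append(data[start:end])
        start = end
    return chunks
```

Because ties do not replace the extremum, a run of identical bytes still produces boundaries: `ae_chunks(bytes(10), 3)` yields chunks of 4, 4, and 2 bytes, which matches the abstract's point about low-entropy strings.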

4. Reducing Fragmentation for In-line Deduplication Backup Storage via Exploiting Backup History and Cache Knowledge.

Author: Fu, Min; Feng, Dan; Hua, Yu; He, Xubin; Chen, Zuoning; Liu, Jingning; Xia, Wen; Huang, Fangting; and Liu, Qing
Subjects: *BACK up systems, *CACHE memory, *ALGORITHM research, *DATA replication, *STORAGE fragmentation (Computer science)
Abstract: In backup systems, the chunks of each backup are physically scattered after deduplication, which causes a challenging fragmentation problem. We observe that the fragmentation takes the form of sparse and out-of-order containers. The sparse container decreases restore performance and garbage collection efficiency, while the out-of-order container decreases restore performance if the restore cache is small. In order to reduce the fragmentation, we propose the History-Aware Rewriting algorithm (HAR) and the Cache-Aware Filter (CAF). HAR exploits historical information in backup systems to accurately identify and reduce sparse containers, and CAF exploits restore cache knowledge to identify the out-of-order containers that hurt restore performance. CAF efficiently complements HAR in datasets where out-of-order containers are dominant. To reduce the metadata overhead of the garbage collection, we further propose a Container-Marker Algorithm (CMA) to identify valid containers instead of valid chunks. Our extensive experimental results from real-world datasets show that HAR significantly improves the restore performance by 2.84-175.36$\times$ at a cost of rewriting only 0.5-2.03 percent of the data. [ABSTRACT FROM PUBLISHER]
Published: 2016
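The history-aware part of the abstract above (use the previous backup to identify sparse containers, then rewrite the chunks that still point at them) can be sketched as follows. This is an illustration under assumptions, not the HAR algorithm itself: a backup recipe is modelled as a list of `(container_id, chunk_size)` references, a container counts as sparse when the backup references less than a `threshold` fraction of its capacity, and every name and parameter here is hypothetical.

```python
from collections import defaultdict


def sparse_containers(recipe, container_size, threshold):
    """Find containers that the given backup uses only sparsely.

    recipe: list of (container_id, chunk_size) pairs, one per referenced
    chunk (assumed to list each chunk once).
    """
    used = defaultdict(int)
    for cid, size in recipe:
        used[cid] += size
    return {cid for cid, n in used.items() if n / container_size < threshold}


def chunks_to_rewrite(next_recipe, sparse):
    """Indices of chunks in the next backup that reference sparse containers."""
    return [idx for idx, (cid, _) in enumerate(next_recipe) if cid in sparse]
```

In this sketch, a previous backup that uses only 100 bytes of a 1000-byte container marks that container as sparse at a threshold of 0.5, and any chunk in the next backup that would otherwise be deduplicated against it is selected for rewriting.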
