Descriptor: "Global Namespace" - Searchworks@Jio Institute Digital Library Search Results

335 results on '"Global Namespace"'

1. An End-to-End Learning-Based Metadata Management Approach for Distributed File Systems

Author: Yuanning Gao, Ruisi Zhang, Xiaofeng Gao, and Guihai Chen
Subjects: Computer science, Distributed computing, Load balancing (computing), File attribute, Theoretical Computer Science, Metadata, Computational Theory and Mathematics, Hardware and Architecture, Server, Metadata management, Cache, Distributed File System, Global Namespace, Software
Abstract: Current distributed file systems are designed to support PB-scale and even EB-scale data storage. Metadata service, which manages file attribute information and the global namespace tree, is crucial to system performance. Distributed metadata management, using multiple metadata servers (MDS's) to store metadata, provides effective approaches to alleviate the workload of a single server. However, maintaining good metadata locality and keeping the load balanced among MDS's is a nontrivial problem. In this paper, we present the first machine learning based model called DeepHash, which leverages the neural network to learn a locality preserving hashing (LPH) mapping scheme. DeepHash first converts the metadata nodes to feature vectors by the network embedding technology. Due to the absence of training labels, i.e., the hash values of metadata nodes, we design a pair loss function with distinctive characters to train DeepHash, and introduce the sampling strategy to improve the training efficiency. Besides, we propose an efficient algorithm to dynamically balance the workload and adopt the cache model to improve query efficiency. The experiments on the Amazon EC2 platform demonstrate that DeepHash can preserve metadata locality while maintaining a high degree of load balancing, which indicates the effectiveness and efficiency of DeepHash compared with traditional and state-of-the-art schemes.
Published: 2022
Full Text: View/download PDF

2. Creating Your First Map

Author: Svennerberg, Gabriel, Wade, Matt, editor, Andres, Clay, editor, Anglin, Steve, editor, Beckner, Mark, editor, Buckingham, Ewan, editor, Cornell, Gary, editor, Gennick, Jonathan, editor, Hassell, Jonathan, editor, Lowman, Michelle, editor, Moodie, Matthew, editor, Parkes, Duncan, editor, Pepper, Jeffrey, editor, Pohlmann, Frank, editor, Pundick, Douglas, editor, Renow-Clarke, Ben, editor, Shakeshaft, Dominic, editor, Welsh, Tom, editor, Tobin, Mary, editor, Blackwell, Jennifer L., editor, and Wimpsett, Kim, editor
Published: 2010
Full Text: View/download PDF

3. An Architecture for a Portable Grid-Enabled Engine

Author: Long, Bruce, Getov, Vladimir, Getov, Vladimir, editor, and Kielmann, Thilo, editor
Published: 2005
Full Text: View/download PDF

4. A Highly Reliable Metadata Service for Large-Scale Distributed File Systems

Author: Weiping Wang, Jiang Zhou, Shuibing He, Dan Meng, and Yong Chen
Subjects: File system, Computer science, Distributed computing, Directory, computer.software_genre, Replication (computing), Failover, Metadata, Computational Theory and Mathematics, Hardware and Architecture, Server, Signal Processing, Metadata management, Data_FILES, Global Namespace, computer
Abstract: Many massive data processing applications nowadays often need long, continuous, and uninterrupted data accesses. Distributed file systems are used as the back-end storage to provide the global namespace management and reliability guarantee. Due to increasing hardware failures and software issues with the growing system scale, metadata service reliability has become a critical issue as it has a direct impact on file and directory operations. Existing metadata management mechanisms can provide fault tolerance capability to some level but are inadequate. They often have limitations in system availability, state consistence, and performance overhead and lack an effective mechanism to offer metadata reliability. This paper introduces a novel highly reliable metadata service to address these issues in large-scale file systems. Different from traditional strategies, this proposed reliable metadata service adopts a new active-standby architecture for fault tolerance and uses a holistic approach to improve file system availability. A new shared storage pool (SSP) is designed for transparent metadata synchronization and replication between active and standby servers. Based on the SSP, a new policy called multiple actives multiple standbys (MAMS) is presented to perform metadata service recovery in case of failures. A new global state recovery strategy and a smart client fault tolerance mechanism are achieved to maintain the continuity of metadata service. We have implemented such highly reliable metadata service in a prototype file system CFS (Clover file system) and conducted extensive tests to evaluate it. Experimental results confirm that it can significantly improve file system reliability with fast failover under different failure scenarios while having negligible influence on performance. Compared with typical reliability designs in Hadoop Avatar, Hadoop HA, and Boom-FS file systems, the mean-time-to-recovery (MTTR) with the highly reliable metadata service was reduced by 80.23, 65.46 and 28.13 percent, respectively.
Published: 2020
Full Text: View/download PDF

5. Streamlining distributed Deep Learning I/O with ad hoc file systems

Author: Alberto Miranda, Marc-André Vef, Frederic Schimmelpfennig, André Brinkmann, Reza Salkhordeh, and Ramon Nou
Subjects: Data set, Workflow, Distributed database, Process (engineering), Computer science, business.industry, Deep learning, Distributed computing, Computer data storage, Data deduplication, Artificial intelligence, Global Namespace, business
Abstract: With evolving techniques to parallelize Deep Learning (DL) and the growing amount of training data and model complexity, High-Performance Computing (HPC) has become increasingly important for machine learning engineers. Although many compute clusters already use learning accelerators or GPUs, HPC storage systems are not suitable for the I/O requirements of DL workflows. Therefore, users typically copy the whole training data to the worker nodes or distribute partitions. Because DL depends on randomized input data, prior work stated that partitioning impacts DL accuracy. Their solutions focused mainly on training I/O performance on a high-speed network but did not cover the data stage-in process, for example. We show in this paper that, in practice, (unbiased) partitioning is not harmful for distributed DL accuracy. Nevertheless, manual partitioning can be error prone and inefficient. Typically, data must be unpacked and shuffled before it is distributed to nodes. We propose a solution that features both: efficient stage-in and fast access to a global namespace to prevent biases. Our architecture is based around an ad hoc storage system relying on a high-speed interconnect allowing an efficient stage-in of DL data sets into a single global namespace. Our proposed solution does not limit access to parts of the data set or rely on data duplication, also relieving the HPC storage system. We obtain high I/O performance during training and ensure minimal interference with communication of the learning workers. The optimizations are transparent to DL applications and their accuracy is not affected by our architecture.
Published: 2021
Full Text: View/download PDF

6. Detecting and understanding JavaScript global identifier conflicts on the web

Author: Mingxue Zhang and Wei Meng
Subjects: business.industry, Computer science, Frame (networking), 020207 software engineering, 02 engineering and technology, Object (computer science), JavaScript, computer.software_genre, Identifier, World Wide Web, Scripting language, 020204 information systems, 0202 electrical engineering, electronic engineering, information engineering, Code (cryptography), Web application, Global Namespace, business, computer, computer.programming_language
Abstract: JavaScript is widely used for implementing client-side web applications, and it is common to include JavaScript code from many different hosts. However, in a web browser, all the scripts loaded in the same frame share a single global namespace. As a result, a script may read or even overwrite the global objects or functions in other scripts, causing unexpected behaviors. For example, a script can redefine a function in a different script as an object, so that any call of that function would cause an exception at run time. We systematically investigate the client-side JavaScript code integrity problem caused by JavaScript global identifier conflicts in this paper. We developed a browser-based analysis framework, JSObserver, to collect and analyze the write operations to global memory locations by JavaScript code. We identified three categories of conflicts using JSObserver on the Alexa top 100K websites, and detected 145,918 conflicts on 31,615 websites. We reveal that JavaScript global identifier conflicts are prevalent and could cause behavior deviation at run time. In particular, we discovered that 1,611 redefined functions were called after being overwritten, and many scripts modified the value of cookies or redefined cookie-related functions. Our research demonstrated that JavaScript global identifier conflict is an emerging threat to both the web users and the integrity of web applications.
Published: 2020
Full Text: View/download PDF

7. Compiling Chapel

Author: Bradford L. Chamberlain
Subjects: Swift, Interactive programming, Programming language, Computer science, 05 social sciences, 050301 education, 02 engineering and technology, Python (programming language), computer.software_genre, Parallel language, Chapel, 0202 electrical engineering, electronic engineering, information engineering, 020201 artificial intelligence & image processing, Code generation, Compiler, Global Namespace, 0503 education, computer, computer.programming_language
Abstract: Chapel is a parallel language designed to support productive programming from laptops to supercomputers. Unlike typical general-purpose approaches to large-scale computing, Chapel supports a global namespace and a global view of control. These make programming at scale more productive and natural by avoiding the need to utilize a Single Program, Multiple Data (SPMD) approach. Chapel is being developed as portable, open-source software and has demonstrated its ability to match or beat C/C++/MPI/OpenMP performance using code that's as easy to read as Python or Swift. In this talk, I will introduce some of Chapel's key features for those who are new to the language. However, most of the talk will focus on unique aspects of compiling Chapel. I will illustrate the transformations and analyses we use to implement Chapel's global view of data and control. I will also describe some optimizations that make use of Chapel's feature set to improve performance without sacrificing ease-of-use. In doing so, I will argue that when designed well, a productive language can enhance a compiler's ability to achieve performance and scalability rather than thwarting it. I'll wrap up by touching on some upcoming challenges for the Chapel compiler, including code generation for GPUs, reducing compilation time, and supporting interactive programming.
Published: 2020
Full Text: View/download PDF

8. Pacon: Improving Scalability and Efficiency of Metadata Service through Partial Consistency

Author: Yutong Lu, Yubo Liu, Ming Zhao, and Zhiguang Chen
Subjects: Computer science, Distributed computing, Consistency model, 020207 software engineering, 02 engineering and technology, Metadata, Data sharing, Consistency (database systems), Server, Metadata management, Scalability, 0202 electrical engineering, electronic engineering, information engineering, 020201 artificial intelligence & image processing, Distributed File System, Global Namespace, BeeGFS
Abstract: Traditional distributed file systems (DFS) use centralized service to manage metadata. Many studies based on this centralized architecture enhanced metadata processing capability by scaling the metadata server cluster, which is however still difficult to keep up with the growing number of clients and the increasingly metadata-intensive applications. Some solutions abandoned the centralized metadata service and improved scalability by embedding a private metadata service in an HPC application, but these solutions are suitable for only some specific applications and the absence of global namespace makes data sharing and management difficult. This paper addresses the shortcomings of existing studies by optimizing the consistency model of client-side metadata cache for the HPC scenario using a novel partial consistency model. It provides the application with strong consistency guarantee for only its workspace, thus improving metadata scalability without adding hardware or sacrificing the versatility and manageability of DFSes. In addition, the paper proposes batch permission management to reduce path traversal overhead, thereby improving metadata processing efficiency. The result is a library (Pacon) that allows existing DFSes to achieve partial consistency for scalable and efficient metadata management. The paper also presents a comprehensive evaluation using intensive benchmarks and representative application. For example, in file creation, Pacon improves the performance of BeeGFS by more than 76.4 times, and outperforms the state-of-the-art metadata management solution (IndexFS) by more than 4.6 times.
Published: 2020
Full Text: View/download PDF

9. iStore: Towards the Optimization of Federation File Systems

Author: Awais Khan, Youngjae Kim, and Muhammad Attique
Subjects: General Computer Science, Computer science, Distributed computing, Big data, Cloud computing, 02 engineering and technology, computer.software_genre, geo-distributed edge computing, cluster storage and analysis, Server, 0202 electrical engineering, electronic engineering, information engineering, General Materials Science, Electrical and Electronic Engineering, Global Namespace, File system, Big data storage and HPC, business.industry, General Engineering, 020206 networking & telecommunications, Edge server, Metadata, Data sharing, Analytics, Computer data storage, 020201 artificial intelligence & image processing, lcsh:Electrical engineering. Electronics. Nuclear engineering, business, computer, lcsh:TK1-9971, Data migration
Abstract: With the growing volumes of data, many organizations are deploying geo-distributed edge servers and building federations atop edge servers to improve data sharing, effective collaborations, and analytics. Multiple federation file systems are designed to satisfy such needs, but due to application-specific architectures, these federations neglect some important features that can improve the overall federation performance. In this paper, we address the important challenges of federated file systems, in particular, global namespace, optimal data placement and analysis, efficient data migration across edge servers, and metadata optimizations. To further investigate these challenges, we prototyped the federation file system iStore to emulate the federation and showed the significance of the aforementioned key challenges in the federation. iStore provides a unified global namespace atop geo-distributed edge servers with a generic job and resource-aware data storage and placement algorithm (JRAP), which minimizes the job execution time by considering resources at each edge server. Furthermore, to enable effective data migration, we employed direct channel file layout-aware data transfer and designed a batch-based metadata scheme for federations to reduce the metadata contention with increasing clients. We evaluated the efficacy of various big data applications from data generation to storage and analysis using iStore on a real testbed and in simulation.
Published: 2019

10. On Enabling Technologies for the Internet of Important Things

Author: Marten Lohstroh, Beth Osyk, John C. Eidson, Edward A. Lee, Chadlia Jerad, and Hokeun Kim
Subjects: General Computer Science, Computer science, Internet of Things, Cloud computing, 02 engineering and technology, Computer security, computer.software_genre, edge computing, 0202 electrical engineering, electronic engineering, information engineering, Information system, General Materials Science, Global Namespace, software safety, Authentication, business.industry, Cyber-physical systems, General Engineering, 020206 networking & telecommunications, 020207 software engineering, Information security, real-time systems, The Internet, fault tolerance, lcsh:Electrical engineering. Electronics. Nuclear engineering, Communications protocol, business, computer, lcsh:TK1-9971
Abstract: The Internet of Things leverages Internet technology in cyber-physical systems (CPSs), but the protocols and principles of the Internet were designed for interacting with information systems, not cyber-physical systems. For one, timeliness is not a factor in any widespread Internet technology, with quality-of-service features having been routinely omitted for decades. In addition, for things, safety, freedom from physical harm, is even more important than information security, the focus on the Internet. Nevertheless, properties of the Internet are valuable in CPSs, including a global namespace, reliable (eventual) delivery of messages, end-to-end security through asymmetric encryption, certificate-based authentication, and the ability to aggregate data from a multiplicity of sources in the cloud. This paper discusses and surveys architectural approaches, communication protocols, and programming models that promise to bridge the gap, enabling the use of the Internet technologies even in safety-critical, cyber-physical applications such as factory automation and transportation. Specifically, we argue that smart gateways hosted on edge computers complement cloud-based services; they can provide tighter control over timing and security that is robust against network outages, play an active role in managing interactions between things, and isolate safety-critical services from best-effort services. We explain how time sensitive network technology can be leveraged to reliably orchestrate a multiplicity of things, and how augmenting our programming models with a well-defined notion of time can make systems more deterministic and more testable.
Published: 2019

11. A forensic method for efficient file extraction in HDFS based on three-level mapping

Author: Binglong Li and Yuanzhao Gao
Subjects: File system, Multidisciplinary, Database, Computer science, Computer file, 020207 software engineering, 02 engineering and technology, computer.file_format, computer.software_genre, Unix file types, Virtual file system, Torrent file, Self-certifying File System, 020204 information systems, Data_FILES, 0202 electrical engineering, electronic engineering, information engineering, Operating system, Global Namespace, computer, File system fragmentation
Abstract: The large scale and distribution of cloud computing storage have become the major challenges in cloud forensics for file extraction. Current disk forensic methods do not adapt to cloud computing well and the forensic research on distributed file system is inadequate. To address the forensic problems, this paper uses the Hadoop distributed file system (HDFS) as a case study and proposes a forensic method for efficient file extraction based on three-level (3L) mapping. First, HDFS is analyzed from overall architecture to local file system. Second, the 3L mapping of an HDFS file from HDFS namespace to data blocks on local file system is established and a recovery method for deleted files based on 3L mapping is presented. Third, a multi-node Hadoop framework via Xen virtualization platform is set up to test the performance of the method. The results indicate that the proposed method could succeed in efficient location of large files stored across data nodes, make selective image of disk data and get high recovery rate of deleted files.
Published: 2017
Full Text: View/download PDF

12. Poster

Author: Wei Meng, Mingxue Zhang, and Yi Wang
Subjects: Computer science, business.industry, 020207 software engineering, 02 engineering and technology, Web developer, JavaScript, computer.software_genre, Global variable, World Wide Web, Scripting language, 020204 information systems, Web page, Like button, 0202 electrical engineering, electronic engineering, information engineering, Web application, Global Namespace, business, computer, computer.programming_language
Abstract: Including JavaScript code from many different hosts is a popular practice in developing web applications. For example, to include a social plugin like the Facebook Like button, a web developer needs to only include a script from facebook.net in her/his web page. However, in a web browser, all the identifiers (i.e., variable names and function names) in scripts loaded in the same frame share a single global namespace. Therefore, a script can overwrite any of the global variables and/or global functions defined in another script, causing unexpected behavior. In this work, we develop a browser-based dynamic analysis framework, that monitors and records any writes to JavaScript global variables and global functions. Our tool is able to cover all the code executed in the run time. We detected 778 conflicts across the Alexa top 1K websites. Our results show that global name conflicts can indeed expose web applications to security risks.
Published: 2019
Full Text: View/download PDF

13. THE PROBLEM OF SELECTION OF INDICATORS, REFLECTING THE PROPORTION OF HIGH-TECH PRODUCTS IN THE GLOBAL MARKET

Author: O. V. Cherchenko, V. G. Zinov, and N. G. Kurakova
Subjects: foreign industrial companies, Institutionalisation, foreign patent offices, patents, Post-industrial society, International trade, Intellectual property, science and technology policy, author, patent documents, priority of the russian federation, Economics, Mainstream, Global Namespace, HB71-74, priority directions, lcsh:HB71-74, business.industry, the global market for high-tech products, residents of the russian federation, lcsh:Economics as a science, intellectual property, A share, High tech, indicators, Commerce, Economics as a science, Portfolio, nanotechnologies, business
Abstract: Based on the results of implementing the priority area «industry of nanosystems», the article analyzes the use of mechanisms for protecting intellectual property rights in the global namespace. It argues that only intellectual property identifies the prospects for capturing a share of the global market for science-intensive products. Therefore, global technological leadership in the postindustrial society is captured and consolidated primarily by institutionalizing it in the system of world intellectual property standards. It notes that reports by UNESCO and the World Intellectual Property Organization (WIPO), published in 2015, recognized the Program of nanoindustry development in the Russian Federation as ineffective by the criterion of low patent activity. Using patent analysis data for three technological fields of nanotechnology highlighted by WIPO as mainstream, the authors estimate the share of Russian patents in the world portfolio in each of these areas. The article concludes that the indicator «the number of patents with a priority of the Russian Federation, obtained in foreign countries» is fundamentally relevant for the priority areas when assessing the country’s share in the global market for high-tech products.
Published: 2017
Full Text: View/download PDF

14. Building a high-performance resilient scalable storage cluster for CORAL using IBM ESS

Author: Gautam Shah and Rezaul Islam
Subjects: File server, General Computer Science, Computer science, Scalability, InfiniBand, Operating system, IBM, Erasure code, computer.software_genre, Global Namespace, Supercomputer, computer, Block (data storage)
Abstract: A high-performance, scalable, and resilient storage subsystem is essential for delivering and maintaining consistent performance and high utilization expected from a modern supercomputer. IBM delivered two systems under the CORAL program, both of which used IBM Spectrum Scale and IBM Elastic Storage Server (ESS) as the storage solution. The larger of the two CORAL clusters is composed of 77 building blocks of ESS, each of which consists of a pair of high-performance I/O Server nodes connected to four high-density storage enclosures. These ESS building blocks are interconnected via a redundant InfiniBand EDR network to form a storage cluster that provides a global namespace aggregating performance over 32,000 commodity disks. The IBM Spectrum Scale for ESS runs high-performance erasure coding on each building block and provides a single global name space across all the building blocks. The IBM Spectrum Scale features deliver a highly resilient, high-performance storage subsystem using ESS. These features include recent improvements for efficient buffer management and fast efficient low-latency communication. CORAL I/O performance results include large-block streaming throughput of over 2.4 TB/s, ability to create over 1 M 32-KB files per second, and enabling an aggregate rate of 30 K zero-length file creates per second in a shared directory from multiple nodes. This article describes the design and implementation of the ESS storage cluster; the innovations required to meet the performance, scale, manageability, and reliability goals; and challenges we had to overcome as we deployed a system of such unprecedented I/O capabilities.
Published: 2020
Full Text: View/download PDF

15. A Universal Namespace Approach to Support Metadata Management and Efficient Data Convergence of HPC and Cloud Scientific Workflows

Author: Hsing-bung Hb Chen
Subjects: File system, Computer science, business.industry, Distributed computing, Cloud computing, computer.software_genre, Metadata, Object storage, Data access, Extended file attributes, Metadata management, Data_FILES, Namespace, business, Global Namespace, computer
Abstract: In this paper, we present an innovative namespace architecture called the Universal Namespace (UNS). UNS enhances both the local and global namespace with portable, mobile, and exchangeable properties and supports fetch-from-anywhere storage systems without requiring remote mounting file systems. UNS exploits the extended file attributes of file systems and enhances file system metadata with unified remote data access information and methods. We design the proposed UNS to support big data computing on multiple geolocation-based shared storage systems. We describe the innovative concepts of the proposed UNS’s architecture and demonstrate UNS’s performance results from various test cases. Finally, we summarize the benefits and advantages of using UNS in supporting big data convergence of HPC and Cloud scientific workflows. Furthermore, we present the future design and developmental direction of UNS.
Published: 2018
Full Text: View/download PDF

16. UNS: A Portable, Mobile, and Exchangeable Namespace for Supporting Fetch-from-Anywhere Big Data Eco-Systems

Author: Hsing-bung Chen, Qiang Guan, and Song Fu
Subjects: File system, business.industry, Computer science, Big data, computer.software_genre, Metadata, Data access, Extended file attributes, Scalability, Data_FILES, Operating system, Namespace, Global Namespace, business, computer
Abstract: Modern global file systems such as parallel file systems or distributed file systems normally use a global namespace (GNS) to support viewing and accessing files that are independent of their physical storage locations. Using GNS support, we can seamlessly modify and reconfigure physical data storage without affecting how users view and access it. However, with the rapid growth of data size, the current design and implementations of the GNS system are experiencing problems of manageability and scalability. In this paper, we present a new namespace architecture called the Universal Namespace (UNS). UNS enhances both the local and global namespace with portable, parallel data mobile, and exchangeable properties and supports fetch-from-anywhere storage systems without requiring remote mounting. UNS exploits the extended file attributes of file systems and enhances file system metadata with unified remote data access information and methods. We design the proposed UNS to support big data computing on multiple geolocation-based shared storage systems. We describe the innovative concepts of the proposed UNS's architecture and demonstrate UNS's early performance results from various test cases. Finally, we summarize the benefits and advantages of using UNS in supporting remote data sharing for extreme scale computing and big data computing. Furthermore, we present the future design and developmental direction of UNS.
Published: 2018
Full Text: View/download PDF

17. Cudele: An API and Framework for Programmable Consistency and Durability in a Global Namespace

Author: Noah Watkins, Ivo Jimenez, Shel Finkelstein, Patrick Donnelly, Jeff LeFevre, Peter Alvaro, Carlos Maltzahn, and Michael A. Sevilla
Subjects: File system, Speedup, business.industry, Computer science, Serialization, Strong consistency, 02 engineering and technology, computer.software_genre, Durability, Consistency (database systems), POSIX, 020204 information systems, Server, Synchronization (computer science), Data_FILES, 0202 electrical engineering, electronic engineering, information engineering, Operating system, 020201 artificial intelligence & image processing, Data center, Global Namespace, business, computer
Abstract: HPC and data center scale application developers are abandoning POSIX IO because file system metadata synchronization and serialization overheads of providing strong consistency and durability are too costly – and often unnecessary – for their applications. Unfortunately, designing file systems with weaker consistency or durability semantics excludes applications that rely on stronger guarantees, forcing developers to re-write their applications or deploy them on a different system. We present a framework and API that lets administrators specify their consistency/durability requirements and dynamically assign them to subtrees in the same namespace, allowing administrators to optimize subtrees over time and space for different workloads. We show similar speedups to related work but more importantly, we show performance improvements when we custom fit subtree semantics to applications such as checkpoint-restart (91.7x speedup), user home directories (0.03 standard deviation from optimal), and users checking for partial results (2% overhead).
Published: 2018
Full Text: View/download PDF

18. Content-based File Sharing in Peer-to-peer Networks Using Threshold

Author: Amol P. Bhagat, Kiran A. Dongre, and Radhika Chaudhari
Subjects: interest extraction, Computer science, BitTorrent tracker, Stub file, threshold based sharing, 02 engineering and technology, computer.software_genre, File sharing, Data file, Data_FILES, 0202 electrical engineering, electronic engineering, information engineering, Versioning file system, interest oriented file sharing, Global Namespace, SSH File Transfer Protocol, File system fragmentation, General Environmental Science, File system, 020203 distributed computing, business.industry, peer-to-peer network, Device file, 020206 networking & telecommunications, computer.file_format, File sharing system, Torrent file, Shared resource, Self-certifying File System, Content-based file sharing, Journaling file system, General Earth and Planetary Sciences, File area network, business, computer, Computer network
Abstract: In the content-based file sharing peer-to-peer (P2P) [1] network model, nodes share files directly with each other without a centralized server. In such a file sharing system, nodes meet and exchange requests and files in the form of text, short videos, and voice clips in different interest categories. Content is varied, and with the rapid development of wireless communication technology, sharing of large files such as multimedia content is required. File sharing can also mean having an allocated amount of personal file storage in a common file system. For efficient file searching, a P2P content-based file sharing system takes advantage of node mobility by designating stable nodes, which have the most frequent contact with community members, as community coordinators for intra-community searching, and highly mobile nodes that visit other communities frequently as community ambassadors for inter-community searching. Sharing large files requires a more stable end-to-end path and a long transmission time. Last but not least, more relationships between nodes will be used to promote the file sharing process. Content-based file sharing is helpful for making certain decisions during file transmission. These decisions help utilize network resources properly. In this paper, a content-based file sharing scheme using a threshold is proposed. The proposed scheme determines the user's interest before searching and sharing files in the peer-to-peer network. The resources in the network are utilized according to the contents of the files to be shared. The performance evaluation shows that the proposed system significantly lowers transmission cost and improves the file sharing success rate compared to current methods.
Published: 2016
Full Text: View/download PDF

19. Social-P2P: An Online Social Network Based P2P File Sharing System

Author: Ze Li, Haiying Shen, and Kang Chen
Subjects: business.industry, Computer science, BitTorrent tracker, Distributed computing, computer.file_format, File sharing system, Torrent file, Distributed hash table, Self-certifying File System, Computational Theory and Mathematics, File sharing, Hardware and Architecture, PlanetLab, Server, Signal Processing, Trust management (information system), Distributed File System, business, Global Namespace, SSH File Transfer Protocol, computer, BitTorrent, Computer network
Abstract: A peer-to-peer (P2P) file sharing system provides a platform that enables a tremendous number of nodes to share their files. Retrieving desired files efficiently and trustworthily is critical in such a large and jumbled system. However, the issues of efficient searching and trustworthy searching have only been studied separately. Simply combining the methods to achieve the two goals doubles system overhead. In this paper, we first study trace data from Facebook and BitTorrent. Guided by the observations, we propose a system that integrates a social network into a P2P network, named Social-P2P, for simultaneous efficient and trustworthy file sharing. It incorporates three mechanisms: (1) interest/trust-based structure, (2) interest/trust-based file searching, and (3) trust relationship adjustment. By exploiting the social interests and relationships in the social network, the interest/trust-based structure groups common-multi-interest nodes into a cluster and further connects socially close nodes within a cluster. The comparably stable nodes in each cluster form a Distributed Hash Table (DHT) for inter-cluster file searching. In the interest/trust-based file searching mechanism, a file query is forwarded to the cluster of the file by the DHT routing first. Then, it is forwarded along constructed connections within a cluster, which achieves high hit rate and reliable routing. Moreover, sharing files among socially close friends discourages nodes from providing faulty files because people are unlikely to risk their reputation in the real-world. In the trust relationship adjustment mechanism, each node in a routing path adaptively decreases its trust on the node that has forwarded a faulty file in order to avoid routing queries towards misbehaving nodes later on. We conducted extensive trace-driven simulations and implemented a prototype on PlanetLab. Experimental results show that Social-P2P achieves highly efficient and trustworthy file sharing compared to current file sharing systems and trust management systems.
Published: 2015
Full Text: View/download PDF

20. Adaptive and scalable load balancing for metadata server cluster in cloud-scale file systems

Author: Khai Leong Yong, Yonggang Wen, Weiya Xi, Yew-Soon Ong, Quanqing Xu, and Rajesh Vellore Arumugam
Subjects: Metadata, Network Load Balancing Services, General Computer Science, Computer science, Server, Distributed computing, Scalability, Metadata management, Cache, Load balancing (computing), Global Namespace, Theoretical Computer Science
Abstract: Big data is an emerging term in the storage industry, and it refers to data analytics on big storage, i.e., Cloud-scale storage. In Cloud-scale (or EB-scale) file systems, balancing the request workload across a metadata server cluster is critical for avoiding performance bottlenecks and improving quality of service. Many good approaches have been proposed for load balancing in distributed file systems. Some of them pay attention to global namespace balancing, making metadata distribution across metadata servers as uniform as possible. However, they do not work well under skewed request distributions, which impair load balancing but simultaneously increase the effectiveness of caching and replication. In this paper, we propose Cloud Cache (C2), an adaptive and scalable load balancing scheme for metadata server clusters in EB-scale file systems. It combines adaptive cache diffusion and a replication scheme to cope with the request load balancing problem, and it can be integrated into existing distributed metadata management approaches to efficiently improve their load balancing performance. C2 runs in two steps: 1) it first runs adaptive cache diffusion, using load-shedding if a node is overloaded and load-stealing otherwise; and 2) it then runs the adaptive replication scheme if a very popular metadata item (or at least two items) causes a node to be overloaded, since, because of its knapsack property, the very popular item is not split across several nodes by adaptive cache diffusion. Performance evaluation in trace-driven simulations demonstrates the efficiency and scalability of C2.
Published: 2015
Full Text: View/download PDF

21. A Proximity-Aware Interest-Clustered P2P File Sharing System

Author: Lee Ward, Haiying Shen, and Guoxin Liu
Subjects: Computer science, BitTorrent tracker, Stub file, Overlay, Class implementation file, computer.software_genre, File replication, File sharing, Server, Data_FILES, Versioning file system, Distributed File System, SSH File Transfer Protocol, Global Namespace, File system fragmentation, Database, business.industry, Device file, computer.file_format, Data structure, File sharing system, Torrent file, Self-certifying File System, Computational Theory and Mathematics, Hardware and Architecture, Journaling file system, Signal Processing, File area network, business, computer, Computer network
Abstract: Efficient file query is important to the overall performance of peer-to-peer (P2P) file sharing systems. Clustering peers by their common interests can significantly enhance the efficiency of file query. Clustering peers by their physical proximity can also improve file query performance. However, few current works are able to cluster peers based on both peer interest and physical proximity. Although structured P2Ps provide higher file query efficiency than unstructured P2Ps, it is difficult to realize it due to their strictly defined topologies. In this work, we introduce a Proximity-Aware and Interest-clustered P2P file sharing System (PAIS) based on a structured P2P, which forms physically-close nodes into a cluster and further groups physically-close and common-interest nodes into a sub-cluster based on a hierarchical topology. PAIS uses an intelligent file replication algorithm to further enhance file query efficiency. It creates replicas of files that are frequently requested by a group of physically close nodes in their location. Moreover, PAIS enhances the intra-sub-cluster file searching through several approaches. First, it further classifies the interest of a sub-cluster to a number of sub-interests, and clusters common-sub-interest nodes into a group for file sharing. Second, PAIS builds an overlay for each group that connects lower capacity nodes to higher capacity nodes for distributed file querying while avoiding node overload. Third, to reduce file searching delay, PAIS uses proactive file information collection so that a file requester can know if its requested file is in its nearby nodes. Fourth, to reduce the overhead of the file information collection, PAIS uses bloom filter based file information collection and corresponding distributed file searching. Fifth, to improve the file sharing efficiency, PAIS ranks the bloom filter results in order. Sixth, considering that a recently visited file tends to be visited again, the bloom filter based approach is enhanced by only checking the newly added bloom filter information to reduce file searching delay. Trace-driven experimental results from the real-world PlanetLab testbed demonstrate that PAIS dramatically reduces overhead and enhances the efficiency of file sharing with and without churn. Further, the experimental results show the high effectiveness of the intra-sub-cluster file searching approaches in improving file searching efficiency.
Published: 2015
Full Text: View/download PDF
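
Note on record 21: the abstract mentions Bloom filter based file information collection, where a requester checks compact summaries of nearby nodes before querying them. The following is a minimal generic sketch of that idea, not the PAIS implementation; the class `BloomFilter`, the helper `candidate_holders`, the filter size, and the hash construction are all illustrative assumptions.

```python
import hashlib


class BloomFilter:
    """Generic Bloom filter: no false negatives, occasional false positives."""

    def __init__(self, num_bits=1024, num_hashes=3):
        self.num_bits = num_bits      # assumed size, chosen for illustration
        self.num_hashes = num_hashes  # assumed number of hash functions
        self.bits = 0                 # bit array stored in one Python int

    def _positions(self, item):
        # Derive num_hashes bit positions by salting SHA-256 with a seed.
        for seed in range(self.num_hashes):
            digest = hashlib.sha256(f"{seed}:{item}".encode("utf-8")).digest()
            yield int.from_bytes(digest[:8], "big") % self.num_bits

    def add(self, item):
        for pos in self._positions(item):
            self.bits |= 1 << pos

    def might_contain(self, item):
        # True means "possibly present"; False means "definitely absent".
        return all((self.bits >> pos) & 1 for pos in self._positions(item))


def candidate_holders(filename, neighbor_filters):
    """Return the neighbors whose filter says they may hold the file."""
    return [node for node, bf in neighbor_filters.items()
            if bf.might_contain(filename)]


# Hypothetical usage: each nearby node publishes a filter of its file names.
node_a = BloomFilter()
node_a.add("report.pdf")
node_b = BloomFilter()
node_b.add("video.mp4")
print(candidate_holders("report.pdf", {"node_a": node_a, "node_b": node_b}))
```

A requester would then contact only the returned candidates; because a Bloom filter can report false positives, each candidate still has to confirm that it actually holds the file.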

22. Metadata Server Cluster System Supporting EB-Scale Storage

Author: Liu Jian, Dong Huanqing, Liu Zhenjun, Shao BingQing, Xu Lu, and Zhang Jun-wei
Subjects: Data element, General Computer Science, Database, Computer science, Meta Data Services, Information repository, computer.software_genre, Metadata repository, Metadata, Server farm, Converged storage, Data_FILES, Global Namespace, Engineering (miscellaneous), computer
Abstract: With data volumes rapidly growing on a global scale, PB-scale storage systems will not meet the data-storage requirement capacity of applications in the future. It is time for the research and development of EB-scale storage systems that are general purpose, highly scalable, and easily deployed. Metadata server clustering is the core technology for constructing EB-scale storage systems. Based on pNFS and the Blue Whale block device file system EXFS, Blue Whale metadata server clustering technology for EB-scale storage is presented. To begin, the differences between EB-scale metadata server systems and PB-scale are analyzed. Then, the prototype architecture of a metadata server cluster and the distribution strategy of a global namespace are introduced. Following this, the atomic operation protocol for consistency based on a distributed log is presented. Finally, fine-grained metadata migration technology with low latency is proposed. Measurement results confirm that each metadata server can offer more than 10,000 OPS to EB-scale storage systems simultaneously.
Published: 2015
Full Text: View/download PDF

23. Maximizing P2P File Access Availability in Mobile Ad Hoc Networks though Replication for Efficient File Sharing

Author: Kang Chen and Haiying Shen
Subjects: Wireless ad hoc network, business.industry, BitTorrent tracker, Computer science, Replica, Distributed computing, Mobile computing, Mobile ad hoc network, Peer-to-peer, computer.software_genre, Replication (computing), Theoretical Computer Science, Shared resource, Self-certifying File System, Computational Theory and Mathematics, File sharing, Hardware and Architecture, Data_FILES, Resource allocation, SSH File Transfer Protocol, Global Namespace, business, computer, Software, Computer network
Abstract: File sharing applications in mobile ad hoc networks (MANETs) have attracted more and more attention in recent years. The efficiency of file querying suffers from the distinctive properties of such networks including node mobility and limited communication range and resource. An intuitive method to alleviate this problem is to create file replicas in the network. However, despite the efforts on file replication, no research has focused on the global optimal replica creation with minimum average querying delay. Specifically, current file replication protocols in mobile ad hoc networks have two shortcomings. First, they lack a rule to allocate limited resources to different files in order to minimize the average querying delay. Second, they simply consider storage as available resources for replicas, but neglect the fact that the file holders’ frequency of meeting other nodes also plays an important role in determining file availability. Actually, a node that has a higher meeting frequency with others provides higher availability to its files. This becomes even more evident in sparsely distributed MANETs, in which nodes meet disruptively. In this paper, we introduce a new concept of resource for file replication, which considers both node storage and meeting frequency. We theoretically study the influence of resource allocation on the average querying delay and derive a resource allocation rule to minimize the average querying delay. We further propose a distributed file replication protocol to realize the proposed rule. Extensive trace-driven experiments with synthesized traces and real traces show that our protocol can achieve shorter average querying delay at a lower cost than current replication protocols.
Published: 2015
Full Text: View/download PDF

24. Scientific user behavior and data-sharing trends in a petascale file system

Author: Sudharshan S. Vazhkudai, Seung-Hwan Lim, Raghul Gunasekaran, and Hyogi Sim
Subjects: File system, Computer science, 020206 networking & telecommunications, 02 engineering and technology, Everything is a file, computer.software_genre, Supercomputer, Metadata, World Wide Web, Data sharing, Petascale computing, Self-certifying File System, 020204 information systems, 0202 electrical engineering, electronic engineering, information engineering, Snapshot (computer storage), Global Namespace, Distributed File System, computer
Abstract: The Oak Ridge Leadership Computing Facility (OLCF) runs the No. 4 supercomputer in the world, supported by a petascale file system, to facilitate scientific discovery. In this paper, using the daily file system metadata snapshots collected over 500 days, we have studied the behavioral trends of 1,362 active users and 380 projects across 35 science domains. In particular, we have analyzed both individual and collective behavior of users and projects, highlighting needs from individual communities and the overall requirements to operate the file system. We have analyzed the metadata across three dimensions, namely (i) the projects' file generation and usage trends, using quantitative file system-centric metrics, (ii) scientific user behavior on the file system, and (iii) the data sharing trends of users and projects. To the best of our knowledge, our work is the first of its kind to provide comprehensive insights on user behavior from multiple science domains through metadata analysis of a large-scale shared file system. We envision that this OLCF case study will provide valuable insights for the design, operation, and management of storage systems at scale, and also encourage other HPC centers to undertake similar efforts.
Published: 2017
Full Text: View/download PDF

25. LocoFS

Author: Siyang Li, Yang Hu, Tao Li, Jiwu Shu, and Youyou Lu
Subjects: Computer science, 02 engineering and technology, Directory, computer.software_genre, Distributed data store, Data file, Data_FILES, 0202 electrical engineering, electronic engineering, information engineering, Global Namespace, Distributed File System, SSH File Transfer Protocol, File system, 020203 distributed computing, Database, Storage Resource Broker, Computer file, Directory information tree, 020206 networking & telecommunications, computer.file_format, Replication (computing), Torrent file, Metadata, Self-certifying File System, Scalability, File area network, computer
Abstract: Key-Value stores provide scalable metadata service for distributed file systems. However, the metadata's organization itself, which is organized using a directory tree structure, does not fit the key-value access pattern, thereby limiting the performance. To address this issue, we propose a distributed file system with a loosely-coupled metadata service, LocoFS, to bridge the performance gap between file system metadata and key-value stores. LocoFS is designed to decouple the dependencies between different kinds of metadata with two techniques. First, LocoFS decouples the directory content and structure, which organizes file and directory index nodes in a flat space while reversely indexing the directory entries. Second, it decouples the file metadata to further improve the key-value access performance. Evaluations show that LocoFS with eight nodes boosts the metadata throughput by 5 times, which approaches 93% throughput of a single-node key-value store, compared to 18% in the state-of-the-art IndexFS.
Published: 2017
Full Text: View/download PDF
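
Note on record 25: the abstract describes keeping file and directory index nodes in a flat space, with entries pointing back to their directory rather than the directory holding a list of entries. The sketch below is only a loose illustration of that general idea using in-memory dictionaries; it is not LocoFS code, and the class `FlatNamespace`, its key layout, and its method names are assumptions made for the example.

```python
import itertools


class FlatNamespace:
    """Toy flat key-value namespace: no per-directory entry lists are stored."""

    def __init__(self):
        self._ids = itertools.count(1)
        self.dirs = {"/": 0}   # full directory path -> directory id
        self.files = {}        # (parent directory id, file name) -> attributes

    @staticmethod
    def _split(path):
        parent, name = path.rstrip("/").rsplit("/", 1)
        return (parent or "/"), name

    def mkdir(self, path):
        parent, _ = self._split(path)
        if parent not in self.dirs:
            raise FileNotFoundError(parent)
        if path in self.dirs:
            raise FileExistsError(path)
        self.dirs[path] = next(self._ids)

    def create(self, path, **attrs):
        parent, name = self._split(path)
        # The file key embeds its parent's id: a backward (reverse) index.
        self.files[(self.dirs[parent], name)] = dict(attrs)

    def stat(self, path):
        parent, name = self._split(path)
        return self.files[(self.dirs[parent], name)]

    def listdir(self, path):
        # Entries are recovered by scanning keys that point back to this directory.
        dir_id = self.dirs[path]
        return sorted(name for (d, name) in self.files if d == dir_id)


# Hypothetical usage
ns = FlatNamespace()
ns.mkdir("/a")
ns.create("/a/f.txt", size=3)
print(ns.stat("/a/f.txt"), ns.listdir("/a"))
```

In this toy layout, creating a file is a single key-value write that never touches the parent directory's record; the cost moves to `listdir`, which scans keys here, whereas a real store would presumably use an ordered or prefix-indexed key space.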

26. Research on Consistency Maintenance of File Management in Real-time Cloud Office System

Author: Zhang Qiang, Gao Liping, and Zhang Xin
Subjects: Database, business.industry, Computer science, Cloud computing, computer.software_genre, WebSocket, Self-certifying File System, Journaling file system, Operating system, The Internet, business, SSH File Transfer Protocol, Global Namespace, Cloud storage, computer
Abstract: With the rapid development of the Internet, working in cloud environments is becoming more and more popular, user-experience requirements are increasingly high, and research on collaborative work in cloud environments has gradually become the mainstream of collaborative work research. Collaborative work under traditional P2P architectures can no longer meet the needs of this changing network environment. To address the new architecture and the higher real-time requirements of the cloud environment, this paper improves the COT algorithm so that it adapts to the C/S architecture, reduces unnecessary data transmission and query conversion time, and meets higher real-time requirements. The file model under cloud storage is adapted into a cooperative cloud office file management system, and the improved COT algorithm is applied to file management in the cooperative office system for the three operations (create, delete, rename) on file nodes in the file model. In addition, to verify the feasibility and effectiveness of the algorithm, this paper develops a cross-platform real-time cloud office file management system based on the WebSocket protocol: CloudFileSystem.
Published: 2017
Full Text: View/download PDF

27. Fragmentation prevention in the cloud file system for P2P file sharing using smart devices

Author: Chao-Hsien Lee and Shih-Kuei Yu
Subjects: File system, 020203 distributed computing, BitTorrent tracker, business.industry, Computer science, Fragmentation (computing), Cloud computing, 02 engineering and technology, computer.file_format, computer.software_genre, Self-certifying File System, File sharing, 0202 electrical engineering, electronic engineering, information engineering, Operating system, Defragmentation, SSH File Transfer Protocol, Global Namespace, business, computer, BitTorrent, File system fragmentation, Computer network
Abstract: More and more network services integrate the cloud platform to deliver their services. In order to accelerate data sharing among users, data are usually divided and encapsulated into small pieces before delivery. However, the cloud file system enlarges the basic block size to maximize access throughput. This conflict between network transmission and the cloud file system may induce high fragmentation in the cloud file system. In this paper, we propose a minimum block defragmentation mechanism (MBDM) to avoid fragmentation in the cloud file system even if data are exchanged piece by piece. Based on our experiments, the proposed MBDM saves 36% in time and 46% in energy consumption compared with the traditional BitTorrent (BT) service.
Published: 2017
Full Text: View/download PDF

28. A Lightweight OS-Level Virtualization Architecture Based on Android

Author: Nai-jie Gu, Bo-wen Liu, and De-he Gu
Subjects: Application virtualization, business.industry, Computer science, Full virtualization, Linux kernel, computer.software_genre, Virtualization, Virtual machine, Embedded system, Operating system, Android (operating system), business, Global Namespace, computer, Mobile device
Abstract: Intelligent terminal devices of all kinds are emerging continuously and growing explosively. Typical devices such as smartphones and tablets are becoming more and more popular; furthermore, the market share of intelligent devices has exceeded that of PCs. In the mobile device area, the demand for virtualization technology is gradually increasing. As an important branch of computer technology development, operating-system-level virtualization is becoming a hot spot of current research because of its low overhead and lightweight nature. In this paper, based on the Android system on mobile devices, we implement a prototype that can run multiple virtual system instances on a single Android OS by modifying the namespace mechanism in the Linux kernel and extending it into a driver namespace mechanism. In addition, we put forward an active-inactive model, which guarantees that a process can operate the device drivers if and only if it is in the active namespace; with it, a new scheme of operating-system-level virtualization is designed and implemented.
Published: 2017
Full Text: View/download PDF

29. An efficient replication scheme to increase file availability in mobile peer to peer systems

Author: Rahmani Moufida and Mahfoud Benchaïba
Subjects: business.industry, Computer science, BitTorrent tracker, Replica, Distributed computing, 020206 networking & telecommunications, 02 engineering and technology, Peer-to-peer, computer.software_genre, Replication (computing), Self-certifying File System, 0202 electrical engineering, electronic engineering, information engineering, Overhead (computing), 020201 artificial intelligence & image processing, business, Global Namespace, computer, File system fragmentation, Computer network
Abstract: Improving search efficiency, especially for rare files, is a fundamental challenge in mobile peer-to-peer (MP2P) systems. In this paper, we propose a simple and efficient replication technique to increase file availability in MP2P systems, which enhances search efficiency. The aim of our proposal is to maintain a threshold level of availability at all times so that a file can be found within a small number of overlay hops, even if it is rare. We rely on a global availability estimate to choose suitable files to replicate and a suitable replica degree. Simulations show that our proposal significantly improves search efficiency in terms of search success, search delay and overhead.
Published: 2017
Full Text: View/download PDF

30. An Early Functional and Performance Experiment of the MarFS Hybrid Storage EcoSystem

Author: Gary Grider, David Montoya, and Hsing-bung Chen
Subjects: File system, Database, business.industry, Computer science, Storage Resource Broker, Computer file, Cloud computing, Information repository, computer.software_genre, Object storage, Self-certifying File System, Operating system, business, Global Namespace, computer
Abstract: Many computing sites, LANL being one of them, have a requirement for long-term retention of mostly cold data. Although the main function of this storage tier is capacity, it does also have a bandwidth requirement. For many years, tape was the best economic solution for this requirement. However, over time, data sets have grown larger more quickly than tape bandwidth has improved. We have now entered a regime in which disk is the more economically efficient medium for this storage tier. Also, more and more, data dominates the computing world. There is a "sea" of data out there in many different formats such as file, object, and key-value that needs to be efficiently managed and effectively used. In this paper, we introduce a new hybrid storage system named MarFS. MarFS is a Near-POSIX File System using scale-out commercial/cloud for data and many POSIX file systems for metadata services. MarFS is an approach to support a data lake for HPC that sits on industry-based commodity storage hardware and is a software layer that provides a global namespace and near-POSIX semantics. MarFS provides the capability to serve as an umbrella over a variety of underlying storage layers. In this paper, we present the system architecture of the proposed MarFS near-POSIX file system, we conduct early functional and performance tests on MarFS's software components, and finally we address the current deployment status and future development work on MarFS.
Published: 2017
Full Text: View/download PDF

31. Implementation of a DB-Based Virtual File System for Lightweight IoT Clouds

Author: Ki Hyeon Kwon and Hyung Bong Lee
Subjects: File system, Computer science, business.industry, Directory, computer.software_genre, Virtual file system, Directory structure, Self-certifying File System, Data_FILES, Operating system, Global Namespace, SSH File Transfer Protocol, Distributed File System, business, computer, Computer network
Abstract: IoT (Internet of Things) is a concept of a connected internet that pursues direct access to devices or sensors in a fused environment of personal, industrial and public areas. In an IoT environment, it is possible to access real-time data, and the data formats and topologies of devices are diverse. Also, there are bidirectional communications between users and devices to control actuators in IoT. In this respect, IoT is different from the conventional internet, in which data are produced by human desktops and gathered in server systems by way of simple one-sided internet communications. For the cloud or portal service of IoT, there should be a file management framework supporting a systematic naming service and a unified data access interface encompassing the variety of IoT things. This paper implements a DB-based virtual file system maintaining attributes of IoT things in a UNIX-styled file system view. Users who have logged in to the virtual shell are able to explore IoT things by navigating the virtual file system, and are able to access IoT things directly via UNIX-styled file I/O APIs. The implemented virtual file system is lightweight and flexible because it maintains only the directory structure and descriptors for the distributed IoT things. The result of a test of virtual shell primitives such as mkdir() or chdir() shows the smooth functionality of the virtual file system. Also, the exploring performance of the file system is better than that of the Windows file system when a simple directory cache mechanism is adopted. Keywords: IoT (Internet of Things), Cloud Service, VFS (Virtual File System), DB-Based File System
Published: 2014
Full Text: View/download PDF

32. SANE: Semantic-Aware Namespace in Ultra-Large-Scale File Systems

Author: Yifeng Zhu, Hong Jiang, Dan Feng, Lei Xu, and Yu Hua
Subjects: Scheme (programming language), Computer science, Distributed computing, Volume (computing), Directory, computer.software_genre, Namespace-based Validation Dispatching Language, Data access, Computational Theory and Mathematics, Hardware and Architecture, Signal Processing, Data_FILES, Operating system, Namespace, Global Namespace, computer, computer.programming_language
Abstract: The explosive growth in data volume and complexity imposes great challenges for file systems. To address these challenges, an innovative namespace management scheme is in desperate need to provide both the ease and efficiency of data access. In almost all today's file systems, the namespace management is based on hierarchical directory trees. This tree-based namespace scheme is prone to severe performance bottlenecks and often fails to provide real-time response to complex data lookups. This paper proposes a Semantic-Aware Namespace scheme, called SANE, which provides dynamic and adaptive namespace management for ultra-large storage systems with billions of files. SANE introduces a new naming methodology based on the notion of semantic-aware per-file namespace, which exploits semantic correlations among files, to dynamically aggregate correlated files into small, flat but readily manageable groups to achieve fast and accurate lookups. SANE is implemented as a middleware in conventional file systems and works orthogonally with hierarchical directory trees. The semantic correlations and file groups identified in SANE can also be used to facilitate file prefetching and data de-duplication, among other system-level optimizations. Extensive trace-driven experiments on our prototype implementation validate the efficacy and efficiency of SANE.
Published: 2014
Full Text: View/download PDF

33. A Survey on Different File System Approach

Author: Vivek B. Kute and Priya N. Parkhi
Subjects: File system, Database, Computer science, Storage Resource Broker, Scalability, Data_FILES, Namespace, computer.software_genre, Global Namespace, computer, Namespace-based Validation Dispatching Language
Abstract: This paper provides a survey of the namespace management schemes proposed for file systems. Namespace management can be used to reduce exhaustive search over all directories. A namespace that uses semantic correlation can also increase searchability. The file system namespace, as an information-organizing infrastructure, helps to improve a system's quality of service, such as performance, scalability, and ease of use. This paper discusses the improvements to be made in future namespace schemes and provides the reader with a basis for research in namespace schemes for file systems.
Published: 2015
Full Text: View/download PDF

34. Highly reliable message-passing mechanism for cluster file system

Author: Jin Xiong, Dan Meng, Jiang Zhou, Weiping Wang, and Can Ma
Subjects: File system, Computer Networks and Communications, business.industry, Computer science, Distributed computing, Stub file, computer.software_genre, Failover, Shared resource, Self-certifying File System, SSH File Transfer Protocol, business, Global Namespace, computer, Software, File system fragmentation, Computer network
Abstract: With the increase in popularity and quantity of personal computer clusters, message passing between nodes has become an important issue because of the high failure rate in the network. File access in a cluster file system often contains several sub-operations; each includes one or more network transmissions. Any network failure makes the file system service unavailable. In this paper, we describe a highly reliable message-passing mechanism, HR-NET, which tolerates both software and hardware network failures. HR-NET provides fine-grained, connection-level failover across redundant communication paths. With it, the file system can keep passing messages because HR-NET handles failures automatically, either by recovering from network failures or by failing over to a backup; therefore, it shields the requests and data transmissions of the cluster file system from network failures. Load balancing for messages is also achieved to relieve network traffic. For transmission timeouts, HR-NET proposes priority-based message scheduling, which dynamically manages messages in an appropriate order to tolerate request–response failures between clients and servers. HR-NET is implemented on top of the standard network protocol stack. Performance results show that HR-NET can provide almost the full underlying network bandwidth, with an average throughput loss of 6.17%, and provide fast recovery. Experiments with a cluster file system show that the overall performance degradation due to HR-NET failover is below 8%, while reliability is highly enhanced.
Published: 2013
Full Text: View/download PDF

35. Analysis of distribution time of multiple files in a P2P network

Author: Pui-Sze Tsang, King-Shan Lui, and Xiang Meng
Subjects: Computer Networks and Communications, Download, Computer science, business.industry, Distributed computing, Stub file, Device file, computer.file_format, Unix file types, File sharing system, Torrent file, Self-certifying File System, File sharing, Journaling file system, Data file, Data_FILES, Versioning file system, SSH File Transfer Protocol, Global Namespace, business, File synchronization, computer, File system fragmentation, Computer network
Abstract: Peer-to-Peer (P2P) file sharing attracts much attention due to its scalability and robustness. One important metric in measuring the performance of a P2P file sharing system is the amount of time required for all peers to get the files. We refer to this time as the file distribution time. Researchers have proposed protocols to minimize the file distribution time under different situations. However, most works are based on the single-file scenario. On the other hand, studies show that in a file sharing application, users may download multiple files at the same time. In this paper, we analyze the minimum time needed to distribute multiple files. We develop an explicit expression for the minimum amount of time needed to distribute multiple files in a heterogeneous P2P fluid model. Unlike in the single-file scenario, we demonstrate that the theoretical lower bound in the multi-file scenario is not always achievable. With a comprehensive consideration of all the configurations, we elaborate on how to partition the bandwidth capacities of both seeds and leechers for a particular file such that the finish time is optimal.
Published: 2013
Full Text: View/download PDF

36. A File Sharing Method Based on P2P Small World Model

Author: Xiongfei Li, Wei Li, and Qinsheng Du
Subjects: World Wide Web, Self-certifying File System, File sharing, Database, Computer science, Computer Science (miscellaneous), computer.file_format, SSH File Transfer Protocol, computer.software_genre, Global Namespace, computer, File system fragmentation, Torrent file
Published: 2013
Full Text: View/download PDF

37. Managed file transfer: the next stage for data in motion?

Author: Dan Dunford
Subjects: Information Systems and Management, SIMPLE (military communications protocol), Managed file transfer, Computer Networks and Communications, Computer science, Control (management), Limiting, Computer security, computer.software_genre, Data at Rest, Business environment, File transfer, Safety, Risk, Reliability and Quality, Global Namespace, computer
Abstract: Organisations of all types and sizes have come to rely on file transfer technology to help their businesses run smoothly. But basic file transfer technology is inherently limiting and inadequate. These days, regulatory demands and changing corporate needs are putting pressure on traditional methods and solutions. Many file transfer applications were designed as simple utilities, not as enterprise solutions, and lack the management, control and integration capabilities needed to support today's challenging business environment. The answer is the next generation of managed file transfer (MFT), explains Dan Dunford of Attachmate.
Published: 2013
Full Text: View/download PDF

38. A secure file sharing service for distributed computing environments

Author: Ugo Fiore, Luigi Catuogno, Aniello Castiglione, Aniello Del Sorbo, and Francesco Palmieri
Subjects: Computer science, Distributed computing, Distributed file system, Access control, Theoretical Computer Science, File sharing, Grid computing, Key escrow, Key management/distribution, Threshold based schemes, Software, Information Systems, Hardware and Architecture, SSH File Transfer Protocol, Key management, Global Namespace, Distributed File System, business.industry, Device file, computer.file_format, Everything is a file, Replication (computing), Torrent file, Self-certifying File System, Key (cryptography), business, computer, Computer network
Abstract: Distributed cryptographic file systems enable file sharing among their users and need a key management scheme to distribute the cryptographic keys to authorized users according to their specific degree of trust. In this paper we describe the architecture of a basic secure file sharing facility relying on a multi-party threshold-based key-sharing scheme that can be overlaid on top of existing stackable networked file systems, and discuss its application to the implementation of distributed cryptographic file systems. It provides flexible access control policies supporting multiple combinations of roles and trust profiles. A proof-of-concept prototype implementation within the Linux operating system framework demonstrated its effectiveness in terms of performance and security robustness.
Published: 2013
Full Text: View/download PDF

39. GFS: a Graph-based File System Enhanced with Semantic Features

Author: Filippo Geraci and Daniele Di Sarli
Subjects: File system, user content, Computer science, file system, Working directory, 0211 other engineering and technologies, 02 engineering and technology, computer.file_format, Directory, computer.software_genre, Torrent file, Semantics, Design rule for Camera File system, World Wide Web, Self-certifying File System, Journaling file system, 021105 building & construction, 0202 electrical engineering, electronic engineering, information engineering, Data_FILES, 020201 artificial intelligence & image processing, Global Namespace, computer
Abstract: Organizing documents in the file system is one of the most tedious and thorny tasks for most computer users. Taxonomies based on handmade directory hierarchies still remain the only possible alternative for most small and medium enterprises, public administrations and individual users. However, both the limitations of the hierarchical organization of file systems and the difficulty of maintaining coherence within the taxonomy have raised the need for more scalable and effective approaches. Desktop searching applications provide proprietary interfaces that enable content-based searching at the cost of having no control over the indexing and ranking of results. Semantic file systems, instead, leave users the freedom to manage the taxonomy according to their specific needs, but lose the standard file system features. In this paper we describe GFS (graph-based file system), a new hybrid file system that extends the standard hierarchical organization of files with semantic features. GFS allows the user to nest semantic spaces inside the directory hierarchy, leaving system folders unaltered. Semantic spaces allow customized file tagging and leverage browsing to guide file searching. Since GFS does not change the low-level interface for interacting with file systems, users can continue to use their favorite file managers with it. Moreover, no changes are required to integrate the semantic features into proprietary software.
Published: 2017

40. A Framework for Power of 2 Based Scalable Data Storage in Object-Based File System

Author: Ni Lar Thein and Ohnmar Aung
Subjects: File system, Object storage, Self-certifying File System, Computer science, Storage Resource Broker, Device file, Operating system, computer.software_genre, Global Namespace, SSH File Transfer Protocol, computer, File system fragmentation
Published: 2013
Full Text: View/download PDF

41. StepAhead

Author: Debadatta Mishra, Purushottam Kulkarni, and Raju Rangaswami
Subjects: File system, 020203 distributed computing, Computer science, Path (computing), 020206 networking & telecommunications, 02 engineering and technology, Directory, computer.software_genre, Data_FILES, 0202 electrical engineering, electronic engineering, information engineering, Operating system, Cache, Namespace, Global Namespace, computer, Naming collision, Abstraction (linguistics)
Abstract: A hierarchical namespace is a common abstraction used for data organization within modern file systems. Fast translation of namespace objects to physical locations is necessary to carry out efficient file system operations. For reasons attributed to modularity, security, and to some extent legacy, namespace translation involves iterative translation of intervening directory objects from the root of the namespace. Namespace resolution is typically a multi-step process, potentially involving serialized I/O operations at each step. In this paper, we propose a rethink of the strategy to fetch pathname entries. Our technique, StepAhead, proactively utilizes hints about namespace translation lookup failures to enable parallel and just-in-time fetching of necessary path translation data into memory to increase cache hits significantly. With StepAhead, we measure an increase in cache hit rates for path translation data across a set of six workloads by as much as 51%, which in turn results in application speed-up of as much as 20%.
Published: 2016
Full Text: View/download PDF

42. Hash-Based Overlay Routing Architecture for Information Centric Networks

Author: Aytac Azgin, Guoqiang Wang, and Ravishankar Ravindran
Subjects: business.industry, Computer science, Distributed computing, Routing table, 05 social sciences, Hash function, Packet forwarding, 050801 communication & media studies, 020206 networking & telecommunications, 02 engineering and technology, Overlay, 0508 media and communications, Key-based routing, 0202 electrical engineering, electronic engineering, information engineering, Forwarding plane, Overhead (computing), business, Global Namespace, Computer network
Abstract: In this paper, we propose an overlay routing architecture for information centric networks that operates on hash-based groupings of content routers and namespaces. The proposed architecture creates hash-based clusters within each domain using globally shared hash functions, and distributes the global namespace to different clusters. In doing so, we achieve the following notable improvements in control and data plane operations: (i) reduced communication overhead in setting up and maintaining the routing tables (which results from the hierarchical communication framework and effective distribution of network resources), (ii) reduced storage overhead (to manage control/data plane operations), and (iii) improved utilization of system resources at the content routers (i.e., storage/computing), which are allocated to in-network caching and packet forwarding operations, due to the use of hash-dependent caching and overlay-driven request forwarding.
Published: 2016
Full Text: View/download PDF

43. Improving the network scalability of Erlang

Author: Natalia Chechina, Phil Trinder, Amir Ghaffari, Simon Thompson, and Huiqing Li
Subjects: Computer Networks and Communications, Scala, Computer science, Distributed computing, 02 engineering and technology, computer.software_genre, Network topology, Operational semantics, Theoretical Computer Science, QA76, Artificial Intelligence, 020204 information systems, 0202 electrical engineering, electronic engineering, information engineering, Global Namespace, Join-pattern, computer.programming_language, business.industry, 020207 software engineering, Erlang (programming language), Hardware and Architecture, Virtual machine, Scalability, Actor model, Elixir (programming language), business, computer, Software, Computer network
Abstract: As the number of cores grows in commodity architectures so does the likelihood of failures. A distributed actor model potentially facilitates the development of reliable and scalable software on these architectures. Key components include lightweight processes which 'share nothing' and hence can fail independently. Erlang is not only increasingly widely used, but the underlying actor model has been a beacon for programming language design, influencing for example Scala, Clojure and Cloud Haskell. While the Erlang distributed actor model is inherently scalable, we demonstrate that it is limited by some pragmatic factors. We address two network scalability issues here: globally registered process names must be updated on every node (virtual machine) in the system, and any Erlang nodes that communicate maintain an active connection. That is, there is a fully connected O(n^2) network of n nodes. We present the design, implementation, and initial evaluation of a conservative extension of Erlang - Scalable Distributed (SD) Erlang. SD Erlang partitions the global namespace and connection network using s_groups. An s_group is a set of nodes with its own process namespace and with a fully connected network within the s_group, but only individual connections outside it. As a node may belong to more than one s_group it is possible to construct arbitrary connection topologies like trees or rings. We present an operational semantics for the s_group functions, and outline the validation of conformance between the implementation and the semantics using the QuickCheck automatic testing tool. Our preliminary evaluation in comparison with distributed Erlang shows that SD Erlang dramatically improves network scalability even if the number of global operations is tiny (0.01%). Moreover, even in the absence of global operations the reduced connection maintenance overheads mean that SD Erlang scales better beyond 80 nodes (1920 cores).
We address the network scalability limitations of distributed Erlang. We present the design and implementation of Scalable Distributed Erlang. We give a semantics for scalable groups and validate the implementation against it. We provide a preliminary evaluation of distributed Erlang and SD Erlang performance. The performance evaluation shows that introducing s_groups improves the scalability.
Published: 2016

44. FARMER: A novel approach to file access correlation mining and evaluation reference model

Author: Peng Xia, Fang Wang, and Dan Feng
Subjects: File system, Computer science, General Mathematics, General Engineering, computer.file_format, Class implementation file, computer.software_genre, Memory-mapped file, Torrent file, Self-certifying File System, Journaling file system, Data_FILES, Data mining, Global Namespace, computer, File system fragmentation
Abstract: File semantics have proven effective in optimizing large-scale distributed file systems. As a consequence of the elaborate and rich I/O interfaces between upper-layer applications and file systems, a file system can provide useful and insightful information about semantics. Hence, file semantic mining has become an increasingly important practice in both the engineering and research communities. Unfortunately, it is a challenge to exploit file semantic knowledge because a variety of factors could affect this information exploration process. Even worse, the challenges are exacerbated by the intricate interdependency between these factors, which makes it difficult to fully exploit the potentially important correlations among various kinds of semantic knowledge. This article proposes a file access correlation mining and evaluation reference (FARMER) model, where a file is treated as a multivariate vector space, and each item within the vector corresponds to a separate factor of the given file. The selection of factors depends on the application; examples of factors are file path, creator and executing program. If one particular factor occurs in both files, its value is non-zero. The extent of inter-file relationships can thus be measured based on the likeness of their factor values in the semantic vectors. Benefiting from this model, FARMER represents files as structured vectors of identifiers, and basic vector operations can be leveraged to quantify the file correlation between two file vectors. The FARMER model leverages a linear regression model to estimate the strength of the relationship between file correlation and a set of influencing factors so that the “bad knowledge” can be filtered out. To demonstrate the ability of the new FARMER model, FARMER is incorporated into a real large-scale object-based storage system as a case study to dynamically infer file correlations.
In addition, a FARMER-enabled optimization service for the metadata prefetching algorithm and the object data layout algorithm is implemented. Experimental results show that the FARMER-enabled prefetching algorithm reduces metadata operation latency by approximately 30%–40% when compared to a state-of-the-art metadata prefetching algorithm and a commonly used replacement policy.
Published: 2011
Full Text: View/download PDF

45. Measurement Based Analysis of One-Click File Hosting Services

Author: Pere Barlet-Ros, Josep Solé-Pareta, and Josep Sanjuàs-Cuxart
Subjects: Computer Networks and Communications, Computer science, business.industry, Strategy and Management, Internet traffic, computer.file_format, Torrent file, World Wide Web, Self-certifying File System, File sharing, Hardware and Architecture, File area network, business, Global Namespace, SSH File Transfer Protocol, Distributed File System, computer, Information Systems, Computer network
Abstract: It is commonly believed that file sharing traffic on the Internet is mostly generated by peer-to-peer applications. However, we show that HTTP based file sharing services are also extremely popular. We analyzed the traffic of a large research and education network for three months, and observed that a large fraction of the inbound HTTP traffic corresponds to file download services, which indicates that an important portion of file sharing traffic is in the form of HTTP data. In particular, we found that two popular one-click file hosting services are among the top Internet domains in terms of served traffic volume. In this paper, we present an exhaustive study of the traffic generated by such services, the behavior of their users, the downloaded content, and their server infrastructure.
Published: 2011
Full Text: View/download PDF

46. IRM: Integrated File Replication and Consistency Maintenance in P2P Systems

Author: Haiying Shen
Subjects: business.industry, Computer science, Replica, Distributed computing, Replication (computing), Distributed hash table, File replication, Consistency (database systems), Self-certifying File System, Computational Theory and Mathematics, File sharing, Hardware and Architecture, Journaling file system, Signal Processing, Data_FILES, business, Global Namespace, File system fragmentation, Computer network
Abstract: In peer-to-peer file sharing systems, file replication and consistency maintenance are widely used techniques for high system performance. Despite significant interdependencies between them, these two issues are typically addressed separately. Most file replication methods rigidly specify replica nodes, leading to low replica utilization, unnecessary replicas and hence extra consistency maintenance overhead. Most consistency maintenance methods propagate update messages based on message spreading or a structure without considering file replication dynamism, leading to inefficient file update and hence high possibility of outdated file response. This paper presents an Integrated file replication and consistency maintenance mechanism (IRM), that integrates the two techniques in a systematic and harmonized manner. It achieves high efficiency in file replication and consistency maintenance at a significantly lower cost. Instead of passively accepting replica and update, each node determines file replication and update polling by dynamically adapting to time-varying file query and update rates, which avoids unnecessary file replications and updates. Simulation results demonstrate the effectiveness of IRM in comparison with other approaches. It dramatically reduces overhead and yields significant improvements on the efficiency of both file replication and consistency maintenance approaches.
Published: 2010
Full Text: View/download PDF

47. Robust Super-Peer-Based P2P File-Sharing Systems

Author: Jenn-Wei Lin and Ming-Feng Yang
Subjects: Self-certifying File System, General Computer Science, BitTorrent tracker, Computer science, Distributed computing, Device file, Versioning file system, computer.file_format, SSH File Transfer Protocol, Global Namespace, computer, File system fragmentation, Torrent file
Abstract: This paper presents an efficient approach for improving file availability in super-peer-based peer-to-peer (P2P) file-sharing systems. In the super-peer-based P2P file-sharing system, peers are organized into multiple groups. In each group, there is a special peer called super-peer to serve the regular peers within the same group. With this property, the proposed approach utilizes the super-peer to tolerate the departure (failure) of a regular peer in order to protect shared files. Unlike traditional replication-based approaches, the proposed approach keeps track of the file queries in the super-peer to support fault tolerance. The cost of tracking the file queries is much smaller than the cost of replicating the file contents in advance. Furthermore, the proposed approach uses a logical connection technique to consider the departure (failure) of the super-peer. Finally, simulation experiments are performed to quantify the performance and overhead of the proposed approach.
Published: 2009
Full Text: View/download PDF

48. Mitigating Denial-of-Service Attacks on the Chord Overlay Network: A Location Hiding Approach

Author: Mudhakar Srivatsa and Ling Liu
Subjects: Computer science, business.industry, Overlay network, Denial-of-service attack, computer.file_format, Computer security, computer.software_genre, Torrent file, Self-certifying File System, Computational Theory and Mathematics, Hardware and Architecture, Wide area network, Signal Processing, Data_FILES, Global Namespace, SSH File Transfer Protocol, business, Chord (peer-to-peer), computer, Computer network
Abstract: Serverless distributed computing has received significant attention from both the industry and the research community. Among the most popular applications are the wide area network file systems, exemplified by CFS, Farsite and OceanStore. These file systems store files on a large collection of untrusted nodes that form an overlay network. They use cryptographic techniques to maintain file confidentiality and integrity from malicious nodes. Unfortunately, cryptographic techniques cannot protect a file holder from a Denial-of-Service (DoS) or a host compromise attack. Hence, most of these distributed file systems are vulnerable to targeted file attacks, wherein an adversary attempts to attack a small (chosen) set of files by attacking the nodes that host them. This paper presents LocationGuard - a location hiding technique for securing overlay file storage systems from targeted file attacks. LocationGuard has three essential components: (i) location key, (ii) routing guard, a secure algorithm that protects accesses to a file in the overlay network given its location key, and (iii) a set of location inference guards. Our experimental results quantify the overhead of employing LocationGuard and demonstrate its effectiveness against DoS attacks, host compromise attacks and various location inference attacks.
Published: 2009
Full Text: View/download PDF

49. Why file sharing networks are dangerous?

Author: M. Eric Johnson, Dan McGuire, and Nicholas D. Willey
Subjects: General Computer Science, BitTorrent tracker, business.industry, Computer science, computer.file_format, Everything is a file, Torrent file, Self-certifying File System, File sharing, Journaling file system, business, SSH File Transfer Protocol, Global Namespace, computer, Computer network
Published: 2009
Full Text: View/download PDF

50. Decentralized access control in distributed file systems

Author: Sotiris Ioannidis, Angelos D. Keromytis, Vassilis Prevelakis, Jonathan M. Smith, and Stefan Miltchev
Subjects: General Computer Science, business.industry, Computer science, Distributed computing, Distributed lock manager, Access control, Replication (computing), Theoretical Computer Science, Self-certifying File System, Network File System, SSH File Transfer Protocol, Distributed File System, Global Namespace, business
Abstract: The Internet enables global sharing of data across organizational boundaries. Distributed file systems facilitate data sharing in the form of remote file access. However, traditional access control mechanisms used in distributed file systems are intended for machines under common administrative control, and rely on maintaining a centralized database of user identities. They fail to scale to a large user base distributed across multiple organizations. We provide a survey of decentralized access control mechanisms in distributed file systems intended for large scale, in both administrative domains and users. We identify essential properties of such access control mechanisms. We analyze both popular production and experimental distributed file systems in the context of our survey.
Published: 2008
Full Text: View/download PDF
