Author: "Bilas, Angelos" / Database: arXiv - Searchworks@Jio Institute Digital Library Search Results

Your search keyword '"Bilas, Angelos"' showing total 12 results

1. Running Cloud-native Workloads on HPC with High-Performance Kubernetes

Author: Chazapis, Antony, Maliaroudakis, Evangelos, Nikolaidis, Fotis, Marazakis, Manolis, and Bilas, Angelos
Subjects: Computer Science - Distributed, Parallel, and Cluster Computing
Abstract: The escalating complexity of applications and services encourages a shift towards higher-level data processing pipelines that integrate both Cloud-native and HPC steps into the same workflow. Cloud providers and HPC centers typically provide both execution platforms on separate resources. In this paper we explore a more practical design that enables running unmodified Cloud-native workloads directly on the main HPC cluster, avoiding resource partitioning and retaining the HPC center's existing job management and accounting policies.
Published: 2024

2. vLSM: Low tail latency and I/O amplification in LSM-based KV stores

Author: Xanthakis, Giorgos, Katsarakis, Antonios, Saloustros, Giorgos, and Bilas, Angelos
Subjects: Computer Science - Databases
Abstract: LSM-based key-value (KV) stores are an important component in modern data infrastructures. However, they suffer from high tail latency, on the order of several seconds, making them less attractive for user-facing applications. In this paper, we introduce the notion of compaction chains and we analyse how they affect tail latency. Then, we show that modern designs reduce tail latency either by trading off I/O amplification or by requiring large amounts of memory. Based on our analysis, we present vLSM, a new KV store design that improves tail latency significantly without compromising on memory or I/O amplification. vLSM reduces (a) compaction chain width by using small SSTs and eliminating the tiering compaction required in L0 by modern systems and (b) compaction chain length by using a larger than typical growth factor between L1 and L2 and introducing overlap-aware SSTs in L1. We implement vLSM in RocksDB and evaluate it using db_bench and YCSB. Our evaluation highlights the underlying trade-off among memory requirements, I/O amplification, and tail latency, as well as the advantage of vLSM over current approaches. vLSM improves P99 tail latency by up to 4.8x for writes and by up to 12.5x for reads, and reduces cumulative write stalls by up to 60%, while also slightly improving I/O amplification at the same memory budget.
Published: 2024

3. Guardian: Safe GPU Sharing in Multi-Tenant Environments

Author: Pavlidakis, Manos, Vasiliadis, Giorgos, Mavridis, Stelios, Argyros, Anargyros, Chazapis, Antony, and Bilas, Angelos
Subjects: Computer Science - Distributed, Parallel, and Cluster Computing
Abstract: Modern GPU applications, such as machine learning (ML), can only partially utilize GPUs, leading to GPU underutilization in cloud environments. Sharing GPUs across multiple applications from different tenants can improve resource utilization and consequently cost, energy, and power efficiency. However, GPU sharing creates memory safety concerns because kernels must share a single GPU address space. Existing spatial-sharing mechanisms either lack fault isolation for memory accesses or require static partitioning, which leads to limited deployability or low utilization. In this paper, we present Guardian, a PTX-level bounds checking approach that provides memory isolation and supports dynamic GPU spatial-sharing. Guardian relies on three mechanisms: (1) It divides the common GPU address space into separate partitions for different applications. (2) It intercepts and checks all GPU-related calls at the lowest level, fencing erroneous operations. (3) It instruments all GPU kernels at the PTX level -- available in closed GPU libraries -- fencing all kernel memory accesses outside application memory bounds. Guardian's approach is transparent to applications and supports real-life frameworks, such as Caffe and PyTorch, that issue billions of GPU kernels. Our evaluation shows that Guardian's overhead compared to native for such frameworks is between 4% and 12%, and 9% on average.
Published: 2024
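
Note: the partition-and-fence idea in the abstract above can be illustrated with a small host-side model. This is only a sketch under stated assumptions, not Guardian's implementation: Guardian instruments PTX code on the GPU, whereas the Python below merely models the bounds check. The names `Partition`, `checked_access`, and `OutOfBoundsAccess` are hypothetical, and raising an exception is an assumed stand-in for whatever fencing action the real system takes.

```python
from dataclasses import dataclass


class OutOfBoundsAccess(Exception):
    """Raised when an access falls outside the caller's partition (assumed policy)."""


@dataclass(frozen=True)
class Partition:
    base: int  # first byte of the shared address space owned by the application
    size: int  # number of bytes owned by the application

    def contains(self, addr: int, length: int = 1) -> bool:
        # The whole range [addr, addr + length) must lie inside the partition.
        return (
            length > 0
            and self.base <= addr
            and addr + length <= self.base + self.size
        )


def checked_access(partition: Partition, addr: int, length: int = 1) -> int:
    """Return addr unchanged if the access is in bounds, otherwise fence it."""
    if not partition.contains(addr, length):
        raise OutOfBoundsAccess(
            f"access [{addr}, {addr + length}) outside "
            f"[{partition.base}, {partition.base + partition.size})"
        )
    return addr


# Two hypothetical tenants sharing one 2 KiB address space.
app_a = Partition(base=0, size=1024)
app_b = Partition(base=1024, size=1024)
```

In this model, `checked_access(app_a, 1020, 4)` succeeds, while `checked_access(app_a, 1021, 4)` is rejected because its last byte would land in `app_b`'s partition.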

4. Arax: A Runtime Framework for Decoupling Applications from Heterogeneous Accelerators

Author: Pavlidakis, Manos, Mavridis, Stelios, Chazapis, Antony, Vasiliadis, Giorgos, and Bilas, Angelos
Subjects: Electrical Engineering and Systems Science - Systems and Control
Abstract: Today, using multiple heterogeneous accelerators efficiently from applications and high-level frameworks, such as TensorFlow and Caffe, poses significant challenges in three respects: (a) sharing accelerators, (b) allocating available resources elastically during application execution, and (c) reducing the required programming effort. In this paper, we present Arax, a runtime system that decouples applications from heterogeneous accelerators within a server. First, Arax maps application tasks dynamically to available resources, managing all required task state, memory allocations, and task dependencies. As a result, Arax can share accelerators across applications in a server and adjust the resources used by each application as load fluctuates over time. Additionally, Arax offers a simple API and includes Autotalk, a stub generator that automatically generates stub libraries for applications already written for specific accelerator types, such as NVIDIA GPUs. Consequently, Arax applications are written once without considering physical details, including the number and type of accelerators. Our results show that applications, such as Caffe, TensorFlow, and Rodinia, can run using Arax with minimum effort and low overhead compared to native execution, about 12% (geometric mean). Arax supports efficient accelerator sharing, by offering up to 20% improved execution times compared to NVIDIA MPS, which supports NVIDIA GPUs only. Arax can transparently provide elasticity, decreasing total application turn-around time by up to 2x compared to native execution without elasticity support.
Published: 2023

5. Garbage Collection or Serialization? Between a Rock and a Hard Place!

Author: Kolokasis, Iacovos G., Evdorou, Giannos, Papagiannis, Anastasios, Zakkak, Foivos, Kozanitis, Christos, Akram, Shoaib, Pratikakis, Polyvios, and Bilas, Angelos
Subjects: Computer Science - Programming Languages, D.3.3, D.3.4, B.3.2, C.5.5
Abstract: Big data analytics frameworks, such as Spark and Giraph, need to process and cache massive amounts of data that do not always fit on the heap. Therefore, frameworks temporarily move long-lived objects outside the managed heap (off-heap) on a fast storage device. Unfortunately, this practice results in: (1) high serialization/deserialization (S/D) cost, and (2) high memory pressure when off-heap objects are moved back to the managed heap for processing. In this paper, we propose TeraHeap, a system that eliminates S/D overhead and expensive GC scans for a large portion of the objects in big data frameworks. TeraHeap relies on three concepts. (1) It eliminates S/D cost by extending the managed runtime (JVM) to use a second high-capacity heap (H2) over a fast storage device. (2) It reduces GC cost by fencing the garbage collector from scanning H2 objects. (3) It offers a simple hint-based interface, which allows frameworks to leverage knowledge about objects for populating H2. We implement TeraHeap in OpenJDK and evaluate it with 15 widely used applications in two real-world big data frameworks, Spark and Giraph. Our evaluation shows that for the same DRAM size, TeraHeap improves performance by up to 73% and 28% compared to native Spark and Giraph, respectively. Also, it provides better performance by consuming up to 8x and 1.2x less DRAM capacity than native Spark and Giraph, respectively. Finally, it outperforms Panthera, a garbage collector for hybrid memories, by up to 69%.
Comment: 17 pages, 12 figures, asplos23 submission revision
Published: 2021

6. Using RDMA for Efficient Index Replication in LSM Key-Value Stores

Author: Vardoulakis, Michalis, Saloustros, Giorgos, González-Férez, Pilar, and Bilas, Angelos
Subjects: Computer Science - Databases, Computer Science - Distributed, Parallel, and Cluster Computing
Abstract: Log-Structured Merge tree (LSM tree) Key-Value (KV) stores have become a foundational layer in the storage stacks of datacenter and cloud services. Current approaches for achieving reliability and availability avoid replication at the KV store level and instead perform these operations at higher layers, e.g., the DB layer that runs on top of the KV store. The main reason is that past designs for replicated KV stores favor reducing network traffic and increasing I/O size. Therefore, they perform costly compactions to reorganize data in both the primary and backup nodes, which hurts overall system performance. In this paper, we design and implement Talos, an efficient rack-scale LSM-based KV store that aims to significantly reduce the I/O amplification and CPU overhead in backup nodes and make replication in the KV store practical. We rely on two observations: (a) the increased use of RDMA in the datacenter, which reduces CPU overhead for communication, and (b) the use of KV separation that is becoming prevalent in modern KV stores. We use a primary-backup replication scheme that performs compactions only on the primary nodes and sends the pre-built index to the backup nodes of the region, avoiding all compactions in backups. Our approach includes an efficient mechanism to deal with pointer translation across nodes in the region index. Our results show that, in the backup nodes, Talos reduces I/O amplification by up to $3\times$, CPU overhead by up to $1.6\times$, and the memory size needed for the write path by up to $2\times$, without increasing network bandwidth excessively (by up to $1.3\times$). Overall, we show that our approach has benefits even when small KV pairs dominate in a workload (80%-90%). Finally, it enables KV stores to operate with larger growth factors (from 10 to 16) to reduce space amplification without sacrificing precious CPU cycles.
Published: 2021

7. Frisbee: automated testing of Cloud-native applications in Kubernetes

Author: Nikolaidis, Fotis, Chazapis, Antony, Marazakis, Manolis, and Bilas, Angelos
Subjects: Computer Science - Distributed, Parallel, and Cluster Computing
Abstract: As more and more companies are migrating (or planning to migrate) from on-premise to Cloud, their focus is to find anomalies and deficits as early as possible in the development life cycle. We propose Frisbee, a declarative language and associated runtime components for testing cloud-native applications on top of Kubernetes. Given a template describing the system under test and a workflow describing the experiment, Frisbee automatically interfaces with Kubernetes to deploy the necessary software in containers, launch needed sidecars, execute the workflow steps, and perform automated checks for deviation from expected behavior. We evaluate Frisbee through a series of tests to demonstrate its role in designing and evaluating cloud-native applications; Frisbee helps in testing uncertainties at the level of application (e.g., dynamically changing request patterns), infrastructure (e.g., crashes, network partitions), and deployment (e.g., saturation points). Our findings have strong implications for the design, deployment, and evaluation of cloud applications. The most prominent are that erroneous benchmark outputs can cause an apparent performance improvement, that automated failover mechanisms may require interoperability with clients, and that a proper placement policy should also account for the clock frequency, not only the number of cores.
Published: 2021

8. Balancing Garbage Collection vs I/O Amplification using hybrid Key-Value Placement in LSM-based Key-Value Stores

Author: Xanthakis, Giorgos, Saloustros, Giorgos, Batsaras, Nikos, Papagiannis, Anastasios, and Bilas, Angelos
Subjects: Computer Science - Databases
Abstract: Key-value (KV) separation is a technique that introduces randomness in the I/O access patterns to reduce I/O amplification in LSM-based key-value stores for fast storage devices (NVMe). KV separation has a significant drawback that makes it less attractive: Delete and especially update operations that are important in modern workloads result in frequent and expensive garbage collection (GC) in the value log. In this paper, we design and implement Parallax, which proposes hybrid KV placement that reduces GC overhead significantly and maximizes the benefits of using a log. We first model the benefits of KV separation for different KV pair sizes. We use this model to classify KV pairs into three categories: small, medium, and large. Then, Parallax uses different approaches for each KV category: It always places large values in a log and small values in place. For medium values it uses a mixed strategy that combines the benefits of using a log and eliminates GC overhead as follows: It places medium values in a log for all but the last few (typically one or two) levels in the LSM structure, where it performs a full compaction, merges values in place, and reclaims log space without the need for GC. We evaluate Parallax against RocksDB, which places all values in place, and BlobDB, which always performs KV separation. We find that Parallax increases throughput by up to 12.4x and 17.83x, decreases I/O amplification by up to 27.1x and 26x, and increases CPU efficiency by up to 18.7x and 28x, respectively, for all but scan-based YCSB workloads.
Comment: 14 pages, 8 figures
Published: 2021
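
Note: the hybrid placement rule described in the abstract above can be sketched as a small decision function. This is an illustrative sketch only, not Parallax's code. The size thresholds `SMALL_MAX_BYTES` and `LARGE_MIN_BYTES` are made-up assumptions (the abstract derives the categories from a model and gives no numbers), and `classify`, `place`, and `in_place_levels` are hypothetical names. Levels are assumed to be numbered from 0 (top) to `num_levels - 1` (last).

```python
from enum import Enum


class Placement(Enum):
    IN_PLACE = "in_place"  # value stored together with its key in the LSM level
    LOG = "log"            # value stored in the value log (KV separation)


# Assumed thresholds for illustration only; not taken from the abstract.
SMALL_MAX_BYTES = 128
LARGE_MIN_BYTES = 1024


def classify(kv_size: int) -> str:
    """Classify a KV pair as small, medium, or large by its size in bytes."""
    if kv_size <= SMALL_MAX_BYTES:
        return "small"
    if kv_size >= LARGE_MIN_BYTES:
        return "large"
    return "medium"


def place(kv_size: int, level: int, num_levels: int, in_place_levels: int = 1) -> Placement:
    """Decide where a value lives at a given LSM level under the hybrid rule."""
    category = classify(kv_size)
    if category == "small":
        return Placement.IN_PLACE  # small values are always kept in place
    if category == "large":
        return Placement.LOG       # large values always go to the log
    # Medium values stay in the log except in the last few levels,
    # where compaction merges them in place and log space is reclaimed.
    if level >= num_levels - in_place_levels:
        return Placement.IN_PLACE
    return Placement.LOG
```

With five levels and the default of one in-place level, a 512-byte pair is placed in the log at levels 0 through 3 and in place at level 4.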

9. Power and Performance Analysis of Persistent Key-Value Stores

Author: Mikrou, Stella, Papagiannis, Anastasios, Saloustros, Giorgos, Marazakis, Manolis, and Bilas, Angelos
Subjects: Computer Science - Distributed, Parallel, and Cluster Computing, Computer Science - Performance
Abstract: With the current rate of data growth, processing needs are becoming difficult to fulfill due to CPU power and energy limitations. Data serving systems and especially persistent key-value stores have become a substantial part of data processing stacks in the data center, providing access to massive amounts of data for applications and services. Key-value stores exhibit high CPU and I/O overheads because of their constant need to reorganize data on the devices. In this paper, we examine the efficiency of two key-value stores on four servers of different generations and with different CPU architectures. We use RocksDB, a key-value store that is deployed widely, e.g., at Facebook, and Kreon, a research key-value store that has been designed to reduce CPU overhead. We evaluate their behavior and overheads on an ARM-based microserver and three different generations of x86 servers. Our findings show that microservers have better power efficiency in the range of 0.68-3.6x with a comparable tail latency.
Published: 2020

10. The EuroSys 2020 Online Conference: Experience and lessons learned

Author: Bilas, Angelos, Kostic, Dejan, Magoutis, Kostas, Markatos, Evangelos, Narayanan, Dushyanth, Pietzuch, Peter, and Seltzer, Margo
Subjects: Computer Science - Computers and Society
Abstract: The 15th European Conference on Computer Systems (EuroSys'20) was organized as a virtual (online) conference on April 27-30, 2020. The main EuroSys'20 track took place April 28-30, 2020, preceded by five workshops (EdgeSys'20, EuroDW'20, EuroSec'20, PaPoC'20, SPMA'20) on April 27, 2020. The decision to hold a virtual (online) conference was taken in early April 2020, after consultations with the EuroSys community and internal discussions about potential options, eventually allowing about three weeks for the organization. This paper describes the choices we made to organize EuroSys'20 as a virtual (online) conference, the challenges we addressed, and the lessons learned.
Published: 2020

11. VAT: Asymptotic Cost Analysis for Multi-Level Key-Value Stores

Author: Batsaras, Nikos, Saloustros, Giorgos, Papagiannis, Anastasios, Fatourou, Panagiota, and Bilas, Angelos
Subjects: Computer Science - Distributed, Parallel, and Cluster Computing, Computer Science - Databases, Computer Science - Data Structures and Algorithms
Abstract: Over the past years, there has been an increasing number of key-value (KV) store designs, each optimizing for a different set of requirements. Furthermore, with the advancements of storage technology the design space of KV stores has become even more complex. More recent KV-store designs target fast storage devices, such as SSDs and NVM. Most of these designs aim to reduce amplification during data reorganization by taking advantage of device characteristics. However, until today most analysis of KV-store designs is experimental and limited to specific design points. This makes it difficult to compare tradeoffs across different designs, find optimal configurations and guide future KV-store design. In this paper, we introduce the Variable Amplification-Throughput analysis (VAT) to calculate insert-path amplification and its impact on multi-level KV-store performance. We use VAT to express the behavior of several existing design points and to explore tradeoffs that are not possible or easy to measure experimentally. VAT indicates that by inserting randomness in the insert-path, KV stores can reduce amplification by more than 10x for fast storage devices. Techniques, such as key-value separation and tiering compaction, reduce amplification by 10x and 5x, respectively. Additionally, VAT predicts that the advancements in device technology towards NVM reduce the benefits from both using key-value separation and tiering.
Published: 2020

12. BDDT-SCC: A Task-parallel Runtime for Non Cache-Coherent Multicores

Author: Labrineas, Alexandros, Pratikakis, Polyvios, Nikolopoulos, Dimitrios S., and Bilas, Angelos
Subjects: Computer Science - Distributed, Parallel, and Cluster Computing, Computer Science - Programming Languages
Abstract: This paper presents BDDT-SCC, a task-parallel runtime system for non cache-coherent multicore processors, implemented for the Intel Single-Chip Cloud Computer. The BDDT-SCC runtime includes a dynamic dependence analysis and automatic synchronization, and executes OpenMP-Ss tasks on a non cache-coherent architecture. We design a runtime that uses fast on-chip inter-core communication with small messages. At the same time, we use non coherent shared memory to avoid large core-to-core data transfers that would incur a high volume of unnecessary copying. We evaluate BDDT-SCC on a set of representative benchmarks, in terms of task granularity, locality, and communication. We find that memory locality and allocation play a very important role in performance, as the architecture of the SCC memory controllers can create strong contention effects. We suggest patterns that improve memory locality and thus the performance of applications, and measure their impact.
Published: 2016
