Search Results: 264 results for "Marchal, Loris"
2. Scheduling on Two Types of Resources: a Survey
- Author
-
Beaumont, Olivier, Canon, Louis-Claude, Eyraud-Dubois, Lionel, Lucarelli, Giorgio, Marchal, Loris, Mommessin, Clément, Simon, Bertrand, and Trystram, Denis
- Subjects
Computer Science - Distributed, Parallel, and Cluster Computing - Abstract
The evolution in the design of modern parallel platforms calls for revisiting the scheduling of jobs on distributed heterogeneous resources. The goal of this survey is to present the main existing algorithms, to classify them based on their underlying principles, and to propose unified implementations enabling their fair comparison, both in terms of running time and quality of schedules, on a large set of common benchmarks that we have made available to the community. Beyond this comparison, our goal is also to understand the main difficulties that heterogeneity conveys and the shared principles that guide the design of efficient algorithms.
- Published
- 2019
- Full Text
- View/download PDF
3. Taming data locality for task scheduling under memory constraint in runtime systems
- Author
-
Gonthier, Maxime, Marchal, Loris, and Thibault, Samuel
- Published
- 2023
- Full Text
- View/download PDF
4. Locality-Aware Scheduling of Independent Tasks for Runtime Systems
- Author
-
Gonthier, Maxime, Marchal, Loris, and Thibault, Samuel
- Published
- 2022
- Full Text
- View/download PDF
5. Improving batch schedulers with node stealing for failed jobs
- Author
-
Du, Yishu, Marchal, Loris, Pallez, Guillaume, and Robert, Yves
- Published
- 2024
- Full Text
- View/download PDF
6. Mapping series-parallel streaming applications on hierarchical platforms with reliability and energy constraints
- Author
-
Gou, Changjiang, Benoit, Anne, Chen, Mingsong, Marchal, Loris, and Wei, Tongquan
- Published
- 2022
- Full Text
- View/download PDF
7. Trading performance for memory in sparse direct solvers using low-rank compression
- Author
-
Marchal, Loris, Marette, Thibault, Pichon, Grégoire, and Vivien, Frédéric
- Published
- 2022
- Full Text
- View/download PDF
8. Taming Tail Latency in Key-Value Stores: A Scheduling Perspective
- Author
-
Mokhtar, Sonia Ben, Canon, Louis-Claude, Dugois, Anthony, Marchal, Loris, and Rivière, Etienne
- Published
- 2021
- Full Text
- View/download PDF
9. Locality-Aware Scheduling of Independent Tasks for Runtime Systems
- Author
-
Gonthier, Maxime, Marchal, Loris, and Thibault, Samuel
- Published
- 2022
- Full Text
- View/download PDF
10. Improving Mapping for Sparse Direct Solvers: A Trade-Off Between Data Locality and Load Balancing
- Author
-
Gou, Changjiang, Zoobi, Ali Al, Benoit, Anne, Faverge, Mathieu, Marchal, Loris, Pichon, Grégoire, and Ramet, Pierre
- Published
- 2020
- Full Text
- View/download PDF
11. Scheduling Trees of Malleable Tasks for Sparse Linear Algebra
- Author
-
Guermouche, Abdou, Marchal, Loris, Simon, Bertrand, and Vivien, Frédéric
- Subjects
Computer Science - Distributed, Parallel, and Cluster Computing - Abstract
Scientific workloads are often described as directed acyclic task graphs. In this paper, we focus on the multifrontal factorization of sparse matrices, whose task graph is structured as a tree of parallel tasks. Among the existing models for parallel tasks, the concept of malleable tasks is especially powerful, as it allows each task to be processed on a time-varying number of processors. Following the model advocated by Prasanna and Musicus for matrix computations, we consider malleable tasks whose speedup is $p^\alpha$, where $p$ is the fractional share of processors on which a task executes, and $\alpha$ ($0 < \alpha \leq 1$) is a parameter which does not depend on the task. We first motivate the relevance of this model for our application with actual experiments on multicore platforms. Then, we study the optimal allocation proposed by Prasanna and Musicus for makespan minimization using optimal control theory. We largely simplify their proofs by resorting only to pure scheduling arguments. Building on the insight gained thanks to these new proofs, we extend the study to distributed multicore platforms, where a task cannot be split across several nodes. In such a distributed setting (homogeneous or heterogeneous), we prove the NP-completeness of the corresponding scheduling problem, and propose some approximation algorithms. We finally assess the relevance of our approach by simulations on realistic trees. We show that the average performance gain of our allocations with respect to existing solutions (which are thus unaware of the actual speedup functions) is up to 16% for $\alpha=0.9$ (the value observed in the real experiments)., Comment: Paper accepted for publication at EuroPar 2015
- Published
- 2014
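As a side note on the abstract above: under the $p^\alpha$ speedup model, tasks that must finish simultaneously should receive processor shares proportional to $L_i^{1/\alpha}$, since equalizing $L_i / p_i^\alpha$ across tasks forces exactly that proportionality. The sketch below is illustrative only; the function name `pm_shares` is ours, not from the paper.

```python
def pm_shares(works, alpha=0.9, total=1.0):
    """Fractional processor shares making tasks with speedup p**alpha finish
    simultaneously: time_i = works[i] / p_i**alpha is equal for all i exactly
    when p_i is proportional to works[i]**(1/alpha)."""
    raw = [w ** (1.0 / alpha) for w in works]
    s = sum(raw)
    return [total * r / s for r in raw]

# With alpha = 1 (linear speedup), shares are simply proportional to work.
shares = pm_shares([8.0, 1.0], alpha=1.0)
```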
12. Parallel scheduling of task trees with limited memory
- Author
-
Eyraud-Dubois, Lionel, Marchal, Loris, Sinnen, Oliver, and Vivien, Frédéric
- Subjects
Computer Science - Distributed, Parallel, and Cluster Computing - Abstract
This paper investigates the execution of tree-shaped task graphs using multiple processors. Each edge of such a tree represents some large data. A task can only be executed if all its input and output data fit into memory, and a piece of data can only be removed from memory after the completion of the task that uses it as input. Such trees arise, for instance, in the multifrontal method of sparse matrix factorization. The peak memory needed for the processing of the entire tree depends on the execution order of the tasks. With one processor, the objective of the tree traversal is to minimize the required memory. This problem has been well studied, and optimal polynomial algorithms have been proposed. Here, we extend the problem by considering multiple processors, which is of obvious interest in the application area of matrix factorization. With multiple processors comes the additional objective of minimizing the time needed to traverse the tree, i.e., the makespan. Not surprisingly, this problem proves to be much harder than the sequential one. We study the computational complexity of this problem and provide inapproximability results even for unit weight trees. We design a series of practical heuristics achieving different trade-offs between the minimization of peak memory usage and makespan. Some of these heuristics are able to process a tree while keeping the memory usage under a given memory limit. The heuristics are compared in an extensive experimental evaluation using realistic trees., Comment: arXiv admin note: substantial text overlap with arXiv:1210.2580
- Published
- 2014
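The memory model in the abstract above (each node produces an output file for its parent; a task needs all its children's outputs plus its own output resident, and inputs are freed on completion) can be made concrete with a toy simulator of a given sequential order. This is a minimal sketch under that model; `peak_memory`, `parent` and `out` are hypothetical names, not the paper's code.

```python
def peak_memory(order, parent, out):
    """Peak memory of a sequential traversal 'order' of a task tree.
    parent[i] is the parent of node i (None for the root); out[i] is the
    size of the output file node i produces for its parent."""
    resident = {}  # node -> its output file currently held in memory
    peak = 0
    for i in order:
        # collect (and free) the outputs of i's children: they are consumed now
        inputs = [resident.pop(c) for c in list(resident) if parent[c] == i]
        # inputs, i's own output, and other resident files coexist in memory
        peak = max(peak, sum(inputs) + out[i] + sum(resident.values()))
        resident[i] = out[i]
    return peak

# Tiny tree: root 0 with two leaves 1 and 2.
parent = {0: None, 1: 0, 2: 0}
out = {0: 2, 1: 3, 2: 4}
print(peak_memory([1, 2, 0], parent, out))  # peak reached when executing the root
```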
13. Hector: A Framework to Design and Evaluate Scheduling Strategies in Persistent Key-Value Stores
- Author
-
Canon, Louis-Claude, Dugois, Anthony, Marchal, Loris, and Rivière, Etienne
- Published
- 2023
- Full Text
- View/download PDF
14. Analysis of Dynamic Scheduling Strategies for Matrix Multiplication on Heterogeneous Platforms
- Author
-
Beaumont, Olivier and Marchal, Loris
- Subjects
Computer Science - Distributed, Parallel, and Cluster Computing, F.2.0 - Abstract
The tremendous increase in the size and heterogeneity of supercomputers makes it very difficult to predict the performance of a scheduling algorithm. Therefore, dynamic solutions, where scheduling decisions are made at runtime, have overtaken static allocation strategies. The simplicity and efficiency of dynamic schedulers such as Hadoop are key to the success of the MapReduce framework. Dynamic schedulers such as StarPU, PaRSEC or StarSs have also been developed for more constrained computations, e.g. task graphs coming from linear algebra. To make their decisions, these runtime systems use static information, such as the distance of tasks to the critical path or the affinity between tasks and computing resources (CPU, GPU,...), and dynamic information, such as where input data are actually located. In this paper, we concentrate on two elementary linear algebra kernels, namely the outer product and the matrix multiplication. For each problem, we propose several dynamic strategies that can be used at runtime and we provide an analytic study of their theoretical performance. We prove that the theoretical analysis provides a very good estimate of the amount of communication induced by a dynamic strategy and can be used to efficiently determine thresholds in a dynamic scheduler, thus making it possible to choose among them for a given problem and architecture., Comment: Accepted for publication in HPDC 2014
- Published
- 2014
- Full Text
- View/download PDF
15. Taming Tail Latency in Key-Value Stores: A Scheduling Perspective
- Author
-
Mokhtar, Sonia Ben, Canon, Louis-Claude, Dugois, Anthony, Marchal, Loris, and Rivière, Etienne
- Published
- 2021
- Full Text
- View/download PDF
16. Online Scheduling of Task Graphs on Hybrid Platforms
- Author
-
Canon, Louis-Claude, Marchal, Loris, Simon, Bertrand, and Vivien, Frédéric
- Published
- 2018
- Full Text
- View/download PDF
17. Scheduling tree-shaped task graphs to minimize memory and makespan
- Author
-
Marchal, Loris, Sinnen, Oliver, and Vivien, Frédéric
- Subjects
Computer Science - Distributed, Parallel, and Cluster Computing - Abstract
This paper investigates the execution of tree-shaped task graphs using multiple processors. Each edge of such a tree represents a large I/O file. A task can only be executed if all its input and output files fit into memory, and a file can only be removed from memory after it has been consumed. Such trees arise, for instance, in the multifrontal method of sparse matrix factorization. The maximum amount of memory needed depends on the execution order of the tasks. With one processor, the objective of the tree traversal is to minimize the required memory. This problem has been well studied, and optimal polynomial algorithms have been proposed. Here, we extend the problem by considering multiple processors, which is of obvious interest in the application area of matrix factorization. With multiple processors comes the additional objective of minimizing the time needed to traverse the tree, i.e., the makespan. Not surprisingly, this problem proves to be much harder than the sequential one. We study the computational complexity of this problem and provide an inapproximability result even for unit weight trees. Several heuristics are proposed, each with a different optimization focus, and they are analyzed in an extensive experimental evaluation using realistic trees.
- Published
- 2012
18. Limiting the memory footprint when dynamically scheduling DAGs on shared-memory platforms
- Author
-
Marchal, Loris, Simon, Bertrand, and Vivien, Frédéric
- Published
- 2019
- Full Text
- View/download PDF
19. Hector: A Framework to Design and Evaluate Scheduling Strategies in Persistent Key-Value Stores
- Author
-
Canon, Louis-Claude, Dugois, Anthony, Marchal, Loris, and Rivière, Etienne
- Subjects
Computer Science - Performance, Computer Science - Databases, Scheduling, Performance, Key-Value Stores, Replica, Modularity - Abstract
Key-value stores distribute data across several storage nodes to handle large amounts of parallel requests. Proper scheduling of these requests impacts the quality of service, as measured by achievable throughput and (tail) latencies. In addition to scheduling, performance heavily depends on the nature of the workload and the deployment environment. It is, unfortunately, difficult to evaluate different scheduling strategies consistently under the same operational conditions. Moreover, such strategies are often hard-coded in the system, limiting flexibility. We present Hector, a modular framework for implementing and evaluating scheduling policies in Apache Cassandra. Hector enables users to select among several options for key components of the scheduling workflow, from the request propagation via replica selection to the local ordering of incoming requests at a storage node. We demonstrate the capabilities of Hector by comparing strategies in various settings. For example, we find that leveraging cache locality effects may be of particular interest: we propose a new replica selection strategy, called Popularity-Aware, that can support 6 times the maximum throughput of the default algorithm under specific key access patterns. We also show that local scheduling policies have a significant effect when parallelism at each storage node is limited.
- Published
- 2023
20. Improving Mapping for Sparse Direct Solvers
- Author
-
Gou, Changjiang, Zoobi, Ali Al, Benoit, Anne, Faverge, Mathieu, Marchal, Loris, Pichon, Grégoire, and Ramet, Pierre
- Published
- 2020
- Full Text
- View/download PDF
21. Scheduling and data redistribution strategies on star platforms
- Author
-
Marchal, Loris, Rehn, Veronika, Robert, Yves, and Vivien, Frédéric
- Subjects
Computer Science - Distributed, Parallel, and Cluster Computing - Abstract
In this work we are interested in the problem of scheduling and redistributing data on master-slave platforms. We consider the case where the workers possess initial loads, some of which have to be redistributed in order to balance their completion times. We examine two different scenarios. The first model assumes that the data consists of independent and identical tasks. We prove the NP-completeness in the strong sense for the general case, and we present two optimal algorithms for special platform types. Furthermore, we propose three heuristics for the general case. Simulations consolidate the theoretical results. The second data model is based on Divisible Load Theory. This problem can be solved in polynomial time by a combination of linear programming and simple analytical manipulations.
- Published
- 2006
22. Low-Cost Approximation Algorithms for Scheduling Independent Tasks on Hybrid Platforms
- Author
-
Canon, Louis-Claude, Marchal, Loris, and Vivien, Frédéric
- Published
- 2017
- Full Text
- View/download PDF
23. A generic scheduler to foster data locality for GPU and out-of-core task-based applications
- Author
-
Gonthier, Maxime, Thibault, Samuel, and Marchal, Loris
- Subjects
Computer Science - Distributed, Parallel, and Cluster Computing - Abstract
Hardware accelerators, such as GPUs, now provide a large part of the computational power used for scientific simulations. GPUs come with their own (limited) memory and are connected to the main memory of the machine via a bus with limited bandwidth. Scientific simulations often operate on very large data, to the point of not fitting in the limited GPU memory. In this case, one has to turn to out-of-core computing: data are kept in the CPU memory and moved back and forth to the GPU memory when needed for the computation. This out-of-core situation also arises when processing, on multicore CPUs with limited memory, huge datasets stored on disk. In both cases, data movement quickly becomes a performance bottleneck. Task-based runtime schedulers have emerged as a convenient and efficient way to manage large applications on such heterogeneous platforms. They are in charge of choosing which tasks to assign to which processing unit and in which order they should be processed. In this work, we focus on this scheduling problem for a task-based runtime: improving data locality in an out-of-core setting, in order to reduce data movements. We design a data-aware strategy for both task scheduling and data eviction from limited memories. We compare it to existing scheduling techniques in runtime systems. Using the StarPU runtime, we show that our strategy achieves significantly better performance when scheduling tasks on multiple GPUs with limited memory, as well as on multiple CPU cores with limited main memory.
- Published
- 2023
24. Memory-Aware Scheduling of Tasks Sharing Data on Multiple GPUs
- Author
-
Gonthier, Maxime, Marchal, Loris, and Thibault, Samuel
- Subjects
Runtime Systems, Memory-Aware, GPUs, Scheduling, Computer Science - Distributed, Parallel, and Cluster Computing, Locality
- Published
- 2023
25. Scheduling Trees of Malleable Tasks for Sparse Linear Algebra
- Author
-
Guermouche, Abdou, Marchal, Loris, Simon, Bertrand, and Vivien, Frédéric
- Published
- 2015
- Full Text
- View/download PDF
26. Assessing the cost of redistribution followed by a computational kernel: Complexity and performance results
- Author
-
Herrmann, Julien, Bosilca, George, Hérault, Thomas, Marchal, Loris, Robert, Yves, and Dongarra, Jack
- Published
- 2016
- Full Text
- View/download PDF
27. Non-clairvoyant Reduction Algorithms for Heterogeneous Platforms
- Author
-
Benoit, Anne, Canon, Louis-Claude, and Marchal, Loris
- Published
- 2014
- Full Text
- View/download PDF
28. Online Scheduling of Task Graphs on Hybrid Platforms
- Author
-
Canon, Louis-Claude, Marchal, Loris, Simon, Bertrand, and Vivien, Frédéric
- Published
- 2018
- Full Text
- View/download PDF
29. Memory-aware tree traversals with pre-assigned tasks
- Author
-
Herrmann, Julien, Marchal, Loris, and Robert, Yves
- Published
- 2015
- Full Text
- View/download PDF
30. Checkpointing à la Young/Daly: An Overview
- Author
-
Benoit, Anne, Du, Yishu, Herault, Thomas, Marchal, Loris, Pallez, Guillaume, Perotin, Lucas, Robert, Yves, Sun, Hongyang, and Vivien, Frederic
- Published
- 2022
- Full Text
- View/download PDF
31. Checkpointing à la Young/Daly: An Overview
- Author
-
Benoit, Anne, Du, Yishu, Herault, Thomas, Marchal, Loris, Pallez, Guillaume, Perotin, Lucas, Robert, Yves, Sun, Hongyang, and Vivien, Frédéric
- Subjects
Computer Science - Performance, Checkpoint, Young/Daly formula, Optimal period, Computer Science - Distributed, Parallel, and Cluster Computing - Abstract
The Young/Daly formula provides an approximation of the optimal checkpoint period for a parallel application executing on a supercomputing platform. The Young/Daly formula was originally designed for preemptible tightly-coupled applications. We provide some background and survey various application scenarios to assess the usefulness and limitations of the formula.
- Published
- 2022
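For context, the first-order Young/Daly approximation referenced above sets the checkpoint period to W ≈ sqrt(2 μ C), where μ is the platform MTBF and C the checkpoint cost; it is valid when C is small with respect to μ. A minimal sketch (the function name is ours):

```python
from math import sqrt

def young_daly_period(mtbf_s, ckpt_s):
    """First-order optimal checkpoint period W ~ sqrt(2 * mu * C), where
    mu is the platform MTBF and C the time to take one checkpoint (both
    in seconds); valid when C is small compared to mu."""
    return sqrt(2.0 * mtbf_s * ckpt_s)

# e.g. a platform MTBF of 24 h and 60 s checkpoints:
period = young_daly_period(24 * 3600.0, 60.0)  # ~ 3220 s, i.e. just under an hour
```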
32. Model and Complexity Results for Tree Traversals on Hybrid Platforms
- Author
-
Herrmann, Julien, Marchal, Loris, and Robert, Yves
- Published
- 2013
- Full Text
- View/download PDF
33. Memory-Aware Scheduling of Tasks Sharing Data on Multiple GPUs with Dynamic Runtime Systems
- Author
-
Gonthier, Maxime, Marchal, Loris, and Thibault, Samuel
- Subjects
Memory-Aware, GPUs, Scheduling, Memory-aware scheduling, Parallel algorithms, Eviction policy, Runtime systems, Locality, Tasks sharing data - Abstract
The use of accelerators such as GPUs has become mainstream to achieve high performance on modern computing systems. GPUs come with their own (limited) memory and are connected to the main memory of the machine through a bus (with limited bandwidth). When a computation is started on a GPU, the corresponding data needs to be transferred to the GPU before the computation starts. Such data movements may become a bottleneck for performance, especially when several GPUs have to share the communication bus. Task-based runtime schedulers have emerged as a convenient and efficient way to use such heterogeneous platforms. When processing an application, the scheduler has the knowledge of all tasks available for processing on a GPU, as well as their input data dependencies. Hence, it is able to choose which task to allocate to which GPU and to reorder tasks so as to minimize data movements. We focus on this problem of partitioning and ordering tasks that share some of their input data. We present a novel dynamic strategy based on data selection to efficiently allocate tasks to GPUs, together with a custom eviction policy, and compare them to existing strategies using either a well-known graph partitioner or standard scheduling techniques in runtime systems. We also improve an offline scheduler recently proposed for a single GPU, adding load balancing and task stealing capabilities. All strategies have been implemented on top of the StarPU runtime, and we show that our dynamic strategy achieves better performance when scheduling tasks on multiple GPUs with limited memory.
- Published
- 2022
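As an illustration of the kind of dynamic, locality-driven allocation the abstract above describes (a simplified sketch, not the paper's actual strategy; the function name and the naive FIFO-style eviction are hypothetical), a greedy allocator can send each task to the GPU that already holds the largest volume of its input data:

```python
def assign_tasks(tasks, n_gpus, capacity):
    """Greedily map tasks to GPUs, favoring data already resident.
    tasks: list of {'inputs': {data_name: size}}.
    Returns (assignment, total_bytes_transferred)."""
    memory = [dict() for _ in range(n_gpus)]  # resident data per GPU
    transferred = 0
    assignment = []
    for task in tasks:
        # pick the GPU already holding the largest volume of this task's inputs
        best = max(range(n_gpus),
                   key=lambda g: sum(s for d, s in task['inputs'].items()
                                     if d in memory[g]))
        for d, size in task['inputs'].items():
            if d not in memory[best]:
                transferred += size  # data must be loaded over the bus
                # naive eviction: drop resident data until the input fits
                while sum(memory[best].values()) + size > capacity:
                    memory[best].pop(next(iter(memory[best])))
                memory[best][d] = size
        assignment.append(best)
    return assignment, transferred
```

With two tasks sharing input 'A', both land on the same GPU and 'A' is transferred only once; the paper's strategy additionally reorders tasks and uses a custom eviction policy.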
34. Bounding the Flow Time in Online Scheduling with Structured Processing Sets
- Author
-
Canon, Louis-Claude, Dugois, Anthony, Marchal, Loris; FEMTO-ST (UMR 6174: UTBM, ENSMM, CNRS, Université de Franche-Comté, UBFC), ROMA team (Inria Lyon / LIP: ENS de Lyon, UCBL, CNRS)
- Subjects
Flow Time ,Processing Set Restrictions ,Key-Value Stores ,Lower Bound ,Restricted Assignment ,[INFO.INFO-DC] Computer Science [cs]/Distributed, Parallel, and Cluster Computing [cs.DC] ,Replication ,[INFO]Computer Science [cs] ,[INFO.INFO-DC]Computer Science [cs]/Distributed, Parallel, and Cluster Computing [cs.DC] ,[INFO] Computer Science [cs] - Abstract
Replication in distributed key-value stores makes scheduling more challenging, as it introduces processing set restrictions, which limit the number of machines that can process a given task. We focus on the online minimization of the maximum response time in such systems, that is, we aim at bounding the latency of each task. When processing sets have no structure, Anand et al. (Algorithmica, 2017) derive a strong lower bound on the competitiveness of the problem: no online scheduling algorithm can have a competitive ratio smaller than Ω(m), where m is the number of machines. In practice, data replication schemes are regular, and structured processing sets may make the problem easier to solve. We derive new lower bounds for various common structures, including inclusive, nested or interval structures. In particular, we consider fixed-size intervals of machines, which mimic the standard replication strategy of key-value stores. We prove that EFT (Earliest Finish Time) scheduling is (3 − 2/k)-competitive when optimizing max-flow on disjoint intervals of size k. However, we show that the competitive ratio of EFT is at least m − k + 1 when these intervals overlap, even when unit tasks are considered. We compare these two replication strategies in simulations and assess their efficiency when popularity biases are introduced, i.e., when some machines are accessed more frequently than others because they hold popular data. Even though overlapping intervals suffer from a bad worst case in theory, they enable clusters to reach a maximum load that is up to 50% higher than with disjoint sets.
- Published
- 2022
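The EFT policy analyzed above is simple to state: among the machines allowed by a task's processing set, pick the one on which the task would finish earliest. A minimal sketch under the restricted-assignment model (function and parameter names are illustrative):

```python
def eft_schedule(jobs, m):
    """Greedy EFT under processing-set restrictions.
    jobs: list of (release_time, processing_time, allowed_machines),
    given in release order. Returns the maximum flow (response) time."""
    free = [0.0] * m          # time at which each machine becomes idle
    max_flow = 0.0
    for r, p, allowed in jobs:
        # EFT: among the machines allowed for this job, pick the one
        # on which the job would finish earliest
        g = min(allowed, key=lambda i: max(free[i], r) + p)
        finish = max(free[g], r) + p
        free[g] = finish
        max_flow = max(max_flow, finish - r)
    return max_flow
```

Disjoint intervals of size k correspond to allowed sets like {0, ..., k−1}, {k, ..., 2k−1}; overlapping intervals simply translate into overlapping `allowed_machines` sets.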
35. Minimizing I/Os in Out-of-Core Task Tree Scheduling
- Author
-
Marchal, Loris, primary, McCauley, Samuel, additional, Simon, Bertrand, additional, and Vivien, Frédéric, additional
- Published
- 2022
- Full Text
- View/download PDF
36. Memory-Aware Scheduling of Tasks Sharing Data on Multiple GPUs with Dynamic Runtime Systems
- Author
-
Gonthier, Maxime, primary, Marchal, Loris, additional, and Thibault, Samuel, additional
- Published
- 2022
- Full Text
- View/download PDF
37. Steady-State for Batches of Identical Task Trees
- Author
-
Diakité, Sékou, Marchal, Loris, Nicod, Jean-Marc, Philippe, Laurent, Hutchison, David, Series editor, Kanade, Takeo, Series editor, Kittler, Josef, Series editor, Kleinberg, Jon M., Series editor, Mattern, Friedemann, Series editor, Mitchell, John C., Series editor, Naor, Moni, Series editor, Nierstrasz, Oscar, Series editor, Pandu Rangan, C., Series editor, Steffen, Bernhard, Series editor, Sudan, Madhu, Series editor, Terzopoulos, Demetri, Series editor, Tygar, Doug, Series editor, Vardi, Moshe Y., Series editor, Weikum, Gerhard, Series editor, Sips, Henk, editor, Epema, Dick, editor, and Lin, Hai-Xiang, editor
- Published
- 2009
- Full Text
- View/download PDF
38. Towards Real-Time Distributed Signal Modeling for Brain-Machine Interfaces
- Author
-
DiGiovanna, Jack, Marchal, Loris, Rattanatamrong, Prapaporn, Zhao, Ming, Darmanjian, Shalom, Mahmoudi, Babak, Sanchez, Justin C., Príncipe, José C., Hermer-Vazquez, Linda, Figueiredo, Renato, Fortes, José A. B., Hutchison, David, editor, Kanade, Takeo, editor, Kittler, Josef, editor, Kleinberg, Jon M., editor, Mattern, Friedemann, editor, Mitchell, John C., editor, Naor, Moni, editor, Nierstrasz, Oscar, editor, Rangan, C. Pandu, editor, Steffen, Bernhard, editor, Sudan, Madhu, editor, Terzopoulos, Demetri, editor, Tygar, Doug, editor, Vardi, Moshe Y., editor, Weikum, Gerhard, editor, Shi, Yong, editor, van Albada, Geert Dick, editor, Dongarra, Jack, editor, and Sloot, Peter M. A., editor
- Published
- 2007
- Full Text
- View/download PDF
39. Doing better for jobs that failed: node stealing from a batch scheduler's perspective
- Author
-
Du, Yishu, Marchal, Loris, Pallez, Guillaume, Robert, Yves; ROMA team (Inria Lyon / LIP: ENS de Lyon, UCBL, CNRS), Tongji University, TADAAM team (Inria Bordeaux - Sud-Ouest / LaBRI), Innovative Computing Laboratory (ICL), The University of Tennessee, Knoxville; European Project 956748 ADMIRE (H2020-EU.2.1.1 ICT, 2021)
- Subjects
mean flow ,Batch scheduler ,node stealing ,failures ,parallel jobs ,resilient schedule ,[INFO.INFO-DC]Computer Science [cs]/Distributed, Parallel, and Cluster Computing [cs.DC] ,maximum flow - Abstract
After a machine failure, batch schedulers typically reschedule the failed job with a high priority. This is fair to the failed job, but that job must still re-enter the submission queue and wait for enough resources to become available. The waiting time can be very long when the job is large and the platform highly loaded, as is the case on typical HPC platforms. We propose another strategy: when a job J fails, if no platform node is available, we steal one node from another job J′ and use it to continue the execution of J despite the failure. In this work, we give a detailed assessment of this node stealing strategy using traces from the Mira supercomputer at Argonne National Laboratory. The main conclusion is that node stealing improves the utilization of the platform and dramatically reduces the flow of large jobs, at the price of a slight increase in the flow of small jobs.
- Published
- 2022
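The node-stealing decision described above can be sketched as follows; the victim-selection rule used here (take a node from the running job with the most nodes) is a hypothetical placeholder, not necessarily the rule evaluated in the paper:

```python
def on_node_failure(failed_job, running_jobs, free_nodes):
    """When a node of failed_job dies: return a replacement node,
    or None if the job must be requeued. Jobs are {'nodes': [...]}."""
    if free_nodes:
        return free_nodes.pop()  # a spare node exists: no stealing needed
    # steal from another running job; here, the one with the most nodes
    candidates = [j for j in running_jobs
                  if j is not failed_job and j['nodes']]
    if not candidates:
        return None  # nothing to steal: fall back to requeueing
    victim = max(candidates, key=lambda j: len(j['nodes']))
    return victim['nodes'].pop()
```

The trade-off the paper quantifies is visible here: the failed (typically large) job resumes immediately, while the victim job loses one node and runs slightly slower.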
40. Taming data locality for GPU task scheduling in runtime systems
- Author
-
Gonthier, Maxime, Marchal, Loris, Thibault, Samuel; STORM team (Inria Bordeaux - Sud-Ouest / LaBRI: Université de Bordeaux, ENSEIRB, CNRS), ROMA team (Inria Lyon / LIP: ENS Lyon, UCBL, CNRS); ANR-19-CE46-0009 SOLHARIS (AAPG2019)
- Subjects
Memory-aware scheduling ,[INFO.INFO-DC] Computer Science [cs]/Distributed, Parallel, and Cluster Computing [cs.DC] ,Eviction policy ,Runtime systems ,[INFO]Computer Science [cs] ,[INFO] Computer Science [cs] ,[INFO.INFO-DC]Computer Science [cs]/Distributed, Parallel, and Cluster Computing [cs.DC] ,Tasks sharing data - Abstract
A now-classical way of meeting the increasing demand for computing speed by HPC applications is the use of GPUs and/or other accelerators. Such accelerators have their own memory, which is usually quite limited, and are connected to the main memory through a bus with bounded bandwidth. Thus, particular care should be devoted to data locality in order to avoid unnecessary data movements. Task-based runtime schedulers have emerged as a convenient and efficient way to use such heterogeneous platforms. When processing an application, the scheduler has the knowledge of all tasks available for processing on a GPU, as well as their input data dependencies. Hence, it is possible to produce a task processing order aiming at reducing the total processing time through three objectives: minimizing data transfers, overlapping transfers with computations, and optimizing the eviction of previously-loaded data. In this paper, we focus on how to schedule tasks that share some of their input data (but are otherwise independent) on a GPU. We provide a formal model of the problem, exhibit an optimal eviction strategy, and show that ordering tasks to minimize data movement is NP-complete. We review and adapt existing ordering strategies to this problem, and propose a new one based on task aggregation. We prove that the underlying problem of this new strategy is NP-complete, and prove the reasonable complexity of our proposed heuristic. These strategies have been implemented in the StarPU runtime system. We present their performance on tasks from tiled 2D and 3D matrix products, Cholesky factorization, randomized task orders, as well as randomized data pairs from the 2D matrix product. We introduce a visual way to understand these results, together with lower bounds on the number of data loads for the 2D and 3D matrix products. Our experiments demonstrate that using our new strategy together with the optimal eviction policy reduces the amount of data movement as well as the total processing time.
- Published
- 2022
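The optimal eviction strategy mentioned in the abstract above is, in this setting of tasks sharing inputs, reminiscent of Belady's MIN rule for caches: evict the resident data whose next use lies furthest in the future. A minimal sketch of that rule (illustrative names, assuming the full task order is known in advance):

```python
def evict_choice(resident, t, task_inputs):
    """Pick which resident data item to evict before running task t.
    task_inputs: list of sets, task_inputs[i] = inputs of the i-th task
    in the chosen processing order. Data never used again (next use
    = infinity) is evicted first, then the furthest next use."""
    def next_use(d):
        for i in range(t, len(task_inputs)):
            if d in task_inputs[i]:
                return i
        return float('inf')  # never needed again
    return max(resident, key=next_use)
```

Since the eviction choice depends on the future task order, reordering tasks and choosing evictions interact, which is one reason the combined problem is hard.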
41. Ordonnancement de charges de travail intensives en entrées-sorties par lot utilisant la localité des données
- Author
-
Gonthier, Maxime, Marchal, Loris, Thibault, Samuel, Larsson, Elisabeth, Nettelblad, Carl; STORM team (Inria Bordeaux - Sud-Ouest / LaBRI: Université de Bordeaux, ENSEIRB, CNRS), ROMA team (Inria Lyon / LIP: ENS de Lyon, UCBL, CNRS), Division of Scientific Computing, Uppsala University
- Subjects
Job input sharing ,Job scheduling ,High Performance Data Analytics ,[INFO.INFO-DC]Computer Science [cs]/Distributed, Parallel, and Cluster Computing [cs.DC] ,Data-aware - Abstract
Clusters make use of workload schedulers such as the Slurm Workload Manager to allocate computing jobs onto nodes. These schedulers usually aim at a good trade-off between increasing resource utilization and user satisfaction (decreasing job waiting time). However, they are typically unaware of jobs sharing large input files, which may happen in data-intensive scenarios: the same input file may be loaded several times, leading to a waste of resources. We study how to design a data-aware job scheduler that is able to keep large input files on the computing nodes, without impacting other memory needs, and that can use previously loaded files to limit data transfers and thereby reduce job waiting times. We present three schedulers capable of distributing the load between the computing nodes and of re-using, as much as possible, an input file already loaded in the memory of some node. We perform simulations using real cluster usage traces to compare them to classical job schedulers. The results show that keeping data in local memory between successive jobs and using data locality information to schedule jobs reduces job waiting times and drastically decreases the amount of data transferred.
- Published
- 2022
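A data-aware dispatcher in the spirit of the schedulers above can be sketched as follows (a simplification: one input file per job, load measured in assigned jobs; all names are hypothetical):

```python
def pick_node(job, nodes):
    """nodes: list of {'load': int, 'files': set}. Prefer a node that
    already caches job['file'] (ties broken by load); otherwise fall
    back to the least-loaded node, which then loads the file."""
    cached = [n for n in nodes if job['file'] in n['files']]
    pool = cached if cached else nodes
    node = min(pool, key=lambda n: n['load'])
    node['files'].add(job['file'])  # file is now resident on this node
    node['load'] += 1
    return node
```

A job whose file is already cached goes to the caching node even if a less-loaded node exists; this is exactly the locality-versus-load-balancing trade-off the three proposed schedulers navigate.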
42. Scheduling Divisible Loads with Return Messages on Heterogeneous Master-Worker Platforms
- Author
-
Beaumont, Olivier, Marchal, Loris, Robert, Yves, Hutchison, David, editor, Kanade, Takeo, editor, Kittler, Josef, editor, Kleinberg, Jon M., editor, Mattern, Friedemann, editor, Mitchell, John C., editor, Naor, Moni, editor, Nierstrasz, Oscar, editor, Pandu Rangan, C., editor, Steffen, Bernhard, editor, Sudan, Madhu, editor, Terzopoulos, Demetri, editor, Tygar, Dough, editor, Vardi, Moshe Y., editor, Weikum, Gerhard, editor, Bader, David A., editor, Parashar, Manish, editor, Sridhar, Varadarajan, editor, and Prasanna, Viktor K., editor
- Published
- 2005
- Full Text
- View/download PDF
43. Optimal Checkpointing Strategies for Iterative Applications
- Author
-
Du, Yishu, primary, Marchal, Loris, additional, Pallez, Guillaume, additional, and Robert, Yves, additional
- Published
- 2022
- Full Text
- View/download PDF
44. Bounding the Flow Time in Online Scheduling with Structured Processing Sets (extended version)
- Author
-
Canon, Louis-Claude, Dugois, Anthony, Marchal, Loris; FEMTO-ST (UMR 6174: UTBM, ENSMM, CNRS, Université de Franche-Comté, UBFC), ROMA team (Inria Lyon / LIP: ENS de Lyon, UCBL, CNRS), INRIA
- Subjects
Flow Time ,Lower Bound ,Replication ,Temps de réponse ,[INFO] Computer Science [cs] ,bases de données clé-valeur ,Processing Set Restrictions ,Key-Value Stores ,[INFO.INFO-DC] Computer Science [cs]/Distributed, Parallel, and Cluster Computing [cs.DC] ,Restricted Assignment ,borne inférieure ,ensembles d'exécution ,[INFO]Computer Science [cs] ,allocation restreinte ,[INFO.INFO-DC]Computer Science [cs]/Distributed, Parallel, and Cluster Computing [cs.DC] ,réplication - Abstract
Replication in distributed key-value stores makes scheduling more challenging, as it introduces processing set restrictions, which limit the number of machines that can process a given task. We focus on the online minimization of the maximum response time in such systems, that is, we aim at bounding the latency of each task. When processing sets have no structure, Anand et al. (Algorithmica, 2017) derive a strong lower bound on the competitiveness of the problem: no online scheduling algorithm can have a competitive ratio smaller than Ω(m), where m is the number of machines. In practice, data replication schemes are regular, and structured processing sets may make the problem easier to solve. We derive new lower bounds for various common structures, including inclusive, nested or interval structures. In particular, we consider fixed-size intervals of machines, which mimic the standard replication strategy of key-value stores. We prove that EFT (Earliest Finish Time) scheduling is (3 − 2/k)-competitive when optimizing max-flow on disjoint intervals of size k. However, we show that the competitive ratio of EFT is at least m − k + 1 when these intervals overlap, even when unit tasks are considered. We compare these two replication strategies in simulations and assess their efficiency when popularity biases are introduced, i.e., when some machines are accessed more frequently than others because they hold popular data. Even though overlapping intervals suffer from a bad worst case in theory, they enable clusters to reach a maximum load that is up to 50% higher than with disjoint sets.
- Published
- 2022
45. Minimizing I/Os in Out-of-Core Task Tree Scheduling.
- Author
-
Marchal, Loris, McCauley, Samuel, Simon, Bertrand, and Vivien, Frédéric
- Subjects
- *DIRECTED acyclic graphs, *HEURISTIC algorithms, *SCHEDULING, *TREE graphs - Abstract
Scientific applications are usually described using directed acyclic graphs, where nodes represent tasks and edges represent dependencies between tasks. For some applications, this graph is a tree: each task produces a single result used solely by its parent. The temporary results of each task have to be stored between their production and their use. We focus on the case where the data manipulated are very large, so that, during an execution, not all data may fit in memory simultaneously. In such a case, some data have to be temporarily written to disk and evicted from memory; these data are later read from disk when they are needed for computation. Such Input/Output operations are very expensive; hence, our goal is to minimize their total volume. The order in which the tasks are processed considerably influences the volume of such Input/Output operations. Finding the schedule that minimizes this volume is an open problem that we revisit in this paper. We first formalize and generalize known results, and prove that existing solutions can be arbitrarily worse than the optimal. We then present an Integer Linear Program to solve the problem optimally. Finally, we propose a novel heuristic algorithm and demonstrate its good performance through simulations on both synthetic and realistic trees built from actual scientific applications. [ABSTRACT FROM AUTHOR]
- Published
- 2023
- Full Text
- View/download PDF
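To see why the traversal order matters in the setting above, consider the memory footprint of a given schedule: whatever exceeds the available memory must be written to and read back from disk. A hedged sketch (not the paper's ILP or heuristic) that computes the in-core peak of a fixed left-to-right postorder, assuming each task v produces an output of size out[v] that is freed once its parent completes:

```python
def peak_memory(tree, out, root):
    """Peak memory of a left-to-right postorder traversal of a task tree.
    tree: node -> list of children; out: node -> output size.
    Running v requires the outputs of all its children plus its own."""
    peak = 0
    def visit(v, held):
        # held = memory occupied by data from outside v's subtree
        nonlocal peak
        acc = 0  # outputs of v's already-processed children
        for c in tree.get(v, []):
            visit(c, held + acc)
            acc += out[c]
        peak = max(peak, held + acc + out[v])
        # children outputs are freed here; out[v] stays resident
    visit(root, 0)
    return peak
```

Any peak above the available memory turns into I/O, and reordering the subtrees of a node generally changes the peak; the schedules studied in the paper reorder subtrees (and choose what to evict) precisely to reduce the I/O volume.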
46. Locality-Aware Scheduling of Independent Tasks for Runtime Systems
- Author
-
Gonthier, Maxime, Marchal, Loris, Thibault, Samuel; ROMA team (Inria Grenoble - Rhône-Alpes / LIP: ENS de Lyon, UCBL, CNRS), STORM team (Inria Bordeaux - Sud-Ouest / LaBRI: Université de Bordeaux, ENSEIRB, CNRS); Grid5000; ANR-19-CE46-0009 SOLHARIS (2019)
- Subjects
Politique d'éviction ,Support d'exécution ,Memory-aware scheduling ,Eviction policy ,Ordonnancement sous contrainte mémoire ,[INFO]Computer Science [cs] ,Runtime systems ,Tâches partageant des données ,[INFO.INFO-DC]Computer Science [cs]/Distributed, Parallel, and Cluster Computing [cs.DC] ,Tasks sharing data - Abstract
A now-classical way of meeting the increasing demand for computing speed by HPC applications is the use of GPUs and/or other accelerators. Such accelerators have their own memory, which is usually quite limited, and are connected to the main memory through a bus with bounded bandwidth. Thus, particular care should be devoted to data locality in order to avoid unnecessary data movements. Task-based runtime schedulers have emerged as a convenient and efficient way to use such heterogeneous platforms. When processing an application, the scheduler has the knowledge of all tasks available for processing on a GPU, as well as their input data dependencies. Hence, it is able to order tasks and prefetch their input data in the GPU memory (after possibly evicting some previously-loaded data), while aiming at minimizing data movements, so as to reduce the total processing time. In this paper, we focus on how to schedule tasks that share some of their input data (but are otherwise independent) on a GPU. We provide a formal model of the problem, exhibit an optimal eviction strategy, and show that ordering tasks to minimize data movement is NP-complete. We review and adapt existing ordering strategies to this problem, and propose a new one based on task aggregation. These strategies have been implemented in the StarPU runtime system. We present their performance on tasks from tiled 2D and 3D matrix products. Our experiments demonstrate that using our new strategy together with the optimal eviction policy reduces the amount of data movement as well as the total processing time.
- Published
- 2021
47. Taming Tail Latency in Key-Value Stores: a Scheduling Perspective (extended version)
- Author
-
Ben Mokhtar, Sonia, Canon, Louis-Claude, Dugois, Anthony, Marchal, Loris, Rivière, Etienne, Laboratoire d'InfoRmatique en Image et Systèmes d'information (LIRIS), Institut National des Sciences Appliquées de Lyon (INSA Lyon), Université de Lyon-Institut National des Sciences Appliquées (INSA)-Université de Lyon-Institut National des Sciences Appliquées (INSA)-Centre National de la Recherche Scientifique (CNRS)-Université Claude Bernard Lyon 1 (UCBL), Université de Lyon-École Centrale de Lyon (ECL), Université de Lyon-Université Lumière - Lyon 2 (UL2), Franche-Comté Électronique Mécanique, Thermique et Optique - Sciences et Technologies (UMR 6174) (FEMTO-ST), Université de Technologie de Belfort-Montbeliard (UTBM)-Ecole Nationale Supérieure de Mécanique et des Microtechniques (ENSMM)-Université de Franche-Comté (UFC), Université Bourgogne Franche-Comté [COMUE] (UBFC)-Université Bourgogne Franche-Comté [COMUE] (UBFC)-Centre National de la Recherche Scientifique (CNRS), Optimisation des ressources : modèles, algorithmes et ordonnancement (ROMA), Inria Grenoble - Rhône-Alpes, Institut National de Recherche en Informatique et en Automatique (Inria)-Institut National de Recherche en Informatique et en Automatique (Inria)-Laboratoire de l'Informatique du Parallélisme (LIP), Centre National de la Recherche Scientifique (CNRS)-Université de Lyon-Institut National de Recherche en Informatique et en Automatique (Inria)-Université Claude Bernard Lyon 1 (UCBL), Université de Lyon-École normale supérieure - Lyon (ENS Lyon)-Centre National de la Recherche Scientifique (CNRS)-Université de Lyon-Université Claude Bernard Lyon 1 (UCBL), Université de Lyon-École normale supérieure - Lyon (ENS Lyon), Laboratoire de l'Informatique du Parallélisme (LIP), Institute of Information and Communication Technologies, Electronics and Applied Mathematics (ICTEAM), Université Catholique de Louvain = Catholic University of Louvain (UCL), Université Lumière - Lyon 2 (UL2)-École Centrale de Lyon 
(ECL), Université de Lyon-Université de Lyon-Université Claude Bernard Lyon 1 (UCBL), Université de Lyon-Centre National de la Recherche Scientifique (CNRS)-Institut National des Sciences Appliquées de Lyon (INSA Lyon), Institut National des Sciences Appliquées (INSA)-Université de Lyon-Institut National des Sciences Appliquées (INSA), École normale supérieure - Lyon (ENS Lyon)-Université Claude Bernard Lyon 1 (UCBL), Université de Lyon-Université de Lyon-Institut National de Recherche en Informatique et en Automatique (Inria)-Centre National de la Recherche Scientifique (CNRS)-École normale supérieure - Lyon (ENS Lyon)-Université Claude Bernard Lyon 1 (UCBL), Université de Lyon-Université de Lyon-Centre National de la Recherche Scientifique (CNRS), Université de Lyon-Université de Lyon-Institut National de Recherche en Informatique et en Automatique (Inria)-Centre National de la Recherche Scientifique (CNRS), Université Catholique de Louvain (UCL), Université de Lyon-Institut National des Sciences Appliquées de Lyon (INSA Lyon), Université de Lyon-Institut National des Sciences Appliquées (INSA)-Institut National des Sciences Appliquées (INSA)-Centre National de la Recherche Scientifique (CNRS), Distribution, Recherche d'Information et Mobilité (DRIM), Université de Lyon-Institut National des Sciences Appliquées (INSA)-Institut National des Sciences Appliquées (INSA)-Centre National de la Recherche Scientifique (CNRS)-Université Lumière - Lyon 2 (UL2)-École Centrale de Lyon (ECL), Université de Technologie de Belfort-Montbeliard (UTBM)-Ecole Nationale Supérieure de Mécanique et des Microtechniques (ENSMM)-Centre National de la Recherche Scientifique (CNRS)-Université de Franche-Comté (UFC), Université Bourgogne Franche-Comté [COMUE] (UBFC)-Université Bourgogne Franche-Comté [COMUE] (UBFC), École normale supérieure de Lyon (ENS de Lyon)-Université Claude Bernard Lyon 1 (UCBL), and Université de Lyon-Université de Lyon-Institut National de Recherche en Informatique 
et en Automatique (Inria)-Centre National de la Recherche Scientifique (CNRS)-École normale supérieure de Lyon (ENS de Lyon)-Université Claude Bernard Lyon 1 (UCBL)
- Subjects
Online Scheduling ,Key-Value Store ,Lower Bound ,[INFO.INFO-DC]Computer Science [cs]/Distributed, Parallel, and Cluster Computing [cs.DC] ,Tail Latency ,Replica Selection - Abstract
Distributed key-value stores employ replication for high availability. Yet, they do not always efficiently take advantage of the availability of multiple replicas for each value, and read operations often exhibit high tail latencies. Various replica selection strategies have been proposed to address this problem, together with local request scheduling policies. It is difficult, however, to determine the absolute performance gain that each of these strategies can achieve. We present a formal framework allowing the systematic study of request scheduling strategies in key-value stores. We contribute a definition of the optimization problem related to reducing tail latency in a replicated key-value store, as a minimization problem with respect to the maximum weighted flow criterion. Using scheduling theory, we show the difficulty of this problem, and hence the need for performance guarantees. We also study the behavior of heuristic methods using simulations, which highlight the properties that are useful for limiting tail latency: for instance, the EFT strategy, which uses the earliest available time of servers, exhibits a tail latency that is less than half that of state-of-the-art strategies, often matching the lower bound. Our study also emphasizes the importance of metrics such as the stretch to properly evaluate replica selection and local execution policies.
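The EFT idea mentioned above can be sketched as a small simulation: each read request is dispatched to whichever replica of its key would finish it earliest, given the servers' earliest available times. The request and replica representations below are illustrative assumptions, not the paper's framework:

```python
def eft_assign(requests, num_servers, replicas):
    """Dispatch requests with an Earliest-Finish-Time replica selection.

    requests: list of (arrival, processing_time, key), in arrival order.
    num_servers: total number of servers.
    replicas: dict mapping each key to the server ids holding a copy.
    Returns the maximum response time, a proxy for tail latency.
    """
    free_at = [0.0] * num_servers  # earliest available time of each server
    worst = 0.0
    for arrival, proc, key in requests:
        # Among the replicas of this key, pick the server finishing earliest.
        best = min(replicas[key], key=lambda s: max(free_at[s], arrival) + proc)
        finish = max(free_at[best], arrival) + proc
        free_at[best] = finish
        worst = max(worst, finish - arrival)
    return worst
```

For example, with two servers, a doubly-replicated key 'x' and a key 'y' held only by server 1, EFT spreads the two reads of 'x' across both servers, so only the read of 'y' has to wait.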
- Published
- 2021
48. Non-clairvoyant Reduction Algorithms for Heterogeneous Platforms
- Author
-
Benoit, Anne, primary, Canon, Louis-Claude, additional, and Marchal, Loris, additional
- Published
- 2014
- Full Text
- View/download PDF
49. Optimal Checkpointing Strategies for Iterative Applications
- Author
-
Du, Yishu, Marchal, Loris, Pallez, Guillaume, Robert, Yves, Tongji University, Optimisation des ressources : modèles, algorithmes et ordonnancement (ROMA), Inria Grenoble - Rhône-Alpes, Institut National de Recherche en Informatique et en Automatique (Inria)-Institut National de Recherche en Informatique et en Automatique (Inria)-Laboratoire de l'Informatique du Parallélisme (LIP), École normale supérieure - Lyon (ENS Lyon)-Université Claude Bernard Lyon 1 (UCBL), Université de Lyon-Université de Lyon-Institut National de Recherche en Informatique et en Automatique (Inria)-Centre National de la Recherche Scientifique (CNRS)-École normale supérieure - Lyon (ENS Lyon)-Université Claude Bernard Lyon 1 (UCBL), Université de Lyon-Université de Lyon-Centre National de la Recherche Scientifique (CNRS), Laboratoire de l'Informatique du Parallélisme (LIP), Université de Lyon-Université de Lyon-Institut National de Recherche en Informatique et en Automatique (Inria)-Centre National de la Recherche Scientifique (CNRS), Topology-Aware System-Scale Data Management for High-Performance Computing (TADAAM), Laboratoire Bordelais de Recherche en Informatique (LaBRI), Université de Bordeaux (UB)-Centre National de la Recherche Scientifique (CNRS)-École Nationale Supérieure d'Électronique, Informatique et Radiocommunications de Bordeaux (ENSEIRB)-Université de Bordeaux (UB)-Centre National de la Recherche Scientifique (CNRS)-École Nationale Supérieure d'Électronique, Informatique et Radiocommunications de Bordeaux (ENSEIRB)-Inria Bordeaux - Sud-Ouest, Institut National de Recherche en Informatique et en Automatique (Inria)-Institut National de Recherche en Informatique et en Automatique (Inria), Université de Bordeaux (UB)-Centre National de la Recherche Scientifique (CNRS)-École Nationale Supérieure d'Électronique, Informatique et Radiocommunications de Bordeaux (ENSEIRB), Innovative Computing Laboratory [Knoxville] (ICL), The University of Tennessee [Knoxville], and Inria - 
Research Centre Grenoble – Rhône-Alpes
- Subjects
Fault-tolerance ,Application itérative ,Checkpoint ,Iterative application ,Fail-stop errors ,Tolérance aux pannes ,[INFO]Computer Science [cs] ,Checkpoint strategy - Abstract
This work provides an optimal checkpointing strategy to protect iterative applications from fail-stop errors. We consider a very general framework, where the application repeats the same execution pattern by executing consecutive iterations, and where each iteration is composed of several tasks. These tasks have different execution lengths and different checkpoint costs. Assume that there are $n$ tasks and that task $a_i$, where $0 \le i < n$, has execution time $t_i$ and checkpoint cost $C_i$. A naive strategy would checkpoint after each task. A strategy inspired by the Young/Daly formula would select the task $a_{min}$ with smallest checkpoint cost $C_{min}$ and would checkpoint after every $p$-th instance of that task, leading to a checkpointing period $P_{YD} = pT$, where $T = \sum_{i=0}^{n-1} t_i$ is the time per iteration. One would choose $p$ so that $P_{YD} \approx \sqrt{2 \mu C_{min}}$ obeys the Young/Daly formula, where $\mu$ is the application MTBF. Both the naive and Young/Daly strategies are suboptimal. Our main contribution is to show that the optimal checkpoint strategy is globally periodic, and to design a dynamic programming algorithm that computes the optimal checkpointing pattern. This pattern may well checkpoint many different tasks, and this across many different iterations. We show through simulations, both from synthetic and real-life application scenarios, that the optimal strategy significantly outperforms the naive and Young/Daly strategies.; This report proposes an optimal checkpointing strategy to protect iterative applications from fail-stop errors. We consider a general framework, where the application repeats the same pattern and executes consecutive iterations, each composed of several tasks. These tasks have different lengths, and different checkpoint costs as well. Assume that there are $n$ tasks, and that task $a_i$, for $0 \le i < n$, has execution time $t_i$ and checkpoint cost $C_i$. A naive strategy would take a checkpoint after each task. A strategy inspired by the Young/Daly formula would choose the task $a_{min}$ whose checkpoint cost $C_{min}$ is minimal, and would take a checkpoint after every $p$-th instance of that task, leading to a checkpointing period $P_{YD} = pT$, where $T = \sum_{i=0}^{n-1} t_i$ is the time of one iteration. One would then choose $p$ so that $P_{YD} \approx \sqrt{2 \mu C_{min}}$ obeys the Young/Daly formula, where $\mu$ is the application MTBF (mean time between failures). Both the naive strategy and the Young/Daly strategy are suboptimal. Our main contribution is to show that the optimal strategy is globally periodic, and to propose a dynamic programming algorithm that computes this optimal checkpointing pattern. The latter may checkpoint several different tasks, across several iterations. Simulations based on both synthetic and real-life application scenarios demonstrate the gain brought by the optimal strategy.
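The Young/Daly-inspired baseline described above can be made concrete with a small numeric sketch: pick the task with the smallest checkpoint cost and checkpoint every $p$-th instance of it, with $p$ chosen so that $pT \approx \sqrt{2 \mu C_{min}}$. The function name and the numeric values below are illustrative, not taken from the paper:

```python
import math

def young_daly_period(task_times, ckpt_costs, mtbf):
    """Young/Daly-inspired checkpoint period for an iterative application.

    task_times: execution time t_i of each task in one iteration.
    ckpt_costs: checkpoint cost C_i of each task.
    mtbf: application MTBF (mu).
    Returns (index of the cheapest-to-checkpoint task, p, period P_YD = p*T).
    """
    T = sum(task_times)  # duration of one full iteration
    i_min = min(range(len(ckpt_costs)), key=ckpt_costs.__getitem__)
    target = math.sqrt(2 * mtbf * ckpt_costs[i_min])  # Young/Daly target period
    p = max(1, round(target / T))  # checkpoint every p-th instance of a_{i_min}
    return i_min, p, p * T
```

For instance, with task times (10, 20, 30), checkpoint costs (5, 2, 8) and an MTBF of 3600, the target period is $\sqrt{2 \cdot 3600 \cdot 2} = 120$, so the second task is checkpointed every 2nd iteration.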
- Published
- 2020
50. Dynamic DAG Scheduling Under Memory Constraints for Shared-Memory Platforms
- Author
-
Bathie, Gabriel, primary, Marchal, Loris, additional, Robert, Yves, additional, and Thibault, Samuel, additional
- Published
- 2021
- Full Text
- View/download PDF