20 results on '"BENOIT, ANNE"'
Search Results
2. Asymptotic Performance and Energy Consumption of SLACK
- Author
-
Benoit, Anne, Canon, Louis-Claude, Elghazi, Redouane, Heam, Pierre-Cyrille, Laboratoire de l'Informatique du Parallélisme (LIP), École normale supérieure de Lyon (ENS de Lyon)-Université Claude Bernard Lyon 1 (UCBL), Université de Lyon-Université de Lyon-Institut National de Recherche en Informatique et en Automatique (Inria)-Centre National de la Recherche Scientifique (CNRS), Optimisation des ressources : modèles, algorithmes et ordonnancement (ROMA), Université de Lyon-Université de Lyon-Institut National de Recherche en Informatique et en Automatique (Inria)-Centre National de la Recherche Scientifique (CNRS)-École normale supérieure de Lyon (ENS de Lyon)-Université Claude Bernard Lyon 1 (UCBL), Université de Lyon-Université de Lyon-Institut National de Recherche en Informatique et en Automatique (Inria)-Centre National de la Recherche Scientifique (CNRS)-Inria Lyon, Institut National de Recherche en Informatique et en Automatique (Inria), École normale supérieure de Lyon (ENS de Lyon), Franche-Comté Électronique Mécanique, Thermique et Optique - Sciences et Technologies (UMR 6174) (FEMTO-ST), Université de Technologie de Belfort-Montbeliard (UTBM)-Ecole Nationale Supérieure de Mécanique et des Microtechniques (ENSMM)-Centre National de la Recherche Scientifique (CNRS)-Université de Franche-Comté (UFC), Université Bourgogne Franche-Comté [COMUE] (UBFC)-Université Bourgogne Franche-Comté [COMUE] (UBFC), Inria Lyon, and ANR-17-EURE-0002,EIPHI,Ingénierie et Innovation par les sciences physiques, les savoir-faire technologiques et l'interdisciplinarité(2017)
- Subjects
empirical study ,asymptotic performance ,étude empirique ,consommation énergétique ,Scheduling ,energy consumption ,Ordonnancement ,[INFO]Computer Science [cs] ,performance asymptotique - Abstract
Scheduling n independent tasks onto m identical processors in order to minimize the makespan has been widely studied. As an alternative to classical heuristics, the SLACK algorithm groups tasks by packs of m tasks of similar execution times, and schedules first the packs with the largest differences. It turns out to be very performant in practice, but only few studies have been conducted on its theoretical properties. We derive novel analytical results for SLACK, and in particular, we study the performance of this algorithm from an asymptotical point of view, under the assumption that the execution times of the tasks follow a given probability distribution. The study is building on a comparison of the most heavily loaded machine compared to the least loaded one. Furthermore, we extend the results when the objective is to minimize the energy consumption rather than the makespan, since reducing the energy consumption of the computing centers is an ever-growing concern for economical and ecological reasons. Finally, we perform extensive simulations to empirically assess the performance of the algorithms with both synthetic and realistic execution time distributions.; Ordonnancer n tâches indépendantes sur m processeurs identiques afin de minimiser le temps total d’exécution est un problème qui a été largement étudié. Comme alternative aux heuristiques classiques, l’algorithme SLACK groupe les tâches en paquets de m tâches avec un temps similaire, et ordonnance d’abord les paquets avec les plus grandes différences de temps. Cette approche est très efficace en pratique, mais peu d'études se sont intéressées à son étude théorique. Nous proposons de nouveaux résultats analytiques pour SLACK. En particulier, nous étudions la performance de l’algorithme d’un point de vue asymptotique, en supposant que le temps d’exécution des tâches suit une distribution de probabilités connue. L’étude se base sur une comparaison entre le processeur le plus chargé et celui qui l’est le moins. De plus, nous étendons ces résultats à une fonction objective différente: la minimisation de la consommation énergétique. En effet, réduire la consommation énergétique descentres de calcul est un des défis majeurs actuels, à la fois d’un point de vue économique, mais aussi et surtout écologique. Finalement, nous effectuons de nombreuses simulations pour étudier de façon empirique la performance des algorithmes, avec des distributions de temps d’exécution synthétiques d’une part, et réalistes d’autre part.
- Published
- 2023
3. List and shelf schedules for independent parallel tasks to minimize the energy consumption with discrete or continuous speeds.
- Author
-
Benoit, Anne, Canon, Louis-Claude, Elghazi, Redouane, and Héam, Pierre-Cyrille
- Subjects
- *
NP-complete problems , *APPROXIMATION algorithms , *ENERGY consumption , *SERVER farms (Computer network management) , *SCHEDULING , *PRODUCTION scheduling - Abstract
Scheduling independent tasks on a parallel platform is a widely-studied problem, in particular when the goal is to minimize the total execution time, or makespan (P | | C max problem in Graham's notations). Also, many applications do not consist of sequential tasks, but rather parallel tasks, either rigid, with a fixed degree of parallelism, or moldable, with a variable degree of parallelism (i.e., for which we can decide at the execution on how many processors they are executed). Furthermore, since the energy consumption of data centers is a growing concern, both from an environmental and economical point of view, minimizing the energy consumption of a schedule is a main challenge to be addressed. One can then decide, for each task, on how many processors it is executed, and at which speed the processors are operated, with the goal to minimize the total energy consumption. We further focus on co-schedules, where tasks are partitioned into shelves, and we prove that the problem of minimizing the energy consumption remains NP-complete when static energy is consumed during the whole duration of the application. We are however able to provide an optimal algorithm for the schedule within one shelf, i.e., for a set of tasks that start at the same time. Several approximation results are derived, both with discrete and continuous speed models, and extensive simulations are performed to show the performance of the proposed algorithms. • A model for a theoretical analysis of the energy consumption of a schedule for moldable and rigid tasks. • Most presented scheduling problems are NP-complete. • There exist approximation algorithms for the presented problems (from 1.59 to 3 depending on the problem). • An experimental study of the trade-off between the time complexity of the scheduler and the energy consumed by the schedule. [ABSTRACT FROM AUTHOR]
- Published
- 2023
- Full Text
- View/download PDF
4. Partitioning tree-shaped task graphs for distributed platforms with limited memory
- Author
-
Benoit, Anne, Gou, Changjiang, Marchal, Loris, Optimisation des ressources : modèles, algorithmes et ordonnancement (ROMA), Inria Grenoble - Rhône-Alpes, Institut National de Recherche en Informatique et en Automatique (Inria)-Institut National de Recherche en Informatique et en Automatique (Inria)-Laboratoire de l'Informatique du Parallélisme (LIP), Centre National de la Recherche Scientifique (CNRS)-Université de Lyon-Institut National de Recherche en Informatique et en Automatique (Inria)-Université Claude Bernard Lyon 1 (UCBL), Université de Lyon-École normale supérieure - Lyon (ENS Lyon)-Centre National de la Recherche Scientifique (CNRS)-Université de Lyon-Université Claude Bernard Lyon 1 (UCBL), Université de Lyon-École normale supérieure - Lyon (ENS Lyon), East China Normal University [Shangaï] (ECNU), and Inria Grenoble Rhône-Alpes
- Subjects
partitionnement de graphe ,Scheduling ,Graph partitioning ,Memory-aware ,Makespan minimization ,Ordonnancement ,[INFO.INFO-DC]Computer Science [cs]/Distributed, Parallel, and Cluster Computing [cs.DC] ,algorithmes orientés mémoire - Abstract
Scientific applications are commonly modeled as the processing of directed acyclicgraphs of tasks, and for some of them, the graph takes the special form of a rooted tree. Thistree expresses both the computational dependencies between tasks and their storage requirements.The problem of scheduling/traversing such a tree on a single processor to minimize its memoryfootprint has already been widely studied. Hence, we move to parallel processing and study howto partition the tree for a homogeneous multiprocessor platform, where each processor is equippedwith its own memory. We formally state the problem of partitioning the tree into subtrees suchthat each subtree can be processed on a single processor and the total resulting processing time isminimized. We prove that the problem is NP-complete, and we design polynomial-time heuristicsto address it. An extensive set of simulations demonstrates the usefulness of these heuristics.; Les applications scientifiques sont couramment modélisées par des graphes de tâches. Pour certaines d'entre elles, le graphe prend la forme particulière d'un arbre enraciné". Cet arbre détermine à la fois les dépendance entre tâches de calcul et les besoins en stockage. Le problème d'ordonnancer (ou parcourir) un tel arbre sur un seul processeur pour réduire son empreinte mémoire a déjà largement été étudié. Dans ce rapport, nous considérons le traitement parallèle d'un tel arbre et étudions comment le partitionner pour une plate-forme decalcul formée de processeurs homogènes disposant chacun de sa propre mémoire.Nous formalisons le problème du partitionnement de l'arbre en sous-arbres de telle sorte que chaque sous-arbre puisse être traité sur un seul processeur et que le temps de calcul total soit minimal. Nous montrons que ce problème est NP-complet et proposons des heuristiques polynomiales. Un ensemble exhaustif,de simulations permet de montrer l'utilité de ces heuristiques.
- Published
- 2019
5. Un modèle de perfomance pour exécuter des graphes de tâches sur des architectures à mémoire haute performance
- Author
-
Benoit, Anne, Perarnau, Swann, Pottier, Loïc, Robert, Yves, Georgia Institute of Technology [Atlanta], Optimisation des ressources : modèles, algorithmes et ordonnancement (ROMA), Inria Grenoble - Rhône-Alpes, Institut National de Recherche en Informatique et en Automatique (Inria)-Institut National de Recherche en Informatique et en Automatique (Inria)-Laboratoire de l'Informatique du Parallélisme (LIP), École normale supérieure de Lyon (ENS de Lyon)-Université Claude Bernard Lyon 1 (UCBL), Université de Lyon-Université de Lyon-Institut National de Recherche en Informatique et en Automatique (Inria)-Centre National de la Recherche Scientifique (CNRS)-École normale supérieure de Lyon (ENS de Lyon)-Université Claude Bernard Lyon 1 (UCBL), Université de Lyon-Université de Lyon-Centre National de la Recherche Scientifique (CNRS), Laboratoire de l'Informatique du Parallélisme (LIP), Université de Lyon-Université de Lyon-Institut National de Recherche en Informatique et en Automatique (Inria)-Centre National de la Recherche Scientifique (CNRS), Argonne National Laboratory [Lemont] (ANL), The University of Tennessee [Knoxville], ENS Lyon, Inria Grenoble Rhône-Alpes, University of Tennessee Knoxville, Georgia Institute of Technology, Argonne National Laboratory, Centre National de la Recherche Scientifique (CNRS)-Université de Lyon-Institut National de Recherche en Informatique et en Automatique (Inria)-Université Claude Bernard Lyon 1 (UCBL), Université de Lyon-École normale supérieure - Lyon (ENS Lyon)-Centre National de la Recherche Scientifique (CNRS)-Université de Lyon-Université Claude Bernard Lyon 1 (UCBL), Université de Lyon-École normale supérieure - Lyon (ENS Lyon), École normale supérieure - Lyon (ENS Lyon)-Université Claude Bernard Lyon 1 (UCBL), and Université de Lyon-Université de Lyon-Institut National de Recherche en Informatique et en Automatique (Inria)-Centre National de la Recherche Scientifique (CNRS)-École normale supérieure - Lyon (ENS Lyon)-Université Claude Bernard Lyon 1 (UCBL)
- Subjects
mémoire haute performance ,placement de données ,memory hierarchy ,modèle de performance ,Workflow ,ordonnancement ,performance model ,task graph ,graphe de tâches ,high-bandwidth memory ,hiérarchie mémoire ,scheduling ,mapping ,[INFO.INFO-DC]Computer Science [cs]/Distributed, Parallel, and Cluster Computing [cs.DC] ,manycore ,architecture massivement parallèle - Abstract
This work presents a realistic performance model to execute scientific workflows on high-bandwidth memory architectures such as the Intel Knights Landing. We provide a detailed analysis of the execution time on such platforms, taking into account transfers from both fast and slow memory and their overlap with computations. We discussseveral scheduling and mapping strategies: not only tasks must be assigned to computing resources, but also one has to decide which fraction of input and output data will reside in fast memory, and which will have to stay in slow memory. Extensive simulations allow us to assess the impact of the mapping strategies on performance. We also conduct actualexperiments for a simple 1D Gauss-Seidel kernel, which assess the accuracy of the model and further demonstrate the importance of a tuned memory management. Altogether, our model and results lay the foundations for further studies and experiments on dual-memory systems.; Ce travail présente un modèle de performance réaliste pour exécuter des workflows scientifiques sur des architectures ayant des mémoires à bande passante élevée, comme par exemple Intel Knights Landing. Nous fournissons une analyse détaillée du temps d'exécution sur ces plates-formes, en tenant compte des transferts depuis deux mémoires (rapide et lente), et leur recouvrement avec les calculs. Nous introduisons plusieurs stratégiesd'ordonnancement et de placement mémoire: non seulement les tâches doivent être assignées aux ressources de calcul, mais il faut aussi décider quelle fraction des données d'entrée et de sortie va résider en mémoire rapide, alorsque le reste sera en mémoire lente. Des simulations approfondies nous permettentd'évaluer l'impact des stratégies de placement sur la performance. Nous menons également des expériences réelles pour un noyau de Gauss-Seidel 1D simple, afin d'évaluer la précision du modèle. Nous démontrons ainsi l'importance d'une gestion fine de la mémoire sur les systèmes avec double mémoire.
- Published
- 2018
6. Reliability-aware energy optimization for throughput-constrained applications on MPSoC
- Author
-
Gou, Changjiang, Benoit, Anne, Chen, Mingsong, Marchal, Loris, Wei, Tongquan, Software Engineering Institute [Shangaï], East China Normal University [Shangaï] (ECNU), Optimisation des ressources : modèles, algorithmes et ordonnancement (ROMA), Inria Grenoble - Rhône-Alpes, Institut National de Recherche en Informatique et en Automatique (Inria)-Institut National de Recherche en Informatique et en Automatique (Inria)-Laboratoire de l'Informatique du Parallélisme (LIP), Centre National de la Recherche Scientifique (CNRS)-Université de Lyon-Institut National de Recherche en Informatique et en Automatique (Inria)-Université Claude Bernard Lyon 1 (UCBL), Université de Lyon-École normale supérieure - Lyon (ENS Lyon)-Centre National de la Recherche Scientifique (CNRS)-Université de Lyon-Université Claude Bernard Lyon 1 (UCBL), Université de Lyon-École normale supérieure - Lyon (ENS Lyon), Georgia Institute of Technology [Atlanta], Laboratoire LIP, École Normale Supérieure de Lyon & CNRS & Inria, France, Shanghai Key Lab. of Trustworthy Computing, East China Normal University, China, and Georgia Institute of Technology, USA
- Subjects
Energy minimization ,Consommation énergétique ,Scheduling ,Ordonnancement ,Débit ,Fiabilité ,[INFO]Computer Science [cs] ,MPSoC ,[INFO.INFO-DC]Computer Science [cs]/Distributed, Parallel, and Cluster Computing [cs.DC] ,Reliability ,Throughput - Abstract
Multi-Processor System-on-Chip (MPSoC) has emerged as a promising embedded architecture to meet the increasing performance demand of embedded applications. However, due to limited energy budget, it is hard to guarantee that applications on MPSoC can be accomplished on time with a required throughput. The situation becomes even worse for applications with high reliability requirements, since extra energy will be inevitably consumed by task re-executions or duplicated tasks. Based on Dynamic Voltage and Frequency Scaling (DVFS) and task duplication techniques, this paper presents a novel energy-efficient scheduling model, which aims at minimizing the overall energy consumption of MPSoC applications under both throughput and reliability constraints. The problem is shown to be NP-complete, and several polynomial-time heuristics are proposed to tackle this problem. Comprehensive simulations on both synthetic and real application graphs show that our proposed heuristics can meet all the given constraints, while reducing the energy consumption.; Le système multiprocesseur sur puce (MPSoC) est une architecture prometteuse pour répondre à la demande de performance croissante des applications embarquées. Cependant, en raison de leur budget énergétique limité, il est difficile de garantir que les applications sur MPSoC peuvent être accomplies à temps avec un débit requis. La situation devient encore pire pour les applications présentant des exigences de fiabilité élevées, car une énergie supplémentaire sera inévitablement consommée par des ré-exécutions de tâches ou des tâches dupliquées. Basé sur le DVFS (Dynamic Voltage and Frequency Scaling) et la duplication de tâches, cet article présente un nouveau modèle d'ordonnancement, qui vise à minimiser la consommation d'énergie globale des applications MPSoC sous des contraintes de débit et de fiabilité. Le problème est montré NP-complet, et plusieurs heuristiques en temps polynomial sont proposées pour résoudre ce problème. Des simulations complètes sur des graphes d'application tant synthétiques que réels montrentque nos heuristiques peuvent répondre à toutes les contraintes données, tout en réduisant la consommation d'énergie.
- Published
- 2018
7. Partitioning Tree-Shaped Task Graphs for Distributed Platforms With Limited Memory.
- Author
-
Gou, Changjiang, Benoit, Anne, and Marchal, Loris
- Subjects
- *
NP-complete problems , *MULTIPROCESSORS , *DIRECTED acyclic graphs , *SPANNING trees , *TREE graphs , *PARALLEL processing , *TERRITORIAL partition , *MEMORY - Abstract
Scientific applications are commonly modeled as the processing of directed acyclic graphs of tasks, and for some of them, the graph takes the special form of a rooted tree. This tree expresses both the computational dependencies between tasks and their storage requirements. The problem of scheduling/traversing such a tree on a single processor to minimize its memory footprint has already been widely studied. The present article considers the parallel processing of such a tree and studies how to partition it for a homogeneous multiprocessor platform, where each processor is equipped with its own memory. We formally state the problem of partitioning the tree into subtrees, such that each subtree can be processed on a single processor (i.e., it must fit in memory), and the goal is to minimize the total resulting processing time. We prove that this problem is NP-complete, and we design polynomial-time heuristics to address it. An extensive set of simulations demonstrates the usefulness of these heuristics. [ABSTRACT FROM AUTHOR]
- Published
- 2020
- Full Text
- View/download PDF
8. Approximation algorithms for energy, reliability and makespan optimization problems
- Author
-
Aupy, Guillaume, Benoit, Anne, Laboratoire de l'Informatique du Parallélisme (LIP), École normale supérieure - Lyon (ENS Lyon)-Université Claude Bernard Lyon 1 (UCBL), Université de Lyon-Université de Lyon-Institut National de Recherche en Informatique et en Automatique (Inria)-Centre National de la Recherche Scientifique (CNRS), Optimisation des ressources : modèles, algorithmes et ordonnancement (ROMA), Inria Grenoble - Rhône-Alpes, Institut National de Recherche en Informatique et en Automatique (Inria)-Institut National de Recherche en Informatique et en Automatique (Inria)-Laboratoire de l'Informatique du Parallélisme (LIP), Université de Lyon-Université de Lyon-Institut National de Recherche en Informatique et en Automatique (Inria)-Centre National de la Recherche Scientifique (CNRS)-École normale supérieure - Lyon (ENS Lyon)-Université Claude Bernard Lyon 1 (UCBL), Université de Lyon-Université de Lyon-Centre National de la Recherche Scientifique (CNRS), INRIA, ANR-10-BLAN-0301,RESCUE,Résilience des applications scientifiques sur machines exascales(2010), École normale supérieure de Lyon (ENS de Lyon)-Université Claude Bernard Lyon 1 (UCBL), and Université de Lyon-Université de Lyon-Institut National de Recherche en Informatique et en Automatique (Inria)-Centre National de la Recherche Scientifique (CNRS)-École normale supérieure de Lyon (ENS de Lyon)-Université Claude Bernard Lyon 1 (UCBL)
- Subjects
[INFO.INFO-CC]Computer Science [cs]/Computational Complexity [cs.CC] ,models ,reliability ,Scheduling ,[INFO.INFO-DS]Computer Science [cs]/Data Structures and Algorithms [cs.DS] ,makespan ,approximation algorithms ,energy - Abstract
In this paper, we consider the problem of scheduling an application on a parallel computational platform. The application is a particular task graph, either a linear chain of tasks, or a set of independent tasks. The platform is made of identical processors, whose speed can be dynamically modified. It is also subject to failures: if a processor is slowed down to decrease the energy consumption, it has a higher chance to fail. Therefore, the scheduling problem requires to re-execute or replicate tasks (i.e., execute twice a same task, either on the same processor, or on two distinct processors), in order to increase the reliability. It is a tri-criteria problem: the goal is to minimize the energy consumption, while enforcing a bound on the total execution time (the makespan), and a constraint on the reliability of each task. Our main contribution is to propose approximation algorithms for these particular classes of task graphs. For linear chains, we design a fully polynomial time approximation scheme. However, we show that there exists no constant factor approximation algorithm for independent tasks, unless P=NP, and we are able in this case to propose an approximation algorithm with a relaxation on the makespan constraint.; Dans ce papier, nous considérons le problème d'ordonnancement d'une application sur une plateforme parallèle de calcul. L'application est un graphe de tâches particulier: soit une chaîne de tâche, soit un ensemble de tâches indépendantes. La plateforme est constituée de processeurs identiques, dont la vitesse peut être modifiée dynamiquement. Cette plateforme est aussi sujette à des fautes: lorsque l'on réduit la vitesse d'exécution d'un processeur pour diminuer la consommation d'énergie, ce processeur a une plus grande chance de faillir. C'est pourquoi, pour augmenter la fiabilité du processus, l'ordonnanceur va devoir choisir de re-exécuter ou répliquer certaines tâches (les exécuter deux fois, soit sur le même processeur, soit sur deux processeurs distincts). Le problème est donc tri-critère: nous cherchons à minimiser la consommation d'énergie, tout en préservant une limite sur le temps d'exécution, ainsi qu'une borne sur la fiabilité de chaque tâche. Nos contributions résident en l'écriture d'algorithmes d'approximation efficaces pour les deux classes de graphes étudiées. Dans le cas des chaînes linéaires, nous proposons un schéma d'approximation entièrement polynomial (FPTAS). Puis nous prouvons qu'il n'existe pas d'algorithmes d'approximation à facteur constant dans le cas des tâches indépendantes, sauf si P=NP, mais nous sommes cependant capable d'exhiber un algorithme d'approximation lorsque l'on autorise une relaxation de la contrainte sur le temps d'exécution.
- Published
- 2014
9. Co-Scheduling Algorithms for High-Throughput Workload Execution
- Author
-
Aupy, Guillaume, Shantharam, Manu, Benoit, Anne, Robert, Yves, Raghavan, Padma, Laboratoire de l'Informatique du Parallélisme (LIP), École normale supérieure - Lyon (ENS Lyon)-Université Claude Bernard Lyon 1 (UCBL), Université de Lyon-Université de Lyon-Institut National de Recherche en Informatique et en Automatique (Inria)-Centre National de la Recherche Scientifique (CNRS), Optimisation des ressources : modèles, algorithmes et ordonnancement (ROMA), Inria Grenoble - Rhône-Alpes, Institut National de Recherche en Informatique et en Automatique (Inria)-Institut National de Recherche en Informatique et en Automatique (Inria)-Laboratoire de l'Informatique du Parallélisme (LIP), Université de Lyon-Université de Lyon-Institut National de Recherche en Informatique et en Automatique (Inria)-Centre National de la Recherche Scientifique (CNRS)-École normale supérieure - Lyon (ENS Lyon)-Université Claude Bernard Lyon 1 (UCBL), Université de Lyon-Université de Lyon-Centre National de la Recherche Scientifique (CNRS), Department of Computer Science, University of Utah, Department of Computer Science and Engineering [University Park], Pennsylvania State University (Penn State), Penn State System-Penn State System, INRIA, École normale supérieure de Lyon (ENS de Lyon)-Université Claude Bernard Lyon 1 (UCBL), and Université de Lyon-Université de Lyon-Institut National de Recherche en Informatique et en Automatique (Inria)-Centre National de la Recherche Scientifique (CNRS)-École normale supérieure de Lyon (ENS de Lyon)-Université Claude Bernard Lyon 1 (UCBL)
- Subjects
Scheduling ,[INFO.INFO-DS]Computer Science [cs]/Data Structures and Algorithms [cs.DS] ,approximation algorithms ,efficient heuristics - Abstract
This paper investigates co-scheduling algorithms for processing a set of parallel applications. Instead of executing each application one by one, using a maximum degree of parallelism for each of them, we aim at scheduling several applications concurrently. We partition the original application set into a series of packs, which are executed one by one. A pack comprises several applications, each of them with an assigned number of processors, with the constraint that the total number of processors assigned within a pack does not exceed the maximum number of available processors. The objective is to determine a partition into packs, and an assignment of processors to applications, that minimize the sum of the execution times of the packs. We thoroughly study the complexity of this optimization problem, and propose several heuristics that exhibit very good performance on a variety of workloads, whose application execution times model profiles of parallel scientific codes. We show that co-scheduling leads to to faster workload completion time and to faster response times on average (hence increasing system throughput and saving energy), for significant benefits over traditional scheduling from both the user and system perspectives.; Nous étudions des algorithmes d'ordonnancement par lots pour un ensemble de tâches parallélisables. Plutôt que d'ordonnancer les tâches séquentiellement en utilisant le degré maximum de parallélisme pour chacune d'entre elles, nous cherchons à ordonnancer plusieurs tâches de manière concurrente. L'idée est de partitionner l'ensemble des tâches en une série de lots, qui seront exécutés les uns après les autres. Un lot est composé de plusieurs tâches, et un certain nombre de processeurs est alloué à chacune de ces tâches, sous la contrainte que le nombre total de processeurs alloués à un lot ne dépasse pas le nombre maximum de processeurs disponibles. L'objectif est de trouver une partition de l'ensemble des tâches en lots, et une allocation de processeurs optimale pour la minimisation de la somme des temps d'exécution de chaque lot. Nous étudions la complexité de ce problème d'optimisation, et proposons plusieurs heuristiques qui obtiennent d'excellentes performances sur plusieurs ensembles de tâches représentatifs des calculs parallèles scientifiques. Notre technique d'ordon\-nancement par lots permet non seulement d'obtenir un gain sur le temps d'exécution total, mais obtient également d'excellents résultats pour le temps de réponse moyen.
- Published
- 2013
10. Reclaiming the energy of a schedule: models and algorithms
- Author
-
Aupy, Guillaume, Benoit, Anne, Dufossé, Fanny, Robert, Yves, Algorithms and Scheduling for Distributed Heterogeneous Platforms (GRAAL), Inria Grenoble - Rhône-Alpes, Institut National de Recherche en Informatique et en Automatique (Inria)-Institut National de Recherche en Informatique et en Automatique (Inria)-Laboratoire de l'Informatique du Parallélisme (LIP), École normale supérieure - Lyon (ENS Lyon)-Université Claude Bernard Lyon 1 (UCBL), Université de Lyon-Université de Lyon-Institut National de Recherche en Informatique et en Automatique (Inria)-Centre National de la Recherche Scientifique (CNRS)-École normale supérieure - Lyon (ENS Lyon)-Université Claude Bernard Lyon 1 (UCBL), Université de Lyon-Université de Lyon-Centre National de la Recherche Scientifique (CNRS), Laboratoire de l'Informatique du Parallélisme (LIP), Université de Lyon-Université de Lyon-Institut National de Recherche en Informatique et en Automatique (Inria)-Centre National de la Recherche Scientifique (CNRS), INRIA, École normale supérieure de Lyon (ENS de Lyon)-Université Claude Bernard Lyon 1 (UCBL), and Université de Lyon-Université de Lyon-Institut National de Recherche en Informatique et en Automatique (Inria)-Centre National de la Recherche Scientifique (CNRS)-École normale supérieure de Lyon (ENS de Lyon)-Université Claude Bernard Lyon 1 (UCBL)
- Subjects
Energy models ,bi-criteria optimization ,scheduling ,[INFO.INFO-DC]Computer Science [cs]/Distributed, Parallel, and Cluster Computing [cs.DC] ,complexity ,algorithms ,Computer Science::Operating Systems - Abstract
We consider a task graph to be executed on a set of processors. We assume that the mapping is given, say by an ordered list of tasks to execute on each processor, and we aim at optimizing the energy consumption while enforcing a prescribed bound on the execution time. While it is not possible to change the allocation of a task, it is possible to change its speed. Rather than using a local approach such as backfilling, we consider the problem as a whole and study the impact of several speed variation models on its complexity. For continuous speeds, we give a closed-form formula for trees and series-parallel graphs, and we cast the problem into a geometric programming problem for general directed acyclic graphs. We show that the classical dynamic voltage and frequency scaling (DVFS) model with discrete modes leads to a NP-complete problem, even if the modes are regularly distributed (an important particular case in practice, which we analyze as the incremental model). On the contrary, the VDD-hopping model leads to a polynomial solution. Finally, we provide an approximation algorithm for the incremental model, which we extend for the general DVFS model.
- Published
- 2011
11. Static Strategies for Worksharing with Unrecoverable Interruptions (Extended version)
- Author
-
Benoit, Anne, Robert, Yves, Rosenberg, Arnold, Vivien, Frédéric, Algorithms and Scheduling for Distributed Heterogeneous Platforms (GRAAL), Inria Grenoble - Rhône-Alpes, Institut National de Recherche en Informatique et en Automatique (Inria)-Institut National de Recherche en Informatique et en Automatique (Inria)-Laboratoire de l'Informatique du Parallélisme (LIP), École normale supérieure - Lyon (ENS Lyon)-Université Claude Bernard Lyon 1 (UCBL), Université de Lyon-Université de Lyon-Institut National de Recherche en Informatique et en Automatique (Inria)-Centre National de la Recherche Scientifique (CNRS)-École normale supérieure - Lyon (ENS Lyon)-Université Claude Bernard Lyon 1 (UCBL), Université de Lyon-Université de Lyon-Centre National de la Recherche Scientifique (CNRS), Laboratoire de l'Informatique du Parallélisme (LIP), Université de Lyon-Université de Lyon-Institut National de Recherche en Informatique et en Automatique (Inria)-Centre National de la Recherche Scientifique (CNRS), Department of Computer Science [Colorado], Colorado State University [Fort Collins] (CSU), École normale supérieure de Lyon (ENS de Lyon)-Université Claude Bernard Lyon 1 (UCBL), and Université de Lyon-Université de Lyon-Institut National de Recherche en Informatique et en Automatique (Inria)-Centre National de la Recherche Scientifique (CNRS)-École normale supérieure de Lyon (ENS de Lyon)-Université Claude Bernard Lyon 1 (UCBL)
- Subjects
Fault-tolerance ,divisible loads ,probabilities ,[INFO.INFO-DS]Computer Science [cs]/Data Structures and Algorithms [cs.DS] ,scheduling - Abstract
One has a large workload that is "divisible''---its constituent work's granularity can be adjusted arbitrarily---and one has access to $p$ remote computers that can assist in computing the workload. How can one best utilize these computers? Complicating this question is the fact that each remote computer is subject to interruptions (of known likelihood) that kill all work in progress on it. One wishes to orchestrate sharing the workload with the remote computers in a way that maximizes the expected amount of work completed. Strategies are presented for achieving this goal, by balancing the desire to checkpoint often---thereby decreasing the amount of vulnerable work at any point---vs.~the desire to avoid the context-switching required to checkpoint. The current study demonstrates the accessibility of strategies that provably maximize the expected amount of work when there is only one remote computer (the case p=1) and, at least in an asymptotic sense, when there are two remote computers (the case p=2); but the study strongly suggests the intractability of exact maximization for p >= 2 computers. This study responds to that challenge by developing efficient heuristics that employ both checkpointing and work replication as mechanisms for decreasing the impact of work-killing interruptions. The quality of these heuristics, in expected amount of work completed, is assessed through exhaustive simulations that use both idealized models and actual trace data.
- Published
- 2009
12. Scheduling Pipelined Applications: Models, Algorithms and Complexity
- Author
-
Benoit, Anne, Algorithms and Scheduling for Distributed Heterogeneous Platforms (GRAAL), Inria Grenoble - Rhône-Alpes, Institut National de Recherche en Informatique et en Automatique (Inria)-Institut National de Recherche en Informatique et en Automatique (Inria)-Laboratoire de l'Informatique du Parallélisme (LIP), École normale supérieure - Lyon (ENS Lyon)-Université Claude Bernard Lyon 1 (UCBL), Université de Lyon-Université de Lyon-Institut National de Recherche en Informatique et en Automatique (Inria)-Centre National de la Recherche Scientifique (CNRS)-École normale supérieure - Lyon (ENS Lyon)-Université Claude Bernard Lyon 1 (UCBL), Université de Lyon-Université de Lyon-Centre National de la Recherche Scientifique (CNRS), Ecole normale supérieure de lyon - ENS LYON, Alix Munier, École normale supérieure de Lyon (ENS de Lyon)-Université Claude Bernard Lyon 1 (UCBL), and Université de Lyon-Université de Lyon-Institut National de Recherche en Informatique et en Automatique (Inria)-Centre National de la Recherche Scientifique (CNRS)-École normale supérieure de Lyon (ENS de Lyon)-Université Claude Bernard Lyon 1 (UCBL)
- Subjects
models ,[INFO]Computer Science [cs] ,complexity ,scheduling ,pipelined applications ,algorithms - Abstract
In this document, I explore the problem of scheduling pipelined applications onto large-scale distributed platforms, in order to optimize several criteria. A particular attention is given to throughput maximization (i.e., the number of data sets that can be processed every time unit), latency minimization (i.e., the time required to process one data set entirely), and failure probability minimization. First, I accurately define the models and the scheduling problems, and exhibit surprising results, such as the difficulty to compute the optimal throughput and/or latency that can be obtained given a mapping. In particular, I detail the importance of the communication models, which induce quite different levels of difficulty. Second, I give an overview of complexity results for various cases, both for mono-criterion and for bi-criteria optimization problems. I illustrate the impact of the models on the problem complexity. Finally, I show some extensions of this work to different applicative contexts and to dynamic platforms.
- Published
- 2009
13. Mapping filter services on heterogeneous platforms
- Author
-
Benoit, Anne, Dufossé, Fanny, Robert, Yves, Laboratoire de l'Informatique du Parallélisme (LIP), Centre National de la Recherche Scientifique (CNRS)-Université de Lyon-Institut National de Recherche en Informatique et en Automatique (Inria)-Université Claude Bernard Lyon 1 (UCBL), Université de Lyon-École normale supérieure - Lyon (ENS Lyon), Laboratoire de l'informatique du parallélisme, École normale supérieure de Lyon (ENS de Lyon)-Université Claude Bernard Lyon 1 (UCBL), Université de Lyon-Université de Lyon-Institut National de Recherche en Informatique et en Automatique (Inria)-Centre National de la Recherche Scientifique (CNRS), and École normale supérieure - Lyon (ENS Lyon)-Université Claude Bernard Lyon 1 (UCBL)
- Subjects
Mapping ,Scheduling ,Period ,Latency ,Heuristics ,Filters ,[INFO]Computer Science [cs] ,Complexity results ,Web services - Abstract
In this paper, we explore the problem of mapping filtering web services on large-scale heterogeneous platforms. Two important optimization criteria should be considered in such a framework. The period, which is the inverse of the throughput, measuresthe rate at which data sets can enter the system. The latency measures the responsetime of the system in order to process one single data set entirely. Both criteria areantagonistic. For homogeneous platforms, the complexity of period minimization isalready known [14]; we derive an algorithm to solve the latency minimization problem,and we provide a bi-criteria algorithm which minimizes latency without exceeding aprescribed value for the period. However, when adding heterogeneity to the platform,we prove that minimizing the period or the latency becomes NP-hard. We provide aninteger linear program to solve both problems in the heterogeneous case.For period minimization on heterogeneous platforms, we design some efficient polynomial time heuristics and we assess their relative and absolute performance througha set of experiments. For small problem instances, the results are very close to theoptimal solution returned by the integer linear program.
- Published
- 2008
14. Optimizing memory allocation for multistage scheduling including setup times.
- Author
-
Benoit, Anne, Coqblin, Mathias, Nicod, Jean-Marc, and Rehn-Sonigo, Veronika
- Subjects
POLYNOMIAL time algorithms ,SETUP time ,WORKFLOW management systems ,COST control ,SCHEDULING - Abstract
Mapping linear workflow applications onto a set of homogeneous processors can be optimally solved in polynomial time for the throughput objective with fewer processors than stages. This result holds true even when setup times occur in the execution and homogeneous buffers are available for the storage of intermediate results. In this kind of application, several computation stages are interconnected as a linear application graph, and each stage holds a buffer of limited size where intermediate results are stored and a processor setup time occurs when passing from one stage to another. In this paper, we tackle the problem in which the buffer sizes are not given beforehand and must be fixed before the execution to maximize the throughput within each processor. The goal of this work is to minimize the cost induced by the setup times by allocating buffers that are proportinal in size to each other. We present a closed formula to compute the optimal buffer allocation in the case of nondecreasing setup costs in the linear application. For the case of unsorted setup times, we provide competitive heuristics that are validated via extensive simulation. Three nonscalable brute force algorithms are also provided to compare heuristic approaches to optimal ones for small applications and to evaluate the relevance of our approach. [ABSTRACT FROM AUTHOR]
- Published
- 2016
- Full Text
- View/download PDF
15. Approximation Algorithms for Energy, Reliability, and Makespan Optimization Problems.
- Author
-
Aupy, Guillaume and Benoit, Anne
- Subjects
- *
PRODUCTION scheduling , *ENERGY consumption , *APPROXIMATION algorithms , *COMPUTER programming , *MACHINE theory - Abstract
We consider the problem of scheduling an application on a parallel computational platform. The application is a particular task graph, either a linear chain of tasks, or a set of independent tasks. The platform is made of identical processors, whose speed can be dynamically modified. It is also subject to failures: if a processor is slowed down to decrease the energy consumption, it has a higher chance to fail. Therefore, the scheduling problem requires us to re-execute or replicate tasks (i.e., execute twice the same task, either on the same processor, or on two distinct processors), in order to increase the reliability. It is a tri-criteria problem: the goal is to minimize the energy consumption, while enforcing a bound on the total execution time (the makespan), and a constraint on the reliability of each task. Our main contribution is to propose approximation algorithms for linear chains of tasks and independent tasks. For linear chains, we design a fully polynomial-time approximation scheme. However, we show that there exists no constant factor approximation algorithm for independent tasks, unless P=NP, and we propose in this case an approximation algorithm with a relaxation on the makespan constraint. [ABSTRACT FROM AUTHOR]
- Published
- 2016
- Full Text
- View/download PDF
16. Reclaiming the energy of a schedule: models and algorithms Reclaiming the energy of a schedule: models and algorithms.
- Author
-
Aupy, Guillaume, Benoit, Anne, Dufossé, Fanny, and Robert, Yves
- Subjects
COMPUTER scheduling ,ALGORITHMS ,ENERGY consumption ,PARALLEL computers ,DIRECTED acyclic graphs ,TASK performance ,GEOMETRIC programming - Abstract
SUMMARY We consider a task graph to be executed on a set of processors. We assume that the mapping is given, say by an ordered list of tasks to execute on each processor, and we aim at optimizing the energy consumption while enforcing a prescribed bound on the execution time. Although it is not possible to change the allocation of a task, it is possible to change its speed. Rather than using a local approach such as backfilling, we consider the problem as a whole and study the impact of several speed variation models on its complexity. For continuous speeds, we give a closed-form formula for trees and series-parallel graphs, and we cast the problem into a geometric programming problem for general directed acyclic graphs. We show that the classical dynamic voltage and frequency scaling (DVFS) model with discrete modes leads to an NP-complete problem, even if the modes are regularly distributed (an important particular case in practice, which we analyze as the incremental model). On the contrary, the Vdd-hopping model that allows to switch between different supply voltages ( V
DD ) while executing a task leads to a polynomial solution. Finally, we provide an approximation algorithm for the incremental model, which we extend for the general DVFS model. Copyright © 2012 John Wiley & Sons, Ltd. [ABSTRACT FROM AUTHOR]- Published
- 2013
- Full Text
- View/download PDF
17. A Survey of Pipelined Workflow Scheduling: Models and Algorithms.
- Author
-
BENOIT, ANNE, ÇATALYÜREK, ÜMIT V., ROBERT, YVES, and SAULE, ERIK
- Subjects
- *
PRODUCTION scheduling , *WORKFLOW , *WORK measurement , *COMPUTER programming , *COMPUTER simulation - Abstract
A large class of applications need to execute the same workflow on different datasets of identical size. Efficient execution of such applications necessitates intelligent distribution of the application components and tasks on a parallel machine, and the execution can be orchestrated by utilizing task, data, pipelined, and/or replicated parallelism. The scheduling problem that encompasses all of these techniques is called pipelined workflow scheduling, and it has been widely studied in the last decade. Multiple models and algorithms have flourished to tackle various programming paradigms, constraints, machine behaviors, or optimization goals. This article surveys the field by summing up and structuring known results and approaches. [ABSTRACT FROM AUTHOR]
- Published
- 2013
- Full Text
- View/download PDF
18. Multi-criteria Scheduling of Precedence Task Graphs on Heterogeneous Platforms.
- Author
-
Benoit, Anne, Hakem, Mourad, and Robert, Yves
- Subjects
- *
FAULT tolerance (Engineering) , *HEURISTIC programming , *SCHEDULING software , *ALGORITHMS , *GRAPHIC methods - Abstract
Latency, fault tolerance and reliability are important requirements for several applications that are time critical in nature: such applications require guarantees in terms of latency, even when processors are subject to failures. In this paper, we propose a fault-tolerant scheduling heuristic for mapping precedence task graphs on heterogeneous systems. Our approach is based on an active replication scheme, capable of supporting ɛ arbitrary fail-silent/fail-stop processor failures, and hence valid results will be provided even if ɛ processors fail. First we focus on a bi-criteria approach, where we aim at minimizing the latency given a fixed number of failures supported in the system, or the other way round. Next we derive a more complex algorithm in which we not only minimize latency and support a fixed number of failures, but also improve the overall reliability. Major achievements include low complexity of the new algorithms, and a drastic reduction of the number of additional communications induced by the replication mechanism. Experimental results demonstrate that our heuristics, despite their lower complexity, outperform their direct competitor, the fault-tolerance based active replication scheduling algorithm FTBAR. [ABSTRACT FROM PUBLISHER]
- Published
- 2010
- Full Text
- View/download PDF
19. Replica Placement and Access Policies in Tree Networks.
- Author
-
Benoit, Anne, Rehn-Sonigo, Veronika, and Robert, Yves
- Subjects
- *
HEURISTIC programming , *HEURISTIC , *QUALITY of service , *OPERATIONS research , *COMPUTER programming , *LINEAR programming , *HETEROGENEOUS computing , *SYSTEMS theory , *MATHEMATICAL optimization - Abstract
Abstract-In this paper, we discuss and compare several policies to place replicas in tree networks, subject to server capacity and Quality-of-Service (QoS) constraints. The client requests are known beforehand, while the number and location of the servers are to be determined. The standard approach in the literature is to enforce that all requests of a client be served by the closest server in the tree. We introduce and study two new policies. In the first policy, all requests from a given client are still processed by the same server, but this server can be located anywhere in the path from the client to the root. In the second policy, the requests of a given client can be processed by multiple servers. One major contribution of this paper is to assess the impact of these new policies on the total replication cost. Another important goal is to assess the impact of server heterogeneity, both from a theoretical and a practical perspective. In this paper, we establish several new complexity results and provide several efficient polynomial heuristics for NP-complete instances of the problem. These heuristics are compared one to the other, and their absolute performance is assessed by comparison with the optimal solution provided by an integer linear program. [ABSTRACT FROM AUTHOR]
- Published
- 2008
- Full Text
- View/download PDF
20. Mapping pipeline skeletons onto heterogeneous platforms
- Author
-
Benoit, Anne and Robert, Yves
- Subjects
- *
OPERATIONS research , *PARALLEL programming , *PARALLEL processing , *COMPUTER programming - Abstract
Abstract: Mapping applications onto parallel platforms is a challenging problem, that becomes even more difficult when platforms are heterogeneous — nowadays a standard assumption. A high-level approach to parallel programming not only eases the application developer’s task, but it also provides additional information which can help realize an efficient mapping of the application. In this paper, we discuss the mapping of pipeline skeletons onto different types of platforms: Fully Homogeneous platforms with identical processors and interconnection links; Communication Homogeneous platforms, with identical links but different-speed processors; and finally, Fully Heterogeneous platforms. We assume that a pipeline stage must be mapped on a single processor, and we establish new theoretical complexity results for different mapping policies: a mapping can be required to be one-to-one (a processor is assigned at most one stage), or interval-based (a processor is assigned an interval of consecutive stages), or fully general. In particular, we show that determining the optimal interval-based mapping is NP-hard for Communication Homogeneous platforms, and this result assesses the complexity of the well-known chains-to-chains problem for different-speed processors. We provide several efficient polynomial heuristics for the most important policy/platform combination, namely interval-based mappings on Communication Homogeneous platforms. These heuristics are compared to the optimal result provided by the formulation of the problem in terms of the solution of an integer linear program, for small problem instances. [Copyright &y& Elsevier]
- Published
- 2008
- Full Text
- View/download PDF
Catalog
Discovery Service for Jio Institute Digital Library
For full access to our library's resources, please sign in.