4 results on '"Guillaume Eynard-Bontemps"'
Search Results
2. Pangeo@EOSC: deployment of PANGEO ecosystem on the European Open Science Cloud
- Author
-
Guillaume Eynard-Bontemps, Jean Iaquinta, Sebastian Luna-Valero, Miguel Caballer, Frederic Paul, Anne Fouilloux, Benjamin Ragan-Kelley, Pier Lorenzo Marasco, and Tina Odaka
- Abstract
Research projects rely heavily on the exchange and processing of data. In this context, Pangeo (https://pangeo.io/), a worldwide community of scientists and developers, strives to facilitate the deployment of ready-to-use, community-driven platforms for big data geoscience. The European Open Science Cloud (EOSC) is the main initiative in Europe for providing a federated and open multi-disciplinary environment where European researchers, innovators, companies and citizens can share, publish, find and re-use data, tools and services for research, innovation and educational purposes. While a number of services based on Jupyter Notebooks were already available, no public Pangeo deployments providing fast access to large amounts of data and compute resources were accessible on EOSC. Most existing cloud-based Pangeo deployments are USA-based, and members of the Pangeo community in Europe did not have a shared platform where scientists or technologists could exchange know-how. Pangeo teamed up with two EOSC projects, namely EGI-ACE (https://www.egi.eu/project/egi-ace/) and C-SCALE (https://c-scale.eu/), to demonstrate how to deploy and use Pangeo on EOSC and to emphasise the benefits for the European community. The Pangeo Europe Community, together with EGI, deployed a DaskHub, composed of Dask Gateway (https://gateway.dask.org/) and JupyterHub (https://jupyter.org/hub), with a Kubernetes cluster backend on EOSC using the infrastructure of the EGI Federation (https://www.egi.eu/egi-federation/). The Pangeo EOSC JupyterHub deployment makes use of 1) the EGI Check-In to enable user registration and thereby authenticated and authorised access to the Pangeo JupyterHub portal and to the underlying distributed compute infrastructure; and 2) the EGI Cloud Compute and the cloud-based EGI Online Storage to distribute the computational tasks to a scalable compute platform and to store intermediate results produced by the user jobs.
To facilitate future Pangeo deployments on top of a wide range of cloud providers (AWS, Google Cloud, Microsoft Azure, EGI Cloud Computing, OpenNebula, OpenStack, and more), the Pangeo EOSC JupyterHub can now be deployed through the Infrastructure Manager (IM) Dashboard (https://im.egi.eu/im-dashboard/login). All the computing and storage resources are currently supplied by CESNET (https://www.cesnet.cz/?lang=en) in the frame of the EGI-ACE project (https://im.egi.eu/). Several deployments have been made to serve the geoscience community, both for teaching and for research work. To date, more than 100 researchers have been trained on Pangeo@EOSC deployments and more are expected to join, in particular thanks to easy access to large amounts of Copernicus data through a recent collaboration established with the C-SCALE project. In this presentation, we will provide details on the different deployments, how to get access to JupyterHub deployments and, more generally, how to contribute to Pangeo@EOSC.
- Published
- 2023
3. Pangeo framework for training: experience with FOSS4G, the CLIVAR bootcamp and the eScience course
- Author
-
Anne Fouilloux, Pier Lorenzo Marasco, Tina Odaka, Ruth Mottram, Paul Zieger, Michael Schulz, Alejandro Coca-Castro, Jean Iaquinta, and Guillaume Eynard-Bontemps
- Abstract
The ever-increasing number of scientific datasets made available by authoritative data providers (NASA, Copernicus, etc.) and provided by the scientific community opens new possibilities for advancing the state of the art in many areas of the natural sciences. As a result, researchers, innovators, companies and citizens need to acquire computational and data analysis skills to optimally exploit these datasets. Several educational programs offer basic courses to students, and initiatives such as “The Carpentries” (https://carpentries.org/) complement this offering but also reach out to established researchers to fill the skills gap, thereby empowering them to perform their own data analysis. However, most researchers find it challenging to go beyond these training sessions and face difficulties when trying to apply their newly acquired knowledge to their own research projects. In this regard, hackathons have proven to be an efficient way to support researchers in becoming competent practitioners, but organising good hackathons is difficult and time-consuming. In addition, the need for large amounts of computational and storage resources during the training and hackathons requires a flexible solution. Here, we propose an approach where researchers work on realistic, large and complex data analysis problems similar to or directly part of their research work. Researchers access an infrastructure deployed on the European Open Science Cloud (EOSC) that supports intensive data analysis (large compute and storage resources). EOSC is a European Commission initiative for providing a federated and open multi-disciplinary environment where data, tools and services can be shared, published, found and re-used. We used Jupyter Book to deliver a collection of FAIR training materials for data analysis, relying on Pangeo EOSC deployments as the primary computing platform.
The training material (https://pangeo-data.github.io/foss4g-2022/intro.html, https://pangeo-data.github.io/clivar-2022/intro.html, https://pangeo-data.github.io/escience-2022/intro.html) is customised (different datasets with similar analysis) for different target communities, and participants are taught the usage of Xarray, Dask and, more generally, how to efficiently access and analyse large online datasets. The training can be complemented by group work where attendees work on larger-scale scientific datasets: the classroom is split into several groups, and each group works on different scientific questions and may use different datasets. The Pangeo (http://pangeo.io) ecosystem is not always new to attendees, but applying Xarray (http://xarray.pydata.org) and Dask (https://www.dask.org/) to actual scientific “mini-projects” is often a showstopper for many researchers. With this approach, attendees have the opportunity to ask questions, collaborate with other researchers as well as Research Software Engineers, and apply Open Science practices without the burden of trying and failing alone. We find that the direct involvement of scientific computing research engineers in the training is crucial for the success of the hackathon approach. Feedback from attendees shows that it provides a solid foundation for big data geoscience and helps attendees to quickly become competent practitioners. It also gives infrastructure providers and EOSC useful feedback on the current and future needs of researchers for making their research FAIR and open. In this presentation, we will provide examples of achievements from attendees and present the feedback EOSC providers have received.
- Published
- 2023
4. The Pangeo Ecosystem: Interactive Computing Tools for the Geosciences: Benchmarking on HPC
- Author
-
Tina Odaka, Jared Baker, Guillaume Eynard-Bontemps, Guillaume Maze, Ryan Abernathey, Anderson Banihirwe, Aurélien Ponte, and Kevin Paul
- Subjects
Interactive computing; Scheme (programming language); Computer science; business.industry; Xarray; Distributed computing; Cloud computing; 02 engineering and technology; Benchmarking; interactive computing; Dask; Software; 020204 information systems; Scalability; HPC; 0202 electrical engineering, electronic engineering, information engineering; Data_FILES; Pangeo; cloud; 020201 artificial intelligence & image processing; benchmarking; business; computer; Chunking (computing); computer.programming_language
- Abstract
The Pangeo ecosystem is an interactive computing software stack for HPC and public cloud infrastructures. In this paper, we show benchmarking results of the Pangeo platform on two different HPC systems. Four different geoscience operations were considered in this benchmarking study with varying chunk sizes and chunking schemes. Both strong and weak scaling analyses were performed. Chunk sizes from 64 MB to 512 MB were considered, with the best scalability obtained for 512 MB. Compared to certain manual chunking schemes, the auto chunking scheme scaled well.
- Published
- 2019
Discovery Service for Jio Institute Digital Library