What is ML_INFN?
Machine Learning is an emerging technology in data analytics, and as such is of great interest for the scientific research carried out at INFN.
A fruitful approach to Machine Learning requires balancing three ingredients: a reasonable understanding of its theoretical aspects, a study of similar cases in the literature, and the availability of performant, tailored technological platforms.
These three ingredients form the basis of the ML_INFN (“Machine Learning at INFN”) project, funded by INFN CSN5 for the years 2020-2022.
ML_INFN has chosen INFN Cloud as the platform to provide access to the tailored resources for its users – potentially the entire community of INFN researchers interested in Machine Learning.
The technology
Building a productive, user-friendly platform for prototyping and developing Machine Learning solutions is a technical challenge: researchers need a highly interactive environment, access to hardware accelerators (GPUs, TPUs, and in the near future even FPGAs), unrestricted access to the (often multi-TB) training data, and a tailored environment that combines a multitude of tools from different sources with little standardization. On top of that, research groups need to collaborate and share resources, data and code.
We chose INFN Cloud to provide these to users; in particular we deployed:
- A standardized multi-user environment for research groups, based on the INDIGO-IAM Authentication and Authorization tool;
- Fast prototyping of solutions via Jupyter Notebooks and fully multi-user Jupyter Lab tools;
- Specifically procured hardware platforms, with the capability to host multiple accelerators, and equipped with fast NVMe disks to provide the needed bandwidth to data.
Virtual Machines are instantiated via the standard INFN Cloud dashboard, and are accessible to all INFN users.
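As an illustration of how a user might verify that accelerators are visible from inside such an environment, here is a minimal Python sketch. It is not part of the ML_INFN setup described above: it assumes NVIDIA GPUs and that the standard nvidia-smi tool is installed in the container, and the helper names parse_gpu_list and visible_gpus are hypothetical.

```python
import shutil
import subprocess


def parse_gpu_list(csv_text):
    """Parse 'name, memory' CSV lines into a list of (name, memory) tuples."""
    gpus = []
    for line in csv_text.splitlines():
        line = line.strip()
        if not line:
            continue  # skip blank lines
        name, _, memory = line.partition(",")
        gpus.append((name.strip(), memory.strip()))
    return gpus


def visible_gpus():
    """Return the GPUs reported by nvidia-smi, or [] if none can be queried.

    Assumption: NVIDIA hardware with nvidia-smi available on the PATH.
    """
    if shutil.which("nvidia-smi") is None:
        return []
    result = subprocess.run(
        ["nvidia-smi", "--query-gpu=name,memory.total", "--format=csv,noheader"],
        capture_output=True,
        text=True,
        check=False,
    )
    if result.returncode != 0:
        return []
    return parse_gpu_list(result.stdout)


if __name__ == "__main__":
    gpus = visible_gpus()
    if not gpus:
        print("No GPU visible from this environment")
    for name, memory in gpus:
        print(f"{name}: {memory}")
```

Run in a notebook cell or a terminal, this lists each visible GPU with its total memory; an empty result suggests that the Virtual Machine was instantiated without an accelerator or that the driver tools are missing.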
The Hackathon
A first large-scale test of the technology took place during the First ML_INFN Hackathon, on June 7-9, 2021.
The hackathon was designed to be highly interactive, with a high ratio of tutors to students, and due to the pandemic it had to take place entirely online. The capabilities of the environment we had deployed allowed 54 students, divided into 9 work groups, to collaborate on machine learning solutions for “scientifically realistic” use cases, with the assistance of one tutor per group.
Each group was provided with a Virtual Machine, thus sharing resources, data and code; within that Virtual Machine, each user had access to a dedicated container, prepared with an environment offering the most recent software tools and libraries for machine learning.
At peak utilization, INFN Cloud was able to provide the students, tutors and organizers with ~70 such environments in parallel, with no noticeable problems either on the technical side (user authentication, access to resources, instantiation of the environments) or in terms of performance.
An ongoing survey among the users already shows a very high level of satisfaction with these technical solutions, and the intention to use similar tools in their future professional activities.
The hackathon proved, among other things, that INFN Cloud-based solutions are a very good starting point for INFN users’ Machine Learning activities.
Given a large installed base of tailored hardware, which INFN is expected to deploy in the near future, the platform can indeed help optimize and foster the use of these novel technologies within INFN.