Directorate-general | JRC

Big Data Analytics Platform

Multi-petabyte scale storage, data services and data analytics for policy support

Vision

The JRC Big Data Analytics Platform (BDAP) links data, data services, data scientists and thematic experts to generate policy-relevant insights and foresight. It will play an instrumental role in helping the JRC to better mobilise and synthesise its collective knowledge and expertise in support of EC priorities.

Building blocks

BDAP is built around a petabyte-scale storage system coupled with a processing cluster, accessible from anywhere through encrypted protocols and multi-factor authentication. It provides interactive data analysis tools, a remote data science desktop and distributed computing with specialised hardware for machine learning and deep learning tasks. It brings together services for data analytics, data visualisation and data dissemination under a single platform.

Main services

All BDAP services are built on top of the hardware layer, which consists of servers dedicated to storage and servers dedicated to processing. The multi-petabyte storage is managed by EOS, the distributed file system developed at CERN, which allows all services to see the full storage capacity as a single volume. Dedicated hardware for machine learning and deep learning is also available. On top of the hardware layer, three main services are built, as displayed in the following image:


Quick guide

A quick guide to BDAP and its services is available for download at the following link:

Quick guide to BDAP (English, PDF, 1.2 MB)

BDAP services requiring authentication mainly serve JRC users. Please consult the JRC Service Catalog entry for the Big Data Analytics Platform for more information on how to access BDAP.

Documentation

Pyjeo user guide

User guide of the pyjeo library: how to manage geospatial data in JEO-desk and JEO-batch

Interapro user guide

User guide of the interapro library: how to visualise and analyse geospatial datasets in the JEO-lab Jupyter notebooks

Voilalibrary user guide

User guide of the voilalibrary, a library that simplifies the development of impactful web applications as Voilà dashboards

Libraries for GUI development available in JEO-lab

Links to the help pages of the main libraries for creating graphical user interface (GUI) elements in Jupyter notebooks and Voilà dashboards:

Widget GUI elements:
ipywidgets, ipyvuetify, vuetifyjs
Mapping:
ipyleaflet
Tabular data:
qgrid
Charting:
bqplot, plotly, bokeh, matplotlib
Custom drawing:
ipycanvas
Hierarchical data display:
ipytree
Events management:
ipyevents

How to access the platform

Various authentication services are used on BDAP, depending on the service being accessed. All services require the user to have a BDAP account. Some services require additional security measures; for those, user certificates and one-time passwords are provided to authorised users.
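The page does not say which one-time password scheme BDAP uses. Assuming a standard time-based scheme (TOTP, RFC 6238, built on HOTP from RFC 4226), the sketch below shows how an authenticator app derives a code from a shared secret and the current time. It is an illustration of the general mechanism only, not BDAP's implementation.

```python
import hashlib
import hmac
import struct
import time
from typing import Optional


def hotp(secret: bytes, counter: int, digits: int = 6) -> str:
    """HMAC-based one-time password (RFC 4226) using HMAC-SHA1."""
    msg = struct.pack(">Q", counter)  # 8-byte big-endian counter
    digest = hmac.new(secret, msg, hashlib.sha1).digest()
    offset = digest[-1] & 0x0F  # dynamic truncation offset (low 4 bits)
    code = struct.unpack(">I", digest[offset:offset + 4])[0] & 0x7FFFFFFF
    return str(code % 10**digits).zfill(digits)


def totp(secret: bytes, timestamp: Optional[float] = None,
         step: int = 30, digits: int = 6) -> str:
    """Time-based one-time password (RFC 6238): HOTP over a time-step counter."""
    if timestamp is None:
        timestamp = time.time()
    return hotp(secret, int(timestamp // step), digits)
```

Calling `totp(shared_secret)` returns the code for the current 30-second window; a server typically accepts it if it matches the code it computes for the same (or an adjacent) window.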

Services

List of services provided by the JRC Big Data Analytics Platform:

Quick access to BDAP services
JEO-lab

JEO-lab is a Jupyter notebook environment, intended primarily for analysing and visualising data interactively via a dedicated API. Besides the API, JEO-lab is also used to start machine learning notebooks and to access project-specific notebook containers for selected use cases. JEO-lab documentation

JEO-desk

JEO-desk is a desktop terminal service that provides a graphical Linux remote desktop (based on Xubuntu), accessible from any modern web browser that supports HTML5. The virtual terminal runs on machines located within the JEODPP infrastructure and therefore has a fast connection to the JEODPP data. JEO-desk documentation

JEO-batch

The JEO-batch service provides computing power for JRC experiments, for tasks such as large-scale image processing, data analysis and simulation. It is an advanced distributed system designed mainly for high-throughput scientific computing. JEO-batch documentation
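High-throughput computing usually means splitting one large task into many small, independent ones. As an illustration (this is not JEO-batch's submission interface, which is not described here), the sketch below divides a raster into tile windows, each of which could become a separate batch job. The tile size and the toy `process_tile` function are hypothetical.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import NamedTuple


class Window(NamedTuple):
    col_off: int
    row_off: int
    width: int
    height: int


def tile_windows(width: int, height: int, tile: int) -> list[Window]:
    """Split a width x height raster into tile x tile windows.

    Edge tiles are clipped so that the windows exactly cover the raster
    without overlapping.
    """
    windows = []
    for row in range(0, height, tile):
        for col in range(0, width, tile):
            windows.append(Window(col, row,
                                  min(tile, width - col),
                                  min(tile, height - row)))
    return windows


def process_tile(window: Window) -> int:
    # Hypothetical per-tile work: here it only returns the pixel count.
    return window.width * window.height


if __name__ == "__main__":
    jobs = tile_windows(10_980, 10_980, 2048)  # e.g. a Sentinel-2 10 m tile
    # Locally the jobs run in a thread pool; on a batch system each
    # window would instead be submitted as an independent job.
    with ThreadPoolExecutor() as pool:
        pixels = sum(pool.map(process_tile, jobs))
    print(len(jobs), "jobs,", pixels, "pixels")
```

Because the windows share no state, they can be processed in any order and on any node, which is what makes this pattern a good fit for a high-throughput cluster.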

JEO-cloud

The JEO-cloud service is based on Nextcloud. It aims to facilitate collaboration between JEODPP users and to make it easier to transfer documents (scripts, project files, etc.) and small datasets between the user's personal computer and the JEODPP Terminal Service. It also offers functions that help the JEODPP team provide remote support to users (screen sharing and videoconferencing). JEO-cloud documentation
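Nextcloud exposes each user's files over WebDAV, so a small file can also be transferred with a plain HTTP `PUT` to `remote.php/dav/files/<user>/<path>`. Assuming JEO-cloud keeps Nextcloud's standard WebDAV endpoint, the sketch below builds such an upload request with the Python standard library. The server URL, user name and app password are hypothetical placeholders, and nothing is sent unless you pass the request to `urlopen`.

```python
import base64
import urllib.parse
import urllib.request

# Hypothetical values: replace with the real JEO-cloud URL and your credentials
# (Nextcloud recommends using an app password rather than the account password).
BASE_URL = "https://jeo-cloud.example.org"
USER = "jdoe"
APP_PASSWORD = "app-password"


def build_upload_request(base_url: str, user: str, password: str,
                         remote_path: str, payload: bytes) -> urllib.request.Request:
    """Build a WebDAV PUT request that uploads ``payload`` to ``remote_path``."""
    quoted_user = urllib.parse.quote(user, safe="")
    quoted_path = urllib.parse.quote(remote_path.lstrip("/"))  # keeps "/" separators
    url = f"{base_url.rstrip('/')}/remote.php/dav/files/{quoted_user}/{quoted_path}"
    token = base64.b64encode(f"{user}:{password}".encode()).decode()
    return urllib.request.Request(
        url,
        data=payload,
        method="PUT",
        headers={"Authorization": f"Basic {token}"},
    )


if __name__ == "__main__":
    req = build_upload_request(BASE_URL, USER, APP_PASSWORD,
                               "scripts/analysis.py", b"print('hello')\n")
    print(req.get_method(), req.full_url)
    # urllib.request.urlopen(req)  # uncomment to actually upload
```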

Voilà dashboards

Voilà is a Jupyter notebook extension that automatically turns notebooks into standalone applications and dashboards. Notebooks are rendered by showing only the output of the cells, while the code is hidden. This makes dashboards suitable for non-technical users and for communicating insights and foresight to a wider audience, within a single environment that covers full data analytics workflows, from research and innovation to outreach engaging policy makers and citizens.
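The rendering idea (show the outputs, hide the code) can be illustrated on the notebook's JSON format. The toy function below is not how Voilà works internally, since Voilà executes the notebook in a live kernel so that widgets stay interactive; it simply reads an nbformat v4 notebook and keeps Markdown cells and code-cell outputs while dropping the code itself.

```python
import json


def _text(value) -> str:
    # nbformat stores multi-line strings either as one string or as a list of lines.
    return "".join(value) if isinstance(value, list) else value


def outputs_only(notebook_json: str) -> list[str]:
    """Return the visible content of a notebook with code inputs hidden.

    Only plain-text outputs are handled here, for brevity; real dashboards
    also render HTML, images and widgets.
    """
    nb = json.loads(notebook_json)
    visible = []
    for cell in nb.get("cells", []):
        if cell["cell_type"] == "markdown":
            visible.append(_text(cell["source"]))
        elif cell["cell_type"] == "code":
            # The cell's "source" (the code) is deliberately skipped.
            for out in cell.get("outputs", []):
                if out["output_type"] == "stream":
                    visible.append(_text(out["text"]))
                elif out["output_type"] in ("execute_result", "display_data"):
                    data = out.get("data", {})
                    if "text/plain" in data:
                        visible.append(_text(data["text/plain"]))
    return visible
```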

As an example of a Voilà dashboard, the CollectionsExplorer is a good starting point to explore the geospatial datasets stored in the platform.

GitLab instance

GitLab is the DevOps platform used at BDAP. It helps BDAP team members and users collaborate on software development and provides a place where everyone can contribute. Users can open issues in the BDAP GitLab instance to ask for new features, to propose new datasets to be evaluated for download, to request the installation of new software packages and to report bugs or problems in the BDAP services.
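Issues are normally opened through the GitLab web interface, but GitLab also offers a REST API endpoint for this (`POST /api/v4/projects/:id/issues`). The sketch below assumes that the API is enabled on the BDAP instance and that you have a personal access token; the host name, project path, token and labels are hypothetical.

```python
import json
import urllib.parse
import urllib.request

GITLAB_URL = "https://gitlab.example.org"  # hypothetical BDAP GitLab host
PROJECT = "bdap/support"                   # hypothetical project path
TOKEN = "glpat-xxxxxxxx"                   # personal access token (placeholder)


def build_issue_request(base_url: str, project: str, token: str,
                        title: str, description: str,
                        labels: list[str]) -> urllib.request.Request:
    """Build a request that creates an issue via the GitLab REST API v4."""
    # A project can be addressed by numeric ID or by its URL-encoded path.
    project_id = urllib.parse.quote(project, safe="")
    url = f"{base_url.rstrip('/')}/api/v4/projects/{project_id}/issues"
    body = json.dumps({
        "title": title,
        "description": description,
        "labels": ",".join(labels),  # GitLab expects comma-separated label names
    }).encode()
    return urllib.request.Request(
        url, data=body, method="POST",
        headers={"PRIVATE-TOKEN": token, "Content-Type": "application/json"},
    )
```

Sending the request with `urllib.request.urlopen(req)` would return the JSON description of the newly created issue.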

Data

Data provided by BDAP:

Data Catalogue

Web data catalogue for exploring and browsing all the collections stored in BDAP. It follows the SpatioTemporal Asset Catalog (STAC) specification, which provides a common language to describe geospatial data.
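To give a flavour of the STAC format, here is a minimal STAC Item (a GeoJSON Feature carrying `stac_version`, `id`, `geometry`, `bbox`, `properties.datetime`, `links` and `assets`) together with a simple bounding-box filter. The item identifier, asset URL and coordinates are invented for illustration and do not describe a real BDAP collection; the field check is a simplification of the full specification.

```python
from typing import Iterable

REQUIRED_ITEM_FIELDS = ("type", "stac_version", "id", "geometry",
                        "bbox", "properties", "links", "assets")

example_item = {
    "type": "Feature",
    "stac_version": "1.0.0",
    "id": "example-scene-001",  # hypothetical identifier
    "geometry": {
        "type": "Polygon",
        "coordinates": [[[8.0, 45.0], [9.0, 45.0], [9.0, 46.0],
                         [8.0, 46.0], [8.0, 45.0]]],
    },
    "bbox": [8.0, 45.0, 9.0, 46.0],  # [west, south, east, north]
    "properties": {"datetime": "2021-06-01T10:30:00Z"},
    "links": [],
    "assets": {
        "data": {"href": "https://example.org/scene-001.tif",  # hypothetical
                 "type": "image/tiff; application=geotiff"},
    },
}


def missing_fields(item: dict) -> list[str]:
    """List top-level Item fields (plus properties.datetime) that are absent."""
    missing = [f for f in REQUIRED_ITEM_FIELDS if f not in item]
    if "datetime" not in item.get("properties", {}):
        missing.append("properties.datetime")
    return missing


def bbox_intersects(a: list[float], b: list[float]) -> bool:
    """True if two 2D [west, south, east, north] boxes overlap (no antimeridian handling)."""
    return a[0] <= b[2] and b[0] <= a[2] and a[1] <= b[3] and b[1] <= a[3]


def search(items: Iterable[dict], bbox: list[float]) -> list[str]:
    """Return the ids of the items whose bbox intersects the query bbox."""
    return [it["id"] for it in items if bbox_intersects(it["bbox"], bbox)]
```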

Collections Explorer

Voilà application to browse, visualise and compare the main geospatial datasets available at BDAP.

Platform information


Timeline

BDAP is the successor of the JRC Earth Observation Data and Processing Platform, widening the scope of the platform towards any type of Big Data analytics. The familiar name "JEODPP" is still used in most documentation and service URLs. The timeline of the project is as follows:

  1. 1 January 2015
    Earth Observation and Social Sensing Big Data pilot project
    Kick-off of EO&SSBD as a pilot project
  2. 1 March 2016
    Purchase and installation of the first hardware
    JRC Big Data Platform becomes operational with the first servers installed and configured
  3. 1 January 2019
    Big Data Analytics project launched as an institutional project
    JRC Big Data Platform ends the pilot phase and becomes a fully institutional project
  4. 12 December 2020
    Big Data Analytics Platform recognised as an official EC IT platform
    The EC Information Technology and Cybersecurity Board (ITCB) gives a positive opinion on the evolution of the platform towards a Big Data Analytics Platform as a component of the EC Data Platform

Numbers

~175 servers in the JRC Data Center, for storage, processing jobs and services
~4,500 cores with 12-19 GB of RAM per core, for JEO-batch/desk/lab and other services
10 GPU servers with 38 Nvidia GPUs in total, for machine learning and deep learning
28.4 PiB of gross storage (14.2 PiB net capacity), for storing datasets and satellite images
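The ratio between the two storage figures is exactly 2, which would be consistent with every file being stored as two replicas (EOS supports replica-based layouts); the page does not state the layout, so treat this as an assumption. The short calculation below also converts binary pebibytes (2^50 bytes) into decimal petabytes (10^15 bytes), since the two units are easily confused.

```python
PIB = 2**50  # bytes in one pebibyte (binary prefix)
PB = 10**15  # bytes in one petabyte (decimal prefix)


def pib_to_pb(pib: float) -> float:
    """Convert pebibytes to petabytes."""
    return pib * PIB / PB


def implied_replicas(raw_pib: float, net_pib: float) -> float:
    """Ratio of gross to net capacity, e.g. 2.0 for a two-replica layout."""
    return raw_pib / net_pib


gross, net = 28.4, 14.2  # figures quoted above, in PiB
print(f"implied replication factor ~ {implied_replicas(gross, net):.1f}")
print(f"net capacity ~ {pib_to_pb(net):.1f} PB (decimal)")
```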



Software stack

The JRC Big Data Analytics Platform is mainly built on open-source software. Here is a partial list of the tools and libraries used:

Publications

The publications listed here correspond to all publications registered in pubsy and containing at least one co-author from the Big Data Analytics Platform. Numerous publications are the result of fruitful collaborations with other JRC projects and external partners.

List of all publications (English, PDF, 159 KB)

Reference publication to be used for citing BDAP in papers:

P. Soille, A. Burger, D. De Marchi, P. Kempeneers, D. Rodriguez, V. Syrris, and V. Vasilev. “A Versatile Data-Intensive Computing Platform for Information Retrieval from Big Geospatial Data”. Future Generation Computer Systems 81.4 (Apr. 2018), pp. 30–40. doi: 10.1016/j.future.2017.11.007.

Books:

P. Soille, S. Loekken, and S. Albani, eds. Proc. of the 2021 Conference on Big Data from Space (BiDS’21). ESA-JRC-SatCen. Online event: Publications Office of the European Union, May 2021. doi: 10.2760/125905.

P. Soille, S. Loekken, and S. Albani, eds. Proc. of the 2019 Conference on Big Data from Space (BiDS’19). ESA-JRC-SatCen. Munich, Germany: Publications Office of the European Union, Feb. 2019. doi: 10.2760/848593.

P. Soille and P. Marchetti, eds. Proc. of the 2017 Conference on Big Data from Space (BiDS’17). ESA-JRC-SatCen. Toulouse, France: Publications Office of the European Union, Nov. 2017. doi: 10.2760/383579.

P. Soille and P. Marchetti, eds. Proc. of the 2016 Conference on Big Data from Space (BiDS’16). ESA-JRC-SatCen. Tenerife, Spain: Publications Office of the European Union, Mar. 2016. doi: 10.2788/854791.

Recent journal papers:

P. Kempeneers, O. Pesek, D. De Marchi, and P. Soille. “A Python Package For The Analysis of Geospatial Data”. International Journal of Geo-Information 8.10 (2019). doi: 10.3390/ijgi8100461.

V. Syrris, P. Hasenohr, B. Delipetrev, A. Kotsev, and P. Soille. “Evaluation of the potential of convolutional neural networks and random forests for multi-class segmentation of Sentinel-2 imagery”. Remote Sensing 11.8 (2019), p. 4. doi: 10.3390/rs11080907.

P. Soille, A. Burger, D. De Marchi, P. Kempeneers, D. Rodriguez, V. Syrris, and V. Vasilev. “A Versatile Data-Intensive Computing Platform for Information Retrieval from Big Geospatial Data”. Future Generation Computer Systems 81.4 (Apr. 2018), pp. 30–40. doi: 10.1016/j.future.2017.11.007.

V. Syrris, C. Corbane, M. Pesaresi, and P. Soille. “Mosaicking Copernicus Sentinel-1 data at global scale”. IEEE Transactions on Big Data (2018). doi: 10.1109/TBDATA.2018.2846265.

P. Kempeneers and P. Soille. “Optimizing Sentinel-2 image selection in a Big Data context”. Big Earth Data 1.1–2 (2017), pp. 145–158. doi: 10.1080/20964471.2017.1407489.



News

Connected event |
BDAP User-Group Meeting [27/10/2022, 09:30-12:00]

The purpose of the JRC Big Data Analytics Platform (BDAP) User-Group Meetings is to bring together current and prospective BDAP users and team members, present the latest achievements and ongoing developments, and discuss future developments to better link data, data scientists, thematic and policy experts across different domains for generating policy-relevant insights and foresight.

Training |
First steps on the JRC Big Data Analytics Platform [10/11/2022, 9:30-12:30]

After an introduction to the JRC Big Data Analytics Platform (BDAP) and its multi-petabyte scale geospatial data holdings co-located with batch processing and remote desktop capabilities, this training will provide a general overview of the different services provided. This training focuses on the data science remote desktop service (JEO-desk). In-depth, hands-on training for each of the other services will be provided in subsequent sessions. This training session is for anyone interested in data and data analytics, including visualisation, whether for small or big datasets and in any application domain. No prior knowledge of data science or programming is needed.

Training |
Deep Learning on the JRC Big Data Analytics Platform [17/11/2022, 9:30-12:30]

The course is structured in five main parts:
1. General introduction to Deep Learning: theory and base concepts
2. Brief introduction to the JRC Big Data Analytics Platform
3. Live demonstration of Deep Learning training and inference on the JRC Big Data Analytics Platform
4. Brief introduction to MLOps
5. Tips, tricks and common errors when using Deep Learning models

Training |
Voilà dashboarding on the JRC Big Data Analytics Platform [24/11/2022, 9:30-16:30]

After an introduction to Voilà and a brief demonstration of some of the most representative Voilà dashboards recently created by the BDAP team, the course will focus on a detailed explanation of the widget libraries (ipywidgets and ipyvuetify) available in the BDAP JupyterLab environment and on their use in designing dashboard interfaces. The training will also provide examples of Pandas tabular data manipulation and display, chart visualisation using Plotly, creation of custom SVG drawings (with interactivity provided by ipyevents), and display of geospatial vector and raster datasets (with bivariate and trivariate legends). An important part of the training will be dedicated to the procedure for autonomously deploying Voilà dashboards to production using the VaaS service. The remainder of the day will cover the BDAP voilalibrary, including a step-by-step analysis of a complex example dashboard for interacting with EUROSTAT data on energy consumption in Europe, and the voilalibrary functions that ease the creation of responsive web applications.

Presentation |
"Voilà dashboards for policy support"

Presentation given at JupyterCon 2020 on the Voilà dashboards created at the Big Data Analytics Platform for policy support

Contacts

Email us
JRC-JEODPP@ec.europa.eu
Chat with us
Chat with BDAP group or with other users
Follow us
Go to our Connected page
Ask for support
Create an issue in the BDAP GitLab instance
Follow issues
by accessing the BDAP GitLab instance or by email (reply to GitLab email notifications)
Open a use case
Create a new use case by filling in and returning this Excel template

Team

BDAP is made by people who love IT and Data Science:

Project leader:
Pierre Soille https://orcid.org/0000-0002-8479-9205
Statutory staff:
Armin Burger
Pieter Kempeneers
Davide De Marchi
Paul Hasenohr
Marco Scavazzon
Roberto Ugolotti
Chiara Chiarelli
Alba Bernini
Consultants:
Franck Eyraud
Pier Valerio Tognoli
Luca Marletta
Tomas Kliment
Guido Notari

Linked projects