Simulating a consultancy project for Repsol on gasoline stations across Spain over the 2020-2022 period.
This project simulates an analytics consulting service for Repsol. We had to analyze three things: the price development of the two most common fuel types over 2020-2022, Repsol's top competitors in 2022, and the top 10 and bottom 10 Repsol gas stations in the price ranking. All insights were meant to be used by Repsol's decision-making bodies for strategic decisions. The project covered the whole ETL process: data ingestion was performed with Apache NiFi, data storage relied on HDFS, and data processing and analysis were performed with Apache Spark in batch mode. Both the analytics presentation (covering all the steps followed during the project) and the underlying Apache Spark code are uploaded in the repository.
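To make the station ranking concrete, here is a minimal sketch of that kind of logic in plain Python. It is not the repository's Spark code: the record layout (station_id, brand, price), the "REPSOL" brand label and the rank_stations function are hypothetical, and "top" and "bottom" are assumed to mean the cheapest and most expensive stations by average price.

```python
from collections import defaultdict
from statistics import mean


def rank_stations(records, brand="REPSOL", n=10):
    """Return (cheapest_n, most_expensive_n) stations of `brand` by mean price.

    Hypothetical input: an iterable of (station_id, brand, price) tuples,
    e.g. one price observation per station per day for a single fuel type.
    """
    # Group the observed prices by station, keeping only the chosen brand.
    prices = defaultdict(list)
    for station_id, station_brand, price in records:
        if station_brand == brand:
            prices[station_id].append(price)

    # Average price per station, sorted ascending (ties broken by station id).
    averages = sorted(
        ((station_id, mean(values)) for station_id, values in prices.items()),
        key=lambda item: (item[1], item[0]),
    )

    cheapest = averages[:n]
    most_expensive = averages[::-1][:n]
    return cheapest, most_expensive


if __name__ == "__main__":
    sample = [
        ("A", "REPSOL", 1.25),
        ("A", "REPSOL", 1.75),
        ("B", "REPSOL", 1.40),
        ("C", "CEPSA", 1.30),  # competitor station, ignored by the ranking
    ]
    print(rank_stations(sample, n=1))
```

In Spark the same idea would typically become a groupBy on the station with an average aggregation, followed by ordering and a limit in each direction; the plain-Python version only illustrates the ranking rule.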
The work has been supervised by Professor Jorge Centeno, Head of Big Data & Analytics at Inditex Logistics.