This project aims at implementing the basic theoretical concepts behind the MapReduce distributed computing paradigm, and Hadoop in particular, and at building expertise in the practical usage of high-performance computing tools for data engineering, analysis and mining. Classical data mining algorithms are applied to Big Data using Hadoop (Spark).
Link to the dataset: https://www.kaggle.com/datasets/patrickhallila1994/nba-data-from-basketball-reference