Implement the data preprocessing with PySpark, in particular the resampling functions (resample_data).