This project simulates and processes real-time clickstream data from an Amazon-like e-commerce website. Built as an end-to-end data engineering pipeline, it captures user interactions, processes them using Apache Spark, stores insights in Cassandra, and visualizes analytics in Tableau. The goal is to demonstrate scalable, real-time data processing.

- configure the `.bashrc` file (Linux)
- start Hadoop, Kafka, Spark, and Cassandra from the terminal
- use Docker to manage dependencies and avoid version-compatibility issues
- write the Kafka producer and consumer code: Kafka receives clickstream data from the backend, and Spark consumes it for preprocessing
- write the data transformation code (e.g. groupBy aggregations) and submit it as a Spark application
- for clickstream data, use an existing dataset or build an Amazon-like website clone with a Flask backend to generate the events
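For the website-clone option, the events a Flask backend would record can be approximated with a small generator. This is a minimal sketch, not the project's code: the field names (`user_id`, `product_id`, `event_type`, `timestamp`), the event types, their weights, and the function name `generate_click_events` are all hypothetical, chosen for illustration. It uses only the Python standard library and is seeded so runs are reproducible.

```python
import random
from datetime import datetime, timedelta, timezone

# Hypothetical event schema; the project's real fields may differ.
EVENT_TYPES = ("page_view", "product_click", "add_to_cart", "purchase")
EVENT_WEIGHTS = (0.6, 0.25, 0.1, 0.05)  # assumed funnel proportions


def generate_click_events(n, seed=42, start=None):
    """Return n synthetic clickstream events as dicts, in timestamp order."""
    rng = random.Random(seed)  # seeded so the same inputs give the same events
    ts = start or datetime(2024, 1, 1, tzinfo=timezone.utc)
    events = []
    for _ in range(n):
        ts += timedelta(seconds=rng.randint(1, 30))  # gap between clicks
        events.append({
            "user_id": f"u{rng.randint(1, 50):03d}",
            "product_id": f"p{rng.randint(1, 200):04d}",
            "event_type": rng.choices(EVENT_TYPES, weights=EVENT_WEIGHTS)[0],
            "timestamp": ts.isoformat(),  # ISO-8601 with +00:00 offset
        })
    return events


for event in generate_click_events(3):
    print(event)
```

In the Flask backend, a dict like this could instead be built inside a route handler each time a user views or clicks a product, then handed to the Kafka producer.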
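The producer/consumer step can be sketched without a running broker by letting a `queue.Queue` stand in for a Kafka topic. This substitution is an assumption made so the example runs anywhere: the producer serializes each event to JSON bytes (a common, but not required, message format for Kafka), and the consumer decodes them back. The helper names `produce`, `consume`, and `run_pipeline` are hypothetical.

```python
import json
import queue
import threading

# A queue.Queue stands in for a Kafka topic so the sketch runs without a broker.
SENTINEL = None  # signals end of stream to the consumer


def produce(topic, events):
    """Serialize each event to JSON bytes and publish it to the topic."""
    for event in events:
        topic.put(json.dumps(event).encode("utf-8"))
    topic.put(SENTINEL)


def consume(topic, out):
    """Decode messages until the sentinel arrives, collecting them in out."""
    while True:
        msg = topic.get()  # blocks until a message is available
        if msg is SENTINEL:
            break
        out.append(json.loads(msg.decode("utf-8")))


def run_pipeline(events):
    """Run one producer and one consumer thread; return what was consumed."""
    topic = queue.Queue()
    received = []
    consumer = threading.Thread(target=consume, args=(topic, received))
    consumer.start()
    produce(topic, events)
    consumer.join()
    return received


print(run_pipeline([{"user_id": "u001", "event_type": "page_view"}]))
```

With a real client library such as kafka-python, `KafkaProducer.send(topic, value)` would take the place of `topic.put`, and on the Spark side Structured Streaming can read the topic directly with `spark.readStream.format("kafka")` instead of a hand-written consumer loop.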
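The groupBy-style transformations can be prototyped in plain Python before they are written as a Spark job. The sketch below counts events per one-minute tumbling window and event type, and computes a simple purchase conversion rate. Both metrics, the event fields, and the helper names (`window_start`, `count_by_window`, `conversion_rate`) are assumptions for illustration, not the project's actual queries. Timestamps are assumed to be ISO-8601 strings with an explicit UTC offset.

```python
from collections import Counter
from datetime import datetime


def window_start(ts, width_s=60):
    """Floor an ISO-8601 timestamp to the epoch second its tumbling window starts."""
    epoch = int(datetime.fromisoformat(ts).timestamp())
    return epoch - epoch % width_s


def count_by_window(events, width_s=60):
    """Count events per (window_start, event_type), like groupBy(...).count()."""
    return Counter(
        (window_start(e["timestamp"], width_s), e["event_type"]) for e in events
    )


def conversion_rate(events):
    """Share of distinct users who purchased, among users with any event."""
    users = {e["user_id"] for e in events}
    buyers = {e["user_id"] for e in events if e["event_type"] == "purchase"}
    return len(buyers) / len(users) if users else 0.0
```

In PySpark, the windowed count corresponds roughly to `df.groupBy(window("timestamp", "1 minute"), "event_type").count()` using `pyspark.sql.functions.window`; aggregates of this shape are what could then be written to Cassandra.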