Kishorsenthilkumar/-Real-Time-Clickstream-Data-Pipeline-using-Kafka-Spark-Cassandra-

Real-Time-Clickstream-Data-Pipeline-using-Kafka-Spark-Cassandra-

This project simulates and processes real-time clickstream data from an Amazon-like e-commerce website. Built as an end-to-end data engineering pipeline, it streams user interactions through Apache Kafka, processes them using Apache Spark, stores insights in Cassandra, and visualizes analytics in Tableau. The goal is to demonstrate scalable, real-time data processing.

![Streaming Amazon Click Events with Kafka, Spark, and Cassandra - visual selection](https://github.com/user-attachments/assets/18569f8e-4465-4578-93b8-be5b8ff691ab)
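The flow described above can be illustrated end to end in a few lines of standard-library Python. This is a hedged sketch, not the project's code: the event fields (user_id, event_type, product_id, ts) are hypothetical, a queue.Queue stands in for a Kafka topic, a Counter-based loop stands in for the Spark job, and a plain dict stands in for the Cassandra table that Tableau would read.

```python
import json
import queue
from collections import Counter

# Hypothetical sample events; the project does not document its real schema.
EVENTS = [
    {"user_id": "u1", "event_type": "view", "product_id": "p1", "ts": 1700000000},
    {"user_id": "u1", "event_type": "add_to_cart", "product_id": "p1", "ts": 1700000005},
    {"user_id": "u2", "event_type": "view", "product_id": "p2", "ts": 1700000007},
    {"user_id": "u2", "event_type": "view", "product_id": "p1", "ts": 1700000009},
]

def produce(topic: queue.Queue, events) -> None:
    """Capture stage: serialize each event to JSON bytes and enqueue it,
    playing the role of a Kafka producer writing to a topic."""
    for event in events:
        topic.put(json.dumps(event).encode("utf-8"))

def process(topic: queue.Queue) -> Counter:
    """Processing stage: drain the queue and count events per
    (product_id, event_type), the kind of groupBy-count a Spark job performs."""
    counts = Counter()
    while not topic.empty():
        event = json.loads(topic.get().decode("utf-8"))
        counts[(event["product_id"], event["event_type"])] += 1
    return counts

def store(table: dict, counts: Counter) -> None:
    """Storage stage: write one row per (product_id, event_type) key."""
    for key, n in counts.items():
        # Cassandra INSERTs are upserts: rewriting an existing primary key
        # replaces the stored value instead of adding to it.
        table[key] = n

topic = queue.Queue()
table = {}  # stand-in for a Cassandra table queried by Tableau
produce(topic, EVENTS)
store(table, process(topic))
print(sorted(table.items()))
# → [(('p1', 'add_to_cart'), 1), (('p1', 'view'), 2), (('p2', 'view'), 1)]
```

In the real pipeline each stage runs as a separate process, so the topic decouples the website backend from the Spark job and either side can be restarted or scaled independently.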

Steps to Implement:

  1. Install Hadoop, Spark, Kafka, and Cassandra on your local machine.
  2. Add their installation paths to the environment variables (Windows) or to your ~/.bashrc file (Linux).
  3. Start Hadoop, Kafka, Spark, and Cassandra from the terminal.
  4. Use Docker to manage dependencies and avoid version-compatibility issues.
  5. Write the Kafka producer and consumer code: Kafka receives clickstream data from the backend, and Spark consumes it for preprocessing.
  6. Write the data-transformation code (e.g., groupBy aggregations) and submit it as a Spark application, for example with spark-submit.
  7. To obtain clickstream data, either use an existing dataset or build an Amazon-like clone website with a Flask backend that emits click events.
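For step 7, a third option besides a public dataset or a Flask clone is to generate synthetic sessions directly. The sketch below assumes a hypothetical product catalogue and arbitrary funnel probabilities (40% of sessions add to cart, half of those purchase); simulate_session and to_kafka_record are illustrative names, not part of the project. Keying each record by user_id means Kafka's default partitioning sends one user's events to the same partition (as long as the partition count does not change), so their order is preserved for the Spark consumer.

```python
import json
import random
from typing import Iterator

PRODUCTS = ["p1", "p2", "p3", "p4"]  # hypothetical catalogue

def simulate_session(rng: random.Random, user_id: str, start_ts: int) -> Iterator[dict]:
    """Yield one user's session: 1-4 product views, then with some probability
    an add_to_cart and a purchase of the last viewed product."""
    ts = start_ts
    product = None
    for _ in range(rng.randint(1, 4)):
        product = rng.choice(PRODUCTS)
        ts += rng.randint(1, 30)  # seconds between clicks
        yield {"user_id": user_id, "event_type": "view", "product_id": product, "ts": ts}
    if rng.random() < 0.4:
        ts += rng.randint(1, 30)
        yield {"user_id": user_id, "event_type": "add_to_cart", "product_id": product, "ts": ts}
        if rng.random() < 0.5:
            ts += rng.randint(1, 60)
            yield {"user_id": user_id, "event_type": "purchase", "product_id": product, "ts": ts}

def to_kafka_record(event: dict) -> tuple[bytes, bytes]:
    """Encode an event as a (key, value) pair of bytes, keyed by user_id."""
    return event["user_id"].encode("utf-8"), json.dumps(event).encode("utf-8")

rng = random.Random(42)  # fixed seed so runs are reproducible
records = [
    to_kafka_record(e)
    for i in range(3)
    for e in simulate_session(rng, f"u{i}", start_ts=1_700_000_000)
]
for key, value in records:
    print(key, value)
```

With the kafka-python client, each (key, value) pair could then be passed to KafkaProducer.send(topic, value=value, key=key); the topic name is up to you.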
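Step 6's transformations usually include time-windowed aggregations. The following stdlib-only sketch shows the logic of a 1-minute tumbling-window count per event type, plus a hypothetical conversion-rate metric of the kind that could be stored in Cassandra for Tableau. In PySpark the windowed count is typically expressed with pyspark.sql.functions.window, for example df.groupBy(window(col("event_time"), "1 minute"), "event_type").count(), where event_time is an assumed timestamp column.

```python
from collections import defaultdict

WINDOW_SECONDS = 60  # assumed 1-minute tumbling windows

def window_start(ts: int, size: int = WINDOW_SECONDS) -> int:
    """Align an epoch-seconds timestamp to the start of its tumbling window."""
    return ts - ts % size

def count_by_window(events, size: int = WINDOW_SECONDS) -> dict:
    """Count events per (window_start, event_type) over a batch of events."""
    counts = defaultdict(int)
    for e in events:
        counts[(window_start(e["ts"], size), e["event_type"])] += 1
    return dict(counts)

def conversion_rate(events) -> float:
    """Hypothetical KPI: fraction of viewing users who also made a purchase."""
    viewers = {e["user_id"] for e in events if e["event_type"] == "view"}
    buyers = {e["user_id"] for e in events if e["event_type"] == "purchase"}
    return len(viewers & buyers) / len(viewers) if viewers else 0.0

events = [
    {"user_id": "u1", "event_type": "view", "ts": 1_700_000_040},
    {"user_id": "u1", "event_type": "purchase", "ts": 1_700_000_070},
    {"user_id": "u2", "event_type": "view", "ts": 1_700_000_099},
    {"user_id": "u2", "event_type": "view", "ts": 1_700_000_100},
]
print(count_by_window(events))
# → {(1700000040, 'view'): 2, (1700000040, 'purchase'): 1, (1700000100, 'view'): 1}
print(conversion_rate(events))  # → 0.5
```

A real Structured Streaming job would also call withWatermark on the event-time column to bound how long Spark waits for late events; this batch sketch ignores late data.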
