This project builds a real-time data pipeline that ingests, processes, and stores data using Apache Kafka, Apache Spark, and MySQL. It simulates streaming data, processes it in real time, and saves the results for analysis. Automated with Apache Airflow, it highlights expertise in data engineering and real-time data processing.

evans25575/Real-time-stock-analysis-pipeline

Real-Time Stock Price Analysis Pipeline

📌 Project Overview

This project demonstrates a scalable data engineering pipeline designed to collect, process, and visualize real-time stock market data. The pipeline integrates APIs, processes data in real-time using Apache Kafka and Python, and presents insights through visual dashboards and Python-generated graphs. It is ideal for applications such as live market analysis, trading strategies, and financial data exploration.


🔥 Features

  • Real-Time Data Ingestion: Collects live stock market data using APIs.
  • Streaming Processing: Utilizes Apache Kafka for data streaming and processing.
  • Data Transformation: Transforms raw data into structured, analyzable formats.
  • Data Storage: Stores processed data in CSV files for easy access and further analysis.
  • Visualizations: Provides dashboards using Tableau/Power BI and Python-generated graphs.
  • Automated Pipelines: Includes Airflow DAGs for scheduling and managing ETL processes.
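
As a rough illustration of the streaming feature listed above, the sketch below serializes stock quotes to JSON and hands them to a Kafka-style producer. It is not taken from scripts/kafka_producer.py: the function names, the `stock-quotes` topic, and the quote fields (`symbol`, `price`) are all assumptions made for the example. The producer is passed in as a parameter so the code can be tried without a running broker.

```python
import json
from typing import Any, Iterable, Protocol


class Producer(Protocol):
    """Anything exposing a Kafka-style send(topic, value=...) method."""

    def send(self, topic: str, value: bytes) -> Any: ...


def encode_quote(quote: dict) -> bytes:
    """Serialize one quote as compact, key-sorted UTF-8 JSON."""
    return json.dumps(quote, sort_keys=True, separators=(",", ":")).encode("utf-8")


def publish_quotes(producer: Producer, topic: str, quotes: Iterable[dict]) -> int:
    """Send each quote to the topic and return how many were sent."""
    sent = 0
    for quote in quotes:
        producer.send(topic, value=encode_quote(quote))
        sent += 1
    return sent


class InMemoryProducer:
    """Stand-in producer that only records messages (hypothetical helper)."""

    def __init__(self) -> None:
        self.messages = []  # list of (topic, payload) tuples

    def send(self, topic: str, value: bytes) -> None:
        self.messages.append((topic, value))


if __name__ == "__main__":
    demo = InMemoryProducer()
    # Assumed topic name and quote shape; adjust to the real data source.
    count = publish_quotes(demo, "stock-quotes", [{"symbol": "AAPL", "price": 189.5}])
    print(count, demo.messages)
```

If the project uses the kafka-python package (an assumption, since requirements.txt is not shown here), a real producer created with `KafkaProducer(bootstrap_servers="localhost:9092")` could be passed in place of `InMemoryProducer`, followed by `producer.flush()` to push out buffered messages.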

📁 Project Structure

real-time-stock-analysis-pipeline/
│
├── README.md               # Project overview (You're reading this!)
├── requirements.txt         # Python dependencies
├── data/                    # Sample and processed data
│   ├── sample_data.csv
│   └── processed_data.csv
├── src/                     # Core Python scripts
│   ├── fetch_data.py         # Fetches real-time data from APIs
│   ├── process_data.py       # Processes and transforms raw data
│   ├── load_data.py          # Loads data to storage
│   ├── visualize_data.py     # Generates visualizations
│   └── airflow_dag.py        # Automates ETL process using Airflow
├── dashboards/              # Tableau/Power BI dashboards
│   ├── tableau_dashboard.twb
│   └── power_bi_dashboard.pbix
├── scripts/                 # Kafka producer/consumer scripts
│   ├── kafka_producer.py
│   └── kafka_consumer.py
├── config/                  # Configuration files
│   ├── api_keys.json
│   └── db_config.yaml
└── docs/                    # Documentation and presentations
    ├── architecture_diagram.png
    ├── dataset_description.md
    └── presentation.pdf

🚀 Installation

  1. Clone the repository:
 git clone https://github.com/evans25575/real-time-stock-analysis-pipeline.git
 cd real-time-stock-analysis-pipeline
  2. Install dependencies:
 pip install -r requirements.txt

📌 Usage

  1. Fetch Data:
 python src/fetch_data.py
  2. Process Data:
 python src/process_data.py
  3. Visualize Data:
 python src/visualize_data.py
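
The README does not show what the processing step does internally. As a minimal sketch of how raw quotes might be turned into rows for data/processed_data.csv, the example below normalizes each record and writes the result with the standard library. The input keys (`symbol`, `price`, `volume`, `ts`), the output columns, and the function names are assumptions for illustration, not the actual behavior of src/process_data.py.

```python
import csv
from datetime import datetime, timezone
from pathlib import Path
from typing import Iterable, List, Optional

# Hypothetical output columns; the real processed_data.csv may differ.
FIELDS = ["symbol", "timestamp", "price", "volume"]


def clean_quote(raw: dict) -> Optional[dict]:
    """Normalize one raw quote, or return None if it is unusable."""
    try:
        symbol = str(raw["symbol"]).strip().upper()
        price = float(raw["price"])
        volume = int(raw.get("volume", 0))
        # Assumed input: "ts" holds epoch seconds; convert to a UTC datetime.
        ts = datetime.fromtimestamp(float(raw["ts"]), tz=timezone.utc)
    except (KeyError, TypeError, ValueError):
        return None
    if not symbol or price <= 0:
        return None
    return {
        "symbol": symbol,
        "timestamp": ts.isoformat(),
        "price": round(price, 4),
        "volume": volume,
    }


def process(raw_quotes: Iterable[dict]) -> List[dict]:
    """Clean every quote and drop the ones that failed validation."""
    rows = [clean_quote(q) for q in raw_quotes]
    return [r for r in rows if r is not None]


def write_csv(rows: List[dict], path: Path) -> None:
    """Write cleaned rows to a CSV file with a header, creating parent folders."""
    path.parent.mkdir(parents=True, exist_ok=True)
    with path.open("w", newline="", encoding="utf-8") as f:
        writer = csv.DictWriter(f, fieldnames=FIELDS)
        writer.writeheader()
        writer.writerows(rows)
```

A call such as `write_csv(process(raw_quotes), Path("data/processed_data.csv"))` would then produce the file the visualization step reads. Silently dropping malformed records keeps the sketch short; a real pipeline would more likely log or count them.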

📊 Visualization

Visual dashboards are created using:

  • Tableau: dashboards/tableau_dashboard.twb
  • Power BI: dashboards/power_bi_dashboard.pbix

Python-generated graphs are saved in the data/ folder, next to processed_data.csv.


📚 Documentation

Detailed documentation is available in the docs/ folder, including:

  • architecture_diagram.png: Visual representation of the pipeline.
  • dataset_description.md: Description of the datasets used.
  • presentation.pdf: Project presentation for stakeholders.

📄 License

This project is licensed under the MIT License. See the LICENSE file for more details.


🤝 Contributing

Contributions are welcome! Feel free to submit issues, fork the repository, and open pull requests.


📧 Contact Information

For questions or suggestions, feel free to reach out at: kiplaevans2018@gmail.com
