Real-Time Stock Price Analysis Pipeline
This project demonstrates a scalable data engineering pipeline that collects, processes, and visualizes real-time stock market data. The pipeline pulls data from APIs, processes it in real time using Apache Kafka and Python, and presents insights through visual dashboards and Python-generated graphs. It is suited to applications such as live market analysis, trading strategies, and financial data exploration.
- Real-Time Data Ingestion: Collects live stock market data using APIs.
- Streaming Processing: Utilizes Apache Kafka for data streaming and processing.
- Data Transformation: Transforms raw data into structured, analyzable formats.
- Data Storage: Stores processed data in CSV files for easy access and further analysis.
- Visualizations: Provides dashboards using Tableau/Power BI and Python-generated graphs.
- Automated Pipelines: Includes Airflow DAGs for scheduling and managing ETL processes.
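The streaming step listed above can be sketched as follows. This is a minimal illustration, not the contents of scripts/kafka_producer.py: it assumes the kafka-python client is installed, a broker is reachable at localhost:9092, and the topic name stock-ticks and the tick fields (symbol, price, timestamp) are hypothetical.

```python
import json


def serialize_tick(tick: dict) -> bytes:
    """Encode one tick as UTF-8 JSON bytes, the form sent to Kafka."""
    return json.dumps(tick, sort_keys=True).encode("utf-8")


def publish_ticks(ticks, topic="stock-ticks", bootstrap_servers="localhost:9092"):
    """Send each tick to a Kafka topic (topic and broker address are assumptions)."""
    # Imported lazily so the serializer can be used without a Kafka install.
    from kafka import KafkaProducer  # provided by the kafka-python package

    producer = KafkaProducer(
        bootstrap_servers=bootstrap_servers,
        value_serializer=serialize_tick,
    )
    for tick in ticks:
        producer.send(topic, value=tick)
    producer.flush()  # block until buffered messages are delivered
    producer.close()
```

Keeping serialization in its own function makes the message format easy to check without a running broker; a consumer would reverse it with json.loads on each message value.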
real-time-stock-analysis-pipeline/
│
├── README.md # Project overview (You're reading this!)
├── requirements.txt # Python dependencies
├── data/ # Sample and processed data
│ ├── sample_data.csv
│ └── processed_data.csv
├── src/ # Core Python scripts
│ ├── fetch_data.py # Fetches real-time data from APIs
│ ├── process_data.py # Processes and transforms raw data
│ ├── load_data.py # Loads data to storage
│ ├── visualize_data.py # Generates visualizations
│ └── airflow_dag.py # Automates ETL process using Airflow
├── dashboards/ # Tableau/Power BI dashboards
│ ├── tableau_dashboard.twb
│ └── power_bi_dashboard.pbix
├── scripts/ # Kafka producer/consumer scripts
│ ├── kafka_producer.py
│ └── kafka_consumer.py
├── config/ # Configuration files
│ ├── api_keys.json
│ └── db_config.yaml
└── docs/ # Documentation and presentations
    ├── architecture_diagram.png
    ├── dataset_description.md
    └── presentation.pdf
- Clone the repository:
git clone https://github.com/evans25575/real-time-stock-analysis-pipeline.git
cd real-time-stock-analysis-pipeline
- Install dependencies:
pip install -r requirements.txt
- Fetch Data:
python src/fetch_data.py
- Process Data:
python src/process_data.py
- Visualize Data:
python src/visualize_data.py
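As an illustration of what the processing step might do, the sketch below reads raw rows from a CSV file, adds a simple moving average of the closing price, and writes the result to a new CSV. The column names (timestamp, symbol, close), the sma output column, and the window size are assumptions for illustration; the actual logic in src/process_data.py may differ. It also assumes the file holds a single symbol in time order.

```python
import csv
from collections import deque


def simple_moving_average(values, window):
    """Return the trailing mean for each value; early entries average the values seen so far."""
    recent = deque(maxlen=window)
    averages = []
    for value in values:
        recent.append(value)
        averages.append(sum(recent) / len(recent))
    return averages


def process_file(src_path, dst_path, window=3):
    """Read raw rows, append an 'sma' column, and write the processed CSV."""
    with open(src_path, newline="") as src:
        rows = list(csv.DictReader(src))

    closes = [float(row["close"]) for row in rows]  # 'close' is an assumed column
    for row, sma in zip(rows, simple_moving_average(closes, window)):
        row["sma"] = f"{sma:.2f}"

    # Fall back to the assumed header when the input has no data rows.
    fieldnames = list(rows[0].keys()) if rows else ["timestamp", "symbol", "close", "sma"]
    with open(dst_path, "w", newline="") as dst:
        writer = csv.DictWriter(dst, fieldnames=fieldnames)
        writer.writeheader()
        writer.writerows(rows)
    return len(rows)
```

With the repository layout described above, a call might look like process_file("data/sample_data.csv", "data/processed_data.csv"), provided the sample file has the assumed columns.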
Visual dashboards are created using:
- Tableau: dashboards/tableau_dashboard.twb
- Power BI: dashboards/power_bi_dashboard.pbix
Python-generated graphs are saved in the data/ folder, alongside processed_data.csv.
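A Python-generated graph of this kind could be produced roughly as follows. The sketch assumes matplotlib is installed and that the processed CSV has timestamp and close columns; both column names and any output file name are hypothetical, and src/visualize_data.py may work differently.

```python
import csv


def load_series(csv_path, x_col="timestamp", y_col="close"):
    """Return (x_values, y_values) read from a CSV; column names are assumptions."""
    with open(csv_path, newline="") as handle:
        rows = list(csv.DictReader(handle))
    return [row[x_col] for row in rows], [float(row[y_col]) for row in rows]


def save_price_chart(csv_path, image_path):
    """Plot closing price over time and save the figure as an image file."""
    # Imported lazily; the 'Agg' backend renders to files without a display.
    import matplotlib
    matplotlib.use("Agg")
    import matplotlib.pyplot as plt

    timestamps, closes = load_series(csv_path)
    fig, ax = plt.subplots(figsize=(8, 4))
    ax.plot(timestamps, closes, marker="o")
    ax.set_xlabel("Timestamp")
    ax.set_ylabel("Close")
    ax.set_title("Closing price")
    fig.autofmt_xdate()  # tilt x labels so timestamps do not overlap
    fig.savefig(image_path, dpi=150)
    plt.close(fig)
```

For example, save_price_chart("data/processed_data.csv", "data/price_chart.png") would write a PNG next to the processed data, where price_chart.png is an illustrative name only.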
Detailed documentation is available in the docs/ folder, including:
- architecture_diagram.png: Visual representation of the pipeline.
- dataset_description.md: Description of the datasets used.
- presentation.pdf: Project presentation for stakeholders.
This project is licensed under the MIT License. See the LICENSE file for more details.
Contributions are welcome! Feel free to submit issues, fork the repository, and open pull requests.
For questions or suggestions, feel free to reach out at: kiplaevans2018@gmail.com