This is a full-featured summarization pipeline that condenses TED Talk transcripts, or any text of your choice, into concise summaries using state-of-the-art NLP models. The app is designed with a modular architecture and supports multiple interfaces (UI, API, and CLI) for maximum flexibility and usability. It leverages Hugging Face Transformers, FastAPI, MongoDB Atlas, Docker, and more for seamless deployment and scalability.
- Complete NLP Pipeline: From data ingestion to model fine-tuning and inference, the pipeline handles:
  - Data ingestion, standardization, validation, and transformation.
  - Remote data management using MongoDB Atlas.
  - Modular design for reusability and scalability.
- Model Integration:
  - Hugging Face Transformers for model fine-tuning, validation, and inference.
  - Uploading the fine-tuned model to the Hugging Face Model Hub, from which it can easily be deployed via Spaces.
  - TED Talks dataset for training and evaluation.
- Logging and Outputs:
  - Detailed logging at every stage for transparency.
  - All outputs (e.g., standardized data, models, and artifacts) saved in the `artifacts` folder.
- Interfaces:
  - UI: Built with Gradio or Mesop.
  - API: Developed using FastAPI for remote access.
  - CLI: Trigger complete pipelines with a single command.
- Cloud and Local Deployment:
  - Hugging Face Model Hub for hosting and inference.
  - Fully functional, reproducible, preconfigured local environment.
  - Dockerized setup for reproducibility and platform independence.
- Configuration Management:
  - `.env` file for environment variables.
  - `config.yml` for tailored configuration.
  - `params.yaml` for model and pipeline configuration.
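As a minimal sketch of how these three sources might be loaded together, using python-dotenv and PyYAML (both part of the stack listed below). The variable and key names here are illustrative assumptions, not the project's actual schema:

```python
# Hedged sketch: loading .env, config.yml, and params.yaml together.
# MONGODB_URI is an assumed variable name; check .env.example for the real ones.
import os
import yaml
from dotenv import load_dotenv

load_dotenv()                              # reads .env into the process environment
mongo_uri = os.environ.get("MONGODB_URI")  # assumption: the MongoDB Atlas URI lives here

with open("config.yml") as f:
    config = yaml.safe_load(f)             # tailored pipeline configuration

with open("params.yaml") as f:
    params = yaml.safe_load(f)             # model and fine-tuning parameters
```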
TEXT SUMMARIZATION/
├── __pycache__/
├── .github/ # GitHub workflows and configurations.
├── artifacts/ # Stores outputs like standardized data and trained models.
├── assets/ # Images and documentation-related assets.
├── config/ # Configuration files for pipeline stages and model parameters.
├── logs/ # Log files for monitoring and debugging.
├── poc/ # Proof-of-concept notebooks for experimentation.
├── src/                          # Main application source code.
│   └── TextSummarizer/
│       ├── components/           # Individual modules for each stage in the pipeline.
│       ├── config/               # Configuration manager and related utilities.
│       ├── constants/            # Constant values used across the app.
│       ├── entity/               # Data classes for structured objects.
│       ├── logging/              # Custom logging setup.
│       ├── pipeline/             # Orchestrates the execution of pipeline stages.
│       ├── routes/               # API routes for FastAPI.
│       ├── utils/                # Utility functions.
│       ├── viewers/              # Code for UI viewers like Gradio or Mesop.
│       └── __init__.py           # Package initializer.
├── textsummarizer-env/ # Virtual environment directory.
├── .dockerignore # Ignore patterns for Docker builds.
├── .env # Environment variables.
├── .env.example # Example environment file.
├── .gitignore # Git ignore patterns.
├── app.py # Application entry point.
├── deployment_requirements.txt # Additional requirements for deployment.
├── Dockerfile # Docker configuration for containerization.
├── LICENSE # License information (MIT).
├── main.py # Main pipeline execution script.
├── params.yaml # YAML configuration for model fine-tuning.
├── ProjectTemplate.py # Project template for consistency.
├── README.md # Project documentation.
├── requirements.txt # Required Python packages.
└── setup.py # Setup script for reproducibility.
- Python 3.11.0: Core programming language.
- Hugging Face Transformers: Model fine-tuning and deployment.
- FastAPI: API development.
- Gradio and Mesop: UI components.
- MongoDB Atlas: Remote data management.
- Docker: Containerized deployment.
- Logging Module: Pipeline logging.
- PyYAML: Configuration management.
- And many others listed in the `requirements.txt` file.
- Clone the Repository:
  git clone https://github.com/Abdallahelraey/Text-Summarization.git
- Set Up a Virtual Environment:
  python3 -m venv venv
  venv\Scripts\activate        # Windows; use `source venv/bin/activate` on macOS/Linux
- Install Dependencies:
  pip install -r requirements.txt
- Configure Environment:
  - Update the `.env` file with your environment variables (e.g., MongoDB URI, API keys).
  - Update `params.yaml` for model configurations and data URIs.
  - Update `config.yml` for tailored configuration.
To trigger the text summarization pipeline, follow these steps:
- Ensure you have all the required dependencies installed.
- Run the following command to execute the pipeline:
python main.py
This will initiate the pipeline, starting with data ingestion and progressing through data validation, standardization, transformation, and finally, model development for summarization.
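For orientation, here is a rough, hypothetical illustration of that flow. The real stage runners live under src/TextSummarizer/pipeline/; the class names below are guesses, not the project's actual API:

```python
# Hedged sketch of the stage-by-stage flow `python main.py` drives.
# Class names are illustrative placeholders.
import logging

logging.basicConfig(level=logging.INFO)
logger = logging.getLogger("TextSummarizer")

class DataIngestionPipeline:
    def main(self) -> None:
        logger.info("Stage: data ingestion")       # placeholder body

class DataValidationPipeline:
    def main(self) -> None:
        logger.info("Stage: data validation")      # placeholder body

class ModelTrainerPipeline:
    def main(self) -> None:
        logger.info("Stage: model development")    # placeholder body

if __name__ == "__main__":
    # Standardization and transformation stages would slot in between
    # validation and model development in the full pipeline.
    for stage in (DataIngestionPipeline(), DataValidationPipeline(), ModelTrainerPipeline()):
        stage.main()
```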
To run the app, choose one of three modes via an argument passed on the command line (a sketch of the dispatch logic follows the options below):
a. If you want to use the API, run:
python app.py api
b. If you want to use the Mesop UI, run:
python app.py mesop
c. If you want to use the Gradio UI, run:
python app.py gradio
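A plausible shape for that dispatch, assuming app.py switches on sys.argv (the launcher function names are hypothetical; see app.py for the real logic):

```python
# Hedged sketch: dispatching on the CLI argument (api | mesop | gradio).
import sys

def launch_api() -> None:
    print("Starting the FastAPI server...")   # e.g., via uvicorn (assumption)

def launch_mesop() -> None:
    print("Starting the Mesop UI...")         # placeholder

def launch_gradio() -> None:
    print("Starting the Gradio UI...")        # placeholder

if __name__ == "__main__":
    mode = sys.argv[1] if len(sys.argv) > 1 else "api"   # default mode is a guess
    {"api": launch_api, "mesop": launch_mesop, "gradio": launch_gradio}[mode]()
```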
- Run the model in the cloud via Hugging Face Spaces.
- Docker Setup (Optional):
docker build -t summarization-app .
docker run -p 8000:8000 summarization-app   # Or map any port of your choice
- Fine-Tune Your Model: Change the model name in `params.yaml` and provide a new dataset URI to train on your own data (see the sketch below).
- Add Your Dataset: Update the `params.yaml` configuration to specify a new data source.
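For orientation, a compact fine-tuning sketch using the Hugging Face Transformers and Datasets APIs. The model name, dataset path, and column names are placeholder assumptions standing in for the values `params.yaml` would supply:

```python
# Hedged sketch: minimal seq2seq fine-tuning with Hugging Face Transformers.
# "t5-small", the CSV path, and the column names are illustrative stand-ins
# for values normally read from params.yaml.
from datasets import load_dataset
from transformers import (
    AutoModelForSeq2SeqLM,
    AutoTokenizer,
    DataCollatorForSeq2Seq,
    Seq2SeqTrainer,
    Seq2SeqTrainingArguments,
)

model_name = "t5-small"                                              # placeholder checkpoint
dataset = load_dataset("csv", data_files="artifacts/ted_talks.csv")  # hypothetical path

tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForSeq2SeqLM.from_pretrained(model_name)

def preprocess(batch):
    # Assumption: the dataset has "transcript" (input) and "summary" (target) columns.
    return tokenizer(batch["transcript"], text_target=batch["summary"],
                     max_length=512, truncation=True)

tokenized = dataset.map(preprocess, batched=True)

trainer = Seq2SeqTrainer(
    model=model,
    args=Seq2SeqTrainingArguments(output_dir="artifacts/model", num_train_epochs=1),
    train_dataset=tokenized["train"],
    data_collator=DataCollatorForSeq2Seq(tokenizer, model=model),
)
trainer.train()
# model.push_to_hub("your-username/your-model")   # optional: publish to the Hub
```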
- Summarizing a TED Talk:
  - Upload a transcript file via the UI or API (see the API call sketch after this list).
  - Receive a concise summary output.
- Re-training the model on a custom dataset:
  - Update the configuration files and trigger the pipeline.
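A hedged example of the API route: calling the FastAPI service once `python app.py api` is running. The endpoint path, port, and payload shape are assumptions; check src/TextSummarizer/routes/ for the actual definitions:

```python
# Hedged sketch: posting a transcript to the running FastAPI service.
# The URL and JSON schema below are assumptions, not the documented API.
import requests

with open("ted_talk_transcript.txt") as f:          # hypothetical transcript file
    transcript = f.read()

resp = requests.post(
    "http://localhost:8000/summarize",              # assumed host, port, and route
    json={"text": transcript},                      # assumed request schema
)
resp.raise_for_status()
print(resp.json())                                  # expected to contain the summary
```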
- Local: Full pipeline execution, including downloading, fine-tuning, and inference.
- Cloud: Model hosted on the Hugging Face Model Hub, accessible via UI, API, or CLI (a minimal loading sketch follows this list).
- Docker: Portable deployment using Docker containers.
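Since the fine-tuned model lives on the Model Hub, it can also be pulled down for inference with a few lines of Transformers code. The repo id below is a placeholder; substitute the project's actual model id:

```python
# Hedged sketch: loading the hosted summarization model from the Hub.
# "your-username/ted-summarizer" is a placeholder repo id.
from transformers import pipeline

summarizer = pipeline("summarization", model="your-username/ted-summarizer")
talk = "Your TED Talk transcript goes here..."
print(summarizer(talk, max_length=128, min_length=30)[0]["summary_text"])
```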
- Full deployment to AWS.
- Implementing CI/CD pipelines.
- Adding model and data versioning for better DevOps integration.
We welcome contributions!
This project is licensed under the MIT License.