DP SEO Engine

Deployment status: deployment support has ended.

RAG-based SEO optimizer that uses internal style guides and SEO rules to generate URL slugs, tags, and other SEO suggestions for articles written by The Daily Pennsylvanian, Inc.

This project uses Retrieval-Augmented Generation (RAG) with LLMs to provide SEO guidance to the editorial team by combining internal style guides with advanced SEO techniques. The system takes in PDFs, CSV data, and external web documents, processes them, and offers suggestions for optimizing SEO performance.
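The retrieve-then-generate flow described above can be illustrated with a minimal, self-contained sketch. It is not the project's implementation: the real system uses an LLM with embeddings stored in Chroma, whereas this toy version ranks documents by bag-of-words cosine similarity. The function names (`tokenize`, `retrieve`, `build_prompt`) and the sample style-guide sentences are hypothetical.

```python
import math
import re
from collections import Counter


def tokenize(text: str) -> list[str]:
    """Lowercase the text and split it into alphanumeric word tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())


def cosine_similarity(a: Counter, b: Counter) -> float:
    """Cosine similarity between two bag-of-words count vectors."""
    dot = sum(a[term] * b[term] for term in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def retrieve(query: str, documents: list[str], k: int = 2) -> list[str]:
    """Return the k documents most similar to the query (toy retriever)."""
    query_vec = Counter(tokenize(query))
    ranked = sorted(
        documents,
        key=lambda doc: cosine_similarity(query_vec, Counter(tokenize(doc))),
        reverse=True,
    )
    return ranked[:k]


def build_prompt(query: str, context: list[str]) -> str:
    """Combine retrieved context and the user question into one prompt string."""
    context_block = "\n".join(f"- {passage}" for passage in context)
    return f"Use the following guidelines:\n{context_block}\n\nQuestion: {query}"


# Hypothetical style-guide snippets standing in for the loaded PDFs, CSVs, and web pages.
guidelines = [
    "URL slugs should be short, lowercase, and hyphen-separated.",
    "Use the Oxford comma in all headlines.",
    "Tags should match existing section names.",
]
question = "How should I write URL slugs?"
print(build_prompt(question, retrieve(question, guidelines, k=1)))
```

In the actual pipeline, the prompt assembled in the last step would be sent to the LLM; the sketch stops at prompt construction so that it runs without API keys.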

Project Structure

your_project/
├── README.md
├── .gitignore                # Excludes unnecessary files from version control
├── pyproject.toml            # Poetry project configuration
├── files/                    # Data for RAG
├── src/                      # All source code is located here
│   ├── __init__.py           # Marks the directory as a Python package
│   ├── app.py                # Main application ENTRY POINT (run this)
│   ├── chain.py              # Defines the LLM chain and retrieval logic
│   ├── data_loader.py        # Functions to load PDFs, CSVs, and web data
│   ├── prompt.py             # Defines the prompt template for the LLM
│   ├── text_splitter.py      # Splits long documents into smaller chunks
│   ├── ui.py                 # Contains the Gradio UI logic for interaction
│   └── vector_store.py       # Handles vector storage and retrieval using Chroma
└── .env.sample               # Template for environment variables (API keys, etc.)

File Descriptions

README.md: Documentation for the project setup, structure, and usage guidelines.
.gitignore: Excludes unnecessary files like virtual environments, .env, and cache files from version control.
pyproject.toml: Defines dependencies and project configurations using Poetry.
src/: Contains the source code for the project.
- app.py: Main entry point for the application. Sets up data loading, vector stores, and launches the Gradio UI.
- chain.py: Sets up the language model chain that interacts with the LLM to generate responses. Integrates vector retrieval.
- data_loader.py: Contains functions to load documents from CSVs, web URLs, and PDFs.
- prompt.py: Defines the template for the LLM prompt, ensuring the correct format for SEO-optimized output.
- text_splitter.py: Splits documents into smaller chunks to be processed by the LLM.
- ui.py: Contains the Gradio UI setup, which provides an interface for users to interact with the system.
- vector_store.py: Manages the creation of vector databases using Chroma to store and retrieve document embeddings.
files/: Contains the source files (PDFs, CSVs, etc.) that are used for RAG.
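To make the role of text_splitter.py more concrete, here is a minimal sketch of splitting a document into overlapping chunks. The project's own splitter is not shown in this README and may work differently (for example, by splitting on separators rather than fixed character counts); the function name `split_into_chunks` and its default sizes are assumptions chosen for illustration.

```python
def split_into_chunks(text: str, chunk_size: int = 200, overlap: int = 50) -> list[str]:
    """Split text into fixed-size character chunks that overlap by `overlap` characters.

    Overlap keeps sentences that straddle a chunk boundary retrievable from
    at least one chunk.
    """
    if chunk_size <= 0:
        raise ValueError("chunk_size must be positive")
    if not 0 <= overlap < chunk_size:
        raise ValueError("overlap must be non-negative and smaller than chunk_size")

    step = chunk_size - overlap
    chunks: list[str] = []
    for start in range(0, len(text), step):
        chunks.append(text[start:start + chunk_size])
        # Stop once a chunk reaches the end of the text to avoid tiny trailing chunks.
        if start + chunk_size >= len(text):
            break
    return chunks


print(split_into_chunks("abcdefghij", chunk_size=4, overlap=2))
```

Each chunk would then be embedded and stored in the vector database, so smaller chunks give more precise retrieval at the cost of less context per chunk.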

Setup Guide

Follow the steps below to clone, set up, and run the project. The setup uses Poetry for dependency management and a virtual environment for isolation. Poetry is a widely used tool for Python dependency management: it resolves and installs dependencies and manages the project's virtual environment.

Step 1: Poetry Installation

Install Poetry:

If you don't already have Poetry installed, follow the official installation guide for your system:
- Go to: https://python-poetry.org/docs/
- Make sure to add Poetry to your PATH variable (do not skip this step during installation).
Verify the installation: Open a shell or terminal and run
```
poetry --version
```
If you see the version number, Poetry is successfully installed.

Step 2: Clone the Repository

To get started, open an empty folder in VS Code and clone the project repository:

git clone https://github.com/hussainzs/dp-seo-engine.git
cd dp-seo-engine

Step 3: Set Up the Virtual Environment

Poetry automatically manages virtual environments for each project. To create and activate the environment:

Optional: Configure Poetry to create the virtual environment in the project directory:

Before installing dependencies and activating the environment, run the following command to ensure that the virtual environment is created inside your project folder (i.e., in .venv/):
```
poetry config virtualenvs.in-project true
```
This command tells Poetry to always place the virtual environment inside a .venv/ folder within the project directory. This is optional, but it makes the interpreter path easier to find and the virtual environment easier to manage.
Install dependencies and activate the virtual environment:

After setting the configuration, you can continue with the following commands:
```
poetry install
poetry shell
```
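poetry install installs all the dependencies listed in the pyproject.toml file and creates a virtual environment (in the .venv folder if you set the in-project option above).

poetry shell activates the virtual environment. Note: in Poetry 2.0 and later, the shell command is no longer bundled; either install the poetry-plugin-shell plugin or use poetry env activate, which prints the activation command for your shell.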
Set the correct Python Interpreter:

To ensure VS Code uses the correct Python interpreter, follow these steps:
- Open the Command Palette (Ctrl+Shift+P, or Cmd+Shift+P on macOS).
- Type and select Python: Select Interpreter.
- Paste the path to the virtual environment's Python interpreter (alternatively, click Find and browse inside your project directory to .venv\Scripts\python on Windows or .venv/bin/python on macOS/Linux).
Find Path: If you are unsure of the path, you can print it by running the following command in the terminal:
```
poetry env info --executable
```
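As an optional sanity check, a short Python snippet can confirm that the interpreter you selected really belongs to a virtual environment. This is a sketch rather than part of the project: the helper names are hypothetical, and it relies on the standard behavior that `sys.prefix` differs from `sys.base_prefix` inside an environment created by venv or a recent virtualenv.

```python
import sys


def running_in_virtualenv() -> bool:
    """Return True when the current interpreter runs inside a virtual environment."""
    # Inside a venv, sys.prefix points at the environment while
    # sys.base_prefix points at the base Python installation.
    return sys.prefix != sys.base_prefix


def describe_interpreter() -> str:
    """Summarize which interpreter is running and whether it is isolated."""
    if running_in_virtualenv():
        status = "inside a virtual environment"
    else:
        status = "NOT inside a virtual environment"
    return f"{sys.executable} ({status})"


print(describe_interpreter())
```

Run it with the interpreter you selected in VS Code; the printed path should match the output of the command above.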

Step 4: Set Up Environment Variables

You will need to configure environment variables (e.g., API keys). These belong in a .env file in the project root. Copy the .env.sample file in the project root, name the copy .env, and fill in the values.

Note: Don't put any of the values in quotes. The .env file should look like this:

ANTHROPIC_API_KEY=dummy-key123456
LLM_MODEL_NAME=dummy-model123456
NOMIC_LOGIN_KEY=dummy-key123456
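Before launching the app, it can help to check that the .env file is complete and follows the no-quotes rule. The sketch below uses only the standard library and is not how the project itself loads its settings (that mechanism is not described in this README); `parse_env`, `missing_keys`, and `quoted_keys` are hypothetical helper names, and the required key list simply mirrors the sample above.

```python
REQUIRED_KEYS = ("ANTHROPIC_API_KEY", "LLM_MODEL_NAME", "NOMIC_LOGIN_KEY")


def parse_env(text: str) -> dict[str, str]:
    """Parse simple KEY=value lines, skipping blanks, comments, and malformed lines."""
    values: dict[str, str] = {}
    for raw_line in text.splitlines():
        line = raw_line.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        values[key.strip()] = value.strip()
    return values


def missing_keys(values: dict[str, str], required: tuple[str, ...] = REQUIRED_KEYS) -> list[str]:
    """Return the required keys that are absent or empty."""
    return [key for key in required if not values.get(key)]


def quoted_keys(values: dict[str, str]) -> list[str]:
    """Return keys whose values are wrapped in quotes, which this README advises against."""
    return [
        key for key, value in values.items()
        if len(value) >= 2 and value[0] == value[-1] and value[0] in "\"'"
    ]


sample = 'ANTHROPIC_API_KEY="dummy-key123456"\nLLM_MODEL_NAME=dummy-model123456\n'
parsed = parse_env(sample)
print("missing:", missing_keys(parsed))
print("quoted:", quoted_keys(parsed))
```

To check a real file, read it with `pathlib.Path(".env").read_text()` and pass the result to `parse_env`.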

Step 5: Running the Application

Once everything is set up, you can run the application as follows:

poetry run python src/app.py

This will launch the Gradio UI, allowing you to interact with the SEO optimizer.

Contribution Guidelines

1. Add .env to your .gitignore file to avoid sharing your API keys and other sensitive information.

2. Please use type hints and docstrings for your functions.

To contribute any changes, please follow these steps:

Create a new branch for your feature:

git checkout -b feature/your-feature-name

Make your changes, ensuring they adhere to best practices.
Commit your changes with a meaningful commit message:
```
git commit -m "description of your changes"
```

Push to your branch:

git push origin feature/your-feature-name

Optional: Best Practice to resolve conflicts:
- Switch to main branch and pull the latest changes (so you have the latest main)
- Switch back to your branch and merge main into your branch (allows you to resolve conflicts locally in your own branch, ensuring that the main branch stays clean and conflict-free)
- Resolve any conflicts that happen during this merge
- Push any changes you made to your branch.
- Then proceed with a pull request as shown below.
Open a pull request to publish your changes into the main branch.
1. Push your branch, if you haven't already:
```
git push origin feature/your-feature-name
```
2. Go to GitHub.com and open our repository.
3. Create the pull request:
  - Once you’ve pushed your branch, GitHub usually displays a prompt to open a pull request. Click the "Compare & pull request" button.
  - If you don’t see the prompt, go to the "Pull requests" tab in your repository, then click the "New pull request" button.
  - Ensure main is selected as the base branch, and your feature branch as the compare branch.
4. Add a title and description.
5. Submit the pull request by clicking "Create pull request."

Additional Resources

Poetry Documentation: https://python-poetry.org/docs/
Gradio Documentation: https://gradio.app/docs/
LangChain Documentation: https://langchain.com/docs/
