AICryptoPulse is an advanced Retrieval-Augmented Generation (RAG) system designed to curate and analyze daily crypto news. It powers the Telegram bot @agent_cryptopulse_bot, providing insightful updates directly to your chat.
- `/bot`: Telegram Bot User Interface (UI).
- `/data`: Airflow infrastructure for collecting and processing news feeds.
- `/notebooks`: Jupyter notebooks for research and experiments.
- `/service`: Core logic implementing the RAG pipeline and API application.
- Set up the Airflow module (located in `/data`); a sketch of a typical DAG follows this list:
  - Refer to the official Airflow documentation for installation.
  - Run a PostgreSQL database to store feed data.
  - Set up an S3-like bucket to store FAISS indexes.
  - Configure settings in `/data/configs/`.
  - Enable all DAGs in the Airflow interface.
- Configure environment variables:
  - Use `.env.example` as a template to create your `.env` file (see the illustrative snippet after this list).
- Run the Service:
  - Use Docker Compose to deploy the system: `docker-compose up -d`
  - Alternatively, use the Makefile: `make all`
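For the Airflow step, here is a minimal sketch of what a daily feed-collection DAG could look like. The DAG id, schedule, and `fetch_feeds` callable are illustrative assumptions, not the repository's actual code:

```python
# Hypothetical sketch of a daily feed-collection DAG; all names are illustrative.
from datetime import datetime

from airflow import DAG
from airflow.operators.python import PythonOperator


def fetch_feeds() -> None:
    """Pull the latest items from the configured news feeds and
    store them in PostgreSQL (details depend on /data/configs/)."""
    ...


with DAG(
    dag_id="collect_news_feeds",   # assumed name
    schedule="@daily",             # Airflow 2.4+ style schedule argument
    start_date=datetime(2024, 1, 1),
    catchup=False,
) as dag:
    PythonOperator(task_id="fetch_feeds", python_callable=fetch_feeds)
```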
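For the environment-variables step, an illustrative `.env` is shown below. The actual variable names come from `.env.example`, so treat these purely as placeholders:

```
# Placeholder values -- copy .env.example and fill in your own secrets.
TELEGRAM_BOT_TOKEN=<your-telegram-bot-token>
OPENAI_API_KEY=<your-openai-api-key>
POSTGRES_DSN=postgresql://user:password@localhost:5432/feeds
S3_ENDPOINT_URL=http://localhost:9000
```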
- Data is collected from open APIs (news feeds, the Twitter API, and the Telegram API)
- ETLs run on Airflow and store all data in PostgreSQL
- FAISS indexes (both short-term and long-term) are updated each day on Airflow (see the sketch below)
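As a rough sketch of the daily index update, assuming sentence-transformers embeddings and an inner-product index; the function name and index path are hypothetical:

```python
# Hypothetical sketch of the daily FAISS index rebuild; not the repo's actual code.
import faiss
from sentence_transformers import SentenceTransformer

model = SentenceTransformer("all-MiniLM-L6-v2")  # the retriever used in the service


def rebuild_index(texts: list[str], index_path: str) -> None:
    # Normalized embeddings + inner product == cosine-similarity search.
    embeddings = model.encode(texts, normalize_embeddings=True)
    index = faiss.IndexFlatIP(embeddings.shape[1])
    index.add(embeddings)
    faiss.write_index(index, index_path)  # then uploaded to the S3-like bucket
```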
Done:
- Coindesk
- DLNews
- Twitter big crypto accounts
- DeFillamaFeed
In progress:
- Tree Feed
- Custom Twitter accounts
To Do:
- Bloomberg
- Cointelegraph
- Classic Financial news portals
- CryptoQA (HemaChandrao/crypto_QA) - a synthetic QnA dataset with GPT answers (215 rows);
- Filtered crypto-2024 (sites.google.com/view/cryptoqa-2024/datasets) - a dataset of answers to crypto questions from Reddit and Twitter, collected by the Indian Institute of Technology (239 rows); relevant items were filtered using the Qwen2.5-14b model.
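The tables below report mAP and MRR on these sets. For reference, MRR can be computed as in this minimal sketch, where `ranked_relevance` is assumed to hold one binary relevance list per query, ordered best-scored first:

```python
import numpy as np


def mean_reciprocal_rank(ranked_relevance: list[list[int]]) -> float:
    """ranked_relevance[q][i] is 1 if the i-th retrieved document
    for query q is relevant, 0 otherwise (best-scored first)."""
    reciprocal_ranks = []
    for rels in ranked_relevance:
        first_hit = next((i for i, rel in enumerate(rels) if rel), None)
        reciprocal_ranks.append(0.0 if first_hit is None else 1.0 / (first_hit + 1))
    return float(np.mean(reciprocal_ranks))
```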
MTEB models

| model_id | mAP | MRR |
|---|---|---|
| HIT-TMG/KaLM-embedding-multilingual-mini-instr... | 0.818385 | 0.817797 |
| jinaai/jina-embeddings-v3 | 0.807477 | 0.805951 |
| Alibaba-NLP/gte-large-en-v1.5 | 0.779034 | 0.777258 |
| WhereIsAI/UAE-Large-V1 | 0.7547 | 0.752776 |
| jxm/cde-small-v1 | 0.240464 | 0.224688 |
Base models

| model_id | mAP | MRR |
|---|---|---|
| all-mpnet-base-v2 | 0.798267 | 0.796895 |
| multi-qa-mpnet-base-dot-v1 | 0.778915 | 0.777039 |
| all-distilroberta-v1 | 0.733526 | 0.730228 |
| all-MiniLM-L12-v2 | 0.725623 | 0.722476 |
| multi-qa-MiniLM-L6-cos-v1 | 0.716872 | 0.714196 |
| multi-qa-distilbert-cos-v1 | 0.713394 | 0.710679 |
| all-MiniLM-L6-v2 | 0.712731 | 0.709069 |
| paraphrase-multilingual-mpnet-base-v2 | 0.610216 | 0.601813 |
| paraphrase-albert-small-v2 | 0.607011 | 0.601449 |
| paraphrase-multilingual-MiniLM-L12-v2 | 0.594264 | 0.585709 |
| distiluse-base-multilingual-cased-v2 | 0.582778 | 0.575996 |
| distiluse-base-multilingual-cased-v1 | 0.571691 | 0.563731 |
| paraphrase-MiniLM-L3-v2 | 0.551712 | 0.543413 |
MTEB models

| model_id | mAP | MRR |
|---|---|---|
| Alibaba-NLP/gte-large-en-v1.5 | 0.631214 | 0.623856 |
| HIT-TMG/KaLM-embedding-multilingual-mini-instr... | 0.608928 | 0.602966 |
| jinaai/jina-embeddings-v3 | 0.608519 | 0.601709 |
| WhereIsAI/UAE-Large-V1 | 0.554994 | 0.547885 |
| jxm/cde-small-v1 | 0.155341 | 0.136702 |
Base models

| model_id | mAP | MRR |
|---|---|---|
| all-mpnet-base-v2 | 0.575693 | 0.566497 |
| all-distilroberta-v1 | 0.523694 | 0.512484 |
| multi-qa-mpnet-base-dot-v1 | 0.515863 | 0.505068 |
| all-MiniLM-L12-v2 | 0.509307 | 0.499188 |
| all-MiniLM-L6-v2 | 0.469071 | 0.458595 |
| multi-qa-distilbert-cos-v1 | 0.466012 | 0.453911 |
| multi-qa-MiniLM-L6-cos-v1 | 0.434840 | 0.420884 |
| distiluse-base-multilingual-cased-v1 | 0.305668 | 0.291416 |
| paraphrase-multilingual-mpnet-base-v2 | 0.303580 | 0.287447 |
| distiluse-base-multilingual-cased-v2 | 0.300423 | 0.283640 |
| paraphrase-multilingual-MiniLM-L12-v2 | 0.274722 | 0.258638 |
Retriever - all-MiniLM-L6-v2.
Decoder - gpt-3.5-turbo.
The solution also caches model responses to spend the API budget more reasonably, and keeps a chat history for each user.
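A condensed sketch of how these pieces could fit together; the in-memory dict cache, index file name, and prompt are illustrative assumptions, and the real service's caching and history storage may differ:

```python
# Minimal RAG sketch with response caching; names and prompts are illustrative.
import faiss
from openai import OpenAI
from sentence_transformers import SentenceTransformer

retriever = SentenceTransformer("all-MiniLM-L6-v2")
index = faiss.read_index("news.faiss")  # hypothetical index file
documents: list[str] = [...]            # placeholder: texts aligned with index rows
client = OpenAI()
_cache: dict[str, str] = {}


def answer(question: str, top_k: int = 5) -> str:
    if question in _cache:              # avoid paying twice for the same question
        return _cache[question]
    query = retriever.encode([question], normalize_embeddings=True)
    _, ids = index.search(query, top_k)
    context = "\n".join(documents[i] for i in ids[0])
    response = client.chat.completions.create(
        model="gpt-3.5-turbo",
        messages=[
            {"role": "system", "content": f"Answer using this news context:\n{context}"},
            {"role": "user", "content": question},
        ],
    )
    _cache[question] = response.choices[0].message.content
    return _cache[question]
```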
- Separate Codebases: Clearly distinguish research code from production code.
- Lint Before Committing: Run linters using `make lint`.