This repository hosts the Debias project, which shows relationships between different concepts in the news.
We cover different geographical locations (mainly USA and UK), different political positions (taken from AllSides) and various news providers.
The final goal is to create an interactive visualization that shows how concepts are interconnected across different time periods and from different points of view.
Scraper is a service which scrapes news from different news providers. The service recursively calls itself to scrape the next news pages.
If a page requires rendering, it is sent to the renderer service. If a page is static, it is stored in the s3 service, its metadata is stored in the metastore service, and the processor service is called to process the page.
Renderer is a service which renders news pages using a browser API. It is called by the scraper service. After rendering, it saves the HTML content to the s3 service and the metadata to the metastore service, then sends a request to the processor service to process the page.
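The routing between scraper, renderer, and processor described above can be sketched as follows. This is a minimal in-memory illustration, not the project's code: the `Page` and `Pipeline` classes, the `needs_rendering` flag, and the metadata fields are all hypothetical stand-ins for the real services.

```python
from dataclasses import dataclass, field


@dataclass
class Page:
    url: str
    html: str
    needs_rendering: bool  # hypothetical flag; how this is detected is not described here


@dataclass
class Pipeline:
    """In-memory stand-ins for the renderer, s3, metastore and processor services."""

    render_queue: list = field(default_factory=list)
    s3: dict = field(default_factory=dict)
    metastore: dict = field(default_factory=dict)
    process_queue: list = field(default_factory=list)

    def store_and_process(self, url: str, html: str) -> None:
        # Save content, save metadata, then ask the processor to handle the page.
        self.s3[url] = html
        self.metastore[url] = {"size": len(html)}  # assumed metadata shape
        self.process_queue.append(url)

    def handle_scraped(self, page: Page) -> str:
        # Scraper side: dynamic pages go to the renderer, static pages are stored directly.
        if page.needs_rendering:
            self.render_queue.append(page.url)
            return "renderer"
        self.store_and_process(page.url, page.html)
        return "processor"

    def handle_rendered(self, url: str, rendered_html: str) -> None:
        # Renderer side: after rendering, follow the same store-and-process path.
        self.store_and_process(url, rendered_html)
```

Both paths end in the same store-and-process step, which is why it is factored into one helper in this sketch.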
Processor is a service which processes news pages. It extracts human-readable text from the page, runs NLP pipelines on it, and stores the results in the wordstore service.
- Classifier
  - A zero-shot classifier from HuggingFace Transformers. In particular, `MoritzLaurer/DeBERTa-v3-base-mnli-fever-anli`, chosen for its comparatively small size.
- Extractor
  - A keyword extraction algorithm built on spaCy. spaCy is used to extract named entities, which are used as keywords after processing.
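As a rough illustration of the classifier step, the sketch below wires the named model into the Transformers zero-shot pipeline. The candidate topic list and the helper names (`build_classifier`, `top_topic`, `classify`) are assumptions made for this example; the project's real label set and code may differ.

```python
from typing import Any

MODEL_NAME = "MoritzLaurer/DeBERTa-v3-base-mnli-fever-anli"

# Hypothetical label set; the project's real topic list is not given here.
CANDIDATE_TOPICS = ["politics", "economy", "sports", "technology"]


def build_classifier() -> Any:
    """Create a zero-shot classification pipeline (downloads the model on first use)."""
    from transformers import pipeline  # imported lazily so the helpers below work without it

    return pipeline("zero-shot-classification", model=MODEL_NAME)


def top_topic(result: dict) -> tuple[str, float]:
    """Return the best label and its score from a zero-shot pipeline result.

    The pipeline returns "labels" and "scores" sorted by descending score.
    """
    return result["labels"][0], result["scores"][0]


def classify(classifier: Any, text: str) -> tuple[str, float]:
    """Classify one text against the candidate topics and keep the top label."""
    result = classifier(text, candidate_labels=CANDIDATE_TOPICS)
    return top_topic(result)
```

With the model available locally, `classify(build_classifier(), "Parliament votes on the new budget")` would return the highest-scoring topic and its score.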
A web server which serves the results of the processor. It aggregates word statistics, precomputes and caches the aggregations, and serves them to the client. It also serves the frontend files.
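One way to picture the aggregate-and-cache behavior is the sketch below. It is an assumption-laden illustration: the `ROWS` data, the `(leaning, keyword, count)` row shape, and the `keyword_totals` function are hypothetical, and the real server may cache differently.

```python
from collections import Counter
from functools import lru_cache

# Hypothetical rows as they might come from the wordstore: (leaning, keyword, count).
ROWS = (
    ("left", "election", 3),
    ("right", "election", 5),
    ("left", "climate", 4),
    ("left", "election", 2),
)


@lru_cache(maxsize=None)
def keyword_totals(leaning: str) -> tuple:
    """Sum keyword counts for one leaning; repeated calls are served from the cache."""
    totals = Counter()
    for row_leaning, keyword, count in ROWS:
        if row_leaning == leaning:
            totals[keyword] += count
    # Return an immutable result so cached values cannot be mutated by callers.
    return tuple(totals.most_common())
```

Returning a tuple rather than the `Counter` itself keeps the cached value safe from accidental modification between requests.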
A postgres database which stores metadata of the scraped pages.
An S3 provider which stores the static pages. It can be a local MinIO deployment or an external S3 cloud service.
A postgres database which stores the processed pages, keywords, topics, and their corresponding frequencies.
A NATS message queue which is used for service-to-service (S2S) communication.
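A minimal sketch of publishing a task over NATS with the `nats-py` client is shown below. The subject name, the server address, and the JSON message shape are assumptions for illustration only; the project's actual subjects and payloads are not documented here.

```python
import json

SUBJECT = "debias.process"  # hypothetical subject name
NATS_URL = "nats://localhost:4222"  # assumed local NATS address


def encode_task(url: str) -> bytes:
    """Serialize a task as UTF-8 JSON (assumed message format)."""
    return json.dumps({"url": url}).encode("utf-8")


def decode_task(payload: bytes) -> str:
    """Inverse of encode_task: recover the page URL from a message payload."""
    return json.loads(payload.decode("utf-8"))["url"]


async def publish_task(url: str) -> None:
    """Connect, publish one task, and close the connection."""
    import nats  # nats-py client, imported lazily so the helpers above work without it

    nc = await nats.connect(NATS_URL)
    try:
        await nc.publish(SUBJECT, encode_task(url))
        await nc.flush()  # make sure the message reached the server before closing
    finally:
        await nc.close()
```

With a NATS server running, the coroutine can be executed with `asyncio.run(publish_task("https://example.com/a"))`.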
The JavaScript visualization is available at https://debias.dartt0n.ru/
- Create `.env` file. Fill in the following variables:
  PG_USERNAME=...
  PG_PASSWORD=...
- Create configuration files:
  - `debias/scraper/config.toml`
  - `debias/server/config.toml`
  - `debias/processor/config.toml`
  - `debias/renderer/config.toml`

  Note: You can find example configuration in the following files:
- Pre-download ML models:
  mkdir models
  uv run --group processor download-models.py
- Run services:
  docker compose -f docker-compose.yml up --build --detach

- Create `.env` file. Fill in the following variables:
  MINIO_ACCESS_KEY=...
  MINIO_SECRET_KEY=...
  MINIO_BUCKET=...
  PG_USERNAME=...
  PG_PASSWORD=...
- Create configuration files:
  - `debias/scraper/config.toml`
  - `debias/server/config.toml`
  - `debias/processor/config.toml`
  - `debias/renderer/config.toml`

  Note: You can find example configuration in the following files:
- Pre-download ML models:
  mkdir models
  uv run --group processor download-models.py
- Create MinIO S3 service using docker:
  docker compose -f minio.docker-compose.yml up minio_setup

The following services could be automatically scaled horizontally for better performance:
- scraper
- renderer
- processor
For easy scaling, use the `--scale` option of `docker compose`.
E.g., the following command launches 5 `scraper` instances, 2 `renderer` instances, and 2 `processor` instances:
docker compose up --detach \
  --scale scraper=5 \
  --scale renderer=2 \
  --scale processor=2

To stop and remove all containers AND THEIR VOLUMES:
docker compose -f minio.docker-compose.yml down --volumes
# or
docker compose -f docker-compose.yml down --volumes

.
├── debias            # shared code root
│   ├── core          # reusable components - s3, metastore, configs, etc
│   ├── scraper       # scraper related code
│   ├── processor     # NLP processor related code
│   ├── renderer      # browser renderer related code
│   ├── server        # server related code
│   └── frontend      # frontend related code
To add a new service:
- Create a new directory in the `debias` directory
- Create a `dockerfile` prefixed with the service name (e.g. `scraper.dockerfile`)
- Add all the required dependencies to `pyproject.toml` under `--group servicename`
- Add the new package to the `tool.hatch.build.targets.wheel` config in `pyproject.toml`
- Create `.env` file. Fill in the following variables:
  PG_USERNAME=...
  PG_PASSWORD=...
- Launch the database container using docker-compose:
  docker compose up -d database
- Generate random data. Set the environment variable `POSTGRES_CONNECTION` to the connection string of the database (replace `USERNAME` and `PASSWORD` with your actual username and password):
  POSTGRES_CONNECTION="postgresql://USERNAME:PASSWORD@localhost:5432/postgres" uv run generate-data.py
- Create the server configuration file `config.toml`. Replace `${PG_USERNAME}` and `${PG_PASSWORD}` with your actual username and password:
  [pg]
  connection = "postgresql://${PG_USERNAME}:${PG_PASSWORD}@localhost:5432/postgres"
- Launch the backend server with hot reload:
  CONFIG=config.toml uv run litestar --app debias.server:app run --debug --reload

We have collected 38 news sources from the USA and the UK and determined their political positions.
It seems left parties are indeed more liberal.
We have parsed several news articles using Python and prepared a deployment describing general trends in these articles.
The deployment can be found on GitHub Pages.
The visualization is divided into 3 parts:
- Comparison of topic distributions for Left-Leaning and Right-Leaning media.
- Comparison of keyword networks for Left-Leaning and Right-Leaning media.
- Sandbox network with filtering functionality.
All visualizations are created using D3.js.
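A network view like the ones listed above needs node pairs with weights as input. The sketch below shows one common way to derive them, by counting how often two keywords appear in the same article. This is an assumption about the data preparation, not the project's documented method, and `cooccurrence_edges` is a hypothetical helper.

```python
from collections import Counter
from itertools import combinations


def cooccurrence_edges(
    articles: list[list[str]], min_weight: int = 1
) -> list[tuple[str, str, int]]:
    """Build weighted edges from per-article keyword lists.

    Each unordered keyword pair is counted once per article in which both appear.
    Edges are returned sorted by descending weight, then alphabetically.
    """
    counts = Counter()
    for keywords in articles:
        # Deduplicate and sort so that (a, b) and (b, a) map to the same edge.
        for a, b in combinations(sorted(set(keywords)), 2):
            counts[(a, b)] += 1
    return sorted(
        ((a, b, w) for (a, b), w in counts.items() if w >= min_weight),
        key=lambda edge: (-edge[2], edge[0], edge[1]),
    )
```

Raising `min_weight` is a simple way to filter out rare pairs before handing the edge list to a force-directed layout.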
You can view the visualization at https://debias.dartt0n.ru/







