Code repository for the paper "Predicting gene expression using millions of yeast promoters reveals cis-regulatory logic"

Bornelov-lab/Camformer

Camformer

Code repository for the paper "Predicting gene expression using millions of yeast promoters reveals cis-regulatory logic" by Tirtharaj Dash and Susanne Bornelöv.

Problem: Let $S = \{A,C,G,T,N\}^{110}$ denote the set of promoter sequences of length $110$. Here, $A$, $C$, $G$, $T$ are the four nucleotides and $N$ represents an unknown nucleotide. The gene expression prediction task is then to learn a mapping $f: S \to \mathbb{R}$ from a sequence to its expression level.
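
As an illustration of the input space defined above, the following minimal sketch turns a length-110 sequence into a numeric matrix that a model could consume. The channel order `ACGT`, the function name `one_hot`, and the choice to encode `N` as a uniform 0.25 in every channel are assumptions made for this example; they are not taken from the Camformer code.

```python
# Hypothetical one-hot encoding; channel order and N handling are assumptions.
ALPHABET = "ACGT"
SEQ_LEN = 110


def one_hot(seq, n_value=0.25):
    """Encode a promoter sequence as a SEQ_LEN x 4 list of floats."""
    seq = seq.upper()
    if len(seq) != SEQ_LEN:
        raise ValueError(f"expected length {SEQ_LEN}, got {len(seq)}")
    rows = []
    for base in seq:
        if base == "N":
            # Unknown nucleotide: spread the weight evenly over all channels.
            rows.append([n_value] * 4)
        elif base in ALPHABET:
            rows.append([1.0 if base == b else 0.0 for b in ALPHABET])
        else:
            raise ValueError(f"unexpected symbol {base!r}")
    return rows


# A toy sequence of length 110 (5 symbols repeated 22 times).
example = one_hot("ACGTN" * 22)
```

Each row of `example` corresponds to one position in the sequence, so a model implementing $f$ would receive a 110 x 4 input under this encoding.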

Graphical abstract

Data: We use data from the DREAM Challenge, consisting of 7 million random promoter sequences, each paired with its measured yellow fluorescent protein (YFP) expression level. We then use the official test set from the challenge to evaluate our trained model(s).

Model: A residual convolutional neural network, optimised using automated hyperparameter tuning.
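
To make the term "residual convolutional" concrete, here is a minimal pure-Python sketch of a single residual block over a sequence of per-position feature vectors: a 1D convolution with zero padding, a ReLU, and a skip connection that adds the input back. The function names, the single-layer layout, and the placement of the ReLU are assumptions for illustration only; the actual Camformer architecture is described in the manuscript and is not reproduced here.

```python
def conv1d_same(x, weights, bias):
    """1D convolution with zero 'same' padding.

    x:       L x C_in list of lists
    weights: C_out x K x C_in (K odd)
    bias:    list of C_out floats
    """
    length, k = len(x), len(weights[0])
    pad = k // 2
    out = []
    for pos in range(length):
        row = []
        for w_out, b in zip(weights, bias):
            total = b
            for offset in range(k):
                src = pos + offset - pad
                if 0 <= src < length:  # positions outside the sequence count as zero
                    total += sum(w * v for w, v in zip(w_out[offset], x[src]))
            row.append(total)
        out.append(row)
    return out


def residual_block(x, weights, bias):
    """Return x + ReLU(conv(x)); needs C_out == C_in so the shapes match."""
    conv = conv1d_same(x, weights, bias)
    return [
        [xi + max(0.0, ci) for xi, ci in zip(x_row, c_row)]
        for x_row, c_row in zip(x, conv)
    ]


# Toy input: 3 positions, 2 channels, with a kernel whose centre tap is the identity.
x = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
identity_kernel = [
    [[0.0, 0.0], [1.0, 0.0], [0.0, 0.0]],
    [[0.0, 0.0], [0.0, 1.0], [0.0, 0.0]],
]
y = residual_block(x, identity_kernel, [0.0, 0.0])
```

The skip connection means the block only has to learn a correction to its input, which is the usual motivation for residual designs in deeper networks.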

Search for a model

The figure above shows the structure of the original (large variant) model, which has 16M parameters. A smaller variant with roughly 90% fewer parameters (1.4M) performs almost as well. Please see the associated manuscript (preprint) for more details.

Assessment: Predictive, comparative

Evaluating a trained model
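
Predictive assessment of a regression model such as this usually compares predicted and measured expression on held-out sequences. The sketch below computes Pearson and Spearman correlation using only the standard library. Treating these two correlations as the scores of interest is an assumption of this example; the official evaluation procedure is defined by the challenge resources listed under References. The function names and the toy values are hypothetical.

```python
import math


def pearson(x, y):
    """Pearson correlation coefficient between two equal-length lists."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / math.sqrt(var_x * var_y)


def ranks(values):
    """1-based ranks; tied values share the mean of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            result[order[k]] = mean_rank
        i = j + 1
    return result


def spearman(x, y):
    """Spearman rank correlation: Pearson correlation of the ranks."""
    return pearson(ranks(x), ranks(y))


# Toy values for illustration only; these are not real measurements.
measured = [1.0, 2.0, 3.0, 4.0]
predicted = [1.1, 1.9, 3.2, 8.0]
r = pearson(measured, predicted)
rho = spearman(measured, predicted)
```

In the toy data the predictions preserve the ordering of the measurements exactly, so the rank correlation is perfect while the Pearson correlation is pulled below 1 by the last, overshooting prediction.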

Assessment: Explanatory, Scientific discovery

Evaluating a trained model for explanatory assessment

File information

The purpose of each file is as follows:

| File | Purpose |
| --- | --- |
| `gen_figs.ipynb` | A notebook to show (re-generate) some figures in the manuscript. |
| `train_rep.py` | Program to train several replicates of a Camformer model using training data. |
| `score_rep.py` | Program to test several replicates of a trained Camformer model on test data. |
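
Training and scoring several replicates makes it possible to report how much a score varies between runs. The following minimal sketch summarises per-replicate scores as a mean and sample standard deviation. It does not reflect the interface or output of `train_rep.py` or `score_rep.py`; the function name `summarise_replicates` and the placeholder scores are hypothetical.

```python
import statistics


def summarise_replicates(scores):
    """Summarise one score per replicate as count, mean and sample stdev."""
    spread = statistics.stdev(scores) if len(scores) > 1 else 0.0
    return {"n": len(scores), "mean": statistics.mean(scores), "stdev": spread}


# Placeholder values for illustration only; not results from the paper.
replicate_scores = [0.5, 0.6, 0.7]
summary = summarise_replicates(replicate_scores)
```

A small standard deviation across replicates suggests that the reported performance does not depend strongly on the random initialisation of a single run.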

Directory structure

| Directory | Contents |
| --- | --- |
| `analysis` | Contains some basic analysis of results. Contents may be updated. |
| `base` | Contains the core codebase, utility functions, auxiliary helper files, etc. |
| `manuscript_figures` | Contains the data, scripts and figures presented in the manuscript. |
| `readme_figs` | Images used to prepare this nice-looking README file. |
| `saved_models` | Saved model weights and example code to run them. |

References

Relevant resources and previous Camformer repositories.

  1. Camformer repository (2022 version): DREAM2022 Submission
  2. DREAM 2022 Challenge Wiki Page
  3. Rafi et al., 2024: Paper Preprint
  4. Rafi et al., 2024: Data and Official Evaluation GitHub

Citation

If you find Camformer useful for your research, please cite:

Predicting gene expression using millions of yeast promoters reveals cis-regulatory logic
Dash T, Bornelöv S
Bioinformatics Advances vbaf130, 2025, doi: https://doi.org/10.1093/bioadv/vbaf130
