Commit b359267

Merge pull request #1665 from vagkaratzas/modules
hhsuite mini test database
2 parents 8edfb1e + ec34f2a commit b359267

File tree

3 files changed

+21
-1
lines changed

README.md

Lines changed: 2 additions & 0 deletions
@@ -837,6 +837,8 @@ The earth sciences folder contain subfolders for different data formats encounte
 - pdb
 - 1tim.pdb: Triose phosphate isomerase, through X-ray diffraction (Chicken muscle - Engineered)
 - 8tim.pdb: Triose phosphate isomerase, through X-ray diffraction (Chicken muscle - Breast)
+- hhsuite
+- pfam.tar.gz: An hh-suite formatted mini test database, containing PF00001.26 and PF00002.29 from pfam version 37.4.
 
 ### spatialomics
 

data/proteomics/README.md

Lines changed: 19 additions & 1 deletion
@@ -7,6 +7,7 @@
 - [msspectra](#msspectra)
 - [parameter](#parameter)
 - [pdb](#pdb)
+- [hhsuite](#hhsuite)
 
 ## database
 'UP000005640_9606.fasta' is the reviewed human proteome of the [SWISS-PROT](https://www.ncbi.nlm.nih.gov/pmc/articles/PMC102476/) and was downloaded from UniProt.
@@ -41,5 +42,22 @@ write.table(out_df, file = 'proteus.raw_MaxQuant_proteingroups_tab.tsv', row.nam
 
 ## pdb
 
-The pdb folder contains protein structure files in .PDB format. Files 1tim.pdb and 8tim.pdb are part of the example datasets used in the foldseek tool (https://github.com/steineggerlab/foldseek).
+The pdb folder contains protein structure files in .PDB format.
+Files 1tim.pdb and 8tim.pdb are part of the example datasets used in the [foldseek tool](https://github.com/steineggerlab/foldseek).
 They describe chicken muscle proteins (engineered and breast respectively) and their structures were determined through X-ray diffraction.
+
+## hhsuite
+
+The [HH-suite](https://github.com/soedinglab/hh-suite) is an open-source software package for sensitive protein sequence searching based on the pairwise alignment of hidden Markov models (HMMs).
+The hhsuite test-datasets folder contains a compressed archive of an HH-suite formatted mini test database (pfam.tar.gz).
+The mini database contains protein families PF00001.26 and PF00002.29, from pfam version 37.4, and can be searched by the HHblits and HHsearch tools.
+Such databases usually consist of the following six files, inside a folder, which all start with the name of the database, followed by different extensions:
+```
+<dbname>_cs219.ffdata packed file with column-state sequences for prefiltering
+<dbname>_cs219.ffindex index file for packed column-state sequence file
+<dbname>_a3m.ffdata packed file with MSAs in A3M format
+<dbname>_a3m.ffindex index file for packed A3M file
+<dbname>_hhm.ffdata packed file with HHM-formatted HMMs
+<dbname>_hhm.ffindex index file for packed HHM file
+```
+More information regarding HH-suite format databases can be found [here](https://github.com/soedinglab/hh-suite/wiki#hh-suite-databases).
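
Reviewer note: the six-file layout described in the added hhsuite text can be checked mechanically once an archive has been extracted. The following is only a minimal sketch in Python, not part of this commit; the function name `missing_db_files` and the constant `EXPECTED_SUFFIXES` are hypothetical, and the folder and database name passed in are whatever the extracted archive actually uses (the internal layout of pfam.tar.gz is not shown here, so it is not assumed).

```python
from pathlib import Path

# The six suffixes listed in the README text above.
EXPECTED_SUFFIXES = (
    "_cs219.ffdata",
    "_cs219.ffindex",
    "_a3m.ffdata",
    "_a3m.ffindex",
    "_hhm.ffdata",
    "_hhm.ffindex",
)


def missing_db_files(db_dir, dbname):
    """Return the expected '<dbname><suffix>' file names absent from db_dir.

    Hypothetical helper for illustration: it only checks that the files
    exist, not that their contents are valid HH-suite data.
    """
    folder = Path(db_dir)
    expected = [dbname + suffix for suffix in EXPECTED_SUFFIXES]
    return [name for name in expected if not (folder / name).is_file()]
```

An empty list from `missing_db_files(some_folder, some_dbname)` would suggest the folder has the shape the README describes; any returned names are the files that are not present.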

data/proteomics/hhsuite/pfam.tar.gz

39.9 KB
Binary file not shown.
