Commit 4d48c60

Merge pull request #9 from ArpiarSaundersLab/dev
Filtered slashes in gene names. Resolves #8
2 parents 144ef3b + 21e2834 commit 4d48c60

7 files changed, +54 -22 lines changed

README.md

Lines changed: 2 additions & 4 deletions
@@ -127,9 +127,7 @@ asql.plot_umap(color_by="leiden_clusters", annotate=True)
 
 
 ## Reference
-AnnSQL: A Python SQL-based package for fast large-scale single-cell genomics analysis using minimal computational resources<br />
-Kenny Pavan, Arpiar Saunders<br />
-bioRxiv 2024.11.02.621676; [doi: https://doi.org/10.1101/2024.11.02.621676](https://www.biorxiv.org/content/10.1101/2024.11.02.621676)
-
+Kenny Pavan, Arpiar Saunders, AnnSQL: A Python SQL-based package for fast large-scale single-cell genomics analysis using minimal computational resources<br>
+Bioinformatics Advances, 2025; vbaf105, [https://doi.org/10.1093/bioadv/vbaf105](https://doi.org/10.1093/bioadv/vbaf105)
 <br>
 <br>

docs/cite.md

Lines changed: 2 additions & 3 deletions
@@ -1,5 +1,4 @@
 # Citation
 
-AnnSQL: A Python SQL-based package for fast large-scale single-cell genomics analysis using minimal computational resources<br />
-Kenny Pavan, Arpiar Saunders<br />
-bioRxiv 2024.11.02.621676; [doi: https://doi.org/10.1101/2024.11.02.621676](https://www.biorxiv.org/content/10.1101/2024.11.02.621676)
+Kenny Pavan, Arpiar Saunders, AnnSQL: A Python SQL-based package for fast large-scale single-cell genomics analysis using minimal computational resources<br>
+Bioinformatics Advances, 2025; vbaf105, [https://doi.org/10.1093/bioadv/vbaf105](https://doi.org/10.1093/bioadv/vbaf105)

docs/index.md

Lines changed: 2 additions & 3 deletions
@@ -78,9 +78,8 @@ There are two key reasons to use **AnnSQL**: (1) if you prefer SQL's expressive
 
 
 ## Citation
-AnnSQL: A Python SQL-based package for large-scale single-cell genomics analysis on a laptop<br />
-Kenny Pavan, Arpiar Saunders<br />
-bioRxiv 2024.11.02.621676; [doi: https://doi.org/10.1101/2024.11.02.621676](https://www.biorxiv.org/content/10.1101/2024.11.02.621676)
+Kenny Pavan, Arpiar Saunders, AnnSQL: A Python SQL-based package for fast large-scale single-cell genomics analysis using minimal computational resources<br>
+Bioinformatics Advances, 2025; vbaf105, [https://doi.org/10.1093/bioadv/vbaf105](https://doi.org/10.1093/bioadv/vbaf105)
 
 <br>
 <br>

setup.py

Lines changed: 1 addition & 1 deletion
@@ -5,7 +5,7 @@
 
 setuptools.setup(
 name='AnnSQL',
-version='v1.0.2',
+version='v1.0.3',
 author="Kenny Pavan",
 author_email="pavan@ohsu.edu",
 description="A Python SQL tool for converting Anndata objects to a relational DuckDb database. Methods are included for querying and basic single-cell preprocessing (experimental). ",

src/AnnSQL.egg-info/PKG-INFO

Lines changed: 45 additions & 9 deletions
@@ -1,6 +1,6 @@
-Metadata-Version: 2.2
+Metadata-Version: 2.4
 Name: AnnSQL
-Version: 1.0.1
+Version: 1.0.3
 Summary: A Python SQL tool for converting Anndata objects to a relational DuckDb database. Methods are included for querying and basic single-cell preprocessing (experimental).
 Home-page: https://github.com/ArpiarSaundersLab/annsql
 Author: Kenny Pavan
@@ -12,7 +12,7 @@ Requires-Python: >=3.12
 Description-Content-Type: text/markdown
 License-File: LICENSE
 Requires-Dist: scanpy>=1.10.3
-Requires-Dist: duckdb>=1.1.2
+Requires-Dist: duckdb>=1.2.2
 Requires-Dist: memory-profiler>=0.61.0
 Requires-Dist: psutil>=6.0.0
 Requires-Dist: pyarrow>=17.0.0
@@ -23,12 +23,25 @@ Dynamic: classifier
 Dynamic: description
 Dynamic: description-content-type
 Dynamic: home-page
+Dynamic: license-file
 Dynamic: requires-dist
 Dynamic: requires-python
 Dynamic: summary
 
-<center><img src="examples/images/logo.png" width=500></center>
-<br />
+<p align="center">
+<img src="examples/images/logo.png" width=500>
+</p>
+
+<br>
+
+<p align="center">
+<a href="https://github.com/ArpiarSaundersLab/annsql/tree/main/tests"><img src="https://img.shields.io/badge/build-passing-brightgreen"></a>
+<a href="https://img.shields.io/github/v/release/ArpiarSaundersLab/annsql"><img src="https://img.shields.io/github/v/release/ArpiarSaundersLab/annsql"></a>
+<a href="https://static.pepy.tech/badge/annsql/month"><img src="https://static.pepy.tech/badge/annsql/month"></a>
+<a href="https://static.pepy.tech/badge/annsql"><img src="https://static.pepy.tech/badge/annsql"></a>
+</p>
+
+<br>
 
 # Query AnnData Objects with SQL
 The Python based AnnSQL package enables SQL-based queries on [AnnData](https://anndata.readthedocs.io/en/latest/) objects, returning results as either a [Pandas](https://pandas.pydata.org/) DataFrame, an AnnData object, or a [Parquet](https://parquet.apache.org/) file that can easily be imported into a variety of data analysis tools. Behind the scenes, AnnSQL converts the layers of an AnnData object into a relational [DuckDB](https://duckdb.org/) database. Each layer is stored as an individual table, allowing for simple or complex SQL queries, including table joins.
@@ -112,16 +125,39 @@ asql.query("SELECT SUM(COLUMNS(*)) FROM (SELECT * EXCLUDE (cell_id) FROM X)")
 
 #taking the correlation of genes ITGB2 and SSU72 in dendritic cells that express either gene > 0
 asql.query("SELECT corr(ITGB2,SSU72) as correlation FROM adata WHERE bulk_labels = 'Dendritic' AND (ITGB2 > 0 OR SSU72 >0)")
+
+############################################################################
+# Extended AnnSQL methods (See: https://docs.annsql.com/preprocessing)
+# These methods are either SQL based or Python/SQL hybrid implementations.
+############################################################################
+
+#basic QC on the dataset
+asql.calculate_total_counts()
+asql.filter_by_cell_counts(min_cell_count=1000, max_cell_count=50000)
+asql.filter_by_gene_counts(min_gene_counts=100, max_gene_counts=10000)
+
+#normalize & log umi counts
+asql.expression_normalize(total_counts_per_cell=10000)
+asql.expression_log(log_type="LN")
+
+#select highly variable genes
+asql.calculate_variable_genes(save_var_names=True, top_variable_genes=1000)
+
+#run pca
+asql.calculate_pca(n_pcs=50, top_variable_genes=1000, zero_center=False)
+
+#umap, cluster, then and plot.
+asql.calculate_umap()
+asql.calculate_leiden_clusters(resolution=0.25, n_neighbors=5)
+asql.plot_umap(color_by="leiden_clusters", annotate=True)
 ```
 
 <br>
 <br>
 
 
 ## Reference
-AnnSQL: A Python SQL-based package for large-scale single-cell genomics analysis on a laptop<br />
-Kenny Pavan, Arpiar Saunders<br />
-bioRxiv 2024.11.02.621676; [doi: https://doi.org/10.1101/2024.11.02.621676](https://www.biorxiv.org/content/10.1101/2024.11.02.621676)
-
+Kenny Pavan, Arpiar Saunders, AnnSQL: A Python SQL-based package for fast large-scale single-cell genomics analysis using minimal computational resources<br>
+Bioinformatics Advances, 2025; vbaf105, [https://doi.org/10.1093/bioadv/vbaf105](https://doi.org/10.1093/bioadv/vbaf105)
 <br>
 <br>

src/AnnSQL.egg-info/requires.txt

Lines changed: 1 addition & 1 deletion
@@ -1,5 +1,5 @@
 scanpy>=1.10.3
-duckdb>=1.1.2
+duckdb>=1.2.2
 memory-profiler>=0.61.0
 psutil>=6.0.0
 pyarrow>=17.0.0
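
Note on the `duckdb>=1.2.2` bump: an environment created against the old `duckdb>=1.1.2` pin may still hold an older DuckDB. The following is a minimal sketch of how the installed version could be checked, assuming purely numeric dotted versions. The helper names `parse_release`, `meets_minimum`, and `check_duckdb` are hypothetical and are not part of AnnSQL.

```python
from importlib.metadata import PackageNotFoundError, version
from typing import Optional, Tuple


def parse_release(v: str) -> Tuple[int, ...]:
    """Parse a purely numeric dotted version such as '1.2.2' into a tuple of ints."""
    return tuple(int(part) for part in v.split("."))


def meets_minimum(installed: str, minimum: str) -> bool:
    """Compare two numeric dotted versions component by component."""
    return parse_release(installed) >= parse_release(minimum)


def check_duckdb(minimum: str = "1.2.2") -> Optional[bool]:
    """Return True/False for the installed duckdb, or None if it is missing
    or its version string is not purely numeric (e.g. a dev build)."""
    try:
        return meets_minimum(version("duckdb"), minimum)
    except (PackageNotFoundError, ValueError):
        return None


if __name__ == "__main__":
    print("duckdb meets minimum:", check_duckdb())
```

Comparing tuples of integers avoids the string-comparison pitfall where `"1.10.0"` sorts before `"1.2.2"`. For pre-release or local version suffixes, a full version parser would be the safer choice.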

src/AnnSQL/BuildDb.py

Lines changed: 1 addition & 1 deletion
@@ -390,7 +390,7 @@ def replace_special_chars(self, string):
         if string[0].isdigit():
             return 'n'+string.replace("-", "_").replace(".", "_")
         else:
-            return string.replace("-", "_").replace(".", "_").replace("(", "_").replace(")", "_").replace(",", "_").replace(" ", "_")
+            return string.replace("-", "_").replace(".", "_").replace("(", "_").replace(")", "_").replace(",", "_").replace(" ", "_").replace("/", "_")
 
     def determine_buffer_status(self):
         """
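
Note on the `replace_special_chars` change: the added `.replace("/", "_")` extends the existing chain so that a slash in a gene name also becomes an underscore. The sketch below shows the same idea as a standalone function. The name `sanitize_gene_name` and the translation-table approach are illustrative assumptions, not AnnSQL's implementation. One deliberate difference: in the method shown in the diff, the digit-prefixed branch only replaces `-` and `.`, while this sketch applies every replacement to both branches.

```python
# Hypothetical standalone helper; not part of the AnnSQL API.
# Characters that the diff's replace() chain maps to "_", including the new "/".
_SPECIAL_CHARS = "-.(), /"
_TABLE = str.maketrans({ch: "_" for ch in _SPECIAL_CHARS})


def sanitize_gene_name(name: str) -> str:
    """Return a version of a gene name with special characters replaced by '_'."""
    cleaned = name.translate(_TABLE)
    # Prefix names that start with a digit with 'n', as the original method does.
    if cleaned[:1].isdigit():
        cleaned = "n" + cleaned
    return cleaned


if __name__ == "__main__":
    for gene in ["HLA-DRB1", "A/B", "7SK.1"]:
        print(gene, "->", sanitize_gene_name(gene))
```

A single translation table keeps the character list in one place, so a future addition is a one-character change rather than another chained call.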
