
Commit deef6fc

Merge pull request #7 from ai4curation/batch4
batch4
2 parents 837863b + 1ea7426 commit deef6fc


754 files changed (+10934 / -63879 lines changed)

Lines changed: 85 additions & 0 deletions
@@ -0,0 +1,85 @@
+---
+name: reference-findings-summarize
+description: Use this agent when you need to extract and summarize key findings from scientific references that are relevant to a gene's function. This agent should be used after references have been identified in a gene review to provide detailed supporting evidence from the literature. The agent will look up cached publications and extract only the most relevant findings related to the gene's direct function, excluding irrelevant experimental details unless they directly inform gene function. <example>Context: The user is reviewing a gene and needs to summarize findings from referenced papers. user: "I've added several PMIDs to my gene review. Can you extract the key findings from these papers?" assistant: "I'll use the reference-findings-summarize agent to go through each reference and extract the relevant findings about the gene's function." <commentary>Since the user needs to extract findings from references in their gene review, use the reference-findings-summarize agent to systematically process each publication.</commentary></example> <example>Context: User has completed initial gene review and wants supporting evidence from literature. user: "The review cites PMID:20223213 and PMID:30456789. What do these papers actually say about the gene function?" assistant: "Let me use the reference-findings-summarize agent to examine these publications and extract the key findings relevant to the gene's function." <commentary>The user wants specific findings from cited papers, so use the reference-findings-summarize agent to extract and format the relevant information.</commentary></example>
+model: inherit
+color: yellow
+---
+
+You are an expert scientific literature analyst specializing in summarizing gene function-relevant findings from research publications. Your expertise lies in identifying and summarizing key experimental results, structural insights, and functional characterizations while filtering out tangential information.
+
+Your primary responsibility is to extract and format findings from scientific references that directly relate to a gene's function, following a precise structured format.
+
+## Core Tasks
+
+1. **Locate and Read Publications**: For each reference provided, locate the corresponding cached publication file in the `publications/` directory (format: `PMID_[number].md`)
+
+2. **Extract Relevant Findings**: From each publication, identify and extract:
+   - Direct functional characterizations of the gene/protein
+   - Structural insights that inform function
+   - Biochemical activities and enzymatic properties
+   - Molecular interactions critical to function
+   - Regulatory mechanisms
+   - Evolutionary conservation of functional domains
+
+3. **Filter Irrelevant Information**: Exclude unless directly relevant to gene function:
+   - Tangential experimental details
+   - Pleiotropic phenotypes not related to core function
+   - Methods descriptions
+   - General background information
+   - Disease associations without functional insight
+
+## Output Format
+
+For each reference, structure your findings as follows:
+
+```yaml
+- id: PMID:[number]
+  title: [Full paper title]
+  findings:
+    - statement: [Concise statement of the finding]
+      supporting_text: [Direct quote or close paraphrase from the paper]
+      reference_section_type: [ABSTRACT|INTRODUCTION|RESULTS|DISCUSSION|METHODS]
+      full_text_unavailable: [true if only abstract available (or paper unavailable), false otherwise]
+```
+
+## Quality Guidelines
+
+1. **Precision in Statements**: Each finding statement should be:
+   - Specific and actionable
+   - Directly related to gene function
+   - Supported by explicit text from the paper
+   - Free from speculation beyond what the paper states
+
+2. **Supporting Text Requirements**:
+   - Must be a direct quote or very close paraphrase
+   - Should be the most relevant excerpt that supports the statement
+   - Include enough context to be meaningful
+   - Indicate the section where the text was found
+
+3. **Prioritization**: Order findings by relevance to direct gene function:
+   - Primary: Direct functional characterizations
+   - Secondary: Structural-functional relationships
+   - Tertiary: Regulatory or interaction data informing function
+
+4. **Handling Missing Information**:
+   - If a publication file cannot be found, note: "Publication not available in cache"
+   - If only abstract is available, set `full_text_unavailable: true`
+   - Never fabricate findings or supporting text
+   - If no relevant findings exist, state: "No findings directly relevant to gene function"
+
+## Special Considerations
+
+- For structural papers: Focus on how structure informs function, not just structural details
+- For interaction studies: Emphasize functional consequences of interactions
+- For evolutionary studies: Extract functional conservation insights
+- For disease-related papers: Only include findings that reveal normal gene function
+
+## Self-Verification Steps
+
+1. Confirm each supporting_text excerpt exists in the source publication
+2. Verify each statement accurately reflects the supporting text
+3. Check that all findings relate to gene function, not just association
+4. Ensure section types are correctly identified
+5. Validate that the output follows the exact YAML structure required
+
+You will maintain scientific rigor by never overstating findings, always providing traceable evidence, and clearly distinguishing between what is directly stated in publications versus what might be inferred.
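The agent definition above asks for a few mechanical checks: resolving each reference to its cache file `publications/PMID_[number].md`, reporting "Publication not available in cache" when it is missing, confirming that every `supporting_text` occurs in the source, and validating the YAML structure. The Python sketch below shows one way those checks could be automated. The helper names (`cache_path`, `normalize`, `check_reference`), the whitespace normalization, and the assumption that findings arrive as plain dicts (for example, after loading the agent's YAML output with a YAML parser) are all hypothetical and not part of the agent definition or the repository.

```python
import re
from pathlib import Path

# Allowed reference_section_type values, copied from the agent's Output Format.
SECTION_TYPES = {"ABSTRACT", "INTRODUCTION", "RESULTS", "DISCUSSION", "METHODS"}
# Keys every finding is expected to carry (assumption: all four are required).
REQUIRED_KEYS = ("statement", "supporting_text", "reference_section_type", "full_text_unavailable")


def cache_path(pmid: str, publications_dir: Path = Path("publications")) -> Path:
    """Map 'PMID:12345' to <publications_dir>/PMID_12345.md."""
    number = pmid.split(":", 1)[-1].strip()
    return publications_dir / f"PMID_{number}.md"


def normalize(text: str) -> str:
    """Collapse runs of whitespace and lowercase, so line wrapping does not affect matching."""
    return re.sub(r"\s+", " ", text).strip().lower()


def check_reference(entry: dict, publications_dir: Path = Path("publications")) -> list[str]:
    """Return human-readable problems for one reference entry; an empty list means none found."""
    ref_id = entry["id"]
    path = cache_path(ref_id, publications_dir)
    if not path.exists():
        return [f"{ref_id}: Publication not available in cache"]
    source = normalize(path.read_text(encoding="utf-8"))

    problems = []
    for i, finding in enumerate(entry.get("findings", [])):
        label = f"{ref_id} finding {i}"
        for key in REQUIRED_KEYS:
            if key not in finding:
                problems.append(f"{label}: missing '{key}'")
        section = finding.get("reference_section_type")
        if section is not None and section not in SECTION_TYPES:
            problems.append(f"{label}: unknown section type {section!r}")
        flag = finding.get("full_text_unavailable")
        if flag is not None and not isinstance(flag, bool):
            problems.append(f"{label}: full_text_unavailable should be true/false")
        quote = finding.get("supporting_text")
        if quote and normalize(quote) not in source:
            # Whitespace- and case-insensitive exact match only: close paraphrases are reported too.
            problems.append(f"{label}: supporting_text not found in {path.name}")
    return problems
```

Because the quote check ignores only whitespace and case, a legitimate close paraphrase (which the agent definition permits) is also reported; such reports are prompts for manual review rather than definite errors.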

README.md

Lines changed: 12 additions & 2 deletions
@@ -41,6 +41,12 @@ uv run ai-gene-review validate genes/human/TP53/TP53-ai-review.yaml
 uv run ai-gene-review fetch-gene-pmids genes/human/TP53/TP53-ai-review.yaml
 ```
 
+**Generate statistics report:**
+```bash
+just stats # Generate HTML report
+just stats-open # Generate and open in browser
+```
+
 ## Workflow Overview
 
 1. **Fetch Gene Data**: Download UniProt records and GO annotations
@@ -60,9 +66,13 @@ uv run ai-gene-review fetch-gene-pmids genes/human/TP53/TP53-ai-review.yaml
 - 🔍 **Evidence tracking**: Detailed provenance and supporting text
 
 
-## Documentation Website
+## Resources
+
+### Documentation & Visualization
 
-[https://monarch-initiative.github.io/ai-gene-review](https://monarch-initiative.github.io/ai-gene-review)
+- **Documentation Website**: [https://monarch-initiative.github.io/ai-gene-review](https://monarch-initiative.github.io/ai-gene-review)
+- **Interactive Web App**: [https://ai4curation.github.io/ai-gene-review/app/index.html](https://ai4curation.github.io/ai-gene-review/app/index.html) - Browse and explore gene annotation reviews
+- **Statistics Dashboard**: [https://ai4curation.github.io/ai-gene-review/docs/stats_report.html](https://ai4curation.github.io/ai-gene-review/docs/stats_report.html) - Summary Stats
 
 ## Gene Review Structure
 
genes/9BACT/HgcA/HgcA-ai-review.yaml

Lines changed: 62 additions & 0 deletions
@@ -22,6 +22,68 @@ existing_annotations:
     - &id001
       id: GO:0008171
       label: O-methyltransferase activity
+- term:
+    id: GO:0046872
+    label: metal ion binding
+  evidence_type: ISS
+  original_reference_id: file:9BACT/HgcA/HgcA-deep-research.md
+  review:
+    summary: HgcA binds mercury ions as substrate for methylation reaction
+    action: NEW
+    reason: HgcA must bind Hg(II) ions to catalyze their methylation to methylmercury. This metal binding activity is essential for the enzyme's function.
+    supported_by:
+    - reference_id: file:9BACT/HgcA/HgcA-deep-research.md
+      supporting_text: "HgcB is a small ferredoxin with conserved cysteine residues that likely bind Hg and shuttle electrons"
+- term:
+    id: GO:0050896
+    label: response to stimulus
+  evidence_type: ISS
+  original_reference_id: file:9BACT/HgcA/HgcA-deep-research.md
+  review:
+    summary: Mercury methylation is part of microbial response to mercury exposure
+    action: NEW
+    reason: The hgcAB system represents a microbial response mechanism to environmental mercury, converting it to methylmercury.
+    supported_by:
+    - reference_id: file:9BACT/HgcA/HgcA-deep-research.md
+      supporting_text: "Microorganisms play a key role in the mercury cycle, transforming mercury between its various forms"
+- term:
+    id: GO:0015671
+    label: oxygen transport
+  evidence_type: ISS
+  original_reference_id: file:9BACT/HgcA/HgcA-deep-research.md
+  review:
+    summary: HgcA requires anaerobic conditions for mercury methylation
+    action: NEW
+    reason: Mercury methylation by HgcA only occurs in anaerobic environments; the enzyme is found exclusively in anaerobic bacteria and archaea.
+    supported_by:
+    - reference_id: file:9BACT/HgcA/HgcA-deep-research.md
+      supporting_text: "Methylmercury production in microbes requires a recently discovered pair of genes, hgcA and hgcB, found in a subset of anaerobic bacteria and archaea"
+- term:
+    id: GO:0031419
+    label: cobalamin binding
+  evidence_type: ISS
+  original_reference_id: file:9BACT/HgcA/HgcA-deep-research.md
+  review:
+    summary: HgcA contains a cobalamin (vitamin B12) binding domain essential for methyltransferase activity
+    action: NEW
+    reason: HgcA is a corrinoid-binding protein that requires methylated B12 cofactor for its methyltransferase function.
+    supported_by:
+    - reference_id: file:9BACT/HgcA/HgcA-deep-research.md
+      supporting_text: "HgcA is a corrinoid (vitamin B₁₂)–binding protein"
+    - reference_id: file:9BACT/HgcA/HgcA-deep-research.md
+      supporting_text: "HgcA is a corrinoid-dependent methyltransferase (related to the acetyl-CoA synthase family) with a binding site for a methylated B₁₂ cofactor"
+- term:
+    id: GO:0046689
+    label: response to mercury ion
+  evidence_type: ISS
+  original_reference_id: file:9BACT/HgcA/HgcA-deep-research.md
+  review:
+    summary: HgcA mediates bacterial response to mercury by converting it to methylmercury
+    action: NEW
+    reason: The primary function of HgcA is to respond to environmental mercury by methylating it, representing a specific mercury response mechanism.
+    supported_by:
+    - reference_id: file:9BACT/HgcA/HgcA-deep-research.md
+      supporting_text: "HgcA/HgcB work together to transfer a methyl group from a cellular methyl donor onto Hg(II), yielding CH₃Hg⁺"
 references:
 - id: GO_REF:0000043
   title: Gene Ontology annotation based on UniProtKB/Swiss-Prot keyword mapping
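The agent definition's self-verification steps ask that each statement accurately reflect its supporting text. A cheap heuristic in the same spirit, assumed here and not known to be run by the repository, is to flag annotations whose `supporting_text` never names the gene under review. In this Python sketch the dictionary simply copies the quotes from the new HgcA annotations in the hunk above, keyed by GO term; `terms_without_gene_mention` is a hypothetical helper.

```python
import re

# supporting_text values copied from the new HgcA annotations in this commit, keyed by GO term.
HGCA_SUPPORT = {
    "GO:0046872": ["HgcB is a small ferredoxin with conserved cysteine residues that likely bind Hg and shuttle electrons"],
    "GO:0050896": ["Microorganisms play a key role in the mercury cycle, transforming mercury between its various forms"],
    "GO:0015671": ["Methylmercury production in microbes requires a recently discovered pair of genes, hgcA and hgcB, found in a subset of anaerobic bacteria and archaea"],
    "GO:0031419": [
        "HgcA is a corrinoid (vitamin B₁₂)–binding protein",
        "HgcA is a corrinoid-dependent methyltransferase (related to the acetyl-CoA synthase family) with a binding site for a methylated B₁₂ cofactor",
    ],
    "GO:0046689": ["HgcA/HgcB work together to transfer a methyl group from a cellular methyl donor onto Hg(II), yielding CH₃Hg⁺"],
}


def terms_without_gene_mention(support: dict[str, list[str]], gene_symbol: str) -> list[str]:
    """Return GO IDs whose quotes never mention gene_symbol as a whole word (case-insensitive)."""
    pattern = re.compile(rf"\b{re.escape(gene_symbol)}\b", re.IGNORECASE)
    return [
        term
        for term, quotes in support.items()
        if not any(pattern.search(quote) for quote in quotes)
    ]
```

Called with `"HgcA"`, it returns `GO:0046872` (whose quote describes HgcB) and `GO:0050896` (whose quote describes microorganisms in general). A flag only means the quote deserves a second look against the source; it does not by itself show that the annotation is wrong.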

genes/9BACT/HgcB/HgcB-ai-review.yaml

Lines changed: 24 additions & 0 deletions
@@ -45,6 +45,30 @@ existing_annotations:
   review:
     summary: This annotation is very specific and accurate - HgcB contains two 4Fe-4S clusters as indicated by the domain annotations and ferredoxin family membership.
     action: ACCEPT
+- term:
+    id: GO:0046689
+    label: response to mercury ion
+  evidence_type: ISS
+  original_reference_id: file:9BACT/HgcB/HgcB-deep-research.md
+  review:
+    summary: HgcB participates in bacterial response to mercury by facilitating its methylation
+    action: NEW
+    reason: HgcB works with HgcA to respond to environmental mercury through methylation, representing a key mercury response mechanism.
+    supported_by:
+    - reference_id: file:9BACT/HgcB/HgcB-deep-research.md
+      supporting_text: "HgcB is a ferredoxin with conserved cysteine residues that likely bind Hg and shuttle electrons"
+- term:
+    id: GO:0016491
+    label: oxidoreductase activity
+  evidence_type: ISS
+  original_reference_id: file:9BACT/HgcB/HgcB-deep-research.md
+  review:
+    summary: HgcB functions as an oxidoreductase providing electrons for mercury methylation
+    action: NEW
+    reason: As a ferredoxin, HgcB acts as an oxidoreductase shuttling electrons required for the methylation reaction.
+    supported_by:
+    - reference_id: file:9BACT/HgcB/HgcB-deep-research.md
+      supporting_text: "HgcB is a small ferredoxin with conserved cysteine residues that likely bind Hg and shuttle electrons"
 references:
 - id: GO_REF:0000043
   title: Gene Ontology annotation based on UniProtKB/Swiss-Prot keyword mapping
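The two new HgcB annotations cite what appears to be the same sentence from `HgcB-deep-research.md` in two forms, one with the word "small" and one without. If the source contains only one of those forms, one quote was edited rather than copied verbatim, which conflicts with the direct-quote requirement in the agent definition. A minimal Python sketch using the standard library's `difflib.SequenceMatcher` can flag quotes that are nearly but not exactly identical; the helper name and the 0.9 similarity threshold are assumptions chosen for illustration.

```python
from difflib import SequenceMatcher
from itertools import combinations


def near_duplicate_quotes(quotes: list[str], threshold: float = 0.9) -> list[tuple[int, int, float]]:
    """Return (i, j, similarity) for pairs of quotes that are very similar but not identical."""
    pairs = []
    for (i, a), (j, b) in combinations(enumerate(quotes), 2):
        if a == b:
            continue  # identical quotes are mutually consistent
        similarity = SequenceMatcher(None, a, b).ratio()
        if similarity >= threshold:
            pairs.append((i, j, similarity))
    return pairs


# supporting_text values from the two new HgcB annotations in this commit.
HGCB_QUOTES = [
    "HgcB is a ferredoxin with conserved cysteine residues that likely bind Hg and shuttle electrons",
    "HgcB is a small ferredoxin with conserved cysteine residues that likely bind Hg and shuttle electrons",
]
```

Applied to `HGCB_QUOTES`, it reports the pair `(0, 1)` with a similarity of about 0.97; checking which wording actually appears in `HgcB-deep-research.md` would settle which quote needs correcting.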
