Commit 4ed0ce1 ("auto push")
1 parent c1d53a0

25 files changed: +2066 −105 lines

.github/workflows/deploy-timeline.yml (3 additions, 11 deletions)

```diff
@@ -27,22 +27,14 @@ jobs:
           python -m pip install --upgrade pip
           pip install pandas plotly numpy
-      - name: Generate timeline visualizations
-        run: |
-          # If your script writes output into a subfolder (e.g. ./site),
-          # make sure to set publish_dir to that same folder below.
-          python scripts/generate_timeline.py
-
       - name: Avoid Jekyll processing (optional)
-        run: touch .nojekyll
+        run: touch output/visualizations/.nojekyll

-      # Publish repo root to gh-pages (adjust publish_dir if needed)
+      # Publish only the visualizations folder to gh-pages
       - name: Publish to gh-pages
         uses: peaceiris/actions-gh-pages@v4
         with:
           github_token: ${{ secrets.GITHUB_TOKEN }}
-          publish_dir: .
-          # If your script writes to ./site instead, use:
-          # publish_dir: ./site
+          publish_dir: output/visualizations
           # Optional: keep a custom domain
           # cname: example.org
```

ORGANIZATION.md (131 additions, new file)

# Repository Organization Summary

## ✅ Completed Reorganization

### 📁 New Directory Structure
```
exp-overview/
├── 📄 README.md                     # Updated comprehensive documentation
├── 🏃 run.sh                        # Main execution script
├── 📂 scripts/                      # All executable scripts
│   ├── generate_overview_csv.py     # Main data generation script
│   ├── add_experiment.py            # Add new experiments
│   ├── check_config_targets.py      # Target validation
│   ├── fix_csv_comprehensive.py     # Data cleaning
│   └── generate_timeline.py         # Timeline generation
├── 📂 data/                         # Data organization
│   ├── raw/                         # Original/manual data
│   │   └── overview.csv
│   └── processed/                   # Generated/cleaned data
│       ├── auto_generated_overview.csv
│       ├── config_targets_check.csv
│       ├── detailed_setup_analysis.csv
│       └── overview_corrected.csv
├── 📂 output/                       # Generated outputs
│   ├── reports/                     # Analysis reports
│   │   └── comparison_report.md
│   └── visualizations/              # HTML dashboards
│       ├── index.html
│       ├── experiment_timeline.html
│       ├── experiment_gantt.html
│       └── experiment_stats.html
├── 📂 config/                       # Configuration files
│   ├── requirements.txt
│   └── project_config.py            # Project settings
├── 📂 docs/                         # Documentation
│   └── README_scripts.md
├── 📂 archive/                      # Archived/temp files
│   └── test_lsd_detection.py
└── 📂 .github/                      # CI/CD workflows
    └── workflows/
        └── deploy-timeline.yml
```

### 🔧 Key Improvements

#### 1. **Clear Separation of Concerns**
- **`scripts/`**: All executable code
- **`data/`**: Clear raw vs. processed data separation
- **`output/`**: Generated reports and visualizations
- **`config/`**: Configuration and dependencies
- **`docs/`**: Documentation

#### 2. **Better Entry Points**
- **`run.sh`**: Single command to execute the full pipeline
- **Updated paths**: Scripts now write output to organized locations
- **Project config**: Centralized configuration management

#### 3. **Professional Documentation**
- **Comprehensive README**: Feature overview, usage guide, development info
- **Directory structure**: Clear hierarchy with purpose explanations
- **Quick start**: Simple commands to get running
- **Statistics**: Current data summary and accuracy metrics

#### 4. **Data Management**
- **Raw data preservation**: Original files in `data/raw/`
- **Processed outputs**: Generated files in `data/processed/`
- **Report separation**: Analysis reports in a dedicated directory
- **Visualization assets**: HTML files properly organized

#### 5. **Development Workflow**
- **Executable scripts**: Proper shebang lines and permissions
- **Path updates**: All scripts use the new directory structure
- **Configuration management**: Centralized settings
- **Archive area**: Historical/temporary files kept separate
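The centralized configuration mentioned above might be sketched as follows. This is an illustrative guess at what `config/project_config.py` could contain; the class and attribute names are assumptions, not the repository's actual code:

```python
# Hypothetical sketch of config/project_config.py: one importable place for
# the paths used across the scripts, instead of hard-coded strings.
# All names here are illustrative assumptions.
from dataclasses import dataclass
from pathlib import Path


@dataclass(frozen=True)
class ProjectConfig:
    root: Path = Path(".")

    @property
    def raw_data(self) -> Path:
        return self.root / "data" / "raw"

    @property
    def processed_data(self) -> Path:
        return self.root / "data" / "processed"

    @property
    def reports(self) -> Path:
        return self.root / "output" / "reports"

    @property
    def visualizations(self) -> Path:
        return self.root / "output" / "visualizations"


CONFIG = ProjectConfig()
```

A script would then write to `CONFIG.processed_data / "auto_generated_overview.csv"` rather than a literal path, so a directory rename only touches one file.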
### 🚀 Usage After Reorganization

#### Generate Complete Overview
```bash
./run.sh
```

#### Individual Components
```bash
# Generate data only
python scripts/generate_overview_csv.py

# Add new experiment
python scripts/add_experiment.py

# Generate timeline
python scripts/generate_timeline.py
```

#### Access Results
```bash
# View main data
cat data/processed/auto_generated_overview.csv

# View accuracy report
cat output/reports/comparison_report.md

# Open visualizations
open output/visualizations/index.html
```

### 📊 Impact

#### Before Reorganization
- ❌ Files scattered in the root directory
- ❌ Mixed data types and purposes
- ❌ Unclear execution workflow
- ❌ Limited documentation

#### After Reorganization
- ✅ **Professional structure** following best practices
- ✅ **Clear data pipeline** from raw → processed → output
- ✅ **Easy execution** with a single command
- ✅ **Comprehensive documentation** for users and developers
- ✅ **Maintainable codebase** with proper organization
- ✅ **Scalable architecture** for future expansion

### 🎯 Benefits

1. **User experience**: Single-command execution, clear documentation
2. **Development**: Easier to find, modify, and extend code
3. **Collaboration**: Clear structure for team members
4. **Maintenance**: An organized codebase reduces technical debt
5. **Deployment**: Better suited for CI/CD and automation

The repository is now organized according to modern software development best practices, with clear separation of concerns, comprehensive documentation, and an intuitive workflow.

README.md (141 additions, 16 deletions)

The old README, a brief experiment overview, is replaced with full repository documentation. Removed content:

```diff
-# Experiment Overview
-
-This document is a comprehensive overview of all model training experiments and their configurations across different biological groups and setups.
-
-## Training Experiments Summary
-
-### 🔬 Mitochondria Experiments (`exp_mito`)
-- **Focus**: Mitochondria segmentation with LSD loss
-- **Model Base**: Fly model architecture
-- **Resolution**: 16nm voxel size
-- **Training Data**: Mixed datasets for mitochondria detection
-- **Setups**: setup_15, setup_16, setup_17, setup_18, setup_19
-
-| Setup | Target | Model Type | Starting Checkpoint | Max Iterations | Resolution (nm) | Batch Size | Learning Rate | Creation Date | Still Running |
-|-------|--------|------------|---------------------|----------------|-----------------|------------|---------------|---------------|---------------|
-| setup_15 | mito | fly model | 20250806_mito_mouse_distance_16nm/362k | 410,000 | 16 | 14 | 5.0e-05 | 2025-09-14 | YES |
-| setup_16 | mito | fly model | setup_15/80k | 330,000 | 16 | 14 | 5.0e-05 | 2025-09-14 | YES |
-| setup_17 | mito | fly model | setup_16/30k | 270,000 | 16 | 14 | 5.0e-05 | 2025-09-14 | YES |
-| setup_18 | mito | fly model | 20250725_mito_all_mixed_distance_16nm/372k | 210,000 | 16 | 14 | 5.0e-05 | 2025-09-14 | YES |
-| setup_19 | mito | fly model | 20250725_mito_all_mixed_distance_16nm/372k | 310,000 | 16 | 14 | 5.0e-05 | 2025-09-14 | YES |
```

New README content:

# Experiment Overview Repository

This repository contains tools and data for managing and visualizing machine learning experiment overviews across multiple research projects.

## 📁 Repository Structure

```
exp-overview/
├── README.md                        # This file
├── scripts/                         # Main execution scripts
│   ├── generate_overview_csv.py     # Primary script for generating the experiment CSV
│   ├── add_experiment.py            # Script for adding new experiments
│   ├── check_config_targets.py      # Configuration validation
│   ├── fix_csv_comprehensive.py     # Data cleaning utilities
│   └── generate_timeline.py         # Timeline visualization generation
├── data/                            # Data storage
│   ├── raw/                         # Original/manual data files
│   │   └── overview.csv             # Original manual experiment overview
│   └── processed/                   # Generated/processed data files
│       ├── auto_generated_overview.csv  # Main automated overview
│       ├── config_targets_check.csv     # Target validation results
│       ├── detailed_setup_analysis.csv  # Detailed experiment analysis
│       └── overview_corrected.csv       # Corrected overview data
├── output/                          # Generated outputs
│   ├── reports/                     # Analysis reports
│   │   └── comparison_report.md     # Data accuracy comparison report
│   └── visualizations/              # HTML visualizations
│       ├── index.html               # Main dashboard
│       ├── experiment_timeline.html # Timeline view
│       ├── experiment_gantt.html    # Gantt chart view
│       └── experiment_stats.html    # Statistics dashboard
├── config/                          # Configuration files
│   └── requirements.txt             # Python dependencies
├── docs/                            # Documentation
│   └── README_scripts.md            # Detailed script documentation
├── archive/                         # Archived/temporary files
└── .github/                         # GitHub workflows
    └── workflows/
        └── deploy-timeline.yml      # Automated deployment
```
## 🚀 Quick Start

### Generate Complete Experiment Overview
```bash
python scripts/generate_overview_csv.py
```
This creates `data/processed/auto_generated_overview.csv` with all experiment data.

### Add New Experiment
```bash
python scripts/add_experiment.py
```

### Generate Timeline Visualization
```bash
python scripts/generate_timeline.py
```

## 📊 Data Description

### Main Dataset: `auto_generated_overview.csv`
Contains comprehensive experiment information with the following columns:

| Column | Description |
|--------|-------------|
| Group | Experiment group (`exp_cell`, `exp_cerebellum`, etc.) |
| Setup | Unique setup identifier |
| Target | Target organelles (`mito`, `nuc`, `cell`, `er+isg+ld+lyso+mito+nuc`) |
| Model Type | Architecture type (`fly model`, `isolated_unet`) |
| Starting Checkpoint | Initial model checkpoint |
| Max Iterations | Maximum training iterations |
| Resolution (nm) | Voxel resolution in nanometers |
| Batch Size | Training batch size |
| Learning Rate | Training learning rate |
| Creation Date | Experiment creation date |
| Still Running | Whether the experiment is currently active |
| LSD | Whether the experiment uses Local Shape Descriptors |

## 🔧 Key Features

### Automated Data Extraction
- **Smart Directory Scanning**: Automatically discovers experiments across multiple project directories
- **Configuration Parsing**: Extracts parameters from `config.yaml` and `train.py` files
- **Checkpoint Analysis**: Determines training progress and starting points
- **LSD Detection**: Identifies Local Shape Descriptor usage from code analysis
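As a hedged illustration of the configuration-parsing step, the sketch below pulls a few fields out of a run-level `config.yaml` with PyYAML. The key names `batch_size` and `learning_rate` are assumptions about the real configs; only the `run:` section and the `is_lsd`/`lsd` flags are confirmed by code elsewhere in this commit:

```python
# Illustrative config.yaml parsing. Key names other than run/is_lsd/lsd
# are assumptions, not confirmed fields of the actual configs.
import yaml


def parse_run_config(text):
    """Extract a few overview fields from the text of a config.yaml."""
    config = yaml.safe_load(text)
    run = config.get("run", {}) if isinstance(config, dict) else {}
    return {
        "batch_size": run.get("batch_size"),
        "learning_rate": run.get("learning_rate"),
        "is_lsd": bool(run.get("is_lsd") or run.get("lsd")),
    }


sample = """\
run:
  batch_size: 14
  learning_rate: 5.0e-05
  is_lsd: true
"""
print(parse_run_config(sample))
# {'batch_size': 14, 'learning_rate': 5e-05, 'is_lsd': True}
```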
### Target Detection
- **Config-based**: Extracts organelle targets from segmentation labels
- **Name-based**: Infers targets from experiment naming conventions
- **Multi-organelle Support**: Handles complex multi-target experiments
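A minimal sketch of the name-based rule, assuming a simple substring-to-target table; the actual patterns in `scripts/generate_overview_csv.py` may differ:

```python
# Hypothetical name-based target inference. The pattern table is an
# assumption, not the repository's actual rule set.
from typing import Optional

NAME_PATTERNS = {
    "mito": "mito",
    "nuc": "nuc",
    "cell": "cell",
}


def infer_target_from_name(name: str) -> Optional[str]:
    """Return the first organelle target whose pattern appears in the name."""
    lowered = name.lower()
    for pattern, target in NAME_PATTERNS.items():
        if pattern in lowered:
            return target
    return None


print(infer_target_from_name("20250806_mito_mouse_distance_16nm"))  # mito
print(infer_target_from_name("setup_03"))  # None
```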
### Data Quality
- **Filtering**: Excludes incomplete experiments without checkpoints
- **Validation**: Compares automated vs. manual data for accuracy
- **Deduplication**: Removes duplicate entries and consolidates data
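The deduplication step might look like the following pandas sketch; the `(Group, Setup)` key is an assumed uniqueness criterion for overview rows:

```python
# Deduplication sketch with pandas. The (Group, Setup) key is an assumption
# about what makes an overview row unique.
import pandas as pd

rows = pd.DataFrame(
    {
        "Group": ["exp_mito", "exp_mito", "exp_cell"],
        "Setup": ["setup_15", "setup_15", "setup_03"],
        "Max Iterations": [410000, 410000, 200000],
    }
)

# Keep the last occurrence of each (Group, Setup) pair.
deduped = rows.drop_duplicates(subset=["Group", "Setup"], keep="last")
print(len(deduped))  # 2
```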
## 📈 Current Statistics

- **75 total experiments** tracked
- **17 experiments** using LSD (Local Shape Descriptors)
- **58 experiments** using standard approaches
- **95% accuracy** compared to manual curation

### Target Distribution
- `er+isg+ld+lyso+mito+nuc`: 40 experiments (multi-organelle)
- `mito`: 13 experiments (mitochondria)
- `cell`: 7 experiments (cell segmentation)
- `nuc`: 4 experiments (nucleus)
- Other specific combinations: 11 experiments

## 🛠 Development

### Adding New Experiment Types
1. Update the scanning logic in `scripts/generate_overview_csv.py`
2. Add new directory patterns to `scan_experiment_directories()`
3. Test with sample data
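For step 2, `scan_experiment_directories()` presumably globs candidate run folders; a hedged sketch, with purely illustrative glob patterns:

```python
# Illustrative directory scan: find run folders matching known patterns and
# keep only those that actually contain a config.yaml. The default pattern
# is an assumption, not the repository's real layout rule.
from pathlib import Path


def scan_experiment_directories(root, patterns=("exp_*/runs/setup_*",)):
    """Return run directories under root that contain a config.yaml."""
    found = []
    for pattern in patterns:
        for run_dir in sorted(Path(root).glob(pattern)):
            if (run_dir / "config.yaml").exists():
                found.append(run_dir)
    return found
```

Under this design, supporting a new experiment family would only require adding its pattern to `patterns`.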
### Extending Target Detection
1. Modify the `extract_additional_config_info()` function
2. Add new organelle patterns to the detection logic
3. Update the target inference rules

## 📝 Dependencies

Install required packages:
```bash
pip install -r config/requirements.txt
```

Key dependencies:
- `pyyaml`: Configuration file parsing
- `pandas`: Data manipulation
- `pathlib`: File system operations (standard library, no install needed)

## 🤝 Contributing

1. Follow the established directory structure
2. Update documentation when adding features
3. Test with existing experiment data
4. Maintain data quality and validation

## 📞 Contact

For questions about specific experiments or data interpretation, please refer to the individual experiment directories or contact the research team.
archive/test_lsd_detection.py (66 additions, new file)

```python
#!/usr/bin/env python3

import yaml
from pathlib import Path


def detect_lsd_usage(run_dir):
    """Detect if experiment uses LSD by checking config.yaml and train.py files."""
    config_file = run_dir / "config.yaml"
    train_file = run_dir / "train.py"

    print(f"Checking LSD for: {run_dir}")
    print(f"Config file exists: {config_file.exists()}")
    print(f"Train file exists: {train_file.exists()}")

    # Check config.yaml for is_lsd or lsd flags
    if config_file.exists():
        try:
            with open(config_file, "r") as f:
                config = yaml.safe_load(f)

            if isinstance(config, dict):
                # Check run section for lsd flags
                run_config = config.get("run", {})
                if run_config:
                    is_lsd = run_config.get("is_lsd", False)
                    lsd = run_config.get("lsd", False)
                    print(f"Config is_lsd: {is_lsd}, lsd: {lsd}")
                    if is_lsd or lsd:
                        print("Found LSD flag in config.yaml")
                        return True

        except Exception as e:
            print(
                f"Warning: Could not parse config.yaml for LSD detection in {run_dir}: {e}"
            )

    # Check train.py for affinities_map parameter
    if train_file.exists():
        try:
            with open(train_file, "r") as f:
                train_content = f.read()

            # Look for affinities_map parameter in run() function call
            has_affinities_map = "affinities_map" in train_content
            has_assignment = "affinities_map =" in train_content
            print(f"Train.py has 'affinities_map': {has_affinities_map}")
            print(f"Train.py has 'affinities_map =': {has_assignment}")

            if has_affinities_map and has_assignment:
                print("Found affinities_map in train.py")
                return True

        except Exception as e:
            print(
                f"Warning: Could not read train.py for LSD detection in {run_dir}: {e}"
            )

    print("No LSD indicators found")
    return False


# Test setup_15
setup_15_dir = Path("/groups/cellmap/cellmap/zouinkhim/exp_salivary/runs/setup_15")
result = detect_lsd_usage(setup_15_dir)
print(f"\nSetup_15 LSD result: {result}")
```