|
1 | 1 |
|
2 |
| -# Experiment Overview |
| 2 | +# Experiment Overview Repository |
3 | 3 |
|
4 |
| -This document is a comprehensive overview of all model training experiments and their configurations across different biological groups and setups. |
| 4 | +This repository contains tools and data for managing and visualizing machine learning experiment overviews across multiple research projects. |
5 | 5 |
|
6 |
| -## Training Experiments Summary |
| 6 | +## 📁 Repository Structure |
7 | 7 |
|
8 |
| -### 🔬 Mitochondria Experiments (`exp_mito`) |
9 |
| -- **Focus**: Mitochondria segmentation with LSD loss |
10 |
| -- **Model Base**: Fly model architecture |
11 |
| -- **Resolution**: 16nm voxel size |
12 |
| -- **Training Data**: Mixed datasets for mitochondria detection |
13 |
| -- **Setups**: setup_15, setup_16, setup_17, setup_18, setup_19 |
| 8 | +``` |
| 9 | +exp-overview/
| 10 | +├── README.md  # This file
| 11 | +├── scripts/  # Main execution scripts
| 12 | +│   ├── generate_overview_csv.py  # Primary script for generating experiment CSV
| 13 | +│   ├── add_experiment.py  # Script for adding new experiments
| 14 | +│   ├── check_config_targets.py  # Configuration validation
| 15 | +│   ├── fix_csv_comprehensive.py  # Data cleaning utilities
| 16 | +│   └── generate_timeline.py  # Timeline visualization generation
| 17 | +├── data/  # Data storage
| 18 | +│   ├── raw/  # Original/manual data files
| 19 | +│   │   └── overview.csv  # Original manual experiment overview
| 20 | +│   └── processed/  # Generated/processed data files
| 21 | +│       ├── auto_generated_overview.csv  # Main automated overview
| 22 | +│       ├── config_targets_check.csv  # Target validation results
| 23 | +│       ├── detailed_setup_analysis.csv  # Detailed experiment analysis
| 24 | +│       └── overview_corrected.csv  # Corrected overview data
| 25 | +├── output/  # Generated outputs
| 26 | +│   ├── reports/  # Analysis reports
| 27 | +│   │   └── comparison_report.md  # Data accuracy comparison report
| 28 | +│   └── visualizations/  # HTML visualizations
| 29 | +│       ├── index.html  # Main dashboard
| 30 | +│       ├── experiment_timeline.html  # Timeline view
| 31 | +│       ├── experiment_gantt.html  # Gantt chart view
| 32 | +│       └── experiment_stats.html  # Statistics dashboard
| 33 | +├── config/  # Configuration files
| 34 | +│   └── requirements.txt  # Python dependencies
| 35 | +├── docs/  # Documentation
| 36 | +│   └── README_scripts.md  # Detailed script documentation
| 37 | +├── archive/  # Archived/temporary files
| 38 | +└── .github/  # GitHub workflows
| 39 | +    └── workflows/
| 40 | +        └── deploy-timeline.yml  # Automated deployment
| 41 | +``` |
14 | 42 |
|
15 |
| -| Setup | Target | Model Type | Starting Checkpoint | Max Iterations | Resolution (nm) | Batch Size | Learning Rate | Creation Date | Still Running | |
16 |
| -|-------|--------|------------|-------------------|----------------|-----------------|------------|---------------|---------------|---------------| |
17 |
| -| setup_15 | mito | fly model | 20250806_mito_mouse_distance_16nm/362k | 410,000 | 16 | 14 | 5.0e-05 | 2025-09-14 | YES | |
18 |
| -| setup_16 | mito | fly model | setup_15/80k | 330,000 | 16 | 14 | 5.0e-05 | 2025-09-14 | YES | |
19 |
| -| setup_17 | mito | fly model | setup_16/30k | 270,000 | 16 | 14 | 5.0e-05 | 2025-09-14 | YES | |
20 |
| -| setup_18 | mito | fly model | 20250725_mito_all_mixed_distance_16nm/372k | 210,000 | 16 | 14 | 5.0e-05 | 2025-09-14 | YES | |
21 |
| -| setup_19 | mito | fly model | 20250725_mito_all_mixed_distance_16nm/372k | 310,000 | 16 | 14 | 5.0e-05 | 2025-09-14 | YES | |
| 43 | +## 🚀 Quick Start |
| 44 | + |
| 45 | +### Generate Complete Experiment Overview |
| 46 | +```bash |
| 47 | +python scripts/generate_overview_csv.py |
| 48 | +``` |
| 49 | +This creates `data/processed/auto_generated_overview.csv` with all experiment data. |
| 50 | + |
| 51 | +### Add New Experiment |
| 52 | +```bash |
| 53 | +python scripts/add_experiment.py |
| 54 | +``` |
| 55 | + |
| 56 | +### Generate Timeline Visualization |
| 57 | +```bash |
| 58 | +python scripts/generate_timeline.py |
| 59 | +``` |
| 60 | + |
| 61 | +## 📊 Data Description |
| 62 | + |
| 63 | +### Main Dataset: `auto_generated_overview.csv` |
| 64 | +Contains comprehensive experiment information with the following columns: |
| 65 | + |
| 66 | +| Column | Description | |
| 67 | +|--------|-------------| |
| 68 | +| Group | Experiment group (exp_cell, exp_cerebellum, etc.) | |
| 69 | +| Setup | Unique setup identifier | |
| 70 | +| Target | Target organelles (mito, nuc, cell, er+isg+ld+lyso+mito+nuc) | |
| 71 | +| Model Type | Architecture type (fly model, isolated_unet) | |
| 72 | +| Starting Checkpoint | Initial model checkpoint | |
| 73 | +| Max Iterations | Maximum training iterations | |
| 74 | +| Resolution (nm) | Voxel resolution in nanometers | |
| 75 | +| Batch Size | Training batch size | |
| 76 | +| Learning Rate | Training learning rate | |
| 77 | +| Creation Date | Experiment creation date | |
| 78 | +| Still Running | Whether experiment is currently active | |
| 79 | +| LSD | Whether experiment uses Local Shape Descriptors | |
| 80 | + |
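As a quick sanity check on the column list above, the sketch below reports which of the documented columns are missing from a CSV header. It uses only the standard library; `EXPECTED_COLUMNS` and `missing_columns` are hypothetical names introduced for illustration and are not part of the repository's scripts.

```python
import csv
import io

# Column names copied from the table above.
EXPECTED_COLUMNS = [
    "Group", "Setup", "Target", "Model Type", "Starting Checkpoint",
    "Max Iterations", "Resolution (nm)", "Batch Size", "Learning Rate",
    "Creation Date", "Still Running", "LSD",
]


def missing_columns(csv_file):
    """Return the documented columns that are absent from the CSV header."""
    reader = csv.DictReader(csv_file)
    header = reader.fieldnames or []
    return [name for name in EXPECTED_COLUMNS if name not in header]


# Illustrative in-memory sample that lacks the LSD column; a real run would
# open data/processed/auto_generated_overview.csv instead.
sample = io.StringIO(
    "Group,Setup,Target,Model Type,Starting Checkpoint,Max Iterations,"
    "Resolution (nm),Batch Size,Learning Rate,Creation Date,Still Running\n"
    "exp_mito,setup_15,mito,fly model,20250806_mito_mouse_distance_16nm/362k,"
    "410000,16,14,5.0e-05,2025-09-14,YES\n"
)
print(missing_columns(sample))  # ['LSD']
```

An empty list means the header carries every documented column; extra columns are ignored by this check.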
| 81 | +## 🔧 Key Features |
| 82 | + |
| 83 | +### Automated Data Extraction |
| 84 | +- **Smart Directory Scanning**: Automatically discovers experiments across multiple project directories |
| 85 | +- **Configuration Parsing**: Extracts parameters from `config.yaml` and `train.py` files |
| 86 | +- **Checkpoint Analysis**: Determines training progress and starting points |
| 87 | +- **LSD Detection**: Identifies Local Shape Descriptor usage from code analysis |
| 88 | + |
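One plausible way to implement the LSD detection step is a text search over the training script. The sketch below is an assumption about how such a check could look, not the logic in `generate_overview_csv.py`: the pattern and the `uses_lsd` helper are hypothetical.

```python
import re

# Assumed markers of LSD usage: the token "lsd"/"lsds" not embedded in a
# longer word, or the phrase "local shape descriptor" with optional separators.
LSD_PATTERN = re.compile(
    r"(?<![a-z])lsds?(?![a-z])|local[ _]?shape[ _]?descriptor",
    re.IGNORECASE,
)


def uses_lsd(train_source: str) -> bool:
    """Return True if the training script text appears to mention LSDs."""
    return LSD_PATTERN.search(train_source) is not None


print(uses_lsd("lsd_loss = criterion(pred_lsds, gt_lsds)"))  # True
print(uses_lsd("pulsed = True"))                             # False
```

A plain substring test for `lsd` would also match unrelated identifiers such as `pulsed`, which is why the sketch guards the token with letter lookarounds.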
| 89 | +### Target Detection |
| 90 | +- **Config-based**: Extracts organelle targets from segmentation labels |
| 91 | +- **Name-based**: Infers targets from experiment naming conventions |
| 92 | +- **Multi-organelle Support**: Handles complex multi-target experiments |
| 93 | + |
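Name-based inference can be sketched as a token match against the known organelle labels. The function name `infer_target_from_name` and the splitting rule are assumptions for illustration; only the label set comes from the Target column described earlier.

```python
import re

# Organelle labels taken from the Target column examples in this README.
KNOWN_TARGETS = ("cell", "er", "isg", "ld", "lyso", "mito", "nuc")


def infer_target_from_name(experiment_name: str) -> str:
    """Infer a '+'-joined target string from tokens in an experiment name.

    Returns an empty string when no known label appears in the name.
    """
    tokens = set(re.split(r"[_\-/]+", experiment_name.lower()))
    found = sorted(label for label in KNOWN_TARGETS if label in tokens)
    return "+".join(found)


print(infer_target_from_name("20250806_mito_mouse_distance_16nm"))  # mito
print(infer_target_from_name("exp_cell"))                           # cell
```

Matching whole tokens rather than substrings avoids false hits such as `er` inside `cerebellum`, and sorting the labels yields the same ordering as `er+isg+ld+lyso+mito+nuc`.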
| 94 | +### Data Quality |
| 95 | +- **Filtering**: Excludes incomplete experiments without checkpoints |
| 96 | +- **Validation**: Compares automated vs manual data for accuracy |
| 97 | +- **Deduplication**: Removes duplicate entries and consolidates data |
| 98 | + |
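The filtering and deduplication rules above could be combined in a single pass, as in this minimal sketch. It assumes each record is a dict with the `Group` and `Setup` columns plus a hypothetical `checkpoints` list of discovered checkpoint files; the actual record layout used by the scripts is not documented here.

```python
def clean_experiments(records):
    """Drop experiments without checkpoints and keep one row per (Group, Setup)."""
    seen = set()
    cleaned = []
    for record in records:
        if not record.get("checkpoints"):
            continue  # filtering: no checkpoints found, treat as incomplete
        key = (record["Group"], record["Setup"])
        if key in seen:
            continue  # deduplication: keep the first occurrence only
        seen.add(key)
        cleaned.append(record)
    return cleaned


# Illustrative records; the checkpoint file names are made up.
records = [
    {"Group": "exp_mito", "Setup": "setup_15", "checkpoints": ["ckpt_a"]},
    {"Group": "exp_mito", "Setup": "setup_15", "checkpoints": ["ckpt_a"]},
    {"Group": "exp_mito", "Setup": "setup_16", "checkpoints": []},
]
print([r["Setup"] for r in clean_experiments(records)])  # ['setup_15']
```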
| 99 | +## 📈 Current Statistics |
| 100 | + |
| 101 | +- **75 total experiments** tracked |
| 102 | +- **17 experiments** using LSD (Local Shape Descriptors) |
| 103 | +- **58 experiments** without LSD (standard approaches)
| 104 | +- **95% accuracy** compared to manual curation |
| 105 | + |
| 106 | +### Target Distribution |
| 107 | +- `er+isg+ld+lyso+mito+nuc`: 40 experiments (multi-organelle) |
| 108 | +- `mito`: 13 experiments (mitochondria) |
| 109 | +- `cell`: 7 experiments (cell segmentation) |
| 110 | +- `nuc`: 4 experiments (nucleus) |
| 111 | +- Other specific combinations: 11 experiments |
| 112 | + |
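Counts like the ones above can be recomputed from the overview CSV with a short tally. The sketch below assumes the `LSD` column holds yes/true-style strings, which this README does not specify; `summarize` and `lsd_true_values` are hypothetical names.

```python
import csv
import io
from collections import Counter


def summarize(csv_file, lsd_true_values=("yes", "true", "1")):
    """Return (total rows, rows flagged as LSD, Counter of Target values)."""
    rows = list(csv.DictReader(csv_file))
    targets = Counter(row["Target"] for row in rows)
    lsd_count = sum(
        row["LSD"].strip().lower() in lsd_true_values for row in rows
    )
    return len(rows), lsd_count, targets


# Illustrative three-row sample, not real experiment data.
sample = io.StringIO(
    "Setup,Target,LSD\n"
    "setup_a,mito,YES\n"
    "setup_b,mito,NO\n"
    "setup_c,cell,NO\n"
)
total, lsd_count, targets = summarize(sample)
print(total, lsd_count)       # 3 1
print(targets.most_common())  # [('mito', 2), ('cell', 1)]
```

For the real file, the same function could be called on `open("data/processed/auto_generated_overview.csv", newline="")`.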
| 113 | +## 🛠 Development |
| 114 | + |
| 115 | +### Adding New Experiment Types |
| 116 | +1. Update scanning logic in `scripts/generate_overview_csv.py` |
| 117 | +2. Add new directory patterns to `scan_experiment_directories()` |
| 118 | +3. Test with sample data |
| 119 | + |
| 120 | +### Extending Target Detection |
| 121 | +1. Modify `extract_additional_config_info()` function |
| 122 | +2. Add new organelle patterns to detection logic |
| 123 | +3. Update target inference rules |
| 124 | + |
| 125 | +## 📝 Dependencies |
| 126 | + |
| 127 | +Install required packages: |
| 128 | +```bash |
| 129 | +pip install -r config/requirements.txt |
| 130 | +``` |
| 131 | + |
| 132 | +Key dependencies: |
| 133 | +- `pyyaml`: Configuration file parsing |
| 134 | +- `pandas`: Data manipulation |
| 135 | +- `pathlib`: File system operations (part of the Python standard library; no installation required)
| 136 | + |
| 137 | +## 🤝 Contributing |
| 138 | + |
| 139 | +1. Follow the established directory structure |
| 140 | +2. Update documentation when adding features |
| 141 | +3. Test with existing experiment data |
| 142 | +4. Maintain data quality and validation |
| 143 | + |
| 144 | +## 📞 Contact |
| 145 | + |
| 146 | +For questions about specific experiments or data interpretation, please refer to the individual experiment directories or contact the research team. |
22 | 147 |
|
23 | 148 |
|
24 | 149 |
|
|