Commit 4ed0ce1 ("auto push")
1 parent c1d53a0

25 files changed: +2066 −105 lines

.github/workflows/deploy-timeline.yml (3 additions, 11 deletions)

```diff
@@ -27,22 +27,14 @@ jobs:
           python -m pip install --upgrade pip
           pip install pandas plotly numpy
-      - name: Generate timeline visualizations
-        run: |
-          # If your script writes output into a subfolder (e.g. ./site),
-          # make sure to set publish_dir to that same folder below.
-          python scripts/generate_timeline.py
-
       - name: Avoid Jekyll processing (optional)
-        run: touch .nojekyll
+        run: touch output/visualizations/.nojekyll

-      # Publish repo root to gh-pages (adjust publish_dir if needed)
+      # Publish only the visualizations folder to gh-pages
       - name: Publish to gh-pages
         uses: peaceiris/actions-gh-pages@v4
         with:
           github_token: ${{ secrets.GITHUB_TOKEN }}
-          publish_dir: .
-          # If your script writes to ./site instead, use:
-          # publish_dir: ./site
+          publish_dir: output/visualizations
           # Optional: keep a custom domain
           # cname: example.org
```

ORGANIZATION.md (131 additions, new file)

# Repository Organization Summary

## ✅ Completed Reorganization

### 📁 New Directory Structure
```
exp-overview/
├── 📄 README.md                     # Updated comprehensive documentation
├── 🏃 run.sh                        # Main execution script
├── 📂 scripts/                      # All executable scripts
│   ├── generate_overview_csv.py     # Main data generation script
│   ├── add_experiment.py            # Add new experiments
│   ├── check_config_targets.py      # Target validation
│   ├── fix_csv_comprehensive.py     # Data cleaning
│   └── generate_timeline.py         # Timeline generation
├── 📂 data/                         # Data organization
│   ├── raw/                         # Original/manual data
│   │   └── overview.csv
│   └── processed/                   # Generated/cleaned data
│       ├── auto_generated_overview.csv
│       ├── config_targets_check.csv
│       ├── detailed_setup_analysis.csv
│       └── overview_corrected.csv
├── 📂 output/                       # Generated outputs
│   ├── reports/                     # Analysis reports
│   │   └── comparison_report.md
│   └── visualizations/              # HTML dashboards
│       ├── index.html
│       ├── experiment_timeline.html
│       ├── experiment_gantt.html
│       └── experiment_stats.html
├── 📂 config/                       # Configuration files
│   ├── requirements.txt
│   └── project_config.py            # Project settings
├── 📂 docs/                         # Documentation
│   └── README_scripts.md
├── 📂 archive/                      # Archived/temp files
│   └── test_lsd_detection.py
└── 📂 .github/                      # CI/CD workflows
    └── workflows/
        └── deploy-timeline.yml
```

### 🔧 Key Improvements

#### 1. **Clear Separation of Concerns**
- **`scripts/`**: All executable code
- **`data/`**: Clear raw vs. processed data separation
- **`output/`**: Generated reports and visualizations
- **`config/`**: Configuration and dependencies
- **`docs/`**: Documentation

#### 2. **Better Entry Points**
- **`run.sh`**: Single command to execute the full pipeline
- **Updated paths**: Scripts now write output to organized locations
- **Project config**: Centralized configuration management

#### 3. **Professional Documentation**
- **Comprehensive README**: Feature overview, usage guide, development info
- **Directory structure**: Clear hierarchy with purpose explanations
- **Quick start**: Simple commands to get running
- **Statistics**: Current data summary and accuracy metrics

#### 4. **Data Management**
- **Raw data preservation**: Original files in `data/raw/`
- **Processed outputs**: Generated files in `data/processed/`
- **Report separation**: Analysis reports in a dedicated directory
- **Visualization assets**: HTML files properly organized

#### 5. **Development Workflow**
- **Executable scripts**: Proper shebang lines and permissions
- **Path updates**: All scripts use the new directory structure
- **Configuration management**: Centralized settings
- **Archive area**: Historical/temporary files kept separate
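The centralized configuration mentioned above might be sketched as follows. This is an illustrative guess at what `config/project_config.py` could contain; the class and attribute names are assumptions, not the repository's actual code:

```python
# Hypothetical sketch of config/project_config.py: one importable place for
# the paths used across the scripts, instead of hard-coded strings.
# All names here are illustrative assumptions.
from dataclasses import dataclass
from pathlib import Path


@dataclass(frozen=True)
class ProjectConfig:
    root: Path = Path(".")

    @property
    def raw_data(self) -> Path:
        return self.root / "data" / "raw"

    @property
    def processed_data(self) -> Path:
        return self.root / "data" / "processed"

    @property
    def reports(self) -> Path:
        return self.root / "output" / "reports"

    @property
    def visualizations(self) -> Path:
        return self.root / "output" / "visualizations"


CONFIG = ProjectConfig()
```

A script would then write to `CONFIG.processed_data / "auto_generated_overview.csv"` rather than a literal path, so a directory rename only touches one file.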
### 🚀 Usage After Reorganization

#### Generate Complete Overview
```bash
./run.sh
```

#### Individual Components
```bash
# Generate data only
python scripts/generate_overview_csv.py

# Add new experiment
python scripts/add_experiment.py

# Generate timeline
python scripts/generate_timeline.py
```

#### Access Results
```bash
# View main data
cat data/processed/auto_generated_overview.csv

# View accuracy report
cat output/reports/comparison_report.md

# Open visualizations
open output/visualizations/index.html
```

### 📊 Impact

#### Before Reorganization
- ❌ Files scattered in the root directory
- ❌ Mixed data types and purposes
- ❌ Unclear execution workflow
- ❌ Limited documentation

#### After Reorganization
- ✅ **Professional structure** following best practices
- ✅ **Clear data pipeline** from raw → processed → output
- ✅ **Easy execution** with a single command
- ✅ **Comprehensive documentation** for users and developers
- ✅ **Maintainable codebase** with proper organization
- ✅ **Scalable architecture** for future expansion

### 🎯 Benefits

1. **User experience**: Single-command execution, clear documentation
2. **Development**: Easier to find, modify, and extend code
3. **Collaboration**: Clear structure for team members
4. **Maintenance**: An organized codebase reduces technical debt
5. **Deployment**: Better suited for CI/CD and automation

The repository is now organized according to modern software development best practices, with clear separation of concerns, comprehensive documentation, and an intuitive workflow.

README.md (141 additions, 16 deletions)

The old README, a brief experiment overview, is replaced with full repository documentation. Removed content:

```diff
-# Experiment Overview
-
-This document is a comprehensive overview of all model training experiments and their configurations across different biological groups and setups.
-
-## Training Experiments Summary
-
-### 🔬 Mitochondria Experiments (`exp_mito`)
-- **Focus**: Mitochondria segmentation with LSD loss
-- **Model Base**: Fly model architecture
-- **Resolution**: 16nm voxel size
-- **Training Data**: Mixed datasets for mitochondria detection
-- **Setups**: setup_15, setup_16, setup_17, setup_18, setup_19
-
-| Setup | Target | Model Type | Starting Checkpoint | Max Iterations | Resolution (nm) | Batch Size | Learning Rate | Creation Date | Still Running |
-|-------|--------|------------|---------------------|----------------|-----------------|------------|---------------|---------------|---------------|
-| setup_15 | mito | fly model | 20250806_mito_mouse_distance_16nm/362k | 410,000 | 16 | 14 | 5.0e-05 | 2025-09-14 | YES |
-| setup_16 | mito | fly model | setup_15/80k | 330,000 | 16 | 14 | 5.0e-05 | 2025-09-14 | YES |
-| setup_17 | mito | fly model | setup_16/30k | 270,000 | 16 | 14 | 5.0e-05 | 2025-09-14 | YES |
-| setup_18 | mito | fly model | 20250725_mito_all_mixed_distance_16nm/372k | 210,000 | 16 | 14 | 5.0e-05 | 2025-09-14 | YES |
-| setup_19 | mito | fly model | 20250725_mito_all_mixed_distance_16nm/372k | 310,000 | 16 | 14 | 5.0e-05 | 2025-09-14 | YES |
```

New README content:

# Experiment Overview Repository

This repository contains tools and data for managing and visualizing machine learning experiment overviews across multiple research projects.

## 📁 Repository Structure

```
exp-overview/
├── README.md                        # This file
├── scripts/                         # Main execution scripts
│   ├── generate_overview_csv.py     # Primary script for generating the experiment CSV
│   ├── add_experiment.py            # Script for adding new experiments
│   ├── check_config_targets.py      # Configuration validation
│   ├── fix_csv_comprehensive.py     # Data cleaning utilities
│   └── generate_timeline.py         # Timeline visualization generation
├── data/                            # Data storage
│   ├── raw/                         # Original/manual data files
│   │   └── overview.csv             # Original manual experiment overview
│   └── processed/                   # Generated/processed data files
│       ├── auto_generated_overview.csv  # Main automated overview
│       ├── config_targets_check.csv     # Target validation results
│       ├── detailed_setup_analysis.csv  # Detailed experiment analysis
│       └── overview_corrected.csv       # Corrected overview data
├── output/                          # Generated outputs
│   ├── reports/                     # Analysis reports
│   │   └── comparison_report.md     # Data accuracy comparison report
│   └── visualizations/              # HTML visualizations
│       ├── index.html               # Main dashboard
│       ├── experiment_timeline.html # Timeline view
│       ├── experiment_gantt.html    # Gantt chart view
│       └── experiment_stats.html    # Statistics dashboard
├── config/                          # Configuration files
│   └── requirements.txt             # Python dependencies
├── docs/                            # Documentation
│   └── README_scripts.md            # Detailed script documentation
├── archive/                         # Archived/temporary files
└── .github/                         # GitHub workflows
    └── workflows/
        └── deploy-timeline.yml      # Automated deployment
```
## 🚀 Quick Start

### Generate Complete Experiment Overview
```bash
python scripts/generate_overview_csv.py
```
This creates `data/processed/auto_generated_overview.csv` with all experiment data.

### Add New Experiment
```bash
python scripts/add_experiment.py
```

### Generate Timeline Visualization
```bash
python scripts/generate_timeline.py
```

## 📊 Data Description

### Main Dataset: `auto_generated_overview.csv`
Contains comprehensive experiment information with the following columns:

| Column | Description |
|--------|-------------|
| Group | Experiment group (`exp_cell`, `exp_cerebellum`, etc.) |
| Setup | Unique setup identifier |
| Target | Target organelles (`mito`, `nuc`, `cell`, `er+isg+ld+lyso+mito+nuc`) |
| Model Type | Architecture type (`fly model`, `isolated_unet`) |
| Starting Checkpoint | Initial model checkpoint |
| Max Iterations | Maximum training iterations |
| Resolution (nm) | Voxel resolution in nanometers |
| Batch Size | Training batch size |
| Learning Rate | Training learning rate |
| Creation Date | Experiment creation date |
| Still Running | Whether the experiment is currently active |
| LSD | Whether the experiment uses Local Shape Descriptors |

## 🔧 Key Features

### Automated Data Extraction
- **Smart Directory Scanning**: Automatically discovers experiments across multiple project directories
- **Configuration Parsing**: Extracts parameters from `config.yaml` and `train.py` files
- **Checkpoint Analysis**: Determines training progress and starting points
- **LSD Detection**: Identifies Local Shape Descriptor usage from code analysis
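As a hedged illustration of the configuration-parsing step, the sketch below pulls a few fields out of a run-level `config.yaml` with PyYAML. The key names `batch_size` and `learning_rate` are assumptions about the real configs; only the `run:` section and the `is_lsd`/`lsd` flags are confirmed by code elsewhere in this commit:

```python
# Illustrative config.yaml parsing. Key names other than run/is_lsd/lsd
# are assumptions, not confirmed fields of the actual configs.
import yaml


def parse_run_config(text):
    """Extract a few overview fields from the text of a config.yaml."""
    config = yaml.safe_load(text)
    run = config.get("run", {}) if isinstance(config, dict) else {}
    return {
        "batch_size": run.get("batch_size"),
        "learning_rate": run.get("learning_rate"),
        "is_lsd": bool(run.get("is_lsd") or run.get("lsd")),
    }


sample = """\
run:
  batch_size: 14
  learning_rate: 5.0e-05
  is_lsd: true
"""
print(parse_run_config(sample))
# {'batch_size': 14, 'learning_rate': 5e-05, 'is_lsd': True}
```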
### Target Detection
- **Config-based**: Extracts organelle targets from segmentation labels
- **Name-based**: Infers targets from experiment naming conventions
- **Multi-organelle Support**: Handles complex multi-target experiments
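A minimal sketch of the name-based rule, assuming a simple substring-to-target table; the actual patterns in `scripts/generate_overview_csv.py` may differ:

```python
# Hypothetical name-based target inference. The pattern table is an
# assumption, not the repository's actual rule set.
from typing import Optional

NAME_PATTERNS = {
    "mito": "mito",
    "nuc": "nuc",
    "cell": "cell",
}


def infer_target_from_name(name: str) -> Optional[str]:
    """Return the first organelle target whose pattern appears in the name."""
    lowered = name.lower()
    for pattern, target in NAME_PATTERNS.items():
        if pattern in lowered:
            return target
    return None


print(infer_target_from_name("20250806_mito_mouse_distance_16nm"))  # mito
print(infer_target_from_name("setup_03"))  # None
```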
### Data Quality
- **Filtering**: Excludes incomplete experiments without checkpoints
- **Validation**: Compares automated vs. manual data for accuracy
- **Deduplication**: Removes duplicate entries and consolidates data
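The deduplication step might look like the following pandas sketch; the `(Group, Setup)` key is an assumed uniqueness criterion for overview rows:

```python
# Deduplication sketch with pandas. The (Group, Setup) key is an assumption
# about what makes an overview row unique.
import pandas as pd

rows = pd.DataFrame(
    {
        "Group": ["exp_mito", "exp_mito", "exp_cell"],
        "Setup": ["setup_15", "setup_15", "setup_03"],
        "Max Iterations": [410000, 410000, 200000],
    }
)

# Keep the last occurrence of each (Group, Setup) pair.
deduped = rows.drop_duplicates(subset=["Group", "Setup"], keep="last")
print(len(deduped))  # 2
```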
## 📈 Current Statistics

- **75 total experiments** tracked
- **17 experiments** using LSD (Local Shape Descriptors)
- **58 experiments** using standard approaches
- **95% accuracy** compared to manual curation

### Target Distribution
- `er+isg+ld+lyso+mito+nuc`: 40 experiments (multi-organelle)
- `mito`: 13 experiments (mitochondria)
- `cell`: 7 experiments (cell segmentation)
- `nuc`: 4 experiments (nucleus)
- Other specific combinations: 11 experiments

## 🛠 Development

### Adding New Experiment Types
1. Update the scanning logic in `scripts/generate_overview_csv.py`
2. Add new directory patterns to `scan_experiment_directories()`
3. Test with sample data
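For step 2, `scan_experiment_directories()` presumably globs candidate run folders; a hedged sketch, with purely illustrative glob patterns:

```python
# Illustrative directory scan: find run folders matching known patterns and
# keep only those that actually contain a config.yaml. The default pattern
# is an assumption, not the repository's real layout rule.
from pathlib import Path


def scan_experiment_directories(root, patterns=("exp_*/runs/setup_*",)):
    """Return run directories under root that contain a config.yaml."""
    found = []
    for pattern in patterns:
        for run_dir in sorted(Path(root).glob(pattern)):
            if (run_dir / "config.yaml").exists():
                found.append(run_dir)
    return found
```

Under this design, supporting a new experiment family would only require adding its pattern to `patterns`.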
### Extending Target Detection
1. Modify the `extract_additional_config_info()` function
2. Add new organelle patterns to the detection logic
3. Update the target inference rules

## 📝 Dependencies

Install required packages:
```bash
pip install -r config/requirements.txt
```

Key dependencies:
- `pyyaml`: Configuration file parsing
- `pandas`: Data manipulation
- `pathlib`: File system operations (standard library, no install needed)

## 🤝 Contributing

1. Follow the established directory structure
2. Update documentation when adding features
3. Test with existing experiment data
4. Maintain data quality and validation

## 📞 Contact

For questions about specific experiments or data interpretation, please refer to the individual experiment directories or contact the research team.
archive/test_lsd_detection.py (66 additions, new file)

```python
#!/usr/bin/env python3

import yaml
from pathlib import Path


def detect_lsd_usage(run_dir):
    """Detect if experiment uses LSD by checking config.yaml and train.py files."""
    config_file = run_dir / "config.yaml"
    train_file = run_dir / "train.py"

    print(f"Checking LSD for: {run_dir}")
    print(f"Config file exists: {config_file.exists()}")
    print(f"Train file exists: {train_file.exists()}")

    # Check config.yaml for is_lsd or lsd flags
    if config_file.exists():
        try:
            with open(config_file, "r") as f:
                config = yaml.safe_load(f)

            if isinstance(config, dict):
                # Check run section for lsd flags
                run_config = config.get("run", {})
                if run_config:
                    is_lsd = run_config.get("is_lsd", False)
                    lsd = run_config.get("lsd", False)
                    print(f"Config is_lsd: {is_lsd}, lsd: {lsd}")
                    if is_lsd or lsd:
                        print("Found LSD flag in config.yaml")
                        return True

        except Exception as e:
            print(
                f"Warning: Could not parse config.yaml for LSD detection in {run_dir}: {e}"
            )

    # Check train.py for affinities_map parameter
    if train_file.exists():
        try:
            with open(train_file, "r") as f:
                train_content = f.read()

            # Look for affinities_map parameter in run() function call
            has_affinities_map = "affinities_map" in train_content
            has_assignment = "affinities_map =" in train_content
            print(f"Train.py has 'affinities_map': {has_affinities_map}")
            print(f"Train.py has 'affinities_map =': {has_assignment}")

            if has_affinities_map and has_assignment:
                print("Found affinities_map in train.py")
                return True

        except Exception as e:
            print(
                f"Warning: Could not read train.py for LSD detection in {run_dir}: {e}"
            )

    print("No LSD indicators found")
    return False


# Test setup_15
setup_15_dir = Path("/groups/cellmap/cellmap/zouinkhim/exp_salivary/runs/setup_15")
result = detect_lsd_usage(setup_15_dir)
print(f"\nSetup_15 LSD result: {result}")
```