Refactor and enhancements to saving of ML pipeline results #635

mihow · 2024-11-28T00:01:46Z

Addressing #607 opened a can of bugs and missing features, which led to a significant amount of long-overdue refactoring and enhancements.

Changes:

Begin storing the label lists / category maps that algorithms use. The model is called AlgorithmCategoryMap; it maps each class index from the model's last layer to the category it represents. Each category has a simple text label ("Species name") as well as a metadata object with an optional GBIF key, taxon rank, etc. These are used to create taxon entries in the Antenna database and to show the top 3, 5, 10, or N predictions from a model, rather than just the top 1, which is saved as the occurrence's determination.
Move job logs to their own field on the job model to reduce DB writes and to avoid overwriting the status field when writing logs.
Add a "task_type" field to algorithms so we can determine whether it's a classification model, a detection model, and so on.
Can handle results from multiple algorithms (moth/non-moth only).
Faster, refactored saving of results (batch saving, split-up functions, etc.).
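
To illustrate the category map idea above, here is a minimal sketch in plain Python. It is not the actual AlgorithmCategoryMap model: the `Category` class, its field names, the `top_n` helper, and the placeholder labels are all assumptions made for the example. It only shows how a list indexed by class index can turn raw scores into top-N labelled predictions.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple


@dataclass(frozen=True)
class Category:
    """One entry of a category map (hypothetical field names)."""

    label: str
    gbif_key: Optional[int] = None  # optional GBIF key, left unset here
    rank: Optional[str] = None  # taxon rank, e.g. "SPECIES" or "GENUS"


def top_n(
    scores: List[float], category_map: List[Category], n: int = 3
) -> List[Tuple[Category, float]]:
    """Return the n highest-scoring categories, best first.

    scores[i] is the score for class index i of the model's last layer,
    so category_map[i] describes scores[i].
    """
    if len(scores) != len(category_map):
        raise ValueError("scores and category map must have the same length")
    ranked = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
    return [(category_map[i], scores[i]) for i in ranked[:n]]


# Placeholder labels; a real map would hold the classifier's label list.
category_map = [
    Category("Species A", rank="SPECIES"),
    Category("Species B", rank="SPECIES"),
    Category("Species C", rank="SPECIES"),
    Category("Species D", rank="SPECIES"),
]

scores = [0.05, 0.6, 0.25, 0.1]
for category, score in top_n(scores, category_map, n=2):
    print(category.label, score)
```

The first item returned corresponds to the top 1 prediction that is saved as the determination; the remaining items are the extra suggestions.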

Benefits of this change:

Can apply post-processing filters of species by region (filter and re-weight the scores)
Can do genus & higher taxon roll-ups
Can get the top N (top 3, 5, 10) suggestions instead of just the first
Labels are now entities with a GBIF key, synonyms, etc. instead of just a text label, so they map better to Taxon entities in the Antenna database
Can use confidence score algorithms other than softmax on the Antenna side (e.g. temperature-calibrated)
Allows seeing and agreeing with results from multiple models
Uses the best score from all (non-intermediate) algorithms, not just the latest one, to set the determination of an occurrence
Ready for detection tracking across frames!
Can better send current results back to backend to determine reprocessing needs
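
Two of the benefits above can be sketched in a few lines of plain Python, assuming that raw logits are stored alongside each prediction. The function names and the dictionary keys (`algorithm`, `label`, `score`, `intermediate`) are hypothetical and do not reflect Antenna's schema. The first function applies temperature scaling before softmax; the second picks the highest-scoring classification among non-intermediate algorithms.

```python
import math
from typing import Dict, List, Optional


def calibrated_scores(logits: List[float], temperature: float = 1.0) -> List[float]:
    """Softmax over logits / temperature; temperature=1.0 is plain softmax."""
    if temperature <= 0:
        raise ValueError("temperature must be positive")
    scaled = [x / temperature for x in logits]
    peak = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(x - peak) for x in scaled]
    total = sum(exps)
    return [e / total for e in exps]


def best_determination(classifications: List[Dict]) -> Optional[Dict]:
    """Return the highest-scoring classification from non-intermediate algorithms."""
    terminal = [c for c in classifications if not c["intermediate"]]
    return max(terminal, key=lambda c: c["score"], default=None)


# Hypothetical results for one occurrence from three algorithms.
classifications = [
    {"algorithm": "binary", "label": "moth", "score": 0.99, "intermediate": True},
    {"algorithm": "species-v1", "label": "Species A", "score": 0.70, "intermediate": False},
    {"algorithm": "species-v2", "label": "Species B", "score": 0.85, "intermediate": False},
]

best = best_determination(classifications)
print(best["label"] if best else None)
print(calibrated_scores([2.0, 1.0, 0.1], temperature=2.0))
```

A temperature above 1 flattens the distribution and a temperature below 1 sharpens it. The intermediate binary result is ignored even though its score is the highest.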

While troubleshooting, I should also have fixed most of the issues reported in #310.

TODO:

netlify · 2024-11-28T00:03:31Z

✅ Deploy Preview for ami-dev canceled.

Name	Link
🔨 Latest commit	`2522063`
🔍 Latest deploy log	https://app.netlify.com/sites/ami-dev/deploys/678af27213ed5a0008591ede

mihow · 2025-01-26T14:37:00Z

Closing in favor of #684

Base automatically changed from feat/improve-initial-start to main November 28, 2024 01:21

mihow force-pushed the feat/more-predictions-data branch from 7a615a5 to 9f10aa6 Compare November 28, 2024 01:31

mihow self-assigned this Nov 28, 2024

mihow force-pushed the feat/more-predictions-data branch 2 times, most recently from 39f974c to d9604c0 Compare December 7, 2024 02:34

mihow marked this pull request as ready for review December 19, 2024 05:43

mihow added this to the ML pipeline enhancements milestone Dec 19, 2024

mihow mentioned this pull request Dec 19, 2024

Enable users to register Processing Services & Pipelines #632

Merged

mihow force-pushed the feat/more-predictions-data branch from 9d7285a to 89ad145 Compare December 20, 2024 02:12

mihow added 20 commits December 20, 2024 13:22

feat: begin storing category maps with algorithms in the DB

ca5a920

feat: begin saving all logits and scores from all predictions

22851e6

fix: complete rename of softmax_scores field (now any calibrated score)

c635f26

feat: admin sections for Classifications and Category Maps

6e86709

fix: save simple labels along with category map data

a665c45

feat: use function to generate fake classifications

0eec366

feat: update schema in example ML backend. comments

5d047c6

fix: formatting

836880f

fix: remove bad admin filter

6c8b550

fix: update formatting (line-lengths)

85dec99

fix: reset line-length to existing project value

7da5159

feat: API views for algorithm category maps

2b0f039

feat: define schema for Algorithm & AlgorithmCategoryMap in the ML ba…

8598d77

…ckend API

feat: continue writing schema for algorithms

c94ecd1

feat: update image fetching utils based on latest AMI data companion

885ca0d

feat: bring schemas related to the ML backend responses in sync

0c6115e

fix: fix import

4f813b4

feat: update schema for algorithms

4c18630

feat: update tests for processing pipeline responses

9bc9072

chore: update formatting (line-lengths)

e1c2d8a

mihow added 6 commits December 20, 2024 13:22

feat: optionally return pipeline results

36f6134

fix: associate category map with each classification in addition to algo

12dbcc4

chore: more logging when saving results

35d6b7a

feat: allow filtering captures by project

fb5af5c

fix: update formatting

0f83a19

feat: make the classification list view more lightweight

a79f177

mihow force-pushed the feat/more-predictions-data branch from 109b707 to a79f177 Compare December 20, 2024 21:26

mihow added 2 commits December 20, 2024 14:06

feat: use redis for primary cache locally

eac3210

feat: support for retrying failed requests

e8eb341

mihow changed the title ~~Save all scores from prediction results~~ Refactoring of ML pipeline results Jan 13, 2025

mihow changed the title ~~Refactoring of ML pipeline results~~ Refactor and enhancements to saving of ML pipeline results Jan 13, 2025

mihow mentioned this pull request Jan 15, 2025

Present latest job status in collections table and update populate button logic #657

Merged

mihow added the bug, enhancement, backend, response time, and ml labels Jan 16, 2025

This was linked to issues Jan 16, 2025

Populate Antenna database with classifier scores from every class for each prediction #607

Closed

Fixes & updates to jobs #310

Closed

Make a pipeline for only the binary classifier #621

Closed

Import taxonomy from new classifiers #622

Closed

Merge branch 'main' of github.com:RolnickLab/antenna into feat/more-p…

fcf2f30

…redictions-data

mihow mentioned this pull request Jan 16, 2025

ML Pipeline v2 RolnickLab/ami-data-companion#67

Merged

mihow added 2 commits January 16, 2025 17:14

Merge branch 'main' into feat/more-predictions-data

7a313a8

Merge branch 'main' into feat/more-predictions-data

2522063

This was referenced Jan 21, 2025

New section for Algorithms and Category Maps #683

Closed

ML Pipeline v2 #684

Merged

mihow removed a link to an issue Jan 26, 2025

Import taxonomy from new classifiers #622

Closed

mihow closed this Jan 26, 2025
