mihow (Collaborator) commented Nov 28, 2024

Addressing #607 opened a can of bugs and missing features, which led to a significant amount of long-overdue refactoring and enhancements.

Changes:

  • Begin storing the label lists / category maps that algorithms use. The new model, AlgorithmCategoryMap, maps each class index of the model's last layer to the category it represents. Each category has both a simple text label ("Species name") and a metadata object with an optional GBIF key, taxon rank, etc. These are used to create taxon entries in the Antenna database and to show the top N (e.g. 3, 5, 10) predictions from a model, rather than only the top 1, which is saved as the occurrence's determination.
  • Move job logs to their own field on the job model to reduce DB writes and to avoid overwriting the status field when writing logs.
  • Add a "task_type" field to algorithms so we can determine whether it's a classification model, a detection model, and so on.
  • Can handle results from multiple algorithms (currently moth/non-moth only)
  • Faster & refactored saving of results (batch saving, split-up functions, etc.)
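As a rough illustration of how a category map turns a model's raw score vector into top-N suggestions, here is a minimal Python sketch. `CategoryMap`, its `labels`/`data` fields, and `top_n` are hypothetical stand-ins, not the actual AlgorithmCategoryMap model or Antenna API; the sketch assumes the scores arrive in the same order as the model's output indices.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class CategoryMap:
    """Hypothetical stand-in for AlgorithmCategoryMap (not the real Django model).

    labels[i] is the text label for output index i of the model's last layer;
    data[i] holds optional metadata for that index, such as a GBIF key or rank.
    """

    labels: List[str]
    data: List[Dict] = field(default_factory=list)

    def category(self, index: int) -> Dict:
        # Fall back to an empty metadata dict when none was supplied for this index.
        meta = self.data[index] if index < len(self.data) else {}
        return {"index": index, "label": self.labels[index], **meta}


def top_n(scores: List[float], category_map: CategoryMap, n: int = 3) -> List[Dict]:
    """Return the n highest-scoring categories, best first, with their metadata."""
    if len(scores) != len(category_map.labels):
        raise ValueError("score vector length must match the category map")
    ranked = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
    return [{**category_map.category(i), "score": scores[i]} for i in ranked[:n]]


# Example category map; the metadata values are illustrative placeholders only.
cmap = CategoryMap(
    labels=["Actias luna", "Hyalophora cecropia", "Antheraea polyphemus"],
    data=[{"rank": "SPECIES"}, {"rank": "SPECIES"}, {"rank": "SPECIES"}],
)
print(top_n([0.1, 0.7, 0.2], cmap, n=2))
```

Keeping the metadata next to each label is what lets one output index resolve both to a display string and to a taxon record, instead of matching on text alone.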

Benefits of this change:

  • Can apply post-processing filters by region (filter species and re-weight the scores)
  • Can do genus & higher taxon roll-ups
  • Can get the top N (top 3, 5, 10) suggestions instead of just the first
  • Labels are now entities with a GBIF key, synonyms, etc. instead of just a text label, so they can be mapped more reliably to Taxon entities in the Antenna database
  • Can use confidence score algorithms other than softmax on the Antenna side (e.g. temperature calibrated)
  • Can see and agree with results from multiple models
  • Uses the best score from all (non-intermediate) algorithms to determine an occurrence's species determination (not just the latest)
  • Ready for detection tracking across frames!
  • Can send current results back to the backend so it can better determine what needs reprocessing
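The post-processing ideas above (calibrated confidence scores, regional filtering with re-weighting, genus roll-ups) could look roughly like the following Python sketch. All function names are hypothetical, temperature scaling is only one example of a calibration method, and the roll-up assumes the genus is the first word of a binomial label; a real implementation would more likely walk the Taxon hierarchy using the category metadata.

```python
import math
from collections import defaultdict
from typing import Dict, Iterable, List, Sequence


def softmax(logits: Sequence[float], temperature: float = 1.0) -> List[float]:
    """Temperature-scaled softmax: T > 1 flattens the distribution, T < 1 sharpens it."""
    if temperature <= 0:
        raise ValueError("temperature must be positive")
    scaled = [x / temperature for x in logits]
    peak = max(scaled)  # subtract the max before exp() for numerical stability
    exps = [math.exp(x - peak) for x in scaled]
    total = sum(exps)
    return [e / total for e in exps]


def filter_by_region(
    probs: Sequence[float], labels: Sequence[str], regional_species: Iterable[str]
) -> List[float]:
    """Zero out species not on the regional list, then renormalize the rest to sum to 1."""
    allowed = set(regional_species)
    kept = [p if label in allowed else 0.0 for p, label in zip(probs, labels)]
    total = sum(kept)
    if total == 0:
        return kept  # no candidate occurs in the region; leave the decision to the caller
    return [p / total for p in kept]


def roll_up_to_genus(probs: Sequence[float], labels: Sequence[str]) -> Dict[str, float]:
    """Sum species probabilities per genus, taking the genus as the first word of the label."""
    genus_scores: Dict[str, float] = defaultdict(float)
    for p, label in zip(probs, labels):
        genus_scores[label.split()[0]] += p
    return dict(genus_scores)


labels = ["Catocala ultronia", "Catocala ilia", "Actias luna"]
probs = softmax([2.0, 1.0, 0.5], temperature=1.5)
regional = filter_by_region(probs, labels, {"Catocala ilia", "Actias luna"})
print(roll_up_to_genus(regional, labels))
```

This only works if the full score vector and its category map are stored, which is why saving all scores (not just the top 1) matters here.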

While troubleshooting, I believe I fixed most of the issues reported in #310.

TODO:

  • More testing with live ML backend
  • Test migration of existing detections & classifications (they shouldn't have any data in the affected fields)
  • Add Algorithm & AlgorithmCategoryMap schemas to ML backends
  • Ensure the schemas in the example ML backend and the live ML backend are the same
  • Update tests
  • Add new tests for category maps
  • Add tests for creation of algorithms & category maps (document how to update either)
  • Fix detection of already processed images (since we now return moth/non-moth classifications, all detections are seen as already processed by every pipeline)
  • Fix occurrences using moth/non-moth algorithm as the determination (likely because score is higher)
  • Fix algorithms being registered multiple times
  • Prevent category maps from being registered multiple times
  • Raise errors that happen when saving results in subtasks
  • Fix reprocessing: it currently always processes all images
  • Test on fresh export of data from production
  • Verify that the full category map + scores map to the expected taxa; try showing the top 3 (logits are empty again?)
  • Issues with the cache not updating: this was due to errors happening in the subtasks that save results
  • Fix tests
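For the TODO about occurrences picking up the moth/non-moth score as their determination, one possible approach is to consider only terminal (non-intermediate) algorithms when choosing the best score. This Python sketch assumes a hypothetical `Classification` record with a `terminal` flag; the actual models may express this differently (for example via the new task_type field).

```python
from dataclasses import dataclass
from typing import Iterable, Optional


@dataclass
class Classification:
    """Hypothetical, simplified record of one algorithm's prediction for an occurrence."""

    algorithm: str
    label: str
    score: float
    terminal: bool  # False for intermediate steps such as a moth/non-moth filter


def pick_determination(classifications: Iterable[Classification]) -> Optional[Classification]:
    """Return the highest-scoring prediction from terminal algorithms only, or None."""
    candidates = [c for c in classifications if c.terminal]
    if not candidates:
        return None
    return max(candidates, key=lambda c: c.score)


results = [
    Classification("moth_binary", "moth", 0.99, terminal=False),
    Classification("species_v1", "Actias luna", 0.62, terminal=True),
    Classification("species_v2", "Actias luna", 0.81, terminal=True),
]
print(pick_determination(results))
```

Filtering before taking the maximum means a high-confidence binary filter score can never outrank a species-level prediction, while the best of several species models still wins regardless of which ran last.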

netlify bot commented Nov 28, 2024

Deploy Preview for ami-dev canceled.

🔨 Latest commit: 2522063
🔍 Latest deploy log: https://app.netlify.com/sites/ami-dev/deploys/678af27213ed5a0008591ede

Base automatically changed from feat/improve-initial-start to main November 28, 2024 01:21
mihow force-pushed the feat/more-predictions-data branch from 7a615a5 to 9f10aa6 November 28, 2024 01:31
mihow self-assigned this Nov 28, 2024
mihow force-pushed the feat/more-predictions-data branch 2 times, most recently from 39f974c to d9604c0 December 7, 2024 02:34
mihow marked this pull request as ready for review December 19, 2024 05:43
mihow added this to the ML pipeline enhancements milestone Dec 19, 2024
mihow force-pushed the feat/more-predictions-data branch from 9d7285a to 89ad145 December 20, 2024 02:12
mihow force-pushed the feat/more-predictions-data branch from 109b707 to a79f177 December 20, 2024 21:26
mihow changed the title "Save all scores from prediction results" to "Refactoring of ML pipeline results" Jan 13, 2025
mihow changed the title "Refactoring of ML pipeline results" to "Refactor and enhancements to saving of ML pipeline results" Jan 13, 2025
mihow added the labels bug (Something isn't working), enhancement (New feature or request), backend, response time (Enhancements related to performance in regards to response time), and ml (related to machine learning models or pipeline services) Jan 16, 2025
mihow (Collaborator, Author) commented Jan 26, 2025

Closing in favor of #684

mihow closed this Jan 26, 2025