@be-smith
Contributor

@be-smith be-smith commented Oct 13, 2025

Add version control system for items

Closes #1057

Summary

This PR implements a version control system for datalab items (samples, cells, equipment, starting materials), enabling users to save, compare, and restore previous versions of their item pages.

Features

Core functionality

  • Initial version created automatically when a new item is created (action="created")
  • Version snapshots created on item save (when the user manually clicks the save button), with atomic version numbering (thanks, Claude)
  • Version restoration with some data validation and protected fields (still need to look at pydantic)
  • Version comparison using the DeepDiff library
  • Audit trail tracking version actions (created, manual_save, restored); autosave and deletion still need thought for the future

Data Model

  • Version snapshots saved in 'item_versions' collection
  • Atomic version counters in 'version_counters' collection to ensure two versions can never share the same number
  • User storage: info about the user is stored as a user object that acts as a snapshot of the user details at the time of the version, so it would display old display names, emails etc. (can discuss), alongside an ObjectId for querying
  • Software version tracking in case schemas etc. change
  • Version relationships: tracks whether, for example, version d was restored from version b
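
For reviewers, a minimal sketch of how the atomic counter could work. This is an illustration and not necessarily the code in this PR: the field name "counter" and the helper name are assumptions, and InMemoryCounters is only a stand-in so the snippet runs without MongoDB. With a real pymongo collection, find_one_and_update with $inc and upsert=True does the read-and-increment as a single operation.

```python
from threading import Lock


def next_version_number(counters, refcode: str) -> int:
    """Atomically reserve the next version number for `refcode`.

    `counters` is expected to behave like a pymongo Collection. With the
    default return_document (the "before" state), the call returns None if
    the upsert created the counter, otherwise the document as it was before
    the increment.
    """
    before = counters.find_one_and_update(
        {"refcode": refcode},
        {"$inc": {"counter": 1}},  # "counter" is a hypothetical field name
        upsert=True,
    )
    return (before["counter"] if before else 0) + 1


class InMemoryCounters:
    """Tiny stand-in for the collection so the sketch runs without MongoDB."""

    def __init__(self):
        self._docs = {}
        self._lock = Lock()

    def find_one_and_update(self, query, update, upsert=False):
        with self._lock:
            key = query["refcode"]
            before = self._docs.get(key)
            if before is None and not upsert:
                return None
            current = before["counter"] if before else 0
            step = update["$inc"]["counter"]
            self._docs[key] = {"refcode": key, "counter": current + step}
            return before


counters = InMemoryCounters()
numbers = [next_version_number(counters, "ABC123") for _ in range(3)]
other = next_version_number(counters, "XYZ789")
```

Each refcode gets its own independent sequence, and the unique index on version_counters.refcode stops two counter documents from ever existing for the same item.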

API endpoints

POST /items/<refcode>/save-version/

Manually save a version snapshot of the current item state.

  • Returns: {"status": "success", "version_number": 1, ...}

GET /items/<refcode>/versions/

List all versions for an item (sorted newest first).

  • Returns: {"status": "success", "versions": [...]}

GET /items/<refcode>/versions/<version_id>/

Get detailed data for a specific version.

  • Returns: {"status": "success", "version": {...}}

GET /items/<refcode>/compare-versions/?v1=<id>&v2=<id>

Compare two versions using DeepDiff.

  • Returns: {"status": "success", "diff": {...}, "v1_version_number": 1, "v2_version_number": 2}

POST /items/<refcode>/restore-version/

Restore item to a previous version (creates new version with action="restored").

  • Body: {"version_id": "..."}
  • Returns: {"status": "success", "restored_version": {...}, "new_version_number": 3}

DELETE /items/<refcode>/versions/<version_id>/

Delete a specific version snapshot.

  • Returns: {"status": "success", "message": "..."}
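
To make the request shapes above concrete, here is a hedged client-side sketch that only builds the requests with the standard library. API_URL is a hypothetical deployment URL, the helper names are made up for illustration, and authentication is left out entirely.

```python
import json
from urllib.parse import quote, urlencode
from urllib.request import Request

API_URL = "https://datalab.example.org"  # hypothetical deployment URL


def compare_versions_request(refcode: str, v1: str, v2: str) -> Request:
    """Build GET /items/<refcode>/compare-versions/?v1=<id>&v2=<id>."""
    query = urlencode({"v1": v1, "v2": v2})
    url = f"{API_URL}/items/{quote(refcode)}/compare-versions/?{query}"
    return Request(url, method="GET")


def restore_version_request(refcode: str, version_id: str) -> Request:
    """Build POST /items/<refcode>/restore-version/ with a JSON body."""
    body = json.dumps({"version_id": version_id}).encode("utf-8")
    return Request(
        f"{API_URL}/items/{quote(refcode)}/restore-version/",
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


compare_req = compare_versions_request("ABC123", "a1", "b2")
restore_req = restore_version_request("ABC123", "b2")
```

Passing either object to urllib.request.urlopen would send it; the responses should follow the "Returns" shapes listed for each endpoint.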

Protected Fields on Restore

The following fields are protected during version restoration and will not be overwritten:

  • _id (MongoDB ObjectId)
  • refcode (immutable identifier)
  • last_modified (updated automatically)
  • type (cannot change item type via restore)
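
A minimal sketch of the filtering idea, assuming the snapshot and the current item are plain dicts. The function name is hypothetical and the real endpoint also runs permission checks and model validation, which are not shown here.

```python
# Fields listed above that a restore must never overwrite.
PROTECTED_FIELDS = frozenset({"_id", "refcode", "last_modified", "type"})


def build_restore_update(current_item: dict, snapshot: dict) -> dict:
    """Return only the snapshot fields that are safe to write back."""
    # Refuse cross-type restoration rather than silently ignoring the mismatch.
    if snapshot.get("type") != current_item.get("type"):
        raise ValueError("Cannot restore a version of a different item type")
    return {k: v for k, v in snapshot.items() if k not in PROTECTED_FIELDS}


current = {"_id": "id-1", "refcode": "ABC123", "type": "samples", "name": "new"}
snapshot = {
    "_id": "id-old",
    "refcode": "ABC123",
    "type": "samples",
    "name": "old",
    "description": "first draft",
}
update = build_restore_update(current, snapshot)
```

Here update contains only name and description, so the item's identity fields are left untouched by the restore.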

Automatic Versioning Integration

Version snapshots are automatically created when:

  1. Creating a new item via /new-sample/ (action="created")
  2. Saving an item via /save-item/ (action="manual_save")
  3. Restoring a version via /restore-version/ (action="restored")

Database Optimization

  • Indexes on item_versions.refcode for fast version history lookup
  • Indexes on item_versions.user_id for user contribution queries
  • Compound index on (refcode, version_number) for sorted version history
  • Unique index on version_counters.refcode for atomic version numbering
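
As a rough sketch, the four indexes could be declared as data and applied to any pymongo-style database handle. The helper name and the spec layout are assumptions for illustration; create_index with a list of (field, direction) pairs and the unique option is standard pymongo.

```python
ASCENDING = 1  # same value as pymongo.ASCENDING

# (collection name, index keys, extra create_index options)
VERSION_INDEXES = [
    ("item_versions", [("refcode", ASCENDING)], {}),
    ("item_versions", [("user_id", ASCENDING)], {}),
    ("item_versions", [("refcode", ASCENDING), ("version_number", ASCENDING)], {}),
    ("version_counters", [("refcode", ASCENDING)], {"unique": True}),
]


def ensure_version_indexes(db) -> list:
    """Create each index on a pymongo-style database; returns the index names."""
    return [db[coll].create_index(keys, **opts) for coll, keys, opts in VERSION_INDEXES]
```

The compound index can serve the version history query in either sort direction, so newest-first listing does not need a separate descending index.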

UI Components (Currently Hidden)

A Vue.js VersionHistoryModal component has been implemented with:

  • Version list display (version number, timestamp, user, action)
  • Side-by-side diff viewer for comparing versions
  • One-click version restoration
  • Integration with EditPage

Dependencies

  • Added deepdiff>=7.0.0 for nested structure comparison

Future Work

  • Implement temporary version system for auto-save functionality
  • Uncomment version history UI in EditPage.vue
  • Add version cleanup/archival policies
  • Add version diff visualization in UI
  • Support for version branching/tagging

… and save the same version. Added better error handling for when an invalid id is used
Adds deepdiff ~= 8.1 to project dependencies to enable proper
comparison of nested dictionaries and lists in version control
functionality.
Replaces simple dict_diff function with DeepDiff library to properly
handle nested dictionaries, lists, type changes, and provide detailed
change information for version comparisons.
Adds comprehensive safety checks to restore_version:
- Permissions check requiring write access
- Protected fields list preventing restoration of critical system fields
  (refcode, _id, immutable_id, creator_ids, file_ObjectIds, version)
- Type consistency check preventing cross-type restoration
- Model validation ensuring restored data passes schema validation
- Atomic version incrementing using shared counter to prevent collisions

The version field now always increments forward to avoid duplicate
version numbers when restoring and then making subsequent changes.
Adds action field to track why each version was created:
- 'manual_save': User explicitly saved (save-version endpoint or save-item)
- 'auto_save': Reserved for future block-triggered auto-saves
- 'pre_restore_backup': System backup created before restoring

Refactored version saving into a _save_version_snapshot() helper function
that can be called with different action parameters. The restore_version
endpoint also tracks which version was restored via the restored_from_version field.
Changes save_item to update the item BEFORE saving the version snapshot,
preventing orphaned versions if the item update fails.

Previously: save version → update item (if item update failed, orphaned version)
Now: update item → save version (if version save fails, item is still saved)

If version save fails after successful item update, the error is logged
but the request still succeeds since the user's work has been saved.
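
A sketch of that ordering, with hypothetical callables standing in for the real database write and snapshot helper. The "version_saved" key is an assumption added for illustration and is not part of the API response described in this PR.

```python
import logging

logger = logging.getLogger(__name__)


def save_item_with_version(update_item, save_snapshot) -> dict:
    """Update the item first, then try to snapshot it.

    `update_item` and `save_snapshot` are hypothetical zero-argument callables.
    """
    update_item()  # if this raises, no snapshot has been written yet
    try:
        save_snapshot()
    except Exception:
        # The user's work is already saved, so log the failure and carry on.
        logger.exception("Item saved, but the version snapshot failed")
        return {"status": "success", "version_saved": False}
    return {"status": "success", "version_saved": True}


calls = []
ok = save_item_with_version(lambda: calls.append("item"), lambda: calls.append("version"))
degraded = save_item_with_version(lambda: None, lambda: 1 / 0)
```

An exception from update_item propagates before any snapshot exists, which is what prevents orphaned versions.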
Add version field to the HasRevisionControl Pydantic model to support
the version control system's snapshot tracking. Fix the save_item
endpoint to correctly increment version by adding it to updated_data
rather than the discarded item object.
Add 33 tests covering all version control functionality:
- Save, list, get, compare, restore, and delete version endpoints
- Auto-versioning on save_item
- Atomic version counter with race condition prevention
- Protected field validation during restore
- Permissions enforcement
- Error handling and edge cases
- Add action and restored_from_version fields to list_versions endpoint
- Change restore to create version snapshot AFTER restoring (not before)
- Version snapshot now contains the restored data for clearer audit trail
- Update action type from "pre_restore_backup" to "restored"
- Add version control API service methods to server_fetch_utils.js
- Create VersionHistoryModal component for viewing and managing versions
- Add version history button to EditPage navbar
- Support version preview and restore functionality with proper state management
- Add new TestActionFields class with 5 tests validating action values
- Test manual_save action from save-version endpoint
- Test manual_save action from save-item endpoint (user saves)
- Test restored action with restored_from_version reference
- Test that restored version snapshots contain the restored data
- Test complete audit trail across multiple saves and restore
- Rename test_list_versions_action_field to be more descriptive
- Update test_restore_version_creates_backup to _creates_snapshot
- Remove duplicate action field tests from TestRestoreVersion class
- Fix unused variable in test_get_version_success
@be-smith be-smith changed the title from "Bes/revision history clean history" to "Adding version control to samples, starting materials and cells" Oct 13, 2025
@codecov

codecov bot commented Oct 13, 2025

Codecov Report

❌ Patch coverage is 90.66667% with 14 lines in your changes missing coverage. Please review.
✅ Project coverage is 80.46%. Comparing base (5920ede) to head (888445a).

Files with missing lines | Patch % | Lines
pydatalab/src/pydatalab/routes/v0_1/items.py | 90.27% | 14 Missing ⚠️
Additional details and impacted files
@@            Coverage Diff             @@
##             main    #1373      +/-   ##
==========================================
+ Coverage   80.14%   80.46%   +0.31%     
==========================================
  Files          70       70              
  Lines        4799     4949     +150     
==========================================
+ Hits         3846     3982     +136     
- Misses        953      967      +14     
Files with missing lines | Coverage Δ
pydatalab/src/pydatalab/models/traits.py | 98.68% <100.00%> (+0.03%) ⬆️
pydatalab/src/pydatalab/mongo.py | 81.81% <100.00%> (+0.99%) ⬆️
pydatalab/src/pydatalab/routes/v0_1/items.py | 87.37% <90.27%> (+1.17%) ⬆️

@cypress

cypress bot commented Oct 13, 2025

datalab    Run #4039

Run Properties: Passed #4039  •  git commit 86f1f6625f: Merge 888445a7c63c1d38a2940c3219e9828b3653cd16 into 5920ede8f9bbe3574bef6f2e31d7...
Project: datalab
Branch Review: bes/revision_history_clean_history
Run status: Passed #4039
Run duration: 07m 28s
Commit: git commit 86f1f6625f: Merge 888445a7c63c1d38a2940c3219e9828b3653cd16 into 5920ede8f9bbe3574bef6f2e31d7...
Committer: Ben Smith

Test results
Failures: 0
Flaky: 0
Pending: 0
Skipped: 0
Passing: 336

@ml-evs ml-evs moved this to Todo in merge stack Oct 30, 2025
be-smith and others added 10 commits November 5, 2025 14:46
item_versions.refcode for finding history of one sample
item_versions.user_id for user contributions to versions
refcode and version number for ordered version history
version_counters.refcode for version numbering
…ot at the time a version is made, i.e. won't reflect changes to display name.

Also has a user_id as an ObjectId that can be used for fast lookups and joins with the user collection
… restoring data.

Added software version test
@be-smith be-smith marked this pull request as ready for review November 5, 2025 15:12
Comment on lines +1041 to +1049
"_id": 1,
"timestamp": 1,
"user": 1,
"software_version": 1,
"version_number": 1,
"action": 1,
"restored_from_version": 1,
"data.version": 1,
},
Member

Could you add a model for things that go in this collection?

As discussed, we really need to use the pydantic models wherever possible when operating on these.

@ml-evs ml-evs mentioned this pull request Nov 5, 2025
Member

@ml-evs ml-evs left a comment

JS/UI side looks good and functional, just a few more comments before we can try this out on deployments -- thanks @be-smith!

"restored_from_version": str(
version_object_id
), # Track which version was restored from
"user": user_snapshot, # Snapshot for fast display
Member

Suggested change
- "user": user_snapshot, # Snapshot for fast display

As mentioned, I'd just store the ID then recreate it on egress via something like the creators_lookup method in this file which does an aggregation as:

def creators_lookup() -> dict:
    return {
        "from": "users",
        "let": {"creator_ids": "$creator_ids"},
        "pipeline": [
            {"$match": {"$expr": {"$in": ["$_id", {"$ifNull": ["$$creator_ids", []]}]}}},
            {"$addFields": {"__order": {"$indexOfArray": ["$$creator_ids", "$_id"]}}},
            {"$sort": {"__order": 1}},
            {"$project": {"_id": 1, "display_name": 1, "contact_email": 1}},
        ],
        "as": "creators",
    }

flask_mongo.db.item_versions.insert_one(
{
"refcode": refcode,
"version_number": next_version_number,
Member

Suggested change
- "version_number": next_version_number,
+ "version": next_version_number,

Comment on lines +1234 to +1236
"restored_from_version": str(
version_object_id
), # Track which version was restored from
Member

Suggested change
- "restored_from_version": str(
- version_object_id
- ), # Track which version was restored from
+ "restored_from_version": ObjectId(version_object_id), # Track which version was restored from

better to use the real ID in the database

), # Track which version was restored from
"user": user_snapshot, # Snapshot for fast display
"user_id": user_id, # ObjectId for efficient querying
"software_version": software_version,
Member

Suggested change
- "software_version": software_version,
+ "datalab_version": software_version,

Comment on lines +1221 to +1225
# Get the software version
try:
software_version = get_package_version("datalab-server")
except PackageNotFoundError:
software_version = "unknown"
Member

Suggested change
- # Get the software version
- try:
- software_version = get_package_version("datalab-server")
- except PackageNotFoundError:
- software_version = "unknown"
+ from pydatalab import __version__
+ software_version = __version__

better to use this, which defaults to develop if it can't find it

Development

Successfully merging this pull request may close these issues.

Logging revisions and changes to items