Fix pending files retry and scheduler reliability (v2.5.51-v2.5.57) #31

Merged
ttlequals0 merged 8 commits into main from fix/redis-connection-crash
Jan 4, 2026
@ttlequals0 ttlequals0 commented Jan 1, 2026

Summary

This PR fixes several issues in Redis connection handling, the scheduler, integrity scan reports, and HEIC validation, and removes obsolete migration scripts.

v2.5.57 - Startup Crash Fix & Cleanup

  • Fix NameError crash on startup (regression from v2.5.56)
  • Remove obsolete migration scripts (migrate_db.py, migrate_db_safe.py, run_migration.py)
  • Update documentation to reflect automatic migration system

v2.5.56 - Scheduler & Integrity Scan Fixes

  • Fix scheduler not executing jobs (revert to v2.5.19 behavior)
  • Fix integrity scan reports missing changed files list

v2.5.55 - UI Terminology Update

  • Rename "File Changes" to "Integrity Scan" throughout the UI

v2.5.54 - Enhanced Redis Resilience

  • Enhanced Celery task result handling with exponential backoff
  • Added connection pool reset on Redis failures
  • Added Celery transport options for connection stability

v2.5.53 - Celery Redis Wrappers

  • Added safe wrappers for Celery AsyncResult operations
  • Protected task.ready() and task.get() from Redis crashes

v2.5.52 - HEIC False Positive Fix

  • Fix false positive corruption detection for HEIC files from iOS 18 devices
  • Detect libheif "auxiliary image" limitation errors and treat as warnings

v2.5.51 - Redis Connection Fix

  • Fix application crash with ConnectionResetError during file changes check
  • Created robust Redis connection handling with connection pooling and retry logic

Files Changed

Core Fixes:

  • app.py - Scheduler lock handling, version endpoint Redis utilities
  • pixelprobe/services/maintenance_service.py - Redis wrappers, changed files list fix
  • pixelprobe/progress_utils.py - Robust Redis client utilities
  • pixelprobe/api/scan_routes.py - Safe task state checking
  • celery_config.py - Transport options for resilience
  • media_checker.py - HEIC libheif error detection

UI Updates:

  • templates/index.html - Terminology updates
  • static/js/app.js - Notification message updates

Cleanup (v2.5.57):

  • Deleted tools/migrate_db.py, tools/migrate_db_safe.py, tools/run_migration.py
  • Updated tools/README.md, tools/MIGRATION_GUIDE.md, docs/maintenance/TOOLS_AND_SCRIPTS.md

Test plan

  • All 196 tests pass
  • Docker image v2.5.57 built and pushed
  • App starts without NameError
  • Scheduler acquires Redis lock and executes jobs
  • Integrity scan reports include changed files
  • Redis connection remains stable during long scans

ttlequals0 and others added 2 commits January 1, 2026 15:27
Root cause: redis.from_url() returns low-level Connection object,
not full Redis client. When connection was reset by peer, code
crashed with: 'Connection' object has no attribute 'register_connect_callback'

Solution:
- Added robust Redis connection handling in progress_utils.py
- Connection pooling with health_check_interval=30
- Auto-retry on connection failures (3 attempts, 1s delay)
- socket_keepalive=True to detect stale connections
- retry_on_timeout=True for transient failures

New utility functions:
- get_redis_client(): Returns properly configured Redis client
- get_redis_info(): Safely gets Redis server info with retry
- with_redis_retry(): Decorator for Redis operation retry logic

Updated maintenance_service.py and app.py to use new utilities
instead of problematic redis.from_url() pattern.
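The retry behaviour described above (3 attempts, 1s delay) can be sketched as a decorator. This is a minimal illustration under stated assumptions, not the code in progress_utils.py: the name `with_redis_retry` comes from the commit message, but its parameters (`max_attempts`, `delay`, `retry_on`) and the `flaky_info` stub are hypothetical. It uses only the standard library, so it catches the built-in `ConnectionError` family (which includes `ConnectionResetError`); with redis-py one would likely also pass `redis.exceptions.ConnectionError` in `retry_on`.

```python
import functools
import time


def with_redis_retry(max_attempts=3, delay=1.0,
                     retry_on=(ConnectionError, TimeoutError)):
    """Retry the wrapped callable when a connection-level error is raised.

    Hypothetical signature; the defaults mirror the "3 attempts, 1s delay"
    mentioned in the commit message.
    """
    def decorator(func):
        @functools.wraps(func)
        def wrapper(*args, **kwargs):
            for attempt in range(1, max_attempts + 1):
                try:
                    return func(*args, **kwargs)
                except retry_on:
                    if attempt == max_attempts:
                        raise  # out of attempts: surface the original error
                    time.sleep(delay)
        return wrapper
    return decorator


# Assumption: with redis-py, the pooled client from the commit message could be
# built roughly like this (these keyword arguments exist in redis-py):
#   pool = redis.ConnectionPool.from_url(
#       url, health_check_interval=30, socket_keepalive=True,
#       retry_on_timeout=True)
#   client = redis.Redis(connection_pool=pool)

# Stub that fails twice with a reset error, then succeeds.
calls = {"n": 0}


@with_redis_retry(max_attempts=3, delay=0)
def flaky_info():
    calls["n"] += 1
    if calls["n"] < 3:
        raise ConnectionResetError("connection reset by peer")
    return {"redis_version": "stub"}


result = flaky_info()
```

Re-raising on the final attempt keeps the failure visible to the caller, so a wrapper like get_redis_info() can decide whether to log it or return a fallback value.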

Files affected:
- pixelprobe/progress_utils.py
- pixelprobe/services/maintenance_service.py
- app.py
- version.py
- CHANGELOG.MD

HEIC files from iOS 18 devices were incorrectly flagged as corrupted due
to older libheif versions not supporting shared auxiliary images feature.

- Detect libheif "auxiliary image" errors in ImageMagick stderr
- Detect "cannot identify" errors on HEIC files in PIL
- Treat these as warnings instead of corruption flags
- Files now show as "HEIC validation skipped: libheif version limitation"

Ref: github.com/strukturag/libheif/issues/1190

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
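The HEIC handling described in this commit amounts to classifying a decoder error as either a known libheif limitation or real corruption. The sketch below is one way that could look, not the code in media_checker.py: the function name `classify_heic_error`, the extension set, and the sample error strings are hypothetical, and applying both substring checks only to HEIC/HEIF files is an assumption. The two matched substrings and the skip message are taken from the commit message.

```python
import os

HEIC_EXTENSIONS = {".heic", ".heif"}  # assumed set of extensions
SKIP_MESSAGE = "HEIC validation skipped: libheif version limitation"


def classify_heic_error(path, error_text):
    """Return (is_corrupted, warning) for a decoder error message.

    Known libheif limitations on HEIC files become warnings; any other
    error is still treated as corruption.
    """
    is_heic = os.path.splitext(path)[1].lower() in HEIC_EXTENSIONS
    text = error_text.lower()
    known_limitation = "auxiliary image" in text or "cannot identify" in text
    if is_heic and known_limitation:
        return False, SKIP_MESSAGE
    return True, None


# Made-up error strings, for illustration only.
heic_result = classify_heic_error("IMG_0001.HEIC", "unsupported: auxiliary image")
jpeg_result = classify_heic_error("photo.jpg", "cannot identify image file")
```

Restricting the downgrade to HEIC/HEIF paths keeps a "cannot identify" error on other formats flagged as corruption.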
@ttlequals0 ttlequals0 changed the title Fix Redis connection crash in maintenance service (v2.5.51) Fix Redis connection crash and HEIC false positive (v2.5.51-2.5.52) Jan 1, 2026
ttlequals0 and others added 6 commits January 1, 2026 16:21
Add safe_task_ready() and safe_task_get() helper functions with retry
logic to handle Redis connection errors in Celery's internal backend.

The previous fix (v2.5.51) addressed application-level Redis connections
but Celery's task.ready() and task.get() methods use their own internal
Redis connection which can still fail with connection reset errors.

This fix wraps those Celery methods with retry logic and graceful
error handling to prevent integrity scan crashes.

Root cause: v2.5.53 wrappers had insufficient retry logic (3 retries, 0.5s delay).
When Redis connection pool gets corrupted, ALL connections are bad.

Solution:
- Enhanced safe_task_ready/safe_task_get with 5 retries and exponential backoff
- Added reset_redis_pool() to force-disconnect and recreate pool on first failure
- Added safe_check_task_state() wrapper for scan_routes.py AsyncResult access
- Added Celery transport options for connection stability
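As a rough illustration of the enhanced wrapper, the sketch below retries `task.ready()` up to 5 times with exponential backoff and resets the pool once, on the first failure. It is written under assumptions: the 0.5s base delay, the `reset_pool` and `sleep` parameters, the `None` return on exhaustion, and the `FlakyTask` stub are all hypothetical, and the real safe_task_ready() in this PR may differ. It catches the built-in `ConnectionError` family only; Celery or redis-py exception types would need to be added in real use.

```python
import time


def safe_task_ready(task, max_retries=5, base_delay=0.5,
                    reset_pool=None, sleep=time.sleep):
    """Call task.ready(), retrying connection errors with exponential backoff.

    Returns None if every attempt fails, so the caller can treat the task
    state as unknown instead of crashing.
    """
    for attempt in range(max_retries):
        try:
            return task.ready()
        except (ConnectionError, TimeoutError):
            if attempt == 0 and reset_pool is not None:
                reset_pool()  # drop possibly corrupted pooled connections once
            if attempt < max_retries - 1:
                sleep(base_delay * (2 ** attempt))  # 0.5s, 1s, 2s, 4s
    return None


class FlakyTask:
    """Stand-in for a Celery AsyncResult that fails a fixed number of times."""

    def __init__(self, failures):
        self.failures = failures

    def ready(self):
        if self.failures > 0:
            self.failures -= 1
            raise ConnectionResetError("connection reset by peer")
        return True


delays, resets = [], []
ok = safe_task_ready(FlakyTask(2),
                     reset_pool=lambda: resets.append(1),
                     sleep=delays.append)
gave_up = safe_task_ready(FlakyTask(10), sleep=lambda _: None)
```

Resetting the pool before the first retry matches the stated root cause: if every pooled connection is bad, retrying against the same pool cannot succeed.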

Files: progress_utils.py, maintenance_service.py, scan_routes.py, celery_config.py

Update user-facing text for consistency:
- Scan reports filter dropdown
- Schedule type dropdowns (2 locations)
- All notification messages (8 locations)
- Scan type display mapping

…hanged files (v2.5.56)

- Revert scheduler to v2.5.19 behavior - remove is_celery_worker check
  - Celery prefork model incompatible with APScheduler threads
  - Allow any process to acquire scheduler lock (gunicorn works correctly)
- Fix integrity scan reports missing changed files
  - Assign local changed_files to self.changed_files_list after processing
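Letting any process compete for the scheduler lock can be sketched with an atomic Redis "set if not exists" call. The commit does not show the lock code, so everything here is an assumption: the function name, the key `scheduler:lock`, the 60-second TTL, and the `FakeRedis` stand-in. The `set(name, value, nx=True, ex=...)` call shape does match redis-py, which returns `None` when the key already exists.

```python
import os


def try_acquire_scheduler_lock(client, key="scheduler:lock", ttl_seconds=60):
    """Return True if this process became the scheduler owner.

    nx=True makes the write succeed only when the key is absent, so at most
    one process wins; ex= lets the lock expire if the owner dies.
    """
    owner = str(os.getpid())
    return bool(client.set(key, owner, nx=True, ex=ttl_seconds))


class FakeRedis:
    """In-memory stand-in that mimics SET with nx=True (expiry is ignored)."""

    def __init__(self):
        self.store = {}

    def set(self, name, value, nx=False, ex=None):
        if nx and name in self.store:
            return None  # redis-py also returns None when NX fails
        self.store[name] = value
        return True


client = FakeRedis()
first = try_acquire_scheduler_lock(client)   # first caller wins the lock
second = try_acquire_scheduler_lock(client)  # later callers are refused
```

Because the check and the write happen in one command, no extra test such as the removed is_celery_worker check is needed to decide which process runs the scheduler.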

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>

v2.5.56 removed is_celery_worker definition but left a reference at line 1091,
causing NameError on startup. This fix removes the orphaned reference from
the file-lock fallback condition.

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>

- Delete tools/migrate_db.py, migrate_db_safe.py, run_migration.py
- Update tools/README.md, MIGRATION_GUIDE.md, docs/maintenance/TOOLS_AND_SCRIPTS.md
- Migrations now run automatically via app_startup_migration.py on startup
- Update CHANGELOG.md with cleanup notes for v2.5.57
@ttlequals0 ttlequals0 changed the title Fix Redis connection crash and HEIC false positive (v2.5.51-2.5.52) Fix pending files retry and scheduler reliability (v2.5.51-v2.5.57) Jan 4, 2026
@ttlequals0 ttlequals0 merged commit 8a340de into main Jan 4, 2026
6 checks passed
@ttlequals0 ttlequals0 deleted the fix/redis-connection-crash branch January 4, 2026 21:08