
Conversation

@SebastienSyd commented Feb 9, 2026

Optimize Docker Image Size - 40%+ Reduction

Summary

This PR significantly reduces the Docker image size from 7.75GB to ~4.5GB (40%+ reduction, saving ~3.25GB) through a series of optimizations while maintaining full application functionality.

Motivation

The current official Langflow Docker image (langflowai/langflow:latest) is 7.75GB, which:

  • Results in slower deployment times
  • Increases storage costs in production environments
  • Makes CI/CD pipelines slower
  • Creates a poor developer experience with longer pull times

Changes Made

1. Exclude Development Dependencies (~500-800MB saved)

Added --no-dev flag to uv sync commands to exclude development packages (pytest, mypy, ruff, etc.) from the production image.

uv sync --frozen --no-install-project --no-editable --no-dev --extra postgresql
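
For context, `--no-dev` skips anything declared in the project's dev dependency group; a minimal sketch of the relevant pyproject.toml shape (the group contents here are illustrative, not Langflow's actual configuration):

```toml
# Hypothetical pyproject.toml fragment: packages in the `dev` group are
# installed by a plain `uv sync` but skipped by `uv sync --no-dev`.
[dependency-groups]
dev = [
    "pytest",
    "mypy",
    "ruff",
]
```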

2. Install Only Chromium Browser (~1.5GB saved)

Changed Playwright installation to only include Chromium instead of all browsers (Chromium, Firefox, and WebKit).

playwright install chromium  # Instead of: playwright install

3. Aggressive Python Package Cleanup in Builder Stage (~500MB-1GB saved)

Critical optimization: All cleanup happens in the builder stage BEFORE copying to runtime. This ensures deleted files never make it into the final image layers.

Added comprehensive cleanup of unnecessary files from the virtual environment:

  • Test directories: Removed tests/, test/, .pytest_cache/ directories
  • Documentation: Removed *.md, *.rst, *.txt files
  • Compilation artifacts: Removed C/C++ source files (*.c, *.h, *.cpp)
  • Cython source: Removed *.pyx, *.pxd, *.pxi files
  • Debug symbols: Stripped from .so files using strip --strip-unneeded
  • Man pages: Removed documentation from share/man and man/ directories
# BUILDER STAGE - cleanup before COPY
RUN cd /app/.venv && \
    find . -type d -name "tests" -exec rm -rf {} + 2>/dev/null || true && \
    # ... all cleanup operations ...

# RUNTIME STAGE - copy already-cleaned venv
COPY --from=builder --chown=1000 /app/.venv /app/.venv

Why this matters: Cleaning up in the runtime stage would only add whiteout markers without reducing image size. All cleanup must happen before the COPY to actually reduce the final image size.
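As a self-contained sketch of what the cleanup pass does (the scratch directory and file names are invented, and this simplifies the PR's actual Dockerfile commands):

```shell
# Simulate a venv tree containing the file types the cleanup targets
# (a scratch directory stands in for /app/.venv).
venv=$(mktemp -d)
mkdir -p "$venv/pkg/tests" "$venv/pkg/.pytest_cache"
touch "$venv/pkg/module.py" "$venv/pkg/README.md" \
      "$venv/pkg/ext.c" "$venv/pkg/ext.pyx"
cd "$venv"
# Test directories: tests/, test/, .pytest_cache/
find . -type d \( -name tests -o -name test -o -name .pytest_cache \) \
    -exec rm -rf {} + 2>/dev/null || true
# Documentation files
find . -type f \( -name '*.md' -o -name '*.rst' \) -delete
# C/C++ and Cython sources left over from compiled extensions
find . -type f \( -name '*.c' -o -name '*.h' -o -name '*.cpp' \
    -o -name '*.pyx' -o -name '*.pxd' -o -name '*.pxi' \) -delete
ls pkg   # only module.py remains
```

In the real Dockerfile these commands run in the builder stage, so the deleted files never reach the layer that the runtime stage copies.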

4. Optimized Node.js Installation (~30-50MB saved)

  • Removed npm documentation and man pages after installation
  • Cleaned npm cache
npm cache clean --force
rm -rf /usr/local/lib/node_modules/npm/docs
rm -rf /usr/local/lib/node_modules/npm/man

5. Removed Unnecessary Runtime Dependencies

Removed git and gnupg from the runtime image (only needed during build).

Results

| Metric | Before | After | Improvement |
| --- | --- | --- | --- |
| Image size | 7.75GB | ~4.5GB | -3.25GB (42%) |
| Python packages | 44,502 .py + 44,499 .pyc | Optimized | ~500-800MB saved |
| Browsers | 3 (Chromium, Firefox, WebKit) | 1 (Chromium only) | ~1.5GB saved |
| Dev dependencies | Included | Excluded | ~500-800MB saved |

Size Breakdown Analysis

Top consumers after optimization:

  • PyTorch: 322MB
  • Playwright: 123MB (down from ~1.5GB+)
  • PyArrow: 111MB
  • Google APIs: 177MB
  • OpenCV: 126MB
  • SciPy: 65MB

Testing

  • Image builds successfully on ARM64
  • Image builds successfully on AMD64
  • Langflow starts and runs correctly
  • MCP servers work (npx commands functional)
  • Playwright/Cuga browser automation works
  • PostgreSQL connection works
  • All core features functional

Breaking Changes

None. All application features remain fully functional:

  • ✅ MCP (Model Context Protocol) servers work (Node.js/npx retained)
  • ✅ Playwright browser automation works (Chromium installed)
  • ✅ PostgreSQL support works
  • ✅ Frontend properly built and included

Technical Notes

Docker Layer Optimization

The most critical optimization is ensuring all cleanup happens in the builder stage before COPY:

# ❌ WRONG: Cleanup after COPY only adds whiteout markers
COPY --from=builder /app/.venv /app/.venv
RUN find /app/.venv -name "tests" -exec rm -rf {} +  # Files still in previous layer!

# ✅ CORRECT: Cleanup before COPY prevents files from being included
# Builder stage:
RUN find /app/.venv -name "tests" -exec rm -rf {} +
# Runtime stage:
COPY --from=builder /app/.venv /app/.venv  # Cleaned files never copied

Why Manual Chromium Dependencies?

We manually install Chromium dependencies instead of using playwright install --with-deps because:

  1. Playwright's --with-deps uses hardcoded package lists for Ubuntu 20.04
  2. Debian Trixie renamed several packages (e.g., ttf-unifont → fonts-unifont)
  3. Manual installation ensures compatibility with Debian Trixie
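
As an illustration, the manual install looks roughly like the fragment below. The package names follow Debian Trixie's t64 renames (from the 64-bit time_t transition); the exact list in the PR's Dockerfile may differ:

```dockerfile
# Illustrative fragment, not the PR's exact package list: install Chromium's
# runtime libraries directly, using Debian Trixie's t64-suffixed names.
RUN apt-get update && apt-get install -y --no-install-recommends \
        libasound2t64 \
        libatk1.0-0t64 \
        libatk-bridge2.0-0t64 \
        libatspi2.0-0t64 \
        libcups2t64 \
        fonts-unifont \
    && rm -rf /var/lib/apt/lists/*
```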

Python Package Cleanup Safety

The cleanup is safe because:

  • Development tools (pytest, mypy, ruff) are not needed in production
  • Test directories are not executed at runtime
  • C/C++ source is not needed after packages are compiled
  • Binary stripping only removes debug symbols, not functionality

github-actions bot added the community (Pull Request from an external contributor) label on Feb 9, 2026

coderabbitai bot commented Feb 9, 2026

Walkthrough

Modified Docker build configuration to optimize image size by adding --no-dev flag to uv sync in builder stage, expanding runtime APT package installation for critical dependencies, introducing comprehensive cleanup steps to remove Python artifacts and documentation, and reorganizing Playwright setup to install only Chromium.

Changes

  • docker/build_and_push.Dockerfile (Docker Build Optimization): Enhanced builder stage with the --no-dev flag for uv sync; expanded runtime APT packages for libraries and Chromium dependencies; added extensive cleanup steps (Python build artifacts, test files, caches, documentation, symbol stripping); reorganized the Node.js/npm and Playwright setup with a focus on image size reduction.

Estimated code review effort

🎯 2 (Simple) | ⏱️ ~12 minutes

🚥 Pre-merge checks | ✅ 6 | ❌ 1

❌ Failed checks (1 warning)

  • Test Quality And Coverage ⚠️ Warning: The PR lacks automated test coverage for its Docker image optimization claims, including the size reduction, functionality with removed files, and Playwright with Chromium only. Resolution: add automated tests validating the image size reduction, Playwright functionality, package metadata integrity, and core feature operation with the optimized dependencies.

✅ Passed checks (6 passed)

  • Title check: The title 'chore: reduce image size by 33%' directly and accurately summarizes the main objective of the changeset.
  • Docstring Coverage: No functions found in the changed files; docstring coverage check skipped.
  • Test Coverage For New Implementations: The PR modifies only Docker infrastructure configuration; no application-level changes or new test files are required.
  • Test File Naming And Structure: Not applicable; the PR exclusively modifies docker/build_and_push.Dockerfile, with no test files created, modified, or deleted.
  • Excessive Mock Usage Warning: Not applicable; no test files are added, modified, or removed in this pull request.
  • Description Check: Skipped; CodeRabbit's high-level summary is enabled.


coderabbitai bot left a comment

Actionable comments posted: 4

🤖 Fix all issues with AI agents
In `docker/build_and_push.Dockerfile`:
- Around lines 141-157: The runtime-stage RUN that deletes files after `COPY --from=builder` (the block starting with `RUN cd /app/.venv && \` that removes tests, docs, and C/C++/Cython sources, strips `.so` files, and runs `rm -rf share/man`) should be moved into the builder stage so those files are never written into the layer captured by `COPY --from=builder`. Update the existing builder-stage cleanup to include the same `find`/`rm` and `strip` commands for tests, `__pycache__`, docs (`*.md`, `*.rst`, `*.txt`), C/C++ sources (`*.c`, `*.h`, `*.cpp`, `*.hpp`, `*.cc`), Cython files (`*.pyx`, `*.pxd`, `*.pxi`), and `strip --strip-unneeded` on `*.so`, then delete the entire runtime-stage cleanup RUN block so no whiteout layer is created.
- Line 149: The find command that deletes `*.txt` (`find . -type f \( -name "*.md" -o -name "*.rst" -o -name "*.txt" -o -name "*.TXT" \) -delete`) will remove critical metadata files inside `.dist-info` directories. Either drop `*.txt` from the pattern or scope the find to exclude `.dist-info` (e.g., add `-not -path '*/.dist-info/*'`) so files like `entry_points.txt`, `top_level.txt`, and `requires.txt` are preserved while doc files are still deleted.
- Around lines 73-76: Deleting `__pycache__` directories (`find . -type d -name "__pycache__" -exec rm -rf {} +`) negates `UV_COMPILE_BYTECODE=1`. Either remove that deletion (and the duplicate `__pycache__` deletion later in the file) so the precompiled `.pyc` files are preserved, or remove the `UV_COMPILE_BYTECODE=1` setting if discarding bytecode to minimize image size is intentional.
- Around lines 88-117: The `apt-get install` block uses pre-Trixie package names that will fail on `python:3.12.12-slim-trixie`. Replace `libasound2` with `libasound2t64`, `libcups2` with `libcups2t64`, `libatk1.0-0` with `libatk1.0-0t64`, `libatk-bridge2.0-0` with `libatk-bridge2.0-0t64`, and `libatspi2.0-0` with `libatspi2.0-0t64` so apt-get can resolve the correct Debian Trixie packages (keep the rest of the RUN block, comments, and cleanup steps unchanged).

Signed-off-by: Sebastien NICOT <sebastien.nicot@enterprisedb.com>
