
@manoj-bhamsagar
Contributor

Make SVG processing optional to fix pycairo installation issues

Description

Referring to Issue
This PR addresses critical installation failures on Debian/Ubuntu systems caused by svglib 1.6.0 introducing breaking changes that require pycairo compilation. The pycairo package requires system-level C compilers (gcc) and Cairo development libraries (cairo-dev), which are often not available in minimal Docker images or CI/CD environments.

Problem:

  • Users reported installation failures when svglib>=1.5.1,<2 was a required dependency
  • The issue stems from pycairo requiring C compilation and system libraries
  • This blocks users who don't need SVG processing functionality

Solution:

  • Made svglib an optional dependency via Python's extras mechanism
  • Implemented graceful degradation when SVG dependencies are unavailable
  • Added support for custom SVG parsers to give users flexibility
  • Maintained full backward compatibility

Type of Change

  • Bug fix (non-breaking change which fixes an issue)
  • New feature (non-breaking change which adds functionality)
  • Breaking change (fix or feature that would cause existing functionality to not work as expected)
  • This change requires a documentation update

Note: Although SVG processing now requires explicit installation with the [svg] extra, this is not treated as a breaking change because of graceful degradation. Existing code continues to run: when the dependencies are not installed, SVG attachments are skipped and a warning is logged.

Changes Made

Core Implementation

  1. pyproject.toml: Moved svglib>=1.5.1,<1.6.0 to [project.optional-dependencies] section
  2. base.py: Enhanced process_svg() method with:
    • Custom parser support check
    • Try-except wrapper around svglib imports
    • Graceful degradation returning empty string with warning log
    • Comprehensive docstring explaining optional nature
  3. event.py: Added FileType.SVG = "svg" to enum for custom parser registration
  4. requirements.txt: Auto-updated to reflect dependency changes
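
The control flow described for process_svg() can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the code from base.py: the function name `process_svg_sketch`, the `custom_parsers` mapping keyed by the string "svg", and the warning text are all hypothetical. Only the `svglib.svglib.svg2rlg` import refers to a real library entry point.

```python
import logging
from typing import Callable, Mapping, Optional

logger = logging.getLogger(__name__)


def process_svg_sketch(
    svg_bytes: bytes,
    custom_parsers: Optional[Mapping[str, Callable[[bytes], str]]] = None,
) -> str:
    """Hypothetical stand-in for process_svg(); not the reader's actual code."""
    # 1. A user-supplied parser takes precedence over the built-in path.
    if custom_parsers and "svg" in custom_parsers:
        return custom_parsers["svg"](svg_bytes)

    # 2. Import the optional dependency lazily, so the package itself still
    #    imports when the [svg] extra is not installed.
    try:
        from svglib.svglib import svg2rlg  # noqa: F401
    except ImportError:
        logger.warning(
            "SVG support is not installed; skipping SVG attachment. "
            "Install with: pip install 'llama-index-readers-confluence[svg]'"
        )
        return ""

    # 3. The built-in conversion would run here; it is out of scope for
    #    this sketch, which returns an empty string instead.
    return ""
```

Keeping the import inside the function is what lets the package load on systems without pycairo; the cost is that a missing dependency only surfaces at the first SVG attachment rather than at import time.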

Documentation

  1. README.md: Added "Optional Dependencies" section with installation instructions
  2. CHANGELOG.md: Documented the change and migration path
  3. MIGRATION_GUIDE.md: Created comprehensive guide with 4 migration options:
    • Continue without SVG support (default)
    • Install with built-in SVG support
    • Use custom SVG parser
    • Skip SVG files via callback

Testing & Examples

  1. tests/test_svg_optional.py: Added 4 unit tests covering:
    • SVG processing without svglib (graceful degradation)
    • SVG processing with svglib available
    • Edge cases (empty responses)
    • Reader initialization without dependencies
  2. examples/svg_parsing_examples.py: Created working examples demonstrating all 4 approaches
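
One way to exercise the "svglib missing" case without uninstalling anything is to place a None entry in sys.modules, which makes the import raise ImportError. The sketch below shows the idea with the standard library's unittest; the helper `svg_support_available` and the test names are hypothetical and are not taken from tests/test_svg_optional.py, which runs under pytest.

```python
import importlib
import sys
import unittest
from unittest import mock


def svg_support_available() -> bool:
    """Return True if the optional svglib dependency can be imported."""
    try:
        importlib.import_module("svglib.svglib")
    except ImportError:
        return False
    return True


class TestSvgOptional(unittest.TestCase):
    def test_reports_unavailable_when_svglib_missing(self):
        # A None entry in sys.modules makes the import raise ImportError,
        # simulating an environment without the [svg] extra.
        with mock.patch.dict(
            sys.modules, {"svglib": None, "svglib.svglib": None}
        ):
            self.assertFalse(svg_support_available())

    @unittest.skipUnless(svg_support_available(), "svglib not installed")
    def test_reports_available_when_svglib_installed(self):
        self.assertTrue(svg_support_available())
```

The second test is skipped rather than failed when svglib is absent, which mirrors the one skipped test reported in the results below.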

Installation Options

Default (No SVG support):

pip install llama-index-readers-confluence

With SVG support:

pip install 'llama-index-readers-confluence[svg]'

With custom parser (no pycairo needed):

from llama_index.readers.confluence import ConfluenceReader
from llama_index.readers.confluence.event import FileType

reader = ConfluenceReader(
    base_url="https://example.atlassian.com/wiki",
    api_token="your_token",
    custom_parsers={FileType.SVG: YourCustomSVGParser()}
)
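
YourCustomSVGParser above is a placeholder. As a rough sketch of what such a parser might look like, the class below extracts the text nodes from an SVG file using only the standard library. The class name `TextOnlySVGParser`, the `load_data(file_path)` method, and the list-of-strings return type are assumptions for illustration; the interface that ConfluenceReader actually expects from a custom parser should be checked against the reader's documentation.

```python
import xml.etree.ElementTree as ET
from typing import List


class TextOnlySVGParser:
    """Hypothetical custom parser: collects text nodes from an SVG file."""

    def load_data(self, file_path: str) -> List[str]:
        root = ET.parse(file_path).getroot()
        texts: List[str] = []
        for element in root.iter():
            # Tags are namespace-qualified, e.g.
            # "{http://www.w3.org/2000/svg}text", so compare the local name.
            if element.tag.rsplit("}", 1)[-1] == "text":
                # itertext() also gathers text inside nested <tspan> children.
                content = "".join(element.itertext()).strip()
                if content:
                    texts.append(content)
        return texts
```

This approach needs neither svglib nor pycairo, but it only recovers text that is stored as <text> elements; text rendered as paths or embedded images is not captured.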

Backward Compatibility

Fully maintained through graceful degradation:

  • Existing code continues to work without modification
  • SVG attachments are skipped with informative warnings when dependencies are unavailable
  • Users can opt in to SVG support by installing the [svg] extra
  • Custom parsers provide alternative implementation path

How Has This Been Tested?

  • I added new unit tests to cover this change
  • New and existing unit tests pass locally with my changes

Test Results:

pytest tests/test_svg_optional.py -v
==========================================
✅ test_svg_processing_without_svglib PASSED
✅ test_svg_processing_with_empty_response PASSED  
✅ test_reader_initialization_without_svglib PASSED
⏭️  test_svg_processing_with_svglib_available SKIPPED (svglib not installed - expected)

3 passed, 1 skipped in 1.65s

Smoke Tests:

✅ Package imports without SVG dependencies
✅ FileType.SVG enum properly defined
✅ ConfluenceReader initializes successfully
✅ Graceful degradation works correctly
✅ Custom parsers can be registered

New Package?

  • Yes
  • No

Version Bump?

  • Yes - Version bump will be handled by maintainers
  • No

Suggested Checklist

  • I have performed a self-review of my own code
  • I have commented my code, particularly in hard-to-understand areas
  • I have made corresponding changes to the documentation
  • I have added Google Colab support for the newly added notebooks (N/A)
  • My changes generate no new warnings
  • I have added tests that prove my fix is effective or that my feature works
  • New and existing unit tests pass locally with my changes
  • I ran uv run make format; uv run make lint to appease the lint gods

Additional Context

This follows the established Python pattern for optional dependencies (similar to pandas[excel], requests[security], etc.). Users who don't need SVG processing get a faster installation with no C compilation step, while users who need SVG support can explicitly opt in.

The implementation provides four migration paths:

  1. No action needed: Continue without SVG support (default)
  2. Install with [svg] extra: Get built-in SVG processing
  3. Custom parser: Implement your own SVG parsing logic
  4. Skip via callback: Exclude SVG files entirely
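
For the fourth path, a filter callback could look roughly like the sketch below. The function name, its `(media_type, file_size, title)` parameters, and the `(should_process, reason)` return convention are assumptions made for illustration; how the callback is registered on ConfluenceReader is not shown here.

```python
from typing import Tuple

# Standard MIME type for SVG documents.
SVG_MEDIA_TYPE = "image/svg+xml"


def skip_svg_attachments(
    media_type: str, file_size: int, title: str
) -> Tuple[bool, str]:
    """Hypothetical callback: return (should_process, reason)."""
    # Check both the declared media type and the file extension, since
    # attachments are sometimes uploaded with a generic media type.
    if media_type == SVG_MEDIA_TYPE or title.lower().endswith(".svg"):
        return False, "SVG attachments are excluded"
    return True, ""
```

Filtering this way avoids the warning that graceful degradation would otherwise log for every SVG attachment.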

See MIGRATION_GUIDE.md for detailed migration instructions and examples.

…ation issues

This change addresses installation failures on Debian/Ubuntu systems where
svglib 1.6.0 introduced breaking changes that require pycairo compilation,
which fails without gcc and cairo-dev system libraries.

Changes:
- Move svglib dependency to optional extras: pip install 'llama-index-readers-confluence[svg]'
- Add graceful degradation in process_svg() when dependencies unavailable
- Add FileType.SVG enum for custom parser support
- Add comprehensive migration guide with 4 different approaches
- Add unit tests for optional dependency behavior
- Add working examples for all SVG processing options
- Update README and CHANGELOG

Breaking Change:
SVG processing now requires explicit installation with [svg] extra.
Users who need SVG support should install with:
pip install 'llama-index-readers-confluence[svg]'

Backward Compatibility:
Maintained through graceful degradation - SVG attachments are skipped
with informative warnings when dependencies are not installed.

Fixes installation issues on systems without C compilers.
Tested: 3 tests passed, 1 skipped (expected when svglib not installed)
@dosubot (bot) added the size:XL label (this PR changes 500-999 lines, ignoring generated files) on Oct 19, 2025
@AstraBert
Member

AstraBert commented Oct 23, 2025

Can't we just pin the version of svglib to 1.6.0 (your PR does that nevertheless) and bump the version of the integration to 0.5.0 without actually making the dependency optional? Just to avoid breaking things unexpectedly for those who want to upgrade to newer versions of this integration (and want to do SVG processing), without knowing that they would have to install svglib as an extra dependency

@manoj-bhamsagar
Contributor Author

> Can't we just pin the version of svglib to 1.6.0 (your PR does that nevertheless) and bump the version of the integration to 0.5.0 without actually making the dependency optional? Just to avoid breaking things unexpectedly for those who want to upgrade to newer versions of this integration (and want to do SVG processing), without knowing that they would have to install svglib as an extra dependency

OK, can I pin svglib to 1.5.1 instead? Per the issue mentioned in the PR, svglib 1.6.0 is the version that introduced the breaking changes.

@AstraBert
Member

Yeah, I would just say: svglib>=1.5,<1.6 :)

@dosubot (bot) added the size:XS label (this PR changes 0-9 lines, ignoring generated files) and removed the size:XL label (this PR changes 500-999 lines, ignoring generated files) on Oct 26, 2025
@manoj-bhamsagar
Contributor Author

> Yeah, I would just say: svglib>=1.5,<1.6 :)

ok, I've reverted the previous changes and pinned the version as suggested.

@dosubot (bot) added the lgtm label (this PR has been approved by a maintainer) on Oct 27, 2025
@AstraBert AstraBert enabled auto-merge (squash) October 27, 2025 12:27
@AstraBert AstraBert merged commit 41baefb into run-llama:main Oct 27, 2025
11 checks passed
