
Conversation


@rushabh31 rushabh31 commented Jul 20, 2025

What does this PR do?

This PR adds support for Groq's vision API and Google Cloud Vertex AI to the vision-parse library, enabling users to choose among multiple powerful vision models when processing images and PDFs.

  • Adds support for Groq vision models
  • Adds support for Vertex AI vision models

Features Added

  1. Groq Integration: Implemented full support for Groq's vision models (meta-llama/llama-4-scout-17b-16e-instruct and meta-llama/llama-4-maverick-17b-128e-instruct)

  2. Vertex AI Integration: Added support for Google Cloud's Vertex AI platform with Gemini models (gemini-1.5-pro-002 and gemini-1.5-flash-002)

  3. Flexible Configuration Options:

    • Added groq_config parameter to both LLM and VisionParser classes
    • Added vertex_config parameter with support for multiple authentication methods (API key, service account JSON, service account key file)
  4. Robust Error Handling:

    • Implemented specific error handling for Groq's pixel size limitations
    • Added error handling for Vertex AI image size and dimension constraints
    • Provided clear guidance to users when images exceed API limits
  5. Documentation & Examples:

    • Created example scripts demonstrating both Groq and Vertex AI API usage
    • Added test examples for verifying integrations
    • Updated dependency management (pyproject.toml) with appropriate requirements
  6. Performance Optimization: Added guidance on proper image resolution settings to stay within API limits (see the resizing sketch after this list)

  7. Page-Level Visual Analysis: Implemented a new workflow to send entire page images to LLMs for detecting and summarizing embedded visuals like images, diagrams, charts, and visualizations

  8. Configurable Visual Summary: Added enable_image_summary parameter to toggle visual element detection and summary generation
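
To make the resolution guidance in point 6 concrete, here is a minimal sketch (not code from this PR) that downscales an image with Pillow before it is sent to a provider. The MAX_PIXELS value is a placeholder, not a documented Groq or Vertex AI limit.

```python
# Illustrative only: shrink an image so width * height stays under a
# provider's pixel budget. MAX_PIXELS is a placeholder value, not the
# documented Groq or Vertex AI limit.
from PIL import Image

MAX_PIXELS = 16_000_000  # placeholder cap on total pixels


def downscale_if_needed(in_path: str, out_path: str) -> str:
    img = Image.open(in_path)
    w, h = img.size
    if w * h <= MAX_PIXELS:
        return in_path  # already within the budget
    scale = (MAX_PIXELS / (w * h)) ** 0.5
    img = img.resize((max(1, int(w * scale)), max(1, int(h * scale))), Image.LANCZOS)
    img.save(out_path)
    return out_path
```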

Implementation Details

Groq Integration:

  • Added Groq models to supported models in constants.py
  • Extended LLM class to include Groq client initialization and request handling
  • Updated VisionParser to accept and pass through Groq configuration
  • Added proper error detection and messaging for Groq-specific limitations
  • Added proper optional dependency for Groq in pyproject.toml
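
A hedged sketch of what Groq usage might look like with these changes, assuming the existing convert_pdf entry point stays the same; the groq_config field names are assumptions for illustration, not the finalized API.

```python
# Sketch only: how the new groq_config parameter might be passed to
# VisionParser. The "api_key" field name is an assumption.
from vision_parse import VisionParser

parser = VisionParser(
    model_name="meta-llama/llama-4-scout-17b-16e-instruct",  # Groq vision model added by this PR
    groq_config={"api_key": "your-groq-api-key"},            # assumed config shape
    temperature=0.4,
)

markdown_pages = parser.convert_pdf("document.pdf")
for i, page in enumerate(markdown_pages):
    print(f"--- Page {i + 1} ---\n{page}")
```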

Vertex AI Integration:

  • Added Vertex AI models to supported models in constants.py
  • Implemented Vertex AI client initialization with multiple authentication methods
  • Added _vertex method to handle image processing through Vertex AI
  • Updated VisionParser to accept and pass vertex_config parameter
  • Added comprehensive error handling for Vertex AI limitations
  • Created usage examples for Vertex AI integration
  • Added proper optional dependencies for Vertex AI in pyproject.toml
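
A sketch of the three authentication paths described above; the vertex_config keys shown (project_id, location, api_key, service_account_json, service_account_file) are assumptions for illustration rather than the exact names introduced by this PR.

```python
# Sketch only: possible shapes of vertex_config for the three auth methods
# described above. All field names are assumptions.
from vision_parse import VisionParser

vertex_config = {
    "project_id": "my-gcp-project",
    "location": "us-central1",
    # Choose one of the authentication options:
    # "api_key": "your-api-key",
    # "service_account_json": {"type": "service_account", "...": "..."},
    "service_account_file": "path/to/sa-key.json",
}

parser = VisionParser(
    model_name="gemini-1.5-pro-002",  # Vertex AI model added by this PR
    vertex_config=vertex_config,
)
markdown_pages = parser.convert_pdf("document.pdf")
```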

Page-Level Visual Analysis:

  • Implemented new workflow to send entire page images to LLMs for visual element detection
  • Added page_visuals_prompt template for instructing LLMs to identify and summarize embedded images, charts, diagrams, etc.
  • Created detect_page_visuals method in LLM class to handle the visual detection and summarization process
  • Updated parser workflow to integrate visual summaries into the markdown output
  • Removed legacy individual image extraction code in favor of the more efficient page-level approach
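
To show the shape of this workflow (not the PR's actual implementation), a minimal sketch: each rendered page image is sent with a visuals prompt, and any returned summary is appended to that page's markdown. The llm.generate call and the prompt text below are stand-ins for the real detect_page_visuals method and page_visuals_prompt template.

```python
# Illustrative sketch of the page-level workflow; `llm.generate` and the
# prompt text stand in for the PR's detect_page_visuals method and
# page_visuals_prompt template.
def summarize_page_visuals(llm, page_image_b64: str, page_markdown: str) -> str:
    prompt = (
        "List any images, charts, diagrams, or other visualizations on this "
        "page and briefly summarize what each one shows."
    )
    summary = llm.generate(prompt=prompt, image=page_image_b64)  # hypothetical call
    if summary and summary.strip():
        return f"{page_markdown}\n\n**Visual elements on this page:**\n{summary}"
    return page_markdown
```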

Configurable Visual Summary:

  • Added enable_image_summary parameter to LLM class with default value of True
  • Extended VisionParser to accept and pass this parameter to LLM instance
  • Updated the conversion logic to conditionally perform visual analysis based on the parameter value
  • Added example usage in documentation to demonstrate how to toggle the feature
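
A short usage sketch of the toggle; the enable_image_summary name comes from this PR, while the other arguments are placeholders reused from the earlier examples.

```python
# Sketch: disabling the page-level visual summary step. The parameter name
# comes from this PR; other arguments are placeholders.
from vision_parse import VisionParser

parser = VisionParser(
    model_name="gemini-1.5-flash-002",
    vertex_config={"service_account_file": "path/to/sa-key.json"},  # assumed field name
    enable_image_summary=False,  # skip visual detection and summaries entirely
)
markdown_pages = parser.convert_pdf("document.pdf")
```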

Before submitting

  • This PR improves the library by adding support for new LLM providers (Groq and Vertex AI)
  • Ran make lint and make format to handle lint / formatting issues
  • Ran make test to run the relevant test scripts
  • Read the contributor guidelines
  • Wrote example code demonstrating the new functionalities
  • Added tests for both Groq and Vertex AI integrations

Testing

Groq Testing

  • Manually tested with the Groq API to verify the implementation works correctly
  • Created unit tests for both the LLM and parser classes to ensure proper integration with Groq
  • Verified error handling for image size limitations

Vertex AI Testing

  • Added tests for Vertex AI client initialization with different authentication methods
  • Created unit tests for both the LLM and parser classes to ensure proper integration with Vertex AI
  • Verified proper error handling for Vertex AI-specific limitations
  • Created example script demonstrating usage of Vertex AI with proper configuration

@rushabh31 requested a review from iamarunbrahma as a code owner on July 20, 2025 at 17:54
@dosubot (bot) added the size:L (This PR changes 100-499 lines, ignoring generated files) and enhancement (New feature or request) labels on Jul 20, 2025
@rushabh31 (Author) commented:

@iamarunbrahma Added support for Groq vision models.

@rushabh31 changed the title from "adding code for groq support" to "[Feature] adding code for groq support" on Jul 20, 2025
@dosubot (bot) added the size:XL (This PR changes 500-999 lines, ignoring generated files) label and removed the size:L label on Jul 20, 2025
@dosubot (bot) added the lgtm (This PR has been approved by a maintainer) label on Jul 23, 2025