Austin's Multilanguage Invoice Extractor is a Streamlit-based web application designed to analyze invoice images using Google Gemini AI's generative capabilities. The application allows users to upload invoice images and provides detailed answers to any input queries regarding the uploaded invoices.
- Getting Started
- File Structure
- Installation Guide
- Running the Application
- Environment Variables
- Usage Guide
- Error Handling
- Dependencies
This application provides a streamlined approach to extract, understand, and analyze invoice images in multiple languages using a combination of Google’s Generative AI (Gemini model), Python libraries for handling PDFs, and machine learning tools.
The application takes as input an image of an invoice and a user prompt (question), and it generates a detailed response based on the content of the invoice image. This is done using Google's Generative AI.
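The flow described above can be sketched roughly as follows. This is an illustration under stated assumptions, not the code in app.py: the helper names `build_image_part` and `ask_about_invoice` are hypothetical, the order of the request parts is an assumption, and the model is passed in as any object with a `generate_content(parts)` method whose result has a `.text` attribute (the shape exposed by `GenerativeModel` in the google-generativeai package). A stand-in `EchoModel` is used so the sketch runs without an API key.

```python
from types import SimpleNamespace


def build_image_part(data: bytes, mime_type: str) -> dict:
    """Wrap raw image bytes in an inline-data dict (MIME type plus bytes)."""
    if not data:
        raise ValueError("image data is empty")
    return {"mime_type": mime_type, "data": data}


def ask_about_invoice(model, question: str, image_bytes: bytes, mime_type: str) -> str:
    """Send the user's question and the invoice image to the model, return its text."""
    parts = [question, build_image_part(image_bytes, mime_type)]
    response = model.generate_content(parts)
    return response.text


class EchoModel:
    """Hypothetical stand-in for a Gemini model; it only echoes what it received."""

    def generate_content(self, parts):
        question, image = parts
        summary = f"{question} [{image['mime_type']}, {len(image['data'])} bytes]"
        return SimpleNamespace(text=summary)


if __name__ == "__main__":
    print(ask_about_invoice(EchoModel(), "What is the total amount?", b"\x89PNG", "image/png"))
```

Passing the model in as a parameter keeps the request-building logic testable without network access; in the real app the stand-in would be replaced by a configured Gemini model object.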
- app.py: The main Streamlit app that handles user interaction and invoice image processing.
- requirements.txt: Lists the required Python libraries and dependencies.
- .env: Contains the environment variables, including the Google API key.
- venv: Python virtual environment for isolated package management.
- README.md: Documentation for the application.
To start, clone the repository to your local machine:
git clone <repository-url>
cd multilanguage-invoice-extractor
Use a virtual environment to keep the project's dependencies isolated. If you have not created one yet, run the following:
python3 -m venv venv
source venv/bin/activate # For Mac/Linux
# or
venv\Scripts\activate.bat # For Windows
Install the necessary dependencies from the requirements.txt file using pip:
pip install -r requirements.txt
This will install all the required libraries, such as streamlit, google-generativeai, and langchain.
You will need to create a .env file to store your API key for Google Generative AI. Make sure your .env file contains the following:
GOOGLE_API_KEY=<your-google-api-key>
Replace <your-google-api-key> with your actual Google Generative AI API key.
Once all dependencies are installed and the environment is configured, you can run the application using Streamlit. Use the following command to launch the app:
streamlit run app.py
This will launch the Streamlit app in your default web browser.
The .env file is used to securely store sensitive information like API keys. Here's the structure of the .env file:
GOOGLE_API_KEY=<your-google-api-key>
- GOOGLE_API_KEY: Required to authenticate with Google’s Generative AI model (Gemini), which processes and analyzes the invoice image.
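A startup check along these lines can surface a missing key before any request is made. It is a minimal sketch: `require_api_key` is a hypothetical helper, and it assumes the .env file has already been loaded into the process environment (for example by python-dotenv's `load_dotenv()`), which is not shown here.

```python
import os
from typing import Mapping, Optional


def require_api_key(env: Optional[Mapping[str, str]] = None) -> str:
    """Return GOOGLE_API_KEY, or raise if it is missing or still the placeholder."""
    # Assumption: the .env file was loaded into os.environ before this call.
    env = os.environ if env is None else env
    key = env.get("GOOGLE_API_KEY", "").strip()
    # An unedited template value such as "<your-google-api-key>" counts as missing.
    if not key or key.startswith("<"):
        raise RuntimeError("GOOGLE_API_KEY is not set; add it to your .env file")
    return key
```

The check only confirms that a value is present. A key that is present but wrong will still fail later with an authentication error from the API.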
- Input Prompt: In the provided text field, enter a prompt/question you want the model to answer about the invoice.
  Example:
  - "What is the total amount on this invoice?"
  - "Who is the recipient of this invoice?"
- Upload Invoice Image: Upload an invoice image in .jpg, .jpeg, or .png format.
- Submit: Click the "Tell me about the Invoice" button to process the image and get a response.
- Results: The response generated by the Gemini model will be displayed on the page under "The Response is".
- File Not Uploaded: If no file is uploaded and the user clicks the submit button, a FileNotFoundError will be raised with the message "No file uploaded".
- Invalid API Key: Ensure that the API key is correctly configured in the .env file. Incorrect API keys will result in authentication errors when interacting with Google Generative AI.
- Image File Format: Ensure that only supported formats (jpg, jpeg, png) are uploaded. Unsupported formats will lead to file processing errors.
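The upload checks above could be sketched as a single validation step. This is an assumption-laden illustration rather than the app's actual code: `validate_upload` and `SUPPORTED_MIME_TYPES` are hypothetical names, and raising `ValueError` for an unsupported format is a choice made here, since the text above only says such files lead to processing errors.

```python
from pathlib import Path
from typing import Optional

# Extensions named in this README, mapped to their standard MIME types.
SUPPORTED_MIME_TYPES = {
    ".jpg": "image/jpeg",
    ".jpeg": "image/jpeg",
    ".png": "image/png",
}


def validate_upload(filename: Optional[str], data: Optional[bytes]) -> str:
    """Return the MIME type of a supported upload, or raise a descriptive error."""
    if filename is None or not data:
        # Same error type and message as described for the missing-file case.
        raise FileNotFoundError("No file uploaded")
    suffix = Path(filename).suffix.lower()
    if suffix not in SUPPORTED_MIME_TYPES:
        raise ValueError(f"Unsupported image format: {suffix or 'none'}")
    return SUPPORTED_MIME_TYPES[suffix]
```

Rejecting an unsupported file before it reaches the model gives the user a clearer message than a downstream processing failure would.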
Below is a list of all the libraries and dependencies used in this project, as listed in the requirements.txt file:
- streamlit: For building the interactive web interface.
- google-generativeai: To access Google’s Gemini AI model for content generation.
- python-dotenv: To manage environment variables.
- langchain: For natural language processing and chaining model queries.
- pyPDF2: To handle PDF file uploads (for future expansions).
- chromadb: For vector-based operations (optional, for advanced AI workflows).
- PDF Support: Add support for uploading and extracting data from PDF invoices using pyPDF2.
- Language Translation: Integrate language detection and translation to support invoices in multiple languages.
- Model Expansion: Expand the model to support more complex queries and integrate invoice-specific AI models.