PDF to Excel Converter (GUI)

This is a PyQt5-based GUI application that converts PDF files into Excel files. It offers several extraction modes: tables, plain text, and OCR (Optical Character Recognition) for scanned documents.

Features

PDF Table to Excel: Extracts tables from a PDF and saves them into an Excel file.
PDF Table/Sheet Excel: Extracts multiple tables from a PDF and saves each table into a separate sheet in an Excel file.
PDF to Excel: Extracts all text from a PDF and saves it into an Excel file.
OCR: Uses Optical Character Recognition to extract text from scanned PDFs and saves it into an Excel file.
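The two table options can be approximated with pdfplumber and pandas. The sketch below is a hypothetical illustration, not the repository's code: `clean_table`, `sheet_name`, and `export_tables_per_sheet` are made-up names, and the `PageN_TableM` sheet naming is an assumption. It relies on pdfplumber's `page.extract_tables()` returning each table as a list of rows whose empty cells are `None`. Note that pandas writes `.xlsx` files through an engine such as openpyxl, which has to be installed separately.

```python
from __future__ import annotations


def clean_table(table: list[list[str | None]]) -> tuple[list[str], list[list[str]]]:
    """Turn one pdfplumber table into (unique header, padded body rows)."""
    width = max((len(row) for row in table), default=0)
    # Replace None cells with '' and pad short rows so every row has the same width.
    rows = [
        ["" if cell is None else str(cell).strip() for cell in row] + [""] * (width - len(row))
        for row in table
    ]
    if not rows:
        return [], []
    header, body = rows[0], rows[1:]
    # Excel columns need usable names: fill blanks and de-duplicate repeats.
    used: set[str] = set()
    unique = []
    for i, name in enumerate(header, start=1):
        base = name or f"Column{i}"
        candidate, n = base, 1
        while candidate in used:
            candidate = f"{base}_{n}"
            n += 1
        used.add(candidate)
        unique.append(candidate)
    return unique, body


def sheet_name(page_no: int, table_no: int) -> str:
    # Excel limits sheet names to 31 characters.
    return f"Page{page_no}_Table{table_no}"[:31]


def export_tables_per_sheet(pdf_path: str, output_path: str) -> int:
    """Write every table found in the PDF to its own sheet; return the number of sheets."""
    # Imported here so the helpers above can be used without these packages.
    import pandas as pd
    import pdfplumber

    frames = {}
    with pdfplumber.open(pdf_path) as pdf:
        for page_no, page in enumerate(pdf.pages, start=1):
            for table_no, table in enumerate(page.extract_tables(), start=1):
                header, body = clean_table(table)
                if header:
                    frames[sheet_name(page_no, table_no)] = pd.DataFrame(body, columns=header)
    # Only create the workbook if something was found (an .xlsx needs at least one sheet).
    if frames:
        with pd.ExcelWriter(output_path) as writer:
            for name, frame in frames.items():
                frame.to_excel(writer, sheet_name=name, index=False)
    return len(frames)
```

Putting all tables on a single sheet instead (closer to the first option) could be done by concatenating the frames with `pd.concat` before a single `to_excel` call.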

Requirements

Python 3.x
PyQt5
pdfplumber
pandas
PyMuPDF (fitz)
pdf2image
easyocr
opencv-python (cv2)
numpy
Poppler (system dependency of pdf2image; add its bin folder to PATH)
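To confirm the environment before launching the app, a small standard-library-only check can look for each package and for Poppler. This is a hypothetical helper, not part of the repository; it assumes Poppler is detected through its `pdftoppm` binary, which pdf2image uses to render pages.

```python
from __future__ import annotations

import importlib.util
import shutil

# pip name -> import name (PyMuPDF imports as "fitz", opencv-python as "cv2")
REQUIRED_MODULES = {
    "PyQt5": "PyQt5",
    "pdfplumber": "pdfplumber",
    "pandas": "pandas",
    "PyMuPDF": "fitz",
    "pdf2image": "pdf2image",
    "easyocr": "easyocr",
    "opencv-python": "cv2",
    "numpy": "numpy",
}


def check_requirements() -> dict[str, bool]:
    """Report which requirements are available, without importing the packages."""
    status = {
        name: importlib.util.find_spec(module) is not None
        for name, module in REQUIRED_MODULES.items()
    }
    # Poppler is a system tool, not a Python package: look for its pdftoppm binary on PATH.
    status["poppler"] = shutil.which("pdftoppm") is not None
    return status


if __name__ == "__main__":
    for name, available in check_requirements().items():
        print(f"{name:<15}{'OK' if available else 'MISSING'}")
```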

Installation

Clone or download the repository (Kurama-90/GUI-PDF-to-Excel).

Install the required Python packages:

pip install PyQt5 pdfplumber pandas pymupdf pdf2image easyocr opencv-python numpy

Run the application from the repository folder:

python PDF-to-Excel.py

Usage

Select a PDF File:

Click the "File" button to select a PDF file from your system.

Choose an Option:

Select one of the available options:

PDF Table to Excel: Extracts tables from the PDF.
PDF Table/Sheet Excel: Extracts multiple tables and saves each in a separate sheet.
PDF to Excel: Extracts all text from the PDF.
OCR: Uses OCR to extract text from scanned PDFs.

Save the Output:

Click the "Ok" button to choose the output location and save the Excel file.

Code Structure

PdfConverter Class

This class contains static methods for handling PDF conversion:

convert_pdf_table_to_excel(pdf_path, output_file): Extracts tables from a PDF and saves them into an Excel file.
extract_tables_from_pdf(pdf_path): Extracts multiple tables from a PDF.
export_tables_to_excel(tables, output_excel_path): Exports extracted tables to an Excel file.
convert_pdf_text_to_excel(pdf_path, output_file): Extracts all text from a PDF and saves it into an Excel file.
process_ocr(pdf_path, output_file): Uses OCR to extract text from scanned PDFs.
export_to_excel(detections, output_file): Exports OCR-detected text to an Excel file.
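A rough equivalent of `process_ocr` and `export_to_excel` chains pdf2image, OpenCV, and easyocr. This is a sketch under assumptions, not the author's implementation: the names `detections_to_rows` and `ocr_pdf_to_excel`, the grayscale conversion, the 300 dpi rendering, and the 0.3 confidence cut-off are all illustrative choices. easyocr's `readtext` returns `(bounding_box, text, confidence)` tuples, and the helper sorts them into reading order per page.

```python
from __future__ import annotations


def detections_to_rows(pages: list[list[tuple]], min_confidence: float = 0.0) -> list[dict]:
    """Flatten per-page easyocr results into rows in reading order (top-to-bottom, left-to-right)."""
    rows = []
    for page_no, detections in enumerate(pages, start=1):
        for box, text, confidence in detections:
            if confidence < min_confidence:
                continue  # skip low-confidence guesses
            rows.append({
                "page": page_no,
                "top": min(point[1] for point in box),  # box = four [x, y] corner points
                "left": min(point[0] for point in box),
                "text": text,
                "confidence": round(float(confidence), 3),
            })
    rows.sort(key=lambda row: (row["page"], row["top"], row["left"]))
    return rows


def ocr_pdf_to_excel(pdf_path: str, output_file: str, languages: tuple[str, ...] = ("en",),
                     dpi: int = 300, min_confidence: float = 0.3) -> int:
    """Render each page, OCR it, and write one row per detected text fragment."""
    # Heavy imports kept local so detections_to_rows can be used on its own.
    import cv2
    import easyocr
    import numpy as np
    import pandas as pd
    from pdf2image import convert_from_path  # requires Poppler on PATH

    reader = easyocr.Reader(list(languages))  # downloads models on first use
    pages = []
    for image in convert_from_path(pdf_path, dpi=dpi):  # one PIL image per page
        gray = cv2.cvtColor(np.array(image.convert("RGB")), cv2.COLOR_RGB2GRAY)
        pages.append(reader.readtext(gray))
    rows = detections_to_rows(pages, min_confidence)
    columns = ["page", "top", "left", "text", "confidence"]
    pd.DataFrame(rows, columns=columns).to_excel(output_file, index=False)
    return len(rows)
```

Keeping the coordinates in the sheet makes it possible to regroup words into lines later; dropping low-confidence detections trades some recall for cleaner output.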

Ui_MainWindow Class

This class defines the GUI for the application:

setupUi(MainWindow): Sets up the main window, buttons, and options.
retranslateUi(MainWindow): Sets the text for UI elements.
openFileDialog(): Opens a file dialog to select a PDF file.
onOkButtonClicked(): Handles the conversion process based on the selected option.
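`onOkButtonClicked()` has to route the selected option to the matching `PdfConverter` method. The sketch below shows one way to do that with a lookup table, using only the method signatures listed above; `build_dispatch`, `run_selected`, and `ask_output_path` are hypothetical names, and appending a missing `.xlsx` suffix is an assumed behavior. Because the converter methods are static, the `PdfConverter` class itself could be passed as `converter`.

```python
from __future__ import annotations

from typing import Any, Callable


def build_dispatch(converter: Any) -> dict[str, Callable[[str, str], Any]]:
    """Map each UI option label to a callable taking (pdf_path, output_file)."""
    return {
        "PDF Table to Excel": converter.convert_pdf_table_to_excel,
        "PDF Table/Sheet Excel": lambda pdf_path, output_file: converter.export_tables_to_excel(
            converter.extract_tables_from_pdf(pdf_path), output_file
        ),
        "PDF to Excel": converter.convert_pdf_text_to_excel,
        "OCR": converter.process_ocr,
    }


def run_selected(option: str, pdf_path: str, output_file: str, converter: Any) -> str:
    """Validate the inputs, run the chosen conversion, and return the output path used."""
    if not pdf_path:
        raise ValueError("Select a PDF file first.")
    if not output_file:
        raise ValueError("No output file selected.")  # e.g. the save dialog was cancelled
    if not output_file.lower().endswith(".xlsx"):
        output_file += ".xlsx"
    handlers = build_dispatch(converter)
    if option not in handlers:
        raise ValueError(f"Unknown option: {option!r}")
    handlers[option](pdf_path, output_file)
    return output_file


def ask_output_path(parent=None) -> str:
    """Show a Qt save dialog; returns '' if the user cancels."""
    from PyQt5.QtWidgets import QFileDialog  # imported here so the dispatch logic runs without Qt

    path, _selected_filter = QFileDialog.getSaveFileName(
        parent, "Save Excel file", "", "Excel Files (*.xlsx)"
    )
    return path
```

Raising `ValueError` keeps the dispatch free of Qt code; the slot can catch it and show the message in a `QMessageBox`.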

Example

Launch the application.
Select a PDF file using the "File" button.
Choose an option (e.g., "PDF Table to Excel").
Click "Ok" and select the output location.
The application will generate an Excel file with the extracted data.
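The "PDF to Excel" conversion can also be scripted without the GUI. A minimal sketch, assuming PyMuPDF's `page.get_text()` and one spreadsheet row per non-empty text line (a layout choice; the repository's `convert_pdf_text_to_excel` may arrange the text differently). `text_to_rows` and `pdf_text_to_excel` are hypothetical names.

```python
from __future__ import annotations


def text_to_rows(page_texts: list[str]) -> list[dict]:
    """One row per non-empty line, tagged with its 1-based page and line number."""
    rows = []
    for page_no, text in enumerate(page_texts, start=1):
        lines = [line.strip() for line in text.splitlines() if line.strip()]
        for line_no, line in enumerate(lines, start=1):
            rows.append({"page": page_no, "line": line_no, "text": line})
    return rows


def pdf_text_to_excel(pdf_path: str, output_file: str) -> int:
    """Extract the text layer of every page and write it to an .xlsx file."""
    # Imported here so text_to_rows works without these packages.
    import fitz  # PyMuPDF
    import pandas as pd

    with fitz.open(pdf_path) as doc:
        texts = [page.get_text() for page in doc]
    rows = text_to_rows(texts)
    pd.DataFrame(rows, columns=["page", "line", "text"]).to_excel(output_file, index=False)
    return len(rows)
```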

Notes

Ensure that the selected PDF file is not corrupted or password-protected.
For OCR functionality, the PDF should contain scanned images of text.
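Both notes can be checked before converting. This hypothetical preflight helper uses PyMuPDF: `Document.needs_pass` reports password protection, a failed `fitz.open` indicates an unreadable file, and pages with almost no extractable text are likely scanned images that need the OCR option. The 20-character threshold is an arbitrary assumption.

```python
from __future__ import annotations


def scanned_pages(page_texts: list[str], min_chars: int = 20) -> list[int]:
    """1-based numbers of pages with (almost) no text layer, i.e. likely scanned images."""
    return [n for n, text in enumerate(page_texts, start=1) if len(text.strip()) < min_chars]


def preflight(pdf_path: str) -> dict:
    """Check that a PDF opens, is not password-protected, and whether it needs OCR."""
    import fitz  # PyMuPDF; imported here so scanned_pages works without it

    try:
        doc = fitz.open(pdf_path)
    except Exception as exc:  # the exception type for corrupt files varies across PyMuPDF versions
        return {"ok": False, "reason": f"cannot open file: {exc}"}
    with doc:
        if doc.needs_pass:
            return {"ok": False, "reason": "password-protected"}
        texts = [page.get_text() for page in doc]
    scanned = scanned_pages(texts)
    return {
        "ok": True,
        "pages": len(texts),
        "scanned_pages": scanned,
        "suggest_ocr": bool(texts) and len(scanned) == len(texts),
    }
```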

License

This project is open source and released under the GNU General Public License v3 (see LICENSE.txt).

Author

Kurama-90
