How to extract all the images from a PDF page while ignoring the header & footer logo images? #136
Unanswered
thangarajdeivasikamani
asked this question in
Q&A
Replies: 0 comments
Hello Team,
Consider a PDF like https://www.st.com/resource/en/datasheet/stm32f205rb.pdf.
I used the code below to extract the images from it, but I am not getting the images that are actually present in the PDF.
import pymupdf  # import the bindings

fname = "stm32f103c8.pdf"  # PDF file to process
doc = pymupdf.open(fname)  # open document

# iterate over the pages
for page in doc:
    img_number = 0  # for enumerating images per page
    # iterate over the image blocks
    for block in page.get_text("dict")["blocks"]:
        # skip if no image block
        if block["type"] != 1:
            continue
        # build filename, like 'img17-3.jpg'
        name = f"img{page.number}-{img_number}.{block['ext']}"
        with open(name, "wb") as out:
            out.write(block["image"])  # write the binary content
        img_number += 1  # increase image counter
Sometimes only the repetitive footer logo and the side logo are detected as images and extracted. The actual images on the page are not extracted.

I also tried the code below, but it does not extract the required images either:
https://github.com/pymupdf/PyMuPDF-Utilities/blob/master/examples/extract-images/extract-from-xref.py
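For the "ignore the header & footer logo" part, one possible approach is to treat any image that is referenced on most pages as a repeated logo and skip it. The following is only a minimal sketch of that idea, not a verified fix for this datasheet. The helper names `repeated_xrefs` and `extract_non_repeated_images` and the `min_fraction` threshold are hypothetical and chosen for illustration. The PyMuPDF calls used are `page.get_images(full=True)` and `doc.extract_image(xref)`.

```python
from collections import Counter


def repeated_xrefs(xrefs_per_page, min_fraction=0.5):
    """Return image xrefs that occur on more than min_fraction of the pages.

    xrefs_per_page: one list of image xrefs per page.
    The threshold is never below 1, so a single-page document
    does not flag every image as "repeated".
    """
    counts = Counter(x for xrefs in xrefs_per_page for x in set(xrefs))
    threshold = max(1, min_fraction * len(xrefs_per_page))
    return {x for x, n in counts.items() if n > threshold}


def extract_non_repeated_images(fname, min_fraction=0.5):
    """Write every image that is not a repeated logo; return the file names."""
    import pymupdf  # imported here so repeated_xrefs works without PyMuPDF

    doc = pymupdf.open(fname)
    # the first item of each get_images() entry is the image xref
    xrefs_per_page = [
        [img[0] for img in page.get_images(full=True)] for page in doc
    ]
    skip = repeated_xrefs(xrefs_per_page, min_fraction)

    written, seen = [], set()
    for pno, xrefs in enumerate(xrefs_per_page):
        for xref in xrefs:
            if xref in skip or xref in seen:
                continue  # repeated logo, or image already written
            seen.add(xref)
            info = doc.extract_image(xref)  # dict with "ext" and "image"
            name = f"img{pno}-{xref}.{info['ext']}"
            with open(name, "wb") as out:
                out.write(info["image"])
            written.append(name)
    return written
```

Calling `extract_non_repeated_images("stm32f103c8.pdf")` would write the remaining images to the current directory. If the missing figures are vector drawings rather than embedded raster images (common in datasheets, but not checked for this file), neither this sketch nor the image-block code above will find them. In that case the page region would have to be rendered instead, for example with `page.get_pixmap(clip=...)`.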