How to find the file name for files generated by pdf2image - python

I am trying to convert my pdf files to jpg. I first use pdf2image to save the file as a .ppm. Then I want to use PIL to convert the .ppm to .jpg.
How do I find the name of the file that pdf2image saved?
Here is my code:
def to_jpg(just_ids):
    for just_id in just_ids:
        image = convert_from_path('/Users/davidtannenbaum/Desktop/scraped/{}.pdf'.format(just_id), output_folder='/Users/davidtannenbaum/Desktop/scraped/')
        file_name = ?
        im = Image.open("/Users/davidtannenbaum/Desktop/scraped/{}.ppm".format(file_name))
        im.save("/Users/davidtannenbaum/Desktop/scraped/{}.jpg".format(just_id))

You don't need to. convert_from_path returns a list of PIL Image objects, one per page, so the image variable already holds them. You can simply do:
for i, im in enumerate(image):
    im.save("/Users/davidtannenbaum/Desktop/scraped/{}_{}.jpg".format(just_id, i))
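As a minimal sketch of the same idea, the loop can be wrapped in a small helper that also returns the paths it wrote, which answers the "what is the file name" question directly. The helper name save_pages_as_jpeg and the id 123 in the usage comment are hypothetical; the helper only assumes each page has a PIL-style save() method, as the objects returned by convert_from_path do.

```python
from pathlib import Path


def save_pages_as_jpeg(pages, out_dir, stem):
    """Save each page as <stem>_<index>.jpg and return the paths written.

    `pages` is any iterable of objects with a PIL-style save(path, format)
    method, such as the list returned by pdf2image.convert_from_path.
    """
    out_dir = Path(out_dir)
    out_dir.mkdir(parents=True, exist_ok=True)
    written = []
    for index, page in enumerate(pages):
        target = out_dir / f"{stem}_{index}.jpg"
        page.save(target, "JPEG")  # explicit format, independent of the suffix
        written.append(target)
    return written


# Hypothetical usage (requires pdf2image and poppler, not run here):
# from pdf2image import convert_from_path
# pages = convert_from_path("/Users/davidtannenbaum/Desktop/scraped/123.pdf")
# paths = save_pages_as_jpeg(pages, "/Users/davidtannenbaum/Desktop/scraped", "123")
```

Because the file names are chosen by the caller, there is no need to discover what pdf2image wrote to disk.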

The convert_from_path() function has a few more parameters you can use. You can set the paths_only parameter to True and the format parameter fmt to "jpeg".
This saves your images directly to the output folder in JPG format instead of PPM, and the image variable will contain the path to each image instead of the image objects.
for just_id in just_ids:
    image = convert_from_path('/Users/davidtannenbaum/Desktop/scraped/{}.pdf'.format(just_id), output_folder='/Users/davidtannenbaum/Desktop/scraped/', fmt="jpeg", paths_only=True)

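You can also choose the base file name yourself with the output_file parameter:
import os
from pdf2image import convert_from_path

pdf_path = '/path/to/pdf_images/'
output_folder = '/path/for/output/images/'
for pdf in os.listdir(pdf_path):
    # use the PDF's name without its extension as the base name of the images
    filename = os.path.splitext(pdf)[0]
    # output_file is a base name inside output_folder, not a full path
    pdfs = convert_from_path(os.path.join(pdf_path, pdf), output_folder=output_folder, output_file=filename, fmt="jpeg")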

Related

Python script to convert RGB image dataset into Grayscale images using pillow

I want to convert an image RGB dataset to Grayscale dataset using pillow. I want to write a script that takes in the dataset path and converts all images one by one into grayscale. At the end I want to save this script, and want to run this script on the server to avoid the copying of huge data to the server.
This code should work for you:
import os
from PIL import Image

# directory where the images are stored (repeat for other folders such as .../steak)
src_dir = '/content/pizza_steak/test/pizza'
# directory where the grayscale copies are written
dst_dir = '/content/meow'
os.makedirs(dst_dir, exist_ok=True)

for file_name in os.listdir(src_dir):
    # build the full path to the source image
    final_path = os.path.join(src_dir, file_name)
    # convert to 8-bit grayscale ('L' mode) and save
    Image.open(final_path).convert('L').save(os.path.join(dst_dir, f"gray{file_name}"))
    # uncomment this if you want to delete the original file
    # os.remove(final_path)
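Since the question asks for a script that takes the dataset path and handles every image, here is a rough sketch that walks the whole dataset tree and mirrors its folder structure in the output. The function names and the IMAGE_SUFFIXES set are assumptions made for illustration, not part of the answer above.

```python
from pathlib import Path

IMAGE_SUFFIXES = {".jpg", ".jpeg", ".png", ".bmp"}  # assumed set of image types


def plan_grayscale_copies(src_root, dst_root):
    """Return (source, destination) pairs that mirror src_root under dst_root."""
    src_root, dst_root = Path(src_root), Path(dst_root)
    pairs = []
    for src in sorted(src_root.rglob("*")):
        if src.is_file() and src.suffix.lower() in IMAGE_SUFFIXES:
            pairs.append((src, dst_root / src.relative_to(src_root)))
    return pairs


def convert_dataset_to_grayscale(src_root, dst_root):
    """Write an 8-bit grayscale copy of every image found under src_root."""
    from PIL import Image  # imported here so the planning helper runs without Pillow

    for src, dst in plan_grayscale_copies(src_root, dst_root):
        dst.parent.mkdir(parents=True, exist_ok=True)
        with Image.open(src) as im:
            im.convert("L").save(dst)
```

On the server, calling convert_dataset_to_grayscale with the dataset path and an output path (for example taken from sys.argv) would convert the data in place of copying it around. Keep the output folder outside the source folder so converted files are not picked up again.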

Take images from a directory, crop and save all images in the directory

I am trying to crop high-resolution images to something more manageable. I am trying to read in a directory rather than individual images and save the new cropped images in another directory. I would like to make all the output images as .png as I have in my code.
import cv2
path = './imgs2/P5.png'
img = cv2.imread(path)
imgcropped = img[1:400, 1:400]
# cv2.imwrite picks the output format from the file extension
cv2.imwrite('./imgs/P5-cropped.png', imgcropped)
Any help with this problem is appreciated.
Here's what I use in this case:
import os
import cv2
path = 'path/to/image/dir/'
dest_path = 'path/to/destination/'
for f in os.listdir(path):
    image = cv2.imread(os.path.join(path, f))
    imgcropped = image[1:400, 1:400]
    cv2.imwrite(os.path.join(dest_path, f), imgcropped)
This assumes that the images in path are already .png and that path contains only the images you want to convert.
os.listdir gives you the names of the files inside your origin dir (including extension), which you can pass to imwrite to save each image in your destination dir under the same filename.
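If those two assumptions do not hold, a variant along these lines might help. It is only a sketch: png_name and crop_directory are hypothetical names, the crop starts at index 0 instead of 1, and it relies on cv2.imread returning None for files it cannot decode.

```python
import os


def png_name(filename):
    """Return filename with its extension replaced by .png."""
    return os.path.splitext(filename)[0] + ".png"


def crop_directory(src_dir, dst_dir, height=400, width=400):
    """Crop every readable image in src_dir to its top-left height x width region."""
    import cv2  # imported here so png_name can be used without OpenCV

    os.makedirs(dst_dir, exist_ok=True)
    for name in sorted(os.listdir(src_dir)):
        image = cv2.imread(os.path.join(src_dir, name))
        if image is None:  # not an image, or unreadable: skip it
            continue
        # the .png extension tells imwrite which format to write
        cv2.imwrite(os.path.join(dst_dir, png_name(name)), image[:height, :width])
```

Calling crop_directory('./imgs2', './imgs') would then write a .png for each readable input, whatever its original format.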

How to get the images and load it in the variable name same as the image name

I want my code to load all the images automatically. For now I have to write code for each image separately, but I want it to automatically get all the images from the directory, use each image name as the variable that holds the loaded image file, and also derive the name used to store the encodings from the image name.
p_image = face_recognition.load_image_file("p.jpg")
P_face_encoding = face_recognition.face_encodings(p_image)[0]
Source for the face recognition code ( this is not my original code)
https://github.com/ageitgey/face_recognition/blob/master/examples/facerec_from_webcam_faster.py
import glob
import face_recognition

p_image_list = []
for each_image in glob.glob("*.jpg"):
    p_image_list.append(face_recognition.load_image_file(each_image))
p_image_list now contains all the images in the current folder.
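You can use a dictionary whose keys are your variable names (the file names without extension) and whose values are the corresponding file names:
import os
files = os.listdir()
# os.path.splitext returns (name, extension); keep only the name as the key
file_dict = {os.path.splitext(file)[0]: file for file in files}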
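Putting the two answers together, one possible sketch keys the loaded images by file name instead of creating one variable per image. The helper load_by_name is a hypothetical name, and the face_recognition calls in the usage comment are the two already used in the question.

```python
from pathlib import Path


def load_by_name(directory, loader, pattern="*.jpg"):
    """Map each file's stem (name without extension) to loader(path)."""
    return {path.stem: loader(path) for path in sorted(Path(directory).glob(pattern))}


# Hypothetical usage with face_recognition (not run here):
# import face_recognition
# images = load_by_name(".", face_recognition.load_image_file)
# encodings = {name: face_recognition.face_encodings(img)[0]
#              for name, img in images.items()}
# encodings["p"] then takes the place of the hand-written P_face_encoding.
```

Note that face_encodings(img)[0] raises an IndexError when no face is found in an image, so a real script would probably want to check the list before indexing.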

How to convert the pdf file to jpeg images

Here is my program. I want to convert a PDF file into JPEG images. With the code below I get PIL.PpmImagePlugin objects; how can I convert them to JPEG format? Thank you in advance.
from pdf2image import convert_from_path
images = convert_from_path('/home/cioc/Desktop/testingFiles/pdfurl-guide.pdf')
print(images)
You can add an output path and an output format for the images. Each page of your pdf will be saved in that directory in the specified format.
Add these keyword arguments to your code.
images = convert_from_path(
    '/home/cioc/Desktop/testingFiles/pdfurl-guide.pdf',
    output_folder='img',
    fmt='jpeg'
)
This saves each page of your pdf as a jpeg image inside img/. The img directory may need to exist beforehand, so create it first if the call fails.
Alternatively, you can save each page using a loop by calling save() on each image.
from pdf2image import convert_from_path
images = convert_from_path('/home/cioc/Desktop/testingFiles/pdfurl-guide.pdf')
for page_no, image in enumerate(images):
    image.save(f'page-{page_no}.jpeg')
You could use pdf2image parameter fmt='jpeg' to make it return JPEG instead.
You can also manipulate the PPM image just as you would a normal JPEG, since PPM is only the backend file type. Calling image.save('path.jpg') on it saves it as a JPEG.

Python - how to make BMP into JPEG or PDF? so that the file size is not 50MB but less?

I have a scanner. When I scan a page it produces a BMP file, but the size per page is 50MB. How do I tell Python to save it as a JPEG with a small file size?
rv = ss.XferImageNatively()
if rv:
    (handle, count) = rv
    twain.DIBToBMFile(handle,'imageName.bmp')
How do I tell it to make a JPEG or PDF?
Native transfers are always uncompressed images, so your image size will be:
(width-in-inches * dpi) * (height-in-inches * dpi) * bytes-per-pixel
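To see how quickly that formula reaches tens of megabytes, here is a small worked example. The page size (US Letter, 8.5 x 11 inches), the 400 dpi resolution, and 3 bytes per pixel (24-bit colour) are assumed values chosen for illustration, not figures from the question.

```python
def uncompressed_bytes(width_in, height_in, dpi, bytes_per_pixel):
    """Size of an uncompressed scan: pixel width * pixel height * bytes per pixel."""
    pixels_wide = round(width_in * dpi)
    pixels_high = round(height_in * dpi)
    return pixels_wide * pixels_high * bytes_per_pixel


# Assumed example: US Letter page, 400 dpi, 24-bit colour
size = uncompressed_bytes(8.5, 11, 400, 3)
print(f"{size} bytes = {size / 1_000_000:.1f} MB")
# prints "44880000 bytes = 44.9 MB"
```

That is in the same range as the 50MB per page reported in the question, which is why a compressed format such as JPEG makes such a difference.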
You can use something like PIL (http://www.pythonware.com/products/pil/) or Pillow (https://github.com/python-pillow/Pillow), which will save the file in the format you specify based on the filename.
The python TWAIN module will return the bitmap from DIBToBMFile as a string if no filename is specified, so you can feed that string into one of the image libraries to use as a buffer. Otherwise, you can just save to a file, then open that file and resave it, but that's a rather roundabout way of doing things.
EDIT: see (lazy mode on)
from PIL import Image
img = Image.open('C:/Python27/image.bmp')
new_img = img.resize( (256, 256) )
new_img.save( 'C:/Python27/image.png', 'png')
For batch converting:
from PIL import Image
import glob
import os

ext = input('Input the original file extension: ').strip()
new = input('Input the new file extension: ').strip()

# Checks whether each extension was entered with its leading dot.
# If not, it adds it for us:
if not ext.startswith('.'):
    ext = '.' + ext
if not new.startswith('.'):
    new = '.' + new

# Creates a list of all the files with the given extension in the current folder:
files = glob.glob('*' + ext)

# Converts the images, swapping only the trailing extension:
for f in files:
    im = Image.open(f)
    im.save(os.path.splitext(f)[0] + new)
