I have a Python program that slows to a crawl over time. I've tested thoroughly and narrowed it down to a method that downloads an image. The method uses cStringIO and urllib. The problem may also be some sort of endless download with urllib (the program just freezes after a few hundred downloads).
Any thoughts on where the issue may be?
foundImages = []
images = soup.find_all('img')
print('downloading Images')
for imageTag in images:
    gc.collect()
    url = None
    try:
        #load image into a file to determine width and height
        url = imageTag.attrs['src']
        imgFile = StringIO(urllib.urlopen(url).read())
        im = Image.open(imgFile)
        width, height = im.size
        #if width and height are both above a threshold, it is a valid image
        #so add to recipe images
        if width > self.minOptimalWidth and height > self.minOptimaHeight:
            image = MIImage({})
            image.originalUrl = url.encode('ascii', 'ignore')
            image.width = width
            image.height = height
            foundImages.append(image)
        imgFile = None
        im = None
    except Exception:
        # %s formatting also works when url is still None (missing 'src')
        print('failed image download url: %s' % url)
        traceback.print_exc()
        continue
#set the main image to be the first in the array
if len(foundImages) > 0:
    first = foundImages[0]
    recipe.imageUrl = first.originalUrl
return foundImages
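A likely culprit for a loop that eventually freezes is a request that never times out: urllib.urlopen in the code above is called without a timeout, so one stalled server can block the loop forever, and the responses are never explicitly closed. Below is a minimal sketch of a safer fetch, assuming Python 3 (urllib.request; in Python 2, urllib2.urlopen also accepts a timeout argument). The function name fetch_bytes and the 20 MB cap are assumptions for illustration, not anything from the question.

```python
import urllib.request


def fetch_bytes(url, timeout=10.0, max_bytes=20 * 1024 * 1024):
    """Download url and return the response body as bytes.

    - timeout: seconds a blocking socket operation may wait before raising,
      instead of hanging the whole download loop forever.
    - max_bytes: assumed upper bound on an image's size; larger (or endless)
      responses raise ValueError instead of growing memory without limit.
    The `with` block closes the connection even when an error is raised.
    """
    chunks = []
    total = 0
    with urllib.request.urlopen(url, timeout=timeout) as response:
        while True:
            chunk = response.read(64 * 1024)  # read in 64 KiB pieces
            if not chunk:
                break
            total += len(chunk)
            if total > max_bytes:
                raise ValueError("response from %s exceeds %d bytes" % (url, max_bytes))
            chunks.append(chunk)
    return b"".join(chunks)
```

The returned bytes could then be handed to Pillow as Image.open(io.BytesIO(data)) in place of the StringIO(urllib.urlopen(url).read()) line.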
I have been trying to download images from a site using bs4. The images are not JPEG or PNG, so I think bs4 is unable to find them, but I could be wrong about that as well.
Here's my code
#--- IMAGE --- NOT WORKING
#Finds the image URL with the name of the product
#done
image = soup.find('img', attrs={'class':"image_container"})
try:
    image = image.get("src")
except AttributeError:
    print("NO image FOUND")
    image = "NO image FOUND"
if(image != "NO image FOUND"): #if the image is found
    try:
        pos = image.index("?")
        image = "http:" + image[:pos]
    except ValueError:
        pass
    pathImg += name[:nameLength] # Truncates to 5 characters and adds to pathImg file
    if(generateFiles):
        download(image, pathImg) # Downloads image
self.image = image # Exporting var to class global var
Here's a screenshot of where the source is on the page:
[Screenshot: source code for the image container]
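The string handling after image.get("src") only prepends "http:" when the URL contains a "?", so a protocol-relative src without a query string stays unusable. Below is a sketch using the standard library's urllib.parse (Python 3) that resolves src against the page URL and drops any query string. The helper clean_image_url and its page_url parameter (the address of the scraped page) are hypothetical. Separately, if the image_container class sits on a wrapping element rather than on the <img> itself, as the screenshot caption suggests, soup.find('img', attrs={'class': 'image_container'}) returns None; a CSS selector such as soup.select_one('.image_container img') may be what is needed.

```python
from urllib.parse import urljoin, urlsplit, urlunsplit


def clean_image_url(src, page_url):
    """Turn an <img> src value into an absolute URL without a query string.

    urljoin handles protocol-relative ("//cdn...") and relative ("/img/a.jpg")
    values by borrowing the missing scheme/host from page_url.
    """
    absolute = urljoin(page_url, src)
    parts = urlsplit(absolute)
    # Rebuild the URL with the query string and fragment dropped.
    return urlunsplit((parts.scheme, parts.netloc, parts.path, "", ""))
```

For example, clean_image_url("//cdn.example.com/img/p.jpg?v=3", "https://shop.example.com/item/1") gives "https://cdn.example.com/img/p.jpg".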
I was trying to extract images from a PDF using PyMuPDF (fitz). My PDF has multiple images on a single page, and I keep a sequence number in the names of the saved images. However, the images are not extracted in a consistent order: sometimes extraction starts from the bottom of the page, sometimes from the top, and so on. Is there a way to modify my code so that the images are extracted in order?
Here is the code I am using:
import fitz

filename = "document.pdf"
doc = fitz.open(filename)
for i in range(len(doc)):
    p_no = i + 1  # 1-based page number used in the file name
    img_num = 0
    for img in doc.getPageImageList(i):
        xref = img[0]
        pix = fitz.Pixmap(doc, xref)
        img_num += 1
        if pix.n - pix.alpha < 4:  # GRAY or RGB: save directly
            pix.writeImage("%s-%s.jpg" % (p_no, img_num))
        else:  # CMYK: convert to RGB first
            pix1 = fitz.Pixmap(fitz.csRGB, pix)
            pix1.writeImage("%s-%s.jpg" % (p_no, img_num))
            pix1 = None
        pix = None
Given below is a sample page of the pdf
I have the same problem. I've used the following code:
import fitz
import io
from PIL import Image

file = "file_path"
pdf_file = fitz.open(file)
for page_index in range(len(pdf_file)):
    # get the page itself
    page = pdf_file[page_index]
    image_list = page.getImageList()
    # printing number of images found in this page
    if image_list:
        print(f"[+] Found {len(image_list)} images in page {page_index}")
    else:
        print("[!] No images found on the given pdf page", page_index)
    for image_index, img in enumerate(image_list, start=1):
        print(img)
        print(image_index)
        # get the XREF of the image
        xref = img[0]
        # extract the image bytes
        base_image = pdf_file.extractImage(xref)
        image_bytes = base_image["image"]
        # get the image extension
        image_ext = base_image["ext"]
        # load it to PIL
        image = Image.open(io.BytesIO(image_bytes))
        # save it to local disk (passing a path lets Pillow open and close the file)
        image.save(f"image{page_index+1}_{image_index}.{image_ext}")
The most promising approach is probably to find where each image (each 'img' entry) sits on the page and order the images by that position.
I'd love to hear any further suggestions, or whether you found a better idea or solution.
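Ordering by position could look like the sketch below. It is pure Python and assumes you already have, for each image, its xref and a bounding box (x0, y0, x1, y1) in PyMuPDF's page coordinates, where y grows downwards. The function name reading_order and the 5-point row tolerance are assumptions for illustration.

```python
def reading_order(placements, row_tolerance=5.0):
    """Sort (xref, (x0, y0, x1, y1)) pairs top-to-bottom, then left-to-right.

    A smaller y0 means higher on the page. Images whose top edges are within
    row_tolerance points of the first image in a row are treated as one row,
    so slight vertical misalignment does not flip their left-to-right order.
    """
    by_top = sorted(placements, key=lambda p: (p[1][1], p[1][0]))
    rows = []
    for item in by_top:
        if rows and abs(item[1][1] - rows[-1][0][1][1]) <= row_tolerance:
            rows[-1].append(item)  # same visual row as the previous image(s)
        else:
            rows.append([item])  # start a new row
    ordered = []
    for row in rows:
        ordered.extend(sorted(row, key=lambda p: p[1][0]))  # left to right
    return ordered
```

To get the boxes, recent PyMuPDF releases provide page.get_images() and page.get_image_rects(xref); something like [(img[0], tuple(page.get_image_rects(img[0])[0])) for img in page.get_images()] is a possible starting point, though an image that is referenced but never drawn may yield an empty list, and older releases (like the camelCase API used above) name things differently, so check the docs for your version.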
I'm trying to resize an uploaded file. So far I am confident that the image is loaded properly and that the Pillow Image object is created. It runs through my resizing script, but then it always stops at the .resize() call.
When I run the code on my desktop (not on a server), the resize works, but when I combine the resizing script with an image uploaded via POST, it doesn't work and returns a 500 error. What's going on?
I added print imageThumbnail.size right after the imageResizer call and got AttributeError: 'NoneType' object has no attribute 'size'.
def imageResizer(im, pixellimit):
    width, height = im.size
    if width > height:
        #Landscape mode. Scale to width.
        aspectRatio = float(height)/float(width)
        Scaledwidth = pixellimit
        Scaledheight = int(round(Scaledwidth * aspectRatio))
        newSize = (Scaledwidth, Scaledheight)
    else:
        #Portrait (or square) mode. Scale to height.
        aspectRatio = float(width)/float(height)
        Scaledheight = pixellimit
        Scaledwidth = int(round(Scaledheight * aspectRatio))
        newSize = (Scaledwidth, Scaledheight)
    #FAILS RIGHT HERE... I double-checked by putting print flags everywhere, and nothing past this line gets printed
    imageThumbnail = im.resize(newSize)
    return imageThumbnail
Here's the relevant portion of the Flask app:
file = request.files['file']
location = str(args['lat']) + str(args['lon'])
location = location.replace('.','_')
GUID = datetime.strftime(datetime.now(), '%Y%m%d%H%M%S') + location
datetimeEntry = datetime.strftime(datetime.now(), '%Y-%m-%d %H:%M:%S')
fullFileName = GUID + '.' + file.filename.rsplit('.', 1)[1]
if file and allowed_file(file.filename):
    filename = secure_filename(file.filename)
    image = Image.open(file)
    imageThumbnail = imageResizer(image, 800)
    #NOTHING PAST THIS POINT GETS EXECUTED
    imageThumbnailName = GUID + "thumb" + '.' + file.filename.rsplit('.', 1)[1]
    # save the resized image object (not the name string) under the thumbnail name
    imageThumbnail.save(os.path.join(app.config['UPLOAD_FOLDER'], imageThumbnailName))
    file.stream.seek(0)  # rewind: Pillow has already read from the upload stream
    file.save(os.path.join(app.config['UPLOAD_FOLDER_LARGE_IMAGES'], fullFileName))
The problem is that you are trying to open:
file = request.files['file']
image = Image.open(file)
That file is not an actual file, but some metadata object with upload information. What you should do instead is:
image = Image.open(file.stream)
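As a side note on the resizing itself, the aspect-ratio arithmetic can live in a small pure function that also covers square images and is easy to test without Flask. A sketch; scaled_size is a hypothetical helper name, not part of Pillow.

```python
def scaled_size(width, height, pixel_limit):
    """Return (new_width, new_height) so the longer side equals pixel_limit.

    The aspect ratio is preserved; a square image gets pixel_limit on both
    sides. max(1, ...) keeps very thin images from collapsing to 0 pixels.
    """
    if width >= height:
        new_width = pixel_limit
        new_height = max(1, int(round(pixel_limit * height / float(width))))
    else:
        new_height = pixel_limit
        new_width = max(1, int(round(pixel_limit * width / float(height))))
    return new_width, new_height
```

With it, the body of imageResizer reduces to im.resize(scaled_size(im.size[0], im.size[1], 800)). Pillow's Image.thumbnail((800, 800)) does comparable arithmetic in place, but it only ever shrinks an image and never enlarges it.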
I am practicing using Scrapy to crop images with a custom ImagesPipeline.
I am using this code:
class MyImagesPipeline(ImagesPipeline):

    def get_media_requests(self, item, info):
        for image_url in item['image_urls']:
            yield Request(image_url)

    def convert_image(self, image, size=None):
        if image.format == 'PNG' and image.mode == 'RGBA':
            background = Image.new('RGBA', image.size, (255, 255, 255))
            background.paste(image, image)
            image = background.convert('RGB')
        elif image.mode != 'RGB':
            image = image.convert('RGB')
        if size:
            image = image.copy()
            image.thumbnail(size, Image.ANTIALIAS)
        else:
            # crop off the 25px watermark strip at the bottom
            # TODO: replace the watermark with a defined image instead of cropping
            x,y = image.size
            if(y>120):
                image = image.crop((0,0,x,y-25))
        buf = StringIO()
        try:
            image.save(buf, 'JPEG')
        except Exception as ex:
            raise ImageException("Cannot process image. Error: %s" % ex)
        return image, buf
It works well, but there is one problem: if the original images are already in the folder when I run the spider, the newly downloaded images don't replace them. How can I get it to overwrite the original images?
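ImagesPipeline has an expiration setting, IMAGES_EXPIRES, which defaults to 90 days: an image that was downloaded less than that long ago is not downloaded again.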
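A minimal settings.py sketch, assuming the pipeline re-downloads any stored image older than IMAGES_EXPIRES days, so a value of 0 effectively makes every existing file stale and lets the new download overwrite it. The module path and storage directory below are hypothetical placeholders.

```python
# settings.py (sketch)

# Treat every already-stored image as expired, so each crawl downloads it
# again and overwrites the file in IMAGES_STORE (default is 90 days).
IMAGES_EXPIRES = 0

# Hypothetical module path: point this at wherever MyImagesPipeline lives.
ITEM_PIPELINES = {
    "myproject.pipelines.MyImagesPipeline": 1,
}

IMAGES_STORE = "images"  # hypothetical storage directory
```

The trade-off is that every run re-fetches every image, so a small positive value may be preferable if only occasional refreshes are needed.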
I'm trying to check an image's dimensions before saving it. I don't need to change the image, just make sure it fits my limits.
Right now, I can read the file, and save it to AWS without a problem.
output['pic file'] = request.POST['picture_file']
conn = myproject.S3.AWSAuthConnection(aws_key_id, aws_key)
filedata = request.FILES['picture'].read()
content_type = 'image/png'
conn.put(
    bucket_name,
    request.POST['picture_file'],
    myproject.S3.S3Object(filedata),
    {'x-amz-acl': 'public-read', 'Content-Type': content_type},
)
I need to add a step in between that makes sure the file has the right width and height. My file isn't coming from a form that uses ImageField, and all the solutions I've seen rely on that.
Is there a way to do something like this?
img = Image.open(StringIO(filedata))
#To get the image size, in pixels.
(width,height) = img.size
#check width and height against the limits and resize if needed
img = img.resize((width_new,height_new))
I've done this before, but I can't find my old snippet, so here goes, off the top of my head:
picture = request.FILES['picture']
img = Image.open(picture)
#check sizes .... probably using img.size and then resize
#resave if necessary
imgstr = StringIO()
img.save(imgstr, 'PNG')
imgstr.seek(0)
filedata = imgstr.read()
The code below builds the image from the request, as you want:
from PIL import ImageFile

def image_upload(request):
    for f in request.FILES.values():
        p = ImageFile.Parser()
        while True:
            s = f.read(1024)
            if not s:
                break
            p.feed(s)
        im = p.close()
        im.save("/tmp/" + f.name)