Accessing a specific item - python

I have the following images:
im1 = cv2.imread(root + '/' + '1.jpg')
im1_file = '1.jpg'
img1 = (im1,im1_file)
im2 = cv2.imread(root + '/' + '2.jpg')
im2_file = '2.jpg'
img2 = (im2,im2_file)
I then add the images to the pairs list, as follows:
pair = (img1,img2)
pairs.append(pair)
How can I access the file name (i.e. im_file) in each pair, img1 and img2?

Whenever you start using numbered variables, use a list.
You can store the file names all at once, and you can then read the files from that list.
import os
import cv2
root = '/'
files = ['1.jpg', '2.jpg']
images = [cv2.imread(os.path.join(root, f)) for f in files]
You can access a specific item with images[0], for example.
If you need to pair each file name with its image for any reason, you can use zip(files, images).
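As a minimal sketch of that pairing, the snippet below keeps each file name next to its loaded image. Note that load_image is a hypothetical stand-in for cv2.imread, used only so the example runs without OpenCV or real image files.

```python
import os

def load_image(path):
    # Hypothetical stand-in for cv2.imread(path); returns a placeholder string.
    return "pixels of " + path

root = "/"
files = ["1.jpg", "2.jpg"]
images = [load_image(os.path.join(root, f)) for f in files]

# Pair every file name with its image: [(name, image), ...]
named_images = list(zip(files, images))

# Unpack one pair to get the file name and the image data.
first_name, first_image = named_images[0]
print(first_name)
```

Keeping the names and images in parallel lists also means adding a third image only requires a new entry in files.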

You can use simple tuple indexing to get the filename out (which indeed is the second, or 1-index, element in the tuple)
>>> img1[1]
'1.jpg'
Or, for every image in every pair:
>>> for pair in pairs:
...     for img in pair:
...         print(img[1])
1.jpg
2.jpg
However you may want to consider using namedtuples here
from collections import namedtuple
ImageInfo = namedtuple("ImageInfo", "data name")
img1 = ImageInfo(im1, im1_file)
img2 = ImageInfo(im2, im2_file)
This makes your interface a little nicer to use
>>> img1.name
'1.jpg'
>>> for pair in pairs:
...     for img in pair:
...         print(img.name)
1.jpg
2.jpg
(in full disclosure, I tend to overengineer my data structures and a namedtuple may be a bit too heavy here depending on the context. Without knowing that, I tend to prefer to overcomplicate. YMMV.)
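For reference, here is a self-contained sketch of the namedtuple approach applied to the pairs list from the question. The placeholder strings "data-1" and "data-2" are assumptions standing in for the arrays that cv2.imread would return.

```python
from collections import namedtuple

ImageInfo = namedtuple("ImageInfo", "data name")

# Placeholder strings stand in for the arrays returned by cv2.imread.
img1 = ImageInfo("data-1", "1.jpg")
img2 = ImageInfo("data-2", "2.jpg")

pairs = []
pairs.append((img1, img2))

# Each pair holds two ImageInfo tuples, so loop over both levels.
names = [[img.name for img in pair] for pair in pairs]
print(names)
```

Plain indexing still works on a namedtuple, so pairs[0][1][1] and pairs[0][1].name refer to the same file name.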

Related

Cannot iterate over a file?

I want to know how to apply a function over a file of images and save each of them in a separate file. For one image it works successfully, but i cannot apply it to all images.
import glob
images = glob.glob('/Desktop/Dataset/Images/*')
for img in images:
    img = np.array(Image.open(img))
    output = 'Desktop/Dataset/Output'
    MyFn(img = img,saveFile = output)
You did not define the sv value in your 2nd code snippet.
As the output image will be overwritten on every iteration, try this code:
import glob
import numpy as np
from PIL import Image
images = glob.glob('/Desktop/Dataset/Images/*')
i = 0
for img in images:
    i += 1 # counter to avoid overwriting the same output file
    img = np.array(Image.open(img))
    output = 'Desktop/Dataset/Output' + str(i)
    MyFn(img = img, saveFile = output)
Try using the os library directly:
import os
entries = os.listdir('image/')
This returns a list of all the files in your folder.
This is because you are not setting the sv value in your loop. You should set it to a different value at each iteration in order for it to write to different files.
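One way to get a different output file per iteration is to derive the output name from the input name. The sketch below assumes a hypothetical helper, output_path_for, and made-up folder names; it only builds paths and does not call MyFn.

```python
from pathlib import Path

def output_path_for(input_path, output_dir, suffix="_out"):
    # Hypothetical helper: reuse the input's stem so each result is unique.
    source = Path(input_path)
    return Path(output_dir) / (source.stem + suffix + source.suffix)

inputs = ["Images/cat.jpg", "Images/dog.png"]
outputs = [output_path_for(p, "Output") for p in inputs]
for path in outputs:
    print(path)
```

Inside the loop, the resulting path could then be passed as the save location, so no two images write to the same file.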

How to extract jpg EXIF metadata from a folder in chronological order

I am currently writing a script to extract EXIF GPS data from a folder of jpg images. I am using os.scandir to extract the entries from the folder but from my understanding os.scandir opens the files in an arbitrary way. I need the images to be opened in chronological order by filename. Below is my current code which works as intended however it does not open the images in the correct order. The files within my image folder are named chronologically like so: "IMG_0097, IMG_0098" etc.
#!/usr/bin/python
import os, exif, folium
def convert_lat(coordinates, ref):
    latCoords = coordinates[0] + coordinates[1] / 60 + coordinates[2] / 3600
    if ref == 'W' or ref == 'S':
        latCoords = -latCoords
    return latCoords
coordList=[]
map = folium.Map(location=[51.50197125069916, -0.14000860301423912], zoom_start = 16)
from exif import Image
with os.scandir('gps/') as entries:
    try:
        for entry in entries:
            img_path = 'gps/'+entry.name
            with open (img_path, 'rb') as src:
                img = Image(src)
                if img.has_exif:
                    latCoords = (convert_lat(img.gps_latitude, img.gps_latitude_ref))
                    longCoords = (convert_lat(img.gps_longitude, img.gps_longitude_ref))
                    coord = [latCoords, longCoords]
                    coordList.append(coord)
                    folium.Marker(coord, popup=str(coord)).add_to(map)
                    folium.PolyLine(coordList, color =" red", weight=2.5, opacity=1).add_to(map)
                    print(img_path)
                    print(coord)
                else:
                    print (src.name,'has no EXIF information')
    except:
        print(img_path)
        print("error occured")
map.save(outfile='/home/jamesdean/Desktop/Python scripts/map.html')
print ("Map generated successfully")
I would say ditch os.scandir and take advantage of more modern features the standard library has to offer:
from pathlib import Path
from operator import attrgetter
# assuming there is a folder named "gps" in the current working directory...
for path in sorted(Path("gps").glob("*.jpg"), key=attrgetter("stem")):
    print(path) # do something with the current path
The from operator import attrgetter and key=attrgetter("stem") are a bit redundant, but I'm just being explicit about what attribute I would like to use for determining the sorted order. In this case, the "stem" attribute of a path refers to just the name of the file as a string. For example, if the current path has a filename (including extension) of "IMG_0097.jpg", then path.stem would be "IMG_0097". Like I said, the stem is a string, so your paths will be sorted in lexicographical order. You don't need to do any conversion to integers, because your filenames already include leading zeroes, so lexicographical ordering should work just fine.
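To illustrate why the leading zeroes matter, here is a small sketch that sorts a few made-up paths by stem without touching the file system. PurePosixPath is used only so the example runs anywhere; the unpadded names are an assumption added for contrast.

```python
from operator import attrgetter
from pathlib import PurePosixPath

paths = [PurePosixPath("gps/IMG_0100.jpg"),
         PurePosixPath("gps/IMG_0097.jpg"),
         PurePosixPath("gps/IMG_0098.jpg")]

# Zero-padded stems sort lexicographically in the same order as numerically.
ordered = sorted(paths, key=attrgetter("stem"))
stems = [p.stem for p in ordered]
print(stems)

# Without padding, "IMG_100" sorts before "IMG_97" because "1" < "9".
unpadded = sorted(["IMG_97", "IMG_100"])
print(unpadded)
```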
You can sort a list using the built-in sorted function. Paul made an interesting point: simply sorting without any extra arguments works just as well:
a = ["IMG_0097.jpg", "IMG_0085.jpg", "IMG_0043.jpg", "IMG_0098.jpg", "IMG_0099.jpg", "IMG_0100.jpg"]
sorted_list = sorted(a)
print(sorted_list)
Output:
['IMG_0043.jpg', 'IMG_0085.jpg', 'IMG_0097.jpg', 'IMG_0098.jpg', 'IMG_0099.jpg', 'IMG_0100.jpg']
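In your case, os.scandir yields os.DirEntry objects, which cannot be compared with each other directly, so sort them by name:
for entry in sorted(entries, key=lambda entry: entry.name):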
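A self-contained sketch of sorting directory entries by file name follows. It creates empty files in a temporary folder (an assumption made purely so the example runs on its own) and sorts the os.scandir results on their name attribute, since DirEntry objects define no ordering of their own.

```python
import os
import tempfile

with tempfile.TemporaryDirectory() as folder:
    # Create empty files in a deliberately shuffled order.
    for name in ["IMG_0099.jpg", "IMG_0097.jpg", "IMG_0098.jpg"]:
        open(os.path.join(folder, name), "wb").close()

    with os.scandir(folder) as entries:
        # Sort on the entry name; DirEntry objects cannot be compared directly.
        ordered_names = [entry.name for entry in sorted(entries, key=lambda e: e.name)]

print(ordered_names)
```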

How to Naturally Sort Pathlib objects in Python?

I am trying to create a sorted list of files in the ./pages directory. This is what I have so far:
import numpy as np
from PIL import Image
import glob
from pathlib import Path
# sorted( l, key=lambda a: int(a.split("-")[1]) )
image_list = []
for filename in Path('./pages').glob('*.jpg'):
    # sorted( i, key=lambda a: int(a.split("_")[1]) )
    # im=Image.open(filename)
    image_list.append(filename)
print(*image_list, sep = "\n")
current output:
pages/page_1.jpg
pages/page_10.jpg
pages/page_11.jpg
pages/page_12.jpg
pages/page_2.jpg
pages/page_3.jpg
pages/page_4.jpg
pages/page_5.jpg
pages/page_6.jpg
pages/page_7.jpg
pages/page_8.jpg
pages/page_9.jpg
Expected Output:
pages/page_1.jpg
pages/page_2.jpg
pages/page_3.jpg
pages/page_4.jpg
pages/page_5.jpg
pages/page_6.jpg
pages/page_7.jpg
pages/page_8.jpg
pages/page_9.jpg
pages/page_10.jpg
pages/page_11.jpg
pages/page_12.jpg
I've tried the solutions found in the duplicate, but they don't work because the pathlib files are class objects, and not strings. They only appear as filenames when I print them.
For example:
print(filename) # pages/page_1.jpg
print(type(filename)) # <class 'pathlib.PosixPath'>
Finally, this is working code. Thanks to all.
from pathlib import Path
import numpy as np
from PIL import Image
import natsort
def merge_to_single_image():
    image_list1 = []
    image_list2 = []
    image_list3 = []
    image_list4 = []
    for filename in Path('./pages').glob('*.jpg'):
        image_list1.append(filename)
    for i in image_list1:
        image_list2.append(i.stem)
        # print(type(i.stem))
    image_list3 = natsort.natsorted(image_list2, reverse=False)
    for i in image_list3:
        i = str(i)+ ".jpg"
        image_list4.append(Path('./pages', i))
    images = [Image.open(i) for i in image_list4]
    # for a vertical stacking it is simple: use vstack
    images_combined = np.vstack(images)
    images_combined = Image.fromarray(images_combined)
    images_combined.save('Single_image.jpg')
One can use the natsort library (pip install natsort). It keeps the code simple, too.
[This works; tested with natsort versions 5.5 and 7.1 (current at the time of writing).]
from pathlib import Path
from natsort import natsorted
image_list = Path('./pages').glob('*.jpg')
image_list = natsorted(image_list, key=str)
# Or convert the list of paths to a list of strings, sort it naturally, then convert back to a list of paths
image_list = [Path(p) for p in natsorted([str(p) for p in image_list])]
Just for posterity, maybe this is more succinct?
natsorted(list_of_pathlib_objects, key=str)
Note that sorted doesn't sort your data in place, but returns a new list, so you have to iterate on its output.
In order to get your sorting key, which is the integer value at the end of your filename:
You can first take the stem of your path, which is its final component without extension (so, for example, 'page_13').
Then, it is better to split it once from the right, in order to be safe in case your filename contains other underscores in the first part, like 'some_page_33.jpg'.
Once converted to int, you have the key you want for sorting.
So, your code could look like:
for filename in sorted(Path('./pages').glob('*.jpg'),
                       key=lambda path: int(path.stem.rsplit("_", 1)[1])):
    print(filename)
Sample output:
pages/ma_page_2.jpg
pages/ma_page_11.jpg
pages/ma_page_13.jpg
pages/ma_page_20.jpg
The problem is not as easy as it sounds. "Natural" sorting can be quite challenging, especially with arbitrary input strings; e.g., what if you have "69_helloKitty.jpg" in your data?
I used https://github.com/SethMMorton/natsort a while ago for a similar problem; maybe it helps you.
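If installing a third-party package is not an option, a rough standard-library approximation is possible. The sketch below is not how natsort works internally; natural_key is a hypothetical helper that splits a name into digit and non-digit runs and tags each run so that numbers and text never get compared with each other, which also copes with names such as "69_helloKitty.jpg".

```python
import re
from pathlib import PurePosixPath

def natural_key(path):
    # re.split with a capture group alternates: text, digits, text, digits, ...
    parts = re.split(r"(\d+)", str(path))
    key = []
    for index, part in enumerate(parts):
        if index % 2:
            # Odd positions are the captured digit runs: compare as integers.
            key.append((0, int(part), ""))
        elif part:
            # Even positions are the text between them: compare case-insensitively.
            key.append((1, 0, part.lower()))
    return key

paths = [PurePosixPath("pages/page_10.jpg"),
         PurePosixPath("pages/page_2.jpg"),
         PurePosixPath("pages/69_helloKitty.jpg"),
         PurePosixPath("pages/page_1.jpg")]

ordered = [p.name for p in sorted(paths, key=natural_key)]
print(ordered)
```

This handles only non-negative integers embedded in names; signs, decimals, and locale-aware ordering are cases where a dedicated library is the safer choice.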
Just use it like this:
from pathlib import Path
- sorted by name:
sorted(Path('anywhere/you/want').glob('*.jpg'))
- sorted by modification time:
import os
sorted(Path('anywhere/you/want').glob('*.jpg'), key=os.path.getmtime)
- sorted by size:
import os
sorted(Path('anywhere/you/want').glob('*.jpg'), key=os.path.getsize)
etc.
Hint: since you create the file names yourself, write them with zero-padded numbers, like:
for i in range(100):
    with open('filename'+f'_{i:03d}','wb') as f: # py3.6+ f-string
        pass # write your file stuff...
    # alternative with str.format(): 'filename'+'_{:03d}'.format(i)
...
'filename_007',
'filename_008',
'filename_009',
'filename_010',
'filename_011',
'filename_012',
'filename_013',
'filename_014',
...
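As a quick check of the padding idea, this sketch builds a few names with the same format spec and confirms that a plain sorted call puts them in numeric order. The sample numbers are arbitrary.

```python
# Build zero-padded names from numbers given in a shuffled order.
numbers = [12, 7, 100, 9]
names = [f"filename_{i:03d}" for i in numbers]

# A plain lexicographic sort now matches the numeric order.
ordered = sorted(names)
print(ordered)
```

The width of 3 only covers values up to 999; a larger collection needs a wider field for the ordering to hold.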

How to create variables for Facial_Recognition from database

I'm trying to be able to pull data from a database with a name and an image file name then put it into a face_recognition Python program. However, for the code that I'm using, the program learns the faces by calling variables with different names.
How can I create variables based on the amount of data that I have in the database?
What could be a better approach to solve this problem?
first_image = face_recognition.load_image_file("first.jpg")
first_face_encoding = face_recognition.face_encodings(first_image)[0]
second_image = face_recognition.load_image_file("second.jpg")
second_face_encoding = face_recognition.face_encodings(second_image)[0]
You can use arrays instead of storing each image/encoding in an individual variable, and fill the arrays from a for loop.
Assuming you can change the filenames from first.jpg, second.jpg... to 1.jpg, 2.jpg... you can do this:
numberofimages = 10 # change this to the total number of images
images = [None] * (numberofimages+1) # create an array to store all the images
encodings = [None] * (numberofimages+1) # create an array to store all the encodings
for i in range(1, numberofimages+1):
    filename = str(i) + ".jpg" # generate image file name (eg. 1.jpg, 2.jpg...)
    # load the image and store it in the array
    images[i] = face_recognition.load_image_file(filename)
    # store the encoding
    encodings[i] = face_recognition.face_encodings(images[i])[0]
You can then access, e.g., the 3rd image and 3rd encoding like this:
images[3]
encodings[3]
If changing image file names is not an option, you can store them in a dictionary and do this:
numberofimages = 3 # change this to the total number of images
images = [None] * (numberofimages+1) # create an array to store all the images
encodings = [None] * (numberofimages+1) # create an array to store all the encodings
filenames = {
    1: "first",
    2: "second",
    3: "third"
}
for i in range(1, numberofimages+1):
    filename = filenames[i] + ".jpg" # generate file name (eg. first.jpg, second.jpg...)
    print(filename)
    # load the image and store it in the array
    images[i] = face_recognition.load_image_file(filename)
    # store the encoding
    encodings[i] = face_recognition.face_encodings(images[i])[0]
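Since the question mentions that names and image file names come from a database, another option is a dictionary keyed by person name, which avoids numbering altogether. The following is only a sketch: build_encodings, the sample rows, and the stand-in loader and encoder are all assumptions so the example runs without the face_recognition package or a real database.

```python
def build_encodings(rows, load_image, encode_face):
    # rows: iterable of (person_name, image_file_name) tuples, e.g. from a query.
    encodings = {}
    for person_name, file_name in rows:
        image = load_image(file_name)
        encodings[person_name] = encode_face(image)
    return encodings

# Hypothetical database result and stand-in functions for demonstration.
rows = [("Alice", "first.jpg"), ("Bob", "second.jpg")]
known = build_encodings(rows, load_image=lambda f: f.upper(), encode_face=len)
print(known)
```

With the real library, load_image could be face_recognition.load_image_file and encode_face could be a small function returning face_recognition.face_encodings(image)[0], as in the code above.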

Sort images based on a cluster correspondence list

I have the following working code to sort images according to a cluster list which is a list of tuples: (image_id, cluster_id).
One image can only be in one and only one cluster (there is never the same image in two clusters for example).
I wonder if there is a way to shorten the "for+for+if+if" loops at the end of the code. As it stands, for each file name I must check every pair in the cluster list, which is redundant.
import os
import re
import shutil
srcdir = '/home/username/pictures/' #
if not os.path.isdir(srcdir):
    print("Error, %s is not a valid directory!" % srcdir)
    return None
pts_cls # is the list of pairs (image_id, cluster_id)
filelist = [(srcdir+fn) for fn in os.listdir(srcdir) if
            re.search(r'\.jpg$', fn, re.IGNORECASE)]
filelist.sort(key=lambda var:[int(x) if x.isdigit() else
                              x for x in re.findall(r'[^0-9]|[0-9]+', var)])
for f in filelist:
    fbname = os.path.splitext(os.path.basename(f))[0]
    for e,cls in enumerate(pts_cls): # for each (img_id, clst_id) pair
        if str(cls[0])==fbname: # check if image_id corresponds to file basename on disk)
            if cls[1]==-1: # if cluster_id is -1 (->noise)
                outdir = srcdir+'cluster_'+'Noise'+'/'
            else:
                outdir = srcdir+'cluster_'+str(cls[1])+'/'
    if not os.path.isdir(outdir):
        os.makedirs(outdir)
    dstf = outdir+os.path.basename(f)
    if os.path.isfile(dstf)==False:
        shutil.copy2(f,dstf)
Of course, as I am pretty new to Python, any other well explained improvements are welcome!
I think you're complicating this far more than needed. Since your image names are unique (there can only be one image_id), you can safely convert pts_cls into a dict and have fast lookups on the spot instead of looping through the list of pairs each and every time. You are also using regex where it's not needed, and you're packing your paths only to unpack them later.
Also, your code would break if it happens that an image from your source directory is not in the pts_cls as its outdir would never be set (or worse, its outdir would be the one from the previous loop).
I'd streamline it like:
import os
import shutil
src_dir = "/home/username/pictures/"
if not os.path.isdir(src_dir):
    print("Error, %s is not a valid directory!" % src_dir)
    exit(1) # return is expected only from functions
pts_cls = [] # is the list of pairs (image_id, cluster_id), load from wherever...
# convert your pts_cls into a dict - since there cannot be any images in multiple clusters
# base image name is perfectly ok to use as a key for blazingly fast lookups later
cluster_map = dict(pts_cls)
# get only `.jpg` files; store base name and file name, no need for a full path at this time
files = [(fn[:-4], fn) for fn in os.listdir(src_dir) if fn.lower()[-4:] == ".jpg"]
# no need for sorting based on your code
for name, file_name in files: # loop through all files
    if name in cluster_map: # proceed with the file only if in pts_cls
        cls = cluster_map[name] # get our cluster value
        # get our `cluster_<cluster_id>` or `cluster_Noise` (if cluster == -1) target path
        target_dir = os.path.join(src_dir, "cluster_" + str(cls if cls != -1 else "Noise"))
        target_file = os.path.join(target_dir, file_name) # get the final target path
        if not os.path.exists(target_file): # if the target file doesn't exist
            if not os.path.isdir(target_dir): # make sure our target path exists
                os.makedirs(target_dir, exist_ok=True) # create a full path if it doesn't
            shutil.copy(os.path.join(src_dir, file_name), target_file) # copy
UPDATE - If you have multiple 'special' folders for certain cluster IDs (like Noise is for -1) you can create a map like cluster_targets = {-1: "Noise"} where the keys are your cluster IDs and their values are, obviously, the special names. Then you can replace the target_dir generation with: target_dir = os.path.join(src_dir, "cluster_" + str(cluster_targets.get(cls,cls)))
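The cluster_targets lookup described above can be sketched as follows. The helper name target_dir_for and the "pictures" folder are assumptions for illustration; only path strings are built, nothing is created on disk.

```python
import os

# Special folder names for certain cluster IDs; all others use the ID itself.
cluster_targets = {-1: "Noise"}

def target_dir_for(src_dir, cluster_id):
    # dict.get falls back to the cluster ID when no special name is registered.
    label = cluster_targets.get(cluster_id, cluster_id)
    return os.path.join(src_dir, "cluster_" + str(label))

names = [os.path.basename(target_dir_for("pictures", c)) for c in (-1, 3)]
print(names)
```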
UPDATE #2 - Since your image_id values appear to be integers while filenames are strings, I'd suggest you to just build your cluster_map dict by converting your image_id parts to strings. That way you'd be comparing likes to likes without the danger of type mismatch:
cluster_map = {str(k): v for k, v in pts_cls}
If you're sure that none of the *.jpg files in your src_dir will have a non-integer in their name you can instead convert the filename into an integer to begin with in the files list generation - just replace fn[:-4] with int(fn[:-4]). But I wouldn't advise that as, again, you never know how your files might be named.
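The type mismatch is easy to miss, so here is a small sketch with made-up sample pairs showing that a base file name (a string) is not found in a dict with integer keys, but is found once the keys are converted to strings.

```python
# Hypothetical sample of (image_id, cluster_id) pairs with integer IDs.
pts_cls = [(12, 0), (57, -1)]

int_keys = dict(pts_cls)                      # keys are ints
str_keys = {str(k): v for k, v in pts_cls}    # keys are strings

base_name = "12"  # what you get from "12.jpg" after stripping the extension
found_with_int_keys = base_name in int_keys
found_with_str_keys = base_name in str_keys
print(found_with_int_keys, found_with_str_keys)
```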
