How to overwrite an existing image file in google colab? - python

I am doing an ML project in Google Colab. I need to preprocess every image in the training set and replace each original with its preprocessed version. The training images are already uploaded to "/content/train/images/". I wrote an image_preprocessing function that takes an image and returns the preprocessed image. Now I need to save that image over the previous one.
This is my code :
import cv2
import glob
import os
path = "/content/train/images/*.jpg"
for file in glob.glob(path):
    img = cv2.imread(file)
    file_name = os.path.basename(file)
    img_preprocessed = image_preprocessing(img)
    with open(file, 'w') as f:
        f.write(img_preprocessed)
    print(file_name + " preprocessed and saved\n")
I am a newbie in python. Please help. Thanks in advance.

What you have in your example is not too far off, but you are trying to save an image using syntax meant for writing a text file (with open(file, 'w') as f).
As you are using OpenCV, you can save the image directly with cv2.imwrite(file, img_preprocessed). All put together:
import cv2
import glob
import os
path = "/content/train/images/*.jpg"
for file in glob.glob(path):
    img = cv2.imread(file)
    file_name = os.path.basename(file)
    img_preprocessed = image_preprocessing(img)
    # Save the img_preprocessed as a picture with a path matching 'file'
    cv2.imwrite(file, img_preprocessed)
    print(file_name + " preprocessed and saved")
NOTE: This example overwrites your original images, as requested in the question. However, overwriting can hurt repeatability, since running the preprocessing a second time would process already-preprocessed images. It may be better to save the results in a separate preprocessed_images folder so you retain the sources. Whether that matters depends on your use case.
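A minimal sketch of the non-destructive variant mentioned in the note. The helper name output_path_for and the output folder name are assumptions for illustration, not part of the question. The helper only works out where each result should be written and creates the folder if needed.

```python
from pathlib import Path


def output_path_for(src_file, out_dir):
    """Return the path in out_dir that has the same file name as src_file."""
    out_dir = Path(out_dir)
    # Create the output folder on first use; do nothing if it already exists.
    out_dir.mkdir(parents=True, exist_ok=True)
    return out_dir / Path(src_file).name
```

Inside the loop, calling cv2.imwrite(str(output_path_for(file, "/content/train/preprocessed_images")), img_preprocessed) instead of writing to file would then leave the originals untouched.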

Related

Python script to convert an RGB image dataset into grayscale images using Pillow

I want to convert an image RGB dataset to Grayscale dataset using pillow. I want to write a script that takes in the dataset path and converts all images one by one into grayscale. At the end I want to save this script, and want to run this script on the server to avoid the copying of huge data to the server.
Probably this code would work for you?
Code:
import os
from PIL import Image
# directory where the images are stored
src_dir = '/content/pizza_steak/test/steak'
# directory where the grayscale copies are saved
dst_dir = '/content/meow'
os.makedirs(dst_dir, exist_ok=True)
for file_name in os.listdir(src_dir):
    # build the full path to the image
    final_path = os.path.join(src_dir, file_name)
    # convert to grayscale and save the image
    Image.open(final_path).convert('L').save(os.path.join(dst_dir, f"gray{file_name}"))
    # uncomment this if you want to delete the original file
    # os.remove(final_path)
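Since the question asks for a script that takes the dataset path, here is a hedged sketch of one way to structure it. The function names, the extension set, and the choice to mirror the folder layout into a second folder are all assumptions, not something the answer above specifies.

```python
import argparse
from pathlib import Path

# Assumed set of extensions; adjust to whatever the dataset contains.
IMAGE_SUFFIXES = {".jpg", ".jpeg", ".png"}


def find_images(root):
    """Return all image files below root, searched recursively."""
    return sorted(
        p for p in Path(root).rglob("*")
        if p.is_file() and p.suffix.lower() in IMAGE_SUFFIXES
    )


def convert_dataset(src_root, dst_root):
    """Write a grayscale copy of every image, mirroring the folder layout."""
    from PIL import Image  # imported lazily so find_images works without Pillow

    for src in find_images(src_root):
        dst = Path(dst_root) / src.relative_to(src_root)
        dst.parent.mkdir(parents=True, exist_ok=True)
        with Image.open(src) as img:
            img.convert("L").save(dst)


def main(argv=None):
    """Parse the two folder arguments and run the conversion."""
    parser = argparse.ArgumentParser(description="Convert a dataset to grayscale.")
    parser.add_argument("src_root", help="folder containing the RGB images")
    parser.add_argument("dst_root", help="folder to receive the grayscale copies")
    args = parser.parse_args(argv)
    convert_dataset(args.src_root, args.dst_root)
```

Calling main() from an if __name__ == "__main__": guard would let the file be run directly on the server with the two folders as command-line arguments.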

object_detector.DataLoader.from_pascal_voc returning empty data

train_data = object_detector.DataLoader.from_pascal_voc(
    'images_jpg_splitted/train/img',
    'images_jpg_splitted/train/xml',
    ['bat']
)
val_data = object_detector.DataLoader.from_pascal_voc(
    'images_jpg_splitted/test/img',
    'images_jpg_splitted/test/xml',
    ['bat']
)
I am trying to detect bats in images. I have labeled the data using labelImg.
While trying to load the data from tflite_model_maker, object_detector.DataLoader.from_pascal_voc returns empty data. I have tried not splitting the image and XML file and it still did not work.
The error was in the image files. Only JPEG is supported, and although the files had a .jpeg extension, they were not recognized as JPEG, maybe because they were PNG files whose extension had been renamed. So I used PIL to convert them to JPEG.
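import PIL.Image
import glob
import os
# create the output folder if it does not exist yet
os.makedirs("converted", exist_ok=True)
lst_imgs = glob.glob("*.jpeg")
print(lst_imgs)
for i in lst_imgs:
    img = PIL.Image.open(i)
    # drop any alpha channel or palette so the image can be saved as JPEG
    img = img.convert("RGB")
    img.save(os.path.join("converted", i), "JPEG")
print("Done.")
# os.startfile is Windows-only; it opens the output folder in Explorer
os.startfile("converted")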
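To check whether a file is really a JPEG before converting, one option is to look at its first bytes rather than its extension. The sketch below is an illustration only; the function name is made up, and it covers just the JPEG and PNG signatures.

```python
JPEG_MAGIC = b"\xff\xd8\xff"
PNG_MAGIC = b"\x89PNG\r\n\x1a\n"


def sniff_image_format(path):
    """Guess the real format from the file's leading bytes, ignoring its extension."""
    with open(path, "rb") as f:
        header = f.read(8)
    if header.startswith(JPEG_MAGIC):
        return "jpeg"
    if header.startswith(PNG_MAGIC):
        return "png"
    return "unknown"
```

A file named like a JPEG for which this returns "png" would be a candidate for the conversion shown above.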
I ran into this issue because I was specifying the path to my annotations using ~. Starting my path with /home/myuser fixed this for me.
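If the loader indeed does not expand ~ on its own (an inference from this report, not documented behavior), the path can be expanded before it is passed in. A small sketch, with resolve_dir as a hypothetical helper name:

```python
import os


def resolve_dir(path):
    """Expand a leading ~ and return an absolute path."""
    return os.path.abspath(os.path.expanduser(path))
```

The result of resolve_dir on the annotations path can then be given to from_pascal_voc in place of the path starting with ~.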
Could you please write the code in the style below?
dataloader = object_detector.DataLoader.from_pascal_voc(image_dir, annotations_dir, label_map={1: "person", 2: "notperson"})
It may be a syntax issue.

How to load images into variables named after the image files

I want my code to load all the images automatically. For now I have to write code for each image separately, but I want it to get all the images from the directory automatically, use each image name as the variable name for the loaded image file, and also modify the image name to store the encodings.
p_image = face_recognition.load_image_file("p.jpg")
P_face_encoding = face_recognition.face_encodings(p_image)[0]
Source for the face recognition code ( this is not my original code)
https://github.com/ageitgey/face_recognition/blob/master/examples/facerec_from_webcam_faster.py
import glob
import face_recognition
p_image_list = []
for each_image in glob.glob("*.jpg"):
    p_image_list.append(face_recognition.load_image_file(each_image))
p_image_list contains all the images in the current folder.
You can use a dictionary whose keys serve as your variable names (the file names without extension) and whose values are the corresponding file names:
import os
files = os.listdir()
file_dict = {os.path.splitext(file)[0]: file for file in files}
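Building on the dictionary idea, the images themselves can be stored under their names. This is a sketch under assumptions: load_by_stem is a hypothetical helper, and the loader is passed in as a parameter so the function does not depend on any particular library.

```python
from pathlib import Path


def load_by_stem(directory, loader, pattern="*.jpg"):
    """Map each file's name without extension to loader(path)."""
    return {
        p.stem: loader(str(p))
        for p in sorted(Path(directory).glob(pattern))
    }
```

With the library from the question, images = load_by_stem(".", face_recognition.load_image_file) would make the image from p.jpg available as images["p"], and the encodings could be kept in a second dictionary with the same keys.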

Create numpy array from images in different folders

I am a beginner with Python, scikit-learn and numpy. I have a set of folders with images to which I want to apply different machine learning algorithms. However, I am struggling to get these images into numpy data that I can use.
These are my prerequisites:
Each folder name holds the key to what the images are. For example /birds/abc123.jpg and /birds/def456.jpg are both "birds"
Each image is 100x100px jpg
I am using Python 2.7
There are 2800 images in total
This is my code as far as I have gotten:
# Standard scientific Python imports
import matplotlib.pyplot as plt
# Import datasets, classifiers and performance metrics
from sklearn import svm, metrics
import numpy as np
import os # Working with files and folders
from PIL import Image # Image processing
rootdir = os.getcwd()
key_array = []
pixel_arr = np.empty((0,10000), int)
for subdir, dirs, files in os.walk('data'):
    dir_name = subdir.split("/")[-1]
    if "x" in dir_name:
        key_array.append(dir_name)
        for file in files:
            if ".DS_Store" not in file:
                file = os.path.join(subdir, file)
                im = Image.open(file)
                im_bw = im.convert('1') #Black and white
                new_np = np.array(im_bw).reshape(1,-1)
                print new_np.shape
                pixel_arr = np.append(pixel_arr, new_np, axis=0)
What works in this code is the browsing through the folders, getting the folder names and fetching the correct files/images. What I cannot get to work is to create a numpy array that is 2800,10000 (or maybe the correct would be 10000,2800), i.e. 2800 rows with 10000 values in each.
This solution (that I am not sure if it works) is super slow though and I am quite sure that there must be a solution that is faster and more elegant than this!
How can I create this 2800x10000 numpy array, preferably with the index number from the key_array attached?
If you don't need all the images at the same time, you can use a generator.
def get_images():
    for subdir, dirs, files in os.walk('data'):
        dir_name = subdir.split("/")[-1]
        if "x" in dir_name:
            key_array.append(dir_name)
            for file in files:
                if ".DS_Store" not in file:
                    file = os.path.join(subdir, file)
                    im = Image.open(file)
                    im_bw = im.convert('1') #Black and white
                    yield np.array(im_bw).reshape(1,-1)
This way you don't hold all the images in memory at the same time, which will probably help you out.
To use the images you would then do:
for image in get_images():
    ...
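If the full array is needed after all, part of the slowness in the question likely comes from np.append, which builds a new array on every call. A hedged alternative is to collect the rows in a list and join them once at the end; stack_rows is a hypothetical helper name.

```python
import numpy as np


def stack_rows(rows):
    """Join an iterable of (1, n) arrays into one array with one row per item.

    Raises ValueError if rows is empty, because there is nothing to join.
    """
    collected = list(rows)  # appending to a Python list does not copy the data
    return np.concatenate(collected, axis=0)  # one copy at the very end
```

For example, pixel_arr = stack_rows(get_images()) would consume the generator and return one row per image.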

How to add a bunch of images to a zipfile

I have a list of image filenames with about 150 entries. Every image is downloaded via urllib and stored on the system. The result is a zipfile containing several broken images. The last part of some images is missing / corrupt.
The image download works perfectly. Every image in the list is completely downloaded and is a valid image. It looks like I have to wait until zf.write() is completely done before the next image is added. Is there a way to ensure this?
images = ['image-01.jpg', 'image-02.jpg', 'image-03.jpg']
zf = zipfile.ZipFile('file.zip', mode='w')
for image in images:
    download_url = 'http://foo/bar/' + image
    image_file = open(image, 'wb')
    image_file.write(urllib.urlopen(download_url).read())
    image_file.close
    zf.write(image)
zf.close()
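Thanks to alecxe. The solution is to close the file correctly: image_file.close without parentheses only references the method and never calls it, so buffered data may not be flushed to disk before zf.write() reads the file.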
image_file.close()
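Another way to avoid this class of bug is to let with blocks do the closing. The sketch below is an illustration, not the original poster's code: save_and_zip is a hypothetical helper that takes the already-downloaded bytes as a dictionary, so the download step itself is left out.

```python
import zipfile
from pathlib import Path


def save_and_zip(payloads, work_dir, zip_path):
    """Write each payload to disk, then add the finished file to the archive.

    payloads maps a file name to its bytes (for example, downloaded data).
    """
    work_dir = Path(work_dir)
    with zipfile.ZipFile(zip_path, mode="w") as zf:
        for name, data in payloads.items():
            target = work_dir / name
            # The with block closes (and flushes) the file before zf.write reads it.
            with open(target, "wb") as image_file:
                image_file.write(data)
            zf.write(target, arcname=name)
```

Because both the image file and the archive are closed automatically when their blocks end, a forgotten pair of parentheses cannot leave data unwritten.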
