Extract Waymo dataset to JPEG image files - Python

I'm trying to extract the Waymo TFRecord-based (TensorFlow) dataset to picture files.
I've tried the following:
import tensorflow as tf
FILENAME = 'D:\\waymo3\\waymo_open_dataset_v_1_2_0_individual_files\\training\\segment-15832924468527961_1564_160_1584_160_with_camera_labels.tfrecord'
dataset = tf.data.TFRecordDataset(FILENAME, compression_type='')
i = 0
for data in dataset:
    print(dir(data))
    with open('C:\\Users\\my_user\\Desktop\\extracted_pic\\' + str(i) + '.jpeg', 'ab') as the_file:
        the_file.write(data.numpy())
    i += 1
Unfortunately, it creates a folder of files that are unreadable as JPEG.
I believe the Waymo dataset images are saved as JPEG,
so I can't understand what my mistake is.
As you can see, I've tried to open the files in Windows 10.
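For reference: each record in a Waymo Open Dataset TFRecord is a serialized Frame protocol buffer, not a raw JPEG, which is why writing the record bytes straight to .jpeg files produces unreadable output. Below is a minimal sketch of decoding the camera images, assuming the waymo-open-dataset package is installed and reusing FILENAME from above (the output naming scheme is my own):
import tensorflow as tf
from waymo_open_dataset import dataset_pb2 as open_dataset

dataset = tf.data.TFRecordDataset(FILENAME, compression_type='')
for i, data in enumerate(dataset):
    # each record is a serialized Frame proto, not a JPEG
    frame = open_dataset.Frame()
    frame.ParseFromString(bytearray(data.numpy()))
    # frame.images holds one entry per camera; .image is the JPEG-encoded bytes
    for camera_image in frame.images:
        with open(f'frame{i}_cam{camera_image.name}.jpeg', 'wb') as f:
            f.write(camera_image.image)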

Related

How do I convert a folder of images to a npy file?

I have a folder containing images of gestures. But to make it work with my code I need to convert it to X.npy and Y.npy. I have looked at many questions about this kind of problem but I'm still in the dark. How do I go about this? How do I convert the folder to create an npy dataset of my own? Is there any code for this, or any converter?
I found a piece of code for this purpose on GitHub.
from PIL import Image
import os, sys
import cv2
import numpy as np
'''
Converts all images in a directory to '.npy' format.
Use np.save and np.load to save and load the images.
Use it for training your neural networks in ML/DL projects.
'''
# Path to image directory
path = "/path/to/image/directory/"
dirs = os.listdir(path)
dirs.sort()
x_train = []

def load_dataset():
    # Append images to a list
    for item in dirs:
        if os.path.isfile(path + item):
            im = Image.open(path + item).convert("RGB")
            im = np.array(im)
            x_train.append(im)

if __name__ == "__main__":
    load_dataset()
    # Convert and save the list of images in '.npy' format
    imgset = np.array(x_train)
    np.save("imgds.npy", imgset)
You can refer to the code snippet in the following GitHub gist, which I found on Google, to convert a folder of images to an npy file:
https://gist.github.com/anilsathyan7/ffb35601483ac46bd72790fde55f5c04
In this case all the images in the folder are converted into NumPy arrays and appended to a list named x_train. To convert and save this list of images as a single '.npy' file, we can use the same code snippet:
imgset=np.array(x_train)
np.save("imgds.npy",imgset)
To convert and save this list of images as multiple '.npy' files, use the code snippet below:
imgset = np.array(x_train, dtype=object)
for i in range(len(imgset)):
    np.save("imgds" + str(i) + ".npy", imgset[i])

How to save/extract a dataset from HDF5 and convert it into TIFF?

I am trying to import CT scan data into ImageJ/Fiji (there is an HDF5 plugin for ImageJ/Fiji, but the synchrotron CT datasets are so large that it failed to open them). The scan data (image dataset) is saved as a dataset inside an HDF5 file, so I have to extract the image dataset from the HDF5 file and then convert it into TIFF files.
The HDF5 file path is "F:/New_ESRF/SNT_BTO4/SNT_BTO4_S1/SNT_BTO4_S1_1_1pag_db0005_vol.hdf5"
'SNT_BTO4_S1_1_1pag_db0005_vol.hdf5' is divided into several datasets, and the image dataset is at /entry0000/reconstruction/results/data
At the moment, I can access the image dataset using h5py. However, I am stuck on extracting/saving the dataset separately from the HDF5 file.
What code is required to extract the image dataset from the HDF5 file?
After that, I am thinking of using PIL's Image to convert the images into TIFF files. Can I get any advice on the code for this?
import numpy as np
import h5py
filename = "F:/New_ESRF/SNT_BTO4/SNT_BTO4_S1/SNT_BTO4_S1_1_1pag_db0005_vol.hdf5"
with h5py.File(filename, 'r') as hdf:
    base_items = list(hdf.items())
    print('#Items in the base directory:', base_items)
    # entry0000
    G1 = hdf.get('entry0000')
    G1_items = list(G1.items())
    print('#Items in entry0000', G1_items)
    # reconstruction
    G11 = G1.get('/entry0000/reconstruction')
    G11_items = list(G11.items())
    print('#Items in reconstruction', G11_items)
    # results_data
    G12 = G11.get('/entry0000/reconstruction/results')
    G12_items = list(G12.items())
    print('#Items in results', G12_items)
Extracting image data from an HDF5 file and converting it to an image is a relatively straightforward 2-step process:
Access the data in the HDF5 file
Convert to an image with cv2 (or PIL)
A simple example is available here: How to extract individual JPEG images from a HDF5 file.
You can apply the same process to your file. Here is some pseudo-code. It's not complete because you don't show the shape of the image dataset (and the shape affects how to read the data). Also, you didn't say how many images are in dataset /entry0000/reconstruction/results/data: does it have a single image or multiple images? If multiple, which axis is the image counter?
import h5py
import cv2 ## for image conversion
filename = "F:/New_ESRF/SNT_BTO4/SNT_BTO4_S1/SNT_BTO4_S1_1_1pag_db0005_vol.hdf5"
with h5py.File(filename, 'r') as hdf:
    # get image dataset
    img_ds = hdf['/entry0000/reconstruction/results/data']
    print(f'Image Dataset info: Shape={img_ds.shape}, Dtype={img_ds.dtype}')
    ## the following depends on the dataset shape/schema
    ## code below assumes images are along axis=0
    for i in range(img_ds.shape[0]):
        cv2.imwrite(f'test_img_{i:03}.tiff', img_ds[i, :])  # uses slice notation
        # alternately, load to a numpy array first:
        # img_arr = img_ds[i, :]
        # cv2.imwrite(f'test_img_{i:03}.tiff', img_arr)
Note: you don't need to use .get() to get a dataset. You can simply reference the dataset path. Also, when you start from a group object, use the path relative to that group, not the absolute path. (You should modify your code to reflect these changes.) For example, the following are equivalent:
G1 = hdf['entry0000']
## is the same as G1 = hdf.get('entry0000')
G11 = hdf['entry0000/reconstruction']
## is the same as G11 = hdf.get('entry0000/reconstruction')
## OR referencing G1 group object:
G11 = G1['reconstruction']
## is the same as G11 = G1.get('reconstruction')
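If you just want to see everything in the file without walking the groups by hand, h5py's visititems() can print the whole tree in a couple of lines (a sketch, using the filename from the question):
import h5py

filename = "F:/New_ESRF/SNT_BTO4/SNT_BTO4_S1/SNT_BTO4_S1_1_1pag_db0005_vol.hdf5"
with h5py.File(filename, 'r') as hdf:
    # visititems calls the function once for every group and dataset in the file
    hdf.visititems(lambda name, obj: print(name, obj))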

How to load many images efficiently from a folder using OpenCV

I am trying to create my own image dataset for machine learning.
The workflow I have in mind is the following:
①Load all image files in the folder as an array.
②Label the loaded images.
③Split the loaded image files into image_data and label_data.
④Finally, split image_data into image_train_data and image_test_data, and split label_data into label_train_data and label_test_data.
However, it doesn't go well at the first step (①).
How can I load all the image data efficiently?
And if you were implementing an image dataset for machine learning according to this workflow, how would you handle it?
I wrote the following code:
cat_im = cv2.imread("C:\\Users\\path\\cat1.jpg")
But am I forced to write \cat1.jpg, \cat2.jpg, \cat3.jpg, ... one by one?
## you can find all images by extension
import os
import glob
import cv2

all_images_path = glob.glob('some_folder/images/*.png')  ## gives the paths of the images as a list
## then you can loop over all the files
loaded_images = []
for image_path in all_images_path:
    image = cv2.imread(image_path)
    loaded_images.append(image)
## let's assume your labels are just the file names, like cat1.png, cat2.png etc.
labels = []
for image_path in all_images_path:
    labels.append(os.path.basename(image_path))
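For steps ③ and ④ of the workflow, one common approach is to resize everything to one shape, stack into arrays, and let scikit-learn do the split. A sketch reusing loaded_images and labels from above (the 224x224 size and 0.2 test fraction are arbitrary choices):
import cv2
import numpy as np
from sklearn.model_selection import train_test_split

# stack into a single array; images must share one shape, hence the resize
image_data = np.stack([cv2.resize(im, (224, 224)) for im in loaded_images])
label_data = np.array(labels)
image_train, image_test, label_train, label_test = train_test_split(
    image_data, label_data, test_size=0.2)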

Loading .mat image dataset in python

I have an image dataset in .mat format. I want to load this dataset and visualize its images so I can interact with them, e.g. resize them and save them in a folder in a format I can view, such as .jpg or .png. How can I do that?
What I did was save the dataset in the scipy.io path inside the Python site-packages and write the following code:
import scipy.io as sio
dbpath = sio.loadmat('COFW_train_color.mat')
listing = os.listdir(dbpath)
num_samples = size(dbpath)
for file in listing:
    im = (dbpath + '\\' + file)
    imag = cv2.imread(im)
    cv2.imshow(imag)
But this did not give me what I need, and it also returned the following error:
FileNotFoundError: [Errno 2] No such file or directory: 'COFW_train_color.mat'
I also tried to use the full path to the dataset as follows:
dbpath = "C:\\Users\\SONY\\AppData\\Local\\Programs\\Python\\Python35\\Lib\\site-packages\\scipy\\io\\COFW_train_color.mat"
but I received another error message:
NotImplementedError: Please use HDF reader for matlab v7.3 files
How can I reach and interact with this type of dataset and visualize its images? Can anyone please help me? I will be thankful.
pip install mat73
import mat73
data_dict = mat73.loadmat('COFW_train_color.mat')
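mat73.loadmat returns a plain dict of NumPy arrays, so from there saving the images is straightforward. A sketch (the 'IsTr' key is hypothetical; print data_dict.keys() to find the field that actually holds the images in your file):
import mat73
import numpy as np
from PIL import Image

data_dict = mat73.loadmat('COFW_train_color.mat')
print(data_dict.keys())       # inspect the available fields first
images = data_dict['IsTr']    # hypothetical field name; adjust to your file
for i, im in enumerate(images):
    # convert each array to uint8 and save it in a viewable format
    Image.fromarray(np.asarray(im, dtype=np.uint8)).save(f'cofw_{i}.png')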

What is 'filename' in the code of Transfer Learning using MNIST dataset?

The code for classification of MNIST dataset using Transfer Learning is given in the link https://www.analyticsvidhya.com/blog/2017/06/transfer-learning-the-art-of-fine-tuning-a-pre-trained-model/
I am not able to understand what 'filename' stands for in the code. Also, why is the dataset loaded twice in the code?
I have seen code using the load_img() function, but I am still not able to run the given code without errors, since 'filename' is unknown: it is not defined in the link.
The MNIST dataset consists of two files, 'mnist_train.csv' and 'mnist_test.csv'. There is code around that converts .csv files into images, but it assumes a single .csv file for every image. Here, there are only two .csv files for all the images in train and test.
Thanks in advance!
The dataset is in csv format and has a column filename which contains the image file name.
I imagine the file has the following structure:
    filename  label
0  file1.jpg      1
1  file2.jpg      8
2  file3.jpg      5
....
They read the csv file into train:
train=pd.read_csv("R/Data/Train/train.csv")
and then use a loop to open each file listed in the dataframe:
for i in range(len(train)):
    temp_img = image.load_img(train_path + train['filename'][i], target_size=(224, 224))
Using the above code, each image is loaded and resized.
train_data = pd.read_csv('train.csv')
labels = []
pixels = []
for index, row in train_data.iterrows():
    # one-hot encode the digit label
    label = np.zeros(10)
    label[row["label"]] = 1
    labels.append(label)
    pixels.append(row[1:])
labels = np.array(labels)
pixels = np.array(pixels)
I have added the code above for loading the data from the csv file that you posted in the comments.
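Since the question mentions converting the .csv rows into actual images: each row of pixels holds the 784 pixel values of a 28x28 MNIST digit, so they can be reshaped and written out with PIL. A sketch, saving the first ten as an example:
import numpy as np
from PIL import Image

for i in range(10):
    # each row is 784 pixel values; reshape to the 28x28 MNIST image
    img = np.asarray(pixels[i], dtype=np.uint8).reshape(28, 28)
    Image.fromarray(img).save(f'mnist_{i}.png')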
