This is my first Kaggle kernel and I am not sure how things work in Kaggle.
I tried to create a new kernel for a cats vs dogs classifier.
I created a new kernel at https://www.kaggle.com/c/dogs-vs-cats/notebooks
Then,
!ls ../input/dogs-vs-cats/
# sampleSubmission.csv test1.zip train.zip
!unzip ../input/dogs-vs-cats/train.zip
# this produces output that looks like it works:
# it prints the jpg file names being extracted
# but when I check for the folder train, it does not exist
!ls ../input/dogs-vs-cats/train/
# there is no train folder
import os
print(os.listdir("../input/dogs-vs-cats"))
# ['train.zip', 'test1.zip', 'sampleSubmission.csv']
# there is no unzipped folder
How do I access the data in a Kaggle kernel?
You can load a zip file directly into pandas if it contains a single CSV,
import pandas as pd
df = pd.read_csv('train.zip')
df
You are looking for the unzipped files in the wrong place. The ../input directory is read-only, so !unzip extracts into the current working directory, not next to the zip file.
Instead of:
!unzip ../input/dogs-vs-cats/train.zip
!ls ../input/dogs-vs-cats/train/
Do this:
!unzip ../input/dogs-vs-cats/train.zip
!ls train/
To check in Python:
import os
print(os.listdir("train"))
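If you prefer to stay in Python, here is a minimal sketch using the standard-library zipfile module; it assumes the Kaggle default working directory /kaggle/working (adjust the target if yours differs):
import zipfile
import os

# extract train.zip into the writable working directory
with zipfile.ZipFile('../input/dogs-vs-cats/train.zip', 'r') as zf:
    zf.extractall('/kaggle/working')

print(os.listdir('/kaggle/working/train')[:5])  # first few extracted files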
Related
I'm trying to extract a Waymo TFRecord (TensorFlow) based dataset to picture files.
I've tried the following:
import tensorflow as tf

FILENAME = 'D:\\waymo3\\waymo_open_dataset_v_1_2_0_individual_files\\training\\segment-15832924468527961_1564_160_1584_160_with_camera_labels.tfrecord'
dataset = tf.data.TFRecordDataset(FILENAME, compression_type='')
i = 0
for data in dataset:
    print(dir(data))
    with open('C:\\Users\\my_user\\Desktop\\extracted_pic\\' + str(i) + '.jpeg', 'ab') as the_file:
        the_file.write(data.numpy())
    i += 1
Unfortunately, this produces a folder full of unreadable JPEGs.
I believe the Waymo dataset images are stored as JPEG,
so I can't understand what my mistake is.
As you can see from the paths, I've tried to open the files in Windows 10.
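If I had to guess, each record in those files is a serialized Frame protocol buffer rather than a raw JPEG, and the JPEG bytes sit inside the frame's camera images. A sketch of what decoding might look like, assuming the waymo_open_dataset package is installed (the paths are the ones from the question):
import tensorflow as tf
from waymo_open_dataset import dataset_pb2 as open_dataset

FILENAME = 'D:\\waymo3\\waymo_open_dataset_v_1_2_0_individual_files\\training\\segment-15832924468527961_1564_160_1584_160_with_camera_labels.tfrecord'
dataset = tf.data.TFRecordDataset(FILENAME, compression_type='')

for i, data in enumerate(dataset):
    # each record is a serialized Frame proto, not raw image bytes
    frame = open_dataset.Frame()
    frame.ParseFromString(bytearray(data.numpy()))
    for j, camera_image in enumerate(frame.images):
        # camera_image.image holds the JPEG-encoded bytes for one camera
        out_path = 'C:\\Users\\my_user\\Desktop\\extracted_pic\\' + str(i) + '_' + str(j) + '.jpeg'
        with open(out_path, 'wb') as the_file:
            the_file.write(camera_image.image)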
from pyrsgis import raster
from pyrsgis.convert import changeDimension
# Assign file names
greenareaband1='Sentinel-2 (2)dense.tiff'
greenareaband2='Sentinel-2 L1C (3)dense.tiff'
greenareaband3='Sentinel-2 L1C (4)dense.tiff'
# Read the rasters as array
df,myimage=raster.read(greenareaband1,bands='all')
AttributeError: 'NoneType' object has no attribute 'ReadAsArray'
I keep getting this error, but I'm sure that I have uploaded these images using:
from google.colab import files
files.upload()
I had the same problem and discovered that I had made a mistake in assigning the file name. Maybe there is a mistake in yours as well, so the file is not recognized as a TIFF and therefore cannot be read with ReadAsArray(). Hope that is the only problem.
You have a couple of issues here. Having spaces and parentheses in your file names is the last thing you want in Python. Make sure that you have changed the working directory to where your file is, or provide a relative path, and add 'r' at the beginning. For example:
input_file = r'E:/path_to_your_file/raster_file.tif'
ds, data_arr = raster.read(input_file)
About working with Colab: I think the best option would be to upload your files to your Google Drive and then authenticate your Colab script to mount the drive. Then you just need to change the working directory like this:
# authenticate google drive
from google.colab import drive
drive.mount('/content/drive')
# change working directory
import os
os.chdir(r'/content/drive/My Drive/path_to_your_file')
Or, after mounting the drive simply do this:
input_file = r'/content/drive/My Drive/path_to_your_file/raster_file.tif'
ds, data_arr = raster.read(input_file)
I have a folder with many subfolders that contain images.
I split the data using train_test_split from sklearn.model_selection as below:
folder_data_train, folder_data_test, train_target, test_target = train_test_split(
    data, targets_array, test_size=0.20, random_state=42, shuffle=True, stratify=targets_array)
folder_data_test contains paths to .png images.
The output of print(folder_data_test) is:
['/avi_images/A4CH_RV\\12505310b836710d_c18.png'
'/avi_images/PLAX_valves\\6ad39d497bc07141_c21.png'
'/avi_images/A4CH_LV\\7f50b7e4c051d48f_b52.png' ...
'/avi_images/Suprasternal\\6978b0ee7068a69e_b37.png'
'/avi_images/A5CH\\61cabd1291a81fc8_b43.png'
'/avi_images/PLAX_full\\2cab9cf0dd8d6480_b7.png']
I want to copy these images from folder_data_test to a new directory, keeping the subfolders (for example, the subfolder A4CH_RV). My current code is:
dst_dir_test = '/avi_images_search/test/'
for testdata in folder_data_test:
    shutil.copy(testdata, dst_dir_test)
It copies all the images from folder_data_test into the dst_dir_test directory without subfolders. How can I copy them into the relevant subfolders?
shutil acts almost like the shell in this case.
Your code does this (for each file):
shutil.copy('/avi_images/A4CH_RV\\12505310b836710d_c18.png', '/avi_images_search/test/')
This is roughly equivalent to
cp /avi_images/A4CH_RV\\12505310b836710d_c18.png /avi_images_search/test/
I'm a little bit confused by the \\, but I guess you are on Windows, and what you want is
cp /avi_images/A4CH_RV\\12505310b836710d_c18.png /avi_images_search/test/A4CH_RV
To do this in Python you'll have to play around with the path and create the directory before the copy.
import os
import shutil

src_base_path = '/avi_images/'
for testdata in folder_data_test:
    src_dir_path, file_name = os.path.split(testdata)
    sub_dir = os.path.join(dst_dir_test, src_dir_path[len(src_base_path):])
    os.makedirs(sub_dir, exist_ok=True)
    shutil.copy(testdata, sub_dir)
I didn't try it, but it should be something along those lines.
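One caveat, since the listed paths mix forward slashes with Windows backslashes (e.g. '/avi_images/A4CH_RV\\12505310b836710d_c18.png'): os.path.split only treats the backslash as a separator on Windows, so the snippet above may split in the wrong place on other platforms. A sketch that normalizes the separators first, using the same directory names as in the question:
import os
import shutil

dst_dir_test = '/avi_images_search/test/'
src_base_path = '/avi_images/'

for testdata in folder_data_test:
    # normalize Windows backslashes so the split behaves the same on every platform
    normalized = testdata.replace('\\', '/')
    src_dir_path, file_name = normalized.rsplit('/', 1)
    sub_dir = os.path.join(dst_dir_test, src_dir_path[len(src_base_path):])
    os.makedirs(sub_dir, exist_ok=True)
    shutil.copy(testdata, os.path.join(sub_dir, file_name))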
I would like to use the datasets emotions, scene, and yeast in my project in Anaconda (Python 3.6.5).
I have used the following code:
from skmultilearn.dataset import load_dataset
X_train, y_train, feature_names, label_names = load_dataset('emotions', 'train')
It works when I am connected to the internet,
but when I am offline, it doesn't work!
I have downloaded all three of the above datasets into a folder like this:
H:\Projects\Datasets
How can I use this folder as my dataset source while I am offline?
(I'm using Windows 10.)
The datasets I downloaded have the .rar extension,
like this: emotions.rar, scene.rar, and yeast.rar, and I downloaded them from: http://mulan.sourceforge.net/datasets-mlc.html
You can, but you first need to know the path where the dataset was stored.
To find it, load the dataset once and get the path. This path will not change, so you only need to do the following once. Then, knowing the path, you can load whatever you want offline.
Example:
from sklearn.datasets import load_iris
import pandas as pd, os
#get the path
path = load_iris()['filename']
print(path)
#offline load
df = pd.read_csv(path)
#the path: THIS IS WHAT YOU NEED
main_path_with_datasets = os.path.dirname(path)
Once you get main_path_with_datasets (i.e. by doing main_path_with_datasets = os.path.dirname(path)), you can use it to list and load all the downloaded datasets.
os.listdir(main_path_with_datasets)
['digits.csv.gz',
'wine_data.csv',
'diabetes_target.csv.gz',
'iris.csv',
'breast_cancer.csv',
'diabetes_data.csv.gz',
'linnerud_physiological.csv',
'linnerud_exercise.csv',
'boston_house_prices.csv']
EDIT for skmultilearn
from skmultilearn.dataset import load_dataset_dump
path = 'C:\\Users\\myname\\scikit_ml_learn_data\\'
X, y, feature_names, label_names = load_dataset_dump(path + 'emotions-train.scikitml.bz2')
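Since the files downloaded from the MULAN page are .rar archives containing ARFF files, another option (an assumption about your setup, not something from the EDIT above) is to extract an archive and load the ARFF directly with skmultilearn's load_from_arff; label_count is the number of labels in the dataset (6 for emotions), and the extracted path below is hypothetical:
from skmultilearn.dataset import load_from_arff

# hypothetical location of the ARFF file extracted from emotions.rar
arff_path = r'H:\Projects\Datasets\emotions\emotions.arff'
X, y = load_from_arff(arff_path, label_count=6)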
I'm new to Machine Learning and I'm following a Sentdex tutorial on Google Colab. It's supposed to be an ML program that distinguishes between cat and dog images. However, whenever I run my code, something is wrong with my 'file or directory':
FileNotFoundError: [Errno 2] No such file or directory: 'C:\\Users\\atlgwc16\\PetImages/Dog'
I honestly don't know where Google Colab stores its files, so I don't know where to put the folder of images.
Here is my full code so far:
import numpy as np
import matplotlib.pyplot as plt
import os
import cv2
from tqdm import tqdm
DATADIR = "C:\Users\atlgwc16\PetImages"
CATEGORIES = ["Dog", "Cat"]
for category in CATEGORIES:
    path = os.path.join(DATADIR, category)
    for img in os.listdir(path):
        img_array = cv2.imread(os.path.join(path, img), cv2.IMREAD_GRAYSCALE)
        plt.imshow(img_array, cmap='gray')
        plt.show()
        break
Tutorial being followed as referenced in the question:
https://pythonprogramming.net/loading-custom-data-deep-learning-python-tensorflow-keras/
Since you are using Google Colab, you can upload the Kaggle dataset of dog and cat images to Google Drive. See the Google Colab Jupyter notebook provided by Google that explains how to do this:
https://colab.research.google.com/notebooks/io.ipynb#scrollTo=u22w3BFiOveA
You would then access files from your Google Drive (in this case, the training set after you upload it to Google Drive) much in the same way as accessing files locally on your computer.
This is the example provided in the link above:
with open('/content/gdrive/My Drive/foo.txt', 'w') as f:
    f.write('Hello Google Drive!')
!cat /content/gdrive/My\ Drive/foo.txt
So, since you are using Google Colab, you would need to adjust the code from the Sentdex tutorial to work with the notebook you are creating. Google Colab uses Jupyter notebooks. Each cell in the notebook runs off the same session, so if you import a Python module in one cell, it can be used in the following cells. It's magic like that.
It would look like this:
[CELL 1]
from google.colab import drive
drive.mount('/content/gdrive')
You will then give permission for Google Colab to access your Google Drive.
[CELL 2]
import numpy as np
import matplotlib.pyplot as plt
import os
import cv2
from tqdm import tqdm
DATADIR = '/content/gdrive/My Drive/PetImages/'
# ^ See? You would need to go to Google Drive and create the 'PetImages' folder
# at the top level of your Google Drive, then upload the data set into it,
# creating a 'Dog' subfolder and a 'Cat' subfolder.
CATEGORIES = ["Dog", "Cat"]

for category in CATEGORIES:  # do dogs and cats
    path = os.path.join(DATADIR, category)  # create path to dogs and cats
    for img in os.listdir(path):  # iterate over each image per dogs and cats
        img_array = cv2.imread(os.path.join(path, img), cv2.IMREAD_GRAYSCALE)  # convert to array
        plt.imshow(img_array, cmap='gray')  # graph it
        plt.show()  # display!
        break  # we just want one for now so break
    break  # ...and one more!
After properly uploading the data set to Google Drive and using the special google.colab module, you should be able to access your training data easily. Google Colab is a cloud-based tool for creating Jupyter notebooks and running Python programs. So, while it is similar to running a Python program locally on your computer, it is not exactly the same. It would help to read more about how Google Colab works if you want to use it entirely in the cloud, storing your files in Google Drive rather than on your own computer. See the link I posted above from Google.
Happy coding.
I did it myself and it works for me.
I use a data set from my local drive, such as a hard disk.
Note: your dataset folder must be in zip form.
Follow the steps below and you will be able to access your dataset from your local drive. I use Google Colab: first, create a Jupyter notebook in Google Colab and run the code below step by step.
First step: run the code below in your notebook and upload your dataset from your hard drive or local drive.
from google.colab import files
uploaded = files.upload()
When the process reaches 100 percent, do the second step.
Second step:
copy and run the code below; this step will unzip the dataset.
import zipfile
import io
zf = zipfile.ZipFile(io.BytesIO(uploaded['DogVsCat.zip']), "r")
zf.extractall()
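To confirm the extraction from Python rather than from the file browser, a quick check could look like this (assuming the default Colab working directory /content):
import os
print(os.listdir('/content'))  # the unzipped dataset folder should appear here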
Third step: run the code below; it will import all the required libraries.
import numpy as np
import os
import cv2
import matplotlib.pyplot as plt
This will import all the required libraries for you.
Fourth step:
specify the path. To do this, follow the steps below:
In the left-hand file browser, click the folder icon and you will see your unzipped dataset; in my case my dataset is "DogVsCat".
Note: you will see two kinds of dataset there, zipped and unzipped; use the path of the unzipped data.
Right-click on it, copy the path, and
run the code below:
DIRECTORY ='/content/DogVsCats'
CATEGORIES = ['cats', 'dogs']
Note: put your own path in DIRECTORY (the path above is mine, not yours), and put your own folder names in CATEGORIES, not my folder names.
At the end, create the training data.
Fifth step:
data = []
for category in CATEGORIES:
    path = os.path.join(DIRECTORY, category)
    for img in os.listdir(path):
        img_path = os.path.join(path, img)
        label = CATEGORIES.index(category)
        arr = cv2.imread(img_path, cv2.IMREAD_GRAYSCALE)
        new_arr = cv2.resize(arr, (60, 60))
        data.append([new_arr, label])
Sixth step:
print the data by running the line below:
data
Seventh step: shuffle your data:
import random
random.shuffle(data)
Eighth step:
specify the features and labels for training the model:
X = []
y = []
for features, label in data:
    X.append(features)
    y.append(label)
X = np.array(X)
y = np.array(y)
Ninth step: print the features:
X
Tenth step: print the labels:
y
Note: I cannot share all the code with you for lack of time.