Load multiple memory mapped files with numpy - python

I'm trying to load two memory mapped files,
temp = numpy.load(currentDirectory + "\\tmp\\temperature.npy", mmap_mode='r')
salinity = numpy.load(currentDirectory + "\\tmp\\salinity.npy", mmap_mode='r')
but Python throws the following error:
IOError: Failed to interpret file 'C:\\my\\file\\path\\..\\tmp\\salinity.npy' as a pickle
When I load either by itself, it works just fine.
The files are quite large (~500 MB), but otherwise I don't think there is anything notable about them.
What might the problem be here?

This works for me; both files are larger than 5 GB.
X = np.load(os.path.join(path, '_file1.npy'), mmap_mode='r')
Y = np.load(os.path.join(path, '_file2.npy'), mmap_mode='r')
Which operating system are you using? The problem is not the size of the .npy files but the "\" separators in the path. Change your path, for example:
path = '/media/gtx1060/DATA/Datasets'
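If the code also has to run on Windows, a minimal sketch (assuming the .npy files live in a tmp subfolder of the working directory, as in the question) is to let os.path.join build the paths instead of hard-coding backslashes:

import os
import numpy as np

current_directory = os.getcwd()  # stand-in for the question's currentDirectory
tmp_dir = os.path.join(current_directory, "tmp")

# os.path.join picks the right separator for the current OS, and
# mmap_mode='r' memory-maps each array read-only instead of loading it into RAM
temp = np.load(os.path.join(tmp_dir, "temperature.npy"), mmap_mode='r')
salinity = np.load(os.path.join(tmp_dir, "salinity.npy"), mmap_mode='r')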

Related

Problem with reading multi extension .fits files

I am trying to read/open some multi-extension .fits files, but I am having a problem opening them. Here is the part of the code I am using to open the .fits files located in the same folder:
imgs = sorted(glob.glob('location_of_the_files/*.fits'))
for location in imgs:
    hdul = fits.open(imgs)
    original = hdul[1].data
    model = hdul[2].data
    residual = hdul[3].data
When running this I am getting this:
OSError: File-like object does not have a 'write' method, required for mode 'ostream'.
I tried to look this up on the internet, but I do not understand what is going on.
Any help on how to solve this?
Maybe it is important to mention that when opening a single .fits file with this code, everything works without any problem:
hdul = fits.open("location_of_the_files/image_data.fits")
original = hdul[1].data
model = hdul[2].data
residual = hdul[3].data
If needed let me know and I can upload .fits files (in that case, please just tell me how to do this here).
Thanks.
I suppose that you were trying to do something like this:
for location in imgs:
    with fits.open(location) as hdul:
        original = hdul[1].data
        model = hdul[2].data
        residual = hdul[3].data
        ...
Note the use of location instead of imgs as the argument to fits.open: passing the whole list instead of a single filename is what leads to the confusing 'ostream' error.
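For completeness, a minimal sketch of the full loop with the imports it needs (the 'location_of_the_files' path is the placeholder from the question, and collecting results into a list is just one possible way to use the data):

import glob
from astropy.io import fits

imgs = sorted(glob.glob('location_of_the_files/*.fits'))

results = []
for location in imgs:
    # open each file individually; the context manager closes it afterwards
    with fits.open(location) as hdul:
        original = hdul[1].data
        model = hdul[2].data
        residual = hdul[3].data
        results.append((original, model, residual))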

List Index out of range.. works on google colab but not on local machine?

I'm trying to recreate this project on my local machine. It's designed to run on Google Colab and I've recreated it there, where it works just fine. I want to try running it on my local machine now, so I installed all the required packages: Anaconda, Jupyter Notebook, etc.
When I come to the part where I process the images:
# Loops through imagepaths to load images and labels into arrays
for path in imagepaths:
    img = cv2.imread(path)  # Reads image and returns np.array
    img = cv2.cvtColor(img, cv2.COLOR_BGR2GRAY)  # Converts into the correct colorspace (GRAY)
    img = cv2.resize(img, (320, 120))  # Reduce image size so training can be faster
    X.append(img)
    # Processing label in image path
    category = path.split("/")[3]
    label = int(category.split("_")[0][1])
    y.append(label)
It throws the following error:
IndexError: list index out of range
The code has not been changed for the most part, and the dataset is the same. The only difference is that I'm running locally instead of on Google Colab. I searched online and someone said to do len(path) to verify that (in my case) it goes up to [3], which it does (its size is 33).
The code has changed here:
I did not use this line, since I'm not using Google Colab:
from google.colab import files
The name files is used in this part of the code:
# We need to get all the paths for the images to later load them
imagepaths = []
# Go through all the files and subdirectories inside a folder and save path to images inside list
for root, dirs, files in os.walk(".", topdown=False):
    for name in files:
        path = os.path.join(root, name)
        if path.endswith("png"):  # We want only the images
            imagepaths.append(path)
print(len(imagepaths))  # If > 0, then a PNG image was loaded
On my local machine, I removed the from google.colab... line and ran everything else normally. The name files is used in the code snippet above, yet running it threw no errors. **NOTE: len(path) on Jupyter shows 33, len(path) on Google Colab shows 16..?**
Does anyone have any idea what the issue could be? I don't think it came from removing that one line of code. If it did, what do you suggest I do to fix it?
Your local machine is running Windows while Colab runs on Linux, and the path separators are different on the two systems.
You need to replace
category = path.split("/")[3]
with
category = path.split("\\")[2]
and your code should work.
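A more portable alternative (just a sketch; the example path and the [-2] index are assumptions you would adapt to your folder layout) is to split on the separator of whatever OS the code runs on:

import os

# hypothetical example path, only for illustration; os.walk produces
# "\"-separated paths on Windows and "/"-separated paths on Linux
path = os.path.join(".", "dataset", "01_palm", "frame_0001.png")

# normpath + os.sep splits correctly on either OS
parts = os.path.normpath(path).split(os.sep)

# here the label folder is assumed to be the parent directory of the file;
# adjust the index to match your actual directory structure
category = parts[-2]
label = int(category.split("_")[0][1])
print(category, label)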

Augmenting images in a dataset - encountering ValueError: Could not find a format to read the specified file in mode 'i'

I'm in a beginner neural networks class and am really struggling.
I have a dataset of images that isn't big enough to train my network with, so I'm trying to augment them (rotate/noise addition etc.) and add the augmented images onto the original set. I'm following the code found on Medium: https://medium.com/#thimblot/data-augmentation-boost-your-image-dataset-with-few-lines-of-python-155c2dc1baec
However, I'm encountering ValueError: Could not find a format to read the specified file in mode 'i'
Not sure what this error means or how to go about solving it. Any help would be greatly appreciated.
import os
import random
from scipy import ndarray
import skimage as sk
from skimage import transform
from skimage import util

path1 = "/Users/.../"
path2 = "/Users/.../"
listing = os.listdir(path1)
num_files_desired = 1000
image = [os.path.join(path2, f) for f in os.listdir(path2) if os.path.isfile(os.path.join(path2, f))]
num_generated_files = 0
while num_generated_files <= num_files_desired:
    image_path = random.choice(image)
    image_to_transform = sk.io.imread(image_path)
The traceback ends with:
    if format is None:
        raise ValueError(
            "Could not find a format to read the specified file " "in mode %r" % mode
        )
ValueError: Could not find a format to read the specified file in mode 'i'
I can see a few possibilities. Before going through them, let me explain what your error means: it basically indicates that your images cannot be read by sk.io.imread(). Here are the things to try:
The [os.path.join(path2, f) for f in os.listdir(path2) if os.path.isfile(os.path.join(path2, f))] part may not be producing the image paths correctly. If so, skip the comprehension: point os.listdir() at the exact folder and inspect the file names it returns.
You can also use glob to pick up only the files with an image extension such as .jpg (see the sketch after this answer).
Your files may be corrupted. You can weed those out with PIL: open each image with image = Image.open(...) first and then call its image.verify() method.
Try reading about the plugin argument of sk.io.imread(filename, plugin=...); choosing a plugin explicitly may resolve your issue.
Hope it helps.
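A minimal sketch of the glob and PIL ideas above (the .jpg extension is an assumption; adjust it to your data, and replace the placeholder folder with your real one):

import glob
import os
from PIL import Image

path2 = "/Users/.../"  # same placeholder folder as in the question

# keep only files with a known image extension
candidates = glob.glob(os.path.join(path2, "*.jpg"))

valid_images = []
for fn in candidates:
    try:
        with Image.open(fn) as im:
            im.verify()  # raises an exception if the file is truncated/corrupted
        valid_images.append(fn)
    except Exception as err:
        print("Skipping unreadable file:", fn, err)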

Saving images on external hard drive

I have a simple loop to download a large number of images (1.5 million). The images themselves are small, but I estimated that the total size will be about 250 GB, which is too much for my internal HDD.
I got an external HDD, but even though the code runs without errors, the designated image folder is empty!
I tried the same code with a directory on my internal HDD and it works fine, slowly retrieving the images. Interestingly, the code reads the .csv file from the external HDD, so reading does not seem to be the problem.
Any idea what I could do?
import os
import pandas as pd
import urllib.request

# change paths and dependencies:
file_name = "ID_with_image_links.csv"
file_path = "/Volumes/Extreme SSD/"
path_for_images = "/Volumes/Extreme SSD/images"

os.chdir(file_path)
df = pd.read_csv(file_name)
total_len = len(df)

os.chdir(path_for_images)
df = df.head(10)  # this is for try-out
n = 1
for index, row in df.iterrows():
    id = str(row['ID'])
    im_num = str(row["Image Number"])
    link = str(row["Links"])
    urllib.request.urlretrieve(link, (id + "_" + im_num + ".jpg"))
    print("Image", n, "of", total_len, "downloaded")
    n = n + 1
Try making the directory writable. I figure you are using macOS?
You can set the directory's permissions to read/write by running chmod 666 "/Volumes/Extreme SSD/images/" in the terminal as root (the quotes matter because of the space in the path).
At least on BSD (and macOS is based on that), mounting an external drive is read-only by default, IIRC.
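Before re-running the full download, it may also help to confirm from Python that the target folder is actually there, mounted, and writable; a small diagnostic sketch using the paths from the question:

import os

path_for_images = "/Volumes/Extreme SSD/images"

print("exists:  ", os.path.isdir(path_for_images))
print("mounted: ", os.path.ismount("/Volumes/Extreme SSD"))
print("writable:", os.access(path_for_images, os.W_OK))

# try an actual write, which is the most reliable test; this will raise
# an error if the volume is mounted read-only
test_file = os.path.join(path_for_images, "write_test.tmp")
with open(test_file, "w") as fh:
    fh.write("ok")
os.remove(test_file)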

imread_collection There is no item

I am trying to read several images from an archive with skimage.io.imread_collection, but for some reason it throws an error:
"There is no item named '00071198d059ba7f5914a526d124d28e6d010c92466da21d4a04cd5413362552/masks/*.png' in the archive".
I checked several times; such a directory exists in the archive, and with *.png I just specify that I want all the images in my collection. imread_collection works well when I read the images from the extracted folder instead of the archive.
# specify folder name
each_img_idx = '00071198d059ba7f5914a526d124d28e6d010c92466da21d4a04cd5413362552'
with zipfile.ZipFile('stage1_train.zip') as archive:
    mask_ = skimage.io.imread_collection(archive.open(str(each_img_idx) + '/masks/*.png')).concatenate()
Can someone explain to me what's going on?
Not all scikit-image plugins support reading from bytes, so I recommend using imageio. You'll also have to tell ImageCollection how to access the images inside the archive, which is done using a customized load_func:
import zipfile
import imageio
from skimage import io

archive = zipfile.ZipFile('foo.zip')
images = [f.filename for f in archive.filelist]

def zip_imread(fn):
    return imageio.imread(archive.read(fn))

ic = io.ImageCollection(images, load_func=zip_imread)
ImageCollection has some benefits, like not loading all images into memory at the same time. But if you simply want a plain list of NumPy arrays, you can do:
collection = [imageio.imread(archive.read(f)) for f in archive.filelist]
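ZipFile.open does not expand glob patterns, which is why the literal '*.png' name was reported as missing. To reproduce the original masks/*.png selection you can filter the archive's name list yourself; a sketch using the archive name and folder id from the question (np.stack assumes all masks share the same shape, mirroring .concatenate()):

import fnmatch
import zipfile
import imageio
import numpy as np

each_img_idx = '00071198d059ba7f5914a526d124d28e6d010c92466da21d4a04cd5413362552'
pattern = each_img_idx + '/masks/*.png'

with zipfile.ZipFile('stage1_train.zip') as archive:
    # expand the glob against the member names, then read each mask as bytes
    mask_names = fnmatch.filter(archive.namelist(), pattern)
    masks = np.stack([imageio.imread(archive.read(name)) for name in mask_names])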
