skimage.external.TiffWriter: naming importance - python

I need to save images as .tif files. I cannot install new Python packages (it's a university-controlled computer with Anaconda), so I decided to use the copy of tifffile bundled in skimage.external.
Here is the trouble:
The following code works fine and saves the 3 channels of my image:
from skimage.external.tifffile import TiffWriter
import numpy as np

path = '26.01.2021 NLFK CPV timeseries A3B10 imaged 07.04.2021/26.01.2021 NLFK CPV timeseries A3B10 imaged 07.04.2021/Processed'
result_img2 = np.zeros((3, 512, 512))
res = (6.006634765625e-08, 6.006634765625e-08)
raw_b = str('abcdefg')
meta = {}
for i, c in enumerate(result_img2):
    with TiffWriter(path+'/'+'test_C'+str(i)+'.tif') as tif:
        tif.save(np.array(c, dtype='uint8'), resolution=res, description=raw_b, metadata=meta)
But I cannot keep my images named 'test'!
However, doing this:
filename = "07.04.2021 NLFK nucleolin r488 A3B10 m546 DAPI mock Preview 2_nucleus1_"
with TiffWriter(path+'/'+filename+str(i)+'.tif') as tif:
    tif.save(np.array(c, dtype='uint8'), resolution=res, description=raw_b, metadata=meta)
results in an Errno 2 error.
I tried checking whether this filename contains forbidden characters (following another post):
keepcharacters = (' ', '.', '_')
filename = "".join(c for c in filename if c.isalnum() or c in keepcharacters).rstrip()
I also tried encoding it with filename.encode('utf8'). Both attempts failed (again, Errno 2).
The name indicates the condition of each experiment and changes every time, so I need to keep it.
After a quick check, I found that I can save the file in the same folder as the .py script. But again, I need each file to be saved in its respective folder.
Any idea where my trouble could be?
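To narrow down an Errno 2 like this, it may help to print the absolute path actually being written and check two things: does the target folder exist relative to the current working directory, and does the full path stay under Windows' classic 260-character MAX_PATH limit? The second is only a guess at the cause (the folder name appears twice in `path`, and saving works with a short name or next to the script), not something the post confirms. `diagnose_save_path` is a hypothetical helper, not part of tifffile or skimage.

```python
import os

# Classic Windows MAX_PATH: 260 characters *including* the terminating NUL,
# so a usable path is at most 259 characters long (assumption: no long-path
# support enabled on the machine).
WINDOWS_MAX_PATH = 260


def diagnose_save_path(folder: str, filename: str) -> dict:
    """Report the facts that usually explain an Errno 2 when saving a file."""
    full_path = os.path.abspath(os.path.join(folder, filename))
    return {
        "full_path": full_path,
        # A relative folder is resolved against the current working directory,
        # which is not necessarily the script's folder.
        "folder_exists": os.path.isdir(folder),
        "length": len(full_path),
        "too_long_for_windows": len(full_path) >= WINDOWS_MAX_PATH,
    }


# Example with the names from the question:
# print(diagnose_save_path(path, filename + "0.tif"))
```

If `folder_exists` is False, the relative `path` is being resolved from a different working directory; if `too_long_for_windows` is True, a shorter folder structure (or the `\\?\` prefix discussed further down this page) is worth trying.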

Related

Convert .IMG (Classic Disk Image) to .PNG/.JPG in Python

I have a dataset of 100,000+ .IMG files that I need to convert to .PNG / .JPG format to train a CNN for a simple classification task.
I referred to this answer and the solution works for me only partially: some images are not properly converted. The reason, as far as I understand, is that some images have a pixel depth of 16 while others have 8.
import re
from PIL import Image

for file in fileList:
    rawData = open(file, 'rb').read()
    size = re.search(r"(LINES = \d\d\d\d)|(LINES = \d\d\d)", str(rawData))
    pixelDepth = re.search(r"(SAMPLE_BITS = \d\d)|(SAMPLE_BITS = \d)", str(rawData))
    size = (str(size)[-6:-2])
    pixelDepth = (str(pixelDepth)[-4:-2])
    print(int(size))
    print(int(pixelDepth))
    imgSize = (int(size), int(size))
    img = Image.frombytes('L', imgSize, rawData)
    img.save(str(file)+'.jpg')
Data Source: NASA Messenger Mission
.IMG files and their corresponding converted .JPG Files
Files with Pixel Depth of 8 are successfully converted:
Files with Pixel Depth of 16 are NOT properly converted:
Please let me know if there's any more information that I should provide.
Hopefully, from my other answer you now have a better understanding of how your files are formatted, so the code should look something like this:
#!/usr/bin/env python3
import sys
import re
import numpy as np
from PIL import Image
import cv2

with open('EW0220137564B.IMG', 'rb') as f:
    rawData = f.read()

# File size in bytes
fs = len(rawData)
bitDepth = int(re.search(r"SAMPLE_BITS\s+=\s+(\d+)", str(rawData)).group(1))
bytespp = bitDepth // 8
height = int(re.search(r"LINES\s+=\s+(\d+)", str(rawData)).group(1))
width = int(re.search(r"LINE_SAMPLES\s+=\s+(\d+)", str(rawData)).group(1))
print(bitDepth, height, width)

# Offset from start of file to image data - assumes image at tail end of file
offset = fs - (width*height*bytespp)

# Check bitDepth
if bitDepth == 8:
    na = np.frombuffer(rawData, offset=offset, dtype=np.uint8).reshape(height, width)
elif bitDepth == 16:
    # Samples are big-endian unsigned 16-bit integers
    na = np.frombuffer(rawData, offset=offset, dtype='>u2').reshape(height, width)
    # Rescale to 0..255; a plain astype(np.uint8) would keep only the low byte
    na = (na.astype(np.float64) * 255 / max(int(na.max()), 1)).astype(np.uint8)
else:
    print(f'ERROR: Unexpected bit depth: {bitDepth}', file=sys.stderr)
    sys.exit(1)

# Save either with PIL
Image.fromarray(na).save('result.jpg')
# Or with OpenCV, which may be faster
cv2.imwrite('result.jpg', na)
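For 16-bit files the pixels are big-endian two-byte integers, and an 8-bit JPEG needs them rescaled rather than cast: casting to 8 bits keeps only the low byte, so a value of 256 becomes 0. Below is a stdlib-only sketch of the rescaling whose output can be handed to `Image.frombytes('L', (width, height), data)`. The per-image linear stretch is a choice made here for illustration, and `be16_to_8bit` is a hypothetical helper name.

```python
import struct


def be16_to_8bit(data: bytes, width: int, height: int) -> bytes:
    """Rescale big-endian unsigned 16-bit samples to 8-bit (0..255)."""
    count = width * height
    # ">" = big-endian, "H" = unsigned 16-bit integer, one per pixel
    samples = struct.unpack(f">{count}H", data[: count * 2])
    peak = max(samples) or 1  # avoid dividing by zero on an all-black image
    # Linear stretch so the brightest pixel in this image maps to 255
    return bytes(s * 255 // peak for s in samples)


raw = struct.pack(">4H", 0, 256, 512, 1024)
print(list(be16_to_8bit(raw, 2, 2)))  # [0, 63, 127, 255]
# Naive truncation keeps only the low byte: 256, 512 and 1024 all become 0
print([s & 0xFF for s in struct.unpack(">4H", raw)])  # [0, 0, 0, 0]
```

A per-image stretch makes every image use the full 0 to 255 range, but brightness is then no longer comparable between images; dividing by a fixed maximum (for example 65535) keeps it comparable, which may matter for training a classifier.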
If you have thousands to do, I would recommend GNU Parallel, which you can easily install on your Mac with homebrew using:
brew install parallel
You can then change my program above to accept a filename as a parameter in place of the hard-coded filename, and the command to get them all done in parallel is (--dry-run only prints the commands; remove it to actually run them):
parallel --dry-run script.py {} ::: *.IMG
For a bit more effort, you can get it done even faster by putting the code above in a function and calling the function for each file specified as a parameter. That way you can avoid starting a new Python interpreter per image and tell GNU Parallel to pass as many files as possible to each invocation of your script like this:
parallel -X --dry-run script.py ::: *.IMG
The structure of the script then looks like this:
import sys

def processOne(filename):
    # Open, read, search, extract and save, as per my code above
    ...

# Main - process all filenames received as parameters
for filename in sys.argv[1:]:
    processOne(filename)
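The "search, extract" part of processOne can also be done on the raw bytes with a bytes regex, which avoids depending on how str() escapes the file contents. This sketch assumes, as the code above does, that the first SAMPLE_BITS, LINES and LINE_SAMPLES entries in the file describe the image and that the pixel data sits at the end of the file; `parse_header` is a hypothetical helper name.

```python
import re


def parse_header(raw: bytes) -> tuple:
    """Return (bit_depth, height, width, offset) from a .IMG file's text label."""

    def field(name: bytes) -> int:
        # \b stops b"LINES" from matching inside a longer keyword such as
        # b"TOTAL_LINES"; as in the answer's code, the first occurrence wins.
        match = re.search(rb"\b" + name + rb"\s*=\s*(\d+)", raw)
        if match is None:
            raise ValueError(f"{name.decode()} not found in label")
        return int(match.group(1))

    bit_depth = field(b"SAMPLE_BITS")
    height = field(b"LINES")
    width = field(b"LINE_SAMPLES")
    # Same assumption as the answer: pixel data is at the tail end of the file
    offset = len(raw) - width * height * (bit_depth // 8)
    return bit_depth, height, width, offset


def processOne(filename: str) -> None:
    with open(filename, "rb") as f:
        raw = f.read()
    bit_depth, height, width, offset = parse_header(raw)
    print(filename, bit_depth, height, width, offset)
    # ...decode raw[offset:] as 8- or 16-bit pixels and save, as in the answer
```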

Defining a filename and calling the filename in various loops and functions

In short, I have written code that opens a file and makes a number of modifications to it. However, I don't want to keep going through my script renaming all the files whenever I want to open a new file.
I'm thinking of setting a variable early on that defines the filename, i.e.
A=filename('png1.png')
B=filename('png2.png')
However, I don't quite know how to implement this. This is my current code:
import os
from os import path
import numpy as np
from PIL import Image
from wordcloud import WordCloud, STOPWORDS

#d=path.dirname(__file__) if "__file__" in locals() else os.getcwd()
os.chdir('C:/Users/Sams PC/Desktop/Word_Cloud_Scripts/Dmitrys Papers/Word_Cloud_Dmitry')
Document=open('Dmitry_all_lower.txt', 'r', encoding='utf-8')
text=Document.read()
heart_mask=np.array(Image.open("**png1.png**"))
print (heart_mask)
split= str('**png1.png**').rsplit('.')
extension=split[len(split)-1]
if extension == "png":
    image = Image.open("**png1.png**")
    image = image.convert("RGBA") # Convert this to RGBA if possible (convert returns a new image)
    canvas = Image.new('RGBA', image.size, (255,255,255,255)) # Empty canvas colour (r,g,b,a)
    canvas.paste(image, mask=image) # Paste the image onto the canvas, using its alpha channel as mask
    #canvas.thumbnail([width, height], Image.ANTIALIAS)
    canvas.save('**png2.png**')
    from wand.image import Image
    with Image(filename='**png2.png**') as img:
        img.format='jpeg'
        img.save(filename='**png1.jpg**')
    from PIL import Image
    heart_mask=np.array(Image.open("**png1.jpg**"))
else:
    print ('')
print (heart_mask)
stopwords=set(STOPWORDS)
stopwords.update(["will", "us","protein","residue", "interaction","residues","using","proteins","thus","fig"])
wc= WordCloud(stopwords=stopwords, background_color="white",max_words=1000, mask=heart_mask, contour_width=3, contour_color='black')
print ('Generating Word Cloud')
wc.generate(text)
wc.to_file("Dmitry3.png")
import matplotlib.pyplot as plt
plt.figure()
plt.imshow(wc,interpolation="bilinear")
plt.axis("off")
print ('Generation Done')
plt.show()
I've put in the entire thing just to show what's going on, but I've bolded (put stars next to) the files I'm trying to modify in my idea. As you can see, I have multiple calls to my file 'png1.png', and I also have calls to save a modified version of that file to 'png2.png' and later a JPEG version of it, 'png1.jpg'. I don't want to have to go through my script each time and change each one individually. I was hoping to define them earlier, such as A=png1, B=png2, C=jpg1, so that I can replace the calls in my loops with simply A, B and C; then, if I choose a new image to upload, I only change 1 or 2 lines rather than 5 or 6, i.e.
heart_mask=np.array(Image.open(A))
split= str(A).rsplit('.')
image = Image.open(A)
canvas.save(B)
# ... and so on
To make your task easier, perhaps you should establish a naming standard defining which files are to be modified, and which ones are already processed. Also, the images you are to process should have a dedicated directory for the purpose.
From what I understand of your code, PNG files are the ones getting processed, while the JPEG files are already done. You can use os.listdir() to go through the files in a folder and pick those with a .png extension, something like this:
import os

for file in os.listdir("/dedicated_image_dir"):
    if file.endswith(".png"):
        # Process your PNG images here; listdir returns bare names, so build
        # the full path with os.path.join("/dedicated_image_dir", file)
        pass
That way, you wouldn't even need to change your code just to accommodate new PNG images with different filenames.
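A minimal sketch of that idea with pathlib, assuming a hypothetical convention that processed copies get an `_rgba` suffix so they are not picked up again as inputs. Each input path plays the role of A, and B and C are derived from it, so no file name is typed more than once. `pending_pngs`, `output_paths` and `PROCESSED_TAG` are illustrative names, not part of any library.

```python
from pathlib import Path

# Hypothetical naming convention: processed copies get this suffix, so they
# are never mistaken for new inputs on the next run.
PROCESSED_TAG = "_rgba"


def pending_pngs(folder):
    """PNG files in `folder` that are inputs, not outputs of an earlier run."""
    return sorted(
        p
        for p in Path(folder).iterdir()
        if p.is_file()
        and p.suffix.lower() == ".png"
        and not p.stem.endswith(PROCESSED_TAG)
    )


def output_paths(src: Path):
    """Derive B (RGBA PNG) and C (JPEG) from the input path A."""
    b = src.with_name(src.stem + PROCESSED_TAG + ".png")
    c = src.with_suffix(".jpg")
    return b, c


# Usage (folder name from the answer above):
# for A in pending_pngs("/dedicated_image_dir"):
#     B, C = output_paths(A)
#     image = Image.open(A) ... canvas.save(B) ... img.save(filename=str(C))
```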

Augmenting images in a dataset - encountering ValueError: Could not find a format to read the specified file in mode 'i'

I'm in a beginner neural networks class and am really struggling.
I have a dataset of images that isn't big enough to train my network with, so I'm trying to augment them (rotate/noise addition etc.) and add the augmented images onto the original set. I'm following the code found on Medium: https://medium.com/#thimblot/data-augmentation-boost-your-image-dataset-with-few-lines-of-python-155c2dc1baec
However, I'm encountering ValueError: Could not find a format to read the specified file in mode 'i'
Not sure what this error means or how to go about solving it. Any help would be greatly appreciated.
import os
import random
from scipy import ndarray
import skimage as sk
import skimage.io
from skimage import transform
from skimage import util

path1 = "/Users/.../"
path2 = "/Users/.../"
listing = os.listdir(path1)
num_files_desired = 1000
image = [os.path.join(path2, f) for f in os.listdir(path2) if os.path.isfile(os.path.join(path2, f))]
num_generated_files = 0
while num_generated_files <= num_files_desired:
    image_path = random.choice(image)
    image_to_transform = sk.io.imread(image_path)
137 if format is None:
138 raise ValueError(
--> 139 "Could not find a format to read the specified file " "in mode %r" % mode
140 )
141
ValueError: Could not find a format to read the specified file in mode 'i'
I can see a few possibilities. Before going through them, let me explain your error: it basically indicates that your images cannot be read by sk.io.imread(). Here are the things you can try:
Your [os.path.join(path2, f) for f in os.listdir(path2) if os.path.isfile(os.path.join(path2, f))] part may not produce the correct image paths. If so, you have to correct it manually: give the exact folder instead of using such a loop, simply use os.listdir() and read the files yourself.
You can also use glob to read only the files that have the same extension, like .jpg.
Your files may be corrupted. You can weed those out with PIL: open each image with image = Image.open() first and call its image.verify() method.
Read about the plugin argument of sk.io.imread(filename, plugin=...); choosing a different plugin may resolve your issue.
Hope it helps.
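The mode 'i' in the message is imageio's single-image mode (the traceback comes from imageio's format lookup), so the error means imageio found no reader for one of the files it was given. One common culprit, an assumption since the question doesn't show the folder contents, is a non-image file in the folder, such as the hidden .DS_Store that macOS creates (the paths start with /Users/). Here is a sketch of the first two suggestions above, filtering by extension with the standard library; `list_image_files` and the extension set are assumptions to adjust.

```python
import os

# Assumed set of extensions worth handing to sk.io.imread; adjust as needed.
IMAGE_EXTENSIONS = {".png", ".jpg", ".jpeg", ".bmp", ".tif", ".tiff"}


def list_image_files(folder):
    """Full paths of regular, non-hidden files with an image extension."""
    paths = []
    for name in sorted(os.listdir(folder)):
        if name.startswith("."):  # skip hidden files such as macOS .DS_Store
            continue
        full = os.path.join(folder, name)
        ext = os.path.splitext(name)[1].lower()
        if os.path.isfile(full) and ext in IMAGE_EXTENSIONS:
            paths.append(full)
    return paths


# Usage in the question's code:
# image = list_image_files(path2)
```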

Python Pillow error: No such file or directory

I have a Pillow function for displaying images in a Pandas dataframe on a Windows machine. It works OK on a test dataset.
Pillow function:
from PIL import Image

def get_thumbnail(path):
    i = Image.open(path)
    print(path)
    i.show()
    return i
Then I use that function to create a new Pandas column that should hold the PIL image. The image is loaded from the image path stored in another Pandas column:
adInfoListPD['Ad_thumb']
which looks like this:
0    C:\Users\user\Documents\001ML\willat\Games_Konsolen\03_ps4-pro-500-million-limited-edition-blau-transparent-bundle-29598325900_ps4-pro-500-million-limited-edition-blau-transparent-bundle-295983259__thumb_100x75.jpg
1    C:\Users\user\Documents\001ML\willat\Games_Konsolen\04_playstation-4-20th-anniversary-edition-ungeoeffnet-29586533000_playstation-4-20th-anniversary-edition-ungeoeffnet-295865330__thumb_100x75.jpg
2    C:\Users\user\Documents\001ML\willat\Games_Konsolen\05_playstation-4-20th-anniversary-sammleredition-ovp-29496806400_playstation-4-20th-anniversary-sammleredition-ovp-294968064__thumb_100x75.jpg
3    C:\Users\user\Documents\001ML\willat\Games_Konsolen\07_gratis-versand-alles-zusammen-xxxl-paket-29517022700_gratis-versand-alles-zusammen-xxxl-paket-295170227__thumb_100x75.jpg
4    C:\Users\user\Documents\001ML\willat\Games_Konsolen\08_groesste-ankauf-mit-sofortigem-bargeld-30099513000_groesste-ankauf-mit-sofortigem-bargeld-300995130__thumb_100x75.jpg
5    C:\Users\user\Documents\001ML\willat\Games_Konsolen\09_wir-zahlen-sofort-bargeld-30099285800_wir-zahlen-sofort-bargeld-300992858__thumb_100x75.jpg
And I use this line to create the column that will hold the PIL image:
adInfoListPD['image'] = adInfoListPD.Ad_thumb.map(lambda f: get_thumbnail(f))
I get this error:
FileNotFoundError: [Errno 2] No such file or directory: 'C:\\Users\\user\\Documents\\001ML\\willat\\Games_Konsolen\\03_ps4-pro-500-million-limited-edition-blau-transparent-bundle-29598325900_ps4-pro-500-million-limited-edition-blau-transparent-bundle-295983259__thumb_100x75.jpg'
I've double-checked the paths and they are OK.
I've also read all the other posts about Python path problems on Windows, and I think I'm passing the path in the proper way.
As I said, everything works OK on the demo data, but not on my data.
The path was the problem in the end.
I used:
os.path.join(dir_base, category_folder, dir_name, file_name_thumb)
to create a proper path that works on all platforms.
I was already using os.path.join for the directories, but I didn't know it works with filenames as well. So I was adding the filename manually and therefore omitting the "\\" separator in the path between dir_name and file_name_thumb.
I also added the "\\\\?\\" prefix to the path in the Pillow function:
def get_thumbnail(path):
    path = "\\\\?\\" + path
    i = Image.open(path)
    return i
This fixes problems with paths longer than 260 characters (the MAX_PATH limit) on Windows.
At the same time, this is not supposed to mess up path interpretation on Linux machines, but I'm not sure about that...
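On the Linux question: the `\\?\` prefix is a Windows-only convention. On Linux a backslash is an ordinary filename character, so the prefixed string would be read as a relative name that doesn't exist. Here is a minimal sketch that adds the prefix only when running on Windows (`os.name == 'nt'`) and only to absolute drive-letter paths. Because the prefix turns off Windows' own path normalisation, the path is normalised first. `to_long_path` is a hypothetical helper name. UNC paths, which need the different `\\?\UNC\` form, are deliberately left alone.

```python
import ntpath
import os

LONG_PREFIX = "\\\\?\\"  # the four characters \\?\


def to_long_path(path: str, os_name: str = os.name) -> str:
    """Add the \\?\ prefix to absolute Windows paths; leave others untouched."""
    if os_name != "nt" or path.startswith(LONG_PREFIX):
        return path  # POSIX systems, or already prefixed
    # \\?\ disables normalisation, so convert "/" to "\" and collapse ".." first
    path = ntpath.normpath(path)
    if not ntpath.isabs(path) or path.startswith("\\\\"):
        return path  # relative or UNC paths: not handled in this sketch
    return LONG_PREFIX + path


# Usage in the Pillow function:
# i = Image.open(to_long_path(path))
```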

imread_collection There is no item

I am trying to read several images from an archive with skimage.io.imread_collection, but for some reason it throws an error:
"There is no item named '00071198d059ba7f5914a526d124d28e6d010c92466da21d4a04cd5413362552/masks/*.png' in the archive".
I checked several times: that directory exists in the archive, and with *.png I just specify that I want all the images in my collection. imread_collection works well when I load the images from the extracted folder instead of the archive.
import zipfile
import skimage.io

#specify folder name
each_img_idx = '00071198d059ba7f5914a526d124d28e6d010c92466da21d4a04cd5413362552'
with zipfile.ZipFile('stage1_train.zip') as archive:
    mask_ = skimage.io.imread_collection(archive.open(str(each_img_idx) + '/masks/*.png')).concatenate()
Can someone explain to me what's going on?
Not all scikit-image plugins support reading from bytes, so I recommend using imageio. You'll also have to tell ImageCollection how to access the images inside the archive, which is done using a customized load_func:
import zipfile
from skimage import io
import imageio

archive = zipfile.ZipFile('foo.zip')
images = [f.filename for f in archive.filelist]

def zip_imread(fn):
    return imageio.imread(archive.read(fn))

ic = io.ImageCollection(images, load_func=zip_imread)
ImageCollection has some benefits like not loading all images into memory at the same time. But if you simply want a long list of NumPy arrays, you can do:
collection = [imageio.imread(archive.read(f)) for f in archive.filelist]
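As for the original error: ZipFile.open() looks up one member by its exact name and does not expand wildcards, so '…/masks/*.png' is looked up literally, which is where "There is no item named … in the archive" comes from. A stdlib sketch that does the wildcard matching over the archive's member names instead. `zip_glob` is a hypothetical helper, and unlike a filesystem glob, `*` in fnmatch also matches `/`, so files in deeper subfolders would match too.

```python
import fnmatch
import zipfile


def zip_glob(archive: zipfile.ZipFile, pattern: str) -> list:
    """Names of archive members matching a shell-style pattern like 'id/masks/*.png'."""
    # fnmatchcase: case-sensitive on every OS, like zip member names
    return sorted(n for n in archive.namelist() if fnmatch.fnmatchcase(n, pattern))


# Usage with the answer's loader:
# names = zip_glob(archive, each_img_idx + '/masks/*.png')
# ic = io.ImageCollection(names, load_func=zip_imread)
```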
