How to use spectral python to handle multispectral raster files?

I'm interested in using Spectral Python (SPy) to visualize and classify multiband raster GeoTIFFs (not hyperspectral data). Currently it appears that only the .lan and .gis file formats are readable.
I've tried to convert files to .lan with gdal_translate, but the image format is not supported (IOError: Unable to determine file type or type not supported).
Any idea how to use this library for a non-hyperspectral dataset?

Convert the GeoTIFF file to a compatible format (e.g. LAN). This can be done in one of two ways. From a system shell, use gdal_translate:
gdal_translate -of LAN file.tif file.lan
Or, similarly, from within Python:
from osgeo import gdal
src_fname = 'file.tif'
dst_fname = 'file.lan'
driver = gdal.GetDriverByName('LAN')
sds = gdal.Open(src_fname)
dst = driver.CreateCopy(dst_fname, sds)
dst = None # close dataset; the file can now be used by other processes
Note that the first method is actually better, as it also transfers other metadata, such as the spatial reference system and possibly other data. Doing the same correctly in Python would require a few more lines of code.
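Once converted, the .lan file should open in SPy like its natively supported formats. A minimal sketch, assuming the file name from the example above (the band indices for the preview are arbitrary):
import spectral

img = spectral.open_image('file.lan')  # returns a SpyFile object
print(img.shape)                       # (rows, cols, bands)
arr = img.load()                       # read the whole image into an in-memory array

spectral.imshow(img, bands=(2, 1, 0))  # quick preview built from three of the bands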


Adding custom extratags with tifffile

I'm trying to write a script to simplify my everyday life in the lab. I operate one ThermoFisher / FEI scanning electron microscope and I save all my pictures in the TIFF format.
The microscope software is adding an extensive custom TiffTag (code 34682) containing all the microscope / image parameters.
In my script, I would like to open an image, perform some manipulations and then save the data in a new file, including the original FEI metadata. To do so, I would like to use a python script using the tifffile module.
I can open the image file and perform the needed manipulations without problems. Retrieving the FEI metadata from the input file is also working fine.
I was thinking of using the imwrite function to save the output file, with the extratags optional argument to transfer the original FEI metadata to the output file.
This is an extract of the tifffile documentation about the extratags:
extratags : sequence of tuples
    Additional tags as [(code, dtype, count, value, writeonce)].

    code : int
        The TIFF tag Id.
    dtype : int or str
        Data type of items in 'value'. One of TIFF.DATATYPES.
    count : int
        Number of data values. Not used for string or bytes values.
    value : sequence
        'Count' values compatible with 'dtype'. Bytes must contain
        count values of dtype packed as binary data.
    writeonce : bool
        If True, the tag is written to the first page of a series only.
Here is a snippet of my code.
my_extratags = [(input_tags['FEI_HELIOS'].code,
                 input_tags['FEI_HELIOS'].dtype,
                 input_tags['FEI_HELIOS'].count,
                 input_tags['FEI_HELIOS'].value,
                 True)]

tifffile.imwrite('output.tif', data, extratags=my_extratags)
This code does not work: it complains that the value of the extra tag should be ASCII 7-bit encoded. This already looks very strange to me, because I haven't touched the metadata; I am just copying it to the output file.
If I convert the metadata tag value to a string, as below:
my_extratags = [(input_tags['FEI_HELIOS'].code,
                 input_tags['FEI_HELIOS'].dtype,
                 input_tags['FEI_HELIOS'].count,
                 str(input_tags['FEI_HELIOS'].value),
                 True)]

tifffile.imwrite('output.tif', data, extratags=my_extratags)
the code works, the image is saved, and the metadata entry corresponding to 'FEI_HELIOS' is created, but it is empty!
Can you help me find what I am doing wrong?
I don't need to use tifffile, but I would prefer to use Python rather than ImageJ, because I already have several other Python scripts and I would like to integrate this new one with them.
Thanks a lot in advance!
toto
ps. I'm a frequent user of stackoverflow, but this is actually my first question!
In principle the approach is correct. However, tifffile parses the raw values of certain tags, including FEI_HELIOS, into dictionaries or other Python types. To get the raw tag value back for rewriting, it needs to be read from the file again. In these cases, use the internal TiffTag._astuple method to get an extratags-compatible tuple for the tag, e.g.:
import tifffile

with tifffile.TiffFile('FEI_SEM.tif') as tif:
    assert tif.is_fei
    page = tif.pages[0]
    image = page.asarray()
    ...  # process image
    with tifffile.TiffWriter('copy1.tif') as out:
        out.write(
            image,
            photometric=page.photometric,
            compression=page.compression,
            planarconfig=page.planarconfig,
            rowsperstrip=page.rowsperstrip,
            resolution=(
                page.tags['XResolution'].value,
                page.tags['YResolution'].value,
                page.tags['ResolutionUnit'].value,
            ),
            extratags=[page.tags['FEI_HELIOS']._astuple()],
        )
This approach does not preserve Exif metadata, which tifffile cannot write.
Another approach, since FEI files seem to be written uncompressed, is to directly memory map the image data in the file to a numpy array and manipulate that array:
import shutil
import tifffile

shutil.copyfile('FEI_SEM.tif', 'copy2.tif')  # work on a copy; all existing tags are preserved
image = tifffile.memmap('copy2.tif')         # memory-map the image data in the file
...  # process image
image.flush()                                # write the changes back to the file
Finally, consider tifftools for rewriting TIFF files where tifffile is currently failing, e.g. Exif metadata.

Errno 20: Not a directory when saving into zip file

When I try to save a pyplot figure as a jpg, I keep getting a directory error saying that the given file name is not a directory. I am working in Colab. I have a numpy array called z_img and have opened a zip file.
import matplotlib.pyplot as plt
from zipfile import ZipFile
zipObj = ZipFile('slices.zip', 'w') # opening zip file
plt.imshow(z_img, cmap='binary')
The plotting works fine. I did a test of saving the image into Colab's regular memory like so:
plt.savefig(str(ii)+'um_slice.jpg')
And this works perfectly, except I am intending to use this code in a for loop. ii is an index to differentiate between the images, and several hundred images would be created, so I want them to go into the zipfile. Now, when I try adding the path to the zipfile:
plt.savefig('/content/slices.zip/'+str(ii)+'um_slice.jpg')
I get: NotADirectoryError: [Errno 20] Not a directory: '/content/slices.zip/150500um_slice.jpg'
I assume it's because the {}.jpg string is a filename, and not a directory per se. But I am quite new to Python, and don't know how to get the plot into the zip file. That's all I want. Would love any advice!
First off, for anything that's not photographic content (i.e. nice and soft), JPEG is the wrong format. You'll have a better time using a different file format: PNG is nice for pixels, SVG for vector graphics (in case you embed this in a website later!), and PDF for vector, too.
The error message is quite on point: you cannot just save to a zip file as if it was a directory.
There are multiple ways around this:
- Use the tempfile module's mkdtemp to make a temporary directory, save into that, and zip the result.
- Save not into a filename, but into a buffer (BytesIO, I guess) and append that to the compressed stream (I'm not too familiar with ZipFile).
- Use PDF as output and simply generate a multi-page PDF (see the sketch after this list); it's not hard, and probably much nicer in the long term. You can still convert that vector-graphics result to PNG (or any other pixel format) as desired, but for the time being it's space efficient, arbitrarily scalable, and keeps all your pages in one place. It's easy to import selected pages into LaTeX (matter of fact, \includegraphics does it directly) or into websites (pdf.js).
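For the PDF route, a minimal sketch using matplotlib's PdfPages; the iterable of slices is just a placeholder for whatever produces your arrays:
import matplotlib.pyplot as plt
from matplotlib.backends.backend_pdf import PdfPages

with PdfPages('slices.pdf') as pdf:
    for ii, z_img in slices:      # hypothetical iterable of (index, array) pairs
        plt.imshow(z_img, cmap='binary')
        plt.title(f'{ii}um slice')
        pdf.savefig()             # append the current figure as a new page
        plt.close()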
From the docs, matplotlib.pyplot.savefig accepts a binary file-like object. ZipFile.open creates binary file-like objects. These two have to get together!
with zipObj.open(str(ii) + 'um_slice.jpg', 'w') as fp:
    plt.savefig(fp, format='jpg')  # a file object carries no extension, so give the format explicitly
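Putting it together for the several-hundred-image case, a sketch along the lines of the question's setup (the iterable of slices is assumed):
from zipfile import ZipFile
import matplotlib.pyplot as plt

with ZipFile('slices.zip', 'w') as zipObj:
    for ii, z_img in slices:      # hypothetical iterable of (index, array) pairs
        plt.imshow(z_img, cmap='binary')
        with zipObj.open(str(ii) + 'um_slice.jpg', 'w') as fp:
            plt.savefig(fp, format='jpg')
        plt.close()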

how to read .gdf file using Python

I am working on EEG signal data stored in a ".gdf" file for my university project. My aim is to open that file using Python. So far, I have been able to open the file using the MNE package. The code is:
import os
import numpy as np
import mne
raw = mne.io.read_raw_gdf('1.gdf')
print(raw.info)
As a result, I am getting:
Extracting EDF parameters from C:\Users\Gamer\Desktop\1.gdf...
GDF file detected
Setting channel info structure...
Creating raw.info structure...
<Info | 7 non-empty values
bads: []
ch_names: AFz, F3, F1, Fz, F2, F4, FFC5h, FFC3h, FFC1h, FFC2h, FFC4h, ...
chs: 64 EEG
custom_ref_applied: False
highpass: 0.0 Hz
lowpass: 128.0 Hz
meas_date: 2017-04-04 12:50:01 UTC
nchan: 64
projs: []
sfreq: 256.0 Hz
>
Now, my questions are:
1. How can I get the values in tabular form?
2. How do I know the dimensions of the dataset using Python?
3. Is there any way to convert the .gdf file into a .csv file or any other format (like a pandas dataframe)?
The data set description is available at http://bnci-horizon-2020.eu/database/data-sets/001-2019/dataset_description_v1-1.pdf
Welcome to stack overflow and mne-python! :)
How do I know the dimension of the dataset using Python?
If you try to print the raw file that you've read (instead of just its info attribute), you should be able to see the dimensions. Raw files in MNE-Python are always stored channels-first, so the dimensions of the data array are channels x samples.
how can I get the values in tabular form?
If you are fine with an array, you can get it via the .get_data() method (see the docs here). If you prefer a pandas dataframe, you can get it with raw.to_data_frame() (docs).
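A minimal sketch pulling these together for the '1.gdf' file from the question:
import mne

raw = mne.io.read_raw_gdf('1.gdf', preload=True)

data = raw.get_data()
print(data.shape)         # (n_channels, n_samples), e.g. (64, ...)

df = raw.to_data_frame()  # one column per channel, plus a time column
df.to_csv('1.csv', index=False)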
But before getting the data array/table you might want to perform filtering (for example raw.filter(1, None)), annotate bad data segments (tutorial), interpolate bad channels (tutorial) and perform ICA (tutorial). In general, it is likely that the analysis you are trying to conduct is either implemented in mne or easier to perform using mne objects.
Make sure to see the many rich tutorials and examples in mne docs.
If you have any further problems we use Discourse now: https://mne.discourse.group/

How to convert numpy array to bytes object without save audio file on disk?

I am now learning to build a TTS project based on Tacotron-2.
Here, the original code in the save_wav(wav, path, sr) function has a step that saves a numpy array to a .wav file by using:
wav *= 32767 / max(0.01, np.max(np.abs(wav)))
scipy.io.wavfile.write(path, hparams.sample_rate, wav.astype(np.int16))
However, after obtaining a numpy array with wav *= 32767 / max(0.01, np.max(np.abs(wav))), I want to convert it to an .mp3 file so that it will be easier to send back as a streaming response.
Right now, I can convert a .wav bytes object to an .mp3 file, but the problem is that I don't know how to convert the numpy array to a .wav bytes object.
I searched around and found that it seems I need to set a header for the numpy array, but almost all the posts I looked into suggested using modules like scipy.io.wavfile and audioop, which first save the numpy array to a .wav file and then read it back with open('filename.wav', 'rb').
(This is the link for the scipy.io.wavfile.write module, where the filename param should be a string or open file handle, which, from my understanding, means the generated .wav file will be saved on disk.)
Could anyone give any suggestion on how to achieve this?
Use io.BytesIO
There is a much simpler and more convenient solution using a little hack: create an in-memory bytes I/O object and use it like a file for writing and reading:
import io
from scipy.io.wavfile import write
bytes_wav = bytes()
byte_io = io.BytesIO(bytes_wav)
write(byte_io, <audio_sr>, <audio_numpy_array>)
result_bytes = byte_io.read()
Use your data sample rate and values array instead of <audio_sr> and <audio_numpy_array>.
You can operate with result_bytes as bytes of .wav file (as required).
P.S. Also check this simple gist of how to perform values array -> bytes -> values array for wav file.
I finally solved this problem by modifying and creating new modules based on scipy.io.wavfile.write and audio_segment.py of pydub.
Besides, when you want to operate on WAV/MP3 bytes without saving them as a .wav/.mp3 file (normally by using some handy APIs or Python package modules), you have to add the header for them manually. That will not be too tough a task if you look into those excellent packages' source code.
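For reference, a sketch of the fully in-memory WAV-to-MP3 path using pydub (this assumes ffmpeg is installed; the array and sample-rate names are placeholders):
import io
from scipy.io.wavfile import write
from pydub import AudioSegment

# numpy int16 array -> WAV bytes in memory
wav_io = io.BytesIO()
write(wav_io, sample_rate, wav.astype('int16'))
wav_io.seek(0)

# WAV bytes -> MP3 bytes in memory (pydub calls ffmpeg under the hood)
segment = AudioSegment.from_wav(wav_io)
mp3_io = io.BytesIO()
segment.export(mp3_io, format='mp3')
mp3_bytes = mp3_io.getvalue()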

Write Latitude and Longitude to Geotiff file

I am basically trying to achieve the opposite of this question.
I have a set of latitude and longitude coordinates (with values) in the WGS84 coordinate system, that I would like to write to a geotiff (or just add to a gdal dataset) via the gdal python bindings.
For example, my starting data might be:
lat = np.array([45.345,56.267,23.425])
lon = np.array([134.689,128.774,111.956])
value = np.array([3.0,6.2,2.5])
How might one do this? Thanks!
Although it is not in your question, it appears you need to project the lat/long data from the WGS84 datum to a UTM projection. This can be done with the ogr2ogr command-line tool from GDAL, using the two options -s_srs EPSG:4326 -t_srs ???? (the target SRID). It can also be done internally in Python using the OGR module of GDAL. Here is an example of use.
There are two independent ways to get a raster from point data. The first is to interpolate the values in data, so that the values flood the region (or sometimes only the convex hull). There are many methods and tools to interpolate values in 2D. With GDAL, the command-line tool gdal_grid is useful for this purpose, although I don't think it is possible to use it from Python. Probably the simplest would be to use scipy.interpolate. Once you have a 2D NumPy array, it is simple to create a raster file with GDAL/Python; a sketch is shown below.
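A minimal sketch of that interpolate-then-write path, using the sample arrays from the question (the grid size and interpolation method are arbitrary choices):
import numpy as np
from osgeo import gdal, osr
from scipy.interpolate import griddata

lat = np.array([45.345, 56.267, 23.425])
lon = np.array([134.689, 128.774, 111.956])
value = np.array([3.0, 6.2, 2.5])

# build a regular lon/lat grid covering the points and interpolate onto it
nx, ny = 100, 100
gx = np.linspace(lon.min(), lon.max(), nx)
gy = np.linspace(lat.max(), lat.min(), ny)   # north-up: latitude decreases with row index
X, Y = np.meshgrid(gx, gy)
grid = griddata((lon, lat), value, (X, Y), method='linear')

# write the grid as a single-band GeoTIFF in WGS84
drv = gdal.GetDriverByName('GTiff')
ds = drv.Create('interpolated.tif', nx, ny, 1, gdal.GDT_Float32)
xres = (lon.max() - lon.min()) / nx
yres = (lat.max() - lat.min()) / ny
ds.SetGeoTransform((lon.min(), xres, 0, lat.max(), 0, -yres))
srs = osr.SpatialReference()
srs.ImportFromEPSG(4326)
ds.SetProjection(srs.ExportToWkt())
ds.GetRasterBand(1).WriteArray(grid.astype(np.float32))
ds = None  # close and flush to disk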
The second method of converting the points to a raster is to burn the point locations into pixels of a raster. Unlike the first method, only the locations where the points fall have values; the values are not interpolated anywhere else in the raster. Rasterising or burning vectors into a raster can be done with the GDAL command-line tool gdal_rasterize. It can also be done internally with GDAL/Python; here is an example.
It is possible to use gdal_grid from Python. I am using it.
All you need to do is construct the command as if you were using it from the command line and put it inside a subprocess.call(com, shell=True). You need to import the subprocess module first.
This is actually how I am using it:
pcall= "gdal_grid --config 'NUM_THREADS=ALL_CPUS GDAL_CACHEMAX=2000'\
-overwrite -a invdist:power=2.0:smoothing=2.0:radius1=360.0:radius2=360.0\
-ot UInt16 -of GTiff -outsize %d %d -l %s -zfield 'Z' %s %s "%(npx, npy,\
lname,ptshapefile,interprasterfile)
subprocess.call(pcall, shell= True)
The NUM_THREADS option is available from GDAL 1.10+.
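As a side note, recent GDAL Python bindings (2.1+) also expose gdal.Grid directly, so the subprocess can be avoided. A hedged sketch, reusing the variables from the snippet above (keyword names follow gdal.GridOptions and may need adjusting for your GDAL version):
from osgeo import gdal

ds = gdal.Grid(
    interprasterfile,       # output raster path
    ptshapefile,            # input point dataset
    format='GTiff',
    outputType=gdal.GDT_UInt16,
    width=npx, height=npy,
    layers=[lname],
    zfield='Z',
    algorithm='invdist:power=2.0:smoothing=2.0:radius1=360.0:radius2=360.0',
)
ds = None  # close and flush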
