How to Load .mat Folder and files - python

I am trying to load a .mat dataset into my dataframe. So far, I am only able to load a single file at a time from the folder TrainingSet1 with
os.chdir('/Users/Ashi/Downloads/TrainingSet2')
data = loadmat('A2001.mat')
I am able to see the data in it, but how can I load the whole TrainingSet1 folder so that I can view all of it?
Also, how could I view the .mat files as images?
Here's my code:
%reload_ext autoreload
%autoreload 2
%matplotlib inline
from fastai.vision import *
from fastai.metrics import error_rate
from mat4py import loadmat
from pylab import*
import matplotlib
import os
os.chdir('/Users/Ashi/Downloads/TrainingSet2')
data = loadmat('A2001.mat')
data
{'ECG': {'sex': 'Male', 'age': 68,
'data': [[0.009784321006571624,
0.006006033870606647,
... (this is roughly what the data looks like)
imshow('A2001.mat',[])
---------------------------------------------------------------
TypeError Traceback (most recent call last)
<ipython-input-52-23bbdf3a7668> in <module>
----> 1 imshow('A2001.mat',[])...A long error is displayed
TypeError: unhashable type: 'list'
Thanks for any help

It is hard to tell from your post what the input format is and what output format you want.
Below is an example of reading all the .mat files in a folder, and an example of how to show data['data'] as an image.
I hope the example is enough for you to keep going on your own.
I created a sample data set ('A2001.mat', 'A2002.mat', 'A2003.mat') using MATLAB.
If you have MATLAB installed, I recommend running the following code to create the sample input, so that the Python example is reproducible:
ECG.sex = 'Male';
ECG.age = 68;
data = im2double(imread('cameraman.tif')) / 10; % Divide by 10 for simulating range [0, 0.1] instead of [0, 1]
save('A2001.mat', 'ECG', 'data');
ECG.sex = 'Male';
ECG.age = 46;
data = im2double(imread('cell.tif'));
save('A2002.mat', 'ECG', 'data');
ECG.sex = 'Female';
ECG.age = 54;
data = im2double(imread('tire.tif'));
save('A2003.mat', 'ECG', 'data');
The Python code sample does the following:
Get a list of all .mat files in the folder using glob.glob('*.mat').
Iterate over the .mat files, load the data from each file, and append it to a list.
The result of the loop is a list named alldata, containing the data from all the .mat files.
Iterate over alldata and show data['data'] as an image
(assuming data['data'] is the matrix you want to show as an image).
Here is the code:
from matplotlib import pyplot as plt
from mat4py import loadmat
import glob
import os
os.chdir('/Users/Ashi/Downloads/TrainingSet2')
# Get a list of the .mat files in the current folder
mat_files = glob.glob('*.mat')
# List for storing all the data
alldata = []
# Iterate over the .mat files
for fname in mat_files:
    # Load the .mat file content into data.
    data = loadmat(fname)
    # Append data to the list
    alldata.append(data)
# Iterate over the alldata elements, and show images
for data in alldata:
    # Assume the image is stored in a matrix named data in MATLAB.
    # data['data']: access the matrix with the string 'data', because data is a dictionary
    img = data['data']
    # Show data as an image using matplotlib
    plt.imshow(img, cmap='gray')
    plt.show(block=True) # Show image with "blocking"
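If you would rather not change the working directory, or you want to know which file each record came from, the loading loop above can be wrapped in a small helper. This is only a sketch: load_mat_folder and its loader parameter are names made up for this example, and loader stands for any function that takes a file path and returns the loaded content, such as mat4py's loadmat.

```python
from pathlib import Path


def load_mat_folder(folder, loader):
    """Load every .mat file in `folder` and return {file name: content}.

    `loader` is any callable taking a path string, e.g. mat4py.loadmat.
    """
    mat_paths = sorted(Path(folder).glob("*.mat"))  # sorted for a stable order
    return {path.name: loader(str(path)) for path in mat_paths}


# Hypothetical usage with mat4py (needs the package and the data folder):
# from mat4py import loadmat
# all_data = load_mat_folder('/Users/Ashi/Downloads/TrainingSet2', loadmat)
# print(list(all_data))  # names of the files that were loaded
```

Keeping the file name as the dictionary key makes it easy to tell the records apart later, which a plain list does not allow.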
Update:
The ECG data is not an image but a list of 12 data series.
The internal structure of the data (after data = loadmat(fname)) is:
A parent dictionary named data.
data contains a dictionary in data['ECG'].
data['ECG']['data'] is a list of 12 lists.
The following code iterates over the .mat files and displays the ECG data as a graph:
from matplotlib import pyplot as plt
from mat4py import loadmat
import glob
import os
import numpy as np
os.chdir('/Users/Ashi/Downloads/TrainingSet2')
# Get a list of the .mat files in the current folder
mat_files = glob.glob('*.mat')
# List for storing all the data
alldata = []
# Iterate over the .mat files
for fname in mat_files:
    # Load the .mat file content into data.
    data = loadmat(fname)
    # Append data to the list
    alldata.append(data)
# Iterate over the alldata elements, and show graphs
for data in alldata:
    # The internal structure of the data is a dictionary within a dictionary.
    ecg = data['ECG']
    data = ecg['data'] # Data is a list of lists
    # Convert data to NumPy array
    ecg_data = np.array(data)
    # Show data as image using matplotlib
    #plt.imshow(img, cmap='gray')
    plt.plot(ecg_data.T) # Plot the data as graph.
    plt.show(block=True) # Show graph with "blocking"
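Since data['ECG']['data'] comes back as a plain list of lists, it can help to convert it once and check the shape before plotting. A minimal sketch, assuming every inner list has the same number of samples; ecg_to_array is a hypothetical helper, and the record below is made-up data that only mimics the structure described above.

```python
import numpy as np


def ecg_to_array(record):
    """Return (signals, metadata) from a mat4py-style ECG record.

    signals has shape (n_series, n_samples); metadata holds the other fields.
    """
    ecg = record["ECG"]
    signals = np.asarray(ecg["data"], dtype=float)
    if signals.ndim != 2:
        raise ValueError("expected a list of equally long lists")
    metadata = {key: value for key, value in ecg.items() if key != "data"}
    return signals, metadata


# Made-up record with 12 series of 5 samples each (not real ECG data)
fake_record = {"ECG": {"sex": "Male", "age": 68,
                       "data": [[0.01 * k] * 5 for k in range(12)]}}
signals, metadata = ecg_to_array(fake_record)
print(signals.shape)  # (12, 5)
print(metadata)       # {'sex': 'Male', 'age': 68}
```

signals.T can then be passed to plt.plot, exactly as ecg_data.T is in the code above.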
Result:
[Plot for A0001.mat]
[Plot for A0002.mat]
Graph with labels:
# Iterate alldata elements, and show graphs
for data in alldata:
    # The internal structure of the data is a dictionary within a dictionary.
    ecg = data['ECG']
    data = ecg['data'] # Data is a list of lists
    # Convert data to NumPy array
    #ecg_data = np.array(data)
    # Show data as graph using matplotlib
    # Iterate data list:
    for i in range(len(data)):
        # Plot the data as graph.
        # Set labels d0, d1, d2...
        plt.plot(data[i], label='d'+str(i))
    plt.legend() # Add legend
    plt.show(block=True) # Show graph with "blocking"
Result:
[Plot with legend entries d0, d1, d2, ...]

Related

How to save multiple images generated using matplotlib & imshow in a folder

I'm working with matplotlib, specifically its imshow() function. I have a multi-dimensional array generated by NumPy's random.rand function.
data_array = np.random.rand(63, 4, 4, 3)
Now I want to generate an image with matplotlib's imshow() function for each of the 63 entries of this array. The code below already generates the image I want for a single entry.
plt.imshow(data_array[0]) #image constructed for the 1st element of the array
Now I want to save all the images produced by imshow() from the array's entries in a specific folder on my computer, each with a specific name. I tried the code below.
def create_image(array):
    return plt.imshow(array, interpolation='nearest', cmap='viridis')
count = 0
for i in data_array:
    count += 63
    image_machine = create_image(i)
    image_machine.savefig('C:\Users\Asus\Save Images\close_'+str(count)+".png")
With this code, I want to save the image produced by imshow() for each entry of data_array to the folder 'C:\Users\Asus\Save Images' under a numbered name, so that the 1st image is saved as close_0.png, and so on.
Please help me with this saving step; I'm stuck here.
You can do the following:
from pathlib import Path
import matplotlib.pyplot as plt
import numpy as np
output_folder = "path/to/folder/"
data_array = np.random.rand(63, 4, 4, 3)
for index, array in enumerate(data_array):
    fig = plt.figure()
    plt.imshow(array, interpolation="nearest", cmap="viridis")
    fig.savefig(Path(output_folder, f"close_{index}.png"))
    plt.close()
I added plt.close(); otherwise you will end up with a lot of figures open simultaneously.
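If the goal is only to write each array to a PNG, without axes or figure borders, plt.imsave writes the array directly, so no figure has to be opened or closed. The sketch below is an alternative to the loop above, not a replacement for it: save_frames is a hypothetical helper, the demo folder is a temporary one, and the output has one pixel per array element (4x4 here), unlike the scaled-up figure that imshow and savefig produce. It also creates the output folder if it does not exist yet.

```python
import tempfile
from pathlib import Path

import matplotlib
matplotlib.use("Agg")  # non-interactive backend, nothing is shown on screen
import matplotlib.pyplot as plt
import numpy as np


def save_frames(data_array, output_folder, prefix="close_"):
    """Write each entry of data_array to <output_folder>/<prefix><index>.png."""
    folder = Path(output_folder)
    folder.mkdir(parents=True, exist_ok=True)  # create the folder if needed
    saved = []
    for index, array in enumerate(data_array):
        target = folder / f"{prefix}{index}.png"
        plt.imsave(target, array)  # RGB floats are expected to lie in [0, 1]
        saved.append(target)
    return saved


# Demo with a smaller random array than in the question, in a temporary folder
demo_folder = Path(tempfile.mkdtemp()) / "Save Images"
paths = save_frames(np.random.rand(3, 4, 4, 3), demo_folder)
print([p.name for p in paths])  # ['close_0.png', 'close_1.png', 'close_2.png']
```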

How can I label my data in dict format to prepare for deep learning train and test?

I have 3222 .wav files, processed into a variable structured as below:
final = {'a0001.wav': NumPy array, 'a0002.wav': NumPy array, ... , 'a3222.wav': NumPy array}
*Every NumPy array has size 99x160.
My question is: how can I add a label to each .wav file? For example, the label of 'a0001.wav' is 1, the label of 'a0002.wav' is 0, etc.
And how can I split the final labeled data into train and test parts?
Below is the full code:
import glob
from scipy.io import wavfile
import numpy as np
# Data augmentation
def augmentation(signal, split_size):
    split_signal={}
    split_list=[j*split_size for j in range(int(len(signal)/split_size))]
    for item in range(len(split_list)-1):
        split_signal[item]=signal[int(split_list[item]):int(split_list[item+1])]
    Num_split=int(len(signal)/split_size)
    return split_signal, Num_split
# Variable definition
aug_sig={}
final={}
aug_len=160
# Import file
files = glob.glob('*.wav')
for item in files:
    fs, data = wavfile.read(item)
    aug_sig[item]=augmentation(data[int(len(data)/2)-8000:int(len(data)/2)+8000], aug_len)
    arr=[]
    for i in range(aug_sig[item][1]-1):
        arr=np.concatenate((arr, aug_sig[item][0][i].transpose()), axis=0)
    final[item]=arr.reshape(aug_sig[item][1]-1,aug_len)
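One possible way to attach labels and split the data, sketched with NumPy only. Everything here beyond the shape of the final dict is an assumption: the labels dict (file name to 0 or 1) has to come from wherever the real labels are stored, the 80/20 ratio is arbitrary, and the small final below is random stand-in data with the same 99x160 shape per file.

```python
import numpy as np

# Stand-in for the real `final` dict: 10 fake files, each a 99x160 array
rng = np.random.default_rng(0)
final = {f"a{i:04d}.wav": rng.normal(size=(99, 160)) for i in range(1, 11)}

# Hypothetical labels, one per file name (replace with the real ones)
labels = {name: i % 2 for i, name in enumerate(sorted(final))}

# Stack features and labels in the same file order
names = sorted(final)
X = np.stack([final[name] for name in names])   # shape (n_files, 99, 160)
y = np.array([labels[name] for name in names])  # shape (n_files,)

# Shuffle once, then cut at 80 % for training
order = rng.permutation(len(names))
n_train = int(0.8 * len(names))
train_idx, test_idx = order[:n_train], order[n_train:]
X_train, y_train = X[train_idx], y[train_idx]
X_test, y_test = X[test_idx], y[test_idx]

print(X_train.shape, X_test.shape)  # (8, 99, 160) (2, 99, 160)
```

scikit-learn's train_test_split does the same shuffling and splitting in one call and can also stratify by label, if that library is available.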

matlab data file to pandas DataFrame [duplicate]

This question already has answers here:
Read .mat files in Python
(15 answers)
Closed 6 years ago.
Is there a standard way to convert MATLAB .mat (MATLAB-formatted data) files to a pandas DataFrame?
I am aware that a workaround is possible using scipy.io, but I am wondering whether there is a more straightforward way to do it.
I found two ways: scipy or mat4py.
mat4py
Load data from MAT-file
The function loadmat loads all variables stored in the MAT-file into a
simple Python data structure, using only Python’s dict and list
objects. Numeric and cell arrays are converted to row-ordered nested
lists. Arrays are squeezed to eliminate arrays with only one element.
The resulting data structure is composed of simple types that are
compatible with the JSON format.
Example: Load a MAT-file into a Python data structure:
data = loadmat('datafile.mat')
From:
https://pypi.python.org/pypi/mat4py/0.1.0
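Because mat4py returns plain dicts and lists, a struct whose fields are equally long vectors can be handed to pandas directly. A minimal sketch under that assumption, and assuming each field arrives as a flat list of numbers (nested lists would need flattening first). The variable name measuredData and its fields are invented for illustration, and the dict below stands in for what loadmat('datafile.mat') would return.

```python
import pandas as pd

# Stand-in for: data = loadmat('datafile.mat')  (mat4py)
data = {
    "measuredData": {
        "time": [0.0, 0.5, 1.0],
        "voltage": [1.2, 1.3, 1.1],
        "current": [0.10, 0.12, 0.11],
    }
}

# Each struct field becomes a column; this needs fields of equal length
df = pd.DataFrame(data["measuredData"]).set_index("time")
print(df.shape)          # (3, 2)
print(list(df.columns))  # ['voltage', 'current']
```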
Scipy:
Example:
import numpy as np
from scipy.io import loadmat # this is the SciPy module that loads mat-files
import matplotlib.pyplot as plt
from datetime import datetime, date, time
import pandas as pd
mat = loadmat('measured_data.mat') # load mat-file
mdata = mat['measuredData'] # variable in mat file
mdtype = mdata.dtype # dtypes of structures are "unsized objects"
# * SciPy reads in structures as structured NumPy arrays of dtype object
# * The size of the array is the size of the structure array, not the number
#   of elements in any particular field. The shape defaults to 2-dimensional.
# * For convenience make a dictionary of the data using the names from dtypes
# * Since the structure has only one element, but is 2-D, index it at [0, 0]
ndata = {n: mdata[n][0, 0] for n in mdtype.names}
# Reconstruct the columns of the data table from just the time series
# Use the number of intervals to test if a field is a column or metadata
columns = [n for n, v in ndata.items() if v.size == ndata['numIntervals']]
# now make a data frame, setting the time stamps as the index
df = pd.DataFrame(np.concatenate([ndata[c] for c in columns], axis=1),
                  index=[datetime(*ts) for ts in ndata['timestamps']],
                  columns=columns)
From:
http://poquitopicante.blogspot.fr/2014/05/loading-matlab-mat-file-into-pandas.html
Finally, you can follow the PyHogs notebook, which also relies on scipy:
Reading complex .mat files.
This notebook shows an example of reading a Matlab .mat file,
converting the data into a usable dictionary with loops, a simple plot
of the data.
http://pyhogs.github.io/reading-mat-files.html
Ways to do this:
As you mentioned, using scipy:
import scipy.io as sio
test = sio.loadmat('test.mat')
Using the matlab engine:
import matlab.engine
eng = matlab.engine.start_matlab()
content = eng.load("example.mat",nargout=1)

Save contour images generated in a loop as a single pdf file (2 images per page preferably)

I have written this code which will generate a number of contour plots, each of which corresponds to a single text file. I have multiple text files. Currently, I am able to generate all of the images separately in png format without any issues.
When I try to save the images as a PDF file, only the last image generated in the loop is saved. I tried using the PdfPages package. This question is similar to one that I posted before, but it asks something different.
Issue: I want to be able to put all of the images into a single PDF file automatically from Python. For example, if I have 100 text files, I want to save all 100 images in a single PDF file. Ideally, I also want 2 images on each page of the PDF file. There are some questions on SO about this, but I couldn't find an appropriate solution for my issue. Since I have many cases for which I have to generate images, I want to save them as a single PDF file because that makes them easier to analyze. I would appreciate any suggestions or advice.
This is the link for the sample text file: Sample Text
from __future__ import print_function
import numpy as np
from matplotlib import pyplot as plt
from scipy.interpolate import griddata
from matplotlib.backends.backend_pdf import PdfPages
path = 'location of the text files'
FT_init = 5.4311
delt = 0.15
TS_init = 140
dj_length = 2.4384
def streamfunction2d(y,x,Si_f,q):
    with PdfPages('location of the generated pdf') as pdf:
        Stf= plt.contour(x,y,Si_f,20)
        Stf1 = plt.colorbar(Stf)
        plt.clabel(Stf,fmt='%.0f',inline=True)
        plt.figtext(0.37,0.02,'Flowtime(s)',style= 'normal',alpha=1.0)
        plt.figtext(0.5,0.02,str(q[p]),style= 'normal',alpha=1.0)
        plt.title('Streamfunction_test1')
        plt.hold(True)
        plt.tight_layout()
        pdf.savefig()
        path1 = 'location where the image is saved'
        image = path1+'test_'+'Stream1_'+str((timestep[p]))+'.png'
        plt.savefig(image)
        plt.close()
timestep = np.linspace(500,600,2)
flowtime = np.zeros(len(timestep))
timestep = np.array(np.round(timestep),dtype = 'int')
###############################################################################
for p in range(len(timestep)):
    if timestep[p]<TS_init:
        flowtime[p] = 1.1111e-01
    else:
        flowtime[p] = (timestep[p]-TS_init)*delt+FT_init
    q = np.array(flowtime)
    timestepstring=str(timestep[p]).zfill(4)
    fname = path+"ddn150AE-"+timestepstring+".txt"
    f = open(fname,'r')
    data = np.loadtxt(f,skiprows=1)
    data = data[data[:, 1].argsort()]
    data = data[np.logical_not(data[:,11]== 0)]
    Y = data[:,2] # Assigning Y to column 2 from the text file
    limit = np.nonzero(Y==dj_length)[0][0]
    Y = Y[limit:]
    Vf = data[:,11]
    Vf = Vf[limit:]
    Tr = data[:,9]
    Tr = Tr[limit:]
    X = data[:,1]
    X = X[limit:]
    Y = data[:,2]
    Y = Y[limit:]
    U = data[:,3]
    U = U[limit:]
    V = data[:,4]
    V = V[limit:]
    St = data[:,5]
    St = St[limit:]
    ###########################################################################
    ## Using griddata for interpolation from Unstructured to Structured data
    # resample onto a 300x300 grid
    nx, ny = 300,300
    # (N, 2) arrays of input x,y coords and dependent values
    pts = np.vstack((X,Y )).T
    vals = np.vstack((Tr))
    vals1 = np.vstack((St))
    # The new x and y coordinates for the grid
    x = np.linspace(X.min(), X.max(), nx)
    y = np.linspace(Y.min(), Y.max(), ny)
    r = np.meshgrid(y,x)[::-1]
    # An (nx * ny, 2) array of x,y coordinates to interpolate at
    ipts = np.vstack(a.ravel() for a in r).T
    Si = griddata(pts, vals1, ipts, method='linear')
    print(Ti.shape,"Ti_Shape")
    Si_f = np.reshape(Si,(len(y),len(x)))
    print(Si_f.shape,"Streamfunction Shape")
    Si_f = np.transpose(Si_f)
    streamfunction2d(y,x,Si_f,q)
Edit: As you mentioned, matplotlib can probably handle everything by itself using PdfPages. See this related answer. My original answer is a hack.
I think the error in your code is that you create a new PdfPages object each time you go through the loop, so each one overwrites the file written by the previous one. My advice would be to create the PdfPages object once, before the loop (using a with statement, as in the documentation, seems a good idea), and to pass it as an argument to your streamfunction2d function.
Example:
def streamfunction2d(y,x,Si_f,q,pdf):
    # (...)
    pdf.savefig(plt.gcf())
with PdfPages('output.pdf') as pdf:
    for p in range(len(timestep)):
        # (...)
        streamfunction2d(y,x,Si_f,q,pdf)
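For the "two images per page" part of the question, one option is to open a single PdfPages object and draw each pair of contour plots on one figure with two subplots. The following is a rough sketch, not the asker's actual pipeline: save_contours_two_per_page is a hypothetical helper, the fields are made-up functions standing in for the interpolated Si_f grids, and the PDF is written to a temporary folder.

```python
import tempfile
from pathlib import Path

import matplotlib
matplotlib.use("Agg")  # non-interactive backend, nothing is shown on screen
import matplotlib.pyplot as plt
import numpy as np
from matplotlib.backends.backend_pdf import PdfPages


def save_contours_two_per_page(fields, pdf_path):
    """Write contour plots to one PDF, two per page; return the page count.

    fields is a list of (title, x, y, z) tuples, z of shape (len(y), len(x)).
    """
    with PdfPages(pdf_path) as pdf:
        for start in range(0, len(fields), 2):
            page_fields = fields[start:start + 2]
            fig, axes = plt.subplots(2, 1, figsize=(6, 8))
            for ax, (title, x, y, z) in zip(axes, page_fields):
                contours = ax.contour(x, y, z, 20)
                fig.colorbar(contours, ax=ax)
                ax.set_title(title)
            for ax in axes[len(page_fields):]:
                ax.set_visible(False)  # blank slot on an odd last page
            fig.tight_layout()
            pdf.savefig(fig)  # one call per page
            plt.close(fig)
        return pdf.get_pagecount()


# Made-up fields standing in for the interpolated grids
x = np.linspace(0.0, 1.0, 30)
y = np.linspace(0.0, 2.0, 20)
grid_x, grid_y = np.meshgrid(x, y)
fields = [(f"Streamfunction {k}", x, y, np.sin(k * grid_x) + grid_y)
          for k in range(1, 4)]
pdf_path = str(Path(tempfile.mkdtemp()) / "contours.pdf")
pages = save_contours_two_per_page(fields, pdf_path)
print(pages)  # 2
```

Three plots give two pages here: the first page holds two plots and the second holds one, with the unused slot hidden.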
Original answer:
Here is a quick and dirty solution using the pdfunite software.
from matplotlib import pyplot as plt
import numpy as np
import subprocess
import os
X = np.linspace(0,1,100)
for i in range(10):
    # random plot
    plt.plot(X,np.cos(i*X))
    # Save each figure as a pdf file.
    plt.savefig("page_{:0}.pdf".format(i))
    plt.clf()
# Calling pdfunite to merge all the pages
subprocess.call("pdfunite page_*.pdf united.pdf",shell=True)
# Removing temporary files
for i in range(10):
    os.remove("page_{:0}.pdf".format(i))
It uses two things:
You can save your figures as PDF using matplotlib's savefig command.
You can call other programs using the subprocess library. I used pdfunite to merge all the pages. Be sure it is available on your machine!
If you want several graphs per page, you can use subplots.
Alternatively, you could use another Python library (such as pyPdf) to merge the pages, but it would require slightly more code. Here is an (untested) example:
from matplotlib import pyplot as plt
import numpy as np
from pyPdf import PdfFileWriter, PdfFileReader
# create an empty pdf file
output = PdfFileWriter()
X = np.linspace(0,1,100)
for i in range(10):
    # random plot
    plt.plot(X,np.cos(i*X))
    # Save each figure as a pdf file.
    fi = "page_{:0}.pdf".format(i)
    plt.savefig(fi)
    plt.clf()
    # add it to the end of the output
    # (open() replaces the Python 2 file() builtin)
    page = PdfFileReader(open(fi, "rb"))
    output.addPage(page.getPage(0))
# Save the resulting pdf file.
with open("document-output.pdf", "wb") as outputStream:
    output.write(outputStream)

How do I import tif using gdal?

I'm trying to get my tif file in a usable format in Python, so I can analyze the data. However, every time I import it, I just get an empty list. Here's my code:
xValues = [447520.0, 432524.0, 451503.0]
yValues = [4631976.0, 4608827.0, 4648114.0]
gdal.AllRegister()
dataset = gdal.Open('final_snow.tif', GA_ReadOnly)
if dataset is None:
    print 'Could not open image'
    sys.exit(1)
data = np.array([gdal.Open(name, gdalconst.GA_ReadOnly).ReadAsArray() for name, descr in dataset.GetSubDatasets()])
print 'this is data ', data
It always prints an empty list, but it doesn't throw an error. I checked out other questions, such as "Create shapefile from tif file using GDAL". What might be the problem?
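A plain GeoTIFF has no subdatasets, so GetSubDatasets() returns an empty list and the list comprehension has nothing to read. Read the raster directly instead. For osgeo.gdal, it should look like this: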
from osgeo import gdal
gdal.UseExceptions() # not required, but a good idea
dataset = gdal.Open('final_snow.tif', gdal.GA_ReadOnly)
data = dataset.ReadAsArray()
Where data is either a 2D array for 1-banded rasters, or a 3D array for multiband.
An alternative with rasterio looks like:
import rasterio
with rasterio.open('final_snow.tif', 'r') as r:
    data = r.read()
Where data is always a 3D array, with the first dimension as band index.
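Because ReadAsArray() gives a 2D array for a single-band raster while r.read() always gives a 3D one, code that should work with either library can normalize the shape first. A minimal sketch with NumPy only; as_band_stack is a hypothetical helper, and the zero arrays stand in for real raster data.

```python
import numpy as np


def as_band_stack(data):
    """Return raster data as (bands, rows, cols), adding a band axis if needed."""
    data = np.asarray(data)
    if data.ndim == 2:
        return data[np.newaxis, :, :]  # single band -> one-band stack
    if data.ndim == 3:
        return data
    raise ValueError(f"expected a 2D or 3D array, got {data.ndim}D")


single_band = np.zeros((4, 5))    # stand-in for a 1-band ReadAsArray() result
multi_band = np.zeros((3, 4, 5))  # stand-in for a 3-band raster
print(as_band_stack(single_band).shape)  # (1, 4, 5)
print(as_band_stack(multi_band).shape)   # (3, 4, 5)
```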
