So, I have to load many .mat files, each containing some features I want to plot.
Each array to be plotted is loaded into a dictionary:
import numpy as np
import scipy.io as io
dict1 = io.loadmat('file1.MAT')
dict2 = io.loadmat('file2.MAT') # type = dict
dict3 = io.loadmat('file3.MAT')
...
Then I have to extract the element I need from each dictionary, to plot afterwards:
array1 = dict1['data']
array2 = dict2['data']
array3 = dict3['data']
...
After this, I can plot the data. It works, but it looks clumsy to me (if I have 100 vectors, it will take a while...). Is there a better way to do this task?
Given that you are talking about dealing with many matrices, you should manage them as a collection. First, let's define your set of files. It could be a tuple, or a list:
Matrix_files = [ 'fileA.MAT', 'file1.MAT', 'no pattern to these names.MAT' ]
If they happen to have a pattern, you might try generating the names:
Matrix_files = [ 'file{}.MAT'.format(num) for num in range(1,4) ]
If they share a common location, you might consider using one of the various directory-scanning approaches (os.scandir or glob, to name two).
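For instance, a minimal sketch with glob, assuming all the files sit in one directory (the data/ path is hypothetical):

from glob import glob

# Collect every .MAT file in the data directory, in a stable order
Matrix_files = sorted(glob('data/*.MAT'))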
Once you have a list of filenames, you can read the dictionaries in:
def read_matrix(filespec):
    from scipy.io import loadmat
    md = loadmat(filespec)
    # process md
    return md
With that, you can either get all the data, or get some of the data:
All_data = [read_matrix(f) for f in Matrix_files]
Some_data = [read_matrix(f)['data'] for f in Matrix_files]
If you only care about the data, you could skip the function definition:
from scipy.io import loadmat
Just_data = [loadmat(f)['data'] for f in Matrix_files]
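Since the goal is plotting, you could also map each filename to its array in a dict, so each curve can be labelled. A sketch, assuming every file stores its array under the 'data' key as above:

import matplotlib.pyplot as plt
from scipy.io import loadmat

data_by_file = {f: loadmat(f)['data'] for f in Matrix_files}
for name, arr in data_by_file.items():
    plt.plot(arr.squeeze(), label=name)  # squeeze in case 'data' is 2-D with one row/column
plt.legend()
plt.show()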
So I have a text document with a lot of values from calculations. I have extracted all the data and stored it in an array, but the entries are not numbers I can use for anything. I want to use the numbers to plot them in a graph, but the elements in the array are text strings. How would I turn them into numbers and remove unnecessary characters like commas and 'n=', for instance?
Here is my code, and below is my print output:
import numpy as np
['n=1', 'n=2', 'n=3', 'n=4', 'n=5', 'n=6', 'n=7', 'n=8', 'n=9', 'n=10', 'n=11', 'n=12', 'n=13', 'n=14', 'n=15', 'n=16', 'n=17', 'n=18', 'n=19']
I'd use the conversion method presented in this post within the extract function, so e.g.
...
delta_x.append(strtofloat(words[1]))
...
where you might as well do the conversion inline (my strtofloat is a function you'd have to write, based on the post mentioned above) and wrap it in a try/except block, so failed conversions are simply left out of your list.
To make it more consistent, any conversion error should discard the whole affected line, so you might want to use intermediate variables and a check for each field.
By the way, I noticed the argument to the extract function; wouldn't it be more logical to make that argument a string containing the file name from which to extract the data?
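A minimal sketch of that idea, where strtofloat and the column layout are assumptions based on the description above:

def strtofloat(s):
    # Assumed helper: drop decoration such as a leading 'n=' or a trailing comma
    return float(s.strip().lstrip('n=').rstrip(','))

delta_x = []
for line in infile:  # infile is the open data file (assumption)
    words = line.split()
    try:
        new_delta_x = strtofloat(words[1])  # convert first, keep only on success
    except (ValueError, IndexError):
        continue  # any failed conversion discards the whole line
    delta_x.append(new_delta_x)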
EDIT: as a side note, you might want to look into pandas, which is a library specialised in numerical data handling. Depending on the format of your data file there are probably standard functions to read your whole file into a DataFrame (which is a kind of super-charged array class which can handle a lot of data processing as well) in a single command.
I would consider using a regular expression (note the raw string, so the backslash is not treated as a string escape):

import re

match_number = re.compile(r'-?[0-9]+\.?[0-9]*(?:[Ee]-?[0-9]+)?')

for line in infile:
    words = line.split()
    new_delta_x = float(re.search(match_number, words[1]).group())
    new_abs_error = float(re.search(match_number, words[7]).group())
    new_n = int(re.search(match_number, words[10]).group())
    delta_x.append(new_delta_x)
    abs_error.append(new_abs_error)
    n.append(new_n)
But it seems like your data is already in CSV format, so try using pandas. Read the data into a DataFrame without a header (the column names will then be integers):
import numpy as np
import pandas as pd
df = pd.read_csv('approx_derivative_sine.txt', header=None)
delta_x = df[1].to_numpy()
abs_error = df[7].to_numpy()
# if n is always number of the row
n = df.index.to_numpy(dtype=int)
# if n is always in the form 'n=<integer>'
n = df[10].apply(lambda x: x.strip()[2:]).to_numpy(dtype=int)
If you could post a few rows of your approx_derivative_sine.txt file, that would be useful.
Given the array in the question, if you would like to remove the 'n=' prefix and convert each element to an integer, you may try the following.
import numpy as np
array = np.array(['n=1', 'n=2', 'n=3', 'n=4', 'n=5', 'n=6', 'n=7', 'n=8', 'n=9',
'n=10', 'n=11', 'n=12', 'n=13', 'n=14', 'n=15', 'n=16', 'n=17', 'n=18', 'n=19'])
array = [int(i.replace('n=', '')) for i in array]
print(array)
How can I import an array into Python (numpy.array) from a file, given that the file must be written first if it doesn't already exist?
For example, save out a matrix to a file then load it back.
Check out the entry on the numpy example list. Here is the entry on .loadtxt():
>>> from numpy import *
>>>
>>> data = loadtxt("myfile.txt") # myfile.txt contains 4 columns of numbers
>>> t,z = data[:,0], data[:,3] # data is 2D numpy array
>>>
>>> t,x,y,z = loadtxt("myfile.txt", unpack=True) # to unpack all columns
>>> t,z = loadtxt("myfile.txt", usecols = (0,3), unpack=True) # to select just a few columns
>>> data = loadtxt("myfile.txt", skiprows = 7) # to skip 7 rows from top of file
>>> data = loadtxt("myfile.txt", comments = '!') # use '!' as comment char instead of '#'
>>> data = loadtxt("myfile.txt", delimiter=';') # use ';' as column separator instead of whitespace
>>> data = loadtxt("myfile.txt", dtype = int) # file contains integers instead of floats
Another option is numpy.genfromtxt, e.g.:

import numpy as np
data = np.genfromtxt("myfile.dat", delimiter=",")

This will make data a numpy array with as many rows and columns as there are in your file.
(I know the question is old, but I think this might be good as a reference for people with similar questions)
If you want to load data from an ASCII/text file (which has the benefit of being more or less human-readable and easy to parse in other software), numpy.loadtxt is probably what you want:
http://docs.scipy.org/doc/numpy/reference/generated/numpy.loadtxt.html
If you just want to quickly save and load numpy arrays/matrices to and from a file, take a look at numpy.save and numpy.load:
http://docs.scipy.org/doc/numpy/reference/generated/numpy.save.html
http://docs.scipy.org/doc/numpy/reference/generated/numpy.load.html
In Python, storing a bare list as a numpy.array, saving it out to a file, loading it back, and converting it back to a list takes some conversion tricks. The confusion is because Python lists are not at all the same thing as numpy.arrays:
import numpy as np
foods = ['grape', 'cherry', 'mango']
filename = "./outfile.dat.npy"
np.save(filename, np.array(foods))
z = np.load(filename).tolist()
print("z is: " + str(z))
This prints:
z is: ['grape', 'cherry', 'mango']
The array is stored on disk under the filename outfile.dat.npy.
The important methods here are the tolist() and np.array(...) conversion functions.
Have a look at the SciPy cookbook. It should give you an idea of some basic methods to import/export data.
If you save/load the files from your own Python programs, you may also want to consider the Pickle module, or cPickle.
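For instance, a minimal pickle round trip might look like this (the file name is arbitrary):

import pickle

data = [1.0, 2.5, 3.7]
with open('data.pkl', 'wb') as f:
    pickle.dump(data, f)       # write the object to disk

with open('data.pkl', 'rb') as f:
    restored = pickle.load(f)  # read it back unchanged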
I'm writing a script that reads one file containing a list of files and performs Gaussian fits on each of those files. Each of these files is made up of two columns (wv and flux in the script below). My small issue is how to limit the fitting range based on the "wv" values. I tried using a "for" loop for this, but I get errors related to the fit (which I don't get if I don't limit the "wv" range).
import numpy as np
from scipy.optimize import curve_fit
import matplotlib.pyplot as plt

fits = []
wvi_b = []
wvi_r = []

p = open("file_input.txt", "r")
for line in p:
    fits.append(str(line.split()[0]))
    wvi_b.append(float(line.split()[1]))
    wvi_r.append(float(line.split()[2]))
p.close()

for j in range(len(fits)):
    wv = []
    flux = []
    f = open("%s" % (fits[j]), "r")
    for line in f:
        wv.append(float(line.split()[0]))
        flux.append(float(line.split()[1]))
    f.close()

    def gauss(x, a, b, c, a1, b1, c1, d):
        func = a*np.exp(-((x-b)**2)/(2.0*(c)**2)) + a1*np.exp(-((x-b1)**2)/(2.0*(c1)**2)) + d
        return func

    for wv in range(6450, 6575):
        guess = (0.8, wvi_b[j], 3.0, 1.0, wvi_r[j], 3.0, 1.0)
        popt, pconv = curve_fit(gauss, wv, flux, guess)
        print popt[1], popt[4]
        ymod = gauss(wv, *popt)
        plt.plot(wv, ymod)

    plt.plot(wv, flux, marker='.')
    plt.show()
When you call for wv in range(6450, 6575), wv is just an integer in that range, not a member of the list. I'd try taking a look at how you're using that variable. If you want to access data from the list wv, you would have to update the syntax to be wv[wv] (which is a little confusing - it might be best to change the variable in your for loop to something else).
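One way to restrict the fit to that wavelength window, as a sketch based on the code above, is a NumPy boolean mask instead of the for loop:

wv = np.array(wv)
flux = np.array(flux)

# Keep only the samples whose wavelength falls inside the window of interest
mask = (wv >= 6450) & (wv <= 6575)
wv_cut, flux_cut = wv[mask], flux[mask]

guess = (0.8, wvi_b[j], 3.0, 1.0, wvi_r[j], 3.0, 1.0)
popt, pconv = curve_fit(gauss, wv_cut, flux_cut, guess)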
I am trying to read a matlab file with the following code
import scipy.io
mat = scipy.io.loadmat('test.mat')
and it gives me the following error
raise NotImplementedError('Please use HDF reader for matlab v7.3 files')
NotImplementedError: Please use HDF reader for matlab v7.3 files
Has anyone else had this problem, and could you please share some sample code?
Thanks.
I've created a small library to load MATLAB 7.3 files:
pip install mat73
To load a v7.3 .mat file into Python as a dictionary:
import mat73
data_dict = mat73.loadmat('data.mat')
simple as that!
Try using the h5py module:

import h5py

with h5py.File('test.mat', 'r') as f:
    print(f.keys())
import h5py
import numpy as np

filepath = '/path/to/data.mat'
arrays = {}
f = h5py.File(filepath, 'r')
for k, v in f.items():
    arrays[k] = np.array(v)

You should end up with your data in the arrays dict, unless you have MATLAB structures, I suspect. Hope it helps!
Per Magu_'s answer on a related thread, check out the package hdf5storage which has convenience functions to read v7.3 matlab mat files; it is as simple as
import hdf5storage
mat = hdf5storage.loadmat('test.mat')
I had a look at this issue: https://github.com/h5py/h5py/issues/726. If you saved your mat file with the -v7.3 option, you should generate the list of keys with (under Python 3.x):
import h5py

with h5py.File('test.mat', 'r') as file:
    print(list(file.keys()))
In order to access the variable a for instance, you have to use the same trick:
with h5py.File('test.mat', 'r') as file:
    a = list(file['a'])
According to the SciPy cookbook (http://wiki.scipy.org/Cookbook/Reading_mat_files):
Beginning at release 7.3 of Matlab, mat files are actually saved using the HDF5 format by default (except if you use the -vX flag at save time, see help save in Matlab). These files can be read in Python using, for instance, the PyTables or h5py package. Reading Matlab structures in mat files does not seem supported at this point.
Perhaps you could use Octave to re-save using the -vX flag.
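As a sketch, the re-save in Matlab (or Octave, assuming your build can read the v7.3 file) would be something like:

% Load the v7.3 file, then write it back out in the older v7 format
% that scipy.io.loadmat can read (file names are hypothetical)
load('test.mat')
save('test_v7.mat', '-v7')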
Despite hours of searching I've not found how to access Matlab v7.3 structures either. Hopefully this partial answer will help someone, and I'd be very happy to see extra pointers.
So starting with (I think the [0][0] arises from Matlab giving everything at least two dimensions):
f = h5py.File('filename', 'r')
f['varname'][0][0]
gives: < HDF5 object reference >
Pass this reference to f again:
f[f['varname'][0][0]]
which gives an array:
Convert this to a numpy array and extract the value (or, recursively, another < HDF5 object reference >):
np.array(f[f['varname'][0][0]])[0][0]
If accessing the disk is slow, maybe loading to memory would help.
Further edit: after much futile searching, my final workaround (I really hope someone else has a better solution!) was calling Matlab from Python, which is pretty easy and fast:
import matlab.engine

eng = matlab.engine.start_matlab()   # first fire up a Matlab instance
eng.quit()
eng = matlab.engine.connect_matlab() # or connect to an existing one
eng.sqrt(4.0)
x = 4.0
eng.workspace['y'] = x
a = eng.eval('sqrt(y)')
print(a)
x = eng.eval('parameterised_function_in_Matlab(1, 1)', nargout=1)
a = eng.eval('Structured_variable{1}{2}.object_name') # (nested cell, cell, object)
This function reads Matlab-produced HDF5 .mat files, and returns a structure of nested dicts of Numpy arrays. Matlab writes matrices in Fortran order, so this also transposes matrices and higher-dimensional arrays into conventional Numpy order arr[..., page, row, col].
import h5py

def read_matlab(filename):
    def conv(path=''):
        p = path or '/'
        paths[p] = ret = {}
        for k, v in f[p].items():
            if type(v).__name__ == 'Group':
                ret[k] = conv(f'{path}/{k}')  # Nested struct
                continue
            v = v[()]  # It's a Numpy array now
            if v.dtype == 'object':
                # HDF5ObjectReferences are converted into a list of actual pointers
                ret[k] = [r and paths.get(f[r].name, f[r].name) for r in v.flat]
            else:
                # Matrices and other numeric arrays
                ret[k] = v if v.ndim < 2 else v.swapaxes(-1, -2)
        return ret

    paths = {}
    with h5py.File(filename, 'r') as f:
        return conv()
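For example (the file name is hypothetical):

data = read_matlab('data.mat')
print(data.keys())  # top-level variables from the .mat file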
If you are only reading in basic arrays and structs, see vikrantt's answer on a similar post. However, if you are working with a Matlab table, then IMHO the best solution is to avoid the save option altogether.
I've created a simple helper function to convert a Matlab table to a standard hdf5 file, and another helper function in Python to extract the data into a Pandas DataFrame.
Matlab Helper Function
function table_to_hdf5(T, path, group)
%TABLE_TO_HDF5 Save a Matlab table in an hdf5 file format
%
% TABLE_TO_HDF5(T) Saves the table T to the HDF5 file inputname.h5 at the root ('/')
% group, where inputname is the name of the input argument for T
%
% TABLE_TO_HDF5(T, path) Saves the table T to the HDF5 file specified by path at the
% root ('/') group.
%
% TABLE_TO_HDF5(T, path, group) Saves the table T to the HDF5 file specified by path
% at the group specified by group.
%
%%%

if nargin < 2
    path = [inputname(1), '.h5']; % default file name to input argument
end

if nargin < 3
    group = ''; % We will prepend '/' later, so this is effectively root
end

for field = T.Properties.VariableNames
    % Prepare to write
    field = field{:};
    dataset_name = [group '/' field];
    data = T.(field);

    if ischar(data) || isstring(data)
        warning('String columns not supported. Skipping...')
        continue
    end

    % Write the data
    h5create(path, dataset_name, size(data))
    h5write(path, dataset_name, data)
end

end
Python Helper Function
import pandas as pd
import h5py

def h5_to_df(path, group='/'):
    """
    Load an hdf5 file into a pandas DataFrame
    """
    df = pd.DataFrame()
    with h5py.File(path, 'r') as f:
        data = f[group]
        for k, v in data.items():
            if v.shape[0] > 1:  # Multiple column field
                for i in range(v.shape[0]):
                    k_new = f'{k}_{i}'
                    df[k_new] = v[i]
            else:
                df[k] = v[0]
    return df
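For example, after running the Matlab helper on a table T (so the file name here is assumed):

df = h5_to_df('T.h5')
print(df.head())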
Important Notes
This will only work on numerical data. If you know how to add string data, please comment.
This will create the file if it does not already exist.
This will crash if the data already exists in the file. You'll want to include logic to handle those cases as you deem appropriate.