I want to convert a .mat file into Excel - Python

The type of the data is numpy.ndarray. I am trying to convert it to a pandas DataFrame with the following code:
import pandas as pd
import numpy as np
from scipy.io import loadmat
data= loadmat(r"EAOW_FLOW_TimeSeries_1hr_LocationA1_final.mat")
ary = np.array(data)
ser = pd.Series(ary)
df=pd.DataFrame(ser)
df.to_csv(r"data.csv", index=False)
This produces a file I can open in Excel, but all of the data ends up in a single cell.
I am new to Python; please help me resolve this and convert the .mat file to CSV.
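loadmat returns a dict that maps MATLAB variable names to NumPy arrays, so wrapping the whole dict in np.array/pd.Series collapses everything into a single object, which is why the output lands in one cell. A minimal sketch of one way to get a proper table, assuming the file holds a 2-D numeric array ('flow_data' is only a placeholder for whatever key loadmat actually reports):
import pandas as pd
from scipy.io import loadmat

mat = loadmat(r"EAOW_FLOW_TimeSeries_1hr_LocationA1_final.mat")

# keys starting with '__' are file metadata; the rest are the MATLAB variables
print([k for k in mat if not k.startswith("__")])

arr = mat["flow_data"]      # placeholder name: use one of the keys printed above
df = pd.DataFrame(arr)      # a 2-D array maps straight onto rows and columns
df.to_csv(r"data.csv", index=False)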

Related

Read HDF data into a 3D array and save it as a dataframe in Python

I am currently working with the NASA aerosol optical depth data (MCD19A2), a NASA level-3 satellite product. I have uploaded the data. I want to save it as a dataframe that includes the longitude and latitude information along with the values. I have successfully converted the 0.47 um band file into a three-dimensional array, and I want to ask how to convert this array into a correct dataframe with X, Y, and the value.
Below is the code I have tried:
from osgeo import gdal
import matplotlib.pyplot as plt
import pandas as pd
import numpy as np
rds = gdal.Open("MCD19A2.A2006001.h26v04.006.2018036214627.hdf")
names=rds.GetSubDatasets()
names[0][0]
# 'HDF4_EOS:EOS_GRID:"MCD19A2.A2006001.h26v04.006.2018036214627.hdf":grid1km:Optical_Depth_047'
aod_047 = gdal.Open(names[0][0])
a47=aod_047.ReadAsArray()
a47[1].shape
# (1200, 1200)
I would like the result to look like:

X (n=1200)    Y (n=1200)    AOD_047
8896067       5559289       0.0123
I know that in R this can be done by
require('gdalUtils')
require('raster')
require('rgdal')
file.name<-"MCD19A2.A2006001.h26v04.006.2018036214627.hdf"
sds <- get_subdatasets(file.name)
gdal_translate(sds[1], dst_dataset = paste0('tmp047', basename(file.name), '.tiff'), b = nband)
r.047 <- raster(paste0('tmp047', basename(file.name), '.tiff'))
df.047 <- raster::as.data.frame(r.047, xy = T)
names(df.047)[3] <- 'AOD_047'
But R relies heavily on memory here: saving to 'tif' and reading the 'tif' back uses a lot of it, so I want to do this task in Python. Thanks a lot for your help.
You can use pandas:
import pandas as pd
df=pd.read_hdf('filename.hdf')
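Note that pandas.read_hdf reads HDF5/PyTables stores, so it may not open an HDF4-EOS file such as MCD19A2 directly. A sketch that stays with the GDAL objects already built in the question (aod_047 and a47) is to compute each pixel's map coordinate from the dataset's geotransform and flatten everything into one dataframe; the band index 1 simply mirrors the question and may need adjusting:
import numpy as np
import pandas as pd

gt = aod_047.GetGeoTransform()        # (x_origin, dx, row_rot, y_origin, col_rot, dy)

band = a47[1]                         # one (1200, 1200) slice of the 3-D array
rows, cols = np.indices(band.shape)   # pixel row/column index grids

# affine transform from pixel indices to map coordinates (pixel centres)
x = gt[0] + (cols + 0.5) * gt[1] + (rows + 0.5) * gt[2]
y = gt[3] + (cols + 0.5) * gt[4] + (rows + 0.5) * gt[5]

df_047 = pd.DataFrame({"X": x.ravel(), "Y": y.ravel(), "AOD_047": band.ravel()})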

How to iterate over rows of a .csv file and pass each row to a time-series analysis model?

I want to write a program in Python that iterates over each row of a data matrix in a .csv file, passes each row as input to a time-series analysis model, and stores the output (a single value per row) in a column.
So far, I have tried iterating over the rows, passing each one through the model, and printing each output:
import pandas as pd
import numpy as np
from statsmodels.tsa.ar_model import AR
from random import random

data = pd.read_csv('EXAMPLEMATRIX.csv', header=None)
for i in data.iterrows():
    df = np.asarray(i)
    model = AR(df)
    model_fit = model.fit()
    yhat = model_fitd.predict(len(df), len(df))
    print(yhat)
but I get an error:
ValueError: maxlag should be < nobs
Please help me solve this problem, point out where it is going wrong, or provide a reference for solving it.
Thanks in advance.
Use this instead:
import pandas as pd
import numpy as np
from statsmodels.tsa.ar_model import AR
from random import random

# read the matrix; each row is treated as one time series
data = pd.read_csv('EXAMPLEMATRIX.csv', header=None)
for i in range(data.shape[0]):
    row = data.iloc[i]
    model = AR(row.values)
    model_fit = model.fit()
    yhat = model_fit.predict(len(row), len(row))
    print(yhat)
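The question also wanted the outputs stored as a column rather than printed; a hedged variant of the loop above that collects the predictions (assuming each prediction is a single value, and with an illustrative output file name) could look like this:
predictions = []
for i in range(data.shape[0]):
    row = data.iloc[i]
    model_fit = AR(row.values).fit()
    # one-step prediction just past the end of the row
    yhat = model_fit.predict(len(row), len(row))
    predictions.append(yhat[0])

data['prediction'] = predictions    # one output value per input row
data.to_csv('EXAMPLEMATRIX_with_output.csv', index=False)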

Use Spreadsheet Names as Variables in Pandas

I have different models and initialValues stored in different sheets of an Excel file called RateMatrix1. My models are WIN, WSW, WPA, WFR... and my initialValues are WI, WS, WP, WF..., and the sheets in Excel are named exactly as such.
Now, I would like to write a function that uses the name of the model and the initialValues as the sheet names below. I was wondering if there is a way to do this in Python.
import pandas as pd
import numpy as np

def MLA(model, initialValues):
    # read the matrix values from the Excel spreadsheet and convert them to a matrix
    RMatrix = (pd.read_excel("C:\Anaconda3\RateMatrix1.xlsx", sheetname="model", skiprows=0)).as_matrix()
    # read the column matrix (initial values) from the spreadsheet and convert it to a matrix
    initialAmount = (pd.read_excel("C:\Anaconda3\RateMatrix1.xlsx", sheetname="initialValues", skiprows=0)).as_matrix()
    return np.dot(RMatrix, initialAmount)

print(MLA(WIN, WI))
Never mind, I found a solution.
For anyone else looking to do the same, here's my code:
import pandas as pd
import numpy as np

def MLA(model, initialValues):
    # read the matrix values from the Excel spreadsheet and convert them to a matrix
    RMatrix = (pd.read_excel("C:\Anaconda3\RateMatrix1.xlsx", sheetname=model, skiprows=0)).as_matrix()
    # read the column matrix (initial values) from the spreadsheet and convert it to a matrix
    initialAmount = (pd.read_excel("C:\Anaconda3\RateMatrix1.xlsx", sheetname=initialValues, skiprows=0)).as_matrix()
    return np.dot(RMatrix, initialAmount)

print(MLA("WIN", "WI"))

pandas.DataFrame returns a Series, not a DataFrame

I am working with a series of images. I read them first, store them in a list, convert the list to a dataframe, and finally I would like to apply Isomap. When I read the images (I have 84 of them) I get an 84x2303 dataframe of objects, and each object by itself also looks like a dataframe. I am wondering how to convert all of it to numeric so I can run Isomap on it and then plot the result.
Here is my code:
import pandas as pd
from scipy import misc
from mpl_toolkits.mplot3d import Axes3D
import matplotlib
import matplotlib.pyplot as plt
import glob
from sklearn import manifold

samples = []
path = 'Datasets/ALOI/32/*.png'
files = glob.glob(path)
for name in files:
    img = misc.imread(name)
    img = img[::2, ::2]
    x = (img/255.0).reshape(-1, 3)
    samples.append(x)

df = pd.DataFrame.from_records(samples)
print df.dtypes
print df.shape
Thanks!
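A likely cause of the object dtype is that each element of samples is itself a 2-D (pixels, 3) array, so from_records stores a whole array in every cell. A minimal sketch of a fix, keeping the same file loop and assuming all images share the same dimensions, is to flatten each image to a single 1-D float vector so the dataframe is purely numeric and can be handed to Isomap (n_neighbors=6 is just an illustrative value):
samples = []
for name in files:
    img = misc.imread(name)
    img = img[::2, ::2]
    samples.append((img / 255.0).reshape(-1))   # one flat row of floats per image

df = pd.DataFrame(samples)                      # shape (84, height*width*3), float dtype

iso = manifold.Isomap(n_neighbors=6, n_components=3)
embedding = iso.fit_transform(df)               # (84, 3) embedding ready to plot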

matlab data file to pandas DataFrame [duplicate]

This question already has answers here:
Read .mat files in Python
(15 answers)
Closed 6 years ago.
Is there a standard way to convert MATLAB .mat (MATLAB-formatted data) files to a pandas DataFrame?
I am aware that a workaround is possible by using scipy.io, but I am wondering whether there is a more straightforward way to do it.
I found two ways: scipy or mat4py.
mat4py
Load data from MAT-file
The function loadmat loads all variables stored in the MAT-file into a
simple Python data structure, using only Python’s dict and list
objects. Numeric and cell arrays are converted to row-ordered nested
lists. Arrays are squeezed to eliminate arrays with only one element.
The resulting data structure is composed of simple types that are
compatible with the JSON format.
Example: Load a MAT-file into a Python data structure:
data = loadmat('datafile.mat')
From:
https://pypi.python.org/pypi/mat4py/0.1.0
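Since mat4py returns only plain dicts and lists, getting from there to a DataFrame is usually a single constructor call once the variable name is known ('measuredData' below is only a placeholder for whatever the file actually contains):
import pandas as pd
from mat4py import loadmat

data = loadmat('datafile.mat')             # nested Python dicts and lists, JSON-like
df = pd.DataFrame(data['measuredData'])    # works when that variable is table-like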
Scipy:
Example:
import numpy as np
from scipy.io import loadmat # this is the SciPy module that loads mat-files
import matplotlib.pyplot as plt
from datetime import datetime, date, time
import pandas as pd
mat = loadmat('measured_data.mat') # load mat-file
mdata = mat['measuredData'] # variable in mat file
mdtype = mdata.dtype # dtypes of structures are "unsized objects"
# * SciPy reads in structures as structured NumPy arrays of dtype object
# * The size of the array is the size of the structure array, not the number
# of elements in any particular field. The shape defaults to 2-dimensional.
# * For convenience make a dictionary of the data using the names from dtypes
# * Since the structure has only one element, but is 2-D, index it at [0, 0]
ndata = {n: mdata[n][0, 0] for n in mdtype.names}
# Reconstruct the columns of the data table from just the time series
# Use the number of intervals to test if a field is a column or metadata
columns = [n for n, v in ndata.items() if v.size == ndata['numIntervals']]
# now make a data frame, setting the time stamps as the index
df = pd.DataFrame(np.concatenate([ndata[c] for c in columns], axis=1),
                  index=[datetime(*ts) for ts in ndata['timestamps']],
                  columns=columns)
From:
http://poquitopicante.blogspot.fr/2014/05/loading-matlab-mat-file-into-pandas.html
Finally, you can follow the PyHogs example, which still uses scipy:
Reading complex .mat files.
This notebook shows an example of reading a MATLAB .mat file, converting the data
into a usable dictionary with loops, and making a simple plot of the data.
http://pyhogs.github.io/reading-mat-files.html
Ways to do this:
As you mentioned, scipy:
import scipy.io as sio
test = sio.loadmat('test.mat')
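From there, assuming the file stores a 2-D numeric array under a known variable name ('my_var' is a placeholder), the DataFrame step is direct:
import pandas as pd
df = pd.DataFrame(test['my_var'])    # one DataFrame column per MATLAB matrix column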
Using the MATLAB engine:
import matlab.engine
eng = matlab.engine.start_matlab()
content = eng.load("example.mat",nargout=1)
