MATLAB to Python Conversion Arrays - python

I have some code in MATLAB that I'm trying to convert into Python. I know very little about Python, so this is turning out to be a bit of a challenge.
Here's the MATLAB code:
xm_row = -(Nx-1)/2.0+0.5:(Nx-1)/2.0-0.5;
xm = xm_row(ones(Ny-1, 1), :);
ym_col = (-(Ny-1)/2.0+0.5:(Ny-1)/2.0-0.5)';
ym = ym_col(:,ones(Nx-1,1));
And here is my very rough attempt at trying to do the same thing in python:
for x in range (L-1):
    for y in range (L-1):
        xm_row = x[((x-1)/2.0+0.5):((x-1)/2.0-.5)]
        xm = xm_row[(ones(y-1,1)),:]
        ym_column = transposey[(-(y-1)/2.0+0.5):((y-1)/2.0-.5)]
        ym = ym_column[:,ones(x-1,1)]
In my Python code, L is the size of the array I am looping across.
When I try to run it in Python, I get the error:
'int' object has no attribute '__getitem__'
at the line:
xm_row = x[((x-1)/2.0+0.5):((x-1)/2.0-.5)]
Any help is appreciated!

In MATLAB, you can implement that in a simpler way with meshgrid, like so -
Nx = 5;
Ny = 7;
xm_row = -(Nx-1)/2.0+0.5:(Nx-1)/2.0-0.5;
ym_col = (-(Ny-1)/2.0+0.5:(Ny-1)/2.0-0.5)';
[xm_out,ym_out] = meshgrid(xm_row,ym_col)
Let's compare this meshgrid version with the original code for verification -
>> Nx = 5;
>> Ny = 7;
>> xm_row = -(Nx-1)/2.0+0.5:(Nx-1)/2.0-0.5;
>> ym_col = (-(Ny-1)/2.0+0.5:(Ny-1)/2.0-0.5)';
>> xm = xm_row(ones(Ny-1, 1), :)
xm =
-1.5 -0.5 0.5 1.5
-1.5 -0.5 0.5 1.5
-1.5 -0.5 0.5 1.5
-1.5 -0.5 0.5 1.5
-1.5 -0.5 0.5 1.5
-1.5 -0.5 0.5 1.5
>> ym = ym_col(:,ones(Nx-1,1))
ym =
-2.5 -2.5 -2.5 -2.5
-1.5 -1.5 -1.5 -1.5
-0.5 -0.5 -0.5 -0.5
0.5 0.5 0.5 0.5
1.5 1.5 1.5 1.5
2.5 2.5 2.5 2.5
>> [xm_out,ym_out] = meshgrid(xm_row,ym_col)
xm_out =
-1.5 -0.5 0.5 1.5
-1.5 -0.5 0.5 1.5
-1.5 -0.5 0.5 1.5
-1.5 -0.5 0.5 1.5
-1.5 -0.5 0.5 1.5
-1.5 -0.5 0.5 1.5
ym_out =
-2.5 -2.5 -2.5 -2.5
-1.5 -1.5 -1.5 -1.5
-0.5 -0.5 -0.5 -0.5
0.5 0.5 0.5 0.5
1.5 1.5 1.5 1.5
2.5 2.5 2.5 2.5
Transitioning from MATLAB to Python is much easier with NumPy, which provides counterparts to many MATLAB functions. In particular, NumPy has its own meshgrid, so the port is straightforward:
import numpy as np  # Import NumPy module
Nx = 5
Ny = 7
# np.arange is NumPy's counterpart to MATLAB's colon operator
xm_row = np.arange(-(Nx-1)/2.0+0.5, (Nx-1)/2.0-0.5+1)
ym_col = np.arange(-(Ny-1)/2.0+0.5, (Ny-1)/2.0-0.5+1)
# Use meshgrid just like in MATLAB
xm, ym = np.meshgrid(xm_row, ym_col)
Output -
In [28]: xm
Out[28]:
array([[-1.5, -0.5, 0.5, 1.5],
[-1.5, -0.5, 0.5, 1.5],
[-1.5, -0.5, 0.5, 1.5],
[-1.5, -0.5, 0.5, 1.5],
[-1.5, -0.5, 0.5, 1.5],
[-1.5, -0.5, 0.5, 1.5]])
In [29]: ym
Out[29]:
array([[-2.5, -2.5, -2.5, -2.5],
[-1.5, -1.5, -1.5, -1.5],
[-0.5, -0.5, -0.5, -0.5],
[ 0.5, 0.5, 0.5, 0.5],
[ 1.5, 1.5, 1.5, 1.5],
[ 2.5, 2.5, 2.5, 2.5]])
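meshgrid is the idiomatic port, but the original ones()-indexing trick also has a literal NumPy counterpart. A minimal sketch, assuming the same Nx = 5 and Ny = 7: because MATLAB indexes from 1 and Python from 0, the repeated index becomes 0 instead of 1. np.tile is an equivalent shortcut for the same repetition.

```python
import numpy as np

Nx, Ny = 5, 7

# Same ranges as above; +1 on the stop value because np.arange excludes it.
xm_row = np.arange(-(Nx - 1) / 2.0 + 0.5, (Nx - 1) / 2.0 - 0.5 + 1)
# [:, np.newaxis] turns the range into a column vector, like ' in MATLAB.
ym_col = np.arange(-(Ny - 1) / 2.0 + 0.5, (Ny - 1) / 2.0 - 0.5 + 1)[:, np.newaxis]

# MATLAB: xm = xm_row(ones(Ny-1, 1), :)
# Repeat row 1 (index 0 in Python) Ny-1 times.
xm = xm_row[np.newaxis, :][np.zeros(Ny - 1, dtype=int), :]

# MATLAB: ym = ym_col(:, ones(Nx-1, 1))
# Repeat column 1 (index 0 in Python) Nx-1 times.
ym = ym_col[:, np.zeros(Nx - 1, dtype=int)]

print(xm)
print(ym)
```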
Note that +1 is added to the second (stop) argument of np.arange in both cases: np.arange excludes its stop value, whereas MATLAB's colon operator includes it. For example, to create the range of elements from 3 to 10 inclusive, use np.arange(3,10+1):
In [32]: np.arange(3,10+1)
Out[32]: array([ 3, 4, 5, 6, 7, 8, 9, 10])
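Adding +1 works here because the step is 1. With fractional steps, np.arange can gain or lose the end point through floating-point round-off, and NumPy's documentation suggests np.linspace for non-integer steps. A sketch of a small helper, hypothetically named matlab_colon, that counts the points first (with a small tolerance) so that a reachable end point is always included, as with MATLAB's start:step:stop:

```python
import numpy as np

def matlab_colon(start, stop, step=1.0):
    """Hypothetical helper mimicking MATLAB's start:step:stop.

    The end point is included when the step lands on it (within a small
    tolerance); an empty array is returned when the range runs backwards.
    """
    # Number of whole steps that fit, nudged against round-off like 9.999999...
    n = int(np.floor((stop - start) / step + 1e-10)) + 1
    if n <= 0:
        return np.empty(0)
    return start + step * np.arange(n)

print(matlab_colon(-1.5, 1.5))   # same as np.arange(-1.5, 1.5 + 1)
print(matlab_colon(0, 1, 0.1))   # 11 points, end point 1.0 included
```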

Related

Sensor Data Sampling Frequency Mismatch

I have sensor data captured at different frequencies (this is data I've invented to simplify the operation). I want to resample the voltage data by increasing the number of data points and interpolate them so I have 16 instead of 12.
Pandas has a resample/upsample function but I can only find examples where people have gone from weekly data to daily data (adding 6 daily data points by interpolation between two weekly data points).
time (pressure)   pressure
0.05              1
0.1               1.1
0.15              1.2
0.2               1.3
0.25              1.4
0.3               1.5
0.35              1.6
0.4               1.7
0.45              1.8
0.5               1.9
0.55              2
0.6               2.1
0.65              2.2
0.7               2.3
0.75              2.4
0.8               2.5
time (voltage)    voltage
0.07              2.2
0.14              2.5
0.21              2.8
0.28              3.1
0.35              3.4
0.42              3.7
0.49              4
0.56              4.3
0.63              4.6
0.7               4.9
0.77              5.2
0.84              5.5
I would like my voltage to have 16 samples instead of 12 with the missing values interpolated. Thanks!
Let's assume two Series, "pressure" and "voltage":
import pandas as pd
pressure = pd.Series({0.05: 1.0, 0.1: 1.1, 0.15: 1.2, 0.2: 1.3, 0.25: 1.4, 0.3: 1.5, 0.35: 1.6, 0.4: 1.7, 0.45: 1.8,
                      0.5: 1.9, 0.55: 2.0, 0.6: 2.1, 0.65: 2.2, 0.7: 2.3, 0.75: 2.4, 0.8: 2.5}, name='pressure')
voltage = pd.Series({0.07: 2.2, 0.14: 2.5, 0.21: 2.8, 0.28: 3.1, 0.35: 3.4, 0.42: 3.7,
                     0.49: 4.0, 0.56: 4.3, 0.63: 4.6, 0.7: 4.9, 0.77: 5.2, 0.84: 5.5}, name='voltage')
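You can either use pandas.merge_asof, which pairs each pressure timestamp with the most recent voltage sample at or before it (a nearest-previous match, not an interpolation):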
pd.merge_asof(pressure, voltage, left_index=True, right_index=True)
or pandas.concat+interpolate:
(pd.concat([pressure, voltage], axis=1)
.sort_index()
.apply(pd.Series.interpolate)
#.plot(x='pressure', y='voltage', marker='o') # uncomment to plot
)
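Finally, to get voltage only at the 16 pressure timestamps, interpolate the voltage column first and only then drop the rows where pressure is NaN (dropping them first would also discard most of the measured voltage samples). method='index' interpolates against the time values rather than the row positions:
df = pd.concat([pressure, voltage], axis=1).sort_index()
df['voltage'] = df['voltage'].interpolate(method='index')
df = df.dropna(subset=['pressure'])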
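If you only need the voltage values at the 16 pressure timestamps, NumPy's np.interp is a compact alternative. A minimal sketch reusing the invented data above; note that np.interp does not produce NaN outside the known range but clamps to the end values, so the 0.05 s sample gets the first voltage reading (2.2) rather than an extrapolated value.

```python
import numpy as np
import pandas as pd

# Invented sample data from the question.
pressure = pd.Series({0.05: 1.0, 0.1: 1.1, 0.15: 1.2, 0.2: 1.3, 0.25: 1.4, 0.3: 1.5, 0.35: 1.6, 0.4: 1.7,
                      0.45: 1.8, 0.5: 1.9, 0.55: 2.0, 0.6: 2.1, 0.65: 2.2, 0.7: 2.3, 0.75: 2.4, 0.8: 2.5},
                     name='pressure')
voltage = pd.Series({0.07: 2.2, 0.14: 2.5, 0.21: 2.8, 0.28: 3.1, 0.35: 3.4, 0.42: 3.7,
                     0.49: 4.0, 0.56: 4.3, 0.63: 4.6, 0.7: 4.9, 0.77: 5.2, 0.84: 5.5},
                    name='voltage')

# Linear interpolation of voltage at each pressure timestamp (16 samples).
# Outside the voltage time range np.interp clamps to the first/last value.
voltage_16 = pd.Series(
    np.interp(pressure.index.to_numpy(), voltage.index.to_numpy(), voltage.to_numpy()),
    index=pressure.index,
    name='voltage',
)
print(pd.concat([pressure, voltage_16], axis=1))
```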

How do I mask only the output (labelled data)? I don't have any problem with the input data

I have many NaN values in my output data, and I padded those values with zeros. Please don't suggest deleting the NaNs or imputing them with any other number. I want the model to skip those NaN positions.
example:
import numpy as np
x = np.arange(0.5, 30)
x.shape = [10, 3]
x = [[ 0.5 1.5 2.5]
[ 3.5 4.5 5.5]
[ 6.5 7.5 8.5]
[ 9.5 10.5 11.5]
[12.5 13.5 14.5]
[15.5 16.5 17.5]
[18.5 19.5 20.5]
[21.5 22.5 23.5]
[24.5 25.5 26.5]
[27.5 28.5 29.5]]
y = np.arange(2, 10, 0.8)
y.shape = [10, 1]
y[4, 0] = 0.0
y[6, 0] = 0.0
y[7, 0] = 0.0
y = [[2. ]
[2.8]
[3.6]
[4.4]
[0. ]
[6. ]
[0. ]
[0. ]
[8.4]
[9.2]]
I expect the Keras deep learning model to predict zeros for the 5th, 7th and 8th rows, matching the padded values in y.
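One common way to make a model skip padded targets is to give them zero weight in the loss. A minimal NumPy sketch of the idea, assuming (as in the example) that 0.0 only ever appears as padding: the mask becomes a per-sample weight vector, and the hypothetical helper masked_mse shows that padded rows contribute nothing to the error. Keras' Model.fit accepts such a vector through its sample_weight argument. Note that with masking the model is never trained on those rows, so its predictions there are not forced to zero; if zeros are wanted in the output, they have to be set after predicting.

```python
import numpy as np

# Data from the question: 10 samples with 3 features and 1 target each.
x = np.arange(0.5, 30).reshape(10, 3)
y = np.arange(2, 10, 0.8).reshape(10, 1)
y[[4, 6, 7], 0] = 0.0  # padded targets (originally NaN)

# 1.0 for real targets, 0.0 for padded ones.
# Assumption: 0.0 never occurs as a genuine target value.
sample_weight = (y[:, 0] != 0.0).astype(float)

def masked_mse(y_true, y_pred, weight):
    """Hypothetical helper: mean squared error over unmasked rows only."""
    sq_err = (y_true[:, 0] - y_pred[:, 0]) ** 2 * weight
    return sq_err.sum() / max(weight.sum(), 1.0)

# With a compiled Keras model the same mask could be passed as:
# model.fit(x, y, sample_weight=sample_weight)
print(sample_weight)
```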

deleting rows by default value

I found some code on this forum that I am interested in, but it isn't working for my DataFrame.
INPUT:
x , y ,value ,value2
1.0 , 1.0 , 12.33 , 1.23367543
2.0 , 2.0 , 11.5 , 1.1523123
4.0, 2.0 , 22.11 , 2.2112312
5.0, 5.0 , 78.13 , 7.8131239
6.0, 6.0 , 33.68 , 3.3681231
I need to delete rows that lie within a distance of 1 of each other (in both x and y), keeping only the one with the highest "value".
RESULT to get:
1.0 , 1.0 , 12.33 , 1.23367543
4.0, 2.0 , 22.11 , 2.2112312
5.0, 5.0 , 78.13 , 7.8131239
CODE:
def dist_value_comp(row):
    # Rows whose x and y both lie within 1 of this row (the row itself included)
    x_dist = abs(df['x'] - row['x']) <= 1
    y_dist = abs(df['y'] - row['y']) <= 1
    xy_dist = x_dist & y_dist
    max_value = df.loc[xy_dist, 'value2'].max()
    # Keep the row only if it holds the maximum among its neighbours
    return row['value2'] == max_value
df['keep_row'] = df.apply(dist_value_comp, axis=1)
df.loc[df['keep_row'], ['x', 'y', 'value', 'value2']]
PROBLEM:
When I add a 4th column, value2, whose values have more digits after the decimal point, the code shows me only the row with the highest value2, but the result should be the same as for value.
UPDATE:
It works when I use an old PyCharm with Python 2.7, but not with the new version. Any idea why?
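A self-contained sketch for checking the behaviour, assuming the sample data above is loaded into a DataFrame with clean column names (x, y, value, value2). The hypothetical helper keep_local_max takes the column to compare as a parameter, so the same logic can be run on value and value2 side by side; on this sample both keep the rows at (1, 1), (4, 2) and (5, 5).

```python
import pandas as pd

# Sample data from the question.
df = pd.DataFrame({
    'x': [1.0, 2.0, 4.0, 5.0, 6.0],
    'y': [1.0, 2.0, 2.0, 5.0, 6.0],
    'value': [12.33, 11.5, 22.11, 78.13, 33.68],
    'value2': [1.23367543, 1.1523123, 2.2112312, 7.8131239, 3.3681231],
})

def keep_local_max(frame, col, radius=1.0):
    """Hypothetical helper: keep rows whose `col` is the maximum among all rows
    lying within `radius` of them in both x and y (the row itself included)."""
    def is_local_max(row):
        near = ((frame['x'] - row['x']).abs() <= radius) & ((frame['y'] - row['y']).abs() <= radius)
        return row[col] == frame.loc[near, col].max()
    return frame[frame.apply(is_local_max, axis=1)]

print(keep_local_max(df, 'value'))
print(keep_local_max(df, 'value2'))
```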

How to normalize data in a text file while preserving the first variable

I have a text file with this format:
1 10.0e+08 1.0e+04 1.0
2 9.0e+07 9.0e+03 0.9
2 8.0e+07 8.0e+03 0.8
3 7.0e+07 7.0e+03 0.7
I would like to preserve the first variable of every line and then normalize the data for all lines by the data on the first line. The end result would look something like:
1 1.0 1.0 1.0
2 0.9 0.9 0.9
2 0.8 0.8 0.8
3 0.7 0.7 0.7
So essentially, we are doing the following:
1 10.0e+08/10.0e+08 1.0e+04/1.0e+04 1.0/1.0
2 9.0e+07/10.0e+08 9.0e+03/1.0e+04 0.9/1.0
2 8.0e+07/10.0e+08 8.0e+03/1.0e+04 0.8/1.0
3 7.0e+07/10.0e+08 7.0e+03/1.0e+04 0.7/1.0
I'm still researching and reading on how to do this. I'll upload my attempt shortly. Also can anyone point me to a place where I can learn more about manipulating data files?
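Read your file into a NumPy array and use NumPy broadcasting. Divide only columns 1 onward by the first row, so the first variable is preserved (dividing the whole array by data[0] would also divide the first column by its first value, which leaves it unchanged here only because that value happens to be 1):
import numpy as np
data = np.loadtxt('foo.txt')
# Broadcasting divides columns 1..3 of every row by columns 1..3 of the first row; column 0 is untouched
data[:, 1:] = data[:, 1:] / data[0, 1:]
#array([[ 1. , 1. , 1. , 1. ],
# [ 2. , 0.09, 0.9 , 0.9 ],
# [ 2. , 0.08, 0.8 , 0.8 ],
# [ 3. , 0.07, 0.7 , 0.7 ]])
np.savetxt('new.txt', data)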
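A self-contained version of the same idea, using io.StringIO in place of the foo.txt / new.txt files so it can be run directly. It keeps column 0 untouched and passes a per-column fmt list to np.savetxt, so the first column is written as an integer and the rest in compact %g form (the default format is '%.18e').

```python
import io
import numpy as np

# Contents of the question's text file, inlined for a runnable example.
text = """1 10.0e+08 1.0e+04 1.0
2 9.0e+07 9.0e+03 0.9
2 8.0e+07 8.0e+03 0.8
3 7.0e+07 7.0e+03 0.7
"""
data = np.loadtxt(io.StringIO(text))

ids = data[:, 0]                          # first variable, kept as-is
normalized = data[:, 1:] / data[0, 1:]    # every other column divided by the first row
result = np.column_stack([ids, normalized])

out = io.StringIO()
# One format per column: integer id, then %g for the normalized values.
np.savetxt(out, result, fmt=['%d'] + ['%g'] * normalized.shape[1])
print(out.getvalue())
```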

Meshgrid of z values that match x and y meshgrid values

Edit: Original question was flawed but I am leaving it here for reasons of transparency.
Original:
I have some x, y, z data where x and y are coordinates of a 2D grid and z is a scalar value corresponding to (x, y).
>>> import numpy as np
>>> # Dummy example data
>>> x = np.arange(0.0, 5.0, 0.5)
>>> y = np.arange(1.0, 2.0, 0.1)
>>> z = np.sin(x)**2 + np.cos(y)**2
>>> print("x = ", x, "\n", "y = ", y, "\n", "z = ", z)
x = [ 0. 0.5 1. 1.5 2. 2.5 3. 3.5 4. 4.5]
y = [ 1. 1.1 1.2 1.3 1.4 1.5 1.6 1.7 1.8 1.9]
z = [ 0.29192658 0.43559829 0.83937656 1.06655187 0.85571064 0.36317266
0.02076747 0.13964978 0.62437081 1.06008127]
Using xx, yy = np.meshgrid(x, y) I can get two grids containing x and y values corresponding to each grid position.
>>> xx, yy = np.meshgrid(x, y)
>>> print(xx)
[[ 0. 0.5 1. 1.5 2. 2.5 3. 3.5 4. 4.5]
[ 0. 0.5 1. 1.5 2. 2.5 3. 3.5 4. 4.5]
[ 0. 0.5 1. 1.5 2. 2.5 3. 3.5 4. 4.5]
[ 0. 0.5 1. 1.5 2. 2.5 3. 3.5 4. 4.5]
[ 0. 0.5 1. 1.5 2. 2.5 3. 3.5 4. 4.5]
[ 0. 0.5 1. 1.5 2. 2.5 3. 3.5 4. 4.5]
[ 0. 0.5 1. 1.5 2. 2.5 3. 3.5 4. 4.5]
[ 0. 0.5 1. 1.5 2. 2.5 3. 3.5 4. 4.5]
[ 0. 0.5 1. 1.5 2. 2.5 3. 3.5 4. 4.5]
[ 0. 0.5 1. 1.5 2. 2.5 3. 3.5 4. 4.5]]
>>> print(yy)
[[ 1. 1. 1. 1. 1. 1. 1. 1. 1. 1. ]
[ 1.1 1.1 1.1 1.1 1.1 1.1 1.1 1.1 1.1 1.1]
[ 1.2 1.2 1.2 1.2 1.2 1.2 1.2 1.2 1.2 1.2]
[ 1.3 1.3 1.3 1.3 1.3 1.3 1.3 1.3 1.3 1.3]
[ 1.4 1.4 1.4 1.4 1.4 1.4 1.4 1.4 1.4 1.4]
[ 1.5 1.5 1.5 1.5 1.5 1.5 1.5 1.5 1.5 1.5]
[ 1.6 1.6 1.6 1.6 1.6 1.6 1.6 1.6 1.6 1.6]
[ 1.7 1.7 1.7 1.7 1.7 1.7 1.7 1.7 1.7 1.7]
[ 1.8 1.8 1.8 1.8 1.8 1.8 1.8 1.8 1.8 1.8]
[ 1.9 1.9 1.9 1.9 1.9 1.9 1.9 1.9 1.9 1.9]]
Now I want an array of the same shape for z, where the grid values correspond to the matching x and y values in the original data! But I cannot find an elegant, built-in solution where I do not need to re-grid the data, and I think I am missing some understanding of how I should approach it.
I have tried following this solution (with my real data, not this simple example data, but it should have the same result) but my final grid was not fully populated.
Please help!
Corrected question:
As was pointed out by commenters, my original dummy data was unsuitable for the question I am asking. Here is an improved version of the question:
I have some x, y, z data where x and y are coordinates of a 2D grid and z is a scalar value corresponding to (x, y). The data is read from a text file "data.txt":
#x y z
1.4 0.2 1.93164166734
1.4 0.3 1.88377897779
1.4 0.4 1.81946452501
1.6 0.2 1.9596778849
1.6 0.3 1.91181519535
1.6 0.4 1.84750074257
1.8 0.2 1.90890970517
1.8 0.3 1.86104701562
1.8 0.4 1.79673256284
2.0 0.2 1.78735230743
2.0 0.3 1.73948961789
2.0 0.4 1.67517516511
Loading the text:
>>> import numpy as np
>>> inFile = r'C:\data.txt'
>>> x, y, z = np.loadtxt(inFile, unpack=True, usecols=(0, 1, 2), comments='#', dtype=float)
>>> print(x)
[ 1.4 1.4 1.4 1.6 1.6 1.6 1.8 1.8 1.8 2. 2. 2. ]
>>> print(y)
[ 0.2 0.3 0.4 0.2 0.3 0.4 0.2 0.3 0.4 0.2 0.3 0.4]
>>> print(z)
[ 1.93164167 1.88377898 1.81946453 1.95967788 1.9118152 1.84750074
1.90890971 1.86104702 1.79673256 1.78735231 1.73948962 1.67517517]
Using xx, yy = np.meshgrid(np.unique(x), np.unique(y)) I can get two grids containing x and y values corresponding to each grid position.
>>> xx, yy = np.meshgrid(np.unique(x), np.unique(y))
>>> print(xx)
[[ 1.4 1.6 1.8 2. ]
[ 1.4 1.6 1.8 2. ]
[ 1.4 1.6 1.8 2. ]]
>>> print(yy)
[[ 0.2 0.2 0.2 0.2]
[ 0.3 0.3 0.3 0.3]
[ 0.4 0.4 0.4 0.4]]
Now each cell position in xx and yy corresponds to one of the original grid point locations.
I simply need an equivalent array where the grid values correspond to the matching z values in the original data!
"""e.g.
[[ 1.93164166734 1.9596778849 1.90890970517 1.78735230743]
[ 1.88377897779 1.91181519535 1.86104701562 1.73948961789]
[ 1.81946452501 1.84750074257 1.79673256284 1.67517516511]]"""
But I cannot find an elegant, built-in solution where I do not need to re-grid the data, and I think I am missing some understanding of how I should approach it. For example, using xx, yy, zz = np.meshgrid(x, y, z) returns three 3D arrays that I don't think I can use.
Please help!
Edit:
I managed to make this example work thanks to the solution from Jaime: Fill 2D numpy array from three 1D numpy arrays
>>> x_vals, x_idx = np.unique(x, return_inverse=True)
>>> y_vals, y_idx = np.unique(y, return_inverse=True)
>>> vals_array = np.empty(x_vals.shape + y_vals.shape)
>>> vals_array.fill(np.nan) # or whatever your desired missing data flag is
>>> vals_array[x_idx, y_idx] = z
>>> zz = vals_array.T
>>> print(zz)
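The same idea applied to the corrected data.txt example, as a sketch with the file contents inlined through io.StringIO. Indexing rows by y and columns by x fills zz directly in the layout of np.meshgrid's default 'xy' indexing, so no transpose is needed; np.full with np.nan marks grid points that have no z value.

```python
import io
import numpy as np

# Contents of the corrected question's data.txt, inlined for a runnable example.
text = """#x y z
1.4 0.2 1.93164166734
1.4 0.3 1.88377897779
1.4 0.4 1.81946452501
1.6 0.2 1.9596778849
1.6 0.3 1.91181519535
1.6 0.4 1.84750074257
1.8 0.2 1.90890970517
1.8 0.3 1.86104701562
1.8 0.4 1.79673256284
2.0 0.2 1.78735230743
2.0 0.3 1.73948961789
2.0 0.4 1.67517516511
"""
x, y, z = np.loadtxt(io.StringIO(text), unpack=True, usecols=(0, 1, 2), comments='#')

# Sorted unique coordinates plus, for every sample, its position among them.
x_vals, x_idx = np.unique(x, return_inverse=True)
y_vals, y_idx = np.unique(y, return_inverse=True)

# Rows follow y and columns follow x, the same layout as np.meshgrid's default.
zz = np.full((y_vals.size, x_vals.size), np.nan)  # NaN marks grid points with no data
zz[y_idx, x_idx] = z

xx, yy = np.meshgrid(x_vals, y_vals)
print(zz)
```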
But the code (with real input data) that led me down this path was still failing. I have now found the problem: I had been using scipy.ndimage.zoom to resample my gridded data to a higher resolution before generating zz.
>>> import scipy.ndimage
>>> zoom = 2
>>> x = scipy.ndimage.zoom(x, zoom)
>>> y = scipy.ndimage.zoom(y, zoom)
>>> z = scipy.ndimage.zoom(z, zoom)
This produced an array containing many nan entries:
array([[ nan, nan, nan, ..., nan, nan, nan],
[ nan, nan, nan, ..., nan, nan, nan],
[ nan, nan, nan, ..., nan, nan, nan],
...,
[ nan, nan, nan, ..., nan, nan, nan],
[ nan, nan, nan, ..., nan, nan, nan],
[ nan, nan, nan, ..., nan, nan, nan]])
When I skip the zoom stage, the correct array is produced:
array([[-22365.93400183, -22092.31794674, -22074.21420168, ...,
-14513.89091599, -12311.97437017, -12088.07062786],
[-29264.34039242, -28775.79743097, -29021.31886353, ...,
-21354.6799064 , -21150.76555669, -21046.41225097],
[-39792.93758344, -39253.50249278, -38859.2562673 , ...,
-24253.36838785, -25714.71895023, -29237.74277727],
...,
[ 44829.24733543, 44779.37084337, 44770.32987311, ...,
21041.42652441, 20777.00408692, 20512.58162671],
[ 44067.26616067, 44054.5398901 , 44007.62587598, ...,
21415.90416488, 21151.48168444, 20887.05918082],
[ 43265.35371973, 43332.5983711 , 43332.21743471, ...,
21780.32283309, 21529.39770759, 21278.47255848]])
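The NaNs most likely come from zooming the three 1-D arrays separately: scipy.ndimage.zoom interpolates along each flat list, so the new x and y values no longer pair up on a regular grid, np.unique then finds many more distinct values, and most cells of the resulting grid receive no z value. Resampling the 2-D zz grid after it has been built avoids this. The sketch below does that with separable linear interpolation via np.interp, a NumPy-only stand-in for scipy.ndimage.zoom (not the same spline interpolation); the helper name upsample_grid is hypothetical.

```python
import numpy as np

def upsample_grid(x_vals, y_vals, zz, factor=2):
    """Hypothetical helper: linearly resample zz (rows follow y_vals, columns
    follow x_vals) onto a grid with `factor` times as many points per axis."""
    x_new = np.linspace(x_vals[0], x_vals[-1], factor * x_vals.size)
    y_new = np.linspace(y_vals[0], y_vals[-1], factor * y_vals.size)
    # Interpolate each row along x ...
    along_x = np.array([np.interp(x_new, x_vals, row) for row in zz])
    # ... then each resulting column along y.
    zz_new = np.array([np.interp(y_new, y_vals, col) for col in along_x.T]).T
    return x_new, y_new, zz_new

# Grid from the corrected question (rows: y = 0.2, 0.3, 0.4; columns: x = 1.4 ... 2.0).
x_vals = np.array([1.4, 1.6, 1.8, 2.0])
y_vals = np.array([0.2, 0.3, 0.4])
zz = np.array([[1.93164166734, 1.9596778849, 1.90890970517, 1.78735230743],
               [1.88377897779, 1.91181519535, 1.86104701562, 1.73948961789],
               [1.81946452501, 1.84750074257, 1.79673256284, 1.67517516511]])

x_new, y_new, zz_new = upsample_grid(x_vals, y_vals, zz)
xx_new, yy_new = np.meshgrid(x_new, y_new)  # coordinates matching zz_new cell by cell
print(zz_new.shape)
```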
