How to specify multiple "num" parameters for the np.linspace function? - python

I wonder if there is a way to set up the NumPy linspace function with multiple num parameters, so that I can create sequences of evenly spaced values with different intervals without any for loop operations.
To illustrate my issue a bit more, I have the following np.array containing 3 segments, each represented by its 2 vertices with x, y, z coordinates, which I want to subdivide:
*************************
3D SEGMENTS TO DISCRETIZE
*************************
SegmentToDiscretize = np.array([[[150.149, 167.483, 4.2 ],[160.149, 167.483, 4.2 ]],
[[148.594, 163.634, 25.8 ],[180.547, 170.667, 25.8 ]],
[[180.547, 170.667, 25.8 ],[200.547, 190.667, 25.8 ]]])
And the following function, dedicated to adding equidistant points between each pair of vertices:
******************************
EQUIDISTANT POINTS COMPUTATION
******************************
nbsubdiv = 10
addedpoint = np.linspace(SegmentToDiscretize[:,0], SegmentToDiscretize[:,1], nbsubdiv, dtype=float)
Thanks to the argument nbsubdiv, I can specify how many subdivisions I want.
But I would like to specify 3 different subdivision values, one for each segment/row contained in my SegmentToDiscretize np.array:
[[[150.149, 167.483, 4.2 ],[160.149, 167.483, 4.2 ]], <-- nbsubdiv = 4
[[148.594, 163.634, 25.8 ],[180.547, 170.667, 25.8 ]], <-- nbsubdiv = 30
[[180.547, 170.667, 25.8 ],[200.547, 190.667, 25.8 ]]] <-- nbsubdiv = 10
I tried to turn my nbsubdiv parameter into a list, but without success...
nbsubdiv = [4,30,10]
addedpoint = np.linspace(SegmentToDiscretize[:,0], SegmentToDiscretize[:,1], nbsubdiv[0], dtype=float)
With the above code, I obtain :
[[148.594 163.634 4.2 ]
[150.149 165.97833333 4.2 ]
[153.48233333 167.483 4.2 ]
[156.81566667 167.483 4.2 ]
[159.245 167.483 25.8 ]
[160.149 167.483 25.8 ]
[169.896 168.32266667 25.8 ]
[180.547 170.667 25.8 ]
[180.547 170.667 25.8 ]
[187.21366667 177.33366667 25.8 ]
[193.88033333 184.00033333 25.8 ]
[200.547 190.667 25.8 ]]
Which is normal, since nbsubdiv[0] takes the first element in the list. But I did not succeed in finding a way to use each value in the list in turn without a for loop.
So I would be very delighted if anyone could help me solve this challenge.
Thanks in advance
Warm regards,
Hervé
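For reference (a sketch, not part of the original question): np.linspace only accepts a scalar num, so per-segment counts cannot go into a single vectorized call; a common workaround keeps the Python loop implicit in a list comprehension and stacks the pieces:

```python
import numpy as np

SegmentToDiscretize = np.array([[[150.149, 167.483,  4.2], [160.149, 167.483,  4.2]],
                                [[148.594, 163.634, 25.8], [180.547, 170.667, 25.8]],
                                [[180.547, 170.667, 25.8], [200.547, 190.667, 25.8]]])
nbsubdiv = [4, 30, 10]

# One linspace per segment, then stack all points into a single (sum(nbsubdiv), 3) array.
addedpoint = np.vstack([np.linspace(start, stop, n)
                        for (start, stop), n in zip(SegmentToDiscretize, nbsubdiv)])
```

This still loops at the Python level, but only once per segment (3 iterations here), not once per point.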

Related

How do I mask only the output (labelled data)? I don't have any problem with the input data

I have many NaN values in my output data, and I padded those values with zeros. Please don't suggest deleting the NaNs or imputing them with any other number. I want the model to skip those NaN positions.
example:
x = np.arange(0.5, 30)
x.shape = [10, 3]
x = [[ 0.5 1.5 2.5]
[ 3.5 4.5 5.5]
[ 6.5 7.5 8.5]
[ 9.5 10.5 11.5]
[12.5 13.5 14.5]
[15.5 16.5 17.5]
[18.5 19.5 20.5]
[21.5 22.5 23.5]
[24.5 25.5 26.5]
[27.5 28.5 29.5]]
y = np.arange(2, 10, 0.8)
y.shape = [10, 1]
y[4, 0] = 0.0
y[6, 0] = 0.0
y[7, 0] = 0.0
y = [[2. ]
[2.8]
[3.6]
[4.4]
[0. ]
[6. ]
[0. ]
[0. ]
[8.4]
[9.2]]
I expect the Keras deep learning model to predict zeros for the 5th, 7th, and 8th rows, matching the padded values in 'y'.
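One common approach (a plain-NumPy sketch illustrating the idea; in Keras the same logic would go into a custom loss function passed to model.compile) is to compute the error only at positions where the target is non-zero:

```python
import numpy as np

def masked_mse(y_true, y_pred):
    # Only score positions where the target was not padded with zero.
    mask = y_true != 0.0
    if not mask.any():
        return 0.0
    return float(np.mean((y_true[mask] - y_pred[mask]) ** 2))

y_true = np.array([[2.0], [0.0], [3.6]])
y_pred = np.array([[2.5], [9.9], [3.6]])
loss = masked_mse(y_true, y_pred)  # the padded second row is ignored
```

Note that masking the loss makes the model ignore those rows during training; it does not by itself make the model predict zero there.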

How to correctly display a multi-criteria dataset on a heatmap?

I have a dataset in a numpy array in the below format. Each "column" is a separate criteria. I want to display a heatmap where each "column" would correspond to the score range within that column:
[[ 226 600 3.33 915. 92.6 98.6 ]
[ 217 700 3.34 640. 93.7 98.5 ]
[ 213 900 3.35 662. 88.8 96. ]
...
[ 108 600 2.31 291. 64. 70.4 ]
[ 125 800 3.36 1094. 65.5 84.1 ]
[ 109 400 2.44 941. 52.3 68.7 ]]
I have written a function to generate a heatmap:
import matplotlib.pyplot as plt

def HeatMap(data):
    # generate heatmap figure
    figure = plt.figure()
    sub_figure = figure.add_subplot(111)
    heatmap = sub_figure.imshow(data, interpolation='nearest', cmap='jet', aspect=0.05)
    # generate color bar
    cbar = figure.colorbar(ax=sub_figure, mappable=heatmap, orientation='horizontal')
    cbar.set_label('Scores')
    plt.show()
This is what the function generates (the resulting image is not included here):
As per the above, the problem lies somewhere in my function: the colour scale runs from 0 up to the dataset maximum of 2500. How can I amend the function so that the heatmap colours each column according to its own score range rather than the range of the whole dataset? My first thought was to change the array dimensions to something like [[226],[600]], etc., but I'm not sure that's the solution.
Thanks for your help
You cannot have a separate cmap for each column.
If you want to see the variation in each column as per their own range, you can normalize the data by column before plotting the heatmap.
Code
import numpy as np
x = np.array([[1000, 10, 0.5],
[ 765, 5, 0.35],
[ 800, 7, 0.09]])
x_normed = x / x.max(axis=0)
print(x_normed)
# [[ 1. 1. 1. ]
# [ 0.765 0.5 0.7 ]
# [ 0.8 0.7 0.18 ]]
# Plot the heatmap for x_normed.
This will preserve the variation in each column.
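As a variant of the same idea (a sketch, not from the original answer): min-max scaling maps each column onto the full [0, 1] range, which also works when a column's values do not start near zero:

```python
import numpy as np

x = np.array([[1000., 10., 0.50],
              [ 765.,  5., 0.35],
              [ 800.,  7., 0.09]])

# Scale each column independently to [0, 1]: subtract the column minimum,
# then divide by the column's range.
col_min = x.min(axis=0)
col_max = x.max(axis=0)
x_scaled = (x - col_min) / (col_max - col_min)
```

After this, every column spans exactly 0 to 1, so a single colormap shows the within-column variation for all criteria at once.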

Interpolation of gridded data

I was hoping someone could help me with a problem that I've been having (I'm still very new to Python). I have been trying to interpolate data from a 50x4 array that is read from an Excel sheet, seen below.
[ 60. 0. 23.88 22.38 ]
[ 60. 5. 19.508 28.2 ]
[ 60. 10. 16.9 32.23 ]
[ 60. 15. 15.4 34.03 ]
[ 60. 20. 14.4 35.12 ]
[ 60. 25. 13.66 36.02 ]
[ 60. 30. 13.14 36.61 ]
[ 60. 35. 12.69 37.14 ]
[ 60. 40. 12.53 37.56 ]
[ 60. 50. 12.33 38.32 ]
[ 70. 0. 19.3 21.34 ]
[ 70. 5. 16.06 25.37 ]
[ 70. 10. 13.74 28.08 ]
[ 70. 15. 12.33 40.07 ]
[ 70. 20. 11.45 41.78 ]
[ 70. 25. 10.77 42.8 ]
etc...
What I'm trying to achieve is to enter 2 values (say 65 and 12) which correspond to interpolated positions in the 1st and 2nd columns, and have it return the interpolated values for columns 3 and 4. I managed to get it working using the griddata function in MATLAB. However, no luck in Python yet.
Thanks in advance
I think that scipy.interpolate can do the same (or at least something similar) as MATLAB's griddata. The code below uses a Radial Basis Function for the interpolation. I've only made the example for your column 3 as the z-axis.
import numpy as np
from scipy import interpolate
import matplotlib.pyplot as plt
x = np.array([60] * 10 + [70] * 6)
y = np.array([0,5,10,15,20,25,30,35,40,50,0,5,10,15,20,25])
z = np.array([23.88, 19.508, 16.9, 15.4, 14.4, 13.66, 13.14, 12.69, 12.53, 12.33, 19.3, 16.06, 13.74, 12.33, 11.45, 10.77])
x_ticks = np.linspace(60, 70, 11)
y_ticks = np.linspace(0, 50, 51)
XI, YI = np.meshgrid(x_ticks, y_ticks)
rbf = interpolate.Rbf(x, y, z, epsilon=2)
ZI = rbf(XI, YI)
print(ZI[np.argwhere(y_ticks==12)[0][0], np.argwhere(x_ticks==65)[0][0]])
>>> 14.222288614849171
Be aware that the result is ZI[y,x], not ZI[x,y]. Also be aware that your ticks must contain the x and y values you query, otherwise you'll get an IndexError.
Maybe you can build up on that solution depending on your needs.
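The closer analogue to MATLAB's griddata is scipy.interpolate.griddata, which interpolates scattered (x, y) points directly without building a dense grid first. A minimal sketch on a small subset of the data above (assuming SciPy is available):

```python
import numpy as np
from scipy.interpolate import griddata

# (col1, col2) pairs as scattered sample points, col3 as the value to interpolate.
points = np.array([[60., 0.], [60., 5.], [60., 10.],
                   [70., 0.], [70., 5.], [70., 10.]])
values = np.array([23.88, 19.508, 16.9, 19.3, 16.06, 13.74])

# Query the interpolated column-3 value at (65, 5).
z = griddata(points, values, (65.0, 5.0), method='linear')
```

The same call with the column-4 values as `values` would give the second interpolated output; queries outside the convex hull of the sample points return NaN with method='linear'.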

Read txt data separated by empty lines as several numpy arrays

I have some data in a txt file as follows:
# Contour 0, label: 37
41.6 7.5
41.5 7.4
41.5 7.3
41.4 7.2
# Contour 1, label:
48.3 2.9
48.4 3.0
48.6 3.1
# Contour 2, label:
61.4 2.9
61.3 3.0
....
So every block begins with a comment and ends with a blank line.
I want to read out those data and bring them into a list of NumPy arrays, like so:
# list as i want it:
[array([[41.6, 7.5], [41.5, 7.4], [1.5, 7.3], [41.4, 7.2]]),
array([[48.3, 2.9], [48.4, 3.0], [48.6, 3.1]]),
array([[61.4, 2.9], [61.3, 3.0]]), ...]
Is there an efficient way to do that with NumPy? genfromtxt and loadtxt don't seem to have the required options!?
Like this?
import numpy as np
text = \
'''
# Contour 0, label: 37
41.6 7.5
41.5 7.4
41.5 7.3
41.4 7.2
# Contour 1, label:
48.3 2.9
48.4 3.0
48.6 3.1
# Contour 2, label:
61.4 2.9
61.3 3.0
'''
for line in text.split('\n'):
    if line != '' and not line.startswith('#'):
        data = line.strip().split()
        array = np.array([float(d) for d in data])
        print(array)
You could use Python's groupby function to group the 3 entries together as follows:
from itertools import groupby
import numpy as np

array_list = []
with open('data.txt') as f_data:
    for k, g in groupby(f_data, lambda x: x.startswith('#')):
        if not k:
            array_list.append(np.array([[float(x) for x in d.split()] for d in g if len(d.strip())]))

for entry in array_list:
    print(entry)
    print()
This would display the array_list as follows:
[[ 41.6 7.5]
[ 41.5 7.4]
[ 41.5 7.3]
[ 41.4 7.2]]
[[ 48.3 2.9]
[ 48.4 3. ]
[ 48.6 3.1]]
[[ 61.4 2.9]
[ 61.3 3. ]]
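A variant of the same idea (a sketch, not from either answer above) reuses np.loadtxt for the number parsing: split the raw text on the comment lines with a regular expression, then parse each remaining block:

```python
import re
import numpy as np
from io import StringIO

text = """# Contour 0, label: 37
41.6 7.5
41.5 7.4
41.5 7.3
41.4 7.2
# Contour 1, label:
48.3 2.9
48.4 3.0
48.6 3.1
"""

# Split on lines starting with '#', then let loadtxt parse each block.
blocks = re.split(r'^#.*$', text, flags=re.MULTILINE)
arrays = [np.loadtxt(StringIO(b)) for b in blocks if b.strip()]
```

One caveat: a block containing a single data line comes back as a 1-D array; pass ndmin=2 to loadtxt to force 2-D output in that case.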

Rolling median in python

I have some stock data based on daily close values. I need to be able to insert these values into a python list and get a median for the last 30 closes. Is there a python library that does this?
In pure Python, having your data in a Python list a, you could do
median = sum(sorted(a[-30:])[14:16]) / 2.0
(This assumes a has at least 30 items.)
Using the NumPy package, you could use
median = numpy.median(a[-30:])
Have you considered pandas? It is based on NumPy, can automatically associate timestamps with your data, and discards unknown dates as long as you fill them with numpy.nan. It also offers some rather powerful graphing via matplotlib.
Basically, it was designed for financial analysis in Python.
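For illustration (a sketch assuming pandas is installed), a rolling median is one line with pandas; a window of 3 keeps the example short, but window=30 would cover the last 30 closes:

```python
import pandas as pd

closes = pd.Series([10.0, 12.0, 11.0, 13.0, 14.0, 12.0])

# Median over a trailing window of 3 closes; the first two entries
# have no full window yet and come back as NaN.
rolling = closes.rolling(window=3).median()
```

Pass min_periods=1 to rolling() if partial windows at the start should produce values instead of NaN.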
Isn't the median just the middle value of a sorted range?
So, assuming your list is stock_data:
last_thirty = stock_data[-30:]
median = sorted(last_thirty)[15]
Now you just need to find and fix the off-by-one errors, and also handle the case of stock_data having fewer than 30 elements...
Let us try that here a bit:
def rolling_median(data, window):
    if len(data) < window:
        subject = data[:]
    else:
        subject = data[-window:]  # use the window argument, not a hard-coded 30
    return sorted(subject)[len(subject) // 2]  # integer division for Python 3
# found this helpful:
values = [10, 20, 30, 40, 50]  # renamed from 'list', which shadows the built-in
med = []
for j in range(len(values)):
    sub_set = values[0:j+1]
    med.append(np.median(sub_set))
print(med)
Here is a much faster method with w*|x| space complexity.
def moving_median(x, w):
    shifted = np.zeros((len(x)+w-1, w))
    shifted[:,:] = np.nan
    for idx in range(w-1):
        shifted[idx:-w+idx+1, idx] = x
    shifted[idx+1:, idx+1] = x
    # print(shifted)
    medians = np.median(shifted, axis=1)
    for idx in range(w-1):
        medians[idx] = np.median(shifted[idx, :idx+1])
        medians[-idx-1] = np.median(shifted[-idx-1, -idx-1:])
    return medians[(w-1)//2:-(w-1)//2]
moving_median(np.arange(10), 4)
# Output
array([0.5, 1. , 1.5, 2.5, 3.5, 4.5, 5.5, 6.5, 7.5, 8. ])
The output has the same length as the input vector.
Rows with fewer than half their entries present are ignored, and when exactly half of a row is NaN (which happens only for an even window width), only the first option is returned. Here is the shifted matrix from above with the respective median values:
[[ 0. nan nan nan] -> -
[ 1. 0. nan nan] -> 0.5
[ 2. 1. 0. nan] -> 1.0
[ 3. 2. 1. 0.] -> 1.5
[ 4. 3. 2. 1.] -> 2.5
[ 5. 4. 3. 2.] -> 3.5
[ 6. 5. 4. 3.] -> 4.5
[ 7. 6. 5. 4.] -> 5.5
[ 8. 7. 6. 5.] -> 6.5
[ 9. 8. 7. 6.] -> 7.5
[nan 9. 8. 7.] -> 8.0
[nan nan 9. 8.] -> -
[nan nan nan 9.]]-> -
The behaviour can be changed by adapting the final slice medians[(w-1)//2:-(w-1)//2].
Benchmark:
%%timeit
moving_median(np.arange(1000), 4)
# 267 µs ± 759 ns per loop (mean ± std. dev. of 7 runs, 1,000 loops each)
Alternative approach (the results will be shifted):
def moving_median_list(x, w):
    medians = np.zeros(len(x))
    for j in range(len(x)):
        medians[j] = np.median(x[j:j+w])
    return medians
%%timeit
moving_median_list(np.arange(1000), 4)
# 15.7 ms ± 115 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)
Both algorithms have linear time complexity, but moving_median replaces the per-element Python loop with vectorized NumPy operations, so it is the faster option.
