Interpolation of gridded data - python

I was hoping someone could help me with a problem that I've been having (I'm still very new to Python). I have been trying to interpolate data from a 50x4 array that is read from an Excel sheet, shown below.
[ 60. 0. 23.88 22.38 ]
[ 60. 5. 19.508 28.2 ]
[ 60. 10. 16.9 32.23 ]
[ 60. 15. 15.4 34.03 ]
[ 60. 20. 14.4 35.12 ]
[ 60. 25. 13.66 36.02 ]
[ 60. 30. 13.14 36.61 ]
[ 60. 35. 12.69 37.14 ]
[ 60. 40. 12.53 37.56 ]
[ 60. 50. 12.33 38.32 ]
[ 70. 0. 19.3 21.34 ]
[ 70. 5. 16.06 25.37 ]
[ 70. 10. 13.74 28.08 ]
[ 70. 15. 12.33 40.07 ]
[ 70. 20. 11.45 41.78 ]
[ 70. 25. 10.77 42.8 ]
etc...
What I'm trying to achieve is to enter 2 values (say 65 and 12) which correspond to interpolated values in the 1st and 2nd column, and it would return the interpolated values for columns 3 and 4. I managed to get it working using the griddata function in matlab. However no luck in python yet.
Thanks in advance

I think that scipy.interpolate can do the same as (or at least something similar to) MATLAB's griddata. The code below uses a radial basis function (Rbf) for interpolation. I've only built the example for your column 3, used as the z values.
import numpy as np
from scipy import interpolate
import matplotlib.pyplot as plt
x = np.array([60] * 10 + [70] * 6)
y = np.array([0,5,10,15,20,25,30,35,40,50,0,5,10,15,20,25])
z = np.array([23.88, 19.508, 16.9, 15.4, 14.4, 13.66, 13.14, 12.69, 12.53, 12.33, 19.3, 16.06, 13.74, 12.33, 11.45, 10.77])
x_ticks = np.linspace(60, 70, 11)
y_ticks = np.linspace(0, 50, 51)
XI, YI = np.meshgrid(x_ticks, y_ticks)
rbf = interpolate.Rbf(x, y, z, epsilon=2)
ZI = rbf(XI, YI)
print(ZI[np.argwhere(y_ticks==12)[0][0], np.argwhere(x_ticks==65)[0][0]])
>>> 14.222288614849171
Be aware that the result is ZI[y,x], not ZI[x,y]. Also be aware that your ticks must contain the x and y values you query, otherwise you'll get an IndexError.
Maybe you can build on that solution, depending on your needs.
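If a piecewise-linear result closer to MATLAB's default griddata behaviour is wanted, scipy.interpolate.griddata can be queried at a single point without building a mesh first. The following is only a sketch under assumptions: it uses just the 16 rows visible in the question (not the full 50x4 array), and the names col3, col4, points and query are made up for illustration.

```python
import numpy as np
from scipy.interpolate import griddata

# Columns 1 and 2 of the table are the coordinates; columns 3 and 4 are the values.
# Only the 16 rows shown in the question are used here.
x = np.array([60] * 10 + [70] * 6, dtype=float)
y = np.array([0, 5, 10, 15, 20, 25, 30, 35, 40, 50,
              0, 5, 10, 15, 20, 25], dtype=float)
col3 = np.array([23.88, 19.508, 16.9, 15.4, 14.4, 13.66, 13.14, 12.69,
                 12.53, 12.33, 19.3, 16.06, 13.74, 12.33, 11.45, 10.77])
col4 = np.array([22.38, 28.2, 32.23, 34.03, 35.12, 36.02, 36.61, 37.14,
                 37.56, 38.32, 21.34, 25.37, 28.08, 40.07, 41.78, 42.8])

points = np.column_stack([x, y])   # shape (16, 2): one (x, y) pair per row
query = np.array([[65.0, 12.0]])   # the point to interpolate at

# Linear interpolation over a triangulation of the scattered points.
# Queries outside the convex hull of the data return nan.
col3_at_query = griddata(points, col3, query, method="linear")[0]
col4_at_query = griddata(points, col4, query, method="linear")[0]
print(col3_at_query, col4_at_query)
```

Unlike the Rbf example, the query point does not have to coincide with a tick of a precomputed mesh.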

Related

How to add color array to mesh in meshio?

############points (108 ea)##################
[[362. 437. 0.]
[418. 124. 0.]
[452. 64. 0.]
...
[256. 512. 0.]
[ 0. 256. 0.]
[512. 256. 0.]]
##########triangles (205 ea)#################
[[ 86 106 100]
[104 95 100]
[ 41 104 101]
...
[ 0 84 36]
[ 84 6 36]
[ 6 84 0]]
################triangle_colours (205 ea)##############
[[0.69140625 0.2734375 0.3203125 1. ]
[0.8046875 0.37109375 0.36328125 1. ]
[0.83203125 0.48046875 0.40234375 1. ]
...
[0.46875 0.13671875 0.26171875 1. ]
[0.49609375 0.1796875 0.28515625 1. ]
[0.91796875 0.796875 0.71484375 1. ]]
Code:
import meshio
cells = [
("triangle", triangles)
]
mesh = meshio.Mesh(
points,
cells,
cell_data={"a": triangle_colours},
)
mesh.write(
"foo.vtk",
)
Above code gives
ValueError: Incompatible cell data. 1 cell blocks, but 'a' has 205 blocks.
I just want to add colors to the triangles. The triangle_colours array has the same length as triangles, as in the example here: https://github.com/nschloe/meshio (both have 205 elements). How can I correct this error?
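cell_data corresponds to cells, so it needs to have the same "blocked" structure: a list with one array per cell block. Here there is a single block of triangles, so triangle_colours has to be wrapped in a list.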
import meshio
cells = [("triangle", triangles)]
mesh = meshio.Mesh(
points,
cells,
cell_data={"a": [triangle_colours]},
)
mesh.write("foo.vtk")

How to extract a DataFrame to obtain a nested array?

I have a sample DataFrame as below:
The first column consists of 2 years; for each year, 2 tracks exist, and each track includes pairs of longitude and latitude coordinates. How can I extract every track for each year separately to obtain an array of tracks with lat and long?
df = pd.DataFrame(
{'year':[0,0,0,0,0,0,0,0,1,1,1,1,1,1,1,1],
'track_number':[0,0,0,0,1,1,1,1,0,0,0,0,1,1,1,1],
'lat': [11.7,11.8,11.9,11.9,12.0,12.1,12.2,12.2,12.3,12.3,12.4,12.5,12.6,12.6,12.7,12.8],
'long':[-83.68,-83.69,-83.70,-83.71,-83.71,-83.73,-83.74,-83.75,-83.76,-83.77,-83.78,-83.79,-83.80,-83.81,-83.82,-83.83]})
You can groupby year and then extract a numpy.array from the created dataframes with .to_numpy().
>>> years = []
>>> for _, df2 in df.groupby(["year"]):
...     years.append(df2.to_numpy()[:, 1:])
...
>>> years[0]
array([[ 0. , 11.7 , -83.68],
[ 0. , 11.8 , -83.69],
[ 0. , 11.9 , -83.7 ],
[ 0. , 11.9 , -83.71],
[ 1. , 12. , -83.71],
[ 1. , 12.1 , -83.73],
[ 1. , 12.2 , -83.74],
[ 1. , 12.2 , -83.75]])
>>> years[1]
array([[ 0. , 12.3 , -83.76],
[ 0. , 12.3 , -83.77],
[ 0. , 12.4 , -83.78],
[ 0. , 12.5 , -83.79],
[ 1. , 12.6 , -83.8 ],
[ 1. , 12.6 , -83.81],
[ 1. , 12.7 , -83.82],
[ 1. , 12.8 , -83.83]])
Where years[0] would have the desired information for the year 0. And so on. Inside the array, the positions of the original dataframe are preserved. That is, the first element is the track; the second, the latitude, and the third, the longitude.
If you wish to do the same per track, i.e., have an array of only latitude and longitude for each track, you can groupby(["year", "track_number"]) as well.
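Grouping by both columns could look roughly like the sketch below. The dictionary name tracks and the list name nested are illustrative choices, not part of the original answer.

```python
import pandas as pd

df = pd.DataFrame({
    'year': [0, 0, 0, 0, 0, 0, 0, 0, 1, 1, 1, 1, 1, 1, 1, 1],
    'track_number': [0, 0, 0, 0, 1, 1, 1, 1, 0, 0, 0, 0, 1, 1, 1, 1],
    'lat': [11.7, 11.8, 11.9, 11.9, 12.0, 12.1, 12.2, 12.2,
            12.3, 12.3, 12.4, 12.5, 12.6, 12.6, 12.7, 12.8],
    'long': [-83.68, -83.69, -83.70, -83.71, -83.71, -83.73, -83.74, -83.75,
             -83.76, -83.77, -83.78, -83.79, -83.80, -83.81, -83.82, -83.83],
})

# One (n_points, 2) array of [lat, long] per (year, track_number) pair.
tracks = {
    key: group[['lat', 'long']].to_numpy()
    for key, group in df.groupby(['year', 'track_number'])
}

# Nested list indexed as nested[year][track_number].
nested = [
    [tracks[(year, track)] for track in sorted(df['track_number'].unique())]
    for year in sorted(df['year'].unique())
]

print(tracks[(0, 1)])   # the four [lat, long] rows of year 0, track 1
```

A dictionary keyed by (year, track_number) keeps working if tracks have different lengths, which a single rectangular array would not.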

How to correctly display a multi-criteria dataset on a heatmap?

I have a dataset in a numpy array in the format below. Each "column" is a separate criterion. I want to display a heatmap where each "column" is colored according to the score range within that column:
[[ 226 600 3.33 915. 92.6 98.6 ]
[ 217 700 3.34 640. 93.7 98.5 ]
[ 213 900 3.35 662. 88.8 96. ]
...
[ 108 600 2.31 291. 64. 70.4 ]
[ 125 800 3.36 1094. 65.5 84.1 ]
[ 109 400 2.44 941. 52.3 68.7 ]]
I have written a function to generate a heatmap:
import matplotlib.pyplot as plt
def HeatMap(data):
    # generate heatmap figure
    figure = plt.figure()
    sub_figure = figure.add_subplot(111)
    heatmap = sub_figure.imshow(data, interpolation='nearest', cmap='jet', aspect=0.05)
    # generate color bar
    cbar = figure.colorbar(ax=sub_figure, mappable=heatmap, orientation='horizontal')
    cbar.set_label('Scores')
    plt.show()
This is what the function generates:
As per above, it can be seen that the problem lies in my function somewhere as the Scores range from 0 to a maximum value in the dataset of 2500. How can I amend my function so that the heatmap displays the scores in the columns according to their range rather than the range of the whole dataset? My first thoughts are to change the array dimensions to something like [[226],[600]] etc. but not sure if that's the solution
Thanks for your help
You cannot have a separate cmap for each column.
If you want to see the variation in each column as per their own range, you can normalize the data by column before plotting the heatmap.
Code
import numpy as np
x = np.array([[1000, 10, 0.5],
[ 765, 5, 0.35],
[ 800, 7, 0.09]])
x_normed = x / x.max(axis=0)
print(x_normed)
# [[ 1. 1. 1. ]
# [ 0.765 0.5 0.7 ]
# [ 0.8 0.7 0.18 ]]
# Plot the heatmap for x_normed.
This will preserve the variation in each column, since every column is scaled by its own maximum and ends up in the same 0 to 1 range.
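Dividing by the column maximum leaves each column's minimum somewhere above 0. If each column should use the full color range, a min-max scaling per column is one possible variant. This is a sketch rather than part of the original answer; it assumes no column is constant (otherwise the denominator is zero), and the name x_scaled is made up.

```python
import numpy as np

x = np.array([[1000, 10, 0.5],
              [ 765,  5, 0.35],
              [ 800,  7, 0.09]])

# Per-column minimum and maximum (axis=0 reduces over the rows).
col_min = x.min(axis=0)
col_max = x.max(axis=0)

# Map every column onto [0, 1]: its smallest value becomes 0, its largest 1.
x_scaled = (x - col_min) / (col_max - col_min)
print(x_scaled)
```

The scaled array can be passed to the heatmap function in place of the raw data; the color bar then shows relative position within each column rather than raw scores.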

Python genfromtxt file path

I have an extremely basic problem with the numpy.genfromtxt function. I'm using the Enthought Canopy package: where shall I save the file.txt I want to use, or how shall I tell Python where to look for it? When using IDLE I simply save the file in a preset folder such as C:\Users\Davide\Python\data.txt and what I get is
>>> import numpy as np
>>> np.genfromtxt('data.txt')
array([[ 33.1 , 32.6 , 18.2 , 17.9 ],
[ 32.95, 32.7 , 17.95, 17.9 ],
[ 32.9 , 32.6 , 18. , 17.9 ],
[ 33. , 32.65, 18. , 17.9 ],
[ 32.95, 32.65, 18.05, 17.9 ],
[ 33. , 32.6 , 18. , 17.9 ],
[ 33.05, 32.7 , 18. , 17.9 ],
[ 33.05, 32.5 , 18.1 , 17.9 ],
[ 33. , 32.6 , 18.05, 17.9 ],
[ 33. , 32.55, 18. , 17.95]])
While working with Canopy, the same code gives IOError: data.txt not found, and something like np.genfromtxt('C:\Users\Davide\Python\data.txt') doesn't work either. I'm sorry for the question's banality but I'm really going crazy with this. Thanks for your help.
You can pass a fully qualified path but this:
np.genfromtxt('C:\Users\Davide\Python\data.txt')
won't work because backslashes need to be escaped:
np.genfromtxt('C:\\Users\\Davide\\Python\\data.txt')
or you could use a raw string:
np.genfromtxt(r'C:\Users\Davide\Python\data.txt')
As for where a relative path such as 'data.txt' is looked up, that is the current working directory, which you can query using os.getcwd():
In [269]:
import os
os.getcwd()
Out[269]:
'C:\\WinPython-64bit-3.4.3.1\\notebooks\\docs'
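Another way to sidestep backslash escaping is to build the path with pathlib and pass it to np.genfromtxt. The sketch below writes a small sample file into a temporary folder only so that it runs on its own; data_dir is a hypothetical stand-in for the real folder (for example the one holding data.txt in the question).

```python
import os
import tempfile
from pathlib import Path

import numpy as np

# Hypothetical folder for illustration; substitute the folder that holds data.txt.
data_dir = Path(tempfile.mkdtemp())
data_file = data_dir / "data.txt"   # "/" joins path parts, no escaping needed

# Write two sample rows so the example is self-contained.
data_file.write_text("33.1 32.6 18.2 17.9\n32.95 32.7 17.95 17.9\n")

print(os.getcwd())                  # where relative paths would be resolved
arr = np.genfromtxt(data_file)      # an absolute path works from any directory
print(arr.shape)
```

Passing an absolute path this way makes the call independent of whatever working directory the IDE happens to start in.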

Not able to get my head around this python

I just implemented a hierarchical clustering by following the documentation here: http://www.mathworks.com/help/stats/hierarchical-clustering.html?s_tid=doc_12b
So, let me try to put down what I am trying to do.
Take a look at the following figure:
Now, this dendrogram is generated from the following data:
node1 node2 dist(node1,node2) num_elems
assigning index 37 to [ 16. 26. 1.14749118 2. ]
assigning index 38 to [ 4. 7. 1.20402602 2. ]
assigning index 39 to [ 13. 29. 1.44708015 2. ]
assigning index 40 to [ 12. 18. 1.45827365 2. ]
assigning index 41 to [ 10. 34. 1.49607538 2. ]
assigning index 42 to [ 17. 38. 1.52565922 3. ]
assigning index 43 to [ 8. 25. 1.58919037 2. ]
assigning index 44 to [ 3. 40. 1.60231007 3. ]
assigning index 45 to [ 6. 42. 1.65755731 4. ]
assigning index 46 to [ 15. 23. 1.77770844 2. ]
assigning index 47 to [ 24. 33. 1.77771082 2. ]
assigning index 48 to [ 20. 35. 1.81301111 2. ]
assigning index 49 to [ 19. 48. 1.9191061 3. ]
assigning index 50 to [ 0. 44. 1.94238609 4. ]
assigning index 51 to [ 2. 36. 2.0444266 2. ]
assigning index 52 to [ 39. 45. 2.11667375 6. ]
assigning index 53 to [ 32. 43. 2.17132916 3. ]
assigning index 54 to [ 21. 41. 2.2882061 3. ]
assigning index 55 to [ 9. 30. 2.34492327 2. ]
assigning index 56 to [ 5. 51. 2.38383321 3. ]
assigning index 57 to [ 46. 52. 2.42100025 8. ]
assigning index 58 to [ 28. 37. 2.48365024 3. ]
assigning index 59 to [ 50. 53. 2.57305009 7. ]
assigning index 60 to [ 49. 57. 2.69459675 11. ]
assigning index 61 to [ 11. 54. 2.75669475 4. ]
assigning index 62 to [ 22. 27. 2.77163751 2. ]
assigning index 63 to [ 47. 55. 2.79303418 4. ]
assigning index 64 to [ 14. 60. 2.88015327 12. ]
assigning index 65 to [ 56. 59. 2.95413905 10. ]
assigning index 66 to [ 61. 65. 3.12615829 14. ]
assigning index 67 to [ 64. 66. 3.28846304 26. ]
assigning index 68 to [ 31. 58. 3.3282066 4. ]
assigning index 69 to [ 63. 67. 3.47397104 30. ]
assigning index 70 to [ 62. 68. 3.63807605 6. ]
assigning index 71 to [ 1. 69. 4.09465969 31. ]
assigning index 72 to [ 70. 71. 4.74129435 37. ]
So basically, there are 37 points in my data, indexed from 0 to 36. When I process row i of this list, I assign it the index i + len(thiscompletelist) + 1, so the first row gets index 37.
So for example, when the id 37 is seen again in a later row, that means the row is linked to a branch (an earlier merge) rather than to an original point.
I used MATLAB to generate this image. But I want to query this information with something like query_node(node_id), such that it returns a list by level, so that on query_node(37) I get
{ "left": {"level":1 {"id": 28}} , "right":{"level":0 {"left" :"id":16},"right":{"id":26}}}
Actually, I don't even know what the right data structure is for this.
Basically I want to query by node and gain some insight into what the structure of this dendrogram looks like when I am standing on that node and looking below. :(
EDIT 1:
Oh, I didn't know that you won't be able to zoom the image. Basically, the fourth element from the left is 28, and the green entry is the first row of the data.
So the fourth vertical line on the dendrogram represents 28,
the line next to it (the first green line) represents 16,
and the line next to that (the second green line) represents 26.
Well it's always good to build upon something already existing so take a look at dendrogram in scipy.
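As for the data structure, one possibility is a plain recursive lookup over the merge rows. The following is a minimal sketch, not scipy's implementation: the dictionary merges and the function query_node are hypothetical names, and only two rows of the listing in the question are copied in.

```python
# Rows use the same layout as the listing in the question:
# index -> (node1, node2, dist(node1, node2), num_elems).
# Only two of the rows are copied here for illustration.
N_POINTS = 37  # original observations are numbered 0..36

merges = {
    37: (16, 26, 1.14749118, 2),
    58: (28, 37, 2.48365024, 3),
}


def query_node(node_id):
    """Return a nested dict describing everything below node_id."""
    if node_id < N_POINTS:
        # An id below 37 is an original data point, i.e. a leaf.
        return {"id": node_id}
    left, right, dist, count = merges[node_id]
    return {
        "id": node_id,
        "dist": dist,
        "count": count,
        "left": query_node(left),
        "right": query_node(right),
    }


print(query_node(37))
print(query_node(58))
```

With the full set of rows stored as a linkage matrix, scipy.cluster.hierarchy.to_tree(Z, rd=True) returns a comparable tree of ClusterNode objects, so the hand-written lookup is mainly useful for seeing how the ids relate.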
