I have a file containing 3 columns, where the first two are coordinates (x,y) and the third is a value (z) corresponding to that position. Here's a short example:
x y z
0 1 14
0 2 17
1 0 15
1 1 16
2 1 18
2 2 13
I want to create a 2D array of values from the third column based on their x,y coordinates in the file. I read in each column as an individual array, and I created grids of x values and y values using numpy.meshgrid, like this:
x = [[0 1 2] and y = [[0 0 0]
[0 1 2] [1 1 1]
[0 1 2]] [2 2 2]]
but I'm new to Python and don't know how to produce a third grid of z values that looks like this:
z = [[Nan 15 Nan]
[14 16 18]
[17 Nan 13]]
Replacing Nan with 0 would be fine, too; my main problem is creating the 2D array in the first place. Thanks in advance for your help!
Assuming the x and y values in your file directly correspond to indices (as they do in your example), you can do something similar to this:
import numpy as np
x = [0, 0, 1, 1, 2, 2]
y = [1, 2, 0, 1, 1, 2]
z = [14, 17, 15, 16, 18, 13]
z_array = np.full((3, 3), np.nan)  # start with every cell empty (NaN)
z_array[y, x] = z  # rows are indexed by y, columns by x
print(z_array)
Which yields:
[[ nan 15. nan]
[ 14. 16. 18.]
[ 17. nan 13.]]
For large arrays, this will be much faster than the explicit loop over the coordinates.
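To connect this to the file in the question, here is a minimal sketch that reads the three columns with np.loadtxt and sizes the grid from the data instead of hard-coding (3, 3). The io.StringIO object is only a stand-in for the real file, and the sketch assumes the coordinates are non-negative integers.

```python
import io
import numpy as np

# Stand-in for the real file (assumption: whitespace-separated, one header line)
text = io.StringIO("x y z\n0 1 14\n0 2 17\n1 0 15\n1 1 16\n2 1 18\n2 2 13\n")

# unpack=True returns one array per column
x, y, z = np.loadtxt(text, skiprows=1, dtype=int, unpack=True)

# Size the grid from the largest index seen in each direction
z_array = np.full((y.max() + 1, x.max() + 1), np.nan)
z_array[y, x] = z
print(z_array)
```

Cells that never appear in the file stay NaN, which matches the layout asked for in the question.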
Dealing with non-uniform x & y input
If you have regularly sampled x & y points, then you can convert them to grid indices by subtracting the "corner" of your grid (i.e. x0 and y0), dividing by the cell spacing, and casting as ints. You can then use the method above or in any of the other answers.
As a general example:
i = ((y - y0) / dy).astype(int)
j = ((x - x0) / dx).astype(int)
grid[i,j] = z
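As a concrete (made-up) instance of the snippet above, the sketch below uses points sampled every 0.5 in x and every 2.0 in y. One deliberate change from the snippet: it rounds with np.rint before casting, because a quotient such as 2.9999999 would otherwise truncate to 2. The sample values and spacings are assumptions for illustration only.

```python
import numpy as np

# Hypothetical regularly spaced samples (spacing 0.5 in x, 2.0 in y)
x = np.array([10.0, 10.5, 11.0, 10.0, 11.0])
y = np.array([4.0, 4.0, 4.0, 6.0, 6.0])
z = np.array([1.0, 2.0, 3.0, 4.0, 5.0])

x0, y0 = x.min(), y.min()  # "corner" of the grid
dx, dy = 0.5, 2.0          # cell spacing, assumed known

# Round before casting so floating-point error cannot shift an index down by one
i = np.rint((y - y0) / dy).astype(int)
j = np.rint((x - x0) / dx).astype(int)

grid = np.full((i.max() + 1, j.max() + 1), np.nan)
grid[i, j] = z
print(grid)
```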
However, there are a couple of tricks you can use if your data is not regularly spaced.
Let's say that we have the following data:
import numpy as np
import matplotlib.pyplot as plt
np.random.seed(1977)
x, y, z = np.random.random((3, 10))
fig, ax = plt.subplots()
scat = ax.scatter(x, y, c=z, s=200)
fig.colorbar(scat)
ax.margins(0.05)
That we want to put into a regular 10x10 grid:
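We can actually use/abuse np.histogram2d for this. Instead of counts, we'll have it add the value of each point that falls into a cell. It's easiest to do this by specifying weights=z and leaving density at its default of False (older NumPy versions spelled this normed=False; that keyword has since been removed).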
import numpy as np
import matplotlib.pyplot as plt
np.random.seed(1977)
x, y, z = np.random.random((3, 10))
# Bin the data onto a 10x10 grid
# Have to reverse x & y due to row-first indexing
zi, yi, xi = np.histogram2d(y, x, bins=(10,10), weights=z)
zi = np.ma.masked_equal(zi, 0)
fig, ax = plt.subplots()
ax.pcolormesh(xi, yi, zi, edgecolors='black')
scat = ax.scatter(x, y, c=z, s=200)
fig.colorbar(scat)
ax.margins(0.05)
plt.show()
However, if we have a large number of points, some bins will have more than one point. The weights argument to np.histogram2d simply adds the values. That's probably not what you want in this case. Nonetheless, we can get the mean of the points that fall in each cell by dividing by the counts.
So, for example, let's say we have 50 points:
import numpy as np
import matplotlib.pyplot as plt
np.random.seed(1977)
x, y, z = np.random.random((3, 50))
# Bin the data onto a 10x10 grid
# Have to reverse x & y due to row-first indexing
zi, yi, xi = np.histogram2d(y, x, bins=(10,10), weights=z)
counts, _, _ = np.histogram2d(y, x, bins=(10,10))
zi = zi / counts
zi = np.ma.masked_invalid(zi)
fig, ax = plt.subplots()
ax.pcolormesh(xi, yi, zi, edgecolors='black')
scat = ax.scatter(x, y, c=z, s=200)
fig.colorbar(scat)
ax.margins(0.05)
plt.show()
With very large numbers of points, this exact method will become slow (and can be sped up easily), but it's sufficient for anything less than ~1e6 points.
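One possible way to do the speed-up mentioned above (my own sketch, not necessarily what the author had in mind) is to compute each point's bin index once and accumulate sums and counts with np.bincount, rather than calling np.histogram2d twice. It assumes the data lie in [0, 1] and a 10x10 grid, as in the example; the variable names are made up.

```python
import numpy as np

rng = np.random.default_rng(1977)
x, y, z = rng.random((3, 50))

nbins = 10
edges = np.linspace(0.0, 1.0, nbins + 1)

# Bin index per point; clip so a value exactly equal to 1.0 lands in the last bin
col = np.clip(np.digitize(x, edges) - 1, 0, nbins - 1)
row = np.clip(np.digitize(y, edges) - 1, 0, nbins - 1)
flat = row * nbins + col  # row-major position in the flattened grid

sums = np.bincount(flat, weights=z, minlength=nbins * nbins)
counts = np.bincount(flat, minlength=nbins * nbins)

# Mean per cell; cells with no points stay NaN
mean = np.full(nbins * nbins, np.nan)
np.divide(sums, counts, out=mean, where=counts > 0)
mean = mean.reshape(nbins, nbins)
```

The resulting array can be masked with np.ma.masked_invalid and plotted exactly like zi above.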
Kezzos beat me to it but I had a similar approach,
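import numpy as np
x = np.array([0,0,1,1,2,2])
y = np.array([1,2,0,1,1,2])
z = np.array([14,17,15,16,18,13])
Z = np.zeros((3,3))
for i, (xi, yi) in enumerate(zip(x, y)):
    Z[yi, xi] = z[i]  # row = y, column = x, matching the layout in the question
Z[Z == 0] = np.nan  # note: this also blanks any genuine zero in z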
You could try something like:
import numpy as np
x = [0, 0, 1, 1, 2, 2]
y = [1, 2, 0, 1, 1, 2]
z = [14, 17, 15, 16, 18, 13]
arr = np.zeros((3,3))
yx = zip(y,x)
for i, coord in enumerate(yx):
    arr[coord] = z[i]
print(arr)
>>> [[ 0. 15. 0.]
[ 14. 16. 18.]
[ 17. 0. 13.]]
If you have scipy installed, you could take advantage of its sparse matrix module (from scipy import sparse). Get the values from the text file with genfromtxt, and plug those 'columns' directly into a sparse matrix creator.
In [545]: txt=b"""x y z
0 1 14
0 2 17
1 0 15
1 1 16
2 1 18
2 2 13
"""
In [546]: xyz=np.genfromtxt(txt.splitlines(),names=True,dtype=int)
In [547]: sparse.coo_matrix((xyz['z'],(xyz['y'],xyz['x']))).toarray()
Out[547]:
array([[ 0, 15, 0],
[14, 16, 18],
[17, 0, 13]])
But Joe's z_array=np.zeros((3,3),int); z_array[xyz['y'],xyz['x']]=xyz['z'] is considerably faster.
Nice answers by others. Thought this might be a useful snippet for someone else who might need this.
def make_grid(x, y, z):
    '''
    Takes x, y, z values as lists and returns a 2D numpy array
    '''
    x, y = np.asarray(x), np.asarray(y)  # allow plain lists as input
    dx = abs(np.sort(list(set(x)))[1] - np.sort(list(set(x)))[0])
    dy = abs(np.sort(list(set(y)))[1] - np.sort(list(set(y)))[0])
    i = ((x - min(x)) / dx).astype(int)  # Longitudes
    j = ((y - max(y)) / dy).astype(int)  # Latitudes
    grid = np.nan * np.empty((len(set(j)), len(set(i))))
    grid[-j, i] = z  # if using latitude and longitude (for WGS/West)
    return grid
I have a numpy array created as follows
results = np.zeros((X, Y, Z))
Then I am setting values of the points in 3D space as follows (representative of density / intensity of that point)
results[x,y,z] = 5.0
I now want to visualize this data using the x,y,z coordinates and an intensity value (like opacity or size of a scatter plot). However I cannot figure out how to convert this into four lists of x, y, z, and intensity, for a 3D scatter plot. How do I do this?
I would do something like this:
import numpy as np
import matplotlib.pyplot as plt
dots = np.random.randint(0, 2, size = (3, 3, 3))
dots *= np.random.randint(0, 2, size = (3, 3, 3))
dots *= np.arange(27).reshape(3, 3, 3)
x, y, z = np.where(dots!=0)
o = dots[x, y, z]
fig = plt.figure()
ax = fig.add_subplot(projection='3d')
for i in range(len(x)):
    print(o[i]/27)
    ax.plot([x[i]], [y[i]], [z[i]], 'o', color=[0, 0, 0, float(o[i])/27])
output:
dots =
[[[ 0 0 0]
[ 0 0 0]
[ 6 0 0]]
[[ 0 0 11]
[ 0 13 0]
[15 16 17]]
[[ 0 0 0]
[21 22 23]
[24 0 0]]]
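For the "four lists" part of the question, a minimal sketch (with a made-up results array) is to take the coordinates from np.nonzero and index the array with them to get the intensities. The colors array is my own addition: one RGBA row per point, black with opacity proportional to intensity.

```python
import numpy as np

# Hypothetical data standing in for the results array from the question
results = np.zeros((4, 5, 6))
results[1, 2, 3] = 5.0
results[0, 4, 1] = 2.5

# Coordinates of every non-zero voxel, plus the value stored there
x, y, z = np.nonzero(results)
intensity = results[x, y, z]

# One RGBA row per point: black, with alpha scaled to the largest intensity
colors = np.zeros((intensity.size, 4))
colors[:, 3] = intensity / intensity.max()
```

These arrays can then be handed to a single 3D scatter call, for example ax.scatter(x, y, z, c=colors), instead of plotting point by point.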
My solution:
fig = plt.figure(figsize=(15, 15))
ax = fig.add_subplot(projection="3d")
plt.title("Spherical Potential Heatmap ($J = 32, simuls = 6.4M, E = 1, cutoff = 100$)", fontsize=18)
ax.xaxis.pane.fill = False
ax.yaxis.pane.fill = False
ax.zaxis.pane.fill = False
mask = base_array_e0 > 100
idx = np.arange(int(np.prod(base_array_e0.shape)))
x, y, z = np.unravel_index(idx, base_array_e0.shape)
plot = ax.scatter(x, y, z, c=base_array_e0.flatten(), s=10.0 * mask, edgecolor="face", alpha=0.15, marker="o", cmap="magma", linewidth=0)
color_bar = plt.colorbar(plot, ax = ax,fraction=0.036, pad=0.04)
color_bar.set_alpha(1)
fig.draw_without_rendering()  # replaces Colorbar.draw_all(), which newer Matplotlib no longer provides
color_bar.set_label('Steps')
plt.savefig('random_walk_3d_energy_sphere_0.png', bbox_inches='tight');
I would like to produce a heat map from CSV data that contains negative values on the x axis. I copied code from a previous post as a starting point. However, when I try it, the negative x values are not displayed. In fact, with some data sets (like the example) it doesn't appear to set the correct axis values at all. I am unsure why this is the case, as the axes seem to be defined from the CSV data in the code. I thought it might have to do with dtype=np.int, but it seems it does not.
import numpy as np
import matplotlib.pyplot as plt
csv_file_path = '<FILE PATH>'
def get_xyz_from_csv_file_np(csv_file_path):
    '''
    get a grid of values from a csv file
    csv file format: x0,y0,z0
    '''
    # np.int was removed in NumPy 1.24; use the builtin int there
    x, y, z = np.loadtxt(csv_file_path, delimiter=',', dtype=np.int).T
    plt_z = np.zeros((y.max()+1, x.max()+1))
    plt_z[y, x] = z
    return plt_z

def draw_heatmap(plt_z):
    # Generate y and x values from the dimension lengths
    plt_y = np.arange(plt_z.shape[0])
    plt_x = np.arange(plt_z.shape[1])
    z_min = plt_z.min()
    z_max = plt_z.max()
    plot_name = "plot"
    z_name = "Signal"
    color_map = plt.cm.rainbow
    fig, ax = plt.subplots()
    cax = ax.pcolor(plt_x, plt_y, plt_z, cmap=color_map, vmin=z_min, vmax=z_max)
    ax.set_xlim(plt_x.min(), plt_x.max())
    ax.set_ylim(plt_y.min(), plt_y.max())
    fig.colorbar(cax).set_label(z_name, rotation=270)
    ax.set_title(plot_name)
    ax.set_aspect('auto')
    plt.show()
    return fig

if __name__ == "__main__":
    fname = 'temp.csv'
    # create_test_csv(fname)
    res = get_xyz_from_csv_file_np(csv_file_path)
    draw_heatmap(res)
The output I get is this:
The example data file is a comma delimited csv with this data (x,y,z):
-2 -1 0
-2 0 10
-2 1 0
-1 -1 2
-1 0 5
-1 1 2
0 -1 0
0 0 0
0 1 10
1 -1 10
1 0 0
1 1 0
2 -1 10
2 0 0
2 1 10
Can anyone (1) fix this code so that negative values can be displayed and the axes are correct and (2) explain to me what I am doing wrong?
Thanks!
The code below first mimics the .csv file with an array and then extracts x, y and z. To know the dimensions, not just the maximum but the difference between maximum and minimum needs to be considered. The x and y arrays are only needed by the rest of the code for their minimum and maximum.
To draw the heatmap, only plt_z is needed, as it already has the correct shape. x and y can be used to set the extents (i.e. the values for the x and y axis). plt.imshow() is similar to plt.pcolor() but lets you set the extent as a parameter. It needs origin='lower' because imshow puts the origin at the top by default, as many image formats do.
To have the ticks in the center of the cells, an extra margin of 0.5 needs to be added on each side. To have the ticks shown at every integer position, a MultipleLocator() can be used.
import numpy as np
import matplotlib.pyplot as plt
from matplotlib import ticker
def get_xyz_from_csv_file_np():
    data = [[-2, -1, 0],
            [-2, 0, 10],
            [-2, 1, 0],
            [-1, -1, 2],
            [-1, 0, 5],
            [-1, 1, 2],
            [0, -1, 0],
            [0, 0, 0],
            [0, 1, 10],
            [1, -1, 10],
            [1, 0, 0],
            [1, 1, 0],
            [2, -1, 10],
            [2, 0, 0],
            [2, 1, 10]]
    data = np.array(data, dtype=int)
    x = data[:, 0]
    y = data[:, 1]
    z = data[:, 2]
    n = y.max() - y.min() + 1
    m = x.max() - x.min() + 1
    # Shift the coordinates so the smallest x and y map to index 0
    plt_z = np.zeros((n, m))
    plt_z[y - y.min(), x - x.min()] = z
    return x, y, plt_z

def draw_heatmap(plt_x, plt_y, plt_z):
    plot_name = "plot"
    z_name = "Signal"
    color_map = plt.cm.rainbow
    fig, ax = plt.subplots()
    cax = ax.imshow(plt_z, cmap=color_map,
                    extent=[plt_x.min() - 0.5, plt_x.max() + 0.5,
                            plt_y.min() - 0.5, plt_y.max() + 0.5],
                    origin='lower')
    fig.colorbar(cax).set_label(z_name, rotation=270)
    ax.set_title(plot_name)
    ax.set_aspect('auto')
    # optionally force to have ticks at every integer position
    ax.xaxis.set_major_locator(ticker.MultipleLocator(1))
    ax.yaxis.set_major_locator(ticker.MultipleLocator(1))
    plt.show()

x, y, z = get_xyz_from_csv_file_np()
draw_heatmap(x, y, z)
PS: In case the z-values have a natural order, it would be best not to use the rainbow colormap, but one of the 'Perceptually Uniform Sequential' colormaps ('viridis', 'plasma', 'inferno', 'magma', 'cividis').
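As for what goes wrong in the question's code: NumPy treats a negative index as counting from the end of the axis, so plt_z[y, x] writes x = -2 and x = -1 into the last columns rather than to the left of 0, and a grid sized from x.max() + 1 alone has no room for them. A small sketch with made-up one-row data (the names wrapped and shifted are mine):

```python
import numpy as np

x = np.array([-2, -1, 0, 1, 2])
y = np.zeros(5, dtype=int)
z = np.array([10, 20, 30, 40, 50])

# Grid sized from the maximum only, raw coordinates used as indices:
# only 3 columns exist, and x = -2, -1 wrap around onto columns 1 and 2
wrapped = np.zeros((y.max() + 1, x.max() + 1))
wrapped[y, x] = z

# Grid sized from max - min + 1, coordinates shifted so the minimum maps to 0
shifted = np.zeros((y.max() - y.min() + 1, x.max() - x.min() + 1))
shifted[y - y.min(), x - x.min()] = z

print(wrapped.shape)  # (1, 3): two of the five x positions have no column of their own
print(shifted)        # one column per x value, in order
```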
I'm looking for the best way to create a contour plot using a numpy meshgrid.
I have Excel data in columns that, simplified, looks like this:
x data values: -3, -2, -1, 0, 1, 2 ,3, -3, -2, -1, 0, 1, 2, 3
y data values: 1, 1, 1, 1, 1, 1, 1, 2, 2, 2, 2, 2, 2, 2
z data values: 7 , 5, 6, 5, 1, 0, 9, 5, 3, 8, 3, 1, 0, 4
The x and y values define a 2D plane with a length (x-axis) of 7 values and a depth (y-axis) of 2 values. The z values define the colour at the corresponding points (more or less a z-axis).
I've tried:
import matplotlib.pyplot as plt
import numpy as np
x = [-3,-2,-1,0,1,2,3]
y = [1,2]
z = [7,5,6,5,1,0,9,5,3,8,3,1,0,4]
x, y = np.meshgrid(x, y)
A = np.array(z)
B = np.reshape(A, (-1, 2))
fig = plt.figure()
ax1 = plt.contourf(x, y, B)
plt.show()
I'm pretty sure I'm not getting how the meshgrid works. Do I have to use the whole list of x and y values for it to work?
How do I create a rectangular 2D plot with a length (x) of 7 and a depth (y) of 2, with the z values defining the shading/colour at the x and y values?
Thanks in advance guys!
Try
x_, y_ = np.meshgrid(x, y)
z_grid = np.array(z).reshape(2,7)
fig = plt.figure()
ax1 = plt.contourf(x_,y_,z_grid)
plt.show()
Edit: If you would like to smooth, as per your comment, you can try something like scipy.ndimage.zoom() as described here, i.e., in your case
from scipy import ndimage
z_grid = np.array(z).reshape(2,7)
z_grid_interp = ndimage.zoom(z_grid, 100)
x_, y_ = np.meshgrid(np.linspace(-3,3,z_grid_interp.shape[1]),np.linspace(1,2,z_grid_interp.shape[0]))
and then plot as before:
fig = plt.figure()
ax1 = plt.contourf(x_,y_,z_grid_interp)
plt.show()
This is one way where you use the shape of the meshgrid (X or Y) to reshape your z array. You can, moreover, add a color bar using plt.colorbar()
import matplotlib.pyplot as plt
import numpy as np
x = [-3,-2,-1,0,1,2,3]
y = [1,2]
z = np.array([7,5,6,5,1,0,9,5,3,8,3,1,0,4])
X, Y = np.meshgrid(x, y)
print (X.shape, Y.shape)
# (2, 7) (2, 7) Both have same shape
Z = z.reshape(X.shape) # Use either X or Y to define shape
fig = plt.figure()
ax1 = plt.contourf(X, Y, Z)
plt.colorbar(ax1)
plt.show()
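The reshape only works because the flat z list is stored row by row in the same order that meshgrid produces. A quick sanity check, assuming the full x and y columns from the question are available as x_col and y_col (names made up here), is to flatten the grids and compare them with the raw columns:

```python
import numpy as np

# Full columns as they appear in the spreadsheet
x_col = np.array([-3, -2, -1, 0, 1, 2, 3, -3, -2, -1, 0, 1, 2, 3])
y_col = np.array([1, 1, 1, 1, 1, 1, 1, 2, 2, 2, 2, 2, 2, 2])
z_col = np.array([7, 5, 6, 5, 1, 0, 9, 5, 3, 8, 3, 1, 0, 4])

# Build the grid from the unique coordinate values
X, Y = np.meshgrid(np.unique(x_col), np.unique(y_col))

# If the flattened grids match the columns, z can be reshaped directly
order_matches = (np.array_equal(X.ravel(), x_col)
                 and np.array_equal(Y.ravel(), y_col))
Z = z_col.reshape(X.shape)
print(order_matches, Z.shape)
```

If order_matches came out False, the rows would need to be sorted (or placed by index) before reshaping.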
import numpy as np
import matplotlib.pyplot as plt

def f(x, y):
    return np.sin(x) ** 10 + np.cos(10 + y * x) * np.cos(x)

x = np.linspace(0, 2, 3)
y = np.linspace(0, 3, 4)
X, Y = np.meshgrid(x, y)
Z = f(X, Y)
plt.contour(X, Y, Z, cmap='RdGy')
I am trying to convert a dictionary into a form which can be plotted as a contour using matplotlib. The keys of the dictionary are tuples of the X,Y coordinates, and the value is the reading at that coordinate. I would like to put these into three numpy arrays: a 1D array of x coordinates, a 1D array of y coordinates, and a 2D array of values. The respective indices in the x and y arrays should correspond to the index of the value in the 2D array.
An edit to better define the question:
Example Input Data:
Dictionary
(0,0): 1
(1.5,0): 2
(0,1.5): 3
(1.5,1.5): 4
What I would like
x = [0,1.5]
y = [0,1.5]
values = [[1,2],[3,4]]
I have got
X, Y = [], []
for key in corr_data:  # iterate over the (x, y) keys only
    X.append(key[0])
    Y.append(key[1])
X = list(dict.fromkeys(X))
Y = list(dict.fromkeys(Y))
which gets the x and y arrays but the values array eludes me.
Any help is appreciated
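You can simply iterate over your dict to build the coordinate lists and, if needed, convert those lists to numpy.ndarray. Because the coordinates may be floats, look up each one's position in the sorted list rather than using it as an index directly:
import numpy as np
x = []
y = []
for (i, j) in your_dict:
    x.append(i)
    y.append(j)
x = sorted(set(x))
y = sorted(set(y))
vals = np.zeros((len(y), len(x)))
for (i, j), v in your_dict.items():
    vals[y.index(j), x.index(i)] = v  # row = y position, column = x position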
Here is a self-contained answer, in the sense that I first generate some input data, which I then convert into a dictionary and then back into the original arrays. On the way, I add some random noise to keep the x and y values close to each other but still make them unique. Following this answer, a list of all values that are 'close' to each other can be found by first rounding the values and then using np.unique.
import numpy as np
##generating some input data:
print('input arrays')
xvals = np.linspace(1,10, 5)
print(xvals)
yvals = np.linspace(0.1, 0.4, 4)
print(yvals)
xvals, yvals = np.meshgrid(xvals, yvals)
##adding some noise to make it more interesting:
xvals += np.random.rand(*xvals.shape)*1e-3
yvals += np.random.rand(*yvals.shape)*1e-5
zvals = np.arange(xvals.size).reshape(*xvals.shape)
print(zvals)
input_dict ={
(i,j): k for i,j,k in zip(
list(xvals.flatten()), list(yvals.flatten()), list(zvals.flatten())
)
}
##print(input_dict)
x,y,z = map(np.array,zip(*((x,y,z) for (x,y),z in input_dict.items())))
##this part will need some tweaking depending on the size of your
##x and y values
xlen = len(np.unique(x.round(decimals=2)))
ylen = len(np.unique(y.round(decimals=3)))
x = x.round(decimals=2).reshape(ylen,xlen)[0,:]
y = y.round(decimals=3).reshape(ylen,xlen)[:,0]
z = z.reshape(ylen,xlen)
print('\n', 'output arrays')
print(x)
print(y)
print(z)
The output looks like this:
input arrays
[ 1. 3.25 5.5 7.75 10. ]
[0.1 0.2 0.3 0.4]
[[ 0 1 2 3 4]
[ 5 6 7 8 9]
[10 11 12 13 14]
[15 16 17 18 19]]
output arrays
[ 1. 3.25 5.5 7.75 10. ]
[0.1 0.2 0.3 0.4]
[[ 0 1 2 3 4]
[ 5 6 7 8 9]
[10 11 12 13 14]
[15 16 17 18 19]]
Old Answer:
There are a lot of assumptions in this answer, mainly because there is not quite enough information in the question. But, assuming that
the x and y values are as nicely ordered as in the example data
the x and y values are complete
One could go about the problem with a list comprehension and a reshaping of numpy ndarrays:
import numpy as np
input_dict = {
(0,0): 1,
(1,0): 2,
(0,1): 3,
(1,1): 4,
}
x,y,z = map(np.array,zip(*((x,y,z) for (x,y),z in input_dict.items())))
xlen = len(set(x))
ylen = len(set(y))
# x varies fastest in the dict's insertion order, so each row holds one y value
x = x.reshape(ylen,xlen)[0,:]
y = y.reshape(ylen,xlen)[:,0]
z = z.reshape(ylen,xlen)
print(x)
print(y)
print(z)
which gives
[0 1]
[0 1]
[[1 2]
[3 4]]
hope this helps.
PS: If the x and y values are not necessarily in the order suggested by the posted example data, one can still solve the issue with some clever sorting.
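One possible form of that sorting, as a sketch under the same completeness assumption (every x is paired with every y): np.lexsort orders the entries by y first and by x within each y, after which the reshape is safe regardless of the dictionary's insertion order. The scrambled example dictionary is made up.

```python
import numpy as np

# Same data as above, but inserted in a scrambled order
input_dict = {(1, 1): 4, (0, 0): 1, (1, 0): 2, (0, 1): 3}

xs, ys, zs = map(np.array, zip(*((kx, ky, v) for (kx, ky), v in input_dict.items())))

# lexsort uses the LAST key as the primary one: sort by y, then by x within each y
order = np.lexsort((xs, ys))

x = np.unique(xs)
y = np.unique(ys)
z = zs[order].reshape(len(y), len(x))

print(x)
print(y)
print(z)
```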
In the REPL
In [9]: d = {(0,0): 1, (1,0): 2, (0,1): 3, (1,1): 4}
In [10]: x = set(); y = set()
In [11]: for xx, yy in d.keys():
    ...:     x.add(xx)
    ...:     y.add(yy)
In [12]: x
Out[12]: {0, 1}
In [13]: x = sorted(x) ; y = sorted(y)
In [14]: x
Out[14]: [0, 1]
In [15]: v = [[d.get((xx,yy)) for yy in y] for xx in x]
In [16]: v
Out[16]: [[1, 3], [2, 4]]
As you can see, my result is different from your example but it's common to have x corresponding to rows and y corresponding to columns. If you want a more geographic convention, swap x and y in the final list comprehension.
As a script we may write
def extract(d):
    x = set(); y = set()
    for xx, yy in d.keys():
        x.add(xx)
        y.add(yy)
    x = sorted(x) ; y = sorted(y)
    v = [[d.get((xx,yy)) for yy in y] for xx in x]
    # = [[d.get((xx,yy)) for xx in x] for yy in y]
    return x, y, v
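If you want NumPy arrays rather than nested lists, a rough equivalent (my own sketch, using the float coordinates from the question) is np.unique with return_inverse=True, which returns both the sorted unique coordinates and each key's position among them. Missing (x, y) pairs are left as NaN here, and rows follow y as in the layout the question asks for.

```python
import numpy as np

d = {(0, 0): 1, (1.5, 0): 2, (0, 1.5): 3, (1.5, 1.5): 4}

keys = np.array(list(d.keys()), dtype=float)   # shape (n, 2): x in column 0, y in column 1
vals = np.array(list(d.values()), dtype=float)

# Sorted unique coordinates, plus the index of each key within them
x, col = np.unique(keys[:, 0], return_inverse=True)
y, row = np.unique(keys[:, 1], return_inverse=True)

# Rows follow y, columns follow x; cells without a reading stay NaN
values = np.full((y.size, x.size), np.nan)
values[row, col] = vals

print(x, y)
print(values)
```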
I'm not quite sure how to say this so I'll try to be clear in my description.
Right now I have a 3D numpy array where the 1st column represents a depth and the 2nd a position on the x-axis. My goal is to make a pcolor where the columns are spread out along the x-axis based on the values in a 1D float array.
Here's where it gets tricky, I only have the relative distances between points. That is, the distance between column 1 and column 2 and so on.
Here's an example of what I have and what I'd like:
darray = [[2 3 7 7]
[4 8 2 3]
[6 1 9 5]
[3 4 8 4]]
posarray = [ 3.767, 1.85, 0.762]
DesiredArray = [[2 0 0 0 3 0 7 7]
[4 0 0 0 8 0 2 3]
[6 0 0 0 1 0 9 5]
[3 0 0 0 4 0 8 4]]
How I tried implementing it:
def space_set(darr, sarr):
    spaced = np.zeros((260,1+int(sum(sarr))), dtype = float)
    x = 0
    for point in range(len(sarr)):
        spaced[:, x] = darr[:,point]
        x = int(sum(sarr[0:point]))
    spaced[:,-1] = darr[:,-1]
Then I was planning on using matplotlib's pcolor to plot it. This method seems to lose columns though. Any ideas for either directly plotting or making a numpy array to plot? Thanks in advance.
Here's an example of what I'm looking for.
Since there is so much whitespace, perhaps it would be easier to draw the Rectangles, rather than use pcolor. As a bonus, you can place the rectangles exactly where you want them, rather than having to "snap" them to an integer-valued grid. And, you do not have to allocate space for a larger 2D array mainly filled with zeros. (In your case the memory required is probably measly, but the idea does not scale well, so it is nice if we can avoid doing that.)
import matplotlib.pyplot as plt
import numpy as np
import matplotlib.patches as patches
import matplotlib.cm as cm
def draw_rect(x, y, z):
    rect = patches.Rectangle((x,y), 1, 1, color = jet(z))
    ax.add_patch(rect)
jet = plt.get_cmap('jet')
fig = plt.figure()
ax = fig.add_subplot(111)
darray = np.array([[2, 3, 7, 7],
[4, 8, 2, 3],
[6, 1, 9, 5],
[3, 4, 8, 4]], dtype = 'float')
darray_norm = darray/darray.max()
posarray = [3.767, 1.85, 0.762]
x = np.cumsum(np.hstack((0, np.array(posarray)+1)))
for j, i in np.ndindex(darray.shape):
    draw_rect(x[j], i, darray_norm[i, j])
ax.set_xlim(x.min(),x.max()+1)
ax.set_ylim(0,len(darray))
ax.invert_yaxis()
m = cm.ScalarMappable(cmap = jet)
m.set_array(darray)
plt.colorbar(m, ax=ax)  # newer Matplotlib needs the target Axes for a standalone mappable
plt.show()
yields
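If you do want the padded array from the question, note that the original loop writes column 1 at x = 0 (x is only updated after the assignment, and from sarr[0:point]), which overwrites column 0. Below is a minimal sketch of a corrected version. It assumes the truncated cumulative offsets are all distinct (guaranteed when every gap is at least 1), and because it truncates offsets to integers the result is 7 columns wide rather than the 8 shown in DesiredArray.

```python
import numpy as np

def space_set(darr, sarr):
    """Spread the columns of darr along x using the relative gaps in sarr."""
    # Absolute offset of every column: 0, then the running sum of the gaps
    cols = np.concatenate(([0], np.cumsum(sarr))).astype(int)
    if len(set(cols.tolist())) != len(cols):
        raise ValueError("two columns map to the same integer position")
    spaced = np.zeros((darr.shape[0], cols[-1] + 1), dtype=float)
    spaced[:, cols] = darr  # place each source column at its offset
    return spaced

darray = np.array([[2, 3, 7, 7],
                   [4, 8, 2, 3],
                   [6, 1, 9, 5],
                   [3, 4, 8, 4]], dtype=float)
posarray = [3.767, 1.85, 0.762]

spaced = space_set(darray, posarray)
print(spaced)
```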