In a Python script, I have a set of 2D NumPy float arrays, let's say n1, n2, n3 and n4. For each such array I have two integer values offset_i_x and offset_i_y (replace i by 1, 2, 3 and 4).
Currently I'm able to create an image for one NumPy array using the following script:
def make_img_from_data(data):
    fig = plt.imshow(data, vmin=-7, vmax=0)
    fig.set_cmap(cmap)
    fig.axes.get_xaxis().set_visible(False)
    fig.axes.get_yaxis().set_visible(False)
    filename = "my_image.png"
    plt.savefig(filename, bbox_inches='tight', pad_inches=0)
    plt.close()
Now I would like to treat each array as a tile of a bigger image, placed according to its offset_i_x/y values, and finally write a single figure instead of 4 (in my example). I'm very new to Matplotlib and Python in general. How can I do that?
Also I have noticed that the script above produces images that are 480x480 pixels, whatever the size of the original NumPy array. How can I control the size of the resulting image?
Thanks
You may want to consider the add_axes method of a matplotlib figure (fig.add_axes).
Below is a dirty example, based on what you want to achieve.
Note that I have chosen offset values so that the example works. You will have to figure out how to convert the offsets you have for each image into fractions of the figure.
import numpy as np
import matplotlib.pyplot as plt
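def make_img_from_data(data, offset_xy, fig_number=1):
    # the rectangle is [left, bottom, width, height] in figure fractions;
    # this uses the module-level `fig` created further down
    fig.add_axes([0+offset_xy[0], 0+offset_xy[1], 0.5, 0.5])
    plt.imshow(data)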
# creation of a dictionary of 4 2D numpy arrays
# and corresponding offsets (x, y)
# offsets for the 4 2D numpy arrays
offset_a_x = 0
offset_a_y = 0
offset_b_x = 0.5
offset_b_y = 0
offset_c_x = 0
offset_c_y = 0.5
offset_d_x = 0.5
offset_d_y = 0.5
data_list = ['a', 'b', 'c', 'd']
offsets_list = [[offset_a_x, offset_a_y], [offset_b_x, offset_b_y],
[offset_c_x, offset_c_y], [offset_d_x, offset_d_y]]
# dictionary of the data and offsets
data_dict = {f: [np.random.rand(12, 12), values] for f,values in zip(data_list, offsets_list)}
fig = plt.figure(1, figsize=(6,6))
for n in data_dict:
    make_img_from_data(data_dict[n][0], data_dict[n][1])
plt.show()
which produces:
If I understand correctly, you seem to be looking for subplots. Have a look at the thumbnail gallery for examples.
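Since the offsets in the question are integers, another option is to paste the tiles into one large array and write that array out directly. The sketch below assumes each offset is the pixel position of a tile's top-left corner (x = column, y = row); compose_tiles and the sample tiles are made up for illustration.

```python
import numpy as np

def compose_tiles(tiles, offsets, fill=np.nan):
    """Paste 2D tiles into one canvas at integer (x, y) pixel offsets."""
    # The canvas must be large enough to hold the farthest tile edge.
    height = max(oy + t.shape[0] for t, (ox, oy) in zip(tiles, offsets))
    width = max(ox + t.shape[1] for t, (ox, oy) in zip(tiles, offsets))
    canvas = np.full((height, width), fill, dtype=float)
    for t, (ox, oy) in zip(tiles, offsets):
        canvas[oy:oy + t.shape[0], ox:ox + t.shape[1]] = t
    return canvas

# Four hypothetical 2x3 tiles arranged in a 2x2 grid.
tiles = [np.full((2, 3), v) for v in (1.0, 2.0, 3.0, 4.0)]
offsets = [(0, 0), (3, 0), (0, 2), (3, 2)]
mosaic = compose_tiles(tiles, offsets)
print(mosaic.shape)
```

Saving with plt.imsave("my_image.png", mosaic, vmin=-7, vmax=0, cmap=cmap) writes one pixel per array element, so the output size follows the array shape rather than the figure size and dpi.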
I am trying to plot a heatmap from a 2000x2000 NumPy array. I have tried every solution from this post and many others. I have tried many cmaps and interpolation combinations.
This is the code that prepares the data:
def parse_cords(cord: float):
    cord = str(cord).split(".")
    h_map[int(cord[0])][int(cord[1])] += 1
df["coordinate"] is a pandas Series of floats, each encoding an x.y coordinate. Both x and y range from 0 to 1999.
I have decided to modify the array so that values will range from 0 to 1, but I have tested the code also without changing the range.
h_map = np.zeros((2000, 2000), dtype='int')
cords = df["coordinate"].map(lambda cord: parse_cords(cord))
maximum = float(np.max(h_map))
precent = lambda x: x/maximum
h_map = precent(h_map)
h_map looks like this:
[[0.58396242 0.08840799 0.03153833 ... 0.00285187 0.00419393 0.06324442]
[0.09075658 0.11172622 0.01476262 ... 0.00134206 0.00687804 0.0082201 ]
[0.02986076 0.01862104 0.03959067 ... 0.00100654 0.00134206 0.00251636]
...
[0.00301963 0.00134206 0.00134206 ... 0.00100654 0.00150981 0.00553598]
[0.00419393 0.00268411 0.00100654 ... 0.00201309 0.00402617 0.01342057]
[0.05183694 0.00251636 0.00184533 ... 0.00301963 0.00838785 0.1016608 ]]
Now the plot:
fig, ax = plt.subplots(figsize=figsize)
ax = plt.imshow(h_map)
And result:
final plot
The result is always a heatmap with only a single color depending on the cmap used. Is my array just too big to be plotted like this or am I doing something wrong?
EDIT:
I have added plt.colorbar() and removed scaling from 0 to 1. The plot knows the range of data (0 to 5500) but assumes that every value is equal to 0.
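I think that is because of how the single channel is scaled. With a 2D array, plt.imshow() maps values to the colormap linearly between the minimum and maximum, so if a few cells hold very large counts, nearly every other cell gets the color for 0. You could either rescale the data first or use a different function, e.g. sns.heatmap().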
import seaborn as sns
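If the counts are heavily skewed, which the 0 to 5500 range in the edit suggests but does not prove, compressing them logarithmically before plotting is one thing to try. This sketch uses synthetic points and an artificial hot spot, not the data from the question; np.add.at stands in for the per-row parse_cords call.

```python
import numpy as np

rng = np.random.default_rng(0)

# Synthetic integer coordinates (assumption: the real x and y are already split).
xs = rng.integers(0, 2000, size=10_000)
ys = rng.integers(0, 2000, size=10_000)

# Count hits per cell; np.add.at handles repeated (x, y) pairs correctly.
h_map = np.zeros((2000, 2000), dtype=int)
np.add.at(h_map, (xs, ys), 1)
h_map[0, 0] += 5000  # one artificial hot spot that would dominate a linear scale

# log1p keeps zeros at zero and compresses the large counts.
scaled = np.log1p(h_map)
scaled = scaled / scaled.max()
print(scaled.min(), scaled.max())
```

Calling plt.imshow(scaled) then leaves the low counts distinguishable from zero. Passing norm=matplotlib.colors.LogNorm() to imshow on strictly positive data is a similar approach.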
I searched online and couldn't find anything about this that does what I want.
I would like to save a numpy array as an image but instead of having a colorful image, I want a black and white representation of the pixel values in their corresponding grid location.
For example:
import numpy as np
x = np.array([[1,2],[3,4]])
print(x)
# [[1 2]
# [3 4]]
I would like to save this as an image (.PNG) that looks like the following:
My current code creates a grid and places the numbers inside but it is very difficult to adjust everything to make it presentable in a research paper.
So rather than posting my overly complex code, I was wondering if there is a built in function to handle this in a few lines of code.
I would use LaTeX to generate the tables, since they look fancy and you can either generate an image or directly put them in your document. I used the following code to achieve this:
#!/usr/bin/env python
import numpy as np
import os
x = np.array([[1,2],[3,4]])
def generateLatexTable(x):
    start = [r'\documentclass[preview]{standalone}', r'\begin{document}', r'\begin{tabular}{%s}' % ('{1}{0}{1}'.format('|'.join(['r'] * x.shape[1]), '|')), r'\hline']
    tab = [' & '.join(['%d' % val for val in row]) + r' \\ \hline' for row in x]
    end = [r'\end{tabular}', r'\end{document}']
    text = '\n'.join(start + tab + end)
    return text
with open('table.tex', 'w') as f:
    f.write(generateLatexTable(x))
os.system("pdflatex table.tex")
Here, the standalone document class is used with the preview option, which crops the output to the content of the document, i.e. just the table. Only a tabular environment is used to present the data. There are horizontal and vertical rules between the cells, but it is very easy to change this. In the variable tab, each row of data is converted into a string. Note that you have to specify the output format at this position. I set it to %d so everything is converted to integers.
If you want to use the table directly in a LaTeX source, you have to remove \documentclass and \begin{document} as well as \end{document} from the variables start and end. Finally, everything is put together in a LaTeX source which is then stored to disk as table.tex. If you just want the image in the end, the resulting file is compiled to table.pdf.
Here is what the output looks like. But like I said, it is very easy to change the looks since it is LaTeX :)
Here is another example with a large matrix (14 x 14), filled with random numbers ranging from 0 to 100:
You can use the table function of matplotlib to plot a simple table. Furthermore, you can save the plot as a PNG.
Below is the simple code for your requirements:
import numpy as np
import matplotlib.pyplot as plt
x = np.array([[1,2],[3,4]])
plt.figure()
plt.table(cellText=x,cellLoc='center',loc='center')
plt.axis('off')
plt.savefig('table.png')
The size of the plot or image can be adjusted by changing the figsize parameter, given as (width, height) in inches: plt.figure(figsize=(width, height))
For better appearance, it can be modified as below:
import numpy as np
import matplotlib.pyplot as plt
x = np.array([[1,2],[3,4]])
fig = plt.figure(figsize=(2,2))
plt.axis('off')
plt.axis('tight')
plt.table(cellText=x,cellLoc='center',loc='center')
#plt.subplots_adjust(hspace=0.5)
fig.tight_layout()
plt.savefig('table.png')
Maybe this will help:
from matplotlib import pyplot as plt
import numpy as np
w = 10
h = 10
img = np.random.randint(255, size=(w, h))
plt.figure(figsize=(5,8))
plt.imshow(img, interpolation='nearest')
plt.axis('off')
cellTextimg = []
for j in range(0,h):
    cellTextimg.append(img[j,:])
the_table = plt.table(cellText= cellTextimg, loc='bottom')
I have data that are multidimensional compositional data (all dimensions sum to 1 or 100). I have learned how to use three of the variables to create a 2d ternary plot.
I would like to add a fourth dimension such that my plot looks like this.
I am willing to use Python or R. I am using rpy2 to create the ternary plots in Python via R right now, but only because that's an easy solution. If the ternary data could be transformed into 3D coordinates, a simple wire plot could be used.
This post shows how 3D compositional data can be transformed into 2D data so that normal plotting methods can be used. One solution would be to do the same thing in 3D.
Here is some sample Data:
c1 c2 c3 c4
0 0.082337 0.097583 0.048608 0.771472
1 0.116490 0.065047 0.066202 0.752261
2 0.114884 0.135018 0.073870 0.676229
3 0.071027 0.097207 0.070959 0.760807
4 0.066284 0.079842 0.103915 0.749959
5 0.016074 0.074833 0.044532 0.864561
6 0.066277 0.077837 0.058364 0.797522
7 0.055549 0.057117 0.045633 0.841701
8 0.071129 0.077620 0.049066 0.802185
9 0.089790 0.086967 0.083101 0.740142
10 0.084430 0.094489 0.039989 0.781093
Well, I solved this myself using a wikipedia article, an SO post, and some brute force. Sorry for the wall of code, but you have to draw all the plot outlines and labels and so forth.
import numpy as np
import matplotlib.pyplot as plt
from mpl_toolkits.mplot3d import axes3d, Axes3D
from itertools import combinations
import pandas as pd
def plot_ax(): #plot tetrahedral outline
    verts=[[0,0,0],
           [1,0,0],
           [0.5,np.sqrt(3)/2,0],
           [0.5,0.28867513, 0.81649658]]
    lines=combinations(verts,2)
    for x in lines:
        line=np.transpose(np.array(x))
        ax.plot3D(line[0],line[1],line[2],c='0')
def label_points(): #create labels for the vertices of the simplex
    a=(np.array([1,0,0,0])) # Barycentric coordinates of vertices (A or c1)
    b=(np.array([0,1,0,0])) # Barycentric coordinates of vertices (B or c2)
    c=(np.array([0,0,1,0])) # Barycentric coordinates of vertices (C or c3)
    d=(np.array([0,0,0,1])) # Barycentric coordinates of vertices (D or c4)
    labels=['a','b','c','d']
    cartesian_points=get_cartesian_array_from_barycentric([a,b,c,d])
    for point,label in zip(cartesian_points,labels):
        if 'a' in label:
            ax.text(point[0],point[1]-0.075,point[2], label, size=16)
        elif 'b' in label:
            ax.text(point[0]+0.02,point[1]-0.02,point[2], label, size=16)
        else:
            ax.text(point[0],point[1],point[2], label, size=16)
def get_cartesian_array_from_barycentric(b): #transform from "barycentric" composition space to cartesian coordinates
    verts=[[0,0,0],
           [1,0,0],
           [0.5,np.sqrt(3)/2,0],
           [0.5,0.28867513, 0.81649658]]
    #create transformation array via https://en.wikipedia.org/wiki/Barycentric_coordinate_system
    t = np.transpose(np.array(verts))
    t_array=np.array([t.dot(x) for x in b]) #apply transform to all points
    return t_array
def plot_3d_tern(df,c='1'): #use function "get_cartesian_array_from_barycentric" to plot the scatter points
    #args are df=dataframe to plot and c=scatter point color
    bary_arr=df.values
    cartesian_points=get_cartesian_array_from_barycentric(bary_arr)
    ax.scatter(cartesian_points[:,0],cartesian_points[:,1],cartesian_points[:,2],c=c)
#Create Dataset 1
np.random.seed(123)
c1=np.random.normal(8,2.5,20)
c2=np.random.normal(8,2.5,20)
c3=np.random.normal(8,2.5,20)
c4=[100-x for x in c1+c2+c3] #make sure components sum to 100
#df unnecessary but that is the format of my real data
df1=pd.DataFrame(data=[c1,c2,c3,c4],index=['c1','c2','c3','c4']).T
df1=df1/100
#Create Dataset 2
np.random.seed(1234)
c1=np.random.normal(16,2.5,20)
c2=np.random.normal(16,2.5,20)
c3=np.random.normal(16,2.5,20)
c4=[100-x for x in c1+c2+c3]
df2=pd.DataFrame(data=[c1,c2,c3,c4],index=['c1','c2','c3','c4']).T
df2=df2/100
#Create Dataset 3
np.random.seed(12345)
c1=np.random.normal(25,2.5,20)
c2=np.random.normal(25,2.5,20)
c3=np.random.normal(25,2.5,20)
c4=[100-x for x in c1+c2+c3]
df3=pd.DataFrame(data=[c1,c2,c3,c4],index=['c1','c2','c3','c4']).T
df3=df3/100
fig = plt.figure()
ax = fig.add_subplot(projection='3d') #Create the 3D axes; Axes3D(fig) is no longer added to the figure automatically in recent matplotlib
plot_ax() #call function to draw tetrahedral outline
label_points() #label the vertices
plot_3d_tern(df1,'b') #call function to plot df1
plot_3d_tern(df2,'r') #...plot df2
plot_3d_tern(df3,'g') #...
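The core of the code above is the barycentric-to-Cartesian step, which can also be written as a single matrix product. This is a minimal sketch, assuming each row is one composition (c1..c4) and using the same unit-edge tetrahedron; the name barycentric_to_cartesian and the row normalisation are additions for illustration.

```python
import numpy as np

# Vertices of a regular tetrahedron with edge length 1 (same as verts above,
# with the rounded decimals written as exact expressions).
VERTS = np.array([
    [0.0, 0.0, 0.0],
    [1.0, 0.0, 0.0],
    [0.5, np.sqrt(3) / 2, 0.0],
    [0.5, np.sqrt(3) / 6, np.sqrt(6) / 3],
])

def barycentric_to_cartesian(bary):
    """Map rows of 4-part compositions to 3D points inside the tetrahedron."""
    bary = np.asarray(bary, dtype=float)
    # Normalise each row so data summing to 1 or to 100 are treated alike.
    bary = bary / bary.sum(axis=1, keepdims=True)
    return bary @ VERTS

# A pure component lands on its vertex; an equal mix lands on the centroid.
corners = barycentric_to_cartesian(np.eye(4))
centre = barycentric_to_cartesian([[25, 25, 25, 25]])
print(centre)
```

The resulting (n, 3) array can be passed column by column to ax.scatter, as plot_3d_tern does.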
The accepted answer explains how to do this in python but the question was also asking about R.
I've provided an answer in this thread on how to do this 'manually' in R.
Otherwise, you can use the klaR package directly for this:
df <- matrix(c(
0.082337, 0.097583, 0.048608, 0.771472,
0.116490, 0.065047, 0.066202, 0.752261,
0.114884, 0.135018, 0.073870, 0.676229,
0.071027, 0.097207, 0.070959, 0.760807,
0.066284, 0.079842, 0.103915, 0.749959,
0.016074, 0.074833, 0.044532, 0.864561,
0.066277, 0.077837, 0.058364, 0.797522,
0.055549, 0.057117, 0.045633, 0.841701,
0.071129, 0.077620, 0.049066, 0.802185,
0.089790, 0.086967, 0.083101, 0.740142,
0.084430, 0.094489, 0.039989, 0.781094
), byrow = TRUE, nrow = 11, ncol = 4)
# install.packages(c("klaR", "scatterplot3d"))
library(klaR)
#> Loading required package: MASS
quadplot(df)
Created on 2020-08-14 by the reprex package (v0.3.0)
Pretty much exactly what the question states, but a little context:
I'm creating a program to plot a large number of points (~10,000, but it will be more later on). This is being done using matplotlib's plt.scatter. This command is part of a loop that saves the figure, so I can later animate it.
What I want to be able to do is randomly select a small portion of these particles (say, maybe 100?) and give them a different marker than the rest, even though they're part of the same data set. This is so I can use them as placeholders to see the motion of individual particles, as well as the bulk material.
Is there a way to use a different marker for a small subset of the same data?
For reference, the particles are uniformly distributed just using the numpy random sampler, but my code for that is:
for i in range(N): # N number of particles
    particle_position[i] = np.random.uniform(0, xmax) # Initialize in spatial domain
    particle_velocity[i] = np.random.normal(0, 5) # Initialize in velocity space
for i in range(maxtime):
    plt.scatter(particle_position, particle_velocity, s=1, c=norm_xvel, cmap=br_disc, lw=0)
The position and velocity change on each iteration of the main loop (there's quite a bit of code), but these are the main initialization and plotting routines.
I had an idea that perhaps I could randomly select a bunch of i values from range(N), and use an ax.scatter() command to plot them on the same axes?
Here is a possible solution to have a subset of your points identified with a different marker:
import matplotlib.pyplot as plt
import numpy as np
SIZE = 100
SAMPLE_SIZE = 10
def select_subset(seq, size):
    """selects a subset of the data using ...
    """
    return seq[:size]
points_x = np.random.uniform(-1, 1, size=SIZE)
points_y = np.random.uniform(-1, 1, size=SIZE)
plt.scatter(points_x, points_y, marker=".", color="blue")
plt.scatter(select_subset(points_x, SAMPLE_SIZE),
select_subset(points_y, SAMPLE_SIZE),
marker="o", color="red")
plt.show()
It uses plt.scatter twice; once on the full data set, the other on the sample points.
You will have to decide how you want to select the sample of points; that choice is isolated in the select_subset function.
You could also extract the sample points from the data set to prevent marking them twice, but numpy is rather inefficient at deleting or resizing.
Maybe a better method is to use a mask? A mask has the advantage of leaving your original data intact and in order.
Here is a way to proceed with masks:
import matplotlib.pyplot as plt
import numpy as np
import random
SIZE = 100
SAMPLE_SIZE = 10
def make_mask(data_size, sample_size):
    mask = np.array([True] * sample_size + [False ] * (data_size - sample_size))
    np.random.shuffle(mask)
    return mask
points_x = np.random.uniform(-1, 1, size=SIZE)
points_y = np.random.uniform(-1, 1, size=SIZE)
mask = make_mask(SIZE, SAMPLE_SIZE)
not_mask = np.invert(mask)
plt.scatter(points_x[not_mask], points_y[not_mask], marker=".", color="blue")
plt.scatter(points_x[mask], points_y[mask], marker="o", color="red")
plt.show()
As you see, scatter is called once on a subset of the data points (the ones not selected in the sample), and a second time on the sampled subset, and draws each subset with its own marker. It is efficient & leaves the original data intact.
The code below does what you want. I have selected a random set v_sub_index of N_sub indices in the correct range (0 to N) and draw those (with _sub suffix) from the larger samples particle_position and particle_velocity. Please note that you don't have to loop to generate random samples. Numpy has great functionality for that without having to use for loops.
import numpy as np
import matplotlib.pyplot as pl
N = 100
xmax = 1.
v_sigma = 2.5 / 2. # 95% of the samples contained within 0, 5
v_mean = 2.5 # mean at 2.5
N_sub = 10
v_sub_index = np.random.choice(N, N_sub, replace=False) # unique indices, so exactly N_sub points are marked
particle_position = np.random.rand (N) * xmax
particle_velocity = np.random.randn(N) * v_sigma + v_mean
particle_position_sub = np.array(particle_position[v_sub_index])
particle_velocity_sub = np.array(particle_velocity[v_sub_index])
particle_position_nosub = np.delete(particle_position, v_sub_index)
particle_velocity_nosub = np.delete(particle_velocity, v_sub_index)
pl.scatter(particle_position_nosub, particle_velocity_nosub, color='b', marker='o')
pl.scatter(particle_position_sub , particle_velocity_sub , color='r', marker='^')
pl.show()
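Because the question saves one frame per time step, the tracer particles need to be the same ones in every frame. One way to get that, sketched here under the assumption that particle order never changes during the simulation, is to draw the indices once before the loop and keep them as a boolean mask (tracer_idx and is_tracer are illustrative names):

```python
import numpy as np

rng = np.random.default_rng(42)
N, N_sub = 10_000, 100

# Pick the tracer particles once, without repeats, before the time loop.
tracer_idx = rng.choice(N, size=N_sub, replace=False)
is_tracer = np.zeros(N, dtype=bool)
is_tracer[tracer_idx] = True

particle_position = rng.uniform(0, 1.0, size=N)
particle_velocity = rng.normal(0, 5, size=N)

# Inside the loop the same mask splits the arrays for the two scatter calls:
bulk_x, bulk_v = particle_position[~is_tracer], particle_velocity[~is_tracer]
tracer_x, tracer_v = particle_position[is_tracer], particle_velocity[is_tracer]
print(bulk_x.size, tracer_x.size)
```

Each frame would then call plt.scatter once on the bulk arrays and once on the tracer arrays with a different marker.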
I would like to plot a distance matrix for the distances between 6 towns, A1 to A3 and B1 to B3.
I have calculated the distances A1-B1, A1-B2, and so on up to A3-B3, and stored them in a 1D NumPy array:
np.array(R)
[ 3.00 2.50 1.00 3.3192 2.383 2.7128 3.8662 3.6724 3.5112]
Now I want to plot it in a distance matrix format, which should look something like the figure below.
This is just representative data. I have lots of values, so I need a Python program.
Any suggestion or sample python matplotlib script will help.
Regards.
Looks like you got most of the way yourself. You can clean up your plot to make it a little more like what you intended by changing the axis labels to A1, A2,... and by printing the values of each cell within them.
The cleaned up version of your script is below:
import numpy as np
import matplotlib.pyplot as plt
R = np.array ([3.00, 2.50, 1.00, 3.3192, 2.383, 2.7128, 3.8662, 3.6724, 3.5112])
# Calculate the shape of the 2d array
n = int( np.sqrt( R.size ) )
C = R.reshape((n,n))
# Plot the matrix
plt.matshow(C,cmap="Reds")
ax = plt.gca()
# Set the plot labels (A1..A3 on the rows, B1..B3 on the columns)
xlabels = ["B%d" % (i + 1) for i in range(n)]
ylabels = ["A%d" % (i + 1) for i in range(n)]
ax.set_xticks(range(n))
ax.set_yticks(range(n))
ax.set_xticklabels(xlabels)
ax.set_yticklabels(ylabels)
# Add text to the plot showing the values at that point
for i in range(n):
    for j in range(n):
        plt.text(j, i, C[i, j], horizontalalignment='center', verticalalignment='center')
plt.show()
And will create the following plot:
import numpy as np
from matplotlib.pylab import *
R = np.array ([3.00, 2.50, 1.00, 3.3192, 2.383, 2.7128, 3.8662, 3.6724, 3.5112])
C = np.split(R, 3)
print(C)
matshow(C,cmap=cm.gray)
plt.show()
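Taking the square root of R.size, as in the cleaned-up script, only works when there are as many A towns as B towns. A small sketch that reshapes by the number of towns instead; the name lists and the lookup dictionary are illustrative assumptions, and the row-major order (A1-B1, A1-B2, ...) is taken from the question.

```python
import numpy as np

a_names = ["A1", "A2", "A3"]  # rows
b_names = ["B1", "B2", "B3"]  # columns

R = np.array([3.00, 2.50, 1.00, 3.3192, 2.383, 2.7128, 3.8662, 3.6724, 3.5112])

# One row per A town, one column per B town.
C = R.reshape(len(a_names), len(b_names))

# Optional: look up a distance by town names.
lookup = {(a, b): C[i, j]
          for i, a in enumerate(a_names)
          for j, b in enumerate(b_names)}
print(lookup[("A2", "B1")])
```

The array C can be passed to plt.matshow, with a_names and b_names used as tick labels.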