plt.imshow() shows only one color - python

I am trying to plot a heatmap from a 2000x2000 NumPy array. I have tried every solution from this post and many others. I have tried many cmaps and interpolation combinations.
This is the code that prepares the data:
def parse_cords(cord: float):
cord = str(cord).split(".")
h_map[int(cord[0])][int(cord[1])] += 1
df["coordinate"] is a pandas series of floats x,y coordinate. x and y are ranging from 0 to 1999.
I have decided to modify the array so that values will range from 0 to 1, but I have tested the code also without changing the range.
h_map = np.zeros((2000, 2000), dtype='int')
cords = df["coordinate"].map(lambda cord: parse_cords(cord))
maximum = float(np.max(h_map))
precent = lambda x: x/maximum
h_map = precent(h_map)
h_map looks like this:
[[0.58396242 0.08840799 0.03153833 ... 0.00285187 0.00419393 0.06324442]
[0.09075658 0.11172622 0.01476262 ... 0.00134206 0.00687804 0.0082201 ]
[0.02986076 0.01862104 0.03959067 ... 0.00100654 0.00134206 0.00251636]
...
[0.00301963 0.00134206 0.00134206 ... 0.00100654 0.00150981 0.00553598]
[0.00419393 0.00268411 0.00100654 ... 0.00201309 0.00402617 0.01342057]
[0.05183694 0.00251636 0.00184533 ... 0.00301963 0.00838785 0.1016608 ]]
Now the plot:
fig, ax = plt.subplots(figsize=figsize)
ax = plt.imshow(h_map)
And result:
final plot
The result is always a heatmap with only a single color depending on the cmap used. Is my array just too big to be plotted like this or am I doing something wrong?
EDIT:
I have added plt.colorbar() and removed scaling from 0 to 1. The plot knows the range of data (0 to 5500) but assumes that every value is equal to 0.

I think that is because you only provide one color channel. Therefore, plt.imshow() interprets the data as black and white image. You could either add more channels or use a different function e.g. sns.heatmap().
from seaborn import sns

Related

Plotting a 3-dimensional graph by increasing the size of the points

Currently I have a plot that look like this.
How do I increase the size of each point by the count? In other words, if a certain point has 9 counts, how do I increase it so that it is bigger than another point with only 2 counts?
If you look closely, I think there are overlaps (one point has both grey and orange circles). How do I make it so there's a clear difference?
In case you have no idea what I mean by "Plotting a 3-dimensional graph by increasing the size of the points", this below is what I mean, where the z-axis is the count
This answer doesn't really answer the question straight up, but please consider this multivariate solution seaborn has:
The syntax is way easier to write than using matplotlib.
seaborn.jointplot(
data = data,
x = x_name,
y = y_name,
hue = label_name
)
And voila! You should get something that looks like this
See: https://seaborn.pydata.org/generated/seaborn.jointplot.html
Using matplotlib library you could iterate on your data and count it (some example below).
import numpy as np
import matplotlib.pyplot as plt
# Generate Data
N = 100
l_bound = -10
u_bound = 10
s_0 = 40
array = np.random.randint(l_bound, u_bound, (N, 2))
# Plot it
points, counts = np.unique(array, axis = 0, return_counts = True)
for point_, count_ in zip(points, counts):
plt.scatter(point_[0], point_[1], c = np.random.randint(0, 3), s = s_0* count_**2, vmin = 0, vmax = 2)
plt.colorbar()
plt.show()
Result
You can probably do the same with Plotly to have something fancier and closer to your second picture.
Cheers
Try this:
x_name = 'x_name'
y_name = 'y_name'
z_name = 'z_name'
scatter_data = pd.DataFrame(data[[x_name, y_name, z_name]].value_counts())
scatter_data.reset_index(inplace=True)
plt.scatter(
scatter_data.loc[:, x_name],
scatter_data.loc[:, y_name],
s=scatter_data.loc[:, 0],
c=scatter_data.loc[:, z_name]
)
The thing is that the reason why your scatter plot looks like this is because every point which is at (1, 1) or (0,1) is overlapping.
With the plt.scatter argument (s=), you can specify the size of the points. If you print scatter_data, it is a "group by" clause with the count of each of the indexes.
It should look something like this, with column 0 being the count.
It should look something like the above.

Plotting per-point alpha values in 3D scatterplot throws ValueError

I have data in form of a 3D array, with "intensities" at every point. Depending on the intensity, I want to plot the point with a higher alpha. There are a lot of low-value outliers, so color coding (with scalar floats) won't work since they eclipse the real data.
What I have tried:
#this generates a 3D array with higher values around the center
a = np.array([0,1,2,3,4,5,4,3,2,1])
aa = np.outer(a,a)
aaa = np.einsum("ij,jk,jl",aa,aa,aa)
x_,y_,z_,v_ = [],[],[],[]
from matplotlib.colors import to_rgb,to_rgba
for x in range(aaa.shape[0]):
for y in range(aaa.shape[1]):
for z in range(aaa.shape[2]):
x_.append(x)
y_.append(y)
z_.append(z)
v_.append(aaa[x,y,z])
r,g,b = to_rgb("blue")
color = np.array([[r,g,b,a] for a in v_])
fig = plt.figure()
ax = fig.add_subplot(projection = '3d')
ax.scatter(x_,y_,z_,c =color)
plt.show()
the scatterplot documentation says that color can be a 2D array of RGBA, which I do pass. Hoever when I try to run the code, I get the following error:
ValueError: 'c' argument has 4000 elements, which is inconsistent with 'x' and 'y' with size 1000.
I just found my own answer.
The "A 2D array in which the rows are RGB or RGBA." statement in the documentation was a bit confusing - one needs to convert the RGBA rows to RGBA objects first, so that list comprehension should read:
color = np.array([to_rgba([r,g,b,a]) for a in v_])

Personalised colourmap plot using set numbers using matplotlib

I have a data which looks like (example)
x y d
0 0 -2
1 0 0
0 1 1
1 1 3
And I want to turn this into a coloumap plot which looks like one of these:
where x and y are in the table and the color is given by 'd'. However, I want a predetermined color for each number, for example:
-2 - orange
0 - blue
1 - red
3 - yellow
Not necessarily these colours but I need to address a number to a colour and the numbers are not in order or sequence, the are just a set of five or six random numbers which repeat themselves across the entire array.
Any ideas, I haven't got a code for that as I don't know where to start. I have however looked at the examples in here such as:
Matplotlib python change single color in colormap
However they only show how to define colours and not how to link those colours to an specific value.
It turns out this is harder than I thought, so maybe someone has an easier way of doing this.
Since we need to create an image of the data, we will store them in a 2D array. We can then map the data to the integers 0 .. number of different data values and assign a color to each of them. The reason is that we want the final colormap to be equally spaced. So
value -2 --> integer 0 --> color orange
value 0 --> integer 1 --> color blue
and so on.
Having nicely spaced integers, we can use a ListedColormap on the image of newly created integer values.
import matplotlib.pyplot as plt
import numpy as np
import matplotlib.colors
# define the image as a 2D array
d = np.array([[-2,0],[1,3]])
# create a sorted list of all unique values from d
ticks = np.unique(d.flatten()).tolist()
# create a new array of same shape as d
# we will later use this to store values from 0 to number of unique values
dc = np.zeros(d.shape)
#fill the array dc
for i in range(d.shape[0]):
for j in range(d.shape[1]):
dc[i,j] = ticks.index(d[i,j])
# now we need n (= number of unique values) different colors
colors= ["orange", "blue", "red", "yellow"]
# and put them to a listed colormap
colormap = matplotlib.colors.ListedColormap(colors)
plt.figure(figsize=(5,3))
#plot the newly created array, shift the colorlimits,
# such that later the ticks are in the middle
im = plt.imshow(dc, cmap=colormap, interpolation="none", vmin=-0.5, vmax=len(colors)-0.5)
# create a colorbar with n different ticks
cbar = plt.colorbar(im, ticks=range(len(colors)) )
#set the ticklabels to the unique values from d
cbar.ax.set_yticklabels(ticks)
#set nice tickmarks on image
plt.gca().set_xticks(range(d.shape[1]))
plt.gca().set_yticks(range(d.shape[0]))
plt.show()
As it may not be intuitively clear how to get the array d in the shape needed for plotting with imshow, i.e. as 2D array, here are two ways of converting the input data columns:
import numpy as np
x = np.array([0,1,0,1])
y = np.array([ 0,0,1,1])
d_original = np.array([-2,0,1,3])
#### Method 1 ####
# Intuitive method.
# Assumption:
# * Indexing in x and y start at 0
# * every index pair occurs exactly once.
# Create an empty array of shape (n+1,m+1)
# where n is the maximum index in y and
# m is the maximum index in x
d = np.zeros((y.max()+1 , x.max()+1), dtype=np.int)
for k in range(len(d_original)) :
d[y[k],x[k]] = d_original[k]
print d
#### Method 2 ####
# Fast method
# Additional assumption:
# indizes in x and y are ordered exactly such
# that y is sorted ascendingly first,
# and for each index in y, x is sorted.
# In this case the original d array can bes simply reshaped
d2 = d_original.reshape((y.max()+1 , x.max()+1))
print d2

Two colour scatter plot in R or in python

I have a dataset of three columns and n number of rows. column 1 contains name, column 2 value1, and column 3 value2 (rank2).
I want to plot a scatter plot with the outlier values displaying names.
The R commands I am using in are:
tiff('scatterplot.tiff')
data<-read.table("scatterplot_data", header=T)
attach(data)
reg1<-lm(A~B)
plot(A,B,col="red")
abline(reg1)
outliers<-data[which(2^(data[,2]-data[,3]) >= 4 | 2^(data[,2]-data[,3]) <=0.25),]
text(outliers[,2], outliers[,3],labels=outliers[,1],cex=0.50)
dev.off()
and I get a figure like this:
What I want is the labels on the lower half should be of one colour and the labels in upper half should be of another colour say green and red respectively.
Any suggestions, or adjustment in the commands?
You already have a logical test that works to your satisfaction. Just use it in the color spec to text:
text(outliers[,2], outliers[,3],labels=outliers[,1],cex=0.50,
col=c("blue", "green")[
which(2^(data[,2]-data[,3]) >= 4 , 2^(data[,2]-data[,3]) <=0.25)] )
It's untested of course because you offered no test case, but my reasoning is that the which() function should return 1 for the differences >= 4, and 2 for the ones <= 0.25, and integer(0) for all the others and that this should give you the proper alignment of color choices with the 'outliers' vector.
Using python, matplotlib (pylab) to plot, and scipy, numpy to fit data. The trick with numpy is to create a index or mask to filter out the results that you want.
EDIT: Want to selectively color the top and bottom outliers? It's a simple combination of both masks that we created:
import scipy as sci
import numpy as np
import pylab as plt
# Create some data
N = 1000
X = np.random.normal(5,1,size=N)
Y = X + np.random.normal(0,5.5,size=N)/np.random.normal(5,.1)
NAMES = ["foo"]*1000 # Customize names here
# Fit a polynomial
(a,b)=sci.polyfit(X,Y,1)
# Find all points above the line
idx = (X*a + b) < Y
# Scatter according to that index
plt.scatter(X[idx],Y[idx], color='r')
plt.scatter(X[~idx],Y[~idx], color='g')
# Find top 10 outliers
err = ((X*a+b) - Y) ** 2
idx_L = np.argsort(err)[-10:]
for i in idx_L:
plt.text(X[i], Y[i], NAMES[i])
# Color the outliers purple or black
top = idx_L[idx[idx_L]]
bot = idx_L[~idx[idx_L]]
plt.scatter(X[top],Y[top], color='purple')
plt.scatter(X[bot],Y[bot], color='black')
XF = np.linspace(0,10,1000)
plt.plot(XF, XF*a + b, 'k--')
plt.axis('tight')
plt.show()

creating a color coded time chart using colorbar and colormaps in python

I'm trying to make a time tracking chart based on a daily time tracking file that I used. I wrote code that crawls through my files and generates a few lists.
endTimes is a list of times that a particular activity ends in minutes going from 0 at midnight the first day of the month to however many minutes are in a month.
labels is a list of labels for the times listed in endTimes. It is one shorter than endtimes since the trackers don't have any data about before 0 minute. Most labels are repeats.
categories contains every unique value of labels in order of how well I regard that time.
I want to create a colorbar or a stack of colorbars (1 for eachday) that will depict how I spend my time for a month and put a color associated with each label. Each value in categories will have a color associated. More blue for more good. More red for more bad. It is already in order for the jet colormap to be right, but I need to get desecrate color values evenly spaced out for each value in categories. Then I figure the next step would be to convert that to a listed colormap to use for the colorbar based on how the labels associated with the categories.
I think this is the right way to do it, but I am not sure. I am not sure how to associate the labels with color values.
Here is the last part of my code so far. I found one function to make a discrete colormaps. It does, but it isn't what I am looking for and I am not sure what is happening.
Thanks for the help!
# now I need to develop the graph
import numpy as np
from matplotlib import pyplot,mpl
import matplotlib
from scipy import interpolate
from scipy import *
def contains(thelist,name):
# checks if the current list of categories contains the one just read
for val in thelist:
if val == name:
return True
return False
def getCategories(lastFile):
'''
must determine the colors to use
I would like to make a gradient so that the better the task, the closer to blue
bad labels will recieve colors closer to blue
read the last file given for the information on how I feel the order should be
then just keep them in the order of how good they are in the tracker
use a color range and develop discrete values for each category by evenly spacing them out
any time not found should assume to be sleep
sleep should be white
'''
tracker = open(lastFile+'.txt') # open the last file
# find all the categories
categories = []
for line in tracker:
pos = line.find(':') # does it have a : or a ?
if pos==-1: pos=line.find('?')
if pos != -1: # ignore if no : or ?
name = line[0:pos].strip() # split at the : or ?
if contains(categories,name)==False: # if the category is new
categories.append(name) # make a new one
return categories
# find good values in order of last day
newlabels=[]
for val in getCategories(lastDay):
if contains(labels,val):
newlabels.append(val)
categories=newlabels
# convert discrete colormap to listed colormap python
for ii,val in enumerate(labels):
if contains(categories,val)==False:
labels[ii]='sleep'
# create a figure
fig = pyplot.figure()
axes = []
for x in range(endTimes[-1]%(24*60)):
ax = fig.add_axes([0.05, 0.65, 0.9, 0.15])
axes.append(ax)
# figure out the colors to use
# stole this function to make a discrete colormap
# http://www.scipy.org/Cookbook/Matplotlib/ColormapTransformations
def cmap_discretize(cmap, N):
"""Return a discrete colormap from the continuous colormap cmap.
cmap: colormap instance, eg. cm.jet.
N: Number of colors.
Example
x = resize(arange(100), (5,100))
djet = cmap_discretize(cm.jet, 5)
imshow(x, cmap=djet)
"""
cdict = cmap._segmentdata.copy()
# N colors
colors_i = np.linspace(0,1.,N)
# N+1 indices
indices = np.linspace(0,1.,N+1)
for key in ('red','green','blue'):
# Find the N colors
D = np.array(cdict[key])
I = interpolate.interp1d(D[:,0], D[:,1])
colors = I(colors_i)
# Place these colors at the correct indices.
A = zeros((N+1,3), float)
A[:,0] = indices
A[1:,1] = colors
A[:-1,2] = colors
# Create a tuple for the dictionary.
L = []
for l in A:
L.append(tuple(l))
cdict[key] = tuple(L)
# Return colormap object.
return matplotlib.colors.LinearSegmentedColormap('colormap',cdict,1024)
# jet colormap goes from blue to red (good to bad)
cmap = cmap_discretize(mpl.cm.jet, len(categories))
cmap.set_over('0.25')
cmap.set_under('0.75')
#norm = mpl.colors.Normalize(endTimes,cmap.N)
print endTimes
print labels
# make a color list by matching labels to a picture
#norm = mpl.colors.ListedColormap(colorList)
cb1 = mpl.colorbar.ColorbarBase(axes[0],cmap=cmap
,orientation='horizontal'
,boundaries=endTimes
,ticks=endTimes
,spacing='proportional')
pyplot.show()
It sounds like you want something like a stacked bar chart with the color values mapped to a given range? In that case, here's a rough example:
import matplotlib.pyplot as plt
import matplotlib.cm as cm
import numpy as np
# Generate data....
intervals, weights = [], []
max_weight = 5
for _ in range(30):
numtimes = np.random.randint(3, 15)
times = np.random.randint(1, 24*60 - 1, numtimes)
times = np.r_[0, times, 24*60]
times.sort()
intervals.append(np.diff(times) / 60.0)
weights.append(max_weight * np.random.random(numtimes + 1))
# Plot the data as a stacked bar chart.
for i, (interval, weight) in enumerate(zip(intervals, weights)):
# We need to calculate where the bottoms of the bars will be.
bottoms = np.r_[0, np.cumsum(interval[:-1])]
# We want the left edges to all be the same, but increase with each day.
left = len(interval) * [i]
patches = plt.bar(left, interval, bottom=bottoms, align='center')
# And set the colors of each bar based on the weights
for val, patch in zip(weight, patches):
# We need to normalize the "weight" value between 0-1 to feed it into
# a given colorbar to generate an actual color...
color = cm.jet(float(val) / max_weight)
patch.set_facecolor(color)
# Setting the ticks and labels manually...
plt.xticks(range(0, 30, 2), range(1, 31, 2))
plt.yticks(range(0, 24 + 4, 4),
['12am', '4am', '8am', '12pm', '4pm', '8pm', '12am'])
plt.xlabel('Day')
plt.ylabel('Hour')
plt.axis('tight')
plt.show()

Categories