I have data that looks like this (example):
x y d
0 0 -2
1 0 0
0 1 1
1 1 3
And I want to turn this into a colormap plot which looks like one of these:
where x and y are in the table and the color is given by 'd'. However, I want a predetermined color for each number, for example:
-2 - orange
0 - blue
1 - red
3 - yellow
Not necessarily these colours, but I need to assign a colour to each number. The numbers are not in order or sequence; they are just a set of five or six arbitrary values that repeat across the entire array.
Any ideas? I don't have any code yet, as I don't know where to start. I have, however, looked at examples such as:
Matplotlib python change single color in colormap
However, they only show how to define colours and not how to link those colours to a specific value.
It turns out this is harder than I thought, so maybe someone has an easier way of doing this.
Since we need to create an image of the data, we will store the values in a 2D array. We can then map the data values to the integers 0 .. n-1, where n is the number of distinct data values, and assign a color to each integer. The reason is that we want the final colormap to be equally spaced. So
value -2 --> integer 0 --> color orange
value 0 --> integer 1 --> color blue
and so on.
Having nicely spaced integers, we can use a ListedColormap on the image of newly created integer values.
import matplotlib.pyplot as plt
import numpy as np
import matplotlib.colors
# define the image as a 2D array
d = np.array([[-2,0],[1,3]])
# create a sorted list of all unique values from d
ticks = np.unique(d.flatten()).tolist()
# create a new array of same shape as d
# we will later use this to store values from 0 to number of unique values
dc = np.zeros(d.shape)
#fill the array dc
for i in range(d.shape[0]):
    for j in range(d.shape[1]):
        dc[i,j] = ticks.index(d[i,j])
# now we need n (= number of unique values) different colors
colors= ["orange", "blue", "red", "yellow"]
# and put them to a listed colormap
colormap = matplotlib.colors.ListedColormap(colors)
plt.figure(figsize=(5,3))
#plot the newly created array, shift the colorlimits,
# such that later the ticks are in the middle
im = plt.imshow(dc, cmap=colormap, interpolation="none", vmin=-0.5, vmax=len(colors)-0.5)
# create a colorbar with n different ticks
cbar = plt.colorbar(im, ticks=range(len(colors)) )
#set the ticklabels to the unique values from d
cbar.ax.set_yticklabels(ticks)
#set nice tickmarks on image
plt.gca().set_xticks(range(d.shape[1]))
plt.gca().set_yticks(range(d.shape[0]))
plt.show()
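If the value-to-colour pairing should be stated explicitly rather than implied by list order, one possible variation is sketched below. It assumes a hypothetical dictionary `color_for_value` (not part of the answer above) and uses `np.unique(..., return_inverse=True)` in place of the double loop. The plotting call is shown only as a comment.

```python
import numpy as np

# Hypothetical mapping from data value to colour name (an assumption for illustration).
color_for_value = {-2: "orange", 0: "blue", 1: "red", 3: "yellow"}

def to_color_indices(d, color_for_value):
    """Return (index image, colours ordered by value, sorted unique values)."""
    d = np.asarray(d)
    # ticks: sorted unique values; inverse: position of each element within ticks
    ticks, inverse = np.unique(d, return_inverse=True)
    ticks = ticks.tolist()
    missing = [v for v in ticks if v not in color_for_value]
    if missing:
        raise KeyError(f"no colour defined for values {missing}")
    dc = inverse.reshape(d.shape)
    colors = [color_for_value[v] for v in ticks]
    return dc, colors, ticks

d = np.array([[-2, 0], [1, 3]])
dc, colors, ticks = to_color_indices(d, color_for_value)
# dc can then be drawn as in the code above, for example:
# plt.imshow(dc, cmap=matplotlib.colors.ListedColormap(colors),
#            vmin=-0.5, vmax=len(colors) - 0.5)
```

Because the colours are looked up per value, a value that does not occur in the data simply drops out of the colormap instead of shifting the remaining colours.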
As it may not be intuitively clear how to get the array d in the shape needed for plotting with imshow, i.e. as 2D array, here are two ways of converting the input data columns:
import numpy as np
x = np.array([0,1,0,1])
y = np.array([ 0,0,1,1])
d_original = np.array([-2,0,1,3])
#### Method 1 ####
# Intuitive method.
# Assumption:
# * Indexing in x and y start at 0
# * every index pair occurs exactly once.
# Create an empty array of shape (n+1,m+1)
# where n is the maximum index in y and
# m is the maximum index in x
d = np.zeros((y.max()+1 , x.max()+1), dtype=int)
for k in range(len(d_original)) :
    d[y[k],x[k]] = d_original[k]
print(d)
#### Method 2 ####
# Fast method
# Additional assumption:
# indices in x and y are ordered exactly such
# that y is sorted ascendingly first,
# and for each index in y, x is sorted.
# In this case the original d array can be simply reshaped
d2 = d_original.reshape((y.max()+1 , x.max()+1))
print(d2)
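Method 1 can also be written without an explicit loop by using the index arrays directly. The following is a minimal sketch under the same assumptions as Method 1 (indices start at 0). The helper name `columns_to_grid` and its `fill` argument for cells that never occur are additions for illustration.

```python
import numpy as np

def columns_to_grid(x, y, values, fill=0):
    """Place values[k] at row y[k], column x[k] of a 2D array."""
    x = np.asarray(x)
    y = np.asarray(y)
    values = np.asarray(values)
    grid = np.full((y.max() + 1, x.max() + 1), fill, dtype=values.dtype)
    grid[y, x] = values  # integer-array indexing assigns all pairs at once
    return grid

x = np.array([0, 1, 0, 1])
y = np.array([0, 0, 1, 1])
d_original = np.array([-2, 0, 1, 3])
d = columns_to_grid(x, y, d_original)
print(d)
```

Unlike the reshape in Method 2, this does not depend on the order of the rows in the input table.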
I am trying to plot a heatmap from a 2000x2000 NumPy array. I have tried every solution from this post and many others. I have tried many cmaps and interpolation combinations.
This is the code that prepares the data:
def parse_cords(cord: float):
    cord = str(cord).split(".")
    h_map[int(cord[0])][int(cord[1])] += 1
df["coordinate"] is a pandas Series of floats, each encoding an x,y coordinate (x before the decimal point, y after it, as parse_cords shows). x and y range from 0 to 1999.
I have decided to modify the array so that values will range from 0 to 1, but I have tested the code also without changing the range.
h_map = np.zeros((2000, 2000), dtype='int')
cords = df["coordinate"].map(lambda cord: parse_cords(cord))
maximum = float(np.max(h_map))
precent = lambda x: x/maximum
h_map = precent(h_map)
h_map looks like this:
[[0.58396242 0.08840799 0.03153833 ... 0.00285187 0.00419393 0.06324442]
[0.09075658 0.11172622 0.01476262 ... 0.00134206 0.00687804 0.0082201 ]
[0.02986076 0.01862104 0.03959067 ... 0.00100654 0.00134206 0.00251636]
...
[0.00301963 0.00134206 0.00134206 ... 0.00100654 0.00150981 0.00553598]
[0.00419393 0.00268411 0.00100654 ... 0.00201309 0.00402617 0.01342057]
[0.05183694 0.00251636 0.00184533 ... 0.00301963 0.00838785 0.1016608 ]]
Now the plot:
fig, ax = plt.subplots(figsize=figsize)
ax = plt.imshow(h_map)
And result:
final plot
The result is always a heatmap with only a single color depending on the cmap used. Is my array just too big to be plotted like this or am I doing something wrong?
EDIT:
I have added plt.colorbar() and removed scaling from 0 to 1. The plot knows the range of data (0 to 5500) but assumes that every value is equal to 0.
I think that is because you only provide one color channel. Therefore, plt.imshow() interprets the data as a black-and-white image. You could either add more channels or use a different function, e.g. sns.heatmap().
import seaborn as sns
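Another possible cause, offered here as an assumption rather than something the question confirms: if a handful of cells hold very large counts (the EDIT mentions a range of 0 to 5500), a linear colour scale maps almost every other cell to the lowest colour. A minimal sketch of choosing colour limits from percentiles, using a hypothetical helper `robust_limits` and synthetic data in place of the real h_map:

```python
import numpy as np

def robust_limits(a, low=1.0, high=99.0):
    """Return (vmin, vmax) taken from percentiles so outliers do not set the scale."""
    a = np.asarray(a, dtype=float)
    vmin, vmax = np.percentile(a, [low, high])
    return float(vmin), float(vmax)

# Synthetic stand-in for h_map: mostly small counts plus one extreme cell.
rng = np.random.default_rng(0)
h_map = rng.integers(0, 10, size=(200, 200)).astype(float)
h_map[0, 0] = 5500.0

vmin, vmax = robust_limits(h_map)
# The limits can then be passed to imshow, e.g.
# plt.imshow(h_map, vmin=vmin, vmax=vmax)
```

For strictly positive data, a logarithmic norm (matplotlib.colors.LogNorm) is another option.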
I am trying to subset a matrix by using values from another smaller matrix. The number of rows in each are the same, but the smaller matrix has fewer columns. Each column in the smaller matrix contains the value of the column in the larger matrix that should be referenced. Here is what I have done, along with comments that hopefully describe this better, along with what I have tried. (The wrinkle in this is that the values of the columns to be used in each row change...)
I have tried Google, searching on stackoverflow, etc and can't find what I'm looking for. (The closest I came was something in sage called matrix_from_columns, which isn't being used here) So I'm probably making a very simple referencing error.
TIA,
mconsidine
import matplotlib.pyplot as plt
from matplotlib import cm
from matplotlib.ticker import LinearLocator
import numpy as np
from numpy.lib.stride_tricks import sliding_window_view
#Problem: for each row in a matrix/image I need to replace
# a value in a particular column in that row by a
# weighted average of some of the values on either
# side of that column in that row. The wrinkle
# is that the column that needs to be changed may
# vary from row to row. The columns that need to
# have their values changes is stored in an array.
#
# How do I do something like:
# img[:, selectedcolumnarray] = somefunction(img,targetcolumnmatrix)
#
# I can do this for setting the selectedcolumnarray to a value, like 0
# But I am not figuring out how to select the targeted values to
# average.
#dimensions of subset of the matrix/image that will be averaged
rows = 7
columns = 5
#weights that will be used to average surrounding values
the_weights = np.ones((rows,columns)).astype(float)*(1/columns)
print(the_weights)
#make up some data to create a set of column
# values that vary by row
y = np.asarray(range(0,rows)).astype(float)
x = -0.095*(y**2) - 0.05*y + 12.123
fit=[x.astype(int),x-x.astype(int),y]
print(np.asarray(fit)[0])
#create a test array, eg "image' of 20 columns that will have
# values in targeted columns replaced
testarray = np.asarray(range(1,21))
img = np.ones((rows,20)).astype(np.uint16)
img = img*testarray.T #give it some values
print(img)
#values of the rows that will be replaced
targetcolumn = np.asarray(fit)[0].astype(int)
print(targetcolumn)
#calculate the range of columns in each row that
# will be used in the averaging
startcol = targetcolumn-2
endcol = targetcolumn+2
testcoords=np.linspace(startcol,endcol,5).astype(int).T
#this is the correct set of columns in the corresponding
# row to use for averaging
print(testcoords)
img2=img.copy()
#this correctly replaces the targetcolumn values with 0
# but I want to replace them with the sum of the values
# in the respective row of testcoords, weighted by the_weights
img2[np.arange(rows),targetcolumn]=0
#so instead of selecting the one column, I want to select
# the block of the image represented by testcoords, calculate
# a weighted average for each row, and use those values instead
# of 0 to set the values in targetcolumn
#starting again with the 7x20 (rowsxcolumns) "image"
img3=img.copy()
#this gives me the wrong size, ie 7,7,5 when I think I want 7,5;
print(testcoords.shape)
#I thought "take" might help, but ... nope
#img3=np.take(img,testcoords,axis=1)
#something here maybe??? :
#https://stackoverflow.com/questions/40084931/taking-subarrays-from-numpy-array-with-given-stride-stepsize
# but I can't figure out what
##### plot surface to try to visualize what is going on ####
'''
fig, ax = plt.subplots(subplot_kw={"projection": "3d"})
# Make data.
X = np.arange(0, 20, 1)
Y = np.arange(0, rows, 1)
X, Y = np.meshgrid(X, Y)
Z = img2
# Plot the surface.
surf = ax.plot_surface(X, Y, Z, cmap=cm.coolwarm,
                       linewidth=0, antialiased=False)
# Customize the z axis.
ax.set_zlim(0, 20)
ax.zaxis.set_major_locator(LinearLocator(10))
# A StrMethodFormatter is used automatically
ax.zaxis.set_major_formatter('{x:.02f}')
# Add a color bar which maps values to colors.
fig.colorbar(surf, shrink=0.5, aspect=5)
plt.show()
'''
It turns out that "take_along_axis" does the trick:
imgsubset = np.take_along_axis(img3,testcoords,axis=1)
print(imgsubset)
newvalues = imgsubset * the_weights
print(newvalues)
newvalues = np.sum(newvalues, axis=1)
print(newvalues)
img3[np.arange(rows),targetcolumn] = np.round(newvalues,0)
print(img3)
(It becomes more obvious when non trivial weights are used.)
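As a rough illustration of that last remark, here is a self-contained sketch with uneven weights. It simplifies the setup above: a single 1D weight vector of odd length is applied to every row, the helper name and sample data are made up, and every window is assumed to lie inside the image (negative column indices would wrap around).

```python
import numpy as np

def replace_with_weighted_window(img, target, weights):
    """Replace img[r, target[r]] by a weighted sum of the window centred on it."""
    img = np.asarray(img, dtype=float)
    target = np.asarray(target)
    weights = np.asarray(weights, dtype=float)
    half = len(weights) // 2
    # cols[r] lists the column indices of the window for row r
    cols = target[:, None] + np.arange(-half, half + 1)
    window = np.take_along_axis(img, cols, axis=1)  # shape (rows, len(weights))
    out = img.copy()
    out[np.arange(img.shape[0]), target] = window @ weights
    return out

img = np.tile(np.arange(1, 11), (4, 1))          # each row is 1..10
target = np.array([2, 3, 5, 7])                  # made-up target columns
weights = np.array([0.5, 0.2, 0.1, 0.1, 0.1])    # uneven weights, summing to 1
result = replace_with_weighted_window(img, target, weights)
print(result[np.arange(4), target])
```

With these weights the replaced cells move toward the values on their left, which makes it easy to check that each row used its own window.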
Thanks for listening...
mconsidine
I'm trying to do something that I think should be pretty straight forward but I can't seem to get it to work.
I'm trying to plot 16 byte values measured over time to see how they change. I'm trying to use a scatter plot to do this with:
x axis being the measurement index
y axis being the index of the byte
and the color indicating the value of the byte.
I have the data stored in a numpy array where data[2][14] would give me the value of the 14th byte in the 2nd measurement.
Every time I try to plot this, I'm getting either:
ValueError: x and y must be the same size
IndexError: index 10 is out of bounds for axis 0 with size 10
Here is the sample test I'm using:
import numpy
import numpy.random as nprnd
import matplotlib.pyplot as plt
#generate random measurements
# 10 measurements of 16 byte values
x = numpy.arange(10)
y = numpy.arange(16)
test_data = nprnd.randint(low=0,high=65535, size=(10, 16))
#scatter plot the measurements with
# x - measurement index (0-9 in this case)
# y - byte value index (0-15 in this case)
# c = test_data[x,y]
plt.scatter(x,y,c=test_data[x][y])
plt.show()
I'm sure it is something stupid I'm doing wrong but I can't seem to figure out what.
Thanks for the help.
Try using a meshgrid to define your point locations, and don't forget to index into your NumPy array properly (with [x,y] rather than [x][y]):
x, y = numpy.meshgrid(x,y)
plt.scatter(x,y,c=test_data[x,y])
plt.show()
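To see why this works, it can help to look at the shapes involved. The sketch below repeats the setup with a seeded generator (an assumption, used only to make the run reproducible) and leaves the plotting call as a comment.

```python
import numpy as np

x = np.arange(10)   # measurement index
y = np.arange(16)   # byte index
rng = np.random.default_rng(0)
test_data = rng.integers(0, 65535, size=(10, 16))

xx, yy = np.meshgrid(x, y)    # both have shape (16, 10)
colors = test_data[xx, yy]    # colors[i, j] == test_data[x[j], y[i]]

print(xx.shape, yy.shape, colors.shape)
# All three arrays now have the same number of elements, which scatter requires:
# plt.scatter(xx.ravel(), yy.ravel(), c=colors.ravel())
```

By contrast, test_data[x][y] first selects rows with x and then indexes the rows of that result with y, which is what raises the IndexError.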
I have a dataset of three columns and n number of rows. column 1 contains name, column 2 value1, and column 3 value2 (rank2).
I want to plot a scatter plot with the outlier values displaying names.
The R commands I am using are:
tiff('scatterplot.tiff')
data<-read.table("scatterplot_data", header=T)
attach(data)
reg1<-lm(A~B)
plot(A,B,col="red")
abline(reg1)
outliers<-data[which(2^(data[,2]-data[,3]) >= 4 | 2^(data[,2]-data[,3]) <=0.25),]
text(outliers[,2], outliers[,3],labels=outliers[,1],cex=0.50)
dev.off()
and I get a figure like this:
What I want is for the labels in the lower half to be one colour and the labels in the upper half another, say green and red respectively.
Any suggestions, or adjustment in the commands?
You already have a logical test that works to your satisfaction. Just use it in the color spec to text:
text(outliers[,2], outliers[,3],labels=outliers[,1],cex=0.50,
     col=ifelse(2^(outliers[,2]-outliers[,3]) >= 4, "blue", "green"))
It's untested of course because you offered no test case, but my reasoning is that every row of 'outliers' satisfies exactly one of the two conditions, so ifelse() should return "blue" for the ratios >= 4 and "green" for the ones <= 0.25, and this should give you the proper alignment of color choices with the 'outliers' vector.
Using Python: matplotlib (pylab) to plot and numpy to fit the data. The trick with numpy is to create an index or mask to filter out the results that you want.
EDIT: Want to selectively color the top and bottom outliers? It's a simple combination of both masks that we created:
import numpy as np
import pylab as plt
# Create some data
N = 1000
X = np.random.normal(5,1,size=N)
Y = X + np.random.normal(0,5.5,size=N)/np.random.normal(5,.1)
NAMES = ["foo"]*N # Customize names here
# Fit a polynomial (np.polyfit replaces the old scipy alias)
(a,b)=np.polyfit(X,Y,1)
# Find all points above the line
idx = (X*a + b) < Y
# Scatter according to that index
plt.scatter(X[idx],Y[idx], color='r')
plt.scatter(X[~idx],Y[~idx], color='g')
# Find top 10 outliers
err = ((X*a+b) - Y) ** 2
idx_L = np.argsort(err)[-10:]
for i in idx_L:
    plt.text(X[i], Y[i], NAMES[i])
# Color the outliers purple or black
top = idx_L[idx[idx_L]]
bot = idx_L[~idx[idx_L]]
plt.scatter(X[top],Y[top], color='purple')
plt.scatter(X[bot],Y[bot], color='black')
XF = np.linspace(0,10,1000)
plt.plot(XF, XF*a + b, 'k--')
plt.axis('tight')
plt.show()
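The same masking idea can be applied to the criterion from the R code in the question. Since 2^(A-B) >= 4 is equivalent to A-B >= 2, and 2^(A-B) <= 0.25 is equivalent to A-B <= -2, the two groups of labels can be separated with two boolean masks. This is a minimal sketch with made-up values. Which group ends up in the upper half of the plot depends on the axis order, so the names `ratio_high` and `ratio_low` are deliberately neutral.

```python
import numpy as np

def split_outliers(a, b):
    """Return boolean masks for 2**(a-b) >= 4 and 2**(a-b) <= 0.25."""
    diff = np.asarray(a, dtype=float) - np.asarray(b, dtype=float)
    ratio_high = diff >= 2.0    # same as 2**diff >= 4
    ratio_low = diff <= -2.0    # same as 2**diff <= 0.25
    return ratio_high, ratio_low

a = np.array([5.0, 1.0, 3.0, 7.5])
b = np.array([2.0, 4.0, 3.5, 5.5])
ratio_high, ratio_low = split_outliers(a, b)
# Each mask can then select the points to label in its own colour, e.g.
# (names is a hypothetical list of labels, one per point):
# for i in np.flatnonzero(ratio_high):
#     plt.text(a[i], b[i], names[i], color='red')
```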
I'm trying to make a time tracking chart based on a daily time tracking file that I use. I wrote code that crawls through my files and generates a few lists.
endTimes is a list of the times at which each activity ends, in minutes, counted from 0 at midnight on the first day of the month up to however many minutes are in the month.
labels is a list of labels for the times listed in endTimes. It is one shorter than endTimes since the trackers don't have any data from before minute 0. Most labels are repeats.
categories contains every unique value of labels, ordered by how well I regard that time.
I want to create a colorbar, or a stack of colorbars (one for each day), that depicts how I spend my time over a month, with a color associated with each label. Each value in categories will have a color associated with it: more blue for more good, more red for more bad. The list is already in the right order for the jet colormap, but I need to get discrete color values evenly spaced out for each value in categories. Then I figure the next step would be to convert that to a listed colormap to use for the colorbar, based on how the labels are associated with the categories.
I think this is the right way to do it, but I am not sure. I am not sure how to associate the labels with color values.
Here is the last part of my code so far. I found a function that makes a discrete colormap. It works, but it isn't what I am looking for, and I am not sure what is happening.
Thanks for the help!
# now I need to develop the graph
import numpy as np
from matplotlib import pyplot
import matplotlib as mpl
import matplotlib
from scipy import interpolate
from scipy import *
def contains(thelist,name):
    # checks if the current list of categories contains the one just read
    for val in thelist:
        if val == name:
            return True
    return False
def getCategories(lastFile):
    '''
    must determine the colors to use
    I would like to make a gradient so that the better the task, the closer to blue
    bad labels will receive colors closer to red
    read the last file given for the information on how I feel the order should be
    then just keep them in the order of how good they are in the tracker
    use a color range and develop discrete values for each category by evenly spacing them out
    any time not found should assume to be sleep
    sleep should be white
    '''
    tracker = open(lastFile+'.txt') # open the last file
    # find all the categories
    categories = []
    for line in tracker:
        pos = line.find(':') # does it have a : or a ?
        if pos==-1: pos=line.find('?')
        if pos != -1: # ignore if no : or ?
            name = line[0:pos].strip() # split at the : or ?
            if contains(categories,name)==False: # if the category is new
                categories.append(name) # make a new one
    return categories
# find good values in order of last day
newlabels=[]
for val in getCategories(lastDay):
    if contains(labels,val):
        newlabels.append(val)
categories=newlabels
# convert discrete colormap to listed colormap python
for ii,val in enumerate(labels):
    if contains(categories,val)==False:
        labels[ii]='sleep'
# create a figure
fig = pyplot.figure()
axes = []
for x in range(endTimes[-1]%(24*60)):
    ax = fig.add_axes([0.05, 0.65, 0.9, 0.15])
    axes.append(ax)
# figure out the colors to use
# stole this function to make a discrete colormap
# http://www.scipy.org/Cookbook/Matplotlib/ColormapTransformations
def cmap_discretize(cmap, N):
    """Return a discrete colormap from the continuous colormap cmap.
    cmap: colormap instance, eg. cm.jet.
    N: Number of colors.
    Example
        x = resize(arange(100), (5,100))
        djet = cmap_discretize(cm.jet, 5)
        imshow(x, cmap=djet)
    """
    cdict = cmap._segmentdata.copy()
    # N colors
    colors_i = np.linspace(0,1.,N)
    # N+1 indices
    indices = np.linspace(0,1.,N+1)
    for key in ('red','green','blue'):
        # Find the N colors
        D = np.array(cdict[key])
        I = interpolate.interp1d(D[:,0], D[:,1])
        colors = I(colors_i)
        # Place these colors at the correct indices.
        A = np.zeros((N+1,3), float)
        A[:,0] = indices
        A[1:,1] = colors
        A[:-1,2] = colors
        # Create a tuple for the dictionary.
        L = []
        for l in A:
            L.append(tuple(l))
        cdict[key] = tuple(L)
    # Return colormap object.
    return matplotlib.colors.LinearSegmentedColormap('colormap',cdict,1024)
# jet colormap goes from blue to red (good to bad)
cmap = cmap_discretize(mpl.cm.jet, len(categories))
cmap.set_over('0.25')
cmap.set_under('0.75')
#norm = mpl.colors.Normalize(endTimes,cmap.N)
print(endTimes)
print(labels)
# make a color list by matching labels to a picture
#norm = mpl.colors.ListedColormap(colorList)
cb1 = mpl.colorbar.ColorbarBase(axes[0],cmap=cmap
                                ,orientation='horizontal'
                                ,boundaries=endTimes
                                ,ticks=endTimes
                                ,spacing='proportional')
pyplot.show()
It sounds like you want something like a stacked bar chart with the color values mapped to a given range? In that case, here's a rough example:
import matplotlib.pyplot as plt
import matplotlib.cm as cm
import numpy as np
# Generate data....
intervals, weights = [], []
max_weight = 5
for _ in range(30):
    numtimes = np.random.randint(3, 15)
    times = np.random.randint(1, 24*60 - 1, numtimes)
    times = np.r_[0, times, 24*60]
    times.sort()
    intervals.append(np.diff(times) / 60.0)
    weights.append(max_weight * np.random.random(numtimes + 1))
# Plot the data as a stacked bar chart.
for i, (interval, weight) in enumerate(zip(intervals, weights)):
    # We need to calculate where the bottoms of the bars will be.
    bottoms = np.r_[0, np.cumsum(interval[:-1])]
    # We want the left edges to all be the same, but increase with each day.
    left = len(interval) * [i]
    patches = plt.bar(left, interval, bottom=bottoms, align='center')
    # And set the colors of each bar based on the weights
    for val, patch in zip(weight, patches):
        # We need to normalize the "weight" value between 0-1 to feed it into
        # a given colorbar to generate an actual color...
        color = cm.jet(float(val) / max_weight)
        patch.set_facecolor(color)
# Setting the ticks and labels manually...
plt.xticks(range(0, 30, 2), range(1, 31, 2))
plt.yticks(range(0, 24 + 4, 4),
           ['12am', '4am', '8am', '12pm', '4pm', '8pm', '12am'])
plt.xlabel('Day')
plt.ylabel('Hour')
plt.axis('tight')
plt.show()
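To tie this back to the labels and categories lists from the question: one way to give each category an evenly spaced colour is to assign it a position between 0 and 1 and pass that position to a colormap. This is a sketch only. The helper names and the sample categories are made up, and `cmap` can be any callable that maps a number in [0, 1] to a colour, such as cm.jet.

```python
import numpy as np

def category_positions(categories):
    """Map each category to an evenly spaced position in [0, 1], in list order."""
    positions = np.linspace(0.0, 1.0, len(categories))
    return {name: float(p) for name, p in zip(categories, positions)}

def label_colors(labels, categories, cmap, default=(1.0, 1.0, 1.0, 1.0)):
    """Look up one RGBA colour per label; unknown labels get `default` (white)."""
    pos = category_positions(categories)
    return [cmap(pos[lab]) if lab in pos else default for lab in labels]

# Made-up example data, ordered from best to worst.
categories = ["exercise", "work", "chores", "tv"]
labels = ["work", "tv", "work", "nap"]

# Stand-in colormap so the sketch runs without matplotlib: blue at 0, red at 1.
# In practice matplotlib's cm.jet could be passed instead.
def blue_to_red(t):
    return (t, 0.0, 1.0 - t, 1.0)

colors = label_colors(labels, categories, blue_to_red)
```

Each entry of colors could then be handed to patch.set_facecolor in the loop above in place of the weight-based colour.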