How to plot a heatmap on x-y plane?

I created this dataframe (attached), which has three columns: x location, y location, and performance ratio for different systems. I am trying to make a heatmap that places each row as a point at its x-y coordinates and uses the last column as the heatmap value.
I have tried this:
hm = sns.kdeplot(inv_positions['x'], inv_positions['y'], shade=True)
but since this function only takes two columns, I cannot plot the values column on the x-y plane.
Any ideas how to do that?

Joooeey's comment provides a good example using a different method; here is another question more specifically using the same library as you: Generating heat map in seaborn
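A minimal sketch of one way to do this with plain matplotlib rather than kdeplot: draw each row as a marker at its (x, y) position and colour it by the third column. The column name pr and the sample values are assumptions standing in for the attached dataframe, which is not reproduced here.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend so the sketch runs without a display
import matplotlib.pyplot as plt
import pandas as pd

# Hypothetical stand-in for the asker's dataframe; 'pr' is an assumed column name
inv_positions = pd.DataFrame({
    "x": [0.0, 1.0, 2.0, 0.0, 1.0, 2.0],
    "y": [0.0, 0.0, 0.0, 1.0, 1.0, 1.0],
    "pr": [0.71, 0.80, 0.76, 0.90, 0.85, 0.60],
})

fig, ax = plt.subplots()
# c= takes the value column; cmap maps those values to colours
sc = ax.scatter(inv_positions["x"], inv_positions["y"],
                c=inv_positions["pr"], cmap="viridis", s=200, marker="s")
fig.colorbar(sc, ax=ax, label="performance ratio")
ax.set_xlabel("x")
ax.set_ylabel("y")
# call fig.savefig("heatmap.png") or plt.show() to view the result
```

If the points lie on a regular grid, pivoting the frame to a 2D table and passing it to a heatmap function is another option; the scatter form also works for irregularly placed points.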

Related

Output the Y axis given the X axis

I have the following question.
I plotted a graph using biosppy. Using a built-in function, I obtained a list of X-axis coordinates (where there were spikes).
I would like to know if there is a function that, given the list of X-axis coordinates, can give me the list of the corresponding Y-axis coordinates, so I can see the amplitude of the waves.
This is the code; heart_rate_ts is the returned list of x-axis values:
from biosppy.signals import bvp
ts, filtered, onsets, heart_rate_ts, heart_rate = bvp.bvp(signal=data1, sampling_rate=50.0, show=True)
Thank you in advance
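One possible approach, sketched here with synthetic data instead of a real biosppy call: since ts and filtered describe the plotted curve, the y-value at any x can be read off by linear interpolation with np.interp. The arrays below are stand-ins, and the assumption that onsets holds sample indices into filtered should be checked against the biosppy documentation.

```python
import numpy as np

# Synthetic stand-ins for the ts and filtered arrays returned by bvp.bvp
sampling_rate = 50.0
ts = np.arange(0, 10, 1 / sampling_rate)        # time axis in seconds
filtered = np.sin(2 * np.pi * 1.2 * ts)         # fake filtered signal

# Hypothetical x-coordinates (seconds) for which the amplitude is wanted
x_query = np.array([0.5, 1.25, 3.0])
y_at_x = np.interp(x_query, ts, filtered)       # linear interpolation on the curve

# If the x-values are sample indices (assumed for onsets), index directly
onsets = np.array([10, 60])                     # hypothetical indices
y_at_onsets = filtered[onsets]
```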

Using python to plot a heat map from five arrays: x,y and 3 arrays indicating RGB

I have 2 arrays, x and y, representing each point's coordinates on a 2D plane. I also have another 3 arrays of the same length as x and y. These three arrays represent the RGB values of a color, so each point in x,y corresponds to a color given by the RGB arrays. In Python, how can I plot a heat map with x,y as its axes and colors from the three RGB arrays? Each array is, say, 1000 in length.
As an example that takes the first 10 points, I have:
x = [10.946028, 16.229064, -36.855, -38.719057, 11.231684, 33.256904999999996, -41.21, 12.294958, 16.113228, -43.429027000000005]
y = [-21.003803, 4.5, 4.5, -22.135853, 4.084630000000001, 17.860079000000002, -18.083685, -3.98297, -19.565272, 0.877016]
R = [0,1,2,3,4,5,6,7,8,9]
G = [2,4,6,8,10,12,14,16,18,20]
B = [0,255,0,255,0,255,0,255,0,255]
I'd like to draw a heat map in which, for example, the first point has the coordinates (10.946028, -21.003803) and the color R=0, G=2, B=0. The second point would have the coordinates (16.229064, 4.5) and the color R=1, G=4, B=255.

OK, it seems you want your own colormap for your heatmap. You can write your own or just use one of matplotlib's built-in colormaps. Check out this post for the use of heatmaps with matplotlib. If you want to do it on your own, the easiest way is to recombine the five one-dimensional vectors into a 3D RGB image. Afterwards you have to define a mapping function that combines the R, G, and B values into a single new value for every pixel, like:
f(R,G,B) = a*R + b*G + c*B
a, b, c can be whatever you like; the formula can be far more complex, but you have to decide how the values should relate to each other. From that you get a 2D matrix filled with the values of your function f(R,G,B). Now you have to define which value of this new matrix gets which color. This can be a linear mapping by hand (like writing a list: 0 = deep blue, 1 = light red, ...). Using this look-up table you can get your own specific heatmap. But as you can see, that path takes some time, so I would recommend not doing it and just using one of matplotlib's built-in colormaps. Example:
import matplotlib.pyplot as plt
import numpy as np
a = np.random.random((16, 16))
plt.imshow(a, cmap='hot', interpolation='nearest')
plt.show()
You can use various types of these by changing the string in cmap='hot' to one of the names in that list. Hope I could help you, gl hf.
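If the goal is simply to draw each point in its own RGB color, no colormap is needed: matplotlib's scatter accepts an (n, 3) array of RGB values in the range 0 to 1. A minimal sketch using the ten sample points from the question; dividing by 255 assumes the channels are on a 0-255 scale.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend so the sketch runs without a display
import matplotlib.pyplot as plt
import numpy as np

x = [10.946028, 16.229064, -36.855, -38.719057, 11.231684,
     33.256904999999996, -41.21, 12.294958, 16.113228, -43.429027000000005]
y = [-21.003803, 4.5, 4.5, -22.135853, 4.084630000000001,
     17.860079000000002, -18.083685, -3.98297, -19.565272, 0.877016]
R = [0, 1, 2, 3, 4, 5, 6, 7, 8, 9]
G = [2, 4, 6, 8, 10, 12, 14, 16, 18, 20]
B = [0, 255, 0, 255, 0, 255, 0, 255, 0, 255]

# One row per point, columns R, G, B, scaled to the 0..1 range matplotlib expects
colors = np.column_stack([R, G, B]) / 255.0

fig, ax = plt.subplots()
points = ax.scatter(x, y, c=colors, s=40)
ax.set_xlabel("x")
ax.set_ylabel("y")
# call fig.savefig("rgb_points.png") or plt.show() to view the result
```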

how to create interactive graph on a large data set?

I am trying to create an interactive graph using holoviews on a large data set. Below is a sample of the data file called trackData.csv
Event      Time             ID   Venue
Javeline   11:25:21:012345  JVL  Dome
Shot pot   11:25:22:778929  SPT  Dome
4x4        11:25:21:993831  FOR  Track
4x4        11:25:22:874293  FOR  Track
Shot pot   11:25:21:087822  SPT  Dome
Javeline   11:25:23:878792  JVL  Dome
Long Jump  11:25:21:892902  LJP  Aquatic
Long Jump  11:25:22:799422  LJP  Aquatic
This is how I read the data and plot a scatter plot.
trackData = pd.read_csv('trackData.csv')
scatter = hv.Scatter(trackData, 'Time', 'ID')
scatter
Because this data set is huge, zooming in and out of the scatter plot is very slow, and I would like to speed this up.
I researched and found that HoloViews' decimate is recommended for large datasets, but I don't know how to use it in the above code.
Most things I tried seem to throw an error. Also, is there a way to make sure the Time column is converted to microseconds? Thanks in advance for the help.

Datashader indeed does not handle categorical axes as used here, but that's not so much a limitation of the software as of my imagination -- what should it be doing with them? A Datashader scatterplot (Canvas.points) is meant for a very large number of points located on a continuously indexed 2D plane. Such a plot approximates a 2D probability distribution function, accumulating points per pixel to show the density in that region, and revealing spatial patterns across pixels.
A categorical axis doesn't have the same properties that a continuous numerical axis does, because there's no spatial relationship between adjacent values. Specifically in this case, there's no apparent meaning to an ordering of the ID field (it appears to be a letter code for a sporting event type), so I can't see any meaning to accumulating across ID values per pixel the way Datashader is designed to do. Even if you convert IDs to numbers, you'll either just get random-looking noise (if there are more ID values than vertical pixels), or a series of spotty lines (if there are fewer ID values than pixels).
Here, maybe there are only a few dozen or so unique ID values, but many, many time measurements? In that case most people would use a box, violin, histogram, or ridge plot per ID, to see the distribution of values for each ID value. A Datashader points plot is a 2D histogram, but if one axis is categorical you're really dealing with a set of 1D histograms, not a single combined 2D histogram, so just use histograms if that's what you're after.
If you really do want to try plotting all the points per ID as raw points, you could do that using vertical spike events as in https://examples.pyviz.org/iex_trading/IEX_stocks.html . You can also add some vertical jitter and then use Datashader, but that's not something directly supported right now, and it doesn't have the clear mathematical interpretation that a normal Datashader plot does (in terms of approximating a density function).
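The "one histogram per ID" idea above can be sketched with pandas and NumPy alone. The sketch also converts the Time column to integer microseconds, as asked in the question. The comma-separated layout of the sample and the choice of three bins are assumptions made for illustration.

```python
import io
import numpy as np
import pandas as pd

# Sample rows from the question, written as comma-separated text (assumed layout)
text = """Event,Time,ID,Venue
Javeline,11:25:21:012345,JVL,Dome
Shot pot,11:25:22:778929,SPT,Dome
4x4,11:25:21:993831,FOR,Track
4x4,11:25:22:874293,FOR,Track
Shot pot,11:25:21:087822,SPT,Dome
Javeline,11:25:23:878792,JVL,Dome
Long Jump,11:25:21:892902,LJP,Aquatic
Long Jump,11:25:22:799422,LJP,Aquatic
"""
df = pd.read_csv(io.StringIO(text))
df["Time"] = pd.to_datetime(df["Time"], format="%H:%M:%S:%f")

# Integer microseconds elapsed since the earliest timestamp
df["t_us"] = (df["Time"] - df["Time"].min()) // pd.Timedelta(microseconds=1)

# Shared bin edges so the per-ID histograms are comparable
edges = np.linspace(0, df["t_us"].max(), 4)  # 3 equal-width bins
hist_per_id = {
    id_: np.histogram(group["t_us"], bins=edges)[0]
    for id_, group in df.groupby("ID")
}
```

Each entry of hist_per_id is a small count array that can be drawn as a bar chart, which stays fast however many raw rows there are.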

The disadvantage of decimate() is that it downsamples your datapoints.
I think you need datashader() here, but datashader doesn't like that ID is a categorical variable instead of a numerical value.
So a solution could be to convert your categorical variable to a numerical code.
See the code example below for both hvPlot (which I prefer) and HoloViews:
import io
import pandas as pd
import hvplot.pandas
import holoviews as hv
# dynspread is for making point sizes larger when using datashade
from holoviews.operation.datashader import datashade, dynspread
# sample data
text = """
Event Time ID Venue
Javeline 11:25:21:012345 JVL Dome
Shot pot 11:25:22:778929 SPT Dome
4x4 11:25:21:993831 FOR Track
4x4 11:25:22:874293 FOR Track
Shot pot 11:25:21:087822 SPT Dome
Javeline 11:25:23:878792 JVL Dome
Long Jump 11:25:21:892902 LJP Aquatic
Long Jump 11:25:22:799422 LJP Aquatic
"""
# create dataframe and parse time
df = pd.read_csv(io.StringIO(text), sep='\s{2,}', engine='python')
df['Time'] = pd.to_datetime(df['Time'], format='%H:%M:%S:%f')
df = df.set_index('Time').sort_index()
# get a column that converts categorical id's to numerical id's
df['ID'] = pd.Categorical(df['ID'])
df['ID_code'] = df['ID'].cat.codes
# use this to overwrite numerical yticks with categorical yticks
yticks=[(0, 'FOR'), (1, 'JVL'), (2, 'LJP'), (3, 'SPT')]
# this is the hvplot solution: set datashade=True
df.hvplot.scatter(
    x='Time',
    y='ID_code',
    datashade=True,
    dynspread=True,
    padding=0.05,
).opts(yticks=yticks)
# this is the holoviews solution
scatter = hv.Scatter(df, kdims=['Time'], vdims=['ID_code'])
dynspread(datashade(scatter)).opts(yticks=yticks, padding=0.05)
More info on datashader and decimate:
http://holoviews.org/user_guide/Large_Data.html
Resulting plot:

How to fill between the lines in a pandas/matplotlib time series plot

I have a data frame in pandas which is indexed by DateTime.
I have data about a car with the inside temperature, lowest inside temperature, and highest inside temperature, and the same three features for the outside temperature.
I plot all 6 features as a time series, and have tried to use plt.fill_between like so:
car_df[['insideTemp','insideTempLow','insideTempHigh','outsideTemp','outsideTempLow','outsideTempHigh']].plot()
plt.fill_between(car_df['insideTemp'], car_df['insideTempLow'],car_df['insideTempHigh'], data=car_df)
plt.fill_between(car_df['outsideTemp'], car_df['outsideTempLow'],car_df['outsideTempHigh'], data=car_df)
plt.show()
I get 6 lines as desired, however nothing seems to get filled (thus not separating the two categories of indoor and outdoor).
Any ideas? Thanks in advance.

You passed the wrong arguments to fill_between.
The proper parameters are as follows:
x - x coordinates, in your case the index values,
y1 - y coordinates of the first curve,
y2 - y coordinates of the second curve.
For readability, it usually helps to pass a color parameter as well.
I ran a test that draws just 2 lines (with shortened column names)
and fills the space between them:
car_df[['inside', 'outside']].plot()
plt.fill_between(car_df.index, car_df.inside, car_df.outside,
                 color=(0.8, 0.9, 0.5));
and got the following picture:
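Applied to the six columns from the question, the same idea gives one shaded band per category: the index is x, and the Low and High columns are y1 and y2. This is a sketch on synthetic temperatures, since the original data is not available; the lines are drawn with ax.plot rather than DataFrame.plot so that lines and bands share the same date axis handling.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend so the sketch runs without a display
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd

# Synthetic stand-in for car_df: 24 hourly readings
idx = pd.date_range("2021-01-01", periods=24, freq=pd.Timedelta(hours=1))
base = 20 + 5 * np.sin(np.linspace(0, 2 * np.pi, 24))
car_df = pd.DataFrame({
    "insideTemp": base, "insideTempLow": base - 1, "insideTempHigh": base + 1,
    "outsideTemp": base - 10, "outsideTempLow": base - 12, "outsideTempHigh": base - 8,
}, index=idx)

fig, ax = plt.subplots()
for name in ("inside", "outside"):
    # Central line for this category
    ax.plot(car_df.index, car_df[f"{name}Temp"], label=name)
    # Shaded band between the Low and High columns, with the index as x
    ax.fill_between(car_df.index, car_df[f"{name}TempLow"],
                    car_df[f"{name}TempHigh"], alpha=0.3)
ax.legend()
# call fig.savefig("bands.png") or plt.show() to view the result
```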

How to plot coarse-grained average of a set of data points?

I have a set of discrete 2-dimensional data points. Each of these points has a measured value associated with it. I would like to get a scatter plot with points colored by their measured values. But the data points are so dense that points with different colors would overlap with each other, that may not be good for visualization. So I am thinking if I could associate the color for each point based on the coarse-grained average of measured values of some points near it. Does anyone know how to implement this in Python?
Thanks!

I did it using sklearn.neighbors.RadiusNeighborsRegressor(); the idea is to take the average of the values of the neighbors within a specific radius. Suppose the coordinates of the data points are in the list temp_coors and the values associated with these points are in coloring; then coloring can be coarse-grained in the following way:
from sklearn.neighbors import RadiusNeighborsRegressor
# smoothing_radius is the neighborhood radius, in the same units as the coordinates
r_neigh = RadiusNeighborsRegressor(radius=smoothing_radius, weights='uniform')
r_neigh.fit(temp_coors, coloring)
coloring = r_neigh.predict(temp_coors)
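To make the averaging explicit, here is a NumPy-only sketch of the same idea: each value is replaced by the plain mean of all values whose points lie within the radius, the point itself included. This should correspond to uniform weighting when predicting on the fitted points, but treat that equivalence as an assumption; the sample points are hypothetical. The pairwise distance matrix needs memory proportional to n squared, so this form only suits modest point counts.

```python
import numpy as np

def radius_average(coords, values, radius):
    """Replace each value by the mean of all values within `radius` of its point."""
    coords = np.asarray(coords, dtype=float)
    values = np.asarray(values, dtype=float)
    # Pairwise Euclidean distances, shape (n, n)
    diff = coords[:, None, :] - coords[None, :, :]
    dist = np.sqrt((diff ** 2).sum(axis=-1))
    # Boolean neighbour mask; every point is its own neighbour (distance 0)
    within = dist <= radius
    return (within * values[None, :]).sum(axis=1) / within.sum(axis=1)

# Hypothetical sample: two nearby points and one isolated point
temp_coors = [[0.0, 0.0], [0.1, 0.0], [5.0, 5.0]]
coloring = [1.0, 3.0, 10.0]
smoothed = radius_average(temp_coors, coloring, radius=0.5)
```

The two nearby points end up with their common mean, while the isolated point keeps its own value.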
