Correlation heatmap in Bokeh (Python)

I looked around for a simpler way to plot a correlation matrix as a heatmap in Bokeh, but could not find much help.
Let's say I have a correlation DataFrame created with:
corr_df = df.corr()
Can you please assist how I can use the continuous number range in this df to reflect the color intensity in bokeh?
I understand that for the X and Y axes I will first have to pull the unique column names into a factors list:
factors = ["A","B","C"]
x = ["A","A","A","B"]
y = ["A","B","C","C"]
Is there an easy way to do all of this?
I can do all of this in seaborn with just a single line function.
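One way to do this, sketched under the assumption of a recent Bokeh release (2.x or 3.x) plus pandas: reshape the square matrix into long form (one row per x/y pair) with pandas melt, draw one rect per cell, and map the correlation onto a LinearColorMapper fixed to [-1, 1]. The helper names corr_to_long and corr_heatmap are hypothetical. The reshape produces the x, y and factors lists from the question automatically, so you do not need to build them by hand.

```python
import pandas as pd


def corr_to_long(corr_df):
    """Reshape a square correlation matrix into one row per (x, y) cell."""
    long_df = corr_df.copy()
    long_df.index.name = "x"
    long_df = long_df.reset_index().melt(id_vars="x", var_name="y", value_name="value")
    # Bokeh categorical axes need string factors.
    long_df["x"] = long_df["x"].astype(str)
    long_df["y"] = long_df["y"].astype(str)
    return long_df


def corr_heatmap(corr_df):
    """Build a Bokeh figure that colors each cell by its correlation in [-1, 1]."""
    # Imported here so corr_to_long() can be used without Bokeh installed.
    from bokeh.models import ColorBar, ColumnDataSource, LinearColorMapper
    from bokeh.palettes import RdBu11
    from bokeh.plotting import figure
    from bokeh.transform import transform

    long_df = corr_to_long(corr_df)
    factors = [str(c) for c in corr_df.columns]
    # RdBu11 runs from dark red to dark blue: -1 is red, +1 is blue.
    mapper = LinearColorMapper(palette=RdBu11, low=-1, high=1)
    p = figure(
        x_range=factors,
        y_range=list(reversed(factors)),  # first column at the top, like seaborn
        tools="hover",
        tooltips=[("pair", "@x / @y"), ("corr", "@value{0.00}")],
    )
    p.rect(x="x", y="y", width=1, height=1, source=ColumnDataSource(long_df),
           fill_color=transform("value", mapper), line_color=None)
    p.add_layout(ColorBar(color_mapper=mapper), "right")
    return p
```

Usage: from bokeh.plotting import show, then show(corr_heatmap(df.corr())). Compared with seaborn, the extra work is mostly the long-form reshape. Fixing low/high to -1 and 1 keeps colors comparable across different matrices.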

Related

Folium Heatmap generate legend based on geopoint occurrence

I am attempting to generate a folium heatmap based on geopoints and on how often each geopoint appears in my dataset. To be honest, it seems that the count of how often each geopoint appears does not affect my heatmap.
In addition, I need a legend so that everybody can read my heatmap.
My data is saved in a pandas dataframe with following columns:
Latitude Longitude count
The count column holds how often each Latitude/Longitude point occurs in the dataset.
If I generate the heatmap data like this:
heat_data2 = [[row['Latitude'], row['Longitude'], row['count']] for index, row in df.iterrows()]
it seems that the count is not taken into account. I noticed this when I added a legend like this:
import branca.colormap
from collections import defaultdict

steps = 200
colormap = branca.colormap.linear.YlOrRd_09.scale(0, 5000).to_step(steps)
gradient_map = defaultdict(dict)
for i in range(steps):
    gradient_map[1 / steps * i] = colormap.rgb_hex_str(1 / steps * i)
colormap.add_to(map)  # map is the folium.Map object
All points I see on the heatmap have the color of the lowest occurrence. How can I combine the count and the geopoints to get a heatmap that shows how often each point occurred?
I also appreciate any tips for better tools to generate heatmaps for geodata in python!
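A minimal sketch, assuming that leaflet.heat (the JavaScript library behind folium.plugins.HeatMap) treats a weight of 1.0 as full intensity by default. Under that assumption, raw counts in the thousands all saturate, so the sketch normalizes count to (0, 1] before passing it as the third element of each row. Separately, the legend above is scaled to 0..5000 but rgb_hex_str is queried with values below 1, so every gradient_map entry receives the lowest color. The sketch instead builds the HeatMap gradient from the same YlOrRd ramp that is used as the legend. The function names heat_rows and build_heatmap, the caption text and the zoom level are hypothetical.

```python
import pandas as pd


def heat_rows(df):
    """[[lat, lon, weight], ...] with weight = count / max(count), i.e. in (0, 1]."""
    weights = df["count"] / df["count"].max()
    return [
        [float(lat), float(lon), float(w)]
        for lat, lon, w in zip(df["Latitude"], df["Longitude"], weights)
    ]


def build_heatmap(df, steps=10):
    """Folium map whose heat layer and legend share the YlOrRd color ramp."""
    # Imported here so heat_rows() can be used without folium/branca installed.
    import branca.colormap as cm
    import folium
    from folium.plugins import HeatMap

    max_count = float(df["count"].max())
    legend = cm.linear.YlOrRd_09.scale(0, max_count)
    legend.caption = "Occurrences per point"
    # Gradient keys are normalized intensities in [0, 1]; each gets the legend color
    # for the matching raw count.
    gradient = {i / steps: legend.rgb_hex_str(max_count * i / steps) for i in range(steps + 1)}

    center = [float(df["Latitude"].mean()), float(df["Longitude"].mean())]
    m = folium.Map(location=center, zoom_start=6)
    HeatMap(heat_rows(df), gradient=gradient).add_to(m)
    legend.add_to(m)
    return m
```

A caveat on the design: a heat layer blends nearby points, so the rendered color reflects both weight and local density. Treat the legend as an approximate guide rather than an exact count per pixel.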

How to plot a heatmap on x-y plane?

I created this DataFrame (attached). It has 3 columns: x location, y location, and the performance ratio for different systems. I am trying to make a heatmap that treats the x-y values as points on x-y coordinates and uses the last column as the heatmap value.
I have tried this:
hm = sns.kdeplot(inv_positions['x'], inv_positions['y'], shade=True)
but since this function only takes two columns I can not plot the values column on x-y plane.
Any ideas how to do that?
Joooeey's comment provides a good example using a different method; here is another question more specifically using the same library as you: Generating heat map in seaborn
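Another option, sketched under the assumption that the columns are named x, y and a hypothetical ratio. kdeplot estimates point density and ignores a third value column. If the points lie on a regular grid, you can instead pivot the rows into a y-by-x table and pass that to seaborn's heatmap. If they are scattered irregularly, a color-mapped scatter plot displays the value directly. The function names are illustrative.

```python
import pandas as pd


def to_grid(df, value_col="ratio"):
    """Pivot (x, y, value) rows into a y-by-x grid; repeated (x, y) pairs are averaged."""
    return df.pivot_table(index="y", columns="x", values=value_col, aggfunc="mean")


def plot_xy_heatmap(df, value_col="ratio"):
    # Imported here so to_grid() can be used without plotting libraries installed.
    import matplotlib.pyplot as plt
    import seaborn as sns

    ax = sns.heatmap(to_grid(df, value_col), cmap="viridis")
    ax.invert_yaxis()  # smallest y at the bottom, as on an x-y plane
    plt.show()
    return ax


def plot_xy_scatter(df, value_col="ratio"):
    """Alternative for irregularly placed points: color each point by its value."""
    import matplotlib.pyplot as plt

    sc = plt.scatter(df["x"], df["y"], c=df[value_col], cmap="viridis")
    plt.colorbar(sc, label=value_col)
    plt.xlabel("x")
    plt.ylabel("y")
    plt.show()
```

Cells with no matching (x, y) pair become NaN in the pivot, and sns.heatmap leaves them blank.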

How can we plot a line chart over repeating non-numeric column values in Python, using information from more than two columns?

Consider the dataset shown in the attached image. Suppose we have to plot x = TIME and y = Value; the plot should place countries in the graph for particular quarters and values, so three columns interact with each other. We are trying to represent countries on the TIME and Value axes. I am trying to find an alternative that does not use one-hot encoding.
When trying to plot the data using this code:
x = x.sort_values(by = ['TIME'])
x[['TIME', 'Value']].plot(x="TIME", y = "Value", kind="bar")
The quarters are getting repeated in the x-axis.
Can you explain how we can deal with such scenarios?
(A sample of the dataset was attached as an image.)
One possible solution would be to convert the values using one-hot encoding and then plot them.
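One alternative to one-hot encoding is to pivot the table so that each country becomes its own column, with one row per quarter, and then plot one line per column. The sketch below assumes a country column with the hypothetical name LOCATION; substitute the real column name. The quarters repeat on the x-axis in the original code because every (country, quarter) row is drawn as a separate bar. After the pivot, each quarter appears once.

```python
import pandas as pd


def wide_by_country(df, country_col="LOCATION"):
    """One row per TIME value, one column per country; duplicates are averaged."""
    return (
        df.pivot_table(index="TIME", columns=country_col, values="Value", aggfunc="mean")
        .sort_index()
    )


def plot_by_country(df, country_col="LOCATION"):
    wide = wide_by_country(df, country_col)
    ax = wide.plot(marker="o")  # one line per country; each quarter appears once on x
    ax.set_ylabel("Value")
    return ax
```

If you prefer bars, wide.plot(kind="bar") draws grouped bars per quarter from the same pivoted table.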

Python - produce conditional average of variable 1 based on variable 2 with numpy?

I'm trying to make some basic plots so I can better understand what is happening in my data. Currently I have 4 variables, each with 200*387 data points. I've stored everything in a 3D array, with the 3rd dimension representing the different variables.
So far I have produced some scatterplots of var1 vs. var2. However, I would like to add a conditional mean curve on top of each scatterplot: the average var1 (y-axis) value for any given var2 (x-axis) value. I am quite new to Python, so I am pretty sure that the way I am currently thinking of approaching this is far from the most efficient.
What I'm thinking at the moment is that I can vectorise the data for each variable (i.e. make it 1D) and then create bins of var2 of some reasonable size and then find the average of var1 for each of these bins. I store these averages in some new vector and then plot that.
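The binning approach described above can be written directly in NumPy without pandas. This is a minimal sketch that uses equal-width bins over the range of x, with np.digitize to assign bins and np.bincount to sum per bin. The function name and the default of 80 bins are illustrative. Empty bins come out as NaN.

```python
import numpy as np


def conditional_mean(x, y, n_bins=80):
    """Mean of y within equal-width bins of x. Returns (bin_centres, means)."""
    x = np.ravel(x)  # flatten 2-D/3-D slices into 1-D vectors
    y = np.ravel(y)
    edges = np.linspace(x.min(), x.max(), n_bins + 1)
    # digitize gives 1..n_bins for in-range values (n_bins + 1 for x.max());
    # shift to 0-based and clip so x.max() falls in the last bin.
    idx = np.clip(np.digitize(x, edges) - 1, 0, n_bins - 1)
    sums = np.bincount(idx, weights=y, minlength=n_bins)
    counts = np.bincount(idx, minlength=n_bins)
    with np.errstate(invalid="ignore", divide="ignore"):
        means = sums / counts  # 0 / 0 -> NaN for empty bins
    centres = 0.5 * (edges[:-1] + edges[1:])
    return centres, means
```

With the 3D array from the question, and assuming (hypothetically) that var1 is index 0 and var2 is index 1 along the last axis, the call would be centres, means = conditional_mean(data[..., 1], data[..., 0]). This assumes x is not constant; otherwise all bin edges coincide.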
Is this a very stupid way of doing this? From what I've searched, it seems that pandas may have a simple way of doing this, but given how new to Python I am, I'm also not sure whether going straight to pandas would be overkill.
Thank you in advance for any and all responses!
Thank you for the responses. Re-reading my question I've realised that it was pretty poorly worded, so my apologies for that.
I found my solution; it was pretty simple in the end. There was no need to use pandas and convert the arrays to DataFrames. I ended up just using the binned_statistic function from scipy.stats. My code was effectively just:
from scipy.stats import binned_statistic
n_bins = 80
# var2 and var1 must be 1-D arrays of equal length (use .ravel() on 3-D slices)
cond_means, bin_edges, binnumber = binned_statistic(var2, var1, statistic='mean', bins=n_bins)
Where again var2 is the independent (x-axis) variable and var1 is the dependent (y-axis) variable.
For anyone who is also interested in using this for conditional mean plots, be aware that binned_statistic returns bin edges, not bin centres. This means that bin_edges will always have one more element than cond_means. An easy fix (for the equal-width bins produced by an integer bins argument) is:
bin_width = bin_edges[1] - bin_edges[0]
bin_centres = bin_edges[1:] - bin_width/2
You should now be able to plot your conditional mean simply as:
import matplotlib.pyplot as plt
fig1 = plt.figure()
plt.scatter(var2, var1, color = 'blue', label = 'raw data')
plt.plot(bin_centres, cond_means, color = 'black', label = 'Conditional mean')
plt.legend()
plt.xlabel('var2')
plt.ylabel('var1')
plt.show()

Plotting histogram using clusters

I have tried to research this problem, but failed. I'm quite a beginner at python, so bear with me.
I have a text file containing one number per line (they are angles in degrees).
I want to first cluster the angles into cluster sizes of 20. Then I want to plot this on a histogram. I have the following code:
import numpy
from cluster import HierarchicalClustering  # from the python-cluster package

with open(output_dir + '/chi_angle.txt') as f:
    angle = f.read().splitlines()
array = numpy.array([float(a) for a in angle])  # map() is lazy in Python 3
hello = list(array)
cl = HierarchicalClustering(hello, lambda x, y: abs(x - y))
clusters = cl.getlevel(20)
frequency = [len(x) for x in clusters]
average = [1.0 * sum(x) / len(x) for x in clusters]
Now, my question is: how do I plot the histogram?
Doing the following:
pylab.hist(average, bins=50)
pylab.xlabel('Chi 1 Angle [degrees]')
pylab.ylabel('#')
pylab.show()
will show a histogram with bars correctly placed (i.e. at the average of each cluster), but it won't show how many angles each cluster contains.
Just for clarification. The clustered data looks like this:
clusters = [[-60.26, -30.26, -45.24], [163.24, 173.24], [133.2, 123.23, 121.23]]
I want the mean of each cluster and the number of angles in each cluster. On the histogram, the first bar should thus be located at around -50 and have a height of 3. How do I plot this?
Thanks a lot!
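What is described here is a bar chart rather than a histogram: one bar per cluster, placed at the cluster mean, with height equal to the cluster size. This is a minimal sketch using matplotlib's bar. The helper names and the 5-degree bar width are illustrative choices. An alternative with the existing code is pylab.hist(average, bins=50, weights=frequency), which makes each cluster count for its size instead of 1.

```python
def cluster_summary(clusters):
    """Return (means, sizes): the mean angle and number of angles per cluster."""
    means = [sum(c) / len(c) for c in clusters]
    sizes = [len(c) for c in clusters]
    return means, sizes


def plot_clusters(clusters, bar_width=5.0):
    # Imported here so cluster_summary() can be used without matplotlib installed.
    import matplotlib.pyplot as plt

    means, sizes = cluster_summary(clusters)
    plt.bar(means, sizes, width=bar_width)  # one bar per cluster, centred on its mean
    plt.xlabel("Chi 1 Angle [degrees]")
    plt.ylabel("Number of angles in cluster")
    plt.show()
```

For the example clusters, the first bar sits at about -45.25 with height 3.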
Not sure I understood your question. Anyhow, try saving your histogram in this array:
H = hist(average, bins=50)
If you want to plot it, then do:
plot(0.5 * (H[1][:-1] + H[1][1:]), H[0])
H[0] holds the counts in each bin, and H[1] holds the bin edges, which have one more element than H[0]; averaging neighbouring edges gives the bin centres. I hope this helps.
Why don't you just use a histogram right away?
A histogram of cluster centers is not a very sensible representation of your data.
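Following that suggestion, a plain histogram of the raw angles with fixed 20-degree bins gives the count per angular range directly, without any clustering step. This sketch assumes that the chi angles lie in [-180, 180] degrees and that "cluster sizes of 20" means 20-degree-wide bins; both are interpretations, not details from the question. The function name is illustrative.

```python
import numpy as np


def angle_histogram(angles, width=20.0):
    """Counts of angles in fixed-width bins spanning [-180, 180] degrees."""
    edges = np.arange(-180.0, 180.0 + width, width)  # -180, -160, ..., 180
    counts, edges = np.histogram(angles, bins=edges)
    return counts, edges
```

To plot the result, call plt.bar(edges[:-1], counts, width=20, align="edge"), or pass the same edges to plt.hist(angles, bins=edges).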
