So I'm still getting to grips with Python after coming over from R recently.
I'm struggling to automatically annotate plot points with text from a DataFrame column, which is easily done in R.
I was helped the other day on the same matter for Matplotlib scatter plots.
But I've been tearing my hair out trying to figure this out. I'll add some random data, and show a picture of the sort of thing I'm after.
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import matplotlib as mpl
import seaborn as sns
d = {'Player': ['Messi', 'Ronaldo','Mbappe','Kovacic', 'Werner','Salah'], '% of Squad pass': [3.2,3.2,4.4,9.9,7.4,4.8]}
df = pd.DataFrame(data = d)
This is what I'm doing at the minute.
fig, ax = plt.subplots(1,1, figsize=(4,4))
sns.swarmplot(data=df, x ='% of Squad pass', ax = ax)
Which gets me this,
Is there a loop function I can use that will automatically annotate the plot points with text from the 'Player' column in the dataframe?
So I'd end with something like this
Thanks and hopefully this will be my last question on the matter!
This is my proposal. You need to import the random library.
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import matplotlib as mpl
import seaborn as sns
import random
d = {'Player': ['Messi', 'Ronaldo','Mbappe','Kovacic', 'Werner','Salah'], '% of Squad pass': [3.2,3.2,4.4,9.9,7.4,4.8]}
df = pd.DataFrame(data = d)
fig, ax = plt.subplots()
sns.swarmplot(data = df, x=df['% of Squad pass'], ax = ax)
for i, j in enumerate(df['% of Squad pass']):
    plt.annotate(df['Player'][i],
                 xy=(df['% of Squad pass'][i],0),
                 xytext=(df['% of Squad pass'][i], random.uniform(0.2,0.4)),
                 arrowprops=dict(arrowstyle="->"))
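Because random.uniform picks each label height independently, two labels can still land on top of each other (Messi and Ronaldo share x = 3.2). Below is a minimal sketch of a deterministic alternative. It assumes plain matplotlib points at y = 0 as a stand-in for the swarm, and the helper name staggered_heights is hypothetical, not part of any library.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend so the sketch runs without a display
import matplotlib.pyplot as plt
import pandas as pd

d = {'Player': ['Messi', 'Ronaldo', 'Mbappe', 'Kovacic', 'Werner', 'Salah'],
     '% of Squad pass': [3.2, 3.2, 4.4, 9.9, 7.4, 4.8]}
df = pd.DataFrame(data=d)

def staggered_heights(n, levels=(0.2, 0.3, 0.4)):
    """Hypothetical helper: cycle through a fixed set of label heights."""
    return [levels[i % len(levels)] for i in range(n)]

fig, ax = plt.subplots()
ax.scatter(df['% of Squad pass'], [0] * len(df))  # stand-in for the swarm points
ax.set_ylim(-0.1, 0.5)

# Sort by x so that neighbouring points receive different heights
ordered = df.sort_values('% of Squad pass').reset_index(drop=True)
heights = staggered_heights(len(ordered))
labels = []
for (_, row), h in zip(ordered.iterrows(), heights):
    x = row['% of Squad pass']
    labels.append(ax.annotate(row['Player'], xy=(x, 0), xytext=(x, h),
                              ha='center', arrowprops=dict(arrowstyle="->")))
```

Cycling through three heights is an arbitrary choice; more levels may be needed when many points share similar x values.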
When I run the code below, the heatmap does not come out square even though I passed square=True. Any idea how I can draw the heatmap in a square format? Thank you!
The code:
from datetime import datetime
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import os
import seaborn as sns
temp_hourly_A5_A7_AX_ASHRAE=pd.read_csv('C:\\Users\\cvaa4\\Desktop\\projects\\s\\temp_hourly_A5_A7_AX_ASHRAE.csv',index_col=0, parse_dates=True, dayfirst=True, skiprows=2)
sns.heatmap(temp_hourly_A5_A7_AX_ASHRAE,cmap="YlGnBu", vmin=18, vmax=27, square=True, cbar=False, linewidth=0.0001);
The result:
square=True should work to have square cells, below is a working example:
import pandas as pd
import numpy as np
import seaborn as sns
df = pd.DataFrame(np.tile([0,1], 15*15).reshape(-1,15))
sns.heatmap(df, square=True)
If you want a square shape of the plot however, you can use set_aspect and the shape of the data:
ax = sns.heatmap(df)
ax.set_aspect(df.shape[1]/df.shape[0]) # here 0.5 Y/X ratio
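The ratio works because set_aspect fixes how long one y data unit is drawn relative to one x data unit. A small sketch of the arithmetic, using plain matplotlib pcolormesh as a stand-in for sns.heatmap (an assumption made only to keep the example dependency-light):

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend so the sketch runs without a display
import matplotlib.pyplot as plt
import numpy as np

data = np.tile([0, 1], 15 * 15).reshape(-1, 15)  # same 30 x 15 grid as above
n_rows, n_cols = data.shape

fig, ax = plt.subplots()
ax.pcolormesh(data)        # stand-in for sns.heatmap(df)
aspect = n_cols / n_rows   # 15 / 30 = 0.5
ax.set_aspect(aspect)

# With this aspect, the drawn height (rows * aspect) matches the drawn width (cols)
drawn_height = n_rows * aspect
```

Each cell is then twice as wide as it is tall, which is the price of forcing a 30 x 15 grid into a square outline.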
You can use matplotlib and set a figsize before plotting heatmap.
import numpy as np
import seaborn as sns
import matplotlib.pyplot as plt
rnd = np.random.default_rng(12345)
data = rnd.uniform(-100, 100, [100, 50])
plt.figure(figsize=(6, 5))
sns.heatmap(data, cmap='viridis');
Note that I used figsize=(6, 5) rather than a square figsize=(5, 5). This is because seaborn also has to fit the colorbar inside the given figure size, which would otherwise squish the heatmap horizontally. You may need to adjust the figsize depending on what you need.
I want to change the labels [2,3,4,5] from my pie chart and instead have them say [Boomer, Gen X, Gen Y, Gen Z] respectively. I can't seem to find a direct way of doing this without changing the dataframe. Is there any way to do this by working through the code I have?
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns
%matplotlib inline
data = df.groupby("Q10_Ans")["Q4_Agree"].count()
pie, ax = plt.subplots(figsize=[10,6])
labels = data.keys()
plt.pie(x=data, autopct="%.1f%%", explode=[0.05]*4, labels=labels, pctdistance=0.5)
plt.title("Generations that agree data visualization will help with job prospects", fontsize=14);
pie.savefig("DeliveryPieChart.png")
How about changing the line
labels = data.keys()
to
labels = ['Boomer','Gen X','Gen Y','Gen Z']
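A hard-coded list relies on the groups appearing in exactly the order 2, 3, 4, 5. A minimal sketch that looks each label up by its key instead, leaving the dataframe untouched; the counts and the dictionary name generation_names are made up for illustration:

```python
import pandas as pd

# Made-up stand-in for df.groupby("Q10_Ans")["Q4_Agree"].count()
data = pd.Series([12, 30, 25, 8], index=pd.Index([2, 3, 4, 5], name="Q10_Ans"))

# Hypothetical mapping from answer code to generation name
generation_names = {2: 'Boomer', 3: 'Gen X', 4: 'Gen Y', 5: 'Gen Z'}

# Build the labels in whatever order the grouped data happens to be in
labels = [generation_names[key] for key in data.keys()]
```

The resulting labels list can be passed to plt.pie(..., labels=labels) exactly as in the original code.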
I don't know the data structure of your data, so I made a sample data and created a pie chart. Please modify your code to follow this.
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns
%matplotlib inline
# data = df.groupby("Q10_Ans")["Q4_Agree"].count()
data = pd.DataFrame({'Q10_Ans':['Boomer','Gen X','Gen Y','Gen Z'],'Q4_Agree':[2,3,4,5]})
fig, ax = plt.subplots(figsize=[10,6])
labels = data['Q10_Ans']
ax.pie(x=data['Q4_Agree'], autopct="%.1f%%", explode=[0.05]*4, labels=labels, pctdistance=0.5)
ax.set_title("Generations that agree data visualization will help with job prospects", fontsize=14);
plt.savefig("DeliveryPieChart.png")
I have categorized data. At specific dates I have data (A to E) that is counted every 15 minutes.
When I want to plot with seaborn I get this:
Bigger bubbles cover smaller ones and the whole plot is not easily readable (e.g. 2020-05-12 at 21:15). Is it possible to display the bubbles for each 15-minute class next to each other with a little bit of overlap?
My code:
import pandas as pd
import seaborn as sns
import matplotlib
import matplotlib.pyplot as plt
import os
df = pd.read_csv("test_df.csv")
#print(df)
sns.set_theme()
sns.scatterplot(
data = df,
x = "date",
y = "time",
hue = "category",
size = "amount",sizes=(15, 200)
)
plt.gca().invert_yaxis()
plt.show()
My CSV file:
date,time,amount,category
2020-05-12,21:15,13,A
2020-05-12,21:15,2,B
2020-05-12,21:15,5,C
2020-05-12,21:15,1,D
2020-05-12,21:30,4,A
2020-05-12,21:30,2,C
2020-05-12,21:30,1,D
2020-05-12,21:45,3,B
2020-05-12,22:15,4,A
2020-05-12,22:15,2,D
2020-05-12,22:15,9,E
2020-05-12,00:15,21,D
2020-05-12,00:30,11,E
2020-05-12,04:15,7,A
2020-05-12,04:30,1,B
2020-05-12,04:30,2,C
2020-05-12,04:45,1,A
2020-05-14,21:15,1,A
2020-05-14,21:15,5,C
2020-05-14,21:15,3,D
2020-05-14,21:30,4,A
2020-05-14,21:30,1,D
2020-05-14,21:45,5,B
2020-05-14,22:15,4,A
2020-05-14,22:15,11,E
2020-05-14,00:15,2,D
2020-05-14,00:30,11,E
2020-05-14,04:15,9,A
2020-05-14,04:30,11,B
2020-05-14,04:30,5,C
2020-05-14,05:00,7,A
You can use a seaborn swarmplot for this. You first have to separate the "amount" column into separate entries, using .reindex and .repeat. Then you can plot.
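To see what the .reindex and .repeat step does in isolation, here is a small sketch on a two-row frame (the rows are a cut-down, illustrative version of the CSV above):

```python
import pandas as pd

df = pd.DataFrame({
    "date": ["2020-05-12", "2020-05-12"],
    "time": ["21:15", "21:15"],
    "amount": [3, 1],
    "category": ["A", "B"],
})

# Repeat each index label "amount" times, then pull those rows back out,
# so a row with amount 3 becomes three identical rows.
expanded = df.reindex(df.index.repeat(df.amount))
print(expanded)
```

The expanded frame keeps the repeated index labels; calling reset_index(drop=True) afterwards gives a unique index if one is needed.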
Here is the code:
import matplotlib
import matplotlib.pyplot as plt
import pandas as pd
import seaborn as sns
import os
df = pd.read_csv("test_df.csv")
df = df.reindex(df.index.repeat(df.amount))
sns.swarmplot(data = df, x = "date", y = "time", hue = "category")
plt.gca().invert_yaxis()
plt.show()
Here is the output:
I have a scatter plot I'm working with, and for some reason I'm not seeing all the x values on my graph.
#%%
from pandas import DataFrame, read_csv
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns
file = r"re2.csv"
df = pd.read_csv(file)
#sns.set(rc={'figure.figsize':(11.7,8.27)})
g = sns.FacetGrid(df, col='city')
g.map(plt.scatter, 'type', 'price').add_legend()
This is an image of a small subset of my plots, you can see that Res is displaying, the middle bar should be displaying Con and the last would be Mlt. These are all defined in the type column from my data set but are not displaying.
Any clue how to fix?
Python is doing what you tell it to do. If you want to generate more interesting plots, just pick different features, presumably ones that make more sense for plotting. See the generic example below.
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns
sns.set_theme(style="darkgrid")
tips = sns.load_dataset("tips")
sns.relplot(x="total_bill", y="tip", hue="smoker", data=tips);
Personally, I like plotly plots, which are dynamic, more than I like seaborn plots.
https://plotly.com/python/line-and-scatter/
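Returning to the missing Con and Mlt categories: one possible cause (an assumption, since re2.csv is not shown) is that each facet plots only its own subset of string categories, so the x positions are not guaranteed to line up across facets. A hedged sketch that maps the type column to fixed numeric positions before plotting, using plain matplotlib subplots and made-up sample rows:

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend so the sketch runs without a display
import matplotlib.pyplot as plt
import pandas as pd

# Made-up sample; the real re2.csv is not shown in the question
df = pd.DataFrame({
    "city":  ["X", "X", "X", "Y", "Y"],
    "type":  ["Res", "Con", "Mlt", "Con", "Mlt"],
    "price": [100, 250, 180, 300, 220],
})

# Fix the category order once so every panel uses the same x positions
type_order = ["Res", "Con", "Mlt"]
df["type_code"] = pd.Categorical(df["type"], categories=type_order).codes

cities = sorted(df["city"].unique())
fig, axes = plt.subplots(1, len(cities), sharex=True, sharey=True, squeeze=False)
for ax, city in zip(axes[0], cities):
    subset = df[df["city"] == city]
    ax.scatter(subset["type_code"], subset["price"])
    ax.set_title(city)
    ax.set_xticks(range(len(type_order)))
    ax.set_xticklabels(type_order)
```

With seaborn itself, a categorical plotting function with an explicit order argument would serve the same purpose; the sketch above only shows the underlying idea.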
Suppose I want to plot two histogram subplots in the same window in Python, one below the other. The data for these histograms will be read from a file containing a table with attributes A and B.
In the same window, I need a plot of A vs the number of each A and a plot of B vs the number of each B - directly below the plot of A. so suppose the attributes were height and weight, then we'd have a graph of height and number of people with said height and below it a separate graph of weight and number of people with said weight.
import numpy as np; import pandas as pd
import matplotlib
import matplotlib.pyplot as plt
frame = pd.read_csv('data.data', header=None)
subplot.hist(frame['A'], frame['A.count()'])
subplot.hist(frame['B'], frame['B.count()'])
Thanks for any help!
Using pandas you can make histograms like this:
import numpy as np; import pandas as pd
import matplotlib.pyplot as plt
frame = pd.read_csv('data.csv')
frame.hist(layout = (2,1))
plt.show()
I'm confused by the second part of the question. Do you want four separate subplots?
You can do this:
import numpy as np
import numpy.random
import pandas as pd
import matplotlib.pyplot as plt
#df = pd.read_csv('data.data', header=None)
# randint excludes the upper bound, so 11 gives integers from 0 to 10
df = pd.DataFrame({'A': numpy.random.randint(0, 11, 30),
                   'B': numpy.random.randint(0, 11, 30)})
print(df['A'])
ax1 = plt.subplot(211)
ax1.set_title('A')
ax1.set_ylabel('number of people')
ax1.set_xlabel('height')
ax2 = plt.subplot(212)
ax2.set_title('B')
ax2.set_ylabel('number of people')
ax2.set_xlabel('weight')
ax1.hist(df['A'])
ax2.hist(df['B'])
plt.tight_layout()
plt.show()
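If exact counts per distinct value are wanted ("the number of each A") rather than binned histograms, pandas value_counts gives them directly. A minimal sketch with made-up heights standing in for column A of the real file:

```python
import pandas as pd

# Made-up heights; stands in for column A of the real file
df = pd.DataFrame({'A': [170, 165, 170, 180, 165, 170]})

# Count how often each distinct value occurs, ordered by value
counts = df['A'].value_counts().sort_index()
print(counts)
```

Calling counts.plot(kind='bar', ax=ax1) would then draw one bar per distinct value on an existing subplot.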