I have a dataset like the one below:
import matplotlib.pyplot as plt
import pandas as pd
import numpy as np
disease_type = list(np.random.choice(['TB','P'],100))
gender = list(np.random.choice(['M','F'],100))
# Named `data` rather than `dict` so the built-in dict type is not shadowed
data = {'Disease Type': disease_type, 'Gender': gender}
dt = pd.DataFrame(data)
I would like to generate a bar chart using pyplot that shows the different disease types by gender, something like the image below:
I understand that I can do a groupby like this:
dt = dt.groupby(['Gender'], as_index=False).count()
But I don't know how to feed the result to pyplot.
I tried the following code for visualization but it did not work for me:
fig= plt.Figure(figsize=(10,10))
ax = fig.add_axes([0.1,0.1,0.8,0.8])
ax.bar(height=dt['Disease Type'])
plt.show()
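The attempt above fails for two reasons: `plt.Figure(...)` creates a bare `matplotlib.figure.Figure` that pyplot does not manage, so `plt.show()` has nothing to display, and `Axes.bar` needs both `x` positions and `height`. A minimal sketch of one way to get a grouped bar chart, assuming a count of patients per (Gender, Disease Type) pair is what you want, uses `pd.crosstab` instead of the `groupby(...).count()` step and lets pandas draw onto a pyplot-managed axes. The seed and axis labels are illustrative choices, not part of the question.

```python
import matplotlib
matplotlib.use("Agg")  # headless backend; drop this line in a notebook or GUI session
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd

# Same shape of data as in the question, with a fixed seed so the result is reproducible
rng = np.random.default_rng(0)
dt = pd.DataFrame({
    'Disease Type': rng.choice(['TB', 'P'], 100),
    'Gender': rng.choice(['M', 'F'], 100),
})

# Count each (Gender, Disease Type) combination: rows = Gender, columns = Disease Type
counts = pd.crosstab(dt['Gender'], dt['Disease Type'])

# plt.subplots() creates a figure that pyplot manages, unlike plt.Figure(...)
fig, ax = plt.subplots(figsize=(8, 6))
counts.plot(kind='bar', ax=ax, rot=0)  # one group per gender, one bar per disease type
ax.set_xlabel('Gender')
ax.set_ylabel('Number of patients')
ax.legend(title='Disease Type')
plt.show()
```

`seaborn.countplot(data=dt, x='Gender', hue='Disease Type')` should give a similar chart without the explicit counting step, if you already use seaborn.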
I am trying to plot a density chart. Below you can see the data and the chart.
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
data = {'type_sale':[100,200,400,400,200,400,300,200,210,300],
'bool':[0,1,0,1,1,0,1,1,0,1],
}
df1 = pd.DataFrame(data, columns = ['type_sale',
'bool'])
df1['bool']= df1['bool'].astype('int32')
I tried the command below, but it is not working. Can anybody help me solve this problem?
plot_density_chart(df1[['type_sale', 'bool']], "bool", 'type_sale',
category_var="type_sale", title='prevalence',
xlabel='Type_sale', logx="Yes", vline=None,
save_figure_name = 'type_sale_prevalence.pdf')
You can use seaborn to plot the density chart:
import seaborn as sns
g = sns.FacetGrid(df1,hue='bool')
g = g.map(sns.kdeplot,'type_sale',fill=True,alpha=0.3)
g.add_legend()
g.fig.suptitle('Prevalence', fontsize=16)
g.axes[0,0].set_xlabel('Type_sale')
Which gives you the figure:
If you want a log-scaled x-axis, add this:
g.axes[0,0].set_xscale('log')
I have written a code that looks like this:
import numpy as np
import pandas as pd
import seaborn as sns
import matplotlib.pyplot as plt
T = np.array([10.03,100.348,1023.385])
power1 = np.array([100000,86000,73000])
power2 = np.array([1008000,95000,1009000])
df1 = pd.DataFrame(data = {'Size': T, 'Encrypt_Time': power1, 'Decrypt_Time': power2})
exp1= sns.lineplot(data=df1)
plt.savefig('exp1.png')
exp1_smooth= sns.lmplot(x='Size', y='Time', data=df, ci=None, order=4, truncate=False)
plt.savefig('exp1_smooth.png')
That gives me Graph_1:
The Size line looks like a constant line, but as you can see in my code its values range from about 10 to about 1000.
Why does this produce a constant line? I want a multi-line graph with x-axis = Size (T) and y-axis = Encrypt_Time and Decrypt_Time (power1 and power2).
I also want a smoothed version of the same graph, but my attempt gives me an error. What needs to be done to get a smooth multi-line graph with x-axis = Size (T) and y-axis = Encrypt_Time and Decrypt_Time (power1 and power2)?
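I don't think that is the issue: the line representing Size looks constant, but it is not.
When you pass the whole wide-form dataframe to sns.lineplot, every column, including Size, is drawn as a separate line against the row index, so Size is plotted as a y-series rather than used as the x-axis. Its values are in the range 10-1000, while the smallest y-axis division is 20,000 (20 times bigger), which makes it look like a horizontal line on your graph.
You can try larger values to see the slope clearly.
If you want Size as the x-axis, you can try the example below: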
import numpy as np
import pandas as pd
import seaborn as sns
import matplotlib.pyplot as plt
T = np.array([10.03,100.348,1023.385])
power1 = np.array([100000,86000,73000])
power2 = np.array([1008000,95000,1009000])
df1 = pd.DataFrame(data = {'Size': T, 'Encrypt_Time': power1, 'Decrypt_Time': power2})
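fig, ax = plt.subplots()
# Plot each timing column against Size on the same axes
sns.lineplot(data=df1, x='Size', y='Encrypt_Time', ax=ax, label='Encrypt_Time')
sns.lineplot(data=df1, x='Size', y='Decrypt_Time', ax=ax, label='Decrypt_Time')
ax.set_ylabel('Time')
plt.show()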
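For the smoothing part, the `lmplot` call in the question fails because neither `df` nor a `Time` column exists. A minimal sketch, assuming a polynomial fit is an acceptable notion of "smooth": melt the two timing columns into one `Time` column plus an `Operation` column (both names are illustrative), then let `lmplot` fit one curve per operation via `hue`. With only three points per series, `order=2` already passes exactly through them; the `order=4` from the question cannot be fit meaningfully, and NumPy's `polyfit` will warn about it.

```python
import matplotlib
matplotlib.use("Agg")  # headless backend; drop this line in a notebook or GUI session
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd
import seaborn as sns

T = np.array([10.03, 100.348, 1023.385])
power1 = np.array([100000, 86000, 73000])
power2 = np.array([1008000, 95000, 1009000])
df1 = pd.DataFrame({'Size': T, 'Encrypt_Time': power1, 'Decrypt_Time': power2})

# Long form: one 'Time' column plus an 'Operation' column naming the original series
long_df = df1.melt(id_vars='Size', value_vars=['Encrypt_Time', 'Decrypt_Time'],
                   var_name='Operation', value_name='Time')

# One quadratic fit per operation; ci=None skips the bootstrap confidence band
g = sns.lmplot(data=long_df, x='Size', y='Time', hue='Operation',
               order=2, ci=None, truncate=False)
g.savefig('exp1_smooth.png')
```

The same long-form frame also works with `sns.lineplot(data=long_df, x='Size', y='Time', hue='Operation')` for the unsmoothed multi-line version in a single call.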
I want to change the labels [2,3,4,5] on my pie chart so that they say [Boomer, Gen X, Gen Y, Gen Z] respectively. I can't find a direct way of doing this without changing the dataframe. Is there any way to do this by working through the code I have?
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns
%matplotlib inline
data = df.groupby("Q10_Ans")["Q4_Agree"].count()
pie, ax = plt.subplots(figsize=[10,6])
labels = data.keys()
plt.pie(x=data, autopct="%.1f%%", explode=[0.05]*4, labels=labels, pctdistance=0.5)
plt.title("Generations that agree data visualization will help with job prospects", fontsize=14);
pie.savefig("DeliveryPieChart.png")
How about changing the line
labels = data.keys()
to
labels = ['Boomer','Gen X','Gen Y','Gen Z']
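A hard-coded list relies on the grouped index being exactly 2, 3, 4, 5 in that order; if one code is missing from the data, every label after it shifts. A slightly more robust sketch, assuming the codes 2 to 5 mean Boomer to Gen Z as stated in the question, maps each index value to its name. The counts below are made up purely so the snippet runs on its own.

```python
import pandas as pd

# Hypothetical stand-in for df.groupby("Q10_Ans")["Q4_Agree"].count(): index holds the codes 2-5
data = pd.Series([12, 30, 25, 8], index=pd.Index([2, 3, 4, 5], name='Q10_Ans'))

# Translate each code to its generation name without touching the dataframe
generation_names = {2: 'Boomer', 3: 'Gen X', 4: 'Gen Y', 5: 'Gen Z'}
labels = data.index.map(generation_names)
print(list(labels))  # ['Boomer', 'Gen X', 'Gen Y', 'Gen Z']
```

Then pass `labels=labels` to `plt.pie` exactly as in the original code.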
I don't know the structure of your data, so I made some sample data and created a pie chart. Please adapt your code accordingly.
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns
%matplotlib inline
# data = df.groupby("Q10_Ans")["Q4_Agree"].count()
data = pd.DataFrame({'Q10_Ans':['Boomer','Gen X','Gen Y','Gen Z'],'Q4_Agree':[2,3,4,5]})
fig, ax = plt.subplots(figsize=[10,6])
labels = data['Q10_Ans']
ax.pie(x=data['Q4_Agree'], autopct="%.1f%%", explode=[0.05]*4, labels=labels, pctdistance=0.5)
ax.set_title("Generations that agree data visualization will help with job prospects", fontsize=14);
plt.savefig("DeliveryPieChart.png")
I have categorized data (categories A to E) that is counted every 15 minutes on specific dates.
When I plot it with seaborn, I get this:
Bigger bubbles cover smaller ones, and the whole plot is not easily readable (e.g. 2020-05-12 at 21:15). Is it possible to display the bubbles for each 15-minute class next to each other with a little bit of overlap?
My code:
import pandas as pd
import seaborn as sns
import matplotlib
import matplotlib.pyplot as plt
import os
df = pd.read_csv("test_df.csv")
#print(df)
sns.set_theme()
sns.scatterplot(
data = df,
x = "date",
y = "time",
hue = "category",
size = "amount",sizes=(15, 200)
)
plt.gca().invert_yaxis()
plt.show()
My CSV file:
date,time,amount,category
2020-05-12,21:15,13,A
2020-05-12,21:15,2,B
2020-05-12,21:15,5,C
2020-05-12,21:15,1,D
2020-05-12,21:30,4,A
2020-05-12,21:30,2,C
2020-05-12,21:30,1,D
2020-05-12,21:45,3,B
2020-05-12,22:15,4,A
2020-05-12,22:15,2,D
2020-05-12,22:15,9,E
2020-05-12,00:15,21,D
2020-05-12,00:30,11,E
2020-05-12,04:15,7,A
2020-05-12,04:30,1,B
2020-05-12,04:30,2,C
2020-05-12,04:45,1,A
2020-05-14,21:15,1,A
2020-05-14,21:15,5,C
2020-05-14,21:15,3,D
2020-05-14,21:30,4,A
2020-05-14,21:30,1,D
2020-05-14,21:45,5,B
2020-05-14,22:15,4,A
2020-05-14,22:15,11,E
2020-05-14,00:15,2,D
2020-05-14,00:30,11,E
2020-05-14,04:15,9,A
2020-05-14,04:30,11,B
2020-05-14,04:30,5,C
2020-05-14,05:00,7,A
You can use a seaborn swarmplot for this. First you have to expand each row into as many rows as its "amount" value, using .reindex together with .index.repeat, so that every counted item becomes its own point. Then you can plot.
Here is the code:
import matplotlib.pyplot as plt
import pandas as pd
import seaborn as sns
df = pd.read_csv("test_df.csv")
# Repeat each row `amount` times so every counted item becomes its own point
df = df.reindex(df.index.repeat(df.amount))
sns.swarmplot(data = df, x = "date", y = "time", hue = "category")
plt.gca().invert_yaxis()
plt.show()
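If you would rather keep one bubble per row with its size, as in the original scatterplot, another option is to "dodge" the bubbles by hand: give each date an integer x position and shift each category by a small fixed offset around it. This is a plain matplotlib sketch, not a seaborn feature; the band width of 0.6, the size factor of 15 and the alpha are arbitrary choices to tune, and only a few rows of the CSV are inlined so it runs on its own (use `pd.read_csv("test_df.csv")` with your file instead).

```python
import io
import matplotlib
matplotlib.use("Agg")  # headless backend; drop this line in a notebook or GUI session
import matplotlib.pyplot as plt
import pandas as pd

# A few rows from the question's CSV, inlined so the sketch is self-contained
csv = """date,time,amount,category
2020-05-12,21:15,13,A
2020-05-12,21:15,2,B
2020-05-12,21:15,5,C
2020-05-12,21:15,1,D
2020-05-14,21:15,1,A
2020-05-14,21:15,5,C
2020-05-14,21:15,3,D
"""
df = pd.read_csv(io.StringIO(csv))

# Integer positions for dates and time slots, in order of first appearance
date_pos = {d: i for i, d in enumerate(pd.unique(df['date']))}
time_pos = {t: i for i, t in enumerate(pd.unique(df['time']))}
categories = sorted(df['category'].unique())

# Spread the categories evenly across a band of width 0.6 centred on each date
width = 0.6
step = width / max(len(categories) - 1, 1)
offset = {c: -width / 2 + i * step for i, c in enumerate(categories)}

df['x'] = df['date'].map(date_pos) + df['category'].map(offset)
df['y'] = df['time'].map(time_pos)

fig, ax = plt.subplots()
for cat, grp in df.groupby('category'):
    # Marker area grows with amount; transparency keeps slight overlaps readable
    ax.scatter(grp['x'], grp['y'], s=grp['amount'] * 15, alpha=0.7, label=cat)
ax.set_xticks(list(date_pos.values()))
ax.set_xticklabels(list(date_pos.keys()))
ax.set_yticks(list(time_pos.values()))
ax.set_yticklabels(list(time_pos.keys()))
ax.invert_yaxis()
ax.legend(title='category')
plt.show()
```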
Here is the output:
I am trying to make a box-and-whisker plot from my dataset, which looks something like this:
and here is the chart I'm trying to make:
My current code is below:
import seaborn as sns
import matplotlib.pyplot as plt
d = df3.boxplot(column = ['Northern California','New York','Kansas','Texas'], by = 'Banner')
d
I've recreated a dummy version of your dataset:
import numpy as np
import pandas as pd
dictionary = {'Banner':['Type1']*10+['Type2']*10,
'Northern_California':np.random.rand(20),
'Texas':np.random.rand(20)}
df = pd.DataFrame(dictionary)
What you need is to melt (unpivot) your dataframe so that the geographical zone is stored in a column rather than in the column names. You can use the pandas.melt function and list all the columns you want in your boxplot in the value_vars argument.
With my dummy dataset you can do this:
df = pd.melt(df,id_vars=['Banner'],value_vars=['Northern_California','Texas'],
var_name='zone', value_name='amount')
Now you can draw a boxplot using the hue argument:
import seaborn as sns
import matplotlib.pyplot as plt
plt.figure(figsize=(9,9)) #for a bigger image
sns.boxplot(x="Banner", y="amount", hue="zone", data=df, palette="Set1")