Plotting categorized data in Seaborn - python

I have categorized data: on specific dates, items in categories A to E are counted every 15 minutes.
When I plot it with seaborn I get this:
Bigger bubbles cover smaller ones and the whole plot is not easy to read (e.g. 2020-05-12 at 21:15). Is it possible to display the bubbles for each 15-minute class next to each other with a little bit of overlap?
My code:
import pandas as pd
import seaborn as sns
import matplotlib
import matplotlib.pyplot as plt
import os
df = pd.read_csv("test_df.csv")
#print(df)
sns.set_theme()
sns.scatterplot(
    data=df,
    x="date",
    y="time",
    hue="category",
    size="amount",
    sizes=(15, 200),
)
plt.gca().invert_yaxis()
plt.show()
My CSV file:
date,time,amount,category
2020-05-12,21:15,13,A
2020-05-12,21:15,2,B
2020-05-12,21:15,5,C
2020-05-12,21:15,1,D
2020-05-12,21:30,4,A
2020-05-12,21:30,2,C
2020-05-12,21:30,1,D
2020-05-12,21:45,3,B
2020-05-12,22:15,4,A
2020-05-12,22:15,2,D
2020-05-12,22:15,9,E
2020-05-12,00:15,21,D
2020-05-12,00:30,11,E
2020-05-12,04:15,7,A
2020-05-12,04:30,1,B
2020-05-12,04:30,2,C
2020-05-12,04:45,1,A
2020-05-14,21:15,1,A
2020-05-14,21:15,5,C
2020-05-14,21:15,3,D
2020-05-14,21:30,4,A
2020-05-14,21:30,1,D
2020-05-14,21:45,5,B
2020-05-14,22:15,4,A
2020-05-14,22:15,11,E
2020-05-14,00:15,2,D
2020-05-14,00:30,11,E
2020-05-14,04:15,9,A
2020-05-14,04:30,11,B
2020-05-14,04:30,5,C
2020-05-14,05:00,7,A

You can use a seaborn swarmplot for this. You first have to separate the "amount" column into separate entries, using .reindex and .repeat. Then you can plot.
Here is the code:
import matplotlib
import matplotlib.pyplot as plt
import pandas as pd
import seaborn as sns
import os
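df = pd.read_csv("test_df.csv")  # the CSV file from the question
# Repeat each row "amount" times so every counted item becomes its own point
df = df.reindex(df.index.repeat(df.amount))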
sns.swarmplot(data = df, x = "date", y = "time", hue = "category")
plt.gca().invert_yaxis()
plt.show()
Here is the output:
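To see what the reindex/repeat step does on its own, here is a minimal sketch on a made-up three-row frame (the values are illustrative assumptions, not taken from the CSV above). Each row is duplicated as many times as its "amount" says, which is what lets swarmplot draw one dot per counted item:

```python
import pandas as pd

# Tiny illustrative frame; the values are assumptions for demonstration only
df = pd.DataFrame({
    "time": ["21:15", "21:15", "21:30"],
    "amount": [3, 1, 2],
    "category": ["A", "B", "A"],
})

# index.repeat duplicates each index label "amount" times;
# reindex then pulls the matching row once per repeated label
expanded = df.reindex(df.index.repeat(df["amount"]))

print(len(expanded))                    # total number of rows after expansion
print(expanded["category"].tolist())    # one entry per counted item
```

The expanded frame has as many rows as the sum of the "amount" column, so large counts produce large swarms; for very big counts this approach may become slow or crowded.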

Related

How to draw a groupby columns using pyplot?

I have a dataset as below:
import matplotlib.pyplot as plt
import pandas as pd
import numpy as np
disease_type = list(np.random.choice(['TB','P'],100))
gender = list(np.random.choice(['M','F'],100))
dict = { 'Disease Type': disease_type ,'Gender':gender }
dt = pd.DataFrame(dict)
I would like to generate a bar chart using pyplot which shows the different disease types by gender. Something like the image below:
I understand that I can do a groupby as below:
dt = dt.groupby(['Gender'], as_index=False).count()
But I don't know how to feed it to pyplot.
I tried the following code for visualization, but it did not work for me:
fig= plt.Figure(figsize=(10,10))
ax = fig.add_axes([0.1,0.1,0.8,0.8])
ax.bar(height=dt['Disease Type'])
plt.show()
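One possible approach, sketched here under the assumption that a grouped bar chart of counts per gender and disease type is what is wanted: count the combinations with pd.crosstab and draw one set of bars per disease type with ax.bar. The seed, bar width, and figure size are illustrative choices, and the Agg backend is set only so the sketch runs without a display.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend; drop this line for interactive use
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd

# Same kind of random data as in the question (seeded here for repeatability)
rng = np.random.default_rng(0)
dt = pd.DataFrame({
    "Disease Type": rng.choice(["TB", "P"], 100),
    "Gender": rng.choice(["M", "F"], 100),
})

# Rows: gender, columns: disease type, cells: number of occurrences
counts = pd.crosstab(dt["Gender"], dt["Disease Type"])

fig, ax = plt.subplots(figsize=(6, 4))
x = np.arange(len(counts.index))        # one slot per gender
width = 0.8 / len(counts.columns)       # share each slot among the disease types
for i, disease in enumerate(counts.columns):
    ax.bar(x + i * width, counts[disease], width=width, label=disease)

# Center the tick labels under each group of bars
ax.set_xticks(x + width * (len(counts.columns) - 1) / 2)
ax.set_xticklabels(counts.index)
ax.set_xlabel("Gender")
ax.set_ylabel("Count")
ax.legend(title="Disease Type")
# plt.show()  # uncomment when running interactively
```

Unlike the attempt above, this uses plt.subplots (lowercase API) so the figure is managed by pyplot, and it passes explicit x positions to ax.bar, which that call requires.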

How to plot Multiline Graphs Via Seaborn library in Python?

I have written code that looks like this:
import numpy as np
import pandas as pd
import seaborn as sns
import matplotlib.pyplot as plt
T = np.array([10.03,100.348,1023.385])
power1 = np.array([100000,86000,73000])
power2 = np.array([1008000,95000,1009000])
df1 = pd.DataFrame(data = {'Size': T, 'Encrypt_Time': power1, 'Decrypt_Time': power2})
exp1= sns.lineplot(data=df1)
plt.savefig('exp1.png')
exp1_smooth= sns.lmplot(x='Size', y='Time', data=df, ci=None, order=4, truncate=False)
plt.savefig('exp1_smooth.png')
That gives me Graph_1:
Size appears as a constant line, although in my code it varies (10, 100, 1000).
Why does this produce a constant line? I want a multi-line graph with Size (T) on the x-axis and Encrypt_Time and Decrypt_Time (power1 and power2) on the y-axis.
I also wanted to plot a smoothed version of the same graph, but that attempt gives me an error. What needs to be done to get a smooth multi-line graph with the same axes?
I don't think that is the problem: the line for Size looks constant, but it is not.
With sns.lineplot(data=df1), every column, including Size, is drawn as its own line against the row index.
The Size values lie in the range 10-1000, while the smallest division of the y-axis is 20,000 (20 times bigger), so the line looks horizontal on your graph.
You can try bigger values to see the slope clearly.
If you want 'Size' as the x-axis, you can try the example below:
import numpy as np
import pandas as pd
import seaborn as sns
import matplotlib.pyplot as plt
T = np.array([10.03,100.348,1023.385])
power1 = np.array([100000,86000,73000])
power2 = np.array([1008000,95000,1009000])
df1 = pd.DataFrame(data = {'Size': T, 'Encrypt_Time': power1, 'Decrypt_Time': power2})
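fig, ax = plt.subplots()
# Draw both series on the same axes, with labels so the legend tells them apart
sns.lineplot(data=df1, x='Size', y='Encrypt_Time', label='Encrypt_Time', ax=ax)
sns.lineplot(data=df1, x='Size', y='Decrypt_Time', label='Decrypt_Time', ax=ax)
plt.show()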

Box and whisker plot on multiple columns

I am trying to make a Box and Whisker plot on my dataset that looks something like this -
& the chart I'm trying to make
My current lines of code are below -
import seaborn as sns
import matplotlib.pyplot as plt
d = df3.boxplot(column = ['Northern California','New York','Kansas','Texas'], by = 'Banner')
d
Thank you
I've recreated a dummy version of your dataset:
import numpy as np
import pandas as pd
dictionary = {'Banner': ['Type1']*10 + ['Type2']*10,
              'Northern_California': np.random.rand(20),
              'Texas': np.random.rand(20)}
df = pd.DataFrame(dictionary)
What you need is to melt (unpivot) your dataframe in order to have the geographical zone stored in a column and not as column names. You can use the pandas.melt method and specify all the columns you want in your boxplot in the value_vars argument.
With my dummy dataset you can do this:
df = pd.melt(df, id_vars=['Banner'], value_vars=['Northern_California', 'Texas'],
             var_name='zone', value_name='amount')
Now you can apply a boxplot using the hue argument:
import seaborn as sns
import matplotlib.pyplot as plt
plt.figure(figsize=(9,9)) #for a bigger image
sns.boxplot(x="Banner", y="amount", hue="zone", data=df, palette="Set1")

How to fill color by groups in histogram using Matplotlib?

I know how to do this in R and have provided the code for it below. How can I do something similar in Python with Matplotlib or any other library?
library(ggplot2)
ggplot(dia[1:768,], aes(x = Glucose, fill = Outcome)) +
  geom_bar() +
  ggtitle("Glucose") +
  xlab("Glucose") +
  ylab("Total Count") +
  labs(fill = "Outcome")
Using pandas you can pivot the dataframe and directly plot it.
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
# dataframe with two columns in "long form"
g = np.array([np.random.normal(5, 10, 500),
              np.random.rayleigh(10, size=500)]).flatten()
df = pd.DataFrame({'Glucose': g, 'Outcome': np.repeat([0,1],500)})
# pivot and plot
df.pivot(columns="Outcome", values="Glucose").plot.hist(bins=100)
plt.show()
Please consider the following example, which uses seaborn 0.11.1.
import pandas as pd
import numpy as np
import seaborn as sns
import matplotlib.pyplot as plt
# generate random data
data = {'Glucose': np.random.normal(5, 10, 100),
        'Outcome': np.random.randint(2, size=100)}
df = pd.DataFrame(data)
# plot
fig, ax = plt.subplots(figsize=(10, 10))
sns.histplot(data=df, x='Glucose', hue='Outcome', stat='count', edgecolor=None)
ax.set_title('Glucose')

Timeseries plot appears as vertical line

import pandas as pd
import quandl as qndl
import datetime as dt
import matplotlib.pyplot as plt
import QuandlAPIKey  # my Quandl API key; ignore this
data = qndl.get_table('AUSBS/D')
dataframe = pd.DataFrame(data)
sorteddataframe = dataframe.sort_values(by='date')
dfdate = sorteddataframe[['date']]
dfvalue = sorteddataframe[['value']]
dfx = dfdate
dfy = dfvalue
values_to_read = 100
print(sorteddataframe.head(values_to_read))
plt.plot(dfx.head(values_to_read),dfy.head(values_to_read))
plt.xlabel("Years")
plt.ylabel("Stock Values (Scaled down by 10%)")
plt.show()
I have checked the dataframes (dfx, dfy, sorteddataframe), all of them are properly sorted but the graph being generated is just a simple vertical line. Pic is posted.
