2 line plot using seaborn - python

I want to plot this array. I am using seaborn to do that. I used
import seaborn as sns
sns.set_style('whitegrid')
sns.kdeplot(data= score_for_modelA[:,0])
But the above one only gives for column 1. My scores are in column 1 and 2 and I want both of them plotted in the same graph.
The sample data is like this:
array ([[0.67,0.33],[0.45,0.55],......,[0.81,0.19]]

You can try putting them into a data frame first, with the proper column names, for example:
import seaborn as sns
import numpy as np
import pandas as pd
# create sample dataframe in wide format
score_for_modelA = np.random.normal(0, 1, (50, 2))
df = pd.DataFrame(score_for_modelA, columns=['col1', 'col2'])
# use melt to convert the dataframe to a long form
dfm = df.melt()
Plot the long form dataframe
sns.kdeplot(data=dfm, hue="variable", x="value")
As pointed out by #JohanC, if you want all of the columns:
sns.kdeplot(data=df)

Related

How can one create histograms with subplots according to grouped variables in seaborn?

I am attempting to create a histogram using seaborn and census data that displays 3 subplots for age composition, and I have the data grouped the way that I would like it, but I am struggling to turn that into a histogram.
import pandas as pd
import numpy as np
import seaborn as sns
import matplotlib.pyplot as plt
filename = "/scratch/%s_class_root/%s_class/materials/data/pums_short.csv.gz"
acs = pd.read_csv(filename)
R65_agg = acs.groupby(["R65", "PUMA"])["HINCP"]
R65_meds = R65_agg.agg(np.median).unstack()
R65_f = R65_meds.dropna()
R65_f = R65_meds.reset_index(drop = True)
I was expecting this code to give me data that I could plug into a histogram but instead of being distinct subplots, the "0.0, 1.0, 2,0" in the final variable just get added together when I apply the .describe() function. Any advice for how I can convert this into a form that's readable with the sns.histplot() function?

Python: How to construct a joyplot with values taken from a column in pandas dataframe as y axis

I have a dataframe df in which the column extracted_day consists of dates ranging between 2022-05-08 to 2022-05-12. I have another column named gas_price, which consists of the price of the gas. I want to construct a joyplot such that for each date, it shows the gas_price in the y axis and has minutes_elapsed_from_start_of_day in the x axis. We may also use ridgeplot or any other plot if this doesn't work.
This is the code that I have written, but it doesn't serve my purpose.
from joypy import joyplot
import matplotlib.pyplot as plt
df['extracted_day'] = df['extracted_day'].astype(str)
joyplot(df, by = 'extracted_day', column = 'minutes_elapsed_from_start_of_day',figsize=(14,10))
plt.xlabel("Number of minutes elapsed throughout the day")
plt.show()
Create dataframe with mock data:
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
from joypy import joyplot
np.random.seed(111)
df = pd.DataFrame({
'minutes_elapsed_from_start_of_day': np.tile(np.arange(1440), 5),
'extracted_day': np.repeat(['2022-05-08', '2022-05-09', '2022-05-10','2022-05-11', '2022-05-12'], 1440),
'gas_price': abs(np.cumsum(np.random.randn(1440*5)))})
Then create the joyplot. It is important that you set kind='values', since you do not want joyplot to show KDEs (kernel density estimates, joyplot's default) but the raw gas_price values:
joyplot(df, by='extracted_day',
column='gas_price',
kind='values',
x_range=np.arange(1440),
figsize=(7,5))
The resulting joyplot looks like this (the fake gas prices are represented by the y-values of the lines):

Box and whisker plot on multiple columns

I am trying to make a Box and Whisker plot on my dataset that looks something like this -
& the chart I'm trying to make
My current lines of code are below -
import seaborn as sns
import matplotlib.pyplot as plt
d = df3.boxplot(column = ['Northern California','New York','Kansas','Texas'], by = 'Banner')
d
Thank you
I've recreated a dummy version of your dataset:
import numpy as np
import pandas as pd
dictionary = {'Banner':['Type1']*10+['Type2']*10,
'Northen_californina':np.random.rand(20),
'Texas':np.random.rand(20)}
df = pd.DataFrame(dictionary)
What you need is to melt your dataframe (unpivot) in orther to have the information of geographical zone stored in a column and not as column name. You can use pandas.melt method and specify all the columns you want to put in your boxplot in the value_vars argument.
With my dummy dataset you can do this:
df = pd.melt(df,id_vars=['Banner'],value_vars=['Northen_californina','Texas'],
var_name='zone', value_name='amount')
Now you can apply a boxplot using the hue argument:
import seaborn as sns
import matplotlib.pyplot as plt
plt.figure(figsize=(9,9)) #for a bigger image
sns.boxplot(x="Banner", y="amount", hue="zone", data=df, palette="Set1")

Count Rows in Dictionary of Dataframes

I have a dictionary of dataframes. I am trying to count the rows in each dataframe. For the real data, my code is counting just over ten thousand rows for a dataframe that has only has a few rows.
I have tried to reproduce the error using dummy data. Unfortunately, the code works fine with the dummy data!
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
# Dataframe
Df = pd.DataFrame(np.random.randint(0,10,size=(100, 4)), columns=list('ABCD'))
# Map
Ma = Df.groupby('D')
# Dictionary of Dataframes
Di = {}
for name, group in Ma:
Di[str(name)] = group
# Count the Rows in each Dataframe
Li = []
for k in Di:
Count = Di[k].shape[0]
Li.append([Count])
# Flatten
Li_1 = []
for sublist in Li:
for item in sublist:
Li_1.append(item)
# Histogram
plt.hist(Li_1, bins=10)
plt.xlabel("Rows / Dataframe")
plt.ylabel("Frequency")
fig = plt.gcf()
To get the number of rows corresponding to each category in 'D', you can simply use .size when you do your groupby:
Df.groupby('D').size()
pandas also allows you to directly plot graphs, so your code can be reduced to:
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
Df = pd.DataFrame(np.random.randint(0,10,size=(100, 4)), columns=list('ABCD'))
Df.groupby('D').size().plot.hist()
plt.xlabel("Rows / Dataframe")
plt.ylabel("Frequency")
fig = plt.gcf()
Assuming that, the data in column D is a categorical variable. You can get the count for each category using seaborn countplot.
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns
# Dataframe
df = pd.DataFrame(np.random.randint(0,10,size=(100, 4)), columns=list('ABCD'))
# easy count plot in sns
sns.countplot(x='D',data=df)
plt.xlabel("category")
plt.ylabel("frequency")
But if you are looking for distribution plot but not categorical count plot then you can use the folowing part of the code to have distribution plot.
# for distribution plot
sns.distplot(df['D'],kde=False,bins=10)
plt.xlabel("Spread")
plt.ylabel("frequency")
But if you want distribution plot after group by the elements which does not make any sense to me but you can use the following:
# for distribution plot after group by
sns.distplot(df.groupby('D').size() ,kde=False,bins=10)
plt.xlabel("Spread")
plt.ylabel("frequency")

plot graph from python dataframe

i want to convert that dataframe
into this dataframe and plot a matplotlib graph using date along x axis
changed dataframe
Use df.T.plot(kind='bar'):
import pandas as pd
import matplotlib.pyplot as plt
df = pd.DataFrame.from_csv('./housing_price_index_2010-11_100.csv')
df.T.plot(kind='bar')
plt.show()
you can also assign the transpose to a new variable and plot that (what you asked in the comment):
import pandas as pd
import matplotlib.pyplot as plt
df = pd.DataFrame.from_csv('./housing_price_index_2010-11_100.csv')
df_transposed = df.T
df_transposed.plot(kind='bar')
plt.show()
both result the same:

Categories