I'm trying to use seaborn to make a simple tsplot, but for reasons that aren't clear to me nothing shows up when I run the code. Here's a minimal example:
import numpy as np
import seaborn as sns
import pandas as pd
import matplotlib.pyplot as plt
df = pd.DataFrame({'value': np.random.rand(31), 'time': range(31)})
ax = sns.tsplot(data=df, value='value', time='time')
plt.show()
Usually with tsplot you supply multiple data points for each time point; does it just not work if you only supply one?
I know matplotlib can be used to do this pretty easily, but I wanted to use seaborn for some of its other functionality.
You are missing individual units. When using a data frame, the idea is that multiple time series have been recorded, one per unit (for example one per subject), and each unit can be individually identified in the data frame. The error is then calculated across the different units.
So for one series only, you can get it working again like this:
df = pd.DataFrame({'value': np.random.rand(31), 'time': range(31)})
df['subject'] = 0
sns.tsplot(data=df, value='value', time='time', unit='subject')
Just to see how the error is computed, look at this example:
dfs = []
for i in range(10):
    df = pd.DataFrame({'value': np.random.rand(31), 'time': range(31)})
    df['subject'] = i
    dfs.append(df)
all_dfs = pd.concat(dfs)
sns.tsplot(data=all_dfs, value='value', time='time', unit='subject')
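To make "the error is calculated across units" concrete: tsplot estimated its band by bootstrapping across units. The sketch below swaps in a simpler statistic (mean ± standard error of the mean) and plain pandas/matplotlib instead of seaborn, so it is an illustration of the idea rather than a reproduction of tsplot's exact band. It also shows that with a single subject the standard error is NaN at every time point, because there is no spread across units to estimate.

```python
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt

# Ten subjects (units), each measured at the same 31 time points, as in the answer above
dfs = []
for i in range(10):
    df = pd.DataFrame({"value": np.random.rand(31), "time": range(31)})
    df["subject"] = i
    dfs.append(df)
all_dfs = pd.concat(dfs, ignore_index=True)

# Summarise across units at each time point: mean, standard error, number of units
stats = all_dfs.groupby("time")["value"].agg(["mean", "sem", "count"])

fig, ax = plt.subplots()
ax.plot(stats.index, stats["mean"])
# Shade mean ± SEM (a stand-in for tsplot's bootstrapped band)
ax.fill_between(stats.index, stats["mean"] - stats["sem"],
                stats["mean"] + stats["sem"], alpha=0.3)

# With only one unit there is no spread to estimate: SEM is NaN everywhere
single = all_dfs[all_dfs["subject"] == 0].groupby("time")["value"].sem()
```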
Alternatively, you can use set_index to make the time column the index and then plot the resulting Series:
import matplotlib.pyplot as plt
df = pd.DataFrame({'value': np.random.rand(31), 'time': range(31)})
df = df.set_index('time')['value']
ax = sns.tsplot(data=df)
plt.show()
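tsplot was deprecated in seaborn 0.9 and has since been removed, and sns.plt is no longer available, so the snippets above only run on old seaborn versions. A minimal sketch of the same "index the Series by time, then plot it" idea with current libraries lets pandas draw the Series through matplotlib, with the index on the x axis. In recent seaborn, sns.lineplot(data=series) should be the closest counterpart.

```python
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt

df = pd.DataFrame({"value": np.random.rand(31), "time": range(31)})
series = df.set_index("time")["value"]

# pandas plots the Series against its index (the time column)
ax = series.plot()
ax.set_ylabel("value")
```

Call plt.show() afterwards when running as a script rather than in a notebook.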
I'm starting to use Jupyter and the pandas library, and I'm having trouble with a boxplot.
I have the following dataframe:
[image: dataframe]
The problem with this dataframe is that I only have frequency data for different ranges of values. How could I make a graphic from this kind of value table? I'd like to make a boxplot for each frequency column.
Thank you!
I'm not sure I understand your question well, but here is a demo of boxplot visualization; I hope it helps.
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
# a custom dataframe
df = pd.DataFrame({'A': np.random.randint(1, 10, 5),
'B': np.random.randint(1, 10, 5)})
# and your first column
df['E'] = pd.Series(['0-6', '7-13', '14-20', '21-27', '28-34'])
print(df)
df.boxplot(column=['A', 'B'],
by='E',vert=True,showmeans=True,meanline=True,showfliers=False)
plt.show()
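A boxplot summarises individual observations, so a table of counts per value range has to be turned back into (approximate) observations before it can be boxplotted. The sketch below assumes a hypothetical frequency table (the real one was only shown as an image) with range labels like those in the demo ('0-6', '7-13', ...) and one count column per variable. It represents every observation in a range by that range's midpoint, which is an approximation because the true values inside each range are unknown.

```python
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt

# Hypothetical frequency table: one row per value range, one count column per variable
freq = pd.DataFrame({
    "range": ["0-6", "7-13", "14-20", "21-27", "28-34"],
    "A": [3, 8, 5, 2, 1],
    "B": [1, 4, 9, 6, 3],
})

# Approximate every observation in a range by the range's midpoint
bounds = freq["range"].str.split("-", expand=True).astype(int)
midpoints = ((bounds[0] + bounds[1]) / 2).to_numpy()

# Expand the counts back into pseudo-observations, one array per column
samples = {col: np.repeat(midpoints, freq[col].to_numpy()) for col in ["A", "B"]}

fig, ax = plt.subplots()
ax.boxplot(list(samples.values()))
ax.set_xticklabels(list(samples.keys()))
```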
I have the Python code below, but as output it gives a chart like the one in the attachment, and it's really messy. Can anybody tell me how to fix the issue and put the days in ascending order on the x axis?
import pandas as pd
import matplotlib.pyplot as plt
df = pd.read_excel("C/desktop/data.xlsx")
df = df.loc[df['month'] == 8]
df = df.astype({'day': str})
plt.plot( 'day', 'cases', data=df)
At first, I didn't convert the day to str, and it came out like this.
Because it had decimal numbers, I converted it to str; now this happens.
What you got is typical of an unsorted dataset with many points per group.
As you did not provide an example, here is one:
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
np.random.seed(0)
df = pd.DataFrame({'day': np.random.randint(1,21,size=100),
'cases': np.random.randint(0,50000,size=100),
})
plt.plot('day', 'cases', data=df)
There is no reason to plot a line in this case; you can use a scatter plot instead:
plt.scatter('day', 'cases', data=df)
To make more sense of your data, you can also compute an aggregated value (e.g. the mean):
plt.plot('day', 'cases', data=df.groupby('day', as_index=False)['cases'].mean())
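On the ordering itself: converting 'day' to str makes the values sort character by character, so "10.0" lands before "2.0". A sketch that keeps 'day' numeric instead (int drops the trailing .0) and aggregates each day's rows, assuming each row is a separate count to be summed (use .mean() as above if rows are repeated measurements). The data is hypothetical, generated like the example above, and the Excel loading and month filter from the question are left out.

```python
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt

# Hypothetical data: 'day' read in as decimals (3.0, 17.0, ...) as described in the question
np.random.seed(0)
df = pd.DataFrame({"day": np.random.randint(1, 21, size=100).astype(float),
                   "cases": np.random.randint(0, 50000, size=100)})

# Strings sort character by character, which is why str days come out of order
print(sorted(["2.0", "10.0", "1.0"]))  # → ['1.0', '10.0', '2.0']

# Keep 'day' numeric and sum the rows of each day;
# groupby returns the days in ascending numeric order
daily = (df.assign(day=df["day"].astype(int))
           .groupby("day", as_index=False)["cases"].sum())

fig, ax = plt.subplots()
ax.plot(daily["day"], daily["cases"], marker="o")
ax.set_xticks(daily["day"])
ax.set_xlabel("day")
ax.set_ylabel("cases")
```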
I am using pandas dataframes to hold some volume calculation results, and trying to configure a seaborn FacetGrid setup to visualize results of 4 different types of volume calculations for a reservoir zone.
I believe I can handle the dataframe part; my problem is with the visualization part:
Each different type of volume calculations is loaded in the dataframe as a series. The series name corresponds to the type of volume calculation. I want to create a number of plots then, aligned so that each column of plot corresponds to one series in my dataframe.
Theory (documentation) says this should do it (example from tutorial at https://seaborn.pydata.org/tutorial/axis_grids.html):
import seaborn as sns
import matplotlib.pyplot as plt
tips = sns.load_dataset("tips")
g=sns.FacetGrid(tips, col = "time")
I cannot find the referenced dataset "tips" for download, but I think that is a minor problem. From the code snippet above and after some testing on my own data, I infer that "time" in that dataset refers to the name of one series in the dataframe and that different times would then be different categories or other types of values in that series.
This is not how my dataset is ordered. I have the different types of volume calculations that I would see as individual plots (in columns) represented as series in my dataframe. How do I provide the series name as input to seaborn FacetGrid col= argument?
g = seaborn.FacetGrid(data=volumes_table, col=?????)
I cannot figure out how I can get col=dataframe.series and I cannot find any documented example of that.
Here's a setup with some exciting dummy names and dummy values:
import os
import pandas
import numpy
import seaborn
import matplotlib.pyplot as plt
#provide some input data, using a small dictionary
volumes_categories = {'zone_numbers': [1, 2, 3, 4],
'zone_names': ['corona', 'hiv', 'h5n1', 'measles'],
'grv': [30, 90, 80, 100],
'nv': [20, 60, 20, 50],
'pv': [5, 12, 4, 25],
'hcpv': [4, 6, 1, 20]}
# create the dataframe
volumes_table = pandas.DataFrame(volumes_categories)
# set up for plotting
seaborn.set(style='ticks')
g= seaborn.FacetGrid(data=volumes_table, col='zone_names')
The above setup generates facet columns fine, but I cannot get the columns to represent the series in my dataframe (i.e. the columns you see when the dataframe is displayed as a table).
What do I need to do?
The main part of the solution is described in BBQuercus's answer: reshaping the nice, human-readable wide-format dataframe/table into a long-format table, which is simpler for seaborn to digest, using pandas.melt().
I implemented this by creating a copy of the original dataframe and melting the copy:
# first copy dataframe
vol_table2 = volumes_table.copy()
#melt it into long format
vol_table2 = pandas.melt(vol_table2, id_vars = ['zone_numbers','zone_names'], value_vars=['grv','nv','pv','hcpv'], var_name = "volume_type", value_name = "volume")
In the end I also decided to scrap the explicit FacetGrid and map setup and use seaborn.catplot (with FacetGrid functionality included).
Thanks for assistance
(PS: it would be a good idea for seaborn to accept series names in the FacetGrid setup.)
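A sketch of the seaborn.catplot route mentioned above, built on the melted vol_table2. kind="bar" and sharey=False are choices made here for the dummy data (the volume types differ a lot in magnitude), not necessarily what the original author used.

```python
import pandas as pd
import seaborn as sns

# Dummy data from the question
volumes_table = pd.DataFrame({'zone_numbers': [1, 2, 3, 4],
                              'zone_names': ['corona', 'hiv', 'h5n1', 'measles'],
                              'grv': [30, 90, 80, 100],
                              'nv': [20, 60, 20, 50],
                              'pv': [5, 12, 4, 25],
                              'hcpv': [4, 6, 1, 20]})

# Wide -> long: one row per (zone, volume type)
vol_table2 = pd.melt(volumes_table, id_vars=['zone_numbers', 'zone_names'],
                     value_vars=['grv', 'nv', 'pv', 'hcpv'],
                     var_name="volume_type", value_name="volume")

# col= now refers to a column whose values are the former column names
g = sns.catplot(data=vol_table2, x="zone_names", y="volume",
                col="volume_type", kind="bar", sharey=False)
```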
Once we have imported all requirements:
import seaborn as sns
import matplotlib.pyplot as plt
tips = sns.load_dataset('tips')
The FacetGrid essentially just provides a canvas to draw on. You can then use the map function to "project" plotting functions onto the canvas:
# Blueprint
g = sns.FacetGrid(dataframe, col="dataframe.column", row="dataframe.column")
g = g.map(plotting.function, "dataframe.column")
# Example with the tips dataset
g = sns.FacetGrid(tips, col="time", row="smoker")
g = g.map(plt.hist, "total_bill")
plt.show()
In your case, as mentioned above, I would also melt the columns first to get a tidy data format and then plot as usual, changing what to plot as necessary:
volumes_table = volumes_table.melt(id_vars=['zone_numbers', 'zone_names'])
g = sns.FacetGrid(data=volumes_table, col='variable')
g = g.map(plt.scatter, 'zone_numbers', 'value')
plt.show()
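If melting feels like a detour, plain matplotlib can also give one panel per dataframe column by looping over the column names directly. This is a sketch with the question's dummy data; bar charts per zone are an assumed choice of plot type.

```python
import pandas as pd
import matplotlib.pyplot as plt

volumes_table = pd.DataFrame({'zone_numbers': [1, 2, 3, 4],
                              'zone_names': ['corona', 'hiv', 'h5n1', 'measles'],
                              'grv': [30, 90, 80, 100],
                              'nv': [20, 60, 20, 50],
                              'pv': [5, 12, 4, 25],
                              'hcpv': [4, 6, 1, 20]})

value_cols = ["grv", "nv", "pv", "hcpv"]

# One panel per volume column, side by side, similar to FacetGrid's col= facets
fig, axes = plt.subplots(1, len(value_cols), figsize=(12, 3))
for ax, col in zip(axes, value_cols):
    ax.bar(volumes_table["zone_names"], volumes_table[col])
    ax.set_title(col)
fig.tight_layout()
```

The trade-off is that seaborn's shared legends, hue handling and faceting by row are then up to you.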
One year of sample data:
import pandas as pd
import numpy.random as rnd
import seaborn as sns
n = 365
df = pd.DataFrame(data = {"A":rnd.randn(n), "B":rnd.randn(n)+1},
index=pd.date_range(start="2017-01-01", periods=n, freq="D"))
I want to boxplot these data side-by-side grouped by the month (i.e., two boxes per month, one for A and one for B).
For a single column sns.boxplot(df.index.month, df["A"]) works fine. However, sns.boxplot(df.index.month, df[["A", "B"]]) throws an error (ValueError: cannot copy sequence with size 2 to array axis with dimension 365). Melting the data by the index (pd.melt(df, id_vars=df.index, value_vars=["A", "B"], var_name="column")) in order to use seaborn's hue property as a workaround doesn't work either (TypeError: unhashable type: 'DatetimeIndex').
(A solution doesn't necessarily need to use seaborn, if it is easier using plain matplotlib.)
Edit
I found a workaround that basically produces what I want. However, it becomes somewhat awkward to work with once the DataFrame includes more variables than I want to plot. So if there is a more elegant/direct way to do it, please share!
df_stacked = df.stack().reset_index()
df_stacked.columns = ["date", "vars", "vals"]
df_stacked.index = df_stacked["date"]
sns.boxplot(x=df_stacked.index.month, y="vals", hue="vars", data=df_stacked)
Produces:
Here's a solution using pandas melting and seaborn:
import pandas as pd
import numpy.random as rnd
import seaborn as sns
n = 365
df = pd.DataFrame(data = {"A": rnd.randn(n),
"B": rnd.randn(n)+1,
"C": rnd.randn(n) + 10, # will not be plotted
},
index=pd.date_range(start="2017-01-01", periods=n, freq="D"))
df['month'] = df.index.month
df_plot = df.melt(id_vars='month', value_vars=["A", "B"])
sns.boxplot(x='month', y='value', hue='variable', data=df_plot)
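Since the question allows plain matplotlib, here is a sketch of the same grouped layout without seaborn: for each selected column, one boxplot call per column with positions shifted slightly left or right of each month's tick. Columns not listed in columns are simply ignored, so extra variables in the DataFrame do not get in the way. The seeded generator and the width value are assumptions for illustration.

```python
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt

# Same shape of data as in the question (seeded generator instead of numpy.random)
n = 365
rng = np.random.default_rng(0)
df = pd.DataFrame({"A": rng.standard_normal(n), "B": rng.standard_normal(n) + 1},
                  index=pd.date_range(start="2017-01-01", periods=n, freq="D"))

columns = ["A", "B"]            # only these columns are plotted
months = sorted(df.index.month.unique())
width = 0.35                    # horizontal offset between the boxes of one month

fig, ax = plt.subplots(figsize=(12, 4))
handles = []
for j, col in enumerate(columns):
    # one array of daily values per month for this column
    data = [df.loc[df.index.month == m, col].to_numpy() for m in months]
    # shift this column's boxes left or right of each month's tick
    offset = (j - (len(columns) - 1) / 2) * width
    bp = ax.boxplot(data, positions=[m + offset for m in months],
                    widths=width * 0.9, patch_artist=True)
    for box in bp["boxes"]:
        box.set_facecolor(f"C{j}")
    handles.append(bp["boxes"][0])

ax.set_xticks(months)
ax.set_xticklabels([str(m) for m in months])
ax.set_xlabel("month")
ax.legend(handles, columns)
```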
import matplotlib.pyplot as plt
month_dfs = []
for group in df.groupby(df.index.month):
    month_dfs.append(group[1])
plt.figure(figsize=(30,5))
for i,month_df in enumerate(month_dfs):
    axi = plt.subplot(1, len(month_dfs), i + 1)
    month_df.plot(kind='box', subplots=False, ax = axi)
    plt.title(i+1)
    plt.ylim([-4, 4])
plt.show()
Will give this
Not exactly what you're looking for but you get to keep a readable DataFrame if you add more variables.
You can also easily hide the y-axis of every subplot after the first by adding
if i > 0:
    y_axis = axi.axes.get_yaxis()
    y_axis.set_visible(False)
inside the loop, before plt.show()
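A variant of the same per-month layout that may be simpler: create all panels at once with plt.subplots(..., sharey=True), which keeps the y range identical across months (replacing the fixed plt.ylim) and lets label_outer() drop the inner y tick labels instead of hiding axes by hand. Sketch with seeded random data in the question's shape.

```python
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt

n = 365
rng = np.random.default_rng(0)
df = pd.DataFrame({"A": rng.standard_normal(n), "B": rng.standard_normal(n) + 1},
                  index=pd.date_range(start="2017-01-01", periods=n, freq="D"))

groups = list(df.groupby(df.index.month))

# sharey=True links the y limits of all panels, so no per-panel ylim is needed
fig, axes = plt.subplots(1, len(groups), figsize=(30, 5), sharey=True)
for ax, (month, month_df) in zip(axes, groups):
    month_df.plot(kind="box", ax=ax)
    ax.set_title(str(month))
    ax.label_outer()  # keep y tick labels only on the first (outer) panel
```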
This is quite straightforward using Altair:
import altair as alt
alt.Chart(
df.reset_index().melt(id_vars = ["index"], value_vars=["A", "B"]).assign(month = lambda x: x["index"].dt.month)
).mark_boxplot(
extent='min-max'
).encode(
alt.X('variable:N', title=''),
alt.Y('value:Q'),
column='month:N',
color='variable:N'
)
The code above melts the DataFrame and adds a month column. Then Altair creates box-plots for each variable broken down by months as the plot columns.
I have a data frame that looks like this:
type price1 price2
0 A 5450.0 31980.0
1 B 5450.0 20000.0
2 C 15998.0 18100.0
What I want is a clustered bar chart that plots "type" against "price". The end goal is a chart that has two bars for each type, one bar for "price1" and the other for "price2". Both columns are in the same unit ($). Using Bokeh I can group by type, but I can't seem to group by a generic "price" unit. I have this code so far:
import pandas as pd
import numpy as np
from bokeh.charts import Bar, output_file, show
from bokeh.palettes import Category20 as palette
from bokeh.models import HoverTool, PanTool
p = Bar(
df,
plot_width=1300,
plot_height=900,
label='type',
values='price2',
bar_width=0.4,
legend='top_right',
agg='median',
tools=[HoverTool(), PanTool()],
palette=palette[20])
But that only gets me one column for each type.
How can I modify my code to get two bars for each type?
What you are searching for is a grouped Bar plot.
But you have to reorganise your data a little so that Bokeh (or rather pandas) can group it correctly.
df2 = pd.DataFrame(data={'type': ['A','B','C', 'A', 'B', 'C'],
'price':[5450, 5450, 15998, 3216, 20000, 15000],
'price_type':['price1', 'price1', 'price1', 'price2', 'price2', 'price2']})
p = Bar(
df2,
plot_width=1300,
plot_height=900,
label='type',
values='price',
bar_width=0.4,
group='price_type',
legend='top_right')
show(p)
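Rather than typing the long table by hand, it can be derived from the original wide frame with pandas.melt. The var_name and value_name arguments below are chosen to reproduce the price_type/price column names used above, and the prices are the ones from the question's table. pivot reverses the reshape, which is a handy check that nothing was lost.

```python
import pandas as pd

wide = pd.DataFrame({"type": ["A", "B", "C"],
                     "price1": [5450.0, 5450.0, 15998.0],
                     "price2": [31980.0, 20000.0, 18100.0]})

# Wide -> long: one row per (type, price column)
long = wide.melt(id_vars="type", value_vars=["price1", "price2"],
                 var_name="price_type", value_name="price")

# Long -> wide again, to confirm the round trip
back = long.pivot(index="type", columns="price_type", values="price").reset_index()
back.columns.name = None
```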
Your table is in "wide" format. You want to melt it into a long format first using the pd.melt() function. For visualization, I suggest you use the seaborn package to make your life easier; you can visualize everything in one line.
import pandas as pd
import seaborn as sns
import matplotlib.pyplot as plt
your_df = pd.DataFrame(data={'type': ['A','B','C'],
'price1':[5450, 5450, 15998],
'price2' : [3216, 20000, 15000]})
long_df = pd.melt(your_df,id_vars = ['type'],value_vars =['price1','price2'])
print(long_df)
my_plot = sns.barplot(x="type", y="value",hue = "variable", data=long_df)
plt.show()
A good post on long and wide formats can be found here:
Reshape Long Format Multivalue Dataframes with Pandas
If you insist on using Bokeh, here is how to do it, as renzop pointed out:
p = Bar(long_df,
plot_width=1000,
plot_height=800,
label='type',
values='value',
bar_width=0.4,
group='variable',
legend='top_right')
show(p)
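Note that the bokeh.charts module used in the Bokeh snippets has since been removed from Bokeh. For a quick clustered bar chart without any reshaping, pandas' own plotting draws one bar per selected column for each row, grouped by the x column. A sketch with the question's values:

```python
import pandas as pd
import matplotlib.pyplot as plt

df = pd.DataFrame({"type": ["A", "B", "C"],
                   "price1": [5450.0, 5450.0, 15998.0],
                   "price2": [31980.0, 20000.0, 18100.0]})

# Each row becomes a group on the x axis; each listed column becomes one bar in the group
ax = df.plot.bar(x="type", y=["price1", "price2"], rot=0)
ax.set_ylabel("price ($)")
```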