How do I plot an entire Pandas DataFrame under the same label? - python

I am attempting to write a plotting function that takes 2 arguments - a DataFrame and a label (df1 and label1). My function also has optional arguments to add more DataFrames if I have more groups of data to plot:
def REE_GroupPlot(df1, label1,
                  df2=None, label2=None,
                  df3=None, label3=None,
                  df4=None, label4=None,
                  df5=None, label5=None):
I am trying to plot all of the columns in a single DataFrame under the same legend label and in the same color, but I am not quite sure how. I have looked into DataFrame.groupby, but I can't seem to make it do what I want.
I would appreciate any suggestions - thanks!

What if you pass in a list of dataframes, like:
def REE_Group_Plot(dfs, label):
    # combine all dataframes into 1
    all_dfs = pd.concat(dfs)
    # the rest of the code to plot the graph
REE_Group_Plot(dfs=[df, df2, df3, df4], label="ABC")
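The concat approach above puts everything under one label. If each DataFrame should instead keep its own label and color, as the question asks, one possible sketch follows. It assumes each DataFrame's index holds the x values; the function name ree_group_plot, its list-of-(DataFrame, label)-pairs signature, and the La/Ce sample data are all hypothetical. It relies on matplotlib skipping legend labels that start with an underscore, so only the first column of each group gets a legend entry.

```python
import matplotlib
matplotlib.use("Agg")  # headless backend; drop this line in an interactive session
import matplotlib.pyplot as plt
import pandas as pd


def ree_group_plot(groups, ax=None):
    """Plot every column of each DataFrame in one color, one legend entry per DataFrame.

    groups: list of (DataFrame, label) pairs (hypothetical signature).
    Assumes each DataFrame's index holds the x values.
    """
    if ax is None:
        _, ax = plt.subplots()
    # Reuse matplotlib's default color cycle: one color per DataFrame
    cycle = plt.rcParams["axes.prop_cycle"].by_key()["color"]
    for i, (df, label) in enumerate(groups):
        color = cycle[i % len(cycle)]
        for j, col in enumerate(df.columns):
            # Only the first column carries the label; labels starting with "_"
            # are skipped by ax.legend(), so each group appears once.
            ax.plot(df.index, df[col], color=color,
                    label=label if j == 0 else "_nolegend_")
    ax.legend()
    return ax


# Made-up example data
df1 = pd.DataFrame({"La": [1.0, 2.0, 3.0], "Ce": [2.0, 3.0, 4.0]})
df2 = pd.DataFrame({"La": [3.0, 2.0, 1.0], "Ce": [4.0, 3.0, 2.0]})
ax = ree_group_plot([(df1, "Group 1"), (df2, "Group 2")])
```

To keep the original df1/label1, df2/label2 signature, the non-None pairs could be collected into such a list inside the function.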

Related

pandas bar plot xlabel based on two column values

Given the code below, I get the expected plot.
Is there a way to get pandas to use a combination of A and B as the x-axis labels?
I've tried passing x=['A','B'] as well as x=('A','B'), but neither works.
I would just like the labels to include both values.
It is possible to pivot and get a semi-workable solution, but I don't actually want to compare the subsets of B side by side:
df.pivot(index='B',columns='A',values='Val').plot(kind='bar')
import pandas as pd
import matplotlib.pyplot as plt

# Build the rows first (DataFrame.append was removed in pandas 2.0)
rows = [{'A': str(a), 'B': str(b), 'Val': a + b}
        for a in range(2) for b in range(3)]
df = pd.DataFrame(rows)
df.plot(kind='bar', x='B', y='Val')
plt.show()
You can create a MultiIndex by setting columns A and B as the index, then plot with kind='bar':
df.set_index(['A', 'B']).plot(kind='bar', y='Val')
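With the MultiIndex approach each bar is labeled with a tuple such as (0, 1). If a plain string label is preferred, a sketch that joins A and B into a new column and uses it as x follows; the column name AB and the "-" separator are arbitrary choices, not anything pandas requires.

```python
import matplotlib
matplotlib.use("Agg")  # headless backend; drop this line in an interactive session
import pandas as pd

rows = [{"A": str(a), "B": str(b), "Val": a + b} for a in range(2) for b in range(3)]
df = pd.DataFrame(rows)

# Join the two key columns into one string per row, e.g. "0-2"
df["AB"] = df["A"] + "-" + df["B"]
ax = df.plot(kind="bar", x="AB", y="Val", rot=0)
# call matplotlib.pyplot.show() to display the figure interactively
```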

How to create a hexbin plot from a pandas dataframe

I have this dataframe:
! curl -O https://raw.githubusercontent.com/msu-cmse-courses/cmse202-S21-student/master/data/Dataset.data
import pandas as pd
#I read it in
data = pd.read_csv("Dataset.data", delimiter=' ', header = None)
#Now I want to add column titles to the file so I add them
data.columns = ['sex','length','diameter','height','whole_weight','shucked_weight','viscera_weight','shell_weight','rings']
print(data)
Now I want to take the shell_weight column as x and the rings column as y and plot them as a 2D histogram using plt.hexbin:
import matplotlib.pyplot as plt
df = pd.DataFrame(data)
plt.hexbin(x='shell_weight', y='rings')
But when I run it, I get this error:
ValueError: First argument must be a sequence
Can anyone help me plot these 2 variables?
The issue with plt.hexbin(x='shell_weight', y='rings') is that matplotlib doesn't know what shell_weight and rings are supposed to be. It doesn't know about df unless you specify it.
Since you already have a dataframe, it's simplest to plot with pandas, but pure matplotlib is still possible if you specify the source df:
df.plot.hexbin (simplest)
In this case, pandas will automatically infer the columns from df, so we can just pass the column names:
df.plot.hexbin(x='shell_weight', y='rings') # pandas infers the df source
plt.hexbin
With pure matplotlib, either pass the actual columns:
plt.hexbin(x=df.shell_weight, y=df.rings) # actual columns, not column names
Or pass the column names while specifying the data source:
plt.hexbin(x='shell_weight', y='rings', data=df) # column names with df source
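A self-contained sketch of both approaches side by side. Because the course dataset has to be downloaded, it uses made-up stand-in values under the same column names; substitute the real df. gridsize=20 is an arbitrary choice.

```python
import matplotlib
matplotlib.use("Agg")  # headless backend; drop this line in an interactive session
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd

# Made-up stand-in data with the same column names as the question
rng = np.random.default_rng(0)
df = pd.DataFrame({"shell_weight": rng.uniform(0.0, 1.0, 500)})
df["rings"] = (df["shell_weight"] * 20 + rng.normal(0.0, 2.0, 500)).round()

fig, (ax1, ax2) = plt.subplots(1, 2, figsize=(10, 4))

# pandas: column names are looked up in df automatically (adds its own colorbar)
df.plot.hexbin(x="shell_weight", y="rings", gridsize=20, ax=ax1)

# matplotlib: column names are looked up in the object passed as data=
hb = ax2.hexbin(x="shell_weight", y="rings", data=df, gridsize=20)
fig.colorbar(hb, ax=ax2, label="count")
```

With the default C=None, each hexagon's value is a point count, so the counts add up to the number of rows.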

Plot each column as a line and group by the first row of each column of Dataframe with Python

I have a DataFrame with 8 years of energy values for each source, plus the type of each source in the first row. I need to plot all 3000 sources grouped by type: 3000 lines, with the 8 years on the x-axis and the energy on the y-axis.
The data is the yearly energy and the type of each source.
Sorry if anything is missing, it's my first question.
I've tried just deleting the types and plotting without grouping:
blazar_eneryg_with_type.plot(x ='Year', kind = 'line')
This is the kind of result I need, but with the lines in only 3 groups.
OK, having seen your raw data, you must have done some processing beforehand; it looks like you are calling df.plot(). This is what I would do:
import pandas as pd
import matplotlib.pyplot as plt

# read in your csv
df = pd.read_csv('your_csv')
# drop redundant index column
df.drop(columns=['Unnamed: 0'], inplace=True)
# transpose so each source is a column, drop the type row,
# and convert back to numbers (the string type column leaves object dtype)
df1 = df.T.drop(['type1']).astype(float)
# create color map
colors = {'bcu': 'red', 'bll': 'blue', 'fsrq': 'green'}
# color each line (one per source) by mapping its type
df1.plot(color=df['type1'].map(colors).tolist())
plt.show()
The graph is an absolute mess, and I am not sure why you would want to plot that much data on a single chart, but there you go.
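EDIT: To change the legend, build one handle per type so the legend colors match (passing only label strings to ax.legend() would attach them to the first three lines, whatever their type):
from matplotlib.lines import Line2D
ax = df1.plot(color=df['type1'].map(colors).tolist(), legend=False)
handles = [Line2D([0], [0], color=c, label=t) for t, c in colors.items()]
ax.legend(handles=handles)
plt.show()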
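If one crowded chart is not what you need, a different technique (swapped in here, not the answerer's method) is one subplot per type via groupby on the type column. This sketch assumes the same layout the answer assumes, one row per source with one column per year plus a type1 column, and uses a tiny made-up frame in place of the real CSV.

```python
import matplotlib
matplotlib.use("Agg")  # headless backend; drop this line in an interactive session
import matplotlib.pyplot as plt
import pandas as pd

# Made-up stand-in: one row per source, one column per year, plus its type
df = pd.DataFrame(
    {2010: [1.0, 2.0, 3.0, 4.0], 2011: [1.5, 2.5, 2.0, 3.0], 2012: [2.0, 2.2, 2.5, 3.5],
     "type1": ["bcu", "bcu", "bll", "fsrq"]}
)
colors = {"bcu": "red", "bll": "blue", "fsrq": "green"}

groups = df.groupby("type1")
fig, axes = plt.subplots(1, groups.ngroups, sharey=True, figsize=(12, 4))
for ax, (source_type, group) in zip(axes, groups):
    # Transpose so the years become the x axis and each source is one line
    group.drop(columns="type1").T.plot(ax=ax, color=colors[source_type], legend=False)
    ax.set_title(source_type)
    ax.set_xlabel("Year")
axes[0].set_ylabel("Energy")
```

sharey=True keeps the energy scale comparable across the three panels.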

How to join pandas dataframe so that seaborn boxplot or violinplot can use a column as hue?

I have a dataframe with multiple columns, and I can easily use seaborn to plot it in a boxplot (or violinplot, etc), like this:
import pandas as pd
import seaborn as sns
data1 = {'p0':[1.,2.,5,0.], 'p1':[2., 1.,1,3], 'p2':[3., 3.,2., 4.]}
df1 = pd.DataFrame.from_dict(data1)
sns.boxplot(data=df1)
What I now need is to merge this dataframe with another one, so that I can plot them in a single boxplot, just as is done here: http://seaborn.pydata.org/examples/grouped_boxplot.html
I have tried adding a column and concatenating. The result seems ok
data1 = {'p0':[1.,2.,5,0.], 'p1':[2., 1.,1,3], 'p2':[3., 3.,2., 4.]}
data2 = {'p0':[3.,1.,5,1.], 'p1':[3., 2.,3,3], 'p2':[1., 2.,2., 5.]}
df1 = pd.DataFrame.from_dict(data1)
df1['method'] = 'A'
df2 = pd.DataFrame.from_dict(data2)
df2['method'] = 'B'
df_all = pd.concat([df1,df2])
sns.boxplot(data=df_all)
This works, but it plots the data from methods A and B together. However, this fails:
sns.boxplot(data=df_all, hue='method')
because I then need to specify x and y. If I specify x=['p0', 'p1', 'p2'], then the 3 columns are averaged.
So I guess I need to merge the dataframes in a different way, so that the result is simple to plot with seaborn.
I think what would be needed here for this to work in a simple way would be to have a dataframe like so:
value method p
1.0 A p0
2.1 A p0
3.0 A p1
1.3 B p0
4.3 B p1
Then you could get what you want with sns.boxplot(data=df, hue='method', x='p', y='value')
I'm looking into how to merge df1 and df2 easily into a data frame like this one, but I'm not really a pandas expert.
Edit: Figured it out; I need to use the melt method:
df3 = pd.concat([df1.melt(id_vars='method', var_name='p'),
df2.melt(id_vars='method', var_name='p')],
ignore_index=True)
sns.boxplot(x='p', y='value', hue='method', data=df3)
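A variant of the same reshaping that concatenates first and melts once, producing the long format described in the question (one row per method, column name p, and value). It uses the question's own data1/data2; only pandas is needed for the reshaping step.

```python
import pandas as pd

data1 = {'p0': [1., 2., 5, 0.], 'p1': [2., 1., 1, 3], 'p2': [3., 3., 2., 4.]}
data2 = {'p0': [3., 1., 5, 1.], 'p1': [3., 2., 3, 3], 'p2': [1., 2., 2., 5.]}
df1 = pd.DataFrame(data1).assign(method='A')
df2 = pd.DataFrame(data2).assign(method='B')

# Stack both frames, then melt: one row per (method, column, value)
long_df = pd.concat([df1, df2], ignore_index=True).melt(
    id_vars='method', var_name='p', value_name='value'
)
print(long_df.head())
```

The result can be passed straight to sns.boxplot(x='p', y='value', hue='method', data=long_df).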
sns.boxplot(data=df1, hue='method')
only contains the information from the first dataframe (df1). Since all the rows in df1["method"] have the same value ("A"), every box gets the same color.
An option would be to concatenate both dataframes; for example:
result = pd.concat([df1, df2])
sns.boxplot(data=result, hue='method')
For the updated question:
If you pass a pandas DataFrame as the data argument, you should also set x and y to column names of that DataFrame.
Try this out:
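import matplotlib.pyplot as plt
import seaborn as sns
fig, ax = plt.subplots(1, 2, sharey=True)
# One subplot per method; drop the non-numeric 'method' column before plotting
for i, (name, group) in enumerate(df_all.groupby('method')):
    sns.boxplot(data=group.drop(columns='method'), ax=ax[i])
    ax[i].set_title(name)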

python pandas dataframe - can't figure out how to lookup an index given a value from a df

I have 2 dataframes of numerical data. Given a value from one of the columns in the second df, I would like to look up the index for the value in the first df. More specifically, I would like to create a third df, which contains only index labels - using values from the second to look up its coordinates from the first.
import pandas as pd
listso = [[21,101],[22,110],[25,113],[24,112],[21,109],[28,108],[30,102],[26,106],[25,111],[24,110]]
data = pd.DataFrame(listso,index=list('abcdefghij'), columns=list('AB'))
rollmax = pd.DataFrame(data.rolling(center=False,window=5).max())
So for the third df, I hope to use the values from rollmax and find which row of data each one came from. We can call this third df indexlookup.
For example, rollmax.loc['j','A'] = 30, so indexlookup.loc['j','A'] should be 'g'.
Thanks!
You can build a Series that maps the other way around, from the values in column A to the index labels:
mapA = pd.Series(data.index, index=data.A)
Then mapA[rollmax.loc['j','A']] gives 'g'.
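Because data.A contains repeated values (21, 24 and 25 each appear twice), a value-to-label Series can return more than one label, or a row that lies outside the window. A different technique, swapped in here: for each trailing window, take the label of the maximum directly with idxmax (first occurrence on ties). The helper name rolling_idxmax_labels is hypothetical.

```python
import pandas as pd

listso = [[21,101],[22,110],[25,113],[24,112],[21,109],[28,108],[30,102],[26,106],[25,111],[24,110]]
data = pd.DataFrame(listso, index=list('abcdefghij'), columns=list('AB'))
window = 5
rollmax = data.rolling(window=window).max()


def rolling_idxmax_labels(s, window):
    """Index label of the maximum in each trailing window (None until the window is full)."""
    labels = []
    for end in range(len(s)):
        if end + 1 < window:
            labels.append(None)  # mirrors the NaN that rolling().max() gives here
        else:
            labels.append(s.iloc[end + 1 - window:end + 1].idxmax())
    return pd.Series(labels, index=s.index, dtype=object)


# Apply column by column; keyword arguments are forwarded to the function
indexlookup = data.apply(rolling_idxmax_labels, window=window)
print(indexlookup.loc['j', 'A'])  # g
```

The explicit loop is O(n × window), which is fine for small frames; it trades speed for never mixing up duplicate values.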
