Given the code below, I get the expected plot (attached).
Is there a way to get pandas to plot the X labels as a combination of A and B?
I've tried passing in x=['A','B'] as well as x=('A','B'), neither of which works.
I would like the labels just to include both of them.
It is possible to pivot and get a semi-workable solution, but I don't actually want to compare the subsets of B side by side:
df.pivot(index='B',columns='A',values='Val').plot(kind='bar')
import pandas as pd
# Build the rows up front; DataFrame.append was removed in pandas 2.0.
rows = [{'A': str(a), 'B': str(b), 'Val': a + b} for a in range(2) for b in range(3)]
df = pd.DataFrame(rows, columns=['A', 'B', 'Val'])
df.plot(kind='bar', x='B', y='Val')
You can create a MultiIndex by setting columns A and B as the index, then call plot with kind='bar':
df.set_index(['A', 'B']).plot(kind='bar', y='Val')
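If a single flat label per bar is preferred over the tuple-style MultiIndex labels, one possible alternative is to join the two columns into a helper column first. This is only a sketch: the column name AB and the '-' separator are made up for illustration, and the plot call is left commented out because it needs matplotlib.

```python
import pandas as pd

rows = [{'A': str(a), 'B': str(b), 'Val': a + b} for a in range(2) for b in range(3)]
df = pd.DataFrame(rows, columns=['A', 'B', 'Val'])

# 'AB' is a hypothetical helper column that joins both keys into one label per bar.
labelled = df.assign(AB=df['A'] + '-' + df['B'])

# labelled.plot(kind='bar', x='AB', y='Val')  # requires matplotlib
```

Because A and B are stored as strings here, plain string concatenation is enough; numeric columns would need .astype(str) first.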
I'm trying to create a DataFrame out of two existing ones. I read the titles of some articles from the web; the first column is the title and the columns after it are timestamps.
I want to concatenate both DataFrames but leave out the rows that have the same title (column one).
I tried
df = pd.concat([df1,df2]).drop_duplicates().reset_index(drop=True)
but because the other columns are not always exactly the same, I need to leave out every row that has the same first column. How would I do this?
By the way, sorry for not knowing all the right terms for my problem.
You should first remove from df2 the rows whose title already appears in df1, and then concatenate the rest with df1:
df = pd.concat([df1, df2[~df2.title.isin(df1.title)]]).reset_index(drop=True)
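A small worked example of that filter, using made-up data (the column name ts and all the values are assumptions for illustration only):

```python
import pandas as pd

# Toy frames: 'b' appears in both, but with a different timestamp.
df1 = pd.DataFrame({'title': ['a', 'b'], 'ts': ['t1', 't2']})
df2 = pd.DataFrame({'title': ['b', 'c'], 'ts': ['t9', 't3']})

# Keep only the df2 rows whose title is not already in df1, then stack.
new_rows = df2[~df2['title'].isin(df1['title'])]
merged = pd.concat([df1, new_rows]).reset_index(drop=True)
```

Here the df1 version of 'b' wins regardless of what the other columns hold. Note that this does not remove titles repeated inside df2 itself; a drop_duplicates(subset='title') on the result would handle that case.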
This probably solves your problem:
import pandas as pd
import numpy as np
df = pd.DataFrame(np.arange(2*5).reshape(2, 5))
df2 = pd.DataFrame(np.arange(2*5).reshape(2, 5))
df.columns = ['blah1', 'blah2', 'blah3', 'blah4', 'blah']
df2.columns = ['blah5', 'blah6', 'blah7', 'blah8', 'blah']
# Drop every column of df2 whose name already appears in df.
# Looping over names (not positions) stays safe while df2 shrinks.
for col in df.columns:
    if col in df2.columns:
        df2 = df2.drop(col, axis=1)
print(pd.concat([df, df2], axis=1))
I have two dataframes. One has two important labels that have some associated columns for each label. The second one has the same labels and more useful data for those same labels. I'm trying to replace the values in the first with the values of the second for each appropriate label. For example:
df = {'a':['x','y','z','t'], 'b':['t','x','y','z'], 'a_1':[1,2,3,4], 'a_2':[4,2,4,1], 'b_1':[1,2,3,4], 'b_2':[4,2,4,2]}
df_2 = {'n':['x','y','z','t'], 'n_1':[1,2,3,4], 'n_2':[1,2,3,4]}
I want to replace the values in a_1 and a_2 (and b_1 and b_2) with the values of n_1 and n_2 wherever a or b equals n. So far I have tried the replace and map functions, and they work when I use them like this:
df.iloc[0] = df.iloc[0].replace({'a_1':df['a_1']}, df_2['n_1'].loc(df['a'].iloc[0])
I can make the substitution for one specific row, but if I put that in a for loop and change the numbers I get the error "Cannot index by location index with a non-integer key". If I remove the ilocs, I get the original df back unchanged, with no error message. I get the same behavior when I use the map function. This is how I tried the for loop and the map:
for i in df:
    df.iloc[i] = df.iloc[i].replace{'a_1':df['a_1']}, df_2['n_1'].loc(df['a'].iloc[i])
    df.iloc[i] = df.iloc[i].replace{'b_1':df['b_1']}, df_2['n_1'].loc(df['b'].iloc[i])
And so on. And for the map function:
for i in df:
    df = df.map(df['b_1']}: df_2['n_1'].loc(df['b'].iloc[i])
    df = df.map(df['a_1']}: df_2['n_1'].loc(df['a'].iloc[i])
I would like the resulting dataframe to have the same format as the first but with the values of the second, something like this:
df = {'a':['x','y','z','t'], 'b':['t','x','y','z'], 'an_1':[1,2,3,4], 'an_2':[1,2,3,4], 'bn_1':[1,2,3,4], 'bn_2':[1,2,3,4]}
where an and bn are the values for a and b when n is equal to a or b in the second dataframe.
Hope this is comprehensible.
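One possible approach, sketched under the assumption that the dictionaries above are meant as DataFrames and that the existing a_1, a_2, b_1 and b_2 columns should be overwritten in place: index the second frame by n and use Series.map to look up each label. The name lookup is introduced here for illustration.

```python
import pandas as pd

df = pd.DataFrame({'a': ['x', 'y', 'z', 't'], 'b': ['t', 'x', 'y', 'z'],
                   'a_1': [1, 2, 3, 4], 'a_2': [4, 2, 4, 1],
                   'b_1': [1, 2, 3, 4], 'b_2': [4, 2, 4, 2]})
df_2 = pd.DataFrame({'n': ['x', 'y', 'z', 't'],
                     'n_1': [1, 2, 3, 4], 'n_2': [1, 2, 3, 4]})

# Index the second frame by its label so it can act as a lookup table.
lookup = df_2.set_index('n')

for label in ['a', 'b']:
    for suffix in ['1', '2']:
        # Series.map with a Series argument looks each label up in its index.
        df[f'{label}_{suffix}'] = df[label].map(lookup[f'n_{suffix}'])
```

This also avoids the error above: iterating with "for i in df" yields column names rather than row positions, which is why iloc rejects them. With this sample data the b columns come out in the order of column b (t, x, y, z), not in the order of n.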
I am attempting to write a plotting function that takes 2 arguments - a DataFrame and a label (df1 and label1). My function also has optional arguments to add more DataFrames if I have more groups of data to plot:
def REE_GroupPlot(df1, label1,
                  df2=None, label2=None,
                  df3=None, label3=None,
                  df4=None, label4=None,
                  df5=None, label5=None):
I am trying to plot all of the columns in a single DataFrame under the same key label and as the same color, but I am not quite sure how. I have looked into pd.groupby, but I can't seem to make it do what I want.
I would appreciate any suggestions - thanks!
What if you pass in a list of DataFrames, like this:
def REE_Group_Plot(dfs, label):
    # combine all DataFrames into one
    all_dfs = pd.concat(dfs)
    # the rest of the code to plot the graph
REE_Group_Plot(dfs=[df, df2, df3, df4], label="ABC")
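To get every column of one DataFrame drawn in the same color under a single legend entry, a rough sketch might look like the following. The function name ree_group_plot and its groups/ax interface are hypothetical, and it assumes the caller creates the matplotlib Axes and that each column is plotted against the DataFrame's index.

```python
import pandas as pd

def ree_group_plot(groups, ax):
    """Plot all columns of each DataFrame in one color per group.

    groups: dict mapping a legend label to a DataFrame (hypothetical interface).
    ax: a matplotlib Axes supplied by the caller.
    """
    for i, (label, frame) in enumerate(groups.items()):
        color = f"C{i % 10}"  # shorthand for matplotlib's default color cycle
        for j, column in enumerate(frame.columns):
            # Only the first line of a group gets the real label, so the
            # legend shows one entry per group instead of one per column.
            ax.plot(frame.index, frame[column], color=color,
                    label=label if j == 0 else "_nolegend_")
    ax.legend()
    return ax
```

Usage would be along the lines of fig, ax = plt.subplots() followed by ree_group_plot({'Group 1': df1, 'Group 2': df2}, ax). Taking a dict instead of five pairs of optional arguments keeps the signature the same for any number of groups.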
I have a dataframe with multiple columns, and I can easily use seaborn to plot it in a boxplot (or violinplot, etc), like this:
data1 = {'p0':[1.,2.,5,0.], 'p1':[2., 1.,1,3], 'p2':[3., 3.,2., 4.]}
df1 = pd.DataFrame.from_dict(data1)
sns.boxplot(data=df1)
What I now need is to merge this dataframe with another one, so that I can plot them in a single boxplot, just as is done here: http://seaborn.pydata.org/examples/grouped_boxplot.html
I have tried adding a column and concatenating. The result seems ok
data1 = {'p0':[1.,2.,5,0.], 'p1':[2., 1.,1,3], 'p2':[3., 3.,2., 4.]}
data2 = {'p0':[3.,1.,5,1.], 'p1':[3., 2.,3,3], 'p2':[1., 2.,2., 5.]}
df1 = pd.DataFrame.from_dict(data1)
df1['method'] = 'A'
df2 = pd.DataFrame.from_dict(data2)
df2['method'] = 'B'
df_all = pd.concat([df1,df2])
sns.boxplot(data=df_all)
This works, but it plots together data from methods A and B. However this fails:
sns.boxplot(data=df_all, hue='method')
because I then need to specify x and y. If I specify x as x=['p0', 'p1', 'p2'], then the 3 columns are averaged.
So I guess I can merge the dataframes in a different way so that its representation is simple with seaborn.
I think what would be needed here for this to work in a simple way would be to have a dataframe like so:
value  method  p
1.0    A       p0
2.1    A       p0
3.0    A       p1
1.3    B       p0
4.3    B       p1
Then you could get what you want with sns.boxplot(data=df, hue='method', x='p', y='value')
I'm looking into how to merge df1 and df2 easily into a data frame like this one, but I'm not really a pandas expert.
Edit: Figured it out, need to use the melt method:
df3 = pd.concat([df1.melt(id_vars='method', var_name='p'),
                 df2.melt(id_vars='method', var_name='p')],
                ignore_index=True)
sns.boxplot(x='p', y='value', hue='method', data=df3)
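For reference, here is a self-contained version of the melt step using the data from the question, so the resulting long-format layout can be inspected without plotting. The name long_df is chosen here for illustration.

```python
import pandas as pd

data1 = {'p0': [1., 2., 5, 0.], 'p1': [2., 1., 1, 3], 'p2': [3., 3., 2., 4.]}
data2 = {'p0': [3., 1., 5, 1.], 'p1': [3., 2., 3, 3], 'p2': [1., 2., 2., 5.]}
df1 = pd.DataFrame(data1).assign(method='A')
df2 = pd.DataFrame(data2).assign(method='B')

# melt turns the p0/p1/p2 columns into (p, value) pairs, one row per measurement.
long_df = pd.concat([df1.melt(id_vars='method', var_name='p'),
                     df2.melt(id_vars='method', var_name='p')],
                    ignore_index=True)
print(long_df.head())
```

The result has the columns method, p and value, which is the shape that sns.boxplot(x='p', y='value', hue='method', data=long_df) expects.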
sns.boxplot(data=df1, hue='method')
just contains the information from the first dataframe (df1). If you just use df1, all the rows in df1["method"] have the same value ("A"), so the color will be the same for all of them.
An option would be to concatenate both dataframes; for example:
result = pd.concat([df1, df2])
sns.boxplot(data=result, hue='method')
UPDATED QUESTION:
If you pass a pandas.DataFrame as the data argument, you should then set the x and y arguments to column names of that DataFrame.
Try this out:
# One subplot per method, sharing the y axis for easy comparison.
fig, ax = plt.subplots(1, 2, sharey=True)
for i, (method, group) in enumerate(df_all.groupby('method')):
    sns.boxplot(data=group, ax=ax[i])
    ax[i].set_title(method)
I am trying to plot a pandas DataFrame with Timestamp indices that has a time gap in its index. Using DataFrame.plot() results in linear interpolation between the last Timestamp of the former segment and the first Timestamp of the next. I do not want linear interpolation, nor do I want empty space between the two date segments. Is there a way to do that?
Suppose we have a DataFrame with Timestamp indices:
>>> import numpy as np
>>> import pandas as pd
>>> import matplotlib.pyplot as plt
>>> df = pd.DataFrame(np.random.randn(1000), index=pd.date_range('1/1/2000', periods=1000))
>>> df = df.cumsum()
Now let's take two time chunks of it and plot them:
>>> df = pd.concat([df.loc['Jan 2000':'Aug 2000'], df.loc['Jan 2001':'Aug 2001']])
>>> df.plot()
>>> plt.show()
The resulting plot has an interpolation line connecting the Timestamps on either side of the gap. I cannot figure out how to upload pictures on this machine, but these pictures from Google Groups show my problem (interpolated.jpg, no-interpolation.jpg and no gaps.jpg). I can recreate the first as shown above. The second is achievable by replacing all gap values with NaN (see also this question). How can I achieve the third version, where the time gap is omitted?
Try:
df.plot(x=df.index.astype(str))
You may want to customize ticks and tick labels.
EDIT
That works for me using pandas 0.17.1 and numpy 1.10.4.
All you really need is a way to convert the DatetimeIndex to another type which is not datetime-like. In order to get meaningful labels I chose str. If x=df.index.astype(str) does not work with your combination of pandas/numpy/whatever you can try other options:
df.index.to_series().dt.strftime('%Y-%m-%d')
df.index.to_series().apply(lambda x: x.strftime('%Y-%m-%d'))
...
I realized that resetting the index is not necessary so I removed that part.
In my case I had a DatetimeIndex instead of Timestamp objects, but the following works for me in pandas 0.24.2 to eliminate the time series gaps: convert the DatetimeIndex to strings.
df = pd.read_sql_query(sql, sql_engine)
df.set_index('date', inplace=True)
df.index = df.index.map(str)
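Putting the pieces together with the sample data from the question, a minimal sketch of the string-index approach could look like this. The names full, gapped and compact are made up here, and the plot call is commented out because it needs matplotlib. It relies on pandas drawing rows at consecutive positions when the index is not datetime-like, which matched the answers above but may be worth checking on your version.

```python
import numpy as np
import pandas as pd

full = pd.DataFrame(np.random.randn(1000),
                    index=pd.date_range('2000-01-01', periods=1000)).cumsum()

# Two chunks with a four-month hole between them.
gapped = pd.concat([full.loc['2000-01':'2000-08'], full.loc['2001-01':'2001-08']])

# Replace the DatetimeIndex with plain string labels so the plot no longer
# spaces rows by elapsed time.
compact = gapped.copy()
compact.index = gapped.index.strftime('%Y-%m-%d')

# compact.plot()  # requires matplotlib
```

The trade-off is that the x axis is no longer a time axis, so tick placement and date formatting have to be handled manually.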