plotting multiple columns of a pandas dataframe - python

I am new to Python and I have two columns in a dataframe that I want to plot against date:
plt.scatter(thing.date,thing.loc[:,['numbers','more_numbers']])
My intuition is that the above should work (because MATLAB allows this kind of thing), but it doesn't, and I'm not sure why.
Is there a way around this?
I'm hoping to plot these columns for a sequence of 4 dataframes on the same axes, so I'd like to use a command like the above so that I can colour the columns from each dataframe distinctively.

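The easiest way is to loop over the columns:
fig, ax = plt.subplots()
for col in ['numbers', 'more_numbers']:
    # each call adds one column to the same axes
    ax.scatter(things.date, things[col], label=col)
    # or, using the pandas plotting accessor:
    # things.plot.scatter(x='date', y=col, label=col, ax=ax)
ax.legend()
plt.show()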
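The question also asks about plotting the same two columns from four dataframes on one set of axes. A minimal sketch of that, assuming the dataframes are held in a dict: the `frames` dict, its contents, and the colour and marker choices are made up for illustration and are not from the original post. Each dataframe gets one colour and each column gets one marker.

```python
import matplotlib.pyplot as plt
import pandas as pd

# Hypothetical stand-ins for the four dataframes in the question.
dates = pd.date_range("2021-01-01", periods=5, freq="D")
frames = {
    f"frame_{k}": pd.DataFrame({
        "date": dates,
        "numbers": [k + i for i in range(5)],
        "more_numbers": [2 * k + i for i in range(5)],
    })
    for k in range(4)
}

colours = ["tab:blue", "tab:orange", "tab:green", "tab:red"]  # one per dataframe
markers = {"numbers": "o", "more_numbers": "x"}               # one per column

fig, ax = plt.subplots()
for (name, frame), colour in zip(frames.items(), colours):
    for col, marker in markers.items():
        ax.scatter(frame["date"], frame[col],
                   color=colour, marker=marker, label=f"{name}: {col}")
ax.legend()
# plt.show()  # uncomment when running as a script
```

Using colour for the dataframe and marker for the column keeps the eight series distinguishable without needing eight colours.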

Related

Python Stacked barchart with dataframe

I'm trying to visualize a data frame I have with a stacked barchart, where the x is websites, the y is frequency and then the groups on the barchart are different groups using them.
This is the dataframe:
This is the plot created just by doing this:
web_data_roles.plot(kind='barh', stacked=True, figsize=(20,10))
As you can see, it's not what I want. I've tried changing the plot so that the axes match up to the different columns of the dataframe, but it just says "no numerical data to plot". I'm not sure how to go about this anymore, so all help is appreciated.
You need to organise your dataframe so that each role is a column:
set_index(): initial preparation
unstack(): move role out of the index and into the columns
droplevel(): clean up the multi-index columns
import io
import pandas as pd
import matplotlib.pyplot as plt
fig, ax = plt.subplots(figsize=(10, 5))
df = pd.read_csv(io.StringIO("""website,role,freq
www.bbc.co.uk,director,2000
www.bbc.co.uk,technical,500
www.twitter.com,director,4000
www.twitter.com,technical,1500
"""))
# reshape to one row per website and one column per role, then plot
df.set_index(["website","role"]).unstack(1).droplevel(0,axis=1).plot(ax=ax, kind="barh", stacked=True)
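To see what the reshaping step produces before plotting, it can help to inspect the intermediate frame. The sketch below reuses the sample data from the answer; the names `wide` and `same` are illustrative only. As far as I can tell, `DataFrame.pivot` should give the same table in one call, which may be easier to read than the set_index/unstack/droplevel chain.

```python
import io
import pandas as pd

df = pd.read_csv(io.StringIO("""website,role,freq
www.bbc.co.uk,director,2000
www.bbc.co.uk,technical,500
www.twitter.com,director,4000
www.twitter.com,technical,1500
"""))

# Chain from the answer: one row per website, one column per role.
wide = df.set_index(["website", "role"]).unstack(1).droplevel(0, axis=1)

# Alternative reshaping with pivot (assumes each website/role pair occurs once).
same = df.pivot(index="website", columns="role", values="freq")

print(wide)
```

Once the frame is in this wide shape, every column is numeric, which is what the stacked bar plot needs.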

How to remove the extra figures created when running a for loop to create seaborn plots

I am trying to do EDA along with exploring the Matplotlib and Seaborn libraries.
The data_cat DataFrame has 4 columns and I want to create plots in a single row with 4 columns.
For that, I created a figure object with 4 axes objects.
fig, ax = plt.subplots(1,4, figsize = (16,4))
for i in range(len(data_cat.columns)):
    sns.catplot(x = data_cat.columns[i], kind = 'count', data = data_cat, ax= ax[i])
The output for it is a figure with the 4 plots (as required) but it is followed by 4 blank plots that I think are the extra figure objects generated by the sns.catplot function.
Your code does not work as intended because sns.catplot() is a figure-level function that is designed to create its own grid of subplots, so each call opens a new figure. If you want to set up the subplot grid directly in matplotlib, as you do with your first line, you should use the appropriate axes-level function instead, in this case sns.countplot():
fig, ax = plt.subplots(1, 4, figsize = (16,4))
for i in range(4):
    sns.countplot(x = data_cat.columns[i], data = data_cat, ax= ax[i])
Alternatively, you could use pandas' df.melt() method to tidy up your dataset so that all the values from your four columns are in one column (say 'col_all'), and you have another column (say 'subplot') that identifies from which original column each value is. Then you can get all the subplots with one call:
sns.catplot(x='col_all', kind='count', data=data_cat, col='subplot')
I answered a related question here.
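A minimal sketch of the melt() route described above, assuming a small made-up `data_cat` (the column names and values are hypothetical; 'col_all' and 'subplot' are the names suggested in the answer). `sharex=False` is my own addition so that each panel only shows the categories of its own column.

```python
import pandas as pd
import seaborn as sns

# Hypothetical categorical data with four columns.
data_cat = pd.DataFrame({
    "colour": ["red", "blue", "red", "green"],
    "width": ["S", "M", "M", "L"],
    "shape": ["round", "square", "round", "round"],
    "grade": ["a", "b", "a", "c"],
})

# Long format: 'subplot' holds the original column name, 'col_all' the value.
long = data_cat.melt(var_name="subplot", value_name="col_all")

# One figure-level call draws one count plot per original column.
g = sns.catplot(data=long, x="col_all", kind="count",
                col="subplot", sharex=False)
```

Because catplot() creates the figure itself here, no separate plt.subplots() call is made and no blank extra figures appear.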

Seaborn plot pandas dataframe by multiple groupby

I have a pandas dataframe with 4 categories (50, 60, 70, 80) nested within two categories (positive, negative), and I would like to draw a seaborn kdeplot of a column (e.g. A_mean...) for each group. What I want to achieve is this (this was done by splitting the dataframe into a list). I went over several posts; this code (Multiple single plots in seaborn with pandas groupby data) works for one grouping level but not for two, when I want to plot this for each Game_RS:
for i, group in df_hb_SLR.groupby('Condition'):
    sns.kdeplot(data=group['A_mean_per_subject'], shade=True, color='blue', label = 'label name')
I tried to use this one (Seaborn groupby pandas Series) but the first answer did not work for me:
sns.kdeplot(df_hb_SLR.A_mean_per_subject, groupby=df_hb_SLR.Game_RS)
AttributeError: 'Line2D' object has no property 'groupby'
and I could not get the pivot answer to work.
Is there a direct way to do this in seaborn, or a better way directly from the pandas DataFrame?
My data are accessible in csv format under this link -- data and I load them as usual:
df_hb_SLR = pd.read_csv('data.csv')
Thank you for help.
Here is a solution using seaborn's FacetGrid, which makes this kind of thing really easy:
g = sns.FacetGrid(data=df_hb_SLR, col="Condition", hue='Game_RS', height=5, aspect=0.5)
# fill=True replaces shade=True, which is deprecated since seaborn 0.11
g = g.map(sns.kdeplot, 'A_mean_per_subject', fill=True)
g.add_legend()
The downside of FacetGrid is that it creates a new figure, so if you'd like to integrate those plots into a larger ensemble of subplots, you could achieve the same result using groupby() and some looping:
group1 = "Condition"
N1 = len(df_hb_SLR[group1].unique())
group2 = 'Game_RS'
target = 'A_mean_per_subject'
height = 5
aspect = 0.5
colour = ['gray', 'blue', 'green', 'darkorange']
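# one subplot per Condition; width scales with the number of subplots
fig, axs = plt.subplots(1, N1, figsize=(N1*height*aspect, height), sharey=True)
for (group1Name, df1), ax in zip(df_hb_SLR.groupby(group1), axs):
    ax.set_title(group1Name)
    # one density curve per Game_RS value within this Condition
    for (group2Name, df2), c in zip(df1.groupby(group2), colour):
        # fill=True replaces shade=True, which is deprecated since seaborn 0.11
        sns.kdeplot(df2[target], fill=True, label=group2Name, ax=ax, color=c)
    ax.legend()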

Plot columns of a dataframe using default colormap, except one column using a different color

I have a whole bunch of columns in a dataframe that I'm plotting like this:
df.xs('Mean', level=1).cumsum().plot(cmap='winter')
This plots the cumulative sum of the means over time for all of my columns (level 0 of the multiindex is time). However, I'd like to highlight a specific column in the plot with a different colour, but am unsure how to go about it. Something like this:
df.xs('Mean', level=1).cumsum().plot(cmap='winter', color={'my_specific_column':'red'})
which will plot all of the other columns along the 'winter' cmap spectrum (teal to blue), except for 'my_specific_column', which will be red.
Any ideas?
I think the simplest option is to plot twice:
to_draw = df.xs('Mean', level=1).cumsum()
fig, ax = plt.subplots()
to_draw.drop('my_specific_column', axis=1).plot(cmap='winter', ax=ax)
to_draw['my_specific_column'].plot(color='red', ax=ax, label='my_specific_column')
ax.legend()
This is another solution. First we sample one colour from the colormap for each column we want to plot. Then we override the entry for the column we want to highlight. Finally, we pass the resulting list of colours to the plot.
import numpy as np
import matplotlib.pyplot as plt
cmap = plt.get_cmap('winter')
to_draw = df.xs('Mean', level=1).cumsum()
columns = to_draw.columns
# sample the colormap evenly over [0, 1]; calling cmap with small integers
# would return nearly identical colours from the start of its lookup table
colors = [cmap(x) for x in np.linspace(0, 1, len(columns))]
# build a dictionary mapping each column to its colour
COLORS_DICT = dict(zip(columns, colors))
# override the column we want to highlight
COLORS_DICT['my_specific_column'] = 'red'
# list of colours in column order
COLORS_MAP = [COLORS_DICT[col] for col in columns]
to_draw.plot(color=COLORS_MAP)

Plot more than one data frame on the same figure pandas

I have a list of many aggregated data frames with identical structure.
I would like to plot two columns from each dataframe on the same graph.
I used this code snippet but it gives me a separate plot for each dataframe:
# iterate through a list
for df in frames:
    df.plot(x='Time', y='G1', figsize=(16, 10))
    plt.hold(True)
plt.show()
If you have each set indexed, you can just concatenate all of them and plot them at once without having to iterate.
# If not indexed:
# frames = [df.assign(sample=i) for i, df in enumerate(frames)]
df = pd.concat(frames).pivot(index='Time', columns='sample', values='G1')
df.plot(figsize=(16, 10));
Pivoting on Time also makes sure that your data is aligned. Note that plt.hold was deprecated in matplotlib 2.0 and removed in 3.0, so it should not be used.
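A self-contained sketch of the concatenate-and-pivot approach, assuming three small made-up frames with 'Time' and 'G1' columns (the data and the names `tagged` and `wide` are hypothetical). The `sample` column is added with assign() so that pivot() has something to spread into columns.

```python
import matplotlib.pyplot as plt
import pandas as pd

# Hypothetical frames with identical structure.
frames = [
    pd.DataFrame({"Time": range(5), "G1": [i * (k + 1) for i in range(5)]})
    for k in range(3)
]

# Tag each frame with its position in the list, then stack them.
tagged = [df.assign(sample=i) for i, df in enumerate(frames)]

# One row per Time value, one column per original frame.
wide = pd.concat(tagged).pivot(index="Time", columns="sample", values="G1")

# A single plot call draws one line per column on the same axes.
ax = wide.plot(figsize=(16, 10))
# plt.show()  # uncomment when running as a script
```

If the frames do not share exactly the same Time values, the pivoted table will contain NaN where a frame has no reading, which is usually what you want for aligned plotting.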
As you noticed, pandas.DataFrame.plot is not affected by matplotlib's hold parameter because it creates a new figure every time. The way to get around this is to pass in the ax parameter explicitly. If ax is not None, it tells the DataFrame to plot on a specific set of axes instead of making a new figure on its own.
You can prepare a set of axes ahead of time, or use the return value of the first call to df.plot. I show the latter approach here:
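ax = None
for df in frames:
    # the first call creates the axes; later calls reuse them
    ax = df.plot(x='Time', y='G1', figsize=(16, 10), ax=ax)
# plt.hold(True) is not needed here (it was removed in matplotlib 3.0)
plt.show()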
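The other approach mentioned above, preparing the axes ahead of time, might look like the sketch below. The frames are made up for illustration, and the `label` values are my own addition so that the legend can tell the lines apart.

```python
import matplotlib.pyplot as plt
import pandas as pd

# Hypothetical frames with identical structure.
frames = [
    pd.DataFrame({"Time": range(5), "G1": [i * (k + 1) for i in range(5)]})
    for k in range(3)
]

# Create the figure and axes once, up front.
fig, ax = plt.subplots(figsize=(16, 10))

# Every frame is drawn on the same axes because ax is passed explicitly.
for i, df in enumerate(frames):
    df.plot(x="Time", y="G1", ax=ax, label=f"frame {i}")
# plt.show()  # uncomment when running as a script
```

Creating the axes first also makes it straightforward to set titles, limits, or additional subplots before any data is drawn.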
