Scatter plot from dataframe with index on x-axis - python

I have a pandas DataFrame, df, with an index named date and the columns columnA, columnB and columnC.
I am trying to draw a scatter plot with the index on the x-axis and columnA on the y-axis using the DataFrame syntax.
When I try:
df.plot(kind='scatter', x='date', y='columnA')
I get the error KeyError: 'date', probably because date is the index and not a column.
When I try:
df.plot(kind='scatter', y='columnA')
I get the error:
ValueError: scatter requires and x and y column
so the index is not used on the x-axis by default.
When I try:
df.plot(kind='scatter', x=df.index, y='columnA')
I get the error:
KeyError: "DatetimeIndex(['1818-01-01', '1818-01-02', '1818-01-03', '1818-01-04',\n
'1818-01-05', '1818-01-06', '1818-01-07', '1818-01-08',\n
'1818-01-09', '1818-01-10',\n ...\n
'2018-03-22', '2018-03-23', '2018-03-24', '2018-03-25',\n
'2018-03-26', '2018-03-27', '2018-03-28', '2018-03-29',\n
'2018-03-30', '2018-03-31'],\n
dtype='datetime64[ns]', name='date', length=73139, freq=None) not in index"
I can plot it if I use matplotlib.pyplot directly:
plt.scatter(df.index, df['columnA'])
Is there a way to plot the index on the x-axis using the DataFrame kind syntax?

This is kind of ugly (I think the matplotlib solution you used in your question is better, FWIW), but you can always create a temporary DataFrame with the index as a column using
df.reset_index()
If the index is unnamed, the new column is called 'index'; if the index is named (like date in the question), the column takes that name instead. Assuming an unnamed index, you could use
df.reset_index().plot(kind='scatter', x='index', y='columnA')
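To see how reset_index() names the new column, here is a minimal sketch. The frames `named` and `unnamed` and their values are made up for illustration, and a small integer index stands in for the dates so that the example does not depend on how a given pandas version handles datetime values in kind='scatter'.

```python
import matplotlib
matplotlib.use("Agg")  # assumption: headless backend, so no window is needed
import pandas as pd

# Hypothetical data: same values, one index named 'date', one unnamed.
named = pd.DataFrame({"columnA": [3, 1, 2]},
                     index=pd.Index([10, 20, 30], name="date"))
unnamed = pd.DataFrame({"columnA": [3, 1, 2]}, index=[10, 20, 30])

# reset_index() turns the index into an ordinary column.
named_cols = named.reset_index().columns.tolist()      # column keeps the index name
unnamed_cols = unnamed.reset_index().columns.tolist()  # column falls back to 'index'
print(named_cols, unnamed_cols)

# With the index available as a column, kind='scatter' can find it by name.
ax = named.reset_index().plot(kind="scatter", x="date", y="columnA")
```

For the DataFrame in the question, this suggests passing x='date' rather than x='index' after reset_index().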

A simpler solution would be:
df['x1'] = df.index
df.plot(kind='scatter', x='x1', y='columnA')
Just copy the index into a regular column before the plot call.

At least in pandas > 1.4, the easiest approach is this:
df['columnA'].plot(style=".")
This lets you mix scatter and line plots, and it uses the standard pandas plot interface.
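A short sketch of mixing the two styles on one set of axes is shown here. The data values are invented for illustration; only the column names and the index name come from the question.

```python
import matplotlib
matplotlib.use("Agg")  # assumption: headless backend, so no window is needed
import pandas as pd

# Hypothetical data with a DatetimeIndex named 'date', as in the question.
idx = pd.date_range("2018-01-01", periods=5, freq="D", name="date")
df = pd.DataFrame({"columnA": [1.0, 3.0, 2.0, 5.0, 4.0],
                   "columnB": [2.0, 2.5, 3.0, 3.5, 4.0]}, index=idx)

# style="." draws point markers only; the index is used for the x-axis.
ax = df["columnA"].plot(style=".")

# A second series drawn as an ordinary line on the same axes.
df["columnB"].plot(ax=ax, style="-")

lines = ax.get_lines()
```

Note that this produces matplotlib line objects with markers rather than a true scatter collection, so per-point sizes or colours are not available this way.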

Related

How can I plot large amount of data? [duplicate]

What's the easiest way to plot two dataframes onto a single plot

I have two dataframes, df1 and df2.
df1 has two columns: column 1, 'key', with 20 items, and column 2, 'df1_val', with a value for each key.
df2 is similar, but column 2 is called 'df2_val'.
What's the easiest way to draw a single plot with both df1_val and df2_val, with the keys on the x-axis?
I would do it this way:
Start by renaming df1_val and df2_val to the same name, 'value', then
fig = plt.figure()
for r in [df1, df2]:
    plt.plot(r['key'], r['value'])
plt.xlim(0, <a value>)
plt.ylim(0, <a value>)
plt.show()
An alternative is to plot what you need for the first dataframe, df1, and use ax to "force" the other plots onto the same graph:
```
ax = df1.plot()
```
And
```
df2.plot(ax=ax)
```
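Put together, the ax-sharing approach might look like the sketch below. The frames are built with invented values; only the column names 'key', 'df1_val' and 'df2_val' come from the question.

```python
import matplotlib
matplotlib.use("Agg")  # assumption: headless backend, so no window is needed
import pandas as pd

# Hypothetical data: 20 keys with one value column per frame.
keys = list(range(20))
df1 = pd.DataFrame({"key": keys, "df1_val": [k * 2 for k in keys]})
df2 = pd.DataFrame({"key": keys, "df2_val": [k + 5 for k in keys]})

# The first call creates the axes; the second call reuses them through ax=.
ax = df1.plot(x="key", y="df1_val")
df2.plot(x="key", y="df2_val", ax=ax)

labels = [line.get_label() for line in ax.get_lines()]
```

Because each call names its own y column, the value columns do not need to be renamed first.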

Python pandas plot scatter datetime error

I want a scatter plot where the x-axis is a datetime and the y-axis is an int. I have only a few data points, which are discrete rather than continuous, so I don't want to connect them.
My DataFrame is:
df = pd.DataFrame({'datetime':[dt.datetime(2016,1,1,0,0,0), dt.datetime(2016,1,4,0,0,0),
dt.datetime(2016,1,9,0,0,0)], 'value':[10, 7, 8]})
If I use the "normal" plot, then I get a "line" figure:
df.plot(x='datetime', y='value')
But how can I plot only the dots? This gives an error:
df.plot.scatter(x='datetime', y='value')
KeyError: 'datetime'
Of course I can use some cheat to get the result I want, for example:
df.plot(x='datetime', y='value', marker='o', linewidth=0)
But I don't understand why the scatter version does not work...
Thank you for help!
A scatter plot can be drawn with the DataFrame.plot.scatter() method. A scatter plot requires numeric columns for the x and y axes. These can be specified with the x and y keywords.
Alternative Approach:
In [71]: df['day'] = df['datetime'].dt.day
In [72]: df.plot.scatter(x='day', y='value')
Out[72]: <matplotlib.axes._subplots.AxesSubplot at 0x25440a1bc88>
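If the full datetime should stay on the x-axis, the marker-only workaround from the question can also be written with a style string, which avoids kind='scatter' altogether. This is a minimal sketch using the question's own data; the choice of style="o" is an assumption about the desired marker.

```python
import datetime as dt

import matplotlib
matplotlib.use("Agg")  # assumption: headless backend, so no window is needed
import pandas as pd

df = pd.DataFrame({
    "datetime": [dt.datetime(2016, 1, 1),
                 dt.datetime(2016, 1, 4),
                 dt.datetime(2016, 1, 9)],
    "value": [10, 7, 8],
})

# style="o" asks for circle markers with no connecting line.
ax = df.plot(x="datetime", y="value", style="o")

line = ax.get_lines()[0]
```

Unlike the 'day' column approach, this keeps month and year information, so points from different months do not overlap.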


How to change xlabel in pandas matplotlib stacked bar

The easiest way to plot a pandas dataframe is as described in the documentation like this:
http://pandas.pydata.org/pandas-docs/stable/visualization.html
In my case I want to create a stacked bar chart:
df2.plot(kind='bar', stacked=True);
This is all working well, but I would like to use one column of the df2 as xlabels and not simply have [1,2,3,4... etc] as labels. Is there a simple way to achieve it with an additional parameter in the plot function or do I need to do it in a more complicated way?
The plot uses the index of your dataframe as the labels, so if you want to use a particular column, set it as your index:
df2.index = df2.labelcol
df2.plot(kind='bar', stacked=True)
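The same idea can be sketched without modifying df2 in place by using set_index(), which returns a new frame. The column values and the names 'x' and 'y' below are hypothetical; 'labelcol' follows the answer above.

```python
import matplotlib
matplotlib.use("Agg")  # assumption: headless backend, so no window is needed
import pandas as pd

# Hypothetical data: one label column and two numeric columns to stack.
df2 = pd.DataFrame({"labelcol": ["a", "b", "c"],
                    "x": [1, 2, 3],
                    "y": [4, 5, 6]})

# set_index() returns a copy indexed by labelcol; df2 itself is unchanged.
ax = df2.set_index("labelcol").plot(kind="bar", stacked=True)

tick_labels = [t.get_text() for t in ax.get_xticklabels()]
print(tick_labels)
```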

Using a Pandas dataframe index as values for x-axis in matplotlib plot

I have a time series in a pandas DataFrame with a number of columns which I'd like to plot. Is there a way to set the x-axis to always use the index from a DataFrame?
When I use the .plot() method from pandas, the x-axis is formatted correctly; however, when I pass my dates and the column(s) I'd like to plot directly to matplotlib, the graph doesn't plot correctly. Thanks in advance.
plt.plot(site2.index.values, site2['Cl'])
plt.show()
FYI: site2.index.values produces this (I've cut out the middle part for brevity):
array([
'1987-07-25T12:30:00.000000000+0200',
'1987-07-25T16:30:00.000000000+0200',
'2010-08-13T02:00:00.000000000+0200',
'2010-08-31T02:00:00.000000000+0200',
'2010-09-15T02:00:00.000000000+0200'
],
dtype='datetime64[ns]')
It seems the issue was that I had .values. Without it (i.e. site2.index) the graph displays correctly.
You can use plt.xticks to set the x-axis tick labels.
Try:
plt.plot(site2['Cl'].to_numpy())  # plot against positions 0..n-1
plt.xticks(range(len(site2)), site2.index.astype(str))  # locations, labels
plt.show()
see the documentation for more details: http://matplotlib.org/api/pyplot_api.html#matplotlib.pyplot.xticks
That's built right into the plot() method.
You can use yourDataFrame.plot(use_index=True) to put the DataFrame index on the x-axis.
Read More Here: https://pandas.pydata.org/pandas-docs/version/0.23/generated/pandas.DataFrame.plot.html
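The following sketch contrasts the options discussed in this thread. The frame site2 is rebuilt with invented dates and values; only the names site2 and 'Cl' come from the question.

```python
import matplotlib
matplotlib.use("Agg")  # assumption: headless backend, so no window is needed
import matplotlib.pyplot as plt
import pandas as pd

# Hypothetical stand-in for the question's time series.
idx = pd.date_range("1987-07-25", periods=6, freq="D")
site2 = pd.DataFrame({"Cl": [1.2, 1.5, 1.1, 1.8, 1.6, 1.4]}, index=idx)

# Plain matplotlib: pass the index itself (not .values), as the asker found.
fig, ax_mpl = plt.subplots()
ax_mpl.plot(site2.index, site2["Cl"])
fig.autofmt_xdate()  # tilt the date labels so they do not overlap

# pandas: use_index=True puts the index on the x-axis.
ax_idx = site2.plot(y="Cl", use_index=True)

# use_index=False plots against row positions 0..n-1 instead.
ax_pos = site2.plot(y="Cl", use_index=False)
positions = list(ax_pos.get_lines()[0].get_xdata())
```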
If, like me, you want matplotlib to choose a 'sensible' tick scale, here is one way to use a pandas DataFrame index as the x-axis labels in a matplotlib plot. Code:
fig, ax = plt.subplots()
ax.plot(site2['Cl'].to_numpy())  # plot against positions 0..n-1
x_ticks = ax.get_xticks()  # use matplotlib default xticks
x_ticks = [int(t) for t in x_ticks if t in range(len(site2))]  # keep only valid positions
ax.set_xticks(x_ticks)
ax.set_xticklabels([str(d) for d in site2.index[x_ticks]])
