I want a scatter plot where the x-axis is a datetime and the y-axis is an int. I have only a few data points, and they are discrete rather than continuous, so I don't want to connect them.
My DataFrame is:
df = pd.DataFrame({'datetime': [dt.datetime(2016,1,1,0,0,0), dt.datetime(2016,1,4,0,0,0),
                                dt.datetime(2016,1,9,0,0,0)],
                   'value': [10, 7, 8]})
If I use the "normal" plot, then I get a "line" figure:
df.plot(x='datetime', y='value')
But how can I plot only the dots? This gives an error:
df.plot.scatter(x='datetime', y='value')
KeyError: 'datetime'
Of course I can use some cheat to get the result I want, for example:
df.plot(x='datetime', y='value', marker='o', linewidth=0)
But I don't understand why the scatter version does not work...
Thank you for help!
A scatter plot can be drawn with the DataFrame.plot.scatter() method. It requires numeric columns for both the x and y axes, which are specified with the x and y keywords. A datetime column is not treated as numeric here, which is why the call above fails.
Alternative Approach:
In [71]: df['day'] = df['datetime'].dt.day
In [72]: df.plot.scatter(x='day', y='value')
Out[72]: <matplotlib.axes._subplots.AxesSubplot at 0x25440a1bc88>
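Another option, if you want to keep the full timestamps on the x-axis instead of reducing them to the day of the month, is to call matplotlib's Axes.scatter directly, since it accepts datetime values. A minimal sketch reusing the DataFrame from the question (the Agg backend line is only an assumption so that the snippet runs without a display):

```python
import datetime as dt

import matplotlib
matplotlib.use("Agg")  # assumption: headless backend; drop this line in a notebook
import matplotlib.pyplot as plt
import pandas as pd

df = pd.DataFrame({
    'datetime': [dt.datetime(2016, 1, 1), dt.datetime(2016, 1, 4), dt.datetime(2016, 1, 9)],
    'value': [10, 7, 8],
})

fig, ax = plt.subplots()
# Axes.scatter converts the datetime column to matplotlib date units itself
points = ax.scatter(df['datetime'], df['value'])
ax.set_xlabel('datetime')
ax.set_ylabel('value')
fig.autofmt_xdate()  # slant the date labels so they do not overlap
```

Unlike the 'day' column, this keeps working when the dates span more than one month.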

This question already has answers here:
Barplot and line plot in seaborn/matplotlib
(1 answer)
How to line plot timeseries data on a bar plot
(1 answer)
pandas bar plot combined with line plot shows the time axis beginning at 1970
(2 answers)
Closed 4 months ago.
I have barplot and lineplots that share the same x axis that I want to plot together. Here's the picture:
I want the graph plot to keep the "average_daily_price" as y axis and disregard "num_sales" as y axis. Here's the result I want to achieve:
I've tried the following
fig, ax1 = plt.subplots()
sns.lineplot(filtered_df, x='date', y='average_daily_price', ax=ax1)
sns.barplot(filtered_df, x="date", y="num_sales", alpha=0.5, ax=ax1)
But it gives a weird result. I've also tried twinx() but couldn't make it work; besides, it creates a second y-axis, which I don't want.
Edit: running rafael's code results in this plot:
I'd like to add that date is in a datetime64[ns] format.
Edit 2: This post has been closed for duplicate. I've already seen the posts in duplicate list and tried the solutions listed, but they do not apply to my case, I don't know why, that's what I'm trying to figure out by opening new question. I'm guessing it has to do with my x variable being a datetime object.
The seaborn "barplot" is dedicated to plotting categorical variables. As such, it treats each date as a unique category and plots the corresponding values sequentially at evenly spaced positions.
This breaks the behavior of the dates on the x-axis.
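To see what "categorical" means for bar positions, here is a rough matplotlib-only analogy (it does not call seaborn; the dates and values are made up for illustration). Bars drawn from string labels land on evenly spaced positions, while bars drawn from real datetimes land where they belong in time:

```python
import datetime as dt

import matplotlib
matplotlib.use("Agg")  # assumption: headless backend; drop this line in a notebook
import matplotlib.pyplot as plt

# hypothetical, unevenly spaced dates
dates = [dt.datetime(2016, 1, 1), dt.datetime(2016, 1, 4), dt.datetime(2016, 1, 9)]
sales = [10, 7, 8]

fig, (ax_cat, ax_date) = plt.subplots(1, 2, figsize=(10, 3))

# Categorical axis: string labels are placed one unit apart, whatever the dates are
cat_bars = ax_cat.bar([d.strftime('%Y-%m-%d') for d in dates], sales)
# Date axis: bars are placed at their real position in time
date_bars = ax_date.bar(dates, sales)


def centers(bars):
    """Return the x-coordinate of the centre of each bar."""
    return [rect.get_x() + rect.get_width() / 2 for rect in bars]


def gaps(values):
    """Return the rounded distance between consecutive values."""
    return [round(b - a, 6) for a, b in zip(values, values[1:])]


cat_gaps = gaps(centers(cat_bars))    # equal gaps between categories
date_gaps = gaps(centers(date_bars))  # gaps measured in days
print(cat_gaps, date_gaps)
```

A line drawn on a true date axis cannot line up with bars placed the first way, which is the mismatch seen in the question.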
A workaround for this is to use matplotlib's ax.bar directly:
# imports
import numpy as np
import seaborn as sns
import matplotlib.pyplot as plt
import matplotlib as mpl
import pandas as pd
# generate dummy data
rng = np.random.default_rng()
size=100
vals = rng.normal(loc=0.02,size=size).cumsum() + 50
drange = pd.date_range("2014-01", periods=size, freq="D")
num_sales = rng.binomial(size=size,n=50,p=0.4)
# store data in a pandas DF
df = pd.DataFrame({'date': drange,
                   'average_daily_price': vals,
                   'num_sales': num_sales})
# setup axes
fig, ax1 = plt.subplots(figsize=(12,3))
# double y-axis is necessary due to the difference in the range of both variables
ax2 = ax1.twinx()
# plot the number of sales as a series of vertical bars
ax2.bar(df['date'], df['num_sales'], color='grey', alpha=0.5, label='Number of sales')
# plot the price as a time-series line plot
sns.lineplot(data=df, x='date', y='average_daily_price', ax=ax1)
# format the x-axis ticks as dates in weekly intervals
# the format is datetime64[ns]
ax1.xaxis.set_major_locator(mpl.dates.WeekdayLocator(interval=1, byweekday=1)) #weekly
ax1.xaxis.set_major_formatter(mpl.dates.DateFormatter('%Y-%m-%d'))
# rotate the x-axis tick labels for readability
ax1.tick_params(axis='x', rotation=50)
and the output is
I have data as follows:
I want to plot three DataFrames of the same kind, with the same number of columns, on one set of axes: two as line plots and one (which is smaller than the other two) as a scatter plot. The code I have used is as follows:
fig, axs = plt.subplots(figsize = (16,8))
df1.plot(ax=axs,x='day-month', y='2015Data_Value', kind = 'scatter')
df2.plot(ax=axs, x='day-month', y='Data_Value', linewidth=2)
df3.plot(ax=axs, x='day-month', y='Data_Value', linewidth=2)
But the scatter plot raises an error because it cannot take the x-axis values; the error always points at 'day-month'. The line plots run fine and plot correctly when the scatter plot is commented out. How can I solve this?
I am trying to plot my pandas Series with its values, but I am getting wrong values on my x-axis. I have done the same thing more than 3 times in the same workbook. What am I doing wrong here?
S1=rog1.groupby('Date')['availabi'].mean()
S1.index
# output
DatetimeIndex(['2018-05-10', '2018-06-10', '2018-07-10'],
              dtype='datetime64[ns]', name='Date', freq=None)
But when I try to plot it:
plt.figure(figsize=(10,4))
plt.plot(S1.index, S1)
The below is what I get
The y-axis values are fine. I dunno where the plotted values are coming from. I only have 3 lines in this Series
The issue is that matplotlib auto-detects the number and spacing of x-ticks to populate the x-axis without overlapping labels, and also without leaving too much white space.
The simplest workaround I can think of:
1. Create figure and axis handles
2. Plot your data in the axis
3. Manually set the xtick positions and labels
Code to replace your two lines of plotting:
fig, ax = plt.subplots(figsize=(10, 4))
S1.plot(ax=ax)
ax.set_xticks(S1.index);
ax.set_xticklabels(S1.index.strftime('%Y-%m-%d'));
I have one plot in the format:
df.plot()
The other one is in the format:
fig,ax=plt.subplots()
ax.plot_date(t,y,'b-')
I cannot convert the first plot into the standard matplotlib plot because it is resampled from a pandas timeseries.
How do I overlay the two plots?
Try df.plot(ax=ax). This causes the dataframe object to be plotted in the supplied axis.
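A minimal sketch of that suggestion, with made-up data standing in for the question's t, y and resampled frame (all names and values here are assumptions). It passes x_compat=True so that pandas keeps matplotlib's date units on the shared axis, and it uses ax.plot instead of plot_date, which recent matplotlib releases deprecate:

```python
import matplotlib
matplotlib.use("Agg")  # assumption: headless backend; drop this line in a notebook
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd

# hypothetical daily series, resampled to weekly means with pandas
idx = pd.date_range("2020-01-01", periods=60, freq="D")
daily = pd.DataFrame({"signal": np.arange(60.0)}, index=idx)
weekly = daily.resample("W").mean()

# hypothetical second data set, plotted with plain matplotlib
t = idx.to_pydatetime()
y = np.arange(60.0)[::-1]

fig, ax = plt.subplots()
ax.plot(t, y, "b-", label="matplotlib line")
# draw the pandas plot into the same axes instead of a new figure
weekly.plot(ax=ax, x_compat=True)
ax.legend()
```

If the two lines end up far apart on the x-axis without x_compat=True, that is usually a sign that pandas switched the axis to its own period-based units.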
I have a time series in a pandas DataFrame with a number of columns which I'd like to plot. Is there a way to set the x-axis to always use the index from the DataFrame?
When I use the .plot() method from pandas, the x-axis is formatted correctly. However, when I pass my dates and the column(s) I'd like to plot directly to matplotlib, the graph doesn't plot correctly. Thanks in advance.
plt.plot(site2.index.values, site2['Cl'])
plt.show()
FYI: site2.index.values produces this (I've cut out the middle part for brevity):
array([
'1987-07-25T12:30:00.000000000+0200',
'1987-07-25T16:30:00.000000000+0200',
'2010-08-13T02:00:00.000000000+0200',
'2010-08-31T02:00:00.000000000+0200',
'2010-09-15T02:00:00.000000000+0200'
],
dtype='datetime64[ns]')
It seems the issue was that I had .values. Without it (i.e. site2.index) the graph displays correctly.
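For reference, a small sketch of the working call. The site2 frame below is a made-up stand-in (timezone-naive, with invented 'Cl' values), since the real data is not shown in full:

```python
import matplotlib
matplotlib.use("Agg")  # assumption: headless backend; drop this line in a notebook
import matplotlib.pyplot as plt
import pandas as pd

# hypothetical stand-in for the real site2 DataFrame
site2 = pd.DataFrame(
    {"Cl": [3.1, 2.8, 3.6, 3.3, 2.9]},
    index=pd.to_datetime([
        "1987-07-25 12:30", "1987-07-25 16:30",
        "2010-08-13 02:00", "2010-08-31 02:00", "2010-09-15 02:00",
    ]),
)

fig, ax = plt.subplots()
# pass the DatetimeIndex itself rather than site2.index.values
(line,) = ax.plot(site2.index, site2["Cl"])
fig.autofmt_xdate()  # slant the date labels so they do not overlap
```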
You can use plt.xticks to set the x-axis tick positions and labels.
Try:
positions = range(len(site2))
plt.plot(positions, site2['Cl'].values)
plt.xticks(positions, site2.index.strftime('%Y-%m-%d'), rotation=90) # locations, labels
plt.show()
see the documentation for more details: http://matplotlib.org/api/pyplot_api.html#matplotlib.pyplot.xticks
That's built right into the plot() method.
You can use yourDataFrame.plot(use_index=True) to put the DataFrame index on the x-axis.
Read More Here: https://pandas.pydata.org/pandas-docs/version/0.23/generated/pandas.DataFrame.plot.html
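A quick way to see what the flag changes is to plot the same frame twice, once with use_index=True and once with use_index=False. The frame below is a made-up example, not the asker's data:

```python
import matplotlib
matplotlib.use("Agg")  # assumption: headless backend; drop this line in a notebook
import matplotlib.pyplot as plt
import pandas as pd

# hypothetical frame with a datetime index
df_demo = pd.DataFrame(
    {"Cl": [3.1, 2.8, 3.6]},
    index=pd.to_datetime(["2010-08-13", "2010-08-31", "2010-09-15"]),
)

fig, (ax_index, ax_position) = plt.subplots(1, 2, figsize=(10, 3))
df_demo.plot(ax=ax_index, use_index=True)      # x-axis taken from the index
df_demo.plot(ax=ax_position, use_index=False)  # x-axis is the row position

# with use_index=False the x data is simply 0, 1, 2, ...
positions = list(ax_position.get_lines()[0].get_xdata())
```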
If, like me, you want matplotlib to select a 'sensible' scale, here is one way to use a pandas DataFrame index as the x-axis tick labels in a matplotlib plot. Code:
fig, ax = plt.subplots()
ax.plot(site2['Cl'].values) # plot against integer positions 0..n-1
x_ticks = ax.get_xticks() # use matplotlib default xticks
x_ticks = [int(x) for x in x_ticks if 0 <= x < len(site2) and x == int(x)] # keep ticks that map to a row
ax.set_xticks(x_ticks)
ax.set_xticklabels(site2.index[x_ticks].strftime('%Y-%m-%d'))