I have a simple dataframe with two columns, 'date' and 'amount'. I want to plot the amount using date as the x-axis. The first lines of the data are:
22/05/2018,52068.67
21/05/2018,52159.19
15/05/2018,52744.03
08/05/2018,54666.21
08/05/2018,54677.51
01/05/2018,53890.59
30/04/2018,54812.25
27/04/2018,52258.23
26/04/2018,52351.47
23/04/2018,49777.04
23/04/2018,49952.44
23/04/2018,49992.44
05/04/2018,53238.59
03/04/2018,53631.09
03/04/2018,53839.64
28/03/2018,50836.78
26/03/2018,51206.67
26/03/2018,51372.02
14/03/2018,51110.17
12/03/2018,51411.31
06/03/2018,51169.91
05/03/2018,51374.57
27/02/2018,48728.85
27/02/2018,48730.5
16/02/2018,44988.25
14/02/2018,41948.03
12/02/2018,43776.31
12/02/2018,43800.31
12/02/2018,43840.11
05/02/2018,29358.96
26/01/2018,39491.0
24/01/2018,36470.03
23/01/2018,36562.76
23/01/2018,36616.61
22/01/2018,36582.46
22/01/2018,36665.71
22/01/2018,36743.31
17/01/2018,36965.3
16/01/2018,37044.6
09/01/2018,42083.65
08/01/2018,42183.39
05/01/2018,42285.41
03/01/2018,41537.51
03/01/2018,41579.51
02/01/2018,41945.32
27/12/2017,43003.33
27/12/2017,43217.29
18/12/2017,38208.63
15/12/2017,38315.53
However, the plot gives me points that don't appear in the data. For example, in May 2018 there is no value near 30000.
My code is:
import pandas as pd
import matplotlib.pyplot as plt
df = pd.read_csv("test.csv", header=None, names =['date', 'amount'])
df['time'] = pd.to_datetime(df['date'])
df.set_index(['time'],inplace=True)
df['amount'].plot()
plt.show()
What am I doing wrong?
You need to convert the dates to datetime using the correct format, then plot with pandas:
df['date'] = pd.to_datetime(df['date'], format = '%d/%m/%Y')
df.plot('date', 'amount')
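To see why the format matters, here is a minimal sketch using one date from the data above. Without an explicit format, pandas reads an ambiguous slash-separated date as month first, so 08/05/2018 ends up in August instead of May. The variable names are only for illustration.

```python
import pandas as pd

raw = "08/05/2018"  # 8 May 2018 in the day/month/year data above

# Default parsing treats an ambiguous date as month/day/year.
guessed = pd.to_datetime(raw)

# An explicit format removes the ambiguity.
explicit = pd.to_datetime(raw, format="%d/%m/%Y")

print(guessed.date(), explicit.date())
```

Dates whose day is greater than 12 cannot be misread this way, which is why only some points in the plot land in the wrong month.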
I have a dataframe including random data over 7 days and each data point is indexed by DatetimeIndex. I want to plot data of each day on a single plot. Currently my try is the following:
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
n =10000
i = pd.date_range('2018-04-09', periods=n, freq='1min')
ts = pd.DataFrame({'A': [np.random.randn() for i in range(n)]}, index=i)
dates = list(ts.index.map(lambda t: t.date()).unique())
for date in dates:
    ts['A'].loc[date.strftime('%Y-%m-%d')].plot()
The result is the following:
As you can see, when a DatetimeIndex is used the date of each point is kept, so each day is drawn after the previous one instead of on top of it.
Questions:
1- How can I fix the current code so that the x-axis starts at midnight and ends at the next midnight?
2- Is there a pandas way to group the days and plot them on a single day without using a for loop?
You can split the index into dates and times and unstack the ts into a dataframe:
df = ts.set_index([ts.index.date, ts.index.time]).unstack(level=0)
df.columns = df.columns.get_level_values(1)
then plot all in one chart:
df.plot()
or in separate charts:
axs = df.plot(subplots=True, title=df.columns.tolist(), legend=False, figsize=(6,8))
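# execute_constrained_layout() is not available in newer Matplotlib releases; tight_layout() is a widely available substitute
axs[0].figure.tight_layout()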
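A small sketch of what the reshaping produces, assuming two full days of one-minute samples (the data is random and made up for illustration). After the unstack, each row is a time of day and each column is one date, so every day shares the same midnight-to-midnight axis.

```python
import numpy as np
import pandas as pd

# Two full days of one-minute samples (2 * 1440 rows), made up for illustration.
idx = pd.date_range("2018-04-09", periods=2 * 1440, freq="1min")
ts = pd.DataFrame({"A": np.random.randn(len(idx))}, index=idx)

# Index by (date, time), then move the date level into the columns.
df = ts.set_index([ts.index.date, ts.index.time]).unstack(level=0)
df.columns = df.columns.get_level_values(1)

print(df.shape)          # one row per time of day, one column per date
print(list(df.columns))
```

If the last day is incomplete, as with the 10000 samples in the question, its column is simply padded with NaN for the missing times.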
I am trying to create a line plot in order of time. For the df below, the first value appears at 07:00:00 and the last at 00:40:00.
But the timestamps aren't assigned to the x-axis and the row after midnight is plotted first, instead of last.
import pandas as pd
import matplotlib.pyplot as plt
d = ({
'Time' : ['7:00:00','10:30:00','12:40:00','16:25:00','18:30:00','22:40:00','00:40:00'],
'Value' : [1,2,3,4,5,4,10],
})
df = pd.DataFrame(d)
df['Time'] = pd.to_timedelta(df['Time'])
plt.plot(df['Time'], df['Value'])
plt.show()
print(df)
Your timedelta values are being converted to a numerical representation by matplotlib, which is why you aren't getting times on your x-axis. The plot is in order; it's just that '00:40:00' is less than all the other times, so it's plotted as the leftmost point.
What you can do instead is use a datetime format to include days, which will indicate that 00:40:00 should come last since it'll fall on the next day. You can also use pandas plotting method for easier formatting:
d = ({
'Time' : ['2019/1/1 7:00:00','2019/1/1 10:30:00','2019/1/1 12:40:00',
'2019/1/1 16:25:00','2019/1/1 18:30:00','2019/1/1 22:40:00',
'2019/1/2 00:40:00'],
'Value' : [1,2,3,4,5,4,10],
})
df = pd.DataFrame(d)
df['Time'] = pd.to_datetime(df['Time'])
df.plot(x='Time', y='Value')
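If the input only has times of day, as in the question, the dates do not have to be typed by hand. A minimal sketch, assuming the rows are already in chronological order and using an arbitrary base date of 2019-01-01: every time the clock goes backwards, one more day is added.

```python
import pandas as pd

times = pd.Series(pd.to_timedelta(
    ['7:00:00', '10:30:00', '12:40:00', '16:25:00', '18:30:00', '22:40:00', '00:40:00']))
values = [1, 2, 3, 4, 5, 4, 10]

# A negative step between consecutive rows means midnight was crossed.
day_offset = (times.diff() < pd.Timedelta(0)).cumsum()

base = pd.Timestamp('2019-01-01')  # arbitrary start date, an assumption
stamps = base + times + pd.to_timedelta(day_offset, unit='D')

df = pd.DataFrame({'Time': stamps, 'Value': values})
print(df)
```

The resulting Time column is strictly increasing, so the 00:40:00 row is plotted last.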
Update
Setting the ticks and tick labels at your time points is a bit tricky. This post will give you an idea of how the positioning works. Basically, you'll need to use something like matplotlib.dates.date2num to get the numerical representation of each datetime:
import matplotlib.dates as mdates
ax = df.plot(x='Time', y='Value')  # keep the Axes returned by pandas
xticks = [mdates.date2num(x) for x in df['Time']]
xticklabels = [x.strftime('%H:%M') for x in df['Time']]
ax.set_xticks(xticks)
ax.set_xticklabels(xticklabels)
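As a variation, the labels can be produced by a formatter instead of being written out by hand. This is only a sketch using plain matplotlib rather than the pandas plotting method, with the Agg backend chosen so it runs without a display; the tick positions are still the data points themselves.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend so the sketch runs without a display
import matplotlib.pyplot as plt
import matplotlib.dates as mdates
import pandas as pd

times = pd.to_datetime([
    '2019-01-01 07:00:00', '2019-01-01 10:30:00', '2019-01-01 12:40:00',
    '2019-01-01 16:25:00', '2019-01-01 18:30:00', '2019-01-01 22:40:00',
    '2019-01-02 00:40:00'])
values = [1, 2, 3, 4, 5, 4, 10]

fig, ax = plt.subplots()
ax.plot(times.to_numpy(), values)

# Put one tick on each data point and let the formatter render hour:minute.
ax.set_xticks(mdates.date2num(times.to_numpy()))
ax.xaxis.set_major_formatter(mdates.DateFormatter('%H:%M'))
fig.autofmt_xdate()  # rotate the labels so they do not overlap

fig.canvas.draw()
labels = [t.get_text() for t in ax.get_xticklabels()]
print(labels)
```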
Can someone help me with my problem? I am new to pandas and I am confused.
I made some subset selections and everything is fine with my new dataframe (its type is pandas.core.frame.DataFrame). It has two columns (date, count), and I want to draw a line plot with date on the x-axis and count on the y-axis.
Suppose the dataframe is named df and the columns are named date and count. According to the pandas documentation, the command is:
ts = pd.Series(df['count'], index = df['date'])
ts.plot()
What is wrong here?
Any help is appreciated.
It's best to refer to the pandas website for first-hand information. However, you can try the code below:
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt # For show command
# Creating a dummy dataframe (You can also go ahead with Series)
df = pd.DataFrame([45, 20], columns=['count'], index=['12/11/2018', '10/1/2018'])
# Converting string to datetime format
df.index = pd.to_datetime(df.index, format='%d/%m/%Y')
df.index
# DatetimeIndex(['2018-11-12', '2018-01-10'], dtype='datetime64[ns]', freq=None)
df.plot()
plt.show()
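One likely reason the command in the question plots nothing is that passing an existing Series together with a new index makes pandas reindex it: the old labels 0, 1, ... are looked up in the date index, none match, and every value becomes NaN. A minimal sketch with a made-up two-row dataframe shaped like the one described:

```python
import pandas as pd

# Hypothetical data with the same column names as in the question.
df = pd.DataFrame({'date': ['12/11/2018', '10/01/2018'], 'count': [45, 20]})

# Reindexes the existing Series against the dates: no labels match, so all NaN.
broken = pd.Series(df['count'], index=df['date'])

# Passing the raw values avoids the label alignment.
dates = pd.to_datetime(df['date'], format='%d/%m/%Y')
fixed = pd.Series(df['count'].to_numpy(), index=dates)

print(broken)
print(fixed)
```

Calling fixed.plot() then gives dates on the x-axis; df.set_index('date')['count'] after converting the column is an equivalent route.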
I have a data set of house prices - House Price Data. When I use a subset of the data in a NumPy array, I can plot it in this nice time series chart:
However, when I use the same data in a pandas Series, the chart goes all lumpy like this:
How can I create a smooth time series line graph (like the first image) using a pandas Series?
Here is what I am doing to get the nice-looking time series chart using a NumPy array (after importing numpy as np, pandas as pd and matplotlib.pyplot as plt):
data = pd.read_csv('HPI.csv', index_col='Date', parse_dates=True) #pull in csv file, make index the date column and parse the dates
brixton = data[data['RegionName'] == 'Lambeth'] # pull out a subset for the region Lambeth
prices = brixton['AveragePrice'].values # create a numpy array of the average price values
plt.plot(prices) #plot
plt.show() #show
Here is what I am doing to get the lumpy one using a pandas Series:
data = pd.read_csv('HPI.csv', index_col='Date', parse_dates=True)
brixton = data[data['RegionName'] == 'Lambeth']
prices_panda = brixton['AveragePrice']
plt.plot(prices_panda)
plt.show()
How do I make this second graph show as a nice smooth proper time series?
* This is my first StackOverflow question so please shout if I have left anything out or not been clear *
Any help greatly appreciated
When you used parse_dates=True, pandas read the dates with its default convention, which is month-day-year. Your data is formatted according to the British convention, which is day-month-year. As a result, instead of having a data point for the first of every month, your plot is showing data points for the first 12 days of January, and a flat line for the rest of each year. You need to swap the day and month back, for example:
data.index = pd.to_datetime({'year':data.index.year,'month':data.index.day,'day':data.index.month})
The date format in the file you have is Day/Month/Year. In order for pandas to interpret this format correctly, you can use the option dayfirst=True inside the read_csv call.
import pandas as pd
import matplotlib.pyplot as plt
data = pd.read_csv('data/UK-HPI-full-file-2017-08.csv',
index_col='Date', parse_dates=True, dayfirst=True)
brixton = data[data['RegionName'] == 'Lambeth']
prices_panda = brixton['AveragePrice']
plt.plot(prices_panda)
plt.show()
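The effect of dayfirst can be checked without the real file. The sketch below uses a tiny in-memory CSV with made-up prices and the same Date column layout (first of the month, day/month/year), and compares the parsed months with and without the option.

```python
import io
import pandas as pd

# Made-up sample in the same day/month/year layout as the house price file.
csv_text = "Date,AveragePrice\n01/02/2017,100\n01/03/2017,110\n"

# Default parsing: month first, so both rows land in January.
bad = pd.read_csv(io.StringIO(csv_text), index_col='Date', parse_dates=True)

# dayfirst=True: the rows land on 1 February and 1 March as intended.
good = pd.read_csv(io.StringIO(csv_text), index_col='Date',
                   parse_dates=True, dayfirst=True)

print(list(bad.index.month), list(good.index.month))
```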
I am new to matplotlib and need some guidance. I have been trying to reproduce the code from "Candlestick Plot from a Pandas DataFrame" as a way to learn, adding a read_csv call.
My error message keeps saying "ValueError: Length mismatch: Expected axis has 6 elements, new values have 5 elements".
My question is:
What am I missing in the code? I read in the CSV, I use the right columns of data, and I understand there is a reset of the index, but I don't know why I keep getting this error.
Please help.
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import matplotlib.ticker as mticker
from mpl_finance import candlestick_ohlc
import matplotlib.dates as mdates
import datetime as dt
df = pd.read_csv("/Users/paul/Documents/python (original)/Quant/sp500.csv", usecols=['Date', 'Open','High','Low','Close'])
#Reset the index to remove Date column from index
df_ohlc = df.reset_index()
#Naming columns
df_ohlc.columns = ["Date","Open","High",'Low',"Close"]
#Converting dates column to float values
df_ohlc['Date'] = df_ohlc['Date'].map(mdates.date2num)
#Making plot
fig = plt.figure()
ax1 = plt.subplot2grid((6,1), (0,0), rowspan=6, colspan=1)
#Converts raw mdate numbers to dates
ax1.xaxis_date()
plt.xlabel("Date")
print(df_ohlc)
#Making candlestick plot
candlestick_ohlc(ax1,df_ohlc.values,width=1, colorup='g', colordown='k',alpha=0.75)
plt.ylabel("Price")
plt.legend()
plt.show()
You don't need "0", "1", "2" before each line in your .csv file. You must first remove that and then:
If you're going to reset the index, you need an actual index column in your dataframe, so add index_col like this:
df = pd.read_csv("/Users/paul/Documents/python (original)/Quant/sp500.csv", usecols=['Date', 'Open','High','Low','Close'], index_col= 'Date')
Convert your date column from string to datetime:
df_ohlc['Date'] = pd.to_datetime(df_ohlc['Date'])
EDIT:
If you can't delete the 0, 1, 2... column in your csv file because it's too large, modify the first row to make the 'index' column appear like this:
'index', 'Date', 'Open','High','Low','Close'
Then, in your code:
df = pd.read_csv("/Users/paul/Documents/python (original)/Quant/sp500.csv", usecols=['index', 'Date', 'Open','High','Low','Close'], index_col="Date")
df.drop('index', axis=1, inplace=True)
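One way the reported length mismatch can arise is shown in the sketch below, which uses a one-row in-memory CSV with made-up prices instead of the real file. Calling reset_index() on a dataframe that still has the default integer index adds an extra 'index' column, giving six columns instead of five; with Date as the index, reset_index() simply turns it back into a column.

```python
import io
import pandas as pd

# Made-up one-row sample with the same column names as in the question.
csv_text = "Date,Open,High,Low,Close\n2018-01-02,1.0,2.0,0.5,1.5\n"
cols = ['Date', 'Open', 'High', 'Low', 'Close']

# Default integer index: reset_index() adds an 'index' column (6 columns).
plain = pd.read_csv(io.StringIO(csv_text), usecols=cols)
print(plain.reset_index().columns.tolist())

# Date as the index: reset_index() restores it as a column (5 columns).
indexed = pd.read_csv(io.StringIO(csv_text), usecols=cols,
                      index_col='Date', parse_dates=True)
df_ohlc = indexed.reset_index()
print(df_ohlc.columns.tolist())
```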