Plot the x-axis as a date - python

I am trying to perform some analysis on data. I have a CSV file that I converted into a pandas DataFrame. It has several columns, and I want to use the date column as the x-axis.
The pandas DataFrame looks like this:
print(df.head(10))
cus-id date value_limit
0 10173 2011-06-12 455
1 95062 2011-09-11 455
2 171081 2011-07-05 212
3 122867 2011-08-18 123
4 107186 2011-11-23 334
5 171085 2011-09-02 376
6 169767 2011-07-03 34
7 80170 2011-03-23 34
8 154178 2011-10-02 34
9 3494 2011-01-01 34
I want to plot the date data because there are multiple values for the same date, so I am trying to use dates as the x-axis ticks. The minimum date in the date column is 2011-01-01 and the maximum date is 2012-04-20.
I tried something like this
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import datetime
import matplotlib.dates as mdates
df = pd.read_csv('rio_data.csv', delimiter=',')
print (df.head(10))
d = []
for dat in df.date:
    # print (dat)
    d.append(datetime.strptime(df['date'], '%Y-%m-%d'))
days = dates.DayLocator()
datemin = datetime(2011, 1, 1)
datemax = datetime(2012, 4, 20)
fig = plt.figure()
ax = fig.add_subplot(111)
ax.xaxis.set_major_locator(days)
ax.set_xlim(datemin, datemax)
ax.set_ylabel('Count values')
But I am getting this error.
AttributeError: 'DataFrame' object has no attribute 'date'
I am trying to draw the date on the x-axis; it should look like this:
Can someone help me draw the date column on the x-axis? I would be grateful.

Set the index as a datetime dtype
If you set the index to the datetime series by converting the dates with pd.to_datetime(...), matplotlib will handle the x axis for you.
Here is a minimal example of how you might deal with this visualization.
Plot directly with pandas.DataFrame.plot, which uses matplotlib as the default backend.
Simple example:
import pandas as pd
import matplotlib.pyplot as plt
date_time = ["2011-09-01", "2011-08-01", "2011-07-01", "2011-06-01", "2011-05-01"]
# convert the list of strings to a datetime and .date will remove the time component
date_time = pd.to_datetime(date_time).date
temp = [2, 4, 6, 4, 6]
DF = pd.DataFrame({'temp': temp}, index=date_time)
ax = DF.plot(x_compat=True, rot=90, figsize=(6, 5))
This will yield a plot that looks like the following:
Setting the index makes things easier
The key point is that setting the DataFrame index to a datetime series lets matplotlib handle the x-axis of time-series data without much help.
Follow this link for a detailed explanation of spacing axis ticks (specifically dates).
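Applied to the question's own data, a minimal sketch might look like this. It rebuilds the ten rows printed by df.head(10) instead of reading rio_data.csv, and it draws markers rather than a connecting line because several rows can share a date; both choices are assumptions about what the asker wants.

```python
import pandas as pd
import matplotlib.pyplot as plt

# The ten rows printed in the question (columns: cus-id, date, value_limit)
df = pd.DataFrame({
    "cus-id": [10173, 95062, 171081, 122867, 107186,
               171085, 169767, 80170, 154178, 3494],
    "date": ["2011-06-12", "2011-09-11", "2011-07-05", "2011-08-18", "2011-11-23",
             "2011-09-02", "2011-07-03", "2011-03-23", "2011-10-02", "2011-01-01"],
    "value_limit": [455, 455, 212, 123, 334, 376, 34, 34, 34, 34],
})

# Parse the strings into datetimes and make them the (sorted) index
df["date"] = pd.to_datetime(df["date"], format="%Y-%m-%d")
df = df.set_index("date").sort_index()

# Markers only: with several values per date, a connecting line would be misleading
ax = df["value_limit"].plot(style="o", x_compat=True, rot=90, figsize=(6, 5))
ax.set_ylabel("value_limit")
```

With the real file, the parsing can happen at load time with pd.read_csv('rio_data.csv', parse_dates=['date'], index_col='date'). Add plt.show() at the end when running this as a script.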

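You missed a ' on line 12, which causes the SyntaxError. The code has a few other bugs as well: after import datetime, the name datetime is the module, so datetime.strptime(...) and datetime(2011, 1, 1) fail (import the class instead); strptime is passed the whole df['date'] column instead of each value; dates.DayLocator() refers to an undefined name (the module is imported as mdates); and nothing is ever plotted. The AttributeError itself means df has no column named exactly date, so print df.columns to check for extra spaces in the header or a wrong delimiter.
This should correct the errors:
import pandas as pd
import matplotlib.pyplot as plt
import matplotlib.dates as mdates
from datetime import datetime  # the class, so datetime(...) and datetime.strptime work
df = pd.read_csv('rio_data.csv', delimiter=',')
print(df.head(10))
# parse each value in the column, not the whole column at once
d = [datetime.strptime(dat, '%Y-%m-%d') for dat in df['date']]
months = mdates.MonthLocator()  # one tick per month; DayLocator would give ~475 crowded ticks here
datemin = datetime(2011, 1, 1)
datemax = datetime(2012, 4, 20)
fig = plt.figure()
ax = fig.add_subplot(111)
ax.plot(d, df['value_limit'], 'o')  # actually draw the data
ax.xaxis.set_major_locator(months)
ax.xaxis.set_major_formatter(mdates.DateFormatter('%Y-%m'))
ax.set_xlim(datemin, datemax)
ax.set_ylabel('Count values')
fig.autofmt_xdate()
plt.show()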

Related

Plotting whole month in python with only 1 day data

I am trying to create a plot with an amount (int) on the y-axis and days on the x-axis.
I want the plot to always show the whole month on the x-axis, even though I don't have data for every day.
This is the code I tried:
import matplotlib.pyplot as plt
import numpy as np
import matplotlib.dates as mdates
import datetime as dt
df=get_pandas_data(datab) #Taking data from database in pandas DataFrame
fig = plt.figure(figsize=(10,10)) #Initialize plot
ax1 = fig.add_subplot(1,1,1)
dates=[dt.datetime.strptime(d,'%Y-%m-%d').date() for d in df['date']]
dates=list(set(dates)) #Takes all the dates from the DataFrame and uses a set to avoid repeated dates
s=df.resample('D', on='date')['amount'].sum() #Takes the total amount for the same date
ax1.bar(dates,s) #Bar plot for dates and amount
ax1.set(xlabel="Date",
ylabel="Balance (€)",
title="Total Monthly balance") # Plot information
ax1.xaxis.set_major_formatter(mdates.DateFormatter('%d-%m-%Y'))
#this is supposed to put every day of the month on the x-axis
ax1.xaxis.set_major_locator(mdates.DayLocator(interval=1))
fig.autofmt_xdate()
plt.show()
The result is a plot, but it only shows the days that have data.
How can I make the plot show every day of the month and draw bars on the days that have data?
This works fine with bare datetimes and matplotlib, so you must be malforming your data somehow in your pandas manipulations. But we can't really help, because we don't have your DataFrame. It's always preferable to create a standalone example with dummy data and as little code as possible to reproduce the issue: a) 90% of the time you will find the problem yourself; b) if not, we can help...
import numpy as np
import matplotlib.pyplot as plt
import datetime
x = np.array([1, 3, 7, 8, 10])
y = x * 2
dates = [datetime.datetime(2000, 2, xx) for xx in x]
fig, ax = plt.subplots()
ax.bar(dates, y)
fig.autofmt_xdate()
plt.show()
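To get every day of the month on the axis, one option is to sum the amounts per day and then reindex the result onto a complete daily range for that month, so missing days become zero-height bars. This is a minimal sketch using dummy data in the spirit of the example above; the column names date and amount follow the question, but the values are made up.

```python
import datetime
import matplotlib.pyplot as plt
import matplotlib.dates as mdates
import pandas as pd

# Dummy data: a few days of February 2000 (column names follow the question)
days = [1, 3, 7, 8, 10]
df = pd.DataFrame({
    "date": [datetime.datetime(2000, 2, d) for d in days],
    "amount": [d * 2 for d in days],
})

# Total amount per day (rows sharing a date are summed)
daily = df.groupby("date")["amount"].sum()

# Every day of the month that contains the first date
period = daily.index.min().to_period("M")
month_days = pd.date_range(period.start_time, periods=period.days_in_month, freq="D")

# Reindex onto the full month; days without data get a zero-height bar
full = daily.reindex(month_days, fill_value=0)

fig, ax = plt.subplots(figsize=(10, 4))
ax.bar(full.index, full.values)
ax.xaxis.set_major_locator(mdates.DayLocator(interval=1))
ax.xaxis.set_major_formatter(mdates.DateFormatter("%d-%m-%Y"))
fig.autofmt_xdate()
```

Reindexing keeps the data and the axis in step; simply calling ax.set_xlim with the first and last day of the month would also widen the axis, but without explicit zero entries.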

How to format x axis time data from 'Month-Day Hour' (mm-dd HH) to just 'Hour' in matplotlib? Or how to place Hour data on the minor axis? [duplicate]

I have a series whose index is datetime that I wish to plot. I want to plot the values of the series on the y axis and the index of the series on the x axis. The Series looks as follows:
2014-01-01 7
2014-02-01 8
2014-03-01 9
2014-04-01 8
...
I generate a graph using plt.plot(series.index, series.values). But the graph looks like:
The problem is that I would like to have only year and month (yyyy-mm or 2016 March). However, the graph contains hours, minutes and seconds. How can I remove them so that I get my desired formatting?
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import matplotlib.dates as mdates
# sample data
N = 30
drange = pd.date_range("2014-01", periods=N, freq="MS")
np.random.seed(365) # for a reproducible example of values
values = {'values':np.random.randint(1,20,size=N)}
df = pd.DataFrame(values, index=drange)
fig, ax = plt.subplots()
ax.plot(df.index, df.values)
ax.set_xticks(df.index)
# use formatters to specify major and minor ticks
ax.xaxis.set_major_formatter(mdates.DateFormatter("%Y-%m"))
ax.xaxis.set_minor_formatter(mdates.DateFormatter("%Y-%m"))
_ = plt.xticks(rotation=90)
You can try something like this:
import numpy as np
import pandas as pd
import matplotlib.dates as mdates
import matplotlib.pyplot as plt
# 'MS' = month start; the old 'M' alias is deprecated in recent pandas
df = pd.DataFrame({'values':np.random.randint(0,1000,36)},index=pd.date_range(start='2014-01-01',end='2016-12-31',freq='MS'))
fig,ax1 = plt.subplots()
ax1.plot(df.index,df.values)
monthyearFmt = mdates.DateFormatter('%Y %B')
ax1.xaxis.set_major_formatter(monthyearFmt)
_ = plt.xticks(rotation=90)
You should also check out this built-in matplotlib function, which rotates and right-aligns the date tick labels so they don't overlap:
fig.autofmt_xdate()
See the "Custom tick formatter" examples on the matplotlib website.
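When 30 or more monthly labels are too crowded, a locator can thin them out while the formatter keeps only year and month. A minimal sketch, assuming a tick every three months is acceptable (the interval is an arbitrary choice):

```python
import matplotlib.pyplot as plt
import matplotlib.dates as mdates
import numpy as np
import pandas as pd

# Monthly sample data, similar to the first answer above
N = 30
drange = pd.date_range("2014-01", periods=N, freq="MS")
values = np.random.default_rng(365).integers(1, 20, size=N)
df = pd.DataFrame({"values": values}, index=drange)

fig, ax = plt.subplots()
ax.plot(df.index, df["values"])
# One tick every 3 months, placed on the first day of the month
ax.xaxis.set_major_locator(mdates.MonthLocator(interval=3))
# Show only year and month in the tick labels
ax.xaxis.set_major_formatter(mdates.DateFormatter("%Y-%m"))
fig.autofmt_xdate()
```

The locator decides where ticks go and the formatter decides how they read, so the two can be tuned independently.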

Set x axis labels for joyplot

I have written the code below to visualise a joyplot. When trying to change the x axis labels using axes.set_xticks, I get the error: AttributeError: 'list' object has no attribute 'set_xticks'
import joypy
import pandas as pd
from matplotlib import pyplot as plt
data = pd.DataFrame.from_records([['twitter', 1],
['twitter', 6],
['wikipedia', 1],
['wikipedia', 3],
['indymedia', 1],
['indymedia', 9]], columns=['platform','day'])
# Get number of days in the dataset
numdays = max(set(data['day'].tolist()))
# Generate date strings from a manually set start date
start_date = "2010-01-01"
dates = pd.date_range(start_date, periods=numdays)
dates = [str(date)[:-9] for date in dates]
fig, axes = joypy.joyplot(data,by="platform")
axes.set_xticks(range(numdays)); axes.set_xticklabels(dates)
plt.show()
The expected output should look something like the following but with the dates from dates as ticklabels.
Since joypy.joyplot(...) returns a tuple (figure, axes), where axes is a list of Axes, you probably want to set the labels on the last one:
axes[-1].set_xticks(range(numdays))
axes[-1].set_xticklabels(dates)
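The same idea can be sketched without joypy, using plain matplotlib stacked subplots as a stand-in for joyplot's list of axes (an assumption made so the example runs without joypy installed). It also assumes day 1 corresponds to start_date, so the ticks are placed at the day values 1..numdays rather than at range(numdays), and it builds the labels with strftime instead of slicing str(date).

```python
import matplotlib.pyplot as plt
import pandas as pd

data = pd.DataFrame.from_records(
    [["twitter", 1], ["twitter", 6], ["wikipedia", 1],
     ["wikipedia", 3], ["indymedia", 1], ["indymedia", 9]],
    columns=["platform", "day"],
)
numdays = data["day"].max()
# One label per day; assumes day 1 corresponds to the start date
labels = pd.date_range("2010-01-01", periods=numdays).strftime("%Y-%m-%d")

platforms = data["platform"].unique()
# Stand-in for joyplot's stacked rows: plt.subplots returns an array of Axes
fig, axes = plt.subplots(len(platforms), 1, sharex=True)
for ax, name in zip(axes, platforms):
    days = data.loc[data["platform"] == name, "day"]
    ax.hist(days, bins=range(1, numdays + 2))
    ax.set_ylabel(name)

# Only the bottom axes shows x tick labels, so set them there
axes[-1].set_xticks(range(1, numdays + 1))
axes[-1].set_xticklabels(list(labels), rotation=90)
```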
To make date plots with matplotlib, you can use the plot_date function (newer matplotlib versions discourage plot_date in favor of passing dates directly to ax.plot):
fig, ax = plt.subplots()
ax.plot_date(dates, data1, '-')
I put the complete example on Pastebin:
https://pastebin.com/sVPUZaeM
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
from matplotlib.dates import DateFormatter
from random import randrange, random
from datetime import datetime
#generate date list
start_date = np.datetime64('2010-01-01').astype(datetime)
numdays = 10
dates = pd.date_range(start_date, periods=numdays)
#Generate data example
data1 = [(random()+idx)**1.2 for idx in range(len(dates))]
data2 = [(random()+idx)**1.5 for idx in range(len(dates))]
#plot
fig, ax = plt.subplots()
ax.plot_date(dates, data1, '-')
ax.plot_date(dates, data2, '-')
#set the label for x and y and title
plt.title('Matplot lib dates wc example')
plt.xlabel('Dates')
plt.ylabel('Random values example')
#format of the dates in the interactive toolbar's x readout (tick labels use the axis major formatter)
ax.fmt_xdata = DateFormatter('%Y%m%d')
ax.grid(True)
fig.autofmt_xdate()
plt.show()
Python version tested successfully: 2.7.12
This code generates the following plot:
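Note that ax.fmt_xdata only controls how the cursor's x position is shown in the interactive toolbar; the tick labels come from the axis major formatter. A minimal sketch showing both, with ax.plot used in place of plot_date (dates and values are made up for illustration):

```python
import matplotlib.pyplot as plt
import matplotlib.dates as mdates
import pandas as pd

dates = pd.date_range("2010-01-01", periods=10)
values = [i ** 1.2 for i in range(len(dates))]

fig, ax = plt.subplots()
ax.plot(dates, values, "-")  # ax.plot accepts datetimes directly
# Tick labels are controlled by the x axis major formatter...
ax.xaxis.set_major_formatter(mdates.DateFormatter("%Y%m%d"))
# ...while fmt_xdata only affects the x readout in the interactive toolbar
ax.fmt_xdata = mdates.DateFormatter("%Y-%m-%d")
ax.grid(True)
fig.autofmt_xdate()
```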

How to make bar plot with converting the month column in python?

I have a DataFrame like this. The month column is of type string.
I want to make a bar plot from 201501 to 201505 with month on the x axis and total_gmv on the y axis. The x labels should look like Jan,2015, Feb,2015, and so on. How can I do this in Python? Thanks.
month total_gmv
201501 NaN
201502 2.824294e+09
201503 7.742665e+09
201504 2.024132e+10
201505 6.705012e+10
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
df = pd.DataFrame(
{'month': ['201501', '201502', '201503', '201504', '201505'],
'total_gmv': [np.nan, 2.824294e+09, 7.742665e+09, 2.024132e+10, 6.705012e+10]})
df['month'] = pd.to_datetime(df['month'], format='%Y%m').dt.month
df = df.set_index('month')
print(df)
df.plot(kind='bar')
plt.show()
Result:
total_gmv
month
1 NaN
2 2.824294e+09
3 7.742665e+09
4 2.024132e+10
5 6.705012e+10
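You should be able to convert month to a timestamp, set it as the index, and plot it.
df['month'] = pd.to_datetime(df['month'], format='%Y%m')
ax = df.set_index('month').plot(kind='bar')
And you might have to change the date format.
import matplotlib.dates as mdates
ax.xaxis.set_major_formatter(mdates.DateFormatter('%b, %Y'))
Note that pandas bar plots place the bars at positions 0, 1, 2, ..., so a DateFormatter on that axis reads those positions as dates near the matplotlib epoch (January 1970 by default); setting the tick label text directly avoids this.
Check here for more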
The previous replies have some clues, but they don't give a complete answer.
You have to set custom x tick labels and rotate them, like this:
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
df = pd.DataFrame(
{'month': ['201501', '201502', '201503', '201504', '201505'],
'total_gmv': [np.nan, 2.824294e+09, 7.742665e+09, 2.024132e+10, 6.705012e+10]})
df['month'] = pd.to_datetime(df['month'], format='%Y%m')
ax = df.plot(kind='bar')
ax.set_xticklabels(df['month'].dt.strftime('%b, %Y'))
plt.xticks(rotation=0)
plt.show()
You can use matplotlib.pyplot and the calendar module.
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import calendar
df = pd.DataFrame(
{'month': ['201501', '201502', '201503', '201504', '201505'],
'total_gmv': [np.nan, 2.824294e+09, 7.742665e+09, 2.024132e+10, 6.705012e+10]})
#change the numeric representation to text (201501 -> Jan,2015)
df['month_name'] = [calendar.month_abbr[int(date[4:])] + ',' + date[:4] for date in df['month']]
#convert df['month'] to int so plt can use the values as bar positions
x = df['month'].astype(int)
y = df['total_gmv']
plt.bar(x, y, align='center')
#label each bar with its month name
plt.xticks(x, df['month_name'])
plt.show()
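Another option is to turn the formatted month strings into the index before plotting, so pandas uses them as bar labels directly and no tick positions need to be matched by hand. A minimal sketch using the question's data; note that %b gives English abbreviations only under the default C or an English locale.

```python
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd

df = pd.DataFrame(
    {'month': ['201501', '201502', '201503', '201504', '201505'],
     'total_gmv': [np.nan, 2.824294e+09, 7.742665e+09, 2.024132e+10, 6.705012e+10]})

# Parse the 'YYYYMM' strings, then format them as the label text ("Jan, 2015", ...)
labels = pd.to_datetime(df['month'], format='%Y%m').dt.strftime('%b, %Y')
s = df.set_index(labels)['total_gmv']

# Bars are drawn in index order and labelled with the index strings
ax = s.plot(kind='bar', rot=0)
ax.set_xlabel('month')
ax.set_ylabel('total_gmv')
```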

python pandas plot with uneven timeseries index (with count evenly distributed)

My DataFrame has an uneven time index.
How can I plot the data and have the index located automatically? I searched here, and I know I can plot something like
e.plot()
but the time index (x axis) will have even intervals, for example every 5 minutes.
If I have 100 data points in the first 5 minutes and 6 in the second 5 minutes, how do I plot them so the points are spaced evenly, with the right timestamps on the x axis?
Here's the evenly spaced version, but I don't know how to add the time index:
plot(e['Bid'].values)
Example of the data format, as requested:
Time,Bid
2014-03-05 21:56:05:924300,1.37275
2014-03-05 21:56:05:924351,1.37272
2014-03-05 21:56:06:421906,1.37275
2014-03-05 21:56:06:421950,1.37272
2014-03-05 21:56:06:920539,1.37275
2014-03-05 21:56:06:920580,1.37272
2014-03-05 21:56:09:071981,1.37275
2014-03-05 21:56:09:072019,1.37272
and here's the link
http://code.google.com/p/eu-ats/source/browse/trunk/data/new/eur-fix.csv
Here's the code I used to plot:
import numpy as np
import pandas as pd
import datetime as dt
e = pd.read_csv("data/ecb/eur.csv", dtype={'Time':object})
e.Time = pd.to_datetime(e.Time, format='%Y-%m-%d %H:%M:%S:%f')
e.plot()
f = e.copy()
f.index = f.Time
x = [str(s)[:-7] for s in f.index]
ff = f.set_index(pd.Series(x))
ff.index.name = 'Time'
ff.plot()
Update:
I added two new plots for comparison to clarify the issue. I then tried, by brute force, converting the timestamp index back to strings and plotting the strings on the x axis, but the format easily gets messed up, and it seems hard to control where the x labels are placed.
Ok, it seems like what you're after is that you want to move around the x-tick locations so that there are an equal number of points between each tick. And you'd like to have the grid drawn on these appropriately-located ticks. Do I have that right?
If so:
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sbn
# pandas can read a URL directly (urllib.urlopen only exists in Python 2)
df = pd.read_csv('https://eu-ats.googlecode.com/svn/trunk/data/new/eur-fix.csv', header=0)
df['Time'] = pd.to_datetime(df['Time'], format='%Y-%m-%d %H:%M:%S:%f')
every30 = df.loc[df.index % 30 == 0, 'Time'].values
fig, ax = plt.subplots(1, 1, figsize=(9, 5))
df.plot(x='Time', y='Bid', ax=ax)
ax.set_xticks(every30)
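If instead the points themselves should be spaced evenly (one x unit per observation, as in plot(e['Bid'].values)) with the real timestamps shown only as labels, one approach is to plot against the row number and label every step-th tick. A minimal sketch using the sample rows from the question; step=2 is an arbitrary choice for this tiny sample.

```python
import matplotlib.pyplot as plt
import pandas as pd
from io import StringIO

# Sample rows from the question (note the ':' before the microseconds)
csv = """Time,Bid
2014-03-05 21:56:05:924300,1.37275
2014-03-05 21:56:05:924351,1.37272
2014-03-05 21:56:06:421906,1.37275
2014-03-05 21:56:06:421950,1.37272
2014-03-05 21:56:06:920539,1.37275
2014-03-05 21:56:06:920580,1.37272
2014-03-05 21:56:09:071981,1.37275
2014-03-05 21:56:09:072019,1.37272
"""
df = pd.read_csv(StringIO(csv))
df["Time"] = pd.to_datetime(df["Time"], format="%Y-%m-%d %H:%M:%S:%f")

# x = observation number, so every point gets the same horizontal space
pos = range(len(df))
fig, ax = plt.subplots(figsize=(9, 5))
ax.plot(pos, df["Bid"], marker=".")

# Label every `step`-th point with its actual timestamp
step = 2
ticks = list(pos)[::step]
labels = df["Time"].iloc[ticks].dt.strftime("%H:%M:%S.%f")
ax.set_xticks(ticks)
ax.set_xticklabels(list(labels), rotation=45)
ax.grid(True)
```

The trade-off is that horizontal distance no longer means elapsed time, so gaps in the data become invisible; the tick labels are the only record of when each point happened.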
I have tried to reproduce your issue, but I can't seem to. Can you have a look at this example and see how your situation differs?
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sbn
np.random.seed(0)
idx = pd.date_range('11:00', '21:30', freq='1min')
ser = pd.Series(data=np.random.randn(len(idx)), index=idx)
ser = ser.cumsum()
for i in range(20):
    for j in range(8):
        ser.iloc[10*i + j] = np.nan
fig, axes = plt.subplots(1, 2, figsize=(10, 5))
ser.plot(ax=axes[0])
ser.dropna().plot(ax=axes[1])
gives the following two plots:
There are a couple of differences between the graphs. The one on the left doesn't connect the non-continuous bits of data, and it lacks vertical gridlines. But both seem to respect the actual index of the data. Can you show an example of your e series? What is the exact format of its index? Is it a DatetimeIndex, or is it just text?
Edit:
Playing with this, my guess is that your index is actually just text. If I continue from above with:
idx_str = [str(x) for x in idx]
newser = ser.copy()  # copy, so the original ser keeps its datetime index
newser.index = idx_str
fig, axes = plt.subplots(1, 2, figsize=(10, 5))
newser.plot(ax=axes[0])
newser.dropna().plot(ax=axes[1])
then I get something like your problem:
More edit:
If this is in fact your issue (the index is a bunch of strings, not really a bunch of timestamps) then you can convert them and all will be well:
idx_fixed = pd.to_datetime(idx_str)
fixedser = newser.copy()
fixedser.index = idx_fixed
fig, axes = plt.subplots(1, 2, figsize=(10, 5))
fixedser.plot(ax=axes[0])
fixedser.dropna().plot(ax=axes[1])
produces output identical to the first code sample above.
Editing again:
To see the uneven spacing of the data, you can do this:
fig, axes = plt.subplots(1, 2, figsize=(10, 5))
fixedser.plot(ax=axes[0], marker='.', linewidth=0)
fixedser.dropna().plot(ax=axes[1], marker='.', linewidth=0)
Let me try this one from scratch. Does this solve your issue?
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sbn
# pandas can read a URL directly (urllib.urlopen only exists in Python 2)
df = pd.read_csv('https://eu-ats.googlecode.com/svn/trunk/data/new/eur-fix.csv', header=0, index_col='Time')
df.index = pd.to_datetime(df.index, format='%Y-%m-%d %H:%M:%S:%f')
df.plot()
The thing is, you want to plot bid vs time. If you've put the times into your index then they become your x-axis for "free". If the time data is just another column, then you need to specify that you want to plot bid as the y-axis variable and time as the x-axis variable. So in your code above, even when you convert the time data to be datetime type, you were never instructing pandas/matplotlib to use those datetimes as the x-axis.
