I've been attempting to plot data from a comma delimited csv file which contains a date and a float:
Date,Price (€)
01062017,20.90
02062017,30.90
03062017,40.90
04062017,60.90
05062017,50.90
I then attempt to plot this with the following code:
import matplotlib.pyplot as plt
import numpy as np
import datetime
dates,cost = np.loadtxt('price_check.csv',delimiter=',',skiprows=1,unpack=True)
xdates = [datetime.datetime.strptime(str(int(date)),'%d%m%Y') for date in dates]
fig = plt.figure()
ax = plt.subplot(111)
plt.plot(xdates, cost,'o-',label='Cost')
plt.legend(loc=4)
plt.ylabel('Price (Euro)')
plt.xlabel('date')
plt.gcf().autofmt_xdate()
plt.grid()
plt.savefig('sunglasses_cost.png')
plt.show()
However, when the data is plotted, it looks like the leading zero in the date string is being dropped:
Is there an easy way for the full date to be used in the plot?
The problem is the dates, which are converted to integers and lose their leading zero. "01062017" then becomes 1062017 and is interpreted as (2017, 6, 10, 0, 0), i.e. two digits for the day and one digit for the month. 5062017 can only be read with a one-digit day, because there is no 50th of June, so it happens to be interpreted correctly as (2017, 6, 5, 0, 0).
The least invasive way to overcome this is to format the string so that it always has 8 digits before the datetime conversion:
xdates = [datetime.datetime.strptime('{:08}'.format(int(date)),'%d%m%Y') for date in dates]
This will then result in the correct plot. However, the xticklabels may be displayed in an inconvenient way. This can be adjusted by choosing a suitable locator and formatter:
import matplotlib.dates as mdates
plt.gca().xaxis.set_major_locator(mdates.DayLocator())
plt.gca().xaxis.set_major_formatter(mdates.DateFormatter('%d-%m-%Y'))
As a last comment: if you can choose the format of your input file, it might be worth specifying the date in an unambiguous way, e.g. 20170601.
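The behaviour described in this answer can be checked without plotting anything. The following is only a sketch using the standard library: it reproduces the wrong parse, applies the zero-padding fix, and then shows one possible alternative, reading the date column as text so that the leading zero is never lost. The in-memory sample and the variable names are assumptions standing in for price_check.csv.

```python
import csv
import datetime
import io

FMT = '%d%m%Y'

# 1. The failure: int() drops the leading zero, and strptime then reads
#    '1062017' as day 10, month 6.
wrong = datetime.datetime.strptime(str(int('01062017')), FMT)

# 2. The fix from the answer: pad back to eight digits before parsing.
padded = datetime.datetime.strptime('{:08}'.format(int('01062017')), FMT)

# 3. Alternative (assumption, not part of the original answer): never
#    convert the date column to a number. io.StringIO stands in for
#    open('price_check.csv') so the sketch is self-contained.
sample = io.StringIO("Date,Price\n01062017,20.90\n05062017,50.90\n")
reader = csv.reader(sample)
next(reader)  # skip the header row
xdates, cost = [], []
for date_text, price_text in reader:
    xdates.append(datetime.datetime.strptime(date_text, FMT))
    cost.append(float(price_text))

print(wrong.date(), padded.date(), xdates[0].date())
```

The lists xdates and cost can be passed to plt.plot in the same way as in the question.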
Related
I have an index array (x) of dates (datetime objects) and an array of actual values (y: bond prices). Doing (in iPython):
plot(x,y)
Produces a perfectly fine time series graph with the x axis labeled with the dates. No problem so far. But I want to add text on certain dates. For example, at 2009-10-31 I wish to display the text "Event 1" with an arrow pointing to the y value at that date.
I have read through the Matplotlib documentation on text() and annotate() to no avail. It only covers standard numeric x-axes, and I can't work out how to apply those examples to my problem.
Thank you
Matplotlib uses an internal floating point format for dates.
You just need to convert your date to that format (using matplotlib.dates.date2num or matplotlib.dates.datestr2num) and then use annotate as usual.
As a somewhat excessively fancy example:
import datetime as dt
import matplotlib.pyplot as plt
import matplotlib.dates as mdates
x = [dt.datetime(2009, 5, 1), dt.datetime(2010, 6, 1),
     dt.datetime(2011, 4, 1), dt.datetime(2012, 6, 1)]
y = [1, 3, 2, 5]
fig, ax = plt.subplots()
ax.plot(x, y, marker='o', linestyle='--')  # plot_date is deprecated in recent Matplotlib; plot accepts datetimes directly
ax.annotate('Test', (mdates.date2num(x[1]), y[1]), xytext=(15, 15),
textcoords='offset points', arrowprops=dict(arrowstyle='-|>'))
fig.autofmt_xdate()
plt.show()
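To make the "internal floating point format" a little more concrete, here is a minimal sketch of the conversion on its own, using the date from the question. The variable names are illustrative; the exact number printed depends on Matplotlib's date epoch, so no value is shown here.

```python
import datetime as dt
import matplotlib.dates as mdates

event_date = dt.datetime(2009, 10, 31)

# Float number of days since Matplotlib's date epoch.
event_num = mdates.date2num(event_date)

# Back to a datetime; num2date returns a timezone-aware object (UTC by default).
recovered = mdates.num2date(event_num)

print(event_num, recovered)
```

The value event_num is what would be passed as the x coordinate to annotate, e.g. ax.annotate('Event 1', (event_num, y_value), ...), where y_value is the price at that date.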
I am trying to represent CDC Delay of Care data as a line graph but am having some trouble formatting the y axis so that it is a percentage to the hundredths place. I would also like for the x axis to show every year in the range selected.
Here is my code:
import pandas as pd
from isolation import isolate_total_stub, isolate_age_stub
import matplotlib.pyplot as plt
# very simple extraction, drop some columns and check some data
cdc_data = pd.read_csv('CDC_Delay_of_Care_Data.csv')
# separate the categories of delayed care
delay_of_medical_care = cdc_data[cdc_data.PANEL == 'Delay or nonreceipt of needed medical care due to cost']
# isolate the totals stub
total_delay_of_medical_care = isolate_total_stub(delay_of_medical_care)
x_axis = total_delay_of_medical_care.YEAR
y_axis = total_delay_of_medical_care.ESTIMATE
plt.plot(x_axis, y_axis)
plt.xlabel('Year')
plt.ylabel('Percentage')
plt.show()
The graph that displays looks like this:
[Image: line graph]
Excuse me for being a novice. I have been googling for an hour now, and instead of continuing to search for an answer I thought it would be more productive to ask StackOverflow.
Thank you for your time.
To change the format of the Y-axis, you can use set_major_formatter.
To show the X-axis in years, you will need to use set_major_locator, assuming that your dates are in datetime format.
To change the format of the X-axis labels, you can again use set_major_formatter.
I am showing a small example below with dummy data. Hope this works.
import pandas as pd
import matplotlib.pyplot as plt
from matplotlib.ticker import FormatStrFormatter
import matplotlib.dates as mdate
estimate = [8, 7.1, 11, 10.6, 8, 8.3]
year = ['2000-01-01', '2004-01-01', '2008-01-01', '2012-01-01', '2016-01-01', '2020-01-01']
year=pd.to_datetime(year) ## Convert string to datetime
plt.figure(figsize=(12,5)) ## Added so the Years don't overlap on each other
plt.plot(year, estimate)
plt.xlabel('Year')
plt.ylabel('Percentage')
plt.gca().yaxis.set_major_formatter(FormatStrFormatter('%.2f')) ## Formats Y-axis labels with two decimal places
locator = mdate.YearLocator()
plt.gca().xaxis.set_major_locator(locator) ## Changes datetime to years - 1 label per year
plt.gca().xaxis.set_major_formatter(mdate.DateFormatter('%Y')) ## Shows X-axis in Years
plt.gcf().autofmt_xdate() ## Rotates X-labels, if you want to use it
plt.show()
[Image: output plot]
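If the Y-axis labels should carry a percent sign as well as two decimals, matplotlib.ticker.PercentFormatter is one option. The sketch below is a variation on the dummy example, not the asker's real data. It assumes that the estimates are already expressed in percent (hence xmax=100) and that the years are plain integers, which makes one tick per year a matter of set_xticks.

```python
import matplotlib.pyplot as plt
from matplotlib.ticker import PercentFormatter

# Dummy data, as in the answer; years are assumed to be integers here.
estimate = [8, 7.1, 11, 10.6, 8, 8.3]
year = [2000, 2004, 2008, 2012, 2016, 2020]

fig, ax = plt.subplots(figsize=(12, 5))
ax.plot(year, estimate)
ax.set_xlabel('Year')
ax.set_ylabel('Percentage')

# xmax=100 means a data value of 8.3 is shown as 8.30%.
percent_fmt = PercentFormatter(xmax=100, decimals=2)
ax.yaxis.set_major_formatter(percent_fmt)

# One tick for every year in the plotted range.
ax.set_xticks(range(min(year), max(year) + 1))
fig.autofmt_xdate()  # rotate the labels so they do not overlap
```

Call plt.show() afterwards to display the figure.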
I am trying to create a heat map from a pandas dataframe using the seaborn library. Here is the code:
test_df = pd.DataFrame(np.random.randn(367, 5),
                       index=pd.date_range(start='01-01-2000', end='01-01-2001', freq='1D'))  # DatetimeIndex(start=..., end=...) was removed in pandas 1.0
ax = sns.heatmap(test_df.T)
ax.xaxis.set_major_locator(mdates.MonthLocator())
ax.xaxis.set_minor_locator(mdates.DayLocator())
ax.xaxis.set_major_formatter(mdates.DateFormatter('%b'))
ax.xaxis.set_minor_formatter(mdates.DateFormatter('%d'))
However, I am getting a figure with nothing printed on the x-axis.
Seaborn's heatmap is a categorical plot. It scales from 0 to number of columns - 1, in this case from 0 to 366. The datetime locators and formatters expect values as dates (or more precisely, numbers that correspond to dates). For the year in question, with the date epoch Matplotlib used before version 3.3, that would be numbers between 730120 (= 01-01-2000) and 730486 (= 01-01-2001); with the default epoch since 3.3 (1970-01-01) they are 10957 and 11323.
So in order to be able to use matplotlib.dates formatters and locators, you would need to convert your dataframe index to datetime objects first. You then cannot use a heatmap, and need a plot that allows for numerical axes instead, e.g. an imshow plot. You may then set the extent of that imshow plot to correspond to the date range you want to show.
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import matplotlib.dates as mdates
df = pd.DataFrame(np.random.randn(367, 5),
                  index=pd.date_range(start='01-01-2000', end='01-01-2001', freq='1D'))  # DatetimeIndex(start=..., end=...) was removed in pandas 1.0
dates = df.index.to_pydatetime()
dnum = mdates.date2num(dates)
start = dnum[0] - (dnum[1]-dnum[0])/2.
stop = dnum[-1] + (dnum[1]-dnum[0])/2.
extent = [start, stop, -0.5, len(df.columns)-0.5]
fig, ax = plt.subplots()
im = ax.imshow(df.T.values, extent=extent, aspect="auto")
ax.xaxis.set_major_locator(mdates.MonthLocator())
ax.xaxis.set_minor_locator(mdates.DayLocator())
ax.xaxis.set_major_formatter(mdates.DateFormatter('%b'))
fig.colorbar(im)
plt.show()
I found this question when trying to do a similar thing and you can hack together a solution but it's not very pretty.
For example I get the current labels, loop over them to find the ones for January and set those to just the year, setting the rest to be blank.
This gives me year labels in the correct position.
xticklabels = ax.get_xticklabels()
for label in xticklabels:
    text = label.get_text()
    if text[5:7] == '01':
        label.set_text(text[0:4])
    else:
        label.set_text('')
ax.set_xticklabels(xticklabels)
Hopefully from that you can figure out what you want to do.
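A variation on the same idea is to compute the tick positions and labels from the dates themselves instead of editing the label text afterwards. This is only a sketch: it assumes that column i of the heatmap is drawn between i and i + 1, so that its centre sits at i + 0.5, and it uses plain datetime objects in place of the dataframe index.

```python
import datetime as dt

# Stand-in for the dataframe index: 2000-01-01 through 2001-01-01, daily.
start = dt.date(2000, 1, 1)
dates = [start + dt.timedelta(days=i) for i in range(367)]

# One tick at the centre of the column for the first day of each month.
tick_positions = [i + 0.5 for i, d in enumerate(dates) if d.day == 1]
tick_labels = [d.strftime('%b') for d in dates if d.day == 1]

print(list(zip(tick_positions, tick_labels)))
```

With ax being the axes returned by sns.heatmap, the result would be applied with ax.set_xticks(tick_positions) followed by ax.set_xticklabels(tick_labels).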
I am plotting over a period of seconds and have time as the labels on the x-axis. Here is the only way I could get the correct time stamps. However, there are a bunch of zeros on the end. Any idea how to get rid of them?
plt.style.use('seaborn-whitegrid')
df['timestamp'] = pd.to_datetime(df['timestamp'])
fig, ax = plt.subplots(figsize=(8,4))
seconds=MicrosecondLocator(interval=500000)
myFmt = DateFormatter("%S:%f")
ax.plot(df['timestamp'], df['vibration(g)_0'], c='blue')
ax.xaxis.set_major_locator(seconds)
ax.xaxis.set_major_formatter(myFmt)
plt.gcf().autofmt_xdate()
plt.show()
This produces this image. Everything looks perfect except for all of the extra zeros. How can I get rid of them while still keeping the 5?
I guess you would want to simply cut the last 5 digits off the string. That is also what the answers to the question "python datetime: Round/trim number of digits in microseconds" suggest.
import numpy as np
import matplotlib.pyplot as plt
from matplotlib.dates import MicrosecondLocator, DateFormatter
from matplotlib.ticker import FuncFormatter
x = np.datetime64("2018-11-30T00:00") + np.arange(1,4, dtype="timedelta64[s]")
fig, ax = plt.subplots(figsize=(8,4))
seconds=MicrosecondLocator(interval=500000)
myFmt = DateFormatter("%S:%f")
ax.plot(x,[2,1,3])
def trunc_ms_fmt(x, pos=None):
    return myFmt(x,pos)[:-5]
ax.xaxis.set_major_locator(seconds)
ax.xaxis.set_major_formatter(FuncFormatter(trunc_ms_fmt))
plt.gcf().autofmt_xdate()
plt.show()
Note that this format is quite unusual; so make sure the reader of the plot understands it.
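If a more conventional label is preferred, one possible alternative is to build the string from the datetime itself and show seconds with one decimal place, e.g. 01.5. The sketch below is an assumption about what such a formatter could look like, not part of the answer above; the function name seconds_tenths is made up for illustration.

```python
import datetime as dt
import matplotlib.dates as mdates
from matplotlib.ticker import FuncFormatter

def seconds_tenths(x, pos=None):
    """Format a Matplotlib date number as seconds with one decimal."""
    moment = mdates.num2date(x)
    # Work in whole tenths of a second so float noise in x cannot
    # turn .5 into .4.
    tenths = int(round((moment.second + moment.microsecond / 1e6) * 10))
    return "{:02d}.{}".format((tenths // 10) % 60, tenths % 10)

formatter = FuncFormatter(seconds_tenths)

# Quick check on a single value, without creating a figure.
sample = mdates.date2num(dt.datetime(2018, 11, 30, 0, 0, 1, 500000))
print(formatter(sample))
```

It would be installed in the same way as trunc_ms_fmt, with ax.xaxis.set_major_formatter(formatter).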
I am plotting such data:
import numpy as np
import matplotlib.pyplot as plt
import matplotlib.dates as mdates
import pandas as pd
a = pd.date_range(start='2010-01-01', end='2011-06-01', freq='M')  # month ends; use freq='ME' on pandas >= 2.2
b = pd.Series(np.random.randn(len(a)), index=a)
I would like the plot to be in the format of bars, so I use this:
b.plot(kind='bar')
This is what I get:
As you can see, the dates are formatted in full, which is very ugly and unreadable. I happened to test this command which creates a very nice Date format:
b.plot()
As you can see:
I like this format very much, it includes the months, marks the beginning of the year and is easily readable.
After doing some search, the closest I could get to that format is using this:
fig, ax = plt.subplots()
ax.plot(b.index, b)
ax.xaxis.set_major_formatter(mdates.DateFormatter('%b'))
However the output looks like this:
I am able to have month names on x axis this way, but I like the first formatting much more. That is much more elegant. Does anyone know how I can get the same exact xticks for my bar plot?
Here's a solution that will get you the format you're looking for. You can build the tick labels yourself and apply them with the set_major_formatter() method:
import matplotlib.ticker as mticker  # needed for FixedFormatter
fig, ax = plt.subplots()
ax.bar(b.index, b)
# month abbreviation for every tick, plus the year on every 12th one
ticklabels = [item.strftime('%b') for item in b.index]
ticklabels[::12] = [item.strftime('%b\n%Y') for item in b.index[::12]]
ax.xaxis.set_major_formatter(mticker.FixedFormatter(ticklabels))
ax.set_xticks(b.index)
plt.gcf().autofmt_xdate()
Output:
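Marking every 12th label with the year works when the series starts in January. A small variation, sketched here under the assumption that the series may start in any month, is to add the year wherever the month is January. The helper name bar_tick_labels and the sample dates are made up for illustration.

```python
import datetime as dt

def bar_tick_labels(dates):
    """Month abbreviation per date, with the year added for each January."""
    return [d.strftime('%b\n%Y') if d.month == 1 else d.strftime('%b')
            for d in dates]

# Sample monthly dates starting in March 2010 (18 months).
dates = ([dt.date(2010, m, 1) for m in range(3, 13)]
         + [dt.date(2011, m, 1) for m in range(1, 9)])

ticklabels = bar_tick_labels(dates)
print(ticklabels)
```

The resulting list can be passed to FixedFormatter in place of the ticklabels built above; bar_tick_labels(b.index) should work as well, since the elements of a DatetimeIndex provide month and strftime.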