I am trying to make a visually appealing graph in Python. I started from Randal Olson's example (http://www.randalolson.com/2014/06/28/how-to-make-beautiful-data-visualizations-in-python-with-matplotlib/) and am trying to make some adjustments.
My data is simple,
dispute_percentage
Out[34]:
2015-08-11 0.017647
2015-08-12 0.004525
2015-08-13 0.006024
2015-08-14 0.000000
2015-08-15 0.000000
2015-08-17 0.000000
The problem is that the data starts loading at Feb 2015, and I want to start displaying at April 2015.
Here is my code
from __future__ import division
from collections import OrderedDict
import pandas as pd
from collections import Counter
from pylab import *
import datetime as datetime
dispute_percentage.plot(kind = 'line')
plt.xlabel('Date')
plt.ylabel('Percent')
plt.title('Percent Disputes In FY2015')
# Remove the plot frame lines. They are unnecessary chartjunk.
ax = plt.subplot(111)
ax.spines["top"].set_visible(False)
ax.spines["bottom"].set_visible(False)
ax.spines["right"].set_visible(False)
ax.spines["left"].set_visible(False)
# Ensure that the axis ticks only show up on the bottom and left of the plot.
# Ticks on the right and top of the plot are generally unnecessary chartjunk.
ax.get_xaxis().tick_bottom()
#ax.get_yaxis().tick_left()
# Limit the range of the plot to only where the data is.
# Avoid unnecessary whitespace.
datenow = datetime.datetime.now
dstart = datetime(2015,4,1)
print datenow
plt.ylim(0, .14)
plt.xlim(dstart, datenow)
The xlim is what I am struggling with. I'm getting the error
File "C:/Mypath/name.py", line 52, in <module>
dstart = datetime(2015,4,1)
TypeError: 'module' object is not callable
If anyone could help with this that would be great. Also any input on trying to make it prettier would also be appreciated.
You need to call datetime.datetime.now() with the parentheses on the end; without them, datenow is the function itself rather than the current time. For dstart, you need to use the datetime class of the datetime module: datetime.datetime(2015,4,1)
import datetime
datenow = datetime.datetime.now()
dstart = datetime.datetime(2015,4,1)
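The TypeError in the question comes from calling the module rather than the class inside it. A minimal sketch of the two import styles, using only the standard library (the alias dt_module and the variable names are illustrative, not from the original code):

```python
import datetime as dt_module      # binds the *module*
from datetime import datetime     # binds the *class* inside the module

# Calling the module raises the same kind of TypeError as in the question.
try:
    dt_module(2015, 4, 1)
except TypeError as exc:
    error_message = str(exc)

# Either spelling below builds the same datetime object.
dstart_via_module = dt_module.datetime(2015, 4, 1)
dstart_via_class = datetime(2015, 4, 1)

print(error_message)
print(dstart_via_module == dstart_via_class)
```

Whichever style is chosen, it helps to use it consistently so that the name datetime always means the same thing in a script.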
EDIT:
To set the xticks to the first of the month (thanks to @AndyKubiak):
firsts = []
for i in range(dstart.month, datenow.month + 1):
    firsts.append(datetime.datetime(2015, i, 1))
plt.xticks(firsts)
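The loop above hard-codes the year 2015, so it only works while the start and end dates fall in the same year. A possible generalisation, as a sketch only (month_starts is a hypothetical helper name, not part of matplotlib or the original answer):

```python
from datetime import datetime

def month_starts(start, end):
    """Return the first day of every month from start's month to end's month."""
    firsts = []
    year, month = start.year, start.month
    while (year, month) <= (end.year, end.month):
        firsts.append(datetime(year, month, 1))
        # advance one month, rolling over into the next year after December
        month += 1
        if month > 12:
            year, month = year + 1, 1
    return firsts

ticks = month_starts(datetime(2015, 4, 1), datetime(2015, 8, 17))
print(ticks)
```

The resulting list can be passed to plt.xticks in the same way as firsts.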
I want to visualise the daily data using Matplotlib. The data is temperature against time and has this format:
Time Temperature
1 8:23:04 18.5
2 8:23:04 19.0
3 9:12:57 19.0
4 9:12:57 20.0
... ... ...
But when I plot the graph, the Time values on the x-axis are distorted.
Realising Matplotlib may not be interpreting time data correctly, I converted the time format using pd.to_datetime:
df['Time'] = pd.to_datetime(df['Time'], format="%H:%M:%S")
df.plot( 'Time', 'Temperature',figsize=(20, 10))
df.describe()
but the x-axis came out distorted again.
How can I make the time on the x-axis look normal? Thanks
As @Michael O. said, you need to take care of the datetime.
Your times are missing the day, month and year. Here is a possible solution that fills in those missing fields with default values; you may want to change them.
The code is simple, and the comments explain each step.
import pandas as pd
from datetime import datetime, date, time, timezone
import matplotlib.pyplot as plt
import matplotlib.dates as mdates
vals=[["8:23:04", 18.5],
["8:23:04", 19.0],
["9:12:57", 19.0],
["9:12:57", 20.0]]
apd=pd.DataFrame(vals, columns=["Time", "Temp"])
# a simple function to convert a string to datetime
def conv_time(cell):
    dt = datetime.strptime(cell, "%d/%m/%Y %H:%M:%S")
    return dt
# the dataframe misses the day, month and year, we need to add some
apd["Time"]=["{}/{}/{} {}".format(1,1,2020, cell) for cell in apd["Time"]]
# we use the function to convert the column to a datetime
apd["Time"]=[conv_time(cell) for cell in apd["Time"]]
## plotting the results taking care of the axis
fig, ax = plt.subplots()
ax.xaxis.set_major_formatter(mdates.DateFormatter("%H"))
ax.set_xlim([pd.to_datetime('2020-01-1 6:00:00'), pd.to_datetime('2020-01-1 12:00:00')])
ax.scatter(apd["Time"], apd["Temp"])
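The same idea, attaching a placeholder date to a bare clock time, can also be sketched without string formatting by using datetime.combine. The names ASSUMED_DAY and attach_date are hypothetical, and the date 2020-01-01 is only the same arbitrary default used above:

```python
from datetime import date, datetime

ASSUMED_DAY = date(2020, 1, 1)  # placeholder date, an assumption, change as needed

def attach_date(time_string, day=ASSUMED_DAY):
    """Parse an 'H:MM:SS' string and combine it with a calendar day."""
    clock = datetime.strptime(time_string, "%H:%M:%S").time()
    return datetime.combine(day, clock)

stamps = [attach_date(t) for t in ["8:23:04", "9:12:57"]]
print(stamps)
```

The resulting datetimes can be plotted directly, and the placeholder date stays hidden as long as the axis formatter only shows hours and minutes.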
I have a number of times written in the format hh:mm:ss. If I use the code below and print the parsed values, I get something like 1900, 1, 1, 10, 29, 34 for every timestamp. How can I drop the year, month and day? I want the time in the format hh:mm:ss.
EDIT: Updated with the whole code as it looks now with help from comments.
import matplotlib.pyplot as plt
import matplotlib.ticker
import time
import datetime
x = ['10:29:55', '10:34:44']
sy1 = [679.162, 679.802]
x_labels = [datetime.datetime.strptime(elem, '%H:%M:%S') for elem in x]
formatter = matplotlib.ticker.FixedFormatter(x_labels)
plt.gca().xaxis.set_major_formatter(formatter)
plt.gca().xaxis.set_minor_formatter(formatter)
plt.plot(x_labels, sy1, 'ro')
plt.xlabel('Time')
plt.ylabel('Position')
plt.show()
But obviously it displays the time when taking into account the year, month and date too.
Plotting (wrong) time against y values
If I use strftime instead of strptime I get a TypeError: descriptor 'strftime' requires a 'datetime.date' object but received a 'str'
Ok. There are a few ways to get what you want.
If you're willing to settle for having microseconds, you should be really close to what you need:
import matplotlib.pyplot as plt
import datetime
time_strings = ['10:29:55', '10:34:44']
sy1 = [679.162, 679.802]
times = [datetime.datetime.strptime(elem, '%H:%M:%S') for elem in time_strings]
plt.plot(times, sy1, 'ro')
plt.xlabel('Time')
plt.ylabel('Position')
plt.show()
This should show the times you want in a plot, just with microseconds in the formatting. The microseconds make it ugly, but my only changes were for clarity: I didn't import time or matplotlib.ticker, I renamed your x to a more accurate variable name, and I created the datetimes as you did.
To get rid of the microseconds, things get uglier. You can't just use the FixedFormatter, because we only set 2 values and the standard plot has more than 2 ticks; you have to get the FuncFormatter to work instead. The version below works as desired; the labels were still too crowded, so I'm adding plt.gcf().autofmt_xdate() as well.
import matplotlib.pyplot as plt
import matplotlib.ticker
import datetime
import pylab
time_strings = ['10:29:55', '10:34:44']
sy1 = [679.162, 679.802]
times = [datetime.datetime.strptime(elem, '%H:%M:%S') for elem in time_strings]
plt.plot(times, sy1, 'ro')
formatter = matplotlib.ticker.FuncFormatter(lambda tick_value, _: datetime.datetime.strftime(pylab.num2date(tick_value), '%H:%M:%S'))
plt.gca().xaxis.set_major_formatter(formatter)
plt.gca().xaxis.set_minor_formatter(formatter)
plt.xlabel('Time')
plt.ylabel('Position')
plt.gcf().autofmt_xdate()
plt.show()
The line defining the FuncFormatter is messy. I define a lambda, which is an anonymous function written as a single expression. FuncFormatter expects it to take 2 arguments. The first one is the tick_value; the second is the tick position, which we don't use, so I gave it the conventional name _ to show we don't care. The tick values arrive as Matplotlib date numbers (floats), not datetimes, so we convert each one back to a datetime by calling pylab.num2date before formatting it.
You'll find that this second solution is just what you need. The key thing you needed to do was keep track of what your variable types were, and what variable types were needed where.
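As a possible shorter variant of the FuncFormatter approach, matplotlib.dates.DateFormatter does the same number-to-string conversion from a strftime pattern. This is a sketch under the assumption that an off-screen backend is acceptable; it builds the figure without displaying it:

```python
import datetime

import matplotlib
matplotlib.use("Agg")  # assumption: off-screen backend so the sketch runs without a display
import matplotlib.dates as mdates
import matplotlib.pyplot as plt

time_strings = ['10:29:55', '10:34:44']
sy1 = [679.162, 679.802]
times = [datetime.datetime.strptime(elem, '%H:%M:%S') for elem in time_strings]

fig, ax = plt.subplots()
ax.plot(times, sy1, 'ro')
# DateFormatter converts each tick's date number to a string with strftime
formatter = mdates.DateFormatter('%H:%M:%S')
ax.xaxis.set_major_formatter(formatter)
ax.set_xlabel('Time')
ax.set_ylabel('Position')
fig.autofmt_xdate()
```

Call plt.show() or fig.savefig(...) afterwards to render the result.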
I am new to matplotlib (1.3.1-2) and I cannot find a decent place to start.
I want to plot the distribution of points over time in a histogram with matplotlib.
Basically I want to plot the cumulative sum of the occurrence of a date.
date
2011-12-13
2011-12-13
2013-11-01
2013-11-01
2013-06-04
2013-06-04
2014-01-01
...
That would make
2011-12-13 -> 2 times
2013-11-01 -> 3 times
2013-06-04 -> 2 times
2014-01-01 -> once
Since there will be many points over many years, I want to set the start date and the end date on my x-axis, then mark n time steps (e.g. 1-year steps), and finally decide how many bins there will be.
How would I achieve that?
Matplotlib uses its own format for dates/times, but the dates module provides simple functions to convert to and from it. It also provides various Locators and Formatters that take care of placing the ticks on the axis and formatting the corresponding labels. This should get you started:
import random
import matplotlib.pyplot as plt
import matplotlib.dates as mdates
# generate some random data (approximately over 5 years)
data = [float(random.randint(1271517521, 1429197513)) for _ in range(1000)]
# convert the epoch format to matplotlib date format
# (epoch2num was deprecated in Matplotlib 3.5 and removed in later releases)
mpl_data = mdates.epoch2num(data)
# plot it
fig, ax = plt.subplots(1,1)
ax.hist(mpl_data, bins=50, color='lightblue')
ax.xaxis.set_major_locator(mdates.YearLocator())
ax.xaxis.set_major_formatter(mdates.DateFormatter('%d.%m.%y'))
plt.show()
To add to hitzg's answer, you can use AutoDateLocator and AutoDateFormatter to have matplotlib do the location and formatting for you:
locator = mdates.AutoDateLocator()
ax.xaxis.set_major_locator(locator)
ax.xaxis.set_major_formatter(mdates.AutoDateFormatter(locator))
Here is a more modern solution for matplotlib version 3.5.3.
Also, it explicitly specifies the min/max date instead of relying on min/max values derived from the data.
import random
from datetime import datetime, timedelta
import matplotlib.pyplot as plt
days = 365*3
start_date = datetime.now()
random_dates = [
    start_date + timedelta(days=int(random.random()*days))
    for _ in range(100)
]
end_date = start_date + timedelta(days=days)
fig, ax = plt.subplots(figsize=(5,3))
n, bins, patches = ax.hist(random_dates, bins=52, range=(start_date, end_date))
fig.autofmt_xdate()
plt.show()
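The question also asks for the cumulative sum of occurrences per date, which the histograms above do not compute. A minimal standard-library sketch, using only the sample rows listed in the question (the variable names are illustrative):

```python
from collections import Counter
from datetime import date
from itertools import accumulate

# sample rows from the question; the elided rows are not reproduced
raw_dates = ["2011-12-13", "2011-12-13", "2013-11-01", "2013-11-01",
             "2013-06-04", "2013-06-04", "2014-01-01"]

# count how often each date occurs, then walk the dates in chronological order
counts = Counter(date.fromisoformat(d) for d in raw_dates)
days = sorted(counts)
running_total = list(accumulate(counts[d] for d in days))

for day, total in zip(days, running_total):
    print(day, counts[day], total)
```

The two lists days and running_total could then be drawn as a step plot, for example with ax.step(days, running_total, where="post").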
When plotting a timeseries with the built-in plot function of pandas, it seems to ignore the timezone of my index: it always uses the UTC time for the x-axis. An example:
import numpy as np
import matplotlib.pyplot as plt
from pandas import rolling_mean, DataFrame, date_range
rng = date_range('1/1/2011', periods=200, freq='S', tz="UTC")
data = DataFrame(np.random.randn(len(rng), 3), index=rng, columns=['A', 'B', 'C'])
data_cet = data.tz_convert("CET")
# plot with data in UTC timezone
fig, ax = plt.subplots()
data[["A", "B"]].plot(ax=ax, grid=True)
plt.show()
# plot with data in CET timezone, but the x-axis remains the same as above
fig, ax = plt.subplots()
data_cet[["A", "B"]].plot(ax=ax, grid=True)
plt.show()
The plot does not change, although the index has:
In [11]: data.index[0]
Out[11]: <Timestamp: 2011-01-01 00:00:00+0000 UTC, tz=UTC>
In [12]: data_cet.index[0]
Out[12]: <Timestamp: 2011-01-01 01:00:00+0100 CET, tz=CET>
Should I file a bug, or am I missing something?
This is definitely a bug. I've created a report on GitHub. The cause is that pandas internally converts a regular-frequency DatetimeIndex to a PeriodIndex to hook into its own formatters/locators, and PeriodIndex currently does NOT retain timezone information.
Please stay tuned for a fix.
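import pandas as pd
from pytz import timezone as ptz
import matplotlib as mpl
...
# parse as UTC, then convert; tz_localize raises on an index that is already tz-aware
data.index = pd.to_datetime(data.index, utc=True).tz_convert(ptz('<your timezone>'))
...
mpl.rcParams['timezone'] = data.index.tz.zone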
... after which matplotlib prints as that zone rather than UTC.
However, note that if you need to annotate, the x locations of the annotations still need to be in UTC, even though strings passed to data.loc[] or data.at[] are assumed to be in the set timezone!
For instance, I needed to show a series of vertical lines labelled with their timestamps
(this comes after most of the plot calls; note that the timestamp strings in sels are UTC):
sels = ['2019-03-21 3:56:28',
'2019-03-21 4:00:30',
'2019-03-21 4:05:55',
'2019-03-21 4:13:40']
ax.vlines(sels,125,145,lw=1,color='grey') # 125 was bottom, 145 was top in data units
for s in sels:
    tstr = pd.to_datetime(s, utc=True)\
        .astimezone(tz=ptz(data.index.tz.zone))\
        .isoformat().split('T')[1].split('+')[0]
    ax.annotate(tstr,xy=(s,125),xycoords='data',
                xytext=(0,5), textcoords='offset points', rotation=90,
                horizontalalignment='right', verticalalignment='bottom')
This puts grey vertical lines at the times chosen manually in sels, and labels them in local timezone hours, minutes and seconds. (the .split()[] business discards the date and timezone info from the .isoformat() string).
But when I need to actually get corresponding values from data using the same s in sels, I then have to use the somewhat awkward:
data.tz_convert('UTC').at[s]
Whereas just
data.at[s]
fails with a KeyError, because pandas interprets s as being in the data.index.tz timezone, and interpreted that way the timestamps fall outside the range covered by data.
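The label computation in the annotation loop (read the string as UTC, convert to the local zone, keep only the clock time) can be sketched with the standard library alone. The fixed UTC+1 offset and the helper name local_clock_label are assumptions for illustration; a real zone with daylight-saving rules would need pytz or zoneinfo instead:

```python
from datetime import datetime, timedelta, timezone

# assumption: a fixed UTC+1 offset stands in for the real zone of data.index
LOCAL_TZ = timezone(timedelta(hours=1))

def local_clock_label(utc_string, tz=LOCAL_TZ):
    """Interpret a naive timestamp string as UTC and return local HH:MM:SS."""
    naive = datetime.strptime(utc_string, "%Y-%m-%d %H:%M:%S")
    aware_utc = naive.replace(tzinfo=timezone.utc)
    return aware_utc.astimezone(tz).strftime("%H:%M:%S")

labels = [local_clock_label(s) for s in ['2019-03-21 3:56:28', '2019-03-21 4:00:30']]
print(labels)
```

Using strftime here avoids the string splitting on 'T' and '+', which would not strip a negative UTC offset.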
How to deal with UTC to local time conversion
import time
import pytz
import matplotlib.dates
…
tz = pytz.timezone(time.tzname[0])
…
ax.xaxis.set_major_locator(matplotlib.dates.HourLocator(interval=1, tz=tz))
ax.xaxis.set_major_formatter(matplotlib.dates.DateFormatter('%H', tz=tz))
I have an array of timestamps in the format (HH:MM:SS.mmmmmm) and another array of floating point numbers, each corresponding to a value in the timestamp array.
Can I plot time on the x axis and the numbers on the y-axis using Matplotlib?
I was trying to, but somehow it was only accepting arrays of floats. How can I get it to plot the time? Do I have to modify the format in any way?
Update:
This answer is outdated since matplotlib version 3.5. The plot function now handles datetime data directly. See https://matplotlib.org/3.5.1/api/_as_gen/matplotlib.pyplot.plot_date.html
The use of plot_date is discouraged. This method exists for historic
reasons and may be deprecated in the future.
datetime-like data should directly be plotted using plot.
If you need to plot plain numeric data as Matplotlib date format or
need to set a timezone, call ax.xaxis.axis_date / ax.yaxis.axis_date
before plot. See Axis.axis_date.
Old, outdated answer:
You must first convert your timestamps to Python datetime objects (use datetime.strptime). Then use date2num to convert the dates to matplotlib format.
Plot the dates and values using plot_date:
import matplotlib.pyplot
import matplotlib.dates
from datetime import datetime
x_values = [datetime(2021, 11, 18, 12), datetime(2021, 11, 18, 14), datetime(2021, 11, 18, 16)]
y_values = [1.0, 3.0, 2.0]
dates = matplotlib.dates.date2num(x_values)
matplotlib.pyplot.plot_date(dates, y_values)
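For timestamps in the HH:MM:SS.mmmmmm form from the question, the strptime step might look like the sketch below. The sample strings are made up for illustration, and because no date is given, strptime fills in its default of 1900-01-01:

```python
from datetime import datetime

# made-up sample values in the HH:MM:SS.mmmmmm format
raw = ["12:00:00.250000", "12:00:01.500000", "12:00:03.000125"]

# %f consumes the fractional seconds (up to six digits)
x_values = [datetime.strptime(stamp, "%H:%M:%S.%f") for stamp in raw]
print(x_values[0].time(), x_values[-1].microsecond)
```

The resulting list of datetime objects can be used as the x values in the plotting calls shown in this thread.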
You can also plot the timestamp, value pairs using pyplot.plot (after parsing them from their string representation). (Tested with matplotlib versions 1.2.0 and 1.3.1.)
Example:
import datetime
import random
import matplotlib.pyplot as plt
# make up some data
x = [datetime.datetime.now() + datetime.timedelta(hours=i) for i in range(12)]
y = [i+random.gauss(0,1) for i,_ in enumerate(x)]
# plot
plt.plot(x,y)
# beautify the x-labels
plt.gcf().autofmt_xdate()
plt.show()
Here's the same as a scatter plot:
import datetime
import random
import matplotlib.pyplot as plt
# make up some data
x = [datetime.datetime.now() + datetime.timedelta(hours=i) for i in range(12)]
y = [i+random.gauss(0,1) for i,_ in enumerate(x)]
# plot
plt.scatter(x,y)
# beautify the x-labels
plt.gcf().autofmt_xdate()
plt.show()
7 years later and this code has helped me.
However, my times still were not showing up correctly.
I am using Matplotlib 2.0.0, and I had to add the following bit of code from "Editing the date formatting of x-axis tick labels in matplotlib" by Paul H.
import matplotlib.dates as mdates
myFmt = mdates.DateFormatter('%d')
ax.xaxis.set_major_formatter(myFmt)
I changed the format to '%H:%M' and the time displayed correctly.
All thanks to the community.
I had trouble with this using matplotlib version: 2.0.2. Running the example from above I got a centered stacked set of bubbles.
I "fixed" the problem by adding another line:
plt.plot([],[])
The entire code snippet becomes:
import datetime
import random
import matplotlib.pyplot as plt
import matplotlib.dates as mdates
# make up some data
x = [datetime.datetime.now() + datetime.timedelta(minutes=i) for i in range(12)]
y = [i+random.gauss(0,1) for i,_ in enumerate(x)]
# plot
plt.plot([],[])
plt.scatter(x,y)
# beautify the x-labels
plt.gcf().autofmt_xdate()
myFmt = mdates.DateFormatter('%H:%M')
plt.gca().xaxis.set_major_formatter(myFmt)
plt.show()
plt.close()
This produces an image with the bubbles distributed as desired.
Pandas dataframes haven't been mentioned yet. I wanted to show how they solved my datetime problem. I have timestamps such as 2021-04-01 16:05:37. I am pulling linux/haproxy throughput from /proc, so I can format it however I like. This is nice for feeding data into a live graph animation.
Here's a look at the csv. (Ignore the packets-per-second column; I'm using that in another graph.)
head -2 ~/data
date,mbps,pps
2021-04-01 16:05:37,113,9342.00
...
By using print(dataframe.dtypes) I can see how the data was read in:
(base) ➜ graphs ./throughput.py
date object
mbps int64
pps float64
dtype: object
Pandas reads the date column in as "object", which here means plain strings. Using this as-is in a script:
import matplotlib.pyplot as plt
import pandas as pd
dataframe = pd.read_csv("~/data")
dates = dataframe["date"]
mbps = dataframe["mbps"]
plt.plot(dates, mbps, label="mbps")
plt.title("throughput")
plt.xlabel("time")
plt.ylabel("mbps")
plt.legend()
plt.xticks(rotation=45)
plt.show()
Matplotlib treats every date string as a separate category and draws a tick label for each one. I've added plt.xticks(rotation=45) to tilt the dates, but it's not what I want. I can convert the date "object" column to datetime64[ns], which matplotlib does know how to render.
dataframe["date"] = pd.to_datetime(dataframe["date"])
This time my date is type datetime64[ns]
(base) ➜ graphs ./throughput.py
date datetime64[ns]
mbps int64
pps float64
dtype: object
Same script with 1 line difference.
#!/usr/bin/env python
import matplotlib.pyplot as plt
import pandas as pd
dataframe = pd.read_csv("~/data")
# convert object to datetime64[ns]
dataframe["date"] = pd.to_datetime(dataframe["date"])
dates = dataframe["date"]
mbps = dataframe["mbps"]
plt.plot(dates, mbps, label="mbps")
plt.title("throughput")
plt.xlabel("time")
plt.ylabel("mbps")
plt.legend()
plt.xticks(rotation=45)
plt.show()
This might not have been ideal for your use case, but it might help someone else.
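As a possible alternative to converting the column after loading, read_csv can parse it while reading via its parse_dates argument. A small sketch, assuming an in-memory string as a stand-in for ~/data, with the header and row shown earlier:

```python
import io

import pandas as pd

# stand-in for ~/data, using the header and the one row shown above
csv_text = "date,mbps,pps\n2021-04-01 16:05:37,113,9342.00\n"

# parse_dates converts the named column while the file is being read
dataframe = pd.read_csv(io.StringIO(csv_text), parse_dates=["date"])
print(dataframe.dtypes)
```

With a real file, the same argument would be passed as pd.read_csv("~/data", parse_dates=["date"]).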