I am having trouble using matplotlib's event.xdata when plotting a pandas time series. I tried to reproduce the answer proposed in a closely related question, but I get very strange behavior.
Here's the code, adapted to Python 3 and with a little more in the on_click() function:
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import matplotlib.dates as mdates
def on_click(event):
    if event.inaxes is not None:
        # provide raw and converted x data
        print(f"{event.xdata} --> {mdates.num2date(event.xdata)}")
        # add a vertical line at clicked location
        line = ax.axvline(x=event.xdata)
        plt.draw()
t = pd.date_range('2015-11-01', '2016-01-06', freq='H')
y = np.random.normal(0, 1, t.size).cumsum()
df = pd.DataFrame({'Y':y}, index=t)
fig, ax = plt.subplots()
line = None
df.plot(ax=ax)
fig.canvas.mpl_connect('button_press_event', on_click)
plt.show()
If I launch this, I get the following diagram. The date range runs from Nov. 2015 to Jan. 2016 as expected, the cursor position shown in the footer of the window is correct (here 2015-11-01 10:00), and the vertical lines are drawn at the correct locations:
However, the command-line output is as follows:
C:\Users\me\Documents\code\>python matplotlib_even.xdate_num2date.py
402189.6454115977 --> 1102-02-27 15:29:23.562039+00:00
402907.10400704964 --> 1104-02-15 02:29:46.209088+00:00
Those event.xdata values are clearly outside both the input data range and the x axis range, and are unusable for later processing (for example, finding the closest y value in the series).
So, does anyone know how I can get a correct xdata?
Something must have changed in the way matplotlib/pandas handles datetime info between the answer to the related question you linked and now. I cannot comment on why, but I found a solution to your problem.
I went digging through the code that shows the coordinates in the bottom left of the status bar, and I found that when you're plotting a time series, pandas patches the function that prints this info and replaces it with its own.
From there, you can see that you need to convert the float value to a Period object.
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
def on_click(event):
    # event.xdata is None when the click lands outside the axes
    if event.xdata is not None:
        print(pd.Period(ordinal=int(event.xdata), freq='H'))
t = pd.date_range('2015-11-01', '2016-01-06', freq='H')
y = np.random.normal(0, 1, t.size).cumsum()
df = pd.DataFrame({'Y': y}, index=t)
fig, ax = plt.subplots()
df.plot(ax=ax)
fig.canvas.mpl_connect('button_press_event', on_click)
plt.show()
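The question ultimately wanted the closest data point to the click. The Period conversion above can be combined with a nearest-index lookup to get there. This is only a minimal sketch, assuming an hourly index as in the example; the helper names xdata_to_timestamp and nearest_row are made up for illustration and are not part of pandas or matplotlib.

```python
import numpy as np
import pandas as pd


def xdata_to_timestamp(xdata, freq="h"):
    """Convert a pandas-plot x coordinate (a period ordinal) to a Timestamp."""
    # pandas time series plots put period ordinals on the x axis, so the
    # float from event.xdata is truncated to an integer ordinal first.
    return pd.Period(ordinal=int(xdata), freq=freq).to_timestamp()


def nearest_row(df, xdata, freq="h"):
    """Return (timestamp, row) of the entry in df closest to the clicked x."""
    ts = xdata_to_timestamp(xdata, freq)
    pos = df.index.get_indexer([ts], method="nearest")[0]
    return df.index[pos], df.iloc[pos]


t = pd.date_range("2015-11-01", "2016-01-06", freq="h")
df = pd.DataFrame({"Y": np.random.normal(0, 1, t.size).cumsum()}, index=t)

# 402189.6454115977 is the first xdata value printed in the question.
stamp, row = nearest_row(df, 402189.6454115977)
print(stamp, row["Y"])
```

Inside on_click this would be called as nearest_row(df, event.xdata), after checking that event.xdata is not None.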
Related
I have an existing plot that was created with pandas like this:
df['myvar'].plot(kind='bar')
The y axis is formatted as float and I want to change it to percentages. All of the solutions I found use ax.xyz syntax, and I can only place code below the line above that creates the plot (I cannot add ax=ax to the line above).
How can I format the y axis as percentages without changing the line above?
Here is a solution I found, but it requires that I redefine the plot:
import matplotlib.pyplot as plt
import numpy as np
import matplotlib.ticker as mtick
data = [8,12,15,17,18,18.5]
perc = np.linspace(0,100,len(data))
fig = plt.figure(1, (7,4))
ax = fig.add_subplot(1,1,1)
ax.plot(perc, data)
fmt = '%.0f%%' # Format you want the ticks, e.g. '40%'
xticks = mtick.FormatStrFormatter(fmt)
ax.xaxis.set_major_formatter(xticks)
plt.show()
Link to the above solution: Pyplot: using percentage on x axis
This is a few months late, but I have created PR#6251 with matplotlib to add a new PercentFormatter class. With this class you just need one line to reformat your axis (two if you count the import of matplotlib.ticker):
import ...
import matplotlib.ticker as mtick
ax = df['myvar'].plot(kind='bar')
ax.yaxis.set_major_formatter(mtick.PercentFormatter())
PercentFormatter() accepts three arguments, xmax, decimals, symbol. xmax allows you to set the value that corresponds to 100% on the axis. This is nice if you have data from 0.0 to 1.0 and you want to display it from 0% to 100%. Just do PercentFormatter(1.0).
The other two parameters allow you to set the number of digits after the decimal point and the symbol. They default to None and '%', respectively. decimals=None will automatically set the number of decimal points based on how much of the axes you are showing.
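As a rough illustration of the three arguments, the snippet below builds two formatters and formats a value with each. It calls format_pct directly so that no axes are needed; the sample values and the " pct" symbol are arbitrary choices for the example.

```python
import matplotlib.ticker as mtick

# xmax=1.0 means the data value 1.0 is displayed as 100%.
fraction_fmt = mtick.PercentFormatter(xmax=1.0, decimals=1)

# Default xmax=100: the data values are already percentages.
percent_fmt = mtick.PercentFormatter(decimals=0, symbol=" pct")

# format_pct(x, display_range) does the conversion. display_range only
# matters when decimals is None, so any positive number works here.
print(fraction_fmt.format_pct(0.256, 1.0))
print(percent_fmt.format_pct(42.0, 100.0))
```

On a real plot the formatter is attached as shown above, with ax.yaxis.set_major_formatter(fraction_fmt).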
Update
PercentFormatter was introduced into Matplotlib proper in version 2.1.0.
A pandas DataFrame plot returns the ax for you, and then you can manipulate the axes however you want.
import pandas as pd
import numpy as np
df = pd.DataFrame(np.random.randn(100,5))
# you get ax from here
ax = df.plot()
type(ax) # matplotlib.axes._subplots.AxesSubplot
# manipulate
vals = ax.get_yticks()
ax.set_yticklabels(['{:,.2%}'.format(x) for x in vals])
Jianxun's solution did the job for me but broke the y value indicator at the bottom left of the window.
I ended up using FuncFormatter instead (and also stripped the unnecessary trailing zeroes as suggested here):
import pandas as pd
import numpy as np
from matplotlib.ticker import FuncFormatter
df = pd.DataFrame(np.random.randn(100,5))
ax = df.plot()
ax.yaxis.set_major_formatter(FuncFormatter(lambda y, _: '{:.0%}'.format(y)))
Generally speaking I'd recommend using FuncFormatter for label formatting: it's reliable, and versatile.
For those who are looking for the quick one-liner:
plt.gca().set_yticklabels([f'{x:.0%}' for x in plt.gca().get_yticks()])
this assumes
import: from matplotlib import pyplot as plt
Python >=3.6 for f-String formatting. For older versions, replace f'{x:.0%}' with '{:.0%}'.format(x)
I'm late to the game but I just realized this: ax can be replaced with plt.gca() for those who are not using axes and just subplots.
Echoing @Mad Physicist's answer and using PercentFormatter, it would be:
import matplotlib.ticker as mtick
plt.gca().yaxis.set_major_formatter(mtick.PercentFormatter(1))
#if you already have ticks in the 0 to 1 range. Otherwise see their answer
I propose an alternative method using seaborn
Working code:
import numpy as np
import pandas as pd
import seaborn as sns
data=np.random.rand(10,2)*100
df = pd.DataFrame(data, columns=['A', 'B'])
ax= sns.lineplot(data=df, markers= True)
ax.set(xlabel='xlabel', ylabel='ylabel', title='title')
#changing the y tick labels
y_value=['{:,.2f}'.format(x) + '%' for x in ax.get_yticks()]
ax.set_yticklabels(y_value)
You can do this in one line without importing anything:
plt.gca().yaxis.set_major_formatter(plt.FuncFormatter('{}%'.format))
If you want integer percentages, you can do:
plt.gca().yaxis.set_major_formatter(plt.FuncFormatter('{:.0f}%'.format))
You can use either ax.yaxis or plt.gca().yaxis. FuncFormatter is still part of matplotlib.ticker, but you can also do plt.FuncFormatter as a shortcut.
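The reason a bare str.format method works here is that FuncFormatter calls its function with two arguments, the tick value and the tick position, and str.format silently ignores positional arguments the template does not use. A small sketch of that behavior, with made-up tick values:

```python
from matplotlib.ticker import FuncFormatter

# '{:.0f}%'.format(x, pos) only consumes x; the unused pos is ignored.
integer_pct = FuncFormatter("{:.0f}%".format)
print(integer_pct(25.0, 0))

# Equivalent idea with a lambda, for ticks that are fractions in 0..1.
fraction_pct = FuncFormatter(lambda y, pos: f"{y:.0%}")
print(fraction_pct(0.25, 0))
```

Which variant fits depends on whether the tick values are already percentages or fractions of one.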
Based on the answer of @erwanp, you can use the formatted string literals of Python 3,
x = '2'
percentage = f'{x}%' # 2%
inside the FuncFormatter() and combined with a lambda expression.
All wrapped:
ax.yaxis.set_major_formatter(FuncFormatter(lambda y, _: f'{y}%'))
Another one line solution if the yticks are between 0 and 1:
plt.yticks(plt.yticks()[0], ['{:,.0%}'.format(x) for x in plt.yticks()[0]])
Add one line of code (this assumes from matplotlib import ticker):
ax.yaxis.set_major_formatter(ticker.PercentFormatter())
I want to set the x tick density by specifying how many ticks to skip each time. For example, if the x axis is labelled by 100 consecutive dates, and I want to skip every 10 dates, then I will do something like
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
ts = pd.period_range("20060101", periods=100).strftime("%Y%m%d")
y = np.random.randn(100)
ax = plt.subplot(1, 1, 1)
ax.plot(ts, y)
xticks = ax.get_xticks()
ax.set_xticks(xticks[::10])
plt.xticks(rotation="vertical")
plt.show()
However, the output is out of place. Pyplot only picks the first few ticks and places them all in the wrong positions, although the spacing is correct:
What can I do to get the desired output? Namely the ticks should be instead:
['20060101' '20060111' '20060121' '20060131' '20060210' '20060220'
'20060302' '20060312' '20060322' '20060401']
@klim's answer seems to put the correct marks on the axis, but the labels still won't show. An example where the date axis is correctly marked yet without labels:
Set xticklabels also. Like this.
xticks = ax.get_xticks()
xticklabels = ax.get_xticklabels()
ax.set_xticks(xticks[::10])
ax.set_xticklabels(xticklabels[::10], rotation=90)
Forget the above, which doesn't work.
How about this?
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
ts = pd.period_range("20060101", periods=100).strftime("%Y%m%d")
x = np.arange(len(ts))
y = np.random.randn(100)
ax = plt.subplot(1, 1, 1)
ax.plot(x, y)
ax.set_xticks(x[::10])
ax.set_xticklabels(ts[::10], rotation="vertical")
plt.show()
This works on my machine.
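The working version plots against integer positions and then attaches the date strings as labels, so tick position and label stay in step. The same idea can be wrapped in a small helper; label_every_nth is a hypothetical name used only for this sketch.

```python
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt


def label_every_nth(ax, labels, step, rotation="vertical"):
    """Put a tick at every step-th integer position and label it from labels."""
    positions = np.arange(len(labels))[::step]
    ax.set_xticks(positions)
    ax.set_xticklabels([labels[i] for i in positions], rotation=rotation)
    return positions


ts = pd.period_range("20060101", periods=100).strftime("%Y%m%d")
y = np.random.randn(100)

fig, ax = plt.subplots()
ax.plot(np.arange(len(ts)), y)  # plot against positions, not strings
positions = label_every_nth(ax, ts, 10)
```

Call plt.show() afterwards to display the figure.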
I am trying to compare two CSV files and find their proper point of intersection. I am plotting them as graphs for better understanding.
But one graph comes out very small compared to the other.
See the following:
Here is the data: trade-volume.csv
Here is the real graph:
Here is the data: miners-revenue.csv
Here is the real graph:
Here is the program I wrote for comparison:
import pandas as pd
import matplotlib.pyplot as plt
dat2 = pd.read_csv("trade-volume.csv", parse_dates=['time'])
dat3 = pd.read_csv("miners-revenue.csv", parse_dates=['time'])
# .dt.days gives whole days elapsed; recent pandas versions reject astype('timedelta64[D]')
dat2['timeDiff'] = (dat2['time'] - dat2['time'][0]).dt.days
dat3['timeDiff'] = (dat3['time'] - dat3['time'][0]).dt.days
fig, ax = plt.subplots()
ax.plot(dat2['timeDiff'], dat2['Value'])
ax.plot(dat3['timeDiff'], dat3['Value'])
plt.show()
I got the output like the following:
As one can see, the orange graph sits so low that I cannot make out its points. I would like to overlay the graphs and then check.
Please help me make this possible with my existing code, ideally with as little alteration as possible.
The problem comes down to your y axis. One has a maximum of 60,000,000 while the other has a maximum of 6,000,000,000. Trying to plot these on the same graph is going to lead to one "looking" like a straight line even though it isn't if you zoom in.
A possible solution is to use a second y axis (you can change the color of the lines using the color= argument in ax.plot()):
import pandas as pd
import matplotlib.pyplot as plt
dat2 = pd.read_csv("trade-volume.csv", parse_dates=['time'])
dat3 = pd.read_csv("miners-revenue.csv", parse_dates=['time'])
# .dt.days gives whole days elapsed; recent pandas versions reject astype('timedelta64[D]')
dat2['timeDiff'] = (dat2['time'] - dat2['time'][0]).dt.days
dat3['timeDiff'] = (dat3['time'] - dat3['time'][0]).dt.days
fig, ax = plt.subplots()
ax.plot(dat2['timeDiff'], dat2['Value'], color="blue")
ax2=ax.twinx()
ax2.plot(dat3['timeDiff'], dat3['Value'], color="red")
plt.show()
The two datasets live on very different scales. You can normalize both in order to compare them.
import pandas as pd
import matplotlib.pyplot as plt
dat2 = pd.read_csv("trade-volume.csv", parse_dates=['time'])
dat3 = pd.read_csv("miners-revenue.csv", parse_dates=['time'])
# .dt.days gives whole days elapsed; recent pandas versions reject astype('timedelta64[D]')
dat2['timeDiff'] = (dat2['time'] - dat2['time'][0]).dt.days
dat3['timeDiff'] = (dat3['time'] - dat3['time'][0]).dt.days
fig, ax = plt.subplots()
ax.plot(dat2['timeDiff'], dat2['Value']/dat2['Value'].values.max())
ax.plot(dat3['timeDiff'], dat3['Value']/dat3['Value'].values.max())
plt.show()
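Once both series are on the same scale, the places where they cross can be located by looking for sign changes in their difference. The sketch below uses made-up numbers instead of the CSV files and assumes both series are sampled at the same time points, which the real data may not satisfy. Crossings of the normalized curves also depend on the normalization chosen, so they are not crossings of the raw values. The helper names are hypothetical.

```python
import numpy as np
import pandas as pd


def normalize_by_max(values):
    """Scale a series so that its largest value becomes 1.0."""
    return values / values.max()


def crossing_positions(a, b):
    """Positions i where (a - b) changes sign between i and i + 1."""
    sign = np.sign(np.asarray(a) - np.asarray(b))
    return np.nonzero(np.diff(sign))[0]


# Synthetic stand-ins for the two 'Value' columns (made-up numbers).
volume = pd.Series([1.0e7, 2.0e7, 3.0e7, 4.0e7])
revenue = pd.Series([4.0e9, 3.0e9, 2.0e9, 1.0e9])

vol_n = normalize_by_max(volume)
rev_n = normalize_by_max(revenue)
print(crossing_positions(vol_n, rev_n))
```

With real data the two frames would first need to be aligned on a common time axis, for example by merging on timeDiff.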
My dataframe has an unevenly spaced time index.
How can I plot the data and have the tick labels placed at the right index positions automatically? I searched here, and I know I can plot something like
e.plot()
but then the time index (x axis) is laid out at even intervals, for example every 5 minutes.
If I have 100 data points in the first 5 minutes and 6 data points in the second 5 minutes, how do I plot
the data points evenly spaced by count and still show the right timestamps on the x axis?
Here is a plot with an even count, but I don't know how to add the time index.
plot(e['Bid'].values)
example of data format as requested
Time,Bid
2014-03-05 21:56:05:924300,1.37275
2014-03-05 21:56:05:924351,1.37272
2014-03-05 21:56:06:421906,1.37275
2014-03-05 21:56:06:421950,1.37272
2014-03-05 21:56:06:920539,1.37275
2014-03-05 21:56:06:920580,1.37272
2014-03-05 21:56:09:071981,1.37275
2014-03-05 21:56:09:072019,1.37272
and here's the link
http://code.google.com/p/eu-ats/source/browse/trunk/data/new/eur-fix.csv
here's the code, I used to plot
import numpy as np
import pandas as pd
import datetime as dt
e = pd.read_csv("data/ecb/eur.csv", dtype={'Time':object})
e.Time = pd.to_datetime(e.Time, format='%Y-%m-%d %H:%M:%S:%f')
e.plot()
f = e.copy()
f.index = f.Time
x = [str(s)[:-7] for s in f.index]
ff = f.set_index(pd.Series(x))
ff.index.name = 'Time'
ff.plot()
Update:
I added two new plots for comparison to clarify the issue. I also tried brute force: converting the timestamp index back to strings and plotting the strings as the x axis. The format easily gets messed up, and it seems hard to customize the location of the x labels.
Ok, it seems like what you're after is that you want to move around the x-tick locations so that there are an equal number of points between each tick. And you'd like to have the grid drawn on these appropriately-located ticks. Do I have that right?
If so:
import pandas as pd
import urllib.request
import matplotlib.pyplot as plt
import seaborn as sbn
content = urllib.request.urlopen('https://eu-ats.googlecode.com/svn/trunk/data/new/eur-fix.csv')
df = pd.read_csv(content, header=0)
df['Time'] = pd.to_datetime(df['Time'], format='%Y-%m-%d %H:%M:%S:%f')
every30 = df.loc[df.index % 30 == 0, 'Time'].values
fig, ax = plt.subplots(1, 1, figsize=(9, 5))
df.plot(x='Time', y='Bid', ax=ax)
ax.set_xticks(every30)
I have tried to reproduce your issue, but I can't seem to. Can you have a look at this example and see how your situation differs?
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sbn
np.random.seed(0)
idx = pd.date_range('11:00', '21:30', freq='1min')
ser = pd.Series(data=np.random.randn(len(idx)), index=idx)
ser = ser.cumsum()
for i in range(20):
    for j in range(8):
        ser.iloc[10*i +j] = np.nan
fig, axes = plt.subplots(1, 2, figsize=(10, 5))
ser.plot(ax=axes[0])
ser.dropna().plot(ax=axes[1])
gives the following two plots:
There are a couple differences between the graphs. The one on the left doesn't connect the non-continuous bits of data. And it lacks vertical gridlines. But both seem to respect the actual index of the data. Can you show an example of your e series? What is the exact format of its index? Is it a datetime_index or is it just text?
Edit:
Playing with this, my guess is that your index is actually just text. If I continue from above with:
idx_str = [str(x) for x in idx]
newser = ser
newser.index = idx_str
fig, axes = plt.subplots(1, 2, figsize=(10, 5))
newser.plot(ax=axes[0])
newser.dropna().plot(ax=axes[1])
then I get something like your problem:
More edit:
If this is in fact your issue (the index is a bunch of strings, not really a bunch of timestamps) then you can convert them and all will be well:
idx_fixed = pd.to_datetime(idx_str)
fixedser = newser
fixedser.index = idx_fixed
fig, axes = plt.subplots(1, 2, figsize=(10, 5))
fixedser.plot(ax=axes[0])
fixedser.dropna().plot(ax=axes[1])
produces output identical to the first code sample above.
Editing again:
To see the uneven spacing of the data, you can do this:
fig, axes = plt.subplots(1, 2, figsize=(10, 5))
fixedser.plot(ax=axes[0], marker='.', linewidth=0)
fixedser.dropna().plot(ax=axes[1], marker='.', linewidth=0)
Let me try this one from scratch. Does this solve your issue?
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sbn
import urllib.request
content = urllib.request.urlopen('https://eu-ats.googlecode.com/svn/trunk/data/new/eur-fix.csv')
df = pd.read_csv(content, header=0, index_col='Time')
df.index = pd.to_datetime(df.index, format='%Y-%m-%d %H:%M:%S:%f')
df.plot()
The thing is, you want to plot bid vs time. If you've put the times into your index then they become your x-axis for "free". If the time data is just another column, then you need to specify that you want to plot bid as the y-axis variable and time as the x-axis variable. So in your code above, even when you convert the time data to be datetime type, you were never instructing pandas/matplotlib to use those datetimes as the x-axis.
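To make the two options concrete, here is a small sketch using the first rows of the sample data from the question. It only illustrates the column-versus-index distinction and does not reproduce the full file.

```python
import io
import pandas as pd

# First rows of the sample data shown in the question.
raw = io.StringIO(
    "Time,Bid\n"
    "2014-03-05 21:56:05:924300,1.37275\n"
    "2014-03-05 21:56:05:924351,1.37272\n"
    "2014-03-05 21:56:06:421906,1.37275\n"
    "2014-03-05 21:56:06:421950,1.37272\n"
)
df = pd.read_csv(raw)
# Note the ':' before the microseconds, hence the custom format string.
df["Time"] = pd.to_datetime(df["Time"], format="%Y-%m-%d %H:%M:%S:%f")

# Option 1: Time stays a column, so name it explicitly as the x variable.
ax1 = df.plot(x="Time", y="Bid")

# Option 2: move Time into the index; it then becomes the x axis by default.
indexed = df.set_index("Time")
ax2 = indexed.plot()
```

Either way, the x axis is then built from real timestamps and not from strings or row numbers.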
I am really struggling with matplotlib, especially with the axis settings. My goal is to set up 6 subplots in one figure, which all display different datasets but have the same number of tick labels.
The relevant part of my sourcecode looks like:
graph4.py:
# Import Matplotlib Modules #
import matplotlib as mpl
from matplotlib.figure import Figure
from matplotlib.backends.backend_gtkagg import FigureCanvasGTKAgg as FigureCanvas
from matplotlib import ticker
import matplotlib.pyplot as plt
mpl.rcParams['font.sans-serif']='Arial' #set font to arial
# Import GTK Modules #
import gtk
#Import System Modules #
import sys
# Import Numpy Modules #
from numpy import genfromtxt
import numpy
# Import Own Modules #
import mysubplot as mysp
class graph4():
    weekdays = ['Montag', 'Dienstag', 'Mittwoch', 'Donnerstag', 'Freitag', 'Samstag']
    def __init__(self, graphview):
        #create new Figure
        self.figure = Figure(figsize=(100,100), dpi=75)
        #create six subplots within self.figure
        self.subplot = []
        for j in range(6):
            self.subplot.append(self.figure.add_subplot(321 + j))
        self.__conf_subplots__() #configure title, xlabel, ylabel and grid of all subplots
        #to make it look better
        self.figure.subplots_adjust(left=0.125, bottom=0.1, right=0.9, top=0.96, wspace=0.2, hspace=0.6)
        #Matplotlib <-> GTK
        self.canvas = FigureCanvas(self.figure) # a gtk.DrawingArea
        self.canvas.set_flags(gtk.HAS_FOCUS|gtk.CAN_FOCUS)
        self.canvas.grab_focus()
        self.canvas.show()
        graphview.pack_start(self.canvas, True, True)
    #add labels and grid to all subplots
    def __conf_subplots__(self):
        index = 0
        for i in self.subplot:
            mysp.conf_subplot(i, 'Zeit', 'Menge', graph4.weekdays[index], True)
            i.plot([], [], 'bo') #empty plot
            index +=1
    def plot(self, filename_list):
        index = 0
        for filename in filename_list:
            data = genfromtxt(filename, delimiter=',') #load data from filename
            if data.size != 0: #only if file isn't empty
                if index < len(self.subplot): #plot every file on a different subplot ('<' avoids indexing past the last subplot)
                    mysp.plot(self.subplot[index],data[0:, 1], data[0:, 0])
                    index +=1
        self.canvas.draw()
    def clear_plot(self):
        #clear axis of all subplots
        for i in self.subplot:
            i.cla()
        self.__conf_subplots__()
mysubplot.py: (helper module)
# Import Matplotlib Modules
from matplotlib.axes import Subplot
import matplotlib.dates as md
import matplotlib.pyplot as plt
# Import Own Modules #
import mytime as myt
# Import Numpy Modules #
import numpy as np
def conf_subplot(subplot, xlabel, ylabel, title, grid):
    if(xlabel != None):
        subplot.set_xlabel(xlabel)
    if(ylabel != None):
        subplot.set_ylabel(ylabel)
    if(title != None):
        subplot.set_title(title)
    subplot.grid(grid)
    #rotate xaxis labels
    plt.setp(subplot.get_xticklabels(), rotation=30, fontsize=12)
    #display date on xaxis
    subplot.xaxis.set_major_formatter(md.DateFormatter('%H:%M:%S'))
    subplot.xaxis_date()
def plot(subplot, x, y):
    subplot.plot(x, y, 'bo')
I think the best way to explain what goes wrong is with the use of screenshots. After I start my application, everything looks good:
If I double click a 'Week'-entry on the left, the method clear_plot() in graph4.py is called to reset all subplots. Then a list of filenames is passed to the method plot() in graph4.py. The method plot() opens each file and plots each dataset on a different subplot. So after I double click an entry, it looks like:
As you can see, each subplot has a different number of xtick labels, which looks pretty ugly to me. Therefore, I am looking for a solution to improve this. My first approach was to set the ticklabels manually with xaxis.set_ticklabels(), so that each subplot has the same number of ticklabels. However, as strange as it sounds, this only works on some datasets and I really don't know why. On some datasets, everything works fine and on other datasets, matplotlib is basically doing what it wants and displays xaxis labels that I didn't specify. I also tried FixedLocator(), but I got the same result. On some datasets it is working and on others, matplotlib is using a different number of xtick labels.
What am I doing wrong?
Edit:
As @sgpc suggested, I tried to use pyplot. My sourcecode now looks like this:
import matplotlib as mpl
import matplotlib.pyplot as plt
from matplotlib.backends.backend_gtkagg import FigureCanvasGTKAgg as FigureCanvas
import matplotlib.dates as md
mpl.rcParams['font.sans-serif']='Arial' #set font to arial
import gtk
import sys
# Import Numpy Modules #
from numpy import genfromtxt
import numpy
# Import Own Modules #
import mysubplot as mysp
class graph2():
    weekdays = ['Montag', 'Dienstag', 'Mittwoch', 'Donnerstag', 'Freitag', 'Samstag']
    def __init__(self, graphview):
        self.figure, temp = plt.subplots(ncols=2, nrows=3, sharex = True)
        #2d array -> list
        self.axes = [ y for x in temp for y in x]
        #axis: date
        for i in self.axes:
            i.xaxis.set_major_formatter(md.DateFormatter('%H:%M:%S'))
            i.xaxis_date()
        #make space and rotate xtick labels
        self.figure.autofmt_xdate()
        #Matplotlib <-> GTK
        self.canvas = FigureCanvas(self.figure) # a gtk.DrawingArea
        self.canvas.set_flags(gtk.HAS_FOCUS|gtk.CAN_FOCUS)
        self.canvas.grab_focus()
        self.canvas.show()
        graphview.pack_start(self.canvas, True, True)
    def plot(self, filename_list):
        index = 0
        for filename in filename_list:
            data = genfromtxt(filename, delimiter=',') #get dataset
            if data.size != 0: #only if file isn't empty
                if index < len(self.axes): #print each dataset on a different subplot
                    self.axes[index].plot(data[0:, 1], data[0:, 0], 'bo')
                    index +=1
        self.canvas.draw()
    #not yet implemented
    def clear_plot(self):
        pass
If I plot some datasets, I get the following output:
http://i.imgur.com/3ngYTNr.png (sorry, I still don't have enough reputation to embed pictures)
Moreover, I am not really sure if sharing the x-axis is a good idea, because the x-values may differ in every subplot (for example: in the first subplot, the x-values range from 8:00am - 11:00am and in the second subplot they range from 7:00pm - 9:00pm).
If I get rid of sharex = True, I get the following output:
http://i.imgur.com/rxHeSyJ.png (sorry, I still don't have enough reputation to embed pictures)
As you can see, the output now looks better. However, the labels on the x-axes are now not updated. I assume that is because the last subplots are empty.
My next attempt was to use an axis for each subplot. Therefore, I made these changes:
for i in self.axes:
    plt.setp(i.get_xticklabels(), visible=True, rotation = 30) #<-- I added this line...
    i.xaxis.set_major_formatter(md.DateFormatter('%H:%M:%S'))
    i.xaxis_date()
#self.figure.autofmt_xdate() #<--changed this line
self.figure.subplots_adjust(left=0.125, bottom=0.1, right=0.9, top=0.96, wspace=0.2, hspace=0.6) #<-- and added this line
Now I get the following output:
i.imgur.com/TmA1goE.png (sorry, I still don't have enough reputation to embed pictures)
So with this attempt, I am basically struggling with the same problem as with Figure() and add_subplot().
I really don't know, what else I could try to make it work...
I would strongly recommend that you use pyplot.subplots() with sharex=True (assuming import matplotlib.pyplot as plt):
fig, axes = plt.subplots(ncols=2, nrows=3, sharex=True)
Then you access each axes using:
ax = axes[i,j]
And you can plot doing:
ax.plot(...)
To control the number of ticks for each AxesSubplot you can use:
ax.locator_params(axis='x', nbins=6)
Note: axis can be 'x', 'y' or 'both'.
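A minimal sketch of the locator_params approach on a 3x2 grid, using made-up numeric x data. As far as I can tell, nbins is an upper bound on the number of intervals and not an exact tick count, so panels can still differ slightly. The visible_xticks helper is hypothetical and only there to inspect the result.

```python
import numpy as np
import matplotlib.pyplot as plt

fig, axes = plt.subplots(ncols=2, nrows=3)
rng = np.random.default_rng(0)

for k, ax in enumerate(axes.flat):
    # Each panel covers a different x range (made-up data).
    x = np.linspace(k, k + 3 * (k + 1), 50)
    ax.plot(x, rng.normal(size=x.size), "bo")
    # Ask for at most 6 intervals on the x axis of this panel.
    ax.locator_params(axis="x", nbins=6)


def visible_xticks(ax):
    """Major x ticks that fall inside the current view limits."""
    lo, hi = ax.get_xlim()
    return [t for t in ax.get_xticks() if lo <= t <= hi]


for ax in axes.flat:
    print(len(visible_xticks(ax)))
```

This sketch uses plain numbers on the x axis. For date axes, which use a date locator by default, something like matplotlib.dates.AutoDateLocator(maxticks=6) set through ax.xaxis.set_major_locator() may be the closer equivalent; I have not verified that against the GTK application above.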