python pandas plot with uneven timeseries index (with count evenly distributed) - python

My dataframe has an unevenly spaced time index.
How can I plot the data and have the x-axis ticks placed automatically? I searched here, and I know I can plot something like
e.plot()
but then the time index (x axis) uses even intervals, for example one tick per 5 minutes.
If I have 100 data points in the first 5 minutes and 6 in the second 5 minutes, how do I plot
with the data points evenly spaced by count, and still put the right timestamps on the x axis?
This gives an even count, but I don't know how to add the time index:
plot(e['Bid'].values)
Example of the data format, as requested:
Time,Bid
2014-03-05 21:56:05:924300,1.37275
2014-03-05 21:56:05:924351,1.37272
2014-03-05 21:56:06:421906,1.37275
2014-03-05 21:56:06:421950,1.37272
2014-03-05 21:56:06:920539,1.37275
2014-03-05 21:56:06:920580,1.37272
2014-03-05 21:56:09:071981,1.37275
2014-03-05 21:56:09:072019,1.37272
and here's the link to the full file:
http://code.google.com/p/eu-ats/source/browse/trunk/data/new/eur-fix.csv
Here's the code I used to plot:
import numpy as np
import pandas as pd
import datetime as dt
e = pd.read_csv("data/ecb/eur.csv", dtype={'Time':object})
e.Time = pd.to_datetime(e.Time, format='%Y-%m-%d %H:%M:%S:%f')
e.plot()
f = e.copy()
f.index = f.Time
x = [str(s)[:-7] for s in f.index]
ff = f.set_index(pd.Series(x))
ff.index.name = 'Time'
ff.plot()
Update:
I added two new plots for comparison to clarify the issue. I have now tried a brute-force approach: converting the timestamp index back to strings and plotting those strings as the x axis. The formatting easily gets messed up, and it seems hard to customize the location of the x labels.

Ok, it seems like what you're after is that you want to move around the x-tick locations so that there are an equal number of points between each tick. And you'd like to have the grid drawn on these appropriately-located ticks. Do I have that right?
If so:
import pandas as pd
from urllib.request import urlopen  # Python 3 location of urlopen
import matplotlib.pyplot as plt
import seaborn as sbn
content = urlopen('https://eu-ats.googlecode.com/svn/trunk/data/new/eur-fix.csv')
df = pd.read_csv(content, header=0)
df['Time'] = pd.to_datetime(df['Time'], format='%Y-%m-%d %H:%M:%S:%f')
every30 = df.loc[df.index % 30 == 0, 'Time'].values
fig, ax = plt.subplots(1, 1, figsize=(9, 5))
df.plot(x='Time', y='Bid', ax=ax)
ax.set_xticks(every30)
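If the goal is instead to space the points evenly by count and only use the timestamps as labels, one possible sketch is to plot against the integer position and label every Nth tick with its timestamp. The data below is synthetic (made up for illustration, not the asker's file), and the step of 30 and the `%H:%M:%S` label format are arbitrary choices.

```python
import matplotlib
matplotlib.use("Agg")  # headless backend; drop this line for interactive use
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd

# Synthetic, unevenly spaced quotes (assumption: stands in for the real CSV)
rng = np.random.default_rng(0)
gaps = rng.exponential(scale=2.0, size=200)  # seconds between quotes
times = pd.Timestamp("2014-03-05 21:56:05") + pd.to_timedelta(np.cumsum(gaps), unit="s")
df = pd.DataFrame({"Time": times,
                   "Bid": 1.3727 + rng.normal(0, 1e-5, size=200).cumsum()})

step = 30  # one tick per 30 data points
positions = np.arange(0, len(df), step)
labels = df["Time"].iloc[positions].dt.strftime("%H:%M:%S").tolist()

fig, ax = plt.subplots(figsize=(9, 5))
# x is the row position, so points are evenly spaced by count
ax.plot(np.arange(len(df)), df["Bid"].to_numpy())
# ticks sit at every 30th point and show that point's timestamp
ax.set_xticks(positions)
ax.set_xticklabels(labels, rotation=30, ha="right")
ax.grid(True)
```

With this layout the distance between two ticks always covers the same number of observations, while the labels show how much wall-clock time those observations span.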

I have tried to reproduce your issue, but I can't seem to. Can you have a look at this example and see how your situation differs?
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sbn
np.random.seed(0)
idx = pd.date_range('11:00', '21:30', freq='1min')
ser = pd.Series(data=np.random.randn(len(idx)), index=idx)
ser = ser.cumsum()
for i in range(20):
    for j in range(8):
        ser.iloc[10*i +j] = np.nan
fig, axes = plt.subplots(1, 2, figsize=(10, 5))
ser.plot(ax=axes[0])
ser.dropna().plot(ax=axes[1])
gives the following two plots:
There are a couple differences between the graphs. The one on the left doesn't connect the non-continuous bits of data. And it lacks vertical gridlines. But both seem to respect the actual index of the data. Can you show an example of your e series? What is the exact format of its index? Is it a datetime_index or is it just text?
Edit:
Playing with this, my guess is that your index is actually just text. If I continue from above with:
idx_str = [str(x) for x in idx]
newser = ser
newser.index = idx_str
fig, axes = plt.subplots(1, 2, figsize=(10, 5))
newser.plot(ax=axes[0])
newser.dropna().plot(ax=axes[1])
then I get something like your problem:
More edit:
If this is in fact your issue (the index is a bunch of strings, not really a bunch of timestamps) then you can convert them and all will be well:
idx_fixed = pd.to_datetime(idx_str)
fixedser = newser
fixedser.index = idx_fixed
fig, axes = plt.subplots(1, 2, figsize=(10, 5))
fixedser.plot(ax=axes[0])
fixedser.dropna().plot(ax=axes[1])
produces output identical to the first code sample above.
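A quick way to tell which case you are in is to inspect the index type before plotting. The snippet below is a minimal sketch with made-up data; the variable names are only for illustration.

```python
import pandas as pd

idx = pd.date_range("2014-03-05 11:00", periods=5, freq="1min")
# Build a series whose index holds the timestamps as plain strings
ser = pd.Series(range(5), index=[str(t) for t in idx])

# Inspect the index: a string index is not a DatetimeIndex
print(type(ser.index).__name__, ser.index.dtype)
is_datetime_before = isinstance(ser.index, pd.DatetimeIndex)

# Convert the strings to real timestamps and check again
ser.index = pd.to_datetime(ser.index)
is_datetime_after = isinstance(ser.index, pd.DatetimeIndex)
print(type(ser.index).__name__, ser.index.dtype)
```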
Editing again:
To see the uneven spacing of the data, you can do this:
fig, axes = plt.subplots(1, 2, figsize=(10, 5))
fixedser.plot(ax=axes[0], marker='.', linewidth=0)
fixedser.dropna().plot(ax=axes[1], marker='.', linewidth=0)

Let me try this one from scratch. Does this solve your issue?
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sbn
from urllib.request import urlopen  # Python 3 location of urlopen
content = urlopen('https://eu-ats.googlecode.com/svn/trunk/data/new/eur-fix.csv')
df = pd.read_csv(content, header=0, index_col='Time')
df.index = pd.to_datetime(df.index, format='%Y-%m-%d %H:%M:%S:%f')
df.plot()
The thing is, you want to plot bid vs time. If you've put the times into your index then they become your x-axis for "free". If the time data is just another column, then you need to specify that you want to plot bid as the y-axis variable and time as the x-axis variable. So in your code above, even when you convert the time data to be datetime type, you were never instructing pandas/matplotlib to use those datetimes as the x-axis.
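The difference can be seen side by side in a small sketch. The three rows below are taken from the sample data in the question (truncated to whole seconds); the axes names are hypothetical.

```python
import matplotlib
matplotlib.use("Agg")  # headless backend; drop this line for interactive use
import matplotlib.pyplot as plt
import pandas as pd

df = pd.DataFrame({
    "Time": pd.to_datetime(["2014-03-05 21:56:05",
                            "2014-03-05 21:56:06",
                            "2014-03-05 21:56:09"]),
    "Bid": [1.37275, 1.37272, 1.37275],
})

fig, (ax_default, ax_column, ax_index) = plt.subplots(1, 3, figsize=(12, 3))

# Time is just another column: x falls back to the row index 0, 1, 2
df["Bid"].plot(ax=ax_default)

# Tell pandas explicitly which column is x and which is y
df.plot(x="Time", y="Bid", ax=ax_column)

# Or move Time into the index so it becomes the x axis "for free"
df.set_index("Time")["Bid"].plot(ax=ax_index)
```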

Related

Seaborn plot showing two labels at the start and end of a month [duplicate]

I am trying to create a heat map from pandas dataframe using seaborn library. Here, is the code:
test_df = pd.DataFrame(np.random.randn(367, 5),
                       index=pd.date_range(start='01-01-2000', end='01-01-2001', freq='1D'))
ax = sns.heatmap(test_df.T)
ax.xaxis.set_major_locator(mdates.MonthLocator())
ax.xaxis.set_minor_locator(mdates.DayLocator())
ax.xaxis.set_major_formatter(mdates.DateFormatter('%b'))
ax.xaxis.set_minor_formatter(mdates.DateFormatter('%d'))
However, I am getting a figure with nothing printed on the x-axis.
Seaborn heatmap is a categorical plot. It scales from 0 to number of columns - 1, in this case from 0 to 366. The datetime locators and formatters expect values as dates (or more precisely, numbers that correspond to dates). For the year in question that would be numbers between 730120 (= 01-01-2000) and 730486 (= 01-01-2001).
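The mismatch between the two scales can be checked with `matplotlib.dates.date2num`. This is only a sketch: the absolute numbers depend on matplotlib's date epoch (which changed to 1970-01-01 in matplotlib 3.3), so they may not match the values quoted above, but the span of 366 days is the same either way.

```python
import datetime as dt
import matplotlib.dates as mdates

# Date numbers are "days since the epoch", nowhere near the 0..366 heatmap scale
start = mdates.date2num(dt.datetime(2000, 1, 1))
stop = mdates.date2num(dt.datetime(2001, 1, 1))
span_days = stop - start  # 2000 is a leap year

print(start, stop, span_days)
```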
So in order to be able to use matplotlib.dates formatters and locators, you would need to convert your dataframe index to datetime objects first. You can then not use a heatmap, but a plot that allows for numerical axes, e.g. an imshow plot. You may then set the extent of that imshow plot to correspond to the date range you want to show.
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import matplotlib.dates as mdates
df = pd.DataFrame(np.random.randn(367, 5),
                  index=pd.date_range(start='01-01-2000', end='01-01-2001', freq='1D'))
dates = df.index.to_pydatetime()
dnum = mdates.date2num(dates)
start = dnum[0] - (dnum[1]-dnum[0])/2.
stop = dnum[-1] + (dnum[1]-dnum[0])/2.
extent = [start, stop, -0.5, len(df.columns)-0.5]
fig, ax = plt.subplots()
im = ax.imshow(df.T.values, extent=extent, aspect="auto")
ax.xaxis.set_major_locator(mdates.MonthLocator())
ax.xaxis.set_minor_locator(mdates.DayLocator())
ax.xaxis.set_major_formatter(mdates.DateFormatter('%b'))
fig.colorbar(im)
plt.show()
I found this question when trying to do a similar thing and you can hack together a solution but it's not very pretty.
For example I get the current labels, loop over them to find the ones for January and set those to just the year, setting the rest to be blank.
This gives me year labels in the correct position.
xticklabels = ax.get_xticklabels()
for label in xticklabels:
    text = label.get_text()
    if text[5:7] == '01':
        label.set_text(text[0:4])
    else:
        label.set_text('')
ax.set_xticklabels(xticklabels)
Hopefully from that you can figure out what you want to do.

datetime x-axis matplotlib labels causing uncontrolled overlap

I'm trying to plot a pandas series with a 'pandas.tseries.index.DatetimeIndex'. The x-axis labels stubbornly overlap, and I cannot make them presentable, even with several suggested solutions.
I tried a Stack Overflow solution suggesting autofmt_xdate, but it doesn't help.
I also tried the suggestion to use plt.tight_layout(), which has no effect.
ax = test_df[(test_df.index.year ==2017) ]['error'].plot(kind="bar")
ax.figure.autofmt_xdate()
#plt.tight_layout()
print(type(test_df[(test_df.index.year ==2017) ]['error'].index))
UPDATE: That I'm using a bar chart is an issue. A regular time-series plot shows nicely-managed labels.
A pandas bar plot is a categorical plot. It shows one bar for each index entry at integer positions on the scale. Hence the first bar is at position 0, the next at 1, etc. The labels correspond to the dataframe's index. If you have 100 bars, you'll end up with 100 labels. This makes sense because pandas cannot know if those should be treated as categories or ordinal/numeric data.
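The integer positions can be inspected directly on the axes. The following is a minimal sketch with ten made-up values; it only reads back the tick positions and the number of labels pandas created.

```python
import matplotlib
matplotlib.use("Agg")  # headless backend; drop this line for interactive use
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd

idx = pd.date_range("2017-01-01", periods=10)
s = pd.Series(np.arange(10), index=idx, name="error")

fig, ax = plt.subplots()
s.plot(kind="bar", ax=ax)

# One tick (and one label) per bar, at integer positions rather than at dates
tick_positions = ax.get_xticks()
n_labels = len(ax.get_xticklabels())
print(tick_positions, n_labels)
```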
If instead you use a normal matplotlib bar plot, it will treat the dataframe index numerically. This means the bars have their position according to the actual dates and labels are placed according to the automatic ticker.
import pandas as pd
import numpy as np; np.random.seed(42)
import matplotlib.pyplot as plt
datelist = pd.date_range('2017-01-01', periods=42).tolist()
df = pd.DataFrame(np.cumsum(np.random.randn(42)),
                  columns=['error'], index=pd.to_datetime(datelist))
plt.bar(df.index, df["error"].values)
plt.gcf().autofmt_xdate()
plt.show()
An additional advantage is that matplotlib.dates locators and formatters can then be used, e.g. to label the first and fifteenth of each month with a custom format:
import pandas as pd
import numpy as np; np.random.seed(42)
import matplotlib.pyplot as plt
import matplotlib.dates as mdates
datelist = pd.date_range('2017-01-01', periods=93).tolist()
df = pd.DataFrame(np.cumsum(np.random.randn(93)),
                  columns=['error'], index=pd.to_datetime(datelist))
plt.bar(df.index, df["error"].values)
plt.gca().xaxis.set_major_locator(mdates.DayLocator((1,15)))
plt.gca().xaxis.set_major_formatter(mdates.DateFormatter("%d %b %Y"))
plt.gcf().autofmt_xdate()
plt.show()
In your situation, the easiest would be to manually create labels and spacing, and apply that using ax.xaxis.set_major_formatter.
Here's a possible solution:
Since no sample data was provided, I tried to mimic the structure of your dataset in a dataframe with some random numbers.
The setup:
# imports
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import matplotlib.dates as mdates
import matplotlib.ticker as ticker
# A dataframe with random numbers to run tests on
np.random.seed(123456)
rows = 100
df = pd.DataFrame(np.random.randint(-10,10,size=(rows, 1)), columns=['error'])
datelist = pd.date_range('2017-01-01', periods=rows).tolist()
df['dates'] = datelist
df = df.set_index(['dates'])
df.index = pd.to_datetime(df.index)
test_df = df.copy(deep = True)
# Plot of data that mimics the structure of your dataset
# (figsize is passed to plot so that it applies to this figure)
ax = test_df[(test_df.index.year ==2017) ]['error'].plot(kind="bar", figsize=(15,8))
ax.figure.autofmt_xdate()
A possible solution:
test_df = df.copy(deep = True)
# figsize is passed to plot so that it applies to this figure
ax = test_df[(test_df.index.year ==2017) ]['error'].plot(kind="bar", figsize=(15,8))
# Make a list of empty myLabels
myLabels = ['']*len(test_df.index)
# Set labels on every 20th element in myLabels
myLabels[::20] = [item.strftime('%Y - %m') for item in test_df.index[::20]]
ax.xaxis.set_major_formatter(ticker.FixedFormatter(myLabels))
plt.gcf().autofmt_xdate()
# Tilt the labels
plt.setp(ax.get_xticklabels(), rotation=30, fontsize=10)
plt.show()
You can easily change the formatting of labels by checking strftime.org

Date removed from x axis on overlaid plots matplotlib

I am trying to show time series lines representing an effort amount using matplotlib and pandas.
I've got all my DataFrames to overlay in one plot; however, when I do, the x axis loses the dates and shows some numbers instead. (I'm not sure where these come from, but at a guess, not all days contain the same data, so it has reverted to using an index id number.) If I plot any one of them on its own, the dates appear on the x axis.
Any hints or solutions to make the x axis show date for the multiple plot would be much appreciated.
This is the single figure plot with time axis:
Code I'm using to plot is
fig = pl.figure()
ax = fig.add_subplot(111)
ax.plot(b342,color='black')
ax.plot(b343,color='blue')
ax.plot(b344,color='red')
ax.plot(b345,color='green')
ax.plot(b346,color='pink')
ax.plot(fi,color='yellow')
plt.show()
This is the multiple plot fig with weird x axis:
One option would be to manually specify the x-axis based on the DataFrame index, and then plot directly using matplotlib.
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
# make up some data
n = 100
dates = pd.date_range(start = "2015-01-01", periods = n, name = "yearDate")
dfs = []
for i in range(3):
    df = pd.DataFrame(data = np.random.random(n)*(i + 1), index = dates,
                      columns = ["FishEffort"] )
    df.df_name = str(i)
    dfs.append(df)
# plot it directly using matplotlib instead of through the DataFrame
fig = plt.figure()
ax = fig.add_subplot()
for df in dfs:
    plt.plot(df.index,df["FishEffort"], label = df.df_name)
plt.legend()
plt.show()
Another option would be to concatenate your DataFrames and plot using Pandas. If you give your "FishEffort" field the correct label name when loading the data or via DataFrame.rename then the labels will be specified automatically.
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
n = 100
dates = pd.date_range(start = "2015-01-01", periods = n, name = "yearDate")
dfs = []
for i in range(3):
    df = pd.DataFrame(data = np.random.random(n)*(i + 1), index = dates,
                      columns = ["DataFrame #" + str(i) ] )
    df.df_name = str(i)
    dfs.append(df)
df = pd.concat(dfs, axis = 1)
df.plot()
I've found an answer that does what I want. It seems that calling plt.plot wasn't using the dates as the x axis; plotting through pandas, as shown in the pandas documentation, did the trick.
ax = b342.plot(label='342')
b343.plot(ax=ax, label='test')
b344.plot(ax=ax)
b345.plot(ax=ax)
b346.plot(ax=ax)
fi.plot(ax=ax)
plt.show()
I was wondering if anyone knew how to change the labels here?
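One way to set the labels, sketched here under the assumption that each object is a pandas Series with a datetime index, is to pass `label=` to each `plot` call and then ask the axes for a legend. The data is random and the series names only mirror those in the question.

```python
import matplotlib
matplotlib.use("Agg")  # headless backend; drop this line for interactive use
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd

# Made-up stand-ins for the series in the question
dates = pd.date_range("2015-01-01", periods=50)
rng = np.random.default_rng(1)
b342 = pd.Series(rng.random(50), index=dates)
b343 = pd.Series(rng.random(50) * 2, index=dates)

fig, ax = plt.subplots()
# Each call draws on the same axes and carries its own legend label
b342.plot(ax=ax, label="342", color="black")
b343.plot(ax=ax, label="343", color="blue")
ax.legend()

legend_labels = [t.get_text() for t in ax.get_legend().get_texts()]
print(legend_labels)
```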

pandas: plot multiple columns of a timeseries with labels (example from pandas documentation)

I'm trying to recreate the example of plot multiple columns of a timeseries with labels as shown in the pandas documentation here: http://pandas.pydata.org/pandas-docs/dev/visualization.html#visualization-basic (second graph)
Here's my code:
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd
ts = pd.DataFrame(np.random.randn(1000, 4), index=pd.date_range('1/1/2000', periods=1000), columns=list('ABCD'))
ts = ts.cumsum()
fig = plt.figure()
print(ts.head())
ts.plot()
fig.savefig("x.png")
the text output seems ok:
A B C D
2000-01-01 1.547838 -0.571000 -1.780852 0.559283
2000-01-02 1.165659 -1.859979 -0.490980 0.796502
2000-01-03 0.786416 -2.543543 -0.903669 1.117328
2000-01-04 1.640174 -3.756809 -1.862188 0.466236
2000-01-05 2.119575 -4.590741 -1.055563 1.004607
but x.png is always empty.
If I plot just one column:
ts['A'].plot()
I do get a result.
Is there a way to debug this, to find out what's going wrong here?
The reason you don't get a result is because you are not saving the 'correct' figure: you are making a figure with plt.figure(), but pandas does not plot on the current figure, and will create a new one.
If you do:
ts.plot()
fig = plt.gcf() # get current figure
fig.savefig("x.png")
I get the correct output. When plotting a Series, it does use the current axis if no axis is passed.
But it seems that the pandas docs are not fully correct on that account (as they use the plt.figure()), I reported an issue for that: https://github.com/pydata/pandas/issues/8776
Another option is to provide an axes object using the ax argument:
fig = plt.figure()
ts.plot(ax=plt.gca()) # 'get current axis'
fig.savefig("x.png")
or slightly cleaner (IMO):
fig, ax = plt.subplots()
ts.plot(ax=ax)
fig.savefig("x.png")
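As for debugging this kind of problem, one option is to check what a figure actually contains before saving it. The helper `describe_figure` below is hypothetical (it is not part of pandas or matplotlib); it just counts the axes and lines on a figure, so an untouched figure is easy to spot.

```python
import io
import matplotlib
matplotlib.use("Agg")  # headless backend; drop this line for interactive use
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd

def describe_figure(fig):
    """Return (number of axes, total number of lines) found on a figure."""
    return len(fig.axes), sum(len(ax.lines) for ax in fig.axes)

ts = pd.DataFrame(np.random.randn(100, 4),
                  index=pd.date_range("1/1/2000", periods=100),
                  columns=list("ABCD")).cumsum()

empty_fig = plt.figure()   # created, but nothing is ever drawn on it
fig, ax = plt.subplots()
ts.plot(ax=ax)             # drawn on the axes we pass in

empty_summary = describe_figure(empty_fig)
drawn_summary = describe_figure(fig)
print(empty_summary, drawn_summary)

# Save the figure that holds the data (to memory here, instead of x.png)
buf = io.BytesIO()
fig.savefig(buf, format="png")
```

If the figure you are about to save reports zero axes, the plot went to a different figure.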

How to make X axis in matplotlib/pylab to NOT sort automatically the values?

Whenever I plot, the X axis is sorted automatically (for example, if I enter the values 3, 2, 4, the X axis is ordered from smaller to larger).
How can I make the axis keep the order in which I input the values, i.e. 3, 2, 4?
import pylab as pl
import numpy as np
data = np.genfromtxt('myfile.dat')
pl.axis('auto')
pl.plot(data[:,1], data[:,0])
I found one function, set_autoscalex_on(FALSE) but I'm not sure how to use it or whether it is what I want.
Thanks
You could provide a dummy x-range, and then override the xtick labels. I do agree with the comments above questioning whether it's the best solution, but that's hard to judge without any context.
If you really want to, this might be an option:
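import numpy as np
import matplotlib.pyplot as plt
fig, ax = plt.subplots(1,2, figsize=(10,4))
x = [2,4,3,6,1,7]
y = [1,2,3,4,5,6]
ax[0].plot(x, y)
ax[1].plot(np.arange(len(x)), y)
# fix the tick positions first so each label lines up with its point
ax[1].set_xticks(np.arange(len(x)))
ax[1].set_xticklabels(x)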
edit: If you work with dates, why not plot the real dates on the axis (and perhaps format them by day-of-month if you do want 29 30 1 2 etc. on the axis)?
Maybe you want to set the xticks:
import pylab as pl
import numpy as np
data = np.genfromtxt('myfile.dat')
pl.axis('auto')
xs = pl.arange(data.shape[0])
pl.plot(xs, data[:,0])
pl.xticks(xs, data[:,1])
Working sample:
Another option would be to work with datetimes. If you work with dates, you can use those as input to the plot command.
Working sample:
import random
import pylab as plt
import datetime
from matplotlib.dates import DateFormatter, DayLocator
fig, ax = plt.subplots(2,1, figsize=(6,8))
# Sample 1: use xticks
days = [29,30,31,1,2,3,4,5]
values = [random.random() for x in days]
xs = range(len(days))
plt.sca(ax[0])  # make the first subplot the current axes
plt.plot(xs, values)
plt.xticks(xs, days)
# Sample 2: Work with dates
date_strings = ["2013-01-30",
"2013-01-31",
"2013-02-01",
"2013-02-02",
"2013-02-03"]
dates = [datetime.datetime.strptime(x, "%Y-%m-%d") for x in date_strings]
values = [random.random() for x in dates]
plt.sca(ax[1])  # make the second subplot the current axes
plt.plot(dates,values)
ax[1].xaxis.set_major_formatter(DateFormatter("%b %d"))
ax[1].xaxis.set_major_locator(DayLocator())
plt.show()
