Date removed from x axis on overlaid plots matplotlib - python

I am trying to show time series lines representing an effort amount using matplotlib and pandas.
I've got all my DataFrames to overlay in one plot; however, when I do, Python seems to strip the dates from the x axis and substitute some numbers. (I'm not sure where these come from, but at a guess, not all days contain the same data, so Python has reverted to using an index id number.) If I plot any one of them on its own, the dates show up on the x axis.
Any hints or solutions to make the x axis show date for the multiple plot would be much appreciated.
This is the single figure plot with time axis:
Code I'm using to plot is
fig = plt.figure()
ax = fig.add_subplot(111)
ax.plot(b342,color='black')
ax.plot(b343,color='blue')
ax.plot(b344,color='red')
ax.plot(b345,color='green')
ax.plot(b346,color='pink')
ax.plot(fi,color='yellow')
plt.show()
This is the multiple plot fig with weird x axis:

One option would be to manually specify the x-axis based on the DataFrame index, and then plot directly using matplotlib.
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
# make up some data
n = 100
dates = pd.date_range(start = "2015-01-01", periods = n, name = "yearDate")
dfs = []
for i in range(3):
    df = pd.DataFrame(data = np.random.random(n)*(i + 1), index = dates,
                      columns = ["FishEffort"] )
    df.df_name = str(i)
    dfs.append(df)
# plot it directly using matplotlib instead of through the DataFrame
fig = plt.figure()
ax = fig.add_subplot()
for df in dfs:
    plt.plot(df.index,df["FishEffort"], label = df.df_name)
plt.legend()
plt.show()
Another option would be to concatenate your DataFrames and plot using Pandas. If you give your "FishEffort" field the correct label name when loading the data or via DataFrame.rename then the labels will be specified automatically.
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
n = 100
dates = pd.date_range(start = "2015-01-01", periods = n, name = "yearDate")
dfs = []
for i in range(3):
    df = pd.DataFrame(data = np.random.random(n)*(i + 1), index = dates,
                      columns = ["DataFrame #" + str(i) ] )
    df.df_name = str(i)
    dfs.append(df)
df = pd.concat(dfs, axis = 1)
df.plot()

I've found an answer that does what I want. It seems that calling plt.plot wasn't using the date as the x axis, whereas plotting through pandas, as shown in the pandas documentation, did the trick.
ax = b342.plot(label='342')
b343.plot(ax=ax, label='test')
b344.plot(ax=ax)
b345.plot(ax=ax)
b346.plot(ax=ax)
fi.plot(ax=ax)
plt.show()
I was wondering if anyone knew how to change the labels here?
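On the label question, here is a minimal sketch, assuming the objects are pandas Series with a date index. The names effort_342 and effort_343 and the random data are made up for illustration. Passing label= to each plot call and then calling ax.legend() should give one legend entry per line. If the objects are DataFrames instead, the legend entries come from the column names, so renaming the columns is the more reliable route.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend so the sketch runs anywhere
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd

# Hypothetical stand-ins for b342, b343, ...
dates = pd.date_range("2015-01-01", periods=30, name="yearDate")
effort_342 = pd.Series(np.random.random(30), index=dates)
effort_343 = pd.Series(np.random.random(30) * 2, index=dates)

# The first call creates the axes; later calls reuse them via ax=ax
ax = effort_342.plot(label="342")
effort_343.plot(ax=ax, label="343")

# Build the legend from the labels attached to each line
ax.legend()
legend_labels = [text.get_text() for text in ax.get_legend().get_texts()]
```

In an interactive session, follow this with plt.show() as in the code above.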

Related

Seaborn showing wrong y-axis values

The dataframe I created is as follows:
import pandas as pd
import numpy as np
import seaborn as sns
date = pd.date_range('2003-01-01', '2022-11-01', freq='MS').strftime('%Y-%m-%d').tolist()
mom = [np.nan] + list(np.repeat([0.01], 238))
cpi = [100] + list(np.repeat([np.nan], 238))
df = pd.DataFrame(list(zip(date, mom, cpi)), columns=['date','mom','cpi'])
df['date'] = pd.to_datetime(df['date'])
for i in range(1, len(df)):
    # use .loc rather than chained indexing so the assignment reliably writes to df
    df.loc[i, 'cpi'] = df.loc[i - 1, 'cpi'] * (1 + df.loc[i, 'mom'])
df['yoy'] = df['cpi'].pct_change(periods=12)
Y-axis values not displaying correctly as can be seen below.
sns.lineplot(
    x = 'date',
    y = 'yoy',
    data = df
)
I think the percentage changes I calculated for the yoy column are the cause of the issue, because there are no issues if I fill in the yoy column manually.
Thanks in advance.
You can use matplotlib to set the axis scaling, as the difference is really subtle in your data:
import matplotlib.pyplot as plt
ax = plt.gca()
ax.set_ylim([df.yoy.min(numeric_only=True), df.yoy.max(numeric_only=True)])
sns.lineplot(
    x = 'date',
    y = 'yoy',
    data = df,
    ax = ax
)
With this the result should be more of a stepping function.
You can use something like the maximum deviation from the mean times 1.01 to set the limits a little better, but this is the idea. You can also set the axis ticks using ax.set_yticks(ticks=<list of ticks>).
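As a rough sketch of that last suggestion: the series below is made up to mimic values that differ only far behind the decimal point, and the names center and half_range are just for illustration. The limits are centred on the mean and padded by 1%, and five evenly spaced ticks are placed between them.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend so the sketch runs anywhere
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd

# Made-up stand-in for df['yoy']: nearly constant values with tiny differences
yoy = pd.Series(0.1268 + np.linspace(0, 1e-9, 50))

# Centre the axis on the mean and pad the largest deviation by 1%
center = yoy.mean()
half_range = (yoy - center).abs().max() * 1.01

fig, ax = plt.subplots()
ax.plot(yoy.index, yoy.to_numpy())
ax.set_ylim(center - half_range, center + half_range)

# Explicit ticks, evenly spaced between the two limits
ax.set_yticks(np.linspace(center - half_range, center + half_range, 5))
y_low, y_high = ax.get_ylim()
```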

How to filter on index while creating a Matplotlib graph?

I’m trying to create a matplotlib graph by filtering the index column, which in my data frame is a date time column.
Here are the steps I follow to create the unfiltered graph.
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
df = pd.DataFrame([1,2,3,4,5], index = ['01/01/2021','01/02/2021','01/03/2021','01/04/2021','01/05/2021'], columns = ['quantity'])
fig, axes = plt.subplots()
axes.scatter(data = df, x = df.index, y = 'quantity')
This works as expected. In order to avoid creating a data frame containing filtered data, I was trying to create the filtered graph in one line, basically doing something like this.
fig, axes = plt.subplots()
axes.scatter(data = df[df.index < '01/04/2021'], x = df.index, y = 'quantity')
This obviously doesn’t work because x and y are not the same size.
One workaround is to create a new column in the df, which is an exact copy of the index, but I was wondering if there was an easier and cleaner solution that is escaping my mind.
You are absolutely right. There are two ways:
Filtering the x values the same way as the data:
fig, axes = plt.subplots()
axes.scatter(data = df[df.index < '01/04/2021'], x = df[df.index < '01/04/2021'].index, y = 'quantity')
Restricting one axis of the plot:
fig, axes = plt.subplots()
axes.scatter(data = df, x = df.index, y = 'quantity')
# Axes has no xlim method; set_xlim takes the lower and upper limit
visible = df.index[df.index < '01/04/2021']
axes.set_xlim(visible[0], visible[-1])
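A variation on the first approach, as a sketch only: filter once into a temporary variable and reuse it, so the condition is not written twice. This version also converts the index to real datetimes, since comparing the strings as above is lexicographic rather than chronological. The month/day/year format is an assumption about the data.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend so the sketch runs anywhere
import matplotlib.pyplot as plt
import pandas as pd

# Same data as in the question, but with a DatetimeIndex
# (the '%m/%d/%Y' format is an assumption)
df = pd.DataFrame(
    {"quantity": [1, 2, 3, 4, 5]},
    index=pd.to_datetime(
        ["01/01/2021", "01/02/2021", "01/03/2021", "01/04/2021", "01/05/2021"],
        format="%m/%d/%Y",
    ),
)

# Filter once; x and y then come from the same object and always match in size
subset = df[df.index < pd.Timestamp("2021-01-04")]

fig, axes = plt.subplots()
points = axes.scatter(subset.index, subset["quantity"])
n_points = len(points.get_offsets())
```

The temporary subset is a small filtered object, not a new column on df, so the original frame stays untouched.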

How to create a Boxplot with Timestamp using Matplotlib and Seaborn?

I have been trying to get a boxplot with each box representing an emotion over a period of time.
The data frame used to plot this contains a timestamp and an emotion name. I have tried converting the timestamp into a string first, then to datetime, and finally to int64. This resulted in the gaps between x labels seen in the plot. I have tried the same without converting to int64, but matplotlib doesn't seem to allow dates in the plot.
I'm attaching the code I have used here:
import matplotlib as mpl
import matplotlib.pyplot as plt
plt.style.use('classic')
%matplotlib qt
import pandas as pd
import numpy as np
from datetime import datetime
import seaborn as sns
data = pd.read_csv("TX-governor-sentiment.csv")
## check data types
data.dtypes
# drop rows with all missing values
data = data.dropna(how='all')
## transforming the timestamp column
#convert from obj type to string then to date type
data['timestamp2'] = data['timestamp']
data['timestamp2'] = pd.to_datetime(data['timestamp2'].astype(str), format='%m/%d/%Y %H:%M')
# convert to number format with the following logic:
# yyyymmddhourmin --> this allows us to treat dates as a continuous variable
data['timestamp2'] = data['timestamp2'].dt.strftime('%Y%m%d%H%M')
data['timestamp2'] = data['timestamp2'].astype('int64')
print (data[['timestamp','timestamp2']])
#data transformation for data from Orange
df = pd.DataFrame(columns=('timestamp', 'emotion'))
for index, row in data.iterrows():
    if row['sentiment'] == 0:
        df.loc[index] = [row['timestamp2'], 'Neutral']
    else:
        df.loc[index] = [row['timestamp2'], row['Emotion']]
# Plot using Seaborn & Matplotlib
#convert timestamp in case it's not in number format
df['timestamp'] = df['timestamp'].astype('int64')
fig = plt.figure(figsize=(10,10))
#colors = {"Neutral": "grey", "Joy": "pink", "Surprise":"blue"}
#visualize as boxplot
plot_ = sns.boxplot(x="timestamp", y="emotion", data=df, width=0.5,whis=np.inf);
#add data point on top
plot_ = sns.stripplot(x="timestamp", y="emotion", data=df, alpha=0.8, color="black");
fig.canvas.draw()
#modify ticks and labels
plt.xlim([202003010000,202004120000])
plt.xticks([202003010000, 202003150000, 202003290000, 202004120000], ['2020/03/01', '2020/03/15', '2020/03/29', '2020/04/12'])
#add colors
for patch in plot_.artists:
    r, g, b, a = patch.get_facecolor()
    patch.set_facecolor((r, g, b, .3))
Please let me know how I can overcome this problem of gaps in the boxplot. Thank you!
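One likely cause of the gaps is the yyyymmddhhmm encoding itself. It is not continuous (202003312359 is followed by 202004010000), so equal time spans do not map to equal distances on the axis. Below is a rough sketch of one alternative, not a tested fix for the original CSV: keep real datetimes and convert them with matplotlib.dates.date2num, which gives days as floats. The data, the emotion names and the plain matplotlib boxplot (instead of seaborn) are all assumptions made for illustration.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend so the sketch runs anywhere
import matplotlib.dates as mdates
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd

# Made-up stand-in for the CSV: random timestamps over six weeks
rng = np.random.default_rng(0)
minutes = rng.integers(0, 42 * 24 * 60, size=60)
df = pd.DataFrame({
    "timestamp": pd.Timestamp("2020-03-01") + pd.to_timedelta(minutes, unit="m"),
    "emotion": rng.choice(["Neutral", "Joy", "Surprise"], size=60),
})

# date2num returns days as floats, so one day is always one unit on the axis
df["date_num"] = mdates.date2num(df["timestamp"].to_numpy())

emotions = sorted(df["emotion"].unique())
groups = [df.loc[df["emotion"] == e, "date_num"].to_numpy() for e in emotions]

fig, ax = plt.subplots(figsize=(10, 4))
# whis=(0, 100) stretches the whiskers to the min and max of each group
ax.boxplot(groups, vert=False, whis=(0, 100))
ax.set_yticks(range(1, len(emotions) + 1))
ax.set_yticklabels(emotions)

# Show the numeric x positions as dates again
ax.xaxis.set_major_locator(mdates.AutoDateLocator())
ax.xaxis.set_major_formatter(mdates.DateFormatter("%Y/%m/%d"))

one_day = (mdates.date2num(np.datetime64("2020-04-01"))
           - mdates.date2num(np.datetime64("2020-03-31")))
```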

matplotlib update plot in while-loop with dates as x-axis

This might be obvious, so sorry in advance for this nooby question. I want to update a time series dynamically with matplotlib.pyplot. More precisely, I want to plot newly generated data in a while-loop.
This is my attempt so far:
import numpy as np
import matplotlib.pyplot as plt; plt.ion()
import pandas as pd
import time
n = 100
x = np.NaN
y = np.NaN
df = pd.DataFrame(dict(time=x, value=y), index=np.arange(n)) # not necessarily needed to have a pandas df here, but I like working with it.
# initialise plot and line
line, = plt.plot(df['time'], df['value'])
i=0
# simulate line drawing
while i <= len(df):
    #generate random data point
    newData = np.random.rand()
    # extend the data frame by this data point and attach the current time as index
    df.loc[i, "value"] = newData
    df.loc[i, "time"] = pd.datetime.now()
    # plot values against indices
    line.set_data(df['time'][:i], df['value'][:i])
    plt.draw()
    plt.pause(0.001)
    # add to iteration counter
    i += 1
    print(i)
This returns TypeError: float() argument must be a string or a number, not 'datetime.datetime'. But as far as I can remember, matplotlib doesn't have any problems with plotting dates on the x-axis (?).
Many thanks.
As Andras Deak pointed out, you should tell pandas explicitly that your time column is datetime. When you do df.info() at the end of your code, you will see that it takes df['time'] as float64. You can convert it with df['time'] = pd.to_datetime(df['time']).
I was able to make your code run, but I had to add a few lines of code. I was running it in an IPython (Jupyter) console, and without the two calls to relim and autoscale_view it was not updating the plot correctly. What's left to do is a nice formatting of the x-axis labels.
import numpy as np
import matplotlib.pyplot as plt; plt.ion()
import pandas as pd
import time
n = 100
x = np.nan
y = np.nan
df = pd.DataFrame(dict(time=x, value=y), index=np.arange(n)) # not necessarily needed to have a pandas df here, but I like working with it.
df['time'] = pd.to_datetime(df['time']) # format 'time' as datetime object
# initialise plot and line
fig = plt.figure()
axes = fig.add_subplot(111)
line, = axes.plot(df['time'], df['value'])
i = 0
# simulate line drawing
while i < len(df):
    # generate random data point
    newData = np.random.rand()
    # store this data point in row i together with the current time
    df.loc[i, "value"] = newData
    df.loc[i, "time"] = pd.Timestamp.now()  # pd.datetime was removed from recent pandas versions
    # plot the rows filled so far; relim recomputes the data limits, autoscale_view applies them
    line.set_data(df['time'].iloc[:i + 1], df['value'].iloc[:i + 1])
    axes.relim()
    axes.autoscale_view(True, True, True)
    plt.draw()
    plt.pause(0.01)
    # add to iteration counter
    i += 1
    print(i)
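For the x-axis label formatting that is still left to do, one option is a DateFormatter from matplotlib.dates together with fig.autofmt_xdate(). This is only a sketch with made-up timestamps, and the "%H:%M:%S" format is an assumption about what suits data arriving a few milliseconds apart.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend so the sketch runs anywhere
import matplotlib.dates as mdates
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd

# Made-up data: one value per second
times = pd.Timestamp("2024-01-01 12:00:00") + pd.to_timedelta(np.arange(20), unit="s")
values = np.random.rand(20)

fig, axes = plt.subplots()
axes.plot(times, values)

# Show only hour:minute:second on the tick labels
axes.xaxis.set_major_formatter(mdates.DateFormatter("%H:%M:%S"))
# Rotate and right-align the labels so they do not overlap
fig.autofmt_xdate()

formatter = axes.xaxis.get_major_formatter()
```

In the loop above, the formatter only needs to be set once, before the loop starts.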

python pandas plot with uneven timeseries index (with count evenly distributed)

My DataFrame has an unevenly spaced time index.
How can I plot the data and have the tick positions located automatically? I searched here, and I know I can plot with something like
e.plot()
but then the x axis is spaced at even time intervals, for example every 5 minutes.
If I have 100 data points in the first 5 minutes and 6 in the second 5 minutes, how do I plot them evenly spaced by count, with the right timestamps located on the x axis?
This plots by even count, but I don't know how to add the time index:
plot(e['Bid'].values)
example of data format as requested
Time,Bid
2014-03-05 21:56:05:924300,1.37275
2014-03-05 21:56:05:924351,1.37272
2014-03-05 21:56:06:421906,1.37275
2014-03-05 21:56:06:421950,1.37272
2014-03-05 21:56:06:920539,1.37275
2014-03-05 21:56:06:920580,1.37272
2014-03-05 21:56:09:071981,1.37275
2014-03-05 21:56:09:072019,1.37272
and here's the link
http://code.google.com/p/eu-ats/source/browse/trunk/data/new/eur-fix.csv
here's the code, I used to plot
import numpy as np
import pandas as pd
import datetime as dt
e = pd.read_csv("data/ecb/eur.csv", dtype={'Time':object})
e.Time = pd.to_datetime(e.Time, format='%Y-%m-%d %H:%M:%S:%f')
e.plot()
f = e.copy()
f.index = f.Time
x = [str(s)[:-7] for s in f.index]
ff = f.set_index(pd.Series(x))
ff.index.name = 'Time'
ff.plot()
Update:
I added two new plots for comparison to clarify the issue. I also tried brute force: converting the timestamp index back to strings and plotting the strings as the x axis. The format easily gets messed up, and it seems hard to customize the location of the x labels.
Ok, it seems like what you're after is that you want to move around the x-tick locations so that there are an equal number of points between each tick. And you'd like to have the grid drawn on these appropriately-located ticks. Do I have that right?
If so:
import pandas as pd
import urllib.request
import matplotlib.pyplot as plt
import seaborn as sbn
# urllib.urlopen is Python 2 only; Python 3 uses urllib.request.urlopen
content = urllib.request.urlopen('https://eu-ats.googlecode.com/svn/trunk/data/new/eur-fix.csv')
df = pd.read_csv(content, header=0)
df['Time'] = pd.to_datetime(df['Time'], format='%Y-%m-%d %H:%M:%S:%f')
every30 = df.loc[df.index % 30 == 0, 'Time'].values
fig, ax = plt.subplots(1, 1, figsize=(9, 5))
df.plot(x='Time', y='Bid', ax=ax)
ax.set_xticks(every30)
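If the goal is instead an x axis spaced by observation count, with timestamps only as labels, a sketch along these lines may help. It plots against the row position and then labels every step-th position with its timestamp. The inline CSV reuses the sample rows from the question, and step = 2 is chosen only because the sample is tiny.

```python
import io

import matplotlib
matplotlib.use("Agg")  # non-interactive backend so the sketch runs anywhere
import matplotlib.pyplot as plt
import pandas as pd

csv_text = """Time,Bid
2014-03-05 21:56:05:924300,1.37275
2014-03-05 21:56:05:924351,1.37272
2014-03-05 21:56:06:421906,1.37275
2014-03-05 21:56:06:421950,1.37272
2014-03-05 21:56:06:920539,1.37275
2014-03-05 21:56:06:920580,1.37272
2014-03-05 21:56:09:071981,1.37275
2014-03-05 21:56:09:072019,1.37272
"""
df = pd.read_csv(io.StringIO(csv_text))
df["Time"] = pd.to_datetime(df["Time"], format="%Y-%m-%d %H:%M:%S:%f")

fig, ax = plt.subplots(figsize=(9, 5))
# x is the row position, so every observation takes the same horizontal space
ax.plot(range(len(df)), df["Bid"].to_numpy())

# Label every step-th observation with its own timestamp
step = 2
positions = list(range(0, len(df), step))
tick_labels = df["Time"].iloc[positions].dt.strftime("%H:%M:%S.%f").tolist()
ax.set_xticks(positions)
ax.set_xticklabels(tick_labels, rotation=45, ha="right")
ax.grid(True, axis="x")
```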
I have tried to reproduce your issue, but I can't seem to. Can you have a look at this example and see how your situation differs?
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sbn
np.random.seed(0)
idx = pd.date_range('11:00', '21:30', freq='1min')
ser = pd.Series(data=np.random.randn(len(idx)), index=idx)
ser = ser.cumsum()
for i in range(20):
    for j in range(8):
        ser.iloc[10*i +j] = np.nan
fig, axes = plt.subplots(1, 2, figsize=(10, 5))
ser.plot(ax=axes[0])
ser.dropna().plot(ax=axes[1])
gives the following two plots:
There are a couple of differences between the graphs. The one on the left doesn't connect the non-continuous bits of data, and it lacks vertical gridlines. But both seem to respect the actual index of the data. Can you show an example of your e series? What is the exact format of its index? Is it a DatetimeIndex or is it just text?
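A quick way to answer that last question is to inspect the index directly. The two small series here are made up. The first has a real DatetimeIndex and the second has the same timestamps stored as strings.

```python
import pandas as pd

idx = pd.date_range("2014-03-05 11:00", periods=3, freq="min")
as_dates = pd.Series([1.0, 2.0, 3.0], index=idx)
as_text = pd.Series([1.0, 2.0, 3.0], index=[str(x) for x in idx])

# The index class and dtype show whether pandas sees dates or plain text
print(type(as_dates.index).__name__, as_dates.index.dtype)
print(type(as_text.index).__name__, as_text.index.dtype)

is_dates = isinstance(as_dates.index, pd.DatetimeIndex)
is_text = not isinstance(as_text.index, pd.DatetimeIndex)
```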
Edit:
Playing with this, my guess is that your index is actually just text. If I continue from above with:
idx_str = [str(x) for x in idx]
newser = ser.copy()  # copy, so that changing the index does not also change ser
newser.index = idx_str
fig, axes = plt.subplots(1, 2, figsize=(10, 5))
newser.plot(ax=axes[0])
newser.dropna().plot(ax=axes[1])
then I get something like your problem:
More edit:
If this is in fact your issue (the index is a bunch of strings, not really a bunch of timestamps) then you can convert them and all will be well:
idx_fixed = pd.to_datetime(idx_str)
fixedser = newser.copy()  # copy, so that newser keeps its string index
fixedser.index = idx_fixed
fig, axes = plt.subplots(1, 2, figsize=(10, 5))
fixedser.plot(ax=axes[0])
fixedser.dropna().plot(ax=axes[1])
produces output identical to the first code sample above.
Editing again:
To see the uneven spacing of the data, you can do this:
fig, axes = plt.subplots(1, 2, figsize=(10, 5))
fixedser.plot(ax=axes[0], marker='.', linewidth=0)
fixedser.dropna().plot(ax=axes[1], marker='.', linewidth=0)
Let me try this one from scratch. Does this solve your issue?
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sbn
import urllib.request
# urllib.urlopen is Python 2 only; Python 3 uses urllib.request.urlopen
content = urllib.request.urlopen('https://eu-ats.googlecode.com/svn/trunk/data/new/eur-fix.csv')
df = pd.read_csv(content, header=0, index_col='Time')
df.index = pd.to_datetime(df.index, format='%Y-%m-%d %H:%M:%S:%f')
df.plot()
The thing is, you want to plot bid vs time. If you've put the times into your index then they become your x-axis for "free". If the time data is just another column, then you need to specify that you want to plot bid as the y-axis variable and time as the x-axis variable. So in your code above, even when you convert the time data to be datetime type, you were never instructing pandas/matplotlib to use those datetimes as the x-axis.
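The two options in that last paragraph can be put side by side. This is a sketch on three rows taken from the sample data in the question, so it runs without the download.

```python
import io

import matplotlib
matplotlib.use("Agg")  # non-interactive backend so the sketch runs anywhere
import matplotlib.pyplot as plt
import pandas as pd

csv_text = """Time,Bid
2014-03-05 21:56:05:924300,1.37275
2014-03-05 21:56:06:421950,1.37272
2014-03-05 21:56:09:071981,1.37275
"""
e = pd.read_csv(io.StringIO(csv_text))
e["Time"] = pd.to_datetime(e["Time"], format="%Y-%m-%d %H:%M:%S:%f")

fig, (left, right) = plt.subplots(1, 2, figsize=(10, 4))

# Option 1: Time stays a column, so name x and y explicitly
e.plot(x="Time", y="Bid", ax=left)

# Option 2: move Time into the index; it then becomes the x axis automatically
by_time = e.set_index("Time")
by_time.plot(ax=right)
```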
