python matplotlib plot datetime index

I am trying to create a simple line graph based on a datetime index. But I get an error message.
#standard packages
import numpy as np
import pandas as pd
#visualization
%matplotlib inline
import matplotlib.pyplot as plt
#create weekly datetime index
edf = pd.read_csv(r'C:\Users\j~\raw.csv', parse_dates=[6])  # raw string so the backslashes are not read as escape sequences
edf2 = edf[['DATESENT','Sales','Traffic']].copy()
edf2['DATESENT']=pd.to_datetime(edf2['DATESENT'],format='%m/%d/%Y')
edf2 = edf2.set_index('DATESENT')  # moves DATESENT out of the columns and into the index
edf3 = edf2.resample('W').sum()  # resample returns a new frame, so assign the result
edf3
#output
            SALES
DATESENT
2014-01-05    476
2014-01-12  67876
Then I try to plot it (the simplest possible line plot, to see sales by week):
#linegraph
edf3.plot(x='DATESENT',y='Sales')
But I get this error message:
KeyError: 'DATESENT'

You're getting a KeyError because 'DATESENT' is the index of edf3, NOT a column, and x= only accepts a column label or position. DataFrame.plot already uses the index as the x-axis by default, so you can simply leave x out:
#linegraph
edf3.plot(y='Sales')
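A self-contained sketch of the whole flow, assuming made-up daily data in place of raw.csv (the date range and values below are invented for illustration). It shows both ways out: plotting straight from the DatetimeIndex, or calling reset_index() so that DATESENT becomes a column again and x='DATESENT' works.

```python
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd

# Made-up daily data standing in for raw.csv
dates = pd.date_range('2014-01-01', '2014-02-28', freq='D')
edf2 = pd.DataFrame({'DATESENT': dates,
                     'Sales': np.arange(len(dates)) * 10,
                     'Traffic': np.ones(len(dates), dtype=int)})

# Move DATESENT into the index, then aggregate to weekly (Sunday-ending) totals
edf3 = edf2.set_index('DATESENT').resample('W').sum()

# Option 1: plot from the DatetimeIndex (x is taken from the index by default)
ax = edf3.plot(y='Sales')

# Option 2: turn the index back into a column so x='DATESENT' is a valid label
edf3.reset_index().plot(x='DATESENT', y='Sales')
plt.show()
```

Option 1 is usually the simpler choice; Option 2 is handy when you want to keep the original x='DATESENT' call.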

Related

Decompose time series in Python with statsmodels

I am trying to decompose a time series. The data is a 2x8638 matrix. Here is the code:
import numpy as np
import pandas as pd
import matplotlib.pylab as plt
%matplotlib inline
from matplotlib.pylab import rcParams
rcParams['figure.figsize'] = 15, 6
from pandas.plotting import register_matplotlib_converters
register_matplotlib_converters()
df1 = pd.read_csv("u_x_ts.csv").set_index("0")
df1.head()
from statsmodels.tsa.seasonal import seasonal_decompose
result = seasonal_decompose(df1, model='multiplicative')
result.plot()
plt.show()
Then Python returns the error message:
ValueError: Multiplicative seasonality is not appropriate for zero and negative values
I think statsmodels doesn't support such small values, because the values at the beginning of the series are very small.
If anyone knows a way around this problem, I would appreciate it.

Showing all Full Hours on X-Axis in Matplotlib

Python beginner here. I have a .tsv file with data like this:
Date Time Day Sales
2020-08-07 17:20:04 Friday 37
2020-08-07 17:30:05 Friday 38
...and so on
I would like to plot this. I've tried this:
from pandas import read_csv
from matplotlib import pyplot
import datetime
import pandas as pd
series = read_csv('data.tsv', sep="\t")
pyplot.figure()
x = pd.to_datetime(series['Time']).dt.time
y = series['Sales']
pyplot.plot(x,y)
pyplot.show()
It works! However, I'd like to show every hour of the day on the x-axis. I've tried doing:
times = [datetime.datetime.strptime(str(i), '%H') for i in range(24)]
pyplot.xticks(times)
...but it doesn't work. Right now the x-axis labels seem quite random (00:00, 05:33:20, 11:06, ...).
Any ideas?
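Here is some example code. AutoDateLocator() chooses the tick positions automatically, and AutoDateFormatter(locator) picks a matching label format automatically. DateFormatter('%H:%M') instead sets hours:minutes as the format.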
See the docs for more options, both for the locator and for the formatter.
from matplotlib import pyplot as plt
import matplotlib.dates as mdates
import pandas as pd
import numpy as np
times = pd.date_range(start='2020-02-14 20:30', end='2020-02-14 22:20', freq='7min')
series = pd.DataFrame({'Time': times,
'Sales': 20 + np.random.uniform(-1, 1, len(times)).cumsum()})
x = series['Time']
y = series['Sales']
plt.plot(x, y)
locator = mdates.AutoDateLocator()
plt.gca().xaxis.set_major_locator(locator)
# plt.gca().xaxis.set_major_formatter(mdates.AutoDateFormatter(locator))  # AutoDateFormatter needs the locator
plt.gca().xaxis.set_major_formatter(mdates.DateFormatter('%H:%M'))
plt.show()
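Since the question asks for a tick at every full hour, matplotlib.dates.HourLocator can be used in place of AutoDateLocator. A minimal sketch, assuming the data spans a single day and uses full timestamps rather than .dt.time values (the timestamps and sales figures below are made up):

```python
import matplotlib.pyplot as plt
import matplotlib.dates as mdates
import numpy as np
import pandas as pd

# Made-up data: one reading every 10 minutes over a single day
times = pd.date_range(start='2020-08-07 00:00', end='2020-08-07 23:50', freq='10min')
sales = 20 + np.random.uniform(-1, 1, len(times)).cumsum()

fig, ax = plt.subplots(figsize=(12, 4))
ax.plot(times, sales)
# One major tick at every full hour, labelled as hours:minutes
ax.xaxis.set_major_locator(mdates.HourLocator(interval=1))
ax.xaxis.set_major_formatter(mdates.DateFormatter('%H:%M'))
fig.autofmt_xdate()  # rotate the labels so 24 of them fit
plt.show()
```

Raising interval (e.g. interval=2) keeps the ticks on full hours but thins them out if the labels get crowded.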

Getting a length mismatch error on this code, what does it mean?

I am trying to plot a time series analysis chart and I am getting an error that says "ValueError: Length mismatch: Expected axis has 50 elements, new values have 1 elements". What does it mean? I'll include my code:
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt  # pyplot provides plt.show(); the top-level matplotlib module does not
import datetime
from dateutil.relativedelta import relativedelta
import seaborn as sns
import statsmodels.api as sm
from statsmodels.tsa.stattools import acf
from statsmodels.tsa.stattools import pacf
from statsmodels.tsa.seasonal import seasonal_decompose
from statsmodels.tsa.stattools import adfuller

def init_data_visualisation():
    df = pd.read_csv('MasterFile.csv', index_col=0)
    df.index.name=None
    df.reset_index(inplace=True)
    df.set_index(['index'], inplace=True)
    df.index.name=None
    df.columns = ['Robbery']
    df['Robbery'] = df.Robbery.apply(lambda x: int(x) *100)
    df.Robbery.plot(title='Robbery Over 18 Months', fontsize=14)
    plt.show()

if __name__ == '__main__':
    init_data_visualisation()
"ValueError: Length mismatch: Expected axis has 50 elements, new values have 1 elements"
It means that your DataFrame has 50 columns, but df.columns = ['Robbery'] supplies only 1 name; the list you assign to df.columns must contain exactly one name per column.
See this link for a clearer explanation:
https://joshuaotwell.com/renaming-pandas-dataframe-columns-with-examples/
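A small reproduction of the error and two ways to avoid it, assuming a hypothetical 3-column frame in place of the 50-column MasterFile.csv (the column names 'a', 'b', 'c', 'Other1', 'Other2' are made up for illustration):

```python
import pandas as pd

# Hypothetical frame with 3 columns (stand-in for the 50-column MasterFile.csv)
df = pd.DataFrame({'a': [1, 2], 'b': [3, 4], 'c': [5, 6]})

# Assigning 1 name to 3 columns reproduces the error
try:
    df.columns = ['Robbery']
    message = 'no error'
except ValueError as err:
    message = str(err)
print(message)

# Option 1: keep only the column you need, then rename it
robbery = df[['a']].rename(columns={'a': 'Robbery'})

# Option 2: supply exactly one name per column
df.columns = ['Robbery', 'Other1', 'Other2']
```

Option 1 matches what the original code seems to want (a single Robbery series to plot).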

How to properly handle 'despine' function to avoid error message

I have been using despine(plt.gca()) when plotting my time series data, as shown below:
%matplotlib inline
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd
from datetime import datetime
date_rng = pd.date_range(start='1/2015', end='1/2019', freq='M')
#Let’s create an example data frame with the timestamp data and look at the first 5
df = pd.DataFrame(date_rng, columns=['date'])
df['data'] = np.random.randint(0,100,size=(len(date_rng)))
df.head()
df['datetime'] = pd.to_datetime(df['date'])
df = df.set_index('datetime')
df.drop(['date'], axis=1, inplace=True)
df.head()
# we visualize the data:
df.plot(lw=1.5)
despine(plt.gca())
plt.gcf().autofmt_xdate()
plt.ylabel('Series');
The code above gives the following error message:
NameError: name 'despine' is not defined
If I import seaborn as below:
import seaborn as sns
df.plot(lw=1.5)
sns.despine(plt.gca())
plt.gcf().autofmt_xdate()
plt.ylabel('Series');
it produces the error below:
'AxesSubplot' object is not iterable
The plot is still drawn, but I would prefer no error message at all. The error appears every time I use that line of code.
Please help me figure out what is wrong with despine(plt.gca()). I am running this code on Python 3.
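You haven't defined any function called despine, nor have you imported a module that defines one. Assuming you want seaborn.despine, import the module and call despine through it. The second error comes from passing the Axes positionally: the first parameter of sns.despine is fig, not ax, so seaborn treats your Axes as a figure and iterates over fig.axes, which for an Axes object is the Axes itself, hence "'AxesSubplot' object is not iterable". Pass the Axes with the ax keyword instead: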
import seaborn as sns
# Your code here
sns.despine(ax=plt.gca())
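A complete version of the question's plot with the fix applied, as a sketch. One assumption differs from the question: it uses freq='MS' (month start) instead of freq='M', because recent pandas versions deprecate the 'M' alias; the random data is illustrative only.

```python
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd
import seaborn as sns

# Monthly timestamps ('MS' = month start) with random example data
date_rng = pd.date_range(start='2015-01-01', end='2019-01-01', freq='MS')
df = pd.DataFrame({'data': np.random.randint(0, 100, size=len(date_rng))},
                  index=date_rng)

ax = df.plot(lw=1.5)
# Pass the Axes by keyword: the first positional parameter of despine is fig
sns.despine(ax=ax)
plt.gcf().autofmt_xdate()
plt.ylabel('Series')
plt.show()
```

By default despine hides only the top and right spines; the left and bottom axes stay visible.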

matplotlib dataframe x axis date issue

import sys
import ConfigParser
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import matplotlib.dates as mdates
import datetime as DT
import bokeh
sys.path.extend(['..\..\myProj\SOURCE'])
fullfilepath = "../../myProj/SOURCE/" + 'myparts.txt'
ohg_df = pd.read_csv(fullfilepath, sep="\t" )
temp_df = temp_df[['as_on_date', 'ohg_qty']]
temp_df = temp_df.sort(['as_on_date'], ascending=[1])
temp_df.set_index('as_on_date')
plt.plot(temp_df.index, temp_df.ohg_qty)
plt.show()
This is my dataframe after importing.
I am trying to plot a line graph with the dates from the dataframe on the x-axis.
Can someone guide me? I am new to pandas.
(Screenshots: the dataframe and the output plot)
Easier:
# Set index directly
ohg_df = pd.read_csv(fullfilepath, sep="\t", index_col='as_on_date')
# Convert string index to dates
ohg_df.index = pd.to_datetime(ohg_df.index)
# Get a column and plot it (taking a column keeps the index)
plt.plot(ohg_df.ohg_qty)
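A runnable sketch of the same idea, assuming made-up tab-separated data in place of myparts.txt. Here parse_dates=True converts the index column to dates while reading, which replaces the separate pd.to_datetime step, and sort_index() stands in for the removed DataFrame.sort method:

```python
import io

import matplotlib.pyplot as plt
import pandas as pd

# Made-up tab-separated data standing in for myparts.txt
tsv = io.StringIO(
    "as_on_date\tohg_qty\n"
    "2016-03-01\t120\n"
    "2016-01-01\t100\n"
    "2016-02-01\t90\n"
)

# index_col + parse_dates turns the date column into a DatetimeIndex on load
ohg_df = pd.read_csv(tsv, sep='\t', index_col='as_on_date', parse_dates=True)
ohg_df = ohg_df.sort_index()  # put the dates in chronological order

plt.plot(ohg_df.index, ohg_df['ohg_qty'])
plt.gcf().autofmt_xdate()
plt.show()
```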
