Plotting time-series data from pandas results in ValueError - python

I'm using a pandas DataFrame and matplotlib to draw three lines in the same figure. The data looks correct, but when I try to plot the lines, the code raises an unexpected ValueError.
The full error message is: ValueError: view limit minimum -105920.979 is less than 1 and is an invalid Matplotlib date value. This often happens if you pass a non-datetime value to an axis that has datetime units
How can I fix this error and plot three lines in the same figure?
import pandas as pd
import datetime as dt
import pandas_datareader as web
import matplotlib.pyplot as plt
from matplotlib import style
import matplotlib.ticker as ticker
spot=pd.read_excel('https://www.eia.gov/dnav/pet/hist_xls/RWTCd.xls',sheet_name='Data 1',skiprows=2) #this is spot price data
prod=pd.read_excel('https://www.eia.gov/dnav/pet/hist_xls/WCRFPUS2w.xls',sheet_name='Data 1',skiprows=2) #this is production data
stkp=pd.read_excel('https://www.eia.gov/dnav/pet/hist_xls/WTTSTUS1w.xls',sheet_name='Data 1',skiprows=2) #this is stockpile data
fig,ax = plt.subplots()
ax.plot(spot,label='WTI Crude Oil Price')
ax.plot(prod,label='US Crude Oil Production')
ax.plot(stkp,label='US Crude Oil Stockpile')
plt.legend()
plt.show()
print(spot,prod,stkp)

I don't get an error running the code, though I made a few adjustments to the imports and the plot:
Update matplotlib and pandas.
If you're using Anaconda, type conda update --all at the Anaconda prompt.
Parse the 'Date' column as datetime and set it as the index, so the dates are used as the x values instead of being plotted as a data column.
Place the legend outside the plot.
Set the y-scale to 'log', because the range of values is large.
import pandas as pd
import matplotlib.pyplot as plt
spot=pd.read_excel('https://www.eia.gov/dnav/pet/hist_xls/RWTCd.xls', sheet_name='Data 1',skiprows=2, parse_dates=['Date'], index_col='Date') #this is spot price data
prod=pd.read_excel('https://www.eia.gov/dnav/pet/hist_xls/WCRFPUS2w.xls', sheet_name='Data 1',skiprows=2, parse_dates=['Date'], index_col='Date') #this is production data
stkp=pd.read_excel('https://www.eia.gov/dnav/pet/hist_xls/WTTSTUS1w.xls', sheet_name='Data 1',skiprows=2, parse_dates=['Date'], index_col='Date') #this is stockpile data
fig,ax = plt.subplots()
ax.plot(spot, label='WTI Crude Oil Price')
ax.plot(prod, label='US Crude Oil Production')
ax.plot(stkp, label='US Crude Oil Stockpile')
plt.legend(bbox_to_anchor=(1.05, 1), loc='upper left')
plt.yscale('log')
plt.show()
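The effect of setting 'Date' as the index can be checked without downloading the EIA files. The sketch below is only an illustration: the two small frames, their column names and their values are made-up stand-ins, not the real data. It shows that once the dates are the index, ax.plot draws each series against dates.

```python
import matplotlib
matplotlib.use("Agg")  # headless backend so the sketch runs without a display
import matplotlib.pyplot as plt
import pandas as pd

# Hypothetical stand-ins for two of the sheets: a 'Date' column plus one value column.
dates = pd.date_range("2020-01-03", periods=5, freq="W-FRI")
spot = pd.DataFrame({"Date": dates, "Price": [63.0, 59.0, 58.5, 54.2, 51.6]})
prod = pd.DataFrame({"Date": dates, "Production": [12900, 13000, 13000, 12900, 13000]})

# Same effect as index_col='Date' in read_excel: the dates become the index,
# and matplotlib uses the index of a pandas object as the x values.
spot = spot.set_index("Date")
prod = prod.set_index("Date")

fig, ax = plt.subplots()
ax.plot(spot, label="WTI Crude Oil Price")
ax.plot(prod, label="US Crude Oil Production")
ax.set_yscale("log")  # the two series differ by orders of magnitude
ax.legend(bbox_to_anchor=(1.05, 1), loc="upper left")
```

With the default integer index, the 'Date' column would instead be passed to matplotlib as one of the data columns, which is a likely source of the date-axis error in the question.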

Related

How to plot two columns on one line graph using pandas and seaborn?

I want to plot columns High and Low into the same line graph using seaborn.
They are from the same csv file. I managed to graph High on the y-axis but I am struggling to get the Low in.
For reference, the files are from Kaggle.
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns
sns.set_theme(style="darkgrid")
# Read Data from CSV:
micro = pd.read_csv('Microsoft Corporationstock.csv', parse_dates=['Date'])
covid = pd.read_csv('day_wise.csv', parse_dates=['Date'])
# Filter Data Needed:
start_date = covid['Date'].loc[0]
index_date = int(micro[micro['Date'] == start_date].index.values)
# Microsoft Stock Plot:
mg = sns.relplot(x='Date', y='High', kind='line', data=micro.loc[index_date:])
mg.set_xticklabels(rotation=30)
plt.ylabel('Market Value')
mg.fig.autofmt_xdate()
plt.show()
[Image: High Stock Graph]
In general seaborn plots work better in "melted" format.
Here you can melt the data into Date, Type (High or Low), and Market Value. Then use y='Market Value' and hue='Type' to get both High and Low in the same plot:
mg = sns.relplot(
    # value_vars keeps only High and Low; other columns are not melted
    data=micro.loc[index_date:].melt(id_vars='Date', value_vars=['High', 'Low'], var_name='Type', value_name='Market Value'),
    x='Date',
    y='Market Value',
    hue='Type',
    kind='line',
)
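To see what the melted frame looks like, here is a minimal sketch on a three-row sample. The values and the extra Volume column are hypothetical; only the column names Date, High and Low come from the question.

```python
import pandas as pd

# Hypothetical sample in the same wide layout as the stock CSV.
micro = pd.DataFrame({
    "Date": pd.to_datetime(["2020-01-22", "2020-01-23", "2020-01-24"]),
    "High": [167.5, 166.8, 167.5],
    "Low": [165.4, 165.3, 164.5],
    "Volume": [100, 200, 300],
})

# Wide -> long: one row per (Date, Type) pair, with Type being 'High' or 'Low'.
long_form = micro.melt(
    id_vars="Date",
    value_vars=["High", "Low"],
    var_name="Type",
    value_name="Market Value",
)
print(long_form)
```

Each date now appears twice, once per Type, which is the shape that hue='Type' expects.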
Does changing it to this work? If seaborn behaves like matplotlib here, it might.
plot_y = ['High', 'Low']
for y in plot_y:
    mg = sns.relplot(x='Date', y=y, kind='line', data=micro.loc[index_date:])
mg.fig.autofmt_xdate()
mg.set_xticklabels(rotation=30)
plt.ylabel('Market Value')
plt.show()

Random data appearing in bar plot at regular intervals

I have a dataset of COVID-19 data with columns = ['total_cases', 'new_cases', 'date']. The data increases monotonically, with no sudden spikes in new_cases in January. The dataset can be found here: https://fnvuusdqoptinxntjrmodi.coursera-apps.org/edit/CovidIndiaData.csv. It has many columns, of which I use only ['total_cases', 'new_cases', 'date'].
The first 10 days of data are 0 for 'new_cases', as shown in this image:
I use this code to draw a bar plot of 'date' vs 'new_cases':
import pandas as pd
import matplotlib.pyplot as plt
import numpy as np
import matplotlib.dates as mdates  # needed for mdates.WeekdayLocator below
from matplotlib.dates import DateFormatter
df = pd.read_csv("CovidIndiaData.csv", parse_dates=['date'], index_col=['date'])
df = df[['new_cases', 'total_cases']]
df = df.fillna(0)  # fillna returns a new DataFrame, so assign the result
fig = plt.figure()
ax = plt.gca()
ax.bar(df.index.values,
       df['new_cases'],
       color='purple')
ax.set(xlabel="Date",
       ylabel="New Cases",
       title="New Cases per day",
       xlim=["2020-01-01", "2020-07-18"])
date_form = DateFormatter("%m-%d")
ax.xaxis.set_major_formatter(date_form)
ax.xaxis.set_major_locator(mdates.WeekdayLocator(interval=1))
plt.setp(ax.get_xticklabels(), rotation=45)
plt.show()
The final plot looks like this:
The plot shows spikes at 7 January ('01-07' on the plot), where new_cases in the dataset is clearly 0. The spikes recur at roughly one-month intervals.
Where does this data come from? How can I plot this data correctly?
Thanks to Davis Herring for pointing out my mistake.
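In case anyone faces a similar issue: the solution is to specify the date format explicitly when the dates aren't in a standard format. My dates are day-first (DD-MM-YYYY), but they were being parsed month-first wherever that gave a valid date, so for example 01-07-2020 (1 July) was read as 7 January.
What I did is:
# pd.datetime was removed in pandas 2.0; pd.to_datetime with an explicit format does the same job
mydateparser = lambda x: pd.to_datetime(x, format="%d-%m-%Y")
# In pandas 2.0+, date_parser is deprecated in favour of date_format="%d-%m-%Y"
df = pd.read_csv("CovidIndiaData.csv", parse_dates=['date'], date_parser=mydateparser, index_col=['date'])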
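The difference between guessed and explicit parsing can be reproduced in a few lines. This is a sketch on made-up rows (the case counts are placeholders), assuming the file stores dates as DD-MM-YYYY as the format string above implies. It converts the column after reading, which avoids the version-dependent date_parser argument.

```python
import io
import pandas as pd

# Hypothetical rows in the same day-first layout as the CSV.
csv_text = "date,new_cases\n01-07-2020,100\n02-07-2020,200\n"

# Without a format, an ambiguous string is read month-first.
ambiguous = pd.to_datetime("01-07-2020")
# With an explicit format, the same string is read as 1 July.
explicit = pd.to_datetime("01-07-2020", format="%d-%m-%Y")
print(ambiguous.date(), explicit.date())

# Read the column as text, then convert it with the explicit format.
df = pd.read_csv(io.StringIO(csv_text))
df["date"] = pd.to_datetime(df["date"], format="%d-%m-%Y")
df = df.set_index("date")
```

With month-first guessing, rows from early July land on the 7th of January, February and so on, which matches the monthly spikes described in the question.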

pyplot line plotting erratically

I have these two simple CSV files, but when I plot them, the 'mon' line goes strange toward the end.
Plotting either chart on its own is fine, but when the two are plotted together, the 'monarch' line goes wrong.
Thanks in advance.
Here is the code:
import pandas as pd
from matplotlib import pyplot as plt
def run_plot1():
    df_ash = pd.read_csv('./data/ashburn.csv')
    df_mon = pd.read_csv('./data/monarch1bed.csv')
    plt.grid(True)
    plt.plot(df_ash['Date'], df_ash['Ash1bed'], label='Ashburn 1 bed')
    plt.plot(df_mon['Date'], df_mon['Mon1bed'], label='Monarch 1 bed')
    plt.xlabel("Date")
    plt.ylabel("Rate")
    plt.style.use("fivethirtyeight")
    plt.title("One Bed Comparison")
    plt.legend()
    plt.savefig('data/sample.png')
    plt.tight_layout()
    plt.show()
run_plot1()
and the CSV data:
Date,Ash1bed,Ash2bed,Ash3bed
08-01,306,402
22-01,181,286,349
05-02,176,281,336
19-02,188,293,369
04-03,201,306,402
18-03,209
01-04,217,394,492
15-04,209,354,455
29-04,197,302,387
13-05,205,326,414
27-05,217,362,473
10-06,390,532
08-07,415
22-07,415
05-08,415
19-08,415
15-09,290,452,594
and another:
Date,Mon1bed
08-01,230
05-02,160
19-02,160
04-03,190
18-03,190
01-04,260
15-04,260
29-04,260
13-05,300
27-05,330
10-06,330
24-06,350
08-07,350
22-07,350
05-08,350
19-08,350
02-09,350
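The basic reason for the erratic plot is that the Date columns in both DataFrames are strings. Matplotlib treats strings as categories and places them on the axis in order of first appearance, so dates that occur only in the second file (such as 24-06) are appended after the last date from the first file instead of being placed chronologically.
Convert them to datetime: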
df_ash.Date = pd.to_datetime(df_ash.Date, format='%d-%m')
df_mon.Date = pd.to_datetime(df_mon.Date, format='%d-%m')
But to get reader-friendly x-axis labels, a few additional steps are required.
Start with the necessary imports:
from pandas.plotting import register_matplotlib_converters
import matplotlib.dates as mdates
Then register matplotlib converters:
register_matplotlib_converters()
And to get a proper plot, run:
fig, ax = plt.subplots() # figsize=(10, 6)
ax.grid(True)
ax.plot(df_ash['Date'], df_ash['Ash1bed'], label='Ashburn 1 bed')
ax.plot(df_mon['Date'], df_mon['Mon1bed'], label='Monarch 1 bed')
plt.xlabel("Date")
plt.ylabel("Rate")
plt.style.use("fivethirtyeight")
plt.title("One Bed Comparison")
plt.legend()
dm_fmt = mdates.DateFormatter('%d-%m')
ax.xaxis.set_major_formatter(dm_fmt)
plt.xticks(rotation=45);
For your data I got:
You should convert the date column to datetime:
df1.Date = pd.to_datetime(df1.Date, format='%d-%m')
df2.Date = pd.to_datetime(df2.Date, format='%d-%m')
plt.plot(df1.Date, df1.Ash1bed)
plt.plot(df2.Date, df2.Mon1bed)
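Why string dates scramble the second line can be sketched without drawing anything. The lists below are a subset of the dates from the two CSV files in the question. The dict.fromkeys call is only an imitation of how matplotlib assigns positions to string categories (in order of first appearance), not matplotlib's own code.

```python
import pandas as pd

# A subset of the dates in the question: the first file has no 24-06 entry.
ash_dates = ["13-05", "27-05", "10-06", "08-07"]
mon_dates = ["13-05", "27-05", "10-06", "24-06", "08-07"]

# Imitate categorical placement: each new string gets the next free position.
category_order = list(dict.fromkeys(ash_dates + mon_dates))
print(category_order)  # '24-06' ends up after '08-07'

# After conversion the values are real dates and sort chronologically.
# With no year in the format, pandas fills in 1900.
parsed = pd.to_datetime(pd.Series(mon_dates), format="%d-%m")
print(parsed.tolist())
```

Because '24-06' sits to the right of '08-07' on the categorical axis, the second line has to jump forward and back, which produces the zigzag toward the end of the plot.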

Creating a visualization with 2 Y-Axis scales

I am currently trying to plot the price of the 1080 graphics card against the price of bitcoin over time, but the scales of the Y axis are just way off. This is my code so far:
import pandas as pd
from datetime import date
import matplotlib.pyplot as plt
from matplotlib.pyplot import *
import numpy as np
GPUDATA = pd.read_csv("1080Prices.csv")
BCDATA = pd.read_csv("BitcoinPrice.csv")
date = pd.to_datetime(GPUDATA["Date"])
price = GPUDATA["Price_USD"]
date1 = pd.to_datetime(BCDATA["Date"])
price1 = BCDATA["Close"]
plot(date, price)
plot(date1, price1)
And that produces this:
The GPU prices, of course, are in blue and the price of bitcoin is in orange. I am fairly new to visualizations and I'm having a rough time finding anything online that could help me fix this issue. Some of the suggestions I found on here seem to deal with plotting data from a single datasource, but my data comes from 2 datasources.
One has entries of the GPU price in a given day, the other has the open, close, high, and low price of bitcoin in a given day. I am struggling to find a solution, any advice would be more than welcome! Thank you!
What you want to do is twin the X-axis, such that both plots will share the X-axis, but have separate Y-axes. That can be done in this way:
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
GPUDATA = pd.read_csv("1080Prices.csv")
BCDATA = pd.read_csv("BitcoinPrice.csv")
gpu_dates = pd.to_datetime(GPUDATA["Date"])
gpu_prices = GPUDATA["Price_USD"]
btc_dates = pd.to_datetime(BCDATA["Date"])
btc_prices = BCDATA["Close"]
fig, ax1 = plt.subplots()
ax2 = ax1.twinx() # Create a new Axes object sharing ax1's x-axis
ax1.plot(gpu_dates, gpu_prices, color='blue')
ax2.plot(btc_dates, btc_prices, color='red')
As you have not provided sample data, I am unable to show a relevant demonstration, but this should work.
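For a version that runs without the two CSV files, here is a sketch with made-up prices. All numbers and the monthly dates are hypothetical; only the variable naming follows the answer above. It also labels each y-axis and merges both lines into one legend, which the answer does not cover.

```python
import matplotlib
matplotlib.use("Agg")  # headless backend so the sketch runs without a display
import matplotlib.pyplot as plt
import pandas as pd

# Hypothetical monthly prices on very different scales.
dates = pd.date_range("2017-01-01", periods=6, freq="MS")
gpu_prices = pd.Series([600, 590, 610, 650, 700, 720], index=dates)
btc_prices = pd.Series([1000, 1200, 1100, 1400, 2300, 2500], index=dates)

fig, ax1 = plt.subplots()
ax2 = ax1.twinx()  # second Axes sharing ax1's x-axis, with its own y-axis

ax1.plot(gpu_prices.index, gpu_prices, color="blue", label="GTX 1080 (USD)")
ax2.plot(btc_prices.index, btc_prices, color="red", label="Bitcoin close (USD)")
ax1.set_ylabel("GPU price (USD)", color="blue")
ax2.set_ylabel("Bitcoin price (USD)", color="red")

# Each Axes only knows its own lines, so collect both sets for a single legend.
lines = list(ax1.get_lines()) + list(ax2.get_lines())
ax1.legend(lines, [line.get_label() for line in lines], loc="upper left")
```

Colouring each y-label to match its line is one simple way to show which scale belongs to which series.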

Matplotlib plots turn out blank even having values

I am new to analytics, Python and machine learning, and I am working on time-series forecasting. Using the following code, I get values for the train and test data, but the graph is plotted blank.
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import statsmodels.tsa.api as ExponentialSmoothing
#Importing data
df = pd.read_csv('international-airline-passengers - Copy.csv')
#Printing head
print(df.head())
#Printing tail
print(df.tail())
df = pd.read_csv('international-airline-passengers - Copy.csv', nrows = 11856)
#Creating train and test set
#Index 10392 marks the end of October 2013
train=df[0:20]
test=df[20:]
#Aggregating the dataset at daily level
df.Timestamp = pd.to_datetime(df.Month,format='%m/%d/%Y %H:%M')
df.index = df.Timestamp
df = df.resample('D').mean()
train.Timestamp = pd.to_datetime(train.Month,format='%m/%d/%Y %H:%M')
print('1')
print(train.Timestamp)
train.index = train.Timestamp
train = train.resample('D').mean()
test.Timestamp = pd.to_datetime(test.Month,format='%m/%d/%Y %H:%M')
test.index = test.Timestamp
test = test.resample('D').mean()
train.Count.plot(figsize=(15,8), title= 'Result', fontsize=14)
test.Count.plot(figsize=(15,8), title= 'Result', fontsize=14)
plt.show()
I can't understand why the graph is blank even though the train and test data have values.
Thanks in advance.
I think I found the issue here. You are using train.Count.plot, while the value of "plt" is still empty. If you go through the matplotlib documentation (link below), you will find that you need to store some value in plt first; since plt is empty here, it gives back an empty plot.
Basically, you are not plotting anything and are just showing the blank plot.
E.g. plt.subplots(values) or plt.scatter(values), or any of its functions, depending on requirements. Hope this helps.
https://matplotlib.org/
import holoviews as hv
import pandas as pd
import numpy as np
data=pd.read_csv("C:/Users/Nisarg.Bhatt/Documents/data.csv", engine="python")
train=data.groupby(["versionCreated"])["Polarity Score"].mean()
table=hv.Table(train)
print(table)
bar=hv.Bars(table).opts(plot=dict(width=1500))
renderer = hv.renderer('bokeh')
app = renderer.app(bar)
print(app)
from bokeh.server.server import Server
server = Server({'/': app}, port=0)
server.start()
server.show("/")
This is done using HoloViews, a visualisation library. If you are building a professional application, you should definitely try it. Here versionCreated is the date and Polarity Score plays the role of the count.
Or, if you want to stick to matplotlib, try this:
fig, ax = plt.subplots(figsize=(16,9))
ax.plot(msft.index, msft, label='MSFT')
ax.plot(short_rolling_msft.index, short_rolling_msft, label='20 days rolling')
ax.plot(long_rolling_msft.index, long_rolling_msft, label='100 days rolling')
ax.set_xlabel('Date')
ax.set_ylabel('Adjusted closing price ($)')
ax.legend()
Also this can be used, if you want to stick with matplotlib
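The matplotlib snippet above uses msft, short_rolling_msft and long_rolling_msft without defining them. A self-contained sketch follows. It assumes msft is a price Series indexed by date and that the rolling lines are 20-row and 100-row moving averages, as the labels suggest. The prices are a synthetic random walk, not real MSFT data.

```python
import matplotlib
matplotlib.use("Agg")  # headless backend so the sketch runs without a display
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd

# Synthetic random-walk prices on business days (placeholder for real data).
rng = np.random.default_rng(0)
dates = pd.bdate_range("2019-01-01", periods=250)
msft = pd.Series(100 + rng.normal(0, 1, len(dates)).cumsum(), index=dates)

# Moving averages; the first window-1 values are NaN and are simply not drawn.
short_rolling_msft = msft.rolling(window=20).mean()
long_rolling_msft = msft.rolling(window=100).mean()

fig, ax = plt.subplots(figsize=(16, 9))
ax.plot(msft.index, msft, label="MSFT")
ax.plot(short_rolling_msft.index, short_rolling_msft, label="20 days rolling")
ax.plot(long_rolling_msft.index, long_rolling_msft, label="100 days rolling")
ax.set_xlabel("Date")
ax.set_ylabel("Adjusted closing price ($)")
ax.legend()
```

Plotting through an explicit Axes object, as here, makes it clear which figure each line is drawn on.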
