Creating a visualization with 2 Y-Axis scales - python

I am currently trying to plot the price of the 1080 graphics card against the price of bitcoin over time, but the scales of the Y axis are just way off. This is my code so far:
import pandas as pd
from datetime import date
import matplotlib.pyplot as plt
from matplotlib.pyplot import *
import numpy as np
GPUDATA = pd.read_csv("1080Prices.csv")
BCDATA = pd.read_csv("BitcoinPrice.csv")
date = pd.to_datetime(GPUDATA["Date"])
price = GPUDATA["Price_USD"]
date1 = pd.to_datetime(BCDATA["Date"])
price1 = BCDATA["Close"]
plot(date, price)
plot(date1, price1)
And that produces this:
The GPU prices, of course, are in blue and the price of bitcoin is in orange. I am fairly new to visualizations and I'm having a rough time finding anything online that could help me fix this issue. Some of the suggestions I found on here seem to deal with plotting data from a single datasource, but my data comes from 2 datasources.
One has entries of the GPU price in a given day, the other has the open, close, high, and low price of bitcoin in a given day. I am struggling to find a solution, any advice would be more than welcome! Thank you!

What you want to do is twin the X-axis, such that both plots will share the X-axis, but have separate Y-axes. That can be done in this way:
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
GPUDATA = pd.read_csv("1080Prices.csv")
BCDATA = pd.read_csv("BitcoinPrice.csv")
gpu_dates = pd.to_datetime(GPUDATA["Date"])
gpu_prices = GPUDATA["Price_USD"]
btc_dates = pd.to_datetime(BCDATA["Date"])
btc_prices = BCDATA["Close"]
fig, ax1 = plt.subplots()
ax2 = ax1.twinx() # Create a new Axes object sharing ax1's x-axis
ax1.plot(gpu_dates, gpu_prices, color='blue')
ax2.plot(btc_dates, btc_prices, color='red')
As you have not provided sample data, I am unable to show a relevant demonstration, but this should work.
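Since no sample data was posted, here is a minimal sketch of the same idea with synthetic numbers. The dates and prices are made up, and the axis labels and the combined legend are additions that were not in the original answer.

```python
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt

# Synthetic stand-ins for the two CSV files (all values are made up).
dates = pd.date_range("2017-01-01", periods=12, freq="MS")
gpu_prices = pd.Series(np.linspace(550, 700, 12), index=dates)
btc_prices = pd.Series(np.linspace(1000, 14000, 12), index=dates)

fig, ax1 = plt.subplots()
ax2 = ax1.twinx()  # second Axes sharing ax1's x-axis, with its own y-axis

gpu_line, = ax1.plot(gpu_prices.index, gpu_prices, color="blue", label="GTX 1080 (USD)")
btc_line, = ax2.plot(btc_prices.index, btc_prices, color="red", label="Bitcoin close (USD)")

# Colour each y-axis label like its line so the two scales are easy to tell apart.
ax1.set_ylabel("GPU price (USD)", color="blue")
ax2.set_ylabel("Bitcoin price (USD)", color="red")

# The lines live on different Axes, so gather both handles into one legend.
ax1.legend(handles=[gpu_line, btc_line], loc="upper left")
fig.autofmt_xdate()
```

Call plt.show() afterwards to display the figure. Each line is scaled against its own y-axis, so the GPU series is no longer flattened by the much larger bitcoin values.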


Plotting time-series data from pandas results in ValueError

I'm using pandas DataFrame and matplotlib to draw three lines in the same figure. The data ought to be correct, but when I try to plot the lines, the code returns a ValueError, which is unexpected.
The full error message says: ValueError: view limit minimum -105920.979 is less than 1 and is an invalid Matplotlib date value. This often happens if you pass a non-datetime value to an axis that has datetime units
How can I fix this error and plot the three lines in the same figure?
import pandas as pd
import datetime as dt
import pandas_datareader as web
import matplotlib.pyplot as plt
from matplotlib import style
import matplotlib.ticker as ticker
spot=pd.read_excel('https://www.eia.gov/dnav/pet/hist_xls/RWTCd.xls',sheet_name='Data 1',skiprows=2) #this is spot price data
prod=pd.read_excel('https://www.eia.gov/dnav/pet/hist_xls/WCRFPUS2w.xls',sheet_name='Data 1',skiprows=2) #this is production data
stkp=pd.read_excel('https://www.eia.gov/dnav/pet/hist_xls/WTTSTUS1w.xls',sheet_name='Data 1',skiprows=2) #this is stockpile data
fig,ax = plt.subplots()
ax.plot(spot,label='WTI Crude Oil Price')
ax.plot(prod,label='US Crude Oil Production')
ax.plot(stkp,label='US Crude Oil Stockpile')
plt.legend()
plt.show()
print(spot,prod,stkp)
I don't get an error running the code, though I have made a few adjustments to the imports and the plot:
Update matplotlib and pandas.
If you're using Anaconda, type conda update --all at the Anaconda prompt.
Parse the 'Date' column to datetime and set it as the index.
Place the legend outside the plot.
Set the yscale to 'log', because the range of the values is large.
import pandas as pd
import matplotlib.pyplot as plt
spot=pd.read_excel('https://www.eia.gov/dnav/pet/hist_xls/RWTCd.xls', sheet_name='Data 1',skiprows=2, parse_dates=['Date'], index_col='Date') #this is spot price data
prod=pd.read_excel('https://www.eia.gov/dnav/pet/hist_xls/WCRFPUS2w.xls', sheet_name='Data 1',skiprows=2, parse_dates=['Date'], index_col='Date') #this is production data
stkp=pd.read_excel('https://www.eia.gov/dnav/pet/hist_xls/WTTSTUS1w.xls', sheet_name='Data 1',skiprows=2, parse_dates=['Date'], index_col='Date') #this is stockpile data
fig,ax = plt.subplots()
ax.plot(spot, label='WTI Crude Oil Price')
ax.plot(prod, label='US Crude Oil Production')
ax.plot(stkp, label='US Crude Oil Stockpile')
plt.legend(bbox_to_anchor=(1.05, 1), loc='upper left')
plt.yscale('log')
plt.show()
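The code above needs network access to the EIA spreadsheets. As an offline sketch of the same steps, the block below uses a small made-up frame in place of one sheet; the column name "Price" and all values are assumptions for illustration. In the question's version, Date stays an ordinary column, which is probably what confuses the date axis.

```python
import pandas as pd
import matplotlib.pyplot as plt

# Made-up weekly data standing in for one of the EIA sheets.
raw = pd.DataFrame({
    "Date": ["2020-01-03", "2020-01-10", "2020-01-17", "2020-01-24"],
    "Price": [63.0, 59.0, 58.5, 54.2],
})

# Parse the dates and move them into the index, which is what
# parse_dates=['Date'] and index_col='Date' do at read time.
spot = raw.assign(Date=pd.to_datetime(raw["Date"])).set_index("Date")

fig, ax = plt.subplots()
ax.plot(spot.index, spot["Price"], label="WTI Crude Oil Price")
ax.set_yscale("log")  # useful once series of very different magnitude share the axes
ax.legend(bbox_to_anchor=(1.05, 1), loc="upper left")
```

With the dates in the index, only numeric columns are left to plot, and the x-axis receives real datetime values.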

Pandas line graph - y-axis high values at the bottom and low values at the top (flipped 180 degrees)

I am new to pandas and just want to show my rank versus my friend's rank using pandas.
A lower rank is better than a higher rank (#1 is better than #2),
so I want the graph to rise rather than fall. With the code I have, the graph is falling... Please help.
import pandas as pd
import matplotlib.pyplot as plt
df = pd.DataFrame({
"Me" : [10,9,7,6,3,2,1],
"My friend" : [20,19,18,15,14,10,6]},
index=list(range(7)))
lines = df.plot.line()
plt.show()
So over time my rank improves, but pandas draws a falling line instead of a rising one.
I hope you understand what I mean. Thanks for your help.
Are you looking for invert_yaxis?
fig, ax = plt.subplots()
lines = df.plot.line(ax=ax)
ax.invert_yaxis()
Output:
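For a version that runs on its own, here is a minimal sketch that rebuilds the DataFrame from the question and inverts the axis. The y-axis label is an addition for illustration.

```python
import pandas as pd
import matplotlib.pyplot as plt

df = pd.DataFrame(
    {"Me": [10, 9, 7, 6, 3, 2, 1], "My friend": [20, 19, 18, 15, 14, 10, 6]},
    index=list(range(7)),
)

fig, ax = plt.subplots()
df.plot.line(ax=ax)
ax.invert_yaxis()  # rank 1 now sits at the top, so improving ranks rise
ax.set_ylabel("Rank (1 is best)")

# After inversion the lower limit is numerically larger than the upper one.
bottom, top = ax.get_ylim()
```

Call plt.show() to display it. The data is unchanged; only the direction of the y-axis is flipped.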

Plotting histogram with weighted bell curve

I'm plotting a histogram with a bell curve and I'm running into problems with the bell curve part. Basically, my data consists of 3 columns, an ITEM_TYPE, QTY, and WIDTH. The data of my histogram needs to account for the quantity column, and I have no problem doing that, however, when I try to do the same for the bell curve, I'm not sure how to best go about it. View my code below:
import pandas as pd
import matplotlib.pylab as plt
import matplotlib.ticker as mtick
import numpy as np
from scipy import stats
import seaborn as sns
import statsmodels.api as sm
df = pd.read_csv('Size_Overview.csv')
df2 = df[df['ITEM_TYPE'] == 'Fixed Window']
weighted = sm.nonparametric.KDEUnivariate(df2['WIDTH'])
weighted.fit(fft=False, weights=df2['QTY_ORD'])
ax = plt.subplot()
ax.hist(df2['WIDTH'], bins = [0,1,2,3,4,5,6,7,8,9,10], weights=df2['QTY_ORD'])
lnspc = np.linspace(0, 10, len(df2['WIDTH']))
m, s = stats.norm.fit(df2['WIDTH'], weights=df2['QTY_ORD'])
pdf_g = stats.norm.pdf(lnspc, m, s)
plt.plot(lnspc, pdf_g, label='Norm', c='red')
ax.set_ylabel('Unit Count (% of Total)')
ax.set_xlabel('Width (in Feet)')
ax.set_title('Width Distribution (Fixed Windows)')
ax.set_xticks(np.arange(0, 11, 1.0))
ax.yaxis.set_major_formatter(mtick.PercentFormatter(1.0))
plt.xticks(rotation=45)
plt.show()
So basically, the "weights" argument in weighted.fit and ax.hist works perfectly, but the same argument in stats.norm.fit seems to be ignored completely. If I have one row in my data with a really high quantity, the histogram adjusts properly but the bell curve stays exactly the same every time. It is just calculating the mean and standard deviation of the WIDTH column, completely ignoring the QTY column.
Here's what my chart looks like:
Here's what it looks like if I add a really large quantity to an item between 1 and 2 feet:
As you can see, the histogram adjusted correctly but the bell curve stayed roughly the same. How can I make the bell curve adjust for my quantity column? All tips are appreciated, thanks in advance
Edit: Looks like nobody could help, but I figured it out anyway. Here's the workaround I came up with:
Added this code:
values = df2['WIDTH'].values
qty = df2['QTY_ORD'].astype(int)
count = qty.values
full_values = np.repeat(values, count)
and replaced:
m, s = stats.norm.fit(df2['WIDTH'], weights=df2['QTY_ORD'])
with:
m, s = stats.norm.fit(full_values)
So basically use the numpy repeat function to pass in the entire width column based on the number in the qty column. That's it!!
So now my second chart looks like this:
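An alternative sketch that avoids building the repeated array: for a normal distribution, fitting the expanded data gives the weighted mean and the weighted (population) standard deviation, so both can be computed directly with np.average. The helper name weighted_norm_params and the sample numbers are hypothetical, not part of the original code.

```python
import numpy as np

def weighted_norm_params(values, weights):
    """Return (mean, std) of a normal fit in which each value counts `weights` times."""
    values = np.asarray(values, dtype=float)
    weights = np.asarray(weights, dtype=float)
    mean = np.average(values, weights=weights)
    var = np.average((values - mean) ** 2, weights=weights)  # population variance
    return mean, np.sqrt(var)

# Made-up widths and quantities, only for illustration.
widths = np.array([1.5, 2.0, 3.5, 6.0])
qty = np.array([40, 3, 5, 2])

m, s = weighted_norm_params(widths, qty)

# Cross-check against the np.repeat workaround: same mean and std.
expanded = np.repeat(widths, qty)
```

The resulting m and s can be passed to stats.norm.pdf(lnspc, m, s) exactly as before. This also works when the quantities are not integers, which np.repeat cannot handle.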

How to plot a time series plot for each party and the total votes they got each year

Here's my dataset
Newbie here
I want to plot the total votes each party got in each year. I think a bar plot would be a good fit here, but I don't understand how to do it.
I want to do it with plotly.
The output should be something like this.
Here is a working sample for your use case:
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
data = {'Partyname': ['Independents', 'INC','Independents','Independents','Independents'], 'Year': [1977, 1977,1980,1980,1980], "totPoll":[25168,35400,109,125,405]}
df = pd.DataFrame(data)
grpByParty = df.groupby('Partyname')
sumVotes = grpByParty['totPoll'].sum()  # total votes per party
y_values = sumVotes.keys().tolist()
y_pos = np.arange(len(y_values))
votes = sumVotes.tolist()
plt.bar(y_pos, votes, align='center', alpha=0.5)
plt.xticks(y_pos, y_values)
plt.ylabel('votes')
plt.title('party wise votes ')
plt.show()
The approach I have taken here:
Group the data by party.
Get the sum of the total votes for each party using an aggregate.
Collect the x and y coordinates into lists.
Plot the diagram using matplotlib.pyplot.
The output will look like this.
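The sample above totals votes per party across all years. To split the totals by year as well, as the question asks, one option is to pivot the table first and let pandas draw grouped bars. This is a sketch using the same sample data; it uses matplotlib through pandas rather than plotly, to stay consistent with the answer above.

```python
import pandas as pd
import matplotlib.pyplot as plt

data = {
    "Partyname": ["Independents", "INC", "Independents", "Independents", "Independents"],
    "Year": [1977, 1977, 1980, 1980, 1980],
    "totPoll": [25168, 35400, 109, 125, 405],
}
df = pd.DataFrame(data)

# One row per year, one column per party, summed votes in the cells.
votes = df.pivot_table(
    index="Year", columns="Partyname", values="totPoll",
    aggfunc="sum", fill_value=0,
)

# Each year becomes a group of bars, one bar per party.
ax = votes.plot.bar(rot=0)
ax.set_xlabel("Year")
ax.set_ylabel("Total votes")
ax.set_title("Votes per party by year")
```

Parties with no rows in a given year get a zero-height bar because of fill_value=0. Call plt.show() to display the chart.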

Matplotlib plots turn out blank even having values

I am new to analytics, Python and machine learning, and I am working on time series forecasting. With the following code I get values for the train and test data, but the graph is plotted blank.
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import statsmodels.tsa.api as ExponentialSmoothing
#Importing data
df = pd.read_csv('international-airline-passengers - Copy.csv')
#Printing head
print(df.head())
#Printing tail
print(df.tail())
df = pd.read_csv('international-airline-passengers - Copy.csv', nrows = 11856)
#Creating train and test set
#Index 10392 marks the end of October 2013
train=df[0:20]
test=df[20:]
#Aggregating the dataset at daily level
df.Timestamp = pd.to_datetime(df.Month,format='%m/%d/%Y %H:%M')
df.index = df.Timestamp
df = df.resample('D').mean()
train.Timestamp = pd.to_datetime(train.Month,format='%m/%d/%Y %H:%M')
print('1')
print(train.Timestamp)
train.index = train.Timestamp
train = train.resample('D').mean()
test.Timestamp = pd.to_datetime(test.Month,format='%m/%d/%Y %H:%M')
test.index = test.Timestamp
test = test.resample('D').mean()
train.Count.plot(figsize=(15,8), title= 'Result', fontsize=14)
test.Count.plot(figsize=(15,8), title= 'Result', fontsize=14)
plt.show()
I can't understand why the graph is blank even though the train and test data contain values.
Thanks in advance.
I think I found the issue here. You are calling train.Count.plot, but you never draw anything through plt itself. My guess, going by the matplotlib documentation (link below), is that you need to plot your values with matplotlib before calling plt.show(); otherwise you may just be showing a blank plot.
Basically, it looks like nothing is being plotted and only the empty figure is shown.
E.g. plt.plot(values) or plt.scatter(x, y), or any other plotting function, depending on your requirements. Hope this helps.
https://matplotlib.org/
import holoviews as hv
import pandas as pd
import numpy as np
data=pd.read_csv("C:/Users/Nisarg.Bhatt/Documents/data.csv", engine="python")
train=data.groupby(["versionCreated"])["Polarity Score"].mean()
table=hv.Table(train)
print(table)
bar=hv.Bars(table).opts(plot=dict(width=1500))
renderer = hv.renderer('bokeh')
app = renderer.app(bar)
print(app)
from bokeh.server.server import Server
server = Server({'/': app}, port=0)
server.start()
server.show("/")
This uses HoloViews, a library for visualisation. If you are building a professional application, you should definitely try it. Here versionCreated is a date and Polarity Score plays the same role as your count. Try this.
OR, if you want to stick to matplotlib try this:
fig, ax = plt.subplots(figsize=(16,9))
ax.plot(msft.index, msft, label='MSFT')
ax.plot(short_rolling_msft.index, short_rolling_msft, label='20 days rolling')
ax.plot(long_rolling_msft.index, long_rolling_msft, label='100 days rolling')
ax.set_xlabel('Date')
ax.set_ylabel('Adjusted closing price ($)')
ax.legend()
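Note that msft, short_rolling_msft and long_rolling_msft are not defined in this snippet; they appear to be date-indexed pandas Series, so substitute your own data.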
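One other possible cause worth checking, offered as a guess since the CSV is not shown: if the file holds one row per month, resample('D').mean() produces a daily series that is NaN on every day without an observation. Matplotlib does not connect points across NaN, so isolated values drawn as a plain line leave the axes looking empty. The sketch below uses made-up monthly counts to show the effect and one way around it.

```python
import pandas as pd
import matplotlib.pyplot as plt

# Made-up monthly passenger counts, one row per month.
monthly = pd.Series(
    [112, 118, 132, 129],
    index=pd.date_range("1949-01-01", periods=4, freq="MS"),
    name="Count",
)

# Upsampling to daily leaves NaN on every day that has no observation.
daily = monthly.resample("D").mean()
n_missing = int(daily.isna().sum())

fig, ax = plt.subplots(figsize=(15, 8))
# A line cannot be drawn through NaN, so drop the gaps before plotting
# (or simply plot `monthly` without resampling).
daily.dropna().plot(ax=ax, marker="o", title="Result", fontsize=14)
```

Printing daily.head() on your own data is a quick way to see whether this applies: if most rows are NaN, skip the daily resampling or drop the missing values before plotting.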
