I am coding a project for school (A-Level) and need to be able to download stock data and chart it. I can already chart the data using matplotlib, but I am only allowed to use a limited number of libraries.
I need to get the data without importing another library, but I haven't been able to. I've tried downloading from https://query1.finance.yahoo.com/v7/finance/download/ticker, but the crumb value keeps changing, so I keep getting errors about a wrong cookie.
How can I fix this? Or is there an easier site to get the data from?
My code:
import requests
r = requests.get("https://query1.finance.yahoo.com/v7/finance/download/…")
with open("MSFT.csv", "w") as file:
    file.write(r.text)
Download the data from https://datahub.io, or subscribe to a real-time data feed from a third-party vendor for the exchange you need.
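If the restriction is on third-party libraries, one option, assuming you first save a CSV file to disk (for example from datahub.io), is to parse it with the standard-library csv module and pass the result straight to matplotlib, which you can already use. This is a minimal sketch: the column names Date and Close, the YYYY-MM-DD date format and the file name MSFT.csv are assumptions about the downloaded file, so adjust them to match yours.

```python
import csv
from datetime import datetime

import matplotlib.pyplot as plt


def read_prices(lines, column="Close"):
    """Read (dates, prices) from CSV text with a header row.

    `lines` can be an open file or any iterable of CSV lines.
    Assumes a 'Date' column in YYYY-MM-DD format; rows whose price
    is missing or not a number (e.g. 'null') are skipped.
    """
    dates, prices = [], []
    for row in csv.DictReader(lines):
        try:
            price = float(row[column])
        except (ValueError, TypeError):
            continue
        dates.append(datetime.strptime(row["Date"], "%Y-%m-%d"))
        prices.append(price)
    return dates, prices


def plot_csv(path, column="Close"):
    """Plot one price column of a CSV file saved on disk."""
    with open(path, newline="") as f:
        dates, prices = read_prices(f, column)
    plt.plot(dates, prices)
    plt.ylabel(column)
    plt.show()
```

Usage: `plot_csv("MSFT.csv")`. Only csv and datetime from the standard library are needed besides matplotlib.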
You said you are 'only allowed to use a certain number of libraries.' What does that mean? You should normally be able to use whatever libraries you need, right? Run the script below; it downloads stock data from Yahoo and plots the time series.
import pandas as pd
import matplotlib.pyplot as plt
from pandas_datareader import data as wb

start = '2019-4-30'
end = '2019-10-31'
# Alternatively, use the last N days:
# from datetime import datetime, timedelta
# N = 90
# start = datetime.now() - timedelta(days=N)
# end = datetime.today()
tickers = ['MMM', 'ABT', 'ABBV', 'ABMD', 'AAPL',
           'XEL', 'XRX', 'XLNX', 'XYL', 'YUM', 'ZBH', 'ZION', 'ZTS']

# Download the adjusted close for each ticker and tag the rows with the ticker name
price_data = []
for ticker in tickers:
    prices = wb.DataReader(ticker, start=start, end=end, data_source='yahoo')[['Adj Close']]
    price_data.append(prices.assign(ticker=ticker)[['ticker', 'Adj Close']])
df = pd.concat(price_data)
print(df.dtypes)
print(df.head())
print(df.shape)

pd.set_option('display.max_columns', 500)
df = df.reset_index()
df = df.set_index('Date')
table = df.pivot(columns='ticker')
# By specifying col[1] in the list comprehension below,
# you select the stock names under the multi-level columns
table.columns = [col[1] for col in table.columns]
print(table.head())

plt.figure(figsize=(14, 7))
for c in table.columns.values:
    plt.plot(table.index, table[c], lw=3, alpha=0.8, label=c)
plt.legend(loc='upper left', fontsize=12)
plt.ylabel('price in $')
plt.show()
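If the multi-level columns and the col[1] step are confusing, passing values= to pivot gives one flat column per ticker directly. The sketch below uses a few made-up prices instead of a download so that it runs offline; only the layout (a Date index plus ticker and Adj Close columns) mirrors what the loop above builds.

```python
import matplotlib.pyplot as plt
import pandas as pd

# Made-up prices in the same long layout as the loop above:
# one row per (Date, ticker), indexed by Date
dates = pd.to_datetime(["2019-05-01", "2019-05-02"] * 2)
long_df = pd.DataFrame(
    {"ticker": ["MMM", "MMM", "ABT", "ABT"],
     "Adj Close": [170.0, 171.5, 76.0, 75.2]},
    index=pd.Index(dates, name="Date"),
)

# Naming the values column keeps the result's columns flat (one per ticker)
table = long_df.pivot(columns="ticker", values="Adj Close")

ax = table.plot(figsize=(14, 7), lw=3, alpha=0.8)
ax.set_ylabel("price in $")
ax.legend(loc="upper left", fontsize=12)

if __name__ == "__main__":
    plt.show()
```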
Related
I have the following code:
import pandas as pd
from pandas import DataFrame as df
import matplotlib
from pandas_datareader import data as web
import matplotlib.pyplot as plt
import datetime
import warnings
warnings.filterwarnings("ignore")
start = datetime.date(2020,1,1)
end = datetime.date.today()
stock = 'fb'
data = web.DataReader(stock, 'yahoo', start, end)
data.index = pd.to_datetime(data.index, format ='%Y-%m-%d')
data = data[~data.index.duplicated(keep='first')]
data['year'] = data.index.year
data['month'] = data.index.month
data['week'] = data.index.isocalendar().week  # DatetimeIndex.week was removed in pandas 2.0
data['day'] = data.index.day
data.set_index('year', append=True, inplace =True)
data.set_index('month',append=True,inplace=True)
data.set_index('week',append=True,inplace=True)
data.set_index('day',append=True,inplace=True)
fig, ax = plt.subplots(dpi=300, figsize =(30,4))
data.plot(y='Close', ax=ax, xlabel= 'Date')
plt.show()
How can I show the dates from the MultiIndex on the x-axis in a more readable year-month format, such as strftime('%y-%m')? A similar question was asked here: Renaming months from number to name in pandas
But I can't see how to use that to relabel the x-axis. Any help would be appreciated.
You can use the matplotlib.dates module. See the following link for more details:
https://matplotlib.org/stable/api/dates_api.html#matplotlib.dates.ConciseDateFormatter
Here is the modified code:
import pandas as pd
from pandas import DataFrame as df
import matplotlib
from pandas_datareader import data as web
import matplotlib.pyplot as plt
import datetime
import warnings
warnings.filterwarnings("ignore")
from matplotlib import dates as mdates
start = datetime.date(2020,1,1)
end = datetime.date.today()
stock = 'fb'
data = web.DataReader(stock, 'yahoo', start, end)
data.index = pd.to_datetime(data.index, format ='%Y-%m-%d')
data = data[~data.index.duplicated(keep='first')]
data['year'] = data.index.year
data['month'] = data.index.month
data['week'] = data.index.isocalendar().week  # DatetimeIndex.week was removed in pandas 2.0
data['day'] = data.index.day
data.set_index('year', append=True, inplace =True)
data.set_index('month',append=True,inplace=True)
data.set_index('week',append=True,inplace=True)
data.set_index('day',append=True,inplace=True)
fig, ax = plt.subplots(dpi=300, figsize =(15,4))
plt.plot(data.index.get_level_values('Date'), data['Close'])
#--------------------------------------
#Feel free to try different options
#--------------------------------------
#locator = mdates.AutoDateLocator()
locator = mdates.MonthLocator()
formatter = mdates.ConciseDateFormatter(locator)
ax.xaxis.set_major_locator(locator)
ax.xaxis.set_major_formatter(formatter)
plt.show()
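If you want labels in a strftime-style form such as '%y-%m', DateFormatter accepts a format string directly. This is a self-contained sketch with made-up prices in place of the DataReader call; as in the answer, plotting against a plain DatetimeIndex (not the MultiIndex) is what lets matplotlib treat the x values as dates.

```python
import matplotlib.dates as mdates
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd

# Made-up daily closes standing in for the downloaded 'Close' column
idx = pd.date_range("2020-01-01", "2020-12-31", freq="D")
close = pd.Series(np.linspace(200, 270, len(idx)), index=idx)

fig, ax = plt.subplots(figsize=(15, 4))
ax.plot(close.index, close.values)

# One major tick per month, labelled like strftime('%y-%m'), e.g. '20-03'
ax.xaxis.set_major_locator(mdates.MonthLocator())
ax.xaxis.set_major_formatter(mdates.DateFormatter("%y-%m"))
fig.autofmt_xdate()  # rotate the labels so they don't overlap

if __name__ == "__main__":
    plt.show()
```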
I have this part of the code done, but I would like to be able to add more columns, such as Volume, Open and High.
import pandas as pd
import numpy as np
import pandas_datareader.data as web
from datetime import datetime
import matplotlib.pyplot as plt
plt.style.use("fivethirtyeight")
from pandas.plotting import register_matplotlib_converters
register_matplotlib_converters()

assets = ["LAC", "NIO"]
weights = np.array([0.5, 0.5])
stockStartDate = "2016-01-01"
today = datetime.today().strftime("%Y-%m-%d")
df = pd.DataFrame()
for stock in assets:
    df[stock] = web.DataReader(stock, data_source="yahoo", start=stockStartDate, end=today)["Adj Close"]
I'm not sure exactly what you want to do, but in essence, if you want the flexibility to visualize more variables, it's easier to keep the data in long format, with a column (I used stock below) indicating which stock each row comes from:
import pandas as pd
import pandas_datareader.data as web
import datetime

assets = ["LAC", "NIO"]
stockStartDate = "2016-01-01"
today = datetime.date.today().strftime("%Y-%m-%d")

# Download each stock, tag its rows with the ticker, then stack them
df = []
for stock in assets:
    x = web.DataReader(stock, data_source="yahoo",
                       start=stockStartDate, end=today)
    x['stock'] = stock
    df.append(x)
df = pd.concat(df)
Then pivot it on the fly to plot:
df.pivot(values='High',columns='stock').plot.line()
Or use seaborn:
import seaborn as sns
sns.lineplot(x = df.index,y = "High",hue = 'stock',data=df)
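To show several variables such as Open, High and Volume at once, one approach is a subplot per variable, each drawn from its own pivot of the long-format data. The sketch uses a small made-up frame with the same layout (a stock column plus the price columns) so it runs without a download; Volume gets its own axis because its scale is far larger than the prices.

```python
import matplotlib.pyplot as plt
import pandas as pd

# Made-up long-format data: one row per (date, stock), like the loop above builds
dates = pd.to_datetime(["2021-01-04", "2021-01-05"])
df = pd.concat([
    pd.DataFrame({"Open": [20.0, 21.0], "High": [22.0, 23.0],
                  "Volume": [1e6, 2e6], "stock": "LAC"}, index=dates),
    pd.DataFrame({"Open": [50.0, 52.0], "High": [54.0, 55.0],
                  "Volume": [9e7, 8e7], "stock": "NIO"}, index=dates),
])

columns = ["Open", "High", "Volume"]
fig, axes = plt.subplots(len(columns), 1, sharex=True, figsize=(10, 8))
for ax, col in zip(axes, columns):
    # One wide table per variable: rows are dates, columns are stocks
    df.pivot(columns="stock", values=col).plot(ax=ax, title=col)
fig.tight_layout()

if __name__ == "__main__":
    plt.show()
```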
Here is my code:
import pandas as pd
import io
data="""
;Barcode;Created;Hash;Modified;Tag;Tag2
0;9780735711020;2019-02-22T22:35:06.628Z;None;2019-02-22T22:35:06.628Z;NEED_PICS;
1;3178041328890;2019-02-22T22:37:44.546Z;None;2019-02-22T22:37:44.546Z;DISPLAY;
2;8718951129597;2019-02-23T04:53:17.925Z;None;2019-02-23T04:53:17.925Z;DISPLAY;
3;3770006053078;2019-02-23T05:25:56.454Z;None;2019-02-23T05:25:56.454Z;DISPLAY;
4;3468080404892;2019-02-23T05:26:39.923Z;None;2019-02-23T05:26:39.923Z;NEED_PICS;
5;3517360013757;2019-02-23T05:27:24.910Z;None;2019-02-23T05:27:24.910Z;DISPLAY;
6;3464660000768;2019-02-23T05:27:51.379Z;None;2019-02-23T05:27:51.379Z;DISPLAY;
7;30073357;2019-02-23T06:20:53.075Z;None;2019-02-23T06:20:53.075Z;NEED_PICS;
8;02992;2019-02-23T06:22:57.326Z;None;2019-02-23T06:22:57.326Z;NEED_PICS;
9;3605532558776;2019-02-23T06:23:45.010Z;None;2019-02-23T06:23:45.010Z;NEED_PICS;
10;3605532558776;2019-02-23T06:23:48.291Z;None;2019-02-23T06:23:48.291Z;NEED_PICS;
11;3605532558776;2019-02-23T06:23:52.579Z;None;2019-02-23T06:23:52.579Z;NEED_PICS;
"""
from io import StringIO
TESTDATA = StringIO(data)
df = pd.read_csv(TESTDATA, sep=";")
df["Created"] = pd.to_datetime(df["Created"],errors='coerce')
df["Barcode"] = df["Barcode"].astype(str)
df.set_index(df.columns[0], inplace=True)
df2 = df #df[df.Hash != "None"]
df3 = df2
df3 = df3.loc[df3.Tag == "DISPLAY"]
df = df2.merge(df3, on='Created', how='outer').fillna(0)
df['sum'] = df['Barcode_x']+df['Barcode_y']
df.plot(df['sum'], df['Created'])
In the end, I am trying to plot two line graphs on the same plot.
I would like to group by day the number of occurrences in two DataFrames: df2 with all tags, and df3 with only the DISPLAY tag.
Then I would like to plot two lines on the same graph: one with all occurrences per day over the whole period, and one with the occurrences of the DISPLAY tag only.
So far I have only managed to get this:
import io
import matplotlib.dates as mdates
import matplotlib.pyplot as plt
import pandas as pd
data="""
;Barcode;Created;Hash;Modified;Tag;Tag2
0;9780735711020;2019-02-22T22:35:06.628Z;None;2019-02-22T22:35:06.628Z;NEED_PICS;
1;3178041328890;2019-02-22T22:37:44.546Z;None;2019-02-22T22:37:44.546Z;DISPLAY;
2;8718951129597;2019-02-23T04:53:17.925Z;None;2019-02-23T04:53:17.925Z;DISPLAY;
3;3770006053078;2019-02-23T05:25:56.454Z;None;2019-02-23T05:25:56.454Z;DISPLAY;
4;3468080404892;2019-02-23T05:26:39.923Z;None;2019-02-23T05:26:39.923Z;NEED_PICS;
5;3517360013757;2019-02-23T05:27:24.910Z;None;2019-02-23T05:27:24.910Z;DISPLAY;
6;3464660000768;2019-02-23T05:27:51.379Z;None;2019-02-23T05:27:51.379Z;DISPLAY;
7;30073357;2019-02-23T06:20:53.075Z;None;2019-02-23T06:20:53.075Z;NEED_PICS;
8;02992;2019-02-23T06:22:57.326Z;None;2019-02-23T06:22:57.326Z;NEED_PICS;
9;3605532558776;2019-02-23T06:23:45.010Z;None;2019-02-23T06:23:45.010Z;NEED_PICS;
10;3605532558776;2019-02-23T06:23:48.291Z;None;2019-02-23T06:23:48.291Z;NEED_PICS;
11;3605532558776;2019-02-23T06:23:52.579Z;None;2019-02-23T06:23:52.579Z;NEED_PICS;
"""
TESTDATA = io.StringIO(data)
df = pd.read_csv(TESTDATA, sep=";")
df["Created"] = pd.to_datetime(df["Created"], errors='coerce').dt.date
df["Barcode"] = df["Barcode"].astype(str)

# Count rows per day: all tags, and DISPLAY only
df1 = df.groupby(["Created"])["Tag"].count().reset_index()
df2 = df[df["Tag"] == "DISPLAY"].groupby(["Created"])["Tag"].count().reset_index()

fig, ax = plt.subplots()
# Dates on the x-axis, counts on the y-axis
ax.plot(df1['Created'], df1['Tag'], label='ALL')
ax.plot(df2['Created'], df2['Tag'], label="DISPLAY")
# Custom date formatting for the x-axis
ax.xaxis.set_major_formatter(mdates.DateFormatter('%Y-%m-%d'))
ax.legend(loc='upper left')
plt.show()
Note that since there isn't much data, I have dropped the time part of the timestamps:
df["Created"] = pd.to_datetime(df["Created"],errors='coerce').dt.date
You can adjust this depending on whether you want to bucket the tags by date, by date and hour, by date, hour and minute, and so on.
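To bucket by hour (or minute) instead of by day, one option is to floor the timestamps with Series.dt.floor and count the rows per bucket; putting both counts in one DataFrame also lets a single .plot() call draw the two lines on the same axes. This sketch uses four rows taken from the question's data; the freq value controls the bucket size.

```python
import matplotlib.pyplot as plt
import pandas as pd

# Four rows from the question's data (Created timestamp and Tag only)
df = pd.DataFrame({
    "Created": pd.to_datetime([
        "2019-02-23T04:53:17.925Z", "2019-02-23T05:25:56.454Z",
        "2019-02-23T05:26:39.923Z", "2019-02-23T06:20:53.075Z",
    ]),
    "Tag": ["DISPLAY", "DISPLAY", "NEED_PICS", "NEED_PICS"],
})

freq = "h"  # bucket size: "D" = day, "h" = hour, "min" = minute
df["Bucket"] = df["Created"].dt.floor(freq)

# Align both counts on the same buckets; buckets with no DISPLAY rows become 0
counts = pd.DataFrame({
    "ALL": df.groupby("Bucket").size(),
    "DISPLAY": df[df["Tag"] == "DISPLAY"].groupby("Bucket").size(),
}).fillna(0).astype(int)

ax = counts.plot(marker="o")
ax.set_ylabel("Number of rows")

if __name__ == "__main__":
    plt.show()
```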
I am new to analytics, Python and machine learning, and I am working on time-series forecasting. With the following code I get values for the train and test data, but the graph is plotted blank.
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import statsmodels.tsa.api as ExponentialSmoothing
#Importing data
df = pd.read_csv('international-airline-passengers - Copy.csv')
#Printing head
print(df.head())
#Printing tail
print(df.tail())
df = pd.read_csv('international-airline-passengers - Copy.csv', nrows = 11856)
#Creating train and test set
#Index 10392 marks the end of October 2013
train=df[0:20]
test=df[20:]
#Aggregating the dataset at daily level
df.Timestamp = pd.to_datetime(df.Month,format='%m/%d/%Y %H:%M')
df.index = df.Timestamp
df = df.resample('D').mean()
train.Timestamp = pd.to_datetime(train.Month,format='%m/%d/%Y %H:%M')
print('1')
print(train.Timestamp)
train.index = train.Timestamp
train = train.resample('D').mean()
test.Timestamp = pd.to_datetime(test.Month,format='%m/%d/%Y %H:%M')
test.index = test.Timestamp
test = test.resample('D').mean()
train.Count.plot(figsize=(15,8), title= 'Result', fontsize=14)
test.Count.plot(figsize=(15,8), title= 'Result', fontsize=14)
plt.show()
I can't understand why the graph is blank even though train and test contain values.
Thanks in advance.
I think I found the issue. You are calling train.Count.plot while nothing has been stored in plt yet. If you go through the matplotlib documentation (linked below), you will see that you need to give plt something to plot first; since plt is empty here, you get back an empty plot.
Basically, you are not plotting anything and are just showing a blank figure.
For example, use plt.plot(x, y) or plt.scatter(x, y), or whichever function fits your requirements. Hope this helps.
https://matplotlib.org/
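Another thing worth checking, assuming the Month column holds one timestamp per month (as in the usual airline-passengers data): resample('D').mean() creates a row for every calendar day, and every day without an observation becomes NaN. Matplotlib breaks a line at each NaN, so isolated values surrounded by NaN draw no visible line, which looks like an empty plot. A small sketch with illustrative values:

```python
import matplotlib.pyplot as plt
import pandas as pd

# Illustrative monthly series: one value per month
monthly = pd.Series(
    [112.0, 118.0, 132.0],
    index=pd.to_datetime(["1949-01-01", "1949-02-01", "1949-03-01"]),
)

# Upsampling to daily frequency leaves NaN on every day without data
daily = monthly.resample("D").mean()
print(daily.notna().sum(), "of", len(daily), "daily rows have a value")

# Dropping the empty days (or simply not resampling to 'D') gives a visible line
fig, ax = plt.subplots()
daily.dropna().plot(ax=ax, label="daily, NaN rows dropped")
ax.legend()

if __name__ == "__main__":
    plt.show()
```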
import holoviews as hv
import pandas as pd
import numpy as np
data=pd.read_csv("C:/Users/Nisarg.Bhatt/Documents/data.csv", engine="python")
train=data.groupby(["versionCreated"])["Polarity Score"].mean()
table=hv.Table(train)
print(table)
bar=hv.Bars(table).opts(plot=dict(width=1500))
renderer = hv.renderer('bokeh')
app = renderer.app(bar)
print(app)
from bokeh.server.server import Server
server = Server({'/': app}, port=0)
server.start()
server.show("/")
# When run as a plain script, keep the Bokeh server's event loop running
server.io_loop.start()
This uses HoloViews, a library for data visualisation. If you are building a professional application, it is well worth trying. Here versionCreated is the date and Polarity Score plays the role of the count.
Or, if you want to stick with matplotlib, try this:
import matplotlib.pyplot as plt

# msft, short_rolling_msft and long_rolling_msft are assumed to be existing pandas Series
fig, ax = plt.subplots(figsize=(16,9))
ax.plot(msft.index, msft, label='MSFT')
ax.plot(short_rolling_msft.index, short_rolling_msft, label='20 days rolling')
ax.plot(long_rolling_msft.index, long_rolling_msft, label='100 days rolling')
ax.set_xlabel('Date')
ax.set_ylabel('Adjusted closing price ($)')
ax.legend()
plt.show()
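The matplotlib snippet relies on msft, short_rolling_msft and long_rolling_msft being defined elsewhere. Judging by the labels, they are presumably the adjusted-close series and its 20- and 100-day rolling means; here is a self-contained sketch under that assumption, with made-up prices in place of real MSFT data.

```python
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd

# Made-up adjusted closes standing in for downloaded MSFT data (an assumption)
idx = pd.date_range("2019-01-01", periods=300, freq="B")
rng = np.random.default_rng(0)
msft = pd.Series(100 + np.cumsum(rng.normal(0, 1, len(idx))), index=idx)

# 20- and 100-day simple moving averages; the first window-1 values are NaN
short_rolling_msft = msft.rolling(window=20).mean()
long_rolling_msft = msft.rolling(window=100).mean()

fig, ax = plt.subplots(figsize=(16, 9))
ax.plot(msft.index, msft, label="MSFT")
ax.plot(short_rolling_msft.index, short_rolling_msft, label="20 days rolling")
ax.plot(long_rolling_msft.index, long_rolling_msft, label="100 days rolling")
ax.set_xlabel("Date")
ax.set_ylabel("Adjusted closing price ($)")
ax.legend()

if __name__ == "__main__":
    plt.show()
```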
Newbie question, thank you in advance!
I'm trying to group the data by both date and industry and display a chart that shows each industry's revenue across the time series in monthly increments.
I am working from a SQL export that has timestamps, and I'm having a hard time getting this to work.
Posted sample csv data file here:
https://drive.google.com/open?id=0B4xdnV0LFZI1WGRMN3AyU2JERVU
Here's a small data example:
Industry      Date                 Revenue
Fast Food     01-05-2016 12:18:02  100
Fine Dining   01-08-2016 09:17:48  110
Carnivals     01-18-2016 10:48:52  200
My failed attempt is here:
import pandas as pd
import datetime
import matplotlib.pyplot as plt
df = pd.read_csv('2012_to_12_27_2016.csv')
df['Ship_Date'] = pd.to_datetime(df['Ship_Date'], errors = 'coerce')
df['Year'] = df.Ship_Date.dt.year
df['Ship_Date'] = pd.DatetimeIndex(df.Ship_Date).normalize()
df.index = df['Ship_Date']
df_skinny = df[['Shipment_Piece_Revenue', 'Industry']]
groups = df_skinny[['Shipment_Piece_Revenue', 'Industry']].groupby('Industry')
groups = groups.resample('M').sum()
groups.index = df['Ship_Date']
fig, ax = plt.subplots()
groups.plot(ax=ax, legend=False)
names = [item[0] for item in groups]
ax.legend(ax.lines, names, loc='best')
plt.show()
You could use Series.unique to get a list of all industries and then, using DataFrame.loc, define a new DataFrame that only contains the rows for a single industry.
Then, if we set the Ship Date column as the index of the new DataFrame, we can use resample with a monthly frequency ('M') and call sum() to get the total revenue for each month.
import pandas as pd
import matplotlib.pyplot as plt

df = pd.read_csv('Graph_Sample_Data.csv')
df['Ship Date'] = pd.to_datetime(df['Ship Date'], errors='coerce')

fig, ax = plt.subplots()
for industry in df.Industry.unique():
    # Rows for this industry only, indexed by ship date
    industry_df = df.loc[df.Industry == industry].set_index('Ship Date')
    # Total revenue per month
    monthly = industry_df['Revenue'].resample('M').sum()
    monthly.plot(ax=ax, label=industry)
ax.legend()
plt.show()
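An alternative to looping over industries is a single groupby on a monthly pd.Grouper plus the Industry column, followed by unstack so that each industry becomes a column and one .plot() call draws all the lines. This sketch uses the three sample rows from the question with the Ship Date column name from the answer; it uses 'MS' (buckets labelled by the first day of each month) because the plain 'M' alias is deprecated in recent pandas in favour of 'ME'.

```python
import matplotlib.pyplot as plt
import pandas as pd

# The three sample rows from the question
df = pd.DataFrame({
    "Industry": ["Fast Food", "Fine Dining", "Carnivals"],
    "Ship Date": ["01-05-2016 12:18:02", "01-08-2016 09:17:48", "01-18-2016 10:48:52"],
    "Revenue": [100, 110, 200],
})
df["Ship Date"] = pd.to_datetime(df["Ship Date"], format="%m-%d-%Y %H:%M:%S")

# Group by month and industry in one pass, then spread industries into columns
monthly = (
    df.groupby([pd.Grouper(key="Ship Date", freq="MS"), "Industry"])["Revenue"]
    .sum()
    .unstack("Industry", fill_value=0)
)

# Markers keep single-month series visible (a line needs two points)
ax = monthly.plot(marker="o")
ax.set_ylabel("Revenue")

if __name__ == "__main__":
    plt.show()
```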