Python DataFrames: putting multiple dataframes into a single DataFrame - python

Import the required modules
import pandas.io.data as web
import datetime
import pandas as pd
import matplotlib.pyplot as plt
# Enable inline plotting
%matplotlib inline
Set the date range and the list of stock codes
start = datetime.datetime(2016, 1, 1)
end = datetime.datetime.now()
Stock_List = ('BHP.Ax','AMP.AX','PRR.AX')
Stock_Code = "BHP.AX"
df_Stock_Code = pd.DataFrame()
Results = pd.DataFrame()
Loop through the codes and get the data
for Stock_Code in Stock_List:
    # Queries yahoo website for asx code from a start and end date
    f = web.DataReader(Stock_Code, 'yahoo', start, end)
    f['Stock_Code'] = Stock_Code
    df_Stock_Code = f
Concatenate all of the data frames into one, but it fails for some reason
Results = [Results,df_Stock_Code]
df_Results = pd.Concat(Results)

In the original code, df_Stock_Code is treated as a list of dataframes. However, in the for loop, the current dataframe is being assigned to it rather than appending the latest dataframe to it.
The result is that in the concatenation step, df_Stock_Code is just a single dataframe referring to the last stock in your Stock_List. Results is also never used in the loop and is just an empty dataframe from when it was initialized at the start.
Try this instead:
start = datetime.datetime(2016, 1, 1)
end = datetime.datetime.now()
Stock_List = ('BHP.Ax','AMP.AX','PRR.AX')
Results = []
for Stock_Code in Stock_List:
    # Queries yahoo website for asx code from a start and end date
    f = web.DataReader(Stock_Code, 'yahoo', start, end)
    f['Stock_Code'] = Stock_Code
    Results.append(f)  # list.append returns None, so don't assign its result
df_Results = pd.concat(Results)
print df_Results
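The corrected pattern can be demonstrated without a live Yahoo query (pandas.io.data has long been retired); a minimal sketch with small synthetic frames standing in for the DataReader results:

```python
import pandas as pd

# Build one small frame per "stock" to stand in for the DataReader results
frames = []
for code in ['BHP.AX', 'AMP.AX', 'PRR.AX']:
    f = pd.DataFrame({'Close': [1.0, 2.0]})
    f['Stock_Code'] = code   # tag each frame with its code
    frames.append(f)         # accumulate the frames in a list...

df_results = pd.concat(frames, ignore_index=True)  # ...and concatenate once
print(df_results)
```

The key point is that the list grows inside the loop and pd.concat runs exactly once, after the loop.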

Related

Is it possible to create a master dataframe for many companies from yfinance?

My objectives are to:
Get Yahoo finance OHLC (Open, High, Low, and Close) data into Postgres.
Being able to update the data easily.
Being able to easily add or remove tickers.
My current methodology:
Create pandas dataframe.
Dump data to .csv
From Postgres COPY
ISSUE:
I do not know how to create a dataframe for company A, then append (merge, join, concat, etc.) the dataframes for the other companies (~150 companies so far) and dump to .csv.
Below are my actual code and a workaround that provides for the desired result but is clunky.
Let me know what you think.
ACTUAL (not working as expected)
import pandas as pd
import yfinance as yf
tickers = ['VIR','PATH']
#ticker = ['VIR']
for ticker in tickers:
    df_yahoo = yf.download(ticker,
                           #start='2000-01-01',
                           #end='2010-12-31',
                           progress='True')
    df = pd.DataFrame(df_yahoo)
    df.insert(0, 'TICKER', ticker)
file_name = "/Users/kevin/Dropbox/Programming/Python/test_data/deleteme.csv"
df.to_csv(file_name)
print(df)
WORKAROUND (working)
import pandas as pd
import yfinance as yf
import pickle
tickers = ['VIR']
#ticker = ['VIR']
for ticker in tickers:
    df_yahoo = yf.download(ticker,
                           #start='2000-01-01',
                           #end='2010-12-31',
                           progress='True')
    df = pd.DataFrame(df_yahoo)
    df.insert(0, 'TICKER', ticker)
tickers = ['PATH']
#ticker = ['VIR']
for ticker in tickers:
    df_yahoo = yf.download(ticker,
                           #start='2000-01-01',
                           #end='2010-12-31',
                           progress='True')
    df1 = pd.DataFrame(df_yahoo)
    df1.insert(0, 'TICKER', ticker)
frames = [df1, df]
result = pd.concat(frames)
file_name = "/Users/kevin/Dropbox/Programming/Python/test_data/deleteme.csv"
result.to_csv(file_name)
print(df)
Given what I think you want to accomplish, this is how I would do it:
# Create a function to load the data and create the frame
# Assumes len(tickers) >= 1
def build_df(tickers):
    df = pd.DataFrame(yf.download(tickers[0],
                                  #start='2000-01-01',
                                  #end='2010-12-31',
                                  progress='True'))
    df.insert(0, 'TICKER', tickers[0])
    for ticker in tickers[1:]:
        dx = pd.DataFrame(yf.download(ticker,
                                      #start='2000-01-01',
                                      #end='2010-12-31',
                                      progress='True'))
        dx.insert(0, 'TICKER', ticker)
        df = pd.concat([df, dx])
    return df
Then call the function to assemble the desired DF as follows:
result = build_df(tickers)
Finally, output the completed frame to CSV
file_name = "/Users/kevin/Dropbox/Programming/Python/test_data/deleteme.csv"
result.to_csv(file_name)
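As a side note, the same tagging can also be done with pd.concat's keys argument instead of inserting a TICKER column. A minimal sketch, with dummy frames standing in for the yf.download calls so it runs offline:

```python
import pandas as pd

tickers = ['VIR', 'PATH']
# Dummy per-ticker frames standing in for the yf.download results
frames = {t: pd.DataFrame({'Close': [10.0, 11.0]}) for t in tickers}

# keys= builds a 'TICKER' index level rather than a data column
result = pd.concat(frames.values(), keys=tickers, names=['TICKER', 'Row'])
print(result)
```

With the ticker in the index, per-company slices come out via result.loc['VIR'].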

Using .pivot() after saving and loading from CSV causes KeyError

I am trying to pull data from Yahoo! Finance for analysis and am having trouble when I want to read from a CSV file instead of downloading from Yahoo! every time I run the program.
import pandas_datareader as pdr
import pandas as pd
import matplotlib.pyplot as plt
import numpy as np
import datetime
def get(tickers, startdate, enddate):
    def data(ticker):
        return pdr.get_data_yahoo(ticker, start = startdate, end = enddate)
    datas = map(data, tickers)
    return(pd.concat(datas, keys = tickers, names = ['Ticker', 'Date']))
tickers = ['AAPL', 'MSFT', 'GOOG']
all_data = get(tickers, datetime.datetime(2006, 10,1), datetime.datetime(2018, 1, 7))
all_data.to_csv('data/alldata.csv')
#Open file
all_data_csv = pd.read_csv('data/alldata.csv', header = 0, index_col = 'Date', parse_dates = True)
daily_close = all_data[['Adj Close']].reset_index().pivot('Date', 'Ticker', 'Adj Close')
I'm having problems with the 'daily_close' section. The above code works as it is using 'all_data' which comes directly from the web. How do I alter the bottom line of code so that the data is being pulled from my csv file? I have tried daily_close = all_data_csv[['Adj Close']].reset_index().pivot('Date', 'Ticker', 'Adj Close') however this results in a KeyError due to 'Ticker'.
The csv data is in the following format, with the first column containing all of the tickers:
Your current code for all_data_csv will not work as it did for all_data. This is a consequence of the fact that all_data contains a MultiIndex with all the information needed to carry out the pivot.
However, in the case of all_data_csv, the only index is Date. So, we'd need to do a little extra in order to get this to work.
First, reset the Date index
Select only the columns you need - ['Date', 'Ticker', 'Adj Close']
Now, pivot on these columns
c = ['Date', 'Ticker', 'Adj Close']
daily_close = all_data_csv.reset_index('Date')[c].pivot(*c)
daily_close.head()
Ticker AAPL GOOG MSFT
Date
2006-10-02 9.586717 199.422943 20.971155
2006-10-03 9.486828 200.714539 20.978823
2006-10-04 9.653308 206.506866 21.415722
2006-10-05 9.582876 204.574448 21.400393
2006-10-06 9.504756 208.891357 21.362070
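The same pivot can be reproduced on a small hand-made frame shaped like the loaded CSV (Date as index, Ticker and Adj Close as ordinary columns); note that recent pandas versions require pivot's arguments as keywords:

```python
import pandas as pd

flat = pd.DataFrame({
    'Date': pd.to_datetime(['2006-10-02', '2006-10-02',
                            '2006-10-03', '2006-10-03']),
    'Ticker': ['AAPL', 'MSFT', 'AAPL', 'MSFT'],
    'Adj Close': [9.58, 20.97, 9.48, 20.98],
}).set_index('Date')   # mimic read_csv(..., index_col='Date')

# Reset the index so Date becomes an ordinary column, then pivot:
# one row per Date, one column per Ticker, Adj Close as the values.
daily_close = (flat.reset_index()
                   .pivot(index='Date', columns='Ticker', values='Adj Close'))
print(daily_close)
```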

Dataframe to .csv - is only writing last value - Python/Pandas

I'm trying to write a dataframe to a .csv using df.to_csv(). For some reason, it's only writing the last value (the data for the last ticker). It reads through a list of tickers (turtle; all tickers are in the first column) and spits out price data for each ticker. I can print all the data without a problem, but I can't seem to write it to .csv. Any idea why? Thanks
input_file = pd.read_csv("turtle.csv", header=None)
for ticker in input_file.iloc[:,0].tolist():
    data = web.DataReader(ticker, "yahoo", datetime(2011,06,1), datetime(2016,05,31))
    data['ymd'] = data.index
    year_month = data.index.to_period('M')
    data['year_month'] = year_month
    first_day_of_months = data.groupby(["year_month"])["ymd"].min()
    first_day_of_months = first_day_of_months.to_frame().reset_index(level=0)
    last_day_of_months = data.groupby(["year_month"])["ymd"].max()
    last_day_of_months = last_day_of_months.to_frame().reset_index(level=0)
    fday_open = data.merge(first_day_of_months,on=['ymd'])
    fday_open = fday_open[['year_month_x','Open']]
    lday_open = data.merge(last_day_of_months,on=['ymd'])
    lday_open = lday_open[['year_month_x','Open']]
    fday_lday = fday_open.merge(lday_open,on=['year_month_x'])
    monthly_changes = {i:MonthlyChange(i) for i in range(1,13)}
    for index,ym,openf,openl in fday_lday.itertuples():
        month = ym.strftime('%m')
        month = int(month)
        diff = (openf-openl)/openf
        monthly_changes[month].add_change(diff)
    changes_df = pd.DataFrame([monthly_changes[i].get_data() for i in monthly_changes],columns=["Month","Avg Inc.","Inc","Avg.Dec","Dec"])
    CSVdir = r"C:\Users\..."
    realCSVdir = os.path.realpath(CSVdir)
    if not os.path.exists(CSVdir):
        os.makedirs(CSVdir)
    new_file_name = os.path.join(realCSVdir,'PriceData.csv')
    new_file = open(new_file_name, 'wb')
    new_file.write(ticker)
    changes_df.to_csv(new_file)
Use 'a' (append) instead of 'wb', because 'wb' overwrites the data in every iteration of the loop. For the different modes of opening a file, see here.
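A minimal sketch of that append pattern, using small synthetic frames and a temporary file in place of the real price data; the header is written only on the first pass so it is not repeated:

```python
import os
import tempfile
import pandas as pd

out_path = os.path.join(tempfile.gettempdir(), 'price_data_demo.csv')
if os.path.exists(out_path):
    os.remove(out_path)   # start from a clean file

for i, ticker in enumerate(['AAPL', 'MSFT']):
    # Stand-in for the per-ticker changes_df built inside the loop
    changes_df = pd.DataFrame({'Month': [1, 2], 'Ticker': ticker})
    # mode='a' appends; write the column header only on the first pass
    changes_df.to_csv(out_path, mode='a', header=(i == 0), index=False)

combined = pd.read_csv(out_path)
print(combined)
```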

how to overrule / skip items in a list that cause error

Below is the Python script I'm working on to pick out the stocks that meet certain price criteria (as written, tickerlist = [] collects the tickers of the stocks whose max price and min price were >30 and <2 respectively).
import matplotlib.pyplot as plt
import math
import csv
import pandas as pd
import datetime
import pandas.io.data as web
from filesortfunct import filesort
from scipy import stats
from scipy.stats.stats import pearsonr
import numpy as np
import math
dataname= 'NASDAQ.csv' #csv file from which to extract stock tickers
df = pd.read_csv(dataname, sep=',')
df = df[['Symbol']]
df.to_csv(new+dataname, sep=',', index=False)
x=open(new+dataname,'rb') #convert it into a more manageable form
f = csv.reader(x) # csv is binary
Symbol = zip(*f)
print type(Symbol) #list format
Symbol=Symbol[0] #pick out the first column
Symbol = Symbol[1:len(Symbol)] #remove the first row "symbol" header
Symbol= Symbol[0:2] #slicing to choose which stocks to look at
#decide the two dates between which to look at stock prices
start = datetime.datetime.strptime('2/10/2016', '%m/%d/%Y')
end = datetime.datetime.strptime('2/24/2016', '%m/%d/%Y')
#intended to collect indeces and min/max prices
tickerlist=[]
maxpricelist = []
minpricelist =[]
for item in Symbol:
    serious=web.DataReader([item], 'yahoo', start, end)['Adj Close']
    serious2=serious.loc[:, item].tolist() #extract the column of 'Adj Close'
    plt.figure()
    ap = plt.plot(serious2)
    indexmax, valuemax = max(enumerate(serious2))
    indexmin, valuemin = min(enumerate(serious2))
    if valuemax>30 and valuemin<2:
        tickerlist.append(item)
        maxpricelist.append(valuemax)
        minpricelist.append(valuemin)
plt.show()
The issue that I have right now is that some of the stocks on the list have been discontinued, or Yahoo does not have their stock prices listed, I suppose. So, when those stock tickers are included in the slicing, I get the following error message.
RemoteDataError: No data fetched using '_get_hist_yahoo'
Is there a way to bypass that?
Thanks in advance!
-------------Add------------------------
I added except RemoteDataError: as suggested, but I get either an invalid syntax or an unexpected indentation error.
for item in Symbol:
    print item
    serious=web.DataReader([item], 'yahoo', start, end)['Adj Close']
    except RemoteDataError:
    serious2=serious.loc[:, item].tolist() #extract the column of 'Adj Close'
    plt.figure()
    ap = plt.plot(serious2)
    indexmax, valuemax = max(enumerate(serious2))
    indexmin, valuemin = min(enumerate(serious2))
    if valuemax>30 and valuemin<100:
        tickerlist.append(item)
        maxpricelist.append(valuemax)
        minpricelist.append(valuemin)
plt.show()
print tickerlist
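For what it's worth, the except clause has to sit under a try: that wraps the failing call, with continue to skip the bad ticker. A minimal sketch with a hypothetical fetch function (and a stand-in RemoteDataError, since the original pandas.io.data module is gone):

```python
class RemoteDataError(Exception):
    """Stand-in for the error raised when no data is fetched."""

def fetch(item):
    # Hypothetical fetcher: pretend 'DEAD' is a delisted ticker
    if item == 'DEAD':
        raise RemoteDataError("No data fetched")
    return [31.0, 1.5]  # fake Adj Close series

tickerlist = []
for item in ['AAA', 'DEAD', 'BBB']:
    try:
        serious2 = fetch(item)
    except RemoteDataError:
        continue            # skip tickers with no data and move on
    if max(serious2) > 30 and min(serious2) < 2:
        tickerlist.append(item)
print(tickerlist)
```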

For loop after for loop produces wrong output Python

I am trying to use for loops to iterate through some Yahoo Finance data and calculate the returns on the papers. The problem is that I want to do this for different periods, and I have a document containing the different start and end dates. This is the code I have been using:
import pandas as pd
import numpy as np
from pandas.io.data import DataReader
from datetime import datetime
# This function is just used to download the data I want and save
# it to a csv file.
def downloader():
    start = datetime(2005,1,1)
    end = datetime(2010,1,1)
    tickers = ['VIS', 'VFH', 'VPU']
    stock_data = DataReader(tickers, "yahoo", start, end)
    price = stock_data['Adj Close']
    price.to_csv('data.csv')
downloader()
#reads the data into a Pandas DataFrame.
price = pd.read_csv('data.csv', index_col = 'Date', parse_dates = True)
#Creates a Pandas DataFrame that holds multiple dates. The format of this is the same as the format of the dates when I load the full csv file of dates.
inp = [{'start' : datetime(2005,1,3), 'end' : datetime(2005,12,30)},
{'start' : datetime(2005,2,1), 'end' : datetime(2006,1,31)},
{'start' : datetime(2005,3,1), 'end' : datetime(2006,2,28)}]
df = pd.DataFrame(inp)
#Everything above this is not part of the original script, but this
#is just used to replicate the problem I am having.
results = pd.DataFrame()
for index, row in df.iterrows():
    start = row['start']
    end = row['end']
    price_initial = price.ix[start:end]
    for column1 in price_initial:
        price1 = price_initial[column1]
        startprice = price1.ix[end]
        endprice = price1.ix[start]
        momentum_value = (startprice / endprice)-1
        results = results.append({'Ticker' : column1, 'Momentum' : momentum_value}, ignore_index=True)
    results = results.sort(columns = "Momentum", ascending = False).head(1)
    print(results.to_csv(sep= '\t', index=False))
I am not sure what I am doing wrong here, but I suspect it is something about the way I iterate, or the way I save the output from the script.
The output I get is this:
Momentum Ticker
0.16022263953253435 VPU
Momentum Ticker
0.16022263953253435 VPU
Momentum Ticker
0.16022263953253435 VPU
That is clearly not correct. Hope someone can help me get this right.
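One way to see the problem: results is cut down to its single best row at the end of every pass, and the next pass appends to that truncated frame, so the old winner keeps resurfacing. A sketch with synthetic prices that collects one winner per window instead (using .loc in place of the long-deprecated .ix):

```python
import pandas as pd

# Synthetic prices for three tickers over ten days
dates = pd.date_range('2005-01-03', periods=10)
price = pd.DataFrame({'VIS': range(10, 20),
                      'VFH': range(20, 30),
                      'VPU': range(5, 15)}, index=dates)

# Overlapping (start, end) windows, standing in for the dates document
windows = [(dates[0], dates[4]), (dates[2], dates[6]), (dates[4], dates[8])]

winners = []
for start, end in windows:
    window = price.loc[start:end]
    # momentum over the window: last price / first price - 1, per ticker
    momentum = window.iloc[-1] / window.iloc[0] - 1
    best = momentum.idxmax()
    winners.append({'Ticker': best, 'Momentum': momentum[best]})

results = pd.DataFrame(winners)   # one fresh row per window
print(results)
```

Building a plain list of winners and converting it to a DataFrame once, after the loop, avoids mutating results mid-iteration.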
