For loop after for loop produces wrong output - python

I am trying to use for loops to iterate through some Yahoo Finance data and calculate the return of the stocks. The problem is that I want to do this for several different periods, and I have a document containing the different start and end dates. This is the code I have been using:
import pandas as pd
import numpy as np
from pandas.io.data import DataReader
from datetime import datetime
# This function just downloads the data I want and saves
# it to a csv file.
def downloader():
    start = datetime(2005, 1, 1)
    end = datetime(2010, 1, 1)
    tickers = ['VIS', 'VFH', 'VPU']
    stock_data = DataReader(tickers, "yahoo", start, end)
    price = stock_data['Adj Close']
    price.to_csv('data.csv')
downloader()
#reads the data into a Pandas DataFrame.
price = pd.read_csv('data.csv', index_col = 'Date', parse_dates = True)
# Creates a Pandas DataFrame that holds multiple dates. The format of these
# dates is the same as the one in my full csv file of dates.
inp = [{'start': datetime(2005,1,3), 'end': datetime(2005,12,30)},
       {'start': datetime(2005,2,1), 'end': datetime(2006,1,31)},
       {'start': datetime(2005,3,1), 'end': datetime(2006,2,28)}]
df = pd.DataFrame(inp)
#Everything above this is not part of the original script, but this
#is just used to replicate the problem I am having.
results = pd.DataFrame()
for index, row in df.iterrows():
    start = row['start']
    end = row['end']
    price_initial = price.ix[start:end]
    for column1 in price_initial:
        price1 = price_initial[column1]
        startprice = price1.ix[end]
        endprice = price1.ix[start]
        momentum_value = (startprice / endprice) - 1
        results = results.append({'Ticker': column1, 'Momentum': momentum_value}, ignore_index=True)
    results = results.sort(columns="Momentum", ascending=False).head(1)
    print(results.to_csv(sep='\t', index=False))
I am not sure what I am doing wrong here, but I suspect it is either the way I iterate over the data or the way I save the output from the script.
The output I get is this:
Momentum Ticker
0.16022263953253435 VPU
Momentum Ticker
0.16022263953253435 VPU
Momentum Ticker
0.16022263953253435 VPU
That is clearly not correct. I hope someone can help me get this right.
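For reference, one likely cause is that `results` is never reset between date windows: the `sort(...).head(1)` row from the previous window stays in the frame, so it can out-rank every ticker in the next window. A minimal sketch of that idea with made-up prices and tickers (synthetic data, and modern pandas spellings: `sort_values` for the removed `sort`, `pd.concat` for `append`, `.loc` for `.ix`):

```python
import pandas as pd

# Synthetic adjusted-close prices for three made-up tickers.
idx = pd.date_range('2005-01-03', periods=5, freq='D')
price = pd.DataFrame({'VIS': [10.0, 11, 12, 13, 14],
                      'VFH': [20.0, 20, 21, 19, 22],
                      'VPU': [30.0, 33, 37, 30, 31]}, index=idx)

windows = [('2005-01-03', '2005-01-05'), ('2005-01-04', '2005-01-07')]
winners = []
for start, end in windows:
    results = pd.DataFrame()                 # re-initialise for every window
    window_prices = price.loc[start:end]
    for ticker in window_prices:
        # Momentum over the window: last price / first price - 1.
        momentum = window_prices[ticker].iloc[-1] / window_prices[ticker].iloc[0] - 1
        results = pd.concat([results,
                             pd.DataFrame({'Ticker': [ticker], 'Momentum': [momentum]})],
                            ignore_index=True)
    top = results.sort_values('Momentum', ascending=False).head(1)
    winners.append(top.iloc[0]['Ticker'])

print(winners)  # a different winner per window, not the same row repeated
```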

Related

Create Pandas Dataframe from WebScraping results of stock price

I'm trying to write a script that creates a Pandas DataFrame (df) and, every interval x, adds a stock price to the df. The data comes from web scraping.
This is my code, but I have no idea how to add new data to the df every interval x (e.g. 1 min) instead of replacing the old data.
import bs4 as bs
import urllib.request
import time as t
import re
import pandas as pd
i = 1
while i == 1:
    # !! Scraping
    link = 'https://www.onvista.de/aktien/DELIVERY-HERO-SE-Aktie-DE000A2E4K43'
    parser = urllib.request.urlopen(link).read()
    soup = bs.BeautifulSoup(parser, 'lxml')
    stock_data = soup.find('ul', {'class': 'KURSDATEN'})
    stock_price_eur_temp = stock_data.find('li')
    stock_price_eur = stock_price_eur_temp.get_text()
    final_stock_price = re.sub('[EUR]', '', stock_price_eur)
    print(final_stock_price)
    t.sleep(60)
    # !! Building a dataframe
    localtime = t.asctime(t.localtime(t.time()))
    stock_data_b = {
        'Price': [final_stock_price],
        'Time': [localtime],
    }
    df = pd.DataFrame(stock_data_b, columns=['Price', 'Time'])
I hope you can help me with an idea for this problem.
Because you create df inside the loop, you rebind that variable on every iteration, overwriting the data from the previous one. You want to initialize a dataframe before the loop, and then add to it each time.
Before your loop, add the line
df2 = pd.DataFrame()
which just creates an empty dataframe. After the end of the code you posted, add
df2 = df2.append(df, ignore_index = True)
which will tack each new df on to the end of df2.
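A self-contained illustration of that pattern, with invented prices standing in for the scraped values (note that on pandas 2.x `DataFrame.append` has been removed, so `pd.concat` is the durable spelling):

```python
import pandas as pd

df2 = pd.DataFrame()  # initialise once, before the loop
for price, when in [('35.10', '10:00'), ('35.25', '10:01'), ('35.18', '10:02')]:
    # One single-row frame per "scrape", as in the question.
    df = pd.DataFrame({'Price': [price], 'Time': [when]})
    # Tack the new row onto the accumulator instead of overwriting it.
    df2 = pd.concat([df2, df], ignore_index=True)

print(df2)  # three rows, one per iteration
```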

How to convert dataframe containing date into list of list with correct date format and save in csv file

How to write dates of dataframe in a file.
import csv
import pandas as pd
writeFile = open("dates.csv","w+")
writer = csv.writer(writeFile)
dates = pd.DataFrame(pd.date_range(start = '01-09-2019', end = '30-09-2019'))
Convert2List = dates.values.tolist()
for row in Convert2List:
    writer.writerow(row)
writeFile.close()
My actual values are:
1.54699E+18
1.54708E+18
1.54716E+18
1.54725E+18
1.54734E+18
And the expected values should be:
01-09-2019
02-09-2019
03-09-2019
If you have a pandas dataframe you can just use the method pandas.DataFrame.to_csv and set its parameters (see the documentation).
Pandas has a write-to-file function built in. Try:
import pandas as pd
dates = pd.DataFrame(pd.date_range(start = '01-09-2019', end = '30-09-2019'))
#print(dates)  # check here whether the dates are written correctly.
dates.to_csv('dates.csv') # writes the dataframe directly to a file.
The date.csv file gives me:
,0
0,2019-01-09
1,2019-01-10
2,2019-01-11
3,2019-01-12
...snippet...
262,2019-09-28
263,2019-09-29
264,2019-09-30
Changing date order to get date range September for default settings:
dates = pd.DataFrame(pd.date_range(start = '2019-09-01', end = '2019-09-30'))
Gives:
rows indexed 0-29, one for each of the 30 days of September.
Furthermore, changing the date order for custom settings:
dates[0] = pd.to_datetime(dates[0]).apply(lambda x:x.strftime('%d-%m-%Y'))
Gives you:
01-09-2019
02-09-2019
03-09-2019
...etc.
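Putting those pieces together as one runnable sketch: an unambiguous ISO date range, `strftime` for the requested dd-mm-YYYY output, and `to_csv` without index or header so the file contains plain dates, one per line:

```python
import pandas as pd

# Unambiguous ISO dates avoid the day/month mix-up from the question.
dates = pd.DataFrame(pd.date_range(start='2019-09-01', end='2019-09-30'))

# Format each timestamp as dd-mm-YYYY text.
dates[0] = pd.to_datetime(dates[0]).apply(lambda x: x.strftime('%d-%m-%Y'))

# to_csv with no path returns the csv text; index/header suppressed
# so each line is just a date.
csv_text = dates.to_csv(index=False, header=False)
print(csv_text.splitlines()[:3])
```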

Python: outputting lists to excel

For my master thesis, I need to calculate expected returns for x number of stocks on a given event date. I have written the following code, which does what I intend (match Fama & French factors with a sample of event dates). However, when I try to export it to Excel I can't seem to get the correct output, i.e. it doesn't contain column headings such as the dates and the names of the Fama & French factors, nor the corresponding rows.
Does anybody have a workaround for this? Any improvements are gladly appreciated. Here is my code:
import pandas as pd
# Data import
ff_five = pd.read_excel('C:/Users/MBV/Desktop/cmon.xlsx', infer_datetime_format=True)
df = pd.read_csv('C:/Users/MBV/Desktop/4.csv', parse_dates=True, infer_datetime_format=True)
# Converting dates to datetime
df['Date'] = pd.to_datetime(df['Date'], infer_datetime_format=True)
# Creating an empty placeholder
end_date = []
# Iterating over the event dates, creating a start and end date 60 months apart
for index, row in df.iterrows():
    end_da = row['Date'] - pd.DateOffset(months=60)
    end_date.append(end_da)
end_date_df = pd.DataFrame(data=end_date)
m = pd.merge(end_date_df,df,left_index=True,right_index=True)
m.columns = ['Start','End']
ff_factors = []
for index, row in m.iterrows():
    ff_five['Date'] = pd.to_datetime(ff_five['Date'])
    time_range = (ff_five['Date'] > row['Start']) & (ff_five['Date'] <= row['End'])
    df = ff_five.loc[time_range]
    ff_factors.append(df)
EDIT:
Here is my attempt at getting the data from Python to Excel.
ff_factors_df = pd.DataFrame(data=ff_factors)
# Create a Pandas Excel writer using XlsxWriter as the engine.
writer = pd.ExcelWriter('estimation_data.xlsx', engine='xlsxwriter')
# Convert the dataframe to an XlsxWriter Excel object.
ff_factors_df.to_csv(writer, sheet_name='Sheet1')
# Close the Pandas Excel writer and output the Excel file.
writer.save()
Outputting a dataframe to csv or Excel should be possible with
ff_five.to_excel('Filename.xls')
Change to_excel to to_csv if you want a csv instead.
OK, I tried to interpret what you were trying to do, since it wasn't very clear. If I interpreted it correctly, you are trying to create some additional columns based on other data. Instead of building separate lists, you could just add them as new columns and then output only the columns you want. Something like this, maybe (I had to make some assumptions and create some fake data to see if this is on the right track):
import pandas as pd
ff_five = pd.DataFrame()
ff_five['Date'] = ["2012-11-01", "2012-11-30"]
df = pd.DataFrame()
df['Date'] = ["2012-12-01", "2012-12-30"]
df['Date'] = pd.to_datetime(df['Date'])
df['End'] = df['Date'] - pd.DateOffset(months=60)
df.columns = ['Start', 'End']
ff_five['Date'] = pd.to_datetime(ff_five['Date'])
df['ff_factor'] = (ff_five['Date'] > df['Start']) & (ff_five['Date'] <= df['End'])
df.to_excel('estimation_data.xlsx', sheet_name='Sheet1')
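The per-event windowing step itself can be checked without Excel at all. A small sketch with invented factor values (and, unlike the snippet above, a Start date that precedes the End date so the mask can actually match): for each event, collect the rows of `ff_five` whose dates fall strictly after Start and up to End:

```python
import pandas as pd

# Invented monthly factor data ('MS' = month-start frequency).
ff_five = pd.DataFrame({'Date': pd.date_range('2012-01-01', periods=12, freq='MS'),
                        'Mkt-RF': range(12)})

# One hypothetical event window.
events = pd.DataFrame({'Start': pd.to_datetime(['2012-02-01']),
                       'End':   pd.to_datetime(['2012-06-30'])})

ff_factors = []
for _, row in events.iterrows():
    # Keep factor rows strictly after Start and up to (and including) End.
    mask = (ff_five['Date'] > row['Start']) & (ff_five['Date'] <= row['End'])
    ff_factors.append(ff_five.loc[mask])

window = ff_factors[0]
print(len(window))  # months of factor data inside the window
```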

Python DataFrames: putting multiple dataframes into a single DataFrame

Import the required modules
import pandas.io.data as web
import datetime
import pandas as pd
import matplotlib.pyplot as plt
# Enable inline plotting
%matplotlib inline
Set the date range and the stock code of the stock
start = datetime.datetime(2016, 1, 1)
end = datetime.datetime.now()
Stock_List = ('BHP.Ax','AMP.AX','PRR.AX')
Stock_Code = "BHP.AX"
df_Stock_Code = pd.DataFrame()
Results = pd.DataFrame()
Loops through the codes and gets the data
for Stock_Code in Stock_List:
    # Queries yahoo website for asx code from a start and end date
    f = web.DataReader(Stock_Code, 'yahoo', start, end)
    f['Stock_Code'] = Stock_Code
    df_Stock_Code = f
Concatenates all of the data frames into one, but it fails for some reason
Results = [Results,df_Stock_Code]
df_Results = pd.Concat(Results)
In the original code, df_Stock_Code is treated as a list of dataframes. However, in the for loop, the current dataframe is being assigned to it rather than appending the latest dataframe to it.
The result is that in the concatenation step, df_Stock_Code is just a single dataframe referring to the last stock in your Stock_List. Results is also never used in the loop and is just an empty dataframe from when it was initialized at the start.
Try this instead:
start = datetime.datetime(2016, 1, 1)
end = datetime.datetime.now()
Stock_List = ('BHP.Ax','AMP.AX','PRR.AX')
Stock_Code = "BHP.AX"
df_Stock_Code = pd.DataFrame()
Results = []
for Stock_Code in Stock_List:
    # Queries yahoo website for asx code from a start and end date
    f = web.DataReader(Stock_Code, 'yahoo', start, end)
    f['Stock_Code'] = Stock_Code
    Results.append(f)
df_Results = pd.concat(Results)
print df_Results
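The accumulate-then-concat shape of that fix can be seen with plain in-memory frames, using synthetic quotes in place of the Yahoo download (which no longer works):

```python
import pandas as pd

Stock_List = ('BHP.AX', 'AMP.AX', 'PRR.AX')
Results = []  # a list that collects one dataframe per ticker
for Stock_Code in Stock_List:
    # Stand-in for web.DataReader(Stock_Code, 'yahoo', start, end).
    f = pd.DataFrame({'Close': [1.0, 2.0]})
    f['Stock_Code'] = Stock_Code
    Results.append(f)  # collect; do not overwrite

# One concatenation at the end, over the whole list.
df_Results = pd.concat(Results)
print(len(df_Results))
```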

how to overrule / skip items in a list that cause error

Below is the Python script I'm working on to pick out the stocks that meet certain price criteria (as written, tickerlist=[] collects the tickers of stocks whose max and min prices were >30 and <2 respectively).
import matplotlib.pyplot as plt
import math
import csv
import pandas as pd
import datetime
import pandas.io.data as web
from filesortfunct import filesort
from scipy import stats
from scipy.stats.stats import pearsonr
import numpy as np
import math
dataname = 'NASDAQ.csv'  # csv file from which to extract stock tickers
df = pd.read_csv(dataname, sep=',')
df = df[['Symbol']]
df.to_csv(new+dataname, sep=',', index=False)
x = open(new+dataname, 'rb')  # convert it into a more manageable form
f = csv.reader(x)  # csv is binary
Symbol = zip(*f)
print type(Symbol)  # list format
Symbol = Symbol[0]  # pick out the first column
Symbol = Symbol[1:len(Symbol)]  # remove the first row "symbol" header
Symbol = Symbol[0:2]  # slicing to choose which stocks to look at
#decide the two dates between which to look at stock prices
start = datetime.datetime.strptime('2/10/2016', '%m/%d/%Y')
end = datetime.datetime.strptime('2/24/2016', '%m/%d/%Y')
#intended to collect indeces and min/max prices
tickerlist=[]
maxpricelist = []
minpricelist =[]
for item in Symbol:
    serious = web.DataReader([item], 'yahoo', start, end)['Adj Close']
    serious2 = serious.loc[:, item].tolist()  # extract the column of 'Adj Close'
    plt.figure()
    ap = plt.plot(serious2)
    indexmax, valuemax = max(enumerate(serious2))
    indexmin, valuemin = min(enumerate(serious2))
    if valuemax > 30 and valuemin < 2:
        tickerlist.append(item)
        maxpricelist.append(valuemax)
        minpricelist.append(valuemin)
plt.show()
The issue that i have right now is that some of the stocks on the list are discontinued? or YAHOO does not have their stock prices listed I suppose. So, when those stock tickers are included in the slicing, i get the following error message.
RemoteDataError: No data fetched using '_get_hist_yahoo'
Is there a way to bypass that?
Thanks in advance!
-------------Add------------------------
I added except RemoteDataError: as suggested, but I get either invalid syntax or unexpected indentation errors.
for item in Symbol:
    print item
    serious = web.DataReader([item], 'yahoo', start, end)['Adj Close']
    except RemoteDataError:
        serious2 = serious.loc[:, item].tolist()  # extract the column of 'Adj Close'
        plt.figure()
        ap = plt.plot(serious2)
        indexmax, valuemax = max(enumerate(serious2))
        indexmin, valuemin = min(enumerate(serious2))
        if valuemax > 30 and valuemin < 100:
            tickerlist.append(item)
            maxpricelist.append(valuemax)
            minpricelist.append(valuemin)
plt.show()
print tickerlist
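An `except` clause needs a matching `try` block around the statement that can fail, which is what causes the syntax errors above. The shape of the fix is try/except around the download, with `continue` to skip the bad ticker. Sketched here with a stand-in fetch function and a stand-in exception class, since `pandas.io.data` and the old Yahoo endpoint are gone; with the real `pandas_datareader`, you would catch its `RemoteDataError` instead:

```python
class RemoteDataError(Exception):  # stand-in for pandas_datareader's error
    pass

def fetch(ticker):  # stand-in for web.DataReader([ticker], 'yahoo', start, end)
    if ticker == 'DEAD':
        raise RemoteDataError("No data fetched")
    return [10.0, 35.0, 1.5]  # pretend adjusted-close series

tickerlist, skipped = [], []
for item in ['AAA', 'DEAD', 'BBB']:
    try:
        prices = fetch(item)
    except RemoteDataError:
        skipped.append(item)  # note the delisted ticker and move on
        continue
    if max(prices) > 30 and min(prices) < 2:
        tickerlist.append(item)

print(tickerlist, skipped)
```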