Adding columns from multiple DataFrames into one DataFrame - Python

I have downloaded data for multiple stocks and stored it in CSV files. To do that, I wrote a function and passed it a list of stock names. Each stock's data has several columns (open price, close price, etc.). I want to take the close price column from every stock's DataFrame and store it in a new DataFrame, with the stock names as the column headers.
What I did:
1) I created a function that downloads the data for each stock in a list and stores it in CSV format.
2) I then wrote a for loop that reads each stock's CSV file, picks only the close column, and tries to store it in another, initially empty DataFrame, so that I end up with a DataFrame of close prices whose column headers are the stock names.
I was successful in downloading the stock data but failed in part 2.
import datetime
from concurrent import futures
import pandas as pd
import pandas_datareader.data as web

stocks = ['MSFT', 'IBM', 'GM', 'ACN', 'GOOG']
end = datetime.datetime.now().date()
start = end - pd.Timedelta(days=365*5)

def hist_data(stock):
    # download one stock's history and save it as <stock>_data.csv
    stock_df = web.DataReader(stock, 'iex', start, end)
    stock_df['Name'] = stock
    fileName = stock + '_data.csv'
    stock_df.to_csv(fileName)

with futures.ThreadPoolExecutor(len(stocks)) as executor:
    result = executor.map(hist_data, stocks)
print('completed')
# failing in the code below
close_prices = pd.DataFrame()
for i in stocks:
    df = pd.read_csv(i + '_data.csv')
    df1 = df['close']
    close_prices.append(df1)
# when I try to print close_prices, I get blank output

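DataFrame.append() does not modify the frame in place; it returns a new object, which your loop discards, so close_prices stays empty. (The method was also removed in pandas 2.0.) Assign each close column to a new column named after the stock instead:
close_prices = pd.DataFrame()
for i in stocks:
    df = pd.read_csv(i + '_data.csv')
    close_prices[i] = df['close']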
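Assigning columns one by one lines the rows up by position, which may misalign prices if the files cover different date ranges. A minimal sketch of an alternative that aligns by date is below. It assumes each CSV has a `date` column next to `close`; the helper name `collect_close_prices` and the tiny `AAA`/`BBB` demo files are made up for illustration.

```python
import os
import tempfile

import pandas as pd


def collect_close_prices(stocks, folder="."):
    """Return one DataFrame with a close-price column per stock."""
    columns = {}
    for name in stocks:
        path = os.path.join(folder, name + "_data.csv")
        # Use the date column as the index so that pd.concat aligns rows by date.
        df = pd.read_csv(path, index_col="date", parse_dates=True)
        columns[name] = df["close"]
    # Dict keys become the column names; dates missing from a file become NaN.
    return pd.concat(columns, axis=1)


# Tiny made-up files, only to make the sketch runnable without a download.
demo_dir = tempfile.mkdtemp()
pd.DataFrame({"date": ["2020-01-02", "2020-01-03"], "close": [10.0, 11.0]}).to_csv(
    os.path.join(demo_dir, "AAA_data.csv"), index=False)
pd.DataFrame({"date": ["2020-01-03"], "close": [20.0]}).to_csv(
    os.path.join(demo_dir, "BBB_data.csv"), index=False)

close_prices = collect_close_prices(["AAA", "BBB"], demo_dir)
print(close_prices)
```

With the real files, the call would presumably be `collect_close_prices(stocks)` run from the folder that holds the `*_data.csv` files.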

Related

Cannot get a file to be read into a list of stock tickers and then get yfinance data for each

I am trying to read a CSV file into a DataFrame and then iterate over each ticker to get some Yahoo Finance data, but I am struggling to match the data type read from the CSV file. The problem is that yfinance needs a str for the ticker parameter, as in data = yf.download("AAPL", start="2019-01-01", end="2022-04-20"), and I cannot convert the DataFrame row item into a str.
This is my code:
combined = yf.download("SPY", start="2019-01-01", end="2019-01-02")
for index, row in stockslist.iterrows():
    data = yf.download([index, row["ticker"].to_string], start="2019-01-01", end="2022-04-20")
and this is the csv file
The question is basically about this part of the code: [index, row["ticker"].to_string]. I cannot pass each row of the DataFrame as a ticker argument to yfinance.
The error I get is "TypeError: expected string or bytes-like object".
The download function doesn't understand the [index, row["ticker"].to_string] parameter, and it is not clear where that value comes from. It expects the ticker itself as a string (or a list of ticker strings), so you have to give it exactly that: for example, build a list with the values from the CSV and pass each element to the download function.
A quick example with fictional code:
# initialise the list of tickers
array_ticker = ['AAPL', 'MSFT']  # ... and so on
# loop over the list and download each ticker
for ticker in array_ticker:
    data = yf.download(ticker, start="2019-01-01", end="2022-04-20")
UPDATE:
If you want to keep using the DataFrame you have now, here is a simple example that shows how to read a single value from it:
import pandas as pd
d = {'ticker': ['AAPL', 'MSFT', 'GAV']}
ticker_list = pd.DataFrame(data=d)  # create the DataFrame
print(ticker_list)  # print the whole DataFrame
print('--------')
print(ticker_list.iloc[1]['ticker'])  # print the 2nd value of the ticker column
See also: How to iterate over rows in a DataFrame in Pandas
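To tie this back to the loop in the question, here is a minimal sketch that iterates over the ticker column directly and hands one plain string at a time to a download function. It assumes the column is called `ticker`; `download_all` and `fake_download` are hypothetical names, and `fake_download` only stands in for `yf.download` so that the sketch runs without network access.

```python
import pandas as pd


def download_all(stockslist, download):
    """Call `download(ticker)` once per row of the 'ticker' column."""
    results = {}
    # astype(str) turns every cell into a plain Python string;
    # str.strip() removes stray whitespace left over from the CSV.
    for ticker in stockslist["ticker"].astype(str).str.strip():
        results[ticker] = download(ticker)
    return results


def fake_download(ticker):
    # Hypothetical stand-in for yf.download(ticker, start=..., end=...).
    return f"data for {ticker}"


stockslist = pd.DataFrame({"ticker": ["AAPL", " MSFT "]})
results = download_all(stockslist, fake_download)
print(results)
```

With yfinance installed, the second argument could be something like `lambda t: yf.download(t, start="2019-01-01", end="2022-04-20")`.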

adding a column to my dataframe with yFinance

Beginner question coming up; I can't seem to connect the dots.
I have a portfolio data frame called my_pf, which includes the tickers that I use for collecting the opening price. I succeeded in collecting the opening data via the next two steps.
# create a list from the column 'ticker'
my_tickers = my_pf['ticker'].tolist()
# collect the opening data per ticker
for ticker in my_tickers:
    open_price = yf.Ticker(ticker).info.get('open')
    print(ticker, open_price)
The next step is adding the extracted data to my initial data frame. But how would I go about this?
Thank you for your help in advance.
There are several ways to add data to a data frame, such as df.append() and pd.concat(). Because DataFrame.append() was removed in pandas 2.0, the code below collects one row per ticker in a list and builds the data frame once at the end. Each row holds the ticker and its opening price.
import pandas as pd
import yfinance as yf
# my_tickers = my_pf['ticker'].tolist()
my_tickers = ['msft', 'aapl', 'goog']
rows = []
for ticker in my_tickers:
    open_price = yf.Ticker(ticker).info.get('open')
    rows.append({'ticker': ticker, 'Open': open_price})
df = pd.DataFrame(rows, columns=['ticker', 'Open'])
print(df)
  ticker     Open
0   msft   204.07
1   aapl   112.37
2   goog  1522.36
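If the goal is to extend the original my_pf frame rather than build a new one, a minimal sketch along these lines may help. The `shares` column, the helper `add_open_column`, and the `fake_get_open` lookup are made up for illustration; in real use the lookup would wrap `yf.Ticker(ticker).info.get('open')` as in the question.

```python
import pandas as pd


def add_open_column(my_pf, get_open):
    """Return a copy of my_pf with an 'open' column filled by get_open(ticker)."""
    result = my_pf.copy()
    # Series.map calls get_open once per ticker and keeps the row order.
    result["open"] = result["ticker"].map(get_open)
    return result


def fake_get_open(ticker):
    # Hypothetical prices standing in for yf.Ticker(ticker).info.get('open').
    return {"msft": 1.0, "aapl": 2.0}[ticker]


my_pf = pd.DataFrame({"ticker": ["msft", "aapl"], "shares": [10, 5]})
my_pf = add_open_column(my_pf, fake_get_open)
print(my_pf)
```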

How to merge some CSV files into one DataFrame?

I have some CSV files with exactly the same structure of stock quotes (timeframe is one day):
date,open,high,low,close
2001-10-15 00:00:00 UTC,56.11,59.8,55.0,57.9
2001-10-22 00:00:00 UTC,57.9,63.63,56.88,62.18
I want to merge them all into one DataFrame with only the close price column for each stock. The problem is that different files have different history depths (they start from different dates in different years). I want to align them all by date in one DataFrame.
I'm trying to run the following code, but I get nonsense in the resulting df:
files = ['FB', 'MSFT', 'GM', 'IBM']
stock_d = {}
for file in files:  # reading all files into one dictionary
    stock_d[file] = pd.read_csv(file + '.csv', parse_dates=['date'])
date_column = pd.Series()  # the column with all dates from all CSVs
for stock in stock_d:
    date_column = date_column.append(stock_d[stock]['date'])
date_column = date_column.drop_duplicates().sort_values(ignore_index=True)  # keep only unique values, then sort by date
df = pd.DataFrame(date_column, columns=['date'])  # creating the final DataFrame
for stock in stock_d:
    stock_df = stock_d[stock]  # this is one of the CSV files, for example FB.csv
    df[stock] = [stock_df.iloc[stock_df.index[stock_df['date'] == date]]['close'] for date in date_column]  # for each date in date_column, add the close price to the resulting DF; should be None if the date is not found
print(df.tail())  # something strange here - Series objects in every column
The idea is first to extract all dates from each file, then to distribute close prices among according columns and dates. But obviously I'm doing something wrong.
Can you help me please?
If I understand you correctly, what you are looking for is the pivot operation:
files = ['FB', 'MSFT', 'GM', 'IBM']
dfs = []  # this is a list, not a dictionary
for file in files:
    # You only care about the date and closing price,
    # so keep only those 2 columns to save memory
    tmp = pd.read_csv(file + '.csv', parse_dates=['date'], usecols=['date', 'close']).assign(symbol=file)
    dfs.append(tmp)
# A single `concat` is faster than sequential `append`s
df = pd.concat(dfs).pivot(index='date', columns='symbol', values='close')
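To see what the pivot does when the files start on different dates, here is a small self-contained sketch. The two symbols `AAA` and `BBB` and their prices are made up, and the CSV text is read from memory instead of from disk.

```python
import io

import pandas as pd

# Made-up CSV contents in the same layout as the files in the question.
raw = {
    "AAA": "date,open,high,low,close\n"
           "2001-10-15 00:00:00 UTC,1,2,0,1.5\n"
           "2001-10-22 00:00:00 UTC,2,3,1,2.5\n",
    "BBB": "date,open,high,low,close\n"
           "2001-10-22 00:00:00 UTC,5,6,4,5.5\n",
}

frames = []
for symbol, text in raw.items():
    tmp = pd.read_csv(io.StringIO(text), parse_dates=["date"], usecols=["date", "close"])
    frames.append(tmp.assign(symbol=symbol))

# Long format -> wide format: one row per date, one column per symbol.
wide = pd.concat(frames).pivot(index="date", columns="symbol", values="close")
print(wide)
```

The date that exists only in the `AAA` data gets NaN in the `BBB` column, which is the alignment the question asks for.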

Reading multiple csv files into single DataFrame

I am trying to read multiple CSV stock price files, all of which have the following columns: Date, Time, Open, High, Low, Close. The code is:
import copy
import pandas as pd
tickers = ['gmk', 'yandex', 'sberbank']
ohlc_intraday = {}
ohlc_intraday['gmk'] = pd.read_csv("gmk_15min.csv", parse_dates=["<DATE>"], dayfirst=True)
ohlc_intraday['yandex'] = pd.read_csv("yndx_15min.csv", parse_dates=["<DATE>"], dayfirst=True)
ohlc_intraday['sberbank'] = pd.read_csv("sber_15min.csv", parse_dates=["<DATE>"], dayfirst=True)
df = copy.deepcopy(ohlc_intraday)
for i in range(len(tickers)):
    df[tickers[i]] = df[tickers[i]].iloc[:, 2:]
    df[tickers[i]].columns = ['Date', 'Time', "Open", "High", "Low", "Adj Close", "Volume"]
    df[tickers[i]]['Time'] = [x + ':00' for x in df['Time']]
However, I then get KeyError: 'Time'. It seems the columns are not keys.
Is it possible to read or convert the data into a format where the keys are the stock tickers (gmk, yandex, sberbank) and the column names, so that I can easily extract a value using the following code:
ohlc_intraday['sberbank']['Date'][1]
What you could do is create a single DataFrame that has a column specifying the market.
import pandas as pd
# market name -> CSV file
markets = {"gmk": "gmk_15min.csv", "yandex": "yndx_15min.csv", "sberbank": "sber_15min.csv"}
dfs = []
for market, path in markets.items():
    df = pd.read_csv(path, parse_dates=["<DATE>"], dayfirst=True)
    df['market'] = market  # add a market column to each df
    dfs.append(df)
# concatenate into one DataFrame
df = pd.concat(dfs)
Then access what you want in this manner (use the column names from your own files, e.g. '<DATE>' if you have not renamed the columns):
df[df['market'] == 'yandex']['Date'].iloc[1]
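As for the KeyError itself: in the question, df is a dictionary of DataFrames keyed by ticker, so df['Time'] looks for a ticker called 'Time' rather than a column. A minimal sketch with made-up frames shows the ticker-then-column access, plus an optional way to stack everything with the ticker as the outer index level (the level names `ticker` and `row` are arbitrary).

```python
import pandas as pd

# Made-up frames standing in for the files read with pd.read_csv.
ohlc_intraday = {
    "gmk": pd.DataFrame({"Date": ["2021-01-04", "2021-01-04"], "Time": ["10:00", "10:15"]}),
    "yandex": pd.DataFrame({"Date": ["2021-01-05", "2021-01-05"], "Time": ["10:00", "10:15"]}),
}

for ticker, frame in ohlc_intraday.items():
    # Each dict value is a DataFrame, so the column lookup happens on the frame.
    frame["Time"] = frame["Time"] + ":00"

# The dict of DataFrames already supports ticker-then-column access.
print(ohlc_intraday["yandex"]["Date"][1])

# Optional: stack everything into one frame with the ticker as outer index level.
combined = pd.concat(ohlc_intraday, names=["ticker", "row"])
print(combined.loc["yandex"]["Time"].tolist())
```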

Create new csv based on content of row

I have a .csv called cleaned_data.csv formatted like so:
Date,State,Median Listing Price
1/31/2010,Alabama,169900
2/28/2010,Alabama,169900
3/31/2010,Alabama,169500
1/31/2010,Alaska,239900
2/28/2010,Alaska,241250
3/31/2010,Alaska,248000
I would like to create a new .csv file for each state, named {state}.csv, that has the Date and Median Listing Price.
So far I have this:
import pandas
csv = pandas.read_csv('cleaned_data.csv', sep='\s*,\s*', header=0, encoding='utf-8-sig')
state_list = ['Alabama', 'Alaska', 'Arizona', 'Arkansas', ...]
for state in state_list:
    csv = csv[csv['State'] == f'{state}']
    csv.to_csv(f'state_csvs/{state}.csv', index=False, sep=',')
This successfully creates 51 .csv files named after each state, but only the Alabama.csv has Date, State, and Median Listing Price data for Alabama. Every other .csv only has the following headers with no data:
Date,State,Median Listing Price
Can someone explain to me why this is happening and how to fix it or a better way to do it?
Bonus points: I don't actually need the "State" column in the new .csv files but I'm unsure how to only add Date and Median Listing Price.
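Inside the loop you overwrite csv with the rows for the current state, so after the first iteration it contains only Alabama and every later filter returns an empty frame. Filter without reassigning the original frame, and select only the columns you need (df here is the frame returned by read_csv):
for i in df['State'].unique():
    df.loc[df['State'] == i, ['Date', 'Median Listing Price']].to_csv(f'state_csvs/{i}.csv', index=False)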
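Another way to do the same split is groupby, which hands you one sub-frame per state without filtering by hand. The sketch below uses a few rows from the question and writes to a temporary folder as a stand-in for state_csvs/.

```python
import os
import tempfile

import pandas as pd

df = pd.DataFrame({
    "Date": ["1/31/2010", "2/28/2010", "1/31/2010"],
    "State": ["Alabama", "Alabama", "Alaska"],
    "Median Listing Price": [169900, 169900, 239900],
})

out_dir = tempfile.mkdtemp()  # stand-in for the state_csvs/ folder
written = []
# groupby yields one (state, sub-frame) pair per distinct State value.
for state, group in df.groupby("State"):
    path = os.path.join(out_dir, f"{state}.csv")
    # Keep only the two wanted columns, which drops the State column.
    group[["Date", "Median Listing Price"]].to_csv(path, index=False)
    written.append(path)

print(sorted(os.path.basename(p) for p in written))
```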
