I'm trying to download key financial ratios from Yahoo Finance via the FundamentalAnalysis library. It's pretty easy for a single ticker, but I have a df with tickers and names:
Ticker Company
0 A Agilent Technologies Inc.
1 AA ALCOA CORPORATION
2 AAC AAC Holdings Inc
3 AAL AMERICAN AIRLINES GROUP INC
4 AAME Atlantic American Corp.
I then tried to use a for-loop to download the ratios for every ticker with fa.ratios().
for i in range(3):
    i = 0
    i = i + 1
    Ratios = fa.ratios(tickers["Ticker"][i])
So basically it should download all the ratios for the first ticker, then the second, and so on. I also tried converting the df into a list, but that didn't work either. If I put the tickers into a list manually, like:
Symbol = ["TSLA" , "AAPL" , "MSFT"]
it works. But as I want to work with data from 1,000+ tickers, I don't want to type them all into a list manually.
Maybe this question has already been answered elsewhere, in that case sorry, but I've not been able to find a thread that helps me. Any ideas?
You can get symbols using
symbols = df['Ticker'].to_list()
and then you can use a for-loop without range():
ratios = dict()
for s in symbols:
    ratios[s] = fa.ratios(s)
print(ratios)
Because some symbols may not return ratios, you should use try/except.
Minimal working example. I use io.StringIO only to simulate a file.
import FundamentalAnalysis as fa
import pandas as pd
import io

# columns in the sample text are separated by 2+ spaces to match the sep regex
text = '''Ticker  Company
A       Agilent Technologies Inc.
AA      ALCOA CORPORATION
AAC     AAC Holdings Inc
AAL     AMERICAN AIRLINES GROUP INC
AAME    Atlantic American Corp.'''

df = pd.read_csv(io.StringIO(text), sep=r'\s{2,}', engine='python')
symbols = df['Ticker'].to_list()
#symbols = ["TSLA", "AAPL", "MSFT"]
print(symbols)

ratios = dict()
for s in symbols:
    try:
        ratios[s] = fa.ratios(s)
    except Exception as ex:
        print(s, ex)

for s, ratio in ratios.items():
    print(s, ratio)
EDIT: it seems fa.ratios() returns DataFrames, and if you keep them in a list then you can concatenate all the DataFrames into one DataFrame:
ratios = list()  # list instead of dictionary
for s in symbols:
    try:
        ratios.append(fa.ratios(s))  # append to list
    except Exception as ex:
        print(s, ex)

df = pd.concat(ratios, axis=1)  # convert list of DataFrames to one DataFrame
print(df.columns)
print(df)
Doc: pandas.concat()
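One refinement not in the original answer: if you also record which symbols actually succeeded, you can pass them as keys= so each ticker gets its own column level in the combined DataFrame (a sketch, assuming the same symbols and fa.ratios() as above):
ratios = []
ok_symbols = []  # kept aligned with ratios: only symbols that downloaded
for s in symbols:
    try:
        ratios.append(fa.ratios(s))
        ok_symbols.append(s)
    except Exception as ex:
        print(s, ex)

df = pd.concat(ratios, axis=1, keys=ok_symbols)  # MultiIndex columns: (ticker, ratio column)
print(df.columns)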
I'm using a for loop to get particular financial data (around 800 sets) from yfinance.
But the running time for this execution is over an hour!
This is just a small part of my whole project.
How to reduce the execution time?
(for loop code omitted from the question)
yfinance doesn't appear to have a native way to download institutional holders in parallel, so we can roll our own with pandarallel.
A parallelized version:
import yfinance as yf
import pandas as pd
from pandarallel import pandarallel
from tqdm import tqdm

pandarallel.initialize(nb_workers=16, progress_bar=True)

t = yf.Tickers([x for x in tqdm(symbol['Symbol'])])

def get_holders(x):
    try:
        return x.institutional_holders.head()
    except Exception:
        return None  # symbols without holder data just come back as None

call = pd.Series(t.tickers).parallel_apply(get_holders)
df = pd.concat(call.to_dict())
df
(text output is the same as below)
Here, downloading the whole S&P 500 takes about 4 minutes:
import yfinance as yf
import pandas as pd

# symbols = [x for x in tqdm(symbol['Symbol'])]
symbols = ['AAPL', 'MSFT']  # testing only, swap in yours

tickers = yf.Tickers(' '.join(symbols))
institutional_holders = {x: tickers.tickers[x].institutional_holders.head() for x in symbols}
df = pd.concat(institutional_holders)
print(df)
Output:
Holder Shares Date Reported % Out Value
AAPL 0 Vanguard Group, Inc. (The) 1261261357 2021-12-30 0.0779 223962179162
1 Blackrock Inc. 1019810291 2021-12-30 0.0630 181087713372
2 Berkshire Hathaway, Inc 887135554 2021-12-30 0.0548 157528660323
3 State Street Corporation 633115246 2021-12-30 0.0391 112422274232
4 FMR, LLC 352204129 2021-12-30 0.0218 62540887186
MSFT 0 Vanguard Group, Inc. (The) 615950062 2021-12-30 0.0824 207156324851
1 Blackrock Inc. 519035634 2021-12-30 0.0694 174562064426
2 State Street Corporation 302541869 2021-12-30 0.0405 101750881382
3 FMR, LLC 215377233 2021-12-30 0.0288 72435671002
4 Price (T.Rowe) Associates Inc 204196901 2021-12-30 0.0273 68675501744
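If you'd rather not add the pandarallel dependency, a plain thread pool also works well for this network-bound task. A minimal sketch, assuming a small test list (swap in your ~800 symbols) and an illustrative worker count:
from concurrent.futures import ThreadPoolExecutor

import pandas as pd
import yfinance as yf

symbols = ['AAPL', 'MSFT']  # testing only, swap in yours

def fetch(sym):
    # one symbol per task; return None on failure so bad tickers don't abort the run
    try:
        return sym, yf.Ticker(sym).institutional_holders.head()
    except Exception:
        return sym, None

with ThreadPoolExecutor(max_workers=16) as ex:
    results = dict(ex.map(fetch, symbols))

df = pd.concat({k: v for k, v in results.items() if v is not None})
print(df)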
How many columns do you have? Instead of creating DataFrames on every iteration, I'd use a dictionary to store the column values and append each dictionary to a list. After the loop, make a DataFrame from the list of dictionaries.
d = []
for i in tqdm(symbol['Symbol']):
    dict_store = {}
    try:
        dict_store['col_1'] = value1
        dict_store['col_2'] = value2
        dict_store['col_3'] = value3
    except:
        dict_store['col_1'] = ''
        dict_store['col_2'] = ''
        dict_store['col_3'] = ''
    d.append(dict_store)
if the list is d, holding multiple dictionaries, then:
df = pd.DataFrame(d)
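A tiny runnable illustration of the pattern, with dummy values in place of value1/value2/value3:
import pandas as pd

# each loop iteration would append one row-dict; the DataFrame is built once at the end
d = [
    {'col_1': 'AAPL', 'col_2': 1.2, 'col_3': 3.4},
    {'col_1': 'MSFT', 'col_2': 5.6, 'col_3': 7.8},
]
df = pd.DataFrame(d)
print(df)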
Python 3.9.5/Pandas 1.1.3
I have a very large csv file with values that look like:
Ac\\Nme Products Inc.
and all the values are different company names with double backslashes in random places throughout.
I'm attempting to get rid of all the double backslashes. It's not working in Pandas, but a simple test against a standalone value using str.replace does work.
Example:
org = "Ac\\Nme Products Inc."
result = org.replace("\\","")
print(result)
returns AcNme Products Inc. as the output, as I would expect.
However, using Pandas with the names in a csv file:
import pandas as pd
csv_input = pd.read_csv('/Users/me/file.csv')
csv_input.replace("\\", "")
csv_input.to_csv('/Users/me/file_revised.csv', index=False)
When I open the new file_revised.csv file, the value still shows as Ac\\Nme Products Inc.
EDIT 1:
Here is a snippet of file.csv as requested:
id,company_name,address,country
1000566,A1 Comm\\Nodity Traders,LEVEL 28 THREE PACIFIC PLACE 1 QUEEN'S RD EAST HK,TH
1000579,"A2 A Mf\\g. Co., Ltd.",53 YONG-AN 2ND ST. TAINAN TAIWAN,CA
1000585,"A2 Z Logisitcs Indi\\Na Pvt., Ltd.",114A/1 1ST FLOOR SOUTH RAJA ST TUTICORIN - 628 001 TAMILNADU - INDIA,PE
Pandas doesn't have a DataFrame-level .str string operation, but it can be updated per column:
for col in csv_input.columns:
    if col == 'that_int_column':
        continue
    # r"\\N" is the literal three characters \\N; regex=False keeps it from
    # being read as a regex (where it would match only \N)
    csv_input[col] = csv_input[col].str.replace(r"\\N", "", regex=False)
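Alternatively, DataFrame.replace with regex=True works across all string columns in one call. Note that replace returns a new DataFrame, so the result must be assigned back (the question's code discarded it). A sketch using the same paths as the question:
import pandas as pd

csv_input = pd.read_csv('/Users/me/file.csv')
# regex r'\\\\N' matches a literal backslash, backslash, N sequence in the cells
csv_input = csv_input.replace(r'\\\\N', '', regex=True)
csv_input.to_csv('/Users/me/file_revised.csv', index=False)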
I am attempting to load multiple (hundreds) of spreadsheets into one dataframe. The problem is these spreadsheets are located in different folders/paths. I am hoping to iterate through a central spreadsheet that lists all of the specific paths (each spreadsheet contains a tab named "Test" that I am hoping to pull, this tab has the same structure/layout across all spreadsheets) but am having some issues.
I have listed everything that might be helpful below, any insight would be greatly appreciated!
Existing Code Problems:
I receive a TypeError: cannot concatenate object of type '<class 'numpy.ndarray'>' on the line where I use pd.concat(df.values) below
I would like to add a column that lists the "Identifier" value for each spreadsheet in the aggregated dataframe (so that I can group by specific company later on)
Current Code:
df_0 = pd.read_excel(r'PATH TO CENTRAL SPREADSHEET')
list_of_paths = df_0['Path'].tolist()
all_data = pd.DataFrame()
for itr in range(len(list_of_paths)):
    df = pd.read_excel(list_of_paths[itr], sheet_name="Test", ignore_index=True)
    cdf = pd.concat(df.values)
    all_data = all_data.append(cdf, ignore_index=True)
Central Spreadsheet:
Identifier Path
AAPL PATH TO UNDERLYING AAPL FILE
GOOG PATH TO UNDERLYING GOOG FILE
Example of Underlying File ("Test" tab) Structure
Metric 2018 2017
Revenue 2mm 3mm
Expense 1mm 2mm
Desired Output
Metric Ticker 2018 2017
Revenue AAPL 2mm 3mm
Revenue GOOG 5mm 8mm
Expense AAPL 1mm 2mm
Expense GOOG 4mm 6mm
Doing it in steps:
Aim: load the spreadsheets into a list of df's
df_0 = pd.read_excel(r'PATH TO CENTRAL SPREADSHEET')
dict_of_paths = {}
for i, j in df_0.iterrows():
    dict_of_paths[j['Identifier']] = j['Path']

df_list = []
for key in dict_of_paths.keys():
    df = pd.read_excel(dict_of_paths[key], sheet_name="Test")
    df['ticker'] = key
    df_list.append(df)
Now all the df's are in the df_list
mdf = pd.concat(df_list,ignore_index=True)
As long as the columns are the same, this should work.
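The same steps can be collapsed into a list comprehension (a sketch, assuming the central spreadsheet has the 'Identifier' and 'Path' columns shown above):
import pandas as pd

df_0 = pd.read_excel(r'PATH TO CENTRAL SPREADSHEET')
df_list = [
    pd.read_excel(path, sheet_name="Test").assign(ticker=identifier)
    for identifier, path in zip(df_0['Identifier'], df_0['Path'])
]
mdf = pd.concat(df_list, ignore_index=True)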
This is the information in my CSV file:
Receipt merchant Address Date Time Total price
25007 A ABC pte ltd 3/7/2016 10:40 12.30
25008 A ABC ptd ltd 3/7/2016 11.30 6.70
25009 B CCC ptd ltd 4/7/2016 07.35 23.40
25010 A ABC pte ltd 4/7/2016 12:40 9.90
How is it possible to add the 'Total price' of each line together only if they belong to the same 'merchant', 'Date' and 'Time', then group them together in a list or dict, for example: {['A','3/7/2016', '19.0'], ['A','4/7/2016', '9.90'], ..}
My previous code does what I wanted except that I lack the code to sum the total price for each same date and merchant.
from collections import defaultdict
from csv import reader

with open("assignment_info.csv") as f:
    next(f)
    group_dict = defaultdict(list)
    for rec, name, _, dte, time, price in reader(f):
        group_dict[name, dte].extend(time)

for v in group_dict.values(): v.sort()

from pprint import pprint as pp
print 'Sales tracker:'
pp(dict(group_dict))
import pandas as pd
df = pd.read_csv('assignment_info.csv')
df = df.groupby(['merchant', 'Date', 'Time']).sum().reset_index()
df
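Since every receipt in the sample has a distinct time, grouping by 'Time' as well leaves each row in its own group. If you want one total per merchant and date (as in your example output), drop 'Time' from the key; a sketch using the same column names:
df = pd.read_csv('assignment_info.csv')
totals = df.groupby(['merchant', 'Date'], as_index=False)['Total price'].sum()
print(totals)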
As the other answer points out, pandas is an excellent library for this kind of data manipulation. My answer won't use pandas though.
A few issues:
In your problem description, you state that you want to group by three columns, but in your example you group by only two. Since the latter makes more sense, I am only grouping by name and date.
You are looping and sorting each value, but for the life of me I can't figure out why.
You declare the default type of the defaultdict a list and then extend with a string, which ends up giving you a (sorted!) list of characters. You don't really want to do this.
Your example uses the syntax of a set: { [a,b,c], [d,e,f] } but the syntax of a dict makes more sense: { (a, b): c, }. I have changed the output to the latter.
Here is a working example:
from collections import defaultdict
from csv import reader

with open("assignment_info.csv") as f:
    next(f)
    group_dict = defaultdict(float)
    for rec, name, _, dte, time, price in reader(f):
        group_dict[name, dte] += float(price)
group_dict is now:
{('A', '3/7/2016'): 19.0, ('A', '4/7/2016'): 9.9, ('B', '4/7/2016'): 23.4}
I removed extra columns which aren't in your example: here's the file I worked with:
Receipt,merchant,Address,Date,Time,Total price
25007,A,ABC pte ltd,3/7/2016,10:40,12.30
25008,A,ABC ptd ltd,3/7/2016,11.30,6.70
25009,B,CCC ptd ltd,4/7/2016,07.35,23.40
25010,A,ABC pte ltd,4/7/2016,12:40,9.90
I have a data frame df with a column named Company. A few examples of the company names are: ABC Inc., XYZ Gmbh, PQR Ltd, JKL Limited, etc. I want a list of all the suffixes (Inc., Gmbh, Ltd, Limited, etc.). Note that suffix length varies. There may be companies without any suffix, for example: Apple. I need a complete list of suffixes from all the company names, keeping only the unique suffixes.
How do I accomplish this task?
try this:
In [36]: df
Out[36]:
Company
0 Google
1 Apple Inc
2 Microsoft Inc
3 ABC Inc.
4 XYZ Gmbh
5 PQR Ltd
6 JKL Limited
In [37]: df.Company.str.extract(r'\s+([^\s]+$)', expand=False).dropna().unique()
Out[37]: array(['Inc', 'Inc.', 'Gmbh', 'Ltd', 'Limited'], dtype=object)
or ignoring punctuation:
In [38]: import string
In [39]: df.Company.str.replace('['+string.punctuation+']+','')
Out[39]:
0 Google
1 Apple Inc
2 Microsoft Inc
3 ABC Inc
4 XYZ Gmbh
5 PQR Ltd
6 JKL Limited
Name: Company, dtype: object
In [40]: df.Company.str.replace('['+string.punctuation+']+','').str.extract(r'\s+([^\s]+$)', expand=False).dropna().unique()
Out[40]: array(['Inc', 'Gmbh', 'Ltd', 'Limited'], dtype=object)
Export the result to an Excel file:
data = df.Company.str.replace('['+string.punctuation+']+','').str.extract(r'\s+([^\s]+$)', expand=False).dropna().unique()
res = pd.DataFrame(data, columns=['Comp_suffix'])
res.to_excel(r'/path/to/file.xlsx', index=False)
You can use cleanco Python library for that, it has a list of all possible suffixes inside. E.g. it contains all the examples you provided (Inc, Gmbh, Ltd, Limited).
So you can take the suffixes from the library and use them as a dictionary to search in your data, e.g.:
import pandas as pd
company_names = pd.Series(["Apple", "ABS LLC", "Animusoft Corp", "A GMBH"])
suffixes = ["llc", "corp", "abc"] # take from cleanco source code
found = [any(company_names.map(lambda x: x.lower().endswith(' ' + suffix))) for suffix in suffixes]
suffixes_found = [suffix for (suffix, suffix_found) in zip(suffixes, found) if suffix_found]
print(suffixes_found)  # outputs ['llc', 'corp']
So you want the last word of the Company name, assuming the name is more than one word long?
set(name_list[-1] for name_list in map(str.split, company_names) if len(name_list) > 1)
The [-1] gets the last word. str.split splits on spaces. I've never used pandas, so getting company_names might be the hard part of this.
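For completeness (this is not from the original answer), pulling company_names out of the question's df would look like:
# assumes df is the question's DataFrame with a 'Company' column
company_names = df['Company'].tolist()
suffixes = set(parts[-1] for parts in map(str.split, company_names) if len(parts) > 1)
print(suffixes)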
This only adds the suffixes when the company name has more than one word as you required.
company_names = ["Apple", "ABS LLC", "Animusoft Corp"]
suffixes = [name.split()[-1] for name in company_names if len(name.split()) > 1]
Note that the snippet above doesn't cover the unique requirement.
Neither version can tell that in a company named like "Be Smart", "Smart" is part of the name rather than a suffix. However, the following does take care of the unique requirement:
company_names = ["Apple", "ABS LLC", "Animusoft Corp", "BBC Corp"]
suffixes = []
for name in company_names:
if len(name.split()) > 1 and name.split()[-1] not in suffixes:
suffixes.append(name.split()[-1])
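A small variant: a set gives the same uniqueness with O(1) membership checks, at the cost of insertion order:
company_names = ["Apple", "ABS LLC", "Animusoft Corp", "BBC Corp"]
suffixes = set()
for name in company_names:
    parts = name.split()
    if len(parts) > 1:
        suffixes.add(parts[-1])  # adding a duplicate suffix is a no-op
print(suffixes)  # e.g. {'LLC', 'Corp'}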