How to create DataFrames in a loop? - Python

I want to create DataFrames, one per company, containing stock quotes, by looping over a dict:
import yfinance as yf

financials = {'jp_morgan': 'JPM', 'bank_of_amerika': 'BAC', 'credit_suisse': 'CS', 'visa': 'V',
              'mastercard': 'MA', 'morgan_stanley': 'MS', 'citigroup': 'C', 'wells_fargo': 'WFC',
              'blackrock': 'CLOA', 'goldman_sachs': 'GS'}
for i in financials:
    i = yf.download(financials[i], '2016-01-01', '2019-08-01')
I want to end up with one DataFrame per company. How can I do that?
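One possible approach (a minimal sketch, reusing the financials dict and the yfinance import from the question): rebinding the loop variable i just discards each download on the next iteration, so collect the DataFrames in a dict keyed by company name instead.

import yfinance as yf

# 'financials' is the dict from the question above.
frames = {name: yf.download(symbol, '2016-01-01', '2019-08-01')
          for name, symbol in financials.items()}

jp_morgan = frames['jp_morgan']  # access one company's DataFrame by name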

Related

How to reduce read/write to excel time in python pandas

I have 640,000 rows of data in an Excel file.
I want to append some rows to the data, so I used pd.read_excel and pd.concat([excel, some_data]).
After that, I used df.to_excel() to write back to Excel.
But read_excel takes a long time, about 3 minutes, and to_excel does too.
How can I fix this?
import subprocess

import pandas as pd

def update_mecab(new_word_list):
    user_dicpath = 'C:\\mecab\\user-dic\\custom.csv'

    # Read the full existing dictionary and append the new words.
    dictionary = pd.read_excel('./first_dictionary.xlsx')
    dictionary = pd.concat([dictionary, new_word_list])

    # Map Korean part-of-speech names to MeCab tags.
    part_names = {
        '일반 명사': 'NNG',   # common noun
        '고유 명사': 'NNP',   # proper noun
        '의존 명사': 'NNB',   # dependent noun
        '수사': 'NR',         # numeral
        '대명사': 'NP',       # pronoun
        '동사': 'VV',         # verb
        '형용사': 'VA',       # adjective
        '보조 용언': 'VX',    # auxiliary predicate
        '관형사': 'MM',       # determiner
        '일반 부사': 'MAG',   # general adverb
        '접속 부사': 'MAJ',   # conjunctive adverb
        '감탄사': 'IC'        # interjection
    }
    new_word_pt = new_word_list.replace({"part": part_names})

    # Append the new words to the MeCab user dictionary CSV.
    user_dict = open(user_dicpath, 'a', encoding="UTF-8")
    for index, item in new_word_pt.iterrows():
        custom_word = item['word'] + ',*,*,*,' + item['part'] + ',*,T,' + item['word'] + ',*,*,*,*,*\n'
        user_dict.write(custom_word)
    user_dict.close()
    del user_dict

    # Write the merged dictionary back to Excel (slow for 640k rows).
    dictionary = dictionary.reset_index()
    dictionary = dictionary[['word', 'part']]
    dictionary.to_excel('first_dictionary.xlsx', sheet_name="Sheet_1", index=None)

    # Rebuild the MeCab user dictionary.
    subprocess.call("powershell C:\\mecab\\add-userdic-win.ps1")
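One way to cut this cost (a sketch, assuming the word list could live in a CSV rather than .xlsx) is to append only the new rows instead of reading and rewriting all 640,000 existing rows on every update; CSV I/O in pandas is also much faster than .xlsx. The path and column names below are assumptions that mirror the code above.

import pandas as pd

DICT_PATH = './first_dictionary.csv'  # assumed CSV replacement for first_dictionary.xlsx

def append_new_words(new_word_list: pd.DataFrame) -> None:
    # Append only the new rows; the existing rows never need to be re-read or re-written.
    new_word_list[['word', 'part']].to_csv(
        DICT_PATH, mode='a', header=False, index=False, encoding='utf-8'
    )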

Merge multiple rows into one with additional columns in pandas

Say I have the dataframe below:
import pandas as pd

rankings = {'Team': ['A', 'A', 'B',
                     'B', 'C', 'C', 'C'],
            'Description': ['Aggressive', 'Strong', 'Passive',
                            'Strong', 'Bad', 'Loser Streak', 'Injured'],
            'Description Continued': ['Aggressive ...', 'Strong ...', 'Passive ...',
                                      'Strong ...', 'Bad ...', 'Loser Streak ...', 'Injured ...']}
rankings_pd = pd.DataFrame(rankings)
There are multiple rows for each team. I want one row per team, with additional columns in the dataframe (e.g. Description1, Description2, ...) holding the extra information.
How can I achieve this?
I looked at How to pivot a dataframe?, but that approach doesn't work for multiple value columns as in the example above.
The key is to create a numeric index per group before pivoting:
rankings_pd["count"] = rankings_pd.groupby("Team").cumcount() + 1
df = rankings_pd.pivot(
index="Team", columns="count", values=["Description", "Description Continued"]
)
df.columns = [f"{x}{y}" for x, y in df.columns]

How do I add values to a list stored as a dictionary value?

I have an empty dictionary and need to pull industry info based on ticker symbols. I would then like to collect all tickers in the same industry in a list, with the industry as the key.
For example, the end would look something like the below:
{'technology': ['AAPL', 'ADBE'], 'Consumer Cyclical': ['TSLA', 'UA']}
Here is what I've been working on with no success:
import yfinance as yf

tickers = ['AAPL', 'ADBE', 'AMD', 'AMAT', 'AMZN', 'ANF',
           'APA', 'BA', 'BABA', 'BBY', 'BIDU', 'BMY', 'BRX', 'BZUN',
           'C', 'CAT', 'CLF', 'CMCSA', 'CMG', 'COST', 'CRM', 'CVX',
           'DE', 'EBAY', 'FB', 'FCX', 'FDX', 'FSLR',
           'GILD', 'GM', 'GME', 'GOOG', 'GPRO', 'GS', 'HAL', 'HD',
           'HIG', 'HON', 'IBM', 'JCPB', 'JD', 'JPM', 'LULU', 'LYG',
           'MA', 'MCD', 'MDT', 'MS', 'MSFT', 'MU', 'NEM', 'NFLX',
           'NKE', 'PBR', 'QCOM', 'SLB', 'SNAP', 'SPG', 'TSLA', 'TWTR',
           'TXN', 'UA', 'UAL', 'V', 'VZ', 'X', 'XLNX', 'ZM']
sector_dict = dict()
for ticker in tickers:
    try:
        sector = yf.Ticker(ticker).info['sector']
        sector_dict[sector].update(ticker)
    except:
        sector_dict.update({'no sector': [ticker]})
This code just gives me an empty dictionary. Does anybody see where the issue is?
Assuming the information you need is returned from the API call - the code below may work for you.
import yfinance as yf
from collections import defaultdict

tickers = ['AAPL', 'ADBE']
sector_dict = defaultdict(list)
for ticker in tickers:
    try:
        sector_dict[yf.Ticker(ticker).info['sector']].append(ticker)
    except Exception as e:
        print(f'Failed to get ticker info for {ticker}')
print(sector_dict)
Output:
defaultdict(<class 'list'>, {'Technology': ['AAPL', 'ADBE']})
You should always avoid catch-all exceptions.
Your original example was masking the fact that update isn't a list method.
When you subscript a Python dictionary like sector_dict[sector], you are working with the value associated with that key, which in this case is a list.
Also, update isn't used like that, so I think it was masking a second error. Its purpose is to update a dictionary with another dictionary or an iterable of key/value pairs, not to modify an existing entry in place.
Finally, the try clause should be as small as possible, so you can be sure where the error is coming from and avoid conflicting exceptions, as in this case.
I think that's also why the list came back with only the last ticker in my previous solution: the lookup raises a KeyError, so the KeyError handler runs instead of the one intended, and the 'no sector' entry gets overwritten on every iteration.
Here's how I'd do it:
sector_dict = {'no sector': []}
for ticker in tickers:
    try:
        sector = yf.Ticker(ticker).info['sector']
    except KeyError:
        sector_dict['no sector'].append(ticker)
        continue  # no sector available, move on to the next ticker
    try:
        sector_dict[sector].append(ticker)
    except KeyError:
        sector_dict[sector] = [ticker]
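As an aside (not part of the answer above), the second try/except can also be collapsed into one line with dict.setdefault, which returns the existing list for the key or inserts an empty one first:

sector_dict.setdefault(sector, []).append(ticker)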

How to populate multiple dictionaries with common keys into a pandas dataframe?

I have several dictionaries whose keys are identical but whose values differ, and the order of each dictionary is strictly preserved. I am trying to find an automatic way to map each of these dictionaries onto a pandas dataframe as a new column, but I didn't get the expected output.
Original data on gist
Here is the data that I have (old data) on gist.
My attempt
Here is my attempt to map multiple dictionaries with the same keys but different values onto the dataframe. My goal is to write a handy function that vectorizes the code. Here is my inefficient but working code from the gist:
import pandas as pd

dat = pd.read_csv('old_data.csv', encoding='utf-8')
dat['type'] = dat['code'].astype(str).map(typ)
dat['anim'] = dat['code'].astype(str).map(anim)
dat['bovin'] = dat['code'].astype(str).map(bov)
dat['catg'] = dat['code'].astype(str).map(cat)
dat['foot'] = dat['code'].astype(str).map(foo)
My code works, but it is not vectorized (and not efficient, I think). How can I turn these few lines into a simple function, and make this happen as efficiently as possible?
Here is my current (and desired) output:
The output is already correct, just produced inefficiently; my current output is on gist.
If you restructure your dictionaries into a dictionary of dictionaries, you can do it in one short loop:
for keys in values.keys():
    dat[keys] = dat['code'].astype(str).map(values[keys])
Full code:
values = {"typ" :{
'20230' : 'A',
'20130' : 'A',
'20220' : 'A',
'20120' : 'A',
'20329' : 'A',
'20322' : 'A',
'20321' : 'B',
'20110' : 'B',
'20210' : 'B',
'20311' : 'B'
} ,
"anim" :{
'20230' : 'AOB',
'20130' : 'AOB',
'20220' : 'AOB',
'20120' : 'AOB',
'20329' : 'AOC',
'20322' : 'AOC',
'20321' : 'AOC',
'20110' : 'AOB',
'20210' : 'AOB',
'20311' : 'AOC'
} ,
"bov" :{
'20230' : 'AOD',
'20130' : 'AOD',
'20220' : 'AOD',
'20120' : 'AOD',
'20329' : 'AOE',
'20322' : 'AOE',
'20321' : 'AOE',
'20110' : 'AOD',
'20210' : 'AOD',
'20311' : 'AOE'
} ,
"cat" :{
'20230' : 'AOF',
'20130' : 'AOG',
'20220' : 'AOF',
'20120' : 'AOG',
'20329' : 'AOF',
'20322' : 'AOF',
'20321' : 'AOF',
'20110' : 'AOG',
'20210' : 'AOF',
'20311' : 'AOG'
} ,
"foo" :{
'20230' : 'AOL',
'20130' : 'AOL',
'20220' : 'AOM',
'20120' : 'AOM',
'20329' : 'AOL',
'20322' : 'AOM',
'20321' : 'AOM',
'20110' : 'AOM',
'20210' : 'AOM',
'20311' : 'AOM'
}
}
import pandas as pd
dat= pd.read_csv('old_data.csv', encoding='utf-8')
for keys in values.keys():
dat[keys]=dat['code'].astype(str).map(values[keys])
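Another option (a sketch not from the original answer, assuming the same values dict and a code column as above) is to turn the dictionary of dictionaries into a DataFrame and attach all five columns in a single vectorized merge:

import pandas as pd

dat = pd.read_csv('old_data.csv', encoding='utf-8')
mapping = pd.DataFrame(values)              # index: code strings, columns: typ, anim, bov, cat, foo
dat = (dat.assign(_code=dat['code'].astype(str))
          .merge(mapping, left_on='_code', right_index=True, how='left')
          .drop(columns='_code'))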

Same DataFrame.reindex code - different output

Good afternoon everyone,
I want to filter out from a DataFrame the columns that I am not interested in.
To do that - and since the columns could change based on user input (that I will not show here) - I am using the following code within my offshore_filter function:
# Note: 'df' is my DataFrame, with country codes as rows and years as column headers
import datetime as d
import pandas as pd

COUNTRIES = [
    'EU28', 'AL', 'AT', 'BE', 'BG', 'CY', 'CZ', 'DE', 'DK', 'EE', 'EL',
    'ES', 'FI', 'FR', 'GE', 'HR', 'HU', 'IE', 'IS', 'IT', 'LT', 'LU', 'LV',
    'MD', 'ME', 'MK', 'MT', 'NL', 'NO', 'PL', 'PT', 'RO', 'SE', 'SI', 'SK',
    'TR', 'UA', 'UK', 'XK'
]
YEARS = list(range(2005, int(d.datetime.now().year)))

def offshore_filter(df, countries=COUNTRIES, years=YEARS):
    # This function is specific for filtering out the countries
    # and the years not needed in the analysis

    # Filter out all of the countries not of interest
    df.drop(df[~df['country'].isin(countries)].index, inplace=True)

    # Filter out all of the years not of interest
    columns_to_keep = ['country', 'country_name'] + [i for i in years]
    temp = df.reindex(columns=columns_to_keep)
    df = temp  # This step to avoid the copy vs view complication
    return df
When I pass a years list of integers, the code works well and filters the DataFrame down to only the columns in the years list.
However, if the DataFrame's column headers are strings (e.g. '2018' instead of 2018), changing [i for i in years] into [str(i) for i in years] doesn't work, and I get columns of NaNs (as the reindex documentation says happens for missing labels).
Can you help me spot why?
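The behaviour can be reproduced in isolation: reindex matches column labels by equality, and the string '2018' is never equal to the integer 2018, so whichever form does not match the actual dtype of df.columns yields all-NaN columns. A toy sketch (hypothetical data, not the original df):

import pandas as pd

df = pd.DataFrame({2018: [1.0], 2019: [2.0]})   # integer column labels
print(df.reindex(columns=['2018', '2019']))      # all NaN: '2018' != 2018
print(df.reindex(columns=[2018, 2019]))          # matches the existing columns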
