Transform data to growth rates in Python

I have two variables and I want to express one of them (monetary_base) in terms of monthly growth.
How can I do that? In R you would first transform the data into a time series; is that also the case in Python?
# Fetching the series we need
inflacion = llamada_api('https://api.estadisticasbcra.com/inflacion_mensual_oficial')
base_monetaria = llamada_api('https://api.estadisticasbcra.com/base')
# Building DataFrames
df = pd.DataFrame(inflacion)
df_bm = pd.DataFrame(base_monetaria)
# Renaming columns
df = df.rename(columns={'d': 'Fecha', 'v': 'IPC'})
df_bm = df_bm.rename(columns={'d': 'Fecha', 'v': 'base_monetaria'})
# Fixing data types
df['Fecha']=pd.to_datetime(df['Fecha'])
df_bm['Fecha']=pd.to_datetime(df_bm['Fecha'])
# Checking that the dates are in datetime format
df['Fecha'].dtype
df_bm['Fecha'].dtype
# Filtering
df_ipc = df[(df['Fecha'] > '2002-12-31')]
df_bm_filter = df_bm[(df_bm['Fecha'] > '2002-12-31')]
# Plotting
plt.figure(figsize=(14,12))
df_ipc.plot(x = 'Fecha', y = 'IPC')
plt.title('IPC-Mensual', fontdict={'fontsize':20})
plt.ylabel('IPC')
plt.xticks(rotation=45)
plt.show()
The data looks like this
Fecha base_monetaria
1748 2003-01-02 29302
1749 2003-01-03 29360
1750 2003-01-06 29524
1751 2003-01-07 29867
1752 2003-01-08 29957
... ...
5966 2020-02-18 1941302
5967 2020-02-19 1941904
5968 2020-02-20 1887975
5969 2020-02-21 1855477
5970 2020-02-26 1807042
The idea is to take the data for the last day of the month and calculate the growth rate with the data for the last day of the previous month.

You can try something like this
from pandas.tseries.offsets import MonthEnd
import pandas as pd

df = pd.DataFrame({'Fecha': ['2020-01-31', '2020-02-29', '2020-03-31', '2020-05-31', '2020-04-30', '2020-07-31', '2020-06-30', '2020-08-31', '2020-09-30', '2020-10-31', '2020-11-30', '2020-12-31'],
                   'price': ['32132', '54321', '3213121', '432123', '32132', '54321', '32132', '54321', '3213121', '432123', '32132', '54321']})
df['Fecha'] = df['Fecha'].astype('datetime64[ns]')
df['is_month_end'] = df['Fecha'].dt.is_month_end
# keep only month-end rows, ordered chronologically
df = df[df['is_month_end'] == True]
df.sort_values('Fecha', inplace=True)
df.reset_index(drop=True, inplace=True)

def change(x, y):
    # ratio of this row's price to the previous month-end price
    try:
        index = df[df['Fecha'] == y].index.item()
        last = df.loc[index - 1, 'price']
        return float(x) / float(last)
    except Exception:
        return 0

df['new_column'] = df.apply(lambda row: change(row['price'], row['Fecha']), axis=1)
df.head(12)

Assuming base_monetaria is a monthly cumulative value, then:
df = pd.DataFrame({'Fecha': ['2020-01-31', '2020-02-29', '2020-03-31', '2020-05-31', '2020-04-30', '2020-07-31', '2020-06-30', '2020-08-31', '2020-09-30', '2020-10-31', '2020-11-30', '2020-12-31'],
                   'price': [32132, 54321, 3213121, 432123, 32132, 54321, 32132, 54321, 3213121, 432123, 32132, 54321]})
df['Fecha'] = pd.to_datetime(df['Fecha'])
df.set_index('Fecha', inplace=True)
new_df = df.groupby(pd.Grouper(freq="M")).tail(1).reset_index()
new_df['rate'] = (new_df['price'] - new_df['price'].shift(1)) / new_df['price'].shift(1)
new_df['rate'] gives you the growth rate the way you explained in the comments.
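Equivalently, pandas' built-in pct_change computes the same quantity, so the shift can be avoided:
new_df['rate'] = new_df['price'].pct_change()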

The problem can be solved by creating a column with the lagged values of base_monetaria:
df_bm_filter['is_month_end'] = df_bm_filter['Fecha'].dt.is_month_end
df_last_date = df_bm_filter[df_bm_filter['is_month_end'] == True]
df_last_date['base_monetaria_lag'] = df_last_date['base_monetaria'].shift(1)
df_last_date['bm_growth'] = (df_last_date['base_monetaria'] - df_last_date['base_monetaria_lag']) / df_last_date['base_monetaria_lag']
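Note that the last available observation in a month is not always a calendar month end (e.g. 2020-02-26 in the sample above), so the is_month_end filter can skip months. A minimal sketch that instead takes the last row of each month with resample, assuming df_bm_filter is the filtered frame from the question:
# last available observation per calendar month, then month-over-month growth
monthly = df_bm_filter.set_index('Fecha')['base_monetaria'].resample('M').last()
bm_growth = monthly.pct_change()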

Related

Is there any parameter to remove the time stamp from 'date_input()' on streamlit?

Is there any parameter to format the date when using st.date_input() in Streamlit? I want to remove the T00:00:00 that appears in the output.
I have written this code that allows the user to add new data to a DF:
st.sidebar.header("Afegeix una classe")
options_form2 = st.sidebar.form("options_form2")
dataClasse = options_form2.date_input("Data de la classe")
genere = options_form2.selectbox("Gènere", ('H', 'D'))
idCode = options_form2.selectbox(
    "ID", ('ABSUDUHM', 'DWHWBMMX', 'MIXEECJR', 'NFKQWKOP', 'RQWLPVCJ'))
duradaClasse = options_form2.selectbox("Durada de la classe",
                                       ('1h', '1h 30min', '2h'))
preu = options_form2.number_input("Preu")
submitButton = options_form2.form_submit_button()
if submitButton:
    st.write(dataClasse, genere, idCode, duradaClasse)
    newData = {
        "Data de la classe": dataClasse,
        "Genere": genere,
        "ID": idCode,
        "Durada de la classe": duradaClasse,
        "Preu": preu
    }
    # Add new data to the data frame
    df = df.append(newData, ignore_index=True)
    df.to_excel("classes_particulars.xlsx", index=False)
However, the date format added to the DF contains the time, and I don't want it. I just want to add the date.
I have tried with from datetime import date but I am not sure how to implement it the right way.
I don't think there is such a parameter, but what you can do is format the column afterwards.
Example (the before/after screenshots are omitted here). Note: if your column is not named Date, replace the Date references below with the name of your column.
import datetime
import streamlit as st

df["Date"] = [
    datetime.datetime.strptime(str(target_date).split(" ")[0], '%Y-%m-%d').date()
    for target_date in df["Date"]
]
st.dataframe(df)
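A shorter variant of the same idea (a sketch, assuming pandas is imported as pd and the column holds datetime-like values) lets pandas do the parsing:
import pandas as pd

# parse the column once, then keep only the date part
df["Date"] = pd.to_datetime(df["Date"]).dt.date
st.dataframe(df)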
After this, the column shows only the date, without the time component.
Edit: here is the full example applied to your code:
import streamlit as st
import pandas as pd
import datetime
st.sidebar.header("Afegeix una classe")
options_form2 = st.sidebar.form("options_form2")
dataClasse = options_form2.date_input("Data de la classe")
genere = options_form2.selectbox("Gènere", ('H', 'D'))
idCode = options_form2.selectbox(
    "ID", ('ABSUDUHM', 'DWHWBMMX', 'MIXEECJR', 'NFKQWKOP', 'RQWLPVCJ'))
duradaClasse = options_form2.selectbox("Durada de la classe",
                                       ('1h', '1h 30min', '2h'))
preu = options_form2.number_input("Preu")
submitButton = options_form2.form_submit_button()
df = pd.DataFrame([])
if submitButton:
    st.write(dataClasse, genere, idCode, duradaClasse)
    newData = {
        "Data de la classe": dataClasse,
        "Genere": genere,
        "ID": idCode,
        "Durada de la classe": duradaClasse,
        "Preu": preu
    }
    # Add new data to the data frame
    df = df.append(newData, ignore_index=True)
    # keep only the date part before writing out
    df["Data de la classe"] = [
        datetime.datetime.strptime(str(target_date).split(" ")[0], '%Y-%m-%d').date()
        for target_date in df["Data de la classe"]
    ]
    df.to_excel("classes_particulars.xlsx", index=False)
    st.dataframe(df)
Output:

Pandas dataframe: keep only rows depending on actual date and maximum 7 days old

I have a dataframe of articles; here are the first few:
0 La reprise de l’économie française s’étiole et... Sur le Vieux-Port, à Marseille, le 28 septembr... 2020-10-06
1 Aux Etats-Unis, un rapport parlementaire veut ... Les icones des services de Google, Amazon, Fac... 2020-10-07
2 Les beaux jours de la médiation en entreprise Carnet de bureau. Des entreprises appellent de... 2020-10-07
3 Plan de relance : comment « déterminer mainten... Tribune. Parmi les multiples critiques entendu... 2020-10-07
4 Des lauréats du Nobel qui ne le méritaient pas Chaque automne, depuis plus d’un siècle, le pe... 2015-10-07
I would like to keep only articles that are at most 7 days old relative to the current date.
Something like this: current date - 7 days <= article date <= current date.
I have coded this to scrape articles:
%%time
lemonde_title = []
lemonde_content = []
published_date =[]
from newspaper import Article
from newspaper import ArticleException
from datetime import datetime
for art_link in all_urls:
    try:
        art = Article(art_link)
        art.download()
        art.parse()
        lemonde_title.append(art.title)
        lemonde_content.append(art.text)
        try:
            publish_date = datetime.strptime(str(art.publish_date), '%Y-%m-%d %H:%M:%S').strftime('%Y-%M-%D')
            published_date.append(publish_date)
        except:
            published_date.append('unconverted')
    except ArticleException:
        pass
I converted the date column like this:
# converting the string to datetime format
df['date'] = pd.to_datetime(df['date'], format='%Y-%M-%D')
And when I try the following code I get the error TypeError: Invalid comparison between dtype=datetime64[ns] and date:
import datetime
date_before = datetime.date.today() - datetime.timedelta(days=7)
df = df[df['date'] >date_before]
One way to fix the comparison is to convert the cutoff date to a pandas Timestamp so that both sides of the comparison have the same type:
df = pd.DataFrame({
    'text': ["t1", "t2", "t3"],
    'date': ['2020-10-06', '2020-10-05', '2012-10-06']
})
df['date'] = pd.to_datetime(df['date'])
till = pd.to_datetime(datetime.date.today() - datetime.timedelta(days=7))
df = df[df['date'] >= till]
Output:
text date
0 t1 2020-10-06
1 t2 2020-10-05
Or use this:
import datetime
date_before = datetime.date.today() - datetime.timedelta(days=7)
df = df[df['date'] >date_before]
You can adjust date_before however you want.
import datetime as dt
df[(dt.datetime.today()-df.date).apply(lambda x: 0<= x.days <7) ]
This should do the trick!
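For completeness, here is a compact sketch that keeps both ends of the window explicit (assuming df['date'] is already datetime64, as above):
import pandas as pd

today = pd.Timestamp.today().normalize()
# keep articles published within the last 7 days, up to today
df_recent = df[(df['date'] >= today - pd.Timedelta(days=7)) & (df['date'] <= today)]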

A list of tickers to get sector and name

import pandas as pd
import datetime as dt
from pandas_datareader import data as web
import yfinance as yf
yf.pdr_override()
filename=r'C:\Users\User\Desktop\from_python\data_from_python.xlsx'
yeah = pd.read_excel(filename, sheet_name='entry')
stock = []
stock = list(yeah['name'])
stock = [ s.replace('\xa0', '') for s in stock if not pd.isna(s) ]
adj_close=pd.DataFrame([])
high_price=pd.DataFrame([])
low_price=pd.DataFrame([])
volume=pd.DataFrame([])
print(stock)
['^GSPC', 'NQ=F', 'AAU', 'ALB', 'AOS', 'APPS', 'AQB', 'ASPN', 'ATHM', 'AZRE', 'BCYC', 'BGNE', 'CAT', 'CC', 'CLAR', 'CLCT', 'CMBM', 'CMT', 'CRDF', 'CYD', 'DE', 'DKNG', 'EARN', 'EMN', 'FBIO', 'FBRX', 'FCX', 'FLXS', 'FMC', 'FMCI', 'GME', 'GRVY', 'HAIN', 'HBM', 'HIBB', 'IEX', 'IOR', 'KFS', 'MAXR', 'MPX', 'MRTX', 'NSTG', 'NVCR', 'NVO', 'OESX', 'PENN', 'PLL', 'PRTK', 'RDY', 'REGI', 'REKR', 'SBE', 'SQM', 'TCON', 'TCS', 'TGB', 'TPTX', 'TRIL', 'UEC', 'VCEL', 'VOXX', 'WIT', 'WKHS', 'XNCR']
for symbol in stock:
    adj_close[symbol] = web.get_data_yahoo([symbol], start, end)['Adj Close']
I have a list of tickers and I already get the adjusted close prices; how can I get each ticker's NAME and SECTOR?
For a single ticker I found on the web that it can be done like this:
sbux = yf.Ticker("SBUX")
tlry = yf.Ticker("TLRY")
print(sbux.info['sector'])
print(tlry.info['sector'])
How can I build this into a dataframe so that I can export the data to Excel, as I do for the adjusted close prices?
Thanks a lot!
You can try this using a package called yahooquery. Disclaimer: I am the author of the package.
from yahooquery import Ticker
import pandas as pd
symbols = ['^GSPC', 'NQ=F', 'AAU', 'ALB', 'AOS', 'APPS', 'AQB', 'ASPN', 'ATHM', 'AZRE', 'BCYC', 'BGNE', 'CAT', 'CC', 'CLAR', 'CLCT', 'CMBM', 'CMT', 'CRDF', 'CYD', 'DE', 'DKNG', 'EARN', 'EMN', 'FBIO', 'FBRX', 'FCX', 'FLXS', 'FMC', 'FMCI', 'GME', 'GRVY', 'HAIN', 'HBM', 'HIBB', 'IEX', 'IOR', 'KFS', 'MAXR', 'MPX', 'MRTX', 'NSTG', 'NVCR', 'NVO', 'OESX', 'PENN', 'PLL', 'PRTK', 'RDY', 'REGI', 'REKR', 'SBE', 'SQM', 'TCON', 'TCS', 'TGB', 'TPTX', 'TRIL', 'UEC', 'VCEL', 'VOXX', 'WIT', 'WKHS', 'XNCR']
# Create Ticker instance, passing symbols as first argument
# Optional asynchronous argument allows for asynchronous requests
tickers = Ticker(symbols, asynchronous=True)
data = tickers.get_modules("summaryProfile quoteType")
df = pd.DataFrame.from_dict(data).T
# flatten dicts within each column, creating new dataframes
dataframes = [pd.json_normalize([x for x in df[module] if isinstance(x, dict)]) for module in ['summaryProfile', 'quoteType']]
# concat dataframes from previous step
df = pd.concat(dataframes, axis=1)
# View columns
df.columns
Index(['address1', 'address2', 'city', 'state', 'zip', 'country', 'phone',
'fax', 'website', 'industry', 'sector', 'longBusinessSummary',
'fullTimeEmployees', 'companyOfficers', 'maxAge', 'exchange',
'quoteType', 'symbol', 'underlyingSymbol', 'shortName', 'longName',
'firstTradeDateEpochUtc', 'timeZoneFullName', 'timeZoneShortName',
'uuid', 'messageBoardId', 'gmtOffSetMilliseconds', 'maxAge'],
dtype='object')
# Data you're looking for
df[['symbol', 'shortName', 'sector']].head(10)
symbol shortName sector
0 NQZ20.CME Nasdaq 100 Dec 20 NaN
1 ALB Albemarle Corporation Basic Materials
2 AOS A.O. Smith Corporation Industrials
3 ASPN Aspen Aerogels, Inc. Industrials
4 AAU Almaden Minerals, Ltd. Basic Materials
5 ^GSPC S&P 500 NaN
6 ATHM Autohome Inc. Communication Services
7 AQB AquaBounty Technologies, Inc. Consumer Defensive
8 APPS Digital Turbine, Inc. Technology
9 BCYC Bicycle Therapeutics plc Healthcare
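To get this into Excel, as with the adjusted close prices, you can write the selected columns out directly (the file name below is arbitrary, and to_excel needs an Excel writer engine such as openpyxl installed):
df[['symbol', 'shortName', 'sector']].to_excel('ticker_info.xlsx', index=False)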
This approach fetches the prices and sectors at the same time. However, some tickers do not have a sector, so error handling is added.
Since each column is keyed by both the sector and the ticker name, the columns are converted to a hierarchical (MultiIndex) layout and the retrieved data frame is updated accordingly. Finally, the result is saved in CSV format so it can be imported into Excel. I've only tried a subset of the tickers because the full list is large, so there may be some issues.
import datetime
import pandas as pd
import yfinance as yf
import pandas_datareader.data as web
yf.pdr_override()
start = "2018-01-01"
end = "2019-01-01"
# symbol = ['^GSPC', 'NQ=F', 'AAU', 'ALB', 'AOS', 'APPS', 'AQB', 'ASPN', 'ATHM', 'AZRE', 'BCYC', 'BGNE', 'CAT',
#'CC', 'CLAR', 'CLCT', 'CMBM', 'CMT', 'CRDF', 'CYD', 'DE', 'DKNG', 'EARN', 'EMN', 'FBIO', 'FBRX', 'FCX', 'FLXS',
#'FMC', 'FMCI', 'GME', 'GRVY', 'HAIN', 'HBM', 'HIBB', 'IEX', 'IOR', 'KFS', 'MAXR', 'MPX', 'MRTX', 'NSTG', 'NVCR',
#'NVO', 'OESX', 'PENN', 'PLL', 'PRTK', 'RDY', 'REGI', 'REKR', 'SBE', 'SQM', 'TCON', 'TCS', 'TGB', 'TPTX', 'TRIL',
#'UEC', 'VCEL', 'VOXX', 'WIT', 'WKHS', 'XNCR']
stock = ['^GSPC', 'NQ=F', 'AAU', 'ALB', 'AOS', 'APPS']
adj_close = pd.DataFrame([])
for symbol in stock:
    try:
        sector = yf.Ticker(symbol).info['sector']
        name = yf.Ticker(symbol).info['shortName']
    except Exception:
        sector = 'None'
        name = 'None'
    # column key: (sector, "SYMBOL_Name"), matching the output shown below
    adj_close[sector, symbol + '_' + name] = web.get_data_yahoo(symbol, start=start, end=end)['Adj Close']
idx = pd.MultiIndex.from_tuples(adj_close.columns)
adj_close.columns = idx
adj_close.head()
None Basic Materials Industrials Technology
^GSPC_None NQ=F_None AAU_None ALB_Albemarle Corporation AOS_A.O. Smith Corporation APPS_Digital Turbine, Inc.
2018-01-02 2695.810059 6514.75 1.03 125.321663 58.657742 1.79
2018-01-03 2713.060059 6584.50 1.00 125.569397 59.010468 1.87
2018-01-04 2723.989990 6603.50 0.98 124.073502 59.286930 1.86
2018-01-05 2743.149902 6667.75 1.00 125.502716 60.049587 1.96
2018-01-08 2747.709961 6688.00 0.95 130.962250 60.335583 1.96
# for excel
adj_close.to_csv('stock.csv', sep=',')

pandas.concat() does not fill the columns

I am trying to create dummy data as follows:
import numpy as np
import pandas as pd
def dummy_historical(seclist, dates, startvalues):
    dfHist = pd.DataFrame(0, index=[0], columns=seclist)
    for sec in seclist:
        # get the starting value for this security (works fine)
        svalue = startvalues[sec].max()
        # this creates a random sequence of 84 rows and 1 column (works fine)
        dfRandom = pd.DataFrame(np.random.randint(svalue-10, svalue+10, size=(dates.size, 1)), index=dates, columns=[sec])
        # does not work
        dfHist[sec] = pd.concat([dfHist[sec], dfRandom])
    return dfHist
When I print dfHist, it only shows the first row (as when it was initialized), so nothing has been filled in.
Here is an example of the data:
seclist = ['AAPL', 'GOOGL']
# use any number for startvalues
dates = DatetimeIndex(['2017-01-05', '2017-01-06', '2017-01-07', '2017-01-08',
'2017-01-09', '2017-01-10', '2017-01-11', '2017-01-12',
'2017-01-13', '2017-01-14', '2017-01-15', '2017-01-16',
'2017-01-17', '2017-01-18', '2017-01-19', '2017-01-20',
'2017-01-21', '2017-01-22', '2017-01-23', '2017-01-24',
'2017-01-25', '2017-01-26', '2017-01-27', '2017-01-28',
'2017-01-29', '2017-01-30', '2017-01-31', '2017-02-01',
'2017-02-02', '2017-02-03', '2017-02-04', '2017-02-05',
'2017-02-06', '2017-02-07', '2017-02-08', '2017-02-09',
'2017-02-10', '2017-02-11', '2017-02-12', '2017-02-13',
'2017-02-14', '2017-02-15', '2017-02-16', '2017-02-17',
'2017-02-18', '2017-02-19', '2017-02-20', '2017-02-21',
'2017-02-22', '2017-02-23', '2017-02-24', '2017-02-25',
'2017-02-26', '2017-02-27', '2017-02-28', '2017-03-01',
'2017-03-02', '2017-03-03', '2017-03-04', '2017-03-05',
'2017-03-06', '2017-03-07', '2017-03-08', '2017-03-09',
'2017-03-10', '2017-03-11', '2017-03-12', '2017-03-13',
'2017-03-14', '2017-03-15', '2017-03-16', '2017-03-17',
'2017-03-18', '2017-03-19', '2017-03-20', '2017-03-21',
'2017-03-22', '2017-03-23', '2017-03-24', '2017-03-25',
'2017-03-26', '2017-03-27', '2017-03-28', '2017-03-29'],
dtype='datetime64[ns]', freq='D')
You need to pass axis=1 to concat if you want to concatenate columns. In addition, you don't need to initialize your data frame with data at the beginning (unless you want to keep the 0 value):
def dummy_historical(seclist, dates, startvalues):
    dfHist = pd.DataFrame()
    for sec in seclist:
        svalue = startvalues[sec].max()
        dfRandom = pd.DataFrame(np.random.randint(svalue-10, svalue+10, size=(dates.size, 1)), index=dates, columns=[sec])
        dfHist = pd.concat([dfHist, dfRandom], axis=1)
    return dfHist
You can write this even more concisely, avoiding concat entirely:
def generate(sec):
    svalue = startvalues[sec].max()
    return np.random.randint(svalue-10, svalue+10, size=dates.size)

dfHist = pd.DataFrame({sec: generate(sec) for sec in seclist}, index=dates)
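A hypothetical call, just to illustrate the expected inputs (startvalues maps each security to something with a .max(), for example a Series of prices):
import numpy as np
import pandas as pd

seclist = ['AAPL', 'GOOGL']
dates = pd.date_range('2017-01-05', periods=84, freq='D')
startvalues = {'AAPL': pd.Series([120]), 'GOOGL': pd.Series([830])}
dfHist = pd.DataFrame({sec: generate(sec) for sec in seclist}, index=dates)
print(dfHist.head())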

Try to include a column based on input and file name in Pandas Dataframe in Python

I have several csv files with the following structure:
Erster Hoch Tief Schlusskurs Stuecke Volumen
Datum
14.02.2017 151.55 152.35 151.05 152.25 110.043 16.687.376
13.02.2017 149.85 152.20 149.25 151.25 415.76 62.835.200
10.02.2017 149.00 150.05 148.65 149.40 473.664 70.746.088
09.02.2017 144.75 148.45 144.35 148.00 642.175 94.348.392
Erster Hoch Tief Schlusskurs Stuecke Volumen
Datum
14.02.2017 111.454 111.776 111.454 111.776 44 4.918
13.02.2017 110.570 110.989 110.570 110.989 122 13.535
10.02.2017 109.796 110.705 109.796 110.705 0 0
09.02.2017 107.993 108.750 107.993 108.750 496 53.933
The files differ only in their file names:
wkn_A1EWWW_historic.csv
wkn_A0YAQA_historic.csv
I want to have the following output:
Date wkn Open High Low Close Pieces Volume
14.02.2017 A1EWWW 151.55 152.35 151.05 152.25 110.043 16.687.376
13.02.2017 A1EWWW 149.85 152.20 149.25 151.25 415.76 62.835.200
10.02.2017 A1EWWW 149.00 150.05 148.65 149.40 473.664 70.746.088
09.02.2017 A1EWWW 144.75 148.45 144.35 148.00 642.175 94.348.392
Date wkn Open High Low Close Pieces Volume
14.02.2017 A0YAQA 111.454 111.776 111.454 111.776 44 4.918
13.02.2017 A0YAQA 110.570 110.989 110.570 110.989 122 13.535
10.02.2017 A0YAQA 109.796 110.705 109.796 110.705 0 0
09.02.2017 A0YAQA 107.993 108.750 107.993 108.750 496 53.933
The code looks like the following:
import pandas as pd
wkn_list_dummy = {'A0YAQA','A1EWWW'}
for w_list in wkn_list_dummy:
    url = 'C:/wkn_' + str(w_list) + '_historic.csv'
    df = pd.read_csv(url, encoding='cp1252', sep=';', decimal=',', index_col=0)
    print(df)
I tried using melt, but it did not work.
You can add a column by just assigning a value to it:
df['new_column'] = 'string'
All together:
import pandas as pd

wkn_list_dummy = {'A0YAQA', 'A1EWWW'}
final_df = pd.DataFrame()
for w_list in wkn_list_dummy:
    url = 'C:/wkn_' + str(w_list) + '_historic.csv'
    df = pd.read_csv(url, encoding='cp1252', sep=';', decimal=',', index_col=0)
    df['wkn'] = w_list
    # note: DataFrame.append was removed in pandas 2.0; use pd.concat there
    final_df = final_df.append(df)
final_df.reset_index(inplace=True)
print(final_df)
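If you also want the English column names from your desired output, here is a sketch using pd.concat instead of append; the German-to-English mapping and the index name 'Datum' are assumptions based on your sample data:
import pandas as pd

rename_map = {'Erster': 'Open', 'Hoch': 'High', 'Tief': 'Low',
              'Schlusskurs': 'Close', 'Stuecke': 'Pieces', 'Volumen': 'Volume'}
frames = []
for w_list in {'A0YAQA', 'A1EWWW'}:
    url = 'C:/wkn_' + str(w_list) + '_historic.csv'
    df = pd.read_csv(url, encoding='cp1252', sep=';', decimal=',', index_col=0)
    df = df.rename(columns=rename_map)
    df['wkn'] = w_list
    frames.append(df)
final_df = pd.concat(frames).reset_index().rename(columns={'Datum': 'Date'})
print(final_df)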
