CPI data in python, create a dataframe - python

I am fairly new to Python and I am trying to create a data frame with all the CPI data in it.
However, I cannot work out how to increment the month in the function call.
Can anybody help?
import cpi
from datetime import date
cpi.get(date(2022, 11, 1))
n = 12
I suppose it should be something like this:
for i in range(n):
    cpi.get(date(2022,n+1,1))
But I have no idea how to increment the month and put the results into a dataframe.

For a single year's worth of data:
import cpi
import datetime as dt
import pandas as pd
date_list = []
cpi_list = []
for i in range(12):
    date_list.append(dt.date(2022, i + 1, 1))  # i + 1 because months start at 1, not 0
    cpi_list.append(cpi.get(date_list[i]))
df = pd.DataFrame({'Date': date_list, 'CPI': cpi_list})
Hope that helps,
--K
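To cover more than one year, the month loop above can be swapped for a generated range of month-start dates. This is only a sketch: `build_monthly_frame` and `fake_lookup` are hypothetical names, and `fake_lookup` is a placeholder so the example runs without the cpi package. With the package installed, passing `cpi.get` as the lookup should work the same way, assuming data exists for every requested month.

```python
import pandas as pd


def build_monthly_frame(start, end, lookup):
    """Build a DataFrame with one row per month start between start and end."""
    months = pd.date_range(start, end, freq="MS")  # "MS" = month start
    dates = [ts.date() for ts in months]           # plain datetime.date objects
    return pd.DataFrame({"Date": dates, "CPI": [lookup(d) for d in dates]})


# Placeholder lookup (assumption): returns a made-up number per month.
def fake_lookup(day):
    return 100.0 + day.month


df = build_monthly_frame("2021-01-01", "2022-12-01", fake_lookup)
print(df.head())
```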

Related

Choosing a date randomly in a period?

I want to randomly choose a date between 2021/1/1 and 2021/12/31. The process might be as follows:
generate a list of dates from 2021/1/1 to 2021/12/31, 365 elements in total;
randomly choose a date from the list.
Thanks!
As you tagged the question pandas, here is a pandas way:
out = (pd.date_range('2021/1/1', '2021/12/31')  # every date in the period
         .strftime('%Y/%m/%d')                   # format as string
         .to_series().sample(n=1)                # select 1 random date
         .squeeze()                              # reduce the 1-element Series to a scalar
      )
A variant that sets the start date and the number of days:
out = (pd.date_range('2021/1/1', periods=365)  # every date in the period
         .strftime('%Y/%m/%d')                  # format as string
         .to_series().sample(n=1)               # select 1 random date
         .squeeze()                             # reduce the 1-element Series to a scalar
      )
Example output: '2021/10/09'
List of random dates
You can easily adapt this to generate several dates:
out = (pd
       .date_range('2021/1/1', periods=365)
       .strftime('%Y/%m/%d').to_series()
       .sample(n=10)
       .to_list()
      )
Example output:
['2021/04/06', '2021/09/11', '2021/08/02', '2021/09/17', '2021/12/30',
'2021/10/27', '2021/03/09', '2021/02/27', '2021/11/28', '2021/01/18']
Here's another way, drawing random integers between two epoch timestamps:
import pandas as pd
import numpy as np
date1 = pd.Timestamp("2021/01/01")
date2 = pd.Timestamp("2021/12/31")
print(date1.timestamp())
print(date2.timestamp())
n = 3  # how many samples to take
# randint needs integer bounds and excludes the upper one, so add one day
# (86400 seconds) to keep 2021-12-31 selectable
low = int(date1.timestamp())
high = int(date2.timestamp()) + 86400
out = pd.to_datetime(np.random.randint(low, high, n), unit="s").normalize()
print(out)
Output:
1609459200.0
1640908800.0
DatetimeIndex(['2021-04-13', '2021-01-17', '2021-08-24'], dtype='datetime64[ns]', freq=None)
Using only the standard library:
from datetime import date, timedelta
from random import sample
start = date(2021, 1, 1)
end = date(2021, 12, 31)
dates = []
day = start
while day <= end:
    dates.append(day)
    day = day + timedelta(days=1)
sample(dates, 1)  # returns a one-element list
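If the full list of dates is not needed for anything else, the two steps from the question (build the list, then pick from it) can be collapsed into picking a random day offset from the start date. A minimal standard-library sketch, where `random_date` is a hypothetical helper name:

```python
import random
from datetime import date, timedelta


def random_date(start, end, rng=random):
    """Return a uniformly random date with start <= result <= end."""
    span_days = (end - start).days
    # randrange excludes its upper bound, so add 1 to make `end` reachable
    return start + timedelta(days=rng.randrange(span_days + 1))


picked = random_date(date(2021, 1, 1), date(2021, 12, 31))
print(picked)
```

Passing a seeded `random.Random(...)` instance as `rng` makes the choice reproducible.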

Converting days to years in Pandas DataFrame

I am trying to find the difference between two dates in a Pandas DataFrame. This is my code:
raw['CALCULATED_AGE'] = ((raw.COMMENCEMENT_DATE - raw.DATE_OF_BIRTH))
This gives me the following output:
[image: Pandas output column]
I just want to convert the days to years. Is there an easy way to do this?
Thank you so much
You can use relativedelta from dateutil and adapt it to your case. It works on one pair of dates at a time:
from dateutil.relativedelta import relativedelta
rdelta = relativedelta(raw.COMMENCEMENT_DATE.iloc[0], raw.DATE_OF_BIRTH.iloc[0]).years  # first row only
Full code example:
create the data:
import pandas as pd
from dateutil.relativedelta import relativedelta
raw = pd.DataFrame({'COMMENCEMENT_DATE': ['3/10/2000', '3/11/2000', '3/12/2000'],
'DATE_OF_BIRTH': ['3/10/1990', '3/11/1991', '3/12/1990']})
raw['COMMENCEMENT_DATE'] = pd.to_datetime(raw['COMMENCEMENT_DATE'])
raw['DATE_OF_BIRTH'] = pd.to_datetime(raw['DATE_OF_BIRTH'])
Calc:
raw['CALCULATED_AGE'] = raw.apply(lambda x: relativedelta(x.COMMENCEMENT_DATE, x.DATE_OF_BIRTH).years, axis=1)
Output:
COMMENCEMENT_DATE DATE_OF_BIRTH CALCULATED_AGE
0 2000-03-10 1990-03-10 10
1 2000-03-11 1991-03-11 9
2 2000-03-12 1990-03-12 10
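EDIT
Another solution, which also works for months:
import numpy as np
raw['CALCULATED_AGE'] = (raw.COMMENCEMENT_DATE - raw.DATE_OF_BIRTH)/np.timedelta64(1, 'Y')
raw['CALCULATED_AGE'] = raw['CALCULATED_AGE'].astype(int)
If you want the result in months, just change 'Y' to 'M'. Note that recent pandas releases may reject the 'Y' and 'M' units because they are not fixed-length durations, so this may only run on older versions.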
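A version-independent way to get whole years from the timedelta column is to divide the day count by an average year length. This is a sketch, not an exact calendar calculation: the 365.25 divisor is an assumption, and it can be off by one for dates that fall exactly on an anniversary, where relativedelta stays exact. The sample data mirrors the example above.

```python
import pandas as pd

raw = pd.DataFrame({
    "COMMENCEMENT_DATE": pd.to_datetime(["2000-03-10", "2000-03-11", "2000-03-12"]),
    "DATE_OF_BIRTH": pd.to_datetime(["1990-03-10", "1991-03-11", "1990-03-12"]),
})

# Subtracting two datetime columns gives a timedelta column
elapsed = raw["COMMENCEMENT_DATE"] - raw["DATE_OF_BIRTH"]

DAYS_PER_YEAR = 365.25  # assumption: average year length including leap years
raw["CALCULATED_AGE"] = (elapsed.dt.days / DAYS_PER_YEAR).astype(int)
print(raw)
```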

Am i doing something wrong with the loops?

I am using Python to do some data cleaning. I used the datetime module to parse a date-time string and tried to create another column with just the time.
My script runs, but every row ends up with the value from the last row of the data frame.
Here is the code:
import datetime
i = 0
for index, row in df.iterrows():
    date = datetime.datetime.strptime(df.iloc[i, 0], "%Y-%m-%dT%H:%M:%SZ")
    df['minutes'] = date.minute
    i = i + 1
This is the dataframe:
[image: Output]
df['minutes'] = date.minute reassigns the entire 'minutes' column with the scalar value date.minute on every iteration, so only the value from the last iteration survives.
You don't need a loop here, as is true in the vast majority of cases when using pandas.
You can use vectorized assignment instead. Just replace 'source_column_name' with the name of the column that holds the source data.
df['minutes'] = pd.to_datetime(df['source_column_name'], format='%Y-%m-%dT%H:%M:%SZ').dt.minute
Most likely you won't even need to specify format, as pd.to_datetime is fairly smart.
Quick example:
df = pd.DataFrame({'a': ['2020.1.13', '2019.1.13']})
df['year'] = pd.to_datetime(df['a']).dt.year
print(df)
Output:
           a  year
0  2020.1.13  2020
1  2019.1.13  2019
It seems like you're trying to extract the time from a datetime that is stored as a string. That's what I understood from your post.
Could you give this a shot?
from datetime import datetime
import pandas as pd
def get_time(date_cell):
    dt = datetime.strptime(date_cell, "%Y-%m-%dT%H:%M:%SZ")
    return datetime.strftime(dt, "%H:%M:%SZ")
df['time'] = df['date_time'].apply(get_time)
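Both answers can be combined into a single vectorized pass: parse the string column once, then derive as many columns from it as needed. A small sketch with made-up sample rows (the column name 'date_time' follows the answer above, and the values are invented for illustration):

```python
import pandas as pd

# Invented sample data in the same timestamp format as the question
df = pd.DataFrame({"date_time": ["2018-12-07T10:15:30Z", "2018-12-07T11:45:00Z"]})

# Parse the whole column at once instead of looping over rows
parsed = pd.to_datetime(df["date_time"], format="%Y-%m-%dT%H:%M:%SZ")

df["minutes"] = parsed.dt.minute          # one value per row, not a single scalar
df["time"] = parsed.dt.strftime("%H:%M:%S")
print(df)
```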

pandas - get a dataframe for every day

I have a DataFrame with dates in the index, and I make a subset of the DataFrame for every day by hand. Is there a way to write a function or a loop that generates these subsets automatically?
import json
import requests
import pandas as pd
from pandas.io.json import json_normalize
import datetime as dt
#Get the channel feeds from ThingSpeak
response = requests.get("https://api.thingspeak.com/channels/518038/feeds.json?api_key=XXXXXX&results=500")
#Convert Json object to Python object
response_data = response.json()
channel_head = response_data["channel"]
channel_bottom = response_data["feeds"]
#Create DataFrame with Pandas
df = pd.DataFrame(channel_bottom)
#rename Parameters
df = df.rename(columns={"field1":"PM 2.5","field2":"PM 10"})
#Drop all entries with at least one NaN
df = df.dropna(how="any")
#Convert time to datetime object
df["created_at"] = df["created_at"].apply(lambda x:dt.datetime.strptime(x,"%Y-%m-%dT%H:%M:%SZ"))
#Set dates as Index
df = df.set_index(keys="created_at")
#Make a DataFrame for every day
df_2018_12_07 = df.loc['2018-12-07']
df_2018_12_06 = df.loc['2018-12-06']
df_2018_12_05 = df.loc['2018-12-05']
df_2018_12_04 = df.loc['2018-12-04']
df_2018_12_03 = df.loc['2018-12-03']
df_2018_12_02 = df.loc['2018-12-02']
Supposing that you run this on the first day of the following week (so, exporting Monday to Sunday on the next Monday), you can do it as follows:
from datetime import date, timedelta
today = date.today()
day = today - timedelta(days=7)  # so, if today is Monday, we start on the Monday before
while day < today:
    df1 = df.loc[str(day)]
    df1.to_csv('mypath' + str(day) + '.csv')  # so that the export files have different names
    day = day + timedelta(days=1)
To get only the current day's rows, you can use:
from datetime import date
today = str(date.today())
df = df.loc[today]
and schedule the script using any scheduler, such as crontab.
You can create a dictionary of DataFrames, then select a DataFrame by its key:
dfs = dict(tuple(df.groupby(df.index.strftime('%Y-%m-%d'))))
print(dfs['2018-12-07'])
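To see the dictionary approach end to end without calling the API, here is a small sketch on invented readings (the timestamps and values are made up, and the column names follow the question). Each key is a day string and each value is the sub-DataFrame for that day:

```python
import pandas as pd

# Invented sample data with a DatetimeIndex, standing in for the channel feed
index = pd.to_datetime(
    ["2018-12-06 08:00:00", "2018-12-07 09:00:00", "2018-12-07 21:30:00"]
)
df = pd.DataFrame(
    {"PM 2.5": [11.0, 14.5, 9.2], "PM 10": [20.1, 25.3, 17.8]}, index=index
)

# One DataFrame per calendar day, keyed by the formatted date
daily = {day: frame for day, frame in df.groupby(df.index.strftime("%Y-%m-%d"))}

for day, frame in daily.items():
    print(day, len(frame))
```

A dictionary avoids creating one variable per day, and new days are picked up without changing the code.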

Selecting specific date from pandas data-frame

From daily stock price data, I want to select the end-of-month price. I am doing this with the following code.
import datetime
from pandas_datareader import data as pdr
import pandas as pd
end = datetime.date.today()
begin=end-pd.DateOffset(365*2)
st=begin.strftime('%Y-%m-%d')
ed=end.strftime('%Y-%m-%d')
data = pdr.get_data_yahoo("AAPL",st,ed)
mon_data=pd.DataFrame(data['Adj Close'].resample('M').apply(lambda x: x[-2])).set_index(data.index)
The line above selects the end-of-month data, and here is the output.
If I want to select the penultimate value of the month, I can do it using the following code.
mon_data=pd.DataFrame(data['Adj Close'].resample('M').apply(lambda x: x[-2]))
Here is the output.
However, the index still shows the end-of-month date. When I choose the penultimate value of the month, I want the index to be 2015-12-30 instead of 2015-12-31.
Please suggest a way forward. I hope my question is clear.
Thank you in advance.
Regards,
Abhishek
I am not sure if there is a way to do it with resample. But you can get what you want using groupby and pd.Grouper (older pandas versions called this pd.TimeGrouper, which has since been removed).
import datetime
from pandas_datareader import data as pdr
import pandas as pd
end = datetime.date.today()
begin = end - pd.DateOffset(365*2)
st = begin.strftime('%Y-%m-%d')
ed = end.strftime('%Y-%m-%d')
data = pdr.get_data_yahoo("AAPL",st,ed)
data['Date'] = data.index
mon_data = (
    data[['Date', 'Adj Close']]
    .groupby(pd.Grouper(freq='M')).nth(-2)
    .set_index('Date')
)
The simplest solution is to take the index of your newly created dataframe and subtract the number of days you want to go back:
n = 1
mon_data=pd.DataFrame(data['Adj Close'].resample('M').apply(lambda x: x[-1-n]))
mon_data.index = mon_data.index - datetime.timedelta(days=n)
Also, looking at your data, I think you should resample not to 'month end frequency' but rather to 'business month end frequency':
.resample('BM')
But even that won't cover everything. For instance, December 29, 2017 is a business month end, but this date doesn't appear in your data (which ends on December 8, 2017). So you could add a small fix for that (assuming the original data is sorted by date):
end_of_months = mon_data.index.tolist()
end_of_months[-1] = data.index[-1]
mon_data.index = end_of_months
So the full code will look like:
n = 1
mon_data=pd.DataFrame(data['Adj Close'].resample('BM').apply(lambda x: x[-1-n]))
end_of_months = mon_data.index.tolist()
end_of_months[-1] = data.index[-1]
mon_data.index = end_of_months
mon_data.index = mon_data.index - datetime.timedelta(days=n)
By the way, your .set_index(data.index) throws an error because data and mon_data have different lengths (mon_data is grouped by month).
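Subtracting a fixed number of calendar days can land on a weekend or a missing trading day, so another option is to look up the actual date of the second-to-last row in each month and select by it. The sketch below uses synthetic prices on a `pd.bdate_range` index in place of the downloaded data (an assumption, since holidays are not modelled), and the variable names are illustrative only:

```python
import numpy as np
import pandas as pd

# Synthetic stand-in for the downloaded data: one value per business day
idx = pd.bdate_range("2015-10-01", "2015-12-31")
prices = pd.Series(np.arange(len(idx), dtype=float), index=idx, name="Adj Close")

# For each calendar month, find the date of the second-to-last available row
months = prices.index.to_period("M")
penultimate_dates = (
    prices.index.to_series().groupby(months).apply(lambda s: s.iloc[-2])
)

# Select those rows; the index keeps the real dates instead of month ends
mon_data = prices.loc[penultimate_dates.to_numpy()].to_frame()
print(mon_data)
```

Because the dates come from the data itself, the last (possibly incomplete) month needs no special handling.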
