Adapting Pandas DataFrame indexes with a function - python

I am working with different data sources and storing each one in a different DataFrame. I want to combine those DataFrames into one big DataFrame, but first I need to unify their indexes. Some of the DataFrames' indexes use the format YYYY-MM-DD, others use YYYYTNN with N = 1, 2, 3, 4, and the last format is YYYYMNN with NN from 01 to 12.
They represent, respectively, a date, a 3-month period of a year and a one-month period of a year. Mathematically it is quite easy to convert all of them to the first format, but I was wondering if there is some way to do it in Python so that I do not have to change all my data indexes manually. The indexes are just pieces of text, so I don't know how I could read YYYYTN and detect the number after the T, for example.
Thank you in advance.

I finally managed to solve my issues. In case someone is interested this is what I did:
For the data representing 3-month periods, I used datetime and dateutil's relativedelta to create a list of dates from the date where my data began until the date where it ended, taking only the 1st of January, April, July and October. The data for the first 3 months of the year was stored as of the 1st of January, and the other quarters were stored the same way. After that I added the list as a new column to the DataFrame so that I could set it as the index. Here is the code:
def perdelta(start, end, delta):
    curr = start
    while curr < end:
        yield curr
        curr += delta
I did not write this function; I got it from another question on this site, but I can no longer find it.
from datetime import date
from dateutil.relativedelta import relativedelta

Trim20052020 = []
for result in perdelta(date(2005, 1, 1), date(2020, 12, 1), relativedelta(months=3)):
    Trim20052020.append(result)
This creates one date every three months (the first day of each quarter) from 2005 to the end of 2020, i.e. from 2005-01-01 through 2020-10-01.
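If pandas is available, the same list of quarter starts can probably be built without the helper generator by using pd.date_range with the "QS" (quarter start) frequency, whose quarters start in January, April, July and October by default. A minimal sketch under that assumption:

```python
import pandas as pd

# "QS" = quarter start; by default quarters begin in January, April, July and October
quarter_starts = pd.date_range("2005-01-01", "2020-12-01", freq="QS")

# Convert to plain datetime.date objects, like the list produced by perdelta
Trim20052020 = [ts.date() for ts in quarter_starts]
```

Assigning the DatetimeIndex directly (df.index = quarter_starts) would also skip the helper column, assuming df has exactly one row per quarter.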
df['index']=Trim20052020
df.set_index('index',inplace=True)
Finally, I set it as the index.
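The question also asked how to read the text labels themselves. As an alternative to regenerating the dates, here is a sketch that parses each label with regular expressions. It assumes quarterly labels look like 2005T1 or 2005T01, monthly labels look like 2005M03, and anything else is already YYYY-MM-DD; each period is mapped to its first day, matching the convention above. The helper name label_to_date is hypothetical.

```python
import re

import pandas as pd


def label_to_date(label):
    """Map one index label to the first day of the period it represents (hypothetical helper)."""
    text = str(label).strip()

    # Quarterly labels such as 2005T1 or 2005T01 (assumed formats): quarter q starts in month 3*q - 2
    quarter = re.fullmatch(r"(\d{4})T0?([1-4])", text)
    if quarter:
        year, q = int(quarter.group(1)), int(quarter.group(2))
        return pd.Timestamp(year=year, month=3 * q - 2, day=1)

    # Monthly labels such as 2005M03
    month = re.fullmatch(r"(\d{4})M(0[1-9]|1[0-2])", text)
    if month:
        return pd.Timestamp(year=int(month.group(1)), month=int(month.group(2)), day=1)

    # Anything else is assumed to already be YYYY-MM-DD
    return pd.Timestamp(text)


# Usage on a hypothetical quarterly DataFrame
df = pd.DataFrame({"value": [1.0, 2.0, 3.0]}, index=["2005T1", "2005T2", "2005T3"])
df.index = pd.DatetimeIndex(df.index.map(label_to_date))
```

Once every DataFrame has a DatetimeIndex, pd.concat(..., axis=1) aligns them on their common dates.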

Related

How can I have the exact date for a certain week of the year? [duplicate]

This question already has answers here:
Pandas: How to create a datetime object from Week and Year?
(4 answers)
Closed 4 months ago.
I have the following data:
week = [202001, 202002, 202003, ..., 202052]
Where the composition of the variable is [year - 4 digits] + [week - 2 digits] (so, the first row means it's the first week of 2020, and so on).
I want to transform this into a datetime variable [YYYY-MM-DD]. I'm not sure which day would fit in this format :( maybe the Saturday of every week.
week_date = [2020-01-04, 2020-01-11, 2020-01-18, ...]
It seems like a simple sequence; nevertheless, I have some missing values in the data, so my n < number of weeks in 2020.
The main purpose of this conversion is to fit a model with Prophet. I also think I need to have no missing values when feeding the data into Prophet, so maybe the answer could also include adding 0 to my time series?
Any ideas? Thanks
Try:
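import datetime

l = [202001, 202002, 202003, 202052]
# Split each YYYYWW code into year and ISO week; weekday 6 = Saturday (fromisocalendar needs Python 3.8+)
out = [datetime.datetime.fromisocalendar(int(x[:4]), int(x[4:]), 6).strftime("%Y-%m-%d") for x in map(str, l)]
print(out)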
outputs:
['2020-01-04', '2020-01-11', '2020-01-18', '2020-12-26']
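Here I used 6 (Saturday; ISO weekdays run from 1 = Monday to 7 = Sunday) as the weekday, but choose whichever day you want.
This converts each number to a string, takes the first four digits as the year and the rest as the ISO week number, builds a datetime object with fromisocalendar, and then formats it back into a string with strftime.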
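For the follow-up about missing weeks, here is a sketch that converts the week codes to Saturdays as above, reindexes onto a complete weekly range and fills the gaps with 0. Whether zero-filling is appropriate depends on what a missing week means in your data; the data and the helper name week_to_saturday are hypothetical. The last step builds a two-column frame with the ds and y columns Prophet uses.

```python
import datetime

import pandas as pd

# Hypothetical data: week 202003 is missing
weeks = [202001, 202002, 202004, 202005]
values = [10, 12, 9, 11]


def week_to_saturday(code):
    """YYYYWW -> date of the Saturday (ISO weekday 6) of that ISO week."""
    text = str(code)
    return datetime.date.fromisocalendar(int(text[:4]), int(text[4:]), 6)


series = pd.Series(values, index=pd.to_datetime([week_to_saturday(w) for w in weeks]))

# Complete range of Saturdays between the first and last observed week
full_range = pd.date_range(series.index.min(), series.index.max(), freq="W-SAT")

# Missing weeks get 0 (only if a missing week really means zero)
filled = series.reindex(full_range, fill_value=0)

# Two-column frame in the ds / y layout Prophet uses
prophet_df = filled.rename_axis("ds").reset_index(name="y")
```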

How to get the start and end date for previous X months based on datetime.today() in python?

I want to get a list of start and end dates for the current month and the previous 4 months, based on the current date.
Currently I update the list manually every month.
from datetime import datetime
dates = (
["2020-10-01", "2020-10-31"],
["2020-11-01", "2020-11-30"],
["2020-12-01", "2020-12-31"],
["2021-01-01", datetime.today().strftime('%Y-%m-%d')])
How can I get the same list as above without adjusting the months manually?
Ideally I want to be able to create lists that go further back than the last 4 months.
Thanks in advance.
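No answer was included for this one. A minimal standard-library sketch, assuming the list should end with the current month (ending today) and go back n_previous full months; month_ranges is a hypothetical name, and calendar.monthrange supplies the last day of each month.

```python
import calendar
from datetime import date


def month_ranges(today, n_previous):
    """Return [start, end] string pairs for n_previous full months plus the current month."""
    ranges = []
    for offset in range(n_previous, -1, -1):
        # Count months since year 0, step back `offset` months, then split into year and month
        total = today.year * 12 + (today.month - 1) - offset
        year, month_index = divmod(total, 12)
        month = month_index + 1
        start = date(year, month, 1)
        if offset == 0:
            end = today  # the current month ends today
        else:
            end = date(year, month, calendar.monthrange(year, month)[1])
        ranges.append([start.strftime("%Y-%m-%d"), end.strftime("%Y-%m-%d")])
    return ranges


# Called with date.today(); increase the second argument to go further back
dates = month_ranges(date.today(), 3)
```

Run in mid-January 2021, month_ranges(date.today(), 3) should reproduce the four-entry list from the question.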

Make conditional changes to numerous dates

I'm sure this is really easy to answer but I have only just started using Pandas.
I have a column in my excel file called 'Day' and a Date/time column called 'Date'.
I want to update my Day column with the day of the week for NUMEROUS dates from the 'Date' column.
So far I use the code below to convert the date/time to just a date:
df['Date'] = pd.to_datetime(df.Date).dt.strftime('%d/%m/%Y')
And then I use this code to set the 'Day' column to Tuesday:
df.loc[df['Date'] == '02/02/2018', 'Day'] = '2'
(2 signifies the 2nd day of the week)
This works great. The problem is that my Excel sheet has 500,000+ rows of data and lots of dates, so I need this code to work with several dates (4 different dates, to be exact).
For example, I have tried this code:
df.loc[df['Date'] == '02/02/2018' + '09/02/2018' + '16/02/2018' + '23/02/2018', 'Day'] = '2'
It does not give me an error, but it does not change the day to 2. I know I could just repeat the same line of code and change the date each time, but there must be a way to do it the way I described? Any help would be greatly appreciated :)
2/2/2018 is a Friday, so I don't know what "2nd day of the week" means. Does your week start on Thursday?
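Because strftime turns the 'Date' column back into strings, parse it again and use the dt accessor (dayofweek is a property, so no parentheses):
df['Day'] = pd.to_datetime(df['Date'], format='%d/%m/%Y').dt.dayofweek
Monday is 0 and Sunday is 6. Adjust that as needed.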
If I got it right, you want to change the Day column for just a few dates, right? If so, put these dates in a separate list and use isin (your attempt joins the four strings into one long string with +, which matches no row):
my_dates = ['02/02/2018', '09/02/2018', '16/02/2018', '23/02/2018']
df.loc[df['Date'].isin(my_dates), 'Day'] = '2'
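A small runnable illustration of the dayofweek answer, with hypothetical sample dates. The second column follows the commenter's guess that the week starts on Thursday (so Friday is day 2); that numbering is an assumption, not something the question confirms.

```python
import pandas as pd

# Hypothetical sample in the question's day/month/year string format
df = pd.DataFrame({"Date": ["02/02/2018", "03/02/2018", "09/02/2018", "16/02/2018"]})
dates = pd.to_datetime(df["Date"], format="%d/%m/%Y")

# pandas numbering: Monday = 0 ... Sunday = 6 (a property, not a method)
df["DayOfWeek"] = dates.dt.dayofweek

# Assumed numbering with weeks starting on Thursday = 1, so Friday -> "2"
df["Day"] = ((dates.dt.dayofweek - 3) % 7 + 1).astype(str)
```

This labels every row in one vectorized step, so no per-date loop is needed even for 500,000+ rows.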

How to compute SMA for months based on weeks data?

I have a DataFrame with 100 keys (column 1) and 6 months of data (from January to June, in columns named like 2019_Jan_Week1, 2019_Jan_Week2, etc. up to June). The goal is to forecast the next 3 months (July to September) using a simple moving average of the last 6 months. For instance, for the July Week1 forecast, the moving average should be calculated using 2019_Jan_Week1, 2019_Feb_Week1, 2019_Mar_Week1, 2019_Apr_Week1, 2019_May_Week1 and 2019_Jun_Week1.
How can I compute this operation efficiently and quickly?
I am currently using a for loop, but it takes a huge amount of time.
counter=1
for keyIndex in range(0,len(finalForecastingData)):
    print(keyIndex)
    for forcastingMonthsIndex in range(31,columns):
        finalForecastingData.iloc[keyIndex,forcastingMonthsIndex] = finalForecastingData.iloc[keyIndex,counter]+finalForecastingData.iloc[keyIndex,counter+5]+finalForecastingData.iloc[keyIndex,counter+10]+finalForecastingData.iloc[keyIndex,counter+15]+finalForecastingData.iloc[keyIndex,counter+20]
        counter = counter+1
    counter=1
Welcome to stackoverflow.
You can easily get a rolling mean with pandas' .rolling('60d').mean(). For that, you need to convert the time data to pandas datetime format with pd.to_datetime() and set it as the index with set_index().
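A tiny illustration of the time-based rolling mean mentioned above, using hypothetical data: with a DatetimeIndex, each '60d' window covers the 60 days up to and including the current timestamp.

```python
import pandas as pd

# Hypothetical irregularly spaced observations
s = pd.Series(
    [1.0, 2.0, 3.0, 4.0],
    index=pd.to_datetime(["2019-01-01", "2019-01-31", "2019-03-01", "2019-03-15"]),
)

# Mean of all observations in the 60 days ending at each timestamp
rolling_mean = s.rolling("60d").mean()
```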
You should also check out https://stackoverflow.com/help/minimal-reproducible-example. It really helps to give example data. With a sample DataFrame it becomes much easier to give you concrete code rather than a direction where to look.
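Since the question averages the same week across months rather than a fixed number of days, another option is to drop the Python loop entirely. This is a sketch that assumes column names follow the 2019_<Mon>_<WeekN> pattern from the question; it groups the columns by their week label and averages across the six months in one vectorized step. The sample data is hypothetical.

```python
import numpy as np
import pandas as pd

# Hypothetical wide frame: one row per key, columns named <year>_<Mon>_<WeekN>
months = ["Jan", "Feb", "Mar", "Apr", "May", "Jun"]
weeks = ["Week1", "Week2"]
cols = [f"2019_{m}_{w}" for m in months for w in weeks]
df = pd.DataFrame(np.arange(2 * len(cols)).reshape(2, -1), index=["key1", "key2"], columns=cols)

# Week label of every column, e.g. "2019_Jan_Week1" -> "Week1"
week_of = df.columns.str.split("_").str[-1].to_numpy()

# Transpose so columns become rows, group by week label, average, transpose back
sma = df.T.groupby(week_of).mean().T

# Use the averages as the July forecast columns
forecast = sma.add_prefix("2019_Jul_")
```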

How can a DataFrame change from having two columns (a "from" datetime and a "to" datetime) to having a single column for a date?

I've got a DataFrame that looks like this:
It has two columns, one of them a "from" datetime and the other a "to" datetime. I would like to change this DataFrame so that it has a single column or index for the date (e.g. 2015-07-06 00:00:00 in datetime form), with the values of the other columns (like deep) split proportionally across the days. How might one approach this problem? I've tried some groupby tricks, but I'm not sure how to proceed.
I don't have time to work through your specific problem at the moment, but one way to approach this is to use DataFrame.resample(). Here are the steps I would take: 1) resample your "to" date column by minute; 2) populate the other columns over that resampled range; 3) add the date column back in as the index.
If this doesn't work or is tricky to work with, I would create a date range from your earliest date to your latest date (at the smallest interval you want, maybe hourly?) and then run some conditional statements over your other columns to fill in the data.
Here is roughly what the resample portion of your code may look like (replace the daily 'D' frequency with an hourly or other frequency as needed):
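import pandas as pd

# Assumes `data` has a DatetimeIndex (e.g. one of the datetime columns set as the index)
drange = pd.date_range('1970-01-01', '2018-01-20', freq='D')  # full daily range, for the date-range alternative
data = data.resample('D').ffill()  # resample to daily rows, forward-filling values
data.index.name = 'date'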
Hope this helps!
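As a different technique from the resample suggestion above, here is a direct sketch of the proportional split the question describes: each row's values are divided among the calendar days its from/to interval touches, in proportion to the time overlap with each day, and rows falling on the same day are summed. It assumes the columns are literally named from and to, that to is strictly after from, and that the data below is hypothetical; split_by_day is a hypothetical helper. iterrows is simple but slow, so this suits modest row counts.

```python
import pandas as pd


def split_by_day(frame, value_cols, start_col="from", end_col="to"):
    """Spread each row's values over calendar days in proportion to time overlap (hypothetical helper)."""
    pieces = []
    for _, row in frame.iterrows():
        start, end = row[start_col], row[end_col]
        total = end - start  # assumed strictly positive
        day = start.normalize()  # midnight of the first day
        while day < end:
            next_day = day + pd.Timedelta(days=1)
            overlap = min(end, next_day) - max(start, day)
            piece = {"date": day}
            for col in value_cols:
                piece[col] = row[col] * (overlap / total)
            pieces.append(piece)
            day = next_day
    # Rows covering the same day are summed
    return pd.DataFrame(pieces).groupby("date").sum()


df = pd.DataFrame({
    "from": pd.to_datetime(["2015-07-06 18:00", "2015-07-07 06:00"]),
    "to": pd.to_datetime(["2015-07-07 06:00", "2015-07-07 12:00"]),
    "deep": [12.0, 6.0],
})
result = split_by_day(df, ["deep"])
```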
