I am running a script (script 1) to create an empty data frame that is populated by another script (script 2). The index of this empty data frame is a time series of 30-minute intervals across 365 days, beginning 1st October 2016. To create this time series index, Script 1 contains the following code:
import datetime as dt
import pandas as pd

time_period_start = dt.datetime(2016, 10, 1).strftime("%Y-%d-%m")
index = pd.date_range(time_period_start, freq='30min', periods=17520)
Script 2 pulls data out of a CSV file containing values across a time series. The plan is for this script to put the data into a dataframe and then merge it with the dataframe created in Script 1.
The problem I am having is that the dates in the dataframe created by Script 2 are in Y-D-M format, which is what comes out of the CSV files, while the dates in the dataframe created by Script 1 are in Y-M-D format, which causes incorrect results when I try to merge. This happens despite my use of .strftime("%Y-%d-%m") in the first line of code above. Is there any way of amending the second line of code so that the output dataframe is in Y-D-M?
.strftime() isn't affecting the final dataframe, since pd.date_range parses the string straight back into a datetime anyway. Instead of trying to match on strings, convert the dates in the second dataframe (the one created by Script 2) to datetime as well:
df2.date = pd.to_datetime(df2.date, format='%Y-%d-%m')
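A minimal sketch of the whole fix; the frame names and the `date`/`value` columns are assumptions for illustration, not taken from the original scripts:

```python
import datetime as dt
import pandas as pd

# Script 1's half-hourly index for one year from 1 Oct 2016
index = pd.date_range(dt.datetime(2016, 10, 1), freq='30min', periods=17520)
df1 = pd.DataFrame(index=index)

# hypothetical Script 2 frame: Y-D-M strings, e.g. "2016-01-10" means 1 Oct 2016
df2 = pd.DataFrame({'date': ['2016-01-10', '2016-02-10'], 'value': [1.0, 2.0]})

# parse the strings as real datetimes, telling pandas the day comes before the month
df2['date'] = pd.to_datetime(df2['date'], format='%Y-%d-%m')

# now both sides are datetime64 and the merge lines up correctly
merged = df1.merge(df2, left_index=True, right_on='date', how='left')
```

Once both sides are genuine datetime64 values, the display format no longer matters; the merge compares timestamps, not strings.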
I am trying to create a new data frame that excludes all timestamps later than 12:00:00. I have tried multiple approaches, but I keep running into an issue. The time was previously a datetime column that I split into two columns: a date column and a time column (format datetime.time).
Code:
Error thrown:
Do you have any suggestions to be able to do this properly?
Alright, the solution:
1. Change the dtype from datetime.time back to datetime.
2. Use the between method, setting the datetime column as the index and specifying the range.
The result is a subset of the dataframe covering the needed time frame.
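One concrete way to express this is with between_time, a close cousin of the between approach described, which likewise requires the datetimes on the index. The sample frame below is made up for illustration:

```python
import pandas as pd

# hypothetical data: one reading per hour across a day, datetimes on the index
df = pd.DataFrame(
    {"value": range(24)},
    index=pd.date_range("2021-01-01", freq="60min", periods=24),
)

# keep only rows between midnight and 12:00:00 (both ends inclusive by default)
subset = df.between_time("00:00", "12:00")
```

This keeps the 13 rows stamped 00:00 through 12:00 and drops everything later in the day.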
I have a large dataset with multiple date columns that I need to clean up, mostly by removing the time stamp since it is all 00:00:00. I want to write a function that collects all columns of datetime type and then formats all of them, instead of having to attack each one individually.
I figured it out. This is what I came up with and it works for me:
def tidy_dates(df):
    for col in df.select_dtypes(include="datetime64[ns, UTC]"):
        df[col] = df[col].dt.strftime("%Y-%m-%d")
    return df
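A rough usage sketch, repeating the function so the snippet runs on its own; the sample frame and the `created` column are made up for illustration. Note that select_dtypes with "datetime64[ns, UTC]" only picks up UTC-aware columns, and strftime turns them into plain strings:

```python
import pandas as pd

df = pd.DataFrame({
    "created": pd.to_datetime(["2021-01-01", "2021-06-15"], utc=True),
    "name": ["a", "b"],
})

def tidy_dates(df):
    # iterate over just the UTC datetime columns and format each one
    for col in df.select_dtypes(include="datetime64[ns, UTC]"):
        df[col] = df[col].dt.strftime("%Y-%m-%d")
    return df

out = tidy_dates(df)
# out["created"] now holds "YYYY-MM-DD" strings; "name" is untouched
```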
I have an Excel sheet of time series data of prices where each day consists of 6 hourly periods. I am trying to use Python and Pandas, which I have set up and working, importing a CSV and then creating a df from it. That part is fine; it is just the sorting code I am struggling with. In Excel I can do this using a Sum(sumifs) array function, but I would like it to work in Python/Pandas.
I am looking to produce a new time series from the data where, for each day, I get the average price for periods 3 to 5 inclusive only, excluding the others.
An example raw data excerpt and the result I am looking for is below:
You need to filter with between and boolean indexing, then aggregate with mean:
df = df[df['Period'].between(3,5)].groupby('Date', as_index=False)['Price'].mean()
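A worked sketch with made-up numbers, to show what the one-liner does:

```python
import pandas as pd

# hypothetical excerpt: two days, six hourly periods each
df = pd.DataFrame({
    "Date": ["2017-01-01"] * 6 + ["2017-01-02"] * 6,
    "Period": [1, 2, 3, 4, 5, 6] * 2,
    "Price": [10, 20, 30, 40, 50, 60, 15, 25, 35, 45, 55, 65],
})

# keep only periods 3-5, then average the price per day
out = df[df['Period'].between(3, 5)].groupby('Date', as_index=False)['Price'].mean()
```

For day one the kept prices are 30, 40, 50 (mean 40); for day two they are 35, 45, 55 (mean 45).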
I have to do the tasks below, but some parts do not work out properly. Here are the steps:
Converting a dataframe with unix timestamps to a dataframe with datetime values, works with the following code:
datetime_df = pd.to_datetime(unix_df, unit='s')
Resampling the datetime with smpl = datetime_df[0].resample('10min')
Converting it back to unix timestamp format with: unix_df = datetime_df.astype(np.int64) // 10 ** 9
Steps 1 and 3 work, but step 2 doesn't, because Python tells me it needs a DatetimeIndex and I don't know how to set it. The strange thing is that the index completely disappears after to_datetime, so I tried creating a list and making a dataframe from it again, but it still didn't work. Can somebody help me?
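The three steps above can be sketched end to end; the missing piece in step 2 is to put the datetimes on the index before resampling. The sample timestamps here are made up:

```python
import numpy as np
import pandas as pd

# hypothetical input: a one-column frame of unix timestamps (seconds), 5 min apart
unix_df = pd.DataFrame({0: [1479999600, 1479999900, 1480000200, 1480000500]})

# step 1: unix seconds -> datetime (note: the result has a plain RangeIndex)
datetime_df = pd.to_datetime(unix_df[0], unit='s')

# step 2: resample requires a DatetimeIndex, so move the datetimes onto the index
counts = pd.Series(1, index=pd.DatetimeIndex(datetime_df)).resample('10min').sum()

# step 3: datetime -> unix seconds
unix_back = datetime_df.astype(np.int64) // 10 ** 9
```

With timestamps 5 minutes apart, the 10-minute resample groups them two per bin; the round trip in step 3 reproduces the original seconds exactly.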
I'm working with a large dataset in Pandas that needs to do lots of time calculations.
I was previously formatting and keeping time in epoch format, but discovered the to_datetime functionality of Pandas and want to switch to it. The issue is that it isn't correctly handling my timezone.
sample date time
2015-03-01T15:41:53.825992-06:00
after parsing through Pandas to_datetime
3/1/2015 21:41:53.825992
It isn't keeping the times in US/Central; instead it is converting them to GMT.
Another issue is that I have row items that contain an array of times, like so:
Index singleTime arrayTime
0 3/1/2015 21:41:53.825992 [3/1/2015 21:41:53.825992, 3/1/2015 21:44:53.825992,...,3/1/2015 21:49:53.825992]
1 3/1/2015 22:43:53.825992 [3/1/2015 22:43:53.825992, 3/1/2015 22:44:53.825992,...,3/1/2015 22:49:53.825992]
Currently I parse the times using
pd.to_datetime(timeString)
Then iterate through the data frame column
newTimearray = []
for each in df.singleTime:
    newTimearray.append(each.tz_localize('UTC').tz_convert('US/Central'))
df.singleTime = newTimearray
I suspect this isn't very efficient, and it doesn't work for the array of times. A lot of the solutions I have seen hinge on the time being an index. I can index off one time, but either way I will have multiple time columns, and items that I need to convert that aren't an index.
So how do I effectively convert all time items formatted like this?
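One vectorised sketch, under the assumption that the frame looks like the sample above (the `singleTime`/`arrayTime` column names are kept; the sample values are made up). For strings that already carry a UTC offset, parsing with utc=True and then tz_convert avoids the per-row loop, and the list column can be handled by applying the same vectorised call once per row:

```python
import pandas as pd

# hypothetical frame: one scalar time column, one column holding lists of times
df = pd.DataFrame({
    "singleTime": ["2015-03-01T15:41:53.825992-06:00",
                   "2015-03-01T16:43:53.825992-06:00"],
    "arrayTime": [["2015-03-01T15:41:53.825992-06:00",
                   "2015-03-01T15:44:53.825992-06:00"],
                  ["2015-03-01T16:43:53.825992-06:00",
                   "2015-03-01T16:44:53.825992-06:00"]],
})

# scalar column: parse everything to tz-aware UTC, then convert the whole
# column to US/Central in one vectorised call
df["singleTime"] = (
    pd.to_datetime(df["singleTime"], utc=True).dt.tz_convert("US/Central")
)

# list column: the same parse/convert, applied once per row; each cell becomes
# a tz-aware DatetimeIndex
df["arrayTime"] = df["arrayTime"].apply(
    lambda times: pd.to_datetime(times, utc=True).tz_convert("US/Central")
)
```

Since the input strings already carry a -06:00 offset, converting to US/Central (CST at that date) gives back the original wall-clock times, now tz-aware.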