I have a dataframe with a timestamp column and I am applying a lambda function to that column. When I do that I get the following error:
row['date'] = pd.Timestamp(row['date']).apply(lambda t: t.replace(minute=15*(t.minute//15)).strftime('%H:%M'))
AttributeError: 'Timestamp' object has no attribute 'apply'
How can I do that in pandas?
Example (input and desired output):
05:06 05:00
05:20 05:15
09:18 09:15
10:03 10:00
It seems you need to_datetime to convert the column to datetimes instead of Timestamp, which only converts a scalar:
row['date'] = (pd.to_datetime(row['date'])
                 .apply(lambda t: t.replace(minute=15*(t.minute//15)))
                 .dt.strftime('%H:%M'))
EDIT:
print (df)
a b
0 05:06 05:00
1 05:20 05:15
2 09:18 09:15
3 10:03 10:00
df['date'] = (pd.to_datetime(df['a'])
                .apply(lambda t: t.replace(minute=15*(t.minute//15)))
                .dt.strftime('%H:%M'))
print (df)
a b date
0 05:06 05:00 05:00
1 05:20 05:15 05:15
2 09:18 09:15 09:15
3 10:03 10:00 10:00
Another solution, but with different output because it rounds to the nearest 15 minutes instead of flooring:
df['date'] = pd.to_datetime(df['a']).dt.round('15min').dt.strftime('%H:%M')
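If you prefer the flooring behaviour of the apply version (e.g. 05:20 -> 05:15) without apply, dt.floor should give the same result; a minimal sketch, assuming your pandas version has Series.dt.floor:
# floor to the previous 15-minute boundary instead of rounding to the nearest one
df['date'] = pd.to_datetime(df['a']).dt.floor('15min').dt.strftime('%H:%M')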
For comparing the two outputs you can use:
L = ['5:' + str(x).zfill(2) for x in range(60)]
df = pd.DataFrame({'a':L})
#print (df)
df['date1'] = pd.to_datetime(df['a']).dt.round('15min').dt.strftime('%H:%M')
df['date'] = (pd.to_datetime(df['a'])
                .apply(lambda t: t.replace(minute=15*(t.minute//15)))
                .dt.strftime('%H:%M'))
print (df)
a date1 date
0 5:00 05:00 05:00
1 5:01 05:00 05:00
2 5:02 05:00 05:00
3 5:03 05:00 05:00
4 5:04 05:00 05:00
5 5:05 05:00 05:00
6 5:06 05:00 05:00
7 5:07 05:00 05:00
8 5:08 05:15 05:00
9 5:09 05:15 05:00
10 5:10 05:15 05:00
11 5:11 05:15 05:00
12 5:12 05:15 05:00
13 5:13 05:15 05:00
14 5:14 05:15 05:00
15 5:15 05:15 05:15
16 5:16 05:15 05:15
17 5:17 05:15 05:15
18 5:18 05:15 05:15
19 5:19 05:15 05:15
20 5:20 05:15 05:15
21 5:21 05:15 05:15
22 5:22 05:15 05:15
23 5:23 05:30 05:15
24 5:24 05:30 05:15
25 5:25 05:30 05:15
26 5:26 05:30 05:15
27 5:27 05:30 05:15
28 5:28 05:30 05:15
29 5:29 05:30 05:15
30 5:30 05:30 05:30
31 5:31 05:30 05:30
32 5:32 05:30 05:30
33 5:33 05:30 05:30
34 5:34 05:30 05:30
35 5:35 05:30 05:30
36 5:36 05:30 05:30
37 5:37 05:30 05:30
38 5:38 05:45 05:30
39 5:39 05:45 05:30
40 5:40 05:45 05:30
41 5:41 05:45 05:30
42 5:42 05:45 05:30
43 5:43 05:45 05:30
44 5:44 05:45 05:30
45 5:45 05:45 05:45
46 5:46 05:45 05:45
47 5:47 05:45 05:45
48 5:48 05:45 05:45
49 5:49 05:45 05:45
50 5:50 05:45 05:45
51 5:51 05:45 05:45
52 5:52 05:45 05:45
53 5:53 06:00 05:45
54 5:54 06:00 05:45
55 5:55 06:00 05:45
56 5:56 06:00 05:45
57 5:57 06:00 05:45
58 5:58 06:00 05:45
59 5:59 06:00 05:45
Related
I have this dataframe A:
timestamp value_A value_A2
----------------------------------------
5/3/16 8:00 64 43
5/3/16 9:00 74 33
5/3/16 10:00 54 23
5/3/16 11:00 34 54
5/3/16 12:00 26 34
5/3/16 13:00 42 65
5/3/16 14:00 44 87
5/3/16 15:00 14 32
5/3/16 16:00 65 44
5/3/16 19:00 36 23
5/3/16 20:00 32 54
5/3/16 23:00 32 56
...
I want to merge this dataframe with this dataframe B:
Month value_B
-----------------------
05-03 08:00 35
05-03 09:00 44
05-03 10:00 22
05-03 11:00 25
05-03 12:00 75
05-03 13:00 64
05-03 14:00 43
05-03 15:00 44
05-03 16:00 26
05-03 17:00 22
05-03 18:00 35
05-03 19:00 36
05-03 20:00 32
05-03 21:00 26
05-03 22:00 44
05-03 23:00 22
...
I want to merge the dataframe based on matching day/month/hour/minute timestamps, with no year.
Notice that there are skipped timestamp entries in dataframe A, so dataframe B has more rows. What I want to do is match up the rows by timestamp and create a new dataframe with columns for both value_A and value_B, dropping the rows in dataframe B that have no corresponding timestamp in dataframe A. Here is the dataframe I want to produce, dataframe C:
timestamp value_A value_A2 value_B
-------------------------------------------------
5/3/16 8:00 64 43 35
5/3/16 9:00 74 33 44
5/3/16 10:00 54 23 22
5/3/16 11:00 34 54 25
5/3/16 12:00 26 34 75
5/3/16 13:00 42 65 64
5/3/16 14:00 44 87 43
5/3/16 15:00 14 32 44
5/3/16 16:00 65 44 26
5/3/16 19:00 36 23 36
5/3/16 20:00 32 54 32
5/3/16 23:00 32 56 22
I would like the value_B values to match all timestamps on the corresponding month and day, regardless of the year. By this I mean that a value of, say, 55 in dataframe B at 5/25 should be matched to timestamps of both 5/25/16 and 5/25/17.
I am having a lot of trouble with this, because no matter what I try, when I send the final merged dataframe to a CSV and look at it in Excel, I keep seeing the year attached, which means that for the example above 5/25/16 would be matched to 55 but 5/25/17 would have no value. How can I accomplish this in a way that matches value_B values to dataframe A timestamps based on just the month/day/hour/minute part of the timestamp?
First convert the timestamp column to datetimes and then create a new column Month in the same format as in the second DataFrame:
df1['Month'] = pd.to_datetime(df1['timestamp']).dt.strftime('%m-%d %H:%M')
Then use a left join with the second DataFrame and remove the Month column:
df = df1.merge(df2, on='Month', how='left').drop('Month', axis=1)
print (df)
timestamp value_A value_B
0 5/3/16 8:00 64 35
1 5/3/16 9:00 74 44
2 5/3/16 10:00 54 22
3 5/3/16 11:00 34 25
4 5/3/16 12:00 26 75
5 5/3/16 13:00 42 64
6 5/3/16 14:00 44 43
7 5/3/16 15:00 14 44
8 5/3/16 16:00 65 26
9 5/3/16 19:00 36 36
10 5/3/16 20:00 32 32
11 5/3/16 23:00 32 22
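As a quick check that the '%m-%d %H:%M' key really matches across years (the 5/25 example from the question), here is a small self-contained sketch; the value_A numbers are made up, the 55 is taken from the question:
import pandas as pd

df1 = pd.DataFrame({'timestamp': ['5/25/16 8:00', '5/25/17 8:00'], 'value_A': [1, 2]})
df2 = pd.DataFrame({'Month': ['05-25 08:00'], 'value_B': [55]})

df1['Month'] = pd.to_datetime(df1['timestamp']).dt.strftime('%m-%d %H:%M')
print (df1.merge(df2, on='Month', how='left').drop('Month', axis=1))
# both the 2016 row and the 2017 row get value_B == 55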
First, you will have to get both dataframes into the same key format.
For df1:
df1["timestamp"] = pd.to_datetime(df1["timestamp"])
df1["timestamp"] = df1["timestamp"].dt.strftime("%d-%m %H:%M")
For df2:
df2["Month"] = pd.to_datetime(df2["Month"], format="%d-%m %H:%M")
df2["Month"] = df2["Month"].dt.strftime("%m-%d %H:%M")
Then we merge them into a single dataframe and drop the now-redundant Month column.
df3 = pd.merge(df1, df2, left_on="timestamp", right_on="Month")
df3.drop(['Month'], axis=1, inplace=True)
This will produce the same output you are looking for!
I have the following pandas dataframe that was converted to string with to_string().
It was printed like this:
S T Q U X A D
02:36 06:00 06:00 06:00 06:30 09:46 07:56
02:37 06:10 06:15 06:15 06:40 09:48 08:00
12:00 11:00 12:00 12:00 07:43 12:00 18:03
13:15 13:00 13:15 13:15 07:50 13:15 18:08
14:00 14:00 14:00 14:00 14:00 19:00
15:15 15:00 14:15 15:15 15:15 19:05
16:15 16:00 15:15 16:15 16:15 20:15
17:15 17:00 17:15 17:15 17:15 20:17
18:15 21:22 21:19 19:55 18:15 20:18
19:15 21:24 21:21 19:58 19:15 20:19
The gaps are due to empty values in the dataframe. I would like to keep the column alignment, perhaps by replacing the empty values with tabs. I would also like to center-align the header line.
This wasn't printed in a terminal; it was sent over Telegram with a requests POST call. I think, though, that it is just a print-formatting problem, independent of Telegram and the requests library.
The desired output would be like this:
S T Q U X A D
02:36 06:00 06:00 06:00 06:30 09:46 07:56
02:37 06:10 06:15 06:15 06:40 09:48 08:00
12:00 11:00 12:00 12:00 07:43 12:00 18:03
13:15 13:00 13:15 13:15 07:50 13:15 18:08
14:00 14:00 14:00 14:00 14:00 19:00
15:15 15:00 14:15 15:15 15:15 19:05
16:15 16:00 15:15 16:15 16:15 20:15
17:15 17:00 17:15 17:15 17:15 20:17
18:15 21:22 21:19 19:55 18:15 20:18
19:15 21:24 21:21 19:58 19:15 20:19
You can use the DataFrame's Styler with set_properties to set some of these options, e.g.:
df.style.set_properties(**{'text-align': 'center'})
read more here:
https://pandas.pydata.org/docs/reference/api/pandas.io.formats.style.Styler.set_properties.html
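Note that Styler.set_properties only affects HTML rendering, so it will not change the plain text produced by to_string(). For a text-only channel like Telegram, to_string itself has options that may be enough; a minimal sketch (the exact values for na_rep, col_space and justify are just assumptions about what looks good in your chat):
text = df.to_string(na_rep='',        # show empty cells as blanks instead of NaN
                    col_space=7,      # pad every column to a fixed minimum width
                    justify='center', # center the header labels
                    index=False)
Sending the result inside a monospace block on the Telegram side also helps keep the columns aligned.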
bond_df['Maturity']
0 2022-07-15 00:00:00
1 2024-07-18 00:00:00
2 2027-07-16 00:00:00
3 2020-07-28 00:00:00
4 2019-10-09 00:00:00
5 2022-04-08 00:00:00
6 2020-12-15 00:00:00
7 2022-12-15 00:00:00
8 2026-04-08 00:00:00
9 2023-04-11 00:00:00
10 2024-12-15 00:00:00
11 2019
12 2020-10-25 00:00:00
13 2024-04-22 00:00:00
14 2047-12-15 00:00:00
15 2020-07-08 00:00:00
17 2043-04-11 00:00:00
18 2021
19 2022
20 2023
21 2025
22 2026
23 2027
24 2029
25 2021-04-15 00:00:00
26 2044-04-22 00:00:00
27 2043-10-02 00:00:00
28 2039-01-19 00:00:00
29 2040-07-09 00:00:00
30 2029-09-21 00:00:00
31 2040-10-25 00:00:00
32 2019
33 2035-09-04 00:00:00
34 2035-09-28 00:00:00
35 2041-04-15 00:00:00
36 2040-04-02 00:00:00
37 2034-03-27 00:00:00
38 2030
39 2027-04-05 00:00:00
40 2038-04-15 00:00:00
41 2037-08-17 00:00:00
42 2023-10-16 00:00:00
43 -
45 2019-10-09 00:00:00
46 -
47 2021-06-23 00:00:00
48 2021-06-23 00:00:00
49 2023-06-26 00:00:00
50 2025-06-26 00:00:00
51 2028-06-26 00:00:00
52 2038-06-28 00:00:00
53 2020-06-23 00:00:00
54 2020-06-23 00:00:00
55 2048-06-29 00:00:00
56 -
57 -
58 2029-07-08 00:00:00
59 2026-07-08 00:00:00
60 2024-07-08 00:00:00
61 2020-07-31 00:00:00
Name: Maturity, dtype: object
This is a column of maturity dates for various Walmart bonds that I imported from Excel. All I am concerned with is the year portion of these dates. How can I format the entire column so it returns just the year values?
dt.strftime didn't work
Thanks in advance
I wrote this little script for you; it writes the years to a years.txt file, assuming your data is in a data.txt file containing exactly the column values you posted above.
The script also lets you toggle whether to include the dashes and the bare-year entries.
Here is the script I wrote:
#!/usr/bin/python3

all_years = []
include_dash = False            # include the '-' placeholders in the output?
include_years_on_right = True   # include the entries that are just a bare year?

with open("data.txt", "r") as f:
    text = f.read()

lines = text.split("\n")
for line in lines:
    line = line.strip()
    if line == "":
        continue
    if "00" in line:
        # full date like '0 2022-07-15 00:00:00' -> take the year just before the first '-'
        all_years.append(line.split("-")[0].split()[-1])
    else:
        if include_years_on_right == False:
            continue
        year = line.split(" ")[-1]
        if year == "-":
            if include_dash == True:
                all_years.append(year)
            else:
                continue
        else:
            all_years.append(year)

with open("years.txt", "w") as f:
    for year in all_years:
        f.write(year + "\n")
and the output written to years.txt:
2022
2024
2027
2020
2019
2022
2020
2022
2026
2023
2024
2019
2020
2024
2047
2020
2043
2021
2022
2023
2025
2026
2027
2029
2021
2044
2043
2039
2040
2029
2040
2019
2035
2035
2041
2040
2034
2030
2027
2038
2037
2023
2019
2021
2021
2023
2025
2028
2038
2020
2020
2048
2029
2026
2024
2020
Contact me if you have any issues, and I hope I can help you!
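Since the question is about pandas, here is also a pandas-only sketch (an assumption on my part that bond_df['Maturity'] really is the mixed object column shown above, containing Timestamps, bare years and '-' placeholders): casting everything to string and pulling out the first four-digit run sidesteps the mixed types.
import pandas as pd

years = (bond_df['Maturity']
           .astype(str)                              # Timestamps, bare years and '-' all become strings
           .str.extract(r'(\d{4})', expand=False))   # first 4-digit run, NaN where there is none ('-')
print (years)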
So, I have a datetime indexed dataframe that looks like this:
eventTime Energy Power RunningHours
9/29/2018 0:00 146.985 65 2256.88
9/29/2018 1:00 147.05 64.5 2257.87
9/29/2018 2:00 147.116 65 2258.87
9/29/2018 3:00 147.181 65 2259.87
9/29/2018 4:00 147.246 65 2260.87
9/29/2018 5:00 147.312 65 2261.87
9/29/2018 5:11 76.428
9/29/2018 5:12 65
9/29/2018 6:00 147.377 65 2262.87
9/29/2018 7:00 147.443 65 2263.87
9/29/2018 8:00 147.45 2263.98
9/29/2018 9:17 76.558
9/29/2018 9:17 1174.35
9/29/2018 19:00 147.502 65 2264.75
9/29/2018 20:00 147.567 65 2265.75
9/29/2018 21:00 147.633 65 2266.75
9/29/2018 22:00 147.698 65 2267.75
9/29/2018 23:00 147.764 65 2268.75
9/30/2018 0:00 147.829 65 2269.75
9/30/2018 1:00 147.895 65 2270.75
9/30/2018 2:00 147.961 65 2271.75
9/30/2018 3:00 148.026 65 2272.73
9/30/2018 4:00 148.092 65 2273.73
9/30/2018 5:00 148.157 65 2274.73
9/30/2018 6:00 148.223 65 2275.73
9/30/2018 7:00 148.288 65 2276.73
9/30/2018 8:00 148.297 2276.87
9/30/2018 13:51 64
9/30/2018 19:00 148.35 65 2277.68
9/30/2018 20:00 148.415 65 2278.67
9/30/2018 21:00 148.481 65 2279.67
9/30/2018 22:00 148.546 65 2280.67
9/30/2018 23:00 148.611 65 2281.67
For each day in the datetime index, I am looking to find the difference between the "RunningHours" values at 23:00 and 00:00.
I imagine my output looking like this:
9/29/2018 11.87
9/30/2018 11.92
How do I get to this? I am currently splitting the datetime index into date and time, then looping over dates and times to find the difference. That seems complicated for something very simple, and I am sure there is an easier way using the datetime index as is; I just don't know how. Help, please.
@ansev Your code works very well for data that is continuous and where values exist for the 00:00 and 23:00 timestamps. However, if data is missing for these two timestamps, the script picks up the first or last available data point for that date.
For example, for the data below:
6/7/2018 0:00 67.728 64 1037.82
6/7/2018 1:00 67.793 64 1038.82
6/7/2018 2:00 67.857 64 1039.82
6/7/2018 3:00 67.922 64 1040.82
6/7/2018 4:00 67.987 64 1041.82
6/7/2018 5:00 64 1042.82
6/7/2018 6:00 1043.43
6/7/2018 23:00 68.288
The output from the script is
6/7/2018 1037.82 1043.43 5.61
How do I modify it to return NaN if the data is not available?
Thanks so much for your help on this.
Assuming the data is ordered chronologically, we can use groupby.agg to get the first and last value for each date, then take the difference:
new_df = (df.groupby(pd.to_datetime(df['eventTime']).dt.date)['RunningHours']
            .agg(['first','last'])
            .assign(difference=lambda x: x['last']-x['first'])
            .reset_index())
print(new_df)
eventTime first last difference
0 2018-09-29 2256.88 2268.75 11.87
1 2018-09-30 2269.75 2281.67 11.92
I answered my own question, "Find values from a column in a DF at very specific times for every unique date", for those who are looking for something slightly different.
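Regarding the follow-up about missing 00:00/23:00 readings, one possible tweak (a sketch of my own, not part of the answer above, assuming eventTime is a plain column as shown and at most one reading per day at each of those two times) is to keep only the rows whose time is exactly 00:00 or 23:00 and pivot them, so a missing reading shows up as NaN:
import pandas as pd

df['eventTime'] = pd.to_datetime(df['eventTime'])
sub = df[df['eventTime'].dt.strftime('%H:%M').isin(['00:00', '23:00'])]
wide = (sub.assign(date=sub['eventTime'].dt.date,
                   time=sub['eventTime'].dt.strftime('%H:%M'))
           .pivot(index='date', columns='time', values='RunningHours'))
wide['difference'] = wide['23:00'] - wide['00:00']   # NaN when either reading is missing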
I have a dataframe with a Date column in the 'dd/mm/yyyy HH:MM' format.
Date Price
29/10/2018 19:30 163.09
29/10/2018 20:00 211.95
29/10/2018 20:30 205.86
29/10/2018 21:00 201.39
29/10/2018 21:30 126.68
29/10/2018 22:00 112.36
29/10/2018 22:30 120.94
I want this dataframe in the 'yyyy/mm/dd HH:MM' format, as follows.
Date Price
2018/29/10 19:30 163.09
2018/29/10 20:00 211.95
2018/29/10 20:30 205.86
2018/29/10 21:00 201.39
2018/29/10 21:30 126.68
2018/29/10 22:00 112.36
2018/29/10 22:30 120.94
I tried
df['Date'] = pd.to_datetime(df['Date'])
but it gives the following result, which is not what I am looking for:
Date Price
2018-29-10 19:30:00 163.09
2018-29-10 20:00:00 211.95
2018-29-10 20:30:00 205.86
2018-29-10 21:00:00 201.39
Use strftime to convert the datetimes to strings in your format:
df['Date'] = pd.to_datetime(df['Date']).dt.strftime('%Y/%m/%d %H:%M')
print (df)
Date Price
0 2018/10/29 19:30 163.09
1 2018/10/29 20:00 211.95
2 2018/10/29 20:30 205.86
3 2018/10/29 21:00 201.39
4 2018/10/29 21:30 126.68
5 2018/10/29 22:00 112.36
6 2018/10/29 22:30 120.94
print (type(df.loc[0, 'Date']))
<class 'str'>
print (df['Date'].dtype)
object
So if you want to keep working with datetime-like functionality, use only to_datetime; the default display format is YYYY-MM-DD HH:MM:SS:
df['Date'] = pd.to_datetime(df['Date'])
print (df)
Date Price
0 2018-10-29 19:30:00 163.09
1 2018-10-29 20:00:00 211.95
2 2018-10-29 20:30:00 205.86
3 2018-10-29 21:00:00 201.39
4 2018-10-29 21:30:00 126.68
5 2018-10-29 22:00:00 112.36
6 2018-10-29 22:30:00 120.94
print (type(df.loc[0, 'Date']))
<class 'pandas._libs.tslibs.timestamps.Timestamp'>
print (df['Date'].dtype)
datetime64[ns]
Pandas stores datetimes as integers
When you say "it gives result as following", you are only seeing a string representation of these underlying integers. You should not misconstrue this as how Pandas stores your data or, indeed, how the data will be represented when you export it to another format.
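A small illustration of that point, with made-up values (any datetime64[ns] series behaves the same way):
import pandas as pd

s = pd.to_datetime(pd.Series(['2018-10-29 19:30', '2018-10-29 20:00']))
print (s.astype('int64'))                 # the underlying nanoseconds since the Unix epoch
print (s.dt.strftime('%Y/%m/%d %H:%M'))   # a string (object) representation for display only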
Convert to object dtype
You can use pd.Series.dt.strftime to convert your series to a series of strings. This will have object dtype, which represents a sequence of pointers:
df['Date'] = pd.to_datetime(df['Date']).dt.strftime('%Y/%m/%d %H:%M')
You will lose all vectorisation benefits, so you should aim to perform this operation only if necessary and as late as possible.
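As an example of "as late as possible" (a sketch, assuming the end goal is a CSV in the yyyy/mm/dd HH:MM text format): keep the column as datetime64 while you work, and only control the text representation at export time via to_csv's date_format.
import pandas as pd

df = pd.DataFrame({'Date': pd.to_datetime(['29/10/2018 19:30'], dayfirst=True),
                   'Price': [163.09]})
df.to_csv('prices.csv', index=False, date_format='%Y/%m/%d %H:%M')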