I have a dataframe:
id timestamp
1 "2025-08-02 19:08:59"
1 "2025-08-02 19:08:59"
1 "2025-08-02 19:09:59"
I need to turn the timestamp into an integer so that I can iterate over conditions. The result should look like this:
id timestamp
1 20250802190859
1 20250802190859
1 20250802190959
You can convert the column using the pandas string methods:
df = pd.DataFrame({'id':[1,1,1],'timestamp':["2025-08-02 19:08:59",
"2025-08-02 19:08:59",
"2025-08-02 19:09:59"]})
pd.set_option('display.float_format', lambda x: '%.3f' % x)
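# regex=True is required in pandas 2.0 and later, where str.replace treats the pattern literally by default.
df['timestamp'] = df['timestamp'].str.replace(r'[-\s:]', '', regex=True).astype('float64')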
>>> df
id timestamp
0 1 20250802190859.000
1 1 20250802190859.000
2 1 20250802190959.000
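Since the question asks for an integer rather than a float, here is a minimal sketch of one alternative, assuming every row follows the 'YYYY-MM-DD HH:MM:SS' layout shown in the question. The column name timestamp_int is only an illustrative choice.

```python
import pandas as pd

df = pd.DataFrame({
    "id": [1, 1, 1],
    "timestamp": ["2025-08-02 19:08:59", "2025-08-02 19:08:59", "2025-08-02 19:09:59"],
})

# Parse with an explicit format so malformed rows raise instead of slipping through.
parsed = pd.to_datetime(df["timestamp"], format="%Y-%m-%d %H:%M:%S")

# Re-render as YYYYMMDDHHMMSS text, then cast to a 64-bit integer.
df["timestamp_int"] = parsed.dt.strftime("%Y%m%d%H%M%S").astype("int64")
```

Going through to_datetime first validates the data; if validation is not needed, stripping the non-digit characters and casting to int64 gives the same numbers.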
Have you tried opening the file, skipping the first line (or better, validating that it contains the expected header fields), and splitting each remaining line at the first run of whitespace? The second part, e.g. "2025-08-02 19:08:59", can be parsed with datetime.fromisoformat(). You can then turn the datetime object back into a string using datetime.strftime(format) with e.g. format = '%Y%m%d%H%M%S'. Note that strftime has no "milliseconds" directive, though; you could use %f for microseconds.
Note: if datetime.fromisoformat() fails to parse the dates, try datetime.strptime(date_string, format) with a different format, e.g. format = '%Y-%m-%d %H:%M:%S'.
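The steps described here could be sketched roughly as follows. The helper name to_compact_int and the sample lines are made up for illustration, and the sketch assumes each line is an id, whitespace, then a quoted timestamp.

```python
from datetime import datetime

def to_compact_int(line: str) -> int:
    """Turn a line like '1 "2025-08-02 19:08:59"' into 20250802190859."""
    # Split off the id at the first run of whitespace; the rest is the timestamp.
    _id, raw = line.split(None, 1)
    # Drop surrounding whitespace and quotes before parsing.
    stamp = datetime.fromisoformat(raw.strip().strip('"'))
    return int(stamp.strftime("%Y%m%d%H%M%S"))

rows = ['1 "2025-08-02 19:08:59"', '1 "2025-08-02 19:09:59"']
values = [to_compact_int(r) for r in rows]
```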
You can use the solutions provided in this post: How to turn timestamp into float number? and loop through the dataframe.
Let's say you have already imported pandas and have a dataframe df, see the additional code below:
import re
df = pd.DataFrame(l)
df1 = df.copy()
for x in range(len(df[0])):
    # .loc avoids chained assignment, which may silently fail to update df1.
    df1.loc[x, 0] = re.sub(r'\D', '', df[0][x])
This way you will not modify the original dataframe df and will get the desired output in a new dataframe df1.
Full code that I tried (including creation of the first dataframe), which might help clear up any confusion:
import pandas as pd
import re
l = ["2025-08-02 19:08:59", "2025-08-02 19:08:59", "2025-08-02 19:09:59"]
df = pd.DataFrame(l)
df1 = df.copy()
for x in range(len(df[0])):
    # .loc avoids chained assignment, which may silently fail to update df1.
    df1.loc[x, 0] = re.sub(r'\D', '', df[0][x])
Related
I have the following string:
"{\"code\":0,\"description\":\"Done\",\"response\":{\"id\":\"8-717-2346\",\"idType\":\"CIP\",\"suscriptionId\":\"92118213\"},....\"childProducts\":[]}}"...
I'm trying to capture the attributes id, idType and suscriptionId from it and map them into a dataframe, but the entire body of the .csv ends up in a single row, so it is almost impossible for me to work with it without an index.
desired output:
id, idType, suscriptionID
0. '7-84-1811', 'CIP', 21312421412
1. '1-232-42', 'IO' , 21421e324
My code:
import pandas as pd
import json
path = '/example.csv'
df = pd.read_csv(path)
normalize_df = json.load(df)
print(df)
Since your string is in JSON format, you can read it with pandas, drop the columns you don't need, transpose, and fix up the headers:
toEscape = "{\"code\":0,\"description\":\"Done\",\"response\":{\"id\":\"8-717-2346\",\"idType\":\"CIP\",\"suscriptionId\":\"92118213\"}}"
json_string = toEscape.encode('utf-8').decode('unicode_escape')
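# Recent pandas versions expect a file-like object, so wrap the string in StringIO.
from io import StringIO
df = pd.read_json(StringIO(json_string))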
df = df.drop(["code","description"], axis=1)
df = df.transpose().reset_index().drop("index", axis=1)
df.to_csv("user_details.csv")
the output looks like this:
id idType suscriptionId
0 8-717-2346 CIP 92118213
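If there are several such strings, one per record, a minimal sketch using the standard json module could look like this. The second record is invented purely for illustration, and the sketch assumes every record has a "response" object holding the three keys.

```python
import json
import pandas as pd

raw_records = [
    '{"code":0,"description":"Done","response":{"id":"8-717-2346","idType":"CIP","suscriptionId":"92118213"}}',
    # Hypothetical second record, made up for this example.
    '{"code":0,"description":"Done","response":{"id":"1-232-42","idType":"IO","suscriptionId":"21421324"}}',
]
wanted = ["id", "idType", "suscriptionId"]

rows = []
for raw in raw_records:
    response = json.loads(raw)["response"]
    # .get returns None for a missing key instead of raising KeyError.
    rows.append({key: response.get(key) for key in wanted})

df = pd.DataFrame(rows, columns=wanted)
```

Building the rows as plain dictionaries keeps the ids as strings, so values such as "8-717-2346" are not reinterpreted.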
Thank you for the question.
I have a txt file with data and values like this one:
PP C timestamp HR RMSSD SCL
PP1 1 20120918T131600000 NaN NaN 80.239727
PP1 1 20120918T131700000 61 0.061420 77.365127
and I am importing it like that:
df = pd.read_csv('data.txt', sep='\t', header=0)
which gives me a nice looking dataframe:
Running
df.columns
shows this result Index(['PP', 'C', 'timestamp', 'HR', 'RMSSD', 'SCL'], dtype='object').
Now when I am trying to convert the timestamp column into a datetime column:
df["datetime"] = pd.to_datetime(df["timestamp"], format='%Y%m%dT%H%M%S%f')
I get this:
ValueError: time data 'timestamp' does not match format '%Y%m%dT%H%M%S%f' (match)
Any ideas would be appreciated.
First, the error message you're quoting is from the header row. It's trying to parse the literal string 'timestamp' as a timestamp, which is failing. If you're getting an error on an actual data row, show us that message.
All three of your posted data rows parse fine with your format in my testing:
>>> [pandas.to_datetime(s, format='%Y%m%dT%H%M%S%f')
for s in ['20120918T131600000', '20120918T131700000',
'20120918T131800000']]
[Timestamp('2012-09-18 13:16:00'), Timestamp('2012-09-18 13:17:00'), Timestamp('2012-09-18 13:18:00')]
I have no idea where you got format='%Y%m%dT%H%M%S%f'[:-3], which just removes the S%f from the format string, leaving it invalid. If you want to remove the last three digits of the data so that you can just use %H%M%S instead of %H%M%S%f, you need to put the [:-3] on the timestamp data value, not the format.
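As a rough sketch of slicing the data rather than the format, assuming every value ends in exactly three millisecond digits (the variable names are illustrative):

```python
import pandas as pd

stamps = pd.Series(["20120918T131600000", "20120918T131700000"])

# Drop the trailing three digits from each value, not from the format string.
trimmed = stamps.str[:-3]

# The shortened values now match a format without %f.
parsed = pd.to_datetime(trimmed, format="%Y%m%dT%H%M%S")
```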
I am trying to modify each string in the column named Date_time of a dataframe. The values (of string type) in that column look like this:
"40 11-02-20 11:42:36"
I want to delete the characters up to the first space, so that the value becomes "11-02-20 11:42:36". I was able to split the value but could not write it back into the same cell of that column. Here is the code I have so far:
import numpy as np
import matplotlib as plt
import pandas as pd
dataset = pd.read_csv('20-02-11.csv')
for i in dataset.itertuples():
    print(type(i.Date_time))
    str = i.Date_time
    str1 = str.split(None,1)[1]
    i.Date_time = str1
    print(str1)
    print(i.Date_time)
    break
It raises an AttributeError when I try to assign str1 to i.Date_time.
Please help.
The tuples that itertuples() returns cannot be used to set values in the original dataframe. They are copies, not the actual data of the dataframe, and as namedtuples they are immutable, which is why the assignment raises AttributeError. You can try something like this:
for i in range(len(dataset)):
    your_string = dataset.loc[i, "Date_time"]
    adjusted_string = your_string.split(None, 1)[1]
    dataset.loc[i, "Date_time"] = adjusted_string
This will use the actual data stored in the dataframe.
Using the df.at accessor:
for i, row in dataset.iterrows():
    your_string = row.Date_time # or row['Date_time']
    adjusted_string = your_string.split(None, 1)[1]
    dataset.at[i,'Date_time'] = adjusted_string
You can format the entire column at once. Starting with a dataframe like this:
df = pd.DataFrame({'date_time': ['40 11-02-20 11:42:36', '31 11-02-20 11:42:36']})
print(df)
returns
date_time
0 40 11-02-20 11:42:36
1 31 11-02-20 11:42:36
You can remove the first characters and space like this:
df['date_time'] = [i[1+len(i.split(' ')[0]):] for i in df['date_time']]
print(df)
returns
date_time
0 11-02-20 11:42:36
1 11-02-20 11:42:36
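The same column-wide edit can probably also be written with the pandas string accessor instead of a list comprehension. This is a small sketch under the assumption that every value contains at least one space:

```python
import pandas as pd

df = pd.DataFrame({"date_time": ["40 11-02-20 11:42:36", "31 11-02-20 11:42:36"]})

# Split once at the first run of whitespace and keep the second piece.
df["date_time"] = df["date_time"].str.split(n=1).str[1]
```

A value without any space would come out as NaN here, so it may be worth checking for missing values afterwards.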
I am asking for help in transforming values into date format.
I have following data structure:
ID ACT1 ACT2 ACT3 ACT4
1 154438.0 154104.0 155321.0 155321.0
2 154042.0 154073.0 154104.0 154104.0
...
The numbers in columns ACT1-4 need to be converted. Some rows contain NaN values.
I found that following function helps me to get a Gregorian date:
from datetime import datetime, timedelta
gregorian = datetime.strptime('1582/10/15', "%Y/%m/%d")
modified_date = gregorian + timedelta(days=154438)
datetime.strftime(modified_date, "%Y/%m/%d")
It would be great to know how I can apply this transformation to all columns except for "ID" and whether the approach is correct (or could be improved).
After the transformation is applied, I need to extract the order of column items, sorted by date in ascending order. For instance
ID ORDER
1 ACT1, ACT3, ACT4, ACT2
2 ACT2, ACT1, ACT3, ACT4
Thank you!
It sounds like you have two questions here.
1) To change to datetime:
cols = [col for col in df.columns if col != 'ID']
df.loc[:, cols] = df.loc[:, cols].applymap(lambda x: datetime.strptime('1582/10/15', "%Y/%m/%d") + timedelta(days=x) if np.isfinite(x) else x)
2) To get the sorted column names:
df['ORDER'] = df.loc[:, cols].apply(lambda dr: ','.join(df.loc[:, cols].columns[dr.dropna().argsort()]), axis=1)
Note: the dropna above will omit columns with NaT values from the order string.
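Here is a hedged sketch of both steps done column-wise. It assumes the values are whole days counted from 1582-10-15, as in the question. Because a nanosecond-based pandas timedelta cannot span 154438 days (the limit is roughly 292 years), the sketch first rebases the counts onto the Unix epoch. The name offset_days is illustrative, and one NaN has been inserted into the sample data to exercise the missing-value case.

```python
from datetime import datetime

import numpy as np
import pandas as pd

df = pd.DataFrame({
    "ID": [1, 2],
    "ACT1": [154438.0, 154042.0],
    "ACT2": [154104.0, 154073.0],
    "ACT3": [155321.0, np.nan],  # NaN added here for illustration only
    "ACT4": [155321.0, 154104.0],
})
cols = [c for c in df.columns if c != "ID"]

# Days between the 1582-10-15 start date and 1970-01-01.
offset_days = (datetime(1970, 1, 1) - datetime(1582, 10, 15)).days

# Rebase each column to days since 1970-01-01; NaN becomes NaT.
dates = df[cols].apply(lambda s: pd.to_datetime(s - offset_days, unit="D"))

def order(row: pd.Series) -> str:
    # Drop missing dates, sort ascending (stable keeps column order on ties),
    # and join the remaining column labels.
    return ", ".join(row.dropna().sort_values(kind="stable").index)

df["ORDER"] = dates.apply(order, axis=1)
```

Sorting the labelled row with sort_values keeps each label attached to its own value, so dropping missing entries cannot shift the column names.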
First I would make the input comma-separated, so that it's much easier to handle, in the form:
ID,ACT1,ACT2,ACT3,ACT4
1,154438.0,154104.0,155321.0,155321.0
2,154042.0,154073.0,154104.0,154104.0
Then you can read each line with a CSV reader, which yields key/value pairs keyed by your column names. Pop the ID off that dictionary to get its value (1, 2, etc.), and then reorder the remaining items by value, which is the date. The code is below:
#!/usr/bin/env python3
import csv
from operator import itemgetter
idAndTuple = {}
with open('time.txt') as csvfile:
    reader = csv.DictReader(csvfile)
    for row in reader:
        myID = row.pop('ID',None)
        # The values are still strings, so this ordering is reliable only while they all have the same width.
        reorderedList = sorted(row.items(), key = itemgetter(1))
        idAndTuple[myID] = reorderedList
        print( myID, reorderedList )
The result when you run this is:
1 [('ACT2', '154104.0'), ('ACT1', '154438.0'), ('ACT3', '155321.0'), ('ACT4', '155321.0')]
2 [('ACT1', '154042.0'), ('ACT2', '154073.0'), ('ACT3', '154104.0'), ('ACT4', '154104.0')]
which I think is what you are looking for.
I have the following Pandas dataframe in Python 2.7.
import pandas as pd
trial_num = [1,2,3,4,5]
sail_rem_time = ['11:33:11','16:29:05','09:37:56','21:43:31','17:42:06']
dfc = pd.DataFrame(zip(*[trial_num,sail_rem_time]),columns=['Temp_Reading','Time_of_Sail'])
print dfc
The dataframe looks like this:
Temp_Reading Time_of_Sail
1 11:33:11
2 16:29:05
3 09:37:56
4 21:43:31
5 17:42:06
This dataframe comes from a *.csv file. I use Pandas to read in the *.csv file as a Pandas dataframe. When I use print dfc.dtypes, it shows me that the column Time_of_Sail has a datatype object. I would like to convert this column to datetime datatype BUT I only want the time part - I don't want the year, month, date.
I can try this:
dfc['Time_of_Sail'] = pd.to_datetime(dfc['Time_of_Sail'])
dfc['Time_of_Sail'] = [time.time() for time in dfc['Time_of_Sail']]
but the problem is that when I run print dfc.dtypes it still shows that the column Time_of_Sail is object.
Is there a way to convert this column into a datetime format that only has the time?
Additional Information:
To create the above dataframe and output, this also works:
import pandas as pd
trial_num = [1,2,3,4,5]
sail_rem_time = ['11:33:11','16:29:05','09:37:56','21:43:31','17:42:06']
data = [
[trial_num[0],sail_rem_time[0]],
[trial_num[1],sail_rem_time[1]],[trial_num[2],sail_rem_time[2]],
[trial_num[3],sail_rem_time[3]]
]
dfc = pd.DataFrame(data,columns=['Temp_Reading','Time_of_Sail'])
dfc['Time_of_Sail'] = pd.to_datetime(dfc['Time_of_Sail'])
dfc['Time_of_Sail'] = [time.time() for time in dfc['Time_of_Sail']]
print dfc
print dfc.dtypes
These two lines:
dfc['Time_of_Sail'] = pd.to_datetime(dfc['Time_of_Sail'])
dfc['Time_of_Sail'] = [time.time() for time in dfc['Time_of_Sail']]
Can be written as:
dfc['Time_of_Sail'] = pd.to_datetime(dfc['Time_of_Sail'],format= '%H:%M:%S' ).dt.time
Using to_timedelta, we can convert the strings to a time-like dtype (timedelta64[ns]). A unit such as seconds or minutes applies only to numeric input and must be omitted for strings like these:
dfc['Time_of_Sail'] = pd.to_timedelta(dfc['Time_of_Sail'])
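To make the trade-off between these two approaches concrete, here is a small sketch written for Python 3 and a recent pandas (the question itself uses Python 2.7). As far as I know, pandas has no NumPy-backed dtype that holds only a time of day, which is why the .dt.time route still reports object. The names as_time and as_delta are illustrative.

```python
import datetime as dt

import pandas as pd

dfc = pd.DataFrame({
    "Temp_Reading": [1, 2, 3, 4, 5],
    "Time_of_Sail": ["11:33:11", "16:29:05", "09:37:56", "21:43:31", "17:42:06"],
})

# Option A: datetime.time objects; pandas stores these in an object column.
as_time = pd.to_datetime(dfc["Time_of_Sail"], format="%H:%M:%S").dt.time

# Option B: elapsed time since midnight in a native timedelta column,
# which supports arithmetic and comparisons.
as_delta = pd.to_timedelta(dfc["Time_of_Sail"])
```

Option B is usually the more convenient choice when the times need to be compared or subtracted; Option A fits when real datetime.time values are required downstream.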
This seems to work:
dfc['Time_of_Sail'] = pd.to_datetime(dfc['Time_of_Sail'], format='%H:%M:%S' ).apply(pd.Timestamp)
If anyone is searching for a more generalized answer try
dfc['Time_of_Sail']= pd.to_datetime(dfc['Time_of_Sail'])
If you just want a simple conversion you can do the below:
import datetime as dt
dfc.Time_of_Sail = dfc.Time_of_Sail.astype(dt.datetime)
or you could add a holder string to your time column as below, and then convert afterwards using an apply function:
dfc.Time_of_Sail = dfc.Time_of_Sail.apply(lambda x: '2016-01-01 ' + str(x))
dfc.Time_of_Sail = pd.to_datetime(dfc.Time_of_Sail).apply(lambda x: dt.datetime.time(x))