Python/Pandas convert string to time only - python

I have the following Pandas dataframe in Python 2.7.
import pandas as pd
trial_num = [1,2,3,4,5]
sail_rem_time = ['11:33:11','16:29:05','09:37:56','21:43:31','17:42:06']
dfc = pd.DataFrame(zip(*[trial_num,sail_rem_time]),columns=['Temp_Reading','Time_of_Sail'])
print dfc
The dataframe looks like this:
Temp_Reading Time_of_Sail
1 11:33:11
2 16:29:05
3 09:37:56
4 21:43:31
5 17:42:06
This dataframe comes from a *.csv file. I use Pandas to read in the *.csv file as a Pandas dataframe. When I use print dfc.dtypes, it shows me that the column Time_of_Sail has a datatype object. I would like to convert this column to datetime datatype BUT I only want the time part - I don't want the year, month, date.
I can try this:
dfc['Time_of_Sail'] = pd.to_datetime(dfc['Time_of_Sail'])
dfc['Time_of_Sail'] = [time.time() for time in dfc['Time_of_Sail']]
but the problem is that the when I run print dfc.dtypes it still shows that the column Time_of_Sail is object.
Is there a way to convert this column into a datetime format that only has the time?
Additional Information:
To create the above dataframe and output, this also works:
import pandas as pd
trial_num = [1,2,3,4,5]
sail_rem_time = ['11:33:11','16:29:05','09:37:56','21:43:31','17:42:06']
data = [
[trial_num[0],sail_rem_time[0]],
[trial_num[1],sail_rem_time[1]],[trial_num[2],sail_rem_time[2]],
[trial_num[3],sail_rem_time[3]]
]
dfc = pd.DataFrame(data,columns=['Temp_Reading','Time_of_Sail'])
dfc['Time_of_Sail'] = pd.to_datetime(dfc['Time_of_Sail'])
dfc['Time_of_Sail'] = [time.time() for time in dfc['Time_of_Sail']]
print dfc
print dfc.dtypes

These two lines:
dfc['Time_of_Sail'] = pd.to_datetime(dfc['Time_of_Sail'])
dfc['Time_of_Sail'] = [time.time() for time in dfc['Time_of_Sail']]
Can be written as:
dfc['Time_of_Sail'] = pd.to_datetime(dfc['Time_of_Sail'],format= '%H:%M:%S' ).dt.time

Using to_timedelta,we can convert string to time format(timedelta64[ns]) by specifying units as second,min etc.,
dfc['Time_of_Sail'] = pd.to_timedelta(dfc['Time_of_Sail'], unit='s')

This seems to work:
dfc['Time_of_Sail'] = pd.to_datetime(dfc['Time_of_Sail'], format='%H:%M:%S' ).apply(pd.Timestamp)

If anyone is searching for a more generalized answer try
dfc['Time_of_Sail']= pd.to_datetime(dfc['Time_of_Sail'])

If you just want a simple conversion you can do the below:
import datetime as dt
dfc.Time_of_Sail = dfc.Time_of_Sail.astype(dt.datetime)
or you could add a holder string to your time column as below, and then convert afterwards using an apply function:
dfc.Time_of_Sail = dfc.Time_of_Sail.apply(lambda x: '2016-01-01 ' + str(x))
dfc.Time_of_Sail = pd.to_datetime(dfc.Time_of_Sail).apply(lambda x: dt.datetime.time(x))

Related

Dataframe questions about read excel

I canĀ“t see the result...
My result is 0 and it should be 824
import pandas as pd
apple = r'C:\Users\User\Downloads\AAPL.xlsx'
data = pd.read_excel(apple)
dateindextime = data.set_index("timestamp")
rango = dateindextime.loc["2011-08-20":"2008-05-15"]
print(len(rango))
If I do
print(rango)
output:
Empty DataFrame Columns: [open, high, low, close, adjusted_close, volume] Index: []
Kinda hard to tell without the AAPL.xlsx dataset, but I'm guessing you will need to convert the "timestamp" column to a datetime object first using pd.to_datetime. From there you would slice on the datetime object vs slicing on a string, which is what you were doing below. If you posted the AAPL.xlsx dataset, I could dig deeper.
import pandas as pd
import datetime
apple = r'C:\Users\User\Downloads\AAPL.xlsx'
data = pd.read_excel(apple)
data["datetime_timestamp"] = pd.to_datetime(data["timestamp"], infer_datetime_format=True)
dateindextime = data.set_index("datetime_timestamp")
ti = datetime.date(2008,5,15)
tf = datetime.date(2011,8,20)
rango = dateindextime.loc[ti:tf]
print(len(rango))

How to turn value in timestamp column into numbers

I have a dataframe:
id timestamp
1 "2025-08-02 19:08:59"
1 "2025-08-02 19:08:59"
1 "2025-08-02 19:09:59"
I need to turn timestamp into integer number to iterate over conditions. So it look like this:
id timestamp
1 20250802190859
1 20250802190859
1 20250802190959
you can convert string using string of pandas :
df = pd.DataFrame({'id':[1,1,1],'timestamp':["2025-08-02 19:08:59",
"2025-08-02 19:08:59",
"2025-08-02 19:09:59"]})
pd.set_option('display.float_format', lambda x: '%.3f' % x)
df['timestamp'] = df['timestamp'].str.replace(r'[-\s:]', '').astype('float64')
>>> df
id timestamp
0 1 20250802190859.000
1 1 20250802190859.000
2 1 20250802190959.000
Have you tried opening the file, skipping the first line (or better: validating that it contains the header fields as expected) and for each line, splitting it at the first space/tab/whitespace. The second part, e.g. "2025-08-02 19:08:59", can be parsed using datetime.fromisoformat(). You can then turn the datetime object back to a string using datetime.strftime(format) with e.g. format = '%Y%m%d%H%M%S'. Note that there is no "milliseconds" format in strftime though. You could use %f for microseconds.
Note: if datetime.fromisoformat() fails to parse the dates, try datetime.strptime(date_string, format) with a different format, e.g. format = '%Y-%m-%d %H:%M:%S'.
You can use the solutions provided in this post: How to turn timestamp into float number? and loop through the dataframe.
Let's say you have already imported pandas and have a dataframe df, see the additional code below:
import re
df = pd.DataFrame(l)
df1 = df.copy()
for x in range(len(df[0])):
df1[0][x] = re.sub(r'\D','', df[0][x])
This way you will not modify the original dataframe df and will get desired output in a new dataframe df1.
Full code that I tried (including creatiion of first dataframe), this might help in removing any confusions:
import pandas as pd
import re
l = ["2025-08-02 19:08:59", "2025-08-02 19:08:59", "2025-08-02 19:09:59"]
df = pd.DataFrame(l)
df1 = df.copy()
for x in range(len(df[0])):
df1[0][x] = re.sub(r'\D','', df[0][x])

overwriting the values of a column in python

I was trying to modify each string present in column named Date_time in a data-frame. The values(string type) present in that column is as:
"40 11-02-20 11:42:36"
I was trying to delete the characters until first space and trying to replace it with: "11-02-20 11:42:36". I was able to split the value but unable to rewrite it in the same cell of that column. Here is the code i have done so far:
import numpy as np
import matplotlib as plt
import pandas as pd
dataset = pd.read_csv('20-02-11.csv')
for i in dataset.itertuples():
print(type(i.Date_time))
str = i.Date_time
str1 = str.split(None,1)[1]
i.Date_time = str1
print(str1)
print(i.Date_time)
break
and it shows AttributeError when i am trying to assign str1 to i.Date_time.
Please help.
The tuples that itertuples() returns, can/should not be used to set values in the original dataframe. They are copies not the actual data of the dataframe. You can try something like this:
for i in range(len(dataset)):
your_string = dataset.loc[i, "Date_time"]
adjusted_string = your_string.split(None, 1)[1]
dataset.loc[i, "Date_time"] = adjusted_string
This will use the actual data stored in the dataframe.
Using the df.at()-function:
for i, row in dataset.iterrows():
your_string = row.Date_time # or row['Date_time']
adjusted_string = your_string.split(None, 1)[1]
dataset.at[i,'Date_time'] = adjusted_string
You can format the entire column at once. Starting with a dataframe like this:
df = pd.DataFrame({'date_time': ['40 11-02-20 11:42:36', '31 11-02-20 11:42:36']})
print(df)
returns
date_time
0 40 11-02-20 11:42:36
1 31 11-02-20 11:42:36
You can remove the first characters and space like this:
df['date_time'] = [i[1+len(i.split(' ')[0]):] for i in df['date_time']]
print(df)
returns
date_time
0 11-02-20 11:42:36
1 11-02-20 11:42:36

Reading csv ,saving in dataframe, putting if condition, not getting exectped result

I'm reading csv, saving it into dataframe and using if condition but I'm not getting expected result.
My python code below :
import pandas as pd
import numpy as np
import datetime
import operator
from datetime import datetime
dt = datetime.now ( ).strftime ( '%m/%d/%Y' )
stockRules = pd.read_csv("C:\stock_rules.csv", dtype={"Product Currently Out of Stock": str}).drop_duplicates(subset="Product Currently Out of Stock", keep="last" )
pd.to_datetime(stockRules['FROMMONTH'], format='%m/%d/%Y')
pd.to_datetime(stockRules['TOMONTH'], format='%m/%d/%Y')
if stockRules['FROMMONTH'] <= dt and stockRules['TOMONTH'] >= dt:
print(stockRules)
My csv file is below :
Productno FROMMONTH TOMONTH
120041 2/1/2019 5/30/2019
112940 2/1/2019 5/30/2019
121700 2/1/2019 2/1/2019
I want to read csv file and want to print the product number, which meets the condition only.
I played around with the code a bit and simplified it somewhat, but the idea behind the selection should still work the same:
dt = datetime.now().strftime("%m/%d/%Y")
stockRules = pd.read_csv("data.csv", delimiter=";")
stockRules["FROMMONTH"] = pd.to_datetime(stockRules["FROMMONTH"], format="%m/%d/%Y")
stockRules["TOMONTH"] = pd.to_datetime(stockRules["TOMONTH"], format="%m/%d/%Y")
sub = stockRules[(stockRules["FROMMONTH"] <= dt) & (dt <= stockRules["TOMONTH"])]
print(sub["Productno"])
Notice that when using pd.to_datetime I am assigning the result of the operation to the original column, overriding whatever was in it before.
Hope this helps.
EDIT:
For my tests I changed the CSV to use ; as delimiter, since I had trouble reading in the data you provided in your question. Might be that you will have to specify another delimiter. For tabs for example:
stockRules = pd.read_csv("data.csv", delimiter="\t")

parsing CSV in pandas

I want to calculate the average number of successful Rattatas catches hourly for this whole dataset. I am looking for an efficient way to do this by utilizing pandas--I'm new to Python and pandas.
You don't need any loops. Try this. I think logic is rather clear.
import pandas as pd
#read csv
df = pd.read_csv('pkmn.csv', header=0)
#we need apply some transformations to extract date from timestamp
df['time'] = df['time'].apply(lambda x : pd.to_datetime(str(x)))
df['date'] = df['time'].dt.date
#main transformations
df = df.query("Pokemon == 'rattata' and caught == True").groupby('hour')
result = pd.DataFrame()
result['caught total'] = df['hour'].count()
result['days'] = df['date'].nunique()
result['caught average'] = result['caught total'] / result['days']
If you have your pandas dataframe saved as df this should work:
rats = df.loc[df.Pokemon == "rattata"] #Gives you subset of rows relating to Rattata
total = sum(rats.Caught) #Gives you the number caught total
diff = rats.time[len(rats)] - rats.time[0] #Should give you difference between first and last
average = total/diff #Should give you the number caught per unit time

Categories