I have the following Pandas dataframe in Python 2.7.
import pandas as pd
trial_num = [1,2,3,4,5]
sail_rem_time = ['11:33:11','16:29:05','09:37:56','21:43:31','17:42:06']
dfc = pd.DataFrame(zip(*[trial_num,sail_rem_time]),columns=['Temp_Reading','Time_of_Sail'])
print dfc
The dataframe looks like this:
Temp_Reading Time_of_Sail
1 11:33:11
2 16:29:05
3 09:37:56
4 21:43:31
5 17:42:06
This dataframe comes from a *.csv file. I use Pandas to read in the *.csv file as a Pandas dataframe. When I run print dfc.dtypes, it shows that the column Time_of_Sail has dtype object. I would like to convert this column to a datetime dtype, BUT I only want the time part; I don't want the year, month, or day.
I can try this:
dfc['Time_of_Sail'] = pd.to_datetime(dfc['Time_of_Sail'])
dfc['Time_of_Sail'] = [time.time() for time in dfc['Time_of_Sail']]
but the problem is that when I run print dfc.dtypes, it still shows that the column Time_of_Sail is object.
Is there a way to convert this column into a datetime format that only has the time?
Additional Information:
To create the above dataframe and output, this also works:
import pandas as pd
trial_num = [1,2,3,4,5]
sail_rem_time = ['11:33:11','16:29:05','09:37:56','21:43:31','17:42:06']
data = [
    [trial_num[0], sail_rem_time[0]],
    [trial_num[1], sail_rem_time[1]],
    [trial_num[2], sail_rem_time[2]],
    [trial_num[3], sail_rem_time[3]],
    [trial_num[4], sail_rem_time[4]],
]
dfc = pd.DataFrame(data,columns=['Temp_Reading','Time_of_Sail'])
dfc['Time_of_Sail'] = pd.to_datetime(dfc['Time_of_Sail'])
dfc['Time_of_Sail'] = [time.time() for time in dfc['Time_of_Sail']]
print dfc
print dfc.dtypes
These two lines:
dfc['Time_of_Sail'] = pd.to_datetime(dfc['Time_of_Sail'])
dfc['Time_of_Sail'] = [time.time() for time in dfc['Time_of_Sail']]
can be written as a single line:
dfc['Time_of_Sail'] = pd.to_datetime(dfc['Time_of_Sail'], format='%H:%M:%S').dt.time
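A minimal sketch (Python 3, using the sample data from the question) of why the dtype still shows object after this step: .dt.time produces datetime.time objects, and pandas has no time-only dtype, so it stores them in an object column. Element-wise comparisons against a datetime.time still work on such a column.

```python
import datetime

import pandas as pd

dfc = pd.DataFrame({
    "Temp_Reading": [1, 2, 3, 4, 5],
    "Time_of_Sail": ["11:33:11", "16:29:05", "09:37:56", "21:43:31", "17:42:06"],
})

# Parse with an explicit time-only format, then keep just the time component
dfc["Time_of_Sail"] = pd.to_datetime(dfc["Time_of_Sail"], format="%H:%M:%S").dt.time

print(dfc["Time_of_Sail"].dtype)          # object: there is no time-only dtype
print(type(dfc["Time_of_Sail"].iloc[0]))  # each value is a datetime.time

# Comparisons with a datetime.time are evaluated element by element
afternoon = dfc[dfc["Time_of_Sail"] > datetime.time(12, 0)]
print(afternoon)
```

So the object dtype is expected here; if a native (non-object) dtype is required, timedelta64 or datetime64 with a placeholder date are the usual alternatives.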
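Using to_timedelta, we can convert the strings to a timedelta64[ns] column. Strings in HH:MM:SS form are parsed directly; the unit argument (seconds, minutes, etc.) is meant for numeric input, and recent pandas versions reject it when the input contains strings:
dfc['Time_of_Sail'] = pd.to_timedelta(dfc['Time_of_Sail'])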
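As a sketch of both uses of to_timedelta: the string form is parsed as a duration since midnight, while unit='s' applies when the column already holds numbers (here, seconds since midnight computed by hand from the first two sample times).

```python
import pandas as pd

times = pd.Series(["11:33:11", "16:29:05", "09:37:56", "21:43:31", "17:42:06"])

# "HH:MM:SS" strings are parsed directly; no unit is needed for strings
durations = pd.to_timedelta(times)

# unit= is for numeric input, e.g. a count of seconds since midnight
from_seconds = pd.to_timedelta(pd.Series([41591, 59345]), unit="s")

print(durations.dtype)                        # a timedelta64 dtype
print(durations.dt.total_seconds().tolist())  # seconds since midnight
```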
This seems to work, although every value keeps the placeholder date 1900-01-01, because the format has no date fields:
dfc['Time_of_Sail'] = pd.to_datetime(dfc['Time_of_Sail'], format='%H:%M:%S').apply(pd.Timestamp)
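A short sketch showing what that column holds: a real datetime64 column, with 1900-01-01 filled in as the date. If only the time of day matters, one option (an alternative, not part of the answer above) is to subtract the normalized midnight timestamp and work with seconds since midnight.

```python
import pandas as pd

times = pd.Series(["11:33:11", "16:29:05"])

# With a time-only format, the missing date fields default to 1900-01-01
parsed = pd.to_datetime(times, format="%H:%M:%S")

# Time of day as seconds since midnight, if a numeric column is more useful
seconds = (parsed - parsed.dt.normalize()).dt.total_seconds()
print(parsed)
print(seconds)
```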
If anyone is searching for a more general answer, try:
dfc['Time_of_Sail']= pd.to_datetime(dfc['Time_of_Sail'])
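If you just want a simple conversion, you can do the following:
import datetime as dt
dfc.Time_of_Sail = dfc.Time_of_Sail.astype(dt.datetime)
(Note that astype(dt.datetime) maps to the generic object dtype, so the values stay strings rather than being parsed.)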
or you could prepend a placeholder date string to your time column, as below, and then convert it afterwards using apply:
dfc.Time_of_Sail = dfc.Time_of_Sail.apply(lambda x: '2016-01-01 ' + str(x))
dfc.Time_of_Sail = pd.to_datetime(dfc.Time_of_Sail).apply(lambda x: dt.datetime.time(x))
I can't see the result...
My result is 0 and it should be 824.
import pandas as pd
apple = r'C:\Users\User\Downloads\AAPL.xlsx'
data = pd.read_excel(apple)
dateindextime = data.set_index("timestamp")
rango = dateindextime.loc["2011-08-20":"2008-05-15"]
print(len(rango))
If I do
print(rango)
the output is:
Empty DataFrame
Columns: [open, high, low, close, adjusted_close, volume]
Index: []
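Kinda hard to tell without the AAPL.xlsx dataset, but I'm guessing you will need to convert the "timestamp" column to datetime objects first using pd.to_datetime. From there you would slice on datetime values instead of on strings, which is what you were doing above. Also note that on an index sorted in ascending order, the slice has to run from the earlier date to the later one (2008 first, then 2011); sort_index() puts the index in that order. If you posted the AAPL.xlsx dataset, I could dig deeper.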
import pandas as pd
import datetime
apple = r'C:\Users\User\Downloads\AAPL.xlsx'
data = pd.read_excel(apple)
data["datetime_timestamp"] = pd.to_datetime(data["timestamp"])
dateindextime = data.set_index("datetime_timestamp")
ti = datetime.date(2008,5,15)
tf = datetime.date(2011,8,20)
rango = dateindextime.loc[ti:tf]
print(len(rango))
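A minimal sketch of the slicing itself, using a small made-up frame in place of AAPL.xlsx (rows newest first, as in some price exports; that ordering and the placeholder close values are assumptions). After converting to datetime and calling sort_index(), partial-string bounds work in earlier-to-later order and are inclusive.

```python
import pandas as pd

# Hypothetical stand-in for AAPL.xlsx: dates as strings, newest first, placeholder values
data = pd.DataFrame({
    "timestamp": ["2012-01-03", "2011-08-19", "2010-06-01", "2008-05-15", "2007-01-03"],
    "close": [5, 4, 3, 2, 1],
})

# Convert to datetime, index by it, and sort so the slice runs earlier -> later
data["timestamp"] = pd.to_datetime(data["timestamp"])
dateindextime = data.set_index("timestamp").sort_index()

rango = dateindextime.loc["2008-05-15":"2011-08-20"]
print(len(rango))
```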
I have a dataframe:
id timestamp
1 "2025-08-02 19:08:59"
1 "2025-08-02 19:08:59"
1 "2025-08-02 19:09:59"
I need to turn the timestamp into an integer so I can iterate over conditions, so that it looks like this:
id timestamp
1 20250802190859
1 20250802190859
1 20250802190959
You can convert the strings using pandas' .str string methods:
df = pd.DataFrame({'id':[1,1,1],'timestamp':["2025-08-02 19:08:59",
"2025-08-02 19:08:59",
"2025-08-02 19:09:59"]})
pd.set_option('display.float_format', lambda x: '%.3f' % x)
df['timestamp'] = df['timestamp'].str.replace(r'[-\s:]', '', regex=True).astype('float64')
>>> df
id timestamp
0 1 20250802190859.000
1 1 20250802190859.000
2 1 20250802190959.000
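Since the question asks for integers, a variant of the same idea can cast to int64 instead of float64, which also removes the need for the display option. A sketch; note that regex=True matters on pandas 2.0 and later, where str.replace defaults to literal matching.

```python
import pandas as pd

df = pd.DataFrame({"id": [1, 1, 1],
                   "timestamp": ["2025-08-02 19:08:59",
                                 "2025-08-02 19:08:59",
                                 "2025-08-02 19:09:59"]})

# Strip '-', whitespace and ':' with a regex, then cast to 64-bit integers
df["timestamp"] = df["timestamp"].str.replace(r"[-\s:]", "", regex=True).astype("int64")
print(df)
```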
Have you tried opening the file, skipping the first line (or better: validating that it contains the expected header fields), and splitting each line at the first space/tab/whitespace? The second part, e.g. "2025-08-02 19:08:59", can be parsed using datetime.fromisoformat(). You can then turn the datetime object back into a string using datetime.strftime(format) with e.g. format = '%Y%m%d%H%M%S'. Note that strftime has no milliseconds directive, though; you could use %f for microseconds.
Note: if datetime.fromisoformat() fails to parse the dates, try datetime.strptime(date_string, format) with a different format, e.g. format = '%Y-%m-%d %H:%M:%S'.
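A sketch of that approach with only the standard library. The file contents are hypothetical and given as a list of lines, and the surrounding double quotes are stripped on the assumption that they are part of the data as shown in the question.

```python
from datetime import datetime


def to_compact(stamp: str) -> int:
    """Turn 'YYYY-MM-DD HH:MM:SS' into an integer like 20250802190859."""
    # fromisoformat accepts a space between the date and time parts
    parsed = datetime.fromisoformat(stamp)
    return int(parsed.strftime("%Y%m%d%H%M%S"))


# Hypothetical file contents: a header line followed by data lines
lines = ['id timestamp', '1 "2025-08-02 19:08:59"', '1 "2025-08-02 19:09:59"']
rows = []
for line in lines[1:]:                      # skip the header line
    row_id, stamp = line.split(maxsplit=1)  # split at the first whitespace
    rows.append((int(row_id), to_compact(stamp.strip('"'))))
print(rows)
```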
You can use the solutions provided in the post "How to turn timestamp into float number?" and loop through the dataframe.
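Let's say you have already imported pandas and have a list l of timestamp strings (created in the full code further down). Then:
import re
df = pd.DataFrame(l)
df1 = df.copy()
for x in range(len(df[0])):
    # write through .loc; chained assignment like df1[0][x] = ... may not update df1
    df1.loc[x, 0] = re.sub(r'\D', '', df.loc[x, 0])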
This way you will not modify the original dataframe df and will get desired output in a new dataframe df1.
Full code that I tried (including creation of the first dataframe), to avoid any confusion:
import pandas as pd
import re
l = ["2025-08-02 19:08:59", "2025-08-02 19:08:59", "2025-08-02 19:09:59"]
df = pd.DataFrame(l)
df1 = df.copy()
for x in range(len(df[0])):
    # remove every non-digit character from the original value
    df1.loc[x, 0] = re.sub(r'\D', '', df.loc[x, 0])
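If a loop is not required, a vectorized alternative (not the approach above, just a sketch) is to parse the column once and reformat it. Parsing with an explicit format also validates the values: a malformed timestamp raises an error instead of silently yielding a wrong number.

```python
import pandas as pd

l = ["2025-08-02 19:08:59", "2025-08-02 19:08:59", "2025-08-02 19:09:59"]
df = pd.DataFrame(l)

# Parse once, reformat every value as YYYYMMDDHHMMSS, and cast to integers
df1 = df.copy()
df1[0] = (pd.to_datetime(df[0], format="%Y-%m-%d %H:%M:%S")
          .dt.strftime("%Y%m%d%H%M%S")
          .astype("int64"))
print(df1)
```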
I was trying to modify each string in a column named Date_time in a dataframe. The values (strings) in that column look like this:
"40 11-02-20 11:42:36"
I want to delete the characters up to the first space, so that the value becomes "11-02-20 11:42:36". I was able to split the value but unable to write it back into the same cell of that column. Here is the code I have so far:
import numpy as np
import matplotlib.pyplot as plt
import pandas as pd
dataset = pd.read_csv('20-02-11.csv')
for i in dataset.itertuples():
    print(type(i.Date_time))
    str = i.Date_time
    str1 = str.split(None,1)[1]
    i.Date_time = str1
    print(str1)
    print(i.Date_time)
    break
It raises an AttributeError when I try to assign str1 to i.Date_time.
Please help.
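The rows that itertuples() returns cannot be used to set values in the original dataframe. They are namedtuples holding copies of the data, not the actual data of the dataframe, and namedtuples are immutable, which is why assigning to i.Date_time raises the AttributeError. You can try something like this: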
for i in range(len(dataset)):
    your_string = dataset.loc[i, "Date_time"]
    adjusted_string = your_string.split(None, 1)[1]
    dataset.loc[i, "Date_time"] = adjusted_string
This will use the actual data stored in the dataframe.
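A small sketch (with a two-row stand-in for the CSV) that reproduces the error and shows the difference: assigning to a field of the namedtuple from itertuples() raises AttributeError, while writing through .loc changes the dataframe.

```python
import pandas as pd

dataset = pd.DataFrame({"Date_time": ["40 11-02-20 11:42:36", "31 11-02-20 11:42:36"]})

raised = False
row = next(dataset.itertuples())   # first row as a namedtuple (Index, Date_time)
try:
    row.Date_time = "changed"      # namedtuple fields are read-only
except AttributeError:
    raised = True

# Writing through the dataframe with .loc does change the stored value
dataset.loc[row.Index, "Date_time"] = row.Date_time.split(None, 1)[1]
print(dataset)
```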
Alternatively, using iterrows() together with the DataFrame.at indexer:
for i, row in dataset.iterrows():
    your_string = row.Date_time  # or row['Date_time']
    adjusted_string = your_string.split(None, 1)[1]
    dataset.at[i, 'Date_time'] = adjusted_string
You can format the entire column at once. Starting with a dataframe like this:
df = pd.DataFrame({'date_time': ['40 11-02-20 11:42:36', '31 11-02-20 11:42:36']})
print(df)
returns
date_time
0 40 11-02-20 11:42:36
1 31 11-02-20 11:42:36
You can remove the first characters and space like this:
df['date_time'] = [i[1+len(i.split(' ')[0]):] for i in df['date_time']]
print(df)
returns
date_time
0 11-02-20 11:42:36
1 11-02-20 11:42:36
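The same whole-column result can also be sketched with pandas' vectorized string methods instead of a list comprehension: split each value once at the first whitespace and keep the second piece.

```python
import pandas as pd

df = pd.DataFrame({'date_time': ['40 11-02-20 11:42:36', '31 11-02-20 11:42:36']})

# n=1 splits only at the first whitespace; .str[1] keeps everything after it
df['date_time'] = df['date_time'].str.split(n=1).str[1]
print(df)
```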
I'm reading a CSV file, saving it into a dataframe, and using an if condition, but I'm not getting the expected result.
My Python code is below:
import pandas as pd
import numpy as np
import datetime
import operator
from datetime import datetime
dt = datetime.now ( ).strftime ( '%m/%d/%Y' )
stockRules = pd.read_csv("C:\stock_rules.csv", dtype={"Product Currently Out of Stock": str}).drop_duplicates(subset="Product Currently Out of Stock", keep="last" )
pd.to_datetime(stockRules['FROMMONTH'], format='%m/%d/%Y')
pd.to_datetime(stockRules['TOMONTH'], format='%m/%d/%Y')
if stockRules['FROMMONTH'] <= dt and stockRules['TOMONTH'] >= dt:
print(stockRules)
My CSV file is below:
Productno FROMMONTH TOMONTH
120041 2/1/2019 5/30/2019
112940 2/1/2019 5/30/2019
121700 2/1/2019 2/1/2019
I want to read the CSV file and print only the product numbers that meet the condition.
I played around with the code a bit and simplified it somewhat, but the idea behind the selection should still work the same:
dt = datetime.now().strftime("%m/%d/%Y")
stockRules = pd.read_csv("data.csv", delimiter=";")
stockRules["FROMMONTH"] = pd.to_datetime(stockRules["FROMMONTH"], format="%m/%d/%Y")
stockRules["TOMONTH"] = pd.to_datetime(stockRules["TOMONTH"], format="%m/%d/%Y")
sub = stockRules[(stockRules["FROMMONTH"] <= dt) & (dt <= stockRules["TOMONTH"])]
print(sub["Productno"])
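Notice that when using pd.to_datetime I assign the result back to the original column, overwriting whatever was in it before. Also, an if statement cannot evaluate a whole Series at once (pandas raises "The truth value of a Series is ambiguous"), so the rows are selected with a boolean mask combined with & instead.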
Hope this helps.
EDIT:
For my tests I changed the CSV to use ; as the delimiter, since I had trouble reading in the data you provided in your question. You may have to specify a different delimiter; for tabs, for example:
stockRules = pd.read_csv("data.csv", delimiter="\t")
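If the file really is separated by runs of spaces, as the sample in the question appears to be (an assumption), sep=r"\s+" reads it directly. The sketch below uses the question's sample via io.StringIO and a fixed reference date so the result is reproducible; for the current day, pd.Timestamp.now().normalize() gives a Timestamp at midnight instead of a formatted string.

```python
import io

import pandas as pd

# The sample from the question, whitespace-separated (stand-in for the real file)
raw = """Productno FROMMONTH TOMONTH
120041 2/1/2019 5/30/2019
112940 2/1/2019 5/30/2019
121700 2/1/2019 2/1/2019
"""
stockRules = pd.read_csv(io.StringIO(raw), sep=r"\s+")

for col in ["FROMMONTH", "TOMONTH"]:
    stockRules[col] = pd.to_datetime(stockRules[col], format="%m/%d/%Y")

# Fixed reference date for reproducibility; pd.Timestamp.now().normalize() for "today"
today = pd.Timestamp("2019-03-15")
mask = (stockRules["FROMMONTH"] <= today) & (today <= stockRules["TOMONTH"])
print(stockRules.loc[mask, "Productno"])
```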
I want to calculate the average number of successful Rattata catches per hour for this whole dataset. I am looking for an efficient way to do this using pandas; I'm new to Python and pandas.
You don't need any loops. Try this; I think the logic is fairly clear.
import pandas as pd
# read the CSV
df = pd.read_csv('pkmn.csv', header=0)
# convert the timestamps and extract the date from each one
df['time'] = pd.to_datetime(df['time'].astype(str))
df['date'] = df['time'].dt.date
# main transformations
df = df.query("Pokemon == 'rattata' and caught == True").groupby('hour')
result = pd.DataFrame()
result['caught total'] = df['hour'].count()
result['days'] = df['date'].nunique()
result['caught average'] = result['caught total'] / result['days']
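A runnable sketch of the same groupby logic on a tiny made-up dataset. The column names follow the answer above; the rows are invented, and deriving hour from the timestamp is an assumption, since the answer relies on an existing hour column.

```python
import pandas as pd

# Hypothetical sample data; column names follow the answer above
df = pd.DataFrame({
    "time": ["2016-07-10 09:15", "2016-07-10 09:40", "2016-07-11 09:05",
             "2016-07-11 14:20", "2016-07-11 14:50"],
    "Pokemon": ["rattata", "rattata", "rattata", "rattata", "pidgey"],
    "caught": [True, True, True, False, True],
})
df["time"] = pd.to_datetime(df["time"])
df["date"] = df["time"].dt.date
df["hour"] = df["time"].dt.hour  # assumption: derive the hour from the timestamp

caught = df[(df["Pokemon"] == "rattata") & df["caught"]]
grouped = caught.groupby("hour")
result = pd.DataFrame({
    "caught total": grouped.size(),
    "days": grouped["date"].nunique(),
})
result["caught average"] = result["caught total"] / result["days"]
print(result)
```

Note that days here counts only the days with at least one catch in that hour; to average over every day in the dataset, divide by df["date"].nunique() instead.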
If you have your pandas dataframe saved as df (with time already converted to datetime), this should work:
rats = df.loc[df.Pokemon == "rattata"]  # subset of rows relating to Rattata
total = rats.Caught.sum()  # total number caught
diff = rats.time.max() - rats.time.min()  # time between the first and last sighting
average = total / (diff / pd.Timedelta(hours=1))  # number caught per hour