I am learning Python and have run into an issue. I am trying to read timestamps from a CSV file in the following format:
43:32.0
where 43 is in the hours position, and convert them to datetime format in pandas.
I tried this code:
df['time'] = df['time'].astype(str).str[:-2]
df['time'] = pd.to_datetime(df['time'], errors='coerce')
But this converts all values to NaT.
I need the output in the format mm/dd/yyyy hh:mm:ss.
I'm going to assume the date part should be 11-29-17 (today's date)?
I believe you need to add an extra "0:" at the beginning of the string. Basic example:
import pandas as pd
# creating a dataframe of your string
df1 = pd.DataFrame({'A':['43:32.0']})
# adding '0:' to the front
df1['A'] = '0:' + df1['A'].astype(str)
# making new column to show the output
df1['B'] = pd.to_datetime(df1['A'], errors='coerce')
#output
A B
0 0:43:32.0 2017-11-29 00:43:32
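The approach above lets to_datetime fill in the current date and still leaves the requested mm/dd/yyyy hh:mm:ss layout open. The following is a minimal sketch of an alternative. It assumes, as the answer does, that the values are MM:SS.f. Each value is parsed as a duration, added to a fixed base date, and then formatted. The second sample value and the base date 2017-11-29 are illustrative assumptions, not part of the question.

```python
import pandas as pd

# Sample values in the question's layout (read here as MM:SS.f, like the answer does)
df = pd.DataFrame({'time': ['43:32.0', '05:07.5']})

# Prefix '00:' so each value reads as HH:MM:SS.f, then parse it as a duration
elapsed = pd.to_timedelta('00:' + df['time'].astype(str))

# Assumption: anchor every duration to one fixed date chosen for illustration
base = pd.Timestamp('2017-11-29')
df['dt'] = base + elapsed

# Render in the requested mm/dd/yyyy hh:mm:ss layout (note: this yields strings)
df['formatted'] = df['dt'].dt.strftime('%m/%d/%Y %H:%M:%S')
print(df)
```

Keep the datetime column (`dt`) for calculations and use the string column only for display.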
I'm trying to convert a column in a DataFrame to datetimes (a time series). The values in the column are strings of the following form:
12/10/202110:42:05.397
which means 12-10-2021 at 10:42:05 and 397 milliseconds. This is the format LabVIEW uses when saving the data to a file.
I'm trying to use the following command, but I can't figure out how to define the format for my case:
pd.to_datetime(df.DateTime, format=???)
Note that there is no space between the year 2021 and the hour 10.
Setup:
import pandas as pd
df = pd.DataFrame({'DateTime': ['12/10/202110:42:05.397']})
Use:
df['dt'] = pd.to_datetime(df['DateTime'], format='%d/%m/%Y%H:%M:%S.%f')
print(df)
# Output
DateTime dt
0 12/10/202110:42:05.397 2021-10-12 10:42:05.397
As suggested by @RaymondKwok, see the Python documentation section:
strftime() and strptime() Format Codes
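The same format string also works with the standard library's datetime.strptime, which is where those format codes are documented. The missing space causes no trouble because strptime's %Y reads exactly four digits, so the hour that follows is parsed correctly. A short check using the sample value from the question:

```python
from datetime import datetime

# %Y consumes exactly four digits, so '2021' and the hour '10' are split correctly;
# %f accepts 1 to 6 digits, so '397' becomes 397000 microseconds
ts = datetime.strptime('12/10/202110:42:05.397', '%d/%m/%Y%H:%M:%S.%f')
print(ts)
```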
I have a DataFrame with date information in one column.
The dates appear in the DataFrame in this format: 2019-11-24
but when I print the type, it shows up as:
Timestamp('2019-11-24 00:00:00')
I'd like to convert each value in the dataframe to a format like this:
24-Nov
or
7-Nov
for single-digit days.
I've tried various datetime and strptime commands to convert them, but I keep getting errors.
Here's one way to do it:
import pandas as pd
df = pd.DataFrame({'date': ["2014-10-23", "2016-09-08"]})
df['date_new'] = pd.to_datetime(df['date'])
df['date_new'] = df['date_new'].dt.strftime("%d-%b")
date date_new
0 2014-10-23 23-Oct
1 2016-09-08 08-Sep
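%d zero-pads the day ("08-Sep"), but the question asks for "7-Nov" style output. Codes such as %-d (glibc) or %#d (Windows) are platform-specific. A portable sketch instead builds the day from the integer day component. The sample strings are taken from the answer above. In the asker's case the column already holds Timestamps, so the pd.to_datetime call is simply a no-op there.

```python
import pandas as pd

df = pd.DataFrame({'date': ["2014-10-23", "2016-09-08"]})
dates = pd.to_datetime(df['date'])

# Day from the integer day component (no zero padding) plus the month abbreviation from %b
df['date_new'] = dates.dt.day.astype(str) + '-' + dates.dt.strftime('%b')
print(df)
```

As with strftime, the result is a string column, and %b follows the current locale.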
This question is different from the existing questions and answers on Stack Overflow because I do not want to convert my data to strings to obtain the desired output.
I find this very confusing and have not been able to find a proper solution to my problem.
I am reading an Excel file that has a column like the following:
Date
9/20/2017 7:27:30 PM
9/20/2017 7:27:30 PM
11/21/2018 8:28:30 AM
7/18/2019 9:30:08 PM
.
.
.
I load this data from the Excel sheet into a DataFrame:
df = pd.read_excel("data.xlsx")
First, I want to remove the time from this column. I am doing it like this:
df['Date'] = pd.to_datetime(df['Date'])
df['Date'] = pd.to_datetime(df['Date'], errors='ignore', format='%d/%b/%Y').dt.date
It produces the following output, with values of type datetime.date:
Date
20/9/2017
20/9/2017
21/11/2018
18/7/2019
.
.
.
But I want it in the following form without converting it to a string, because I want to store this data in another Excel file, and this column must behave as a date column when filtering is applied in my Excel sheet.
Date
20/Sep/2017
20/Sep/2017
21/Nov/2018
18/Jul/2019
.
.
.
I can produce the above output with:
df['Date'] = df['Date'].apply(lambda x: x.strftime('%d/%b/%Y'))
But again, this turns the date column into strings, and I do not want strings. I want a datetime type without the time values in each cell.
A possible solution is to convert the strings back to datetime as follows, but this adds the time values again:
df['Date'] = pd.to_datetime(df['Date'])
After executing the two steps above, each value again includes a time of 12:00:00 AM (00:00:00) along with the date.
I hope this is clear.
How can I obtain the desired result with the final column as a date type?
Regarding "But I want it as following type without changing it into string": no, that is not possible. If you want datetimes without times, the only display pattern in Python/pandas is YYYY-MM-DD.
# datetimes with the time set to 00:00:00 (dtype stays datetime64)
df['Date'] = pd.to_datetime(df['Date'], format='%m/%d/%Y %I:%M:%S %p').dt.floor('D')
# Python datetime.date objects (object dtype)
df['Date'] = pd.to_datetime(df['Date'], format='%m/%d/%Y %I:%M:%S %p').dt.date
Any custom format converts the datetimes to strings, for example:
df['Date'] = df['Date'].dt.strftime('%d/%b/%Y')
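To see what each of the three options above actually returns, here is a small sketch using two values from the question. The sample values are the only assumption. Flooring keeps a datetime64 column, .dt.date gives Python date objects, and strftime gives plain strings.

```python
import datetime
import pandas as pd

# Sample values in the question's layout (month/day/year, 12-hour clock)
raw = pd.Series(['9/20/2017 7:27:30 PM', '11/21/2018 8:28:30 AM'])
parsed = pd.to_datetime(raw, format='%m/%d/%Y %I:%M:%S %p')

floored = parsed.dt.floor('D')            # still datetime64, time set to 00:00:00
as_dates = parsed.dt.date                 # object dtype holding datetime.date values
as_text = parsed.dt.strftime('%d/%b/%Y')  # strings such as '20/Sep/2017'

print(floored.dtype, type(as_dates.iloc[0]), as_text.tolist())
```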
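You can set the display format in the ExcelWriter, so the cells stay real dates in Excel. Note that date_format (for date values) and datetime_format (for datetime values) take Excel number-format codes, not strftime codes:
with pd.ExcelWriter("pandas_datetime.xlsx",
                    engine='xlsxwriter',
                    date_format='DD/MMM/YYYY',
                    datetime_format='DD/MMM/YYYY') as writer:
    df.to_excel(writer)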
I think I'm a bit late here, but as a workaround:
do not format the date column; leave it as a regular DataFrame date column, save the Excel workbook, then open it again with the openpyxl module and format that column range.
import openpyxl
workbook = openpyxl.load_workbook(file_path)
sheet = workbook['Sheet1']  # the sheet pandas wrote to (default name 'Sheet1')
# -- assuming that the column is M and the data starts at M2
last_line_end = 'M' + str(len(df) + 1)
for row in sheet['M2:' + last_line_end]:
    for cell in row:
        cell.number_format = "DD/MM/YY"  # e.g. "DD/MMM/YYYY" for 20/Sep/2017
workbook.save(file_path)  # save the workbook
workbook.close()
I have a DataFrame column in Python in the format YYMM, e.g., January 1996 is 9601.
I'm having a hard time converting it from 9601 to a usable datetime format. I want the new format to be 01-01-1996. Does anyone have any suggestions? I tried the pd.to_datetime function, but it's not giving me the results I'm looking for.
Use to_datetime with the format parameter:
import pandas as pd
df = pd.DataFrame({'col':['9601', '9705']})
df['col'] = pd.to_datetime(df['col'], format='%y%m')
print(df)
col
0 1996-01-01
1 1997-05-01
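If the column holds integers rather than strings, values from 2000 to 2009 lose their leading zero (0501 becomes 501), so they need padding back to four characters before parsing. The question also asks for the 01-01-1996 layout, which requires strftime and therefore produces strings. A sketch assuming an integer column (the sample values are hypothetical):

```python
import pandas as pd

# Hypothetical integer column; 501 stands for YYMM 0501 (January 2005)
df = pd.DataFrame({'col': [9601, 9705, 501]})

# Restore the 4-character YYMM text; %y maps 00-68 to 20xx and 69-99 to 19xx
dates = pd.to_datetime(df['col'].astype(str).str.zfill(4), format='%y%m')

# Optional: the asker's mm-dd-yyyy layout (this produces strings, not datetimes)
df['formatted'] = dates.dt.strftime('%m-%d-%Y')
print(df)
```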
I am reading from an Excel sheet. The header contains dates in Month-Year format, and I want to keep it that way. But when pandas reads it, it changes the format to "2014-01-01 00:00:00". I wrote the following piece of code to fix it, but it doesn't work.
import pandas as pd
import numpy as np
import datetime
from datetime import date
import time
file_loc = "path.xlsx"
df = pd.read_excel(file_loc, index_col=None, na_values=['NA'], parse_cols = 37)
df.columns=pd.to_datetime(df.columns, format='%b-%y')
Which didn't do anything. On another try, I did the following:
df.columns = datetime.datetime.strptime(df.columns, '%Y-%m-%d %H:%M:%S').strftime('%b-%y')
This raises the error "must be str, not datetime.datetime". I don't know how to make it read the row cell by cell to get the strings!
Here is some sample data:
NaT 11/14/2015 00:00:00 12/15/2015 00:00:00 1/15/2016 00:00:00
A 5 1 6
B 6 3 3
My main problem is that it does not recognize it as the header, e.g., df['11/14/2015 00:00:00'] returns a KeyError.
Any help is appreciated.
UPDATE: Here is a photo to illustrate what I keep getting! Box 6 is the implementation of apply, and box 7 is what my data looks like.
import datetime
import pandas as pd
df = pd.DataFrame({'data': ["11/14/2015 00:00:00", "11/14/2015 00:10:00", "11/14/2015 00:20:00"]})
df["data"].apply(lambda x: datetime.datetime.strptime(x, '%m/%d/%Y %H:%M:%S').strftime('%b-%y'))
EDIT
If you'd like to work with df.columns, you could use the map function:
df.columns = list(map(lambda x: datetime.datetime.strptime(x, '%m/%d/%Y %H:%M:%S').strftime('%b-%y'), df.columns))
You need list() in Python 3.x because map returns an iterator.
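The error "must be str, not datetime.datetime" suggests the headers are already datetime objects, which read_excel typically produces for date-formatted header cells. If that is the case, strptime is unnecessary and each header can be formatted directly. It would also explain the KeyError: label lookups need a datetime key, not the displayed string. A sketch using the question's sample values. The datetime headers are an assumption, and the NaT header from the question is skipped by the guard.

```python
import datetime
import pandas as pd

# Hypothetical frame whose headers are datetimes (pandas stores them as a DatetimeIndex)
df = pd.DataFrame(
    [[5, 1, 6], [6, 3, 3]],
    index=['A', 'B'],
    columns=[datetime.datetime(2015, 11, 14), datetime.datetime(2015, 12, 15),
             datetime.datetime(2016, 1, 15)],
)

# Look up a column with a datetime key rather than the displayed string
nov = df[datetime.datetime(2015, 11, 14)].tolist()

# Format datetime headers directly; other headers (and NaT) are left unchanged
df.columns = [c.strftime('%b-%y') if isinstance(c, datetime.datetime) and pd.notna(c) else c
              for c in df.columns]
print(nov, df.columns.tolist())
```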
The problem might be that the data in Excel isn't stored in the string format you think it is. Perhaps it is stored as a number and only displayed as a date string in Excel.
Excel stores dates as serial numbers: the count of days since its epoch, with the time of day as a fractional part.
Check what the actual values in the DataFrame are.
What does this show?
from pprint import pprint
pprint(df)
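If the values do turn out to be Excel serial day numbers, pandas can convert them with to_datetime using unit='D' and an origin. This sketch assumes the 1900 date system and uses origin '1899-12-30', which is correct for dates after February 1900. The serials are hypothetical values corresponding to the sample headers.

```python
import pandas as pd

# Hypothetical Excel serial day numbers (1900 date system) for the sample headers
serials = pd.Series([42322, 42353, 42384])

# Count whole days from Excel's effective epoch to get real datetimes
dates = pd.to_datetime(serials, unit='D', origin='1899-12-30')
print(dates.dt.strftime('%b-%y').tolist())
```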