How to format all dates in a sheet with pandas? - python

I have the following sheet data in an Excel file:
id  data_1            data_2
1   2018/11/11 00:00  123
2   123               2018/11/2 00:00
The dates in Excel are actually stored as floats, so I want to change them to str using:
df = df.astype(dtype=str)
But pandas changes the date format from YYYY/MM/DD to YYYY-MM-DD, so I get this output:
id  data_1            data_2
1   2018-11-11 00:00  123
2   123               2018-11-2 00:00
How do I change all dates to str while keeping the YYYY/MM/DD format?
I can't use pd.to_datetime() or similar syntax, because the dates are not confined to any particular column, and I don't want to traverse all columns to achieve this.
The only way I know is to use a regex:
df.replace(['((?<=[0-9]{4})-(?=([0-9]{2}-[0-9]{2})))|((?<=[0-9]{4}-[0-9]{2})-(?=[0-9]{2}))'], ['/'], regex=True)
But this causes errors when a YYYY-MM-DD pattern occurs inside other string data.
I only want to change the dates in the sheet, and df.astype can do that. The only problem is that I want YYYY/MM/DD instead of YYYY-MM-DD.
In general, I want to convert all dates in the sheet to str, formatted as YYYY/MM/DD HH:MM:SS. astype achieves the first step.
Is there a simple and quick way to achieve this?
Thank you for reading.

Consider a DataFrame that contains datetime objects but also random integers:
import datetime as dt
import pandas as pd

df = pd.DataFrame(pd.date_range(dt.datetime(2018, 1, 1), dt.datetime(2018, 1, 6)))
df[0][0] = 123
print(df)
           0
0        123
1 2018-01-02
2 2018-01-03
3 2018-01-04
4 2018-01-05
5 2018-01-06
Now you can create a new column with the datetime in the desired format using Series.apply and this convert function:
def convert(x):
    try:
        return x.strftime('%Y/%m/%d')
    except AttributeError:
        return str(x)

df['date'] = df[0].apply(convert)
print(df)
                     0        date
0                  123         123
1  2018-01-02 00:00:00  2018/01/02
2  2018-01-03 00:00:00  2018/01/03
3  2018-01-04 00:00:00  2018/01/04
4  2018-01-05 00:00:00  2018/01/05
5  2018-01-06 00:00:00  2018/01/06
Note: it might be a better idea to clean up the dates first to avoid unexpected behavior, for example by keeping only the rows that actually hold timestamps:
df[df[0].apply(lambda x: isinstance(x, pd.Timestamp))]
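Since the question asks to convert every cell without knowing which columns hold dates, the same convert function can also be applied element-wise to the whole frame with DataFrame.applymap. A minimal sketch, on a made-up frame mirroring the question's sheet:

```python
import pandas as pd

def convert(x):
    """Format datetime-like values as YYYY/MM/DD; stringify everything else."""
    try:
        return x.strftime('%Y/%m/%d')
    except AttributeError:
        return str(x)

# hypothetical mixed sheet: dates and numbers scattered across both columns
df = pd.DataFrame({
    'data_1': [pd.Timestamp(2018, 11, 11), 123],
    'data_2': [123, pd.Timestamp(2018, 11, 2)],
})

out = df.applymap(convert)
print(out)
#        data_1      data_2
# 0  2018/11/11         123
# 1         123  2018/11/02
```

applymap visits every cell, so no per-column traversal is needed; cells that aren't datetimes simply fall through to str(x).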

Related

Python - Converting a column with weekly data to a datetime object

I have the following date column that I would like to transform into a pandas datetime object. Is it possible to do this with weekly data? For example, 1-2018 stands for week 1 of 2018, and so on. I tried the following conversion but I get an error message: Cannot use '%W' or '%U' without day and year
import pandas as pd
df1 = pd.DataFrame(columns=["date"])
df1['date'] = ["1-2018", "1-2018", "2-2018", "2-2018", "3-2018", "4-2018", "4-2018", "4-2018"]
df1["date"] = pd.to_datetime(df1["date"], format = "%W-%Y")
You need to add a day of the week to the datetime format:
df1["date"] = pd.to_datetime('0' + df1["date"], format='%w%W-%Y')
print(df1)
Output
        date
0 2018-01-07
1 2018-01-07
2 2018-01-14
3 2018-01-14
4 2018-01-21
5 2018-01-28
6 2018-01-28
7 2018-01-28
As the error message says, you need to specify the day of the week by adding %w:
df1["date"] = pd.to_datetime('0' + df1.date, format='%w%W-%Y')
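The same trick works with the standard library's strptime, which uses these format codes the same way: %W alone is ignored in the calculation, but prefixing a weekday digit ('0' means Sunday under %w) anchors the week number to a concrete day:

```python
from datetime import datetime

# '%W' counts weeks starting on Monday; '%w' supplies the weekday (0 = Sunday).
# '01-2018' therefore parses as the Sunday of week 1 of 2018.
d = datetime.strptime('0' + '1-2018', '%w%W-%Y')
print(d.date())  # 2018-01-07
```

Week 1 of 2018 runs Monday 1 January through Sunday 7 January, which is why every "1-2018" row maps to 2018-01-07 in the output above.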

Issue in converting column to datetime in pandas

I have a CSV and am reading it with the following code:
df1 = pd.read_csv('dataDate.csv')
df1
Out[57]:
Date
0 01/01/2019
1 01/01/2019
2 01/01/2019
3 01/01/2019
4 01/01/2019
5 01/01/2019
Currently the column has dtype('O'). I am now running the following command to convert the dates to datetime in the format %d/%m/%Y:
df1.Date = pd.to_datetime(df1.Date, format='%d/%m/%Y')
It produces output like:
9 2019-01-01
35 2019-01-01
48 2019-01-01
38 2019-01-01
18 2019-01-01
36 2019-01-01
31 2019-01-01
6 2019-01-01
I'm not sure what is wrong here; I want the same format as the input for my process. Can anyone tell me what's wrong?
Thanks
The produced output is the default display format for pandas' datetime objects, so nothing is wrong. You can, however, produce a datetime string in any format with the strftime method; this built-in Python method is exposed in pandas via the .dt accessor.
You can try the following:
df1.Date = pd.to_datetime(df1.Date, format='%d/%m/%Y')
df1['my_date'] = df1.Date.dt.strftime('%d/%m/%Y')
The 'my_date' column then has the desired format. You cannot do datetime operations on that string column, but you can use it for representation. Keep the Date column for your mathematical operations, etc., and use the my_date column for display.
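A minimal sketch of this two-column pattern (the sample data here is made up):

```python
import pandas as pd

df1 = pd.DataFrame({'Date': ['01/01/2019', '15/06/2019']})

# real datetime column for computation, string column for display
df1['Date'] = pd.to_datetime(df1['Date'], format='%d/%m/%Y')
df1['my_date'] = df1['Date'].dt.strftime('%d/%m/%Y')

# date arithmetic still works on the datetime column
span = df1['Date'].max() - df1['Date'].min()
print(df1['my_date'].tolist())  # ['01/01/2019', '15/06/2019']
print(span.days)  # 165
```

Trying the same subtraction on the my_date column would fail, since it holds plain strings.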

Python pandas only reads days when comparing dates from csv

so let's say this is my code:
df = pd.read_table('file_name', sep=';')
today = pd.Timestamp("today").strftime('%d.%m.%Y')
df = df[df['column1'] < today]
df
Here's the table from the csv file:
Column 1
27.02.2018
05.11.2018
22.05.2018
01.11.2018
01.08.2018
01.08.2018
16.10.2018
22.08.2018
21.11.2018
So as you can see, I imported a table from a CSV file. I only need to see dates before today (16.10.2018), but when I run the code this is what I get:
Column 1
05.11.2018
01.11.2018
01.08.2018
01.08.2018
This means Python is only looking at the days and ignoring the months, which is wrong. I need it to understand that these are dates, not just numbers. What do I do to achieve that?
PS: I'm new to Python.
You should convert your column to the date type, not strings, since strings are compared lexicographically.
You can thus convert it with:
# convert the strings to date(time) objects
df['column1'] = pd.to_datetime(df['column1'], format='%d.%m.%Y')
Then you can compare it with a date object, like:
>>> from datetime import date
>>> df[df['column1'] < date.today()]
     column1
0 2018-02-27
2 2018-05-22
4 2018-08-01
5 2018-08-01
7 2018-08-22
7 2018-08-22
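The lexicographic pitfall is easy to demonstrate with plain strings: the comparison proceeds character by character, so only the leading day digits decide the result:

```python
from datetime import date

# String comparison: '0' < '1' settles it, even though 5 November
# comes after 16 October.
assert '05.11.2018' < '16.10.2018'

# Proper date objects compare chronologically.
assert date(2018, 11, 5) > date(2018, 10, 16)

print('string comparison is lexicographic; date comparison is chronological')
```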

Error converting data type float to datetime format

I would like to convert the data type float below to datetime format:
df
Date
0 NaN
1 NaN
2 201708.0
4 201709.0
5 201700.0
6 201600.0
Name: Cred_Act_LstPostDt_U324123, dtype: float64
pd.to_datetime(df['Date'],format='%Y%m.0')
ValueError: time data 201700.0 does not match format '%Y%m.0' (match)
How could I treat these rows without month information (month 00) as yyyy01 by default?
You can clean up the month data first by replacing the invalid 00 month with 01:
s = [x.replace('00.0', '01.0') for x in df['Date'].astype(str)]
df['Date'] = pd.to_datetime(s, format='%Y%m.0', errors='coerce')
print(df)
Date
0 NaT
1 NaT
2 2017-08-01
4 2017-09-01
5 2017-01-01
6 2016-01-01
Alternatively, create strings from the floats using .astype(str), slice out the year (the first four characters) and the month, and join them with a hyphen; then you can parse with format='%Y-%m'.
However, invalid month numbers such as 00 will still fail to parse; with errors='coerce' they become NaT instead of raising.
s = df['Date'].astype(str)
date = s.str[:4] + '-' + s.str[4:6]
pd.to_datetime(date, format='%Y-%m', errors='coerce')

Efficiently handling missing dates when aggregating Pandas Dataframe

Follow up from Summing across rows of Pandas Dataframe and Pandas Dataframe object types fillna exception over different datatypes
I am aggregating with
df.groupby(['stock', 'same1', 'same2'], as_index=False)['positions'].sum()
This method is not very forgiving if there is missing data: if any data in same1, same2, etc. is missing, it pairs up totally unrelated values. A workaround is to do a fillna loop over the columns, replacing missing strings with '' and missing numbers with zero; that solves the problem.
I do, however, have one column with missing dates as well. The column type is 'object', with nan (of type float) in the missing cells and datetime objects in the existing fields. It is important that I know the data is missing, i.e. the missing indicator must survive the groupby transformation.
Dataset outlining the problem:
The csv file that I use as input is:
Date,Stock,Position,Expiry,same
2012/12/01,A,100,2013/06/01,AA
2012/12/01,A,200,2013/06/01,AA
2012/12/01,B,300,,BB
2012/6/01,C,400,2013/06/01,CC
2012/6/01,C,500,2013/06/01,CC
I then read in the file:
df = pd.read_csv('example', parse_dates=[0])

def convert_date(d):
    '''Converts YYYY/mm/dd to datetime object'''
    if type(d) != str or len(d) != 10:
        return np.nan
    dd = d[8:]
    mm = d[5:7]
    YYYY = d[:4]
    return datetime.datetime(int(YYYY), int(mm), int(dd))

df['Expiry'] = df.Expiry.map(convert_date)
df
df looks like:
Date Stock Position Expiry same
0 2012-12-01 00:00:00 A 100 2013-06-01 00:00:00 AA
1 2012-12-01 00:00:00 A 200 2013-06-01 00:00:00 AA
2 2012-12-01 00:00:00 B 300 NaN BB
3 2012-06-01 00:00:00 C 400 2013-06-01 00:00:00 CC
4 2012-06-01 00:00:00 C 500 2013-06-01 00:00:00 CC
I can quite easily change the convert_date function to plug in something else for the missing data in the Expiry column.
Then using:
df.groupby(['Stock', 'Expiry', 'same'] ,as_index=False)['Position'].sum()
to aggregate the Position column, I get a TypeError: can't compare datetime.datetime to str for any non-date that I plug into the missing date cells. It is important for later functionality to know whether Expiry is missing.
You need to convert your dates to the datetime64[ns] dtype (which manages how datetimes work). An object column is neither efficient nor does it deal well with datelikes. datetime64[ns] allows missing values using NaT (not-a-time); see http://pandas.pydata.org/pandas-docs/dev/missing_data.html#datetimes
In [6]: df['Expiry'] = pd.to_datetime(df['Expiry'])
# alternative way of reading in the data (in 0.11.1, as ``NaT`` will be set
# for missing values in a datelike column)
In [4]: df = pd.read_csv('example',parse_dates=['Date','Expiry'])
In [9]: df.dtypes
Out[9]:
Date datetime64[ns]
Stock object
Position int64
Expiry datetime64[ns]
same object
dtype: object
In [7]: df.groupby(['Stock', 'Expiry', 'same'] ,as_index=False)['Position'].sum()
Out[7]:
Stock Expiry same Position
0 A 2013-06-01 00:00:00 AA 300
1 B NaT BB 300
2 C 2013-06-01 00:00:00 CC 900
In [8]: df.groupby(['Stock', 'Expiry', 'same'] ,as_index=False)['Position'].sum().dtypes
Out[8]:
Stock object
Expiry datetime64[ns]
same object
Position int64
dtype: object
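One caveat for modern pandas (the session above is from 0.11): groupby now drops NaN/NaT group keys by default, so to keep the missing-Expiry group you must pass dropna=False (added in pandas 1.1). A minimal sketch on made-up data matching the question:

```python
import pandas as pd

df = pd.DataFrame({
    'Stock': ['A', 'A', 'B', 'C', 'C'],
    'Position': [100, 200, 300, 400, 500],
    'Expiry': pd.to_datetime(
        ['2013/06/01', '2013/06/01', None, '2013/06/01', '2013/06/01']),
    'same': ['AA', 'AA', 'BB', 'CC', 'CC'],
})

# dropna=False keeps the NaT group; the default (dropna=True) would
# silently drop stock B from the result.
out = df.groupby(['Stock', 'Expiry', 'same'],
                 as_index=False, dropna=False)['Position'].sum()
print(out)
```

With dropna=False the result has three rows, including (B, NaT, BB, 300), matching the Out[7] shown above.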
