Pandas column - converting string into specific date format - python

One of the columns in my pandas dataframe looks like this:
14.3.2019
15.3.2019
16.3.2019
These are European/German-style dates (day.month.year) that I have to convert to year-month-day order:
2019-3-14
2019-3-15
2019-3-16
What is the fastest way to do it, possibly inplace, if I have a large dataset?

Both commenters gave a correct answer; I am posting the faster solution, from @QuangHoang, here.
Parse the string column as dates, then format it in the desired layout (note that strftime returns strings, not a date dtype):
df['date'] = pd.to_datetime(df['date'], format='%d.%m.%Y').dt.strftime('%Y-%m-%d')
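As a minimal sketch of the one-liner above, assuming a small made-up frame with a `date` column, the following keeps both results side by side: a real datetime column and a formatted string column. The column names `date_dt` and `date_str` are illustrative only.

```python
import pandas as pd

# Hypothetical sample data in day.month.year order
df = pd.DataFrame({"date": ["14.3.2019", "15.3.2019", "16.3.2019"]})

# Parse once with an explicit format so pandas does not have to guess it
parsed = pd.to_datetime(df["date"], format="%d.%m.%Y")

# Option 1: keep real datetimes, which support sorting and date arithmetic
df["date_dt"] = parsed

# Option 2: formatted strings; %m and %d are zero-padded, e.g. 2019-03-14
df["date_str"] = parsed.dt.strftime("%Y-%m-%d")

print(df.dtypes)
print(df)
```

If the column is only going to be displayed or exported, the string version is enough; for filtering or computing differences, the datetime version is probably the better one to keep.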

Related

Split the given integer value as date

20160116
Suppose a column holds values like this with an integer dtype, and I want to convert them to 2016/01/16 or 2016-01-16 with a date dtype. My column name is system and my dataframe is df. How can I do that?
I tried many date-formatting functions, but none of them gave me the result I wanted.
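Convert using to_datetime and provide the format, then convert to the format you want.
Leave out the .dt.strftime(...) step if the column should keep a datetime dtype, because strftime returns strings:
pd.to_datetime(df['dte'], format='%Y%m%d').dt.strftime('%Y/%m/%d')
0 2016/01/16
Name: dte, dtype: object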
Using str.replace we can try:
df["date"] = df["system"].astype(str).str.replace(r'(\d{4})(\d{2})(\d{2})', r'\1/\2/\3', regex=True)
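Since the question asks for a date dtype rather than reformatted text, here is a small sketch under the assumption that `system` holds eight-digit integers. The sample values and the new column names `system_date` and `system_text` are made up for illustration.

```python
import pandas as pd

# Hypothetical sample: dates stored as integers in YYYYMMDD form
df = pd.DataFrame({"system": [20160116, 20160117]})

# Casting to str first makes the digits explicit before parsing
df["system_date"] = pd.to_datetime(df["system"].astype(str), format="%Y%m%d")

# Render with slashes only where text is needed (display, export)
df["system_text"] = df["system_date"].dt.strftime("%Y/%m/%d")

print(df.dtypes)
print(df)
```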

Converting a date format from a csv. file in python from YYYY-MM-DD HH:MM:SS+00:00 to YYYY-MM-DD

I have a dataframe in which one column, 'dates' (dtype: object), holds values in the format YYYY-MM-DD HH:MM:SS+00:00 (there is a space between the day and the hour). I want to simplify this to just YYYY-MM-DD. Is there a way to cut off the HH:MM:SS+00:00 part with a few lines of code? I tried the following, but it didn't work:
pd.to_datetime(combined_csv['dates'], format='%Y-%m-%dT')
Any suggestions?
I hope this is useful for you. Splitting on the space puts the date part and the time part into separate columns:
import pandas as pd
df = pd.read_csv("combined.csv")
df[["Date", "Time"]] = df["dates"].str.split(" ", expand=True)
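As an alternative sketch, assuming every value carries a UTC offset as in the question, the column can be parsed as timezone-aware datetimes and then reduced to the day. The sample values and the column names `day_text`, `day`, and `day_slice` are hypothetical.

```python
import pandas as pd

# Hypothetical sample in the YYYY-MM-DD HH:MM:SS+00:00 layout
df = pd.DataFrame({
    "dates": ["2021-03-01 10:15:00+00:00", "2021-03-02 23:59:59+00:00"],
})

# utc=True yields a single timezone-aware datetime dtype
parsed = pd.to_datetime(df["dates"], utc=True)

# Text in YYYY-MM-DD form
df["day_text"] = parsed.dt.strftime("%Y-%m-%d")

# Still a datetime dtype: drop the timezone, then truncate to midnight
df["day"] = parsed.dt.tz_localize(None).dt.normalize()

# Pure string approach: keep only the first ten characters
df["day_slice"] = df["dates"].str[:10]

print(df)
```

The string slice is the cheapest option, but it does no validation; parsing first catches malformed rows.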

Python - Pandas - Convert YYYYMM to datetime

Beginner python (and therefore pandas) user. I am trying to import some data into a pandas dataframe. One of the columns is the date, but in the format "YYYYMM". I have attempted to do what most forum responses suggest:
df_cons['YYYYMM'] = pd.to_datetime(df_cons['YYYYMM'], format='%Y%m')
This doesn't work though (ValueError: unconverted data remains: 3). The column actually includes an additional value for each year, with MM=13. The source used this row as an average of the past year. I am guessing to_datetime is having an issue with that.
Could anyone offer a quick solution, either to strip out all of the annual averages (those with the last two digits "13"), or to have to_datetime ignore them?
Pass errors='coerce' and then drop the rows that became NaT:
df_cons['YYYYMM'] = pd.to_datetime(df_cons['YYYYMM'], format='%Y%m', errors='coerce')
df_cons = df_cons.dropna(subset=['YYYYMM'])
The invalid month values get converted to NaT:
In[36]:
pd.to_datetime('201613', format='%Y%m', errors='coerce')
Out[36]: NaT
Alternatively you could filter them out before the conversion
df_cons['YYYYMM'] = pd.to_datetime(df_cons.loc[df_cons['YYYYMM'].str[-2:] != '13','YYYYMM'], format='%Y%m', errors='coerce')
However, the assignment aligns on the index, so the filtered-out rows are not removed; they end up as NaT anyway. Just passing errors='coerce' is therefore the simpler solution.
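A minimal sketch of the coerce-then-drop approach, assuming made-up sample data in which month "13" marks the annual-average rows. It also lists the rejected values first, so you can check that only the expected rows are being discarded; the `value` column and the name `rejected` are illustrative.

```python
import pandas as pd

# Hypothetical sample: "201613" stands for an annual-average row
df_cons = pd.DataFrame({
    "YYYYMM": ["201611", "201612", "201613", "201701"],
    "value": [1.0, 2.0, 1.5, 3.0],
})

# Invalid months become NaT instead of raising ValueError
parsed = pd.to_datetime(df_cons["YYYYMM"], format="%Y%m", errors="coerce")

# Values that were present but could not be parsed
rejected = df_cons.loc[parsed.isna() & df_cons["YYYYMM"].notna(), "YYYYMM"].tolist()
print(rejected)

# Drop the rows from the whole frame, not only from the converted Series
df_cons = (
    df_cons.assign(YYYYMM=parsed)
    .dropna(subset=["YYYYMM"])
    .reset_index(drop=True)
)
print(df_cons)
```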
Clean up the dataframe first, then convert with an explicit format:
df_cons = df_cons[~df_cons['YYYYMM'].str.endswith('13')].copy()
df_cons['YYYYMM'] = pd.to_datetime(df_cons['YYYYMM'], format='%Y%m')
May I suggest turning the column into a period index, if the YYYYMM column is unique in your dataset?
Once the column holds datetimes, make YYYYMM the index, then convert it to a monthly period:
df_cons = df_cons.reset_index().set_index('YYYYMM').to_period('M')
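The period-index idea can be sketched end to end as follows, assuming the month-13 rows are filtered out and the column is parsed before `to_period` is called (`to_period` needs a DatetimeIndex). The sample data and the name `monthly` are hypothetical.

```python
import pandas as pd

# Hypothetical sample with one annual-average row (month "13")
df_cons = pd.DataFrame({
    "YYYYMM": ["201611", "201612", "201613"],
    "value": [1.0, 2.0, 1.5],
})

# Remove the annual averages, then parse the remaining months
df_cons = df_cons[~df_cons["YYYYMM"].str.endswith("13")].copy()
df_cons["YYYYMM"] = pd.to_datetime(df_cons["YYYYMM"], format="%Y%m")

# A datetime index can be converted to monthly periods
monthly = df_cons.set_index("YYYYMM").to_period("M")

print(monthly.index)
print(monthly)
```

A period index represents each row as a whole month rather than as the first instant of that month, which tends to fit monthly data better.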

Getting columns with datetime format such as (2017-02-12 10:23:55 AM)[YYYY-MM-dd hh:mm:ss AM/PM] using pandas

I recently asked a question about identifying all the columns which are datetime. Here it is: Get all columns with datetime type using pandas?
The answer was correct for a proper datetime dtype. However, I now realize my data isn't a proper datetime; it is a string formatted like "2017-02-12 10:23:55 AM", and I was advised to create a new question.
I have a huge dataframe with an unknown number of date time columns, where I do not know their names nor their position. How do I identify the column names of the date time columns which have the date of format such as YYYY-MM-dd hh:mm:ss AM/PM?
One way to do this would be to test for successful conversion:
def is_datetime(datetime_string):
    try:
        pd.to_datetime(datetime_string)
        return True
    except ValueError:
        return False
With this:
dt_columns = [c for c in df.columns if is_datetime(df[c].iloc[0])]
Note: This tests for any string that can be converted to a datetime.
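A stricter variant, sketched here under the assumption that the target layout is exactly `%Y-%m-%d %I:%M:%S %p`, checks every non-null value in a column against that one format instead of testing only the first row. The helper name `datetime_columns` and the sample frame are hypothetical.

```python
import pandas as pd

# Assumed layout, e.g. "2017-02-12 10:23:55 AM"
FORMAT = "%Y-%m-%d %I:%M:%S %p"


def datetime_columns(df, fmt=FORMAT):
    """Return names of text columns whose non-null values all match fmt."""
    matches = []
    for name in df.columns:
        values = df[name].dropna()
        # Skip empty columns and columns that do not hold only strings
        if values.empty or not values.map(lambda v: isinstance(v, str)).all():
            continue
        # Values that do not match the format become NaT
        parsed = pd.to_datetime(values, format=fmt, errors="coerce")
        if parsed.notna().all():
            matches.append(name)
    return matches


df = pd.DataFrame({
    "created": ["2017-02-12 10:23:55 AM", "2017-02-13 01:05:00 PM"],
    "name": ["alice", "bob"],
    "amount": [1.5, 2.0],
    "updated": ["2017-03-01 09:00:00 PM", None],
})
print(datetime_columns(df))
```

On a huge dataframe, parsing every column in full may be slow; running the same check on a sample such as `df.head(1000)` is one possible compromise.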

Converting objects from CSV into datetime

I've got an imported CSV file which has multiple columns with dates in the format "5 Jan 2001 10:20". (Note that the day is not zero-padded.)
If I check df.dtypes, it shows the columns as object rather than string or datetime. I need to be able to subtract two column values to work out the difference, so I'm trying to get them into a state where I can do that.
At the moment, if I try the test subtraction at the end, I get the error unsupported operand type(s) for -: 'str' and 'str'.
I've tried multiple methods but have run into a problem every way I've tried.
Any help would be appreciated. If I need to give any more information then I will.
As suggested by @MaxU, you can use the pd.to_datetime() function to convert the values of a given column to a proper datetime dtype, like this:
df['datetime'] = pd.to_datetime(df.datetime)
You would have to do this for each column that needs to be transformed to the right dtype.
Alternatively, you can use parse_dates argument of pd.read_csv() method, like this:
df = pd.read_csv(path, parse_dates=[1,2,3])
where columns 1,2,3 are expected to contain data that can be interpreted as dates.
I hope this helps.
Convert a column to datetime like this:
df["Date"] = pd.to_datetime(df["Date"])
If the column has empty or unparseable values, pass errors='coerce' so that they become NaT instead of raising an error:
df["Date"] = pd.to_datetime(df["Date"], errors='coerce')
After that you should be able to subtract two dates.
Example:
import pandas
df = pandas.DataFrame({
    'to': [pandas.Timestamp('2014-01-24 13:03:12.050000'), pandas.Timestamp('2014-01-27 11:57:18.240000'), pandas.Timestamp('2014-01-23 10:07:47.660000')],
    'fr': [pandas.Timestamp('2014-01-26 23:41:21.870000'), pandas.Timestamp('2014-01-27 15:38:22.540000'), pandas.Timestamp('2014-01-23 18:50:41.420000')],
})
# Difference in hours as floats; recent pandas versions reject .astype('timedelta64[h]')
(df['fr'] - df['to']).dt.total_seconds() / 3600
consult this answer for more details:
Calculate Pandas DataFrame Time Difference Between Two Columns in Hours and Minutes
If you want to load the column directly as datetime while reading the CSV, see this example:
Pandas read csv dateint columns to datetime
I found that the problem was caused by missing values within the column. Using coerce=True, as in df["Date"] = pd.to_datetime(df["Date"], coerce=True), solved the problem at the time; current pandas versions spell this option errors='coerce'.
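For the "5 Jan 2001 10:20" layout described in the question, a minimal sketch with an explicit format might look like this. It assumes English month abbreviations, and the column names `start` and `end` plus the sample rows are made up; the missing value shows how errors='coerce' leaves a NaT that simply propagates through the subtraction.

```python
import pandas as pd

# Hypothetical sample with a non-zero-padded day and one missing value
df = pd.DataFrame({
    "start": ["5 Jan 2001 10:20", "12 Feb 2001 08:00", None],
    "end": ["5 Jan 2001 12:50", "13 Feb 2001 08:00", "1 Mar 2001 09:00"],
})

# %d accepts "5" as well as "05"; %b is the abbreviated month name
fmt = "%d %b %Y %H:%M"
for col in ["start", "end"]:
    df[col] = pd.to_datetime(df[col], format=fmt, errors="coerce")

# Subtracting two datetime columns gives a timedelta column
df["elapsed"] = df["end"] - df["start"]

# Express the difference in hours as a float
df["elapsed_hours"] = df["elapsed"].dt.total_seconds() / 3600

print(df)
```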
