Formatting datetime pandas - python

I have some rows in my dataset with the following release date format:
1995-10-30
It is an object/string. However, I want to convert it to datetime, so I wrote the following to achieve that:
movies_df["release_date"] = pd.to_datetime(movies_df.release_date)
It gets converted to datetime as it should, but I would like to have the following format
mm-dd-year
I have tried yearfirst=False and dayfirst=False, but nothing seems to happen and I can't figure out why it isn't working.
I have also tried to specify the format in the to_datetime method as following:
movies_df["release_date"] = pd.to_datetime(movies_df.release_date, format="%Y/%m/%d", dayfirst=False, yearfirst=False)
Any help is appreciated.

You can convert datetimes to strings in the format mm-dd-YYYY:
movies_df["release_date"] = pd.to_datetime(movies_df.release_date).dt.strftime('%m-%d-%Y')
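But if you want the column to stay a datetime and still show as mm-dd-YYYY, that is not possible: a datetime column has no display format of its own, so a custom format is only available as strings.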
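To illustrate the difference between the two column types, here is a minimal sketch. The sample rows and the extra column name release_date_str are assumptions made for the example, not part of the original question. It keeps the parsed datetime column for sorting and arithmetic, and builds the mm-dd-YYYY text in a separate column for display.

```python
import pandas as pd

# Sample data (assumed for illustration).
movies_df = pd.DataFrame({"release_date": ["1995-10-30", "1995-12-15"]})

# Parse the strings; the explicit format describes the INPUT, not the display.
movies_df["release_date"] = pd.to_datetime(movies_df["release_date"], format="%Y-%m-%d")

# Build the formatted text in its own (hypothetical) column, so the
# datetime column remains available for sorting and date arithmetic.
movies_df["release_date_str"] = movies_df["release_date"].dt.strftime("%m-%d-%Y")

print(movies_df.dtypes)
print(movies_df)
```

Keeping both columns is a design choice: the string column is only suitable for display or export, since comparing mm-dd-YYYY strings does not order them chronologically.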

Related

Pyspark - Convert to Timestamp

Spark version : 2.1
I'm trying to convert a string datetime column to utc timestamp with the format yyyy-mm-ddThh:mm:ss
I first start by changing the format of the string column to yyyy-mm-ddThh:mm:ss
and then convert it to timestamp type. Later I would convert the timestamp to UTC using to_utc_timestamp function.
df.select(
f.to_timestamp(
f.date_format(f.col("time"), "yyyy-MM-dd'T'HH:mm:ss"), "yyyy-MM-dd'T'HH:mm:ss"
)
).show(5, False)
The date_format works fine by giving me the correct format. But, when I do to_timestamp on top of that result, the format changes to yyyy-MM-dd HH:mm:ss, when it should instead be yyyy-MM-dd'T'HH:mm:ss. Why does this happen?
Could someone tell me how I could retain the format given by date_format? What should I do?
The function to_timestamp converts a string to a timestamp. A timestamp has no format of its own and is displayed as yyyy-MM-dd HH:mm:ss.
The second argument is used to define the format of the DateTime in the string you are trying to parse.
You can see a couple of examples in the official documentation.
The code should look like this. Pay attention to the single 'd' in the pattern, which is tricky in many cases.
data= data.withColumn('date', to_timestamp(col('date'), 'yyyy/MM/d'))

changing time format in pandas

I have a dataframe with a column datetime that looks like this 2020-05-03T14:51:31.23625 (I assume %Y-%m-%dT%H:%M:%S)
I would like to change it to dd/mm/yyyy hh:mm:ss format.
I found this post and tried something similar (code below), but it works only for the first row of the dataframe. Could someone help me find the mistake? Thanks!
df['time']=pd.DataFrame({'time':pd.to_datetime(df['time'])})
df['new'] = df['time'].dt.strftime("%d/%m/%Y %H:%M:%S")
Try via split() and to_datetime() method:
df['datetime']=pd.to_datetime(df['datetime'].str.split('.').str[0],errors='coerce')
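As a rough end-to-end sketch of that suggestion, the snippet below drops the fractional seconds, parses the rest, and then produces the dd/mm/yyyy hh:mm:ss text the question asks for. The sample rows and the output column name new are assumptions for illustration.

```python
import pandas as pd

# Sample data (assumed), shaped like the value shown in the question.
df = pd.DataFrame({"datetime": ["2020-05-03T14:51:31.23625", "2020-05-04T09:05:07.5"]})

# Cut everything after the first '.', then parse; unparseable rows become NaT.
parsed = pd.to_datetime(df["datetime"].str.split(".").str[0], errors="coerce")

# Format every row as dd/mm/yyyy hh:mm:ss (the result is a string column).
df["new"] = parsed.dt.strftime("%d/%m/%Y %H:%M:%S")

print(df)
```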

How to fix weird date values to datetime type in pandas

I'm new to Python and dataframes.
I have a date value that is not formatted as date. Since this value has a 'weird' format, pandas' function to_datetime() doesn't work properly. The values are formatted like:
['20190630', '20190103']
This is the 'yyyymmdd' format.
I have tried to slice the values into separate columns for the year, month and day, but the slicing wasn't working. This is the code I have now, but it isn't doing anything.
df.Date = pd.to_datetime(df.date)
I would like to have the dd-mm-yyyy format and datetime type. What can I do?
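One possible approach, sketched under the assumption that every value is an eight-digit yyyymmdd string, is to pass an explicit format to to_datetime instead of slicing. The column name date_str is hypothetical. Note that the datetime type and the dd-mm-yyyy text end up in two separate columns, because a datetime column does not store a display format.

```python
import pandas as pd

# Sample values taken from the question.
df = pd.DataFrame({"date": ["20190630", "20190103"]})

# Tell pandas exactly how the digits are laid out: year, month, day.
df["date"] = pd.to_datetime(df["date"], format="%Y%m%d")

# dd-mm-yyyy is only available as text, so it goes into its own column.
df["date_str"] = df["date"].dt.strftime("%d-%m-%Y")

print(df.dtypes)
print(df)
```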

Convert UTC timestamp to local timezone issue in pandas

I'm trying to convert a Unix UTC timestamp to a local date format in Pandas. I've been looking through a few solutions but I can't quite get my head around how to do this properly.
I have a dataframe with multiple UTC timestamp columns which all need to be converted to a local timezone. Let's say EU/Berlin.
I first convert all the timestamp columns into valid datetime columns with the following adjustments:
df['date'] = pd.to_datetime(df['date'], unit='s')
This works and gives me, for example, 2019-01-18 15:58:25. To adjust the timezone of this datetime, I have tried both:
df['date'].tz_localize('UTC').tz_convert('Europe/Berlin')
and
df['date'].tz_convert('Europe/Berlin')
In both cases the error is: TypeError: index is not a valid DatetimeIndex or PeriodIndex and I don't understand why.
The problem must be that the DateTime column is not on the index. But even when I use df.set_index('date') and after I try the above options it doesn't work and I get the same error.
Also, if it would work it seems that this method only allows the indexed DateTime to be timezone adjusted. How would I then adjust for the other columns that need adjustment?
Looking to find some information on how to best approach these issues once and for all! Thanks
When the datetimes are in a regular column rather than in the index, call these methods through the .dt accessor:
df['date'] = df['date'].dt.tz_localize('UTC').dt.tz_convert('Europe/Berlin')
This should be used if the column is not the index column.
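For several timestamp columns, the same chain can be applied in a loop. This is only a sketch: the second column name updated and the sample epoch values are assumptions for illustration.

```python
import pandas as pd

# Sample Unix timestamps in seconds (assumed); column names are hypothetical.
df = pd.DataFrame({"date": [1547827105], "updated": [1547830705]})

for column in ["date", "updated"]:
    df[column] = (
        pd.to_datetime(df[column], unit="s")  # naive datetimes, implicitly UTC
        .dt.tz_localize("UTC")                # mark them as UTC
        .dt.tz_convert("Europe/Berlin")       # shift to local time
    )

print(df)
```

The tz_localize step is needed because to_datetime with unit='s' returns timezone-naive values, and tz_convert only works on timezone-aware ones.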

How to deal with multiple date string formats in a python series

I have a csv file which I am trying to complete operations on. I have created a dataframe with one column titled "start_date" which has the date of warranty start. The problem I have encountered is that the format of the date is not consistent. I would like to know the number of days passed from today's calendar date and the date warranty started for this product.
Two examples of the entries in this start_date series:
9/11/15
9/11/15 0:00
How can I identify each of these formats and treat them accordingly?
Unfortunately, you just have to try each format the date might be in. If you give it a format string, strptime will attempt to parse the value for you, as discussed here.
The code will end up looking like:
from datetime import datetime

POSSIBLE_DATE_FORMATS = ['%m/%d/%Y', '%Y/%m/%d']  # extend with every format the date might be in
parsed_date = None
for date_format in POSSIBLE_DATE_FORMATS:
    try:
        parsed_date = datetime.strptime(raw_string_date, date_format)  # try to parse the date
        break  # format matched, so don't test any other formats
    except ValueError:
        pass  # wrong format, keep trying the other formats
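Applied to the two sample entries from the question, the try-each-format idea might look like the sketch below. The format list assumes month-first dates with two-digit years, which the question does not confirm, and the helper names parse_start_date and days_since are hypothetical.

```python
from datetime import date, datetime

# Assumption: entries are month/day/two-digit-year, optionally with a time.
POSSIBLE_DATE_FORMATS = ["%m/%d/%y %H:%M", "%m/%d/%y"]


def parse_start_date(raw):
    """Return a datetime for the first matching format, or None if none match."""
    for date_format in POSSIBLE_DATE_FORMATS:
        try:
            return datetime.strptime(raw.strip(), date_format)
        except ValueError:
            continue  # wrong format, try the next one
    return None


def days_since(raw, today=None):
    """Days between the parsed start date and today (or a supplied date)."""
    parsed = parse_start_date(raw)
    if parsed is None:
        return None
    today = today or date.today()
    return (today - parsed.date()).days


print(parse_start_date("9/11/15"))
print(parse_start_date("9/11/15 0:00"))
print(days_since("9/11/15"))
```

Returning None for unparseable values, rather than raising, is a choice made here so that bad rows can be inspected afterwards.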
You have a few options really. I'm not entirely sure what happens when you try to directly load the file with a 'pd.read_csv' but as suggested above you can define a set of format strings that you can try to use to parse the data.
One other option would be to read the date column in as a string and then parse it yourself. If you want the column to look like 'YYYY-MM-DD', cut the string down to just that part and then save it back, something like:
import pandas as prandas
import datetime
df = prandas.read_csv('supa_kewl_data.dis_fmt_rox', dtype={'start_date': str})
print(df.head())
# we are interested in start_date
date_strs = df['start_date'].values
#YYYY-MM-DD
#012345678910
filter_date_strs = [x[0:10] for x in date_strs]
df['filter_date_strs'] = filter_date_strs
# sometimes i've gotten complained at by pandas for doing this
# try doing df.loc[:,'filter_date_strs'] = filter_date_strs
# if you get some warning thing
# if you want you can convert back to date time using a
dobjs = [datetime.datetime.strptime(x,'%Y-%m-%d') for x in filter_date_strs]
df['dobj_start_date'] = dobjs
df.to_csv('even_better_data.csv', index=False)
Hopefully this helps! Pandas documentation is sketchy sometimes, looking at the doc in 0.16.2 for read_csv() is intimidating... http://pandas.pydata.org/pandas-docs/stable/generated/pandas.read_csv.html
The library itself is stellar!
Not sure if this will help, but this is what I do when I'm working with Pandas on excel files and want the date format to be 'mm/dd/yyyy' or some other.
writer = pd.ExcelWriter(filename, engine='xlsxwriter', datetime_format='mm/dd/yyyy')
df.to_excel(writer, sheetname)
Maybe it'll work with:
df.to_csv
