Can't apply proper format to a pandas dataframe - python

I have some date-parsing code that goes like this:
data['timestamp'] = pd.to_datetime(data['timestamp'], unit='ms')
The results, however, vary in format: some look like 2017-08-17 04:00:00.000 and others like 2020-05-05 20:00:00. Is there a way to remove the milliseconds? I've tried the format kwarg, but pandas doesn't allow both unit and format in the same call. I've also tried the replace function, but only got errors. So is there a way to remove them?

Here you go:
data['timestamp'] = data['timestamp'].astype('datetime64[s]')
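If the goal is only to drop the sub-second part, another option is to floor the parsed column to whole seconds with Series.dt.floor. pandas typically prints a fractional part for every row of a column as soon as any row has one, which would explain the mixed look; flooring removes it everywhere while keeping the column as datetime64. This is a minimal sketch with made-up epoch-millisecond sample values; unlike the astype('datetime64[s]') cast, it does not depend on second-resolution dtype support, which may behave differently on older pandas versions.

```python
import pandas as pd

# Made-up sample data: epoch milliseconds, the second value has a 123 ms part
data = pd.DataFrame({'timestamp': [1502942400000, 1588708800123]})

data['timestamp'] = pd.to_datetime(data['timestamp'], unit='ms')

# Round every value down to the whole second; the dtype stays datetime64
data['timestamp'] = data['timestamp'].dt.floor('s')

print(data)
```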

Related

pandas.to_datetime() does not filter when used with loc[] and comparison operator

I downloaded a .csv file to do some practice; a column named "year_month" is a string with the format "YYYY-MM".
By doing:
df = pd.read_csv('C:/..../migration_flows.csv',parse_dates=["year_month"])
"year_month" is Dtype=object. So far so good.
By doing:
df["year_month"] = pd.to_datetime(df["year_month"],format='%Y-%m-%d')
it is converted to datetime64[ns]. So far so good.
I try to filter certain dates by doing:
filtered_df = df.loc[(df["year_month"]>= pd.Timestamp(2018-1-1))]
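The program returns the whole column as if nothing happened; for instance, the output still starts from the date "2001-01-01".
Any thoughts on how to filter properly? Many thanks
The problem is that 2018-1-1 without quotes is integer arithmetic, so pd.Timestamp(2018-1-1) is really pd.Timestamp(2016), which pandas reads as 2016 nanoseconds after the Unix epoch (1970-01-01). Every date in the column is later than that, so nothing gets filtered out. Pass the date as a string instead: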
df.loc[(df["year_month"]>= pd.to_datetime('2018-01-01'))]
or
df.loc[(df["year_month"]>= pd.Timestamp('2018-01-01'))]
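A small self-contained sketch of the difference, using a few made-up rows in place of migration_flows.csv. It prints the accidental cutoff next to the intended one and shows that only rows from 2018 onward survive the corrected filter.

```python
import pandas as pd

# Made-up rows standing in for migration_flows.csv
df = pd.DataFrame({"year_month": ["2001-01-01", "2017-12-01", "2018-01-01", "2019-06-01"]})
df["year_month"] = pd.to_datetime(df["year_month"], format="%Y-%m-%d")

# 2018-1-1 evaluates to the integer 2016, which Timestamp reads as nanoseconds since the epoch
bad_cutoff = pd.Timestamp(2018-1-1)
print(bad_cutoff)

# A date string (or pd.Timestamp(2018, 1, 1)) gives the intended cutoff
cutoff = pd.Timestamp("2018-01-01")
filtered_df = df.loc[df["year_month"] >= cutoff]
print(filtered_df)
```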

Get date format code from a string/datetime using python

Is there a way to find out in Python the date format code of a string?
My input would be, e.g.:
2020-09-11T17:42:33.040Z
What I am looking for is in this example to get this:
'%Y-%m-%dT%H:%M:%S.%fZ'
The point is that I have different time formats for different files, so I don't know in advance what my datetime format code will look like.
To process my data I need Unix time, but to calculate it I need a solution to this problem:
import datetime
data["time_unix"] = data.time.apply(lambda row: (datetime.datetime.strptime(row, '%Y-%m-%dT%H:%M:%S.%fZ').timestamp()*100))
Thank you for the support!
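The standard library has no general "guess the format" function, so one common workaround (a different technique from discovering the code automatically) is to try a list of formats you expect, in order, and keep the first one strptime accepts. The sketch below assumes such a hypothetical CANDIDATE_FORMATS list and helper names detect_format and to_unix; it also treats the trailing Z as UTC explicitly, since calling timestamp() on a naive datetime would interpret it as local time. Order matters: a string like 01.02.2020 can match both day-first and month-first formats, and the first match wins.

```python
from datetime import datetime, timezone

# Hypothetical list of formats expected across your files; extend as needed
CANDIDATE_FORMATS = [
    "%Y-%m-%dT%H:%M:%S.%fZ",
    "%Y-%m-%dT%H:%M:%SZ",
    "%Y-%m-%d %H:%M:%S",
    "%d.%m.%Y %H:%M:%S",
]


def detect_format(text):
    """Return the first candidate format that parses text, or None."""
    for fmt in CANDIDATE_FORMATS:
        try:
            datetime.strptime(text, fmt)
        except ValueError:
            continue
        return fmt
    return None


def to_unix(text, fmt):
    """Parse text with fmt, treat it as UTC and return seconds since the epoch."""
    parsed = datetime.strptime(text, fmt).replace(tzinfo=timezone.utc)
    return parsed.timestamp()


sample = "2020-09-11T17:42:33.040Z"
fmt = detect_format(sample)
print(fmt)
print(to_unix(sample, fmt))
```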

How to specify the format of timestamp in python

I have a dataframe with dates in string format. I convert those dates to timestamp, so that I could use this date column in the later part of the code. Everything is fine with calculations/comparisons etc, but I would like the timestamp to appear in %d.%m.%Y format, as opposed to default %Y-%m-%d. Let me illustrate it -
dt=pd.DataFrame({'date':['09.12.1998','07.04.2014']},index=[1,2])
dt
Out[4]:
date
1 09.12.1998
2 07.04.2014
dt['date_1']=pd.to_datetime(dt['date'],format='%d.%m.%Y')
dt
Out[7]:
date date_1
1 09.12.1998 1998-12-09
2 07.04.2014 2014-04-07
I would like dt['date_1'] to be displayed in the same format as dt['date']. I don't want to use the .strftime() function because it converts the data type from timestamp to string.
In a nutshell: how can I get Python to display the timestamp in a format of my choice (months could be like APR, MAY etc.) rather than the default format (like 1998-12-09), while keeping the data type a timestamp rather than a string?
It seems Pandas didn't implement this option yet:
https://github.com/pandas-dev/pandas/issues/11501
Having a look at https://pandas.pydata.org/pandas-docs/stable/options.html, it looks like you can set display options to achieve some of this, although not all:
display.date_dayfirst When True, prints and parses dates with the day first, eg 20/01/2005
display.date_yearfirst When True, prints and parses dates with the year first, eg 2005/01/20
So you can have day-first dates, but month names aren't supported.
On a more fundamental level, whenever you're displaying something it is a string, right? I'm not sure why you wouldn't be able to convert it when you're displaying it without having to change the original dataframe.
Your code would be:
pd.set_option("display.date_dayfirst", True)
Except that this doesn't actually work:
https://github.com/pandas-dev/pandas/issues/11501
The options have been implemented for parsing, but not for displaying.
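Following the idea of converting only at display time, one way to do that without touching the dataframe is DataFrame.to_string with its formatters argument, which applies a function to each value of the named column when rendering. A minimal sketch using the question's sample data; the '%d-%b-%Y' pattern and the .upper() call (to get DEC, APR) are illustrative choices, and %b month names depend on the current locale.

```python
import pandas as pd

dt = pd.DataFrame({'date': ['09.12.1998', '07.04.2014']}, index=[1, 2])
dt['date_1'] = pd.to_datetime(dt['date'], format='%d.%m.%Y')

# Only the rendered text is formatted; dt['date_1'] itself stays datetime64
rendered = dt.to_string(
    formatters={'date_1': lambda ts: ts.strftime('%d-%b-%Y').upper()}
)
print(rendered)
```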
Hello Stael/Cezar/Droravr, thank you all for your input. I value your time and appreciate your help a lot. Thanks for sharing the link https://github.com/pandas-dev/pandas/issues/11501 as well. I went through it and understood that this problem ultimately comes down to a display problem, as jreback also explained. Displaying dates in a custom format has been marked as an enhancement, so it will probably be added in a future version.
All I wanted was to have the dates exported as dd-mm-yyyy, and simply formatting the dates while exporting solves this.
So I sorted this issue by exporting the file with:
dt.to_csv(filename, date_format='%d-%m-%Y', index=False)
date date_1
09.12.1998 09-12-1998
07.04.2014 07-04-2014
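The export approach can be checked without writing a file: calling to_csv with no path returns the CSV text. This sketch reuses the question's sample data and confirms that date_format only affects the exported text, while the in-memory column keeps its datetime dtype.

```python
import pandas as pd

dt = pd.DataFrame({'date': ['09.12.1998', '07.04.2014']}, index=[1, 2])
dt['date_1'] = pd.to_datetime(dt['date'], format='%d.%m.%Y')

# Without a path argument, to_csv returns the CSV content as a string
csv_text = dt.to_csv(date_format='%d-%m-%Y', index=False)
print(csv_text)

# The dataframe column is unchanged and still datetime64
print(dt['date_1'].dtype)
```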
Thus, this issue stands SOLVED.
Once again, thank you all for your kind help and the precious hours you spent with this issue. Deeply appreciated.

python pandas to_datetime change format

My input is text based, e.g. Column "ClosedDate" = "2016-10-31 16:54:18"
With:
df.ClosedDate = pd.to_datetime(df.ClosedDate).dt.date
I format that as "2016-10-31", i.e. keeping the date part only and dropping the time, which works fine so far, but what I need is "31.10.2016".
What would be the best and most elegant way to accomplish that?
I tried adding format="%d%m%Y", but that doesn't work.
Thanks
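The format argument of pd.to_datetime describes how the input is parsed, not how the result is displayed, which is why adding it has no visible effect here (and "%d%m%Y" would not match "2016-10-31 16:54:18" anyway). A minimal sketch of the usual approach: parse first, then produce the desired text with Series.dt.strftime. Note that the resulting column holds strings, not dates.

```python
import pandas as pd

df = pd.DataFrame({"ClosedDate": ["2016-10-31 16:54:18"]})

# format= tells to_datetime how to read the input text
parsed = pd.to_datetime(df.ClosedDate, format="%Y-%m-%d %H:%M:%S")

# strftime renders each date as text in the wanted layout (column becomes strings)
df["ClosedDate"] = parsed.dt.strftime("%d.%m.%Y")
print(df)
```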

Pandas plotting: How to format datetimeindex?

I am doing a barplot out of a dataframe with a 15min datetimeindex over a couple of years.
Using this code:
df_Vol.resample('A').sum().plot.bar(
    title='Sums per year',
    style='ggplot',
    alpha=0.8
)
Unfortunately the ticks on the X-axis are now shown with the full timestamp like this: 2009-12-31 00:00:00.
I would prefer to keep the plotting code short, but I couldn't find an easy way to format the timestamp as just the year (2009...2016) for the plot.
Can someone help on this?
As it does not seem possible to format the date within pandas' df.plot(), I decided to create a new dataframe and plot from it.
The solution below worked for me:
df_Vol_new = df_Vol.resample('A').sum()
df_Vol_new.index = df_Vol_new.index.strftime('%Y')
ax2 = df_Vol_new.plot.bar(title='Sums per year', stacked=True, style='ggplot', alpha=0.8)
An alternative (and, to me, better) way is to add the following alongside the df_Vol_new.plot() call:
plt.legend(df_Vol_new.index.to_period('A'))
This way you preserve the datetime format of df_Vol_new.index while still getting better plots.
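A different route, not taken in either answer above, is to aggregate by calendar year with groupby on index.year instead of resample. The resulting integer index is rendered as plain year labels on the bar plot, so no tick formatting is needed. This is a sketch with made-up 15-minute data for 2009 to 2011 and a hypothetical column name vol; the Agg backend is set only so it runs without a display.

```python
import matplotlib
matplotlib.use("Agg")  # render off-screen; drop this in an interactive session
import numpy as np
import pandas as pd

# Made-up data: constant volume on a 15-minute index over three years
idx = pd.date_range("2009-01-01", "2011-12-31 23:45", freq="15min")
df_Vol = pd.DataFrame({"vol": np.ones(len(idx))}, index=idx)

# Group by calendar year; the index becomes the integers 2009, 2010, 2011
yearly = df_Vol.groupby(df_Vol.index.year).sum()

ax = yearly.plot.bar(title="Sums per year", alpha=0.8)
print([t.get_text() for t in ax.get_xticklabels()])
```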
