How to read_excel with a dayfirst condition? - python

I'm trying to use read_excel in pandas. I have a date column in the format DD/MM/YYYY. Pandas automatically reads this as month first, and as far as I've been able to tell there is no dayfirst option like there is with read_csv.
Is there a way to do read_excel while specifying date format?
xlxs_data = pd.DataFrame()
df = pd.read_excel('new.xlsx')
xlsx_data = xlxs_data.append(df, ignore_index=True, dayfirst=True)
TypeError: append() got an unexpected keyword argument 'dayfirst'

The dayfirst parameter does not work with read_excel in version 1.1.4 of pandas. The docs state "For non-standard datetime parsing, use pd.to_datetime after pd.read_excel."
So read in your data
df = pd.read_excel('new.xlsx', engine="openpyxl")
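Then convert the column and assign the result back, either with dayfirst:
df['col_name'] = pd.to_datetime(df['col_name'], dayfirst=True)
or with an explicit format:
df['col_name'] = pd.to_datetime(df['col_name'], format='%d/%m/%Y')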
Some info on format codes can be found here https://docs.python.org/3/library/datetime.html#strftime-and-strptime-behavior
Remember that pandas displays dates in ISO format (YYYY-MM-DD). To show a different format you have to convert the datetime values to strings, and doing so loses all the datetime functionality, so it is best to do that only at export time.
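A minimal sketch of both conversions, assuming the date column arrives as text. The frame is built by hand so the snippet runs without a workbook; `col_name` and the sample values are made up for illustration.

```python
import pandas as pd

# Stand-in for what pd.read_excel returns when the date column comes in as text.
df = pd.DataFrame({"col_name": ["01/02/2020", "25/12/2020"]})

# Option 1: let pandas infer the layout, but tell it the day comes first.
by_dayfirst = pd.to_datetime(df["col_name"], dayfirst=True)

# Option 2: spell out the exact layout.
by_format = pd.to_datetime(df["col_name"], format="%d/%m/%Y")

# to_datetime returns a new Series, so assign it back to keep the result.
df["col_name"] = by_format

# Turn the dates into text only at the very end, for display or export.
as_text = df["col_name"].dt.strftime("%d/%m/%Y")
```

Both options should give 1 February 2020 for the first row rather than 2 January.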

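You can try passing dayfirst=True to read_excel. The docs don't list it as a parameter, but where read_excel accepts extra keyword arguments (kwargs) it is passed through and resolves the problem. This depends on the pandas version: the answer above reports that it does not work in 1.1.4.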
df = pd.read_excel('new.xlsx', dayfirst=True)

Related

Keeping the datetime type when using to_csv [duplicate]

The default output format of to_csv() is:
12/14/2012 12:00:00 AM
I cannot figure out how to output only the date part with specific format:
20121214
or date and time in two separate columns in the csv file:
20121214, 084530
The documentation is too brief to give me any clue as to how to do these. Can anyone help?
Since version v0.13.0 (January 3, 2014) of Pandas you can use the date_format parameter of the to_csv method:
df.to_csv(filename, date_format='%Y%m%d')
You could use strftime to save these as separate columns:
df['date'] = df['datetime'].apply(lambda x: x.strftime('%Y%m%d'))
df['time'] = df['datetime'].apply(lambda x: x.strftime('%H%M%S'))
and then be specific about which columns to export to csv:
df[['date', 'time', ... ]].to_csv('df.csv')
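Here is a small sketch of both approaches that writes to an in-memory buffer instead of a file. The column name `datetime` follows the snippet above; the sample value and the buffer names are made up for illustration.

```python
import io

import pandas as pd

df = pd.DataFrame({"datetime": pd.to_datetime(["2012-12-14 08:45:30"])})

# Date only: date_format is applied to datetime values when the CSV is written.
date_only = io.StringIO()
df.to_csv(date_only, date_format="%Y%m%d", index=False)

# Date and time in separate columns: build two text columns, export only those.
df["date"] = df["datetime"].dt.strftime("%Y%m%d")
df["time"] = df["datetime"].dt.strftime("%H%M%S")
split = io.StringIO()
df[["date", "time"]].to_csv(split, index=False)
```

The original `datetime` column stays a real datetime in the frame; only the exported text changes.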
To export as a timestamp, do this:
df.to_csv(filename, date_format='%s')
The %s format is not documented in Python or pandas. It is handled by the platform's C strftime, so it works in this case but may not work on every platform.
I found %s in Ruby's date formats. Strftime doc for C here
Note that the millisecond timestamp format %Q does not work with pandas (you get a literal %Q in the field instead of the date). I carried out my tests with Python 3.6 and pandas 0.24.1.
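If relying on an undocumented format code is a concern, one alternative (not from the answer above) is to compute the epoch values in pandas before exporting. This is a sketch under the assumption that the timestamps are timezone-naive and should be treated as UTC; the column names are made up.

```python
import pandas as pd

df = pd.DataFrame({"datetime": pd.to_datetime(["2012-12-14 08:45:30"])})

# Whole seconds since 1970-01-01, computed by pandas rather than the C library.
epoch = pd.Timestamp("1970-01-01")
df["unix_seconds"] = (df["datetime"] - epoch) // pd.Timedelta("1s")

# Milliseconds since the epoch, the value %Q was expected to produce.
df["unix_millis"] = (df["datetime"] - epoch) // pd.Timedelta("1ms")
```

The integer columns can then be written with a plain `to_csv` call, with no `date_format` needed.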

How to fix weird date values to datetime type in pandas

I'm new to Python and dataframes.
I have a date value that is not formatted as date. Since this value has a 'weird' format, pandas' function to_datetime() doesn't work properly. The values are formatted like:
['20190630', '20190103']
This is the 'yyyymmdd' format.
I have tried slicing the values into separate year, month and day columns, but the slicing didn't work. This is the code I have now, but it isn't doing anything.
df.Date = pd.to_datetime(df.date)
I would like to have the dd-mm-yyyy format and datetime type. What can I do?
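One way this could be approached is to give to_datetime the exact layout. Note also that the snippet above assigns to `df.Date` while reading `df.date`; if no `Date` column exists, attribute assignment does not create one, which may be why nothing seems to happen. The sketch below assumes the column is called `date`; `date_text` is a made-up name.

```python
import pandas as pd

df = pd.DataFrame({"date": ["20190630", "20190103"]})

# Make sure the values are strings, then give the exact yyyymmdd layout.
# Bracket assignment is used so the column is really replaced.
df["date"] = pd.to_datetime(df["date"].astype(str), format="%Y%m%d")

# The column is now datetime and displays as YYYY-MM-DD.
# dd-mm-yyyy only exists as text, so keep it in a separate column.
df["date_text"] = df["date"].dt.strftime("%d-%m-%Y")
```

A column cannot be both datetime typed and displayed as dd-mm-yyyy, so the usual choice is to keep the datetime column for calculations and format it as text only for output.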

pandas read_csv not converting string to date

I've looked for help on this one and didn't find the answer (I'm sure I'm asking the wrong question).
I have a CSV file with dates in it, but when I read it in, the date conversion doesn't happen.
import pandas as pd
df = pd.read_csv('file', index_col='Sequence', parse_dates='Date')
CSV file
Sequence,Date,Unit,Name,Indexed,Arbitrated,Redo
1,2013-01-01,Aloha,first last,831,0,0
df.Date is a bunch of strings, not datetime values.
You need to pass the column to parse as a list, not a string:
df = pd.read_csv('file', index_col='Sequence', parse_dates=['Date'])
The docstring explanation for parse_dates says "list of ints or names", as in this way you can specify multiple columns to parse. But I have to agree that for one column it is a bit surprising.
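A quick way to check this without a file on disk is to read the sample rows from a string. The variable `csv_text` is made up; the data is the sample from the question.

```python
import io

import pandas as pd

csv_text = (
    "Sequence,Date,Unit,Name,Indexed,Arbitrated,Redo\n"
    "1,2013-01-01,Aloha,first last,831,0,0\n"
)

# parse_dates takes a list of column names, even for a single column.
df = pd.read_csv(io.StringIO(csv_text), index_col="Sequence", parse_dates=["Date"])

print(df.dtypes)
```

With the list form, the Date column should show a datetime64 dtype instead of object.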

How to deal with multiple date string formats in a python series

I have a CSV file that I'm trying to run some operations on. I have created a dataframe with one column titled "start_date", which holds the warranty start date. The problem is that the date format is not consistent. I would like to know the number of days between the warranty start date and today's calendar date.
Two examples of the entries in this start_date series:
9/11/15
9/11/15 0:00
How can I identify each of these formats and treat them accordingly?
Unfortunately you just have to try each format it might be. If you give an example format, strptime will attempt to parse it for you as discussed here.
The code will end up looking like:
from datetime import datetime

POSSIBLE_DATE_FORMATS = ['%m/%d/%Y', '%Y/%m/%d']  # etc.: all the formats the date might be in
for date_format in POSSIBLE_DATE_FORMATS:
    try:
        parsed_date = datetime.strptime(raw_string_date, date_format)  # try to get the date
        break  # if correct format, don't test any other formats
    except ValueError:
        pass  # if incorrect format, keep trying other formats
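Applied to the two sample entries from the question, the same idea could look like the sketch below. The format strings are an assumption (month first, two-digit year), since 9/11/15 is ambiguous, and `parse_date` and `days_since` are hypothetical helper names.

```python
from datetime import datetime

# Assumed formats for the two sample entries: month first, two-digit year.
POSSIBLE_DATE_FORMATS = ["%m/%d/%y", "%m/%d/%y %H:%M"]


def parse_date(raw, formats=POSSIBLE_DATE_FORMATS):
    """Return the first successful parse of raw, or raise ValueError."""
    for date_format in formats:
        try:
            return datetime.strptime(raw.strip(), date_format)
        except ValueError:
            continue  # wrong format, try the next one
    raise ValueError(f"no known format matches {raw!r}")


def days_since(raw, today):
    """Whole days between the parsed start date and the given day."""
    return (today - parse_date(raw)).days


print(days_since("9/11/15 0:00", datetime.now()))
```

Raising when nothing matches, rather than silently moving on, makes it easier to spot a format that is missing from the list.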
You have a few options really. I'm not entirely sure what happens when you try to directly load the file with a 'pd.read_csv' but as suggested above you can define a set of format strings that you can try to use to parse the data.
One other option would be to read the date column in as a string and then parse it yourself. If you want the column to look like 'YYYY-MM-DD', slice each string down to just that part and then save it back, something like:
import datetime
import pandas as pd

df = pd.read_csv('supa_kewl_data.dis_fmt_rox', dtype={'start_date': str})
print(df.head())
# we are interested in start_date
date_strs = df['start_date'].values
# YYYY-MM-DD
# 0123456789
filter_date_strs = [x[0:10] for x in date_strs]
df['filter_date_strs'] = filter_date_strs
# sometimes i've gotten complained at by pandas for doing this
# try doing df.loc[:,'filter_date_strs'] = filter_date_strs
# if you get some warning thing
# if you want, you can convert back to datetime using strptime
dobjs = [datetime.datetime.strptime(x, '%Y-%m-%d') for x in filter_date_strs]
df['dobj_start_date'] = dobjs
df.to_csv('even_better_data.csv', index=False)
Hopefully this helps! Pandas documentation is sketchy sometimes, looking at the doc in 0.16.2 for read_csv() is intimidating... http://pandas.pydata.org/pandas-docs/stable/generated/pandas.read_csv.html
The library itself is stellar!
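For the two sample entries in the question, the same read-as-string idea could be done with pandas string methods instead of a Python loop. This is only a sketch: it assumes the entries are month first with a two-digit year, that an optional time follows a space, and it uses a fixed reference date so the result is reproducible. The column names `start` and `days_elapsed` are made up.

```python
import pandas as pd

df = pd.DataFrame({"start_date": ["9/11/15", "9/11/15 0:00"]})

# Keep only the date part, i.e. everything before the first whitespace.
date_part = df["start_date"].str.split().str[0]

# Month first with a two-digit year is an assumption about this data.
df["start"] = pd.to_datetime(date_part, format="%m/%d/%y")

# Days elapsed up to a reference date.
# For today's date, use pd.Timestamp.today().normalize() instead.
reference = pd.Timestamp("2015-09-21")
df["days_elapsed"] = (reference - df["start"]).dt.days
```

Once both variants are reduced to the same date-only text, a single format string covers the whole column.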
Not sure if this will help, but this is what I do when I'm working with Pandas on excel files and want the date format to be 'mm/dd/yyyy' or some other.
writer = pd.ExcelWriter(filename, engine='xlsxwriter', datetime_format='mm/dd/yyyy')
df.to_excel(writer, sheetname)
Maybe it'll work with:
df.to_csv

to_datetime not working on a string in the format YYYY-MM-DD HH:MM in pandas

I have a date in the format 2014-01-31 05:47.
When it's read into pandas, the column's dtype is object.
When I try to convert it with pd.to_datetime, there is no error, but the dtype does not change to datetime.
Please suggest some way out.
T=pd.read_csv("TESTING.csv")
T['DATE']=pd.to_datetime(T['DATE'])
T.dtypes
>DATE object
T['DATE']
>2014-01-31 05:47
Basically, Pandas doesn't understand what the string "2014-01-31 05:47" is other than the fact that you gave it a string. If you read this string in from a CSV file then read the Pandas docs on the read_csv method that allows you to parse datetimes.
However, given something like this:
records = ["2014-01-31 05:47", "2014-01-31 14:12"]
df = pandas.DataFrame(records)
df.dtypes
>0 object
>dtype: object
This is because you haven't told pandas how to parse your string into a datetime (or Timestamp) type.
The pandas.to_datetime function is what you want, but you must be careful to pass it only the column that has the values you want to convert. Remember that to_datetime does not modify the DataFrame in place; it returns a new Series, so you need to assign the result back.
df[0] = pandas.to_datetime(df[0])
df.dtypes
>0 datetime64[ns]
>dtype: object
This is what you want. The cells are now the right format.
There are many ways to achieve the same thing: you could use the apply() method with a lambda, parse the dates correctly when reading from CSV or SQL, or work with Series.
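If the dtype still comes back as object, one possible cause is a stray value in the column that is not a date. This is an assumption, since the question does not show the file. The sketch below uses made-up data with one bad row, gives the format explicitly, and turns anything unparseable into NaT so it can be inspected.

```python
import io

import pandas as pd

# Made-up file contents: one valid timestamp and one bad value.
csv_text = "DATE\n2014-01-31 05:47\nnot a date\n"

T = pd.read_csv(io.StringIO(csv_text))

# errors="coerce" turns unparseable values into NaT instead of failing.
T["DATE"] = pd.to_datetime(T["DATE"], format="%Y-%m-%d %H:%M", errors="coerce")

# Rows that could not be parsed, for inspection.
bad_rows = T[T["DATE"].isna()]

print(T.dtypes)
```

Checking `bad_rows` shows which entries need cleaning before the column can be trusted.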
