Pandas 0.19 not parsing dates - python

A script I wrote with an earlier version of pandas no longer works: date parsing has stopped working. This is my read_html line:
gnu = pd.read_html('gnucash.html', flavor="html5lib", header=0, parse_dates=['Date'])
Pandas identifies the HTML table correctly but returns the dates as unicode strings. The HTML was generated by GnuCash, and the dates are in ISO format, Y-m-d (no times).
Whatever I do, I can't get pandas to recognise the dates. I tried passing a date_parser, but read_html doesn't accept that argument.
Apologies @IanS, I inadvertently deleted your comment. When I put together a minimal example, it worked, so I think the problem is in my HTML file: there must be a non-date value buried in the date column. Anyway, pandas did what it should with my sample file.
Thanks for taking an interest.
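A minimal sketch of how to track down such a buried non-date, assuming the table has already been read with read_html (which returns a list of DataFrames). The small frame below, including the "Opening balance" entry, is hypothetical and only stands in for that result: parsing with errors='coerce' turns anything that does not match the ISO format into NaT, which makes the offending rows easy to list.

```python
import pandas as pd

# Hypothetical stand-in for pd.read_html(...)[0]; the 'Date' column
# stays as text because one entry is not a date.
gnu = pd.DataFrame({
    "Date": ["2016-10-01", "2016-10-03", "Opening balance", "2016-10-07"],
    "Amount": [10.0, -2.5, 100.0, 4.0],
})

# Parse with the known ISO format; entries that do not match become NaT
parsed = pd.to_datetime(gnu["Date"], format="%Y-%m-%d", errors="coerce")

# Rows whose original text could not be parsed are the culprits
bad_rows = gnu[parsed.isna() & gnu["Date"].notna()]
print(bad_rows)

# Keep the parsed values once the offending rows are understood
gnu["Date"] = parsed
```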

Related

Reading in spreadsheet with Python Pandas that does not parse the dates

I am trying to read a spreadsheet with Python pandas without having it parse the dates.
I have tried many of the methods mentioned in a previous post, but none of them work for me.
This is the code (spreadsheet is just the name of the file):
columns = pd.ExcelFile(spreadsheet).parse(tab).columns
converters = {column: str for column in columns}
df1 = pd.read_excel(spreadsheet, sheet_name=tab, parse_dates=False, dtype=converters)
I realize there is a previous post on this, but none of the suggested fixes work for me. I've included them above, yet the text files I create from the spreadsheet still contain parsed dates.
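If the real goal is only that the text files contain the dates in a particular layout, one alternative (not what the question asks for, which is to avoid parsing altogether) is to leave the dates as datetimes and control how they are written with the date_format argument of DataFrame.to_csv. A sketch, assuming the sheet is read without dtype=str; the frame below is hypothetical and stands in for df1.

```python
import pandas as pd

# Hypothetical stand-in for pd.read_excel(spreadsheet, sheet_name=tab);
# Excel date cells typically arrive as datetime64 values.
df1 = pd.DataFrame({
    "Date": pd.to_datetime(["2019-05-02", "2019-05-03"]),
    "Item": ["apples", "pears"],
})

# date_format controls how datetime values are rendered as text
text = df1.to_csv(index=False, date_format="%d/%m/%Y")
print(text)
```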

Pandas won't recognize date while reading csv

I'm working on a script that reads a .csv file with pandas and fills in a specific form.
One column in the .csv file is a birthday column.
While reading the .csv, I parse it with parse_dates to get datetime objects so I can format them as needed:
df = pd.read_csv('readfile1.csv',sep=';', parse_dates=['birthday'])
While it works perfectly with readfile1.csv, it doesn't work with readfile2.csv, even though the files look exactly the same.
The error I get makes me think that pandas' automatic datetime parsing is not working:
print(df.at[i,'birthday'].strftime("%d%m%Y"))
AttributeError: 'str' object has no attribute 'strftime'
In both cases the format of the birthday looks like:
'1965-05-16T12:00:00.000Z' #from readfile1.csv
'1934-04-06T11:00:00.000Z' #from readfile2.csv
I can't figure out what's wrong. I checked the encoding of the files and both are 'UTF-8'. Any ideas?
Thank you!
Greetings
If you do not set the parse_dates keyword, and instead convert the column after reading the CSV with pd.to_datetime and errors='coerce', what result do you get? Does the column have NaT values? – MrFuppes
MrFuppes's comment about calling pd.to_datetime led to the solution: one faulty date in the column was the cause of the error. Lumber Jacks's hint was also helpful for determining the datatypes!
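The approach from the comment can be sketched as follows. The CSV content is hypothetical and mimics readfile2.csv, with "unknown" standing in for the faulty date; a single such entry can leave the whole column as strings after read_csv. Converting afterwards with errors='coerce' isolates it, and utc=True is an assumption chosen because the sample timestamps end in "Z". Formatting with the vectorised .dt.strftime then replaces calling .strftime on individual cells.

```python
import io
import pandas as pd

# Hypothetical sample mimicking readfile2.csv: the last row is not a date
csv_text = """name;birthday
Anna;1965-05-16T12:00:00.000Z
Ben;1934-04-06T11:00:00.000Z
Carl;unknown
"""

df = pd.read_csv(io.StringIO(csv_text), sep=";")

# Convert after reading; unparseable entries become NaT
parsed = pd.to_datetime(df["birthday"], errors="coerce", utc=True)

# Show the rows that failed to parse
bad = df[parsed.isna() & df["birthday"].notna()]
print(bad)

df["birthday"] = parsed
# Vectorised formatting; NaT rows come out as missing values
df["birthday_ddmmyyyy"] = df["birthday"].dt.strftime("%d%m%Y")
print(df)
```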

How can I convert my date column to datetime?

I have imported some data, but the date column is in this format: 50:58.0, 23:11.0, etc. When I click on the cell in Excel, however, it shows 02/05/2019 07:50:58 (for the first one, 50:54.0). So when I import it into Python as a pandas DataFrame, it still keeps the 50:54.0 format, and I do not know why.
I tried changing the column to datetime as:
df['EventTS'] = pd.to_datetime(df['EventTS'], format='%d%b%Y:%H:%M:%S.%f')
but it doesn't work; the error is: time data '07:27.0' does not match format '%d%b%Y:%H:%M:%S.%f' (match)
Without changing the format in Excel, how do I fix this in Python?
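The format string in the attempt expects text such as 02May2019:07:50:58.0, which matches neither the '07:27.0' that pandas received nor the value Excel shows in the cell. The sketch below assumes the cell value 02/05/2019 07:50:58 is day-first (2 May 2019). It shows that the full value parses with a matching explicit format, whereas the mm:ss.0 text on its own carries only minutes, seconds and tenths, so the date and hour cannot be recovered from it in Python. If the data came from a CSV exported by Excel, the displayed format is typically what gets written; reading the workbook with pd.read_excel, which generally returns the underlying cell value, may avoid changing the formatting in Excel.

```python
import pandas as pd

# Full value as shown in the Excel cell (assumed day-first: 2 May 2019)
full = pd.to_datetime("02/05/2019 07:50:58", format="%d/%m/%Y %H:%M:%S")
print(full)

# What actually reached pandas: only minutes, seconds and tenths.
# The missing date parts default to 1900-01-01 and the hour to 0.
short = pd.Series(["50:58.0", "23:11.0", "07:27.0"])
parsed = pd.to_datetime(short, format="%M:%S.%f")
print(parsed)
```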

pandas.read_csv() can apply different date formats within the same column! Is it a known bug? How to fix it?

I have realised that, unless the format of a date column is declared explicitly or semi-explicitly (with dayfirst), pandas can apply different date formats to the same column, when reading a csv file! One row could be dd/mm/yyyy and another row in the same column mm/dd/yyyy!
Insane doesn't even come close to describing it! Is it a known bug?
To demonstrate: the script below creates a very simple table with the dates from January 1st to the 31st, in the dd/mm/yyyy format, saves it to a csv file, then reads back the csv.
I then use pandas.DatetimeIndex to extract the day.
Well, the day is 1 for the first 12 days (when month and day were both < 13), and 13, 14, etc. afterwards. How on earth is this possible?
The only way I have found to fix this is to declare the date format, either explicitly or just with dayfirst=True. But it's a pain because it means I must declare the date format even when I import csv with the best-formatted dates ever! Is there a simpler way?
This happens to me with pandas 0.23.4 and Python 3.7.1 on Windows 10
import numpy as np
import pandas as pd
df=pd.DataFrame()
df['day'] =np.arange(1,32)
df['day']=df['day'].apply(lambda x: "{:0>2d}".format(x) )
df['month']='01'
df['year']='2018'
df['date']=df['day']+'/'+df['month']+'/'+df['year']
df.to_csv('mydates.csv', index=False)
#same results whether you use parse_dates or not
imp = pd.read_csv('mydates.csv',parse_dates=['date'])
imp['day extracted']=pd.DatetimeIndex(imp['date']).day
print(imp['day extracted'])
By default, pandas assumes the American (month-first) date format and, where that fails for a row, switches to day-first mid-column without raising an error. Letting that error pass silently goes against the Zen of Python, but another line of the Zen applies here: "Explicit is better than implicit". So if you know your data uses the international (day-first) format, use dayfirst:
imp = pd.read_csv('mydates.csv', parse_dates=['date'], dayfirst=True)
With files you produce, be unambiguous by using an ISO 8601 format with a timezone designator.
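A minimal sketch of the fully explicit route, using the same 31 January dates as the question but kept in memory instead of mydates.csv: the column is read as plain text and converted with pd.to_datetime and an explicit format, so every row is interpreted the same way. Writing the parsed datetimes back out produces ISO 8601 dates (yyyy-mm-dd), which leave no day/month ambiguity for the next reader.

```python
import io
import pandas as pd

# Same dd/mm/yyyy strings as the question's script
dates = [f"{d:02d}/01/2018" for d in range(1, 32)]
buf = io.StringIO()
pd.DataFrame({"date": dates}).to_csv(buf, index=False)
buf.seek(0)

# Read as text, then parse every row with one explicit format
imp = pd.read_csv(buf, dtype={"date": str})
imp["date"] = pd.to_datetime(imp["date"], format="%d/%m/%Y")
print(imp["date"].dt.day.tolist())  # 1 .. 31 in order

# Writing datetimes back out gives ISO 8601 dates
iso_csv = imp.to_csv(index=False)
print(iso_csv.splitlines()[:3])
```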

Pandas DatetimeIndex string format conversion from American to European

OK, I have read some data from a CSV file using:
df=pd.read_csv(path,index_col='Date',parse_dates=True,dayfirst=True)
The data use the European date convention dd/mm/yyyy, which is why I am using dayfirst=True.
However, I want to change how the index of my DataFrame df is displayed, from the American format (yyyy/mm/dd) to the European format (dd/mm/yyyy), just to be visually consistent with how I read dates.
I couldn't find any relevant argument in pd.read_csv.
In the output, I want a DataFrame whose index is still a datetime index but is displayed in the European date format.
Could anyone propose a solution? It should be straightforward, since I guess there is a pandas method to handle this, but I am currently stuck.
Try something like the following once it's loaded from the CSV. I don't believe it's possible to perform the conversion as part of the reading process.
import pandas as pd
df = pd.DataFrame({'date': pd.date_range(start='11/24/2016', periods=4)})
df['date_eu'] = df['date'].dt.strftime('%d/%m/%Y')
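The answer formats a column; since the question is about the index, the same idea can be applied with DatetimeIndex.strftime. A sketch with a hypothetical frame standing in for the read_csv result: note that strftime returns plain strings, so date-based selection and resampling no longer work on the formatted index, which is why this example keeps the original frame and formats a display copy.

```python
import pandas as pd

# Hypothetical frame with a DatetimeIndex, as read_csv(..., index_col='Date', parse_dates=True) would give
df = pd.DataFrame(
    {"value": [1.0, 2.0, 3.0]},
    index=pd.date_range(start="2016-11-24", periods=3, name="Date"),
)

# Keep the real DatetimeIndex for computations; build a display copy with dd/mm/yyyy labels
display_df = df.copy()
display_df.index = df.index.strftime("%d/%m/%Y")
print(display_df)
```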
