I am trying to read a date column from a CSV file. This column contains dates in just one format. Please see data below:
The problem arises when I try to read it using a date parser.
from datetime import datetime
import pandas as pd
dateparse = lambda x: datetime.strptime(x, '%m/%d/%Y').date()
df = pd.read_csv('products.csv', parse_dates=['DateOfRun'], date_parser=dateparse)
The above logic works fine in most cases, but sometimes, seemingly at random, I get an error that the format does not match. Example below:
ValueError: time data '2020-02-23' does not match format '%m/%d/%Y'
Does anyone know how this is possible? That yyyy-mm-dd format is not in my data. Any tips would be useful.
Thanks
The problem happens when you open the CSV file in Excel. By default (and depending on your OS settings), Excel automatically changes the date format. For instance, in the USA the default format is MM/DD/YYYY, so if a CSV file contains a date such as YYYY-MM-DD, Excel will automatically change it to MM/DD/YYYY.
The solution is to NOT open the CSV file in Excel before manipulating it in Python. If you must inspect it, look at it in Python, in Notepad, or in some other text editor.
I always assume that dates are going to be screwed up because someone might have opened the file in Excel, so I test for the expected format and convert the value if that check raises an AssertionError or a ValueError.
As an example, if you want to change dates from YYYY-MM-DD, try this:
from datetime import datetime

def change_dates(date_string):
    try:
        # strptime raises ValueError if the string is not MM/DD/YYYY
        assert datetime.strptime(date_string, '%m/%d/%Y'), 'format error'
        return date_string
    except (AssertionError, ValueError):
        # fall back to YYYY-MM-DD and convert it to MM/DD/YYYY
        dt = datetime.strptime(date_string, '%Y-%m-%d')
        return dt.strftime('%m/%d/%Y')
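The same idea can be generalised to a list of candidate formats. The following is only a minimal sketch: the name parse_date and the KNOWN_FORMATS tuple are made up for illustration, and the two formats listed are an assumption based on the error message in the question.

```python
from datetime import datetime

# Formats to try, in order. This list is an assumption; adjust it to your data.
KNOWN_FORMATS = ("%m/%d/%Y", "%Y-%m-%d")


def parse_date(date_string, formats=KNOWN_FORMATS):
    """Return a date for the first format that matches, else raise ValueError."""
    text = date_string.strip()
    for fmt in formats:
        try:
            return datetime.strptime(text, fmt).date()
        except ValueError:
            continue  # this format did not match, try the next one
    raise ValueError(f"no known format matches {date_string!r}")
```

Order matters when a string could match more than one format, so the most likely format should come first.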
I am reading ~50 files and adding them to the same table consecutively; there is one file for each month over the past few years. After the first year, the date format in the CSV files shifted from YYYY-mm-dd to mm/dd/YYYY.
SQL Server is fine with the date format YYYY-mm-dd, which is what it expects, but once the format switched in the CSV files my program crashes.
I wrote a piece of code to try and convert the data to the correct format, but it didn't work, as shown here:
if '/' in df['SubmissionDate'].iloc[0]:
    df['SubmissionDate'] = pd.to_datetime(df['SubmissionDate'], format='%m/%d/%Y')
I believe this would have worked, except that some rows have no date. So I need to either find some other way to make the SQL INSERT statement accept this different date format, or avoid trying to convert the blank items in the SubmissionDate column.
Any help would be greatly appreciated!
It sounds like you are not using parse_dates= when loading the CSV file into the DataFrame. The date parser seems to be able to handle multiple date formats, even within the same file:
import io
import pandas as pd
csv = io.StringIO(
"""\
id,a_date
1,2001-01-01
2,1/2/2001
3,12/31/2001
4,31/12/2001
"""
)
df = pd.read_csv(csv, parse_dates=["a_date"])
print(df)
"""
   id     a_date
0   1 2001-01-01
1   2 2001-01-02
2   3 2001-12-31
3   4 2001-12-31
"""
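If you would rather not rely on automatic format detection, which may behave differently between pandas versions, here is a more explicit sketch. It assumes only the two formats named in the question occur, uses a made-up sample DataFrame, and lets blank entries turn into missing values instead of raising.

```python
import pandas as pd

# Hypothetical sample data: both date formats plus missing and empty entries.
df = pd.DataFrame({"SubmissionDate": ["01/15/2020", None, "2019-03-02", ""]})
col = df["SubmissionDate"]

# Parse each known format separately; values that do not match become NaT.
slash = pd.to_datetime(col, format="%m/%d/%Y", errors="coerce")
iso = pd.to_datetime(col, format="%Y-%m-%d", errors="coerce")

# Prefer the mm/dd/YYYY result and fill the gaps from the YYYY-mm-dd result,
# then write everything back as YYYY-mm-dd strings.
df["SubmissionDate"] = slash.fillna(iso).dt.strftime("%Y-%m-%d")
print(df)
```

Rows with no date stay missing after this step, so they still need to be mapped to NULL by whatever code performs the insert.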
I am a student and I am learning pandas.
I have created an Excel file named Student_Record.xlsx (using Microsoft Excel).
I wanted to create a new file using pandas:
import pandas as pd
df = pd.read_excel(r"C:\Users\sudarshan\Desktop\Student_Record.xlsx")
df.head()
df.to_excel(r"C:\Users\sudarshan\Desktop\Output.xlsx",index=False)
I opened the file in pandas and saved it back to Excel under a different name (file name = Output).
When I open the Output file in MS Excel, the columns DOB and YOP have a timestamp attached to the dates.
Please let me know how to print only the date. (I want the Output file and its contents to look exactly like the original file.)
Hope to get some help/support.
Thank you
Your DOB and Year of passing columns probably have a datetime dtype before they are saved to Excel. As a result, they are written back in the full datetime representation.
If you want the contents to look exactly like the original file, in dd-mm-YYYY format, you can try converting these 2 columns to strings before saving to Excel:
df['DOB'] = df['DOB'].dt.strftime('%d-%m-%Y')
df['Year of passing'] = df['Year of passing'].dt.strftime('%d-%m-%Y')
I currently have a CSV file containing a column of dates formatted as dd/mm/yyyy, but I want to change it to yyyy/mm/dd.
I have written the code below:
import csv
csv_in = open('news.csv','rb')
for rows in csv.reader(csv_in):
    value = rows[0]
    day= value[:2]
    year= value[-4:]
    month= value[3:5]
    edited = year +'/'+month+'/'+day
    rows[0] = edited
    writer =csv.writer(open('newsedit.csv', 'wb'))
    writer.writerow(rows)
For some reason the code above only writes one row and stops, and I can't figure out why.
Try converting the date with the datetime module.
import datetime
datetime.datetime.strptime("25/01/2013", '%d/%m/%Y').strftime('%Y/%m/%d')
The strptime function parses a string into a datetime object, while strftime formats the datetime as a string in another format.
It is because you initialize a new writer in each iteration, which reopens the output file in write mode, so the output is replaced over and over again. You should create the writer object only once and then call its writerow() method repeatedly.
(By the way, there is a nice datetime module that makes working with dates easy...)
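Putting both answers together, a minimal sketch in Python 3 could look like this. The function name reformat_first_column is made up for illustration, the sample data is invented, and the code assumes the file has no header row and that every value in the first column is a valid dd/mm/yyyy date.

```python
import csv
import io
from datetime import datetime


def reformat_first_column(src, dst):
    """Copy CSV rows from src to dst, rewriting column 0 from dd/mm/yyyy to yyyy/mm/dd."""
    reader = csv.reader(src)
    writer = csv.writer(dst)  # created once, outside the loop
    for row in reader:
        if not row:
            continue  # skip completely empty lines
        parsed = datetime.strptime(row[0], "%d/%m/%Y")
        row[0] = parsed.strftime("%Y/%m/%d")
        writer.writerow(row)


# In-memory demonstration with made-up data; real files work the same way.
source = io.StringIO("25/01/2013,first headline\n01/02/2014,second headline\n")
target = io.StringIO()
reformat_first_column(source, target)
print(target.getvalue())
```

With real files, open both with newline="" as the csv module documentation recommends, for example open("news.csv", newline="") for reading and open("newsedit.csv", "w", newline="") for writing.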
I am using Python's csv module to write a list to a CSV file. One of the entries in the list is a string of format "MM/DD/YYYY HH:MM:SS AM/PM". When I open the CSV file in Excel, this entry shows up as "MM/DD/YYYY HH:MM". When I highlight the cell, the formula bar shows the true format of "MM/DD/YYYY HH:MM:SS AM/PM". What is Excel doing here, and how can I ensure the original datetime format shows up correctly in Excel?
To give you some background on the problem, I am trying to export data from an Oracle database in CSV format, which I have to do quite frequently for analysis and review.
Thanks for your help
select your cell(s)
right click
format cells
custom
type in [$-409]mm/dd/yyyy hh:mm:ss AM/PM;#
press ok
Some items may be named slightly differently, as I translated this from a German Excel version.
I have a table in a CSV file and I want to import it into a MySQL table. I copy and paste a series of data from a website into an Excel file and then convert it to a CSV file.
The columns in my table are like:
Date,name,version,link
The format of the date is dd/mm/yy.
At first I tried to load the file into the MySQL table directly, but I got error code 1292:
Incorrect date value while the date value is set as DATE
I have also tried to load the CSV table into MySQL with Python, but I still get the same error:
_mysql_exceptions.OperationalError: (1292, "Incorrect date value: 'Date' for column 'Date' at row 1")
Does anyone have any idea what I should do?
Both yyyy/mm/dd and yyyy-mm-dd will work.
Your format is dd/mm/yy, which is why you get the error.
It should be yy/mm/dd, yymmdd, yyyymmdd, yy-mm-dd, or one of the other accepted forms.
So change it from dd/mm/yy to yy/mm/dd.
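Note that the Python error quotes the value 'Date' at row 1, which suggests the header line itself is being inserted as data. Below is a minimal sketch of a conversion step that skips the header and rewrites the dates before loading. The function name rows_for_mysql and the sample row are made up for illustration; the column names come from the question, and the dd/mm/yy format is taken on the asker's word.

```python
import csv
import io
from datetime import datetime


def rows_for_mysql(src):
    """Yield (date, name, version, link) tuples with the date as yyyy-mm-dd."""
    reader = csv.DictReader(src)  # the first line is consumed as the header
    for record in reader:
        parsed = datetime.strptime(record["Date"], "%d/%m/%y")
        yield (
            parsed.strftime("%Y-%m-%d"),
            record["name"],
            record["version"],
            record["link"],
        )


# In-memory demonstration with a made-up row.
sample = io.StringIO("Date,name,version,link\n25/01/13,foo,1.0,http://example.com\n")
for row in rows_for_mysql(sample):
    print(row)
```

The resulting tuples can then be passed to whatever parameterised INSERT your MySQL driver provides. Keep in mind that %y maps two-digit years 69 to 99 to the 1900s and 00 to 68 to the 2000s.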
I scrape a table from a web page and paste it into Excel, then save it from Excel in CSV format. However, the CSV that Excel produces contains Unicode characters, because the table I scrape has them. So be sure to handle the Unicode before trying to put the data into your database.