I have checked many StackOverflow questions but can't solve this problem...
import pandas as pd
from datetime import datetime
import csv
username = input("enter name: ")
with open('../data/%s_tweets.csv' % (username), 'rU') as f:
    reader = csv.reader(f)
    your_list = list(reader)
for x in your_list:
    date = x[1]  # index of the date column
    dateOb = datetime.strptime(date, '%Y-%m-%d %H:%M:%S')
    # I also tried the "%d-%m-%Y %H:%M:%S" format
    # I also tried the "%d-%m-%Y %I:%M:%S" format
    # I also tried the "%d-%m-%Y %I:%M:%S%p" format
    # but the same error shows for every format
    print(dateOb)
I am getting the following error with my CSV file:
ValueError: time data 'date' does not match format '%d-%m-%Y %I:%M:%S'
ValueError: time data 'date' does not match format '%d-%m-%Y %I:%M:%S'
'date' is not a date string.
That's why Python cannot convert this string into a datetime.
I checked my .csv file and found that the first line of the date column is not a date string but the column header.
I removed the first line of my CSV file, and then it works in Python 3.5.1.
However, the same problem still occurs in Python 2.7.
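Instead of deleting the header line from the file, you can skip it while reading. A minimal sketch, assuming a single header row and the date in column index 1 with the format '%Y-%m-%d %H:%M:%S' (both taken from the question's code); the function name `read_dates` and the sample rows are hypothetical.

```python
import csv
import io
from datetime import datetime


def read_dates(f, date_index=1, fmt='%Y-%m-%d %H:%M:%S'):
    """Parse the date column of a CSV file object, skipping the header row."""
    reader = csv.reader(f)
    next(reader, None)  # skip the header line instead of deleting it from the file
    return [datetime.strptime(row[date_index], fmt) for row in reader if row]


# In-memory sample (made-up data); with a real file use open(path, newline='')
sample = io.StringIO("id,date,text\n1,2017-05-01 10:15:00,hello\n2,2017-05-02 08:00:30,world\n")
for d in read_dates(sample):
    print(d)
```

Because `next(reader, None)` consumes exactly one row, the 'date' header string never reaches `strptime`.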
Related
So I'm getting this error:
time data '6/28/18' does not match format '%b/%d/%y'
I have a CSV file whose 4th column contains dates, and I want to sort the data by date. Any suggestions or possible solutions? I'm not very familiar with Python's datetime module...
import csv
from datetime import datetime
with open('example.csv', newline='') as f:
    reader = csv.reader(f)
    data = sorted(reader, key=lambda row: datetime.strptime(row[4], '%b/%d/%y'))
print(data)
Use "%m/%d/%y" instead of "%b/%d/%y"
>>> x = '6/28/18'
>>> datetime.strptime(x, '%m/%d/%y')
datetime.datetime(2018, 6, 28, 0, 0)
Your datetime.strptime format string should be '%m/%d/%y'.
The %b directive would work if your month were an abbreviated name like 'Jun'.
For more on Python's datetime formatting options see this link:
https://docs.python.org/2/library/datetime.html#strftime-and-strptime-behavior
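Putting the corrected format into the question's sort gives the sketch below. It assumes, as the question's code does, that the date is in column index 4 and that the file has no header row (if it has one, call `next(reader)` before sorting). The function name `sort_by_date` and the sample rows are hypothetical.

```python
import csv
import io
from datetime import datetime


def sort_by_date(f, date_index=4, fmt='%m/%d/%y'):
    """Return the CSV rows sorted from oldest to newest by the given column."""
    reader = csv.reader(f)
    return sorted(reader, key=lambda row: datetime.strptime(row[date_index], fmt))


# Made-up rows; the date sits in column index 4 as in the question
sample = io.StringIO(
    "a,b,c,d,6/28/18\n"
    "a,b,c,d,1/5/18\n"
    "a,b,c,d,12/31/17\n"
)
for row in sort_by_date(sample):
    print(row[4])

# %b expects an abbreviated month name (English in the default C locale)
print(datetime.strptime('Jun/28/18', '%b/%d/%y'))
```

Sorting on the parsed `datetime` rather than the raw string matters here: as strings, '12/31/17' would sort after '1/5/18'.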
I am making a generic tool that can take any CSV file. I have a CSV file that looks something like this. The first row contains the column names and the second row contains the variable types.
sam.csv
Time,M1,M2,M3,CityName
temp,num,num,num,city
20-06-13,19,20,0,aligarh
20-02-13,25,42,7,agra
20-03-13,23,35,4,aligarh
20-03-13,21,32,3,allahabad
20-03-13,17,27,1,aligarh
20-02-13,16,40,5,aligarh
Another CSV file looks like this:
Time,M1,M2,M3,CityName
temp,num,num,num,city
20/8/16,789,300,10,new york
12/6/17,464,67,23,delhi
12/6/17,904,98,78,delhi
So the dates could be in any format, or could even be timestamps. I want to convert them to a "20-May-13" ("%d-%b-%y") string every time and sort the column from the oldest date to the newest. I have been able to find the column whose type is "temp" and tried to convert it to the required format, but every method I found requires me to specify the original format, which is not possible in my case.
Code:
import csv
import time
from datetime import datetime, date
import pandas as pd
import dateutil
from dateutil.parser import parse
filename = 'sam.csv'
data_date = pd.read_csv(filename)
column_name = data_date.loc[:, data_date.loc[0] == "temp"]  # .ix was removed in pandas 1.0
column_work = column_name.iloc[1:]
column_some = column_work.iloc[:, 0]
default_date = datetime.combine(date.today(), datetime.min.time()).replace(day=1)
for line in column_some:
    print(parse(line[0], default=default_date).strftime("%d-%b-%y"))
In sam.csv the dates are in 2013, but my output, although in the correct format, shows all 6 dates as 2-Mar-2018.
You can use the dateutil library for converting any date format to your required format.
Ex:
import csv
from dateutil.parser import parse
p = "PATH_TO_YOUR_CSV.csv"  # I have used your sample data to test.
with open(p, "r") as infile:
    reader = csv.reader(infile)
    next(reader)  # Skip header (column names)
    next(reader)  # Skip header (variable types)
    for line in reader:
        print(parse(line[0]).strftime("%d-%B-%y"))  # Parse date and convert it to day-month-year
Output:
20-June-13
20-February-13
20-March-13
20-March-13
20-March-13
20-February-13
20-August-16
06-December-17
06-December-17
More info on dateutil
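In the question's loop, `line` is already the whole date string, so `line[0]` is only its first character ('2' for every row); parsing that single digit against a default date in the current month likely explains why every row came out as the 2nd of the current month. If installing dateutil is not an option, a standard-library alternative is to try a list of candidate formats with `datetime.strptime`, then sort and format the results as '%d-%b-%y'. This is a sketch, not the dateutil approach: the candidate list is an assumption and is day-first, so an ambiguous value like 12/6/17 is read as 12 June 2017 here, whereas dateutil's month-first default reads it as 6 December 2017 (as in the output above). Timestamps are not handled. `CANDIDATE_FORMATS`, `parse_any` and `normalize_dates` are hypothetical names.

```python
import csv
import io
from datetime import datetime

# Assumed candidate formats, tried in order (day-first for ambiguous values)
CANDIDATE_FORMATS = ['%d-%m-%y', '%d/%m/%y', '%d-%m-%Y', '%d/%m/%Y', '%Y-%m-%d']


def parse_any(text):
    """Return a datetime for the first candidate format that matches, else None."""
    for fmt in CANDIDATE_FORMATS:
        try:
            return datetime.strptime(text.strip(), fmt)
        except ValueError:
            continue
    return None


def normalize_dates(f):
    """Find the 'temp' column, then return its dates sorted oldest first as %d-%b-%y."""
    reader = csv.reader(f)
    next(reader)               # first row: column names
    types = next(reader)       # second row: variable types
    col = types.index('temp')  # column whose type is "temp"
    parsed = [parse_any(row[col]) for row in reader if row]
    valid = sorted(d for d in parsed if d is not None)
    return [d.strftime('%d-%b-%y') for d in valid]


sample = io.StringIO(
    "Time,M1,M2,M3,CityName\n"
    "temp,num,num,num,city\n"
    "20-06-13,19,20,0,aligarh\n"
    "20-02-13,25,42,7,agra\n"
    "20-03-13,23,35,4,aligarh\n"
)
print(normalize_dates(sample))
```

Unparseable values are silently dropped here; collecting them separately would make bad input easier to spot.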
I have a CSV file with a field named start_date that contains data in a variety of formats.
Some of the formats are, e.g., June 23, 1912 or 5/11/1930 (month, day, year), but not all values are valid dates.
I want to add a start_date_description field next to the start_date column into which invalid date values are filtered. Lastly, I want to normalize all valid date values in start_date to ISO 8601 (i.e., YYYY-MM-DD).
So far I have only been able to load the start_date column from my file. I am stuck and would appreciate any help. A solution that does not use a library would be especially great!
import csv

date_column = ("start_date",)  # trailing comma makes this a tuple, so `in` tests exact names
results = []
with open("test.csv", "r", newline="") as f:
    csv_reader = csv.reader(f)
    headers = None
    for row in csv_reader:
        if headers is None:
            # First row: remember the indexes of the wanted columns
            headers = []
            for i, col in enumerate(row):
                if col in date_column:
                    headers.append(i)
        else:
            results.append([row[i] for i in headers])
print(results)
One way is to use the dateutil module; you can parse data as follows:
from dateutil import parser
parser.parse('3/16/78')
parser.parse('4-Apr') # this will give current year i.e. 2017
Then converting to your format can be done with:
dt = parser.parse('3/16/78')
dt.strftime('%Y-%m-%d')
Suppose your table is in a DataFrame; you can then define a parsing function and apply it to the column as follows:
def parse_date(start_date):
    try:
        return parser.parse(start_date).strftime('%Y-%m-%d')
    except (ValueError, OverflowError, TypeError):
        # invalid or missing (NaN) values become an empty string
        return ''
df['parse_date'] = df.start_date.map(parse_date)
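Since the question asks for a solution without a library, here is a standard-library sketch of the same conversion for the case where the format of a value is known: `datetime.strptime` parses it and `date.isoformat()` produces the YYYY-MM-DD form. The helper name `to_iso` is hypothetical, and '%B' assumes English month names (the default C locale). Note how two-digit years are resolved by `%y`.

```python
from datetime import datetime


def to_iso(text, fmt):
    """Parse text with a known strptime format and return it as YYYY-MM-DD."""
    return datetime.strptime(text, fmt).date().isoformat()


# Formats taken from the question's examples
print(to_iso('5/11/1930', '%m/%d/%Y'))       # month/day/four-digit year
print(to_iso('June 23, 1912', '%B %d, %Y'))  # full month name

# Two-digit years: %y maps 69-99 to 1969-1999 and 00-68 to 2000-2068
print(to_iso('3/16/78', '%m/%d/%y'))
print(to_iso('3/16/08', '%m/%d/%y'))
```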
Question: ... add a start_date_description ... normalize ... to ISO 8601
This reads the file test.csv, validates the date string in column start_date against a list of date directive patterns, and returns a
dict {start_date_description, ISO}. The returned dict is used to update the current row dict, and the updated row dict is written to the file test_update.csv.
Put this in a new Python file and run it!
A missing valid date directive pattern can simply be added to the list.
Python » 3.6 Documentation: 8.1.8. strftime() and strptime() Behavior
import csv
import re
from datetime import datetime as dt

def validate(date):
    def _dict(desc, date):
        return {'start_date_description': desc, 'ISO': date}

    for format in [('%m/%d/%y', 'Valid'), ('%b-%y', 'Short, missing Day'), ('%d-%b-%y', 'Valid'),
                   ('%d-%b', 'Short, missing Year'), ('%B %d. %Y', 'Valid')]:
        try:
            _dt = dt.strptime(date, format[0])
            return _dict(format[1], _dt.strftime('%Y-%m-%d'))
        except ValueError:
            continue

    if not re.search(r'\d+', date):
        return _dict('No Digit', None)
    return _dict('Unknown Pattern', None)

with open('test.csv', newline='') as fh_in, open('test_update.csv', 'w', newline='') as fh_out:
    csv_reader = csv.DictReader(fh_in)
    csv_writer = csv.DictWriter(fh_out,
                                fieldnames=csv_reader.fieldnames +
                                           ['start_date_description', 'ISO'])
    csv_writer.writeheader()

    for row, values in enumerate(csv_reader, 2):
        values.update(validate(values['start_date']))

        # Show only invalid dates
        if any(w in values['start_date_description']
               for w in ['Unknown', 'No Digit', 'missing']):
            print('{:>3}: {v[start_date]:13.13} {v[start_date_description]:<22} {v[ISO]}'.
                  format(row, v=values))

        csv_writer.writerow(values)
Output:
start_date start_date_description ISO
June 23. 1912 Valid 1912-06-23
12/31/91 Valid 1991-12-31
Oct-84 Short, missing Day 1984-10-01
Feb-09 Short, missing Day 2009-02-01
10-Dec-80 Valid 1980-12-10
10/7/81 Valid 1981-10-07
Facere volupt No Digit None
... (omitted for brevity)
Tested with Python: 3.4.2
I have a column in Excel with dates in the format "17-12-2015 19:35". How can I extract the first 2 digits as integers and append them to a list? In this case I need to extract 17 and append it to a list. Can this also be done using pandas?
Code thus far:
import pandas as pd
Location = r'F:\Analytics Materials\files\paymenttransactions.csv'
df = pd.read_csv(Location)
time = df['Creation Date'].tolist()
print (time)
You could extract the day of each timestamp like this:
from datetime import datetime
import pandas as pd
location = r'F:\Analytics Materials\files\paymenttransactions.csv'
df = pd.read_csv(location)
timestamps = df['Creation Date'].tolist()
dates = [datetime.strptime(timestamp, '%d-%m-%Y %H:%M') for timestamp in timestamps]
days = [date.strftime('%d') for date in dates]
print(days)
The '%d-%m-%Y %H:%M' and '%d' bits are format specifiers that describe how your timestamp is formatted. See the "strftime() and strptime() Behavior" section of the Python datetime documentation for a complete list of directives.
datetime.strptime parses a string into a datetime object using such a specifier, so dates will hold a list of datetime instances instead of strings.
datetime.strftime does the opposite: it turns a datetime object into a string, again using a format specifier. %d simply instructs strftime to output only the day of the month, zero-padded.
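The question asks for the day as an integer, while `strftime('%d')` returns a zero-padded string such as '17'. A datetime's `.day` attribute is already an int, so the conversion can skip `strftime` altogether. In this sketch a hard-coded list with made-up values stands in for `df['Creation Date'].tolist()`. With pandas, `pd.to_datetime(df['Creation Date'], format='%d-%m-%Y %H:%M').dt.day` gives the same numbers as a Series.

```python
from datetime import datetime

# Stand-in for df['Creation Date'].tolist(); values are hypothetical
timestamps = ['17-12-2015 19:35', '03-01-2016 08:05']

# .day is an int, so no string slicing or int() conversion is needed
days = [datetime.strptime(ts, '%d-%m-%Y %H:%M').day for ts in timestamps]
print(days)
```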
In Python I import a CSV file with one datetime value in each row (e.g. 2013-03-14 07:37:33),
and I want to compare it with the datetime values I obtain from timestamps.
I assume that reading the CSV gives me strings, but when I compare them in a loop with the values from the timestamps they never match, and no error is raised either.
Any suggestions?
csv_in = open('FakeOBData.csv', 'rb')
reader = csv.reader(csv_in)
for row in reader:
    date = row
    OBD.append(date)
.
.
.
for x in OBD:
    print x
    sightings = db.edge.find ( { "tag" : int(participant_tag)},{"_id":0}).sort("time")
    for sighting in sightings:
        time2 = datetime.datetime.fromtimestamp(time)
        if x == time2:
Use datetime.datetime.strptime to parse the strings into datetime objects. You may also have to work out what time zone the date strings in your CSV are from and adjust for that.
%Y-%m-%d %H:%M:%S should work as your format string:
x_datetime = datetime.datetime.strptime(x, '%Y-%m-%d %H:%M:%S')
if x_datetime == time2:
Or parse it when reading:
for row in reader:
    date = datetime.datetime.strptime(row[0], '%Y-%m-%d %H:%M:%S')
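In the question's loop, `date = row` stores the whole CSV row (a list), and a list never compares equal to a datetime, which likely explains why nothing matches without any error. The sketch below parses the first column and compares it with a datetime built from a Unix timestamp. It assumes the CSV times are in UTC and the timestamps are seconds since the epoch, so both sides are made timezone-aware in UTC; `load_datetimes` and the sample data are hypothetical.

```python
import csv
import io
from datetime import datetime, timezone

FMT = '%Y-%m-%d %H:%M:%S'


def load_datetimes(f):
    """Parse the first column of each CSV row into an aware UTC datetime."""
    return [datetime.strptime(row[0], FMT).replace(tzinfo=timezone.utc)
            for row in csv.reader(f) if row]


obd = load_datetimes(io.StringIO("2013-03-14 07:37:33\n2013-03-14 08:00:00\n"))

# Unix timestamp for 2013-03-14 07:37:33 UTC, converted as UTC for the comparison
ts = 1363246653
time2 = datetime.fromtimestamp(ts, tz=timezone.utc)

for x in obd:
    print(x, x == time2)
```

If the CSV times are local instead, attach that zone (or compare naive values from `datetime.fromtimestamp(ts)`) so both sides refer to the same clock.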
You could parse it yourself with datetime.datetime.strptime, which should be fine if you know the format the date is in. If you do not know the format, or want to be more robust, I would advise you to use the parser from the python-dateutil library; it is very robust.
pip install python-dateutil
Then
import dateutil.parser
d = dateutil.parser.parse('1 Jan 2012 12pm UTC')  # it's that robust!