In Python I read a CSV file with one datetime value on each row (e.g. 2013-03-14 07:37:33),
and I want to compare each value with the datetime values I get from datetime.datetime.fromtimestamp.
I assume that reading the CSV gives me strings, but when I compare them in a loop with the values from fromtimestamp, the comparison never matches, and no error is raised either.
Any suggestions?
csv_in = open('FakeOBData.csv', 'rb')
reader = csv.reader(csv_in)
for row in reader:
    date = row
    OBD.append(date)
.
.
.
for x in OBD:
    print x
    sightings = db.edge.find({"tag": int(participant_tag)}, {"_id": 0}).sort("time")
    for sighting in sightings:
        time2 = datetime.datetime.fromtimestamp(time)
        if x == time2:
Use datetime.datetime.strptime to parse the strings into datetime objects. You may also have to work out what time zone the date strings in your CSV are from and adjust for that.
%Y-%m-%d %H:%M:%S should work as your format string:
x_datetime = datetime.datetime.strptime(x, '%Y-%m-%d %H:%M:%S')
if x_datetime == time2:
Or parse it when reading:
for row in reader:
    date = datetime.datetime.strptime(row[0], '%Y-%m-%d %H:%M:%S')
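To make the reason for the silent mismatch concrete: comparing a str with a datetime using == simply returns False, with no exception. The following is a minimal sketch of the parse-then-compare approach, assuming Python 3, a single date column, and in-memory sample data in place of the real CSV file and database. The helper names parse_csv_datetimes and matching_times are hypothetical, and the stored epoch values are made up for illustration.

```python
import csv
import io
from datetime import datetime

CSV_FORMAT = "%Y-%m-%d %H:%M:%S"


def parse_csv_datetimes(lines):
    """Parse the first column of every non-empty CSV row into a naive datetime."""
    return [datetime.strptime(row[0], CSV_FORMAT) for row in csv.reader(lines) if row]


def matching_times(csv_datetimes, epoch_seconds):
    """Return the converted timestamps that also occur among the CSV datetimes."""
    wanted = set(csv_datetimes)
    converted = (datetime.fromtimestamp(seconds) for seconds in epoch_seconds)
    return [moment for moment in converted if moment in wanted]


# Sample data standing in for the CSV file (assumption: one datetime per row).
sample_csv = io.StringIO("2013-03-14 07:37:33\n2013-03-14 08:00:00\n")
parsed = parse_csv_datetimes(sample_csv)

# Hypothetical stored epoch values: one equal to the first CSV row, one 5 seconds later.
stored = [parsed[0].timestamp(), parsed[0].timestamp() + 5]

matches = matching_times(parsed, stored)
print(matches)
```

Note that fromtimestamp returns local time, so this only matches if the CSV strings are also in local time; otherwise the time zone adjustment mentioned in the answer is still needed.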
You could parse it yourself with datetime.datetime.strptime, which is fine if you know the format the dates are in. If you do not know the format, or want something more tolerant, I would advise you to use the parser from the python-dateutil library, which handles a wide range of date formats.
pip install python-dateutil
Then
import dateutil.parser
d = dateutil.parser.parse('1 Jan 2012 12pm UTC') # it's that robust!
Related
I'm trying to format a date in a custom format. When I use datetime.datetime.now(), I get exactly the format I'm after. However, I want the same format for a fixed date such as 1980-01-22 instead of now.
import datetime
date_string = "1980-01-22"
item = datetime.datetime.now(datetime.timezone.utc).isoformat(timespec="milliseconds").replace("+00:00", "Z")
print(item)
Output I get:
2021-05-04T09:52:04.010Z
How can I get the same format of date when I use a customized date, as in 1980-01-22 instead of now?
MrFuppes' suggestion in the comments is the shortest way to handle your date conversion and formatting use case.
Another way is to use the Python module dateutil. This module has a lot of flexibility and I use it all the time.
Using dateutil.parser.parse:
from dateutil.parser import parse
# ISO format with microseconds; %z renders as an empty string for naive datetimes
ISO_FORMAT_MICROS = "%Y-%m-%dT%H:%M:%S.%f%z"
# note the format of these strings
date_strings = ["1980-01-22",
                "01-22-1980",
                "January 22, 1980",
                "1980 January 22"]
for date_string in date_strings:
    dt = parse(date_string).strftime(ISO_FORMAT_MICROS)
    # drop the last 3 microsecond digits to keep milliseconds, then add the Zulu time zone designator
    iso_formatted_date = f'{dt[:-3]}Z'
    print(iso_formatted_date)
# output
1980-01-22T00:00:00.000Z
1980-01-22T00:00:00.000Z
1980-01-22T00:00:00.000Z
1980-01-22T00:00:00.000Z
Using dateutil.parser.isoparse:
from dateutil.parser import isoparse
from dateutil.tz import *
dt = isoparse("1980-01-22").isoformat(timespec="milliseconds")
iso_formatted_date = f'{dt}Z'
print(iso_formatted_date)
# output
1980-01-22T00:00:00.000Z
Is this what you're trying to achieve?
date_string = "1980-01-22"
datetime.datetime.strptime(date_string, "%Y-%m-%d").isoformat(timespec="milliseconds")
Output
'1980-01-22T00:00:00.000'
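The output above lacks the trailing Z from the question's target format. A minimal standard-library sketch that produces it is shown here, assuming the date string should be interpreted as midnight UTC; the function name to_iso_millis_z is hypothetical.

```python
from datetime import datetime, timezone


def to_iso_millis_z(date_string, fmt="%Y-%m-%d"):
    """Format a date string as ISO 8601 with milliseconds and a trailing Z.

    Assumption: the input date is meant as midnight UTC.
    """
    parsed = datetime.strptime(date_string, fmt).replace(tzinfo=timezone.utc)
    # isoformat() renders UTC as "+00:00"; swap it for the Z designator.
    return parsed.isoformat(timespec="milliseconds").replace("+00:00", "Z")


print(to_iso_millis_z("1980-01-22"))  # 1980-01-22T00:00:00.000Z
```

Attaching timezone.utc mirrors what the question's own now(datetime.timezone.utc) call does, so both code paths go through the same formatting step.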
I checked many Stack Overflow questions, but I can't solve this problem.
import pandas as pd
from datetime import datetime
import csv
username = input("enter name: ")
with open('../data/%s_tweets.csv' % (username), 'rU') as f:
    reader = csv.reader(f)
    your_list = list(reader)
for x in your_list:
    date = x[1]  # column 1 holds the date
    dateOb = datetime.strptime(date, '%Y-%m-%d %H:%M:%S')
    # I also tried the format "%d-%m-%Y %H:%M:%S"
    # I also tried the format "%d-%m-%Y %I:%M:%S"
    # I also tried the format "%d-%m-%Y %I:%M:%S%p"
    # but the same error shows for every format
    print(dateOb)
I am getting the following error with my CSV file:
ValueError: time data 'date' does not match format '%d-%m-%Y %I:%M:%S'
ValueError: time data 'date' does not match format '%d-%m-%Y %I:%M:%S'
'date' is not a date string.
That is why Python cannot convert it into a datetime object.
I checked my .csv file and found that the first line of the date column is not a date string but the column header.
I removed the first line of my CSV file, and then it worked in Python 3.5.1.
But the same problem still occurs in Python 2.7.
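Instead of deleting the header line from the file, the header can be skipped while reading. This is a minimal Python 3 sketch, assuming the date sits in column 1 and uses the '%Y-%m-%d %H:%M:%S' format; the function name read_dates and the sample rows are made up for illustration.

```python
import csv
import io
from datetime import datetime


def read_dates(lines, column=1, fmt="%Y-%m-%d %H:%M:%S"):
    """Parse one column of a CSV into datetimes, skipping the header row."""
    reader = csv.reader(lines)
    next(reader, None)  # discard the header row; the default avoids an error on empty input
    return [datetime.strptime(row[column], fmt) for row in reader if row]


# In-memory sample standing in for the tweets CSV (hypothetical contents).
sample = io.StringIO(
    "id,date,text\n"
    "1,2016-05-01 10:15:00,hello\n"
    "2,2016-05-02 23:59:59,bye\n"
)
dates = read_dates(sample)
print(dates)
```

csv.DictReader is another option: it consumes the header row itself and lets each row be indexed by column name.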
I have a column in Excel which has dates in the format "17-12-2015 19:35". How can I extract the first 2 digits as integers and append them to a list? In this case I need to extract 17 and append it to a list. Can it also be done using pandas?
Code thus far:
import pandas as pd
Location = r'F:\Analytics Materials\files\paymenttransactions.csv'
df = pd.read_csv(Location)
time = df['Creation Date'].tolist()
print (time)
You could extract the day of each timestamp like
from datetime import datetime
import pandas as pd
location = r'F:\Analytics Materials\files\paymenttransactions.csv'
df = pd.read_csv(location)
timestamps = df['Creation Date'].tolist()
dates = [datetime.strptime(timestamp, '%d-%m-%Y %H:%M') for timestamp in timestamps]
days = [date.strftime('%d') for date in dates]
print(days)
The '%d-%m-%Y %H:%M' and '%d' bits are format specifiers that describe how your timestamp is formatted. See the Python documentation on strftime() and strptime() format codes for a complete list of directives.
datetime.strptime parses a string into a datetime object using such a specifier. dates will thus hold a list of datetime instances instead of strings.
datetime.strftime does the opposite: it turns a datetime object into a string, again using a format specifier. %d simply instructs strftime to output only the day of a date.
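Since strftime('%d') yields strings such as '17', and the question asks for integers, the day attribute of each datetime object is a more direct route. A minimal sketch, with made-up sample timestamps in place of the 'Creation Date' column:

```python
from datetime import datetime

# Sample values standing in for df['Creation Date'].tolist() (hypothetical data).
timestamps = ["17-12-2015 19:35", "03-01-2016 08:05"]

# .day is already an int, so no string formatting or int() conversion is needed.
days = [datetime.strptime(timestamp, "%d-%m-%Y %H:%M").day for timestamp in timestamps]
print(days)  # [17, 3]
```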
My CSV file is arranged so that there's a header row named "Dates", and below that row is a gigantic column of a million dates in the usual month/day/year format, like "4/22/2015" and "3/27/2014".
How can I write a program that identifies the earliest and latest dates in the CSV file, while maintaining the original format (month/day/year)?
I've tried
for line in count_dates:
    dates = line.strip().split(sep="/")
    all_dates.append(dates)
print (all_dates)
I've tried to take away the "/" and replace it with a blank space, but it does not print anything.
import pandas as pd
import datetime
df = pd.read_csv('file_name.csv')
df['Dates'] = df['Dates'].apply(lambda v: datetime.datetime.strptime(v, '%m/%d/%Y'))
print df['Dates'].min(), df['Dates'].max()
Considering you have a large file, reading it in its entirety into memory is a bad idea.
Read the file line by line, manually keeping track of the earliest and latest dates. Use datetime.datetime.strptime to convert the strings to dates (it takes the format string as its second parameter).
import datetime
with open("input.csv") as f:
    f.readline()  # get the "Dates" header out of the way
    first = f.readline().strip()
    earliest = datetime.datetime.strptime(first, "%m/%d/%Y")
    latest = datetime.datetime.strptime(first, "%m/%d/%Y")
    for line in f:
        date = datetime.datetime.strptime(line.strip(), "%m/%d/%Y")
        if date < earliest: earliest = date
        if date > latest: latest = date
print "Earliest date:", earliest
print "Latest date:", latest
Let's open the CSV file and read out all the dates, then use strptime to turn them into comparable datetime objects (so that we can use max). Lastly, let's print out the biggest (latest) date:
import csv
from datetime import datetime as dt
with open('path/to/file') as infile:
    reader = csv.reader(infile)
    next(reader)  # skip the "Dates" header row
    latest = max(dt.strptime(row[0], "%m/%d/%Y") for row in reader)
    print(dt.strftime(latest, "%m/%d/%Y"))
Naturally, you can use min to get the earliest date. However, this takes two linear runs, and you can do this with just one, if you are willing to do some heavy lifting yourself:
import csv
from datetime import datetime as dt
with open('path/to/file') as infile:
    reader = csv.reader(infile)
    next(reader)  # skip the "Dates" header row
    # the first data row initialises both extremes
    date, *_rest = next(reader)
    earliest = latest = dt.strptime(date, "%m/%d/%Y")
    for date, *_rest in reader:
        date = dt.strptime(date, "%m/%d/%Y")
        earliest = min(date, earliest)
        latest = max(date, latest)
print("earliest:", dt.strftime(earliest, "%m/%d/%Y"))
print("latest:", dt.strftime(latest, "%m/%d/%Y"))
A bit of an RTFM answer: open the file with the csv library, then iterate row by row, converting the date field into a date object (see the docs on converting a string to a date object). If the date is less than the minimum so far, store it as the minimum, and similarly for the maximum; on the first row, the date becomes both the minimum and the maximum.
Or for some overkill you could just use Pandas to read it into a data frame specifying the specific column as date format then just use max & min.
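The row-by-row approach described in this answer can be sketched as follows. It assumes Python 3, a single "Dates" header row, and month/day/year strings; the function name earliest_and_latest and the sample data are hypothetical. Keeping the original string next to each parsed value means the result is reported exactly as it appears in the file, which is what the question asks for.

```python
import csv
import io
from datetime import datetime

DATE_FORMAT = "%m/%d/%Y"


def earliest_and_latest(lines, column=0):
    """Return the earliest and latest date strings, unchanged from the input."""
    reader = csv.reader(lines)
    next(reader, None)  # skip the "Dates" header row
    earliest = latest = None  # each holds a (parsed datetime, original text) pair
    for row in reader:
        if not row:
            continue  # ignore blank lines
        text = row[column].strip()
        parsed = datetime.strptime(text, DATE_FORMAT)
        if earliest is None or parsed < earliest[0]:
            earliest = (parsed, text)
        if latest is None or parsed > latest[0]:
            latest = (parsed, text)
    if earliest is None:
        raise ValueError("no dates found")
    return earliest[1], latest[1]


# In-memory sample standing in for the real file (hypothetical data).
sample = io.StringIO("Dates\n4/22/2015\n3/27/2014\n12/1/2014\n")
first, last = earliest_and_latest(sample)
print(first, last)  # 3/27/2014 4/22/2015
```

Only one row is held in memory at a time, so the same function works on an open file object for the million-row case.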
I think it is more convenient to use pandas for this purpose.
import pandas as pd
df = pd.read_csv('file_name.csv')
# %m also accepts months without a leading zero when parsing; %-m is not a valid parsing directive
df['name_of_column_with_date'] = pd.to_datetime(df['name_of_column_with_date'], format='%m/%d/%Y')
print('min_date: {}'.format(min(df['name_of_column_with_date'])))
print('max_date: {}'.format(max(df['name_of_column_with_date'])))
The built-in min and max functions work well with pandas columns.
For more on the format argument of pd.to_datetime, you can consult a Python strftime cheat sheet.
I have a huge log file with timestamps in the format below:
08/07/2013 11:40:08 PM INFO
I want to convert that to a MySQL timestamp using Python, like:
2013-04-11 13:18:02
I have written a Python script to do that, but I am wondering whether there is a built-in Python package or function that already does this timestamp conversion more easily and efficiently.
Since data 'massaging' is part of my daily work, any suggestions on the efficiency of my code, or on new functions or even new tools to use, would be greatly appreciated.
(Note: input file is delimited by ^A and I am also converting that to csv at the same time)
($ cat output.file | python csv.py > output.csv)
import sys
def main():
    for line in sys.stdin:
        line = line[:-1]
        cols = line.split(chr(1))
        cols[0] = convertTime(cols[0])
        cols = [ '"' + col + '"' for col in cols ]
        print ",".join(cols)
def convertTime(loggingTime):
    #mysqlTime example: 2013-04-11 13:18:02
    #loggingTime example: 08/07/2013 11:40:08 PM INFO
    #DATE
    month, day, year = loggingTime[0:10].split('/')
    date = '/'.join([year,month,day])
    #TIME
    hour, minute, second = loggingTime[11:19].split(':')
    flag = loggingTime[20:22]
    if flag == 'PM':
        hour = str(int(hour) + 12)
    time = ":".join([hour, minute, second])
    mysqlTime = date + " " + time
    return mysqlTime
if __name__ == '__main__':
    main()
Use time.strptime to parse the time, then time.strftime to reformat to new format?
import time
input_format = "%m/%d/%Y %I:%M:%S %p INFO" # or %d/%m...
output_format = "%Y-%m-%d %H:%M:%S"
def convert_time(logging_time):
    return time.strftime(output_format, time.strptime(logging_time, input_format))
print convert_time("08/07/2013 11:40:08 PM INFO")
# prints 2013-08-07 23:40:08
Note, however, that strptime and strftime can be affected by the current locale: %p can produce and expect different AM/PM strings in different locales. You might want to set the locale to C (it is internally used by the datetime module too), so to be safe you may need to run the following code at the beginning:
import locale
locale.setlocale(locale.LC_TIME, "C")
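Putting the pieces together, here is a minimal Python 3 sketch of the whole conversion from ^A-delimited log lines to quoted CSV, using the csv module for the quoting. It assumes the timestamp always occupies the first 22 characters of the first column, and the names convert_time and convert_lines are hypothetical. Parsing with %I and %p also handles 12 AM and 12 PM correctly, which adding 12 to the hour by hand does not.

```python
import csv
import io
from datetime import datetime

LOG_FORMAT = "%m/%d/%Y %I:%M:%S %p"   # e.g. 08/07/2013 11:40:08 PM
MYSQL_FORMAT = "%Y-%m-%d %H:%M:%S"    # e.g. 2013-08-07 23:40:08
TIMESTAMP_WIDTH = 22                  # assumption: fixed-width timestamp prefix


def convert_time(logging_time):
    """Convert the timestamp prefix of a log column to MySQL format."""
    stamp = logging_time[:TIMESTAMP_WIDTH]
    return datetime.strptime(stamp, LOG_FORMAT).strftime(MYSQL_FORMAT)


def convert_lines(lines, out):
    """Write ^A-delimited lines to `out` as fully quoted CSV rows."""
    writer = csv.writer(out, quoting=csv.QUOTE_ALL)
    for line in lines:
        cols = line.rstrip("\n").split("\x01")
        cols[0] = convert_time(cols[0])
        writer.writerow(cols)


# Made-up sample lines; a real script would pass sys.stdin and sys.stdout instead.
sample_lines = [
    "08/07/2013 11:40:08 PM INFO\x01hello\n",
    "08/07/2013 12:05:00 AM INFO\x01bye\n",
]
out = io.StringIO()
convert_lines(sample_lines, out)
print(out.getvalue())
```

Letting csv.writer do the quoting also escapes any double quotes inside a column, which manual string concatenation would leave unescaped.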
I would recommend using the datetime module. You can convert your date string into a python datetime object, which you can then use to output a reformatted version.
from datetime import datetime
mysqltime = "2013-04-11 13:18:02"
timeobj = datetime.strptime(mysqltime, "%Y-%m-%d %H:%M:%S")
loggingtime = timeobj.strftime("%m/%d/%Y %I:%M:%S %p")  # %I (12-hour clock) pairs with %p
Convert it, as suggested, with strptime like this:
import datetime as dt
# %I (12-hour clock) is required for %p to affect the parsed hour
converter = "%d/%m/%Y %I:%M:%S %p INFO"
result = dt.datetime.strptime("08/07/2013 11:40:08 PM INFO", converter)
Splitting off the "INFO" string is not needed, since it is part of the format string. Then format the result with strftime:
result.strftime("%Y-%m-%d %H:%M:%S")