Python: Split timestamp by date and hour - python

I have a list of timestamps in the following format:
1/1/2013 3:30
I began to learn python some weeks ago and I have no idea how to split the date and time. Can anyone of you help me?
The output should be one column containing
1/1/2013
and one column including
3:30

I think that all you need is str.split ...
>>> s = '1/1/2013 3:30'
>>> s.split()
['1/1/2013', '3:30']
If it's in a list, you can do with a list-comprehension:
>>> lst = ['1/1/2013 3:30', '1/2/2013 3:30']
>>> [s.split() for s in lst]
[['1/1/2013', '3:30'], ['1/2/2013', '3:30']]
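Since the question asks for two separate columns, the split pairs can be transposed with zip. This is a minimal sketch; the sample list and the names timestamps, dates and times are illustrative assumptions, not part of the original answer.

```python
# Hypothetical sample data in the question's 'date time' format.
timestamps = ['1/1/2013 3:30', '1/2/2013 14:05']

# Split each entry on whitespace, then transpose the pairs into two columns.
pairs = [s.split() for s in timestamps]
dates, times = (list(col) for col in zip(*pairs))

print(dates)  # ['1/1/2013', '1/2/2013']
print(times)  # ['3:30', '14:05']
```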

If you want to use this date and time further in your code to perform operations on the data, such as comparing dates, you can convert the timestamp to a datetime object. Refer to the documentation on the datetime module.
You can use the following code to convert your timestamp to a datetime object. Note that %Y matches a four-digit year such as 2013, while %y matches only two digits.
>>> import datetime
>>> timestamp = datetime.datetime.strptime("1/1/2013 3:30", "%d/%m/%Y %H:%M")
>>> timestamp
datetime.datetime(2013, 1, 1, 3, 30)
>>> timestamp.date()
datetime.date(2013, 1, 1)
>>> timestamp.time()
datetime.time(3, 30)
If you just want to split the date and time and use them as strings, use the method suggested by mgilson.
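The two approaches can also be combined: parse to datetime objects for sorting or comparing, then format back to strings for output. The sketch below assumes the timestamps are month-first (the question does not say which), and the sample values are made up. Note that strftime zero-pads, so the strings come back as 01/01/2013 rather than 1/1/2013.

```python
from datetime import datetime

# Hypothetical input; assumed to be month/day/year followed by hour:minute.
stamps = ['1/2/2013 3:30', '1/1/2013 15:05']

# Parse, sort chronologically, then emit (date, time) string pairs.
parsed = sorted(datetime.strptime(s, '%m/%d/%Y %H:%M') for s in stamps)
columns = [(p.strftime('%m/%d/%Y'), p.strftime('%H:%M')) for p in parsed]

print(columns)  # [('01/01/2013', '15:05'), ('01/02/2013', '03:30')]
```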

Here is a sketch that accomplishes what you mentioned in your comment:
timestamp_column = 10  # index of the column holding the timestamp

def get_updated_row(i, row):
    row = row.rstrip('\n').split(',')
    try:
        timestamp = row.pop(timestamp_column)  # remove the timestamp column
        if i == 0:
            # header row: add the two new column names
            row.extend(["date", "time"])
        else:
            # normal row: '1/1/2013 3:30' -> '1/1/2013' and '3:30'
            date, time = timestamp.split()
            row.extend([date, time])
    except (IndexError, ValueError):
        print("ERROR: Unable to parse row {0}".format(i))
    return ','.join(row)

with open("path/to/file.csv", "r") as f:
    for i, row in enumerate(f):
        print(get_updated_row(i, row))  # write to a file here instead if necessary
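Splitting rows on ',' by hand breaks on quoted fields, so a variant built on the csv module may be more robust. The following is only a sketch: the function name split_timestamp_column, the column index and the in-memory sample data are assumptions made for illustration.

```python
import csv
import io

def split_timestamp_column(rows, column):
    """Yield each row with the timestamp column removed and date/time fields appended."""
    for i, row in enumerate(rows):
        row = list(row)
        timestamp = row.pop(column)
        if i == 0:
            row.extend(["date", "time"])  # header row
        else:
            date, _, time = timestamp.partition(" ")
            row.extend([date, time])
        yield row

# Hypothetical CSV content held in memory instead of a real file.
sample = io.StringIO("id,stamp\n1,1/1/2013 3:30\n2,1/2/2013 4:45\n")
out = io.StringIO()
writer = csv.writer(out, lineterminator="\n")
writer.writerows(split_timestamp_column(csv.reader(sample), column=1))
print(out.getvalue())
```

For a real file, the two StringIO objects would be replaced by files opened with open(..., newline='').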

Related

How to split Datetime into date and time separately in python

I have datetimes in the format yyyy-mm-dd hh:mm:ss in a single column, and I want to split the date and time into separate columns. Can anyone help?
df_train['Time1'] = df_train['server_time'].apply(lambda x : x.split(' ')[1])
When I apply this code I get the error "list index out of range".
This will help you.
df_train['date'] = [d.date() for d in df_train['server_time']]
df_train['time'] = [d.time() for d in df_train['server_time']]
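If server_time is a datetime column, you can get the date and time through the pandas .dt accessor:
df_train['server_time'] = pd.to_datetime(df_train['server_time'])  # assumes pandas is imported as pd
df_train['Date'] = df_train['server_time'].dt.date
df_train['Time'] = df_train['server_time'].dt.time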
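The "list index out of range" error from x.split(' ')[1] suggests that at least one value contains no space, for example a date without a time part. A tolerant splitter could look like the sketch below; split_server_time is a hypothetical helper name and the sample values are invented. It uses only plain Python, so it could be passed to apply or used on its own.

```python
def split_server_time(value):
    """Return (date, time); time is '' when the value has no time part after a space."""
    date_part, _, time_part = str(value).strip().partition(' ')
    return date_part, time_part

# Hypothetical values: a full timestamp, a date only, and an empty string.
samples = ['2019-05-01 13:45:10', '2019-05-02', '']
print([split_server_time(v) for v in samples])
```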

Python- MM/DD/YY Sorting

So I'm getting this error:
time data '6/28/18' does not match format '%b/%d/%y'
I have a csv file with the 4th column having the dates and want to sort the data by date... Any suggestions or possible solutions? I'm not so familiar with the datetime feature of Python...
import csv
from datetime import datetime
with open('example.csv', newline='') as f:
    reader = csv.reader(f)
    data = sorted(reader, key = lambda row: datetime.strptime(row[4], '%b/%d/%y'))
    print (data)
Use "%m/%d/%y" instead of "%b/%d/%y"
>>> x = '6/28/18'
>>> datetime.strptime(x, '%m/%d/%y')
datetime.datetime(2018, 6, 28, 0, 0)
Your datetime.strptime format string should be '%m/%d/%y'.
The %b option would work if your month was an abbreviated name like 'Jun'
For more on Python's datetime formatting options see this link:
https://docs.python.org/2/library/datetime.html#strftime-and-strptime-behavior
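Put together, the corrected sort might look like the sketch below. The CSV contents are invented and held in memory, and the date is assumed to sit at index 3 (the fourth column); adjust the index to match the real file.

```python
import csv
import io
from datetime import datetime

# Hypothetical CSV contents; the date is in the fourth column (index 3).
raw = "a,b,c,6/28/18\nd,e,f,12/1/17\ng,h,i,1/15/18\n"
rows = list(csv.reader(io.StringIO(raw)))

# %m is the numeric month; %b would expect an abbreviated name such as 'Jun'.
rows_sorted = sorted(rows, key=lambda row: datetime.strptime(row[3], '%m/%d/%y'))

print([row[3] for row in rows_sorted])  # ['12/1/17', '1/15/18', '6/28/18']
```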

Sorting by month-year groups by month instead

I have a curious python problem.
The script takes two CSV files: one with a column of dates and a column of text snippets, and another with a list of names (substrings).
All that the code does is step through both lists building up a name-mentioned-per-month matrix.
FILE with dates and text: (Date, Snippet first column)
ENTRY 1 : Sun 21 nov 2014 etc, The release of the iphone 7 was...
-strings file
iphone 7
apple
apples
innovation etc.
The problem is that when I try to order it so that the columns follow in ascending order, e.g. oct-2014, nov-2014, dec-2014 and so on, it just groups the months together instead, which isn't what I want.
import csv
from datetime import datetime
file_1 = input('Enter first CSV name (one with the date and snippet): ')
file_2 = input('Enter second CSV name (one with the strings): ')
outp = input('Enter the output CSV name: ')
file_1_list = []
head = True
for row in csv.reader(open(file_1, encoding='utf-8', errors='ignore')):
    if head:
        head = False
        continue
    date = datetime.strptime(row[0].strip(), '%a %b %d %H:%M:%S %Z %Y')
    date_str = date.strftime('%b %Y')
    file_1_list.append([date_str, row[1].strip()])
file_2_dict = {}
for line in csv.reader(open(file_2, encoding='utf-8', errors='ignore')):
    s = line[0].strip()
    for d in file_1_list:
        if s.lower() in d[1].lower():
            if s in file_2_dict.keys():
                if d[0] in file_2_dict[s].keys():
                    file_2_dict[s][d[0]] += 1
                else:
                    file_2_dict[s][d[0]] = 1
            else:
                file_2_dict[s] = {
                    d[0]: 1
                }
months = []
for v in file_2_dict.values():
    for k in v.keys():
        if k not in months:
            months.append(k)
months.sort()
rows = [[''] + months]
for k in file_2_dict.keys():
    tmp = [k]
    for m in months:
        try:
            tmp.append(file_2_dict[k][m])
        except:
            tmp.append(0)
    rows.append(tmp)
print("still working on it be patient")
writer = csv.writer(open(outp, "w", encoding='utf-8', newline=''))
for r in rows:
    writer.writerow(r)
print('Done...')
From my understanding, months.sort() isn't doing what I expect it to?
I have looked at another question, where they apply some other function to sort the data, using attrgetter,
from operator import attrgetter
>>> l = [date(2014, 4, 11), date(2014, 4, 2), date(2014, 4, 3), date(2014, 4, 8)]
and then
sorted(l, key=attrgetter('month'))
But I am not sure whether that would work for me.
From my understanding I parse the dates on lines 12-13 of the script. Am I missing a step that orders the data first, like
data = sorted(data, key = lambda row: datetime.strptime(row[0], "%b-%y"))
I have only just started learning Python, and so many things are new to me that I don't know what is right and what isn't.
What I want(of course with the correctly sorted data):
This took a while because you had so much unrelated stuff about reading csv files and finding and counting tags. But you already have all that, and it should have been completely excluded from the question to avoid confusing people.
It looks like your actual question is "How do I sort dates?"
Of course "Apr-16" comes before "Oct-14", didn't they teach you the alphabet in school? A is the first letter! I'm just being silly to emphasize a point -- it's because they are simple strings, not dates.
You need to convert the string to a date with the datetime class method strptime, as you already noticed. Because the class has the same name as the module, you need to pay attention to how it is imported. You then go back to a string later with the member method strftime on the actual datetime (or date) instance.
Here's an example:
from datetime import datetime
unsorted_strings = ['Oct-14', 'Dec-15', 'Apr-16']
unsorted_dates = [datetime.strptime(value, '%b-%y') for value in unsorted_strings]
sorted_dates = sorted(unsorted_dates)
sorted_strings = [value.strftime('%b-%y') for value in sorted_dates]
print(sorted_strings)
['Oct-14', 'Dec-15', 'Apr-16']
or skipping to the end
from datetime import datetime
unsorted_strings = ['Oct-14', 'Dec-15', 'Apr-16']
print (sorted(unsorted_strings, key = lambda x: datetime.strptime(x, '%b-%y')))
['Oct-14', 'Dec-15', 'Apr-16']
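Applied to the question's script, the same idea would replace the plain months.sort() with a keyed sort. The labels there are built with strftime('%b %Y'), so the key has to parse that format. A minimal sketch with made-up labels, assuming an English locale for the month abbreviations:

```python
from datetime import datetime

# Hypothetical month labels in the '%b %Y' format the script produces.
months = ['Nov 2014', 'Apr 2016', 'Oct 2014', 'Dec 2015']

# Sort by the parsed date rather than alphabetically by the label text.
months.sort(key=lambda label: datetime.strptime(label, '%b %Y'))

print(months)  # ['Oct 2014', 'Nov 2014', 'Dec 2015', 'Apr 2016']
```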

Python: Parse String as Date with Formatting

A user can input a string and the string contains a date in the following formats MM/DD/YY or MM/DD/YYYY. Is there an efficient way to pull the date from the string? I was thinking of using RegEx for \d+\/\d+\/\d+. I also want the ability to be able to sort the dates. I.e. if the strings contain 8/17/15 and 08/16/2015, it would list the 8/16 date first and then 8/17
Have a look at datetime.strptime, a built-in class method that knows how to create a datetime object from a string. It accepts the string to be converted and the format the date is written in.
from datetime import datetime
def str_to_date(string):
    pattern = '%m/%d/%Y' if len(string) > 8 else '%m/%d/%y'
    try:
        return datetime.strptime(string, pattern).date()
    except ValueError:
        raise # TODO: handle invalid input
The function returns a date() object which can be directly compared with other date() objects (e.g. when sorting them).
Usage:
>>> d1 = str_to_date('08/13/2015')
>>> d2 = str_to_date('08/12/15')
>>> d1
datetime.date(2015, 8, 13)
>>> d2
datetime.date(2015, 8, 12)
>>> d1 > d2
True
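Choosing the format by string length misclassifies inputs such as '8/6/2015', which is eight characters long but has a four-digit year. One alternative is to try each format in turn. The sketch below is not the answer's original code; parse_date and the sample inputs are assumptions made for illustration.

```python
from datetime import datetime

def parse_date(text, formats=('%m/%d/%Y', '%m/%d/%y')):
    """Return a date parsed with the first matching format, or None if none match."""
    for fmt in formats:
        try:
            return datetime.strptime(text, fmt).date()
        except ValueError:
            continue
    return None

# Hypothetical inputs, including one that is not a date at all.
inputs = ['8/17/15', '08/16/2015', '8/6/2015', 'not a date']
parsed = sorted(d for d in map(parse_date, inputs) if d is not None)
print(parsed)
```

The four-digit format is tried first because %Y rejects a two-digit year, whereas %y would accept the first two digits of '2015' and then fail on the leftover text.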
Update
OP explained in a comment that strings such as 'foo 08/13/2015 bar' should not be automatically thrown away, and that the date should be extracted from them.
To achieve that, we must first search for a candidate string in user's input:
import re
from datetime import date
user_string = input('Enter something') # use raw_input() in Python 2.x
pattern = re.compile(r'(\d{2})/(\d{2})/(\d{4}|\d{2})') # 4 digits match first!
match = re.search(pattern, user_string)
if not match:
    d = None
else:
    month, day, year = map(int, match.groups())
    if year < 100:
        year += 2000 # two-digit year: assume 20YY
    try:
        d = date(year, month, day)
    except ValueError:
        d = None # or handle error in a different way
print(d)
The code reads user input and then tries to find a pattern in it that represents a date in MM/DD/YYYY or MM/DD/YY format. Note that the last capturing group (in parentheses, i.e. ()) checks for either four or two consecutive digits.
If it finds a candidate date, it unpacks the capturing groups in the match, converting them to integers at the same time. It then uses the three matched pieces to try to create a new date() object. If that fails, the candidate date was invalid, e.g. '02/31/2015'.
Footnotes:
the code will only catch the first date candidate in the input
the regular expression used will, in its current form, also match dates in inputs like '12308/13/2015123'. If this is not desirable it would have to be modified, probably adding some lookahead/lookbehind assertions.
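Both footnotes can be addressed with lookbehind/lookahead assertions and findall. The following is a sketch under stated assumptions: DATE_RE and find_dates are hypothetical names, one- or two-digit months and days are accepted, and two-digit years are assumed to mean 20YY.

```python
import re
from datetime import date

# (?<!\d) and (?!\d) stop the pattern from matching inside a longer run of digits.
DATE_RE = re.compile(r'(?<!\d)(\d{1,2})/(\d{1,2})/(\d{4}|\d{2})(?!\d)')

def find_dates(text):
    """Return every valid date found in text, in order of appearance."""
    found = []
    for month, day, year in DATE_RE.findall(text):
        year = int(year)
        if year < 100:
            year += 2000  # assumption: two-digit years belong to the 2000s
        try:
            found.append(date(year, int(month), int(day)))
        except ValueError:
            pass  # skip impossible dates such as 02/31/2015
    return found

print(find_dates('foo 08/13/2015 bar 8/12/15'))
print(find_dates('12308/13/2015123'))  # []
```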
you could also try strptime:
import time
dates = ('08/17/15', '8/16/2015')
for date in dates:
    print(date)
    ret = None
    try:
        ret = time.strptime(date, "%m/%d/%Y")
    except ValueError:
        ret = time.strptime(date, "%m/%d/%y")
    print(ret)
UPDATE
update after comments:
this way you will get a valid date back or None if the date can not be parsed:
import time
dates = ('08/17/15', '8/16/2015', '02/31/15')
for date in dates:
    print(date)
    ret = None
    try:
        ret = time.strptime(date, "%m/%d/%Y")
    except ValueError:
        try:
            ret = time.strptime(date, "%m/%d/%y")
        except ValueError:
            pass
    print(ret)
UPDATE 2
one more update after the comments about the requirements.
this is a version (it only takes care of the dates; not the text before/after. but using the regex group this can easily be extracted):
import re
import time
dates = ('foo 1 08/17/15', '8/16/2015 bar 2', 'foo 3 02/31/15 bar 4')
for date in dates:
    print(date)
    match = re.search('(?P<date>[0-9]+/[0-9]+/[0-9]+)', date)
    date_str = match.group('date')
    ret = None
    try:
        ret = time.strptime(date_str, "%m/%d/%Y")
    except ValueError:
        try:
            ret = time.strptime(date_str, "%m/%d/%y")
        except ValueError:
            pass
    print(ret)
Why not use strptime to store them as datetime objects? These objects can easily be compared and sorted that way.
import datetime
dateList = []
try:
    date = datetime.datetime.strptime("08/03/2015", "%m/%d/%Y")
except ValueError:
    date = datetime.datetime.strptime("08/04/15", "%m/%d/%y")
finally:
    dateList.append(date)
Note the difference between %Y and %y. You can then just compare dates made this way to see which ones are greater or less. You can also sort it using dateList.sort()
If you want the date as a string again you can use:
>>> dateString = date.strftime("%Y-%m-%d")
>>> print(dateString)
2015-08-03
Why bother with regex when you can use datetime.strptime?
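You can use the date parser from pandas. (pd.datetools.parse has been removed from current pandas releases; pd.to_datetime is the replacement.)
>>> import pandas as pd
>>> timestr = ['8/8/95', '8/15/2014']
>>> [pd.to_datetime(d).to_pydatetime() for d in timestr]
[datetime.datetime(1995, 8, 8, 0, 0), datetime.datetime(2014, 8, 15, 0, 0)]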
Using regex groups we'd get something like this:
import re
ddate = '08/16/2015'
reg = re.compile(r'(\d+)/(\d+)/(\d+)')
matching = reg.match(ddate)
if matching is not None:
    print(matching.groups())
Would yield
('08', '16', '2015')
You could parse this afterwards, but if you wanted to get rid of the leading 0's in the first place you could use
reg = re.compile(r'0*(\d+)/0*(\d+)/(\d+)')

greater than 'date' python 3

I would like to be able to do greater than and less than against dates. How would I go about doing that? For example:
date1 = "20/06/2013"
date2 = "25/06/2013"
date3 = "01/07/2013"
date4 = "07/07/2013"
datelist = [date1, date2, date3]
for j in datelist:
    if j <= date4:
        print(j)
If I run the above, I get date3 back and not date1 or date2. I think I need to get the system to realise it's a date, and I don't know how to do that. Can someone lend a hand?
Thanks
You can use the datetime module to convert them all to datetime objects. You are comparing strings in your example:
>>> from datetime import datetime
>>> date1 = datetime.strptime(date1, "%d/%m/%Y")
>>> date2 = datetime.strptime(date2, "%d/%m/%Y")
>>> date3 = datetime.strptime(date3, "%d/%m/%Y")
>>> date4 = datetime.strptime(date4, "%d/%m/%Y")
>>> datelist = [date1, date2, date3]
>>> for j in datelist:
...     if j <= date4:
...         print(j.strftime('%d/%m/%Y'))
...
20/06/2013
25/06/2013
01/07/2013
You are comparing strings, not dates. You should use a date-based object-type, such as datetime.
How to compare two dates?
You can use the datetime module:
>>> from datetime import datetime
>>> d = datetime.strptime(date4, '%d/%m/%Y')
>>> for j in datelist:
...     d1 = datetime.strptime(j, '%d/%m/%Y')
...     if d1 <= d:
...         print(j)
...
20/06/2013
25/06/2013
01/07/2013
The problem with your comparison is that a string comparison first compares the first character, followed by the second one, and the third, and so on. You can of course convert the strings to dates, like the other answers suggest, but there is a different solution as well.
In order to compare dates as strings you need to have them in a different format, such as 'yyyy-mm-dd'. This way the comparison looks first at the year, then the month, and finally the day:
>>> d1 = '2012-10-11'
>>> d2 = '2012-10-12'
>>> if d2 > d1:
...     print('this works!')
...
this works!
The advantages of this are simplicity (for me at least) and performance, because it saves converting strings to dates (and possibly back) while still comparing the dates reliably. In the programs I use, I compare dates a lot as well. Since I take the dates from files, they are always strings to begin with, and because performance is an issue with my program I normally like to compare dates as strings in this way.
Of course this would mean you would have to convert your dates to a different format, but if that is a one time action, it could well be worth the effort.. :)
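That one-time conversion could be sketched as follows for the dd/mm/yyyy strings in the question. The helper name to_iso is an assumption; the date values are the ones from the question.

```python
from datetime import datetime

def to_iso(text, fmt='%d/%m/%Y'):
    """Convert a date string in the given format to 'yyyy-mm-dd'."""
    return datetime.strptime(text, fmt).strftime('%Y-%m-%d')

datelist = ['20/06/2013', '25/06/2013', '01/07/2013']
cutoff = to_iso('07/07/2013')

# After conversion, plain string comparison orders the dates chronologically.
iso_dates = [to_iso(d) for d in datelist]
print([d for d in iso_dates if d <= cutoff])
# ['2013-06-20', '2013-06-25', '2013-07-01']
```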
