How to convert relative times expressed in humanised words like "-100 days, -6 months, -1 year, +1 year" into YYYY-MM-DD format?
I am posting my answer to help others if they need the same thing.
I am developing a CLI application. As part of the process, the user inputs a start date and an end date, but I want the user to be able to use relative times like the following:
(for the sake of an example, assume October 23, 2017 is the current date)
$ cliapp.py --start_time="10 days ago" --end_time="yesterday"
10 days ago is "2017-10-13"
yesterday is "2017-10-22"
$ cliapp.py --start_time="tomorrow"
tomorrow is "2017-10-24"
To accomplish this I found the dateparser module, which does exactly what I need.
Here is the link to it: https://dateparser.readthedocs.io/en/latest/
If you have another solution, feel free to post it in a comment. :)
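For readers who only need a handful of phrases, the arithmetic behind such a conversion can be sketched with the standard library alone. This is not how dateparser works internally; `relative_to_iso` is a hypothetical helper that handles only "yesterday", "today", "tomorrow", "N days ago" and "+N days" / "-N days", with the current date passed in explicitly so the result is reproducible.

```python
import re
from datetime import date, timedelta

def relative_to_iso(text, today):
    """Convert a few relative phrases to YYYY-MM-DD (hypothetical helper)."""
    text = text.strip().lower()
    fixed = {"yesterday": -1, "today": 0, "tomorrow": 1}
    if text in fixed:
        return (today + timedelta(days=fixed[text])).isoformat()
    # "10 days ago" style
    m = re.fullmatch(r"(\d+) days? ago", text)
    if m:
        return (today - timedelta(days=int(m.group(1)))).isoformat()
    # "-100 days" / "+3 days" style
    m = re.fullmatch(r"([+-])(\d+) days?", text)
    if m:
        sign = 1 if m.group(1) == "+" else -1
        return (today + timedelta(days=sign * int(m.group(2)))).isoformat()
    raise ValueError("unsupported phrase: %r" % text)

today = date(2017, 10, 23)  # the example "current date" from the post
print(relative_to_iso("10 days ago", today))  # 2017-10-13
print(relative_to_iso("yesterday", today))    # 2017-10-22
print(relative_to_iso("tomorrow", today))     # 2017-10-24
```

Months and years are deliberately left out of this sketch, since they need calendar-aware arithmetic that a library such as dateparser already provides.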
I'm astounded by some code I wrote some time ago. Without going into much detail, I have a method that runs through some objects, which have a date parameter. If the date parameter is equal to today's date, it goes on.
I have set this up on my local machine for testing and have about 695 objects, all with the same date, today. But when the action is run nothing happens, so I debugged it and found that my expression date.today() returns datetime.date(2014, 3, 19).
This is incorrect, as the date of my computer from the date command is Tue Mar 18 20:56:09 AST 2014.
I used from datetime import date. This is one of the more cryptic errors I have ever gotten. Any experience someone can share here? Thanks a lot.
The method is not timezone aware and there's no platform-independent way to make it so. What is generally done is to incorporate something like pytz and, instead of calling .today(), use:
datetime.utcnow().replace(tzinfo=pytz.utc).strftime('%Y-%m-%d')
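One possible explanation for the symptom in the question is that the Python process was using UTC while the shell reported AST. The sketch below uses only the standard library and assumes AST can be modelled as a fixed UTC-4 offset (no daylight saving rules); it shows that the same instant falls on different calendar dates in the two zones.

```python
from datetime import datetime, timedelta, timezone

# Assumption: AST is modelled as a fixed UTC-4 offset, with no DST rules.
AST = timezone(timedelta(hours=-4), "AST")

# The instant from the question (Tue Mar 18 20:56:09 AST 2014), expressed in UTC.
utc_now = datetime(2014, 3, 19, 0, 56, 9, tzinfo=timezone.utc)

print(utc_now.date())                  # 2014-03-19
print(utc_now.astimezone(AST).date())  # 2014-03-18
```

Taking the date from an aware datetime converted to the intended zone, rather than from date.today(), makes the result independent of how the process happens to be configured.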
As part of a larger personal project I'm working on, I'm attempting to separate out inline dates from a variety of text sources.
For example, I have a large list of strings (that usually take the form of English sentences or statements) that take a variety of forms:
Central design committee session Tuesday 10/22 6:30 pm
Th 9/19 LAB: Serial encoding (Section 2.2)
There will be another one on December 15th for those who are unable to make it today.
Workbook 3 (Minimum Wage): due Wednesday 9/18 11:59pm
He will be flying in Sept. 15th.
While these dates are in-line with natural text, none of them are in specifically natural language forms themselves (e.g., there's no "The meeting will be two weeks from tomorrow"—it's all explicit).
As someone who doesn't have too much experience with this kind of processing, what would be the best place to begin? I've looked into things like the dateutil.parser module and parsedatetime, but those seem to be for after you've isolated the date.
Because of this, is there any good way to separate the date from the extraneous text, for example
input: Th 9/19 LAB: Serial encoding (Section 2.2)
output: ['Th 9/19', 'LAB: Serial encoding (Section 2.2)']
or something similar? It seems like this sort of processing is done by applications like Gmail and Apple Mail, but is it possible to implement in Python?
I was also looking for a solution to this and couldn't find any, so a friend and I built a tool to do this. I thought I would come back and share it in case others find it helpful.
datefinder -- find and extract dates inside text
Here's an example:
import datefinder
string_with_dates = '''
Central design committee session Tuesday 10/22 6:30 pm
Th 9/19 LAB: Serial encoding (Section 2.2)
There will be another one on December 15th for those who are unable to make it today.
Workbook 3 (Minimum Wage): due Wednesday 9/18 11:59pm
He will be flying in Sept. 15th.
We expect to deliver this between late 2021 and early 2022.
'''
matches = datefinder.find_dates(string_with_dates)
for match in matches:
    print(match)
I am surprised that there is no mention of SUTime and dateparser's search_dates method.
from sutime import SUTime
import os
import json
from dateparser.search import search_dates
str1 = "Let's meet sometime next Thursday"
# You'll get more information about these jar files from SUTime's github page
jar_files = os.path.join(os.path.dirname(__file__), 'jars')
sutime = SUTime(jars=jar_files, mark_time_ranges=True)
print(json.dumps(sutime.parse(str1), sort_keys=True, indent=4))
"""output:
[
{
"end": 33,
"start": 20,
"text": "next Thursday",
"type": "DATE",
"value": "2018-10-11"
}
]
"""
print(search_dates(str1))
#output:
#[('Thursday', datetime.datetime(2018, 9, 27, 0, 0))]
Although I have tried other modules like dateutil, datefinder and natty (I couldn't get duckling to work with Python), these two seem to give the most promising results.
The results from SUTime are more reliable, as is clear from the above code snippet. However, SUTime fails in some basic scenarios, like parsing the text
"I won't be available until 9/19"
or
"I won't be available between (September 18-September 20)."
It gives no result for the first text and only gives the month and year for the second text.
This is, however, handled quite well by the search_dates method.
The search_dates method is more aggressive and will give all possible dates related to any words in the input text.
I haven't yet found a way to make search_dates parse the text strictly for dates. If I find one, it will be my first choice over SUTime, and I will update this answer.
You can use the dateutil module's parse method with the fuzzy option.
>>> from dateutil.parser import parse
>>> parse("Central design committee session Tuesday 10/22 6:30 pm", fuzzy=True)
datetime.datetime(2018, 10, 22, 18, 30)
>>> parse("There will be another one on December 15th for those who are unable to make it today.", fuzzy=True)
datetime.datetime(2018, 12, 15, 0, 0)
>>> parse("Workbook 3 (Minimum Wage): due Wednesday 9/18 11:59pm", fuzzy=True)
datetime.datetime(2018, 3, 9, 23, 59)
>>> parse("He will be flying in Sept. 15th.", fuzzy=True)
datetime.datetime(2018, 9, 15, 0, 0)
>>> parse("Th 9/19 LAB: Serial encoding (Section 2.2)", fuzzy=True)
datetime.datetime(2002, 9, 19, 0, 0)
If you can identify the segments that actually contain the date information, parsing them can be fairly simple with parsedatetime. There are a few things to consider though namely that your dates don't have years and you should pick a locale.
>>> import parsedatetime
>>> p = parsedatetime.Calendar()
>>> p.parse("December 15th")
((2013, 12, 15, 0, 13, 30, 4, 319, 0), 1)
>>> p.parse("9/18 11:59 pm")
((2014, 9, 18, 23, 59, 0, 4, 319, 0), 3)
>>> # It chooses 2014 since that's the *next* occurrence of 9/18
It doesn't always work perfectly when you have extraneous text.
>>> p.parse("9/19 LAB: Serial encoding")
((2014, 9, 19, 0, 15, 30, 4, 319, 0), 1)
>>> p.parse("9/19 LAB: Serial encoding (Section 2.2)")
((2014, 2, 2, 0, 15, 32, 4, 319, 0), 1)
Honestly, this seems like the kind of problem that would be simple enough to parse for particular formats and pick the most likely out of each sentence. Beyond that, it would be a decent machine learning problem.
Newer versions of the dateparser library provide search functionality.
Example
from dateparser.search import search_dates
dates = search_dates('Central design committee session Tuesday 10/22 6:30 pm')
Hi, I'm not sure the approach below counts as machine learning, but you may try it:
add some context from outside the text, e.g. the publishing time of the text message or post, the current time, etc. (your text doesn't say anything about the year)
extract all tokens using whitespace as the separator; you should get something like this:
['Th','Wednesday','9:34pm','7:34','pm','am','9/18','9/','/18', '19','12']
process them with rule sets, e.g. rules built from weekday names and/or variations of the components that form a time, and mark the matches; e.g. '%d:%dpm', '%d am', '%d/%d', '%d/ %d' etc. may indicate a time.
Note that there may be compositions, e.g. "12 / 31" is a 3-gram ('12','/','31') that should become one token of interest, "12/31".
"see" which tokens surround the marked tokens such as "9:45pm", e.g. ('Th','9/19','9:45pm') is a 3-gram formed from "interesting" tokens, and apply rules to it that may determine its meaning.
apply more specific analysis, for example: if you have 31/12, then 31 > 12 means d/m, and vice versa; but if you have 12/12, the m/d order can only be determined from context built from the text and/or from outside it.
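Two of the rules above can be sketched in a few lines: merging a spaced-out "12 / 31" into one token, and guessing the day/month order from the numeric values. Both function names are hypothetical and the rules are intentionally minimal.

```python
def merge_slash_tokens(tokens):
    """Join 3-grams like ('12', '/', '31') into a single '12/31' token."""
    merged = []
    i = 0
    while i < len(tokens):
        if (i + 2 < len(tokens) and tokens[i].isdigit()
                and tokens[i + 1] == "/" and tokens[i + 2].isdigit()):
            merged.append(tokens[i] + "/" + tokens[i + 2])
            i += 3
        else:
            merged.append(tokens[i])
            i += 1
    return merged

def classify_slash_date(token):
    """Guess the field order of a token like '31/12' from its values alone."""
    first, second = (int(part) for part in token.split("/"))
    if first > 12 and second <= 12:
        return "d/m"
    if second > 12 and first <= 12:
        return "m/d"
    return "ambiguous"  # e.g. 12/12: needs context from outside the token

print(merge_slash_tokens("due 12 / 31 at noon".split()))
print(classify_slash_date("31/12"))  # d/m
print(classify_slash_date("9/19"))   # m/d
print(classify_slash_date("12/12"))  # ambiguous
```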
Cheers
There is no perfect solution. It depends entirely on the type of data you are working with. Quickly review and analyze the data by going through a sample of it manually, then prepare a regex pattern and test whether it works.
The predefined packages all solve the date extraction problem only to a limited extent. If you can roughly identify the pattern by looking at the data, you can prepare a regex yourself. That avoids iterating over all the rules written into those packages.
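As a minimal sketch of that regex approach, the pattern below targets only the numeric forms from the question's examples: an optional weekday, a month/day pair, and an optional 12-hour time. The names `DATE_RE` and `split_date` are illustrative, and the pattern would need extending for forms such as "December 15th".

```python
import re

DATE_RE = re.compile(
    r"""
    (?:\b(?:Mon(?:day)?|Tue(?:s(?:day)?)?|Wed(?:nesday)?|Th(?:u(?:rs(?:day)?)?)?
          |Fri(?:day)?|Sat(?:urday)?|Sun(?:day)?)\.?\s+)?   # optional weekday
    \d{1,2}/\d{1,2}                                         # month/day
    (?:\s+\d{1,2}:\d{2}\s*[ap]m)?                           # optional time
    """,
    re.IGNORECASE | re.VERBOSE,
)

def split_date(line):
    """Return (date_text, remaining_text); date_text is None if nothing matched."""
    m = DATE_RE.search(line)
    if m is None:
        return None, line
    rest = (line[:m.start()] + line[m.end():]).strip()
    return m.group(0), rest

print(split_date("Th 9/19 LAB: Serial encoding (Section 2.2)"))
# ('Th 9/19', 'LAB: Serial encoding (Section 2.2)')
```

The extracted date text can then be handed to a parser such as dateutil or parsedatetime, which work best on isolated date strings.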
I'm trying to get week number with this simple script on python.
import datetime
t = datetime.date(2013,8,18)
print(t.isocalendar()[1])
It returns 33 for ISO format, but for the US calendar it should be 34.
How can I get this week number for US format?
I ran into the same problem. The ISO calendar is a great concept and may have some really good uses, but for business in the U.S. it just doesn't work well. I can't write an application that could potentially report days in the end of December as being in the next calendar year. The solution that works for me is:
from datetime import datetime
today = datetime.today()
print(today.strftime("%U"))
This returns the week number with Sunday as the first day of the week. The only caveat with this is that all days in a new year preceding the first Sunday are considered to be in week 0. Credit to Chaggster who gave a similar answer here: How to get week number in Python?
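For the specific date in the question, %U also yields 33, because the days before the first Sunday of 2013 fall into week 0. A sketch of one way to get 34, assuming "US week number" means that weeks start on Sunday and week 1 is the week containing January 1 (`us_week_number` is a hypothetical helper, not a standard library function):

```python
from datetime import date

def us_week_number(d):
    """Week number where week 1 contains January 1 and weeks start on Sunday."""
    jan1 = date(d.year, 1, 1)
    # isoweekday(): Monday=1 ... Sunday=7; "% 7" maps Sunday to 0.
    jan1_offset = jan1.isoweekday() % 7
    day_of_year = d.timetuple().tm_yday
    return (day_of_year + jan1_offset - 1) // 7 + 1

d = date(2013, 8, 18)
print(d.isocalendar()[1])  # 33 (ISO week)
print(d.strftime("%U"))    # 33 (Sunday-first, days before first Sunday are week 0)
print(us_week_number(d))   # 34
```

With this definition the last days of December always stay in the current calendar year, at the cost of a possibly short first and last week.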
I'm in calendar hell, and I'm hoping there exists a Python module out there that does what I want.
I'm making a Python web app that deals with subscriptions. It's conceptually similar to a cell phone plan: You start your subscription on a certain date (say 1.13.2011), and for every billing month you have a bunch of "sessions" (phone calls), that you would be charged for.
We need to:
Know under which billing month each session falls.
Know the start time and end time of each billing month.
For example, if you signed up on 1.13.2011, and made a phone call on 1.20.2011, it would count on your first billing month. Same for a phone call on 2.10.2011. But if you were to make a phone call on 2.15.2011, it will count on your second billing month.
Regarding start and end dates: If today is 2.15.2011, then the start date of the current month is 2.13.2011 and its end date is 3.13.2011.
You may be thinking this is not so complicated, but then you have to consider that months have different lengths. The rule for handling this is that if your subscription started on the 30th of whatever month, its cutoff dates on each month would be min(30, n_days_in_that_month). This goes for 29, 30 and 31 as well.
I tried coding this, but it got too complex. What I'm looking for is a ready-made, existing module that does these things.
For the love of God don't post an answer with a sketch of an implementation! This is useless for me. I appreciate your intentions, but in calendar hell, sketches of implementations do not help. I already have a sketch of an implementation, and debugging yours will take just as long as debugging mine.
I am only interested in using an existing module that handles such calendar tasks. Do you know one?
http://labix.org/python-dateutil
Ram's edit: The dateutil.rrule.rrule class is the one that did exactly what I wanted.
Regarding start and end dates: If today is 2.15.2011, then the start date of the current month is 2.13.2011 and its end date is 3.13.2011.
You may be thinking this is not so complicated, but then you have to consider that months have different lengths. The rule for handling this is that if your subscription started on the 30th of whatever month, its cutoff dates on each month would be min(30, n_days_in_that_month). This goes for 29, 30 and 31 as well.
It's still pretty basic. Use the datetime module to store datetimes, so you can easily pull out the day (e.g., if dt is a date then dt.day). Say a billing cycle starts on the 29th (the toughest type of case). Let billing_cycle_day=29. A billable event occurs on, say, event_day=10, event_month=5. Then since event_day < billing_cycle_day you bill to event_month's bill. Otherwise you bill to the next month's bill (remembering that if month=12, you have to increment the year).
So now the billing cycle will always run from the 29th to the 28th of the next month. The complication arises when a date like 2/29/2011 doesn't exist. E.g., a billing cycle start_date should be 2/29/2011 (but it doesn't exist); in this case you just make it the first of the next month.
billing_cycle_day = 29
year, month = 2011, 2
import datetime
def create_date_or_first_of_next_month(year, month, day):
    try:
        return datetime.date(year, month, day)
    except ValueError:
        year_n, month_n = (year, month+1) if month != 12 else (year+1, 1)
        return datetime.date(year_n, month_n, 1)
This problem is not as hard as you think. All you have to do is write a function that, given a starting day (like 13 or 30), returns two date objects which are the beginning and end of the current fiscal month. You have already sketched out all the details in your question. It is best to include an optional todayis parameter in the function so that you can specify what day to use as a reference for today. For instance, if today is the 15th of October 2011, and you specify 13, the function would assume that you mean the 13th of October 2011. But if you want to rerun June data, you would specify todayis=date(2011, 6, 13)
The return values (start and end) allow you to pinpoint dates that belong in this fiscal month. But if the date is before the start date and less than 29 days before the start date, then you can also pinpoint in the previous fiscal month. The same goes for the next fiscal month. This is useful because there will be a lot of situations where you process data after a few days, so you will have a mix of two fiscal months to process.
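A minimal sketch of such a function, using only the standard library and the min(start_day, n_days_in_that_month) rule from the question. The names `clamp_day`, `shift_month` and `billing_period` are hypothetical, and the returned end date is taken to be the start of the next billing month, as in the question's 2.13.2011 to 3.13.2011 example.

```python
import calendar
from datetime import date

def clamp_day(year, month, day):
    """Return date(year, month, min(day, days_in_month))."""
    last = calendar.monthrange(year, month)[1]
    return date(year, month, min(day, last))

def shift_month(year, month, delta):
    """Move (year, month) forward or backward by delta months."""
    index = year * 12 + (month - 1) + delta
    return index // 12, index % 12 + 1

def billing_period(start_day, todayis=None):
    """Return (start, end) of the billing month that contains todayis."""
    today = todayis or date.today()
    start = clamp_day(today.year, today.month, start_day)
    if today < start:
        # The cutoff has not been reached yet: the period began last month.
        y, m = shift_month(today.year, today.month, -1)
        start = clamp_day(y, m, start_day)
    y, m = shift_month(start.year, start.month, 1)
    end = clamp_day(y, m, start_day)
    return start, end

print(billing_period(13, todayis=date(2011, 2, 15)))
# (datetime.date(2011, 2, 13), datetime.date(2011, 3, 13))
```

A session then belongs to the period for which start <= session_date < end.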