How to replace a pattern using regular expression? - python

string1 = "2018-Feb-23-05-18-11"
I would like to replace a particular pattern in a string.
Output should be 2018-Feb-23-5-18-11.
How can I do that using re.sub?
Example:
import re
output = re.sub(r'10', r'20', "hello number 10, Agosto 19")
#hello number 20, Agosto 19
I am fetching the current time and formatting it with the datetime module:
import time
import datetime
ts = time.time()
st = datetime.datetime.fromtimestamp(ts).strftime("%Y-%b-%d-%I-%M-%S")
I thought re.sub would be the best way to do that.
ex1 :
string1 = "2018-Feb-23-05-18-11"
output : 2018-Feb-23-5-18-11
ex2 :
string1 = "2018-Feb-23-05-8-11"
output : 2018-Feb-23-5-08-11

When working with dates and times, it is almost always best to convert the string into a Python datetime object first rather than trying to alter it with a regular expression. The datetime can then be converted back into the required format more easily.
As for leading zeros, the standard strftime() directives for day, hour, minute and second are zero-padded, so to get more flexibility it is sometimes necessary to mix strftime() with standard Python string formatting:
from datetime import datetime
for test in ['2018-Feb-23-05-18-11', '2018-Feb-23-05-8-11', '2018-Feb-1-0-0-0']:
    dt = datetime.strptime(test, '%Y-%b-%d-%H-%M-%S')
    # Only the minute keeps its leading zero; the month name comes from strftime()
    print('{dt.year}-{}-{dt.day}-{dt.hour}-{dt.minute:02}-{dt.second}'.format(dt.strftime('%b'), dt=dt))
Giving you:
2018-Feb-23-5-18-11
2018-Feb-23-5-08-11
2018-Feb-1-0-00-0
This uses a .format() function to combine the parts. It allows objects to be passed and the formatting is then able to access the object's attributes directly. The only part that needs to be formatted using strftime() is the month.
This would give the same results:
import re
for test in ['2018-Feb-23-05-18-11', '2018-Feb-23-05-8-11', '2018-Feb-1-0-0-0']:
    # int() strips leading zeros; {:02} pads the minute back to two digits
    print(re.sub(r'(\d+-\w+)-(\d+)-(\d+)-(\d+)-(\d+)', lambda x: '{}-{}-{}-{:02}-{}'.format(x.group(1), int(x.group(2)), int(x.group(3)), int(x.group(4)), int(x.group(5))), test))
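If the one-line re.sub above is hard to follow, the same idea can be sketched with named groups. This is only an illustration: the names TIMESTAMP, _rebuild and normalize_timestamp are made up here, and it assumes the input always follows the YYYY-Mon-DD-HH-MM-SS layout from the question.

```python
import re

# Hypothetical pattern: one named group per field of "YYYY-Mon-DD-HH-MM-SS".
TIMESTAMP = re.compile(
    r'(?P<year>\d{4})-(?P<month>[A-Za-z]{3})-(?P<day>\d{1,2})'
    r'-(?P<hour>\d{1,2})-(?P<minute>\d{1,2})-(?P<second>\d{1,2})'
)

def _rebuild(match):
    # int() drops leading zeros; only the minute is padded back to two digits.
    return '{}-{}-{}-{}-{:02}-{}'.format(
        match.group('year'),
        match.group('month'),
        int(match.group('day')),
        int(match.group('hour')),
        int(match.group('minute')),
        int(match.group('second')),
    )

def normalize_timestamp(text):
    """Rewrite every timestamp found in text (hypothetical helper)."""
    return TIMESTAMP.sub(_rebuild, text)

for test in ['2018-Feb-23-05-18-11', '2018-Feb-23-05-8-11', '2018-Feb-1-0-0-0']:
    print(normalize_timestamp(test))
```

It should print the same three lines as the examples above. Unlike the datetime version, it does not check that the values form a real date.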

Use the datetime module.
Ex:
import datetime
string1 = "2018-Feb-23-05-18-11"
d = datetime.datetime.strptime(string1, "%Y-%b-%d-%H-%M-%S")
print("{0}-{1}-{2}-{3}-{4}-{5}".format(d.year, d.strftime("%b"), d.day, d.hour, d.minute, d.second))
Output:
2018-Feb-23-5-18-11

Related

How to detect dash or underscore in datetime string to use in strptime?

I have several thousand files which feature datetime in their file name.
Sadly, the divider between the datetime blocks is not always the same.
Example:
Data_trul-100A1-Berlin_2019-01-31_150430.dat
Data_tral-2000B2-Frankf-2018_02_27-190200.dat
Data_bash-300003_Hambrg_2017-04-12_210500.dat
I managed to find the datetime part in the string with a regular expression:
import re
strings = ['Data_trul-100A1-Berlin_2019-01-31_150430.dat',
           'Data_tral-2000B2-Frankf-2018_02_27-190200.dat',
           'Data_bash-300003_Hambrg_2017-04-12_210500.dat']
for part_string in strings:
    match = re.search(r'\d{4}[-_]\d{2}[-_]\d{2}[-_]\d{6}', part_string)
    print(match.group())
However, now I am stuck converting the matched group to a datetime
from datetime import datetime
date = datetime.strptime(match.group(), "%Y-%m-%d_%H%M%S")
because I need to specify dashes or underscores.
I came up with the following solution to just replace it, but that feels like cheating.
for part_string in strings:
    part_string = part_string.replace('-', "_")
    match = re.search(r'\d{4}_\d{2}_\d{2}_\d{6}', part_string)
    date = datetime.strptime(match.group(), "%Y_%m_%d_%H%M%S")
    print(date)
Is there a more elegant way? Using regex to find the divider and pass it on to strptime?
You could change your regular expression to capture the four elements separately:
match = re.search(r'(\d{4})[-_](\d{2})[-_](\d{2})[-_](\d{6})', part_string)
Then combine them into one standard string format and parse that:
fixedstring = "{}_{}_{}_{}".format(*match.groups())
date = datetime.strptime(fixedstring, "%Y_%m_%d_%H%M%S")
Of course, at this point you could just as well split the HHMMSS part into its own elements and build the datetime object directly. Note that group(0) is the whole match, so the captured fields start at group(1), and they must be converted to int:
m = re.search(r'(\d{4})[-_](\d{2})[-_](\d{2})[-_](\d{2})(\d{2})(\d{2})', part_string)
date = datetime(year=int(m.group(1)),
                month=int(m.group(2)),
                day=int(m.group(3)),
                hour=int(m.group(4)),
                minute=int(m.group(5)),
                second=int(m.group(6)))
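Another option is to avoid the divider altogether: capture the four blocks, join them without a separator and hand the result to strptime. A minimal sketch, assuming the file names always contain a YYYY?MM?DD?HHMMSS block with "-" or "_" as divider; the names STAMP and extract_datetime are invented for this example.

```python
import re
from datetime import datetime

# Date and time blocks separated by either "-" or "_" (the two dividers in the question).
STAMP = re.compile(r'(\d{4})[-_](\d{2})[-_](\d{2})[-_](\d{6})')

def extract_datetime(filename):
    """Return the datetime embedded in filename, or None if none is found."""
    match = STAMP.search(filename)
    if match is None:
        return None
    # Joining the captured groups without a divider makes the separator irrelevant.
    return datetime.strptime(''.join(match.groups()), '%Y%m%d%H%M%S')

strings = ['Data_trul-100A1-Berlin_2019-01-31_150430.dat',
           'Data_tral-2000B2-Frankf-2018_02_27-190200.dat',
           'Data_bash-300003_Hambrg_2017-04-12_210500.dat']
for part_string in strings:
    print(extract_datetime(part_string))
```

Because strptime still does the parsing, an impossible value such as 80 seconds raises a ValueError instead of being silently accepted.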

In Python, how to parse a datetime from a string which also contains other words

I'm familiar with dateutil.parser which allows one to parse a string representing a time into a datetime object. What I would like to do, however, is to 'search' for such a 'time string' within a larger string representing an interval of time. For example:
from datetime import timedelta
import dateutil.parser
import parse
start = dateutil.parser.parse("5 Nov 2016 15:00")
end = start + timedelta(hours=1)
string = "from {start} till {end}".format(start=start, end=end)
start_pattern = "from {:tg}"
result = parse.search(start_pattern, string)
I'd like to recover the start and end as datetime objects based on the fact that they follow the words "from" and "till", respectively.
Here I have tried to use the parse module, but the format specifier :tg (for global time syntax) doesn't seem to work on datetime's default string representation, nor do the other available ones look similar to the one in string.
What would be a simple and elegant way to parse back the start and end in this example?
The re package could help you in this case; just make regular expressions for the strings you want to match, and use them to extract the date part.
I found a way to do it using a regular expression:
from datetime import timedelta
import dateutil.parser
import re
start = dateutil.parser.parse("5 Nov 2016 15:00")
end = start + timedelta(hours=1)
string = "from {start} till {end}".format(start=start, end=end)
pattern = r'(?:\s*from\s*)(?P<start>.+)(?:\s*till\s*)(?P<end>.+)(?:\s*)'
groups = re.match(pattern, string).groupdict()
parsed_start = dateutil.parser.parse(groups['start'])
parsed_end = dateutil.parser.parse(groups['end'])
assert parsed_start == start
assert parsed_end == end
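If dateutil is not available, a similar result can be had with the standard library alone. The sketch below assumes the timestamps are in datetime's default str() form without microseconds or time zone (for example 2016-11-05 15:00:00); STAMP, INTERVAL and parse_interval are names made up for this example.

```python
import re
from datetime import datetime, timedelta

# str(datetime) without microseconds or tzinfo looks like "2016-11-05 15:00:00".
STAMP = r'\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2}'
INTERVAL = re.compile(r'from\s+(?P<start>{0})\s+till\s+(?P<end>{0})'.format(STAMP))

def parse_interval(text):
    """Return (start, end) datetimes found after "from" and "till" (hypothetical helper)."""
    match = INTERVAL.search(text)
    if match is None:
        raise ValueError('no "from ... till ..." interval in {!r}'.format(text))
    fmt = '%Y-%m-%d %H:%M:%S'
    return (datetime.strptime(match.group('start'), fmt),
            datetime.strptime(match.group('end'), fmt))

start = datetime(2016, 11, 5, 15, 0)
end = start + timedelta(hours=1)
text = 'The meeting runs from {} till {} in room 4.'.format(start, end)
print(parse_interval(text))
```

Matching the timestamp shape explicitly, rather than with .+, means surrounding words do not end up in the captured groups.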

How do you add : to a time formatted like this 203045 in python?

I've been trying to get this time value formatted from 203045 to 20:30:45 in Python. I clearly have no clue where to start. Any help will be appreciated!
Thanks!
Use the strptime and strftime functions from datetime: the former constructs a datetime object from a string, and the latter formats a datetime object as a string in the given format:
from datetime import datetime
datetime.strptime("203045", "%H%M%S").strftime("%H:%M:%S")
# '20:30:45'
You can also use a regular expression to get the same result :)
import re
ch = "203045"
print(":".join(re.findall(r'\d{2}', ch)))
# 20:30:45
Try this to remove the last two digits if they are equal to zero (note that the filter drops every "00" pair, not only the last one):
import re
ch = "20304500"
print(":".join([e for e in re.findall(r'\d{2}', ch) if e != "00"]))
# 20:30:45
Or drop the last two digits whatever they are:
import re
ch = "20304500"
print(":".join(re.findall(r'\d{2}', ch)[:-1]))
# 20:30:45
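The variants above can be combined into one small helper that keeps only the first six digits and lets strptime validate them. This is just a sketch; add_colons is a made-up name, and it assumes any digits after HHMMSS (such as hundredths) can be ignored.

```python
from datetime import datetime

def add_colons(raw):
    """Format the first six digits of raw as HH:MM:SS (hypothetical helper)."""
    # Slicing keeps HHMMSS and ignores anything after it, such as hundredths.
    hhmmss = raw[:6]
    # strptime raises ValueError if the digits do not form a valid time.
    return datetime.strptime(hhmmss, '%H%M%S').strftime('%H:%M:%S')

print(add_colons('203045'))
print(add_colons('20304500'))
print(add_colons('20004500'))
```

The last call keeps the "00" minutes, which the list-comprehension filter would have dropped.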

Extract date string from (more) complex string (possibly a regex match)

I have a string template that looks like 'my_index-{year}'.
I do something like string_template.format(year=year) where year is some string. Result of this is some string that looks like my_index-2011.
Now, to my question: I have a string like my_index-2011 and my template 'my_index-{year}'. What might be a slick way to extract the {year} portion?
[Note: I know of the existence of parse library]
There is this module called parse which provides an opposite to format() functionality:
Parse strings using a specification based on the Python format() syntax.
>>> from parse import parse
>>> s = "my_index-2011"
>>> f = "my_index-{year}"
>>> parse(f, s)['year']
'2011'
An alternative, since you are extracting a year, is to use the dateutil parser in fuzzy mode:
>>> from dateutil.parser import parse
>>> parse("my_index-2011", fuzzy=True).year
2011
Use the split() string function to split the string into two parts around the dash, then grab just the second part.
mystring = "my_index-2011"
year = mystring.split("-")[1]
I assume "year" is 4 digits and you have multiple indexes. Here idx is the list of index names and data is the string to search:
import re
res = ''
patterns = ['%s-[0-9]{4}' % index for index in idx]
for index, pattern in zip(idx, patterns):
    res += ' '.join(re.findall(pattern, data)).replace(index + '-', '') + ' '
---update---
Sample input (used as data and idx above):
dummyString = 'adsf-1234 fsfdr lkjdfaif ln ewr-1234 adsferggs sfdgrsfgadsf-3456'
dummyIdx = ['ewr','adsf']
Output:
1234 1234 3456
Yes, a regex would be helpful here.
In [1]: import re
In [2]: s = 'my_string-2014'
In [3]: print( re.search(r'\d{4}', s).group(0) )
2014
Edit: I should have mentioned your regex can be more sophisticated. You can haul out a subcomponent of a more specific string, for example:
In [4]: print( re.search(r'my_string-(\d{4})$', s).group(1) )
2014
Given the problem you presented, I think any "find the year" formula should be expressible in terms of a regular expression.
You are going to want to use the string method split to split on "-", and then catch the last element as your year:
year = "any_index-2016".split("-")[-1]
Because you caught the last element (using -1 as the index), your index can contain hyphens, and you will still extract the year correctly.
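If you want to stay with the template itself but without the parse library, one possibility is to turn the format template into a regular expression. The sketch below is an assumption-laden illustration, not the parse library's implementation: template_to_regex and extract are invented names, and only named fields such as {year} are supported (no positional fields or format specs).

```python
import re
from string import Formatter

def template_to_regex(template):
    """Turn a str.format() template into a compiled regex (hypothetical helper)."""
    parts = []
    for literal, field, _spec, _conv in Formatter().parse(template):
        parts.append(re.escape(literal))  # literal text must match exactly
        if field:  # a named replacement field such as {year}
            parts.append('(?P<{}>.+?)'.format(field))
    return re.compile(''.join(parts))

def extract(template, text):
    """Return the field values as a dict, or None if text does not fit the template."""
    match = template_to_regex(template).fullmatch(text)
    return match.groupdict() if match else None

print(extract('my_index-{year}', 'my_index-2011'))
```

string.Formatter().parse() is the standard-library function that splits a template into literal text and field names, so the same template can be used for both format() and extraction.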

How to replace string with certain format in python

I am trying to do string manipulation based on a format. str.replace(old, new) allows replacing a specific substring. Is it possible to find and replace by format? For example,
I want to find all datetime-like values in a long string and replace them with another format.
Assuming % is a wildcard for a digit and the datetime looks like %%/%%/%%T%%:%%
str.replace(%%/%%/%%T%%:%%, 'dummy value')
EDIT:
Sorry, I should have been clearer. It seems I can use re.sub for this, but how do I substitute a converted date value? In this case, e.g.
YY/MM/DDTHH:MM to (YY/MM/DD HH:MM)+8 hours
The easiest way to do this is probably to combine regular expressions with datetime: re.sub accepts a function as its repl parameter, which takes a match object and returns the replacement string, and that function can use strptime and strftime to convert the matched text:
>>> from datetime import datetime
>>> import re
>>> def replacer(match):
...     return datetime.strptime(
...         match.group(),  # matched text
...         '%y/%m/%dT%H:%M',  # source format in datetime syntax
...     ).strftime('%d %B %Y at %H.%M')  # destination format in datetime syntax
>>> re.sub(
...     r'\d{2}/\d{2}/\d{2}T\d{2}:\d{2}',  # source format in regex syntax
...     replacer,  # function to process match
...     'The date and time was 12/12/12T12:12 exactly.',  # string to process
... )
'The date and time was 12 December 2012 at 12.12 exactly.'
The only downside of this is that you need to define the source format in both datetime and re syntax, which isn't very DRY; if the regex matches text that the strptime format cannot parse, strptime raises a ValueError.
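The edit to the question also asks for the value shifted by 8 hours, which the replacer above does not do. A minimal sketch of that step, assuming the output layout YY/MM/DD HH:MM from the question; SOURCE_RE, SOURCE_FMT, TARGET_FMT and shift_match are names invented for this example.

```python
import re
from datetime import datetime, timedelta

SOURCE_RE = re.compile(r'\d{2}/\d{2}/\d{2}T\d{2}:\d{2}')  # YY/MM/DDTHH:MM in regex syntax
SOURCE_FMT = '%y/%m/%dT%H:%M'  # the same layout in strptime syntax
TARGET_FMT = '%y/%m/%d %H:%M'  # assumed output layout

def shift_match(match, hours=8):
    """Return the matched timestamp moved forward by `hours`."""
    try:
        moment = datetime.strptime(match.group(), SOURCE_FMT)
    except ValueError:
        # Looks like a timestamp but is not a real date: leave it untouched.
        return match.group()
    return (moment + timedelta(hours=hours)).strftime(TARGET_FMT)

text = 'Started 12/12/12T12:12, finished 12/12/31T20:00, bogus 99/99/99T99:99.'
print(SOURCE_RE.sub(shift_match, text))
```

Doing the arithmetic on a datetime object means day, month and year rollovers are handled by timedelta rather than by string manipulation.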
