Why is this python exception not ValueError? - python

The code:
import datetime
TF = "%d-%M-%Y %H:%M"
last= datetime.datetime.strptime( "11/07/10 10:00", TF)
Throws the following exception:
Traceback (most recent call last):
File "strange.py", line 4, in <module>
last= datetime.datetime.strptime( "11/07/10 10:00", TF)
File "/System/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/_strptime.py", line 308, in _strptime
format_regex = _TimeRE_cache.compile(format)
File "/System/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/_strptime.py", line 265, in compile
return re_compile(self.pattern(format), IGNORECASE)
File "/System/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/re.py", line 194, in compile
return _compile(pattern, flags)
File "/System/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/re.py", line 251, in _compile
raise error, v # invalid expression
sre_constants.error: redefinition of group name 'M' as group 5; was group 2
Now I believe my error is that I use %M twice when defining the date format. Here's my query:
I would expect the code to either:
a) accept the fact that you might have the same time value twice in a string (it might be redundant, but so is "Monday" if you have the rest of the date), or
b) throw a ValueError saying that the same field shouldn't be used more than once.
This looks like something very different. What's going on?

ValueError is raised when "a built-in operation or function receives an argument that has the right type but an inappropriate value" (docs). Here that would mean passing TF as a malformed format string, one containing a directive that doesn't exist (try %K, for example).
Every directive you used is valid, but as the error message says, the failure happened in the SRE (regex) parsing step. Each %x directive is translated into a named group, and you defined the same group twice. The regex compiler rejects that, because one group name M cannot refer to two different parts of the string, and it has no way to guess which one you meant.
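For contrast, a minimal sketch of the ValueError case mentioned above, assuming a current CPython 3 interpreter. The input string and the use of %K are purely illustrative.

```python
from datetime import datetime

message = None
try:
    # %K is not a strptime directive, so the format string itself is rejected.
    datetime.strptime("11-07-2010", "%d-%K-%Y")
except ValueError as exc:
    message = str(exc)

print(message)
```

The message names the unknown directive, which is the kind of diagnostic the question was hoping to get for the duplicated %M.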

Nothing directly detected the specific error you made.
The datetime module turned your strptime format into a regular expression to do the actual parsing, without analyzing it (or having any need to analyze it) in sufficient detail to notice the duplicated field. This resulted in an invalid regular expression, and the re module rightfully threw an error - one which I'd consider to be closer to a SyntaxError than a ValueError. The datetime module passed this on without trying to figure out the source of the problem.
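Since nothing checks for this up front, one option is to check the format yourself before calling strptime. The sketch below is only an illustration: duplicated_directives is a hypothetical helper, not part of datetime, and it only looks at single-letter directives. The last part reproduces the underlying regex failure directly with two groups that share a name.

```python
import re
from collections import Counter

def duplicated_directives(fmt):
    """Return the strptime directive letters that occur more than once in fmt."""
    # "%%" is a literal percent sign, so drop it before scanning.
    letters = re.findall(r"%([A-Za-z])", fmt.replace("%%", ""))
    counts = Counter(letters)
    return sorted(letter for letter, n in counts.items() if n > 1)

print(duplicated_directives("%d-%M-%Y %H:%M"))  # ['M']
print(duplicated_directives("%d-%m-%Y %H:%M"))  # []

# The same failure without datetime: a group name defined twice.
regex_error = None
try:
    re.compile(r"(?P<M>\d\d)-(?P<M>\d\d)")
except re.error as exc:
    regex_error = str(exc)
print(regex_error)
```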

You have two errors:
The format does not match the input: TF uses "-" as the date separator, but the string you passed uses "/". The separators in the format must match those in the string.
You used the minutes directive twice (%M = minutes, %m = month).
Also note that %Y expects a four-digit year, so a two-digit year such as "10" needs %y. This is the corrected code:
import datetime
TF = "%d-%m-%y %H:%M"
last= datetime.datetime.strptime( "11-07-10 10:00", TF)
good luck!
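If you would rather keep the original input string unchanged, the format can use "/" instead. This is a small sketch that assumes the date is day-first, as the original format suggests.

```python
import datetime

# Format whose separators match the original "/"-separated input.
TF = "%d/%m/%y %H:%M"
last = datetime.datetime.strptime("11/07/10 10:00", TF)
print(last)  # 2010-07-11 10:00:00
```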

Related

Regex 'sre_constants.error: bad character range' in large regex pattern

The following is the error message:
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
File "/usr/lib/python2.7/re.py", line 194, in compile
return _compile(pattern, flags)
File "/usr/lib/python2.7/re.py", line 251, in _compile
raise error, v # invalid expression
sre_constants.error: bad character range
This is my object:
>>> re101121=re.compile("""(?i)激[ _]{0,}活[ _]{0,}邮[ _]{0,}箱|(click|clicking)[ _]{1,}[here ]{0,1}to[ _]{1,}verify|stop[ _]{1,}mail[ _]{1,}.{1,16}[ _]{1,}here|(click|clicking|view|update)([ _-]{1,}|\\xc2\\xa0)(on|here|Validate)[^a-z0-9]{1}|(點|点)[ _]{0,}(擊|击)[ _]{0,}(這|这|以)[ _]{0,}(裡|里|下)|DHL[ _]{1,}international|DHL[ _]{1,}Customer[ _]{1,}Service|Online[ _]{1,}Banking|更[ _]{0,}新[ _]{0,}您[ _]{0,}的[ _]{0,}(帐|账)[ _]{0,}户|CONFIRM[ _]{1,}ACCOUNT[ _]{1,}NOW|avoid[ _]{1,}Account[ _]{1,}malfunction|confirm[ _]{1,}this[ _]{1,}request|verify your account IP|Continue to Account security|继[\\s-_]*续[\\s-_]*使[\\s-_]*用|崩[\\s-_]*溃[\\s-_]*信[\\s-_]*息|shipment[\\s]+confirmation|will be shutdown in [0-9]{0,} (hours|days)|DHL Account|保[ ]{0,}留[ ]{0,}密[ ]{0,}码|(Password|password|PASSWORD).*(expired|expiring)|login.*email.*password.*confirm|[0-9]{0,} messages were quarantined|由于.*错误(的)?(送货)?信息|confirm.*(same)? password|keep.*account secure|settings below|loss.*(email|messages)|simply login|quick verification now""")
After minimization, your error boils down to re.compile("""[\\s-_]"""). This is indeed a bad character range: \s is a character class, not a single character, so it cannot be the start of a range. You probably meant the dash to be literal: re.compile(r"[\s\-_]") (always use raw strings, r"...", for regexes). Moving the dash to the end of the bracket group works too: r"[\s_-]".
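A quick way to confirm the three variants, written as a sketch for Python 3. The helper name compiles is just for illustration.

```python
import re

def compiles(pattern):
    """Return True if pattern is a valid regular expression."""
    try:
        re.compile(pattern)
        return True
    except re.error:
        return False

print(compiles(r"[\s-_]"))    # False: \s cannot start a range
print(compiles(r"[\s\-_]"))   # True: the escaped dash is literal
print(compiles(r"[\s_-]"))    # True: a trailing dash is literal

print(re.findall(r"[\s_-]", "a-b_c d"))  # ['-', '_', ' ']
```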
In the future, try to binary search to find the minimal failing input: remove the right half of the regex. If it still fails, the problem must have been in the left half. Remove the right half of the remaining substring and repeat until you're down to a minimal failing case. This technique doesn't always work when the problem spans both halves, but it can't hurt to try.
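When the pattern is one long alternation, as here, a simpler variant of the same idea is to compile each top-level alternative on its own. Note that this is a linear scan, not a binary search. Both split_top_level and failing_alternatives are hypothetical helpers written for this sketch, and the splitter ignores corner cases such as a literal ] as the first character of a character class.

```python
import re

def split_top_level(pattern):
    """Split pattern on '|' characters outside any group or character class."""
    parts, current = [], []
    depth, in_class, i = 0, False, 0
    while i < len(pattern):
        ch = pattern[i]
        if ch == "\\" and i + 1 < len(pattern):
            # Keep an escape sequence together and never interpret it.
            current.append(pattern[i:i + 2])
            i += 2
            continue
        if in_class:
            in_class = ch != "]"
        elif ch == "[":
            in_class = True
        elif ch == "(":
            depth += 1
        elif ch == ")":
            depth -= 1
        elif ch == "|" and depth == 0:
            parts.append("".join(current))
            current = []
            i += 1
            continue
        current.append(ch)
        i += 1
    parts.append("".join(current))
    return parts

def failing_alternatives(pattern):
    """Return (alternative, error message) for each alternative that fails to compile."""
    bad = []
    for part in split_top_level(pattern):
        try:
            re.compile(part)
        except re.error as exc:
            bad.append((part, str(exc)))
    return bad

sample = r"foo|(a|b)[ _-]+c|x[\s-_]*y|bar"
for part, message in failing_alternatives(sample):
    print(part, "->", message)
```

On the sample pattern only the alternative containing [\s-_] is reported, which narrows the search to a few characters.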
As mentioned in the comments, it's pretty odd to have such a massive regex as this, but I'll assume you know what you're doing.
As another aside, there are some antipatterns in this regex (pardon the pun) like {0,} which can be simplified to *.

How to fix bad escape regex error (python re)

I've been messing around with re.sub() to see how I would change the format from Y-m-d to M/d/y. To perform the test, I defined the starting variable: current_date = "2012-05-26"
I want to convert that date to 05/26/2012.
I tried to do this with a regex rather than datetime, using re.sub as below:
formatted_date = re.sub(r"\d{2,4}-\d{1,2}-\d{1,2}", r"[^a-zA-Z]\d{1,2}/\d{1,2}/\d{2,4}", current_date)
The first regex matches the original Y-M-D format, and the second is my attempt to convert it to the format I want. I got the following error:
Traceback (most recent call last):
File "C:\Users\ghub4\AppData\Local\Programs\Python\Python39\lib\sre_parse.py", line 1039, in parse_template
this = chr(ESCAPES[this][1])
KeyError: '\\d'
During handling of the above exception, another exception occurred:
Traceback (most recent call last):
File "c:\Users\ghub4\OneDrive\Desktop\test_sub.py", line 5, in <module>
formatted_date = re.sub(r"\d{2,4}-\d{1,2}-\d{1,2}", r"[^a-zA-Z]\d{1,2}/\d{1,2}/\d{2,4}", current_date)
File "C:\Users\ghub4\AppData\Local\Programs\Python\Python39\lib\re.py", line 210, in sub
return _compile(pattern, flags).sub(repl, string, count)
File "C:\Users\ghub4\AppData\Local\Programs\Python\Python39\lib\re.py", line 327, in _subx
template = _compile_repl(template, pattern)
File "C:\Users\ghub4\AppData\Local\Programs\Python\Python39\lib\re.py", line 318, in _compile_repl
return sre_parse.parse_template(repl, pattern)
File "C:\Users\ghub4\AppData\Local\Programs\Python\Python39\lib\sre_parse.py", line 1042, in parse_template
raise s.error('bad escape %s' % this, len(this))
re.error: bad escape \d at position 9
Full Code:
import re
current_date = "2012-05-26"
formatted_date = re.sub(r"\d{2,4}-\d{1,2}-\d{1,2}", r"[^a-zA-Z]\d{1,2}/\d{1,2}/\d{2,4}", current_date)
print(formatted_date)
I've traced the error to the second regex, but I'm unsure where position 9 is or how to fix the error. I'm also confused by the first error, a KeyError raised for \\d. I suspect that somewhere in the code \d is being treated as \\d, and I'm not sure how to prevent that. I also suspect the second regex may not do what I want, and I'm working on that separately. How can I correct these errors?
The replacement string for a regex is not a regex in itself, rather it is a string which may contain references to groups captured by the original regex. In your case, you want to capture the year, month and day and then output them in the result string. You do that with () around the values you want to capture, and then refer to the groups by \1, \2, and \3 in the replacement string, with the numbers being assigned in order of the groups being captured. So for your code, you want:
formatted_date = re.sub(r"(\d{2,4})-(\d{1,2})-(\d{1,2})", r"\2/\3/\1", current_date)
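Try grouping your digits. (If your goal is testing: position 9 is the first \d in your second regex, the replacement string. \d is not a valid escape there, because a replacement template only understands group references and ordinary string escapes.)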
formatted_date = re.sub(r"(\d{2,4})-(\d{1,2})-(\d{1,2})",r"\2/\3/\1",current_date)
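The same substitution can also be written with named groups, which some people find easier to read than \1, \2 and \3. This is a sketch; the group names y, m and d are arbitrary choices.

```python
import re

current_date = "2012-05-26"

# Capture each part under a name, then refer to it with \g<name> in the replacement.
formatted_date = re.sub(
    r"(?P<y>\d{2,4})-(?P<m>\d{1,2})-(?P<d>\d{1,2})",
    r"\g<m>/\g<d>/\g<y>",
    current_date,
)
print(formatted_date)  # 05/26/2012
```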

Sympifying strings containing expressions with superscripts

I am trying to sympify a string like this:
str1="a^0_0"
ns={}
ns['a^0_0']=Symbol('a^0_0')
pprint(sympify(str1,locals=ns))
But I get the following error
Traceback (most recent call last):
File "cuaterniones_basic.py", line 114, in <module>
pprint(sympify(str1,locals=ns))
File "/usr/local/lib/python2.7/dist-packages/sympy/core/sympify.py", line 356, in sympify
raise SympifyError('could not parse %r' % a, exc)
sympy.core.sympify.SympifyError: Sympify of expression 'could not parse u'a^0_0'' failed, because of exception being raised:
SyntaxError: invalid syntax (<string>, line 1
How can I get the symbol I want?
sympify can only parse expressions if they are valid Python (with a few minor exceptions). That means that symbol names can only be parsed if they are valid Python variable names. The solution depends on the exact nature of what you are trying to parse.
If the whole string is the symbol name, just use Symbol instead of sympify.
If you are constructing the Symbol objects from known strings, wrap them in Symbol('...') in your string, like sympify("Symbol('a^0') + 1").
If you know what characters you will see, you can try swapping them before parsing, then swapping them back in the expression with replace.
>>> sympify('a^0 + 1'.replace('^', '__')).replace(lambda a: isinstance(a, Symbol), lambda a: Symbol(a.name.replace('__', '^')))
a^0 + 1
(don't confuse str.replace and SymPy's expr.replace here).
This will not work if the characters in your symbol names are also used to represent math outside of the symbol names (like if you use ^ to represent actual exponentiation).
In general, you may need to write your own parsing tool. SymPy's parsing utilities in sympy.parsing can help here.
Indeed, the parser makes a decision about the structure of your input string before it comes to converting pieces to SymPy atoms.
There are a bunch of knobs one can twist by using parse_expr instead of sympify but I haven't found one that works for this string. Instead, it may be easiest to preprocess the input with string replacement, replacing the troublesome characters with something else. This preprocessing doesn't affect the final outcome because the dictionary ns will make things right again.
str1 = "a^0_0"
new_str1 = str1.replace("^", "up")
ns = {new_str1: Symbol(str1)}
print(sympify(new_str1, locals=ns))
Prints a^0_0 which is the name of the created symbol.

Python TypeError when using variable in re.sub

I'm new to Python and I keep getting an error doing the simplest thing.
I'm trying to use a variable in a regular expression and replace the match with an *.
The following gets me the error "TypeError: not all arguments converted during string formatting" and I can't tell why. This should be so simple.
import re
file = "my123filename.zip"
pattern = "123"
re.sub(r'%s', "*", file) % pattern
Error:
Traceback (most recent call last):
File "", line 1, in ?
TypeError: not all arguments converted during string formatting
Any tips?
Your problem is on this line:
re.sub(r'%s', "*", file) % pattern
What you're doing is replacing every occurrence of the literal text %s with * in the string file (I'd recommend renaming that variable to filename, to avoid shadowing the builtin file object and to make it clearer what you're working with). Then you're applying the % formatting operator to the result of re.sub, trying to substitute pattern into it. However, that result contains no format specifiers, which leads to the TypeError you see. It's basically the same as:
'this is a string' % ("foobar!")
which will give you the same error.
What you probably want is something more like:
re.sub(str(pattern),'*',file)
which is exactly equivalent to:
re.sub(r'%s' % pattern,'*',file)
Try re.sub(pattern, "*", file)? Or maybe skip re altogether and just do file.replace("123", "*").
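If the variable may contain characters that are special in a regex (such as "." or "+"), it is safer to pass it through re.escape first. A minimal sketch, using filename rather than file as suggested above:

```python
import re

filename = "my123filename.zip"
pattern = "123"

# re.escape makes any regex metacharacters in the variable match literally.
result = re.sub(re.escape(pattern), "*", filename)
print(result)  # my*filename.zip

# Without escaping, "." matches any character.
print(re.sub("1.3", "*", "1x3 1.3"))             # * *
print(re.sub(re.escape("1.3"), "*", "1x3 1.3"))  # 1x3 *
```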

Python's Regular Expression Source String Length

In Python Regular Expressions,
re.compile("x"*50000)
gives me OverflowError: regular expression code size limit exceeded
but the following one does not raise any error, although it hits 100% CPU and took 1 minute on my PC:
>>> re.compile(".*?.*?.*?.*?.*?.*?.*?.*?.*?.*?"*50000)
<_sre.SRE_Pattern object at 0x03FB0020>
Is that normal?
Should I assume, ".*?.*?.*?.*?.*?.*?.*?.*?.*?.*?"*50000 is shorter than "x"*50000?
Tested on Python 2.6, Win32
UPDATE 1:
It looks like ".*?.*?.*?.*?.*?.*?.*?.*?.*?.*?"*50000 could be reduced to .*?
So, how about this one?
re.compile(".*?x"*50000)
It does compile, and if that one can also be reduced to ".*?x", it should match the string "abcx" or "x" alone, but it does not match.
So, am I missing something?
UPDATE 2:
My point is not to find the maximum length of a regex source string. I'd like to know why "x"*50000 is caught by the overflow check but ".*?x"*50000 is not.
That doesn't make sense to me.
Is something missing from the overflow check, is this just fine, or is it really overflowing something?
Any hints or opinions will be appreciated.
The difference is that ".*?.*?.*?.*?.*?.*?.*?.*?.*?.*?"*50000 can be reduced to ".*?", while "x"*50000 has to generate 50000 nodes in the FSM (or a similar structure used by the regex engine).
EDIT: Ok, I was wrong. It's not that smart. The reason why "x"*50000 fails, but ".*?x"*50000 doesn't is that there is a limit on size of one "code item". "x"*50000 will generate one long item and ".*?x"*50000 will generate many small items. If you could split the string literal somehow without changing the meaning of the regex, it would work, but I can't think of a way to do that.
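On the UPDATE 1 question: ".*?x"*50000 cannot be reduced to ".*?x", because every copy must consume one literal x, so the pattern needs at least 50000 x's in the input. The sketch below shows the same effect with three copies instead of 50000.

```python
import re

# Three copies instead of 50000, so the effect is easy to see.
pattern = re.compile(".*?x" * 3)   # same as ".*?x.*?x.*?x"

one_x = pattern.match("abcx")        # only one "x" available
three_x = pattern.match("axbxcx")    # three "x"s available

print(one_x)             # None
print(three_x.group())   # axbxcx
```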
You want to match 50000 "x"s, correct? If so, here is an alternative without a regex:
if "x"*50000 in mystring:
    print("found")
If you want to match 50000 "x"s using a regex, you can use a repetition count:
>>> pat=re.compile("x{50000}")
>>> pat.search(s)
<_sre.SRE_Match object at 0xb8057a30>
On my system, the maximum repetition count it accepts is 65535:
>>> pat=re.compile("x{65536}")
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
File "/usr/lib/python2.6/re.py", line 188, in compile
return _compile(pattern, flags)
File "/usr/lib/python2.6/re.py", line 241, in _compile
p = sre_compile.compile(pattern, flags)
File "/usr/lib/python2.6/sre_compile.py", line 529, in compile
groupindex, indexgroup
RuntimeError: invalid SRE code
>>> pat=re.compile("x{65535}")
>>>
I don't know if there are tweaks in Python we can use to increase that limit though.
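Both approaches side by side, as a self-contained sketch in Python 3 syntax. The haystack string is made up for the example, and the count of 50000 stays below the 65535 limit reported above.

```python
import re

haystack = "ab" + "x" * 50000 + "cd"

# Plain substring test, no regex involved.
found_plain = ("x" * 50000) in haystack

# Counted repetition keeps the pattern source short.
match = re.search(r"x{50000}", haystack)
found_regex = match is not None and match.end() - match.start() == 50000

print(found_plain, found_regex)
```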
