Python automatically converting some strings to raw strings? - python

Python seems to be automatically converting strings (not just input) into raw strings. Can somebody explain what is happening here?
Python 3.7.1 (v3.7.1:260ec2c36a, Oct 20 2018, 14:57:15) [MSC v.1915 64 bit (AMD64)] on win32
Type "help", "copyright", "credits" or "license" for more information.
>>> s = '\stest'
>>> s
'\\stest'
# looks like a raw string
>>> print(s)
\stest
>>> s = '\ntest'
>>> s
'\ntest'
# this one doesn't
>>> s = '\n test'
>>> s
'\n test'
>>> s = r'\n test'
>>> s
'\\n test'
>>> print(s)
\n test
The question marked as a duplicate for this one seems to be useful, but then I do not understand why
>>> s = '\n test'
>>> s
'\n test'
>>> repr(s)
"'\\n test'"
does not show two backslashes when I evaluate it at the prompt, but does when repr() is called on it.

\n is a valid escape sequence, and '\n' is a length-1 string (a newline character). In contrast, \s is an invalid escape sequence, so Python assumes that what you wanted there was a two-character string: a backslash character followed by an s character.
>>> len('\s')
2
What you saw in the terminal output was just the usual representation of such a length-2 string. Note that the correct way to create the string Python gave you back here would have been r'\s' or '\\s'.
>>> r'\s' == '\\s' == '\s'
True
This behavior is deprecated: since Python 3.6 an invalid escape sequence emits a DeprecationWarning, and since Python 3.12 a SyntaxWarning. A future version of Python is expected to make it a syntax error.
Since you're using v3.7.1, you could enable warnings if you want to be informed about such uses of deprecated features:
$ python -Wall
>>> '\s'
<stdin>:1: DeprecationWarning: invalid escape sequence \s
'\\s'
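A small Python 3 sketch of the string lengths involved. It deliberately avoids writing the invalid '\s' directly in source; the eval call with warnings silenced is only there to show the legacy result on interpreters that still accept it (with a DeprecationWarning before 3.12, a SyntaxWarning from 3.12 on). The variable names are illustrative.

```python
import warnings

newline = '\n'    # valid escape: a single newline character
raw = r'\n'       # raw string: backslash followed by 'n'
escaped = '\\n'   # escaped backslash: the same two characters

print(len(newline), len(raw))   # 1 2
print(raw == escaped)           # True

# Compile the source text '\s' (an invalid escape) with the warning silenced.
# Python keeps the backslash, giving the same two characters as r'\s'.
with warnings.catch_warnings():
    warnings.simplefilter("ignore")
    legacy = eval(r"'\s'")

print(legacy == r'\s' == '\\s', len(legacy))  # True 2
```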
As for your subsequent question after the edit:
>>> s = '\n test'
>>> s # this prints the repr(s)
'\n test'
>>> repr(s) # this prints the repr(repr(s))
"'\\n test'"

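A short Python 3 sketch of why the two outputs differ: the interactive prompt echoes repr() of whatever expression you type, so typing s shows repr(s), and typing repr(s) shows repr(repr(s)). The variable names are illustrative only.

```python
s = '\n test'

# What the >>> prompt shows after typing `s`: the repr, with quotes and escapes.
shown_at_prompt = repr(s)
# What the >>> prompt shows after typing `repr(s)`: the repr of that repr.
shown_for_repr = repr(repr(s))

print(s)                # a blank line (the newline), then ' test'
print(shown_at_prompt)  # '\n test'
print(shown_for_repr)   # "'\\n test'"
```

The string itself holds a single newline character; the doubled backslash only appears because repr() is applied a second time.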
Related

Evaluate ANSI escapes in Python string

Say I have the string '\033[2KResolving dependencies...\033[2KResolving dependencies...'
In the Python console, I can print this, and it'll only display once
Python 3.10.9 (main, Jan 19 2023, 07:59:38) [GCC 9.3.0] on linux
Type "help", "copyright", "credits" or "license" for more information.
>>> output = '\033[2KResolving dependencies...\033[2KResolving dependencies...'
>>> print(output)
Resolving dependencies...
Is there a way to get a string that consists solely of the printed output? In other words, I would like there to be some function
def evaluate_ansi_escapes(input: str) -> str:
    ...
such that evaluate_ansi_escapes(output) == 'Resolving dependencies...' (ideally with the correct amount of whitespace in front)
edit: I've come up with the following stopgap solution
import re

def evaluate_ansi_escapes(input: str) -> str:
    erases_regex = r"^.*(\\(033|e)|\x1b)\[2K"
    erases = re.compile(erases_regex)
    no_erases = []
    for line in input.split("\n"):
        while len(erases.findall(line)) > 0:
            line = erases.sub("", line)
        no_erases.append(line)
    return "\n".join(no_erases)
This does successfully produce output that is close enough to what I want:
>>> evaluate_ansi_escapes(output)
'Resolving dependencies...'
But I would love to know if there is a less hacky way to solve this problem, or if the whitespace preceding 'Resolving dependencies...' can be captured as well.
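One less regex-driven approach is to replay the string the way a terminal would: keep a line buffer and a cursor column, and apply each control sequence to them. ESC[2K ("erase in line", mode 2) blanks the whole line but does not move the cursor, which is exactly where the leading whitespace comes from. This is a minimal sketch, not a terminal emulator: it assumes only '\n', '\r' and ESC[<n>K matter, silently drops every other CSI sequence (colours, cursor movement), and uses a simplified CSI pattern. It reuses the question's function name.

```python
import re

# Simplified CSI pattern: ESC '[', numeric parameters, one final byte (0x40-0x7E).
CSI = re.compile(r"\x1b\[([0-9;]*)([@-~])")


def evaluate_ansi_escapes(text: str) -> str:
    """Return the text a terminal would show, for a small subset of controls.

    Interpreted: '\\n', '\\r' and ESC[<n>K (erase in line, n = 0, 1 or 2).
    Any other CSI sequence is dropped without effect.
    """
    lines = []
    buf = []  # characters on the current line; erased cells become spaces
    col = 0   # cursor column; invariant: col <= len(buf)
    i = 0
    while i < len(text):
        m = CSI.match(text, i)
        if m:
            params, final = m.groups()
            if final == "K":
                mode = int(params) if params.isdigit() else 0
                if mode == 0:    # erase from cursor to end of line
                    del buf[col:]
                elif mode == 1:  # blank from start of line through the cursor
                    for j in range(min(col + 1, len(buf))):
                        buf[j] = " "
                elif mode == 2:  # blank the whole line; cursor stays put
                    buf = [" "] * len(buf)
            i = m.end()
            continue
        ch = text[i]
        if ch == "\n":
            # Trailing spaces are stripped (this also drops genuine ones).
            lines.append("".join(buf).rstrip(" "))
            buf, col = [], 0
        elif ch == "\r":
            col = 0
        elif col == len(buf):
            buf.append(ch)
            col += 1
        else:
            buf[col] = ch  # overwrite, as a terminal does after '\r'
            col += 1
        i += 1
    lines.append("".join(buf).rstrip(" "))
    return "\n".join(lines)
```

With the string from the question, this returns 'Resolving dependencies...' preceded by 25 spaces, the width of the first copy: the second ESC[2K blanks those 25 cells, and the second copy is written after them.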

Removing non integers from a grep obtained string w/ Python and Bash

I am using grep to grab the text out of a file:
NELECT = 44.0000 total number of electrons,
and I need to save the number as a variable. I have tried a handful of methods I have found here such as using filters and findall. For some reason I can only get it to separate one zero.
So far the code looks like this:
wd=os.getcwd()
electrons=str(os.system("grep 'NELECT' "+wd+"/OUTCAR"))
VBM=(re.findall('\d+', electrons))
print VBM
And in return I get ['0'].
The result of os.system is the exit status of the command, not the output of the command -- see https://docs.python.org/3/library/os.html#os.system
$ cat OUTCAR
NELECT = 44.0000 total number of electrons,
$ python
Python 2.7.12 (default, Dec 4 2017, 14:50:18)
[GCC 5.4.0 20160609] on linux2
Type "help", "copyright", "credits" or "license" for more information.
>>> import os
>>> result = os.system("grep 'NELECT' "+os.getcwd()+"/OUTCAR")
NELECT = 44.0000 total number of electrons,
>>> result
0
The "NELECT" line was just printed by grep to stdout, but not captured in the result variable
>>> from subprocess import check_output
>>> result2 = check_output(["grep", "NELECT", os.getcwd()+"/OUTCAR"])
>>> result2
'NELECT = 44.0000 total number of electrons,\n'
>>> import re
>>> re.findall(r'\d+', result2)
['44', '0000']
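The session above is Python 2. In Python 3, check_output returns bytes, so passing its result straight to re.findall with a str pattern raises TypeError. A minimal Python 3 sketch, assuming grep is on PATH; read_nelect is a hypothetical helper name, and the decimal pattern is one reasonable choice:

```python
import re
import subprocess


def read_nelect(path: str) -> float:
    """Run grep on path and return the NELECT value as a float."""
    result = subprocess.run(
        ["grep", "NELECT", path],
        capture_output=True,  # collect stdout/stderr instead of printing them
        text=True,            # decode to str so a str regex can be applied
        check=True,           # raise CalledProcessError if grep fails (exit 1 = no match)
    )
    # A run of digits, a literal dot, then more digits, e.g. '44.0000'.
    match = re.search(r"\d+\.\d+", result.stdout)
    if match is None:
        raise ValueError("no decimal number found on the NELECT line")
    return float(match.group())
```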
Or, don't call out to grep; read the file yourself:
>>> import os
>>> import re
>>> with open(os.getcwd() + "/OUTCAR") as f:
...     for line in f:
...         if "NELECT" in line:
...             digits = re.findall(r'\d+', line)
...             break
...
>>> digits
['44', '0000']
Or, maybe don't use a regular expression:
>>> words = line.split()
>>> words[2]
'44.0000'
>>> int(float(words[2]))
44
Are you sure that electrons actually contains the grep output? For me this regex returns a list with two elements, ['44', '0000'], which is the expected behavior. So most probably something is wrong with the grep call.
Your regex won't retrieve the whole 44.0000, because \d+ only matches runs of consecutive digits, not the dot. To get the whole number, use something like \b\d+\.\d+\b, which means: a word (\b marks a word boundary; the dot must be escaped, because an unescaped . matches any character) that contains at least one digit, a dot, and at least one more digit. If the dot is optional, use something like \b(\d+(?:\.\d+)?)\b; (?:...) creates a non-capturing group, so your output will still be a single-element list.
Note that re.findall returns a list of matched strings. To get the number from the first match, use float(VBM[0]).
Edit: I forgot to add: avoid the print statement; it behaves oddly with tuples and is removed entirely in Python 3. Python 2 support ended on January 1, 2020, so it's better to migrate. You can replace the print statement with the Python 3 print function by adding from __future__ import print_function at the beginning of the file.
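A quick Python 3 comparison of the three patterns discussed above, run on the line from the question held in a string (the variable names are illustrative):

```python
import re

line = "NELECT = 44.0000 total number of electrons,"

# \d+ stops at the dot, so the number comes back in two pieces.
digits_only = re.findall(r"\d+", line)                  # ['44', '0000']
# Digits, an escaped (literal) dot, digits, between word boundaries.
with_decimals = re.findall(r"\b\d+\.\d+\b", line)       # ['44.0000']
# Optional fractional part in a non-capturing group; findall returns group 1.
optional_dot = re.findall(r"\b(\d+(?:\.\d+)?)\b", line)  # ['44.0000']

nelect = float(with_decimals[0])  # 44.0
print(digits_only, with_decimals, optional_dot, nelect)
```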

how to match 'space' symbol (and only space) in python2 regex?

I am trying to use Python v2 (2.7.5 specifically) and the 're' module for regex matching. My problem is that for my application I need to match the 'space' symbol (i.e. 0x20 in hex) and ONLY that symbol as part of the match string. The first thing I tried was '\s', and that does not work because it also matches newline, carriage return, tab and form feed characters.
The end requirement is to match a string where the first three characters are digits ('\d'), there is a comma (',') and then eight symbols that are either digits ('\d') or spaces (???).
Any suggestions on how to do that? What I have already tried...
C:\Users\jlaird>python
Python 2.7.5 (default, May 15 2013, 22:43:36) [MSC v.1500 32 bit (Intel)] on win32
Type "help", "copyright", "credits" or "license" for more information.
>>> import re
>>> goodstring = '333,000000 2'
>>> badstring = '333,000000\t2'
>>> print badstring
333,000000 2
>>> sRegex = '\d\d\d,[\s\d][\s\d][\s\d][\s\d][\s\d][\s\d][\s\d][\s\d]'
>>> cRegex = re.compile(sRegex)
>>> cRegex.match(goodstring)
<_sre.SRE_Match object at 0x023A7A30>
>>> cRegex.match(badstring)
<_sre.SRE_Match object at 0x025E82C0>
>>>
I want 'badstring' to evaluate to None because it has the tab character instead of the space. How can I do this?
Thanks jonrsharpe...works. It is always something simple that I make complicated. Sorry...
>>> sRegex = '\d\d\d,[ \d][ \d][ \d][ \d][ \d][ \d][ \d][ \d]'
>>> cRegex = re.compile(sRegex)
>>> cRegex.match(goodstring)
<_sre.SRE_Match object at 0x023A7A30>
>>> cRegex.match(badstring)
>>>
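The same fix in Python 3, written as a sketch with a raw string and a {8} quantifier in place of eight repeated classes. The re.ASCII flag is an addition: in Python 3, \d in a str pattern matches any Unicode decimal digit unless it is given.

```python
import re

goodstring = '333,000000 2'
badstring = '333,000000\t2'

# '[ \d]' is a class of a literal space (U+0020) or a digit; unlike \s it
# rejects tab, newline, carriage return, form feed and vertical tab.
# {8} repeats the class eight times; re.ASCII limits \d to 0-9.
pattern = re.compile(r'\d{3},[ \d]{8}', re.ASCII)

print(pattern.match(goodstring) is not None)  # True
print(pattern.match(badstring) is not None)   # False
```

Note that match() only anchors at the start of the string; fullmatch() would additionally reject any trailing characters.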

Suppress print newline in python 3 str.format

I am using Python 3 and am trying to use str.format in my print calls.
e.g:
print ('{0:3d} {1:6d} {2:10s} '.format (count1,count2,string1))
When I try to use the end='' to suppress the subsequent newline, this is ignored. A newline always happens.
How do I suppress the subsequent newline?
Source:
int1= 1
int2 = 999
string1 = 'qwerty'
print ( '{0:3d} {1:6d} {2:10s} '.format (int1,int2,string1))
print ('newline')
print ( '{0:3d} {1:6d} {2:10s} '.format (int1,int2,string1,end=''))
print ('newline')
Python 3.4.0 (default, Apr 11 2014, 13:05:11)
[GCC 4.8.2] on linux
Type "copyright", "credits" or "license()" for more information.
1 999 qwerty
newline
1 999 qwerty
newline
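Your problem is that the end='' argument is being passed to the format method, not to the print function. str.format silently ignores keyword arguments that the format string never references, so no error is raised and print() still adds its default newline.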
Change this line:
print ( '{0:3d} {1:6d} {2:10s} '.format (int1,int2,string1,end=''))
To this:
print ( '{0:3d} {1:6d} {2:10s} '.format (int1,int2,string1), end='')
By the way, you should also give PEP8 a read. It defines standards for Python coding styles, that you really should try to follow, unless you're working with a group of people that have agreed on some other style standards. In particular, your spacing is a bit weird around function calls - you shouldn't have spaces between function names and the argument parentheses, or between the parentheses and the first argument. I wrote my suggested solution to your problem in a way that maintains your current style, but it really should look more like this:
print('{0:3d} {1:6d} {2:10s} '.format(int1, int2, string1), end='')
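A short sketch that makes both halves visible: the stray end='' given to format is silently discarded, while the one given to print() suppresses the newline. Values are taken from the question.

```python
int1 = 1
int2 = 999
string1 = 'qwerty'

# str.format ignores keyword arguments its template never uses, so this
# end='' is discarded instead of raising an error.
line = '{0:3d} {1:6d} {2:10s} '.format(int1, int2, string1, end='')

# Passed to print() instead, end='' replaces the default '\n',
# so 'newline' is printed on the same output line.
print(line, end='')
print('newline')
```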

Python regular expression gives unexpected result

I'm trying to create an svn pre-commit hook, but can't get my regular expression to work as expected. It should print False for messages that do not look like "DEV-5 | some message". Why do I get True here?
Python 2.7.1+ (r271:86832, Apr 11 2011, 18:05:24)
[GCC 4.5.2] on linux2
Type "help", "copyright", "credits" or "license" for more information.
>>> import re
>>> p = re.compile("^\[[A-Z]+-[0-9]+\] | .+$", re.DOTALL)
>>> message = "test message"
>>> match = p.search(message)
>>> bool(match)
True
>>> p = re.compile("^[A-Z]+-[0-9]+ \| .+$", re.DOTALL)
>>> print p.search("test message")
None
>>> print p.search("DEV-5 | some message")
<_sre.SRE_Match object at 0x800eb78b8>
You don't need \[ and \].
You need to escape |.
The culprit is the trailing " | .+$", which matches ' message' as an alternative to the first part of the regex. As Roman pointed out, you meant to match a literal '|', so you have to escape it as '\|'.
To see what was being matched, you can do:
print match.group()
' message'
By the way, a faster non-regex way to handle only lines containing a vertical bar would use line.split('|'):
for line in ...:
    parts = line.split('|', 1)
    if len(parts) == 1: continue
    (code, mesg) = parts
I haven't run the code, but I suspect that the part after the alternation (|) in your regexp matches any string consisting of a space followed by at least one character; in this case that is " message".
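A Python 3 sketch that confirms the diagnosis and shows one way to write the hook's check. The name is_valid and the use of fullmatch() (which anchors both ends, so ^ and $ are not needed) are choices made for this illustration, not part of the original hook.

```python
import re

# The original pattern: the unescaped '|' splits it into two alternatives,
# "^\[[A-Z]+-[0-9]+\] " and " .+$"; search() finds the second one.
broken = re.compile(r"^\[[A-Z]+-[0-9]+\] | .+$", re.DOTALL)
print(repr(broken.search("test message").group()))  # ' message'

# Escaping the bar makes it a literal character; fullmatch() anchors both ends.
COMMIT_MSG = re.compile(r"[A-Z]+-[0-9]+ \| .+", re.DOTALL)


def is_valid(message: str) -> bool:
    """True if message looks like 'DEV-5 | some message'."""
    return COMMIT_MSG.fullmatch(message) is not None


print(is_valid("DEV-5 | some message"))  # True
print(is_valid("test message"))          # False
```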
