Evaluate ANSI escapes in Python string - python

Say I have the string '\033[2KResolving dependencies...\033[2KResolving dependencies...'
In the Python console, I can print this, and it'll only display once
Python 3.10.9 (main, Jan 19 2023, 07:59:38) [GCC 9.3.0] on linux
Type "help", "copyright", "credits" or "license" for more information.
>>> output = '\033[2KResolving dependencies...\033[2KResolving dependencies...'
>>> print(output)
Resolving dependencies...
Is there a way to get a string that consists solely of the printed output? In other words, I would like there to be some function
def evaluate_ansi_escapes(input: str) -> str:
    ...
such that evaluate_ansi_escapes(output) == 'Resolving dependencies...' (ideally with the correct amount of whitespace in front)
edit: I've come up with the following stopgap solution
import re
def evaluate_ansi_escapes(input: str) -> str:
    erases_regex = r"^.*(\\(033|e)|\x1b)\[2K"
    erases = re.compile(erases_regex)
    no_erases = []
    for line in input.split("\n"):
        while len(erases.findall(line)) > 0:
            line = erases.sub("", line)
        no_erases.append(line)
    return "\n".join(no_erases)
This does successfully produce output that is close enough to what I want:
>>> evaluate_ansi_escapes(output)
'Resolving dependencies...'
But I would love to know if there is a less hacky way to solve this problem, or if the whitespace preceding 'Resolving dependencies...' can be captured as well.
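One possible direction, offered as a rough sketch rather than a complete terminal emulator: keep a per-line buffer and a cursor column, and apply each escape to that buffer. The sequence ESC[2K erases the whole line but leaves the cursor where it is, which is where the leading whitespace comes from. The function name render_erase_in_line is hypothetical, and the sketch assumes that only "erase in line" sequences (ESC[K, ESC[0K, ESC[1K, ESC[2K) and carriage returns need handling; any other escape sequence is passed through unchanged.

```python
import re

# One token is either an "erase in line" sequence (ESC [ n K) or a single character.
_TOKEN = re.compile(r"\x1b\[(\d*)K|(.)", re.DOTALL)


def render_erase_in_line(text: str) -> str:
    """Return what a terminal would show, handling only ESC[nK and carriage returns."""
    rendered = []
    for line in text.split("\n"):
        cells = []  # characters currently visible on this line
        col = 0     # cursor column
        for token in _TOKEN.finditer(line):
            param, char = token.groups()
            if char is None:  # erase-in-line; the cursor does not move
                mode = int(param or 0)
                if mode == 0:      # from the cursor to the end of the line
                    del cells[col:]
                elif mode == 1:    # from the start of the line through the cursor
                    for i in range(min(col + 1, len(cells))):
                        cells[i] = " "
                elif mode == 2:    # the whole line
                    cells.clear()
            elif char == "\r":     # carriage return: back to column 0
                col = 0
            else:
                cells.extend(" " * (col - len(cells)))  # pad up to the cursor
                cells[col:col + 1] = [char]             # overwrite or append
                col += 1
        rendered.append("".join(cells))
    return "\n".join(rendered)


output = '\033[2KResolving dependencies...\033[2KResolving dependencies...'
shown = render_erase_in_line(output)
print(repr(shown.lstrip()))              # 'Resolving dependencies...'
print(len(shown) - len(shown.lstrip()))  # 25
```

Because the cursor stays at column 25 after the second erase, the text is preceded by 25 spaces; call .lstrip() on the result if that padding is not wanted.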

Related

Python automatically converting some strings to raw strings?

Python seems to be automatically converting strings (not just input) into raw strings. Can somebody explain what is happening here?
Python 3.7.1 (v3.7.1:260ec2c36a, Oct 20 2018, 14:57:15) [MSC v.1915 64 bit (AMD64)] on win32
Type "help", "copyright", "credits" or "license" for more information.
>>> s = '\stest'
>>> s
'\\stest'
# looks like a raw string
>>> print(s)
\stest
>>> s = '\ntest'
>>> s
'\ntest'
# this one doesn't
>>> s = '\n test'
>>> s
'\n test'
>>> s = r'\n test'
>>> s
'\\n test'
>>> print(s)
\n test
The question marked as a duplicate for this one seems to be useful, but then I do not understand why
>>> s = '\n test'
>>> s
'\n test'
>>> repr(s)
"'\\n test'"
does not get two backslashes when called, and does when repr() is called on it.
\n is a valid escape sequence and '\n' is a length 1 string (the newline character). In contrast, \s is an invalid escape sequence, so Python assumes that what you wanted there was a two-character string: a backslash character plus an s character.
>>> len('\s')
2
What you saw on terminal output was just the usual representation for such a length 2 string. Note that the correct way to create the string which Python gave you back here would have been with r'\s' or with '\\s'.
>>> r'\s' == '\\s' == '\s'
True
This behavior is deprecated: unrecognized escape sequences such as \s have raised a DeprecationWarning since Python 3.6, and a future version of Python is expected to turn them into a syntax error.
Since you're using v3.7.1, you could enable warnings if you want to be informed about such uses of deprecated features:
$ python -Wall
>>> '\s'
<stdin>:1: DeprecationWarning: invalid escape sequence \s
'\\s'
As for your subsequent question after the edit:
>>> s = '\n test'
>>> s # this prints the repr(s)
'\n test'
>>> repr(s) # this prints the repr(repr(s))
"'\\n test'"

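The difference between the string itself and what the console echoes can also be checked in a script. This is only an illustrative sketch; the variable names are made up for the example.

```python
s = "\n test"

echoed_for_s = repr(s)             # what the console echoes for `>>> s`
echoed_for_repr = repr(repr(s))    # what the console echoes for `>>> repr(s)`

print(len(s))             # 6: newline, space, and the four letters of "test"
print(echoed_for_s)       # '\n test'
print(echoed_for_repr)    # "'\\n test'"
```

The doubled backslash only appears when a string that already contains a backslash (here, the repr of s) is itself shown as a repr.
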
remove single quotes from array output [duplicate]

This question already has answers here:
In Python IDLE, what's the difference between 'print'ing a variable and just writing the variable?
(2 answers)
What is the difference between __str__ and __repr__?
(28 answers)
Closed 4 years ago.
I have an array defined like this in Python.
keys = "setid","cntrct_id","version_nbr"
and the elements are pushed into array as expected.
print(keys)
('setid', 'cntrct_id', 'version_nbr')
But when I try to wrap each element in double quotes and join them with commas,
I get output like this:
'"setid","cntrct_id","version_nbr"'
I am expecting output like this:
"setid","cntrct_id","version_nbr"
I tried many ways,
(','.join('"' + x + '"' for x in keys))
','.join(map(lambda x: "\"" + x + "\"", keys))
','.join(['"%s"' % w for w in keys])
but every one of them adds single quotes around the result.
How can I avoid generating single quotes in the output?
I think the ' is just from the Python shell and not really part of the string itself. Have a look at the following example:
Python 3.5.2 (default, Nov 23 2017, 16:37:01)
[GCC 5.4.0 20160609] on linux
Type "help", "copyright", "credits" or "license" for more information.
>>> test_string = "hello"
>>> test_string
'hello'
>>> print(test_string)
hello
>>>
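Applied to the keys from the question, a small sketch (the variable name joined is only for illustration) shows that the single quotes belong to the repr that the shell echoes, not to the string:

```python
keys = ("setid", "cntrct_id", "version_nbr")

# Wrap each key in double quotes and join with commas.
joined = ",".join('"{}"'.format(key) for key in keys)

print(joined)        # "setid","cntrct_id","version_nbr"
print(repr(joined))  # '"setid","cntrct_id","version_nbr"'
```

Any of the three join expressions in the question builds the same string; printing it, or writing it to a file, produces no single quotes.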

Removing non integers from a grep obtained string w/ Python and Bash

I am using grep to grab the text out of a file:
NELECT = 44.0000 total number of electrons,
and I need to save the number as a variable. I have tried a handful of methods I have found here such as using filters and findall. For some reason I can only get it to separate one zero.
So far the code looks like this:
wd=os.getcwd()
electrons=str(os.system("grep 'NELECT' "+wd+"/OUTCAR"))
VBM=(re.findall('\d+', electrons))
print VBM
And in return I get ['0'].
The result of os.system is the exit status of the command, not the output of the command -- see https://docs.python.org/3/library/os.html#os.system
$ cat OUTCAR
NELECT = 44.0000 total number of electrons,
$ python
Python 2.7.12 (default, Dec 4 2017, 14:50:18)
[GCC 5.4.0 20160609] on linux2
Type "help", "copyright", "credits" or "license" for more information.
>>> import os
>>> result = os.system("grep 'NELECT' "+os.getcwd()+"/OUTCAR")
NELECT = 44.0000 total number of electrons,
>>> result
0
The "NELECT" line was printed by grep directly to stdout; it was not captured in the result variable.
>>> from subprocess import check_output
>>> result2 = check_output(["grep", "NELECT", os.getcwd()+"/OUTCAR"])
>>> result2
'NELECT = 44.0000 total number of electrons,\n'
>>> import re
>>> re.findall(r'\d+', result2)
['44', '0000']
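On Python 3 the same distinction holds, with one extra detail: check_output returns bytes unless text=True is passed. The following is a sketch only; to stay runnable without grep or an OUTCAR file, it assumes a stand-in child process that simply prints the NELECT line.

```python
import subprocess
import sys

# Stand-in for the grep call: a child process that prints the NELECT line.
command = [sys.executable, "-c",
           "print('NELECT = 44.0000 total number of electrons,')"]

# The exit status only says whether the command succeeded.
exit_status = subprocess.run(command, stdout=subprocess.DEVNULL).returncode

# check_output captures stdout; text=True (Python 3.7+) decodes it to str.
captured = subprocess.check_output(command, text=True)

print(exit_status)     # 0
print(repr(captured))  # the NELECT line followed by a newline
```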
Or, don't call out to grep, read the file yourself
>>> import os
>>> import re
>>> with open(os.getcwd() + "/OUTCAR") as f:
...     for line in f:
...         if "NELECT" in line:
...             digits = re.findall(r'\d+', line)
...             break
...
>>> digits
['44', '0000']
Or, maybe don't use a regular expression:
>>> words = line.split()
>>> words[2]
'44.0000'
>>> int(float(words[2]))
44
Are you sure that electrons contains the output you expect? For me this regex returns a list with two elements, ['44', '0000'], which is the expected behavior, so most probably something is wrong with the grep call.
Your regex won't retrieve the whole 44.0000, because \d+ matches only continuous runs of digits, not the dot. To get the whole number, use something like \b\d+\.\d+\b, which means: a word (\b marks a word boundary; the dot must be escaped because a bare . in a regex matches any character) made of at least one digit, a dot, and at least one more digit. If the fractional part is optional, use something like \b(\d+(?:\.\d+)?)\b ((?:...) creates a group that is not captured, so your output will still be a single-element list).
Note that re.findall returns a list of string matches. To get a number from the first match, use float(VBM[0]).
Edit: I forgot to add: avoid the print statement. It behaves oddly with tuples and was removed entirely in Python 3. Python 2 support ends in 2020, so it's better to prepare. You can replace the print statement with the Python 3 print function by adding from __future__ import print_function at the beginning of the file.
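Put together, the pattern with the optional fractional part could be used roughly as follows. This is a sketch that works on the sample line from the question; the names NUMBER and electrons are only illustrative.

```python
import re

line = "NELECT = 44.0000 total number of electrons,"

# Digits, optionally followed by a dot and more digits, bounded by word boundaries.
NUMBER = re.compile(r"\b\d+(?:\.\d+)?\b")

matches = NUMBER.findall(line)
electrons = float(matches[0])

print(matches)         # ['44.0000']
print(int(electrons))  # 44
```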

how to match 'space' symbol (and only space) in python2 regex?

I am trying to use Python v2 (2.7.5 specifically) and the 're' module for regex matching. My problem is that for my application I need to match the 'space' symbol (i.e. 0x20 in hex) and ONLY that symbol as part of the match string. The first thing I tried for that was '\s', and that does not work because it also matches newline, carriage return, tab and form feed.
The end requirement is to match a string where the first three characters are digits ('\d'), there is a comma (',') and then eight symbols that are either digits ('\d') or spaces (???).
Any suggestions on how to do that? What I have already tried...
C:\Users\jlaird>python
Python 2.7.5 (default, May 15 2013, 22:43:36) [MSC v.1500 32 bit (Intel)] on win32
Type "help", "copyright", "credits" or "license" for more information.
>>> import re
>>> goodstring = '333,000000 2'
>>> badstring = '333,000000\t2'
>>> print badstring
333,000000 2
>>> sRegex = '\d\d\d,[\s\d][\s\d][\s\d][\s\d][\s\d][\s\d][\s\d][\s\d]'
>>> cRegex = re.compile(sRegex)
>>> cRegex.match(goodstring)
<_sre.SRE_Match object at 0x023A7A30>
>>> cRegex.match(badstring)
<_sre.SRE_Match object at 0x025E82C0>
>>>
I want 'badstring' to evaluate to None because it has the tab character instead of the space. How can I do this?
Thanks jonrsharpe...works. It is always something simple that I make complicated. Sorry...
>>> sRegex = '\d\d\d,[ \d][ \d][ \d][ \d][ \d][ \d][ \d][ \d]'
>>> cRegex = re.compile(sRegex)
>>> cRegex.match(goodstring)
<_sre.SRE_Match object at 0x023A7A30>
>>> cRegex.match(badstring)
>>>
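The same pattern can be written more compactly with repetition counts and a raw string. This is a sketch of an equivalent form, not the poster's exact code:

```python
import re

# Three digits, a comma, then exactly eight characters that are each a digit
# or a literal space (0x20). A raw string keeps the backslashes intact.
pattern = re.compile(r"\d{3},[ \d]{8}")

goodstring = "333,000000 2"
badstring = "333,000000\t2"

print(pattern.match(goodstring) is not None)  # True
print(pattern.match(badstring))               # None
```

Note that match only anchors at the start of the string; re.fullmatch (Python 3.4+) would additionally reject trailing characters, if that is required.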

Python regular expression gives unexpected result

I'm trying to create an svn pre-commit hook, but can't get my regular expression to work as expected. It should print False for messages that do not look like "DEV-5 | some message". Why do I get True here?
Python 2.7.1+ (r271:86832, Apr 11 2011, 18:05:24)
[GCC 4.5.2] on linux2
Type "help", "copyright", "credits" or "license" for more information.
>>> import re
>>> p = re.compile("^\[[A-Z]+-[0-9]+\] | .+$", re.DOTALL)
>>> message = "test message"
>>> match = p.search(message)
>>> bool(match)
True
>>> p = re.compile("^[A-Z]+-[0-9]+ \| .+$", re.DOTALL)
>>> print p.search("test message")
None
>>> print p.search("DEV-5 | some message")
<_sre.SRE_Match object at 0x800eb78b8>
you don't need \[ and \]
you need to escape |
The culprit is the trailing " | .+$" which is matching ' message' as an alternative to the first regex. As Roman pointed out you meant to match literal '|' so you have to escape it as '\|'.
To see what was being matched, you can do:
print match.group()
' message'
By the way, a faster non-regex way to handle only the lines that contain a vertical bar would use line.split('|'):
for line in ...:
    parts = line.split('|', 1)
    if len(parts) == 1: continue
    (code, mesg) = parts
I haven't run the code, but I suspect that the part after the alternation operator (|) in your regexp matches any nonempty string that starts with a space; in this case, that is " message".
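That suspicion can be checked directly. The sketch below compiles both patterns from the question (written as raw strings, which does not change their meaning here) and shows what each one matches:

```python
import re

# Unescaped |: "bracketed ticket at the start" OR "a space followed by anything".
unescaped = re.compile(r"^\[[A-Z]+-[0-9]+\] | .+$", re.DOTALL)
# Escaped \|: a literal vertical bar between the ticket and the message.
escaped = re.compile(r"^[A-Z]+-[0-9]+ \| .+$", re.DOTALL)

accidental = unescaped.search("test message")
print(repr(accidental.group()))                              # ' message'
print(escaped.search("test message"))                        # None
print(escaped.search("DEV-5 | some message") is not None)    # True
```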
