casting raw strings python [duplicate]

This question already has answers here:
Process escape sequences in a string in Python
(8 answers)
Closed 3 years ago.
Given a variable that holds a string, is there a quick way to cast it into another raw-string variable?
The following code should illustrate what I'm after:
line1 = "hurr..\n..durr"
line2 = r"hurr..\n..durr"
print(line1 == line2) # outputs False
print(("%r"%line1)[1:-1] == line2) # outputs True
The closest I have found so far is the %r formatting flag, which seems to return a raw string, albeit within single quote marks. Is there an easier way to do this kind of thing?

Python 3:
"hurr..\n..durr".encode('unicode-escape').decode()
Python 2:
"hurr..\n..durr".encode('string-escape')
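For Python 3, the round trip in both directions can be sketched roughly as follows. This is only an illustration and assumes ASCII-only text: the unicode_escape codec treats its input as Latin-1, so other characters may not survive the round trip. The variable names are made up for the example.

```python
# Convert real control characters to backslash escapes and back again.
cooked = "hurr..\n..durr"   # contains an actual newline character
raw = r"hurr..\n..durr"     # contains a backslash followed by the letter n

# "Cooked" -> "raw": escape the special characters.
escaped = cooked.encode("unicode_escape").decode("ascii")

# "Raw" -> "cooked": interpret the escape sequences.
unescaped = raw.encode("ascii").decode("unicode_escape")

print(escaped == raw)        # True
print(unescaped == cooked)   # True
```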

Yet another way:
>>> s = "hurr..\n..durr"
>>> print repr(s).strip("'")
hurr..\n..durr

The answers above show how to encode (Python 2 syntax):
'hurr..\n..durr'.encode('string-escape')
This decodes:
r'hurr..\n..durr'.decode('string-escape')
Example:
In [12]: print 'hurr..\n..durr'.encode('string-escape')
hurr..\n..durr
In [13]: print r'hurr..\n..durr'.decode('string-escape')
hurr..
..durr
This allows one to "cast/transform raw strings" in both directions. A practical case is when the JSON contains a raw string and I want to print it nicely.
{
"Description": "Some lengthy description.\nParagraph 2.\nParagraph 3.",
...
}
I would do something like this.
print json.dumps(json_dict, indent=4).decode('string-escape')
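The line above is Python 2 only, because str has no decode() method in Python 3. A possible Python 3 equivalent is sketched below. It assumes the default ensure_ascii=True, so the dumped text is pure ASCII; json_dict here is a stand-in for the real data. Note that the result is meant for reading, not parsing: once the escapes are interpreted, it is no longer valid JSON.

```python
import json

# Stand-in for the real dictionary from the answer above.
json_dict = {
    "Description": "Some lengthy description.\nParagraph 2.\nParagraph 3.",
}

# json.dumps writes the newline as the two characters backslash + n.
dumped = json.dumps(json_dict, indent=4)

# Interpret the escape sequences so the paragraphs print on separate lines.
pretty = dumped.encode("ascii").decode("unicode_escape")
print(pretty)
```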

Related

What do empty curly braces mean in a string?

I have been browsing online for ways to read a file line by line, and I came across this piece of code:
print("Line {}: {}".format(linecount, line))
I am quite confused as to what is happening here. I know that it is printing something, but it shows:
"Line{}"
I do not understand what this means. I know that you could write this:
foo = "hi"
print(f"{foo} bob")
But I don't get why there are empty brackets.
Empty braces are equivalent to numeric braces numbered from 0:
>>> '{}: {}'.format(1,2)
'1: 2'
>>> '{0}: {1}'.format(1,2)
'1: 2'
Just a shortcut.
But if you use numerals you can control the order:
>>> '{1}: {0}'.format(1,2)
'2: 1'
Or the number of times something is used:
>>> '{0}: {0}, {1}: {1}'.format(1,2)
'1: 1, 2: 2'
Which you cannot do with empty braces.
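Applied to the line from the question, this might look like the sketch below. The list lines is a hypothetical stand-in for lines read from a file. It also shows that empty and numbered braces cannot be mixed in one format string.

```python
# Hypothetical stand-in for lines read from a file.
lines = ["first", "second"]

formatted = []
for linecount, line in enumerate(lines, start=1):
    # Each {} takes the next positional argument, in order.
    formatted.append("Line {}: {}".format(linecount, line))

# Explicit numbering gives the same result.
same = "Line {0}: {1}".format(1, "first")

# Mixing automatic ({}) and manual ({1}) numbering is rejected.
try:
    "{} {1}".format("a", "b")
except ValueError as exc:
    mixing_error = exc
else:
    mixing_error = None

print(formatted)      # ['Line 1: first', 'Line 2: second']
print(mixing_error)
```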
Doing "My {} string is {}".format('formatted', 'awesome') just fills in the curly braces with the args you provide in the format function in the order you enter the arguments.
So the first {} in the above string would get 'formatted' and the second in that case would get 'awesome'.
This is an older way of formatting strings than f-strings (I'm glad those were already available when I started learning Python), but you can write something similar to f-strings by using named fields:
>>> template = 'I love {item}. It makes me {emotion}'
>>>
>>> my_sentence = template.format(item='fire', emotion='calm')
>>> print(my_sentence)
I love fire. It makes me calm.
This is a different way to interpolate strings in Python.
Docs: https://docs.python.org/3/tutorial/inputoutput.html#fancier-output-formatting
Formatted string literals (f-strings) such as f'Results of the {year} {event}' were introduced in Python 3.6.

use one backslash and quotation marks together in string [duplicate]

This question already has answers here:
Why do backslashes appear twice?
(2 answers)
Closed 4 years ago.
How can I represent a string that contains a backslash followed by a quotation mark (\")?
I tried several ways:
date = 'xpto\"xpto'
'xpto"xpto'
date = 'xpto\\"xpto'
'xpto\\"xpto'
data='xpto\\' + '"xpto'
'xpto\\"xpto'
data= r'xpto\"xpto'
'xpto\\"xpto'
I need the string to be exactly like this:
'xpto\"xpto'
If someone knows how, I would really appreciate the help.
The following line works.
print(r"'xpto\"xpto'")
Output:
'xpto\"xpto'
The r prefix marks the string as a raw string, so the backslash is kept as-is.
Alternatively:
print("'xpto\\\"xpto'"), where \\ is an escaped backslash (\) and \" is an escaped double quote (").
"'xpto\\\"xpto'" is correct. Part of the confusion is distinguishing the actual string from Python's textual representation of the string.
>>> date = "'xpto\\\"xpto'"
>>> date
'\'xpto\\"xpto\''
>>> print(date)
'xpto\"xpto'
A simpler solution (which came to mind after reading Elvir's answer) is to use a triple-quoted raw string:
date = r"""'xpto\"xpto'"""
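To make the difference between a string's contents and its representation concrete, here is a small sketch using the attempts from the question (without the surrounding single quotes). The names attempts and target are made up for the example. All three spellings produce the same ten-character string with a single backslash; only repr() shows the backslash doubled.

```python
# Three spellings from the question; all build the same string.
attempts = [
    'xpto\\"xpto',          # escaped backslash
    'xpto\\' + '"xpto',     # concatenation
    r'xpto\"xpto',          # raw string
]
target = attempts[0]

print(all(a == target for a in attempts))  # True
print(len(target))         # 10, so there is only one backslash
print(target)              # xpto\"xpto        (the actual contents)
print(repr(target))        # 'xpto\\"xpto'     (what the interactive prompt echoes)
```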

Python - an extremely odd behavior of function lstrip [duplicate]

This question already has answers here:
Python string.strip stripping too many characters [duplicate]
(3 answers)
Closed 6 years ago.
I have encountered a very odd behavior of built-in function lstrip.
I will explain with a few examples:
print 'BT_NAME_PREFIX=MUV'.lstrip('BT_NAME_PREFIX=') # UV
print 'BT_NAME_PREFIX=NUV'.lstrip('BT_NAME_PREFIX=') # UV
print 'BT_NAME_PREFIX=PUV'.lstrip('BT_NAME_PREFIX=') # UV
print 'BT_NAME_PREFIX=SUV'.lstrip('BT_NAME_PREFIX=') # SUV
print 'BT_NAME_PREFIX=mUV'.lstrip('BT_NAME_PREFIX=') # mUV
As you can see, the function trims one additional character sometimes.
I tried to model the problem, and noticed that it persisted if I:
Changed BT_NAME_PREFIX to BT_NAME_PREFIY
Changed BT_NAME_PREFIX to BT_NAME_PREFIZ
Changed BT_NAME_PREFIX to BT_NAME_PREF
Further attempts have made it even more weird:
print 'BT_NAME=MUV'.lstrip('BT_NAME=') # UV
print 'BT_NAME=NUV'.lstrip('BT_NAME=') # UV
print 'BT_NAME=PUV'.lstrip('BT_NAME=') # PUV - different than before!!!
print 'BT_NAME=SUV'.lstrip('BT_NAME=') # SUV
print 'BT_NAME=mUV'.lstrip('BT_NAME=') # mUV
Could someone please explain what on earth is going on here?
I know I might as well just use array-slicing, but I would still like to understand this.
Thanks
You're misunderstanding how lstrip works. It treats the characters you pass in as a bag and it strips characters that are in the bag until it finds a character that isn't in the bag.
Consider:
'abc'.lstrip('ba') # 'c'
It is not removing a substring from the start of the string. To do that, you need something like:
if s.startswith(prefix):
    s = s[len(prefix):]
e.g.:
>>> s = 'foobar'
>>> prefix = 'foo'
>>> if s.startswith(prefix):
...     s = s[len(prefix):]
...
>>> s
'bar'
Or, I suppose you could use a regular expression:
>>> s = 'foobar'
>>> import re
>>> re.sub('^foo', '', s)
'bar'
The argument given to lstrip is a set of characters to remove from the left of the string, one character at a time. The argument is not treated as a phrase; only its individual characters matter.
S.lstrip([chars]) -> string or unicode
Return a copy of the string S with leading whitespace removed. If
chars is given and not None, remove characters in chars instead. If
chars is unicode, S will be converted to unicode before stripping
You could solve this in a flexible way using regular expressions (the re module):
>>> import re
>>> re.sub('^BT_NAME_PREFIX=', '', 'BT_NAME_PREFIX=MUV')
'MUV'
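The approaches above can be compared side by side in a short sketch. The helper name strip_prefix is made up for this example. On Python 3.9 and later, the built-in str.removeprefix() does the same job as the helper. The regex variant uses re.escape() so that any special characters in the prefix are matched literally.

```python
import re

def strip_prefix(s, prefix):
    """Return s without prefix if s starts with it, otherwise s unchanged."""
    if s.startswith(prefix):
        return s[len(prefix):]
    return s

prefix = "BT_NAME_PREFIX="
s = "BT_NAME_PREFIX=MUV"

# lstrip treats the argument as a set of characters: the leading M of MUV
# is in that set, so it is stripped as well.
with_lstrip = s.lstrip(prefix)

# These two remove the prefix as a whole and nothing more.
with_helper = strip_prefix(s, prefix)
with_regex = re.sub("^" + re.escape(prefix), "", s)

print(with_lstrip)   # UV
print(with_helper)   # MUV
print(with_regex)    # MUV
```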

Python - Most elegant way to extract a substring, being given left and right borders [duplicate]

This question already has answers here:
How to extract the substring between two markers?
(22 answers)
Closed 4 years ago.
I have a string - Python :
string = "/foo13546897/bar/Atlantis-GPS-coordinates/bar457822368/foo/"
Expected output is :
"Atlantis-GPS-coordinates"
I know that the expected output is ALWAYS surrounded by "/bar/" on the left and "/" on the right :
"/bar/Atlantis-GPS-coordinates/"
Proposed solution would look like :
a = string.find("/bar/")
b = string.find("/",a+5)
output=string[a+5:b]
This works, but I don't like it.
Does someone know a beautiful function or tip ?
You can use split:
>>> string.split("/bar/")[1].split("/")[0]
'Atlantis-GPS-coordinates'
Some efficiency from adding a max split of 1 I suppose:
>>> string.split("/bar/", 1)[1].split("/", 1)[0]
'Atlantis-GPS-coordinates'
Or use partition:
>>> string.partition("/bar/")[2].partition("/")[0]
'Atlantis-GPS-coordinates'
Or a regex:
>>> re.search(r'/bar/([^/]+)', string).group(1)
'Atlantis-GPS-coordinates'
Depends on what speaks to you and your data.
What you have isn't all that bad. I'd write it as:
start = string.find('/bar/') + 5
end = string.find('/', start)
output = string[start:end]
as long as you know that /bar/WHAT-YOU-WANT/ is always going to be present. Otherwise, I would reach for the regular expression knife:
>>> import re
>>> PATTERN = re.compile('^.*/bar/([^/]*)/.*$')
>>> s = '/foo13546897/bar/Atlantis-GPS-coordinates/bar457822368/foo/'
>>> match = PATTERN.match(s)
>>> match.group(1)
'Atlantis-GPS-coordinates'
import re
pattern = '(?<=/bar/).+?/'
string = "/foo13546897/bar/Atlantis-GPS-coordinates/bar457822368/foo/"
result = re.search(pattern, string)
print string[result.start():result.end() - 1]
# "Atlantis-GPS-coordinates"
That is a Python 2.x example. How the pattern works:
1. (?<=/bar/) is a lookbehind: the rest of the pattern only matches where /bar/ immediately precedes it.
2. .+?/ matches as few characters as possible up to and including the next '/' character, which is why the slice ends at result.end() - 1.
Hope that helps.
If you need to do this kind of search a bunch it is better to 'compile' this search for performance, but if you only need to do it once don't bother.
Using re (slower than other solutions):
>>> import re
>>> string = "/foo13546897/bar/Atlantis-GPS-coordinates/bar457822368/foo/"
>>> re.search(r'(?<=/bar/)[^/]+(?=/)', string).group()
'Atlantis-GPS-coordinates'
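If the markers may be missing, the partition approach can be wrapped in a small helper that returns None instead of raising or returning a wrong slice. This is only a sketch; the function name between and its behaviour on missing markers are choices made for this example, not part of any library.

```python
def between(text, left, right):
    """Return the text between the first `left` and the next `right`, or None."""
    _, found_left, rest = text.partition(left)
    if not found_left:
        return None            # left marker not present
    value, found_right, _ = rest.partition(right)
    if not found_right:
        return None            # right marker not present after the left one
    return value

string = "/foo13546897/bar/Atlantis-GPS-coordinates/bar457822368/foo/"
print(between(string, "/bar/", "/"))        # Atlantis-GPS-coordinates
print(between("no markers", "/bar/", "/"))  # None
```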

How can I format an integer to a two digit hex?

Does anyone know how to get a chr to hex conversion where the output is always two digits?
For example, if my conversion yields 0x1, I need to convert that to 0x01, since I am concatenating a long hex string.
The code that I am using is:
hexStr += hex(ord(byteStr[i]))[2:]
You can use string formatting for this purpose:
>>> "0x{:02x}".format(13)
'0x0d'
>>> "0x{:02x}".format(131)
'0x83'
Edit: Your code suggests that you are trying to convert a string to its hex string representation. There is a much easier way to do this (Python 2.x):
>>> "abcd".encode("hex")
'61626364'
An alternative (that also works in Python 3.x) is the function binascii.hexlify().
You can use the format function:
>>> format(10, '02x')
'0a'
You won't need to remove the 0x part with that (like you did with the [2:])
If you're using Python 3.6 or higher you can also use f-strings:
v = 10
s = f"0x{v:02x}"
print(s)
output:
0x0a
The syntax for the braces part is identical to str.format(), except you use the variable's name. See https://www.python.org/dev/peps/pep-0498/ for more.
htmlColor = "#%02X%02X%02X" % (red, green, blue)
The standard module binascii may also be the answer, especially when you need to convert a longer string (Python 2 shown; in Python 3, hexlify() takes and returns bytes):
>>> import binascii
>>> binascii.hexlify('abc\n')
'6162630a'
Use %-style string formatting instead of the hex function:
>>> mychar = ord('a')
>>> hexstring = '%.2X' % mychar
You can also change the number "2" to the number of digits you want, and the "X" to "x" to choose between upper and lowercase representation of the hex alphanumeric digits.
Many consider this the old %-style formatting in Python, but I like it because the format string syntax is the same as that used by other languages, like C and Java.
The simplest way (I think) is:
your_str = '0x%02X' % 10
print(your_str)
will print:
0x0A
The number after the % operator is converted to hex inside the string. I think it's clear this way, and for people who come from a C background (like me) it feels more like home.
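For the original goal of building a long hex string from a sequence of bytes, a Python 3 sketch might look like this. It assumes the input is a bytes object, and the variable names are illustrative. Iterating over bytes in Python 3 yields integers, so ord() is not needed.

```python
import binascii

data = b"abc\n"  # example input; any bytes object works

# Two hex digits per byte, joined into one string.
joined = "".join(format(byte, "02x") for byte in data)

# Built-in shortcuts that give the same digits.
builtin = data.hex()
hexlified = binascii.hexlify(data).decode("ascii")

# A single value padded to two digits, with the 0x prefix.
single = "0x{:02x}".format(1)

print(joined)     # 6162630a
print(builtin)    # 6162630a
print(hexlified)  # 6162630a
print(single)     # 0x01
```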
