Put all occurrences in the string in quotes using regex in Python - python

I have a long string where I can find something like this: data() { <some data which is always different here> }. I want to put all occurrences in quotes. This is what I'm doing, but it has no effect:
string = re.sub(r'data \(\) {(.*)}', r'"/1"', string)
I suppose there should be something different between curly brackets but I have no idea what...
#EDIT
I realized my string looks like this:
data() {
<some white spaces> here is text
<some white spaces> }

Whitespace matters, the direction of slashes matters (thanks Wiktor, I overlooked that before), and that quantifier should probably be lazy. Also, if there are newlines within your text, you need to allow for that:
string = re.sub(r'(?s)data\(\) {(.*?)}', r'"\1"', string)
Testing it on your sample text:
In [4]: string = """data() {
...: <some white spaces> here is text
...: <some white spaces> }"""
In [5]: print(re.sub(r'(?s)data\(\) {(.*?)}', r'"\1"', string))
"
<some white spaces> here is text
<some white spaces> "
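To see why the lazy quantifier matters, here is a minimal sketch. It assumes the string contains two data() blocks; the sample text and the variable names are made up for illustration and are not from the question.

```python
import re

# Hypothetical input with two data() blocks, one of them spanning lines.
text = "a data() { first } b data() {\n second\n} c"

# Greedy: .* runs to the LAST closing brace, so both blocks merge into one match.
greedy = re.sub(r'(?s)data\(\) {(.*)}', r'"\1"', text)

# Lazy: .*? stops at the FIRST closing brace, so each block is quoted separately.
lazy = re.sub(r'(?s)data\(\) {(.*?)}', r'"\1"', text)

print(repr(greedy))
print(repr(lazy))
```

With the greedy version only one pair of quotes is added and the inner `} b data() {` survives; the lazy version quotes each block on its own.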

Related

Best way to format a string in python when non-formatting curly brackets are present in the string [duplicate]

Non-working example:
print(" \{ Hello \} {0} ".format(42))
Desired output:
{Hello} 42
You need to double the {{ and }}:
>>> x = " {{ Hello }} {0} "
>>> x.format(42)
' { Hello } 42 '
Here's the relevant part of the Python documentation for format string syntax:
Format strings contain “replacement fields” surrounded by curly braces {}. Anything that is not contained in braces is considered literal text, which is copied unchanged to the output. If you need to include a brace character in the literal text, it can be escaped by doubling: {{ and }}.
Python 3.6+ (2017)
In recent versions of Python one would use f-strings (see also PEP 498).
With f-strings, literal braces are likewise written doubled, as {{ or }}:
n = 42
print(f" {{Hello}} {n} ")
produces the desired
{Hello} 42
If you need to resolve an expression in the brackets instead of using literal text you'll need three sets of brackets:
hello = "HELLO"
print(f"{{{hello.lower()}}}")
produces
{hello}
You escape it by doubling the braces.
Eg:
x = "{{ Hello }} {0}"
print(x.format(42))
The OP wrote this comment:
I was trying to format a small JSON for some purposes, like this: '{"all": false, "selected": "{}"}'.format(data) to get something like {"all": false, "selected": "1,2"}
It's pretty common that the "escaping braces" issue comes up when dealing with JSON.
I suggest doing this:
import json
data = "1,2"
mydict = {"all": False, "selected": data}  # Python False is serialized as JSON false
json.dumps(mydict)
It's cleaner than the alternative, which is:
'{{"all": false, "selected": "{}"}}'.format(data)
Using the json library is definitely preferable when the JSON string gets more complicated than the example.
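As a rough illustration of that point, the sketch below serializes a slightly nested payload. The "meta" entry and its contents are hypothetical additions, chosen only because they contain quotes and braces that would be painful to escape by hand.

```python
import json

# Hypothetical payload: "all" and "selected" mirror the example above,
# the nested "meta" entry is made up to show quotes and braces inside a value.
data = "1,2"
payload = {"all": False, "selected": data, "meta": {"note": 'has "quotes" and {braces}'}}

encoded = json.dumps(payload)
print(encoded)

# Round trip: json.loads gives back an equal dict, so nothing was lost to escaping.
decoded = json.loads(encoded)
print(decoded == payload)
```

No brace doubling is needed anywhere, because the string is built from data rather than from a format template.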
You want to format a string with the character { or }
You just have to double them.
Format { with f'{{' and } with f'}}'.
So :
name = "bob"
print(f'Hello {name} ! I want to print }} and {{ or {{ }}')
Output :
Hello bob ! I want to print } and { or { }
OR for the exact example :
number = 42
print(f'{{Hello}} {number}')
Will print :
{Hello} 42
Finally :
number = 42
string = "bob"
print(f'{{Hello}} {{{number}}} {number} {{{string}}} {string} ')
{Hello} {42} 42 {bob} bob
Try this:
x = "{{ Hello }} {0}"
Try doing this:
x = " {{ Hello }} {0} "
print(x.format(42))
Although not any better, just for the reference, you can also do this:
>>> x = '{}Hello{} {}'
>>> print(x.format('{','}',42))
{Hello} 42
It can be useful for example when someone wants to print {argument}. It is maybe more readable than '{{{}}}'.format('argument')
Note that you can omit argument positions (e.g. {} instead of {0}) from Python 2.7 onward.
key = "FOOBAR"
print(f"hello {{{key}}}")
outputs
hello {FOOBAR}
In case someone wants to print something inside curly brackets using f-strings.
If you need to keep two curly braces in the string, you need 5 curly braces on each side of the variable.
>>> myvar = 'test'
>>> "{{{{{0}}}}}".format(myvar)
'{{test}}'
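The count of five follows a simple pattern: every literal brace in the output costs two braces in the template, and one more brace on each side forms the replacement field itself. A small sketch of that arithmetic, where the helper name wrap_in_braces is hypothetical:

```python
def wrap_in_braces(value, depth):
    """Return value surrounded by `depth` literal braces on each side."""
    # 2 * depth braces yield `depth` literal braces; "{0}" adds the field itself,
    # so the template has 2 * depth + 1 braces on each side.
    template = "{" * (2 * depth) + "{0}" + "}" * (2 * depth)
    return template.format(value)

for depth in range(3):
    print(depth, wrap_in_braces("test", depth))
```

For depth 2 the generated template is the same "{{{{{0}}}}}" used above.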
f-strings (python 3)
You can avoid having to double the curly brackets by using f-strings ONLY for the parts of the string where you want the f-magic to apply, and using regular (dumb) strings for everything that is literal and might contain 'unsafe' special characters. Let python do the string joining for you simply by stacking multiple strings together.
number = 42
print(" { Hello }"
f" {number} "
"{ thanks for all the fish }")
### OUTPUT:
{ Hello } 42 { thanks for all the fish }
NOTE: Line breaks between the strings are NOT required. I have only added them for readability. You could as well write the code above as shown below:
⚠️ WARNING: This might hurt your eyes or make you dizzy!
print("{Hello}"f"{number}""{thanks for all the fish}")
If you are going to be doing this a lot, it might be good to define a utility function that will let you use arbitrary brace substitutes instead, like
def custom_format(string, brackets, *args, **kwargs):
    if len(brackets) != 2:
        raise ValueError('Expected two brackets. Got {}.'.format(len(brackets)))
    padded = string.replace('{', '{{').replace('}', '}}')
    substituted = padded.replace(brackets[0], '{').replace(brackets[1], '}')
    formatted = substituted.format(*args, **kwargs)
    return formatted
>>> custom_format('{{[cmd]} process 1}', brackets='[]', cmd='firefox.exe')
'{{firefox.exe} process 1}'
Note that this will work either with brackets being a string of length 2 or an iterable of two strings (for multi-character delimiters).
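The multi-character case could look like the sketch below. The function is repeated so the snippet runs on its own; the `<<` and `>>` delimiters and the sample template are assumptions picked for illustration.

```python
def custom_format(string, brackets, *args, **kwargs):
    if len(brackets) != 2:
        raise ValueError('Expected two brackets. Got {}.'.format(len(brackets)))
    # Escape every real brace, then turn the custom delimiters into format fields.
    padded = string.replace('{', '{{').replace('}', '}}')
    substituted = padded.replace(brackets[0], '{').replace(brackets[1], '}')
    return substituted.format(*args, **kwargs)

# Multi-character delimiters passed as a tuple of two strings (hypothetical choice).
result = custom_format('{"cmd": "<<cmd>>"}', brackets=('<<', '>>'), cmd='firefox.exe')
print(result)
```

The delimiters should be strings that never occur in the literal text, otherwise they would be turned into fields as well.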
I recently ran into this, because I wanted to inject strings into preformatted JSON.
My solution was to create a helper method, like this:
def preformat(msg):
    """ allow {{key}} to be used for formatting in text
    that already uses curly braces. First switch this into
    something else, replace curlies with double curlies, and then
    switch back to regular braces
    """
    msg = msg.replace('{{', '<<<').replace('}}', '>>>')
    msg = msg.replace('{', '{{').replace('}', '}}')
    msg = msg.replace('<<<', '{').replace('>>>', '}')
    return msg
You can then do something like:
formatted = preformat("""
{
"foo": "{{bar}}"
}""").format(bar="gas")
Gets the job done if performance is not an issue.
I am ridiculously late to this party. I am having success placing the brackets in the replacement element, like this:
print('{0} {1}'.format('{hello}', '{world}'))
which prints
{hello} {world}
Strictly speaking this is not what OP is asking, as s/he wants the braces in the format string, but this may help someone.
The reason is that {} is the replacement-field syntax of .format(), so .format() tries to interpret { Hello } as a field name and raises an error.
You can escape the braces by doubling them, {{ and }}:
x = " {{ Hello }} {0} "
Or use %s-style formatting, which leaves braces alone:
x = " { Hello } %s"
print(x % 42)
I stumbled upon this problem when trying to print text, which I can copy paste into a Latex document. I extend on this answer and make use of named replacement fields:
Let's say you want to print a product of multiple variables with indices, which in LaTeX would be $A_{ 0042 }*A_{ 3141 }*A_{ 2718 }*A_{ 0042 }$
The following code does the job with named fields so that for many indices it stays readable:
idx_mapping = {'i1':42, 'i2':3141, 'i3':2718 }
print('$A_{{ {i1:04d} }} * A_{{ {i2:04d} }} * A_{{ {i3:04d} }} * A_{{ {i1:04d} }}$'.format(**idx_mapping))
You can use a "quote wall" to separate the formatted string part from the regular string part.
From:
print(f"{Hello} {42}")
to
print("{Hello}"f" {42}")
A clearer example would be
string = 10
print(f"{string} {word}")
Output:
NameError: name 'word' is not defined
Now, add the quote wall like so:
string = 10
print(f"{string}"" {word}")
Output:
10 {word}
I used doubled {{ }} so that the f-string emits literal braces around an interpolated value.
For example, here is my Postgres UPDATE statement for an integer array column, which expects the array literal wrapped in {}, i.e.:
ports = '{100,200,300}'
With f-strings it is:
ports = [1,2,3]
ports_csv = ','.join(str(p) for p in ports)  # a bare {ports} would render as [1, 2, 3]
query = f"""
UPDATE table SET ports = '{{{ports_csv}}}' WHERE id = 1
"""
The actual query statement will be:
UPDATE table SET ports = '{1,2,3}' WHERE id = 1
which is a valid Postgres statement.
If you want to print just one side of the curly brace:
a=3
print(f'{"{"}{a}')
>>> {3
If you want to only print one curly brace (for example {) you can use {{, and you can add more braces later in the string if you want.
For example:
>>> f'{{ there is a curly brace on the left. Oh, and 1 + 1 is {1 + 1}'
'{ there is a curly brace on the left. Oh, and 1 + 1 is 2'
When you're just trying to interpolate code strings, I'd suggest using jinja2, which is a full-featured template engine for Python, e.g.:
from jinja2 import Template
foo = Template('''
#include <stdio.h>
void main() {
printf("hello universe number {{number}}");
}
''')
for i in range(2):
    print(foo.render(number=i))
This way you are not forced to double the curly braces, as many of the other answers suggest.
If you need curly braces within a f-string template that can be formatted, you need to output a string containing two curly braces within a set of curly braces for the f-string:
css_template = f"{{tag}} {'{{'} margin: 0; padding: 0;{'}}'}"
for_p = css_template.format(tag="p")
# 'p { margin: 0; padding: 0;}'
Or just parametrize the bracket itself? Probably very verbose.
x = '{open_bracket}42{close_bracket}'.format(open_bracket='{', close_bracket='}')
print(x)
# {42}

How to remove text before a particular character or string in multi-line text?

I want to remove all the text before and including */ in a string.
For example, consider:
string = ''' something
other things
etc. */ extra text.
'''
Here I want extra text. as the output.
I tried:
string = re.sub("^(.*)(?=*/)", "", string)
I also tried:
string = re.sub(re.compile(r"^.\*/", re.DOTALL), "", string)
But when I print string, it did not perform the operation I wanted and the whole string is printing.
I suppose you're fine without regular expressions:
string[string.index("*/ ")+3:]
And if you want to strip that newline:
string[string.index("*/ ")+3:].rstrip()
The problem with your first regex is that . does not match newlines as you noticed. With your second one, you were closer but forgot the * that time. This would work:
string = re.sub(re.compile(r"^.*\*/", re.DOTALL), "", string)
You can also just get the part of the string that comes after your "*/":
string = re.search(r"(\*/)(.*)", string, re.DOTALL).group(2)
Update: After doing some research, I found that the pattern (\n|.) to match everything including newlines is inefficient. I've updated the answer to use [\s\S] instead as shown on the answer I linked.
The problem is that . in python regex matches everything except newlines. For a regex solution, you can do the following:
import re
strng = ''' something
other things
etc. */ extra text.
'''
print(re.sub(r"[\s\S]+\*/", "", strng))
# extra text.
Add in a .strip() if you want to remove that remaining leading whitespace.
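A regex-free variant that also copes with a missing `*/` is str.partition. This is a minimal sketch; keeping the whole text when the marker is absent is an assumed policy, not something the question specifies.

```python
text = ''' something
other things
etc. */ extra text.
'''

# partition splits at the FIRST "*/" and returns (before, separator, after).
before, sep, after = text.partition("*/")

# If "*/" is absent, sep is the empty string; keep the whole text in that case
# (an assumed policy for this sketch).
result = after.strip() if sep else text
print(result)
```

Unlike string.index, partition does not raise when the marker is missing, so no try/except is needed.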
To keep the text up to that symbol instead, you can do:
split_str = string.split(' ')
boundary = split_str.index('*/')
new = ' '.join(split_str[0:boundary])
print(new)
which gives you:
something
other things
etc.
string_list = string.split('*/')[1:]
string = '*/'.join(string_list)
print(string)
gives output (shown as a repr) as
' extra text.\n'

Python: Can't turn string into JSON

For the past few hours, I've been fighting to get a string into a JSON dict. I've tried everything from json.loads(... which throws an error:
requestInformation = json.loads(entry["request"]["postData"]["text"])
//throws this error
json.decoder.JSONDecodeError: Expecting property name enclosed in double quotes:
to stripping out the slashes using a medley of re.sub('\\','',mystring) ,mystring.sub(... to no effect. My problem string looks like so
'{items:[{n:\\'PackageChannel.GetUnitsInConfigurationForUnitType\\',ps:[{n:\\'unitType\\',v:"ActionTemplate"}]}]}'
The origin of this string is that it's a HAR dump from Google Chrome. I think those backslashes are from it being escaped somewhere along the way because the bulk of the HAR file doesn't contain them, but they do appear commonly in any field labeled "text".
"postData": {
"mimeType": "application/json",
"text": "{items:[{n:'PackageChannel.GetUnitsInConfigurationForUnitType',ps:[{n:'unitType',v:\"Analysis\"}]}]}"
}
EDIT I eventually gave up on turning the text above into JSON and instead opted for regex. Sometimes the slashes showed up, sometimes they didn't based on what I was viewing the text in and that made it difficult to work with.
The json module wants a string where the keys are also wrapped in double quotes,
so the string below would work:
mystring = '{"items":[{"n":"PackageChannel.GetUnitsInConfigurationForUnitType", "ps":[{"n":"unitType","v":"ActionTemplate"}]}]}'
myjson = json.loads(mystring)
This function should remove the double backslashes and put double quotes around your keys.
import json, re
def make_jsonable(mystring):
    # we'll use this regex to find any key that doesn't contain any of: {}[]",
    key_regex = r'([,\[{](\s+)?[^"{},\[\]]+(\s+)?:)'
    mystring = re.sub(r"\\", "", mystring)  # remove any backslashes
    mystring = re.sub("'", '"', mystring)  # replace single quotes with doubles
    match = re.search(key_regex, mystring)
    while match:
        start_index = match.start(0)
        end_index = match.end(0)
        key = mystring[start_index+1:end_index-1].strip()
        print(key)  # show each key as it gets quoted
        mystring = '%s"%s"%s' % (mystring[:start_index+1], key, mystring[end_index-1:])
        match = re.search(key_regex, mystring)
    return mystring
I couldn't directly test it on the first string you wrote, the double/single quotes don't match up, but on the one in the last code sample it works.
You'll need an r prefix before the JSON string, or replace every \ with \\.
This works:
import json
validasst_json = r'''{
"postData": {
"mimeType": "application/json",
"text": "{items:[{n:'PackageChannel.GetUnitsInConfigurationForUnitType',ps:[{n:'unitType',v:\"Analysis\"}]}]}"
}
}'''
txt = json.loads(validasst_json)
print(txt["postData"]['mimeType'])
print(txt["postData"]['text'])
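Note that this only parses the outer HAR wrapper. The sketch below, which uses a shortened, hypothetical version of the payload, suggests that the inner "text" value still fails in json.loads because its keys are unquoted, which matches the error in the question.

```python
import json

# Shortened, hypothetical HAR fragment; the raw string keeps the \" escapes intact.
har_fragment = r'''{
  "postData": {
    "mimeType": "application/json",
    "text": "{items:[{n:'unitType',v:\"Analysis\"}]}"
  }
}'''

outer = json.loads(har_fragment)          # the wrapper itself is valid JSON
inner_text = outer["postData"]["text"]    # escapes are resolved at this point
print(inner_text)

try:
    json.loads(inner_text)                # unquoted keys, so this is rejected
    inner_ok = True
except json.JSONDecodeError as exc:
    inner_ok = False
    print("inner payload is not JSON:", exc.msg)
```

So the backslashes belong to the outer encoding only; once the wrapper is parsed they are gone, and the remaining problem is the JavaScript-style object literal.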

Regular expression in python

I am trying to match/sub the following line
line1 = '# Some text\n'
But avoid match/sub lines like this
'# Some text { .blah}\n'
So in other words, a # followed by any number of words, spaces and numbers (no punctuation) and then the end of the line.
line2 = re.sub(r'# (\P+)$', r'# \1 { .text}', line1)
Puts the contents of line1 into line2 unchanged.
(I read somewhere that \P means everything except punctuation)
line2 = re.sub(r'# (\w*\d*\s*)+$', r'# \1 { .text}', line1)
Whereas the above gives
'# { .text}'
Any help is appreciated
Thanks
Tom
Your regex is a bit weird; expanded, it looks like
r"# ([a-zA-Z0-9_]*[0-9]*[ \t\n\r\f\v]*)+$"
Things to note:
It is not anchored to the beginning of the string, meaning it would match
print("Important stuff!") # Very important
The \d* is redundant, because it is already captured by \w*
Looking at your example, it seems you should be less worried about punctuation; the only thing you cannot have is a curly-brace ({).
Try
import re
def add_text(txt):
    # [^{\n]* keeps each match on one line; a plain [^{]* could run across newlines
    return re.sub(r"^#([^{\n]*)$", r"#\1 { .text }", txt, flags=re.M)
text = "# Some text\n# More text { .blah}\nprint('abc') # but not me!\n# And once again"
print("===before===")
print(text)
print("\n===after===")
print(add_text(text))
which gives
===before===
# Some text
# More text { .blah}
print('abc') # but not me!
# And once again
===after===
# Some text { .text }
# More text { .blah}
print('abc') # but not me!
# And once again { .text }
If you only want lines which start with a # and continue with alphanumeric values, spaces and _, you want this:
/^#[\w ]+$/gm
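That pattern is written in JavaScript-style notation. A possible Python spelling is sketched below, assuming the same sample text as in the previous answer; re.MULTILINE stands in for the m flag, and findall/sub already act on every match, which covers the g flag.

```python
import re

# Python spelling of the pattern above; MULTILINE makes ^ and $ work per line.
pattern = re.compile(r"^#[\w ]+$", re.MULTILINE)

text = "# Some text\n# More text { .blah}\nprint('abc') # but not me!\n# And once again"

# Only lines made of '#', word characters and spaces qualify.
matches = pattern.findall(text)
print(matches)

# Append the marker to each qualifying line.
tagged = pattern.sub(lambda m: m.group(0) + " { .text}", text)
print(tagged)
```

Because the character class contains neither `{` nor a newline, a match cannot spill onto the next line or swallow an existing `{ .blah}` suffix.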

Python query for code examples

I want to create something like a dictionary of Python code examples. My problem is that I have to escape all the code examples. Also, r'some string' is not useful. Would you recommend another solution for storing and querying these entries?
import easygui
lex = {"dict": "woerter = {\"house\" : \"Haus\"}\nwoerter[\"house\"]",\
"for": "for x in range(0, 3):\n print \"We are on time %d\" % (x)",\
"while": "while expression:\n statement(s)"}
input_ = easygui.enterbox("Python-lex","")
output = lex[input_]
b = easygui.textbox("","",output)
Use triple quoting:
lex = {"dict": '''\
woerter = {"house" : "Haus"}
woerter["house"]
''',
"for": '''\
for x in range(0, 3):
    print "We are on time %d" % (x)
''',
"while": '''\
while expression:
    statement(s)
'''}
Triple-quoted strings (using ''' or """ delimiters) preserve newlines and any embedded single quotes do not need to be escaped.
The \ escape after the opening ''' triple quote escapes the newline at the start, making the value a little easier to read. The alternative would be to put the first line directly after the opening quotes.
You can make these raw as well; r'''\n''' would contain the literal characters \ and n, but literal newlines still remain literal newlines. Triple-quoting works with double-quote characters too: """This is a triple-quoted string too""". The only thing you'd have to escape is another triple quote in the same style; you only need to escape one quote character in that case:
triple_quote_with_embedded_triple = '''Triple quotes use \''' and """ delimiters'''
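If the snippets should be indented along with the surrounding code, textwrap.dedent from the standard library can strip the common leading whitespace. This is a sketch of one possible layout, not part of the answer above; the lexicon entries are adapted from the question and the print call is written in Python 3 style.

```python
import textwrap

# Hypothetical lexicon; dedent removes the shared indentation of each snippet,
# so only the indentation that belongs to the example code remains.
lex = {
    "for": textwrap.dedent('''\
        for x in range(0, 3):
            print("We are on time %d" % x)
        '''),
    "while": textwrap.dedent('''\
        while expression:
            statement(s)
        '''),
}

print(lex["for"])
```

The backslash after the opening quotes still suppresses the leading newline, as described above.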
I guess you can use json.dumps(data, indent=1) to convert the data and pass the result to easygui.textbox,
like this:
import json
import easygui
resp = dict(...)
easygui.textbox(text=json.dumps(resp, indent=1))
