Python Escape Double quote character and convert the string to json
I have tried escaping the double quotes with escape characters, but that didn't work either.
raw_string = '[{"Attribute":"color","Keywords":"green","AttributeComments":null},{"Attribute":" season","Keywords":["Holly Berry"],"AttributeComments":null},{"Attribute":" size","Keywords":"20"x30"","AttributeComments":null},{"Attribute":" unit","Keywords":"1","AttributeComments":null}]'
new_data = json.loads(raw_string)
It fails to load with the error: Expecting ',' delimiter: line 1 column 180 (char 179)
The expected output is a JSON string.
The correct JSON string, with escaped quotes should look like this:
[{
"Attribute": "color",
"Keywords": "green",
"AttributeComments": null
}, {
"Attribute": " season",
"Keywords": ["Holly Berry"],
"AttributeComments": null
}, {
"Attribute": " size",
"Keywords": "20\"x30\"",
"AttributeComments": null
}, {
"Attribute": " unit",
"Keywords": "1",
"AttributeComments": null
}]
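For reference, Python's json module applies this escaping on its own when it serializes a value, so the quotes never have to be escaped by hand. A minimal sketch, assuming the data is available as Python objects before it is turned into text (the record below is made up for illustration):

```python
import json

# Hypothetical record; the value contains literal double quotes.
record = {"Attribute": " size", "Keywords": '20"x30"', "AttributeComments": None}

# json.dumps escapes the embedded quotes as \" in the output text.
encoded = json.dumps(record)
print(encoded)

# Round trip: decoding the text gives back the original value.
decoded = json.loads(encoded)
print(decoded["Keywords"])
```

If the broken text comes from code you control, generating it with json.dumps rather than by string concatenation avoids the problem at the source.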
Edit:
You can use a regular expression to correct the string in Python, resulting in valid JSON:
import re
import json
raw_string = '[{"Attribute":"color","Keywords":"green","AttributeComments":null},{"Attribute":" season","Keywords":["Holly Berry"],"AttributeComments":null},{"Attribute":" size","Keywords":"20"x30"","AttributeComments":null},{"Attribute":" unit","Keywords":"1","AttributeComments":null}]'
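# Match the broken value: digits, a stray quote, "x", digits, and a doubled closing quote
pattern = r'"Keywords":"(\d+)"x(\d+)""'
# Raw string for the replacement so \g<1> is not treated as a string escape
correctedString = re.sub(pattern, r'"Keywords": "\g<1>x\g<2>"', raw_string)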
print(json.loads(correctedString))
Output:
[{u'Keywords': u'green', u'Attribute': u'color', u'AttributeComments': None}, {u'Keywords': [u'Holly Berry'], u'Attribute': u' season', u'AttributeComments': None}, {u'Keywords': u'20x30', u'Attribute': u' size', u'AttributeComments': None}, {u'Keywords': u'1', u'Attribute': u' unit', u'AttributeComments': None}]
raw_string = '[{"Attribute":"color","Keywords":"green","AttributeComments":null},{"Attribute":" season","Keywords":["Holly Berry"],"AttributeComments":null},{"Attribute":" size","Keywords":"20x30","AttributeComments":null},{"Attribute":" unit","Keywords":"1","AttributeComments":null}]'
new_data = json.loads(raw_string)
First of all change the key-value pair : "Keywords":"20"x30"" to "Keywords":"20x30".
The formatting is invalid in your code. If this JSON was not written by you but generated by some other source, check that source. You can check whether JSON is valid using JSONLint: just paste your JSON there.
As for your code:
import json
raw_string = '[{"Attribute":"color","Keywords":"green","AttributeComments":null},{"Attribute":" season","Keywords":["Holly Berry"],"AttributeComments":null},{"Attribute":" size","Keywords":"20x30","AttributeComments":null},{"Attribute":" unit","Keywords":"1","AttributeComments":null}]'
new_data = json.loads(raw_string)
new_data is a list. If you check the type of its first element using print(type(new_data[0])), you'll find it is the dict you wanted.
EDIT: Since you say you are fetching this JSON from a database, check whether the JSONs there all carry this type of formatting error. If they do, find out where these JSONs are being generated. Your options are either to correct the problem at the source or, if this is a one-off problem, to correct the data manually by adding escape characters. I strongly suggest the former.
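If you need to check many stored strings rather than paste them into a validator one by one, something along these lines may help. It is only a sketch, assuming Python 3.5+ (for json.JSONDecodeError); the function name explain_json_error and the sample strings are made up for illustration:

```python
import json

def explain_json_error(text, context=20):
    """Return None if text parses as JSON, else a short message showing the bad spot."""
    try:
        json.loads(text)
    except json.JSONDecodeError as exc:
        # exc.pos is the character index where the parser gave up.
        start = max(exc.pos - context, 0)
        snippet = text[start:exc.pos + context]
        return f"{exc.msg} at char {exc.pos}: ...{snippet}..."
    return None

bad = '{"Attribute":" size","Keywords":"20"x30""}'
good = '{"Attribute":" size","Keywords":"20\\"x30\\""}'

print(explain_json_error(bad))   # a message pointing at the unescaped quote
print(explain_json_error(good))  # None
```

Running this over every row would tell you whether the bad escaping is a one-off or systematic.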
This code:
import json
s = '{ "key1": "value1", "key2": "value2", }'
json.loads(s)
produces this error in Python 2:
ValueError: Expecting property name: line 1 column 16 (char 15)
Similar result in Python 3:
json.decoder.JSONDecodeError: Expecting property name enclosed in double quotes: line 1 column 16 (char 15)
If I remove that trailing comma (after "value2"), I get no error. But my code will process many different JSONs, so I can't do it manually. Is it possible to set up the parser to ignore such trailing commas?
Another option is to parse it as YAML; YAML accepts valid JSON but also accepts all sorts of variations.
import yaml
s = '{ "key1": "value1", "key2": "value2", }'
yaml.safe_load(s)  # safe_load avoids the Loader argument that plain yaml.load requires in recent PyYAML
The JSON specification doesn't allow a trailing comma. The parser throws because it encounters an invalid syntax token.
You might be interested in using a different parser for those files, e.g. a parser built for the JSON5 spec, which allows such syntax.
It could be that this data stream is JSON5, in which case there's a parser for that: https://pypi.org/project/json5/
This situation can be alleviated by a regex substitution that looks for ", }, and replaces it with " }, allowing for any amount of whitespace between the quotes, comma and close-curly.
>>> import re
>>> s = '{ "key1": "value1", "key2": "value2", }'
>>> re.sub(r"\"\s*,\s*\}", "\" }", s)
'{ "key1": "value1", "key2": "value2" }'
Giving:
>>> import json
>>> s2 = re.sub(r"\"\s*,\s*\}", "\" }", s)
>>> json.loads(s2)
{'key1': 'value1', 'key2': 'value2'}
EDIT: as commented, this is not a good practice unless you are confident your JSON data contains only simple words, and this change is not corrupting the data-stream further. As I commented on the OP, the best course of action is to repair the up-stream data source. But sometimes that's not possible.
I wrote a regex that finds and removes every comma followed by ] or } in the JSON, while skipping commas that sit inside strings.
It seems to work fine and is fast.
import re, json
s = r'''
[
123, true, false, null,
{
"\n\\\",]\\": "\n\\\",]\\",
"\n\\\",}\\": "\n\\\",}\\",
},
]
'''
r = json.loads(re.sub(r'("(?:\\?.)*?")|,\s*([]}])', r'\1\2', s))
print(r) # [123, True, False, None, {'\n\\",]\\': '\n\\",]\\', '\n\\",}\\': '\n\\",}\\'}]
That's because an extra , is invalid according to the JSON standard.
An object is an unordered set of name/value pairs. An object begins
with { (left brace) and ends with } (right brace). Each name is
followed by : (colon) and the name/value pairs are separated by ,
(comma).
If you really need this, you could wrap python's json parser with jsoncomment. But I would try to fix JSON in the origin.
I suspect it doesn't parse because "it's not json", but you could pre-process strings, using regular expression to replace , } with } and , ] with ]
How about using the following regex?
s = re.sub(r",\s*}", "}", s)
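To make the caveat about corrupting data concrete, here is a small sketch comparing a plain comma-stripping regex with a string-aware one in the spirit of the answer above. The names NAIVE, STRING_AWARE and strip_trailing_commas, and the sample input, are made up for illustration:

```python
import json
import re

# Removes any comma that is followed by a closing bracket or brace.
NAIVE = re.compile(r",\s*([\]}])")

# Alternative 1 consumes a whole JSON string literal (group 1) so that
# commas inside it are never seen by alternative 2, the trailing comma.
STRING_AWARE = re.compile(r'("(?:\\.|[^"\\])*")|,\s*([\]}])')

def strip_trailing_commas(text):
    # Keep string literals untouched; for a trailing comma keep only the closer.
    return STRING_AWARE.sub(lambda m: m.group(1) or m.group(2), text)

s = '{ "note": "a, } b", "key": "value", }'

naive = NAIVE.sub(r"\1", s)
careful = strip_trailing_commas(s)

print(json.loads(naive)["note"])    # the string value has been altered
print(json.loads(careful)["note"])  # the string value is intact
```

Both results parse, which is exactly why the plain version is risky: it silently changes a value instead of failing.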
I have a description field embedded within JSON, and I'm unable to use JSON libraries to parse this data.
I used {0,23} in an attempt to extract the first 23 characters of the string. How can I extract the entire value associated with description?
import re
description = "'\description\" : \"this is a tesdt \n another test\" "
re.findall(r'description(?:\w+){0,23}', description, re.IGNORECASE)
With the code above, only ['description'] is displayed.
You could try this code out:
import re
description = "description\" : \"this is a tesdt \n another test\" "
result = re.findall(r'(?<=description")(?:\s*\:\s*)(".*?(?=")")', description, re.IGNORECASE | re.DOTALL)[0]
print(result)
Which gives you the result of:
"this is a tesdt
another test"
Which is essentially:
\"this is a tesdt \n another test\"
And is what you have asked for in the comments.
Explanation -
(?<=description") is a positive look-behind that tells the regex to match the text preceded by description"
(?:\s*\:\s*) is a non-capturing group that tells the regex that description" will be followed by zero-or-more spaces, a colon (:) and again zero-or-more spaces.
(".*?(?=")") is the actual match desired, which consists of a double quote ("), as few characters as possible up to the next double quote (with re.DOTALL, . also matches the newline), and a double quote (") at the end. A fixed limit such as {0,23} would fail here, because the value is longer than 23 characters.
# First just creating some test JSON
import json
data = {
'items': [
{
'description': 'A "good" thing',
# This is ignored because I'm assuming we only want the exact key 'description'
'full_description': 'Not a good thing'
},
{
'description': 'Test some slashes: \\ \\\\ \" // \/ \n\r',
},
]
}
j = json.dumps(data)
print(j)
# The actual code
import re
pattern = r'"description"\s*:\s*("(?:\\"|[^"])*?")'
descriptions = [
# I'm using json.loads just to parse the matched string to interpret
# escapes properly. If this is not acceptable then ast.literal_eval
# will probably also work
json.loads(d)
for d in re.findall(pattern, j)]
# Testing that it works
assert descriptions == [item['description'] for item in data['items']]
For the past few hours, I've been fighting to get a string into a JSON dict. I've tried everything from json.loads(... which throws an error:
requestInformation = json.loads(entry["request"]["postData"]["text"])
//throws this error
json.decoder.JSONDecodeError: Expecting property name enclosed in double quotes:
to stripping out the slashes using a medley of re.sub('\\','',mystring) ,mystring.sub(... to no effect. My problem string looks like so
'{items:[{n:\\'PackageChannel.GetUnitsInConfigurationForUnitType\\',ps:[{n:\\'unitType\\',v:"ActionTemplate"}]}]}'
The origin of this string is that it's a HAR dump from Google Chrome. I think those backslashes are from it being escaped somewhere along the way because the bulk of the HAR file doesn't contain them, but they do appear commonly in any field labeled "text".
"postData": {
"mimeType": "application/json",
"text": "{items:[{n:'PackageChannel.GetUnitsInConfigurationForUnitType',ps:[{n:'unitType',v:\"Analysis\"}]}]}"
}
EDIT I eventually gave up on turning the text above into JSON and instead opted for regex. Sometimes the slashes showed up, sometimes they didn't based on what I was viewing the text in and that made it difficult to work with.
The json module wants a string in which the keys are also wrapped in double quotes,
so the string below would work:
mystring = '{"items":[{"n":"PackageChannel.GetUnitsInConfigurationForUnitType", "ps":[{"n":"unitType","v":"ActionTemplate"}]}]}'
myjson = json.loads(mystring)
This function should remove the double backslashes and put double quotes around your keys.
import json, re
def make_jsonable(mystring):
    # we'll use this regex to find any key that doesn't contain any of: {}[]'",
    key_regex = r'([\,\[\{](\s+)?[^\"\{\}\,\[\]]+(\s+)?:)'
    mystring = re.sub(r"[\\]", "", mystring)  # remove any backslashes
    mystring = re.sub("'", '"', mystring)  # replace single quotes with doubles
    match = re.search(key_regex, mystring)
    while match:
        start_index = match.start(0)
        end_index = match.end(0)
        # show each unquoted key as it is found
        print(mystring[start_index+1:end_index-1].strip())
        # rebuild the string with the key wrapped in double quotes
        mystring = '%s"%s"%s' % (mystring[:start_index+1], mystring[start_index+1:end_index-1].strip(), mystring[end_index-1:])
        match = re.search(key_regex, mystring)
    return mystring
I couldn't directly test it on the first string you wrote, the double/single quotes don't match up, but on the one in the last code sample it works.
You'll need an r prefix before the JSON string, or replace every \ with \\.
This works:
import json
validasst_json = r'''{
"postData": {
"mimeType": "application/json",
"text": "{items:[{n:'PackageChannel.GetUnitsInConfigurationForUnitType',ps:[{n:'unitType',v:\"Analysis\"}]}]}"
}
}'''
txt = json.loads(validasst_json)
print(txt["postData"]['mimeType'])
print(txt["postData"]['text'])
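Why the r prefix matters can be seen in a smaller case. This is just a sketch with a made-up one-field document: without the prefix, Python itself consumes the backslashes before json.loads ever sees them.

```python
import json

raw = r'{"text": "say \"hi\""}'        # backslashes reach the JSON parser
doubled = '{"text": "say \\"hi\\""}'   # same text, written with doubled backslashes
plain = '{"text": "say \"hi\""}'       # Python turns \" into " first

print(raw == doubled)
print(json.loads(raw)["text"])

try:
    json.loads(plain)
    plain_parses = True
except json.JSONDecodeError:
    # The inner quotes are no longer escaped, so the text is not valid JSON.
    plain_parses = False

print(plain_parses)
```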
Non-working example:
print(" \{ Hello \} {0} ".format(42))
Desired output:
{Hello} 42
You need to double the {{ and }}:
>>> x = " {{ Hello }} {0} "
>>> x.format(42)
' { Hello } 42 '
Here's the relevant part of the Python documentation for format string syntax:
Format strings contain “replacement fields” surrounded by curly braces {}. Anything that is not contained in braces is considered literal text, which is copied unchanged to the output. If you need to include a brace character in the literal text, it can be escaped by doubling: {{ and }}.
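A short sketch of what that rule means in practice, including what happens when the braces are left single (the variable names are only for illustration):

```python
template = "{{ Hello }} {0}"
print(template.format(42))

# With single braces, "{ Hello }" is parsed as a replacement field whose
# name is " Hello ", and str.format looks for a keyword argument of that name.
try:
    "{ Hello } {0}".format(42)
except KeyError as exc:
    missing_field = exc.args[0]
    print("unescaped braces were read as a field named", repr(missing_field))
```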
Python 3.6+ (2017)
In the recent versions of Python one would use f-strings (see also PEP498).
With f-strings one should use double {{ or }}
n = 42
print(f" {{Hello}} {n} ")
produces the desired
{Hello} 42
If you need to resolve an expression in the brackets instead of using literal text you'll need three sets of brackets:
hello = "HELLO"
print(f"{{{hello.lower()}}}")
produces
{hello}
You escape it by doubling the braces.
Eg:
x = "{{ Hello }} {0}"
print(x.format(42))
The OP wrote this comment:
I was trying to format a small JSON for some purposes, like this: '{"all": false, "selected": "{}"}'.format(data) to get something like {"all": false, "selected": "1,2"}
It's pretty common that the "escaping braces" issue comes up when dealing with JSON.
I suggest doing this:
import json
data = "1,2"
mydict = {"all": False, "selected": data}  # Python False is serialized as JSON false
json.dumps(mydict)
It's cleaner than the alternative, which is:
'{{"all": false, "selected": "{}"}}'.format(data)
Using the json library is definitely preferable when the JSON string gets more complicated than the example.
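One case where the difference shows up is data that itself contains double quotes. A minimal sketch, with a made-up value for data:

```python
import json

data = 'a "quoted" value'  # hypothetical input containing double quotes

by_hand = '{{"all": false, "selected": "{}"}}'.format(data)
by_json = json.dumps({"all": False, "selected": data})

try:
    json.loads(by_hand)
    by_hand_is_valid = True
except json.JSONDecodeError:
    # str.format pasted the quotes in unescaped, so the result is not JSON.
    by_hand_is_valid = False

print(by_hand_is_valid)
print(json.loads(by_json)["selected"])
```

The str.format version only stays valid as long as the inserted text never contains characters that JSON requires to be escaped.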
You want to format a string with the character { or }
You just have to double them.
Format { with f'{{' and } with f'}}'.
So :
name = "bob"
print(f'Hello {name} ! I want to print }} and {{ or {{ }}')
Output :
Hello bob ! I want to print } and { or { }
OR for the exact example :
number = 42
print(f'{{Hello}} {number}')
Will print :
{Hello} 42
Finally :
number = 42
string = "bob"
print(f'{{Hello}} {{{number}}} {number} {{{string}}} {string} ')
{Hello} {42} 42 {bob} bob
Try this:
x = "{{ Hello }} {0}"
Try doing this:
x = " {{ Hello }} {0} "
print(x.format(42))
Although not any better, just for the reference, you can also do this:
>>> x = '{}Hello{} {}'
>>> print(x.format('{', '}', 42))
{Hello} 42
It can be useful for example when someone wants to print {argument}. It is maybe more readable than '{{{}}}'.format('argument')
Note that you can omit argument positions (e.g. {} instead of {0}) in Python 2.7 and later.
key = "FOOBAR"
print(f"hello {{{key}}}")
outputs
hello {FOOBAR}
In case someone wants to print something inside curly braces using f-strings.
If you need to keep two curly braces in the string, you need 5 curly braces on each side of the variable.
>>> myvar = 'test'
>>> "{{{{{0}}}}}".format(myvar)
'{{test}}'
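The count of five comes from two doubled pairs for the two literal braces plus one single pair for the replacement field. A small sketch that builds such a template for any nesting depth (wrap_in_braces is a made-up helper name):

```python
def wrap_in_braces(value, depth):
    """Surround value with `depth` literal braces on each side."""
    # Each literal brace costs two characters in the template; the innermost
    # single pair "{}" is the replacement field itself.
    template = "{{" * depth + "{}" + "}}" * depth
    return template.format(value)

print(wrap_in_braces("test", 1))
print(wrap_in_braces("test", 2))
```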
f-strings (python 3)
You can avoid having to double the curly brackets by using f-strings ONLY for the parts of the string where you want the f-magic to apply, and using regular (dumb) strings for everything that is literal and might contain 'unsafe' special characters. Let python do the string joining for you simply by stacking multiple strings together.
number = 42
print(" { Hello }"
f" {number} "
"{ thanks for all the fish }")
### OUTPUT:
{ Hello } 42 { thanks for all the fish }
NOTE: Line breaks between the strings are NOT required. I have only added them for readability. You could as well write the code above as shown below:
⚠️ WARNING: This might hurt your eyes or make you dizzy!
print("{Hello}"f"{number}""{thanks for all the fish}")
If you are going to be doing this a lot, it might be good to define a utility function that will let you use arbitrary brace substitutes instead, like
def custom_format(string, brackets, *args, **kwargs):
    if len(brackets) != 2:
        raise ValueError('Expected two brackets. Got {}.'.format(len(brackets)))
    padded = string.replace('{', '{{').replace('}', '}}')
    substituted = padded.replace(brackets[0], '{').replace(brackets[1], '}')
    formatted = substituted.format(*args, **kwargs)
    return formatted
>>> custom_format('{{[cmd]} process 1}', brackets='[]', cmd='firefox.exe')
'{{firefox.exe} process 1}'
Note that this will work either with brackets being a string of length 2 or an iterable of two strings (for multi-character delimiters).
I recently ran into this, because I wanted to inject strings into preformatted JSON.
My solution was to create a helper method, like this:
def preformat(msg):
    """ allow {{key}} to be used for formatting in text
    that already uses curly braces. First switch this into
    something else, replace curlies with double curlies, and then
    switch back to regular braces
    """
    msg = msg.replace('{{', '<<<').replace('}}', '>>>')
    msg = msg.replace('{', '{{').replace('}', '}}')
    msg = msg.replace('<<<', '{').replace('>>>', '}')
    return msg
You can then do something like:
formatted = preformat("""
{
"foo": "{{bar}}"
}""").format(bar="gas")
Gets the job done if performance is not an issue.
I am ridiculously late to this party. I am having success placing the brackets in the replacement element, like this:
print('{0} {1}'.format('{hello}', '{world}'))
which prints
{hello} {world}
Strictly speaking this is not what OP is asking, as s/he wants the braces in the format string, but this may help someone.
The reason is that {} is the replacement-field syntax of .format(), so .format() treats { Hello } as a field it cannot resolve and raises an error.
You can get around this by doubling the curly braces:
x = " {{ Hello }} {0} "
or
try %s for text formatting:
x = " { Hello } %s"
print(x % 42)
I stumbled upon this problem when trying to print text that I can copy and paste into a LaTeX document. I extend this answer and make use of named replacement fields:
Let's say you want to print out a product of multiple variables with indices, which in LaTeX would be $A_{ 0042 }*A_{ 3141 }*A_{ 2718 }*A_{ 0042 }$
The following code does the job with named fields so that for many indices it stays readable:
idx_mapping = {'i1':42, 'i2':3141, 'i3':2718 }
print('$A_{{ {i1:04d} }} * A_{{ {i2:04d} }} * A_{{ {i3:04d} }} * A_{{ {i1:04d} }}$'.format(**idx_mapping))
You can use a "quote wall" to separate the formatted string part from the regular string part.
From:
print(f"{Hello} {42}")
to
print("{Hello}"f" {42}")
A clearer example would be
string = 10
print(f"{string} {word}")
Output:
NameError: name 'word' is not defined
Now, add the quote wall like so:
string = 10
print(f"{string}"" {word}")
Output:
10 {word}
I used doubled {{ }} so that the f-string emits literal braces around the interpolated value.
For example, here's my Postgres UPDATE statement for an integer array column, which expects the array written inside {}, i.e.:
ports = '{100,200,300}'
With f-strings it is:
ports = "1,2,3"
query = f"""
UPDATE table SET ports = '{{{ports}}}' WHERE id = 1
"""
The actual query statement will be:
UPDATE table SET ports = '{1,2,3}' WHERE id = 1
which is a valid Postgres statement.
If you want to print just one side of the curly brace:
a=3
print(f'{"{"}{a}')
>>> {3
If you want to only print one curly brace (for example {) you can use {{, and you can add more braces later in the string if you want.
For example:
>>> f'{{ there is a curly brace on the left. Oh, and 1 + 1 is {1 + 1}'
'{ there is a curly brace on the left. Oh, and 1 + 1 is 2'
When you're just trying to interpolate code strings I'd suggest using jinja2 which is a full-featured template engine for Python, ie:
from jinja2 import Template
foo = Template('''
#include <stdio.h>
void main() {
printf("hello universe number {{number}}");
}
''')
for i in range(2):
print(foo.render(number=i))
That way you aren't forced to double the curly braces, as many of the other answers suggest.
If you need curly braces within a f-string template that can be formatted, you need to output a string containing two curly braces within a set of curly braces for the f-string:
css_template = f"{{tag}} {'{{'} margin: 0; padding: 0;{'}}'}"
for_p = css_template.format(tag="p")
# 'p { margin: 0; padding: 0;}'
Or just parametrize the bracket itself? Probably very verbose.
x = '{open_bracket}42{close_bracket}'.format(open_bracket='{', close_bracket='}')
print(x)
# {42}