This code:
import json
s = '{ "key1": "value1", "key2": "value2", }'
json.loads(s)
produces this error in Python 2:
ValueError: Expecting property name: line 1 column 16 (char 15)
Similar result in Python 3:
json.decoder.JSONDecodeError: Expecting property name enclosed in double quotes: line 1 column 16 (char 15)
If I remove that trailing comma (after "value2"), I get no error. But my code will process many different JSON documents, so I can't do it manually. Is it possible to set up the parser to ignore such trailing commas?
Another option is to parse it as YAML; YAML accepts valid JSON but also accepts all sorts of variations.
import yaml
s = '{ "key1": "value1", "key2": "value2", }'
yaml.safe_load(s)  # safe_load avoids the explicit Loader argument that yaml.load() requires in recent PyYAML
The JSON specification doesn't allow a trailing comma. The parser raises an error because it encounters an invalid token.
You might be interested in using a different parser for those files, e.g. a parser built for the JSON5 spec, which allows such syntax.
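To observe the rejection described above without crashing the program, the call can be wrapped in a try/except. This is only a minimal sketch; try_parse is a hypothetical helper name, not part of the json module.

```python
import json

def try_parse(text):
    """Return (value, None) on success or (None, error_message) on failure."""
    try:
        return json.loads(text), None
    except ValueError as exc:  # json.JSONDecodeError is a subclass of ValueError
        return None, str(exc)

value, error = try_parse('{ "key1": "value1", "key2": "value2", }')
print(value)   # None, because the trailing comma is rejected
print(error)   # the exact wording depends on the Python version
```

Catching ValueError keeps the sketch usable on both Python 2 and Python 3, since both raise it (or a subclass) for invalid input.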
It could be that this data stream is JSON5, in which case there's a parser for that: https://pypi.org/project/json5/
This situation can be alleviated by a regex substitution that looks for ", }, and replaces it with " }, allowing for any amount of whitespace between the quotes, comma and close-curly.
>>> import re
>>> s = '{ "key1": "value1", "key2": "value2", }'
>>> re.sub(r"\"\s*,\s*\}", "\" }", s)
'{ "key1": "value1", "key2": "value2" }'
Giving:
>>> import json
>>> s2 = re.sub(r"\"\s*,\s*\}", "\" }", s)
>>> json.loads(s2)
{'key1': 'value1', 'key2': 'value2'}
EDIT: as commented, this is not a good practice unless you are confident your JSON data contains only simple words, and this change is not corrupting the data-stream further. As I commented on the OP, the best course of action is to repair the up-stream data source. But sometimes that's not possible.
I wrote a regex that finds and removes every comma that is followed by ] or } in the JSON, while skipping commas inside strings.
It seems to work well and is fast.
import re, json
s = r'''
[
123, true, false, null,
{
"\n\\\",]\\": "\n\\\",]\\",
"\n\\\",}\\": "\n\\\",}\\",
},
]
'''
r = json.loads(re.sub(r'("(?:\\?.)*?")|,\s*([]}])', r'\1\2', s))
print(r) # [123, True, False, None, {'\n\\",]\\': '\n\\",]\\', '\n\\",}\\': '\n\\",}\\'}]
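The same idea can be packaged as a reusable function. The sketch below is a variation, not the exact regex above: it spells out the string-literal alternative a little more explicitly and uses a replacement function instead of a template. The names loads_lenient and _TRAILING_COMMA are hypothetical.

```python
import json
import re

# Alternative 1 matches a whole JSON string literal (kept as-is);
# alternative 2 matches a comma, optional whitespace, then ] or }.
_TRAILING_COMMA = re.compile(r'("(?:\\.|[^"\\])*")|,\s*([\]}])')

def loads_lenient(text):
    """Parse JSON after dropping commas that directly precede ] or }."""
    def fix(match):
        if match.group(1) is not None:
            return match.group(1)   # a string literal: leave it untouched
        return match.group(2)       # keep the bracket, drop the comma
    return json.loads(_TRAILING_COMMA.sub(fix, text))

print(loads_lenient('{ "key1": "value1", "key2": "value2", }'))
```

Because string literals are matched first, a ",}" or ",]" sequence inside a value is never rewritten.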
That's because an extra , is invalid according to the JSON standard:
An object is an unordered set of name/value pairs. An object begins
with { (left brace) and ends with } (right brace). Each name is
followed by : (colon) and the name/value pairs are separated by ,
(comma).
If you really need this, you could wrap Python's json parser with jsoncomment. But I would try to fix the JSON at the source.
I suspect it doesn't parse because "it's not json", but you could pre-process strings, using regular expression to replace , } with } and , ] with ]
How about using the following regex?
s = re.sub(r",\s*}", "}", s)
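One caveat with a blanket substitution like this: it also rewrites matching text inside string values, so valid data can be altered silently. A small illustration, using a made-up input:

```python
import json
import re

s = '{"k": "a, }"}'                # valid JSON; the comma is inside a string
fixed = re.sub(r",\s*}", "}", s)

print(json.loads(s))      # {'k': 'a, }'}
print(json.loads(fixed))  # {'k': 'a}'}
```

Both strings parse, but the value has changed, so this approach is only safe when the values are known not to contain such sequences.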
Related
Python Escape Double quote character and convert the string to json
I have tried escaping the double quotes with escape characters, but that didn't work either.
raw_string = '[{"Attribute":"color","Keywords":"green","AttributeComments":null},{"Attribute":" season","Keywords":["Holly Berry"],"AttributeComments":null},{"Attribute":" size","Keywords":"20"x30"","AttributeComments":null},{"Attribute":" unit","Keywords":"1","AttributeComments":null}]'
new_data = json.loads(raw_string)
Loading fails with the error: Expecting ',' delimiter: line 1 column 180 (char 179)
The expected output is a JSON string.
The correct JSON string, with escaped quotes should look like this:
[{
"Attribute": "color",
"Keywords": "green",
"AttributeComments": null
}, {
"Attribute": " season",
"Keywords": ["Holly Berry"],
"AttributeComments": null
}, {
"Attribute": " size",
"Keywords": "20\"x30",
"AttributeComments": null
}, {
"Attribute": " unit",
"Keywords": "1",
"AttributeComments": null
}]
Edit:
You can use a regular expression to correct the string in Python, resulting in valid JSON:
import re
import json
raw_string = '[{"Attribute":"color","Keywords":"green","AttributeComments":null},{"Attribute":" season","Keywords":["Holly Berry"],"AttributeComments":null},{"Attribute":" size","Keywords":"20"x30"","AttributeComments":null},{"Attribute":" unit","Keywords":"1","AttributeComments":null}]'
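pattern = r'"Keywords":"([\d].)"x([\d].)""'
# Raw replacement string so \g<1> reaches re unchanged; note this drops the stray inner quote rather than escaping it
correctedString = re.sub(pattern, r'"Keywords": "\g<1>x\g<2>"', raw_string)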
print(json.loads(correctedString))
Output (Python 2):
[{u'Keywords': u'green', u'Attribute': u'color', u'AttributeComments': None}, {u'Keywords': [u'Holly Berry'], u'Attribute': u' season', u'AttributeComments': None}, {u'Keywords': u'20x30', u'Attribute': u' size', u'AttributeComments': None}, {u'Keywords': u'1', u'Attribute': u' unit', u'AttributeComments': None}]
raw_string = '[{"Attribute":"color","Keywords":"green","AttributeComments":null},{"Attribute":" season","Keywords":["Holly Berry"],"AttributeComments":null},{"Attribute":" size","Keywords":"20x30","AttributeComments":null},{"Attribute":" unit","Keywords":"1","AttributeComments":null}]'
new_data = json.loads(raw_string)
First of all, change the key-value pair "Keywords":"20"x30"" to "Keywords":"20x30".
The formatting is invalid in your code. If this JSON was not written by you but generated by some other source, check that source. You can check whether the JSON is valid using JSONLint: just paste your JSON there.
As for your code:
import json
raw_string = '[{"Attribute":"color","Keywords":"green","AttributeComments":null},{"Attribute":" season","Keywords":["Holly Berry"],"AttributeComments":null},{"Attribute":" size","Keywords":"20x30","AttributeComments":null},{"Attribute":" unit","Keywords":"1","AttributeComments":null}]'
new_data = json.loads(raw_string)
new_data is a list. If you check the type of its first element, using print(type(new_data[0])), you'll find it is a dict, as you wanted.
EDIT: Since you say you are fetching this JSON from a database, check whether the JSON documents there all carry this type of formatting error. If so, you'd want to check where these documents are being generated. Your options are either to correct it at the source, or, if this is a one-off problem, to correct it manually by adding escape characters. I strongly suggest the former.
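For illustration, this is roughly what "correcting it at the source" could look like if the producer is Python code: build the data structure and let json.dumps do the escaping. The record below is an assumption about the intended value (20"x30), based on the corrected JSON shown earlier.

```python
import json

# Hypothetical record as the producer might hold it in memory.
record = {"Attribute": " size", "Keywords": '20"x30', "AttributeComments": None}

encoded = json.dumps([record])
print(encoded)  # the inner double quote is written as \"

# Round trip: the consumer gets back exactly what the producer had.
assert json.loads(encoded) == [record]
```

With this approach no hand-written escaping is needed, whatever characters the values contain.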
I have a string like this.
config =\
{"status":"None",
"numbers":["123", "123", "123"],
"schedule":None,
"data":{
"x": "y"
}
}
I would like to remove the config=\ from the string and get a result like this.
{"status":"None",
"numbers":["123", "123", "123"],
"schedule":None,
"data":{
"x": "y"
}
}
How can I get this using python regex? Would like to consider the multiline factor as well!!
I am using this method
re.sub(r'.*{"', '{"', script_config, flags=re.MULTILINE)
But the code considers each line separately. Also I would like to remove only the
You don't need a regexp for it:
string = string.replace('config =\\', '')
If the first word is not specified:
string = string[string.find('\\')+1:] if '\\' in string else string
But if you want to use regexps:
string = re.sub(r'^.*\\', '', string)
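On the multiline point raised in the question: re.MULTILINE only changes how ^ and $ behave, while re.DOTALL is what lets . match across newlines. The sketch below removes everything before the first { and then parses the remainder. Using ast.literal_eval is an assumption on my part, chosen because the sample contains the Python literal None rather than JSON's null.

```python
import ast
import re

script_config = '''config =\\
{"status":"None",
"numbers":["123", "123", "123"],
"schedule":None,
"data":{
"x": "y"
}
}'''

# Lazily consume everything from the start of the text up to the first "{".
body = re.sub(r'\A.*?(?=\{)', '', script_config, count=1, flags=re.DOTALL)

# The text is a Python literal (None, not null), so json.loads would reject it.
config = ast.literal_eval(body)
print(config["numbers"])
```

If the text contains no { at all, the pattern does not match and the string is returned unchanged.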
I want to use " instead of '.
countryid="be"
msg = "{'field1': 'abc','field2': '"+countryid+"', 'field3': '1'}"
I tried to use """ as follows:
msg = """{"field1": "abc","field2": """"+countryid+"""", "field3": "1"}"""
But then I get:
{"field1": "abc","field2": +countryid+, "field3": "1"}
instead of:
{"field1": "abc","field2": "be", "field3": "1"}
I will give an answer to your question; however, I think the question you are asking may be slightly wrong.
The issue you are facing is due to the unfortunate placement of the start quote for the countryid:
msg = """{"field1": "abc","field2": """"+countryid+"""", "field3": "1"}"""
                                    ^^^^
                                    ||||
The python documentation for string literals says:
In triple-quoted strings, unescaped newlines and quotes are allowed
(and are retained), except that three unescaped quotes in a row
terminate the string. (A "quote" is the character used to open the
string, i.e. either ' or ".)
So you have many options to fix your issue:
1) Escape the problematic quote:
msg = """{"field1": "abc","field2": \""""+countryid+"""", "field3": "1"}"""
                                    ^
                                    |
2) Use single quotes:
msg = '{"field1": "abc","field2": "'+countryid+'", "field3": "1"}'
3) Use string formatting:
msg = """{"field1": "abc","field2": "%s", "field3": "1"}""" % countryid
                                     ^                      ^ ^
                                     |                      | |
There are even more options if you're using Python 3.6 or later, or if you're willing to use a library.
This is one of the areas of the Python language where the "Zen of Python" principle of "one obvious way to do something" is broken.
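One of the Python 3.6+ options is presumably an f-string. A minimal sketch; note that literal braces must be doubled inside an f-string:

```python
countryid = "be"

# {{ and }} produce literal braces; {countryid} is substituted.
msg = f'{{"field1": "abc","field2": "{countryid}", "field3": "1"}}'
print(msg)  # {"field1": "abc","field2": "be", "field3": "1"}
```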
Note that all of the methods I've explained above are not actually a good way of generating JSON.
Here is how I'd do it if I were you:
import json
countryid= "someone's ID"
message = {'field1': 'abc','field2': countryid,'field3': '1'}
msg = json.dumps(message)
You can use the json module:
import json
countryid="be"
msg = {
'field1': 'abc',
'field2': countryid,
'field3': '1'
}
print(json.dumps(msg))
Is there any way, using a regular expression in Python, to remove every , (comma) that follows a closing curly brace }?
Data is of the following format in a file - abc.json
{
"Key1":"value1",
"Key2":"value2"
},
{
"Key1":"value3",
"Key2":"value4"
},
{
"Key1":"value5",
"Key2":"value6"
}
This should result in the following -
{
"Key1":"value1",
"Key2":"value2"
}
{
"Key1":"value3",
"Key2":"value4"
}
{
"Key1":"value5",
"Key2":"value6"
}
As you can see, the , (comma) has been removed after every closing brace }.
It would be helpful if this could be achieved via jq as well, apart from a Python regex.
Test Source: https://regex101.com/r/wT6uU2/1
import re
p = re.compile(r'},')  # the Python 2 ur'' prefix is a syntax error in Python 3
test_str = u"{\n\"Key1\":\"value1\",\n\"Key2\":\"value2\"\n},\n\n{\n\"Key1\":\"value3\",\n\"Key2\":\"value4\"\n},\n\n{\n\"Key1\":\"value5\",\n\"Key2\":\"value6\"\n}"
re.findall(p, test_str)
But use a substitution instead, replacing }, with }:
p.sub('}', test_str)
This works:
import re
s="""{
"Key1":"value1",
"Key2":"value2"
},
{
"Key1":"value3",
"Key2":"value4"
},
{
"Key1":"value5",
"Key2":"value6"
}"""
pattern=re.compile(r'(?P<data>{.*?}),', re.S)
print(pattern.findall(s))
s1 = pattern.sub(r'\g<data>', s)
print(s1)
If you intend to process the resulting JSON in jq, it's probably easier to wrap it in brackets [{...}, {...}] to make it a JSON array. Then, you can use .[] in jq to unwrap the array.
Before you even consider other options, you really should go back to the source that generated that file and make sure it actually outputs valid json.
That said, you could use jq to manipulate the contents as a raw string to add brackets, then parse it as an array to then spit out the contents.
$ jq -Rs '"[\(.)]" | fromjson[]' abc.json
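The same bracket-wrapping trick can be sketched in Python with only the standard library. The sample text below is a shortened copy of the data in the question:

```python
import json

text = """{
"Key1":"value1",
"Key2":"value2"
},
{
"Key1":"value3",
"Key2":"value4"
}"""

# Wrapping in [ ] turns the comma-separated objects into one JSON array.
records = json.loads("[" + text + "]")
for record in records:
    print(json.dumps(record))
```

This avoids regex edits entirely, because the commas between objects are exactly what a JSON array expects.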
I have a simple example:
test = '{ "text": "\"test\""}'
It is valid JSON (see http://jsonlint.com/).
But simplejson.loads(test) returns an error:
ValueError: Expecting , delimiter: line 1 column 12 (char 12)
Why?
In a regular (non-raw) Python string literal, \" means " only. So, the string is actually processed as
{ "text": ""test""}
As you see now, there is an empty string before test and test is not enclosed by double quotes. So, you just have to escape the \ as well, like this
test = '{ "text": "\\"test\\"" }'
Or create the text as a raw string, like this
test = r'{ "text": "\"test\"" }'
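The difference can be checked directly by printing both forms. This sketch uses the standard json module rather than simplejson, on the assumption that the two behave the same for this input:

```python
import json

plain = '{ "text": "\"test\"" }'   # the backslashes are consumed by Python
raw = r'{ "text": "\"test\"" }'    # the backslashes reach the JSON parser

print(plain)            # { "text": ""test"" }
print(raw)              # { "text": "\"test\"" }
print(json.loads(raw))  # {'text': '"test"'}
```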