List of dictionaries where value has two double quoted values


I ended up with a list of dictionaries as a string. I wanted to convert this string to a dictionary, but it gives an error.
data = '{
"address": "Ludwig-Wolf-Straß 1, 75181 Pforzheim Eutingen",
"lat": 48.90962790,
"lng": 8.74648390,
"name": "Psychiatrische Tagesklinik Pforzheim "Alte Mühle"",
"path": "appportrait7e29d81c345927b0start",
"color" : "yellow",
"zIndex": "30",}'
After checking it, I found that one value contains a second pair of double quotes.
data = {
"address": "Ludwig-Wolf-Straß 1, 75181 Pforzheim Eutingen",
"lat": 48.90962790,
"lng": 8.74648390,
"name": "Psychiatrische Tagesklinik Pforzheim "Alte Mühle"", # this value
"path": "appportrait7e29d81c345927b0start",
"color" : "yellow",
"zIndex": "30",}
I want to turn "Alte Mühle" into the single-quoted 'Alte Mühle' or just Alte Mühle. I tried converting the dictionary to str and using the string.replace() function, but it didn't work. Since the value is dynamic, I can't just change it in a static way, i.e.
string.replace('"Alte Mühle"', 'Alte Mühle') # will only change this value
Is there any way to get rid of this?

Not enough rep to comment, so I'm assuming you are starting with a bunch of string literals you typed manually into your code. If not, there are other ways to handle this, or it may not have been an issue to start with.
Here is a solution that doesn't require manually searching for problem strings. Enclose your dictionary string literal in triple quotes (either """ or ''' is permitted) instead of a single ' or ". This prevents the interpreter from getting confused by a ' or " inside the string literal.
data = """{
"address": "Ludwig-Wolf-Straß 1, 75181 Pforzheim Eutingen",
"lat": 48.90962790,
"lng": 8.74648390,
"name": "Psychiatrische Tagesklinik Pforzheim "Alte Mühle"",
"path": "appportrait7e29d81c345927b0start",
"color" : "yellow",
"zIndex": "30",}"""
Next, the double-quote problem can be handled with regular expressions (re). I have to leave this as an exercise as I am on a phone, but you can replace every " that lies inside a dictionary value with '. A search pattern such as :\s*"(.+)",$ applied line by line finds each string value. Find this pattern, modify the matched substring, then replace the old substring with the corrected one.
Finally, to interpret it as a dictionary, call ast.literal_eval(...) on the corrected string (a version of eval(...) made safer by only interpreting literals). This requires importing ast from the standard library.
Consider comparing this workload against manually fixing your strings, or against loading the strings or key/value pairs from a database, which avoids these string literal issues altogether.
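The steps above can be put together roughly as follows. This is only a sketch: it assumes one key/value pair per line and that every string value ends with ", at the end of its line. The names VALUE_LINE and fix_inner_quotes are made up for illustration.

```python
import ast
import re

data = """{
"address": "Ludwig-Wolf-Straß 1, 75181 Pforzheim Eutingen",
"lat": 48.90962790,
"lng": 8.74648390,
"name": "Psychiatrische Tagesklinik Pforzheim "Alte Mühle"",
"path": "appportrait7e29d81c345927b0start",
"color" : "yellow",
"zIndex": "30",}"""

# Assumption: one pair per line. Group 1 is the key plus the opening quote
# of the value, group 2 is the value body, group 3 is the closing quote
# with an optional comma and closing brace.
VALUE_LINE = re.compile(
    r'^([ \t]*"[^"]*"[ \t]*:[ \t]*")(.*)(",?[ \t]*\}?[ \t]*)$',
    re.MULTILINE,
)

def fix_inner_quotes(text):
    """Replace double quotes inside each string value with single quotes."""
    def repl(match):
        return match.group(1) + match.group(2).replace('"', "'") + match.group(3)
    return VALUE_LINE.sub(repl, text)

fixed = fix_inner_quotes(data)
result = ast.literal_eval(fixed)  # only evaluates Python literals
print(result["name"])
```

Numeric values such as "lat" are left alone because they have no opening quote after the colon. If a value could legitimately span several lines, this line-based pattern would not be enough.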

Related

How to jump to the next line in python if my list of strings is getting too long?

Line 1.
names_boys = ["john", "paul", "peter", "roger", "pete", "johnson", "derick", "christof", "andrew"]
# If this list is quite long and I want to continue on the next line (line 2) instead of having a very long line of code, what do I do?
I have tried using "\n", but it didn't work.
Bear with me, I am new to Python.

You can simply do this:
names_boys = ["john", "paul", "peter", "roger", "pete",
"johnson", "derick", "christof", "andrew"]

You can either just continue on the next line, or use a \ at the desired location, like so:
names_boys = ["john", "paul", "peter", "roger", "pete",\
"johnson", "derick", "christof", "andrew"]
Keep in mind that you can't have any whitespace or comments after the \.

Python statements are usually written in a single line. The newline character marks the end of the statement. If the statement is very long, we can explicitly divide into multiple lines with the line continuation character (\).
Python supports multi-line continuation inside parentheses ( ), brackets [ ], and braces { }. The brackets are used by List and the braces are used by dictionary objects. We can use parentheses for expressions, tuples, and strings.
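For the parenthesized case, a small sketch of how this looks for an expression, a string and a tuple (the variable names are just for illustration):

```python
# An arithmetic expression continued inside parentheses.
total = (1 + 2 + 3 +
         4 + 5 + 6)

# Adjacent string literals inside parentheses are joined into one string.
message = ("first part, "
           "second part")

# A tuple spread over several lines; the trailing comma is allowed.
point = (
    10,
    20,
)

print(total, message, point)
```

No backslash is needed in any of these, since the open parenthesis tells Python the statement is not finished yet.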
By the second rule, you can split your line as:
names_boys = [
"john", "paul", "peter", "roger",
"pete", "johnson", "derick", "christof",
"andrew"]

Converting string containing double quotes to json

How do I escape double-quote characters in Python and convert the string to JSON?
I have tried escaping the double quotes with escape characters, but that didn't work either.
raw_string = '[{"Attribute":"color","Keywords":"green","AttributeComments":null},{"Attribute":" season","Keywords":["Holly Berry"],"AttributeComments":null},{"Attribute":" size","Keywords":"20"x30"","AttributeComments":null},{"Attribute":" unit","Keywords":"1","AttributeComments":null}]'
new_data = json.loads(raw_string)
Loading fails with the error: Expecting ',' delimiter: line 1 column 180 (char 179)
The expected output is a JSON string.

The correct JSON string, with escaped quotes should look like this:
[{
"Attribute": "color",
"Keywords": "green",
"AttributeComments": null
}, {
"Attribute": " season",
"Keywords": ["Holly Berry"],
"AttributeComments": null
}, {
"Attribute": " size",
"Keywords": "20\"x30\"",
"AttributeComments": null
}, {
"Attribute": " unit",
"Keywords": "1",
"AttributeComments": null
}]
Edit:
You can use a regular expression to correct the string in Python, resulting in valid JSON:
import re
import json
raw_string = '[{"Attribute":"color","Keywords":"green","AttributeComments":null},{"Attribute":" season","Keywords":["Holly Berry"],"AttributeComments":null},{"Attribute":" size","Keywords":"20"x30"","AttributeComments":null},{"Attribute":" unit","Keywords":"1","AttributeComments":null}]'
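# Match the malformed value: digits, a stray quote, "x", digits, then two quotes.
pattern = r'"Keywords":"(\d+)"x(\d+)""'
# Raw replacement string so that \g<1> and \g<2> are not treated as invalid escapes.
correctedString = re.sub(pattern, r'"Keywords": "\g<1>x\g<2>"', raw_string)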
print(json.loads(correctedString))
Output:
[{u'Keywords': u'green', u'Attribute': u'color', u'AttributeComments': None}, {u'Keywords': [u'Holly Berry'], u'Attribute': u' season', u'AttributeComments': None}, {u'Keywords': u'20x30', u'Attribute': u' size', u'AttributeComments': None}, {u'Keywords': u'1', u'Attribute': u' unit', u'AttributeComments': None}]
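Note that the substitution above drops the inner quotes altogether. If the inch marks should survive, a variant can escape them instead. This is a minimal sketch that assumes the broken values always have the digits"x"digits shape seen in this sample; BROKEN_DIMENSION and escape_dimension are hypothetical names.

```python
import json
import re

raw_string = '[{"Attribute":"color","Keywords":"green","AttributeComments":null},{"Attribute":" season","Keywords":["Holly Berry"],"AttributeComments":null},{"Attribute":" size","Keywords":"20"x30"","AttributeComments":null},{"Attribute":" unit","Keywords":"1","AttributeComments":null}]'

# Matches only the malformed shape in this sample:
# "Keywords":"<digits>"x<digits>""
BROKEN_DIMENSION = re.compile(r'"Keywords":"(\d+)"x(\d+)""')

def escape_dimension(match):
    # Rebuild the value with both inner quotes backslash-escaped for JSON.
    return '"Keywords":"{}\\"x{}\\""'.format(match.group(1), match.group(2))

corrected = BROKEN_DIMENSION.sub(escape_dimension, raw_string)
data = json.loads(corrected)
print(data[2]["Keywords"])  # 20"x30"
```

Using a function as the replacement avoids having to think about how re.sub interprets backslashes in a replacement template.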

raw_string = '[{"Attribute":"color","Keywords":"green","AttributeComments":null},{"Attribute":" season","Keywords":["Holly Berry"],"AttributeComments":null},{"Attribute":" size","Keywords":"20x30","AttributeComments":null},{"Attribute":" unit","Keywords":"1","AttributeComments":null}]'
new_data = json.loads(raw_string)

First of all, change the key-value pair "Keywords":"20"x30"" to "Keywords":"20x30".
The formatting in your code is invalid. If this JSON was not written by you but generated by some other source, check the source. You can check whether JSON is valid using JSONLint: just paste your JSON there.
As for your code:
import json
raw_string = '[{"Attribute":"color","Keywords":"green","AttributeComments":null},{"Attribute":" season","Keywords":["Holly Berry"],"AttributeComments":null},{"Attribute":" size","Keywords":"20x30","AttributeComments":null},{"Attribute":" unit","Keywords":"1","AttributeComments":null}]'
new_data = json.loads(raw_string)
new_data is a list. If you check the type of its first element using print(type(new_data[0])), you'll find it is the dict you wanted.
EDIT: Since you say you are fetching this JSON from a database, check whether the JSON documents there all carry this type of formatting error. If so, you'd want to check where they are being generated. Your options are either to correct it at the source, or to correct it manually by adding escape characters if this is a one-off problem. I strongly suggest the former.

How to remove characters from a string until a special character?

I have a string like this.
config =\
{"status":"None",
"numbers":["123", "123", "123"],
"schedule":None,
"data":{
"x": "y"
}
}
I would like to remove the config=\ from the string and get a result like this.
{"status":"None",
"numbers":["123", "123", "123"],
"schedule":None,
"data":{
"x": "y"
}
}
How can I get this using a Python regex? I would like to take the multiline factor into account as well.
I am using this method:
re.sub(r'.*{"', '{"', script_config, flags=re.MULTILINE)
But the code considers each line separately. Also I would like to remove only the

You don't need regexp for it:
string = string.replace('config =\\', '')
If the first word is not specified:
string = string[string.find('\\')+1:] if '\\' in string else string
But if you want to use regexps:
string = re.sub(r'^.*\\', '', string)
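If the part you want to keep always starts at the first {, slicing from that position is another option that does not depend on the prefix at all. This is a sketch under that assumption; script_config is the variable name from the question, and the final ast.literal_eval call is only there to show that the remainder is a usable Python literal.

```python
import ast

script_config = '''config =\\
{"status":"None",
"numbers":["123", "123", "123"],
"schedule":None,
"data":{
"x": "y"
}
}'''

# Keep everything from the first opening brace onwards.
# If there is no brace at all, leave the string untouched.
start = script_config.find("{")
body = script_config[start:] if start != -1 else script_config

# The remainder uses None rather than null, so it is a Python literal.
parsed = ast.literal_eval(body)
print(parsed["numbers"])
```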

Python can't parse JSON with extra trailing comma

This code:
import json
s = '{ "key1": "value1", "key2": "value2", }'
json.loads(s)
produces this error in Python 2:
ValueError: Expecting property name: line 1 column 16 (char 15)
Similar result in Python 3:
json.decoder.JSONDecodeError: Expecting property name enclosed in double quotes: line 1 column 16 (char 15)
If I remove that trailing comma (after "value2"), I get no error. But my code will process many different JSONs, so I can't do it manually. Is it possible to set up the parser to ignore such trailing commas?

Another option is to parse it as YAML; YAML accepts valid JSON but also accepts all sorts of variations.
import yaml
s = '{ "key1": "value1", "key2": "value2", }'
yaml.safe_load(s)  # safe_load needs no explicit Loader argument and only builds plain Python objects

The JSON specification doesn't allow trailing commas. The parser throws because it encounters an invalid syntax token.
You might be interested in using a different parser for those files, e.g. a parser built for the JSON5 spec, which allows such syntax.

It could be that this data stream is JSON5, in which case there's a parser for that: https://pypi.org/project/json5/
This situation can be alleviated by a regex substitution that looks for ", }, and replaces it with " }, allowing for any amount of whitespace between the quotes, comma and close-curly.
>>> import re
>>> s = '{ "key1": "value1", "key2": "value2", }'
>>> re.sub(r"\"\s*,\s*\}", "\" }", s)
'{ "key1": "value1", "key2": "value2" }'
Giving:
>>> import json
>>> s2 = re.sub(r"\"\s*,\s*\}", "\" }", s)
>>> json.loads(s2)
{'key1': 'value1', 'key2': 'value2'}
EDIT: as commented, this is not a good practice unless you are confident your JSON data contains only simple words, and this change is not corrupting the data-stream further. As I commented on the OP, the best course of action is to repair the up-stream data source. But sometimes that's not possible.

I wrote a regex that finds and removes every comma followed by ] or } in the JSON, while skipping the ones inside strings.
It seems to work well and is fast.
import re, json
s = r'''
[
123, true, false, null,
{
"\n\\\",]\\": "\n\\\",]\\",
"\n\\\",}\\": "\n\\\",}\\",
},
]
'''
r = json.loads(re.sub(r'("(?:\\?.)*?")|,\s*([]}])', r'\1\2', s))
print(r) # [123, True, False, None, {'\n\\",]\\': '\n\\",]\\', '\n\\",}\\': '\n\\",}\\'}]

That's because an extra , is invalid according to JSON standard.
An object is an unordered set of name/value pairs. An object begins
with { (left brace) and ends with } (right brace). Each name is
followed by : (colon) and the name/value pairs are separated by ,
(comma).
If you really need this, you could wrap Python's json parser with jsoncomment. But I would try to fix the JSON at the source.

I suspect it doesn't parse because "it's not json", but you could pre-process strings, using regular expression to replace , } with } and , ] with ]
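A minimal sketch of that pre-processing step, wrapped in a helper (loads_lenient and TRAILING_COMMA are made-up names). It assumes no string value in the data contains a comma followed by } or ], because this simple pattern would change such a string too.

```python
import json
import re

# A comma, optional whitespace, then a closing brace or bracket.
TRAILING_COMMA = re.compile(r",\s*([}\]])")

def loads_lenient(text):
    """Drop trailing commas before } or ] and then parse as JSON."""
    return json.loads(TRAILING_COMMA.sub(r"\1", text))

print(loads_lenient('{ "key1": "value1", "key2": "value2", }'))
print(loads_lenient('[1, 2, {"a": [3, 4, ], }, ]'))
```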

How about using the following regex?
s = re.sub(r",\s*}", "}", s)

JSON string parser without literals

How to check if a string like this {:[{},{}]}, without any literals, can be represented as a JSON object or not?
The input comes with the following constraints:
1. A JSON object should start with '{' and end with '}'.
2. The key and value should be separated by a ':'.
3. A ',' suggests an additional JSON property.
4. An array only consists of JSON objects. It cannot contain a "key":"value" pair by itself.
And it is to be interpreted like this:
{
"Key": [{
"Key": "Value"
}, {
"Key": "Value"
}]
}

The syntax spec for JSON can be found here.
It indicates that the [{},{}] is legal, because [] has to contain 0 or more elements separated by ,, and {} is a legal element. However, the first part of your example is NOT valid - the : must have a string in front of it. While it is legal for it to be an empty string, it's not legal for it to be null, and the interpretation of a totally missing element is ambiguous.
So. {"":[{},{}]} is legal, but {:[{},{}]} is not.
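You can confirm this with Python's standard json module. The helper name is_valid_json is just for illustration.

```python
import json

def is_valid_json(text):
    """Return True if text parses as JSON, False otherwise."""
    try:
        json.loads(text)
    except json.JSONDecodeError:
        return False
    return True

print(is_valid_json('{"":[{},{}]}'))  # True
print(is_valid_json('{:[{},{}]}'))    # False
```

This only checks standard JSON validity. It does not implement the extra constraints from the question, such as arrays containing only objects.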
