Python: Can't turn string into JSON

For the past few hours, I've been fighting to get a string into a JSON dict. I've tried everything from json.loads(..., which throws an error:
requestInformation = json.loads(entry["request"]["postData"]["text"])
# throws this error
json.decoder.JSONDecodeError: Expecting property name enclosed in double quotes:
to stripping out the backslashes using a medley of re.sub('\\','',mystring), mystring.sub(... to no effect. My problem string looks like this:
'{items:[{n:\\'PackageChannel.GetUnitsInConfigurationForUnitType\\',ps:[{n:\\'unitType\\',v:"ActionTemplate"}]}]}'
This string comes from a HAR dump from Google Chrome. I think the backslashes come from the string being escaped somewhere along the way, because the bulk of the HAR file doesn't contain them, but they appear commonly in any field labeled "text".
"postData": {
    "mimeType": "application/json",
    "text": "{items:[{n:'PackageChannel.GetUnitsInConfigurationForUnitType',ps:[{n:'unitType',v:\"Analysis\"}]}]}"
}
EDIT: I eventually gave up on turning the text above into JSON and opted for regex instead. Sometimes the backslashes showed up and sometimes they didn't, depending on what I was viewing the text in, which made it difficult to work with.

The json module requires keys (and string values) to be wrapped in double quotes,
so the string below would work:
import json
mystring = '{"items":[{"n":"PackageChannel.GetUnitsInConfigurationForUnitType", "ps":[{"n":"unitType","v":"ActionTemplate"}]}]}'
myjson = json.loads(mystring)
This function should remove the backslashes, replace single quotes with double quotes, and put double quotes around your keys.
import json, re
def make_jsonable(mystring):
    # this regex finds any key that doesn't contain any of: {}[]",
    key_regex = r'([\,\[\{](\s+)?[^"\{\}\,\[\]]+(\s+)?:)'
    mystring = mystring.replace("\\", "")  # remove any backslashes
    mystring = mystring.replace("'", '"')  # replace single quotes with doubles
    match = re.search(key_regex, mystring)
    while match:
        start_index = match.start(0)
        end_index = match.end(0)
        # wrap the key (between the opening delimiter and the colon) in double quotes
        key = mystring[start_index+1:end_index-1].strip()
        mystring = '%s"%s"%s' % (mystring[:start_index+1], key, mystring[end_index-1:])
        match = re.search(key_regex, mystring)
    return mystring
I couldn't test it directly on the first string you wrote because the double/single quotes don't match up, but it works on the one in the last code sample.
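An alternative sketch, assuming every key is a bare identifier (letters, digits, underscores) and that no string value contains a { or , followed by an identifier and a colon: a single re.sub can quote all keys at once instead of looping. Like make_jsonable, it drops every backslash, so it would also mangle legitimate escapes such as \n inside values. The helper name quote_keys is hypothetical.

```python
import json
import re


def quote_keys(text):
    """Hypothetical helper: turn the HAR "text" payload into strict JSON.

    Assumes keys are bare identifiers and that string values never contain
    '{' or ',' immediately followed by identifier-colon.
    """
    text = text.replace("\\", "")   # drop stray backslashes (also drops real escapes!)
    text = text.replace("'", '"')   # single-quoted strings -> double-quoted
    # Wrap each bare key that follows '{' or ',' (plus optional spaces) in double quotes
    return re.sub(r'([{,]\s*)([A-Za-z_]\w*)\s*:', r'\1"\2":', text)


raw = "{items:[{n:'PackageChannel.GetUnitsInConfigurationForUnitType',ps:[{n:'unitType',v:\"Analysis\"}]}]}"
data = json.loads(quote_keys(raw))
print(data["items"][0]["ps"][0]["v"])  # Analysis
```

A regex cannot tell a key from identical-looking text inside a string value, so this only holds for payloads shaped like the one above.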

You'll need an r prefix before the JSON string literal, or you'll need to replace every \ with \\
This works:
import json
validasst_json = r'''{
    "postData": {
        "mimeType": "application/json",
        "text": "{items:[{n:'PackageChannel.GetUnitsInConfigurationForUnitType',ps:[{n:'unitType',v:\"Analysis\"}]}]}"
    }
}'''
txt = json.loads(validasst_json)
print(txt["postData"]['mimeType'])
print(txt["postData"]['text'])
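To see why the r prefix matters, here is a minimal sketch with a shortened, made-up payload: without r, Python turns each \" into a bare " before json.loads ever sees the text, so the inner quotes end the JSON string early.

```python
import json

# The same source text, written once as a normal literal and once as a raw literal.
cooked = '{"text": "v:\"Analysis\""}'   # Python reduces \" to "  ->  invalid JSON
raw = r'{"text": "v:\"Analysis\""}'     # backslashes survive for json to unescape

try:
    json.loads(cooked)
except json.JSONDecodeError as exc:
    print("normal literal fails:", exc.msg)

print(json.loads(raw)["text"])  # v:"Analysis"
```

This only matters for JSON pasted into source code; text read from a file with json.load never goes through Python's literal escaping.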

Related

python3 - json.loads for a string that contains " in a value

I'm trying to transform a string that contains a dict into a dict object using json.
But the data contains a " character.
Example:
string = '{"key1":"my"value","key2":"my"value2"}'
js = json.loads(string, strict=False)
It outputs json.decoder.JSONDecodeError: Expecting ',' delimiter: line 1 column 13 (char 12), because " is a delimiter and there are too many of them.
What is the best way to achieve my goal?
The solution I have found is to perform several .replace calls on the string, replacing the legitimate " with a pattern until only the illegal " remain, then replacing the pattern back with the legitimate ".
After that I can use json.loads and then replace the remaining pattern with the illegal ".
But there must be another way.
Example:
string = '{"key1":"my"value","key2":"my"value2"}'
string = string.replace('{"','__pattern_1')
string = string.replace('}"','__pattern_2')
...
...
string = string.replace('"','__pattern_42')
string = string.replace('__pattern_1','{"')
string = string.replace('__pattern_2','}"')
...
...
js = json.loads(string, strict=False)
This should work. What I'm doing here is replacing all the expected double quotes with placeholder text, removing the unwanted double quotes, and then converting the placeholders back.
import re
import json
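def fix_json_string(st):
    # protect the legitimate quotes with placeholders
    st = re.sub(r'","', "!!", st)
    st = re.sub(r'":"', "--", st)
    st = re.sub(r'{"', "{{", st)
    st = re.sub(r'"}', "}}", st)
    # drop every remaining (unwanted) double quote
    st = st.replace('"', '')
    # restore the legitimate quotes
    st = re.sub(r'}}', '"}', st)
    st = re.sub(r'{{', '{"', st)
    st = re.sub(r'--', '":"', st)
    st = re.sub(r'!!', '","', st)
    return st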
broken_string = '{"key1":"my"value","key2":"my"value2"}'
fixed_string = fix_json_string(broken_string)
print(fixed_string)
data = json.loads(fixed_string)  # parse as JSON instead of eval()
js = json.dumps(data)
print(js)
Output:
{"key1":"myvalue","key2":"myvalue2"} # str
{"key1": "myvalue", "key2": "myvalue2"} # converted to json
The variable string is not a valid JSON string.
The correct string should be:
string = '{"key1":"my\\"value","key2":"my\\"value2"}'
The problem is that the string is not valid JSON.
In the string '{"key1": "my"value", "key2": "my"value2"}', the value of key1 ends at "my", and the trailing characters value" violate the format.
You can use character escaping; valid JSON would look like this:
{"key1": "my\"value", "key2": "my\"value2"}
Since you are defining it as a Python string literal, you also need to escape the escape characters:
string = '{"key1": "my\\"value", "key2": "my\\"value2"}'
There is a lot of educational material online on character escaping. I recommend checking it out if something is not clear.
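If you control the code that produces the string, a safer route is to let json.dumps do the escaping rather than writing the backslashes by hand. A minimal sketch, assuming the data starts out as a Python dict:

```python
import json

data = {"key1": 'my"value', "key2": 'my"value2'}

encoded = json.dumps(data)   # embedded quotes are escaped automatically
print(encoded)               # {"key1": "my\"value", "key2": "my\"value2"}

# Round-trips back to the original values, quotes included
print(json.loads(encoded) == data)  # True
```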
Edit: If you insist on fixing the string in code (which I don't recommend, see comment) you can do something like this:
import re
import json
string = '{"key1":"my"value","key2":"my"value2"}'
# find the contents of keys and values, assuming that each real closing double quote
# of a key/value is followed by one of these characters: ,}:]
m = re.finditer(r'"([^:]+?)"(?:[,}:\]])', string)
new_string = string
# walk the matches backwards so earlier spans stay valid after each edit
for i in reversed(list(m)):
    ss, se = i.span(1)  # the first group holds the content
    # escape the double quotes inside the content and put everything back together
    # note that this is not efficient; a large number of replacements would need a different concatenation approach
    new_string = new_string[:ss] + new_string[ss:se].replace('"', '\\"') + new_string[se:]
print(json.loads(new_string))
This assumes that the real ending double quotes are followed by one of ,:}]. In other cases this won't work

How to remove text before a particular character or string in multi-line text?

I want to remove all the text before and including */ in a string.
For example, consider:
string = ''' something
other things
etc. */ extra text.
'''
Here I want extra text. as the output.
I tried:
string = re.sub("^(.*)(?=*/)", "", string)
I also tried:
string = re.sub(re.compile(r"^.\*/", re.DOTALL), "", string)
But when I print string, it hasn't performed the operation I wanted; the whole string is printed.
I suppose you're fine without regular expressions:
string[string.index("*/ ")+3:]
And if you want to strip that newline:
string[string.index("*/ ")+3:].rstrip()
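The problem with your first regex is that . does not match newlines, as you noticed; also, the * in the lookahead (?=*/) must be escaped as \*, or re raises a "nothing to repeat" error. With your second one, you were closer but forgot the * that time. This would work: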
string = re.sub(re.compile(r"^.*\*/", re.DOTALL), "", string)
You can also just get the part of the string that comes after your "*/":
string = re.search(r"(\*/)(.*)", string, re.DOTALL).group(2)
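A quick way to see the effect of re.DOTALL is to run the same pattern with and without the flag on the sample text from the question (a sketch; the variable name text is ours):

```python
import re

text = ''' something
other things
etc. */ extra text.
'''

# Without DOTALL, "." stops at "\n", so "^.*\*/" cannot reach "*/" on line 3
print(repr(re.sub(r"^.*\*/", "", text)))
# With DOTALL, ".*" may span newlines and the prefix up to "*/" is removed
print(repr(re.sub(r"^.*\*/", "", text, flags=re.DOTALL)))
```

Because .* is greedy, text containing several */ markers would lose everything up to the last one; .*? stops at the first.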
Update: After doing some research, I found that the pattern (\n|.) for matching everything including newlines is inefficient. I've updated the answer to use [\s\S] instead, as shown in the answer I linked.
The problem is that . in python regex matches everything except newlines. For a regex solution, you can do the following:
import re
strng = ''' something
other things
etc. */ extra text.
'''
print(re.sub(r"[\s\S]+\*/", "", strng))
# extra text.
Add in a .strip() if you want to remove that remaining leading whitespace.
To keep the text before that symbol, you can do this (it relies on */ being surrounded by spaces):
split_str = string.split(' ')
boundary = split_str.index('*/')
new = ' '.join(split_str[0:boundary])
print(new)
which gives you:
something
other things
etc.
string_list = string.split('*/')[1:]
string = '*/'.join(string_list)
print(string)
gives output as
' extra text. \n'
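Another option without regular expressions is str.partition, which splits at the first occurrence of the separator and, unlike string.index, does not raise if the separator is missing. A minimal sketch:

```python
text = ''' something
other things
etc. */ extra text.
'''

# partition returns (before, separator, after); separator is '' when "*/" is absent
before, sep, after = text.partition("*/")
result = after.strip() if sep else text
print(result)  # extra text.
```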

Extract json values using just regex

I have a description field that is embedded within json and I'm unable to utilize json libraries to parse this data.
I use {0,23} in an attempt to extract the first 23 characters of the string. How can I extract the entire value associated with description?
import re
description = "'\description\" : \"this is a tesdt \n another test\" "
re.findall(r'description(?:\w+){0,23}', description, re.IGNORECASE)
For the above code, only ['description'] is displayed.
You could try this code out:
import re
description = "description\" : \"this is a tesdt \n another test\" "
result = re.findall(r'(?<=description")(?:\s*\:\s*)(".*?(?=")")', description, re.IGNORECASE | re.DOTALL)[0]
print(result)
Which gives you the result of:
"this is a tesdt
another test"
Which is essentially:
\"this is a tesdt \n another test\"
And is what you have asked for in the comments.
Explanation:
(?<=description") is a positive look-behind that tells the regex to match text preceded by description"
(?:\s*\:\s*) is a non-capturing group that tells the regex that description" will be followed by zero or more spaces, a colon (:), and again zero or more spaces.
(".*?(?=")") is the actual match desired: a double quote ("), as few characters as possible (re.DOTALL lets them include newlines), and a closing double quote (").
# First just creating some test JSON
import json
data = {
    'items': [
        {
            'description': 'A "good" thing',
            # This is ignored because I'm assuming we only want the exact key 'description'
            'full_description': 'Not a good thing'
        },
        {
            'description': 'Test some slashes: \\ \\\\ \" // \\/ \n\r',
        },
    ]
}
j = json.dumps(data)
print(j)
# The actual code
import re
pattern = r'"description"\s*:\s*("(?:\\"|[^"])*?")'
descriptions = [
    # I'm using json.loads just to parse the matched string to interpret
    # escapes properly. If this is not acceptable then ast.literal_eval
    # will probably also work
    json.loads(d)
    for d in re.findall(pattern, j)]
# Testing that it works
assert descriptions == [item['description'] for item in data['items']]
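One caveat with the lazy pattern above: it can run past the closing quote when a value ends with an escaped backslash (for example "C:\\"), because the final \" is then read as an escaped quote. A sketch of a stricter JSON-string pattern, which at each step consumes either a plain character (not " or \) or a complete two-character backslash escape:

```python
import json
import re

# A JSON string literal: opening quote, then any mix of plain characters
# (not " or \) and two-character backslash escapes, then the closing quote.
JSON_STRING = r'"(?:[^"\\]|\\.)*"'
pattern = re.compile(r'"description"\s*:\s*(' + JSON_STRING + r')')

j = json.dumps({"description": "C:\\", "other": "x"})
descriptions = [json.loads(d) for d in pattern.findall(j)]
print(descriptions)  # ['C:\\']
```

Like the original, this still assumes the key "description" never appears as text inside some other string value.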

I want to replace single quotes with double quotes in a list

So I am making a program that takes a text file, breaks it into words, then writes the list to a new text file.
The issue I am having is I need the strings in the list to be with double quotes not single quotes.
For example
I get this ['dog','cat','fish'] when I want this ["dog","cat","fish"]
Here is my code
with open('input.txt') as f:
    file = f.readlines()
nonewline = []
for x in file:
    nonewline.append(x[:-1])
words = []
for x in nonewline:
    words = words + x.split()
textfile = open('output.txt','w')
textfile.write(str(words))
I am new to Python and haven't found anything about this.
Does anyone know how to solve this?
[Edit: I forgot to mention that I was using the output in an Arduino project that required the list to have double quotes.]
You cannot change how str works for lists.
How about using the JSON format, which uses " for strings?
>>> animals = ['dog','cat','fish']
>>> print(str(animals))
['dog', 'cat', 'fish']
>>> import json
>>> print(json.dumps(animals))
["dog", "cat", "fish"]
import json
...
textfile.write(json.dumps(words))
Most likely you just want to replace the single quotes with double quotes in your output:
str(words).replace("'", '"')
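One caveat with the .replace approach: Python's list repr switches to double quotes for any word that contains an apostrophe, so the blanket replace can produce broken output. A small sketch comparing it with json.dumps:

```python
import json

words = ["don't", "cat"]

naive = str(words).replace("'", '"')
print(naive)              # ["don"t", "cat"]  (the apostrophe became a quote)
print(json.dumps(words))  # ["don't", "cat"]
```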
You could also subclass Python's str type, wrap your strings in the new type, and override its __repr__() method to use double quotes instead of single quotes. It's better to keep things simple and explicit with the code above, though.
class str2(str):
    def __repr__(self):
        # Allow str.__repr__() to do the hard work, then
        # remove the outer two characters, single quotes,
        # and replace them with double quotes.
        return ''.join(('"', super().__repr__()[1:-1], '"'))
>>> "apple"
'apple'
>>> class str2(str):
...     def __repr__(self):
...         return ''.join(('"', super().__repr__()[1:-1], '"'))
...
>>> str2("apple")
"apple"
>>> str2('apple')
"apple"
In Python, double quotes and single quotes are the same; there's no difference between them, and there's no point in replacing single quotes with double quotes or vice versa:
2.4.1. String and Bytes literals
...In plain English: Both types of literals can be enclosed in matching single quotes (') or double quotes ("). They can also be enclosed in matching groups of three single or double quotes (these are generally referred to as triple-quoted strings). The backslash (\) character is used to escape characters that otherwise have a special meaning, such as newline, backslash itself, or the quote character...
"The issue I am having is I need the strings in the list to be with double quotes not single quotes." - Then you need to make your program accept single quotes, rather than trying to replace single quotes with double quotes.

Python query for code examples

I want to create something like a dictionary of Python code examples. My problem is that I have to escape all the code examples. Also, r'some string' is not useful. Would you recommend another solution for querying these entries?
import easygui
lex = {"dict": "woerter = {\"house\" : \"Haus\"}\nwoerter[\"house\"]",\
"for": "for x in range(0, 3):\n print \"We are on time %d\" % (x)",\
"while": "while expression:\n statement(s)"}
input_ = easygui.enterbox("Python-lex","")
output = lex[input_]
b = easygui.textbox("","",output)
Use triple quoting:
lex = {"dict": '''\
woerter = {"house" : "Haus"}
woerter["house"]
''',
"for": '''\
for x in range(0, 3):
    print "We are on time %d" % (x)
''',
"while": '''\
while expression:
    statement(s)
'''}
Triple-quoted strings (using ''' or """ delimiters) preserve newlines, and embedded single or double quotes do not need to be escaped.
The \ escape after the opening ''' triple quote escapes the newline at the start, making the value a little easier to read. The alternative would be to put the first line directly after the opening quotes.
You can make these raw as well; r'''\n''' would contain the literal characters \ and n, but literal newlines still remain literal newlines. Triple-quoting works with double-quote characters too: """This is a triple-quoted string too""". The only thing you'd have to escape is another triple quote in the same style; you only need to escape one quote character in that case:
triple_quote_with_embedded_triple = '''Triple quotes use \''' and """ delimiters'''
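If the column-0 snippets make the dict hard to read, one option (a sketch, not part of the answer above) is to indent each snippet along with the dict and strip the common indentation with textwrap.dedent. The snippets here use Python 3 print() rather than the Python 2 statements from the question:

```python
import textwrap

# Each snippet is indented to line up with the dict; textwrap.dedent strips
# the common leading whitespace so the stored code starts at column 0.
lex = {
    "dict": textwrap.dedent('''\
        woerter = {"house": "Haus"}
        woerter["house"]
        '''),
    "for": textwrap.dedent('''\
        for x in range(0, 3):
            print("We are on time %d" % x)
        '''),
}

print(lex["for"])
```

Relative indentation inside a snippet (the loop body here) is preserved; only the shared margin is removed.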
I guess you can use json.dumps(data, indent=1) to convert the data and pass the result to easygui.textbox, like this:
import json
import easygui
resp = dict(...)
easygui.textbox(text=json.dumps(resp, indent=1))
