I have a string :
'{tomatoes : 5 , livestock :{cow : 5 , sheep :2 }}'
and would like to convert it to
{
"tomatoes" : "5" ,
"livestock" :"{"cow" : "5" , "sheep" :"2" }"
}
Any ideas ?
This has been settled in 988251
In short: use the literal_eval() function from Python's ast module.
import ast
my_string = "{'key':'val','key2':2}"
my_dict = ast.literal_eval(my_string)
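As a hedged check of how far this goes: literal_eval() only accepts Python literals, so it handles a dictionary string whose keys are quoted but rejects bare names such as the ones in the question. The sketch below uses made-up variable names (`quoted`, `bare`, `bare_ok`) purely for illustration.

```python
import ast

# Keys are quoted, so this is a valid Python literal.
quoted = "{'tomatoes': 5, 'livestock': {'cow': 5, 'sheep': 2}}"
# Keys are bare names, as in the question.
bare = '{tomatoes : 5 , livestock :{cow : 5 , sheep :2 }}'

parsed = ast.literal_eval(quoted)

try:
    ast.literal_eval(bare)
    bare_ok = True
except ValueError:
    # literal_eval refuses names because they are not literals
    bare_ok = False

print(parsed)
print(bare_ok)
```

So the string from the question still needs its keys quoted before literal_eval() can read it.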
The problem with your input string is that it is not valid JSON, because the keys are not quoted strings; otherwise you could just load it with the json module and be done with it.
A quick and dirty way to get what you want is to first turn it into valid JSON by adding quotation marks around every run of alphanumeric characters, that is, around everything that is not whitespace or a syntax character:
source = '{tomatoes : 5 , livestock :{cow : 5 , sheep :2 }}'
output = ""
quoting = False
for char in source:
    if char.isalnum():
        if not quoting:
            output += '"'
            quoting = True
    elif quoting:
        output += '"'
        quoting = False
    output += char
print(output) # {"tomatoes" : "5" , "livestock" :{"cow" : "5" , "sheep" :"2" }}
This gives you valid JSON, so now you can easily parse it into a Python dict using the json module:
import json
parsed = json.loads(output)
# {'livestock': {'sheep': '2', 'cow': '5'}, 'tomatoes': '5'}
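The same quoting pass can also be sketched with a regular expression. This is only an illustration under the same assumptions as the loop above: keys and values consist solely of ASCII letters and digits, and it is acceptable for numbers to come back as strings.

```python
import json
import re

source = '{tomatoes : 5 , livestock :{cow : 5 , sheep :2 }}'

# Wrap every run of ASCII letters and digits in double quotes.
quoted = re.sub(r'[A-Za-z0-9]+', r'"\g<0>"', source)

parsed = json.loads(quoted)
print(quoted)
print(parsed)
```

Like the loop, this breaks as soon as a value contains spaces, punctuation, or quotes of its own.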
What you have is a JSON-formatted string that you want to convert to a Python dictionary.
Using the json library:
import json
with open("your file", "r") as f:
    dictionary = json.loads(f.read())
Now dictionary contains the data structure you're looking for.
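A minimal, self-contained sketch of the file-based approach, assuming the file really holds valid JSON with double-quoted keys (the string from the question would not load this way). The temporary file and its name `data.json` are made up for the example; json.load(f) is used as the shorter equivalent of json.loads(f.read()).

```python
import json
import os
import tempfile

with tempfile.TemporaryDirectory() as tmp:
    # Hypothetical file, created only so the example can run on its own.
    file_path = os.path.join(tmp, "data.json")
    with open(file_path, "w") as f:
        f.write('{"tomatoes": 5, "livestock": {"cow": 5, "sheep": 2}}')

    # json.load reads and parses the file object in one step.
    with open(file_path) as f:
        dictionary = json.load(f)

print(dictionary["livestock"]["sheep"])
```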
Here is my answer:
dict_str = '{tomatoes: 5, livestock: {cow: 5, sheep: 2}}'
def dict_from_str(dict_str):
    # Caution: eval() executes arbitrary code, so only use this on trusted input.
    while True:
        try:
            dict_ = eval(dict_str)
        except NameError as e:
            # The message looks like "name 'tomatoes' is not defined"
            key = str(e).split("'")[1]
            dict_str = dict_str.replace(key, "'{}'".format(key))
        else:
            return dict_
print(dict_from_str(dict_str))
My strategy is to convert the dictionary str to a dict by eval. However, I first have to deal with the fact that your dictionary keys are not enclosed in quotes. I do that by evaluating it anyway and catching the error. From the error message, I extract the key that was interpreted as an unknown variable, and enclose it with quotes.
I am running a piece of code in Python 3 that consumes JSON data from a source I don't control. While reading the JSON data I get the following error:
simplejson.errors.JSONDecodeError: Expecting property name enclosed in double quotes: line 1 column 2
Here is the code
import logging
import simplejson as json
from kafka import KafkaConsumer  # assuming the kafka-python package
logging.basicConfig(level=logging.INFO)
consumer = KafkaConsumer(
    bootstrap_servers='localhost:9092',
    api_version=(1,0,0))
consumer.subscribe(['Test_Topic-1'])
for message in consumer:
    msg_str=message.value
    y = json.loads(msg_str)
    print(y["city_name"])
As I cannot change the source, I need to fix it on my end. I found this post helpful, as my data contains timestamps with : in them: How to Fix JSON Key Values without double-quotes?
But it also fails for some values in my json data as those values contain : in it. e.g.
address:"1600:3050:rf02:hf64:h000:0000:345e:d321"
Is there any way where I can add double quotes to keys in my json data?
You can try to use module dirtyjson - it can fix some mistakes.
import dirtyjson
d = dirtyjson.loads('{address:"1600:3050:rf02:hf64:h000:0000:345e:d321"}')
print( d['address'] )
d = dirtyjson.loads('{abc:"1:2:3:4", efg:"5:6:7:8", "hij":"foo"}')
print( d['abc'] )
It creates an AttributedDict, so you may need dict() to get a normal dictionary:
d = dirtyjson.loads('{abc:"1:2:3:4", efg:"5:6:7:8", "hij":"foo"}')
print( d )
print( dict(d) )
Result:
AttributedDict([('abc', '1:2:3:4'), ('efg', '5:6:7:8'), ('hij', 'foo')])
{'abc': '1:2:3:4', 'efg': '5:6:7:8', 'hij': 'foo'}
I think your problem is that you have strings like this:
{abc:"1:2:3:4", efg:"5:6:7:8", "hij":"foo"}
which are not valid JSON. You could try to repair it with a regular expression substitution:
import re
jtxt_bad ='{abc:"1:2:3:4", efg:"5:6:7:8", "hij":"foo", klm:"bar"\n}'
jtxt = re.sub(r'\b([a-zA-Z]+):("[^"]+"[,\n}])', r'"\1":\2', jtxt_bad)
print(f'Original: {jtxt_bad}\nRepaired: {jtxt}')
The output of this is:
Original: {abc:"1:2:3:4", efg:"5:6:7:8", "hij":"foo", klm:"bar"
}
Repaired: {"abc":"1:2:3:4", "efg":"5:6:7:8", "hij":"foo", "klm":"bar"
}
The regular expression \b([a-zA-Z]+):("[^"]+"[,\n}]) means: a word boundary, followed by one or more letters, followed by a :, followed by a double-quoted string, followed by one of ,, \n or }. However, this will fail if there is an escaped quote inside the string, such as "1:\"2:3".
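One way to sidestep the escaped-quote limitation is to let the pattern consume whole quoted strings first and only quote bare identifiers that sit in front of a colon. The following is a sketch, not a full parser; the function name `quote_bare_keys` is made up, and it assumes keys look like ordinary identifiers.

```python
import json
import re

# First alternative: a complete double-quoted string, including escaped
# characters such as \" (matched so that its content is skipped).
# Second alternative: a bare identifier directly followed by a colon.
PATTERN = re.compile(r'"(?:[^"\\]|\\.)*"|([A-Za-z_][A-Za-z0-9_]*)(?=\s*:)')

def quote_bare_keys(text):
    def fix(match):
        if match.group(1):              # bare key: add the missing quotes
            return '"' + match.group(1) + '"'
        return match.group(0)           # quoted string: leave untouched
    return PATTERN.sub(fix, text)

bad = r'{abc:"1:2:3:4", efg:"5:\"6:7", "hij":"foo", address:"1600:3050:rf02"}'
repaired = quote_bare_keys(bad)
data = json.loads(repaired)
print(repaired)
print(data["efg"])
```

Because quoted strings are matched as a unit, colons and escaped quotes inside values no longer confuse the substitution.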
I have the string below that I am trying to split into a dictionary with specific names.
string1 = "fdsfsf:?x=klink:apple&nn=specialtime&tr=instruction1&tr=instruction2&tr=instruction3"
What I am hoping to obtain is:
>>> print(dict)
{'namy_names': 'specialtime', 'tracks': ['instruction1', 'instruction2', 'instruction3']}
I'm quite new to working with dictionaries, so I'm not sure how it is supposed to turn out.
I have tried the code below, but it only gives me instruction1 instead of the full list of instructions:
delimiters = ['&nn', '&tr']
values = re.split('|'.join(delimiters), string1)
values.pop(0) # remove the initial empty string
keys = re.findall('|'.join(delimiters), string1)
output = dict(zip(keys, values))
print(output)
Use url-parsing.
from urllib import parse
url = "fdsfsf:?x=klink:apple&nn=specialtime&tr=instruction1&tr=instruction2&tr=instruction3"
d = parse.parse_qs(parse.urlparse(url).query)
print(d)
Returns:
{'nn': ['specialtime'],
'tr': ['instruction1', 'instruction2', 'instruction3'],
'x': ['klink:apple']}
From this point, if necessary, you simply rename and pick the values you need, like this:
d = {
    'namy_names': d.get('nn', ['Empty'])[0],
    'tracks': d.get('tr', [])
}
# {'namy_names': 'specialtime', 'tracks': ['instruction1', 'instruction2', 'instruction3']}
This looks like url-encoded data, so you can/should use urllib.parse.parse_qs:
import urllib.parse
string1 = "fdsfsf:?x=klink:apple&nn=specialtime&tr=instruction1&tr=instruction2&tr=instruction3"
dic = urllib.parse.parse_qs(string1)
dic = {'namy_names': dic['nn'][0],
'tracks': dic['tr']}
# result: {'namy_names': 'specialtime',
# 'tracks': ['instruction1', 'instruction2', 'instruction3']}
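The rename step can be wrapped in a small helper. This is a sketch under the assumption that everything after the first ? is the query string; the helper `pick_params` and the `wanted` mapping are made up for illustration.

```python
from urllib.parse import parse_qs

def pick_params(url, names):
    """Return {new_name: value} for the query keys listed in names.

    names maps a query key to (new_name, keep_list); when keep_list is
    False only the first value is kept (None if the key is missing).
    """
    query = url.partition('?')[2]       # everything after the first '?'
    params = parse_qs(query)            # every value comes back as a list
    result = {}
    for key, (new_name, keep_list) in names.items():
        values = params.get(key, [])
        if keep_list:
            result[new_name] = values
        else:
            result[new_name] = values[0] if values else None
    return result

string1 = "fdsfsf:?x=klink:apple&nn=specialtime&tr=instruction1&tr=instruction2&tr=instruction3"
wanted = {'nn': ('namy_names', False), 'tr': ('tracks', True)}
result = pick_params(string1, wanted)
print(result)
```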
Say that I have a JSON file whose structure is either unknown or may change over time. I want to replace all values of "REPLACE_ME" with a string of my choice in Python.
Everything I have found assumes I know the structure. For example, I can read the JSON in with json.load and walk through the dictionary to do replacements then write it back. This assumes I know Key names, structure, etc.
How can I replace ALL of a given string value in a JSON file with something else?
This function recursively replaces all strings which equal the value original with the value new.
The function works on the Python structure, but you can of course apply it to a JSON file by loading the file with json.load first.
It doesn't replace keys in the dictionary - just the values.
def nested_replace(structure, original, new):
    if type(structure) == list:
        return [nested_replace(item, original, new) for item in structure]
    if type(structure) == dict:
        return {key: nested_replace(value, original, new)
                for key, value in structure.items()}
    if structure == original:
        return new
    else:
        return structure
d = ['replace', {'key1': 'replace', 'key2': ['replace', 'don\'t replace']}]
new_d = nested_replace(d, 'replace', 'now replaced')
print(new_d)
['now replaced', {'key1': 'now replaced', 'key2': ['now replaced', "don't replace"]}]
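For JSON text, the function slots in between parsing and serializing. The sketch below repeats the function so that it runs on its own and uses a made-up JSON string in place of a real file; with a file you would use json.load and json.dump instead.

```python
import json

def nested_replace(structure, original, new):
    # Same logic as above: rebuild lists and dicts, swap matching values.
    if isinstance(structure, list):
        return [nested_replace(item, original, new) for item in structure]
    if isinstance(structure, dict):
        return {key: nested_replace(value, original, new)
                for key, value in structure.items()}
    return new if structure == original else structure

# Hypothetical input standing in for the contents of a JSON file.
text = ('{"name": "REPLACE_ME", '
        '"items": ["REPLACE_ME", {"note": "keep REPLACE_ME here"}], '
        '"REPLACE_ME": 1}')

replaced = nested_replace(json.loads(text), "REPLACE_ME", "done")
print(json.dumps(replaced))
```

Only values that equal the search string exactly are changed; keys and longer strings that merely contain it are left alone.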
I think there's no big risk if you want to replace any key or value enclosed with quotes (since quotes are escaped in json unless they are part of a string delimiter).
I would dump the structure, perform a str.replace (with double quotes), and parse again:
import json
d = { 'foo': {'bar' : 'hello'}}
d = json.loads(json.dumps(d).replace('"hello"','"hi"'))
print(d)
result:
{'foo': {'bar': 'hi'}}
I wouldn't risk replacing parts of strings or strings without quotes, because that could change other parts of the file. I can't think of an example where replacing a string together with its double quotes would change something else.
There are "clean" solutions like adapting from Replace value in JSON file for key which can be nested by n levels but is it worth the effort? Depends on your requirements.
Why not modify the file directly instead of treating it as a JSON?
with open('filepath') as f:
    lines = f.readlines()
# Open the output file once and write each line after the substitution
with open('filepath_new', 'w') as f:
    for line in lines:
        f.write(line.replace('REPLACE_ME', 'whatever'))
You could load the JSON file into a dictionary and recurse through that to find the proper values but that's unnecessary muscle flexing.
The best way is to simply treat the file as a string and do the replacements that way.
import json
json_file = 'my_file.json'
with open(json_file) as f:
    file_data = f.read()
file_data = file_data.replace('REPLACE_ME', 'new string')
<...>
with open(json_file, 'w') as f:
    f.write(file_data)
json_data = json.loads(file_data)
From here the file can be re-written and you can continue to use json_data as a dict.
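One caveat with plain text replacement: if the new string contains a double quote or a backslash, the result is no longer valid JSON. A possible workaround, sketched here with a made-up input string, is to escape the replacement with json.dumps and strip the surrounding quotes before substituting.

```python
import json

# Hypothetical file contents.
file_data = '{"a": "REPLACE_ME", "b": ["REPLACE_ME", "other"]}'
new_value = 'say "hi"'

# json.dumps adds the surrounding quotes and escapes the inner ones;
# slicing off the first and last character keeps only the escaped body.
escaped = json.dumps(new_value)[1:-1]

file_data = file_data.replace('REPLACE_ME', escaped)
json_data = json.loads(file_data)
print(json_data)
```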
Well, that depends. If you want to replace every value equal to "REPLACE_ME" with the same string, you can use this. The for loop goes through all the keys in the dictionary, and each key is used to look up its value. If the value equals the search string, it is replaced with the string you want; nested dictionaries are handled by recursion.
search_string = "REPLACE_ME"
replacement = "SOME STRING"
test = {"test1":"REPLACE_ME", "test2":"REPLACE_ME", "test3":"REPLACE_ME", "test4":"REPLACE_ME","test5":{"test6":"REPLACE_ME"}}
def replace_nested(test):
    for key,value in test.items():
        if type(value) is dict:
            replace_nested(value)
        else:
            if value==search_string:
                test[key] = replacement
replace_nested(test)
print(test)
To solve this problem in a dynamic way, I opted to declare the variables to substitute in the same JSON file.
JSON file:
{
  "properties": {
    "property_1": "value1",
    "property_2": "value2"
  },
  "json_file_content": {
    "key_to_find": "{{property_1}} is my value",
    "dict1": {
      "key_to_find": "{{property_2}} is my other value"
    }
  }
}
Python code (references Replace value in JSON file for key which can be nested by n levels):
import json
from os import path
def fixup(a_dict: dict, k: str, subst_dict: dict) -> None:
    """
    Replace {{name}} placeholders in every value stored under key k, in place.
    Inspired by the linked answer.
    """
    for key in a_dict.keys():
        if key == k:
            for s_k, s_v in subst_dict.items():
                a_dict[key] = a_dict[key].replace("{{" + s_k + "}}", s_v)
        elif type(a_dict[key]) is dict:
            fixup(a_dict[key], k, subst_dict)
# ...
file_path = "my/file/path"
if path.exists(file_path):
    with open(file_path, 'rt') as f:
        json_dict = json.load(f)
    fixup(json_dict["json_file_content"], "key_to_find", json_dict["properties"])
    print(json_dict)  # json with variables resolved
else:
    print("file not found")
Hope it helps
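If the placeholders may appear under any key, or inside lists, a variant that walks the whole structure could look like the sketch below. The names `PLACEHOLDER` and `resolve` are made up, and the JSON is embedded as a string only so the example runs on its own.

```python
import json
import re

# Matches {{name}} and captures the name.
PLACEHOLDER = re.compile(r'\{\{(\w+)\}\}')

def resolve(node, properties):
    """Return a copy of node with {{name}} placeholders substituted."""
    if isinstance(node, dict):
        return {k: resolve(v, properties) for k, v in node.items()}
    if isinstance(node, list):
        return [resolve(v, properties) for v in node]
    if isinstance(node, str):
        # Placeholders without a matching property are left untouched.
        return PLACEHOLDER.sub(
            lambda m: properties.get(m.group(1), m.group(0)), node)
    return node

document = json.loads('''
{
  "properties": {"property_1": "value1", "property_2": "value2"},
  "json_file_content": {
    "key_to_find": "{{property_1}} is my value",
    "dict1": {"key_to_find": "{{property_2}} is my other value"},
    "list1": ["{{property_1}}", "{{unknown}}"]
  }
}
''')

resolved = resolve(document["json_file_content"], document["properties"])
print(resolved)
```

Unlike fixup, this returns a new structure instead of modifying the input, and it assumes all property values are strings.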
So I have a key value file that's similar to JSON's format but it's different enough to not be picked up by the Python JSON parser.
Example:
"Matt"
{
"Location" "New York"
"Age" "22"
"Items"
{
"Banana" "2"
"Apple" "5"
"Cat" "1"
}
}
Is there an easy way to parse this text file and store the values in a structure that I can access with something like Matt[Items][Banana]? There is only ever one pair per line, and an opening or closing brace denotes going down or up a level.
You could use re.sub to 'fix up' your string and then parse it. As long as the format is always either a single quoted string or a pair of quoted strings on each line, you can use that to determine where to place commas and colons.
import re
s = """"Matt"
{
"Location" "New York"
"Age" "22"
"Items"
{
"Banana" "2"
"Apple" "5"
"Cat" "1"
}
}"""
# Put a colon after the first string in every line
s1 = re.sub(r'^\s*(".+?")', r'\1:', s, flags=re.MULTILINE)
# add a comma if the last non-whitespace character in a line is " or }
s2 = re.sub(r'(["}])\s*$', r'\1,', s1, flags=re.MULTILINE)
Once you've done that, you can use ast.literal_eval to turn it into a Python dict. I use that over JSON parsing because it allows for trailing commas, without which the decision of where to put commas becomes a lot more complicated:
import ast
data = ast.literal_eval('{' + s2 + '}')
print(data['Matt']['Items']['Banana'])
# 2
I'm not sure how robust this approach is outside of the example you've posted, but it does support escaped characters and deeper levels of structured data. It's probably not going to be fast enough for large amounts of data.
The approach converts your custom data format to JSON using a (very) simple parser that adds the required colons, commas and braces. The JSON data can then be converted to a native Python dictionary.
import json
# Define the data that needs to be parsed
data = '''
"Matt"
{
"Location" "New \\"York"
"Age" "22"
"Items"
{
"Banana" "2"
"Apple" "5"
"Cat"
{
"foo" "bar"
}
}
}
'''
# Convert the data from custom format to JSON
json_data = ''
# Define parser states
state = 'OUT'
key_or_value = 'KEY'
for c in data:
    # Handle quote characters
    if c == '"':
        json_data += c
        if state == 'IN':
            state = 'OUT'
            if key_or_value == 'KEY':
                key_or_value = 'VALUE'
                json_data += ':'
            elif key_or_value == 'VALUE':
                key_or_value = 'KEY'
                json_data += ','
        else:
            state = 'IN'
    # Handle braces
    elif c == '{':
        if state == 'OUT':
            key_or_value = 'KEY'
        json_data += c
    elif c == '}':
        # Strip trailing comma and add closing brace and comma
        json_data = json_data.rstrip().rstrip(',') + '},'
    # Handle escaped characters
    elif c == '\\':
        state = 'ESCAPED'
        json_data += c
    else:
        json_data += c
# Strip trailing comma
json_data = json_data.rstrip().rstrip(',')
# Wrap the data in braces to form a dictionary
json_data = '{' + json_data + '}'
# Convert from JSON to the native Python
converted_data = json.loads(json_data)
print(converted_data['Matt']['Items']['Banana'])
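An alternative that skips the JSON detour is to tokenize the text into quoted strings and braces and build the nested dictionaries directly. This is a sketch, not a validated parser: the names `TOKEN` and `parse_kv` are made up, malformed input is not checked, and escape sequences are kept exactly as written rather than decoded.

```python
import re

# A token is either a double-quoted string (escaped characters allowed)
# or a single brace; everything else, such as whitespace, is skipped.
TOKEN = re.compile(r'"((?:[^"\\]|\\.)*)"|([{}])')

def parse_kv(text):
    """Parse '"key" "value"' and '"key" { ... }' entries into nested dicts."""
    root = {}
    stack = [root]   # stack[-1] is the dict currently being filled
    key = None       # key that is still waiting for its value
    for match in TOKEN.finditer(text):
        string, brace = match.group(1), match.group(2)
        if brace == '{':
            child = {}
            stack[-1][key] = child   # the pending key gets a nested dict
            stack.append(child)
            key = None
        elif brace == '}':
            stack.pop()
        elif key is None:
            key = string             # first string of a pair is the key
        else:
            stack[-1][key] = string  # second string is the value
            key = None
    return root

data = '''
"Matt"
{
    "Location" "New York"
    "Age" "22"
    "Items"
    {
        "Banana" "2"
        "Apple" "5"
        "Cat" "1"
    }
}
'''

parsed = parse_kv(data)
print(parsed['Matt']['Items']['Banana'])
```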
Suppose I'm dealing with the following two (or more) JSON strings from a dictionary:
JSONdict['context'] = '{"Context":"{context}","PID":"{PID}"}'
JSONdict['RDFchildren'] = '{"results":[ {"object" :
"info:fedora/book:fullbook"} ,{"object" :
"info:fedora/book:images"} ,{"object" :
"info:fedora/book:HTML"} ,{"object" :
"info:fedora/book:altoXML"} ,{"object" :
"info:fedora/book:thumbs"} ,{"object" :
"info:fedora/book:originals"} ]}'
I would like to create a merged JSON string, with "context" and "query" as root level keys. Something like this:
{"context": {"PID": "wayne:campbellamericansalvage", "Context":
"object_page"}, "RDFchildren": {"results": [{"object":
"info:fedora/book:fullbook"}, {"object":
"info:fedora/book:images"}, {"object":
"info:fedora/book:HTML"}, {"object":
"info:fedora/book:altoXML"}, {"object":
"info:fedora/book:thumbs"}, {"object":
"info:fedora/book:originals"}]}}
The following works, but I'd like to avoid using eval() if possible.
# using eval
JSONevaluated = {}
for each in JSONdict:
    JSONevaluated[each] = eval(JSONdict[each])
JSONpackage = json.dumps(JSONevaluated)
I also got the following working, but it feels hackish, and I'm afraid encoding and escaping will become problematic as more realistic metadata comes through:
# iterate through dictionary, unpack strings and concatenate
concatList = []
for key in JSONdict:
    tempstring = JSONdict[key][1:-1] #removes brackets
    concatList.append(tempstring)
JSONpackage = ",".join(concatList) #comma delimits
JSONpackage = "{"+JSONpackage+"}" #adds brackets for well-formed JSON
Any thoughts? advice?
You can use json.loads() instead of eval() in your first example.
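A minimal sketch of that substitution, using shortened sample strings modeled on the ones in the question (the variable names `json_dict`, `merged` and `package` are made up): each stored string is parsed with json.loads(), the parsed objects are collected under their original keys, and the result is serialized once.

```python
import json

# Shortened sample data in the shape shown in the question.
json_dict = {
    'context': '{"Context": "object_page", "PID": "wayne:campbellamericansalvage"}',
    'RDFchildren': '{"results": [{"object": "info:fedora/book:fullbook"}, '
                   '{"object": "info:fedora/book:images"}]}',
}

# Parse each JSON string, keeping the dictionary key as the root-level key.
merged = {key: json.loads(value) for key, value in json_dict.items()}

# Serialize the combined structure back into a single JSON string.
package = json.dumps(merged)
print(package)
```

Because every piece is parsed and re-serialized by the json module, quoting and escaping are handled consistently, which addresses the concern about the string-concatenation version.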