I have a string that contains a dictionary, and one of its fields holds a list of tuples. When I try to load the string as JSON, it fails.
The format looks like
{'info_scetion': {'category1': [('response 1', '{"tag1":"str1", "tag2":"str2"}')]}}
This is one of the special inputs I receive, and it is the only one that contains a list of tuples. I do not generate the input, so I cannot change the format. Because a JSON parser cannot handle this string directly, I was thinking about identifying the tuple inside the string and picking it out. The code should be able to process the rest.
The problem is, I'm not sure how to do it. I tried forming a regex that uses ( and ) in some form, like (.*?), to get the first occurrence, but I cannot guarantee there won't be a ) inside the actual tuple.
If I go in this direction, how do I correctly identify the tuple?
If there's another way to do it, what is it?
EDIT: adding the } at the end
Your 'JSON' is not really JSON: it is a Python literal, so parse it as such with the ast module:
import ast
s = "{'info_scetion': {'category1': [('response 1', '{\"tag1\":\"str1\", \"tag2\":\"str2\"}')]}}"
result = ast.literal_eval(s)
result
#{'info_scetion': {'category1': \
# [('response 1', '{"tag1":"str1", "tag2":"str2"}')]}}
Once it is loaded into Python, you can manipulate it in any way you like. For example, you can "flatten" the list of tuples:
result['info_scetion']['category1'] = list(result['info_scetion']['category1'][0])
#{'info_scetion': {'category1': ['response 1', '{"tag1":"str1", "tag2":"str2"}']}}
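If you also need the JSON text inside each tuple as a real dict, the two parsers can be combined. This is only a sketch: parse_payload is a made-up helper name, and it assumes every input has the same section -> category -> list of (label, JSON string) layout as the sample above.

```python
import ast
import json

# Sample input copied from the question (the key spelling is the original one).
raw = "{'info_scetion': {'category1': [('response 1', '{\"tag1\":\"str1\", \"tag2\":\"str2\"}')]}}"


def parse_payload(text):
    """Parse a Python-literal string, then decode the JSON held in each tuple.

    Hypothetical helper: assumes section -> category -> [(label, json_text), ...].
    """
    parsed = ast.literal_eval(text)  # safe evaluation of Python literals only
    return {
        section: {
            category: [(label, json.loads(blob)) for label, blob in pairs]
            for category, pairs in categories.items()
        }
        for section, categories in parsed.items()
    }


result = parse_payload(raw)
print(result['info_scetion']['category1'][0][1]['tag1'])
```

ast.literal_eval only accepts literals (strings, numbers, tuples, lists, dicts and so on), so unlike eval it will not run arbitrary code from the input.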
Your JSON is malformed: it is missing a } at the end.
I tested it with this code and it seems to work fine.
import json
data = {'info_scetion': {'category1': [('response 1', '{"tag1":"str1", "tag2":"str2"}')]}}
print(data['info_scetion']['category1'][0][0])
# output >> response 1
print(json.loads(data['info_scetion']['category1'][0][1])['tag1'])
# output >> str1
I am trying to construct a role in AWS for which I need a list of resources.
Below is an example
shared ={
"mts":{
"account_id":"11111",
"workbench":"aaaaa",
"prefix":"rad600-ars-sil,rad600-srr-sil-stage1,rad600-srr-sil-stage2"
},
"tsf":{
"account_id":"22222",
"workbench":"bbbbb",
"prefix":"yyyy"
}
}
I am trying to construct a list with
role_arn=[]
for key in shared:
role_arn.append(f"arn:aws:iam::'{shared[key]['account_id']}':role/'{shared[key]['workbench']}'_role")
here is my output:
["arn:aws:iam::'11111':role/'aaaaa'_role", "arn:aws:iam::'22222':role/'bbbbb'_role"]
I want the ' characters removed from each string as it is appended to the list.
desired output:
["arn:aws:iam::11111:role/aaaaa_role", "arn:aws:iam::22222:role/bbbbb_role"]
I am trying my hand at Python.
Is there a way to achieve it?
role_arn=[]
for key in shared:
role_arn.append(f"arn:aws:iam::{shared[key]['account_id']}:role/{shared[key]['workbench']}_role")
You don't need those ' characters unless you want them in the output. Remove them and the string formatting still works as expected, without the quotes.
Most likely the confusion came from not being familiar with formatted string literals (f-strings). You don't need to put '' around every variable; the {} takes care of the substitution.
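To see the difference side by side, here is a small sketch. The variable names quoted and plain are just for illustration, and the values are the sample ones from the question.

```python
account_id = "11111"
workbench = "aaaaa"

# Quotes written outside the braces are ordinary characters and end up in the result.
quoted = f"arn:aws:iam::'{account_id}':role/'{workbench}'_role"

# Without them, only the substituted values appear.
plain = f"arn:aws:iam::{account_id}:role/{workbench}_role"

print(quoted)
print(plain)
```

The quotes in shared[key]['account_id'] are a different matter: they sit inside the braces and belong to the Python expression, so they never reach the output.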
This is my take on it.
This uses a list comprehension to iterate over the shared dictionary instead of a for loop:
shared ={
"mts":{
"account_id":"11111",
"workbench":"aaaaa",
"prefix":"rad600-ars-sil,rad600-srr-sil-stage1,rad600-srr-sil-stage2"
},
"tsf":{
"account_id":"22222",
"workbench":"bbbbb",
"prefix":"yyyy"
}
}
role_arn = [f"arn:aws:iam::{data['account_id']}:role/{data['workbench']}_role" for key, data in shared.items()]
print(role_arn)
Which gives the output
['arn:aws:iam::11111:role/aaaaa_role', 'arn:aws:iam::22222:role/bbbbb_role']
I have a data.json file, which looks like this:
["{\"Day\":\"Today\",\"Event\":\"1\", \"Date\":\"2019-03-20\"}"]
I am trying to get "Event" from this file using python and miserably failing at this.
import json
with open('data.json', 'r') as json_file:
    data = json.load(json_file)
print(data['Event'])
I get the following error:
TypeError: list indices must be integers or slices, not str
And even when I try
print (data[0]['Event'])
then I get this error:
TypeError: string indices must be integers
One more thing:
print(type(data))
gives me "list"
I have searched all over and have not found a solution to this. I would really appreciate your suggestions.
You could use the ast module for this:
import ast
mydata = ["{\"Day\":\"Today\",\"Event\":\"1\", \"Date\":\"2019-03-20\"}"]
data = ast.literal_eval(mydata[0])
data
{'Day': 'Today', 'Event': '1', 'Date': '2019-03-20'}
data['Event']
'1'
Edit
Your original code does load the data into a list, but that list contains only a single string entry, even though the file is valid JSON. ast, like json, will parse that string entry into a Python data structure, here a dict.
As it stands, indexing that list is not the same as looking up a key in a dict, hence the error that list indices cannot be str:
alist = [{'a':1, 'b':2, 'c':3}]
alist['a']
TypeError
# need to grab the dict entry from the list
adict = alist[0]
adict['a']
1
You need to convert the elements in data to dicts using the json module.
Ex:
import json
with open(filename) as infile:
    data = json.load(infile)
for d in data:
    print(json.loads(d)['Event'])
Or:
data = list(map(json.loads, data))
print(data[0]["Event"])
Output:
1
Your problem is that you are parsing it as a list that consists of a single element that is a string.
["{\"Day\":\"Today\",\"Event\":\"1\", \"Date\":\"2019-03-20\"}"]
See how the entire content of the list is surrounded by " on either side and every other " is preceded by a \? The backslash is an escape character: it tells the parser to treat the following character literally, as part of the string, rather than giving it its special meaning.
If you have control over the file's contents, the easiest solution would be to adjust it. You will want it to be in a format like this:
[{"Day":"Today", "Event": "1", "Date": "2019-03-20"}]
Edit: As others have suggested, you can also parse it in its current state. Granted, cleaning the data is tedious, but it is often worth the effort, though this may not be one of those cases. I'm leaving this answer up anyway because it may help explain why OP's initial attempt did not work and why they received those error messages.
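A likely cause of a file like this (an assumption, since we don't know how data.json was written) is that the record was serialized to a string first and that string was then serialized again inside a list. The sketch below reproduces the shape and shows that decoding twice recovers the dict; record, double_encoded and clean are illustrative names.

```python
import json

record = {"Day": "Today", "Event": "1", "Date": "2019-03-20"}

# Serializing the dict to text and then serializing that text again
# gives a list holding one escaped string, like the file in the question.
double_encoded = json.dumps([json.dumps(record)])

# Serializing the dict once gives the cleaner layout.
clean = json.dumps([record])

outer = json.loads(double_encoded)  # a list with one str element
inner = json.loads(outer[0])        # the second decode yields the dict
print(inner["Event"])
print(json.loads(clean)[0]["Event"])
```

If you control the writer, dumping the list of dicts once avoids the second json.loads on the reading side.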
Hello all, and sorry if the title was worded poorly. I'm having a bit of trouble wrapping my head around how to solve an issue I have encountered. I would have liked to simply pass a dict as the value for this key in my JSON object, but sadly I have to pass it as a string. So, I have a JSON dict object that looks like this:
data = {"test": "Fuzz", "options": "'{'size':'Regular','connection':'unconnected'}'"}
Obviously, I would prefer that the second value were a dictionary rather than a string representation of one. Is the best route here to just strip the outer single quotes from data['options'], or is there a better alternative?
Sorry for any confusion. This is how the json object looks after I perform
json.dump(data, <filename>)
The value for options can be thought of as another variable say x and it's equivalent to '{'size':'Regular','connection':'unconnected'}'
I could do x[1:-1] but I'm not sure if that is the most pythonic way to do things here.
import ast
bad_string_dict = "'{'size':'Regular','connection':'unconnected'}'"
good_string_dict = bad_string_dict.strip("'")
good_dict = ast.literal_eval(good_string_dict)
print(good_dict)
You will have to strip the quotation marks; there is no way around it.
Given OP's comments I suggest the following:
Set the environment variable to a known data format (example: json/yaml/...), not a specific language (python)
Use the json module (or the format you've chosen) to load the data
The data should look like this:
raw_data = {"test": "Fuzz", "options": "{\"size\": \"Regular\", \"connection\": \"unconnected\"}"}
And the code should look like this:
import json
raw_options = raw_data['options']
options = json.loads(raw_options)
data = {**raw_data, 'options': options}
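For completeness, here is a sketch of the full round trip, assuming you control the side that builds the string. The names options, serialized, loaded and restored are only for illustration.

```python
import json

options = {"size": "Regular", "connection": "unconnected"}

# Producer side: store the nested dict as JSON text, not as a Python repr.
raw_data = {"test": "Fuzz", "options": json.dumps(options)}
serialized = json.dumps(raw_data)

# Consumer side: decode the outer object, then decode the options string.
loaded = json.loads(serialized)
restored = {**loaded, "options": json.loads(loaded["options"])}
print(restored)
```

Because the inner value is valid JSON text, no quote stripping or ast call is needed on the way back in.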
I want something like this:
[
('Urbandale paid the Regional Municipality of Ottawa-Carleton "redevelopment" charges',
{'entities': [(0, 9, 'PLTF')]}),
('Urbandale carries on business as a land developer.',
{'entities': [(0, 9, 'PLTF')]})
]
I am able to set it as a variable and get it to work, but now I am trying to automate constructing the JSON array. This one seems to be in an unconventional format, i.e. why is there a comma after the sentence and not a : ?
Anyway it is the format that is required by spacy.
I've tried creating a dictionary then doing json.dumps into a json object, which is a lot easier but what is wanted is an array.
I've looked at this post: Python - Create array of json objects from for loops and tried iterating constructively but I just get an invalid syntax error
spans = []
for mention in mentions:
    mention = str(mention)
    for f in re.finditer(subj, mention):
        spans.append(f.span())
train_data = [{mention, "entities": (f.span()[0], f.span()[1], 'PLTF')} for mention, span in zip(mentions, spans)]
Edit: using json.load(), TEST_DATA[0][1] (the structure I want) yields pretty much the same result as TRAIN_DATA[0][1], except with the additional () on the inside of the dict. I'm pretty sure that's the culprit, as I get this error: TypeError: 'int' object is not iterable. So how would I insert that? If I simply put an extra () around it, Python parses it and it's removed.
Solved: Just put an extra []
Thanks
What you want is actually a list of tuples, each with a str as the first element and a dict as the second. Simply add parentheses and move your curly bracket. Use the loop variable span (not f) inside the comprehension, and wrap the tuple in a list so that entities is iterable:
train_data = [(mention, {"entities": [(span[0], span[1], 'PLTF')]}) for mention, span in zip(mentions, spans)]
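Here is a self-contained sketch that builds the structure directly, without the intermediate spans list. It assumes subj is plain text (hence re.escape) and reuses the two sentences and the PLTF label from the question; the variable label is just a name chosen here.

```python
import re

mentions = [
    'Urbandale paid the Regional Municipality of Ottawa-Carleton "redevelopment" charges',
    'Urbandale carries on business as a land developer.',
]
subj = "Urbandale"  # assumed to be a literal string, not a regex pattern
label = "PLTF"

train_data = []
for mention in mentions:
    # Collect every match in this sentence as a (start, end, label) tuple.
    entities = [
        (m.start(), m.end(), label)
        for m in re.finditer(re.escape(subj), mention)
    ]
    train_data.append((mention, {"entities": entities}))

print(train_data[0][1])
```

Keeping the matches per sentence also avoids zip pairing the wrong span with a sentence when one sentence has zero or several matches.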
A bit lost after much research. My code below parses the JSON into what I thought was a dictionary, using json.load:
response = json.load(MSW) # -->Will take a JSON String & Turn it into a python dict
Using the iteration below I return a series like this which is fine
{u'swell': {u'components': {u'primary': {u'direction': 222.5}}}}
{u'swell': {u'components': {u'primary': {u'direction': 221.94}}}}
ourResult = response
for rs in ourResult:
    print(rs)
But how do I access the 222.5 value? The above appears to just be one long string, e.g. response[1], and not a dictionary structure at all.
In short, all I need is the numerical value (which I assume is part of that string) so I can test conditions in the rest of my code. Is it a dictionary? With thanks, as I am new and lost.
You have to use python syntax as follows:
>>> print response['swell']['components']['primary']['direction']
222.5
Just access the nested dictionaries, unwrapping each layer with an additional key:
for rs in ourResult:
    print(rs['swell']['components']['primary']['direction'])
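As a sketch of using the value in a condition: the response list below is hand-written sample data shaped like the printed output in the question, and the 222 threshold is arbitrary.

```python
# Hypothetical sample shaped like the question's output: a list of nested dicts.
response = [
    {"swell": {"components": {"primary": {"direction": 222.5}}}},
    {"swell": {"components": {"primary": {"direction": 221.94}}}},
]

# Each key lookup unwraps one level of nesting.
directions = [
    rs["swell"]["components"]["primary"]["direction"] for rs in response
]

# The values are floats, so they can be compared directly.
above_threshold = [d for d in directions if d > 222]
print(directions)
print(above_threshold)
```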