How to extract items inside JSON one by one with regex condition


I use the Google Vision API in my project. The OCR result is a JSON file that lists every item the API recognized, along with its coordinates. I want to add a feature that runs through the whole JSON to find the items I want and stores their coordinates and descriptions in a list.
This is the returned JSON format:
{
"textAnnotations": [
{
"description": "a",
"boundingPoly": {
"vertices": [
{
"x": 235,
"y": 409
},
{
"x": 247,
"y": 408
},
{
"x": 250,
"y": 456
},
{
"x": 238,
"y": 457
}
]
}
},
{
"description": "b",
"boundingPoly": {
"vertices": [
{
"x": 235,
"y": 409
},
{
"x": 247,
"y": 408
},
{
"x": 250,
"y": 456
},
{
"x": 238,
"y": 457
}
]
}
},{c...},{d...},{e...}
],
"fullTextAnnotation": {
"pages": "not important",
"text": "a\nb\nc\nd\ne\n"
}
}
My aim is to find two items and calculate whether they are parallel. For example, I want to find out whether b, c, d, or e is parallel with a. I have already stored the coordinates of a in a list with this method:
import json
import traceback

def getJson():
    try:
        # 'with' closes the file even if parsing fails
        with open('json_file.json', 'r', encoding="utf-8") as f:
            origin_data = json.loads(f.read())
        return origin_data
    except Exception as e:
        print(e)
        print(traceback.format_exc())

def get_keywords_coordinates(origin_data):
    __nodes = [__node for __node in origin_data['textAnnotations'] if __node['description'] == "a"]
    __node = __nodes[0]  # first annotation whose description is "a"
    __keyword_coords = []
    for __lv in range(0, 4):
        __tempx = __node['boundingPoly']['vertices'][__lv]['x']
        __keyword_coords.append(__tempx)
        __tempy = __node['boundingPoly']['vertices'][__lv]['y']
        __keyword_coords.append(__tempy)
    return __keyword_coords
Here keyword_coords is the list that contains the coordinates, which looks like this:
keyword_coords = [235, 409, 247, 408, 250, 456, 238, 457]
I will pass it and another keyword's coordinates to a function that does the calculation, but I have no idea how to get the coordinates of b, c, d, and e one by one. (a, b, c, d, and e are just examples; in the real situation the item names cannot be hard-coded, so I may let the program find the keywords with some regex.)
How should I deal with this?

I don't know exactly what you want to do, but working with the items one by one doesn't need regex, just a normal for-loop.
First I would change get_keywords_coordinates to return all items with their coordinates:
def get_keywords_coordinates(data):
    results = []
    for item in data['textAnnotations']:
        key = item["description"]
        coords = []
        for point in item["boundingPoly"]['vertices']:
            coords.append(point['x'])
            coords.append(point['y'])
        results.append( (key, coords) )
    return results

results = get_keywords_coordinates(data)
print('--- coords ---')
print(results)
Result:
--- coords ---
[
('a', [235, 409, 247, 408, 250, 456, 238, 457]),
('b', [335, 409, 347, 408, 350, 456, 338, 457]),
('c', [435, 409, 447, 408, 450, 456, 438, 457])
]
Then I would pick one item (e.g. the first item, a) and create a list without it:
selected = results[0]
#rest = results[1:]
rest = results.copy() # more useful if the selected item is at a different index
rest.remove(selected) # more useful if the selected item is at a different index
print('--- items ---')
print('selected:', selected)
print('rest :', rest)
print('---')
Result:
--- items ---
selected: ('a', [235, 409, 247, 408, 250, 456, 238, 457])
rest : [('b', [335, 409, 347, 408, 350, 456, 338, 457]), ('c', [435, 409, 447, 408, 450, 456, 438, 457])]
Then I could use a for-loop to compare the selected item with the other items, one by one:
for item in rest:
    print('compare', selected[0], 'with', item[0])
    print(selected[0], selected[1])
    print(item[0], item[1])
Result:
compare a with b
a [235, 409, 247, 408, 250, 456, 238, 457]
b [335, 409, 347, 408, 350, 456, 338, 457]
compare a with c
a [235, 409, 247, 408, 250, 456, 238, 457]
c [435, 409, 447, 408, 450, 456, 438, 457]
Full example:
data = {
"textAnnotations": [
{
"description": "a",
"boundingPoly": {
"vertices": [
{
"x": 235,
"y": 409
},
{
"x": 247,
"y": 408
},
{
"x": 250,
"y": 456
},
{
"x": 238,
"y": 457
}
]
}
},
{
"description": "b",
"boundingPoly": {
"vertices": [
{
"x": 335,
"y": 409
},
{
"x": 347,
"y": 408
},
{
"x": 350,
"y": 456
},
{
"x": 338,
"y": 457
}
]
}
},
{
"description": "c",
"boundingPoly": {
"vertices": [
{
"x": 435,
"y": 409
},
{
"x": 447,
"y": 408
},
{
"x": 450,
"y": 456
},
{
"x": 438,
"y": 457
}
]
}
},
],
"fullTextAnnotation": {
"pages": "not important",
"text": "a\nb\nc\nd\ne\n"
}
}
def get_keywords_coordinates(data):
    results = []
    for item in data['textAnnotations']:
        key = item["description"]
        coords = []
        for point in item["boundingPoly"]['vertices']:
            coords.append(point['x'])
            coords.append(point['y'])
        results.append( (key, coords) )
    return results
results = get_keywords_coordinates(data)
print('--- coords ---')
print(results)
selected = results[0]
#rest = results[1:]
rest = results.copy()
rest.remove(selected)
print('--- keywords ---')
print('selected:', selected)
print('rest :', rest)
print('---')
for item in rest:
    print('compare', selected[0], 'with', item[0])
    print(selected[0], selected[1])
    print(item[0], item[1])
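Since the question mentions picking keywords with a regex instead of hard-coded names, here is a minimal sketch of how that selection could look. The sample data, the function name find_by_pattern, and the price pattern are all made up for illustration; only the textAnnotations / description / boundingPoly / vertices layout comes from the question.

```python
import re

# Hypothetical sample in the same shape as the OCR response in the question.
data = {
    "textAnnotations": [
        {"description": "Total", "boundingPoly": {"vertices": [
            {"x": 10, "y": 10}, {"x": 50, "y": 10},
            {"x": 50, "y": 30}, {"x": 10, "y": 30}]}},
        {"description": "12.50", "boundingPoly": {"vertices": [
            {"x": 80, "y": 11}, {"x": 120, "y": 11},
            {"x": 120, "y": 31}, {"x": 80, "y": 31}]}},
    ]
}

def find_by_pattern(data, pattern):
    """Return (description, flat_coords) for items whose text fully matches pattern."""
    regex = re.compile(pattern)
    matches = []
    for item in data["textAnnotations"]:
        text = item["description"]
        if regex.fullmatch(text):
            coords = []
            for point in item["boundingPoly"]["vertices"]:
                coords.extend((point["x"], point["y"]))
            matches.append((text, coords))
    return matches

# Illustrative pattern: digits, a dot, then exactly two digits.
prices = find_by_pattern(data, r"\d+\.\d{2}")
print(prices)
```

The returned list has the same (key, coords) shape as results above, so the same selected / rest comparison loop can be applied to it.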

Related

Getting AttributeError while calling RandomForest()

I have been trying to do hyperparameter tuning with hyperopt using the following models, but I keep getting this traceback. I have tried changing the parameters and adding different code for n_estimators, but to no avail. None of the solutions available online has solved it.
# Defining Search Space
space = hp.choice('classifiers', [
{
'model': LogisticRegression(),
'params': {
'model__penalty': hp.choice('lr.penalty', ['l2']),
'model__C': hp.choice('lr.C', np.arange(0.005,1.0,0.01))
}
},
{
'model': BernoulliNB(),
'params': {}
},
{
'model': tree.DecisionTreeClassifier(),
'params': {
'model__max_depth' : hp.choice('tree.max_depth',
range(5, 30, 1)),
}
},
{
'model': xgb.XGBClassifier(),
'params': {
'model__max_depth' : hp.choice('xgb.max_depth',
range(5, 30, 1)),
'model__learning_rate': hp.loguniform ('learning_rate', 0.01, 0.5),
'model__gamma': hp.loguniform('xbg.gamma', 0.0, 2.0),
'model__random_state' : 42
}
},
# {
# 'model': GradientBoostingClassifier(),
# 'params': {
# 'model__n_estimators': hp.uniformint('n_estimators', 100, 500),
# 'model__max_depth': hp.uniformint('max_depth', 2, 20),
# 'model__random_state' : 42
# }
# },
{
'model': RandomForestClassifier(),
'params': {
'model__n_estimators' : hp.randint('rf.n_estimators_', [100, 200, 300, 400]),
'model__max_depth': hp.uniformint('rf.max_depth', 2, 20),
'model__min_samples_split':hp.uniformint('rf.min_samples_split', 2, 10),
'model__bootstrap': hp.choice('rf.bootstrap', [True, False]),
'model__max_features': hp.choice('rf.max_features', ['auto', 'sqrt']),
'model__random_state' : np.random.RandomState(42)
}
}
])
Traceback (most recent call last):
File "<input>", line 4, in <module>
File "/Library/Frameworks/Python.framework/Versions/3.10/lib/python3.10/site-packages/hyperopt/pyll_utils.py", line 18, in wrapper
return f(label, *args, **kwargs)
File "/Library/Frameworks/Python.framework/Versions/3.10/lib/python3.10/site-packages/hyperopt/pyll_utils.py", line 72, in hp_choice
return scope.switch(ch, *options)
File "/Library/Frameworks/Python.framework/Versions/3.10/lib/python3.10/site-packages/hyperopt/pyll/base.py", line 188, in __call__
return self.symbol_table._new_apply(
File "/Library/Frameworks/Python.framework/Versions/3.10/lib/python3.10/site-packages/hyperopt/pyll/base.py", line 61, in _new_apply
pos_args = [as_apply(a) for a in args]
File "/Library/Frameworks/Python.framework/Versions/3.10/lib/python3.10/site-packages/hyperopt/pyll/base.py", line 61, in <listcomp>
pos_args = [as_apply(a) for a in args]
File "/Library/Frameworks/Python.framework/Versions/3.10/lib/python3.10/site-packages/hyperopt/pyll/base.py", line 211, in as_apply
named_args = [(k, as_apply(v)) for (k, v) in items]
File "/Library/Frameworks/Python.framework/Versions/3.10/lib/python3.10/site-packages/hyperopt/pyll/base.py", line 211, in <listcomp>
named_args = [(k, as_apply(v)) for (k, v) in items]
File "/Library/Frameworks/Python.framework/Versions/3.10/lib/python3.10/site-packages/hyperopt/pyll/base.py", line 217, in as_apply
rval = Literal(obj)
File "/Library/Frameworks/Python.framework/Versions/3.10/lib/python3.10/site-packages/hyperopt/pyll/base.py", line 534, in __init__
o_len = len(obj)
File "/Library/Frameworks/Python.framework/Versions/3.10/lib/python3.10/site-packages/sklearn/ensemble/_base.py", line 195, in __len__
return len(self.estimators_)
AttributeError: 'RandomForestClassifier' object has no attribute 'estimators_'
I have tried everything at this point and would appreciate any/all help. Thank you!

how to add dictionary object name to json object

I have 3 python dictionaries as below:
gender = {'Female': 241, 'Male': 240}
marital_status = {'Divorced': 245, 'Engaged': 243, 'Married': 244, 'Partnered': 246, 'Single': 242}
family_type = {'Extended': 234, 'Joint': 235, 'Nuclear': 233, 'Single Parent': 236}
I add them to a list:
lst = [gender, marital_status, family_type]
Then I create a JSON string, which I need to save as a JSON file (using pd.to_json), with:
jf = json.dumps(lst, indent = 4)
When we look at jf object:
print(jf)
[
{
"Female": 241,
"Male": 240
},
{
"Divorced": 245,
"Engaged": 243,
"Married": 244,
"Partnered": 246,
"Single": 242
},
{
"Extended": 234,
"Joint": 235,
"Nuclear": 233,
"Single Parent": 236
}
]
Is there a way to use each dictionary's name as a key and get the output below?
{
"gender": {
"Female": 241,
"Male": 240
},
"marital_status": {
"Divorced": 245,
"Engaged": 243,
"Married": 244,
"Partnered": 246,
"Single": 242
},
"family_type": {
"Extended": 234,
"Joint": 235,
"Nuclear": 233,
"Single Parent": 236
}
}

You'll have to do this manually by creating a dictionary and mapping each name to its sub-dictionary yourself.
my_data = {'gender': gender, 'marital_status': marital_status, 'family_type': family_type}
Edit: example of writing to an output file using json.dump
with open('myfile.json', 'w') as writer:
    json.dump(my_data, writer)

To get that output, replace the lst line with a dictionary like this:
dict_req = {"gender":gender, "marital_status":marital_status, "family_type":family_type}
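A Python object does not carry the variable name it was bound to, which is why the names have to be supplied by hand. The following is a small end-to-end sketch using the dictionaries from the question; the variable names named and restored are only illustrative.

```python
import json

gender = {'Female': 241, 'Male': 240}
marital_status = {'Divorced': 245, 'Engaged': 243, 'Married': 244,
                  'Partnered': 246, 'Single': 242}
family_type = {'Extended': 234, 'Joint': 235, 'Nuclear': 233,
               'Single Parent': 236}

# Map each desired top-level key to its dictionary explicitly.
named = {
    "gender": gender,
    "marital_status": marital_status,
    "family_type": family_type,
}

jf = json.dumps(named, indent=4)
print(jf)

# Round-trip check: parsing the string gives back the same nested structure.
restored = json.loads(jf)
```

Because dictionaries keep insertion order, the keys appear in the JSON text in the order they were added to named.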

Iterate through nested JSON in Python

js = {
"status": "ok",
"meta": {
"count": 1
},
"data": {
"542250529": [
{
"all": {
"spotted": 438,
"battles_on_stunning_vehicles": 0,
"avg_damage_blocked": 39.4,
"capture_points": 40,
"explosion_hits": 0,
"piercings": 3519,
"xp": 376586,
"survived_battles": 136,
"dropped_capture_points": 382,
"damage_dealt": 783555,
"hits_percents": 74,
"draws": 2,
"battles": 290,
"damage_received": 330011,
"frags": 584,
"stun_number": 0,
"direct_hits_received": 1164,
"stun_assisted_damage": 0,
"hits": 4320,
"battle_avg_xp": 1299,
"wins": 202,
"losses": 86,
"piercings_received": 1004,
"no_damage_direct_hits_received": 103,
"shots": 5857,
"explosion_hits_received": 135,
"tanking_factor": 0.04
}
}
]
}
}
Let us call this JSON js; this variable will be used in a for-loop.
To explain what I'm doing here: I'm trying to collect data from a game.
The game has hundreds of different tanks. Each tank has a tank_id, which I can post to the game server, and the server responds with the performance data shown above as js.
for tank_id: json = requests.post(tank_id) etc...
I then store all these values in my database, as shown in the screenshot.
My Python code for it:
def api_get():
    for property in js['data']['542250529']['all']:
        spotted = property['spotted']
        battles_on_stunning_vehicles = property['battles_on_stunning_vehicles']
        # etc
        # ...
        insert_to_db(spotted, battles_on_stunning_vehicles, etc....)
The exception is:
for property in js['data']['542250529']['all']:
TypeError: list indices must be integers or slices, not str
And when I run:
print(js['data']['542250529'])
I get the rest of js printed as a string, and I can't iterate over it; it can't be used as a valid JSON string. Also, what's inside js['data']['542250529'] is a list containing only the item 'all'. Any help would be appreciated.

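You just missed [0] to get the first item in the list. Note also that iterating over a dict yields its keys (strings), so read the values from the dict directly rather than from a loop variable:
def api_get():
    stats = js['data']['542250529'][0]['all']
    spotted = stats['spotted']
    # ...
Look carefully at the data structure in the source JSON.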

There is a list containing the dictionary with a key of all. So you need to use js['data']['542250529'][0]['all'] not js['data']['542250529']['all']. Then you can use .items() to get the key-value pairs.
See below.
js = {
"status": "ok",
"meta": {
"count": 1
},
"data": {
"542250529": [
{
"all": {
"spotted": 438,
"battles_on_stunning_vehicles": 0,
"avg_damage_blocked": 39.4,
"capture_points": 40,
"explosion_hits": 0,
"piercings": 3519,
"xp": 376586,
"survived_battles": 136,
"dropped_capture_points": 382,
"damage_dealt": 783555,
"hits_percents": 74,
"draws": 2,
"battles": 290,
"damage_received": 330011,
"frags": 584,
"stun_number": 0,
"direct_hits_received": 1164,
"stun_assisted_damage": 0,
"hits": 4320,
"battle_avg_xp": 1299,
"wins": 202,
"losses": 86,
"piercings_received": 1004,
"no_damage_direct_hits_received": 103,
"shots": 5857,
"explosion_hits_received": 135,
"tanking_factor": 0.04
}
}
]
}
}
for key, val in js['data']['542250529'][0]['all'].items():
    print("key:", key, " val:", val)
#Or this way
for key in js['data']['542250529'][0]['all']:
    print("key:", key, " val:", js['data']['542250529'][0]['all'][key])
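If the key under "data" changes from request to request, hard-coding '542250529' will not scale. Here is a rough sketch of one way to avoid that, assuming every response has the same data -> key -> list -> "all" layout as js above. The trimmed sample, the function name collect_rows, and the choice of fields are illustrative only.

```python
# Hypothetical response in the same shape as js above, trimmed to three fields.
js = {
    "status": "ok",
    "meta": {"count": 1},
    "data": {
        "542250529": [
            {"all": {"spotted": 438, "battles": 290, "wins": 202}}
        ]
    },
}

def collect_rows(response, fields):
    """Build one flat dict per list entry, keeping only the requested fields."""
    rows = []
    for key, entries in response["data"].items():  # key is the id string
        for entry in entries:                       # each entry is a dict holding "all"
            stats = entry["all"]
            row = {"id": key}
            for field in fields:
                row[field] = stats.get(field)       # None if the field is absent
            rows.append(row)
    return rows

rows = collect_rows(js, ["spotted", "battles", "wins"])
print(rows)
```

Each row is a plain dict, so it can be handed to whatever database insert function is in use without unpacking every statistic into its own variable.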

TypeError: list indices must be integers or slices, not str

So, my code looks like this:
import requests
import random

def load():
    req = requests.get("https://yande.re/post.json?tags=rating%3Asafe+-pantyshot+-panties+&ms=1&page=12650&limit=1")
    data = Posts(req.json()["id"][0], req.json()["tags"], slice(req.json()["creator_id"]), req.json()["author"],
                 req.json()["source"],
                 req.json()["score"], req.json()["md5"], req.json()["file_url"], req.json()["sample_url"],
                 req.json()["width"],
                 req.json()["height"])
    all = data.tags, data.creator_id, data.author, data.source, data.score, data.md5, data.file_url, data.sample_url, data.width, data.height
    return all
And, when I run the load(), I have this output:
Traceback (most recent call last):
  File "", line 134, in
  File "", line 126, in anime
TypeError: list indices must be integers or slices, not str
What could be causing it?
By the way, the data I'm fetching looks like this:
[
{
"actual_preview_height": 218,
"jpeg_url": "https://files.yande.re/image/32a001e7b5050828c9b07e62de634958/yande.re%20376617%20dress%20novelance%20see_through.jpg",
"status": "active",
"preview_url": "https://assets.yande.re/data/preview/32/a0/32a001e7b5050828c9b07e62de634958.jpg",
"has_children": false,
"source": "http://i2.pixiv.net/img-original/img/2016/12/05/00/00/10/60241721_p0.jpg",
"score": 1,
"height": 1392,
"rating": "s",
"id": 376617,
"last_commented_at": 0,
"frames": [],
"md5": "32a001e7b5050828c9b07e62de634958",
"updated_at": 1480900734,
"creator_id": 280440,
"frames_pending_string": "",
"frames_string": "",
"actual_preview_width": 300,
"is_shown_in_index": true,
"frames_pending": [],
"change": 1992459,
"last_noted_at": 0,
"approver_id": null,
"is_held": false,
"preview_width": 150,
"tags": "dress novelance see_through",
"preview_height": 109,
"created_at": 1480900721,
"file_ext": "jpg",
"sample_height": 1088,
"sample_url": "https://files.yande.re/sample/32a001e7b5050828c9b07e62de634958/yande.re%20376617%20sample%20dress%20novelance%20see_through.jpg",
"parent_id": null,
"width": 1920,
"jpeg_file_size": 0,
"sample_file_size": 478570,
"author": "LolitaJoy",
"file_size": 989513,
"file_url": "https://files.yande.re/image/32a001e7b5050828c9b07e62de634958/yande.re%20376617%20dress%20novelance%20see_through.jpg",
"is_note_locked": false,
"is_pending": false,
"sample_width": 1500,
"jpeg_width": 1920,
"jpeg_height": 1392,
"is_rating_locked": false
}
]

I found the problem, actually. The response is a list containing one dictionary, so instead of req.json()["id"] it should have been req.json()[0]["id"].
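To illustrate the fix without calling the API, the sketch below uses a trimmed, hand-written stand-in for req.json(). The helper name first_post and the variable summary are hypothetical; the field values are copied from the sample response in the question.

```python
# Trimmed stand-in for req.json(): a list holding one post dictionary.
posts = [
    {
        "id": 376617,
        "tags": "dress novelance see_through",
        "author": "LolitaJoy",
        "score": 1,
    }
]

def first_post(posts):
    """Return the first dict of a list-shaped response, or None if there is none."""
    if not isinstance(posts, list) or not posts:
        return None
    return posts[0]

post = first_post(posts)

# Index the list first (done inside first_post), then use string keys on the dict.
summary = (post["id"], post["author"], post["tags"].split())
print(summary)
```

Parsing the response once and reusing the resulting dict also avoids calling req.json() again for every field.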

Creating a hierarchy path from JSON

Given the JSON below, what would be the best way to create a hierarchical list of "name" for a given "id"? There could be any number of sections in the hierarchy.
For example, providing id "156" would return "Add Storage Devices, Guided Configuration, Configuration"
I've been looking into using iteritems(), but could do with some help.
{
"result": true,
"sections": [
{
"depth": 0,
"display_order": 1,
"id": 154,
"name": "Configuration",
"parent_id": null,
"suite_id": 5
},
{
"depth": 1,
"display_order": 2,
"id": 155,
"name": "Guided Configuration",
"parent_id": 154,
"suite_id": 5
},
{
"depth": 2,
"display_order": 3,
"id": 156,
"name": "Add Storage Devices",
"parent_id": 155,
"suite_id": 5
},
{
"depth": 0,
"display_order": 4,
"id": 160,
"name": "NEW",
"parent_id": null,
"suite_id": 5
},
{
"depth": 1,
"display_order": 5,
"id": 161,
"name": "NEWS",
"parent_id": 160,
"suite_id": 5
}
]
}

Here's one approach:
def get_path(data, section_id):
    path = []
    while section_id is not None:
        section = next(s for s in data["sections"] if s["id"] == section_id)
        path.append(section["name"])
        section_id = section["parent_id"]
    return ", ".join(path)
... which assumes that data is the result of json.loads(json_text) or similar, and section_id is an int (which is what you've got for ids in that example JSON).
For your example usage:
>>> get_path(data, 156)
u'Add Storage Devices, Guided Configuration, Configuration'

Probably the simplest way is to create a dictionary mapping the IDs to names. For example:
name_by_id = {}
data = json.loads(the_json_string)
for section in data['sections']:
    name_by_id[section['id']] = section['name']
or using dict comprehensions:
name_by_id = {section['id']: section['name'] for section in data['sections']}
then you can get specific element:
>>> name_by_id[156]
... 'Add Storage Devices'
or get all IDs:
>>> name_by_id.keys()
... [160, 161, 154, 155, 156]
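A name-only mapping is not quite enough to build the path the question asks for, because the parent_id is needed too. As a rough sketch, the same lookup idea can store whole sections and then follow parent_id upward. The function name path_names and the cycle guard are additions for illustration, and the sample list is trimmed to the three keys used.

```python
# Sections from the question, trimmed to the keys this sketch uses.
sections = [
    {"id": 154, "name": "Configuration", "parent_id": None},
    {"id": 155, "name": "Guided Configuration", "parent_id": 154},
    {"id": 156, "name": "Add Storage Devices", "parent_id": 155},
    {"id": 160, "name": "NEW", "parent_id": None},
    {"id": 161, "name": "NEWS", "parent_id": 160},
]

def path_names(sections, section_id):
    """Return names from the given section up to its root, child first."""
    by_id = {s["id"]: s for s in sections}  # one pass to index by id
    names = []
    seen = set()
    while section_id is not None:
        if section_id in seen:               # guard against malformed, cyclic data
            raise ValueError("cycle in parent_id chain")
        seen.add(section_id)
        section = by_id[section_id]          # KeyError if the id is unknown
        names.append(section["name"])
        section_id = section["parent_id"]
    return names

print(", ".join(path_names(sections, 156)))
```

Building the index once means each step up the hierarchy is a dictionary lookup rather than a scan over all sections.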

This is probably what you want:
>>> sections = data['sections']
>>> lookup = {section['id']: section for section in sections}
>>> lookup[None] = {}
>>> for section in sections:
        parent = lookup[section['parent_id']]
        if 'childs' not in parent:
            parent['childs'] = []
        parent['childs'].append(section)
>>> def printRecurse (section, indent = 0):
        if 'childs' in section:
            section['childs'].sort(key=lambda x: x['display_order'])
            for child in section['childs']:
                print('{}{}: {}'.format(' ' * indent, child['id'], child['name']))
                printRecurse(child, indent + 1)
>>> printRecurse(lookup[None])
154: Configuration
 155: Guided Configuration
  156: Add Storage Devices
160: NEW
 161: NEWS

I believe you want something like:
def get_name_for_id(id_num, sections):
    cur_depth = -1
    texts = []
    for elem in sections:
        if elem['depth'] < cur_depth:
            del texts[:]
        elif elem['depth'] == cur_depth:
            texts.pop()
        texts.append(elem['name'])
        cur_depth = elem['depth']
        if elem['id'] == id_num:
            return ', '.join(reversed(texts))
With your data it returns:
In [11]: get_name_for_id(156, data['sections'])
Out[11]: 'Add Storage Devices, Guided Configuration, Configuration'
Also it takes into account the hierarchy based on depth, thus if in your data the id 156 refers to depth = 0 the result is:
In [16]: get_name_for_id(156, data['sections'])
Out[16]: 'Add Storage Devices'
If the depth of the id 156 was 1 then the value returned is:
In [22]: get_name_for_id(156, data['sections'])
Out[22]: 'Add Storage Devices, Configuration'
Basically it considers the trees:
depth 156 = 0      depth 156 = 1      depth 156 = 2
  154   156              154                154
   |                      |                  |
   |                     / \                155
  155                  155 156               |
                                            156
And it returns the concatenation of the names in the path from the 156 to the root of the tree.

You can also do it like this, but your input must first be a valid Python dictionary:
replace null with None and true with True.
def filtering(d,id_n):
    names = []
    while id_n:
        id_n,name=[(sec['parent_id'],sec['name']) for sec in d['sections'] if sec['id'] == id_n][0]
        names.append(name)
    return names
d = {
"result": True, #making 'true' with 'True'
"sections": [
{
"depth": 0,
"display_order": 1,
"id": 154,
"name": "Configuration",
"parent_id": None,
"suite_id": 5
},
{
"depth": 1,
"display_order": 2,
"id": 155,
"name": "Guided Configuration",
"parent_id": 154,
"suite_id": 5
},
{
"depth": 2,
"display_order": 3,
"id": 156,
"name": "Add Storage Devices",
"parent_id": 155,
"suite_id": 5
},
{
"depth": 0,
"display_order": 4,
"id": 160,
"name": "NEW",
"parent_id": None,
"suite_id": 5
},
{
"depth": 1,
"display_order": 5,
"id": 161,
"name": "NEWS",
"parent_id": 160,
"suite_id": 5
}
]
}
Testing the code with the given input:
>>> id_n = 156
>>> filtering(d,id_n)
['Add Storage Devices', 'Guided Configuration', 'Configuration']
