Getting AttributeError while calling RandomForest() - python

I have been trying to do hyperopt tuning using the following models, but I keep getting this traceback. I have tried changing the parameters and added different code for n_estimators, but to no avail. I am not able to solve it with any of the solutions available online.
# Defining Search Space
space = hp.choice('classifiers', [
    {
        'model': LogisticRegression(),
        'params': {
            'model__penalty': hp.choice('lr.penalty', ['l2']),
            'model__C': hp.choice('lr.C', np.arange(0.005, 1.0, 0.01))
        }
    },
    {
        'model': BernoulliNB(),
        'params': {}
    },
    {
        'model': tree.DecisionTreeClassifier(),
        'params': {
            'model__max_depth': hp.choice('tree.max_depth', range(5, 30, 1)),
        }
    },
    {
        'model': xgb.XGBClassifier(),
        'params': {
            'model__max_depth': hp.choice('xgb.max_depth', range(5, 30, 1)),
            'model__learning_rate': hp.loguniform('learning_rate', 0.01, 0.5),
            'model__gamma': hp.loguniform('xbg.gamma', 0.0, 2.0),
            'model__random_state': 42
        }
    },
    # {
    #     'model': GradientBoostingClassifier(),
    #     'params': {
    #         'model__n_estimators': hp.uniformint('n_estimators', 100, 500),
    #         'model__max_depth': hp.uniformint('max_depth', 2, 20),
    #         'model__random_state': 42
    #     }
    # },
    {
        'model': RandomForestClassifier(),
        'params': {
            'model__n_estimators': hp.randint('rf.n_estimators_', [100, 200, 300, 400]),
            'model__max_depth': hp.uniformint('rf.max_depth', 2, 20),
            'model__min_samples_split': hp.uniformint('rf.min_samples_split', 2, 10),
            'model__bootstrap': hp.choice('rf.bootstrap', [True, False]),
            'model__max_features': hp.choice('rf.max_features', ['auto', 'sqrt']),
            'model__random_state': np.random.RandomState(42)
        }
    }
])
Traceback (most recent call last):
File "<input>", line 4, in <module>
File "/Library/Frameworks/Python.framework/Versions/3.10/lib/python3.10/site-packages/hyperopt/pyll_utils.py", line 18, in wrapper
return f(label, *args, **kwargs)
File "/Library/Frameworks/Python.framework/Versions/3.10/lib/python3.10/site-packages/hyperopt/pyll_utils.py", line 72, in hp_choice
return scope.switch(ch, *options)
File "/Library/Frameworks/Python.framework/Versions/3.10/lib/python3.10/site-packages/hyperopt/pyll/base.py", line 188, in __call__
return self.symbol_table._new_apply(
File "/Library/Frameworks/Python.framework/Versions/3.10/lib/python3.10/site-packages/hyperopt/pyll/base.py", line 61, in _new_apply
pos_args = [as_apply(a) for a in args]
File "/Library/Frameworks/Python.framework/Versions/3.10/lib/python3.10/site-packages/hyperopt/pyll/base.py", line 61, in <listcomp>
pos_args = [as_apply(a) for a in args]
File "/Library/Frameworks/Python.framework/Versions/3.10/lib/python3.10/site-packages/hyperopt/pyll/base.py", line 211, in as_apply
named_args = [(k, as_apply(v)) for (k, v) in items]
File "/Library/Frameworks/Python.framework/Versions/3.10/lib/python3.10/site-packages/hyperopt/pyll/base.py", line 211, in <listcomp>
named_args = [(k, as_apply(v)) for (k, v) in items]
File "/Library/Frameworks/Python.framework/Versions/3.10/lib/python3.10/site-packages/hyperopt/pyll/base.py", line 217, in as_apply
rval = Literal(obj)
File "/Library/Frameworks/Python.framework/Versions/3.10/lib/python3.10/site-packages/hyperopt/pyll/base.py", line 534, in __init__
o_len = len(obj)
File "/Library/Frameworks/Python.framework/Versions/3.10/lib/python3.10/site-packages/sklearn/ensemble/_base.py", line 195, in __len__
return len(self.estimators_)
AttributeError: 'RandomForestClassifier' object has no attribute 'estimators_'
I have tried everything at this point and would appreciate any/all help. Thank you!
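For what it's worth, the traceback itself points at the cause: hp.choice wraps every value in the option dicts as a pyll Literal, and Literal.__init__ calls len(obj). RandomForestClassifier inherits __len__ from sklearn's BaseEnsemble, which returns len(self.estimators_), and estimators_ only exists after fitting, hence the AttributeError (LogisticRegression and BernoulliNB pass because they define no __len__). A common workaround is to keep estimator instances out of the search space and store a plain name instead, instantiating the model inside the objective. Below is a minimal sketch of that pattern; the objective body, X, and y are illustrative assumptions, not code from the post. Note also that hp.randint expects an integer upper bound, not a list, so the n_estimators line would need hp.choice in any case.
from hyperopt import fmin, hp, tpe, STATUS_OK
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import cross_val_score

space = hp.choice('classifiers', [
    {
        'type': 'rf',  # a plain string is safe to wrap as a pyll Literal
        'params': {
            'n_estimators': hp.choice('rf.n_estimators', [100, 200, 300, 400]),
            'max_depth': hp.uniformint('rf.max_depth', 2, 20),
        }
    },
])

def objective(args):
    # X and y are assumed to be defined elsewhere (illustrative only)
    if args['type'] == 'rf':
        model = RandomForestClassifier(random_state=42, **args['params'])
    score = cross_val_score(model, X, y, cv=3).mean()
    return {'loss': -score, 'status': STATUS_OK}

best = fmin(objective, space, algo=tpe.suggest, max_evals=50)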


how to add dictionary object name to json object

I have 3 python dictionaries as below:
gender = {'Female': 241, 'Male': 240}
marital_status = {'Divorced': 245, 'Engaged': 243, 'Married': 244, 'Partnered': 246, 'Single': 242}
family_type = {'Extended': 234, 'Joint': 235, 'Nuclear': 233, 'Single Parent': 236}
I add them to a list:
lst = [gender, marital_status, family_type]
And create a JSON string, which I need to save as a JSON file, using json.dumps:
jf = json.dumps(lst, indent = 4)
When we look at the jf object:
print(jf)
[
    {
        "Female": 241,
        "Male": 240
    },
    {
        "Divorced": 245,
        "Engaged": 243,
        "Married": 244,
        "Partnered": 246,
        "Single": 242
    },
    {
        "Extended": 234,
        "Joint": 235,
        "Nuclear": 233,
        "Single Parent": 236
    }
]
Is there a way to use the dictionary name as the key and get output as below?
{
    "gender": {
        "Female": 241,
        "Male": 240
    },
    "marital_status": {
        "Divorced": 245,
        "Engaged": 243,
        "Married": 244,
        "Partnered": 246,
        "Single": 242
    },
    "family_type": {
        "Extended": 234,
        "Joint": 235,
        "Nuclear": 233,
        "Single Parent": 236
    }
}
You'll have to do this manually by creating a dictionary and mapping each name to its sub-dictionary yourself.
my_data = {'gender': gender, 'marital_status':marital_status, 'family_type': family_type}
Edit: an example of writing to an output file using json.dump:
with open('myfile.json', 'w') as writer:
    json.dump(my_data, writer)
As per your requirement, you can do it like this by replacing the lst line:
dict_req = {"gender":gender, "marital_status":marital_status, "family_type":family_type}
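Since Python objects don't carry their variable names at runtime, the mapping from name to dictionary has to be written out (or built from a known list of names). A minimal end-to-end sketch using the three dictionaries above (myfile.json is an assumed output path):
import json

gender = {'Female': 241, 'Male': 240}
marital_status = {'Divorced': 245, 'Engaged': 243, 'Married': 244, 'Partnered': 246, 'Single': 242}
family_type = {'Extended': 234, 'Joint': 235, 'Nuclear': 233, 'Single Parent': 236}

# map each name to its dictionary explicitly
my_data = {'gender': gender, 'marital_status': marital_status, 'family_type': family_type}

with open('myfile.json', 'w') as writer:
    json.dump(my_data, writer, indent=4)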

How to extract items inside JSON one by one with regex condition

I use the Google Vision API on my project. The OCR result returns a JSON file that represents all the items the API recognized, with coordinates. I want to add a feature that runs through the whole JSON to find the item I want and then stores the coordinate and the description in an array/list.
This is the returned JSON format:
{
    "textAnnotations": [
        {
            "description": "a",
            "boundingPoly": {
                "vertices": [
                    {"x": 235, "y": 409},
                    {"x": 247, "y": 408},
                    {"x": 250, "y": 456},
                    {"x": 238, "y": 457}
                ]
            }
        },
        {
            "description": "b",
            "boundingPoly": {
                "vertices": [
                    {"x": 235, "y": 409},
                    {"x": 247, "y": 408},
                    {"x": 250, "y": 456},
                    {"x": 238, "y": 457}
                ]
            }
        },
        {c...}, {d...}, {e...}
    ],
    "fullTextAnnotation": {
        "pages": "not important",
        "text": "a\nb\nc\nd\ne\n"
    }
}
My aim is to find 2 items and calculate whether they are parallel. For example, I want to find out whether b or c or d or e is parallel with a, and I have already stored the coordinates of a in a list with this method:
def getJson():
    try:
        f = open('json_file.json', 'r', encoding="utf-8")
        string = f.read()
        origin_data = json.loads(string)
        return origin_data
    except Exception as e:
        print(e)
        print(traceback.format_exc())

def get_keywords_coordinates(origin_data):
    __nodes = [__node for __node in origin_data['textAnnotations'] if __node['description'] == "a"]
    __keyword_coords = []
    for __lv in range(0, 4):
        __tempx = __node['boundingPoly']['vertices'][__lv]['x']
        __keyword_coords.append(__tempx)
        __tempy = __node['boundingPoly']['vertices'][__lv]['y']
        __keyword_coords.append(__tempy)
    return __keyword_coords
where keyword_coords is the list that contains the coordinates, and it looks like this:
keyword_coords = [235, 409, 247, 408, 250, 456, 238, 457]
I will put it and another keyword's coordinates into a function to do that calculation, but I have no idea how to get the coordinates of b, c, d, and e one by one. (a/b/c/d/e is just an example; in the real situation the item names can't be hard-coded, so I may let the program find the keywords with some regex.)
How should I deal with this?
I don't know what exactly you want to do, but it doesn't need regex; a normal for-loop is enough to work with the items one by one.
First I would change get_keywords_coordinates to get all items and coordinates:
def get_keywords_coordinates(data):
    results = []
    for item in data['textAnnotations']:
        key = item["description"]
        coords = []
        for point in item["boundingPoly"]['vertices']:
            coords.append(point['x'])
            coords.append(point['y'])
        results.append((key, coords))
    return results
results = get_keywords_coordinates(data)
print('--- coords ---')
print(results)
Result:
--- coords ---
[
('a', [235, 409, 247, 408, 250, 456, 238, 457]),
('b', [335, 409, 347, 408, 350, 456, 338, 457]),
('c', [435, 409, 447, 408, 450, 456, 438, 457])
]
And I would get some selected item (i.e. the first item, with a) and create a list without this item:
selected = results[0]
#rest = results[1:]
rest = results.copy()    # more useful if I selected an item at a different index
rest.remove(selected)    # more useful if I selected an item at a different index
print('--- items ---')
print('selected:', selected)
print('rest :', rest)
print('---')
Result:
--- items ---
selected: ('a', [235, 409, 247, 408, 250, 456, 238, 457])
rest : [('b', [335, 409, 347, 408, 350, 456, 338, 457]), ('c', [435, 409, 447, 408, 450, 456, 438, 457])]
And I could use a for-loop to compare the selected item with the other items, one by one:
for item in rest:
    print('compare', selected[0], 'with', item[0])
    print(selected[0], selected[1])
    print(item[0], item[1])
Result:
compare a with b
a [235, 409, 247, 408, 250, 456, 238, 457]
b [335, 409, 347, 408, 350, 456, 338, 457]
compare a with c
a [235, 409, 247, 408, 250, 456, 238, 457]
c [435, 409, 447, 408, 450, 456, 438, 457]
Full example:
data = {
    "textAnnotations": [
        {
            "description": "a",
            "boundingPoly": {
                "vertices": [
                    {"x": 235, "y": 409},
                    {"x": 247, "y": 408},
                    {"x": 250, "y": 456},
                    {"x": 238, "y": 457}
                ]
            }
        },
        {
            "description": "b",
            "boundingPoly": {
                "vertices": [
                    {"x": 335, "y": 409},
                    {"x": 347, "y": 408},
                    {"x": 350, "y": 456},
                    {"x": 338, "y": 457}
                ]
            }
        },
        {
            "description": "c",
            "boundingPoly": {
                "vertices": [
                    {"x": 435, "y": 409},
                    {"x": 447, "y": 408},
                    {"x": 450, "y": 456},
                    {"x": 438, "y": 457}
                ]
            }
        },
    ],
    "fullTextAnnotation": {
        "pages": "not important",
        "text": "a\nb\nc\nd\ne\n"
    }
}

def get_keywords_coordinates(data):
    results = []
    for item in data['textAnnotations']:
        key = item["description"]
        coords = []
        for point in item["boundingPoly"]['vertices']:
            coords.append(point['x'])
            coords.append(point['y'])
        results.append((key, coords))
    return results

results = get_keywords_coordinates(data)
print('--- coords ---')
print(results)

selected = results[0]
#rest = results[1:]
rest = results.copy()
rest.remove(selected)

print('--- keywords ---')
print('selected:', selected)
print('rest :', rest)
print('---')

for item in rest:
    print('compare', selected[0], 'with', item[0])
    print(selected[0], selected[1])
    print(item[0], item[1])
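The parallel check itself isn't shown in the question, but with the (key, coords) tuples above it reduces to comparing edge angles. A hedged sketch, assuming "parallel" means the top edges of the two bounding boxes (vertex 1 to vertex 2) have nearly the same angle; the function name and tolerance are illustrative:
import math

def are_parallel(coords_a, coords_b, tol_degrees=2.0):
    # coords are flat lists [x1, y1, x2, y2, x3, y3, x4, y4]
    angle_a = math.degrees(math.atan2(coords_a[3] - coords_a[1], coords_a[2] - coords_a[0]))
    angle_b = math.degrees(math.atan2(coords_b[3] - coords_b[1], coords_b[2] - coords_b[0]))
    return abs(angle_a - angle_b) <= tol_degrees

for item in rest:
    print(selected[0], 'parallel with', item[0], ':', are_parallel(selected[1], item[1]))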

Iterate through nested JSON in Python

js = {
    "status": "ok",
    "meta": {
        "count": 1
    },
    "data": {
        "542250529": [
            {
                "all": {
                    "spotted": 438,
                    "battles_on_stunning_vehicles": 0,
                    "avg_damage_blocked": 39.4,
                    "capture_points": 40,
                    "explosion_hits": 0,
                    "piercings": 3519,
                    "xp": 376586,
                    "survived_battles": 136,
                    "dropped_capture_points": 382,
                    "damage_dealt": 783555,
                    "hits_percents": 74,
                    "draws": 2,
                    "battles": 290,
                    "damage_received": 330011,
                    "frags": 584,
                    "stun_number": 0,
                    "direct_hits_received": 1164,
                    "stun_assisted_damage": 0,
                    "hits": 4320,
                    "battle_avg_xp": 1299,
                    "wins": 202,
                    "losses": 86,
                    "piercings_received": 1004,
                    "no_damage_direct_hits_received": 103,
                    "shots": 5857,
                    "explosion_hits_received": 135,
                    "tanking_factor": 0.04
                }
            }
        ]
    }
}
Let us name this JSON "js" as a variable; this variable will be assigned inside a for-loop.
To understand better what I'm doing here: I'm trying to collect data from a game.
This game has hundreds of different tanks; each tank has a tank_id, which I can POST to the game server to get the performance data back as "js":
for tank_id: json = requests.post(tank_id) etc...
and I fetch all these values into my database as shown in the screenshot.
My python code for it:
my python code for it:
def api_get():
    for property in js['data']['542250529']['all']:
        spotted = property['spotted']
        battles_on_stunning_vehicles = property['battles_on_stunning_vehicles']
        # etc
        # ...
        insert_to_db(spotted, battles_on_stunning_vehicles, etc....)
the exception is:
for property in js['data']['542250529']['all']:
TypeError: list indices must be integers or slices, not str
and when:
print(js['data']['542250529'])
I get the rest of the js as a string and I can't iterate; it can't be used as a valid JSON string. Also, what's inside js['data']['542250529'] is a list containing only the item 'all'... any help would be appreciated.
You just missed [0] to get the first item in a list:
def api_get():
    for property in js['data']['542250529'][0]['all']:
        spotted = property['spotted']
        # ...
Look carefully at the data structure in the source JSON.
There is a list containing the dictionary with a key of all. So you need to use js['data']['542250529'][0]['all'] not js['data']['542250529']['all']. Then you can use .items() to get the key-value pairs.
See below.
js = {
    "status": "ok",
    "meta": {
        "count": 1
    },
    "data": {
        "542250529": [
            {
                "all": {
                    "spotted": 438,
                    "battles_on_stunning_vehicles": 0,
                    "avg_damage_blocked": 39.4,
                    "capture_points": 40,
                    "explosion_hits": 0,
                    "piercings": 3519,
                    "xp": 376586,
                    "survived_battles": 136,
                    "dropped_capture_points": 382,
                    "damage_dealt": 783555,
                    "hits_percents": 74,
                    "draws": 2,
                    "battles": 290,
                    "damage_received": 330011,
                    "frags": 584,
                    "stun_number": 0,
                    "direct_hits_received": 1164,
                    "stun_assisted_damage": 0,
                    "hits": 4320,
                    "battle_avg_xp": 1299,
                    "wins": 202,
                    "losses": 86,
                    "piercings_received": 1004,
                    "no_damage_direct_hits_received": 103,
                    "shots": 5857,
                    "explosion_hits_received": 135,
                    "tanking_factor": 0.04
                }
            }
        ]
    }
}

for key, val in js['data']['542250529'][0]['all'].items():
    print("key:", key, " val:", val)

# Or this way
for key in js['data']['542250529'][0]['all']:
    print("key:", key, " val:", js['data']['542250529'][0]['all'][key])

jsonb join not working properly in sqlalchemy

I have a query that joins on a jsonb column in Postgres, which I want to convert to SQLAlchemy in Django using the aldjemy package:
SELECT anon_1.key AS tag,
       count(anon_1.value ->> 'polarity') AS count_1,
       anon_1.value ->> 'polarity' AS anon_2
FROM feedback f
JOIN tagging t ON t.feedback_id = f.id
JOIN jsonb_each(t.json_content -> 'entityMap') AS anon_3 ON true
JOIN jsonb_each(((anon_3.value -> 'data') - 'selectionState') - 'segment') AS anon_1 ON true
WHERE f.id = 2
GROUP BY anon_1.value ->> 'polarity', anon_1.key;
The json_content field stores data in the following format:
{
    "entityMap": {
        "0": {
            "data": {
                "people": {
                    "labelId": 5,
                    "polarity": "positive"
                },
                "segment": "a small segment",
                "selectionState": {
                    "focusKey": "9xrre",
                    "hasFocus": true,
                    "anchorKey": "9xrre",
                    "isBackward": false,
                    "focusOffset": 75,
                    "anchorOffset": 3
                }
            },
            "type": "TAG",
            "mutability": "IMMUTABLE"
        },
        "1": {
            "data": {
                "product": {
                    "labelId": 6,
                    "polarity": "positive"
                },
                "segment": "another segment",
                "selectionState": {
                    "focusKey": "9xrre",
                    "hasFocus": true,
                    "anchorKey": "9xrre",
                    "isBackward": false,
                    "focusOffset": 138,
                    "anchorOffset": 79
                }
            },
            "type": "TAG",
            "mutability": "IMMUTABLE"
        }
    }
}
I wrote the following sqlalchemy code to achieve the query
first_alias = aliased(func.jsonb_each(Tagging.sa.json_content["entityMap"]))
print(first_alias)
second_alias = aliased(
    func.jsonb_each(
        first_alias.c.value.op("->")("data")
        .op("-")("selectionState")
        .op("-")("segment")
    )
)
polarity = second_alias.c.value.op("->>")("polarity")
p_tag = second_alias.c.key
_count = (
    Feedback.sa.query()
    .join(
        CampaignQuestion,
        CampaignQuestion.sa.question_id == Feedback.sa.question_id,
        isouter=True,
    )
    .join(Tagging)
    .join(first_alias, true())
    .join(second_alias, true())
    .filter(CampaignQuestion.sa.campaign_id == campaign_id)
    .with_entities(p_tag.label("p_tag"), func.count(polarity), polarity)
    .group_by(polarity, p_tag)
    .all()
)
print(_count)
But it is giving me a NotImplementedError: Operator 'getitem' is not supported on this expression error when accessing first_alias.c.
The stack trace:
Traceback (most recent call last):
File "/home/.cache/pypoetry/virtualenvs/api-FPSaTdE5-py3.8/lib/python3.8/site-packages/rest_framework/views.py", line 506, in dispatch
response = handler(request, *args, **kwargs)
File "/home/work/api/app/campaign/views.py", line 119, in results_p_tags
d = campaign_service.get_p_tag_count_for_campaign_results(id)
File "/home/work/api/app/campaign/services/campaign.py", line 177, in get_p_tag_count_for_campaign_results
return campaign_selectors.get_p_tag_counts_for_campaign(campaign_id)
File "/home/work/api/app/campaign/selectors.py", line 196, in get_p_tag_counts_for_campaign
polarity = second_alias.c.value.op("->>")("polarity")
File "/home/.cache/pypoetry/virtualenvs/api-FPSaTdE5-py3.8/lib/python3.8/site-packages/sqlalchemy/util/langhelpers.py", line 1093, in __get__
obj.__dict__[self.__name__] = result = self.fget(obj)
File "/home/.cache/pypoetry/virtualenvs/api-FPSaTdE5-py3.8/lib/python3.8/site-packages/sqlalchemy/sql/selectable.py", line 746, in columns
self._populate_column_collection()
File "/home/.cache/pypoetry/virtualenvs/api-FPSaTdE5-py3.8/lib/python3.8/site-packages/sqlalchemy/sql/selectable.py", line 1617, in _populate_column_collection
self.element._generate_fromclause_column_proxies(self)
File "/home/.cache/pypoetry/virtualenvs/api-FPSaTdE5-py3.8/lib/python3.8/site-packages/sqlalchemy/sql/selectable.py", line 703, in _generate_fromclause_column_proxies
fromclause._columns._populate_separate_keys(
File "/home/.cache/pypoetry/virtualenvs/api-FPSaTdE5-py3.8/lib/python3.8/site-packages/sqlalchemy/sql/base.py", line 1216, in _populate_separate_keys
self._colset.update(c for k, c in self._collection)
File "/home/.cache/pypoetry/virtualenvs/api-FPSaTdE5-py3.8/lib/python3.8/site-packages/sqlalchemy/sql/base.py", line 1216, in <genexpr>
self._colset.update(c for k, c in self._collection)
File "/home/.cache/pypoetry/virtualenvs/api-FPSaTdE5-py3.8/lib/python3.8/site-packages/sqlalchemy/sql/operators.py", line 434, in __getitem__
return self.operate(getitem, index)
File "/home/.cache/pypoetry/virtualenvs/api-FPSaTdE5-py3.8/lib/python3.8/site-packages/sqlalchemy/sql/elements.py", line 831, in operate
return op(self.comparator, *other, **kwargs)
File "/home/.cache/pypoetry/virtualenvs/api-FPSaTdE5-py3.8/lib/python3.8/site-packages/sqlalchemy/sql/operators.py", line 434, in __getitem__
return self.operate(getitem, index)
File "/home/.cache/pypoetry/virtualenvs/api-FPSaTdE5-py3.8/lib/python3.8/site-packages/sqlalchemy/sql/type_api.py", line 75, in operate
return o[0](self.expr, op, *(other + o[1:]), **kwargs)
File "/home/.cache/pypoetry/virtualenvs/api-FPSaTdE5-py3.8/lib/python3.8/site-packages/sqlalchemy/sql/default_comparator.py", line 173, in _getitem_impl
_unsupported_impl(expr, op, other, **kw)
File "/home/.cache/pypoetry/virtualenvs/api-FPSaTdE5-py3.8/lib/python3.8/site-packages/sqlalchemy/sql/default_comparator.py", line 177, in _unsupported_impl
raise NotImplementedError(
NotImplementedError: Operator 'getitem' is not supported on this expression
Any help would be greatly appreciated.
PS: The SQLAlchemy version I'm using for this is 1.4.6.
I used the same SQLAlchemy query expression before in a Flask project using SQLAlchemy version 1.3.22, and it was working correctly.
I fixed the issue by using table-valued functions, as mentioned in the docs, and accessing the ColumnCollection of the function using indices instead of keys. The code is as follows:
first_alias = func.jsonb_each(Tagging.sa.json_content["entityMap"]).table_valued(
    "key", "value"
)
second_alias = func.jsonb_each(
    first_alias.c[1].op("->")("data").op("-")("selectionState").op("-")("segment")
).table_valued("key", "value")
polarity = second_alias.c[1].op("->>")("polarity")
p_tag = second_alias.c[0]
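For readers hitting the same error, here is a self-contained sketch of the table_valued() pattern (SQLAlchemy 1.4+; the table and column names are stand-ins, not the aldjemy models from the question):
from sqlalchemy import Column, Integer, MetaData, Table, func, select, true
from sqlalchemy.dialects import postgresql
from sqlalchemy.dialects.postgresql import JSONB

metadata = MetaData()
tagging = Table(
    "tagging",
    metadata,
    Column("id", Integer, primary_key=True),
    Column("json_content", JSONB),
)

# jsonb_each() as a table-valued function; its columns are then
# addressed positionally via .c[0] / .c[1]
entity_map = func.jsonb_each(tagging.c.json_content["entityMap"]).table_valued("key", "value")

stmt = (
    select(entity_map.c[0].label("entity_key"), entity_map.c[1].label("entity_value"))
    .select_from(tagging.join(entity_map, true()))
)
print(stmt.compile(dialect=postgresql.dialect()))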

Partial updating of object in elastic search using python

So the puamapi/apiobjects_american/4901 object looks like this:
{
    "_id": "4701",
    "_index": "puamapi",
    "_source": {
        "CatRais": null,
        "Classification": "Photographs",
        "Constituents": [],
        "CreditLine": "Gift of H. Kelley Rollings, Class of 1948, and Mrs. Rollings",
        "CuratorApproved": 0,
        "DateBegin": 1921,
        "DateEnd": 1921,
        "Dated": "1921",
        "Department": "Photography",
        "DimensionsLabel": "image: 19.3 x 24.6 cm (7 5/8 x 9 11/16 in.)\r\nsheet: 20.2 x 25.4 cm (7 15/16 x 10 in.)",
        "Edition": null,
        "Medium": "Gelatin silver print",
        "ObjectID": 4701,
        "ObjectNumber": "1995-341",
        "ObjectStatus": "Accessioned Object",
        "Restrictions": "Restricted",
        "SortNumber": " 1995 341",
        "SysTimeStamp": "AAAAAAAAC3k="
    },
    "_type": "apiobjects_american",
    "_version": 4,
    "found": true
}
I want to do a partial update on the object, where we add a constituent to the constituent array.
The record looks like this:
{
    'params': {'item': [{'ConstituentID': 5}]},
    'script': 'if (ctx._source[Constituents] == null) {ctx._source.Constituents = item } else { ctx._source.Constituents+= item }'
}
And then I add with an elastic search instance in python:
es.update(index="puamapi", doc_type="apiobjects_american", id=4901, body=record)
But I'm getting this error:
Traceback (most recent call last):
File "json_to_elasticsearch.py", line 138, in <module>
load_xrefs(api_xrefs)
File "json_to_elasticsearch.py", line 118, in load_xrefs
load_xref(table, xref_map[table][0], xref_map[table][1], json.load(file)["RECORDS"])
File "json_to_elasticsearch.py", line 109, in load_xref
es.update(index=database, doc_type=table1, id=id1, body=record)
File "/usr/local/lib/python2.7/dist-packages/elasticsearch/client/utils.py", line 69, in _wrapped
return func(*args, params=params, **kwargs)
File "/usr/local/lib/python2.7/dist-packages/elasticsearch/client/__init__.py", line 460, in update
doc_type, id, '_update'), params=params, body=body)
File "/usr/local/lib/python2.7/dist-packages/elasticsearch/transport.py", line 329, in perform_request
status, headers, data = connection.perform_request(method, url, params, body, ignore=ignore, timeout=timeout)
File "/usr/local/lib/python2.7/dist-packages/elasticsearch/connection/http_urllib3.py", line 109, in perform_request
self._raise_error(response.status, raw_data)
File "/usr/local/lib/python2.7/dist-packages/elasticsearch/connection/base.py", line 108, in _raise_error
raise HTTP_EXCEPTIONS.get(status_code, TransportError)(status_code, error_message, additional_info)
elasticsearch.exceptions.RequestError: TransportError(400, u'illegal_argument_exception', u'[Bastion][127.0.0.1:9300][indices:data/write/update[s]]')
Any insights would be appreciated. Thanks!
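One thing that stands out (a hedged observation; with this 1.x/2.x-era client the same 400 can also mean dynamic scripting is disabled on the cluster): the script references ctx._source[Constituents] without quotes, so Groovy sees an undefined variable named Constituents. A sketch of the corrected record, assuming inline scripting is enabled:
record = {
    'script': "if (ctx._source.Constituents == null) { ctx._source.Constituents = item } else { ctx._source.Constituents += item }",
    'params': {'item': [{'ConstituentID': 5}]}
}
es.update(index="puamapi", doc_type="apiobjects_american", id=4901, body=record)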
