Does Python have something like references in PHP? - python

I want to get some data from XML and return it in the same variable, so I do this before returning:
for element in response:
    document = parseString(element)
    try:
        element = {
            'scopes': document.getElementsByTagNameNS('*', 'Scopes')[0].firstChild.nodeValue,
            'address': document.getElementsByTagNameNS('*', 'XAddrs')[0].firstChild.nodeValue
        }
    except:
        element = False
return response
But the response still contains raw xml data instead of parsed results... Basically I want that element = ... value returned to response value.

Edit: Improving the answer according to delnan and jonrsharpe.
The problem with your code is that, when you loop with
for element in response
each iteration rebinds the name element to a newly created object because of the lines:
element = {
    'scopes': document.getElementsByTagNameNS('*', 'Scopes')[0].firstChild.nodeValue,
    'address': document.getElementsByTagNameNS('*', 'XAddrs')[0].firstChild.nodeValue
}
while the object it pointed to before (the item stored in response) stays the same. You can try this instead:
for index, element in enumerate(response):
    document = parseString(element)
    try:
        response[index] = {
            'scopes': document.getElementsByTagNameNS('*', 'Scopes')[0].firstChild.nodeValue,
            'address': document.getElementsByTagNameNS('*', 'XAddrs')[0].firstChild.nodeValue
        }
    except:
        response[index] = False
return response
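The difference between rebinding the loop variable and assigning through the index can be seen in a small sketch. The list of strings here is made up for illustration and stands in for the XML strings in response.

```python
# Made-up data standing in for the list of XML strings.
values = ["a", "b", "c"]

# Rebinding the loop variable only changes the local name `item`;
# the list still holds the original objects.
for item in values:
    item = item.upper()
unchanged = list(values)

# Assigning through the index replaces the elements inside the list.
for index, item in enumerate(values):
    values[index] = item.upper()

print(unchanged)
print(values)
```

After the first loop the list is unchanged; only the second loop replaces its elements.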

Assuming that response is a list of strings and you want to replace those strings with the parsed element dicts, that's easy. Since lists are mutable containers, you can replace the elements as you go. There is no need to return the list; the one you pass in is changed.
def convert_response_to_elements(response):
    for index, element_str in enumerate(response):
        document = parseString(element_str)
        try:
            element = {
                'scopes': document.getElementsByTagNameNS('*', 'Scopes')[0].firstChild.nodeValue,
                'address': document.getElementsByTagNameNS('*', 'XAddrs')[0].firstChild.nodeValue
            }
        except:
            element = False
        response[index] = element

Since response is a list, I would use a Python list comprehension. This creates a new list without modifying the old one.
new_response = [modify_element(element) for element in response]
Later, if you want to remove the elements that are equal to False, you can use the filter function:
without_false = filter(lambda element: bool(element), new_response)
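The answer does not define modify_element. A minimal sketch of what it might look like, assuming the same parsing as in the question and assuming parseString comes from xml.dom.minidom, is shown here. The sample XML strings and the urn:example namespace are made up for illustration, and unlike the question's code the parse call is placed inside the try block so that malformed input also yields False.

```python
from xml.dom.minidom import parseString


def modify_element(element_str):
    """Return a dict for one XML string, or False if it cannot be parsed.

    Hypothetical helper: the name comes from the answer, the body is a guess
    based on the question's code.
    """
    try:
        document = parseString(element_str)
        return {
            'scopes': document.getElementsByTagNameNS('*', 'Scopes')[0].firstChild.nodeValue,
            'address': document.getElementsByTagNameNS('*', 'XAddrs')[0].firstChild.nodeValue,
        }
    except Exception:
        return False


# Made-up sample input: one valid document and one that is not XML at all.
response = [
    '<d:Match xmlns:d="urn:example"><d:Scopes>scope-a</d:Scopes>'
    '<d:XAddrs>http://192.0.2.1/service</d:XAddrs></d:Match>',
    'not xml',
]

new_response = [modify_element(element) for element in response]
# A list comprehension keeps only the truthy results (an alternative to filter).
without_false = [element for element in new_response if element]
print(new_response)
print(without_false)
```

The original response list is left untouched; new_response holds one dict or False per input string.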

Related

Parsing Boolean from a json and assigning to a variable does not work when the value is False

Using Python, I've got a function retrieving a list of operations from an API endpoint.
The function takes a filter argument in order to filter the results on a given predicate.
The function looks like this:
def list_operations(filter=None):
    # make a curl call to the product recognizer
    headers = {
        'Authorization': 'Bearer {}'.format(creds.token),
        'Content-Type': 'application/json',
    }
    response = requests.get(
        'https://{}/v1alpha1/projects/{}/locations/us-central1/operations'.format(API_ENDPOINT, project),
        headers=headers
    )
    # dump the json response and display their names
    data = json.loads(response.text)
    # add a Metadata element to the operations if it does not exist
    for item in data['operations']:
        if not item.get('metadata'):
            item['metadata'] = {}
            item['metadata']['createTime'] = ''
        else:
            if not item['metadata'].get('createTime'):
                item['metadata']['createTime'] = ''
    # Order operations by create time if the metadata exists and the createTime exists
    data['operations'] = sorted(data['operations'], key=lambda k: k['done'], reverse=True)
    if filter:
        # filter the operations by the filter value
        # Parse the filter value to get the operation name
        filter_path = filter.split('=')[0].split('.')
        filter_value = filter.split('=')[1]
        # check if the filter_value could be a Boolean
        if filter_value == 'True':
            filter_value = True
        elif filter_value == 'False':
            filter_value = False
        # iterate backwards to avoid index out of range error using reversed
        for item in reversed(data['operations']):
            # for every element in filter_path, check if it exists
            item_value = item
            for filter_el in filter_path:
                if item_value.get(filter_el):
                    item_value = item_value.get(filter_el)
            # if the item value is not equal to the filter value, remove it from the list
            if item_value != filter_value:
                data['operations'].remove(item)
My problem is when I'm calling the function with:
list_operations(filter='done=False')
even when the done key from the response message is False, the assignment of the value to item_value does not work:
item_value = item_value.get(filter_el)
Using the debugger, item_value is {'name': 'api_path/operation-1676883175156-5f51dc9fc2ad1-b4c56f97-edd1e5be', 'done': False, 'metadata': {'createTime': ''}} instead of False
It works fine when calling
list_operations(filter='done=True')
I can't see what's missing here ...
[EDIT]
The problem was in the line
if item_value.get(filter_el):
To test for the existence of the key, I should have written:
if filter_el in item_value:
Stupid mistake ...
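The difference between the two checks can be reproduced in isolation. This is a minimal sketch using a shortened, made-up operation dict; resolve_path is a hypothetical helper name that mirrors the inner loop of the function above with the membership test swapped in.

```python
# Shortened, made-up operation record.
item = {'name': 'operation-1', 'done': False, 'metadata': {'createTime': ''}}

# .get() returns the stored value, so a stored False fails a truthiness test
# even though the key is present.
found_by_get = bool(item.get('done'))
found_by_in = 'done' in item


def resolve_path(item, path):
    """Follow a list of keys through nested dicts (hypothetical helper).

    A missing key leaves the current value unchanged, as in the original loop.
    """
    value = item
    for key in path:
        if isinstance(value, dict) and key in value:
            value = value[key]
    return value


print(found_by_get, found_by_in)
print(resolve_path(item, ['done']))
```

With the membership test, a key whose value is False, 0, or an empty string is still followed.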
json.loads() loads a JSON to a Python dictionary, so "done": false is already converted to {"done": False} in Python:
import json
d = json.loads("""{"name": "api_path/operation-1676883175156-5f51dc9fc2ad1-b4c56f97-edd1e5be",
"done": false,
"metadata": {"createTime": ""}}""")
print(type(d['done']))
>>> <class 'bool'>
I don't have your full response so I cannot help beyond this point.

What would be the best way to cycle through an array of elements from HTML in order to use 2 separate tag names in the order they came?

Not really sure how to word this question properly, but I'm basically playing around with python and using Selenium to scrape a website and I'm trying to create a JSON file with the data.
Here's the goal I'm aiming to achieve:
{
"main1" : {
"sub1" : "data",
"sub2" : "data",
"sub3" : "data",
"sub4" : "data"
},
"main2" : {
"sub1" : "data",
"sub2" : "data",
"sub3" : "data",
"sub4" : "data"
}
}
The problem I'm facing at the moment is that the website has no indentation or child elements. It looks like this (but longer and actual copy, of course):
<h3>Main1</h3>
<p>Sub1</p>
<p>Sub2</p>
<p>Sub3</p>
<p>Sub4</p>
<h3>Main2</h3>
Now I want to iterate through the HTML in order to use the <h3> tags as the parent ("Main" in the JSON example) and the <p> tags as the children (sub[num]). I'm new to both Python and Selenium, so I may have done this wrong, but I've tried using items.find_elements_by_tag_name('el') to separate the two, and I don't know how to put them back together in the order that they originally came.
I then tried looping through all the elements and separating the tags using if (item.tag_name == "el"): loops. This works perfectly when I print the results of each loop, but when it comes to putting them together in a JSON file, I have the same issue as the previous method where I cannot seem to get the 2 to join. I've tried a few variations and I either get key errors or only the last item in the loop gets recorded.
Just for reference, here's the code for this step:
items = browser.find_element_by_xpath(
    '//*[@id="main-content"]')  # Main Content
itemList = items.find_elements_by_xpath(".//*")
statuses = [
    "Status1",
    "Status2",
    "Status3",
    "Status4"
]
for item in itemList:  # iterate through the HTML
    if (item.tag_name == "h3"):  # Separate H3 Tags
        main = item.text
        print("======================================")
        print(main)
        print("======================================")
    if (item.tag_name == 'p'):  # Separate P tags
        for status in statuses:
            if (status in item.text):  # Filter P tags to only display info that contains words in the Status array
                delimeters = ":", "(", "See"
                regexPattern = "|".join(map(re.escape, delimeters))
                zoneData = re.split(regexPattern, item.text)
                # Split P tags into separate parts
                sub1 = zoneData[0]
                sub2 = zoneData[1].translate({ord('*'): None})
                sub3 = zoneData[2].translate({ord(")"): None})
                print(sub1)
                print(sub2)
                print(sub3)
The final option I've decided to try is to try going through all the HTML again, but using enumerate() and using the element's IDs and including all the tags between the 2 IDs, but I'm not really sure what my plan of action is with this just yet.
In general, the last option seems a bit convoluted and I'm pretty certain there's a simpler way to do this. What would you suggest?
Here's my idea. I didn't do the data part; you can add it later.
I assume that there are no duplicate main names; otherwise you will lose some info.
items = browser.find_element_by_xpath(
    '//*[@id="main-content"]')  # Main Content
itemList = items.find_elements_by_xpath(".//p|.//h3")  # only finds h3 or p
def construct(item_list):
    current_main = ''
    final_dict: dict = {}
    for item in item_list:
        if item.tag_name == "h3":
            current_main = item.text
            final_dict[current_main] = {}  # create empty dict inside main. remove if you want to update the main dict
        if item.tag_name == "p":
            p_name = item.text
            final_dict[current_main][p_name] = "data"
    return final_dict
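The grouping idea can be tried without a browser. The sketch below works on plain (tag, text) pairs, which with Selenium could presumably be built from the tag_name and text attributes of each element; group_by_heading and the sub1, sub2 key naming are assumptions made for illustration, not part of the answer.

```python
import json


def group_by_heading(elements):
    """Group (tag, text) pairs: each h3 opens a group, each p joins the latest one.

    Hypothetical helper; paragraphs that appear before any h3 are skipped.
    """
    grouped = {}
    current = None
    for tag, text in elements:
        if tag == 'h3':
            current = text
            grouped[current] = {}
        elif tag == 'p' and current is not None:
            key = 'sub{}'.format(len(grouped[current]) + 1)
            grouped[current][key] = text
    return grouped


# Made-up flat sequence in document order, like the HTML in the question.
elements = [
    ('h3', 'Main1'), ('p', 'Sub1'), ('p', 'Sub2'),
    ('h3', 'Main2'), ('p', 'Sub1'),
]
result = group_by_heading(elements)
print(json.dumps(result, indent=4))
```

Because the elements are visited in document order, remembering the most recent heading is enough to attach each paragraph to the right parent.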

Is there an efficient way to retrieve values from dictionary

I have a dictionary of values for a company name.
It has 3 keys:
{'html_attributions': [],
 'result': {'Address': '123 Street', 'website': '123street.com'},
 'status': 'Ok'}
I have a dataframe of many dictionaries. I want to loop through each row's dictionary and get the information I need.
Currently I am writing for loops to retrieve this information. Is there a more efficient way to retrieve it?
addresses = []
for i in range(len(testing)):
    try:
        addresses.append(testing['Results_dict'][i]['result']['Address'])
    except:
        addresses.append('No info')
What I have works perfectly fine. However, I would like something more efficient. Perhaps using the get() method? But I don't know how to use it to reach inside 'result'.
Try this:
def get_address(r):
    try:
        return r['result']['Address']
    except Exception:
        return 'No info'
addresses = df['Results_dict'].map(get_address)
This guards against cases where Results_dict is None, not a dict, or any key along the path does not exist.
This may be a faster solution if the data is big (the empty-dict default keeps the second get() from failing when 'result' is missing):
addresses = list(map(lambda x: x.get('result', {}).get('Address', 'No info'), testing['Results_dict']))
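To answer the question of how get() can reach inside 'result', here is a minimal sketch of a lookup that avoids try/except. The records list is made up for illustration, and this version of get_address (with its default parameter) is a hypothetical variant, not the one from the answer above.

```python
def get_address(record, default='No info'):
    """Return record['result']['Address'], or `default` if any step is missing.

    Hypothetical helper written for this sketch.
    """
    if not isinstance(record, dict):
        return default
    result = record.get('result')
    if not isinstance(result, dict):
        return default
    return result.get('Address', default)


# Made-up rows: a complete record, one without 'result', and a missing value.
records = [
    {'html_attributions': [],
     'result': {'Address': '123 Street', 'website': '123street.com'},
     'status': 'Ok'},
    {'html_attributions': [], 'status': 'NOT_FOUND'},
    None,
]
addresses = [get_address(record) for record in records]
print(addresses)
```

With a pandas Series of such dictionaries, the same function could be passed to map in place of the list comprehension.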
Here is how I deal with nested dict keys:
Example:
def keys_exists(element, *keys):
    if not isinstance(element, dict):
        raise AttributeError('keys_exists() expects dict as first argument.')
    if len(keys) == 0:
        raise AttributeError('keys_exists() expects at least two arguments, one given.')
    _element = element
    for key in keys:
        try:
            _element = _element[key]
        except KeyError:
            return False
    return True
For data:
{'html_attributions': [],
 'result': {'Address': '123 Street', 'website': '123street.com'},
 'status': 'Ok'}
If you want to check whether result exists, use the function above like this:
`print('result (exists/Not): {}'.format(keys_exists(data, "result")))`
To check whether Address exists inside result, try this:
`print('result > Address (exists/not): {}'.format(keys_exists(data, "result", "Address")))`
It returns True or False.

How to check if key is present in nested list in json?

I have a JSON file where each object looks like the following example:
[
{
"timestamp": 1569177699,
"attachments": [
],
"data": [
{
"post": "\u00f0\u009f\u0096\u00a4\u00f0\u009f\u0092\u0099"
},
{
"update_timestamp": 1569177699
}
],
"title": "firstName LastName"
}
]
I want to check whether the key post is nested within the key data. I wrote this, but it doesn't work:
posts = json.loads(open(file).read())
for post in posts:
    if 'data' in post:
        if 'post' in post['data']:
            print(post['data']['post'])
Here is my solution. I see from your sample data that post["data"] is a list, so the program should iterate over it:
posts = json.loads(open(file).read())
for post in posts:
    if 'data' in post:
        # THIS IS THE NEW LINE to iterate list
        for d in post["data"]:
            if 'post' in d:
                print(d['post'])
Try:
posts = json.loads(open(file).read())
for data in posts:
    for key, value in data.items():
        if key == 'data':
            for item in value:
                if 'post' in item:
                    print(key, item['post'])
Try the answer from this question; it works:
Elegant way to check if a nested key exists in a python dict
def keys_exists(element, *keys):
    '''
    Check if *keys (nested) exists in `element` (dict).
    '''
    if not isinstance(element, dict):
        raise AttributeError('keys_exists() expects dict as first argument.')
    if len(keys) == 0:
        raise AttributeError('keys_exists() expects at least two arguments, one given.')
    _element = element
    for key in keys:
        try:
            _element = _element[key]
        except KeyError:
            return False
    return True
You could do it generically by adapting my answer to the question How to find a particular json value by key?.
It's generic in the sense that it doesn't care much about the details of how the JSON data is structured, it just checks every dictionary it finds inside it.
import json
def find_values(id, json_file):
    results = []
    def _decode_dict(a_dict):
        try:
            results.append(a_dict[id])
        except KeyError:
            pass
        return a_dict
    json.load(json_file, object_hook=_decode_dict)  # Return value ignored.
    return len(results) > 0  # If there are any results, id was found.
with open('find_key_test.json', 'r') as json_file:
    print(find_values('post', json_file))  # -> True
Please try the following:
posts = json.loads(open(file).read())
for post in posts:
    if 'data' in post:
        for data in post['data']:
            if 'post' in data:
                print(data['post'])
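The loops above can also be condensed with any(). This is a minimal sketch that parses an inline JSON string instead of a file; the sample data is a shortened, made-up version of the object in the question, and has_post is a hypothetical helper name.

```python
import json

# Shortened, made-up sample: the first entry has a "post" inside "data",
# the second does not.
raw = '''
[
  {"timestamp": 1569177699, "attachments": [],
   "data": [{"post": "hello"}, {"update_timestamp": 1569177699}],
   "title": "firstName LastName"},
  {"timestamp": 1569177700,
   "data": [{"update_timestamp": 1569177700}]}
]
'''
posts = json.loads(raw)


def has_post(entry):
    """True if any dict in entry['data'] contains the key 'post' (hypothetical helper)."""
    return any(isinstance(d, dict) and 'post' in d for d in entry.get('data', []))


flags = [has_post(entry) for entry in posts]
texts = [d['post'] for entry in posts for d in entry.get('data', []) if 'post' in d]
print(flags)
print(texts)
```

Using entry.get('data', []) also covers entries that have no data key at all.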

Logical evaluation error when looping through dictionary of conditions

I'm looping through a list of web pages with Scrapy. Some of the pages that I scrape are in error. I want to keep track of the various error types, so I have set up my function to first check whether any of a series of error conditions (which I have placed in a dictionary) is true and, if none are, proceed with normal page scraping:
def parse_detail_page(self, response):
    error_value = False
    output = ""
    error_cases = {
        "' pageis not found' in response.body" : 'invalid',
        "'has been transferred' in response.body" : 'transferred',
    }
    for key, value in error_cases.iteritems():
        if bool(key):
            error_value = True
            output = value
    if error_value:
        for field in J1_Item.fields:
            if field == 'case':
                item[field] = id
            else:
                item[field] = output
    else:
        item['case'] = id
        ........................
However, I see that even when none of the error cases apply, the 'invalid' option is being selected. What am I doing wrong?
Your conditions (something in response.body) are not evaluated. Instead, you evaluate the truth value of a nonempty string, which is True.
This might work:
def parse_detail_page(self, response):
    error_value = False
    output = ""
    error_cases = {
        "pageis not found" : 'invalid',
        "has been transferred" : 'transferred',
    }
    for key, value in error_cases.iteritems():
        if key in response.body:
            error_value = True
            output = value
            break
    .................
(Must it be "pageis not found" or "page is not found"?)
bool(key) will convert key from a string to a bool.
What it won't do is actually evaluate the condition. You could use eval() for that, but I'd recommend instead storing a list of functions (each returning an object or throwing an exception) rather than your current dict-with-string-keys-that-are-actually-Python-code.
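A minimal sketch of the list-of-functions idea, assuming each function takes the page body and returns True when its error condition applies. The classify name, the sample strings, and the "page is not found" spelling are assumptions made for illustration; plain strings stand in for the response body.

```python
def classify(body, error_cases):
    """Return the label of the first predicate that matches, or None (hypothetical helper)."""
    for predicate, label in error_cases:
        if predicate(body):
            return label
    return None


# Each entry pairs a callable condition with the value to record.
error_cases = [
    (lambda body: 'page is not found' in body, 'invalid'),
    (lambda body: 'has been transferred' in body, 'transferred'),
]

print(classify('Sorry, the page is not found', error_cases))
print(classify('This case has been transferred', error_cases))
print(classify('Case details', error_cases))
```

Because the conditions are real callables, they are actually evaluated against the body, and no eval() is needed.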
I'm not sure why you are evaluating bool(key) like you are. Let's look at your error_cases. You have two keys and two values. "' pageis not found' in response.body" will be your key the first time, and "'has been transferred' in response.body" will be the key in the second round of your for loop. Neither of those will be False when you check bool(key), because any non-empty string is truthy.
>>> a = "' pageis not found' in response.body"
>>> bool(a)
True
You need to have a different evaluator other than bool(key) there or you will always have an error.
Your conditions are strings, so they are not evaluated.
You could evaluate your strings using the eval(key) function, but that is quite unsafe.
With the help of the operator module, there is no need to evaluate unsafe strings (as long as your conditions stay quite simple).
error['operator'] holds a reference to the contains function, which can be used as a replacement for in.
from operator import contains
class ...:
    def parse_detail_page(self, response):
        error_value = False
        output = ""
        error_cases = [
            {'search': ' pageis not found', 'operator': contains, 'output': 'invalid' },
            {'search': 'has been transferred', 'operator': contains, 'output': 'transferred' },
        ]
        for error in error_cases:
            # contains(a, b) means `b in a`, so the body being searched comes first
            if error['operator'](response.body, error['search']):
                error_value = True
                output = error['output']
                print(output)
        if error_value:
            for field in J1_Item.fields:
                if field == 'case':
                    item[field] = id
                else:
                    item[field] = output
        else:
            item['case'] = id
            ...
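The argument order of operator.contains is easy to get backwards, so here is a short check. The body string is made up for illustration.

```python
from operator import contains

# Made-up page text standing in for response.body.
body = 'This case has been transferred to another office'

# contains(a, b) is equivalent to `b in a`: the container comes first.
body_has_phrase = contains(body, 'has been transferred')
phrase_has_body = contains('has been transferred', body)

print(body_has_phrase)
print(phrase_has_body)
```

Only the first call asks whether the phrase occurs in the body; the second asks whether the whole body occurs inside the phrase.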
