I'm trying to loop through a list of objects, pull out their attributes, and add them to a dictionary. Some of the data in these objects was previously populated, but sometimes there will be null or blank values. When the loop runs into a blank value, it throws an "index out of range" error.
obj = Idea.objects.get(name=idea_name)
new_obj = []
plan = ProductPlan.objects.all()
for product in plan:
    answers = product.question.answer_set.filter(idea=obj.id)
    new_plan = {"title": product.title, "answer": answers[0]}
    print(new_plan)
    new_obj.append(new_plan)
return render(request, 'idea.html', {"new_obj": new_obj, "obj": obj})
If there is no answer at that index, how do I just store it as empty?
answers = product.question.answer_set.filter(idea=obj.id)
answer = answers[0] if answers.exists() else None
new_plan = {"title": product.title, "answer": answer}
In case you aren't familiar with it, exists() efficiently tests whether a queryset contains any results. See the Django documentation for more info.
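As an aside, Django's QuerySet.first() returns the first matching object or None (ordering by primary key if the queryset has no ordering), so answer = answers.first() does the same job in one query instead of an exists() query followed by an indexed one. The underlying "first item or a default" idea can be sketched in plain Python; the helper name first_or_default and the sample data are hypothetical, and with a real queryset you would prefer .first(), since iter() would fetch every row.

```python
def first_or_default(items, default=None):
    """Return the first element of any iterable, or `default` if it is empty."""
    # next() with a default avoids IndexError/StopIteration on empty input
    return next(iter(items), default)


# Mirrors the loop in the question, using plain lists instead of querysets
plan = [("Pricing", ["$10/month"]), ("Audience", [])]
new_obj = [{"title": title, "answer": first_or_default(answers)} for title, answers in plan]
print(new_obj)
```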
I'm not really sure how to word this question properly, but I'm basically playing around with Python and using Selenium to scrape a website, and I'm trying to create a JSON file with the data.
Here's the goal I'm aiming to achieve:
{
"main1" : {
"sub1" : "data",
"sub2" : "data",
"sub3" : "data",
"sub4" : "data"
},
"main2" : {
"sub1" : "data",
"sub2" : "data",
"sub3" : "data",
"sub4" : "data"
}
}
The problem I'm facing at the moment is that the website has no indentation or child elements. It looks like this (but longer and actual copy, of course):
<h3>Main1</h3>
<p>Sub1</p>
<p>Sub2</p>
<p>Sub3</p>
<p>Sub4</p>
<h3>Main2</h3>
Now I want to iterate through the HTML, using the <h3> tags as the parents ("main" in the JSON example) and the <p> tags as the children (sub[num]). I'm new to both Python and Selenium, so I may have done this wrong, but I've tried using items.find_elements_by_tag_name('el') to separate the two, and I don't know how to put them back together in the order they originally came in.
I then tried looping through all the elements and separating the tags using if (item.tag_name == "el"): checks. This works perfectly when I print the results of each iteration, but when it comes to putting them together in a JSON file, I have the same issue as with the previous method: I can't seem to join the two. I've tried a few variations, and I either get key errors or only the last item in the loop gets recorded.
Just for reference, here's the code for this step:
import re

items = browser.find_element_by_xpath(
    '//*[@id="main-content"]')  # Main content
itemList = items.find_elements_by_xpath(".//*")
statuses = [
    "Status1",
    "Status2",
    "Status3",
    "Status4"
]
for item in itemList:  # Iterate through the HTML
    if item.tag_name == "h3":  # Separate H3 tags
        main = item.text
        print("======================================")
        print(main)
        print("======================================")
    if item.tag_name == 'p':  # Separate P tags
        for status in statuses:
            if status in item.text:  # Only keep P tags that contain one of the words in statuses
                delimiters = ":", "(", "See"
                regexPattern = "|".join(map(re.escape, delimiters))
                # Split P tags into separate parts
                zoneData = re.split(regexPattern, item.text)
                sub1 = zoneData[0]
                sub2 = zoneData[1].translate({ord('*'): None})
                sub3 = zoneData[2].translate({ord(")"): None})
                print(sub1)
                print(sub2)
                print(sub3)
The final option I'm considering is to go through all the HTML again using enumerate() and the elements' IDs, collecting all the tags between two IDs, but I'm not really sure what my plan of action is with this just yet.
In general, the last option seems a bit convoluted, and I'm pretty certain there's a simpler way to do this. What would you suggest?
Here's my idea. I didn't fill in the data part; you can add that later.
I assume that there are no duplicate main names; otherwise you will lose some info.
items = browser.find_element_by_xpath(
    '//*[@id="main-content"]')  # Main content
itemList = items.find_elements_by_xpath(".//p|.//h3")  # Only finds h3 or p


def construct(item_list):
    current_main = ''
    final_dict: dict = {}
    for item in item_list:
        if item.tag_name == "h3":
            current_main = item.text
            final_dict[current_main] = {}  # Create an empty dict inside main. Remove this line if you want to update the main dict instead
        if item.tag_name == "p":
            p_name = item.text
            final_dict[current_main][p_name] = "data"
    return final_dict


result = construct(itemList)
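The grouping logic itself does not depend on Selenium, so it can be sketched and tested on plain (tag_name, text) pairs, for example built with [(el.tag_name, el.text) for el in itemList]. This is a sketch under a few assumptions of its own: the function name group_by_heading is hypothetical, a repeated heading is merged via setdefault rather than overwritten, and paragraphs that appear before the first heading are skipped.

```python
import json


def group_by_heading(elements):
    """Group a flat sequence of (tag_name, text) pairs into {heading: {paragraph: data}}.

    Each <p> is attached to the most recent <h3> seen before it.
    """
    result = {}
    current = None
    for tag, text in elements:
        if tag == "h3":
            current = text
            result.setdefault(current, {})  # Merges into an earlier entry if a heading repeats
        elif tag == "p" and current is not None:  # Skip paragraphs before the first heading
            result[current][text] = "data"  # Placeholder value, as in the answer
    return result


# Stand-in for the (tag_name, text) values read from Selenium elements
flat = [("h3", "Main1"), ("p", "Sub1"), ("p", "Sub2"), ("h3", "Main2"), ("p", "Sub1")]
nested = group_by_heading(flat)
print(json.dumps(nested, indent=2))
```

To write the result to a file instead of printing it, json.dump(nested, f, indent=2) on a file opened with open("output.json", "w") would do.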
I'm writing a Python program to list all the EC2 instances without the tag "Owner". Where I'm stuck right now is that once the program encounters an instance that already has the "Owner" tag, it exits without checking further instances. What I want is to skip those instances and continue checking the other instances.
Here is my Code:
import boto3

client = boto3.client('ec2', region_name='ap-south-1')
a = client.describe_instances()
count = 0
y = "Owner"
for i in a['Reservations']:
    for j in i['Instances']:
        for k in range(len(j['Tags'])):
            if y == j['Tags'][k]['Key']:
                count += 1
            else:
                continue
        if count == 0:
            print("The following Instance does not have Owner Tag: " + j['InstanceId'])
Here is what the response dict looks like:
{
    'Reservations': [{
        'Instances': [{
            ...
            'Tags': [
                {
                    'Value': 'sample1',
                    'Key': 'Name'
                },
                {
                    'Value': 'user',
                    'Key': 'Owner'
                }
            ]
        }]
    }]
}
Your continue is causing a few problems.
Here's an alternate version:
import boto3

key_to_find = 'Owner'
client = boto3.client('ec2')
response = client.describe_instances()
for reservation in response['Reservations']:
    for instance in reservation['Instances']:
        if key_to_find not in [tag['Key'] for tag in instance['Tags']]:
            print(key_to_find + ' tag not found for Instance ' + instance['InstanceId'])
I noticed a couple of things in your code that, if done slightly differently, would make the program's flow easier to control. For starters, use more descriptive variable names: if you have a list of instances, for example, a good name is simply instances. Here is an example of how you could do this:
import boto3

client = boto3.client('ec2', region_name='ap-south-1')
data = client.describe_instances()

reservations = data["Reservations"]
for reservation in reservations:
    instances = reservation["Instances"]

    for instance in instances:
        tags = instance["Tags"]
        instanceId = instance['InstanceId']
        hasOwner = len([tag for tag in tags if tag["Key"] == "Owner"]) > 0

        if not hasOwner:
            print("The following Instance does not have Owner Tag: " + instanceId)
Notice how I also added some spacing to increase readability.
Lastly, if you have a string, namely "Owner", it would be silly to put this into a variable called owner, since that would only be confusing if the value were ever changed.
You are exiting the loop after finding the first tag that does not match. Collect all the tags first and then check. Try this:
for i in a['Reservations']:
    for j in i['Instances']:  # For each instance
        keys = [tag['Key'].upper() for tag in j['Tags']]  # Collect all tag keys
        if 'OWNER' not in keys:  # Case-insensitive check for the Owner tag
            print("The following Instance does not have Owner Tag: " + j['InstanceId'])
Having just written almost exactly this, I noticed a few areas that were overlooked here. In no particular order:
The examples given only cover one region.
The examples only find instances that are missing an owner tag; there is also the case where the tag exists but its value is empty.
For a newly launched instance with no tags added, the example answers will error out.
The code below answers the question and covers the points above.
import boto3

# Open an initial client to get a list of existing regions
client = boto3.client('ec2', region_name='us-east-1')
# Get a list of all existing regions
zones = client.describe_regions()
# Loop through all regions, as anyone can launch in any region, though we all
# tend to work in us-east-1
for region in zones["Regions"]:
    # Reset the dict used for tracking to make sure we don't carry data forward
    bt_missing = {}
    region_called = region["RegionName"]
    # Open a new client as we move through each region
    ec2 = boto3.client('ec2', region_name=region_called)
    # Get a list of instances (in any state) in the region
    response = ec2.describe_instances()
    # Loop through all instances looking for instances with no tags, missing Billing_Owner tags,
    # or empty Billing_Owner tags, adding the instance ID to a dict
    for resv in response["Reservations"]:
        for inst in resv["Instances"]:
            if "Tags" not in inst:
                print(f"The instance: {inst['InstanceId']} has no tags at all, if yours, please update: {region_called}")
                continue
            if "Billing_Owner" not in [t['Key'] for t in inst["Tags"]]:
                bt_missing.update({inst["InstanceId"]: region_called})
    for resv in response["Reservations"]:
        for inst in resv["Instances"]:
            if "Tags" not in inst:
                continue
            for tag in inst["Tags"]:
                if tag['Key'] == "Billing_Owner" and not tag['Value']:
                    bt_missing.update({inst["InstanceId"]: region_called})
    # Take the instance IDs in this region that have blank or missing
    # Billing_Owner tags, retrieve the Name tag for each instance,
    # and report it in instance-id: name-region format
    for inst in bt_missing:
        report_on = ec2.describe_tags(
            Filters=[
                {
                    'Name': 'resource-id',
                    'Values': [inst],
                },
            ],
        )
        instname = "(no Name tag)"  # Default so an instance without a Name tag doesn't reuse a previous name
        for tag in report_on["Tags"]:
            if tag["Key"] == "Name":
                instname = tag["Value"]
        message = f"The instance: {inst} has no Billing Owner tag, if yours, please update: Name: {instname}, region: {bt_missing[inst]}"
        print(message)
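The per-instance check is easier to test when it is separated from the AWS calls. Below is a sketch of that idea, assuming a response shaped like the one shown in the question; the function name instances_missing_tag is hypothetical. It treats a missing Tags key, a missing tag, and an empty tag value all as "missing". With boto3 you would call it on each region's client.describe_instances() result (or on each page from client.get_paginator('describe_instances') in larger accounts).

```python
def instances_missing_tag(response, key="Owner"):
    """Return IDs of instances in a describe_instances-style dict whose `key`
    tag is absent or has an empty value. Instances with no Tags at all count as missing."""
    missing = []
    for reservation in response.get("Reservations", []):
        for instance in reservation.get("Instances", []):
            # .get() guards against freshly launched instances that have no Tags key
            tags = {tag["Key"]: tag.get("Value", "") for tag in instance.get("Tags", [])}
            if not tags.get(key):  # Absent key and empty string are both falsy
                missing.append(instance["InstanceId"])
    return missing


sample = {"Reservations": [{"Instances": [
    {"InstanceId": "i-aaa", "Tags": [{"Key": "Name", "Value": "sample1"}, {"Key": "Owner", "Value": "user"}]},
    {"InstanceId": "i-bbb", "Tags": [{"Key": "Owner", "Value": ""}]},
    {"InstanceId": "i-ccc"},
]}]}
print(instances_missing_tag(sample))  # ['i-bbb', 'i-ccc']
```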
I would like to retrieve items whose attribute value is present in a list of values I provide. Below is the query I'm using. Unfortunately, the response returns an empty list of items. I don't understand why this is the case and would like to know the correct query.
def search(self, src_words, translations):
    entries = []
    query_src_words = [word.decode("utf-8") for word in src_words]
    params = {
        "TableName": self.table,
        "FilterExpression": "src_word IN (:src_words) AND src_language = :src_language AND target_language = :target_language",
        "ExpressionAttributeValues": {
            ":src_words": {"SS": query_src_words},
            ":src_language": {"S": config["source_language"]},
            ":target_language": {"S": config["target_language"]}
        }
    }
    page_iterator = self.paginator.paginate(**params)
    for page in page_iterator:
        for entry in page["Items"]:
            entries.append(entry)
    return entries
Below is the table that I would like to query. For example, if my query_src_words list contains [soccer ball, dog], then only the row with entry_id=2 should be returned.
Any insights would be much appreciated.
I think this is because in query_src_words you have "soccer_ball" (with an underscore), while in the database you have "soccer ball" (without an underscore).
Change "soccer_ball" to "soccer ball" in your query_src_words and it should work fine.
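Separately from the spelling mismatch, the IN clause itself is worth checking. In DynamoDB condition and filter expressions, a IN (b, c, d) is true when a equals any of the listed operands, so a single :src_words placeholder holding a String Set ("SS") compares each string src_word against the whole set, which would not match. A minimal sketch that generates one placeholder per word is below; the helper name build_in_filter and the :v prefix are hypothetical. The returned pieces could replace the "src_word IN (:src_words)" fragment and the ":src_words" entry in params.

```python
def build_in_filter(attribute, values, prefix=":v"):
    """Build a FilterExpression fragment like 'src_word IN (:v0, :v1)' plus the
    matching low-level ExpressionAttributeValues, one {"S": ...} entry per value."""
    placeholders = ["{}{}".format(prefix, i) for i in range(len(values))]
    expression = "{} IN ({})".format(attribute, ", ".join(placeholders))
    attribute_values = {ph: {"S": value} for ph, value in zip(placeholders, values)}
    return expression, attribute_values


expression, attribute_values = build_in_filter("src_word", ["soccer ball", "dog"])
print(expression)        # src_word IN (:v0, :v1)
print(attribute_values)  # {':v0': {'S': 'soccer ball'}, ':v1': {'S': 'dog'}}
```

In search(), the fragment would then be joined with the existing "AND src_language = :src_language AND target_language = :target_language" conditions, and attribute_values merged into the other expression attribute values.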
Basically, what I am trying to do is generate a JSON list of SSH keys (public and private) on a server using Python. I am using nested dictionaries, and while it works to an extent, the issue is that it displays every other user's keys; I need it to list only the keys that belong to each user.
Below is my code:
def ssh_key_info(key_files):
    for f in key_files:
        c_time = os.path.getctime(f)  # Gets the ctime of file f (metadata change time on Unix, creation time on Windows)
        username_list = f.split('/')  # Splits on the / character
        user = username_list[2]  # Assigns the third field (index 2) from the above split to user
        key_length_cmd = check_output(['ssh-keygen', '-l', '-f', f])  # Runs ssh-keygen on file f
        attr_dict = {}
        attr_dict['Date Created'] = str(datetime.datetime.fromtimestamp(c_time))  # Converts the file time to a string
        attr_dict['Key_Length]'] = key_length_cmd[0:5]  # Assigns the first 5 characters of the ssh-keygen output
        ssh_user_key_dict[f] = attr_dict
        user_dict['SSH_Keys'] = ssh_user_key_dict
        main_dict[user] = user_dict
A list containing the absolute paths of the keys (/home/user/.ssh/id_rsa, for example) is passed to the function. Below is an example of what I receive:
{
"user1": {
"SSH_Keys": {
"/home/user1/.ssh/id_rsa": {
"Date Created": "2017-03-09 01:03:20.995862",
"Key_Length]": "2048 "
},
"/home/user2/.ssh/id_rsa": {
"Date Created": "2017-03-09 01:03:21.457867",
"Key_Length]": "2048 "
},
"/home/user2/.ssh/id_rsa.pub": {
"Date Created": "2017-03-09 01:03:21.423867",
"Key_Length]": "2048 "
},
"/home/user1/.ssh/id_rsa.pub": {
"Date Created": "2017-03-09 01:03:20.956862",
"Key_Length]": "2048 "
}
}
},
As can be seen, user2's key files are included in user1's output. I may be going about this completely wrong, so any pointers are welcomed.
Thanks for the replies. I read up on nested dictionaries and found that the top answer on this post helped me solve the issue: What is the best way to implement nested dictionaries?
Instead of all the dictionaries, I simplified the code and now have just one dictionary. This is the working code:
import datetime
import os
from subprocess import check_output


class Vividict(dict):
    def __missing__(self, key):  # Sets and returns a new instance
        value = self[key] = type(self)()  # Retain local pointer to value
        return value  # Faster to return than dict lookup


main_dict = Vividict()


def ssh_key_info(key_files):
    for f in key_files:
        c_time = os.path.getctime(f)
        username_list = f.split('/')
        user = username_list[2]
        # universal_newlines=True makes check_output return text rather than bytes
        key_bit_cmd = check_output(['ssh-keygen', '-l', '-f', f], universal_newlines=True)
        date_created = str(datetime.datetime.fromtimestamp(c_time))
        key_type = key_bit_cmd[-5:-2]  # e.g. "RSA" from output ending in "(RSA)\n"
        key_bits = key_bit_cmd[0:5]
        main_dict[user]['SSH Keys'][f]['Date Created'] = date_created
        main_dict[user]['SSH Keys'][f]['Key Type'] = key_type
        main_dict[user]['SSH Keys'][f]['Bits'] = key_bits
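For reference, the cross-user leak in the first version appears to come from reusing the same ssh_user_key_dict and user_dict objects (defined outside the function) for every file, so every user ends up pointing at one shared dictionary that accumulates all the keys. A minimal alternative without a custom class uses dict.setdefault to create a fresh inner dictionary per user. The function name group_keys_by_user and the "File" metadata field are hypothetical placeholders; the sketch only parses paths, so it runs without touching the file system or calling ssh-keygen.

```python
import json


def group_keys_by_user(key_files):
    """Build {user: {"SSH Keys": {path: {...}}}} with a separate dict per user."""
    main_dict = {}
    for path in key_files:
        user = path.split("/")[2]  # "/home/user1/.ssh/id_rsa" -> "user1"
        # setdefault creates a new dict the first time each user is seen
        user_keys = main_dict.setdefault(user, {}).setdefault("SSH Keys", {})
        user_keys[path] = {"File": path.rsplit("/", 1)[-1]}  # Placeholder metadata
    return main_dict


files = ["/home/user1/.ssh/id_rsa", "/home/user2/.ssh/id_rsa", "/home/user1/.ssh/id_rsa.pub"]
print(json.dumps(group_keys_by_user(files), indent=2))
```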
I'm looping through a list of web pages with Scrapy. Some of the pages that I scrape are in error. I want to keep track of the various error types, so I have set up my function to first check whether any of a series of error conditions (which I have placed in a dictionary) is true, and if none are, proceed with normal page scraping:
def parse_detail_page(self, response):
    error_value = False
    output = ""
    error_cases = {
        "' pageis not found' in response.body": 'invalid',
        "'has been transferred' in response.body": 'transferred',
    }
    for key, value in error_cases.iteritems():
        if bool(key):
            error_value = True
            output = value
    if error_value:
        for field in J1_Item.fields:
            if field == 'case':
                item[field] = id
            else:
                item[field] = output
    else:
        item['case'] = id
        ...
However, even in cases where none of the error conditions hold, the 'invalid' option is being selected. What am I doing wrong?
Your conditions (something in response.body) are not evaluated. Instead, you evaluate the truth value of a non-empty string, which is always True.
This might work:
def parse_detail_page(self, response):
    error_value = False
    output = ""
    error_cases = {
        "pageis not found": 'invalid',
        "has been transferred": 'transferred',
    }
    for key, value in error_cases.iteritems():
        if key in response.body:
            error_value = True
            output = value
            break
    ...
(Should it be "pageis not found" or "page is not found"?)
bool(key) will convert key from a string to a bool.
What it won't do is actually evaluate the condition. You could use eval() for that, but I'd recommend instead storing a list of functions (each returning an object or throwing an exception) rather than your current dict-with-string-keys-that-are-actually-Python-code.
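A minimal sketch of that suggestion, assuming the page text is available as a str: each check is a small function paired with the error label it reports, and the first check that matches wins. The names ERROR_CHECKS and classify_page are hypothetical, and this variant returns a label (or None) rather than raising an exception.

```python
# Each entry pairs a predicate on the page text with the label to record.
ERROR_CHECKS = [
    (lambda body: "pageis not found" in body, "invalid"),
    (lambda body: "has been transferred" in body, "transferred"),
]


def classify_page(body):
    """Return the label of the first matching error check, or None if the page looks fine."""
    for check, label in ERROR_CHECKS:
        if check(body):  # The condition is actually evaluated here, unlike bool("...")
            return label
    return None


print(classify_page("<p>This case has been transferred.</p>"))  # transferred
print(classify_page("<p>Normal content</p>"))                    # None
```

In parse_detail_page, something like output = classify_page(response.text) could then replace the dict loop, with None meaning "no error".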
I'm not sure why you are evaluating bool(key) like that. Let's look at your error_cases. You have two keys and two values. "' pageis not found' in response.body" will be your key the first time through your for loop, and "'has been transferred' in response.body" will be the key the second time. Neither of those will be False when you check bool(key), because any non-empty string is truthy.
>>> a = "' pageis not found' in response.body"
>>> bool(a)
True
You need a different check than bool(key) there, or you will always get an error.
Your conditions are strings, so they are not evaluated.
You could evaluate your strings using the eval(key) function, but that is quite unsafe.
With the help of the operator module, there is no need to evaluate unsafe strings (as long as your conditions stay fairly simple).
error['operator'] holds a reference to the contains function, which can be used as a replacement for in.
from operator import contains


class ...:
    def parse_detail_page(self, response):
        error_value = False
        output = ""
        error_cases = [
            {'search': ' pageis not found', 'operator': contains, 'output': 'invalid'},
            {'search': 'has been transferred', 'operator': contains, 'output': 'transferred'},
        ]
        for error in error_cases:
            # contains(a, b) means "b in a", so the page body goes first
            if error['operator'](response.body, error['search']):
                error_value = True
                output = error['output']
        print(output)
        if error_value:
            for field in J1_Item.fields:
                if field == 'case':
                    item[field] = id
                else:
                    item[field] = output
        else:
            item['case'] = id
        ...
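One detail to watch with the operator module: operator.contains(a, b) is equivalent to b in a, so the container (the page body) goes first and the search string second. A quick check with a made-up body string:

```python
from operator import contains

body = "Sorry, this case has been transferred."
print(contains(body, "has been transferred"))  # True: same as "has been transferred" in body
print(contains("has been transferred", body))  # False: asks whether the whole body is inside the search string
```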