I am new to coding in Python and ran into an issue.
I have a list of domain names that I would like to get whois lookup information of.
I am using a for loop to get whois information for every domain in a list called domain_name, like this:
for i in domain_name:
    print(whois.whois(i))
I am getting the results printed just fine. But I would like to save those results in a variable that I can turn into a list or a DataFrame.
How do I go about doing that?
thank you!
A list comprehension is appropriate here, useful if you are starting with one list and want to create a new one.
my_results = [whois.whois(i) for i in domain_name]
This will create a new list with the whois results.
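If the end goal is a DataFrame, here is a minimal sketch (assuming pandas is installed and that each whois result behaves like a dictionary; the sample domains are only for illustration):

import whois
import pandas as pd

domain_name = ["example.com", "example.org"]  # hypothetical sample list

# whois.whois() returns a dict-like result, so a list of them can be
# passed straight to the DataFrame constructor
my_results = [whois.whois(i) for i in domain_name]
df = pd.DataFrame(my_results)
print(df.head())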
Define the list you want to store them in before the loop, then append to it inside the loop:
my_container = []
for domain in domain_name:
    my_container.append(whois.whois(domain))
import requests

api = "https://api.rootnet.in/covid19-in/stats/latest"
response = requests.get(api)
local_case_tracker = response.json()

print(local_case_tracker.items())
print(local_case_tracker['data'])
print(local_case_tracker['data']['regional'])
So I'm trying to build a Covid tracker (global and local, and by local I mean my own country) using Python. If you run the code you get a really big, deeply nested dictionary, and I wish to access any one of the states in that dictionary (let's say Goa), but I can't do so. So I tried to break the problem down by doing this:
print(local_case_tracker['data'])
print(local_case_tracker['data']['regional'])
I'm able to fetch some results, but when I try
print(local_case_tracker['data']['regional']['loc : Goa'])
I get:
TypeError: list indices must be integers or slices, not str
It's probably a very basic question, but I've been scratching my head over this for the last 30 minutes.
local_case_tracker['data']['regional'] is a Python list (of dictionaries), so it should be accessed by index, by stepping through it, or by searching through it. Try this to get the data for Goa:
for item in local_case_tracker['data']['regional']:
    if item['loc'] == 'Goa':
        print(item)
You could just save the item instead of printing it; it is a dictionary, so you could access the values using the various keys. Also try this to see all the locations:
for item in local_case_tracker['data']['regional']:
    print(item['loc'])
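If you need to look up states repeatedly, another option (a sketch, not part of the original answer, assuming every entry has a 'loc' key as the loop above implies) is to index the list by location once:

# build a lookup table keyed by the 'loc' field
by_location = {item['loc']: item for item in local_case_tracker['data']['regional']}
print(by_location['Goa'])  # raises KeyError if Goa is not in the data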
This is in relation to web scraping, specifically Scrapy. I want to be able to iterate an expression to create my items. As an example, let's say I import the item class as "item". In order to then store an item, I would have to code something like:
item['item_name'] = response.xpath('xpath')
My response is actually a function so it actually looks something like:
item['item_name'] = eval(xpath_function(n))
This works perfectly. However, how can I iterate this to create multiple items with different names without having to manually name each one? The code below does not work at all (and I didn't expect it to), but should give you an idea of what I am trying to accomplish:
for n in range(1, 10):
    f"item['item_name{n}'] = eval(xpath_function(n))"
Basically I am trying to create 10 different items named item_name1 through item_name10. Hope that makes sense, and I appreciate any help.
If you are just creating keys for your dictionary based on the value of n you could try something like:
for n in range(10):
    item['item_name' + str(n + 1)] = eval(xpath_function(n + 1))
If you need to format the number (e.g. include leading zeros), you could use an f-string rather than concatenating the strings as I did.
[NB your for loop as written will only run from 1 to 9, so I have changed this in my answer.]
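For illustration, a hedged sketch of that f-string variant (xpath_function and item come from the question and are assumed to exist; :02d pads the number with a leading zero):

for n in range(10):
    # produces keys item_name01, item_name02, ..., item_name10
    item[f'item_name{n + 1:02d}'] = eval(xpath_function(n + 1))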
For my code below, everything is working correctly; however, I am trying to store my output in a list and can't figure out how to do so. I tried to create an empty list and append the output to it, but that did not work. Any help would be great!
sample_photo_rep["photo"]["tags"]["tag"]
for sample_tags_list in sample_photo_rep["photo"]["tags"]["tag"]:
    print [sample_tags_list['raw'].decode('utf-8')]
current output:
[u'Nature']
[u'Mist']
[u'Mountain']
correct output: [u'nature', u'mist', u'mountain']
In each loop, you're printing a list containing a single element, i.e. [u'Nature'], [u'Mountain'], etc.
If you remove the enclosing brackets, i.e. change [sample_tags_list['raw'].decode('utf-8')] to sample_tags_list['raw'].decode('utf-8'), you should get just the string.
Not sure why your append didn't work, as
output = []
for sample_tags_list in sample_photo_rep["photo"]["tags"]["tag"]:
    output.append(sample_tags_list['raw'].decode('utf-8'))
should do the trick. A list comprehension would accomplish the same thing as in the answer from #abccd; both give the same output.
Declare an empty list at the top of your code, like this:
tags = []
Then, instead of printing it out in your for loop append it to the list:
for sample_tags_list in sample_photo_rep["photo"]["tags"]["tag"]:
    tags.append(sample_tags_list['raw'].decode('utf-8'))
Then tags should be this:
[u'nature', u'mist', u'mountain']
Further Reading
Info on appending to lists: https://www.tutorialspoint.com/python/list_append.htm
Info on lists in general: https://developers.google.com/edu/python/lists
You can always use a list comprehension like this:
print [sample_tags_list['raw'].decode('utf-8') for sample_tags_list in sample_photo_rep["photo"]["tags"]["tag"]]
in place of your for loop. This is still the preferred way of doing this; see the Python docs for a simple example of using list comprehensions.
Let's say we have HTML like this (sorry, I don't know how to copy and paste page info and this is on an intranet):
And I want to get the highlighted portion for all of the questions (this is like a Stack Overflow page). EDIT: to be clearer, what I am interested in is getting a list that has:
['question-summary-39968',
'question-summary-40219',
'question-summary-42899',
'question-summary-34348',
'question-summary-32497',
'question-summary-35308',
...]
Now I know that a working solution is a list comprehension where I could do:
[item["id"] for item in html_df.find_all(class_="question-summary")]
But this is not exactly what I want. How can I directly access question-summary-41823 for the first item?
Also, what is the difference between soup.select and soup.get?
I thought I would post my answer here if it helps others.
What I am trying to do is access the id attribute within the question-summary class.
Now you can do something like this and obtain it for only the first item (object?):
html_df.find(class_="question-summary")["id"]
But you want it for all of them. So you could do this to get the class data:
html_df.select('.question-summary')
But you can't just do
html_df.select('.question-summary')["id"]
That's because what you get back is a list of bs4 elements, not a single tag. So you need to iterate over the list and select just the piece that you want. You could write a for loop, but a more elegant way is to use a list comprehension:
[item["id"] for item in html_df.find_all(class_="question-summary")]
Breaking down what this does:
It first creates a list of all the question-summary elements from the soup,
iterates over each element in the list (which we've named item),
and extracts the id attribute from each one and adds it to the resulting list.
Alternatively you can use select:
[item["id"] for item in html_df.find_all(class_="question-summary")]
I prefer the first version because it's more explicit, but either one results in:
['question-summary-43960',
'question-summary-43953',
'question-summary-43959',
'question-summary-43947',
'question-summary-43952',
'question-summary-43945',
...]
My project has required this enough times that I'm hoping someone on here can give me an elegant way to write it.
I have a list of strings, and would like to filter out duplicates using a key/key-like functionality (like I can do with sorted(foo, key=bar)).
Most recently, I'm dealing with links.
Currently I have to create an empty list, and add a link to it only if its filename isn't already represented.
Note: name is the name of the file the link links to -- just a regex match.
parsed_links = ["http://www.host.com/3y979gusval3/name_of_file_1",
"http://www.host.com/6oo8wha55crb/name_of_file_2",
"http://www.host.com/6gaundjr4cab/name_of_file_3",
"http://www.host.com/udzfiap79ld/name_of_file_6",
"http://www.host.com/2bibqho4mtox/name_of_file_5",
"http://www.host.com/4a31wozeljsp/name_of_file_4"]
links = []
[links.append(link) for link in parsed_links if not name(link) in
[name(lnk) for lnk in links]]
I want the final list to have the full links (so I can't just get rid of everything but the filenames and use set); but I'd like to be able to do this without creating an empty list every time.
Also, my current method seems inefficient (which is significant as it is often dealing with hundreds of links).
Any suggestions?
Why not just use a dictionary?
links = dict((name(link), link) for link in parsed_links)
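If you then need the full links back as a list, the dictionary's values give you exactly that (a sketch; name() is your own filename-extracting helper, and when two links share a filename the later one wins):

links = dict((name(link), link) for link in parsed_links)
deduplicated_links = list(links.values())  # one full link per filename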
If I understand your question correctly, your performance problems may come from the list comprehension that is repeatedly evaluated in a tight loop.
Try caching the result by putting the list comprehension outside of the loop, then use another comprehension instead of append() on an empty list:
linkNames = [name(lnk) for lnk in links]
links = [link for link in parsed_links if name(link) not in linkNames]
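As a further alternative (a sketch of my own, not from either answer), a set gives O(1) membership tests while preserving the full URLs and their original order:

seen = set()
links = []
for link in parsed_links:
    key = name(link)  # name() is the asker's own filename-extracting helper
    if key not in seen:
        seen.add(key)
        links.append(link)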