This may seem like the world's simplest Python question, but I'll have a go at explaining it.
Basically, I have to loop through pages of JSON results from a query.
The standard result looks like this:
{'result': [{result 1}, {result 2}], 'next_page': '2'}
I need the loop to keep going, appending the list under the result key to a variable that I can access later to count the results. However, it should loop only while next_page exists: once there are no more pages, the next_page key is dropped from the dict.
Currently I have this:
next_page = True
while next_page == True:
    try:
        next_page_result = get_results['next_page'] # this gets the next page
        next_url = urllib2.urlopen("http://search.twitter.com/search.json" + next_page_result)# this opens the next page
        json_loop = simplejson.load(next_url) # this puts the results into json
        new_result = result.append(json_loop['results']) # this grabs the result and "should" put it into the list
    except KeyError:
        next_page = False
result_count = len(new_result)
Alternate (cleaner) approach, making one big list:
results = []
res = { "next_page": "magic_token_to_get_first_page" }
while "next_page" in res:
    fp = urllib2.urlopen("http://search.twitter.com/search.json" + res["next_page"])
    res = simplejson.load(fp)
    fp.close()
    results.extend(res["results"])
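The loop above can be tried without touching the network. Here is a minimal sketch in Python 3, assuming a hypothetical fetch_page helper and a made-up FAKE_PAGES table in place of the urllib2 and simplejson calls; neither is part of any real API.

```python
# Made-up responses standing in for the HTTP + JSON decoding step.
FAKE_PAGES = {
    "?page=1": {"results": [{"id": 1}, {"id": 2}], "next_page": "?page=2"},
    "?page=2": {"results": [{"id": 3}], "next_page": "?page=3"},
    "?page=3": {"results": [{"id": 4}, {"id": 5}]},  # last page: no next_page key
}


def fetch_page(query):
    """Return the decoded response for one page (hypothetical helper)."""
    return FAKE_PAGES[query]


def collect_results(first_query):
    """Follow next_page until the key disappears, gathering every result."""
    results = []
    res = {"next_page": first_query}
    while "next_page" in res:
        res = fetch_page(res["next_page"])
        results.extend(res["results"])  # extend keeps one flat list
    return results


all_results = collect_results("?page=1")
print(len(all_results))
```

Because the loop condition is just a membership test on the dict, no try/except is needed to detect the last page.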
new_result = result.append(json_loop['results'])
The list is appended as a side-effect of the method call.
append() actually returns None, so new_result is now a reference to None.
You want to use
result.append(json_loop['results']) # this grabs the result and "should" put it into the list
new_result = result
if you insist on doing it that way. As Bastien said, result.append(whatever) == None
AFAICS, you don't need the variable new_result at all.
result_count = len(result)
will give you the answer you need.
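A quick way to see both points (append returns None, and append versus extend changes the count) is a small sketch like this one; the sample page data is made up.

```python
result = []
page = [{"id": 1}, {"id": 2}]   # made-up page of results

returned = result.append(page)  # append mutates result and returns None
nested_count = len(result)      # the whole page was added as a single item

flat = []
flat.extend(page)               # extend adds each item of the page
flat_count = len(flat)

print(returned, nested_count, flat_count)
```

If the goal is to count individual results across pages, extend is probably the method you want; append would count pages instead.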
You cannot append to a dict, but you can append to the list inside your dict, like this:
result['result'].append(json_loop['results'])
If you want to check whether the next_page value in your result dict is empty and delete the key from the dict, do this:
if not result['next_page']:
    del result['next_page']
Related
I was wondering if there is any way to save the first result of a for loop as a variable, and not the last. I wanted to get the first result of searching on youtube, following the code in https://github.com/ytdl-org/youtube-dl. The for loop is something like this,
for vid in soup.findAll(attrs={'class':'yt-uix-tile-link'}):
    vids = 'https://www.youtube.com' + vid['href']
But I want the first result, not the last.
I guess you want something like this
lst = [1,2,3,4,5]
so you want to get 1, the first element. You can do lst_saved = lst[0]
but if you want to use a loop and get the first element, you could use
for i in lst:
    lst_saved = i
    break
but that's... awful
You can do something like
test = 0
for i in range(5):
    if i == 0:
        test = i
Is this what you wanted?
Do you just want the first element, or are you trying to complete the loop then access the first one?
Either way, I'd avoid the loop and do something else.
For the former, you can ignore the rest of the list: the [0] index picks out the first item.
what_you_want = 'https://www.youtube.com' + soup.findAll(attrs={'class':'yt-uix-tile-link'})[0]['href']
Otherwise, I'd recommend using a list comprehension so that you're not overwriting the variable each time you loop. You can achieve the same thing in multiple ways, but here's how I'd go:
# first get your list of links
list_of_things = ['https://www.youtube.com' + vid['href'] for vid in soup.findAll(attrs={'class':'yt-uix-tile-link'})]
# then pick out the first item in your list
what_you_want = list_of_things[0]
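If only the first link is needed, next() with a default avoids both the loop and an IndexError when nothing matches. A small sketch, assuming a plain list of dicts in place of the soup.findAll(...) result; the sample hrefs are made up.

```python
# Made-up stand-in for soup.findAll(attrs={'class': 'yt-uix-tile-link'})
links = [{"href": "/watch?v=aaa"}, {"href": "/watch?v=bbb"}]

# Take the first item, or None when there are no matches.
first = next(iter(links), None)
first_url = "https://www.youtube.com" + first["href"] if first else None

# The same expression on an empty result does not raise.
nothing = next(iter([]), None)

print(first_url, nothing)
```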
I have a list of dicts. For each dict, I need to pass one of its properties to a function, and then assign a new property based on the result of that function.
For example:
I have a list of pages of a site. I need to loop through them and, based on the URL, assign the author name to a property in the dict.
for index, page in enumerate(pages):
    pages[index]['author'] = get_author(page['url'])
This works. But it's cluttered and doesn't feel pythonic. pages[index] feels like a thing that I shouldn't have to do in Python.
Is there a way to do this via a list comprehension? Or some other more pythonic way?
pages = [??? for page in pages]
You could use such a list comprehension:
result = [{**page, 'author': get_author(page['url'])}
          for page in pages]
# This works too:
result = [dict(**page, author=get_author(page['url']))
          for page in pages]
# but is less preferred because it will fail for input containing non-string keys
This creates a new dict for each original dict with an extra key, author, based on the value of get_author as applied to the value corresponding to the url key.
Note that it does not modify the original list.
Example:
def get_author(i):
    if i == 1:
        return 'hello'
    else:
        return 'bye'
pages = [{'url': 1},
         {'url': 2}]
result = [{**page, **{'author': get_author(page['url'])}} for page in pages]
print(result)
Output:
[{'url': 1, 'author': 'hello'}, {'url': 2, 'author': 'bye'}]
A list comprehension builds a list from an existing list. In your case, you want to update existing list items. A list comprehension is inappropriate for this job.
Your solution can be somewhat improved, though:
for page in pages:
    page['author'] = get_author(page['url'])
I can see two "pythonic" solutions:
Creating a new dict out of unpacked page and the new value of 'author':
pages = [{**p, 'author': get_author(p['url'])} for p in pages]
A little trick using 'or' operator:
pages = [p.update(author=get_author(p['url'])) or p for p in pages]
Since p.update() returns None, None or p always evaluates to p, which by then has been updated in place.
The first one is more readable. The second avoids copying each dict, so I'd expect it to be faster, but note that it also modifies the dicts in the original list.
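The difference between building new dicts and updating the existing ones is easy to check. A minimal sketch, assuming a made-up get_author that just derives a string from the URL:

```python
def get_author(url):
    """Made-up lookup used only for this illustration."""
    return "author-of-" + url


original = [{"url": "a"}, {"url": "b"}]

# Option 1: build new dicts; the originals stay untouched.
copied = [{**p, "author": get_author(p["url"])} for p in original]
untouched = "author" not in original[0]

# Option 2: a plain loop that updates each dict in place.
for p in original:
    p["author"] = get_author(p["url"])

same_content = copied == original
print(untouched, same_content)
```

Which one fits depends on whether other code still holds references to the original dicts and expects to see the new key.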
Hey guys I need a bit of guidance with this problem ( .py noobie)
So I have a list of websites that have different status codes:
url_list=["http://www.ehow.com/foo-barhow_2323550_clean-coffee-maker-vinegar.html",
"http://www.google.com",
"http://livestrong.com/register/confirmation/",
"http://www.facebook.com",
"http://www.youtube.com"]
What I'm trying to return is a dictionary that has each status code as a key and the associated websites as values. Something like this:
result= {"200": ["http://www.google.com",
                 "http://www.facebook.com",
                 "http://www.youtube.com"],
         "301": ["http://livestrong.com/register/confirmation/"],
         "404": ["http://www.ehow.com/foo-barhow_2323550_clean-coffee-maker-vinegar.html"]}
What I have till now:
Function that gets the status code:
def code_number(url):
    try:
        u = urllib2.urlopen(url)
        code = u.code
    except urllib2.HTTPError, e:
        code = e.code
    return code
And a function that should return the dictionary but is not working; this is the part where I got stuck. Basically, I don't know how to store more than one URL under the same status code.
result={}
def get_code(list_of_urls):
    for n in list_of_urls:
        code = code_number(n)
        if n in result:
            result[code] = n
        else:
            result[code] = n
    return result
Any ideas please?! Thank you
collections.defaultdict makes this a breeze:
import collections
def get_code(list_of_urls):
    result = collections.defaultdict(list)
    for n in list_of_urls:
        code = code_number(n)
        result[code].append(n)
    return result
Not sure why you had result as a global, since it's returned as the function's result anyway (avoid globals except when really indispensable... locals are not only a structurally better approach, but also faster to access).
Anyway, the collections.defaultdict instance result will automatically call the list argument, and thus make an empty list, to initialize any entry result[code] that wasn't yet there at the time of indexing; so you can just append to the entry without needing to check whether it was previously there or not. That is the super-convenient idea!
If for some reason you want a plain dict as a result (though I can't think of any sound reason for needing that), just return dict(result) to convert the defaultdict into a plain dict.
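To try the grouping without making HTTP requests, here is a minimal Python 3 sketch. It assumes a stubbed code_number that looks codes up in a made-up FAKE_CODES table instead of opening each URL; the example.com style addresses are placeholders.

```python
import collections

# Made-up status codes standing in for real HTTP requests.
FAKE_CODES = {
    "http://a.example": 200,
    "http://b.example": 404,
    "http://c.example": 200,
}


def code_number(url):
    """Assumed stub: look the status up instead of opening the URL."""
    return FAKE_CODES[url]


def get_code(list_of_urls):
    result = collections.defaultdict(list)
    for url in list_of_urls:
        result[code_number(url)].append(url)
    return dict(result)  # plain dict, so missing keys raise KeyError again


grouped = get_code(["http://a.example", "http://b.example", "http://c.example"])
print(grouped)
```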
You could initialize every key of the dict with a list, to which you will append any websites that return the same status code. Example:
result={}
def get_code(list_of_urls):
    for n in list_of_urls:
        code = code_number(n)
        if code in result:
            result[code].append(n)
        else:
            result[code] = [n]
    return result
I also think that the condition should be if code in result, since your keys are the return codes.
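The if/else above can also be folded into a single call with dict.setdefault, which returns the existing list for a key or stores and returns a new empty one. A short sketch, where group_by_code and the codes table are hypothetical names used only for illustration:

```python
def group_by_code(urls, lookup):
    """Group urls by lookup(url) using a plain dict and setdefault."""
    result = {}
    for url in urls:
        result.setdefault(lookup(url), []).append(url)
    return result


# Made-up status table; codes.get plays the role of code_number.
codes = {"u1": 200, "u2": 301, "u3": 200}
grouped = group_by_code(["u1", "u2", "u3"], codes.get)
print(grouped)
```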
In web2py I have been trying to break down this list comprehension so I can do what I like with the categories it creates. Any ideas as to what this breaks down to?
def menu_rec(items):
    return [(x.title,None,URL('shop', 'category',args=pretty_url(x.id, x.slug)),menu_rec(x.children)) for x in items or []]
In addition the following is what uses it:
response.menu = [(SPAN('Catalog', _class='highlighted'), False, '',
menu_rec(db(db.category).select().as_trees()) )]
So far I've come up with:
def menu_rec(items):
    for x in items:
        return x.title,None,URL('shop', 'category',args=pretty_url(x.id, x.slug)),menu_rec(x.children)
I've got other variations of this but, every variation only gives me back 1(one) category, when compared to the original that gives me all the categories.
Can anyone see where I'm messing this up at? Any and all help is appreciated, thank you.
A list comprehension builds a list by appending:
def menu_rec(items):
    result = []
    for x in items or []:
        url = URL('shop', 'category', args=pretty_url(x.id, x.slug))
        menu = menu_rec(x.children)  # recursive call
        result.append((x.title, None, url, menu))
    return result
I've added two local variables to break up the long line somewhat, and to show how it recursively calls itself.
Your version returned directly out of the for loop, during the first iteration, and never built up a list.
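The same pattern can be run outside web2py. Below is a minimal sketch that assumes a made-up Category class in place of the database rows and leaves out the URL and None fields, so each entry is just (title, children).

```python
class Category:
    """Made-up stand-in for a row with title and children attributes."""

    def __init__(self, title, children=None):
        self.title = title
        self.children = children


def menu_rec(items):
    result = []
    for x in items or []:  # a None children value becomes an empty loop
        result.append((x.title, menu_rec(x.children)))
    return result  # return once, after the loop has finished


tree = [
    Category("Books", [Category("Fiction"), Category("Poetry")]),
    Category("Music"),
]
menu = menu_rec(tree)
print(menu)
```

Moving the return inside the for loop would give back only the "Books" entry, which matches the symptom described in the question.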
You don't want to do return. Instead append to a list and then return the list:
def menu_rec(items):
    result = []
    for x in items or []:
        # append takes a single argument, so wrap the fields in a tuple
        result.append((x.title,None,URL('shop', 'category',args=pretty_url(x.id, x.slug)),menu_rec(x.children)))
    return result
If you return inside the loop, the function hands back a value after only the first iteration. Instead, keep adding to a list and return that list at the end, so the result is returned only once all the values have been added.
I'm using the following code:
def recentchanges(bot=False,rclimit=20):
    """
    #description: Gets the last 20 pages edited on the recent changes and who the user who edited it
    """
    recent_changes_data = {
        'action':'query',
        'list':'recentchanges',
        'rcprop':'user|title',
        'rclimit':rclimit,
        'format':'json'
    }
    if bot is False:
        recent_changes_data['rcshow'] = '!bot'
    else:
        pass
    data = urllib.urlencode(recent_changes_data)
    response = opener.open('http://runescape.wikia.com/api.php',data)
    content = json.load(response)
    pages = tuple(content['query']['recentchanges'])
    for title in pages:
        return title['title']
When I do recentchanges() I only get one result. If I print it though, all the pages are printed.
Am I just misunderstanding or is this something relating to python?
Also, opener is:
cj = CookieJar()
opener = urllib2.build_opener(urllib2.HTTPCookieProcessor(cj))
Once a return statement is reached in a function, that function's execution ends, so a return on a later iteration never runs. To return all the values, you need to pack them in a list or tuple:
    ...
    returnList = [title['title'] for title in pages]
    return returnList
This uses a list comprehension to make a list of all the objects you want the function to return and then returns it.
Then you can unpack individual results from the returned list:
answerList = recentchanges()
for element in answerList:
    print element
The problem you are having is that a function ends at the first return line it sees.
So, in the lines
for title in pages:
    return title['title']
It returns only the first value: pages[0]['title'].
One way around this is to use a list-comprehension i.e.
return [ title['title'] for title in pages ]
Another option is to make recentchanges a generator and use yield.
for title in pages:
    yield title['title']
return ends the function. So the loop only executes once, because you're returning in the loop. Think about it: how would the caller get subsequent values once the first value has been returned? Would they have to call the function again? But that would start it over again. Should Python wait until the loop is complete to return all the values at once? But where would they go and how would Python know to do this?
You might provide a generator here by yielding rather than returning it. You could also just return a generator:
return (page['title'] for page in pages)
Either way, the caller can then convert it to a list if desired, or iterate over it directly:
titles = list(recentchanges())
# or
for title in recentchanges():
    print title
Alternatively, you can just return the list of titles:
return [page['title'] for page in pages]
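The contrast between return and yield inside a loop can be seen in a few lines. A minimal Python 3 sketch, using a made-up pages list in place of the API response; first_only and all_titles are illustrative names.

```python
# Made-up data standing in for content['query']['recentchanges']
pages = [{"title": "A"}, {"title": "B"}, {"title": "C"}]


def first_only(pages):
    for page in pages:
        return page["title"]  # exits the function on the first iteration


def all_titles(pages):
    for page in pages:
        yield page["title"]  # hands back one title per iteration


single = first_only(pages)
titles = list(all_titles(pages))
print(single, titles)
```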
Since you use return, your function will end after returning the first value.
There are two alternatives:
you can append the titles to a list and return that, or
you can use yield instead of return to turn your function into a generator.
The latter is probably more pythonic, because you could then use it like this:
for title in recentchanges():
    # do something with the title
    pass