Find value matching value in a list of dicts - python

I have a list of dicts that looks like this:
serv=[{'scheme': 'urn:x-esri:specification:ServiceType:DAP',
'url': 'http://www.esrl.noaa.gov/psd/thredds/dodsC/Datasets/air.mon.anom.nobs.nc'},
{'scheme': 'urn:x-esri:specification:ServiceType:WMS',
'url': 'http://www.esrl.noaa.gov/psd/thredds/wms/Datasets/air.mon.anom.nobs.nc?service=WMS&version=1.3.0&request=GetCapabilities'},
{'scheme': 'urn:x-esri:specification:ServiceType:WCS',
'url': 'http://ferret.pmel.noaa.gov/geoide/wcs/Datasets/air.mon.anom.nobs.nc?service=WCS&version=1.0.0&request=GetCapabilities'}]
and I want to find the URL corresponding to the ServiceType:WMS which means finding the value of url key in the dictionary from this list where the scheme key has value urn:x-esri:specification:ServiceType:WMS.
So I've got this that works:
for d in serv:
    if d['scheme']=='urn:x-esri:specification:ServiceType:WMS':
        url=d['url']
print url
which produces
http://www.esrl.noaa.gov/psd/thredds/wms/Datasets/air.mon.anom.nobs.nc?service=WMS&version=1.3.0&request=GetCapabilities
but I've just watched Raymond Hettinger's PyCon talk, and at the end he says that if you can say it as a sentence, it should be expressed in one line of Python.
So is there a more beautiful, idiomatic way of achieving the same result, perhaps with one line of Python?
Thanks,
Rich

The serv array you listed looks like a dictionary mapping schemes to URLs, but it's not represented as such. You can easily convert it to a dict using list comprehensions, though, and then use normal dictionary lookups:
url = dict([(d['scheme'],d['url']) for d in serv])['urn:x-esri:specification:ServiceType:WMS']
You can, of course, save the dictionary version for future use (at the cost of using two lines):
servdict = dict([(d['scheme'],d['url']) for d in serv])
url = servdict['urn:x-esri:specification:ServiceType:WMS']

If you're only interested in one URL, then you can build a generator over serv and use next with a default value for the cases where a match isn't found, eg:
url = next((dct['url'] for dct in serv if dct['scheme'] == 'urn:x-esri:specification:ServiceType:WMS'), 'default URL / not found')
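A minimal self-contained sketch of the next-with-default approach, using a shortened copy of the serv list from the question. The helper name find_url and the None default are assumptions for illustration, not part of the original answer.

```python
# Shortened copy of the list from the question (the WCS entry is left out on purpose).
serv = [
    {'scheme': 'urn:x-esri:specification:ServiceType:DAP',
     'url': 'http://www.esrl.noaa.gov/psd/thredds/dodsC/Datasets/air.mon.anom.nobs.nc'},
    {'scheme': 'urn:x-esri:specification:ServiceType:WMS',
     'url': 'http://www.esrl.noaa.gov/psd/thredds/wms/Datasets/air.mon.anom.nobs.nc?service=WMS&version=1.3.0&request=GetCapabilities'},
]


def find_url(services, scheme, default=None):
    """Return the url of the first dict whose 'scheme' equals `scheme`.

    Hypothetical helper: the generator stops at the first match, and
    `default` is returned when nothing matches.
    """
    return next((d['url'] for d in services if d['scheme'] == scheme), default)


wms_url = find_url(serv, 'urn:x-esri:specification:ServiceType:WMS')
missing = find_url(serv, 'urn:x-esri:specification:ServiceType:WCS')
print(wms_url)
print(missing)  # None, because the shortened list has no WCS entry
```

Because the generator is lazy, the scan stops at the first matching dict instead of walking the whole list.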

I would split this into two lines, to separate the target from the URL retrieval. The target may change over time, so it should not be hardwired. The single line of code follows.
I would use in instead of == because we want to find every scheme of this type. This adds flexibility and readability, provided the substring does not also match schemes you don't want; from the description, that is the desired behaviour.
target = "ServiceType:WMS"
url = [d['url'] for d in serv if target in d['scheme']]
Also, note, this returns a list in all cases, in case there is more than one match, so you will have to loop over url in the code that uses this.

How about this?
urls = [d['url'] for d in serv if d['scheme'] == 'urn:x-esri:specification:ServiceType:WMS']
print urls # ['http://www.esrl.noaa.gov/psd/thredds/wms/Datasets/air.mon.anom.nobs.nc?service=WMS&version=1.3.0&request=GetCapabilities']
It's doing the same thing your code is doing: d['url'] is appended to the list urls whenever the scheme equals the WMS value.
You can also match on the suffix with endswith (the filter part of a list comprehension takes no else clause):
urls = [i['url'] for i in serv if i['scheme'].endswith('WMS')]

I've been trying to work in more functional programming into my own work, so here is a pretty simple functional way:
needle='urn:x-esri:specification:ServiceType:WMS'
url = filter( lambda d: d['scheme']==needle, serv )[0]['url']
filter takes as arguments a function that returns a boolean and a list to be filtered. It returns a list of elements that return True when passed to the boolean-returning function (in this case a lambda I defined on the fly). So, to finally get the url, we have to take the zeroth element of the list that filter returns. Since that is the dict containing our desired url, we can tag ['url'] on the end of the whole expression to get the corresponding dictionary entry.
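One caveat, in case this is run under Python 3: there filter returns a lazy iterator rather than a list, so indexing it with [0] raises a TypeError. A sketch of an equivalent that should work on both Python 2 and 3 wraps the call in next. The two-entry serv list with example.com URLs and the None default are illustrative assumptions.

```python
# Illustrative two-entry list; the example.com URLs are placeholders.
serv = [
    {'scheme': 'urn:x-esri:specification:ServiceType:DAP', 'url': 'http://example.com/dap'},
    {'scheme': 'urn:x-esri:specification:ServiceType:WMS', 'url': 'http://example.com/wms'},
]
needle = 'urn:x-esri:specification:ServiceType:WMS'

# iter() makes this work whether filter returns a list (Python 2)
# or an iterator (Python 3); next() takes the first match or None.
match = next(iter(filter(lambda d: d['scheme'] == needle, serv)), None)
url = match['url'] if match is not None else None
print(url)
```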

Related

Parsing JSON in Python (Reverse dictionary search)

I'm using Python and "requests" to practice the use of API. I've had success with basic requests and parsing, but having difficulty with list comprehension for a more complex project.
I requested from a server and got a dictionary. From there, I used:
participant_search = (match1_request['participantIdentities'])
To convert the values of the participantIdentities key to get the following data:
[{'player':
{'summonerName': 'Crescent Bladex',
'matchHistoryUri': '/v1/stats/player_history/NA1/226413119',
'summonerId': 63523774,
'profileIcon': 870},
'participantId': 1},
My goal here is to combine the summonerId and participantId to one list. Which is easy normally, but the order of ParticipantIdentities is randomized. So the player I want information on will sometimes be 1st on the list, and other times third.
So I can't use the var = list[0] like how I would normally do.
I have access to summonerId, so I'm thinking I can search the list the summonerId, then somehow collect all the information around it. For instance, if I knew 63523774 then I could find the key for it. From here, is it possible to find the parent list of the key?
Any guidance would be appreciated.
Edit (Clarification):
Here's the data I'm working with: http://pastebin.com/spHk8VP0
At line 1691 is where participant the nested dictionary 'participantIdentities' is. From here, there are 10 dictionaries. These 10 dictionaries include two nested dictionaries, "player" and "participantId".
My goal is to search these 10 dictionaries for the one dictionary that has the summonerId. The summonerId is something I already know before I make this request to the server.
So I'm looking for some sort of "search" method, that goes beyond "true/false". A search method that, if a value is found within an object, the entire dictionary (key:value) is given.
Not sure if I properly understood you, but would this work?
for i in range(len(match1_request['participantIdentities'])):
    if match1_request['participantIdentities'][i]['player']['summonerId'] == 63523774:
        pass  # do whatever you want with it.
i becomes the index you're searching for.
ds = match1_request['participantIdentities']
result_ = [d for d in ds if d["player"]["summonerId"] == 12345]
result = result_[0] if result_ else {}
See if it works for you.
You can use a dict comprehension to build a dict which uses summonerIds as keys:
players_list = response['participantIdentities']
{p['player']['summonerId']: p['participantId'] for p in players_list}
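A small sketch of how such a lookup dict could be used, assuming the data has the shape shown in the question. The sample list is trimmed to the fields needed here, and the name by_summoner is hypothetical. Mapping each summonerId to the whole identity dict returns the "entire dictionary" the question asks for.

```python
# Trimmed sample in the shape shown in the question; only the fields used here are kept.
players_list = [
    {'player': {'summonerName': 'Crescent Bladex', 'summonerId': 63523774},
     'participantId': 1},
    {'player': {'summonerId': 44610089}, 'participantId': 2},
]

# Map each summonerId to the whole identity dict, so one lookup returns
# both the player details and the participantId, regardless of list order.
by_summoner = {p['player']['summonerId']: p for p in players_list}

entry = by_summoner.get(63523774)  # .get returns None if the id is absent
print(entry['participantId'])
print(by_summoner.get(12345))
```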
I think what you are asking for is: "How do I get the stats for a given a summoner?"
You'll need a mapping of participantId to summonerId.
For example, would it be helpful to know this?
summoner[1] = 63523774
summoner[2] = 44610089
...
If so, then:
# This is probably what you are asking for:
summoner = {ident['participantId']: ident['player']['summonerId']
            for ident in match1_request['participantIdentities']}
# Then you can do this:
summoner_stats = {summoner[p['participantId']]: p['stats']
                  for p in match1_request['participants']}
# And to lookup a particular summoner's stats:
print summoner_stats[44610089]
(ref: raw data you pasted)

Parsing multiple occurrences of an item into a dictionary

I'm attempting to parse several separate image links from JSON data in Python, but I'm having trouble drilling down to the right level, which I believe is caused by a list of strings.
For the majority of the items, I've had success with the below example, pulling back everything I need. Outside of this instance, everything is a 1:1 ratio of keys:values, but for this one, there are multiple values associated with one key.
resultsdict['item_name'] = item['attribute_key']
I've been adding it all to a resultsdict={}, but am only able to get to the below sample string when I print.
INPUT:
for item in data['Item']:
    resultsdict['images'] = item['Variations']['Pictures']
OUTPUT (only relevant section):
'images': [{u'VariationSpecificPictureSet': [{u'PictureURL': [u'http//imagelink1'], u'VariationSpecificValue': u'color1'}, {u'PictureURL': [u'http//imagelink2'], u'VariationSpecificValue': u'color2'}, {u'PictureURL': [u'http//imagelink3'], u'VariationSpecificValue': u'color3'}, {u'PictureURL': [u'http//imagelink4'], u'VariationSpecificValue': u'color4'}]
I feel like I could add ['VariationSpecificPictureSet']['PictureURL'] at the end of my initial input, but that throws an error because the indices must be integers, not strings.
Ideally, I would like to see the output as a simple comma-separated list of just the URLs, as follows:
OUTPUT:
'images': http//imagelink1, http//imagelink2, http//imagelink3, http//imagelink4
This is an answer to your comment, since it needs a bit of code.
When using
for item in data['Item']:
    resultsdict['images'] = item['Variations']['Pictures']
you get a list with one element, so I recommend using this
for item in data['Item']:
    resultsdict['images'] = item['Variations']['Pictures'][0]
Now you can use
for image in resultsdict['images']['VariationSpecificPictureSet']:
    print(image['PictureURL'])
Thanks for the help, #uzzee, it's appreciated. I kept tinkering with it and was able to pull the continuous string of all the image URLs with the following code.
resultsdict['images'] = sum([x['PictureURL'] for x in item['variations']['Pictures'][0]['VariationSpecificPictureSet']],[])
Without the sum it looks like this and pulls in the whole list of lists...
resultsdict['images'] = [x['PictureURL'] for x in item['variations']['Pictures'][0]['VariationSpecificPictureSet']]
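As a possible alternative to sum(..., []), here is a sketch that flattens the nested lists with itertools.chain.from_iterable and then joins them into the comma-separated string the question asked for. The picture_sets sample mirrors the printed output above (trimmed to three entries); the variable names are assumptions.

```python
from itertools import chain

# Sample in the shape printed in the question (placeholder links kept as-is).
picture_sets = [
    {'PictureURL': ['http//imagelink1'], 'VariationSpecificValue': 'color1'},
    {'PictureURL': ['http//imagelink2'], 'VariationSpecificValue': 'color2'},
    {'PictureURL': ['http//imagelink3'], 'VariationSpecificValue': 'color3'},
]

# chain.from_iterable flattens the list of one-element lists in a single pass.
urls = list(chain.from_iterable(x['PictureURL'] for x in picture_sets))

# Join into one comma-separated string.
images = ', '.join(urls)
print(images)
```

sum with a list start value builds a new list at every step, so chain tends to scale better on long inputs; for a handful of pictures either form is fine.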

Getting positions from Python lists to generate a dynamic range

I have a list being built in Python, using this code:
def return_hosts():
    'return a list of host names'
    with open('./tfhosts') as hosts:
        return [host.split()[1].strip() for host in hosts]
tfhosts has the format of a hosts file, so I take the hostname portion and populate it into a template. So far this works.
What I am trying to do is make sure that even if more hosts are added, they're put into a default section. The other host sections are fixed, but this one should be dynamic. To do that I've got the following:
rendered_inventory = inventory_template.render({
'host_main': gethosts[0],
'host_master1': gethosts[1],
'host_master2': gethosts[2],
'host_spring': gethosts[3],
'host_default': gethosts[4:],
})
Everything is rendered properly except the last section, host_default: instead of getting a newline-separated list of hosts, like this (which is what I want):
[host_default]
dc01-worker-02
dc01-worker-03
It just writes out the remaining hostnames as a single list (which I don't want):
[host_default]
['dc01-worker-02', 'dc01-worker-03']
I've tried to wrap the host default section and split it, but I get a runtime error if I try:
[gethosts[4:].split(",").strip()...
I believe gethosts[4:] returns a list, if gethosts is a list (which seems to be the case) , hence it is directly writing the list to your file.
Also, you cannot do .split() on a list (I guess you hoped to do .split on the string, but gethosts[4:] returns a list). I believe an easy way out for you would be to join the strings in the list using str.join with \n as the delimiter. Example -
rendered_inventory = inventory_template.render({
'host_main': gethosts[0],
'host_master1': gethosts[1],
'host_master2': gethosts[2],
'host_spring': gethosts[3],
'host_default': '\n'.join(gethosts[4:]),
})
Demo -
>>> lst = ['dc01-worker-02', 'dc01-worker-03']
>>> print('\n'.join(lst))
dc01-worker-02
dc01-worker-03
If you own the template, a cleaner approach would be to loop through the list for host_default and print each element in the template, for example with a for loop construct in the Jinja template.

Parsing data with pattern

I'm parsing some data with the pattern as follows:
tagA:
titleA
dataA1
dataA2
dataA3
...
tagB:
titleB
dataB1
dataB2
dataB3
...
tagC:
titleC
dataC1
dataC2
...
...
These tags are stored in a list list_of_tags, if I iterate through the list, I can get all the tags; also, if iterating through the tags, I can get the title and the data associated with the title.
The tags in my data are pretty much something like <div>, so they are not useful to me; what I'm trying to do is to construct a dictionary which uses titles as keys and datas as a list of values.
The constructed dictionary would look like:
{
titleA: [dataA1, dataA2, dataA3...],
titleB: [dataB1, dataB2, dataB3...],
...
}
Notice every tag only contains one title and some datas, and title always comes before data.
So here is my working code:
Method 1:
result = {}
for tag in list_of_tags:
    list_of_values = []
    for idx, elem in enumerate(tag):
        if not idx:
            key = elem
        else:
            construct_list_of_values()
    update_the_dictionary()
Actually, method 1 works fine and gives me my desired result; however, if I put that piece of code in PyCharm, it warns me that "Local variable 'key' might be referenced before assignment" at the last line. Hence, I tried another approach:
Method 2:
result = {tag[0]: tag[1:] for tag in list_of_tags}
Method 2 works fine if tags are lists, but I also want the code to work normally if tags are generators ('generator' object is not subscriptable will occur with method 2)
In order to work with generators, I come up with:
Method 3:
key_val_list = [(next(tag), list(tag)) for tag in list_of_tags]
result = dict(key_val_list)
Method 3 also works; but I cannot write this in dictionary comprehension ({next(tag): list(tag) for tag in list_of_tags} would give StopIteration exception because list(tag) will be evaluated first)
So, my question is: is there an elegant way of dealing with this pattern that works whether tags are lists or generators? (Method 1 seems to work for both, but I don't know if I should ignore the warning PyCharm gives; the other two methods look more concise, but one only works on lists while the other only works on generators.)
Sorry for the long question, thanks for the patience!
I guess the reason why PyCharm is giving you a warning is that you are using key in update_the_dictionary, but key could be left unassigned if tag does not contain at least one element. You might have the knowledge that the title will always be in the list, but the static analyzer is not able to infer that from the context.
If you are using Python 3, you might want to try using PEP 3132 - Extended Iterable Unpacking. It should work for both lists and generators.
e.g.
title, *data = tag
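A short sketch of how the unpacking could replace all three methods, assuming Python 3. The function name build_dict and the sample tags are hypothetical; the starred target accepts any iterable, so lists and generators go through the same code path.

```python
def build_dict(list_of_tags):
    """Map each tag's first element (title) to a list of the rest (data)."""
    result = {}
    for tag in list_of_tags:
        # Works for any iterable; raises ValueError if tag is empty.
        title, *data = tag
        result[title] = data
    return result


as_lists = [['titleA', 'dataA1', 'dataA2'], ['titleB', 'dataB1']]
as_generators = [(x for x in tag) for tag in as_lists]

print(build_dict(as_lists))
print(build_dict(as_generators))
```

Since title is bound by the unpacking itself, there is no branch in which it could be left unassigned, which should also avoid the static-analysis warning.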

Grouping list of similar urls in python

I have a large set of URLs. Some are similar to each other, i.e. they represent a similar set of pages.
For example,
http://example.com/product/1/
http://example.com/product/2/
http://example.com/product/40/
http://example.com/product/33/
are similar. Similarly
http://example.com/showitem/apple/
http://example.com/showitem/banana/
http://example.com/showitem/grapes/
are also similar. So I need to represent them as http://example.com/product/(Integers)/
where (Integers) = 1,2,40,33 and http://example.com/showitem/(strings)/ where (strings) = apple,banana,grapes, and so on.
Is there any built-in function or library in Python to find these similar URLs in a large set of mixed URLs? How can this be done efficiently? Please suggest. Thanks in advance.
Use a string to store the first part of the URL and just handle IDs, example:
In [1]: PRODUCT_URL='http://example.com/product/%(id)s/'
In [2]: _ids = '1 2 40 33'.split() # split string into list of IDs
In [3]: for id in _ids:
   ...:     print PRODUCT_URL % {'id':id}
   ...:
http://example.com/product/1/
http://example.com/product/2/
http://example.com/product/40/
http://example.com/product/33/
The statement print PRODUCT_URL % {'id':id} uses Python string formatting to format the product URL depending on the variable id passed.
UPDATE:
I see you've changed your question. The solution for your problem is quite domain-specific and depends on your data set. There are several approaches, some more manual than others. One such approach would be to get the top-level URLs i.e. to retrieve the domain name:
In [7]: _url = 'http://example.com/product/33/' # url we're testing with
In [8]: ('/').join(_url.split('/')[:3]) # get domain
Out[8]: 'http://example.com'
In [9]: ('/').join(_url.split('/')[:4]) # get domain + first URL sub-part
Out[9]: 'http://example.com/product'
[:3] and [:4] above are just slicing the list resulting from split('/')
You can set the result as a key on a dict for which you keep a count of each time you encounter the URL part. And move on from there. Again the solution depends on your data. If it gets more complex than above then I suggest you look into regex as the other answers suggest.
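The dict-keyed-by-prefix idea can be sketched as follows, written for Python 3. It assumes every URL has the shape scheme://domain/segment/value/ as in the question; the names groups and prefix are illustrative, and messier data would need extra checks or a regex.

```python
from collections import defaultdict

urls = [
    'http://example.com/product/1/',
    'http://example.com/product/2/',
    'http://example.com/product/40/',
    'http://example.com/product/33/',
    'http://example.com/showitem/apple/',
    'http://example.com/showitem/banana/',
    'http://example.com/showitem/grapes/',
]

# Group the varying segment of each URL under its shared prefix.
groups = defaultdict(list)
for url in urls:
    parts = url.split('/')
    prefix = '/'.join(parts[:4])     # domain + first path segment
    groups[prefix].append(parts[4])  # the segment that varies

for prefix, values in sorted(groups.items()):
    print(prefix, values)
```

Using len(values) for each prefix gives the per-prefix count mentioned above.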
You can use regular expressions to handle such cases; see the Python documentation for how they work.
You can also look at how Django implements this in its routing system.
I'm not exactly sure what specifically you are looking for. It sounds to me that you are looking for something to match URLs. If this is indeed what you want then I suggest you use something that is built using regular expressions. One example can be found here.
I also suggest you take a look at Django and its routing system.
Not in Python, but I've created a Ruby Library (and an accompanying app) --
https://rubygems.org/gems/LinkGrouper
It works on all links (doesn't need to know any pattern).
