Reach a string behind unknown value in JSON - python

I use Wikipedia's API to get information about a page.
The API gives me JSON like this:
"query":{
"pages":{
"188791":{
"pageid":188791,
"ns":0,
"title":"Vanit\u00e9",
"langlinks":[
{
"lang":"bg",
"*":"Vanitas"
},
{
"lang":"ca",
"*":"Vanitas"
},
ETC.
}
}
}
}
You can see the full JSON response.
I want to obtain all entries like:
{
"lang":"ca",
"*":"Vanitas"
}
but the number key ("188791") in the pages object is the problem.
I found Find a value within nested json dictionary in python that explains me how to do enumerate the values.
Unfortunately I get the following exception:
TypeError: 'dict_values' object does not support indexing
My code is:
json["query"]["pages"].values()[0]["langlinks"]
It's probably a dumb question but I can't find a way to pass in the page id value.

One solution is to use the indexpageids parameter, e.g.: http://fr.wikipedia.org/w/api.php?action=query&titles=Vanit%C3%A9&prop=langlinks&lllimit=500&format=jsonfm&indexpageids. It will add an array of pageids to the response. You can then use that to access the dictionary.

As long as you're only querying one page at a time, Simeon Visser's answer will work. However, as a matter of good style, I'd recommend structuring your code so that you iterate over all the returned results, even if you know there should be only one:
for page in data["query"]["pages"].values():
title = page["title"]
langlinks = page["langlinks"]
# do something with langlinks...
In particular, by writing your code this way, if you ever find yourself needing to run the query for multiple pages, you can do it efficiently with a single MediaWiki API request.

You're using Python 3 and values() now returns a dict_values instead of a list. This is a view on the values of the dictionary.
Hence that's why you're getting that error because indexing fails. Indexing is possible in a list but not a view.
To fix it:
list(json["query"]["pages"].values())[0]["langlinks"]

If you really want just one page arbitrarily, do that the way Simeon Visser suggested.
But I suspect you want all langlinks in all pages, yes?
For that, you want a comprehension:
[page["langlinks"] for page in json["query"]["pages"].values()]
But of course that gives you a 2D list. If you want to iterate over each page's links, that's perfect. If you want to iterate over all of the langlinks at once, you want to flatten the list:
[langlink for page in json["query"]["pages"]
for langlink in page["langlinks"].values()]
… or…
itertools.chain.from_iterable(page["langlinks"]
for page in json["query"]["pages"].values())
(The latter gives you an iterator; if you need a list, wrap the whole thing in list. Conversely, for the first two, if you don't need a list, just any iterable, use parens instead of square brackets to get a generator expression.)

Related

Python- Insert new values into 'nested' list?

What I'm trying to do isn't a huge problem in php, but I can't find much assistance for Python.
In simple terms, from a list which produces output as follows:
{"marketId":"1.130856098","totalAvailable":null,"isMarketDataDelayed":null,"lastMatchTime":null,"betDelay":0,"version":2576584033,"complete":true,"runnersVoidable":false,"totalMatched":null,"status":"OPEN","bspReconciled":false,"crossMatching":false,"inplay":false,"numberOfWinners":1,"numberOfRunners":10,"numberOfActiveRunners":8,"runners":[{"status":"ACTIVE","ex":{"tradedVolume":[],"availableToBack":[{"price":2.8,"size":34.16},{"price":2.76,"size":200},{"price":2.5,"size":237.85}],"availableToLay":[{"price":2.94,"size":6.03},{"price":2.96,"size":10.82},{"price":3,"size":33.45}]},"sp":{"nearPrice":null,"farPrice":null,"backStakeTaken":[],"layLiabilityTaken":[],"actualSP":null},"adjustmentFactor":null,"removalDate":null,"lastPriceTraded":null,"handicap":0,"totalMatched":null,"selectionId":12832765}...
All I want to do is add in an extra field, containing the 'runner name' in the data set below, into each of the 'runners' sub lists from the initial data set, based on selection_id=selectionId.
So initially I iterate through the full dataset, and then create a separate list to get the runner name from the runner id (I should point out that runnerId===selectionId===selection_id, no idea why there are multiple names are used), this works fine and the code is shown below:
for market_book in market_books:
market_catalogues = trading.betting.list_market_catalogue(
market_projection=["RUNNER_DESCRIPTION", "RUNNER_METADATA", "COMPETITION", "EVENT", "EVENT_TYPE", "MARKET_DESCRIPTION", "MARKET_START_TIME"],
filter=betfairlightweight.filters.market_filter(
market_ids=[market_book.market_id],
),
max_results=100)
data = []
for market_catalogue in market_catalogues:
for runner in market_catalogue.runners:
data.append(
(runner.selection_id, runner.runner_name)
)
So as you can see I have the data in data[], but what I need to do is add it to the initial data set, based on the selection_id.
I'm more comfortable with Php or Javascript, so apologies if this seems a bit simplistic, but the code snippets I've found on-line only seem to assist with very simple Python lists and nothing 'nested' (to me the structure seems similar to a nested array).
As per the request below, here is the full list:
{"marketId":"1.130856098","totalAvailable":null,"isMarketDataDelayed":null,"lastMatchTime":null,"betDelay":0,"version":2576584033,"complete":true,"runnersVoidable":false,"totalMatched":null,"status":"OPEN","bspReconciled":false,"crossMatching":false,"inplay":false,"numberOfWinners":1,"numberOfRunners":10,"numberOfActiveRunners":8,"runners":[{"status":"ACTIVE","ex":{"tradedVolume":[],"availableToBack":[{"price":2.8,"size":34.16},{"price":2.76,"size":200},{"price":2.5,"size":237.85}],"availableToLay":[{"price":2.94,"size":6.03},{"price":2.96,"size":10.82},{"price":3,"size":33.45}]},"sp":{"nearPrice":null,"farPrice":null,"backStakeTaken":[],"layLiabilityTaken":[],"actualSP":null},"adjustmentFactor":null,"removalDate":null,"lastPriceTraded":null,"handicap":0,"totalMatched":null,"selectionId":12832765},{"status":"ACTIVE","ex":{"tradedVolume":[],"availableToBack":[{"price":20,"size":3},{"price":19.5,"size":26.36},{"price":19,"size":2}],"availableToLay":[{"price":21,"size":13},{"price":22,"size":2},{"price":23,"size":2}]},"sp":{"nearPrice":null,"farPrice":null,"backStakeTaken":[],"layLiabilityTaken":[],"actualSP":null},"adjustmentFactor":null,"removalDate":null,"lastPriceTraded":null,"handicap":0,"totalMatched":null,"selectionId":12832767},{"status":"ACTIVE","ex":{"tradedVolume":[],"availableToBack":[{"price":11,"size":9.75},{"price":10.5,"size":3},{"price":10,"size":28.18}],"availableToLay":[{"price":11.5,"size":12},{"price":13.5,"size":2},{"price":14,"size":7.75}]},"sp":{"nearPrice":null,"farPrice":null,"backStakeTaken":[],"layLiabilityTaken":[],"actualSP":null},"adjustmentFactor":null,"removalDate":null,"lastPriceTraded":null,"handicap":0,"totalMatched":null,"selectionId":12832766},{"status":"ACTIVE","ex":{"tradedVolume":[],"availableToBack":[{"price":48,"size":2},{"price":46,"size":5},{"price":42,"size":5}],"availableToLay":[{"price":60,"size":7},{"price":70,"size":5},{"price":75,"size":10}]},"sp":{"nearPrice":null,"farPrice":null,"backStakeTaken":[],"layLiabilityTaken":[],"actualSP":null},"adjustmentFactor":null,"removalDate":null,"lastPriceTraded":null,"handicap":0,"totalMatched":null,"selectionId":12832769},{"status":"ACTIVE","ex":{"tradedVolume":[],"availableToBack":[{"price":18.5,"size":28.94},{"price":18,"size":5},{"price":17.5,"size":3}],"availableToLay":[{"price":21,"size":20},{"price":23,"size":2},{"price":24,"size":2}]},"sp":{"nearPrice":null,"farPrice":null,"backStakeTaken":[],"layLiabilityTaken":[],"actualSP":null},"adjustmentFactor":null,"removalDate":null,"lastPriceTraded":null,"handicap":0,"totalMatched":null,"selectionId":12832768},{"status":"ACTIVE","ex":{"tradedVolume":[],"availableToBack":[{"price":4.3,"size":9},{"price":4.2,"size":257.98},{"price":4.1,"size":51.1}],"availableToLay":[{"price":4.4,"size":20.97},{"price":4.5,"size":30},{"price":4.6,"size":16}]},"sp":{"nearPrice":null,"farPrice":null,"backStakeTaken":[],"layLiabilityTaken":[],"actualSP":null},"adjustmentFactor":null,"removalDate":null,"lastPriceTraded":null,"handicap":0,"totalMatched":null,"selectionId":12832771},{"status":"ACTIVE","ex":{"tradedVolume":[],"availableToBack":[{"price":24,"size":6.75},{"price":23,"size":2},{"price":22,"size":2}],"availableToLay":[{"price":26,"size":2},{"price":27,"size":2},{"price":28,"size":2}]},"sp":{"nearPrice":null,"farPrice":null,"backStakeTaken":[],"layLiabilityTaken":[],"actualSP":null},"adjustmentFactor":null,"removalDate":null,"lastPriceTraded":null,"handicap":0,"totalMatched":null,"selectionId":12832770},{"status":"ACTIVE","ex":{"tradedVolume":[],"availableToBack":[{"price":5.7,"size":149.33},{"price":5.5,"size":29.41},{"price":5.4,"size":5}],"availableToLay":[{"price":6,"size":85},{"price":6.6,"size":5},{"price":6.8,"size":5}]},"sp":{"nearPrice":null,"farPrice":null,"backStakeTaken":[],"layLiabilityTaken":[],"actualSP":null},"adjustmentFactor":null,"removalDate":null,"lastPriceTraded":null,"handicap":0,"totalMatched":null,"selectionId":10064909}],"publishTime":1551612312125,"priceLadderDefinition":{"type":"CLASSIC"},"keyLineDescription":null,"marketDefinition":{"bspMarket":false,"turnInPlayEnabled":false,"persistenceEnabled":false,"marketBaseRate":5,"eventId":"28180290","eventTypeId":"2378961","numberOfWinners":1,"bettingType":"ODDS","marketType":"NONSPORT","marketTime":"2019-03-29T00:00:00.000Z","suspendTime":"2019-03-29T00:00:00.000Z","bspReconciled":false,"complete":true,"inPlay":false,"crossMatching":false,"runnersVoidable":false,"numberOfActiveRunners":8,"betDelay":0,"status":"OPEN","runners":[{"status":"ACTIVE","sortPriority":1,"id":10064909},{"status":"ACTIVE","sortPriority":2,"id":12832765},{"status":"ACTIVE","sortPriority":3,"id":12832766},{"status":"ACTIVE","sortPriority":4,"id":12832767},{"status":"ACTIVE","sortPriority":5,"id":12832768},{"status":"ACTIVE","sortPriority":6,"id":12832770},{"status":"ACTIVE","sortPriority":7,"id":12832769},{"status":"ACTIVE","sortPriority":8,"id":12832771},{"status":"LOSER","sortPriority":9,"id":10317013},{"status":"LOSER","sortPriority":10,"id":10317010}],"regulators":["MR_INT"],"countryCode":"GB","discountAllowed":true,"timezone":"Europe\/London","openDate":"2019-03-29T00:00:00.000Z","version":2576584033,"priceLadderDefinition":{"type":"CLASSIC"}}}
i think i understand what you are trying to do now
first hold your data as a python object (you gave us a json object)
import json
my_data = json.loads(my_json_string)
for item in my_data['runners']:
item['selectionId'] = [item['selectionId'], my_name_here]
the thing is that my_data['runners'][i]['selectionId'] is a string, unless you want to concat the name and the id together, you should turn it into a list or even a dictionary
each item is a dicitonary so you can always also a new keys to it
item['new_key'] = my_value
So, essentially this works...with one exception...I can see from the print(...) in the loop that the attribute is updated, however what I can't seem to do is then see this update outside the loop.
mkt_runners = []
for market_catalogue in market_catalogues:
for r in market_catalogue.runners:
mkt_runners.append((r.selection_id, r.runner_name))
for market_book in market_books:
for runner in market_book.runners:
for x in mkt_runners:
if runner.selection_id in x:
setattr(runner, 'x', x[1])
print(market_book.market_id, runner.x, runner.selection_id)
print(market_book.json())
So the print(market_book.market_id.... displays as expected, but when I print the whole list it shows the un-updated version. I can't seem to find an obvious solution, which is odd, as it seems like a really simple thing (I tried messing around with indents, in case that was the problem, but it doesn't seem to be, its like its not refreshing the market_book list post update of the runners sub list)!

Parsing JSON in Python (Reverse dictionary search)

I'm using Python and "requests" to practice the use of API. I've had success with basic requests and parsing, but having difficulty with list comprehension for a more complex project.
I requested from a server and got a dictionary. From there, I used:
participant_search = (match1_request['participantIdentities'])
To convert the values of the participantIdentities key to get the following data:
[{'player':
{'summonerName': 'Crescent Bladex',
'matchHistoryUri': '/v1/stats/player_history/NA1/226413119',
'summonerId': 63523774,
'profileIcon': 870},
'participantId': 1},
My goal here is to combine the summonerId and participantId to one list. Which is easy normally, but the order of ParticipantIdentities is randomized. So the player I want information on will sometimes be 1st on the list, and other times third.
So I can't use the var = list[0] like how I would normally do.
I have access to summonerId, so I'm thinking I can search the list the summonerId, then somehow collect all the information around it. For instance, if I knew 63523774 then I could find the key for it. From here, is it possible to find the parent list of the key?
Any guidance would be appreciated.
Edit (Clarification):
Here's the data I'm working with: http://pastebin.com/spHk8VP0
At line 1691 is where participant the nested dictionary 'participantIdentities' is. From here, there are 10 dictionaries. These 10 dictionaries include two nested dictionaries, "player" and "participantId".
My goal is to search these 10 dictionaries for the one dictionary that has the summonerId. The summonerId is something I already know before I make this request to the server.
So I'm looking for some sort of "search" method, that goes beyond "true/false". A search method that, if a value is found within an object, the entire dictionary (key:value) is given.
Not sure if I properly understood you, but would this work?
for i in range(len(match1_request['participantIdentities'])):
if(match1_request['participantIdentities'][i]['summonerid'] == '63523774':
# do whatever you want with it.
i becomes the index you're searching for.
ds = match1_request['participantIdentities']
result_ = [d for d in ds if d["player"]["summonerId"] == 12345]
result = result_[0] if result_ else {}
See if it works for you.
You can use a dict comprehension to build a dict wich uses summonerIds as keys:
players_list = response['participantIdentities']
{p['player']['summonerId']: p['participantId'] for p in players_list}
I think what you are asking for is: "How do I get the stats for a given a summoner?"
You'll need a mapping of participantId to summonerId.
For example, would it be helpful to know this?
summoner[1] = 63523774
summoner[2] = 44610089
...
If so, then:
# This is probably what you are asking for:
summoner = {ident['participantId']: ident['player']['summonerId']
for ident in match1_request['participantIdentities']}
# Then you can do this:
summoner_stats = {summoner[p['participantId']]: p['stats']
for p in match1_request['participants']}
# And to lookup a particular summoner's stats:
print summoner_stats[44610089]
(ref: raw data you pasted)

Parsing multiple occurrences of an item into a dictionary

Attempting to parse several separate image links from JSON data through python, but having some issues drilling down to the right level, due to what I believe is from having a list of strings.
For the majority of the items, I've had success with the below example, pulling back everything I need. Outside of this instance, everything is a 1:1 ratio of keys:values, but for this one, there are multiple values associated with one key.
resultsdict['item_name'] = item['attribute_key']
I've been adding it all to a resultsdict={}, but am only able to get to the below sample string when I print.
INPUT:
for item in data['Item']:
resultsdict['images'] = item['Variations']['Pictures']
OUTPUT (only relevant section):
'images': [{u'VariationSpecificPictureSet': [{u'PictureURL': [u'http//imagelink1'], u'VariationSpecificValue': u'color1'}, {u'PictureURL': [u'http//imagelink2'], u'VariationSpecificValue': u'color2'}, {u'PictureURL': [u'http//imagelink3'], u'VariationSpecificValue': u'color3'}, {u'PictureURL': [u'http//imagelink4'], u'VariationSpecificValue': u'color4'}]
I feel like I could add ['VariationPictureSet']['PictureURL'] at the end of my initial input, but that throws an error due to the indices not being integers, but strings.
Ideally, I would like to see the output as a simple comma-separated list of just the URLs, as follows:
OUTPUT:
'images': http//imagelink1, http//imagelink2, http//imagelink3, http//imagelink4
An answer to your comment that required a bit of code to it.
When using
for item in data['Item']:
resultsdict['images'] = item['Variations']['Pictures']
you get a list with one element, so I recommend using this
for item in data['Item']:
resultsdict['images'] = item['Variations']['Pictures'][0]
now you can use
for image in resultsdict['images']['VariationsSpecificPictureSet']:
print(image['PictureUR‌​L'])
Thanks for the help, #uzzee, it's appreciated. I kept tinkering with it and was able to pull the continuous string of all the image URLs with the following code.
resultsdict['images'] = sum([x['PictureURL'] for x in item['variations']['Pictures'][0]['VariationSpecificPictureSet']],[])
Without the sum it looks like this and pulls in the whole list of lists...
resultsdict['images'] = [x['PictureURL'] for x in item['variations']['Pictures'][0]['VariationSpecificPictureSet']]

Parsing data with pattern

I'm parsing some data with the pattern as follows:
tagA:
titleA
dataA1
dataA2
dataA3
...
tagB:
titleB
dataB1
dataB2
dataB3
...
tagC:
titleC
dataC1
dataC2
...
...
These tags are stored in a list list_of_tags, if I iterate through the list, I can get all the tags; also, if iterating through the tags, I can get the title and the data associated with the title.
The tags in my data are pretty much something like <div>, so they are not useful to me; what I'm trying to do is to construct a dictionary which uses titles as keys and datas as a list of values.
The constructed dictionary would look like:
{
titleA: [dataA1, dataA2, dataA3...],
titleB: [dataB1, dataB2, dataB3...],
...
}
Notice every tag only contains one title and some datas, and title always comes before data.
So here are my working codes:
Method 1:
result = {}
for tag in list_of_tags:
list_of_values = []
for idx, elem in enumerate(tag):
if not idx:
key = elem
else:
construct_list_of_values()
update_the_dictionary()
Actually, method 1 works fine and gives me my desired result; however, if I put that piece of codes in PyCharm, it warns me that "Local variable 'key' might be referenced before assignment" at the last line. Hence, I try another approach:
Method 2:
result = {tag[0]: tag[1:] for tag in list_of_tags}
Method 2 works fine if tags are lists, but I also want the code to work normally if tags are generators ('generator' object is not subscriptable will occur with method 2)
In order to work with generators, I come up with:
Method 3:
key_val_list = [(next(tag), list(tag)) for tag in list_of_tags]
result = dict(key_val_list)
Method 3 also works; but I cannot write this in dictionary comprehension ({next(tag): list(tag) for tag in list_of_tags} would give StopIteration exception because list(tag) will be evaluated first)
So, my question is, is there an elegant way for dealing with this pattern which could work no matter tags are lists or generators? (method 1 seems to work for both, but I don't know if I should ignore the warning PyCharms gives; the other two methods looks more concise, but one can only work on lists while the other can only work on generators)
Sorry for the long question, thanks for the patience!
I guess the reason why PyCharm is giving you a warning is that you are using key in update_the_dictionary, but key could be left unassigned if tag does not contain at least one element. You might have the knowledge that the title will always be in the list, but the static analyzer is not able to infer that from the context.
If you are using Python 3, you might want to try using PEP 3132 - Extended Iterable Unpacking. It should work for both lists and generators.
e.g.
title, *data = tag

Mongodb: How to change an element of a nested arrary?

From what I have read it is impossible to update an element in an nested array using the positional operator $ in mongo. The $ only works one level deep. I see it is a requested feature in mongo 2.7.
Updating the whole document one level up is not an option because of write conflicts. I need to just be able to change the 'username' for a particular reward program for instance.
One of the ideas would to be pull, modify, and push the entire 'reward_programs' element but then I would loose the order. Order is important.
Consider this document:
{
"_id:"0,
"firstname":"Tom",
"profiles" : [
{
"profile_name": "tom",
"reward_programs:[
{
'program_name':'American',
'username':'tomdoe',
},
{
'program_name':'Delta',
'username':'tomdoe',
}
]
}
]
}
How would you go about specifically changing the 'username' of 'program_name'=Delta?
After doing more reading it looks like this is unsupported in mongodb at the moment. Positional updates are only supported for one level deep. The feature might be added for mongodb 2.7.
The are a couple of work arounds.
1) Flatten out your database structure. In this case, make 'reward_programs' it's own collection and do your operation on that.
2) Instead of arrays of dicts, use dicts of dicts. That way you can just have an absolute path down to the object you need to modify. This can have drawbacks to query flexibility.
3) Seems hacky to me but you can also walk the list on the nested array find it's position index in the array and do something like this:
users.update({'_id': request._id, 'profiles.profile_name': profile_name}, {'$set': {'profiles.$.reward_programs.{}.username'.format(index): new_username}})
4) Read in the whole document, modify, write back. However, this has possible write conflicts
Setting up your database structure initially is extremely important. It really depends on how you are going to use it.
A simple way to do this:
doc = collection.find_one({'_id': 0})
doc['profiles'][0]["reward_programs"][1]['username'] = 'new user name'
#replace the whole collection
collection.save(doc)

Categories