Passing multiple items from a list into a function - python

I am relatively new to coding in python and currently trying to build my first web scraper.
I have created a function that puts all the links I want to get data from into a list.
['url1.com', 'url2.com', 'url3.com'...]
I now want to pass every single item in the list (url) to the function that actually gets the information.
I have tried to solve this with *args but I get the error that the urlopen() function only takes 3 arguments. So I have to find a way to pass each item/url individually into the function.
I know this is probably quite an easy thing to do, but I have been stuck on it for a couple of days now and haven't been able to figure it out yet.
I would be very grateful if someone could point me in the right direction.
Thank you!

A list is iterable, so you can access all of its contents with a for loop and process them one at a time.
my_list = ['url1.com', 'url2.com', 'url3.com']
for item in my_list:
    foo(item)

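You need to pass each URL to the urlopen() function individually, not all the URLs at once.
from urllib.request import urlopen  # Python 3 location; adjust for your setup
my_urls = ['url1.com', 'url2.com', 'url3.com']
def read_urls(url):
    response = urlopen(url)
    # .... do some operation on response
for url in my_urls:
    read_urls(url)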
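To see why unpacking with *args fails while a loop works, here is a minimal sketch. The names fetch_title and my_urls are hypothetical, and the function only builds a string instead of calling urlopen(), so it runs without network access.

```python
# Hypothetical stand-in for the scraping function; a real version would
# open the URL and parse the response instead of building a string.
def fetch_title(url):
    return "fetched " + url

my_urls = ["url1.com", "url2.com", "url3.com"]

# fetch_title(*my_urls) would unpack the list into three positional
# arguments and raise TypeError, because the function accepts only one.

# Calling the function once per item works:
results = []
for url in my_urls:
    results.append(fetch_title(url))

# The same loop written as a list comprehension:
results_short = [fetch_title(url) for url in my_urls]
```

Either form calls the function with exactly one URL at a time, which is what a single-URL function expects.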

Related

How to print specific json data for multiple instances?

import requests
url = "https://api.gametools.network/bf1/players/?gameId=7218126050214"
r = requests.get(url)
data = r.json()
print(data['teams'][1]['players'][0]['name'])
print(data['teams'][1]['players'][1]['name'])
print(data['teams'][1]['players'][2]['name'])
print(data['teams'][1]['players'][3]['name'])
print(data['teams'][1]['players'][4]['name'])
print(data['teams'][1]['players'][5]['name'])
print(data['teams'][1]['players'][6]['name'])
print(data['teams'][1]['players'][7]['name'])
print(data['teams'][1]['players'][8]['name'])
print(data['teams'][1]['players'][9]['name'])
print(data['teams'][1]['players'][10]['name'])
print(data['teams'][1]['players'][11]['name'])
print(data['teams'][1]['players'][12]['name'])
print(data['teams'][1]['players'][13]['name'])
print(data['teams'][1]['players'][14]['name'])
print(data['teams'][1]['players'][15]['name'])
print(data['teams'][1]['players'][16]['name'])
print(data['teams'][1]['players'][17]['name'])
print(data['teams'][1]['players'][18]['name'])
print(data['teams'][1]['players'][19]['name'])
print(data['teams'][1]['players'][20]['name'])
This is a snippet of my code that I think could be shortened considerably. The issue is that I can't find a way to list all the players' names other than printing them one by one as shown above. My goal is a list of every username on both teams of a given server.
I should mention that the code currently works, but I feel there is a much simpler way to get the data I need. I did search for such methods, but because my data is nested in lists and dictionaries I get confused.
Any suggestion is welcome, thank you :)
Here is how to iterate over a list:
l = [1, 2, 3, 4]
for element in l:
    print(element)
Once you know that, it's only a matter of breaking nested iteration problems down to their simplest form above. So you just need to get the list which you need to iterate over, and then use the above sample you already know:
l = data['teams'][1]['players']
for element in l:
    print(element['name'])
Let's write a simple function that accepts the data for one team and uses a for loop to iterate over its players.
def printTeamNames(team):
    print(team["teamid"])
    for player in team["players"]:
        print(player["name"])
Then we can pass that function the data for each team.
printTeamNames(data["teams"][0])
printTeamNames(data["teams"][1])
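Since the goal in the question was a single list of every username on both teams, a nested comprehension may be the shortest route. This is only a sketch: the sample data below is made up and assumes the response has the same teams/players/name layout as in the question.

```python
# Made-up sample with the same shape as the response in the question.
data = {
    "teams": [
        {"players": [{"name": "alice"}, {"name": "bob"}]},
        {"players": [{"name": "carol"}]},
    ]
}

# Outer loop over teams, inner loop over each team's players.
all_names = [
    player["name"]
    for team in data["teams"]
    for player in team["players"]
]

print(all_names)
```

This works for any number of teams and players, so no index has to be hard-coded.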

TypeError: string indices must be integers when making rest api request

When I try to parse data from a REST API, it raises a TypeError.
This is my code:
def get_contracts():
    response_object = requests.get(
        "https://testnet-api.phemex.com/md/orderbook?symbol=BTCUSD"
    )
    print(response_object.status_code)
    for contract in response_object.json()["result"]["book"]:
        print(contract["asks"])
get_contracts()
Any tip or solution will be very welcomed. Thanks in advance.
Edit/Update:
For some reason I am not able to select a specific key in the format above; it's only possible if I do it like this:
data = response_object.json()['result']['book']['asks']
print(data)
I will try to work my code around that. Thanks to everyone who helped.
This code review may help you:
import requests
url = "https://testnet-api.phemex.com/md/orderbook?symbol=BTCUSD"
response_object = requests.get(url)
data = response_object.json()
# Printing your data helps to inspect the structure
# print(data)
# This is the list you are looking for:
asks = data['result']['book']['asks']
for ask in asks:
    print(ask)
You need to iterate through asks, not book.
You have a nested dictionary where asks is a nested list.
If you simply open the URL you are requesting, or print out response_object.json(), you will see the structure.
for foo in response_object.json()['result']['book']['asks']:
    print(foo)
Although generally it's better to assign the parsed JSON to a variable first.
data = response_object.json()
for foo in data['result']['book']['asks']:
    print(foo)
It looks like you are indexing into something that is not what you expect, hence the TypeError: iterating over the book dictionary yields its keys, which are strings.
I would debug with a simple print of the JSON object you get back and make sure the keys you are trying to access are where you expect them.
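The TypeError in the question can be reproduced without calling the API. A small sketch, using made-up numbers in a dictionary shaped like the order book response (result, then book, then asks):

```python
# Made-up sample shaped like the order book response in the question.
data = {"result": {"book": {"asks": [[100, 5], [101, 7]], "bids": [[99, 3]]}}}

book = data["result"]["book"]

# Iterating over a dict yields its keys, which are strings here.
keys = [key for key in book]

# Indexing one of those strings with another string reproduces the error.
try:
    keys[0]["asks"]
    error_message = None
except TypeError as exc:
    error_message = str(exc)

# Indexing the dict itself returns the nested list.
asks = book["asks"]

print(keys, error_message, asks)
```

So the loop variable in the original code was a key such as "asks", not a dictionary, and contract["asks"] was indexing a string.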

Python- Insert new values into 'nested' list?

What I'm trying to do isn't a huge problem in PHP, but I can't find much assistance for Python.
In simple terms, from a list which produces output as follows:
{"marketId":"1.130856098","totalAvailable":null,"isMarketDataDelayed":null,"lastMatchTime":null,"betDelay":0,"version":2576584033,"complete":true,"runnersVoidable":false,"totalMatched":null,"status":"OPEN","bspReconciled":false,"crossMatching":false,"inplay":false,"numberOfWinners":1,"numberOfRunners":10,"numberOfActiveRunners":8,"runners":[{"status":"ACTIVE","ex":{"tradedVolume":[],"availableToBack":[{"price":2.8,"size":34.16},{"price":2.76,"size":200},{"price":2.5,"size":237.85}],"availableToLay":[{"price":2.94,"size":6.03},{"price":2.96,"size":10.82},{"price":3,"size":33.45}]},"sp":{"nearPrice":null,"farPrice":null,"backStakeTaken":[],"layLiabilityTaken":[],"actualSP":null},"adjustmentFactor":null,"removalDate":null,"lastPriceTraded":null,"handicap":0,"totalMatched":null,"selectionId":12832765}...
All I want to do is add in an extra field, containing the 'runner name' in the data set below, into each of the 'runners' sub lists from the initial data set, based on selection_id=selectionId.
So initially I iterate through the full dataset and then create a separate list to get the runner name from the runner id (I should point out that runnerId===selectionId===selection_id; I have no idea why multiple names are used). This works fine, and the code is shown below:
for market_book in market_books:
    market_catalogues = trading.betting.list_market_catalogue(
        market_projection=["RUNNER_DESCRIPTION", "RUNNER_METADATA", "COMPETITION", "EVENT", "EVENT_TYPE", "MARKET_DESCRIPTION", "MARKET_START_TIME"],
        filter=betfairlightweight.filters.market_filter(
            market_ids=[market_book.market_id],
        ),
        max_results=100)
    data = []
    for market_catalogue in market_catalogues:
        for runner in market_catalogue.runners:
            data.append(
                (runner.selection_id, runner.runner_name)
            )
So as you can see I have the data in data[], but what I need to do is add it to the initial data set, based on the selection_id.
I'm more comfortable with PHP or JavaScript, so apologies if this seems a bit simplistic, but the code snippets I've found online only seem to help with very simple Python lists and nothing 'nested' (to me the structure seems similar to a nested array).
As per the request below, here is the full list:
{"marketId":"1.130856098","totalAvailable":null,"isMarketDataDelayed":null,"lastMatchTime":null,"betDelay":0,"version":2576584033,"complete":true,"runnersVoidable":false,"totalMatched":null,"status":"OPEN","bspReconciled":false,"crossMatching":false,"inplay":false,"numberOfWinners":1,"numberOfRunners":10,"numberOfActiveRunners":8,"runners":[{"status":"ACTIVE","ex":{"tradedVolume":[],"availableToBack":[{"price":2.8,"size":34.16},{"price":2.76,"size":200},{"price":2.5,"size":237.85}],"availableToLay":[{"price":2.94,"size":6.03},{"price":2.96,"size":10.82},{"price":3,"size":33.45}]},"sp":{"nearPrice":null,"farPrice":null,"backStakeTaken":[],"layLiabilityTaken":[],"actualSP":null},"adjustmentFactor":null,"removalDate":null,"lastPriceTraded":null,"handicap":0,"totalMatched":null,"selectionId":12832765},{"status":"ACTIVE","ex":{"tradedVolume":[],"availableToBack":[{"price":20,"size":3},{"price":19.5,"size":26.36},{"price":19,"size":2}],"availableToLay":[{"price":21,"size":13},{"price":22,"size":2},{"price":23,"size":2}]},"sp":{"nearPrice":null,"farPrice":null,"backStakeTaken":[],"layLiabilityTaken":[],"actualSP":null},"adjustmentFactor":null,"removalDate":null,"lastPriceTraded":null,"handicap":0,"totalMatched":null,"selectionId":12832767},{"status":"ACTIVE","ex":{"tradedVolume":[],"availableToBack":[{"price":11,"size":9.75},{"price":10.5,"size":3},{"price":10,"size":28.18}],"availableToLay":[{"price":11.5,"size":12},{"price":13.5,"size":2},{"price":14,"size":7.75}]},"sp":{"nearPrice":null,"farPrice":null,"backStakeTaken":[],"layLiabilityTaken":[],"actualSP":null},"adjustmentFactor":null,"removalDate":null,"lastPriceTraded":null,"handicap":0,"totalMatched":null,"selectionId":12832766},{"status":"ACTIVE","ex":{"tradedVolume":[],"availableToBack":[{"price":48,"size":2},{"price":46,"size":5},{"price":42,"size":5}],"availableToLay":[{"price":60,"size":7},{"price":70,"size":5},{"price":75,"size":10}]},"sp":{"nearPrice":null,"farPrice":null,"backStakeTaken":[],"layLiabilityTaken
":[],"actualSP":null},"adjustmentFactor":null,"removalDate":null,"lastPriceTraded":null,"handicap":0,"totalMatched":null,"selectionId":12832769},{"status":"ACTIVE","ex":{"tradedVolume":[],"availableToBack":[{"price":18.5,"size":28.94},{"price":18,"size":5},{"price":17.5,"size":3}],"availableToLay":[{"price":21,"size":20},{"price":23,"size":2},{"price":24,"size":2}]},"sp":{"nearPrice":null,"farPrice":null,"backStakeTaken":[],"layLiabilityTaken":[],"actualSP":null},"adjustmentFactor":null,"removalDate":null,"lastPriceTraded":null,"handicap":0,"totalMatched":null,"selectionId":12832768},{"status":"ACTIVE","ex":{"tradedVolume":[],"availableToBack":[{"price":4.3,"size":9},{"price":4.2,"size":257.98},{"price":4.1,"size":51.1}],"availableToLay":[{"price":4.4,"size":20.97},{"price":4.5,"size":30},{"price":4.6,"size":16}]},"sp":{"nearPrice":null,"farPrice":null,"backStakeTaken":[],"layLiabilityTaken":[],"actualSP":null},"adjustmentFactor":null,"removalDate":null,"lastPriceTraded":null,"handicap":0,"totalMatched":null,"selectionId":12832771},{"status":"ACTIVE","ex":{"tradedVolume":[],"availableToBack":[{"price":24,"size":6.75},{"price":23,"size":2},{"price":22,"size":2}],"availableToLay":[{"price":26,"size":2},{"price":27,"size":2},{"price":28,"size":2}]},"sp":{"nearPrice":null,"farPrice":null,"backStakeTaken":[],"layLiabilityTaken":[],"actualSP":null},"adjustmentFactor":null,"removalDate":null,"lastPriceTraded":null,"handicap":0,"totalMatched":null,"selectionId":12832770},{"status":"ACTIVE","ex":{"tradedVolume":[],"availableToBack":[{"price":5.7,"size":149.33},{"price":5.5,"size":29.41},{"price":5.4,"size":5}],"availableToLay":[{"price":6,"size":85},{"price":6.6,"size":5},{"price":6.8,"size":5}]},"sp":{"nearPrice":null,"farPrice":null,"backStakeTaken":[],"layLiabilityTaken":[],"actualSP":null},"adjustmentFactor":null,"removalDate":null,"lastPriceTraded":null,"handicap":0,"totalMatched":null,"selectionId":10064909}],"publishTime":1551612312125,"priceLadderDefinition":{"type":
"CLASSIC"},"keyLineDescription":null,"marketDefinition":{"bspMarket":false,"turnInPlayEnabled":false,"persistenceEnabled":false,"marketBaseRate":5,"eventId":"28180290","eventTypeId":"2378961","numberOfWinners":1,"bettingType":"ODDS","marketType":"NONSPORT","marketTime":"2019-03-29T00:00:00.000Z","suspendTime":"2019-03-29T00:00:00.000Z","bspReconciled":false,"complete":true,"inPlay":false,"crossMatching":false,"runnersVoidable":false,"numberOfActiveRunners":8,"betDelay":0,"status":"OPEN","runners":[{"status":"ACTIVE","sortPriority":1,"id":10064909},{"status":"ACTIVE","sortPriority":2,"id":12832765},{"status":"ACTIVE","sortPriority":3,"id":12832766},{"status":"ACTIVE","sortPriority":4,"id":12832767},{"status":"ACTIVE","sortPriority":5,"id":12832768},{"status":"ACTIVE","sortPriority":6,"id":12832770},{"status":"ACTIVE","sortPriority":7,"id":12832769},{"status":"ACTIVE","sortPriority":8,"id":12832771},{"status":"LOSER","sortPriority":9,"id":10317013},{"status":"LOSER","sortPriority":10,"id":10317010}],"regulators":["MR_INT"],"countryCode":"GB","discountAllowed":true,"timezone":"Europe\/London","openDate":"2019-03-29T00:00:00.000Z","version":2576584033,"priceLadderDefinition":{"type":"CLASSIC"}}}
I think I understand what you are trying to do now.
First, hold your data as a Python object (you gave us a JSON object):
import json
my_data = json.loads(my_json_string)
for item in my_data['runners']:
    item['selectionId'] = [item['selectionId'], my_name_here]
The thing is that my_data['runners'][i]['selectionId'] holds a single value (an integer in this JSON). Unless you want to concatenate the name and the id together, you should turn it into a list or even a dictionary.
Each item is a dictionary, so you can also always add a new key to it:
item['new_key'] = my_value
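To match each name to the right runner, a dictionary keyed by selection id avoids a nested search. The following is a rough sketch only: the payload is a trimmed, made-up version of the JSON in the question, the runner names are invented, and the key "runnerName" is a hypothetical choice.

```python
import json

# Trimmed, made-up payload with the same "runners"/"selectionId" layout.
my_json_string = (
    '{"marketId": "1.130856098", '
    '"runners": [{"selectionId": 12832765}, {"selectionId": 12832767}]}'
)

# (selection_id, runner_name) pairs like the ones collected in the question;
# the names themselves are invented.
pairs = [(12832765, "Runner A"), (12832767, "Runner B")]
names_by_id = dict(pairs)

my_data = json.loads(my_json_string)
for runner in my_data["runners"]:
    # .get() leaves the name as None when an id has no match.
    runner["runnerName"] = names_by_id.get(runner["selectionId"])

print(json.dumps(my_data))
```

Because my_data is a plain dictionary, the added key shows up when the whole structure is printed or dumped again.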
So, essentially this works...with one exception...I can see from the print(...) in the loop that the attribute is updated, however what I can't seem to do is then see this update outside the loop.
mkt_runners = []
for market_catalogue in market_catalogues:
    for r in market_catalogue.runners:
        mkt_runners.append((r.selection_id, r.runner_name))
for market_book in market_books:
    for runner in market_book.runners:
        for x in mkt_runners:
            if runner.selection_id in x:
                setattr(runner, 'x', x[1])
                print(market_book.market_id, runner.x, runner.selection_id)
    print(market_book.json())
So the print(market_book.market_id, ...) call displays as expected, but when I print the whole list it shows the un-updated version. I can't seem to find an obvious solution, which is odd, as it seems like a really simple thing. I tried messing around with indents in case that was the problem, but it doesn't seem to be; it's as if the market_book list is not refreshed after the runners sub-list is updated.
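One possible explanation, offered as an assumption rather than a statement about the library, is that json() serializes the payload the object was originally built from, not its current attributes. The classes below are hypothetical stand-ins written only to illustrate that behaviour; it is worth checking the library source to confirm whether it applies.

```python
import json

# Hypothetical stand-ins, not the real library classes.
class Runner:
    def __init__(self, selection_id):
        self.selection_id = selection_id

class MarketBook:
    def __init__(self, raw):
        self._raw = raw  # payload kept exactly as received
        self.runners = [Runner(r["selectionId"]) for r in raw["runners"]]

    def json(self):
        # Serializes the stored payload, not the current attributes.
        return json.dumps(self._raw)

book = MarketBook({"runners": [{"selectionId": 1}]})
setattr(book.runners[0], "x", "Runner A")

print(book.runners[0].x)  # the attribute is set on the object
print(book.json())        # the stored payload is unchanged

# Building the output from the parsed payload makes the change visible.
enriched = json.loads(book.json())
enriched["runners"][0]["runnerName"] = "Runner A"
print(json.dumps(enriched))
```

If that is what is happening, adding the name to the parsed dictionary instead of the runner object would give the expected output.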

Parsing multiple occurrences of an item into a dictionary

I'm attempting to parse several separate image links from JSON data in Python, but I'm having trouble drilling down to the right level, which I believe is because of a list of strings.
For the majority of the items, I've had success with the below example, pulling back everything I need. Outside of this instance, everything is a 1:1 ratio of keys:values, but for this one, there are multiple values associated with one key.
resultsdict['item_name'] = item['attribute_key']
I've been adding it all to a resultsdict={}, but am only able to get to the below sample string when I print.
INPUT:
for item in data['Item']:
    resultsdict['images'] = item['Variations']['Pictures']
OUTPUT (only relevant section):
'images': [{u'VariationSpecificPictureSet': [{u'PictureURL': [u'http//imagelink1'], u'VariationSpecificValue': u'color1'}, {u'PictureURL': [u'http//imagelink2'], u'VariationSpecificValue': u'color2'}, {u'PictureURL': [u'http//imagelink3'], u'VariationSpecificValue': u'color3'}, {u'PictureURL': [u'http//imagelink4'], u'VariationSpecificValue': u'color4'}]
I feel like I could add ['VariationPictureSet']['PictureURL'] at the end of my initial input, but that throws an error due to the indices not being integers, but strings.
Ideally, I would like to see the output as a simple comma-separated list of just the URLs, as follows:
OUTPUT:
'images': http//imagelink1, http//imagelink2, http//imagelink3, http//imagelink4
An answer to your comment that required a bit of code to it.
When using
for item in data['Item']:
    resultsdict['images'] = item['Variations']['Pictures']
you get a list with one element, so I recommend using this
for item in data['Item']:
    resultsdict['images'] = item['Variations']['Pictures'][0]
now you can use
for image in resultsdict['images']['VariationSpecificPictureSet']:
    print(image['PictureURL'])
Thanks for the help, #uzzee, it's appreciated. I kept tinkering with it and was able to pull the continuous string of all the image URLs with the following code.
resultsdict['images'] = sum([x['PictureURL'] for x in item['variations']['Pictures'][0]['VariationSpecificPictureSet']],[])
Without the sum it looks like this and pulls in the whole list of lists...
resultsdict['images'] = [x['PictureURL'] for x in item['variations']['Pictures'][0]['VariationSpecificPictureSet']]
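As an alternative to sum(..., []), itertools.chain.from_iterable flattens the per-variation lists, and str.join then produces the comma-separated string asked for in the question. A minimal sketch with made-up sample data that mirrors the printed structure:

```python
from itertools import chain

# Made-up sample mirroring the structure printed in the question.
picture_sets = [
    {"PictureURL": ["http//imagelink1"], "VariationSpecificValue": "color1"},
    {"PictureURL": ["http//imagelink2"], "VariationSpecificValue": "color2"},
]

# Each PictureURL is itself a list, so flatten one level.
urls = list(chain.from_iterable(p["PictureURL"] for p in picture_sets))

# Join into a single comma-separated string.
images = ", ".join(urls)

print(images)
```

Both approaches give the same flat list; chain.from_iterable just avoids building a new intermediate list for every addition.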

Reach a string behind unknown value in JSON

I use Wikipedia's API to get information about a page.
The API gives me JSON like this:
"query":{
"pages":{
"188791":{
"pageid":188791,
"ns":0,
"title":"Vanit\u00e9",
"langlinks":[
{
"lang":"bg",
"*":"Vanitas"
},
{
"lang":"ca",
"*":"Vanitas"
},
ETC.
}
}
}
}
You can see the full JSON response.
I want to obtain all entries like:
{
"lang":"ca",
"*":"Vanitas"
}
but the number key ("188791") in the pages object is the problem.
I found "Find a value within nested json dictionary in python", which explains how to enumerate the values.
Unfortunately I get the following exception:
TypeError: 'dict_values' object does not support indexing
My code is:
json["query"]["pages"].values()[0]["langlinks"]
It's probably a dumb question but I can't find a way to pass in the page id value.
One solution is to use the indexpageids parameter, e.g.: http://fr.wikipedia.org/w/api.php?action=query&titles=Vanit%C3%A9&prop=langlinks&lllimit=500&format=jsonfm&indexpageids. It will add an array of pageids to the response. You can then use that to access the dictionary.
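A sketch of how that could look once the response is parsed. The dictionary below is a made-up, trimmed response, and it assumes the extra list appears as "pageids" under "query"; check an actual response before relying on that layout.

```python
# Made-up response trimmed to the fields used here; the "pageids" list
# under "query" is assumed to be what indexpageids adds.
data = {
    "query": {
        "pageids": ["188791"],
        "pages": {
            "188791": {
                "pageid": 188791,
                "title": "Vanit\u00e9",
                "langlinks": [{"lang": "bg", "*": "Vanitas"}],
            }
        },
    }
}

# Look up the page id first, then use it as the key into "pages".
page_id = data["query"]["pageids"][0]
langlinks = data["query"]["pages"][page_id]["langlinks"]

print(page_id, langlinks)
```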
As long as you're only querying one page at a time, Simeon Visser's answer will work. However, as a matter of good style, I'd recommend structuring your code so that you iterate over all the returned results, even if you know there should be only one:
for page in data["query"]["pages"].values():
    title = page["title"]
    langlinks = page["langlinks"]
    # do something with langlinks...
In particular, by writing your code this way, if you ever find yourself needing to run the query for multiple pages, you can do it efficiently with a single MediaWiki API request.
You're using Python 3, where values() returns a dict_values object instead of a list. This is a view on the values of the dictionary.
That is why you're getting the error: indexing works on a list but not on a view.
To fix it:
list(json["query"]["pages"].values())[0]["langlinks"]
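If only the first value is needed, next(iter(...)) gets it without copying the whole view into a list. A small sketch with a made-up pages dictionary:

```python
# Made-up sample with a single page under an arbitrary id.
pages = {"188791": {"langlinks": [{"lang": "ca", "*": "Vanitas"}]}}

values = pages.values()
# values[0] would raise TypeError because a view is not subscriptable.

# Option 1: copy the view into a list, then index it.
first_via_list = list(values)[0]

# Option 2: take the first item straight from an iterator over the view.
first_via_iter = next(iter(values))

print(first_via_iter["langlinks"])
```

Both options return the same dictionary object; the second simply skips the intermediate list.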
If you really want just one page arbitrarily, do that the way Simeon Visser suggested.
But I suspect you want all langlinks in all pages, yes?
For that, you want a comprehension:
[page["langlinks"] for page in json["query"]["pages"].values()]
But of course that gives you a 2D list. If you want to iterate over each page's links, that's perfect. If you want to iterate over all of the langlinks at once, you want to flatten the list:
[langlink for page in json["query"]["pages"].values()
 for langlink in page["langlinks"]]
… or…
itertools.chain.from_iterable(page["langlinks"]
for page in json["query"]["pages"].values())
(The latter gives you an iterator; if you need a list, wrap the whole thing in list. Conversely, for the first two, if you don't need a list, just any iterable, use parens instead of square brackets to get a generator expression.)
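To make the difference between the nested and the flattened forms concrete, here is a short sketch on made-up data with two pages (the ids, titles and links are invented):

```python
from itertools import chain

# Made-up sample: two pages, each with its own list of langlinks.
pages = {
    "1": {"title": "A", "langlinks": [{"lang": "bg", "*": "X"}, {"lang": "ca", "*": "Y"}]},
    "2": {"title": "B", "langlinks": [{"lang": "de", "*": "Z"}]},
}

# One inner list per page (a 2D list).
nested = [page["langlinks"] for page in pages.values()]

# All langlinks in one flat list.
flat = [link for page in pages.values() for link in page["langlinks"]]

# The same flattening, done lazily as an iterator.
lazy = chain.from_iterable(page["langlinks"] for page in pages.values())
langs = [link["lang"] for link in lazy]

print(len(nested), len(flat), langs)
```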
