I'm trying to set up a loop to pull in a year's worth of weather data for about 500 weather stations, which I have listed in my dataframe. The base URL stays the same; the only part that changes is the weather station ID.
I'd like to create a dataframe with the results. I believe I'd use requests.get to pull in data for all the weather stations in my list; the IDs to use in the URL are in a column called "API ID" in my dataframe. I am a Python beginner, so any help would be appreciated! My code is below, but it doesn't work and returns an error:
"InvalidSchema: No connection adapters were found for '0 " http://www.ncei.noaa.gov/access/services/data/...\nName: API ID, Length: 497, dtype: object'
def callAPI(API_id):
    for IDs in range(len(API_id)):
        url = ('http://www.ncei.noaa.gov/access/services/data/v1?dataset=daily-summaries&dataTypes=PRCP,SNOW,TMAX,TMIN&stations=' + distances['API ID'] + '&startDate=2020-01-01&endDate=2020-12-31&includeAttributes=0&includeStationName=true&units=standard&format=json')
        r = requests.request('GET', url)
        d = r.json()

ll = []
for index1, rows1 in distances.iterrows():
    station = rows1['Closest Station']
    API_id = rows1['API ID']
    data = callAPI(API_id)
    ll.append([(data)])
I am not sure about your whole code base, but the function below will return the data from the API. If you have multiple station IDs in a single dataframe column, you can call it in a for loop; otherwise there is no need for one.
Also, you were not returning the result from the function. Note the return keyword at the end of the function.
Working code:
import requests

def callAPI(API_id):
    url = ('http://www.ncei.noaa.gov/access/services/data/v1?dataset=daily-summaries&dataTypes=PRCP,SNOW,TMAX,TMIN&stations=' + API_id + '&startDate=2020-01-01&endDate=2020-12-31&includeAttributes=0&includeStationName=true&units=standard&format=json')
    r = requests.request('GET', url)
    d = r.json()
    return d

print(callAPI('USC00457180'))
So your full code will be something like this:
def callAPI(API_id):
    url = ('http://www.ncei.noaa.gov/access/services/data/v1?dataset=daily-summaries&dataTypes=PRCP,SNOW,TMAX,TMIN&stations=' + API_id + '&startDate=2020-01-01&endDate=2020-12-31&includeAttributes=0&includeStationName=true&units=standard&format=json')
    r = requests.request('GET', url)
    d = r.json()
    return d

ll = []
for index1, rows1 in distances.iterrows():
    station = rows1['Closest Station']
    API_id = rows1['API ID']
    data = callAPI(API_id)
    ll.append([(data)])
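Since your goal is a dataframe, you can flatten the collected JSON into one. A minimal sketch, assuming pandas is available and that each response parses to a list of daily-record dicts (which is what format=json appears to return for this endpoint):

import pandas as pd

ll = []
for index1, rows1 in distances.iterrows():
    data = callAPI(rows1['API ID'])  # one station's list of daily records
    ll.extend(data)                  # extend keeps the list flat (append would nest it)

results_df = pd.DataFrame(ll)        # one row per station-day
print(results_df.head())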
Note: For even faster processing, use asynchronous calls to the API. Something like this: https://stackoverflow.com/a/56926297/1138192
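If asyncio feels like too much machinery, a thread pool gives a similar speedup for I/O-bound requests. A rough sketch using concurrent.futures (the worker count here is an arbitrary choice):

from concurrent.futures import ThreadPoolExecutor

station_ids = distances['API ID'].tolist()
with ThreadPoolExecutor(max_workers=10) as pool:
    # map() runs callAPI concurrently and preserves the input order of the IDs
    all_data = list(pool.map(callAPI, station_ids))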
What I have so far is the first request gathering IDs. I would then like to insert each returned draftgroupid into the second URL request. Is it possible to send two requests in the same script, and if so, how would I do a for i in range(draftgroupid) in the second URL request?
import requests
import json

req1 = requests.get(url="https://www.draftkings.com/lobby/getcontests?sport=NHL")
req.raise_for_status()
data = req.json()

for i, contest in enumerate(data['DraftGroups']):
    draftgroupid = contest['DraftGroupId']
Output of draftgroupid:
16901
16905
16902
16903
req2 = requests.get(url="https://api.draftkings.com/draftgroups/v1/draftgroups/THEVALUEIWANTTOLOOPTHROUGH/draftables?format=json")
EDIT
import csv
import requests
import json

req = requests.get(url="https://www.draftkings.com/lobby/getcontests?sport=NHL")
req.raise_for_status()
data = req.json()

for i, contest in enumerate(data['DraftGroups']):
    draftgroupid = contest['DraftGroupId']
    req2 = requests.get(url="https://api.draftkings.com/draftgroups/v1/draftgroups/" + str(draftgroupid) + "/draftables?format=json")
    data2 = req2.json
    for i, player_info in enumerate(data2['draftables'][0]):
        date = player_info['competition']['startTime']
        print(date)
Running into a TypeError: 'method' object is not subscriptable
As I understand it, your problem is related to string manipulation rather than to the requests library.
So basically,
import requests
import json

req1 = requests.get(url="https://www.draftkings.com/lobby/getcontests?sport=NHL")
req1.raise_for_status()
data = req1.json()

for i, contest in enumerate(data['DraftGroups']):
    draftgroupid = contest['DraftGroupId']
    requests.get(url="https://api.draftkings.com/draftgroups/v1/draftgroups/" + str(draftgroupid) + "/draftables?format=json")
should do the job.
More elegant ways to concatenate strings can be found at http://www.pythonforbeginners.com/concatenation/string-concatenation-and-formatting-in-python
Edit
For example,
"some string " + str(123)
"some string %d" % 123
"some string %s" % 123
will all give the same output. There are more ways to concatenate strings; you just need to choose the best fit for the context.
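For instance, str.format (or an f-string on Python 3.6+) also works and keeps the URL readable:

url = "https://api.draftkings.com/draftgroups/v1/draftgroups/{}/draftables?format=json".format(draftgroupid)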
for i, contest in enumerate(data['DraftGroups']):
    draftgroupid = contest['DraftGroupId']
    req2 = requests.get(url="https://api.draftkings.com/draftgroups/v1/draftgroups/%d/draftables?format=json" % draftgroupid)
I assume you didn't actually mean for i in range(draftgroupid) as you stated in the question, because that would mean making 16901 requests, followed by 16905 requests (all of which except the last four would be duplicates of the first batch), followed by 16902 requests (of which all would be duplicates), etc.
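For completeness, here is a sketch of the two requests chained together (the response shapes are assumed from your edit; note that .json is a method, which is what causes the TypeError you mention):

import requests

req1 = requests.get("https://www.draftkings.com/lobby/getcontests?sport=NHL")
req1.raise_for_status()
data = req1.json()

for contest in data['DraftGroups']:
    draftgroupid = contest['DraftGroupId']
    req2 = requests.get("https://api.draftkings.com/draftgroups/v1/draftgroups/%d/draftables?format=json" % draftgroupid)
    req2.raise_for_status()
    data2 = req2.json()  # .json() with parentheses, not .json
    for player_info in data2['draftables']:  # assumed: a list of player dicts
        print(player_info['competition']['startTime'])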
I make a GET request to an API.
I get this back:
{"status":200,"message":"Success","data":[{"email_address":"admin#nyunets.com","password":"admin","account_id":1000,"account_type":"admin","name_prefix":null,"first_name":null,"middle_names":null,"last_name":"Admin","name_suffix":null,"non_person_name":false,"dba":"","display_name":"Admin","address1":"111 Park Ave","address2":"Floor 4","address3":"Suite 4011","city":"New York","state":"NY","postal_code":"10022","nation_code":"USA","phone1":"212-555-1212","phone2":"","phone3":"","time_zone_offset_from_utc":-5,"customer_type":2,"last_updated_utc_in_secs":1446127072},{"email_address":"mhn#nyu.com","password":"nyu123","account_id":1002,"account_type":"customer","name_prefix":"","first_name":"MHN","middle_names":"","last_name":"User","name_suffix":"","non_person_name":false,"dba":"","display_name":"MHNUser","address1":"3101 Knox St","address2":"","address3":"","city":"Dallas","state":"TX","postal_code":"75205","nation_code":"USA","phone1":"8623875097","phone2":"","phone3":"","time_zone_offset_from_utc":-5,"customer_type":2,"last_updated_utc_in_secs":1461166172},{"email_address":"mhn1#nyu.com","password":"nyu123","account_id":1004,"account_type":"customer","name_prefix":"","first_name":"MHN1","middle_names":"","last_name":"User","name_suffix":"","non_person_name":false,"dba":"","display_name":"MHN1User","address1":"1010 Rosedale Shopping Center","address2":"","address3":"","city":"Roseville","state":"MN","postal_code":"55113","nation_code":"USA","phone1":"8279856982","phone2":"","phone3":"","time_zone_offset_from_utc":-5,"customer_type":2,"last_updated_utc_in_secs":1461166417},{"email_address":"location#nyu.com","password":"nyu123","account_id":1005,"account_type":"customer","name_prefix":"","first_name":"BB","middle_names":"","last_name":"HH","name_suffix":"","non_person_name":false,"dba":"","display_name":"BBHH","address1":"9906 Beverly Dr","address2":"9906 Beverly Dr","address3":"","city":"Beverly Hills","state":"CA","postal_code":"90210","nation_code":"90210","phone1":"3105559906","phone2":"","phone3":"","time_zone_offset_from_utc":-5,"customer_type":1,"last_updated_utc_in_secs":1461167224},{"email_address":"mbn1#nyu.com","password":"nyu123","account_id":1003,"account_type":"customer","name_prefix":"","first_name":"MBN1","middle_names":"","last_name":"User","name_suffix":"","non_person_name":false,"dba":"","display_name":"MBN1User","address1":"3200 S Las Vegas Blvd","address2":"","address3":"","city":"Las Vegas","state":"NV","postal_code":"89109","nation_code":"USA","phone1":"9273597497","phone2":"","phone3":"","time_zone_offset_from_utc":-5,"customer_type":1,"last_updated_utc_in_secs":1461593233},{"email_address":"mbn#nyu.com","password":"nyu123","account_id":1001,"account_type":"customer","name_prefix":"","first_name":"MBN","middle_names":"","last_name":"User","name_suffix":"","non_person_name":false,"dba":"","display_name":"MBNUser","address1":"300 Concord Road","address2":"","address3":"","city":"Billerica","state":"MA","postal_code":"01821","nation_code":"USA","phone1":"8127085695","phone2":"","phone3":"","time_zone_offset_from_utc":-5,"customer_type":1,"last_updated_utc_in_secs":1461784499},{"email_address":"usermbn#nyu.com","password":"nyu123","account_id":1006,"account_type":"customer","name_prefix":"","first_name":"User","middle_names":"","last_name":"MBN","name_suffix":"","non_person_name":false,"dba":"","display_name":"UserMBN","address1":"75 Saint Alphonsus 
Street","address2":"","address3":"","city":"Boston","state":"MA","postal_code":"01821","nation_code":"USA","phone1":"8127085695","phone2":"","phone3":"","time_zone_offset_from_utc":-5,"customer_type":1,"last_updated_utc_in_secs":1462285561},{"email_address":"emile.barnaby#example.com","password":"nyu123","account_id":2000,"account_type":"customer","name_prefix":"","first_name":"emile","middle_names":"","last_name":"barnaby","name_suffix":"","non_person_name":false,"dba":"","display_name":"emilebarnaby","address1":"300 Concord Rd","address2":"","address3":"","city":"8239grandmaraisave","state":"manitoba","postal_code":"56798","nation_code":"USA","phone1":"414-140-1435","phone2":"414-140-1435","phone3":"414-140-1435","time_zone_offset_from_utc":-5,"customer_type":1,"last_updated_utc_in_secs":1462211572}]}
I have:
import requests
import json
url = "http://api/users"
accounts = requests.get(url).json()
data = json.loads(accounts)
object_with_max_account_id = max(accounts['data'], key=lambda x: x['account_id'])
print(object_with_max_account_id['account_id'])
My goal is to get the highest account ID out of it.
Usually we like to see what OPs have tried themselves, but this one is pretty straightforward:
import requests
url = "http://api/users"
accounts = requests.get(url).json()
object_with_max_account_id = max(accounts['data'], key=lambda x: x['account_id'])
print(object_with_max_account_id['account_id'])
>> 2000
Edit: Apparently, you first need to parse your input as JSON.
Check out simplejson.
import simplejson as json
data_obj = json.loads(data)
The s in loads means load from string.
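In other words (a minimal sketch; the URL comes from your snippet, and the standard-library json module behaves the same as simplejson here):

import requests
import json

resp = requests.get("http://api/users")
data_obj = json.loads(resp.text)  # parse the raw response string yourself
same_obj = resp.json()            # or let requests parse it for you

Note that if you already called .json(), the result is a dict; passing that to json.loads() again fails because loads expects a string. That is what data = json.loads(accounts) in your snippet trips over.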
Then, if you want to loop through the accounts yourself, how about something like:
maxID = -1
for account in data_obj['data']:        # the accounts live under the 'data' key
    if account['account_id'] > maxID:   # quote the key; account_id is not a variable
        maxID = account['account_id']
print("Max ID is %d" % maxID)
So sorry about the vague and confusing title, but there is really no better way for me to summarize my problem in one sentence.
I was trying to get the student and grade information from a French website. The link is this: http://www.bankexam.fr/resultat/2014/BACCALAUREAT/AMIENS?filiere=BACS
My code is as follows:
import time
import urllib2
from bs4 import BeautifulSoup

regions = {'R\xc3\xa9sultats Bac Amiens 2014':'/resultat/2014/BACCALAUREAT/AMIENS'}
base_url = 'http://www.bankexam.fr'
tests = {'es':'?filiere=BACES','s':'?filiere=BACS','l':'?filiere=BACL'}

for i in regions:
    for x in tests:
        # create the output file
        output_file = open('/Users/student project/'+ i + '_' + x + '.txt','a')
        time.sleep(2) #compassionate scraping
        section_url = base_url + regions[i] + tests[x] #now goes to the x test page of region i
        request = urllib2.Request(section_url)
        response = urllib2.urlopen(request)
        soup = BeautifulSoup(response,'html.parser')
        content = soup.find('div',id='zone_res')
        for row in content.find_all('tr'):
            if row.td:
                student = row.find_all('td')
                name = student[0].strong.string.encode('utf8').strip()
                try:
                    school = student[1].strong.string.encode('utf8')
                except AttributeError:
                    school = 'NA'
                result = student[2].span.string.encode('utf8')
                output_file.write('%s|%s|%s\n' % (name,school,result))

        # Find the maximum pages to go through
        if soup.find('div','pagination'):
            import re
            page_info = soup.find('div','pagination')
            pages = []
            for i in page_info.find_all('a',re.compile('elt')):
                try:
                    pages.append(int(i.string.encode('utf8')))
                except ValueError:
                    continue
            max_page = max(pages)

        # Now goes through page 2 to max page
        for i in range(1,max_page):
            page_url = '&p='+str(i)+'#anchor'
            section2_url = section_url+page_url
            request = urllib2.Request(section2_url)
            response = urllib2.urlopen(request)
            soup = BeautifulSoup(response,'html.parser')
            content = soup.find('div',id='zone_res')
            for row in content.find_all('tr'):
                if row.td:
                    student = row.find_all('td')
                    name = student[0].strong.string.encode('utf8').strip()
                    try:
                        school = student[1].strong.string.encode('utf8')
                    except AttributeError:
                        school = 'NA'
                    result = student[2].span.string.encode('utf8')
                    output_file.write('%s|%s|%s\n' % (name,school,result))
A little more description about the code:
I created a 'regions' dictionary and a 'tests' dictionary because there are 30 other regions I need to collect, and I just include one here for showcase. I'm only interested in the results of three tests (ES, S, L), which is why I created the 'tests' dictionary.
Two errors keep showing up. One is
KeyError: 2
which is linked to line 12 of my code,
section_url = base_url + regions[i] + tests[x]
The other is
TypeError: cannot concatenate 'str' and 'int' objects
which is linked to line 10.
I know there is a lot of information here and I'm probably not listing the most important info for you to help me. But let me know what I can do to fix this!
Thanks
The issue is that you're using the variable i in more than one place.
Near the top of the file, you do:
for i in regions:
So, in some places i is expected to be a key into the regions dictionary.
The trouble comes when you use it again later. You do so in two places:
for i in page_info.find_all('a',re.compile('elt')):
And:
for i in range(1,max_page):
The second of these is what is causing your exceptions, as the integer values that get assigned to i don't appear in the regions dict (nor can an integer be added to a string).
I suggest renaming some or all of those variables. Give them meaningful names, if possible (i is perhaps acceptable for an "index" variable, but I'd avoid using it for anything else unless you're code golfing).
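For example, a sketch of the renaming (the scraping bodies are elided; only the loop variables change):

for region in regions:                 # was: for i in regions
    for x in tests:
        section_url = base_url + regions[region] + tests[x]
        # ... same scraping code as before ...

        for link in page_info.find_all('a', re.compile('elt')):  # was: for i in ...
            try:
                pages.append(int(link.string.encode('utf8')))
            except ValueError:
                continue
        max_page = max(pages)

        for page in range(2, max_page + 1):  # was: for i in range(1, max_page);
                                             # this range matches the "page 2 to max page" comment
            page_url = '&p=' + str(page) + '#anchor'
            # ... same page-scraping code as before ...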