API request search issue - how do I search through all pages - python

I am trying to search through multiple pages of results from an API GET request, but I am unsure how to search past the first page of results.
The first GET request downloads the first page of results, but to get the next page you need to GET a new URL, which is listed at the bottom of the first page.
def getaccount_data():
    view_code = input("Enter Account: ")
    global token
    header = {"X-Fid-Identity": token}
    url = BASE_URL + '/data/v1/entity/account'
    accountdata = requests.get(url, headers=header, verify=False)
    newaccountdata = accountdata.json()
    for data in newaccountdata['results']:
        if data['fields']['view_code'] == view_code:
            print("Account is set up")
        else:
            url = newaccountdata['moreUri']
            print(url)

getaccount_data()
Is there any way to search all the pages by updating the URL to get to the next page?
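Yes - one way is to keep requesting whatever moreUri points at until it is no longer present. A minimal sketch, reusing BASE_URL and token from the code above and assuming moreUri is a relative path that is absent (or empty) on the last page:
import requests

def find_account(view_code, token):
    # Follow 'moreUri' until the account is found or there are no more pages.
    # Assumption: 'moreUri' is relative to BASE_URL and missing on the last page.
    headers = {"X-Fid-Identity": token}
    url = BASE_URL + '/data/v1/entity/account'
    while url:
        page = requests.get(url, headers=headers, verify=False).json()
        for data in page['results']:
            if data['fields']['view_code'] == view_code:
                print("Account is set up")
                return data
        more = page.get('moreUri')
        url = BASE_URL + more if more else None
    print("Account not found")
    return None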

Related

Python - Finding total number of pages of a REST API

I am trying to loop through a REST API and fetch the complete data set.
url = f'https://apiurl.com/api/1.1/json/tickets?page=1'
auth = (f'{api_key}', f'{auth_code}')
res = requests.get(url, auth=auth)
data = json.loads(res.content)
The above returns data for page 1, and I am able to do the same for the other pages, page by page, by specifying the page number in the URL. I am not sure how to find the total number of pages so that I can run a for loop over all the pages in the API feed.
I was able to get the number of pages using the below code:
res = requests.get(url, auth=auth)
data = res.json()
while 'next' in res.links.keys():
    res = requests.get(res.links['next']['url'])
    data.extend(res.json())
page_count = data['page_info']['page_count']  # <<-- this returns the max page count
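With the page count known, a simple loop over the page numbers collects every page. A minimal sketch; the 'tickets' key used to pull records out of each response is an assumption, since the payload shape isn't shown above:
import requests

all_records = []
first = requests.get('https://apiurl.com/api/1.1/json/tickets?page=1', auth=auth).json()
page_count = first['page_info']['page_count']   # total pages reported by the API
all_records.extend(first.get('tickets', []))    # 'tickets' key is an assumption
for page in range(2, page_count + 1):
    url = f'https://apiurl.com/api/1.1/json/tickets?page={page}'
    all_records.extend(requests.get(url, auth=auth).json().get('tickets', []))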

Using Python to get new URL after clicking Submit button

I am using Google Maps to get the GPS coordinates of an address I am searching for. I want to get the URL after I click Submit so that I can extract the GPS coordinates from the URL. However, my URL only shows: https://www.google.com/maps
url = "http://maps.google.com/"
locationAdrs = '957 ASHBY GROVE SW ATLANTA'
browser = webdriver.Chrome(executable_path="C:/Users/joe/AppData/Local/Programs/Python/Python37-32/PyOn/chromedriver")
browser.get(url)
address = browser.find_element_by_xpath('//*[#id="searchboxinput"]')
address.send_keys(locationAdrs)
address.submit()
url = browser.current_url
print(url)
You have to re-read the URL that was actually loaded, because the URL you navigate to may not be the one the browser finally ends up on. You then have to wait for the browser's URL to change before reading the new address:
url = "https://www.google.com/maps"
locationAdrs = '957 ASHBY GROVE SW ATLANTA'
browser.get(url)
address = browser.find_element_by_xpath('//*[#id="searchboxinput"]')
address.send_keys(locationAdrs)
# address.submit() - doesn't seem to do the right thing.
url = browser.current_url # have initial url on same format before click is made to move away
browser.find_element_by_xpath('//*[#id="searchbox-searchbutton"]').click()
while url == browser.current_url:
time.sleep(2)
url = browser.current_url
print(url)
Output:
https://www.google.com/maps/place/957+Ashby+Grove+SW,+Atlanta,+GA+30314,+USA/@33.7500669,-84.4211224,17z/data=!3m1!4b1!4m5!3m4!1s0x88f5035d3de5336f:0x9ca82913b5ecbde!8m2!3d33.7500669!4d-84.4189284
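Since the goal is the GPS coordinates, they can then be pulled out of the @lat,lng segment of that URL; a small sketch using a regular expression:
import re

# Extract the latitude/longitude that follow the '@' in the Maps URL.
match = re.search(r'@(-?\d+\.\d+),(-?\d+\.\d+)', url)
if match:
    lat, lng = float(match.group(1)), float(match.group(2))
    print(lat, lng)  # 33.7500669 -84.4211224 for the URL above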

Python log into Voobly

I've read through dozens of pages on how to log into a web page using Python, but I can't seem to make my code work. I'm trying to log into a site called "Voobly", and I'm wondering if there might be something specific to Voobly that is making this more difficult. Here is my code:
import requests

loginURL = "https://www.voobly.com/login"
matchUrl = "https://www.voobly.com/profile/view/124993231/Matches"
s = requests.session()
loginInfo = {"username": "myUsername", "password": "myPassword"}
firstGetRequest = s.get(loginURL)  # Get the login page using our session so we save the cookies
postRequest = s.post(loginURL, data=loginInfo)  # Post my login information to the login page
getRequest = s.get(matchUrl)  # Get content from a login-restricted page
response = getRequest.content.decode()  # Get the actual html text from the restricted page
if "Page Access Failed" in response:  # True if I'm blocked
    print("Failed")
else:  # If I'm not blocked, I have the result I want
    print("Worked!")  # I can't achieve this
As mentioned in the comments, the login form is submitted to /login/auth, but the cookie is generated from the /login URL.
Use the following code:
import requests

form = {'username': USERNAME, 'password': PASSWORD}

with requests.Session() as s:
    # Get the cookie
    s.get('https://www.voobly.com/login')
    # Post the login form data
    s.post('https://www.voobly.com/login/auth', data=form)
    # Go to home page
    r = s.get('https://www.voobly.com/welcome')
    # Check if username is in response.text
    print(USERNAME in r.text)
    # True
    r2 = s.get('https://www.voobly.com/profile/view/124993231/Matches')
    if "Page Access Failed" in r2.text:
        print("Failed")
    else:
        print("Worked!")
        # Worked!
Note: The Go to home page part is not at all needed for the login. It's used just to show that the login is successful.

Python post request for USPTO site scraping

I'm trying to scrape data from the http://portal.uspto.gov/EmployeeSearch/ website.
I open the site in a browser, click on the Search button inside the Search by Organisation part of the site, and look at the request being sent to the server.
When I post the same request using the Python requests library in my program, I don't get the result page I am expecting; I get the same Search page back, with no employee data on it.
I've tried all variants; nothing seems to work.
My question is: what URL should I use in my request, do I need to specify headers (I also tried copying the headers shown in Firefox developer tools for the request), or something else?
Below is the code that sends the request:
import requests
from bs4 import BeautifulSoup

def scrape_employees():
    URL = 'http://portal.uspto.gov/EmployeeSearch/searchEm.do;jsessionid=98BC24BA630AA0AEB87F8109E2F95638.prod_portaljboss4_jvm1?action=displayResultPageByOrgShortNm&currentPage=1'
    response = requests.post(URL)
    site_data = response.content
    soup = BeautifulSoup(site_data, "html.parser")
    print(soup.prettify())

if __name__ == '__main__':
    scrape_employees()
All the data you need is in the form tag:
action is the URL you POST to.
input elements hold the data you need to post to the server, as {name: value} pairs.
import requests, bs4, urllib.parse, re

def make_soup(url):
    r = requests.get(url)
    soup = bs4.BeautifulSoup(r.text, 'lxml')
    return soup

def get_form(soup):
    form = soup.find(name='form', action=re.compile(r'OrgShortNm'))
    return form

def get_action(form, base_url):
    action = form['action']
    # action is a relative url, convert it to an absolute url
    abs_action = urllib.parse.urljoin(base_url, action)
    return abs_action

def get_form_data(form, org_code):
    data = {}
    for inp in form('input'):
        # if the value is empty, put the org_code in this field
        data[inp['name']] = inp['value'] or org_code
    return data

if __name__ == '__main__':
    url = 'http://portal.uspto.gov/EmployeeSearch/'
    soup = make_soup(url)
    form = get_form(soup)
    action = get_action(form, url)
    data = get_form_data(form, '1634')
    # make the post request to the action url using the form data
    r = requests.post(action, data=data)
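To check whether the employee data actually came back, the response can be inspected the same way the question's code does; a minimal follow-up sketch (the result page's structure isn't shown here, so this just prints the HTML):
soup = bs4.BeautifulSoup(r.text, 'lxml')
print(soup.prettify())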

Fetch all pages using python request

I am using an API to fetch orders from a website. The problem is that each call only fetches 20 orders. I figured out I need to use a pagination iterator but don't know how to use it. How do I fetch all the orders at once?
My code:
def search_orders(self):
    headers = {'Authorization': 'Bearer %s' % self.token, 'Content-Type': 'application/json'}
    url = "https://api.flipkart.net/sellers/orders/search"
    filter = {"filter": {"states": ["APPROVED", "PACKED"]}}
    return requests.post(url, data=json.dumps(filter), headers=headers)
Here is a link to the documentation.
You need to do what the documentation suggests -
The first call to the Search API returns a finite number of results based on the pageSize value. Calling the URL returned in the nextPageURL field of the response gets the subsequent pages of the search result.
nextPageUrl - String - A GET call on this URL fetches the next page results. Not present for the last page
(Emphasis mine)
You can use response.json() to get the JSON of the response. Then you can check the hasMore flag to see if there are more results; if so, use requests.get() to get the response for the next page, and keep doing this until hasMore is false. Example -
def search_orders(self):
    headers = {'Authorization': 'Bearer %s' % self.token, 'Content-Type': 'application/json'}
    url = "https://api.flipkart.net/sellers/orders/search"
    filter = {"filter": {"states": ["APPROVED", "PACKED"]}}
    s = requests.Session()
    response = s.post(url, data=json.dumps(filter), headers=headers)
    orderList = []
    resp_json = response.json()
    orderList.extend(resp_json["orderItems"])  # extend keeps orderList a flat list of order items
    while resp_json.get('hasMore'):
        # the next-page GET also needs the auth headers
        response = s.get('https://api.flipkart.net/sellers{0}'.format(resp_json['nextPageUrl']),
                         headers=headers)
        resp_json = response.json()
        orderList.extend(resp_json["orderItems"])
    return orderList
The above code should return the complete list of orders.
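For completeness, a hypothetical usage sketch; SellerApi is a placeholder name for whatever class defines search_orders and holds the OAuth token in self.token:
# 'SellerApi' is a placeholder; any class exposing search_orders with a valid
# self.token would do.
api = SellerApi(token)
orders = api.search_orders()
print(len(orders), "order items fetched across all pages")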
