Suppose I have the following HTML input element within a form:
<input name="title_input" type="text" id="missing_value" title="Title">
If I want to submit a POST:
s = requests.Session()
s.get(url)
postResult = s.post(url, {'title_input':'This Is the Name of the Title'})
Even though the element has a missing value attribute, will this POST still work correctly?
I.e., will Python append value="This Is the Name of the Title" to the element even though it's missing from the original HTML?
Even though the element has a missing value attribute, will this POST still work correctly?
Yes, it will. The POST request is made without fetching any HTML at all.
You don't need this line for the POST request:
s.get(url)
I.e., will Python append value="This Is the Name of the Title" to the element
No, Python will not append anything. Python won't even look at the GET response content (if a GET request is made at all).
It just opens a TCP connection and sends the data.
You haven't explained what that HTML is or how it relates to the Python code, but in any case the HTML has no bearing here. The POST request is made by the requests module, not by the HTML, so it gets its value from whatever you put into the parameters of the post() call.
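To illustrate, a minimal sketch (the URL is a placeholder); the dict keys just need to match the form's field names, and no GET or HTML parsing is involved:
import requests

url = 'http://example.com/submit'  # placeholder endpoint
# the keys must match the form field names; the HTML value attribute plays no role
resp = requests.post(url, data={'title_input': 'This Is the Name of the Title'})
print(resp.status_code)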
I've written a script in Python that uses POST requests to fetch JSON content from a webpage. The script does just fine if I stick to its default page. However, my intention is to create a loop to collect the content from a few different pages. The only problem I'm struggling with is how to vary the page keyword within the payload in order to loop over three different pages. Consider my faulty approach a placeholder.
How can I use format within a dict in order to change page numbers?
Working script (if I get rid of the pagination loop):
import requests
link = 'https://nsv3auess7-3.algolianet.com/1/indexes/idealist7-production/query?x-algolia-agent=Algolia%20for%20vanilla%20JavaScript%203.30.0&x-algolia-application-id=NSV3AUESS7&x-algolia-api-key=c2730ea10ab82787f2f3cc961e8c1e06'
for page in range(0,3):
    payload = {"params":"getRankingInfo=true&clickAnalytics=true&facets=*&hitsPerPage=20&page={}&attributesToSnippet=%5B%22description%3A20%22%5D&attributesToRetrieve=objectID%2Ctype%2Cpublished%2Cname%2Ccity%2Cstate%2Ccountry%2Curl%2CorgID%2CorgUrl%2CorgName%2CorgType%2CgroupID%2CgroupUrl%2CgroupName%2CisFullTime%2CremoteOk%2Cpaid%2ClocalizedStarts%2ClocalizedEnds%2C_geoloc&filters=(orgType%3A'NONPROFIT')%20AND%20type%3A'JOB'&aroundLatLng=40.7127837%2C%20-74.0059413&aroundPrecision=15000&minimumAroundRadius=16000&query="}
    res = requests.post(link,json=payload.format(page)).json()
    for item in res['hits']:
        print(item['name'])
I get an error when I run the script as it is:
res = requests.post(link,json=payload.format(page)).json()
AttributeError: 'dict' object has no attribute 'format'
format is a string method. You should apply it to the string value of your payload instead:
payload = {"params":"getRankingInfo=true&clickAnalytics=true&facets=*&hitsPerPage=20&page={}&attributesToSnippet=%5B%22description%3A20%22%5D&attributesToRetrieve=objectID%2Ctype%2Cpublished%2Cname%2Ccity%2Cstate%2Ccountry%2Curl%2CorgID%2CorgUrl%2CorgName%2CorgType%2CgroupID%2CgroupUrl%2CgroupName%2CisFullTime%2CremoteOk%2Cpaid%2ClocalizedStarts%2ClocalizedEnds%2C_geoloc&filters=(orgType%3A'NONPROFIT')%20AND%20type%3A'JOB'&aroundLatLng=40.7127837%2C%20-74.0059413&aroundPrecision=15000&minimumAroundRadius=16000&query=".format(page)}
res = requests.post(link,json=payload).json()
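As a side note, one way to avoid splicing the page number into one long string is to build the params value from a dict with urlencode. A sketch (only a few of the query fields are shown; the rest from the original params string would be added the same way):
import requests
from urllib.parse import urlencode  # Python 3; on Python 2: from urllib import urlencode

link = 'https://nsv3auess7-3.algolianet.com/1/indexes/idealist7-production/query?x-algolia-agent=Algolia%20for%20vanilla%20JavaScript%203.30.0&x-algolia-application-id=NSV3AUESS7&x-algolia-api-key=c2730ea10ab82787f2f3cc961e8c1e06'

for page in range(0, 3):
    query = {
        'getRankingInfo': 'true',
        'hitsPerPage': 20,
        'page': page,  # plain value, no string formatting needed
        'query': '',
        # ...add the remaining fields from the original params string here
    }
    res = requests.post(link, json={'params': urlencode(query)}).json()
    for item in res['hits']:
        print(item['name'])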
I am trying to have a requests.get statement with two URLs in it. What I am aiming to do is have requests (the Python module) make two requests based on a list or two strings I provide. How can I pass multiple strings from a list into a requests.get statement, and have requests go to each URL (string) and do something with it?
Thanks
Typically, if we're talking about the Python requests library, it only runs one GET request at a time. If what you're trying to do is perform multiple requests with a list of known URLs, then it's quite easy.
import requests
my_links = ['https://www.google.com', 'https://www.yahoo.com']  # include the scheme, or requests raises MissingSchema
my_responses = []
for link in my_links:
    payload = requests.get(link).text  # use .json() instead when the endpoint returns JSON
    print('got response from {}'.format(link))
    my_responses.append(payload)
    print(payload)
my_responses now has all the content from the pages.
You don't. The requests.get() method (or any other method, really) takes a single URL and makes a single HTTP request, because that is what most humans want it to do.
If you need to make two requests, you must call that method twice.
requests.get(url)
requests.get(another_url)
Of course, these calls are synchronous: the second will only begin once the first response has been received.
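If you need the requests to overlap rather than run back to back, a thread pool is one common approach. A minimal sketch (the URLs are placeholders):
import requests
from concurrent.futures import ThreadPoolExecutor

urls = ['https://www.example.com', 'https://www.example.org']  # placeholder URLs

def fetch(u):
    # each call still blocks, but the pool runs the calls in separate threads
    return requests.get(u).text

with ThreadPoolExecutor(max_workers=len(urls)) as pool:
    pages = list(pool.map(fetch, urls))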
I am trying to automate a web page request using mechanize in Python.
When I add custom headers like
X-Session = 'abc'
and
X-Auth = '123'
via the addheaders attribute:
browser = mechanize.Browser()
browser.addheaders = [('X-Session', 'abc'), ('X-Auth', '123')]
mechanize changes those headers to X-session and X-auth.
I believe that is why the server is not able to authenticate me.
Can anybody help me figure out how to preserve the case?
Thanks.
Mechanize expects each header to be a two-item tuple: the first item is the header name, the second is the value, so you must do:
browser.addheaders = [('X-Session', 'abc'), ('X-Auth', '123')]
(two tuples of two elements instead of one tuple with four elements).
To check the headers that mechanize will send with the request (where request is a mechanize.Request object), you can do:
print(request.header_items())
This should print something like:
[('X-Session','abc'), ('X-Auth','123')]
Doc: http://wwwsearch.sourceforge.net/mechanize/doc.html#adding-headers
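A hedged end-to-end sketch of that format in context (example.com is a placeholder endpoint):
import mechanize

browser = mechanize.Browser()
# a list of (name, value) two-tuples, one tuple per header
browser.addheaders = [('X-Session', 'abc'), ('X-Auth', '123')]
response = browser.open('http://example.com')  # placeholder URL
print(response.code)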
At the URL https://www.airbnb.com/rooms/3093543, there is a map that loads near the bottom of the page containing a 'neighborhood' box that says Presidio. It's stored in a tag simply as Presidio.
I'm trying to get it with this:
profile = BeautifulSoup(requests.get("https://www.airbnb.com/rooms/3093543").content, "html.parser")
print profile.select('div[id="hover-card"]')[0].find('a').text
# div[id="hover-card"] is not found
I'm not sure whether this is a dynamic element that can only be retrieved with another module, or whether it is possible to get it with requests.
You can get that data via another element.
Try this:
profile = BeautifulSoup(requests.get("https://www.airbnb.com/rooms/3093543").content, "html.parser")
print profile.select('meta[id="_bootstrap-neighborhood_card"]')[0]
And if needed request the map via:
https://www.airbnb.pt/locations/api/neighborhood_tiles.json?ids%5B%5D=ID
Where the ID in the above URL is given by the neighborhood_basic_info attribute in the first print.
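A rough sketch of that follow-up, assuming the meta tag carries JSON in its content attribute and that the ID sits under a neighborhood_basic_info key (both assumptions are inferred from the description above, not verified against the live page):
import json
import requests
from bs4 import BeautifulSoup

profile = BeautifulSoup(requests.get("https://www.airbnb.com/rooms/3093543").content, "html.parser")
meta = profile.select('meta[id="_bootstrap-neighborhood_card"]')[0]
card = json.loads(meta.get('content'))  # assumes JSON lives in the content attribute
neighborhood_id = card['neighborhood_basic_info']['id']  # hypothetical key layout
tiles = requests.get('https://www.airbnb.pt/locations/api/neighborhood_tiles.json',
                     params={'ids[]': neighborhood_id}).json()
print(tiles)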
I have been banging my head against this wall for a couple of days now, so I thought I would ask the SO community. I want a Python script that, among other things, can hit 'accept' buttons on forms on websites in order to download files. To that end, though, I need to get access to the form.
This is an example of the kind of file I want to download. I know that within it, there is an unnamed form with an action to accept the terms and download the file. I also know that the div that form can be found in is the main-content div.
However, whenever I BeautifulSoup parse the webpage, I cannot get the main-content div. The closest I've managed to get is the main_content link right before it, which does not provide me any information through BeautifulSoup's object.
Here's a bit of code from my script:
web_soup = soup(urllib2.urlopen(url))
parsed = list(urlparse(url))
ext = extr[1:]
for downloadable in web_soup.findAll("a"):
    encode = unicodedata.normalize('NFKD', downloadable.text).encode('UTF-8', 'ignore')
    if ext in str.lower(encode):
        if downloadable['href'] in url:
            return ("http://%s%s" % (parsed[1], downloadable['href']))
for div in web_soup.findAll("div"):
    if div.has_key('class'):
        print(div['class'])
        if div['class'] == "main-content":
            print("Yep")
return False
url is the URL I am looking at (so the URL I posted earlier). extr is the type of file I am hoping to download, in the form .extension, but that is not really relevant to my question. The code that is relevant is the second for loop, the one where I am attempting to loop through the divs. The first bit of code (the first for loop) goes through and grabs download links in another case (when the URL the script is given is a 'download link' marked by a file extension such as .zip with a content type of text/html), so feel free to ignore it. I added it just for context.
I hope I provided enough detail, though I am sure I did not. Let me know if you need any more information on what I am doing and I will be happy to oblige. Thanks, Stack.
Here's the code for getting main-content div and form action:
import re
import urllib2
from bs4 import BeautifulSoup as soup
url = "http://www.cms.gov/apps/ama/license.asp?file=/McrPartBDrugAvgSalesPrice/downloads/Apr-13-ASP-Pricing-file.zip"
web_soup = soup(urllib2.urlopen(url))
# get main-content div
main_div = web_soup.find(name="div", attrs={'class': 'main-content'})
print main_div
# get form action
form = web_soup.find(name="form", attrs={'action': re.compile('.*\.zip.*')})
print form['action']
Though, if you need, I can provide examples for lxml, mechanize or selenium.
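For instance, a rough lxml version of the same two lookups (a sketch, reusing the same url):
import urllib2
from lxml import html

url = "http://www.cms.gov/apps/ama/license.asp?file=/McrPartBDrugAvgSalesPrice/downloads/Apr-13-ASP-Pricing-file.zip"
tree = html.fromstring(urllib2.urlopen(url).read())

# div with class main-content
print(tree.xpath('//div[@class="main-content"]'))
# action of the form that points at the .zip file
print(tree.xpath('//form[contains(@action, ".zip")]/@action'))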
Hope that helps.