Submitting to a web form using Python

I have seen questions like this asked many, many times, but none of the answers have been helpful.
I'm trying to submit data to a form on the web. I've tried requests and urllib, and neither has worked.
For example, here is code that should search for the [python] tag on SO:
import urllib
import urllib2
url = 'http://stackoverflow.com/'
# Prepare the data
values = {'q' : '[python]'}
data = urllib.urlencode(values)
# Send HTTP POST request
req = urllib2.Request(url, data)
response = urllib2.urlopen(req)
html = response.read()
# Print the result
print html
Yet when I run it, I get the HTML source of the home page.
Here is an example using requests:
import requests
data = {
    'q': '[python]'
}
r = requests.get('http://stackoverflow.com', data=data)
print r.text
Same result! I don't understand why these methods aren't working. I've tried them on various sites with no success, so if anyone has successfully done this, please show me how!
Thanks so much!

If you want to pass q as a parameter in the URL using requests, use the params argument, not data (see Passing Parameters In URLs):
r = requests.get('http://stackoverflow.com', params=data)
This will request https://stackoverflow.com/?q=%5Bpython%5D, which isn't what you are looking for.
You really want to POST to a form. Try this:
r = requests.post('https://stackoverflow.com/search', data=data)
This is essentially the same as GET-ting https://stackoverflow.com/questions/tagged/python , but I think you'll get the idea from this.
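To see what params does without hitting the network, you can build the query string yourself with the standard library (a Python 3 sketch; the thread's own code is Python 2):

```python
from urllib.parse import urlencode

# urlencode percent-escapes the brackets, exactly as requests does for params
values = {'q': '[python]'}
query = urlencode(values)
url = 'https://stackoverflow.com/search?' + query
print(url)  # https://stackoverflow.com/search?q=%5Bpython%5D
```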

import urllib
import urllib2
url = 'http://www.someserver.com/cgi-bin/register.cgi'
values = {'name' : 'Michael Foord',
          'location' : 'Northampton',
          'language' : 'Python'}
data = urllib.urlencode(values)
req = urllib2.Request(url, data)
response = urllib2.urlopen(req)
the_page = response.read()
This makes a POST request with the data specified in values. We need urllib to encode the data, and then urllib2 to send the request.
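For reference, urllib and urllib2 were merged into urllib.parse and urllib.request in Python 3, so the same POST looks like this (same placeholder URL as above; note the body must be bytes):

```python
from urllib.parse import urlencode
from urllib.request import Request, urlopen

values = {'name': 'Michael Foord',
          'location': 'Northampton',
          'language': 'Python'}
data = urlencode(values).encode('ascii')  # the request body must be bytes
req = Request('http://www.someserver.com/cgi-bin/register.cgi', data=data)
# the_page = urlopen(req).read()  # would send the POST; skipped here, placeholder host
```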

The mechanize library for Python is also great, and even lets you submit forms. You can use the following code to create a browser object and make requests.
import mechanize, re

br = mechanize.Browser()
br.set_handle_robots(False)   # ignore robots.txt
br.set_handle_refresh(False)  # can sometimes hang without this
br.addheaders = [('User-agent', 'Firefox')]
br.open("http://google.com")
br.select_form('f')
br.form['q'] = 'foo'
br.submit()
resp = None
for link in br.links():
    siteMatch = re.compile('www.foofighters.com').search(link.url)
    if siteMatch:
        resp = br.follow_link(link)
        break
content = resp.get_data()
print content

Related

How do I find the required parameters for the payload when logging into a website with requests?

This is what I have so far. I'm very new to this, so point out if I'm doing anything wrong.
import requests

url = "https://9anime.to/user/watchlist"
payload = {
    "username": "blahblahblah",
    "password": "secretstringy"
    # anything else?
}
with requests.Session() as s:
    res = s.post(url, data=payload)
    print(res.status_code)
You will need to inspect the form and see where the form is posting to in the action tag. In this case it is posting to user/ajax/login. Instead of requesting the watchlist URL you should post those details to the loginurl. Once you are logged in you can request your watchlist.
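Finding the form's action target can also be done in code. A minimal sketch using the standard library's html.parser, run here against a hypothetical one-line snippet of the login page (the real page is of course more complex):

```python
from html.parser import HTMLParser

class FormActionFinder(HTMLParser):
    """Collect the action attribute of every <form> tag seen."""
    def __init__(self):
        super().__init__()
        self.actions = []

    def handle_starttag(self, tag, attrs):
        if tag == 'form':
            self.actions.append(dict(attrs).get('action'))

parser = FormActionFinder()
# hypothetical snippet; on the real page the action is user/ajax/login
parser.feed('<form action="/user/ajax/login" method="post"></form>')
print(parser.actions)  # ['/user/ajax/login']
```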
from lxml.html import fromstring
import requests

url = "https://9anime.to/user/watchlist"
loginurl = "https://9anime.to/user/ajax/login"
payload = {
    "username": "someemail@gmail.com",
    "password": "somepass"
}
with requests.Session() as s:
    res = s.post(loginurl, data=payload)
    print(res.content)
    # b'{"success":true,"message":"Login successful"}'
    res = s.get(url)
    tree = fromstring(res.content)
    elem = tree.cssselect("div.watchlist div.widget-body")[0]
    print(elem.text_content())
    # Your watch list is empty.
You would need to have knowledge (documentation of some form) of what that URL is expecting and how you are expected to interact with it. There is no way to know given only the information you have provided.
If you have some system that is already able to interact with that URL (e.g. you're able to log in with your browser), then you could try to reverse-engineer what your browser is doing...

Unable to login to indeed.com using python requests

I'm trying to write a code to collect resumes from "indeed.com" website.
In order to download resumes from "indeed.com" you have to login with your account.
The problem is that after posting the data I get response [200], which indicates a successful request, but the login still fails.
Here is my code :
import requests
from bs4 import BeautifulSoup

page = requests.get('https://secure.indeed.com/account/login')
soup = BeautifulSoup(page.content, 'html.parser')
row_text = soup.text
surftok = str(row_text[row_text.find('"surftok":') + 11:row_text.find('","tmpl":')])
formtok = str(row_text[row_text.find('"tk":') + 6:row_text.find('","variation":')])
logintok = str(row_text[row_text.find('"loginTk":') + 11:row_text.find('","debugBarLink":')])
cfb = int(str(row_text[row_text.find('"cfb":') + 6:row_text.find(',"pvr":')]))
pvr = int(str(row_text[row_text.find('"pvr":') + 6:row_text.find(',"obo":')]))
hl = str(row_text[row_text.find('"hl":') + 6:row_text.find('","co":')])
data = {
    'action': 'login',
    '__email': 'myEmail',
    '__password': 'myPassword',
    'remember': '1',
    'hl': hl,
    'cfb': cfb,
    'pvr': pvr,
    'form_tk': formtok,
    'surftok': surftok,
    'login_tk': logintok
}
response = requests.post("https://secure.indeed.com/", data=data)
print response
print 'myEmail' in response.text
It shows me response [200], but when I search for my email in the response page to confirm that the login was successful, I don't find it. It seems the login failed for a reason I don't know.
Send headers as well in your POST request; you can copy them from the request headers shown in your browser's developer tools.
headers = {'user-agent': 'Chrome'}
response = requests.post("https://secure.indeed.com/", headers=headers, data=data)
Some websites use JavaScript redirection, and indeed.com is one of them. Unfortunately, python requests does not support JavaScript redirection. In such situations we may use Selenium.

What are the corresponding parameters of Requests "data" and "params" in urllib2?

I have been successfully using the python Requests module to send POST requests to a server, along these lines:
resp = requests.request("POST", url, proxies=proxies, data=data, headers=headers, params=params, timeout=timeout)
However, for a certain reason I now need to use the python urllib2 module instead. For urllib2.urlopen's "data" parameter, my understanding is that it helps to form the query string (which is the same as Requests' "params"). requests.request's "data" parameter, on the other hand, is used to fill the request body.
After searching and reading many posts, examples, and documentation, I still haven't been able to figure out what the counterpart of requests.request's "data" is in urllib2.
Any advice is much appreciated! Thanks.
-Janton
The name doesn't matter; what matters is where you pass it. In the example below, the POST data is a dictionary (the name can be anything).
The dictionary is urlencoded, and the urlencoded result (which I've called "postdata", though again the name can be anything) is the data that gets POSTed as the request body.
import urllib # for the urlencode
import urllib2
searchdict = {'q' : 'urllib2'}
url = 'https://duckduckgo.com/html'
postdata = urllib.urlencode(searchdict)
req = urllib2.Request(url, postdata)
response = urllib2.urlopen(req)
print response.read()
print response.getcode()
If your POST data is already an encoded string (not a Python type such as a dictionary), it can work without urllib.urlencode:
import urllib2
searchstring = 'q=urllib2'
url = 'https://duckduckgo.com/html'
req = urllib2.Request(url, searchstring)
response = urllib2.urlopen(req)
print response.read()
print response.getcode()
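Putting it together: requests' params ends up in the URL, while requests' data, urlencoded, is exactly what urllib2's Request takes as its second argument (the body). A small offline sketch of the mapping, using Python 3 names (urllib.urlencode in Python 2 became urllib.parse.urlencode in Python 3):

```python
from urllib.parse import urlencode

payload = {'q': 'urllib2'}

# requests: params=payload  ->  appended to the URL as a query string
query_url = 'https://duckduckgo.com/html?' + urlencode(payload)

# requests: data=payload    ->  urlencoded and sent as the request body,
# i.e. the second argument of urllib2.Request(url, body)
body = urlencode(payload)
print(query_url)  # https://duckduckgo.com/html?q=urllib2
print(body)       # q=urllib2
```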

Why does the browser display the entity name?

import urllib2
url = "http://www.reddit.com/r/pics/hot.json"
hdr = { 'User-Agent' : 'super happy flair bot by /u/spladug' }
req = urllib2.Request(url, headers=hdr)
html = urllib2.urlopen(req).read()
data = '{0}({1})'.format("callback", html)
print data
I want to use some data scraped from these sites and make the return value support JSONP. But the browser outputs:
callback({&quot;kind&quot;: &quot;Listing&quot;, &quot;data&quot;:........
Actually I want:
callback({"kind": "Listing", "data":........
I don't want the quotes converted to &quot;. Maybe I shouldn't use read(), but I don't know how else to handle it.
Can someone help me?

Python - POSTing with a urllib2 opener

I have a urllib2 opener, and wish to use it for a POST request with some data.
I am looking to receive the content of the page that I am POSTing to, and also the URL of the page that is returned (I think this is just a 30x redirect, so something along those lines would be awesome!).
Think of this as the code:
anOpener = urllib2.build_opener(???,???)
anOpener.addheaders = [(???,???),(???,???),...,(???,???)]
# do some other stuff with the opener
data = urllib.urlencode(dictionaryWithPostValues)
pageContent = anOpener.THE_ANSWER_TO_THIS_QUESTION
pageURL = anOpener.THE_SECOND_PART_OF_THIS_QUESTION
This is such a silly question once one realizes the answer.
Just use:
anOpener.open(url, data)
for the first part, and, as Rachel Sanders mentioned,
response.geturl()
on the returned response for the second part.
I really can't figure out how the whole Request/opener thing works, though; I couldn't find any good documentation. :/
This page should help you out:
http://www.voidspace.org.uk/python/articles/urllib2.shtml#data
import urllib
import urllib2
url = 'http://www.someserver.com/cgi-bin/register.cgi'
values = {'name' : 'Michael Foord',
          'location' : 'Northampton',
          'language' : 'Python' }
data = urllib.urlencode(values)
req = urllib2.Request(url, data)
response = urllib2.urlopen(req)
the_page = response.read()
the_url = response.geturl() # <- doc claims this gets the redirected url
It looks like you can also use response.info() to get the Location header directly instead of using .geturl().
Hope that helps!
If you add data to the request, the method is automatically changed to POST. Check out the following example:
import urllib2
import json
url = "http://server.local/x/y"
data = {"name":"JackBauer"}
method = "PUT"
request = urllib2.Request(url)
request.add_header("Content-Type", "application/json")
request.get_method = lambda: method
if data: request.add_data(json.dumps(data))
response = urllib2.urlopen(request)
if response: print response.read()
As I mentioned, the lambda is not needed if you use GET/POST.
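On Python 3 the lambda trick isn't needed at all: urllib.request.Request accepts a method argument directly (add_data is also gone there; pass data= as bytes instead). A quick sketch with the same hypothetical URL:

```python
import json
from urllib.request import Request

# build a PUT request with a JSON body; no get_method override required
data = json.dumps({"name": "JackBauer"}).encode("utf-8")
request = Request("http://server.local/x/y",
                  data=data,
                  headers={"Content-Type": "application/json"},
                  method="PUT")
print(request.get_method())  # PUT
```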
