I have a urllib2 opener, and wish to use it for a POST request with some data.
I want to receive the content of the page I am POSTing to, and also the URL of the page that is returned (I think this just involves a 30x redirect, so something along those lines would be awesome!).
Think of this as the code:
anOpener = urllib2.build_opener(???,???)
anOpener.addheaders = [(???,???),(???,???),...,(???,???)]
# do some other stuff with the opener
data = urllib.urlencode(dictionaryWithPostValues)
pageContent = anOpener.THE_ANSWER_TO_THIS_QUESTION
pageURL = anOpener.THE_SECOND_PART_OF_THIS_QUESTION
This is such a silly question once one realizes the answer.
Just use:
anOpener.open(URL, data)
for the first part, and, as Rachel Sanders mentioned,
geturl()
on the response for the second part.
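Putting it all together, a minimal sketch (the endpoint, header, and POST values here are hypothetical placeholders):
import urllib
import urllib2

URL = 'http://www.example.com/form'            # hypothetical endpoint
dictionaryWithPostValues = {'key': 'value'}    # hypothetical POST values

anOpener = urllib2.build_opener()
anOpener.addheaders = [('User-agent', 'Mozilla/5.0')]

data = urllib.urlencode(dictionaryWithPostValues)
response = anOpener.open(URL, data)   # supplying data makes this a POST
pageContent = response.read()         # content of the page you POSTed to
pageURL = response.geturl()           # final URL after any redirects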
I really can't figure out how the whole Request/opener thing works though; I couldn't find any nice documentation :/
This page should help you out:
http://www.voidspace.org.uk/python/articles/urllib2.shtml#data
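To address the Request/opener confusion directly: a Request object bundles the URL, POST data, and headers, while an opener is a chain of handlers that performs the actual fetch (urllib2.urlopen simply uses a default opener). A minimal sketch of the relationship (the cookie handler and URL are illustrative assumptions):
import urllib2

# build_opener chains handlers (cookies, redirects, proxies, ...) into an opener
opener = urllib2.build_opener(urllib2.HTTPCookieProcessor())
request = urllib2.Request('http://www.example.com/', headers={'User-agent': 'Mozilla/5.0'})
response = opener.open(request)   # like urllib2.urlopen(request), plus the extra handlers
print response.read()
And here is the basic POST flow from that page: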
import urllib
import urllib2
url = 'http://www.someserver.com/cgi-bin/register.cgi'
values = {'name' : 'Michael Foord',
          'location' : 'Northampton',
          'language' : 'Python' }
data = urllib.urlencode(values)
req = urllib2.Request(url, data)
response = urllib2.urlopen(req)
the_page = response.read()
the_url = response.geturl() # <- doc claims this gets the redirected url
It looks like you can also use response.info() to read the Location header directly instead of using .geturl(); note, though, that urllib2 follows redirects by default, so the final response usually won't carry a Location header.
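If you do want to inspect the raw Location header, you first have to stop urllib2 from following the redirect. A minimal sketch (assuming Python 2's urllib2; the URL is a hypothetical placeholder):
import urllib
import urllib2

class NoRedirectHandler(urllib2.HTTPRedirectHandler):
    # hand the 30x response back to the caller instead of following it
    def http_error_302(self, req, fp, code, msg, headers):
        infourl = urllib.addinfourl(fp, headers, req.get_full_url())
        infourl.code = code
        return infourl
    http_error_301 = http_error_303 = http_error_307 = http_error_302

opener = urllib2.build_opener(NoRedirectHandler())
response = opener.open('http://www.example.com/redirecting-page')
print response.code                     # e.g. 302
print response.info().get('Location')   # the redirect target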
Hope that helps!
If you add data to the request, the method is automatically changed to POST. Check out the following example:
import urllib2
import json
url = "http://server.local/x/y"
data = {"name":"JackBauer"}
method = "PUT"
request = urllib2.Request(url)
request.add_header("Content-Type", "application/json")
request.get_method = lambda: method   # force the HTTP verb; not needed for GET/POST
if data:
    request.add_data(json.dumps(data))
response = urllib2.urlopen(request)
if response:
    print response.read()
As I mentioned, the lambda is not needed if you use GET or POST.
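For comparison, here is a minimal POST sketch without the lambda, reusing the same hypothetical endpoint but sending urlencoded form data instead of JSON:
import urllib
import urllib2

url = "http://server.local/x/y"        # same hypothetical endpoint as above
data = urllib.urlencode({"name": "JackBauer"})

request = urllib2.Request(url, data)   # supplying data makes this a POST
response = urllib2.urlopen(request)
print response.read()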
I have been successfully using the Python Requests module to send POST requests to a server with specified parameters:
resp = requests.request("POST", url, proxies=proxies, data=data, headers=headers, params=params, timeout=timeout)
However, for a certain reason I now need to use the urllib2 module to query instead. For urllib2.urlopen's "data" parameter, my understanding is that it helps form the query string (the same as Requests' "params"). requests.request's "data" parameter, on the other hand, is used to fill the request body.
After searching and reading many posts, examples, and documentation, I still have not been able to figure out what the corresponding parameter of requests.request's "data" is in urllib2.
Any advice is much appreciated! Thanks.
It doesn't matter what it is called; it is a matter of passing it in at the right place. For example, in this example the POST data is a dictionary (the name can be anything).
The dictionary is urlencoded, and the urlencoded result can again be bound to any name, but I've picked "postdata", since that is the data that gets POSTed:
import urllib # for the urlencode
import urllib2
searchdict = {'q' : 'urllib2'}
url = 'https://duckduckgo.com/html'
postdata = urllib.urlencode(searchdict)
req = urllib2.Request(url, postdata)
response = urllib2.urlopen(req)
print response.read()
print response.getcode()
If your POST data is plain text (not a Python type such as a dictionary) it can work without urllib.urlencode:
import urllib2
searchstring = 'q=urllib2'
url = 'https://duckduckgo.com/html'
req = urllib2.Request(url, searchstring)
response = urllib2.urlopen(req)
print response.read()
print response.getcode()
So, all I want to do is send a request to the 511 API and return the train times from the train station. I can do that using the full URL request, but I would like to be able to set values without pasting together a string and then sending that string. I want to have the API return the train times for different stations. I see other requests that use headers, but I don't know how to use headers with a request and am confused by the documentation.
This works...
urllib2.Request("http://services.my511.org/Transit2.0/GetNextDeparturesByStopCode.aspx?token=xxxx-xxx&stopcode=70142")
response = urllib2.urlopen(request)
the_page = response.read()
I want to be able to set values like this...
token = 'xxx-xxxx'
stopcode = 70142
url = "http://services.my511.org/Transit2.0/GetNextDeparturesByStopCode.aspx?"
... and then put them together like this...
urllib2.Request(url,token, stopcode)
and get the same result.
The string formatting documentation would be a good place to start to learn more about different ways to plug in values.
val1 = 'test'
val2 = 'test2'
url = "https://www.example.com/{0}/blah/{1}".format(val1, val2)
urllib2.Request(url)
The missing piece is that "urllib" needs to be used along with "urllib2". Specifically, the function urllib.urlencode() returns the encoded version of the values.
From the urllib documentation:
import urllib
query_args = { 'q':'query string', 'foo':'bar' }
encoded_args = urllib.urlencode(query_args)
print 'Encoded:', encoded_args
url = 'http://localhost:8080/?' + encoded_args
print urllib.urlopen(url).read()
So the corrected code is as follows:
import urllib
import urllib2
token = 'xxx-xxxx'
stopcode = 70142
url = "http://services.my511.org/Transit2.0/GetNextDeparturesByStopCode.aspx?"
query_args = {"token": token, "stopcode": stopcode}
encoded_args = urllib.urlencode(query_args)
request = urllib2.Request(url + encoded_args)
response = urllib2.urlopen(request)
print(response.read())
Actually, it is a million times easier to use the requests package instead of urllib and urllib2. All of the code above can be replaced with this:
import requests
token = 'xxx-xxxx'
stopcode = 70142
url = "http://services.my511.org/Transit2.0/GetNextDeparturesByStopCode.aspx"
query_args = {"token": token, "stopcode": stopcode}
r = requests.get(url, params=query_args)
r.text
I've been looking through the Python Requests documentation but I cannot see any functionality for what I am trying to achieve.
In my script I am setting allow_redirects=True.
I would like to know, if the page has been redirected to something else, what the new URL is.
For example, if the start URL was: www.google.com/redirect
And the final URL is www.google.co.uk/redirected
How do I get that URL?
You are looking for the request history.
The response.history attribute is a list of responses that led to the final URL, which can be found in response.url.
response = requests.get(someurl)
if response.history:
    print("Request was redirected")
    for resp in response.history:
        print(resp.status_code, resp.url)
    print("Final destination:")
    print(response.status_code, response.url)
else:
    print("Request was not redirected")
Demo:
>>> import requests
>>> response = requests.get('http://httpbin.org/redirect/3')
>>> response.history
(<Response [302]>, <Response [302]>, <Response [302]>)
>>> for resp in response.history:
... print(resp.status_code, resp.url)
...
302 http://httpbin.org/redirect/3
302 http://httpbin.org/redirect/2
302 http://httpbin.org/redirect/1
>>> print(response.status_code, response.url)
200 http://httpbin.org/get
This is answering a slightly different question, but since I got stuck on this myself, I hope it might be useful for someone else.
If you want to use allow_redirects=False and get directly at the first redirect object, rather than following a chain of redirects, and you just want the redirect location straight out of the 302 response object, then r.url won't work. Instead, it's the "Location" header:
r = requests.get('http://github.com/', allow_redirects=False)
r.status_code # 302
r.url # http://github.com, not https.
r.headers['Location'] # https://github.com/ -- the redirect destination
I think requests.head is safer to call than requests.get when handling URL redirects (there is a related GitHub issue):
r = requests.head(url, allow_redirects=True)
print(r.url)
The documentation has this blurb: https://requests.readthedocs.io/en/master/user/quickstart/#redirection-and-history
import requests
r = requests.get('http://www.github.com')
r.url
# returns https://www.github.com instead of the http page you asked for
For Python 3.5, you can use the following code:
import urllib.request
res = urllib.request.urlopen(starturl)
finalurl = res.geturl()
print(finalurl)
I wrote the following function to get the full URL from a short URL (bit.ly, t.co, ...)
import requests
def expand_short_url(url):
    r = requests.head(url, allow_redirects=False)
    r.raise_for_status()
    if 300 < r.status_code < 400:
        url = r.headers.get('Location', url)
    return url
Usage (short URL is this question's url):
short_url = 'https://tinyurl.com/' + '4d4ytpbx'
full_url = expand_short_url(short_url)
print(full_url)
Output:
https://stackoverflow.com/questions/20475552/python-requests-library-redirect-new-url
I wasn't able to use the requests library and had to go a different way. Here is the code that I posted as a solution to this post (to get the redirected URL with requests).
This way you actually open the browser, wait for your browser to log the URL in its history, and then read the last URL in that history. I wrote this code for Google Chrome, but you should be able to follow along if you are using a different browser.
import webbrowser
import sqlite3
import time
import shutil
import pandas as pd
webbrowser.open("https://twitter.com/i/user/2274951674")
# the source file is where your browser saves its history; I was using Chrome,
# but it should be the same process if you are using a different browser
source_file = 'C:\\Users\\{user}\\AppData\\Local\\Google\\Chrome\\User Data\\Default\\History'
# could not connect to the history file directly as it was locked,
# so I had to make a copy of it in a different location
destination_file = 'C:\\Users\\{user}\\Downloads\\History'
time.sleep(30)  # there is some delay before the history file updates, so a 30-second wait gives it enough time to make sure your last URL gets logged
shutil.copy(source_file, destination_file)  # copy the locked history file
con = sqlite3.connect(destination_file)     # connect to the copied browser history
cursor = con.execute("SELECT * FROM urls")
names = [description[0] for description in cursor.description]
urls = cursor.fetchall()
con.close()
df_history = pd.DataFrame(urls,columns=names)
last_url = df_history.loc[len(df_history)-1,'url']
print(last_url)
>>https://twitter.com/ozanbayram01
All the answers above apply when the final URL exists and works fine.
In case the final URL doesn't seem to work, below is a way to capture all redirects.
I had a scenario where the final URL wasn't working anymore, and other ways, like reading the URL history, gave errors.
Code Snippet
import requests

long_url = ''
url = 'http://example.com/bla-bla'
try:
    while True:
        long_url = requests.head(url).headers['location']
        print(long_url)
        url = long_url
except KeyError:
    # the last response has no 'location' header, i.e. no further redirect
    print(long_url)
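A variant of the same idea with a hop limit (to guard against redirect loops) and handling for relative Location headers; the example URL is the same placeholder as above:
import requests
from urllib.parse import urljoin   # Python 3

def follow_redirects(url, max_hops=10):
    # Walk the Location headers manually; this works even if the final URL
    # is dead, and the hop limit guards against redirect loops.
    chain = [url]
    for _ in range(max_hops):
        resp = requests.head(url, allow_redirects=False)
        location = resp.headers.get('location')
        if location is None:   # no further redirect
            break
        url = urljoin(url, location)   # resolve relative redirects
        chain.append(url)
    return chain

print(follow_redirects('http://example.com/bla-bla'))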
I have seen questions like this asked many, many times but none are helpful.
I'm trying to submit data to a form on the web. I've tried requests and urllib and neither has worked.
For example, here is code that should search for the [python] tag on SO:
import urllib
import urllib2
url = 'http://stackoverflow.com/'
# Prepare the data
values = {'q' : '[python]'}
data = urllib.urlencode(values)
# Send HTTP POST request
req = urllib2.Request(url, data)
response = urllib2.urlopen(req)
html = response.read()
# Print the result
print html
Yet when I run it I get the HTML source of the home page.
Here is an example of using requests:
import requests
data = {
    'q': '[python]'
}
r = requests.get('http://stackoverflow.com', data=data)
print r.text
Same result! I don't understand why these methods aren't working; I've tried them on various sites with no success. So if anyone has successfully done this, please show me how!
Thanks so much!
If you want to pass q as a parameter in the URL using requests, use the params argument, not data (see Passing Parameters In URLs):
r = requests.get('http://stackoverflow.com', params=data)
This will request https://stackoverflow.com/?q=%5Bpython%5D, which isn't what you are looking for.
You really want to POST to a form. Try this:
r = requests.post('https://stackoverflow.com/search', data=data)
This is essentially the same as GET-ting https://stackoverflow.com/questions/tagged/python, but I think you'll get the idea from this.
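To see where each request actually ends up, you can print the final URL after redirects (a quick check using the same endpoints as above):
import requests

data = {'q': '[python]'}

r1 = requests.get('http://stackoverflow.com', params=data)
print r1.url   # the query string ends up in the URL

r2 = requests.post('https://stackoverflow.com/search', data=data)
print r2.url   # where the search form finally lands after redirects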
import urllib
import urllib2
url = 'http://www.someserver.com/cgi-bin/register.cgi'
values = {'name' : 'Michael Foord',
          'location' : 'Northampton',
          'language' : 'Python' }
data = urllib.urlencode(values)
req = urllib2.Request(url, data)
response = urllib2.urlopen(req)
the_page = response.read()
This makes a POST request with the data specified in values. We need urllib to encode the data and then urllib2 to send the request.
The Mechanize library for Python is also great, and even allows you to submit forms. You can use the following code to create a browser object and make requests.
import re
import mechanize

br = mechanize.Browser()
br.set_handle_robots(False)    # ignore robots.txt
br.set_handle_refresh(False)   # can sometimes hang without this
br.addheaders = [('User-agent', 'Firefox')]
br.open("http://google.com")
br.select_form('f')            # select the form named 'f'
br.form['q'] = 'foo'
br.submit()
resp = None
for link in br.links():
    siteMatch = re.compile('www.foofighters.com').search(link.url)
    if siteMatch:
        resp = br.follow_link(link)
        break
content = resp.get_data()
print content
I'm trying to rewrite some old Python code with the requests module.
The purpose is to upload an attachment.
The mail server requires the following specification :
https://api.elasticemail.com/attachments/upload?username=yourusername&api_key=yourapikey&file=yourfilename
Old code which works:
h = httplib2.Http()
resp, content = h.request('https://api.elasticemail.com/attachments/upload?username=omer&api_key=b01ad0ce&file=tmp.txt',
                          "PUT", body=file(filepath).read(),
                          headers={'content-type': 'text/plain'})
I didn't find how to set the body part in requests.
I managed to do the following:
response = requests.put('https://api.elasticemail.com/attachments/upload',
                        data={"file": filepath},
                        auth=('omer', 'b01ad0ce'))
But I have no idea how to specify the body part with the content of the file.
Thanks for your help.
Quoting from the docs:
data – (optional) Dictionary or bytes to send in the body of the Request.
So this should work (not tested):
filepath = 'yourfilename.txt'
with open(filepath) as fh:   # use mode 'rb' for binary attachments
    mydata = fh.read()

response = requests.put('https://api.elasticemail.com/attachments/upload',
                        data=mydata,
                        auth=('omer', 'b01ad0ce'),
                        headers={'content-type': 'text/plain'},
                        params={'file': filepath})
I got this working using Python and its requests module. With this we can provide the content of a file as the page input value. See the code below:
import json
import requests
url = 'https://Client.atlassian.net/wiki/rest/api/content/87440'
headers = {'Content-Type': "application/json", 'Accept': "application/json"}
f = open("file.html", "r")
html = f.read()
data={}
data['id'] = "87440"
data['type']="page"
data['title']="Data Page"
data['space']={"key":"AB"}
data['body'] = {"storage":{"representation":"storage"}}
data['version']={"number":4}
print(data)
data['body']['storage']['value'] = html
print(data)
res = requests.put(url, json=data, headers=headers, auth=('Username', 'Password'))
print(res.status_code)
print(res.raise_for_status())
Feel free to ask if you have any doubts.
NB: In this case the body of the request is being passed to the json kwarg.
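For illustration, the json kwarg is roughly shorthand for serializing the dict yourself and setting the Content-Type header. A minimal sketch using the same hypothetical URL and credentials, with a trimmed-down payload:
import json
import requests

url = 'https://Client.atlassian.net/wiki/rest/api/content/87440'
data = {'id': '87440', 'type': 'page'}   # trimmed-down payload for illustration

# requests.put(url, json=data, ...) is roughly equivalent to:
res = requests.put(url,
                   data=json.dumps(data),   # serialize the body yourself
                   headers={'Content-Type': 'application/json'},
                   auth=('Username', 'Password'))
print(res.status_code)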