Mechanize in Python changes HTTP header case

I am trying to automate a web page request using mechanize in Python.
When I add custom headers like
X-Session = 'abc'
and
X-Auth = '123'
using the addheaders attribute:
object = mechanize.Browser()
object.addheaders = [('X-Session', 'abc'), ('X-Auth', '123')]
it changes those headers to X-session and X-auth.
I believe that because of this the server is not able to authenticate me.
Can anybody help me maintain the case?
Thanks.

Mechanize expects each header to be a two-item tuple, where the first item is the header name and the second is the value, so you must do:
object.addheaders = [('X-Session', 'abc'), ('X-Auth', '123')]
(two tuples of two elements instead of one tuple with four elements).
To check the headers that mechanize will send with the request, you can do:
print(request.header_items())
This should print something like:
[('X-Session', 'abc'), ('X-Auth', '123')]
Doc: http://wwwsearch.sourceforge.net/mechanize/doc.html#adding-headers
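For completeness, here is a minimal sketch of how you could inspect the headers before anything is sent (the URL is a placeholder, and I am assuming mechanize.Request here, which I believe normalizes header names the same way the Browser does):
import mechanize

br = mechanize.Browser()
br.addheaders = [('X-Session', 'abc'), ('X-Auth', '123')]

# Build a request by hand, attach the same headers, and look at what
# would actually go on the wire. The underlying urllib2-style code
# capitalizes names, which is why 'X-Session' can come back as 'X-session'.
request = mechanize.Request('http://example.com/')
for name, value in br.addheaders:
    request.add_header(name, value)
print(request.header_items())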

Related

How to retrieve the domain of a web archived website using the archived url in Python?

Given a URL such as:
http://web.archive.org/web/20010312011552/www.feralhouse.com/cgi-bin/store/commerce.cgi?page=ac2.html
Is there a way (using some library, package, or vanilla Python) to retrieve the domain "www.feralhouse.com"?
I thought of simply splitting at "www", splitting the second item of the result at "com", and re-joining the pieces, like:
url = "http://web.archive.org/web/20010312011552/www.feralhouse.com/cgi-bin/store/commerce.cgi?page=ac2.html"
url1=url.split("www")
url2=url1[1].split("com")
desired_output = "www"+url2[0]+"com"
print(desired_output)
#www.feralhouse.com
But there are some exceptions to this method (sites with no www; I assume they rely on the browser adding it automatically). I would prefer a less "hacky" approach if possible. Thanks in advance!
NOTE: I don't want a solution just for this SPECIFIC url, I want a solution for all possible archived urls.
EDIT: Another example url
http://web.archive.org/web/20000614170338/http://www.clonejesus.com/
Two methods, one with split, one with the re module:
import re

s = 'http://web.archive.org/web/20010312011552/www.feralhouse.com/cgi-bin/store/commerce.cgi?page=ac2.html'

# Method 1: everything after the fifth '/', i.e. after the scheme,
# host, 'web' segment and the 14-digit timestamp.
print(s.split('/', 5)[-1])

# Method 2: everything after the 14-digit timestamp and its slash.
print(re.findall(r'\d{14}/(.*)', s)[0])
Prints:
www.feralhouse.com/cgi-bin/store/commerce.cgi?page=ac2.html
www.feralhouse.com/cgi-bin/store/commerce.cgi?page=ac2.html
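If you want just the domain rather than the whole original URL, a possible refinement is to parse the remainder with the standard library's urllib.parse (a sketch; the helper name archived_domain is made up, and it assumes the archive path always contains the 14-digit timestamp segment):
from urllib.parse import urlparse

def archived_domain(archive_url):
    # Drop the web.archive.org prefix, i.e. everything up to and
    # including the 14-digit timestamp segment.
    original = archive_url.split('/', 5)[-1]
    # Some snapshots keep the original scheme (http://...), some don't;
    # add one so urlparse treats the leading part as the host.
    if not original.startswith(('http://', 'https://')):
        original = 'http://' + original
    return urlparse(original).netloc

print(archived_domain('http://web.archive.org/web/20010312011552/www.feralhouse.com/cgi-bin/store/commerce.cgi?page=ac2.html'))
# www.feralhouse.com
print(archived_domain('http://web.archive.org/web/20000614170338/http://www.clonejesus.com/'))
# www.clonejesus.com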

GET request works only with params, not just the URL

I've just discovered something strange. When downloading data from Facebook with GET using the requests 2.18.4 library, I get an error when I just use
requests.get('https://.../{}/likes?acces_token={}'.format(userID, token))
into which I pass the user ID and access token - the API does not read the access token correctly.
But it works fine as
requests.get('https://../{}'.format(userID), params={"access_token": token})
It also works when I copy-paste the values into the appropriate fields by hand in the Python console.
So my hypothesis is that it has something to do with how the token string gets parsed when using params vs. the plain string. But what I don't understand at all is why that would be the case. Or is the ? character somehow special in this case?
Double-check that both URLs are really the same (in your post they differ by the /likes substring).
Then you can check how the requests library builds the final URL from the params argument:
import requests

url = 'https://facebook.com/.../{}'.format(userID)
r = requests.Request('GET', url, params={"access_token": token})
pr = r.prepare()
print(pr.url)
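For example, with placeholder values (the user ID, token, and Graph API host below are made up purely for illustration), the two styles produce the same prepared URL as long as the parameter name is spelled identically:
import requests

user_id = '1234567890'   # placeholder
token = 'EAACtoken'      # placeholder

# URL built by hand.
hand = requests.Request(
    'GET',
    'https://graph.facebook.com/{}/likes?access_token={}'.format(user_id, token),
).prepare()

# Same request, but letting requests build the query string from params.
auto = requests.Request(
    'GET',
    'https://graph.facebook.com/{}/likes'.format(user_id),
    params={'access_token': token},
).prepare()

print(hand.url)
print(auto.url)
print(hand.url == auto.url)  # True as long as the token needs no URL encoding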

Passing Multiple URLs (strings) Into One requests.get Statement in Python

I am trying to have one requests.get statement cover two URLs. What I am aiming to do is have requests (the Python module) make two requests based on a list or two strings I provide. How can I pass multiple strings from a list into a requests.get call and have requests visit each URL (string) and do something with it?
Thanks
Typically, if we're talking about the Python requests library, a get call handles only one URL at a time. If what you are trying to do is perform multiple requests over a list of known URLs, then it's quite easy:
import requests

my_links = ['https://www.google.com', 'https://www.yahoo.com']
my_responses = []
for link in my_links:
    # Use .json() instead of .text if the endpoint returns JSON.
    payload = requests.get(link).text
    print('got response from {}'.format(link))
    my_responses.append(payload)
    print(payload)
my_responses now has all the content from the pages.
You don't. The requests.get() method (or any other method, really) takes a single URL and makes a single HTTP request, because that is what most humans want it to do.
If you need to make two requests, you must call the method twice:
requests.get(url)
requests.get(another_url)
Of course, these calls are synchronous: the second will only begin once the first response has been received.
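If that sequential behaviour matters, one common workaround (a sketch using the standard library's concurrent.futures; the URL list is just an example) is to run the blocking calls in a thread pool:
import requests
from concurrent.futures import ThreadPoolExecutor

urls = ['https://www.google.com', 'https://www.yahoo.com']  # example URLs

def fetch(url):
    # Each worker thread performs one blocking GET.
    return requests.get(url)

with ThreadPoolExecutor(max_workers=len(urls)) as pool:
    responses = list(pool.map(fetch, urls))

for url, response in zip(urls, responses):
    print(url, response.status_code)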

Send Cookie Using Python Requests

I am trying to scrape some data (reproduce a POST operation I did in a browser) using the Python Requests library, expecting it to return the content I saw in the browser once I copy over the request header and the POST form.
However, I am not quite sure what the correct way to send cookies with Python Requests is.
Here is a screenshot of how it looks in Chrome.
It looks like I can either use cookie as a key in the request header or use the cookie argument in the post command.
(1) If I want to use cookie in the request header, should I treat the whole cookie content as one long string, set the key to cookie, and set the value to that string?
(2) If I want to use the cookie argument in the requests.post command, should I manually translate that long string into a dictionary first and then pass it to the cookie argument? Something like this?
mycookie = {'firsttimevisitor': 'N', 'cmTPSet': 'Y', 'viewType': 'List', ... }
# Then
r = requests.post(myurl, data=myformdata, cookies=mycookie, headers=myheaders)
Thanks!
Just follow the documentation:
>>> url = 'http://httpbin.org/cookies'
>>> cookies = dict(cookies_are='working')
>>> r = requests.get(url, cookies=cookies)
>>> r.text
'{"cookies": {"cookies_are": "working"}}'
So use a dict for the cookies, and note that the argument is cookies, not cookie.
Yes. But make sure you call it "Cookie" (with a capital C).
I always did it with a dict; Requests expects you to give it a dict.
Passing a string instead will give the following:
cookiejar.set_cookie(create_cookie(name, cookie_dict[name]))
TypeError: string indices must be integers, not str
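To make the two options concrete, here is a sketch with made-up cookie values and a placeholder URL (the endpoint and form field are not from the original post):
import requests

# Hypothetical cookie string as copied from the browser's request headers.
cookie_header = 'firsttimevisitor=N; cmTPSet=Y; viewType=List'

# Option 1: send the raw browser string as the Cookie header.
r1 = requests.post('https://example.com/endpoint',
                   data={'field': 'value'},
                   headers={'Cookie': cookie_header})

# Option 2: split the string into a dict and use the cookies argument.
cookie_dict = dict(part.split('=', 1) for part in cookie_header.split('; '))
r2 = requests.post('https://example.com/endpoint',
                   data={'field': 'value'},
                   cookies=cookie_dict)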

Python Requests - missing element value

Suppose I have the following HTML input element within a form:
<input name="title_input" type="text" id="missing_value" title="Title">
If I want to submit a POST:
import requests

s = requests.Session()
s.get(url)
postResult = s.post(url, {'title_input': 'This Is the Name of the Title'})
Even though the element is missing a value attribute, will this POST still work correctly?
I.e., will Python append value="This Is the Name of the Title" to the element even though it's missing from the original HTML?
Even though the element is missing a value attribute, will this POST still work correctly?
Yes, it will. The POST request will be made without fetching the HTML at all.
You don't need this line for the POST request:
s.get(url)
I.e., will Python append value="This Is the Name of the Title" to the element?
No, Python will not append anything. Python will not even analyze the GET response content (if the GET request is made at all).
It just opens a TCP connection and sends the data.
You haven't explained what that HTML is or how it relates to the Python code, but in any case the HTML doesn't seem to have anything to do with anything. The POST request is made by the requests module, not by the HTML, so it gets its values from whatever you put into the parameters of the post() call.
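One easy way to see exactly what gets sent is to POST to an echo service (a sketch; httpbin.org simply reflects the form fields it receives and has nothing to do with the original site):
import requests

# httpbin echoes back the form fields it received, which confirms that
# only the data passed to post() is sent, regardless of any HTML.
r = requests.post('https://httpbin.org/post',
                  data={'title_input': 'This Is the Name of the Title'})
print(r.json()['form'])
# {'title_input': 'This Is the Name of the Title'}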
