Building a web browser in Python and an issue regarding cookies - python

I know this sounds weird, but I have no choice; I searched Google and found nothing. So..
I'm following a video tutorial (https://www.youtube.com/watch?v=JEW50aEVi4k) on building a web browser in Python, and I was wondering whether cookies can be saved. Is that possible?
If yes, could you give some suggestions?

Cookies are not a problem - you can use mechanize (https://pypi.python.org/pypi/mechanize/), which saves and sends cookies automatically.
import mechanize

browser = mechanize.Browser()
browser.set_handle_robots(False)  # don't check robots.txt before opening pages
response = browser.open('http://www.youtube.com')

# Headers (including cookies) are handled automatically. You can access them:
headers = browser.request.header_items()
print(headers)
# [('Host', 'www.youtube.com'), ('Cookie', 'YSC=cNcoiHG71bY; VISITOR_INFO1_LIVE=uLHsDODGalg; PREF=f1=50000000'), ('User-agent', 'Python-urllib/2.7')]
It is very hard to write a browser with JavaScript support. If you need JavaScript, I suggest using Selenium with PhantomJS, which acts just like a real browser.
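For reference, a minimal sketch of that Selenium + PhantomJS approach; this assumes the phantomjs binary is on your PATH and an older Selenium release that still ships the PhantomJS driver (newer Selenium versions have removed it):
from selenium import webdriver

# PhantomJS is a headless browser, so cookies and JavaScript are handled
# just as in a real browser session.
driver = webdriver.PhantomJS()
driver.get('http://www.youtube.com')

# get_cookies() returns the session's cookies as a list of dicts
for cookie in driver.get_cookies():
    print('%s = %s' % (cookie['name'], cookie['value']))

driver.quit()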

Related

Open url in browser and fill the form

I want to open a URL using a Python script, and the same script should fill in the form but not submit it.
For example, the script should open https://www.facebook.com/ and fill in the name and password fields, but not submit the form.
You can use Selenium to get this done smoothly. Here is sample code using a Google search:
from selenium import webdriver

browser = webdriver.Firefox()
browser.get("http://www.google.com")
# "lst-ib" was the id of Google's search box at the time of writing
browser.find_element_by_id("lst-ib").send_keys("book")
# browser.find_element_by_name("btnK").click()
The last line is intentionally commented out so that the search is not submitted.
Many websites don't allow web scraping; in fact, it may even expose you to an unauthorized-access complaint.
That said, try the requests library in Python; you'll find it makes this kind of thing easy:
https://realpython.com/python-requests/
import requests

payload = {'inUserName': 'USERNAME/EMAIL', 'inUserPass': 'PASSWORD'}
url = 'http://www.locationary.com/home/index2.jsp'
requests.post(url, data=payload)
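Note that a bare requests.post call throws away the cookies the server sets, so a login won't persist. If you need the session cookie on later requests, a Session object keeps it for you. A minimal sketch, reusing the placeholder payload and URL above (the follow-up URL is an assumption for illustration):
import requests

payload = {'inUserName': 'USERNAME/EMAIL', 'inUserPass': 'PASSWORD'}
url = 'http://www.locationary.com/home/index2.jsp'

# A Session keeps cookies across requests, so the login cookie from the
# POST is sent automatically on any follow-up request.
with requests.Session() as s:
    s.post(url, data=payload)
    print(s.cookies.get_dict())  # cookies the server set at login
    s.get('http://www.locationary.com/')  # assumed follow-up URL, sent with those cookies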

Python - Get cookies of a website saved in a browser (Chrome/Firefox)

I can manually see the cookies set in the browser.
How can I fetch those cookies from a Python script?
import requests

res = requests.get("https://stackoverflow.com/questions/50404771/python-get-cookiesof-a-website-saved-in-a-browser-chrome-firefox")
# res.cookies is a RequestsCookieJar holding the cookies this response set
print(res.cookies.keys())
print(res.cookies["prov"])
I hope I've read your question right.
You may want to ask "how do I read cookies already stored in my browser?", which I don't think you can do with requests alone. But Selenium would give you access to a new browser session with which you can obtain more cookies, as sketched below.
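For example, a minimal sketch of pulling cookies out of a fresh Selenium-driven Firefox session (the URL is just a placeholder):
from selenium import webdriver

driver = webdriver.Firefox()
driver.get('https://stackoverflow.com')

# get_cookies() returns the cookies this browser session has accumulated,
# each as a dict with 'name', 'value', 'domain', etc.
for cookie in driver.get_cookies():
    print('%s = %s' % (cookie['name'], cookie['value']))

driver.quit()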
UPDATE
Thanks to Sraw for the pointer. I've tried this now, but it wouldn't transfer my login to the requests API, so maybe it isn't possible on modern sites; still, the OP could try these tools, since their question is clearer in their mind than in ours.
import re
import requests
import browsercookie

url = "https://stackoverflow.com/questions/50404771/python-get-cookiesof-a-website-saved-in-a-browser-chrome-firefox"
res = requests.get(url)

# Load the cookies Chrome has saved on disk and repeat the request with them
cj = browsercookie.chrome()
res2 = requests.get(url, cookies=cj)

get_title = lambda html: re.findall('<title>(.*?)</title>', html, flags=re.DOTALL)[0].strip()
get_me = lambda html: re.findall('John', html, flags=re.DOTALL)
# At this point I had deleted my answer, so this got nothing;
# now my answer is reinstated it will return me, but not in place of the login button.
print(len(get_me(res2.text)))
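A workaround that sometimes helps where browsercookie fails is to log in through a Selenium-driven browser and copy that session's cookies into a requests.Session. A hedged sketch (the URL is the one from the question; whether the site accepts the transplanted cookies is not guaranteed):
import requests
from selenium import webdriver

url = "https://stackoverflow.com/questions/50404771/python-get-cookiesof-a-website-saved-in-a-browser-chrome-firefox"

driver = webdriver.Firefox()
driver.get(url)
# ... log in manually in the opened browser window ...

# Copy the browser session's cookies into a requests session
session = requests.Session()
for cookie in driver.get_cookies():
    session.cookies.set(cookie['name'], cookie['value'],
                        domain=cookie.get('domain'))

res = session.get(url)  # should now carry the logged-in cookies
driver.quit()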

Scrape Facebook in Python

I'm interested in getting the number of friends each of my friends on Facebook has. Apparently the official Facebook API does not allow getting the friends of friends, so I need to get around this (somewhat sensible) limitation somehow. I tried the following:
import urllib, urllib2, cookielib

username = 'me@example.com'
password = 'mypassword'

# A cookie jar so the session cookies set at login persist across requests
cj = cookielib.CookieJar()
opener = urllib2.build_opener(urllib2.HTTPCookieProcessor(cj))

login_data = urllib.urlencode({'email' : username, 'pass' : password})
request = urllib2.Request('https://login.facebook.com/login.php')
request.add_header('User-Agent','Mozilla/5.0 (X11; U; Linux x86_64; en-US; rv:1.9.2.12) Gecko/20101027 Fedora/3.6.12-1.fc14 Firefox/3.6.12')
opener.open(request, login_data)

resp = opener.open('http://facebook.com')
print resp.read()
but I only end up with a captcha page. Any idea how FB is detecting that the request is not from a "normal" browser? I could add an extra step and solve the captcha, but that would add unnecessary complexity to the program, so I would rather avoid it. When I use a web browser with the same User-Agent string, I don't get a captcha.
Alternatively, does anyone have any saner ideas on how to accomplish my goal, i.e. get a list of friends of friends?
Have you tried tracing and comparing HTTP transactions with Fiddler2 or Wireshark? Fiddler can even trace https, as long as your client code can be made to work with bogus certs.
I tried a lot of ways to scrape Facebook, and the only one that worked for me was this:
Install Selenium: the Firefox plugin, the server, and the Python client library.
Then, with the Firefox plugin, you can record the actions you take to log in and export them as a Python script; use that as the base for your work and it will work (a sketch of such a script is shown below). Basically, I added to this script a request to my web server to fetch a list of things to inspect on FB, and at the end of the script I send the results back to my server.
I could NOT find a way to do this directly from my server with a browser simulator like mechanize or the like! I guess it needs to be done from a client browser.
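For reference, a minimal sketch of the kind of script such a Selenium recording produces; the field names 'email' and 'pass' matched Facebook's login form at the time, but treat them as assumptions and verify them against the live page:
from selenium import webdriver

driver = webdriver.Firefox()
driver.get('https://www.facebook.com/')

# 'email' and 'pass' are the login form's field names (an assumption
# based on the markup at the time of writing).
driver.find_element_by_name('email').send_keys('me@example.com')
driver.find_element_by_name('pass').send_keys('mypassword')
driver.find_element_by_name('pass').submit()

# ... from here, collect what you need, then send the results back to
# your own server as described above.
driver.quit()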

Using urllib2 for posting data, following redirects and maintaining cookies

I am using urllib2 in Python to post login data to a web site.
After a successful login, the site redirects my request to another page. Can someone provide a simple code sample of how to do this in Python with urllib2? I guess I will also need cookies to stay logged in when I get redirected to the other page, right?
Thanks a lot in advance.
First, get mechanize: http://wwwsearch.sourceforge.net/mechanize/
You could do this kind of stuff with just urllib2, but you would be writing tons of boilerplate code, and it would be buggy.
Then:
import mechanize
br = mechanize.Browser()
br.open('http://somesite.com/account/signin/')
br.select_form('loginForm')
br['username'] = 'jekyll'
br['password'] = 'bananas'
br.submit()
# At this point, you're logged in, redirected, and the
# br object has the cookies and all that.
br.geturl() # e.g. http://somesite.com/loggedin/
Then you can use the Browser object br to do whatever you have to do: click on links, etc. Check the samples on the mechanize site.
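For instance, link handling is just as terse; a small sketch (the site and the link text are placeholders carried over from the answer above):
import mechanize

br = mechanize.Browser()
br.open('http://somesite.com/')  # placeholder site from the example above

# Enumerate the links on the current page
for link in br.links():
    print('%s -> %s' % (link.text, link.absolute_url))

# Follow a link by its visible text ('Account settings' is a placeholder)
br.follow_link(text='Account settings')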

Python - The request headers for mechanize

I am looking for a way to view the request (not response) headers, specifically what browser mechanize claims to be. I would also like to know how to manipulate them, e.g. set another browser.
Example:
import mechanize
browser = mechanize.Browser()
# Now I want to make a request to eg example.com with custom headers using browser
The purpose is, of course, to test a website and see whether it shows different pages depending on the reported browser.
It has to be the mechanize browser, as the rest of the code depends on it (it is left out here because it's irrelevant).
browser.addheaders = [('User-Agent', 'Mozilla/5.0 blahblah')]
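Put together, a minimal sketch (example.com stands in for the site under test):
import mechanize

browser = mechanize.Browser()
# addheaders is a list of (name, value) pairs sent with every request
browser.addheaders = [('User-Agent', 'Mozilla/5.0 blahblah')]
browser.open('http://example.com')

# Inspect the request headers that were actually sent
print(browser.request.header_items())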
You've got an answer on how to change the headers, but if you want to see the exact headers that are being used, try a proxy that displays the traffic, e.g. Fiddler2 on Windows, or see this question for some Linux alternatives.
You can modify the Referer header too:
br.addheaders = [('Referer', 'http://google.com')]
