How to handle cookies myself in PyObjC

I'm implementing a minimal browser in PyObjC as a learning exercise.
First, I searched for how to use WebKit from PyObjC and wrote the code below:
#coding: utf-8
import Foundation
import WebKit
import AppKit
import objc

def main():
    app = AppKit.NSApplication.sharedApplication()
    rect = Foundation.NSMakeRect(100, 350, 600, 800)
    # Chain alloc() with the initializer: an init method may return
    # a different object than the one alloc() produced.
    win = AppKit.NSWindow.alloc().initWithContentRect_styleMask_backing_defer_(
        rect,
        AppKit.NSTitledWindowMask |
        AppKit.NSClosableWindowMask |
        AppKit.NSResizableWindowMask |
        AppKit.NSMiniaturizableWindowMask,
        AppKit.NSBackingStoreBuffered,
        False)
    win.display()
    win.orderFrontRegardless()
    webview = WebKit.WebView.alloc().initWithFrame_(rect)
    pageurl = Foundation.NSURL.URLWithString_("http://twitter.com")
    req = Foundation.NSURLRequest.requestWithURL_(pageurl)
    webview.mainFrame().loadRequest_(req)
    win.setContentView_(webview)
    app.run()

if __name__ == '__main__':
    main()
It worked fine. But I noticed that this browser shares cookies with Safari. I want it to be independent of my Safari.app.
So I searched again and learned that I can control cookie handling for a request by using NSMutableURLRequest.
Below is the second version I tested:
#coding: utf-8
import Foundation
import WebKit
import AppKit
import objc

def main():
    app = AppKit.NSApplication.sharedApplication()
    rect = Foundation.NSMakeRect(100, 350, 600, 800)
    win = AppKit.NSWindow.alloc().initWithContentRect_styleMask_backing_defer_(
        rect,
        AppKit.NSTitledWindowMask |
        AppKit.NSClosableWindowMask |
        AppKit.NSResizableWindowMask |
        AppKit.NSMiniaturizableWindowMask,
        AppKit.NSBackingStoreBuffered,
        False)
    win.display()
    win.orderFrontRegardless()
    webview = WebKit.WebView.alloc().initWithFrame_(rect)
    pageurl = Foundation.NSURL.URLWithString_("http://twitter.com")
    req = Foundation.NSMutableURLRequest.requestWithURL_(pageurl)
    # Turn off the default cookie handling for this one request
    req.setHTTPShouldHandleCookies_(False)
    webview.mainFrame().loadRequest_(req)
    win.setContentView_(webview)
    app.run()

if __name__ == '__main__':
    main()
This code shows me Twitter's login screen :-)
But I couldn't log in to Twitter with this browser.
I entered my account name and password and pressed Enter. The browser then displayed the timeline of the account I always use in Safari.app.
Yes, I know that is the expected result:
I didn't write anything to handle cookies.
And that is what my question is about.
I want to know:
How can I implement and use something like NSHTTPCookieStorage?
Can I write it in Python?
Thank you.

To start with the easy part: if it is possible to do this in Objective-C it should also be possible with PyObjC.
That said, it is unclear to me whether this is possible at all. The question "How can I have multiple instances of webkit without sharing cookies?" seems to indicate that it isn't, although you might be able to do something through the WebKit delegate.
Another alternative is to use NSURLProtocol: register a custom NSURLProtocol subclass for handling http/https requests and implement it using Python's urllib or urllib2. The PyDocURL example shows how to do this (that example registers a subclass for pydoc:// URLs).
More information on NSURLConnection is on Apple's website.
Update with an implementation hint:
An alternate method might be to disable cookie storage in NSHTTPCookieStorage (NSHTTPCookieStorage.sharedHTTPCookieStorage().setCookieAcceptPolicy_(NSHTTPCookieAcceptPolicyNever)). Then use the WebKit resource loading delegate to handle cookies yourself:
- Maintain your own cookie store (possibly using a class in urllib2).
- In webView:resource:willSendRequest:redirectResponse:fromDataSource:, add cookie headers based on the information in that store.
- In webView:resource:didReceiveResponse:fromDataSource:, check for "Set-Cookie" headers and update your own cookie store.
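The cookie-store part of those steps does not need PyObjC at all. Below is a minimal sketch, assuming Python 3 (where cookielib became http.cookiejar and urllib2 became urllib.request). The class name PrivateCookieStore and its two methods are hypothetical. The assumed wiring, which is not shown here, is that willSendRequest would call cookie_header_for() and put the result in the outgoing request's Cookie header, and didReceiveResponse would pass the response URL and its header pairs to update_from_response().

```python
import email.message
import urllib.request
from http.cookiejar import CookieJar


class _HeaderResponse:
    """Minimal stand-in for a response: CookieJar only needs .info() with get_all()."""

    def __init__(self, headers):
        self._message = email.message.Message()
        for name, value in headers:
            # Message keeps repeated headers, e.g. several Set-Cookie lines
            self._message[name] = value

    def info(self):
        return self._message


class PrivateCookieStore:
    """Hypothetical cookie store private to one browser, separate from Safari's storage."""

    def __init__(self):
        self._jar = CookieJar()

    def update_from_response(self, url, headers):
        """Record cookies from the (name, value) header pairs of a response for url."""
        request = urllib.request.Request(url)
        self._jar.extract_cookies(_HeaderResponse(headers), request)

    def cookie_header_for(self, url):
        """Return the Cookie header value to send with a request for url, or None."""
        request = urllib.request.Request(url)
        self._jar.add_cookie_header(request)
        return request.get_header("Cookie")
```

For example, after update_from_response("http://example.com/login", [("Set-Cookie", "session=abc; Path=/")]), cookie_header_for("http://example.com/home") returns "session=abc", while a request to another host, or any request made through a second PrivateCookieStore instance, gets None. CookieJar applies the usual domain and path matching rules, so the delegate code only has to move headers in and out.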
It shouldn't be too hard to do this, and I'd love to have this functionality as an example on the PyObjC website (or even as a utility class in the WebKit bindings for PyObjC).

Related

How to use correctly importlib in a flask controller?

I am trying to load a module according to some settings. I have found a working solution, but I need confirmation from an advanced Python developer that this solution is the best one performance-wise, as the API endpoint that uses it will be under heavy load.
The idea is to change the working of an endpoint based on parameters from the user and other systems configuration. I am loading the correct handler class based on these settings. The goal is to be able to easily create new handlers without having to modify the code calling the handlers.
This is a working example:
./run.py:
from flask import Flask, abort
import importlib
import handlers

app = Flask(__name__)

@app.route('/')
def api_endpoint():
    try:
        endpoint = "simple"  # Custom logic to choose the right handler
        handlerClass = getattr(importlib.import_module('.' + str(endpoint), 'handlers'), 'Handler')
        handler = handlerClass()
    except Exception as e:
        print(e)
        abort(404)
    print(handlerClass, handler, handler.value, handler.name())
    # Handler processing. Not yet implemented
    return "Hello World"

if __name__ == "__main__":
    app.run(host='0.0.0.0', port=8080, debug=True)
Here is an example "simple" handler. A handler is a module that must define a Handler class:
./handlers/simple.py:
import os

class Handler:
    def __init__(self):
        self.value = os.urandom(5)

    def name(self):
        return "simple"
If I understand correctly, the import is done on every request to the endpoint, which means filesystem I/O to look up the modules, and so on.
Is this the correct/"pythonic" way to implement this strategy?
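On the performance concern: importlib.import_module consults sys.modules before touching the filesystem, so only the first request for a given handler actually imports it; later calls get back the cached module object. A minimal sketch that checks this and adds a small memoised lookup on top. import_is_cached and load_handler_class are hypothetical names, and the default package "handlers" assumes the layout from the question.

```python
import importlib
import sys
from functools import lru_cache


def import_is_cached(module_name):
    """Return True if a repeated import hands back the module cached in sys.modules."""
    first = importlib.import_module(module_name)
    second = importlib.import_module(module_name)
    return first is second is sys.modules[module_name]


@lru_cache(maxsize=None)
def load_handler_class(endpoint, package="handlers"):
    """Hypothetical helper: resolve <package>.<endpoint>.Handler, memoised per argument pair.

    import_module already avoids re-importing; lru_cache also skips the
    relative-name resolution and getattr on repeated calls.
    """
    module = importlib.import_module("." + endpoint, package)
    return getattr(module, "Handler")
```

In the view, handlerClass = load_handler_class(endpoint) would replace the getattr(importlib.import_module(...)) line. Because endpoint comes from request-dependent logic, it is worth validating it against a known set of names before importing anything.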

Question moved to Code Review. Thanks, all, for your help: https://codereview.stackexchange.com/questions/96533/extension-pattern-in-a-flask-controller-using-importlib
I am closing this thread.

Django view to convert URL to PDF with PyQt

I am trying to write a Django view that will return a PDF of a URL.
I'm using PyQt's webview.print_ to create the PDF, but I am unsure how to pass the PDF to the Django response. I've tried QBuffer, but I can't seem to get it right.
Here is my view so far:
def pdf(request):
    app = QApplication(sys.argv)
    bufferPdf = QBuffer()
    bufferPdf.open(QBuffer.ReadWrite)
    web = QWebView()
    web.load(QUrl("http://www.google.com"))  # the desired url.
    printer = QPrinter()
    printer.setPageSize(QPrinter.Letter)
    printer.setOrientation(QPrinter.Landscape)
    printer.setOutputFormat(QPrinter.PdfFormat)
    printer.setOutputFileName("file.pdf")

    def convertIt():
        web.print_(printer)
        print "Pdf generated"
        QApplication.exit()

    QObject.connect(web, SIGNAL("loadFinished(bool)"), convertIt)
    bufferPdf.seek(0)
    result = bufferPdf.readData(0)
    bufferPdf.close()
    sys.exit(app.exec_())
    response = HttpResponse(result, mimetype='application/pdf')
    response['Content-Disposition'] = 'attachment; filename=coupon.pdf'
    return response
Thanks in advance.

The accepted solution from ekhumoro is incorrect. The code he provides will run from the command line, but it can never work within a Django view.
Many people have noted that it's not easy, and possibly outright impossible, to combine Django with a threaded Qt application. The error that you are seeing is a classic example of what you will see when you attempt to do so.
In my own projects I tried many different ways of organizing and grouping the code, and never found a solution. The issue seems to be (I am not a Qt expert, so if anyone has more information please correct me) that event-driven Qt applications (anything that uses WebKit relies on the Qt event model) are built around what is effectively a singleton QApplication. You cannot control when this sub-application will quit and when its various resources are reaped. As a result, any multi-threaded application using the library has to manage its resources very carefully, and that is something you have zero control over while a web server is handling requests.
One possible (messy and unprofessional) solution would be to create a script that accepts command-line arguments and then invoke that script from within Django as a separate subprocess. You would use temporary files for the output and then load them into your application. Once the file has been read, you'd just purge it from disk. Messy, but effective.
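A minimal sketch of that subprocess approach, assuming a standalone rendering script (hypothetical, for instance a command-line version of the webpage2pdf code) that takes a URL and an output path as its two arguments and writes the PDF there. Each call runs the Qt code in a fresh process, so no QApplication ever lives inside the Django worker; render_pdf_in_subprocess is a hypothetical name.

```python
import os
import subprocess
import sys
import tempfile


def render_pdf_in_subprocess(url, script_path, timeout=60):
    """Run a PDF-rendering script in its own process and return the PDF bytes.

    script_path is assumed to accept: <url> <output_path>.
    """
    fd, out_path = tempfile.mkstemp(suffix=".pdf")
    os.close(fd)  # the child process writes the file; we only need its name
    try:
        # check=True raises CalledProcessError if the script exits non-zero
        subprocess.run(
            [sys.executable, script_path, url, out_path],
            check=True,
            timeout=timeout,
        )
        with open(out_path, "rb") as f:
            return f.read()
    finally:
        os.remove(out_path)  # purge the temporary file once it has been read
```

In the view, the returned bytes would go straight into HttpResponse(data, content_type='application/pdf'). The timeout keeps a hung renderer from tying up a worker indefinitely.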
I personally would love to hear from anyone who definitively knows either why this is so hard or what a proper solution is; there are literally dozens of threads here on Stack Overflow with incorrect or incomplete explanations of how to approach this problem...

Here's a rewrite of your example that should do what you want:
import sys
from PyQt4 import QtCore, QtGui, QtWebKit
from django.http import HttpResponse

class WebPage(QtWebKit.QWebPage):
    def __init__(self):
        QtWebKit.QWebPage.__init__(self)
        self.printer = QtGui.QPrinter()
        self.printer.setPageSize(QtGui.QPrinter.Letter)
        self.printer.setOrientation(QtGui.QPrinter.Landscape)
        self.printer.setOutputFormat(QtGui.QPrinter.PdfFormat)
        self.mainFrame().loadFinished.connect(self.handleLoadFinished)

    def start(self, url):
        self.mainFrame().load(QtCore.QUrl(url))
        QtGui.qApp.exec_()

    def handleLoadFinished(self):
        temp = QtCore.QTemporaryFile(
            QtCore.QDir.temp().filePath('webpage.XXXXXX.pdf'))
        # must open the file to get the filename.
        # file will be automatically deleted later
        temp.open()
        self.printer.setOutputFileName(temp.fileName())
        # ensure that the file can be written to
        temp.close()
        self.mainFrame().print_(self.printer)
        temp.open()
        self.pdf = temp.readAll().data()
        QtGui.qApp.quit()

def webpage2pdf(url):
    if not hasattr(WebPage, 'app'):
        # can only have one QApplication, and it must be created first
        WebPage.app = QtGui.QApplication(sys.argv)
    webpage = WebPage()
    webpage.start(url)
    return webpage.pdf

if __name__ == '__main__':
    if len(sys.argv) > 1:
        url = sys.argv[1]
    else:
        url = 'http://www.google.com'
    result = webpage2pdf(url)
    # newer Django versions take content_type= (the old mimetype= argument was removed)
    response = HttpResponse(result, content_type='application/pdf')
    response['Content-Disposition'] = 'attachment; filename=coupon.pdf'
    # do stuff with response...

PyQt4 QWebView external resource content

class Browser(QWebView):
    def __init__(self):
        QWebView.__init__(self)
        self.loadFinished.connect(self._result_available)
        self.loadStarted.connect(self._load_started)
        self.page().frameCreated.connect(self.onFrame)
    # ...

browser = Browser()
browser.setHtml('<html>...</html>', baseUrl=QUrl('http://www.google.com/'))
After that, I need to capture the content of all external resources loaded by the QWebView. I need the content of every CSS/JavaScript file. How can I do that? Related questions: question 1, question 2
I know I need to use QNetworkAccessManager somehow, but I don't have an example to follow.

You need to write a custom QNetworkReply class and collect the data as it arrives in its readyRead handler.

Unit test to check that the correct context is returned for a given path

As the title says. I have a model that I can test manually: I enter a URL in a browser and receive a result from one of the views. The thing is, a unit test should be doing that.
I think there should be some way to create a request, send it to the application and in return receive the context.

You can create functional tests using the WebTest package, which allows you to wrap your WSGI application in a TestApp that supports .get(), .post(), etc.
See http://docs.pylonsproject.org/projects/pyramid/1.0/narr/testing.html#creating-functional-tests for specifics in Pyramid, pasted here for posterity:
import unittest

class FunctionalTests(unittest.TestCase):
    def setUp(self):
        from myapp import main
        app = main({})
        from webtest import TestApp
        self.testapp = TestApp(app)

    def test_root(self):
        res = self.testapp.get('/', status=200)
        # failUnless is a deprecated alias; res.body is bytes on Python 3
        self.assertIn(b'Pyramid', res.body)
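If installing WebTest is not an option, the same idea (hand a synthetic request to the WSGI application and inspect what comes back) can be sketched with the standard library's wsgiref.util.setup_testing_defaults. This is a swapped-in technique, not what the Pyramid docs describe; call_wsgi is a hypothetical helper and demo_app is a stand-in for the application returned by main({}).

```python
from wsgiref.util import setup_testing_defaults


def call_wsgi(app, path="/"):
    """Send a synthetic GET request for path to a WSGI app; return (status, headers, body)."""
    environ = {"PATH_INFO": path, "SCRIPT_NAME": ""}
    setup_testing_defaults(environ)  # adds REQUEST_METHOD=GET, SERVER_NAME, wsgi.input, ...
    captured = {}

    def start_response(status, headers, exc_info=None):
        captured["status"] = status
        captured["headers"] = headers
        return lambda data: None  # legacy write() callable, unused here

    result = app(environ, start_response)
    try:
        body = b"".join(result)
    finally:
        close = getattr(result, "close", None)
        if close is not None:
            close()  # PEP 3333: call close() on the response iterable if it has one
    return captured["status"], captured["headers"], body


def demo_app(environ, start_response):
    """Stand-in WSGI app that echoes the requested path."""
    start_response("200 OK", [("Content-Type", "text/plain")])
    return [b"Pyramid at " + environ["PATH_INFO"].encode("latin-1")]
```

Calling call_wsgi(demo_app, "/my/test/path") returns ("200 OK", [("Content-Type", "text/plain")], b"Pyramid at /my/test/path"). Like WebTest, this checks the rendered response rather than the internal context object.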

Pyramid doesn't really expose a method for testing a real request and receiving information about the internals. You could possibly execute the traverser yourself using:
from pyramid.paster import get_app
from pyramid.scripting import get_root
from pyramid.traversal import traverse
app = get_app(...)
root, closer = get_root(app)  # get_root returns a (root, closer) tuple
out = traverse(root, '/my/test/path')
context = out['context']
closer()
However, the test is a bit contrived. It'd be more relevant to use a functional test that checks if the returned page is what you expect.

IE8 automation and https

I'm trying to use IE8 through COM, from Python, to access a secure site (namely, SourceForge). Here is the script:
from win32com.client import gencache
from win32com.client import Dispatch
import pythoncom

gencache.EnsureModule('{EAB22AC0-30C1-11CF-A7EB-0000C05BAE0B}', 0, 1, 1)

class SourceForge(object):
    def __init__(self, baseURL='https://sourceforget.net/', *args, **kwargs):
        super(SourceForge, self).__init__(*args, **kwargs)
        self.__browser = Dispatch('InternetExplorer.Application')
        self.__browser.Visible = True
        self.__browser.Navigate(baseURL)

    def run(self):
        while True:
            pythoncom.PumpMessages()

def main():
    sf = SourceForge()
    sf.run()

if __name__ == '__main__':
    main()
If I launch IE by hand, it works. If I launch the script, I get a generic error page: "Internet Explorer cannot display this page". If I change baseURL to use http instead of https, the script works. I guess this is some security "feature". I tried adding the site to the list of trusted sites. I tried enabling IE scripting in the options for the Internet zone. Neither worked, and Google was no help.
So, does anybody know anything about this? Is there a mysterious option to enable, or am I doomed?
I'm on Windows XP SP3, by the way, with Python 2.5 and pywin32 build 213.

I can't open https://sourceforget.net/, either by hand or by script.
Are you sure this link is right? Note the spelling "sourceforget" rather than "sourceforge".