Transfer selected web sockets to browser - python

My understanding of web programming isn't the best, so I might have some misconceptions here about how it works in general, but hopefully what I'm trying to do is possible.
Recently my friend and I have been challenging each other to break web systems we've set up, and in order to break his next one I need to use the requests module, while doing part of it by myself. I'm perfectly happy with the requests module, but after a while, I want to manually take over that session with the server in my browser. I've tried webbrowser.open, but this effectively loads the page again as if I've never connected before, due to not having any of the cookies from the other session. Is this possible, or do I have a misunderstanding of the situation? Thanks in advance for any help.
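One possible starting point, sketched under assumptions: the cookie jar on a requests session (`session.cookies`) is a subclass of the standard library's `http.cookiejar.CookieJar`, so its cookies can be flattened into plain dictionaries and then handed to whatever can load them into a browser (for example a browser-automation tool or a cookie-import extension). `webbrowser.open` itself has no way to pass cookies. The helper names and the demo cookie below are hypothetical, and the demo builds a jar by hand so that it runs without a network connection.

```python
from http.cookiejar import Cookie, CookieJar


def cookies_as_dicts(jar):
    """Flatten a CookieJar into plain dicts, one per cookie."""
    return [
        {
            "name": c.name,
            "value": c.value,
            "domain": c.domain,
            "path": c.path,
            "secure": c.secure,
        }
        for c in jar
    ]


def make_cookie(name, value, domain, path="/"):
    """Build a Cookie by hand; only needed for this offline demo."""
    return Cookie(
        version=0, name=name, value=value,
        port=None, port_specified=False,
        domain=domain, domain_specified=True, domain_initial_dot=False,
        path=path, path_specified=True,
        secure=False, expires=None, discard=True,
        comment=None, comment_url=None, rest={},
    )


# Hypothetical stand-in for `session.cookies` from a requests.Session.
demo_jar = CookieJar()
demo_jar.set_cookie(make_cookie("sessionid", "abc123", "example.com"))
exported = cookies_as_dicts(demo_jar)
```

With a real session, `cookies_as_dicts(session.cookies)` should give the same shape of data. Whether the server accepts the session from a different client also depends on things the cookies do not capture, such as request headers.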

Related

'406 Not Acceptable' after scraping web using python

The website I scraped blocked me by showing 406 Not Acceptable in the browser. I may have mistakenly sent too many requests at once from my Python code.
So I put time.sleep(10) in each loop iteration so that it would not look like a DDoS attack, and it seems to have worked.
My questions are:
How long is it reasonable to wait between requests? Sleeping 10 seconds per iteration makes my code run too slowly.
How do I fix the 406 Not Acceptable error in my browsers? They are still blocked unless I change my IP address, but that's not a permanent solution.
Thank you all for your answers and comments. Good day!
Any rate-limit errors are all subject to which website you choose to scrape / interact with. I could set up a website that only allows you to view it once per day, before throwing HTTP errors at your screen. So to answer your first question, there is no definitive answer. You must test for yourself and see what's the fastest speed you can go, without getting blocked.
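One way to "test for yourself" without hard-coding a fixed sleep is to back off only when the site starts refusing requests. The following is a minimal sketch, not a recommendation for any particular site: `fetch` and `is_blocked` are hypothetical callables that you would supply (for example, a function that performs the request and a check for a 406 or 429 status), and the default delays are arbitrary.

```python
import time


def backoff_delays(base=1.0, factor=2.0, max_delay=60.0, attempts=5):
    """Yield the wait (in seconds) to use before each retry."""
    delay = base
    for _ in range(attempts):
        yield min(delay, max_delay)
        delay *= factor


def fetch_with_backoff(fetch, is_blocked, sleep=time.sleep, **kwargs):
    """Call fetch() until is_blocked(result) is False or retries run out."""
    result = fetch()
    for delay in backoff_delays(**kwargs):
        if not is_blocked(result):
            break
        sleep(delay)  # wait longer after each refused request
        result = fetch()
    return result
```

The `sleep` parameter is injectable so the logic can be exercised without actually waiting; in normal use the default `time.sleep` applies.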
However, there is a workaround. If you spread your requests across proxies, it becomes much harder for the site to detect and block them, so you are far less likely to be hit by HTTP errors. HOWEVER, JUST BECAUSE YOU CAN DOESN'T MEAN THAT YOU SHOULD. I am a programmer, not a lawyer. I'm sure there's a rule somewhere that says that spamming a page, even after it tells you to stop, is illegal.
Your second question isn't exactly related to programming, but I will answer it anyway: try clearing your cookies or refreshing your IP (try using a VPN or such). Other than your IP and cookies, there are not many more simple ways for a page to fingerprint you (in order to block you).

Can you open a headless browser with webbrowser?

I was just wondering if it's possible to open a headless browser with the webbrowser module? I'm new to programming and have virtually no experience and don't even know where to look. I heard this is a good site to start. I wanted to use the webbrowser module because I'm planning to run the program on other computers and the average person doesn't have special software like chrome drivers installed on their computers, also webbrowser doesn't require a PATH to open a browser window. So I wanted to use it. If anyone knows any other alternative modules that can open common browsers without needing a PATH please say so.
Most modules have a so-called API documentation. For the webbrowser module, it can be found here: https://docs.python.org/3.6/library/webbrowser.html
If you come across a module for which you cannot find any documentation, try help() in IPython (it also works in the plain Python interpreter):
import webbrowser
help(webbrowser) # help for module
help(webbrowser.get) # help for function
browser = webbrowser.get()
help(browser) # help for browser object
There one can see that opening a headless browser is not a documented feature of the webbrowser module. Nevertheless, there are other modules that you might want to look into. This list seems to be a good start: https://github.com/dhamaniasad/HeadlessBrowsers
Btw. to respond to Basile Starynkevitch (I have not yet enough reputation to add a comment under other posts): A headless browser might process JavaScript and follow HTML forwarding. You will not get the same from the software you mentioned.
Wrong terminology: a headless browser would more generally be called an HTTP client. Read much more about HTTP and take time to understand what should happen in the HTTP clients and what should happen in the HTTP servers. Be also aware of HTML5, JavaScript, AJAX and other web technologies. They are related in their usage within a usual browser such as Firefox, but conceptually independent.
Of course, your typical browser is an HTTP client, but there are many other HTTP clients (e.g. wget, web crawlers, or any program using libcurl, which is a good free software HTTP client library).
Some browsers (e.g. links) can be much more crude than your typical one, but all browsers are HTTP clients. They might not even know about JavaScript or CSS (or not even show any image). They still deserve to be called "browsers". Some programs (e.g. selenium) reproduce many functions of typical browsers (even JavaScript or CSS) but don't show anything on a screen. You might call them headless browsers but they might not even claim to be one.
And Python includes some HTTP client (and also HTTP server) functions.
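As an illustration of those built-in functions, here is a small sketch that uses only the standard library: `http.server` for the server side and `urllib.request` for the client side. The handler name, the response text and the `fetch_once` helper are made up for the example, and the server binds to a free port on 127.0.0.1 so that nothing outside the machine is contacted.

```python
import threading
import urllib.request
from http.server import BaseHTTPRequestHandler, HTTPServer


class HelloHandler(BaseHTTPRequestHandler):
    def do_GET(self):
        body = b"hello from http.server\n"
        self.send_response(200)
        self.send_header("Content-Type", "text/plain; charset=utf-8")
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)

    def log_message(self, format, *args):
        pass  # keep the demo quiet instead of logging to stderr


def fetch_once():
    """Start a local server, request "/" once, and return (status, body)."""
    server = HTTPServer(("127.0.0.1", 0), HelloHandler)  # port 0: pick a free port
    threading.Thread(target=server.serve_forever, daemon=True).start()
    try:
        port = server.server_address[1]
        # An empty ProxyHandler bypasses any proxy configured in the environment.
        opener = urllib.request.build_opener(urllib.request.ProxyHandler({}))
        with opener.open(f"http://127.0.0.1:{port}/", timeout=5) as response:
            return response.status, response.read()
    finally:
        server.shutdown()
        server.server_close()
```

Note that this client only speaks HTTP: it does not run JavaScript or render HTML, which is the distinction being drawn between a plain HTTP client and a (headless) browser.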
You could find other HTTP server libraries, such as libonion.
Many programs use HTTP (outside of browsing, e.g. as inter-process communication). Be aware of web services.
PS. That is the first time I read about headless browsers, so I don't think this terminology is very common.

What's the best way to get continuous data from another program in Django?

Here's the setup: On a single-board computer with a very rudimentary linux I'm running a Django app. This app is, when a button is pressed or as a response to the data described below, supposed to call either a function from a library written in C, or a compiled C program, to write data to system memory at a specified address, poke/peek like. (Python doesn't seem to be able to do that natively).
The Django app should also display data, continuously, which is being read from the memory from the same library / program.
My question now is how to even begin with setting up the scenario described above. Is this even possible with a web app? Is a Django or more fundamentally any web framework even the right approach here? I'm at a bit of a loss here, since I've spent quite a few hours now trying to figure out how to do this while not getting the most basic starting point...
Disclaimer: I'm pretty new to the entire web framework thing, and more importantly web development in general, so sorry if this is a bad question as in, I could have easily found information on this topic online, but I couldn't really find a good starting point on this.
I wanted to add a comment but not enough space... anyway
You can write a native extension in C for Python that could do what you need, check this.
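If writing a full native extension is more than you need, the standard library's `ctypes` module is an alternative way to call functions in an existing C library directly from Python. This is a different technique from the extension approach mentioned above. The sketch below assumes a POSIX system and calls `strlen` from the C library purely as a stand-in; your own library and its peek/poke functions would be loaded and declared in the same way.

```python
import ctypes
import ctypes.util

# find_library may return None; on POSIX, CDLL(None) then exposes the symbols
# already loaded into the Python process, which include the C library.
libc = ctypes.CDLL(ctypes.util.find_library("c"))

# Declare the C signature: size_t strlen(const char *s);
libc.strlen.argtypes = [ctypes.c_char_p]
libc.strlen.restype = ctypes.c_size_t


def c_strlen(data: bytes) -> int:
    """Call the C library's strlen on a bytes object."""
    return libc.strlen(data)
```

Declaring `argtypes` and `restype` matters more for a memory-poking library than for this demo, because addresses and values must be converted to the correct C types.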
As for displaying data continuously, the requirement is a bit vague. If the C library changes the value at this hypothetical address very often and very fast, you have to update the browser client just as quickly.
I think WebSockets would do the trick, but they are closely tied to JavaScript, so I think Node.js would be a better candidate than Django for the server side of your application.
If you want to stick with Django, you can also expose a URL that returns the current value and have a webpage check that URL continuously (at a short interval) using a simple AJAX call. It is kind of ugly and inefficient, but it would work.
Anyway, IMHO your best bet is WebSockets, because they give you full-duplex communication between client and server.
Good Luck with your project.
Info:
Websockets in Django with socket.io
Nodejs socket.io

Delivering Python Processed data to the web

I have developed a Python program that parses a webpage and creates a new text document with the parsed data. I want to deliver this new information to the web, but I have no idea where to start with something like this. Are there any free options where I can have a site automatically call this Python code upon request and update its page with the new data? Or is the only feasible solution to have my own website/server that uses my code? I'm honestly pretty overwhelmed by the many options when I try to search the web for a solution like this. I have done a decent amount of application programming before, so I'm confident in my ability to learn new things, but web protocols are all new to me, so it's hard to find a starting point.
Ultimately I want this Python code to run automatically, or at a user's request, and deliver the data to them. It could even be through an email, although that is probably less practical.
I personally have good experience using Google App Engine (and it's free for a limited number of requests). The downside is that it does not allow C extensions or Python 3.
If you want to host your own server, tornado is a good option I think. Tornado supports both Python2 and Python3.
There are a great many options available, from 'traditional' virtual server or website hosts like a2hosting or godaddy to 'Cloud Application Hosts' such as Amazon EC2, Heroku or OpenShift.
For your case, and without knowing more, I would suggest that application hosting is more appropriate, and that you should take a look at Heroku and OpenShift in particular.
Define carefully what you want to achieve (how the users access your application, what they see, how they interact with it, etc.) and then evaluate these options based on those requirements.
Most offer a free trial, or even free services, depending on what you need! Good luck
If you've never worked with web technologies before, this will be an overwhelming task, since there are a lot of different technologies involved, and many possible ways to combine them.
You'll probably want to start by familiarizing yourself with the very basics of the HTTP protocol.
Then you should read a bit on CGI server-side programming (the article also has a quick overview on HTTP).
Python can run both on CGI and WSGI (if the server provider allows such access), so you may also want to read about WSGI.
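To make the WSGI idea concrete, here is a minimal sketch of a WSGI application that returns some text on every request. `load_parsed_text` is a hypothetical stand-in for your own parsing code, and `call_app` is only a helper for invoking the application directly without running a server.

```python
from wsgiref.util import setup_testing_defaults


def load_parsed_text():
    """Hypothetical stand-in for the output of your parsing program."""
    return "parsed data goes here\n"


def application(environ, start_response):
    """Minimal WSGI app: return the parsed text on every request."""
    body = load_parsed_text().encode("utf-8")
    start_response("200 OK", [
        ("Content-Type", "text/plain; charset=utf-8"),
        ("Content-Length", str(len(body))),
    ])
    return [body]


def call_app(app, path="/"):
    """Invoke a WSGI app directly, without a server, for a quick check."""
    environ = {"PATH_INFO": path}
    setup_testing_defaults(environ)  # fill in the remaining required keys
    captured = {}

    def start_response(status, headers, exc_info=None):
        captured["status"] = status
        captured["headers"] = headers

    body = b"".join(app(environ, start_response))
    return captured["status"], body
```

To try it in a browser locally, the standard library's `wsgiref.simple_server.make_server("", 8000, application).serve_forever()` serves the same callable on port 8000. A hosting provider that supports WSGI would typically be pointed at the `application` callable instead.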
Once you grasp all these concepts, you should check this question for actual python techniques.
Also, since you seem to be under the impression that you must pay to have a website/app deployed, you should know there are companies that host Python apps for free.

Streaming the result of a command back to the browser using Twisted and Comet

I'm writing an application that streams the output (by this I mean both sys.stdout and sys.stderr) of a Python script executed on the server to the browser in real time.
The users on the site will be allowed to select the script to run, execute and kill their chosen script, and change some parameters, so I will need a different thread per user on the site (user A can start, stop and change a script, whilst user B can do the same with a different script).
I know I need to use comet for the web clients, and seeing as the rest of the project is written in python, I'd like to use twisted for the server, however I'm not really sure of what I need to do next!
There are a daunting number of options (Divmod Mantissa, Divmod Nevow, twisted.web, STOMP, etc), and some are better documented than others, making the whole thing rather tricky!
I have a working demo using stompservice on orbited, using Orbited.TCPSocket for the JavaScript side of things; however, I'm starting to think that STOMP's channel model isn't going to work for multithreaded, multi-running scripts (unless I open a new channel per run, but that seems like the wrong use of the channel model).
Can anyone point me in the right direction, or some sample code I can learn from?
Thanks!
Nevow Athena is a framework specifically for AJAX and COMET applications and in theory is exactly the sort of thing you are looking for.
However, I am not sure that it is well used or supported at this time - looking at mailing list traffic and google search results suggests that it may not be.
There are a couple of tutorials you could look at to help you decide on it:
one on the 'official' site: http://divmod.org/trac/wiki/DivmodNevow/Athena/Tutorials/LiveElement
and one other that I found:
http://divmodsphinx.funsize.net/nevow/chattutorial/part01/index.html
The code for the latter seems to be included in the Nevow distribution when you download it under /doc/listings/partxx (I think...)
You can implement a very simple "HTTP streaming" by keeping the HTTP connection open and appending JavaScript chunks that update the DOM contents. This works since the browser evaluates the "script" chunks as they arrive.
I wrote a blog entry a while ago with a running example using twisted and very few lines of javascript: Simple HTTP streaming with Twisted & Javascript
You can easily mix this pattern with a publisher/subscriber pattern to make it multiuser, etc. I use this pattern to watch live log streams via web.
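The chunk-building part of this pattern can be sketched independently of the server. The following is an illustration under assumptions, not the code from the blog entry: the function names and the `log` element id are invented, and the generator only produces the strings that a server would write to the open connection one at a time.

```python
import json


def script_chunk(element_id, text):
    """Build one <script> chunk that appends a line of text to an element."""
    # json.dumps produces valid JavaScript string literals; escaping "</"
    # keeps a literal "</script>" in the data from closing the tag early.
    js_id = json.dumps(element_id)
    js_text = json.dumps(text + "\n").replace("</", "<\\/")
    return (
        "<script>"
        f"document.getElementById({js_id}).textContent += {js_text};"
        "</script>\n"
    )


def stream_page(lines, element_id="log"):
    """Yield the page header first, then one script chunk per output line."""
    # The page is deliberately left unclosed: the connection stays open
    # and further chunks keep arriving.
    yield f'<html><body><pre id="{element_id}"></pre>\n'
    for line in lines:
        yield script_chunk(element_id, line)
```

In a Twisted resource, each yielded string would be encoded to bytes and written to the request as it becomes available, with the request left unfinished until the script ends. For the multi-user case, each connected client would consume its own stream of lines.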
An example of serving for long-polling clients with Twisted is slosh. This might not be what you want, but because it's not a large framework, it can help you figure out how to use Twisted.
