How to access proxy_url/auth in urllib3.ProxyManager - python

I'm currently working on a project (not mine, to clarify) that scrapes some sites using urllib3 to make requests, and some of those sites are behind Cloudflare protection. I found the cfscrape library (one of a list of similarly named ones), a wrapper around requests.Session, which may help with circumventing Cloudflare's anti-bot measures, but there is a catch: I need proxies, which are fetched via an API and put into ProxyManager objects. In the dev environment I have no access to those proxies because of policy. Is there an easy way to get the proxy URL and auth back out of a ProxyManager, or do I need to reinvent the wheel (i.e. save them somewhere else as a second copy) to integrate that library into the project with as little work as possible and without degrading performance too much? I don't really want to rewrite the urllib3 usage to use requests.Session.

To close the question: ProxyManager does expose these easily, though it's strange that I couldn't find anything about it in the docs (maybe I overlooked it).
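For anyone who lands here, a minimal sketch of the attributes in question (the proxy URL and credentials below are placeholders; proxy is the parsed proxy URL and proxy_headers holds whatever auth headers were passed in):

import urllib3
from urllib3.util import make_headers

pm = urllib3.ProxyManager(
    'http://user:pass@10.0.0.1:3128/',
    proxy_headers=make_headers(proxy_basic_auth='user:pass'),
)

print(pm.proxy.url)      # full proxy URL, e.g. http://user:pass@10.0.0.1:3128/
print(pm.proxy.auth)     # the 'user:pass' portion of the URL, if any
print(pm.proxy_headers)  # headers dict, e.g. {'proxy-authorization': 'Basic ...'}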

Related

How to create a URL shortener (with Python) without importing bitly or other stuff

So I recently became interested in knowing how to create a URL shortener without using bitly or other services, but I am not very good at using Python to connect with other things. All I know how to do is:
Check whether the URL is valid (only whether it contains http:// and no disallowed characters, nothing to check whether the domain is taken or not).
Everything else I need help with.
By the way, I completely do not understand how to do this, so it would be great if you could add comments to show me what is going on.
I suggest you take a look at Flask; it is a framework for building web applications (APIs, web apps, etc.).
DigitalOcean has a nice tutorial on this.
You can either use hashing algorithms for the custom shortened URLs, or even let the user pick more readable names (like bit.ly/my-url). In either case you would store the shortened URL and the long URL in a database.
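To make that concrete, a minimal sketch of the hashing approach in Flask (the in-memory dict stands in for a real database, and the seven-character hash prefix is just an illustrative choice):

import hashlib
from flask import Flask, redirect, request

app = Flask(__name__)
urls = {}  # stands in for a real database table: short code -> long URL

@app.route('/shorten', methods=['POST'])
def shorten():
    long_url = request.form['url']
    code = hashlib.sha256(long_url.encode()).hexdigest()[:7]  # short hash used as the code
    urls[code] = long_url
    return request.host_url + code

@app.route('/<code>')
def expand(code):
    return redirect(urls[code])  # no 404 handling, kept short on purpose

if __name__ == '__main__':
    app.run()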

How to handle multireddits with praw

Does anybody have any experience with handling multireddits with PRAW?
I need to get a list of multis for a logged-in user (that should be http://www.reddit.com/dev/api/oauth#GET_api_multi_mine) and then get a list of subreddits in each multi.
For the life of me, I can't figure out how to do this with PRAW.
Thanks!
It's not implemented (yet).
You can send the requests through get_content, which would allow you to take advantage of some of PRAW's features such as request throttling, caching, and getting objects rather than JSON. It will still be some work.
I'll update this comment when multireddit functionality is implemented in PRAW.
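Until then, a rough sketch of calling the documented endpoint directly with the requests library (not PRAW; it assumes you already have an OAuth access token, and the response shape shown is my reading of the API docs):

import requests

# ACCESS_TOKEN is a placeholder for a token obtained through reddit's OAuth flow
headers = {
    'Authorization': 'bearer ACCESS_TOKEN',
    'User-Agent': 'multi-lister/0.1 by your_username',
}
resp = requests.get('https://oauth.reddit.com/api/multi/mine', headers=headers)
for multi in resp.json():
    subreddits = [sub['name'] for sub in multi['data']['subreddits']]
    print(multi['data']['name'], subreddits)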

How can I update a plone page via a script?

I have a large number of automatically generated HTML files that I would like to push to my Plone website with a script. I currently generate the files, log into Plone, click edit on each individual page, and copy and paste the HTML into the editor. I'd like to automate this. It would be nice to retain the Plone versioning, have an auto-generated comment for the edit, and have the changes come from a specific user.
I've read about and tried WebDAV with little luck getting it working consistently, and I know there is a way to connect to Plone via FTP, but I haven't tried it. I'm not sure whether these are the methods I need.
My google searches aren't leading me to anything useful. Any ideas on where to start looking for a solution to this? Or any tips on implementing it?
You can script anything in Plone via the following methods:
Through-the-web via API calls (e.g. XML-RPC, wsapi, etc.)
The bin/instance run script provided by plone.recipe.zope2instance (See charm for an example of this).
You can also use a migration framework like:
collective.transmogrifier
which allows you to write migration code, and trigger it via GenericSetup or Browser view. Additionally, there are applications written on top of Transmogrifier aimed roughly at what you are describing, the most popular of which is:
funnelweb
I would recommend that you consider using or writing one or more Transmogrifier "blueprints" to do your import, and executing the pipeline with a tool that makes that easy:
mr.migrator
You can find blueprints by searching PyPI for "transmogrify". One popular set of blueprints is:
quintagroup.transmogrifier
One of the main attractions to the Transmogrifier approach, aside from getting the job done, is the ability to share useful blueprints with others.
I think transmogrifier is the best tool for this job, but this will definitely be a programming task no matter how you do it. It's used for many migration jobs, such as migrating from Drupal.
There's an add-on, wsapi4plone.core, that pumazi at WebLion started, which provides web services for portals that you can then hook into. You can create, modify, and delete content via XML-RPC calls. The only caveat is that it doesn't yet work with Collections (criteria, specifically).
project: http://pypi.python.org/pypi/wsapi4plone.core
docs: http://packages.python.org/wsapi4plone.core/
You can also do it programmatically by hooking into the ZODB via Python (zopepy or some other method); see the sketch after the links below.
These should get you started:
http://plone.org/documentation/kb/manipulating-plone-objects-programmatically/reading-and-writing-field-values - you should be able to get an understanding of accessors and mutators (setters and getters); in your case you are more than likely going to be working with obj.Text (getter) and obj.setText (setter).
https://weblion.psu.edu/trac/weblion/wiki/AutomatingObjectCreation - lots of examples (slightly outdated but still relevant)
http://plone.org/documentation/faq/upload-images-files
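To illustrate the programmatic route, a rough sketch of a bin/instance run script (the portal id, content path, and file name are placeholders; a real script would also set up a security manager so the edit is recorded against a specific user):

# rough sketch for ``bin/instance run update_page.py``; the ``app`` object is
# injected into the namespace by the runner, everything else is a placeholder
import transaction
from Testing.makerequest import makerequest

app = makerequest(app)
portal = app['Plone']                       # assumed portal id

page = portal['reports']['my-page']         # hypothetical target page
with open('generated/my-page.html') as f:
    page.setText(f.read())                  # the mutator mentioned in the links above
page.reindexObject()
transaction.commit()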
Try to enable WebDAV or FTP in Plone; then you can access Plone via WebDAV or FTP clients and push the HTML files. Plone (Zope) will recognise the HTML files as Pages.

how to get feedparser to send a cache-control header?

I'm using python feedparser in an aggregator client that runs behind a squid proxy. I want it to send a cache-control: max-age=600 header in the request, so that we get a reasonably up-to-date response. (At the moment the feeds are returned by the proxy from its cache, even days after they changed, which is reasonable based on heuristic expiry but not good enough.)
There doesn't seem to be any direct API in feedparser to do this, so what's the best way? I don't really want to change the source.
Update: there's a bug, 224, asking for a way to add arbitrary headers, with partial patches, but it's not yet merged. That's probably the cleanest way. Otherwise it seems I need to monkeypatch either urllib or feedparser. Ick.
It seems to me there are two ways:
1- Wait for http://code.google.com/p/feedparser/issues/detail?id=224 to be fixed. I put up a patch that lets you send extra_headers={'Cache-control': 'max-age=0'} and we'll see if they accept it.
2- Monkeypatch urllib2 to put some extra headers on the request, which seems to be the only answer without changing feedparser; a crude sketch follows below.
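For option 2, a crude sketch of the monkeypatch (Python 2 / urllib2; note it adds the header to every request made through urllib2 in the process, not just feedparser's):

import urllib2

_original_init = urllib2.Request.__init__

def _init_with_cache_control(self, *args, **kwargs):
    # wrap Request construction so every request carries the header
    _original_init(self, *args, **kwargs)
    self.add_header('Cache-control', 'max-age=600')

urllib2.Request.__init__ = _init_with_cache_control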
Better answers very welcome...
Update 2010-10-29: the patch is now merged upstream and is waiting for a release.
The semantics of the argument have changed (it's called request_headers now) but there is a new release of feedparser out that should support this use case.
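Once you have a release with that change, usage should look roughly like this (parameter name as described above; the header value is the one from the question):

import feedparser

# request_headers is passed through to the underlying HTTP request
feed = feedparser.parse(
    'http://example.com/atom.xml',
    request_headers={'Cache-Control': 'max-age=600'},
)
print(feed.feed.get('title'))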

Noob Question: Python + Twitter + App Engine - Oauth

I'm sorry but I'm having some trouble implementing Oauth within my app engine python project.
I've been working from http://github.com/tav/tweetapp, but I don't think I have a strong enough grasp on this platform to understand how to implement this class within the main.py where I'm building the rest of my app.
This may be a feeble attempt, but here is what I have so far:
twa = twitter_auth
client = twa.OAuthClient('twitter')
I've created a source folder within my project called "twitter_auth" that contains a file called "twitter_auth.py" (which contains the above-linked library) and a completely empty file called __init__.py.
I really have no idea what to do from here :/
Let me recommend taking a look at the tweepy library and some example tweepy apps. Specifically here: http://github.com/wasauce/tweepy-examples
This shows how to use oauth to authenticate a user: http://github.com/wasauce/tweepy-examples/tree/master/appengine/oauth_example/
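For reference, the tweepy OAuth dance looks roughly like this (keys, callback URL, and verifier are placeholders; the linked examples show how to split the steps across App Engine request handlers):

import tweepy

# hypothetical consumer credentials registered with Twitter
auth = tweepy.OAuthHandler('CONSUMER_KEY', 'CONSUMER_SECRET',
                           'http://your-app.appspot.com/callback')

# step 1: redirect the user to Twitter to authorize the app
redirect_url = auth.get_authorization_url()

# step 2: in the callback handler, exchange the verifier for an access token
auth.get_access_token('VERIFIER_FROM_CALLBACK')

# step 3: make API calls as the authenticated user
api = tweepy.API(auth)
api.update_status('signed in with OAuth')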
As Hagge said, it sounds like your issue is more with the tweetapp library than with App Engine. However, if you would like to know more about OAuth on App Engine and if I may be allowed to link to myself, my two articles on the topic seem to be reasonably popular.
The tweetapp library was an early prototype for Twitter OAuth. Tav did the heavy lifting and I deployed the site http://twitteroauth.appspot.com, using some of the tweetapp library. The actual source of that site is here (I need to update the site to point here): http://github.com/ryanwi/twitteroauth
I am still using it in production, but, it has aged and does not work for all API calls. I'd recommend trying a different, more up to date and maintained library as others have mentioned.
But, take a look at the twitteroauth source if you want to try to get a first attempt working.
These two are on Twitter's list
http://github.com/brosner/python-oauth2
http://code.google.com/p/oauth-python-twitter2/
I'm not familiar with that library, but after a quick look and seeing the warning that it is not maintained I'd search for something better. I implemented a simple Twitter connection based on Tornado's auth: see an example of how to make Twitter API calls here (and an authentication example here). In case you don't want to use tipfy, I recommend implementing the python-twitter library in your framework of choice.
