Trouble finding a library that supports HTTP PUT in Python 2.7

I need to perform HTTP PUT operations from Python. Which libraries have been proven to support this? More specifically, I need to perform PUT on key pairs, not file uploads.
I have been trying to work with restful_lib.py, but I get invalid results from the API that I am testing. (I know the results are invalid because I can fire off the same request with curl from the command line and it works.)
After attending PyCon 2011 I came away with the impression that pycurl might be my solution, so I have been trying to implement that. I have two issues here. First, pycurl renames "PUT" as "UPLOAD", which seems to imply that it is focused on file uploads rather than key pairs. Second, when I try to use it I never seem to get a return from the .perform() step.
Here is my current code:
import pycurl
import urllib
url='https://xxxxxx.com/xxx-rest'
UAM=pycurl.Curl()
def on_receive(data):
    print data

arglist = [
    ('username', 'testEmailAdd@test.com'),
    ('email', 'testEmailAdd@test.com'),
    ('username', 'testUserName'),
    ('givenName', 'testFirstName'),
    ('surname', 'testLastName')]
encodedarg=urllib.urlencode(arglist)
path2= url+"/user/"+"99b47002-56e5-4fe2-9802-9a760c9fb966"
UAM.setopt(pycurl.URL, path2)
UAM.setopt(pycurl.POSTFIELDS, encodedarg)
UAM.setopt(pycurl.SSL_VERIFYPEER, 0)
UAM.setopt(pycurl.UPLOAD, 1) #Set to "PUT"
UAM.setopt(pycurl.CONNECTTIMEOUT, 1)
UAM.setopt(pycurl.TIMEOUT, 2)
UAM.setopt(pycurl.WRITEFUNCTION, on_receive)
print "about to perform"
print UAM.perform()
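For reference, libcurl can also send a form-encoded body with the PUT verb by overriding the method string via pycurl.CUSTOMREQUEST, rather than going through the UPLOAD path. A rough, untested sketch reusing the placeholder URL and field values above:
import pycurl
import urllib
from StringIO import StringIO
buf = StringIO()
c = pycurl.Curl()
c.setopt(pycurl.URL, 'https://xxxxxx.com/xxx-rest/user/99b47002-56e5-4fe2-9802-9a760c9fb966')
c.setopt(pycurl.CUSTOMREQUEST, 'PUT')   # keep the POSTFIELDS body, but send PUT instead of POST
c.setopt(pycurl.POSTFIELDS, urllib.urlencode([('givenName', 'testFirstName')]))
c.setopt(pycurl.SSL_VERIFYPEER, 0)
c.setopt(pycurl.WRITEFUNCTION, buf.write)  # collect the response body instead of printing it
c.perform()
print buf.getvalue()
c.close()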

httplib should manage this.
http://docs.python.org/library/httplib.html
There's an example on this page http://effbot.org/librarybook/httplib.htm

urllib and urllib2 are also suggested.
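urllib2 has no explicit PUT support either, but you can override the method on a Request object. A minimal, untested sketch, assuming the placeholder URL and field values from the question:
import urllib
import urllib2
url = 'https://xxxxxx.com/xxx-rest/user/99b47002-56e5-4fe2-9802-9a760c9fb966'
data = urllib.urlencode([('givenName', 'testFirstName'), ('surname', 'testLastName')])
request = urllib2.Request(url, data=data, headers={'Content-Type': 'application/x-www-form-urlencoded'})
request.get_method = lambda: 'PUT'  # urllib2 would otherwise send POST whenever a body is present
print urllib2.urlopen(request).read()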

Thank you all for your assistance. I think I have found an answer.
My code now looks like this:
import urllib
import httplib
import lxml
from lxml import etree
url='xxxx.com'
UAM=httplib.HTTPSConnection(url)
arglist = [
    ('username', 'testEmailAdd@test.com'),
    ('email', 'testEmailAdd@test.com'),
    ('username', 'testUserName'),
    ('givenName', 'testFirstName'),
    ('surname', 'testLastName'),
]
encodedarg=urllib.urlencode(arglist)
uuid="99b47002-56e5-4fe2-9802-9a760c9fb966"
path= "/uam-rest/user/"+uuid
UAM.putrequest("PUT", path)
UAM.putheader('content-type','application/x-www-form-urlencoded')
UAM.putheader('accepts','application/com.internap.ca.uam.ama-v1+xml')
UAM.putheader("Content-Length", str(len(encodedarg)))
UAM.endheaders()
UAM.send(encodedarg)
response = UAM.getresponse()
html = etree.HTML(response.read())
result = etree.tostring(html, pretty_print=True, method="html")
print result
Updated: Now I am getting valid responses. This seems to be my solution. (The pretty print at the end isn't working, but I don't really care, that is just there while I am building the function.)
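For reference, the same PUT can be issued in one call with HTTPSConnection.request(), which sets Content-Length for you. A small sketch using the placeholder host, path, and headers from the code above:
import urllib
import httplib
arglist = [('givenName', 'testFirstName'), ('surname', 'testLastName')]  # same placeholder key pairs as above
body = urllib.urlencode(arglist)
headers = {
    'Content-Type': 'application/x-www-form-urlencoded',
    'accepts': 'application/com.internap.ca.uam.ama-v1+xml',
}
conn = httplib.HTTPSConnection('xxxx.com')
conn.request('PUT', '/uam-rest/user/99b47002-56e5-4fe2-9802-9a760c9fb966', body, headers)
print conn.getresponse().read()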

Related

Download entire history of a Wikipedia page

I'd like to download the entire revision history of a single article on Wikipedia, but am running into a roadblock.
It is very easy to download an entire Wikipedia article, or to grab pieces of its history using the Special:Export URL parameters:
curl -d "" 'https://en.wikipedia.org/w/index.php?title=Special:Export&pages=Stack_Overflow&limit=1000&offset=1' -o "StackOverflow.xml"
And of course I can download the entire site including all versions of every article from here, but that's many terabytes and way more data than I need.
Is there a pre-built method for doing this? (Seems like there must be.)
The example above only gets information about the revisions, not the actual contents themselves. Here's a short Python script that downloads the full content and metadata history of a page into individual JSON files:
import mwclient
import json
import time
site = mwclient.Site('en.wikipedia.org')
page = site.pages['Wikipedia']
for i, (info, content) in enumerate(zip(page.revisions(), page.revisions(prop='content'))):
    info['timestamp'] = time.strftime("%Y-%m-%dT%H:%M:%S", info['timestamp'])
    print(i, info['timestamp'])
    open("%s.json" % info['timestamp'], "w").write(json.dumps(
        {'info': info,
         'content': content}, indent=4))
Wandering around looking for clues to another question of my own (my way of saying I know nothing substantial about this topic), I came upon this a moment after reading your question: http://mwclient.readthedocs.io/en/latest/reference/page.html. Have a look at the revisions method.
EDIT: I also see http://mwclient.readthedocs.io/en/latest/user/page-ops.html#listing-page-revisions.
Sample code using the mwclient module:
#!/usr/bin/env python3
import logging, mwclient, pickle, os
from mwclient import Site
from mwclient.page import Page
logging.root.setLevel(logging.DEBUG)
logging.debug('getting page...')
env_page = os.getenv("MEDIAWIKI_PAGE")
page_name = env_page if env_page is not None else 'Stack Overflow'
page_name = Page.normalize_title(page_name)  # normalize whichever title was chosen above
site = Site('en.wikipedia.org') # https by default. change w/`scheme=`
page = site.pages[page_name]
logging.debug('extracting revisions (may take a really long time, depending on the page)...')
revisions = []
for i, revision in enumerate(page.revisions()):
    revisions.append(revision)
logging.debug('saving to file...')
with open('{}Revisions.mediawiki.pkl'.format(page_name), 'wb+') as f:
    pickle.dump(revisions, f, protocol=0)  # protocol 0 allows backwards compatibility between machines
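If you would rather not depend on mwclient, the same history is available from the raw MediaWiki API using its continuation mechanism. A sketch with requests (the page title and the revision properties requested here are just examples):
import requests
API = 'https://en.wikipedia.org/w/api.php'
params = {
    'action': 'query',
    'format': 'json',
    'titles': 'Stack Overflow',
    'prop': 'revisions',
    'rvprop': 'ids|timestamp|user|comment',  # add 'content' to also fetch the wikitext of every revision
    'rvlimit': 50,
}
revisions = []
cont = {}
while True:
    data = requests.get(API, params={**params, **cont}).json()
    for page in data['query']['pages'].values():
        revisions.extend(page.get('revisions', []))
    if 'continue' not in data:
        break
    cont = data['continue']  # carry the continuation tokens into the next request
print(len(revisions), 'revisions fetched')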

Python URLLib does not work with PyQt + Multiprocessing

Simple code such as this:
import urllib2
import requests
from PyQt4 import QtCore
import multiprocessing
import time
data = (
    ['a', '2'],
)

def mp_worker((inputs, the_time)):
    r = requests.get('http://www.gpsbasecamp.com/national-parks')
    request = urllib2.Request("http://www.gpsbasecamp.com/national-parks")
    response = urllib2.urlopen(request)

def mp_handler():
    p = multiprocessing.Pool(2)
    p.map(mp_worker, data)

if __name__ == '__main__':
    mp_handler()
Basically, if I import PyQt4 and make a urllib request (which I believe is used under the hood by almost all web extraction libraries such as BeautifulSoup, Requests, or PyQuery), it crashes with a cryptic log on my Mac.
This is exactly right. It always fails on Mac; I have wasted days trying to fix this, and honestly there is no fix as of now. The best way is to use a Thread instead of a Process, and it will work like a charm.
By the way -
r = requests.get('http://www.gpsbasecamp.com/national-parks')
and
request = urllib2.Request("http://www.gpsbasecamp.com/national-parks")
response = urllib2.urlopen(request)
do the same thing. Why are you doing it twice?
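As suggested above, switching to threads avoids the fork entirely while keeping the pool-style code. A minimal sketch using multiprocessing.dummy, which provides a thread-backed Pool with the same API (the URL and data tuple are the ones from the question):
import requests
from multiprocessing.dummy import Pool  # thread-backed drop-in for multiprocessing.Pool

data = (
    ['a', '2'],
)

def mp_worker((inputs, the_time)):
    # Threads share the parent process, so the macOS proxy lookup never runs after a fork
    return requests.get('http://www.gpsbasecamp.com/national-parks').status_code

if __name__ == '__main__':
    pool = Pool(2)
    print pool.map(mp_worker, data)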
This may be due to _scproxy.get_proxies() not being fork-safe on macOS.
This is raised here https://bugs.python.org/issue33725#msg329926
_scproxy has been known to be problematic for some time, see for instance Issue31818. That issue also gives a simple workaround: setting urllib's "no_proxy" environment variable to "*" will prevent the calls to the System Configuration framework.
This is something that urllib may be attempting to do, which causes the failure when multiprocessing is used.
There is a workaround: set the environment variable no_proxy to *.
E.g. export no_proxy=*
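The same workaround can also be applied from inside the script, as long as the variable is set before any worker processes are created. A small sketch:
import os
os.environ['no_proxy'] = '*'  # stops urllib from querying the macOS System Configuration framework

import multiprocessing
import urllib2

def worker(url):
    return urllib2.urlopen(url).read()[:80]

if __name__ == '__main__':
    pool = multiprocessing.Pool(2)
    print pool.map(worker, ['http://www.gpsbasecamp.com/national-parks'])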

python django soaplib response with classmodel issue

I run a SOAP server in Django.
Is it possible to create a soap method that returns a soaplib classmodel instance without <{method name}Response><{method name}Result> tags?
For example, here is a part of my soap server code:
# -*- coding: cp1254 -*-
from soaplib.core.service import rpc, DefinitionBase, soap
from soaplib.core.model.primitive import String, Integer, Boolean
from soaplib.core.model.clazz import Array, ClassModel
from soaplib.core import Application
from soaplib.core.server.wsgi import Application as WSGIApplication
from soaplib.core.model.binary import Attachment
class documentResponse(ClassModel):
    __namespace__ = ""
    msg = String
    hash = String

class MyService(DefinitionBase):
    __service_interface__ = "MyService"
    __port_types__ = ["MyServicePortType"]

    @soap(String, Attachment, String, _returns=documentResponse,
          _faults=(MyServiceFaultMessage,), _port_type="MyServicePortType")
    def sendDocument(self, fileName, binaryData, hash):
        binaryData.file_name = fileName
        binaryData.save_to_file()
        resp = documentResponse()
        resp.msg = "Saved"
        resp.hash = hash
        return resp
and it responds like this:
<senv:Body>
  <tns:sendDocumentResponse>
    <tns:sendDocumentResult>
      <hash>14a95636ddcf022fa2593c69af1a02f6</hash>
      <msg>Saved</msg>
    </tns:sendDocumentResult>
  </tns:sendDocumentResponse>
</senv:Body>
But I need a response like this:
<senv:Body>
  <ns3:documentResponse>
    <hash>A694EFB083E81568A66B96FC90EEBACE</hash>
    <msg>Saved</msg>
  </ns3:documentResponse>
</senv:Body>
What kind of configuration should I make in order to get the second response I mentioned above?
Thanks in advance.
I haven't used Python's SoapLib yet, but had the same problem while using .NET soap libs. Just for reference, in .NET this is done using the following decorator:
[SoapDocumentMethod(ParameterStyle=SoapParameterStyle.Bare)]
I've looked in the soaplib source, but it seems it doesn't have a similar decorator. The closest thing I've found is the _style property. As seen from the code https://github.com/soaplib/soaplib/blob/master/src/soaplib/core/service.py#L124 - when using
@soap(..., _style='document')
it doesn't append the %sResult tag, but I haven't tested this. Just try it and see if it works the way you want.
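Applied to the sendDocument method from the question, that would look something like this (untested, per the caveat above; it only adds _style='document' to the existing decorator):
@soap(String, Attachment, String,
      _returns=documentResponse,
      _faults=(MyServiceFaultMessage,),
      _port_type="MyServicePortType",
      _style='document')
def sendDocument(self, fileName, binaryData, hash):
    binaryData.file_name = fileName
    binaryData.save_to_file()
    resp = documentResponse()
    resp.msg = "Saved"
    resp.hash = hash
    return resp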
If it doesn't work, but you still want to get this kind of response, look at Spyne:
http://spyne.io/docs/2.10/reference/decorator.html
It is a fork of soaplib (I think) and has the _soap_body_style='bare' decorator argument, which I believe is what you want.

Python -- how to grab images off the internet

How can I grab a picture off a known URL and save it to my computer using Python (v2.6)? Thanks
You can use urllib.urlretrieve.
Copy a network object denoted by a URL to a local file, if necessary.
Example:
>>> import urllib
>>> urllib.urlretrieve('http://i.imgur.com/Ph4Xw.jpg', 'duck.jpg')
('duck.jpg', <httplib.HTTPMessage instance at 0x10118e830>)
# by now the file should be downloaded to 'duck.jpg'
You can use urllib.urlretrieve:
import urllib
urllib.urlretrieve('http://example.com/file.png', './file.png')
If you need more flexibility, use urllib2.
In the absence of any context, the following is a simple example of using standard library modules to make a non-authenticated HTTP GET request:
import urllib2
response = urllib2.urlopen('http://lolcat.com/images/lolcats/1674.jpg')
with open('lolcat.jpg', 'wb') as outfile:
    outfile.write(response.read())
EDIT: urlretrieve() is new to me. I guess then you could turn it into a command line one-liner... if you're bored.
$ python -c "import urllib; urllib.urlretrieve('http://lolcat.com/images/lolcats/1674.jpg', filename='/tmp/1674.jpg')"
batteries are included in urllib:
urllib.urlretrieve(yourUrl, fileName)
import urllib2
open("fish.jpg", "w").write(urllib2.urlopen("http://www.fiskeri.no/Fiskeslag/Fjesing.jpg").read())
Easy.
import urllib
urllib.urlretrieve("http://www.dokuwiki.org/_media/wiki:dokuwiki-128.png","dafile.png")

Given a URL to a text file, what is the simplest way to read the contents of the text file?

In Python, when given the URL for a text file, what is the simplest way to access the contents of the text file and print them out locally, line by line, without saving a local copy of the file?
TargetURL=http://www.myhost.com/SomeFile.txt
#read the file
#print first line
#print second line
#etc
Edit 09/2016: In Python 3 and up use urllib.request instead of urllib2
Actually the simplest way is:
import urllib2 # the lib that handles the url stuff
data = urllib2.urlopen(target_url) # it's a file like object and works just like a file
for line in data: # files are iterable
    print line
You don't even need "readlines", as Will suggested. You could even shorten it to: *
import urllib2
for line in urllib2.urlopen(target_url):
    print line
But remember in Python, readability matters.
However, while this is the simplest way, it is not the safest, because with network programming you often don't know whether the amount of data to expect will be respected. So you are generally better off reading a fixed and reasonable amount of data, something you know is enough for the data you expect but that will prevent your script from being flooded:
import urllib2
data = urllib2.urlopen("http://www.google.com").read(20000) # read only 20 000 chars
data = data.split("\n") # then split it into lines
for line in data:
    print line
* Second example in Python 3:
import urllib.request # the lib that handles the url stuff
for line in urllib.request.urlopen(target_url):
    print(line.decode('utf-8'))  # utf-8 or iso8859-1 or whatever the page encoding scheme is
I'm a newbie to Python and the offhand comment about Python 3 in the accepted solution was confusing. For posterity, the code to do this in Python 3 is
import urllib.request
data = urllib.request.urlopen(target_url)
for line in data:
    ...
or alternatively
from urllib.request import urlopen
data = urlopen(target_url)
Note that just import urllib does not work.
The requests library has a simpler interface and works with both Python 2 and 3.
import requests
response = requests.get(target_url)
data = response.text
There's really no need to read line-by-line. You can get the whole thing like this:
import urllib
txt = urllib.urlopen(target_url).read()
import urllib2
for line in urllib2.urlopen("http://www.myhost.com/SomeFile.txt"):
    print line
Another way in Python 3 is to use the urllib3 package.
import urllib3
http = urllib3.PoolManager()
response = http.request('GET', target_url)
data = response.data.decode('utf-8')
This can be a better option than urllib since urllib3 boasts having
Thread safety.
Connection pooling.
Client-side SSL/TLS verification.
File uploads with multipart encoding.
Helpers for retrying requests and dealing with HTTP redirects.
Support for gzip and deflate encoding.
Proxy support for HTTP and SOCKS.
100% test coverage.
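As one example of the retry helpers mentioned in the list above, a PoolManager can be handed a Retry policy (a rough sketch; target_url is the same placeholder used earlier):
import urllib3
from urllib3.util.retry import Retry
# Retry transient failures a few times with exponential backoff
http = urllib3.PoolManager(retries=Retry(total=3, backoff_factor=0.5, status_forcelist=[502, 503, 504]))
response = http.request('GET', target_url)
data = response.data.decode('utf-8')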
import urllib2
f = urllib2.urlopen(target_url)
for l in f.readlines():
    print l
For me, none of the above responses worked out of the box. Instead, I had to do the following (Python 3):
from urllib.request import urlopen
data = urlopen("[your url goes here]").read().decode('utf-8')
# Do what you need to do with the data.
The requests package works really well for a simple use case like this, as @Andrew Mao suggested:
import requests
response = requests.get('http://lib.stat.cmu.edu/datasets/boston')
data = response.text
for i, line in enumerate(data.split('\n')):
    print(f'{i} {line}')
Output:
0 The Boston house-price data of Harrison, D. and Rubinfeld, D.L. 'Hedonic
1 prices and the demand for clean air', J. Environ. Economics & Management,
2 vol.5, 81-102, 1978. Used in Belsley, Kuh & Welsch, 'Regression diagnostics
3 ...', Wiley, 1980. N.B. Various transformations are used in the table on
4 pages 244-261 of the latter.
5
6 Variables in order:
Check out the Kaggle notebook on how to extract a dataset/dataframe from a URL.
I do think requests is the best option. Also note the possibility of setting encoding manually.
import requests
response = requests.get("http://www.gutenberg.org/files/10/10-0.txt")
# response.encoding = "utf-8"
hehe = response.text
Just updating here the solution suggested by @ken-kinder for Python 2 so that it works with Python 3:
import urllib.request
urllib.request.urlopen(target_url).read()
You can also use this simple approach:
import requests
url_res = requests.get(url= "http://www.myhost.com/SomeFile.txt")
with open(filename + ".txt", "wb") as file:
    file.write(url_res.content)
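If the goal is to print the file line by line without saving a local copy, as the question asks, requests can also stream the response. A small sketch using the question's placeholder URL:
import requests
# Stream the response and print it line by line without writing anything to disk
response = requests.get("http://www.myhost.com/SomeFile.txt", stream=True)
for line in response.iter_lines(decode_unicode=True):
    print(line)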
