how to download a file using python-sharepoint library - python

I am using this library https://github.com/ox-it/python-sharepoint to connect to a SharePoint list. I can authenticate, access the list fields, including the full URL to the file I want, and it seems this library does have is_file() and open() methods however, I do not understand how to call these.
Any advice is appreciated!
from sharepoint import SharePointSite, basic_auth_opener
opener = basic_auth_opener(server_url, "domain/username", "password")
site = SharePointSite(server_url, opener)
sp_list = site.lists['ListName']
for row in sp_list.rows:
print row.id, row.Title, row.Author['name'], row.Created, row.EncodedAbsUrl
#download file
#row.open() ??
To quote from ReadMe file:
Support for document libraries is limited, but SharePointListRow
objects do support a is_file() method and an open() method for
accessing file data.

Basically you call these methods on the list row (which is of type SharePointListRow).
The open() method is actually the method of urllib2's opener, which you usually use like so:
import urllib2
opener = urllib2.build_opener()
response = opener.open('http://www.example.com/')
print ('READ CONTENTS:', response.read())
print ('URL :', response.geturl())
# ....
So you should be able to use it like this (I don't have any Sharepoint site to check this though):
from sharepoint import SharePointSite, basic_auth_opener
opener = basic_auth_opener(server_url, "domain/username", "password")
site = SharePointSite(server_url, opener)
sp_list = site.lists['ListName']
for row in sp_list.rows(): # <<<
print row.id, row.Title, row.Author['name'], row.Created, row.EncodedAbsUrl
# download file here
print ( "This row: ", row.name() ) # <<<
if row.is_file(): # <<<
response = row.open() # <<<
file_data = response.read() # <<<
# process the file data, e.g. write to disk

Related

How do I use python requests to download a processed files?

I'm using Django 1.8.1 with Python 3.4 and i'm trying to use requests to download a processed file. The following code works perfect for a normal request.get command to download the exact file at the server location, or unprocessed file.
The file needs to get processed based on the passed data (shown below as "data"). This data will need to get passed into the Django backend, and based off the text pass variables to run an internal program from the server and output .gcode instead .stl filetype.
python file.
import requests, os, json
SERVER='http://localhost:8000'
authuser = 'admin#google.com'
authpass = 'passwords'
#data not implimented
##############################################
data = {FirstName:Steve,Lastname:Escovar}
############################################
category = requests.get(SERVER + '/media/uploads/9128342/141303729.stl', auth=(authuser, authpass))
#download to path file
path = "/home/bradman/Downloads/requestdata/newfile.stl"
if category.status_code == 200:
with open(path, 'wb') as f:
for chunk in category:
f.write(chunk)
I'm very confused about this, but I think the best course of action is to pass the data along with request.get, and somehow make some function to grab them inside my views.py for Django. Anyone have any ideas?
To use data in request you can do
get( ... , params=data)
(and you get data as parameters in url)
or
post( ... , data=data).
(and you send data in body - like HTML form)
BTW. some APIs need params= and data= in one request of GET or POST to send all needed information.
Read requests documentation

How to convert suds object to xml string

This is a duplicate to this question:
How to convert suds object to xml
But the question has not been answered: "totxt" is not an attribute on the Client class.
Unfortunately I lack of reputation to add comments. So I ask again:
Is there a way to convert a suds object to its xml?
I ask this because I already have a system that consumes wsdl files and sends data to a webservice. But now the customers want to alternatively store the XML as files (to import them later manually). So all I need are 2 methods for writing data: One writes to a webservice (implemented and tested), the other (not implemented yet) writes to files.
If only I could make something like this:
xml_as_string = My_suds_object.to_xml()
The following code is just an example and does not run. And it's not elegant. Doesn't matter. I hope you get the idea what I want to achieve:
I have the function "write_customer_obj_webservice" that works. Now I want to write the function "write_customer_obj_xml_file".
import suds
def get_customer_obj():
wsdl_url = r'file:C:/somepathhere/Customer.wsdl'
service_url = r'http://someiphere/Customer'
c = suds.client.Client(wsdl_url, location=service_url)
customer = c.factory.create("ns0:CustomerType")
return customer
def write_customer_obj_webservice(customer):
wsdl_url = r'file:C:/somepathhere/Customer.wsdl'
service_url = r'http://someiphere/Customer'
c = suds.client.Client(wsdl_url, location=service_url)
response = c.service.save(someparameters, None, None, customer)
return response
def write_customer_obj_xml_file(customer):
output_filename = r'C\temp\testxml'
# The following line is the problem. "to_xml" does not exist and I can't find a way to do it.
xml = customer.to_xml()
fo = open(output_filename, 'a')
try:
fo.write(xml)
except:
raise
else:
response = 'All ok'
finally:
fo.close()
return response
# Get the customer object always from the wsdl.
customer = get_customer_obj()
# Since customer is an object, setting it's attributes is very easy. There are very complex objects in this system.
customer.name = "Doe J."
customer.age = 42
# Write the new customer to a webservice or store it in a file for later proccessing
if later_processing:
response = write_customer_obj_xml_file(customer)
else:
response = write_customer_obj_webservice(customer)
I found a way that works for me. The trick is to create the Client with the option "nosend=True".
In the documentation it says:
nosend - Create the soap envelope but don't send. When specified, method invocation returns a RequestContext instead of sending it.
The RequestContext object has the attribute envelope. This is the XML as string.
Some pseudo code to illustrate:
c = suds.client.Client(url, nosend=True)
customer = c.factory.create("ns0:CustomerType")
customer.name = "Doe J."
customer.age = 42
response = c.service.save(someparameters, None, None, customer)
print response.envelope # This prints the XML string that would have been sent.
You have some issues in write_customer_obj_xml_file function:
Fix bad path:
output_filename = r'C:\temp\test.xml'
The following line is the problem. "to_xml" does not exist and I can't find a way to do it.
What's the type of customer? type(customer)?
xml = customer.to_xml() # to be continued...
Why mode='a'? ('a' => append, 'w' => create + write)
Use a with statement (file context manager).
with open(output_filename, 'w') as fo:
fo.write(xml)
Don't need to return a response string: use an exception manager. The exception to catch can be EnvironmentError.
Analyse
The following call:
customer = c.factory.create("ns0:CustomerType")
Construct a CustomerType on the fly, and return a CustomerType instance customer.
I think you can introspect your customer object, try the following:
vars(customer) # display the object attributes
help(customer) # display an extensive help about your instance
Another way is to try the WSDL URLs by hands, and see the XML results.
You may obtain the full description of your CustomerType object.
And then?
Then, with the attributes list, you can create your own XML. Use an XML template and fill it with the object attributes.
You may also found the magic function (to_xml) which do the job for you. But, not sure the XML format matches your need.
client = Client(url)
client.factory.create('somename')
# The last XML request by client
client.last_sent()
# The last XML response from Web Service
client.last_received()

How to download large files in Python 2

I'm trying to download large files (approx. 1GB) with mechanize module, but I have been unsuccessful. I've been searching for similar threads, but I have found only those, where the files are publicly accessible and no login is required to obtain a file. But this is not my case as the file is located in the private section and I need to login before the download. Here is what I've done so far.
import mechanize
g_form_id = ""
def is_form_found(form1):
return "id" in form1.attrs and form1.attrs['id'] == g_form_id
def select_form_with_id_using_br(br1, id1):
global g_form_id
g_form_id = id1
try:
br1.select_form(predicate=is_form_found)
except mechanize.FormNotFoundError:
print "form not found, id: " + g_form_id
exit()
url_to_login = "https://example.com/"
url_to_file = "https://example.com/download/files/filename=fname.exe"
local_filename = "fname.exe"
br = mechanize.Browser()
br.set_handle_robots(False) # ignore robots
br.set_handle_refresh(False) # can sometimes hang without this
br.addheaders = [('User-agent', 'Firefox')]
response = br.open(url_to_login)
# Find login form
select_form_with_id_using_br(br, 'login-form')
# Fill in data
br.form['email'] = 'email#domain.com'
br.form['password'] = 'password'
br.set_all_readonly(False) # allow everything to be written to
br.submit()
# Try to download file
br.retrieve(url_to_file, local_filename)
But I'm getting an error when 512MB is downloaded:
Traceback (most recent call last):
File "dl.py", line 34, in <module>
br.retrieve(br.retrieve(url_to_file, local_filename)
File "C:\Python27\lib\site-packages\mechanize\_opener.py", line 277, in retrieve
block = fp.read(bs)
File "C:\Python27\lib\site-packages\mechanize\_response.py", line 199, in read
self.__cache.write(data)
MemoryError: out of memory
Do you have any ideas how to solve this?
Thanks
You can use bs4 and requests to get you logged in then write the streamed content. There are a few form fields required including a _token_ field that is definitely necessary:
from bs4 import BeautifulSoup
import requests
from urlparse import urljoin
data = {'email': 'email#domain.com', 'password': 'password'}
base = "https://support.codasip.com"
with requests.Session() as s:
# update headers
s.headers.update({'User-agent': 'Firefox'})
# use bs4 to parse the from fields
soup = BeautifulSoup(s.get(base).content)
form = soup.select_one("#frm-loginForm")
# works as it is a relative path. Not always the case.
action = form["action"]
# Get rest of the fields, ignore password and email.
for inp in form.find_all("input", {"name":True,"value":True}):
name, value = inp["name"], inp["value"]
if name not in data:
data[name] = value
# login
s.post(urljoin(base, action), data=data)
# get protected url
with open(local_filename, "wb") as f:
for chk in s.get(url_to_file, stream=True).iter_content(1024):
f.write(chk)
Try downloading/writing it by chunks. Seems like file takes all your memory.
You should specify Range header for your request if server supports it.
https://en.wikipedia.org/wiki/List_of_HTTP_header_fields

Python requests_toolbelt MultipartEncoder filename

Using requests_toolbelt to upload large files in a Multipart form, I have constructed a method below which succeeds in uploading the file, however I cannot access the posted filename. How do I access the filename on the server?
# client-side
file = open('/Volumes/Extra/test/my_video.mpg', 'rb')
payload = MultipartEncoder({file.name: file})
r = requests.post(url, data=payload, headers={'Content-Type': 'application/octet-stream'})
# server-side
#view_config(route_name='remote.agent_upload', renderer='json')
def remote_agent_upload(request):
r = request.response
fs = request.body_file
f = open('/Volumes/Extra/tests2/bar.mpg', 'wb') # wish to use filename here
f.write(fs.read())
fs.close()
f.close()
return r
OK, it looks like you are using the name of the file as the field name. Also, the way that you are doing it, seems like the entire post content is being written to file... Is this the desired outcome? Have you tried to actually play your mpg files after you write them on the server side?
I don't have an HTTP server readily available to test at the moment which automagically gives me a request object, but I am assuming that the request object is a webob.Request object (at least it seems like that is the case, please correct me if I'm wrong)
OK, let me show you my test. (This works on python3.4, not sure what version of Python you are using, but I think it should also work on Python 2.7 - not tested though)
The code in this test is a bit long, but it is heavily commented to help you understand what I did every step of the way. Hopefully, it will give you a better understanding of how HTTP requests and responses work in python with the tools you are using
# My Imports
from requests_toolbelt import MultipartEncoder
from webob import Request
import io
# Create a buffer object that can be read by the MultipartEncoder class
# This works just like an open file object
file = io.BytesIO()
# The file content will be simple for my test.
# But you could just as easily have a multi-megabyte mpg file
# Write the contents to the file
file.write(b'test mpg content')
# Then seek to the beginning of the file so that the
# MultipartEncoder can read it from the beginning
file.seek(0)
# Create the payload
payload = MultipartEncoder(
{
# The name of the file upload field... Not the file name
'uploadedFile': (
# This would be the name of the file
'This is my file.mpg',
# The file handle that is ready to be read from
file,
# The content type of the file
'application/octet-stream'
)
}
)
# To send the file, you would use the requests.post method
# But the content type is not application-octet-stream
# The content type is multipart/form-data; with a boundary string
# Without the proper header type, your server would not be able to
# figure out where the file begins and ends and would think the
# entire post content is the file, which it is not. The post content
# might even contain multiple files
# So, to send your file, you would use:
#
# response = requests.post(url, data=payload, headers={'Content-Type': payload.content_type})
# Instead of sending the payload to the server,
# I am just going to grab the output as it would be sent
# This is because I don't have a server, but I can easily
# re-create the object using this output
postData = payload.to_string()
# Create an input buffer object
# This will be read by our server (our webob.Request object)
inputBuffer = io.BytesIO()
# Write the post data to the input buffer so that the webob.Request object can read it
inputBuffer.write(postData)
# And, once again, seek to 0
inputBuffer.seek(0)
# Create an error buffer so that errors can be written to it if there are any
errorBuffer = io.BytesIO()
# Setup our wsgi environment just like the server would give us
environment = {
'HTTP_HOST': 'localhost:80',
'PATH_INFO': '/index.py',
'QUERY_STRING': '',
'REQUEST_METHOD': 'POST',
'SCRIPT_NAME': '',
'SERVER_NAME': 'localhost',
'SERVER_PORT': '80',
'SERVER_PROTOCOL': 'HTTP/1.0',
'CONTENT_TYPE': payload.content_type,
'wsgi.errors': errorBuffer,
'wsgi.input': inputBuffer,
'wsgi.multiprocess': False,
'wsgi.multithread': False,
'wsgi.run_once': False,
'wsgi.url_scheme': 'http',
'wsgi.version': (1, 0)
}
# Create our request object
# This is the same as your request object and should have all our info for reading
# the file content as well as the file name
request = Request(environment)
# At this point, the request object is the same as what you get on your server
# So, from this point on, you can use the following code to get
# your actual file content as well as your file name from the object
# Our uploaded file is in the POST. And the POST field name is 'uploadedFile'
# Grab our file so that it can be read
uploadedFile = request.POST['uploadedFile']
# To read our content, you can use uploadedFile.file.read()
print(uploadedFile.file.read())
# And to get the file name, you can use uploadedFile.filename
print(uploadedFile.filename)
So, I think this modified code will work for you. (Hopefully)
Again, not tested because I don't actually have a server to test with. And also, I don't know what kind of object your "request" object is on the server side.... OK, here goes:
# client-side
import requests
file = open('/Volumes/Extra/test/my_video.mpg', 'rb')
payload = MultipartEncoder({'uploadedFile': (file.name, file, 'application/octet-stream')})
r = requests.post('http://somewhere/somefile.py', data=payload, headers={'Content-Type': payload.content_type})
# server-side
#view_config(route_name='remote.agent_upload', renderer='json')
def remote_agent_upload(request):
# Write your actual file contents, not the post data which contains multi part boundary
uploadedFile = request.POST['uploadedFile']
fs = uploadedFile.file
# The file name is insecure. What if the file name comes through as '../../../etc/passwd'
# If you don't secure this, you've just wiped your /etc/passwd file and your server is toast
# (assuming the web user has write permission to the /etc/passwd file
# which it shouldn't, but just giving you a worst case scenario)
fileName = uploadedFile.filename
# Secure the fileName here...
# Make sure it doesn't have any slashes or double dots, or illegal characters, etc.
# I'll leave that up to you
# Write the file
f = open('/Volumes/Extra/tests2/' + fileName, 'wb')
f.write(fs.read())
Probably too late for the original OP but may help someone else. This is how I upload a file with accompanying json in a multipart/form-data upload using the MultipartEncoder. When I require a file to uploaded as binary with a single json string as part of a multipart request (so there are just two parts, the file and the json). Note that in creating my request header (it's a custom header as designated by the receiving server) I get the content_type from the encoded object (it usually comes through as multipart/form-data). I'm using simplejson.dumps but you can just use json.dumps I believe.
m = MultipartEncoder([
('json', (None, simplejson.dumps(datapayload), 'text/plain')),
('file', (os.path.basename(file_path), open(file_path, 'rb'), 'text/plain'))],
None, encoding='utf-8')
headers = {'Authorization': 'JwToken' + ' ' + jwt_str, 'content-type': m.content_type}
response = requests.post(uri, headers=headers, data=m, timeout=45, verify=True )
In the file part, the field is called "file", but I use os.path.basename(file_path) to get just the filename from a full file path e.g. c:\temp\mytestfile.txt . It's possible I could just as easily call the file something else (that's not the original name) in this field if I wanted to.

upload binary data or file from memory using python http post

I have seen the receipes for uploading files via multipartform-data and pycurl. Both methods seem to require a file on disk. Can these recipes be modified to supply binary data from memory instead of from disk ? I guess I could just use a xmlrpc server instead.I wanted to get around having to encode and decode the binary data and send it raw... Do pycurl and mutlipartform-data work with raw data ?
This (small) library will take a file descriptor, and will do the HTTP POST operation: https://github.com/seisen/urllib2_file/
You can pass it a StringIO object (containing your in-memory data) as the file descriptor.
Find something that cas work with a file handle. Then simply pass a StringIO object instead of a real file descriptor.
I met similar issue today, after tried both and pycurl and multipart/form-data, I decide to read python httplib/urllib2 source code to find out, I did get one comparably good solution:
set Content-Length header(of the file) before doing post
pass a opened file when doing post
Here is the code:
import urllib2, os
image_path = "png\\01.png"
url = 'http://xx.oo.com/webserviceapi/postfile/'
length = os.path.getsize(image_path)
png_data = open(image_path, "rb")
request = urllib2.Request(url, data=png_data)
request.add_header('Cache-Control', 'no-cache')
request.add_header('Content-Length', '%d' % length)
request.add_header('Content-Type', 'image/png')
res = urllib2.urlopen(request).read().strip()
return res
see my blog post: http://www.2maomao.com/blog/python-http-post-a-binary-file-using-urllib2/
Following python code works reliable on 2.6.x. The input data is of type str.
Note that the server that receives the data has to loop to read all the data as the large
data sizes will be chunked. Also attached java code snippet to read the chunked data.
def post(self, url, data):
self.curl = pycurl.Curl()
self.response_headers = StringIO.StringIO()
self.response_body = io.BytesIO()
self.curl.setopt(pycurl.WRITEFUNCTION, self.response_body.write)
self.curl.setopt(pycurl.HEADERFUNCTION, self.response_headers.write)
self.curl.setopt(pycurl.FOLLOWLOCATION, 1)
self.curl.setopt(pycurl.MAXREDIRS, 5)
self.curl.setopt(pycurl.TIMEOUT, 60)
self.curl.setopt(pycurl.ENCODING,"deflate, gzip")
self.curl.setopt(pycurl.URL, url)
self.curl.setopt(pycurl.VERBOSE, 1)
self.curl.setopt(pycurl.POST,1)
self.curl.setopt(pycurl.POSTFIELDS,data)
self.curl.setopt(pycurl.HTTPHEADER, [ "Content-Type: octate-stream" ])
self.curl.setopt(pycurl.POSTFIELDSIZE, len(data))
self.curl.perform()
return url, self.curl.getinfo(pycurl.RESPONSE_CODE),self.response_headers.getvalue(), self.response_body.getvalue()
Java code for the servlet engine:
int postSize = Integer.parseInt(req.getHeader("Content-Length"));
results = new byte[postSize];
int read = 0;
while(read < postSize) {
int n = req.getInputStream().read(results);
if (n < 0) break;
read += n;
}
found a solution that works with the cherrypy file upload example: urllib2-binary-upload.py
import io # Part of core Python
import requests # Install via: 'pip install requests'
# Get the data in bytes. I got it via:
# with open("smile.gif", "rb") as fp: data = fp.read()
data = b"GIF89a\x12\x00\x12\x00\x80\x01\x00\x00\x00\x00\xff\x00\x00!\xf9\x04\x01\n\x00\x01\x00,\x00\x00\x00\x00\x12\x00\x12\x00\x00\x024\x84\x8f\x10\xcba\x8b\xd8ko6\xa8\xa0\xb3Wo\xde9X\x18*\x15x\x99\xd9'\xad\x1b\xe5r(9\xd2\x9d\xe9\xdd\xde\xea\xe6<,\xa3\xd8r\xb5[\xaf\x05\x1b~\x8e\x81\x02\x00;"
# Hookbin is similar to requestbin - just a neat little service
# which helps you to understand which queries are sent
url = 'https://hookb.in/je2rEl733Yt9dlMMdodB'
in_memory_file = io.BytesIO(data)
response = requests.post(url, files=(("smile.gif", in_memory_file),))

Categories