Can you unzip and view content on Google App Engine - Python?

I am trying to upload large CSV files to GAE inside a zip, sent as XML over HTTP POST.
Steps:
1. The CSV is zipped, base64 encoded, and sent to GAE via XML/HTTP POST
2. GAE parses the XML using minidom
3. GAE base64 decodes the zip
4. GAE extracts the CSV from the zip file
I have tried using zipfile but can't figure out how to create a ZipFile object from the base64-decoded string.
I get: TypeError: unbound method read() must be called with ZipFile instance as first argument (got str instance instead)
myZipFile = base64.decodestring(base64ZipFile)
objZip = zipfile.ZipFile(myZipFile,'r')
strCSV = zipfile.ZipFile.read(objZip,'list.csv')

As Rob mentioned, ZipFile requires a file-like object. You can use StringIO to provide a file-like interface to a string.
For example:
import base64
import StringIO
import zipfile

myZipFile = base64.decodestring(base64ZipFile)
objZip = zipfile.ZipFile(StringIO.StringIO(myZipFile), 'r')
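On Python 3 (where StringIO.StringIO and base64.decodestring are gone), a minimal sketch of the same flow, reusing base64ZipFile and the list.csv name from the question:
import base64
import io
import zipfile

# base64ZipFile is the base64-encoded zip extracted from the XML payload
raw = base64.b64decode(base64ZipFile)
with zipfile.ZipFile(io.BytesIO(raw), 'r') as zf:
    strCSV = zf.read('list.csv')  # the CSV contents as bytes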

Yes, you can. In fact, I wrote a blog post that describes how to do exactly that.

A simple approach might be to upload the zipped CSV to the blobstore using the blob upload API, and process the zip file from there. You'd need to fake a form post, but life might be simpler for you on the App Engine side.
There's an example of how to process zipped data in AppEngine MapReduce. See the BlobstoreZipInputReader class.

ZipFile does not take a string but a file-like object.
One solution is to create a tempfile, write the string to it, and then pass that to ZipFile:
import tempfile
import zipfile

tmp = tempfile.TemporaryFile()
tmp.write(myZipFile)  # myZipFile is your decoded string containing the zip data
tmp.seek(0)           # rewind before handing the file to ZipFile
objZip = zipfile.ZipFile(tmp)
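From there, reading the CSV named in the original question is straightforward:
strCSV = objZip.read('list.csv')
tmp.close()  # the temporary file is deleted automatically on close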

Related

PyFlink - How to readFile() with specified file input format (instead of text format)?

In Java/Scala API, there is a readFile(fileInputFormat, path, watchType, interval, pathFilter, typeInfo) method which reads files in the path based on the given fileInputFormat. With this method, I can read other file types, e.g. a gzip file.
Is there a corresponding method in Python API? (Or how can I read a gzip file with Python API?)
Thanks,
Acan
You need to use the built-in Python module gzip.
Example:
import gzip

with gzip.open("test.txt.gz", "rb") as f:
    data = f.read()
print(data)
# b'example text'
Here is a tutorial: https://www.tutorialspoint.com/python-support-for-gzip-files-gzip
You can call the read_text_file method of stream_execution_environment in PyFlink
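A minimal sketch, assuming a PyFlink version whose StreamExecutionEnvironment exposes read_text_file (the path and job name are placeholders):
from pyflink.datastream import StreamExecutionEnvironment

env = StreamExecutionEnvironment.get_execution_environment()
# read the file line by line into a DataStream of strings
ds = env.read_text_file("/path/to/input.txt")
ds.print()
env.execute("read-text-file-job")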

What is the difference between S3.Client.upload_file() and S3.Client.upload_fileobj()?

According to S3.Client.upload_file and S3.Client.upload_fileobj, upload_fileobj may sound faster. But does anyone know specifics? Should I just upload the file, or should I open the file in binary mode to use upload_fileobj? In other words,
import boto3
s3 = boto3.resource('s3')

### Version 1
s3.meta.client.upload_file('/tmp/hello.txt', 'mybucket', 'hello.txt')

### Version 2
with open('/tmp/hello.txt', 'rb') as data:
    s3.meta.client.upload_fileobj(data, 'mybucket', 'hello.txt')
Is version 1 or version 2 better? Is there a difference?
The main point of upload_fileobj is that the file doesn't have to be stored on local disk in the first place; it can be represented as a file-like object in RAM.
Python has a standard library module (io) for that purpose.
The code will look like:
import io
import boto3
s3 = boto3.client('s3')
fo = io.BytesIO(b'my data stored as file object in RAM')
s3.upload_fileobj(fo, 'mybucket', 'hello.txt')
In that case it will perform faster, since you don't have to read from local disk.
TL;DR
In terms of speed, both methods will perform roughly the same: both are written in Python, and the bottleneck will be either disk I/O (reading the file from disk) or network I/O (writing to S3).
Use upload_file() when writing code that only handles uploading files from disk.
Use upload_fileobj() when writing generic S3 upload code that may later be reused for more than just files on disk.
What is fileobj anyway?
There is a convention in multiple places, including the Python standard library, that the term fileobj means file-like object.
There are even some libraries exposing functions that can take a file path (str) or a fileobj (file-like object) as the same parameter.
When using a file object, your code is not limited to disk. For example:
You can copy data from one S3 object into another in streaming fashion, without using disk space or slowing the process down with disk reads and writes.
You can (de)compress or decrypt data on the fly when writing objects to S3.
An example using the Python gzip module with a file-like object in a generic way:
import gzip, io

def gzip_greet_file(fileobj):
    """Write a gzipped hello message to a file."""
    with gzip.open(filename=fileobj, mode='wb') as fp:
        fp.write(b'hello!')

# using an opened file
gzip_greet_file(open('/tmp/a.gz', 'wb'))
# using a filename on disk
gzip_greet_file('/tmp/b.gz')
# using an io buffer
file = io.BytesIO()
gzip_greet_file(file)
file.seek(0)
print(file.getvalue())
tarfile, on the other hand, has two separate parameters, name and fileobj:
tarfile.open(name=None, mode='r', fileobj=None, bufsize=10240, **kwargs)
Example of gzip-compressing data in memory before uploading with s3.upload_fileobj():
import gzip, io, boto3

s3 = boto3.client('s3')  # upload_fileobj is provided on the Client (and on Bucket/Object)

def upload_file(fileobj, bucket, key, compress=False):
    if compress:
        # compress the source into an in-memory buffer, then upload that
        buf = io.BytesIO()
        with gzip.GzipFile(fileobj=buf, mode='wb') as gz:
            gz.write(fileobj.read())
        buf.seek(0)
        fileobj = buf
        key = key + '.gz'
    s3.upload_fileobj(fileobj, bucket, key)
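Hypothetical usage, uploading a local file gzip-compressed:
with open('/tmp/report.csv', 'rb') as f:
    upload_file(f, 'mybucket', 'report.csv', compress=True)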
Neither is better, because they're not comparable. While the end result is the same (an object is uploaded to S3), they source that object quite differently: one expects you to supply the path on disk of the file to upload, while the other expects you to provide a file-like object.
If you have a file on disk and want to upload it, use upload_file. If you have a file-like object (which could ultimately be many things, including an open file, a stream, a socket, or a buffer), use upload_fileobj.
A 'file-like object' in this context is anything that implements a read method and returns bytes.
As per the documentation at https://boto3.amazonaws.com/v1/documentation/api/1.9.185/guide/s3-uploading-files.html:
"The upload_file and upload_fileobj methods are provided by the S3 Client, Bucket, and Object classes. The method functionality provided by each class is identical. No benefits are gained by calling one class's method over another's. Use whichever class is most convenient."
The answers above seem to be false.

Reading data from a JSON file in Python for automating a web application

I have code for automating a web application (I am using Python and Selenium) where I am entering static data. I want to use a JSON file to send the data to the application. Can anyone please help me with how to write the code to pick the values from a JSON file? Here is my code:
import unittest
from selenium import webdriver

# driver is assumed to be created earlier, e.g. driver = webdriver.Chrome()
name = driver.find_element_by_xpath("some xpath").send_keys("xxxx")
password = driver.find_element_by_xpath("some xpath").send_keys("xxxx")  # 'pass' is a reserved word in Python
phone_no = driver.find_element_by_xpath("some xpath").send_keys("xxxx")
Please help me with how to read the data from a JSON file.
Not sure if this is what you're looking for, but the simplest way to read JSON is using the json module. json.load deserializes the JSON into a Python object, usually a dictionary.
import json

with open('file-name.json') as data_file:
    data = json.load(data_file)
# access the JSON (now a Python object) like this: data['some-field']
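Applied to the question's code, a minimal sketch; the data.json name, its keys, and the xpaths are assumptions for illustration:
import json
from selenium import webdriver

with open('data.json') as data_file:  # hypothetical file holding the test data
    data = json.load(data_file)

driver = webdriver.Chrome()  # assumes Chrome and a matching chromedriver
driver.find_element_by_xpath("some xpath").send_keys(data["name"])
driver.find_element_by_xpath("some xpath").send_keys(data["password"])
driver.find_element_by_xpath("some xpath").send_keys(data["phone_no"])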
When using the send_keys function, you can't just send the JSON as-is, because it will add '\n' after every JSON attribute. Here are three ways to do it:
Way 1 - Do not use it as JSON; just send the plain text:
import os

path = os.path.abspath("../excel_upload.json")
with open(path, "r") as fp:
    obj = fp.read()  # read the raw text instead of parsing it as JSON
print(obj)
driver.find_element_by_name("name").send_keys(obj)
Way 2 - Serialize the JSON (loaded as a dict) back to a string with json.dumps.
Way 3 - Load the JSON, then strip the '\n' characters from its string form.
I used the first way to send the complete JSON text and the second way for key pairs. There may be a better way; please update with better solutions.
Loading and reading JSON is fine, but using it in send_keys is a different issue.

Creating a File-Like Object from email.message.Message

I have an attachment in my email.message.Message.
The attachment is of type email.message.Message so I can call get_payload() on it to return its associated data.
However, I want to be able to load this data into a file-like object so I can read and write from it as if I was reading this attachment from my desktop.
How can I do this without actually saving the attachment on my drive?
cStringIO was made specifically for this purpose.
You can use StringIO if you need multiple encoding schemes, but cStringIO is MUCH faster.
Example usage (Python 2; cStringIO does not exist in Python 3):
>>> import cStringIO
>>> test = cStringIO.StringIO()
>>> test.write("test")
>>> test.getvalue()
'test'

Web2Py - Upload a file and read the content as Zip file

I am trying to upload a zip file from a Web2Py form and then read its contents:
form = FORM(TABLE(
    TR(TD('Upload File:', INPUT(_type='file',
                                _name='myfile',
                                id='myfile',
                                requires=IS_NOT_EMPTY()))),
    TR(TD(INPUT(_type='submit', _value='Submit')))
))
if form.accepts(request.vars):
    data = StringIO.StringIO(request.vars.myfile)
    import zipfile
    zfile = zipfile.ZipFile(data)
For some reason this code does not work and complains about the file not being a zip file, although the uploaded file is a zip file.
I am new to Web2Py. How can the data be read as a zip file?
Web2py form field uploads are already cgi.FieldStorage instances; you can get the raw uploaded bytes using:
data = request.vars.myfile.value
For a file-like object, StringIO is not needed; use:
import zipfile

filelike = request.vars.myfile.file
zf = zipfile.ZipFile(filelike)
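A quick sanity check that the archive opened correctly:
print(zf.namelist())  # lists the members inside the uploaded zip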
HTTP uploads aren't just raw binary; they're mixed multipart-form encoded. Write request.vars.myfile out to disk and you'll see: it'll say something like
------------------BlahBlahBoundary
Content-Disposition: type="file"; name="myfile"
Content-Type: application/octet-stream
<binary data>
------------------BlahBlahBoundary--
The naive solution for this is to use cgi.FieldStorage(); the example I provide uses wsgi.input, which is part of mod_wsgi.
import cgi
import cStringIO

# environ is the WSGI request environment passed to the application
form = cgi.FieldStorage(fp=environ['wsgi.input'], environ=environ)
raw_file = cStringIO.StringIO(form['myfile'].file.read())
Two things to point out here:
1. Always use cStringIO if you have it; it'll be faster than StringIO.
2. If you allow uploads like this, you're streaming the file into RAM, so the whole file ends up in memory, however big it is. This does NOT scale. I had to write my own custom MIME stream parser to stream files to disk through Python to avoid this. But if you're learning or this is a proof of concept, you should be fine.
