I'm trying to build an AWS Lambda function that accepts a file upload and then parses it in memory. The file is an xlsx file, and the content comes into the Lambda function looking like this in the body key of the event:
Beginning:
----------------------------300017151060007960655534
Content-Disposition: form-data; name="tag_list"; filename="test-list.xlsx"
Content-Type: application/vnd.openxmlformats-officedocument.spreadsheetml.sheet
PK
�y�N docProps/PK
And the end of the string looks like this:
[Content_Types].xmlPK�;
----------------------------475068475306850797919587--
If I do a head/tail of the actual file on my computer, it appears that the file starts at the PK and ends at the xmlPK�;. I've attempted to slice this section out and create a BytesIO object or a SpooledTemporaryFile, but none of these options work. They all give me something like invalid seek position, or bad zip file errors.
My goal is to load this xlsx file into memory and then parse it using openpyxl.
My current function looks a little something like this. I keep trying to format it differently; sometimes I decode it, sometimes not.
def lambda_handler(event, context):
    file_index = event['body'].index('PK')
    file_string = event['body'][file_index:]
    file_end = file_string.index(';')
    file = file_string[:file_end].encode('utf-8')
I then try to pass the file string into BytesIO or a SpooledTemporaryFile, but they all give me errors...
Note, I do NOT want to use S3.
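For what it's worth, the usual failure mode here is that the binary zip bytes get mangled by string decoding before you ever slice them. Below is a minimal sketch of a boundary-based extraction, assuming API Gateway is configured to deliver the body base64-encoded (isBase64Encoded set) so the raw bytes survive intact; the function name is mine, not part of any library:

```python
import base64

def extract_file_bytes(event):
    """Pull the raw file bytes out of a multipart/form-data Lambda body."""
    body = event["body"]
    if event.get("isBase64Encoded"):
        body = base64.b64decode(body)
    else:
        # latin-1 maps every code point to the same byte value, so the
        # raw bytes survive a round trip (utf-8 would mangle them)
        body = body.encode("latin-1")
    # The first line of the body is the boundary marker
    boundary = body.split(b"\r\n", 1)[0]
    for part in body.split(boundary):
        if b'filename="' not in part:
            continue
        # Part headers and payload are separated by a blank line
        payload = part.split(b"\r\n\r\n", 1)[1]
        # Drop the CRLF that precedes the next boundary
        if payload.endswith(b"\r\n"):
            payload = payload[:-2]
        return payload
    raise ValueError("no file part found in body")
```

The returned bytes can then be wrapped in io.BytesIO(...) and handed to openpyxl.load_workbook. If openpyxl still reports a bad zip file, the bytes were corrupted upstream; check the API Gateway binary media type settings.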
I have a function that saves files to a db, but this one requires a bytes stream as parameter. Something like:
write_to_db("File name", stream_obj)
Now I want to save an XML file; I am using the xml library.
import xml.etree.cElementTree as ET
Is there a function that convert the xml object to bytes stream?
The solution I got was:
Save it locally with the function write
Retrieve it with "rb" to get the file as bytes
Now that I have the bytes stream, save it with the function mentioned
Delete the file
Example:
# Saving xml as local file
tree = ET.ElementTree(ET.Element("Example"))
tree.write("/This/is/a/path.xml")
# Reading local file as bytes
f = open("/This/is/a/path.xml", "rb")
# Saving to DB
write_to_db("File name", f) # <--- using "f" here because I opened the file in "rb" (bytes) mode
# Deleting local file
os.remove("/This/is/a/path.xml")
But is there a function from the xml library that returns automatically the bytes stream? Something like:
tree = ET.ElementTree(ET.Element("Example"))
bytes_file = tree.get_bytes() # <-- Like this?
# Writing to db
write_to_db("File name", bytes_file)
This so I can prevent creating and removing the file in my repository.
Thank you in advance.
One more quick question:
Is the term "bytes stream" correct? If not, what is the difference, and what would be the correct term for what I am looking for?
So as Balmy mentioned in the comments, the solution is using:
ET.tostring()
My code at the end looked something like this:
# Here you build your xml
x = ET.Element("ExampleXML", {"a_tag": "1", "another_tag": "2"})
# Here I am saving it to my db by using the "tostring" function,
# which by default returns the XML serialized as bytes.
write_to_db("File name", ET.tostring(x))
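As a side note, ElementTree.write also accepts an in-memory binary buffer, which gets you the full document as bytes without ever touching disk:

```python
import io
import xml.etree.ElementTree as ET

tree = ET.ElementTree(ET.Element("Example"))
buf = io.BytesIO()
# write() accepts any binary file-like object, not just a path
tree.write(buf, encoding="utf-8", xml_declaration=True)
data = buf.getvalue()  # the serialized document as bytes
```

Unlike ET.tostring(x), this includes the <?xml ...?> declaration, which some consumers require.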
I am using a library that has a function like the one below, which expects a file location. I store the content of this file in AWS, so I can use the AWS API to securely return the values stored in that resource as a string. How can I transform this string so I can pass it to a function that expects a file path, without actually writing the string values to the local directory as a .txt or .json file? (I know I can convert the string to a JSON object, but I'm not certain how that solves the fact that this function is looking for a path.) The function is below:
file_location = 'String values I get returned when I return it from its safe and secure location'
password = credential_function.from_keyfile_name(file_location)
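One common workaround is to spill the string to a temporary file just long enough for the call. This sketch uses a hypothetical path_consumer standing in for credential_function.from_keyfile_name (note that on Windows a NamedTemporaryFile generally cannot be reopened by another reader while it is still open here):

```python
import tempfile

def call_with_temp_path(secret_string, path_consumer):
    # path_consumer is any function that insists on a filename;
    # the temp file is deleted as soon as the with-block exits
    with tempfile.NamedTemporaryFile("w", suffix=".json", delete=True) as tmp:
        tmp.write(secret_string)
        tmp.flush()  # make sure the bytes hit the file before it's read
        return path_consumer(tmp.name)
```

If the library in question happens to be google-auth, there is a cleaner route: from_service_account_info accepts the parsed dict directly, with no file involved.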
I am trying to read in a file from a Python request (form data). All I want to do is read in the incoming file in the request body, parse the contents, and return the contents as a JSON body. I see many examples out there like if 'filename' in request.files:, however this never works for me. I know that the file does in fact live within the ImmutableMultiDict type. Here is my working code example:
if 'my_file.xls' in request.files:
    # do something
else:
    # return error
if 'file' in request.files:
This is looking for the field name 'file' which corresponds to the name attribute you set in the form:
<input type='file' name='file'>
You then need to do something like this to assign the FileStorage object to the variable mem:
mem = request.files['file']
See my recent answer for more details of how and why.
You can then access the filename itself with:
mem.filename # should give back 'my_file.xls'
To actually read the stream data:
mem.read()
The official flask docs have further info on this, and how to save to disk with secure_filename() etc. Probably worth a read.
All I want to do is read-in the incoming file in the request body, parse the contents and return the contents as a json body.
If you actually want to read the contents of that Excel file, then you'll need to use a library which has compatibility for this, such as xlrd. This answer demonstrates how to open a workbook, passing it as a stream. Note that they have used fileobj as the variable name, instead of mem.
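Putting the pieces above together, here is a minimal sketch of such a view (the field name "file" and the route are assumptions; adjust them to your form):

```python
from flask import Flask, jsonify, request

app = Flask(__name__)

@app.route("/upload", methods=["POST"])
def upload():
    if "file" not in request.files:
        return jsonify(error="no file part"), 400
    mem = request.files["file"]   # a werkzeug FileStorage object
    data = mem.read()             # raw bytes of the upload
    # hand io.BytesIO(data) to xlrd/openpyxl here to parse the sheet
    return jsonify(filename=mem.filename, size=len(data))
```

The same pattern works for any binary upload, not just Excel files.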
I'm trying to detect whether an optional file has been uploaded or not when a cgi form has been submitted.
I've read here that I should be doing something like:
myfile = form["myfile"]
if myfile.file:
    # It's an uploaded file; save it to disk
    file = open('path_to_file', 'wb')
    file.write(myfile.file.read())
    file.close()
But this is not working for me. The file is always being written, whether it's been uploaded or not.
While with any other fields I can always use a default value to check it:
field = cgi.escape(data.getfirst('field','null'))
I can't see the approach to face this for files in the documentation. Any help there?
Thanks.
I tested with Firefox, and uploading a form with an empty file input results in the following post contents:
-----------------------------135438447855682763402193870
Content-Disposition: form-data; name="foo"; filename=""
Content-Type: application/octet-stream
-----------------------------135438447855682763402193870--
Thus a zero-length field is uploaded. However, no filename is set, so test the value of the filename attribute instead.
If a field represents an uploaded file, accessing the value via the value attribute or the
getvalue() method reads the entire file in memory as a string. This may not be what you want.
You can test for an uploaded file by testing either the filename attribute or the file attribute.
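So the check reduces to the filename attribute. A small helper (my own naming), testable against anything that exposes that attribute:

```python
def is_real_upload(item):
    # An empty <input type="file"> still produces a zero-length part,
    # but the browser sends filename="" for it, so filename is the
    # reliable signal (item.file is truthy either way)
    return bool(getattr(item, "filename", ""))
```

In the CGI script above that becomes: only open and write 'path_to_file' when is_real_upload(form["myfile"]) is true.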
I am trying to upload a zip file from Web2Py form and then read the contents:
form = FORM(TABLE(
    TR(TD('Upload File:', INPUT(_type='file',
                                _name='myfile',
                                id='myfile',
                                requires=IS_NOT_EMPTY()))),
    TR(TD(INPUT(_type='submit', _value='Submit')))
))
if form.accepts(request.vars):
    data = StringIO.StringIO(request.vars.myfile)
    import zipfile
    zfile = zipfile.ZipFile(data)
For some reason this code does not work and complains that the file is not a zip file, although the uploaded file is a zip file.
I am new to Web2Py. How can the data be represented as zip-file?
web2py form field uploads already are cgi.FieldStorage, you can get the raw uploaded bytes using:
data = request.vars.myfile.value
For a file-like object StringIO is not needed, use:
filelike = request.vars.myfile.file
zip = zipfile.ZipFile(filelike)
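To sanity-check that approach: ZipFile accepts any seekable binary file-like object, which is exactly what the .file attribute of an upload field is. A small sketch (the helper name is mine):

```python
import zipfile

def list_zip_contents(filelike):
    # filelike: any seekable binary file-like object, e.g. the
    # .file attribute of a web2py/cgi upload field
    with zipfile.ZipFile(filelike) as zf:
        return zf.namelist()
```

If this raises BadZipFile, the object you passed contains more than the zip bytes, e.g. the whole multipart body described in the next answer.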
HTTP uploads aren't just raw binary; they're multipart/form-data encoded. Write request.vars.myfile out to disk and you'll see; it'll say something like
------------------BlahBlahBoundary
Content-Disposition: form-data; name="myfile"
Content-Type: application/octet-stream
<binary data>
------------------BlahBlahBoundary--
The naive solution for this is, use cgi.FieldStorage(), the example I provide uses wsgi.input, which is part of mod_wsgi.
form = cgi.FieldStorage(fp=environ['wsgi.input'], environ=environ)
raw_file = cStringIO.StringIO(form['myfile'].file.read())
Two things to point out here:
1. Always use cStringIO if you have it; it'll be faster than StringIO.
2. If you allow uploads like this, you're streaming the file into RAM, so however big the file is is how much RAM you'll be using; this does NOT scale. I had to write my own custom MIME stream parser to stream files to disk through Python to avoid this. But if you're learning or this is a proof of concept, you should be fine.
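A middle ground between holding everything in RAM and hand-rolling a disk-streaming MIME parser is tempfile.SpooledTemporaryFile, which stays in memory up to a size threshold and rolls over to disk on its own (helper name and threshold are mine):

```python
import shutil
import tempfile

def spool_upload(src, max_mem=1 << 20):
    # Copies src (any readable binary file-like object) into a spooled
    # temp file: held in RAM up to max_mem bytes, transparently moved
    # to disk beyond that, so a huge upload can't exhaust memory
    tmp = tempfile.SpooledTemporaryFile(max_size=max_mem)
    shutil.copyfileobj(src, tmp)
    tmp.seek(0)
    return tmp
```

The returned object is seekable, so it can be handed straight to zipfile.ZipFile.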