upload csv/excel file to appengine (python) for processing - python

I need to be able to upload an excel or csv file to appengine so that the server can process the rows and create objects. Can anyone provide or point me to an example of how this is done? Thanks for your help.

Uploading to the Blobstore is probably what you're after; you can then read the data back and process it with the csv module.
For Excel (and other) formats, you might want to look into sending your file to Google Docs and then reading the rows back via the Spreadsheets API.
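Here is a minimal sketch of the Blobstore-plus-csv approach, assuming the Python 2.7 runtime with webapp2; the handler names, the "/upload" route, and the form field name "file" are just illustrative choices:

import csv

import webapp2
from google.appengine.ext import blobstore
from google.appengine.ext.webapp import blobstore_handlers


class UploadFormHandler(webapp2.RequestHandler):
    def get(self):
        # Blobstore needs a one-time upload URL for the form to POST to.
        upload_url = blobstore.create_upload_url('/upload')
        self.response.write(
            '<form action="%s" method="POST" enctype="multipart/form-data">'
            '<input type="file" name="file"><input type="submit"></form>'
            % upload_url)


class UploadHandler(blobstore_handlers.BlobstoreUploadHandler):
    def post(self):
        blob_info = self.get_uploads('file')[0]
        # Read the stored blob back and hand it to the csv module.
        reader = csv.reader(blobstore.BlobReader(blob_info.key()))
        for row in reader:
            pass  # create your datastore entities from each row here


app = webapp2.WSGIApplication([
    ('/', UploadFormHandler),
    ('/upload', UploadHandler),
])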

If you mean a one-off transfer (or a few), you're probably looking for the bulk upload system: http://code.google.com/appengine/docs/python/tools/uploadingdata.html
If you're talking about regular uploads during normal use, you'll need to handle them as POST requests to the application.
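If you take the plain POST route for files small enough to fit in a single request (App Engine caps requests at 32 MB), a rough sketch, again assuming webapp2, could look like this; the field name "file" and the "/import" route are placeholders:

import csv

import webapp2


class CsvImportHandler(webapp2.RequestHandler):
    def post(self):
        # The uploaded file arrives as a cgi.FieldStorage in request.POST;
        # its .file attribute is a file-like object the csv module can read.
        upload = self.request.POST.get('file')
        for row in csv.reader(upload.file):
            pass  # build and put() your model objects from each row


app = webapp2.WSGIApplication([('/import', CsvImportHandler)])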

Related

Is it possible to use Django to parse locally stored files (csv for example) without uploading them?

I would like to develop a web app that parses locally stored data and lets users create a sorted Excel file.
It would be great if I could somehow avoid uploading the files altogether. Some users are worried about their data, and the files can get really big, so otherwise I'd have to implement async processing and so on...
Is something like that possible?

What's the best way to handle large file uploads with aiohttp (server)?

I'm currently working on a kind of cloud file manager. You can basically upload your files through a website. This website is connected to a Python backend which will store your files and manage them using a database. It will basically put each of your files inside a folder and rename it with its hash. The database will associate the file's name and its categories (a kind of folder) with the hash so that you can retrieve the file easily.
My problem is that I would like the file upload to be really user friendly: I have a fairly bad connection, and when I try to download or upload a file on the internet I often hit problems like the upload failing at 90% and having to be restarted. I'd like to avoid that.
I'm using aiohttp to achieve this. How could I allow a file to be uploaded in multiple parts? What should I use to upload large files?
In previous code which managed really small files (less than 10 MB), I was using something like this:
data = await request.post()
data['file'].file.read()
Should I continue to do it this way for large files?
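One way to avoid buffering the whole file in memory, and to make chunked handling possible at all, is aiohttp's multipart reader, which streams the upload part by part. Here is a hedged sketch; the "/upload" route, the target path, and the hashing are illustrative, and resuming a failed transfer would additionally need the client to re-send only the missing ranges:

import hashlib

from aiohttp import web


async def upload(request):
    reader = await request.multipart()
    field = await reader.next()               # first part of the form
    digest = hashlib.sha256()
    with open('/tmp/incoming.part', 'wb') as f:
        while True:
            chunk = await field.read_chunk()  # 8192 bytes by default
            if not chunk:
                break
            digest.update(chunk)
            f.write(chunk)
    return web.json_response({'sha256': digest.hexdigest()})


app = web.Application()
app.add_routes([web.post('/upload', upload)])
# web.run_app(app)

Note that the request.post() approach from the snippet above reads the entire body into memory first, which is fine under 10 MB but becomes a problem for large files.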

Django data storage - SQL or something else?

I am building a Django web app which will essentially serve static data to the users. By static, I mean that admins will be able to upload new datasets but no data entries will be made by users. Effectively, once the data is uploaded, it will be read-only on request by a user.
Given that these are quite large datasets (200k+ rows), I figured SQL would be the best way to store the data; this avoids reading large datasets into memory (as you'd have to with a pickle or JSON file?). It also has the added bonus of using Django models to access the data.
However, I am not sure of the best way to do this, or whether there is a better alternative to SQL. I currently have an admin page that allows you to upload .xlsx files, which are then parsed and added as model entries row by row. It takes FOREVER (30+ minutes for 100K rows). Perhaps I should be creating a whole new database outside of Django and then importing it somehow, but I can't find much documentation on how this could/should be done. Any ideas would be greatly appreciated! Thanks in advance for any wisdom.
You can try using the .csv file format instead of .xlsx. Python has libraries that make it easy to write .csv (comma-separated value) data to an SQL database. This answer could be of further assistance. I hope you find what you're looking for, and happy coding!
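One common way to speed up the row-by-row import is to batch the inserts with Django's bulk_create while streaming a .csv file. Here is a rough sketch, where the DataRow model and its fields are hypothetical rather than taken from the question:

import csv

from myapp.models import DataRow   # hypothetical model


def import_csv(path, batch_size=5000):
    with open(path, newline='') as f:
        reader = csv.DictReader(f)
        batch = []
        for row in reader:
            batch.append(DataRow(name=row['name'], value=row['value']))
            if len(batch) >= batch_size:
                # One INSERT per batch instead of one per row.
                DataRow.objects.bulk_create(batch)
                batch = []
        if batch:
            DataRow.objects.bulk_create(batch)

Wrapping the import in a single transaction, or using the database's native CSV loader, are other common ways to cut the import time.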

Loading a Lot of Data into Google Bigquery from Python

I've been struggling to load big chunks of data into BigQuery for a little while now. In Google's docs, I see the insertAll method, which seems to work fine but gives me 413 "Entity too large" errors when I try to send anything over about 100k of data as JSON. Per Google's docs, I should be able to send up to 1 TB of uncompressed JSON data. What gives? The example on the previous page has me building the request body manually instead of using insertAll, which is uglier and more error prone. I'm also not sure what format the data should take in that case.
So, all of that said, what is the clean/proper way of loading lots of data into BigQuery? An example with data would be great. If at all possible, I'd really rather not build the request body myself.
Note that for streaming data to BQ, anything above 10k rows/sec requires talking to a sales rep.
If you'd like to send large chunks directly to BQ, you can send them via POST. If you're using a client library, it should handle making the upload resumable for you. To do this, you'll need to call jobs.insert() instead of tabledata.insertAll() and provide the description of a load job. To actually push the bytes with the Python client, you can create a MediaFileUpload or MediaInMemoryUpload and pass it as the media_body parameter.
The other option is to stage the data in Google Cloud Storage and load it from there.
The example here uses a resumable upload to upload a CSV file. While the file used is small, it should work for virtually any size of upload, since it uses a robust media upload protocol. It sounds like you want JSON, which means you'd need to tweak the code slightly (there is a JSON example in load_json.py in the same directory). If you have a stream you want to upload instead of a file, you can use a MediaInMemoryUpload instead of the MediaFileUpload used in the example.
BTW ... Craig's answer is correct, I just thought I'd chime in with links to sample code.
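Here is a hedged sketch of the jobs.insert()/MediaFileUpload approach described above, using the google-api-python-client library; the project, dataset, table, schema, and file name are all placeholders, and it assumes default credentials are already configured:

from googleapiclient.discovery import build
from googleapiclient.http import MediaFileUpload

bigquery = build('bigquery', 'v2')

job_body = {
    'configuration': {
        'load': {
            'destinationTable': {
                'projectId': 'my-project',
                'datasetId': 'my_dataset',
                'tableId': 'my_table',
            },
            'sourceFormat': 'NEWLINE_DELIMITED_JSON',
            'schema': {'fields': [
                {'name': 'name', 'type': 'STRING'},
                {'name': 'value', 'type': 'INTEGER'},
            ]},
        }
    }
}

# resumable=True is what makes the upload robust for large files.
media = MediaFileUpload('rows.json',
                        mimetype='application/octet-stream',
                        resumable=True)

job = bigquery.jobs().insert(projectId='my-project',
                             body=job_body,
                             media_body=media).execute()
print(job['jobReference']['jobId'])

For the Cloud Storage route mentioned above, drop media_body and add 'sourceUris': ['gs://your-bucket/rows.json'] to the load configuration instead.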

Creating spreadsheets for download in django

I am using xlrd to create spreadsheets. On a website, a user will be able to create a custom report and download that xls file.
Usually, I am storing files on S3, but in this case, is there a way not to store the file anywhere and just give it directly to the user? Or how should I do this if I don't want to use S3 to save the file?
xlrd (together with xlwt, its writing counterpart) is a good choice. As for the generation and download process, it depends on the web framework in use; here is an example with web2py.
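Since the question is about Django, here is a minimal sketch of serving the spreadsheet without storing it anywhere, assuming xlwt for writing the .xls file; the view name, sheet contents, and filename are placeholders:

import xlwt
from django.http import HttpResponse


def report(request):
    workbook = xlwt.Workbook()
    sheet = workbook.add_sheet('Report')
    sheet.write(0, 0, 'column A')   # row, column, value
    sheet.write(0, 1, 'column B')

    response = HttpResponse(content_type='application/vnd.ms-excel')
    response['Content-Disposition'] = 'attachment; filename="report.xls"'
    workbook.save(response)         # xlwt can write to any file-like object
    return response

Because the workbook is written straight into the response, nothing is saved to S3 or to local disk.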
