Modifying files in Google Cloud Storage from Google App Engine - python

I have an application that is hosted on Google App Engine. It is intended to be a file hosting application, where files are uploaded directly to GCS. However, some processing needs to happen with these files, so originally my plan was to download the files, make the modifications, then re-upload. Unfortunately, GAE has a read-only file system. What would be the proper way to modify objects in GCS from GAE? I am unfamiliar with most Google Cloud services, but I see ones such as google-cloud-dataproc; would these be able to do it?
The operations are removing lines from files and combining files into a single .zip.

You can store the file in the tmpfs partition that App Engine mounts at /tmp. It's an in-memory file system, so storing files there consumes instance memory. If the files are too large, increase the memory size of your App Engine instance, otherwise you will get an out-of-memory error.
If the file is too big, you have to use another product.
Remember to clean up the files after use to free the memory.
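A minimal sketch of that approach, assuming the google-cloud-storage client library and placeholder bucket/object names: download the objects into /tmp, apply the modifications from the question (dropping unwanted lines, then zipping), and upload the result back to the bucket.

import os
import zipfile
from google.cloud import storage  # pip install google-cloud-storage

# Placeholder names for illustration only.
BUCKET_NAME = "my-bucket"
SOURCE_OBJECTS = ["uploads/a.txt", "uploads/b.txt"]
ZIP_OBJECT = "processed/result.zip"

client = storage.Client()
bucket = client.bucket(BUCKET_NAME)

local_paths = []
for name in SOURCE_OBJECTS:
    local_path = os.path.join("/tmp", os.path.basename(name))
    bucket.blob(name).download_to_filename(local_path)  # lands in the in-memory /tmp

    # Example modification: drop comment lines.
    with open(local_path) as f:
        lines = [line for line in f if not line.startswith("#")]
    with open(local_path, "w") as f:
        f.writelines(lines)
    local_paths.append(local_path)

zip_path = "/tmp/result.zip"
with zipfile.ZipFile(zip_path, "w", zipfile.ZIP_DEFLATED) as zf:
    for path in local_paths:
        zf.write(path, arcname=os.path.basename(path))

bucket.blob(ZIP_OBJECT).upload_from_filename(zip_path)

# Free the in-memory space once done.
for path in local_paths + [zip_path]:
    os.remove(path)

Everything written under /tmp counts against the instance's memory, which is why the cleanup step at the end matters.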

Related

What do I do with my local csv files when deploying Python Panel Application to Google Cloud

Hello, I am trying to deploy my Panel application with Google Cloud. I am wondering how to deal with the local csv files I am importing as data frames. I use df = pd.read_csv("some/local/directory") to create my data frames, and I use functions like os.path.join. How do I handle this data when deploying? My data is all .csv files, around 7 GB in total.
Upload your csv files to Google Cloud Storage; they can then be accessed from other Cloud services.
Try the Python Client for Cloud Storage and refer to the example Reading and writing to Cloud Storage.
If you are running your application on a Compute Engine VM, one more option is Cloud Storage FUSE (gcsfuse): you can mount a storage bucket as a file system in the VM.
Make sure to give the Service Account you are using access to the storage bucket.
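As a small sketch of the Cloud Storage route, with hypothetical bucket and object names: pandas can read gs:// paths directly when the gcsfs package is installed, or you can go through the storage client explicitly.

import io
import pandas as pd
from google.cloud import storage  # pip install google-cloud-storage

BUCKET_NAME = "my-data-bucket"    # placeholder
OBJECT_NAME = "data/records.csv"  # placeholder

# Option 1: pandas reads gs:// URLs directly if gcsfs is installed.
df = pd.read_csv(f"gs://{BUCKET_NAME}/{OBJECT_NAME}")

# Option 2: fetch the bytes with the storage client, then parse them.
client = storage.Client()
blob = client.bucket(BUCKET_NAME).blob(OBJECT_NAME)
df = pd.read_csv(io.BytesIO(blob.download_as_bytes()))

With roughly 7 GB of data, it may be worth reading only the files you need per request rather than loading everything at startup.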

How to run Python script on files in Google Cloud

I have a bunch of files in a Google Cloud Storage bucket, including some Python scripts and text files. I want to run the Python scripts on the text files. What would be the best way to go about doing this (App Engine, Compute Engine, Jupyter)? Thanks!
I recommend using Cloud Functions, which can be triggered automatically each time you upload a new file to Cloud Storage to process it. You can see the workflow for this in the Cloud Functions Storage Tutorial.
You will need to at least download the Python scripts onto an environment first (be it GCE or GAE). To access the GCS text files, you can use the https://pypi.org/project/google-cloud-storage/ library. I don't think you can execute Python scripts from the object bucket itself.
If it is troublesome to change the Python code to read the text files from GCS, you will have to download everything into your environment (e.g. using gsutil).
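As a hedged sketch of the Cloud Functions route: a background function with a google.storage.object.finalize trigger receives the bucket and object name of each upload and can then read the text file with the storage client. The processing step below is a placeholder.

from google.cloud import storage  # pip install google-cloud-storage

def process_upload(event, context):
    """Triggered when an object is finalized in the configured bucket."""
    bucket_name = event["bucket"]
    object_name = event["name"]

    if not object_name.endswith(".txt"):
        return  # only process text files

    blob = storage.Client().bucket(bucket_name).blob(object_name)
    text = blob.download_as_text()

    # Placeholder processing: report the line count of the uploaded file.
    print(f"{object_name} has {len(text.splitlines())} lines")

Deployment would look something like gcloud functions deploy process_upload --runtime python310 --trigger-resource YOUR_BUCKET --trigger-event google.storage.object.finalize (first-generation trigger flags, shown as an assumption of the usual tutorial setup).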

Store and retrieve text online/cloud using python APIs

I am writing a Python application which reads/parses a file of this kind.
myalerts.ini:
value1=1
value2=3
value3=10
value4=15
Currently I store this file in the local filesystem. If I need to change this file, I need physical access to that computer.
I want to move this file to the cloud so that I can change it from anywhere (another computer or from my phone).
If this application is running on some machine, I should be able to change the file in the cloud, and the application running on another machine that I don't have physical access to should be able to read the updated file.
Notes:
I am new to both Python and AWS.
I am currently running it on my local Mac/Linux machine and planning to deploy on AWS.
There are many options!
Amazon S3: This is the simplest option. Each computer could download the file at regular intervals or just before it runs a process. If the file is big, the app could instead check whether the file has changed before downloading.
Amazon Elastic File System (EFS): If your applications are running on multiple Amazon EC2 instances, EFS provides a shared file system that can be mounted on each instance.
Amazon DynamoDB: A NoSQL database instead of a file. Much faster than parsing a file, but less convenient for updating values — you'd need to write a program to update values, eg from the command-line.
AWS Systems Manager Parameter Store: A managed service for storing parameters. Applications (anywhere on the Internet) can request and update parameters. A great way to configure cloud-based applications!
If you are looking for minimal change and you want it accessible from anywhere on the Internet, Amazon S3 is the easiest choice.
Whichever way you go, you'll use the boto3 AWS SDK for Python.
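As a minimal sketch of the S3 option with boto3, using a hypothetical bucket and key: fetch the latest copy of the file before each run, and push it back after editing it from any machine.

import boto3  # pip install boto3

BUCKET = "my-config-bucket"   # placeholder
KEY = "config/myalerts.ini"   # placeholder

s3 = boto3.client("s3")

# Fetch the latest version of the config file before running.
s3.download_file(BUCKET, KEY, "myalerts.ini")

with open("myalerts.ini") as f:
    values = dict(line.strip().split("=", 1) for line in f if "=" in line)
print(values)  # e.g. {'value1': '1', 'value2': '3', ...}

# After editing the local copy from any machine, push it back.
s3.upload_file("myalerts.ini", BUCKET, KEY)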

How do I work on multiple appengine projects with python?

I need to export my blobstore from one appengine project and upload it to another project. How can I switch between projects programmatically with python?
If by "python" you mean a python GAE app's code itself - AFAIK you can't switch apps - each such code runs only inside the app specified in the .yaml file.
You could teach the exporting app to serve the blobs, and for the actual transfer you could either:
have the receiving app pull the blobs directly from the exporting app
have an external (python) script pull blobs from the exporting app and upload them to the importing app.
Either way you'd need to write some code to actually perform the transfer.
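For the second option, a rough sketch of what such an external transfer script could look like, assuming hypothetical download and upload handlers that you would have to write in the two apps (the URLs and blob keys below are placeholders):

import requests  # pip install requests

EXPORT_URL = "https://exporting-app.appspot.com/serve_blob"  # hypothetical handler
IMPORT_URL = "https://importing-app.appspot.com/upload"      # hypothetical handler
BLOB_KEYS = ["blob-key-1", "blob-key-2"]                     # placeholders

for key in BLOB_KEYS:
    resp = requests.get(EXPORT_URL, params={"key": key})
    resp.raise_for_status()
    upload = requests.post(IMPORT_URL, files={"file": (key, resp.content)})
    upload.raise_for_status()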
So instead of doing that, I'd rather write and execute a one-time conversion script to move the data from the blobstore (presently listed in the GAE Python docs under Storing Data > Superseded Storage Solutions in the left-side menu) to the Datastore or GCS, both of which have better backup/restore options, including across apps :) GCS can probably even be used to share the same data across apps. And you can still serve the GCS data using the blobstore API; see Uploading files directly to Google Cloud Storage for GAE app.
If you mean some external Python app's code - AFAIK the blobstore doesn't offer generic direct access to an external application (I might be wrong, though). So an external app would need to go through the regular upload/download handlers of the two apps, and in this case switching between projects really means switching between the two apps' upload/download URLs.
Even for this scenario it might be worth migrating to GCS, which does offer direct access; see Sharing and Collaboration.

what is the best way to achieve remote storage for users' files using django

What is the best storage server mechanism for the following requirements:
The files that are going to be stored are encrypted and below 70MB
Files have an identifier on the storage server
I need the file to be retrieved very fast
The storage server is in the same domain as the django server
The number of files being stored increases over time.
I have been given different suggestions, like having a web server such as Apache or nginx serve the files. Others also suggested using MongoDB as the storage server. I want the implementation to be as simple as possible. What would you recommend?
If you can, try moving the file storage to Amazon S3; django-boto (https://github.com/qnub/django-boto) helps with the integration and is easy to use. For better performance I would suggest virtualenv and nginx with uWSGI; see:
http://uwsgi-docs.readthedocs.org/en/latest/tutorials/Django_and_nginx.html
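As a sketch of what the S3-backed storage wiring can look like in Django settings, here using the django-storages package (a commonly used alternative to django-boto) with placeholder bucket name and credentials:

# settings.py  (sketch; pip install django-storages boto3)
DEFAULT_FILE_STORAGE = "storages.backends.s3boto3.S3Boto3Storage"
AWS_STORAGE_BUCKET_NAME = "my-django-files"   # placeholder
AWS_ACCESS_KEY_ID = "YOUR_KEY_ID"             # or rely on an IAM role instead
AWS_SECRET_ACCESS_KEY = "YOUR_SECRET_KEY"

# models.py - files uploaded through this field then land in the bucket
from django.db import models

class StoredFile(models.Model):
    identifier = models.CharField(max_length=64, unique=True)
    payload = models.FileField(upload_to="uploads/")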
