Uploading files to storage. Multiple users - python

I am making an application in Python.
In short:
The user inputs some images for calibration, and some images that are then transformed by an algorithm.
To further improve the algorithm and the service, I want users to upload their calibration images to central storage in the cloud. How would I go about this?
How do I make it secure (i.e., prevent people from randomly uploading terabytes of files)?
Is it possible to have a script on the server/cloud side that validates whether an uploaded file should be kept or deleted?
I have some experience with Azure, but I'm open to anything.

A high-level perspective:
Develop a middleware service that manages user authentication and proxies uploads to the cloud storage of your choice. In Python, you may want to look at a web API framework like Django or Flask to implement user authentication with a database properly. You also have to secure the connection between the client and the middleware.
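As a rough illustration of that middleware idea, here is a minimal Flask sketch, assuming Azure Blob Storage (since you know Azure) via the azure-storage-blob package; the container name, the size cap, and the check_token() helper are placeholders:
import os

from azure.storage.blob import BlobServiceClient
from flask import Flask, abort, request

app = Flask(__name__)
MAX_UPLOAD_BYTES = 20 * 1024 * 1024  # placeholder cap: reject anything over 20 MB
blob_service = BlobServiceClient.from_connection_string(
    os.environ["AZURE_STORAGE_CONNECTION_STRING"])

def check_token(token):
    # Placeholder: look the token up in your user database (Django/Flask auth).
    return token is not None

@app.route("/upload", methods=["POST"])
def upload():
    if not check_token(request.headers.get("X-Auth-Token")):
        abort(401)  # unauthenticated clients never reach storage
    if not request.content_length or request.content_length > MAX_UPLOAD_BYTES:
        abort(413)  # the simple guard against unbounded uploads
    f = request.files["file"]
    blob = blob_service.get_blob_client(container="calibration", blob=f.filename)
    blob.upload_blob(f.stream, overwrite=False)
    return {"status": "stored"}
The content_length check is the simplest guard against the "terabytes of files" problem; the server-side validation script you asked about can then run over the container periodically and delete anything that fails your checks.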
A less recommended implementation is calling the cloud service API directly from the client. For example, AWS provides boto, a Python client that can access the S3 API with the access key (AK) and secret key (SK) of an IAM user. You could prompt users for their AK and SK when uploading files to S3, relying on AWS's own authorization. However, this exposes your public cloud account to the user; as a security measure, each user of your application would need a unique IAM user set up with a minimal access policy. If you have a lot of users, consider an IAM group for your application to minimize the user-management effort.
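For completeness, the direct-from-client variant looks roughly like this with boto3 (the bucket name is a placeholder, and the keys are the per-user IAM credentials mentioned above):
import boto3

def upload_calibration_image(access_key, secret_key, path, filename):
    # The keys belong to a per-user IAM user with a minimal S3 policy.
    s3 = boto3.client(
        "s3",
        aws_access_key_id=access_key,
        aws_secret_access_key=secret_key)
    s3.upload_file(path, "calibration-uploads", filename)  # placeholder bucket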

Related

Is there a way to retrieve Google Analytics 4 data on a schedule using Node.js?

This is what I want to achieve:
1. Ask the user to authorize the collection of their data on a Google Analytics 4 property (or Universal Analytics, but I would rather not).
2. Programmatically retrieve and store the data every n hours.
I was able to do (1) client-side by asking for authorization with Google's OAuth2 and making a call to the Reporting API v4 (https://developers.google.com/analytics/devguides/reporting/core/v4) using gapi on the front end.
However, I'm not sure how to do it on a schedule without user interaction. I've searched Google's API docs and I believe there's a way to do it in Python (https://developers.google.com/analytics/devguides/reporting/core/v4/quickstart/service-py), but I am currently limited to Node and the browser. I guess I could make a server in Python that does the data fetching and connects with the Node application, but that's yet another layer of complication that I'm trying to avoid. Is there a way to do everything in Node?
GCP APIs are all documented in a way that allows anyone to generate client libraries in a variety of languages, including Node.js. The documentation for the Node.js client for Analytics Reporting is here.
For the question of how to schedule this on GCP, I would recommend Cloud Scheduler. It will hit an endpoint running on Cloud Run, which does the actual work. Alternatively, if you already have a service running somewhere else, you can simply add the required endpoints there and point Cloud Scheduler at it.
The overall design I would suggest goes something like this:
1. Build a site which takes the user through the OAuth2 login process, requesting the relevant Google Analytics Reporting API scopes required to make the request.
2. Store the obtained credentials in your user database (preferably Firestore in Datastore mode).
3. Set up a Cloud Run service (or anything else) with two endpoints:
- Iteration endpoint: iterates through the list of users and adds a task to Cloud Tasks to hit the download endpoint for each one.
- Download endpoint: takes a user ID (e.g. as a query parameter) and performs the download for that user. You will need to load that user's credentials from the database and use them to access the Reporting API.
4. Store the downloaded data in the desired location, e.g. Cloud Storage, Firestore, Cloud SQL, etc.
5. Set up Cloud Scheduler to hit the iteration endpoint at the desired frequency.
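To make the two endpoints concrete, here is a hedged sketch, written in Python for compactness (the Node.js clients mirror these calls); the project, queue, service URL, view ID, and the two load_* helpers are placeholders rather than details from the question:
from flask import Flask, request
from google.cloud import tasks_v2
from google.oauth2.credentials import Credentials
from googleapiclient.discovery import build

app = Flask(__name__)

def load_user_ids():
    # Placeholder stub: in the real service, read registered users from Firestore.
    return ["alice", "bob"]

def load_credentials(user_id):
    # Placeholder stub: return the stored OAuth2 token fields for this user.
    return {"token": "...", "refresh_token": "...",
            "token_uri": "https://oauth2.googleapis.com/token",
            "client_id": "...", "client_secret": "..."}

@app.route("/iterate", methods=["POST"])
def iterate():
    client = tasks_v2.CloudTasksClient()
    parent = client.queue_path("my-project", "us-central1", "downloads")
    for user_id in load_user_ids():
        # One Cloud Tasks task per user, each hitting the download endpoint.
        client.create_task(parent=parent, task={"http_request": {
            "http_method": tasks_v2.HttpMethod.POST,
            "url": f"https://my-service.run.app/download?user={user_id}"}})
    return "queued"

@app.route("/download", methods=["POST"])
def download():
    creds = Credentials(**load_credentials(request.args["user"]))
    analytics = build("analyticsreporting", "v4", credentials=creds)
    report = analytics.reports().batchGet(body={"reportRequests": [{
        "viewId": "123456",  # placeholder
        "dateRanges": [{"startDate": "7daysAgo", "endDate": "today"}],
        "metrics": [{"expression": "ga:sessions"}]}]}).execute()
    # Store `report` in Cloud Storage / Firestore / Cloud SQL here.
    return "ok"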
For the GCP services mentioned above (basically everything other than Analytics), you can use the "cloud" clients for Node.js, which are available here.
Note: The question you have asked is very broad, and this answer is just a suggestion. You may consider other designs, whichever works best for you.

Google Drive Python API without Creating Project

For the Google Drive Python API, all the tutorials I have seen require users to create a project in their Google Dashboard before obtaining a client ID and a client secret JSON file. I've been researching both the default Google Drive API and the PyDrive module.
Is there a way for users to simply log in to their Google Account with a username and password, without having to create a project? So that once they log in to their Google Account, they are free to access all files in their Google Drive?
It's not possible to use the Drive API without creating a GCP project for the application. Otherwise Google has no idea what application is requesting access, or what scope of account access it should have.
Logging in with just a username and password is not possible. You need to create a project and use OAuth.
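For reference, the standard project-based flow looks roughly like this with the google-auth-oauthlib and google-api-python-client packages, assuming you have downloaded a credentials.json from your GCP project:
from google_auth_oauthlib.flow import InstalledAppFlow
from googleapiclient.discovery import build

SCOPES = ["https://www.googleapis.com/auth/drive.metadata.readonly"]

flow = InstalledAppFlow.from_client_secrets_file("credentials.json", SCOPES)
creds = flow.run_local_server(port=0)  # opens the normal Google login page

drive = build("drive", "v3", credentials=creds)
files = drive.files().list(pageSize=10).execute()
for f in files.get("files", []):
    print(f["name"])
The user still logs in with their normal account; the project only identifies your application to Google and scopes what it may touch.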
It might be possible with some PySimpleGUI hackery, or by modifying the code of a Python-based browser, but in most cases it is not practical, except when you need to automate something (like renaming files) that would take an hour, in a place where you do not have access to GCP.

Is there any way to use the AWS Textract API without giving access to the secret key in code?

I'm working on a project in which I have to use the Textract API, which gives the best results compared to other APIs.
To use the API, I use my account's API credentials. E.g., I implement an image text recognition function in my desktop app, which uses Python as the backend (for requests and processing) and PyQt5 for the frontend (to let the user pick a file). To use AWS Textract, I set my access key and secret access key as environment variables, for convenience when exporting the project to another system.
The app needs my access key and secret access key to work, but I don't want to share them. How do I use AWS Textract in a desktop application without shipping this sensitive information in the application's source code? Leaking it could be very harmful to me, as AWS provides only a limited number of Textract runs for trial users.
If users got their hands on the access key and secret access key, they might use them to make bulk requests, which was never the goal of the application.
Help needed; modifications to the idea are also accepted.
Three possible ways:
Use Temporary Security Credentials. This requires that your app has a server component, which creates time-limited credentials with presumably very restricted permissions for the desktop app on demand (see the sketch after this list). This way your root credentials are never exposed.
Have your server act as a proxy for the entire operation: it accepts a data upload, runs it through Textract, and returns the result. This way no AWS credentials are ever exposed to the client, but the processing requirements on the server are obviously much higher.
Require the user of your app to register their own AWS account and generate their own credentials, ridding you of any responsibility. Alternatively, you can create specific limited users on your own account if that makes sense for your use case.
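A minimal sketch of the server-side piece of option 1, using boto3's STS client; the action list, session name, and duration are illustrative placeholders:
import json
import boto3

def mint_textract_credentials():
    sts = boto3.client("sts")  # authenticated with the server's own keys
    # Restrict the temporary credentials to a single Textract action.
    policy = {
        "Version": "2012-10-17",
        "Statement": [{
            "Effect": "Allow",
            "Action": ["textract:DetectDocumentText"],
            "Resource": "*",
        }],
    }
    resp = sts.get_federation_token(
        Name="desktop-app",          # placeholder session name
        Policy=json.dumps(policy),
        DurationSeconds=900)         # shortest allowed lifetime: 15 minutes
    return resp["Credentials"]       # AccessKeyId, SecretAccessKey, SessionToken
The desktop app would fetch these temporary credentials from your server over HTTPS and build its own Textract client from them; when they expire, it simply asks again.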

Secured communication between two web servers (Amazon EC2 with Django and Google App Engine)

I have a website which uses Amazon EC2 with Django, and Google App Engine for its powerful Image API and image serving infrastructure. When a user uploads an image, the browser makes an AJAX request to my EC2 server for the Blobstore upload URL. I'm fetching this through my Django server so I can check whether the user is authenticated, and then the server gets the URL from the App Engine server. After the upload is complete and processed in App Engine, I need to send the upload info back to the Django server so I can build the required model instances. How can I accomplish this? I was thinking of using urllib, but how can I secure this so the URLs are only accessed by my servers and not by a web user? Maybe some sort of secret key?
Apart from the HTTPS call (which you should be making to transfer info to Django), you can use AES encryption (via PyCrypto or any other library); it takes a secret key to encrypt your message.
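A minimal sketch of that idea with pycryptodome (the maintained successor to PyCrypto), using AES-GCM so tampering is detected as well; the shared key would be configured on both servers out of band:
from Crypto.Cipher import AES
from Crypto.Random import get_random_bytes

SECRET_KEY = get_random_bytes(16)  # in practice, load this from shared config

def encrypt(message: bytes):
    cipher = AES.new(SECRET_KEY, AES.MODE_GCM)
    ciphertext, tag = cipher.encrypt_and_digest(message)
    return cipher.nonce, ciphertext, tag

def decrypt(nonce, ciphertext, tag):
    cipher = AES.new(SECRET_KEY, AES.MODE_GCM, nonce=nonce)
    return cipher.decrypt_and_verify(ciphertext, tag)  # raises if tampered with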
For server-to-server communication, traditional security advice would recommend some sort of IP range restriction at the web server level for these URLs, in addition to whatever default security is in place. However, since you are making the call from one cloud provider to another, your ability to permanently control the IP address of either the client or the server may be diminished.
That said, I would recommend a standard username/password authentication mechanism plus HTTPS for transport security. Basic auth (https://username:password@appengine.com/) would be my recommendation. In addition, I would enforce a lockout after a certain number of failed attempts within a specific time window, to discourage brute-forcing the password.
Depending on what web framework you are using on the App Engine, there is probably already support for some or all of what I just mentioned. If you update this question with more specifics on your architecture or open a new question with more information, we could give you a more accurate recommendation.
SDC (Google's Secure Data Connector) provides a secure tunnel from App Engine to a private network elsewhere, which could be your EC2 instance, if you run it there.

How can I protect my AWS access ID and secret key in my Python application?

I'm making an application in Python and using Amazon Web Services in some modules.
I'm currently hard-coding my AWS access ID and secret key in a *.py file, though I might move them out to a configuration file in the future.
But there's a problem: how can I protect the AWS information from other people? As far as I know, Python is easy to decompile.
Is there a way to do this?
What I'm making is an app to help users upload/download stuff from the cloud, using Amazon S3 as the storage. As I understand it, Dropbox also uses S3, so I'm wondering how they protect their keys.
After a day's research, I found something.
I'm now using boto (an AWS library for Python). It has a generate_url(X) function that returns a URL the app can use to access an object in S3; the URL expires after X seconds.
So I can build a web service that provides these URLs to my apps. The AWS keys will not be baked into the app, only into the web service.
It sounds great, but so far I can only download objects with this function; uploading doesn't work. Does anyone know how to use key.generate_url() of boto to get a temporary URL for uploading stuff to S3?
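For reference, the working download case described above looks roughly like this in classic boto; the bucket and key names are placeholders:
from boto.s3.connection import S3Connection

conn = S3Connection()  # server-side only: reads the AWS keys from the environment
key = conn.get_bucket("my-bucket").get_key("photos/img001.jpg")
url = key.generate_url(expires_in=300)  # a GET URL valid for five minutes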
There's no way to protect your keys if you're going to distribute your code. They're going to be accessible to anyone who has access to your server or source code.
There are two things you can do to protect yourself against malicious use of your keys.
Use the Amazon IAM service to create a set of keys that only has permission to perform the tasks your script requires: http://aws.amazon.com/iam/
If you have a mobile app or some other app that requires user accounts, you can create a service that issues temporary tokens for each user. The user must have a valid token and your keys to perform any actions. If you want to stop a user from using your keys, you stop generating new tokens for them. http://awsdocs.s3.amazonaws.com/STS/latest/sts-api.pdf
Specifically for S3: if you're creating an application that allows people to upload content, the only way to protect your account and the other users' information is to make them register an account with you.
The first step of the application would be to authenticate with your server.
Once your server authenticates the user, it makes a request to Amazon's token server and returns a token.
Your application then makes requests using the keys built into the exe plus that token.
Based on the permissions applied to that user, he can upload only to the bucket that is assigned to him.
If this seems pretty difficult, then you're probably not ready to design an application that helps users upload data to S3. You're going to have significant security problems if you distribute only one key: even if you hide that key from the user, they would be able to edit data added by any other user.
The only way around this is to have each user create their own AWS account, with your application helping them upload files to their own S3 account. In that case you don't need to worry about protecting the keys, because each user is responsible for adding their own keys after installing your application.
I've been trying to answer the same question... generate_url(x) looks quite promising.
This link had a suggestion about creating a CloudFront origin access identity, which I'm guessing taps into IAM authentication, meaning you could create a key for each application without giving away your main account details. With IAM you can set permissions per key as to what it can do, so keys can have limited access.
Note: I don't know if this really works, I haven't tried it yet, but it might be another avenue to explore.
2 - Create a CloudFront "Origin Access Identity". This identity can be reused for many different distributions and keypairs. It is only used to allow CloudFront to access your private S3 objects without allowing everyone. As of now, this step can only be performed using the API.
Boto code is here:
# Classic boto (not boto3); assumes AWS credentials in the environment.
import boto

cf = boto.connect_cloudfront()
# Create a new Origin Access Identity
oai = cf.create_origin_access_identity(comment='New identity for secure videos')
print("Origin Access Identity ID: %s" % oai.id)
print("Origin Access Identity S3CanonicalUserId: %s" % oai.s3_user_id)
You're right, you can't upload using pre-signed URLs.
There is a different, more complex capability you can use, called GetFederationToken. It returns temporary credentials to which you can attach any policy (permissions) you like.
So, for example, you could write a web service POST /upload that creates a new folder in S3, then creates temporary credentials with permission to PutObject only into that folder, and returns the folder path and the credentials to the caller. Presumably this method would perform some authorization check as well.
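A hedged sketch of that service with Flask and boto3; the bucket name and the auth check are placeholders:
import json
import uuid

import boto3
from flask import Flask, abort, request

app = Flask(__name__)
BUCKET = "user-uploads-example"  # placeholder

@app.route("/upload", methods=["POST"])
def create_upload_grant():
    if request.headers.get("X-Auth-Token") is None:  # placeholder auth check
        abort(401)
    folder = str(uuid.uuid4())
    # Allow PutObject only inside the freshly created folder.
    policy = {"Version": "2012-10-17", "Statement": [{
        "Effect": "Allow",
        "Action": "s3:PutObject",
        "Resource": "arn:aws:s3:::%s/%s/*" % (BUCKET, folder)}]}
    creds = boto3.client("sts").get_federation_token(
        Name="uploader", Policy=json.dumps(policy),
        DurationSeconds=900)["Credentials"]
    creds["Expiration"] = creds["Expiration"].isoformat()
    return {"folder": folder, "credentials": creds}
The client then builds its own S3 client from the returned temporary credentials and can write only under that one folder until they expire.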
You can't embed cloud credentials, or any other credentials, in your application code. Which isn't to say that nobody ever accidentally does this, even security professionals.
To safely distribute credentials to your infrastructure, you need tool support. If you use an AWS facility like CloudFormation, you can (somewhat more) safely give it your credentials; CloudFormation can also create new credentials on the fly. If you use a PaaS like Heroku, you can load your credentials into it, and Heroku will presumably treat them carefully. Another option on AWS is an IAM role: create a role with permission to do what you need, then "pass" the role to your EC2 instance, and code running on the instance can perform the actions permitted by the role.
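With an IAM role attached to the instance, the code itself stays credential-free; a small boto3 sketch (bucket name is a placeholder):
import boto3

# No keys anywhere in code or config: boto3 discovers the temporary
# credentials of the instance's IAM role via the EC2 metadata service.
s3 = boto3.client("s3")
s3.upload_file("report.csv", "my-app-bucket", "reports/report.csv")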
A final option is a dedicated secrets management service, such as Conjur. (Disclaimer: I'm a founder of the company.) You load your credentials and other secrets into a dedicated virtual appliance and define access permissions that govern the modification and distribution of those credentials. The permissions can be granted to people or to "robots" like your EC2 box. Credentials can be retrieved via REST or client APIs, and every interaction with them is recorded in a permanent audit record.
Don't put it in applications you plan to distribute. It will be visible, and people could launch instances that are billed directly to you, or worse, take down your instances if you use the same key in production.
I would look at your program's design and seriously question why you need to include that information in the app. If you post more details on the design, I'm sure we can help you figure out a way in which you don't need to bundle this information.
