How to access GitHub Actions Environment Secrets locally and remotely - python

On my local repository, I have a .gitignore'd config.py file that holds sensitive tokens. On the remote repo, I read the GitHub Secrets using Python's os module.
Here is the relevant snippet of my code. Is this the best practice for making it work both locally and remotely?
import os

try:  # This should run locally
    import config
    ACCESS_TOKEN = config.ACCESS_TOKEN
    OWNER_ID = config.OWNER_ID
except:  # This should run on GitHub Actions
    ACCESS_TOKEN = os.environ['ACCESS_TOKEN']
    OWNER_ID = os.environ['OWNER_ID']
Here is the full code: https://github.com/Andrew6rant/Andrew6rant/blob/main/today.py

There are many ways to solve it, but personally I did this:
Always read config values in Python; don't handle environment variables anywhere in the code.
On GitHub Actions, overwrite the values in the config file with the GitHub Secrets before running Python. This is a separate workflow step, independent of the Python code.
For local development, we keep the config file in a 1Password vault rather than in GitHub, to avoid leaking secrets and sensitive data.
In my case I keep config in JSON format, but you can pick any.
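A minimal sketch of the Python side of this approach, assuming the config is a JSON file named `config.json` holding the `ACCESS_TOKEN` and `OWNER_ID` keys from the question (the file name, `load_config` and the required-key check are illustrative assumptions). The code never looks at the environment; the workflow step that writes the file from GitHub Secrets is not shown.

```python
import json
from pathlib import Path

# Keys taken from the question; adjust to whatever your config holds.
REQUIRED_KEYS = ("ACCESS_TOKEN", "OWNER_ID")


def load_config(path=Path("config.json")) -> dict:
    """Read settings from a JSON file; identical locally and on GitHub Actions."""
    with open(path, encoding="utf-8") as fh:
        config = json.load(fh)
    # Fail early with a clear message instead of a KeyError deep in the program.
    missing = [key for key in REQUIRED_KEYS if key not in config]
    if missing:
        raise KeyError(f"{path} is missing: {', '.join(missing)}")
    return config
```

Usage would be `config = load_config()` followed by `ACCESS_TOKEN = config["ACCESS_TOKEN"]`, so the same line works whether the file came from the password vault or from the CI step.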

Related

Learning Python fast: how can I protect some private connections from being exposed?

Hi, I'm new to the community and new to Python, experienced but rusty in other high-level languages, so my question is simple.
I made a simple script to connect to a private FTP server and retrieve daily information from it.
from ftplib import FTP

# Connect to server to retrieve inventory
def FTPconnection(file_name):
    # Open ftp connection
    ftp = FTP('ftp.serveriuse.com')
    ftp.login('mylogin', 'password')
    # List the files in the current directory
    print("Current File List:")
    file = ftp.dir()
    print(file)
    # # #Get the latest csv file from server
    # ftp.cwd("/pub")
    gfile = open(file_name, "wb")
    ftp.retrbinary('RETR ' + file_name, gfile.write)
    gfile.close()
    ftp.quit()

FTPconnection('test1.csv')
FTPconnection('test2.csv')
That's the whole script: it passes my credentials and then calls the function FTPconnection on the two files I'm retrieving.
My other script, which processes the files, imports this one as a module; the import simply connects to the FTP server and fetches the information.
import ftpconnect as ftpc
This is in the other Python script, which does the processing.
It works, but I want to improve it, so I need some guidance on best practices: in Spyder 4.1.5 I get a 'Module ftpconnect called but unused' warning, so I am probably missing something. I'm developing on macOS using Anaconda and Python 3.8.5.
I'm trying to build an app to automate some tasks, but I couldn't find anything about modules that guided me towards better code; it simply says you import whatever .py file name you used, and that file is treated as a module.
My final question: how do you normally protect private information (FTP credentials) from being exposed? This is not about protecting my code, only the credentials.
There are a few options for storing passwords and other secrets that a Python program needs to use, particularly a program that needs to run in the background where it can't just ask the user to type in the password.
Problems to avoid:
Checking the password in to source control where other developers or even the public can see it.
Other users on the same server reading the password from a configuration file or source code.
Having the password in a source file where others can see it over your shoulder while you are editing it.
Option 1: SSH
This isn't always an option, but it's probably the best. Your private key is never transmitted over the network, SSH just runs mathematical calculations to prove that you have the right key.
In order to make it work, you need the following:
The database or whatever you are accessing needs to be accessible by SSH. Try searching for "SSH" plus whatever service you are accessing. For example, "ssh postgresql". If this isn't a feature on your database, move on to the next option.
Create an account to run the service that will make calls to the database, and generate an SSH key.
Either add the public key to the service you're going to call, or create a local account on that server, and install the public key there.
Option 2: Environment Variables
This one is the simplest, so it might be a good place to start. It's described well in the Twelve Factor App. The basic idea is that your source code just pulls the password or other secrets from environment variables, and then you configure those environment variables on each system where you run the program. It might also be a nice touch if you use default values that will work for most developers. You have to balance that against making your software "secure by default".
Here's an example that pulls the server, user name, and password from environment variables.
import os
server = os.getenv('MY_APP_DB_SERVER', 'localhost')
user = os.getenv('MY_APP_DB_USER', 'myapp')
password = os.getenv('MY_APP_DB_PASSWORD', '')
# db_connect() stands in for your own database connection code
db_connect(server, user, password)
Look up how to set environment variables in your operating system, and consider running the service under its own account. That way you don't have sensitive data in environment variables when you run programs in your own account. When you do set up those environment variables, take extra care that other users can't read them. Check file permissions, for example. Of course any users with root permission will be able to read them, but that can't be helped. If you're using systemd, look at the service unit, and be careful to use EnvironmentFile instead of Environment for any secrets. Environment values can be viewed by any user with systemctl show.
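Applied to the FTP script from the question, Option 2 could look like the following sketch. The variable names `FTP_HOST`, `FTP_USER` and `FTP_PASSWORD` and the helpers `ftp_settings` and `download` are hypothetical; pick your own.

```python
import os
from ftplib import FTP


def ftp_settings(environ=os.environ) -> dict:
    """Collect the FTP connection details from environment variables."""
    # A missing variable raises KeyError, so misconfiguration fails loudly.
    return {
        "host": environ["FTP_HOST"],
        "user": environ["FTP_USER"],
        "password": environ["FTP_PASSWORD"],
    }


def download(file_names, settings=None) -> None:
    """Download each named file from the server's current directory."""
    settings = settings or ftp_settings()
    with FTP(settings["host"]) as ftp:  # the connection is closed on exit
        ftp.login(settings["user"], settings["password"])
        for name in file_names:
            with open(name, "wb") as fh:
                ftp.retrbinary("RETR " + name, fh.write)
```

The processing script can then call `download(["test1.csv", "test2.csv"])` explicitly (or the module can put that call under an `if __name__ == "__main__":` guard), so merely importing the module no longer opens a connection as a side effect, and the password never appears in the source.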
Option 3: Configuration Files
This is very similar to the environment variables, but you read the secrets from a text file. I still find the environment variables more flexible for things like deployment tools and continuous integration servers. If you decide to use a configuration file, Python supports several formats in the standard library, like JSON, INI, netrc, and XML. You can also find external packages like PyYAML and TOML. Personally, I find JSON and YAML the simplest to use, and YAML allows comments.
Three things to consider with configuration files:
Where is the file? Maybe a default location like ~/.my_app, and a command-line option to use a different location.
Make sure other users can't read the file.
Obviously, don't commit the configuration file to source code. You might want to commit a template that users can copy to their home directory.
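A sketch of the three points above using the standard library's `configparser`, assuming an INI file at `~/.my_app` with a `[db]` section (the section name, key names and `load_secrets` are illustrative). On POSIX systems it refuses to read the file if the group or other users have any permission bits set.

```python
import configparser
import os
import stat
from pathlib import Path


def load_secrets(path=None, section="db") -> dict:
    """Read one section of an INI file that only its owner may access."""
    path = path or Path.home() / ".my_app"
    if os.name == "posix":
        mode = stat.S_IMODE(os.stat(path).st_mode)
        if mode & 0o077:  # any group/other permission bits are set
            raise PermissionError(f"{path} is accessible by other users ({oct(mode)})")
    parser = configparser.ConfigParser()
    with open(path, encoding="utf-8") as fh:
        parser.read_file(fh)
    return dict(parser[section])
```

A file created with `chmod 600 ~/.my_app` passes the check, and a command-line option can simply pass a different `path`.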
Option 4: Python Module
Some projects just put their secrets right into a Python module.
# settings.py
db_server = 'dbhost1'
db_user = 'my_app'
db_password = 'correcthorsebatterystaple'
Then import that module to get the values.
# my_app.py
from settings import db_server, db_user, db_password
db_connect(db_server, db_user, db_password)
One project that uses this technique is Django. Obviously, you shouldn't commit settings.py to source control, although you might want to commit a file called settings_template.py that users can copy and modify.
I see a few problems with this technique:
Developers might accidentally commit the file to source control. Adding it to .gitignore reduces that risk.
Some of your code is not under source control. If you're disciplined and only put strings and numbers in here, that won't be a problem. If you start writing logging filter classes in here, stop!
If your project already uses this technique, it's easy to transition to environment variables. Just move all the setting values to environment variables, and change the Python module to read from those environment variables.
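For example, the `settings.py` module above could keep its names but take the values from the environment, so `from settings import db_server, db_user, db_password` keeps working unchanged. The `MY_APP_DB_*` variable names follow the Option 2 example; the defaults are placeholders.

```python
# settings.py: same names as before, values now read from the environment
import os

db_server = os.getenv("MY_APP_DB_SERVER", "localhost")
db_user = os.getenv("MY_APP_DB_USER", "my_app")
# No real default for the secret; an empty string means "not configured".
db_password = os.getenv("MY_APP_DB_PASSWORD", "")
```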

Azure Pipelines - Use System.AccessToken within a Python Script

I am working on a pipeline where the majority of code is within a python script that I call in the pipeline. In the script I would like to use the predefined variable System.AccessToken to make a call to the DevOps API that sets the status of a pull request.
However, when I try to get the token using os.environ['System.AccessToken'] I get a key error.
Oddly though, it seems that System.AccessToken is set, because in the yaml file for the pipeline I am able to access the API like:
curl -u ":$(System.AccessToken)" URL
and get back a valid response. Is there something additional I need to do in Python to access this variable?
After reviewing the page that Mani posted I found the answer. For most variables, something like System.AccessToken would have a corresponding SYSTEM_ACCESSTOKEN.
However, with a secret variable this is not the case. I was able to make it accessible to my python script by adding:
env:
  SYSTEM_ACCESSTOKEN: $(System.AccessToken)
to where the Python script is called in the pipeline's yaml file.
See https://learn.microsoft.com/en-us/azure/devops/pipelines/process/variables?view=azure-devops&tabs=yaml%2Cbatch#secret-variables for more details.
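On the Python side, a minimal sketch that reads the mapped variable and builds the same Basic authentication header that `curl -u ":$(System.AccessToken)"` sends (an empty user name and the token as the password). The function names are hypothetical, and the actual API call is left out because the URL is not given in the question.

```python
import base64
import os


def get_access_token() -> str:
    """Read the token exposed through the step's env: mapping."""
    try:
        return os.environ["SYSTEM_ACCESSTOKEN"]
    except KeyError:
        raise RuntimeError(
            "SYSTEM_ACCESSTOKEN is not set; map it with env: in the pipeline YAML"
        ) from None


def basic_auth_header(token: str) -> dict:
    """Equivalent of curl -u ":TOKEN": empty user name, token as password."""
    credentials = base64.b64encode(f":{token}".encode("utf-8")).decode("ascii")
    return {"Authorization": f"Basic {credentials}"}
```

The resulting dict can be passed as the headers of a `urllib.request.Request` (or a `requests` call) to the DevOps API.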
This documentation describes how to make it work: https://learn.microsoft.com/de-de/azure/developer/python/azure-sdk-authenticate?tabs=cmd
Just switch the page language to English ("read in English").
You need an existing Key Vault that contains the secret (for example, a SAS token).
Also note that the code in your question is curl, not Python.
import os
from azure.identity import DefaultAzureCredential
from azure.keyvault.secrets import SecretClient
# Acquire the resource URL
vault_url = os.environ["KEY_VAULT_URL"]
# Acquire a credential object
credential = DefaultAzureCredential()
# Acquire a client object
secret_client = SecretClient(vault_url=vault_url, credential=credential)
# Attempt to perform an operation
retrieved_secret = secret_client.get_secret("secret-name-01")
# The secret's value is available as retrieved_secret.value
Save this as test.py, set KEY_VAULT_URL to your vault's URL, replace the secret name with your own, and run it.
If you need the token outside the pipeline, keep in mind that each environment has its own namespace.
So either add it to the local context with export ..., or
follow the Unix philosophy that "everything is a file" and write it to a file.
Good practice here is to use ansible-vault or something similar:
store it encrypted, and read it from the file when you need it.
Can you use os.environ['SYSTEM_ACCESSTOKEN']? As mentioned in https://learn.microsoft.com/en-us/azure/devops/pipelines/process/variables?view=azure-devops&tabs=yaml%2Cbatch#environment-variables, the case and format of environment variable names differ from the pipeline variable names.
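The mapping described there is mechanical: the name is upper-cased and `.` is replaced with `_`. A small sketch that applies it (the helper names are hypothetical), remembering that secret variables such as `System.AccessToken` are only present if you map them explicitly with `env:`.

```python
import os


def pipeline_env_name(variable: str) -> str:
    """Azure Pipelines variable name -> environment variable name."""
    return variable.upper().replace(".", "_")


def get_pipeline_variable(variable: str):
    """Return the variable's value, or None if it is not in the environment."""
    return os.environ.get(pipeline_env_name(variable))
```

For example, `pipeline_env_name("System.AccessToken")` gives `"SYSTEM_ACCESSTOKEN"`.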

Accessing DynamoDB Local from boto3

I am doing the AWS tutorial for Python and DynamoDB. I downloaded and installed DynamoDB Local. I got the access key and secret access key. I installed boto3 for Python. The only step I have left is setting up authentication credentials. I do not have the AWS CLI installed, so where should I put the access key, the secret key, and the region?
Do I include it in my python code?
Do I make a file in my directory where I put this info? Then should I write anything in my python code so it can find it?
You can try passing the accesskey and secretkey in your code like this:
import boto3

session = boto3.Session(
    aws_access_key_id=ACCESS_KEY,      # your access key
    aws_secret_access_key=SECRET_KEY,  # your secret key
    region_name='us-east-1',           # the region you want to use
)
client = session.client('dynamodb')
# OR
dynamodb = session.resource('dynamodb')
From the AWS documentation:
Before you can access DynamoDB programmatically or through the AWS Command Line Interface (AWS CLI), you must configure your credentials to enable authorization for your applications. Downloadable DynamoDB requires any credentials to work, as shown in the following example.
AWS Access Key ID: "fakeMyKeyId"
AWS Secret Access Key: "fakeSecretAccessKey"
You can use the aws configure command of the AWS CLI to set up credentials. For more information, see Using the AWS CLI.
So you need to create a .aws folder in your home directory.
In it, create the credentials and config files.
Here's how to do this:
https://docs.aws.amazon.com/cli/latest/userguide/cli-configure-files.html
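If you'd rather not install the CLI just to run `aws configure`, a sketch like the following writes the two files with `configparser`, using the dummy credentials from the AWS documentation quoted above. The function name and the default region are assumptions, and it overwrites existing files, so don't point it at a `~/.aws` directory that already holds real credentials.

```python
import configparser
from pathlib import Path


def write_local_aws_profile(aws_dir, region="us-east-1") -> None:
    """Write [default] credentials and config files for DynamoDB Local."""
    aws_dir = Path(aws_dir)
    aws_dir.mkdir(parents=True, exist_ok=True)

    credentials = configparser.ConfigParser()
    credentials["default"] = {
        "aws_access_key_id": "fakeMyKeyId",
        "aws_secret_access_key": "fakeSecretAccessKey",
    }
    with open(aws_dir / "credentials", "w", encoding="utf-8") as fh:
        credentials.write(fh)

    config = configparser.ConfigParser()
    config["default"] = {"region": region}
    with open(aws_dir / "config", "w", encoding="utf-8") as fh:
        config.write(fh)
```

Calling `write_local_aws_profile(Path.home() / ".aws")` creates the files where boto3 looks for them by default, so no keys need to appear in your code.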
If you want to write portable code and keep in the spirit of developing 12-factor apps, consider using environment variables.
The advantage is that locally, both the CLI and the boto3 Python library in your code (and pretty much all the other official AWS SDKs: PHP, Go, etc.) are designed to look for these values.
An example using the official Docker image to quickly start DynamoDB local:
# Start a local DynamoDB instance on port 8000
docker run -p 8000:8000 amazon/dynamodb-local
Then in a terminal, set some defaults that the CLI and SDKs like boto3 are looking for.
Note that these will be available until you close your terminal session.
# Region doesn't matter, CLI will complain if not provided
export AWS_DEFAULT_REGION=us-east-1
# Set some dummy credentials, dynamodb local doesn't care what these are
export AWS_ACCESS_KEY_ID=abc
export AWS_SECRET_ACCESS_KEY=abc
You should then be able to run the following (in the same terminal session) if you have the CLI installed. Note the --endpoint-url flag.
# Create a new table in DynamoDB Local
aws dynamodb create-table \
--endpoint-url http://127.0.0.1:8000 \
--table-name tmp \
--attribute-definitions AttributeName=id,AttributeType=S \
--key-schema AttributeName=id,KeyType=HASH \
--billing-mode PAY_PER_REQUEST
You should then be able to list the tables with:
aws dynamodb list-tables --endpoint-url http://127.0.0.1:8000
And get a result like:
{
"TableNames": [
"tmp"
]
}
So how do we get the endpoint URL that we've been passing to the CLI to work in Python? Unfortunately, at the time of writing there isn't a default environment variable for the endpoint URL in the boto3 codebase, so we'll need to pass it in when the code runs. The docs for .NET and Java are comprehensive, but for Python they are a bit more elusive. Based on the boto3 GitHub repo (see also this great answer), we need to create a client or resource with the endpoint_url keyword. Below, we look for a custom environment variable called AWS_DYNAMODB_ENDPOINT_URL. If it is set, it will be used; otherwise boto3 falls back to the platform default, which keeps your code portable.
# Run in the same shell as before
export AWS_DYNAMODB_ENDPOINT_URL=http://127.0.0.1:8000
# file test.py
import os
import boto3
# Get environment variable if it's defined
# Make sure to set the environment variable before running
endpoint_url = os.environ.get('AWS_DYNAMODB_ENDPOINT_URL', None)
# Using (high level) resource, same keyword for boto3.client
resource = boto3.resource('dynamodb', endpoint_url=endpoint_url)
tables = resource.tables.all()
for table in tables:
    print(table)
Finally, run this snippet with
# Run in the same shell as before
python3 test.py
# Should produce the following output:
# dynamodb.Table(name='tmp')

Alternative to attempting to persist Environment Variables in Python

Up until now, whenever I have needed to store a "secret" for a simple python application, I have relied on environment variables. In Windows, I set the variables via the Computer Properties dialog and I access them in my Python code like this:
database_password = os.environ['DB_PASS']
The simplicity of this approach has served me well. Now I have a project that uses OAuth2 authentication, and I need to store tokens in the environment that may change during program execution. I want them to persist to the next time I run the program. This is what I have come up with:
#fetch a new token
token = oauth.fetch_token('https://api.example.com/oauth/v2/token', code=secretcode)
access_token = token['access_token']
#make sure it persists in the current session
os.environ['TOKEN'] = access_token
#store to the system environment (Windows)
cmd = 'SETX /M TOKEN ' + access_token
os.system(cmd)
It gets the job done quickly for me today, but does not seem like the right approach to add to my toolbox. Does anyone have a more elegant way of doing what I am trying to do that does not add too many layers of complexity? If the solution worked across platforms that would be a bonus.
I have used the Python keyring module with great success. It's an interface to credential vaults provided by the operating system (e.g., Windows Credential Manager). I haven't used it on Linux, but it appears to be supported, as well.
Storing a password/token and then retrieving it can be as simple as:
import keyring
keyring.set_password("system", "username", "password")
keyring.get_password("system", "username")
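For the OAuth case in the question, the whole token dict can be stored as one JSON string so it survives restarts. A sketch, assuming the `keyring` package is installed; the service and user names (`"my_app"`, `"oauth"`) are hypothetical, and the optional `backend` argument exists only so the functions can be exercised without a real credential store.

```python
import json


def save_token(token: dict, backend=None, service="my_app", user="oauth") -> None:
    """Persist an OAuth token dict in the OS credential store via keyring."""
    if backend is None:
        import keyring as backend  # third-party: pip install keyring
    backend.set_password(service, user, json.dumps(token))


def load_token(backend=None, service="my_app", user="oauth"):
    """Return the stored token dict, or None if nothing has been saved yet."""
    if backend is None:
        import keyring as backend
    raw = backend.get_password(service, user)  # None when no entry exists
    return json.loads(raw) if raw is not None else None
```

After `fetch_token`, call `save_token(token)`; on the next start, `load_token()` returns the same dict, with no `SETX` call and on any platform keyring supports.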

Unable to switch gcloud platform account using python script

Please could someone help me with a query related to permissions on the Google cloud platform? I realise that this is only loosely programming related so I apologise if this is the wrong forum!
I have a project ("ProjectA") written in Python that uses Google Cloud Storage and Compute Engine. The project has various buckets that are accessed using Python code both from compute instances and from my home computer. This project uses a service account which is a project "owner"; I believe it has all APIs enabled, and the project works really well. The service account name is "master@projectA.iam.gserviceaccount.com".
Recently I started a new project that needs similar resources (storage, compute, etc.), but I want to keep it separate. The new project is called "ProjectB", and I set up a new master service account called master@projectB.iam.gserviceaccount.com. My code in ProjectB generates an access-permission error, which I can reproduce even after stripping the code down to these few lines:
The code from ProjectA looked like this:
from google.cloud import storage
client = storage.Client(project='projectA')
mybucket = storage.bucket.Bucket(client=client, name='projectA-bucket-name')
currentblob = mybucket.get_blob('somefile.txt')
The code from ProjectB looks like this:
from google.cloud import storage
client = storage.Client(project='projectB')
mybucket = storage.bucket.Bucket(client=client, name='projectB-bucket-name')
currentblob = mybucket.get_blob('somefile.txt')
Both buckets definitely exist, and obviously if "somefile.txt" does not exist then currentblob is None, which is fine, but when I execute this code I receive the following error:
Traceback (most recent call last):
File .... .py", line 6, in <module>
currentblob = mybucket.get_blob('somefile.txt')
File "C:\Python27\lib\site-packages\google\cloud\storage\bucket.py", line 599, in get_blob
_target_object=blob,
File "C:\Python27\lib\site-packages\google\cloud\_http.py", line 319, in api_request
raise exceptions.from_http_response(response)
google.api_core.exceptions.Forbidden: 403 GET https://www.googleapis.com/storage/v1/b/<ProjectB-bucket>/o/somefile.txt: master@ProjectA.iam.gserviceaccount.com does not have storage.objects.get access to projectB/somefile.txt.
Notice how the error message says "ProjectA" service account doesn't have ProjectB access - well, I would somewhat expect that but I was expecting to use the service account on ProjectB!
I have read the documentation and links such as this and this, but removing and reinstating the service account, or giving it limited scopes, hasn't helped. I have tried a few things:
1) Make sure that my new service account was "activated" on my local machine (where the code is being run for now):
gcloud auth activate-service-account master@projectB.iam.gserviceaccount.com --key-file="C:\my-path-to-file\123456789.json"
This appears to be successful.
2) Verify the list of credentialled accounts:
gcloud auth list
This lists two accounts: one is my email address (the one I use for Gmail, etc.), and the other is master@projectB.iam.gserviceaccount.com, so it appears that my account is "registered" properly.
3) Set the service account as the active account:
gcloud config set account master@projectB.iam.gserviceaccount.com
When I look at the auth list again, there is an asterisk "*" next to the service account, so presumably this is good.
4) Check that the project is set to ProjectB:
gcloud config set project projectB
This also appears to be ok.
It's strange that when I run the Python code, it is "using" the service account from my old project, even though I have changed seemingly everything to refer to ProjectB: I've activated the account, selected it, etc.
Please could someone point me in the direction of something that I might have missed? I don't recall going through this much pain when setting up my original project, and I'm finding it incredibly frustrating that something I thought would be simple is proving so difficult.
Thank you to anyone who can offer me any assistance.
I'm not entirely sure, but this answer is from a similar question on here:
Permission to Google Cloud Storage via service account in Python
Specify the account explicitly by pointing to the credentials file in your code, as documented here:
Example from the documentation page:
def explicit():
    from google.cloud import storage

    # Explicitly use service account credentials by specifying the private key
    # file.
    storage_client = storage.Client.from_service_account_json(
        'service_account.json')

    # Make an authenticated API request
    buckets = list(storage_client.list_buckets())
    print(buckets)
Don't you have a GOOGLE_APPLICATION_CREDENTIALS environment variable configured that points to project A's service account?
By default, the Google SDK takes the service account from the GOOGLE_APPLICATION_CREDENTIALS environment variable.
If you want to change the account you can do something like:
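import os
from google.cloud import storage

# Path to the service account JSON key file, read from an environment variable
credentials_json_file = os.environ.get('env_var_with_path_to_account_json')
client = storage.Client.from_service_account_json(credentials_json_file)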
The above assumes you have created a JSON key file as described in: https://cloud.google.com/iam/docs/creating-managing-service-account-keys
and that the path to that file is stored in the environment variable env_var_with_path_to_account_json.
This way you can have 2 account files and decide which one to use.
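To check which account the client libraries will pick up, you can look inside the key file that GOOGLE_APPLICATION_CREDENTIALS points to; service account key files contain `client_email` and `project_id` fields. Note that `gcloud auth activate-service-account` and `gcloud config set account` change the account used by the gcloud CLI, not necessarily the credentials the Python libraries find. A small diagnostic sketch (the function name is hypothetical):

```python
import json
import os


def describe_key_file(path=None) -> dict:
    """Return the service account e-mail and project stored in a JSON key file."""
    path = path or os.environ.get("GOOGLE_APPLICATION_CREDENTIALS")
    if not path:
        return {}  # variable not set: the libraries fall back to other sources
    with open(path, encoding="utf-8") as fh:
        key = json.load(fh)
    return {"client_email": key.get("client_email"), "project_id": key.get("project_id")}
```

If `describe_key_file()` reports the ProjectA account, that environment variable is the reason the 403 error names the old service account.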
