I have several AWS accounts that I manage daily. In some of them I have EKS clusters. Usually, when I want to see the available namespaces in a cluster, I log into the cluster and then run kubectl get ns from the Windows terminal.
How can I run this command with Python?
(I'm using boto3 to write my queries against AWS, and I'm trying to do everything within the boto3 module.)
I've already connected to the cluster, but describe_cluster doesn't return the info I'm looking for.
You cannot get cluster resources with boto3. With additional libraries, though, you can get Kubernetes cluster resources from Python; the kubernetes and eks-token libraries can help you.
An example usage is sketched below.
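This is a minimal sketch, assuming a cluster named my-cluster (a placeholder, not from the question) and AWS credentials that can reach it: boto3 fetches the cluster endpoint and CA certificate, eks-token mints a bearer token, and the kubernetes client then runs the equivalent of kubectl get ns.

```python
import base64
import tempfile

import boto3
from eks_token import get_token
from kubernetes import client

CLUSTER_NAME = 'my-cluster'  # placeholder: replace with your cluster's name

# Fetch the cluster endpoint and CA certificate with boto3
eks = boto3.client('eks')
cluster = eks.describe_cluster(name=CLUSTER_NAME)['cluster']

# The kubernetes client expects the CA certificate as a file on disk
ca_file = tempfile.NamedTemporaryFile(delete=False, suffix='.crt')
ca_file.write(base64.b64decode(cluster['certificateAuthority']['data']))
ca_file.close()

# eks-token creates a bearer token from your current AWS credentials
token = get_token(cluster_name=CLUSTER_NAME)['status']['token']

configuration = client.Configuration()
configuration.host = cluster['endpoint']
configuration.ssl_ca_cert = ca_file.name
configuration.api_key['authorization'] = token
configuration.api_key_prefix['authorization'] = 'Bearer'

# The equivalent of `kubectl get ns`
api = client.CoreV1Api(client.ApiClient(configuration))
for ns in api.list_namespace().items:
    print(ns.metadata.name)
```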
Usually things go the other way around: you use DataDog to monitor Airflow. In my case, though, I need to access DataDog metrics through the DataDog API from Airflow so I can send them to a Snowflake table.
The idea is to use this table to build an alerting system in ThoughtSpot Cloud for when Kafka lag happens, since ThoughtSpot Cloud doesn't support calling APIs, at least not from the cloud version.
I've been googling these options endlessly but haven't found a more optimal or less complicated solution. Any advice is highly appreciated.
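For what it's worth, here is a rough, untested sketch of one way the Airflow side could look: a task pulls datapoints from the Datadog v1 metrics query endpoint with plain requests and inserts them into a Snowflake table. The metric query, the KAFKA_LAG table, and all credentials below are placeholders, not anything confirmed by the question.

```python
import time
from datetime import datetime

import requests
import snowflake.connector
from airflow import DAG
from airflow.operators.python import PythonOperator


def datadog_to_snowflake():
    now = int(time.time())
    # Query the Datadog v1 metrics endpoint for the last hour of data
    resp = requests.get(
        "https://api.datadoghq.com/api/v1/query",
        headers={
            "DD-API-KEY": "<your-api-key>",            # placeholder
            "DD-APPLICATION-KEY": "<your-app-key>",    # placeholder
        },
        params={
            "from": now - 3600,
            "to": now,
            "query": "avg:kafka.consumer_lag{*}",      # placeholder metric query
        },
    )
    resp.raise_for_status()
    series = resp.json().get("series", [])

    # Write each datapoint into a Snowflake table (placeholder schema)
    conn = snowflake.connector.connect(
        user="<user>", password="<password>", account="<account>"
    )
    try:
        cur = conn.cursor()
        for s in series:
            for ts_ms, value in s["pointlist"]:
                cur.execute(
                    "INSERT INTO KAFKA_LAG (TS, METRIC, VALUE) VALUES (%s, %s, %s)",
                    (int(ts_ms / 1000), s["metric"], value),
                )
    finally:
        conn.close()


with DAG(
    "datadog_to_snowflake",
    schedule_interval="@hourly",
    start_date=datetime(2021, 1, 1),
    catchup=False,
) as dag:
    PythonOperator(task_id="pull_metrics", python_callable=datadog_to_snowflake)
```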
I have a Python script that pulls some data from an Azure Data Lake cluster, performs some simple compute, then stores it in a SQL Server DB on Azure. The whole shebang runs in about 20 seconds. It needs sqlalchemy, pandas, and some Azure data libraries. I need to run this script daily. We also have a Service Fabric cluster available to use.
What are my best options? I thought of containerizing it with Docker and making it into an HTTP-triggered API, but then how do I trigger it once per day? I'm not good with Azure or microservices design, so this is where I need the help.
You can use WebJobs in App Service. There are two types of Azure WebJobs to choose from: Continuous and Triggered. From your description, you need the Triggered type.
You could refer to the documentation here for more details. In addition, here is how to run tasks in WebJobs.
Also, you can use a timer-based Azure Function in Python, which became generally available in recent months.
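For the timer-based function route, a minimal sketch: the 6:00 UTC daily schedule is arbitrary, and run_pipeline is a hypothetical helper standing in for your existing script.

```python
# function.json, next to this __init__.py (classic Python programming model):
# {
#   "bindings": [
#     {
#       "name": "mytimer",
#       "type": "timerTrigger",
#       "direction": "in",
#       "schedule": "0 0 6 * * *"
#     }
#   ]
# }

import logging

import azure.functions as func


def main(mytimer: func.TimerRequest) -> None:
    if mytimer.past_due:
        logging.info("The timer is past due!")

    # Your existing logic goes here: read from Azure Data Lake with the
    # Azure data libraries, transform with pandas, write to SQL Server
    # via sqlalchemy.
    run_pipeline()  # hypothetical helper wrapping your current script


def run_pipeline() -> None:
    ...
```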
Noob and beginner here. Just trying to learn the basics of GCP.
I have a series of Google Cloud Storage buckets containing text files. I also have a VM instance that I've set up via GCP.
Now, I'm trying to write some code to extract the data from the buckets and run the script from GCP's command prompt.
How can I extract data from GCS buckets in Python?
I think you can use the Listing Objects and Downloading Objects GCS methods with Python; this way, you will be able to get a list of the objects stored in your Cloud Storage buckets and then download them onto your VM instance.
Additionally, keep in mind that the service account you use to perform these tasks must have the required roles assigned in order to access your GCS buckets. You also need to provide the credentials to your application, either by using environment variables or by explicitly pointing to your service account file in code.
Once your code is ready, you can simply execute your Python program with the python command. You can take a look at this link for instructions on installing Python in your new environment.
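A short sketch using the google-cloud-storage client library; the bucket name and file paths are placeholders, and credentials are assumed to come from the GOOGLE_APPLICATION_CREDENTIALS environment variable or the VM's attached service account.

```python
from google.cloud import storage

client = storage.Client()

# List the objects in the bucket (the "Listing Objects" method)
for blob in client.list_blobs("your-bucket-name"):
    print(blob.name)

# Download one object onto the VM (the "Downloading Objects" method)
bucket = client.bucket("your-bucket-name")
blob = bucket.blob("path/to/file.txt")
blob.download_to_filename("/tmp/file.txt")
```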
I'm looking for an end-to-end example of launching an AWS EMR cluster with a pyspark step and have it automatically terminate when the step is done or fails.
I've seen pieces of this explained but not one complete example.
First of all, you should go through the AWS documentation for EMR, which provides details of all the available APIs:
https://docs.aws.amazon.com/emr/latest/APIReference/API_Operations.html
There are two options you can use to access AWS services:
1) boto3 : http://boto3.readthedocs.io/en/latest/index.html
boto3 provides you with a set of functions to control different AWS services.
2) aws-cli : https://github.com/aws/aws-cli
This provides a command-line client to access AWS APIs for different services.
You can use either of the above for your task; both have good documentation.
As far as EMR is concerned, you can refer to the following documents:
http://boto3.readthedocs.io/en/latest/reference/services/emr.html
https://github.com/aws/aws-cli/tree/develop/awscli/examples/emr
Try out some of these APIs and feel free to ask for help if you get stuck somewhere.
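To tie it together for your case, here is a sketch using boto3's run_job_flow: KeepJobFlowAliveWhenNoSteps=False makes the cluster shut down once its steps finish, and ActionOnFailure='TERMINATE_CLUSTER' tears it down if the step fails. The release label, instance types, S3 paths, and IAM role names below are placeholders to adjust for your account.

```python
import boto3

emr = boto3.client("emr", region_name="us-east-1")  # placeholder region

response = emr.run_job_flow(
    Name="pyspark-one-shot",
    ReleaseLabel="emr-5.30.0",           # placeholder EMR release
    Applications=[{"Name": "Spark"}],
    Instances={
        "MasterInstanceType": "m5.xlarge",
        "SlaveInstanceType": "m5.xlarge",
        "InstanceCount": 3,
        # Terminate the cluster once all steps have completed
        "KeepJobFlowAliveWhenNoSteps": False,
    },
    Steps=[
        {
            "Name": "pyspark step",
            # Terminate the cluster if this step fails
            "ActionOnFailure": "TERMINATE_CLUSTER",
            "HadoopJarStep": {
                "Jar": "command-runner.jar",
                "Args": [
                    "spark-submit",
                    "--deploy-mode", "cluster",
                    "s3://your-bucket/your_script.py",  # placeholder script
                ],
            },
        }
    ],
    JobFlowRole="EMR_EC2_DefaultRole",   # default instance profile
    ServiceRole="EMR_DefaultRole",       # default service role
    LogUri="s3://your-bucket/emr-logs/", # placeholder log location
)

print("Started cluster:", response["JobFlowId"])
```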
I have a highly threaded application running on Amazon EC2. I would like to convert this application to a cluster on EC2, and I would like to use StarCluster for this since it makes the cluster easy to manage.
However, I am new to cluster/distributed computing. After googling, I found the following list of Python libraries for cluster computing:
http://wiki.python.org/moin/ParallelProcessing (look at the cluster computing section)
I would like to know if all of these libraries will work with StarCluster. Is there anything I need to keep in mind, like a dependency, when choosing a library, given that I want the application to work with StarCluster?
Basically, StarCluster is a tool to help you manage your cluster. It can add/remove nodes, place them within a placement group and security group, register them with Open Grid Scheduler, and more. You can also easily create commands and plugins to help you in your work.
How were you intending to use StarCluster?
If it's as a watcher to load-balance your cluster, then there shouldn't be any problems.
If it's as an actor (making it directly do the computation by launching it with a command you would craft yourself and parallelizing its execution across the cluster), then I don't know. It might be possible, but StarCluster was not designed for that. We can read on the website:
StarCluster has been designed to simplify the process of building, configuring, and managing clusters of virtual machines on Amazon’s EC2 cloud.