How can I bind my existing programs to Pulumi? - python

I'm starting to work with Pulumi for the IaC design of my project, but I'm having a hard time understanding how to bind my existing code to Pulumi.
For example, suppose I've made a Lambda function in Python with the following content:
# test_lambda.py
import boto3
import json

sqs_client = boto3.client('sqs')
ssm_client = boto3.client('ssm')

def get_auth_token():
    response = ssm_client.get_parameters(
        Names=[
            'lambda_auth_token',
        ],
        WithDecryption=False
    )
    return response["Parameters"][0]["Value"]

def handler(event, _):
    body = json.loads(event['body'])
    if body['auth_token'] == get_auth_token():
        sqs_client.send_message(
            QueueUrl='my-queue',
            MessageBody='validated auth code',
            MessageDeduplicationId='akjseh3278y7iuad'
        )
        return {'statusCode': 200}
    else:
        return {'statusCode': 403}
How do I reference this whole file containing the Lambda function in a Pulumi project, so that I could then use this Lambda integrated with the SNS service?
Also, since I'm using Pulumi for my architecture, boto3 seems unneeded; could I just replace it with the Pulumi aws library? Would the Python interpreter then use Pulumi as a common interface library for my AWS resources (like boto3)?
This last question might seem odd, but so far I've only seen Pulumi used as the stack and architecture "builder" when running pulumi up.

You could try importing the existing Lambda into your Pulumi stack, like below (note that import is a reserved word in Python, so the option is spelled import_ and takes the existing function's name as the resource ID):
aws.lambda_.Function("sqs_lambda", opts=pulumi.ResourceOptions(
    import_="lambda_id",  # the existing Lambda function's name
))
After a pulumi up, the Lambda is managed by Pulumi from then on. You should also be cautious with
pulumi destroy
as that would delete the imported resources as well.
You can read more in the Pulumi documentation on importing resources.
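To address the other part of the question: Pulumi only provisions the infrastructure, so boto3 stays in the Lambda's runtime code; Pulumi is not a drop-in replacement for boto3. As for referencing the existing file, you can package it as the function's code. Below is a minimal sketch; the IAM role ARN and source folder path are placeholders, not values from the question:
import pulumi
import pulumi_aws as aws

# Hypothetical IAM role for the Lambda; create or look it up elsewhere in the program.
role_arn = "arn:aws:iam::123456789012:role/my-lambda-role"

fn = aws.lambda_.Function(
    "sqs_lambda",
    runtime="python3.9",
    handler="test_lambda.handler",  # module.function inside the uploaded archive
    role=role_arn,
    code=pulumi.AssetArchive({
        # Folder containing test_lambda.py (placeholder path).
        ".": pulumi.FileArchive("./lambda_src"),
    }),
)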

Related

How to connect Kafka IO from Apache Beam to a cluster in Confluent Cloud

I've made a simple pipeline in Python to read from Kafka; the thing is that the Kafka cluster is on Confluent Cloud and I am having some trouble connecting to it.
I'm getting the following log on the Dataflow job:
Caused by: org.apache.kafka.common.KafkaException: Failed to construct kafka consumer
at org.apache.kafka.clients.consumer.KafkaConsumer.<init>(KafkaConsumer.java:820)
at org.apache.kafka.clients.consumer.KafkaConsumer.<init>(KafkaConsumer.java:631)
at org.apache.kafka.clients.consumer.KafkaConsumer.<init>(KafkaConsumer.java:612)
at org.apache.beam.sdk.io.kafka.KafkaIO$Read$GenerateKafkaSourceDescriptor.processElement(KafkaIO.java:1495)
Caused by: java.lang.IllegalArgumentException: Could not find a 'KafkaClient' entry in the JAAS configuration. System property 'java.security.auth.login.config' is not set
So I think I'm missing something while passing the config, since the error mentions something related to it. I'm really new to all of this and I know nothing about Java, so I don't know how to proceed even after reading the JAAS documentation.
The code of the pipeline is the following:
import apache_beam as beam
from apache_beam.io.kafka import ReadFromKafka
from apache_beam.options.pipeline_options import PipelineOptions
import os
import json
import logging

os.environ['GOOGLE_APPLICATION_CREDENTIALS'] = 'credentialsOld.json'

with open('cluster.configuration.json') as cluster:
    data = json.load(cluster)

def logger(element):
    logging.info('Something was found')

def main():
    config = {
        "bootstrap.servers": data["bootstrap.servers"],
        "security.protocol": data["security.protocol"],
        "sasl.mechanisms": data["sasl.mechanisms"],
        "sasl.username": data["sasl.username"],
        "sasl.password": data["sasl.password"],
        "session.timeout.ms": data["session.timeout.ms"],
        "auto.offset.reset": "earliest"
    }
    print('======================================================')
    beam_options = PipelineOptions(
        runner='DataflowRunner',
        project='project',
        experiments=['use_runner_v2'],
        streaming=True,
        save_main_session=True,
        job_name='kafka-stream-test'
    )
    with beam.Pipeline(options=beam_options) as p:
        msgs = p | 'ReadKafka' >> ReadFromKafka(
            consumer_config=config,
            topics=['users'],
            expansion_service="localhost:8088"
        )
        msgs | beam.FlatMap(logger)

if __name__ == '__main__':
    main()
I read something about passing a property java.security.auth.login.config in the config dictionary, but since that example is in Java and I'm using Python, I'm really lost as to what I have to pass, or even whether that's the property I have to pass.
By the way, I'm getting the API key and secret from here, and this is what I am passing to sasl.username and sasl.password.
I faced the same error the first time I tried Beam's expansion service. The key sasl.mechanisms that you are supplying is incorrect; try sasl.mechanism instead. You also do not need to supply the username and password, since your connection is authenticated via JAAS. Basically, a consumer_config like the one below worked for me:
config = {
    "bootstrap.servers": data["bootstrap.servers"],
    "security.protocol": data["security.protocol"],
    "sasl.mechanism": data["sasl.mechanisms"],
    "session.timeout.ms": data["session.timeout.ms"],
    "group.id": "tto",
    "sasl.jaas.config": f'org.apache.kafka.common.security.plain.PlainLoginModule required serviceName="Kafka" username="{data["sasl.username"]}" password="{data["sasl.password"]}";',
    "auto.offset.reset": "earliest"
}
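For reference, this is roughly how the corrected config slots into the pipeline from the question (the topic name and expansion service address are simply the question's values, not requirements):
with beam.Pipeline(options=beam_options) as p:
    msgs = p | 'ReadKafka' >> ReadFromKafka(
        consumer_config=config,              # the config with sasl.jaas.config set as above
        topics=['users'],
        expansion_service="localhost:8088",  # omit to let Beam launch its default expansion service (needs Java)
    )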
I got a partial answer to this question, since I fixed this problem but ran into another one:
config = {
    "bootstrap.servers": data["bootstrap.servers"],
    "security.protocol": data["security.protocol"],
    "sasl.mechanisms": data["sasl.mechanisms"],
    "sasl.username": data["sasl.username"],
    "sasl.password": data["sasl.password"],
    "session.timeout.ms": data["session.timeout.ms"],
    "group.id": "tto",
    "sasl.jaas.config": f'org.apache.kafka.common.security.plain.PlainLoginModule required serviceName="Kafka" username="{data["sasl.username"]}" password="{data["sasl.password"]}";',
    "auto.offset.reset": "earliest"
}
I needed to provide the sasl.jaas.config property with the API key and secret of my cluster, and also the service name. However, now I'm facing a different error when running the pipeline on Dataflow:
Caused by: org.apache.kafka.common.errors.TimeoutException: Timeout expired while fetching topic metadata
This error shows up after 4-5 minutes of trying to run the job on Dataflow. I actually have no idea how to fix this, but I think it is related to my broker on Confluent rejecting the connection; it could be related to the execution zone, since the cluster is in a different zone than the job region.
UPDATE:
I tested the code on Linux/Ubuntu and, I don't know why, but the expansion service gets downloaded automatically, so you won't get the unsupported signal error. I'm still having some issues trying to authenticate to Confluent Kafka, though.

Flyte 0.16.2: Error loading Blob - How to get Types.Blob.fetch() to work in task decorated function?

I have a Flyte task function like this:
@task
def do_stuff(framework_obj):
    framework_obj.get_outputs()  # This calls Types.Blob.fetch(some_uri)
Trying to load a blob URI using flytekit.sdk.types.Types.Blob.fetch, but getting this error:
ERROR:flytekit: Exception when executing No temporary file system is present. Either call this method from within the context of a task or surround with a 'with LocalTestFileSystem():' block. Or specify a path when calling this function. Note: Cleanup is not automatic when a path is specified.
I can confirm I can load blobs using with LocalTestFileSystem() in tests, but when actually trying to run a workflow, I'm not sure why I'm getting this error, as the function that calls the blob processing is decorated with @task, so it's definitely a Flyte task. I also confirmed that the task node exists on the Flyte web console.
What path is the error referencing and how do I call this function appropriately?
Using Flyte Version 0.16.2
Could you please give a bit more information about the code? Is this flytekit version 0.15.x? I'm a bit confused, since that version shouldn't have the @task decorator; it should only have @python_task, which is an older API. If you want to use the new Python native typing API, you should install flytekit==0.17.0 instead.
Also, could you point to the documentation you're looking at? We've updated the docs a fair amount recently, so maybe there's some confusion around that. These are the examples worth looking at. There are also two new Python classes, FlyteFile and FlyteDirectory, that have replaced the Blob class in flytekit (though Blob remains the name of the IDL type).
(I would've left this as a comment, but I don't have the reputation yet.)
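For illustration, here is a minimal sketch of the FlyteFile API (assuming flytekit >= 0.17; the task names and file contents below are made up for the example):
from flytekit import task
from flytekit.types.file import FlyteFile

@task
def write_report() -> FlyteFile:
    # Write a local file; flytekit uploads it to blob storage when the task returns.
    path = "/tmp/report.txt"
    with open(path, "w") as fh:
        fh.write("some output")
    return FlyteFile(path)

@task
def read_report(report: FlyteFile) -> str:
    # download() pulls the blob to a local path inside the task's execution context.
    local_path = report.download()
    with open(local_path, "r") as fh:
        return fh.read()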
Some code to help with fetching outputs and reading from a file output:
# Imports assumed for a recent flytekit; module paths may differ between versions.
from flytekit import task
from flytekit.clients.friendly import SynchronousFlyteClient
from flytekit.core.context_manager import FlyteContext
from flytekit.core.type_engine import TypeEngine
from flytekit.models.core.identifier import WorkflowExecutionIdentifier
from flytekit.types.file import FlyteFile

@task
def task_file_reader():
    client = SynchronousFlyteClient("flyteadmin.flyte.svc.cluster.local:81", insecure=True)
    exec_id = WorkflowExecutionIdentifier(
        domain="development",
        project="flytesnacks",
        name="iaok0qy6k1",
    )
    data = client.get_execution_data(exec_id)
    lit = data.full_outputs.literals["o0"]
    ctx = FlyteContext.current_context()
    ff = TypeEngine.to_python_value(ctx, lv=lit, expected_python_type=FlyteFile)
    with open(ff, 'rb') as fh:
        print(fh.readlines())

Get all the namespaces and ingressroutes with the Python k8s API client library

I have the command line below. It gives me the namespaces and ingressroute names (see the example below):
kubectl --context mlf-eks-dev get --all-namespaces ingressroutes
Example
NAMESPACE          NAME                         AGE
aberdeentotalyes   aberdeentotalgeyes-yesgepi   98d
I"m trying to replicate the kubetl command line above through my python code, using the python client for kubernetes but I miss on the option, on how to get the ingressroutes.
# My Python code
from kubernetes import client, config

# Configs can be set in Configuration class directly or using helper utility
config.load_kube_config()
v1 = client.CoreV1Api()

print("Listing pods with their IPs:")
ret = v1.list_pod_for_all_namespaces(watch=False)
for i in ret.items:
    print("%s\t%s" % (i.metadata.namespace, i.metadata.name))
The results are below
wwwpaulolo ci-job-backup-2kt6f
wwwpaulolo wwwpaulolo-maowordpress-7dddccf44b-ftpbq
wwwvhugo ci-job-backup-dz5hs
wwwvhugo wwwvhua-maowordpress-5f7bdfdccd-v7kcx
I have the namespaces but not the ingressroutes, as I don't know how to get them.
The question is: what is the option to get the ingressroutes, please?
Any help is more than welcome.
Cheers
After a careful reading of the documentation, I found no existing API class that can be used for ingressroutes; they are all still in beta, as IngressRoute is not a native Kubernetes component.
I also asked on the Kubernetes Slack, in the Kubernetes client channel.
Alan C on the Kubernetes Slack gave me this answer:
There's no class for ingressroutes. You can use the dynamic client to support arbitrary resource types (see link).
My solution is to use the Python subprocess library to run the kubectl command line.
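For reference, the dynamic-client approach mentioned above would look roughly like this (the group/version traefik.containo.us/v1alpha1 is an assumption about which controller provides the ingressroutes CRD; adjust to whatever your cluster uses):
from kubernetes import config, dynamic
from kubernetes.client import api_client

config.load_kube_config(context="mlf-eks-dev")
client = dynamic.DynamicClient(api_client.ApiClient())

# Look up the custom resource by apiVersion and kind, then list it cluster-wide.
ingressroutes = client.resources.get(
    api_version="traefik.containo.us/v1alpha1", kind="IngressRoute"
)
for item in ingressroutes.get().items:  # no namespace argument = all namespaces
    print(item.metadata.namespace, item.metadata.name)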
Did you try CustomObjectsApi?
from kubernetes import client, config

config.load_kube_config()
api = client.CustomObjectsApi()
dir(api)  # check for list_cluster_custom_object

# The example below lists all Issuers in the cluster.
# Issuer is a custom resource of cert-manager.io.
api.list_cluster_custom_object(group="cert-manager.io", version="v1", plural="issuers")

Datastore delay on creating entities with put()

I am developing an application using the Cloud Datastore Emulator (2.1.0) and the google-cloud-ndb Python library (1.6).
I find that there is an intermittent delay before entities become retrievable via a query.
For example, if I create an entity like this:
my_entity = MyEntity(foo='bar')
my_entity.put()
get_my_entity = MyEntity.query().filter(MyEntity.foo == 'bar').get()
print(get_my_entity.foo)
it will fail intermittently because the get() method returns None.
This only happens on about 1 in 10 calls.
To demonstrate, I've created this script (also available with a ready-to-run docker-compose setup on GitHub):
import random
from google.cloud import ndb
from google.auth.credentials import AnonymousCredentials

client = ndb.Client(
    credentials=AnonymousCredentials(),
    project='local-dev',
)

class SampleModel(ndb.Model):
    """Sample model."""

    some_val = ndb.StringProperty()

for x in range(1, 1000):
    print(f'Attempt {x}')
    with client.context():
        random_text = str(random.randint(0, 9999999999))
        new_model = SampleModel(some_val=random_text)
        new_model.put()
        retrieved_model = SampleModel.query().filter(
            SampleModel.some_val == random_text
        ).get()
        print(f'Model Text: {retrieved_model.some_val}')
What would be the correct way to avoid this intermittent failure? Is there a way to ensure the entity is always available after the put() call?
Update
I can confirm that this is only an issue with the Datastore emulator. When testing on App Engine with Firestore in Datastore mode, entities are available immediately after calling put().
The issue turned out to be related to the emulator trying to replicate eventual consistency.
Unlike relational databases, Datastore does not guarantee that data will be available immediately after it is written, because there are often replication and indexing delays.
For things like unit tests, this can be resolved by passing --consistency=1.0 to the datastore start command, as documented here.
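For example, starting the emulator with the flag (per the emulator docs; a value of 1.0 makes every write immediately visible to queries):
gcloud beta emulators datastore start --consistency=1.0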

Aerospike Python Client. UDF module to count records. Cannot register module

I am currently implementing the Aerospike Python Client in order to benchmark it against our Redis implementation, to see which is faster and/or more stable.
I'm still on baby steps, currently unit-testing basic functionality, for example whether I correctly add records to my set. For that reason, I want to create a function to count them.
I saw in Aerospike's documentation that:
"to perform an aggregation on query, you first need to register a UDF
with the database".
It seems that this is the suggested way that aggregations, counting and other custom functionality should be run in Aerospike.
Therefore, to count the records in a set I have, I created the following module:
# "counter.lua"
function count(s)
return s : map(function() return 1 end) : reduce (function(a,b) return a+b end)
end
I'm trying to use the Aerospike Python client's function for registering a UDF (User Defined Function) module:
udf_put(filename, udf_type, policy)
My code is as follows:
# aerospike_client.py
# "udf_put" parameters
policy = {'timeout': 1000}
lua_module = os.path.join(os.path.dirname(os.path.realpath(__file__)), "counter.lua")  # same folder
udf_type = aerospike.UDF_TYPE_LUA  # equals 0, which stands for Lua

self.client.udf_put(lua_module, udf_type, policy)  # Exception is thrown here

query = self.client.query(self.aero_namespace, self.aero_set)
query.select()
result = query.apply('counter', 'count')
An exception is thrown:
exceptions.Exception: (-2L, 'Filename should be a string', 'src/main/client/udf.c', 82)
Is there anything I'm missing or doing wrong?
Is there a way to "debug" it without compiling the C code?
Is there any other suggested way to count the records in my set, or am I fine with the Lua module?
First, I'm not seeing that exception, but I am seeing a bug with udf_put where the module is registered but the Python process hangs. I can see the module appear on the server using AQL's show modules.
I opened a bug on the Python client's GitHub repo, aerospike/aerospike-client-python.
There's a best practices document regarding UDF development here: https://www.aerospike.com/docs/udf/best_practices.html
In general, using a stream UDF to aggregate the records through the count function is the correct way to go about it. There are examples here and here.
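Putting it together, the end-to-end flow with the Python client would look roughly like this (a sketch only; the namespace and set names are placeholders, and results() collects the aggregation output):
import os
import aerospike

client = aerospike.client({'hosts': [('127.0.0.1', 3000)]}).connect()

# Register the stream UDF module (the filename must be a plain string).
lua_module = os.path.join(os.path.dirname(os.path.realpath(__file__)), "counter.lua")
client.udf_put(lua_module, aerospike.UDF_TYPE_LUA, {'timeout': 1000})

# Run the aggregation: apply the 'count' function from the registered 'counter' module.
query = client.query('test', 'demo')
query.apply('counter', 'count')
print(query.results())  # e.g. [42]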
