I have a Deployment Manager script as follows:
cluster.py creates a Kubernetes cluster, and when the script was run for the cluster creation alone it succeeded -- so cluster.py itself has no issues creating a Kubernetes cluster.
cluster.py also exposes outputs:
A small snippet of the cluster.py is as follows:
outputs.append({
    'name': 'v1endpoint',
    'value': type_name + type_suffix
})
return {'resources': resources, 'outputs': outputs}
If I try to access the exposed output inside the dmnginxservice resource below as $(ref.dmcluster.v1endpoint), I get a "resource not found" error.
imports:
- path: cluster.py
- path: nodeport.py

resources:
- name: dmcluster
  type: cluster.py
  properties:
    zone: us-central1-a
- name: dmnginxservice
  type: nodeport.py
  properties:
    cluster: $(ref.dmcluster.v1endpoint)
    image: gcr.io/pr1/nginx:latest
    port: 342
    nodeport: 32123
ERROR: (gcloud.deployment-manager.deployments.create) Error in Operation [operation-1519960432614-566655da89a70-a2f917ad-69eab05a]: errors:
- code: CONDITION_NOT_MET
message: Referenced resource yaml%dmcluster could not be found. At resource
gke-cluster-dmnginxservice.
I tried to reproduce a similar implementation and was able to deploy it with no issues, making use of your very same syntax for the output.
I deployed two VMs and a new network. I will post my code; maybe you will find some useful hints concerning the outputs.
The first VM passes the name for the second VM as an output and uses a reference to the network.
The second VM takes its name from the property that was populated by the output of the first VM.
The network, thanks to the references, is the first resource to be created.
Keep in mind that:
This can get tricky because the order of creation for resources is important; you cannot add virtual machine instances to a network that does not exist, or attach non-existent persistent disks. Furthermore, by default, Deployment Manager creates all resources in parallel, so there is no guarantee that dependent resources are created in the correct order.
I will skip the parts that are the same. If you provide your code I could try to help you debug it, but from the error code it seems that Deployment Manager is not aware that the first resource has been created; from the information provided it is not clear why.
Moreover, if I were you I would give explicit dependencies a shot: declare that dmnginxservice depends on dmcluster by making use of the resource metadata. This way you can double-check whether it is actually waiting for the first resource.
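A minimal sketch of what that could look like in your config, reusing the resource names from your snippet (metadata.dependsOn is Deployment Manager's mechanism for declaring an explicit dependency):

- name: dmnginxservice
  type: nodeport.py
  metadata:
    dependsOn:
    - dmcluster
  properties:
    cluster: $(ref.dmcluster.v1endpoint)
    image: gcr.io/pr1/nginx:latest
    port: 342
    nodeport: 32123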
UPDATE
I have been able to reproduce the bug with a simpler configuration. Depending on how I reference the variables the behaviour differs, and for some reason the property gets expanded to $(ref.yaml%vm-1.paolo); it seems that the combination of project and cluster references causes trouble.
# 'name': context.properties["debug"],                           # WORKING
# 'name': context.env["project"],                                # WORKING
'name': context.properties["debug"] + context.env["project"],    # NOT WORKING
You can check the configuration here, if you need it.
Related
I'm building an application that can run user-submitted python code. I'm considering the following approaches:
Spinning up a new AWS lambda function for each user's request to run the submitted code in it. Delete the lambda function afterwards. I'm aware of AWS lambda's time limit - so this would be used to run only small functions.
Spinning up a new EC2 machine to run a user's code. One instance per user. Keep the instance running while the user is still interacting with my application. Kill the instance after the user is done.
Same as the 2nd approach but also spin up a docker container inside the EC2 instance to add an additional layer of isolation (is this necessary?)
Are there any security vulnerabilities I need to be aware of? Will the user be able to do anything if they gain access to environment variables in their own lambda function/ec2 machine? Are there any better solutions?
Any code you run on AWS Lambda has whatever capabilities (IAM permissions) the associated function has, so be very careful what you grant it.
Even logging and metrics access can be manipulated to incur additional costs.
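To make that concrete, the blast radius of user-submitted code is bounded by the function's execution role. A minimal boto3 sketch of attaching a least-privilege inline policy (not from the original answer; the role, policy and log-group names are placeholders):

import json

import boto3  # assumes AWS credentials are already configured

# Hypothetical least-privilege policy: the sandbox function may only write
# to its own CloudWatch log group and has no other AWS permissions.
minimal_policy = {
    "Version": "2012-10-17",
    "Statement": [{
        "Effect": "Allow",
        "Action": ["logs:CreateLogStream", "logs:PutLogEvents"],
        "Resource": "arn:aws:logs:*:*:log-group:/aws/lambda/untrusted-sandbox*",
    }],
}

iam = boto3.client("iam")
iam.put_role_policy(
    RoleName="untrusted-sandbox-role",        # placeholder execution role
    PolicyName="untrusted-sandbox-minimal",
    PolicyDocument=json.dumps(minimal_policy),
)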
I'm having some trouble understanding how to pass an output of one resource as an input to another resource, so that they have a dependency and the creation order works properly.
Scenario:
Resource B has a dependency on Resource A.
I was trying to pass something like this to Resource B:
opts = ResourceOptions(depends_on=[ResourceA])
But for some reason it acts as if that parameter weren't there and keeps creating Resource B before Resource A, which throws an error.
If I execute pulumi up a second time, since Resource A already exists, Resource B gets created.
I noticed that you can also pass an output as an input of another resource, and because of this Pulumi understands that there is a relationship and orders the creation automatically:
https://www.pulumi.com/docs/intro/concepts/inputs-outputs/
But I can't get my head around how to pass that, so any help regarding this would be appreciated.
I also used the following explanation of how to use ResourceOptions, which I think I'm using correctly in the code above, but still no luck:
How to control resource creation order in Pulumi
Thanks in advance.
@mrthopson,
Let me try to explain using one of the public Pulumi examples. I took it from here:
https://github.com/pulumi/examples/blob/master/aws-ts-eks/index.ts
// Create a VPC for our cluster.
const vpc = new awsx.ec2.Vpc("vpc", { numberOfAvailabilityZones: 2 });

// Create the EKS cluster itself and a deployment of the Kubernetes dashboard.
const cluster = new eks.Cluster("cluster", {
    vpcId: vpc.id,
    subnetIds: vpc.publicSubnetIds,
    instanceType: "t2.medium",
    desiredCapacity: 2,
    minSize: 1,
    maxSize: 2,
});
The example first creates a VPC in AWS. The VPC contains a number of different networks, and the identifiers of those networks are exposed as outputs. When we create the EKS cluster, we pass the IDs of the public subnets (the output vpc.publicSubnetIds) as an input to the cluster (the input subnetIds).
That is the only thing you need to do to make the EKS cluster depend on the VPC. When running Pulumi, the engine will figure out that it first needs to create the VPC, and only after that can it create the EKS cluster. The same pattern works in the Python SDK; a minimal sketch with illustrative resource names follows.
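This sketch only shows the output-as-input pattern in Python; the VPC, subnet and bucket names are made up and not from your stack:

import pulumi
import pulumi_aws as aws

# Create the VPC; its `id` is an Output that is only known after creation.
vpc = aws.ec2.Vpc("app-vpc", cidr_block="10.0.0.0/16")

# Passing the Output `vpc.id` as an input makes Pulumi create the VPC first.
subnet = aws.ec2.Subnet(
    "app-subnet",
    vpc_id=vpc.id,              # implicit dependency, no depends_on needed
    cidr_block="10.0.1.0/24",
)

# depends_on is only needed when no data flows between the two resources.
bucket = aws.s3.Bucket(
    "app-bucket",
    opts=pulumi.ResourceOptions(depends_on=[subnet]),
)

pulumi.export("subnet_id", subnet.id)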
Ringo
I am trying to read a parameter from AWS Systems Manager for ECS Fargate containers, but I am running into a problem. My code is:
secret_value = ssm.StringParameter.from_secure_string_parameter_attributes(
    self,
    "/spark/ssh_pub",
    parameter_name="/spark/ssh_pub",
    version=1
)

container_sp = fargate_task_definition_sp.add_container(
    "pod-spark-master",
    image=ecs.ContainerImage.from_registry(
        "xxxxxxxx.dkr.ecr.eu-central-1.amazonaws.com/spark-master:ready-for-test-deployment"),
    health_check=health_check_sp,
    logging=log_config_sp,
    secrets={
        "SPARK_PUB": ecs.Secret.from_ssm_parameter(secret_value)
    }
)
Then I get this error:
jsii.errors.JSIIError: There is already a Construct with name '--spark--ssh_pub' in Stack [sandbox]
Does anyone have any idea?
There are a few possibilities.
CDK Bug
CDK has too many issues. A similar error was reported in https://github.com/aws/aws-cdk/issues/8603, so it could be a CDK bug. In that case, all we can do is raise an issue on GitHub and hope they fix it, which may not happen soon given the 1000+ issues reported and open.
Duplicate construct names
More likely, there are a few CDK constructs (AWS resources) that have been given the same name. Search through your stack "sandbox" and make sure no duplicate name is created; this can happen when the same construct is instantiated more than once with the same ID.
There is already a Construct with name '--spark--ssh_pub' in Stack [sandbox].
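If that is the cause here, a minimal sketch of a fix (assuming the collision comes from reusing the parameter path as the construct ID; "SparkSshPubParam" is just an illustrative name) would be:

secret_value = ssm.StringParameter.from_secure_string_parameter_attributes(
    self,
    "SparkSshPubParam",  # unique construct ID instead of the parameter path
    parameter_name="/spark/ssh_pub",
    version=1
)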
Wrong image source
Please also make sure this is actually what you need:

image=ecs.ContainerImage.from_registry(
    "xxxxxxxx.dkr.ecr.eu-central-1.amazonaws.com...

Apparently the Docker image is in your ECR; in that case from_ecr_repository is the method to use. AWS documentation is confusing and sometimes incorrect: from_registry is not for pulling images from ECR but from Docker Hub and other public registries.
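A rough sketch of what that could look like, assuming the repository is named spark-master and reusing the other variables from your snippet (the construct IDs here are illustrative):

from aws_cdk import aws_ecr as ecr, aws_ecs as ecs

# Look up the existing ECR repository instead of treating it as an external registry.
spark_repo = ecr.Repository.from_repository_name(
    self, "SparkMasterRepo", "spark-master"
)

container_sp = fargate_task_definition_sp.add_container(
    "pod-spark-master",
    image=ecs.ContainerImage.from_ecr_repository(
        spark_repo, tag="ready-for-test-deployment"
    ),
    health_check=health_check_sp,
    logging=log_config_sp,
    secrets={"SPARK_PUB": ecs.Secret.from_ssm_parameter(secret_value)}
)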
I am trying to run a distributed application with the PyTorch distributed trainer. I thought I would first try the example they have, found here. I set up two AWS EC2 instances and configured them according to the description in the link, but when I try to run the code I get two different errors. In the first terminal window, for node0, I get the error message: RuntimeError: Address already in use
In the other three windows I get the same error message:
RuntimeError: NCCL error in: /pytorch/torch/lib/c10d/ProcessGroupNCCL.cpp:272, unhandled system error
I followed the code in the link, and terminated the instances and redid the setup, but it didn't help.
This is using Python 3.6 with the nightly build and CUDA 9.0. I tried changing MASTER_ADDR to the IP of node0 on both nodes, as well as using the same MASTER_PORT (which is an available, unused port). However, I still get the same error message.
Once this is running, my goal is to adjust this StyleGAN implementation so that I can train it across multiple GPUs on two different nodes.
So after a lot of failed attempts I found out what the problem is. Note that this solution applies to AWS Deep Learning instances.
After creating the two instances I had to adjust the security group and add two rules: the first rule should be ALL_TCP with the source set to the private IP of the leader node; the second rule should be the same (ALL_TCP), but with the source set to the private IP of the worker node.
Previously, I had the security rule set to type SSH, which only opens a single port (22). For some reason I was not able to use this port to allow the nodes to communicate. After changing these settings the code worked fine, and I was also able to run this with the settings mentioned above.
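If you prefer to script the rule changes rather than using the console, a minimal boto3 sketch (not part of the original answer; the security group ID and private IPs are placeholders) would be:

import boto3  # assumes AWS credentials and region are configured

ec2 = boto3.client("ec2")

SECURITY_GROUP_ID = "sg-0123456789abcdef0"                 # placeholder
NODE_PRIVATE_IPS = ["172.31.10.11/32", "172.31.10.12/32"]  # leader and worker

# Open all TCP ports between the two nodes so torch.distributed / NCCL can
# rendezvous on MASTER_PORT and exchange data on ephemeral ports.
ec2.authorize_security_group_ingress(
    GroupId=SECURITY_GROUP_ID,
    IpPermissions=[{
        "IpProtocol": "tcp",
        "FromPort": 0,
        "ToPort": 65535,
        "IpRanges": [{"CidrIp": cidr} for cidr in NODE_PRIVATE_IPS],
    }],
)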
I'm using Py2neo in a project. Most of the time the neo4j server runs on localhost so in order to connect to the graph I just do:
g = Graph()
But when I run tests I'd like to connect to a different graph, preferably one I can trash without any consequences.
I'd like to have a "production" graph, possibly set up in such a way that even though it also runs on localhost, the tests won't have access to it.
Can this be done?
UPDATE 0 - A better way to put this question might have been: how can I get my localhost Neo4j to serve two databases on two different ports? Once I've got that working it's trivial to use the REST client to connect to one or the other. I'm running the latest .deb version of Neo4j on an Ubuntu workstation (if that matters).
You can have multiple instances of Neo4j running on the same machine by configuring them to use different ports, i.e. 7474 for development and 7473 for tests.
Graph() defaults to http://localhost:7474/db/data/ but you can also pass a connection URI explicitly:
dev = Graph()
test = Graph("http://localhost:7473/db/data/")
prod = Graph("https://remotehost.com:6789/db/data/")
You can run the Neo4j server on a different machine and access it through the REST service.
Inside neo4j-server.properties, you can uncomment the line that sets the IP address to 0.0.0.0.
This allows that server to be accessed from anywhere. I don't know how this works with Python, but with Java I am using a REST library to access that server (the Java REST library for Neo4j). Take a look here:
https://github.com/rash805115/bookeeping/blob/master/src/main/java/database/service/impl/Neo4JRestServiceImpl.java
Update 0: There are three ways to accomplish what you want.
Method 1: Start a Neo4j instance on a separate machine, then access that instance through the REST API. To do that, go into conf/neo4j-server.properties, find this line and uncomment it:
#org.neo4j.server.webserver.address=0.0.0.0
Method 2: Start two Neo4j instances on the same machine but on different ports and use the REST service to access them. To do this, copy the Neo4j distribution into two separate folders, then change these lines in conf/neo4j-server.properties so the ports differ between the two:
First instance:
org.neo4j.server.webserver.port=7474
org.neo4j.server.webserver.https.port=7473
Second instance:
org.neo4j.server.webserver.port=8484
org.neo4j.server.webserver.https.port=8483
Method 3: From your comments it appears you want to do this, and indeed this is the easiest method: have two separate databases on the same Neo4j instance. You don't have to change any configuration files for this, just one line in your code. I have not done this in Python exactly, but I have done the same in Java. Let me give you the Java code and you can see how easy it is.
Production Code:
package rash.experiments.neo4j;

import org.neo4j.cypher.javacompat.ExecutionEngine;
import org.neo4j.cypher.javacompat.ExecutionResult;
import org.neo4j.graphdb.GraphDatabaseService;
import org.neo4j.graphdb.Transaction;
import org.neo4j.graphdb.factory.GraphDatabaseFactory;

public class Neo4JEmbedded
{
    public static void main(String args[])
    {
        GraphDatabaseService graphDatabaseService = new GraphDatabaseFactory().newEmbeddedDatabase("db/productiondata/");
        ExecutionEngine executionEngine = new ExecutionEngine(graphDatabaseService);

        try(Transaction transaction = graphDatabaseService.beginTx())
        {
            executionEngine.execute("create (node:Person {userId: 1})");
            transaction.success();
        }

        ExecutionResult executionResult = executionEngine.execute("match (node) return count(node)");
        System.out.println(executionResult.dumpToString());
    }
}
Test Code:
package rash.experiments.neo4j;

import org.neo4j.cypher.javacompat.ExecutionEngine;
import org.neo4j.cypher.javacompat.ExecutionResult;
import org.neo4j.graphdb.GraphDatabaseService;
import org.neo4j.graphdb.Transaction;
import org.neo4j.graphdb.factory.GraphDatabaseFactory;

public class Neo4JEmbedded
{
    public static void main(String args[])
    {
        GraphDatabaseService graphDatabaseService = new GraphDatabaseFactory().newEmbeddedDatabase("db/testdata/");
        ExecutionEngine executionEngine = new ExecutionEngine(graphDatabaseService);

        try(Transaction transaction = graphDatabaseService.beginTx())
        {
            executionEngine.execute("create (node:Person {userId: 1})");
            transaction.success();
        }

        ExecutionResult executionResult = executionEngine.execute("match (node) return count(node)");
        System.out.println(executionResult.dumpToString());
    }
}
Note the difference in line:
GraphDatabaseService graphDatabaseService = new GraphDatabaseFactory().newEmbeddedDatabase("db/testdata/");
This creates two separate folders, db/productiondata and db/testdata. Both folders contain separate data, and your code can use either one depending on your needs.
I am pretty sure that in your Python code you have to do almost the same thing, something like this (note that this code might not be correct):
g = Graph("/db/productiondata")
g = Graph("/db/testdata")
Unfortunately, this is a problem without a perfect solution right now. There are however a few options available which may suffice for what you need.
First, have a look at the py2neo build script: https://github.com/nigelsmall/py2neo/blob/release/2.0.5/bau
This is a bash script that spawns a new database instance for each version that needs testing, starting up with an empty store beforehand and closing down afterwards. It uses the default port 7474 but it should be an easy change to tweak this automatically in the properties file. Specifically here, you'll probably want to look at the test, neo4j_start and neo4j_stop functions.
Additionally, py2neo provides an extension called neobox:
http://py2neo.org/2.0/ext/neobox.html
This is intended to be a quick and simple way to set up new database instances running on free ports and might be helpful in this case.
Note that, generally speaking, clearing down the data store between tests is a bad idea, as this is a slow operation and can seriously impact the running time of your test suite. For that reason, a test database that lives for the whole test run is a better idea, although it requires a little thought when writing tests so that they don't overlap; a session-scoped connection as sketched below is one way to set this up.
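A minimal pytest-style sketch, assuming a dedicated test instance listening on port 7473 as described earlier in this thread (the fixture name and queries are illustrative, not part of py2neo itself):

import pytest
from py2neo import Graph

@pytest.fixture(scope="session")
def test_graph():
    # One long-lived connection to the test database for the whole suite.
    return Graph("http://localhost:7473/db/data/")

def test_create_person(test_graph):
    test_graph.cypher.execute("CREATE (p:Person {userId: 1})")
    count = test_graph.cypher.execute_one("MATCH (p:Person) RETURN count(p)")
    assert count >= 1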
Going forward, Neo4j will gain DROP functionality to help with this kind of work but it will likely be a few releases before this appears.