Weblogic domain and cluster creation with WLST - python

I want to create a cluster with 2 managed servers on 2 different physical machines.
I have following tasks to be performed (please correct me if I miss something)
1. Domain creation.
2. Set admin server properties and create the AdminServer under SSL.
3. Create logical machines for the physical ones.
4. Create the managed servers.
5. Create a cluster with the managed servers.
I have the following questions.
Which of the above mentioned tasks can be done offline, if any?
Which of the above mentioned tasks must also be performed on the 2nd physical machine?

I eventually found the answer. I am posting it here for reference.
All 5 of the tasks mentioned above can be performed with an offline WLST script. All of them have to be performed on the node where the AdminServer is supposed to live.
Now, for updating the domain information on the second node, there is an nmEnroll command in WLST which has to be performed online.
So, to summarize,
1. Execute an offline WLST script to perform all 5 tasks mentioned in the question (a rough sketch of such a script appears after this list). This has to be done on the node (physical computer) where we want our AdminServer to run.
2. Start the node manager on all the nodes to be used in the cluster.
3. Start the AdminServer on the node where we executed the domain creation script.
4. On all the other nodes, execute a script which looks like the following:
connect('user','password','t3://adminhost:adminport')
nmEnroll('path_to_the_domain_dir')
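For reference, here is a rough sketch of what such an offline WLST script could look like. The template path, ports, machine/server names, and password are placeholders for illustration, not taken from the question; adjust them to your installation.

# offline_create_domain.py -- run with: $WL_HOME/common/bin/wlst.sh offline_create_domain.py
# Placeholder template path; point this at the wls.jar of your installation.
readTemplate('/u01/oracle/wlserver/common/templates/wls/wls.jar')

# Tasks 1 and 2: admin server properties and SSL
cd('/Servers/AdminServer')
set('ListenPort', 7001)
create('AdminServer', 'SSL')
cd('SSL/AdminServer')
set('Enabled', 'true')
set('ListenPort', 7002)

# Password for the default weblogic user (placeholder value)
cd('/Security/base_domain/User/weblogic')
cmo.setPassword('welcome1')

# Task 3: logical machines for the two physical hosts
cd('/')
create('machine1', 'UnixMachine')
create('machine2', 'UnixMachine')

# Task 4: managed servers
create('ms1', 'Server')
cd('/Servers/ms1')
set('ListenPort', 8001)
cd('/')
create('ms2', 'Server')
cd('/Servers/ms2')
set('ListenPort', 8001)

# Task 5: cluster, plus server-to-machine assignments
cd('/')
create('cluster1', 'Cluster')
assign('Server', 'ms1,ms2', 'Cluster', 'cluster1')
assign('Server', 'ms1', 'Machine', 'machine1')
assign('Server', 'ms2', 'Machine', 'machine2')

setOption('OverwriteDomain', 'true')
writeDomain('/home/oracle/config/domains/my_domain')
closeTemplate()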

There are two steps missing after step 1: you need to copy the configuration from the machine where the AdminServer is running to the other machines in the cluster, using the pack and unpack commands included in the WebLogic installation:
1.1 On the machine where the AdminServer is running, run ./pack.sh -domain=/home/oracle/config/domains/my_domain -template=/home/oracle/my_domain.jar -template_name=remote_managed -managed=true
1.2 Go to the other machines, copy over the jar file produced in the previous step, and run ./unpack.sh -domain=/home/oracle/config/domains/my_domain -template=/home/oracle/my_domain.jar
Now you have copied all the files you need to start the NodeManager and the managed servers on the other machines.

Related

Tensorflow Distributed Learning isn't working when using MultiWorkerMirroredStrategy in actual cluster

I am just trying to follow example of MultiWorkerMirroredStrategy in tensorflow doc.
I succeeded in training on localhost, which has a single node.
However, training failed on the cluster, which has two nodes.
I have tried disabling the firewall, but it didn't solve the problem.
Here is the main.py. (I run the same code on node 1 and node 2, except for the tf_config variable: I set node 1's tf_config with tf_config['task']['index']=0, and node 2's with tf_config['task']['index']=1.)
main.py
Any help appreciated. Thanks.
I see that you don't have an error code, but I think I can infer where the issue could be arising, since your code should work. I will test on my Kubernetes cluster once I get a chance (I have a node down at the moment).
The most likely issue: you are using json.dumps() to set the environment variable. In many settings you should instead be using:
tf_config = json.loads(os.environ.get('TF_CONFIG') or '{}')
TASK_INDEX = tf_config['task']['index']
That should clear up any issues with exposed ports and IP configuration.
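To make that concrete, here is a minimal sketch of what a worker's main.py could look like when TF_CONFIG comes from the environment the cluster sets, rather than from a dict hard-coded per node. The model and dataset are placeholders, not taken from the original main.py:

import json
import os

import tensorflow as tf

# Read the cluster spec and this worker's index from the environment
# (set by the cluster, e.g. via a Kubeflow TFJob), not from a hard-coded dict.
tf_config = json.loads(os.environ.get('TF_CONFIG') or '{}')
print('TF_CONFIG:', tf_config)
print('task index:', tf_config.get('task', {}).get('index'))

strategy = tf.distribute.MultiWorkerMirroredStrategy()

with strategy.scope():
    # Placeholder model; substitute the model from your own main.py.
    model = tf.keras.Sequential([tf.keras.layers.Dense(1, input_shape=(10,))])
    model.compile(optimizer='adam', loss='mse')

# Placeholder dataset; the strategy shards it across the workers.
dataset = tf.data.Dataset.from_tensor_slices(
    (tf.random.uniform((64, 10)), tf.random.uniform((64, 1)))).batch(8)

model.fit(dataset, epochs=2)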
-It sounds like the method you are using is in a notebook, since you are not running the same code for main.py on each node: in one main.py you set the index to 1 and in the other to 0. Either way, that is not what you want here. You are setting the index to 1 and 0, but you are not getting back only the index; you are getting back the full cluster spec with the index you set it to. If the environment variable is not set by your cluster, you will need to get back the TF_CONFIG that was set, and then use json.loads to set that as your tf_config; then you will get ONLY the replica index for that node.
If you are using a notebook, it needs to be connected to the cluster environment; otherwise you are setting a local environment variable on your machine, not on the containers in the cluster. Consider using Kubeflow to manage this.
You can either launch from the notebook after setting up your cluster configuration op, or build a TFJob spec as YAML that defines the node specs, then launch the pods using that spec.
Either way, the cluster needs to actually have that configuration. You should be able to load the environment in the cluster such that each node is ASSIGNED an index, and you then read that index from that node's replica ID, which you set when you launched the nodes and specified with a YAML or JSON dictionary. A locally set environment variable inside the local container means nothing to the actual cluster if the replica-index on Kubernetes does not match the environment variable in the container; that index is assigned when the pod is launched.
-Try making a function that returns each worker's index, to test whether it matches the replica-index shown on your Kubernetes dashboard or by kubectl. Make sure to have the function print it out so you can see it in the pod logs. This will help with debugging.
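A hypothetical helper along those lines (the names here are mine, not from the original post); run it at the top of main.py and compare its output in the pod logs with the replica index Kubernetes reports:

import json
import os
import socket

def report_worker_identity():
    # Print what this container believes its worker index is, so it can be
    # compared against the replica index shown by kubectl or the dashboard.
    tf_config = json.loads(os.environ.get('TF_CONFIG') or '{}')
    task = tf_config.get('task', {})
    print('hostname:', socket.gethostname())
    print('task type:', task.get('type'), '| task index:', task.get('index'))

report_worker_identity()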
-Look at the pod logs and see whether the pods are connecting to the server and are using a communication spec compatible with your cluster: gRPC, etc. You are not setting a communication strategy, but it should be able to find one for you automatically in most cases (just check in case).
-If you are able to launch pods, make sure you are terminating them before trying again. Again, Kubeflow is going to make things much easier for you once you get the hang of its Python pipeline SDK. You can launch functions as containers, and you can build an op that cleans up by terminating old pods.
-You should consider having your main.py and any other supporting modules loaded into an image in a repository, such as Docker Hub, so that the containers can load the image. With the multi-worker strategy, each machine needs to have the same data for it to be sharded properly. Again, check your pod logs to see whether it cannot shard the data.
-Are you running on a local machine with different GPUs? If so, you should be using MirroredStrategy, NOT MultiWorkerMirroredStrategy.

Python compiler call another python compiler to execute a script (execute a script from one independent machine to another)

I know the question title is weird!
I have two virtual machines. The first one has limited resources, while the second one has enough resources, just like a normal machine. The first machine will receive a signal from an external device. This signal will trigger a Python compiler to execute a script. The script is big and the first machine does not have enough resources to execute it.
I can copy the script to the second machine and run it there, but I can't make the second machine receive the external signal. I am wondering if there is a way to make the compiler on the first machine (once the external signal is received) call the compiler on the second machine, so that the compiler on the second machine executes the script using the second machine's resources. Check the attached image please.
Assume that the connection between the two machines is established and they can see each other, and that the second machine has a copy of the script. I just need the commands that pass (the execution) to the second machine and make it use its own resources.
You should look into a microservice architecture to do this.
You can achieve this either by using Flask and sending server requests between the machines, or with something like nameko, which will allow you to create a "bridge" between machines and call functions between them (this seems like what you are more interested in). Example for nameko:
Machine 2 (executor of resource-intensive script):
from nameko.rpc import rpc

class Stuff(object):
    name = "stuff"  # service name the RPC proxy will use

    @rpc
    def example(self):
        return "Function running on Machine 2."
You would run the above service with Nameko (e.g. via the nameko run command), as detailed in the docs.
Machine 1:
from nameko.standalone.rpc import ClusterRpcProxy

# This is the AMQP broker that machine 2 would be using.
config = {
    'AMQP_URI': AMQP_URI  # e.g. "pyamqp://guest:guest@localhost"
}

with ClusterRpcProxy(config) as cluster_rpc:
    cluster_rpc.stuff.example()  # "Function running on Machine 2."
More info here.
Hmm, there are many approaches to this problem.
If you want a Python-only solution, you can check out dispy (http://dispy.sourceforge.net/) or Dask (https://dask.org/).
If you want a robust solution (what I use on my home computing cluster, but IMO overkill for your problem), you can use SLURM. SLURM is basically a way to string multiple computers together into a "supercomputer": https://slurm.schedmd.com/documentation.html
For a semi-quick, hacky solution, you can write a microservice. Essentially, your "weak" computer will receive the message and then send an HTTP request to your "strong" computer. Your strong computer will contain the actual program, compute the results, and pass the result back to your "weak" computer.
Flask is an easy and lightweight solution for this.
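A minimal sketch of that flow; the hostname strong-machine, port 5000, and the run_big_script function are placeholders you would replace:

# strong_machine.py -- runs on the machine with enough resources
from flask import Flask, jsonify

app = Flask(__name__)

def run_big_script():
    # Placeholder for the resource-intensive work.
    return sum(i * i for i in range(10 ** 7))

@app.route('/run', methods=['POST'])
def run():
    return jsonify({'result': run_big_script()})

if __name__ == '__main__':
    app.run(host='0.0.0.0', port=5000)

# weak_machine.py -- runs on the machine that receives the external signal
import requests

def on_signal():
    # Called when the external device's signal arrives; hands the work off.
    response = requests.post('http://strong-machine:5000/run', timeout=600)
    print(response.json())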
All of these solutions require some type of networking. At the least, the computers need to be on the same LAN or both have access over the web.
There are many other approaches not mentioned. For example, you can export an NFS (Network File System) share and have one computer put a file in the shared folder while the other computer performs work on that file. I'm sure there are plenty of other contrived ways to accomplish this task :). I'd be happy to expand on a particular method if you want.
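For instance, a rough sketch of the shared-folder variant, assuming /mnt/shared is an NFS mount visible to both machines (the path and the .job naming convention are made up):

# watcher.py -- runs on the "strong" machine and polls the NFS mount for work
import os
import time

SHARED = '/mnt/shared'

while True:
    for name in os.listdir(SHARED):
        if name.endswith('.job'):
            job_path = os.path.join(SHARED, name)
            # ...do the heavy processing of job_path here...
            os.rename(job_path, job_path + '.done')  # mark it as handled
    time.sleep(5)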

Are IPython engines independent processes?

From the IPython Architecture Overview documentation we know that ...
The IPython engine is a Python instance that takes Python commands over a network connection.
Given that it is a Python instance, does that imply that these engines are standalone processes? I can manually load a set of engines via a command like ipcluster start -n 4. In doing so, is the creation of engines the creation of child processes of some parent process, or just a means to kick off a set of independent processes that rely on IPC to get their work done? I can also invoke an engine via the ipengine command, which is surely standalone, as it's entered directly at the OS command line with no relation to anything else.
As background, I'm trying to drill into how the many IPython engines manipulated through a Client from a Python script will interact with another process kicked off in that script.
Here's a simple way to find out which processes are involved: print the list of current processes before I fire off the controller and engines, and then print the list after they're fired off. There's a wmic command to get the job done...
C:\>wmic process get description,executablepath
Interestingly enough, the controller gets 5 Python processes going, and each engine creates one additional Python process. So from this investigation I also learned that an engine is its own process, as is the controller...
C:\>wmic process get description,executablepath | findstr ipengine
ipengine.exe C:\Python34\Scripts\ipengine.exe
ipengine.exe C:\Python34\Scripts\ipengine.exe
C:\>wmic process get description,executablepath | findstr ipcontroller
ipcontroller.exe C:\Python34\Scripts\ipcontroller.exe
From the looks of it they all seem standalone, though I don't think the running-process list above carries any information about how the processes are related as far as the parent/child relationship is concerned. That may be a developer-only formalism that has no representation tracked in the OS, but I don't know enough about these internals to say either way.
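If you want that parent/child information from Python itself rather than from wmic, something like psutil can report it; this snippet is my addition, not part of the original investigation:

import psutil  # third-party: pip install psutil

for proc in psutil.process_iter(['pid', 'ppid', 'name']):
    name = (proc.info['name'] or '').lower()
    if 'ipengine' in name or 'ipcontroller' in name:
        print(proc.info['pid'], 'parent:', proc.info['ppid'], name)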
Here's a definitive quote from MinRK that addresses this question directly:
"Every engine is its own isolated process...Each kernel is a separate
process and can be on any machine... It's like you started a terminal IPython session, and every engine is a separate IPython session. If you do a=5 in this one, a=10 in that one, this guy has 10 this guy has 5."
Here's further definitive validation, inspired by a great SE Hot Network Question on ServerFault that mentioned the use of Process Explorer, which actually tracks parent/child processes...
Process Explorer is a Sysinternals tool maintained by Microsoft. It can display the command line of the process in the process's properties dialog as well as the parent that launched it, though the name of that process may no longer be available.
--Corrodias
If I fire off more engines in another command window, that section of Process Explorer just duplicates exactly as you see in the screenshot.
And just for the sake of completeness, here's what the command ipcluster start --n=5 looks like...

Starting and stopping processes in a cluster

I'm writing software that runs a bunch of different programs (via twisted's twistd); that is, N daemons of various kinds must be started across multiple machines. If I did this manually, I would be running commands like twistd foo_worker, twistd bar_worker and so on, on the machines involved.
Basically there will be a list of machines, and the daemon(s) I need them to run. Additionally, I need to shut them all down when the need arises.
If I were to program this from scratch, I would write a "spawner" daemon that would run permanently on each machine in the cluster with the following features accessible through the network for an authenticated administrator client:
Start a process with a given command line. Return a handle to manage it.
Kill a process given a handle.
Optionally, query stuff like cpu time given a handle.
It would be fairly trivial to program the above, but I cannot imagine this is a new problem. Surely there are existing solutions for doing exactly this? I do, however, lack experience with server administration, and don't even know what the related terms are.
What existing ways are there to do this on a linux cluster, and what are some of the important terms involved? Python specific solutions are welcome, but not necessary.
Another way to put it: Given a bunch of machines in a lan, how do I programmatically work with them as a cluster?
The most familiar and universal way is just to use ssh. To automate it you could use Fabric.
To start foo_worker on all hosts:
$ fab all_hosts start:foo_worker
To stop bar_worker on a particular list of hosts:
$ fab -H host1,host2 stop:bar_worker
Here's an example fabfile.py:
from fabric.api import env, run, hide  # pip install fabric

def all_hosts():
    env.hosts = ['host1', 'host2', 'host3']

def start(daemon):
    run("twistd --pidfile %s.pid %s" % (daemon, daemon))

def stop(daemon):
    run("kill %s" % getpid(daemon))

def getpid(daemon):
    with hide('stdout'):
        return run("cat %s.pid" % daemon)

def ps(daemon):
    """Get process info for the `daemon`."""
    run("ps --pid %s" % getpid(daemon))
There are a number of ways to configure host lists in fabric, with scopes varying from global to per-task, and it's possible to mix and match as needed.
To streamline process management on a particular host, you could write init.d scripts for the daemons (and run service daemon_name start/stop/restart) or use supervisord (and run supervisorctl, e.g. supervisorctl stop all). To control "what is installed where" and to push configuration in a centralized manner, something like Puppet could be used.
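If you go the supervisord route, it can also be driven from Python over its XML-RPC interface, assuming the inet_http_server section is enabled in supervisord.conf on each host (the host name and port below are placeholders):

from xmlrpc.client import ServerProxy  # xmlrpclib on Python 2

server = ServerProxy('http://host1:9001/RPC2')
print(server.supervisor.getAllProcessInfo())  # state of every managed daemon
server.supervisor.startProcess('foo_worker')
server.supervisor.stopProcess('foo_worker')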
The usual tool is a batch queue system, such as SLURM, SGE, Torque/Moab, LSF, and so on.
Circus:
Documentation: http://docs.circus.io/en/0.5/index.html
Code: http://pypi.python.org/pypi/circus/0.5
Summary from the documentation:
Circus is a process & socket manager. It can be used to monitor and control processes and sockets.
Circus can be driven via a command-line interface or programmatically through its Python API.
It shares some of the goals of Supervisord, BluePill and Daemontools. If you are curious about what Circus brings compared to other projects, read "Why should I use Circus instead of X?".
Circus is designed using ZeroMQ http://www.zeromq.org/. See Design for more details.
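As a taste of the programmatic side, the Circus documentation shows an embedded arbiter along these lines (the twistd commands here are stand-ins for your actual daemons):

from circus import get_arbiter

# Watch two daemons; Circus starts them and restarts them if they die.
arbiter = get_arbiter([
    {"cmd": "twistd --nodaemon foo_worker", "numprocesses": 1},
    {"cmd": "twistd --nodaemon bar_worker", "numprocesses": 1},
])
try:
    arbiter.start()
finally:
    arbiter.stop()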

Handling hardware resources when testing with Jenkins

I want to set up Jenkins to
1) pull our source code from our repository,
2) compile and build it,
3) run the tests on an embedded device.
Steps 1 & 2 are quite easy and straightforward with Jenkins.
As for step 3,
we have hundreds of these devices in various versions, and I'm looking for a utility (preferably in Python) that can handle the availability of hardware devices/resources,
in such a manner that one of the build steps will be able to receive which device is available and run the tests on it.
What I have found is that the best thing to do is to have something like Jenkins, or if you're using the enterprise product, ElectricCommander, manage a resource 'pool'. The pool is essentially virtual devices, but they have a property such that you can call into a Python script with either an IP address or serial port and communicate with your devices.
I used this for automated embedded testing on radios. The Python script managed a whole host of tests, and Commander would go ahead and choose a single-step resource from the pool; that resource had an IP, which would be passed into the Python script. The test would then perform all the tests, and the stdout would get stored up in Commander/Jenkins. It also set properties to track the pass/fail count as the test was executing.
// The main resource gets a single-step item from the pool; in the main resource I wrote a tiny script that asked whether the item pulled from the pool had the resource name == "Bench1" .. "BenchX" etc.
basically:
if resource.name == "BENCH1":
    python myscript.py --com COM3 --baud 9600
...
etc.
The really great feature of doing it this way is that if you have to disconnect a device, you don't need to deliver script changes; you simply mark the Commander/Jenkins resource as disabled, and the main 'project' can still pull from what remains in your resource pool.
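If you would rather keep the pool in Python itself, here is a very rough sketch of a lock-protected device pool that a Jenkins build step could call into; the bench names and serial settings are invented for illustration:

import threading

class DevicePool(object):
    """Hands out available embedded devices to test jobs, one at a time."""

    def __init__(self, devices):
        self._lock = threading.Lock()
        self._free = list(devices)

    def acquire(self):
        # Returns a device dict, or None if every device is currently in use.
        with self._lock:
            return self._free.pop() if self._free else None

    def release(self, device):
        with self._lock:
            self._free.append(device)

pool = DevicePool([
    {"name": "BENCH1", "com": "COM3", "baud": 9600},
    {"name": "BENCH2", "com": "COM4", "baud": 9600},
])

device = pool.acquire()
if device is not None:
    try:
        print("running tests on", device["name"])
        # e.g. subprocess.call(["python", "myscript.py", "--com", device["com"]])
    finally:
        pool.release(device)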
