I have built an application with a REST API.
I have to send requests from a cluster (of about 10 nodes) to get some information from this API.
If I run the service on a single system, it is likely to become a bottleneck and the MapReduce job will take a long time to complete.
Is there any way I can run this service on multiple systems for load balancing? I am using Linux. My service is built in Python, and I am using Flask to expose it via REST.
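One hedged sketch (not from the question itself): run identical copies of the Flask service on several machines and spread the requests across them, either behind a reverse proxy such as nginx/HAProxy or with simple client-side round-robin from the cluster nodes. The host addresses and the /api/info path below are placeholders:

```python
import itertools
import requests

# Placeholder addresses of the machines running copies of the Flask service.
SERVICE_URLS = [
    "http://10.0.0.11:5000",
    "http://10.0.0.12:5000",
    "http://10.0.0.13:5000",
]

# Cycle through the instances so each node's requests are spread evenly.
_targets = itertools.cycle(SERVICE_URLS)

def fetch(path):
    base = next(_targets)
    resp = requests.get(base + path, timeout=10)
    resp.raise_for_status()
    return resp.json()

if __name__ == "__main__":
    print(fetch("/api/info"))  # hypothetical endpoint
```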
I'm having a lot of trouble with this and can't find many resources on it. I have a Power Automate flow (triggered by a user submitting a form) from which I would like to receive data from my on-premises MariaDB instance, which runs on RHEL on our corporate LAN. From my understanding, there are three components I would need: (1) a REST API on my web server that simply returns the data by querying the database, (2) an on-premises gateway on my RHEL system, and (3) an HTTP connector in my flow.
Developing a simple Python/PHP script for the REST API is quite straightforward; however, I am having some difficulty with the on-premises gateway. Microsoft provides its own free gateway, but it is only available on Windows. Would I need to set up a Windows VM and install the gateway there, or is there a better solution?
Additionally, is my approach to this problem correct, or am I going about it the wrong way?
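For reference, component (1) can be a very small Flask endpoint that queries MariaDB and returns JSON. This is only a sketch under assumed names; the route, table, and credentials are placeholders:

```python
from flask import Flask, jsonify
import mysql.connector  # MariaDB speaks the MySQL wire protocol

app = Flask(__name__)

@app.route("/api/records")
def records():
    # Placeholder connection details and query.
    conn = mysql.connector.connect(
        host="127.0.0.1", user="api_user", password="secret", database="mydb"
    )
    cur = conn.cursor(dictionary=True)
    cur.execute("SELECT id, name FROM records LIMIT 100")
    rows = cur.fetchall()
    cur.close()
    conn.close()
    return jsonify(rows)

if __name__ == "__main__":
    app.run(host="0.0.0.0", port=8080)
```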
I have a Python script that pulls some data from an Azure Data Lake cluster, performs some simple compute, then stores it into a SQL Server DB on Azure. The whole shebang runs in about 20 seconds. It needs sqlalchemy, pandas, and some Azure data libraries. I need to run this script daily. We also have a Service Fabric cluster available to use.
What are my best options? I thought of containerizing it with Docker and turning it into an HTTP-triggered API, but then how do I trigger it once per day? I'm not good with Azure or microservice design, so this is where I need the help.
You can use WebJobs in App Service. There are two types of WebJobs to choose from, Continuous and Triggered; it sounds like you need a Triggered WebJob.
You can refer to the documentation here for more details. In addition, here is how to run tasks in WebJobs.
Alternatively, you can use an Azure Functions timer trigger with Python, which became generally available in recent months.
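As a rough sketch of the timer-trigger option (the function and schedule shown are assumptions; the schedule itself lives in the function's function.json as an NCRONTAB expression):

```python
# __init__.py of a timer-triggered Azure Function (Python).
# function.json would contain e.g. "schedule": "0 0 6 * * *"
# (NCRONTAB: run once a day at 06:00 UTC).
import logging
import azure.functions as func

def main(mytimer: func.TimerRequest) -> None:
    if mytimer.past_due:
        logging.warning("Timer is running late")
    run_daily_job()  # placeholder for the existing pandas/sqlalchemy logic

def run_daily_job():
    # Pull from the Data Lake, compute, write to the SQL Server DB.
    pass
```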
I have Spark batch-processing code (basically, model training) that I execute with spark-submit on an AWS EMR cluster. Now I want to be able to launch this job each day at a specific time.
What is the standard way to do it?
Should I change the code and add the scheduling inside it? Or is there any way to schedule the spark-submit job itself?
Or maybe should I make it as a Spark Streaming job executed every 24 hours? (though I am interested in a specific time slot, i.e. between 11:00pm and 12pm)
Cron is the more traditional choice, and it works well, but another option is Rundeck.
Use Rundeck as an easier to manage and more secure replacement for Cron or as a replacement for legacy tools like Control-M or HP Operations Orchestration. Rundeck gives your users a simple web interface (GUI or API) to go to for both on-demand and scheduled operations tasks.
What is Rundeck?
Rundeck is open source software that helps you automate routine operational procedures in data center or cloud environments. Rundeck provides a number of features that will alleviate time-consuming grunt work and make it easy for you to scale up your automation efforts and create self service for others. Teams can collaborate to share how processes are automated while others are given trust to view operational activity or execute tasks.
Rundeck allows you to run tasks on any number of nodes from a web-based or command-line interface. Rundeck also includes other features that make it easy to scale up your automation efforts including: access control, workflow building, scheduling, logging, and integration with external sources for node and option data.
If you are using Linux, you can set up a cron job to call the spark-submit script:
http://kvz.io/blog/2007/07/29/schedule-tasks-on-linux-using-crontab/
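For example, a crontab entry along these lines (the paths and time are placeholders) would launch the job at 11:00 pm every day:

```
# crontab -e on the machine that can reach the EMR cluster
0 23 * * * /usr/bin/spark-submit --master yarn /home/hadoop/train_model.py >> /home/hadoop/train_model.log 2>&1
```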
I am trying to develop a cloud-bursting solution for our cluster.
What I need is a way to monitor the VMs on the OpenNebula cluster and turn off those VMs whose CPU consumption stays below 10% for a certain amount of time.
I am stuck at the monitoring part.
I cannot find a way to periodically monitor the VMs' CPU/memory consumption.
I am writing the code in Python.
I am also using libcloud to access OpenNebula from my code.
Any ideas?
Thanks.
You should use the OpenNebula XML-RPC API instead of libcloud, since libcloud does not expose the VMs' monitoring information.
You can use any of the available bindings to interact with the OpenNebula XML-RPC API (Ruby and Java).
Calling the info method on a VirtualMachine instance will retrieve the virtual machine's information, including the monitoring values for CPU and MEMORY.
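A minimal Python sketch of that call, using the standard library's xmlrpc.client (the endpoint, credentials, and the exact monitoring element names are assumptions and may differ between OpenNebula versions):

```python
import xmlrpc.client
import xml.etree.ElementTree as ET

ONE_ENDPOINT = "http://frontend:2633/RPC2"   # placeholder front-end address
SESSION = "oneadmin:password"                # "user:password" session string

server = xmlrpc.client.ServerProxy(ONE_ENDPOINT)

def vm_usage(vm_id):
    # one.vm.info returns [success, body, ...]; body is the VM's XML document.
    result = server.one.vm.info(SESSION, vm_id)
    ok, body = result[0], result[1]
    if not ok:
        raise RuntimeError(body)
    root = ET.fromstring(body)
    # Element paths are assumptions -- check the schema for your version.
    cpu = root.findtext(".//MONITORING/CPU")
    mem = root.findtext(".//MONITORING/MEMORY")
    return cpu, mem

print(vm_usage(42))  # hypothetical VM id
```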
I'm a complete novice in this area, so please excuse my ignorance.
I have three questions:
What's the best (fastest, easiest, headache-free) way of hosting a python program online?
I'm currently looking at Google App Engine and Web Frameworks for Python, but all the options are a bit overwhelming.
Which gui/viz libraries will transfer to a web app environment without problems?
I'm willing to sacrifice some performance for the sake of simplicity.
(Google App Engine can't do C libraries, so this is causing a dilemma.)
Where can I learn more about running a program locally vs. having a program continuously run on a server and taking requests from multiple users?
Currently I have a working Python program that only uses standard Python libraries. It currently uses around 2.7 GB of RAM, but as I increase my dataset, I'm predicting it will use closer to 6 GB. I can run it on my personal machine, and everything is just peachy. I'd like to continue developing the front end on my home machine and implement the web app later.
Here is a relevant, previous post of mine.
Depending on your knowledge of server administration, you should consider a dedicated server. I was running some custom Python modules with NumPy, SciPy, Pandas, etc. on some data on a shared server with GoDaddy. One program I wrote took 120 seconds to complete. Recently we switched to a dedicated server and it now takes 2 seconds. The shared environment used CGI to run Python, and I installed mod_python on the dedicated server.
Using a dedicated server gives you COMPLETE control (including root access) over the server, which allows the compilation and/or installation of anything. It is a bit pricey, but if you're making money with your stuff it might be worth it.
Another option would be to use something like http://www.dyndns.com/ where you can host a domain on your own machine.
So with that said, perhaps some answers:
It depends on your requirements. ~4 GB of RAM might require a dedicated server. What you are asking is not necessarily an easy task, so don't be afraid to get your hands dirty.
Not sure what you mean here.
A server is just a computer that responds to requests. On the dedicated server (I keep mentioning) you are operating in a Unix (or Windows) environment just like you would locally. You use SOFTWARE (e.g. Apache web server) to serve client requests. My vote is mod_python.
Going with an Amazon EC2 instance is a greater headache than a dedicated server, but it should be much closer to your needs.
http://aws.amazon.com/ec2/#instance
Their extra-large instance should be more than large enough for what you need to do, and you only turn the instance on when you need it, so you don't get the massive bill that you would with a dedicated server of the same size.
There are some nice JavaScript-based visualization toolkits out there, so you can design your application to return raw (JSON) data and render that on the client.
I can mention d3.js http://mbostock.github.com/d3/ and the JavaScript InfoVis Toolkit http://thejit.org/
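To illustrate the "return raw JSON, render on the client" idea, a minimal server-side sketch (assuming Flask; the route and data are placeholders) could look like this, with d3.js or the InfoVis Toolkit fetching /data and drawing it in the browser:

```python
from flask import Flask, jsonify

app = Flask(__name__)

@app.route("/data")
def data():
    # In a real app this would be the output of your Python computation.
    points = [{"x": i, "y": i * i} for i in range(10)]
    return jsonify(points)

if __name__ == "__main__":
    app.run(port=5000)
```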