Timing application and exact timestamps from server - serverless? - python

I would like developers to give a comment about :) thank you
I am building an application that needs an exact timestamp for multiple devices at the same time. Device time cannot be used, because they are not at the same time.
Asking time from the server is Ok but it is slow and depend on connection speed, and if making this as serverless/regioness then serverside timestamp cannot be used because of time-zones?
On demo application, this works fine with one backend at pointed region but still it needs to respond faster to clients.
Here is simply image to see this in another way?
In the image, there are no IDs etc but the main idea is there
I am planning to build this on nodejs server and later when more timing calculations translate it to python/Django pack...
Thank you

Using server time is perfectly fine and timezone difference due to different regions can be easily adjusted in code. If you are using a cloud provider, they typically group their services by region instead of distributing them worldwide, so adjusting the timezone shouldn't be a problem. Anyway, check the docs for the cloud service you are planning to use, they usually clearly document the geographic distribution.
Alternatively, you could consider implementing (or using) Network Time Protocol (NTP) on your devices. It's a fairly simple protocol for synchronizing clocks, if you need a source code example take a look at the source for node-ntp-client , a javascript implementation.

Related

How to integrate BIRT with Python Django Project by using Py4j

Hi is there anyone who is help me to Integrate BIRT report with Django Projects? or any suggestion for connect third party reporting tools with Django like Crystal or Crystal Clear Report.
Some of the 3rd-party Crystal Reports viewers listed here provide a full command line API, so your python code can preview/export/print reports via subprocess.call()
The resulting process can span anything between an interactive Crystal Report viewer session (user can login, set/change parameters, print, export) and an automated (no user interaction) report printing/exporting.
While this would simplify your code, it would restrict deployment to Windows.
For prototyping, or if you don't mind performance, you can call from BIRT from the command line.
For example, download the POJO runtime and use the script genReport.bat (IIRC) to generate a report to a file (eg. PDF format). You can specify the output options and the report parameters on the command line.
However, the BIRT startup is heavy overhead (several seconds).
For achieving reasonable performance, it is much better to perform this only once.
To achieve this goal, there are at least two possible ways:
You can use the BIRT viewer servlet (which is included as a WAR file with the POJO runtime). So you start the servlet with a web server, then you use HTTP requests to generate reports.
This looks technically old-fashioned (eg. no JSON Requests), but it should work. However, I never used this approach.
The other option is to write your own BIRT server.
In our product, we followed this approach.
You can take the viewer servlet as a template for seeing how this could work.
The basic idea is:
You start one (or possibly more than one) Java process.
The Java process initializes the BIRT runtime (this is what takes some seconds).
After that, the Java process listens for requests somehow (we used a plain socket listener, but of course you could use HTTP or some REST server framework as well).
A request would contain the following information:
which module to run
which output format
report parameters (specific to the module)
possibly other data/metadata, e.g. for authentication
This would create a RunAndRenderTask or separate RunTask and RenderTasks.
Depending on your reports, you might consider returning the resulting output (e.g. PDF) directly as a response, or using an asynchronous approach.
Note that BIRT will happily create several reports at the same time - multi-threading is no problem (except for the initialization), given enough RAM.
Be warned, however, that you will need at least a few days to build a POC for this "create your own server" approach, and probably some weeks for prodction quality.
So if you just want to build something fast to see if the right tool for you, you should start with the command line approach, then the servlet approach and only then, and only if you find that the servlet approach is not quite good enough, you should go the "create your own server" way.
It's a pity that currently there doesn't seem to exist an open-source, production-quality, modern BIRT REST service.
That would make a really good contribution to the BIRT open-source project... (https://github.com/eclipse/birt)

API Design Questions using Django for OS tasks (REST vs RPC)

Background:
I have an application that is supposed to automate some infrastructure & OS-heavy tasks that happen on a network file system (for example: mounting volumes, shutting down / bringing up servers, creating directories, moving data around, ssh-ing etc). Ultimately there are a lot of OS-level commands that need to be run in a sequence for each action. Our consumer/client likely does not know this sequence, but knows "I want to do X task".
Tech stack: Python/Django
I have been tasked with setting this application up but am perplexed on the best way to approach for modularity from the API standpoint & just overall design. Currently, we have a similar application that is a SOAP-style (rpc) but the way it is written is not very modular. Like for example, one function will have a ton of random hardcoded subprocess commands - not the approach I want to emulate here.
Initially I was leaning more towards REST API since Django has a nice django rest framework plug-in, but am having trouble modelling these very action-oriented tasks. The more I read other things online, the more I come to believe I really need to think of every little action as a resource with the client having to GET/POST/PUT to each of these to keep things very modular but when I boil that down further it looks like I may need to set up 15+ endpoints for each situation needed and the client likely isn't going to want to call all 15 endpoints to get their singular behavior they want. That being said - moving to rpc so users can have one endpoint that 'moves the moon on a single call' might not be the best approach either.
I think one of the issues I see is our application is doing a lot of work on a file system, not all contained within our application's database. I reckon that's kind of a central point of this application, but I have trouble modelling things that require file system actions outside our application's database.
Question 1:
One example action that our client might want to call would be responsible for ssh-ing to a remote server and running a command. How might you model this in REST?
Question 2:
How do you all model file system actions in your applications?
Question 3:
After reviewing the above, does RPC seem like the better option?
Other:
Any other help or feedback (even in generally is much appreciated).
REST is similar to SOAP in a sense that you call operations in SOAP and REST just maps those operations to web resources and HTTP methods.
For example
z DoSSHStuffOnARemoteServer(x=1,y=2)
vs
POST /RemoteServer/SSHStuff {x:1,y:2}
If it timeouts, because it takes a lot of time, then you can do
202 accepted
{type: "transaction": href: "/RemoteServer/SSHStuff/123", status: "pending"}
and poll it in every 5-10 mins or use websockets to update the status. After it is done:
200 ok
{type: "transaction": href: "/RemoteServer/SSHStuff/123", status: "done", result: {z:3}}
So there is no magic. Just keep in mind that REST is in the presentation layer of your application, it returns view models and the entire structure is connected to the application services if you do DDD. It should not reflect the database structure unless you have an anemic domain model, or others call it thick client. Normally I would not say anything about RemoteServer/SSHStuff, just tell the client what will be done and stay silent about how it will be done. They don't need to know anything about how you store data or how many servers you have with what protocols and applications. It should not be their concern. The only thing they need to know what will be done, how long it takes to respond and what will be the response. The other part is irrelevant to them and it is a security risk if you share too much of it. When we design an interface like an interface for a REST service or just an OOP interface we always do it to hide implementation details. I hope that helps, have a nice day!

Sharing tables across multiple sites

this is more of an architecture question which I can't solve this properly as I don't have enough experience with such architecture... I'm currently running the solution with Python and SqlAlchemy, but the question is generic and the answer doesn't have to address those technologies.
I will try to explain it on an example of public library. So imagine having a public library, with server holding tables with all the books, scans (large binary images), users. I've already made a client and server parts which work great, but locally for a single library.
Now I would like to have this of server and clients for another public library (and later more public libraries to come). Having a local server for each library is desired as there is much data to be transferred to and from local server.
The complication comes from the requirement to be able to share users (with their member cards) between libraries - if user comes and registers at library A, he should be able to go to library B without the need for new registration. There's no need for being able to see other user data in the library he wasn't registered in the first place, just hist member account (id, login and password).
The simple solution would be:
having large data on local server
having users on cloud (some public server on internet)
The problem is that there are queries (for statistics, views, and so on), which run on local server and need accessing users, so I can't have users on a different server and database, because I couldn't then do select + join on such an architecture.
The solution which is left behind by previous developer and which other developers think is wrong, is to have the users table set up as replicated table (MariaDB + Galera), so it would end up having users table the same on cloud and each library site, so the previous code would work as if everything is just local, while sharing the users on the background with other libraries.
One of the problems with this is that the current version of our database (MariaDB) doesn't support (or has broken) partial replication (only some tables or some databases), so it would need patching of the MariaDB and distributing this patched version of database server to cloud and other sites, which stinks of various problems now and in the future, when new version of MariaDB will come out.
What would be the proper way of sharing these users between sites, while retaining the ability to do local selects and joins with the user table?
(Maybe there's a known design / architecture pattern for this, but I just don't know what to search for as I'm new to this.)
Thanks,
Miro
schema - sharing table between sites
Start with a single-source-of truth for the user registrations. That is one server (or Galera cluster, for HA) somewhere (in HQ, in Cloud, wherever). Login queries remotely access that server.
Think about any place you log in -- you are going to some central cite. My point is, that is the way everyone does it because it is fast, reliable, efficient, etc, with today's networks.
Next, what about images, etc? If they are shared across your sites, you may as well do them the same way. Look at any search engine for the last two decades -- images (etc) are fetched from a single site. (Actually a small number of sites, for redundancy, etc). Even the biggest web providers have no more than perhaps a dozen datacenters to service the entire world.
After that, you need to decide on Cloud vs dedicated (or even run your own datacenter).
For HA, Cloud providers do a lot. For do-it-yourself, there are various replication scenarios, Galera being one of the best (today). For true HA, you need two copies of your data geographically separated -- to protect from hurricanes, fires, floods, earthquakes, etc. Consider a WAN deployment of Galera, or some asynchronouse replication (possibly even between two Galera clusters.
Another choice is whether the Users and Images tables need to be on separate servers. Only if the traffic and size are high do you need to consider separating them. For a huge Image library, you may need a large number of servers, at which point, they should probably living on servers with the sole purpose of delivering images -- no Users, no HTML pages, etc. Even the "meta" info about images could be elsewhere in MySQL; the Images are in files and just a web server tuned to deliver images runs. (I can think of multiple 'big guys' that do it this way.)

Durable architecture in Python distributed application

Wondering about durable architectures for distributed Python applications.
This question I asked before should provide a little guidance about the sort of application it is. We would like to have the ability to have several code servers and several database servers, and ideally some method of deployment that is manageable and not too much of a pain.
The question I mentioned provides an answer that I like, but I wonder how it could be made more durable, or if doing so requires using other technologies. In particular:
I would have my frontend endpoints be the WSGI (because you already have that written) and write the backend to be distributed via messages. Then you would have a pool of backend nodes that would pull messages off of the Celery queue and complete the required work. It would look sort of like:
Apache -> WSGI Containers -> Celery Message Queue -> Celery Workers.
The apache nodes would be behind a load balancer of some kind. This would be a fairly simple architecture to scale and is, if done correctly, fairly reliable. Code for failure in a system like this and you will be fine.
What is the best way to make durable applications? Any suggestions on how to either "code for failure" or design it differently so that we don't necessarily have to? If you think Python might not be suited for this, that is also a valid solution.
Well to continue on the previous answer I gave.
In my projects I code for failure, because I use AWS for a lot of my hosting needs.
I have implemented database backends that will make sure that the database, region, is accessible and if not it will choose another region from a specified list. This happens transparently to the rest of the system on that node. So, if the east-1a region goes down I have a few other regions that I also host in that it will failover into, such as the west coast. I keep track of currently going database transactions and send them over to the west coast and dump them to a file so I can import them into the old database region once it becomes available.
My front end servers sit behind a elastic load balancer that is distributed across multiple regions and this allows for durable recovery if a region fails. But, it cannot be relied upon so I am looking into solutions such as running a HAProxy and switching my DNS in the case that my ELB goes down. This is a work in progress and I cannot give specifics on my own solutions.
To make your data processing durable look into Celery and store the data in a distributed mongo server to keep your results safe. Using a durable data store to keep your results allows you to get them back in the event of a node crash. It comes at the cost of some performance, but it shouldn't be too terrible if you only rely on soft-realtime constraints.
http://www.mnxsolutions.com/amazon/designing-for-failure-with-amazon-web-services.html
The above article talks mostly about AWS but the ideas apply to any system that you need to keep high availability in and system durability. Just remember that downtime is ok as long as you minimize it for a subset of users.

Is it more efficient to parse external XML or to hit the database?

I was wondering when dealing with a web service API that returns XML, whether it's better (faster) to just call the external service each time and parse the XML (using ElementTree) for display on your site or to save the records into the database (after parsing it once or however many times you need to each day) and make database calls instead for that same information.
First off -- measure. Don't just assume that one is better or worse than the other.
Second, if you really don't want to measure, I'd guess the database is a bit faster (assuming the database is relatively local compared to the web service). Network latency usually is more than parse time unless we're talking a really complex database or really complex XML.
Everyone is being very polite in answering this question: "it depends"... "you should test"... and so forth.
True, the question does not go into great detail about the application and network topographies involved, but if the question is even being asked, then it's likely a) the DB is "local" to the application (on the same subnet, or the same machine, or in memory), and b) the webservice is not. After all, the OP uses the phrases "external service" and "display on your own site." The phrase "parsing it once or however many times you need to each day" also suggests a set of data that doesn't exactly change every second.
The classic SOA myth is that the network is always available; going a step further, I'd say it's a myth that the network is always available with low latency. Unless your own internal systems are crap, sending an HTTP query across the Internet will always be slower than a query to a local DB or DB cluster. There are any number of reasons for this: number of hops to the remote server, outage or degradation issues that you can't control on the remote end, and the internal processing time for the remote web service application to analyze your request, hit its own persistence backend (aka DB), and return a result.
Fire up your app. Do some latency and response times to your DB. Now do the same to a remote web service. Unless your DB is also across the Internet, you'll notice a huge difference.
It's not at all hard for a competent technologist to scale a DB, or for you to completely remove the DB from caching using memcached and other paradigms; the latency between servers sitting near each other in the datacentre is monumentally less than between machines over the Internet (and more secure, to boot). Even if achieving this scale requires some thought, it's under your control, unlike a remote web service whose scaling and latency are totally opaque to you. I, for one, would not be too happy with the idea that the availability and responsiveness of my site are based on someone else entirely.
Finally, what happens if the remote web service is unavailable? Imagine a world where every request to your site involves a request over the Internet to some other site. What happens if that other site is unavailable? Do your users watch a spinning cursor of death for several hours? Do they enjoy an Error 500 while your site borks on this unexpected external dependency?
If you find yourself adopting an architecture whose fundamental features depend on a remote Internet call for every request, think very carefully about your application before deciding if you can live with the consequences.
Consuming the webservices is more efficient because there are a lot more things you can do to scale your webservices and webserver (via caching, etc.). By consuming the middle layer, you also have the options to change the returned data format (e.g. you can decide to use JSON rather than XML). Scaling database is much harder (involving replication, etc.) so in general, reduce hits on DB if you can.
There is not enough information to be able to say for sure in the general case. Why don't you do some tests and find out? Since it sounds like you are using python you will probably want to use the timeit module.
Some things that could effect the result:
Performance of the web service you are using
Reliability of the web service you are using
Distance between servers
Amount of data being returned
I would guess that if it is cacheable, that a cached version of the data will be faster, but that does not necessarily mean using a local RDBMS, it might mean something like memcached or an in memory cache in your application.
It depends - who is calling the web service? Is the web service called every time the user hits the page? If that's the case I'd recommend introducing a caching layer of some sort - many web service API's throttle the amount of hits you can make per hour.
Whether you choose to parse the cached XML on the fly or call the data from a database probably won't matter (unless we are talking enterprise scaling here). Personally, I'd much rather make a simple SQL call than write a DOM Parser (which is much more prone to exceptional scenarios).
It depends from case to case, you'll have to measure (or at least make an educated guess).
You'll have to consider several things.
Web service
it might hit database itself
it can be cached
it will introduce network latency and might be unreliable
or it could be in local network and faster than accessing even local disk
DB
might be slow since it needs to access disk (although databases have internal caches, but those are usually not targeted)
should be reliable
Technology itself doesn't mean much in terms of speed - in one case database parses SQL, in other XML parser parses XML, and database is usually acessed via socket as well, so you have both parsing and network in either case.
Caching data in your application if applicable is probably a good idea.
As a few people have said, it depends, and you should test it.
Often external services are slow, and caching them locally (in a database in memory, e.g., with memcached) is faster. But perhaps not.
Fortunately, it's cheap and easy to test.
Test definitely. As a rule of thumb, XML is good for communicating between apps, but once you have the data inside of your app, everything should go into a database table. This may not apply in all cases, but 95% of the time it has for me. Anytime I ever tried to store data any other way (ex. XML in a content management system) I ended up wishing I would have just used good old sprocs and sql server.
It sounds like you essentially want to cache results, and are wondering if it's worth it. But if so, I would NOT use a database (I assume you are thinking of a relational DB): RDBMSs are not good for caching; even though many use them. You don't need persistence nor ACID.
If choice was between Oracle/MySQL and external web service, I would start with just using service.
Instead, consider real caching systems; local or not (memcache, simple in-memory caches etc).
Or if you must use a DB, use key/value store, BDB works well. Store response message in its serialized form (XML), try to fetch from cache, if not, from service, parse. Or if there's a convenient and more compact serialization, store and fetch that.

Categories