Couchbase ships with a backup tool called cbbackup (the docs are here).
In my setup, I have a cluster of 4 Couchbase nodes (let's call them CB1, CB2, CB3, CB4).
I want to back up my entire cluster (including design docs and all buckets), but I want to run the backup procedure from my backup server (let's call it B1).
When I run cbbackup (on B1) and point it at CB1, it properly downloads and saves the data from all nodes. However, when it attempts to download the design docs, the backup fails.
Looking through the code, I notice that there is a restriction in two places where node filtering is done that limits the returned list to nodes that begin with the host pattern "localhost" or "127.0.0.1" (this is in pump.py and pump_tap.py).
Tracing back this logic, it seems to me that it is impossible to back up the design docs on CB1 from a node other than CB1.
Am I wrong? Is this possible?
If it's not possible (follow-up question) can somebody point me to a design decision that justifies this, or explain the logic behind the decision?
Related
This question is more about architecture and libraries than about implementation.
I am currently working on a project that requires local long-term cache storage on the client (updated once a day), kept in sync with a remote database on the server. SQLite has been chosen for the client as a lightweight approach, and PostgreSQL as a feature-rich database for the server. Postgres's native replication mechanisms are not an option because I need to keep the client really lightweight and free of external components like database servers.
The implementation language would be Python. Now I'm looking at ORMs like SQLAlchemy, but haven't worked with any before.
Does SQLAlchemy have any tools to keep SQLite and Postgres databases in sync?
If not, are there any other Python libraries which have such tools?
Any ideas about what the architecture should look like if the task must be solved "by hand"?
Added:
It's like telemetry, because the client would have an internet connection for only approximately 20 minutes a day.
So the main question is about the architecture of such a system.
It doesn't usually fall within the tasks of an ORM to sync data between databases, so you will likely have to implement it yourself. I don't know of any solution that will handle syncing for you given your choice of databases.
There are a couple of important design choices to consider:
How do you figure out what data changed (i.e. was inserted, updated or deleted)?
What is the most efficient way to package the change log?
Will you have to deal with conflicts, and if so, how?
The most efficient way to figure out what changed is to have the database tell you that directly. Bottled Water can offer some inspiration in this regard. The idea is to tap into the event log Postgres would use for replication. You will need something like Kafka to keep track of what each of your clients already knows. This will allow you to optimize your server for writes, as you won't have clients querying to figure out what changed since they were last online.
The same can be achieved on the SQLite end with event callbacks; you'll just have to trade some storage space on the client to retain the changes to be sent to the server. If that sounds like too much infrastructure for your needs, it's something you can easily implement with SQL and polling as well, but I would still think of it as an event log and treat how it's implemented as a detail, possibly allowing for a more efficient implementation later on.
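As a rough sketch of the plain-SQL variant, the client could let triggers write every change into a log table and then read everything newer than the last synced sequence number. This swaps the event callbacks for triggers; the `readings` table, the `change_log` layout and both function names are made up for illustration.

```python
import sqlite3

def open_db_with_change_log(path=":memory:"):
    """Create a demo table plus triggers that record every change to it."""
    conn = sqlite3.connect(path)
    conn.executescript("""
        CREATE TABLE readings (id INTEGER PRIMARY KEY, value REAL);
        CREATE TABLE change_log (
            seq    INTEGER PRIMARY KEY AUTOINCREMENT,
            op     TEXT NOT NULL,
            row_id INTEGER NOT NULL
        );
        CREATE TRIGGER readings_ins AFTER INSERT ON readings
        BEGIN INSERT INTO change_log (op, row_id) VALUES ('insert', NEW.id); END;
        CREATE TRIGGER readings_upd AFTER UPDATE ON readings
        BEGIN INSERT INTO change_log (op, row_id) VALUES ('update', NEW.id); END;
        CREATE TRIGGER readings_del AFTER DELETE ON readings
        BEGIN INSERT INTO change_log (op, row_id) VALUES ('delete', OLD.id); END;
    """)
    return conn

def changes_since(conn, last_seq):
    """Return change-log entries newer than the last one already synced."""
    return conn.execute(
        "SELECT seq, op, row_id FROM change_log WHERE seq > ? ORDER BY seq",
        (last_seq,),
    ).fetchall()
```

The client would remember the highest `seq` the server has acknowledged and can prune older log rows after each successful sync.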
The best way to structure and package your change log will depend on your application's requirements, available bandwidth, etc. You could use a standard format such as JSON, and compress and encrypt it if needed.
It will be much simpler to design your application so that it avoids conflicts, either by letting data flow in a single direction, or by partitioning your data so that it always flows in a single direction for a given partition.
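For instance, a change log could be packaged as compressed JSON with nothing but the standard library. This is only a sketch: the shape of the change records is an assumption, and encryption is left out (in practice the transport, e.g. TLS, might cover that).

```python
import json
import zlib

def pack_changes(changes):
    """Serialize a list of change records to compressed JSON bytes."""
    raw = json.dumps(changes, separators=(",", ":"), sort_keys=True)
    return zlib.compress(raw.encode("utf-8"))

def unpack_changes(payload):
    """Reverse pack_changes(): decompress and parse the JSON payload."""
    return json.loads(zlib.decompress(payload).decode("utf-8"))

# Hypothetical change records, only to show the round trip.
batch = [
    {"seq": 1, "op": "insert", "row_id": 1, "value": 20.5},
    {"seq": 2, "op": "delete", "row_id": 1},
]
payload = pack_changes(batch)
restored = unpack_changes(payload)
```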
One final thought: with such an architecture you would be getting incremental updates, some of which might be missed for unplanned reasons (system failure, bugs, dropped messages, etc.). You could build in some heuristic to check that your data matches, such as at least comparing the number of records on each side, together with some way to recover from such a fault. At a minimum, there should be a way to manually re-fetch the data from the authoritative source, i.e. if the server is authoritative, the client should be able to discard its data and re-fetch it. You might need such a mechanism anyway for cases when the client is reinstalled, etc.
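A minimal sketch of such a check, assuming both sides can produce rows with the same Python types in the same order: compute a row count plus a digest per table and compare the two. The function names are invented for this example, and it is shown against SQLite only; a Postgres side would need its values normalised the same way.

```python
import hashlib
import sqlite3

def table_fingerprint(conn, table, key_column):
    """Return (row_count, sha256 hex digest) over all rows, ordered by key."""
    digest = hashlib.sha256()
    count = 0
    # Table and column names are interpolated, so they must come from
    # trusted code, never from user input.
    query = f'SELECT * FROM "{table}" ORDER BY "{key_column}"'
    for row in conn.execute(query):
        digest.update(repr(row).encode("utf-8"))
        count += 1
    return count, digest.hexdigest()

def needs_full_resync(local_fp, remote_fp):
    """True when counts or digests differ, i.e. the client should re-fetch."""
    return local_fp != remote_fp
```

Comparing only the counts is cheaper but misses updates that were lost; the digest catches those at the cost of scanning the table.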
I have created a memcached cluster in the Elasticache tool of AWS.
On every call, my program sets keys with some data in the cache, and every time I call the server it updates that data. However, while testing it against the cluster, I found that it seems to either change the node where the key is located or erase the key, so the moment that happens I lose my previous information. As I'm only calling one endpoint for the whole cluster, shouldn't it keep the key consistent across the cluster, rather than deleting the key's content or recreating the key somewhere else?
Is there any configuration parameter of memcached cluster to force it not to change the reference node for a key?
Right now I'm using the default configuration parameters from the AWS parameter group default.memcached1.4. I took a look at the configuration parameters at http://docs.aws.amazon.com/AmazonElastiCache/latest/UserGuide/ParameterGroups.Memcached.html and I can't find any tips on how to solve this issue.
(P.S. When I point my program directly at a specific node, everything works fine.)
That is the way it's supposed to be.
The following diagram illustrates a typical Memcached and a typical
Redis cluster. Memcached clusters contain from 1 to 20 nodes across
which you can horizontally partition your data. Redis
From http://docs.aws.amazon.com/AmazonElastiCache/latest/UserGuide/Clusters.html
The Django documentation says something similar.
One excellent feature of Memcached is its ability to share a cache
over multiple servers. This means you can run Memcached daemons on
multiple machines, and the program will treat the group of machines as
a single cache, without the need to duplicate cache values on each
machine
In other words, you cannot request the data from an arbitrary node in the cluster. You have to let Django's cache API figure out for you how to retrieve the data.
With Redis the behaviour is the opposite. Once you write to the cluster you can query any node in the cluster for the data, because it will be replicated to them all, whereas in Memcached it's sharded.
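To make the sharding concrete, here is a simplified sketch of how a client might map a key to a node. Real clients typically use consistent hashing rather than the plain hash-modulo shown here, and the node names are placeholders.

```python
import hashlib

def node_for_key(key, nodes):
    """Pick the node that owns `key` using a simple hash-modulo scheme."""
    digest = hashlib.md5(key.encode("utf-8")).digest()
    index = int.from_bytes(digest[:4], "big") % len(nodes)
    return nodes[index]

# Placeholder node names, not real endpoints.
nodes = ["node-a:11211", "node-b:11211", "node-c:11211"]
owner = node_for_key("user:42", nodes)
```

The same key always maps to the same node as long as every client sees the same node list in the same order. A client that talks to one node only, or that sees a different list, will look for the key in the wrong place and get a miss.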
We are using multiple (6) Cluster instances of the DataStax Python driver to connect to Cassandra. We pool these connections to perform some operations. Each operation is independent of the others.
It works fine on a small number of operations, but once I try to scale up I get the following errors :
NoHostAvailable: ('Unable to complete the operation against any hosts', {<Host: 127.ption('Pool is shutdown',)})
and sometimes the following warning:
WARNING Heartbeat failed for connection (140414695068880) to 127.0.0.1
I tried changing some cluster object parameters but it did not help.
The following is the keyspace configuration I am using in Cassandra:
'class': 'SimpleStrategy',
'replication_factor': '1'
I am using the latest versions of Cassandra and the DataStax driver for Python. There is only one node in the Cassandra cluster.
EDIT: More details:
The multiple cluster instances are in different processes (the processes are created using the Python multiprocessing module), one cluster instance per process. Let's call these processes Cassandra-Processes (CP). There are a bunch of other processes that do some computation and occasionally need to look up the Cassandra DB and write to it. The current design is that each of these processes is mapped to one CP, and all DB reads/writes needed by the process are done via this mapped CP. What exactly is to be read/written is passed into a queue (again from the multiprocessing library), which the mapped CP reads.
We observe that this setup runs for quite some time, and then suddenly Cassandra begins erroring out.
It's unclear why you're using six cluster instances against a single Cassandra node. Generally, you should use one Cluster instance per application (per remote cluster). You can read about general design considerations for Cassandra drivers here.
If you're looking to "scale" with regard to throughput, you might consider using multiprocessing. I discuss this in a blog post here.
Follow-on:
Two things can be inferred from the information we have so far:
The application is pushing more concurrent requests than your connection pool is configured to handle. I say this because the "Pool is shutdown" error only occurs when a request is waiting for a connection/stream to become available. You can tune connection pooling to make more connections available initially using cluster settings. However, if your "cluster" (server node) is overwhelmed, you won't gain much there.
Your connection is being shut down. This exception only happens when the node is suddenly marked down. In a single-node setup this is most likely because of a connection error. Look for clues in the server log, or in the driver debug log if you're capturing that.
We probably need to know more about your execution model to help more. Is it possible you're running unfettered async requests without occasionally waiting for them to complete?
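To illustrate what "waiting occasionally" could look like, here is a generic sketch that caps the number of in-flight requests with a semaphore. It uses standard-library futures and a thread pool as a stand-in for the driver; `run_bounded` and `submit_async` are names invented for this example, and the driver's own futures use a different callback API, so this would need adapting.

```python
import threading
from concurrent.futures import ThreadPoolExecutor

def run_bounded(submit_async, items, max_in_flight=32):
    """Submit one async request per item, never exceeding max_in_flight.

    `submit_async(item)` must return a concurrent.futures.Future.
    """
    slots = threading.BoundedSemaphore(max_in_flight)
    futures = []
    for item in items:
        slots.acquire()  # blocks while max_in_flight requests are pending
        future = submit_async(item)
        # Free the slot when the request finishes, whether it succeeded or not.
        future.add_done_callback(lambda _f: slots.release())
        futures.append(future)
    # Collect results in submission order; a failed request raises here.
    return [f.result() for f in futures]

# Stand-in workload: squaring numbers on a thread pool instead of querying.
pool = ThreadPoolExecutor(max_workers=8)
squares = run_bounded(lambda n: pool.submit(pow, n, 2), range(10), max_in_flight=3)
pool.shutdown()
```

The driver also ships helpers for this in its `cassandra.concurrent` module, which may be a better fit than rolling your own.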
Remote diagnosis is hard to do without knowing anything about your specific topology, setup and system configuration. However, this looks very much like a configuration problem, or possibly a problem in the Python driver itself. If you google your error message you will find multiple topics on DataStax's Jira describing this or similar problems. I would check that the Python driver is up to date.
What would help in the first place is to see in detail what you are trying to do, how your cluster is configured, and so on.
Wondering about durable architectures for distributed Python applications.
This question I asked before should provide a little guidance about the sort of application it is. We would like to have the ability to have several code servers and several database servers, and ideally some method of deployment that is manageable and not too much of a pain.
The question I mentioned provides an answer that I like, but I wonder how it could be made more durable, or if doing so requires using other technologies. In particular:
I would have my frontend endpoints be the WSGI (because you already have that written) and write the backend to be distributed via messages. Then you would have a pool of backend nodes that would pull messages off of the Celery queue and complete the required work. It would look sort of like:
Apache -> WSGI Containers -> Celery Message Queue -> Celery Workers.
The apache nodes would be behind a load balancer of some kind. This would be a fairly simple architecture to scale and is, if done correctly, fairly reliable. Code for failure in a system like this and you will be fine.
What is the best way to make durable applications? Any suggestions on how to either "code for failure" or design it differently so that we don't necessarily have to? If you think Python might not be suited for this, that is also a valid solution.
Well to continue on the previous answer I gave.
In my projects I code for failure, because I use AWS for a lot of my hosting needs.
I have implemented database backends that check that the database region is accessible and, if it is not, choose another region from a specified list. This happens transparently to the rest of the system on that node. So, if the east-1a region goes down, I have a few other regions that I also host in that it will fail over to, such as the west coast. I keep track of in-flight database transactions, send them over to the west coast, and dump them to a file so I can import them into the old database region once it becomes available again.
My front-end servers sit behind an Elastic Load Balancer that is distributed across multiple regions, and this allows for durable recovery if a region fails. But it cannot be relied upon, so I am looking into solutions such as running HAProxy and switching my DNS in case my ELB goes down. This is a work in progress and I cannot give specifics on my own solutions.
To make your data processing durable, look into Celery and store the data in a distributed MongoDB server to keep your results safe. Using a durable data store to keep your results allows you to get them back in the event of a node crash. It comes at the cost of some performance, but it shouldn't be too terrible if you only rely on soft-realtime constraints.
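As a rough illustration of the failover idea (not the actual backend described above), a helper could walk a list of endpoints in priority order and return the first one that accepts a connection. The endpoint names and the `connect` callable are hypothetical.

```python
def connect_with_failover(endpoints, connect):
    """Try each endpoint in order; return (endpoint, connection) for the first success."""
    errors = {}
    for endpoint in endpoints:
        try:
            return endpoint, connect(endpoint)
        except ConnectionError as exc:
            # Remember why this endpoint failed, then try the next one.
            errors[endpoint] = exc
    raise ConnectionError("all endpoints failed: %r" % (errors,))

def fake_connect(endpoint):
    """Stand-in for a real database connect call; the east endpoint is 'down'."""
    if endpoint == "db.east.example":
        raise ConnectionError("region unavailable")
    return "connection to " + endpoint

chosen, conn = connect_with_failover(
    ["db.east.example", "db.west.example"], fake_connect
)
```

Callers only ever see the returned connection, which is what keeps the failover transparent to the rest of the node.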
http://www.mnxsolutions.com/amazon/designing-for-failure-with-amazon-web-services.html
The above article talks mostly about AWS, but the ideas apply to any system that needs high availability and durability. Just remember that downtime is OK as long as you limit it to a subset of users.
Most production environments need an automation script for clusters. Whenever sites are extended, a new cluster has to be added to the existing domain. When a physical site is decommissioned, its cluster has to be removed from the domain. A cluster may also grow (managed servers are added to it). Finally, machines get decommissioned, which requires removing servers from the cluster.
Menu
1. Add a Cluster
2. Del a Cluster
3. Add a server to Cluster
4. Del a server from Cluster
Please share your thoughts and suggestions...
Thanks in advance.
I'm actually doing exactly that and it works fine.
You'll have to add the initial
edit()
startEdit()
and to save
save()
activate(block='true')
as well as exception handling but the functions are pretty simple:
Add a server to cluster:
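# Create the managed server; create() returns its MBean.
managedServer = create(ServerName,'Server')
managedServer.setListenPort(ListenPort)
# setCluster()/setMachine() expect MBeans rather than plain names,
# e.g. getMBean('/Clusters/' + name) and getMBean('/Machines/' + name).
managedServer.setCluster(Clustername)
managedServer.setMachine(Machinename)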
Delete server from cluster (and the server, optional):
serverMBean = getMBean("Servers/"+ServerName)
serverMBean.setCluster(None)
serverMBean.setMachine(None)
delete(ServerName,'Server')
Add a cluster (you can also use the same method as for creating a server: create(name, 'Cluster')):
cd('/')
cmo.createCluster('Cluster-0')
cd('/Clusters/Cluster-0')
cmo.setClusterMessagingMode('unicast')
...
Deleting a cluster works the same way as deleting a server; you should shut its servers down first, though.
In general, you can use the admin console to perform the desired actions, and the record button at the top lets you generate a WLST script that does exactly what you did in the admin console.
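To tie the four menu entries from the question to such functions, a small dispatcher might look like the sketch below. The handler functions are assumed to wrap the WLST calls shown above; they are not defined here, and `dispatch` is a name made up for this example.

```python
MENU = [
    ("1", "Add a Cluster"),
    ("2", "Del a Cluster"),
    ("3", "Add a server to Cluster"),
    ("4", "Del a server from Cluster"),
]

def dispatch(choice, handlers):
    """Call the handler registered for a menu choice and return its result."""
    if choice not in handlers:
        raise ValueError("unknown menu choice: %r" % (choice,))
    return handlers[choice]()
```

Inside a WLST script, `handlers` would be a dict such as `{"1": add_cluster, "3": add_server}`, with each function opening the edit session, making its change, and then saving and activating.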