I have created a Memcached cluster in AWS ElastiCache.
On every call, my program sets keys with some data in the cache, and each subsequent call to the server updates that data. While testing against the cluster, however, I found that the key seems to move to a different node, or to be erased, and the moment that happens I lose my previous information. Since I am only calling a single endpoint for the whole cluster, shouldn't it keep the key consistent across the cluster rather than deleting its content or recreating it somewhere else?
Is there any configuration parameter of the Memcached cluster that forces it not to change the node a key is assigned to?
Right now I am using the default parameter group, default.memcached1.4. I took a look at the configuration parameters at http://docs.aws.amazon.com/AmazonElastiCache/latest/UserGuide/ParameterGroups.Memcached.html and found nothing that hints at how to solve this issue.
(P.S. When I point my program directly at a specific node, everything works fine.)
That is the way it's supposed to be.
The following diagram illustrates a typical Memcached and a typical Redis cluster. Memcached clusters contain from 1 to 20 nodes across which you can horizontally partition your data. […]
From http://docs.aws.amazon.com/AmazonElastiCache/latest/UserGuide/Clusters.html
The Django documentation says something similar.
One excellent feature of Memcached is its ability to share a cache over multiple servers. This means you can run Memcached daemons on multiple machines, and the program will treat the group of machines as a single cache, without the need to duplicate cache values on each machine.
In other words, you cannot directly request data from any given node in the cluster. You have to let Django's cache API figure out how to retrieve the data for you.
With Redis the behaviour is the opposite: once you write to the cluster, you can query any node for the data, because it is replicated to them all. Memcached, by contrast, is sharded: each key lives on exactly one node, determined by hashing the key.
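To make that concrete, here is a toy sketch of how a Memcached client picks the node for a key. This is not ElastiCache's actual implementation (real clients use smarter consistent hashing), and the node names are made up:

import hashlib

# Each key maps to exactly one node by hashing it. Real clients use
# consistent hashing so that adding or removing a node remaps as few
# keys as possible.
nodes = ["node-1:11211", "node-2:11211", "node-3:11211"]

def node_for_key(key):
    digest = hashlib.md5(key.encode()).hexdigest()
    return nodes[int(digest, 16) % len(nodes)]

print(node_for_key("session:42"))  # always the same node for a given key

Note that with this naive modulo scheme, resizing the cluster changes len(nodes) and therefore remaps almost every key, which is exactly why a key can appear to "move" between nodes.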
I am creating a machine learning app which should frequently save number values to a local database.
These values are connected: essentially, I want to frequently update a time series by appending a number to a list.
Ideally I would save key-value pairs, where the key is the name of the array (for example train_loss) and the value is the corresponding time series.
My first idea was to leverage Redis, but as far as I know Redis data is only kept in RAM? What I want to achieve is saving to disk after every log, or perhaps after every couple of logs.
I need the data saved locally since it will be consumed by another app (in JavaScript), so some JSON-like format would be nice.
Using JSON files (and Python json package) is an option, but I believe it would result in an I/O bottleneck because of frequent updates.
I am basically trying to create a clone of a web app like Tensorboard.
A technique we use in the backend of a hosted application, for frequently used read/post APIs, is to write to both Redis and the DB at the same time. During a read, we check whether the key is available in Redis; if it is not, we read it from the DB, write it back to Redis, and then serve it.
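A minimal sketch of that pattern, assuming the redis-py package and SQLite as the durable store (both are stand-ins; any DB works the same way):

import json
import sqlite3

import redis  # pip install redis

r = redis.Redis()
db = sqlite3.connect("metrics.db")
db.execute("CREATE TABLE IF NOT EXISTS kv (k TEXT PRIMARY KEY, v TEXT)")

def write(key, value):
    payload = json.dumps(value)
    # Write-through: update the durable store and the cache together.
    db.execute("INSERT OR REPLACE INTO kv VALUES (?, ?)", (key, payload))
    db.commit()
    r.set(key, payload)

def read(key):
    cached = r.get(key)
    if cached is not None:
        return json.loads(cached)
    # Cache miss: fall back to the DB and repopulate the cache.
    row = db.execute("SELECT v FROM kv WHERE k = ?", (key,)).fetchone()
    if row is None:
        return None
    r.set(key, row[0])
    return json.loads(row[0])

For the append-a-number use case, calling write(key, read(key) + [new_value]) would do, at the cost of rewriting the whole list on each update.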
I'm currently working on a website where I want the user to upload one or more images; my Flask backend will make some changes to these pictures and then return them to the front end.
Where do I optimally save these images temporarily, especially if there is more than one user on my website at the same time (I'm planning on containerizing the website)? Is it safe to save the images in the website's folder, or do I need e.g. a database for that?
You should use a database, or external object storage like Amazon S3.
I say this for a couple of reasons:
Accidents do happen. Say the client does an HTTP POST, gets a URL back, and does an HTTP GET to retrieve the result. But in the meantime, the container restarts (because the system crashed; your cloud instance got terminated; you restarted the container to upgrade its image; the application failed); the container-temporary filesystem will get lost.
A worker can run in a separate container. It's very reasonable to structure this application as a front-end Web server, that pushes messages into a job queue, and then a back-end worker picks up messages out of that queue to process the images. The main server and the worker will have separate container-local filesystems.
You might want to scale up parts of this. You can easily run multiple containers from the same image; they'll each have separate container-local filesystems, and you won't directly control which replica a request goes to, so every container needs access to the same underlying storage.
...and it might not be on the same host. In particular, cluster technologies like Kubernetes or Docker Swarm make it reasonably straightforward to run container-based applications spread across multiple systems; sharing files between hosts isn't straightforward, even in these environments. (Most of the Kubernetes Volume types that are easy to get aren't usable across multiple hosts, unless you set up a separate NFS server.)
That set of constraints would imply trying to avoid even named volumes as much as you can. It makes sense to use volumes for the underlying storage for your database, and it can make sense to use Docker bind mounts to inject configuration files or get log files out, but ideally your container doesn't really use its local filesystem at all and doesn't care how many copies of itself are running.
(Do not rely on Docker's behavior of populating a named volume on first use. There are three big problems with it: it is on first use only, so if you update the underlying image, the volume won't get updated; it only works with Docker named volumes and not other options like bind-mounts; and it only works in Docker proper and not in Kubernetes.)
Other decisions are possible given other sets of constraints. If you're absolutely sure you will never ever want to run this application spread across multiple nodes, Docker volumes or bind mounts might make sense. I'd still avoid the container-temporary filesystem.
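As a concrete illustration of the object-storage option, here is a minimal sketch using boto3 with a made-up bucket name; credentials, content types, and error handling are omitted:

import io
import uuid

import boto3  # pip install boto3

s3 = boto3.client("s3")
BUCKET = "my-image-app-bucket"  # hypothetical bucket name

def store_processed_image(image_bytes):
    key = "processed/%s.png" % uuid.uuid4()
    s3.upload_fileobj(io.BytesIO(image_bytes), BUCKET, key)
    return key  # return this to the client instead of a local file path

Every replica of the web server and every worker can call this regardless of which host it runs on, which is exactly the property the container-local filesystem lacks.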
ElastiCache was updated to support more than one node for Redis clusters. In the console, more than one node can be created for the same replication group, but using boto, if the option num_cache_nodes is set to more than 1, the API throws an error saying that it cannot create a cluster with more than one node. Is the boto library not up to date, or is there another gotcha?
As of August 2015 you cannot, even in the console. The field is there, but you cannot set it to anything other than one.
What you can do is create a replication group with several Redis cache clusters (each of size one), one acting as the read/write primary and the rest as read replicas.
At this time, ElastiCache supports single-node Redis clusters.
http://docs.aws.amazon.com/AmazonElastiCache/latest/UserGuide/CacheNode.Redis.html
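A sketch of that setup using boto3 (boto's successor; the original boto2 API differs); all names and sizes below are placeholders:

import boto3

ec = boto3.client("elasticache")

# One primary plus two read replicas; each member is a single-node
# Redis cache cluster inside the replication group.
ec.create_replication_group(
    ReplicationGroupId="my-redis-group",
    ReplicationGroupDescription="primary + 2 read replicas",
    Engine="redis",
    CacheNodeType="cache.t2.micro",
    NumCacheClusters=3,
)

Writes go to the primary's endpoint; reads can be spread across the replica endpoints.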
Couchbase ships with a backup tool called cbbackup (the docs are here).
In my setup, I have a cluster of 4 Couchbase nodes (let's call them CB1, CB2, CB3, CB4).
I want to backup my entire cluster (including design docs and all buckets) but I want to run the backup procedure from my backup server (let's call it B1).
When I run cbbackup (on B1) and point it at CB1, I properly download and save the data from all nodes. However, when it attempts to download the design docs, the backup fails.
Looking through the code, I notice that there is a restriction in two places where node filtering is done that limits the returned list to nodes that begin with the host pattern "localhost" or "127.0.0.1" (this is in pump.py and pump_tap.py).
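In rough paraphrase (this is my reading of the code, not the actual source of pump.py / pump_tap.py), the check amounts to something like:

# Hypothetical paraphrase of the node filter described above; the
# real code in pump.py / pump_tap.py may differ in detail.
def filter_design_nodes(nodes):
    allowed_prefixes = ("localhost", "127.0.0.1")
    return [n for n in nodes
            if n["hostname"].startswith(allowed_prefixes)]

so when cbbackup runs on B1, none of the cluster nodes report themselves as localhost and the design-doc step finds no eligible node.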
Tracing back this logic, it seems to me that it is impossible to back up the design docs on CB1 from any node other than CB1 itself.
Am I wrong? Is this possible?
If it's not possible (follow-up question) can somebody point me to a design decision that justifies this, or explain the logic behind the decision?
Most production environments need an automation script for clusters. Whenever a site is enhanced, a new cluster has to be added to the existing domain; when a physical site is decommissioned, its cluster has to be removed from the domain. There is also the possibility of cluster growth (adding managed servers to a cluster), and, finally, decommissioning machines requires removing servers from a cluster.
Menu
1. Add a Cluster
2. Del a Cluster
3. Add a server to Cluster
4. Del a server from Cluster
Please share your thoughts and suggestions...
Thanks in advance.
I'm actually doing exactly that and it works fine.
You'll have to add the initial edit session calls
edit()
startEdit()
and, to save,
save()
activate(block='true')
as well as exception handling, but the functions are pretty simple:
Add a server to cluster:
managedServer = create(ServerName, 'Server')
managedServer.setListenPort(ListenPort)
# Cluster and Machine are MBean-typed attributes, so pass the MBeans
# rather than plain name strings
managedServer.setCluster(getMBean('/Clusters/' + Clustername))
managedServer.setMachine(getMBean('/Machines/' + Machinename))
Delete a server from a cluster (and, optionally, the server itself):
serverMBean = getMBean("Servers/" + ServerName)
# Detach the server from its cluster and machine first
serverMBean.setCluster(None)
serverMBean.setMachine(None)
# Then remove the server itself (optional)
delete(ServerName, 'Server')
Add a cluster (you can also use the same method as creating a server: create(name, 'Cluster')):
cd('/')
cmo.createCluster('Cluster-0')
# Switch into the new cluster's MBean to configure it
cd('/Clusters/Cluster-0')
cmo.setClusterMessagingMode('unicast')
...
Deleting a cluster works the same way as deleting a server; you should power the servers down first, though.
In general, you can use the admin console to perform the desired actions, and the record button at the top lets you generate a WLST script that does exactly what you did in the console.
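To tie it together with the menu from the question, a hypothetical WLST (Jython) skeleton might look like this; the connection details and example inputs are placeholders:

# Hypothetical skeleton; adjust credentials, URL, and inputs
connect('weblogic', 'welcome1', 't3://adminhost:7001')

def add_server_to_cluster(serverName, listenPort, clusterName, machineName):
    edit()
    startEdit()
    try:
        server = create(serverName, 'Server')
        server.setListenPort(listenPort)
        server.setCluster(getMBean('/Clusters/' + clusterName))
        server.setMachine(getMBean('/Machines/' + machineName))
        save()
        activate(block='true')
    except:
        # Roll back the edit session on any failure
        stopEdit('y')
        raise

choice = raw_input('1) Add cluster 2) Del cluster 3) Add server 4) Del server: ')
if choice == '3':
    add_server_to_cluster('ms1', 7101, 'Cluster-0', 'Machine-0')

disconnect()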