I have never used Redis, so excuse my ignorance. Redis is supposed to store key/value pairs to disk, and queries against it are supposed to be faster than against MySQL. My questions are:
- Can I save the Redis DB not on the primary disk but on an external one? (Do I do this by installing Redis somewhere other than the primary disk?)
- Is there an API or tool where I can see the stored data (something like phpMyAdmin for SQL databases)?
Thanks in advance.
First of all, Redis is an in-memory DB: the whole dataset is kept in memory, but it saves the data to disk according to the persistence settings you configure.
So, to store the data on a different disk you only need to set the destination in your Redis config file: the dir directive sets the directory the dump file is written to, and dbfilename sets the file name (it should be just a name, not a path).
Example of a Redis conf file: https://raw.github.com/antirez/redis/2.6/redis.conf

    dir /my/path
    dbfilename dump.rdb
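If you would rather change this at runtime instead of editing the conf file, the CONFIG GET/SET commands work too. A minimal sketch from Python, assuming the redis-py package is installed and Redis is running locally on the default port (the external path is only a placeholder):

    import redis

    # Connect to the local Redis instance (adjust host/port if yours differ)
    r = redis.Redis(host="localhost", port=6379, decode_responses=True)

    # Show where the dump file currently goes
    print(r.config_get("dir"))         # e.g. {'dir': '/var/lib/redis'}
    print(r.config_get("dbfilename"))  # e.g. {'dbfilename': 'dump.rdb'}

    # Point the snapshot at a directory on another disk (placeholder path)
    r.config_set("dir", "/mnt/external/redis")
    r.config_set("dbfilename", "dump.rdb")

    # Trigger a background snapshot so the new location is actually used
    r.bgsave()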
About the Redis API: you can use redis-cli, a friendly command-line interface for Redis that is included in the Redis installation. But if you prefer something more graphical, take a look at http://redisdesktop.com/, an open-source GUI for Redis.
I also recommend taking a look at the Redis Quick Start guide.
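If by "API" you also mean reading the stored data from your own code, the same redis-py client can do what redis-cli does. A small sketch, assuming a local Redis with default settings:

    import redis

    r = redis.Redis(host="localhost", port=6379, decode_responses=True)

    # Store and read back a value
    r.set("greeting", "hello")
    print(r.get("greeting"))  # hello

    # List keys matching a pattern (scan_iter is safer than KEYS on big DBs)
    for key in r.scan_iter(match="*", count=100):
        print(key, r.type(key))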
I have about 100 million json files (10 TB), each with a particular field containing a bunch of text, for which I would like to perform a simple substring search and return the filenames of all the relevant json files. They're all currently stored on Google Cloud Storage. Normally for a smaller number of files I might just spin up a VM with many CPUs and run multiprocessing via Python, but alas this is a bit too much.
I want to avoid spending too much time setting up infrastructure like a Hadoop server, or loading all of that into some MongoDB database. My question is: what would be a quick and dirty way to perform this task? My original thoughts were to set up something on Kubernetes with some parallel processing running Python scripts, but I'm open to suggestions and don't really have a clue how to go about this.
The easiest option would be to load the GCS data into BigQuery and run your query from there.
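A rough sketch of that route, assuming the files can be read as newline-delimited JSON with a shared schema, that the text field is called body and a filename field exists, and that the google-cloud-bigquery client is installed (project, bucket, dataset, and table names are placeholders):

    from google.cloud import bigquery

    client = bigquery.Client()
    table_id = "my-project.my_dataset.json_docs"  # placeholder

    # Load the JSON files straight from GCS into a BigQuery table
    job_config = bigquery.LoadJobConfig(
        source_format=bigquery.SourceFormat.NEWLINE_DELIMITED_JSON,
        autodetect=True,
    )
    load_job = client.load_table_from_uri(
        "gs://my-bucket/docs/*.json", table_id, job_config=job_config
    )
    load_job.result()  # wait for the load to finish

    # Substring search over the text field
    query = """
        SELECT filename
        FROM `my-project.my_dataset.json_docs`
        WHERE STRPOS(body, @needle) > 0
    """
    job = client.query(
        query,
        job_config=bigquery.QueryJobConfig(
            query_parameters=[
                bigquery.ScalarQueryParameter("needle", "STRING", "search term")
            ]
        ),
    )
    for row in job:
        print(row.filename)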
Alternatively, send your data to AWS S3 and use Amazon Athena.
The Kubernetes option would be to set up a cluster in GKE and install Presto on it with a lot of workers, use a Hive metastore backed by GCS, and query from there (Presto doesn't have a direct GCS connector yet, AFAIK). This option is more elaborate.
Hope it helps!
I have millions of records (about 200 million) which I need to insert into my Cosmos DB.
I've just discovered that Microsoft failed to implement batch insert capability... (1), (2) <--- 3 years to reply.
Rather than throw myself off a Microsoft building, I have started to look for alternative solutions.
One idea I had was to write all my documents to a file and then import that file into my DB.
Now, once I have created my JSON file containing my documents, how can I do the bulk import (from Python 3.6)?
I've come across some migration tool, but I was wondering if there is a better/quicker way, one that doesn't require installing that tool... You see, I will be running my code in a WebJob, so installing the migration tool may not be an option anyway.
I suggest you use the MongoDB driver, which is supported by Azure Cosmos DB (Document DB).
The MongoDB driver works over the binary wire protocol, while the Azure Document DB SDK works over HTTP, and the binary protocol is more efficient than HTTP.
You could use the Bulk Write method to import data into Azure Cosmos DB; each group of operations can contain at most 1,000 operations.
Please refer to the document here.
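As a rough illustration of that approach, using pymongo against the Cosmos DB MongoDB endpoint (the connection string, database, collection, and file names are placeholders; the 1,000-operation batching is the limit mentioned above):

    import json
    from pymongo import MongoClient, InsertOne

    # Placeholder connection string; use the one from your Cosmos DB account
    client = MongoClient(
        "mongodb://account:key@account.documents.azure.com:10255/?ssl=true"
    )
    collection = client["mydb"]["mycollection"]

    BATCH_SIZE = 1000  # Cosmos DB bulk limit mentioned above

    with open("documents.json", "r") as f:
        docs = json.load(f)  # assumes the file holds a JSON array of documents

    for i in range(0, len(docs), BATCH_SIZE):
        batch = [InsertOne(doc) for doc in docs[i:i + BATCH_SIZE]]
        collection.bulk_write(batch, ordered=False)

For 200 million records you would stream the file rather than json.load it all at once, but the batching pattern stays the same.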
Notes:
The maximum size for a document in Azure Document DB is 2 MB, as mentioned here.
I have a Python application which analyzes data from multiple sources in real time. Once the data is analyzed, the result of the analysis is stored in a database along with a timestamp of when it was analyzed.
I would like to access the most recent result of this program remotely from another computer.
I was thinking about using Python sockets and having a server script running on the main computer that runs the application; that way I could access the data using a client script on another computer.
Is there a better way of doing this? Or are there any other solutions out there that can address this need?
Your question is very broad.
Most DB servers provide a method/API to access the data remotely. You can use Python as a client if there is a DB-API module for your DB that supports remote access over the network. For example, if you are using Postgres you could use the psycopg2 module.
If you are using a simple DB such as SQLite then you might be able to use an ODBC driver. Some alternatives are here.
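For example, fetching the most recent result over the network with psycopg2 might look roughly like this (the host, credentials, table, and column names are assumptions):

    import psycopg2

    # Connection details are placeholders; point them at the machine running Postgres
    conn = psycopg2.connect(
        host="192.168.1.10",
        dbname="analysis",
        user="reader",
        password="secret",
    )

    with conn, conn.cursor() as cur:
        # Assumes a table `results` with columns `analyzed_at` and `result`
        cur.execute(
            "SELECT analyzed_at, result FROM results ORDER BY analyzed_at DESC LIMIT 1"
        )
        print(cur.fetchone())

    conn.close()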
Edit
MongoDB provides an API as well: pymongo.
In the end, Redis was the best solution. Considering the original question, the goal was to be able to send data in real time from one computer to another. Solutions such as Redis or RabbitMQ accomplish this well.
With Redis, a server can be set up to publish messages to the network; clients can then subscribe to data channels and receive the messages from a queue.
This Python library was used as the Redis client:
https://pypi.python.org/pypi/redis
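A minimal sketch of that pub/sub setup with redis-py; the host, channel name, and payload format are just examples, and both sides would connect to the machine actually running Redis:

    import json
    import redis

    r = redis.Redis(host="localhost", port=6379, decode_responses=True)

    # --- on the analyzing computer: publish each new result ---
    def publish_result(result):
        r.publish("analysis-results", json.dumps(result))

    # --- on the remote computer: subscribe and consume results ---
    def consume_results():
        pubsub = r.pubsub()
        pubsub.subscribe("analysis-results")
        for message in pubsub.listen():
            if message["type"] == "message":
                print(json.loads(message["data"]))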
I've been using PostgreSQL for the longest time. All of my data lives inside Postgres. I've recently looked into Redis, and it has a lot of powerful features that would otherwise take a couple of lines of Django (Python) code to do. Redis data is persistent as long as the machine it's running on doesn't go down, and you can configure it to write the data it's storing out to disk every 1000 keys or every 5 minutes or so, depending on your choice.
Redis would make a great cache, and it would certainly replace a lot of functions I have written in Python (upvoting a user's post, viewing their friends list, etc.). But my concern is that all of this data would somehow need to be translated over to Postgres. I don't trust storing this data in Redis; I see Redis as a temporary storage solution for quick retrieval of information. It's extremely fast, and this far outweighs doing repetitive queries against Postgres.
I'm assuming the only way I could technically write the Redis data to the database is to save() whatever I get back from a Redis 'get' to the Postgres database through Django.
That's the only solution I could think of. Do you know of any other solutions to this problem?
Redis is increasingly used as a caching layer, much like a more sophisticated memcached, and is very useful in this role. You usually use Redis as a write-through cache for data you want to be durable, and write-back for data you might want to accumulate then batch write (where you can afford to lose recent data).
PostgreSQL's LISTEN and NOTIFY system is very useful for doing selective cache invalidation, letting you purge records from Redis when they're updated in PostgreSQL.
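A rough sketch of that invalidation loop, assuming a trigger in PostgreSQL issues NOTIFY cache_invalidation with the Redis key as the payload (the channel name, key format, and connection details are assumptions):

    import select
    import psycopg2
    import psycopg2.extensions
    import redis

    pg = psycopg2.connect("dbname=app user=app")  # placeholder DSN
    pg.set_isolation_level(psycopg2.extensions.ISOLATION_LEVEL_AUTOCOMMIT)
    cache = redis.Redis()

    cur = pg.cursor()
    cur.execute("LISTEN cache_invalidation;")

    while True:
        # Block until PostgreSQL has a notification for us (5 s timeout)
        if select.select([pg], [], [], 5) == ([], [], []):
            continue
        pg.poll()
        while pg.notifies:
            note = pg.notifies.pop(0)
            # The NOTIFY payload is assumed to be the Redis key to purge
            cache.delete(note.payload)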
For combining it with PostgreSQL, you will find the Redis foreign data wrapper that Andrew Dunstan and Dave Page are working on very interesting.
I'm not aware of any tool that makes Redis into a transparent write-back cache for PostgreSQL. Their data models are probably too different for this to work well. Usually you write changes to PostgreSQL and invalidate their Redis cache entries using listen/notify to a cache manager worker, or you queue changes in Redis then have your app read them out and write them into Pg in chunks.
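The second pattern (queue changes in Redis, flush them to Pg in chunks) can be sketched roughly like this; the list name, table, and columns are assumptions:

    import json
    import psycopg2
    import redis

    r = redis.Redis(decode_responses=True)
    pg = psycopg2.connect("dbname=app user=app")  # placeholder DSN

    # Web tier: push each change onto a Redis list instead of hitting Pg directly
    def record_vote(post_id, delta):
        r.rpush("pending_votes", json.dumps({"post_id": post_id, "delta": delta}))

    # Background worker: drain the list and apply the changes to Postgres in one batch
    def flush_votes(batch_size=500):
        batch = []
        for _ in range(batch_size):
            raw = r.lpop("pending_votes")
            if raw is None:
                break
            batch.append(json.loads(raw))
        if not batch:
            return
        with pg, pg.cursor() as cur:
            cur.executemany(
                "UPDATE posts SET votes = votes + %s WHERE id = %s",
                [(item["delta"], item["post_id"]) for item in batch],
            )

In a real setup you would want something crash-safe (for example moving items to a processing list instead of a plain LPOP), but this shows the shape of the write-back flow.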
Redis is persistent if configured to be so, both through snapshots and a kind of WAL called the AOF (append-only file). Loads of people use it as a primary datastore.
https://redis.io/topics/persistence
If one is referring to the greater world of Redis-compatible (RESP protocol) datastores, many are not limited to in-memory storage:
https://keydb.dev/
http://ssdb.io/
and many more...
I created an app. A copy of it will run on two different computers, but a single SQLite database file needs to be shared by both copies. I mean, both computers need to be able to read and write this database file. For this purpose, I will put the file in a folder on our server that both computers are connected to. How can I get the full path to this file in Python? Or can you suggest any other way, as easy as possible, of doing this task?
SQLite over a network share [stackoverflow.com]
I'd recommend against database files on a network drive. The network filesystem usually isn't robust enough to handle random updates like a DB.
As a previous answer suggested, you'd be better off creating a simple client/server model: a server process has sole access to the SQLite DB, and clients send requests to the server. Don't pass the SQLite DB file back and forth.
You might want to use a full network DB such as MySQL or PostgreSQL.
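A very small sketch of that client/server idea, using only the standard library. The protocol here (one line of SQL in, one line of JSON rows out) is purely illustrative, and the file path, host, and port are placeholders; in practice you would restrict what clients may send:

    # server.py -- runs on the machine that owns the SQLite file
    import json
    import sqlite3
    from socketserver import StreamRequestHandler, ThreadingTCPServer

    DB_PATH = "shared.db"  # placeholder path

    class QueryHandler(StreamRequestHandler):
        def handle(self):
            sql = self.rfile.readline().decode().strip()
            conn = sqlite3.connect(DB_PATH)
            try:
                rows = conn.execute(sql).fetchall()
                conn.commit()
                self.wfile.write((json.dumps(rows) + "\n").encode())
            finally:
                conn.close()

    if __name__ == "__main__":
        ThreadingTCPServer(("0.0.0.0", 9000), QueryHandler).serve_forever()

    # client.py -- run from either computer
    import json
    import socket

    def query(sql, host="server-hostname", port=9000):  # host is a placeholder
        with socket.create_connection((host, port)) as s:
            s.sendall((sql + "\n").encode())
            return json.loads(s.makefile().readline())

    print(query("SELECT * FROM items LIMIT 5"))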
I would have a Python server program running on the server that holds the database file (using the sockets library). Then have the two clients connect to the server program (again, using the sockets library) and receive the database file. You can find some examples for the socket library at http://www.prasannatech.net/2008/07/socket-programming-tutorial.html