Insert big list into Cassandra using python

I have a problem inserting a big list into Cassandra using Python. I have a list of 3200 strings that I want to save in Cassandra:
CREATE TABLE IF NOT EXISTS my_test (
id bigint PRIMARY KEY,
list_strings list<text>
);
When I'm reducing my list I have no problem. It works.
prepared_statement = session.prepare("INSERT INTO my_test (id, list_strings) VALUES (?, ?)")
session.execute(prepared_statement, [id, strings[:5]])
But if I keep the whole list, I get an error:
Error from server: code=1500 [Replica(s) failed to execute write] message="Operation failed - received 0 responses and 1 failures" info={'required_responses': 1, 'consistency': 'LOCAL_ONE', 'received_responses': 0, 'failures': 1}
How can I insert a big list into Cassandra?

A DB array type is not supposed to hold that amount of data. Using a separate row of the table to store each string would be better:
id         | time        | strings
-----------+-------------+---------
bigint     | timestamp   | string
partition  | clustering  |
Using id as the clustering key would be a bad solution: when requesting all the tweets for a user id, reads would hit multiple nodes, whereas using it as the partition key only requires a read on one node per user.
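As an illustration only, here is a minimal sketch of that per-row layout with the Python driver; the table and column names are made up for the example (not from the original post), and session and strings are the asker's existing session object and 3200-element list:
from datetime import datetime, timedelta

session.execute("""
    CREATE TABLE IF NOT EXISTS my_test_rows (
        id     bigint,
        time   timestamp,
        string text,
        PRIMARY KEY (id, time)   -- id = partition key, time = clustering key
    )
""")

insert_stmt = session.prepare(
    "INSERT INTO my_test_rows (id, time, string) VALUES (?, ?, ?)")

base = datetime.utcnow()
for i, s in enumerate(strings):
    # Offset each row by one millisecond so clustering keys do not collide.
    session.execute(insert_stmt, [id, base + timedelta(milliseconds=i), s])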

Related

How do I get only the value of a column identified by a unique userid in MySQL? (Please refer to the question below for a more detailed explanation.)

I'm currently writing an RPG game in Python that uses a MySQL database to store info on players. However, I've come across a problem.
Sample Code of How Database has been Set Up:
playerinfo Table
userID | money | xp |
-------+-------+----+
     1 |   200 | 20 |
     2 |   100 | 10 |
I'm trying to select the amount of money with only the value. My select query right now is
SELECT money FROM playerinfo WHERE id = 1
The full code/function for selecting the info is:
def get_money_stats(user_id):
    global monresult
    remove_characters = ["{", "}", "'"]
    try:
        with connection.cursor() as cursor:
            monsql = "SELECT money FROM players WHERE userid = %s"
            value = user_id
            cursor.execute(monsql, value)
            monresult = str(cursor.fetchone())
    except Exception as e:
        print(f"An error Occurred> {e}")
CURRENT OUTPUT:
{'money': 200}
DESIRED OUTPUT:
200
Basically, all I want to select is the INT/DATA from the player's row (identified by unique userid). How do I do that? The only solution I have is to replace the characters with something else but I don't really want to do that as it's incredibly inconvenient and messy. Is there another way to reformat/select the data?
It seems that fetching one row gives you a dictionary of the selected columns with their values, which looks like the correct approach to me. You should simply access the dictionary with the column that you want to retrieve:
monresult = cursor.fetchone()['money']
If you don't want to specify the column again (which you should), you could get the values of the dictionary as a list and retrieve the first one:
monresult = list(cursor.fetchone().values())[0]
I do not recommend this last approach because it is heavily dependent on the current shape of the query and may have to change whenever the query changes.
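As a minimal sketch, assuming a PyMySQL-style connection whose cursor returns dictionaries (e.g. one created with pymysql.cursors.DictCursor), and reusing the connection and players table from the question, the whole function can be reduced to:
def get_money_stats(user_id):
    with connection.cursor() as cursor:
        cursor.execute("SELECT money FROM players WHERE userid = %s", (user_id,))
        row = cursor.fetchone()          # e.g. {'money': 200}, or None if no match
        return row['money'] if row is not None else None

print(get_money_stats(1))  # 200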

SQL - Possible to Auto Increment Number but with leading zeros?

Our IDs look something like "CS0000001", which stands for the customer with ID 1. Is this possible to do with SQL and auto-increment, or do I need to do that in my GUI?
I need the leading zeroes, but with auto-incrementing to prevent double usage if I am constructing the ID in Python and inserting it into the DB.
Is that possible?
You have a few choices:
1. Construct the CustomerID in the code that inserts the data into the Customer table (application side, requires a change in your code); a hedged Python sketch of this option follows at the end of this answer.
2. Create a view on top of the Customer table that contains the logic and use that whenever you need the CustomerID (database side, requires a change in your code).
3. Use a procedure to do the inserts and construct the CustomerID in the procedure (database side, requires a change in your code).
Possible realization.
Create data table
CREATE TABLE data (id CHAR(9) NOT NULL DEFAULT '',
val TEXT,
PRIMARY KEY (id));
Create service table
CREATE TABLE ids (id INT NOT NULL AUTO_INCREMENT PRIMARY KEY);
Create trigger which generates id value
CREATE TRIGGER tr_bi_data
BEFORE INSERT
ON data
FOR EACH ROW
BEGIN
    INSERT INTO ids () VALUES ();
    SET NEW.id = CONCAT('CS', LPAD(LAST_INSERT_ID(), 7, '0'));
    DELETE FROM ids;
END
Create trigger which prohibits id value change
CREATE TRIGGER tr_bu_data
BEFORE UPDATE
ON data
FOR EACH ROW
BEGIN
    SET NEW.id = OLD.id;
END
Insert some data, check result
INSERT INTO data (val) VALUES ('data-1'), ('data-2');
SELECT * FROM data;
id | val
:-------- | :-----
CS0000001 | data-1
CS0000002 | data-2
Try to update, ensure id change prohibited
UPDATE data SET id = 'CS0000100' WHERE val = 'data-1';
SELECT * FROM data;
id | val
:-------- | :-----
CS0000001 | data-1
CS0000002 | data-2
Insert some more data, ensure the enumeration continues
INSERT INTO data (val) VALUES ('data-3'), ('data-4');
SELECT * FROM data;
id | val
:-------- | :-----
CS0000001 | data-1
CS0000002 | data-2
CS0000003 | data-3
CS0000004 | data-4
Check service table is successfully cleared
SELECT COUNT(*) FROM ids;
| COUNT(*) |
| -------: |
| 0 |
db<>fiddle here
Disadvantages:
Additional table needed.
Editing the generated id value is disabled (copy and delete the old record instead; a custom value cannot be set).
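For comparison, a hedged sketch of the application-side option (the first in the list above), assuming a MySQL connection from Python and a Customer table with an AUTO_INCREMENT id column; the name and customer_code columns are illustrative, not from the original post:
with connection.cursor() as cursor:
    cursor.execute("INSERT INTO Customer (name) VALUES (%s)", ("Alice",))
    numeric_id = cursor.lastrowid                  # auto-increment value just assigned
    customer_code = "CS{:07d}".format(numeric_id)  # e.g. 1 -> 'CS0000001'
    cursor.execute("UPDATE Customer SET customer_code = %s WHERE id = %s",
                   (customer_code, numeric_id))
connection.commit()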

Postgresql: Insert from huge csv file, collect the ids and respect unique constraints

In a postgresql database:
class Persons(models.Model):
    person_name = models.CharField(max_length=10, unique=True)
The persons.csv file, contains 1 million names.
$cat persons.csv
Name-1
Name-2
...
Name-1000000
I want to:
Create the names that do not already exist
Query the database and fetch the id for each name contained in the csv file.
My approach:
Use the COPY command or the django-postgres-copy application that implements it.
Also take advantage of the new Postgresql-9.5+ upsert feature.
Now all the names in the csv file are also in the database.
I need to get their ids from the database, either in memory or in another csv file, in an efficient way:
Use Q objects
list_of_million_q = <iterate csv and append Qs>
million_names = Names.objects.filter(list_of_million_q)
or
Use __in to filter based on a list of names:
list_of_million_names = <iterate csv and append strings>
million_names = Names.objects.filter(
    person_name__in=list_of_million_names
)
or
?
I do not feel that any of the above approaches for fetching the ids is efficient.
Update
There is a third option, along the lines of this post, that should be a great solution combining all of the above.
Something like:
SELECT * FROM persons;
Make a name: id dictionary out of the names received from the database:
db_dict = {'Harry': 1, 'Bob': 2, ...}
Query the dictionary:
ids = []
for name in list_of_million_names:
    if name in db_dict:
        ids.append(db_dict[name])
This way you're using the quick dictionary indexing as opposed to the slower if x in list approach.
But the only way to really know for sure is to benchmark these 3 approaches.
This post describes how to use RETURNING with ON CONFLICT: while the contents of the csv file are inserted into the database, the ids are saved in another table, both when an insertion succeeds and when, due to the unique constraint, the insertion is skipped.
I have tested it in sqlfiddle, using a setup that resembles the one used for the COPY command, which inserts into the database straight from a csv file while respecting the unique constraints.
The schema:
CREATE TABLE IF NOT EXISTS label (
id serial PRIMARY KEY,
label_name varchar(200) NOT NULL UNIQUE
);
INSERT INTO label (label_name) VALUES
('Name-1'),
('Name-2');
CREATE TABLE IF NOT EXISTS ids (
id serial PRIMARY KEY,
label_ids varchar(12) NOT NULL
);
The script:
CREATE TEMP TABLE tmp_table
(LIKE label INCLUDING DEFAULTS)
ON COMMIT DROP;
INSERT INTO tmp_table (label_name) VALUES
('Name-2'),
('Name-3');
WITH ins AS (
    INSERT INTO label
    SELECT *
    FROM tmp_table
    ON CONFLICT (label_name) DO NOTHING
    RETURNING id
)
INSERT INTO ids (label_ids)
SELECT id FROM ins
UNION ALL
SELECT l.id
FROM tmp_table
JOIN label l USING (label_name);
The output can be inspected with:
SELECT * FROM ids;
SELECT * FROM label;
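For completeness, a hedged Python sketch of running the same script with psycopg2 and collecting the ids in memory instead of in the ids table; the connection string and file path are placeholders:
import psycopg2

conn = psycopg2.connect("dbname=mydb")
with conn, conn.cursor() as cur:
    # Temp table is dropped automatically when the transaction commits.
    cur.execute("""
        CREATE TEMP TABLE tmp_table
        (LIKE label INCLUDING DEFAULTS)
        ON COMMIT DROP
    """)
    with open("persons.csv") as f:
        cur.copy_expert("COPY tmp_table (label_name) FROM STDIN", f)
    cur.execute("""
        WITH ins AS (
            INSERT INTO label (label_name)
            SELECT label_name FROM tmp_table
            ON CONFLICT (label_name) DO NOTHING
            RETURNING id, label_name
        )
        SELECT id, label_name FROM ins
        UNION ALL
        SELECT l.id, l.label_name
        FROM tmp_table JOIN label l USING (label_name)
    """)
    name_to_id = {name: id_ for id_, name in cur.fetchall()}  # e.g. {'Name-1': 1, ...}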

How to update all column values with new values in a MySQL database

I am using Python to pull data from an API and update a MySQL database with those values. I realized that, as my code is right now, the data I want from the API is only INSERTED into the database the first time the code is run, but the database needs to be updated with new values whenever the code is executed after that. The API provides close to real-time values of position, speed, altitude etc. of aircraft currently in flight. My database looks like this:
table aircraft:
+-------------+------------+------------+-----------+-----------+
| longitude | latitude | velocity | altitude | heading |
+-------------+------------+------------+-----------+-----------+
| | | | | |
| | | | | |
I am quite new to MySQL and I am having trouble figuring out how to do this the right way. The point is to run the code in order to update the table whenever I want, or possibly every 10 seconds or so. I am using Python's MySQLdb module to execute SQL commands within the Python code. Here is the main part of what I currently have.
#"states" here is a list of state vectors from the api that have the data I want
states = api.get_states()
#creates a cursor object to execute SQL commands
#the parameter to this function should be an SQL command
cursor = db.cursor()
#"test" is the database name
cursor.execute("USE test")
print("Adding states from API ")
for s in states.states:
if( s.longitude is not None and s.latitude is not None):
cursor.execute("INSERT INTO aircraft(longitude, latitude) VALUES (%r, %r);", (s.longitude, s.latitude))
else:
("INSERT INTO aircraft(longitude, latitude) VALUES (NULL, NULL);")
if(s.velocity is not None):
cursor.execute("INSERT INTO aircraft(velocity) VALUES (%r);" % s.velocity)
else:
cursor.execute("INSERT INTO aircraft(velocity) VALUES (NULL);")
if(s.altitude is not None):
cursor.execute("INSERT INTO aircraft(altitude) VALUES (%r);" % s.altitude)
else:
cursor.execute("INSERT INTO aircraft(altitude) VALUES (NULL);")
if(s.heading is not None):
cursor.execute("INSERT INTO aircraft(heading) VALUES (%r);" % s.heading)
else:
cursor.execute("INSERT INTO aircraft(heading) VALUES (NULL);")
db.commit()
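The posted table has no key to match rows on, so one hedged option (not from the original post) is to add a unique aircraft identifier column and upsert on it with INSERT ... ON DUPLICATE KEY UPDATE; the icao24 column below is an assumption about what the state vectors provide:
# Assumes something like: ALTER TABLE aircraft ADD COLUMN icao24 VARCHAR(6) PRIMARY KEY;
upsert = """
    INSERT INTO aircraft (icao24, longitude, latitude, velocity, altitude, heading)
    VALUES (%s, %s, %s, %s, %s, %s)
    ON DUPLICATE KEY UPDATE
        longitude = VALUES(longitude),
        latitude  = VALUES(latitude),
        velocity  = VALUES(velocity),
        altitude  = VALUES(altitude),
        heading   = VALUES(heading)
"""
cursor = db.cursor()
cursor.execute("USE test")
for s in states.states:
    cursor.execute(upsert, (s.icao24, s.longitude, s.latitude,
                            s.velocity, s.altitude, s.heading))
db.commit()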

Gather rows under the same key in Cassandra/Python

I was working through Pattern 1 and was wondering if there is a way, in Python or otherwise, to "gather" all the rows with the same id into a single row with dictionaries.
CREATE TABLE temperature (
    weatherstation_id text,
    event_time timestamp,
    temperature text,
    PRIMARY KEY (weatherstation_id, event_time)
);
Insert some data
INSERT INTO temperature(weatherstation_id,event_time,temperature)
VALUES ('1234ABCD','2013-04-03 07:01:00','72F');
INSERT INTO temperature(weatherstation_id,event_time,temperature)
VALUES ('1234ABCD','2013-04-03 07:02:00','73F');
INSERT INTO temperature(weatherstation_id,event_time,temperature)
VALUES ('1234ABCD','2013-04-03 07:03:00','73F');
INSERT INTO temperature(weatherstation_id,event_time,temperature)
VALUES ('1234ABCD','2013-04-03 07:04:00','74F');
Query the database.
SELECT weatherstation_id,event_time,temperature
FROM temperature
WHERE weatherstation_id='1234ABCD';
Result:
weatherstation_id | event_time | temperature
-------------------+--------------------------+-------------
1234ABCD | 2013-04-03 06:01:00+0000 | 72F
1234ABCD | 2013-04-03 06:02:00+0000 | 73F
1234ABCD | 2013-04-03 06:03:00+0000 | 73F
1234ABCD | 2013-04-03 06:04:00+0000 | 74F
This works, but I was wondering if I can turn it into a single row per weatherstation_id.
E.g.
{
"weatherstationid": "1234ABCD",
"2013-04-03 06:01:00+0000": "72F",
"2013-04-03 06:02:00+0000": "73F",
"2013-04-03 06:03:00+0000": "73F",
"2013-04-03 06:04:00+0000": "74F"
}
Is there some parameter in the Cassandra driver that can be specified to gather rows by a certain id (weatherstation_id) and turn everything else into a dictionary? Or does there need to be some Python magic to turn the list of rows into a single row per id (or set of ids)?
Alex, you will have to do some post-execution data processing to get this format. The driver returns rows one by one, no matter what row_factory you use.
One of the reasons the driver cannot produce the format you suggest is that pagination is involved internally (the default fetch_size is 5000), so results generated that way could be partial or incomplete. Additionally, this can easily be done in Python once the query execution is finished and you are sure that all required results have been fetched.
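A hedged sketch of that post-processing step, assuming a cassandra-driver session and the temperature table from the question:
from collections import defaultdict

rows = session.execute(
    "SELECT weatherstation_id, event_time, temperature "
    "FROM temperature WHERE weatherstation_id = %s", ('1234ABCD',))

gathered = defaultdict(dict)
for row in rows:   # iterating the result set pages through all rows transparently
    gathered[row.weatherstation_id][str(row.event_time)] = row.temperature

# gathered['1234ABCD'] -> {'2013-04-03 06:01:00': '72F', ...}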
