Joined queries in MySQL and their analogue in MongoDB - Python

Good day, dear colleagues. I decided to move some projects from MySQL to MongoDB and ran into several difficulties.
For example, there are two tables in MySQL:
Users:
CREATE TABLE `testdb`.`users` (
    `id` INT( 11 ) NOT NULL AUTO_INCREMENT PRIMARY KEY ,
    `name` VARCHAR( 55 ) NOT NULL ,
    `password` VARCHAR( 32 ) NOT NULL
) ENGINE = MYISAM
Rules:
CREATE TABLE `testdb`.`rules` (
    `id` INT NOT NULL AUTO_INCREMENT PRIMARY KEY ,
    `uid` INT NOT NULL ,
    `title` VARCHAR( 155 ) NOT NULL ,
    `points` INT NOT NULL
) ENGINE = MYISAM
Now, to choose all "rules" that belong to a particular user, I can run the SQL query:
SELECT r.`title`, r.`points` FROM `rules` r, `users` u WHERE r.`uid` = u.`id` AND u.`id` = '123'
So far I can't figure out how to do the same in MongoDB; can you please explain and provide an example?
P.S. I am implementing this in Python with the help of pymongo.
P.P.S. I would also like to see alternative ways of solving this problem with the help of the ORMs mongoengine or mongokit.
Thank you in advance :)

MongoDB does not support joins, unlike RDBMSs such as MySQL, because MongoDB is not a relational database. Modelling data in MongoDB the same way you would in an RDBMS is therefore generally a bad idea; you have to design your schemas with a whole different mindset.
In this case, for example, in MongoDB you could have one document per user, with the rules belonging to each user nested inside.
e.g.
{
    "ID": 1,
    "name": "John",
    "password": "eek hope this is secure",
    "rules": [
        {
            "ID": 1,
            "Title": "Rule 1",
            "Points": 100
        },
        {
            "ID": 2,
            "Title": "Rule 2",
            "Points": 200
        }
    ]
}
This means you only need a single read to pull back a user and all of their rules.
A good starting point is the MongoDB.org reference on Schema Design; what I'm describing above is embedding objects.
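To make the embedding concrete, here is a minimal pymongo sketch of the approach described above; the connection string, database name, and field names are illustrative assumptions, not anything from the original setup.

# Minimal pymongo sketch of the embedded-document approach (illustrative names).
from pymongo import MongoClient

client = MongoClient("mongodb://localhost:27017/")  # assumed local instance
users = client["testdb"]["users"]

# One document per user, with that user's rules embedded as a list.
users.insert_one({
    "_id": 123,
    "name": "John",
    "password": "eek hope this is secure",
    "rules": [
        {"title": "Rule 1", "points": 100},
        {"title": "Rule 2", "points": 200},
    ],
})

# The equivalent of the SQL join: a single read returns the user with all rules.
doc = users.find_one({"_id": 123}, {"rules.title": 1, "rules.points": 1})
for rule in doc["rules"]:
    print(rule["title"], rule["points"])

If you prefer mongoengine (per the P.P.S.), roughly the same model can be declared with an EmbeddedDocument; again, this is a sketch rather than a drop-in solution:

# Rough mongoengine equivalent: Rule documents embedded in a User document.
from mongoengine import (Document, EmbeddedDocument, EmbeddedDocumentField,
                         IntField, ListField, StringField, connect)

class Rule(EmbeddedDocument):
    title = StringField(required=True)
    points = IntField(required=True)

class User(Document):
    name = StringField(required=True)
    password = StringField(required=True)
    rules = ListField(EmbeddedDocumentField(Rule))

connect("testdb")  # assumed database name
user = User(name="John", password="secret",
            rules=[Rule(title="Rule 1", points=100)]).save()
for rule in User.objects(name="John").first().rules:
    print(rule.title, rule.points)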


Usage of postgres jsonb

I'm trying to figure out how to work better with JSON in Postgres.
I have a file that stores information about many tables (structure and values). The file is updated periodically, which may mean changes in the data as well as in the table structures, so in effect I am dealing with dynamic tables.
As a result, I have a JSON table structure (the key is the column name, the value is the field type: string or number only) and a list of JSON records for each table.
Something like this (the actual structure does not matter):
{
    'table_name': 'table1',
    'columns': {
        'id': 'int',
        'data1': 'string',
        'data2': 'string'
    },
    'values': [
        [1, 'aaa', 'bbb'],
        [2, 'ccc', 'ddd']
    ]
}
At first I wanted to create a real table for each table in the file, truncate it when updating the data, and drop it if the table structure changed. The second option, which I'm testing now, is a single table with JSON data:
CREATE TABLE IF NOT EXISTS public.data_tables
(
    id integer NOT NULL,
    table_name character varying(50),
    row_data jsonb,
    CONSTRAINT data_tables_pkey PRIMARY KEY (id)
)
And now there is the question of how to properly work with the JSON:
directly query row_data, e.g. row_data->>'id' = '1', with a hash index on the 'id' key
use jsonb_populate_record with custom types for each table (yes, I need to recreate them each time the table structure changes)
or probably some other way to work with it?
The first option is the easiest and fast because of the index, but there is no data type control and you have to put the expression in every query.
The second option is more difficult to implement, but easier to use in queries. I can even create views for each table with jsonb_populate_record. But as far as I can see, indexes won't work with a JSON function?
Perhaps there is a better way? Or is recreating the tables not such a bad option?
First of all, your JSON string is not in the correct format. Here is a corrected sample JSON string:
{
    "table_name": "table1",
    "columns": {
        "id": "integer",
        "data1": "text",
        "data2": "text"
    },
    "values": [
        {
            "id": 1,
            "data1": "aaa",
            "data2": "bbb"
        },
        {
            "id": 2,
            "data1": "ccc",
            "data2": "ddd"
        }
    ]
}
I wrote a sample function for you, but only for creating the tables from the JSON. You can write the SQL for the insert step in the same way; it is not difficult.
Sample function:
CREATE OR REPLACE FUNCTION dynamic_create_table()
RETURNS boolean
LANGUAGE plpgsql
AS $function$
DECLARE
    rec record;
BEGIN
    FOR rec IN
        SELECT
            t1.table_name,
            string_agg(t2.pkey || ' ' || t2.pval || ' NULL', ', ') AS sql_columns
        FROM data_tables t1
        CROSS JOIN jsonb_each_text(t1.row_data->'columns') t2(pkey, pval)
        GROUP BY t1.table_name
    LOOP
        EXECUTE 'create table ' || rec.table_name || ' (' || rec.sql_columns || ')';
    END LOOP;
    RETURN true;
END;
$function$;
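The insert step mentioned above can be driven from the same JSON. Below is a rough psycopg2 sketch that builds the INSERT statements dynamically from the corrected structure; the file name, connection string, and the assumption that the target tables already exist are all illustrative.

# Hypothetical sketch of the insert step, assuming the corrected JSON layout above
# and that dynamic_create_table() has already created the target tables.
import json
import psycopg2
from psycopg2 import sql

with open("tables.json") as f:  # illustrative file name
    payload = json.load(f)

conn = psycopg2.connect("dbname=testdb user=postgres")  # placeholder DSN
with conn, conn.cursor() as cur:
    columns = list(payload["columns"].keys())
    insert = sql.SQL("INSERT INTO {} ({}) VALUES ({})").format(
        sql.Identifier(payload["table_name"]),
        sql.SQL(", ").join(map(sql.Identifier, columns)),
        sql.SQL(", ").join(sql.Placeholder() * len(columns)),
    )
    for record in payload["values"]:
        cur.execute(insert, [record[c] for c in columns])
conn.close()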

Best approach for inserting JSON info into a database

I have managed to extract some information from an API, but the format it's in is hard for a novice programmer like me. I can save it to a file or move it to a new list, etc., but what stumps me is: should I leave the data alone and insert it as is, or should I turn it into a more human-readable format and basically deconstruct it before using it?
The JSON was already difficult, as it was a nested dictionary and the value was a list. So after trying things out I want it to actually sit in a database. I am using PostgreSQL as the database for now and am learning Python.
response = requests.post(url3, headers=headers)
jsonResponse = response.json()
my_data = jsonResponse['message_response']['scanresults'][:]

store_list = []
for item in my_data:
    dev_details = {"mac": None, "user": None, "resource_name": None}
    dev_details['mac'] = item['mac_address']
    dev_details['user'] = item['agent_logged_on_users']
    dev_details['devName'] = item['resource_name']
    store_list.append(dev_details)

try:
    connection = psycopg2.connect(
        user="",
        other_info="")
    # create cursor to perform db actions
    curs = connection.cursor()
    sql = "INSERT INTO public.tbl_devices (mac, user, devName) VALUES (%(mac)s, %(user)s, %(devName)s);"
    curs.execute(sql, store_list)
    connection.commit()
finally:
    if (connection):
        curs.close()
        connection.close()
        print("Connection terminated")
I have ended up with dictionaries as records inside a list:
[{rec1},{rec2}..etc]
Naturally, when putting the info into the database, it complains about "list indices must be integers or slices", so I am looking for advice on A) the way to add this into a database table or B) whether to use a different approach.
Many thanks in advance
Good that you asked! The answer is almost certainly that you should not just dump the JSON into the database as it is. That makes things easy in the beginning, but you'll pay the price when you try to query or modify the data later.
For example, if you have data like
[
    { "name": "a", "keys": [1, 2, 3] },
    { "name": "b", "keys": [4, 5, 6] }
]
create tables
CREATE TABLE key_list (
    id bigint GENERATED ALWAYS AS IDENTITY PRIMARY KEY,
    name text NOT NULL
);

CREATE TABLE key (
    id bigint GENERATED ALWAYS AS IDENTITY PRIMARY KEY,
    value integer NOT NULL,
    key_list_id bigint REFERENCES key_list NOT NULL
);
and store the values in that fashion.
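To tie this back to the code in the question, here is a rough psycopg2 sketch of loading such a list of dicts into the two normalized tables above, one parent row at a time; the DSN is a placeholder and the record layout follows the example JSON.

# Sketch: inserting the example JSON into the normalized tables above.
# The DSN is a placeholder; table and column names follow the example schema.
import psycopg2

records = [
    {"name": "a", "keys": [1, 2, 3]},
    {"name": "b", "keys": [4, 5, 6]},
]

conn = psycopg2.connect("dbname=testdb user=postgres")  # placeholder DSN
with conn, conn.cursor() as cur:
    for rec in records:
        # Insert the parent row and get its generated id back.
        cur.execute("INSERT INTO key_list (name) VALUES (%s) RETURNING id",
                    (rec["name"],))
        key_list_id = cur.fetchone()[0]
        # One child row per value in the list.
        cur.executemany("INSERT INTO key (value, key_list_id) VALUES (%s, %s)",
                        [(v, key_list_id) for v in rec["keys"]])
conn.close()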

Python doesn't select from joined tables in an SQLite database, while just running the query works fine

I am using Python 3.6.3 and SQLite 3.14.2. I have two tables, one of which has a foreign key pointing to the other. When I run the query with the join in SQLite browser, it works fine and returns the results I need. But when I try to execute the query in Python, it always returns an empty list. No matter how simple I make the join, the result is the same. Can anyone help me? Thank you in advance.
query = '''SELECT f.ID, f.FoodItemName, f.WaterPerKilo, r.AmountInKilo FROM
FoodItems AS f INNER JOIN RecipeItems AS r on f.ID=r.FoodItemID
WHERE r.RecipeID = {:d}'''.format(db_rec[0])
print(query)
db_fooditems = cur.execute(query).fetchall() #this returns []
The Tables are as follows:
CREATE TABLE "FoodItems" (
`ID` INTEGER NOT NULL PRIMARY KEY AUTOINCREMENT UNIQUE,
`FoodItemName` TEXT NOT NULL,
`WaterPerKilo` REAL NOT NULL)
CREATE TABLE "RecipeItems" (
`ID` INTEGER NOT NULL PRIMARY KEY AUTOINCREMENT UNIQUE,
`RecipeID` INTEGER NOT NULL,
`FoodItemID` INTEGER NOT NULL,
`AmountInKilo` REAL NOT NULL)
with some random data.
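As a sanity check, the schema and query from the question can be reproduced end to end with sqlite3 against an in-memory database, using a ? placeholder instead of string formatting; the sample rows below are made up. If this prints a joined row, the query itself is fine, and the empty result is more likely down to the data or the db_rec[0] value being passed.

# Self-contained reproduction of the question's schema and join, with made-up data.
import sqlite3

con = sqlite3.connect(":memory:")
cur = con.cursor()
cur.execute('''CREATE TABLE FoodItems (
    ID INTEGER NOT NULL PRIMARY KEY AUTOINCREMENT UNIQUE,
    FoodItemName TEXT NOT NULL,
    WaterPerKilo REAL NOT NULL)''')
cur.execute('''CREATE TABLE RecipeItems (
    ID INTEGER NOT NULL PRIMARY KEY AUTOINCREMENT UNIQUE,
    RecipeID INTEGER NOT NULL,
    FoodItemID INTEGER NOT NULL,
    AmountInKilo REAL NOT NULL)''')
cur.execute("INSERT INTO FoodItems (FoodItemName, WaterPerKilo) VALUES ('Rice', 1.5)")
cur.execute("INSERT INTO RecipeItems (RecipeID, FoodItemID, AmountInKilo) VALUES (1, 1, 0.25)")
con.commit()

recipe_id = 1  # stands in for db_rec[0]
rows = cur.execute('''SELECT f.ID, f.FoodItemName, f.WaterPerKilo, r.AmountInKilo
                      FROM FoodItems AS f
                      INNER JOIN RecipeItems AS r ON f.ID = r.FoodItemID
                      WHERE r.RecipeID = ?''', (recipe_id,)).fetchall()
print(rows)  # expect one joined row for the made-up data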

Python sqlite3: INSERT into table WHERE NOT EXISTS, using ? substitution parameter

I'm creating a table of descriptions from a list of not-necessarily-unique descriptions. I would like the table to contain only distinct descriptions, so while inserting descriptions into the table I need to check whether they already exist. My code (simplified) looks something like the following:
cur.execute(''' CREATE TABLE descriptions
(id INTEGER PRIMARY KEY AUTOINCREMENT, desc TEXT)''')
descripts = ["d1", "d2", "d3", "d4", "d3", "d1", "d5", "d6", "d7", "d2"]
cur.executemany('''
INSERT INTO descriptions(desc)
VALUES (?)
WHERE NOT EXISTS (
SELECT *
FROM descriptions as d
WHERE d.desc=?)
''', zip(descripts, descripts))
The result is OperationalError: near "WHERE": syntax error, and I'm not sure exactly where I'm going wrong.
Just a note: I realize I could solve this using a set() structure in python, but for academic reasons this is not permitted.
Thanks
Replacing VALUES with SELECT should work:
cursor.executemany('''
    INSERT INTO descriptions(desc)
    SELECT (?)
    WHERE NOT EXISTS (
        SELECT *
        FROM descriptions AS d
        WHERE d.desc = ?)''',
    zip(descripts, descripts))
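Here is a small end-to-end run of the corrected statement against an in-memory database, just to show the deduplication in action; the table and sample data are taken from the question.

# End-to-end check of the corrected INSERT ... SELECT ... WHERE NOT EXISTS.
import sqlite3

con = sqlite3.connect(":memory:")
cur = con.cursor()
cur.execute('''CREATE TABLE descriptions
               (id INTEGER PRIMARY KEY AUTOINCREMENT, desc TEXT)''')

descripts = ["d1", "d2", "d3", "d4", "d3", "d1", "d5", "d6", "d7", "d2"]
cur.executemany('''
    INSERT INTO descriptions(desc)
    SELECT (?)
    WHERE NOT EXISTS (
        SELECT * FROM descriptions AS d WHERE d.desc = ?)''',
    zip(descripts, descripts))
con.commit()

print(cur.execute("SELECT * FROM descriptions").fetchall())
# expect seven rows: d1 through d7, each exactly once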

Efficiently querying a graph structure

I have a database which consists of a graph. The table I need to access looks like this:
Sno  Source  Dest
1    'jack'  'bob'
2    'jack'  'Jill'
3    'bob'   'Jim'
Here Sno is the primary key. Source and Dest are two non-unique values which represent an edge between nodes in my graph; they may also be strings rather than a numeric data type. I have around 5 million entries in my database, and I have built it using PostgreSQL with psycopg2 for Python.
It is very easy and quick to query by the primary key. However, I frequently need to query this database for all the Dest values a particular Source is connected to. Right now I achieve this with the query:
SELECT * FROM name_table WHERE Source = 'jack'
This turns out to be quite inefficient (up to 2 seconds per query), and there is no way I can make this the primary key, as it is not unique. Is there any way I can build an index on these repeated values and query them quickly?
This should make your query much faster:
CREATE INDEX table_name_index_source ON table_name (Source);
However, there are many options which you can use:
PostgreSQL Documentation
CREATE [ UNIQUE ] INDEX [ CONCURRENTLY ] [ name ] ON table [ USING method ]
    ( { column | ( expression ) } [ COLLATE collation ] [ opclass ] [ ASC | DESC ] [ NULLS { FIRST | LAST } ] [, ...] )
    [ WITH ( storage_parameter = value [, ... ] ) ]
    [ TABLESPACE tablespace ]
    [ WHERE predicate ]
Read more about indexing with PostgreSQL in their Documentation.
Update
If your table stays as small as it is now, this will definitely help. However, if your dataset keeps growing, you should probably consider a schema change to get unique values which can be indexed more efficiently.
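As a rough illustration, the index can also be created and used from Python; the DSN below is a placeholder and the index name is made up, while the table and column names follow the question.

# Sketch: create the index once, then run the lookup from the question.
import psycopg2

conn = psycopg2.connect("dbname=graphdb user=postgres")  # placeholder DSN
with conn, conn.cursor() as cur:
    # One-time setup: a plain b-tree index on the Source column.
    cur.execute("CREATE INDEX IF NOT EXISTS name_table_source_idx "
                "ON name_table (Source)")

with conn, conn.cursor() as cur:
    cur.execute("SELECT * FROM name_table WHERE Source = %s", ("jack",))
    for sno, source, dest in cur.fetchall():
        print(source, "->", dest)

conn.close()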
