SQL Alchemy: Convert Row Values to Column Names - python

I've got a table with columns like this:
DataID (Primary auto-incrementing key)
TestID (Foreign Key)
FormID (Foreign Key)
VariableName (String Type)
Data (String Type)
Which means the data often looks like this:
DataID, TestID, FormID, VariableName, Data
1, 1, 1, Name, Billy
2, 1, 1, Date, 02/02/2023
3, 2, 1, Name, Bob
4, 2, 1, Date, 02/01/2023
I'd like to be able to run an SQL Alchemy query that will return the data to me in this format instead:
TestID, FormID, Name, Date
1, 1, Billy, 02/02/2023
2, 1, Bob, 02/01/2023
NOTE: The DataID is not needed and the VariableName/Data pairs are to be grouped by TestID.
I tried the advice from these similar posts:
SQL Pivot Operation in SQLAlchemy ,
Pivot in SQLAlchemy ,
SQLAlchemy Column to Row Transformation and vice versa -- is it possible?
However, none of them offers a purely SQL Alchemy option. When I tried the approach from the last link, I got stuck because I do not know what kind of table I would need to create in my database to represent the relationship and association_proxy.
Any help/advice is appreciated.
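One common way to get this shape is conditional aggregation: group by TestID and FormID, and pick each variable's value with MAX(CASE ...). The following is only a sketch under stated assumptions, not a definitive answer. It uses the standard-library sqlite3 module instead of SQL Alchemy so that it runs on its own, and the table name form_data is hypothetical. In SQL Alchemy the same query can be built from func.max() wrapped around case(), combined with group_by().

```python
import sqlite3

# Assumption: an in-memory SQLite table named form_data stands in for the real table.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE form_data ("
    "DataID INTEGER PRIMARY KEY, TestID INTEGER, FormID INTEGER, "
    "VariableName TEXT, Data TEXT)"
)
conn.executemany(
    "INSERT INTO form_data (TestID, FormID, VariableName, Data) VALUES (?, ?, ?, ?)",
    [
        (1, 1, "Name", "Billy"),
        (1, 1, "Date", "02/02/2023"),
        (2, 1, "Name", "Bob"),
        (2, 1, "Date", "02/01/2023"),
    ],
)

# Each CASE yields Data for one variable and NULL otherwise;
# MAX collapses the group to the single non-NULL value.
pivot_sql = """
SELECT TestID,
       FormID,
       MAX(CASE WHEN VariableName = 'Name' THEN Data END) AS "Name",
       MAX(CASE WHEN VariableName = 'Date' THEN Data END) AS "Date"
FROM form_data
GROUP BY TestID, FormID
ORDER BY TestID
"""
rows = conn.execute(pivot_sql).fetchall()
for row in rows:
    print(row)
```

The variable names have to be known when the query is built; if they are not fixed, they would need to be read from the table first and the CASE expressions generated from that list.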

Related

How to do this NOT IN operation without overflowing the parameter marker count in pyodbc/sqlalchemy?

This is a simplification of the case:
I have two databases, one MySQL and one MS Access. I am trying to delete all rows from the MS Access table that are not in the MySQL table.
I am using sqlalchemy to connect to both databases. To connect to MS Access (I know this database should not be used anymore; this is actually part of a migration process), I am using sqlalchemy-access, which internally works with pyodbc.
The code that does this operation is:
from sqlalchemy import delete

# every row in the MySQL table contains a field that references its corresponding row in MS Access
mysql_ids = mysql_session.query(mysql_table.id_msaccess).all()
# each result row is tuple-like, so take its first element by index
list_of_ids = [elem[0] for elem in mysql_ids]
delete_query = delete(access_table).where(access_table.id.not_in(list_of_ids))
results = access_session.execute(delete_query)
However, I get this error message:
(pyodbc.ProgrammingError) ('The SQL contains -9972 parameter markers, but 55564 parameters were supplied)
DELETE FROM [access_table] WHERE ([access_table].[id] NOT IN (?, ?, ... <here there are all the 55564 parameter markers>) parameters: (241, 242, 243,...)
I have found this issue in pyodbc's github page:
Github Issue in Pyodbc
They essentially say that there is a marker counter that overflows in the internal implementation. They are talking about SQL Server but I guess the same thing happens here.
I could run this query in blocks of 32768 rows, or otherwise check each element of the MySQL table to see whether it is in the MS Access table (I think this would be quite slow), but I wonder whether there is a better approach. Do you have any suggestions on how I could approach this?
Thanks in advance for any suggestions
"I could do this query in blocks of 32768 rows"
That won't work for a NOT IN query. Say you had a list of rows to keep:
[1, 2, 3, 4, 5, 6]
If you tried to do that in batches of 3 then the first DELETE would be
DELETE FROM access_table WHERE id NOT IN (1, 2, 3)
which would delete the rows with id values of 4, 5, and 6. Then the next DELETE would be
DELETE FROM access_table WHERE id NOT IN (4, 5, 6)
which would delete the rows with id values of 1, 2, and 3.
However, you could build a list of rows to delete like this:
import sqlalchemy as sa

# ids that MySQL still references
with mysql_engine.begin() as conn:
    mysql_existing = (
        conn.scalars(sa.select(mysql_table.c.id_msaccess)).all()
    )
print(mysql_existing)  # [2, 3]
# ids currently present in MS Access
with access_engine.begin() as conn:
    access_existing = (
        conn.scalars(sa.select(access_table.c.id)).all()
    )
print(access_existing)  # [1, 2, 3, 4, 5, 6]
# the set difference gives the ids that exist only in MS Access
access_to_delete = list(set(access_existing).difference(mysql_existing))
print(access_to_delete)  # [1, 4, 5, 6]
and you could process that list in batches by using IN instead of NOT IN.
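The batching step could look roughly like the sketch below. It is an illustration under assumptions: it uses the standard-library sqlite3 module as a stand-in for the MS Access connection, the helper name chunked is hypothetical, and the batch size of 2 is only for demonstration (a real batch size would need to stay under the driver's parameter marker limit).

```python
import sqlite3


def chunked(values, size):
    """Yield consecutive slices of `values` with at most `size` items each."""
    for start in range(0, len(values), size):
        yield values[start:start + size]


# Assumption: an in-memory SQLite table stands in for the MS Access table.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE access_table (id INTEGER PRIMARY KEY)")
conn.executemany(
    "INSERT INTO access_table (id) VALUES (?)", [(i,) for i in range(1, 7)]
)

mysql_existing = [2, 3]
access_existing = [row[0] for row in conn.execute("SELECT id FROM access_table")]
access_to_delete = sorted(set(access_existing).difference(mysql_existing))

BATCH_SIZE = 2  # demonstration value only
for batch in chunked(access_to_delete, BATCH_SIZE):
    # One "?" marker per id in this batch, so the marker count stays bounded.
    placeholders = ", ".join("?" for _ in batch)
    conn.execute(f"DELETE FROM access_table WHERE id IN ({placeholders})", batch)
conn.commit()

remaining = [row[0] for row in conn.execute("SELECT id FROM access_table ORDER BY id")]
print(remaining)
```

Unlike NOT IN, each IN batch removes only the ids it names, so the batches cannot interfere with one another.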

Insert into table only if most recent record from group is different

I have a MySQL table with the following columns:
score_id (int10, auto_increment, primary key);
game_id (int10);
away_team_score (int10);
home_team_score (int10);
date_time (datetime);
I am scraping a web API (using Python) and trying to write the resulting array to this database. However, each time I read the API it returns a list of all events. I want to write to the database only when either the away_team_score or the home_team_score has changed for a given game_id.
I am able to get the most recent records using the query from this example (mySQL GROUP, most recent). However, I am unsure how to check whether the values I am inserting are the same.
I don't want to use update because I want to keep the scores for historical purposes. Also, if the game_id does not exist, it should insert it.
The Python code I currently have:
import mysql.connector

# Connecting to the mysql database
mydb = mysql.connector.connect(host="examplehost", user="exampleuser", passwd="examplepassword", database="exampledb")
mycursor = mydb.cursor()
# example scores array that I scraped
# It is in the format of game_id, away_team_score, home_team_score, date_time
scores = [[1, 0, 1, '2019-11-30 13:05:00'], [2, 1, 5, '2019-11-30 13:05:00'], [3, 4, 8, '2019-11-30 13:05:00'],
          [4, 6, 1, '2019-11-30 13:05:00'], [5, 0, 2, '2019-11-30 13:05:00']]
# Inserting into database
game_query = "INSERT INTO tbl_scores (game_id, away_team_score, home_team_score, date_time) VALUES (%s, %s, %s, %s)"
mycursor.executemany(game_query, scores)
mydb.commit()
mydb.close()
One option is MySQL's upsert functionality (INSERT ... ON DUPLICATE KEY UPDATE).
With a unique index on game_id, changing your insert query to the following inserts a row when the game_id is new and otherwise updates the scores in place. Note that an update overwrites the previous scores, so this does not keep a score history:
INSERT INTO tbl_scores
    (game_id, away_team_score, home_team_score, date_time)
VALUES
    (%s, %s, %s, %s)
ON DUPLICATE KEY UPDATE
    away_team_score = VALUES(away_team_score),
    home_team_score = VALUES(home_team_score),
    date_time = VALUES(date_time);
Details on upsert - https://dev.mysql.com/doc/refman/8.0/en/insert-on-duplicate.html
Let me know if that helps.
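Because an upsert updates rows in place, it cannot keep earlier scores. An append-only alternative is to look up the latest row for each game_id and insert only when the scores differ. The sketch below is an assumption-based illustration: it uses the standard-library sqlite3 module as a stand-in for MySQL (so the placeholders are ? rather than %s), and the function name insert_if_changed is hypothetical.

```python
import sqlite3

# Assumption: an in-memory SQLite table stands in for the MySQL tbl_scores table.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE tbl_scores ("
    "score_id INTEGER PRIMARY KEY AUTOINCREMENT, game_id INTEGER, "
    "away_team_score INTEGER, home_team_score INTEGER, date_time TEXT)"
)


def insert_if_changed(conn, scores):
    """Insert a row only when a game's latest stored scores differ; return the insert count."""
    inserted = 0
    for game_id, away, home, date_time in scores:
        latest = conn.execute(
            "SELECT away_team_score, home_team_score FROM tbl_scores "
            "WHERE game_id = ? ORDER BY date_time DESC, score_id DESC LIMIT 1",
            (game_id,),
        ).fetchone()
        # latest is None for an unseen game_id, so new games are always inserted.
        if latest != (away, home):
            conn.execute(
                "INSERT INTO tbl_scores "
                "(game_id, away_team_score, home_team_score, date_time) "
                "VALUES (?, ?, ?, ?)",
                (game_id, away, home, date_time),
            )
            inserted += 1
    conn.commit()
    return inserted


first = insert_if_changed(conn, [[1, 0, 1, "2019-11-30 13:05:00"], [2, 1, 5, "2019-11-30 13:05:00"]])
second = insert_if_changed(conn, [[1, 0, 1, "2019-11-30 13:10:00"], [2, 1, 5, "2019-11-30 13:10:00"]])
third = insert_if_changed(conn, [[1, 1, 1, "2019-11-30 13:15:00"], [2, 1, 5, "2019-11-30 13:15:00"]])
print(first, second, third)
```

This issues one lookup per game, which is simple but not the fastest option for large batches; it also assumes a single writer, since two concurrent scrapers could both see the old score and insert the same change twice.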

Update SQL table with list of dictionaries

How do I properly use dictionaries in SQL queries? Should I predefine SQL variables to prevent SQL injection?
For example, I get an error in Python when trying to update a table from a dictionary, but the query works perfectly in pure SQL with predefined variables and values.
_mysql_connector.MySQLInterfaceError: Incorrect datetime value: '1' for column 'colF' at row 1
My dict
{'ID': 1, 'colA': 'valA1', 'colB': 'valB1', 'colC': 'valC1',
 'colD': 'valD1', 'colE': 'valE1',
 'colF': datetime.datetime(2017, 12, 23, 0, 0),
 'colG': datetime.datetime(2018, 7, 11, 0, 0),
 'colH': datetime.datetime(2018, 9, 1, 0, 0)}
SQL statement
UPDATE table1
SET
colA = CASE \
WHEN %(valA1)s IS NOT NULL THEN %(valA1)s
END,\
colB = CASE \
WHEN %(valB1)s IS NOT NULL THEN %(valB1)s
END,\
colC = CASE \
WHEN %(valC1)s IS NOT NULL THEN %(valC1)s
END,\
colD = CASE \
WHEN %(valD1)s IS NOT NULL THEN %(valD1)s
END,\
colE = CASE \
WHEN %(valE1)s IS NOT NULL THEN %(valE1)s
END,\
colF = CASE\
WHEN %(valF)s IS NOT NULL THEN %(valF)s
END,\
colG = CASE\
WHEN %(valF1)s IS NOT NULL THEN %(valF1)s
END,\
colH = CASE\
WHEN %(valH1)s IS NOT NULL THEN %(valH1)s
END\
WHERE %(ID)s = Id """
When I format the query string, I get:
colF1 = CASE
WHEN 2018-07-11 00:00:00 IS NOT NULL THEN 2018-07-11 00:00:00
END,
colH1 = CASE
WHEN 2018-09-01 00:00:00 IS NOT NULL THEN 2018-09-01 00:00:00
END
WHERE Id = 1
There is another issue when the value is not null. The syntax is wrong, I suppose.
Several issues arise with your attempted parameterized query:
1. As described in the MySQLdb docs, column or table identifiers cannot be used in parameter placeholders. Such placeholders are only used for literal values. Also, consider triple-quoted strings for multi-line statements to avoid \ line continuations:
sql = """UPDATE table1
         SET
             colA = CASE
                        WHEN colA IS NOT NULL
                        THEN %(valA1)s
                    END,
             ...
      """
cur.execute(sql, mydict)
con.commit()
2. To use the named placeholder approach, dictionary keys must match the placeholder names. Currently, you need to swap the keys and values for most entries in your dictionary; only ID is already correct. However, as mentioned in #1 above, remove any column identifiers:
{'ID': 1, 'valA1': 'data value', 'valB1': 'data value', 'valC1': 'data value', ... }
3. Datetimes in most databases, including MySQL, must take the string form YYYY-MM-DD HH:MM:SS. The MySQL engine cannot execute Python's datetime() objects. However, some DB-APIs may be able to convert this object type.
{'valF': '2017-12-23 00:00:00',
 'valG': '2018-07-11 00:00:00',
 'valH': '2018-09-01 00:00:00'}
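Putting these points together, a minimal sketch might look like the following. It rests on several assumptions: the standard-library sqlite3 module stands in for MySQL, so the named placeholders are written :name instead of %(name)s; the table is cut down to two columns; and COALESCE is used as one possible reading of "only overwrite when the new value is not NULL", which may not be what the original CASE was meant to do.

```python
import sqlite3
from datetime import datetime

# Assumption: a reduced, in-memory version of table1 with only two data columns.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE table1 (Id INTEGER PRIMARY KEY, colA TEXT, colF TEXT)")
conn.execute("INSERT INTO table1 (Id, colA, colF) VALUES (1, 'old', '2000-01-01 00:00:00')")

# Dictionary keys match the placeholder names; column names stay in the SQL text.
rows = [
    {
        "ID": 1,
        "valA1": "valA1",
        # Datetime passed in the YYYY-MM-DD HH:MM:SS string form.
        "valF": datetime(2017, 12, 23).strftime("%Y-%m-%d %H:%M:%S"),
    },
]

# COALESCE keeps the current column value when the supplied parameter is NULL.
sql = """
UPDATE table1
SET colA = COALESCE(:valA1, colA),
    colF = COALESCE(:valF, colF)
WHERE Id = :ID
"""
conn.executemany(sql, rows)  # one UPDATE per dictionary in the list
conn.commit()

updated = conn.execute("SELECT colA, colF FROM table1 WHERE Id = 1").fetchone()
print(updated)
```

Since the values travel as bound parameters rather than being formatted into the string, there is no need to predefine SQL variables to guard against injection.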

Python sqlite - insert if not exists [duplicate]

I have an SQLite database. I am trying to insert values (users_id, lessoninfo_id) in table bookmarks, only if both do not exist before in a row.
INSERT INTO bookmarks(users_id,lessoninfo_id)
VALUES(
(SELECT _id FROM Users WHERE User='"+$('#user_lesson').html()+"'),
(SELECT _id FROM lessoninfo
WHERE Lesson="+lesson_no+" AND cast(starttime AS int)="+Math.floor(result_set.rows.item(markerCount-1).starttime)+")
WHERE NOT EXISTS (
SELECT users_id,lessoninfo_id from bookmarks
WHERE users_id=(SELECT _id FROM Users
WHERE User='"+$('#user_lesson').html()+"') AND lessoninfo_id=(
SELECT _id FROM lessoninfo
WHERE Lesson="+lesson_no+")))
This gives an error saying:
db error near where syntax.
If you never want to have duplicates, you should declare this as a table constraint:
CREATE TABLE bookmarks(
users_id INTEGER,
lessoninfo_id INTEGER,
UNIQUE(users_id, lessoninfo_id)
);
(A primary key over both columns would have the same effect.)
It is then possible to tell the database that you want to silently ignore records that would violate such a constraint:
INSERT OR IGNORE INTO bookmarks(users_id, lessoninfo_id) VALUES(123, 456)
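From Python, this approach could be sketched as follows, using the standard-library sqlite3 module with bound parameters instead of string concatenation. The ids are made-up sample values.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE bookmarks("
    "users_id INTEGER, lessoninfo_id INTEGER, "
    "UNIQUE(users_id, lessoninfo_id))"
)

# The second pair repeats the first, so the UNIQUE constraint makes SQLite skip it.
for pair in [(123, 456), (123, 456), (123, 789)]:
    conn.execute(
        "INSERT OR IGNORE INTO bookmarks(users_id, lessoninfo_id) VALUES (?, ?)",
        pair,
    )
conn.commit()

bookmarks = conn.execute(
    "SELECT users_id, lessoninfo_id FROM bookmarks ORDER BY lessoninfo_id"
).fetchall()
print(bookmarks)
```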
If you have a table called memos that has two columns id and text you should be able to do like this:
INSERT INTO memos(id,text)
SELECT 5, 'text to insert'
WHERE NOT EXISTS(SELECT 1 FROM memos WHERE id = 5 AND text = 'text to insert');
If the table already contains a row where text is equal to 'text to insert' and id is equal to 5, then the insert operation will be ignored.
I don't know if this will work for your particular query, but perhaps it gives you a hint on how to proceed.
I would advise that you instead design your table so that no duplicates are allowed, as explained in @CL's answer.
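For reference, the WHERE NOT EXISTS form can be driven from Python with bound parameters, as in this sketch. The helper name insert_memo_if_missing is hypothetical, and the memos table follows the two-column example above.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE memos (id INTEGER, text TEXT)")


def insert_memo_if_missing(conn, memo_id, text):
    """Insert (memo_id, text) unless an identical row is already present."""
    conn.execute(
        "INSERT INTO memos (id, text) "
        "SELECT ?, ? "
        "WHERE NOT EXISTS (SELECT 1 FROM memos WHERE id = ? AND text = ?)",
        # The values are bound twice: once for the SELECT list, once for the check.
        (memo_id, text, memo_id, text),
    )
    conn.commit()


insert_memo_if_missing(conn, 5, "text to insert")
insert_memo_if_missing(conn, 5, "text to insert")  # identical row, nothing is added
insert_memo_if_missing(conn, 5, "other text")

memo_count = conn.execute("SELECT COUNT(*) FROM memos").fetchone()[0]
print(memo_count)
```

Without a UNIQUE constraint this check is not safe against two connections inserting at the same moment, which is one more reason to prefer the constraint.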
For a unique column, use this:
INSERT OR REPLACE INTO tableName (...) values(...);
For more information, see: sqlite.org/lang_insert
insert into bookmarks (users_id, lessoninfo_id)
select 1, 167
EXCEPT
select users_id, lessoninfo_id
from bookmarks
where users_id=1
and lessoninfo_id=167;
This is the fastest way.
For some other SQL engines, you can use a Dummy table containing 1 record.
e.g:
select 1, 167 from ONE_RECORD_DUMMY_TABLE

How to remove quotations of string in list of values for sql query in python

I want to insert values into the table open.roads using SQL in python, the table has the following columns:
id integer NOT NULL DEFAULT nextval('open.res_la_roads_id_seq'::regclass),
run_id integer,
step integer,
agent integer,
road_id integer,
CONSTRAINT res_la_roads_pkey PRIMARY KEY (id)
I'm trying to write various rows of values into sql within one query, therefore I created a list with the values to be inserted in the query, following this example:
INSERT INTO open.roads(id, run_id, step, agent, road_id)
VALUES (DEFAULT, 1000, 2, 2, 5286), (DEFAULT, 1000, 1, 1, 5234);
The list in Python should contain:
list1=(DEFAULT, 1000, 2, 2, 5286), (DEFAULT, 1000, 1, 1, 5234), (.....
I have problems with the value "DEFAULT", as it is a string that should be passed to SQL without the quotation marks. But I can't manage to remove them; I have tried saving "DEFAULT" in a variable as a string and using str.remove(), str.replace(), etc.
The code I'm trying to use:
list1 = []
for road in roads:
    a = "DEFAULT", self.run_id, self.modelStepCount, self.unique_id, road
    list1.append(a)
val = ','.join(map(str, list1))
sql = """insert into open.test ("id","run_id","step","agent","road_id")
values {0}""".format(val)
self.model.mycurs.execute(sql)
I get an error because of the quotation marks:
psycopg2.DataError: invalid input syntax for integer: "DEFAULT"
LINE 2:('DEFAULT', 582, 0, 2, 13391),('DEFAULT'
How can I remove them? Or is there another way to do this?
I think this might help you.
sql = "INSERT INTO open.roads(id, run_id, step, agent, road_id) VALUES "
values = ['(DEFAULT, 1000, 2, 2, 5286), ', '(DEFAULT, 1000, 1, 1, 5234)']
for value in values:
    sql += value
print(sql)
After running this code, it will print: INSERT INTO open.roads(id, run_id, step, agent, road_id) VALUES (DEFAULT, 1000, 2, 2, 5286), (DEFAULT, 1000, 1, 1, 5234)
Another solution is to format your string using %.
print("INSERT INTO open.roads(id, run_id, step, agent, road_id) VALUES (%s, %s, %s, %s, %s)" % ("DEFAULT", 1000, 2, 2, 5286))
This will print INSERT INTO open.roads(id, run_id, step, agent, road_id) VALUES (DEFAULT, 1000, 2, 2, 5286)
Simply omit the column in the append query, as it will be supplied with the DEFAULT value per your table definition. Also, parameterize the values by passing a vars argument to execute (safer and more maintainable than string interpolation): execute(query, vars). And since you have a list of tuples, consider executemany().
The code below assumes the database API (e.g., pymysql, psycopg2) uses %s as the parameter placeholder; otherwise use ?:
# PREPARED STATEMENT
sql = """INSERT INTO open.test ("run_id", "step", "agent", "road_id")
         VALUES (%s, %s, %s, %s);"""
list1 = []
for road in roads:
    a = (self.run_id, self.modelStepCount, self.unique_id, road)
    list1.append(a)
self.model.mycurs.executemany(sql, list1)
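To see the omitted column being filled in automatically, here is a self-contained sketch. It is an illustration under assumptions: the standard-library sqlite3 module stands in for PostgreSQL, so the placeholder is ? rather than %s, the table is named roads without a schema prefix, and the run_id, step, and agent values are sample data.

```python
import sqlite3

# Assumption: an in-memory SQLite table; AUTOINCREMENT plays the role of the
# nextval(...) default on the id column.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE roads ("
    "id INTEGER PRIMARY KEY AUTOINCREMENT, run_id INTEGER, "
    "step INTEGER, agent INTEGER, road_id INTEGER)"
)

run_id, step, agent = 1000, 2, 2  # sample values
roads = [5286, 5234]

# id is left out of the column list, so the database supplies it.
rows_to_insert = [(run_id, step, agent, road) for road in roads]
conn.executemany(
    "INSERT INTO roads (run_id, step, agent, road_id) VALUES (?, ?, ?, ?)",
    rows_to_insert,
)
conn.commit()

inserted = conn.execute(
    "SELECT id, run_id, step, agent, road_id FROM roads ORDER BY id"
).fetchall()
print(inserted)
```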
