MYSQL compare 2 tables and set the values - python

I have an issue with my MySQL database. I am programming it in python.
I have 2 tables: Raspberry_data and Operation1.
I must read the data from Operation1 and copy some values from Operation1 to the Raspberry_data table. The issue is that some columns in Raspberry_data are identical, which causes the query to work incorrectly.
Please check the following query:
http://sqlfiddle.com/#!9/a4c2ef/5
I must update Current_operation and ID columns in the Raspberry_data table from the data in Operation1.
The expected result:
Current_operation = 1 ID = 4
Current_operation = 1 ID = 6
However, the result is :
Current_operation = 1 ID = 4
Current_operation = 1 ID = 4
How can I ensure that it copies the individual rows line by line?
I am not able to execute this query on sqlfiddle for some reason, but I have tested it on my actual MySQL database and the results are the same.


psycopg2 Syntax errors at or near "' '"

I have a dataframe named Data2 and I wish to put its values inside a PostgreSQL table. For reasons, I cannot use to_sql, as some of the values in Data2 are numpy arrays.
This is Data2's schema:
cursor.execute(
    """
    DROP TABLE IF EXISTS Data2;
    CREATE TABLE Data2 (
        time timestamp without time zone,
        u bytea,
        v bytea,
        w bytea,
        spd bytea,
        dir bytea,
        temp bytea
    );
    """
)
My code segment:
for col in Data2_mcw.columns:
    for row in Data2_mcw.index:
        value = Data2_mcw[col].loc[row]
        if type(value).__module__ == np.__name__:
            value = pickle.dumps(value)
        cursor.execute(
            """
            INSERT INTO Data2_mcw(%s)
            VALUES (%s)
            """,
            (col.replace('\"', ''), value)
        )
Error generated:
psycopg2.errors.SyntaxError: syntax error at or near "'time'"
LINE 2: INSERT INTO Data2_mcw('time')
How do I rectify this error?
Any help would be much appreciated!
There are two problems I see with this code.
The first problem is that you cannot use bind parameters for column names, only for values. The first of the two %s placeholders in your SQL string is invalid. You will have to use string concatenation to set column names, something like the following (assuming you are using Python 3.6+):
cursor.execute(
    f"""
    INSERT INTO Data2_mcw({col})
    VALUES (%s)
    """,
    (value,))
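As a side note, if the column names can ever come from untrusted input, psycopg2's sql module can quote identifiers safely instead of relying on an f-string. A minimal sketch, assuming psycopg2 >= 2.7:
from psycopg2 import sql

# Build the statement with a properly quoted identifier; the value is still
# passed as a normal bind parameter.
cursor.execute(
    sql.SQL("INSERT INTO Data2_mcw ({}) VALUES (%s)").format(sql.Identifier(col)),
    (value,),
)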
The second problem is that a SQL INSERT statement inserts an entire row. It does not insert a single value into an already-existing row, as you seem to be expecting it to.
Suppose your dataframe Data2_mcw looks like this:
   a  b  c
0  1  2  7
1  3  4  9
Clearly, this dataframe has six values in it. If you were to run your code on this dataframe, then it would insert six rows into your database table, one for each value, and the data in your table would look like the following:
a     b     c
1     NULL  NULL
3     NULL  NULL
NULL  2     NULL
NULL  4     NULL
NULL  NULL  7
NULL  NULL  9
I'm guessing you don't want this: you'd rather your database table contained the following two rows instead:
a  b  c
1  2  7
3  4  9
Instead of inserting one value at a time, you will have to insert one entire row at time. This means you have to swap your two loops around, build the SQL string up once beforehand, and collect together all the values for a row before passing it to the database. Something like the following should hopefully work (please note that I don't have a Postgres database to test this against):
column_names = ",".join(Data2_mcw.columns)
placeholders = ",".join(["%s"] * len(Data2_mcw.columns))
sql = f"INSERT INTO Data2_mcw({column_names}) VALUES ({placeholders})"

for row in Data2_mcw.index:
    values = []
    for col in Data2_mcw.columns:
        value = Data2_mcw[col].loc[row]
        if type(value).__module__ == np.__name__:
            value = pickle.dumps(value)
        values.append(value)
    cursor.execute(sql, values)
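If the dataframe is large, the same row-building loop can feed psycopg2's execute_batch helper, which sends the statements in fewer round trips. A sketch, assuming psycopg2 >= 2.7 and the sql string built above:
from psycopg2.extras import execute_batch

all_rows = []
for row in Data2_mcw.index:
    values = []
    for col in Data2_mcw.columns:
        value = Data2_mcw[col].loc[row]
        if type(value).__module__ == np.__name__:
            value = pickle.dumps(value)
        values.append(value)
    all_rows.append(values)

# One call instead of one execute() per row.
execute_batch(cursor, sql, all_rows)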

Search SQL request with two tables on PostgreSQL. SQLAlchemy. Python

I need help building a query in SQL or SQLAlchemy.
The first table is named Rows:
sid        unit_sid
ROW_UUID1  UNIT_UUID1
ROW_UUID2  UNIT_UUID1
ROW_UUID3  UNIT_UUID
The second table is named Records:
row_sid (== sid from Rows)  item_sid    content (str)
ROW_UUID1                   ITEM_UUID1  Description 1
ROW_UUID1                   ITEM_UUID2  Description 1
ROW_UUID2                   ITEM_UUID1  Description 3
ROW_UUID2                   ITEM_UUID2  Description 2
ROW_UUID3                   ITEM_UUID1  Description 5
ROW_UUID3                   ITEM_UUID2  Description 1
I need an example of an SQL query where I can specify a search for several content values across different item_sid values.
For example, I need all ROWS where
item_sid == ITEM_UUID1 and content == Description 1
item_sid == ITEM_UUID2 and content == Description 1
A query like the one below will not work for me, because I need to search on two item_sid values at the same time to get unique ROWS:
select row_sid
from rows
left join record on rows.sid = record.row_sid
where (item_sid = '877aeeb4-c68e-4942-b259-288e7aa3c04b' and
content like '%TEXT%')
and (item_sid = 'cc22f239-db6c-4041-92c6-8705cb621525' and
content like '%TEXT2%') GROUP BY row_sid
I solved it like this:
select row_sid
from rows
left join record on rows.sid = record.row_sid
where (item_sid = '877aeeb4-c68e-4942-b259-288e7aa3c04b' and
content like '%TEXT%')
or (item_sid = 'cc22f239-db6c-4041-92c6-8705cb621525' and
content like '%TEXT2%') GROUP BY row_sid having count(row_sid) = 2
But maybe there is a more elegant solution? I want to query a varying number of item_sids (2-5) at the same time.
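Since the question mentions SQLAlchemy, here is a minimal sketch that builds the OR conditions for any number of (item_sid, pattern) pairs; the Rows/Records ORM classes and their attributes are assumptions based on the tables above:
from sqlalchemy import and_, func, or_

def find_row_sids(session, criteria):
    """criteria: list of (item_sid, content_pattern) tuples, e.g. 2-5 pairs."""
    conditions = [
        and_(Records.item_sid == item_sid, Records.content.like(pattern))
        for item_sid, pattern in criteria
    ]
    return (
        session.query(Records.row_sid)
        .join(Rows, Rows.sid == Records.row_sid)
        .filter(or_(*conditions))
        .group_by(Records.row_sid)
        # a row must match every requested pair to be returned
        .having(func.count(Records.row_sid) == len(criteria))
        .all()
    )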

Executing an SQL update statement from a pandas dataframe

Context: I am using MSSQL, pandas, and pyodbc.
Steps:
Obtain dataframe from query using pyodbc (no problemo)
Process columns to generate the content of a new (but already existing) column
Fill an auxiliary column with UPDATE statements (i.e. UPDATE t SET t.value = df.value FROM dbo.table t WHERE t.ID = df.ID)
Now how do I execute the SQL code in the auxiliary column, without looping through each row?
sample data
The first two columns are obtained by querying dbo.table; the third column exists but is empty in the database. The fourth column only exists in the dataframe, to prepare the SQL statement that would correspond to updating dbo.table.
ID  raw                   processed    strSQL
1   lorum.ipsum#test.com  lorum ipsum  UPDATE t SET t.processed = 'lorum ipsum' FROM dbo.table t WHERE t.ID = 1
2   rumlo.sumip#test.com  rumlo sumip  UPDATE t SET t.processed = 'rumlo sumip' FROM dbo.table t WHERE t.ID = 2
3   ...                   ...          ...
I would like to execute the SQL script in each row in an efficient manner.
After I recommended .executemany() in a comment to the question, a subsequent comment from @Charlieface suggested that a table-valued parameter (TVP) would provide even better performance. I didn't think it would make that much difference, but I was wrong.
For an existing table named MillionRows
ID TextField
-- ---------
1 foo
2 bar
3 baz
…
and example data of the form
num_rows = 1_000_000
rows = [(f"text{x:06}", x + 1) for x in range(num_rows)]
print(rows)
# [('text000000', 1), ('text000001', 2), ('text000002', 3), …]
my test using a standard executemany() call with cnxn.autocommit = False and crsr.fast_executemany = True
crsr.executemany("UPDATE MillionRows SET TextField = ? WHERE ID = ?", rows)
took about 180 seconds (3 minutes).
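For reference, a minimal sketch of that executemany() setup; the connection string is an assumption, the rest mirrors the settings named above:
import pyodbc

# Placeholder connection string, not from the original test.
cnxn = pyodbc.connect(
    "DRIVER={ODBC Driver 17 for SQL Server};SERVER=myserver;DATABASE=mydb;"
    "Trusted_Connection=yes;",
    autocommit=False,
)
crsr = cnxn.cursor()
crsr.fast_executemany = True  # batch the parameters instead of sending row by row
crsr.executemany("UPDATE MillionRows SET TextField = ? WHERE ID = ?", rows)
cnxn.commit()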
However, by creating a user-defined table type
CREATE TYPE dbo.TextField_ID AS TABLE
(
    TextField nvarchar(255) NULL,
    ID int NOT NULL,
    PRIMARY KEY (ID)
)
and a stored procedure
CREATE PROCEDURE [dbo].[mr_update]
    @tbl dbo.TextField_ID READONLY
AS
BEGIN
    SET NOCOUNT ON;
    UPDATE MillionRows SET TextField = t.TextField
    FROM MillionRows mr INNER JOIN @tbl t ON mr.ID = t.ID
END
when I used
crsr.execute("{CALL mr_update (?)}", (rows,))
it did the same update in approximately 80 seconds (less than half the time).
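To tie this back to the dataframe in the question, the rows list can be built from the processed and ID columns; a sketch, with the column names taken from the sample data above:
# Build (TextField, ID)-shaped tuples matching the column order of dbo.TextField_ID.
rows = list(df[["processed", "ID"]].itertuples(index=False, name=None))

crsr.execute("{CALL mr_update (?)}", (rows,))
cnxn.commit()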

Is there a way to store both a pandas dataframe and separate string var in the same SQLite table?

Disclaimer: This is my first time posting a question here, so I apologize if I didn't post this properly.
I recently started learning how to use SQLite in python. As the title suggests, I have a python object with a string attribute and a pandas dataframe attribute, and I want to know if/how I can add both of these to the same SQLite table. Below is the code I have thus far. The mydb.db file gets created successfully, but on insert I get the following error message:
sqlite3.InterfaceError: Error binding parameter :df- probably unsupported type.
I know you can use df.to_sql('mydbs', conn) to store a pandas dataframe in an SQL table, but this wouldn't seem to allow for an additional string to be added to the same table and then retrieved separately from the dataframe. Any solutions or alternative suggestions are appreciated.
Python Code:
# Python 3.7
import sqlite3
import pandas as pd
import myclass
conn = sqlite3.connect("mydb.db")
c = conn.cursor()
c.execute("""CREATE TABLE mydbs (
name text,
df blob
)""")
conn.commit()
c.execute("INSERT INTO mydbs VALUES (:name, :df)", {'name': myclass.name, 'df': myclass.df})
conn.commit()
conn.close()
It looks like you are trying to store a dataframe in an SQL table 'cell'. This is a bit odd, since SQL is used for storing tables of data, and a dataframe is something that arguably should be stored as a table of its own (hence the built-in pandas function). To accomplish what you want specifically, you could pickle the dataframe and store it:
import codecs
import pickle
import pandas as pd
import sqlite3
df = pd.DataFrame({"foo": range(5), "bar": range(5, 10)})
pickled = codecs.encode(pickle.dumps(df), "base64").decode()
df
   foo  bar
0    0    5
1    1    6
2    2    7
3    3    8
4    4    9
Store & Retrieve:
conn = sqlite3.connect("mydb.db")
c = conn.cursor()
c.execute("""CREATE TABLE mydbs (
name text,
df text
)""")
c.execute("INSERT INTO mydbs VALUES (:name, :df)", {'name': 'name', 'df': pickled})
conn.commit()
c.execute('SELECT * FROM mydbs')
result = c.fetchall()
unpickled = pickle.loads(codecs.decode(result[0][1].encode(), "base64"))
conn.close()
unpickled
   foo  bar
0    0    5
1    1    6
2    2    7
3    3    8
4    4    9
If you wanted to store the dataframe as an SQL table (which imo makes more sense and is simpler) and you needed to have a name with it, you could just add a 'name' column to the df:
import pandas as pd
from sqlalchemy import create_engine
df = pd.DataFrame({"foo": range(5), "bar": range(5, 10)})
df
   foo  bar
0    0    5
1    1    6
2    2    7
3    3    8
4    4    9
Add name column, then save to db and retrieve:
df['name'] = 'the df name'
engine = create_engine('sqlite://', echo=False)
df.to_sql('users', con=engine)
r = engine.execute("SELECT * FROM users").fetchall()
r = pd.read_sql('users', con=engine)
r
   index  foo  bar         name
0      0    0    5  the df name
1      1    1    6  the df name
2      2    2    7  the df name
3      3    3    8  the df name
4      4    4    9  the df name
But even that method may not be ideal, since you are effectively adding an extra column of data for each df, and this could get costly if you are working on a large project where database size is a factor, and maybe even speed (although SQL is quite fast). In this case, it may be best to use relational tables. For this I refer you here since there is no point re-writing the code here. Using a relational model would be the most 'proper' solution imo, since it fully embodies the purpose of SQL.
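For completeness, a minimal sketch of what a relational layout could look like; the table and column names (datasets, dataset_rows, dataset_id) are assumptions for illustration, not from the linked answer:
import pandas as pd
from sqlalchemy import create_engine

engine = create_engine("sqlite:///mydb.db", echo=False)

df = pd.DataFrame({"foo": range(5), "bar": range(5, 10)})

# One metadata row per dataframe holds the string.
meta = pd.DataFrame({"dataset_id": [1], "name": ["the df name"]})
meta.to_sql("datasets", con=engine, if_exists="append", index=False)

# Each dataframe row carries a foreign key back to its metadata row.
df["dataset_id"] = 1
df.to_sql("dataset_rows", con=engine, if_exists="append", index=False)

# Retrieve the dataframe together with its name via a join.
r = pd.read_sql(
    "SELECT d.name, r.foo, r.bar "
    "FROM dataset_rows r JOIN datasets d ON r.dataset_id = d.dataset_id",
    con=engine,
)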

Peewee counting all objects with specific value

My struggle is not with creating a table; I can create a table. The problem is populating columns based on calculations over other tables.
I have looked at How to create all tables defined in models using peewee and it does not help me do summations, counts, etc.
I have a hypothetical database (database.db) and created these two tables:
Table 1 (from class User)
id name
1 Jamie
2 Sam
3 Mary
Table 2 (from class Sessions)
id SessionId
1 4121
1 4333
1 4333
3 5432
I simply want to create a new table using peewee:
id  name   sessionCount  TopSession   # <- (Session that appears most for the given user)
1   Jamie  3             4333
2   Sam    0             NaN
3   Mary   1             5432
4   ...
Each entry in Table1 and Table2 was created using User.create(...) or Sessions.create(...)
The new table should look at the data that is in the database.db (ie Table1 and Table2) and perform the calculations.
This would be simple in Pandas, but I can't seem to find a query that can do this. Please help.
I found it...
from peewee import fn

query = Sessions.select(fn.COUNT(Sessions.id)).where(Sessions.id == 1)
count = query.scalar()
print(count)
# 3

# Or:
count = Sessions.select().where(Sessions.id == 1).count()
# 3
For anyone out there : )
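For the full table the question asks for (per-user session count plus the most frequent session), here is a hedged sketch; the model names User/Sessions and the join on Sessions.id are assumptions based on the tables above:
from peewee import JOIN, fn

# Session count per user, keeping users with zero sessions via a LEFT JOIN.
counts = (
    User
    .select(User.id, User.name, fn.COUNT(Sessions.SessionId).alias("sessionCount"))
    .join(Sessions, JOIN.LEFT_OUTER, on=(User.id == Sessions.id))
    .group_by(User.id, User.name)
    .dicts()
)

for row in counts:
    # Top session: the SessionId that appears most often for this user.
    top = (
        Sessions
        .select(Sessions.SessionId)
        .where(Sessions.id == row["id"])
        .group_by(Sessions.SessionId)
        .order_by(fn.COUNT(Sessions.SessionId).desc())
        .limit(1)
        .scalar()
    )
    print(row["id"], row["name"], row["sessionCount"], top)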
