mysql update entire pandas dataframe - python

I was wondering if there is a way to update all rows of a pandas dataframe in one query to MySQL.
I select a dataframe from MySQL, do some calculations, and then want the rows in the MySQL table to be updated to match the rows in the dataframe. I do not select the complete table, so I cannot simply replace it.
The column order/types remain unchanged, so it just needs to replace/update the rows. I have a primary-key indexed, auto-increment 'id' column, if that makes any difference.
Thanks.
This is the error I get when trying to create the SQL statement from the post Bob commented below:
58 d = {'col1': 'val1', 'col2': 'val2'}
59 sql = 'UPDATE table SET {}'.format(', '.join('{}=%s'.format(k) for k in d))
60 print sql
61 sql undefined, k = 'col2', global d = {'col1': 'val1', 'col2': 'val2'}
<type 'exceptions.ValueError'>: zero length field name in format
args = ('zero length field name in format',)
message = 'zero length field name in format'

I don't think that's possible with pandas, at least not directly. I know you can use to_sql() to append or replace, but that doesn't help you much here.
You could try converting the dataframe to a dict with to_dict() and then executing an UPDATE statement with the values from the dict via a MySQL cursor.
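For example, something along these lines (a minimal sketch only, assuming a DB-API driver such as pymysql; 'my_table' and the connection details are placeholders, and your 'id' column is used as the key in the WHERE clause):
import pymysql  # any DB-API 2.0 MySQL driver (MySQLdb, mysql.connector) works similarly

conn = pymysql.connect(host='localhost', user='user', password='pw', db='mydb')
cur = conn.cursor()

# One dict per dataframe row, including the primary key 'id'
rows = df.to_dict('records')

# Build the SET clause from the non-key columns
cols = [c for c in df.columns if c != 'id']
sql = 'UPDATE my_table SET {0} WHERE id = %s'.format(
    ', '.join('{0} = %s'.format(c) for c in cols))

# Parameters: column values first, the primary key last
params = [tuple(r[c] for c in cols) + (r['id'],) for r in rows]
cur.executemany(sql, params)
conn.commit()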
UPDATE
You might be using a version of Python (such as 2.6) that requires explicit positional indices in format():
sql = 'UPDATE table SET {0}'.format(', '.join('{0}=%s'.format(k) for k in d))

Related

Difficulty Inserting Json Dictionary To MSSQL

I am new to Python, but have a seemingly very simple exercise that I am struggling to figure out. This is a two-part issue.
First: I have a list of JSON objects that I am getting from an API. I want to enter each list item as a row in a DataFrame, preserving that row's JSON object/dict so it can be stored in a database for later editing and reposting. In addition, I want to convert the list into a standard DataFrame (easy enough). In essence, it will be a standard dataframe with each row's raw JSON contained as an additional column. I've managed to accomplish this by joining a Series to a DataFrame, but I am relying on the index to join. My first question is whether the join can be done not just on the index, but on the 'id' value in the dataframe matched to the 'id' contained in each element of the JSON object/dict in the list. The rationale is that this would eliminate ordering concerns and I'd be 100% sure each JSON object is associated with the correct dataframe row.
Second: As I mentioned above, I've simply joined the Series to the DataFrame using the index, which works in this case because both use the same 'data' list. The output looks good when I print it, but when I go to insert it into MSSQL, it does not like the JSON dictionary (the 'Json_Series' column). The insert works fine when I simply drop that field. I could see how an inline f-string for the insert might work if you did some kind of cast or convert on the dict, but I will be doing this for many APIs, so I am trying to avoid writing custom insert statements (i.e. I would like to rely on to_sql or an equivalent method/class/library). I have also tried converting the column with .astype('str') prior to the insert, but that doesn't work, as has been documented elsewhere.
The error I get is:
sqlalchemy.exc.ProgrammingError: (pyodbc.ProgrammingError) ('Invalid parameter type. param-index=0 param-type=dict', 'HY105')
[SQL: INSERT INTO dbo.[TestInsert_JsonObject_Python] ([Json_Series], id, value) VALUES (?, ?, ?), (?, ?, ?)]
[parameters: ({'id': 1, 'value': 2}, 1, 2, {'id': 3, 'value': 4}, 3, 4)]
Clearly, the problem is the first parameter, which is being passed as a dictionary. Removing that column resolves the issue.
Here is what I've tried. This shows the Series (df), the DataFrame (df2), and the combined join (inner_merged). The join is vulnerable to ordering problems if I were to join different lists in the future. I need the join to reference the 'id' inside the JSON dict when joining against the dataframe's 'id' column. The combined join (inner_merged) is, however, in the form that I'd like to be able to just insert into SQL.
import pandas as pd
from sqlalchemy import create_engine
from sqlalchemy.engine import URL
data = [{"id": 1, "value": 2}, {"id": 3, "value": 4}]
df = pd.Series(data, name='Json_Series')  # one raw dict per row
df2 = pd.DataFrame(data)  # flattened columns
inner_merged = pd.merge(df, df2, left_index=True, right_index=True)  # joined on index only
print(df)
print(df2)
# inner_merged['Json_Series'] = inner_merged['Json_Series'].astype('str')
print(inner_merged)
Here is the MSSQL table creation:
CREATE TABLE [dbo].[TestInsert_JsonObject_Python](
[Json_Series] [varchar](500) NULL,
[id] [int] NULL,
[value] int Null,
)
GO
CREATE TABLE [dbo].[TestInsert_NoJsonObject_Python](
[id] [int] NULL,
[value] int Null,
)
GO
Here is the insert code - the first to_sql call is without the JSON object and the second is with it:
server = 'enteryourserver'
database = 'enteryourdatabase'
username = 'enteryourusername'
password = 'enteryourpassword'
driver = '{ODBC Driver 18 for SQL Server}'
table1 = 'TestInsert_NoJsonObject_Python'
table2 = 'TestInsert_JsonObject_Python'
schema = 'dbo'
connection_string = f"DRIVER={driver};SERVER={server};DATABASE={database};UID={username};PWD={password}"
connection_url = URL.create("mssql+pyodbc", query={"odbc_connect": connection_string})
engine = create_engine(connection_url)
df2.to_sql(table1, con=engine, schema=schema, if_exists='replace', index=False)
inner_merged.to_sql(table2, con=engine, schema=schema, if_exists='replace', index=False)
As I mentioned, I am totally new to this so any suggestions are much appreciated.
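One direction to try (a sketch only, not a verified answer; it assumes each raw dict carries an 'id' key, as in the sample data): build the raw-JSON column keyed by the 'id' found inside each dict rather than by list position, and serialize the dict with json.dumps() so pyodbc receives a plain string it can bind to the varchar column.
import json
import pandas as pd

data = [{"id": 1, "value": 2}, {"id": 3, "value": 4}]

# Flattened columns
df2 = pd.DataFrame(data)

# Raw JSON as text, keyed by the 'id' inside each dict (not by list order)
raw = pd.DataFrame({
    'id': [d['id'] for d in data],
    'Json_Series': [json.dumps(d) for d in data],
})

# Join on the shared 'id' value instead of the index
inner_merged = pd.merge(raw, df2, on='id', how='inner')

# 'Json_Series' is now a str column, so the to_sql call above should no longer
# trip over a dict parameter:
# inner_merged.to_sql(table2, con=engine, schema=schema, if_exists='replace', index=False)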

Add dataframe column WITH VARYING VALUES to MySQL table?

Pretty simple question, but not sure if it’s possible from what I’ve seen so far online.
To keep it simple, let’s say I have a MySQL table with 1 column and 5 rows made already. If I have a pandas dataframe with 1 column and 5 rows, how can I add that dataframe column (with its values) to the database table?
The guides I’ve read so far only show you how to simply create a new column with either null values or 1 constant value, which doesn’t help much. The same question was asked here but the answer provided didn’t answer the question, so I’m asking it again here.
As an example:
MySQL table:
Pandas DataFrame:
Desired MySQL table:
Then for kicks, let's say we have a string column to add as well:
Desired MySQL output:
Safe to assume the index column will always match in the DF and the MySQL table.
You can use INSERT ... ON DUPLICATE KEY UPDATE.
You have the following table:
create table tbl (
  index_ int,
  col_1 int,
  primary key index_(`index_`)
);
insert into tbl values (1,1), (2,2), (3,3), (4,4), (5,5);
And you want to add the following data in a new column on the same table:
(1,0.1),(2,0.2),(3,0.3),(4,0.4),(5,0.5)
First you need to add the column with an ALTER TABLE statement:
alter table tbl add column col_2 decimal(5,2) ;
Then use an INSERT ... ON DUPLICATE KEY UPDATE statement:
INSERT INTO tbl (index_,col_2)
VALUES
(1,0.1),
(2,0.2),
(3,0.3),
(4,0.4),
(5,0.5)
ON DUPLICATE KEY UPDATE col_2=VALUES(col_2);
Fiddle
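If you want to drive the same upsert from pandas, here is a rough sketch (my addition, not part of the original answer; it assumes a pymysql connection and a dataframe df with 'index_' and 'col_2' columns, with placeholder connection details):
import pymysql

conn = pymysql.connect(host='localhost', user='user', password='pw', db='mydb')
cur = conn.cursor()

sql = """INSERT INTO tbl (index_, col_2)
         VALUES (%s, %s)
         ON DUPLICATE KEY UPDATE col_2 = VALUES(col_2)"""

# One (index_, col_2) tuple per dataframe row
params = list(df[['index_', 'col_2']].itertuples(index=False, name=None))
cur.executemany(sql, params)
conn.commit()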

Is There a Way To Select All Columns From The Table Except One Column

I have been using sqlite3 with Python for creating databases. Until now I have been successful, but I cannot find a way out of this one. I have a table with 63 columns but I want to select only 62 of them. I know I could write the names of the columns in the SELECT statement, but writing out 62 of them seems illogical (for a programmer like me). I am using Python's sqlite3 module. Is there a way out of this?
I'm sorry if I have made grammatical mistakes.
Thanks in advance.
With SQLite, you can:
run a PRAGMA table_info(tablename); query to get a result set that describes the table's columns,
pluck the column names out of that result set and remove the one you don't want, and
compose a column list for the SELECT statement using e.g. ', '.join(column_names) (though you might want to consider a higher-level SQL statement builder instead of playing with strings; see the SQLAlchemy sketch after the example).
Example
A simple example using a simple table and an in-memory SQLite database:
import sqlite3
con = sqlite3.connect(":memory:")
con.executescript("CREATE TABLE kittens (id INTEGER, name TEXT, color TEXT, furriness INTEGER, age INTEGER)")
columns = [row[1] for row in con.execute("PRAGMA table_info(kittens)")]
print(columns)
selected_columns = [column for column in columns if column != 'age']
print(selected_columns)
query = f"SELECT {', '.join(selected_columns)} FROM kittens"
print(query)
This prints out
['id', 'name', 'color', 'furriness', 'age']
['id', 'name', 'color', 'furriness']
SELECT id, name, color, furriness FROM kittens
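For the higher-level builder route mentioned above, a sketch using SQLAlchemy Core (my choice of library, not something the original answer spelled out; it assumes SQLAlchemy 1.4+):
import sqlalchemy as sa

engine = sa.create_engine("sqlite:///:memory:")
with engine.begin() as con:
    con.execute(sa.text("CREATE TABLE kittens (id INTEGER, name TEXT, color TEXT, furriness INTEGER, age INTEGER)"))

# Reflect the table so its column objects come from the database itself
kittens = sa.Table("kittens", sa.MetaData(), autoload_with=engine)

# Select every column except 'age' without assembling SQL strings by hand
stmt = sa.select(*[c for c in kittens.columns if c.name != "age"])
print(stmt)  # SELECT kittens.id, kittens.name, kittens.color, kittens.furriness FROM kittens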

Pandas to_sql() to update unique values in DB?

How can I use df.to_sql(if_exists='append') to append ONLY the unique values between the dataframe and the database? In other words, I would like to evaluate the duplicates between the DF and the DB and drop those duplicates before writing to the database.
Is there a parameter for this?
I understand that the parameters if_exists='append' and if_exists='replace' apply to the entire table, not to the unique entries.
I am using:
sqlalchemy
pandas dataframe with the following datatypes:
index: datetime.datetime <-- Primary Key
float
float
float
float
integer
string <-- Primary Key
string <-- Primary Key
I'm stuck on this, so your help is much appreciated. Thanks.
In pandas, there is no convenient argument in to_sql to append only non-duplicates to a final table. Consider using a staging temp table that pandas always replaces, and then run a final append query to migrate the temp-table records to the final table, keeping only the unique primary keys with a NOT EXISTS clause.
engine = sqlalchemy.create_engine(...)
df.to_sql(name='myTempTable', con=engine, if_exists='replace')
with engine.begin() as cn:
    sql = """INSERT INTO myFinalTable (Col1, Col2, Col3, ...)
             SELECT t.Col1, t.Col2, t.Col3, ...
             FROM myTempTable t
             WHERE NOT EXISTS
                 (SELECT 1 FROM myFinalTable f
                  WHERE t.MatchColumn1 = f.MatchColumn1
                    AND t.MatchColumn2 = f.MatchColumn2)"""
    cn.execute(sql)
This is an ANSI SQL solution, not restricted to vendor-specific methods such as UPSERT, and so it works in practically all SQL-compliant relational databases.
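One caveat worth adding (my note, not part of the original answer): on SQLAlchemy 1.4/2.0, a raw SQL string can no longer be passed directly to Connection.execute(), so wrap it in text():
from sqlalchemy import text

with engine.begin() as cn:
    cn.execute(text(sql))  # required on SQLAlchemy 2.0; also works on 1.4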

Importing SQL query into Pandas results in only 1 column

I'm trying to import the results of a complex SQL query into a pandas dataframe. My query requires me to create several temporary tables since the final result table I want includes some aggregates.
My code looks like this:
cnxn = pyodbc.connect(r'DRIVER=foo;SERVER=bar;etc')
cursor = cnxn.cursor()
cursor.execute('SQL QUERY HERE')
cursor.execute('SECONDARY SQL QUERY HERE')
...
df = pd.DataFrame(cursor.fetchall(),columns = [desc[0] for desc in cursor.description])
I get an error that tells me shapes aren't matching:
ValueError: Shape of passed values is (1, 900000), indices imply (5, 900000)
And indeed, the result of all the SQL queries should be a table with 5 columns rather than 1. I've run the SQL query in Microsoft SQL Server Management Studio and it works, returning the 5-column table that I want. I've tried not passing any column names into the dataframe, printed out the head of the dataframe, and found that pandas has put all the information from the 5 columns into 1. The value in each row is a list of 5 values separated by commas, but pandas treats the entire list as one column. Why is pandas doing this? I've also tried going the pd.read_sql route but I still get the same error.
EDIT:
I have done some more debugging, taking the comments into account. The issue doesn't appear to stem from the fact that my query is nested. I tried a simple (one line) query to return a 3 column table and I still got the same error. Printing out fetchall() looks like this:
[(str1,str2,str3,datetime.date(stuff),datetime.date(stuff)),
(str1,str2,str3,datetime.date(stuff),datetime.date(stuff)),...]
Use pd.DataFrame.from_records instead:
df = pd.DataFrame.from_records(cursor.fetchall(),
                               columns=[desc[0] for desc in cursor.description])
Simply adjust the pd.DataFrame() call: cursor.fetchall() returns rows that pandas does not unpack into separate columns, so map each row to a plain tuple (or list) so its elements become their own columns:
df = pd.DataFrame([tuple(row) for row in cursor.fetchall()],
                  columns=[desc[0] for desc in cursor.description])
