Write results to permanent table in bigquery - python

I am using named parameters in BigQuery SQL and want to write the results to a permanent table. I have two functions: one that runs a query with named parameters and one that writes query results to a table. How do I combine the two so that the results of a query with named parameters are written to a table?
This is the function using parameterized queries:
def sync_query_named_params(column_name,min_word_count,value):
    query = """with lsq_results as
    (select "%s" = @min_word_count)
    replace (%s AS %s)
    from lsq.lsq_results
    """ % (min_word_count,value,column_name)
    client = bigquery.Client()
    query_results = client.run_sync_query(
        query,
        query_parameters=(
            bigquery.ScalarQueryParameter('column_name', 'STRING', column_name),
            bigquery.ScalarQueryParameter(
                'min_word_count',
                'STRING',
                min_word_count),
            bigquery.ScalarQueryParameter('value','INT64',value)
        ))
    query_results.use_legacy_sql = False
    query_results.run()
Function to write to a permanent table:
class BigQueryClient(object):
    def __init__(self, bq_service, project_id, swallow_results=True):
        self.bigquery = bq_service
        self.project_id = project_id
        self.swallow_results = swallow_results
        self.cache = {}
    def write_to_table(
            self,
            query,
            dataset=None,
            table=None,
            external_udf_uris=None,
            allow_large_results=None,
            use_query_cache=None,
            priority=None,
            create_disposition=None,
            write_disposition=None,
            use_legacy_sql=None,
            maximum_billing_tier=None,
            flatten=None):
        configuration = {
            "query": query,
        }
        if dataset and table:
            configuration['destinationTable'] = {
                "projectId": self.project_id,
                "tableId": table,
                "datasetId": dataset
            }
        if allow_large_results is not None:
            configuration['allowLargeResults'] = allow_large_results
        if flatten is not None:
            configuration['flattenResults'] = flatten
        if maximum_billing_tier is not None:
            configuration['maximumBillingTier'] = maximum_billing_tier
        if use_query_cache is not None:
            configuration['useQueryCache'] = use_query_cache
        if use_legacy_sql is not None:
            configuration['useLegacySql'] = use_legacy_sql
        if priority:
            configuration['priority'] = priority
        if create_disposition:
            configuration['createDisposition'] = create_disposition
        if write_disposition:
            configuration['writeDisposition'] = write_disposition
        if external_udf_uris:
            configuration['userDefinedFunctionResources'] = \
                [ {'resourceUri': u} for u in external_udf_uris ]
        body = {
            "configuration": {
                'query': configuration
            }
        }
        logger.info("Creating write to table job %s" % body)
        job_resource = self._insert_job(body)
        self._raise_insert_exception_if_error(job_resource)
        return job_resource
How do I combine the two functions so that a parameterized query writes its results to a permanent table? If there is a simpler way, please suggest it.

You appear to be using two different client libraries.
Your first code sample uses a beta version of the BigQuery client library, but for the time being I would recommend against using it, since it needs substantial revision before it is considered generally available. (And if you do use it, I would recommend using run_async_query() to create a job using all available parameters, and then call results() to get the QueryResults object.)
Your second code sample is creating a job resource directly, which is a lower-level interface. When using this approach, you can specify the configuration.query.queryParameters field on your query configuration directly. This is the approach I'd recommend right now.
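As a rough sketch of that second approach, the job body below carries named parameters in configuration.query.queryParameters next to destinationTable. The helper names scalar_param and build_query_job_body, the sample query and the project, dataset and table names are all hypothetical; the field names follow the BigQuery REST job resource as I understand it, and named parameters need standard SQL (useLegacySql set to False).

```python
def scalar_param(name, type_, value):
    """Build one entry of configuration.query.queryParameters."""
    return {
        "name": name,
        "parameterType": {"type": type_},
        # The REST representation carries scalar values as strings.
        "parameterValue": {"value": str(value)},
    }


def build_query_job_body(query, project_id, dataset, table, params):
    """Return a job body that writes the query results to a table."""
    return {
        "configuration": {
            "query": {
                "query": query,
                "useLegacySql": False,  # named parameters need standard SQL
                "parameterMode": "NAMED",
                "queryParameters": params,
                "destinationTable": {
                    "projectId": project_id,
                    "datasetId": dataset,
                    "tableId": table,
                },
                "writeDisposition": "WRITE_TRUNCATE",
            }
        }
    }


body = build_query_job_body(
    query="SELECT word FROM lsq.lsq_results WHERE word_count >= @min_word_count",
    project_id="my-project",  # hypothetical
    dataset="my_dataset",     # hypothetical
    table="my_results",       # hypothetical
    params=[scalar_param("min_word_count", "INT64", 250)],
)
```

The resulting dict has the same shape as the body built in write_to_table, so it could be handed to the same _insert_job call. Note that a query parameter can stand in for a value, but not for a column or table name.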

S3 Select Query JSON for nested value when keys are dynamic

I have a JSON object in S3 which follows this structure:
<code> : {
<client>: <value>
}
For example,
{
"code_abc": {
"client_1": 1,
"client_2": 10
},
"code_def": {
"client_2": 40,
"client_3": 50,
"client_5": 100
},
...
}
I am trying to retrieve the numerical value with an S3 Select query, where the "code" and the "client" are populated dynamically with each query.
So far I have tried:
sql_exp = f"SELECT * from s3object[*][*] s where s.{proc}.{client_name} IS NOT NULL"
sql_exp = f"SELECT * from s3object s where s.{proc}[*].{client_name}[*] IS NOT NULL"
as well as variants without the asterisk inside the square brackets, but nothing works. I get ClientError: An error occurred (ParseUnexpectedToken) when calling the SelectObjectContent operation: Unexpected token found LITERAL:UNKNOWN at line 1, column X (where X depends on the length of the query string).
Within the function defining the object, I have:
resp = s3.select_object_content(
    Bucket=<bucket>,
    Key=<filename>,
    ExpressionType="SQL",
    Expression=sql_exp,
    InputSerialization={'JSON': {"Type": "Document"}},
    OutputSerialization={"JSON": {}},
)
Is there something off in the way I define the object serialization? How can I fix the query so that I can retrieve the desired numerical value on the fly when I provide "code" and "client"?
I did some tinkering based on the documentation, and it works!
I need to access the single event in the EventStream (resp) as follows:
event_stream = resp['Payload']
# unpack successful query response
for event in event_stream:
    if "Records" in event:
        output_str = event["Records"]["Payload"].decode("utf-8")  # bytes to string
        output_dict = json.loads(output_str)  # string to dict
Now the correct SQL expression is:
sql_exp= f"SELECT s['{code}']['{client}'] FROM S3Object s"
where I have gotten (dynamically) my values for code and client beforehand.
For example, based on the dummy JSON structure above, if code = "code_abc" and client = "client_2", I want this S3 Select query to return the value 10.
The f-string resolves to sql_exp = "SELECT s['code_abc']['client_2'] FROM S3Object s", and when we read resp, we get output_dict = {'client_2': 10}. (I am not sure whether there is a clean way to get the value by itself without the client key; the documentation shows the same shape.)
So, the final step is to retrieve value = output_dict['client_2'], which in our case is equal to 10.
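Putting the steps above together, here is a minimal sketch that builds the expression and unpacks the event stream. The function names build_expression and extract_value are hypothetical, and fake_stream is a hand-written stand-in for resp['Payload'] so the snippet runs without AWS access. It joins the payload chunks before decoding, on the assumption that a record can arrive split across several Records events.

```python
import json


def build_expression(code, client):
    """Build the S3 Select expression for one code/client pair."""
    # The keys are interpolated into SQL text, so reject quotes defensively.
    for part in (code, client):
        if "'" in part:
            raise ValueError(f"unexpected quote in key: {part!r}")
    return f"SELECT s['{code}']['{client}'] FROM S3Object s"


def extract_value(event_stream, client):
    """Return the value stored under `client`, or None if nothing matched."""
    chunks = []
    for event in event_stream:
        if "Records" in event:
            chunks.append(event["Records"]["Payload"])
    # Join before decoding: one record may be split across events.
    text = b"".join(chunks).decode("utf-8").strip()
    if not text:
        return None
    first_record = text.splitlines()[0]
    return json.loads(first_record).get(client)


# Hand-written stand-in for resp["Payload"].
fake_stream = [
    {"Records": {"Payload": b'{"client_'}},
    {"Records": {"Payload": b'2":10}\n'}},
    {"Stats": {}},
    {"End": {}},
]

sql_exp = build_expression("code_abc", "client_2")
value = extract_value(fake_stream, "client_2")
```

With the real service, extract_value(resp['Payload'], client) would take the place of the loop shown earlier.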

How to declare variables in a GraphQL query using the gql client?

I'm new to GraphQL schemas and I would like to run a mutation using the gql client. The query below works like a charm in the GraphQL web interface after replacing the 5 variables with the corresponding strings and integers.
But when I put a $ before every variable in the query, as mentioned in the documentation, it throws an error saying Variable '$w' is not defined by operation 'createMutation'.
What am I missing?
transport = AIOHTTPTransport(url="http://x.x.x.x:8000/graphql")
client = Client(transport=transport, fetch_schema_from_transport=True)
query = gql(
    """
    mutation createMutation {
      createTarget(targetData: {weight: $w, dt: $dt,
                                exchangeId: $exchangeId,
                                strategyId: $strategyId,
                                marketId: $marketId
      }) {
        target {
          dt,
          weight,
          market,
          exchange,
          strategy
        }
      }
    }
    """
)
params = {"w": self.weight,
          "dt": self.dt,
          "exchangeId": self.exchange.pk,
          "strategyId": self.strategy.pk,
          "marketId": self.market.pk
          }
result = client.execute(query, variable_values=params)
When I remove the $ it says Float cannot represent non numeric value: w.
And this is what the graphene code looks like on the server side:
class TargetInput(graphene.InputObjectType):
    weight = graphene.Float()
    dt = graphene.DateTime()
    strategy_id = graphene.Int()
    exchange_id = graphene.Int()
    market_id = graphene.Int()
class CreateTarget(graphene.Mutation):
    class Arguments:
        target_data = TargetInput(required=True)
    target = graphene.Field(CustomObject)
    @staticmethod
    def mutate(root, info, target_data):
        target = Target.objects.create(**target_data)
        return CreateTarget(target=target)
class Mutation(graphene.ObjectType):
    create_target = CreateTarget.Field()
schema = graphene.Schema(query=Query, mutation=Mutation)
There is also another question related to gql variables but it doesn't solve my problem.
I have found the answer to my own question. When using variables, each of them must be declared with its type between ( and ) at the beginning of the operation, as stipulated here.
So in my case the correct query was:
query = gql(
    """
    mutation createMutation ($w: Float, $dt: DateTime, $exchangeId: Int, $strategyId: Int, $marketId: Int){
      createTarget(targetData: {weight: $w, dt: $dt,
                                exchangeId: $exchangeId,
                                strategyId: $strategyId,
                                marketId: $marketId
      }) {
        target {
          dt
        }
      }
    }
    """
)
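To make the rule concrete, here is a small plain-Python check that lists the variables used in an operation body but never declared in its header. It is only a regex sketch, not a GraphQL parser: the function name undeclared_variables is hypothetical, and it assumes a single operation whose header contains no braces. The two operations are shortened versions of the ones above.

```python
import re


def undeclared_variables(operation):
    """Return the $variables used in the body but not declared in the header."""
    # Header: everything before the first "{"; body: everything after it.
    header, _, body = operation.partition("{")
    declared = set(re.findall(r"\$(\w+)\s*:", header))
    used = set(re.findall(r"\$(\w+)", body))
    return sorted(used - declared)


BROKEN = """
mutation createMutation {
  createTarget(targetData: {weight: $w, dt: $dt}) { target { dt } }
}
"""

FIXED = """
mutation createMutation($w: Float, $dt: DateTime) {
  createTarget(targetData: {weight: $w, dt: $dt}) { target { dt } }
}
"""

missing_before = undeclared_variables(BROKEN)
missing_after = undeclared_variables(FIXED)
```

The names reported for the broken operation are the ones the server complains about with "is not defined by operation".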

Python mock multiple queries in a function using pytest_mock

I am writing a unit test for a function that contains multiple SQL queries. I am using the psycopg2 module and trying to mock the cursor.
app.py
import psycopg2
def my_function():
    # all connection related code goes here ...
    query = "SELECT name,phone FROM customer WHERE name='shanky'"
    cursor.execute(query)
    columns = [i[0] for i in cursor.description]
    customer_response = []
    for row in cursor.fetchall():
        customer_response.append(dict(zip(columns, row)))
    query = "SELECT name,id FROM product WHERE name='soap'"
    cursor.execute(query)
    columns = [i[0] for i in cursor.description]
    product_response = []
    for row in cursor.fetchall():
        product_response.append(dict(zip(columns, row)))
    return product_response
test.py
from pytest_mock import mocker
import psycopg2
def test_my_function(mocker):
    from my_module import app
    mocker.patch('psycopg2.connect')
    #first query
    mocked_cursor_one = psycopg2.connect.return_value.cursor.return_value
    mocked_cursor_one.description = [['name'],['phone']]
    mocked_cursor_one.fetchall.return_value = [('shanky', '347539593')]
    mocked_cursor_one.execute.call_args == "SELECT name,phone FROM customer WHERE name='shanky'"
    #second query
    mocked_cursor_two = psycopg2.connect.return_value.cursor.return_value
    mocked_cursor_two.description = [['name'],['id']]
    mocked_cursor_two.fetchall.return_value = [('nirma', 12313)]
    mocked_cursor_two.execute.call_args == "SELECT name,id FROM product WHERE name='soap'"
    ret = app.my_function()
    assert ret == {'name' : 'nirma', 'id' : 12313}
But the mocker always takes the last mock object (the second query). I have already tried multiple hacks, but none of them worked. How can I mock multiple queries in one function and make the unit test pass? Is it possible to write a unit test in this fashion, or do I need to split the queries into different functions?
After digging through the documentation, I was able to achieve this with the unittest mock decorator and side_effect, as suggested by @Pavel Vergeev. I was able to write a unit test that is good enough to test the functionality.
from unittest import mock
from my_module import app
@mock.patch('psycopg2.connect')
def test_my_function(mocked_db):
    mocked_cursor = mocked_db.return_value.cursor.return_value
    description_mock = mock.PropertyMock()
    type(mocked_cursor).description = description_mock
    fetchall_return_one = [('shanky', '347539593')]
    fetchall_return_two = [('nirma', 12313)]
    descriptions = [
        [['name'],['phone']],
        [['name'],['id']]
    ]
    mocked_cursor.fetchall.side_effect = [fetchall_return_one, fetchall_return_two]
    description_mock.side_effect = descriptions
    ret = app.my_function()
    # assert whether called with mocked side effect objects
    mocked_db.assert_has_calls(mocked_cursor.fetchall.side_effect)
    # assert db query count is 2
    assert mocked_db.return_value.cursor.return_value.execute.call_count == 2
    # first query (must match the string in app.py exactly)
    query1 = "SELECT name,phone FROM customer WHERE name='shanky'"
    assert mocked_db.return_value.cursor.return_value.execute.call_args_list[0][0][0] == query1
    # second query
    query2 = "SELECT name,id FROM product WHERE name='soap'"
    assert mocked_db.return_value.cursor.return_value.execute.call_args_list[1][0][0] == query2
    # assert the data of response (my_function returns a list of row dicts)
    assert ret == [{'name' : 'nirma', 'id' : 12313}]
In addition, if the query takes dynamic parameters, they can be asserted as follows:
assert mocked_db.return_value.cursor.return_value.execute.call_args_list[0][0][1] == (parameter_name,)
When the first query is executed as cursor.execute(query, (parameter_name,)), the query text is available at call_args_list[0][0][0] and the parameter tuple at call_args_list[0][0][1]. Incrementing the first index gives the queries and parameters of the later calls in the same way.
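For reference, here is a self-contained sketch of the same idea that runs without psycopg2. The function run_two_queries is a hypothetical stand-in for app.my_function that receives the connection as an argument; the mock setup (PropertyMock for description, side_effect for fetchall, call_args_list for the parameters) is the part being illustrated.

```python
from unittest import mock


def run_two_queries(connection):
    """Hypothetical stand-in: two parameterized queries on one cursor."""
    cursor = connection.cursor()
    results = []
    for query, params in [
        ("SELECT name,phone FROM customer WHERE name=%s", ("shanky",)),
        ("SELECT name,id FROM product WHERE name=%s", ("soap",)),
    ]:
        cursor.execute(query, params)
        columns = [col[0] for col in cursor.description]
        results.append([dict(zip(columns, row)) for row in cursor.fetchall()])
    return results


connection = mock.MagicMock()
cursor = connection.cursor.return_value

# description is read as an attribute, so it needs a PropertyMock on the type.
description = mock.PropertyMock(
    side_effect=[[["name"], ["phone"]], [["name"], ["id"]]]
)
type(cursor).description = description

# fetchall is called, so side_effect on the method is enough.
cursor.fetchall.side_effect = [[("shanky", "347539593")], [("nirma", 12313)]]

customers, products = run_two_queries(connection)
first_call, second_call = cursor.execute.call_args_list
```

Each entry of call_args_list unpacks into positional and keyword arguments, so first_call[0] is the (query, params) pair of the first execute call.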
Try side_effect argument of mocker.patch:
from unittest.mock import MagicMock
from pytest_mock import mocker
import psycopg2
def test_my_function(mocker):
    from my_module import app
    mocker.patch('psycopg2.connect', side_effect=[MagicMock(), MagicMock()])
    #first query
    mocked_cursor_one = psycopg2.connect().cursor.return_value # note that we actually call psycopg2.connect -- it's important
    mocked_cursor_one.description = [['name'],['phone']]
    mocked_cursor_one.fetchall.return_value = [('shanky', '347539593')]
    mocked_cursor_one.execute.call_args == "SELECT name,phone FROM customer WHERE name='shanky'"
    #second query
    mocked_cursor_two = psycopg2.connect().cursor.return_value
    mocked_cursor_two.description = [['name'],['id']]
    mocked_cursor_two.fetchall.return_value = [('nirma', 12313)]
    mocked_cursor_two.execute.call_args == "SELECT name,id FROM product WHERE name='soap'"
    assert mocked_cursor_one is not mocked_cursor_two # show that they are different
    ret = app.my_function()
    assert ret == {'name' : 'nirma', 'id' : 12313}
As per the docs, side_effect allows you to change returned value each time the patched object is called:
If you pass in an iterable, it is used to retrieve an iterator which must yield a value on every call. This value can either be an exception instance to be raised, or a value to be returned from the call to the mock
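A tiny illustration of the quoted behaviour, using only unittest.mock (the name fetch and the row values are made up for the example): each call consumes the next item, and an exception instance in the iterable is raised rather than returned.

```python
from unittest.mock import Mock

fetch = Mock(side_effect=[["row-1"], ["row-2"], RuntimeError("connection lost")])

first = fetch()   # first item of the iterable
second = fetch()  # second item

try:
    fetch()       # the exception instance is raised, not returned
    raised = False
except RuntimeError:
    raised = True
```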
As I mentioned in an earlier comment, the best way to make unit tests portable is to develop a complete mock of your database's behavior.
I've done it for MySQL, but it's pretty much the same for all databases.
First of all, I like using wrapper classes over the packages I'm using; it lets me change the database in one place instead of everywhere in the code.
Here's a sample of what I use as a wrapper; the MySQL class built on it is what you then need to mock:
# _database.py
# -----------------------------------------------------------------------------
# Database Base Class
# -----------------------------------------------------------------------------
"""Base class for Database implementation.
"""
# -----------------------------------------------------------------------------
import logging
logger = logging.getLogger(__name__)
class Database:
    """Database base class"""
    def __init__(self, connect_func, **kwargs):
        self.connection = connect_func(**kwargs)
    def execute(self, statement, fetchall=True):
        """Execute a statement.
        Execute the statement passed as argument.
        Args:
            statement (str): SQL Query or Command to execute.
            fetchall (bool): Return all rows if True, otherwise only the first.
        Returns:
            list: Rows returned by the cursor (a single row if fetchall is False).
        """
        cursor = self.connection.cursor()
        logger.debug(f"Executing: {statement}")
        cursor.execute(statement)
        if fetchall:
            return cursor.fetchall()
        else:
            return cursor.fetchone()
    def __del__(self):
        """Close connection on object deletion."""
        self.connection.close()
And the mysql module:
# mysql.py
# -*- coding: utf-8 -*-
# -----------------------------------------------------------------------------
# MySQL Database Class
# -----------------------------------------------------------------------------
"""Class for MySQL Database connection."""
# -----------------------------------------------------------------------------
import logging
import mysql.connector
from . import _database
logger = logging.getLogger(__name__)
class MySQL(_database.Database):
    """MySQL Database Class Wrapper.
    Attributes:
        connection (obj): Object returned from mysql.connector.connect
    """
    def __init__(self, autocommit=True, **kwargs):
        super().__init__(connect_func=mysql.connector.connect, **kwargs)
        self.connection.autocommit = autocommit
Instantiate like: db = MySQL(user='...', password='...', ...)
Here's the data file:
# database_mock_data.json
{
"customer": {
"name": [
"shanky",
"nirma"
],
"phone": [
123123123,
232342342
]
},
"product": {
"name": [
"shanky",
"nirma"
],
"id": [
1,
2
]
}
}
The mocks.py:
# mocks.py
import json
import re
from . import mysql
_MOCK_DATA_PATH = 'database_mock_data.json'
class MockDatabase(mysql.MySQL):
    """Database wrapper whose connection is replaced by a mock."""
    def __init__(self, **kwargs):
        self.connection = MockConnection()
class MockConnection:
    """
    Mock the connection object by returning a mock cursor.
    """
    @staticmethod
    def cursor():
        return MockCursor()
    @staticmethod
    def close():
        # Database.__del__ calls close(); there is nothing to release here.
        pass
class MockCursor:
    """
    The Mocked Cursor
    A call to execute() will initiate the read on the json data file and will set
    the description object (containing the column names usually).
    You could implement an update function like `_json_sql_update()`
    """
    def __init__(self):
        self.description = []
        self.__result = None
    def execute(self, statement):
        data = _read_json_file(_MOCK_DATA_PATH)
        if statement.upper().startswith('SELECT'):
            self.__result, self.description = _json_sql_select(data, statement)
    def fetchall(self):
        return self.__result
    def fetchone(self):
        return self.__result[0]
def _json_sql_select(data, query):
    """
    Takes a dictionary and returns the values from a sql query.
    NOTE: It does not work with other where clauses than '='.
    Also, note that a where statement is expected.
    :param (dict) data: Dictionary with the following structure:
        {
            'tablename': {
                'column_name_1': ['value1', 'value2'],
                'column_name_2': ['value1', 'value2'],
                ...
            },
            ...
        }
    :param (str) query: A select sql query as:
        `select column_name_1 from TABLENAME
        where column_name_2='value1'`
    :return: List of list of values and header description
    """
    try:
        match = (re.search("select(.*)from(.*)where(.*)[;]?", query,
                           re.IGNORECASE | re.DOTALL).groups())
    except AttributeError:
        print("Select Query pattern mismatch... {}".format(query))
        raise
    # Parse values from the select query (the data file uses lower-case names)
    tablename = match[1].strip().lower()
    columns = [col.strip().lower() for col in match[0].split(",")]
    if columns == ['*']:
        columns = list(data[tablename].keys())
    where = [cmd.strip().replace(' ', '')
             for cmd in re.split(r'\band\b', match[2], flags=re.IGNORECASE)]
    # Select values
    selected_values = []
    nb_lines = len(list(data[tablename].values())[0])
    for i in range(nb_lines):
        is_match = True
        for condition in where:
            key_condition, value_condition = (_clean_string(condition)
                                              .split('='))
            cell = str(data[tablename][key_condition.lower()][i])
            if cell.upper() != value_condition.upper():
                # At least one condition failed for this row
                is_match = False
        if is_match:
            sub_list = []
            for column in columns:
                sub_list.append(data[tablename][column][i])
            selected_values.append(sub_list)
    # Usual descriptor has nested list
    description = list(zip(columns, ['...'] * len(columns)))
    return selected_values, description
def _clean_string(text):
    # This helper was not shown in the original answer; it is assumed here to
    # strip quotes and a trailing semicolon from a where condition.
    return text.replace("'", "").replace('"', '').rstrip(';')
def _read_json_file(file_path):
    with open(file_path, 'r') as f_in:
        data = json.load(f_in)
    return data
And then you have your test in a test_module_yourfunction.py:
import pytest
from .mocks import MockDatabase
def my_function(db, query):
    # Code goes here
    ...
@pytest.fixture
def db_connection():
    return MockDatabase()
@pytest.mark.parametrize(
    ("query", "expected"),
    [
        ("SELECT name,phone FROM customer WHERE name='shanky'", {'name' : 'nirma', 'id' : 12313}),
        ("<second query goes here>", "<second result goes here>")
    ]
)
def test_my_function(db_connection, query, expected):
    assert my_function(db_connection, query) == expected
Sorry if you can't copy/paste this code and make it work as is, but you get the idea :) Just trying to help.

How to set group = true in couchdb

I am trying to use map/reduce to find duplicated data in CouchDB.
The map function looks like this:
function(doc) {
  if (doc.coordinates) {
    emit({
      twitter_id: doc.id_str,
      text: doc.text,
      coordinates: doc.coordinates
    }, 1);
  }
}
and the reduce function is:
function(keys,values,rereduce){return sum(values)}
I want the sum of the values for each key, but it just adds everything together and I get the result:
<Row key=None, value=1035>
Is this a grouping problem? How can I set group to true?
Assuming you're using the couchdb package from PyPI, you need to pass the options you require to the view as keyword arguments, for instance by unpacking a dictionary.
For example:
import couchdb
# the design doc and view name of the view you want to use
ddoc = "my_design_document"
view_name = "my_view"
# your server
server = couchdb.Server("http://localhost:5984")
db = server["aCouchDatabase"]
# naming convention when passing a ddoc and view to the view method
view_string = ddoc + "/" + view_name
# query options
view_options = {"reduce": True,
                "group": True,
                "group_level": 2}
# call the view; the options are unpacked into keyword arguments
results = db.view(view_string, **view_options)
for row in results:
    # do something
    pass
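To see why the ungrouped query returns a single row with key None, here is a plain-Python imitation of what the reduce step does with and without grouping. This is not CouchDB code: the rows and the function names reduce_all and reduce_grouped are invented for the illustration.

```python
from itertools import groupby

# Rows as a map function might emit them: (key, value) pairs.
rows = [("tweet-a", 1), ("tweet-b", 1), ("tweet-a", 1), ("tweet-c", 1), ("tweet-a", 1)]


def reduce_all(rows):
    """Like group=false: one row with key None and the overall sum."""
    return [(None, sum(value for _, value in rows))]


def reduce_grouped(rows):
    """Like group=true: one reduced row per distinct key."""
    ordered = sorted(rows, key=lambda row: row[0])  # view rows are sorted by key
    return [
        (key, sum(value for _, value in group))
        for key, group in groupby(ordered, key=lambda row: row[0])
    ]


ungrouped = reduce_all(rows)
grouped = reduce_grouped(rows)
duplicates = [key for key, count in grouped if count > 1]
```

With grouping on, any key whose count is greater than 1 is a duplicate, which is what the question is after.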

SQLAlchemy ON DUPLICATE KEY UPDATE

Is there an elegant way to do an INSERT ... ON DUPLICATE KEY UPDATE in SQLAlchemy? I mean something with a syntax similar to inserter.insert().execute(list_of_dictionaries) ?
ON DUPLICATE KEY UPDATE post version-1.2 for MySQL
This functionality is now built into SQLAlchemy for MySQL only. somada141's answer below has the best solution:
https://stackoverflow.com/a/48373874/319066
ON DUPLICATE KEY UPDATE in the SQL statement
If you want the generated SQL to actually include ON DUPLICATE KEY UPDATE, the simplest way involves using a @compiles decorator.
The code (linked from a good thread on the subject on reddit) for an example can be found on github:
from sqlalchemy.ext.compiler import compiles
from sqlalchemy.sql.expression import Insert
@compiles(Insert)
def append_string(insert, compiler, **kw):
    s = compiler.visit_insert(insert, **kw)
    if 'append_string' in insert.kwargs:
        return s + " " + insert.kwargs['append_string']
    return s
my_connection.execute(my_table.insert(append_string = 'ON DUPLICATE KEY UPDATE foo=foo'), my_values)
But note that in this approach, you have to manually create the append_string. You could probably change the append_string function so that it automatically changes the insert string into an insert with 'ON DUPLICATE KEY UPDATE' string, but I'm not going to do that here due to laziness.
ON DUPLICATE KEY UPDATE functionality within the ORM
SQLAlchemy does not provide an interface to ON DUPLICATE KEY UPDATE or MERGE or any other similar functionality in its ORM layer. Nevertheless, it has the session.merge() function that can replicate the functionality only if the key in question is a primary key.
session.merge(ModelObject) first checks if a row with the same primary key value exists by sending a SELECT query (or by looking it up locally). If it does, it sets a flag somewhere indicating that ModelObject is in the database already, and that SQLAlchemy should use an UPDATE query. Note that merge is quite a bit more complicated than this, but it replicates the functionality well with primary keys.
But what if you want ON DUPLICATE KEY UPDATE functionality with a non-primary key (for example, another unique key)? Unfortunately, SQLAlchemy doesn't have any such function. Instead, you have to create something that resembles Django's get_or_create(). Another StackOverflow answer covers it, and I'll just paste a modified, working version of it here for convenience.
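from sqlalchemy.sql.expression import ClauseElement
def get_or_create(session, model, defaults=None, **kwargs):
    instance = session.query(model).filter_by(**kwargs).first()
    if instance:
        return instance
    else:
        # dict.items() replaces the Python 2 only iteritems()
        params = dict((k, v) for k, v in kwargs.items() if not isinstance(v, ClauseElement))
        if defaults:
            params.update(defaults)
        # The new instance is returned without being added to the session
        instance = model(**params)
        return instance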
I should mention that since the v1.2 release, the SQLAlchemy 'core' has a built-in solution to the above, which can be seen here (copied snippet below):
from sqlalchemy.dialects.mysql import insert
insert_stmt = insert(my_table).values(
    id='some_existing_id',
    data='inserted value')
on_duplicate_key_stmt = insert_stmt.on_duplicate_key_update(
    data=insert_stmt.inserted.data,
    status='U'
)
conn.execute(on_duplicate_key_stmt)
Based on phsource's answer, and for the specific use-case of using MySQL and completely overriding the data for the same key without performing a DELETE statement, one can use the following @compiles decorated insert expression:
from sqlalchemy.ext.compiler import compiles
from sqlalchemy.sql.expression import Insert
@compiles(Insert)
def append_string(insert, compiler, **kw):
    s = compiler.visit_insert(insert, **kw)
    if insert.kwargs.get('on_duplicate_key_update'):
        fields = s[s.find("(") + 1:s.find(")")].replace(" ", "").split(",")
        generated_directive = ["{0}=VALUES({0})".format(field) for field in fields]
        return s + " ON DUPLICATE KEY UPDATE " + ",".join(generated_directive)
    return s
It depends on what you need. If you want to replace existing rows, pass OR REPLACE in prefixes (the example below uses OR IGNORE):
def bulk_insert(self, objects, table):
    # table: your table class; objects: a list of dictionaries [{col1: val1, col2: val2}]
    for counter, row in enumerate(objects):
        inserter = table.__table__.insert(prefixes=['OR IGNORE'], values=row)
        try:
            self.db.execute(inserter)
        except Exception as E:
            print(E)
        if counter % 100 == 0:
            self.db.commit()
    self.db.commit()
Here the commit interval can be changed to speed things up or slow them down.
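The OR IGNORE and OR REPLACE prefixes are SQLite syntax, so their effect can be tried with the sqlite3 module from the standard library. This is only an illustration with a made-up item table, not part of the answer's code. It shows the one thing worth knowing before picking OR REPLACE: the old row is deleted first, so columns missing from the new row are reset.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE item (id INTEGER PRIMARY KEY, name TEXT, qty INTEGER)")
conn.execute("INSERT INTO item (id, name, qty) VALUES (1, 'soap', 5)")

# OR IGNORE: the conflicting row is skipped, the existing row is untouched.
conn.execute("INSERT OR IGNORE INTO item (id, name) VALUES (1, 'nirma')")
after_ignore = conn.execute("SELECT id, name, qty FROM item").fetchall()

# OR REPLACE: the existing row is deleted and the new one inserted, so
# columns missing from the new row fall back to their default (NULL here).
conn.execute("INSERT OR REPLACE INTO item (id, name) VALUES (1, 'nirma')")
after_replace = conn.execute("SELECT id, name, qty FROM item").fetchall()

conn.close()
```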
My way:
import typing
from datetime import datetime
from sqlalchemy.dialects import mysql
class MyRepository:
    def model(self):
        return MySqlAlchemyModel
    def upsert(self, data: typing.List[typing.Dict]):
        if not data:
            return
        model = self.model()
        if hasattr(model, 'created_at'):
            for item in data:
                item['created_at'] = datetime.now()
        stmt = mysql.insert(getattr(model, '__table__')).values(data)
        for_update = []
        for k, v in data[0].items():
            for_update.append(k)
        dup = {k: getattr(stmt.inserted, k) for k in for_update}
        stmt = stmt.on_duplicate_key_update(**dup)
        self.db.session.execute(stmt)
        self.db.session.commit()
Usage:
myrepo.upsert([
    {
        "field11": "value11",
        "field21": "value21",
        "field31": "value31",
    },
    {
        "field12": "value12",
        "field22": "value22",
        "field32": "value32",
    },
])
The other answers have this covered but figured I'd reference another good example for mysql I found in this gist. This also includes the use of LAST_INSERT_ID, which may be useful depending on your innodb auto increment settings and whether your table has a unique key. Lifting the code here for easy reference but please give the author a star if you find it useful.
from app import db
from sqlalchemy import func
from sqlalchemy.dialects.mysql import insert
def upsert(model, insert_dict):
    """model can be a db.Model or a table(), insert_dict should contain a primary or unique key."""
    inserted = insert(model).values(**insert_dict)
    upserted = inserted.on_duplicate_key_update(
        id=func.LAST_INSERT_ID(model.id), **{k: inserted.inserted[k]
                                              for k, v in insert_dict.items()})
    res = db.engine.execute(upserted)
    return res.lastrowid
ORM: use an upsert function based on on_duplicate_key_update
# imports assumed by this snippet
from sqlalchemy.dialects.mysql import insert
from sqlalchemy.orm import Session
class Model():
    __input_data__ = dict()
    def __init__(self, **kwargs) -> None:
        self.__input_data__ = kwargs
        self.session = Session(engine)
    def save(self):
        self.session.add(self)
        self.session.commit()
    def upsert(self, *, ingore_keys = []):
        column_keys = self.__table__.columns.keys()
        udpate_data = dict()
        for key in self.__input_data__.keys():
            if key not in column_keys:
                continue
            else:
                udpate_data[key] = self.__input_data__[key]
        insert_stmt = insert(self.__table__).values(**udpate_data)
        all_ignore_keys = ['id']
        if isinstance(ingore_keys, list):
            all_ignore_keys =[*all_ignore_keys, *ingore_keys]
        else:
            all_ignore_keys.append(ingore_keys)
        udpate_columns = dict()
        for key in self.__input_data__.keys():
            if key not in column_keys or key in all_ignore_keys:
                continue
            else:
                udpate_columns[key] = insert_stmt.inserted[key]
        on_duplicate_key_stmt = insert_stmt.on_duplicate_key_update(
            **udpate_columns
        )
        # self.session.add(self)
        self.session.execute(on_duplicate_key_stmt)
        self.session.commit()
class ManagerAssoc(ORM_Base, Model):
    def __init__(self, **kwargs):
        self.id = idWorker.get_id()
        column_keys = self.__table__.columns.keys()
        udpate_data = dict()
        for key in kwargs.keys():
            if key not in column_keys:
                continue
            else:
                udpate_data[key] = kwargs[key]
        ORM_Base.__init__(self, **udpate_data)
        Model.__init__(self, **kwargs, id = self.id)
....
# you can call it as follows:
manager_assoc.upsert()
manager.upsert(ingore_keys = ['manager_id'])
Got a simpler solution:
from sqlalchemy.ext.compiler import compiles
from sqlalchemy.sql.expression import Insert
@compiles(Insert)
def replace_string(insert, compiler, **kw):
    s = compiler.visit_insert(insert, **kw)
    s = s.replace("INSERT INTO", "REPLACE INTO")
    return s
my_connection.execute(my_table.insert(replace_string=""), my_values)
I just used plain sql as:
insert_stmt = "REPLACE INTO tablename (column1, column2) VALUES (:column_1_bind, :columnn_2_bind) "
session.execute(insert_stmt, data)
Update Feb 2023: SQLAlchemy version 2 was recently released and supports on_duplicate_key_update in the MySQL dialect. Many many thanks to Federico Caselli of the SQLAlchemy project who helped me develop sample code in a discussion at https://github.com/sqlalchemy/sqlalchemy/discussions/9328
Please see https://stackoverflow.com/a/75538576/1630244
If it's ok to post the same answer twice (?) here is my small self-contained code example:
import sqlalchemy as db
import sqlalchemy.dialects.mysql as mysql
from sqlalchemy import delete, select, String
from sqlalchemy.orm import DeclarativeBase, Mapped, mapped_column
class Base(DeclarativeBase):
    pass
class User(Base):
    __tablename__ = "foo"
    id: Mapped[int] = mapped_column(primary_key=True)
    name: Mapped[str] = mapped_column(String(30))
engine = db.create_engine('mysql+mysqlconnector://USER-NAME-HERE:PASS-WORD-HERE@localhost/SCHEMA-NAME-HERE')
conn = engine.connect()
# setup step 0 - ensure the table exists
Base().metadata.create_all(bind=engine)
# setup step 1 - clean out rows with id 1..5
del_stmt = delete(User).where(User.id.in_([1, 2, 3, 4, 5]))
conn.execute(del_stmt)
conn.commit()
sel_stmt = select(User)
users = list(conn.execute(sel_stmt))
print(f'Table size after cleanout: {len(users)}')
# setup step 2 - insert 4 rows
ins_stmt = mysql.insert(User).values(
    [
        {"id": 1, "name": "x"},
        {"id": 2, "name": "y"},
        {"id": 3, "name": "w"},
        {"id": 4, "name": "z"},
    ]
)
conn.execute(ins_stmt)
conn.commit()
users = list(conn.execute(sel_stmt))
print(f'Table size after insert: {len(users)}')
# demonstrate upsert
ups_stmt = mysql.insert(User).values(
    [
        {"id": 1, "name": "xx"},
        {"id": 2, "name": "yy"},
        {"id": 3, "name": "ww"},
        {"id": 5, "name": "new"},
    ]
)
ups_stmt = ups_stmt.on_duplicate_key_update(name=ups_stmt.inserted.name)
# if you want to see the compiled result
# x = ups_stmt.compile(dialect=mysql.dialect())
# print(x.string, x.construct_params())
conn.execute(ups_stmt)
conn.commit()
users = list(conn.execute(sel_stmt))
print(f'Table size after upsert: {len(users)}')
None of these solutions seems all that elegant. A brute-force way is to query whether the row exists: if it does, delete the row and then insert; otherwise just insert. Obviously there is some overhead involved, but it does not rely on modifying the raw SQL and it works on non-ORM code.
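A minimal sketch of that brute-force approach, using sqlite3 from the standard library so that it runs anywhere; the users table and the function name delete_then_insert are made up for the example. Wrapping the three statements in one transaction is my own addition, so that a failed insert does not leave the row deleted.

```python
import sqlite3


def delete_then_insert(conn, row_id, name):
    """Brute-force upsert: delete the row if it exists, then insert."""
    with conn:  # one transaction: commit on success, roll back on error
        exists = conn.execute(
            "SELECT 1 FROM users WHERE id = ?", (row_id,)
        ).fetchone()
        if exists:
            conn.execute("DELETE FROM users WHERE id = ?", (row_id,))
        conn.execute(
            "INSERT INTO users (id, name) VALUES (?, ?)", (row_id, name)
        )


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE users (id INTEGER PRIMARY KEY, name TEXT)")

delete_then_insert(conn, 1, "x")    # row is new: plain insert
delete_then_insert(conn, 1, "xx")   # row exists: delete, then insert
delete_then_insert(conn, 5, "new")  # row is new: plain insert

rows = conn.execute("SELECT id, name FROM users ORDER BY id").fetchall()
```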
