I have two identical tables post and old_post. I have a query that checks for old posts. I would like to move the rows returned by the query into the table old_post and delete the rows from table post.
I could solve this by iterating through the results returned by the initial query and updating the tables row by row; however, I am worried this is very inefficient and will start to cause problems once I have 1,000+ rows. How can I efficiently "bulk move" rows from one table to another?
Query for the old Posts. Bulk insert OldPosts built from the old Posts' data. Bulk delete the old Posts.
# The column names shared by Post and OldPost
keys = db.inspect(Post).columns.keys()
# Build a dict of column name -> value for a given Post instance
get_columns = lambda post: {key: getattr(post, key) for key in keys}

posts = Post.query.filter(Post.expiry > ts)
db.session.bulk_insert_mappings(OldPost, (get_columns(post) for post in posts))
posts.delete()
db.session.commit()
The get_columns function takes a Post instance and creates a dictionary out of the column keys and values. Read the docs and warnings about using bulk insert and delete operations.
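If the result set is large, you can stream it in chunks rather than materializing every mapping at once. A minimal sketch building on the code above (the chunk size of 1,000 is an arbitrary assumption; the delete and commit steps stay the same):

CHUNK = 1000
mappings = []
# yield_per streams rows from the database in batches instead of loading them all
for post in posts.yield_per(CHUNK):
    mappings.append(get_columns(post))
    if len(mappings) >= CHUNK:
        db.session.bulk_insert_mappings(OldPost, mappings)
        mappings = []
if mappings:
    db.session.bulk_insert_mappings(OldPost, mappings)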
You can use Common Table Expressions (CTEs) in PostgreSQL:
WITH moved_posts AS (
    DELETE FROM post
    WHERE expiry > time_stamp
    RETURNING *
)
INSERT INTO old_post
SELECT * FROM moved_posts
CTE support for DELETE will be added in SQLAlchemy 1.1. In the current release you can execute raw SQL:
from sqlalchemy import text

sql = text('''
    WITH moved_posts AS (
        DELETE FROM post
        WHERE expiry > :time_stamp
        RETURNING *
    )
    INSERT INTO old_post
    SELECT * FROM moved_posts
''')
db.session.execute(sql, {'time_stamp': time_stamp})
db.session.commit()
In SQLAlchemy 1.1 it would look like this:
posts = Post.__table__
old_posts = OldPost.__table__

moved_posts = (
    posts.delete()
    .where(posts.c.expiry > ts)
    .returning(*posts.c._all_columns)
    .cte('moved_posts'))

insert = (
    old_posts.insert()
    .from_select(
        [c.name for c in moved_posts.columns],
        moved_posts.select()))

db.session.execute(insert)
db.session.commit()
I wanted to perform an operation where I delete all the rows (but do not drop the table) in Postgres and then fill the table with new rows. I wanted to use the pd.read_sql_query() method from pandas:

qry = 'delete from "table_name"'
pd.read_sql_query(qry, connection, **kwargs)

But it throws the error 'ResourceClosedError: This result object does not return rows. It has been closed automatically.'
I can expect this, because the method should return an empty dataframe. But it was not returning an empty dataframe, only the above error. Could you please help me resolve it?
I use MySQL, but the logic is the same:

Query 1: select all the ids from your table.
Query 2: delete all those ids.

As a result you have:

DELETE FROM table_name WHERE id IN (SELECT id FROM table_name)

This line does not return anything; it just deletes all rows with the matching ids. I recommend running the command through psycopg only, not pandas; a sketch follows below.
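A minimal sketch of the psycopg route, assuming psycopg2 and placeholder connection parameters (the host, dbname, user, and password values are assumptions):

import psycopg2

conn = psycopg2.connect(host='localhost', dbname='mydb', user='user', password='secret')
# The with-block commits the transaction on success and rolls back on error
with conn, conn.cursor() as cur:
    cur.execute('DELETE FROM "table_name"')  # returns no rows, so nothing to fetch
conn.close()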
Then you need another query to get something from the db, like:

pd.read_sql_query("SELECT * FROM table_name", connection, **kwargs)

Probably (I do not use pandas to read from the db) in this case you'll get an empty dataframe with the column names.
Probably you can combine all the actions the following way:

pd.read_sql_query('''DELETE FROM table_name WHERE id IN (SELECT id FROM table_name); SELECT * FROM table_name''', connection, **kwargs)
Please try and share your results.
You can follow these steps:

Check for row existence in the table first.
Then delete the rows.

Example code:
check_row_query = "select exists(select * from tbl_name limit 1)"
check_exist = pd.read_sql_query(check_row_query, con)
if check_exist.exists[0]:
    delete_query = 'DELETE FROM tbl_name WHERE condition(s)'
    con.execute(delete_query)  # delete rows using a sqlalchemy connection
    print('Deleted all matching rows!')
else:
    pass
I am new to MySQL and I need some help, please. I am using MySQL Connector to write scripts.
I have a database containing 7K tables, and I am trying to select some values from some of these tables:
cursor.execute( "SELECT SUM(VOLUME) FROM stat_20030103 WHERE company ='Apple'")
for (Volume,) in cursor:
print(Volume)
This works for one table, e.g. stat_20030103. However, I want to sum the volume across all tables whose names start with stats_2016, where the company name is Apple. How can I loop over my tables?
I'm not an expert in MySQL, but here is something quick and simple in Python:
# Get all the tables starting with "stats_2016" and store them
cursor.execute("SHOW TABLES LIKE 'stats_2016%'")
tables = [v for (v,) in cursor]

# Iterate over all tables, collecting each volume sum
all_volumes = list()
for t in tables:
    cursor.execute("SELECT SUM(VOLUME) FROM %s WHERE company = 'Apple'" % t)
    # The first row holds the sum; fall back to 0 if no rows were found
    all_volumes.append(cursor.fetchone()[0] or 0)

# Print the sum of all volumes
print(sum(all_volumes))
You can probably use SELECT * FROM information_schema.tables to get all the table names for your query; a sketch follows below.
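A minimal sketch of that idea, assuming the schema is named mydb (adjust the schema name and LIKE pattern to your setup; the doubled %% is the escaped literal % inside a parameterized query):

cursor.execute(
    "SELECT table_name FROM information_schema.tables "
    "WHERE table_schema = %s AND table_name LIKE 'stats_2016%%'",
    ("mydb",))
tables = [name for (name,) in cursor]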
I'd try a left join.

SELECT tables.table_name, stat.company, SUM(stat.volume) AS volume
FROM information_schema.tables AS tables
LEFT JOIN mydb.stat_20030103 AS stat ON 1 = 1
WHERE tables.table_schema = 'mydb'
GROUP BY tables.table_name, stat.company;
This will give you all results at once. Maybe MySQL doesn't support joining against these metadata tables, in which case you might select the names into a temporary table first.

CREATE TEMPORARY TABLE mydb.tables SELECT table_name FROM information_schema.tables WHERE table_schema = 'mydb'

See the MySQL docs on information_schema.tables.
I have the following SQL query that returns what I need:
SELECT sensors_sensorreading.*, MAX(sensors_sensorreading.timestamp) AS "last"
FROM sensors_sensorreading
GROUP BY sensors_sensorreading.chipid
In words: get the last sensor reading entry for each unique chipid.
But I cannot seem to figure out the correct Django ORM statement to produce this query. The best I could come up with is:
SensorReading.objects.values('chipid').annotate(last=Max('timestamp'))
But if I inspect the raw SQL it generates:
>>> print connection.queries[-1:]
[{u'time': u'0.475', u'sql': u'SELECT
"sensors_sensorreading"."chipid",
MAX("sensors_sensorreading"."timestamp") AS "last" FROM
"sensors_sensorreading" GROUP BY "sensors_sensorreading"."chipid"'}]
As you can see, it almost generates the correct SQL, except Django selects only the chipid field and the aggregate "last" (but I need all the table fields returned instead).
Any idea how to return all fields?
Assuming you also have other fields in the table besides chipid and timestamp, I would guess this is the SQL you actually need:
SELECT * FROM (
    SELECT *, ROW_NUMBER() OVER (PARTITION BY chipid ORDER BY timestamp DESC) AS RN
    FROM sensors_sensorreading
) X WHERE RN = 1
This will return the latest rows for each chipid with all the data that is in the row.
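If you need this from Django rather than hand-written SQL, one option is a raw query. A sketch, assuming the model is named SensorReading and its table is sensors_sensorreading:

# Raw queries return model instances, so all fields are available
latest_readings = SensorReading.objects.raw('''
    SELECT * FROM (
        SELECT *, ROW_NUMBER() OVER (PARTITION BY chipid ORDER BY timestamp DESC) AS rn
        FROM sensors_sensorreading
    ) x WHERE rn = 1
''')
for reading in latest_readings:
    print(reading.chipid, reading.timestamp)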
I have two tables in my SQL database.
Table 1 contains a lot of data, but Table 2 contains a huge amount.
Here's the code I implemented using Python:
import MySQLdb

db = MySQLdb.connect(host="localhost", user="root", passwd="", db="fak")
cursor = db.cursor()

# Execute SQL statement:
cursor.execute("SELECT invention_title FROM auip_wipo_sample WHERE invention_title IN (SELECT invention_title FROM us_pat_2005_to_2012)")

# Get the result set as a tuple:
result = cursor.fetchall()

# Iterate through results and print:
for record in result:
    print record
print "Finish."

# Finish dealing with the database and close it
db.commit()
db.close()
However, it takes very long. I have run the Python script for an hour, and it still hasn't given me any results.
Please help me.
Do you have an index on invention_title in both tables? If not, create one:
ALTER TABLE auip_wipo_sample ADD KEY (`invention_title`);
ALTER TABLE us_pat_2005_to_2012 ADD KEY (`invention_title`);
Then combine your query into one that doesn't use a subquery:
SELECT auip_wipo_sample.invention_title
FROM auip_wipo_sample
INNER JOIN us_pat_2005_to_2012
    ON auip_wipo_sample.invention_title = us_pat_2005_to_2012.invention_title
And let me know about your results; a sketch of running this from your script follows below.
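For completeness, a sketch of running the rewritten query from the same Python 2 MySQLdb script as in the question:

cursor.execute("""
    SELECT auip_wipo_sample.invention_title
    FROM auip_wipo_sample
    INNER JOIN us_pat_2005_to_2012
        ON auip_wipo_sample.invention_title = us_pat_2005_to_2012.invention_title
""")
# Stream the matching titles as they arrive
for (invention_title,) in cursor:
    print invention_title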
I'd like to know if it's possible to generate a SELECT COUNT(*) FROM TABLE statement in SQLAlchemy without explicitly asking for it with execute().
If I use:
session.query(table).count()
then it generates something like:
SELECT count(*) AS count_1
FROM (SELECT table.col1 AS col1, table.col2 AS col2, ... FROM table)
which is significantly slower in MySQL with InnoDB. I am looking for a solution that doesn't require the table to have a known primary key, as suggested in Get the number of rows in table using SQLAlchemy.
Query for just a single known column:
session.query(MyTable.col1).count()
I managed to render the following SELECT with SQLAlchemy on both layers:
SELECT count(*) AS count_1
FROM "table"
Usage from the SQL Expression layer
from sqlalchemy import select, func, Integer, Table, Column, MetaData

metadata = MetaData()
table = Table("table", metadata,
              Column('primary_key', Integer),
              Column('other_column', Integer))  # just to illustrate

print select([func.count()]).select_from(table)
Usage from the ORM layer
You just subclass Query (you probably have already) and provide a specialized count method, like this one:
from sqlalchemy.orm import Query
from sqlalchemy.sql.expression import func

class BaseQuery(Query):
    def count_star(self):
        count_query = (self.statement.with_only_columns([func.count()])
                       .order_by(None))
        return self.session.execute(count_query).scalar()
Please note that order_by(None) resets the ordering of the query, which is irrelevant to the counting.
Using this method you can have a count(*) on any ORM Query that will honor all the filter and join conditions already specified.
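A hypothetical usage sketch (MyModel, engine, and the sessionmaker wiring are my assumptions, not part of the answer above):

from sqlalchemy.orm import sessionmaker

# Tell the session to build queries with the subclass above
Session = sessionmaker(bind=engine, query_cls=BaseQuery)
session = Session()

n = session.query(MyModel).filter(MyModel.name == 'foo').count_star()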
I needed to do a count of a very complex query with many joins. I was using the joins as filters, so I only wanted to know the count of the actual objects. count() was insufficient, but I found the answer in the docs here:
http://docs.sqlalchemy.org/en/latest/orm/tutorial.html
The code would look something like this (to count user objects):
from sqlalchemy import func
session.query(func.count(User.id)).scalar()
An addition to the "Usage from the ORM layer" part of the accepted answer: count(*) can be done for the ORM using query.with_entities(func.count()), like this:
session.query(MyModel).with_entities(func.count()).scalar()
It can also be used in more complex cases, when we have joins and filters; the important thing here is to place with_entities after the joins, otherwise SQLAlchemy could raise the "Don't know how to join" error.
For example:

we have a User model (id, name) and a Song model (id, title, genre)
we have user-song data: the UserSong model (user_id, song_id, is_liked), where user_id + song_id is the primary key

We want to get the number of a user's liked rock songs:
SELECT count(*)
FROM user_song
JOIN song ON user_song.song_id = song.id
WHERE user_song.user_id = %(user_id)s
    AND user_song.is_liked IS true
    AND song.genre = 'rock'
This query can be generated in the following way:
from sqlalchemy import and_, func

user_id = 1
query = session.query(UserSong)
query = query.join(Song, Song.id == UserSong.song_id)
query = query.filter(
    and_(
        UserSong.user_id == user_id,
        UserSong.is_liked.is_(True),
        Song.genre == 'rock',
    )
)
# Note: it is important to place `with_entities` after the join
query = query.with_entities(func.count())
liked_count = query.scalar()
If you are using the SQL Expression style approach, there is another way to construct the count statement, provided you already have your table object.
Preparation to get the table object (there are also different ways):
import sqlalchemy

database_engine = sqlalchemy.create_engine("connection string")

# Populate sqlalchemy objects from the existing database via reflection
database_metadata = sqlalchemy.MetaData()
database_metadata.reflect(bind=database_engine)

table_object = database_metadata.tables.get("table_name")  # just to illustrate how to get the table object
Issuing the count query on the table_object:

query = table_object.count()
# This produces something like the following, where id is a primary key column
# in "table_name" automatically selected by sqlalchemy:
# 'SELECT count(table_name.id) AS tbl_row_count FROM table_name'
count_result = database_engine.scalar(query)
I'm not clear on what you mean by "without explicitly asking for it with execute()", so this might be exactly what you are not asking for.
OTOH, this might help others.
You can just run the textual SQL:
your_query="""
SELECT count(*) from table
"""
the_count = session.execute(text(your_query)).scalar()
def test_query(val: str):
    # Note: interpolating val directly is vulnerable to SQL injection; prefer bound parameters
    query = f"select count(*) from table where col1 = '{val}'"
    rtn = database_engine.query(query)
    cnt = rtn.one().count

You can find the exact attribute to read by inspecting the result object in a debug watch.
query = session.query(table.column).filter(...).with_entities(func.count(table.column.distinct()))
count = query.scalar()

This worked for me. It gives the query:

SELECT count(DISTINCT table.column) AS count_1
FROM table WHERE ...
Below is a way to find the count of any query:

from sqlalchemy import alias, func

aliased_query = alias(query)
db.session.query(func.count('*')).select_from(aliased_query).scalar()