Pandas to_sql can't write to schema besides 'public' on PostgreSQL - python

I'm trying to write the contents of a data frame to a table in a schema besides the 'public' schema. I followed the pattern described in Pandas writing dataframe to other postgresql schema:
meta = sqlalchemy.MetaData()
engine = create_engine('postgresql://some:user@host/db')
meta = sqlalchemy.MetaData(engine, schema='schema')
meta.reflect(engine, schema='schema')
pdsql = pandas.io.sql.PandasSQLAlchemy(engine, meta=meta)
But when I try to write to the table:
pdsql.to_sql(df, 'table', if_exists='append')
I get the following error:
InvalidRequestError: Table 'schema.table' is already defined for this MetaData instance. Specify 'extend_existing=True' to redefine options and columns on an existing Table object.
I also tried adding extend_existing=True to the reflect call, but that doesn't seem to make a difference.
How can I get pandas to write to this table?

Update: starting from pandas 0.15, writing to a different schema is supported. You can then use the schema keyword argument:
df.to_sql('test', engine, schema='a_schema')
As I said in the linked question, writing to a different schema is not yet supported with the read_sql and to_sql functions (but an enhancement request has already been filed: https://github.com/pydata/pandas/issues/7441).
However, I described a workaround there using the object interface. What I described only works for adding the table once, not for replacing and/or appending to it. So if you just want to add the table, first delete the existing table and then write it again; a rough sketch of that follows below.
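As a rough sketch of that "delete, then write" route (not from the original answer; it assumes pandas 0.14, the same engine and df as in the question, and that dropping the existing table is acceptable):
import sqlalchemy
import pandas as pd

meta = sqlalchemy.MetaData(engine, schema='schema')
meta.reflect(engine, schema='schema')

existing = meta.tables.get('schema.table')   # reflected tables are keyed as 'schema.table'
if existing is not None:
    existing.drop(engine)   # drop the old table in the database
    meta.remove(existing)   # and forget it in the MetaData

pdsql = pd.io.sql.PandasSQLAlchemy(engine, meta=meta)
pdsql.to_sql(df, 'table')   # plain write; no append needed any more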
If you want to append to the table, below is a slightly hackier workaround. First redefine has_table and get_table:
def has_table(self, name):
    return self.engine.has_table(name, schema=self.meta.schema)

def get_table(self, table_name):
    if self.meta.schema:
        table_name = self.meta.schema + '.' + table_name
    return self.meta.tables.get(table_name)

pd.io.sql.PandasSQLAlchemy.has_table = has_table
pd.io.sql.PandasSQLAlchemy.get_table = get_table
Then create the PandasSQLAlchemy object as you did, and write the data:
meta = sqlalchemy.MetaData(engine, schema='schema')
meta.reflect()
pdsql = pd.io.sql.PandasSQLAlchemy(engine, meta=meta)
pdsql.to_sql(df, 'table', if_exists='append')
This is obviously not the right way to do it, but we are working on a better API for 0.15. If you want to help, pitch in at https://github.com/pydata/pandas/issues/7441.
Beware! This interface (PandasSQLAlchemy) is not yet really public and will still undergo changes in the next version of pandas, but this is how you can do it for pandas 0.14(.1).
Update: PandasSQLAlchemy is renamed to SQLDatabase in pandas 0.15.

Related

Bulk insert using sqlalchemy Engine

Is there a way to bulk-insert/update values into a Microsoft SQL Server database using a SQLAlchemy Engine?
I have read several (very) old posts regarding this, and it seemed not very easy to do (at least back then).
E.g. in some examples you need to create a class, add instances of it to a session and finally commit the session.
Isn't there a way like (pseudo) this:
from sqlalchemy import String, Integer, Float

values = [(1, "hello", 2.5), (2, "world", 10.5)]  # values to insert
table = "my_schema.my_table"                      # table name
col = ["id", "statement", "ratio"]                # names of the columns in the database
type = [Integer, String, Float]                   # type of each value

engine = sqlalchemy.create_engine(connection_string)

with engine.session():
    try:
        engine.bulk_insert(table, values, col, type)
    except:
        engine.rollback()
or something else, instead of looping over engine.execute("INSERT INTO ...")?
I know I can use pandas.DataFrame.to_sql, but since I want to be able to roll back in case of errors etc., I won't use that.
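Not from the original post, but a minimal sketch of one way this can be done with SQLAlchemy Core: define (or reflect) the table, then issue a single multi-row INSERT inside engine.begin(), which commits on success and rolls back automatically on error. The table and column names simply reuse the ones from the pseudo-code above:
import sqlalchemy as sa

engine = sa.create_engine(connection_string)

metadata = sa.MetaData(schema="my_schema")
my_table = sa.Table(
    "my_table", metadata,
    sa.Column("id", sa.Integer),
    sa.Column("statement", sa.String),
    sa.Column("ratio", sa.Float),
)

rows = [
    {"id": 1, "statement": "hello", "ratio": 2.5},
    {"id": 2, "statement": "world", "ratio": 10.5},
]

# engine.begin() opens a transaction that is committed when the block
# exits normally and rolled back if an exception is raised inside it.
with engine.begin() as conn:
    conn.execute(my_table.insert(), rows)   # executemany-style bulk INSERT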

Solved: Adding new Column to ORM SQLAlchemy table in a volatile setting

I am working on an open source persistence layer for an MQTT broker: https://github.com/volkerjaenisch/amqtt_db
Incoming MQTT messages are irregular blobs of data, so usually the DB backend is some kind of object storage.
I do it the hard way and deserialize the blobs into typed data columns and store them in a fast relational database. My final target will be TimescaleDB, but first I go via SQLAlchemy to access a wide range of DBs with one API.
MQTT messages are volatile (think: not always complete), so the DB schema has to adjust dynamically, e.g. adding new columns for new information.
First Message:
Time: 1234
Temperature : 23.4
Second Message:
Time: 1245
Temperature : 23.6
Rel Hum : 87 %
I have used the SQLAlchemy ORM for more than a decade, but always for quite static databases, so working dynamically is new to me.
Utilizing the ORM to build DB tables dynamically from the structure of incoming MQTT messages was quite doable and worked out perfectly.
But currently I am stuck on the case of additional information in the MQTT packages that extends the tables with new columns.
What I did so far:
Utilizing sqlalchemy-migrate it was quite easy to dynamically add new columns to the existing table in the DB. In the code, "topic_cls" is the declarative class and "column_def" a col_name → col_type mapping.
from migrate.versioning.schema import Table as MiTable, Column as MiColumn

def add_new_colums(self, topic_cls, column_def):
    table_name = str(topic_cls.__table__.name)
    table = MiTable(table_name, self.metadata)
    for col_name, col_type in column_def.items():
        col = MiColumn(col_name, col_type)
        col.create(table)
Works like a charm. But how do I get these changes to the DB reflected back into the declarative classes? I tried to get a new inspection of the table:
new_table = Table(topic_cls.__table__.name, self.metadata, autoload_with=self.engine)
This also works, but it gives me a new Table object, not a declarative class.
So my stupid questions are:
Is this the right way to achieve my goal?
How can I get a declarative class by inspecting an already existing table in a DB?
"Drop the ORM and use SQL" is not the answer I am looking for.
Cheers,
Volker
Found a solution but it is a bit of a hack.
new_table = Table("test/topic_growth", Base.metadata, autoload_with=self.engine)
Base.metadata.remove(topic_cls.__table__)
new_dcl = type(str(table_name), (Base,), {'__table__': new_table})
Base.metadata._add_table(table_name, None, new_table)
After you have obtained the new table via inspection, remove the old table entry from the metadata.
Then generate a new declarative class from the new table under the same table name.
Finally, add the new table back to the metadata. A small helper wrapping these steps up is sketched below.
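Wrapped up as a small helper, a hedged sketch of the same steps (the function name is made up; because the stale Table is removed first, the re-reflected Table registers itself in the metadata and the private _add_table call is not needed):
from sqlalchemy import Table

def rebuild_declarative_class(Base, engine, old_cls):
    table_name = old_cls.__table__.name
    Base.metadata.remove(old_cls.__table__)        # forget the stale Table entry
    new_table = Table(table_name, Base.metadata,   # re-inspect the table as it now
                      autoload_with=engine)        # exists in the database
    # build a fresh declarative class on top of the reflected table
    return type(str(table_name), (Base,), {'__table__': new_table})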

How to convert select_from object into a new table in sqlalchemy

I have a database that contains two tables, cdr and mtr. I want a join of the two based on the columns ego_id and alter_id, and I want to output this into another table in the same database, complete with the column names, without the use of pandas.
Here's my current code:
mtr_table = Table('mtr', MetaData(), autoload=True, autoload_with=engine)
print(mtr_table.columns.keys())
cdr_table = Table('cdr', MetaData(), autoload=True, autoload_with=engine)
print(cdr_table.columns.keys())
query = db.select([cdr_table])
query = query.select_from(
    mtr_table.join(
        cdr_table,
        (mtr_table.columns.ego_id == cdr_table.columns.ego_id) &
        (mtr_table.columns.alter_id == cdr_table.columns.alter_id)
    )
)
results = connection.execute(query).fetchmany()
Currently, for my test code, what I do is convert the results into a pandas DataFrame and then put it back into the original SQL database:
df = pd.DataFrame(results, columns=results[0].keys())
df.to_sql(...)
but I have two problems:
loading everything into a pandas dataframe would require too much memory when I start working with the full database
the column names are (apparently) not included in results and would need to be accessed by results[0].keys() (but see the note right below)
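Side note (not from the question): the column names are available on the result object itself, before any rows are fetched, so they do not have to come from results[0]:
result = connection.execute(query)
column_names = list(result.keys())   # column names of the SELECT, no rows fetched yet
rows = result.fetchmany()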
I've checked this other Stack Overflow question, but it uses the ORM framework of SQLAlchemy, which I unfortunately don't understand. If there's a simpler way to do this (like pandas' to_sql), that would be preferable.
What's the easiest way to go about this?
So I found out how to do this via CREATE TABLE AS:
query = """
CREATE TABLE mtr_cdr AS
SELECT
mtr.idx,cdr.*
FROM mtr INNER JOIN cdr
ON (mtr.ego_id = cdr.ego_id AND mtr.alter_id = cdr.alter_id)""".format(new_table)
with engine.connect() as conn:
conn.execute(query)
The query string seems to be highly sensitive to parentheses, though. If I put parentheses around the whole SELECT ... FROM ... statement, it doesn't work.
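One caveat not in the original answer: on SQLAlchemy 1.4/2.0 a raw SQL string has to be wrapped in text(), and running it inside engine.begin() makes sure the CREATE TABLE is actually committed:
from sqlalchemy import text

with engine.begin() as conn:   # commits on success, rolls back on error
    conn.execute(text(query))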

Less Memory-intense way of copying tables & renaming columns in sqlite/pandas

I have found a very nice way to:
read a table from a sql database
rename the columns with a dict (read from a yaml file)
rewrite the table to another database
The only problem is that as the table becomes bigger (10 columns × several million rows), reading the table into a pandas DataFrame is so memory-intensive that it causes the process to be killed.
There must be an easier way. I looked at ALTER TABLE statements, but they seem to be very complicated as well and will not do the copying into another db. Any ideas on how to do the same operation without using this much memory? It feels like pandas is a crutch I use because of my bad SQL.
import pandas as pd
import sqlite3

def translate2generic(sourcedb, targetdb, sourcetable,
                      targettable, toberenamed):
    """Change a table's column names to fit generic API keys.

    :param sourcedb: path to the source db
    :param targetdb: path to the target db
    :param sourcetable: name of the table to be translated in the source
    :param targettable: name of the newly created table in the target db
    :param toberenamed: dictionary of translations
    :return: new column names in the target db
    """
    sourceconn = sqlite3.connect(sourcedb)
    targetconn = sqlite3.connect(targetdb)
    table = pd.read_sql_query('select * from ' + sourcetable, sourceconn)  # this is the line causing the crash
    # read dict in the format {"oldcol1name": "newcol1name", "oldcol2name": "newcol2name"}
    rename = {v: k for k, v in toberenamed.items()}
    # rename columns
    generic_table = table.rename(columns=rename)
    # write table to new database
    generic_table.to_sql(targettable, targetconn, if_exists="replace")
    targetconn.close()
    sourceconn.close()
I've also looked at solutions such as this one, but they assume you know the types of the columns.
An elegant solution would be very much appreciated.
Edit: I know SQLite supports renaming columns since the September 2018 release 3.25.0, but I am stuck with version 2.6.0
To elaborate on my comments...
If you have a table in foo.db and want to copy that table's data to a new table in bar.db with different column names:
$ sqlite3 foo.db
sqlite> ATTACH 'bar.db' AS bar;
sqlite> CREATE TABLE bar.newtable(newcolumn1, newcolumn2);
sqlite> INSERT INTO bar.newtable SELECT oldcolumn1, oldcolumn2 FROM main.oldtable;
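The same idea can also be driven from Python with nothing but the stdlib sqlite3 module, so no row ever passes through pandas. A hedged sketch, not part of the original answer (the function name and signature are made up; columns is an ordered list of (old_name, new_name) pairs):
import sqlite3

def copy_table_renamed(sourcedb, targetdb, sourcetable, targettable, columns):
    conn = sqlite3.connect(targetdb)
    conn.execute("ATTACH DATABASE ? AS source", (sourcedb,))
    new_cols = ", ".join('"{}"'.format(new) for _, new in columns)
    old_cols = ", ".join('"{}"'.format(old) for old, _ in columns)
    conn.execute('CREATE TABLE "{}" ({})'.format(targettable, new_cols))
    # the copy happens entirely inside SQLite; rows never reach Python
    conn.execute('INSERT INTO "{}" SELECT {} FROM source."{}"'.format(
        targettable, old_cols, sourcetable))
    conn.commit()
    conn.close()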

How do I handle database columns with reserved characters in SQLAlchemy ORM?

I'm somewhat new to SQLAlchemy ORM, and I'm trying to select and then store data from a column within a view that has a forward slash in the name of the column.
The databases are mapped using the following:
source_engine = create_engine("...")
base = automap_base()
base.prepare(source_engine, reflect=True)

metadata = MetaData(source_engine)
table_1 = Table("table_1", metadata, autoload=True)
The second destination table is mapped the same way.
Then, I connect to this database, and I'm trying to select information from columns to copy into a different database:
source_table_session = Session(source_engine)
dest_table_session = Session(dest_engine)

table_1_data = source_table_session.query(table_1)
for instance in table_1_data:
    newrow = dest_table.base.classes.dest_table()
    newrow.Column1 = instance.Column1  # this works fine, the column has a normal name
But then, the problem is that there's a column in the view with the name "Slot/Port"
With a direct query, you can do:
select "Slot/Port" from source_database;
But in ORM, you can't just type:
newrow.Slot/Port = instance.Slot/Port
or
newrow.'Slot/Port' = instance.'Slot/Port'
That isn't going to be correct, and the following doesn't work either:
newrow.SlotPort = instance.SlotPort
AttributeError: 'result' object has no attribute 'SlotPort'
I have no control over how columns are named in the source database.
I find the SQLAlchemy documentation generally fragmented (only showing small snippets of code) and confusing, so I'm not sure whether this kind of thing is addressed. Is there a way to get around this limitation, or are such columns perhaps already mapped to a valid name without a slash, or is there a way to do that?
Thanks to @DeepSpace for helping me find the answer.
Instead of
newrow.whatever = instance.whatever
I needed:
setattr(newrow, 'Slot/Port', getattr(instance, 'Slot/Port'))
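Not part of the accepted answer, but the same trick generalizes: since getattr/setattr take arbitrary strings, every column can be copied in a loop without special-casing the awkward names (assuming the keys of table_1.columns match the attribute names on both sides):
for column_name in table_1.columns.keys():
    setattr(newrow, column_name, getattr(instance, column_name))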
