I'm trying to learn SQLAlchemy and I feel like I must be missing something fundamental. Hopefully this is a straightforward error on my part and someone can point me to the documentation that explains what I'm doing wrong.
I have a Postgres database running in a Docker container on my local machine. I can connect to it and run queries from Python using psycopg2 with no issues.
I'm trying to recreate what I did with psycopg2 using SQLAlchemy, but I'm having trouble when I try to join two tables. My current code looks like this:
from sqlalchemy import *
from sqlalchemy.orm import sessionmaker
from sqlalchemy.sql import select
conn = create_engine('postgresql://postgres:my_password@localhost/in_votes')
metadata = MetaData(conn)
metadata.reflect(bind = conn)
pop = metadata.tables['pop']
absentee = metadata.tables['absentee']
Session = sessionmaker(bind=conn)
session = Session()
session.query(pop).join(absentee, county == pop.county).all()
I'm trying to join the pop and absentee tables on the county field and I get the error:
NameError: name 'county' is not defined
I can view the columns in each table and loop through them to access the data.
Can someone clear this up for me and explain what I'm doing wrong?
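A likely fix, hedged since it is untested against your schema: columns on a reflected Table live on its .c accessor, and a bare county is just an undefined Python name, which is exactly what the NameError says. The join condition would then read:

# reference the reflected columns through .c on each Table
session.query(pop).join(absentee, pop.c.county == absentee.c.county).all()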
I am trying to connect to a local network SQL Server using SQLAlchemy. I don't know how to use SQLAlchemy for this, and other examples I have seen do not use the more modern Python (3.6+) f-strings. I need the data in a Pandas dataframe, df. I'm not 100% sure, but this local server does not seem to have a username and password requirement...
So this is working right now.
import pandas as pd
import pyodbc
import sqlalchemy as sql
server = 'NetworkServer' # this is the server name that IT said my data is on.
database = 'Database_name' # The name of the database and this database has multiple tables.
table_name = 't_lake_data' # name of the table that I want.
# I'm not sure but this local server does not have a username and password requirement.
engine = sql.create_engine(f'mssql+pyodbc://{server}/{database}?trusted_connection=yes&driver=SQL+Server')
# I don't know all the column names so I use * to represent all column names.
sql_str = f"SELECT * FROM dbo.{table_name}"
df = pd.read_sql_query(sql_str, engine, parse_dates="DATE_TIME")
So if there are concerns with how this looks, leave a comment. Thank you.
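One hedged suggestion: SQLAlchemy 1.4+ can build the connection URL for you with sqlalchemy.engine.URL.create, which handles the escaping (e.g. the space in the driver name) instead of an f-string. A minimal sketch, assuming the same server, database, and trusted connection as above:

url = sql.engine.URL.create(
    "mssql+pyodbc",
    host=server,        # 'NetworkServer' from above
    database=database,  # 'Database_name' from above
    query={
        "driver": "SQL Server",       # URL.create percent-encodes the space
        "trusted_connection": "yes",  # Windows auth, no username/password
    },
)
engine = sql.create_engine(url)

The rest of the code (pd.read_sql_query) works unchanged against this engine.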
I want to insert thousands of rows into an Oracle db using Python. For this I am trying to use the bulk_insert_mappings method of a SQLAlchemy session. I am following this tutorial, which shows how to load a csv file into the database fast. I failed because bulk_insert_mappings also expects a mapper object, which they don't pass.
The code to create the connection and the mapping without a csv:
from sqlalchemy.sql import select, sqltypes
from sqlalchemy.schema import CreateSchema
from sqlalchemy import create_engine, MetaData, Table, inspect, engine, Column, String, DateTime, insert
from sqlalchemy.orm import sessionmaker
from sqlalchemy_utils import get_mapper
import pytz
import datetime
import pandas as pd
engine_url = engine.URL.create(
    drivername='oracle',
    username='ADMIN',
    password='****',
    host='***',
    database='****',
)
oracle_engine = create_engine(engine_url, echo=False)
Session = sessionmaker(bind=oracle_engine)
base = datetime.datetime.today().replace(tzinfo=pytz.utc)
date_list = [base - datetime.timedelta(days=x) for x in range(20)]
df = pd.DataFrame(date_list, columns = ['date_time'])
I use the following line of code to create the table if it doesn't exist:
df[:0].to_sql('test_insert_df', oracle_engine, schema='ADMIN', if_exists='replace')
Then I used this line to insert data into the table:
with Session() as session:
    session.bulk_insert_mappings(df.to_dict('records'))
The traceback I receive is the following:
TypeError: bulk_insert_mappings() missing 1 required positional argument: 'mappings'
How can I create the mapper if I don't use the SQLAlchemy ORM to create the table? Looking at this question, I know how to create the mapper object with a SQLAlchemy model, but not otherwise.
Alternatively I have the option of inserting using the bulk_save_objects method, but that would also need a model, which I don't have.
PS: I am doing this because I want to insert many rows into an Oracle database using SQLAlchemy. If you have a better solution, it is also welcome.
The simplest way to insert would be to use the dataframe's to_sql method, but if you want to use SQLAlchemy you can use its Core features to insert the data. There's no need for the ORM in your use case. Something like this should work, assuming you are using SQLAlchemy 1.4 and you can arrange your data as a list of dicts:
import sqlalchemy as sa
...
tbl = sa.Table('test_insert_df', sa.MetaData(), autoload_with=engine)
ins = tbl.insert()
with engine.begin() as conn:
    # engine.begin() commits automatically on success
    conn.execute(ins, list_of_dicts)
If you have a lot of data you might want to insert in batches.
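For example, a minimal batching sketch, assuming list_of_dicts comes from df.to_dict('records') and a hypothetical batch size of 1000:

batch_size = 1000  # hypothetical; tune for your row width and network
with engine.begin() as conn:
    for start in range(0, len(list_of_dicts), batch_size):
        # each call is one executemany over a slice of the rows
        conn.execute(ins, list_of_dicts[start:start + batch_size])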
I have created a sqlite database in SQLAlchemy and I have inserted data into it. Now I would like to get the total number of rows that were inserted.
I am trying to follow the example of this Stack Overflow question:
Get the number of rows in table using SQLAlchemy
However, this involved creating a 'Congress' class and the geoalchemy library, and I am not sure where they came from or how they are related.
This is my code
import sqlalchemy
from sqlalchemy import create_engine
engine = sqlalchemy.create_engine("sqlite:///dbfile/CSSummaries.db")
pandasDataframeExample.to_sql('CS_table', engine, index=False, if_exists='append')
Now I would like to see how many rows have been added, and perhaps see a few samples from the database to make sure everything saved OK.
I would like to see that it has x rows.
Bonus:
An easy way to sample some of the data in the database.
You could run it through a for loop:
import sqlalchemy
from sqlalchemy import create_engine
engine = sqlalchemy.create_engine("sqlite:///dbfile/CSSummaries.db")
for i in range(len(pandasDataframeExample)):
    # iloc[[i]] keeps the single row as a DataFrame, which has to_sql
    pandasDataframeExample.iloc[[i]].to_sql('CS_table', engine, index=False, if_exists='append')
    if i % 10 == 0:
        print('rows inserted: ' + str(i))
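For the actual row count and a quick sample, a minimal sketch, assuming the 'CS_table' name from above:

import pandas as pd
from sqlalchemy import text

with engine.connect() as conn:
    # scalar() unwraps the single COUNT(*) value
    row_count = conn.execute(text("SELECT COUNT(*) FROM CS_table")).scalar()
print('total rows: ' + str(row_count))

# pull a handful of rows back into a dataframe as a sanity check
sample = pd.read_sql("SELECT * FROM CS_table LIMIT 5", engine)
print(sample)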
I am using MySQL. I was able to find ways to do this using the SQLAlchemy ORM, but not using the expression language, so I am specifically looking for Core / expression language based solutions. The databases lie on the same server.
This is what my connection looks like:
connection = engine.connect().execution_options(
    schema_translate_map={current_database_schema: new_database_schema})
engine_1 = create_engine("mysql+mysqldb://root:user@*******/DB_1")
engine_2 = create_engine("mysql+mysqldb://root:user@*******/DB_2", pool_size=5)
metadata_1 = MetaData(engine_1)
metadata_2 = MetaData(engine_2)
metadata_1.reflect(bind=engine_1)
metadata_2.reflect(bind=engine_2)
table_1 = metadata_1.tables['table_1']
table_2 = metadata_2.tables['table_2']
query = select([table_1.c.name, table_2.c.name]).select_from(
    join(table_1, table_2, table_1.c.id == table_2.c.id, isouter=True))
result = connection.execute(query).fetchall()
However, when I try to join tables from different databases it throws an error, obviously because the connection belongs to only one of the databases. And I haven't tried anything else, because I could not find a way to solve this.
Another way to put the question (maybe) is 'how to connect to multiple databases using a single connection in sqlalchemy core'.
Applying the solution from here to Core only, you could create a single Engine object that connects to your server, but without defaulting to one database or the other:
engine = create_engine("mysql+mysqldb://root:user@*******/")
and then using a single MetaData instance reflect the contents of each schema:
metadata = MetaData(engine)
metadata.reflect(schema='DB_1')
metadata.reflect(schema='DB_2')
# Note: use the fully qualified names as keys
table_1 = metadata.tables['DB_1.table_1']
table_2 = metadata.tables['DB_2.table_2']
You could also use one of the databases as the "default" and pass it in the URL. In that case you would reflect tables from that database as usual and pass the schema= keyword argument only when reflecting the other database.
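A minimal sketch of that variant, assuming DB_1 as the default (this engine would replace the one created above):

engine = create_engine("mysql+mysqldb://root:user@*******/DB_1")
metadata = MetaData(engine)
metadata.reflect()                 # DB_1 tables are keyed by bare name
metadata.reflect(schema='DB_2')    # DB_2 tables are keyed as 'DB_2.<name>'
table_1 = metadata.tables['table_1']
table_2 = metadata.tables['DB_2.table_2']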
Use the created engine to perform the query:
query = select([table_1.c.name, table_2.c.name]).\
    select_from(outerjoin(table_1, table_2, table_1.c.id == table_2.c.id))

with engine.begin() as connection:
    result = connection.execute(query).fetchall()
I'm connecting to an Oracle database from SQLAlchemy and I want to know when the tables in the database were created. I can access this information through the SQL Developer application, so I know that it is stored somewhere, but I don't know if it's possible to get this information from SQLAlchemy.
Also, if it's not possible, how should I be getting it?
SQLAlchemy doesn't provide anything to help you get that information. You have to query the database yourself.
Something like:
with engine.begin() as c:
    result = c.execute("""
        SELECT created
        FROM dba_objects
        WHERE object_name = <<your table name>>  -- Oracle stores unquoted names in upper case
          AND object_type = 'TABLE'
    """)