I have a basic query that I run over and over, every 2 minutes, to extract all records that have a flag set to 1/true.
If I run the script from the command line and I have a record with the flag set, it extracts it. Then, if I go into MySQL directly and re-set that flag to true/1, the next time (2 minutes later) the query is executed the record is not found.
I have enabled echoing of the executed queries to my console, and if I run the same query directly in MySQL I can see the record show up. Why isn't SQLAlchemy finding it?
Here's my config:
engine = create_engine(config.DATABASE_URI, pool_recycle=1800)
metadata = MetaData()
db_session = scoped_session(sessionmaker(bind=engine,
                                         autoflush=True,
                                         autocommit=False))
The problem may be related to the scoped session. With this type of session, a session is bound to each server thread, and it keeps serving results from the transaction it already has open. What framework are you using?
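If the long-lived session is the culprit, a minimal sketch of one way to force fresh reads on each poll, reusing the db_session from the question; the poll_flagged function and the records table/flag column are illustrative placeholders, not from the question:

from sqlalchemy import text

def poll_flagged():
    # End the previous transaction so the next SELECT starts a new one
    # and can see rows committed by other connections in the meantime.
    db_session.commit()        # or db_session.rollback()
    # Alternatively: db_session.expire_all() to re-load cached objects,
    # or db_session.remove() to discard the thread-local session entirely.
    return db_session.execute(
        text("SELECT * FROM records WHERE flag = 1")  # illustrative table/column
    ).fetchall()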
Related
I'm trying to test my FastAPI app. It seems to me that all the settings are correct.
test_users.py
engine = create_engine(
    f"postgresql"
    f"://{settings.database_username}"
    f":{settings.database_password}"
    f"@{settings.database_hostname}"
    f":{settings.database_port}"
    f"/test_{settings.database_name}"
)
TestingSessionLocal = sessionmaker(autocommit=False, autoflush=False, bind=engine)
Base.metadata.create_all(bind=engine)
def override_get_db():
    try:
        db = TestingSessionLocal()
        yield db
    finally:
        db.close()

app.dependency_overrides[get_db] = override_get_db
client = TestClient(app)
def test_create_user():
    response = client.post(
        "/users/",
        json={"email": "nikita@gmail.com", "password": "password"}
    )
    new_user = schemas.UserOutput(**response.json())
    assert response.status_code == 201
    assert new_user.email == "nikita@gmail.com"
When I run pytest, I get this error:
sqlalchemy.exc.OperationalError: (psycopg2.OperationalError) connection to server at "localhost" (::1), port 5432 failed: FATAL: database "test_social_media_api" does not exist
Why is the code not creating the database?
With engine = create_engine("postgresql://...") you define a connection to an existing PostgreSQL database.
And with Base.metadata.create_all(bind=engine) you create the tables - according to your models - in that existing database.
So the code that you have written does not create a database; it expects you to give it an already existing database.
And that has to do with PostgreSQL itself.
PostgreSQL runs as a server, and a PostgreSQL server can run multiple databases. And each database has to be created explicitly.
Just telling SQLAlchemy the connection string is not enough.
It's possible to create a new database from Python itself by connecting to the PostgreSQL server (see https://www.tutorialspoint.com/python_data_access/python_postgresql_create_database.htm), or alternatively you can create it manually before you run your script, e.g. by running CREATE DATABASE databasename; inside psql (or any other database tool).
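For example, a minimal sketch of the first option, assuming psycopg2 is installed (it already is, given the error message) and reusing the settings object from the question; the database name is taken from the error above:

import psycopg2

# Connect to the maintenance database "postgres", not to the test database
# (which does not exist yet).
conn = psycopg2.connect(
    host=settings.database_hostname,
    port=settings.database_port,
    user=settings.database_username,
    password=settings.database_password,
    dbname="postgres",
)
conn.autocommit = True  # CREATE DATABASE cannot run inside a transaction block
with conn.cursor() as cur:
    cur.execute(f"CREATE DATABASE test_{settings.database_name}")
conn.close()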
However, if you want to test against a running database, I would suggest using testcontainers. They will spawn a new PostgreSQL server with an empty database every time you run the tests.
Notice that the example from the FastAPI documentation works differently.
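A sketch of that approach with the testcontainers package (pip install "testcontainers[postgres]"); the image tag and the surrounding wiring are illustrative:

from testcontainers.postgres import PostgresContainer
from sqlalchemy import create_engine

# Spins up a throwaway PostgreSQL server in Docker for the duration of the block.
with PostgresContainer("postgres:15") as postgres:
    engine = create_engine(postgres.get_connection_url())
    Base.metadata.create_all(bind=engine)
    # ... run the tests against this engine ...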
They just use
SQLALCHEMY_DATABASE_URL = "sqlite:///./test.db"
engine = create_engine(
    SQLALCHEMY_DATABASE_URL, connect_args={"check_same_thread": False}
)
Base.metadata.create_all(bind=engine)
which creates the database.
This works because SQLite doesn't run as a server. It's just one file that represents the full database, and if the file doesn't exist, the SQLite database adapter will assume that the database is simply empty and create a new file for you. PostgreSQL doesn't work like this, though.
We have data in a Snowflake cloud database that we would like to move into an Oracle database. As we would like to work toward refreshing the Oracle database regularly, I am trying to use SQLAlchemy to automate this.
I would like to do this using Core because my team is all experienced with SQL, but I am the only one with Python experience. I think it would be easier to tweak the data pulls if we just pass SQL strings. Plus the Snowflake db has some columns with JSON that seems easier to parse using direct SQL since I do not see JSON in the SnowflakeDialect.
I have established connections to both databases and am able to do select queries from both. I have also manually created the tables in our Oracle db so that the keys and datatypes match what I am pulling from Snowflake. When I try to insert, though, my Jupyter notebook just continuously says "Executing Cell" and hangs. Any thoughts on how to proceed or how to get the notebook to tell me where the hangup is?
from sqlalchemy import create_engine, pool, MetaData, text
from snowflake.sqlalchemy import URL
import pandas as pd

eng_sf = create_engine(URL(  # engine for Snowflake
    account='account',
    user='user',
    password='password',
    database='database',
    schema='schema',
    warehouse='warehouse',
    role='role',
    timezone='timezone',
))
eng_o = create_engine("oracle+cx_oracle://{}[{}]:{}@{}".format('user', 'proxy', 'password', 'database'), poolclass=pool.NullPool)  # engine for Oracle
meta_o = MetaData()
meta_o.reflect(bind=eng_o)
person_o = meta_o.tables['bb_lms_person']  # other Oracle tables follow this example
meta_sf = MetaData()
meta_sf.reflect(bind=eng_sf,only=['person']) # other snowflake tables as well, but for simplicity, let's look at one
person_sf = meta_sf.tables['person']
person_query = """
    SELECT ID
        ,EMAIL
        ,STAGE:student_id::STRING as STUDENT_ID
        ,ROW_INSERTED_TIME
        ,ROW_UPDATED_TIME
        ,ROW_DELETED_TIME
    FROM cdm_lms.PERSON
"""
with eng_sf.begin() as connection:
    result = connection.execute(text(person_query)).fetchall()  # this snippet runs and returns result as expected

with eng_o.begin() as connection:
    connection.execute(person_o.insert(), result)  # this is a coin flip: sometimes it runs, sometimes it just hangs forever
eng_sf.dispose()
eng_o.dispose()
I've checked the typical offenders. The keys for both person_o and the result are all lowercase and match. Any guidance would be appreciated.
Use the metadata for the table. The fTable_Stage update (or insert) is built with fluent functions, and values are assigned by keyword to the reflected columns. This is very safe because only columns known to the metadata can be used in the statement. I am updating three fields: LateProbabilityDNN, Sentiment_Polarity, and Sentiment_Subjectivity.
from sqlalchemy import create_engine, MetaData, Table
from sqlalchemy.orm import sessionmaker

engine = create_engine("mssql+pyodbc:///?odbc_connect=%s" % params)
connection = engine.connect()
metadata = MetaData()
Session = sessionmaker(bind=engine)
session = Session()

fTable_Stage = Table('fTable_Stage', metadata, autoload=True, autoload_with=engine)

stmt = fTable_Stage.update().where(fTable_Stage.c.KeyID == keyID).values(
    LateProbabilityDNN=round(float(late_proba), 2),
    Sentiment_Polarity=round(my_valance.sentiment.polarity, 2),
    Sentiment_Subjectivity=round(my_valance.sentiment.subjectivity, 2),
)
connection.execute(stmt)
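The same fluent style works for inserts; a minimal sketch with the same reflected table (the values shown are placeholders, not from the original answer):

stmt = fTable_Stage.insert().values(
    LateProbabilityDNN=0.42,        # placeholder values
    Sentiment_Polarity=0.10,
    Sentiment_Subjectivity=0.50,
)
connection.execute(stmt)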
I am trying to create a database with the pg8000 driver for PostgreSQL, but I am unable to create it. Creating a db manually and then connecting to it works fine for me, but I need to create the db from my code. I am getting the error "sqlalchemy.exc.ProgrammingError: (pg8000.ProgrammingError)". I have tried the code below for creating a db.
from sqlalchemy import create_engine
dburl = "postgresql+pg8000://user:pswd@myip:5432/postgres/"
engine = create_engine(dburl)
conn = engine.connect()
conn.execute("COMMIT")
conn.execute("CREATE DATABASE qux")
I also tried with the below:
from sqlalchemy import create_engine
from sqlalchemy.engine import url
settings = {
    "drivername": "postgresql+pg8000",
    "host": "myip",
    "port": 5432,
    "username": "user",
    "password": "pswd",
    "database": "MyTestDB",
}
db = create_engine(url.URL(**settings))
db.execute("commit")
This is the exact error I am getting: sqlalchemy.exc.ProgrammingError: (pg8000.ProgrammingError) ('ERROR', '25001', 'CREATE DATABASE cannot run inside a transaction block') [SQL: 'create database workDB']
Please suggest how I can create this db.
Here's a solution:
from sqlalchemy import create_engine

dburl = "postgresql+pg8000://user:pswd@myip:5432/postgres"
engine = create_engine(dburl)
conn = engine.connect()
conn.connection.rollback()          # make sure we're not in a transaction
conn.connection.autocommit = True   # turn on autocommit on the underlying pg8000 connection
conn.execute("CREATE DATABASE qux")
conn.connection.autocommit = False  # turn autocommit back off again
The docs talk about this problem of executing commands that can't be run in a transaction. The thing is that pg8000 automatically executes a begin transaction before any execute() if there isn't already a transaction in progress. This is fine until you come to execute a command that can't be run inside a transaction. In that case you have to enter autocommit mode: pg8000 then stops issuing the implicit BEGIN, each statement is committed as soon as it runs, and statements that can't be executed inside a transaction block (like CREATE DATABASE) work again.
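Alternatively (a sketch, not part of the original answer), SQLAlchemy itself can put the connection into autocommit mode via an execution option, which avoids touching the driver directly:

from sqlalchemy import create_engine, text

engine = create_engine("postgresql+pg8000://user:pswd@myip:5432/postgres")
# AUTOCOMMIT suppresses the implicit transaction, so CREATE DATABASE is allowed.
with engine.connect().execution_options(isolation_level="AUTOCOMMIT") as conn:
    conn.execute(text("CREATE DATABASE qux"))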
Something peculiar I've noticed is that any changes committed to the DB outside of the session (such as ones made in MySQL Workbench) are not recognised in the SQLAlchemy session. I have to close and open a new session for SQLAlchemy to recognise them.
For example, a row I deleted manually is still fetched by SQLAlchemy.
This is how I initialise the session:
engine = create_engine('mysql://{}:{}#{}/{}'.format(username, password, host, schema), pool_recycle=3600)
Session = sessionmaker(bind=engine)
session = Session()
metadata = MetaData()
How can I get SQLAlchemy to recognise them?
My SQLAlchemy version is 0.9.4 and my MySQL version is 5.5.34. We use only SQLAlchemy's Core (no ORM).
To be able to read committed data from other transactions you'll need to set the transaction isolation level to READ COMMITTED. For SQLAlchemy and MySQL:
To set the isolation level using create_engine():
engine = create_engine(
    "mysql://scott:tiger@localhost/test",
    isolation_level="READ COMMITTED")
To set using per-connection execution options:
connection = engine.connect()
connection = connection.execution_options(
    isolation_level="READ COMMITTED")
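Applied to the setup from the question, a sketch using Core only; the SELECT target is an illustrative table name, and the connection variables come from the question's format() call:

from sqlalchemy import create_engine, text

engine = create_engine(
    'mysql://{}:{}@{}/{}'.format(username, password, host, schema),
    pool_recycle=3600,
    isolation_level="READ COMMITTED")

# Each statement now sees data committed by other transactions
# (e.g. from Workbench) before it ran.
with engine.connect() as connection:
    rows = connection.execute(text("SELECT * FROM my_table")).fetchall()  # illustrative table name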
DBSession = sessionmaker(bind=self.engine)

def add_person(name):
    s = DBSession()
    s.add(Person(name=name))
    s.commit()
Every time I run add_person() another connection is created to my PostgreSQL DB.
Looking at:
SELECT count(*) FROM pg_stat_activity;
I see the count going up until I get a "remaining connection slots are reserved for non-replication superuser connections" error.
How do I kill those connections? Am I wrong to open a new session every time I want to add a Person record?
In general, you should keep your Session object (here DBSession) separate from any functions that make changes to the database. So in your case you might try something like this instead:
DBSession = sessionmaker(bind=self.engine)
session = DBSession()  # create your session outside of functions that will modify the database

def add_person(name):
    session.add(Person(name=name))
    session.commit()
Now you will not get new connections every time you add a person to the database.
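If you do want a short-lived session per operation, a sketch of the session-per-scope pattern from the SQLAlchemy docs, reusing DBSession and Person from the question; closing the session returns its connection to the engine's pool instead of leaking it:

from contextlib import contextmanager

@contextmanager
def session_scope():
    """Provide a transactional scope around a series of operations."""
    s = DBSession()
    try:
        yield s
        s.commit()
    except Exception:
        s.rollback()
        raise
    finally:
        s.close()  # returns the connection to the pool

def add_person(name):
    with session_scope() as s:
        s.add(Person(name=name))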