I'm having problems importing datetimes from a SQL Server database into Pandas.
I'm using the following code:
data = pd.read_sql('select top 10 timestamp from mytable',db)
'MyTable' contains a column 'Timestamp', which is of type DateTime2.
If db is a pyodbc database connection this works fine, and my timestamps are returned as data type 'datetime64[ns]'. However, if db is a SQLAlchemy engine created using create_engine('mssql+pyodbc://...'), then the timestamps returned in data are of type 'object' and cause problems later on in my code.
Any idea why this happens? I'm using pandas version 0.14.1, pyodbc version 3.0.7 and SQLAlchemy version 0.9.4. How best can I force the data into datetime64[ns]?
Turns out the problem originates from how SQLAlchemy calls PyODBC. By default it will use the 'SQL Server' driver, which doesn't support DateTime2. When I was using PyODBC directly, I was using the 'SQL Server Native Client 10.0' driver.
To get the correct behaviour, i.e. returning Python datetime objects, I needed to create the SQLAlchemy engine as follows:
import sqlalchemy as sql
connectionString = 'mssql+pyodbc://username:password@my_server/my_database_name?driver=SQL+Server+Native+Client+10.0'
engine = sql.create_engine(connectionString)
The ?driver=... part forces SQLAlchemy to use the right driver (spaces in the driver name are URL-encoded as +).
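A quick way to confirm the fix (a minimal sketch, reusing the engine created above and the placeholder table from the question):

import pandas as pd

# With the Native Client driver the DateTime2 column should now
# come back as a proper pandas datetime column.
data = pd.read_sql('select top 10 timestamp from mytable', engine)
print(data['timestamp'].dtype)  # expected: datetime64[ns]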
Context
I just got into trouble while trying to do some I/O operations on some databases from a Python 3 script.
When I want to connect to a database, I habitually use psycopg2 in order to handle the connections and cursors.
My data are usually stored as Pandas DataFrames and/or GeoPandas's equivalent GeoDataFrames.
Difficulties
In order to read data from a database table:
Using Pandas:
I can rely on its .read_sql() method, which takes a parameter con, as stated in the doc:
con : SQLAlchemy connectable (engine/connection) or database str URI
or DBAPI2 connection (fallback mode)
Using SQLAlchemy makes it possible to use any DB supported by that
library. If a DBAPI2 object, only sqlite3 is supported. The user is responsible
for engine disposal and connection closure for the SQLAlchemy connectable. See
`here <https://docs.sqlalchemy.org/en/13/core/connections.html>`_
Using GeoPandas:
I can rely on its .read_postgis() method, which takes a parameter con, as stated in the doc:
con : DB connection object or SQLAlchemy engine
Active connection to the database to query.
In order to write data to a database table:
Using Pandas:
I can rely on the .to_sql() method, which takes a parameter con, as stated in the doc:
con : sqlalchemy.engine.Engine or sqlite3.Connection
Using SQLAlchemy makes it possible to use any DB supported by that
library. Legacy support is provided for sqlite3.Connection objects. The user
is responsible for engine disposal and connection closure for the SQLAlchemy
connectable See `here <https://docs.sqlalchemy.org/en/13/core/connections.html>`_
Using GeoPandas:
I can rely on the .to_sql() method (which directly relies on the Pandas .to_sql()), which takes a parameter con, as stated in the doc:
con : sqlalchemy.engine.Engine or sqlite3.Connection
Using SQLAlchemy makes it possible to use any DB supported by that
library. Legacy support is provided for sqlite3.Connection objects. The user
is responsible for engine disposal and connection closure for the SQLAlchemy
connectable See `here <https://docs.sqlalchemy.org/en/13/core/connections.html>`_
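To make those signatures concrete, here is a minimal sketch of the calls side by side; the connection parameters, table names and geometry column are placeholder assumptions:

import geopandas as gpd
import pandas as pd
import psycopg2
from sqlalchemy import create_engine

engine = create_engine("postgresql+psycopg2://user:secret@localhost:5432/mydb")
conn = psycopg2.connect(user="user", password="secret",
                        host="localhost", port=5432, dbname="mydb")

# Reading: pandas wants a SQLAlchemy connectable (a DBAPI connection
# only works in fallback mode); GeoPandas accepts either object.
df = pd.read_sql("SELECT * FROM my_table", engine)
gdf = gpd.read_postgis("SELECT * FROM my_geo_table", conn, geom_col="geom")

# Writing: both .to_sql() methods ask for an engine (or sqlite3).
df.to_sql("my_table_copy", engine, if_exists="replace", index=False)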
From here, I easily understand that GeoPandas is built on Pandas, especially for its GeoDataFrame object, which is, in short, a special DataFrame that can handle geographic data.
But I'm wondering why GeoPandas is able to directly take a psycopg2 connection as an argument when Pandas is not, and whether that is planned for the latter.
And why is this the case for neither of them when it comes to writing data?
I would like (as probably many others [1], [2]) to directly give them a psycopg2 connection argument instead of relying on a SQLAlchemy engine.
Because even if this tool is really great, it makes me use two different frameworks to connect to my database, and thus handle two different connection strings (and I personally prefer the way psycopg2 handles parameter expansion from a dictionary to build a connection string properly, such as psycopg2.connect(**dict_params), vs. URL injection, as explained here for example: Is it possible to pass a dictionary into create_engine function in SQLAlchemy?).
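(As an aside: SQLAlchemy can also build its URL object from keyword arguments, which sidesteps hand-writing the string. A minimal sketch, assuming SQLAlchemy 1.3's sqlalchemy.engine.url.URL; note the key names differ slightly from psycopg2's, e.g. username vs user and database vs dbname:)

from sqlalchemy import create_engine
from sqlalchemy.engine.url import URL

# Hypothetical parameter dict, keyed the way URL() expects.
dict_params = {
    "username": "user",    # psycopg2 calls this "user"
    "password": "secret",
    "host": "localhost",
    "port": 5432,
    "database": "mydb",    # psycopg2 calls this "dbname"
}

url = URL(drivername="postgresql+psycopg2", **dict_params)
engine = create_engine(url)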
Workaround
I was first creating my connection string with psycopg2 from a dictionary of parameters this way:
connParams = ("user={}", "password={}", "host={}", "port={}", "dbname={}")
conn = ' '.join(connParams).format(*dict_params.values())
Then I figured out it was better and more pythonic this way:
conn = psycopg2.connect(**dict_params)
Which I finally replaced by this, so that I can interchangeably use it to build either a psycopg2 connection or a SQLAlchemy engine:
import psycopg2

def connector():
    return psycopg2.connect(**dict_params)  # dict_params: user, password, host, port, dbname
a) Initializing a psycopg2 connection is now done by:
conn = connector()
curs = conn.cursor()
b) And initializing a SQLAlchemy engine by:
from sqlalchemy import create_engine

engine = create_engine('postgresql+psycopg2://', creator=connector)
(or with any other db+driver flavor of your choice)
This is well documented here:
https://docs.sqlalchemy.org/en/13/core/engines.html#custom-dbapi-args
and here:
https://docs.sqlalchemy.org/en/13/core/engines.html#sqlalchemy.create_engine
[1] Dataframe to sql without Sql Alchemy engine
[2] How to write data frame to Postgres table without using SQLAlchemy engine?
Probably the main reason why to_sql needs a SQLAlchemy Connectable (Engine or Connection) object is that to_sql needs to be able to create the database table if it does not exist or if it needs to be replaced. Early versions of pandas worked exclusively with DBAPI connections, but I suspect that when they were adding new features to to_sql they found themselves writing a lot of database-specific code to work around the quirks of the various DDL implementations.
On realizing that they were duplicating a lot of logic that was already in SQLAlchemy, they likely decided to "outsource" all of that complexity to SQLAlchemy itself by simply accepting an Engine/Connection object and using SQLAlchemy's (database-independent) SQL Expression language to create the table.
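As an illustration of what pandas gets for free here, a minimal sketch (the table definition is made up) of how SQLAlchemy's expression language compiles one table description into backend-specific DDL:

from sqlalchemy import Column, Integer, MetaData, Table, Text
from sqlalchemy.dialects import postgresql, sqlite
from sqlalchemy.schema import CreateTable

metadata = MetaData()
# One database-agnostic table definition...
demo = Table("demo", metadata,
             Column("id", Integer, primary_key=True),
             Column("payload", Text))

# ...rendered as whatever DDL each target backend expects.
print(CreateTable(demo).compile(dialect=postgresql.dialect()))
print(CreateTable(demo).compile(dialect=sqlite.dialect()))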
it makes me use two different frameworks to connect to my database
No, because .read_sql_query() also accepts a SQLAlchemy Connectable object so you can just use your SQLAlchemy connection for both reading and writing.
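Concretely, with the creator trick from the question, a single engine serves both directions (a sketch; connector() is the question's hypothetical factory and the table names are made up):

import pandas as pd
from sqlalchemy import create_engine

# One engine, built on top of the existing psycopg2 connector().
engine = create_engine("postgresql+psycopg2://", creator=connector)

df = pd.read_sql_query("SELECT * FROM my_table", engine)              # reading
df.to_sql("my_table_copy", engine, if_exists="replace", index=False)  # writing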
Both are supposed to parse a connection string and be able to insert into, say, SQL Server from a pandas DataFrame.
What is the real difference here?
PyODBC allows you to connect to and use an ODBC database using the standard DB API 2.0. SQLAlchemy is a toolkit that resides one level higher than that and provides a variety of features:
Object-relational mapping (ORM)
Query construction
Caching
Eager loading
and others. It can work with PyODBC or any other driver that supports DB API 2.0.
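A minimal sketch of that difference in practice; the DSN, credentials, table and column names are placeholders. With pyodbc you hand-write SQL against the DB API 2.0 cursor; with SQLAlchemy Core you describe the table once and let the library emit dialect-correct SQL (still through pyodbc underneath):

import pyodbc
import sqlalchemy as sa

# DB API 2.0 style: hand-written SQL, explicit cursor management.
conn = pyodbc.connect("DSN=mydsn;UID=user;PWD=secret")
cursor = conn.cursor()
cursor.execute("SELECT id, name FROM users WHERE id = ?", 42)
rows = cursor.fetchall()
conn.close()

# SQLAlchemy Core style: the same query built from a table description.
engine = sa.create_engine("mssql+pyodbc://user:secret@mydsn")
metadata = sa.MetaData()
users = sa.Table("users", metadata,
                 sa.Column("id", sa.Integer),
                 sa.Column("name", sa.String))
with engine.connect() as connection:
    result = connection.execute(
        sa.select([users.c.id, users.c.name]).where(users.c.id == 42))
    rows = result.fetchall()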
I'm trying to connect to an old-school jTDS MS SQL server for a variety of different analysis tasks, first just using Python with SQLAlchemy, as well as using Tableau and Presto.
Focusing on SQLAlchemy first, at the moment I'm getting this error:
Data source name not found and no default driver specified
I'm using the following, based on this thread: Connecting to SQL Server 2012 using sqlalchemy and pyodbc, i.e.:
import urllib.parse
import sqlalchemy as sa

params = urllib.parse.quote_plus("DRIVER={FreeTDS};"
                                 "SERVER=x-y.x.com;"
                                 "DATABASE=;"
                                 "UID=user;"
                                 "PWD=password")
engine = sa.create_engine("mssql+pyodbc:///?odbc_connect={}".format(params))
Connecting works fine through DBeaver, using a jTDS SQL Server (MSSQL) driver (which is labelled as legacy).
Curious as to how to resolve this issue, I'll keep researching away, but would appreciate any help.
I imagine there is an old driver on the internet I need to integrate into SQLAlchemy to begin with, and then perhaps migrate this data to something newer.
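For reference while researching: "Data source name not found and no default driver specified" generally means unixODBC cannot find a driver registered under the name used in DRIVER={...}. A sketch of the /etc/odbcinst.ini entry that registers FreeTDS; the library path is an assumption and varies by distribution:

[FreeTDS]
Description = FreeTDS driver for MS SQL Server / Sybase
Driver      = /usr/lib/x86_64-linux-gnu/odbc/libtdsodbc.so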
Appreciate your time
I am trying to switch a pyodbc connection to a SQLAlchemy engine. My working pyodbc connection is:
con = pyodbc.connect('DSN=SageLine50v23;UID=#####;PWD=#####;')
This is what I've tried:
con = create_engine('pyodbc://'+username+':'+password+'@'+url+'/'+db_name+'?driver=SageLine50v23')
I am trying to connect to my Sage 50 accounting data but just can't work out how to build the connection string. This is where I downloaded the ODBC driver: https://my.sage.co.uk/public/help/askarticle.aspx?articleid=19136.
I got some original help for the pyodbc connection (which is working) from this website: https://www.cdata.com/kb/tech/sageuk-odbc-python-linux.rst, but would like to use SQLAlchemy for its connection with pandas. Any ideas? I assume the issue is with this part: pyodbc://
According to this thread Sage 50 uses MySQL to store its data. However, Sage also provides its own ODBC driver which may or may not use the same SQL dialect as MySQL itself.
SQLAlchemy needs to know which SQL dialect to use, so you could try using the mysql+pyodbc://... prefix for your connection URI. If that doesn't work (presumably because "Sage SQL" is too different from "MySQL SQL") then you may want to ask Sage support if they know of a SQLAlchemy dialect for their product.
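Since the working pyodbc connection already goes through a DSN, a hedged sketch of the DSN form of the SQLAlchemy URL (mysql+pyodbc is the guess from above and may well not work with Sage's driver; the table name is hypothetical):

import pandas as pd
from sqlalchemy import create_engine

# Everything after the @ is the ODBC DSN name, mirroring the working
# 'DSN=SageLine50v23;UID=...;PWD=...' pyodbc string.
engine = create_engine("mysql+pyodbc://username:password@SageLine50v23")
df = pd.read_sql("SELECT * FROM some_table", engine)  # hypothetical table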
I'm trying to connect to our internal Teradata database, using Flask and SQLAlchemy along with a custom dialect called sqlalchemy-teradata. I put the database into the create_engine function like so:
engine = sqlalchemy.create_engine('teradata://username:pw@server_name/database')
I've set up my dialect just like in the tests:
registry.register("tdalchemy", "sqlalchemy_teradata.dialect", "TeradataDialect")
I'm getting a:
DatabaseError: (teradata.api.DatabaseError) (3807, u"[42S02] [Teradata][ODBC Teradata Driver][Teradata Database] Object 'table_name' does not exist
I can make raw SQL queries just fine, and I can also have SQLAlchemy run a query I construct and pull the data. I'm not sure what is preventing things from working properly. When I test a similar call against a database on a PostgreSQL server, it works just fine and pulls from that DB without issue.
Also, the PyPI page says there is supposed to be a test/orm_test.py, but it doesn't seem to have it.
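One guess worth checking: Teradata uses "databases" where other backends use schemas, so an ORM-mapped table usually needs the owning database supplied as a schema (or the table name fully qualified). A minimal sketch; the model, column and database names are hypothetical:

from sqlalchemy import Column, Integer, String
from sqlalchemy.ext.declarative import declarative_base

Base = declarative_base()

class MyTable(Base):
    __tablename__ = "table_name"
    # On Teradata the owning database plays the role of a schema;
    # without it the dialect may look in the wrong default database.
    __table_args__ = {"schema": "my_database"}  # hypothetical name

    id = Column(Integer, primary_key=True)
    name = Column(String(100))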