How to avoid Broken Pipe with SQLAlchemy & MySQL? - python

I am trying to move from SQLite to MySQL, and I am almost there. However, I keep running into one problem that plagues me regardless of whether I am connecting to my local MySQL database or the one hosted on Google Cloud. Has anyone had the same issue?
This is the code I use to append a pandas dataframe to the table:
import pandas as pd
from sqlalchemy import create_engine
connection = create_engine(f"mysql+mysqlconnector://{user}:{pw}@{host}/{db}")
tablename = 'TABLENAME'
df.to_sql(tablename, connection, if_exists='append', index=False)
The pandas dataframe is not very large, a few tens of rows at a time.
Every now and then I get this error:
sqlalchemy.exc.OperationalError: (mysql.connector.errors.OperationalError) 2055: Lost connection to MySQL server at 'serverip:port', system error: 32 Broken pipe
So far I have been using the sqlite3 package for SQLite. Do you think the problem lies with SQLAlchemy or with MySQL?
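For reference, a minimal sketch of a common mitigation for this class of error (an assumption that the connection is going stale in SQLAlchemy's pool, not a confirmed diagnosis for this setup): MySQL drops idle connections after wait_timeout, and the next write on a stale pooled connection fails with a broken pipe. The pool_pre_ping and pool_recycle engine options guard against that; user, pw, host, db and df are assumed to be defined as in the snippet above.
import pandas as pd
from sqlalchemy import create_engine

engine = create_engine(
    f"mysql+mysqlconnector://{user}:{pw}@{host}/{db}",
    pool_pre_ping=True,   # test each pooled connection with a lightweight ping before use
    pool_recycle=3600,    # retire connections older than an hour, below MySQL's wait_timeout
)
df.to_sql('TABLENAME', engine, if_exists='append', index=False)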

Related

Python Pandas MySQL - Why is SQLite so much faster when writing dataframes to a database

I'm developing a website where users import CSV files directly into a database, plus a front end that performs some data analytics once the data has been filed in the database. I'm using pandas to convert the CSV to a dataframe and then to import that dataframe into the MySQL database:
Import to MySQL database:
import pandas as pd
from sqlalchemy import create_engine

engine = create_engine('mysql+mysqlconnector://[username]:[password]@[host]:[port]/[schema]', echo=False)
df = pd.read_csv('C:/Users/[user]/Documents/Sales_Records.csv')
df.to_sql(con=engine, name='data', if_exists='replace')
The problem with this is that for the datasets I work with (5 million rows), the performance is too slow and the action times out without importing the data. However, if I try the same thing except using SQLite3:
Import to SQLite3 database:
import sqlite3
import pandas as pd

conn = sqlite3.connect('customer.db')
df = pd.read_csv('C:/Users/[user]/Documents/Sales_Records.csv')
df.to_sql('Sales', conn, if_exists='append', index=False)
mycursor = conn.cursor()
query = 'SELECT * FROM Sales LIMIT 10'
print(mycursor.execute(query).fetchall())
This block of code executes in seconds and imports all 5 million rows of the dataset. So what should I do? I do not anticipate multiple people passing in large datasets all at the same time so I suppose it would not hurt to just ditch MySQL for the clear performance advantages provided by SQLite in this application. It just feels like there's a better way though...
MySQL sends the data to disk over a network connection.
SQLite3 writes the data to disk directly.
Look at https://gist.github.com/jboner/2841832
You did not mention where the MySQL server is, but even if it were on your local machine the data would still pass through a TCP/IP stack, whereas SQLite just writes directly to disk.
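If staying with MySQL, one common way to cut the per-row round-trip overhead (a general pandas/SQLAlchemy option, not something from the answer above) is to batch the insert with to_sql's chunksize and method='multi' parameters, which pack many rows into each INSERT statement. Credentials and paths below are the same placeholders as in the question.
import pandas as pd
from sqlalchemy import create_engine

engine = create_engine('mysql+mysqlconnector://[username]:[password]@[host]:[port]/[schema]')

df = pd.read_csv('C:/Users/[user]/Documents/Sales_Records.csv')
# Send rows in batches of 10,000; method='multi' builds one multi-row INSERT per batch,
# and the chunk size keeps each statement under MySQL's max_allowed_packet.
df.to_sql('data', engine, if_exists='replace', index=False, chunksize=10000, method='multi')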

MySQL server: connection using dask

I have a dataframe with millions of records, and pulling it into Jupyter takes a lot of memory; I am unable to do so because the server crashes, since there are millions of records in the database.
I got to know about the Dask package, which helps with loading huge dataframes in Python. I am new to Dask and not sure how to set up a connection between Dask and a MySQL server.
I usually make the connection between Jupyter and the MySQL server in the following way. I would really appreciate it if someone could show me how to connect to the same table and server using the Dask framework.
import pyodbc
import pandas as pd

sql_conn = pyodbc.connect("DSN=CNVDED")
query = "SELECT * FROM Abc table"
df_training = pd.read_sql(query, sql_conn)
data = df_training
I would really appreciate it if someone could help me with this. I can't dump to CSV and then use Dask; I need a proper connection to the MySQL server.
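A sketch of one way to do this with Dask (assuming the MySQL server is reachable through a SQLAlchemy URI and a driver such as pymysql is installed; the table name, index column and credentials below are placeholders): dask.dataframe.read_sql_table reads the table in partitions instead of pulling everything into memory at once.
import dask.dataframe as dd

# SQLAlchemy-style URI; pymysql is one possible driver, credentials are placeholders.
uri = "mysql+pymysql://username:password@hostname:3306/database_name"

# index_col should be an indexed numeric or date column used to split the table
# into partitions; 'id' here is a placeholder.
ddf = dd.read_sql_table("my_table", uri, index_col="id", npartitions=20)

# Work lazily, then materialise only what is needed.
sample = ddf.head(1000)   # pulls a small sample into a regular pandas dataframe
n_rows = len(ddf)         # triggers a computation across all partitions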

Pyodbc to SQLAlchemy connection string for Sage 50

I am trying to switch a pyodbc connection to a SQLAlchemy engine. My working pyodbc connection is:
con = pyodbc.connect('DSN=SageLine50v23;UID=#####;PWD=#####;')
This is what I've tried.
con = create_engine('pyodbc://'+username+':'+password+'@'+url+'/'+db_name+'?driver=SageLine50v23')
I am trying to connect to my Sage 50 accounting data but just can't work out how to build the connection string. This is where I downloaded the ODBC driver: https://my.sage.co.uk/public/help/askarticle.aspx?articleid=19136.
I got some original help for the pyodbc connection (which is working) from this website: https://www.cdata.com/kb/tech/sageuk-odbc-python-linux.rst, but I would like to use SQLAlchemy for the connection so I can use it with pandas. Any ideas? I assume the issue is with the pyodbc:// part.
According to this thread Sage 50 uses MySQL to store its data. However, Sage also provides its own ODBC driver which may or may not use the same SQL dialect as MySQL itself.
SQLAlchemy needs to know which SQL dialect to use, so you could try using the mysql+pyodbc://... prefix for your connection URI. If that doesn't work (presumably because "Sage SQL" is too different from "MySQL SQL") then you may want to ask Sage support if they know of a SQLAlchemy dialect for their product.
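If trying that suggestion, a sketch of what the URL might look like (whether the Sage driver accepts MySQL's dialect is exactly the open question above, so this is untested): for pyodbc-based dialects, SQLAlchemy treats a host-only URL as the name of an ODBC DSN, which lets the existing 'SageLine50v23' DSN be reused.
from sqlalchemy import create_engine
import pandas as pd

# Hostname-style pyodbc URLs with no database or driver query string are
# interpreted as a DSN name, so this points at the DSN from the working pyodbc code.
engine = create_engine('mysql+pyodbc://' + username + ':' + password + '@SageLine50v23')

df = pd.read_sql('SELECT * FROM SomeTable', engine)  # 'SomeTable' is a placeholder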

Python pyodbc for Teradata error 10054 connection reset by peer

I am trying to pull a query using pyodbc and it keeps returning a "reset by peer" error; I can run the query with no issue 100% of the time in Teradata SQL Assistant.
This happens with one other query I use, but everything else I have done works using the same code.
I have tried running it multiple times; it always works in Teradata SQL Assistant but not in Python. Hundreds of other queries have worked with no issue, only two have given issues. I have tried slightly changing the queries with no luck.
I also tried in R with RODBC and it worked there.
I asked our DBA team and they said they have no process that would automatically boot that process, and there are no issues with the query.
import pyodbc
import pandas.io.sql as psql
connection_info = 'DSN=xxxxxx'
conn = pyodbc.connect(connection_info)
sql1 = '''
QUERY HERE
'''
df = psql.read_sql_query(sql1, conn)
I expect df to equal the results; instead I get the following error:
10054 WSA E ConnReset: Connection reset by peer
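One mitigation worth trying when a long-running result set gets dropped mid-transfer (an educated guess, not a confirmed fix for this error) is to stream the result in chunks so each fetch is smaller; sql1 is the query string from the snippet above.
import pyodbc
import pandas as pd

conn = pyodbc.connect('DSN=xxxxxx')

# chunksize makes read_sql_query return an iterator of smaller dataframes
# instead of fetching the whole result in one go.
chunks = pd.read_sql_query(sql1, conn, chunksize=50000)
df = pd.concat(chunks, ignore_index=True)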

DateTime import using Pandas/SQLAlchemy

I'm having problems importing datetimes from a SQL Server database into Pandas.
I'm using the following code:
data = pd.read_sql('select top 10 timestamp from mytable', db)
'MyTable' contains a column 'Timestamp', which is of type DateTime2.
If db is a pyodbc database connection this works fine, and my timestamps are returned as data type 'datetime64[ns]'. However, if db is an SQLAlchemy engine created using create_engine('mssql+pyodbc://...') then the timestamps returned in data are of type 'object' and cause problems later on in my code.
Any idea why this happens? I'm using pandas version 0.14.1, pyodbc version 3.0.7 and SQLAlchemy version 0.9.4. How best can I force the data into datetime64[ns]?
It turns out the problem originates from how SQLAlchemy calls PyODBC. By default it will use the 'SQL Server' driver, which doesn't support DateTime2. When I was using PyODBC directly, I was using the 'SQL Server Native Client 10.0' driver.
To get the correct behaviour, i.e. return Python datetime objects, I needed to create the SQLAlchemy engine as follows:
import sqlalchemy as sql
connectionString = 'mssql+pyodbc://username:password@my_server/my_database_name?driver=SQL Server Native Client 10.0'
engine = sql.create_engine(connectionString)
The ?driver=... part forces SQLAlchemy to use the right driver.
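As a fallback when changing the driver is not an option, the object column can usually be coerced after the fact in pandas (a generic workaround, not part of the answer above); 'timestamp' is the column from the question.
import pandas as pd

# Convert the column returned as dtype object into datetime64[ns];
# errors='coerce' turns unparseable values into NaT rather than raising.
data['timestamp'] = pd.to_datetime(data['timestamp'], errors='coerce')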
