MySQLdb connection returns a truncated output - Python

I'm trying to connect remotely to a SQL server that runs a stored procedure and returns a huge result set as output.
When I run it locally on the SQL box it is fine and returns ~800,000 rows as expected, but when I run it through the MySQLdb library from Python, I receive a truncated output of only ~6,000 rows.
It runs fine for smaller data, so I'm guessing there's some result limit that's coming into play.
I'm sure there's some property that needs to be changed somewhere, but there doesn't seem to be any documentation on the PyPI page for the library about this.
For explanatory purposes, I've included my code below:
import MySQLdb
import pandas as pd
connection = MySQLdb.connect(sql_server,sql_admin,sql_pw,sql_db)
sql_command = """call function(4)"""
return pd.read_sql(sql_command, connection)

I was able to solve this using cursors. The approach I took is shown below; hopefully it will help anyone else facing the same issue.
connection = MySQLdb.connect(host=sql_server, user=sql_admin, passwd=sql_pw, db=sql_db)
cursor = connection.cursor()
cursor.execute("""call function(4)""")
data = cursor.fetchall()
frame = []
for row in data:
    frame.append(row)
cursor.close()
# close the connection
connection.close()
return pd.DataFrame(frame)
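If the full result set ever becomes too large to hold in memory, a server-side cursor can stream the rows in batches instead of fetching everything at once. This is only a rough sketch, not the approach above: it reuses the same placeholder connection parameters and stored-procedure call, and MySQLdb.cursors.SSCursor may need extra care with stored-procedure result sets.
import MySQLdb
import MySQLdb.cursors
import pandas as pd

# Sketch only: sql_server, sql_admin, sql_pw and sql_db are the placeholders from the question
connection = MySQLdb.connect(host=sql_server, user=sql_admin, passwd=sql_pw,
                             db=sql_db, cursorclass=MySQLdb.cursors.SSCursor)
cursor = connection.cursor()
cursor.execute("""call function(4)""")
rows = []
while True:
    batch = cursor.fetchmany(10000)  # pull the result set down in chunks
    if not batch:
        break
    rows.extend(batch)
cursor.close()
connection.close()
frame = pd.DataFrame(rows)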

Related

Project involving an API, a database and a visualisation package

Firstly, this is my first post on Stack Overflow, so if I haven't structured it properly, please let me know. I'm new to Python, but I've been trying to connect an API to Python, then from Python to a database that is hosted online, and finally into a visualization package. I'm running into problems when inserting the API data (Sheffield Solar) from Python into my database: the data does actually upload to the database, but I'm struggling with an error message that I get in Python.
from datetime import datetime, date
import pytz
import psycopg2
import sqlalchemy
from pandas import DataFrame
from pvlive_api import PVLive
from sqlalchemy import create_engine, Integer, String, DATETIME, FLOAT


def insert_data():
    """ Connect to the PostgreSQL database server """
    # Calling the class from the pvlive_api.py file
    data = PVLive()
    # Gets the data between the two dates from the API and converts the output into a dataframe
    dl = data.between(datetime(2019, 4, 5, 10, 30, tzinfo=pytz.utc),
                      datetime(2020, 4, 5, 14, 0, tzinfo=pytz.utc), entity_type="pes",
                      entity_id=0, dataframe=True)
    # sql is used to insert the API data into the database table
    sql = """INSERT INTO sheffield (pes_id, datetime_gmt, generation_mw) VALUES (%s, %s, %s)"""
    uri = "Redacted"
    print('Connecting to the PostgreSQL database...')
    engine = create_engine(
        'postgresql+psycopg2://Redacted')
    # connect to the PostgreSQL server
    conn = psycopg2.connect(uri)
    # create a cursor that allows python code to execute Postgresql commands
    cur = conn.cursor()
    # Converts the data from a dataframe to an sql readable format, it also appends new data
    # to the table and prevents the index from being included in the table
    into_db = dl.to_sql('sheffield', engine, if_exists='append', index=False)
    cur.execute(sql, into_db)
    # Commits any changes to ensure they actually happen
    conn.commit()
    # close the communication with the PostgreSQL
    cur.close()


def main():
    insert_data()


if __name__ == "__main__":
    main()
The error I'm getting is as follows:
psycopg2.errors.SyntaxError: syntax error at or near "%"
LINE 1: ...eld (pes_id, datetime_gmt, generation_mw) VALUES (%s, %s, %s...
with the ^ pointing at the first %s. I'm assuming the issue is due to me using into_db as the second argument in cur.execute(); however, as I mentioned earlier, the data still uploads into my database. I'm very new to Python, so it could be an easily solvable issue that I've overlooked. I've also redacted some personal connection information from the code. Any help would be appreciated, thanks.
You are getting this error because you are trying to execute a query without any values to insert.
If you read the documentation for dl.to_sql before using it, you will see that this method writes the records to the database itself and returns None.
So there is no need to construct your own SQL query for inserting the data.
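In other words, the to_sql call already performs the insert on its own. A rough sketch of the fix, keeping the question's "Redacted" placeholder connection string and the sheffield table:
from sqlalchemy import create_engine

# 'dl' is the DataFrame built earlier in the question's code; the connection string is
# the question's placeholder.
engine = create_engine('postgresql+psycopg2://Redacted')
# to_sql writes the rows itself and returns None, so no manual INSERT/cur.execute is needed
dl.to_sql('sheffield', engine, if_exists='append', index=False)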

ORA-00936: missing expression when using pyodbc to extract specific data from a SQL server

I'm unable to understand why my SQL query is throwing the exception [Oracle][ODBC][Ora]ORA-00936: missing expression.
The code works fine when I use
select * from reports.ORDERS_NOW
so it lets me pull all the data, but in my case I only want specific columns, which is why I'm writing this query. Please look at the code below and let me know what's wrong with it.
import pyodbc
import pandas as pd

conn = pyodbc.connect('DSN=abcd;UID=xxxxxx;PWD=xxxxxx')
if conn:
    print("Connection is successful")

# db query
sql = '''
select [QUANTITY] from reports.ORDERS_NOW
'''
df = pd.read_sql(sql, conn)
I think [] is not allowed in Oracle, so remove the brackets:
select QUANTITY from reports.ORDERS_NOW
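If an identifier ever does need quoting in Oracle (a reserved word or a mixed-case name), the Oracle style is double quotes rather than square brackets. A small sketch reusing the question's placeholder DSN:
import pyodbc
import pandas as pd

# Placeholder DSN and credentials from the question
conn = pyodbc.connect('DSN=abcd;UID=xxxxxx;PWD=xxxxxx')

sql = 'select QUANTITY from reports.ORDERS_NOW'       # unquoted works for normal names
# sql = 'select "QUANTITY" from reports.ORDERS_NOW'   # Oracle-style quoting, if ever needed
df = pd.read_sql(sql, conn)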

How do I get Data to 'commit' from Python to SQL Server?

I have a localhost SQL Server running and am able to connect to it successfully. However, I am running into the problem of data not transferring over from temp CSV files. I'm using pyodbc for the server connection.
I've also tried pymssql but had worse results, so I've stuck with pyodbc. I've tried closing the cursor each time, or just at the end, but with no luck.
Here is a piece of the code I am using. Towards the bottom two different CSV files are written: one is a temp file used to fill the SQL Server table, and the other is for my personal use, to make sure I am actually gathering information at the moment; in the long term it will be removed so only the temp CSV is used.
# _retry(max_retry=1, timeout=1)
def blocked_outbound_utm_scada():
    # OTHER CODE EXISTS HERE!!!
    # GET Search Results and add to a temp CSV File then send to MS SQL Server
    service_search_results_str = '/services/search/jobs/%s/results?output_mode=csv&count=0' % sid
    search_results = (_service.request(_host + service_search_results_str, 'GET',
                                       headers={'Authorization': 'Splunk %s' % session_key},
                                       body={})[1]).decode('utf-8')
    with tempfile.NamedTemporaryFile(mode='w+t', suffix='.csv', delete=False) as temp_csv:
        temp_csv.writelines(search_results)
        temp_csv.close()
        try:
            cursor.execute("BULK INSERT Blocked_Outbound_UTM_Scada FROM '%s' WITH ("
                           "FIELDTERMINATOR='\t', ROWTERMINATOR='\n', FirstRow = 2);" % temp_csv.name)
            conn.commit()
        except pyodbc.ProgrammingError:
            cursor.execute("CREATE TABLE Blocked_Outbound_UTM_Scada ("
                           "Date_Time varchar(25),"
                           "Src_IP varchar(225),"
                           "Desktop_IP varchar(225));")
            conn.commit()
        finally:
            cursor.execute("BULK INSERT Blocked_Outbound_UTM_Scada FROM '%s' WITH ("
                           "FIELDTERMINATOR='\t', ROWTERMINATOR='\n', FirstRow = 2);" % temp_csv.name)
            conn.commit()
        os.remove(temp_csv.name)
    with open(_global_path + '/blocked_outbound_utm_scada.csv', 'a', newline='') as w:
        w.write(search_results)
        w.close()
I'm just trying to get the information into SQL Server, but the code seems to be ignoring conn.commit(). Any help is appreciated in figuring out what is wrong.
Thanks in Advance!
Try it without the conn.commit.
I don't understand why or how it works, but it seems to me as well that pyodbc ignores the commit call.
Try changing the autocommit parameter in pymssql.connect():
conn = pymssql.connect(host=my_host, user=my_user, password=my_password, database=my_database, autocommit=True)
conn = pymssql.connect(host=my_host, user=my_user, password=my_password, database=my_database, autocommit=False)
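Since the question actually uses pyodbc rather than pymssql, the equivalent switch there is the autocommit keyword on pyodbc.connect. A rough sketch with placeholder connection details (the driver, server, and database names are not the asker's real values):
import pyodbc

conn = pyodbc.connect(
    'DRIVER={ODBC Driver 17 for SQL Server};SERVER=localhost;DATABASE=mydb;Trusted_Connection=yes;',
    autocommit=True)  # each statement is committed as it executes
cursor = conn.cursor()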

How to query a (Postgres) RDS DB through an AWS Jupyter Notebook?

I'm trying to query an RDS (Postgres) database through Python, more specifically a Jupyter Notebook. Overall, what I've been trying for now is:
import boto3

client = boto3.client('rds-data')
response = client.execute_sql(
    awsSecretStoreArn='string',
    database='string',
    dbClusterOrInstanceArn='string',
    schema='string',
    sqlStatements='string'
)
The error I've been receiving is:
BadRequestException: An error occurred (BadRequestException) when calling the ExecuteSql operation: ERROR: invalid cluster id: arn:aws:rds:us-east-1:839600708595:db:zprime
In the end, it was much simpler than I thought, nothing fancy or specific. It was basically a solution I had used before when accessing one of my local DBs. Simply import a specific library for your database type (Postgres, MySQL, etc) and then connect to it in order to execute queries through python.
I don't know if it is the best solution, since making queries through Python will probably be much slower than doing them directly, but it's what works for now.
import psycopg2

conn = psycopg2.connect(database='database_name',
                        user='user',
                        password='password',
                        host='host',
                        port='port')
cur = conn.cursor()
cur.execute('''
    SELECT *
    FROM table;
''')
cur.fetchall()
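In a notebook it is often handy to land the result straight in a DataFrame. A rough sketch under the same assumptions (the connection values are placeholders and my_table is only an illustrative name):
import pandas as pd
import psycopg2

# Placeholders: substitute your RDS endpoint, credentials and table name
conn = psycopg2.connect(database='database_name', user='user',
                        password='password', host='host', port=5432)
df = pd.read_sql('SELECT * FROM my_table;', conn)
df.head()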

Why does mysql connector break ("Lost connection to MySQL server during query" error)

When I run large queries (queries returning many rows), I get the Lost connection to MySQL server during query error, and I cannot see what I'm doing wrong. I use the "new" mysql.connector driver from mysql.com (not the "old" MySQLdb), and the MySQL version that is bundled with MAMP. Python 2.7. The table is not corrupted; analyze table nrk2013b_tbl; returns status OK. Here's an example that breaks:
#!/usr/bin/python2.7
# coding: utf-8
import sys
import mysql.connector  # version 2.0.1

connection = mysql.connector.connect(
    unix_socket="/Applications/MAMP/tmp/mysql/mysql.sock",
    user="dbUsernam",
    passwd="dbUserPassword",
    db="nrk",
    charset="utf8",
    use_unicode=True)
cur = connection.cursor()
cur.execute("USE nrk;")

sql = """SELECT id FROM nrk2013b_tbl WHERE main_news_category = 'Sport'"""
cur.execute(sql)
rows = cur.fetchall()
print rows
sys.exit(0)
This results in the error I get most of the time:
Traceback (most recent call last):
File "train_trainer_test.py", line 20, in <module>
remaining_rows = cur.fetchall()
File "/Library/Python/2.7/site-packages/mysql/connector/cursor.py", line 823, in fetchall
(rows, eof) = self._connection.get_rows()
File "/Library/Python/2.7/site-packages/mysql/connector/connection.py", line 669, in get_rows
rows = self._protocol.read_text_result(self._socket, count)
File "/Library/Python/2.7/site-packages/mysql/connector/protocol.py", line 309, in read_text_result
packet = sock.recv()
File "/Library/Python/2.7/site-packages/mysql/connector/network.py", line 226, in recv_plain
raise errors.InterfaceError(errno=2013)
mysql.connector.errors.InterfaceError: 2013: Lost connection to MySQL server during query
Line 20 is the rows = cur.fetchall()
If I limit the query to return fewer results, e.g. SELECT id FROM nrk2013b_tbl WHERE main_news_category = 'Sport' LIMIT 10, all is well. But I do want to work with larger result sets. For some ad-hoc problem solving I have adjusted the limit and broken the data I wanted into smaller batches, but this keeps popping up as a problem.
In order to take connect-timeout, max_allowed_packet, etc. into account, I have this my.cnf file (/Applications/MAMP/conf/my.cnf):
[mysqld]
max_allowed_packet = 64M
wait_timeout = 28800
interactive_timeout = 28800
connect-timeout=31536000
This does not seem to make any difference (I'm not even sure MySQL recognises these settings). When I run queries from the terminal or from Sequel Pro, they work fine. It is only through the Python mysql.connector that I get these errors.
Any ideas?
PS: I've temporarily given this up and changed to PyMySQL instead of the Oracle mysql.connector. After the change the problems seem to disappear (and I conclude for myself that the problem is in the Oracle mysql.connector).
import pymysql

conn = pymysql.connect(
    unix_socket="/Applications/MAMP/tmp/mysql/mysql.sock",
    user="dbUsernam",
    passwd="dbUserPassword",
    db="nrk",
    charset="utf8",
    use_unicode=True)
conn.autocommit(True)
cur = conn.cursor()
I also had to switch to PyMySQL. I am running pip 1.5.6, Python 2.7.8, and tried mysql-connector 2.0.1
I was able to run the query from within Sequel Pro with no problems, but my Python query would fail with the error described in the question after returning just a subset of results.
Switched to PyMySQL and things work as expected.
https://github.com/PyMySQL/PyMySQL
In the virtualenv:
pip install pymysql
In the code:
import pymysql

connection = pymysql.connect(user='x', passwd='x',
                             host='x',
                             database='x')
cursor = connection.cursor()
query = ("MYQUERY")
cursor.execute(query)
for item in cursor:
    print item
Definitely a bug in mysql-connector-python.
Try increasing your net_read_timeout (the default value of 30 seconds is probably too small in your scenario).
Ref:
net_read_timeout
and in general:
B.5.2.3 Lost connection to MySQL server
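If you go that route, the variable can also be raised for the current session just before running the big query. A rough sketch against the connection from the question (the 600-second value is only illustrative):
cur = connection.cursor()
# Raise the server-side read timeout for this session only
cur.execute("SET SESSION net_read_timeout = 600")
cur.execute("SELECT id FROM nrk2013b_tbl WHERE main_news_category = 'Sport'")
rows = cur.fetchall()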
I encountered similar problems too. In my case it was solved by getting the cursor in this way:
cur = connection.cursor(buffered=True)
Looks like a bug in MySQL Connector/Python: http://bugs.mysql.com/bug.php?id=74483
Should be fixed in 2.0.3, which is not yet released.
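Applied to the code from the question, the workaround looks roughly like this (the credentials are the question's placeholders):
import mysql.connector

connection = mysql.connector.connect(
    unix_socket="/Applications/MAMP/tmp/mysql/mysql.sock",
    user="dbUsernam", passwd="dbUserPassword", db="nrk")
# buffered=True makes the cursor fetch the full result set as soon as the query runs
cur = connection.cursor(buffered=True)
cur.execute("SELECT id FROM nrk2013b_tbl WHERE main_news_category = 'Sport'")
rows = cur.fetchall()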
Expanding on Christian's answer: the timeout for read queries (SELECT) is set by net_write_timeout, because it is a "write" from the perspective of the server.
