How to get a CSV string from querying a relational DB? - python

I'm querying a relational database and I need the result as a CSV string. I can't save it to disk because the code is running in a serverless environment (I don't have access to a disk).
Any idea?

My solution uses the PyGreSQL library and defines this function:
import pg

def get_csv_from_db(query, cols):
    """
    Given the SQL #query and the expected #cols,
    a string formatted CSV (containing headers) is returned
    :param str query:
    :param list of str cols:
    :return str:
    """
    connection = pg.DB(
        dbname=my_db_name,
        host=my_host,
        port=my_port,
        user=my_username,
        passwd=my_password)
    header = ','.join(cols) + '\n'
    records_list = []
    for row in connection.query(query).dictresult():
        record = []
        for c in cols:
            record.append(str(row[c]))
        records_list.append(",".join(record))
    connection.close()
    return header + "\n".join(records_list)
Unfortunately this solution expects the column names as input (which is not too bad, IMHO) and iterates over the dictionary result in Python code.
Other solutions (especially out-of-the-box ones) using other packages are more than welcome.
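As a side note, the plain ','.join above produces a broken CSV if a value itself contains a comma or a quote. Here is an untested sketch of the same loop using the standard csv module with io.StringIO, where the rows are the dicts returned by dictresult():
import csv
import io

def rows_to_csv_string(rows, cols):
    """Serialize a list of dict rows to a CSV string with proper quoting."""
    buf = io.StringIO()
    writer = csv.DictWriter(buf, fieldnames=cols, extrasaction='ignore')
    writer.writeheader()
    writer.writerows(rows)
    return buf.getvalue()

# e.g. csv_str = rows_to_csv_string(connection.query(query).dictresult(), cols)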

This is another solution, based on psycopg2 and pandas:
import psycopg2
import pandas as pd

def get_csv_from_db(query):
    """
    Given the SQL #query, a string formatted CSV (containing headers) is returned
    :param str query:
    :return str:
    """
    conn = psycopg2.connect(
        dbname=my_db_name,
        host=my_host,
        port=my_port,
        user=my_username,
        password=my_password)
    cur = conn.cursor()
    cur.execute(query)
    df = pd.DataFrame(cur.fetchall(), columns=[desc[0] for desc in cur.description])
    cur.close()
    conn.close()
    return df.to_csv(index=False)  # index=False keeps the pandas index out of the CSV
I haven't had a chance to test it yet, though.

Here is a different approach from the other answers, using pandas.
I suppose you have a database connection already; for example, I'm using an Oracle database, but the same can be done by using the respective library for your relational DB.
Only these two lines do the trick:
df = pd.read_sql(query, con)
df.to_csv("file_name.csv")
Here is a full example using an Oracle database:
import cx_Oracle
import pandas as pd

dsn = cx_Oracle.makedsn(ip, port, service_name)
con = cx_Oracle.connect("user", "password", dsn)
query = """select * from YOUR_TABLE"""
df = pd.read_sql(query, con)
df.to_csv("file_name.csv")

PyGreSQL's Cursor has a copy_to method. It accepts a file-like object (which must have a write() method) as the stream. io.StringIO meets this condition and does not need access to disk, so it should be possible to do:
import io
csv_io = io.StringIO()
# here connect to your DB and get cursor
cursor.copy_to(csv_io, "SELECT * FROM table", format="csv", decode=True)
csv_io.seek(0)
csv_str = csv_io.read()
Explanation: many Python modules accept file-like objects, meaning you can use io.StringIO() or io.BytesIO() in place of true file handles. These mimic files opened in text and bytes mode respectively. As with files, there is a read position, so I seek back to the beginning after writing. The last line creates csv_str, which is a plain str. Remember to adjust the SQL query to your needs.
Note: I have not tested the above code; please try it yourself and report whether it works as intended.

Related

Passing a parameter to SQL Server query using read_sql_query

I have a one column list of Member_IDs called Member_ID_only2 similar to below:
Member ID
'123',
'456',
'758',
.
.
.
I'm trying to pass this as a list of params to read_sql_query. My code sample is below:
import pandas as pd
import pyodbc
conn = pyodbc.connect()
query = open(path + 'XXXXX_PROD.SQL', 'r')
SQL_Query = pd.read_sql_query(query, conn, params=(Member_ID_only2,))
My XXXXX_PROD.SQL has a WHERE statement like the one below:
WHERE MEME_CK IN '%s'
I'm getting the error below:
DatabaseError: Execution failed on sql '<_io.TextIOWrapper name='C:\Folder\XXXXX_PROD.SQL' mode='r' encoding='cp1252'>': The first argument to execute must be a string or unicode query.
I'm not sure how to fix it. Can someone help?
Thank you for any help that I can get.
The open method opens the file and returns a TextIOWrapper object, but it does not read the content of the file.
To actually get the content of the file, you need to call the read method on that object, like so:
# Read the sql file
query = open('filename.sql', 'r')
DF = pd.read_sql_query(query.read(), connection, params=your_params)
With the aid of with open, you ensure the file is properly closed:
with open('filename.sql', 'r') as query:
    DF = pd.read_sql_query(query.read(), connection, params=your_params)
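As for the IN clause itself, pandas passes params straight through to the driver, and pyodbc expects ? placeholders, so one common pattern is to generate one placeholder per member ID. A hedged, untested sketch (it assumes the .SQL file is edited to contain a {placeholders} token where '%s' currently sits):
import pandas as pd
import pyodbc

conn = pyodbc.connect()                      # connection details as in the question
Member_ID_only2 = ['123', '456', '758']      # example IDs

with open(path + 'XXXXX_PROD.SQL', 'r') as f:     # path as in the question
    sql_template = f.read()                       # "... WHERE MEME_CK IN ({placeholders})"

placeholders = ','.join('?' for _ in Member_ID_only2)
sql = sql_template.format(placeholders=placeholders)
SQL_Query = pd.read_sql_query(sql, conn, params=Member_ID_only2)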

How to implement DBMS_METADATA.GET_DDL in cx_Oracle and python3 and get the ddl of the table?

This is the Oracle command I am using:
query = '''SELECT DBMS_METADATA.GET_DDL('TABLE', 'MY_TABLE', 'MY_SCHEMA') FROM DUAL;'''
cur.execute(query)
Now, how do I get the DDL of the table using cx_Oracle and Python 3?
Please help; I am unable to extract the DDL.
The following code can be used to fetch the contents of the DDL from dbms_metadata:
import cx_Oracle

conn = cx_Oracle.connect("username/password@hostname/myservice")
cursor = conn.cursor()

def OutputTypeHandler(cursor, name, defaultType, size, precision, scale):
    if defaultType == cx_Oracle.CLOB:
        return cursor.var(cx_Oracle.LONG_STRING, arraysize=cursor.arraysize)

cursor.outputtypehandler = OutputTypeHandler
cursor.execute("select dbms_metadata.get_ddl('TABLE', :tableName) from dual",
               tableName="THE_TABLE_NAME")
text, = cursor.fetchone()
print("DDL fetched of length:", len(text))
print(text)
The use of the output type handler eliminates the need to process the CLOB. Without it, you would need to do str(lob) or lob.read() in order to get at its contents. Either way, however, you are not limited to 4,000 characters.
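For comparison, without installing the handler the fetch returns a LOB object that has to be read explicitly; a rough sketch:
# With no outputtypehandler set on the cursor, the column comes back as a cx_Oracle LOB.
cursor.execute("select dbms_metadata.get_ddl('TABLE', :tableName) from dual",
               tableName="THE_TABLE_NAME")
lob, = cursor.fetchone()
text = lob.read()  # or str(lob)
print(text)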
cx_Oracle has native support for calling PL/SQL.
Assuming you have connection and cursor objects, the snippet will look like this:
binds = dict(object_type='TABLE', name='DUAL', schema='SYS')
ddl = cursor.callfunc('DBMS_METADATA.GET_DDL', keywordParameters=binds, returnType=cx_Oracle.CLOB)
print(ddl)
You don't need to split the DDL in 4k/32k chunks, as Python's str doesn't have this limitation. In order to get the DDL in one chunk, just set returnType to cx_Oracle.CLOB. You can later convert it to str by doing str(ddl).

Add list of values to a blob field in firebird using Python

I have a list of items which I would like to store in my Firebird database.
Thus far I have made the following code:
Sens=278.3
DSens=1.2
Fc10=3.8
Bw10=60.0
Fc20=4.2
Bw20=90.0
ResultArray = (Sens,DSens,Fc10,Bw10,Fc20,Bw20,t6,t20,Nel,Nsub)
con = fdb.connect(dsn="192.168.0.2:/database/us-database/usdb.gdb", user="sysdba", password="#########")
cur = con.cursor()
InsertStatement="insert into Tosh_Probe (TestResults ) Values (?)"
cur.execute(InsertStatement, (ResultArray,))
con.commit()
Here the TestResults field is a blob field in my database.
This gives a TypeError (???).
What is the correct syntax to store these values into a blob?
Another option I tried is to write the list of items into a StringIO and store that in the database. Now a new entry is made in the database, but no data is added to the blob field.
Here is the code for adding the fields to the StringIO
ResultArray = StringIO.StringIO()
ResultArray.write = Sens
ResultArray.write = DSens
#ResultArray.close #tried with and without this line but with the same result
I've tested this with Python 3.5.1 and FDB 1.6. The following variants of writing all work (into a blob sub_type text):
import fdb
import io
con = fdb.connect(dsn='localhost:testdatabase', user='sysdba', password='masterkey')
cur = con.cursor()
statement = "insert into blob_test2 (text_blob) values (?)"
cur.execute(statement, ("test blob as string",))
cur.execute(statement, (io.StringIO("test blob as StringIO"),))
streamwrites = io.StringIO()
streamwrites.write("streamed write1,")
streamwrites.write("streamed write2,")
streamwrites.seek(0)
cur.execute(statement, (streamwrites,))
con.commit()
con.close()
The major differences with your code in the case of the writes to StringIO are:
Use of write(...) instead of write = ...
Use of seek(0) to position the stream at the start, otherwise you read nothing, as the stream is positioned after the last write.
I haven't tried binary IO, but I expect that to work in a similar fashion.
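For example, a minimal, untested sketch of the binary variant; the column name binary_blob is an assumption (a blob of sub_type binary):
import io

binary_stream = io.BytesIO()
binary_stream.write(b"\x00\x01\x02 some binary payload")  # write bytes instead of str
binary_stream.seek(0)                                     # rewind before handing the stream to the driver
cur.execute("insert into blob_test2 (binary_blob) values (?)", (binary_stream,))
con.commit()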

Psycopg2 "copy_from" command, possible to ignore delimiter in quote (getting error)?

I am trying to load rows of data into Postgres in a CSV-like structure using the copy_from command (a function that uses the COPY command in Postgres). My data is delimited with commas (and unfortunately, since I am not the data owner, I cannot just change the delimiter). I run into a problem when I try to load a row that has a value in quotes containing a comma (i.e. that comma should not be treated as a delimiter).
For example this row of data is fine:
",Madrid,SN,,SEN,,,SN,173,157"
This row of data is not fine:
","Dominican, Republic of",MC,,YUO,,,MC,65,162",
Some code:
conn = get_psycopg_conn()
cur = conn.cursor()
_io_buffer.seek(0) #This buffer is holding the csv-like data
cur.copy_from(_io_buffer, str(table_name), sep=',', null='', columns=column_names)
conn.commit()
It looks like copy_from doesn't expose the CSV mode or quote options, which are available from the underlying PostgreSQL COPY command. So you'll need to either patch psycopg2 to add them, or use copy_expert.
I haven't tried it, but something like
curs.copy_expert("""COPY mytable FROM STDIN WITH (FORMAT CSV)""", _io_buffer)
might be sufficient.
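If you still need the column list and empty-string NULLs from the original copy_from call, COPY accepts those too; another hedged, untested variant (table_name and column_names as in the question):
columns_sql = ', '.join(column_names)   # column_names as in the question
_io_buffer.seek(0)
cur.copy_expert(
    "COPY {} ({}) FROM STDIN WITH (FORMAT CSV, NULL '')".format(table_name, columns_sql),
    _io_buffer,
)
conn.commit()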
I had this same error and was able to get close to a fix based on the single line of code listed by craig-ringer. The other item I needed was to include quotes for the initial object by using df.to_csv(index=False, header=False, quoting=csv.QUOTE_NONNUMERIC, sep=','), and specifically quoting=csv.QUOTE_NONNUMERIC.
The full example of pulling one data source from MySQL and storing it in Postgres is below:
# run in python 3.6
import MySQLdb
import psycopg2
import os
from io import StringIO
import pandas as pd
import csv

mysql_db = MySQLdb.connect(host="host_address",  # your host, usually localhost
                           user="user_name",     # your username
                           passwd="source_pw",   # your password
                           db="source_db")       # name of the database
postgres_db = psycopg2.connect("host=dest_address dbname=dest_db_name user=dest_user password=dest_pw")

my_list = ['1', '2', '3', '4']

# you must create a Cursor object. It will let you execute all the queries you need
mysql_cur = mysql_db.cursor()
postgres_cur = postgres_db.cursor()

for item in my_list:
    # Pull cbi data for each state and write it to postgres
    print(item)
    mysql_sql = 'select * from my_table t \
        where t.important_feature = \'' + item + '\';'
    # Do something to create your dataframe here...
    df = pd.read_sql_query(mysql_sql, mysql_db)

    # Initialize a string buffer
    sio = StringIO()
    sio.write(df.to_csv(index=False, header=False, quoting=csv.QUOTE_NONNUMERIC, sep=','))  # Write the Pandas DataFrame as a csv to the buffer
    sio.seek(0)  # Be sure to reset the position to the start of the stream

    # Copy the string buffer to the database, as if it were an actual file
    with postgres_db.cursor() as c:
        print(c)
        c.copy_expert("""COPY schema.new_table FROM STDIN WITH (FORMAT CSV)""", sio)
        postgres_db.commit()

mysql_db.close()
postgres_db.close()

MySql: How to know if an entry is compressed or not

I'm working with Python and MySQL and I want to verify that a certain entry is compressed in the DB, i.e.:
cur = db.getCursor()
cur.execute('''select compressed_column from table where id=12345''')
res = cur.fetchall()
At this point I would like to verify that the entry is compressed (i.e. in order to work with the data you would have to use select uncompress(compressed_column) ...). Ideas?
COMPRESS() on MySQL uses zlib, so you can try decompressing the value to see whether it is compressed. Note that COMPRESS() stores the uncompressed length as a 4-byte prefix in front of the zlib stream, so skip those bytes first (here s is the value fetched from the column):
import zlib

try:
    out = zlib.decompress(s[4:])  # skip MySQL's 4-byte uncompressed-length prefix
except zlib.error:
    out = s
