Getting data from table in database - python

I want to extract data from a PostgreSQL database and use that data (as a DataFrame) in a script. Here's my initial try:
from pandas import DataFrame
import psycopg2
conn = psycopg2.connect(host=host_address, database=name_of_database, user=user_name, password=user_password)
cur = conn.cursor()
cur.execute("SELECT * FROM %s;" % name_of_table)
the_data = cur.fetchall()
colnames = [desc[0] for desc in cur.description]
the_frame = DataFrame(the_data)
the_frame.columns = colnames
cur.close()
conn.close()
Note: I am aware that I should not use "string parameters interpolation (%) to pass variables to a SQL query string", but this works great for me as it is.
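For reference, psycopg2 can compose the table name safely without string interpolation; a minimal sketch, assuming psycopg2 >= 2.7 (which ships the sql module):
from psycopg2 import sql

# compose the query with a properly quoted identifier instead of % interpolation
query = sql.SQL("SELECT * FROM {}").format(sql.Identifier(name_of_table))
cur.execute(query)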
Would there be a more direct approach to this?
Edit: Here's what I used from the selected answer:
import pandas as pd
import sqlalchemy as sq
engine = sq.create_engine("postgresql+psycopg2://username:password@host:port/database")
the_frame = pd.read_sql_table(name_of_table, engine)
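read_sql_table can also restrict what gets pulled; a small sketch (the schema and column names here are just placeholders):
the_frame = pd.read_sql_table(name_of_table, engine,
                              schema='public',             # placeholder schema
                              columns=['col_a', 'col_b'])  # placeholder columns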

Pandas can load data from Postgres directly:
import psycopg2
import pandas.io.sql as pdsql
conn = psycopg2.connect(...)
the_frame = pdsql.read_frame("SELECT * FROM %s;" % name_of_table, conn)
If you have a recent pandas (>= 0.14), you should use read_sql_query/read_sql_table (read_frame is deprecated) with an SQLAlchemy engine:
import pandas as pd
import sqlalchemy
import psycopg2
engine = sqlalchemy.create_engine("postgresql+psycopg2://...")
the_frame = pd.read_sql_query("SELECT * FROM %s;" % name_of_table, engine)
the_frame = pd.read_sql_table(name_of_table, engine)
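If you also want to avoid interpolating values into the query string, read_sql_query takes a params argument. A minimal sketch with a raw psycopg2 connection as in the question (the table name, column, and value below are placeholders):
import psycopg2

conn = psycopg2.connect(host=host_address, database=name_of_database,
                        user=user_name, password=user_password)
# with a DB-API connection, psycopg2's %(name)s placeholder style is used
the_frame = pd.read_sql_query("SELECT * FROM some_table WHERE id = %(id)s",
                              conn, params={"id": 42})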

Here is an alternate method:
# run sql code
result = conn.execute(sql)
# Insert to a dataframe
df = DataFrame(data=list(result), columns=result.keys())
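For context, a self-contained version of that pattern with SQLAlchemy; the connection URL and table name are placeholders, and result.keys() supplies the column names:
import pandas as pd
from sqlalchemy import create_engine, text

engine = create_engine("postgresql+psycopg2://user:password@host:5432/dbname")  # placeholder URL
with engine.connect() as conn:
    result = conn.execute(text("SELECT * FROM some_table"))
    df = pd.DataFrame(result.fetchall(), columns=list(result.keys()))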


How can I easily write my pandas dataframe to a MySQL database using mysql.connector?
import mysql.connector as sql
import pandas as pd
db_connection = sql.connect(host='124685.eu-central-1.rds.amazonaws.com',
database="db_name", user='user', password='pw')
query = 'SELECT * FROM table_name'
df = pd.read_sql(sql=query, con=db_connection)
df["Person_Name"] = "xx"
df.to_sql(con=db_connection, name='table_name', if_exists='replace')
I tried this, but it gives me this error:
pandas.io.sql.DatabaseError: Execution failed on sql 'SELECT name FROM sqlite_master WHERE type='table' AND name=?;': Not all parameters were used in the SQL statement
Does mysql.connector not have a df.to_sql function?
These are the col names:
Col names Index(['Person_ID', 'AirTable_ID_Person', 'Person_Name', 'Gender', 'Ethnicity',
'LinkedIn_Link_to_the_Profile_of_Person', 'Jensen_Analyst',
'Data_Source', 'Created_Time', 'Last_Modified_Time', 'Last refresh',
'createdTime', 'Gender_ID', 'Ethnicity_ID', 'Jensen_Analyst_ID',
'Data_Source_ID', 'Position_ID', 'Egnyte_File', 'Comment', 'Move',
'Right_Move', 'Bio-Import-Assistant', 'Diversity'],
dtype='object')
Pandas requires an SQLAlchemy engine to write data to SQL with to_sql. You can take one of the following two approaches: the first writes the rows through the connector's cursor with executemany, and the second uses an engine with a pandas to_sql call.
It works very similarly to your pandas read function.
import pandas as pd
import mysql.connector as sql

db_connection = sql.connect(host='124685.eu-central-1.rds.amazonaws.com',
                            database="db_name", user='user', password='pw')
query = 'SELECT * FROM table_name'
df = pd.read_sql(sql=query, con=db_connection)
df["Person_Name"] = "xx"

df_temp = df[['Person_Name', 'Person_ID']]
# write Person_Name back for each Person_ID; %s placeholders are filled by executemany
query_update = 'UPDATE table_name SET Person_Name = %s WHERE Person_ID = %s'
pars = df_temp.values.tolist()
pars = list(map(tuple, pars))
cursor = db_connection.cursor()
cursor.executemany(query_update, pars)
db_connection.commit()  # commit on the connection, not the cursor
cursor.close()
Or you can establish an engine for uploading.
import pandas as pd
from sqlalchemy import create_engine
import mysql.connector as sql
# engine = create_engine('mysql+pymysql://username:password@host/database')
# or in your case:
engine = create_engine('mysql+pymysql://user:pw@124685.eu-central-1.rds.amazonaws.com/db_name')
db_connection = sql.connect(host='124685.eu-central-1.rds.amazonaws.com',
database="db_name", user='user', password='pw')
query = 'SELECT * FROM table_name'
df = pd.read_sql(sql=query, con=db_connection)
df["Person_Name"] = "xx"
df.to_sql(con=engine, name='table_name', if_exists='replace')
For this method, be sure to install pymysql first (pip install pymysql), and you should be good to go.
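If the table is large, or you do not want the DataFrame index written as a column, to_sql takes a couple of useful options; a small sketch using the engine from above (the table name is still a placeholder):
# index=False skips writing the DataFrame index; chunksize batches the INSERTs
df.to_sql(name='table_name', con=engine, if_exists='replace',
          index=False, chunksize=1000)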

Convert SQL query output into pandas dataframe

I have been looking since yesterday for a way to convert the output of a SQL query into a pandas DataFrame.
For example, code that does something like this:
data = select * from table
I've tried many snippets I found on the internet, but nothing seems to work.
Note that my database is stored in Azure Databricks and I can only access the table using its URL.
Thank you so much!
Hope this helps you out. Both insertion and selection are included in this code for reference.
import urllib.parse
import pandas as pd
import pyodbc
from sqlalchemy import create_engine

def db_insert_user_level_info(table_name):
    # Call your DataFrame here: pass it as an argument or assign it to df_parameter beforehand
    df = df_parameter
    params = urllib.parse.quote_plus("DRIVER={SQL Server};SERVER=DESKTOP-ITAJUJ2;DATABASE=githubAnalytics")
    engine = create_engine("mssql+pyodbc:///?odbc_connect=%s" % params)
    engine.connect()
    table_row_count = select_row_count(table_name)
    df_row_count = df.shape[0]
    if table_row_count == df_row_count:
        print("Data cannot be inserted because the row count is the same")
    else:
        df.to_sql(name=table_name, con=engine, index=False, if_exists='append')
        print("********************************** DONE: EXECUTED SUCCESSFULLY ***************************************************")

def select_row_count(table_name):
    cnxn = pyodbc.connect("Driver={SQL Server Native Client 11.0};"
                          "Server=DESKTOP-ITAJUJ2;"
                          "Database=githubAnalytics;"
                          "Trusted_Connection=yes;")
    cur = cnxn.cursor()
    try:
        db_cmd = "SELECT count(*) FROM " + table_name
        res = cur.execute(db_cmd)
        # Do something with your result set, for example print out all the results:
        for x in res:
            return x[0]
    except:
        print("Table is not available, please wait...")
Using sqlalchemy to connect to the database, and the built-in method read_sql_query from pandas to go straight to a DataFrame:
import pandas as pd
from sqlalchemy import create_engine
engine = create_engine(url)
connection = engine.connect()
query = "SELECT * FROM table"
df = pd.read_sql_query(query,connection)
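The url passed to create_engine is a standard SQLAlchemy database URL; for example, for PostgreSQL it would look like this (all values are placeholders):
url = "postgresql+psycopg2://username:password@hostname:5432/database_name"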

Convert list[adodbapi.apibase.SQLrow] to pd.DataFrame

Among sql-server connectors adodbapi is the only one that's working in my environment.
import adodbapi
conn = adodbapi.connect("PROVIDER=SQLOLEDB;Data Source={0};Database={1}; \
UID={2};PWD={3};".format(server,db,user,pwd))
cursor = conn.cursor()
query_list = [row for row in cursor]
# type(query_list[0]) is adodbapi.apibase.SQLrow
How to convert this list into a pandas df?
Thanks
This is how I did it:
import adodbapi as ado
import numpy as np
import pandas as pd
def get_df(data):
    ar = np.array(data.ado_results)       # turn ado results into a numpy array
    df = pd.DataFrame(ar).transpose()     # create a dataframe from the array
    df.columns = data.columnNames.keys()  # set column names
    return df

with ado.connect('yourconnectionstring') as con:
    with con.cursor() as cur:
        sql_str = 'yourquery'
        cur.execute(sql_str)
        data = cur.fetchall()
        df = get_df(data)
This may help:
import pandas as pd
.......
ur_statements
.......
query_list = [row for row in cursor]
df = pd.DataFrame({'col':query_list })
print (df)
Consider pandas' read_sql to query the database directly. Currently, though, you will receive an error:
KeyError: '_typ'
However, there is a working fix thanks to @TomAubrunner on this GitHub ticket, which appears to be a bug in adodbapi.
Find the location of adodbapi: print(adodbapi.__file__)
Open the script apibase.py in that folder.
Locate return self._getValue(self.rows.columnNames[name.lower()]) and replace it with the try/except block below:
try:
    return self._getValue(self.rows.columnNames[name.lower()])
except:
    return False
Once done, simply run as you would any DB-API pandas connection even with qmark parameters:
import pandas as pd
import adodbapi
conn = adodbapi.connect("PROVIDER=SQLOLEDB;Data Source={0};Database={1}; \
UID={2};PWD={3};".format(server,db,user,pwd))
# WITHOUT PARAMS
df = pd.read_sql("SELECT * FROM myTable", conn)
# WITH PARAMS
df = pd.read_sql("SELECT * FROM myTable WHERE [Col]= ?", conn, params=['myValue'])
conn.close()

How to store mySQL query result into pandas DataFrame with pymysql?

I'm trying to store a mySQL query result in a pandas DataFrame using pymysql and am running into errors building the dataframe. Found a similar question here and here, but it looks like there are pymysql-specific errors being thrown:
import pandas as pd
import datetime
import pymysql
# dummy values
connection = pymysql.connect(user='username', password='password', database='database_name', host='host')
start_date = datetime.datetime(2017,11,15)
end_date = datetime.datetime(2017,11,16)
try:
    with connection.cursor() as cursor:
        query = "SELECT * FROM orders WHERE date_time BETWEEN %s AND %s"
        cursor.execute(query, (start_date, end_date))
        df = pd.DataFrame(data=cursor.fetchall(), index=None, columns=cursor.keys())
finally:
    connection.close()
returns: AttributeError: 'Cursor' object has no attribute 'keys'
If I drop the index and columns arguments:
try:
    with connection.cursor() as cursor:
        query = "SELECT * FROM orders WHERE date_time BETWEEN %s AND %s"
        cursor.execute(query, (start_date, end_date))
        df = pd.DataFrame(cursor.fetchall())
finally:
    connection.close()
returns ValueError: DataFrame constructor not properly called!
Thanks in advance!
Use pandas.read_sql() for this:
query = "SELECT * FROM orders WHERE date_time BETWEEN %s AND %s"
df = pd.read_sql(query, connection, params=(start_date, end_date))
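Putting it together with the setup from the question (credentials are placeholders), a minimal end-to-end sketch; note that pymysql uses %s placeholders:
import datetime
import pandas as pd
import pymysql

connection = pymysql.connect(user='username', password='password',
                             database='database_name', host='host')
start_date = datetime.datetime(2017, 11, 15)
end_date = datetime.datetime(2017, 11, 16)
try:
    df = pd.read_sql("SELECT * FROM orders WHERE date_time BETWEEN %s AND %s",
                     connection, params=(start_date, end_date))
finally:
    connection.close()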
Thank you for your suggestion to use pandas.read_sql(). It works for executing a stored procedure as well! I tested it in an MSSQL 2017 environment.
Below is an example (I hope it helps others):
def database_query_to_df(connection, stored_proc, start_date, end_date):
    # Define the query
    query = "SET NOCOUNT ON; EXEC " + stored_proc + " ?, ? " + "; SET NOCOUNT OFF"
    # Pass the parameters to the query, execute it, and store the results in a data frame
    df = pd.read_sql(query, connection, params=(start_date, end_date))
    return df
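A hypothetical call, where connection is a pyodbc/MSSQL connection and the stored procedure name and dates are placeholders:
df = database_query_to_df(connection, 'dbo.usp_GetOrders',
                          datetime.datetime(2017, 11, 15),
                          datetime.datetime(2017, 11, 16))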
Try This:
import pandas as pd
import pymysql
mysql_connection = pymysql.connect(host='localhost', user='root', password='', db='test', charset='utf8')
sql = "SELECT * FROM `brands`"
df = pd.read_sql(sql, mysql_connection, index_col='brand_id')
print(df)

Cannot drop table in pandas to_sql using SQLAlchemy

I'm trying to drop an existing table, do a query, and then recreate the table using the pandas to_sql function. This query works in pgAdmin, but not here. Any idea whether this is a pandas bug or whether my code is wrong?
The specific error is ValueError: Table 'a' already exists.
import pandas.io.sql as psql
from sqlalchemy import create_engine
engine = create_engine(r'postgresql://user@localhost:port/dbname')
c = engine.connect()
conn = c.connection
sql = """
drop table a;
select * from some_table limit 1;
"""
df = psql.read_sql(sql, con=conn)
print(df.head())
df.to_sql('a', engine)
conn.close()
Why are you doing it like that? There is a shorter way: the if_exists kwarg in to_sql. Try this:
import pandas.io.sql as psql
from sqlalchemy import create_engine
engine = create_engine(r'postgresql://user@localhost:port/dbname')
c = engine.connect()
conn = c.connection
sql = """
select * from some_table limit 1;
"""
df = psql.read_sql(sql, con=conn)
print(df.head())
# Note how the line below differs: if_exists='replace' drops and recreates the table
df.to_sql('a', con=engine, schema=schema_name, if_exists='replace')
conn.close()
According to docs:
replace: If table exists, drop it, recreate it, and insert data.
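For reference, if_exists accepts 'fail' (the default), 'replace', and 'append'; for example, to add rows to an existing table instead of recreating it:
df.to_sql('a', con=engine, if_exists='append')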
P.S. An additional tip: this is a better way to handle the connection:
with engine.connect() as conn, conn.begin():
    sql = """select * from some_table limit 1"""
    df = psql.read_sql(sql, con=conn)
    print(df.head())
    df.to_sql('a', con=conn, schema=schema_name, if_exists='replace')
It ensures that your connection is always closed, even if your program exits with an error, which is important to prevent data corruption. Further, I would just use this:
import pandas as pd
...
pd.read_sql(sql, conn)
instead of the way you are doing it.
So, if I were in your place writing that code, it would look like this:
import pandas as pd
from sqlalchemy import create_engine
engine = create_engine(r'postgresql://user@localhost:port/dbname')
with engine.connect() as conn, conn.begin():
    df = pd.read_sql('select * from some_table limit 1', con=conn)
    print(df.head())
    df.to_sql('a', con=conn, schema=schema_name, if_exists='replace')
