I want to convert a lot of database tables into dataframes, so I tried this step manually first:
sql_query = pd.read_sql_query('''
SELECT
*
FROM attendance
''', test_db_engine)
test_db_attendance_df = pd.DataFrame(sql_query)
where 'test_db_engine' is a database connection.
This method works and I can create a dataframe for the attendance table.
Now I want to put this into a function so I can do it for any table, not just one. So I tried this:
def sql_to_df(table_name):
    sql_query = pd.read_sql_query('''
    SELECT
    *
    FROM table_name
    ''', test_db_engine)
    test_db_df = pd.DataFrame(sql_query)
    return test_db_df
sql_to_df(attendance)
It threw an error:
name 'attendance' is not defined
Can anyone tell me how to pass a function argument into the SQL query, so I can convert any number of database tables into pandas dataframes? I need to substitute attendance for table_name inside the query.
Python thinks that attendance is a variable, but you need to pass a string to the function and then use string substitution:
def sql_to_df(table_name):
    sql_query = pd.read_sql_query('''
    SELECT
    *
    FROM %s
    ''' % (table_name,), test_db_engine)
    test_db_df = pd.DataFrame(sql_query)
    return test_db_df
sql_to_df('attendance')
Use f-strings to format your query, and pass attendance as a string (your error occurred because no variable named attendance was set). Note that read_sql_query already returns a dataframe, so the extra pd.DataFrame call is unnecessary:
def sql_to_df(table_name):
    return pd.read_sql_query(f'''
    SELECT
    *
    FROM {table_name}
    ''', test_db_engine)
sql_to_df("attendance")
Related
I'm trying to debug a SQL statement generated with the sqlite3 Python module...
c.execute("SELECT * FROM %s WHERE :column = :value" % Photo.DB_TABLE_NAME, {"column": column, "value": value})
It returns no rows when I do a fetchall().
When I run this directly against the database:
SELECT * FROM photos WHERE album_id = 10
I get the expected results.
Is there a way to see the constructed query to see what the issue is?
To actually answer your question, you can use the set_trace_callback method of the connection object to attach the print function; this makes every query get printed when it is executed. Here is an example in action:
# Import and connect to database
import sqlite3
conn = sqlite3.connect('example.db')
# This attaches the tracer
conn.set_trace_callback(print)
# Get the cursor, execute some statement as an example
c = conn.cursor()
c.execute("CREATE TABLE stocks (symbol text)")
t = ('RHAT',)
c.execute("INSERT INTO stocks VALUES (?)", t)
c.execute('SELECT * FROM stocks WHERE symbol=?', t)
print(c.fetchone())
This produces the output:
CREATE TABLE stocks (symbol text)
BEGIN
INSERT INTO stocks VALUES ('RHAT')
SELECT * FROM stocks WHERE symbol='RHAT'
('RHAT',)
The problem here is that string values are automatically wrapped in single quotes, so you cannot dynamically insert column names that way.
Concerning your question, I'm not sure about sqlite3, but in MySQLdb you can get the final query with something like (I am currently not at a computer to check):
statement % conn.literal(query_params)
You can only use substitution parameters for row values, not for column or table names.
Thus the :column in SELECT * FROM %s WHERE :column = :value is not allowed: the parameter is bound as the string 'album_id', so the query compares that constant to the value and matches no rows.
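A sketch of the fix implied above, reusing the names from the question: interpolate the table and column names into the string (they cannot be bound parameters) and bind only the value.
c.execute("SELECT * FROM %s WHERE %s = :value" % (Photo.DB_TABLE_NAME, column),
          {"value": value})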
I would like to use a string as the column names for a pandas DataFrame.
The problem is that the pandas DataFrame constructor interprets the string variable as a single column instead of multiple ones, and thus the error:
ValueError: 1 columns passed, passed data had 11 columns
The first part of my code is intended to get the column names from the MySQL database I am about to query:
cursor1.execute("SELECT GROUP_CONCAT(COLUMN_NAME) AS cols FROM INFORMATION_SCHEMA.COLUMNS WHERE TABLE_SCHEMA = 'or_red' AND TABLE_NAME = 'nomen_prefix'")
for colsTableMysql in cursor1.fetchall():
    colsTable = colsTableMysql[0]
    colsTable = "'" + colsTable.replace(",", "','") + "'"
The second part uses the created variable colsTable:
cursor = connection.cursor()
cursor.execute("SELECT * FROM or_red.nomen_prefix WHERE C_emp IN (" + emplazamientos + ")")
tabla = pd.DataFrame(cursor.fetchall(), columns=[colsTable])
#tabla = exec("pd.DataFrame(cursor.fetchall(),columns=[" + colsTable + "])")
#tabla = pd.DataFrame(cursor.fetchall())
I have tried other approaches like the use of exec(). In that case there is no error, but there is no information in the result either, and print(tabla) shows None (exec() always returns None; it does not return the value of the evaluated expression).
Is there any direct way of passing the columns dynamically as a string to a pandas DataFrame?
Thanks in advance
I am going to answer my own question since I've already found the way.
The first part of my code gets the column names from the MySQL database table I am about to query:
cursor1.execute("SELECT GROUP_CONCAT(COLUMN_NAME) AS cols FROM INFORMATION_SCHEMA.COLUMNS WHERE TABLE_SCHEMA = 'or_red' AND TABLE_NAME = 'nomen_prefix'")
for colsTableMysql in cursor1.fetchall():
    colsTable = colsTableMysql[0]
    colsTable = "'" + colsTable.replace(",", "','") + "'"
The second part uses the created variable colsTable as input in the statement that defines the columns:
cursor = connection.cursor()
cursor.execute("SELECT * FROM or_red.nomen_prefix WHERE C_emp IN (" + emplazamientos + ")")
tabla = eval("pd.DataFrame(cursor.fetchall(),columns=[" + colsTable + "])")
Using eval the string is parsed and evaluated as a Python expression.
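As a side note, the same result can be had without eval by skipping the quoting step and splitting the GROUP_CONCAT result into a plain Python list; a minimal sketch, reusing the cursors from above:
cursor1.execute("SELECT GROUP_CONCAT(COLUMN_NAME) AS cols FROM INFORMATION_SCHEMA.COLUMNS WHERE TABLE_SCHEMA = 'or_red' AND TABLE_NAME = 'nomen_prefix'")
cols = cursor1.fetchone()[0].split(",")  # plain list of column names
cursor.execute("SELECT * FROM or_red.nomen_prefix WHERE C_emp IN (" + emplazamientos + ")")
tabla = pd.DataFrame(cursor.fetchall(), columns=cols)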
I've been trying to use this piece of code:
# df is the dataframe
if len(df) > 0:
    df_columns = list(df)
    # create (col1, col2, ...)
    columns = ",".join(df_columns)
    # create VALUES('%s', '%s', ...), one '%s' per column
    values = "VALUES({})".format(",".join(["%s" for _ in df_columns]))
    # create INSERT INTO table (columns) VALUES('%s', ...)
    insert_stmt = "INSERT INTO {} ({}) {}".format(table, columns, values)
    cur = conn.cursor()
    psycopg2.extras.execute_batch(cur, insert_stmt, df.values)
    conn.commit()
    cur.close()
so I could connect to a Postgres DB and insert values from a df.
I get this error for this code:
LINE 1: INSERT INTO mrr.shipments (mainFreight_freight_motherVesselD...
psycopg2.errors.UndefinedColumn: column "mainfreight_freight_mothervesseldepartdatetime" of relation "shipments" does not exist
For some reason, the columns aren't being matched to the table properly.
What can I do to fix it?
You should not do your own string interpolation; let psycopg2 handle it. From the docs:
Warning Never, never, NEVER use Python string concatenation (+) or string parameters interpolation (%) to pass variables to a SQL query string. Not even at gunpoint.
Since you also have dynamic column names, you should use psycopg2.sql to create the statement and then use the standard method of passing query parameters to psycopg2 instead of using format.
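A minimal sketch of that suggestion, reusing the names from the question (the connection variable is assumed to be conn):
import psycopg2.extras
from psycopg2 import sql

df_columns = list(df)  # column names, as in the question
# Compose the statement so psycopg2 quotes the identifiers. sql.Identifier
# also preserves case: unquoted names are folded to lower case by Postgres,
# which is likely why "mainfreight_freight_mothervesseldepartdatetime" was
# not found.
insert_stmt = sql.SQL("INSERT INTO {} ({}) VALUES ({})").format(
    sql.Identifier("mrr", "shipments"),  # schema-qualified table (psycopg2 >= 2.8)
    sql.SQL(",").join([sql.Identifier(c) for c in df_columns]),   # quoted column names
    sql.SQL(",").join([sql.Placeholder() for _ in df_columns]),   # one placeholder per column
)
cur = conn.cursor()
# as_string() renders the composed query; execute_batch then binds the values.
# tolist() converts numpy scalars to plain Python values psycopg2 can adapt.
psycopg2.extras.execute_batch(cur, insert_stmt.as_string(conn), df.values.tolist())
conn.commit()
cur.close()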
I wanted to perform an operation where I delete all the rows in a Postgres table (but do not drop the table) and then fill it with new rows, and I wanted to use the pd.read_sql_query() method from pandas:
qry = 'delete from "table_name"'
pd.read_sql_query(qry, conection, **kwargs)
But it was throwing the error 'ResourceClosedError: This result object does not return rows. It has been closed automatically.'
I can somewhat expect this because a DELETE returns no rows, but instead of an empty dataframe I only get the above error. Could you please help me resolve it?
I use MySQL, but the logic is the same:
Query 1: select all ids from your table.
Query 2: delete all those ids.
As a result you have:
Delete FROM table_name WHERE id IN (Select id FROM table_name)
This line does not return anything; it just deletes all the rows. I recommend running the delete with psycopg2 only, no pandas.
Then you need another query to get something from the database, like:
pd.read_sql_query("SELECT * FROM table_name", conection, **kwargs)
Probably (I do not use pandas to read from the database) in this case you'll get an empty dataframe with the column names.
You can probably combine all the actions the following way:
pd.read_sql_query('''Delete FROM table_name WHERE id IN (Select id FROM table_name); SELECT * FROM table_name''', conection, **kwargs)
Please try it and share your results.
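For reference, a minimal sketch of the two-step approach with a SQLAlchemy engine (the connection string and table name are placeholders):
import pandas as pd
from sqlalchemy import create_engine, text

engine = create_engine("postgresql://user:password@localhost/dbname")  # placeholder DSN
with engine.begin() as conn:
    conn.execute(text("DELETE FROM table_name"))  # remove the rows, keep the table
# The table is now empty, so this returns an empty dataframe with the column names.
df = pd.read_sql_query("SELECT * FROM table_name", engine)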
You can follow these steps:
Check first whether any rows exist in the table.
Then delete the rows.
Example code:
check_row_query = "select exists(select * from tbl_name limit 1)"
check_exist = pd.read_sql_query(check_row_query, con)
if check_exist.exists[0]:
    delete_query = 'DELETE FROM tbl_name WHERE condition(s)'
    con.execute(delete_query)  # delete rows using a sqlalchemy connection
    print('Deleted all rows!')
else:
    pass
sql="select %s,tablename from pg_table_def where tablename like (%s)"
data=("schemaname","abc",)
cur.execute(sql,data)
If I pass a value as described above, the select takes it as a string literal, which is not the intention.
If I try
data=(schemaname,"abc",)
then it shows the error: global name 'schemaname' is not defined.
You cannot parameterize an object name (in this case, a column name) that way. You could instead resort to string formatting:
column = "schemaname"
sql = "select {}, tablename from pg_table_def where tablename like (%s)".format(column)
data= ("abc",)
cur.execute(sql,data)
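If this is psycopg2, the psycopg2.sql module is a safer alternative to plain string formatting; a minimal sketch, reusing the cursor from above:
from psycopg2 import sql

query = sql.SQL("select {}, tablename from pg_table_def where tablename like (%s)").format(
    sql.Identifier("schemaname")  # the column name, quoted safely
)
cur.execute(query, ("abc",))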