I would like to use a string as the column names for a pandas DataFrame.
The problem is that pandas interprets the string variable as a single column instead of several, hence the error:
ValueError: 1 columns passed, passed data had 11 columns
The first part of my code gets the column names from the MySQL table I am about to query:
cursor1.execute("SELECT GROUP_CONCAT(COLUMN_NAME) AS cols FROM INFORMATION_SCHEMA.COLUMNS WHERE TABLE_SCHEMA = 'or_red' AND TABLE_NAME = 'nomen_prefix'")
for colsTableMysql in cursor1.fetchall():
    colsTable = colsTableMysql[0]
    colsTable = "'" + colsTable.replace(",", "','") + "'"
The second part uses the variable "colsTable" created above:
cursor = connection.cursor()
cursor.execute("SELECT * FROM or_red.nomen_prefix WHERE C_emp IN ("+emplazamientos+")")
tabla = pd.DataFrame(cursor.fetchall(),columns=[colsTable])
#tabla = exec("pd.DataFrame(cursor.fetchall(),columns=["+colsTable+"])")
#tabla = pd.DataFrame(cursor.fetchall())
I have tried other approaches, such as exec(). In that case there is no error, but no data comes back either: print(tabla) shows None.
Is there a direct way to pass the column names dynamically, as a string, to a pandas DataFrame?
Thanks in advance
I am going to answer my question since I've already found the way.
The first part of my code gets the column names from the MySQL table I am about to query:
cursor1.execute("SELECT GROUP_CONCAT(COLUMN_NAME) AS cols FROM INFORMATION_SCHEMA.COLUMNS WHERE TABLE_SCHEMA = 'or_red' AND TABLE_NAME = 'nomen_prefix'")
for colsTableMysql in cursor1.fetchall():
    colsTable = colsTableMysql[0]
    colsTable = "'" + colsTable.replace(",", "','") + "'"
The second part uses the created variable "colsTable" as input in the statement to define the columns.
cursor = connection.cursor()
cursor.execute("SELECT * FROM or_red.nomen_prefix WHERE C_emp IN ("+emplazamientos+")")
tabla = eval("pd.DataFrame(cursor.fetchall(),columns=["+colsTable+"])")
With eval, the string is parsed and evaluated as a Python expression.
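As a hedged side note, eval is not strictly required here, because the columns argument accepts a plain list of names. The sketch below shows two ways to build that list: splitting the comma-joined string, and reading the names from cursor.description (part of the Python DB-API). It uses an in-memory sqlite3 database as a stand-in for MySQL, and the prefix column and the sample row are made up for illustration.

```python
import sqlite3

# Stand-in for the MySQL connection; table contents are illustrative only.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE nomen_prefix (C_emp TEXT, prefix TEXT)")
conn.execute("INSERT INTO nomen_prefix VALUES ('E1', 'AB')")

# Option 1: turn the comma-joined string (as returned by GROUP_CONCAT)
# into a list of names instead of a quoted string.
cols_string = "C_emp,prefix"
cols_from_string = cols_string.split(",")

# Option 2: skip the extra metadata query and read the names from the
# cursor that ran the SELECT; each description entry starts with the name.
cursor = conn.execute("SELECT * FROM nomen_prefix")
cols_from_cursor = [entry[0] for entry in cursor.description]
rows = cursor.fetchall()

print(cols_from_string, cols_from_cursor, rows)
```

Either list can then be passed directly, for example pd.DataFrame(rows, columns=cols_from_cursor).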
Related
I have a pandas Series made from the following python dictionary, so:
gr8 = {'ERF13' : 'AT2G44840', 'BBX32' : 'AT3G21150', 'NAC061' : 'AT3G44350', 'NAC090' : 'AT5G22380', 'ERF019' : 'AT1G22810'}
gr8obj = pd.Series(gr8)
(where I have previously imported pandas as pd)
I have an SQLite database, AtRegnet.db.
I want to iterate over the pandas Series, gr8obj, and query the database, AtRegnet.db, for each member of the series.
This is what I have tried:
for i in gr8obj:
    resdf = pd.read_sql('SELECT * FROM AtRegNet WHERE TargetLocus = ?' (i), con=sqlite3.connect("/home/anno/SQLiteDBs/AtRegnet.db"))
    fresdf = resdf.append(resdf)
fresdf
(The table I want in AtRegnet.db is AtRegNet, and the column I am searching on is called TargetLocus.)
I know that when I query the SQLite3 database directly with the SQL command
select * from AtRegNet where TargetLocus="AT3G23230"
I get back 80 rows. (AT3G23230 is one of the members of gr8obj.)
You can try using an f-string. The value for TargetLocus in your query should also be in quotes, and read_sql needs the connection as well:
resdf = pd.read_sql(f"SELECT * FROM AtRegNet WHERE TargetLocus = '{i}'", con=sqlite3.connect("/home/anno/SQLiteDBs/AtRegnet.db"))
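An alternative worth considering is to bind the value with a ? placeholder instead of formatting it into the string, which sidesteps the quoting problem altogether. A minimal sketch, assuming an in-memory sqlite3 database in place of AtRegnet.db; the TFLocus column and the sample rows are invented for illustration:

```python
import sqlite3

gr8 = {"ERF13": "AT2G44840", "BBX32": "AT3G21150"}

# Stand-in database; the real one would be opened from AtRegnet.db.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE AtRegNet (TFLocus TEXT, TargetLocus TEXT)")
conn.executemany(
    "INSERT INTO AtRegNet VALUES (?, ?)",
    [
        ("AT1G00001", "AT2G44840"),
        ("AT1G00002", "AT2G44840"),
        ("AT1G00003", "AT3G21150"),
        ("AT1G00004", "AT9G99999"),  # not in gr8, must not be returned
    ],
)

# Open the connection once, then bind each locus as a parameter.
matches = []
for locus in gr8.values():
    cursor = conn.execute(
        "SELECT * FROM AtRegNet WHERE TargetLocus = ?", (locus,)
    )
    matches.extend(cursor.fetchall())

print(len(matches))
```

With pandas, the same placeholder can be used through the params argument, e.g. pd.read_sql(query, con, params=(locus,)), collecting the per-locus frames in a list and combining them with pd.concat.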
I am new to Python.
I have two SQL Views.
DBOP4 and DBOP4_SELECTION
DBOP4 contains many columns and many rows.
One column of DBOP4 is SaBeNummerDebitoren.
DBOP4_SELECTION:
SELECT SaBeNummerDebitoren AS SBNr, [Sachbearbeiter Debitoren] AS SBName
FROM dbo.DBOP4
GROUP BY SaBeNummerDebitoren, [Sachbearbeiter Debitoren]
I tried to write a Python script that exports the results of DBOP4 separately for each existing value of SaBeNummerDebitoren.
import pandas as pd
import pyodbc
conn = pyodbc.connect('Driver={SQL Server};'
                      'Server=***;'
                      'Database=***;'
                      'Trusted_Connection=yes;')
cursor = conn.cursor()
SQL_SBNR_Selection = pd.read_sql_query('SELECT SBNR FROM DBOP4_SBSELECTION' ,conn)
print(SQL_SBNR_Selection)
#print(type(SQL_SBNR_Selection))
#Sachbearbeiternummer = ('1258','1278','1290')
Sachbearbeiternummer = pd.DataFrame(SQL_SBNR_Selection)
for sachbearbeiternr in Sachbearbeiternummer:
    print("Starte " + str(sachbearbeiternr))
    sql_query = pd.read_sql_query('SELECT * FROM DBOP4 Where [SaBeNummerDebitoren] =' +str(sachbearbeiternr) ,conn)
    print(sql_query)
    print(type(sql_query))
    df = pd.DataFrame(sql_query)
    df.to_excel(r'C:\OP\export_dataframe '+str(sachbearbeiternr)+'.xlsx', sheet_name='DBOP4_' +str(sachbearbeiternr) , index = False, header=True, freeze_panes=(1,5))
print("Fertig")
The output is as follows:
SBNR
0 1258.0
1 1278.0
2 1290.0
Starte SBNR
Debugging Message:
Exception has occurred: DatabaseError
Execution failed on sql 'SELECT * FROM DBOP4 Where [SaBeNummerDebitoren] =SBNR': ('42S22', "[42S22] [Microsoft][ODBC SQL Server Driver][SQL Server]Ungültiger Spaltenname 'SBNR'. (207) (SQLExecDirectW)")
File "C:\AzureDevopsRepos\Python Skripte\PythonApplication1\PythonApplication1.py", line 20, in <module>
sql_query = pd.read_sql_query('SELECT * FROM DBOP4 Where [SaBeNummerDebitoren] =' +str(sachbearbeiternr) ,conn)
Problems:
The for loop does not repeat the Excel export for every number in my list ('1258','1278','1290').
When I filled Sachbearbeiternummer like this:
Sachbearbeiternummer = ('1258','1278','1290')
the script worked.
Problem 1:
The loop starts with the name of the column SBNR instead of the first value.
Problem 2:
The loop does not continue after trying to use SBNR.
If I just do the print("Starte " + str(sachbearbeiternr)) in the for loop, it also stops after SBNR.
It would be great if someone could help me fix my problem.
Currently, your for loop (for sachbearbeiternr in Sachbearbeiternummer) iterates over the columns of the data frame, so the loop variable is a column name, which you then concatenate into the query without quotes. That is why the error reports the first column name, SBNR, as an invalid column name. Because the data frame has only one column, the loop also ends after that single iteration.
An immediate fix is to loop over a specific column (a Series) of the data frame and to parameterize the query with the params argument of read_sql_query. By the way, there is no need to call DataFrame after read_sql_query: as the docs indicate, the method already returns a DataFrame. You also do not need a cursor for pandas SQL operations.
# ITERATE ACROSS COLUMN OR SERIES
for sachbearbeiternr in Sachbearbeiternummer['SBNR']:
    print("Starte " + str(sachbearbeiternr))
    ...
    # BIND ITERATOR VALUE AS PARAMETER
    sql_query = pd.read_sql_query('SELECT * FROM DBOP4 WHERE [SaBeNummerDebitoren] = ?',
                                  conn, params=[sachbearbeiternr])
With that said, there is no need for a second query or data frame. Simply import the entire view and then use pandas' groupby() to split the data frame by the distinct values of SaBeNummerDebitoren. From there, iterate over and process each subset.
df_DBOP4 = pd.read_sql_query('SELECT * FROM DBOP4', conn)
# SPLIT DATA FRAME BY COLUMN: i IS SPLIT VALUE, g IS SUBSET DF
for i, g in df_DBOP4.groupby('SaBeNummerDebitoren'):
    print("Starte " + str(i))
    print(g.head(10))  # FIRST 10 ROWS
    # EXPORT THE SUBSET g, NOT THE FULL DATA FRAME
    g.to_excel(r'C:\OP\export_dataframe {0}.xlsx'.format(i),
               sheet_name='DBOP4_' + str(i), index=False,
               header=True, freeze_panes=(1, 5))
I'm trying to make a query to a SQLite database from a python script. However, whenever I use parameterization it just returns the first parameter, which is column2. The desired result is for it to return the value held in column2 on the row where column1 is equal to row1.
conn = sqlite3.connect('path/to/database')
c = conn.cursor()
c.execute('SELECT ? from table WHERE column1 = ? ;', ("column2","row1"))
result = c.fetchone()[0]
print(result)
It prints
>>column2
Whenever I run this using concatenated strings, it works fine.
conn = sqlite3.connect('path/to/database')
c = conn.cursor()
c.execute('SELECT ' + column2 + ' from table WHERE column1 = ' + row1 + ';')
result = c.fetchone()[0]
print(result)
And it prints:
>>desired data
Any idea why this is happening?
This behaves as designed.
The mechanism that parameterized queries provide is meant to pass literal values to the query, not meta information such as column names. In your first example, the bound value "column2" is therefore treated as a string literal, so the query selects that constant string for the matching row.
One thing to keep in mind is that the database must be able to parse the parameterized query string without having the parameter values at hand, so a column name cannot be supplied as a parameter.
For your use case, the only possible solution is to concatenate the column name into the query string, as shown in your second example. If the parameter comes from outside your code, be sure to properly validate it before that (for example, by checking it against a fixed list of values).
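A minimal sketch of that validation step, assuming a hypothetical table named items and a hypothetical whitelist ALLOWED_COLUMNS. The column name is checked against the fixed list before it is placed in the query string, while the row value is still bound as a parameter:

```python
import sqlite3

# Hypothetical whitelist of column names that callers may select.
ALLOWED_COLUMNS = {"column1", "column2"}


def fetch_value(conn, column, key):
    """Return `column` from the row of `items` whose column1 equals `key`."""
    if column not in ALLOWED_COLUMNS:
        raise ValueError(f"unexpected column name: {column!r}")
    # The validated name is interpolated; the value is bound with ?.
    query = f'SELECT "{column}" FROM items WHERE column1 = ?'
    row = conn.execute(query, (key,)).fetchone()
    return None if row is None else row[0]


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE items (column1 TEXT, column2 TEXT)")
conn.execute("INSERT INTO items VALUES ('row1', 'desired data')")

# Binding the column name only selects the bound string itself.
literal = conn.execute(
    "SELECT ? FROM items WHERE column1 = ?", ("column2", "row1")
).fetchone()[0]

value = fetch_value(conn, "column2", "row1")
print(literal, "|", value)
```

The first query reproduces the behaviour from the question (it returns the string that was bound), while fetch_value returns the stored data and rejects any name outside the whitelist.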
I've been trying to use this piece of code:
# df is the dataframe
if len(df) > 0:
    df_columns = list(df)
    # create (col1,col2,...)
    columns = ",".join(df_columns)
    # create VALUES('%s', '%s",...) one '%s' per column
    values = "VALUES({})".format(",".join(["%s" for _ in df_columns]))
    #create INSERT INTO table (columns) VALUES('%s',...)
    insert_stmt = "INSERT INTO {} ({}) {}".format(table,columns,values)
    cur = conn.cursor()
    cur = db_conn.cursor()
    psycopg2.extras.execute_batch(cur, insert_stmt, df.values)
    conn.commit()
    cur.close()
so that I can connect to a Postgres DB and insert the values from a df.
I get these two errors for this code:
LINE 1: INSERT INTO mrr.shipments (mainFreight_freight_motherVesselD...
psycopg2.errors.UndefinedColumn: column "mainfreight_freight_mothervesseldepartdatetime" of relation "shipments" does not exist
For some reason, the column names are not recognized properly.
What can I do to fix it?
You should not do your own string interpolation; let psycopg2 handle it. From the docs:
Warning Never, never, NEVER use Python string concatenation (+) or string parameters interpolation (%) to pass variables to a SQL query string. Not even at gunpoint.
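Since you also have dynamic column names, you should use the psycopg2.sql module (sql.SQL together with sql.Identifier) to build the statement, and then pass the values through psycopg2's standard query-parameter mechanism instead of using format. sql.Identifier also quotes the column names, which matters if the columns were created with mixed-case names: PostgreSQL folds unquoted identifiers to lowercase, which is why the error mentions "mainfreight_freight_mothervesseldepartdatetime".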
I am trying to put together a SQL query in Python with pandas. I have tried different methods, but I always get the following error:
Incorrect number of bindings supplied. The current statement uses 6, and there are 3 supplied.
My code is as follows. What am I doing wrong?
conn = sqlite3.connect(my_db)
df = pd.read_sql_query(
    con = conn,
    sql = """SELECT * FROM {table}
             WHERE @var1 = (?)
             AND @var2 = (?)
             AND @var3 = (?)
             ;""".format(table=table),
    params= (value1, value2, value3),
)
As you've said, @var1, @var2 and @var3 are all column names.
However, SQLite interprets the @ symbol as the start of a named parameter, whose value you are expected to supply later in your code.
So your SQL statement expects 6 values because of the three (?)s and the three @vars, but you are only supplying 3 values (for the (?)s), which is why the error occurs.
I would recommend naming your columns without '@' so that there is less chance of errors.
See this question for further clarification.
sqlite interprets the @ symbol, like ?, as a parameter placeholder (Search for "Parameters"). If @var1 is the name of a column, then it must be escaped by surrounding it with backticks:
df = pd.read_sql_query(
    con = conn,
    sql = """SELECT * FROM {table}
             WHERE `@var1` = (?)
             AND `@var2` = (?)
             AND `@var3` = (?)""".format(table=table),
    params= (value1, value2, value3), )
I agree with @AdiC, though: it would be more convenient to rename your columns so that they do not use characters with special meaning.
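The effect can be reproduced with a small sketch, assuming the column names begin with @ (one of the prefixes SQLite reserves for named parameters). The table name readings and its contents are hypothetical. The unquoted query fails with a binding-count error, while the backtick-quoted one matches the row:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical table whose column names start with "@".
conn.execute("CREATE TABLE readings (`@var1` TEXT, `@var2` TEXT)")
conn.execute("INSERT INTO readings VALUES ('a', 'b')")

# Unquoted: @var1 and @var2 are parsed as named parameters, so the
# statement expects more values than the two that are supplied.
try:
    conn.execute(
        "SELECT * FROM readings WHERE @var1 = ? AND @var2 = ?", ("a", "b")
    )
    unquoted_error = None
except sqlite3.ProgrammingError as exc:
    unquoted_error = str(exc)

# Quoted: the names are treated as column identifiers again.
rows = conn.execute(
    "SELECT * FROM readings WHERE `@var1` = ? AND `@var2` = ?", ("a", "b")
).fetchall()

print(unquoted_error)
print(rows)
```

Standard double quotes around the identifier should work in SQLite as well; backticks are used here only to match the answer above.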