I am using the Python library ibm_db, with which I can establish a connection and read tables into DataFrames.
The problem comes when writing into a Db2 table (an INSERT query) from a DataFrame in Python.
Below is sample code for the connection. Can someone help me insert all records from a DataFrame into the target table in Db2?
import pandas as pd
import ibm_db
ibm_db_conn = ibm_db.connect("DATABASE="+"database_name"+";HOSTNAME="+"localhost"+";PORT="+"50000"+";PROTOCOL=TCPIP;UID="+"db2user"+";PWD="+"password#123"+";", "","")
import ibm_db_dbi
conn = ibm_db_dbi.Connection(ibm_db_conn)
df=pd.read_sql("SELECT * FROM SCHEMA1.TEST_TABLE",conn)
print(df)
I can also insert a single record manually when the SQL contains hard-coded values:
query = "INSERT INTO SCHEMA1.TEST_TABLE (Col1, Col2, Col3) VALUES('A', 'B', 0)"
print(query)
stmt = ibm_db.exec_immediate(ibm_db_conn, query)
print(stmt)
What I cannot do is insert the rows of a DataFrame and append them to the table.
I've also tried DataFrame.to_sql(), but it fails:
df.to_sql(name='TEST_TABLE', con=conn, flavor=None, schema='SCHEMA1', if_exists='append', index=True, index_label=None, chunksize=None, dtype=None)
The error is:
pandas.io.sql.DatabaseError: Execution failed on sql 'SELECT name FROM sqlite_master WHERE type='table' AND name=?;': ibm_db_dbi::ProgrammingError: SQLNumResultCols failed: [IBM][CLI Driver][DB2/LINUXX8664] SQL0204N "SCHEMA1.SQLITE_MASTER" is an undefined name. SQLSTATE=42704 SQLCODE=-204
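The traceback points to the cause. When to_sql receives something that is neither an SQLAlchemy connectable nor a sqlite3 connection, pandas falls back to its SQLite code path, which is why it queries sqlite_master.

A minimal sketch of the usual workaround follows. It assumes the separately installed ibm_db_sa package, which provides the db2+ibm_db SQLAlchemy dialect: build an engine URL and pass the engine to to_sql. The helper name db2_url is hypothetical. It percent-encodes the password so that characters such as '#' do not break URL parsing.

```python
from urllib.parse import quote_plus


def db2_url(user, password, host, port, database):
    """Build a SQLAlchemy URL for the db2+ibm_db dialect (ibm_db_sa assumed).

    User and password are percent-encoded so characters such as '#' or '@'
    cannot be mistaken for URL delimiters.
    """
    return (
        f"db2+ibm_db://{quote_plus(user)}:{quote_plus(password)}"
        f"@{host}:{port}/{database}"
    )


# Usage sketch (needs `pip install ibm_db_sa sqlalchemy`; not executed here):
# from sqlalchemy import create_engine
# engine = create_engine(db2_url("db2user", "password#123", "localhost", 50000, "database_name"))
# df.to_sql("TEST_TABLE", engine, schema="SCHEMA1", if_exists="append", index=False)
```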
You can write a pandas DataFrame into IBM Db2 using ibm_db.execute_many():
import ibm_db
# Pick the columns in the same order as in the target table
subset = df[['col1', 'col2', 'col3']]
# execute_many expects a tuple of tuples, one inner tuple per row
tuple_of_tuples = tuple(tuple(x) for x in subset.values)
sql = "INSERT INTO Schema.Table VALUES(?,?,?)"
cnn = ibm_db.connect("DATABASE=database;HOSTNAME=127.0.0.1;PORT=50000;PROTOCOL=TCPIP;UID=username;PWD=password;", "", "")
stmt = ibm_db.prepare(cnn, sql)
ibm_db.execute_many(stmt, tuple_of_tuples)
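Below is a slightly more general sketch of the same execute_many approach. It assumes the DataFrame's column names match the table's. It builds the INSERT from df.columns and converts missing values (NaN, None) to None, so the driver can write SQL NULL. The helper names build_insert and dataframe_to_rows are hypothetical and not part of ibm_db.

```python
import pandas as pd


def build_insert(table, columns, placeholder="?"):
    """Return an INSERT statement with one placeholder per column.

    "?" suits ibm_db, pyodbc and sqlite3; PyMySQL and mysql.connector use "%s".
    """
    cols = ", ".join(columns)
    marks = ", ".join([placeholder] * len(columns))
    return f"INSERT INTO {table} ({cols}) VALUES ({marks})"


def dataframe_to_rows(df):
    """Convert DataFrame rows to a tuple of plain tuples, mapping NaN/None to None."""
    return tuple(
        tuple(None if pd.isna(v) else v for v in row)
        for row in df.itertuples(index=False, name=None)
    )
```

With ibm_db, the statement goes to ibm_db.prepare and the rows go to ibm_db.execute_many, just as in the answer above: ibm_db.execute_many(ibm_db.prepare(cnn, build_insert("SCHEMA1.TEST_TABLE", df.columns)), dataframe_to_rows(df)).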
Related
How can I easily write my pandas dataframe to a MySQL database using mysql.connector?
import mysql.connector as sql
import pandas as pd
db_connection = sql.connect(host='124685.eu-central-1.rds.amazonaws.com',
database="db_name", user='user', password='pw')
query = 'SELECT * FROM table_name'
df = pd.read_sql(sql=query, con=db_connection)
df["Person_Name"] = "xx"
df.to_sql(con=db_connection, name='table_name', if_exists='replace')
I tried this, but it gives me this error:
pandas.io.sql.DatabaseError: Execution failed on sql 'SELECT name FROM sqlite_master WHERE type='table' AND name=?;': Not all parameters were used in the SQL statement
Does mysql.connector not support df.to_sql?
These are the column names:
Col names Index(['Person_ID', 'AirTable_ID_Person', 'Person_Name', 'Gender', 'Ethnicity',
'LinkedIn_Link_to_the_Profile_of_Person', 'Jensen_Analyst',
'Data_Source', 'Created_Time', 'Last_Modified_Time', 'Last refresh',
'createdTime', 'Gender_ID', 'Ethnicity_ID', 'Jensen_Analyst_ID',
'Data_Source_ID', 'Position_ID', 'Egnyte_File', 'Comment', 'Move',
'Right_Move', 'Bio-Import-Assistant', 'Diversity'],
dtype='object')
pandas needs an SQLAlchemy engine (or a sqlite3 connection) to write data with to_sql. You can take one of two approaches: write the rows yourself with the connector's cursor, or create an engine and use pandas.to_sql.
The first approach works much like your pandas read call.
import pandas as pd
import mysql.connector as sql
db_connection = sql.connect(host='124685.eu-central-1.rds.amazonaws.com',
database="db_name", user='user', password='pw')
query = 'SELECT * FROM table_name'
df = pd.read_sql(sql=query, con=db_connection)
df["Person_Name"] = "xx"
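df_temp = df[['Person_Name', 'Person_ID']]
# INSERT cannot take a WHERE clause; changing existing rows needs UPDATE
query_update = 'UPDATE table_name SET Person_Name = %s WHERE Person_ID = %s'
pars = df_temp.values.tolist()
pars = list(map(tuple, pars))
cursor = db_connection.cursor()
cursor.executemany(query_update, pars)
# In mysql.connector, commit() belongs to the connection, not the cursor
db_connection.commit()
cursor.close()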
Or you can establish an engine for uploading.
import pandas as pd
from sqlalchemy import create_engine
import mysql.connector as sql
# engine = create_engine('mysql+pymysql://username:password@host/database')
# or in your case:
engine = create_engine('mysql+pymysql://user:pw@124685.eu-central-1.rds.amazonaws.com/db_name')
db_connection = sql.connect(host='124685.eu-central-1.rds.amazonaws.com',
database="db_name", user='user', password='pw')
query = 'SELECT * FROM table_name'
df = pd.read_sql(sql=query, con=db_connection)
df["Person_Name"] = "xx"
df.to_sql(con=engine, name='table_name', if_exists='replace')
For this method, install PyMySQL first (pip install pymysql) and you should be good to go.
I'm getting an unexpected error when using sqlite3 in Python with pandas. I'm using a SQLite database for an analysis, so it's single-user and single-computer. I'm on Python 3.9.1, with SQLite 3.33.0 and pandas 1.2.1.
The short description is that I'm trying to loop over the rows of Table1 and, for each row, insert data into Table2 based on an API request that uses an ID stored in Table1. The API returns many more columns than I need for Table2, so I insert the result into a temporary table and then copy the columns I need into Table2:
my_dataframe.to_sql("tmp", conn, if_exists="replace", index=False)
cur.execute("INSERT INTO Table2 (col1, col2) SELECT col1, col2 FROM tmp")
The problem is, on the second iteration of the loop, I get an error when pandas tries to drop the tmp table. Here is the full code:
def get_data(api_id, conn):
    my_dataframe = call_to_api(api_id)
    my_dataframe.to_sql("tmp", conn, if_exists="replace", index=False)
    cur.execute("INSERT INTO Table2 (col1, col2) SELECT col1, col2 FROM tmp")
for chunk in pd.read_sql_query("SELECT id_for_api FROM Table1", conn, chunksize=10):
    ids = chunk["id_for_api"].values
    for api_id in ids:
        get_data(api_id, conn)
The error I get is:
DatabaseError: Execution failed on sql 'DROP TABLE "tmp"': database table is locked
which is raised by this line:
pd.DataFrame(data).to_sql("tmp", conn, if_exists="replace", index=False)
I've tried everything I could think of to fix this:
changing the connection to be isolation_level=None (autocommit)
adding conn.commit() after the INSERT statement
creating a new cursor within the get_data function (cur = conn.cursor())
creating a new connection for use in the outer loop with read_sql_query (conn2 = sqlite3.connect('mydb.db'))
What am I missing? Is there something about sqlite isolation levels or locking that I don't understand?
When you make your connection, set autocommit=True:
import contextlib
import pandas as pd
import pyodbc
@contextlib.contextmanager
def database_connect():
    db_conn = pyodbc.connect(
        # (connection string omitted in the original)
        autocommit=True,  # needed to prevent locks in DB with SPs
    )
    try:
        yield db_conn
    finally:
        db_conn.close()
...
with database_connect() as db_conn:
    df = pd.read_sql_query(
        f"EXEC {sp_table}.{sp_name} " + ",".join(f"@{a}=?" for a in kwargs.keys()),
        db_conn,
        params=list(kwargs.values()),
    )
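For the sqlite3 case specifically, the more likely cause is different. The chunked pd.read_sql_query(..., chunksize=10) keeps its SELECT on Table1 open while to_sql issues DROP TABLE "tmp" on the same connection. SQLite reports "database table is locked" when a table is dropped while another statement on that connection is still reading.

A minimal sketch of the workaround is to read all IDs first and only then loop. To stay self-contained, it uses plain sqlite3 calls in place of to_sql. The fetch_rows callable is a hypothetical stand-in for call_to_api, and the three-column tmp layout is an assumption.

```python
import sqlite3


def copy_api_rows(conn, fetch_rows):
    """Fill Table2 from per-ID API results staged in a tmp table.

    fetch_rows is a hypothetical stand-in for call_to_api: given an ID it
    returns a list of (col1, col2, extra) tuples.
    """
    # fetchall() runs the SELECT to completion, so no statement is still
    # reading Table1 when the loop drops and recreates tmp.
    ids = [row[0] for row in conn.execute("SELECT id_for_api FROM Table1").fetchall()]
    for api_id in ids:
        rows = fetch_rows(api_id)
        conn.execute("DROP TABLE IF EXISTS tmp")
        conn.execute("CREATE TABLE tmp (col1, col2, extra)")  # assumed layout
        conn.executemany("INSERT INTO tmp VALUES (?, ?, ?)", rows)
        conn.execute("INSERT INTO Table2 (col1, col2) SELECT col1, col2 FROM tmp")
    conn.commit()
```

In pandas terms, the equivalent change is to drop chunksize, e.g. ids = pd.read_sql_query("SELECT id_for_api FROM Table1", conn)["id_for_api"].tolist(), and to loop over that list.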
I have a DataFrame with 30,000+ rows and 150+ columns. I currently use the following code to insert the data into MySQL, but because it inserts one row at a time, inserting all the rows takes too long.
Is there any way to insert the rows all at once or in batches? The constraint is that I can only use PyMySQL; I cannot install any other library.
import pymysql
import pandas as pd
# Create dataframe
data = pd.DataFrame({
'book_id':[12345, 12346, 12347],
'title':['Python Programming', 'Learn MySQL', 'Data Science Cookbook'],
'price':[29, 23, 27]
})
# Connect to the database
connection = pymysql.connect(host='localhost',
user='root',
password='12345',
db='book')
# create cursor
cursor=connection.cursor()
# creating column list for insertion
cols = "`,`".join([str(i) for i in data.columns.tolist()])
# Insert DataFrame records one by one.
for i, row in data.iterrows():
    sql = "INSERT INTO `book_details` (`" + cols + "`) VALUES (" + "%s," * (len(row) - 1) + "%s)"
    cursor.execute(sql, tuple(row))
    # the connection is not autocommitted by default, so we must commit to save our changes
    connection.commit()
# Execute query
sql = "SELECT * FROM `book_details`"
cursor.execute(sql)
# Fetch all the records
result = cursor.fetchall()
for i in result:
    print(i)
connection.close()
Thank You.
Try using SQLAlchemy to create an engine that you can then pass to pandas' df.to_sql function. to_sql inserts the DataFrame rows in bulk, which is usually much faster than iterating over the DataFrame and calling cursor.execute() for each row. Note that this needs SQLAlchemy installed, which the question rules out.
Your code would look something like this:
import pymysql
import pandas as pd
from sqlalchemy import create_engine
# Create dataframe
data = pd.DataFrame({
'book_id':[12345, 12346, 12347],
'title':['Python Programming', 'Learn MySQL', 'Data Science Cookbook'],
'price':[29, 23, 27]
})
db_data = 'mysql+pymysql://' + 'root' + ':' + '12345' + '@' + 'localhost' + ':3306/' \
    + 'book' + '?charset=utf8mb4'
engine = create_engine(db_data)
# Connect to the database
connection = pymysql.connect(host='localhost',
user='root',
password='12345',
db='book')
# create cursor
cursor=connection.cursor()
# Execute to_sql to write the DataFrame into SQL
data.to_sql('book_details', engine, if_exists='append', index=False)
# Execute query
sql = "SELECT * FROM `book_details`"
cursor.execute(sql)
# Fetch all the records
result = cursor.fetchall()
for i in result:
    print(i)
engine.dispose()
connection.close()
See the pandas documentation for DataFrame.to_sql for all the options this function has.
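Two of the to_sql options control batching. chunksize sets how many rows go into each batch, and method="multi" packs each batch into one multi-row INSERT, which can help with some databases and is worth benchmarking.

A small sketch of chunksize follows. An in-memory sqlite3 connection stands in for the MySQL engine so it runs without a server. The 1,000-row frame and the chunk size of 200 are arbitrary.

```python
import sqlite3

import pandas as pd

# In-memory SQLite stands in for the MySQL engine so this runs anywhere;
# against MySQL you would pass `engine` instead of `conn`.
conn = sqlite3.connect(":memory:")

data = pd.DataFrame({
    "book_id": range(1000),
    "title": [f"Book {i}" for i in range(1000)],
    "price": [20 + i % 10 for i in range(1000)],
})

# chunksize=200: pandas writes the 1,000 rows in five batches of 200
data.to_sql("book_details", conn, if_exists="append", index=False, chunksize=200)

count = conn.execute("SELECT COUNT(*) FROM book_details").fetchone()[0]
```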
It is usually faster to write the data to a file and let the MySQL server load it in one go.
So first write the data to a CSV file:
import csv
data.to_csv("import-data.csv", header=False, index=False, quoting=csv.QUOTE_NONNUMERIC, na_rep="\\N")
Then load it into the table in a single statement. PyMySQL only allows LOAD DATA LOCAL INFILE when the connection is opened with local_infile=True, and the server must permit it too:
sql = ("LOAD DATA LOCAL INFILE 'import-data.csv' "
       "INTO TABLE book_details FIELDS TERMINATED BY ',' ENCLOSED BY '\"' "
       "(`" + cols + "`)")
cursor.execute(sql)
connection.commit()
Possible improvements:
- remove or disable indexes on the table(s)
- take the commit out of the loop
Then try loading the data again.
Generate a CSV file and load it with LOAD DATA INFILE, issued from within MySQL.
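If SQLAlchemy really cannot be installed, PyMySQL alone can still batch. Its cursor.executemany() rewrites a plain INSERT ... VALUES statement into multi-row INSERTs. The sketch below relies on that behaviour and has some illustrative choices:

- insert_in_batches is a hypothetical helper.
- The batch size of 1000 is arbitrary.
- The placeholder parameter exists only so the same function can be tried against sqlite3.

```python
def insert_in_batches(connection, table, columns, rows, batch_size=1000,
                      placeholder="%s"):
    """Insert a list of row tuples in batches and commit once at the end.

    PyMySQL's executemany() turns a plain INSERT ... VALUES statement into
    multi-row INSERTs, so each batch needs far fewer round trips.
    placeholder="%s" suits PyMySQL; "?" lets the same code run on sqlite3.
    """
    col_list = ", ".join(f"`{c}`" for c in columns)
    marks = ", ".join([placeholder] * len(columns))
    sql = f"INSERT INTO `{table}` ({col_list}) VALUES ({marks})"
    cursor = connection.cursor()
    try:
        for start in range(0, len(rows), batch_size):
            cursor.executemany(sql, rows[start:start + batch_size])
        connection.commit()  # a single commit instead of one per row
    finally:
        cursor.close()
```

With the question's variables, the call would be insert_in_batches(connection, "book_details", data.columns, list(data.itertuples(index=False, name=None))). With 150+ columns, a smaller batch size keeps each generated statement below the server's max_allowed_packet.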
I want to work in Python with a table that I have in SQL Server. I want to store the entire table in a matrix called 'mat', and then send the output of the Python code back so I can read the table with SQL again. This is how I started:
import pyodbc
import pandas as pd
server = 'myserver'
database = 'mydatabase'
username = 'myuser'
password = 'mypassword'
cnxn = pyodbc.connect('DRIVER={ODBC Driver 13 for SQL Server};SERVER='+server+';DATABASE='+database+';UID='+username+';PWD='+ password)
******Python code*******
mat=pd.read_sql('select * from mytable order by time' , con = cnxn)
How should I read the table to store it in mat and then how do I send it back to SQL?
You have already read the data into a DataFrame. To convert a DataFrame to a matrix, use mat.values. To write the data to an SQL table, create a cursor and use it to insert the data.
cursor = cnxn.cursor()
cursor.execute(''' INSERT INTO myTable (FirstName, LastName) VALUES ('Wilsamson', 'Shiphrah') ''')
If you have multiple rows, use executemany:
values = list(zip(mat['FirstName'].values.tolist(), mat['LastName'].values.tolist()))
cursor.executemany('''INSERT INTO myTable (FirstName, LastName) VALUES (?, ?)''', values)
After the INSERT statements, commit before closing the cursor and connection:
cursor.commit()
cursor.close()
cnxn.close()
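With pyodbc, executemany by default sends the parameter sets one at a time. Setting cursor.fast_executemany = True (pyodbc 4.0.19 and later) sends them in bulk and is usually much faster against SQL Server.

A minimal sketch follows. The helper name executemany_fast is hypothetical, and the hasattr check is there only so the same function also runs on drivers without that attribute, such as sqlite3.

```python
def executemany_fast(cnxn, sql, rows):
    """Run one parameterised statement for many rows, then commit.

    On a pyodbc cursor, fast_executemany=True sends the parameters in bulk;
    cursors without that attribute simply use a normal executemany.
    """
    cursor = cnxn.cursor()
    if hasattr(cursor, "fast_executemany"):
        cursor.fast_executemany = True
    try:
        cursor.executemany(sql, rows)
        cnxn.commit()
    finally:
        cursor.close()
```

For the example above, the call would be executemany_fast(cnxn, "INSERT INTO myTable (FirstName, LastName) VALUES (?, ?)", values).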
This is how I do it.
import mysql.connector
import pandas as pd
import numpy as np
# use this to display ALL columns...useful, but definitely not required
pd.set_option('display.max_columns', None)
mydb = mysql.connector.connect(
host="localhost",
user="user_name",
passwd="pswd",
database="db_name"
)
mycursor = mydb.cursor()
mycursor.execute("SELECT * FROM YourTable")
myresult = mycursor.fetchall()
df = pd.DataFrame(myresult, columns=mycursor.column_names)
df.to_csv('C:\\path_here\\test.csv', sep=',')
You can easily convert a DataFrame to a NumPy array:
df.to_numpy()
But I'm not sure why you want to do that. I think DataFrames are a lot more practical for most people's needs.
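Below is a sketch of the round trip the question asks about, with a made-up two-column frame standing in for mytable:

- to_numpy() gives a plain 2-D NumPy array. NumPy's documentation recommends regular arrays over np.matrix.
- pd.DataFrame(..., columns=df.columns) turns the result back into a frame that can be written with executemany or to_sql.
- If every column is numeric, the array is numeric; mixed column types give an object array.

```python
import pandas as pd

# Made-up stand-in for the result of pd.read_sql(...)
df = pd.DataFrame({"time": [1, 2, 3], "value": [0.5, 1.5, 2.5]})

# DataFrame -> 2-D NumPy array (what the question calls 'mat')
mat = df.to_numpy()

# ... numeric work on the array ...
mat = mat * 2

# Array -> DataFrame again, reusing the original column names
df_out = pd.DataFrame(mat, columns=df.columns)
```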
I'm trying to drop an existing table, run a query, and then recreate the table using the pandas to_sql function. The query works in pgAdmin, but not here. Any idea whether this is a pandas bug or whether my code is wrong?
The specific error is: ValueError: Table 'a' already exists.
import pandas.io.sql as psql
from sqlalchemy import create_engine
engine = create_engine(r'postgresql://user@localhost:port/dbname')
c = engine.connect()
conn = c.connection
sql = """
drop table a;
select * from some_table limit 1;
"""
df = psql.read_sql(sql, con=conn)
print(df.head())
df.to_sql('a', engine)
conn.close()
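Why are you doing it like that? There is a shorter way: the if_exists kwarg of to_sql. (Most likely, the DROP in your multi-statement query runs in an uncommitted transaction on the raw connection, so to_sql, which takes a separate connection from the engine, still sees table a.) Try this: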
import pandas.io.sql as psql
from sqlalchemy import create_engine
engine = create_engine(r'postgresql://user@localhost:port/dbname')
c = engine.connect()
conn = c.connection
sql = """
select * from some_table limit 1;
"""
df = psql.read_sql(sql, con=conn)
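print(df.head())
# Notice how the line below is different: it passes the engine (not the raw DBAPI connection)
# and the schema argument; schema_name is a placeholder for your schema, e.g. 'public'
df.to_sql('a', con=engine, schema=schema_name, if_exists='replace')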
conn.close()
According to the docs:
replace: If table exists, drop it, recreate it, and insert data.
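The if_exists values can be seen side by side in a small sketch. An in-memory sqlite3 connection stands in for PostgreSQL so it runs anywhere; the one-column table is made up.

```python
import sqlite3

import pandas as pd

conn = sqlite3.connect(":memory:")  # stands in for the PostgreSQL engine
df = pd.DataFrame({"x": [1, 2]})

df.to_sql("a", conn, index=False)  # default if_exists="fail": creates table a

try:
    df.to_sql("a", conn, index=False)  # table exists -> ValueError
    failed = False
except ValueError:
    failed = True

df.to_sql("a", conn, index=False, if_exists="append")   # a now holds 4 rows
df.to_sql("a", conn, index=False, if_exists="replace")  # dropped and recreated: 2 rows
rows = conn.execute("SELECT COUNT(*) FROM a").fetchone()[0]
```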
P.S. An additional tip: this is a better way to handle the connection:
with engine.connect() as conn, conn.begin():
    sql = """select * from some_table limit 1"""
    df = psql.read_sql(sql, con=conn)
    print(df.head())
    df.to_sql('a', con=conn, schema=schema_name, if_exists='replace')
This ensures that the connection is always closed, and that the transaction is committed on success or rolled back on error, even if your program raises an exception. That way you are not left with half-applied changes. Further, I would just use this:
import pandas as pd
...
pd.read_sql(sql, conn)
instead of the way you are doing it.
So, if I was in your place writing that code, it would look like this:
import pandas as pd
from sqlalchemy import create_engine
engine = create_engine(r'postgresql://user@localhost:port/dbname')
with engine.connect() as conn, conn.begin():
    df = pd.read_sql('select * from some_table limit 1', con=conn)
    print(df.head())
    df.to_sql('a', con=conn, schema=schema_name, if_exists='replace')