I want to append a pandas dataframe to my table in Oracle.
But this code deletes all rows in the table :(
My dataframe and the resulting rows look like this:
0, 0, 0, ML_TEST, 0, 5
0, 0, 0, ML_TEST, 0, 6
using this code block below:
import cx_Oracle
import pandas as pd
from sqlalchemy import types, create_engine
dataset = pd.read_csv("denemedf.txt", delimiter=",")
print(dataset)
from sqlalchemy import create_engine
engine = create_engine('oracle://***:***@***:***/***', echo=False)
dataset.to_sql(name='dev_log',con=engine ,if_exists = 'append', index=False)
How can I append this dataframe's rows to the end of the table without deleting the existing rows?
I tried again and now it appends to the end, but on the first try it deleted all existing rows.
How can I do this reliably without causing any problems?
Actually the problem occurs because of the schema of this table.
The table is owned by gnl, but I connected as prg, so it couldn't find the table and created another one.
Is there any way to specify the owner or schema in this function?
I think this may help:
import cx_Oracle
import pandas as pd
dataset = pd.read_csv("C:\\pathToFile\\denemedf.txt", delimiter=",")
con = cx_Oracle.connect('uname/pwd@serverName:port/instanceName')
cursor = con.cursor()
sql='INSERT INTO gnl.tbl_deneme VALUES(:1,:2,:3,:4,:5,:6)'
df_list = dataset.values.tolist()
n = 0
for i in dataset.iterrows():
    cursor.execute(sql, df_list[n])
    n += 1
con.commit()
cursor.close()
con.close()
provided the INSERT privilege has already been granted to the prg schema for your table tbl_deneme
(after connecting as gnl -> grant insert on tbl_deneme to prg)
where your text file (denemedf.txt) is assumed to be:
col1,col2,col3,col4,col5,col6
0, 0, 0, ML_TEST, 0, 5
0, 0, 0, ML_TEST, 0, 6
Moreover, a more dynamic and more performant option using cursor.executemany can be provided: it creates the table if it does not exist, using the column names from the first line of the file, and inserts the values from the remaining lines without explicitly specifying the bind variable list one by one, such as:
import cx_Oracle
import pandas as pd
con = cx_Oracle.connect(user, password, host+':'+port+'/'+dbname)
cursor = con.cursor()
tab_name = 'gnl.tbl_deneme'
cursor.execute('SELECT COUNT(*) FROM user_tables WHERE table_name = UPPER(:1) ',[tab_name])
exs = cursor.fetchone()[0]
df = pd.read_csv('C:\\pathToFile\\denemedf.txt', sep = ',', dtype=str)
col=df.columns.tolist()
crt=""
for k in col:
    crt += ''.join(k)+' VARCHAR2(4000),'
if int(exs) == 0:
    crt = 'CREATE TABLE '+tab_name+' ('+crt.rstrip(",")+')'
    cursor.execute(crt)
vrs=""
for i in range(0,len(col)):
    vrs += ':'+str(i+1)+','
cols=[]
sql = 'INSERT INTO '+tab_name+' VALUES('+vrs.rstrip(",")+')'
for i in range(0,len(df)):
    cols.append(tuple(df.fillna('').values[i]))
cursor.executemany(sql,cols)
con.commit()
cursor.close()
con.close()
Considering data_df to be your dataframe, it can be done with the 3 lines below:
rows = [tuple(x) for x in data_df.values]
cur.executemany("INSERT INTO table_name VALUES (:1,:2,:3,:4)",rows)
con_ora.commit()
dataset.to_sql('dev_log', engine, if_exists='append', index=False)
Here dev_log is used directly as the table name and engine directly as the connection, instead of name='dev_log' and con=engine.
The if_exists='append' parameter means: insert new values into the existing table.
So I think it will append the new rows to the existing table and will not delete any rows from it.
pandas.DataFrame.to_sql
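Regarding the owner/schema question above: to_sql also accepts a schema argument, so a minimal sketch (assuming the table lives under the gnl owner; the connection details below are placeholders, not the real ones) could look like this:

import pandas as pd
from sqlalchemy import create_engine

# Placeholder credentials; substitute your own user, password, host, port and service.
engine = create_engine('oracle://user:password@host:port/service', echo=False)

dataset = pd.read_csv("denemedf.txt", delimiter=",")

# schema= points to_sql at the gnl owner instead of the login schema,
# and if_exists='append' keeps the existing rows.
dataset.to_sql(name='dev_log', con=engine, schema='gnl',
               if_exists='append', index=False)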
I have a table where I'm trying to populate the value of one column based on the value of another column...
Essentially, one column is a long series of numbers... such as 540379724021081
and for that tuple I am trying to set another column to the first 4 characters, so... 5403
So in the end my table would go from...
To this...
I started doing this in Python with psycopg2 with a quick script I made... but I don't think it's the right way:
import psycopg2
conn = psycopg2.connect("dbname='gis3capstone' user='postgres' password='password' host='localhost' port='5433'")
cur = conn.cursor()
cur.execute('SELECT * FROM fcc_form_477')
row = cur.fetchone()
while row:
    val2 = str(row[0])[0:4]
    # Set row[1] = val2 ??
    row = cur.fetchone()
Any help to go about this?
EDIT: or SQL...if I can do it that way
You can use substr():
select substr(val1, 1, 4) as val2
If the value is a number, then convert it to a string:
select substr(val1::text, 1, 4) as val2
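If the goal is to actually store that prefix in the other column rather than just select it, a single UPDATE lets the database do the work instead of looping over rows in Python. A hedged sketch, assuming the columns are named val1 and val2 (substitute the real column names from fcc_form_477):

import psycopg2

# val1/val2 are placeholder column names for the real ones in fcc_form_477.
conn = psycopg2.connect("dbname='gis3capstone' user='postgres' password='password' host='localhost' port='5433'")
cur = conn.cursor()

# Set val2 to the first 4 characters of val1 for every row, in one statement.
cur.execute("UPDATE fcc_form_477 SET val2 = substr(val1::text, 1, 4)")

conn.commit()
cur.close()
conn.close()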
I am trying to execute a query and then place the results into a dataframe. For some reason, it is only loading column 0; the entire result set is loaded into that one column.
Here's what I am doing
conn = pyodbc.connect("connection info")
cursor = conn.cursor()
cursor.execute("sql statement")
names = [ x[0] for x in cursor.description]
rows = cursor.fetchall()
df = pd.DataFrame(rows, columns = names)
I get this error.
ValueError: Shape of passed values is (1, 421), indices imply (5, 421)
It is supposed to be 5 columns; however, when I run it without columns=names, the dataframe still comes back with one column, and within that column it stores the entire 5-column row from the sql query.
This is an example of what a row of the result set looks like when I run it without the column names:
(9026461, 875, 110, Decimal('1.08'), 3100)
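One workaround worth trying, assuming the issue is that pyodbc returns Row objects that pandas interprets as a single value, is to convert each row to a plain tuple before building the DataFrame (a sketch, not a guaranteed fix):

import pandas as pd
import pyodbc

conn = pyodbc.connect("connection info")
cursor = conn.cursor()
cursor.execute("sql statement")

names = [x[0] for x in cursor.description]

# Convert pyodbc Row objects to plain tuples so the DataFrame
# constructor sees an unambiguous 2-D shape.
rows = [tuple(row) for row in cursor.fetchall()]
df = pd.DataFrame(rows, columns=names)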
This finally worked with the following:
engine = create_engine('mssql+pyodbc://serverName/DatabaseName?driver=SQL+Server+Native+Client+11.0')
connection = engine.connect()
resultSet = connection.execute("select column1, column2 from ....")
df = pd.DataFrame(resultSet.fetchall())
df.columns = resultSet.keys()
I am loading data from various sources (csv, xls, json etc...) into Pandas dataframes and I would like to generate statements to create and fill a SQL database with this data. Does anyone know of a way to do this?
I know pandas has a to_sql function, but that only works on a database connection; it cannot generate a string.
Example
What I would like is to take a dataframe like so:
import pandas as pd
import numpy as np
dates = pd.date_range('20130101',periods=6)
df = pd.DataFrame(np.random.randn(6,4),index=dates,columns=list('ABCD'))
And a function that would generate this (this example is PostgreSQL but any would be fine):
CREATE TABLE data
(
index timestamp with time zone,
"A" double precision,
"B" double precision,
"C" double precision,
"D" double precision
)
If you only want the 'CREATE TABLE' sql code (and not the insert of the data), you can use the get_schema function of the pandas.io.sql module:
In [10]: print(pd.io.sql.get_schema(df.reset_index(), 'data'))
CREATE TABLE "data" (
"index" TIMESTAMP,
"A" REAL,
"B" REAL,
"C" REAL,
"D" REAL
)
Some notes:
I had to use reset_index because it otherwise didn't include the index
If you provide a SQLAlchemy engine of a certain database flavor, the result will be adjusted to that flavor (e.g. the data type names).
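As a hedged illustration of that second note, passing an engine through get_schema's con argument should make the emitted types follow that dialect (the connection string below is just a placeholder):

from sqlalchemy import create_engine
import pandas as pd

# Placeholder connection string; substitute your own database.
engine = create_engine('postgresql://user:password@localhost:5432/mydb')

# With con=engine, the column types follow the PostgreSQL dialect
# instead of the generic TIMESTAMP/REAL names shown above.
print(pd.io.sql.get_schema(df.reset_index(), 'data', con=engine))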
GENERATE SQL CREATE STATEMENT FROM DATAFRAME
SOURCE = df
TARGET = data
def SQL_CREATE_STATEMENT_FROM_DATAFRAME(SOURCE, TARGET):
    # SQL_CREATE_STATEMENT_FROM_DATAFRAME(SOURCE, TARGET)
    # SOURCE: source dataframe
    # TARGET: target table to be created in database
    import pandas as pd
    sql_text = pd.io.sql.get_schema(SOURCE.reset_index(), TARGET)
    return sql_text
Check the SQL CREATE TABLE Statement String
print(sql_text)
GENERATE SQL INSERT STATEMENT FROM DATAFRAME
def SQL_INSERT_STATEMENT_FROM_DATAFRAME(SOURCE, TARGET):
    sql_texts = []
    for index, row in SOURCE.iterrows():
        sql_texts.append('INSERT INTO '+TARGET+' ('+ str(', '.join(SOURCE.columns))+ ') VALUES '+ str(tuple(row.values)))
    return sql_texts
Check the SQL INSERT INTO Statement String
print('\n\n'.join(sql_texts))
Insert Statement Solution
Not sure if this is the absolute best way to do it, but it is more efficient than using df.iterrows(), as that is very slow. It also takes care of nan values with the help of regular expressions.
import re
def get_insert_query_from_df(df, dest_table):
    insert = """
    INSERT INTO `{dest_table}` (
    """.format(dest_table=dest_table)
    columns_string = str(list(df.columns))[1:-1]
    columns_string = re.sub(r' ', '\n ', columns_string)
    columns_string = re.sub(r'\'', '', columns_string)
    values_string = ''
    for row in df.itertuples(index=False, name=None):
        values_string += re.sub(r'nan', 'null', str(row))
        values_string += ',\n'
    return insert + columns_string + ')\n VALUES\n' + values_string[:-2] + ';'
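A quick usage sketch of the function above (the DataFrame and table name here are made up for illustration):

import pandas as pd

sample = pd.DataFrame({'id': [1, 2], 'value': [3.5, float('nan')]})

# Prints one multi-row INSERT statement; nan values are rewritten to null by the regex.
print(get_insert_query_from_df(sample, 'my_table'))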
If you want to write the file yourself, you may also retrieve the column names and dtypes and build a dictionary to convert pandas data types to sql data types.
As an example:
import pandas as pd
import numpy as np
dates = pd.date_range('20130101',periods=6)
df = pd.DataFrame(np.random.randn(6,4),index=dates,columns=list('ABCD'))
tableName = 'table'
columnNames = df.columns.values.tolist()
columnTypes = list(map(lambda x: x.name, df.dtypes.values))
# Storing column names and dtypes in a dataframe
tableDef = pd.DataFrame(index = range(len(df.columns) + 1), columns=['cols', 'dtypes'])
tableDef.iloc[0] = ['index', df.index.dtype.name]
tableDef.loc[1:, 'cols'] = columnNames
tableDef.loc[1:, 'dtypes'] = columnTypes
# Defining a dictionary to convert dtypes
conversion = {'datetime64[ns]':'timestamp with time zone', 'float64':'double precision'}
# Writing sql in a file
f = open('yourdir\\%s.sql' % tableName, 'w')
f.write('CREATE TABLE %s\n' % tableName)
f.write('(\n')
for i, row in tableDef.iterrows():
    sep = ",\n" if i < tableDef.index[-1] else "\n"
    f.write('\t\"%s\" %s%s' % (row['cols'], conversion[row['dtypes']], sep))
f.write(')')
f.close()
You can do the same thing to populate your table with INSERT INTO statements.
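For example, a minimal sketch of that INSERT part, continuing from the df and tableName above (values are quoted naively via str(), so adjust the formatting to your data):

# Write one INSERT statement per row into a second file;
# 'yourdir' is the same placeholder directory as above.
f = open('yourdir\\%s_insert.sql' % tableName, 'w')
for idx, row in df.iterrows():
    values = ', '.join("'%s'" % str(v) for v in [idx] + row.tolist())
    f.write('INSERT INTO %s VALUES (%s);\n' % (tableName, values))
f.close()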
SINGLE INSERT QUERY SOLUTION
I didn't find the above answers to suit my needs. I wanted to create one single insert statement for a dataframe with each row as the values. This can be achieved by the below:
import re
import pandas as pd
table = 'your_table_name_here'
# You can read from CSV file here... just using read_sql_query as an example
df = pd.read_sql_query(f'select * from {table}', con=db_connection)
cols = ', '.join(df.columns.to_list())
vals = []
for index, r in df.iterrows():
    row = []
    for x in r:
        row.append(f"'{str(x)}'")
    row_str = ', '.join(row)
    vals.append(row_str)
f_values = []
for v in vals:
    f_values.append(f'({v})')
# Handle inputting NULL values
f_values = ', '.join(f_values)
f_values = re.sub(r"('None')", "NULL", f_values)
sql = f"insert into {table} ({cols}) values {f_values};"
print(sql)
db_connection.dispose()
If you're just looking to generate a string with inserts based on pandas.DataFrame - I'd suggest using bulk sql insert syntax as suggested by @rup.
Here's an example of a function I wrote for that purpose:
import pandas as pd
import re
def df_to_sql_bulk_insert(df: pd.DataFrame, table: str, **kwargs) -> str:
    """Converts DataFrame to bulk INSERT sql query
    >>> data = [(1, "_suffixnan", 1), (2, "Noneprefix", 0), (3, "fooNULLbar", 1, 2.34)]
    >>> df = pd.DataFrame(data, columns=["id", "name", "is_deleted", "balance"])
    >>> df
       id        name  is_deleted  balance
    0   1  _suffixnan           1      NaN
    1   2  Noneprefix           0      NaN
    2   3  fooNULLbar           1     2.34
    >>> query = df_to_sql_bulk_insert(df, "users", status="APPROVED", address=None)
    >>> print(query)
    INSERT INTO users (id, name, is_deleted, balance, status, address)
    VALUES (1, '_suffixnan', 1, NULL, 'APPROVED', NULL),
           (2, 'Noneprefix', 0, NULL, 'APPROVED', NULL),
           (3, 'fooNULLbar', 1, 2.34, 'APPROVED', NULL);
    """
    df = df.copy().assign(**kwargs)
    columns = ", ".join(df.columns)
    tuples = map(str, df.itertuples(index=False, name=None))
    values = re.sub(r"(?<=\W)(nan|None)(?=\W)", "NULL", (",\n" + " " * 7).join(tuples))
    return f"INSERT INTO {table} ({columns})\nVALUES {values};"
By the way, it converts nan/None entries to NULL and it's possible to pass constant column=value pairs as keyword arguments (see status="APPROVED" and address=None arguments in docstring example).
Generally, it works faster, since any database does a lot of work for a single insert: checking constraints, building indices, flushing, writing to the log, etc. These complex operations can be optimized by the database when doing a several-in-one operation rather than calling the engine one row at a time.
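A small hedged usage sketch, feeding the generated statement to a SQLAlchemy engine (the connection string and table are placeholders):

from sqlalchemy import create_engine, text
import pandas as pd

# Placeholder connection string; substitute your own database.
engine = create_engine('postgresql://user:password@localhost:5432/mydb')

df = pd.DataFrame([(1, 'alice'), (2, None)], columns=['id', 'name'])
query = df_to_sql_bulk_insert(df, 'users', status='APPROVED')

# engine.begin() opens a transaction and commits it on success.
with engine.begin() as conn:
    conn.execute(text(query))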
Taking user @Jaris's post to get the CREATE, I extended it further to work for any CSV:
import sqlite3
import pandas as pd
db = './database.db'
csv = './data.csv'
table_name = 'data'
# create db and setup schema
df = pd.read_csv(csv)
create_table_sql = pd.io.sql.get_schema(df.reset_index(), table_name)
conn = sqlite3.connect(db)
c = conn.cursor()
c.execute(create_table_sql)
conn.commit()
# now we can insert data
def insert_data(row, c):
    values = str(row.name)+','+','.join([str('"'+str(v)+'"') for v in row])
    sql_insert = f"INSERT INTO {table_name} VALUES ({values})"
    try:
        c.execute(sql_insert)
    except Exception as e:
        print(f"SQL:{sql_insert} \n failed with Error:{e}")
# use apply to loop over dataframe and call insert_data on each row
df.apply(lambda row: insert_data(row, c), axis=1)
# finally commit all those inserts into the database
conn.commit()
Hopefully this is simpler than the alternative answers and more pythonic!
Depending on whether you can forego generating an intermediate representation of the SQL statement, you can also just execute the insert statement outright.
con.executemany("INSERT OR REPLACE INTO data (A, B, C, D) VALUES (?, ?, ?, ?)", list(df_.values))
This worked a little better as there is less messing around with string generation.
I have a sqlite3 database that has multiple (six) tables, and I need it to be exported to csv, but when I try to export it, I get a duplicated value if a column (in one table) is longer than another (in another table).
i.e. this is what my sqlite3 database file looks like:
column on table1    column on table2    column on table3
25                  30                  20
30
this is the result in the .csv file (using this script as an example):
25,30,20
30,30,20
and this is the result I need it to show:
25,30,20
30
EDIT: OK, this is how I add the values to each table, based on the Python documentation example (executed each time a value entry is used):
import sqlite3
conn = sqlite3.connect('database.db')
c = conn.cursor()
# Create table
c.execute('''CREATE TABLE table
(column int)''')
# Insert a row of data
c.execute("INSERT INTO table VALUES (value)")
# Save (commit) the changes
conn.commit()
# We can also close the cursor if we are done with it
c.close()
any help?
-Regards...
This is how you could do this.
import sqlite3
con = sqlite3.connect('database')
cur = con.cursor()
cur.execute('select table1.column, table2.column, table3.column from table1, table2, table3')
# result should look like ((25, 30, 20), (30,)), if I remember correctly
results = cur.fetchall()
output = '\n'.join(','.join(str(i) for i in line) for line in results)
Note: this code is untested, written out of my head, but I hope you get the idea.
UPDATE: apparently I made some mistakes in the script and somehow sql 'magically' pads the result (you might have guessed by now that I'm not a sql guru :D). Another way to do it would be:
import sqlite3
conn = sqlite3.connect('database.db')
cur = conn.cursor()
tables = ('table1', 'table2', 'table3')
results = [list(cur.execute('select column from %s' % table)) for table in tables]
def try_list(lst, i):
    try:
        return lst[i]
    except IndexError:
        return ('',)
maxlen = max(len(i) for i in results)
results2 = [[try_list(line, i) for line in results] for i in range(maxlen)]
output = '\n'.join(','.join(str(i[0]) for i in line) for line in results2)
print(output)
which produces
25,30,20
30,,
This is probably an overcomplicated way to do it, but it is 0:30 right now for me, so I'm not on my best...
At least it gets the desired result.