I need to upload a table I modified to my Oracle database. I exported the table as a pandas DataFrame, modified it, and now want to upload it back to the DB.
I am trying to do this using the df.to_sql function as follows:
import sqlalchemy as sa
import pandas as pd
engine = sa.create_engine('oracle://"IP_address_of_server"/"serviceDB"')
df.to_sql("table_name",engine, if_exists='replace', chunksize = None)
I always get this error: DatabaseError: (cx_Oracle.DatabaseError) ORA-12505: TNS:listener does not currently know of SID given in connect descriptor (Background on this error at: http://sqlalche.me/e/4xp6).
I am not an expert at this, so I could not understand what the problem is, especially since the IP address I am giving is the right one.
Could anyone help? Thanks a lot!
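For context: ORA-12505 means the listener resolved the final part of that URL as a SID it does not recognize; if the database is registered under a service name rather than a SID, naming the service explicitly in the connection string usually fixes this. A hedged sketch (credentials, port, and names below are placeholders, not taken from the question):
import sqlalchemy as sa

# Placeholders: substitute your own user, password, host, port, and service name.
engine = sa.create_engine(
    'oracle+cx_oracle://user:password@IP_address_of_server:1521/?service_name=serviceDB'
)
df.to_sql("table_name", engine, if_exists='replace', chunksize=None)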
I am trying to read a table from a SQLite database on Kaggle using Dask.
Link to the DB: https://www.kaggle.com/datasets/marcilonsilvacunha/amostracnpj?select=amostraCNPJ.sqlite
Some of the tables in this database are really large and I want to test how Dask handles them.
I wrote the following code for one of the tables in the smaller SQLite database:
import dask.dataframe as ddf
import sqlite3
# Read the SQLite table into a Dask DataFrame
con = sqlite3.connect("/kaggle/input/amostraCNPJ.sqlite")
df = ddf.read_sql_table('cnpj_dados_cadastrais_pj', con, index_col='cnpj')
# Verify that the result of the SQL query is stored in the dataframe
print(df.head())
This gives an error:
AttributeError: 'sqlite3.Connection' object has no attribute '_instantiate_plugins'
Any help would be appreciated, as this is the first time I have used Dask to read SQLite.
As the docstring states, you should not pass a connection object to Dask; you need to pass a SQLAlchemy-compatible connection string:
df = ddf.read_sql_table('cnpj_dados_cadastrais_pj',
                        'sqlite:////kaggle/input/amostraCNPJ.sqlite',
                        index_col='cnpj')
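For reference, a self-contained version of the corrected call (same table, path, and index column as the question; the sqlite3 import is no longer needed because Dask builds its own engine from the URI):
import dask.dataframe as ddf

# Dask opens the SQLite file itself from the SQLAlchemy-style URI.
df = ddf.read_sql_table('cnpj_dados_cadastrais_pj',
                        'sqlite:////kaggle/input/amostraCNPJ.sqlite',
                        index_col='cnpj')
print(df.head())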
Today I started to learn Postgres, and I was trying to do the same thing I do to load dataframes into my Oracle DB.
So, for example, I have a df that contains 70k records and 10 columns. My code for this is the following:
from sqlalchemy import create_engine
conn = create_engine('postgresql://'+data['user']+':'+data['password']+'@'+data['host']+':'+data['port_db']+'/'+data['dbname'])
df.to_sql('first_posgress', conn)
This code is pretty much the same as what I use for my Oracle tables, but in this case it takes much longer to accomplish the task. So I was wondering if there is a better way to do this, or whether Postgres is just slower in general.
I found some examples on SO and Google, but most are focused on creating the table, not inserting a df.
If it is possible for you to use psycopg2 instead of SQLAlchemy, you can write your df out as a CSV and then use cursor.copy_from() to copy the CSV into the db.
import io
import psycopg2

# Assumed psycopg2 connection; the credentials dict mirrors the one from the question.
con = psycopg2.connect(host=data['host'], port=data['port_db'],
                       dbname=data['dbname'], user=data['user'],
                       password=data['password'])
cursor = con.cursor()

output = io.StringIO()
# copy_from expects bare data rows, so skip the header row and the index column.
df.to_csv(output, sep=",", header=False, index=False)
output.seek(0)

cursor.copy_from(
    output,
    'first_posgress',  # target table
    sep=",",
    columns=tuple(df.columns)
)
con.commit()  # psycopg2 connection
(I don't know if there is a similar function in SQLAlchemy that is faster too.)
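For what it's worth, recent pandas versions (0.24+) let to_sql take a method callable, so a COPY-based insert can also be driven through SQLAlchemy; the pandas documentation includes an insertion-method recipe along these lines. A hedged sketch, reusing the engine (named conn) from the question:
import csv
import io

def psql_insert_copy(table, conn, keys, data_iter):
    # conn wraps a psycopg2 connection; stream the rows through COPY ... FROM STDIN.
    dbapi_conn = conn.connection
    with dbapi_conn.cursor() as cur:
        buf = io.StringIO()
        csv.writer(buf).writerows(data_iter)
        buf.seek(0)
        columns = ', '.join('"{}"'.format(k) for k in keys)
        table_name = '{}.{}'.format(table.schema, table.name) if table.schema else table.name
        cur.copy_expert('COPY {} ({}) FROM STDIN WITH CSV'.format(table_name, columns), buf)

df.to_sql('first_posgress', conn, index=False, if_exists='replace', method=psql_insert_copy)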
The psycopg2 cursor documentation and this blog post contain more information.
Hopefully this is useful for you!
I have been attempting to use Python to upload a table into Microsoft SQL Server. I have had great success with smaller tables, but start to get errors when there is a large number of columns or rows. I don't believe it is the filesize that is the issue, but I may be mistaken.
The same error comes up whether the data is from an Excel file, csv file, or query.
When I run the code, it does create a table in SQL Server, but the table contains only the column headers (the rest is blank).
This is the code that I am using, which works for smaller files but gives me the below error for the larger ones:
import pyodbc
#import cx_Oracle
import pandas as pd
from sqlalchemy import create_engine
connstr_Dev = ('DSN='+ODBC_Dev+';UID='+SQLSN+';PWD='+SQLpass)
conn_Dev = pyodbc.connect(connstr_Dev)
cursor_Dev=conn_Dev.cursor()
engine_Dev = create_engine('mssql+pyodbc://'+ODBC_Dev)
upload_file= "M:/.../abc123.xls"
sql_table_name='abc_123_sql'
pd.read_excel(upload_file).to_sql(sql_table_name, engine_Dev, schema='dbo', if_exists='replace', index=False, index_label=None, chunksize=None, dtype=None)
conn_Dev.commit()
conn_Dev.close()
This gives me the following error:
ProgrammingError: (pyodbc.ProgrammingError) ('The SQL contains -13854
parameter markers, but 248290 parameters were supplied', 'HY000') .......
(Background on this error at: http://sqlalche.me/e/f405)
The error log in the provided link doesn't give me any ideas on troubleshooting.
Anything I can tweak in the code to make this work?
Thanks!
Upgrading to pandas 0.23.4 solved it for me. What is your version?
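If upgrading is not an option, a commonly suggested workaround (my suggestion, not part of the answer above) is to pass a chunksize so that no single statement carries more parameter markers than pyodbc can handle:
# Hedged workaround: cap the rows per INSERT; 500 is an illustrative value only.
pd.read_excel(upload_file).to_sql(sql_table_name, engine_Dev, schema='dbo',
                                  if_exists='replace', index=False, chunksize=500)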
I'm trying to upload a pandas DataFrame directly to Redshift using the to_sql function.
connstr = 'redshift+psycopg2://%s:%s@%s.redshift.amazonaws.com:%s/%s' % (
    username, password, cluster, port, db_name)
def send_data(df, block_size=10000):
    engine = create_engine(connstr)
    with engine.connect() as conn, conn.begin():
        df.to_sql(name='my_table_clean', schema='my_schema', con=conn,
                  index=False, if_exists='replace', chunksize=block_size)
    del engine
The table my_schema.my_table_clean exists (but is empty), and the connection built using connstr is also valid (verified by a corresponding retrieve_data method). The retrieve function pulls data from my_table, and my script cleans it up using pandas to output to my_table_clean.
The problem is, I keep getting the following error:
TypeError: _get_column_info() takes exactly 9 arguments (8 given)
during the to_sql function.
I can't seem to figure out what is causing this error. Is anyone familiar with it?
Using
python 2.7.13
pandas 0.20.2
sqlalchemy 1.2.0.
Note: I'm trying to circumvent S3 -> Redshift for this script since I don't want to create a folder in my bucket just for one file, and this single script doesn't conform to my overall ETL structure. I'm hoping to just run this one script after the ETL that creates the original my_table.
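One hedged thing to check (not part of the original question): _get_column_info lives in the dialect's reflection layer, and its signature has changed between SQLAlchemy releases, so this TypeError usually points at a version mismatch between SQLAlchemy and the Redshift dialect package. A quick way to see what is installed:
# Hedged diagnostic: print the versions involved so they can be compared
# against the Redshift dialect's compatibility notes.
import pandas
import sqlalchemy

print("pandas " + pandas.__version__)
print("sqlalchemy " + sqlalchemy.__version__)
try:
    import sqlalchemy_redshift
    print("sqlalchemy-redshift " + sqlalchemy_redshift.__version__)
except ImportError:
    print("sqlalchemy-redshift is not installed (the redshift+psycopg2 URL needs it)")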
I'm trying to use pandas to_sql to insert data from .csv files into an MSSQL db. No matter how I do it, I run into this error:
pyodbc.DataError: ('String data, right truncation: length 8 buffer 4294967294', '22001')
The code I'm running looks like this:
import pandas as pd
from sqlalchemy import create_engine
df = pd.read_csv('foo.csv')
engine = create_engine("mssql+pyodbc://:#Test")
with engine.connect() as conn, conn.begin():
df.to_sql(name='test', con=conn, schema='foo', if_exists='append', index=False)
Any help would be appreciated!
P.S. I'm still fairly new to Python and MSSQL.
Okay so I didn't have my DSN configured correctly. The driver I was using was SQL Server and I needed to change it to ODBC Driver 13 for SQL Server. That fixed all my problems.
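A hedged follow-up, not from the original answer: the same fix can be made explicit in code by skipping the DSN and passing the full ODBC connection string to SQLAlchemy, so the driver choice lives next to the rest of the script. The server and database names below are placeholders:
from urllib.parse import quote_plus
from sqlalchemy import create_engine

# Placeholders: your_server and your_database are not from the original post.
odbc_str = (
    "DRIVER={ODBC Driver 13 for SQL Server};"
    "SERVER=your_server;"
    "DATABASE=your_database;"
    "Trusted_Connection=yes;"
)
engine = create_engine("mssql+pyodbc:///?odbc_connect=" + quote_plus(odbc_str))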