Trying to write a CSV file to an Oracle database table with pandas - Python

I am trying to write a table to an Oracle database using Python's pandas.
Here is my code:
import cx_Oracle
import pandas as pd
import csv
df = pd.read_csv('C:/Users/admin/Desktop/customer.csv')
conn = cx_Oracle.connect('SYSTEM/Mouni123$@localhost/orcl')
df = df.to_sql('cust', conn, 'if_exists=replace')
conn.close()
df
I get the following error:
DatabaseError: Execution failed on sql 'SELECT name FROM sqlite_master WHERE type='table' AND name=?;': ORA-01036: illegal variable name/number
What am I doing wrong?

The error shows that pandas is emitting SQLite-specific SQL (a query against sqlite_master), which is bound to fail when the target is actually an Oracle database.
If I understand the documentation for DataFrame.to_sql() correctly, a raw DBAPI connection is treated as an SQLite connection by default. To target Oracle, you have to make that explicit by passing an SQLAlchemy engine, as described in the documentation.
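A minimal sketch of what that could look like (the connection URL below is an assumption pieced together from the credentials in the question; adjust the user, password, host, port, and service name to your environment):
import pandas as pd
from sqlalchemy import create_engine

df = pd.read_csv('C:/Users/admin/Desktop/customer.csv')
# Hypothetical Oracle connection URL -- replace the placeholders with your own values
engine = create_engine('oracle+cx_oracle://SYSTEM:password@localhost:1521/?service_name=orcl')
# Pass the SQLAlchemy engine (not a raw cx_Oracle connection) and give
# if_exists as a keyword argument
df.to_sql('cust', engine, if_exists='replace', index=False)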

Handling UUID values in Arrow with Parquet files

I'm new to Python and Pandas - please be gentle!
I'm using SqlAlchemy with pymssql to execute a SQL query against a SQL Server database and then convert the result set into a dataframe. I'm then attempting to write this dataframe as a Parquet file:
engine = sal.create_engine(connectionString)
conn = engine.connect()
df = pd.read_sql(query, con=conn)
df.to_parquet(outputFile)
The data I'm retrieving in the SQL query includes a uniqueidentifier column (i.e. a UUID) named rowguid. Because of this, I'm getting the following error on the last line above:
pyarrow.lib.ArrowInvalid: ("Could not convert UUID('92c4279f-1207-48a3-8448-4636514eb7e2') with type UUID: did not recognize Python value type when inferring an Arrow data type", 'Conversion failed for column rowguid with type object')
Is there any way I can force all UUIDs to strings at any point in the above chain of events?
A few extra notes:
The goal for this portion of code was to receive the SQL query text as a parameter and act as a generic SQL-to-Parquet function.
I realise I can do something like df['rowguid'] = df['rowguid'].astype(str), but it relies on me knowing which columns have uniqueidentifier types. By the time it's a dataframe, everything is an object and each query will be different.
I also know I can convert it to a char(36) in the SQL query itself; however, I was hoping to do something more "automatic" so the person writing the query doesn't accidentally trip over this problem and doesn't have to remember to always convert the datatype.
Any ideas?
Try DuckDB
import duckdb
import sqlalchemy as sal
import pandas as pd

engine = sal.create_engine(connectionString)
conn = engine.connect()
df = pd.read_sql(query, con=conn)
# Close the database connection
conn.close()
# Create an in-memory DuckDB connection
duck_conn = duckdb.connect(':memory:')
# Write the DataFrame content to a snappy-compressed Parquet file
# (DuckDB can query the local pandas DataFrame `df` directly)
duck_conn.execute("COPY (SELECT * FROM df) TO 'df-snappy.parquet' (FORMAT 'parquet')")
Ref:
https://duckdb.org/docs/guides/python/sql_on_pandas
https://duckdb.org/docs/sql/data_types/overview
https://duckdb.org/docs/data/parquet

Unable to use SQL built-in functions in a pyodbc SQL query from a Lotus Notes DB

I am new to using pyodbc for querying data from an ODBC database, specifically a Lotus Notes DB.
This is an example where the query fails using a function in SQL:
import pyodbc
import pandas as pd
cnxn = pyodbc.connect("Driver={Lotus Notes SQL Driver (*.nsf)};SERVER=server;DATABASE=db.nsf;PWD=xxxxx;UID=userid", autocommit=True)
cursor = cnxn.cursor()
sql_addon = """SELECT REPLACE(timestamp_DT,'-','') as timestamp_DT
FROM ViewInNoteDB
"""
df_addon = pd.read_sql(sql_addon, cnxn)
This is the error I get:
': ('37000', u"[37000] [Lotus][ODBC Lotus Notes]Name, constant, or expression expected (23008) (SQLExecDirectW); [37000] [Lotus][ODBC Lotus Notes]Incorrect syntax near 'SELECT' (23064)")
I get different errors using GETDATE(), CONVERT function, and many other functions.
It seems the issue is that you are using SQL Server syntax, which the Lotus Notes ODBC driver does not support. CAST and CONVERT are unfortunately not supported either.
The only supported column functions are listed here: http://www-12.lotus.com/ldd/doc/notessql/2.0.6/notessql.nsf/66208c256b4136a2852563c000646f8c/1f3d9225b5e6a547852567010067254d?OpenDocument
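One workaround, sketched here under the assumption that timestamp_DT comes back as text, is to select the raw column and do the REPLACE equivalent in pandas instead of in SQL:
import pyodbc
import pandas as pd

cnxn = pyodbc.connect("Driver={Lotus Notes SQL Driver (*.nsf)};SERVER=server;"
                      "DATABASE=db.nsf;PWD=xxxxx;UID=userid", autocommit=True)
# Keep the SQL trivial so the Lotus Notes driver accepts it
sql_addon = "SELECT timestamp_DT FROM ViewInNoteDB"
df_addon = pd.read_sql(sql_addon, cnxn)
# Do the REPLACE client-side; assumes the column is (or can be cast to) a string
df_addon['timestamp_DT'] = df_addon['timestamp_DT'].astype(str).str.replace('-', '', regex=False)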

Python dataframe value insertion to database table column

Is it possible to insert Python dataframe values into a database table column?
I am using snowflake as my database.
"CommuteTime" is the table that contains the StudentID column, and "add_col" is the Python dataframe. I need to insert the dataframe values into the StudentID column.
Below is the code I have tried for inserting the dataframe values into the table column:
c_col = pd.read_sql_query('insert into "SIS_WIDE"."PUBLIC"."CommuteTime" ("StudentID") VALUES ("add_col")', engine)
When I execute the above, it does not accept the dataframe and throws the error below:
ProgrammingError: (snowflake.connector.errors.ProgrammingError) 000904 (42000): SQL compilation error: error line 1 at position 68
invalid identifier '"add_col"' [SQL: 'insert into "SIS_WIDE"."PUBLIC"."CommuteTime" ("StudentID") VALUES ("add_col")']
(Background on this error at: http://sqlalche.me/e/f405)
Please provide suggestions to fix this.
You cannot do this with pd.read_sql_query.
First, you need to create a Snowflake cursor.
e.g.
import snowflake.connector

cursor = snowflake.connector.connect(
    user='username',
    password='password',
    database='database_name',
    schema='PUBLIC',
    warehouse='warehouse_name'
).cursor()
Once you have a cursor, you can query like this: cursor.execute('SELECT * FROM "CommuteTime"')
To insert data into tables, you need to use Snowflake's INSERT INTO statement.
Please provide more info about your dataframe so we can help you further.
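In the meantime, here is a rough sketch of inserting the dataframe rows with the cursor (it assumes the add_col dataframe has a single column named StudentID; the connection values are placeholders):
import snowflake.connector

conn = snowflake.connector.connect(
    user='username',
    password='password',
    database='SIS_WIDE',
    schema='PUBLIC',
    warehouse='warehouse_name'
)
cursor = conn.cursor()
# Build one parameter tuple per row; assumes add_col has a 'StudentID' column
rows = [(value,) for value in add_col['StudentID']]
cursor.executemany(
    'INSERT INTO "SIS_WIDE"."PUBLIC"."CommuteTime" ("StudentID") VALUES (%s)',
    rows
)
conn.commit()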
I was only able to do this using SQLAlchemy, not the Snowflake Python Connector.
from sqlalchemy import create_engine
# Establish the connection to the Snowflake database
sf = 'snowflake://{}:{}@{}{}'.format(user, password, account, table_location)
engine = create_engine(sf)
# Write your data frame to a table in database
add_col.to_sql(table_name, con=engine, if_exists='replace', index=False)
See the Snowflake SQLAlchemy documentation to learn how to establish a connection to Snowflake by passing username, password, account, and table location.
See the pandas documentation for to_sql() to learn about the values you can pass as if_exists.

SQLAlchemy/pandas to_sql for SQLServer -- CREATE TABLE in master db

I am using SQLAlchemy and pandas (on Python 2.7) to insert rows into a SQL Server 2012 table.
After trying pymssql and pyodbc with a specific server string, I am now trying an ODBC data source name:
import sqlalchemy, pyodbc, pandas as pd
engine = sqlalchemy.create_engine("mssql+pyodbc://mssqlodbc")
sqlstring = "EXEC getfoo"
dbdataframe = pd.read_sql(sqlstring, engine)
This part works great and worked with the other methods (pymssql, etc). However, the pandas to_sql method doesn't work.
finaloutput.to_sql("MyDB.dbo.Loader_foo",engine,if_exists="append",chunksize="10000")
With this statement, I consistently get an error saying that pandas is trying to do a CREATE TABLE in the SQL Server master db, which it does not have permission for.
How do I get pandas/SQLAlchemy/pyodbc to point to the correct MSSQL database? The to_sql method seems to ignore whatever I put in the engine connect string (although read_sql seems to pick it up just fine).
To have this question marked as answered: the problem is that you specify the schema in the table name itself. If you provide "MyDB.dbo.Loader_foo" as the table name, pandas will interpret this full string as the table name, instead of just "Loader_foo".
The solution is to provide only "Loader_foo" as the table name. If you need to write the table into a specific schema, you can use the schema kwarg (see docs):
finaloutput.to_sql("Loader_foo", engine, if_exists="append")
finaloutput.to_sql("Loader_foo", engine, if_exists="append", schema="something_else_as_dbo")

Python MS Access Database Table Creation From Pandas Dataframe Using SQLAlchemy

I'm trying to create an MS Access database from Python and was wondering if it's possible to create a table directly from a pandas dataframe. I know that I can use the pandas dataframe.to_sql() function to write a dataframe to an SQLite database, or to some other database format via an SQLAlchemy engine (but not Access, unfortunately), but I can't get all the pieces to come together. Here's the code snippet that I've been testing with:
import pandas as pd
import sqlalchemy
import pypyodbc  # Used to actually create the .mdb file
import pyodbc

MDB = 'C:\\database.mdb'
DRV = '{Microsoft Access Driver (*.mdb)}'

# Connection function to use for sqlalchemy
def Connection():
    return pyodbc.connect('DRIVER={};DBQ={}'.format(DRV, MDB))

# Try to connect to the database
try:
    Conn = Connection()
# If it fails because it has not been created yet, create it and connect to it
except:
    pypyodbc.win_create_mdb(MDB)
    Conn = Connection()

# Create the sqlalchemy engine using the pyodbc connection
Engine = sqlalchemy.create_engine('mysql+pyodbc://', creator=Connection)

# Some dataframe
data = {'Values': [1., 2., 3., 4.],
        'FruitsAndPets': ["Apples", "Oranges", "Puppies", "Ducks"]}
df = pd.DataFrame(data)

# Try to send it to the access database (and fail)
df.to_sql('FruitsAndPets', Engine, index=False)
I'm not sure that what I'm trying to do is even possible with the packages I'm using, but I wanted to check here before writing my own hacky dataframe-to-MS-Access-table function. Maybe my sqlalchemy engine is set up wrong?
Here's the end of my error with mssql+pyodbc in the engine:
cursor.execute(statement, parameters)
sqlalchemy.exc.DBAPIError: (Error) ('HY000', "[HY000] [Microsoft][ODBC Microsoft Access Driver] Could not find file 'C:\\INFORMATION_SCHEMA.mdb'. (-1811) (SQLExecDirectW)") u'SELECT [COLUMNS_1].[TABLE_SCHEMA], [COLUMNS_1].[TABLE_NAME], [COLUMNS_1].[COLUMN_NAME], [COLUMNS_1].[IS_NULLABLE], [COLUMNS_1].[DATA_TYPE], [COLUMNS_1].[ORDINAL_POSITION], [COLUMNS_1].[CHARACTER_MAXIMUM_LENGTH], [COLUMNS_1].[NUMERIC_PRECISION], [COLUMNS_1].[NUMERIC_SCALE], [COLUMNS_1].[COLUMN_DEFAULT], [COLUMNS_1].[COLLATION_NAME] \nFROM [INFORMATION_SCHEMA].[COLUMNS] AS [COLUMNS_1] \nWHERE [COLUMNS_1].[TABLE_NAME] = ? AND [COLUMNS_1].[TABLE_SCHEMA] = ?' (u'FruitsAndPets', u'dbo')
and the ending error for mysql+pyodbc in the engine:
cursor.execute(statement, parameters)
sqlalchemy.exc.ProgrammingError: (ProgrammingError) ('42000', "[42000] [Microsoft][ODBC Microsoft Access Driver] Invalid SQL statement; expected 'DELETE', 'INSERT', 'PROCEDURE', 'SELECT', or 'UPDATE'. (-3500) (SQLExecDirectW)") "SHOW VARIABLES LIKE 'character_set%%'" ()
Just to note, I don't care whether I use sqlalchemy or pandas to_sql(); I'm just looking for an easy way to get a dataframe into my MS Access database. If that means dumping to JSON and then looping over the rows to insert them with SQL manually, whatever works; I'll take it.
For those still looking into this: basically, you can't use the pandas to_sql method for MS Access without a great deal of difficulty. If you are determined to do it this way, here is a link where someone fixed sqlalchemy's Access dialect (and presumably the OP's code would work with that Engine):
connecting sqlalchemy to MSAccess
The best way to get a data frame into MS Access is to build the INSERT statements from the records, then simply connect via pyodbc or pypyodbc and execute them with a cursor. You have to do inserts one at a time; it's probably best to break this up into chunks (around 5,000 rows) if you have a lot of data.
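A rough sketch of that approach, reusing the table and dataframe from the question (it assumes the FruitsAndPets table already exists in the .mdb with matching columns):
import pyodbc
import pandas as pd

df = pd.DataFrame({'Values': [1., 2., 3., 4.],
                   'FruitsAndPets': ["Apples", "Oranges", "Puppies", "Ducks"]})
conn = pyodbc.connect(r'DRIVER={Microsoft Access Driver (*.mdb)};DBQ=C:\database.mdb')
cursor = conn.cursor()
# Insert row by row; parameter markers avoid manual quoting
insert_sql = "INSERT INTO FruitsAndPets ([Values], [FruitsAndPets]) VALUES (?, ?)"
for _, row in df.iterrows():
    cursor.execute(insert_sql, row['Values'], row['FruitsAndPets'])
conn.commit()
conn.close()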
There is a short tutorial on the pypyodbc website for executing SQL commands and populating an Access database:
https://code.google.com/p/pypyodbc/wiki/pypyodbc_for_access_mdb_file
I also found this useful Python wiki article:
https://wiki.python.org/moin/Microsoft%20Access
It states that mxODBC also has the capability to work with MS Access. A long time ago, I believe I successfully used ADOdb to connect to MS Access as well.
A few years ago, SQLAlchemy had experimental support for Microsoft Access. I used it to move an Access database to MS SQL Server at the time. I used SQLAlchemy to autoload / reflect the database. It was super handy. I believe that code was in version 0.5. You can read a bit about what I did here.
