Trying to write a CSV file to an Oracle database table with pandas - Python

I am trying to write a table to an Oracle database using Python's pandas.
Here is my code:
import cx_Oracle
import pandas as pd
import csv
df = pd.read_csv('C:/Users/admin/Desktop/customer.csv')
conn = cx_Oracle.connect('SYSTEM/Mouni123$@localhost/orcl')
df = df.to_sql('cust', conn, 'if_exists=replace')
conn.close()
df
I get the following error:
DatabaseError: Execution failed on sql 'SELECT name FROM sqlite_master WHERE type='table' AND name=?;': ORA-01036: illegal variable name/number
What am I doing wrong?

The error shows that pandas is emitting SQLite-specific SQL (a query against sqlite_master), which is bound to fail when the target is actually an Oracle database.
If I understand the documentation for DataFrame.to_sql() correctly, a raw DBAPI connection is treated as an SQLite connection by default. To target Oracle, you have to make that explicit by passing an SQLAlchemy engine, as described in the documentation.
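A minimal sketch of what that could look like (the connection URL below is an assumption pieced together from the credentials in the question; adjust the user, password, host, port, and service name to your environment):
import pandas as pd
from sqlalchemy import create_engine

df = pd.read_csv('C:/Users/admin/Desktop/customer.csv')
# Hypothetical Oracle connection URL -- replace the placeholders with your own values
engine = create_engine('oracle+cx_oracle://SYSTEM:password@localhost:1521/?service_name=orcl')
# Pass the SQLAlchemy engine (not a raw cx_Oracle connection) and give
# if_exists as a keyword argument
df.to_sql('cust', engine, if_exists='replace', index=False)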

Handling UUID values in Arrow with Parquet files

I'm new to Python and Pandas - please be gentle!
I'm using SqlAlchemy with pymssql to execute a SQL query against a SQL Server database and then convert the result set into a dataframe. I'm then attempting to write this dataframe as a Parquet file:
engine = sal.create_engine(connectionString)
conn = engine.connect()
df = pd.read_sql(query, con=conn)
df.to_parquet(outputFile)
The data I'm retrieving in the SQL query includes a uniqueidentifier column (i.e. a UUID) named rowguid. Because of this, I'm getting the following error on the last line above:
pyarrow.lib.ArrowInvalid: ("Could not convert UUID('92c4279f-1207-48a3-8448-4636514eb7e2') with type UUID: did not recognize Python value type when inferring an Arrow data type", 'Conversion failed for column rowguid with type object')
Is there any way I can force all UUIDs to strings at any point in the above chain of events?
A few extra notes:
The goal for this portion of code was to receive the SQL query text as a parameter and act as a generic SQL-to-Parquet function.
I realise I can do something like df['rowguid'] = df['rowguid'].astype(str), but it relies on me knowing which columns have uniqueidentifier types. By the time it's a dataframe, everything is an object and each query will be different.
I also know I can convert it to a char(36) in the SQL query itself; however, I was hoping to do something more "automatic" so the person writing the query doesn't accidentally trip over this problem and doesn't have to remember to always convert the datatype.
Any ideas?
Try DuckDB
import duckdb
import sqlalchemy as sal
import pandas as pd

engine = sal.create_engine(connectionString)
conn = engine.connect()
df = pd.read_sql(query, con=conn)
# Close the database connection
conn.close()
# Create an in-memory DuckDB connection
duck_conn = duckdb.connect(':memory:')
# Write the DataFrame content to a snappy-compressed Parquet file
# (DuckDB can query the local pandas DataFrame `df` directly)
duck_conn.execute("COPY (SELECT * FROM df) TO 'df-snappy.parquet' (FORMAT 'parquet')")
Ref:
https://duckdb.org/docs/guides/python/sql_on_pandas
https://duckdb.org/docs/sql/data_types/overview
https://duckdb.org/docs/data/parquet

Unable to use SQL built-in functions in a pyodbc SQL query from a Lotus Notes DB

I am new to using pyodbc for querying data from an ODBC database, specifically a Lotus Notes DB.
This is an example where the query fails using a function in SQL:
import pyodbc
import pandas as pd
cnxn = pyodbc.connect("Driver={Lotus Notes SQL Driver (*.nsf)};SERVER=server;DATABASE=db.nsf;PWD=xxxxx;UID=userid", autocommit=True)
cursor = cnxn.cursor()
sql_addon = """SELECT REPLACE(timestamp_DT,'-','') as timestamp_DT
FROM ViewInNoteDB
"""
df_addon = pd.read_sql(sql_addon, cnxn)
This is the error I get:
': ('37000', u"[37000] [Lotus][ODBC Lotus Notes]Name, constant, or expression expected (23008) (SQLExecDirectW); [37000] [Lotus][ODBC Lotus Notes]Incorrect syntax near 'SELECT' (23064)")
I get different errors using GETDATE(), CONVERT function, and many other functions.
It seems the issue is that you are using SQL Server syntax, which the Lotus Notes ODBC driver does not support. CAST and CONVERT are unfortunately not supported either.
The only supported column functions are listed here: http://www-12.lotus.com/ldd/doc/notessql/2.0.6/notessql.nsf/66208c256b4136a2852563c000646f8c/1f3d9225b5e6a547852567010067254d?OpenDocument
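One workaround, sketched here under the assumption that timestamp_DT comes back as text, is to select the raw column and do the REPLACE equivalent in pandas instead of in SQL:
import pyodbc
import pandas as pd

cnxn = pyodbc.connect("Driver={Lotus Notes SQL Driver (*.nsf)};SERVER=server;"
                      "DATABASE=db.nsf;PWD=xxxxx;UID=userid", autocommit=True)
# Keep the SQL trivial so the Lotus Notes driver accepts it
sql_addon = "SELECT timestamp_DT FROM ViewInNoteDB"
df_addon = pd.read_sql(sql_addon, cnxn)
# Do the REPLACE client-side; assumes the column is (or can be cast to) a string
df_addon['timestamp_DT'] = df_addon['timestamp_DT'].astype(str).str.replace('-', '', regex=False)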

Python dataframe value insertion to database table column

Is it possible to insert Python dataframe values into a database table column?
I am using snowflake as my database.
"CommuteTime" is the table that contains the StudentID column, and "add_col" is the Python dataframe. I need to insert the dataframe values into the StudentID column.
Below is the code I have tried for inserting the dataframe values into the table column:
c_col = pd.read_sql_query('insert into "SIS_WIDE"."PUBLIC"."CommuteTime" ("StudentID") VALUES ("add_col")', engine)
When I execute the above, it does not accept the dataframe and throws the error below:
ProgrammingError: (snowflake.connector.errors.ProgrammingError) 000904 (42000): SQL compilation error: error line 1 at position 68
invalid identifier '"add_col"' [SQL: 'insert into "SIS_WIDE"."PUBLIC"."CommuteTime" ("StudentID") VALUES ("add_col")']
(Background on this error at: http://sqlalche.me/e/f405)
Please provide suggestions to fix this.
You cannot do this with pd.read_sql_query.
First, you need to create a Snowflake cursor.
e.g.
import snowflake.connector

cursor = snowflake.connector.connect(
    user='username',
    password='password',
    database='database_name',
    schema='PUBLIC',
    warehouse='warehouse_name'
).cursor()
Once you have a cursor, you can query like this: cursor.execute('SELECT * FROM "CommuteTime"')
To insert data into tables, you need to use Snowflake's INSERT INTO statement.
Please provide more info about your dataframe so we can help you further.
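In the meantime, here is a rough sketch of inserting the dataframe rows with the cursor (it assumes the add_col dataframe has a single column named StudentID; the connection values are placeholders):
import snowflake.connector

conn = snowflake.connector.connect(
    user='username',
    password='password',
    database='SIS_WIDE',
    schema='PUBLIC',
    warehouse='warehouse_name'
)
cursor = conn.cursor()
# Build one parameter tuple per row; assumes add_col has a 'StudentID' column
rows = [(value,) for value in add_col['StudentID']]
cursor.executemany(
    'INSERT INTO "SIS_WIDE"."PUBLIC"."CommuteTime" ("StudentID") VALUES (%s)',
    rows
)
conn.commit()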
I was only able to do this using SQLAlchemy, not the Snowflake Python Connector.
from sqlalchemy import create_engine
# Establish the connection to the Snowflake database
sf = 'snowflake://{}:{}@{}{}'.format(user, password, account, table_location)
engine = create_engine(sf)
# Write your data frame to a table in database
add_col.to_sql(table_name, con=engine, if_exists='replace', index=False)
See the Snowflake SQLAlchemy documentation to learn how to establish a connection to Snowflake by passing username, password, account, and table location.
See the pandas documentation for to_sql() to learn about the values you can pass as if_exists.

SQLAlchemy/pandas to_sql for SQLServer -- CREATE TABLE in master db

I am using SQLAlchemy and pandas (on Python 2.7) to insert rows into a SQL Server 2012 table.
After trying pymssql and pyodbc with a specific server string, I am now trying an ODBC data source name:
import sqlalchemy, pyodbc, pandas as pd
engine = sqlalchemy.create_engine("mssql+pyodbc://mssqlodbc")
sqlstring = "EXEC getfoo"
dbdataframe = pd.read_sql(sqlstring, engine)
This part works great and worked with the other methods (pymssql, etc). However, the pandas to_sql method doesn't work.
finaloutput.to_sql("MyDB.dbo.Loader_foo",engine,if_exists="append",chunksize="10000")
With this statement, I consistently get an error saying that pandas is trying to do a CREATE TABLE in the SQL Server master db, which it does not have permission for.
How do I get pandas/SQLAlchemy/pyodbc to point to the correct MSSQL database? The to_sql method seems to ignore whatever I put in the engine connect string (although read_sql seems to pick it up just fine).
To have this question marked as answered: the problem is that you specify the schema in the table name itself. If you provide "MyDB.dbo.Loader_foo" as the table name, pandas will interpret this full string as the table name, instead of just "Loader_foo".
The solution is to provide only "Loader_foo" as the table name. If you need to write the table into a specific schema, you can use the schema kwarg (see docs):
finaloutput.to_sql("Loader_foo", engine, if_exists="append")
finaloutput.to_sql("Loader_foo", engine, if_exists="append", schema="something_else_as_dbo")

Python MS Access Database Table Creation From Pandas Dataframe Using SQLAlchemy

I'm trying to create an MS Access database from Python and was wondering if it's possible to create a table directly from a pandas dataframe. I know that I can use the pandas dataframe.to_sql() function to write a dataframe to an SQLite database, or to some other database format via an SQLAlchemy engine (but not Access, unfortunately), but I can't get all the pieces to come together. Here's the code snippet that I've been testing with:
import pandas as pd
import sqlalchemy
import pypyodbc  # Used to actually create the .mdb file
import pyodbc

MDB = 'C:\\database.mdb'
DRV = '{Microsoft Access Driver (*.mdb)}'

# Connection function to use for sqlalchemy
def Connection():
    return pyodbc.connect('DRIVER={};DBQ={}'.format(DRV, MDB))

# Try to connect to the database
try:
    Conn = Connection()
# If it fails because it has not been created yet, create it and connect to it
except:
    pypyodbc.win_create_mdb(MDB)
    Conn = Connection()

# Create the sqlalchemy engine using the pyodbc connection
Engine = sqlalchemy.create_engine('mysql+pyodbc://', creator=Connection)

# Some dataframe
data = {'Values': [1., 2., 3., 4.],
        'FruitsAndPets': ["Apples", "Oranges", "Puppies", "Ducks"]}
df = pd.DataFrame(data)

# Try to send it to the access database (and fail)
df.to_sql('FruitsAndPets', Engine, index=False)
I'm not sure that what I'm trying to do is even possible with the packages I'm using, but I wanted to check here before writing my own hacky dataframe-to-MS-Access-table function. Maybe my sqlalchemy engine is set up wrong?
Here's the end of my error with mssql+pyodbc in the engine:
cursor.execute(statement, parameters)
sqlalchemy.exc.DBAPIError: (Error) ('HY000', "[HY000] [Microsoft][ODBC Microsoft Access Driver] Could not find file 'C:\\INFORMATION_SCHEMA.mdb'. (-1811) (SQLExecDirectW)") u'SELECT [COLUMNS_1].[TABLE_SCHEMA], [COLUMNS_1].[TABLE_NAME], [COLUMNS_1].[COLUMN_NAME], [COLUMNS_1].[IS_NULLABLE], [COLUMNS_1].[DATA_TYPE], [COLUMNS_1].[ORDINAL_POSITION], [COLUMNS_1].[CHARACTER_MAXIMUM_LENGTH], [COLUMNS_1].[NUMERIC_PRECISION], [COLUMNS_1].[NUMERIC_SCALE], [COLUMNS_1].[COLUMN_DEFAULT], [COLUMNS_1].[COLLATION_NAME] \nFROM [INFORMATION_SCHEMA].[COLUMNS] AS [COLUMNS_1] \nWHERE [COLUMNS_1].[TABLE_NAME] = ? AND [COLUMNS_1].[TABLE_SCHEMA] = ?' (u'FruitsAndPets', u'dbo')
and the ending error for mysql+pyodbc in the engine:
cursor.execute(statement, parameters)
sqlalchemy.exc.ProgrammingError: (ProgrammingError) ('42000', "[42000] [Microsoft][ODBC Microsoft Access Driver] Invalid SQL statement; expected 'DELETE', 'INSERT', 'PROCEDURE', 'SELECT', or 'UPDATE'. (-3500) (SQLExecDirectW)") "SHOW VARIABLES LIKE 'character_set%%'" ()
Just to note, I don't care whether I use sqlalchemy or pandas to_sql(); I'm just looking for an easy way to get a dataframe into my MS Access database. If that means dumping to JSON and then looping over the rows to insert them with SQL manually, whatever works; I'll take it.
For those still looking into this: basically, you can't use the pandas to_sql method for MS Access without a great deal of difficulty. If you are determined to do it this way, here is a link where someone fixed sqlalchemy's Access dialect (and presumably the OP's code would work with that Engine):
connecting sqlalchemy to MSAccess
The best way to get a data frame into MS Access is to build the INSERT statements from the records, then simply connect via pyodbc or pypyodbc and execute them with a cursor. You have to do inserts one at a time; it's probably best to break this up into chunks (around 5,000 rows) if you have a lot of data.
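A rough sketch of that approach, reusing the table and dataframe from the question (it assumes the FruitsAndPets table already exists in the .mdb with matching columns):
import pyodbc
import pandas as pd

df = pd.DataFrame({'Values': [1., 2., 3., 4.],
                   'FruitsAndPets': ["Apples", "Oranges", "Puppies", "Ducks"]})
conn = pyodbc.connect(r'DRIVER={Microsoft Access Driver (*.mdb)};DBQ=C:\database.mdb')
cursor = conn.cursor()
# Insert row by row; parameter markers avoid manual quoting
insert_sql = "INSERT INTO FruitsAndPets ([Values], [FruitsAndPets]) VALUES (?, ?)"
for _, row in df.iterrows():
    cursor.execute(insert_sql, row['Values'], row['FruitsAndPets'])
conn.commit()
conn.close()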
There is a short tutorial on the pypyodbc website for executing SQL commands and populating an Access database:
https://code.google.com/p/pypyodbc/wiki/pypyodbc_for_access_mdb_file
I also found this useful Python wiki article:
https://wiki.python.org/moin/Microsoft%20Access
It states that mxODBC also has the capability to work with MS Access. A long time ago, I believe I successfully used ADOdb to connect to MS Access as well.
A few years ago, SQLAlchemy had experimental support for Microsoft Access. I used it to move an Access database to MS SQL Server at the time. I used SQLAlchemy to autoload / reflect the database. It was super handy. I believe that code was in version 0.5. You can read a bit about what I did here.
