I am trying to import data from an Oracle database into a pandas DataFrame.
Right now I am using:
import cx_Oracle
import pandas as pd
db_connection_string = '.../A1#server:port/servername'
con = cx_Oracle.connect(db_connection_string)
query = """SELECT*
FROM Salesdata"""
df = pd.read_sql(query, con=con)
and get the following error: DatabaseError: ORA-00942: table or view does not exist
When I run a query to get the list of all tables:
cur = con.cursor()
cur.execute("SELECT table_name FROM dba_tables")
for row in cur:
print (row)
The output looks like this:
('A$',)
('A$BD',)
('Salesdata',)
What am I doing wrong? I used this question as a starting point.
If I follow the comment's suggestion and print(query), I get:
SELECT *
FROM Salesdata
Getting ORA-00942 when running a SELECT can have two possible causes:
The table does not exist: here you should make sure the table name is prefixed with the table owner (schema name), as in select * from owner_name.table_name. This is generally needed when the connected Oracle user is not the table owner.
You don't have SELECT privileges on the table. This is also common when the connected Oracle user is not the table owner.
You need to check both.
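In the asker's code, a quick way to check both is to look up the owner in all_tables (which lists only the tables the connected user can access) and then qualify the query with that owner. The sketch below assumes the owning schema is called SALES, which is a placeholder; note also that the mixed-case name 'Salesdata' in the dba_tables output suggests the table was created with a quoted identifier, so it may need to be referenced in double quotes.
cur = con.cursor()
# Find the owning schema; the name is stored exactly as created (mixed-case here)
cur.execute("SELECT owner, table_name FROM all_tables WHERE table_name = 'Salesdata'")
print(cur.fetchall())
# Qualify the query with the owner; SALES is a placeholder schema name,
# and the double quotes are needed because the identifier is mixed-case
query = 'SELECT * FROM SALES."Salesdata"'
df = pd.read_sql(query, con=con)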
My MS SQL DB table has the following structure:
create table TEMP
(
MyXMLFile XML
)
Using Python, I am trying to load a locally stored .XML file into the MS SQL DB (no XML parsing required).
Following is the Python code:
import pyodbc
import xlrd
import xml.etree.ElementTree as ET
print("Connecting..")
# Establish a connection between Python and SQL Server
conn = pyodbc.connect('Driver={SQL Server};'
'Server=TEST;'
'Database=test;'
'Trusted_Connection=yes;')
print("DB Connected..")
# Get XMLFile
XMLFilePath = open('C:HelloWorld.xml')
# Create Table in DB
CreateTable = """
create table test.dbo.TEMP
(
XBRLFile XML
)
"""
# execute create table
cursor = conn.cursor()
try:
cursor.execute(CreateTable)
conn.commit()
except pyodbc.ProgrammingError:
pass
print("Table Created..")
InsertQuery = """
INSERT INTO test.dbo.TEMP (
XBRLFile
) VALUES (?)"""
# Assign values from each row
values = (XMLFilePath)
# Execute SQL Insert Query
cursor.execute(InsertQuery, values)
# Commit the transaction
conn.commit()
# Close the database connection
conn.close()
But the code is storing the XML path in the MyXMLFile column and not the XML file itself. I referred to the lxml library and other tutorials, but I did not come across a straightforward approach to storing the file.
Can anyone please help me with this? I have just started working with Python.
Here is a solution to load the .XML file directly into the MS SQL DB using Python.
import pyodbc
import xlrd
from lxml import etree  # the etree.parse / etree.tostring calls below use lxml
print("Connecting..")
# Establish a connection between Python and SQL Server
conn = pyodbc.connect('Driver={SQL Server};'
'Server=TEST;'
'Database=test;'
'Trusted_Connection=yes;')
print("DB Connected..")
# Get XMLFile
XMLFilePath = open('C:HelloWorld.xml')
x = etree.parse(XMLFilePath) # Updated Code line (uses the variable defined above)
with open("FileName", "wb") as f: # Updated Code line
f.write(etree.tostring(x)) # Updated Code line
# Create Table in DB
CreateTable = """
create table test.dbo.TEMP
(
XBRLFile XML
)
"""
# execute create table
cursor = conn.cursor()
try:
cursor.execute(CreateTable)
conn.commit()
except pyodbc.ProgrammingError:
pass
print("Table Created..")
InsertQuery = """
INSERT INTO test.dbo.TEMP (
XBRLFile
) VALUES (?)"""
# Assign values from each row
values = etree.tostring(x) # Updated Code line
# Execute SQL Insert Query
cursor.execute(InsertQuery, values)
# Commit the transaction
conn.commit()
# Close the database connection
conn.close()
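If no parsing is needed at all, a simpler variant is to read the raw file contents and bind that string as the parameter; this is only a sketch, it assumes the file is well-formed XML, and the path mirrors the one used in the question:
# Read the raw XML text; r'C:\HelloWorld.xml' is an assumed absolute path
with open(r'C:\HelloWorld.xml', 'r', encoding='utf-8') as f:
    xml_text = f.read()
# If the file carries an explicit encoding declaration, SQL Server may refuse the
# implicit string-to-xml conversion; stripping the declaration is one workaround
cursor.execute("INSERT INTO test.dbo.TEMP (XBRLFile) VALUES (?)", xml_text)
conn.commit()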
From this source I learned how to read lat/long coordinates from a CSV file and do the reverse geocoding, but when I try to do the same using a SQL table as the source instead of a CSV file, I get a syntax error.
Reading data from the SQL table:
import pyodbc
import urllib.request
import urllib.parse
import json
cnxn = pyodbc.connect('DRIVER={SQL Server};SERVER=localhost;DATABASE=mydb;UID=test;PWD=abc#123;autocommit=True')
cursor = cnxn.cursor()
cursor.execute("select GEOCODE_ID, lat, longt from GEOCODE_TBL where JSON_str is NULL")
ID=[]
px_val=[]
py_val=[]
for row in cursor.fetchall():
ID.append(row[0])
px_val.append(row[1])
py_val.append(row[2])
# code to update the output from Google reverse geocoding (JSON string) into the table
for i in range(1,len(px_val)):
wp = urllib.request.urlopen("http://maps.googleapis.com/maps/api/geocode/json?latlng={0},{1}&sensor=false".format(px_val[i],py_val[i]))
pw=wp.read().decode('utf-8')
pw = json.loads(wp)
cursor.execute("UPDATE GEOCODE_TBL SET JSON_str = ? WHERE GEOCODE_ID = ?", str(pw[i]),str(ID(i)))
print('Done')
cnxn.commit()
Any help on this will be appreciated.
Thanks.
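For reference, here is a minimal sketch of how the update loop might look once the obvious issues are fixed: json.loads should receive the decoded text pw rather than the response object, ID[i] is list indexing rather than a call, and range(1, ...) skips the first row. Whether to store the raw JSON text or the parsed result is an assumption here:
for i in range(len(px_val)):
    url = ("http://maps.googleapis.com/maps/api/geocode/json"
           "?latlng={0},{1}&sensor=false".format(px_val[i], py_val[i]))
    pw = urllib.request.urlopen(url).read().decode('utf-8')  # raw JSON text
    result = json.loads(pw)                                   # parsed, if it needs inspecting
    cursor.execute("UPDATE GEOCODE_TBL SET JSON_str = ? WHERE GEOCODE_ID = ?",
                   pw, ID[i])
cnxn.commit()
print('Done')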
I understand how to open a database using a Python script:
import sqlite3 as lite
.
.
con = lite.connect(db_name)
However, what I have is a lot of database files made with tsk_loaddb. When I am in the sqlite3 shell I can use the following:
SQLite version 3.8.7.2 2014-11-18 20:57:56
Enter ".help" for usage hints.
Connected to a transient in-memory database.
Use ".open FILENAME" to reopen on a persistent database.
sqlite> .open /work/jmjohnso1/db_project/db_IN60-1020
sqlite> .tables
blackboard_artifact_types tsk_files_derived_method
blackboard_artifacts tsk_files_path
blackboard_attribute_types tsk_fs_info
blackboard_attributes tsk_image_info
tsk_db_info tsk_image_names
tsk_file_layout tsk_objects
tsk_files tsk_vs_info
tsk_files_derived tsk_vs_parts
However, I do not understand how to do this in Python. My question is: how do I translate the .open command to the sqlite3 Python library?
It turns out I had a newline in the name because I was reading it from a file. So the following works:
import os
import sqlite3 as lite

def open_sql(db_folder, db_name, table):
    # databases are located at /work/jmjohnso1/db_project
    path_name = os.path.join(db_folder, db_name).strip()  # strip() removes the stray newline
    con = lite.connect(path_name)
    cur = con.cursor()
    cur.execute('SELECT * FROM ' + table)
    col_names = [cn[0] for cn in cur.description]
    rows = cur.fetchall()
    print_header(col_names)  # helper functions defined elsewhere in the script
    print_output(rows)
    con.close()
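In other words, sqlite3.connect(path) is the Python counterpart of .open FILENAME in the shell, and the .tables listing can be reproduced, roughly, by querying sqlite_master; a short sketch using the path from the question:
import sqlite3 as lite

con = lite.connect('/work/jmjohnso1/db_project/db_IN60-1020')
cur = con.cursor()
cur.execute("SELECT name FROM sqlite_master WHERE type = 'table' ORDER BY name")
print([row[0] for row in cur.fetchall()])  # rough equivalent of .tables
con.close()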
The documentation for Pandas has numerous examples of best practices for working with data stored in various formats.
However, I am unable to find any good examples of working with databases such as MySQL.
Can anyone point me to links or give some code snippets of how to convert query results using mysql-python to data frames in pandas efficiently?
As Wes says, io/sql's read_sql will do it, once you've gotten a database connection using a DBI-compatible library. We can look at two short examples using the cx_Oracle and MySQLdb libraries to connect to Oracle and MySQL and query their data dictionaries. Here is the example for cx_Oracle:
import pandas as pd
import cx_Oracle
ora_conn = cx_Oracle.connect('your_connection_string')
df_ora = pd.read_sql('select * from user_objects', con=ora_conn)
print('loaded dataframe from Oracle. # Records:', len(df_ora))
ora_conn.close()
And here is the equivalent example for MySQLdb:
import MySQLdb
mysql_cn= MySQLdb.connect(host='myhost',
port=3306,user='myusername', passwd='mypassword',
db='information_schema')
df_mysql = pd.read_sql('select * from VIEWS;', con=mysql_cn)
print('loaded dataframe from MySQL. records:', len(df_mysql))
mysql_cn.close()
For recent readers of this question: pandas has the following warning in its docs for version 0.14.0:
Warning: Some of the existing functions or function aliases have been
deprecated and will be removed in future versions. This includes:
tquery, uquery, read_frame, frame_query, write_frame.
And:
Warning: The support for the ‘mysql’ flavor when using DBAPI connection objects has
been deprecated. MySQL will be further supported with SQLAlchemy
engines (GH6900).
This makes many of the answers here outdated. You should use SQLAlchemy:
from sqlalchemy import create_engine
import pandas as pd
engine = create_engine('dialect://user:pass@host:port/schema', echo=False)
f = pd.read_sql_query('SELECT * FROM mytable', engine, index_col='ID')
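Since the question is about MySQL, a concrete engine URL might look like the sketch below; mysql+pymysql assumes the PyMySQL driver is installed (mysqlclient works the same way), and the host, credentials, and table name are placeholders:
from sqlalchemy import create_engine
import pandas as pd

engine = create_engine('mysql+pymysql://user:password@localhost:3306/mydatabase', echo=False)
df = pd.read_sql_query('SELECT * FROM mytable', engine, index_col='ID')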
For the record, here is an example using a sqlite database:
import pandas as pd
import sqlite3
with sqlite3.connect("whatever.sqlite") as con:
sql = "SELECT * FROM table_name"
df = pd.read_sql_query(sql, con)
print(df.shape)
I prefer to create queries with SQLAlchemy and then make a DataFrame from them. SQLAlchemy makes it easier to combine SQL conditions Pythonically if you intend to mix and match things over and over.
from sqlalchemy.ext.declarative import declarative_base
from sqlalchemy import Table
from sqlalchemy import create_engine
from sqlalchemy.orm import sessionmaker
from pandas import DataFrame
import datetime
# We are connecting to an existing service
engine = create_engine('dialect://user:pwd@host:port/db', echo=False)
Session = sessionmaker(bind=engine)
session = Session()
Base = declarative_base()
# And we want to query an existing table
tablename = Table('tablename',
Base.metadata,
autoload=True,
autoload_with=engine,
schema='ownername')
# These are the "Where" parameters, but I could as easily
# create joins and limit results
us = tablename.c.country_code.in_(['US','MX'])
dc = tablename.c.locn_name.like('%DC%')
dt = tablename.c.arr_date >= datetime.date.today() # Give me convenience or...
q = session.query(tablename).\
filter(us & dc & dt) # That's where the magic happens!!!
def querydb(query):
"""
Function to execute query and return DataFrame.
"""
df = DataFrame(query.all());
df.columns = [x['name'] for x in query.column_descriptions]
return df
querydb(q)
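Depending on the pandas version, a shorter alternative, not part of the original answer, is to hand the query's SELECT statement to read_sql directly instead of going through querydb:
import pandas as pd

# q.statement is the underlying SELECT; the engine created above serves as the connectable
df = pd.read_sql(q.statement, engine)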
MySQL example:
import MySQLdb as db
from pandas import DataFrame
from pandas.io.sql import frame_query
database = db.connect('localhost','username','password','database')
data = frame_query("SELECT * FROM data", database)
The same syntax works for MS SQL Server using pyodbc as well.
import pyodbc
import pandas.io.sql as psql
cnxn = pyodbc.connect('DRIVER={SQL Server};SERVER=servername;DATABASE=mydb;UID=username;PWD=password')
cursor = cnxn.cursor()
sql = ("""select * from mytable""")
df = psql.frame_query(sql, cnxn)
cnxn.close()
And this is how you connect to PostgreSQL using the psycopg2 driver (install it with "apt-get install python-psycopg2" if you're on a Debian-derived Linux OS).
import pandas.io.sql as psql
import psycopg2
conn = psycopg2.connect("dbname='datawarehouse' user='user1' host='localhost' password='uberdba'")
q = """select month_idx, sum(payment) from bi_some_table"""
df3 = psql.frame_query(q, conn)
For Sybase the following works (with http://python-sybase.sourceforge.net)
import pandas.io.sql as psql
import Sybase
df = psql.frame_query("<Query>", con=Sybase.connect("<dsn>", "<user>", "<pwd>"))
pandas.io.sql.frame_query is deprecated. Use pandas.read_sql instead.
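For example, any of the frame_query calls above can be swapped for read_sql with the same connection object; a minimal sketch:
import pandas as pd

# conn is any of the DBAPI connections used in the examples above
df = pd.read_sql("SELECT * FROM mytable", conn)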
Import the module:
import pandas as pd
import oursql
Connect:
conn=oursql.connect(host="localhost",user="me",passwd="mypassword",db="classicmodels")
sql="Select customerName, city,country from customers order by customerName,country,city"
df_mysql = pd.read_sql(sql,conn)
print(df_mysql)
That works just fine, and the same also works using pandas.io.sql's frame_query (with the deprecation warning). The database used is the sample database from the MySQL tutorial.
This should work just fine.
import MySQLdb as mdb
import pandas as pd
con = mdb.connect('127.0.0.1', 'root', 'password', 'database_name')
with con:
cur = con.cursor()
cur.execute("select random_number_one, random_number_two, random_number_three from randomness.a_random_table")
rows = cur.fetchall()
df = pd.DataFrame( [[ij for ij in i] for i in rows] )
df.rename(columns={0: 'Random Number One', 1: 'Random Number Two', 2: 'Random Number Three'}, inplace=True)
print(df.head(20))
This helped me connect to AWS MySQL (RDS) from a Python 3.x based Lambda function and load the result into a pandas DataFrame:
import json
import boto3
import pymysql
import pandas as pd
user = 'username'
password = 'XXXXXXX'
client = boto3.client('rds')
def lambda_handler(event, context):
conn = pymysql.connect(host='xxx.xxxxus-west-2.rds.amazonaws.com', port=3306, user=user, passwd=password, db='database name', connect_timeout=5)
df= pd.read_sql('select * from TableName limit 10',con=conn)
print(df)
# TODO implement
#return {
# 'statusCode': 200,
# 'df': df
#}
For Postgres users
import psycopg2
import pandas as pd
conn = psycopg2.connect("dbname='datawarehouse' user='user1' host='localhost' password='uberdba'")
customers = 'select * from customers'
customers_df = pd.read_sql(customers,conn)
customers_df