Querying SQLite database file in Google Colab - python

print ('Files in Drive:')
!ls drive/AI
Files in Drive:
database.sqlite
Reviews.csv
Untitled0.ipynb
fine_food_reviews.ipynb
Titanic.csv
When I run the above code in Google Colab, my SQLite file is clearly present in my Drive. But whenever I run a query against this file, it says:
import sqlite3
import pandas as pd

# using the SQLite table to read data
con = sqlite3.connect('database.sqlite')
# filtering only positive and negative reviews, i.e.
# not taking into consideration reviews with Score = 3
filtered_data = pd.read_sql_query("SELECT * FROM Reviews WHERE Score != 3", con)
DatabaseError: Execution failed on sql 'SELECT * FROM Reviews WHERE
Score != 3 ': no such table: Reviews

Below you will find code that covers the DB setup on the Colab VM, table creation, data insertion and data querying. Execute each code snippet in its own notebook cell.
Note, however, that this example only shows how to run the code on a non-persistent Colab VM. If you want to save your database to Google Drive, you have to mount your Drive first (source):
from google.colab import drive
drive.mount('/content/gdrive')
and then navigate to the appropriate directory on the mounted drive.
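For example, a minimal sketch of changing into the mounted folder (the AI folder mirrors the question and is only an assumption; use your own path under My Drive):
import os

# change into the mounted Drive folder that holds the database
# (the 'AI' folder is only an example path)
os.chdir('/content/gdrive/My Drive/AI')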
Step 1: Create DB
import sqlite3

conn = sqlite3.connect('SQLite_Python.db')  # You can create a new database by changing the name within the quotes
c = conn.cursor()  # The database file is saved in the current working directory of the Colab VM

# Create table - SqliteDb_developers
c.execute('''CREATE TABLE SqliteDb_developers
             ([id] INTEGER PRIMARY KEY, [name] text, [email] text, [joining_date] date, [salary] integer)''')
conn.commit()
Test whether the DB was created successfully:
!ls
Output:
sample_data SQLite_Python.db
Step 2: Insert Data Into DB
import sqlite3

try:
    sqliteConnection = sqlite3.connect('SQLite_Python.db')
    cursor = sqliteConnection.cursor()
    print("Successfully Connected to SQLite")

    sqlite_insert_query = """INSERT INTO SqliteDb_developers
                             (id, name, email, joining_date, salary)
                             VALUES (1, 'Python', 'MakesYou#Fly.com', '2020-01-01', 1000)"""

    count = cursor.execute(sqlite_insert_query)
    sqliteConnection.commit()
    print("Record inserted successfully into SqliteDb_developers table ", cursor.rowcount)
    cursor.close()
except sqlite3.Error as error:
    print("Failed to insert data into sqlite table", error)
finally:
    if (sqliteConnection):
        sqliteConnection.close()
        print("The SQLite connection is closed")
Output:
Successfully Connected to SQLite
Record inserted successfully into SqliteDb_developers table 1
The SQLite connection is closed
Step 3: Query DB
import sqlite3

conn = sqlite3.connect("SQLite_Python.db")
cur = conn.cursor()
cur.execute("SELECT * FROM SqliteDb_developers")
rows = cur.fetchall()

for row in rows:
    print(row)

conn.close()
Output:
(1, 'Python', 'MakesYou#Fly.com', '2020-01-01', 1000)
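With the table in place, the same pandas pattern from the original question works against this database; a small sketch (assuming pandas, as in the question):
import sqlite3
import pandas as pd

con = sqlite3.connect('SQLite_Python.db')
# read the newly created table into a DataFrame, mirroring the question's read_sql_query call
developers = pd.read_sql_query("SELECT * FROM SqliteDb_developers", con)
print(developers)
con.close()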

Try this instead and see which tables are actually in the file:
"SELECT name FROM sqlite_master WHERE type='table'"

Give your database file a shareable id, just like you did with Reviews.csv, and download it with PyDrive:
database_file=drive.CreateFile({'id':'your_sharable_id for sqlite file'})
database_file.GetContentFile('database.sqlite')
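Note that drive here is a PyDrive GoogleDrive client, not google.colab.drive; a minimal sketch of the usual Colab PyDrive setup (assuming the pydrive package is available) looks like:
from google.colab import auth
from oauth2client.client import GoogleCredentials
from pydrive.auth import GoogleAuth
from pydrive.drive import GoogleDrive

# authenticate the notebook user and build the PyDrive client used above
auth.authenticate_user()
gauth = GoogleAuth()
gauth.credentials = GoogleCredentials.get_application_default()
drive = GoogleDrive(gauth)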

If you are trying to access files from your Google Drive, you need to mount the drive first:
from google.colab import drive
drive.mount('/content/drive')
After you do this, right-click the file you intend to read in the Colab file browser, select 'Copy path', and paste it into the connection string; the copied path will be somewhere under /content/drive, for example:
con = sqlite3.connect('/content/drive/My Drive/AI/database.sqlite')
You can now read the file.

con = sqlite3.connect('database.sqlite')
filtered_data = pd.read_sql_query("SELECT * FROM Reviews WHERE Score != 3", con)
If you execute the step that extracts database.sqlite twice, you will end up with this type of error, so execute it exactly once, without any failure.
If you do get an error, remove the
database.sqlite
file and extract it again, this time making sure it runs through without any failure or error. This worked for me.

Related

How to extract data from Oracle database with AWS Glue and other AWS services

I am new to AWS Glue and other AWS services. I have a requirement to build an ETL framework for a project.
This is the high-level diagram. I want to understand whether, instead of creating 400 Glue pipelines, I can create a template-like job that is driven by reference data from a Postgres Aurora/MySQL database. I am familiar with Python.
Does anyone have any ideas on this? Any references or code examples?
We had a config master table in our MySQL DB. For convenience, its columns used source_table_name as the identifier to fetch the appropriate table column names/queries for CREATE STG TABLE, LOAD DATA INTO STG TABLE, INSERT/UPDATE INTO TARGET TABLE, etc.
We also split the INSERT/UPDATE into two different columns in the config master, since we were using ON DUPLICATE KEY to update existing records.
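A possible layout for such a config master table, with illustrative column names that match the row indices used below, could be:
create_config_master = """
CREATE TABLE CONFIG_MASTER (
    src_tbl_name                VARCHAR(128) PRIMARY KEY,  -- source table / landing file identifier
    stg_table_name              VARCHAR(128),
    tgt_table_name              VARCHAR(128),
    create_stg_table_qry        TEXT,  -- column definitions for the staging table
    load_data_stg_table_qry     TEXT,  -- LOAD DATA statement for the staging table
    insert_tgt_table_qry        TEXT,  -- full insert/update statement
    insert_tgt_table_qry_part_1 TEXT,  -- INSERT part, used with ON DUPLICATE KEY
    insert_tgt_table_qry_part_2 TEXT   -- UPDATE part, used with ON DUPLICATE KEY
)"""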
Get the source table name by processing the Lambda events, which will contain the landing file name.
Fetch all the required data from the config master for that source table name. It would be something like the following:
sql_query = "SELECT * FROM {0}.CONFIG_MASTER WHERE src_tbl_name = %s ".format(mydb)
cur.execute(sql_query, (source_fname))
result = cur.fetchall()
for row in result:
stg_table_name = row[1]
tgt_table_name = row[2]
create_stg_table_qry = row[3]
load_data_stg_table_qry = row[4]
insert_tgt_table_qry = row[5]
insert_tgt_table_qry_part_1 = row[6]
insert_tgt_table_qry_part_2 = row[7]
conn.commit()
cur.close()
Pass appropriate parameters to the generic functions as below:
create_stg_table(stg_table_name, create_stg_table_qry, load_data_stg_table_qry)
loaddata(tgt_table_name, insert_tgt_table_qry_part_1, insert_tgt_table_qry_part_2, stg_table_name)
The generic functions would be something like the ones below; this is for Aurora RDS, so please make changes as needed.
def create_stg_table(stg_table_name, create_stg_table_qry, load_data_stg_table_qry):
    cur, conn = connect()
    createStgTable1 = "DROP TABLE IF EXISTS {0}.{1}".format(mydb, stg_table_name)
    createStgTable2 = "CREATE TABLE {0}.{1} {2}".format(mydb, stg_table_name, create_stg_table_qry)
    loadQry = "LOAD DATA FROM S3 PREFIX 's3://' REPLACE INTO TABLE ...".format()
    cur.execute(createStgTable1)
    cur.execute(createStgTable2)
    cur.execute(loadQry)
    conn.commit()
    conn.close()
def loaddata(tgt_table_name, insert_tgt_table_qry_part_1, insert_tgt_table_qry_part_2, stg_table_name):
    cur, conn = connect()
    insertQry = "INSERT INTO target table, from the staging table query here"
    print(insertQry)
    cur.execute(insertQry)
    conn.commit()
    conn.close()
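The generic functions above assume a connect() helper that returns a cursor and a connection; a rough sketch for Aurora MySQL using PyMySQL (host, credentials and the mydb schema name are placeholders) could be:
import pymysql

mydb = 'etl_config'  # placeholder schema name used throughout the snippets above

def connect():
    # placeholder connection details for the Aurora MySQL instance that holds CONFIG_MASTER
    conn = pymysql.connect(host='aurora-cluster-endpoint',
                           user='etl_user',
                           password='etl_password',
                           database=mydb)
    return conn.cursor(), conn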
Hope this gives an idea.
Thanks

Python SQLite - fuse_hidden not deleted

I am trying to set up a Python script that fetches some data and stores it in a SQLite database. However, when I run the script a .fuse_hidden file is created.
On Windows no .fuse_hidden file is observed, but on Ubuntu one is generated on each call. The .fuse_hidden file seems to contain some form of SQL query with input and tables.
I can delete the files without error during runtime, but they are not deleted automatically. I make sure to close my connection to the DB when I am finished with the query.
lsof gives no information.
I am out of ideas on what to try next to get the files removed automatically. Any suggestions?
Testing
To confirm that nothing is wrong with the code, I made a simple script (assume there is an empty error.db):
import sqlite3

conn = sqlite3.connect("error.db")
cur = conn.cursor()

create_query = """
CREATE TABLE Errors (
    name TEXT
);"""

try:
    cur.execute(create_query)
except:
    pass

cur.execute("INSERT INTO Errors (name) VALUES(?)", ["Test2"])
conn.commit()

cur.close()
conn.close()

ORA-00942 when importing data from Oracle Database to Pandas

I am trying to import data from an Oracle database into a pandas DataFrame. Right now I am using:
import cx_Oracle
import pandas as pd
db_connection_string = '.../A1#server:port/servername'
con = cx_Oracle.connect(db_connection_string)
query = """SELECT*
FROM Salesdata"""
df = pd.read_sql(query, con=con)
and get the following error: DatabaseError: ORA-00942: Table or view doesn't exist
When I run a query to get the list of all tables:
cur = con.cursor()
cur.execute("SELECT table_name FROM dba_tables")

for row in cur:
    print(row)
The output looks like this:
('A$',)
('A$BD',)
('Salesdata',)
What am I doing wrong? I used this question as a starting point.
If I follow the comment's suggestion and print(query), I get:
SELECT*
FROM Salesdata
Getting ORA-00942 when running SELECT can have 2 possible causes:
The table does not exist: here you should make sure the table name is prefixed by the table owner (schema name) as in select * from owner_name.table_name. This is generally needed if the current Oracle user connected is not the table owner.
You don't have SELECT privileges on the table. Such a grant is generally needed when the connected Oracle user is not the table owner.
You need to check both.
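For example, reusing the cx_Oracle connection con from the question (owner_name is a placeholder for the actual schema owner):
# prefix the table with its owner's schema (placeholder name)
df = pd.read_sql("SELECT * FROM owner_name.Salesdata", con=con)

# check which schemas expose a SALESDATA table to the connected user
cur = con.cursor()
cur.execute("SELECT owner, table_name FROM all_tables WHERE table_name = 'SALESDATA'")
print(cur.fetchall())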

Load OSM data into PostgreSQL using Python-OGR

I want to load OSM data into a PostgreSQL database using a Python script. When I try the Python script below, it doesn't load the OSM data into the database. Can anyone guide me? I know how to load data into PostgreSQL using osmosis, but currently I am looking for something using Python.
import psycopg2
from osgeo import ogr

# connect to the database
connection = psycopg2.connect(user="postgres",
                              password="password",
                              host="localhost",
                              database="example")

# create cursor
cursor = connection.cursor()
cursor.execute("DROP TABLE IF EXISTS trial1")
cursor.execute("CREATE TABLE trial1 (id SERIAL PRIMARY KEY, geom Geometry)")
cursor.execute("CREATE INDEX trial1_index ON trial1 USING GIST(geom)")
print("Successfully created ")
connection.commit()

# define OSM file path
osm = ogr.Open("pedestrian.osm")
layer = osm.GetLayer(1)

# delete the existing contents of the table
cursor.execute("DELETE FROM trial1")
print(str(layer.GetFeatureCount()))

for i in range(layer.GetFeatureCount()):
    feature = layer.GetFeature(i)
    # Get feature geometry
    geometry = feature.GetGeometryRef()
    wkt = geometry.ExportToWkt()
    # Insert data into database
    cursor.execute("INSERT INTO trial1 (geom) VALUES (ST_GeomFromText(" + "'" + wkt + "', 4326))")
    print("Data inserted successfully")

connection.commit()
Expected result: this code should create a table in the database and also import data into it from the given file. Currently the code only creates the table; after table creation it doesn't raise an error, it just says Process finished with exit code 0.

using python 2.7 to query sqlite3 database and getting "sqlite3 operational error no such table"

My simple test code is listed below. I created the table already and can query it using the SQLite Manager add-on in Firefox, so I know the table and data exist. When I run the query in Python (and in the Python shell) I get the "no such table" error.
def TroyTest(self, acctno):
    conn = sqlite3.connect('TroyData.db')
    curs = conn.cursor()

    v1 = curs.execute('''
        SELECT acctvalue
        FROM balancedata
        WHERE acctno = ? ''', acctno)
    print v1
    conn.close()
When you pass SQLite a path that does not exist, it will happily create a new, empty database for you instead of telling you the file was not there. You then query that empty database and get a "no such table" error.
You are using a relative path to the database, meaning it will try to open the database in the current working directory, and that is probably not where you think it is.
The remedy is to use an absolute path instead:
conn = sqlite3.connect('/full/path/to/TroyData.db')
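If the database is meant to live next to the script, another option is to build the path from the script's own location rather than relying on the current working directory (a small sketch):
import os
import sqlite3

# resolve the database path relative to this script file, not the current working directory
db_path = os.path.join(os.path.dirname(os.path.abspath(__file__)), 'TroyData.db')
conn = sqlite3.connect(db_path)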
You need to loop over the cursor to see results:
curs.execute('''
    SELECT acctvalue
    FROM balancedata
    WHERE acctno = ? ''', (acctno,))

for row in curs:
    print row[0]
or call fetchone():
print curs.fetchone() # prints whole row tuple
The problem is the SQL statement: you must specify the DB name and then the table name...
'''SELECT * FROM db_name.table_name WHERE acctno = ? '''
