How to read a .db file in Python?

How to read a .db file in Python? - python

I have a excel file and want to store my excel file into a .db file. I have done that through sqlite. Now, I want to read my .db file through Python which I am unable to do as the code I have used says that the data is empty.
Below is the code:
df=pd.read_excel('filename.xlsx')
db='xyzDB'
conn=sqlite3.connect(db + '.sqlite')
c=conn.cursor()
table_list = [a for a in c.execute("SELECT name FROM sqlite_master WHERE type = 'Sheet1'")]
print(tablelist)
#another method
chunksize = 10000
for chunk in pd.read_excel('filename.xlsx', chunksize=chunksize):
chunk.columns = chunk.columns.str.replace(' ', '_') #replacing
chunk.to_sql(name='Sheet1', con=conn)
names = list(map(lambda x: x[0], c.description)) #Returns the column names
print(names)
for row in c:
print(row)
Note: have found these two codes from net and didn't understand the code. Would appreciate if you could guide me.

Try something like this ...
import pandas as pd
import sqlite3 as sq
# read csv into data frame
df=pd.read_csv('addresses.csv')
sql_data = 'addresses.sqlite'
conn = sq.connect(sql_data)
# write the data frame to the db
df.to_sql('addresses', conn, if_exists='replace', index=False)
conn.commit()
# read back from the database
print(pd.read_sql('select * from addresses', conn))
conn.close()

Related

Handle big files with python & pandas

Thanks for reading my post.
I need to deal with big files, let me give you more context, I extract some tables from a database convert those tables to CSV and after that, I convert them to JSON.
All that is to send the information to BigQuery.
Now my script works fine but I have a problem, some tables I extract are so so big one of them has 14 Gb, my problem is my server memory just has 8 Gb, exist any way to integrate some to my script to split or append the information ???
My script:
import pyodbc
import fileinput
import csv
import pandas as pd
import json
import os
import sys
conn = pyodbc.connect("Driver={SQL Server};"
"Server=TEST;"
"username=test;"
"password=12345;"
"Database=TEST;"
"Trusted_Connection=no;")
cursor = conn.cursor()
query = "SELECT * FROM placeholder where "
with open(r"D:\Test.txt") as file:
lines = file.readlines()
print(lines)
for user_input in lines:
result = query.replace("placeholder", user_input)
print(result)
sql_query = pd.read_sql(result,conn)
df = pd.DataFrame(sql_query)
user_inputs = user_input.strip("\n")
filename = os.path.join('D:\\', user_inputs + '.csv')
df.to_csv (filename, index = False, encoding='utf-8', sep = '~', quotechar = "`", quoting=csv.QUOTE_ALL)
print(filename)
filename_json = os.path.join('D:\\', user_inputs + '.jsonl')
csvFilePath = (filename)
jsonFilePath = (filename_json)
print(filename_json)
df_o = df.applymap(lambda x: x.strip() if isinstance(x, str) else x)
df_o.to_json(filename_json, orient = "records", lines = True, date_format = "iso", double_precision = 15, force_ascii = False, date_unit = 'ms', default_handler = str)
dir_name = "D:\\"
test = os.listdir(dir_name)
for item in test:
if item.endswith(".csv"):
os.remove(os.path.join(dir_name, item))
cursor.close()
conn.close()
I'm really new to python, I hope you can help me to integrate some into my script.
Really thanks so many guys !!!
Kind regards.

For large data sets you should avoid reading all of it at once and then writing it all at once. You should do partial reads and partial writes.
Since you are using BigQuery you should use paritions to limit the query output. Have some logic to update the partition offsets. For each partition you can generate one file per parition. In this case your output would be like output-1.csv, output-2.csv etc.
An example of using parition:
SELECT * FROM placeholder
WHERE transaction_date >= '2016-01-01'
As a bonus tip, avoid doing Select * as BigQuery is columnar storage system mentioning the columns you would want to read will significatnly improve the peformance.

Export Oracle database column headers with column names into csv

I am making a program that fetches column names and dumps the data into csv format.
Now everything is working just fine and data is being dumped into csv, the problem is,
I am not able to fetch headers into csv. If I open the exported csv file into excel, only data shows up not the column headers. How do I do that?
Here's my code:
import cx_Oracle
import csv
dsn_tns = cx_Oracle.makedsn(--Details--)
conn = cx_Oracle.connect(--Details--)
d = conn.cursor()
csv_file = open("profile.csv", "w")
writer = csv.writer(csv_file, delimiter=',', lineterminator="\n", quoting=csv.QUOTE_NONNUMERIC)
d.execute("""
select * from all_tab_columns where OWNER = 'ABBAS'
""")
tables_tu = d.fetchall()
for row in tables_tu:
writer.writerow(row)
conn.close()
csv_file.close()
What code do I use to export headers too in csv?

Place this just above your for loop:
writer.writerow(i[0] for i in d.description)
Because d.description is a read-only attribute containing 7-tuples that look like:
(name,
type_code,
display_size,
internal_size,
precision,
scale,
null_ok)

Create an excel file from BytesIO using python

I am using pandas library to store excel into bytesIO memory. Later, I am storing this bytesIO object into SQL Server as below-
df = pandas.DataFrame(data1, columns=['col1', 'col2', 'col3'])
output = BytesIO()
writer = pandas.ExcelWriter(output,engine='xlsxwriter')
df.to_excel(writer)
writer.save()
output.seek(0)
workbook = output.read()
#store into table
Query = '''
INSERT INTO [TABLE]([file]) VALUES(?)
'''
values = (workbook)
cursor = conn.cursor()
cursor.execute(Query, values)
cursor.close()
conn.commit()
#Create excel file.
Query1 = "select [file] from [TABLE] where [id] = 1"
result = conn.cursor().execute(Query1).fetchall()
print(result[0])
Now, I want to pull the BytesIO object back from table and create an excel file and store it locally. How Do I do it?

Finally, I got solution.Below are the steps performed:
Takes Dataframe and convert it to excel and store it in memory in BytesIO format.
Store BytesIO object in Database column having varbinary(max)
Pull the stored BytesIO object and create an excel file locally.
Python Code:
#Get Required data in DataFrame:
df = pandas.DataFrame(data1, columns=['col1', 'col2', 'col3'])
#Convert the data frame to Excel and store it in BytesIO object `workbook`:
output = BytesIO()
writer = pandas.ExcelWriter(output,engine='xlsxwriter')
df.to_excel(writer)
writer.save()
output.seek(0)
workbook = output.read()
#store into Database table
Query = '''
INSERT INTO [TABLE]([file]) VALUES(?)
'''
values = (workbook)
cursor = conn.cursor()
cursor.execute(Query, values)
cursor.close()
conn.commit()
#Retrieve the BytesIO object from Database
Query1 = "select [file] from [TABLE] where [id] = 1"
result = conn.cursor().execute(Query1).fetchall()
WriteObj = BytesIO()
WriteObj.write(result[0][0])
WriteObj.seek(0)
df = pandas.read_excel(WriteObj)
df.to_excel("outputFile.xlsx")

How to open and convert sqlite database to pandas dataframe

I have downloaded some datas as a sqlite database (data.db) and I want to open this database in python and then convert it into pandas dataframe.
This is so far I have done
import sqlite3
import pandas
dat = sqlite3.connect('data.db') #connected to database with out error
pandas.DataFrame.from_records(dat, index=None, exclude=None, columns=None, coerce_float=False, nrows=None)
But its throwing this error
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
File "/usr/local/lib/python2.7/dist-packages/pandas/core/frame.py", line 980, in from_records
coerce_float=coerce_float)
File "/usr/local/lib/python2.7/dist-packages/pandas/core/frame.py", line 5353, in _to_arrays
if not len(data):
TypeError: object of type 'sqlite3.Connection' has no len()
How to convert sqlite database to pandas dataframe

Despite sqlite being part of the Python Standard Library and is a nice and easy interface to SQLite databases, the Pandas tutorial states:
Note In order to use read_sql_table(), you must have the SQLAlchemy
optional dependency installed.
But Pandas still supports sqlite3 access if you want to avoid installing SQLAlchemy:
import sqlite3
import pandas as pd
# Create your connection.
cnx = sqlite3.connect('file.db')
df = pd.read_sql_query("SELECT * FROM table_name", cnx)
As stated here, but you need to know the name of the used table in advance.

The line
data = sqlite3.connect('data.db')
opens a connection to the database. There are no records queried up to this. So you have to execute a query afterward and provide this to the pandas DataFrame constructor.
It should look similar to this
import sqlite3
import pandas as pd
dat = sqlite3.connect('data.db')
query = dat.execute("SELECT * From <TABLENAME>")
cols = [column[0] for column in query.description]
results= pd.DataFrame.from_records(data = query.fetchall(), columns = cols)
I am not really firm with SQL commands, so you should check the correctness of the query. should be the name of the table in your database.

Parsing a sqlite .db into a dictionary of dataframes without knowing the table names:
def read_sqlite(dbfile):
import sqlite3
from pandas import read_sql_query, read_sql_table
with sqlite3.connect(dbfile) as dbcon:
tables = list(read_sql_query("SELECT name FROM sqlite_master WHERE type='table';", dbcon)['name'])
out = {tbl : read_sql_query(f"SELECT * from {tbl}", dbcon) for tbl in tables}
return out

Search sqlalchemy, engine and database name in google (sqlite in this case):
import pandas as pd
import sqlalchemy
db_name = "data.db"
table_name = "LITTLE_BOBBY_TABLES"
engine = sqlalchemy.create_engine("sqlite:///%s" % db_name, execution_options={"sqlite_raw_colnames": True})
df = pd.read_sql_table(table_name, engine)

I wrote a piece of code up that saves tables in a database file such as .sqlite or .db and creates an excel file out of it with each table as a sheet or makes individual tables into csvs.
Note: You don't need to know the table names in advance!
import os, fnmatch
import sqlite3
import pandas as pd
#creates a directory without throwing an error
def create_dir(dir):
if not os.path.exists(dir):
os.makedirs(dir)
print("Created Directory : ", dir)
else:
print("Directory already existed : ", dir)
return dir
#finds files in a directory corresponding to a regex query
def find(pattern, path):
result = []
for root, dirs, files in os.walk(path):
for name in files:
if fnmatch.fnmatch(name, pattern):
result.append(os.path.join(root, name))
return result
#convert sqlite databases(.db,.sqlite) to pandas dataframe(excel with each table as a different sheet or individual csv sheets)
def save_db(dbpath=None,excel_path=None,csv_path=None,extension="*.sqlite",csvs=True,excels=True):
if (excels==False and csvs==False):
print("Atleast one of the parameters need to be true: csvs or excels")
return -1
#little code to find files by extension
if dbpath==None:
files=find(extension,os.getcwd())
if len(files)>1:
print("Multiple files found! Selecting the first one found!")
print("To locate your file, set dbpath=<yourpath>")
dbpath = find(extension,os.getcwd())[0] if dbpath==None else dbpath
print("Reading database file from location :",dbpath)
#path handling
external_folder,base_name=os.path.split(os.path.abspath(dbpath))
file_name=os.path.splitext(base_name)[0] #firstname without .
exten=os.path.splitext(base_name)[-1] #.file_extension
internal_folder="Saved_Dataframes_"+file_name
main_path=os.path.join(external_folder,internal_folder)
create_dir(main_path)
excel_path=os.path.join(main_path,"Excel_Multiple_Sheets.xlsx") if excel_path==None else excel_path
csv_path=main_path if csv_path==None else csv_path
db = sqlite3.connect(dbpath)
cursor = db.cursor()
cursor.execute("SELECT name FROM sqlite_master WHERE type='table';")
tables = cursor.fetchall()
print(len(tables),"Tables found :")
if excels==True:
#for writing to excel(xlsx) we will be needing this!
try:
import XlsxWriter
except ModuleNotFoundError:
!pip install XlsxWriter
if (excels==True and csvs==True):
writer = pd.ExcelWriter(excel_path, engine='xlsxwriter')
i=0
for table_name in tables:
table_name = table_name[0]
table = pd.read_sql_query("SELECT * from %s" % table_name, db)
i+=1
print("Parsing Excel Sheet ",i," : ",table_name)
table.to_excel(writer, sheet_name=table_name, index=False)
print("Parsing CSV File ",i," : ",table_name)
table.to_csv(os.path.join(csv_path,table_name + '.csv'), index_label='index')
writer.save()
elif excels==True:
writer = pd.ExcelWriter(excel_path, engine='xlsxwriter')
i=0
for table_name in tables:
table_name = table_name[0]
table = pd.read_sql_query("SELECT * from %s" % table_name, db)
i+=1
print("Parsing Excel Sheet ",i," : ",table_name)
table.to_excel(writer, sheet_name=table_name, index=False)
writer.save()
elif csvs==True:
i=0
for table_name in tables:
table_name = table_name[0]
table = pd.read_sql_query("SELECT * from %s" % table_name, db)
i+=1
print("Parsing CSV File ",i," : ",table_name)
table.to_csv(os.path.join(csv_path,table_name + '.csv'), index_label='index')
cursor.close()
db.close()
return 0
save_db();

If data.db is your SQLite database and table_name is one of its tables, then you can do:
import pandas as pd
df = pd.read_sql_table('table_name', 'sqlite:///data.db')
No other imports needed.

i have stored my data in database.sqlite table name is Reviews
import sqlite3
con=sqlite3.connect("database.sqlite")
data=pd.read_sql_query("SELECT * FROM Reviews",con)
print(data)

Python csv from database query adding a custom column to csv file

here is what I try to achieve my current code is working fine I get the query to run on my sql server but I will need to gather information from several servers. How would I add a column with the dbserver listed in that column?
import pyodbc
import csv
f = open("dblist.ini")
dbserver,UID,PWD = [ variable[variable.find("=")+1 :] for variable in f.readline().split("~")]
connectstring = "DRIVER={SQL server};SERVER=" + dbserver + ";DATABASE=master;UID="+UID+";PWD="+PWD
cnxn = pyodbc.connect(connectstring)
cursor = cnxn.cursor()
fd = open('mssql1.txt', 'r')
sqlFile = fd.read()
fd.close()
cursor.execute(sqlFile)
with open("out.csv", "wb") as csv_file:
csv_writer = csv.writer(csv_file, delimiter = '!')
csv_writer.writerow([i[0] for i in cursor.description]) # write headers
csv_writer.writerows(cursor)

You could add the extra information in your sql query. For example:
select "dbServerName", * from table;
Your cursor will return with an extra column in front of your real data that has the db Server name. The downside to this method is you're transferring a little more extra data.

We Keep Coding

Python is a programming language that lets you work quickly and integrate systems more effectively.

How to read a .db file in Python? - python

Related

Handle big files with python & pandas

Export Oracle database column headers with column names into csv

Create an excel file from BytesIO using python

How to open and convert sqlite database to pandas dataframe

Python csv from database query adding a custom column to csv file

Categories

Resources