csv into sqlite table python


Using python, I am trying to import a csv into an sqlite table and use the headers in the csv file to become the headers in the sqlite table. The code runs but the table "MyTable" does not appear to be created. Here is the code:
with open ('dict_output.csv', 'r') as f:
    reader = csv.reader(f)
    columns = next(reader)
    #Strips white space in header
    columns = [h.strip() for h in columns]
    #reader = csv.DictReader(f, fieldnames=columns)
    for row in reader:
        print(row)
    con = sqlite3.connect("city_spec.db")
    cursor = con.cursor()
    #Inserts data from csv into table in sql database.
    query = 'insert into MyTable({0}) values ({1})'
    query = query.format(','.join(columns), ','.join('?' * len(columns)))
    print(query)
    cursor = con.cursor()
    for row in reader:
        cursor.execute(query, row)
    #cursor.commit()
    con.commit()
    con.close()
Thanks in advance for any help.

You can use Pandas to make this easy (you may need to pip install pandas first):
import sqlite3
import pandas as pd
# load data
df = pd.read_csv('dict_output.csv')
# strip whitespace from headers
df.columns = df.columns.str.strip()
con = sqlite3.connect("city_spec.db")
# drop data into database
df.to_sql("MyTable", con)
con.close()
Pandas will do all of the hard work for you, including creating the actual table!
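To confirm that the table really exists after an import, you can ask SQLite itself. This is a minimal sketch using only the standard library; the helper name list_tables and the in-memory demo table are assumptions for illustration, and in practice you would connect to city_spec.db instead.

```python
import sqlite3


def list_tables(con):
    """Return the names of all tables in the connected SQLite database."""
    rows = con.execute(
        "SELECT name FROM sqlite_master WHERE type = 'table' ORDER BY name"
    ).fetchall()
    return [name for (name,) in rows]


# Demonstration on a throwaway in-memory database (hypothetical columns).
con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE MyTable (city TEXT, population TEXT)")
tables = list_tables(con)
print(tables)
```

If MyTable is missing from the returned list, the import never created it.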

You haven't marked your answer solved yet so here goes.
Connect to the database just once, and create a cursor just once.
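You can read the csv records only once: a csv.reader is exhausted after one pass, so the print loop in your code leaves nothing for the insert loop.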
I've added code that creates a crude form of the database table based on the column names alone. Again, this is done just once in the loop.
Your insertion code works fine.
import sqlite3
import csv
con = sqlite3.connect("city_spec.sqlite") ## these statements belong outside the loop
cursor = con.cursor() ## execute them just once
first = True
with open ('dict_output.csv', 'r') as f:
    reader = csv.reader(f)
    columns = next(reader)
    columns = [h.strip() for h in columns]
    if first:
        sql = 'CREATE TABLE IF NOT EXISTS MyTable (%s)' % ', '.join(['%s text'%column for column in columns])
        print (sql)
        cursor.execute(sql)
        first = False
    #~ for row in reader: ## we will read the rows later in the loop
    #~     print(row)
    query = 'insert into MyTable({0}) values ({1})'
    query = query.format(','.join(columns), ','.join('?' * len(columns)))
    print(query)
    for row in reader:
        cursor.execute(query, row)
    con.commit()
con.close()
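The same idea can be written a little more compactly with executemany, which consumes the reader in a single pass. This is only a sketch under a few assumptions: every column is stored as TEXT, the helper names quote_identifier and load_csv are made up here, and an in-memory file and database stand in for dict_output.csv and city_spec.db.

```python
import csv
import io
import sqlite3


def quote_identifier(name):
    """Quote a table or column name for SQLite, doubling embedded quotes."""
    return '"' + name.replace('"', '""') + '"'


def load_csv(con, csv_file, table):
    """Create `table` from the CSV header (all TEXT) and insert every row."""
    reader = csv.reader(csv_file)
    columns = [h.strip() for h in next(reader)]
    table_sql = quote_identifier(table)
    col_defs = ", ".join(quote_identifier(c) + " TEXT" for c in columns)
    col_list = ", ".join(quote_identifier(c) for c in columns)
    placeholders = ", ".join("?" * len(columns))
    con.execute(f"CREATE TABLE IF NOT EXISTS {table_sql} ({col_defs})")
    # executemany pulls rows straight from the reader, so they are read once.
    con.executemany(
        f"INSERT INTO {table_sql} ({col_list}) VALUES ({placeholders})", reader
    )
    con.commit()


# Hypothetical sample data standing in for dict_output.csv.
sample = io.StringIO(" city , population\nOslo,700000\nBergen,285000\n")
con = sqlite3.connect(":memory:")
load_csv(con, sample, "MyTable")
rows = con.execute("SELECT city, population FROM MyTable ORDER BY city").fetchall()
print(rows)
```

Quoting the header names keeps the statement valid even when a column name contains spaces or punctuation.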

You can also do this easily with the peewee ORM. You only need one extension from peewee, playhouse.csv_loader:
from playhouse.csv_loader import *
db = SqliteDatabase('city_spec.db')
Test = load_csv(db, 'dict_output.csv')
This creates the database city_spec.db with the CSV headers as fields and the data from dict_output.csv as rows.
If you don't have peewee, you can install it with
pip install peewee

Related

How to insert a Pandas Dataframe into MySql using PyMySQL

I have got a DataFrame which has got around 30,000+ rows and 150+ columns. So, currently I am using the following code to insert the data into MySQL. But since it is reading the rows one at a time, it is taking too much time to insert all the rows into MySql.
Is there any way in which I can insert the rows all at once or in batches? The constraint here is that I need to use only PyMySQL, I cannot install any other library.
import pymysql
import pandas as pd
# Create dataframe
data = pd.DataFrame({
'book_id':[12345, 12346, 12347],
'title':['Python Programming', 'Learn MySQL', 'Data Science Cookbook'],
'price':[29, 23, 27]
})
# Connect to the database
connection = pymysql.connect(host='localhost',
user='root',
password='12345',
db='book')
# create cursor
cursor=connection.cursor()
# creating column list for insertion
cols = "`,`".join([str(i) for i in data.columns.tolist()])
# Insert DataFrame records one by one.
for i,row in data.iterrows():
    sql = "INSERT INTO `book_details` (`" +cols + "`) VALUES (" + "%s,"*(len(row)-1) + "%s)"
    cursor.execute(sql, tuple(row))
    # the connection is not autocommitted by default, so we must commit to save our changes
    connection.commit()
# Execute query
sql = "SELECT * FROM `book_details`"
cursor.execute(sql)
# Fetch all the records
result = cursor.fetchall()
for i in result:
    print(i)
connection.close()
Thank You.

Try using SQLAlchemy to create an Engine that you can later pass to the pandas df.to_sql function. This function writes the rows of a pandas DataFrame to a SQL database, and it is much faster than iterating over your DataFrame and using the MySQL cursor.
Your code would look something like this:
import pymysql
import pandas as pd
from sqlalchemy import create_engine
# Create dataframe
data = pd.DataFrame({
'book_id':[12345, 12346, 12347],
'title':['Python Programming', 'Learn MySQL', 'Data Science Cookbook'],
'price':[29, 23, 27]
})
# '@' separates the credentials from the host; the pymysql driver matches the import above
db_data = 'mysql+pymysql://' + 'root' + ':' + '12345' + '@' + 'localhost' + ':3306/' \
    + 'book' + '?charset=utf8mb4'
engine = create_engine(db_data)
# Connect to the database
connection = pymysql.connect(host='localhost',
user='root',
password='12345',
db='book')
# create cursor
cursor=connection.cursor()
# Execute to_sql to write the DataFrame into SQL
data.to_sql('book_details', engine, if_exists='append', index=False)
# Execute query
sql = "SELECT * FROM `book_details`"
cursor.execute(sql)
# Fetch all the records
result = cursor.fetchall()
for i in result:
    print(i)
engine.dispose()
connection.close()
You can look at all the options this function has in the pandas documentation.

It is faster to push a file to the SQL server and let the server manage the input.
So first push the data to a CSV file.
data.to_csv("import-data.csv", header=False, index=False, quoting=2, na_rep="\\N")
And then load it at once into the SQL table.
sql = "LOAD DATA LOCAL INFILE \'import-data.csv\' \
INTO TABLE book_details FIELDS TERMINATED BY \',\' ENCLOSED BY \'\"\' \
(`" +cols + "`)"
cursor.execute(sql)

Possible improvements:
Remove or disable indexes on the table(s) before loading.
Take the commit out of the loop so that all rows are committed once.
Then try loading the data again.
Alternatively, generate a CSV file and load it using LOAD DATA INFILE, issued from within MySQL.
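The "commit once" advice can be combined with executemany, which is part of the Python DB-API and is also offered by PyMySQL cursors. Below is a rough sketch, not a benchmark: the helper insert_in_batches is a made-up name, and sqlite3 stands in for MySQL so the example runs anywhere. With PyMySQL the placeholders would be %s instead of ?, and the rows could come from data.itertuples(index=False, name=None).

```python
import sqlite3
from itertools import islice


def insert_in_batches(connection, sql, rows, batch_size=1000):
    """Insert rows in chunks with executemany and commit once at the end."""
    cursor = connection.cursor()
    iterator = iter(rows)
    total = 0
    while True:
        batch = list(islice(iterator, batch_size))
        if not batch:
            break
        cursor.executemany(sql, batch)
        total += len(batch)
    connection.commit()  # a single commit, outside the loop
    cursor.close()
    return total


# sqlite3 is only a stand-in here so that the sketch is runnable as-is.
connection = sqlite3.connect(":memory:")
connection.execute(
    "CREATE TABLE book_details (book_id INTEGER, title TEXT, price INTEGER)"
)
books = [
    (12345, "Python Programming", 29),
    (12346, "Learn MySQL", 23),
    (12347, "Data Science Cookbook", 27),
]
inserted = insert_in_batches(
    connection,
    "INSERT INTO book_details (book_id, title, price) VALUES (?, ?, ?)",
    books,
    batch_size=2,
)
stored = connection.execute("SELECT COUNT(*) FROM book_details").fetchone()[0]
print(inserted, stored)
```

The batch size is a tuning knob; larger batches mean fewer round trips but bigger statements.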

sql queries and python

I am trying to use sqlite3 and python3 on the CSV file to extract the booking_id table for some specific data. But I am getting a KeyError which means the requested table is not in the dictionary. I don't get it.
import csv
import sqlite3
con = sqlite3.connect(":memory:")
cur = con.cursor()
cur.execute("CREATE TABLE t (booking_id,customer_id,source,status,checkin,checkout,oyo_rooms,hotel_id,amount,discount,date,PRIMARY KEY(booking_id))")
with open('TableA.csv', 'r') as fin:
    dr = csv.DictReader(fin, delimiter='\t')
    to_db = [(i['booking_id'], i['customer_id'], i['source'], i['status'], i['checkin'], i['checkout'],
              i['oyo_rooms'], i['hotel_id'], i['amount'], i['discount'], i['date']) for i in dr]
cur.executemany(
    "INSERT INTO t (booking_id,customer_id,source,status,checkin,checkout,oyo_rooms,hotel_id,amount,discount,date) VALUES (?, ?,?, ?, ?,?, ?,?, ?,?);", to_db)
con.commit()
con.close()
#error message
KeyError: 'booking_id'
This is the csv file - https://pastebin.com/xbgFryhZ

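The KeyError you're getting means that the rows read from your csv file don't have that column. Post the first few lines from the csv file so we can check it out.
EDIT:
Now that you added the CSV file we can see that it is separated by commas, not tabs.
Change
dr = csv.DictReader(fin, delimiter='\t')
to
dr = csv.DictReader(fin, delimiter=',')
Also note that your INSERT lists 11 columns but only 10 ? placeholders, so add one more ? or the statement will fail once the delimiter is fixed.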
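A quick way to see this kind of problem is to print the reader's fieldnames before building any rows. The sketch below uses a small made-up sample in place of TableA.csv; with the wrong delimiter the whole header comes back as one single column name, which is why the 'booking_id' key is missing.

```python
import csv
import io

# Hypothetical two-line sample standing in for TableA.csv.
sample = "booking_id,customer_id,source\n1,42,web\n"

wrong = csv.DictReader(io.StringIO(sample), delimiter="\t")
right = csv.DictReader(io.StringIO(sample), delimiter=",")

wrong_fields = wrong.fieldnames  # the header is not split at all
right_fields = right.fieldnames  # the header is split on commas
print(wrong_fields)
print(right_fields)
```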

importing from excel to mysql

I am trying to import data from Excel into MySQL. Below is my code. The problem is that it only writes the last row from my Excel sheet to the MySQL db, and I want it to import all the rows from my Excel sheet.
import pymysql
import xlrd
book = xlrd.open_workbook('C:\SqlExcel\Backup.xlsx')
sheet = book.sheet_by_index(0)
# Connect to the database
connection = pymysql.connect(host='localhost',
user='root',
password='',
db='test')
cursor = connection.cursor()
query = """INSERT INTO report_table (FirstName, LastName) VALUES (%s, %s)"""
for r in range(1, sheet.nrows):
    fname = sheet.cell(r,1).value
    lname = sheet.cell(r,2).value
values = (fname, lname)
cursor.execute(query, values)
connection.commit()
cursor.close()
connection.close()

Your code currently stores only the last pair and writes that to the database. You need to build the (fname, lname) pair and execute the insert inside the loop, so that each pair is written to the database separately.
You can amend your code to this:
import pymysql
import xlrd
book = xlrd.open_workbook(r'C:\SqlExcel\Backup.xlsx')  # raw string keeps the backslashes literal
sheet = book.sheet_by_index(0)
# Connect to the database
connection = pymysql.connect(host='localhost',
user='root',
password='',
db='test',
autocommit=True)
cursor = connection.cursor()
query = """INSERT INTO report_table (FirstName, LastName) VALUES (%s, %s)"""
# loop over each row
for r in range(1, sheet.nrows):
    # extract each cell
    fname = sheet.cell(r,1).value
    lname = sheet.cell(r,2).value
    # extract cells into pair
    values = fname, lname
    # write pair to db
    cursor.execute(query, values)
# close everything
cursor.close()
connection.close()
Note: You can set autocommit=True when connecting. PyMySQL disables autocommit by default. With it enabled, you don't have to call connection.commit() after your queries.
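The difference between the original and the amended loop can be reproduced without a database or a spreadsheet. In this small sketch the list sheet_rows and both function names are invented for illustration, and appending to a list stands in for cursor.execute.

```python
# Hypothetical stand-in for the spreadsheet rows (header row already skipped).
sheet_rows = [("Ada", "Lovelace"), ("Alan", "Turing"), ("Grace", "Hopper")]


def execute_outside_loop(rows):
    """Mimic the original code: the 'insert' runs once, after the loop ends."""
    executed = []
    for fname, lname in rows:
        values = (fname, lname)
    executed.append(values)  # only the last pair is still bound here
    return executed


def execute_inside_loop(rows):
    """Mimic the amended code: the 'insert' runs once per row."""
    executed = []
    for fname, lname in rows:
        values = (fname, lname)
        executed.append(values)
    return executed


outside = execute_outside_loop(sheet_rows)
inside = execute_inside_loop(sheet_rows)
print(outside)
print(inside)
```

Only the indentation of one statement differs between the two functions, which is exactly the bug in the question.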

Your values variable and the execute call have to be inside the for loop, like this:
import pymysql
import xlrd
book = xlrd.open_workbook('C:\SqlExcel\Backup.xlsx')
sheet = book.sheet_by_index(0)
# Connect to the database
connection = pymysql.connect(host='localhost',
user='root',
password='',
db='test')
cursor = connection.cursor()
query = """INSERT INTO report_table (FirstName, LastName) VALUES (%s, %s)"""
for r in range(1, sheet.nrows):
    fname = sheet.cell(r,1).value
    lname = sheet.cell(r,2).value
    values = (fname, lname)
    cursor.execute(query, values)
connection.commit()
cursor.close()
connection.close()

Sorry, I don't know much about databases or about pymysql. But assuming all the rest is correct, I guess it could work like this:
...
cursor = connection.cursor()
query = """INSERT INTO report_table (FirstName, LastName) VALUES (%s, %s)"""
for r in range(1, sheet.nrows):
    fname = sheet.cell(r,1).value
    lname = sheet.cell(r,2).value
    values = (fname, lname)
    cursor.execute(query, values)
connection.commit()
cursor.close()
connection.close()

Is this something you will do on a regular basis? I see the script you're writing but I am not sure if this is something you need to run over and over again or if you are just importing the data into MySQL once.
If this is a one shot deal, you can try this.
Open the spreadsheet and SELECT ALL then COPY all your data. Paste it into a text document and save the text document (let's say the text document will be in c:\temp\exceldata.txt). You can then load it all into the table with one command:
LOAD DATA INFILE 'c:/temp/exceldata.txt'
INTO TABLE report_table
FIELDS TERMINATED BY '\t'
LINES TERMINATED BY '\r\n'
IGNORE 1 LINES;
I am making a few assumptions here:
The spreadsheet has only two columns and they are in the same order as the fields in your table.
You do NOT need to clear out the table before the load. If you do, issue the command TRUNCATE TABLE report_table; before the load.
Note, I chose a tab delimited format because I prefer it. You could save the file as a .CSV file and adjust the command as follows:
LOAD DATA INFILE 'c:/temp/exceldata.csv'
INTO TABLE report_table
FIELDS TERMINATED BY ','
OPTIONALLY ENCLOSED BY '"'
LINES TERMINATED BY '\r\n'
IGNORE 1 LINES;
The "optionally enclosed by" is there because Excel will put quotes around text data with a comma in it.
If you need to do this on a regular basis, you can still use the CSV method by writing an excel script that saves the file to a .CSV copy whenever the spreadsheet is saved. I have done that too.
I have never written python but this is how I do it in PHP.
HTH

This code worked for me after taking help from the suggestions above. The error was one of indentation; now it's working :)
import pymysql
import xlrd
book = xlrd.open_workbook('C:\SqlExcel\Backup.xlsx')
sheet = book.sheet_by_index(0)
# Connect to the database
connection = pymysql.connect(host='localhost',
user='root',
password='',
db='test',
autocommit=True)
cursor = connection.cursor()
query = """INSERT INTO report_table (FirstName, LastName) VALUES (%s, %s)"""
for r in range(1, sheet.nrows):
    fname = sheet.cell(r,1).value
    lname = sheet.cell(r,2).value
    values = (fname, lname)
    cursor.execute(query, values)
cursor.close()
connection.close()

Writing a csv file into SQL Server database using python

I am trying to write a csv file into a table in a SQL Server database using python. I am facing errors when I pass the parameters, but I don't face any error when I do it manually. Here is the code I am executing.
cur=cnxn.cursor() # Get the cursor
csv_data = csv.reader(file('Samplefile.csv')) # Read the csv
for rows in csv_data: # Iterate through csv
    cur.execute("INSERT INTO MyTable(Col1,Col2,Col3,Col4) VALUES (?,?,?,?)",rows)
cnxn.commit()
Error:
pyodbc.DataError: ('22001', '[22001] [Microsoft][ODBC SQL Server Driver][SQL Server]String or binary data would be truncated. (8152) (SQLExecDirectW); [01000] [Microsoft][ODBC SQL Server Driver][SQL Server]The statement has been terminated. (3621)')
However, when I insert the values manually, it works fine:
cur.execute("INSERT INTO MyTable(Col1,Col2,Col3,Col4) VALUES (?,?,?,?)",'A','B','C','D')
I have ensured that the TABLE is there in the database, data types are consistent with the data I am passing. Connection and cursor are also correct. The data type of rows is "list"

Consider building the query dynamically to ensure the number of placeholders matches your table and CSV file format. Then it's just a matter of ensuring your table and CSV file are correct, instead of checking that you typed enough ? placeholders in your code.
The following example assumes
CSV file contains column names in the first line
Connection is already built
File name is test.csv
Table name is MyTable
Python 3
...
with open ('test.csv', 'r') as f:
    reader = csv.reader(f)
    columns = next(reader)
    query = 'insert into MyTable({0}) values ({1})'
    query = query.format(','.join(columns), ','.join('?' * len(columns)))
    cursor = connection.cursor()
    for data in reader:
        cursor.execute(query, data)
    cursor.commit()
If column names are not included in the file:
...
with open ('test.csv', 'r') as f:
    reader = csv.reader(f)
    data = next(reader)
    query = 'insert into MyTable values ({0})'
    query = query.format(','.join('?' * len(data)))
    cursor = connection.cursor()
    cursor.execute(query, data)
    for data in reader:
        cursor.execute(query, data)
    cursor.commit()
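Because the column names from the CSV header are formatted directly into the SQL text, it may be worth checking them before building the query. The following is one possible sketch; the helper name build_insert and the rule "letters, digits and underscores only" are assumptions, so adapt the pattern if your real column names need brackets or spaces.

```python
import re

# Assumed rule: a plain identifier made of letters, digits and underscores.
_IDENTIFIER = re.compile(r"[A-Za-z_][A-Za-z0-9_]*")


def build_insert(table, columns):
    """Build a parameterized INSERT, rejecting names that are not plain identifiers."""
    for name in [table, *columns]:
        if not _IDENTIFIER.fullmatch(name):
            raise ValueError(f"unsafe identifier: {name!r}")
    placeholders = ",".join("?" * len(columns))
    return f"insert into {table}({','.join(columns)}) values ({placeholders})"


query = build_insert("MyTable", ["Col1", "Col2", "Col3", "Col4"])
print(query)
```

The values themselves still go through the ? placeholders, so only the identifiers need this extra check.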

I modified the code written above by Brian as follows since the one posted above wouldn't work on the delimited files that I was trying to upload. The line row.pop() can also be ignored as it was necessary only for the set of files that I was trying to upload.
import csv
def upload_table(path, filename, delim, cursor):
    """
    Function to upload flat file to sqlserver
    """
    tbl = filename.split('.')[0]
    cnt = 0
    with open (path + filename, 'r') as f:
        reader = csv.reader(f, delimiter=delim)
        for row in reader:
            row.pop() # can be commented out
            row = ['NULL' if val == '' else val for val in row]
            row = [x.replace("'", "''") for x in row]
            out = "'" + "', '".join(str(item) for item in row) + "'"
            out = out.replace("'NULL'", 'NULL')
            query = "INSERT INTO " + tbl + " VALUES (" + out + ")"
            cursor.execute(query)
            cnt = cnt + 1
            if cnt % 10000 == 0:
                cursor.commit()
        cursor.commit()
    print("Uploaded " + str(cnt) + " rows into table " + tbl + ".")

You can pass the columns as arguments. For example:
for rows in csv_data: # Iterate through csv
    cur.execute("INSERT INTO MyTable(Col1,Col2,Col3,Col4) VALUES (?,?,?,?)", *rows)

If you are using MySqlHook in Airflow and cursor.execute() with params throws the error
TypeError: not all arguments converted during string formatting
use %s instead of ? as the placeholder:
with open('/usr/local/airflow/files/ifsc_details.csv','r') as csv_file:
    csv_reader = csv.reader(csv_file)
    columns = next(csv_reader)
    query = '''insert into ifsc_details({0}) values({1});'''
    query = query.format(','.join(columns), ','.join(['%s'] * len(columns)))
    mysql = MySqlHook(mysql_conn_id='local_mysql')
    conn = mysql.get_conn()
    cursor = conn.cursor()
    for data in csv_reader:
        cursor.execute(query, data)
    # commit on the connection; the MySQL cursor has no commit() method
    conn.commit()

I got it sorted out. The error was due to the size restriction of the table columns. I changed the column capacity, for example from col1 varchar(10) to col1 varchar(35). Now it's working fine.
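To find out in advance which columns are too narrow, you could measure the longest value per column in the CSV before inserting. This is a minimal sketch with a made-up helper name and sample data; it counts characters, which may differ from the byte length the server uses for some column types.

```python
import csv
import io


def max_field_lengths(csv_file):
    """Return {column name: longest value length} for a CSV with a header row."""
    reader = csv.reader(csv_file)
    header = next(reader)
    longest = [0] * len(header)
    for row in reader:
        # Ignore any extra trailing fields beyond the header width.
        for i, value in enumerate(row[: len(header)]):
            longest[i] = max(longest[i], len(value))
    return dict(zip(header, longest))


# Hypothetical sample standing in for Samplefile.csv.
sample = io.StringIO("Col1,Col2\nshort,x\na considerably longer value,yz\n")
lengths = max_field_lengths(sample)
print(lengths)
```

Comparing these numbers with the varchar sizes in the table definition shows which columns would trigger the truncation error.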

Here is the script and hope this works for you:
import pandas as pd
import pyodbc as pc
connection_string = "Driver=SQL Server;Server=localhost;Database={0};Trusted_Connection=Yes;"
cnxn = pc.connect(connection_string.format("DataBaseNameHere"), autocommit=True)
cur=cnxn.cursor()
df= pd.read_csv("your_filepath_and_filename_here.csv").fillna('')
query = 'insert into TableName({0}) values ({1})'
query = query.format(','.join(df.columns), ','.join('?' * len(df.columns)))
cur.fast_executemany = True
cur.executemany(query, df.values.tolist())
cnxn.close()

You can also import data into SQL by using either:
The SQL Server Import and Export Wizard
SQL Server Integration Services (SSIS)
The OPENROWSET function
More details can be found on this webpage:
https://learn.microsoft.com/en-us/sql/relational-databases/import-export/import-data-from-excel-to-sql?view=sql-server-2017

Write to CSV from sqlite3 database in python

Ok, So I have a database called cars.db which has a table == inventory,
Inventory essentially contains
('Ford', 'Hiluz', 2),
('Ford', 'Tek', 6),
('Ford', 'Outlander', 9),
('Honda', 'Dualis', 3),
('Honday', 'Elantre', 4)
I then wrote this which is meant to edit that to the csv, however, I can't seem to work this out, in some cases I get stuff to print but its not right, and when I try and fix that, nothing prints. Any suggestions to get me on track?
#write table to csv
import sqlite3
import csv
with sqlite3.connect("cars.db") as connection:
    csvWriter = csv.writer(open("output.csv", "w"))
    c = connection.cursor()
    rows = c.fetchall()
    for x in rows:
        csvWriter.writerows(x)

You should just do:
rows = c.fetchall()
csvWriter.writerows(rows)
If the reason you iterate through the rows is that you want to preprocess them before writing them to the file, then use the writerow method:
rows = c.fetchall()
for row in rows:
    # do your stuff
    csvWriter.writerow(row)
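Note that fetchall only returns rows after a SELECT has been executed on the cursor, which the code in the question never does. A complete sketch might look like the following; the column names (make, model, quantity) are guesses since the question does not show the schema, and an in-memory database and StringIO stand in for cars.db and output.csv.

```python
import csv
import io
import sqlite3

# In-memory stand-in for cars.db with assumed column names.
connection = sqlite3.connect(":memory:")
connection.execute("CREATE TABLE inventory (make TEXT, model TEXT, quantity INTEGER)")
connection.executemany(
    "INSERT INTO inventory VALUES (?, ?, ?)",
    [("Ford", "Tek", 6), ("Honda", "Dualis", 3)],
)

c = connection.cursor()
# The SELECT must run before fetchall has anything to return.
c.execute("SELECT make, model, quantity FROM inventory ORDER BY make")
header = [col[0] for col in c.description]
rows = c.fetchall()

output = io.StringIO()  # stands in for open("output.csv", "w", newline="")
csv_writer = csv.writer(output)
csv_writer.writerow(header)   # one row: the column names
csv_writer.writerows(rows)    # all data rows at once
csv_text = output.getvalue()
connection.close()
print(csv_text)
```

When writing to a real file, opening it with newline="" avoids blank lines between rows on Windows.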

To put titles in the first row, a dictionary approach is suggested for the table inventory in cars.db:
import sqlite3
import csv
import os.path
BASE_DIR = os.path.dirname(os.path.abspath(__file__))
db_path = os.path.join(BASE_DIR, "cars.db")
conn = sqlite3.connect(db_path)
c = conn.cursor()
c.execute("SELECT rowid, * FROM inventory")
columns = [column[0] for column in c.description]
results = []
for row in c.fetchall():
    results.append(dict(zip(columns, row)))
with open("output.csv", "w", newline='') as new_file:
    fieldnames = columns
    writer = csv.DictWriter(new_file,fieldnames=fieldnames)
    writer.writeheader()
    for line in results:
        writer.writerow(line)
conn.close()

Using Pandas should be more performant and requires less code. You can save the data from a sqlite table to a Pandas DataFrame and then use Pandas to write the CSV file.
df = pd.read_sql('SELECT * from cars', conn)
df.to_csv('cars.csv')
Here's the full code that creates your sqlite table with fake data:
import pandas as pd
import sqlite3
# create Pandas DataFrame
data = [('Toyota', 'Hilux', 2),
('Ford', 'Tek', 6),
('Ford', 'Outlander', 9),
('Honda', 'Dualis', 3),
('Honday', 'Elantre', 4)]
df = pd.DataFrame.from_records(data, columns=['make', 'model', 'age'])
# establish sqlite connection
conn = sqlite3.connect('../tmp/cars.db')
c = conn.cursor()
# create sqlite table
c.execute('''CREATE TABLE cars (make text, model text, age int)''')
# add data to sqlite table
df.to_sql('cars', conn, if_exists='append', index = False)
# write sqlite table out as a CSV file
df = pd.read_sql('SELECT * from cars', conn)
df.to_csv('../tmp/cars.csv')
Here's code to write out all the tables in a sqlite database as CSV files with a single command:
for table in c.execute("SELECT name FROM sqlite_master WHERE type='table';").fetchall():
    t = table[0]
    df = pd.read_sql('SELECT * from ' + t, conn)
    df.to_csv('../tmp/' + t + '.csv')
