sql queries and python

I am trying to use sqlite3 and Python 3 to load a CSV file into a table keyed by booking_id and extract some specific data, but I am getting a KeyError, which means the requested key is not in the dictionary. I don't get it.
import csv
import sqlite3

con = sqlite3.connect(":memory:")
cur = con.cursor()
cur.execute("CREATE TABLE t (booking_id,customer_id,source,status,checkin,checkout,oyo_rooms,hotel_id,amount,discount,date,PRIMARY KEY(booking_id))")

with open('TableA.csv', 'r') as fin:
    dr = csv.DictReader(fin, delimiter='\t')
    to_db = [(i['booking_id'], i['customer_id'], i['source'], i['status'], i['checkin'], i['checkout'],
              i['oyo_rooms'], i['hotel_id'], i['amount'], i['discount'], i['date']) for i in dr]

cur.executemany(
    "INSERT INTO t (booking_id,customer_id,source,status,checkin,checkout,oyo_rooms,hotel_id,amount,discount,date) VALUES (?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?);", to_db)
con.commit()
con.close()

# error message
KeyError: 'booking_id'
This is the csv file - https://pastebin.com/xbgFryhZ

The KeyError you're getting means your CSV file doesn't have that column. Post the first few lines of the CSV file so we can check it out.
EDIT:
Now that you added the CSV file we can see that it is separated by commas, not tabs.
Change
dr = csv.DictReader(fin, delimiter='\t')
to
dr = csv.DictReader(fin, delimiter=',')
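If you are not sure what a file is delimited with, the standard library's csv.Sniffer can usually detect it for you. A minimal sketch, reusing the file name from the question:
import csv

with open('TableA.csv', 'r', newline='') as fin:
    sample = fin.read(2048)      # read a small sample for the sniffer
    fin.seek(0)
    dialect = csv.Sniffer().sniff(sample)
    dr = csv.DictReader(fin, dialect=dialect)
    print(dr.fieldnames)         # should include 'booking_id'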

Related

Import Data from .csv file into mysql using python

I am trying to import data from two columns of a .csv file (time hh:mm, float). I created a database and a table in mysql.
import mysql.connector
import csv

mydb = mysql.connector.connect(host='127.0.0.1',
                               user='xxx',
                               passwd='xxx',
                               db='pv_datenbank')
cursor = mydb.cursor()

# get rid of the '' at the beginning of the .csv file
s = open('Sonneneinstrahlung.csv', mode='r', encoding='utf-8-sig').read()
open('Sonneneinstrahlung.csv', mode='w', encoding='utf-8').write(s)
print(s)

with open('Sonneneinstrahlung.csv') as csv_file:
    csv_reader = csv.reader(csv_file, delimiter=';')
    sql = """INSERT INTO einstrahlung ('Uhrzeit', 'Einstrahlungsdaten') VALUES (%s, %s)"""
    for row in csv_reader:
        print(row)
        print(cursor.rowcount, "was inserted.")
        cursor.executemany(sql, csv_reader)
        #cursor.execute(sql, row, multi=True)

mydb.commit()
mydb.close()
If I run the program with executemany(), the result is the following:
['01:00', '1']
'-1 was inserted.'
and after this I get the error: Not all parameters were used in the SQL statement.
When I try the same thing with the execute() operator, no error is shown, but the data is not inserted into the table of my database.
Here you can see the input data:
executemany takes a statement and a sequence of sets of parameters.
Try this:
with open('Sonneneinstrahlung.csv') as csv_file:
    csv_reader = csv.reader(csv_file, delimiter=';')
    sql = """INSERT INTO einstrahlung (Uhrzeit, Einstrahlungsdaten) VALUES (%s, %s)"""
    cursor.executemany(sql, csv_reader)

mydb.commit()
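If your file ever gets a header line, or you want to look at the rows before sending them, a small variation (just a sketch along the same lines, using the connection and cursor from the question) is to build a list of parameter tuples first and pass that to executemany:
with open('Sonneneinstrahlung.csv') as csv_file:
    csv_reader = csv.reader(csv_file, delimiter=';')
    # next(csv_reader)  # uncomment to skip a header line, if your file has one
    # one (Uhrzeit, Einstrahlungsdaten) tuple per row
    params = [(row[0], row[1]) for row in csv_reader]

sql = "INSERT INTO einstrahlung (Uhrzeit, Einstrahlungsdaten) VALUES (%s, %s)"
cursor.executemany(sql, params)
mydb.commit()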

Querying a csv file with default values to columns

I have to query this table with a query of the form:
query(id,mtype,competition,gender,team1,team2,venue,date)
If every parameter in the query is given, we can use if statements to select the results. But some of the parameters may not be provided; in that case, we have to consider all the column values.
Also, I have these data in a csv file. I want to read the csv file into a list and then query it. The only catch is that if the user doesn't provide a parameter in the query, it should consider all the values in that column.
Can someone suggest a way to do this with only a few if-else statements, or suggest some other way?
You can use pandas with read_csv and query, i.e.:
import pandas as pd
# csv file should have the field names on the first row
# id,mtype,competition,gender,team1,team2,venue,date
df = pd.read_csv("the_file.csv", sep=",")
df['date'] = pd.to_datetime(df['date']) # convert date to a datetime object
mtype = "ODM"
sd = "2017-02-18"
ed = "2017-02-20"
df_query = df.query("mtype == '{}' and date > '{}' and date < '{}'".format(mtype, sd, ed))
print(df_query)
Option 2:
You can also convert the csv file into an sqlite db and issue the queries there, something like:
Import csv to sqlite:
import csv
import sqlite3
import os.path

csv_file = "csv_to_db.csv"
db_file = "csv_to_db.sqlite"

if not os.path.exists(db_file):  # if no db_file we create one
    con = sqlite3.Connection(db_file, detect_types=sqlite3.PARSE_DECLTYPES)
    cur = con.cursor()
    # csv fields: id,mtype,competition,gender,team1,team2,venue,date
    cur.execute('CREATE TABLE "venues" ("id" int primary key, "mtype" text,'
                ' "competition" text, "gender" text, "team1" text, '
                '"team2" text, "venue" text, "venue_date" date);')
    f = open(csv_file)
    csv_reader = csv.reader(f, delimiter=',')
    cur.executemany('INSERT INTO venues VALUES (?, ?, ?, ?, ?, ?, ?, ?)', csv_reader)
    cur.close()
    con.commit()
    con.close()
    f.close()
Now we can start querying the db. You've asked:
Can you provide an example of type query(mtype,start_date,end_date)
with all other parameter missing?
For that you can use:
conn = sqlite3.connect(db_file,detect_types=sqlite3.PARSE_DECLTYPES)
c = conn.cursor()
start_date = "2017-02-15"
end_date = "2017-02-20"
c.execute("SELECT * FROM {table} WHERE mtype='{query}' AND venue_date BETWEEN date('{start_date}') AND date('{end_date}')".format(table="venues", query="ODM", start_date=start_date, end_date=end_date))
all_rows = c.fetchall()
print(all_rows)
Grab the complete gist
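To cover the part of the question about missing parameters, one approach (only a sketch; the table and column names are the ones from the import code above) is to build the WHERE clause from just the arguments that were actually supplied, so anything omitted places no restriction on its column:
def query(conn, **params):
    # keep only the filters the caller actually provided
    clauses, values = [], []
    for column, value in params.items():
        if value is not None:
            clauses.append('"{}" = ?'.format(column))
            values.append(value)
    sql = 'SELECT * FROM venues'
    if clauses:
        sql += ' WHERE ' + ' AND '.join(clauses)
    return conn.execute(sql, values).fetchall()

# e.g. only mtype and gender given; every other column is unrestricted
rows = query(conn, mtype='ODM', gender='female')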
You can use Pandas, which provides a way to filter rows.
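For example, a boolean mask lets you skip the filter for any parameter that was not supplied. A sketch only, reusing the column names and file from the answer above:
import pandas as pd

df = pd.read_csv("the_file.csv")

def query(df, mtype=None, gender=None, venue=None):
    mask = pd.Series(True, index=df.index)   # start by keeping every row
    if mtype is not None:
        mask &= df['mtype'] == mtype
    if gender is not None:
        mask &= df['gender'] == gender
    if venue is not None:
        mask &= df['venue'] == venue
    return df[mask]

print(query(df, mtype='ODM'))  # gender and venue left out, so they match everything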

csv into sqlite table python

Using Python, I am trying to import a csv into an sqlite table and use the headers of the csv file as the headers of the sqlite table. The code runs, but the table "MyTable" does not appear to be created. Here is the code:
with open('dict_output.csv', 'r') as f:
    reader = csv.reader(f)
    columns = next(reader)
    # Strips white space in header
    columns = [h.strip() for h in columns]
    #reader = csv.DictReader(f, fieldnames=columns)
    for row in reader:
        print(row)
    con = sqlite3.connect("city_spec.db")
    cursor = con.cursor()
    # Inserts data from csv into table in sql database.
    query = 'insert into MyTable({0}) values ({1})'
    query = query.format(','.join(columns), ','.join('?' * len(columns)))
    print(query)
    cursor = con.cursor()
    for row in reader:
        cursor.execute(query, row)
    #cursor.commit()
    con.commit()
    con.close()
Thanks in advance for any help.
You can use Pandas to make this easy (you may need to pip install pandas first):
import sqlite3
import pandas as pd
# load data
df = pd.read_csv('dict_output.csv')
# strip whitespace from headers
df.columns = df.columns.str.strip()
con = sqlite3.connect("city_spec.db")
# drop data into database
df.to_sql("MyTable", con)
con.close()
Pandas will do all of the hard work for you, including creating the actual table!
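If you run it more than once, or don't want the DataFrame's index stored as an extra column, to_sql accepts a couple of optional arguments for that (a small addition, not required for the basic case):
# replace the table if it already exists and skip the DataFrame index column
df.to_sql("MyTable", con, if_exists="replace", index=False)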
You haven't marked your question as solved yet, so here goes.
Connect to the database just once, and create a cursor just once.
You can read the csv records only once.
I've added code that creates a crude form of the database table based on the column names alone. Again, this is done just once in the loop.
Your insertion code works fine.
import sqlite3
import csv

con = sqlite3.connect("city_spec.sqlite")  ## these statements belong outside the loop
cursor = con.cursor()                      ## execute them just once

first = True
with open('dict_output.csv', 'r') as f:
    reader = csv.reader(f)
    columns = next(reader)
    columns = [h.strip() for h in columns]
    if first:
        sql = 'CREATE TABLE IF NOT EXISTS MyTable (%s)' % ', '.join(['%s text' % column for column in columns])
        print(sql)
        cursor.execute(sql)
        first = False
    #~ for row in reader:  ## we will read the rows later in the loop
    #~     print(row)
    query = 'insert into MyTable({0}) values ({1})'
    query = query.format(','.join(columns), ','.join('?' * len(columns)))
    print(query)
    for row in reader:
        cursor.execute(query, row)

con.commit()
con.close()
You can also do this easily with the peewee ORM. For this you only need an extension of peewee, playhouse.csv_loader:
from playhouse.csv_loader import *
db = SqliteDatabase('city_spec.db')
Test = load_csv(db, 'dict_output.csv')
This creates the table in the database city_spec.db with the headers as fields and the data from dict_output.csv.
If you don't have peewee you can install it with
pip install peewee

How to insert huge CSV file at once into SQL Server in python?

I have a large CSV file and I want to insert it all at once, instead of row by row. This is my code:
import pypyodbc
import csv
import time

con = pypyodbc.connect('driver={SQL Server};' 'server=server_name;' 'database=DB-name;' 'trusted_connection=true')
cur = con.cursor()
csfile = open('out2.csv', 'r')
csv_data = csv.reader(csfile)
for row in csv_data:
    try:
        cur.execute("BULK INSERT INTO Table_name(Attribute, error, msg, Value, Success, TotalCount, SerialNo)" "VALUES (?, ?, ?, ?, ?, ?, ?)", row)
    except Exception:
        time.sleep(60)
cur.close()
con.commit()
con.close()
Bulk Insert should do it for you.
BULK INSERT CSVTest
FROM 'c:\csvtest.txt'
WITH
(
    FIELDTERMINATOR = ',',
    ROWTERMINATOR = '\n'
)
GO

-- Check the content of the table.
SELECT *
FROM CSVTest
GO
http://blog.sqlauthority.com/2008/02/06/sql-server-import-csv-file-into-sql-server-using-bulk-insert-load-comma-delimited-file-into-sql-server/
Also, check out this link.
https://www.simple-talk.com/sql/learn-sql-server/bulk-inserts-via-tsql-in-sql-server/
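If you would rather keep everything in the Python script from the question, the same BULK INSERT statement can be sent through the pypyodbc connection. A rough sketch (the path is only an example, and it must be readable by the SQL Server service itself, not just your client machine):
import pypyodbc

con = pypyodbc.connect('driver={SQL Server};' 'server=server_name;' 'database=DB-name;' 'trusted_connection=true')
cur = con.cursor()
# C:\data\out2.csv is a placeholder; point it at wherever the server can see the file
cur.execute(r"""
    BULK INSERT Table_name
    FROM 'C:\data\out2.csv'
    WITH (FIELDTERMINATOR = ',', ROWTERMINATOR = '\n')
""")
con.commit()
con.close()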
It really depends on your system resources. You can store the CSV file in memory and then insert it into the database, but if your CSV file is larger than your RAM you will run into trouble. You can save each row of the csv file as an element in a Python list. Here is my code:
csvRows = []
csvFileObj = open('yourfile.csv', 'r')
readerObj = csv.reader(csvFileObj)
for row in readerObj:
    element1 = row[0]
    .......
    csvRows.append((element1, element2, ...))
After that, read the elements of the list and insert them into your db. I don't think there is a direct way to insert all csv rows into the SQL db at once; you need some preprocessing.
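Combined with the pypyodbc connection from the question, the buffered rows can then be sent in a single call (a sketch; the column list is the one from the question's INSERT):
cur.executemany(
    "INSERT INTO Table_name (Attribute, error, msg, Value, Success, TotalCount, SerialNo) "
    "VALUES (?, ?, ?, ?, ?, ?, ?)",
    csvRows)
con.commit()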

Writing a csv file into SQL Server database using python

I am trying to write a csv file into a table in a SQL Server database using Python. I am facing errors when I pass the parameters, but I don't face any error when I do it manually. Here is the code I am executing.
cur = cnxn.cursor()  # Get the cursor
csv_data = csv.reader(file('Samplefile.csv'))  # Read the csv
for rows in csv_data:  # Iterate through csv
    cur.execute("INSERT INTO MyTable(Col1,Col2,Col3,Col4) VALUES (?,?,?,?)", rows)
cnxn.commit()
Error:
pyodbc.DataError: ('22001', '[22001] [Microsoft][ODBC SQL Server Driver][SQL Server]String or binary data would be truncated. (8152) (SQLExecDirectW); [01000] [Microsoft][ODBC SQL Server Driver][SQL Server]The statement has been terminated. (3621)')
However, when I insert the values manually, it works fine:
cur.execute("INSERT INTO MyTable(Col1,Col2,Col3,Col4) VALUES (?,?,?,?)",'A','B','C','D')
I have ensured that the TABLE is there in the database, data types are consistent with the data I am passing. Connection and cursor are also correct. The data type of rows is "list"
Consider building the query dynamically to ensure the number of placeholders matches your table and CSV file format. Then it's just a matter of ensuring your table and CSV file are correct, instead of checking that you typed enough ? placeholders in your code.
The following example assumes
CSV file contains column names in the first line
Connection is already built
File name is test.csv
Table name is MyTable
Python 3
...
with open('test.csv', 'r') as f:
    reader = csv.reader(f)
    columns = next(reader)
    query = 'insert into MyTable({0}) values ({1})'
    query = query.format(','.join(columns), ','.join('?' * len(columns)))
    cursor = connection.cursor()
    for data in reader:
        cursor.execute(query, data)
    cursor.commit()
If column names are not included in the file:
...
with open('test.csv', 'r') as f:
    reader = csv.reader(f)
    data = next(reader)
    query = 'insert into MyTable values ({0})'
    query = query.format(','.join('?' * len(data)))
    cursor = connection.cursor()
    cursor.execute(query, data)
    for data in reader:
        cursor.execute(query, data)
    cursor.commit()
I modified the code written above by Brian as follows, since the version posted above wouldn't work on the delimited files that I was trying to upload. The line row.pop() can also be ignored, as it was necessary only for the set of files that I was trying to upload.
import csv

def upload_table(path, filename, delim, cursor):
    """
    Function to upload flat file to sqlserver
    """
    tbl = filename.split('.')[0]
    cnt = 0
    with open(path + filename, 'r') as f:
        reader = csv.reader(f, delimiter=delim)
        for row in reader:
            row.pop()  # can be commented out
            row = ['NULL' if val == '' else val for val in row]
            row = [x.replace("'", "''") for x in row]
            out = "'" + "', '".join(str(item) for item in row) + "'"
            out = out.replace("'NULL'", 'NULL')
            query = "INSERT INTO " + tbl + " VALUES (" + out + ")"
            cursor.execute(query)
            cnt = cnt + 1
            if cnt % 10000 == 0:
                cursor.commit()
        cursor.commit()
    print("Uploaded " + str(cnt) + " rows into table " + tbl + ".")
You can pass the columns as arguments. For example:
for rows in csv_data:  # Iterate through csv
    cur.execute("INSERT INTO MyTable(Col1,Col2,Col3,Col4) VALUES (?,?,?,?)", *rows)
If you are using MySqlHook in Airflow, and cursor.execute() with params throws an error
TypeError: not all arguments converted during string formatting
use %s instead of ?
with open('/usr/local/airflow/files/ifsc_details.csv', 'r') as csv_file:
    csv_reader = csv.reader(csv_file)
    columns = next(csv_reader)
    query = '''insert into ifsc_details({0}) values({1});'''
    query = query.format(','.join(columns), ','.join(['%s'] * len(columns)))
    mysql = MySqlHook(mysql_conn_id='local_mysql')
    conn = mysql.get_conn()
    cursor = conn.cursor()
    for data in csv_reader:
        cursor.execute(query, data)
    conn.commit()
I got it sorted out. The error was due to the size restriction of the table columns. I changed the column capacity, for example from col1 varchar(10) to col1 varchar(35), etc. Now it's working fine.
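For anyone hitting the same truncation error, the column can be widened from the same connection; the table, column and size below are just the ones from this example:
# widen Col1 so longer strings from the CSV fit
cur.execute("ALTER TABLE MyTable ALTER COLUMN Col1 VARCHAR(35)")
cnxn.commit()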
Here is the script; I hope this works for you:
import pandas as pd
import pyodbc as pc

connection_string = "Driver=SQL Server;Server=localhost;Database={0};Trusted_Connection=Yes;"
cnxn = pc.connect(connection_string.format("DataBaseNameHere"), autocommit=True)
cur = cnxn.cursor()

df = pd.read_csv("your_filepath_and_filename_here.csv").fillna('')
query = 'insert into TableName({0}) values ({1})'
query = query.format(','.join(df.columns), ','.join('?' * len(df.columns)))

cur.fast_executemany = True
cur.executemany(query, df.values.tolist())
cnxn.close()
You can also import data into SQL Server by using one of the following:
The SQL Server Import and Export Wizard
SQL Server Integration Services (SSIS)
The OPENROWSET function
More details can be found on this webpage:
https://learn.microsoft.com/en-us/sql/relational-databases/import-export/import-data-from-excel-to-sql?view=sql-server-2017
