New to Python: my method to import CSV into an SQLite DB

I just started Python programming and have found it very useful so far, coming from a Delphi/Lazarus background.
I recently downloaded trend data from a SCADA system and needed to import it into an SQLite database. I thought I would share my Python script here.
This would have taken a lot more programming in Pascal. Now I just create a GUI with Lazarus and use TProcess to run the script with some parameters, and the data ends up in the database.
Sample of trend data
Time,P1_VC70004PID_DRCV,P1_VC70004PID_DRPV,P1_VC70004PID_DRSP
6:00:30,27.75,3000,3000
6:01:00,27.75,3000,3000
6:01:30,27.75,3000,3000
6:02:00,27.75,3000,3000
6:02:30,27.75,3000,3000
6:03:00,27.75,3000,3000
6:03:30,27.75,3000,3000
6:04:00,27.75,3000,3000
6:04:30,27.75,3000,3000
6:05:00,27.75,3000,3000
Python code:
import csv
import sqlite3
import sys

FileName = sys.argv[1]
TableName = "data"
db = "trenddata.db3"

conn = sqlite3.connect(db)
c = conn.cursor()
c.execute("DROP TABLE IF EXISTS " + TableName)
c.execute("VACUUM")

i = 0
# newline='' is what the csv module expects; UTF-8 is an assumption about the export
f = open(FileName, 'rt', newline='', encoding='utf-8')
try:
    reader = csv.reader(f)
    for row in reader:
        if i == 0:
            ## Create table from the header row of the CSV file
            c.execute("CREATE TABLE %s (%s)" % (TableName, ", ".join(row)))
        else:
            ## Import row data; ? placeholders keep quotes inside values from breaking the SQL
            c.execute("INSERT INTO %s VALUES (%s)" % (TableName, ", ".join("?" * len(row))), row)
        i += 1
    conn.commit()
finally:
    f.close()
    conn.close()

print("Imported %s records" % max(i - 1, 0))  # i also counted the header row
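If the column names come straight from the CSV header, a slightly more defensive variant quotes them as SQL identifiers and passes all values as parameters. A minimal sketch, assuming the first CSV row is the header; `load_csv` and `quote_ident` are made-up names for this example:

```python
import csv
import io
import sqlite3


def quote_ident(name):
    # SQL identifier quoting: wrap in double quotes and double any embedded ones
    return '"' + name.replace('"', '""') + '"'


def load_csv(conn, lines, table):
    """Create `table` from the CSV header and insert the remaining rows.

    `lines` is any iterable of text lines (an open file or io.StringIO).
    Returns the number of data rows inserted.
    """
    reader = csv.reader(lines)
    header = next(reader)
    target = quote_ident(table)
    cols = ", ".join(quote_ident(h) for h in header)
    conn.execute(f"DROP TABLE IF EXISTS {target}")
    conn.execute(f"CREATE TABLE {target} ({cols})")
    marks = ", ".join("?" * len(header))
    # Values travel as parameters, so quotes inside them cannot break the SQL
    cur = conn.executemany(f"INSERT INTO {target} VALUES ({marks})", reader)
    conn.commit()
    return cur.rowcount  # sqlite3 sums rowcount over all executemany rows


# Usage with the first rows of the trend data above
sample = "Time,P1_VC70004PID_DRCV\n6:00:30,27.75\n6:01:00,27.75\n"
conn = sqlite3.connect(":memory:")
inserted = load_csv(conn, io.StringIO(sample), "data")
```

For a real file, pass `open(sys.argv[1], newline="")` instead of the `StringIO` object.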

Related

Is there Python code to write directly into a SQLite command line? [duplicate]

I have a CSV file and I want to bulk-import it into my sqlite3 database using Python. The command is ".import .....", but it doesn't seem to work like this. Can anyone give me an example of how to do it in sqlite3? I am using Windows, just in case.
Thanks
import csv, sqlite3

con = sqlite3.connect(":memory:")  # change to 'your_filename.db' to write to a file on disk
cur = con.cursor()
cur.execute("CREATE TABLE t (col1, col2);")  # use your column names here

with open('data.csv', 'r', newline='') as fin:  # `with` statement available in 2.5+
    # csv.DictReader uses first line in file for column headings by default
    dr = csv.DictReader(fin)  # comma is default delimiter
    to_db = [(i['col1'], i['col2']) for i in dr]

cur.executemany("INSERT INTO t (col1, col2) VALUES (?, ?);", to_db)
con.commit()
con.close()
Creating an SQLite connection to a file on disk is left as an exercise for the reader, but there is now a two-liner made possible by the pandas library:
import pandas
# csvfile, table_name and conn (e.g. sqlite3.connect('my.db')) are yours to supply
df = pandas.read_csv(csvfile)
df.to_sql(table_name, conn, if_exists='append', index=False)
You're right that .import is the way to go, but that's a command from the SQLite3 command-line program. Many of the top answers to this question involve native Python loops, but if your files are large (mine are 10^6 to 10^7 records), you want to avoid reading everything into pandas or using a native Python list comprehension/loop (though I did not time them for comparison).
For large files, I believe the best option is to use subprocess.run() to execute sqlite's import command. In the example below, I assume the table already exists, but the CSV file has headers in the first row. See the .import docs for more info.
import subprocess
from pathlib import Path

db_name = Path('my.db').resolve()
csv_file = Path('file.csv').resolve()
result = subprocess.run(['sqlite3',
                         str(db_name),
                         '-cmd',
                         '.mode csv',
                         '.import --skip 1 ' + str(csv_file).replace('\\', '\\\\')
                         + ' <table_name>'],
                        capture_output=True)
Edit note: sqlite3's .import command has improved so that it can treat the first row as header names or even skip the first x rows (this requires version >= 3.32, as noted in this answer). If you have an older version of sqlite3, you may need to first create the table, then strip off the first row of the CSV before importing. The --skip 1 argument will give an error prior to 3.32.
Explanation
From the command line, the command you're looking for is sqlite3 my.db -cmd ".mode csv" ".import file.csv table". subprocess.run() runs a command-line process. Its argument is a sequence of strings, which is interpreted as a command followed by all of its arguments.
sqlite3 my.db opens the database.
The -cmd flag runs the command that follows it ('.mode csv') first; the remaining argument after the database name (the .import line) is then executed as well. In the shell, each command has to be in quotes, but here each one just needs to be its own element of the sequence.
'.mode csv' does what you'd expect.
'.import --skip 1 ' + str(csv_file).replace('\\','\\\\') + ' <table_name>' is the import command.
Unfortunately, since these follow-on commands reach sqlite3 as single quoted strings, you need to double up your backslashes if you have a Windows directory path.
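If you need to build this command for different files and tables, the argument list can be assembled by a small function. A sketch, assuming sqlite3 >= 3.32 is on PATH (needed for --skip) and keeping the backslash doubling described above; `sqlite_import_args` is a hypothetical name, and the command is only run when both the tool and the CSV file are present:

```python
import shutil
import subprocess
from pathlib import Path


def sqlite_import_args(db_path, csv_path, table, skip_header=True):
    """Build the argv list that runs the sqlite3 shell's .import in CSV mode."""
    # Backslashes are doubled so Windows paths survive the shell's parsing
    csv_arg = str(csv_path).replace("\\", "\\\\")
    import_cmd = ".import " + ("--skip 1 " if skip_header else "") + csv_arg + " " + table
    return ["sqlite3", str(db_path), "-cmd", ".mode csv", import_cmd]


args = sqlite_import_args(Path("my.db"), Path("file.csv"), "trend")
# Only run it if the sqlite3 command-line tool is installed and the file exists
if shutil.which("sqlite3") and Path("file.csv").exists():
    result = subprocess.run(args, capture_output=True, text=True)
    print(result.returncode, result.stderr)
```

Checking `result.returncode` and `result.stderr` is worthwhile, because a failed .import does not raise anything in Python by itself.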
Stripping Headers
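Not really the main point of the question, but here's what I used. Again, I didn't want to read the whole file into memory at any point:
import shutil

# csv_path is the path of the CSV file (named so it doesn't shadow the csv module)
with open(csv_path, "r") as source:
    source.readline()  # skip the header line
    with open(str(csv_path) + "_nohead", "w") as target:
        shutil.copyfileobj(source, target)  # copies the rest in fixed-size chunks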
My 2 cents (more generic):
import csv, sqlite3
import logging

def _get_col_datatypes(fin):
    dr = csv.DictReader(fin)  # comma is default delimiter
    fieldTypes = {}
    for entry in dr:
        fieldsLeft = [f for f in dr.fieldnames if f not in fieldTypes]
        if not fieldsLeft:
            break  # We're done
        for field in fieldsLeft:
            data = entry[field]
            # Need data to decide
            if not data:
                continue
            if data.isdigit():
                fieldTypes[field] = "INTEGER"
            else:
                fieldTypes[field] = "TEXT"
        # TODO: SQLite has no dedicated DATE storage class
    # Compare with the header so types found in the last row are counted too
    if len(fieldTypes) < len(dr.fieldnames or []):
        raise Exception("Failed to find all the columns data types - Maybe some are empty?")
    return fieldTypes

def escapingGenerator(f):
    for line in f:
        yield line.encode("ascii", "xmlcharrefreplace").decode("ascii")

def csvToDb(csvFile, outputToFile=False):
    # TODO: implement output to file
    with open(csvFile, mode='r', encoding="ISO-8859-1") as fin:
        dt = _get_col_datatypes(fin)
        fin.seek(0)
        reader = csv.DictReader(fin)
        # Keep the order of the columns name just as in the CSV
        fields = reader.fieldnames
        cols = []
        # Set field and type
        for f in fields:
            cols.append("%s %s" % (f, dt[f]))
        # Generate create table statement:
        stmt = "CREATE TABLE ads (%s)" % ",".join(cols)
        con = sqlite3.connect(":memory:")
        cur = con.cursor()
        cur.execute(stmt)
        fin.seek(0)
        reader = csv.reader(escapingGenerator(fin))
        next(reader)  # skip the header row; it was already used for the column names
        # Generate insert statement:
        stmt = "INSERT INTO ads VALUES(%s);" % ','.join('?' * len(cols))
        cur.executemany(stmt, reader)
        con.commit()
    return con
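The type detection above looks only at the first non-empty value in each column and relies on str.isdigit(), so values such as -5 or 2.5 end up as TEXT. If you want to scan every row instead, a sketch along these lines widens each column from INTEGER to REAL to TEXT as needed. The function names are made up for this example; the resulting mapping can be used to build the CREATE TABLE statement the same way as `dt` above.

```python
import csv
import io

# Ordered from most specific to most general
_RANK = {"INTEGER": 0, "REAL": 1, "TEXT": 2}


def _classify(value):
    """Return the narrowest SQLite type name that can hold `value`."""
    try:
        int(value)
        return "INTEGER"
    except ValueError:
        pass
    try:
        float(value)
        return "REAL"
    except ValueError:
        return "TEXT"


def infer_sqlite_types(lines):
    """Map each CSV column to INTEGER, REAL or TEXT by scanning every non-empty value."""
    reader = csv.DictReader(lines)
    types = {}
    for row in reader:
        for col, value in row.items():
            if col is None or not value:
                continue  # skip overflow fields and empty cells
            kind = _classify(value)
            # Widen the column's type if this value needs a more general one
            if _RANK[kind] > _RANK.get(types.get(col), -1):
                types[col] = kind
    # Columns that were empty in every row fall back to TEXT
    return {col: types.get(col, "TEXT") for col in (reader.fieldnames or [])}
```

Reading the whole file once just for type detection doubles the I/O; for very large files you could stop after a fixed number of rows, at the risk of missing a later value that needs a wider type.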
The .import command is a feature of the sqlite3 command-line tool. To do it in Python, you should simply load the data using whatever facilities Python has, such as the csv module, and inserting the data as per usual.
This way, you also have control over what types are inserted, rather than relying on sqlite3's seemingly undocumented behaviour.
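To illustrate that control: the csv module hands every field over as a string, so converting each column explicitly before inserting decides what SQLite actually stores. A sketch with a hypothetical three-column file and converter map; the untyped table is deliberate, so SQLite keeps exactly the Python types it receives:

```python
import csv
import io
import sqlite3

# Hypothetical column -> converter mapping for an example CSV; adjust to your file
CONVERTERS = {"id": int, "price": float, "name": str}


def insert_typed(conn, lines, table):
    """Insert CSV rows into `table`, converting each column with CONVERTERS first."""
    reader = csv.DictReader(lines)
    cols = list(CONVERTERS)
    rows = (tuple(CONVERTERS[c](r[c]) for c in cols) for r in reader)
    marks = ", ".join("?" * len(cols))
    conn.executemany(f"INSERT INTO {table} ({', '.join(cols)}) VALUES ({marks})", rows)
    conn.commit()


conn = sqlite3.connect(":memory:")
# No declared column types, so SQLite stores exactly what Python passes in
conn.execute("CREATE TABLE items (id, price, name)")
insert_typed(conn, io.StringIO("id,price,name\n1,9.5,apple\n2,3,pear\n"), "items")
stored = conn.execute("SELECT typeof(id), typeof(price), typeof(name) FROM items").fetchall()
```

Without the converters, all three columns of this untyped table would hold text.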
Many thanks for bernie's answer! I had to tweak it a bit; here's what worked for me:
import csv, sqlite3

conn = sqlite3.connect("pcfc.sl3")
curs = conn.cursor()
curs.execute("CREATE TABLE PCFC (id INTEGER PRIMARY KEY, type INTEGER, term TEXT, definition TEXT);")
# Python 3: decode the file as UTF-8 when opening it instead of calling unicode() per field
with open('PC.txt', 'r', encoding='utf8', newline='') as f:
    reader = csv.reader(f, delimiter='|')
    for row in reader:
        # strip() removes the spaces around each '|' separator in PC.txt
        to_db = [row[0].strip(), row[1].strip(), row[2].strip()]
        curs.execute("INSERT INTO PCFC (type, term, definition) VALUES (?, ?, ?);", to_db)
conn.commit()
My text file (PC.txt) looks like this:
1 | Term 1 | Definition 1
2 | Term 2 | Definition 2
3 | Term 3 | Definition 3
#!/usr/bin/env python3
# -*- coding: utf-8 -*-
import sys, csv, sqlite3

def main():
    con = sqlite3.connect(sys.argv[1])  # database file input
    cur = con.cursor()
    cur.executescript("""
        DROP TABLE IF EXISTS t;
        CREATE TABLE t (COL1 TEXT, COL2 TEXT);
        """)  # checks to see if table exists and makes a fresh table.

    with open(sys.argv[2], "r", encoding="utf8", newline="") as f:  # CSV file input
        reader = csv.reader(f, delimiter=',')  # no header information with delimiter
        for row in reader:
            to_db = [row[0], row[1]]  # text is already decoded as UTF-8 by open()
            cur.execute("INSERT INTO t (COL1, COL2) VALUES(?, ?);", to_db)
    con.commit()
    con.close()  # closes connection to database

if __name__ == '__main__':
    main()
"""
cd Final_Codes
python csv_to_db.py
CSV to SQL DB
"""
import csv
import sqlite3
import os
import fnmatch

UP_FOLDER = os.path.dirname(os.getcwd())
DATABASE_FOLDER = os.path.join(UP_FOLDER, "Databases")
DBNAME = "allCompanies_database.db"

def getBaseNameNoExt(givenPath):
    """Returns the basename of the file without the extension"""
    filename = os.path.splitext(os.path.basename(givenPath))[0]
    return filename

def find(pattern, path):
    """Utility to find files matching a shell-style wildcard pattern (fnmatch)"""
    result = []
    for root, dirs, files in os.walk(path):
        for name in files:
            if fnmatch.fnmatch(name, pattern):
                result.append(os.path.join(root, name))
    return result

if __name__ == "__main__":
    Database_Path = os.path.join(DATABASE_FOLDER, DBNAME)
    csv_files = find('*.csv', DATABASE_FOLDER)
    con = sqlite3.connect(Database_Path)
    cur = con.cursor()
    for each in csv_files:
        with open(each, 'r', newline='') as fin:
            # csv.DictReader uses first line in file for column headings by default
            dr = csv.DictReader(fin)  # comma is default delimiter
            TABLE_NAME = getBaseNameNoExt(each)
            Cols = dr.fieldnames
            numCols = len(Cols)
            to_db = [tuple(i.values()) for i in dr]
        print(TABLE_NAME)
        # Column names come straight from the CSV header
        ColString = ','.join(Cols)
        QuestionMarks = ["?"] * numCols
        ToAdd = ','.join(QuestionMarks)
        cur.execute(f"CREATE TABLE {TABLE_NAME} ({ColString});")
        cur.executemany(
            f"INSERT INTO {TABLE_NAME} ({ColString}) VALUES ({ToAdd});", to_db)
        con.commit()
    con.close()
    print("Execution Complete!")
This should come in handy when you have a lot of CSV files in a folder that you want to convert to a single .db file in one go!
Notice that you don't have to know the file names, table names or field names (column names) beforehand!
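A variant of the same loop, sketched with pathlib and a small identifier-quoting helper, so that file names like sales-2020.csv or headers with spaces still produce valid table and column names. The helper names are made up for this example, and every value is stored as text because no column types are declared:

```python
import csv
import sqlite3
from pathlib import Path


def quote_ident(name):
    # Double quotes let names like "sales-2020" work as table names
    return '"' + name.replace('"', '""') + '"'


def folder_to_db(folder, db_path):
    """Load every *.csv below `folder` into its own table named after the file stem."""
    con = sqlite3.connect(db_path)
    loaded = []
    with con:  # commits if the block succeeds, rolls back on an exception
        for path in sorted(Path(folder).rglob("*.csv")):
            with open(path, newline="") as fin:
                reader = csv.reader(fin)
                header = next(reader)
                table = quote_ident(path.stem)
                cols = ", ".join(quote_ident(h) for h in header)
                marks = ", ".join("?" * len(header))
                con.execute(f"CREATE TABLE IF NOT EXISTS {table} ({cols})")
                # The reader is passed directly instead of building a list first
                con.executemany(f"INSERT INTO {table} VALUES ({marks})", reader)
            loaded.append(path.stem)
    con.close()
    return loaded
```

`rglob` searches subfolders too, like the `os.walk` version; use `glob` if you only want the top folder.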
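If the CSV file must be imported as part of a Python program, then for simplicity and efficiency you could use os.system along the lines suggested by the following:
import os

# .mode csv makes .import parse the file as CSV; a bash here-string ("<<<") would
# fail wherever os.system's shell (/bin/sh) is not bash
cmd = 'sqlite3 database.db -cmd ".mode csv" ".import input.csv mytable"'
rc = os.system(cmd)
print(rc)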
The point is that by specifying the filename of the database, the data will automatically be saved, assuming there are no errors reading it.
Here are solutions that'll work if your CSV file is really big. Use to_sql as suggested by another answer, but set chunksize so it doesn't try to write the whole file at once. Note that pd.read_csv still loads the entire file into memory; if that is a problem, pass chunksize to read_csv as well and call to_sql on each chunk.
import sqlite3
import pandas as pd

conn = sqlite3.connect('my_data.db')
users = pd.read_csv('users.csv')
users.to_sql('users', conn, if_exists='append', index=False, chunksize=10000)
You can also use Dask, as described here, to write a lot of pandas DataFrames in parallel:
import dask
import pandas as pd

# ddf (a dask DataFrame) and db_url (a database URI) are assumed to exist already
dto_sql = dask.delayed(pd.DataFrame.to_sql)
out = [dto_sql(d, 'table_name', db_url, if_exists='append', index=True)
       for d in ddf.to_delayed()]
dask.compute(*out)
See here for more details.
Based on Guy L's solution (love it), but this version quotes table and column names, so headers with spaces or other special characters work.
import csv, sqlite3

def _get_col_datatypes(fin):
    dr = csv.DictReader(fin)  # comma is default delimiter
    fieldTypes = {}
    for entry in dr:
        fieldsLeft = [f for f in dr.fieldnames if f not in fieldTypes]
        if not fieldsLeft:
            break  # We're done
        for field in fieldsLeft:
            data = entry[field]
            # Need data to decide
            if not data:
                continue
            if data.isdigit():
                fieldTypes[field] = "INTEGER"
            else:
                fieldTypes[field] = "TEXT"
        # TODO: SQLite has no dedicated DATE storage class
    # Compare with the header so types found in the last row are counted too
    if len(fieldTypes) < len(dr.fieldnames or []):
        raise Exception("Failed to find all the columns data types - Maybe some are empty?")
    return fieldTypes

def escapingGenerator(f):
    for line in f:
        yield line.encode("ascii", "xmlcharrefreplace").decode("ascii")

def csvToDb(csvFile, dbFile, tablename, outputToFile=False):
    # TODO: implement output to file
    with open(csvFile, mode='r', encoding="ISO-8859-1") as fin:
        dt = _get_col_datatypes(fin)
        fin.seek(0)
        reader = csv.DictReader(fin)
        # Keep the order of the columns name just as in the CSV
        fields = reader.fieldnames
        cols = []
        # Set field and type; the double quotes allow spaces etc. in column names
        for f in fields:
            cols.append("\"%s\" %s" % (f, dt[f]))
        # Generate create table statement:
        stmt = "create table if not exists \"" + tablename + "\" (%s)" % ",".join(cols)
        print(stmt)
        con = sqlite3.connect(dbFile)
        cur = con.cursor()
        cur.execute(stmt)
        fin.seek(0)
        reader = csv.reader(escapingGenerator(fin))
        next(reader)  # skip the header row; it was already used for the column names
        # Generate insert statement:
        stmt = "INSERT INTO \"" + tablename + "\" VALUES(%s);" % ','.join('?' * len(cols))
        cur.executemany(stmt, reader)
        con.commit()
        con.close()
You can do this efficiently using blaze & odo:
import blaze as bz
csv_path = 'data.csv'
bz.odo(csv_path, 'sqlite:///data.db::data')
Odo will store the CSV file in data.db (an SQLite database) in a table named data.
Or you can use odo directly, without blaze. Either way is fine. Read the odo documentation.
The following also creates the field names from the CSV header:
import csv
import sqlite3

def csv_sql(file_dir, table_name, database_name):
    con = sqlite3.connect(database_name)
    cur = con.cursor()
    # Drop the current table by:
    # cur.execute("DROP TABLE IF EXISTS %s;" % table_name)
    with open(file_dir, 'r', newline='') as fl:
        # csv.reader handles quoted commas and a last line without a trailing newline
        reader = csv.reader(fl)
        hd = next(reader)
        db = [tuple(row) for row in reader]
    header = ','.join(hd)
    cur.execute("CREATE TABLE IF NOT EXISTS %s (%s);" % (table_name, header))
    cur.executemany("INSERT INTO %s (%s) VALUES (%s);" % (table_name, header, ('?,' * len(hd))[:-1]), db)
    con.commit()
    con.close()

# Example:
csv_sql('./surveys.csv', 'survey', 'eco.db')
In the interest of simplicity, you could use the sqlite3 command-line tool from your project's Makefile:
%.sql3: %.csv
	rm -f $@
	sqlite3 $@ -echo -cmd ".mode csv" ".import $< $*"
%.dump: %.sql3
	sqlite3 $< "select * from $*"
make test.sql3 then creates the SQLite database from an existing test.csv file, with a single table "test". You can then run make test.dump to verify the contents.
With this you can do joins on CSVs as well:
import sqlite3
import os
import pandas as pd
from typing import List

class CSVDriver:
    def __init__(self, table_dir_path: str):
        self.table_dir_path = table_dir_path  # where tables (ie. csv files) are located
        self._con = None

    @property
    def con(self) -> sqlite3.Connection:
        """Make a singleton connection to an in-memory SQLite database"""
        if self._con is None:
            self._con = sqlite3.connect(":memory:")
        return self._con

    def _exists(self, table: str) -> bool:
        query = """
        SELECT name
        FROM sqlite_master
        WHERE type ='table'
        AND name NOT LIKE 'sqlite_%';
        """
        tables = self.con.execute(query).fetchall()
        # fetchall() returns 1-tuples such as ('account',), so compare the names
        return any(name == table for (name,) in tables)

    def _load_table_to_mem(self, table: str, sep: str = None) -> None:
        """
        Load a CSV into an in-memory SQLite database
        sep is set to None in order to force pandas to auto-detect the delimiter
        """
        if self._exists(table):
            return
        file_name = table + ".csv"
        path = os.path.join(self.table_dir_path, file_name)
        if not os.path.exists(path):
            raise ValueError(f"CSV table {table} does not exist in {self.table_dir_path}")
        df = pd.read_csv(path, sep=sep, engine="python")  # set engine to python to skip pandas' warning
        df.to_sql(table, self.con, if_exists='replace', index=False, chunksize=10000)

    def query(self, query: str) -> List[tuple]:
        """
        Run an SQL query on CSV file(s).
        Tables are loaded from table_dir_path
        """
        tables = extract_tables(query)  # defined below
        for table in tables:
            self._load_table_to_mem(table)
        cursor = self.con.cursor()
        cursor.execute(query)
        records = cursor.fetchall()
        return records
The extract_tables() helper:
import sqlparse
from sqlparse.sql import IdentifierList, Identifier, Function
from sqlparse.tokens import Keyword, DML
from collections import namedtuple
import itertools

class Reference(namedtuple('Reference', ['schema', 'name', 'alias', 'is_function'])):
    __slots__ = ()

    def has_alias(self):
        return self.alias is not None

    @property
    def is_query_alias(self):
        return self.name is None and self.alias is not None

    @property
    def is_table_alias(self):
        return self.name is not None and self.alias is not None and not self.is_function

    @property
    def full_name(self):
        if self.schema is None:
            return self.name
        else:
            return self.schema + '.' + self.name

def _is_subselect(parsed):
    if not parsed.is_group:
        return False
    for item in parsed.tokens:
        if item.ttype is DML and item.value.upper() in ('SELECT', 'INSERT',
                                                        'UPDATE', 'CREATE', 'DELETE'):
            return True
    return False

def _identifier_is_function(identifier):
    return any(isinstance(t, Function) for t in identifier.tokens)

def _extract_from_part(parsed):
    tbl_prefix_seen = False
    for item in parsed.tokens:
        if item.is_group:
            for x in _extract_from_part(item):
                yield x
        if tbl_prefix_seen:
            if _is_subselect(item):
                for x in _extract_from_part(item):
                    yield x
            # An incomplete nested select won't be recognized correctly as a
            # sub-select. eg: 'SELECT * FROM (SELECT id FROM user'. This causes
            # the second FROM to trigger this elif condition resulting in a
            # StopIteration. So we need to ignore the keyword if the keyword
            # is FROM.
            # Also 'SELECT * FROM abc JOIN def' will trigger this elif
            # condition. So we need to ignore the keyword JOIN and its variants
            # INNER JOIN, FULL OUTER JOIN, etc.
            elif item.ttype is Keyword and (
                    not item.value.upper() == 'FROM') and (
                    not item.value.upper().endswith('JOIN')):
                tbl_prefix_seen = False
            else:
                yield item
        elif item.ttype is Keyword or item.ttype is Keyword.DML:
            item_val = item.value.upper()
            if (item_val in ('COPY', 'FROM', 'INTO', 'UPDATE', 'TABLE') or
                    item_val.endswith('JOIN')):
                tbl_prefix_seen = True
        # 'SELECT a, FROM abc' will detect FROM as part of the column list.
        # So this check here is necessary.
        elif isinstance(item, IdentifierList):
            for identifier in item.get_identifiers():
                if (identifier.ttype is Keyword and
                        identifier.value.upper() == 'FROM'):
                    tbl_prefix_seen = True
                    break

def _extract_table_identifiers(token_stream):
    for item in token_stream:
        if isinstance(item, IdentifierList):
            for ident in item.get_identifiers():
                try:
                    alias = ident.get_alias()
                    schema_name = ident.get_parent_name()
                    real_name = ident.get_real_name()
                except AttributeError:
                    continue
                if real_name:
                    yield Reference(schema_name, real_name,
                                    alias, _identifier_is_function(ident))
        elif isinstance(item, Identifier):
            yield Reference(item.get_parent_name(), item.get_real_name(),
                            item.get_alias(), _identifier_is_function(item))
        elif isinstance(item, Function):
            yield Reference(item.get_parent_name(), item.get_real_name(),
                            item.get_alias(), _identifier_is_function(item))

def extract_tables(sql):
    # let's handle multiple statements in one sql string
    extracted_tables = []
    statements = list(sqlparse.parse(sql))
    for statement in statements:
        stream = _extract_from_part(statement)
        extracted_tables.append([ref.name for ref in _extract_table_identifiers(stream)])
    return list(itertools.chain(*extracted_tables))
Example (assuming account.csv and tojoin.csv exist in /path/to/files):
db_path = r"/path/to/files"
driver = CSVDriver(db_path)
query = """
SELECT tojoin.col_to_join
FROM account
LEFT JOIN tojoin
ON account.a = tojoin.a
"""
driver.query(query)
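If you already know which tables a query needs, the same idea works with the standard library alone: load each CSV into an in-memory SQLite table and let SQLite do the join. A minimal sketch using the example's table names; the CSV contents are made up, and every value is stored as text:

```python
import csv
import io
import sqlite3


def quote_ident(name):
    return '"' + name.replace('"', '""') + '"'


def load_table(con, table, lines):
    """Create `table` from a CSV header and insert the remaining rows."""
    reader = csv.reader(lines)
    header = next(reader)
    cols = ", ".join(quote_ident(h) for h in header)
    marks = ", ".join("?" * len(header))
    con.execute(f"CREATE TABLE {quote_ident(table)} ({cols})")
    con.executemany(f"INSERT INTO {quote_ident(table)} VALUES ({marks})", reader)


con = sqlite3.connect(":memory:")
# Made-up contents standing in for account.csv and tojoin.csv
load_table(con, "account", io.StringIO("a,owner\n1,ann\n2,bob\n"))
load_table(con, "tojoin", io.StringIO("a,col_to_join\n1,x\n"))
rows = con.execute(
    """
    SELECT account.owner, tojoin.col_to_join
    FROM account
    LEFT JOIN tojoin ON account.a = tojoin.a
    ORDER BY account.owner
    """
).fetchall()
```

With real files, call `load_table(con, "account", open("/path/to/files/account.csv", newline=""))` and so on. The parsing step is only needed when the table names are not known in advance.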
I've found that it can be necessary to break up the transfer of data from the CSV to the database into chunks so as not to run out of memory. This can be done like this:
import csv
import sqlite3
from operator import itemgetter

# Establish connection
conn = sqlite3.connect("mydb.db")

# Create the table
conn.execute(
    """
    CREATE TABLE persons(
        person_id INTEGER,
        last_name TEXT,
        first_name TEXT,
        address TEXT
    )
    """
)

# These are the columns from the csv that we want
cols = ["person_id", "last_name", "first_name", "address"]
get_cols = itemgetter(*cols)
insert_sql = "INSERT INTO persons VALUES(?, ?, ?, ?)"

# If the csv file is huge, we instead add the data in chunks
chunksize = 10000

# Parse csv file and populate db in chunks; `with conn` commits at the end
with conn, open("persons.csv", newline="") as f:
    reader = csv.DictReader(f)
    chunk = []
    for i, row in enumerate(reader):  # enumerate supplies the row counter
        if i % chunksize == 0 and i > 0:
            conn.executemany(insert_sql, chunk)
            chunk = []
        chunk.append(get_cols(row))
    if chunk:
        conn.executemany(insert_sql, chunk)  # insert the final, partial chunk
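The same chunking can be factored into a small helper built on itertools.islice, which also takes care of the final partial chunk. A sketch, with the helper name and the tiny inline CSV made up for the example; committing after each batch is a design choice, not a requirement:

```python
import csv
import io
import sqlite3
from itertools import islice


def insert_in_batches(conn, sql, rows, batch_size=10000):
    """Run executemany on successive slices of `rows`, committing after each one.

    Only one batch is held in memory at a time. Returns the number of rows inserted.
    """
    rows = iter(rows)
    total = 0
    while True:
        batch = list(islice(rows, batch_size))
        if not batch:  # iterator exhausted; the last partial batch was already handled
            return total
        conn.executemany(sql, batch)
        conn.commit()
        total += len(batch)


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE persons (person_id INTEGER, last_name TEXT)")
data = io.StringIO("person_id,last_name,extra\n1,Smith,a\n2,Jones,b\n3,Lee,c\n")
# A generator expression keeps only the wanted columns without building a full list
rows = ((r["person_id"], r["last_name"]) for r in csv.DictReader(data))
n = insert_in_batches(conn, "INSERT INTO persons VALUES (?, ?)", rows, batch_size=2)
```

Because person_id is declared INTEGER, SQLite converts the text "1", "2", "3" to integers on insert.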
Here is my version. If the given path is not a valid file, it asks you to select the '.csv' file you want to convert:
import pandas as pd
import sqlite3
import os
from tkinter import Tk
from tkinter.filedialog import askopenfilename
from pathlib import Path

def csv_to_db(csv_filedir):
    if not Path(csv_filedir).is_file():  # if needed ask for user input of CSV file
        current_path = os.getcwd()
        Tk().withdraw()
        csv_filedir = askopenfilename(initialdir=current_path)
    try:
        csv_df = pd.read_csv(csv_filedir)  # load CSV file
    except Exception:
        print("Something went wrong when opening the file")
        print(csv_filedir)
        raise
    [path, filename] = os.path.split(csv_filedir)  # define path and filename
    [filename, _] = os.path.splitext(filename)
    database_filedir = os.path.join(path, filename + '.db')
    conn = sqlite3.connect(database_filedir)  # connect to the SQLite database file
    [fields_sql, header_sql_string] = create_sql_fields(csv_df)
    # CREATE EMPTY TABLE
    create_sql = 'CREATE TABLE IF NOT EXISTS ' + filename + ' (' + fields_sql + ')'
    cursor = conn.cursor()
    cursor.execute(create_sql)
    # INSERT ALL ROWS; ? placeholders work for any number of columns
    placeholders = ', '.join('?' * len(csv_df.columns))
    insert_sql = 'INSERT INTO ' + filename + header_sql_string + ' VALUES (' + placeholders + ')'
    # Python objects instead of numpy scalars, and NaN -> None so SQLite stores a real NULL
    rows = csv_df.astype(object).where(csv_df.notna(), None)
    cursor.executemany(insert_sql, rows.itertuples(index=False, name=None))
    # COMMIT CHANGES TO DATABASE AND CLOSE CONNECTION
    conn.commit()
    conn.close()
    print('\n' + csv_filedir + ' \n converted to \n' + database_filedir)
    return database_filedir

def create_sql_fields(df):  # gather the headers of the CSV and create two strings
    fields_sql = []  # str1 = var1 TYPE, var2 TYPE ...
    header_names = []  # str2 = var1, var2, var3, var4
    for col in range(0, len(df.columns)):
        fields_sql.append(df.columns[col])
        fields_sql.append(str(df.dtypes.iloc[col]))
        header_names.append(df.columns[col])
        if col != len(df.columns) - 1:
            fields_sql.append(',')
            header_names.append(',')
    fields_sql = ' '.join(fields_sql)
    fields_sql = fields_sql.replace('int64', 'integer')
    fields_sql = fields_sql.replace('float64', 'real')
    fields_sql = fields_sql.replace('object', 'text')
    header_sql_string = '(' + ''.join(header_names) + ')'
    return fields_sql, header_sql_string

csv_to_db('')

how to automatically create table based on CSV into postgres using python

I am a new Python programmer and am trying to import a sample CSV file into my Postgres database using a Python script.
I have a CSV file named abstable1 with 3 headers:
absid, name, number
I have many such files in a folder.
I want to create a table in PostgreSQL with the same name as the CSV file, for each of them.
Here is the code I tried, just creating a table for one file as a test:
import psycopg2
import csv
import os
#filePath = 'c:\\Python27\\Scripts\\abstable1.csv'
conn = psycopg2.connect("host= hostnamexx dbname=dbnamexx user= usernamexx password= pwdxx")
print("Connecting to Database")
cur = conn.cursor()
#Uncomment to execute the code below to create a table
cur.execute("""CREATE TABLE abs.abstable1(
absid varchar(10) PRIMARY KEY,
name integer,
number integer
)
""")
#to copy the csv data into created table
with open('abstable1.csv', 'r') as f:
    next(f)
    cur.copy_from(f, 'abs.abstable1', sep=',')
    conn.commit()
conn.close()
This is the error that I am getting:
File "c:\Python27\Scripts\testabs.py", line 26, in <module>
cur.copy_from(f, 'abs.abstable1', sep=',')
psycopg2.errors.QueryCanceled: COPY from stdin failed: error in .read() call: exceptions.ValueError Mixing iteration and read methods would lose data
CONTEXT: COPY abstable1, line 1
Any recommendation or alternate solution to resolve this issue is highly appreciated.
Here's what worked for me, using glob.
This code reads all CSV files in a folder and creates a table with the same name as each file.
I'm still trying to figure out how to derive specific datatypes from the data in the CSV.
But as far as table creation is concerned, this works like a charm for all CSV files in a folder.
import csv
import psycopg2
import os
import glob

conn = psycopg2.connect("host= hostnamexx dbname=dbnamexx user= usernamexx password= pwdxx")
print("Connecting to Database")

csvPath = "./TestDataLGA/"

# Loop through each CSV
for filename in glob.glob(csvPath + "*.csv"):
    # Create a table name from the file name without folder and extension
    tablename = os.path.splitext(os.path.basename(filename))[0]
    print(tablename)

    # Extract first line of file
    with open(filename, "r") as fileInput:
        firstLine = fileInput.readline().strip()

    # Split columns into an array [...]
    columns = firstLine.split(",")

    # Build SQL code to drop table if exists and create table
    sqlQueryCreate = 'DROP TABLE IF EXISTS ' + tablename + ";\n"
    sqlQueryCreate += 'CREATE TABLE ' + tablename + "("

    #some loop or function according to your requirement
    # Define columns for table
    for column in columns:
        sqlQueryCreate += column + " VARCHAR(64),\n"

    sqlQueryCreate = sqlQueryCreate[:-2]
    sqlQueryCreate += ");"

    cur = conn.cursor()
    cur.execute(sqlQueryCreate)
    conn.commit()
    cur.close()
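To load the data as well as create the tables, one option is to generate both the CREATE TABLE and a CSV-mode COPY statement from the header line, then hand the COPY to psycopg2's `copy_expert`. A sketch that only builds the SQL strings (so it runs without a database); the function names are made up for the example and every column is created as VARCHAR. Note that quoting the identifiers preserves their case, so a header like AbsID becomes a case-sensitive column name.

```python
import csv


def quote_ident(name):
    # PostgreSQL identifier quoting: wrap in double quotes, double any embedded quote
    return '"' + name.replace('"', '""') + '"'


def build_statements(table, header_line, schema=None):
    """Return (create_sql, copy_sql) for a CSV whose first line is `header_line`."""
    columns = next(csv.reader([header_line]))
    target = quote_ident(table)
    if schema is not None:
        target = quote_ident(schema) + "." + target
    cols = ",\n    ".join(quote_ident(c.strip()) + " VARCHAR" for c in columns)
    create_sql = f"DROP TABLE IF EXISTS {target};\nCREATE TABLE {target} (\n    {cols}\n);"
    # HEADER true tells COPY to skip the first line itself
    copy_sql = f"COPY {target} FROM STDIN WITH (FORMAT csv, HEADER true)"
    return create_sql, copy_sql


# Hypothetical psycopg2 usage (needs a live database, so it is left commented out):
# with open(path, newline="") as f:
#     create_sql, copy_sql = build_statements(Path(path).stem, f.readline())
#     f.seek(0)  # rewind so COPY sees the header line it has been told to skip
#     cur.execute(create_sql)
#     cur.copy_expert(copy_sql, f)
# conn.commit()
```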
I tried your code and it works fine:
import psycopg2
conn = psycopg2.connect("host= 127.0.0.1 dbname=testdb user=postgres password=postgres")
print("Connecting to Database")
cur = conn.cursor()
'''cur.execute("""CREATE TABLE abstable1(
absid varchar(10) PRIMARY KEY,
name integer,
number integer
)
""")'''
with open('lolo.csv', 'r') as f:
    next(f)
    cur.copy_from(f, 'abstable1', sep=',', columns=('absid', 'name', 'number'))
    conn.commit()
conn.close()
Although I had to make some changes for it to work:
I had to name the table abstable1, because with abs.abstable1 Postgres assumes you are using the schema abs. Maybe you created that schema in your database; if not, check that. Also, I'm using Python 3.7.
I noticed that you are using Python 2.7, which is no longer supported, and this may cause issues. The "Mixing iteration and read methods" ValueError itself comes from Python 2's file object, which refuses a .read() call (which copy_from makes) after next() has been used on the same file; in Python 2, f.readline() skips the header without that problem. Since you say you are learning, I would recommend Python 3: it is far more widely used now, you will most likely encounter code written for it, and otherwise you would keep having to adapt that code to Python 2.7.
I'm posting my solution here, based on @Rose's answer.
I used SQLAlchemy, a JSON file as config, and glob.
import json
import glob
import os
from sqlalchemy import create_engine, text

def create_tables_from_files(files_folder, engine, config):
    try:
        for filename in glob.glob(os.path.join(files_folder, "*.csv")):
            tablename = os.path.splitext(os.path.basename(filename))[0]
            with open(filename, "r") as input_file:
                columns = input_file.readline().strip().split(",")
            create_query = 'DROP TABLE IF EXISTS ' + config["staging_schema"] + "." + tablename + "; \n"
            create_query += 'CREATE TABLE ' + config["staging_schema"] + "." + tablename + " ( "
            create_query += ", \n ".join(column + " VARCHAR" for column in columns)
            create_query += ");"
            # engine.execute() was removed in SQLAlchemy 2.0; engine.begin() commits on exit
            with engine.begin() as conn:
                conn.execute(text(create_query))
            print(tablename + " table created")
    except Exception as e:
        print("Error at uploading tables:", e)

SQL query output to .csv

I am running an SQL query from the Python API and want to collect the data in a structured .csv format (column-wise data under their headers).
This is the code I have so far:
sql = "SELECT id,author From researches WHERE id < 20 "
cursor.execute(sql)
data = cursor.fetchall()
print (data)
with open('metadata.csv', 'w', newline='') as f_handle:
    writer = csv.writer(f_handle)
    header = ['id', 'author']
    writer.writerow(header)
    for row in data:
        writer.writerow(row)
The data is printed on the console but does not end up in the .csv file.
What am I missing?
Here is a simple example of what you are trying to do:
import sqlite3 as db
import csv

# Run your query, the result is stored as `data`
with db.connect('vehicles.db') as conn:
    cur = conn.cursor()
    sql = "SELECT make, style, color, plate FROM vehicle_vehicle"
    cur.execute(sql)
    data = cur.fetchall()

# Create the csv file
with open('vehicle.csv', 'w', newline='') as f_handle:
    writer = csv.writer(f_handle)
    # Add the header/column names
    header = ['make', 'style', 'color', 'plate']
    writer.writerow(header)
    # Iterate over `data` and write to the csv file
    for row in data:
        writer.writerow(row)
import pandas as pd
from sqlalchemy import create_engine
from urllib.parse import quote_plus

params = quote_plus(r'Driver={SQL Server};Server=server_name; Database=DB_name;Trusted_Connection=yes;')
engine = create_engine("mssql+pyodbc:///?odbc_connect=%s" % params)

sql_string = '''SELECT id,author From researches WHERE id < 20 '''
final_data_fetch = pd.read_sql_query(sql_string, engine)
final_data_fetch.to_csv('file_name.csv', index=False)  # index=False: don't write the DataFrame index as a column
Hope this helps!
With MySQL: export to CSV with the mysqlclient library (UTF-8):
import csv
import MySQLdb as mariadb
import sys

tablelue = "extracted_table"
try:
    conn = mariadb.connect(
        host="127.0.0.1",
        port=3306,
        user="me",
        password="mypasswd",
        database="mydb")
    cur = conn.cursor()
    instruction = "show columns from " + tablelue
    cur.execute(instruction)
    myresult = cur.fetchall()
    # The first field of each SHOW COLUMNS row is the column name
    work = [x[0] for x in myresult]
    wsql = "SELECT * FROM " + tablelue
    cur.execute(wsql)
    wdata = cur.fetchall()
    # Create the csv file
    fichecrit = tablelue + ".csv"
    with open(fichecrit, 'w', newline='', encoding="utf8") as f_handle:
        writer = csv.writer(f_handle, delimiter=";")
        # Add the header/column names
        header = work
        writer.writerow(header)
        # Iterate over `data` and write to the csv file
        for row in wdata:
            writer.writerow(row)
    conn.close()
except Exception as e:
    print(f"Error: {e}")
    sys.exit(1)  # a non-zero exit status signals the failure
You can dump all results to the csv file without looping:
data = cursor.fetchall()
...
writer.writerows(data)
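Combining both ideas, the header row can also be taken from the cursor itself instead of being typed by hand: DB-API cursors expose the column names in `cursor.description`. A sketch shown with an in-memory sqlite3 database and a made-up researches table; `query_to_csv` is a hypothetical name:

```python
import csv
import io
import sqlite3


def query_to_csv(conn, sql, out, params=()):
    """Write the result of `sql` to the text stream `out`, with a header row first."""
    cur = conn.cursor()
    cur.execute(sql, params)
    writer = csv.writer(out)
    # cursor.description holds one 7-item tuple per column; item 0 is the column name
    writer.writerow([col[0] for col in cur.description])
    writer.writerows(cur)  # the cursor yields rows lazily, so no fetchall() needed


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE researches (id INTEGER, author TEXT)")
conn.executemany("INSERT INTO researches VALUES (?, ?)", [(1, "Ada"), (25, "Bob")])
buf = io.StringIO()
query_to_csv(conn, "SELECT id, author FROM researches WHERE id < ?", buf, (20,))
```

To write a real file, pass `open("metadata.csv", "w", newline="")` as `out`; sqlite3 and psycopg2 cursors can both be iterated like this.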

Read and write a Postgres script using Python

I have Postgres tables, and I want to run a PostgreSQL script file against them using Python and then write the results of the queries to a CSV file. The script file has multiple queries separated by semicolons (;). A sample script is shown below.
Script file:
--Duplication Check
select p.*, c.name
from scale_polygons_v3 c inner join cartographic_v3 p
on (metaphone(c.name_displ, 20) LIKE metaphone(p.name, 20)) AND c.kind NOT IN (9,10)
where ST_Contains(c.geom, p.geom);
--Area Check
select sp.areaid,sp.name_displ,p.road_id,p.name
from scale_polygons_v3 sp, pak_roads_20162207 p
where st_contains(sp.geom,p.geom) and sp.kind = 1
and p.areaid != sp.areaid;
When I run the Python code, it executes successfully without any error, but the problem I am facing is in writing the results of the queries to a CSV file: only the result of the last executed query is written. The first query's result is overwritten by the second, the second by the third, and so on until the last query.
Here is my python code:
import psycopg2
import sys
import csv
import datetime, time
def run_sql_file(filename, connection):
    '''
    The function takes a filename and a connection as input
    and will run the SQL query on the given connection
    '''
    start = time.time()
    file = open(filename, 'r')
    sql = " ".join(file.readlines())
    file.close()
    #sql = sql1[3:]
    print("Start executing: " + " at " + str(datetime.datetime.now().strftime("%Y-%m-%d %H:%M")) + "\n")
    print("Query:\n", sql + "\n")
    cursor = connection.cursor()
    cursor.execute(sql)
    records = cursor.fetchall()
    with open('Report.csv', 'a') as f:
        writer = csv.writer(f, delimiter=',')
        for row in records:
            writer.writerow(row)
    connection.commit()
    end = time.time()
    row_count = sum(1 for row in records)
    print("Done Executing:", filename)
    print("Number of rows returned:", row_count)
    print("Time elapsed to run the query:", str((end - start) * 1000) + ' ms')
    print("\t ===============================")
def main():
    connection = psycopg2.connect("host='localhost' dbname='central' user='postgres' password='tpltrakker'")
    run_sql_file("script.sql", connection)
    connection.close()
if __name__ == "__main__":
    main()
What is wrong with my code?
If you are able to change the SQL script a bit, then here is a workaround:
#!/usr/bin/env python
import psycopg2
script = '''
declare cur1 cursor for
    select * from (values(1,2),(3,4)) as t(x,y);
declare cur2 cursor for
    select 'a','b','c';
'''
print(script)
conn = psycopg2.connect('')
# Cursors exist and are available only inside the transaction
conn.autocommit = False
# Create cursors from script
conn.cursor().execute(script)
# Read names of cursors
cursors = conn.cursor()
cursors.execute('select name from pg_cursors;')
cur_names = cursors.fetchall()
# Read data from each available cursor
for cname in cur_names:
    print(cname[0])
    cur = conn.cursor()
    cur.execute('fetch all from ' + cname[0])
    rows = cur.fetchall()
    # Here you can save the data to the file
    print(rows)
conn.rollback()
print('done')
Disclaimer: I am a total newbie with Python.
The simplest way to output each query to a different file is copy_expert:
import psycopg2
# Same connection details as in the question
conn = psycopg2.connect("host='localhost' dbname='central' user='postgres' password='tpltrakker'")
cursor = conn.cursor()
query = '''
select p.*, c.name
from
    scale_polygons_v3 c
    inner join
    cartographic_v3 p on metaphone(c.name_displ, 20) LIKE metaphone(p.name, 20) and c.kind not in (9,10)
where ST_Contains(c.geom, p.geom)
'''
copy = "copy ({}) to stdout (format csv)".format(query)
with open('Report.csv', 'wb') as f:
    cursor.copy_expert(copy, f, size=8192)
# No trailing semicolon: the query is embedded inside copy (...)
query = '''
select sp.areaid,sp.name_displ,p.road_id,p.name
from scale_polygons_v3 sp, pak_roads_20162207 p
where st_contains(sp.geom,p.geom) and sp.kind = 1 and p.areaid != sp.areaid
'''
copy = "copy ({}) to stdout (format csv)".format(query)
with open('Report2.csv', 'wb') as f:
    cursor.copy_expert(copy, f, size=8192)
conn.close()
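If you want to append the second output to the same file, just keep the first file object open and pass it to the second copy_expert call.
Note that the COPY has to write to STDOUT so that copy_expert can pass its output to the file object; COPY ... TO 'filename' would write the file on the database server instead.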
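If the script cannot be rewritten, the cause can also be avoided by running one statement per execute() call: when several statements are sent in a single call, libpq returns only the last statement's result, so fetchall() sees only the last query. Below is a minimal sketch of that approach. It uses sqlite3 as a stand-in so it runs without a server (psycopg2 cursors offer the same execute, description and fetchall calls), and the helper names are hypothetical. The naive split on ';' assumes semicolons appear only as statement terminators, never inside string literals or comments.

```python
import csv
import io
import sqlite3


def split_statements(script):
    """Naively split an SQL script on ';' and drop empty fragments.

    Assumes no semicolons inside string literals or comments.
    """
    return [part.strip() for part in script.split(";") if part.strip()]


def run_script_to_csv(conn, script, out):
    """Execute each statement separately and write every result set to `out`.

    Each result set is preceded by its own header row so the blocks can be
    told apart. Returns the row count of each statement that returned rows.
    """
    writer = csv.writer(out)
    counts = []
    cur = conn.cursor()
    for statement in split_statements(script):
        cur.execute(statement)
        if cur.description is None:  # not a query (e.g. UPDATE): nothing to write
            continue
        rows = cur.fetchall()
        writer.writerow([col[0] for col in cur.description])
        writer.writerows(rows)
        counts.append(len(rows))
    cur.close()
    return counts
```

With psycopg2, conn would be the connection from the question and out a file opened once in append or write mode, so every result set ends up in the same Report.csv.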

Insert CSV into SQL database in python

I want to insert the data in my CSV file into a table that I created before.
So let's say I created a table named T.
The CSV file is the following:
Last,First,Student Number,Department
Gonzalez,Oliver,1862190394,Chemistry
Roberts,Barbara,1343146197,Computer Science
Carter,Raymond,1460039151,Philosophy
Building on what was shared by Mumpo.
This has worked for me when inserting a CSV into SQL Server. You just need to provide your connection details, file path, and the table you want to write to. The only caveat is that your table must already exist, as this code inserts the CSV into an existing table.
import pyodbc
import csv
# DESTINATION CONNECTION
drivr = ""
servr = ""
db = ""
username = ""
password = ""
my_cnxn = pyodbc.connect('DRIVER={};SERVER={};DATABASE={};UID={};PWD={}'.format(drivr, servr, db, username, password))
my_cursor = my_cnxn.cursor()
def insert_records(table, yourcsv, cursor, cnxn):
    # INSERT SOURCE RECORDS TO DESTINATION
    with open(yourcsv, newline='') as csvfile:
        csvFile = csv.reader(csvfile, delimiter=',')
        header = next(csvFile)
        # Bracket-quote the column names so headers such as "Student Number" work
        headers = ['[' + x.strip() + ']' for x in header]
        placeholders = ', '.join('?' for _ in headers)
        insert = 'INSERT INTO {} ({}) VALUES ({})'.format(table, ', '.join(headers), placeholders)
        for row in csvFile:
            # Pass the values as parameters instead of pasting them into the SQL,
            # so quotes inside the data cannot break the statement
            values = [x.strip() for x in row]
            cursor.execute(insert, values)
    cnxn.commit()  # must commit unless your SQL database auto-commits
table = '<sql-table-here>'  # SET YOUR TABLE NAME
mycsv = '...T.csv'  # SET YOUR FILEPATH
insert_records(table, mycsv, my_cursor, my_cnxn)
my_cursor.close()
my_cnxn.close()
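The same header-driven insert can be tried without a SQL Server by using the standard-library sqlite3 module, which uses the same ? placeholder style as pyodbc. The sketch below is an assumption-labelled variant, not the answer's code: the function name insert_csv is hypothetical, it double-quotes column names (SQLite's identifier quoting) instead of brackets, and it sends all rows in one executemany call.

```python
import csv
import io
import sqlite3


def insert_csv(conn, table, csvfile):
    """Insert every data row of `csvfile` into the existing `table`.

    Column names come from the CSV header; values are bound parameters,
    so quotes in the data (e.g. O'Brien) need no escaping.
    Returns the number of rows inserted.
    """
    reader = csv.reader(csvfile)
    headers = [h.strip() for h in next(reader)]
    # Identifiers cannot be bound, so they are quoted; they must still be trusted.
    columns = ", ".join('"{}"'.format(h) for h in headers)
    placeholders = ", ".join("?" for _ in headers)
    sql = "INSERT INTO {} ({}) VALUES ({})".format(table, columns, placeholders)
    rows = [[v.strip() for v in row] for row in reader if row]  # skip blank lines
    cur = conn.cursor()
    cur.executemany(sql, rows)
    conn.commit()
    cur.close()
    return len(rows)
```

With a real file, pass the object returned by open(path, newline="") as csvfile.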
