Read and write postgres script using python - python

I have Postgres tables and I want to run a PostgreSQL script file on these tables using Python and then write the result of the queries to a CSV file. The script file has multiple queries separated by a semicolon ;. A sample script is shown below.
Script file:
--Duplication Check
select p.*, c.name
from scale_polygons_v3 c inner join cartographic_v3 p
on (metaphone(c.name_displ, 20) LIKE metaphone(p.name, 20)) AND c.kind NOT IN (9,10)
where ST_Contains(c.geom, p.geom);
--Area Check
select sp.areaid,sp.name_displ,p.road_id,p.name
from scale_polygons_v3 sp, pak_roads_20162207 p
where st_contains(sp.geom,p.geom) and sp.kind = 1
and p.areaid != sp.areaid;
When I run the Python code, it executes successfully without any error, but the problem I am facing is with writing the result of the queries to a CSV file. Only the result of the last executed query is written to the CSV file. It means that the first query's result is overwritten by the second query, the second by the third, and so on until the last query.
Here is my python code:
import psycopg2
import sys
import csv
import datetime, time

def run_sql_file(filename, connection):
    '''
    The function takes a filename and a connection as input
    and will run the SQL query on the given connection
    '''
    start = time.time()
    file = open(filename, 'r')
    sql = s = " ".join(file.readlines())
    #sql = sql1[3:]
    print "Start executing: " + " at " + str(datetime.datetime.now().strftime("%Y-%m-%d %H:%M")) + "\n"
    print "Query:\n", sql + "\n"
    cursor = connection.cursor()
    cursor.execute(sql)
    records = cursor.fetchall()
    with open('Report.csv', 'a') as f:
        writer = csv.writer(f, delimiter=',')
        for row in records:
            writer.writerow(row)
    connection.commit()
    end = time.time()
    row_count = sum(1 for row in records)
    print "Done Executing:", filename
    print "Number of rows returned:", row_count
    print "Time elapsed to run the query:", str((end - start)*1000) + ' ms'
    print "\t ==============================="

def main():
    connection = psycopg2.connect("host='localhost' dbname='central' user='postgres' password='tpltrakker'")
    run_sql_file("script.sql", connection)
    connection.close()

if __name__ == "__main__":
    main()
What is wrong with my code?
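For context: when a single execute() call contains several statements, psycopg2 runs them all but cursor.fetchall() only returns the rows of the last one, which matches the behaviour described above. A minimal sketch of one workaround is to split the script on ; and run each query separately, appending every result set to the CSV (the function name and the naive split are illustrative, not from the question):
import csv
import psycopg2

def run_sql_file_per_query(filename, connection, out_csv='Report.csv'):
    with open(filename) as f:
        script = f.read()
    # Naive split on ';' -- assumes no semicolons inside string literals.
    queries = [q.strip() for q in script.split(';') if q.strip()]
    cursor = connection.cursor()
    with open(out_csv, 'a', newline='') as out:
        writer = csv.writer(out)
        for query in queries:
            cursor.execute(query)
            writer.writerows(cursor.fetchall())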

If you are able to change the SQL script a bit then here is a workaround:
#!/usr/bin/env python
import psycopg2

script = '''
    declare cur1 cursor for
        select * from (values (1,2), (3,4)) as t(x,y);
    declare cur2 cursor for
        select 'a', 'b', 'c';
'''
print script

conn = psycopg2.connect('')
# Cursors exist and are available only inside the transaction
conn.autocommit = False
# Create the cursors from the script
conn.cursor().execute(script)
# Read the names of the cursors
cursors = conn.cursor()
cursors.execute('select name from pg_cursors;')
cur_names = cursors.fetchall()
# Read data from each available cursor
for cname in cur_names:
    print cname[0]
    cur = conn.cursor()
    cur.execute('fetch all from ' + cname[0])
    rows = cur.fetchall()
    # Here you can save the data to the file
    print rows
conn.rollback()
print 'done'
Disclaimer: I am a total newbie with Python.
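To actually save the rows instead of printing them (the "Here you can save the data to the file" step above), a sketch that writes one CSV per cursor could look like the following; it would run in place of the printing loop, before the rollback, while the cursors still exist. The file naming is just an example:
import csv

# Sketch only: one CSV file per declared cursor, named after the cursor.
for cname in cur_names:
    cur = conn.cursor()
    cur.execute('fetch all from ' + cname[0])
    with open(cname[0] + '.csv', 'w', newline='') as f:
        csv.writer(f).writerows(cur.fetchall())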

The simplest way to output each query to a different file is to use copy_expert:
query = '''
select p.*, c.name
from
scale_polygons_v3 c
inner join
cartographic_v3 p on metaphone(c.name_displ, 20) LIKE metaphone(p.name, 20) and c.kind not in (9,10)
where ST_Contains(c.geom, p.geom)
'''
copy = "copy ({}) to stdout (format csv)".format(query)
f = open('Report.csv', 'wb')
cursor.copy_expert(copy, f, size=8192)
f.close()
query = '''
select sp.areaid,sp.name_displ,p.road_id,p.name
from scale_polygons_v3 sp, pak_roads_20162207 p
where st_contains(sp.geom,p.geom) and sp.kind = 1 and p.areaid != sp.areaid;
'''
copy = "copy ({}) to stdout (format csv)".format(query)
f = open('Report2.csv', 'wb')
cursor.copy_expert(copy, f, size=8192)
f.close()
If you want to append the second output to the same file, just keep the first file object open.
Note that the COPY must output to stdout for its result to be available to copy_expert.
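As a sketch of that single-file variant (query1 and query2 stand for the two SELECTs above, and conn for an open psycopg2 connection):
# Stream both result sets into one CSV by keeping the file open.
with conn.cursor() as cursor, open('Report.csv', 'wb') as f:
    for q in (query1, query2):
        cursor.copy_expert("copy ({}) to stdout (format csv)".format(q), f)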

Related

Is there Python code to write directly into a SQLite command line? [duplicate]

I have a CSV file and I want to bulk-import this file into my sqlite3 database using Python. The command is ".import .....", but it seems that it cannot work like this. Can anyone give me an example of how to do it in sqlite3? I am using Windows, just in case.
Thanks
import csv, sqlite3

con = sqlite3.connect(":memory:") # change to 'sqlite:///your_filename.db'
cur = con.cursor()
cur.execute("CREATE TABLE t (col1, col2);") # use your column names here

with open('data.csv','r') as fin: # `with` statement available in 2.5+
    # csv.DictReader uses first line in file for column headings by default
    dr = csv.DictReader(fin) # comma is default delimiter
    to_db = [(i['col1'], i['col2']) for i in dr]

cur.executemany("INSERT INTO t (col1, col2) VALUES (?, ?);", to_db)
con.commit()
con.close()
Creating an sqlite connection to a file on disk is left as an exercise for the reader ... but there is now a two-liner made possible by the pandas library
df = pandas.read_csv(csvfile)
df.to_sql(table_name, conn, if_exists='append', index=False)
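Filling in the surrounding boilerplate as a sketch (file, table, and database names are placeholders):
import sqlite3
import pandas as pd

conn = sqlite3.connect("your_filename.db")   # on-disk SQLite file
df = pd.read_csv("data.csv")
df.to_sql("table_name", conn, if_exists="append", index=False)
conn.close()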
You're right that .import is the way to go, but that's a command from the SQLite3 command line program. A lot of the top answers to this question involve native python loops, but if your files are large (mine are 10^6 to 10^7 records), you want to avoid reading everything into pandas or using a native python list comprehension/loop (though I did not time them for comparison).
For large files, I believe the best option is to use subprocess.run() to execute sqlite's import command. In the example below, I assume the table already exists, but the csv file has headers in the first row. See .import docs for more info.
subprocess.run()
import subprocess
from pathlib import Path

db_name = Path('my.db').resolve()
csv_file = Path('file.csv').resolve()

result = subprocess.run(['sqlite3',
                         str(db_name),
                         '-cmd',
                         '.mode csv',
                         '.import --skip 1 ' + str(csv_file).replace('\\', '\\\\')
                         + ' <table_name>'],
                        capture_output=True)
Edit note: sqlite3's .import command has improved so that it can treat the first row as header names or even skip the first x rows (requires version >= 3.32, as noted in this answer). If you have an older version of sqlite3, you may need to first create the table, then strip off the first row of the CSV before importing. The --skip 1 argument will give an error prior to 3.32.
Explanation
From the command line, the command you're looking for is sqlite3 my.db -cmd ".mode csv" ".import file.csv table". subprocess.run() runs a command-line process. The argument to subprocess.run() is a sequence of strings which are interpreted as a command followed by all of its arguments.
sqlite3 my.db opens the database
the -cmd flag after the database allows you to pass multiple follow-on commands to the sqlite program. In the shell, each command has to be in quotes, but here they just need to be their own element of the sequence
'.mode csv' does what you'd expect
'.import --skip 1' + str(csv_file).replace('\\','\\\\') + ' <table_name>' is the import command.
Unfortunately, since subprocess passes all follow-ons to -cmd as quoted strings, you need to double up your backslashes if you have a Windows directory path.
Stripping Headers
Not really the main point of the question, but here's what I used. Again, I didn't want to read the whole file into memory at any point:
import shutil

with open(csv, "r") as source:
    source.readline()  # skip the header row
    with open(str(csv) + "_nohead", "w") as target:
        shutil.copyfileobj(source, target)
My 2 cents (more generic):
import csv, sqlite3
import logging

def _get_col_datatypes(fin):
    dr = csv.DictReader(fin) # comma is default delimiter
    fieldTypes = {}
    for entry in dr:
        feildslLeft = [f for f in dr.fieldnames if f not in fieldTypes.keys()]
        if not feildslLeft: break # We're done
        for field in feildslLeft:
            data = entry[field]

            # Need data to decide
            if len(data) == 0:
                continue

            if data.isdigit():
                fieldTypes[field] = "INTEGER"
            else:
                fieldTypes[field] = "TEXT"
        # TODO: Currently there's no support for DATE in sqllite

    if len(feildslLeft) > 0:
        raise Exception("Failed to find all the columns data types - Maybe some are empty?")

    return fieldTypes


def escapingGenerator(f):
    for line in f:
        yield line.encode("ascii", "xmlcharrefreplace").decode("ascii")


def csvToDb(csvFile, outputToFile = False):
    # TODO: implement output to file

    with open(csvFile, mode='r', encoding="ISO-8859-1") as fin:
        dt = _get_col_datatypes(fin)

        fin.seek(0)

        reader = csv.DictReader(fin)

        # Keep the order of the columns name just as in the CSV
        fields = reader.fieldnames
        cols = []

        # Set field and type
        for f in fields:
            cols.append("%s %s" % (f, dt[f]))

        # Generate create table statement:
        stmt = "CREATE TABLE ads (%s)" % ",".join(cols)

        con = sqlite3.connect(":memory:")
        cur = con.cursor()
        cur.execute(stmt)

        fin.seek(0)

        reader = csv.reader(escapingGenerator(fin))

        # Generate insert statement:
        stmt = "INSERT INTO ads VALUES(%s);" % ','.join('?' * len(cols))

        cur.executemany(stmt, reader)
        con.commit()

    return con
The .import command is a feature of the sqlite3 command-line tool. To do it in Python, you should simply load the data using whatever facilities Python has, such as the csv module, and insert the data as per usual.
This way, you also have control over what types are inserted, rather than relying on sqlite3's seemingly undocumented behaviour.
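A minimal sketch of that approach (the file and table names are illustrative; the int() cast is the kind of type control being referred to):
import csv
import sqlite3

con = sqlite3.connect("example.db")
cur = con.cursor()
cur.execute("CREATE TABLE IF NOT EXISTS people (name TEXT, age INTEGER)")

with open("people.csv", newline="") as f:
    # Cast age to int explicitly instead of letting sqlite3 guess.
    rows = ((r["name"], int(r["age"])) for r in csv.DictReader(f))
    cur.executemany("INSERT INTO people (name, age) VALUES (?, ?)", rows)

con.commit()
con.close()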
Many thanks for bernie's answer! Had to tweak it a bit - here's what worked for me:
import csv, sqlite3

conn = sqlite3.connect("pcfc.sl3")
curs = conn.cursor()
curs.execute("CREATE TABLE PCFC (id INTEGER PRIMARY KEY, type INTEGER, term TEXT, definition TEXT);")
reader = csv.reader(open('PC.txt', 'r'), delimiter='|')
for row in reader:
    to_db = [unicode(row[0], "utf8"), unicode(row[1], "utf8"), unicode(row[2], "utf8")]
    curs.execute("INSERT INTO PCFC (type, term, definition) VALUES (?, ?, ?);", to_db)
conn.commit()
My text file (PC.txt) looks like this:
1 | Term 1 | Definition 1
2 | Term 2 | Definition 2
3 | Term 3 | Definition 3
#!/usr/bin/python
# -*- coding: utf-8 -*-

import sys, csv, sqlite3

def main():
    con = sqlite3.connect(sys.argv[1]) # database file input
    cur = con.cursor()
    cur.executescript("""
        DROP TABLE IF EXISTS t;
        CREATE TABLE t (COL1 TEXT, COL2 TEXT);
        """) # checks to see if table exists and makes a fresh table.

    with open(sys.argv[2], "rb") as f: # CSV file input
        reader = csv.reader(f, delimiter=',') # no header information with delimiter
        for row in reader:
            to_db = [unicode(row[0], "utf8"), unicode(row[1], "utf8")] # Appends data from CSV file representing and handling of text
            cur.execute("INSERT INTO t (COL1, COL2) VALUES(?, ?);", to_db)
    con.commit()
    con.close() # closes connection to database

if __name__=='__main__':
    main()
"""
cd Final_Codes
python csv_to_db.py
CSV to SQL DB
"""
import csv
import sqlite3
import os
import fnmatch
UP_FOLDER = os.path.dirname(os.getcwd())
DATABASE_FOLDER = os.path.join(UP_FOLDER, "Databases")
DBNAME = "allCompanies_database.db"
def getBaseNameNoExt(givenPath):
"""Returns the basename of the file without the extension"""
filename = os.path.splitext(os.path.basename(givenPath))[0]
return filename
def find(pattern, path):
"""Utility to find files wrt a regex search"""
result = []
for root, dirs, files in os.walk(path):
for name in files:
if fnmatch.fnmatch(name, pattern):
result.append(os.path.join(root, name))
return result
if __name__ == "__main__":
Database_Path = os.path.join(DATABASE_FOLDER, DBNAME)
# change to 'sqlite:///your_filename.db'
csv_files = find('*.csv', DATABASE_FOLDER)
con = sqlite3.connect(Database_Path)
cur = con.cursor()
for each in csv_files:
with open(each, 'r') as fin: # `with` statement available in 2.5+
# csv.DictReader uses first line in file for column headings by default
dr = csv.DictReader(fin) # comma is default delimiter
TABLE_NAME = getBaseNameNoExt(each)
Cols = dr.fieldnames
numCols = len(Cols)
"""
for i in dr:
print(i.values())
"""
to_db = [tuple(i.values()) for i in dr]
print(TABLE_NAME)
# use your column names here
ColString = ','.join(Cols)
QuestionMarks = ["?"] * numCols
ToAdd = ','.join(QuestionMarks)
cur.execute(f"CREATE TABLE {TABLE_NAME} ({ColString});")
cur.executemany(
f"INSERT INTO {TABLE_NAME} ({ColString}) VALUES ({ToAdd});", to_db)
con.commit()
con.close()
print("Execution Complete!")
This should come in handy when you have a lot of CSV files in a folder which you wish to convert to a single .db file in one go!
Notice that you don't have to know the filenames, table names, or field names (column names) beforehand!
If the CSV file must be imported as part of a python program, then for simplicity and efficiency, you could use os.system along the lines suggested by the following:
import os
cmd = """sqlite3 database.db <<< ".import input.csv mytable" """
rc = os.system(cmd)
print(rc)
The point is that by specifying the filename of the database, the data will automatically be saved, assuming there are no errors reading it.
Here are solutions that'll work if your CSV file is really big. Use to_sql as suggested by another answer, but set chunksize so it doesn't try to process the whole file at once.
import sqlite3
import pandas as pd
conn = sqlite3.connect('my_data.db')
c = conn.cursor()
users = pd.read_csv('users.csv')
users.to_sql('users', conn, if_exists='append', index = False, chunksize = 10000)
You can also use Dask, as described here to write a lot of Pandas DataFrames in parallel:
dto_sql = dask.delayed(pd.DataFrame.to_sql)
out = [dto_sql(d, 'table_name', db_url, if_exists='append', index=True)
for d in ddf.to_delayed()]
dask.compute(*out)
See here for more details.
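For context, ddf and db_url in the snippet above are a Dask DataFrame and a database connection string; a sketch of how they might be defined (the names and paths are placeholders):
import dask.dataframe as dd

# ddf: a Dask DataFrame built lazily from one or more CSV files
ddf = dd.read_csv("users-*.csv")

# db_url: an SQLAlchemy-style connection string, e.g. a local SQLite file
db_url = "sqlite:///my_data.db"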
Based on the Guy L solution (love it), but this version can handle escaped fields.
import csv, sqlite3

def _get_col_datatypes(fin):
    dr = csv.DictReader(fin) # comma is default delimiter
    fieldTypes = {}
    for entry in dr:
        feildslLeft = [f for f in dr.fieldnames if f not in fieldTypes.keys()]
        if not feildslLeft: break # We're done
        for field in feildslLeft:
            data = entry[field]

            # Need data to decide
            if len(data) == 0:
                continue

            if data.isdigit():
                fieldTypes[field] = "INTEGER"
            else:
                fieldTypes[field] = "TEXT"
        # TODO: Currently there's no support for DATE in sqllite

    if len(feildslLeft) > 0:
        raise Exception("Failed to find all the columns data types - Maybe some are empty?")

    return fieldTypes


def escapingGenerator(f):
    for line in f:
        yield line.encode("ascii", "xmlcharrefreplace").decode("ascii")


def csvToDb(csvFile, dbFile, tablename, outputToFile = False):
    # TODO: implement output to file

    with open(csvFile, mode='r', encoding="ISO-8859-1") as fin:
        dt = _get_col_datatypes(fin)

        fin.seek(0)

        reader = csv.DictReader(fin)

        # Keep the order of the columns name just as in the CSV
        fields = reader.fieldnames
        cols = []

        # Set field and type
        for f in fields:
            cols.append("\"%s\" %s" % (f, dt[f]))

        # Generate create table statement:
        stmt = "create table if not exists \"" + tablename + "\" (%s)" % ",".join(cols)
        print(stmt)

        con = sqlite3.connect(dbFile)
        cur = con.cursor()
        cur.execute(stmt)

        fin.seek(0)

        reader = csv.reader(escapingGenerator(fin))

        # Generate insert statement:
        stmt = "INSERT INTO \"" + tablename + "\" VALUES(%s);" % ','.join('?' * len(cols))

        cur.executemany(stmt, reader)
        con.commit()
        con.close()
You can do this efficiently using blaze & odo:
import blaze as bz
csv_path = 'data.csv'
bz.odo(csv_path, 'sqlite:///data.db::data')
Odo will store the CSV file into data.db (an SQLite database) in the table data.
Or you can use odo directly, without blaze. Either way is fine. Read this documentation.
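A sketch of the odo-only variant mentioned above (same source file and target as the blaze example):
from odo import odo

# Store data.csv into the `data` table of the SQLite file data.db
odo('data.csv', 'sqlite:///data.db::data')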
The following can also add field names based on the CSV header:
import sqlite3

def csv_sql(file_dir, table_name, database_name):
    con = sqlite3.connect(database_name)
    cur = con.cursor()
    # Drop the current table by:
    # cur.execute("DROP TABLE IF EXISTS %s;" % table_name)

    with open(file_dir, 'r') as fl:
        hd = fl.readline()[:-1].split(',')
        ro = fl.readlines()
        db = [tuple(ro[i][:-1].split(',')) for i in range(len(ro))]

    header = ','.join(hd)
    cur.execute("CREATE TABLE IF NOT EXISTS %s (%s);" % (table_name, header))
    cur.executemany("INSERT INTO %s (%s) VALUES (%s);" % (table_name, header, ('?,'*len(hd))[:-1]), db)
    con.commit()
    con.close()

# Example:
csv_sql('./surveys.csv', 'survey', 'eco.db')
In the interest of simplicity, you could use the sqlite3 command-line tool from the Makefile of your project.
%.sql3: %.csv
	rm -f $@
	sqlite3 $@ -echo -cmd ".mode csv" ".import $< $*"
%.dump: %.sql3
	sqlite3 $< "select * from $*"
make test.sql3 then creates the sqlite database from an existing test.csv file, with a single table "test". You can then run make test.dump to verify the contents.
With this you can do joins on CSVs as well:
import sqlite3
import os
import pandas as pd
from typing import List


class CSVDriver:
    def __init__(self, table_dir_path: str):
        self.table_dir_path = table_dir_path  # where tables (ie. csv files) are located
        self._con = None

    @property
    def con(self) -> sqlite3.Connection:
        """Make a singleton connection to an in-memory SQLite database"""
        if not self._con:
            self._con = sqlite3.connect(":memory:")
        return self._con

    def _exists(self, table: str) -> bool:
        query = """
        SELECT name
        FROM sqlite_master
        WHERE type ='table'
        AND name NOT LIKE 'sqlite_%';
        """
        tables = self.con.execute(query).fetchall()
        return table in tables

    def _load_table_to_mem(self, table: str, sep: str = None) -> None:
        """
        Load a CSV into an in-memory SQLite database
        sep is set to None in order to force pandas to auto-detect the delimiter
        """
        if self._exists(table):
            return
        file_name = table + ".csv"
        path = os.path.join(self.table_dir_path, file_name)
        if not os.path.exists(path):
            raise ValueError(f"CSV table {table} does not exist in {self.table_dir_path}")
        df = pd.read_csv(path, sep=sep, engine="python")  # set engine to python to skip pandas' warning
        df.to_sql(table, self.con, if_exists='replace', index=False, chunksize=10000)

    def query(self, query: str) -> List[tuple]:
        """
        Run an SQL query on CSV file(s).
        Tables are loaded from table_dir_path
        """
        tables = extract_tables(query)
        for table in tables:
            self._load_table_to_mem(table)
        cursor = self.con.cursor()
        cursor.execute(query)
        records = cursor.fetchall()
        return records
extract_tables():
import sqlparse
from sqlparse.sql import IdentifierList, Identifier, Function
from sqlparse.tokens import Keyword, DML
from collections import namedtuple
import itertools


class Reference(namedtuple('Reference', ['schema', 'name', 'alias', 'is_function'])):
    __slots__ = ()

    def has_alias(self):
        return self.alias is not None

    @property
    def is_query_alias(self):
        return self.name is None and self.alias is not None

    @property
    def is_table_alias(self):
        return self.name is not None and self.alias is not None and not self.is_function

    @property
    def full_name(self):
        if self.schema is None:
            return self.name
        else:
            return self.schema + '.' + self.name


def _is_subselect(parsed):
    if not parsed.is_group:
        return False
    for item in parsed.tokens:
        if item.ttype is DML and item.value.upper() in ('SELECT', 'INSERT',
                                                        'UPDATE', 'CREATE', 'DELETE'):
            return True
    return False


def _identifier_is_function(identifier):
    return any(isinstance(t, Function) for t in identifier.tokens)


def _extract_from_part(parsed):
    tbl_prefix_seen = False
    for item in parsed.tokens:
        if item.is_group:
            for x in _extract_from_part(item):
                yield x
        if tbl_prefix_seen:
            if _is_subselect(item):
                for x in _extract_from_part(item):
                    yield x
            # An incomplete nested select won't be recognized correctly as a
            # sub-select. eg: 'SELECT * FROM (SELECT id FROM user'. This causes
            # the second FROM to trigger this elif condition resulting in a
            # StopIteration. So we need to ignore the keyword if the keyword
            # FROM.
            # Also 'SELECT * FROM abc JOIN def' will trigger this elif
            # condition. So we need to ignore the keyword JOIN and its variants
            # INNER JOIN, FULL OUTER JOIN, etc.
            elif item.ttype is Keyword and (
                    not item.value.upper() == 'FROM') and (
                    not item.value.upper().endswith('JOIN')):
                tbl_prefix_seen = False
            else:
                yield item
        elif item.ttype is Keyword or item.ttype is Keyword.DML:
            item_val = item.value.upper()
            if (item_val in ('COPY', 'FROM', 'INTO', 'UPDATE', 'TABLE') or
                    item_val.endswith('JOIN')):
                tbl_prefix_seen = True
        # 'SELECT a, FROM abc' will detect FROM as part of the column list.
        # So this check here is necessary.
        elif isinstance(item, IdentifierList):
            for identifier in item.get_identifiers():
                if (identifier.ttype is Keyword and
                        identifier.value.upper() == 'FROM'):
                    tbl_prefix_seen = True
                    break


def _extract_table_identifiers(token_stream):
    for item in token_stream:
        if isinstance(item, IdentifierList):
            for ident in item.get_identifiers():
                try:
                    alias = ident.get_alias()
                    schema_name = ident.get_parent_name()
                    real_name = ident.get_real_name()
                except AttributeError:
                    continue
                if real_name:
                    yield Reference(schema_name, real_name,
                                    alias, _identifier_is_function(ident))
        elif isinstance(item, Identifier):
            yield Reference(item.get_parent_name(), item.get_real_name(),
                            item.get_alias(), _identifier_is_function(item))
        elif isinstance(item, Function):
            yield Reference(item.get_parent_name(), item.get_real_name(),
                            item.get_alias(), _identifier_is_function(item))


def extract_tables(sql):
    # let's handle multiple statements in one sql string
    extracted_tables = []
    statements = list(sqlparse.parse(sql))
    for statement in statements:
        stream = _extract_from_part(statement)
        extracted_tables.append([ref.name for ref in _extract_table_identifiers(stream)])
    return list(itertools.chain(*extracted_tables))
Example (assuming account.csv and tojoin.csv exist in /path/to/files):
db_path = r"/path/to/files"
driver = CSVDriver(db_path)
query = """
SELECT tojoin.col_to_join
FROM account
LEFT JOIN tojoin
ON account.a = tojoin.a
"""
driver.query(query)
I've found that it can be necessary to break up the transfer of data from the CSV to the database into chunks so as not to run out of memory. This can be done like this:
import csv
import sqlite3
from operator import itemgetter

# Establish connection
conn = sqlite3.connect("mydb.db")

# Create the table
conn.execute(
    """
    CREATE TABLE persons(
        person_id INTEGER,
        last_name TEXT,
        first_name TEXT,
        address TEXT
    )
    """
)

# These are the columns from the csv that we want
cols = ["person_id", "last_name", "first_name", "address"]

# If the csv file is huge, we instead add the data in chunks
chunksize = 10000

# Parse csv file and populate db in chunks
with conn, open("persons.csv") as f:
    reader = csv.DictReader(f)

    chunk = []
    for i, row in enumerate(reader):
        if i % chunksize == 0 and i > 0:
            conn.executemany(
                """
                INSERT INTO persons
                VALUES(?, ?, ?, ?)
                """, chunk
            )
            chunk = []

        items = itemgetter(*cols)(row)
        chunk.append(items)

    # Insert whatever is left in the final (partial) chunk
    if chunk:
        conn.executemany("INSERT INTO persons VALUES(?, ?, ?, ?)", chunk)
Here is my version; it works by asking you to select the '.csv' file you want to convert:
from multiprocessing import current_process
import pandas as pd
import sqlite3
import os
from tkinter import Tk
from tkinter.filedialog import askopenfilename
from pathlib import Path


def csv_to_db(csv_filedir):
    if not Path(csv_filedir).is_file():            # if needed, ask for user input of CSV file
        current_path = os.getcwd()
        Tk().withdraw()
        csv_filedir = askopenfilename(initialdir=current_path)

    try:
        data = pd.read_csv(csv_filedir)             # load CSV file
    except:
        print("Something went wrong when opening the file")
        print(csv_filedir)

    csv_df = pd.DataFrame(data)
    csv_df = csv_df.fillna('NULL')                  # make NaN = to 'NULL' for SQL format

    [path, filename] = os.path.split(csv_filedir)   # define path and filename
    [filename, _] = os.path.splitext(filename)
    database_filedir = os.path.join(path, filename + '.db')

    conn = sqlite3.connect(database_filedir)        # connect to SQL server
    [fields_sql, header_sql_string] = create_sql_fields(csv_df)

    # CREATE EMPTY DATABASE
    create_sql = ''.join(['CREATE TABLE IF NOT EXISTS ' + filename + ' (' + fields_sql + ')'])
    cursor = conn.cursor()
    cursor.execute(create_sql)

    # INSERT EACH ROW IN THE SQL DATABASE
    # (note: this INSERT is hard-coded for a CSV with five columns)
    for irow in csv_df.itertuples():
        insert_values_string = ''.join(['INSERT INTO ', filename, header_sql_string, ' VALUES ('])
        insert_sql = f"{insert_values_string} {irow[1]}, '{irow[2]}','{irow[3]}', {irow[4]}, '{irow[5]}' )"
        print(insert_sql)
        cursor.execute(insert_sql)

    # COMMIT CHANGES TO DATABASE AND CLOSE CONNECTION
    conn.commit()
    conn.close()

    print('\n' + csv_filedir + ' \n converted to \n' + database_filedir)

    return database_filedir


def create_sql_fields(df):                          # gather the headers of the CSV and create two strings
    fields_sql = []                                 # str1 = var1 TYPE, var2 TYPE, ...
    header_names = []                               # str2 = var1, var2, var3, var4
    for col in range(0, len(df.columns)):
        fields_sql.append(df.columns[col])
        fields_sql.append(str(df.dtypes[col]))

        header_names.append(df.columns[col])
        if col != len(df.columns) - 1:
            fields_sql.append(',')
            header_names.append(',')

    fields_sql = ' '.join(fields_sql)
    fields_sql = fields_sql.replace('int64', 'integer')
    fields_sql = fields_sql.replace('float64', 'integer')
    fields_sql = fields_sql.replace('object', 'text')
    header_sql_string = '(' + ''.join(header_names) + ')'

    return fields_sql, header_sql_string


csv_to_db('')

how to automatically create table based on CSV into postgres using python

I am a new Python programmer trying to import a sample CSV file into my Postgres database using a Python script.
I have a CSV file named abstable1; it has 3 headers:
absid, name, number
I have many such files in a folder
I want to create a table in PostgreSQL with the same name as the CSV file, for each of them.
Here is the code which I tried to just create a table for one file to test:
import psycopg2
import csv
import os

#filePath = 'c:\\Python27\\Scripts\\abstable1.csv'
conn = psycopg2.connect("host= hostnamexx dbname=dbnamexx user= usernamexx password= pwdxx")
print("Connecting to Database")
cur = conn.cursor()
#Uncomment to execute the code below to create a table
cur.execute("""CREATE TABLE abs.abstable1(
    absid varchar(10) PRIMARY KEY,
    name integer,
    number integer
    )
""")
#to copy the csv data into created table
with open('abstable1.csv', 'r') as f:
    next(f)
    cur.copy_from(f, 'abs.abstable1', sep=',')
conn.commit()
conn.close()
This is the error that I am getting:
File "c:\Python27\Scripts\testabs.py", line 26, in <module>
cur.copy_from(f, 'abs.abstable1', sep=',')
psycopg2.errors.QueryCanceled: COPY from stdin failed: error in .read() call: exceptions.ValueError Mixing iteration and read methods would lose data
CONTEXT: COPY abstable1, line 1
Any recommendation or alternate solution to resolve this issue is highly appreciated.
Here's what worked for me, using glob:
This code automatically reads all CSV files in a folder and creates a table with the same name as each file.
I'm still trying to figure out how to derive specific datatypes from the data in the CSV, but as far as table creation is concerned, this works like a charm for all CSV files in a folder.
import csv
import psycopg2
import os
import glob

conn = psycopg2.connect("host= hostnamexx dbname=dbnamexx user= usernamexx password= pwdxx")
print("Connecting to Database")

csvPath = "./TestDataLGA/"

# Loop through each CSV
for filename in glob.glob(csvPath + "*.csv"):

    # Create a table name
    tablename = filename.replace("./TestDataLGA\\", "").replace(".csv", "")
    print(tablename)

    # Open file
    fileInput = open(filename, "r")

    # Extract first line of file
    firstLine = fileInput.readline().strip()

    # Split columns into an array [...]
    columns = firstLine.split(",")

    # Build SQL code to drop table if exists and create table
    sqlQueryCreate = 'DROP TABLE IF EXISTS ' + tablename + ";\n"
    sqlQueryCreate += 'CREATE TABLE ' + tablename + "("

    #some loop or function according to your requirement
    # Define columns for table
    for column in columns:
        sqlQueryCreate += column + " VARCHAR(64),\n"

    sqlQueryCreate = sqlQueryCreate[:-2]
    sqlQueryCreate += ");"

    cur = conn.cursor()
    cur.execute(sqlQueryCreate)
    conn.commit()
    cur.close()
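Since the question also wants the CSV contents loaded, a possible follow-up inside the same loop is sketched below (not part of the original answer); readline() is used instead of next(f) to avoid the "mixing iteration and read methods" error from the question:
# Sketch: after creating the table, copy the file's rows into it.
with open(filename, "r") as f:
    f.readline()                     # skip the header row
    cur = conn.cursor()
    cur.copy_from(f, tablename, sep=",")
    conn.commit()
    cur.close()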
I tried your code and it works fine:
import psycopg2

conn = psycopg2.connect("host= 127.0.0.1 dbname=testdb user=postgres password=postgres")
print("Connecting to Database")
cur = conn.cursor()
'''cur.execute("""CREATE TABLE abstable1(
    absid varchar(10) PRIMARY KEY,
    name integer,
    number integer
    )
""")'''
with open('lolo.csv', 'r') as f:
    next(f)
    cur.copy_from(f, 'abstable1', sep=',', columns=('absid', 'name', 'number'))
conn.commit()
conn.close()
Although I had to make some changes for it to work:
I had to name the table abstable1, because with abs.abstable1 Postgres assumes I am using the schema abs; maybe you created that schema in your database, but if not, check on that. Also, I am using Python 3.7.
I noticed that you are using Python 2.7 (which I think is no longer supported); this may cause issues. Since you say you are learning, I would recommend that you use Python 3, since it is more widely used now; you will most likely encounter code written in it and would otherwise keep having to adapt your code to fit Python 2.7.
I post my solution here based on @Rose's answer.
I used sqlalchemy, a JSON file as config, and glob.
import json
import glob
from sqlalchemy import create_engine, text


def create_tables_from_files(files_folder, engine, config):
    try:
        for filename in glob.glob(files_folder + "\*csv"):
            tablename = filename.replace(files_folder, "").replace('\\', "").replace(".csv", "")
            input_file = open(filename, "r")
            columns = input_file.readline().strip().split(",")

            create_query = 'DROP TABLE IF EXISTS ' + config["staging_schema"] + "." + tablename + "; \n"
            create_query += 'CREATE TABLE ' + config["staging_schema"] + "." + tablename + " ( "
            for column in columns:
                create_query += column + " VARCHAR, \n "
            create_query = create_query[:-4]
            create_query += ");"

            engine.execute(text(create_query).execution_options(autocommit=True))
            print(tablename + " table created")
    except:
        print("Error at uploading tables")

SQL query returns blank output when running inside Python script

I have a Python script that is supposed to loop through a text file and gather the domain from each line of the text file. Then it is supposed to use the domain as an argument in a SQL query. The issue is that when I pass in domain_name as an argument, the JSON output the script produces is blank. If I set the domain_name argument directly inside the SQL query, then the script outputs perfect JSON. As you can see at the top of my script, right below def connect_to_db(), I start to loop through the text file. I'm not sure where in my code the error is occurring; any assistance would be greatly appreciated!
Code
from __future__ import print_function

try:
    import psycopg2
except ImportError:
    raise ImportError('\n\033[33mpsycopg2 library missing. pip install psycopg2\033[1;m\n')
    sys.exit(1)

import re
import sys
import json
import pprint

DB_HOST = 'crt.sh'
DB_NAME = 'certwatch'
DB_USER = 'guest'


def connect_to_db():
    filepath = 'test.txt'
    with open(filepath) as fp:
        for cnt, domain_name in enumerate(fp):
            print("Line {}: {}".format(cnt, domain_name))
            print(domain_name)
            domain_name = domain_name.rstrip()
            conn = psycopg2.connect("dbname={0} user={1} host={2}".format(DB_NAME, DB_USER, DB_HOST))
            cursor = conn.cursor()
            cursor.execute(
                "SELECT c.id, x509_commonName(c.certificate), x509_issuerName(c.certificate) FROM certificate c, certificate_identity ci WHERE c.id = ci.certificate_id AND ci.name_type = 'dNSName' AND lower(ci.name_value) = lower('%s') AND x509_notAfter(c.certificate) > statement_timestamp();".format(
                    domain_name))

            unique_domains = cursor.fetchall()

            # print out the records using pretty print
            # note that the NAMES of the columns are not shown, instead just indexes.
            # for most people this isn't very useful so we'll show you how to return
            # columns as a dictionary (hash) in the next example.
            pprint.pprint(unique_domains)

            outfilepath = domain_name + ".json"
            with open(outfilepath, 'a') as outfile:
                outfile.write(json.dumps(unique_domains, sort_keys=True, indent=4))


if __name__ == "__main__":
    connect_to_db()
Don't use format to create your SQL statement. Use %s placeholders (psycopg2's parameter style) and then pass a tuple of the values to insert:
c.execute('''SELECT c.id, x509_commonName(c.certificate),
    x509_issuerName(c.certificate) FROM certificate c, certificate_identity ci WHERE
    c.id = ci.certificate_id AND ci.name_type = 'dNSName' AND lower(ci.name_value) =
    lower(%s) AND x509_notAfter(c.certificate) > statement_timestamp()''', (domain_name,))
More generically:
c.execute('''SELECT columnX FROM tableA WHERE columnY = %s AND columnZ = %s''',
          (desired_columnY_value, desired_columnZ_value))

program hangs... am I closing the connection/program correctly

I am new to pyodbc. I have written a simple program, I think. The program does not close/quit. When I run it, it hangs until I manually break the program. It outputs the data. Am I writing it correctly?
import pyodbc
import csv

conn = pyodbc.connect('DSN=connect;UID=gog;PWD=humbleb')
cursor2 = conn.cursor()
cursor2.execute("SELECT PTIME, PVALUE FROM HISTORY_TABLE WHERE POINT = 'POINT' AND PTIME > '2017-04-12' AND PTIME < '2017-04-13' AND HISTTYPE='AVG' AND PERIOD=1200")
i = 1
sample = cursor2.fetchall()
for rows in sample:
    with open('C:/directory/record{0}.csv'.format(i), 'w') as f:
        csv.writer(f).writerow(rows)
    i += 1
cursor2.close()
conn.close()

New to python: My method to import CSV to SQlite DB

I just started python programming and find it very useful so far coming from a Delphi/Lazarus background.
I recently downloaded trend data from a SCADA system and needed to import the data into a sqlite db. I thought I would share my python script here.
This process would have taken a lot more programming in Pascal. Now I just create a GUI with Lazarus and use TProcess to run the script with some parameters and the data is in the db.
Sample of trend data
Time,P1_VC70004PID_DRCV,P1_VC70004PID_DRPV,P1_VC70004PID_DRSP
6:00:30,27.75,3000,3000
6:01:00,27.75,3000,3000
6:01:30,27.75,3000,3000
6:02:00,27.75,3000,3000
6:02:30,27.75,3000,3000
6:03:00,27.75,3000,3000
6:03:30,27.75,3000,3000
6:04:00,27.75,3000,3000
6:04:30,27.75,3000,3000
6:05:00,27.75,3000,3000
Python code:
import csv
import sqlite3
import sys
FileName = sys.argv[1]
TableName = "data"
db = "trenddata.db3"
conn = sqlite3.connect(db)
conn.text_factory = str # allows utf-8 data to be stored
c = conn.cursor()
c.execute("DROP TABLE IF EXISTS " + TableName)
c.execute("VACUUM")
i = 0
f = open(FileName, 'rt')
try:
reader = csv.reader(f)
for row in reader:
if i == 0:
## Create Table header section from Header info in CSV doc
c.execute("CREATE TABLE %s (%s)" % (TableName, ", ".join(row)))
else:
## Import row data into database
c.execute("INSERT INTO %s VALUES ('%s')" % (TableName, "', '".join(row)))
i += 1
conn.commit()
finally:
f.close()
conn.close()
print("Imported %s records" % (i))
