I am trying to figure out how to create a CSV file that keeps the NULL values I have in my MS SQL database table. Right now the script I am using fills the NULL values with '' (empty strings). How am I supposed to instruct the csv writer to keep the NULL values?
Example of the source table:
ID,Date,Entitled Key
10000002,NULL,805
10000003,2020-11-22 00:00:00,805
export_sql_to_csv.py
import csv
import os
import pyodbc
filePath = os.getcwd() + '/'
fileName = 'rigs_latest.csv'
server = 'ip-address'
database = 'db-name'
username = 'admin'
password = 'password'
# Database connection variable.
connect = pyodbc.connect('DRIVER={ODBC Driver 17 for SQL Server};SERVER=' +
server+';DATABASE='+database+';UID='+username+';PWD=' + password)
cursor = connect.cursor()
sqlSelect = "SELECT * FROM my_table"
cursor.execute(sqlSelect)
results = cursor.fetchall()
# Extract the table headers.
headers = [i[0] for i in cursor.description]
# Open the CSV file for writing; the with block makes sure it is flushed and closed.
with open(filePath + fileName, 'w', newline='') as f:
    csvFile = csv.writer(f, delimiter=',', lineterminator='\r\n',
                         quoting=csv.QUOTE_NONE, escapechar='\\')
    # Add the headers and data to the CSV file.
    csvFile.writerow(headers)
    csvFile.writerows(results)
Example of the result after running the above script:
ID,Date,Entitled Key
10000002,,805
10000003,2020-11-22 00:00:00,805
The main reason I want to keep the NULL values is that I want to convert the CSV file into a series of INSERT statements and execute them against an Aurora Serverless PostgreSQL database. The database doesn't accept empty strings for the date type and fails with this error: ERROR: invalid input syntax for type date: ""
As described in the docs for the csv module, None is written to CSV as '' (empty string) by design. Floats are converted with repr() and all other non-string values with str() before being written.
So if you want your CSV to contain the string null instead of '', you have to modify the values before they reach the CSV writer. Perhaps:
results = [
['null' if val is None else val for val in row] for row in results
]
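A minimal, self-contained sketch of that substitution. It uses an in-memory buffer instead of a real file and hard-coded rows standing in for the pyodbc results (both assumptions for illustration); `rows_to_csv` is a hypothetical helper name:

```python
import csv
import io


def rows_to_csv(rows, null_marker="null"):
    buf = io.StringIO()
    writer = csv.writer(buf, lineterminator="\n")
    for row in rows:
        # csv.writer would write None as an empty string, so swap it out first
        writer.writerow([null_marker if val is None else val for val in row])
    return buf.getvalue()


rows = [(10000002, None, 805), (10000003, "2020-11-22 00:00:00", 805)]
print(rows_to_csv(rows))
```

If the CSV is later turned into INSERT statements, the `null` marker still has to be emitted as an unquoted SQL NULL, not as the quoted string 'null'.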
I am a new Python programmer, trying to import a sample CSV file into my Postgres database using a Python script.
I have a CSV file named abstable1 with 3 headers:
absid, name, number
I have many such files in a folder.
I want to create a table in PostgreSQL for each file, with the same name as the CSV file.
Here is the code I tried, which creates a table for just one file as a test:
import psycopg2
import csv
import os
#filePath = 'c:\\Python27\\Scripts\\abstable1.csv'
conn = psycopg2.connect("host= hostnamexx dbname=dbnamexx user= usernamexx password= pwdxx")
print("Connecting to Database")
cur = conn.cursor()
#Uncomment to execute the code below to create a table
cur.execute("""CREATE TABLE abs.abstable1(
absid varchar(10) PRIMARY KEY,
name integer,
number integer
)
""")
#to copy the csv data into created table
with open('abstable1.csv', 'r') as f:
    next(f)
    cur.copy_from(f, 'abs.abstable1', sep=',')
conn.commit()
conn.close()
This is the error that I am getting:
File "c:\Python27\Scripts\testabs.py", line 26, in <module>
cur.copy_from(f, 'abs.abstable1', sep=',')
psycopg2.errors.QueryCanceled: COPY from stdin failed: error in .read() call: exceptions.ValueError Mixing iteration and read methods would lose data
CONTEXT: COPY abstable1, line 1
Any recommendation or alternate solution to resolve this issue is highly appreciated.
Here's what worked for me, using glob.
This code automatically reads all CSV files in a folder and creates a table with the same name as each file.
I'm still trying to figure out how to assign specific data types according to the data in the CSV,
but as far as table creation is concerned, this works like a charm for all CSV files in a folder.
import glob
import os

import psycopg2

conn = psycopg2.connect("host= hostnamexx dbname=dbnamexx user= usernamexx password= pwdxx")
print("Connecting to Database")
csvPath = "./TestDataLGA/"
# Loop through each CSV
for filename in glob.glob(os.path.join(csvPath, "*.csv")):
    # Create a table name from the file name (works with / or \ separators)
    tablename = os.path.splitext(os.path.basename(filename))[0]
    print(tablename)
    # Extract the first line of the file (the header), then close it
    with open(filename, "r") as fileInput:
        firstLine = fileInput.readline().strip()
    # Split columns into a list
    columns = firstLine.split(",")
    # Build SQL code to drop the table if it exists and create it
    sqlQueryCreate = 'DROP TABLE IF EXISTS ' + tablename + ";\n"
    sqlQueryCreate += 'CREATE TABLE ' + tablename + " ("  # note the space after TABLE
    # Some loop or function according to your requirement
    # Define columns for table
    for column in columns:
        sqlQueryCreate += column + " VARCHAR(64),\n"
    sqlQueryCreate = sqlQueryCreate[:-2]  # drop the trailing ",\n"
    sqlQueryCreate += ");"
    cur = conn.cursor()
    cur.execute(sqlQueryCreate)
    conn.commit()
    cur.close()
conn.close()
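One way to approach the open data-type question, sketched under assumptions: the helpers below (`quote_ident`, `guess_type` and `build_create_sql` are hypothetical names) read the header plus a sample of rows, guess INTEGER, NUMERIC or VARCHAR per column, and double-quote identifiers so names with spaces survive. The type guess is deliberately naive, and quoted identifiers become case-sensitive in PostgreSQL.

```python
import csv
import os


def quote_ident(name):
    # PostgreSQL-style identifier quoting: wrap in double quotes, double any embedded "
    return '"' + name.replace('"', '""') + '"'


def guess_type(values):
    # Naive guess from sample values (an assumption, not real type inference)
    def all_parse(cast):
        try:
            for value in values:
                cast(value)
        except ValueError:
            return False
        return True

    if values and all_parse(int):
        return "INTEGER"
    if values and all_parse(float):
        return "NUMERIC"
    return "VARCHAR"


def build_create_sql(csv_path, sample_rows=100):
    # Table name = file name without the folder or the .csv extension
    tablename = os.path.splitext(os.path.basename(csv_path))[0]
    with open(csv_path, newline="") as f:
        reader = csv.reader(f)
        columns = [c.strip() for c in next(reader)]
        samples = [row for _, row in zip(range(sample_rows), reader)]
    column_defs = []
    for i, column in enumerate(columns):
        # Ignore empty cells when guessing; they may be NULLs
        values = [row[i] for row in samples if i < len(row) and row[i] != ""]
        column_defs.append(quote_ident(column) + " " + guess_type(values))
    table = quote_ident(tablename)
    return ("DROP TABLE IF EXISTS " + table + ";\n"
            + "CREATE TABLE " + table + " (" + ", ".join(column_defs) + ");")
```

The returned string can be passed to `cur.execute(...)` in the loop in place of the hand-built `sqlQueryCreate`.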
I tried your code and it works fine:
import psycopg2
conn = psycopg2.connect("host= 127.0.0.1 dbname=testdb user=postgres password=postgres")
print("Connecting to Database")
cur = conn.cursor()
'''cur.execute("""CREATE TABLE abstable1(
absid varchar(10) PRIMARY KEY,
name integer,
number integer
)
""")'''
with open('lolo.csv', 'r') as f:
    next(f)
    cur.copy_from(f, 'abstable1', sep=',', columns=('absid', 'name', 'number'))
conn.commit()
conn.close()
Although I had to make some changes for it to work:
I had to name the table abstable1, because with abs.abstable1 Postgres assumes that you are using the schema abs. Maybe you created that schema in your database; if not, check that. Also, I'm using Python 3.7.
I noticed that you are using Python 2.7 (which is no longer supported), and this may cause issues. Since you say you are learning, I would recommend Python 3: it is far more widely used now, and you will most likely encounter code written for it that you would otherwise have to adapt to Python 2.7.
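A sketch of another option that avoids skipping the header in Python at all: let PostgreSQL do it with COPY's CSV HEADER option through psycopg2's `cursor.copy_expert`. `copy_csv_with_header` is a hypothetical helper; it assumes `cur` is a psycopg2 cursor and that the table already exists. On Python 2, replacing `next(f)` with `f.readline()` should also avoid the "Mixing iteration and read methods" error, since that error comes from combining file iteration with `read()`.

```python
def copy_csv_with_header(cur, csv_path, table):
    # Let PostgreSQL skip the header line itself (FORMAT csv, HEADER true),
    # so Python never mixes next(f) with the read() calls made during COPY.
    # The table name is interpolated as-is, so only pass trusted names.
    sql = "COPY {} FROM STDIN WITH (FORMAT csv, HEADER true)".format(table)
    with open(csv_path, "r", newline="") as f:
        cur.copy_expert(sql, f)


# Usage (assumes psycopg2 and a reachable database):
# conn = psycopg2.connect("host=... dbname=... user=... password=...")
# with conn, conn.cursor() as cur:
#     copy_csv_with_header(cur, "abstable1.csv", "abs.abstable1")
```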
I'm posting my solution here, based on Rose's answer.
I used SQLAlchemy, a JSON file as config, and glob.
import glob
import json
import os

from sqlalchemy import create_engine, text


def create_tables_from_files(files_folder, engine, config):
    schema = config["staging_schema"]
    try:
        for filename in glob.glob(os.path.join(files_folder, "*.csv")):
            # Table name = file name without the folder and the .csv extension
            tablename = os.path.splitext(os.path.basename(filename))[0]
            with open(filename, "r") as input_file:
                columns = input_file.readline().strip().split(",")
            create_query = 'DROP TABLE IF EXISTS ' + schema + "." + tablename + "; \n"
            create_query += 'CREATE TABLE ' + schema + "." + tablename + " ( "
            create_query += ", ".join(column + " VARCHAR" for column in columns)
            create_query += ");"
            # engine.begin() commits on success (engine.execute was removed in SQLAlchemy 2.0)
            with engine.begin() as conn:
                conn.execute(text(create_query))
            print(tablename + " table created")
    except Exception as exc:
        print("Error at uploading tables:", exc)
I am making a program that fetches column names and dumps the data into csv format.
Everything is working fine and the data is being dumped into the CSV; the problem is that
I can't get the headers into the CSV. If I open the exported file in Excel, only the data shows up, not the column headers. How do I do that?
Here's my code:
import cx_Oracle
import csv
dsn_tns = cx_Oracle.makedsn(--Details--)
conn = cx_Oracle.connect(--Details--)
d = conn.cursor()
csv_file = open("profile.csv", "w")
writer = csv.writer(csv_file, delimiter=',', lineterminator="\n", quoting=csv.QUOTE_NONNUMERIC)
d.execute("""
select * from all_tab_columns where OWNER = 'ABBAS'
""")
tables_tu = d.fetchall()
for row in tables_tu:
    writer.writerow(row)
conn.close()
csv_file.close()
What code do I use to export headers too in csv?
Place this just above your for loop:
writer.writerow(i[0] for i in d.description)
Because d.description is a read-only attribute containing 7-tuples that look like:
(name,
type_code,
display_size,
internal_size,
precision,
scale,
null_ok)
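A runnable sketch of the same idea, using Python's built-in sqlite3 module as a stand-in for cx_Oracle (an assumption; both follow the DB-API, where `description[i][0]` is the column name). `export_with_headers` and the table `t` are hypothetical:

```python
import csv
import io
import sqlite3


def export_with_headers(cursor, query, out):
    cursor.execute(query)
    writer = csv.writer(out, lineterminator="\n")
    # The first element of each 7-tuple in cursor.description is the column name
    writer.writerow([col[0] for col in cursor.description])
    writer.writerows(cursor)  # the cursor is iterable, yielding one tuple per row


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE t (owner TEXT, column_name TEXT)")
conn.execute("INSERT INTO t VALUES ('ABBAS', 'ID')")
buf = io.StringIO()  # stands in for profile.csv
export_with_headers(conn.cursor(), "SELECT owner, column_name FROM t", buf)
print(buf.getvalue())
```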
Here is what I am trying to achieve. My current code works fine and the query runs on my SQL Server, but I will need to gather information from several servers. How would I add a column listing the dbserver?
import pyodbc
import csv
f = open("dblist.ini")
dbserver,UID,PWD = [ variable[variable.find("=")+1 :] for variable in f.readline().split("~")]
connectstring = "DRIVER={SQL server};SERVER=" + dbserver + ";DATABASE=master;UID="+UID+";PWD="+PWD
cnxn = pyodbc.connect(connectstring)
cursor = cnxn.cursor()
fd = open('mssql1.txt', 'r')
sqlFile = fd.read()
fd.close()
cursor.execute(sqlFile)
# Python 3: open in text mode with newline='' (Python 2 used "wb")
with open("out.csv", "w", newline="") as csv_file:
    csv_writer = csv.writer(csv_file, delimiter='!')
    csv_writer.writerow([i[0] for i in cursor.description])  # write headers
    csv_writer.writerows(cursor)
You could add the extra information in your SQL query. For example:
select 'dbServerName' as dbserver, * from table;
Your cursor will return an extra column in front of your real data that holds the DB server name. (In SQL Server, use single quotes for the string literal; double quotes denote an identifier.) The downside of this method is that you transfer a little extra data.
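Alternatively, the server name can be added on the Python side while writing, which keeps the SQL file unchanged. A sketch with hypothetical names: `write_server_rows`, the server names and the result rows are made up, and in the real script each server's rows would come from its own `cursor.execute(sqlFile)`:

```python
import csv
import io


def write_server_rows(writer, dbserver, rows):
    # Prefix every result row with the server it came from
    for row in rows:
        writer.writerow([dbserver] + list(row))


buf = io.StringIO()  # stands in for out.csv
writer = csv.writer(buf, delimiter="!", lineterminator="\n")
writer.writerow(["dbserver", "name", "state"])  # write the header once
# Hypothetical per-server results
results = {
    "SQL01": [("master", "ONLINE")],
    "SQL02": [("tempdb", "ONLINE")],
}
for server, rows in results.items():
    write_server_rows(writer, server, rows)
print(buf.getvalue())
```

With a real cursor, the header line would be `["dbserver"] + [i[0] for i in cursor.description]`.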
I know this is a very basic question.
I have a CSV file that already contains data. The file is generated automatically by a query, not by opening it with DictReader or an open() file object.
Goal
I want to open an existing file
Append the Header in the first row (Shift the first row data)
Save the file
Return the file
Any clues?
cursor.execute(sql, params + (csv_path,))
This command generates the file without a header.
Code
sql, params = queryset.query.sql_with_params()
sql += ''' INTO OUTFILE %s
FIELDS TERMINATED BY ','
OPTIONALLY ENCLOSED BY '"'
LINES TERMINATED BY '\n' '''
csv_path = os.path.join(settings.MEDIA_ROOT + '\\tmp', csv_filename)
cursor = connection.cursor()
cursor.execute(sql, params + (csv_path,))
columns = [column[0] for column in cursor.description] #error
Tried
SELECT `website` UNION SELECT `request_system_potentialcustomers`.`website` FROM `request_system_potentialcustomers` ORDER BY `request_system_potentialcustomers`.`revenue` DESC
INTO OUTFILE "D:\\out.csv"
FIELDS TERMINATED BY ','
OPTIONALLY ENCLOSED BY '"'
LINES TERMINATED BY '\n';
Wait a minute. If you have not yet called
cursor.execute(sql, params + (csv_path,))
then you have the opportunity to write the CSV file correctly from the get-go. You should not need to write a new file with the header line and then copy all the CSV data into it. That is slow and inefficient, but it is your only choice if you really have to prepend a line to an existing file.
If instead you have not yet written the CSV file, and if you know the header, then you can add it to the SQL using SELECT ... UNION ... SELECT:
header = ['foo', 'bar', 'baz', ]
query = ['SELECT {} UNION'.format(','.join([repr(h) for h in header]))]
sql, params = queryset.query.sql_with_params()
query.append(sql)
sql = '''INTO OUTFILE %s
FIELDS TERMINATED BY ','
OPTIONALLY ENCLOSED BY '"'
LINES TERMINATED BY '\n' '''
query.append(sql)
sql = ' '.join(query)
csv_path = os.path.join(settings.MEDIA_ROOT + '\\tmp', csv_filename)
cursor = connection.cursor()
cursor.execute(sql, params + (csv_path,))
Demo:
mysql> SELECT "foo", "bar" UNION SELECT "baz", "quux" INTO OUTFILE "/tmp/out";
Produces the file /tmp/out containing (tab-separated, since MySQL's default field terminator is a tab):
foo bar
baz quux
The cursor.description attribute gives you information about the result columns:
cursor.execute(sql, params + (csv_path,))
columns = [column[0] for column in cursor.description]
Write the column names to a new file.
Append the old CSV contents to the new file.
Rename the new file to the old file name.
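The three steps above, sketched in Python; `prepend_header` is a hypothetical helper. It writes to a temporary file in the same folder, copies the old contents after the header unchanged, and then swaps the files with `os.replace`:

```python
import csv
import os
import shutil
import tempfile


def prepend_header(csv_path, header):
    directory = os.path.dirname(os.path.abspath(csv_path))
    # Temp file in the same folder, so the final replace stays on one filesystem
    fd, tmp_path = tempfile.mkstemp(suffix=".csv", dir=directory)
    try:
        with os.fdopen(fd, "w", newline="") as new_file:
            # 1. Write the header line
            csv.writer(new_file, lineterminator="\n").writerow(header)
            # 2. Append the old CSV contents unchanged
            with open(csv_path, newline="") as old_file:
                shutil.copyfileobj(old_file, new_file)
        # 3. Replace the old file with the new one
        os.replace(tmp_path, csv_path)
    except Exception:
        os.remove(tmp_path)
        raise
```

For example, `prepend_header(csv_path, columns)` could run right after the `INTO OUTFILE` query has written the file.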
It's not quite clear whether you are trying to read an existing CSV file, but to read a CSV from disk that has no column names:
Use DictReader/DictWriter and specify the column names yourself.
Python 3:
import csv
ordered_fieldnames = ['animal', 'height', 'weight']
with open('stuff.csv', newline='') as csvfile, open("result.csv", "w", newline='') as result:
    rdr = csv.DictReader(csvfile, fieldnames=ordered_fieldnames)
    wtr = csv.DictWriter(result, ordered_fieldnames)
    wtr.writeheader()
    for line in rdr:
        wtr.writerow(line)
With stuff.csv in the same directory:
elephant,1,200
cat,0.1,1
dog,0.2,2
and the output result file:
animal,height,weight
elephant,1,200
cat,0.1,1
dog,0.2,2
I want to insert the data from my CSV file into a table that I created before.
So let's say I created a table named T.
The CSV file is the following:
Last,First,Student Number,Department
Gonzalez,Oliver,1862190394,Chemistry
Roberts,Barbara,1343146197,Computer Science
Carter,Raymond,1460039151,Philosophy
Building on what was shared by Mumpo.
This has worked for me when inserting a CSV into SQL Server. You just need to provide your connection details, the file path, and the table you want to write to. The only caveat is that your table must already exist, as this code inserts a CSV into an existing table.
import csv

import pyodbc

# DESTINATION CONNECTION
drivr = ""  # e.g. '{ODBC Driver 17 for SQL Server}'
servr = ""
db = ""
username = ""
password = ""
my_cnxn = pyodbc.connect('DRIVER={};SERVER={};DATABASE={};UID={};PWD={}'.format(drivr, servr, db, username, password))
my_cursor = my_cnxn.cursor()


def insert_records(table, yourcsv, cursor, cnxn):
    # INSERT SOURCE RECORDS TO DESTINATION
    with open(yourcsv, newline='') as csvfile:
        csvFile = csv.reader(csvfile, delimiter=',')
        header = next(csvFile)
        # Bracket-quote column names so names with spaces (e.g. Student Number) work
        headers = ['[' + x.strip() + ']' for x in header]
        # One ? placeholder per column; pyodbc binds the values, so quotes in the data are safe
        placeholders = ', '.join('?' * len(headers))
        insert = 'INSERT INTO {} ({}) VALUES ({})'.format(table, ', '.join(headers), placeholders)
        for row in csvFile:
            cursor.execute(insert, [x.strip() for x in row])
    cnxn.commit()  # must commit unless your SQL database auto-commits


table = '<sql-table-here>'  # SET YOUR TABLE NAME
mycsv = '...T.csv'  # SET YOUR FILEPATH
insert_records(table, mycsv, my_cursor, my_cnxn)
my_cursor.close()
my_cnxn.close()
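To try the parameterized approach without a SQL Server instance, here is a sketch that uses Python's built-in sqlite3 module as a stand-in (an assumption; sqlite3 uses the same `?` placeholder style as pyodbc, but double quotes instead of brackets for identifiers). `insert_csv` and `quote_ident` are hypothetical helpers, and the data is the CSV from the question:

```python
import csv
import io
import sqlite3


def quote_ident(name):
    # Double-quote an identifier (SQLite/PostgreSQL style), doubling embedded quotes
    return '"' + name.replace('"', '""') + '"'


def insert_csv(conn, table, csvfile):
    reader = csv.reader(csvfile)
    headers = [h.strip() for h in next(reader)]
    cols = ", ".join(quote_ident(h) for h in headers)
    # One ? placeholder per column; the driver binds the values
    marks = ", ".join("?" * len(headers))
    sql = "INSERT INTO {} ({}) VALUES ({})".format(quote_ident(table), cols, marks)
    conn.executemany(sql, ([v.strip() for v in row] for row in reader))
    conn.commit()


conn = sqlite3.connect(":memory:")
conn.execute('CREATE TABLE T (Last TEXT, First TEXT, "Student Number" INTEGER, Department TEXT)')
data = io.StringIO(
    "Last,First,Student Number,Department\n"
    "Gonzalez,Oliver,1862190394,Chemistry\n"
    "Roberts,Barbara,1343146197,Computer Science\n"
    "Carter,Raymond,1460039151,Philosophy\n"
)
insert_csv(conn, "T", data)
print(conn.execute("SELECT COUNT(*) FROM T").fetchone()[0])
```

`executemany` sends all rows with one prepared statement, which is usually faster than one `execute` per row.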