CSV File to SQL Insert Statement - python

I have a CSV file that looks something like this:
Date,Person,Time Out,Time Back,Restaurant,Calories,Delicious?
6/20/2016,August,11:58,12:45,Black Bear,850,Y
6/20/2016,Marcellus,12:00,12:30,Brought Lunch,,Y
6/20/2016,Jessica,11:30,12:30,Wendy's,815,N
6/21/2016,August,12:05,1:01,Brought Lunch,,Y
So far I have managed to print each row into a list of strings (ex. - ['Date', 'Person', 'Time Out', etc.] or ['6/20/2016', 'August', '11:58' etc.]).
Now I need to do 2 more things:
Add an ID header and sequential numeric string to each row (for ex. - ['ID', 'Date',
'Person', etc.] and ['1', '6/20/2016', 'August', etc.])
Separate each row so that they can be formatted into insert
statements rather than just having the program print out every single row one after another (for ex. - INSERT INTO Table ['ID', 'Date', 'Person', etc.] VALUES ['1', '6/20/2016', 'August', etc.])
Here is the code that has gotten me as far as I am now:
import csv

openFile = open('test.csv', 'r')
csvFile = csv.reader(openFile)
for row in csvFile:
    print(row)
openFile.close()

Try this (I ignored the ID part since you can use MySQL's AUTO_INCREMENT):
import csv

openFile = open('test.csv', 'r')
csvFile = csv.reader(openFile)
header = next(csvFile)
headers = map((lambda x: '`' + x + '`'), header)
insert = 'INSERT INTO Table (' + ", ".join(headers) + ") VALUES "
for row in csvFile:
    values = map((lambda x: '"' + x + '"'), row)
    print(insert + "(" + ", ".join(values) + ");")
openFile.close()
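As a side note: building SQL by concatenating strings is fragile around quotes and is the classic SQL-injection pattern. If the target database is reachable from Python, parameterized inserts sidestep all of that. A minimal sketch using the stdlib sqlite3 driver and the question's sample data (the `lunches` table and its column names are illustrative, not from the question):

```python
import csv
import sqlite3

# Sample rows from the question, written to a throwaway CSV first.
with open('test.csv', 'w', newline='') as f:
    f.write('Date,Person,Time Out,Time Back,Restaurant,Calories,Delicious?\n')
    f.write('6/20/2016,August,11:58,12:45,Black Bear,850,Y\n')
    f.write("6/20/2016,Jessica,11:30,12:30,Wendy's,815,N\n")

conn = sqlite3.connect(':memory:')
conn.execute('CREATE TABLE lunches (Date TEXT, Person TEXT, TimeOut TEXT, '
             'TimeBack TEXT, Restaurant TEXT, Calories TEXT, Delicious TEXT)')

with open('test.csv', 'r', newline='') as f:
    reader = csv.reader(f)
    next(reader)  # skip the header row
    # "?" placeholders let the driver handle all quoting and escaping,
    # including the apostrophe in "Wendy's".
    conn.executemany('INSERT INTO lunches VALUES (?, ?, ?, ?, ?, ?, ?)',
                     list(reader))
conn.commit()
```

The same placeholder style works with MySQL drivers (`%s` instead of `?`); only the connect call and placeholder token change.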

You can use this function if you want to maintain type conversion; I have used it to put data into Google BigQuery via a SQL string statement.
P.S.: You can add other types to the function.
import csv

def convert(value):
    for cast in (int, float):
        try:
            return cast(value)
        except ValueError:
            continue
    # All numeric types failed, so it is a string
    return value

def construct_string_sql(file_path, table_name, schema_name):
    string_SQL = ''
    try:
        with open(file_path, 'r') as file:
            reader = csv.reader(file)
            headers = ','.join(next(reader))
            for row in reader:
                row = str([convert(x) for x in row])[1:-1]
                string_SQL += f'INSERT INTO {schema_name}.{table_name}({headers}) VALUES ({row});'
    except OSError:  # narrower than a bare except, which would also hide real bugs
        return ''
    return string_SQL
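The conversion trick can be exercised on its own; a quick self-contained check (the sample values are mine):

```python
def convert(value):
    # Try numeric types in order; fall back to the raw string.
    for cast in (int, float):
        try:
            return cast(value)
        except ValueError:
            continue
    return value

row = [convert(x) for x in ['42', '3.14', 'hello']]
# str(...)[1:-1] strips the surrounding brackets of the list repr, leaving
# a comma-separated VALUES payload where strings keep their quotes.
print(str(row)[1:-1])
```

Note this relies on Python's repr quoting strings with single quotes, which most SQL dialects accept.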

You can use this open-source tool to generate batch INSERT statements: https://simranjitk.github.io/sql-converter/.

Related

removing duplicate id entry from text file using python

I have a text file which contains this data (items correspond to code,entry1,entry2):
a,1,2
b,2,3
c,4,5
....
....
Here a, b, c, ... will always be unique.
Every time I read this file in Python I either create a new entry (for example d,6,7) or update existing values (say a,1,2 to a,4,3).
I use the following code (it runs inside a function, hence the returns):
data = ['a', 5, 6]
datastring = ''
for d in data:
    datastring = datastring + str(d) + ','
try:
    with open("opfile.txt", "a") as f:
        f.write(datastring + '\n')
    return True
except:
    return False
This appends any data as a new entry.
I am trying something like this which checks the first character of each line:
f = open("opfile.txt", "r")
for x in f:
    if x[0] == username:
        pass
I don't know how to combine these two so that a check is done on the first field (let's call it the id): if an entry with that id is already in the file, it should be replaced with the new data while all other data remains the same; otherwise it should be entered as a new line item.
Read the file into a dictionary that uses the first field as keys. Update the appropriate dictionary, then write it back.
Use the csv module to parse and format the file.
import csv

data = ['a', 5, 6]
with open("opfile.txt", "r", newline='') as infile:
    incsv = csv.reader(infile)
    d = {row[0]: row for row in incsv if len(row) != 0}
d[data[0]] = data
with open("opfile.txt", "w", newline='') as outfile:
    outcsv = csv.writer(outfile)
    outcsv.writerows(d.values())
First, append any new row to the file.
Second, to update existing rows, rewrite the file:
def update_record(file_name, field1, field2, field3):
    with open(file_name, 'r') as f:
        lines = f.readlines()
    with open(file_name, 'w') as f:
        for line in lines:
            if field1 in line:
                f.write(field1 + ',' + field2 + ',' + field3 + '\n')
            else:
                f.write(line)
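One caveat with the approach above: `if field1 in line` is a substring test, so an id of 'a' would match any line containing the letter a anywhere, not just in the first field. Comparing the parsed first field avoids that; a sketch combining both answers (the function signature and `record` parameter are mine):

```python
import csv

def update_record(file_name, record):
    """Replace the row whose first field equals record[0]; append if absent.

    `record` is a list like ['a', '5', '6'].
    """
    with open(file_name, 'r', newline='') as f:
        rows = [row for row in csv.reader(f) if row]
    replaced = False
    for i, row in enumerate(rows):
        if row[0] == str(record[0]):  # exact match on the id field only
            rows[i] = [str(x) for x in record]
            replaced = True
    if not replaced:
        rows.append([str(x) for x in record])
    with open(file_name, 'w', newline='') as f:
        csv.writer(f).writerows(rows)
```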

Check CSV row values if they match with variable

I want to check a CSV if it has a value that matches a variable. If it does contain that variable I want to print out 'variable present'
I tried to compare each field in each row against the text of the variable. I do not get an error message, but the result is always negative.
import csv

old_name = "random name already present in the table"
with open("data.csv", "r") as csv_file:
    fieldnames = ["name", "price"]
    csv_reader = csv.DictReader(csv_file, fieldnames=fieldnames)
    for row in csv_reader:
        for field in row:
            if field == old_name:
                print("already there")
            else:
                print("not there")
Output is just 'not there' for each item in the table.
Each row returned by iterating a DictReader is a dict keyed by column name, so `for field in row` iterates over the keys ("name", "price"), never the values; that is why the comparison always fails. Compare the value instead:
for row in csv_reader:
    if row['name'] == old_name:
        print('already there')
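A minimal self-contained version of that fix (the two-column "name,price" layout and the sample names are assumptions based on the question's fieldnames):

```python
import csv

# Sample file matching the assumed "name,price" layout.
with open('data.csv', 'w', newline='') as f:
    f.write('alice,10\nbob,20\n')

old_name = 'alice'
found = False
with open('data.csv', 'r', newline='') as csv_file:
    csv_reader = csv.DictReader(csv_file, fieldnames=['name', 'price'])
    for row in csv_reader:
        if row['name'] == old_name:  # compare the value, not the key
            found = True
print('variable present' if found else 'not there')
```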

How to not just add a new first column to csv but alter the header names

I would like to do the following: read a CSV file, add a new first column, rename some of the columns, then load the records from the CSV file.
Ultimately, I would like the first column to be populated with the file name.
I'm fairly new to Python and I've kind of worked out how to change the fieldnames; however, loading the data is a problem, as it's looking for the original fieldnames, which no longer match.
Code snippet
import csv
import os

inputFileName = "manifest1.csv"
outputFileName = os.path.splitext(inputFileName)[0] + "_modified.csv"
with open(inputFileName, 'rb') as inFile, open(outputFileName, 'wb') as outfile:
    r = csv.DictReader(inFile)
    fieldnames = ['MapSvcName', 'ClientHostName', 'Databasetype', 'ID_A',
                  'KeepExistingData', 'KeepExistingMapCache', 'Name',
                  'OnPremisePath', 'Resourcestype']
    w = csv.DictWriter(outfile, fieldnames)
    w.writeheader()
    # *** Here is where I start to go wrong
    # copy the rest
    for node, row in enumerate(r, 1):
        w.writerow(dict(row))
Error
File "D:\Apps\Python27\ArcGIS10.3\lib\csv.py", line 148, in _dict_to_list
+ ", ".join([repr(x) for x in wrong_fields]))
ValueError: dict contains fields not in fieldnames: 'Databases [xsi:type]', 'Resources [xsi:type]', 'ID'
I would like some assistance to not just learn but truly understand what I need to do.
Cheers and thanks
Peter
Update..
I think I've worked it out
import csv
import os

inputFileName = "manifest1.csv"
outputFileName = os.path.splitext(inputFileName)[0] + "_modified.csv"
with open(inputFileName, 'rb') as inFile, open(outputFileName, 'wb') as outfile:
    r = csv.reader(inFile)
    w = csv.writer(outfile)
    header = next(r)  # this already consumes the old header row
    header.insert(0, 'MapSvcName')
    #w.writerow(header)
    # write new header (note: a second next(r) here would wrongly skip the
    # first data row, since the old header was already consumed above)
    w.writerow(['MapSvcName', 'ClientHostName', 'Databasetype', 'ID_A',
                'KeepExistingData', 'KeepExistingMapCache', 'Name',
                'OnPremisePath', 'Resourcestype'])
    prevRow = next(r)
    prevRow.insert(0, '0')
    w.writerow(prevRow)
    for row in r:
        if prevRow[-1] == row[-1]:
            val = '0'
        else:
            val = prevRow[-1]
        row.insert(0, val)
        prevRow = row
        w.writerow(row)
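The rename step can also be driven by a mapping from old header names to new ones, which avoids hard-coding the full header list. A sketch of that variant (the `rename` mapping, sample input, and the choice to prepend the file name are mine, based on the question's stated goal):

```python
import csv

# Sample input standing in for manifest1.csv; real headers would include
# the likes of 'Databases [xsi:type]' from the error message.
inputFileName = 'manifest1.csv'
with open(inputFileName, 'w', newline='') as f:
    f.write('ID,Name\n1,foo\n2,bar\n')

rename = {'ID': 'ID_A'}  # old header -> new header (illustrative)

with open(inputFileName, newline='') as inFile, \
        open('manifest1_modified.csv', 'w', newline='') as outFile:
    r = csv.reader(inFile)
    w = csv.writer(outFile)
    header = [rename.get(h, h) for h in next(r)]  # rename where mapped
    header.insert(0, 'MapSvcName')                # new first column
    w.writerow(header)
    for row in r:
        row.insert(0, inputFileName)              # populate with file name
        w.writerow(row)
```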

Insert blanks into output file pointer (ofp)

The objective of this script is to take an incoming csv file, read it with a DictReader,
take the keys that were read, see if they match any of the pre-designated values in the fieldMap dictionary, and if they do match, append those keys to my hdrlist. Then, write the header list to an output file called ofp.
The issue that I am having is that when I don't have a key that matches one of the pre-designated values in the fieldMap, I need to insert a blank (' ').
I've tried appending blank values to the hdrlist in an else statement and having a blank key value pair in my fieldMap dictionary:
if row.has_key(ft_test):
    hdrlist.append(ft_test)
else:
    hdrlist.append('')

'': ['']  # blank key:value pair
but then my error-handling statement:
if hdrlen != len(hdrlist)-1:
    print "Cannot find a key for %s in file %s" % (ft, fn)
returns more print statements than I think it should, and I'm not sure as to why.
If anyone can shed some light as to how to insert blank into my ofp.write(fmtstring), it would be greatly appreciated.
Also, if anyone could shed some light as to why I get more print statements than I think I should with the above else statement, it would be greatly appreciated as well.
My whole script is below, and if there is any other info needed to help me with this code, I will gladly provide it.
Here is a sample of an input file that would produce too many print statements.
input_file.csv = {'cust_no':1, 'streetaddr':'2103 Union Ave','address2':' ','city':'Chicago'}
#!/usr/bin/env python
import sys, csv, glob

fieldMap = {'zipcode': ['Zip5', 'zip9', 'zipcode', 'ZIP', 'zip_code', 'zip', 'ZIPCODE'],
            'firstname': ['firstname', 'FIRSTNAME'],
            'lastname': ['lastname', 'LASTNAME'],
            'cust_no': ['cust_no', 'CUST_NO'],
            'user_name': ['user_name', 'USER_NAME'],
            'status': ['status', 'STATUS'],
            'cancel_date': ['cancel_date', 'CANCEL_DATE'],
            'reject_date': ['REJECT_DATE', 'reject_date'],
            'streetaddr': ['streetaddr', 'STREETADDR', 'ADDRESS', 'address'],
            'streetno': ['streetno', 'STREETNO'],
            'streetnm': ['streetnm', 'STREETNM'],
            'suffix': ['suffix', 'SUFFIX'],  # suffix of street name: dr, ave, st
            'city': ['city', 'CITY'],
            'state': ['state', 'STATE'],
            'phone_home': ['phone_home', 'PHONE_HOME'],
            'email': ['email', 'EMAIL'],
            '': ['']
            }

def readFile(fn, ofp):
    count = 0
    CSVreader = csv.DictReader(open(fn, 'rb'), dialect='excel', delimiter=',')
    for row in CSVreader:
        count += 1
        if count == 1:
            hdrlist = []
            for ft in fieldMap.keys():
                hdrlen = len(hdrlist)
                for ft_test in fieldMap[ft]:
                    if row.has_key(ft_test):
                        hdrlist.append(ft_test)
                if hdrlen != len(hdrlist)-1:
                    print "Cannot find a key for %s in file %s" % (ft, fn)
            if len(hdrlist) != 16:
                print "Another error. Not all headers have been assigned new values."
        if count < 5:
            x = len(hdrlist)
            fmtstring = "%s\t" * len(hdrlist) % tuple(row[x] for x in hdrlist)
            ofp.write(fmtstring)
        break

if __name__ == '__main__':
    filenames = glob.glob(sys.argv[1])
    ofp = sys.stdout
    ofp.write("zipcode\tfirstname\tlastname\tcust_no\tuser_name\tstatus\t"
              "cancel_date\treject_date\tstreetaddr\tstreetno\tstreetnm\t"
              "suffix\tcity\tstate\tphone_home\temail")
    for filename in filenames:
        readFile(filename, ofp)
if __name__ == '__main__':
filenames = glob.glob(sys.argv[1])
ofp = sys.stdout
ofp.write("zipcode\tfirstname\tlastname\tcust_no\tuser_name\tstatus\t"
"cancel_date\treject_date\tstreetaddr\tstreetno\tstreetnm\t"
"suffix\tcity\tstate\tphone_home\temail")
for filename in filenames:
readFile(filename,ofp)
Sample data:
cust_no,status,streetaddr,address2,city,state,zipcode,billaddr,servaddr,title,latitude,longitude,custsize,telemarket,dirmail,nocredhold,email,phone_home,phone_work,phone_fax,phone_page,phone_cell,phone_othr,taxrate1,taxrate2,taxrate3,taxtot,company,firstname,lastname,user_name,dpbc,container,seq,paytype_am,paytype_di,paytype_mc,paytype_vi
0,0,'123 fake st.',,'chicago','il',60185,'123 billaddr st.','123 servaddr st.','mr.',43.123,54.234 ,2000,'TRUE','TRUE','TRUE','email#email.com',(666)555-6666,,,,,,,,,,,'bob','smith','bob smith',,,,'TRUE','TRUE','TRUE','TRUE'
0,0,'123 fake st.','','chicago','il',60185,'123 billaddr st.','123 servaddr st.','mr.',43.123,54.234 ,2000,'TRUE','TRUE','TRUE','email#email.com',(666)555-6666,'','','','','','','','','','','bob','smith','bob smith','','','','TRUE','TRUE','TRUE','TRUE'
If all you want is a hdrlist of the recognized field names in the csv file being processed, you can create it by comparing the values in the DictReader.fieldnames attribute to the contents of fieldMap immediately after creating the DictReader, because creating it without a fieldnames argument automatically reads in the header row of the file.
I also changed your fieldMap dictionary into an OrderedDict so it would preserve the order of the keys.
import glob
from collections import OrderedDict
import csv
import sys

fieldMap = OrderedDict([
    ('zipcode', ['zipcode', 'ZIPCODE', 'Zip5', 'zip9', 'ZIP', 'zip_code', 'zip']),
    ('firstname', ['firstname', 'FIRSTNAME']),
    ('lastname', ['lastname', 'LASTNAME']),
    ('cust_no', ['cust_no', 'CUST_NO']),
    ('user_name', ['user_name', 'USER_NAME']),
    ('status', ['status', 'STATUS']),
    ('cancel_date', ['cancel_date', 'CANCEL_DATE']),
    ('reject_date', ['reject_date', 'REJECT_DATE']),
    ('streetaddr', ['streetaddr', 'STREETADDR', 'ADDRESS', 'address']),
    ('streetno', ['streetno', 'STREETNO']),
    ('streetnm', ['streetnm', 'STREETNM']),
    ('suffix', ['suffix', 'SUFFIX']),  # suffix of street name: dr, ave, st
    ('city', ['city', 'CITY']),
    ('state', ['state', 'STATE']),
    ('phone_home', ['phone_home', 'PHONE_HOME']),
    ('email', ['email', 'EMAIL']),
])

def readFile(fn, ofp):
    with open(fn, 'rb') as csvfile:
        # the following reads the header line into csvReader.fieldnames
        csvReader = csv.DictReader(csvfile, dialect='excel', delimiter=',')
        # create a list of recognized fieldnames in the csv file
        hdrlist = []
        for ft in fieldMap:
            for ft_test in fieldMap[ft]:
                if ft_test in csvReader.fieldnames:
                    hdrlist.append(ft_test)
                    break
            else:
                hdrlist.append(None)  # placeholder (could also be '')
        hdrlen = len(hdrlist)
        ofp.write('hdrlist: {}\n'.format(hdrlist))
        if hdrlen != len(fieldMap):
            print "Note that not all field names were present in file."
        ofp.write("\t".join(fieldMap) + '\n')
        for row in csvReader:
            fmtstring = "%s\t" * hdrlen % tuple(
                row[field] if field else 'NA' for field in hdrlist)
            ofp.write(fmtstring + '\n')

if __name__ == '__main__':
    # sys.argv = [sys.argv[0], 'ofp_input.csv']  # hardcode for testing
    if len(sys.argv) != 2:
        print "Error: Filename argument missing!"
        sys.exit(-1)
    filenames = glob.glob(sys.argv[1])
    ofp = sys.stdout
    for filename in filenames:
        readFile(filename, ofp)

How do I iterate through 2 CSV files and get data from one and add to the other?

I'm trying to iterate over a CSV file that has a 'master list' of names, and compare it to another CSV file that contains only the names of people who were present and made phone calls.
I'm trying to iterate over the master list, compare it to the names in the other CSV file, take the number of calls made by each person, and write a new CSV file containing the number of calls; if a name isn't found, or its count is 0, I need that column to contain 0.
I'm not sure if its something incredibly simple I'm overlooking, or if I am truly going about this incorrectly.
Edited for formatting.
import csv
import sys

masterlst = open('masterlist.csv')
comparelst = open(sys.argv[1])
masterrdr = csv.DictReader(masterlst, dialect='excel')
comparerdr = csv.DictReader(comparelst, dialect='excel')
headers = comparerdr.fieldnames
with open('callcounts.csv', 'w') as outfile:
    wrtr = csv.DictWriter(outfile, fieldnames=headers, dialect='excel',
                          quoting=csv.QUOTE_MINIMAL, delimiter=',', escapechar='\n')
    wrtr.writerow(dict((fn, fn) for fn in headers))
    for lines in masterrdr:
        for row in comparerdr:
            if lines['Names'] == row['Names']:
                print(lines['Names'] + ' has ' + row['Calls'] + ' calls')
                wrtr.writerow(row)
            elif lines['Names'] != row['Names']:
                row['Calls'] = ('%s' % 0)
                wrtr.writerow(row)
                print(row['Names'] + ' had 0 calls')
masterlst.close()
comparelst.close()
Here's how I'd do it, assuming the file sizes do not prove to be problematic:
import csv
import sys

with open(sys.argv[1]) as comparelst:
    comparerdr = csv.DictReader(comparelst, dialect='excel')
    headers = comparerdr.fieldnames
    names_and_counts = {}
    for line in comparerdr:
        names_and_counts[line['Names']] = line['Calls']
    # or, if you're sure you only want the ones with 0 calls, just use a set
    # and only add the line['Names'] values where line['Calls'] == '0'

with open('masterlist.csv') as masterlst:
    masterrdr = csv.DictReader(masterlst, dialect='excel')
    with open('callcounts.csv', 'w') as outfile:
        wrtr = csv.DictWriter(outfile, fieldnames=headers, dialect='excel',
                              quoting=csv.QUOTE_MINIMAL, delimiter=',', escapechar='\n')
        wrtr.writerow(dict((fn, fn) for fn in headers))
        # or if you're on 2.7, wrtr.writeheader()
        for line in masterrdr:
            if names_and_counts.get(line['Names']) == '0':
                row = {'Names': line['Names'], 'Calls': '0'}
                wrtr.writerow(row)
That writes just the rows with 0 calls, which is what your text description said - you could tweak it if you wanted to write something else for non-0 calls.
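For the "something else" case, writing every master name with its count (and defaulting to '0' when the name is absent) falls out of `dict.get`'s default argument. A self-contained sketch; the file names and the "Names"/"Calls" columns follow the question, but the sample rows are mine:

```python
import csv

# Sample inputs standing in for masterlist.csv and the compare file.
with open('masterlist.csv', 'w', newline='') as f:
    f.write('Names,Calls\nAlice,\nBob,\n')
with open('compare.csv', 'w', newline='') as f:
    f.write('Names,Calls\nAlice,3\n')

# One pass over the compare file builds the lookup table.
with open('compare.csv', newline='') as f:
    counts = {row['Names']: row['Calls'] for row in csv.DictReader(f)}

with open('masterlist.csv', newline='') as m, \
        open('callcounts.csv', 'w', newline='') as out:
    wrtr = csv.DictWriter(out, fieldnames=['Names', 'Calls'])
    wrtr.writeheader()
    for line in csv.DictReader(m):
        # .get's default supplies '0' for names never seen in the compare file
        wrtr.writerow({'Names': line['Names'],
                       'Calls': counts.get(line['Names'], '0')})
```

This avoids re-opening the compare file once per master row, which the nested-loop version below does.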
Thanks everyone for the help. I was able to nest another with statement inside of my outer loop, and add a variable to test whether or not the name from the master list was found in the compare list. This is my final working code.
import csv
import sys

masterlst = open('masterlist.csv')
comparelst = open(sys.argv[1])
masterrdr = csv.DictReader(masterlst, dialect='excel')
comparerdr = csv.DictReader(comparelst, dialect='excel')
headers = comparerdr.fieldnames
with open('callcounts.csv', 'w') as outfile:
    wrtr = csv.DictWriter(outfile, fieldnames=headers, dialect='excel',
                          quoting=csv.QUOTE_MINIMAL, delimiter=',', escapechar='\n')
    wrtr.writerow(dict((fn, fn) for fn in headers))
    for line in masterrdr:
        found = False
        with open(sys.argv[1]) as loopfile:
            looprdr = csv.DictReader(loopfile, dialect='excel')
            for row in looprdr:
                if row['Names'] == line['Names']:
                    line['Calls'] = row['Calls']
                    wrtr.writerow(line)
                    found = True
                    break
        if found == False:
            line['Calls'] = '0'
            wrtr.writerow(line)
masterlst.close()
comparelst.close()
