Parsing just a column of a CSV file with multiple columns - python

I am learning Python and am currently working with it to parse a CSV file.
The CSV file has 3 columns:
Full_name, university, and Birth_Year.
I have successfully loaded, read, and printed the content of a given CSV file in Python, but here's where I am stuck:
I want to parse ONLY the column Full_name into 3 columns: first, middle, and last. If there are only 2 words in the name, then the middle name should be null.
The resulting parsed output should then be inserted into a SQL database through Python.
Here’s my code so far:
import csv
if __name__ == '__main__':
    if len (sys.argv) != 2:
        print("Please enter the csv file too: python name_parsing.py student_info.csv")
        sys.exit()
    else:
        with open(sys.argv[1], "r" ) as file:
            reader = csv.DictReader(file) #I use DictReader because the csv file has > 1 col
            # Print the names on the cmd
            for row in reader:
                name = row["Full_name"]
                for name in reader:
                    if len(name) == 2:
                        print(first_name = name[0])
                        print(middle_name = None)
                        print(last_name = name[2])
                    if len(name) == 3 : # The assumption is that name is either 2 or 3 words only.
                        print(first_name = name[0])
                        print(middle_name = name[1])
                        print(last_name = name[2])
                    db.execute("INSERT INTO name (first, middle, last) VALUES(?,?,?)",
                               row["first_name"], row["middle_name"], row["last_name"])
Running the program above gives me no output whatsoever. How do I parse the names the right way? Thank you.

I created a sample file based on your description. The content looks as below:
Full_name,University,Birth_Year
Prakash Ranjan Gupta,BPUT,1920
Hari Shankar,NIT,1980
John Andrews,MIT,1950
Arbaaz Aslam Khan,REC,2005
Then I ran the code below; it runs fine in my Jupyter notebook. You can add the argument check (len(sys.argv) != 2, etc.) as you need. I used a sqlite3 database, so I hope this works for you. If you want the if/main block added, let me know and I can edit.
This follows your code. (Otherwise, I believe you can do this more easily with pandas.)
import csv
import sqlite3
con = sqlite3.connect('name_data.sql') ## Make DB connection and create a table if it does not exist
cur = con.cursor()
cur.execute('''CREATE TABLE IF NOT EXISTS UNIV_DATA
               (FIRSTNAME TEXT,
                MIDDLE_NAME TEXT,
                LASTNAME TEXT,
                UNIVERSITY TEXT,
                YEAR TEXT)''')
with open('names_data.csv') as fh:
    read_data = csv.DictReader(fh)
    for uniData in read_data:
        lst_nm = uniData['Full_name'].split()
        if len(lst_nm) == 2:
            fn,ln = lst_nm
            mn = None
        else:
            fn,mn,ln = lst_nm
        # print(fn,mn,ln,uniData['University'],uniData['Birth_Year'] )
        cur.execute('''
            INSERT INTO UNIV_DATA
            (FIRSTNAME, MIDDLE_NAME, LASTNAME, UNIVERSITY, YEAR)
            VALUES(?,?,?,?,?)''',
            (fn,mn,ln,uniData['University'],uniData['Birth_Year'])
        )
con.commit()
cur.close()
con.close()
If you want to read the data in the table UNIV_DATA:
Option 1: (prints each row as a tuple)
import sqlite3
con = sqlite3.connect('name_data.sql') # Connect to the DB and create a connection object
cur = con.cursor() # Create a cursor object
results = cur.execute('SELECT * FROM UNIV_DATA') # Execute the query; 'results' iterates over the retrieved rows
for result in results: # Loop over 'results' and print each row
    print(result)
cur.close() # Close the cursor
con.close() # Close the connection
Option 2: (prints all the rows in the form of a pandas data frame - execute in jupyter ...preferably )
import sqlite3
import pandas as pd
con = sqlite3.connect('name_data.sql') #Make connection to DB and create a connection object
df = pd.read_sql('SELECT * FROM UNIV_DATA', con) #Query the table and store the result in a dataframe : df
df

When you call name = row["Full_name"] it is going to return a string representing the name, e.g. "John Smith".
In python strings can be treated like lists, so in this case if you called len(name) it would return 10 as "John Smith" has 10 characters. As this doesn't equal 2 or 3, nothing will happen in your for loop.
What you need is some way to turn the string into a list containing the first, middle and last names. You can do this using the split method. Calling name.split(" ") splits the string wherever there is a space; continuing the above example, this returns ["John", "Smith"], which is the kind of value your length checks expect.
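To make the split idea concrete, here is a minimal sketch. The helper name split_full_name is made up for illustration, and it assumes (as the question does) that every name has exactly 2 or 3 words; anything else raises an error rather than being guessed at.

```python
def split_full_name(full_name):
    """Return (first, middle, last); middle is None for two-word names."""
    # split() with no argument splits on any run of whitespace,
    # so stray double spaces do not produce empty strings.
    parts = full_name.split()
    if len(parts) == 2:
        first, last = parts
        return first, None, last
    if len(parts) == 3:
        first, middle, last = parts
        return first, middle, last
    # Assumption from the question: names have 2 or 3 words only.
    raise ValueError(f"expected 2 or 3 words, got {len(parts)}: {full_name!r}")


print(split_full_name("John Smith"))         # ('John', None, 'Smith')
print(split_full_name("Arbaaz Aslam Khan"))  # ('Arbaaz', 'Aslam', 'Khan')
```

The returned tuple can be passed directly as the parameters of an INSERT with three ? placeholders; None is stored as NULL by sqlite3.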

Related

how to ingest a table specification file in .txt form and create a table in sqlite?

I tried a few different ways, shown below, but I'm having trouble a) removing the width and b) replacing the \n with a comma. I have a txt file like the one below, and I want to use that information to create a table in SQLite (all using Python).
"field",width, type
name, 15, string
revenue, 10, decimal
invoice_date, 10, string
amount, 2, integer
Current python code - trying to read in the file, and get the values to pass in the sql statement below
import os
import pandas as pd
dir_path = os.path.dirname(os.path.realpath(__file__))
file = open(str(dir_path) + '/revenue/revenue_table_specifications.txt','r')
lines = file.readlines()
table = lines[2::]
s = ''.join(str(table).split(','))
x = s.replace("\n", ",").strip()
print(x)
sql I want to pass in
c = sqlite3.connect('rev.db') # connect to DB
try:
    c.execute('''CREATE TABLE
        revenue_table (information from txt file,
        information from txt file,
        ....)''')
except sqlite3.OperationalError: #i.e. table exists already
    pass
This produces something that will work.
def makesql(filename):
    s = []
    for row in open(filename):
        if row[0] == '"':
            continue
        parts = row.strip().split(", ")
        s.append( f"{parts[0]} {parts[2]}" )
    return "CREATE TABLE revenue_table (\n" + ",\n".join(s) + ");"
sql = makesql( 'x.csv' )
print(sql)
c.execute( sql )
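A variation on the same idea, as a rough sketch rather than a definitive version: it translates the spec's type names to SQLite column types and checks the result against an in-memory database. The TYPE_MAP mapping, the function name make_create_table, and the use of io.StringIO in place of the real file are all assumptions made for illustration.

```python
import io
import sqlite3

# Assumed mapping from the spec's type names to SQLite column types.
TYPE_MAP = {"string": "TEXT", "decimal": "REAL", "integer": "INTEGER"}


def make_create_table(lines, table_name="revenue_table"):
    """Build a CREATE TABLE statement from 'name, width, type' lines."""
    columns = []
    for line in lines:
        line = line.strip()
        if not line or line.startswith('"'):  # skip blank lines and the header row
            continue
        # The width is read but not used: SQLite does not enforce it.
        name, _width, type_name = [part.strip() for part in line.split(",")]
        columns.append(f"{name} {TYPE_MAP[type_name]}")
    return (
        f"CREATE TABLE IF NOT EXISTS {table_name} (\n  "
        + ",\n  ".join(columns)
        + "\n)"
    )


# Stand-in for the specification file from the question.
spec = io.StringIO(
    '"field",width, type\n'
    "name, 15, string\n"
    "revenue, 10, decimal\n"
    "invoice_date, 10, string\n"
    "amount, 2, integer\n"
)

sql = make_create_table(spec)
conn = sqlite3.connect(":memory:")
conn.execute(sql)
# PRAGMA table_info returns one row per column; index 1 is the column name.
column_names = [row[1] for row in conn.execute("PRAGMA table_info(revenue_table)")]
print(column_names)  # ['name', 'revenue', 'invoice_date', 'amount']
conn.close()
```

Using CREATE TABLE IF NOT EXISTS avoids the try/except around OperationalError from the question. For the real file, pass an open file object instead of the StringIO.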

Incorrect date value when loading xlsx file to table using pymysql and xlrd

(Very) beginner python user here. I'm trying to load an xlsx file into a MySQL table using xlrd and pymysql python libraries and I'm getting an error:
pymysql.err.InternalError: (1292, "Incorrect date value: '43500' for column 'invoice_date' at row 1")
The datatype for invoice_date in my table is DATE. The format for this field in my xlsx file is also Date. Things work fine if I change the table datatype to varchar, but I'd prefer to have the data load into my table as a date instead of converting after the fact. Any ideas as to why I'm getting this error? It appears that xlrd or pymysql is reading '2/4/2019' in my xlsx file as '43500' and MySQL is rejecting it due to a datatype mismatch.
import xlrd
import pymysql as MySQLdb
# Open workbook and define first sheet
book = xlrd.open_workbook("2019_Complete.xlsx")
sheet = book.sheet_by_index(0)
# MySQL connection
database = MySQLdb.connect (host="localhost", user="root",passwd="password", db="vendor")
# Get cursor, which is used to traverse the database, line by line
cursor = database.cursor()
# INSERT INTO SQL query
query = """insert into table values (%s,%s,%s,%s,%s,%s,%s,%s,%s,%s,%s,%s,%s,%s,%s,%s,%s,%s,%s,%s,%s)"""
# Create a For loop to iterate through each row in the XLS file, starting at row 2 to skip the headers
for r in range(1, sheet.nrows):
    lp = sheet.cell(r,0).value
    pallet_lp = sheet.cell(r,1).value
    bol = sheet.cell(r,2).value
    invoice_date = sheet.cell(r,3).value
    date_received = sheet.cell(r,4).value
    date_repaired = sheet.cell(r,5).value
    time_in_repair = sheet.cell(r,6).value
    date_shipped = sheet.cell(r,7).value
    serial_number = sheet.cell(r,8).value
    upc = sheet.cell(r,9).value
    product_type = sheet.cell(r,10).value
    product_description = sheet.cell(r,11).value
    repair_code = sheet.cell(r,12).value
    condition = sheet.cell(r,13).value
    repair_cost = sheet.cell(r,14).value
    parts_cost = sheet.cell(r,15).value
    total_cost = sheet.cell(r,16).value
    repair_notes = sheet.cell(r,17).value
    repair_cap = sheet.cell(r,18).value
    complaint = sheet.cell(r,19).value
    delta = sheet.cell(r,20).value
    # Assign values from each row
    values = (lp, pallet_lp, bol, invoice_date, date_received, date_repaired, time_in_repair, date_shipped, serial_number, upc, product_type, product_description, repair_code, condition, repair_cost, parts_cost, total_cost, repair_notes, repair_cap, complaint, delta)
    # Execute sql Query
    cursor.execute(query, values)
# Close the cursor
cursor.close()
# Commit the transaction
database.commit()
# Close the database connection
database.close()
# Print results
print ("")
columns = str(sheet.ncols)
rows = str(sheet.nrows)
print ("I just imported " + columns + " columns and " + rows + " rows to MySQL!")
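You can see this answer for a more detailed explanation, but basically Excel stores dates as serial numbers: the number of days since a base date, nominally 1899-12-31. Because Excel also counts a nonexistent 1900-02-29, the effective day zero for any date after February 1900 is 1899-12-30, whose Python ordinal is 693594. To convert your date value to an actual date, add that offset and turn the result into an ISO format date, which MySQL will accept. You can do that using date.fromordinal and date.isoformat. For example: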
from datetime import date
dval = 43500
d = date.fromordinal(dval + 693594)
print(d.isoformat())
Output:
2019-02-04
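The same conversion can be wrapped in a small helper and applied to each date column before building the values tuple. This is only a sketch: the name excel_serial_to_iso is made up, it assumes the workbook uses Excel's default 1900 date system, and it ignores any time-of-day fraction in the serial number.

```python
from datetime import date, timedelta

# Effective day zero for serial dates after February 1900
# (assumes the workbook uses the default 1900 date system).
EXCEL_DAY_ZERO = date(1899, 12, 30)


def excel_serial_to_iso(serial):
    """Convert an Excel serial date (int or float) to 'YYYY-MM-DD'."""
    # Date cells typically arrive as floats such as 43500.0;
    # int() drops any fractional (time of day) part.
    return (EXCEL_DAY_ZERO + timedelta(days=int(serial))).isoformat()


print(excel_serial_to_iso(43500))  # 2019-02-04
```

In the loop from the question this would look like invoice_date = excel_serial_to_iso(sheet.cell(r,3).value), and likewise for the other date columns.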

I am working in a terminal application (Python + MySQL)

def view_empdetails(): #this is my function: it works great
    conn = mysql.connector.connect(host="localhost",user="root",passwd="#####",database="#DB")
    cursor = conn.cursor() # this is database connection
    viw = """select * from employees"""
    cursor.execute(viw)
    for emp_no,first_name,last_name,gender,DOB,street,city,state,zipcode,email,phone,hire_date in cursor.fetchall(): # fetch all data from employee table in DB
        print('-'*50)
        print(emp_no)
        print(first_name)
        print(last_name)
        print(gender)
        print(DOB)
        print(street)
        print(city)
        print(state) # I need all these output be in a table or organize format
        print(zipcode) #not only list of records
        print(email)
        print(phone)
        print(hire_date)
        print('-'*50)
    conn.commit()
    conn.close()
    return menu2()
I need all the records displayed in one table. The code prints the data from the database line by line without any formatting; I need it laid out as a table.
I'm not sure if you are familiar with the pandas library, but I believe it is helpful here. I have never used it with mysql, but I have used it with psycopg2 and pyodbc, so I think the basic idea should work:
data = pd.DataFrame(cur.fetchall(),columns = colnames)
This creates a DataFrame (think Python spreadsheet or Python table) that uses the column names from the table you're querying. Here colnames is a list of those column names, which you have to supply yourself.
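If you would rather not pull in pandas, a plain-text table is also possible with the standard library alone. The sketch below is an illustration under assumptions: it uses an in-memory sqlite3 database with a made-up two-row employees table so that it runs on its own, and the helper name format_table is hypothetical. It relies only on cursor.description and fetchall(), which are part of the Python DB-API, so the same function should work with a MySQL cursor as well.

```python
import sqlite3


def format_table(cursor):
    """Render the rows of an executed SELECT as an aligned text table."""
    # cursor.description holds one entry per column; item 0 is the column name.
    headers = [col[0] for col in cursor.description]
    rows = [[str(value) for value in row] for row in cursor.fetchall()]
    # Each column is as wide as its longest cell (header included).
    widths = [
        max([len(header)] + [len(row[i]) for row in rows])
        for i, header in enumerate(headers)
    ]

    def render(cells):
        return " | ".join(cell.ljust(width) for cell, width in zip(cells, widths))

    separator = "-+-".join("-" * width for width in widths)
    return "\n".join([render(headers), separator] + [render(row) for row in rows])


# Made-up sample data standing in for the real employees table.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE employees (emp_no INTEGER, first_name TEXT, city TEXT)")
conn.executemany(
    "INSERT INTO employees VALUES (?, ?, ?)",
    [(1, "Ada", "London"), (2, "Grace", "New York")],
)
cursor = conn.execute("SELECT * FROM employees")
table_text = format_table(cursor)
print(table_text)
conn.close()
```

In view_empdetails this would replace the whole for loop: call cursor.execute(viw) and then print(format_table(cursor)).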

how to read from csv file and store data in sqlite3 using python

I have a Python class, readCSVintoDB, that reads from a CSV file and stores the data in a SQLite 3 database.
Note:
The CSV file includes many fields, but I need only 3 of them.
So far I am able to read the CSV file and store it in a DataFrame using pandas, but how do I store the DataFrame in the database?
Error displayed:
File "C:\Users\test\Documents\Python_Projects\readCSV_DB.py", line 15,
in init self.importCSVintoDB() File
"C:\Users\test\Documents\Python_Projects\readCSV_DB.py", line 60, in
importCSVintoDB INSERT INTO rduWeather VALUES (?,?,?,?)''', i)
sqlite3.IntegrityError: datatype mismatch
When I print i inside the for loop, it displays the header name date.
readCSV_DB :
import sqlite3
import pandas as pd
import os
class readCSVintoDB():
    def __init__(self):
        '''
        self.csvobj = csvOBJ
        self.dbobj = dbOBJ
        '''
        self.importCSVintoDB()
    def importCSVintoDB(self):
        userInput= input("enter the path of the csv file: ")
        csvfile = userInput
        df = pd.read_csv(csvfile,sep=';')
        #print("dataFrame Headers is {0}".format(df.columns))# display the Headers
        dp = (df[['date','temperaturemin','temperaturemax']])
        print(dp)
        '''
        check if DB file exist
        if no create an empty db file
        '''
        if not(os.path.exists('./rduDB.db')):
            open('./rduDB.db','w').close()
        '''
        connect to the DB and get a connection cursor
        '''
        myConn = sqlite3.connect('./rduDB.db')
        dbCursor = myConn.cursor()
        '''
        Assuming i need to create a table of (Name,FamilyName,age,work)
        '''
        dbCreateTable = '''CREATE TABLE IF NOT EXISTS rduWeather
                           (id INTEGER PRIMARY KEY,
                            Date varchar(256),
                            TemperatureMin FLOAT,
                            TemperatureMax FLOAT)'''
        dbCursor.execute(dbCreateTable)
        myConn.commit()
        '''
        insert data into the database
        '''
        for i in dp:
            print(i)
            dbCursor.execute('''
                INSERT INTO rduWeather VALUES (?,?,?,?)''', i)
        #myInsert=dbCursor.execute('''insert into Info ('Name','FA','age','work')
        #VALUES('georges','hateh',23,'None')''')
        myConn.commit()
        mySelect=dbCursor.execute('''SELECT * from rduWeather WHERE (id = 10)''')
        print(list(mySelect))
        myConn.close()
test1 = readCSVintoDB()
If you want to write a single row (e.g: reg = (...)) try this function:
def write_datarow(conn, cols, reg):
    ''' Create a new entry (reg) into the rduWeather table
    input: conn (class SQLite connection)
    input: cols (list)
        Table columns names
    input: reg (tuple)
        Data to be written as a row
    '''
    sql = 'INSERT INTO rduWeather({}) VALUES({})'.format(', '.join(cols),'?, '*(len(cols)-1)+'?')
    cur = conn.cursor()
    # Execute the SQL query
    cur.execute(sql, reg)
    # Confirm
    conn.commit()
    return
But if you had multiple rows reg = [(...),...,(...)] then use:
def write_datarow(conn, cols, reg):
    ''' Create a new entry (reg) into the rduWeather table
    input: conn (class SQLite connection)
    input: cols (list)
        Table columns names
    input: reg (list of tuples)
        List of rows to be written
    '''
    sql = 'INSERT INTO rduWeather({}) VALUES({})'.format(', '.join(cols),'?, '*(len(cols)-1)+'?')
    cur = conn.cursor()
    # Execute the SQL query
    cur.executemany(sql, reg)
    # Confirm
    conn.commit()
    return
After your edit, I see the problem: you commit the SQL query outside the for loop.
Try this:
for i in dp:
    dbCursor.execute(''' INSERT INTO rduWeather VALUES (?,?,?,?)''', i)
    myConn.commit()
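Separately from where the commit happens, it may be worth checking what the loop variable actually holds. Iterating over a pandas DataFrame with a plain for loop yields its column labels, not its rows, which matches the observation that print(i) shows the header name date. A 4-character string bound to 4 placeholders would put the letter 'd' into the INTEGER PRIMARY KEY column, which is a plausible source of the datatype mismatch. The sketch below builds one tuple per data row and names the three target columns so SQLite assigns the id itself. It uses the csv module and made-up sample data so that it runs without pandas; both are assumptions for illustration.

```python
import csv
import io
import sqlite3

# Hypothetical sample standing in for the semicolon-separated weather CSV.
sample = io.StringIO(
    "date;temperaturemin;temperaturemax;humidity\n"
    "2019-01-01;1.5;9.0;80\n"
    "2019-01-02;-0.5;7.25;75\n"
)

conn = sqlite3.connect(":memory:")
conn.execute(
    """CREATE TABLE IF NOT EXISTS rduWeather
       (id INTEGER PRIMARY KEY,
        Date varchar(256),
        TemperatureMin FLOAT,
        TemperatureMax FLOAT)"""
)

# One tuple per data row, keeping only the three wanted columns.
rows = [
    (rec["date"], float(rec["temperaturemin"]), float(rec["temperaturemax"]))
    for rec in csv.DictReader(sample, delimiter=";")
]

# Naming the three columns lets SQLite fill in the id automatically.
conn.executemany(
    "INSERT INTO rduWeather (Date, TemperatureMin, TemperatureMax) VALUES (?, ?, ?)",
    rows,
)
conn.commit()  # a single commit after all inserts is enough

stored = conn.execute("SELECT * FROM rduWeather ORDER BY id").fetchall()
print(stored)
conn.close()
```

With the DataFrame from the question, the same kind of row tuples can probably be obtained from dp.itertuples(index=False, name=None) instead of the csv reader.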

Operational error when inserting list of lists into sqlite table with executemany

I am creating a database and inserting a list of lists that are created from an excel sheet. The main list is created using openpyxl, that list is then split every 18 items into a list of lists. I would like to then insert all the items into the database. Not very familiar with SQL but after some research, I managed to put this together:
import sqlite3
from openpyxl import load_workbook
import os
filepath = os.path.expanduser("~\Desktop\\")
data1 = []
data=[]
wb1 = load_workbook(filename=filepath+"exportUL1.XLSX")
ws1 = wb1['Sheet1'] ###call the worksheet with the data
x = 0
for row in ws1.iter_rows():
    #look for the correct value, "Q" and return all the data in that row to the data list
    for cell in row:
        if cell.value == 'Q':
            data.append(x) #append the id numbers to the list
            for cell in row:
                data.append(str(cell.value)) #append the row data to the list
            x += 1
data_lists = [data[x:x+18] for x in range(0, len(data),18)] #convert list to list of lists, split every 18 items
conn = sqlite3.connect('test.db')
print "Opened database successfully";
with conn:
    cur = conn.cursor()
    cur.execute('''CREATE TABLE QualityHold(
        Id Int,StorageLocation TEXT, StorageType TEXT, StorageBin TEXT, StorageUnit TEXT,
        Material TEXT, Plant TEXT, Batch TEXT, StockCategory TEXT, SpecialStock TEXT, SpecialStockNumber TEXT,
        Duration TEXT, PutawayBlock TEXT, StockRemovalBlock TEXT, AvailableStock TEXT, StockforPutaway TEXT,
        PickQuantity TEXT, TotalStock TEXT)''') #create the table with these headers
    sql = '''INSERT INTO QualityHold (Id,StorageLocation,
        StorageType, StorageBin, StorageUnit, Material,
        Plant, Batch, StockCategory, SpecialStock,
        SpecialStockNumber, Duration, PutawayBlock, StockRemovalBlock,
        AvailableStock, StockforPutaway, PickQuantity, TotalStock)
        VALUES
        (%s,%s,%s,%s,%s,%s,%s,%s,%s,%s,%s,%s,%s,%s,%s,%s,%s,%s)'''#sql command
    cur.executemany(sql,data_lists)###execute the sql command using the list of lists for the variables(LINE 38)
conn.close()
Once I run this I get the following error
Traceback (most recent call last):
File "C:\Users\ONP1LDY\eclipse-workspace\WOrk\QualityInspection.py", line 38, in <module>
cur.executemany(sql,data_lists)
sqlite3.OperationalError: near "%": syntax error
Any help with what would be causing this would be great!
I went a different route and replaced the executemany with the following:
for lst in data_lists:
    var_string = ', '.join('?'*len(lst))
    query_string = 'INSERT OR IGNORE INTO QualityHold VALUES (%s);' % var_string
    cur.execute(query_string, lst)
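The reason this works is that Python's sqlite3 module expects ? as its positional placeholder, so %s reaches SQLite as literal text and triggers the syntax error. With ? placeholders, executemany itself should also work. Here is a small sketch of that, using a made-up 3-column version of the table and sample rows so it runs by itself:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Shortened, hypothetical version of the QualityHold table (3 columns, not 18).
conn.execute("CREATE TABLE QualityHold (Id INT, StorageLocation TEXT, StorageType TEXT)")

# Hypothetical stand-in for data_lists from the question.
data_lists = [[0, "A1", "Q"], [1, "B2", "Q"]]

# One "?" per column, joined with commas: "?, ?, ?"
placeholders = ", ".join("?" * len(data_lists[0]))
sql = f"INSERT INTO QualityHold VALUES ({placeholders})"

with conn:  # commits on success, rolls back if an exception is raised
    conn.executemany(sql, data_lists)

count = conn.execute("SELECT COUNT(*) FROM QualityHold").fetchone()[0]
print(count)  # 2
conn.close()
```

Only the placeholder string is built with string formatting; the values themselves are still passed as parameters, so they are escaped by the driver.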
