Automatically Load SQL table by reading data from text file - python

I am trying to write a Python script that loads the tables I created with SQL and populates them automatically with data coming from a text file. I am stuck on the basic coding. I have a general idea, but I get errors when I run this approach. I have created 2 tables and read the file. The file is a comma-separated text file with no headers.
The first 3 lines of the file look like this:
+ ---- + ----- + -------------------- + -------- + - + --- + ----- +
| John | Smith | 111 N. Wabash Avenue | plumber | 5 | 1.0 | 200 |
| John | Smith | 111 N. Wabash Avenue | bouncer | 5 | 1.0 | 200 |
| Jane | Doe | 243 S. Wabash Avenue | waitress | 1 | 5.0 | 10000 |
+ ---- + ----- + -------------------- + -------- + - + --- + ----- +
import sqlite3

conn = sqlite3.connect('csc455.db')
c = conn.cursor()

# Reading the data file
fd = open('C:/Users/nasia/Documents/data_hw2.txt', 'r')
data = fd.readlines()

# Creating Tables
L = """create table L
(first text, last text, address text, job text, LNum integer,
constraint L_pk
    primary key(first, last, address, job),
constraint L_fk
    foreign key (LNum) references LN(LNum)
);"""
c.execute(L)

LN = """create table LN
(
LNum integer, Interest float, Amount, Integer,
constraint LN_pk
    primary key (LNum)
);"""
c.execute(LN)

# Inserting into database
for elt in data:
    currentRow = elt.split(", ")[:-1]
    insert = """(insert into LN values (%s, %s, %s);, %(currentRow[4], currentRow[5], currentRow[6]))"""
    c.execute(insert)
There is some syntax error here and the code stops working; I cannot figure out what I am doing wrong.
The error is:
Traceback (most recent call last):
  File "", line 4, in
OperationalError: near "(": syntax error

You haven't explained what format the data are in, or what your table structure is, or how you want to map them, which makes this difficult to answer. But I'll make up my own, and answer that, and hopefully it will help:
infile.txt:
CommonName,Species,Location,Color
Black-headed spider monkey,Ateles fusciceps,Ecuador,black
Central American squirrel monkey,Saimiri oerstedii,Costa Rica,orange
Vervet,Chlorocebus pygerythrus,South Africa,white
script.py
import csv
import sqlite3
db = sqlite3.connect('outfile.db')
cursor = db.cursor()
cursor.execute('CREATE TABLE Monkeys (Common Name, Color, Species)')
cursor.execute('''CREATE TABLE MonkeyLocations (Species, Location,
                  FOREIGN KEY(Species) REFERENCES Monkeys(Species))''')
with open('infile.txt') as f:
    for row in csv.DictReader(f):
        cursor.execute('''INSERT INTO Monkeys
                          VALUES (:CommonName, :Color, :Species)''', row)
        cursor.execute('''INSERT INTO MonkeyLocations
                          VALUES (:Species, :Location)''', row)
db.commit()
db.close()
Of course if your real data are in some other format than CSV, you'll use different code to parse the input file.
I've also made things slightly more complex than your real data might have to deal with—the CSV columns don't have quite the same names as the SQL columns.
In other ways, your data might be more complex—e.g., if your schema has foreign keys that reference an auto-incremented row ID instead of a text field, you'll need to get the rowid after the first insert.
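For instance, a minimal sketch of that rowid pattern, using a hypothetical schema with an auto-generated integer key (not the schema above):
import sqlite3

db = sqlite3.connect(':memory:')
cursor = db.cursor()
# Hypothetical schema: the foreign key references an auto-incremented row ID.
cursor.execute('CREATE TABLE Monkeys (id INTEGER PRIMARY KEY, CommonName, Color)')
cursor.execute('''CREATE TABLE MonkeyLocations (MonkeyId INTEGER, Location,
                  FOREIGN KEY(MonkeyId) REFERENCES Monkeys(id))''')

cursor.execute('INSERT INTO Monkeys (CommonName, Color) VALUES (?, ?)',
               ('Vervet', 'white'))
monkey_id = cursor.lastrowid  # row ID generated by the insert above
cursor.execute('INSERT INTO MonkeyLocations VALUES (?, ?)',
               (monkey_id, 'South Africa'))
db.commit()
db.close()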
But this should be enough to give you the idea.
Now that you've shown more details… you were on the right track (although it's wasteful to call readlines instead of just iterating over fd directly, and you should close your db and file, ideally with a with statement, …), but you've got a simple mistake right near the end that prevents you from getting any farther:
insert = """(insert into LN values (%s, %s, %s);, %(currentRow[4], currentRow[5], currentRow[6]))"""
c.execute(insert)
You've put the formatting % expression directly into the string, instead of using the operator on the string. I think what you were trying to do is:
insert = """insert into LN values (%s, %s, %s);""" % (currentRow[4], currentRow[5], currentRow[6])
c.execute(insert)
However, you shouldn't do that. Instead, do this:
insert = """insert into LN values (?, ?, ?);"""
c.execute(insert, (currentRow[4], currentRow[5], currentRow[6]))
What's the difference?
Well, the first one just inserts the values into the statement as Python strings. That means you have to take care of converting to the proper format, quoting, escaping, etc. yourself, instead of letting the database engine decide how to deal with each value. Besides being a source of frustrating bugs when you try to save a boolean value or forget to quote a string, this also leaves you open to SQL injection attacks unless you're very careful.
There are other problems besides that one. For example, most databases will try to cache repeated statements, and it's trivial to tell that 3000 instances of insert into LN values (?, ?, ?) are all the same statement, but less so to tell that insert into LN values (5, 1.0, 200) and insert into LN values (1, 5.0, 5000) are the same statement.
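For example, a minimal sketch of how that plays out with executemany, reusing data, c, and conn from your code above (and the same split-based parsing, which I'm assuming matches your actual file):
# Build the parameter tuples first, then let sqlite3 reuse one prepared statement.
rows = []
for elt in data:
    currentRow = elt.split(", ")[:-1]
    rows.append((currentRow[4], currentRow[5], currentRow[6]))

c.executemany("insert into LN values (?, ?, ?);", rows)
conn.commit()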

If you can use the standard sqlite3 utility, you can do it much more easily:
sqlite3 -init mydata.sql mydatabase.db ""
Simply call this line from your Python script, and you're done.
This will read any text file that contains valid SQL statements, and will create mydatabase.db if it does not exist. More importantly, it supports statements spanning more than one line, and it properly ignores SQL comments in both the --comment syntax and the C/C++-style /*comment*/ syntax.
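For example, a minimal sketch of running that exact command from Python (assuming the sqlite3 command-line tool is installed and on your PATH):
import subprocess

# Execute every SQL statement in mydata.sql against mydatabase.db,
# creating the database file if it does not exist yet.
subprocess.run(['sqlite3', '-init', 'mydata.sql', 'mydatabase.db', ''], check=True)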
Typically your mydata.sql content should look like this:
BEGIN TRANSACTION;
CREATE TABLE IF NOT EXISTS table1 (
    id INTEGER PRIMARY KEY AUTOINCREMENT,
    name VARCHAR(32)
);
INSERT INTO table1 (name) VALUES
    ('John'),
    ('Jack'),
    ('Jill');
-- more statements ...
COMMIT;

Related

Python regex inserting big string full of data into SQL database

I would like to take a gigantic string, chop it up and put it into an SQL table in order.
So far I have tried using regex to split up the string, getting the values I want and trying to insert them into the table like so:
conn = sqlite3.connect('PP.DB')
c = conn.cursor()
c.execute('''CREATE TABLE apps (DisplayName, DisplayVersion, Publisher, InstallDate, PSCOmputerName, RunspaceId)''')

# Split up string based on new lines
bigStringLines = re.split(r'\r\n', myBigString)

for line in bigStringLines:
    values = re.split(":", line)
    stmt = "INSERT INTO mytable (\"" + values[0] + "\") VALUES (\"" + values[1] + "\");"
    c.execute(stmt)
However, it looks like this inside the SQL database:
DisplayName DisplayVersion Publisher InstallDate PSComputerName RunspaceId
Installed program 1
1.2.3.123
CyberSoftware
20121115
Computer1
b37da93e9c05
Installed program 2
4.5.6.456
MicroSoftware
20160414
Computer2
b37da93e9c06
Ideally, I would like it to look like this inside the database:
DisplayName DisplayVersion Publisher InstallDate PSComputerName RunspaceId
Installed program 1 1.2.3.123 CyberSoftware 20121115 Computer1 b37da93e9c05
Installed program 2 4.5.6.456 MicroSoftware 20160414 Computer2 b37da93e9c06
Here's what the main structure of the string looks like:
DisplayName : Installed program 1
DisplayVersion : 1.2.3.123
Publisher : CyberSoftware
InstallDate : 20121115
PSComputerName : Computer1
RunspaceId : 38ff5be0-da11-4664-97b1-b37da93e9c05
DisplayName : Installed program 2
DisplayVersion : 2.2.2.147
Publisher : CyberSoftware
InstallDate : 20140226
PSComputerName : Computer1
RunspaceId : 38ff5be0-da11-4664-97b1-b37da93e9c05
Just for a bit of extra background info, this will be part of a bigger program that queries what apps are installed on a large group of computers. For testing I'm just using SQLite; however, I plan to move it to MySQL in the future.
If anyone knows what I'm doing wrong or has any suggestions, I would greatly appreciate it.
You're doing an insert for every line in the text file, not for every record in the file. Only do an insert for every record. If the format is consistent, fill your variables and insert after filling RunspaceId or hitting a blank line, then clear all the variables (or use a dictionary, which is probably easier) and iterate to the next record. Something like:
conn = sqlite3.connect('PP.DB')
c = conn.cursor()
c.execute('''CREATE TABLE apps (DisplayName, DisplayVersion, Publisher, InstallDate, PSCOmputerName, RunspaceId)''')

# Split up string based on new lines
bigStringLines = re.split(r'\r\n', myBigString)

record = {}
for line in bigStringLines:
    if line.startswith("DisplayName"):
        record["DisplayName"] = re.split(":", line)[1]  # or find index of colon and use negative slice notation from end of string
    elif line.startswith("DisplayVersion"):
        record["DisplayVersion"] = re.split(":", line)[1]
    # and so on for all values....
    elif line.strip() == "":  # blank line = end of record (or use RunspaceId as trigger once populated)
        stmt = "INSERT INTO mytable (DisplayName, DisplayVersion, Publisher, InstallDate, PSCOmputerName, RunspaceId) VALUES ({DisplayName}, {DisplayVersion}, {Publisher}, {InstallDate}, {PSCOmputerName}, {RunspaceId});".format(**record)  # adjust as needed depending on python version
        c.execute(stmt)
        record = {}  # reset for next record
And PS, if this is in a text file, this can all be accomplished without using RegEx at all (and I recommend this). There is no reason to read the entire file into memory if it is a local flat file.
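A minimal sketch of that regex-free, record-at-a-time approach (reading from a hypothetical apps.txt laid out like the key : value dump in the question, and reusing the apps table with parameterized inserts):
import sqlite3

FIELDS = ("DisplayName", "DisplayVersion", "Publisher",
          "InstallDate", "PSComputerName", "RunspaceId")

conn = sqlite3.connect('PP.DB')
c = conn.cursor()

record = {}
with open('apps.txt') as f:  # hypothetical input file
    for line in f:
        if ':' in line:
            key, _, value = line.partition(':')
            record[key.strip()] = value.strip()
        if len(record) == len(FIELDS):  # last field seen, insert the record
            c.execute('INSERT INTO apps VALUES (?, ?, ?, ?, ?, ?)',
                      tuple(record[field] for field in FIELDS))
            record = {}

conn.commit()
conn.close()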

Trying to insert value from txt file with sqlite in python

def quantity():
    i = 0
    x = 1
    file = open("john.txt", "r")
    while i < 5000:
        for line in file:
            c.execute("INSERT INTO test (playerNAME, playerID) VALUES ("+line+", "+str(x)+")")
            conn.commit()
            x = random.randint(100,10000000000000000)
            i += 1
I am trying to iterate through the john.txt file and insert each value into a table. The first word in the txt file is "abc123". When I run this code I get the error: sqlite3.OperationalError: no such column: abc123
I can get the code to enter the random numbers into playerID, but I can't get the txt file query to work...
You need single quotes around the string.
c.execute("INSERT INTO test (playerNAME, playerID) VALUES ('"+line+"', "+str(x)+")")
Otherwise it tries to interpret it as a SQL expression and looks for the named column.
More generally, you should use parameters or sanitize the incoming data from the file for safety against SQL injection, even if you trust this particular file. It's a good habit.
c.execute("INSERT INTO test (playerName, playerID) VALUES (?, ?)", (line, x))
The sqlite3 documentation covers parameter substitution in detail, and SQL injection is why it's important.
Formatting SQL queries via string concatenation is very bad practice.
Variable binding should always be used:
c.execute("INSERT INTO test (playerNAME, playerID) VALUES (?, ?)", [line, x])
In your case, the line probably contains spaces or punctuation marks.
sqlite's error message is misleading, though.

Inserting multi-word string and an empty array with psycopg2

psycopg2 complains when inserting multiple words, empty strings, and empty arrays:
name = "Meal Rounds"
description = ""
sizes = []
cur.execute(""" INSERT INTO items (name, description, sizes) VALUES (%s, %s, %s)""" % (name, description, sizes))
Errors:
# Multi word error
psycopg2.ProgrammingError: syntax error at or near "Rounds"
LINE 1: ... (name, description, sizes) VALUES (Meal Rounds, , ...
^
# Empty string error
psycopg2.ProgrammingError: syntax error at or near ","
LINE 1: ...scription, sizes) VALUES ("Meal Rounds", , [], Fals...
^
# Empty array error
psycopg2.ProgrammingError: syntax error at or near "["
LINE 1: ...n, sizes) VALUES ("Meal Rounds", "None", [], False)...
^
I can get around the multi word error by escaping:
""" INSERT INTO items (name, description, sizes) VALUES (\"%s\", \"%s\", %s)"""
But for tables with 15+ columns, escaping each one is a pain. Does psycopg2 not handle this in an easier fashion? It will still throw errors for empty strings though.
How do I insert multiple words more efficiently, and how to insert empty strings and arrays?
Here is what psql prints out on my columns:
name | character varying(255) |
description | character varying(255) |
sizes | integer[] |
Your call to execute is creating a string with Python string substitution, which is turning out to be invalid SQL. You should be using the parameter substitution provided by the Python DB API:
https://www.python.org/dev/peps/pep-0249/#id15
To call execute using parameter substitution, you pass it two arguments. The first is the query with parameter strings which are database dependent. Psycopg2 uses "pyformat" paramstyle so your query will work as written. The second argument should be the variables you want to substitute into the query. The database driver will handle all the quoting/escaping you need. So your call to execute should be
cur.execute("""INSERT INTO items (name, description, sizes) VALUES (%s, %s, %s)""", (name, description, sizes))

Python: Iterating through MySQL columns

I'm wondering if you can help me. I'm trying to change the value in each column if the text matches a corresponding keyword. This is the loop:
for i in range(0, 20, 1):
    cur.execute("UPDATE table SET %s = 1 WHERE text rlike %s") %(column_names[i], search_terms[i])
The MySQL command works fine on its own, but not when I put it in the loop. It's giving an error at the first %s
Does anyone have any insights?
This is the error:
_mysql_exceptions.ProgrammingError: (1064, "You have an error in your SQL syntax; check the manual that corresponds to your MySQL server version for the right syntax to use near '%s = 1 WHERE text rlike %s' at line 1")
Column names look like:
column_names = ["col1","col2","col3"...]
Search terms look like
search_terms = ["'(^| |.|-)word1[;:,. ?-]'","'(^| |.|-)word2[;:,. ?-]'",...]
The right way to do this is to pass the values as parameters, so the driver quotes things correctly.
adapted from voyager's post:
for i in range(0, 20, 1):
    cur.execute("UPDATE table SET {} = 1 WHERE text rlike %s".format(column_names[i]),
                (search_terms[i],),
                )
In this case it's confusing because the column_name isn't a value, it's part of the table structure, so it's inserted using good old string formatting. The search_term is a value, so is passed to cursor.execute() for correct, safe quoting.
(Don't use string manipulation to add the quotes -- you're exposing yourself to SQL injection.)
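If the column names themselves come from anywhere you don't fully control, a minimal sketch of whitelisting them before formatting (the allowed_columns set is hypothetical):
allowed_columns = {"col1", "col2", "col3"}  # hypothetical set of real column names

for col, term in zip(column_names, search_terms):
    if col not in allowed_columns:
        raise ValueError("unexpected column name: %r" % col)
    # column name checked above, search term bound as a parameter
    cur.execute("UPDATE table SET {} = 1 WHERE text rlike %s".format(col), (term,))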
Missing quotes and wrong parenthesis placement...
for i in range(0, 20, 1):
    cur.execute("UPDATE table SET %s = 1 WHERE text rlike '%s'" %(column_names[i], search_terms[i]))
    # note the added quotes around the second %s, and the whole
    # % expression now sits inside execute's parentheses
Please note, this is not the right way of doing this if your string may itself contain quotes...
What about this instead:
for i in range(0, 20, 1):
    cur.execute("UPDATE table SET %s = 1 WHERE text rlike %%s" % (column_names[i],),
                (search_terms[i],))
This uses the % operator to set the column name, but binds the data through execute's parameter argument, letting the DB driver escape whatever characters need it.

Read a text file and transfer contents to mysql database table using python

I am new to database handling with Python.
Using Python, I want to read a raw text file consisting of STUDENT_NAME and STUDENT_MARKS values separated by pipe symbols (example given below), and push this data into a student table with 2 columns (STUDENT_NAME, STUDENT_MARKS) holding the respective values.
The input data file looks like this (it contains some thousands of records like this). My input file is a .dat file; it starts with the keyword records, each line contains 0 or more records (there is no fixed count of records per line), and no other keyword appears anywhere else:
records STUDENT_NAME| jack | STUDENT_MARKS|200| STUDENT_NAME| clark
|STUDENT_MARKS|200| STUDENT_NAME| Ajkir | STUDENT_MARKS|30|
STUDENT_NAME| Aqqm | STUDENT_MARKS|200| STUDENT_NAME| jone |
STUDENT_MARKS|200| STUDENT_NAME| jake | STUDENT_MARKS|100|
Output MySQL table:
STUDENT_NAME| STUDENT_MARKS
jack | 200
clark | 200
.......
Please advise me on how to read the file and push the data in an efficient way. I would be so grateful if someone could give me a script to achieve this.
# import mysql module
import MySQLdb
# import regular expression module
import re

# set file name & location (note we need to create a temporary file because
# the original one is messed up)
original_fyle = open('/some/directory/some/file.csv', 'r')
ready_fyle = open('/some/directory/some/ready_file.csv', 'w')

# initialize & establish connection
con = MySQLdb.connect(host="localhost", user="username", passwd="password", db="database_name")
cur = con.cursor()

# prepare your ready file
for line in original_fyle:
    # substitute useless information; this also creates some formatting for the
    # actual loading into mysql (the pipe has to be escaped, otherwise the regex
    # treats it as alternation and matches the empty string)
    line = re.sub(r'STUDENT_NAME\|', '\n', line)
    line = re.sub(r'STUDENT_MARKS\|', '', line)
    ready_fyle.write(line)

# load your ready file into the db
# close the file first so everything is flushed to disk
ready_fyle.close()
# create a query
query = 'load data local infile "/some/directory/some/ready_file.csv" into table table_name fields terminated by "|" lines terminated by "\n" '
# run it
cur.execute(query)
# commit just in case
con.commit()
In the spirit of being kind to newcomers, some code to get you started:
# assuming your data is exactly as in the original question
data = '''records STUDENT_NAME| jack | STUDENT_MARKS|200| STUDENT_NAME| clark |STUDENT_MARKS|200| STUDENT_NAME| Ajkir | STUDENT_MARKS|30| STUDENT_NAME| Aqqm | STUDENT_MARKS|200| STUDENT_NAME| jone | STUDENT_MARKS|200| STUDENT_NAME| jake | STUDENT_MARKS|100|'''
data = data.split('|')
for idx in range(1, len(data), 4):
    # every fourth item, starting at index 1, is a name; the matching mark is two items later
    name = data[idx].strip()  # need to add code to check for duplicate names
    mark = int(data[idx+2].strip())  # this will crash if not a number
    print(name, mark)  # use these values to add to the database
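To actually store those values rather than print them, a minimal sketch (using SQLite with a hypothetical student table, since the tutorial below covers SQLite; a MySQL version would differ only in the connection and the %s placeholder style):
import sqlite3

conn = sqlite3.connect('students.db')  # hypothetical database file
cur = conn.cursor()
cur.execute('CREATE TABLE IF NOT EXISTS student (STUDENT_NAME text, STUDENT_MARKS integer)')

for idx in range(1, len(data), 4):
    name = data[idx].strip()
    mark = int(data[idx + 2].strip())
    # parameterized insert: the driver handles the quoting
    cur.execute('INSERT INTO student VALUES (?, ?)', (name, mark))

conn.commit()
conn.close()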
You may want to play with SQLite using this tutorial to learn how to use such databases with Python.
And this tutorial about file input may be useful.
You may want to start with this and then come back with some code.
