Python regex inserting big string full of data into SQL database


I would like to take a gigantic string, chop it up and put it into an SQL table in order.
So far I have tried using regex to split up the string, getting the values I want and trying to insert them into the table like so:
conn = sqlite3.connect('PP.DB')
c = conn.cursor()
c.execute('''CREATE TABLE apps (DisplayName, DisplayVersion, Publisher, InstallDate, PSCOmputerName, RunspaceId)''')
# Split up string based on new lines
bigStringLines = re.split(r'\r\n', myBigString)
for line in bigStringLines:
    values = re.split(":", line)
    stmt = "INSERT INTO mytable (\"" + values[0] + "\") VALUES (\"" + values[1] + "\");"
    c.execute(stmt)
However, it looks like this inside the SQL database:
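+ ------------------- + -------------- + ------------- + ----------- + -------------- + ------------ +
| DisplayName         | DisplayVersion | Publisher     | InstallDate | PSComputerName | RunspaceId   |
+ ------------------- + -------------- + ------------- + ----------- + -------------- + ------------ +
| Installed program 1 |                |               |             |                |              |
|                     | 1.2.3.123      |               |             |                |              |
|                     |                | CyberSoftware |             |                |              |
|                     |                |               | 20121115    |                |              |
|                     |                |               |             | Computer1      |              |
|                     |                |               |             |                | b37da93e9c05 |
| Installed program 2 |                |               |             |                |              |
|                     | 4.5.6.456      |               |             |                |              |
|                     |                | MicroSoftware |             |                |              |
|                     |                |               | 20160414    |                |              |
|                     |                |               |             | Computer2      |              |
|                     |                |               |             |                | b37da93e9c06 |
+ ------------------- + -------------- + ------------- + ----------- + -------------- + ------------ +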
Ideally I would like it to look like this inside the database:
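+ ------------------- + -------------- + ------------- + ----------- + -------------- + ------------ +
| DisplayName         | DisplayVersion | Publisher     | InstallDate | PSComputerName | RunspaceId   |
+ ------------------- + -------------- + ------------- + ----------- + -------------- + ------------ +
| Installed program 1 | 1.2.3.123      | CyberSoftware | 20121115    | Computer1      | b37da93e9c05 |
| Installed program 2 | 4.5.6.456      | MicroSoftware | 20160414    | Computer2      | b37da93e9c06 |
+ ------------------- + -------------- + ------------- + ----------- + -------------- + ------------ +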
Here's what the main structure of the string looks like:
DisplayName : Installed program 1
DisplayVersion : 1.2.3.123
Publisher : CyberSoftware
InstallDate : 20121115
PSComputerName : Computer1
RunspaceId : 38ff5be0-da11-4664-97b1-b37da93e9c05
DisplayName : Installed program 2
DisplayVersion : 2.2.2.147
Publisher : CyberSoftware
InstallDate : 20140226
PSComputerName : Computer1
RunspaceId : 38ff5be0-da11-4664-97b1-b37da93e9c05
Just for a bit of extra background info, this will be part of a bigger program that queries which apps are installed on a large group of computers. For testing I'm just using SQLite, but I plan to move it to MySQL in the future.
If anyone knows what I'm doing wrong or has any suggestions, I would greatly appreciate it.

You're doing an insert for every line in the text, not for every record. Only do an insert for every record. If the format is consistent, fill in the values as you go and insert once you have filled RunspaceId or reached a blank line, then clear all variables (or use a dictionary, probably easier) and move on to the next record. Something like:
import re
import sqlite3

conn = sqlite3.connect('PP.DB')
c = conn.cursor()
c.execute('''CREATE TABLE apps (DisplayName, DisplayVersion, Publisher, InstallDate, PSComputerName, RunspaceId)''')
# Split up string based on new lines
bigStringLines = re.split(r'\r\n', myBigString)
record = {}
for line in bigStringLines:
    if line.startswith("DisplayName"):
        # split on the first colon only and trim the spaces around the value
        # (or find the index of the colon and slice from there)
        record["DisplayName"] = re.split(":", line, maxsplit=1)[1].strip()
    elif line.startswith("DisplayVersion"):
        record["DisplayVersion"] = re.split(":", line, maxsplit=1)[1].strip()
    # and so on for all values....
    elif line.strip() == "":  # blank line = end of record (or use RunspaceId as the trigger once populated)
        # insert into "apps", the table created above; named placeholders let sqlite3 quote each value
        stmt = ("INSERT INTO apps (DisplayName, DisplayVersion, Publisher, InstallDate, PSComputerName, RunspaceId) "
                "VALUES (:DisplayName, :DisplayVersion, :Publisher, :InstallDate, :PSComputerName, :RunspaceId)")
        c.execute(stmt, record)
        record = {}  # reset for next record
conn.commit()
P.S. If this is in a text file, it can all be accomplished without using regex at all (and I recommend this). There is also no reason to read the entire file into memory if it is a local flat file; iterate over it line by line instead.
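A minimal sketch of the record-per-insert idea without regex, assuming every field is a `Name : value` line and that RunspaceId is always the last field of a record (the `parse_records` name, the `FIELDS` tuple and the in-memory database are illustrative, not from the original). `str.partition` splits on the first colon only, and `executemany` with named placeholders leaves the quoting to sqlite3.

```python
import sqlite3

FIELDS = ("DisplayName", "DisplayVersion", "Publisher",
          "InstallDate", "PSComputerName", "RunspaceId")


def parse_records(lines):
    """Group 'Name : value' lines into one dict per installed program."""
    record = {}
    for line in lines:
        key, sep, value = line.partition(":")  # split on the first colon only
        if not sep:
            continue  # no colon: blank or unexpected line, skip it
        key = key.strip()
        record[key] = value.strip()
        if key == "RunspaceId":  # assumed to be the last field of every record
            yield record
            record = {}


sample = """DisplayName : Installed program 1
DisplayVersion : 1.2.3.123
Publisher : CyberSoftware
InstallDate : 20121115
PSComputerName : Computer1
RunspaceId : 38ff5be0-da11-4664-97b1-b37da93e9c05
DisplayName : Installed program 2
DisplayVersion : 2.2.2.147
Publisher : CyberSoftware
InstallDate : 20140226
PSComputerName : Computer1
RunspaceId : 38ff5be0-da11-4664-97b1-b37da93e9c05"""

conn = sqlite3.connect(":memory:")
# Column names are fixed constants, so building them into the SQL text is safe here;
# the values themselves go through named placeholders.
conn.execute("CREATE TABLE apps (%s)" % ", ".join(FIELDS))
conn.executemany(
    "INSERT INTO apps VALUES (%s)" % ", ".join(":" + f for f in FIELDS),
    parse_records(sample.splitlines()),
)
conn.commit()

for row in conn.execute("SELECT DisplayName, DisplayVersion FROM apps ORDER BY rowid"):
    print(row)
```

With the question's data, `parse_records(myBigString.splitlines())` would take the place of the `re.split` step; for a file, the open file object can be passed directly instead of a list of lines.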

Related

How to put a $ before a value taken from a database through SQL in Python

I'm trying to display values in HTML with a "$" at the beginning, but because of the way I print and justify the values, I can only get the "$" added at the end of the previous value or at the end of the value itself.
I'm thinking I have to somehow incorporate the "$" into the for loop, but I'm not sure how to do that.
BODY['html'] += '<br>Total shipped this month:..............Orders........Qty...........Value<br>'
SQL5 = '''
select count(*) as CNT, sum(A.USER_SHIPPED_QTY) as QTY, sum(( A.USER_SHIPPED_QTY) * A.UNIT_PRICE) as VALUE
from SHIPPER_LINE A, SHIPPER B
where B.PACKLIST_ID = A.PACKLIST_ID
and A.CUST_ORDER_ID like ('CO%')
and B.SHIPPED_DATE between ('{}') and ('{}')
'''.format(RP.get_first_of_cur_month_ora(), RP.get_rep_date_ora())
## {} and .format get around the issue of using %s with CO%
print SQL5
curs.execute(SQL5)
for line in curs: ##used to print database lines in HTML
    print line
    i=0
    for c in line:
        if i==0:
            BODY['html'] += '<pre>' + str(c).rjust(60,' ')
        elif i == 1:
            BODY['html'] += str(c).rjust(15,' ')
        else:
            BODY['html'] += str(c).rjust(22,' ') + '</pre>'
        i+=1
The <pre> tag keeps the whitespace, and rjust(n, ' ') pads each value with spaces so the numbers line up under the column headings. The values that are printed out are generated from the database by the SQL above.
Here is what displays in HTML for this code:
Total shipped this month:..............Orders........Qty...........Value
3968 16996 1153525.96
This is what I want it to look like:
Total shipped this month:..............Orders........Qty...........Value
3968 16996 $1153525.96

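You could apply the format in the DB by wrapping your sum in to_char with a currency/numeric format model (FM removes the padding, L adds the local currency symbol). Give the model enough digit positions for your largest total; if a value doesn't fit, Oracle returns a string of # characters instead: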
select to_char(12345.67, 'FML999,999.99') FROM DUAL;
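If you would rather leave the query unchanged, the same effect can be had in the question's loop by building the "$" string before calling rjust instead of after. A minimal sketch; `money_cell` is a hypothetical helper name, and the width of 22 simply matches the third column in the question's code.

```python
def money_cell(value, width=22):
    """Prefix '$' first, then right-justify, so the sign stays next to the number."""
    return ("$" + format(value, ".2f")).rjust(width)


print(repr(money_cell(1153525.96)))
```

In the question's loop, the last branch would then become `BODY['html'] += money_cell(c) + '</pre>'`. Using `",.2f"` instead of `".2f"` would add thousands separators, if wanted.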

How to fix python hardcoded dictionary encoding issue

Error:
pymysql.err.InternalError: (1366, "Incorrect string value: '\xEF\xBF\xBD 20...' for column 'history' at row 1")
I've received a few variations of this as I've tried to tweak my dictionary. It is always the history column; the only thing that varies is which characters it reports.
I can't post the dictionary because it contains sensitive information, but here is the gist:
I started with 200 addresses (including state, zip, etc.) that needed to be validated, normalized and standardized for DB insertion.
I spent a lot of time on Google Maps validating and standardizing.
I decided to get fancy and keep all the accented letters in these addresses from around the world, Singapore to Brazil, everywhere (often copied from Google because I don't know how to type an A with an o over it, lol).
I ended up with 120 unique addresses in my dictionary after processing.
Everything works 100% perfectly when INSERTING the data into SQLite and OUTPUTTING it to a CSV. The issue is exclusively with MySQL and some sneaky invisible characters.
Note: I used this to remove the accents after 7 hours of copy/pasting to Notepad, re-encoding it with Notepad++ and trying to process the data so that it all had the correct encoding. I think I lost the version with the accents and only have this tool's output now.
I do not see "\xEF\xBF\xBD 20..." in my dictionary, only text. Currently I don't even see "20"... those two characters helped me find the previous issues.
Code I can show:
def insert_tables(cursor, assets_final, ips_final):
    #Insert Asset data into asset table
    field_names_dict = get_asset_field_names(assets_final)
    sql_field_names = ",".join(field_names_dict.keys())
    for key, row in assets_final.items():
        insert_sql = 'INSERT INTO asset(' + sql_field_names + ') VALUES ("' + '","'.join(field_value.replace('"', "'") for field_value in list(row.values())) + '")'
        print(insert_sql)
        cursor.execute(insert_sql)
    #Insert IP data into IP table
    field_names_dict = get_ip_field_names(ips_final)
    sql_field_names = ",".join(field_names_dict.keys())
    for hostname_key, ip_dict in ips_final.items():
        for ip_key, ip_row in ip_dict.items():
            insert_sql = 'INSERT INTO ip(' + sql_field_names + ') VALUES ("' + '","'.join(field_value.replace('"', "'") for field_value in list(ip_row.values())) + '")'
            print(insert_sql)
            cursor.execute(insert_sql)

def output_sqlite_db(sqlite_file, assets_final, ips_final):
    conn = sqlite3.connect(sqlite_file)
    cursor = conn.cursor()
    insert_tables(cursor, assets_final, ips_final)
    conn.commit()
    conn.close()

def output_mysql_db(assets_final, ips_final):
    conn = mysql.connect(host=config.mysql_ip, port=config.mysql_port, user=config.mysql_user, password=config.mysql_password, charset="utf8mb4", use_unicode=True)
    cursor = conn.cursor()
    cursor.execute('USE ' + config.mysql_DB)
    insert_tables(cursor, assets_final, ips_final)
    conn.commit()
    conn.close()
EDIT: Could this have something to do with the fact I'm using Cygwin as my terminal? HA! I added this line and got a different message (now using the accented version again):
cursor.execute('SET NAMES utf8')
Error:
pymysql.err.InternalError: (1366, "Incorrect string value: '\xC5\x81A II...' for column 'history' at row 1")

I can shine a bit of light on the messages that you have supplied:
Case 1:
>>> import unicodedata as ucd
>>> s1 = b"\xEF\xBF\xBD"
>>> s1
b'\xef\xbf\xbd'
>>> u1 = s1.decode('utf8')
>>> u1
'\ufffd'
>>> ucd.name(u1)
'REPLACEMENT CHARACTER'
>>>
It looks like you obtained some bytes encoded in an encoding other than UTF-8 (e.g. cp1252) and then tried bytes.decode(encoding='utf8', errors='strict'). This detected some errors. You then decoded again with errors="replace". This raised no exceptions; however, the offending bytes in your data were replaced by the replacement character (U+FFFD). Then you encoded your data using str.encode so that you could write it to a file or database. Each replacement character turns up as the 3 bytes EF BF BD.
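The sequence described above can be reproduced in a few lines. A minimal sketch; the "São Paulo" input is a hypothetical example of cp1252 bytes being mistaken for UTF-8.

```python
# Hypothetical input: bytes that are really cp1252 ("São Paulo"), not UTF-8.
raw = "São Paulo".encode("cp1252")

try:
    raw.decode("utf8")  # errors="strict" is the default, so this raises
except UnicodeDecodeError as exc:
    print("strict decode failed:", exc)

text = raw.decode("utf8", errors="replace")  # the bad byte becomes U+FFFD
print(text == "S\ufffdo Paulo")              # True
print(text.encode("utf8"))                   # b'S\xef\xbf\xbdo Paulo'
```

The single bad byte 0xE3 ends up in the database as the three bytes EF BF BD, which matches the first error message.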
... more to come
Case 2:
>>> s2 = b"\xC5\x81A II"
>>> s2
b'\xc5\x81A II'
>>> u2 = s2.decode('utf8')
>>> u2
'\u0141A II'
>>> ucd.name(u2[0])
'LATIN CAPITAL LETTER L WITH STROKE'
>>>
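To find where such characters hide in values that look like plain text, one could scan every string before inserting it. A minimal sketch using the same unicodedata module as the sessions above; `non_ascii_report` and the sample value are hypothetical, and the helper only reports characters, it does not decide which column charset is at fault.

```python
import unicodedata


def non_ascii_report(value):
    """List (index, character, Unicode name) for every non-ASCII character."""
    return [(i, ch, unicodedata.name(ch, "<unnamed>"))
            for i, ch in enumerate(value) if ord(ch) > 127]


# Hypothetical history value containing the characters from both error messages.
sample = "\u0141A II \ufffd 20"
for hit in non_ascii_report(sample):
    print(hit)
```

Running it over each value of the dictionary (e.g. the history field) before the MySQL insert should point to the exact rows that trigger error 1366.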

Python: Iterating through MySQL columns

I'm wondering if you can help me. I'm trying to change the value in each column if the text matches a corresponding keyword. This is the loop:
for i in range(0, 20, 1):
    cur.execute("UPDATE table SET %s = 1 WHERE text rlike %s") %(column_names[i], search_terms[i])
The MySQL command works fine on its own, but not when I put it in the loop. It gives an error at the first %s.
Does anyone have any insights?
This is the error:
_mysql_exceptions.ProgrammingError: (1064, "You have an error in your SQL syntax; check the manual that corresponds to your MySQL server version for the right syntax to use near '%s = 1 WHERE text rlike %s' at line 1")
Column names look like:
column_names = ["col1","col2","col3"...]
Search terms look like:
search_terms = ["'(^| |.|-)word1[;:,. ?-]'","'(^| |.|-)word2[;:,. ?-]'",...]

The right way to do this is to pass the values to cursor.execute() as parameters, so the database driver quotes them correctly.
adapted from voyager's post:
for i in range(0, 20, 1):
    cur.execute("UPDATE table SET {} = 1 WHERE text rlike %s".format(column_names[i]),
                (search_terms[i],),
                )
In this case it's confusing because the column_name isn't a value, it's part of the table structure, so it's inserted using good old string formatting. The search_term is a value, so is passed to cursor.execute() for correct, safe quoting.
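(Don't use string manipulation to add the quotes -- you're exposing yourself to SQL injection. For the same reason, the search terms themselves should not carry surrounding single quotes: pass '(^| |.|-)word1[;:,. ?-]' rather than "'(^| |.|-)word1[;:,. ?-]'", otherwise the quotes become part of the pattern.)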
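Because the column name is spliced in with string formatting, it is worth making sure only known column names ever get there. A minimal sketch, assuming a whitelist of allowed columns; `ALLOWED_COLUMNS`, `build_update` and the table name `mytable` are hypothetical (the question's `table` is itself a reserved word in MySQL).

```python
ALLOWED_COLUMNS = {"col1", "col2", "col3"}  # hypothetical stand-in for column_names


def build_update(column, pattern):
    """Return (sql, params) for one keyword column.

    A column name is part of the statement's structure and cannot be bound
    as a parameter, so it is checked against a whitelist and backtick-quoted.
    The regex pattern is a value, so it stays a %s placeholder for the driver.
    """
    if column not in ALLOWED_COLUMNS:
        raise ValueError("unexpected column name: {!r}".format(column))
    sql = "UPDATE mytable SET `{}` = 1 WHERE text RLIKE %s".format(column)
    return sql, (pattern,)


sql, params = build_update("col1", "(^| |.|-)word1[;:,. ?-]")
print(sql)     # UPDATE mytable SET `col1` = 1 WHERE text RLIKE %s
print(params)
```

Inside the loop this would be used as `cur.execute(*build_update(column_names[i], search_terms[i]))`.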

Missing quotes and wrong parenthesis placement...
for i in range(0, 20, 1):
    cur.execute("UPDATE table SET %s = 1 WHERE text rlike '%s'" %(column_names[i], search_terms[i]))
    # quotes added around the second %s; the % formatting now happens inside execute(...)
Please note that this is not the right way to do it if your strings may themselves contain quotes...
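What about this instead:
for i in range(0, 20, 1):
    cur.execute("UPDATE table SET %s = 1 WHERE text rlike %%s" % (column_names[i],),
                (search_terms[i],))
This uses the % operator to set the column name, but passes the data as execute()'s second argument, letting the DB driver escape every character that needs it. MySQLdb (the driver behind the _mysql_exceptions error above) uses %s rather than ? as its placeholder, so the placeholder is written %%s: the % operator turns it into a literal %s for the driver to fill in.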

Automatically Load SQL table by reading data from text file

I am trying to write a Python script that loads the tables I created in Python using SQL and populates them automatically with data from a text file. I am stuck on basic coding: I have a general idea, but I get errors when I try to run this approach. I have created 2 tables and read the file. The file is a comma-separated text file with no headers.
The first 3 lines of the file look like this:
+ ---- + ----- + -------------------- + -------- + - + --- + ----- +
| John | Smith | 111 N. Wabash Avenue | plumber | 5 | 1.0 | 200 |
| John | Smith | 111 N. Wabash Avenue | bouncer | 5 | 1.0 | 200 |
| Jane | Doe | 243 S. Wabash Avenue | waitress | 1 | 5.0 | 10000 |
+ ---- + ----- + -------------------- + -------- + - + --- + ----- +
import sqlite3
conn= sqlite3.connect('csc455.db')
c = conn.cursor()
#Reading the data file
fd = open ('C:/Users/nasia/Documents/data_hw2.txt','r')
data = fd.readlines()
#Creating Tables
L = """create table L
    (first text, last text, address text, job text, LNum integer,
    constraint L_pk
    primary key(first, last, address, job),
    constraint L_fk
    foreign key (LNum) references LN(LNum)
    );"""
c.execute(L)
LN = """create table LN
    (
    LNum integer, Interest float, Amount, Integer,
    constraint LN_pk
    primary key (LNum)
    );"""
c.execute(LN)
#Inserting into database
for elt in data:
    currentRow = elt.split(", ")[:-1]
    insert = """(insert into LN values (%s, %s, %s);, %(currentRow[4], currentRow[5], currentRow[6]))"""
    c.execute(insert)
There is a syntax error somewhere in here, and the code stops working. I cannot figure out what I am doing wrong.
The error is:
Traceback (most recent call last):
  File "<stdin>", line 4, in <module>
OperationalError: near "(": syntax error

You haven't explained what format the data are in, or what your table structure is, or how you want to map them, which makes this difficult to answer. But I'll make up my own, and answer that, and hopefully it will help:
infile.txt:
CommonName,Species,Location,Color
Black-headed spider monkey,Ateles fusciceps,Ecuador,black
Central American squirrel monkey,Saimiri oerstedii,Costa Rica,orange
Vervet,Chlorocebus pygerythrus,South Africa,white
script.py:
import csv
import sqlite3
db = sqlite3.connect('outfile.db')
cursor = db.cursor()
cursor.execute('CREATE TABLE Monkeys (Common Name, Color, Species)')
cursor.execute('''CREATE TABLE MonkeyLocations (Species, Location,
                  FOREIGN KEY(Species) REFERENCES Monkeys(Species))''')
with open('infile.txt') as f:
    for row in csv.DictReader(f):
        cursor.execute('''INSERT INTO Monkeys
                          VALUES (:CommonName, :Color, :Species)''', row)
        cursor.execute('''INSERT INTO MonkeyLocations
                          VALUES (:Species, :Location)''', row)
db.commit()
db.close()
Of course if your real data are in some other format than CSV, you'll use different code to parse the input file.
I've also made things slightly more complex than your real data might have to deal with—the CSV columns don't have quite the same names as the SQL columns.
In other ways, your data might be more complex—e.g., if your schema has foreign keys that reference an auto-incremented row ID instead of a text field, you'll need to get the rowid after the first insert.
But this should be enough to give you the idea.
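For the rowid case mentioned above, sqlite3's `cursor.lastrowid` gives the id of the row just inserted. A minimal sketch with a hypothetical integer-keyed variant of the monkey schema (the `id` and `MonkeyId` columns are illustrative, not part of the answer's schema):

```python
import sqlite3

db = sqlite3.connect(":memory:")
cur = db.cursor()
# Hypothetical schema: MonkeyLocations points at Monkeys by integer row id.
cur.execute("CREATE TABLE Monkeys (id INTEGER PRIMARY KEY, CommonName, Species)")
cur.execute("""CREATE TABLE MonkeyLocations (MonkeyId INTEGER, Location,
               FOREIGN KEY(MonkeyId) REFERENCES Monkeys(id))""")

row = {"CommonName": "Vervet", "Species": "Chlorocebus pygerythrus",
       "Location": "South Africa"}
cur.execute("INSERT INTO Monkeys (CommonName, Species) VALUES (:CommonName, :Species)", row)
monkey_id = cur.lastrowid  # id SQLite just assigned to the new Monkeys row
cur.execute("INSERT INTO MonkeyLocations VALUES (?, ?)", (monkey_id, row["Location"]))
db.commit()

print(cur.execute("""SELECT m.CommonName, l.Location FROM Monkeys m
                     JOIN MonkeyLocations l ON l.MonkeyId = m.id""").fetchall())
```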
Now that you've shown more details… you were on the right track (although it's wasteful to call readlines instead of just iterating over fd directly, and you should close your db and file, ideally with a with statement, …), but you've got a simple mistake right near the end that prevents you from getting any farther:
insert = """(insert into LN values (%s, %s, %s);, %(currentRow[4], currentRow[5], currentRow[6]))"""
c.execute(insert)
You've put the formatting % expression directly into the string, instead of using the operator on the string. I think what you were trying to do is:
insert = """insert into LN values (%s, %s, %s);""" % (currentRow[4], currentRow[5], currentRow[6])
c.execute(insert)
However, you shouldn't do that. Instead, do this:
insert = """insert into LN values (?, ?, ?);"""
c.execute(insert, (currentRow[4], currentRow[5], currentRow[6]))
What's the difference?
Well, the first one just inserts the values into the statement as Python strings. That means you have to take care of converting to the proper format, quoting, escaping, etc. yourself, instead of letting the database engine decide how to deal with each value. Besides being a source of frustrating bugs when you try to save a boolean value or forget to quote a string, this also leaves you open to SQL injection attacks unless you're very careful.
There are other problems besides that one. For example, most databases will try to cache repeated statements, and it's trivial to tell that 3000 instances of insert into LN values (?, ?, ?) are all the same statement, but less so to tell that insert into LN values (5, 1.0, 200) and insert into LN values (1, 5.0, 5000) are the same statement.
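Putting the parameterized insert together with the other suggestions (iterate over the file directly, let the database quote the values), a loader might look like the sketch below. It makes assumptions that go beyond the question: the schema uses `Amount integer` (the question's `Amount, Integer` would most likely declare a fourth column named Integer, so a three-value insert into LN would fail), all seven fields are unpacked (the question's `[:-1]` slice drops the last field, so `currentRow[6]` would likely raise IndexError), and because the same LNum appears on two lines of the sample, LN rows are added with SQLite's INSERT OR IGNORE. The `load` name and the in-memory database are illustrative.

```python
import csv
import io
import sqlite3

# Three sample lines in the shape described in the question (comma-separated, no header).
SAMPLE = """John, Smith, 111 N. Wabash Avenue, plumber, 5, 1.0, 200
John, Smith, 111 N. Wabash Avenue, bouncer, 5, 1.0, 200
Jane, Doe, 243 S. Wabash Avenue, waitress, 1, 5.0, 10000
"""


def load(conn, lines):
    """Insert every line into LN and L using bound parameters."""
    c = conn.cursor()
    # skipinitialspace drops the blank after each comma
    for row in csv.reader(lines, skipinitialspace=True):
        first, last, address, job, lnum, interest, amount = row
        # The same LNum can repeat across lines; keep the first LN row instead of failing.
        c.execute("INSERT OR IGNORE INTO LN VALUES (?, ?, ?)",
                  (int(lnum), float(interest), int(amount)))
        c.execute("INSERT INTO L VALUES (?, ?, ?, ?, ?)",
                  (first, last, address, job, int(lnum)))
    conn.commit()


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE LN (LNum integer PRIMARY KEY, Interest float, Amount integer)")
# "first" and "last" are quoted defensively, since newer SQLite versions treat them as keywords.
conn.execute("""CREATE TABLE L ("first" text, "last" text, address text, job text,
                LNum integer,
                PRIMARY KEY ("first", "last", address, job),
                FOREIGN KEY (LNum) REFERENCES LN(LNum))""")
load(conn, io.StringIO(SAMPLE))
print(conn.execute("SELECT COUNT(*) FROM L").fetchone(),
      conn.execute("SELECT COUNT(*) FROM LN").fetchone())
```

With the real file this would be `with open('C:/Users/nasia/Documents/data_hw2.txt', newline='') as fd: load(conn, fd)`, which also closes the file when done.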

If you can use the standard sqlite3 command-line utility, you can do this much more easily:
sqlite3 -init mydata.sql mydatabase.db ""
Simply call this line from your Python script, and you're done.
This will read any text file that contains valid SQL statements, and will create mydatabase.db if it does not exist. More importantly, it supports statements spanning more than one line, and also properly ignores SQL comments in both --comment syntax and C/C++-like /*comment*/ syntax.
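From Python, that call could go through subprocess with the arguments passed as a list, so the empty last argument needs no shell quoting. A minimal sketch; `sqlite_init_cmd` and `run_sql_file` are hypothetical helper names, and `check=True` only reacts to a non-zero exit status of the tool, so it is not a guarantee that every statement in the file succeeded.

```python
import shutil
import subprocess


def sqlite_init_cmd(sql_file, db_file):
    """Build the command line shown above: run sql_file, then an empty command, then exit."""
    return ["sqlite3", "-init", sql_file, db_file, ""]


def run_sql_file(sql_file, db_file):
    if shutil.which("sqlite3") is None:
        raise RuntimeError("sqlite3 command-line tool not found on PATH")
    # A list of arguments avoids shell quoting; check=True raises on a non-zero exit status.
    subprocess.run(sqlite_init_cmd(sql_file, db_file), check=True)


print(sqlite_init_cmd("mydata.sql", "mydatabase.db"))
```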
Typically your mydata.sql content should look like this:
BEGIN TRANSACTION;
CREATE TABLE IF NOT EXISTS table1 (
    id INTEGER PRIMARY KEY AUTOINCREMENT,
    name VARCHAR(32)
);
INSERT INTO table1 (name) VALUES
('John'),
('Jack'),
('Jill');
-- more statements ...
COMMIT;
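If the sqlite3 command-line tool is not available, Python's own sqlite3 module can run the same kind of script through `Connection.executescript`, which also handles multi-line statements and comments. A minimal sketch with the script inlined as a string and an in-memory database; note that SQLite spells the keyword AUTOINCREMENT. For a real file it would be `with open("mydata.sql") as f: conn.executescript(f.read())`.

```python
import sqlite3

SCRIPT = """
BEGIN TRANSACTION;
CREATE TABLE IF NOT EXISTS table1 (
    id INTEGER PRIMARY KEY AUTOINCREMENT,
    name VARCHAR(32)
);
INSERT INTO table1 (name) VALUES
    ('John'),
    ('Jack'),
    ('Jill');
-- more statements ...
COMMIT;
"""

conn = sqlite3.connect(":memory:")
conn.executescript(SCRIPT)  # runs every statement in order
print(conn.execute("SELECT id, name FROM table1 ORDER BY id").fetchall())
```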

Generating a list of names in python from mysql

I am currently able to generate a text file with the information, but for some reason I cannot get the data into a list. I have tried it 2 ways:
cnx = mysql.connector.connect(user='root', database='smor')
cursor = cnx.cursor()
sqlQuery = ("SELECT id,name,CAST(aa_seq as CHAR(65535)) aa_seq FROM smor.domain_tbl WHERE domain_type_id=5 AND domain_special IS NULL LIMIT 100000")
cursor.execute(sqlQuery)
print "Generating FASTA file: ", FASTA_File1
with open(FASTA_File1, "w") as FASTA1:
    for (aa_id, name, aa_seq) in cursor:
        FASTA1.write(">" + name + '\n' + aa_seq + '\n')
        print ">" + name + '\n' + aa_seq
ListOfNames =[]
for (aa_id, name, aa_seq) in cursor:
    ListOfNames.append(name)
cursor.close()
print "ListOfNames", ListOfNames
This successfully writes the name and amino acid sequence to the text file, but the list is empty. Here are the last lines of the console output:
>NC_018581.1_05_011_001_020 P
RVPGEMYERAEDGALIPTGVRARWVDAPGSRREIVGPIARHPRIDGRRVDLDVVEEALAAVTGVTAAAVVGLPTDDGVEVGACVVLDRDDLDVPGLRRELSQTLAAHCVPTMISIVESIPLGTDGRPDHGEV
ListOfNames []
As you can see, the list remains empty. I thought that perhaps the cursor could not jump back up to the top, so I closed the cursor and reopened it exactly as above, but with the list generation in the second instance. This caused an error in the script and I do not know why.
Is it that the data cannot be read directly into a list?
Theoretically I could split the names of the sequences back out of the text file, but I am curious why this method is not working.

As you suspect, the cursor's result set can be read once, after which it is 'consumed'.
Just put the result into a list first, then iterate over the list to write its contents to the file. Or do both in one loop.
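A minimal sketch of the "read once into a list" approach. It uses an in-memory sqlite3 database as a stand-in for the MySQL table (assumption: mysql.connector's default cursor is likewise exhausted after one pass), and `write_fasta_and_collect_names` plus the sample rows are hypothetical.

```python
import io
import sqlite3


def write_fasta_and_collect_names(cursor, out):
    """Fetch the result set once, then reuse the Python list as often as needed."""
    rows = cursor.fetchall()  # the cursor is consumed here; `rows` can be re-iterated
    for aa_id, name, aa_seq in rows:
        out.write(">" + name + "\n" + aa_seq + "\n")
    return [name for _aa_id, name, _aa_seq in rows]


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE domain_tbl (id INTEGER, name TEXT, aa_seq TEXT)")
conn.executemany("INSERT INTO domain_tbl VALUES (?, ?, ?)",
                 [(1, "seqA", "MKV"), (2, "seqB", "RVP")])
cur = conn.execute("SELECT id, name, aa_seq FROM domain_tbl ORDER BY id")
buf = io.StringIO()
names = write_fasta_and_collect_names(cur, buf)
print(names)           # ['seqA', 'seqB']
print(cur.fetchall())  # [] : a second pass over the same cursor finds nothing
```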
