I am currently able to generate a text file with the information, but for some reason I cannot send the data into a list. I have tried it two ways:
cnx = mysql.connector.connect(user='root', database='smor')
cursor = cnx.cursor()
sqlQuery = ("SELECT id,name,CAST(aa_seq as CHAR(65535)) aa_seq FROM smor.domain_tbl WHERE domain_type_id=5 AND domain_special IS NULL LIMIT 100000")
cursor.execute(sqlQuery)
print "Generating FASTA file: ", FASTA_File1
with open(FASTA_File1, "w") as FASTA1:
    for (aa_id, name, aa_seq) in cursor:
        FASTA1.write(">" + name + '\n' + aa_seq + '\n')
        print ">" + name + '\n' + aa_seq
ListOfNames = []
for (aa_id, name, aa_seq) in cursor:
    ListOfNames.append(name)
cursor.close()
print "ListOfNames", ListOfNames
This successfully prints the name and amino acid sequence into the text file, but the list is empty. Here are the last lines of the output in the console:
>NC_018581.1_05_011_001_020 P
RVPGEMYERAEDGALIPTGVRARWVDAPGSRREIVGPIARHPRIDGRRVDLDVVEEALAAVTGVTAAAVVGLPTDDGVEVGACVVLDRDDLDVPGLRRELSQTLAAHCVPTMISIVESIPLGTDGRPDHGEV
ListOfNames []
As you can see, the list remains empty. I thought that perhaps the cursor could not jump back up to the top, so I closed the cursor and reopened it exactly as above, but with the list generation in the second instance. This caused an error in the script and I do not know why.
Is it that the data cannot be read directly into a list?
In theory I can split the names of the sequences back out of the text file, but I am curious why this method is not working.
As you suspect, the cursor's result set can only be read once; after that it is 'consumed'.
Just put the results into a list first, then iterate over the list to write its contents to the file. Or do both in one loop.
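A minimal sketch of that idea, with a hard-coded `rows` list standing in for `cursor.fetchall()` (the names and sequences here are made up): the cursor can only be traversed once, but a list can be traversed as many times as you like.

```python
# 'rows' stands in for cursor.fetchall() -- buffer everything once,
# then reuse the buffered list for both the file and the name list.
rows = [(1, "seq_a", "MKV"), (2, "seq_b", "GHT")]

fasta_lines = []
names = []
for (aa_id, name, aa_seq) in rows:
    fasta_lines.append(">" + name + "\n" + aa_seq + "\n")
    names.append(name)
```

In the real script you would replace the literal list with `rows = cursor.fetchall()` right after `cursor.execute(sqlQuery)`.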
I am loading data into Oracle via Python 3 using the cx_Oracle library. Here is a code snippet:
for fl in processing_list:
    fname = fl.split('/')[-1]
    data_set = []
    data_reader = csv.reader(open(fl, 'r'), delimiter='|')
    for rec in data_reader:
        rec.insert(0, fname)
        data_set.append(rec)
    curs.executemany('insert into test_sdp_dump values(:1,:2,:3,:4,:5,:6,:7,:8,:9,:10,:11,:12,:13,:14,:15,:16,:17,:18,:19,:20,:21,:22,:23,:24,:25,:26,:27,:28,:29,:30,:31,:32,:33,:34,:35,:36,:37,:38,:39,:40,:41,:42,:43,:44,:45,:46,:47,:48,:49,:50,:51,:52,:53,:54,:55,:56,:57,:58,:59,:60,:61,:62,:63,:64,:65,:66,:67,:68,:69,:70,:71,:72,:73,:74,:75,:76,:77,:78,:79,:80,:81,:82,:83,:84,:85,:86,:87,:88,:89,:90,:91,:92,:93,:94,:95,:96,:97)', data_set, batcherrors=True)
    for error in curs.getbatcherrors():
        print('Error Message:' + error.message + 'Row Offset:' + str(error.offset))
        print(data_set[error.offset])
It works fine: the rows insert and the error messages print.
However, I need to capture the erroneous records and keep them in a file.
I tried to find each record via the row offset, but it doesn't give me the correct records.
How can I get the erroneous records? Kindly suggest a way forward.
You can create a list (err), append your error messages to it, and then open a new file in w mode to write the messages into it line by line, as in the code below:
curs.executemany('INSERT INTO test_sdp_dump VALUES(:1,:2,..)', data_set, batcherrors=True)
err = []
for error in curs.getbatcherrors():
    print(data_set[error.offset])
    err.append('Error Message: ' + error.message + ' - Row Offset: ' + str(error.offset + 1))
with open('err.txt', 'w') as f_out:
    for i in err:
        f_out.write(i + '\n')
I'm trying to display values in HTML with a "$" at the beginning, but because of the way I print the values with right-justification, the "$" ends up either at the end of the previous value or at the end of the value itself.
I'm thinking I have to somehow incorporate the "$" into the for loop, but I'm not sure how to do that.
BODY['html'] += '<br>Total shipped this month:..............Orders........Qty...........Value<br>'
SQL5 = '''
select count(*) as CNT, sum(A.USER_SHIPPED_QTY) as QTY, sum((A.USER_SHIPPED_QTY) * A.UNIT_PRICE) as VALUE
from SHIPPER_LINE A, SHIPPER B
where B.PACKLIST_ID = A.PACKLIST_ID
and A.CUST_ORDER_ID like ('CO%')
and B.SHIPPED_DATE between ('{}') and ('{}')
'''.format(RP.get_first_of_cur_month_ora(), RP.get_rep_date_ora())
## {} and .format get around the issue of using %s with CO%
print SQL5
curs.execute(SQL5)
for line in curs:  ## used to print database lines in HTML
    print line
    i = 0
    for c in line:
        if i == 0:
            BODY['html'] += '<pre>' + str(c).rjust(60, ' ')
        elif i == 1:
            BODY['html'] += str(c).rjust(15, ' ')
        else:
            BODY['html'] += str(c).rjust(22, ' ') + '</pre>'
        i += 1
The "pre" tag in HTML preserves the whitespace, and the ' ' after rjust spaces the numbers so they line up under the column headings. The printed values are generated from the database by the SQL query.
Here is what displays in HTML for this code:
Total shipped this month:..............Orders........Qty...........Value
3968 16996 1153525.96
This is what I want it to look like:
Total shipped this month:..............Orders........Qty...........Value
3968 16996 $1153525.96
You could apply the format in the DB by wrapping your sum in a to_char with a currency/numeric format model:
select to_char(12345.67, 'FML999,999.99') FROM DUAL;
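If you would rather keep the formatting on the Python side, another option is to prepend the "$" to the value before calling rjust, so the padding still lines up under the headings. A sketch using the column widths from the question (`format_totals_row` is a hypothetical helper, not part of the original script):

```python
def format_totals_row(cnt, qty, value):
    # Prepend "$" *before* rjust so the padded width still lines up
    # under the "Value" column heading.
    return ('<pre>' + str(cnt).rjust(60, ' ')
            + str(qty).rjust(15, ' ')
            + ('$' + str(value)).rjust(22, ' ') + '</pre>')
```

In the loop from the question, the `else` branch would then build its cell from `'$' + str(c)` instead of `str(c)`.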
I would like to take a gigantic string, chop it up, and insert it into an SQL table in order.
So far I have tried using regex to split up the string, extracting the values I want, and inserting them into the table like so:
conn = sqlite3.connect('PP.DB')
c = conn.cursor()
c.execute('''CREATE TABLE apps (DisplayName, DisplayVersion, Publisher, InstallDate, PSCOmputerName, RunspaceId)''')

# Split up string based on new lines
bigStringLines = re.split(r'\\r\\n', myBigString)
for line in bigStringLines:
    values = re.split(":", line)
    stmt = "INSERT INTO mytable (\"" + values[0] + "\") VALUES (\"" + values[1] + "\");"
    c.execute(stmt)
However, it looks like this inside the SQL database:
DisplayName DisplayVersion Publisher InstallDate PSComputerName RunspaceId
Installed program 1
1.2.3.123
CyberSoftware
20121115
Computer1
b37da93e9c05
Installed program 2
4.5.6.456
MicroSoftware
20160414
Computer2
b37da93e9c06
Ideally I would like it to look like this inside the database:
DisplayName DisplayVersion Publisher InstallDate PSComputerName RunspaceId
Installed program 1 1.2.3.123 CyberSoftware 20121115 Computer1 b37da93e9c05
Installed program 2 4.5.6.456 MicroSoftware 20160414 Computer2 b37da93e9c06
Here's what the main structure of the string looks like:
DisplayName : Installed program 1
DisplayVersion : 1.2.3.123
Publisher : CyberSoftware
InstallDate : 20121115
PSComputerName : Computer1
RunspaceId : 38ff5be0-da11-4664-97b1-b37da93e9c05
DisplayName : Installed program 2
DisplayVersion : 2.2.2.147
Publisher : CyberSoftware
InstallDate : 20140226
PSComputerName : Computer1
RunspaceId : 38ff5be0-da11-4664-97b1-b37da93e9c05
For a bit of extra background, this will be part of a bigger program that queries which apps are installed on a large group of computers. For testing I'm just using SQLite, but I plan to move it to MySQL in the future.
If anyone knows what I'm doing wrong or has any suggestions, I would greatly appreciate it.
You're doing an insert for every line in the text, not for every record. Only do an insert per record. If the format is consistent, fill variables and insert after filling RunspaceId or after a blank line, then clear all the variables (or use a dictionary, probably easier) and iterate to the next record. Something like:
conn = sqlite3.connect('PP.DB')
c = conn.cursor()
c.execute('''CREATE TABLE apps (DisplayName, DisplayVersion, Publisher, InstallDate, PSCOmputerName, RunspaceId)''')

# Split up string based on new lines
bigStringLines = re.split(r'\\r\\n', myBigString)
record = {}
for line in bigStringLines:
    if line.startswith("DisplayName"):
        record["DisplayName"] = re.split(":", line)[1].strip()  # or find the index of the colon and slice
    elif line.startswith("DisplayVersion"):
        record["DisplayVersion"] = re.split(":", line)[1].strip()
    # and so on for all values....
    elif line.strip() == "":  # blank line = end of record (or use RunspaceId as the trigger once populated)
        # use a parameterized query so the values are quoted/escaped correctly
        c.execute("INSERT INTO apps (DisplayName, DisplayVersion, Publisher, InstallDate, PSCOmputerName, RunspaceId) "
                  "VALUES (:DisplayName, :DisplayVersion, :Publisher, :InstallDate, :PSCOmputerName, :RunspaceId)", record)
        record = {}  # reset for next record
And PS: if this is in a text file, all of this can be accomplished without regex at all (and I recommend that). There is also no reason to read the entire file into memory if it is a local flat file.
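The regex-free parsing could be sketched like this (`parse_records` is a hypothetical helper; it assumes each record block ends at a blank line and that keys never contain a colon):

```python
def parse_records(text):
    # Split "Key : Value" lines into per-record dictionaries,
    # using str.partition instead of a regex.
    records = []
    record = {}
    for line in text.splitlines():
        if ":" in line:
            key, _, value = line.partition(":")
            record[key.strip()] = value.strip()
        elif record:  # blank line ends the current record
            records.append(record)
            record = {}
    if record:  # flush the last record if the text has no trailing blank line
        records.append(record)
    return records
```

Each dictionary returned can then be passed straight to a parameterized INSERT.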
I have an Excel workbook with a couple of sheets. Each sheet has two columns, PersonID and LegacyID. We are basically trying to update some records in the database based on PersonID. This is relatively easy to do in T-SQL, and I might even be able to get it done pretty quickly in PowerShell, but since I have been trying to learn Python, I thought I would try this in Python. I used the xlrd module and was able to print update statements. Below is my code:
import xlrd

book = xlrd.open_workbook('D:\Scripts\UpdateID01.xls')
sheet = book.sheet_by_index(0)
myList = []
for i in range(sheet.nrows):
    myList.append(sheet.row_values(i))

outFile = open('D:\Scripts\update.txt', 'wb')
for i in myList:
    outFile.write("\nUPDATE PERSON SET LegacyID = " + "'" + str(i[1]) + "'" + " WHERE personid = " + "'" + str(i[0]) + "'")
Two problems: first, when I read the output file, I see the LegacyID printed as a float. How do I get rid of the .0 at the end of each ID? Second, Python doesn't print each update statement on a new line in the output text file. How do I format it?
Edit: Please ignore the format issue; it did print on new lines when I opened the output file in Notepad++. The float issue still remains.
Can you turn the LegacyID into an int?
i[1] = int(i[1])
outFile.write("\nUPDATE PERSON SET LegacyID = " + "'" + str(i[1]) + "'" + " WHERE personid = " + "'" + str(i[0]) + "'")
Try this:
# use 'a' if you want to append to your text file
outFile = open(r'D:\Scripts\update.txt', 'a')
for i in myList:
    outFile.write("\nUPDATE PERSON SET LegacyID = '%s' WHERE personid = '%s'" % (int(i[1]), str(i[0])))
Since you are learning Python (which is very laudable!), you should start by reading about string formatting in the Python docs. That is the best place to start whenever you have a question like this.
Hint: you may want to convert the float items to integers using int().
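For example, combining int() with str.format (the values here are made up):

```python
legacy_id = 12345.0   # xlrd returns numeric cells as floats
person_id = 'P001'    # hypothetical key

# int() drops the trailing .0 before the value is formatted
stmt = "UPDATE PERSON SET LegacyID = '{0}' WHERE personid = '{1}'\n".format(
    int(legacy_id), person_id)
```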
I have searched the grep answers on here and cannot find an answer. They all seem to search for a string in a file, not a list of strings from a file. I already have a search function that works, but grep does it WAY faster. I have a list of strings in a file sn.txt (one string per line, no delimiters). I want to search another file (Merge_EXP.exp) for lines that match and write them out to a new file. The file I am searching has half a million lines, so searching for a few thousand strings in it takes hours without grep.
When I run it from the command prompt in Windows, it finishes in minutes:
grep --file=sn.txt Merge_EXP.exp > Merge_EXP_Out.exp
How can I call this same process from Python? I don't really want alternatives in Python, because I already have one that works but takes a while. Unless you think you can significantly improve its performance:
def match_SN(serialnumb, Exp_Merge, output_exp):
    fout = open(output_exp, 'a')
    f = open(Exp_Merge, 'r')
    # skip first line
    f.readline()
    for record in f:
        record = record.strip().rstrip('\n')
        if serialnumb in record:
            fout.write(record + '\n')
    f.close()
    fout.close()

def main(Output_CSV, Exp_Merge, updated_exp):
    # create a blank output
    fout = open(updated_exp, 'w')
    # copy header records
    f = open(Exp_Merge, 'r')
    header1 = f.readline()
    fout.write(header1)
    header2 = f.readline()
    fout.write(header2)
    fout.close()
    f.close()
    f_csv = open(Output_CSV, 'r')
    f_csv.readline()
    for rec in f_csv:
        rec_list = rec.split(",")
        sn = rec_list[2]
        sn = sn.strip().rstrip('\n')
        match_SN(sn, Exp_Merge, updated_exp)
Here is an optimized version in pure Python:
def main(Output_CSV, Exp_Merge, updated_exp):
    output_list = []
    # copy header records
    records = open(Exp_Merge, 'r').readlines()
    output_list = records[0:2]
    serials = open(Output_CSV, 'r').readlines()
    serials = [x.split(",")[2].strip().rstrip('\n') for x in serials]
    for s in serials:
        items = [x for x in records if s in x]
        output_list.extend(items)
    open(updated_exp, "w").write("".join(output_list))

main("sn.txt", "merge_exp.exp", "outx.txt")
Input
sn.txt:
x,y,0011
x,y,0002
merge_exp.exp:
Header1
Header2
0011abc
0011bcd
5000n
5600m
6530j
0034k
2000lg
0002gg
Output
Header1
Header2
0011abc
0011bcd
0002gg
Try this out and see how much time it takes...
When I used the full path to the grep executable, it worked (I pass it grep_loc, Serial_List, and Export):
import os

Export_Dir = os.path.dirname(Export)
Export_Name = os.path.basename(Export)
Output = Export_Dir + "\Output_" + Export_Name
print "\nOutput: " + Output + "\n"

cmd = grep_loc + " --file=" + Serial_List + " " + Export + " > " + Output
print "grep usage: \n" + cmd + "\n"
os.system(cmd)
print "Output created\n"
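As a side note, the subprocess module is generally preferred over os.system for this kind of call: passing the arguments as a list avoids shell-quoting problems with paths that contain spaces. A sketch (`run_grep` is a hypothetical wrapper around the same grep invocation):

```python
import subprocess

def run_grep(grep_exe, serial_list, export, output):
    # Pass arguments as a list (nothing for the shell to mis-quote)
    # and handle the "> Output" redirection ourselves via stdout=.
    cmd = [grep_exe, "--file=" + serial_list, export]
    with open(output, "w") as out:
        return subprocess.call(cmd, stdout=out)
```

The return value is grep's exit code (0 when at least one line matched).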
I think you have not chosen the right title for your question: what you want to do is the equivalent of a database JOIN. You can use grep for that in this particular instance, because one of your files only has keys and no other information. However, I think it is likely (though of course I don't know your case) that in the future your sn.txt may also contain extra information.
So I would solve the generic case. There are multiple solutions:
import all data into a database, then do a LEFT JOIN (in sql) or equivalent
use a python large data tool
For the latter, you could try numpy or, recommended because you are working with strings, pandas. Pandas has an optimized merge routine, which is very fast in my experience (uses cython under the hood).
Here is pandas PSEUDO code to solve your problem. It is close to real code, but I would need to know the names of the columns you want to match on. I assumed here that the one column in sn.txt is called key, and the matching column in merge_exp.exp is called sn. I also see you have two header lines in merge_exp.exp; read the docs for how to handle that.
# PSEUDO CODE (but close)
import pandas
left = pandas.read_csv('sn.txt')
right = pandas.read_csv('merge_exp.exp')
out = pandas.merge(left, right, left_on="key", right_on="sn", how='left')
out.to_csv("outx.txt")