I'm working on a Python project that retrieves images from MSSQL. My code retrieves the images successfully, but only up to a fixed size of 63 KB. If an image is larger than that, it only brings back the first 63 KB of the image!
The following is my code:
#!/usr/bin/python
import _mssql
mssql = _mssql.connect('<ServerIP>', '<UserID>', '<Password>')
mssql.select_db('<Database>')
x = 1
while x == 1:
    query = "select TOP 1 * from table;"
    if mssql.query(query):
        rows = mssql.fetch_array()
        rowNumbers = rows[0][1]
        #print "Number of rows fetched: " + str(rowNumbers)
        for row in rows:
            for i in range(rowNumbers):
                FILE = open('/home/images/' + str(row[2][i][1]) + '-' + str(row[2][i][2]).strip() + ' (' + str(row[2][i][0]) + ').jpg', 'wb')
                FILE.write(row[2][i][4])
                FILE.close()
                print 'Successfully downloaded image: ' + str(row[2][i][0]) + '\t' + str(row[2][i][2]).strip() + '\t' + str(row[2][i][1])
    else:
        print mssql.errmsg()
        print mssql.stdmsg()
mssql.close()
It's kind of hard to tell what the problem is when you're querying the database like this. Your query isn't explicitly selecting any columns, so we have no idea what your table structure is or what types the columns are. I suspect the table format is not what you're expecting, or the column type is wrong for your data.
Also, as posted your code doesn't even look like it would run: you have "for row in rows:" with nothing indented after it. Maybe post your schema?
If you're using FreeTDS (I think you are): search your freetds.conf for the 'text size' setting. By default it is 63 KB.
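For example, raising it in freetds.conf looks something like this (the value is in bytes; 4194304 is only an illustrative figure, use whatever your largest image needs):
[global]
    text size = 4194304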
I need to find elements on a page by searching for their text(), so I use an xlsx file as a database of all the texts that will be searched.
It turns out that it throws the error reported in the question title; this is my code:
search_num = str("'//a[contains(text()," + '"' + row[1] + '")' + "]'")
print(search_num)
xPathnum = self.chrome.find_element(By.XPATH, search_num)
print(xPathnum.get_attribute("id"))
print(search_num) returns: '//a[contains(text(),"0027341-66.2323.0124")]'
Does anyone know where I'm going wrong? Although there are similar posts on the forum, none of them solved my problem. Thanks for your attention.
Extra quotes end up in the XPath expression. Use Python's format() function to substitute the variable instead:
search_num ="//a[contains(text(),'{}')]".format(row[1])
Looks like you have extra quotes here
str("'//a[contains(text()," + '"' + row[1] + '")' + "]'")
Try changing to f"//a[contains(text(),'{row[1]}')]"
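A minimal sketch of the corrected lookup (assuming self.chrome is your WebDriver and row[1] holds the text from the spreadsheet, with By imported as in your existing code):
search_num = f"//a[contains(text(),'{row[1]}')]"
element = self.chrome.find_element(By.XPATH, search_num)
print(element.get_attribute("id"))
Because the XPath literal uses single quotes around the text, the extra double quotes that were breaking the expression are no longer needed.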
if year in Year:
    #print 'executing'
    for rows in range(1, sheet.nrows):
        records = []
        FIP = str(sheet.cell(rows, 1).value)
        for cols in range(9, sheet.ncols):
            records.append(str(sheet.cell(rows, cols).value))
        cur.execute("UPDATE " + str(table_name) + " SET "
                    + str(variables[0]) + " = '{0}', ".format(records[0])
                    + str(variables[1]) + " = '{0}', ".format(records[1])
                    + str(variables[2]) + " = '{0}', ".format(records[2])
                    + str(variables[3]) + " = '{0}', ".format(records[3])
                    + str(variables[4]) + " = '{0}', ".format(records[4])
                    + str(variables[5]) + " = '{0}', ".format(records[5])
                    + str(variables[6]) + " = '{0}' ".format(records[6])
                    + "WHERE DATA_Year='2010' AND FIPS='{0}'".format(FIP))
The above code updates 7 columns whose names are stored in the list 'variables'.
I want to make it dynamic, so that if the number of elements (columns) in the list 'variables' increases, it updates all of the columns and not just 7.
I tried doing that using this code:
if year in Year:
    #print 'executing'
    for rows in range(1, sheet.nrows):
        records = []
        FIP = str(sheet.cell(rows, 1).value)
        for cols in range(9, sheet.ncols):
            records.append(str(sheet.cell(rows, cols).value))
        for x in range(0, len(variables)):
            #print x
            cur.execute("UPDATE " + str(table_name) + " SET " + str(variables[x]) + " = '{0}', ".format(records[x])
                        + "WHERE DATA_Year='2010' AND FIPS='{0}'".format(FIP))
But I am getting the error:
pypyodbc.ProgrammingError: (u'42000', u"[42000] [Microsoft][ODBC SQL Server Driver][SQL Server]Incorrect syntax near the keyword 'WHERE'.")
It would be great if someone could help me figure out what's wrong with my code, and whether there is an alternative way of doing what I am trying to do.
You will find it easier to use parameter substitution. See params, and note that execute takes a sequence argument. Then that line starts to look something like:
cur.execute(sql, records)
If memory permits (it probably does), you may find executemany performs better: you call it once with an array of records.
With that in mind, the dynamic part of your question comes into focus. Construct the parameterized query string as you iterate over cols. When that's done, you should have a matched set (in effect) of parameter placeholders and elements in records. Then tack on the WHERE clause, append FIP to records, and execute.
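A rough sketch of that idea, assuming pypyodbc's qmark ('?') placeholders and that records lines up one-to-one with variables:
# build "col1 = ?, col2 = ?, ..." for however many columns are in `variables`
set_clause = ", ".join("{0} = ?".format(v) for v in variables)
sql = ("UPDATE " + str(table_name)
       + " SET " + set_clause
       + " WHERE DATA_Year = '2010' AND FIPS = ?")
cur.execute(sql, records + [FIP])
If you first collect records + [FIP] for every row, the same sql string can then be passed once to cur.executemany() with that list of parameter sequences.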
HTH.
I recently acquired a trial version of some source code to check MISRA compliance before purchasing. I have run PC-lint over the C code to verify compliance, and got an output with a huge number of violations. I wanted to nicify the generated HTML so that I can sort the violations. I tried Googling for something that already exists to do this, with little yield, so instead I began writing a Python script...
In short, the script iterates through every line of the HTML output multiple times in order to check for a particular string. Of course this takes a ridiculously long time to execute. I have been unable to find an elegant solution to this, but I'm hoping I'm missing something obvious that someone could point out... otherwise, perhaps another language would be more appropriate and would execute faster. Cheers!
#!/usr/bin/env python
import re
rule_search = re.compile("Required Rule (.*?),",re.DOTALL|re.M)
rule_search2 = re.compile("MISRA 2004 Rule (.*?)]",re.DOTALL|re.M)
line_search = re.compile("<br>(.*?)<br>",re.DOTALL|re.M)
data=open('lint-all.html').read()
unique_rules = list(set(rule_search.findall(data)))
unique_rules2 = list(set(rule_search2.findall(data)))
MISRA_Rules = unique_rules + unique_rules2
count = [0] * len(MISRA_Rules)
page_lines = {}
pages = {}
counts = open("pages/counts.html",'w')
counts.write("<h2>Violated Rules Count</h2><h3><ol>")
counts.close()
for i in range(len(MISRA_Rules)):
    pages[i] = open("pages/" + str(MISRA_Rules[i]).translate(None, '.') + ".html", 'w')
    pages[i].close()
    counts = open("pages/counts.html", 'a+')
    counts.write("<a href=" + str(MISRA_Rules[i]).translate(None, '.') + ".html>" + str(MISRA_Rules[i]) + "</a>: <font size='3'> 0 </font> ")
    if i % 4 == 0 and i != 0:
        counts.write("<br />")
counts.write("<br /><a href=sorted.html>Total:</a> " + "<font size='3'>" + str(count) + "</font>")
counts.write("</h3>")
for i in range(len(MISRA_Rules)):
    pages[i] = open("pages/" + str(MISRA_Rules[i]).translate(None, '.') + ".html", 'a+')
    pages[i].write("<h1>MISRA Rule " + str(MISRA_Rules[i]) + "</h1>")
    pages[i].write("""<link rel="import" href="counts.html">""")
    for j in range(len(line_search.findall(data))):
        if "Rule " + str(MISRA_Rules[i]) in line_search.findall(data)[j]:
            count[i] += 1
            pages[i].write("<br>")
            pages[i].write(line_search.findall(data)[j])
            pages[i].write("</br>")
print "out"
new_html = open('pages/sorted.html', 'w')
counts = """<h2>Violated Rules Count</h2><h3><ol>"""
for i in range(len(MISRA_Rules)):
    counts += """""" + str(MISRA_Rules[i]) + """: <font size="3">""" + str(count[i]) + """</font> """
    if i % 4 == 0 and i != 0:
        counts += """<br />"""
counts += """<br /><a href=sorted.html>Total:</a> """ + """<font size="3">""" + str(count) + """</font>"""
counts += """</h3>"""
new_html.write(counts)
new_html.write(data)
new_html.close()
Several approaches are possible.
First is to optimize the existing code. It's difficult to say what's wrong with it just by looking; in this case you go to the cProfile docs and set up a profiler. There you'll see the bottlenecks.
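A quick way to see where the time goes, sketched on the assumption that you wrap your current top-level code in a main() function (cProfile is in the standard library):
import cProfile
# print the hotspots, most expensive call chains first
cProfile.run('main()', sort='cumulative')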
Second approach (the most preferable in my opinion): parse the data in Python, but leave HTML generation to specialized tools, such as the jinja2 template engine, which is used extensively in web development; a minimal sketch follows below. A simpler alternative to jinja2 is mustache, which most likely won't require any installation.
Third approach is to do all this in the browser: add jQuery for DOM manipulation (introduce new tags and classes) and a CSS stylesheet (to determine how the new tags and classes should look).
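Here is that jinja2 sketch; the template string and the counts mapping are illustrative, not taken from the question:
from jinja2 import Template

template = Template("""
<h2>Violated Rules Count</h2>
<ol>
{% for rule, n in counts.items() %}
  <li><a href="{{ rule | replace('.', '') }}.html">{{ rule }}</a>: {{ n }}</li>
{% endfor %}
</ol>
""")

# hypothetical rule -> violation count mapping built while parsing the lint output
counts = {"11.3": 4, "14.7": 2}
open("pages/counts.html", "w").write(template.render(counts=counts))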
I have an Excel workbook with a couple of sheets. Each sheet has two columns, PersonID and LegacyID. We are basically trying to update some records in the database based on PersonID. This is relatively easy to do in T-SQL, and I might even be able to get it done pretty quickly in PowerShell, but since I have been trying to learn Python, I thought I would try this in Python. I used the xlrd module and was able to print update statements. Below is my code:
import xlrd

book = xlrd.open_workbook('D:\Scripts\UpdateID01.xls')
sheet = book.sheet_by_index(0)
myList = []
for i in range(sheet.nrows):
    myList.append(sheet.row_values(i))

outFile = open('D:\Scripts\update.txt', 'wb')
for i in myList:
    outFile.write("\nUPDATE PERSON SET LegacyID = " + "'" + str(i[1]) + "'" + " WHERE personid = " + "'" + str(i[0]) + "'")
Two problems. First, when I read the output file, I see the LegacyID printed as a float. How do I get rid of the .0 at the end of each ID? Second, Python doesn't print each update statement on a new line in the output text file. How do I format it?
Edit: Please ignore the format issue. It did print in new lines when I opened the output file in Notepad++. The float issue still remains.
Can you turn the LegacyID into ints?
i[1] = int(i[1])
outFile.write("\nUPDATE PERSON SET LegacyID = " + "'" + str(i[1]) + "'" + " WHERE personid = " + "'" + str(i[0]) + "'")
Try this:
# use 'a' if you want to append to your text file
outFile = open(r'D:\Scripts\update.txt', 'a')
for i in myList:
    outFile.write("\nUPDATE PERSON SET LegacyID = '%s' WHERE personid = '%s'" % (int(i[1]), str(i[0])))
Since you are learning Python (which is very laudable!), you should start by reading about string formatting in the Python docs. That is the best place to start whenever you have a question like this.
Hint: You may want to convert the float items to integers using int().
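A minimal sketch of that hint applied to the loop from the question (outFile and myList as defined there):
for i in myList:
    legacy_id = int(i[1])                 # drops the trailing .0 that xlrd gives numeric cells
    person_id = str(i[0]).strip()
    outFile.write("UPDATE PERSON SET LegacyID = '{0}' WHERE personid = '{1}'\n".format(legacy_id, person_id))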
I have searched the grep answers on here and cannot find an answer. They all seem to search for a single string in a file, not a list of strings from a file. I already have a search function that works, but grep does it WAY faster. I have a list of strings in a file sn.txt (with one string on each line, no delimiters). I want to search another file (Merge_EXP.exp) for lines that have a match and write them out to a new file. The file I am searching has half a million lines, so searching for a few thousand strings in there takes hours without grep.
When I run it from the command prompt in Windows, it does it in minutes:
grep --file=sn.txt Merge_EXP.exp > Merge_EXP_Out.exp
How can I call this same process from Python? I don't really want alternatives in Python because I already have one that works but takes a while. Unless you think you can significantly improve the performance of that:
def match_SN(serialnumb, Exp_Merge, output_exp):
    fout = open(output_exp, 'a')
    f = open(Exp_Merge, 'r')
    # skip first line
    f.readline()
    for record in f:
        record = record.strip().rstrip('\n')
        if serialnumb in record:
            fout.write(record + '\n')
    f.close()
    fout.close()

def main(Output_CSV, Exp_Merge, updated_exp):
    # create a blank output
    fout = open(updated_exp, 'w')
    # copy header records
    f = open(Exp_Merge, 'r')
    header1 = f.readline()
    fout.write(header1)
    header2 = f.readline()
    fout.write(header2)
    fout.close()
    f.close()
    f_csv = open(Output_CSV, 'r')
    f_csv.readline()
    for rec in f_csv:
        rec_list = rec.split(",")
        sn = rec_list[2]
        sn = sn.strip().rstrip('\n')
        match_SN(sn, Exp_Merge, updated_exp)
Here is an optimized version in pure Python:
def main(Output_CSV, Exp_Merge, updated_exp):
    output_list = []
    # copy header records
    records = open(Exp_Merge, 'r').readlines()
    output_list = records[0:2]
    serials = open(Output_CSV, 'r').readlines()
    serials = [x.split(",")[2].strip().rstrip('\n') for x in serials]
    for s in serials:
        items = [x for x in records if s in x]
        output_list.extend(items)
    open(updated_exp, "w").write("".join(output_list))

main("sn.txt", "merge_exp.exp", "outx.txt")
Input
sn.txt:
x,y,0011
x,y,0002
merge_exp.exp:
Header1
Header2
0011abc
0011bcd
5000n
5600m
6530j
0034k
2000lg
0002gg
Output
Header1
Header2
0011abc
0011bcd
0002gg
Try this out and see how much time it takes...
When I use the full path to the grep executable, it works (I pass it grep_loc, Serial_List, and Export):
import os
Export_Dir = os.path.dirname(Export)
Export_Name = os.path.basename(Export)
Output = Export_Dir + "\Output_" + Export_Name
print "\nOutput: " + Output + "\n"
cmd = grep_loc + " --file=" + Serial_List + " " + Export + " > " + Output
print "grep usage: \n" + cmd + "\n"
os.system(cmd)
print "Output created\n"
I think you have not chosen the right title for your question: what you want to do is the equivalent of a database JOIN. You can use grep for that in this particular instance, because one of your files only has keys and no other information. However, I think it is likely (though of course I don't know your case) that in the future your sn.txt may also contain extra information.
So I would solve the generic case. There are multiple solutions:
import all data into a database, then do a LEFT JOIN (in sql) or equivalent
use a python large data tool
For the latter, you could try numpy or, recommended because you are working with strings, pandas. Pandas has an optimized merge routine, which is very fast in my experience (it uses Cython under the hood).
Here is pandas PSEUDO code to solve your problem. It is close to real code, but I would need to know the names of the columns you want to match on. I assumed here that the one column in sn.txt is called key, and the matching column in merge_exp.exp is called sn. I also see you have two header lines in merge_exp.exp; read the read_csv docs for how to handle that.
# PSEUDO CODE (but close)
import pandas
left = pandas.read_csv('sn.txt')
right = pandas.read_csv('merge_exp.exp')
out = pandas.merge(left, right, left_on="key", right_on="sn", how='left')
out.to_csv("outx.txt")
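For the two header lines mentioned above, a hedged variant of the second read (skiprows and header are standard read_csv options; the single column name is still an assumption):
# skip the two header lines and name the lone column yourself
right = pandas.read_csv('merge_exp.exp', skiprows=2, header=None, names=['sn'])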