Comparing SQL results with Python - python

I was looking for a way for my program to take two values from my database, compare them, and update or insert a value depending on the result.
I tried doing it in sqlite3 but didn't find a good solution, so I tried it in Python, but when I run the program the values are different and never match. I've already searched Google and here on Stack Overflow, but didn't find anything.
cursor.execute("select * from [Sistema]")
Teste = cursor.fetchall()
cursor.execute("select [SistemaOperacional] from [Sistema] where [SistemaOperacional] = 'teste'")
comparacao = cursor.fetchall()
testando = comparacao
for row in Teste:
#I was checking the values to see if they where equal
print(comparação[0]) #Value ('teste',)
print(row[0]) #Value 'teste'
if row[0] == comparação[0]:
cursor.execute("Update [Sistema] set [SistemaOperacional] = '1' where [VersaoBanco] = 3")
print('Executado')
break
else:
cursor.execute("insert into [Sistema] values ('9','9','1')")
print('não')
break
The output was:
('teste',)
teste
não
I couldn't find a solution for this in SQL, which is why I tried Python, but I'm open to suggestions other than Python.

It's a pretty common problem for people unfamiliar with Python's database APIs to run into trouble with the tuples that come back from a query. comparacao[0] is a tuple. Tuples are compared lexicographically, element by element, so you'd have to retrieve the first element (like Gwyn Evans said), e.g. comparacao[0][0], in order to compare it to your string row[0].
Here is a link to Python docs about comparisons: https://docs.python.org/2.0/ref/comparisons.html

I could be on the wrong track here, but isn't your comparacao[0] itself a tuple? You probably want to be comparing comparacao[0][0] with row[0].
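For illustration, here is a minimal sketch of that fix against the query from the question (it assumes the same cursor, Teste, and comparacao as above):

cursor.execute("select [SistemaOperacional] from [Sistema] where [SistemaOperacional] = 'teste'")
comparacao = cursor.fetchall()   # e.g. [('teste',)] - a list of 1-tuples
for row in Teste:
    if row[0] == comparacao[0][0]:   # index twice: first row, then first column
        cursor.execute("Update [Sistema] set [SistemaOperacional] = '1' where [VersaoBanco] = 3")
        print('Executado')
        break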

Related

Forcing the for loop to start with the same row

I want to force the for loop to always start with the same row, so that the order of all the rows of the dataset stays the same.
In other words, when I search for the index of the row with id=15, I always get a different result.
Here is my code:
import psycopg2 as p

conn = p.connect("dbname=Chicago user=postgres password=admin host=localhost")
cur = conn.cursor()
cur.execute("select * from chicago_2po_4pgr")
nbrows = cur.rowcount
rows = cur.fetchall()
for r in range(0, nbrows):
    id = rows[r][0]
    if id == 15:
        print(r, rows[r][0], rows[r][1])
The result of the first run is:
56153 15 4271616
(r is 56153)
The result of the second run of the same code is:
126523 15 4271616
(r is 126523)
Any suggestion on how I can edit my code so the rows always come back in the same order?
Add an ORDER BY clause. SQL queries without ORDER BY can return results in any arbitrary order.
If the data being queried can be changed (insert, update or delete) then the record at position 15 can change. You could query a specific key value, or grab the result set and index it by a key, to get a consistent result.
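For example, a minimal sketch of both suggestions, assuming the first column of chicago_2po_4pgr is named id (adjust to the real column name):

# stable order: sort explicitly on the key column
cur.execute("select * from chicago_2po_4pgr order by id")

# or query the specific key directly instead of scanning every row
cur.execute("select * from chicago_2po_4pgr where id = %s", (15,))
row = cur.fetchone()
if row is not None:
    print(row[0], row[1])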

Cannot validate query results

query = "SELECT serialno from registeredpcs where ipaddress = "
usercheck = query + "'%s'" %fromIP
#print("query"+"-"+usercheck)
print(usercheck)
rs = cursor.execute(usercheck)
print(rs)
row = rs
#print(row)
#rs = cursor.rowcount()
if int(row) == 1:
query = "SELECT report1 from registeredpcs where serialno = "
firstreport = query + "'%s'" %rs
result = cursor.execute(firstreport)
print(result)
elif int(row) == 0:
query_new = "SELECT * from registeredpcs"
cursor.execute(query_new)
newrow = cursor.rowcount()+1
print(new row)
What I am trying to do here is fetch the serialno values from the db when they match a certain ipaddress. This query is working fine. As it should be, the query result set rs is 0. Now I am trying to use that value and do something else in the if/else construct. Basically I am trying to check for unique values in the db based on the ipaddress value. But I am getting this error:
error: uncaptured python exception, closing channel smtpd.SMTPChannel connected
192.168.1.2:3630 at 0x2e47c10 (**class 'TypeError':'int' object is not
callable** [C:\Python34\lib\asyncore.py|read|83]
[C:\Python34\lib\asyncore.py|handle_read_event|442]
[C:\Python34\lib\asynchat.py|handle_read|171]
[C:\Python34\lib\smtpd.py|found_terminator|342] [C:/Users/Dev-
P/PycharmProjects/CR Server Local/LRS|process_message|43])
I know I am making some very basic mistake. I think it's the part in bold that's causing the error, but I just can't put my finger on it. I tried using the rowcount() method; that didn't help.
rowcount is an attribute, not a method; you shouldn't call it.
"I know I am making some very basic mistake" : well, Daniel Roseman alreay adressed the cause of your main error, but there are a couple other mistakes in your code:
query = "SELECT serialno from registeredpcs where ipaddress = "
usercheck = query + "'%s'" % fromIP
rs = cursor.execute(usercheck)
This part is hard to read (you're using both string concatenation and string formatting for no good reason), brittle (try it with fromIP = "'foo'"), and very, very unsafe. You want to use parameterized queries instead, i.e.:
# nb check your exact db-api module for the correct placeholder,
# MySQLdb uses '%s' but some other use '?' instead
query = "SELECT serialno from registeredpcs where ipaddress=%s"
params = [fromIP,]
rs = cursor.execute(query, params)
"As it should the query result set rs is 0"
This is actually plain wrong. cursor.execute() returns the number of rows affected (selected, created, updated, deleted) by the query. The "resultset" is really the cursor itself. You can fetch results using cursor.fetchone(), cursor.fetchall(), or more simply (and more efficiently if you want to work on the whole resultset with constant memory use) by iterating over the cursor, i.e.:
for row in cursor:
    print(row)
Let's continue with your code:
row = rs
if int(row) == 1:
    # ...
elif int(row) == 0:
    # ...
The first line is useless - it only makes row an alias of rs - and badly named: it's not a "row" (one line of results from your query), it's an int. Since it's already an int, converting it to int is also useless. And finally, unless ipaddress is a unique key in your table, your query might return more than one row.
If what you want is the effective value(s) for the serialno field for records matching fromIP, you have to fetch the row(s):
row = cursor.fetchone() # first row, as a tuple
then get the value, which in this case will be the first item in row:
serialno = row[0]
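Putting the pieces together, a minimal sketch of the corrected flow might look like this (placeholder syntax assumes MySQLdb; the table and column names are taken from the question):

query = "SELECT serialno FROM registeredpcs WHERE ipaddress = %s"
cursor.execute(query, (fromIP,))
row = cursor.fetchone()   # None if there is no match, else a tuple like ('ABC123',)
if row is not None:
    serialno = row[0]
    cursor.execute("SELECT report1 FROM registeredpcs WHERE serialno = %s", (serialno,))
    print(cursor.fetchone())
else:
    # no PC registered for this IP yet
    print("no match for", fromIP)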

How can I make my code more efficient?

I have a list of tuples that contains a tool_id, a time, and a message. I want to select from this list all the elements where the message matches some string, and all the other elements where the time is within some diff of any matching message for that tool.
Here is how I am currently doing this:
# record time for each message matching the specified message for each tool
messageTimes = {}
for row in cdata:   # tool, time, message
    if self.message in row[2]:
        messageTimes[row[0], row[1]] = 1

# now pull out each message that is within the time diff for each matched message
# as well as the matched messages themselves
def determine(tup):
    if self.message in tup[2]: return True   # matched message
    for (tool, date_time) in messageTimes:
        if tool == tup[0]:
            if abs(date_time - tup[1]) <= tdiff:
                return True
    return False

cdata[:] = [tup for tup in cdata if determine(tup)]
This code works, but it takes way too long to run - e.g. when cdata has 600,000 elements (which is typical for my app) it takes 2 hours for this to run.
This data came from a database. Originally I was getting just the data I wanted using SQL, but that was taking too long also. I was selecting just the messages I wanted, then for each one of those doing another query to get the data within the time diff of each. That was resulting in tens of thousands of queries. So I changed it to pull all the potential matches at once and then process it in python, thinking that would be faster. Maybe I was wrong.
Can anyone give me some suggestions on speeding this up?
Updating my post to show what I did in SQL as was suggested.
What I did in SQL was pretty straightforward. The first query was something like:
SELECT tool, date_time, message
FROM event_log
WHERE message LIKE '%foo%'
AND other selection criteria
That was fast enough, but it may return 20 or 30 thousand rows. So then I looped through the result set, and for each row ran a query like this (where dt and t are the date_time and tool from a row from the above select):
SELECT date_time, message
FROM event_log
WHERE tool = t
AND ABS(TIMESTAMPDIFF(SECOND, date_time, dt)) <= timediff
That was taking about an hour.
I also tried doing in one nested query where the inner query selected the rows from my first query, and the outer query selected the time diff rows. That took even longer.
So now I am selecting without the message LIKE '%foo%' clause and I am getting back 600,000 rows and trying to pull out the rows I want from python.
The way to optimize the SQL is to do it all in one query, instead of iterating over 20K rows and doing another query for each one.
Usually this means you need to add a JOIN, or occasionally a sub-query. And yes, you can JOIN a table to itself, as long as you rename one or both copies. So, something like this:
SELECT el2.date_time, el2.message
FROM event_log AS el1 JOIN event_log AS el2
WHERE el1.message LIKE '%foo%'
AND other selection criteria
AND el2.tool = el1.tool
AND ABS(TIMESTAMPDIFF(SECOND, el2.date_time, el1.date_time)) <= timediff
Now, this probably won't be fast enough out of the box, so there are two steps to improve it.
First, look for any columns that obviously need to be indexed. Clearly tool and date_time need simple indices. message may benefit from either a simple index or, if your database has something fancier (a full-text index, say), the fancier option; but given that the initial query was fast enough, you probably don't need to worry about it.
Occasionally, that's sufficient. But usually, you can't guess everything correctly. And there may also be a need to rearrange the order of the queries, etc. So you're going to want to EXPLAIN the query, and look through the steps the DB engine is taking, and see where it's doing a slow iterative lookup when it could be doing a fast index lookup, or where it's iterating over a large collection before a small collection.
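As a rough illustration of both steps (the table and column names follow the question; whether these particular indexes actually help depends on your data and engine):

cursor.execute("CREATE INDEX idx_event_log_tool ON event_log (tool)")
cursor.execute("CREATE INDEX idx_event_log_date_time ON event_log (date_time)")

# EXPLAIN shows which indexes (if any) the engine plans to use for the self-join
cursor.execute("EXPLAIN " + self_join_query)   # self_join_query holds the SQL shown above
for plan_row in cursor:
    print(plan_row)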
For tabular data, you can't go past the Python pandas library, which contains highly optimised code for queries like this.
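As a rough sketch of that idea (not the original poster's code), assuming the rows are (tool, time, message) with a numeric time such as a Unix timestamp, and with message_text and tdiff standing in for self.message and the time window:

import pandas as pd

df = pd.DataFrame(cdata, columns=["tool", "time", "message"])
matches = df[df["message"].str.contains(message_text, regex=False)]

keep = df.index.isin(matches.index)              # the matched messages themselves
for tool, grp in matches.groupby("tool"):
    match_times = grp["time"].to_numpy()
    same_tool = (df["tool"] == tool).to_numpy()
    # distance from each row's time to the nearest matched time for this tool
    near = df["time"].apply(lambda t: abs(match_times - t).min() <= tdiff).to_numpy()
    keep |= same_tool & near

result = df[keep]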
I fixed this by changing my code as follows:
-first I made messageTimes a dict of lists keyed by the tool:
from collections import defaultdict

messageTimes = defaultdict(list)   # a dict of sorted lists, keyed by tool
for row in cdata:   # tool, time, module, message
    if self.message in row[3]:
        messageTimes[row[0]].append(row[1])
-then in the determine function I used bisect:
import bisect

def determine(tup):
    if self.message in tup[3]: return True   # matched message
    times = messageTimes[tup[0]]
    le = bisect.bisect_right(times, tup[1])
    ge = bisect.bisect_left(times, tup[1])
    return (le and tup[1] - times[le-1] <= tdiff) or (ge != len(times) and times[ge] - tup[1] <= tdiff)
With these changes the code that was taking over 2 hours took under 20 minutes, and even better, a query that was taking 40 minutes took 8 seconds!
I made 2 more changes and now that 20 minute query is taking 3 minutes:
found = defaultdict(int)

def determine(tup):
    if self.message in tup[3]: return True   # matched message
    times = messageTimes[tup[0]]
    idx = found[tup[0]]
    le = bisect.bisect_right(times, tup[1], idx)
    found[tup[0]] = idx = le   # remember where this tool's last search ended
    return (le and tup[1] - times[le-1] <= tdiff) or (le != len(times) and times[le] - tup[1] <= tdiff)

python sqlite3 selecting variables

I'm trying to select 3 variables in a sqlite3 statement and put them into 3 Python variables, but I can't get it to work.
knifekingdb.execute("SELECT rank, rounds, date FROM knifekingdb WHERE steamid = ?", steamid)
I can put the result into one list by assigning that statement to a Python variable, but I don't know how to split a list of integers and strings into different variables.
Can you please help me? I'm a bit stuck.
knifekingdb.execute(
    """SELECT rank, rounds, date
       FROM knifekingdb WHERE steamid = ? LIMIT 1""",
    (steamid,))   # parameters must be passed as a sequence, hence the 1-tuple
try:
    rank, rounds, date = knifekingdb.fetchone()
except TypeError:
    # fetchone returned None because no row was found
    # handle error here
    raise

How to check if record exists with Python MySQLdb

I'm creating a Python program that connects to MySQL.
I need to check if a table contains the number 1 to show that it has connected successfully. This is my code thus far:
xcnx.execute('CREATE TABLE settings(status INT(1) NOT NULL)')
xcnx.execute('INSERT INTO settings(status) VALUES(1)')
cnx.commit()
sqlq = "SELECT * FROM settings WHERE status = '1'"
xcnx.execute(sqlq)
results = xcnx.fetchall()
if results == '1':
    print 'yep its connected'
else:
    print 'nope not connected'
What have I missed? I am an SQL noob, thanks guys.
I believe the most efficient "does it exist" query is just to do a count:
sqlq = "SELECT COUNT(1) FROM settings WHERE status = '1'"
xcnx.execute(sqlq)
if xcnx.fetchone()[0]:
# exists
Instead of asking the database to perform any count operations on fields or rows, you are just asking it to return a 1 or 0 if the result produces any matches. This is much more efficient than returning actual records and counting the amount client side, because it saves serialization and deserialization on both sides, and the data transfer.
In [22]: c.execute("select count(1) from settings where status = 1")
Out[22]: 1L # rows
In [23]: c.fetchone()[0]
Out[23]: 1L # count found a match
In [24]: c.execute("select count(1) from settings where status = 2")
Out[24]: 1L # rows
In [25]: c.fetchone()[0]
Out[25]: 0L # count did not find a match
count(*) is going to be the same as count(1). In your case because you are creating a new table, it is going to show 1 result. If you have 10,000 matches it would be 10000. But all you care about in your test is whether it is NOT 0, so you can perform a bool truth test.
Update
Actually, it is even faster to just use the rowcount, and not even fetch results:
In [15]: if c.execute("select (1) from settings where status = 1 limit 1"):
             print True
True

In [16]: if c.execute("select (1) from settings where status = 10 limit 1"):
             print True

In [17]:
This is also how django's ORM does a queryObject.exists().
If all you want to do is check if you have successfully established a connection then why are you trying to create a table, insert a row, and then retrieve data from it?
You could simply do the following...
sqlq = "SELECT * FROM settings WHERE status = '1'"
xcnx.execute(sqlq)
results = xcnx.fetchone()
if results =='1':
print 'yep its connected'
else:
print 'nope not connected'
In fact, if your program has not thrown an exception so far, that already indicates that you have established the connection successfully. (Do check the code above; I'm not sure if fetchone will return a tuple, string, or int in this case.)
By the way, if for some reason you do need to create the table, I would suggest dropping it before you exit so that your program runs successfully the second time.
When you run results = xcnx.fetchall(), the return value is a sequence of tuples that contain the row values. Therefore when you check if results == '1', you are trying to compare a sequence to a constant, which will return False. In your case, a single row of value 0 will be returned, so you could try this:
results = xcnx.fetchall()
# Get the value of the returned row, which will be 0 with a non-match
if results[0][0]:
    print 'yep its connected'
else:
    print 'nope not connected'
You could alternatively use a DictCursor (when creating the cursor, use .cursor(MySQLdb.cursors.DictCursor)), which would make things a bit easier to interpret code-wise, but the result is the same:
if results[0]['COUNT(*)']:
    # Continues...
Also, not a big deal in this case, but you are comparing an integer value to a string. MySQL will do the type conversion, but you could use SELECT COUNT(*) FROM settings WHERE status = 1 and save a (very small) bit of processing.
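Tying those points together, a minimal sketch of the count-based check (the parameter style assumes MySQLdb):

xcnx.execute("SELECT COUNT(*) FROM settings WHERE status = %s", (1,))
(count,) = xcnx.fetchone()
if count:
    print 'yep its connected'
else:
    print 'nope not connected'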
I recently improved my efficiency by skipping the SELECT check altogether: add a primary (or unique) index on the column that must be unique, then just insert. MySQL will only add the row if it doesn't already exist.
So instead of 2 statements:
Query MySQL to check whether the value exists
Query MySQL to insert the data
just do 1, and it will only succeed if the value is unique:
Query MySQL to insert the data
1 query is better than 2.
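As a rough sketch of that idea using the settings table from this question (not the original poster's code): with a unique index in place, INSERT IGNORE is one way to make the insert a silent no-op on duplicates instead of an error:

xcnx.execute("ALTER TABLE settings ADD UNIQUE (status)")   # one-time setup

# a duplicate status is silently skipped instead of raising a duplicate-key error
xcnx.execute("INSERT IGNORE INTO settings (status) VALUES (%s)", (1,))
cnx.commit()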
