I have a function that is called to calculate a response every time the user inputs something. The function gets the response from a database. What I don't understand is why I have to redefine my variable (I have called it intents_db) that contains all the data from the database each time the function is called. I have tried putting it outside the function, but then my program only works the first time; it returns an empty answer the second time the user inputs something.
def response(sentence, user_id='1'):
    results = classify_intent(sentence)
    intents_db = c.execute("SELECT row_num, responses, tag, responses, intent_type, response_type, context_set,\
        context_filter FROM intents")
    if results:
        # loop as long as there are matches to process
        while results:
            if results[0][1] > answer_threshold:
                for i in intents_db:
                    # print('tag:', i[2])
                    if i[2] == results[0][0]:
                        print(i[6])
                        if i[6] != 'N/A':
                            if show_details:
                                print('context: ', i[6])
                            context[user_id] = i[6]
                            responses = i[1].split('&/&')
                            print(random.choice(responses))
                        if i[7] == 'N/A' in i or \
                                (user_id in context and i[7] in i and i[7] == context[user_id]):
                            # a random response from the intent
                            responses = i[1].split('&/&')
                            print(random.choice(responses))
                            print(i[4], i[5])
                print(results[0][1])
            elif results[0][1] <= answer_threshold:
                print(results[0][1])
                for i in intents_db:
                    if i[2] == 'unknown':
                        # a random response from the intent
                        responses = i[1].split('&/&')
                        print(random.choice(responses))
                        initial_comm_output = random.choice(responses)
                        return initial_comm_output
            else:
                initial_comm_output = "Something unexpected happened when calculating response. Please restart me"
                return initial_comm_output
            results.pop(0)
    return results
Also, I have started getting into databases and sqlite3 because I want to build a massive database long term. It therefore also seems inefficient to load the whole database at all. Is there some way to load only the row of data I need? I have a row_num column in my database, so if it were somehow possible to say something like:
"SELECT WHERE row_num=2 FROM intents"
that would be great, but I can't figure out how to do it.
cursor.execute() returns an iterator, and you can only loop over it once.
If you want to reuse it, turn it into a list:
intents_db = list(c.execute("..."))
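To see the difference concretely, here is a minimal sketch using an in-memory database and a made-up one-column intents table: the cursor comes back empty on the second pass, while the materialized list can be scanned every time.

```python
import sqlite3

# Throwaway in-memory stand-in for the real intents table (made-up data).
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE intents (tag TEXT)")
conn.executemany("INSERT INTO intents VALUES (?)", [("greet",), ("bye",)])

cur = conn.execute("SELECT tag FROM intents")
first_pass = [row for row in cur]   # consumes the cursor
second_pass = [row for row in cur]  # nothing left -> empty list

# Materializing into a list makes the rows reusable across calls.
intents_db = list(conn.execute("SELECT tag FROM intents"))
print(len(first_pass), len(second_pass), len(intents_db))  # 2 0 2
```

This is exactly why moving `intents_db` outside the function "worked once": the first call drained the cursor and every later call looped over an empty one.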
It therefore also seems inefficient to load the whole database at all. Is there some way to load only the row of data I need? I have a row_num column in my database, so if it were somehow possible to say something like: "SELECT WHERE row_num=2 FROM intents" that would be great, but I can't figure out how to do it.
You nearly got it; the correct syntax is:
intents_db = c.execute("""SELECT row_num, responses, tag, responses, intent_type,
                          response_type, context_set, context_filter
                          FROM intents WHERE row_num=2""")
But don't make the mistake other database beginners make and put a variable from your program directly into that string. That leaves the program vulnerable to SQL injection.
Rather, do
row_num = 2
intents_db = c.execute("""SELECT row_num, responses, tag, responses, intent_type,
                          response_type, context_set, context_filter
                          FROM intents WHERE row_num=?""", (row_num,))
Of course, you can also set conditions for other fields.
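Putting both points together, here is a minimal sketch (in-memory database, made-up columns): bind the value with a placeholder and use fetchone() to pull back just the single matching row.

```python
import sqlite3

# Throwaway in-memory stand-in for the real intents table (made-up data).
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE intents (row_num INTEGER, tag TEXT)")
conn.executemany("INSERT INTO intents VALUES (?, ?)",
                 [(1, "greet"), (2, "bye"), (3, "thanks")])

row_num = 2  # the value comes from the program, never pasted into the SQL text
c = conn.cursor()
c.execute("SELECT row_num, tag FROM intents WHERE row_num=?", (row_num,))
row = c.fetchone()  # only this one row is transferred
print(row)  # (2, 'bye')
```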
conn = database_connect()
if(conn is None):
    return None
cur = conn.cursor()
try:
    # Try executing the SQL and get from the database
    sql = """SELECT *
             FROM user
             WHERE user_id =%s AND password =%s"""
    cur.execute(sql, (employee_id, password))
    r = cur.fetchone()   # Fetch the first row
    rr = cur.fetchall()
    cur.close()          # Close the cursor
    conn.close()         # Close the connection to the db
except:
    # If there were any errors, return a NULL row printing an error to the debug
    print("Error Invalid Login")
    cur.close()          # Close the cursor
    conn.close()         # Close the connection to the db
    return None
user_info = []
if rr is None:
    print("worry")
    return []
for n in rr:
    user_info.append(n)
test = {
    'info1': user_info[0],
    'info2': user_info[1],
    'info3': user_info[2],
    'info4': user_info[3],
}
return test
Here is my code. I first implement the login function and then get the user information, but I get an IndexError: list index out of range. How do I fix this?
Here:
r = cur.fetchone()# Fetch the first row
rr = cur.fetchall()
The call to fetchone() will consume the first row, so rr will contain n-1 rows.
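This is easy to reproduce with the stdlib sqlite3 module and a throwaway in-memory table (the question uses a different DB-API driver, but the fetchone()/fetchall() semantics are the same):

```python
import sqlite3

# In-memory stand-in for the real user table (made-up data).
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE user (user_id TEXT, password TEXT)")
conn.executemany("INSERT INTO user VALUES (?, ?)",
                 [("alice", "pw1"), ("bob", "pw2"), ("carol", "pw3")])

cur = conn.execute("SELECT user_id FROM user")
r = cur.fetchone()   # consumes the first of the 3 rows
rr = cur.fetchall()  # only the 2 remaining rows
print(r, len(rr))    # ('alice',) 2
```

With a unique user_id the query yields at most one row, so after the fetchone() call rr is always empty and any indexing into it raises IndexError.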
Also, if your database allows duplicated user_id then you have a serious db design issue - whether user_id is (supposed to be) the primary key or the "username" (login name), it should really be unique. If it isn't, then you want to change your schema, and if it is indeed unique then obviously your query can only return (at most!) one single row, in which case rr is guaranteed to always be empty (since you consumed the first row with the fetchone() call).
As a side note, here are some possible improvements:
this:
    for n in rr:
        user_info.append(n)
is completely useless - you are just building a shallow copy of rr, so just work directly with rr instead.
Then do not assume you will always have four rows in your query result, so at least build your test dict dynamically:
test = {}
# enumerate(seq, 1) yields an (index, value) tuple
# for each value in `seq`, starting the index at 1.
for num, row in enumerate(rr, 1):
    key = "info{}".format(num)
    test[key] = row
or more succinctly:
test = {"info{}".format(num):row for num, row in enumerate(rr, 1)}
Note that this dict doesn't add much value compared to rr - you have "info" keys instead of numeric indexes and that's about all, so you could just directly use the rr list instead.
Last but not least, your try/except clause is too broad (too much code in the try block), the except clause will eat very valuable debugging information (the exact error message and the full traceback), and, even worse, the error message you display assumes way too much about what really happened. Actually you probably shouldn't have an except clause here at all (at least not until you really know which errors can happen here AND can be properly handled at this point), so it's better to let the error propagate (so you get the full error message and traceback) and use a finally clause to close your connection instead:
sql = """SELECT *
         FROM user
         WHERE user_id =%s AND password =%s"""
try:
    cur.execute(sql, (employee_id, password))
    r = cur.fetchone()   # Fetch the first row
    rr = cur.fetchall()
finally:
    # this will always be executed whatever happens in the try block
    cur.close()          # Close the cursor
    conn.close()         # Close the connection to the db
I'm trying to sort a collection, and then print the first 5 docs to make sure it has worked:
#!/usr/bin/env python
import pymongo

# Establish a connection to the mongo database.
connection = pymongo.MongoClient('mongodb://localhost')
# Get a handle to the students database.
db = connection.school
students = db.students

def order_homework():
    projection = {'scores': {'$elemMatch': {'type': 'homework'}}}
    cursor = students.find({}, projection)
    # Sort each item's scores.
    for each in cursor:
        each['scores'].sort()
    # Sort by _id.
    cursor = sorted(cursor, key=lambda x: x['_id'])
    # Print the first five items.
    count = 0
    for each in cursor:
        print(each)
        count += 1
        if count == 5:
            break

if __name__ == '__main__':
    order_homework()
When I run this, nothing prints.
If I take out the sorts, then it prints.
Each sort works when run individually.
Please teach me what I'm doing wrong / educate me.
You're trying to treat the cursor like a list, which you can iterate several times from the start. PyMongo cursors don't act that way - once you've iterated it in for each in cursor, the cursor is completed and you can't iterate it again.
You can turn the cursor into a list like:
data = list(students.find({}, projection))
For efficiency, get results pre-sorted from MongoDB:
list(students.find({}, projection).sort('_id'))
This sends the sort criterion to the server, which then streams the results back to you pre-sorted, instead of requiring you to do it client-side. Now delete your "Sort by _id" line below.
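To see the one-pass behaviour in isolation, a plain Python iterator acts the same way as a PyMongo cursor in this respect - it can only be consumed once (a generic sketch, no MongoDB needed):

```python
# A one-pass iterator standing in for a PyMongo cursor (made-up documents).
docs = iter([{'_id': 3}, {'_id': 1}, {'_id': 2}])

first_pass = sorted(docs, key=lambda d: d['_id'])  # consumes the iterator
second_pass = list(docs)                           # already exhausted -> []

print(first_pass)
print(second_pass)
```

This mirrors the question's bug: the first for loop drained the cursor, so the later sorted()/print loop had nothing left to iterate.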
Basically what I want to do is return entries from a database based on the user's input and display the records in PyQt4's textBrowser widget.
This is the code:
def url_search(self):
    self.browserUrl.append("Look")  # for testing reasons, works
    items = []
    for index in range(self.listUrl.count()):
        items.append(self.listUrl.item(index))
    conn = sqlite3.connect(directory + '\\CyberConan Databases\\CB Database\\Google Chrome\\Chrome Artifacts.db')
    for kw in items:
        self.browserUrl.append("it is")  # for testing reasons, works
        x = str(kw)
        for row in conn.execute("SELECT * FROM urls WHERE ID LIKE ? OR URL LIKE ? OR Title LIKE ? "
                                "OR Visit_Count LIKE ? OR Typed_Count LIKE ?;",
                                ("'%"+x+"%'", "'%"+x+"%'", "'%"+x+"%'", "'%"+x+"%'", "'%"+x+"%'")):
            self.browserUrl.append("working")  # for testing reasons, does not work
            self.browserUrl.append(str(row[0]))
            self.browserUrl.append(str(row[1]))
            self.browserUrl.append(str(row[2]))
            self.browserUrl.append(str(row[3]))
            self.browserUrl.append("working man!")  # for testing reasons, does not work
            self.browserUrl.append(str(row[4]))
            self.browserUrl.append(str(row[5]))
So the user would press a button and this function runs. The only output I'm getting is:
Look
it is
This output occurs every time the button is pressed. All the naming is correct. I get no errors. The table does contain what the user is searching for, so there should be output. Note that "it is" appears as many times as there are kw in items.
(This is for a graduation project)
Okay, so I found out what the problem was.
First: the parsing of the string in the SELECT statement was not done correctly.
Second: when I was returning the items from the list, I was returning their "locations" (the item objects) and not the strings themselves.
This is the solved code for anyone who may have run into a similar issue:
def url_search(self):
    items = []
    for index in range(self.listUrl.count()):
        thing = self.listUrl.item(index)
        items.append(thing.text())
    conn = sqlite3.connect(directory + '\\CyberConan Databases\\CB Database\\Google Chrome\\Chrome Artifacts.db')
    for kw in items:
        x = str(kw)
        for row in conn.execute("SELECT * FROM urls WHERE ID LIKE ? OR URL LIKE ? OR Title LIKE ? "
                                "OR Visit_Count LIKE ? OR Typed_Count LIKE ?;",
                                ("%"+x+"%", "%"+x+"%", "%"+x+"%", "%"+x+"%", "%"+x+"%")):
            self.browserUrl.append(str(row))
    conn.close()
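The key fix is where the quoting happens: the % wildcards belong in the bound value, with no extra quotes around them. A minimal sketch with an in-memory database and a made-up urls table shows why the quoted version never matched:

```python
import sqlite3

# Throwaway in-memory stand-in for the Chrome urls table (made-up data).
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE urls (url TEXT)")
conn.executemany("INSERT INTO urls VALUES (?)",
                 [("http://example.com",), ("http://python.org",)])

kw = "python"
# Correct: wildcards inside the bound value, no quotes.
hits = list(conn.execute("SELECT url FROM urls WHERE url LIKE ?",
                         ("%" + kw + "%",)))
# Broken: the literal pattern '%python%' (quote characters included)
# matches no stored URL.
misses = list(conn.execute("SELECT url FROM urls WHERE url LIKE ?",
                           ("'%" + kw + "%'",)))
print(hits, misses)
```

The placeholder already handles quoting and escaping, so adding your own quotes makes them part of the pattern being searched for.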
query = "SELECT serialno from registeredpcs where ipaddress = "
usercheck = query + "'%s'" % fromIP
# print("query" + "-" + usercheck)
print(usercheck)
rs = cursor.execute(usercheck)
print(rs)
row = rs
# print(row)
# rs = cursor.rowcount()
if int(row) == 1:
    query = "SELECT report1 from registeredpcs where serialno = "
    firstreport = query + "'%s'" % rs
    result = cursor.execute(firstreport)
    print(result)
elif int(row) == 0:
    query_new = "SELECT * from registeredpcs"
    cursor.execute(query_new)
    newrow = cursor.rowcount()+1
    print(newrow)
What I am trying to do here is fetch the serialno values from the db when they match a certain ipaddress. This query is working fine. As it should, the query result set rs is 0. Now I am trying to use that value and do something else in the if/else construct. Basically I am trying to check for unique values in the db based on the ipaddress value. But I am getting this error:
error: uncaptured python exception, closing channel smtpd.SMTPChannel connected
192.168.1.2:3630 at 0x2e47c10 (<class 'TypeError'>: 'int' object is not
callable [C:\Python34\lib\asyncore.py|read|83]
[C:\Python34\lib\asyncore.py|handle_read_event|442]
[C:\Python34\lib\asynchat.py|handle_read|171]
[C:\Python34\lib\smtpd.py|found_terminator|342] [C:/Users/Dev-
P/PycharmProjects/CR Server Local/LRS|process_message|43])
I know I am making some very basic mistake. I think the "'int' object is not callable" part is what's causing the error, but I just can't put my finger on it. I tried using the rowcount() method, but it didn't help.
rowcount is an attribute, not a method; you shouldn't call it.
"I know I am making some very basic mistake" : well, Daniel Roseman alreay adressed the cause of your main error, but there are a couple other mistakes in your code:
query = "SELECT serialno from registeredpcs where ipaddress = "
usercheck = query + "'%s'" % fromIP
rs = cursor.execute(usercheck)
This part is hard to read (you're using both string concatenation and string formatting for no good reason), brittle (try it with `fromIP = "'foo'"`), and very, very unsafe. You want to use parameterized queries instead, i.e.:
# nb check your exact db-api module for the correct placeholder,
# MySQLdb uses '%s' but some other use '?' instead
query = "SELECT serialno from registeredpcs where ipaddress=%s"
params = [fromIP,]
rs = cursor.execute(query, params)
"As it should the query result set rs is 0"
This is actually plain wrong. cursor.execute() returns the number of rows affected (selected, created, updated, deleted) by the query. The "result set" is really the cursor itself. You can fetch results using cursor.fetchone(), cursor.fetchall(), or more simply (and more efficiently if you want to work on the whole result set with constant memory use) by iterating over the cursor, i.e.:
for row in cursor:
    print(row)
Let's continue with your code:
row = rs
if int(row) == 1:
    # ...
elif int(row) == 0:
    # ...
The first line is useless - it only makes row an alias of rs, and a badly named one at that - it's not a "row" (one line of results from your query), it's an int. Since it's already an int, converting it to int is also useless. And finally, unless 'ipaddress' is a unique key in your table, your query might return more than one row.
If what you want is the effective value(s) for the serialno field for records matching fromIP, you have to fetch the row(s):
row = cursor.fetchone() # first row, as a tuple
then get the value, which in this case will be the first item in row:
serialno = row[0]
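Put together, here is a minimal sketch with sqlite3 (whose placeholder is ? rather than %s; the table and values here are made up):

```python
import sqlite3

# Throwaway in-memory stand-in for the registeredpcs table (made-up data).
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE registeredpcs (serialno TEXT, ipaddress TEXT)")
conn.execute("INSERT INTO registeredpcs VALUES (?, ?)", ("SN-001", "192.168.1.2"))

cur = conn.cursor()
cur.execute("SELECT serialno FROM registeredpcs WHERE ipaddress=?",
            ("192.168.1.2",))
row = cur.fetchone()  # first row as a tuple, or None if no match
serialno = row[0] if row is not None else None
print(serialno)  # SN-001
```

Note the None check: fetchone() returns None when the query matched nothing, which is the clean way to branch on "found / not found" instead of calling int() on the execute() return value.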
I have a list of tuples that contains a tool_id, a time, and a message. I want to select from this list all the elements where the message matches some string, and all the other elements where the time is within some diff of any matching message for that tool.
Here is how I am currently doing this:
# record time for each message matching the specified message for each tool
messageTimes = {}
for row in cdata:  # tool, time, message
    if self.message in row[2]:
        messageTimes[row[0], row[1]] = 1

# now pull out each message that is within the time diff for each matched message
# as well as the matched messages themselves
def determine(tup):
    if self.message in tup[2]: return True  # matched message
    for (tool, date_time) in messageTimes:
        if tool == tup[0]:
            if abs(date_time - tup[1]) <= tdiff:
                return True
    return False

cdata[:] = [tup for tup in cdata if determine(tup)]
This code works, but it takes way too long to run - e.g. when cdata has 600,000 elements (which is typical for my app) it takes 2 hours for this to run.
This data came from a database. Originally I was getting just the data I wanted using SQL, but that was taking too long also. I was selecting just the messages I wanted, then for each one of those doing another query to get the data within the time diff of each. That was resulting in tens of thousands of queries. So I changed it to pull all the potential matches at once and then process it in python, thinking that would be faster. Maybe I was wrong.
Can anyone give me some suggestions on speeding this up?
Updating my post to show what I did in SQL as was suggested.
What I did in SQL was pretty straightforward. The first query was something like:
SELECT tool, date_time, message
FROM event_log
WHERE message LIKE '%foo%'
AND other selection criteria
That was fast enough, but it may return 20 or 30 thousand rows. So then I looped through the result set, and for each row ran a query like this (where dt and t are the date_time and tool from a row from the above select):
SELECT date_time, message
FROM event_log
WHERE tool = t
AND ABS(TIMESTAMPDIFF(SECOND, date_time, dt)) <= timediff
That was taking about an hour.
I also tried doing in one nested query where the inner query selected the rows from my first query, and the outer query selected the time diff rows. That took even longer.
So now I am selecting without the message LIKE '%foo%' clause and I am getting back 600,000 rows and trying to pull out the rows I want from python.
The way to optimize the SQL is to do it all in one query, instead of iterating over 20K rows and doing another query for each one.
Usually this means you need to add a JOIN, or occasionally a sub-query. And yes, you can JOIN a table to itself, as long as you rename one or both copies. So, something like this:
SELECT el2.date_time, el2.message
FROM event_log AS el1 JOIN event_log AS el2
    ON el2.tool = el1.tool
WHERE el1.message LIKE '%foo%'
    AND other selection criteria
    AND ABS(TIMESTAMPDIFF(SECOND, el2.date_time, el1.date_time)) <= timediff
Now, this probably won't be fast enough out of the box, so there are two steps to improve it.
First, look for any columns that obviously need to be indexed. Clearly tool and date_time need simple indices. message may benefit from a simple index or, if your database supports it, something fancier like a full-text index; but given that the initial query was fast enough, you probably don't need to worry about it.
Occasionally, that's sufficient. But usually, you can't guess everything correctly. And there may also be a need to rearrange the order of the queries, etc. So you're going to want to EXPLAIN the query, and look through the steps the DB engine is taking, and see where it's doing a slow iterative lookup when it could be doing a fast index lookup, or where it's iterating over a large collection before a small collection.
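As a runnable illustration of the self-join idea, here is a sketch using sqlite3 with integer epoch timestamps (SQLite has no TIMESTAMPDIFF, so plain ABS(el2.ts - el1.ts) stands in for it; the table and data are made up):

```python
import sqlite3

# Throwaway in-memory stand-in for the event_log table (made-up data).
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE event_log (tool TEXT, ts INTEGER, message TEXT)")
conn.executemany("INSERT INTO event_log VALUES (?, ?, ?)", [
    ("t1", 100, "foo happened"),    # the matched message
    ("t1", 105, "nearby event"),    # within 10s of the match -> kept
    ("t1", 500, "far-away event"),  # outside the window -> dropped
    ("t2", 101, "other tool"),      # different tool -> dropped
])

tdiff = 10
rows = list(conn.execute("""
    SELECT DISTINCT el2.ts, el2.message
    FROM event_log AS el1
    JOIN event_log AS el2 ON el2.tool = el1.tool
    WHERE el1.message LIKE '%foo%'
      AND ABS(el2.ts - el1.ts) <= ?
    ORDER BY el2.ts""", (tdiff,)))
print(rows)
```

The DISTINCT matters: a row within range of several matched messages would otherwise appear once per match.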
For tabular data, you can't go past the Python pandas library, which contains highly optimised code for queries like this.
I fixed this by changing my code as follows:
- First I made messageTimes a dict of lists keyed by the tool:

messageTimes = defaultdict(list)  # a dict with sorted lists
for row in cdata:  # tool, time, module, message
    if self.message in row[3]:
        messageTimes[row[0]].append(row[1])
- Then in the determine function I used bisect:

def determine(tup):
    if self.message in tup[3]: return True  # matched message
    times = messageTimes[tup[0]]
    le = bisect.bisect_right(times, tup[1])
    ge = bisect.bisect_left(times, tup[1])
    return (le and tup[1] - times[le-1] <= tdiff) or \
           (ge != len(times) and times[ge] - tup[1] <= tdiff)
With these changes the code that was taking over 2 hours took under 20 minutes, and even better, a query that was taking 40 minutes took 8 seconds!
I made 2 more changes and now that 20 minute query is taking 3 minutes:
found = defaultdict(int)

def determine(tup):
    if self.message in tup[3]: return True  # matched message
    times = messageTimes[tup[0]]
    idx = found[tup[0]]
    le = bisect.bisect_right(times, tup[1], idx)
    found[tup[0]] = le  # remember where we got to for this tool
    return (le and tup[1] - times[le-1] <= tdiff) or \
           (le != len(times) and times[le] - tup[1] <= tdiff)
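For anyone wanting to try the bisect approach in isolation, here is a self-contained sketch of the same technique with made-up data; it keeps each matched row plus every row within tdiff of a match for the same tool:

```python
import bisect
from collections import defaultdict

tdiff = 10
target = "foo"  # stands in for self.message
cdata = [  # (tool, time, module, message) - made-up sample rows
    ("t1", 100, "m", "foo happened"),
    ("t1", 105, "m", "nearby"),
    ("t1", 500, "m", "far away"),
    ("t2", 101, "m", "other tool"),
]

# Phase 1: per-tool sorted lists of matched-message times.
messageTimes = defaultdict(list)
for tool, t, _mod, msg in cdata:
    if target in msg:
        messageTimes[tool].append(t)
for times in messageTimes.values():
    times.sort()

# Phase 2: keep matches, plus anything within tdiff of a match,
# found in O(log n) per row via bisect.
def determine(tup):
    if target in tup[3]:
        return True  # matched message
    times = messageTimes[tup[0]]
    le = bisect.bisect_right(times, tup[1])
    ge = bisect.bisect_left(times, tup[1])
    return (le and tup[1] - times[le - 1] <= tdiff) or \
           (ge != len(times) and times[ge] - tup[1] <= tdiff)

kept = [tup for tup in cdata if determine(tup)]
print(kept)
```

The speedup over the original comes from replacing a linear scan of all matched times with two binary searches against the nearest match on either side.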