I've been trying to figure out the best way to write a query to compare the rows in two tables. My goal is to see if the two tuples in result Set A are in the larger result set B. I only want to see the tuples that are different in the query results.
'''SELECT table1.field_b, table1.field_c, table1.field_d
'''FROM table1
'''ORDER BY field_b
results_a = [(101010101, 111111111, 999999999), (121212121, 222222222, 999999999)]
'''SELECT table2.field_a, table2.fieldb, table3.field3
'''FROM table2
'''ORDER BY field_a
results_b =[(101010101, 111111111, 999999999), (121212121, 333333333, 999999999), (303030303, 444444444, 999999999)]
So what I want to do is take results_a and make sure that they have an exact match somewhere in results_b. So since the second record in the second tuple is different than what is in results_a, I would like to return the second tuple in results_a.
Ultimately I would like to return a set that also has the second tuple that did not match in the other set, so I could reference both in my program. Ideally, since the second tuple's primary key (field_b in table1) matches the corresponding primary key (field_a) in table2 but the rest of the tuple does not, I would want to display results_c ={(121212121, 222222222, 999999999):(121212121, 222222222, 999999999)}. This is complicated by the fact that the results in both tables will not be in the same order, so I can't write code that says (compare tuple2 in results_a to tuple2 in results_b). It is more like (compare tuple2 in results_a and see if it matches any record in results_b. If the primary keys match and none of the tuples in results_b completely match, or no partial match is found, return the records that don't match.)
I apologize that this is so wordy. I couldn't think of a better way to explain it. Any help would be much appreciated.
Thanks!
UPDATED EFFORT ON PARTIAL MATCHES
a = [(1, 2, 3),(4,5,7)]
b = [(1, 2, 3),(4,5,6)]
pmatch = dict([])
def partial_match(x,y):
    return sum(ea == eb for (ea,eb) in zip(x,y))>=2
for el_a in a:
    pmatch[el_a] = [el_b for el_b in b if partial_match(el_a,el_b)]
print(pmatch)
OUTPUT = {(4, 5, 7): [(4, 5, 6)], (1, 2, 3): [(1, 2, 3)]}. I would have expected it to be just {(4,5,7):(4,5,6)} because those are the only sets that are different. Any ideas?
Take results_a and make sure that they have an exact match somewhere in results_b:
for el in results_a:
    if el in results_b:
        ...
Get partial matches:
pmatch = dict([])
def partial_match(a,b):
    # for instance: count positions that agree and require at least two
    return sum(ea == eb for (ea,eb) in zip(a,b)) >= 2
for el_a in results_a:
    pmatch[el_a] = [el_b for el_b in results_b if partial_match(el_a,el_b)]
Return the records that don't match:
no_match = [el for el in results_a if el not in results_b]
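For the "same primary key, different contents" case described in the question, a lookup keyed on the first element may be closer to what is wanted. The following is only a sketch: it assumes the primary key is the first element of every tuple and is unique within results_b, and `mismatched_by_key` is an illustrative name that does not appear in the original code.

```python
def mismatched_by_key(results_a, results_b):
    """Map each row of results_a to the row of results_b that shares its
    primary key (assumed to be element 0) but differs elsewhere."""
    # Index results_b by primary key so row order no longer matters.
    b_by_key = {row[0]: row for row in results_b}
    mismatches = {}
    for row in results_a:
        other = b_by_key.get(row[0])
        # Keep only rows whose key exists in results_b but whose contents differ.
        if other is not None and other != row:
            mismatches[row] = other
    return mismatches


results_a = [(101010101, 111111111, 999999999), (121212121, 222222222, 999999999)]
results_b = [(101010101, 111111111, 999999999), (121212121, 333333333, 999999999),
             (303030303, 444444444, 999999999)]
print(mismatched_by_key(results_a, results_b))
# {(121212121, 222222222, 999999999): (121212121, 333333333, 999999999)}
```

Rows of results_a whose key is missing from results_b entirely are not reported here; the no_match list above still covers them.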
-- EDIT / Another possible partial_match
def partial_match(x,y):
    nb_matches = sum(ea == eb for (ea,eb) in zip(x,y))
    return 0.6 < float(nb_matches) / len(x) < 1
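Applied to the lists a and b from the updated question, this stricter version should leave only the pair that differs, provided empty results are dropped. A small sketch, where `differing_partial_matches` is a hypothetical helper name:

```python
def partial_match(x, y):
    # Fraction of positions that agree; exact matches (fraction 1) are excluded.
    nb_matches = sum(ea == eb for ea, eb in zip(x, y))
    return 0.6 < nb_matches / len(x) < 1


def differing_partial_matches(rows_a, rows_b):
    """Map each tuple in rows_a to its near matches in rows_b, skipping tuples with none."""
    result = {}
    for el_a in rows_a:
        near = [el_b for el_b in rows_b if partial_match(el_a, el_b)]
        if near:  # drop tuples that have no partial match at all
            result[el_a] = near
    return result


a = [(1, 2, 3), (4, 5, 7)]
b = [(1, 2, 3), (4, 5, 6)]
print(differing_partial_matches(a, b))
# {(4, 5, 7): [(4, 5, 6)]}
```

The 0.6 threshold is the one from the answer; with 3-element tuples it means exactly two of the three fields agree.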
I'm a python student (by myself), and as an exercise I decided to try making a script to 'encrypt/decrypt' a message.
The 'encryption algo' that I'm using is very simple, I learned during military service and it was used for troops in the field to encrypt radio messages only.
I assume it's not a secure way to encrypt stuff. (If someone can comment on that, I would love to know more)
Anyway, I'm doing it as an exercise for programming logic, but I've been stuck for a while now.
Here's how it works:
You get a keyword/phrase (More often used with 2 words (vertical and horizontal) but for now I'm coding the 1 keyword only).
Let's use 'PASSWORD' as key and the message: 'This is a sample message'. I would make a table with PASSWORD as column index, and fill the table with the message:
P A S S W O R D
t h i s i s a s
a m p l e m e s
s a g e x y z x
[Since the message didn't complete all the columns we completed it with letters that won't cause issues]
Then, we determine the order for the scramble, deriving it alphabetically from the key:
4 1 6 7 8 3 5 2
P A S S W O R D
[a,d,o,p,r,s,s,w]
So line by line, letter by letter, we would take the letters from the message according to the key-order, and form the encrypted message:
'hsstaisi' for the first line, 'msmaeple' and 'axyszgex' for the second and third line.
So the message would be 'hsstaisimsmaepleaxyszgex' [Usually transmitted as "hssta isims maepl eaxys zgex" to make it easier for the radio operator]
Now the code:
I managed to make it work (kind of...), here is how:
I get the message and key, remove spaces, and turn both of them into lists. I create a dictionary where every letter from the key (list) becomes a key in the dict, and the value is a number (from 0 to the length of the key), like an iterator.
{ 'p':0, 'a':1, 's':2,... } #[Here is my problem]
After that we sort the key (list) alphabetically and use it as an iterator to look up the key (dict), which gives a number that is an index into the message list. (My explanation is confusing; it may be easier to understand by checking the code below.)
Letter by letter the message is scrambled and appended in a new list, and then presented as 'encrypted'.
It works! Except if the keyphrase has repeated letters (like our 'password'). In that situation the corresponding value of a repeated dictionary key gets overwritten, because dict keys are unique.
I've written several different versions for the same code, but I always get stuck in the dict problem, at some point or the other.
Here is the piece of code:
key = ['p','a','s','s','w','o','r','d']
msg = ['t','h','i','s','i','s','a','s','a','m','p','l','e','m','e','s','s','a','g','e']
def encrypt(key_list,msg_list):
    while len(msg_list) % len(key_list) != 0:
        rest = len(key_list) - (len(msg_list) % len(key_list))
        for i in range(rest):
            if msg_list[-1] == 'z':
                msg_list.append('x')
            else:
                msg_list.append('z')
    key_dict = {}
    for i in range(len(key_list)):
        key_dict[key_list[i]] = i
    key_list.sort()
    qnty_rows = len(msg_list) // len(key_list)
    cloop = 0
    scramble_list = []
    while cloop < qnty_rows:
        for i in range(len(key_list)):
            scramble_list.append(msg_list[key_dict[key_list[i]]+(cloop*len(key_list))])
        cloop +=1
    encrypted_msg = "".join(scramble_list)
    print(encrypted_msg)
Can someone help me find a solution to this, or point me at the right direction?
Considering that I'm still learning to code, any constructive criticism for the code in general is welcomed.
Your error lies in how you assign column numbers to each of the key characters, using a dictionary:
for i in range(len(key_list)):
    key_dict[key_list[i]] = i
For repeated letters, only the last index remains; s from password maps first to 2, then to 3, so key_dict['s'] ends up being 3:
i = 0, key_list[i] == 'p', key_dict['p'] = 0
i = 1, key_list[i] == 'a', key_dict['a'] = 1
i = 2, key_list[i] == 's', key_dict['s'] = 2
i = 3, key_list[i] == 's', key_dict['s'] = 3 # replacing 2
i = 4, key_list[i] == 'w', key_dict['w'] = 4
# etc.
Don't use a dictionary; generate a list of paired index and character values, sort this by letter, then extract just the indices:
indices = [i for i, c in sorted(enumerate(key_list), key=lambda p: p[1])]
I used the enumerate() function to generate the indices; it achieves the same thing as your range(len(key_list)) loop in a more compact form.
Because enumerate() produces (index, value) pairs and we want to sort on the values (the characters), the above code uses a sort key extracting the values (p[1]). sorted() is stable, so repeated letters keep their original left-to-right order.
Note that you don't even need your key to be a list; the above would work directly on a string too, because strings are sequences just like lists are.
Here's how this works:
>>> keyphrase = 'password'
>>> list(enumerate(keyphrase)) # add indices
[(0, 'p'), (1, 'a'), (2, 's'), (3, 's'), (4, 'w'), (5, 'o'), (6, 'r'), (7, 'd')]
>>> sorted(enumerate(keyphrase), key=lambda p: p[1]) # sorted on the letter
[(1, 'a'), (7, 'd'), (5, 'o'), (0, 'p'), (6, 'r'), (2, 's'), (3, 's'), (4, 'w')]
>>> [i for i, c in sorted(enumerate(keyphrase), key=lambda p: p[1])] # just the indices
[1, 7, 5, 0, 6, 2, 3, 4]
Now you can use these indices to remap chunks of the plaintext input to encrypted output.
What you have is called a columnar transposition cipher. For modern computers such a cipher is rather trivial to break. See https://crypto.stackexchange.com/questions/40119/how-to-solve-columnar-transposition-cipher-without-a-key for a discussion on how to approach cracking such a ciphertext.
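Putting the index remapping together, the row-by-row scramble described in the question could look roughly like the sketch below. The function names (`column_order`, `encrypt`, `decrypt`) and the single-character `pad` parameter are assumptions made for illustration; the question pads with a mix of letters, so the example passes an already padded message to reproduce its output.

```python
def column_order(keyphrase):
    """Column indices of the key, sorted by letter (stable for repeated letters)."""
    return [i for i, c in sorted(enumerate(keyphrase), key=lambda p: p[1])]


def encrypt(keyphrase, message, pad="x"):
    key = keyphrase.replace(" ", "").lower()
    text = message.replace(" ", "").lower()
    # Pad the last row so every row has one letter per key column.
    text += pad * (-len(text) % len(key))
    order = column_order(key)
    rows = [text[start:start + len(key)] for start in range(0, len(text), len(key))]
    # Read each row in key order and concatenate the rows.
    return "".join(row[i] for row in rows for i in order)


def decrypt(keyphrase, ciphertext):
    # Assumes ciphertext has no spaces and is a whole number of rows long.
    key = keyphrase.replace(" ", "").lower()
    order = column_order(key)
    plain = []
    for start in range(0, len(ciphertext), len(key)):
        row = ciphertext[start:start + len(key)]
        restored = [""] * len(key)
        # The letter at output position j came from input column order[j].
        for position, column in enumerate(order):
            restored[column] = row[position]
        plain.append("".join(restored))
    return "".join(plain)


secret = encrypt("PASSWORD", "This is a sample message xyzx")
print(secret)                        # hsstaisimsmaepleaxyszgex
print(decrypt("PASSWORD", secret))   # thisisasamplemessagexyzx
```

Because the order is computed from positions rather than from a dict keyed on letters, the two s columns in 'password' stay distinct.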
I'm having trouble merging the values of a dictionary when the number of keys in the dictionary varies.
I found a working example using two lists:
t1 = [1,2,3]
t2 = ["a","b","c"]
output = list(zip(t1, t2))
which leads to [(1, 'a'), (2, 'b'), (3, 'c')] ... first success.
But I need to zip all the values from a dictionary whose number of keys varies. (Sometimes there are 2 keys, sometimes 4, and so on.)
Is there a way to do the zip with dynamic input, depending on the number of keys?
Let's say
t1 = [1,2,3]
t2 = ["a","b","c"]
generated_rows = OrderedDict()
generated_rows['t1'] = t1
generated_rows['t2']=t2
output = list(zip(??*))
the expected output would be as above:
[(1, 'a'), (2, 'b'), (3, 'c')]
but the parameters of the zip call should somehow come from the dictionary dynamically. The following varying dicts should work with the method:
d1 = {'k1':[0,1,2], 'k2':['a','b','c']}
d2 = {'k1':[0,1,2], 'k2':['a','b','c'], 'k3':['x','y','z']}
d3 = ...
solution (thanks to Todd):
d1 = {'k1':[0,1,2], 'k2':['a','b','c']}
o = list(zip(*d1.values()))
If your second piece of code accurately represents what you want to do with N different lists, then the code would probably be:
t1 = [ 1, 2, 3 ]
t2 = [ 'a', 'b', 'c' ]
# And so on
x = []
x.append( t1 )
x.append( t2 )
# And so on
output = zip(*x)
On Python 2 you don't need the extra list(), because zip() already returns a list there; on Python 3 zip() returns a lazy iterator, so wrap it in list() if you need an actual list. The * operator is sometimes referred to as the 'splat' operator, and when used like this it unpacks the list into separate arguments.
A list is used instead of a dictionary because the 'splat' operator doesn't guarantee any order beyond "whatever order the type in question uses when iterating over it". An ordered dictionary works if the keys are inserted in the order you want; plain dicts also preserve insertion order from Python 3.7 onwards.
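As a quick check that the unpacking adapts to however many keys are present, here is a small sketch using the dicts from the question. `zip_values` is just an illustrative wrapper name, and the column order relies on dict insertion order (Python 3.7+).

```python
def zip_values(mapping):
    """Zip together all value lists of a dict, however many keys it has."""
    # *mapping.values() passes each value list as a separate argument to zip().
    return list(zip(*mapping.values()))


d1 = {'k1': [0, 1, 2], 'k2': ['a', 'b', 'c']}
d2 = {'k1': [0, 1, 2], 'k2': ['a', 'b', 'c'], 'k3': ['x', 'y', 'z']}

print(zip_values(d1))  # [(0, 'a'), (1, 'b'), (2, 'c')]
print(zip_values(d2))  # [(0, 'a', 'x'), (1, 'b', 'y'), (2, 'c', 'z')]
```

Note that zip() stops at the shortest list, so value lists of unequal length are silently truncated.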
I'm working with a large set of records and need to sum a given field for each customer account to reach an overall account balance. While I can probably put the data in any reasonable form, I figured the easiest would be a list of tuples (cust_id,balance_contribution) as I process through each record. After the round of processing, I'd like to add up the second item for each cust_id, and I am trying to do it without looping through the data thousands of times.
As an example, the input data could look like:[(1,125.50),(2,30.00),(1,24.50),(1,-25.00),(2,20.00)]
And I want the output to be something like this:
[(1,125.00),(2,50.00)]
I've read other questions where people have just wanted to add the values of the second element of the tuple using the form of sum(j for i, j in a), but that does not separate them by the first element.
This discussion, python sum tuple list based on tuple first value, puts the values in a list assigned to each key (cust_id) in a dictionary. I suppose I could then figure out how to add up the values in each list?
Any thoughts on a better approach to this?
Thank you in advance.
import collections
def total(records):
    dct = collections.defaultdict(int)
    for cust_id, contrib in records:
        dct[cust_id] += contrib
    return dct.items()
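For reference, here is the same single-pass accumulator idea run against the sample data from the question. It is a sketch rather than a drop-in replacement: `total_by_customer` is an illustrative name, and it returns a sorted list of tuples to match the output format the question asks for.

```python
from collections import defaultdict


def total_by_customer(records):
    """Sum balance contributions per cust_id in one pass over the records."""
    totals = defaultdict(float)  # missing customers start at 0.0
    for cust_id, contrib in records:
        totals[cust_id] += contrib
    return sorted(totals.items())


records = [(1, 125.50), (2, 30.00), (1, 24.50), (1, -25.00), (2, 20.00)]
print(total_by_customer(records))
# [(1, 125.0), (2, 50.0)]
```

For real currency amounts, decimal.Decimal may be a safer choice than float, since binary floats cannot represent most decimal fractions exactly.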
Would the following code be useful?
in_list = [(1,125.50),(2,30.00),(1,24.50),(1,-25.00),(3,20.00)]
totals = {}
for uid, x in in_list:
    if uid not in totals:
        totals[uid] = x
    else:
        totals[uid] += x
print(totals)
output :
{1: 125.0, 2: 30.0, 3: 20.0}
People usually like one-liners in python:
[(uk,sum([vv for kk,vv in data if kk==uk])) for uk in set([k for k,v in data])]
When
data=[(1,125.50),(2,30.00),(1,24.50),(1,-25.00),(3,20.00)]
The output is
[(1, 125.0), (2, 30.0), (3, 20.0)]
Here's an itertools solution:
from itertools import groupby
>>> x
[(1, 125.5), (2, 30.0), (1, 24.5), (1, -25.0), (2, 20.0)]
>>> sorted(x)
[(1, -25.0), (1, 24.5), (1, 125.5), (2, 20.0), (2, 30.0)]
>>> for a,b in groupby(sorted(x), key=lambda item: item[0]):
...     print(a, sum(item[1] for item in b))
...
1 125.0
2 50.0
I have this Python function that takes 2 arguments (string, dictionary) and returns a float. The function is designed to take the average of the numeric scores in a dictionary that maps strings to scores.
def happiness_score(string, dic):
    keys = string.lower().split()
    v = sum(dic[key] for key in keys)
    return float(v)/len(keys)
I have this test case which works:
print(happiness_score("a b" , {"a":(1.2) , "b":(3.4)}))
>>> 2.3
I also have a test case with tuples:
print(happiness_score("a b" , {"a":(1,2) , "b":(3,4)}))
How can I change my code so that I can convert any given tuple to integer so that I can still run my program?
Attempting to use my ninja mind reading skills to guess how you want to convert a tuple to a float, perhaps you want:
def tup2float(tup):
    return float('.'.join(str(x) for x in tup))
This will only work with 2-tuples...
Some results:
>>> tup2float((1,2))
1.2
>>> tup2float((2,3))
2.3
>>> tup2float((2,30))
2.3
>>> tup2float((2,32))
2.32
I'm going to assume that you want the answer in this case to be (2, 3). In other words, you want to treat each element in the tuple as the source of a separate mean.
def happiness_score(dic):
    # sum the values position by position across all tuples
    scores = map(sum, zip(*dic.values()))
    return [float(v)/len(dic) for v in scores]
The first step is to create a set of values for each tuple element:
d = {'a': (1, 2), 'b': (3, 4)}
list(zip(*d.values()))  # list() is needed on Python 3, where zip() is lazy
[(1, 3), (2, 4)]
Then apply sum to each tuple using map.
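If the string argument from the original function should still select which keys are averaged, the per-position mean could be sketched as follows. This rests on the same guess as above about what the tuples mean (each position is a separate score), and it assumes every selected tuple has the same length.

```python
def happiness_score(text, scores):
    """Average the score tuples of the words in text, position by position."""
    keys = text.lower().split()
    # Transpose the selected tuples so each column holds one position's values.
    columns = zip(*(scores[key] for key in keys))
    return [sum(column) / len(keys) for column in columns]


print(happiness_score("a b", {"a": (1, 2), "b": (3, 4)}))
# [2.0, 3.0]
```

A word that is missing from the dictionary raises KeyError here, as it does in the original function.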
Assuming you read the file into a list called lines:
hrs, miles = zip(*[(float(line.split()[-4]), float(line.split()[-2])) for line in lines if 'miles' in line and 'hrs' in line])
total_hrs = sum(hrs)
total_miles = sum(miles)
So this is what I'd "like" to be able to write:
cur_loc = min(open_set,key=lambda x:costs[x])
cur_loc is a tuple, and the goal is to set it equal to the tuple in open_set with the lowest cost. (and you find the cost of x with costs[x])
How could I do this? I tried Python.org's documentation on min(), but I didn't seem to find much help.
Thanks!
EDIT:
I resolved my own problem.
I made a careless mistake and hadn't initialized the costs dictionary. I was actually copying and pasting someone else's Python code in order to test what they were doing, but apparently the snippet they created didn't include the initialization part. Whoops. If anyone is interested:
for row in range(self.rows):
    for col in range(self.cols):
        myloc = (row,col)
        if (myloc) not in closed_set:
            costs[myloc] = (abs(end_row-row)+abs(end_col - col))*10
            if (myloc) not in open_set:
                open_set.add(myloc)
                parents[myloc] = cur_loc
cur_loc = min(open_set,key=lambda x:costs[x])
Worked for me. What's your question?
>>> costs = { '1': 1, '2': 2, '3': 3 }
>>> open_set = set( ['1','2'] )
>>> min(open_set,key=lambda x:costs[x])
'1'
The code you supplied would be correct if costs[x] were a dictionary lookup with tuples as keys. I think you may have meant to extract a field from the tuple and look up the cost of that field:
>>> costs = dict(red=10, green=20, blue=30)
>>> open_set = {('red', 'car'), ('green', 'boat'), ('blue', 'plane')}
>>> min(open_set, key=lambda x: costs[x[0]])
('red', 'car')
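For the case in the question, where the locations themselves are (row, col) tuples used as dictionary keys, a self-contained sketch could look like this. The grid coordinates and the `end` target are made-up sample values; the cost formula mirrors the Manhattan-distance expression in the question's edit.

```python
end = (2, 3)                          # hypothetical target cell
open_set = {(0, 0), (1, 2), (2, 2)}   # hypothetical candidate cells

# Every location in open_set must have a cost before min() is called;
# a missing key would raise KeyError inside the key function.
costs = {loc: (abs(end[0] - loc[0]) + abs(end[1] - loc[1])) * 10 for loc in open_set}

cur_loc = min(open_set, key=lambda loc: costs[loc])
print(cur_loc)  # (2, 2)
```

If some locations may lack a cost, `costs.get(loc, float('inf'))` in the key function would rank them last instead of raising.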