python intersect of dict items - python

Suppose I have a dict like:
aDict[1] = '3,4,5,6,7,8'
aDict[5] = '5,6,7,8,9,10,11,12'
aDict[n] = '5,6,77,88'
The keys are arbitrary, and there could be any number of them. I want to consider every value in the dictionary.
I want to treat each string as comma-separated values, and find the intersection across the entire dictionary (the elements common to all dict values). So in this case the answer would be '5,6'. How can I do this?

from functools import reduce # if Python 3
reduce(lambda x, y: x.intersection(y), (set(x.split(',')) for x in aDict.values()))

First of all, you need to convert these to real lists.
l1 = '3,4,5,6,7,8'.split(',')
Then you can use sets to do the intersection.
result = set(l1) & set(l2) & set(l3)

Python sets are ideal for this task. Consider the following:
intersections = None
for value in aDict.values():
    # Convert each comma-separated string into a set of integers.
    temp = set(int(num) for num in value.split(","))
    if intersections is None:
        intersections = temp
    else:
        intersections = intersections.intersection(temp)
print(intersections)

result = None
for csv_list in aDict.values():
    aList = csv_list.split(',')
    if result is None:
        result = set(aList)
    else:
        result = result & set(aList)
print(result)

Since set.intersection() accepts any number of sets, you can make do without any use of reduce():
set.intersection(*(set(v.split(",")) for v in aDict.values()))
Note that this version won't work for an empty aDict.
If you are using Python 3, and your dictionary values are bytes objects rather than strings, just split at b"," instead of ",".
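Putting the pieces above together, here is a minimal sketch that also handles the empty-dictionary case and returns the answer as a comma-separated string, as the question asks. The function name common_values is made up for illustration, and the numeric sort of the output assumes every token is an integer.

```python
def common_values(a_dict):
    """Return the comma-separated values shared by every string in a_dict."""
    # Split each value into a set of tokens, stripping stray whitespace.
    sets = [set(tok.strip() for tok in v.split(",")) for v in a_dict.values()]
    if not sets:
        # set.intersection(*[]) would raise TypeError, so handle the empty case.
        return ""
    common = set.intersection(*sets)
    # Sort numerically so the output is deterministic (assumes integer tokens).
    return ",".join(sorted(common, key=int))


aDict = {1: "3,4,5,6,7,8", 5: "5,6,7,8,9,10,11,12", "n": "5,6,77,88"}
print(common_values(aDict))  # prints 5,6
```

Returning an empty string for an empty dictionary is only one possible convention; raising an error may suit some callers better.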

Related

python efficient way to compare nested lists and append matches to new list

I wish to compare two nested lists. If there is a match between the first element of each sublist, I wish to add the matched element to a new list for further operations. Below is an example and what I've tried so far:
Example:
x = [['item1','somethingelse1'], ['item2', 'somethingelse2']...]
y = [['item1','somethingelse3'], ['item3','somethingelse4']...]
What I've tried so far:
match = []
for itemx in x:
    for itemy in y:
        if itemx[0] == itemy[0]:
            match.append(itemx)
The code above does append the matched items to the new list, but my two nested lists are very long and this approach is very slow on them. Is there a more efficient way to get the matching items between two nested lists?
Yes, use a data structure with constant-time membership testing. So, using a set, for example:
seen = set()
for first,_ in x:
    seen.add(first)
matched = []
for first,_ in y:
    if first in seen:
        matched.append(first)
Or, more succinctly using set/list comprehensions:
seen = {first for first,_ in x}
matched = [first for first,_ in y if first in seen]
(This was before the OP changed the question from append(itemx[0]) to append(itemx)...)
>>> {a[0] for a in x} & {b[0] for b in y}
{'item1'}
Or if the inner lists are always pairs:
>>> dict(x).keys() & dict(y)
{'item1'}
IIUC using numpy:
import numpy as np
y=[l[0] for l in y]
x=np.array(x)
x[np.isin(x[:, 0], y)]
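Since the question ends up appending the whole sublist (itemx) rather than just its first element, the set-based idea can be adapted as in the sketch below. The function name matching_sublists is hypothetical. Note one difference from the original nested loop: this keeps each sublist of x at most once, even if its key occurs several times in y.

```python
x = [["item1", "somethingelse1"], ["item2", "somethingelse2"]]
y = [["item1", "somethingelse3"], ["item3", "somethingelse4"]]


def matching_sublists(x, y):
    """Return the sublists of x whose first element also starts a sublist of y."""
    # Build the lookup set once; membership tests are then O(1) on average.
    keys_in_y = {sub[0] for sub in y}
    return [sub for sub in x if sub[0] in keys_in_y]


match = matching_sublists(x, y)
print(match)  # [['item1', 'somethingelse1']]
```

This makes one pass over each list instead of comparing every pair, which is where the speed-up on long lists comes from.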

Identify a single difference in a python list

I need some help with part of my code.
I have some Python lists, for example:
list1 = (1,1,1,1,1,1,5,1,1,1)
list2 = (6,7,4,4,4,1,6,7,6)
list3 = (8,8,8,8,9)
For each list, I would like to know whether there is a single value that differs from all the others, if and only if all the other values are identical. For example, in list1 it would identify 5 as the different value; in list2 it would identify nothing, since there are more than two distinct values; and in list3 it would identify 9.
What I have tried so far:
for i in list1:
    if list1.count(i) == len(list1) - 1:
        print("One value identified")
The problem is that "One value identified" is printed as many times as 1 appears in my list.
What I would like instead is output like this:
The most common value, whose count equals len(list1)-1 (here 1)
The value that is present only once (here 5)
The position of that value in the list (here, the position of 5)
You could use something like this:
def odd_one_out(lst):
    s = set(lst)
    if len(s)!=2: # see comment (1)
        return False
    else:
        return any(lst.count(x)==1 for x in s) # see comment (2)
which for the examples you provided, yields:
print(odd_one_out(list1)) # True
print(odd_one_out(list2)) # False
print(odd_one_out(list3)) # True
To explain the code I would use the first example list you provided [1,1,1,1,1,1,5,1,1,1].
(1) Converting to a set removes all duplicate values from your list, leaving you with {1, 5} (in no specific order). If the length of this set is anything other than 2, your list does not fulfill your requirements, so False is returned.
(2) Assuming the set does have a length of 2, what we need to check next is that at least one of the values it contains appears only once in the original list. That is what any() does here.
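You can use Counter from the collections module (documented under "High-performance container datatypes"):
from collections import Counter
def is_single_diff(iterable):
    c = Counter(iterable)
    # Require exactly two distinct values, one of which occurs exactly once.
    # (Counting only the values that occur more than once would wrongly
    # accept inputs such as (1, 1, 1, 5, 6).)
    return len(c) == 2 and 1 in c.values()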
Tests
list1 = (1,1,1,1,1,1,5,1,1,1)
list2 = (6,7,4,4,4,1,6,7,6)
list3 = (8,8,8,8,9)
In: is_single_diff(list1)
Out: True
In: is_single_diff(list2)
Out: False
In: is_single_diff(list3)
Out: True
Use numpy.unique; with return_counts=True it returns the distinct values together with how often each one occurs.
import numpy as np
myarray = np.array([1,1,1,1,1,1,5,1,1,1])
vals_unique,vals_counts = np.unique(myarray,return_counts=True)
You can first check for the most common value. After that, go through the list to see if there is a different value, and keep track of it.
If you later find another value that isn't the same as the most common one, the list does not have a single difference.
list1 = [1,1,1,1,1,1,5,1,1,1]
def single_difference(lst):
    most_common = max(set(lst), key=lst.count)
    diff_idx = None
    diff_val = None
    for idx, i in enumerate(lst):
        if i != most_common:
            if diff_val is not None:
                return "No unique single difference"
            diff_idx = idx
            diff_val = i
    return (most_common, diff_val, diff_idx)
print(single_difference(list1))
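As a rough sketch combining the ideas from the answers above, the following uses collections.Counter to report all three things the question asks for: the common value, the odd value, and its position. The function name find_single_difference and the choice to return None when there is no single difference are assumptions made for illustration.

```python
from collections import Counter


def find_single_difference(values):
    """Return (common_value, odd_value, odd_index), or None if no single odd value."""
    counts = Counter(values)
    # There must be exactly two distinct values overall.
    if len(counts) != 2:
        return None
    # most_common() orders the pairs from highest count to lowest.
    (common, n_common), (odd, n_odd) = counts.most_common()
    # The rarer value must appear exactly once.
    if n_odd != 1:
        return None
    return common, odd, list(values).index(odd)


list1 = (1, 1, 1, 1, 1, 1, 5, 1, 1, 1)
list2 = (6, 7, 4, 4, 4, 1, 6, 7, 6)
list3 = (8, 8, 8, 8, 9)

print(find_single_difference(list1))  # (1, 5, 6)
print(find_single_difference(list2))  # None
print(find_single_difference(list3))  # (8, 9, 4)
```

A two-element input such as (1, 5) is ambiguous, because either value could be called the odd one; this sketch simply treats the first as the common value.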

how to convert a set in python into a dictionary

I am new to python and trying to convert a Set into a Dictionary. I am struggling to find a way to make this possible. Any inputs are highly appreciated. Thanks.
Input : {'1438789225', '1438789230'}
Output : {'1438789225':1, '1438789230':2}
Use enumerate() to generate a value starting from 0 and counting upward for each item in the set, and then assign it in a comprehension:
input_set = {'1438789225', '1438789230'}
output_dict = {item:val for val,item in enumerate(input_set)}
Or a traditional loop:
output_dict = {}
for val,item in enumerate(input_set):
    output_dict[item] = val
If you want it to start from 1 instead of 0, use item:val+1 for the first snippet and output_dict[item] = val+1 for the second snippet.
That said, this dictionary would be pretty much the same as a list:
output = list(input_set)
My one-liner:
output = dict(zip(input_set, range(1, len(input_set) + 1)))
zip pairs up two iterables (lists, sets, ranges, ...) element by element: (l1[0], l2[0]), (l1[1], l2[1]), and so on.
We're feeding it two things:
the input_set
the numbers from 1 up to and including the length of the set (since you specified you wanted to count from 1 onwards, not from 0)
The output is a sequence of tuples like [('1438789225', 1), ('1438789230', 2)], which can be turned into a dict simply by feeding it to the dict constructor, dict().
But like TigerhawkT3 said, I can hardly find a use for such a dictionary. But if you have your motives there you have another way of doing it. If you take away anything from this post let it be the existence of zip.
An easy way of doing this is to iterate over the set and populate the result dictionary element by element, using a counter as the dictionary value:
def setToIndexedDict(s):
    counter = 1
    result = dict()
    for element in s:
        result[element] = counter  # map the element to the current counter value
        counter += 1  # increment the counter for the next element
    return result
My Python is pretty rusty, but this should do it:
def indexedDict(oldSet):
    dic = {}
    for elem,n in zip(oldSet, range(len(oldSet))):
        dic[elem] = n
    return dic
If I wrote anything illegal, tell me and I'll fix it. I don't have an interpreter handy.
Basically, I'm just zipping the set with a range object (essentially a continuous sequence of numbers, but more efficient than a list), then using the resulting tuples.
I'd go with Tiger's answer; this is basically a more naive version of his.
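One caveat that applies to all of the answers above: a set has no defined order, so which element gets 1 and which gets 2 is arbitrary. If the numbering should be reproducible, one option (an assumption about what is wanted, not something the question states) is to sort the set first and let enumerate() start at 1:

```python
input_set = {"1438789225", "1438789230"}

# sorted() fixes the iteration order; start=1 makes the numbering begin at 1.
output_dict = {item: n for n, item in enumerate(sorted(input_set), start=1)}

print(output_dict)  # {'1438789225': 1, '1438789230': 2}
```

Here the elements are strings, so sorted() compares them lexicographically; pass key=int to sorted() if a numeric order is preferred.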

Ordering a string by its substring numerical value in python

I have a list of strings that need to be sorted in numerical order, using two integer substrings of each string as the key.
Using the sort() function as-is orders my strings alphabetically, so I get 1, 10, 2, ..., which is not what I'm looking for.
Searching around, I found that a key parameter can be passed to sort(), and that sort(key=int) should do the trick. However, since my key is a substring rather than the whole string, calling int on the whole string would raise a conversion error.
Supposing my strings are something like:
test1txtfgf10
test1txtfgg2
test2txffdt3
test2txtsdsd1
I want my list to be ordered in numeric order on the basis of the first integer and then on the second, so I would have:
test1txtfgg2
test1txtfgf10
test2txtsdsd1
test2txffdt3
I think I could extract the integer values, sort only them keeping track of what string they belong to and then ordering the strings, but I was wondering if there's a way to do this thing in a more efficient and elegant way.
Thanks in advance
Try the following
In [26]: import re
In [27]: f = lambda x: [int(x) for x in re.findall(r'\d+', x)]
In [28]: sorted(strings, key=f)
Out[28]: ['test1txtfgg2', 'test1txtfgf10', 'test2txtsdsd1', 'test2txffdt3']
This uses regex (the re module) to find all integers in each string, then compares the resulting lists. For example, f('test1txtfgg2') returns [1, 2], which is then compared against other lists.
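To make the comparison step concrete, the sketch below runs the same idea as a plain script and prints the key computed for each string. The names numeric_key and strings are chosen for illustration; the input is the sample data from the question.

```python
import re


def numeric_key(s):
    """Return every run of digits in s as a list of ints, e.g. [1, 10]."""
    return [int(n) for n in re.findall(r"\d+", s)]


strings = ["test1txtfgf10", "test1txtfgg2", "test2txffdt3", "test2txtsdsd1"]

for s in strings:
    print(s, numeric_key(s))

# Lists compare element by element, so [1, 2] sorts before [1, 10].
result = sorted(strings, key=numeric_key)
print(result)
# ['test1txtfgg2', 'test1txtfgf10', 'test2txtsdsd1', 'test2txffdt3']
```

A string with no digits gets an empty list as its key, which sorts before every non-empty list.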
Extract the numeric parts and sort using them
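import re
d = """test1txtfgf10
test1txtfgg2
test2txffdt3
test2txtsdsd1"""
lines = d.split("\n")
# Raw string, so the \d escapes reach the regex engine unchanged.
re_numeric = re.compile(r"^[^\d]+(\d+)[^\d]+(\d+)$")
def key(line):
    """Returns a tuple (n1, n2) of the numeric parts of line."""
    m = re_numeric.match(line)
    if m:
        # m.group(i) returns the i-th captured substring.
        return (int(m.group(1)), int(m.group(2)))
    else:
        # Non-matching lines get None, which Python 3 cannot order against tuples.
        return None
lines.sort(key=key)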
Now lines are
['test1txtfgg2', 'test1txtfgf10', 'test2txtsdsd1', 'test2txffdt3']
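import re
k = [
    "test1txtfgf10",
    "test1txtfgg2",
    "test2txffdt3",
    "test2txtsdsd1"
]
# Pair each string with its numeric parts, converted to int so they compare numerically.
tmp = [([int(e) for e in re.split("[a-z]", el) if e], el) for el in k]
# Sort the pairs by their numeric parts and keep the sorted result.
tmp = sorted(tmp, key=lambda pair: pair[0])
result = [res for nums, res in tmp]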

Given two lists in python one with strings and one with objects, how do you map them?

I have a list of strings
string_list = ["key_val_1", "key_val_2", "key_val_3", "key_val_4", ...]
and a list with objects
object_list = [object_1, object_2, object_3,...]
Every object object_i has an attribute key.
I want to sort the objects in object_list by the order of string_list.
I could do something like
new_list = []
for key in string_list:
for object in object_list:
if object.key == key:
new_list.append(object)
but there must be a more pythonic way than this brute-force one. :-) How would you solve this?
First, create a dictionary mapping object keys to objects:
d = dict((x.key, x) for x in object_list)
Next create the sorted list using a list comprehension:
new_list = [d[key] for key in string_list]
Map each key to its desired precedence:
key_precedence = dict((x, n) for n, x in enumerate(string_list))
Then sort by precedence:
object_list.sort(key=lambda x: key_precedence[x.key])
To handle keys that might not be in string_list, pick one of the two defaults below:
import sys
default = -1 # put "unknown" in front
default = sys.maxsize # put "unknown" in back (use only one of these two lines)
object_list.sort(key=lambda x: key_precedence.get(x.key, default))
If string_list is short (e.g. 10 or fewer items), you can simplify:
object_list.sort(key=lambda x: string_list.index(x.key))
# But it's more cumbersome to handle defaults this way.
However, this is prohibitive for larger lengths of string_list.
In Python 2, you can use the cmp argument of the sort() method (Python 3 removed both the cmp argument and the cmp() built-in):
object_list.sort(cmp=lambda x,y: cmp(string_list.index(x.key),
                                     string_list.index(y.key)))
or use sorted() to get a new list instead of sorting in place:
sorted(object_list, cmp=lambda x,y: cmp(string_list.index(x.key),
                                        string_list.index(y.key)))
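The cmp-based calls above only run on Python 2. On Python 3, a comparison function can still be used by wrapping it with functools.cmp_to_key. The sketch below assumes small stand-in objects built with types.SimpleNamespace, and the name compare is made up for illustration.

```python
from functools import cmp_to_key
from types import SimpleNamespace

string_list = ["key_val_1", "key_val_2", "key_val_3"]
# Stand-in objects; any object with a .key attribute would do.
object_list = [
    SimpleNamespace(key="key_val_3"),
    SimpleNamespace(key="key_val_1"),
    SimpleNamespace(key="key_val_2"),
]


def compare(a, b):
    """Return a negative, zero, or positive number, like Python 2's cmp()."""
    ia, ib = string_list.index(a.key), string_list.index(b.key)
    return (ia > ib) - (ia < ib)


ordered = sorted(object_list, key=cmp_to_key(compare))
print([o.key for o in ordered])  # ['key_val_1', 'key_val_2', 'key_val_3']
```

Each comparison still calls string_list.index() twice, so for long lists the precomputed key_precedence dictionary from the earlier answer is likely the better choice.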
