Python intersection of arrays in dictionary

I have a dictionary of arrays like this:
y_dict= {1: np.array([5, 124, 169, 111, 122, 184]),
2: np.array([1, 2, 3, 4, 5, 6, 111, 184]),
3: np.array([169, 5, 111, 152]),
4: np.array([0, 567, 5, 78, 90, 111]),
5: np.array([]),
6: np.array([])}
I need to find the intersection of the arrays in my dictionary, y_dict.
As a first step I removed the empty arrays from the dictionary, like this:
dic = {i:j for i,j in y_dict.items() if np.array(j).size != 0}
So, dic now looks like this:
dic = { 1: np.array([5, 124, 169, 111, 122, 184]),
2: np.array([1, 2, 3, 4, 5, 6, 111, 184]),
3: np.array([169, 5, 111, 152]),
4: np.array([0, 567, 5, 78, 90, 111])}
To find the intersection I tried a tuple approach, like this:
result_dic = list(set.intersection(*({tuple(p) for p in v} for v in dic.values())))
Actual result is an empty list: []
Expected result: [5, 111]
Could you please help me find the intersection of the arrays in the dictionary? Thanks

The code you posted is overcomplex and wrong because there's one extra inner iteration that needs to go. You want to do:
result_dic = list(set.intersection(*(set(v) for v in dic.values())))
or with map and without a for loop:
result_dic = list(set.intersection(*(map(set,dic.values()))))
Result:
[5, 111]
How it works:
- iterate on the values (ignore the keys)
- convert each numpy array to a set (converting to tuple also works, but intersection would convert those to sets anyway)
- pass the lot to intersection with argument unpacking
We can even skip the separate step of removing the empty arrays by creating a set from every array and filtering out the empty ones using filter:
result_dic = list(set.intersection(*(filter(None,map(set,y_dict.values())))))
That's for the sake of a one-liner, but in real life expressions are better decomposed so they are more readable and easier to comment. Decomposing also helps avoid the TypeError that set.intersection raises when it is called with no arguments (because there were no non-empty sets), a case that defeats the smart way to intersect sets (first described in "Best way to find the intersection of multiple sets?").
Just create the list beforehand, and call intersection only if the list is not empty. If it is empty, just return an empty list instead:
non_empty_sets = [set(x) for x in y_dict.values() if x.size]
result_dic = list(set.intersection(*non_empty_sets)) if non_empty_sets else []
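The decomposed approach can also be wrapped in a small function. This is only a minimal sketch: the name common_elements is made up for illustration, plain lists stand in for the numpy arrays so the example runs without numpy, and returning an empty list when every input is empty is an assumption about the desired behaviour.

```python
def common_elements(arrays_by_key):
    """Return the sorted values shared by all non-empty sequences in the dict."""
    # len() works for lists and for 1-D numpy arrays alike.
    non_empty_sets = [set(values) for values in arrays_by_key.values() if len(values)]
    if not non_empty_sets:
        # Assumption: "nothing to intersect" is reported as an empty list.
        return []
    return sorted(set.intersection(*non_empty_sets))


y_dict = {
    1: [5, 124, 169, 111, 122, 184],
    2: [1, 2, 3, 4, 5, 6, 111, 184],
    3: [169, 5, 111, 152],
    4: [0, 567, 5, 78, 90, 111],
    5: [],
    6: [],
}
result = common_elements(y_dict)
print(result)  # [5, 111]
```

Sorting the result is optional; it only makes the output order predictable, since sets are unordered.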

You should be using numpy's intersection here, not directly in Python. And you'll need to add special handling for the empty intersection.
>>> intersection = None
>>> for a in y_dict.values():
...     if a.size:
...         if intersection is None:
...             intersection = a
...             continue
...         intersection = np.intersect1d(intersection, a)
...
>>> if intersection is not None:
...     print(intersection)
...
[ 5 111]
If intersection is still None, all of the arrays in y_dict had size zero (no elements). In that case the intersection is not well-defined, so you have to decide for yourself what the code should do. Raising an exception is probably right, but it depends on the use case.

Related

How to make a Custom Sorting Function for Dictionary Key Values?

I have a dictionary whose key values are kind of like this,
CC-1A
CC-1B
CC-1C
CC-3A
CC-3B
CC-5A
CC-7A
CC-7B
CC-7D
SS-1A
SS-1B
SS-1C
SS-3A
SS-3B
SS-5A
SS-5B
lst = ['CC-1A', 'CC-1B', 'CC-1C', 'CC-3A', 'CC-3B', 'CC-5A', 'CC-7A', 'CC-7B',
'CC-7D', 'SS-1A', 'SS-1B', 'SS-1C', 'SS-3A', 'SS-3B', 'SS-5A', 'SS-5B']
d = dict.fromkeys(lst)
^Not exactly in this order, but in fact they are all randomly placed in the dictionary as key values.
Now, I want to sort them. If I use the built in function to sort the dictionary, it sorts all the key values according to the order given above.
However, I want the dictionary to be first sorted based upon the values after the - sign (i.e. 1A, 1B, 1C etc.) and then based upon the first two characters.
So, for the values given above, following would be my sorted list,
CC-1A
CC-1B
CC-1C
SS-1A
SS-1B
SS-1C
CC-3A
CC-3B
SS-3A
SS-3B
CC-5A
and so on
First, sorting is done based upon the "4th" character in the keys. (that is, 1, 3, etc.)
Then sorting is done based upon the last character (i.e. A, B etc.)
Then sorting is done based upon the first two characters of the keys (i.e. CC, SS etc.)
Is there any way to achieve this?
Your desired output and your description of the sort order disagree.
The desired output can be achieved with
di = {"CC-1A":"value1","CC-1A":"value2","CC-1B":"value3",
      "CC-1C":"value4","CC-3A":"value5","CC-3B":"value6",
      "CC-5A":"value7","CC-7A":"value8","CC-7B":"value9",
      "CC-7D":"value0","SS-1A":"value11","SS-1B":"value12",
      "SS-1C":"value13","SS-3A":"value14","SS-3B":"value15",
      "SS-5A":"value16","SS-5B":"value17"}
print(*((v, di[v]) for v in sorted(di, key=lambda x: (x[3], x[:2], x[4]))),
      sep="\n")
to get
('CC-1A', 'value2')
('CC-1B', 'value3')
('CC-1C', 'value4')
('SS-1A', 'value11')
('SS-1B', 'value12')
('SS-1C', 'value13')
('CC-3A', 'value5')
('CC-3B', 'value6')
('SS-3A', 'value14')
('SS-3B', 'value15')
('CC-5A', 'value7')
('SS-5A', 'value16')
('SS-5B', 'value17')
('CC-7A', 'value8')
('CC-7B', 'value9')
('CC-7D', 'value0')
which sorts by the number (character 4, 1-based), then the prefix (characters 1 and 2), then the letter (character 5),
but that conflicts with your description:
First, sorting is done based upon the "4th" character in the keys.
(that is, 1, 3, etc.)
Then sorting is done based upon the last character (i.e. A, B etc.)
Then sorting is done based upon the first two characters of the keys
(i.e. CC, SS etc.)
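If the written description is what is actually wanted (number, then letter, then prefix), only the order of the fields in the key tuple changes. A minimal sketch using the key list from the question; the function name described_order is made up for illustration, and it assumes every key has exactly the shape "XX-NL" (two-character prefix, one digit, one letter).

```python
lst = ['CC-1A', 'CC-1B', 'CC-1C', 'CC-3A', 'CC-3B', 'CC-5A', 'CC-7A', 'CC-7B',
       'CC-7D', 'SS-1A', 'SS-1B', 'SS-1C', 'SS-3A', 'SS-3B', 'SS-5A', 'SS-5B']


def described_order(key):
    # For "CC-1A": key[3] is the digit, key[4] the letter, key[:2] the prefix.
    return (key[3], key[4], key[:2])


ordered = sorted(lst, key=described_order)
print(ordered[:6])
# ['CC-1A', 'SS-1A', 'CC-1B', 'SS-1B', 'CC-1C', 'SS-1C']
```

Compared with the desired output in the question, the CC and SS keys now alternate within each number, which is how the two orderings differ.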
One suggestion is to use a nested dictionary, so instead of:
my_dict = {'CC-1A1': 2,
'CC-1A2': 3,
'CC-1B': 1,
'CC-1C': 5,
'SS-1A': 33,
'SS-1B': 23,
'SS-1C': 31,
'CC-3A': 55,
'CC-3B': 222,
}
you would have something like:
my_dict = {'CC': {'1A1': 2, '1A2': 3, '1B': 1, '1C': 5, '3A': 55, '3B': 222},
           'SS': {'1A': 33, '1B': 23, '1C': 31}
           }
which would allow you to sort first based on the leading number/characters and then by group. (Actually I think you want this concept reversed based on your question).
Then you can create two lists with your sorted keys/values by doing something like:
top_keys = sorted(my_dict)
keys_sorted = []
values_sorted = []
for key in top_keys:
    keys_sorted.append([f"{key}-{k}" for k in my_dict[key].keys()])
    values_sorted.append([v for v in my_dict[key].values()])
flat_keys = [key for sublist in keys_sorted for key in sublist]
flat_values = [value for sublist in values_sorted for value in sublist]
Otherwise, you'd have to implement a custom sort based first on the characters after the - and then on the initial characters.
You can write a function to build a sorting key that will make the required decomposition of the key strings and return a tuple to sort by. Then use that function as the key= parameter of the sorted function:
D = {'CC-1A': 0, 'CC-1B': 1, 'CC-1C': 2, 'CC-3A': 3, 'CC-3B': 4,
'CC-5A': 5, 'CC-7A': 6, 'CC-7B': 7, 'CC-7D': 8, 'SS-1A': 9,
'SS-1B': 10, 'SS-1C': 11, 'SS-3A': 12, 'SS-3B': 13, 'SS-5A': 14,
'SS-5B': 15}
def sortKey(s):
    L,R = s.split("-",1)
    return (R[:-1],L)
D={k:D[k] for k in sorted(D.keys(),key=sortKey)}
print(D)
{'CC-1A': 0,
'CC-1B': 1,
'CC-1C': 2,
'SS-1A': 9,
'SS-1B': 10,
'SS-1C': 11,
'CC-3A': 3,
'CC-3B': 4,
'SS-3A': 12,
'SS-3B': 13,
'CC-5A': 5,
'SS-5A': 14,
'SS-5B': 15,
'CC-7A': 6,
'CC-7B': 7,
'CC-7D': 8}
If you expect the numbers to eventually go beyond 9 and want a numerical order, then right justify the R part in the tuple: e.g. return (R[:-1].rjust(10),L)
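An alternative to right-justifying is to convert the numeric part to an int inside the key function. The following is a sketch, not part of the answer above: the name numeric_sort_key and the regular expression are assumptions, and it expects every key to look like letters, a dash, digits, then optional letters.

```python
import re

# Assumed key shape: PREFIX-NUMBER followed by optional letters, e.g. "CC-10A".
KEY_PATTERN = re.compile(r"([A-Za-z]+)-(\d+)([A-Za-z]*)")


def numeric_sort_key(key):
    match = KEY_PATTERN.fullmatch(key)
    if match is None:
        raise ValueError(f"unexpected key format: {key!r}")
    prefix, number, suffix = match.groups()
    # Compare the number as an integer so that 10 sorts after 2.
    return (int(number), prefix, suffix)


keys = ['CC-10A', 'SS-2B', 'CC-2A', 'SS-10A', 'CC-2B']
ordered = sorted(keys, key=numeric_sort_key)
print(ordered)
# ['CC-2A', 'CC-2B', 'SS-2B', 'CC-10A', 'SS-10A']
```

Raising on an unexpected key is a design choice for the sketch; silently sorting malformed keys somewhere arbitrary tends to hide data problems.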
You could use a custom function that implements your rule as sorting key:
def get_order(tpl):
    s = tpl[0].split('-')
    return (s[1][0], s[0], s[1][1])
out = dict(sorted(d.items(), key=get_order))
Output:
{'CC-1A': None, 'CC-1B': None, 'CC-1C': None, 'SS-1A': None, 'SS-1B': None, 'SS-1C': None, 'CC-3A': None, 'CC-3B': None, 'SS-3A': None, 'SS-3B': None, 'CC-5A': None, 'SS-5A': None, 'SS-5B': None, 'CC-7A': None, 'CC-7B': None, 'CC-7D': None}

How to get a list of values from tuples which the second value is the same as first in the next tuple?

I'm having trouble creating a list of values from a list of tuples. The tuples link together wherever the second value of one tuple equals the first value of another, and the resulting chain should start and end with given values.
For example:
start = 11
end = 0
list_tups = [(0,1),(0, 2),(0, 3),(261, 0),(8, 15),(118, 32),(11, 8),(15, 118),(32, 261)]
So I want to iterate through the list of tuples, starting with the tuple whose first value equals the start value, and follow the links until I reach a tuple whose second value is the end value.
So my desired output would be:
[11, 8, 15, 118, 32, 261, 0]
I understand how to check the values; I'm just having trouble iterating through the tuples each time to check whether there is a tuple in the list that matches the second value.
You are describing pathfinding in a directed graph.
>>> import networkx as nx
>>> g = nx.DiGraph(list_tups)
>>> nx.shortest_path(g, start, end)
[11, 8, 15, 118, 32, 261, 0]
This doesn't work with end = 0 because no 0 appears after the start value in the flattened list, but here it is with 32:
>>> start = 11
>>> end = 32
>>> flattened = [i for t in list_tups for i in t]
>>> flattened[flattened.index(start):flattened.index(end, flattened.index(start))+1]
[11, 8, 15, 118, 32]
You can recursively search the tuples, moving the start value closer and closer to the end. The path is accumulated as we move back up through the chain, so you may need to tweak it a little to get your desired outcome (I believe you'll need to append the first starting value, then reverse it).
def find(start, end, tuples, path):
    for t in tuples:
        if t[0] == start:
            # pass path down so every level of the recursion appends to the same list
            if t[1] == end or find(t[1], end, tuples, path):
                path.append(t[1])
                return True
    return False
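The same search can be written without recursion and without a third-party library. The sketch below is one possible way, not the method of any answer above: the name find_chain is made up, it treats each tuple as a directed link, and it uses a breadth-first search so that nodes with several outgoing links (like 0 in the example) and cycles are handled.

```python
from collections import defaultdict, deque


def find_chain(start, end, pairs):
    """Return the list of values linking start to end, or None if no chain exists."""
    successors = defaultdict(list)
    for first, second in pairs:
        successors[first].append(second)

    previous = {start: None}  # maps each visited value to the value it was reached from
    queue = deque([start])
    while queue:
        node = queue.popleft()
        if node == end:
            # Walk the recorded links backwards, then reverse to get start -> end.
            path = []
            while node is not None:
                path.append(node)
                node = previous[node]
            return path[::-1]
        for nxt in successors[node]:
            if nxt not in previous:
                previous[nxt] = node
                queue.append(nxt)
    return None


list_tups = [(0, 1), (0, 2), (0, 3), (261, 0), (8, 15), (118, 32),
             (11, 8), (15, 118), (32, 261)]
print(find_chain(11, 0, list_tups))
# [11, 8, 15, 118, 32, 261, 0]
```

Because visited values are recorded in previous, the loop terminates even if the tuples form a cycle, which the plain recursive version does not guarantee.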

How can we initialize a python array without copying values from a different container?

The following code snippet shows how to initialize a python array from various container classes (tuple, list, dictionary, set, etc...)
import array as arr
ar_iterator = arr.array('h', range(100))
ar_tuple = arr.array('h', (0, 1, 2,))
ar_list = arr.array('h', [0, 1, 2,])
ar_dict_keys = arr.array('h', {0:None, 1:None, 2:None}.keys())
ar_set = arr.array('h', set(range(100)))
ar_fset = arr.array('h', frozenset(range(100)))
The array initialized from range(100) is particularly nice because an iterator does not need to store a hundred elements. It can simply store the current value and a transition function describing how to calculate the next value from the current value (add one to the current value every time __next__ is called).
However, what if the initial values of an array do not follow a simple pattern, such as counting upwards 0, 1, 2, 3, 4, ..., 99? An iterator might not be practical. It makes no sense to create a list, copy the list to the array, and then delete the list: you have essentially created the array twice and copied it unnecessarily. Is there some way to construct an array directly, by passing in the initial values?
From the python docs (https://docs.python.org/3/library/array.html):
class array.array(typecode[, initializer])
A new array whose items are restricted by typecode, and initialized from the optional initializer value, which must be a list, a bytes-like object, or iterable over elements of the appropriate type.
So it would appear that you are constrained to passing in an initial python container.
Assuming that the initial elements can be derived logically, you could pass a generator as the initialiser. Generators yield their elements as they are iterated over, similar to range.
>>> import array, random
>>> def g():
...     for _ in range(10):
...         yield random.randint(0, 100)
...
>>> arr = array.array('h', g())
>>> arr
array('h', [47, 6, 91, 0, 76, 20, 77, 75, 46, 7])
For simple cases, a generator expression can be used:
>>> arr = array.array('h', (random.randint(0, 100) for _ in range(10)))
>>> arr
array('h', [72, 30, 40, 58, 77, 74, 25, 6, 71, 58])
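If the values arrive one at a time, another option is to start from an empty array and let it grow, so no intermediate list is built. A minimal sketch; read_samples is a hypothetical stand-in for whatever produces the values, and the numbers are arbitrary.

```python
import array


def read_samples():
    # Hypothetical data source; in practice this might read a file or a sensor.
    for value in (12, -7, 300, 42):
        yield value


# Option 1: append values one by one to an initially empty array.
samples = array.array('h')
for value in read_samples():
    samples.append(value)

# Option 2: let extend() consume the iterable directly.
more = array.array('h')
more.extend(read_samples())

print(samples)  # array('h', [12, -7, 300, 42])
```

Both options store each value straight into the array's own buffer. Each value must fit the typecode; 'h' is a signed short.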

Python- printing n'th level sublist

I have a complicated list arrangement. There are many lists, and some of them have sub-lists. Now, some of the elements from the aforementioned lists are to be printed. What makes it more complicated is, the index of the value to be printed is in an excel file, as shown here:
[list_1,1,2] #Means - list_1[1][2] is to be printed (sub-lists are there)
[list_2,7] #Means - list_2[7] is to be printed (no sub-list)
................
[list_100,3,6] #Means list_100[3][6] is to be printed (sub list is there)
There are so many lists that I was using a for loop and multiple if statements. For example (pseudo code):
for i in range(100): #because 100 lists are there in excel
    if len(row_i) == 3:
        print(list_name[excel_column_1_value][excel_column_2_value])
    else:
        print(list_name[excel_column_1_value])
Please note that the Excel sheet is only used to get the list name and index; the lists themselves are all saved in the main code.
Is there any way to avoid the if statements and automate that part as well? I ask because the if condition depends only on the length given by the Excel sheet. Thanks in advance.
Suppose you have data like this:
data = {
"list1": [[100, 101, 102], [110, 111, 112], [120, 121, 123]],
"list2": [0, 1, 2, 3, 4, 5, 6, 7, 8, 9],
"list3": [[200, 201, 202], [210, 211, 212], [220, 221, 223]],
}
If this is homework, your teacher probably wants you to solve it using recursion, but I recommend an iterative version in Python unless you can be sure you will not stack more than 999 calls:
def fetch_element(data, listname, *indices):
    value = data[listname]
    for index in indices:
        value = value[index]
    return value
Then you have the list of elements you want:
desired = [
["list1", 0, 0],
["list2", 7],
["list3", 2, 2],
]
Now you can do:
>>> [fetch_element(data, *line) for line in desired]
[100, 7, 223]
Which is the same as:
>>> [data["list1"][0][0], data["list2"][7], data["list3"][2][2]]
[100, 7, 223]
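The index-following loop can also be expressed with functools.reduce and operator.getitem. This is a sketch of an equivalent formulation rather than a better one; the name fetch_with_reduce is made up to keep it apart from the function above, and the data is the sample data from this answer.

```python
from functools import reduce
from operator import getitem

data = {
    "list1": [[100, 101, 102], [110, 111, 112], [120, 121, 123]],
    "list2": [0, 1, 2, 3, 4, 5, 6, 7, 8, 9],
    "list3": [[200, 201, 202], [210, 211, 212], [220, 221, 223]],
}


def fetch_with_reduce(data, listname, *indices):
    # reduce applies getitem once per index: ((data[listname])[i0])[i1]...
    return reduce(getitem, indices, data[listname])


desired = [["list1", 0, 0], ["list2", 7], ["list3", 2, 2]]
results = [fetch_with_reduce(data, *line) for line in desired]
print(results)  # [100, 7, 223]
```

With no indices at all, reduce returns its initial value, so fetch_with_reduce(data, "list2") gives back the whole list.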
Can you post a better example? What does your list look like, and what is the desired output when printing?
You can open the file, read the list names and indexes you want to print into a list, and iterate over that list to print what you want.
There are many ways to print a list. A simple one is:
mylist = ['hello', 'world', ':)']
print(', '.join(mylist))
mylist2 = [['hello', 'world'], ['Good', 'morning']]
for l in mylist2:
    print(*l)

Constructing the largest number possible by rearranging a list

Say I have an array of positive whole integers; I'd like to manipulate the order so that the concatenation of the resultant array is the largest number possible. For example [97, 9, 13] results in 99713; [9,1,95,17,5] results in 9955171. I'm not sure of an answer.
sorted(x, cmp=lambda a, b: -1 if str(b)+str(a) < str(a)+str(b) else 1)
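The one-liner above relies on the cmp argument, which sorted() accepts only in Python 2. A minimal Python 3 sketch of the same pairwise idea uses functools.cmp_to_key; the names largest_number and compare are made up for illustration.

```python
from functools import cmp_to_key


def largest_number(numbers):
    digits = [str(n) for n in numbers]

    def compare(a, b):
        # a + b and b + a have the same length, so comparing them as strings
        # gives the same answer as comparing them as numbers.
        if a + b > b + a:
            return -1  # a should come first
        if a + b < b + a:
            return 1   # b should come first
        return 0

    return ''.join(sorted(digits, key=cmp_to_key(compare)))


print(largest_number([97, 9, 13]))        # 99713
print(largest_number([9, 1, 95, 17, 5]))  # 9955171
```

Returning 0 for ties keeps the comparison consistent, which the original lambda does not do when the two concatenations are equal.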
Intuitively, we can see that a reverse sort of single-digit numbers would lead to the highest number:
>>> ''.join(sorted(['1', '5', '2', '9'], reverse=True))
'9521'
so reverse sorting should work. The problem arises when there are multi-digit snippets in the input. Here, intuition again lets us order 9 before 95 and 17 before 1, but why does that work? Again, if they had been the same length, it would have been clear how to sort them:
95 < 99
96 < 97
14 < 17
The trick then, is to 'extend' shorter numbers so they can be compared with the longer ones and can be sorted automatically, lexicographically. All you need to do, really, is to repeat the snippet to beyond the maximum length:
comparing 9 and 95: compare 999 and 9595 instead and thus 999 comes first.
comparing 1 and 17: compare 111 and 1717 instead and thus 1717 comes first.
comparing 132 and 13: compare 132132 and 1313 instead and thus 132132 comes first.
comparing 23 and 2341: compare 232323 and 23412341 instead and thus 2341 comes first.
This works because python only needs to compare the two snippets until they differ somewhere; and it's (repeating) matching prefixes that we need to skip when comparing two snippets to determine which order they need to be in to form a largest number.
You only need to repeat a snippet until it is longer than the longest snippet * 2 in the input to guarantee that you can find the first non-matching digit when comparing two snippets.
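The four comparisons listed above can be checked mechanically. This is only an illustrative sketch: the helper repeat_to is made up, and it repeats each snippet to twice the length of the longer one in the pair, which is one way to satisfy the rule just stated.

```python
def repeat_to(snippet, length):
    """Repeat snippet until it is at least `length` characters long."""
    repeats = -(-length // len(snippet))  # ceiling division
    return snippet * repeats


pairs = [("9", "95"), ("1", "17"), ("132", "13"), ("23", "2341")]
first = []
for a, b in pairs:
    length = 2 * max(len(a), len(b))
    # The snippet whose repeated form is lexicographically larger goes first.
    first.append(a if repeat_to(a, length) > repeat_to(b, length) else b)

print(first)  # ['9', '17', '132', '2341']
```

The printed winners match the outcomes described in the list: 9 before 95, 17 before 1, 132 before 13, and 2341 before 23.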
You can do this with a key argument to sorted(), but you need to determine the maximum length of the snippets first. Using that length, you can 'pad' all snippets in the sort key until they are longer than that maximum length:
def largestpossible(snippets):
    snippets = [str(s) for s in snippets]
    mlen = max(len(s) for s in snippets) * 2  # double the length of the longest snippet
    return ''.join(sorted(snippets, reverse=True, key=lambda s: s*(mlen//len(s)+1)))
where s*(mlen//len(s)+1) pads the snippet with itself to be more than mlen in length.
This gives:
>>> combos = {
... '12012011': [1201, 120, 1],
... '87887': [87, 878],
... '99713': [97, 9, 13],
... '9955171': [9, 1, 95, 17, 5],
... '99799713': [97, 9, 13, 979],
... '10100': [100, 10],
... '13213': [13, 132],
... '8788717': [87, 17, 878],
... '93621221': [936, 21, 212],
... '11101110': [1, 1101, 110],
... }
>>> def test(f):
...     for k,v in combos.items():
...         print('{} -> {} ({})'.format(v, f(v), 'correct' if f(v) == k else 'incorrect, should be {}'.format(k)))
...
>>> test(largestpossible)
[97, 9, 13] -> 99713 (correct)
[1, 1101, 110] -> 11101110 (correct)
[936, 21, 212] -> 93621221 (correct)
[13, 132] -> 13213 (correct)
[97, 9, 13, 979] -> 99799713 (correct)
[87, 878] -> 87887 (correct)
[1201, 120, 1] -> 12012011 (correct)
[100, 10] -> 10100 (correct)
[9, 1, 95, 17, 5] -> 9955171 (correct)
[87, 17, 878] -> 8788717 (correct)
Note that this solution a) is only 3 lines long, b) works on Python 3 as well without having to resort to functools.cmp_to_key(), and c) does not brute-force the solution (which is what the itertools.permutations option does).
Hint one: you concatenate strings, not integers.
Hint two: itertools.permutations().
import itertools
nums = ["9", "97", "13"]
m = max(("".join(p) for p in itertools.permutations(nums)), key = int)
You can use itertools.permutations as hinted and use the key argument of the max function (which tells which function to apply to each element in order to decide the maximum) after you concat them with the join function.
It's easier to work with strings to begin with.
I don't like the brute force approach to this. It requires a massive amount of computation for large sets.
You can write your own comparison function for the sorted builtin method, which will return a sorting parameter for any pair, based on any logic you put in the function.
Sample code:
from functools import cmp_to_key
def compareInts(a, b):
    # create string representations
    sa = str(a)
    sb = str(b)
    # compare character by character, left to right
    # up to first inequality
    # if you hit the end of one str before the other,
    # and all is equal up til then, continue to next step
    for i in range(min(len(sa), len(sb))):
        if sa[i] > sb[i]:
            return 1
        elif sa[i] < sb[i]:
            return -1
    # if we got here, they are both identical up to the length of the shorter
    # one.
    # this means we need to compare the shorter number again to the
    # remainder of the longer
    # at this point we need to know which is shorter
    if len(sa) > len(sb):  # sa is longer, so slice it
        return compareInts(sa[len(sb):], sb)
    elif len(sa) < len(sb):  # sb is longer, slice it
        return compareInts(sa, sb[len(sa):])
    else:
        # both are the same length, and therefore equal, return 0
        return 0
def NumberFromList(numlist):
    return int(''.join('{}'.format(n) for n in numlist))
nums = [97, 9, 13, 979]
# sorted() accepts cmp= only in Python 2; cmp_to_key works in 2.7 and 3
sortednums = sorted(nums, key=cmp_to_key(compareInts), reverse=True)
print(nums)  # [97, 9, 13, 979]
print(sortednums)  # [9, 979, 97, 13]
print(NumberFromList(sortednums))  # 99799713
Well, there's always the brute force approach...
from itertools import permutations
lst = [9, 1, 95, 17, 5]
max(int(''.join(str(x) for x in y)) for y in permutations(lst))
=> 9955171
Or this, an adaptation of @Zah's answer that receives a list of integers and returns an integer, as specified in the question:
int(max((''.join(y) for y in permutations(str(x) for x in lst)), key=int))
=> 9955171
You can do this with some clever sorting.
If two strings are the same length, choose the larger of the two to come first. Easy.
If they're not the same length, figure out what would be the result if the best possible combination were appended to the shorter one. Since everything that follows the shorter one must be equal to or less than it, you can determine this by appending the short one to itself until it's the same size as the longer one. Once they're the same length you do a direct comparison as before.
If the second comparison is equal, you've proven that the shorter string can't possibly be better than the longer one. Depending on what it's paired with it could still come out worse, so the longer one should come first.
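from functools import cmp_to_key
def compare(s1, s2):
    if len(s1) == len(s2):
        return -1 if s1 > s2 else int(s2 > s1)
    s1x, s2x = s1, s2
    m = max(len(s1), len(s2))
    # repeat each string until it is at least m long, then cut it to exactly m
    while len(s1x) < m:
        s1x = s1x + s1
    s1x = s1x[:m]
    while len(s2x) < m:
        s2x = s2x + s2
    s2x = s2x[:m]
    return -1 if s1x > s2x or (s1x == s2x and len(s1) > len(s2)) else 1
def solve_puzzle(seq):
    # sorted() accepts cmp= only in Python 2; cmp_to_key works in 2.7 and 3
    return ''.join(sorted([str(x) for x in seq], key=cmp_to_key(compare)))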
>>> solve_puzzle([9, 1, 95, 17, 5])
'9955171'
>>> solve_puzzle([97, 9, 13])
'99713'
>>> solve_puzzle([936, 21, 212])
'93621221'
>>> solve_puzzle([87, 17, 878])
'8788717'
>>> solve_puzzle([97, 9, 13, 979])
'99799713'
This should be much more efficient than running through all the permutations.
import itertools
def largestInt(a):
    b = list(itertools.permutations(a))
    c = []
    x = ""
    for i in range(len(b)):
        c.append(x.join(map(str, b[i])))
    # every candidate has the same length, so the largest string is also the largest number
    return max(c)
