Adjacency matrix without nested loop - python

I'm trying to create an adjacency matrix for a set of team players. I've stored the match details like this:
x={'match1':[1,2,3],'match2':[2,3,4],'match3':[3,4,5]}
Here each key maps to a list of the team players who played in that match.
I'm trying to create an adjacency matrix that shows how many matches each team player played with every other team member.
The output should look like this
  1 2 3 4 5
1 1 1 1 0 0
2 1 2 2 1 0
3 1 2 3 2 1
4 0 1 2 2 1
5 0 0 1 1 1
(i,i) element in the matrix is the total number of matches played by that team member. I've been able to calculate the values correctly using Counter.
from collections import defaultdict, Counter
if __name__=='__main__':
    x = {'match1': [1, 2, 3], 'match2': [2, 3, 4], 'match3': [3, 4, 5]}
    d = defaultdict(list)
    col_count = dict()
    for key, value in x.items():
        for i in value:
            d[i] += value
    for key, value in d.items():
        col_count[key] = Counter(d[key])
    print(col_count)
The output is:
{1: Counter({1: 1, 2: 1, 3: 1}), 2: Counter({2: 2, 3: 2, 1: 1, 4: 1}), 3: Counter({3: 3, 2: 2, 4: 2, 1: 1, 5: 1}), 4: Counter({3: 2, 4: 2, 2: 1, 5: 1}), 5: Counter({3: 1, 4: 1, 5: 1})}
Given that the x dictionary will contain a large number of keys, and each key will have a list with many elements, I wish to avoid the use of nested for loops.
Is it possible to store the match details as any other data type so that subsequent computation will not require nested loops?
If a dictionary is the best way, is it possible to calculate the matrix some other way?

Without modifying the input and output formats I do not see how to avoid nested loops as you have information grouped by match and you want to extract it by player. What you could actually do is avoid the last loop by creating the Counter inside the nested loop:
from collections import defaultdict, Counter
if __name__=='__main__':
    x = {'match1': [1, 2, 3], 'match2': [2, 3, 4], 'match3': [3, 4, 5]}
    d = defaultdict(Counter)
    for key, value in x.items():
        for i in value:
            d[i].update(value)
    print(d)
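As a related sketch using only the standard library, the same counts can be collected in a single Counter keyed by (player, teammate) pairs and then laid out as the matrix from the question. The iteration over players still happens, it is just delegated to itertools.product and comprehensions. The names pair_counts, players and matrix are chosen here for illustration.

```python
from collections import Counter
from itertools import product

x = {'match1': [1, 2, 3], 'match2': [2, 3, 4], 'match3': [3, 4, 5]}

# Count every ordered (player, teammate) pair once per match.
# The pair (p, p) is included, so the diagonal holds matches played.
pair_counts = Counter(
    pair for team in x.values() for pair in product(team, repeat=2)
)

# Fix a row/column order, then read the counts out as a dense matrix.
players = sorted({p for team in x.values() for p in team})
matrix = [[pair_counts[(row, col)] for col in players] for row in players]

for row in matrix:
    print(*row)
```

Missing pairs read as 0 because Counter returns 0 for unknown keys.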
If the input can be modified you could go for something like:
from collections import Counter
if __name__=='__main__':
    x = {1: [{1, 2, 3}], 2: [{1, 2, 3}, {2, 3, 4}], 3: [{1, 2, 3}, {2, 3, 4}, {3, 4, 5}], 4: [{2, 3, 4}, {3, 4, 5}], 5: [{3, 4, 5}]}
    d = {player: Counter([p for match in matches for p in match]) for player, matches in x.items()}
    print(d)
Here you swap the nested loops for a dict comprehension and a list comprehension, which should be more efficient. Players and matches are probably not plain ints and lists of ints in practice, so this could be made a bit more readable. For example:
from collections import defaultdict, Counter
def printMatrix(matrix):
    print(' '.join([' |'] + list(map(str, matrix.keys()))))
    print('---+-' + '-'*len(matrix)*2)
    for row, values in matrix.items():
        fmt = ' {row} |'
        for col in matrix.keys():
            fmt += ' {values[' + str(col) + ']}'
        print(fmt.format(row=row, values=values))
class Entity:
    def __init__(self):
        self._id = None
    @classmethod
    def register(cls, value):
        if value in cls.ids:
            raise ValueError("The provided ID is already in use")
        cls.ids.add(value)
    @classmethod
    def unregister(cls, value):
        if value is not None:
            cls.ids.remove(value)
    @property
    def id(self):
        return self._id
    @id.setter
    def id(self, value):
        if value == self.id:
            return
        self.register(value)
        self.unregister(self.id)
        self._id = value
class Player(Entity):
    ids = set()
    def __init__(self, pid):
        super().__init__()
        self.id = pid
        self.__matches = set()
    def __repr__(self):
        return 'Player<{}>'.format(self.id)
    @property
    def matches(self):
        return set(self.__matches)
    def inscribe(self, match):
        if match not in self.__matches:
            self.__matches.add(match)
    def delist(self, match):
        self.__matches.remove(match)
class Match(Entity):
    ids = set()
    def __init__(self, mid, players):
        super().__init__()
        self.id = mid
        self.__players = set()
        self.players = players
        for player in players:
            player.inscribe(self)
    def __repr__(self):
        return 'Match<{}>'.format(self.id)
    @property
    def players(self):
        return set(self.__players)
    @players.setter
    def players(self, value):
        for player in self.__players:
            player.delist(self)
        self.__players = set(value)
        for player in self.__players:
            player.inscribe(self)
if __name__=='__main__':
    players = [Player(i) for i in range(1, 6)]
    matches = [Match(i, {players[i-1], players[i], players[i+1]}) for i in range(1, 4)]
    for player in players:
        print(player, player.matches)
    for match in matches:
        print(match, match.players)
    d = {player.id: Counter([p.id for match in player.matches for p in match.players]) for player in players}
    printMatrix(d)
The printMatrix() function is just a helper I made to pretty-print the output to the screen.
The Entity class avoids duplicating code that both the Player and Match classes would otherwise need, since both have unique IDs. The constructor creates an empty _id attribute. The register() and unregister() class methods handle adding and removing IDs from the class attribute ids. The class also declares the id property with its getter and setter. Child classes only need to call super().__init__() in their constructor and create the ids class attribute at the level where ID uniqueness should be enforced, as both Player and Match do.
The Player class additionally has a read-only matches property that is populated and depopulated by the inscribe() and delist() methods respectively. The Match class has a players property with getter and setter methods.
First the players and matches are created with two list comprehensions (remember that list indices start at 0, so the Player with ID 1 is at players[0]) and printed with their corresponding relations (the matches each player plays in, and the players that take part in each match).
As each type keeps a reference to the other, all the information needed to build the dict of Counters you requested can be obtained from players alone.
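The player-keyed input assumed in the second snippet does not have to be written by hand. As a minimal sketch (the name by_player is made up for illustration), it can be derived from the original x in one pass, after which the same kind of dict comprehension applies:

```python
from collections import Counter, defaultdict

x = {'match1': [1, 2, 3], 'match2': [2, 3, 4], 'match3': [3, 4, 5]}

# Invert match -> players into player -> list of teams they played in.
by_player = defaultdict(list)
for team in x.values():
    members = frozenset(team)
    for player in members:
        by_player[player].append(members)

# For each player, count how often every teammate (and the player) appears.
d = {
    player: Counter(p for team in teams for p in team)
    for player, teams in by_player.items()
}
print(d[3])
```

This still visits every (match, player) pair once while building by_player, so it rearranges the work rather than removing it.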

Related

Loops and Dictionary

How can I loop over a list while using a dictionary and return the value that repeats the most, and, if several values are repeated the same number of times, return the greatest of them?
Here is some context, with unfinished code:
def most_frequent(lst):
    dict = {}
    count, itm = 0, ''
    for item in lst:
        dict[item] = dict.get(item, 0) + 1
        if dict[item] >= count:
            count, itm = dict[item], item
    return itm
#lst = ["a","b","b","c","a","c"]
lst = [2, 3, 2, 2, 1, 3, 3,1,1,1,1] #this should return 1
lst2 = [2, 3, 2, 2, 1, 3, 3] # should return 3
print(most_frequent(lst))
Here is a different way to go about it:
def most_frequent(lst):
    # Simple check to ensure lst has something.
    if not lst:
        return -1
    # Organize your data as: {number: count, ...}
    dct = {}
    for i in lst:
        dct[i] = dct[i] + 1 if i in dct else 1
    # Iterate through your data and create a list of all large elements.
    large_list, large_count = [], 0
    for num, count in dct.items():
        if count > large_count:
            large_count = count
            large_list = [num]
        elif count == large_count:
            large_list.append(num)
    # Return the largest element in the large_list list.
    return max(large_list)
There are many other ways to solve this problem, including using filter and other built-ins, but this is intended to give you a working solution so that you can start thinking on how to possibly optimize it better.
Things to take out of this; always think:
How can I break this problem down into smaller parts?
How can I organize my data so that it is more useful and easier to manipulate?
What shortcuts can I use along the way to make this function easier/better/faster?
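As one example of such a shortcut, here is a minimal sketch that lets collections.Counter do the counting and lets max() compare (count, value) tuples, so ties on count are broken by the larger value. Returning None for an empty list is an assumption made here, not something the question specifies.

```python
from collections import Counter

def most_frequent(lst):
    """Return the most common value; ties go to the larger value."""
    if not lst:
        return None  # assumption: no sensible answer for an empty list
    counts = Counter(lst)
    # Tuples compare element by element: count first, then the value itself.
    value, _ = max(counts.items(), key=lambda kv: (kv[1], kv[0]))
    return value

print(most_frequent([2, 3, 2, 2, 1, 3, 3, 1, 1, 1, 1]))
print(most_frequent([2, 3, 2, 2, 1, 3, 3]))
```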
Your code produces the result as you describe in your question, i.e. 1. However, your question states that you want to consider the case where two list elements are co-equals in maximum occurrence and return the largest. Therefore, tracking and returning a single element doesn't satisfy this requirement. You need to compile the dict and then evaluate the result.
def most_frequent(lst):
    dict = {}
    for item in lst:
        dict[item] = dict.get(item, 0) + 1
    # Sort by count (descending), then by value (descending).
    itm = sorted(dict.items(), key = lambda kv:(-kv[1], -kv[0]))
    return itm[0]
#lst = ["a","b","b","c","a","c"]
lst = [2, 3, 2, 2, 2, 2, 1, 3, 3,1,1,1,1] # 1 and 2 both occur 5 times
lst2 = [2, 3, 2, 2, 1, 3, 3] # should return 3
print(most_frequent(lst))
I edited the list 'lst' so that '1' and '2' both occur 5 times. The result returned is a tuple:
(2,5)
I reuse your idea which is quite neat, and I just modified your program a bit.
def get_most_frequent(lst):
    counts = dict()
    most_frequent = (None, 0) # (item, count)
    ITEM_IDX = 0
    COUNT_IDX = 1
    for item in lst:
        counts[item] = counts.get(item, 0) + 1
        if most_frequent[ITEM_IDX] is None:
            # first loop, most_frequent is "None"
            most_frequent = (item, counts[item])
        elif counts[item] > most_frequent[COUNT_IDX]:
            # if current item's "counter" is bigger than the most_frequent's counter
            most_frequent = (item, counts[item])
        elif counts[item] == most_frequent[COUNT_IDX] and item > most_frequent[ITEM_IDX]:
            # if the current item's "counter" is the same as the most_frequent's counter
            most_frequent = (item, counts[item])
        else:
            pass # do nothing
    return most_frequent
lst1 = [2, 3, 2, 2, 1, 3, 3,1,1,1,1, 2] # 1: 5 times
lst2 = [2, 3, 1, 3, 3, 2, 2] # 3: 3 times
lst3 = [1]
lst4 = []
print(get_most_frequent(lst1))
print(get_most_frequent(lst2))
print(get_most_frequent(lst3))
print(get_most_frequent(lst4))

Return the only element in the hash table in Python

I am working on a problem as:
In a non-empty array of integers, every number appears twice except for one, find that single number.
I tried to work it out by the hash table:
class Solution:
    def singleNumber(self, array):
        hash = {}
        for i in array:
            if i not in hash:
                hash[i] = 0
            hash[i] += 1
            if hash[i] == 2:
                del hash[i]
        return hash.keys()
def main():
    print(Solution().singleNumber([1, 4, 2, 1, 3, 2, 3]))
    print(Solution().singleNumber([7, 9, 7]))
main()
returning the result as:
dict_keys([4])
dict_keys([9])
Process finished with exit code 0
I am not sure if there is any way that I can return only the number, e.g. 4 and 9. Thanks for your help.
Instead of return hash.keys() do return hash.popitem()[0] or return list(hash.keys())[0].
Of course this assumes that there is at least one pair in the hashmap. You can check for this using len(hash) > 0 before accessing the first element:
class Solution:
    def singleNumber(self, array):
        hash = {}
        for i in array:
            if i not in hash:
                hash[i] = 0
            hash[i] += 1
            if hash[i] == 2:
                del hash[i]
        return hash.popitem()[0] if len(hash) > 0 else -1 # or throw an error
One solution that might be simpler is to use the .count method.
myList = [1, 4, 2, 1, 3, 2, 3]
non_repeating_numbers = []
for n in myList:
    if myList.count(n) < 2:
        non_repeating_numbers.append(n)
Applied to your code it could look something like this:
class Solution:
    def singleNumber(self, array):
        for n in array:
            if array.count(n) < 2:
                return n
def main():
    print(Solution().singleNumber([1, 4, 2, 1, 3, 2, 3]))
    print(Solution().singleNumber([7, 9, 7]))
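Note that list.count() rescans the whole list on every call, so the loop above takes quadratic time on long inputs. A different technique that fits this exact problem statement is XOR folding: since n ^ n == 0 and n ^ 0 == n, XOR-ing all the values cancels every pair and leaves the single number. This is only a sketch, and it relies on the stated guarantee that the input is non-empty and every other integer appears exactly twice; single_number is a name chosen here.

```python
from functools import reduce
from operator import xor

def single_number(array):
    """Return the value that appears once when all others appear twice."""
    # Pairs cancel under XOR, so only the unpaired value survives.
    return reduce(xor, array)

print(single_number([1, 4, 2, 1, 3, 2, 3]))
print(single_number([7, 9, 7]))
```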

How to find the number of every length of contiguous sequences of values in a list?

Problem
Given a sequence (list or numpy array) of 1's and 0's how can I find the number of contiguous sub-sequences of values? I want to return a JSON-like dictionary of dictionaries.
Example
[0, 0, 1, 1, 0, 1, 1, 1, 0, 0] would return
{
    0: {
        1: 1,
        2: 2
    },
    1: {
        2: 1,
        3: 1
    }
}
Tried
This is the function I have so far
def foo(arr):
    prev = arr[0]
    count = 1
    lengths = dict.fromkeys(arr, {})
    for i in arr[1:]:
        if i == prev:
            count += 1
        else:
            if count in lengths[prev].keys():
                lengths[prev][count] += 1
            else:
                lengths[prev][count] = 1
            prev = i
            count = 1
    return lengths
It is outputting identical dictionaries for 0 and 1 even if their appearance in the list is different. And this function isn't picking up the last value. How can I improve and fix it? Also, does numpy offer any quicker ways to solve my problem if my data is in a numpy array? (maybe using np.where(...))
You're suffering from Ye Olde Replication Error. Let's instrument your function to show the problem, adding one line to check the object ID of each dict in the list:
lengths = dict.fromkeys(arr, {})
print(id(lengths[0]), id(lengths[1]))
Output:
140130522360928 140130522360928
{0: {2: 2, 1: 1, 3: 1}, 1: {2: 2, 1: 1, 3: 1}}
The problem is that you gave the same dict as initial value for each key. When you update either of them, you're changing the one object to which they both refer.
Replace it with an explicit loop -- not a mutable function argument -- that will create a new object for each dict entry:
for key in lengths:
lengths[key] = {}
print(id(lengths[0]), id(lengths[1]))
Output:
139872021765576 139872021765288
{0: {2: 1, 1: 1}, 1: {2: 1, 3: 1}}
Now you have separate objects.
If you want a one-liner, use a dict comprehension:
lengths = {key: {} for key in lengths}
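Fixing the shared dict still leaves the other issue from the question: the final run is never recorded, because counts are only written when the value changes. One way to sidestep both problems is itertools.groupby, which yields each run of equal consecutive values, including the last one. A minimal sketch (run_length_counts is an illustrative name, not from the question):

```python
from collections import defaultdict
from itertools import groupby

def run_length_counts(arr):
    """Map each value to {run length: number of runs of that length}."""
    # A fresh inner dict is created per value, so nothing is shared.
    lengths = defaultdict(lambda: defaultdict(int))
    for value, run in groupby(arr):
        run_length = sum(1 for _ in run)
        lengths[value][run_length] += 1
    # Convert to plain dicts for a JSON-like result.
    return {value: dict(counts) for value, counts in lengths.items()}

print(run_length_counts([0, 0, 1, 1, 0, 1, 1, 1, 0, 0]))
```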

creating itemsets in apriori algorithm

I am reading about association analysis in the book Machine Learning in Action. The book gives the following explanation and code:
The k-2 thing may be a little confusing. Let’s look at that a little
further. When you were creating {0,1} {0,2}, {1,2} from {0}, {1}, {2},
you just combined items. Now, what if you want to use {0,1} {0,2},
{1,2} to create a three-item set? If you did the union of every set,
you’d get {0, 1, 2}, {0, 1, 2}, {0, 1, 2}. That’s right. It’s the same
set three times. Now you have to scan through the list of three-item
sets to get only unique values. You’re trying to keep the number of
times you go through the lists to a minimum. Now, if you compared the
first element {0,1} {0,2}, {1,2} and only took the union of those that
had the same first item, what would you have? {0, 1, 2} just one time.
Now you don’t have to go through the list looking for unique values.
def aprioriGen(Lk, k): #creates Ck
    retList = []
    lenLk = len(Lk)
    for i in range(lenLk):
        for j in range(i+1, lenLk):
            L1 = list(Lk[i])[:k-2]; L2 = list(Lk[j])[:k-2] # Join sets if first k-2 items are equal
            L1.sort(); L2.sort()
            if L1==L2:
                retList.append(Lk[i] | Lk[j])
    return retList
Suppose I call the above function like this:
Lk = [frozenset({2, 3}), frozenset({3, 5}), frozenset({2, 5}), frozenset({1, 3})]
k = 3
aprioriGen(Lk,3)
I get the following output:
[frozenset({2, 3, 5})]
I think there is a bug in the above logic, since other combinations such as {1,2,3} and {1,3,5} are missing. Is my understanding right?
I think you are following the link below. The output set depends on the minSupport value that is passed.
http://adataanalyst.com/machine-learning/apriori-algorithm-python-3-0/
If we reduce the minSupport value to 0.2, we get all sets.
Below is the complete code:
# -*- coding: utf-8 -*-
"""
Created on Mon Dec 31 16:57:26 2018
@author: rponnurx
"""
from numpy import *
def loadDataSet():
    return [[1, 3, 4], [2, 3, 5], [1, 2, 3, 5], [2, 5]]
def createC1(dataSet):
    C1 = []
    for transaction in dataSet:
        for item in transaction:
            if not [item] in C1:
                C1.append([item])
    C1.sort()
    return list(map(frozenset, C1))#use frozen set so we
    #can use it as a key in a dict
def scanD(D, Ck, minSupport):
    ssCnt = {}
    for tid in D:
        for can in Ck:
            if can.issubset(tid):
                if not can in ssCnt: ssCnt[can]=1
                else: ssCnt[can] += 1
    numItems = float(len(D))
    retList = []
    supportData = {}
    for key in ssCnt:
        support = ssCnt[key]/numItems
        if support >= minSupport:
            retList.insert(0,key)
        supportData[key] = support
    return retList, supportData
dataSet = loadDataSet()
print(dataSet)
C1 = createC1(dataSet)
print(C1)
#D is a dataset in the setform.
D = list(map(set,dataSet))
print(D)
L1,suppDat0 = scanD(D,C1,0.5)
print(L1)
def aprioriGen(Lk, k): #creates Ck
    retList = []
    print("Lk")
    print(Lk)
    lenLk = len(Lk)
    for i in range(lenLk):
        for j in range(i+1, lenLk):
            L1 = list(Lk[i])[:k-2]; L2 = list(Lk[j])[:k-2]
            L1.sort(); L2.sort()
            if L1==L2: #if first k-2 elements are equal
                retList.append(Lk[i] | Lk[j]) #set union
    return retList
def apriori(dataSet, minSupport = 0.5):
    C1 = createC1(dataSet)
    D = list(map(set, dataSet))
    L1, supportData = scanD(D, C1, minSupport)
    L = [L1]
    k = 2
    while (len(L[k-2]) > 0):
        Ck = aprioriGen(L[k-2], k)
        Lk, supK = scanD(D, Ck, minSupport)#scan DB to get Lk
        supportData.update(supK)
        L.append(Lk)
        k += 1
    return L, supportData
L,suppData = apriori(dataSet,0.2)
print(L)
Output:
[[frozenset({5}), frozenset({2}), frozenset({4}), frozenset({3}), frozenset({1})], [frozenset({1, 2}), frozenset({1, 5}), frozenset({2, 3}), frozenset({3, 5}), frozenset({2, 5}), frozenset({1, 3}), frozenset({1, 4}), frozenset({3, 4})], [frozenset({1, 3, 5}), frozenset({1, 2, 3}), frozenset({1, 2, 5}), frozenset({2, 3, 5}), frozenset({1, 3, 4})], [frozenset({1, 2, 3, 5})], []]
Thanks,
Rajeswari Ponnuru
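On the original question of whether {1, 2, 3} and {1, 3, 5} are missing from aprioriGen(Lk, 3): under the Apriori principle a k-item set can only be frequent if all of its (k-1)-item subsets are frequent, and neither {1, 2} nor {1, 5} is in the Lk that was passed in. The sketch below checks this for the question's input. It also sorts each itemset before taking the k-2 prefix, because iteration order of a frozenset is not guaranteed to be sorted. The names apriori_gen_sorted and all_subsets_frequent are hypothetical helpers written for this illustration, not functions from the book.

```python
from itertools import combinations

def apriori_gen_sorted(Lk, k):
    """Join itemsets whose first k-2 items (in sorted order) agree."""
    candidates = []
    for a, b in combinations(Lk, 2):
        if sorted(a)[:k - 2] == sorted(b)[:k - 2]:
            candidates.append(a | b)
    return candidates

def all_subsets_frequent(candidate, Lk, k):
    """True if every (k-1)-item subset of candidate is present in Lk."""
    frequent = set(Lk)
    return all(frozenset(s) in frequent for s in combinations(candidate, k - 1))

Lk = [frozenset({2, 3}), frozenset({3, 5}), frozenset({2, 5}), frozenset({1, 3})]
candidates = apriori_gen_sorted(Lk, 3)
print(candidates)
print(all_subsets_frequent(frozenset({1, 2, 3}), Lk, 3))
print(all_subsets_frequent(frozenset({1, 3, 5}), Lk, 3))
print(all_subsets_frequent(frozenset({2, 3, 5}), Lk, 3))
```

With a lower minSupport, as in the answer above, {1, 2} and {1, 5} do become frequent 2-item sets, which is why the larger sets then appear.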

efficient list mapping in python

I have the following input:
input = [('dog', 'dog', 'cat', 'mouse'), ('cat', 'ruby', 'python', 'mouse')]
and I am trying to produce the following output:
outputlist = [[0, 0, 1, 2], [1, 3, 4, 2]]
outputmapping = {0: 'dog', 1: 'cat', 2: 'mouse', 3: 'ruby', 4: 'python'}
Any tips on how to handle this with scalability in mind? (The input can get really large.)
You probably want something like:
import collections
import itertools
def build_catalog(L):
    counter = itertools.count()
    # Each unseen name is assigned the next integer from the counter.
    names = collections.defaultdict(lambda: next(counter))
    result = []
    for t in L:
        new_t = [ names[item] for item in t ]
        result.append(new_t)
    # Invert the name -> index mapping into index -> name.
    catalog = dict((idx, name) for name, idx in names.items())
    return result, catalog
Using it:
>>> input = [('dog', 'dog', 'cat', 'mouse'), ('cat', 'ruby', 'python', 'mouse')]
>>> outputlist, outputmapping = build_catalog(input)
>>> outputlist
[[0, 0, 1, 2], [1, 3, 4, 2]]
>>> outputmapping
{0: 'dog', 1: 'cat', 2: 'mouse', 3: 'ruby', 4: 'python'}
This class will automatically map objects to increasing integer values:
class AutoMapping(object):
    def __init__(self):
        self.map = {}
        self.objects = []
    def __getitem__(self, val):
        if val not in self.map:
            self.map[val] = len(self.objects)
            self.objects.append(val)
        return self.map[val]
Example usage, for your input:
>>> input = [('dog', 'dog', 'cat', 'mouse'), ('cat', 'ruby', 'python', 'mouse')]
>>> map = AutoMapping()
>>> [[map[x] for x in y] for y in input]
[[0, 0, 1, 2], [1, 3, 4, 2]]
>>> map.objects
['dog', 'cat', 'mouse', 'ruby', 'python']
>>> dict(enumerate(map.objects))
{0: 'dog', 1: 'cat', 2: 'mouse', 3: 'ruby', 4: 'python'}
Here is one possible solution, although it isn't the greatest. It could be made slightly more efficient if you know how many elements each entry in the list will have before-hand, by pre-allocating them.
labels = []
label2index = {}
outputlist = []
for group in input:
    current = []
    for label in group:
        if label not in label2index:
            label2index[label] = len(labels)
            labels.append(label)
        current.append(label2index[label])
    outputlist.append(current)
outputmapping = {}
for idx, val in enumerate(labels):
    outputmapping[idx] = val
I had the same problem quite often in my projects, so I wrapped up a class some time ago that does exactly this:
class UniqueIdGenerator(object):
    """A dictionary-like class that can be used to assign unique integer IDs to
    names.
    Usage:
    >>> gen = UniqueIdGenerator()
    >>> gen["A"]
    0
    >>> gen["B"]
    1
    >>> gen["C"]
    2
    >>> gen["A"] # Retrieving already existing ID
    0
    >>> len(gen) # Number of already used IDs
    3
    """
    def __init__(self, id_generator=None):
        """Creates a new unique ID generator. `id_generator` specifies how we
        assign new IDs to elements that do not have an ID yet. If it is `None`,
        elements will be assigned integer identifiers starting from 0. If it is
        an integer, elements will be assigned identifiers starting from the given
        integer. If it is an iterator or generator, `next()` will be called on
        it every time a new ID is needed."""
        if id_generator is None:
            id_generator = 0
        if isinstance(id_generator, int):
            import itertools
            self._generator = itertools.count(id_generator)
        else:
            self._generator = id_generator
        self._ids = {}
    def __getitem__(self, item):
        """Retrieves the ID corresponding to `item`. Generates a new ID for `item`
        if it is the first time we request an ID for it."""
        try:
            return self._ids[item]
        except KeyError:
            self._ids[item] = next(self._generator)
            return self._ids[item]
    def __len__(self):
        """Retrieves the number of added elements in this UniqueIdGenerator"""
        return len(self._ids)
    def reverse_dict(self):
        """Returns the reversed mapping, i.e., the one that maps generated IDs to their
        corresponding items"""
        return dict((v, k) for k, v in self._ids.items())
    def values(self):
        """Returns the list of items added so far. Items are ordered according to
        the standard sorting order of their IDs, so the values will be exactly
        in the same order they were added if the ID generator generates IDs in
        ascending order. This holds, for instance, for numeric ID generators that
        assign integers starting from a given number."""
        return sorted(self._ids.keys(), key = self._ids.__getitem__)
Usage example:
>>> input = [('dog', 'dog', 'cat', 'mouse'), ('cat', 'ruby', 'python', 'mouse')]
>>> gen = UniqueIdGenerator()
>>> outputlist = [[gen[x] for x in y] for y in input]
>>> print(outputlist)
[[0, 0, 1, 2], [1, 3, 4, 2]]
>>> outputmapping = gen.reverse_dict()
>>> print(outputmapping)
{0: 'dog', 1: 'cat', 2: 'mouse', 3: 'ruby', 4: 'python'}
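For comparison, here is a compact sketch of the same idea using a defaultdict whose default factory returns the current size of the dict, so each unseen label receives the next unused integer. The function name build_catalog and the variable names are illustrative only.

```python
from collections import defaultdict

def build_catalog(groups):
    """Encode labels as integers in first-seen order; return codes and mapping."""
    # len(ids) is evaluated before the new key is stored, so the first
    # unseen label gets 0, the next one 1, and so on.
    ids = defaultdict(lambda: len(ids))
    encoded = [[ids[label] for label in group] for group in groups]
    mapping = {idx: label for label, idx in ids.items()}
    return encoded, mapping

data = [('dog', 'dog', 'cat', 'mouse'), ('cat', 'ruby', 'python', 'mouse')]
encoded, mapping = build_catalog(data)
print(encoded)
print(mapping)
```

Each label costs one dict lookup, so the work grows linearly with the total number of items in the input.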
