Given a list of integers in which most values are repeated, how can I print the one integer that is not repeated at all?
I have tried to solve this with the following program:
K = int(input())
room_list = list(input().split())
room_set = set(room_list)
for i in room_set:
    count = room_list.count(i)
    if count == 1:
        i = int(i)
        print(i)
        break
K is the number of elements in the list.
The program works for short lists, but when it is tested with a longer list (say, 825 elements) it times out.
How can I optimize the code above?
The elements whose count in the list is exactly one are your answer.
from collections import Counter
a = [1,1,1,2,2,3,4,5,5]
c = Counter(a) # O(n)
x = [key for key, val in c.items() if val == 1]
print(x)
Output:
[3, 4]
The Counter class builds a dictionary mapping each element to its number of occurrences by iterating through the list once, which takes O(n) time; looking up any element's count afterwards takes O(1) time.
The list's count method scans the whole list every time you call it. Calling it once per distinct element, as your code does, takes O(n^2) time in the worst case.
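Applied to the input format from the question, the same idea might look like the sketch below. The function name first_unique is made up for illustration, and the sketch assumes the room numbers arrive as whitespace-separated tokens, as in the original program.

```python
from collections import Counter


def first_unique(tokens):
    """Return the first token that occurs exactly once, or None."""
    counts = Counter(tokens)      # one pass over the list, O(n)
    for token in tokens:          # second pass, O(1) lookup per token
        if counts[token] == 1:
            return token
    return None


# Hypothetical sample standing in for input().split()
rooms = "1 2 3 6 5 4 4 2 5 3 6 1 6 5 3 2 4 1 2 5 1 4 3 6 8 4 3 1 5 6 2".split()
answer = first_unique(rooms)
print(int(answer))
```

Walking the original list in the second pass, rather than the set, keeps the result deterministic when more than one value is unique.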
This will print the number that occurred least often:
data = [3,4,5,6,7,8,9,0,1,2,3,4,5,6,7,8,93,4,5,6,7,8,9,0,1,2,3,4,5,6,7,8,9]
from collections import Counter
# least often
print (Counter(data).most_common()[-1][0])
# all non-repeated ones
# non_repeat = [k[0] for k in Counter(data).most_common() if k[1] == 1]
Output:
93
It uses a specialized dictionary, collections.Counter, that's built for counting the items in the iterable you give it.
The method .most_common() returns a list of (key, count) tuples sorted from most to least common; its last member is the one that occurs least often.
The built-up dict looks like this:
Counter({4: 4, 5: 4, 6: 4, 7: 4, 8: 4, 3: 3, 9: 3, 0: 2, 1: 2, 2: 2, 93: 1})
A similar approach is to use a collections.defaultdict and count them yourself, then get the one with the minimal value:
from collections import defaultdict
k = defaultdict(int)
for elem in data:
    k[elem] += 1
print(min(k.items(), key=lambda x: x[1]))
The last solution takes a similar approach without the specialized Counter. The advantage of both is that you iterate once over the whole list and increment a value, instead of iterating n times over the whole list to count each distinct element's occurrences.
Using count() on a list of purely distinct elements leads to n counting runs through n elements = n^2 operations.
The dictionary approach needs only one pass through the list, so only n operations.
Related
How can I count the number of a specific values in a multi-value dictionary?
For example, if I have the keys A and B with different sets of numbers as values, I want to get the count of each number across all of the dictionary's keys.
I've tried this code, but I get 0 instead of 2.
dic = {'A':{0,1,2},'B':{1,2}}
print(sum(value == 1 for value in dic.values()))
Counter is a good option for this, especially if you want more than a single result:
from collections import Counter
from itertools import chain
count = Counter(chain(*(dic.values())))
In the REPL:
>>> count
Counter({1: 2, 2: 2, 0: 1})
>>> count.get(1)
2
Counter simply tallies each item in an iterable. chain glues the individual sets together so that they can be treated as one long sequence; feeding that straight to Counter counts how many times each item occurs.
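As a variation, chain.from_iterable does the same gluing without unpacking the values into separate arguments first. This is a small sketch using the dic from the question; the lookup of 7 is only there to show what happens for a value that never occurs.

```python
from collections import Counter
from itertools import chain

dic = {'A': {0, 1, 2}, 'B': {1, 2}}

# chain.from_iterable walks each set in turn, so nothing has to be
# unpacked with * before counting.
count = Counter(chain.from_iterable(dic.values()))

print(count[1])   # number of keys whose set contains 1
print(count[7])   # a Counter returns 0 for values it never saw
```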
I understand the basics of this problem, but I need help doing it in the most efficient way possible (taking the least programmer time without sacrificing the stability or efficiency of the code).
Let's say we have a string:
grades=str(input("Enter a string"))
In my code, I would join a space between all characters in the string above and then split the characters into separate items in a list:
grades=" ".join(grades)
grades.split(" ")
I then want to use loops of some sort to search the list for repeating items. However, I want to learn how I can do this the most efficient way possible:
x = len(grades)
for i in range(0, x):
    if grades[i] == # here is where I'm having trouble
I want to know how I can search whether 1 item in the list is equal to any item in the whole list itself. Kind regards.
Here is an example:
from collections import Counter
a = [1, 2, 3, 4, 1, 2]
c = Counter(a)
for k, v in c.items():
    if v > 1:
        print(k, 'repeated more than once')
Here c will be a Counter object like this: Counter({1: 2, 2: 2, 3: 1, 4: 1}). The keys are the list values and the values are their counts.
I wrote out the for loop to make this easier to follow. You can do anything with c; it acts like a dict.
>>> [k for k, v in c.items() if v > 1]
[1, 2]
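For the string from the question, a sketch along the same lines could count the characters directly. The sample string is made up and stands in for the value read with input(); the point is only that a string is already an iterable of characters, so the join/split step is not needed.

```python
from collections import Counter

grades = "ABBCADA"   # hypothetical stand-in for input("Enter a string")

# Counter iterates over the characters of the string one by one.
counts = Counter(grades)
repeated = [ch for ch, n in counts.items() if n > 1]
print(repeated)
```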
I have a list say l = [10,10,20,15,10,20]. I want to assign each unique value a certain "index" to get [1,1,2,3,1,2].
This is my code:
a = list(set(l))
res = [a.index(x) for x in l]
Which turns out to be very slow.
l has 1M elements, and 100K unique elements. I have also tried map with lambda and sorting, which did not help. What is the ideal way to do this?
You can do this in O(N) time using a defaultdict and a list comprehension:
>>> from itertools import count
>>> from collections import defaultdict
>>> lst = [10, 10, 20, 15, 10, 20]
>>> d = defaultdict(count(1).next)
>>> [d[k] for k in lst]
[1, 1, 2, 3, 1, 2]
In Python 3 use __next__ instead of next.
Wondering how it works?
The default_factory (count(1).next in this case) passed to defaultdict is called only when Python encounters a missing key. For the first 10 the value is going to be 1; for the next 10 the key is no longer missing, so the previously calculated 1 is used. 20 is again a missing key, so Python calls the default_factory again to get its value, and so on.
d at the end will look like this:
>>> d
defaultdict(<method-wrapper 'next' of itertools.count object at 0x1057c83b0>,
{10: 1, 20: 2, 15: 3})
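Written out for Python 3, the same approach would look roughly like this (a sketch of the snippet above with __next__ swapped in, nothing else changed):

```python
from collections import defaultdict
from itertools import count

lst = [10, 10, 20, 15, 10, 20]

# count(1).__next__ hands out 1, 2, 3, ... each time a missing key is seen.
d = defaultdict(count(1).__next__)
res = [d[k] for k in lst]
print(res)
```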
The slowness of your code arises because a.index(x) performs a linear search and you perform that linear search for each of the elements in l. So for each of the 1M items you perform (up to) 100K comparisons.
The fastest way to transform one value to another is looking it up in a map. You'll need to create the map and fill in the relationship between the original values and the values you want. Then retrieve the value from the map when you encounter another of the same value in your list.
Here is an example that makes a single pass through l. There may be room for further optimization to eliminate the need to repeatedly reallocate res when appending to it.
res = []
conversion = {}
i = 0
for x in l:
    if x not in conversion:
        value = conversion[x] = i
        i += 1
    else:
        value = conversion[x]
    res.append(value)
Well I guess it depends on if you want it to return the indexes in that specific order or not. If you want the example to return:
[1,1,2,3,1,2]
then you can look at the other answers submitted. However, if you only care about getting a unique index for each unique number, then I have a fast solution for you:
import numpy as np
l = [10,10,20,15,10,20]
a = np.array(l)
x,y = np.unique(a,return_inverse = True)
and for this example the output of y is:
y = [0,0,2,1,0,2]
I tested this for 1,000,000 entries and it was done essentially immediately.
Your solution is slow because its complexity is O(nm) with m being the number of unique elements in l: a.index() is O(m) and you call it for every element in l.
To make it O(n), get rid of index() and store indexes in a dictionary:
>>> idx, indexes = 1, {}
>>> for x in l:
...     if x not in indexes:
...         indexes[x] = idx
...         idx += 1
...
>>> [indexes[x] for x in l]
[1, 1, 2, 3, 1, 2]
If l contains only integers in a known range, you could also store indexes in a list instead of a dictionary for faster lookups.
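A minimal sketch of that list-based variant, assuming every value is a non-negative integer below a known bound. The function name index_by_first_seen and the parameter upper are made up for illustration.

```python
def index_by_first_seen(values, upper):
    """Map each value in range(upper) to a 1-based id in first-seen order."""
    ids = [0] * upper          # 0 means "not seen yet"
    next_id = 1
    result = []
    for v in values:
        if ids[v] == 0:        # first time this value shows up
            ids[v] = next_id
            next_id += 1
        result.append(ids[v])
    return result


print(index_by_first_seen([10, 10, 20, 15, 10, 20], upper=21))
```

The trade-off is memory: the lookup list has one slot per possible value, so this only pays off when the range is not much larger than the number of distinct values.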
You can use collections.OrderedDict() to keep the unique items in their original order, enumerate those ordered unique items to build a dict that maps each item to its index, and then call an operator.itemgetter() built from the main list on that dict to get the corresponding index for each item:
>>> from collections import OrderedDict
>>> from operator import itemgetter
>>> itemgetter(*lst)({j:i for i,j in enumerate(OrderedDict.fromkeys(lst),1)})
(1, 1, 2, 3, 1, 2)
For completeness, you can also do it eagerly:
from itertools import count
wordid = dict(zip(set(list_), count(1)))
This uses a set to obtain the unique words in list_, pairs each of those unique words with the next value from count() (which counts upwards), and constructs a dictionary from the results.
Original answer, written by nneonneo.
I have a program that goes through a list and, for each object, finds the next instance that has a matching value. When it finds one, it prints out the location of both objects. The program runs perfectly fine, but when I run it on a large volume of data (~6,000,000 objects in the list) it takes much too long. If anyone could provide insight into how I can make the process more efficient, I would greatly appreciate it.
def search(list):
    original = list
    matchedvalues = []
    count = 0
    for x in original:
        targetValue = x.getValue()
        count = count + 1
        copy = original[count:]
        for y in copy:
            # getLocation is assumed to be a method, like getValue
            if targetValue == y.getValue():
                print(str(x.getLocation()) + "," + str(y.getLocation()))
                break
Perhaps you can make a dictionary that contains a list of indexes that correspond to each item, something like this:
values = [1,2,3,1,2,3,4]
from collections import defaultdict
def get_matches(x):
    my_dict = defaultdict(list)
    for ind, ele in enumerate(x):
        my_dict[ele].append(ind)
    return my_dict
Result:
>>> get_matches(values)
defaultdict(<type 'list'>, {1: [0, 3], 2: [1, 4], 3: [2, 5], 4: [6]})
Edit:
I added this part, in case it helps:
values = [1,1,1,1,2,2,3,4,5,3]
def get_next_item_ind(x, ind):
    my_dict = get_matches(x)
    indexes = my_dict[x[ind]]
    temp_ind = indexes.index(ind)
    if len(indexes) > temp_ind + 1:
        return indexes[temp_ind + 1]
    return None
Result:
>>> get_next_item_ind(values, 0)
1
>>> get_next_item_ind(values, 1)
2
>>> get_next_item_ind(values, 2)
3
>>> get_next_item_ind(values, 3)
>>> get_next_item_ind(values, 4)
5
>>> get_next_item_ind(values, 5)
>>> get_next_item_ind(values, 6)
9
>>> get_next_item_ind(values, 7)
>>> get_next_item_ind(values, 8)
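If the next match is needed for every position, rebuilding the dictionary on each call gets expensive. One possible alternative, not part of the answer above, is a single right-to-left pass that remembers the nearest index seen so far for each value. The function name next_match_indices is made up for this sketch.

```python
def next_match_indices(values):
    """For each position, return the index of the next equal value, or None."""
    last_seen = {}                       # value -> nearest index to the right
    result = [None] * len(values)
    for i in range(len(values) - 1, -1, -1):
        result[i] = last_seen.get(values[i])
        last_seen[values[i]] = i
    return result


print(next_match_indices([1, 1, 1, 1, 2, 2, 3, 4, 5, 3]))
```

Each element is visited once and each dictionary operation is O(1) on average, so the whole pass is linear in the length of the list.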
There are a few ways you could make this search more efficient by minimising additional memory use (particularly when your data is BIG):
You can operate directly on the list you are passing in and don't need to keep extra copies of it, so you won't need original = list or copy = original[count:].
You can use slices of the original list to test against, and enumerate(p) to iterate through the list. You won't need the extra variable count, and enumerate(p) is efficient in Python.
Re-implemented, this would become:
def search(p):
    # iterate over p
    for i, value in enumerate(p):
        # if value occurs more than once, print locations
        # do not re-test values that have already been tested (if value not in p[:i])
        if value not in p[:i] and value in p[(i + 1):]:
            # the last number is the offset of the next match within p[(i + 1):]
            print(value, ':', i, p[(i + 1):].index(value))
v = [1,2,3,1,2,3,4]
search(v)
1 : 0 2
2 : 1 2
3 : 2 2
Implemented this way, it prints only the values and locations where a value is repeated (which I think is what you intended in your original implementation).
Other considerations:
More than 2 occurrences of a value: if a value repeats many times in the list, you might want to implement a function that walks recursively through the list. As it stands, the question doesn't address this, and it may be that it doesn't need to in your situation.
Using a dictionary: I completely agree with Akavall above; dictionaries are a great way of looking up values in Python, especially if you need to look up values again later in the program. This works best if you construct a dictionary instead of a list when you originally create the data. But if you are only doing this once, it is going to cost you more time to construct the dictionary and query it than simply iterating over the list as described above.
Hope this helps!
I'm looking for a way to turn a list of lists (a below) into a single list (b below) under two conditions:
The order of the new list (b) is based on the number of times each value occurs across the lists in a.
A value can appear only once.
Basically turn a into b:
a = [[1,2,3,4], [2,3,4], [4,5,6]]
# value 4 occurs 3 times in list a and gets first position
# value 2 occurs 2 times in list a and get second position and so on...
b = [4,2,3,1,5,6]
I figure one could do this with set and some list magic, but I can't get my head around it when a can contain any number of lists. The list a is created from user input (I guess it can contain between 1 and 20 lists with up to 200-300 items in each).
I'm trying something along the lines of [set(l) for l in a], but I don't know how to perform set(l) & set(l)... to get all matched items.
Is this possible without a for loop that iterates (sublist count * items per sublist) times?
I think this is probably the closest you're going to get:
from collections import defaultdict
d = defaultdict(int)
for sub in a:
    for val in sub:
        d[val] += 1
print(sorted(d.keys(), key=lambda k: d[k], reverse=True))
# Output: [4, 2, 3, 1, 5, 6]
The order of elements that appear an identical number of times may be indeterminate: on Python versions before 3.7 the order of d.keys() is not guaranteed.
import itertools
all_items = set(itertools.chain(*a))
b = sorted(all_items, key = lambda y: -sum(x.count(y) for x in a))
Try this -
a = [[1,2,3,4], [2,3,4], [4,5,6]]
s = set()
for l in a:
    s.update(l)
print(s)
# set([1, 2, 3, 4, 5, 6]) on Python 2, {1, 2, 3, 4, 5, 6} on Python 3
b = list(s)
This adds each list to the set, which gives you a unique set of all elements in all the lists, if that is what you are after.
Edit: to preserve the order of elements in the original lists, you can't use sets.
a = [[1,2,3,4], [2,3,4], [4,5,6]]
b = []
for l in a:
    for i in l:
        if i not in b:
            b.append(i)
print(b)
# [1, 2, 3, 4, 5, 6] - the same order as the set in this case, since that's the order in which they appear in the lists
import itertools
from collections import defaultdict
def list_by_count(lists):
    data_stream = itertools.chain.from_iterable(lists)
    counts = defaultdict(int)
    for item in data_stream:
        counts[item] += 1
    return [item for (item, count) in
            sorted(counts.items(), key=lambda x: (-x[1], x[0]))]
Having the x[0] in the sort key ensures that items with the same count are in some kind of sequence as well.
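A shorter variant is possible with collections.Counter, assuming Python 3.7 or later, where most_common() keeps values with equal counts in the order they were first seen instead of sorting them by value. This is a sketch using the list a from the question.

```python
from collections import Counter
from itertools import chain

a = [[1, 2, 3, 4], [2, 3, 4], [4, 5, 6]]

counts = Counter(chain.from_iterable(a))
# most_common() sorts by descending count; equal counts keep the order
# in which the values were first encountered.
b = [value for value, _ in counts.most_common()]
print(b)
```

Use the explicit sort key from list_by_count instead if ties should be ordered by value rather than by first appearance.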