Separate lists based on indices - python

I have 2 lists:
data = [0, 1, 2, 3, 7, 8, 9, 10]
indices = [1, 1, 0, 0, 0, 2, 1, 0]
I want to append the data to a 2-D list, where each entry of indices gives the sublist that the corresponding data item belongs to. That is, starting from:
new_list = [[]]*len(set(indices))
new_list should end up as follows:
new_list = [[2,3,7,10],[0,1,9],[8]]
I am using this code:
for i in range(len(set(indices))):
    for j in range(len(indices)):
        if indices[j] == i:
            new_list[i].append(data[j])
        else:
            pass
However, I get this:
new_list = [[2, 3, 7, 10, 0, 1, 9, 8], [2, 3, 7, 10, 0, 1, 9, 8], [2, 3, 7, 10, 0, 1, 9, 8]]
I am not sure what mistake I am making; any help is appreciated!

You can use a dict to map the values to their respective indices, and then use a range to output them in order, so that this will only cost O(n) in time complexity:
d = {}
for i, n in zip(indices, data):
    d.setdefault(i, []).append(n)
newlist = [d[i] for i in range(len(d))]
newlist becomes:
[[2, 3, 7, 10], [0, 1, 9], [8]]
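One assumption in the dict approach is worth spelling out: range(len(d)) only works when the indices cover 0, 1, ..., k-1 with no gaps. As a minimal sketch (the function name group_by_index is illustrative, not from the answer), iterating over the sorted keys avoids a KeyError when some index never occurs:

```python
def group_by_index(data, indices):
    """Group data items by their index, ordered by index value."""
    groups = {}
    for idx, item in zip(indices, data):
        groups.setdefault(idx, []).append(item)
    # sorted(groups) walks only the keys that actually occur,
    # so a missing index does not raise KeyError
    return [groups[idx] for idx in sorted(groups)]


data = [0, 1, 2, 3, 7, 8, 9, 10]
indices = [1, 1, 0, 0, 0, 2, 1, 0]
print(group_by_index(data, indices))        # [[2, 3, 7, 10], [0, 1, 9], [8]]
print(group_by_index(["a", "b"], [0, 2]))   # [['a'], ['b']]
```

Note that with gaps this variant drops the empty groups instead of keeping placeholders, which may or may not be what you want.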

You're iterating your indices completely for every value, which is wasteful. You're also multiplying a list of lists, which doesn't do what you expect (it makes a list of many references to the same underlying list). You want to pair up indices and values instead (so you do O(n) work, not O(n**2)), which is what zip was made for, and make your list of empty lists safely (a list of several independent lists):
data = [0, 1, 2, 3, 7, 8, 9, 10]
indices = [1, 1, 0, 0, 0, 2, 1, 0]
# Use max because you really care about the biggest index, not the number of unique indices
# A list comprehension producing [] each time produces a *new* list each time
new_list = [[] for _ in range(max(indices)+1)]
# Iterate each datum and matching index in parallel exactly once
for datum, idx in zip(data, indices):
    new_list[idx].append(datum)
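The aliasing problem described above can be seen directly in a small, self-contained sketch (the variable names shared and independent are just for illustration):

```python
# Multiplying a list of lists repeats the *reference* to one inner list
shared = [[]] * 3
shared[0].append("x")
print(shared)                   # [['x'], ['x'], ['x']]
print(shared[0] is shared[1])   # True: every slot is the same object

# A comprehension evaluates [] once per iteration, giving separate lists
independent = [[] for _ in range(3)]
independent[0].append("x")
print(independent)              # [['x'], [], []]
```

This is why every sublist in the question ended up holding all eight values.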

To get this, I zipped each data item with its index and sorted the pairs:
>>> data = [0, 1, 2, 3, 7, 8, 9, 10]
>>> indices = [1, 1, 0, 0, 0, 2, 1, 0]
>>> buff = sorted(zip(indices, data))
>>> print(buff)
[(0, 2), (0, 3), (0, 7), (0, 10), (1, 0), (1, 1), (1, 9), (2, 8)]
Then I used the set of unique indices to decide which data goes into each new list. This is done with nested generator expressions wrapped in list():
>>> new_list = list(list(b[1] for b in buff if b[0] == x) for x in sorted(set(indices)))
>>> print(new_list)
[[2, 3, 7, 10], [0, 1, 9], [8]]
I hope this helps.

Related

list of unique elements formed by concatenating permutations of the initial lists

I would like to combine several lists, where each list should be preserved up to a permutation.
Here is an example:
I would like to combine these lists
[[0, 7], [2, 4], [0, 1, 7], [0, 1, 4, 7]]
The output I would like to obtain is e.g. this list
[2, 4, 0, 7, 1]
Or as Sembei Norimaki phrased the task:
the result must be a list of unique elements formed by concatenating permutations of the initial lists.
The solution is not unique, and a solution may not always exist.
Third time lucky. This is a bit cheesy - it checks every permutation of the source list elements to see which ones are valid:
from itertools import permutations
def check_sublist(sublist, candidate):
    # a permutation of sublist must exist within the candidate list
    sublist = set(sublist)
    # check each len(sublist) portion of candidate
    for i in range(1 + len(candidate) - len(sublist)):
        if sublist == set(candidate[i : i + len(sublist)]):
            return True
    return False
def check_list(input_list, candidate):
    for sublist in input_list:
        if not check_sublist(sublist, candidate):
            return False
    return True
def find_candidate(input_list):
    # flatten input_list and make set of unique values
    values = {x for sublist in input_list for x in sublist}
    for per in permutations(values):
        if check_list(input_list, per):
            print(per)
find_candidate([[0, 7], [2, 4], [0, 1, 7], [0, 1, 4, 7]])
# (0, 7, 1, 4, 2)
# (1, 0, 7, 4, 2)
# (1, 7, 0, 4, 2)
# (2, 4, 0, 7, 1)
# (2, 4, 1, 0, 7)
# (2, 4, 1, 7, 0)
# (2, 4, 7, 0, 1)
# (7, 0, 1, 4, 2)
You'd definitely do better applying a knowledge of graph theory and using a graphing library, but that's beyond my wheelhouse at present!
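Since the question notes that a solution may not exist, here is a minimal sketch of the same brute-force idea that returns the first valid ordering, or None when there is none. The names first_candidate and contains_permutation are illustrative and not from the answer above, and the sketch assumes each sublist has no repeated elements. It is still factorial in the number of distinct values, so it only suits small inputs.

```python
from itertools import permutations


def contains_permutation(sublist, candidate):
    """True if some contiguous slice of candidate holds exactly sublist's elements."""
    wanted = set(sublist)
    width = len(wanted)
    return any(
        set(candidate[i:i + width]) == wanted
        for i in range(len(candidate) - width + 1)
    )


def first_candidate(input_list):
    """Return the first valid ordering as a list, or None if none exists."""
    # sorted() makes the search order deterministic
    values = sorted({x for sublist in input_list for x in sublist})
    for per in permutations(values):
        if all(contains_permutation(sublist, per) for sublist in input_list):
            return list(per)
    return None


print(first_candidate([[0, 7], [2, 4], [0, 1, 7], [0, 1, 4, 7]]))  # [0, 7, 1, 4, 2]
print(first_candidate([[0, 1], [0, 2], [0, 3]]))                   # None
```

The second call has no solution because 0 would need three different neighbours in a single line.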

Sum of every element in a list

So I have a list:
list1 = [[1, 3, 6, 8, 9, 9, 12], [1, 2, 3, 2, 1, 0, 3, 3]]
You can also split it into two lists, if that makes it any easier. All I have to do is sum every number in the first list with every number in the second, like this:
first row:
1+1, 1+2, 1+3, 1+2, 1+1...
second:
3+1... etc.
first = [1, 3, 6, 8, 9, 9, 12]
second = [1, 2, 3, 2, 1, 0, 3, 3]
w = [x + y for x, y in zip(first, second)]
I was trying to do it this way, but it doesn't work*. Any ideas?
*I mean it sums the wrong way: instead of adding every element of the first list to every element of the second, it only adds the 1st element of the first list to the 1st element of the second, and so on.
zip is getting only pairs that sit at the same index. You should instead have a double loop:
[x + y for x in first for y in second]
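To make the difference concrete, here is a small sketch with shortened lists (the values and the name rows are chosen only for illustration). It also shows a nested comprehension that keeps one row per element of first, matching the "first row / second row" layout in the question:

```python
first = [1, 3]
second = [10, 20, 30]

# zip stops at the shorter list and pairs items at the same position
pairwise = [x + y for x, y in zip(first, second)]
print(pairwise)   # [11, 23]

# A double loop visits every (x, y) combination
all_sums = [x + y for x in first for y in second]
print(all_sums)   # [11, 21, 31, 13, 23, 33]

# The same sums, grouped into one row per element of first
rows = [[x + y for y in second] for x in first]
print(rows)       # [[11, 21, 31], [13, 23, 33]]
```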
You can use itertools.product to get all possible pairs, then build a list of the sum of each pair:
import itertools
first = [1, 3, 6, 8, 9, 9, 12]
second = [1, 2, 3, 2, 1, 0, 3, 3]
res = itertools.product(first, second)
ress = [sum(pair) for pair in res]
print(ress)

Python: choosing indices from array that correspond to elements of specific value

I have an array that looks like this:
x = [1, 1, 2, 3, 3, 2, 2, 1, 2, 3, 2, 3, 2, 1, 2, 1, 1, 2, 1]
I want to write a function that will randomly return some specified number of indices that correspond to a specified number. In other words, if I pass the function the array x, the desired number of indices such as 3, and the target value 1, I would want it to return an array such as:
[0, 7, 13]
Since 0, 7, and 13 are the indices that correspond to 1 in x.
Does anyone know how I might do this efficiently?
You want to use random.sample for this:
import random
def f(arr, target, num):
    return random.sample([i for i, x in enumerate(arr) if x == target], k=num)
x = [1, 1, 2, 3, 3, 2, 2, 1, 2, 3, 2, 3, 2, 1, 2, 1, 1, 2, 1]
print(f(x, 1, 3))
Output:
[0, 1, 15]
You can use the sample function from the random module and pass it the list of indices that match the specified value:
x = [1, 1, 2, 3, 3, 2, 2, 1, 2, 3, 2, 3, 2, 1, 2, 1, 1, 2, 1]
from random import sample
def randomIndices(a,count,v):
    return sample([i for i,n in enumerate(a) if n==v],count)
print(randomIndices(x,3,1)) # [1,18,15]
Your question asks how to do this efficiently, which depends on how you plan to use this code. As others and I have pointed out, one way is to use enumerate to filter the list for the indices that correspond to the target value. The downside is that each time you pick a new target value or request a new sample, you have to enumerate the list again, which is an O(n) operation.
If you plan on taking multiple samples, you may be better off building a dictionary that maps each value to its indices upfront. You can then use this dictionary to draw random samples more efficiently than by enumerating each time. (The savings grow as x becomes very large.)
First build the dictionary using collections.defaultdict:
from collections import defaultdict
d = defaultdict(list)
for i, val in enumerate(x):
    d[val].append(i)
print(dict(d))
#{1: [0, 1, 7, 13, 15, 16, 18], 2: [2, 5, 6, 8, 10, 12, 14, 17], 3: [3, 4, 9, 11]}
Now you can use d to draw your samples:
from random import sample
def get_random_sample(d, target_value, size):
    return sample(d[target_value], size)
print(get_random_sample(d, target_value=1, size=3))
#[16, 7, 18]
You can do the following:
1. Get the indices of the items with value equal to 1.
2. Use random.sample to randomly select only a few of the indices (without repetition) extracted in the previous step.
Here is one way to do it (n indicates the number of indices to pick):
from random import sample
x = [1, 1, 2, 3, 3, 2, 2, 1, 2, 3, 2, 3, 2, 1, 2, 1, 1, 2, 1]
n = 3
target = 1
indices = [k for k in range(len(x)) if x[k] == target]
out = sample(indices, min(len(indices), n))
print(out)
Note that the number of returned indices could be lower than n (if the number of 1s in the list is less than n).
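The reason for capping the sample size is that random.sample raises ValueError when asked for more items than the population holds. A minimal sketch that wraps the capped version in a function (the name sample_indices and the optional rng parameter are assumptions made for illustration):

```python
import random


def sample_indices(values, target, n, rng=random):
    """Return up to n random indices i where values[i] == target."""
    matches = [i for i, v in enumerate(values) if v == target]
    # Cap the request so a rare target does not raise ValueError
    return rng.sample(matches, min(n, len(matches)))


x = [1, 1, 2, 3, 3, 2, 2, 1, 2, 3, 2, 3, 2, 1, 2, 1, 1, 2, 1]
print(sample_indices(x, 1, 3))    # three random indices of 1s
print(sample_indices(x, 3, 10))   # all four indices of 3s, in random order
print(sample_indices(x, 99, 2))   # []
```

Passing a seeded random.Random instance as rng gives reproducible draws, which can help when testing.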

Consecutive numbers list where each number repeats

How can I create a list of consecutive numbers where each number repeats N times, for example:
list = [0,0,0,1,1,1,2,2,2,3,3,3,4,4,4,5,5,5]
Another idea, without any need for other packages or sums:
[x//N for x in range((M+1)*N)]
Where N is your number of repeats and M is the maximum value to repeat. E.g.
N = 3
M = 5
[x//N for x in range((M+1)*N)]
yields
[0, 0, 0, 1, 1, 1, 2, 2, 2, 3, 3, 3, 4, 4, 4, 5, 5, 5]
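As a quick sanity check of the integer-division idea, this sketch compares it against a plain nested comprehension (the names by_division and by_nesting are illustrative). Dividing 0, 1, ..., (M+1)*N - 1 by N with floor division maps each run of N consecutive numbers to the same quotient:

```python
N = 3  # number of repeats
M = 5  # maximum value to repeat

by_division = [x // N for x in range((M + 1) * N)]
by_nesting = [value for value in range(M + 1) for _ in range(N)]

print(by_division == by_nesting)  # True
print(by_division)
# [0, 0, 0, 1, 1, 1, 2, 2, 2, 3, 3, 3, 4, 4, 4, 5, 5, 5]
```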
My first instinct is to get some functional help from the funcy package. If N is the number of times to repeat each value, and M is the maximum value to repeat, then you can do
import funcy as fp
fp.flatten(fp.repeat(i, N) for i in range(M + 1))
This will return a generator, so to get the list you can just call list() around it.
sum([[i]*n for i in range(0,x)], [])
The following piece of code is the simplest version I can think of.
It’s a bit dirty and long, but it gets the job done.
In my opinion, it’s easier to comprehend.
def mklist(s, n):
    l = []  # An empty list that will contain the elements
            # and their duplicates.
    for i in range(s):  # We iterate from 0 to s - 1
        for j in range(n):  # and append each element (i) to l n times.
            l.append(i)
    return l  # Finally we return the list.
If you run the code:
print(mklist(10, 2))
[0, 0, 1, 1, 2, 2, 3, 3, 4, 4, 5, 5, 6, 6, 7, 7, 8, 8, 9, 9]
print(mklist(5, 3))
[0, 0, 0, 1, 1, 1, 2, 2, 2, 3, 3, 3, 4, 4, 4]
Another version a little neater, with list comprehension.
But uhmmm… We have to sort it though.
def mklist2(s, n):
    return sorted([l for l in list(range(s)) * n])
Running that version will give the following results.
print(mklist2(5, 3))
Raw : [0, 1, 2, 3, 4, 0, 1, 2, 3, 4, 0, 1, 2, 3, 4]
Sorted: [0, 0, 0, 1, 1, 1, 2, 2, 2, 3, 3, 3, 4, 4, 4]

Efficient way to look in list of lists?

I am continuously creating a randomly generated list, New_X of size 10, based on 500 columns.
Each time I create a new list, it must be unique, and my function NewList only returns New_X once it hasn't already been created and appended to a List_Of_Xs
def NewList(Old_List):
    end = True
    while end == True:
        """ Here is code that generates my new sorted list, it is a combination of elements
        from Old_List and the other remaining columns,
        but the details aren't necessary for this question. """
        end = (List_Of_Xs == np.array([New_X])).all(axis=1).any()
    List_Of_Xs.append(New_X)
    return New_X
My question is, is the line end = (List_Of_Xs == np.array([New_X])).all(axis=1).any() an efficient way of looking in List_Of_Xs?
My List_Of_Xs can grow to a size of over 100,000 lists long, so I am unsure if this is inefficient or not.
Any help would be appreciated!
As I observed in a comment, the array comparison is potentially quite slow, especially as the list gets large. It has to create arrays each time, which consumes time.
Here's a set implementation.
Function to create a 10-element list:
import numpy as np
def foo(N=10):
    return np.random.randint(0,10,N).tolist()
Function to generate lists and print the unique ones:
def foo1(m=10):
    Set_of_Xs = set()
    while len(Set_of_Xs)<m:
        NewX = foo(10)
        tx = tuple(NewX)
        if tx not in Set_of_Xs:
            print(NewX)
            Set_of_Xs.add(tx)
    return Set_of_Xs
Sample run. As written it doesn't show if there are duplicates.
In [214]: foo1(5)
[9, 4, 3, 0, 9, 4, 9, 5, 6, 3]
[1, 8, 0, 3, 0, 0, 4, 0, 0, 5]
[6, 7, 2, 0, 6, 9, 0, 7, 0, 8]
[9, 5, 6, 3, 3, 5, 6, 9, 6, 9]
[9, 2, 6, 0, 2, 7, 2, 0, 0, 4]
Out[214]:
{(1, 8, 0, 3, 0, 0, 4, 0, 0, 5),
(6, 7, 2, 0, 6, 9, 0, 7, 0, 8),
(9, 2, 6, 0, 2, 7, 2, 0, 0, 4),
(9, 4, 3, 0, 9, 4, 9, 5, 6, 3),
(9, 5, 6, 3, 3, 5, 6, 9, 6, 9)}
So let me get this straight since the code doesn't appear complete:
1. You have an old list that is constantly growing with each iteration
2. You calculate a list
3. You compare it against each of the lists in the old list to see if you should break the loop?
One option is to store the lists in a set instead of a list of lists (as tuples, since lists are not hashable).
Comparing a new list against every list in a list of lists is an O(n) operation each iteration. With a set, the membership test should be O(1) on average, although you may still be getting O(n) every iteration until the last.
Another thought is to calculate the MD5 of each list and compare those, so you're not comparing the full lists.
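As a rough sketch of the hashing idea, assuming the lists contain plain ints, each list can be reduced to an MD5 digest and the digests kept in a set. The helper names list_digest and is_new are hypothetical and not part of the question's code; in practice a set of tuples gives the same O(1) average lookup with less work, so this mainly helps if you want to avoid keeping every full list in memory.

```python
import hashlib

seen_digests = set()


def list_digest(values):
    """Return an MD5 hex digest for a list of ints."""
    # repr() of a list of ints is deterministic, so equal lists
    # always produce equal digests
    return hashlib.md5(repr(list(values)).encode("utf-8")).hexdigest()


def is_new(values):
    """Record values; return True only the first time they are seen."""
    digest = list_digest(values)
    if digest in seen_digests:
        return False
    seen_digests.add(digest)
    return True


print(is_new([1, 2, 3]))  # True
print(is_new([1, 2, 3]))  # False, already recorded
print(is_new([3, 2, 1]))  # True, order matters
```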
