Reduce a list in a specific way - python

I have a list of strings which looks like this:
['(num1, num2):1', '(num3, num4):1', '(num5, num6):1', '(num7, num8):1']
What I'm trying to achieve is to reduce this list by combining every two elements, and to keep doing this until there is only one big string element left.
So the intermediate list would look like this:
['((num1, num2):1,(num3, num4):1)', '((num5, num6):1,(num7, num8):1)']
The complicated thing is (as you can see in the intermediate list) that each pair of strings needs to be wrapped in parentheses. So for the above-mentioned starting point the final result should look like this:
(((num_1,num_2):1,(num_3,num_4):1),((num_5,num_6):1,(num_7,num_8):1))
Of course this should also work in a generic way for 8, 16 or more string elements in the starting list. To be more precise, it should work for starting lists of length a_n = 2^(n+1).
Just to be very specific how the result should look with 8 elements:
'((((num_1,num_2):1,(num_3,num_4):1),((num_5,num_6):1,(num_7,num_8):1)),(((num_9,num_10):1,(num_11,num_12):1),((num_13,num_14):1,(num_15,num_16):1)))'
I already solved the problem using nested for loops but I thought there should be a more functional or short-cut solution.
I also found this solution on stackoverflow:
import itertools as it
l = [map( ",".join ,list(it.combinations(my_list, l))) for l in range(1,len(my_list)+1)]
Although the join isn't bad, I still need the parentheses. I tried to use:
"{},{}".format
instead of .join, but this seems to be too easy to work :).
I also thought about using reduce, but obviously this is not the right function. Maybe one could implement a custom reduce-like function?
I hope some advanced Pythonistas can help me.

Sounds like a job for the zip clustering idiom: zip(*[iter(x)]*n), where you want to break iterable x into size-n chunks. This will discard "leftover" elements that don't make up a full chunk; for x=[1, 2, 3], n=2 this would yield only (1, 2).
def reducer(l):
    while len(l) > 1:
        l = ['({},{})'.format(x, y) for x, y in zip(*[iter(l)]*2)]
    return l

reducer(['(num1, num2):1', '(num3, num4):1', '(num5, num6):1', '(num7, num8):1'])
# ['(((num1, num2):1,(num3, num4):1),((num5, num6):1,(num7, num8):1))']

This is an explanation of what is happening in zip(*[iter(l)]*2):
[iter(l)]*2 creates a list of length 2 holding the same iterator twice, or to be more precise, two references to the same iter-object.
zip(*...) does the extracting. It pulls:
on the first pass, the first element from the first reference of the iter-object and the second element from the second reference of the iter-object;
on the second pass, the third element from the first reference and the fourth element from the second reference;
on the third pass, the fifth element from the first reference and the sixth element from the second reference;
and so on...
Therefore we have the extracted elements available in the for-loop and can use them as x and y for further processing.
This is really handy.
I also want to point to this thread since it helped me to understand the concept.
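To make the chunking concrete, here is a minimal standalone sketch of the idiom (the sample data is made up purely for illustration):

letters = ['a', 'b', 'c', 'd', 'e', 'f', 'g']
pairs = list(zip(*[iter(letters)] * 2))    # two references to one iterator
print(pairs)    # [('a', 'b'), ('c', 'd'), ('e', 'f')] -- the leftover 'g' is dropped

triples = list(zip(*[iter(letters)] * 3))  # the same idiom with chunk size 3
print(triples)  # [('a', 'b', 'c'), ('d', 'e', 'f')]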

Related

How to keep on updating a specific list within the for loop?

I'm very new to the world of programming, I've been trying to solve a specific python academic exercise but I ran into an obstacle.
The problem is that I need to generate a lucky numbers sequence, as in the user inputs a sequence [1,...,n] and these steps happen:
Every second element is removed
Every third element is removed
Every fourth element is removed
...
When it becomes impossible to remove more numbers, the numbers left in the list are "lucky".
This is my code:
def lucky(l):
    index = 2
    new_list = []
    while(index < len(l)):
        for i in range(len(l)):
            if(i % index == 0):
                new_list.append(l[i])
        index = index + 1
    return new_list
The while loop is there to provide the final condition, when "it is impossible to remove more numbers". However, with every iteration the list should get shorter and shorter, but I don't know how to do that.
My code works for the first condition, when index=2 (remove every 2nd element); then in the following loops it doesn't work because:
It is still limited by length of the original list.
new_list.append(l[i]) will just add more elements to the new_list, rather than updating it in its place.
I don't know how to update the list without creating multiple amounts of lists and with each iteration adding the new elements to a new list.
Any help is appreciated.
You could use del with appropriate list slicing (see the manual for more details) to update the list in-place:
def lucky(l):
    interval = 2
    while interval <= len(l):
        del l[interval-1::interval]
        interval += 1
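A quick usage sketch with a made-up input list; note that the while loop keeps going for intervals 5, 6, ... as long as they still fit the shrinking list, so here 15 is removed as well in the interval-5 pass:

numbers = list(range(1, 16))   # [1, 2, ..., 15]
lucky(numbers)                 # modifies the list in place
print(numbers)                 # [1, 3, 7, 13]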
I am not sure if I understand your question correctly, but you can remove items from your original list via del l[index], where index is the index of the element to be removed.
For more details on lists look here:
https://docs.python.org/3/tutorial/datastructures.html
import math

def lucky(l, index):
    for i in range(math.floor(len(l)/index)):
        del l[(i+1)*(index-1)]
Not sure if the code will work, as I cannot test it right now. But I think it should work somehow like that.
EDIT:
Tested it and the code works. If you want to run all three steps, just do:
l = [1,2,3,4,5,6,7,8,9,10,11,12,13,14,15]
lucky(l,2)
lucky(l,3)
lucky(l,4)
print(l)
>>>[1,3,7,13,15]

Getting the indexes of an element in a subset

I have a list and a subset of it, and I want to find the index of each element of the subset. I have currently tried this code:
def convert_toindex(listof_elements, listof_indices):
    for i in range(len(listof_elements)):
        listof_elements[:] = [listof_indices.index(x) for x in listof_elements]
    return listof_elements
list1 = ['lol', 'please', 'help']
list2 = ['help', 'lol', 'please', 'extra']
What I want to happen when I do convert_toindex(list1, list2) is the output to be [2, 0, 1]
However, when I do this I get a ValueError: '0' is not in list.
0, however, appears nowhere in either list so I am not sure why this is happening.
Secondly, if I have a list of lists, and I want to do this process for all the nested lists inside the big list, would I do something like this?
for smalllist in biglist:
    smalllist[:] = [dict_of_indices[x] for x in smalllist]
Where dict_of_indices is the dictionary of indices created following the top answer.
The problem is that, instead of doing this once, you're doing it over and over, N times:
for i in range(len(listof_elements)):
    listof_elements[:] = [listof_indices.index(x) for x in listof_elements]
The first time through, you replace every value in listof_elements with its index in listof_indices. So far, so good. In fact, you should be done there.
But then you do it a second time. You look up each of those indices, as if they were values, in listof_indices. And some of them aren't there. So you get an error.
You can solve this just by removing the outer loop. You're already done after the first time.
You may be confused because this problem seems to inherently require two loops—but you already do have two loops. The first is the obvious one in the list comprehension, and the second one is the one hidden inside listof_indices.index.
While we're at it: while this problem does require two loops, it doesn't require them to be nested.
Instead of looping over listof_indices to find each x, you can loop over it in advance to build a dictionary:
dict_of_indices = {value: index for index, value in enumerate(listof_indices)}
And then just do a direct lookup in that dictionary:
listof_elements[:] = [dict_of_indices[x] for x in listof_elements]
Besides being a whole lot faster (O(N+M) time rather than O(N*M)), I think this might also be easier to understand and to debug. The first line may be a bit tricky, but you can easily print out the dict and verify that it's correct. And then the second line is about as trivial as you can get.
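Putting the two lines together with the question's lists gives a small end-to-end sketch; it maps each element of list1 to its position in list2:

list1 = ['lol', 'please', 'help']
list2 = ['help', 'lol', 'please', 'extra']

dict_of_indices = {value: index for index, value in enumerate(list2)}
list1[:] = [dict_of_indices[x] for x in list1]
print(list1)   # [1, 2, 0]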

Python: check if item exists in variable amount of lists

I'm working on a small search engine and I'm lost at a certain point. I have multiple lists containing items, and I want to check which items exist in all lists. The number of lists can vary, since they are created based on the number of words in the search query, done with:
index_list = [[] for i in range((len(query)+1))]
I figured I'd start by finding the shortest list, since that is the maximum number of items that need to be checked. So, for example, with a three-word search query:
index_list[1]=[set(1,2,3,4,5)]
index_list[2]=[set(3,4,5,6,7)]
index_list[3]=[set(4,5,6,7)]
shortest_list = index_list[3]
(What the shortest list is, is figured out with a function, not relevant for now).
Now I want to check if the items of the shortest list, index_list[3], also exist in the other lists. In this case there are 3 lists in total, but when entering a longer search query, the number of lists increases. I thought to do something with loops, like:
result = []
for element in shortest_list:
    for subelement in element:
        for element2 in index_list[1]:
            if subelement in element2:
                for element3 in index_list[2]:
                    if subelement in element3:
                        result.append(subelement)
So, the result should be:
[4, 5]
since these items exist in all lists.
But the loop above won't work when there are more lists. As described earlier, I don't know the number of lists beforehand, because it depends on the number of words in the search query. So basically the nesting depth of my loop depends on the number of lists I have.
While doing research I found some postings suggesting recursion may do the job. Unfortunately my Python skills aren't that good yet.
Any suggestions?
Thanks in advance!
Just use sets and set.intersection to find the common elements; also note that {1,2,3,4,5} is how to create a set of ints, not set(1,2,3,4,5):
index_list = [set() for i in range(4)]
index_list[0].update({1,2,3,4,5})
index_list[1].update({3,4,5,6,7})
index_list[2].update({4,5,6,7})
shortest_list = index_list[2]
print(shortest_list.intersection(*index_list[:2]))
set([4, 5])
Try to go about it the opposite way: First make a list of all the index lists by doing something like
index_list_list = []
for ix_list in get_index_lists():  # or whatever
    index_list_list.append(ix_list)
Then you can loop through all of these, removing the elements in your 'remaining_items' list if they are not contained in the others:
from copy import copy

remaining_items = shortest_list
for index_list in index_list_list:
    curr_remaining_items = copy(remaining_items)
    for element in curr_remaining_items:
        if element not in index_list:
            remaining_items.remove(element)
Your final 'remaining_items' list would then contain the elements that are common to all the lists.
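A minimal self-contained run of that idea, with made-up lists standing in for the question's word-index lists:

from copy import copy

shortest_list = [4, 5, 6, 7]
index_list_list = [[1, 2, 3, 4, 5], [3, 4, 5, 6, 7], [4, 5, 6, 7]]

remaining_items = copy(shortest_list)      # work on a copy of the shortest list
for index_list in index_list_list:
    for element in copy(remaining_items):  # iterate over a snapshot while removing
        if element not in index_list:
            remaining_items.remove(element)

print(remaining_items)   # [4, 5]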
I've written code following your approach. You can try out the following code:
index_list=['1','2','3','4','5']
index_list1=['3','4','5','6','7']
index_list2=['4','5','6','7']
result = []
for element in index_list:
    for subelement in element:
        for element2 in index_list1:
            if subelement in element2:
                for element3 in index_list2:
                    if subelement in element3:
                        result.append(subelement)
print result
output:
['4', '5']
It is a little confusing that you appear to have something shadowing the built-in type set, which happens to be built for precisely this type of job.
subset = set(shortest_list)
# Use map here to only lookup method once.
# We don't need the result, which will be a list of None.
map(subset.intersection_update, index_lists)
# Alternative: describe the reduction more directly
# Cost: rebuilds a new set for each list
subset = reduce(set.intersection, index_lists, set(shortest_list))
Note: As Padraic indicated in his answer, set.intersection and set.intersection_update both take an arbitrary number of arguments so it is unnecessary to use map or reduce in this case.
It is also by far preferable that all the lists already be sets, since the intersection can be optimized to the size of the smaller set, but a list intersection requires scanning the list.
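Since set.intersection accepts any number of arguments, the whole check collapses to a one-liner; here is a minimal Python 3 sketch with made-up index sets:

index_sets = [{1, 2, 3, 4, 5}, {3, 4, 5, 6, 7}, {4, 5, 6, 7}]
common = set.intersection(*index_sets)   # unpack however many sets there are
print(common)   # {4, 5}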

Most efficient way to cycle over Python sublists while making them grow (insert method)?

My problem is about managing insert/append methods within loops.
I have two lists of length N: the first one (let's call it s) indicates the subset to which each element belongs, while the second one represents a quantity x that I want to evaluate. For the sake of simplicity, let's say that every subset has T elements.
cont = 0;
for i in range(NSUBSETS):
    for j in range(T):
        subcont = 0;
        if (x[(i*T)+j] < 100):
            s.insert(((i+1)*T)+cont, s[(i*T)+j+cont]);
            x.insert(((i+1)*T)+cont, x[(i*T)+j+cont]);
            subcont += 1;
        cont += subcont;
While cycling over all the elements of the two lists, I'd like that, when a certain condition is fulfilled (e.g. x[i] < 100), a copy of that element is put at the end of its subset, with the loop then going on until all the original members of the subset have been analysed. It is important to maintain the "order", i.e. to insert the elements next to the last element of the subset they come from.
I thought a way could be to store in 2 counter variables the number of copies made within the subset and globally, respectively (see code): this way, I could shift the index of the element I was looking at accordingly. I wonder whether there is some simpler way to do that, maybe using some Python magic.
If the idea is to interpolate your extra copies into the lists without making a complete copy of the whole list, you can try this with a generator. As you loop through your lists, collect the matches you want to append; yield each item as you process it, then yield each collected item too.
This is a simplified example with only one list, but hopefully it illustrates the idea. You only get a copy if you do as I've done and expand the generator with a comprehension. If you just wanted to store or further analyze the processed list (e.g., to write it to disk) you could avoid ever holding it all in memory.
def append_matches(input_list, start, end, predicate):
    # where predicate is a filter function or lambda
    for item in input_list[start:end]:
        yield item
    for item in filter(predicate, input_list[start:end]):
        yield item
example = lambda p: p < 100
data = [1,2,3,101,102,103,4,5,6,104,105,106]
print [k for k in append_matches (data, 0, 6, example)]
print [k for k in append_matches (data, 5, 11, example)]
[1, 2, 3, 101, 102, 103, 1, 2, 3]
[103, 4, 5, 6, 104, 105, 4, 5, 6]
I'm guessing that your desire not to copy the lists is based on your C background, an assumption that it would be more expensive that way. In Python, lists are not linked lists; they are more like vectors, so an insert takes O(n) time and each of those insert operations shifts (copies) part of the list.
Building a new copy with the extra elements would be more efficient than trying to update in-place. If you really want to go that way you would need to write a LinkedList class that held prev/next references so that your Python code really was a copy of the C approach.
The most Pythonic approach would not try to do an in-place update, as it is simpler to express what you want using values rather than references:
def expand(origLs):
    subsets = [origLs[i*T:(i+1)*T] for i in range(NSUBSETS)]
    result = []
    for s in subsets:
        copies = [e for e in s if e < 100]
        result += s + copies
    return result
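For example, with assumed sizes T = 3 and NSUBSETS = 2 and a made-up list of quantities:

T, NSUBSETS = 3, 2                # assumed sizes, for illustration only
data = [1, 101, 2, 102, 3, 103]   # two subsets of three elements each
print(expand(data))               # [1, 101, 2, 1, 2, 102, 3, 103, 3]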
The main thing to keep in mind is that the underlying cost model for an interpreted garbage-collected language is very different to C. Not all copy operations actually cause data movement, and there are no guarantees that trying to reuse the same memory will be successful or more efficient. The only real answer is to try both techniques on your real problem and profile the results.
I'd be inclined to make a copy of your lists and then, while looping across the originals, whenever you come across a criterion for insertion, insert into the copy at the place you need it to be. You can then output the copied and updated lists.
I think I have found a simple solution.
I cycle from the last subset backwards, putting the copies at the end of each subset. This way, I avoid encountering the "new" elements and get rid of the counters and the like.
for i in range(NSUBSETS-1, -1, -1):
    for j in range(T-1, -1, -1):
        if (x[(i*T)+j] < 100):
            s.insert(((i+1)*T), s[(i*T)+j])
            x.insert(((i+1)*T), x[(i*T)+j])
One possibility would be using numpy's advanced indexing to provide the illusion of copying elements to the ends of the subsets by building a list of "copy" indices for the original list, and adding that to an index/slice list that represents each subset. Then you'd combine all the index/slice lists at the end, and use the final index list to access all your items (I believe there's support for doing so generator-style, too, which you may find useful as advanced indexing/slicing returns a copy rather than a view). Depending on how many elements meet the criteria to be copied, this should be decently efficient as each subset will have its indices as a slice object, reducing the number of indices needed to keep track of.
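A rough sketch of that idea with made-up data; plain index arrays are used here instead of per-subset slice objects, so it is only an approximation of the suggestion:

import numpy as np

T, NSUBSETS = 3, 2
x = np.array([1, 101, 2, 102, 3, 103])

index_chunks = []
for i in range(NSUBSETS):
    base = np.arange(i*T, (i+1)*T)    # positions of the original subset
    copies = base[x[base] < 100]      # positions whose values get duplicated
    index_chunks.append(np.concatenate([base, copies]))

final_index = np.concatenate(index_chunks)
print(x[final_index])                 # [  1 101   2   1   2 102   3 103   3]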

if a in list (I want to add something to the element in list) Python

As the topic states:
list = ["a", "b"]
element = "ac"
Can I use the:
if element in list:
so that it checks whether element is equal to some element of the list with "c" appended to it?
Pseudocode to what I want to achieve:
if element in (list+c)
What is the best way to get this behavior in Python?
Edit: I know there are many ways to get around this, but can this be done in one line, like the code above?
More efficient would be:
if any(x+'c' == element for x in your_list):
as it avoids scanning through the list twice (once to make the "+c" versions, once to check if element is in the resulting list). It'll also "short-circuit" (that is, stop early) if it finds the element before going through the entire list.
P.S. - it's best not to name variables list, since that's already the name for the actual list type.
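For example, with the question's values and the answer's your_list name:

your_list = ["a", "b"]
element = "ac"
print(any(x + 'c' == element for x in your_list))   # True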
if element in [elem + 'c' for elem in my_list]:
    # ...
It's never a good practice to call a variable list (or int, float, map, tuple, etc.), because you are shadowing those built-in names.
if element[0] in list:
You don't want to add "c" to every item in the list and check to see whether "ac" is in the result; you want to check whether the first letter of "ac" is in the list. It's the same thing, except a lot easier.
if element[:-1] in list:
It is better to compute element without the 'c'; that way you are making just one calculation.
