python randomize preserving every 3 items (list of lists?) - python

I have a list that looks like this:
importantStuff = [1, 1, '539287_640214358329_682457984_n.jpg',
                  1, 2, '10273290_745672065239_6510327149011099172_o.jpg',
                  1, 3, '196453_640214498049_2103349152_n.jpg',
                  1, 4, '1277816_699439470729_877539164_o.jpg',
                  1, 5, '10682279_777090163119_2043260231272895742_o.jpg',
                  1, 6, '736323_656687181659_1199237329_o.jpg',
                  1, 7, '185184_640214403239_1313590472_n.jpg',
                  1, 8, '1898786_730004488189_837817798_o.jpg']
I need a way to shuffle it keeping the "rows" (aka every 3 values) constant. The two numbers need to stay associated with the same .jpg.
Is the best way to do this to create a list of lists? how would that work? I found answers to creating a flat list from a list of lists, but I need to go in the opposite direction.

A list of lists is probably one of the easier ways of handling this. Assuming your initial list is properly formatted according to your description (two numbers to one string), you could do something like this:
from random import shuffle
listOfLists = []
# use integer division: range() needs an int in Python 3
for i in range(0, len(importantStuff) // 3):
    listOfLists.append([importantStuff[i*3+0], importantStuff[i*3+1], importantStuff[i*3+2]])
shuffle(listOfLists)
singleList = listOfLists[0]     # one shuffled row
singleItem = listOfLists[0][2]  # the filename in that row
For more generic cases, use variables instead of hardcoded values (for example, a group size in place of the literal 3).
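A minimal sketch of that more generic version, assuming the group size is passed in as a parameter. The function name shuffle_rows and the shortened file names are made up for illustration:

```python
import random

def shuffle_rows(flat, width=3):
    """Split a flat list into rows of `width` items and shuffle the rows."""
    if len(flat) % width:
        raise ValueError("list length must be a multiple of width")
    # slice the flat list into consecutive rows
    rows = [flat[i:i + width] for i in range(0, len(flat), width)]
    random.shuffle(rows)  # shuffles the rows, never the items inside a row
    return rows

# hypothetical, shortened sample data
important_stuff = [1, 1, 'a.jpg', 1, 2, 'b.jpg', 1, 3, 'c.jpg']
rows = shuffle_rows(important_stuff)
# flatten back if a single flat list is needed afterwards
flat_again = [item for row in rows for item in row]
print(rows)
print(flat_again)
```

Each row keeps its two numbers next to the same file name; only the order of the rows changes.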

import random
# list of lists: slice the flat list into rows of 3
data = [data[x:x+3] for x in range(0, len(data), 3)]
# shuffle the rows in place
random.shuffle(data)
# OR keep them in a dictionary with the filename as key.
foo = {}
for i in data:
    foo[i[2]] = i[:2]
print(foo)
It's really your choice.
Personally, I would keep it as a dictionary for fast lookup and organization.

Related

How to iterate over a list of suffixes to add to the end of each variable in a list?

What I'm trying to do is optimize my current code that uses dataframes. The dataframes follow a similar naming convention, so instead of explicitly naming them, I'm trying to create them on the fly and then process them.
days = [1, 2, 3, 4, 5]
#1,2,3,4,5 are dataframes
endings = ['_operation1', '_operation2']
suffixedFrames = [x.add_suffix(y) for x, y in zip(days, endings)]
print(suffixedFrames)
The problem is that when I run this, it only prints out two dataframes, specifically, 1_operation1 and 1_operation2. How do I make it add the suffix to every dataframe in list days?
zip does not do what you want here. It creates an iterator of tuples, where the i-th tuple contains the i-th element of each of the two lists, and it stops as soon as the shorter list runs out, as you can see in the documentation. In your case, if you run print(list(zip(days, endings))) you will see this:
[(1, '_operation1'), (2, '_operation2')].
In order to achieve what you want you can do as follows:
import itertools
days = [1, 2, 3, 4, 5]
#1,2,3,4,5 are dataframes
endings = ['_operation1', '_operation2']
suffixedFrames = [x.add_suffix(y) for x, y in list(itertools.product(days, endings))]
print(suffixedFrames)
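To see the difference between the two without any dataframes, here is a small sketch that uses plain strings as stand-ins (the names day1, day2, day3 are placeholders, not from the question):

```python
from itertools import product

days = ['day1', 'day2', 'day3']           # stand-ins for the dataframes
endings = ['_operation1', '_operation2']

# zip pairs items position by position and stops at the shorter list
zipped = [d + e for d, e in zip(days, endings)]
# product pairs every item of days with every item of endings
crossed = [d + e for d, e in product(days, endings)]

print(zipped)   # ['day1_operation1', 'day2_operation2']
print(crossed)  # 6 strings: each day combined with each ending
```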

Iterate over values in nested list

I'm working on scientific data and using a module called pysam in order to get reference position for each unique "object" in my file.
In the end, I obtain a "list of lists" that looks like that (here I provide an example with only two objects in the file):
pos = [[1,2,3,6,7,8,15,16,17,20],[1,5,6,7,8,20]]
and, for each list in pos, I would like to iterate over the values and compare value[i] with value[i+1]. When the difference is greater than 2 (for example) I want to store both values (value[i] and value[i+1]) into a new list.
If we call it final_pos then I would like to obtain:
final_pos = [[3,6,8,15,17,20],[1,5,8,20]]
It seemed rather easy to do at first, but I must be lacking some basic knowledge of how lists work, and I can't manage to iterate over each value of each list and then compare consecutive values.
If anyone has an idea, I'm more than willing to hear about it !
Thanks in advance for your time !
EDIT: Here's what I tried:
pos = [[1,2,3,6,7,8,15,16,17,20],[1,5,6,7,8,20]]
final_pos = []
for list in pos:
    for value in list:
        for i in range(len(list)-1):
            if value[i+1]-value[i] > 2:
                final_pos.append(value[i])
                final_pos.append(value[i+1])
You can iterate over each of the individual lists in pos and compare consecutive values. To collect the values, you can use a temporary set, because you don't want to insert the same element twice into your final list. Then you can sort the temporary set into a list (to restore the order) and append it to your final list. Note that the sorting only restores the original order if the elements in the original list are already sorted.
pos = [[1,2,3,6,7,8,15,16,17,20],[1,5,6,7,8,20]]
final_pos = []
for l in pos:
    temp_set = set()
    for i in range(len(l)-1):
        if l[i+1] - l[i] > 2:
            temp_set.add(l[i])
            temp_set.add(l[i+1])
    final_pos.append(sorted(temp_set))
print(final_pos)
Output
[[3, 6, 8, 15, 17, 20], [1, 5, 8, 20]]
Edit: About what you tried:
for list in pos:
This line will give us list = [1,2,3,6,7,8,15,16,17,20] (in the first iteration)
for value in list:
This line will give us value = 1 (in the first iteration)
Now, value is just a number, not a list, and hence value[i] and value[i+1] don't make sense.
Your code has an obvious "too many loops" issue. It also stores the result as a flat list, whereas you need a list of lists.
It also has a more subtle bug: the same index can be added more than once if two intervals match in a row. I've registered the added indices in a set to avoid this.
The bug doesn't show with your original data (which tripped up a lot of experienced users, including me), so I've changed it:
pos = [[1,2,3,6,7,8,11,15,16,17,20],[1,5,6,7,8,20]]
final_pos = []
for value in pos:
    sublist = []
    added_indexes = set()
    for i in range(len(value)-1):
        if value[i+1]-value[i] > 2:
            if i not in added_indexes:
                sublist.append(value[i])
                ## added_indexes.add(i) # we don't need to add it, we won't go back
            # no need to test for i+1, it's new
            sublist.append(value[i+1])
            # registering it for later
            added_indexes.add(i+1)
    final_pos.append(sublist)
print(final_pos)
result:
[[3, 6, 8, 11, 15, 17, 20], [1, 5, 8, 20]]
Storing the indexes in a set rather than the values (which would also work here with a post-processing sort, see this answer) also works when the objects aren't hashable (like custom objects with a custom distance defined between them) or when the data is only partially sorted (waves), if that is of interest (ex: pos = [[1,2,3,6,15,16,17,20,1,6,10,11],[1,5,6,7,8,20,1,5,6,7,8,20]])
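As a compact variant of the index-tracking idea, the loop can be wrapped in a function. This sketch is a simplification, not the code above: it assumes that only the most recently appended index can ever repeat, so it keeps a single integer instead of a set. The name gap_boundaries and the threshold parameter are made up for illustration:

```python
def gap_boundaries(values, threshold=2):
    """Return both endpoints of every consecutive gap larger than threshold."""
    result = []
    last_added = -1  # index of the most recently appended element
    for i in range(len(values) - 1):
        if values[i + 1] - values[i] > threshold:
            if i != last_added:
                # left endpoint was not already stored by the previous gap
                result.append(values[i])
            result.append(values[i + 1])
            last_added = i + 1
    return result

pos = [[1, 2, 3, 6, 7, 8, 11, 15, 16, 17, 20], [1, 5, 6, 7, 8, 20]]
final_pos = [gap_boundaries(sub) for sub in pos]
print(final_pos)  # [[3, 6, 8, 11, 15, 17, 20], [1, 5, 8, 20]]
```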

Combinations of several dicts python

I have 15 dicts like the following 3 (all 15 are of varying lengths).
For example:
HQDict = {'HQ1':10, 'HQ2':3, 'HQ3':5}
BADict = {'BA1':15, 'BA2':4, 'BA3':3}
STDict = {'ST1':5, 'ST2':4, 'ST3':3}
I want to create all the possible combinations of the 15 dicts with only one element selected from each dict with the values added together and the keys stored in a list. I have been able to get all the information into the respective dicts but I am clueless on where to start with the combinations, I have seen itertools.combinations but I'm not sure how to make it only select 1 element from each dict. If you need any more information please ask and I will be happy to edit.
Edit1:
I also needed to add that the values are additive so value of BA2 will be the value of BA1 + BA2 and that the combination could be a list of 1.
list=[HQ1,BA2,ST1]
value=34
next permutation
list=[HQ2]
value=13
Edit2:
Rather than try and create the combinations of the dicts the end goal is to give the function a total and it will return all the possible combinations of buildings (each dict represents a building and each item in the dict a level) that add up to that total. So for example:
combinations(34) would return
[HQ1,BA2,ST1]
and combinations(13) would return
[HQ2]
Pastebin of the file containing all buildings and the code I'm using to create the dicts: link to pastebin
I have seen itertools.combinations but I'm not sure how to make it only select 1 element from each dict.
Use itertools.product(..) instead. It takes a variable number of arguments, each an iterable of options, and picks one option from each per iteration:
>>> from itertools import product
>>> map(dict, product(HQDict.items(), STDict.items(), BADict.items()))
[{'HQ1': 10, 'BA2': 4, 'ST1': 5}, {'HQ1': 10, 'ST1': 5, 'BA3': 3}, ...... ]
If you have 15 such dicts, I'd suggest putting all of them in a list, and calling product like below:
>>> map(dict, product(*(d.items() for d in list_of_dicts)))
EDIT: In Python 3, you will get a map object back, and you'll have to iterate over it to get the actual values. You can convert it to a list, but that defeats the purpose of map returning something lazy that you can iterate over. If you do want a list, you can build one like this:
>>> [dict(x) for x in product(HQDict.items(), STDict.items(), BADict.items())]
[{'HQ1': 10, 'BA2': 4, 'ST1': 5}, {'HQ1': 10, 'ST1': 5, 'BA1': 15}, ..]
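For the goal described in Edit2 (find every selection of building levels that adds up to a given total), a brute-force sketch on top of product could look like the following. It rests on several assumptions that are not confirmed by the question: the dicts list levels in ascending order, the cost of a level is the running sum of all levels up to it (as in Edit1), and a building may be skipped entirely. The names level_options and combinations_for_total are invented for this sketch.

```python
from itertools import accumulate, product

HQDict = {'HQ1': 10, 'HQ2': 3, 'HQ3': 5}
BADict = {'BA1': 15, 'BA2': 4, 'BA3': 3}
STDict = {'ST1': 5, 'ST2': 4, 'ST3': 3}

def level_options(building):
    """All choices for one building: skip it, or take one level at its cumulative cost."""
    cumulative = accumulate(building.values())  # e.g. 10, 13, 18 for HQDict
    return [(None, 0)] + list(zip(building.keys(), cumulative))

def combinations_for_total(buildings, total):
    """Brute force: try one option per building and keep selections that hit the total."""
    results = []
    for choice in product(*(level_options(b) for b in buildings)):
        if sum(cost for _, cost in choice) == total:
            keys = [key for key, _ in choice if key is not None]
            if keys:  # ignore the selection that skips every building
                results.append(keys)
    return results

buildings = [HQDict, BADict, STDict]
print(combinations_for_total(buildings, 13))  # [['HQ2']]
print(combinations_for_total(buildings, 34))  # includes ['HQ1', 'BA2', 'ST1']
```

With these three sample dicts this checks only 64 selections, but the number of selections multiplies with every building added, so for 15 dicts a smarter search (pruning once the running sum exceeds the total) would probably be needed.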

Make python set discard tuples with the same content, regardless of their order [duplicate]

I'm trying to make a set of sets in Python. I can't figure out how to do it.
Starting with the empty set xx:
xx = set([])
# Now we have some other set, for example
elements = set([2,3,4])
xx.add(elements)
but I get
TypeError: unhashable type: 'list'
or
TypeError: unhashable type: 'set'
Is it possible to have a set of sets in Python?
I am dealing with a large collection of sets and I want to avoid having to deal with duplicate sets (a set B of sets A1, A2, ...., An would "cancel" two sets if Ai = Aj)
Python's complaining because the inner set objects are mutable and thus not hashable. The solution is to use frozenset for the inner sets, to indicate that you have no intention of modifying them.
xx = set([])
# Nested sets must be frozen
elements = frozenset([2,3,4])
xx.add(elements)
People have already mentioned that you can do this with frozenset(), so I will just add some code showing how to achieve it:
For example you want to create a set of sets from the following list of lists:
t = [[], [1, 2], [5], [1, 2, 5], [1, 2, 3, 4], [1, 2, 3, 6]]
you can create your set in the following way:
t1 = set(frozenset(i) for i in t)
Use frozenset inside.
So I had the exact same problem. I wanted to make a data structure that works as a set of sets. The problem is that a set can only contain hashable (immutable) objects. So, what you can do is simply make it a set of tuples. That worked fine for me!
A = set()
A.add((2, 3, 4))  # adds the element
A.add((2, 3, 4))  # does not add the same element again
A.add((2, 3, 5))  # adds the element, because it is different!
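One caveat when choosing between the two approaches: tuples compare by position, so two tuples with the same contents in a different order count as different elements, whereas frozensets ignore order. A short sketch with made-up sample values:

```python
items = [(2, 3, 4), (4, 3, 2), (2, 3, 5)]

# tuples are order-sensitive: (2, 3, 4) and (4, 3, 2) are both kept
as_tuples = set(items)
# frozensets ignore order: (2, 3, 4) and (4, 3, 2) collapse into one element
as_frozensets = {frozenset(item) for item in items}

print(len(as_tuples))      # 3
print(len(as_frozensets))  # 2
```

If the tuples must stay tuples, sorting each one before adding it (for example tuple(sorted(item))) gives the same order-insensitive behaviour, provided the contents are sortable.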

Deduping a complex list using a simplified copy of itself

I have two lists of strings that are passed into a function. They are more or less the same, except that one has been run through a regex filter to remove certain boilerplate substrings (e.g. removing 'LLC' from 'Blues Brothers LLC').
This function is meant to internally deduplicate the modified list and remove the associated item in the non-modified list. You can assume that these lists were sorted alphabetically before being run through the regex filter, and remain in the same order (i.e. original[x] and modified[x] refer to the same entity, even if original[x] != modified[x]). Relative order must be maintained between the two lists in the output.
This is what I have so far. It works 99% of the time, except for very rare combinations of inputs and boilerplate strings (1 in 1000s) where some output strings will be mismatched by a single list position. Input lists are 'original' and 'modified'.
# record positions of duplicates so we're not trying to modify the same lists we're iterating
dellist_modified = []
dellist_original = []
# probably not necessary, extra precaution against modifying lists being iterated.
# fwiw the problem still exists if I remove these and change their references in the last two lines directly to the input lists
modified_copy = modified
original_copy = original
for i in range(0, len(modified)-1):
    if modified[i] == modified[i+1]:
        dellist_modified.append(modified[i+1])
        dellist_original.append(original[i+1])
for j in dellist_modified:
    if j in modified:
        del modified_copy[agg_match.index(j)]
        del original_copy[agg_match.index(j)]
# return modified_copy and original_copy
It's ugly, but it's all I got. My testing indicates the problem is created by the last chunk of code.
Modifications or entirely new approaches would be greatly appreciated. My next step is to try using dictionaries.
Here is a clean way of doing this:
original = list(range(10))
modified = list(original)
modified[5] = "a"
modified[6] = "a"
def without_repeated(original, modified):
    seen = set()
    for (o, m) in zip(original, modified):
        if m not in seen:
            seen.add(m)
            yield o, m
original, modified = zip(*without_repeated(original, modified))
print(original)
print(modified)
Giving us:
(0, 1, 2, 3, 4, 5, 7, 8, 9)
(0, 1, 2, 3, 4, 'a', 7, 8, 9)
We iterate through both lists at the same time. We keep a set of items we have seen (sets have very fast membership checks) and yield only the pairs whose modified value we haven't already seen.
We can then use zip again to give us two lists back.
Note we could actually do this like so:
seen = set()
original, modified = zip(*((o, m) for (o, m) in zip(original, modified) if m not in seen and not seen.add(m)))
This works the same way, except that it uses a single generator expression, with the addition to the set hacked into the condition (set.add always returns None, which is falsy, so not seen.add(m) is always true). However, this method is considerably harder to read, so I'd advise against it; it's just an example for the sake of it.
A set in python is a collection of distinct elements. Is the order of these elements critical? Something like this may work:
distinct = list(set(original))
Why use parallel lists? Why not a single list of class instances? That keeps things grouped easily, and reduces your list lookups.
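A minimal sketch of that single-list idea, assuming each entry pairs the original string with its filtered form. The class name Company, the function dedupe, and every sample name except 'Blues Brothers LLC' are hypothetical:

```python
from typing import NamedTuple

class Company(NamedTuple):
    original: str   # the string before the regex filter
    modified: str   # the string after boilerplate was removed

def dedupe(records):
    """Keep the first record for each modified name, preserving order."""
    seen = set()
    kept = []
    for record in records:
        if record.modified not in seen:
            seen.add(record.modified)
            kept.append(record)
    return kept

records = [
    Company('Blues Brothers', 'Blues Brothers'),
    Company('Blues Brothers LLC', 'Blues Brothers'),
    Company('Soul Food Cafe LLC', 'Soul Food Cafe'),
]
kept = dedupe(records)
print([r.original for r in kept])  # ['Blues Brothers', 'Soul Food Cafe LLC']
```

Because each original travels with its modified form, the two values cannot drift out of step by a list position.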
