My shuffle algorithm crashes when more trials than objects - python

I am trying to make an experiment where a folder is scanned for images. For each trial, a target is shown and some (7) distractor images. Afterward, in half the trials people are shown the target image and in the other half, they are shown an image that wasn't in the previous display.
My current code sort of works, but only if there are fewer trials than objects:
repeats = 20
# Scan dir for images
jpgs = []
for path, dirs, files in os.walk(directory):
for f in files:
if f.endswith('.jpg'):
jpgs.append(f)
# Shuffle up jpgs
np.random.shuffle(jpgs)
# Create list with target and probe object, Half random, half identical
display = []
question = []
sameobject = []
position = np.repeat([0,1,2,3,4,5,6,7], repeats)
for x in range(1,(repeats*8)+1):
display.append(jpgs[x])
if x % 2 == 0:
question.append(jpgs[-x])
sameobject.append(0)
else:
question.append(jpgs[x])
sameobject.append(1)
# Concatonate objects together
together = np.c_[display,question,position,sameobject]
np.random.shuffle(together)
for x in together:
# Shuffle and set image
np.random.shuffle(jpgs)
myList = [i for i in jpgs if i != together[trial,0]]
myList = [i for i in myList if i != together[trial,1]]
# Set correct image for target
myList[int(together[trial,2])] = together[trial,0]
First of all, I am aware that this is horrible code. But it gets the job done coarsely. With 200 jpgs and a repeat of 20, it works. If repeat is set to 30 it crashes.
Here is an example with repeat too high:
File "H:\Code\Stims\BetaObjectPosition.py", line 214, in <module>
display.append(jpgs[x])
IndexError: list index out of range
Is there a way to update my code in a way that allows more trials while all objects are used as evenly as possible (one object should not be displayed 3 times while another is displayed 0) over an entire experiment?
Full, reproducible example
Bonus points if anyone can see an obvious way to balance the way the 7 distractor images are selected too.
Thanks for taking your time to read this. I hope you can help me onwards.

The solution that changes your code the least should be to change each call of jpgs[x] to jpgs[x % len(jpgs)]1. This should get rid of the IndexError; it basically wraps the list index "around the edges", making sure it's never to large. Although I'm not sure how it will interact with the jpgs[-x] call.
An alternative would be to implement a class that produces a longer sequence of objects from a shorter one.
Example:
from random import shuffle
class InfiniteRepeatingSequence(object):
def __init__(self, source_list):
self._source = source_list
self._current = []
def next(self):
if len(self._current) == 0:
# copy the source
self._current = self._source[:]
shuffle(self._current)
# get and remove an item from a list
return self._current.pop()
This class repeats the list indefinitely. It makes sure to use each element once before re-using the list.
It can easily be turned into an iterator (try changing next to __next__). But be careful since the class above produces an infinite sequence of elements.
1 See "How does % work in Python?" for an explanation about the modulo operator.
Edit: Added link to modulo question.

Related

Inexplicable error deleting elements from a list after taking them from that same list

I am currently having an error deleting 2 elements from a list after taking them from that same list. The problem does not seem difficult, and it is not, but I getting an error that I am not able to explain.
Initially I have two list: list_h (with 30.000 elements) and list_v (with 60.000 elements). The problem here is just to iterate, choosing each iteration between list_h or list_v until both of the lists are empty. The choice between both lists is made randomly each iteration and a condition must be fulfilled: if list_h is chosen, one element is added to a final list and deleted from the list it came from and if list_v is chosen, we have to choose one pair of elements, adding them in the same way as an element to that same final list and deleted from list_v. Obviously, such a pair of elements cannot consist of the same element twice.
The whole process takes a few tens of minutes, but despite the fact that for most of the elements there is no error at all, surprisingly every time I try it, in some iterations an error occurs showing that an element has been tried to be deleted twice. In fact, I print the elements before deleting them to verify that they are indeed not the same, but the function (which follows immediately after printing the elements) receives two identical element that are neither of the two that were just printed.
Example:
def add_vertical_picture(pic1, pic2, f_s_order, pic_left):
try:
pic_left.remove(pic1)
pic_left.remove(pic2)
f_s_order.add((pic1, pic2))
except Exception as e:
print('V Error in: ', pic1.show_pic_info())
return pic_left, f_s_order
def find_nextpic_maxscore(last_pic, orientation_order_l):
best_score = 0
best_next_pic = None
pic2 = None
for i in range(len(orientation_order_l)):
score_v = get_2pictures_score(last_pic, orientation_order_l[i])
if score_v>best_score:
best_score = score_v
best_next_pic = orientation_order_l[i]
if orientation_order_l[0].orientation == 'V':
for s in [orientation_order_l[i] for i in range(len(orientation_order_l)) if orientation_order_l[i].id_num != best_next_pic.id_num]:
if s.id_num != best_next_pic.id_num:
pic2 = s
return best_next_pic, pic2
while len(Hpictures_left)>0 or len(Vpictures_left)>0:
V_prob = len(Vpictures_left)/(len(Vpictures_left)+len(Hpictures_left))
if np.random.random_sample()<V_prob: # Vertical photo
best_next_Vpic, best2_next_Vpic = find_nextpic_maxscore(list(final_slice_order)[-1], Vpictures_left)
print('Best1: ', best_next_Vpic.show_pic_info(),' -- Best2:', best2_next_Vpic.show_pic_info())
Vpictures_left, final_slice_order = add_vertical_picture(best_next_Vpic, best2_next_Vpic, final_slice_order, Vpictures_left)
And there is more! Every time I execute the program, the numbers of errors that occur changes, sometimes there are only two errors, but sometimes the number increases to 10 or more... How can I explain this?
Thanks you all in advance

Extracting multiple data from a single list

I working on a text file that contains multiple information. I converted it into a list in python and right now I'm trying to separate the different data into different lists. The data is presented as following:
CODE/ DESCRIPTION/ Unity/ Value1/ Value2/ Value3/ Value4 and then repeat, an example would be:
P03133 Auxiliar helper un 203.02 417.54 437.22 675.80
My approach to it until now has been:
Creating lists to storage each information:
codes = []
description = []
unity = []
cost = []
Through loops finding a code, based on the code's structure, and using the code's index as base to find the remaining values.
Finding a code's easy, it's a distinct type of information amongst the other data.
For the remaining values I made a loop to find the next value that is numeric after a code. That way I can delimitate the rest of the indexes:
The unity would be the code's index + index until isnumeric - 1, hence it's the first information prior to the first numeric value in each line.
The cost would be the code's index + index until isnumeric + 2, the third value is the only one I need to store.
The description is a little harder, the number of elements that compose it varies across the list. So I used slicing starting at code's index + 1 and ending at index until isnumeric - 2.
for i, carc in enumerate(txtl):
if carc[0] == "P" and carc[1].isnumeric():
codes.append(carc)
j = 0
while not txtl[i+j].isnumeric():
j = j + 1
description.append(" ".join(txtl[i+1:i+j-2]))
unity.append(txtl[i+j-1])
cost.append(txtl[i+j])
I'm facing some problems with this approach, although there will always be more elements to the list after a code I'm getting the error:
while not txtl[i+j].isnumeric():
txtl[i+j] list index out of range.
Accepting any solution to debug my code or even new solutions to problem.
OBS: I'm also going to have to do this to a really similar data font, but the code would be just a sequence of 7 numbers, thus harder to find amongst the other data. Any solution that includes this facet is also appreciated!
A slight addition to your code should resolve this:
while i+j < len(txtl) and not txtl[i+j].isnumeric():
j += 1
The first condition fails when out of bounds, so the second one doesn't get checked.
Also, please use a list of dict items instead of 4 different lists, fe:
thelist = []
thelist.append({'codes': 69, 'description': 'random text', 'unity': 'whatever', 'cost': 'your life'})
In this way you always have the correct values together in the list, and you don't need to keep track of where you are with indexes or other black magic...
EDIT after comment interactions:
Ok, so in this case you split the line you are processing on the space character, and then process the words in the line.
from pprint import pprint # just for pretty printing
textl = 'P03133 Auxiliar helper un 203.02 417.54 437.22 675.80'
the_list = []
def handle_line(textl: str):
description = ''
unity = None
values = []
for word in textl.split()[1:]:
# it splits on space characters by default
# you can ignore the first item in the list, as this will always be the code
# str.isnumeric() doesn't work with floats, only integers. See https://stackoverflow.com/a/23639915/9267296
if not word.replace(',', '').replace('.', '').isnumeric():
if len(description) == 0:
description = word
else:
description = f'{description} {word}' # I like f-strings
elif not unity:
# if unity is still None, that means it has not been set yet
unity = word
else:
values.append(word)
return {'code': textl.split()[0], 'description': description, 'unity': unity, 'values': values}
the_list.append(handle_line(textl))
pprint(the_list)
str.isnumeric() doesn't work with floats, only integers. See https://stackoverflow.com/a/23639915/9267296

Trialhandler and time measuring in psychopy

For a go-NoGo Task I want to organize pictures with the data.TrialHandler class from psychopy:
trials = data.TrialHandler(ImageList, nReps=1, method='random')
Now I want to code a loop in which psychopy is going into the dictionary, is presenting the first set of pictures (e.g. A_n) and afterwards is going to the second set until the sixth set. I tried the following:
import glob, os, random, sys, time
import numpy.random as rnd
from psychopy import visual, core, event, monitors, gui, logging, data
im_a = glob.glob('./a*') # upload pictures of the a-type, gives out a List of .jpg-files
im_n = glob.glob('./n*') # n-type
im_e = glob.glob('./e*') # e-type
# combining Lists of Pictures
A_n = im_a + im_n
N_a = im_n + im_a
A_e = im_a + im_e
E_a = im_e + im_a
E_n = im_e + im_n
N_e = im_n + im_e
# making a Dictionary of Pictures and Conditions
PicList = [A_n, N_a, A_e, E_a, E_n, N_e] # just the six Combinations
CondList = [im_a,im_n,im_a,im_e,im_e,im_n] # images that are in the GO-Condition
ImageList = []
for imagelist, condition in zip(PicList, CondList):
ImageList.append({'imagelist':imagelist,'condition':condition}) # to associate the picturelist with the GO Conditionlist
for the header I ask an extra question: Combining and associating multiple dictionaries
# Set Experiment
win = visual.Window(color='white',units='pix', fullscr=False)
fixCross=visual.TextStim(win,text='+',color='black',pos=(0.0,0.0), height=40)
corrFb = visual.TextStim(win,text='O',height=40,color='green',pos=[0,0])
incorrFb = visual.TextStim(win,text='X',height=40, color='red',pos=[0,0])
# Start Experiement
trials = data.TrialHandler(ImageList, nReps=1, method='random')
rt_clock = core.Clock()
bitmap = visual.ImageStim(win)
for liste in ImageList[0:5]: # to loop through all 6 conditions
keys = []
for i,Pictures in enumerate(liste): # to loop through all pictures in each condition
bitmap.setImage(Pictures) # attribute error occurs, not if I use Pictures[0][0], even though in this case every pictures is the same
bitmap.draw()
win.flip()
rt_clock.reset()
resp = False
while rt_clock.getTime() < 2.0: # timelimit is defined 2 s
if not resp:
resp = event.getKeys(keyList=['space'])
rt = rt_clock.getTime()
if bool(resp) is (Pictures in CondList): # at this point python should have access to the Dictionary in which the associated GO Pictures are saved
corrFb.draw()
accu=1 # doesn't work yet
else:
incorrFb.draw()
accu=0
win.flip()
core.wait(0.5)
trials.addData('rt_'+str(i), rt) # is working well when loop: if trial in trials: ...; in this case trialHAndler is not used, therefor trials.addData is not working
trials.addData('accu_'+str(i), accu)
trials.saveAsExcel(datanames)
core.quit()
There are a few problems in this code: first it only presents one pictuere for six times, but not six different pictures [1]
and secondly a totally different problem [2] ist the time measuring and the saving of the accuracy which the trialhandler is doing, but for each trial. So it adds up all the RT's for each trial. I want to get the RT's for each image. I tried a few things like an extra stimulus.trialhandler for the stimuli and an extraloop in the end which gives me the last RT but not each. --> is answered below!!!
for stimuli in stimulus: stimulus.addData('rt', rt)
I know these four questions are a lot for one question, but maybe somebody can give me some good ideas of how I can solve these... Thanks everybody!
The reason for your problem labelled [1] is that you set the image to PicList[0][0] which never changes. As Mike is suggesting above you need::
for i,thisPic in enumerate(PicList):
bitmap.setImage(thisPic) #not PicList[0][0]
But maybe you need to go back to basics so that you actually use the trial handler to handle your trials ;-)
Create a single list of dictionaries where one dictionary represents one trial, and then run through those in order (tell the TrialHandler to use the list 'sequential' rather than 'random'). So the loops that you're currently using should just be to create your list of condition dicts, not to run the trials. Then pass that one list to the trial handler::
trials = TrialHandler(trialTypes = myCondsListInOrder, nReps=1, method='sequential')
for thisTrial in trials:
pic = thisTrial['pic']
stim.setImage(pic)
...
trials.addData('rt', rt)
trials.addData('acc',acc)
Also, I would output your data not using the excel format, but the 'long wide' format::
trials.saveAsWideText('mydataFile.csv')
best wishes,
Jon
(A) This isn't relevant to your question but will improve performance.
The line:
bitmap = visual.ImageStim(win)
Shouldn't occur within the loop. i.e. you should initialise each stimulus only once, then within a loop you just update that the properties of that stimulus, e.g. with bitmap.setImage(…). So shift this initialisation line to the top, where you create the TextStims.
(B) [deleted: I hadn't paid attention to the first code block.]
(C)
bitmap.draw(pictures)
This line doesn't take any arguments. It should just be bitmap.draw(). And anyway, it isn't clear what 'pictures' refers to. Remember that Python is case sensitive. This isn't the same thing as 'Pictures' defined in the loop above. I'm guessing that you want to update what picture is being shown? In that case, then you need to be doing the bitmap.setImage(…) line within this loop, not above, where you will always be drawing a fixed picture as that is the only one that gets set on each trial.
(D) Re the RTs, you are saving this only once per trial (check the indentation). If you want to save one per image, you need to indent these lines again. Also, you only get one line per trial in the data output. If you want to record multiple RTs per trial, you will need to give them unique names, e.g. rt_1, rt_2, …, rt_6 so they each appear in a separate column. e.g. you could use an enumerator for this loop:
for i, picture in enumerate(Piclist)
# lots of code
# then save data:
trials.addData('rt_'+str(i), rt)

What is the most effective way to incremente a large number of values in Python?

Okay, sorry if my problem seems a bit rough. I'll try to explain it in a figurative way, I hope this is satisfactory.
10 children. 5 boxes. Each child chooses three boxes. Each box is opened:
- If it contains something, all children selected this box gets 1 point
- Otherwise, nobody gets a point.
My question is about what I put in bold. Because in my code, there are lots of kids and lots of boxes.
Currently, I proceed as follows:
children = {"child_1" : 0, ... , "child_10": 0}
gp1 = ["child_3", "child_7", "child_10"] #children who selected the box 1
...
gp5 = ["child_2", "child_5", "child_8", "child_10"]
boxes = [(0,gp1), (0,gp2), (1,gp3), (1,gp4), (0,gp5)]
for box in boxes:
if box[0] == 1: #something inside
for child in box[1]:
children[child] += 1
I worry mainly about the for loop that assigns each child an extra point. Because in my final code, I have many many children, I fear that doing so would slow the program too.
Is there a more efficient way for all children of the same group may have their point faster?
Represent children as indices into arrays, not as strings:
childrenScores = [0] * 10
gp1 = [2,6,9] # children who selected box 1
...
gp5 = [1,4,7,9]
boxes = [(0,gp1), (0,gp2), (1,gp3), (1,gp4), (0,gp5)]
Then, you can store childrenScores as a NumPy array and use advanced indexing:
childrenScores = np.zeros(10, dtype=int)
...
for box in boxes:
if box[0]:
childrenScores[box[1]] += 1 # NumPy advanced indexing
This still involves a loop somewhere, but the loop is deep inside NumPy instead, which should provide a meaningful speedup.
The only speed up that I can think of is to use numpy arrays and stream the sum operation.
children[child] += np.ones(len(children[child]))
You should benchmark the operation and see if that is too slow for your business case.
What I would do
In the gpX lists don't save the "name of the child" (e.g. "child_10") but save a reference to the child's number of points.
How to do that
Using the fact that lists are objects in python, you can:
Change the children dict to look like: children = {"child_0": [0], "child_1": [0], ...} and so on.
When you assign to group, don't assign the key but assign the value (e.g. gp1.append(children["child_0"])).
The loop should then look like: for child in box[1]: child[0]+=1. This WILL update the children dict.
EDIT:
Why this is faster:
Because you leave out the part where you search for children[child], which might be costly.
This technique works because by storing the totals in a mutable type, and appending those values to the group lists, both the dict value and each box's list value will point to the same list entries, and changing one will change the other.
Two general points:
(1) Based on what you've told us, there's no reason to focus your energy on minor performance optimizations. Your time would be better spent thinking about ways to make your data structures less awkward and more communicative. A bunch of interrelated dicts, lists, and tuples quickly becomes difficult to maintain. For an alternative, see the example below.
(2) As the game designer, you understand that events follow a certain sequence: first the kids select their boxes, and later they discover whether they get points for them. But you don't have to implement it that way. A kid can choose a box and get points (or not) immediately. If there's a need to preserve the child's ignorance about such outcomes, the parts of your algorithm that depend on such ignorance can enforce that veil of secrecy as needed. The upshot: there is no need for a box to loop through its children, awarding points to each one; instead, award the points immediately to kids as boxes are selected.
import random
class Box(object):
def __init__(self, name):
self.name = name
self.prize = random.randint(0,1)
class Child(object):
def __init__(self, name):
self.name = name
self.boxes = []
self.score = 0
self._score = 0
def choose(self, n, boxes):
bs = random.sample(boxes, n)
for b in bs:
self.boxes.append(b)
self._score += b.prize
def reveal_score(self):
self.score = self._score
boxes = [Box(i) for i in range(5)]
kids = [Child(i) for i in range(10)]
for k in kids:
k.choose(3, boxes)
# Later in the game ...
for k in kids:
k.reveal_score()
print (k.name, k.score), '=>', [(b.name, b.prize) for b in k.boxes]
One way or another, you're going to be looping over the children, and your answer appears to avoid looping over children who don't get any points.
It might be slightly faster to use filter or itertools.ifilter to pick the boxes that have something in them:
import itertools
...
for box in itertools.ifilter(lambda x: x[0], boxes):
for child in box[1]
children[child] += 1
If you don't need to immediately print the number of points for every child, you could calculate it on demand, thus saving time. This could help if you only need to query a child every now and then for how many points it has. You can cache each result as you obtain is so you don't go about calculating it again the next time you need it.
Firstly, you'll need to know which groups a child belongs to. We'll store this information as map we'll call childToGroupsMap, which will map each child to an array holding the boxes it belongs to, like so:
childToGroupsMap = {}
for child in children:
childToGroupsMap[child[0]] = []
for box in boxes:
for child in box[1]:
if (box[1] not in childToGroupsMap[child]):
childToGroupsMap[child].append(box[1])
This constructs the reverse map from children to boxes.
It'll also help to have a map from each box to a boolean representing whether it's been opened:
boxToOpenedMap = {}
for box in boxes:
boxToOpenedMap[box[1]] = box[0]
Now, when someone queries how many points a child has, you can go through each of the boxes it belongs to (using childToGroupsMap, of course), and simply count how many of those boxes have been mapped to 1 in the map boxes:
def countBoxesForChild(child):
points = 0
for box in childToGroupsMap[child]
if boxToOpenedMap[box] == 1:
points += 1
return points
To make this better, you can cache the resulting number of points. Make a map like so:
childToPointsCalculated = {}
for child in children:
childToPointsCalculated[child[0]] = -1
Where the -1 denotes that we don't yet know how many points this child has.
Finally, you can modify your countBoxesForChild function to exploit the cache:
def countBoxesForChild(child):
if childToPointsCalculated[child] != -1
return childToPointsCalculated[child]
points = 0
for box in childToGroupsMap[child]
if boxToOpenedMap[box] == 1:
points += 1
childToPointsCalculated[child] = points
return points

Python - How to track (add/remove) lots of class instances over mulitple iterations?

I am building a dynamic map of earthquakes, using the vtk library.
I've already made a static one, (see here: https://github.com/yacobuk/QuakeCloud and here: http://www.youtube.com/watch?v=4HVdTcI_ozI) so I know the basic idea works, but now I want to try and show the quakes over time.
I have some code examples that show me how to update the frame, and how to add / remove objects, but I'm stuck on figuring out how to spin up an instance, track it for a few periods, then remove it.
The basic add/ remove code looks like this:
for point_and_mag in pm.points_mag:
time.sleep(0.5)
mag = point_and_mag[1]
point = point_and_mag[0]
if mag > 2:
pointCloud = VtkPointCloud(pm)
pointCloud.addPoint(point, math.log(mag)*10)
renderer.AddActor(pointCloud.vtkActor)
renderer.ResetCamera()
renderWindow.Render()
time.sleep(0.3)
renderer.RemoveActor(pointCloud.vtkActor)
renderer.ResetCamera()
renderWindow.Render()
But of course, this only allows one object at a time (an instance of pointCloud.vtkActor via renderer.AddActor(pointCloud.vtkActor) waits a while, then removes it with renderer.RemoveActor(pointCloud.vtkActor)
How can I add a number of actors (I'm going to use 10 min interval, and there was as many as 5 quakes in that time), tag it with a counter, increment the counter at every loop iteration, and when it reaches 5 iterations, remove the actor?
There is some more context to this question here: Python/vtk - set each point size individually in a vtkPolyData object?
A possible(untested) solution might be:
from collections import deque
# The number 5 indicates for how many iterations the actors should be rendered.
rendered_actors = deque([None] * 5, maxlen=5)
for point_and_mag in pm.points_mag:
if rendered_actors[-1] is not None:
renderer.removeActor(rendered_actors[-1])
renderer.ResetCamera()
renderWindow.Render()
time.sleep(0.5)
mag = point_and_mag[1]
point = point_and_mag[0]
if mag > 2:
pointCloud = VtkPointCloud(pm)
pointCloud.addPoint(point, math.log(mag)*10)
rendered_actors.appendleft(pointcloud.vtkActor)
renderer.AddActor(pointCloud.vtkActor)
renderer.ResetCamera()
renderWindow.Render()
else:
rendered_actors.appendleft(None)
This code creates a deque(which is a double-linked list) of length 5. The actors are inserted at the left of this deque and at each iteration the rightmost value, if it is an "actor", it is removed from the scene and the scene is re-rendered.
Note that I don't have vtk so I cannot test this code.
A small style note: this is really unpythonic code-style:
for point_and_mag in pm.points_mag:
mag = point_and_mag[1]
point = point_and_mag[0]
Use tuple-unpacking:
for point, mag in pm.points_mag:
# ...
if mag > 2:
# ...

Categories