Single remove clause in while loop is removing two elements - python

I am writing a simple secret santa script that selects a "GiftReceiver" and a "GiftGiver" from a list. Two lists and an empty dataframe to be populated are produced:
import pandas as pd
import random
santaslist_receivers = ['Rudolf',
'Blitzen',
'Prancer',
'Dasher',
'Vixen',
'Comet'
]
santaslist_givers = santaslist_receivers
finalDataFrame = pd.DataFrame(columns = ['GiftGiver','GiftReceiver'])
I then have a while loop that selects random elements from each list to pick a gift giver and receiver, then remove from the respective list:
while len(santaslist_receivers) > 0:
    print (len(santaslist_receivers)) #Used for testing.
    gift_receiver = random.choice(santaslist_receivers)
    santaslist_receivers.remove(gift_receiver)
    print (len(santaslist_receivers)) #Used for testing.
    gift_giver = random.choice(santaslist_givers)
    while gift_giver == gift_receiver: #While loop ensures that gift_giver != gift_receiver
        gift_giver = random.choice(santaslist_givers)
    santaslist_givers.remove(gift_giver)
    dummyDF = pd.DataFrame({'GiftGiver':gift_giver,'GiftReceiver':gift_receiver}, index = [0])
    finalDataFrame = finalDataFrame.append(dummyDF)
The final dataframe only contains three elements instead of six:
print(finalDataFrame)
returns
GiftGiver GiftReceiver
0 Dasher Prancer
0 Comet Vixen
0 Rudolf Blitzen
I have inserted two print lines within the while loop to investigate. These print the length of the list santaslist_receivers before and after the removal of an element. The expected return is to see original list length on the first print, then minus 1 on the second print, then the same length again on the first print of the next iteration of the while loop, then so on. Specifically I expect:
6,5,5,4,4,3,3... and so on.
What is returned is
6,5,4,3,2,1
Which is consistent with the DataFrame having only 3 rows, but I do not see the cause of this.
What is the error in my code or my approach?

You can solve it by simply changing this line
santaslist_givers = santaslist_receivers
to
santaslist_givers = list(santaslist_receivers)
Python variables are essentially pointers, so both names refer to the same list; that is, santaslist_givers and santaslist_receivers were accessing the same location in memory in your implementation. To make them distinct, copy the list with the list() function.
For some extra information, you can refer to copy.deepcopy.
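The aliasing described above can be seen in a few lines. This is a minimal sketch; the names receivers, alias and independent are made up for the illustration and are not from the question.

```python
receivers = ['Rudolf', 'Blitzen', 'Prancer']

# Plain assignment binds a second name to the same list object.
alias = receivers
# list(...) builds a new list holding the same elements (a shallow copy).
independent = list(receivers)

receivers.remove('Rudolf')

print(alias is receivers)   # True: one object, two names
print(alias)                # ['Blitzen', 'Prancer']
print(independent)          # ['Rudolf', 'Blitzen', 'Prancer']
```

Because the question's two names pointed at one list, every iteration removed two elements from it, which is why the loop finished after three rounds.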

You should make an explicit copy of your list here
santaslist_givers = santaslist_receivers
There are multiple options for doing this, as explained in this question.
In this case I would recommend (if you have Python >= 3.3):
santaslist_givers = santaslist_receivers.copy()
If you are on an older version of Python, the typical way to do it is:
santaslist_givers = santaslist_receivers[:]

Related

How to assign a new value to the list in running loop

My data cutting loop seems to run ok in the loop, but when it prints the result outside the loop, the contents are unchanged. Presuming it's buggy because I'm trying to assign to what the for loop is running through, but I don't know.
For reference, it's a small web review scraper project I'm working on. To get it formatted to CSV with pandas I think all the data needs to end at the same point (length), so I'm cutting any lists that are longer than the shortest. The values "cust_stars_result, rev_result, cust_res" are all lists with basic strings stored inside, in this case with lengths 16, 12, and 15. I try to slice everything down to 12 in the end but the results are overwritten. What is the right/best way to go about this?
star_len = len(cust_stars_result)
rev_len = len(rev_result)
custname_len = len(cust_res)
print('customer name length: ' + str(custname_len) + ' -- review length: ' + str(rev_len) + ' -- star length: ' + str(star_len))
datalen = [star_len, rev_len, custname_len]
print(min(datalen))
datapack = [cust_stars_result, rev_result, cust_res]
# LOOPER FOR CULLING
for data in datapack:
    if len(data) != min(datalen):
        print("operating culler to make data even length")
        print(len(data))
        data = data[: min(datalen)]
        print(len(data)) #this comes out OK
    else:
        print("equal length, skipping culler")
        pass
print(datapack) # prints the original values
Inside your loop you update the data variable but that's just reassigning the value of that variable. You want to do something like
for i, data in enumerate(datapack):
    ...
    datapack[i] = data[: min(datalen)]
This will update the datapack element.
While "trying to assign to what the for loop is running through" is a real issue, in this case the problem is rather that your code is not assigning anything to datapack when you change data. Instead, the loop binds each item in datapack to the name data in turn, so when you rebind data, datapack remains unchanged.
Instead, try either adding each item to a new list, and then assigning that new list to datapack:
temp = []
for data in datapack:
    ...
    temp.append(data[:min(datalen)])
datapack = temp
Or try using a range or enumerate loop:
for i, data in enumerate(datapack):
    ...
    datapack[i] = data[:min(datalen)]
There are fancier (but less readable and debuggable) ways to accomplish what you're doing here (slicing off the end of each list), such as the one below, which uses a list comprehension and map:
mindatalen = min(map(len, datapack))
datapack = [data[:mindatalen] for data in datapack]
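The difference between rebinding the loop variable and assigning by index can be checked directly. This is a small sketch with stand-in data; the lists stars, reviews and names are assumptions that only mimic the lengths 16, 12 and 15 from the question.

```python
stars = list(range(16))
reviews = list(range(12))
names = list(range(15))
datapack = [stars, reviews, names]
shortest = min(len(data) for data in datapack)

# Rebinding the loop variable: `data` gets a new list, datapack is untouched.
for data in datapack:
    data = data[:shortest]
lengths_after_rebinding = [len(data) for data in datapack]

# Assigning by index replaces the element stored in datapack.
for i, data in enumerate(datapack):
    datapack[i] = data[:shortest]
lengths_after_index_assignment = [len(data) for data in datapack]

print(lengths_after_rebinding)         # [16, 12, 15]
print(lengths_after_index_assignment)  # [12, 12, 12]
```

Note that slicing creates new lists, so the original stars list still has 16 items afterwards; only the entries held in datapack are the shortened copies.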

Extracting multiple data from a single list

I'm working on a text file that contains multiple pieces of information. I converted it into a list in Python and right now I'm trying to separate the different data into different lists. The data is presented as follows:
CODE/ DESCRIPTION/ Unity/ Value1/ Value2/ Value3/ Value4 and then repeat, an example would be:
P03133 Auxiliar helper un 203.02 417.54 437.22 675.80
My approach to it until now has been:
Creating lists to store each piece of information:
codes = []
description = []
unity = []
cost = []
Through loops finding a code, based on the code's structure, and using the code's index as base to find the remaining values.
Finding a code's easy, it's a distinct type of information amongst the other data.
For the remaining values I made a loop to find the next value that is numeric after a code. That way I can delimit the rest of the indexes:
The unity would be the code's index + index until isnumeric - 1, hence it's the first information prior to the first numeric value in each line.
The cost would be the code's index + index until isnumeric + 2, the third value is the only one I need to store.
The description is a little harder, the number of elements that compose it varies across the list. So I used slicing starting at code's index + 1 and ending at index until isnumeric - 2.
for i, carc in enumerate(txtl):
    if carc[0] == "P" and carc[1].isnumeric():
        codes.append(carc)
        j = 0
        while not txtl[i+j].isnumeric():
            j = j + 1
        description.append(" ".join(txtl[i+1:i+j-2]))
        unity.append(txtl[i+j-1])
        cost.append(txtl[i+j])
I'm facing some problems with this approach, although there will always be more elements to the list after a code I'm getting the error:
while not txtl[i+j].isnumeric():
txtl[i+j] list index out of range.
I'd welcome any help debugging my code, or even new approaches to the problem.
Note: I'm also going to have to do this for a very similar data source, but there the code would be just a sequence of 7 digits, and thus harder to find amongst the other data. Any solution that covers this case is also appreciated!
A slight addition to your code should resolve this:
while i+j < len(txtl) and not txtl[i+j].isnumeric():
    j += 1
The first condition fails when out of bounds, so the second one doesn't get checked.
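The short-circuit behaviour can be tried on a single tokenised line. This is only a sketch; the tokens list is a made-up stand-in for txtl. It also shows why the scan runs off the end in the first place: isnumeric() is False for a value like '203.02'.

```python
tokens = ['P03133', 'Auxiliar', 'helper', 'un', '203.02']

i, j = 0, 0
# `and` short-circuits: once i + j reaches len(tokens), the left operand is
# False and tokens[i + j] is never evaluated, so no IndexError is raised.
while i + j < len(tokens) and not tokens[i + j].isnumeric():
    j += 1

print(j)                      # 5: the scan reached the end of the list
print('203.02'.isnumeric())   # False: '.' is not a numeric character
```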
Also, please use a list of dict items instead of 4 different lists, e.g.:
thelist = []
thelist.append({'codes': 69, 'description': 'random text', 'unity': 'whatever', 'cost': 'your life'})
In this way you always have the correct values together in the list, and you don't need to keep track of where you are with indexes or other black magic...
EDIT after comment interactions:
Ok, so in this case you split the line you are processing on the space character, and then process the words in the line.
from pprint import pprint # just for pretty printing
textl = 'P03133 Auxiliar helper un 203.02 417.54 437.22 675.80'
the_list = []
def handle_line(textl: str):
    description = ''
    unity = None
    values = []
    for word in textl.split()[1:]:
        # it splits on space characters by default
        # you can ignore the first item in the list, as this will always be the code
        # str.isnumeric() doesn't work with floats, only integers. See https://stackoverflow.com/a/23639915/9267296
        if not word.replace(',', '').replace('.', '').isnumeric():
            if len(description) == 0:
                description = word
            else:
                description = f'{description} {word}' # I like f-strings
        elif not unity:
            # if unity is still None, that means it has not been set yet
            unity = word
        else:
            values.append(word)
    return {'code': textl.split()[0], 'description': description, 'unity': unity, 'values': values}
the_list.append(handle_line(textl))
pprint(the_list)
str.isnumeric() doesn't work with floats, only integers. See https://stackoverflow.com/a/23639915/9267296
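One possible variation on line-by-line parsing is sketched below. It assumes that the unit is always the single token immediately before the first numeric value and that the values use a dot as the decimal separator. The helper names is_number and parse_line are hypothetical, not taken from the question or the answer.

```python
def is_number(token: str) -> bool:
    """Return True if the token parses as a float (e.g. '203.02')."""
    try:
        float(token)
        return True
    except ValueError:
        return False


def parse_line(line: str) -> dict:
    tokens = line.split()
    # Index of the first numeric token after the code.
    first_value = next(
        i for i, tok in enumerate(tokens[1:], start=1) if is_number(tok)
    )
    return {
        'code': tokens[0],
        # Everything between the code and the unit is the description.
        'description': ' '.join(tokens[1:first_value - 1]),
        # The unit is the token just before the first value.
        'unity': tokens[first_value - 1],
        'values': [float(tok) for tok in tokens[first_value:]],
    }


record = parse_line('P03133 Auxiliar helper un 203.02 417.54 437.22 675.80')
print(record['description'])   # Auxiliar helper
print(record['unity'])         # un
print(record['values'])        # [203.02, 417.54, 437.22, 675.8]
```

Because the first token is taken as the code by position, the same sketch would also apply when the code is a plain run of 7 digits. One caveat: float() also accepts words such as 'inf' and 'nan', so a description containing those would need a stricter check.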

Why do I get an IndexError after the program completes its first step successfully?

I tried to make a sorter that deletes duplicate IPs from the first list and saves the result to a file, but after the first successful round it gives me IndexError: list index out of range.
I expected a normal sorting process, but it doesn't work.
Code:
ip1 = open('hosts', 'r')
ip2 = open('rotten', 'r')
ipList1 = [line.strip().split('\n') for line in ip1]
ipList2 = [line.strip().split('\n') for line in ip2]
for i in range(len(ipList1)):
    for a in range(len(ipList2)):
        if(ipList1[i] == ipList2[a]):
            print('match')
            del(ipList1[i])
            del(ipList2[a])
            i -= 1
            a -= 1
c = open('end', 'w')
for d in range(len(ipList1)):
    c.write(str(ipList1[d]) + '\n')
c.close()
You're deleting from the list while iterating over it, that's why you're getting an IndexError.
This could be done more easily with sets:
with open('hosts') as ip1, open('rotten') as ip2:
    # strip() already removes the newline; plain strings are hashable, lists from split() are not
    ipList1 = set(line.strip() for line in ip1)
    ipList2 = set(line.strip() for line in ip2)
good = ipList1 - ipList2
with open('end', 'w') as c:
    for d in good:
        c.write(d + '\n')
You changed the lists on the fly. Suppose the for loop starts with a list of 5 elements: if you remove 4 of them during the first iteration, then in the second iteration the loop tries to access the second element, which no longer exists.
If you need to preserve the ordering, you can use a list comprehension:
ips = [ip for ip in ipList1 if ip not in set(ipList2)]
If you don't, just use a set expression.
You should never modify a list that you are currently iterating over.
A fix would be just to make a third list that saves the non-duplicates. Another way would be to just use sets and subtract them from each other, although I do not know whether you want to keep duplicates within a single list. Also, the way you are doing it right now, a duplicate is only found if it's at the same index.
ip1 = open('hosts', 'r')
ip2 = open('rotten', 'r')
ipList1 = [line.strip().replace('\n', '') for line in ip1]
ipList2 = [line.strip().replace('\n', '') for line in ip2]
ip1.close()
ip2.close()
newlist = []
for v in ipList1: # iterate over the lists, not the closed file objects
    if v not in ipList2:
        newlist.append(v)
c = open('end', 'w')
c.write('\n'.join(newlist))
c.close()
Other answers focus on deleting from a container while iterating over it. While that’s generally a bad idea, it’s not the crux of the problem here because you have (unpythonically) set up the for loops to use a sequence of indices, and so you aren’t strictly speaking iterating over the lists themselves anyway.
No, the problem here is that i-=1 and a-=1 have no effect: when a for loop begins a new iteration, it doesn’t work off of the previous value of the index. It just takes the next value that it was always destined to take, from the iterator that you established at the beginning (in your case, the output of range())
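Both halves of that explanation can be reproduced in isolation. This is a minimal sketch with made-up data (seen and items are illustrative names only).

```python
seen = []
for i in range(5):
    seen.append(i)
    # Rebinding i only lasts until the next iteration starts.
    i -= 1

print(seen)   # [0, 1, 2, 3, 4]

# range(len(items)) is fixed up front, so the indices keep climbing
# even though the list is getting shorter.
items = ['a', 'b', 'c', 'd']
error = None
try:
    for i in range(len(items)):
        if items[i] in ('a', 'b'):
            del items[i]
except IndexError as exc:
    error = exc

print(items)                  # ['b', 'c', 'd']
print(type(error).__name__)   # IndexError
```

The second loop also shows a quieter side effect: 'b' survives because it slid into index 0 after that index had already been visited.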

Add to dictionary in if loop

I have an if loop in which I am trying to:
(1) Create a dataframe from a filepath.
(2) Format this dataframe
(3) Add that dataframe to a dictionary that is a property of an instance of a class.
Here is my code defining the class and the method:
class myClass:
    def __init__(self, name, filepathlist):
        self.name = name
        self.filepathlist = filepathlist
    def formatData(self):
        i = 0
        self.dataframeDict = {}
        if i < (len(self.filepathlist) - 1):
            DFRAW = pd.read_csv(self.filepathlist[i], header = 9) #Row 9 is the row that is not blank (all blank auto-skipped)
            DFRAW['DateTime'], DFRAW['dummycol1'] = DFRAW[' ;W;W;W;W'].str.split(';', 1).str
            DFRAW['Col1'], DFRAW['dummycol2'] = DFRAW['dummycol1'].str.split(';', 1).str
            DFRAW['Col2'], DFRAW['dummycol3'] = DFRAW['dummycol2'].str.split(';', 1).str
            DFRAW['Col3'], DFRAW['Col4'] = DFRAW['dummycol3'].str.split(';', 1).str
            DFRAW = DFRAW.drop([' ;W;W;W;W', 'dummycol1', 'dummycol2', 'dummycol3'], axis = 1)
            dictIndex = self.filepathlist[i][39:44]
            self.dataframeDict.update({dictIndex: DFRAW})
            i = i + 1
Then I create an instance of the class and run the method:
filepathlist = ['filepath1','filepath2']
myINST = myClass('Mydataname', filepathlist)
myINST.formatData()
I then expect myINST.dataframeDict to have two dataframes as per the 2 input filepaths and thus 2 iterations of the if loop. However only 1 is present.
What is the error in my code or my approach?
It is hard to tell whether this will completely solve your problem, because no dummy data is provided. You will, however, get one step closer to your solution if you replace if i < (len(self.filepathlist) - 1): with while i < (len(self.filepathlist) - 1):.
You are currently just checking if i=0 is smaller than len(self.filepathlist)-1. If so, then the if-block is executed once. What you are actually looking for is a loop that keeps on iterating, as long as i is smaller than len(self.filepathlist)-1. This is done with while-loops.
You need to change your condition to for i in range(len(self.filepathlist)):
(Also, remove the assignment of i as the for loop does it automatically. For the same reason, you should also remove the line which increments i).
If you want to use a while loop, change the if line to while i < len(self.filepathlist):.
Notice that there's no -1. This is because you're using < instead of <=. If you want to use -1, then you also need the <= as this will ensure the loop runs the correct number of times.
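Counting how often each variant runs makes the difference concrete. This sketch uses the two placeholder paths from the question and replaces the dataframe work with a counter; the counter names are made up.

```python
filepathlist = ['filepath1', 'filepath2']

# `if` runs its body at most once.
i, if_runs = 0, 0
if i < len(filepathlist) - 1:
    if_runs += 1

# `while i < len(...) - 1` stops one item early.
i, short_while_runs = 0, 0
while i < len(filepathlist) - 1:
    short_while_runs += 1
    i += 1

# `while i < len(...)` visits every item.
i, while_runs = 0, 0
while i < len(filepathlist):
    while_runs += 1
    i += 1

# `for i in range(len(...))` does the same without manual bookkeeping.
for_runs = 0
for i in range(len(filepathlist)):
    for_runs += 1

print(if_runs, short_while_runs, while_runs, for_runs)   # 1 1 2 2
```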

Python, I need the following code to finish quicker

I need the following code to finish quicker without threads or multiprocessing. If anyone knows of any tricks that would be greatly appreciated. maybe for i in enumerate() or changing the list to a string before calculating, I'm not sure.
For the example below, I have attempted to recreate the variables using a random sequence, however this has rendered some of the conditions inside the loop useless ... which is ok for this example, it just means the 'true' application for the code will take slightly longer.
Currently on my i7, the example below (which will mostly bypass some of its conditions) completes in 1 second, I would like to get this down as much as possible.
import random
import time
import collections
import cProfile
def random_string(length=7):
    """Return a random string of given length"""
    return "".join([chr(random.randint(65, 90)) for i in range(length)])
LIST_LEN = 18400
original = [[random_string() for i in range(LIST_LEN)] for j in range(6)]
LIST_LEN = 5
SufxList = [random_string() for i in range(LIST_LEN)]
LIST_LEN = 28
TerminateHook = [random_string() for i in range(LIST_LEN)]
#^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ Exclude above from benchmark
ListVar = original[:]
for b in range(len(ListVar)):
    for c in range(len(ListVar[b])):
        #If its an int ... remove
        try:
            int(ListVar[b][c].replace(' ', ''))
            ListVar[b][c] = ''
        except: pass
        #if any second sufxList delete
        for d in range(len(SufxList)):
            if ListVar[b][c].find(SufxList[d]) != -1: ListVar[b][c] = ''
        for d in range(len(TerminateHook)):
            if ListVar[b][c].find(TerminateHook[d]) != -1: ListVar[b][c] = ''
    #remove all '' from list
    while '' in ListVar[b]: ListVar[b].remove('')
print(ListVar[b])
ListVar = original[:]
That makes a shallow copy of original, so your changes to the second level lists are going to affect the original also. Are you sure that is what you want? Much better would be to build the new modified list from scratch.
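The effect of a shallow copy on nested lists is easy to check. This is a minimal sketch with a tiny stand-in for original; copy.deepcopy is shown only for contrast and is not something the answer recommends.

```python
import copy

original = [['AAA', '123'], ['BBB', '456']]

shallow = original[:]            # new outer list, same inner lists
deep = copy.deepcopy(original)   # new outer list and new inner lists

shallow[0][1] = ''               # mutates the inner list shared with original
deep[1][1] = ''                  # touches only the deep copy

print(original)                    # [['AAA', ''], ['BBB', '456']]
print(shallow[0] is original[0])   # True
print(deep[0] is original[0])      # False
```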
for b in range(len(ListVar)):
for c in range(len(ListVar[b])):
Yuck: whenever possible iterate directly over lists.
#If its an int ... remove
try:
int(ListVar[b][c].replace(' ', ''))
ListVar[b][c] = ''
except: pass
You want to ignore spaces in the middle of numbers? That doesn't sound right. If the numbers can be negative you may want to use the try..except but if they are only positive just use .isdigit().
#if any second sufxList delete
for d in range(len(SufxList)):
if ListVar[b][c].find(SufxList[d]) != -1: ListVar[b][c] = ''
Is that just bad naming? SufxList implies you are looking for suffixes; if so, just use .endswith() (and note that you can pass a tuple in to avoid the loop). If you really do want to find whether the suffix is anywhere in the string, use the in operator.
for d in range(len(TerminateHook)):
if ListVar[b][c].find(TerminateHook[d]) != -1: ListVar[b][c] = ''
Again use the in operator. Also any() is useful here.
#remove all '' from list
while '' in ListVar[b]: ListVar[b].remove('')
and that while is O(n^2) i.e. it will be slow. You could use a list comprehension instead to strip out the blanks, but better just to build clean lists to begin with.
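The two ways of stripping blanks give the same result; they differ in how often the list is scanned. This is a sketch on a made-up row.

```python
row = ['ABC', '', 'DEF', '', '', 'GHI']

# Repeated remove(): every `in` test and every remove() scans the list again.
slow = list(row)
while '' in slow:
    slow.remove('')

# A single pass that keeps only the non-empty strings.
fast = [value for value in row if value != '']

print(slow == fast)   # True
print(fast)           # ['ABC', 'DEF', 'GHI']
```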
print(ListVar[b])
I think maybe your indentation was wrong on that print.
Putting these suggestions together gives something like:
suffixes = tuple(SufxList)
newListVar = []
for row in original:
    newRow = []
    newListVar.append(newRow)
    for value in row:
        if (not value.isdigit() and
            not value.endswith(suffixes) and
            not any(th in value for th in TerminateHook)):
            newRow.append(value)
    print(newRow)
