Python Index is out of bounds - python

I am cleaning up data in a .txt file. I use 3 different .txt files to analyze and clean up three different constructs. The .txt files all have 10 respondents; the first and the last have 17 answers per respondent, and the middle one has 16. The problem I'm facing right now is that the first and last work, but the middle one with 16 questions raises an index error. All three pieces of code look almost identical.
The error code:
Traceback (most recent call last):
  File "main.py", line 161, in <module>
    itemF = dataF[row,column]
IndexError: index 16 is out of bounds for axis 1 with size 16
Sample input:
['N.v.t.', '0', '1', '2', '1', 'N.v.t.', '0', '0', '2', '0', '0', '3', '2', '3', '1', '1']
['N.v.t.', '1', 'N.v.t.', '0', '0', 'N.v.t.', '2', '0', 'N.v.t.', '1', '0', '1', '1', '2', '0', '1']
['N.v.t.', '0', 'N.v.t.', '0', '0', 'N.v.t.', '0', '0', 'N.v.t.', '0', '0', '3', '0', '3', '0', '0']
['2', '2', 'N.v.t.', '1', '3', '1', '2', '1', '1', '3', '2', '2', '3', '1', '2', '3']
['1', '2', 'N.v.t.', '0', '0', 'N.v.t.', '2', '2', '0', '2', '1', '2', '2', '3', '1', '2']
['N.v.t.', '0', 'N.v.t.', '1', '0', 'N.v.t.', '1', '2', 'N.v.t.', '1', '0', '3', '1', '3', '2', '2']
['0', '3', 'N.v.t.', '0', '2', '3', '2', '1', '3', '2', '2', '2', '2', '3', '0', '1']
['1', '3', 'N.v.t.', '0', '2', 'N.v.t.', '0', '2', 'N.v.t.', '0', '1', '1', '0', '2', '2', '1']
['1', '2', '2', '2', '3', '3', '0', '2', '2', '2', '2', '2', '2', '2', '2', '1']
['1', '2', 'N.v.t.', '0', '2', 'N.v.t.', '1', '3', '2', '2', '1', '3', '2', '2', '2', '2']
The code:
import numpy
dataF = numpy.loadtxt("answersFEAR.txt", dtype = str, delimiter = ", ")
shapeF = dataF.shape
(shapeF[0] == 5)
print(dataF)
for i in range(0, shape[0]):
    str1 = dataF[i, 0]
    str2 = dataF[i, -1]
    dataF[i, 0] = str1.replace('[', '')
    dataF[i, -1] = str2.replace(']', '')
for column in range(0,shape[1]):
    for row in range(0, shape[0]):
        itemF = dataF[row,column]
        dataF[rij,kolom] = itemF.replace("'", '')
dataF[dataF == 'N.v.t.'] = numpy.nan
print("DATA FEAR")
print(dataD)
scoresF = dataF[:,1:17]
scoresF = scoresF.astype(float)
average_score_fear = numpy.nanmean(scoresF, axis = 1)
print("")
print("AVERAGE SCORE FEAR")
print(average_score_fear)
The expected outcome should look like this (this is just one result):
["['1'" "'2'" "'2'" "'2'" "'3'" "'3'" "'0'" "'2'" "'2'" "'2'" "'2'" "'2'" "'2'" "'2'" "'2'" "'1']"]
DATA FEAR
[['1', '2', '2', '2', '3', '3', '0', '2', '2', '2', '2', '2', '2', '2', '2', '1']]
AVERAGE SCORE FEAR
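One thing that stands out in the code above is that the loops take their bounds from `shape`, while the shape of this data set is stored in `shapeF`. If `shape` still refers to one of the 17-column files, that would explain an index of 16 on an axis of size 16. The following is only a minimal sketch under that assumption: it parses two of the sample rows directly (the helper `parse_line` is hypothetical, not part of the original script), takes the loop bounds from the array's own shape, and averages over all 16 columns rather than the `1:17` slice.

```python
import numpy as np

# Two of the 16-answer rows from the sample input, as raw text lines.
lines = [
    "['N.v.t.', '0', '1', '2', '1', 'N.v.t.', '0', '0', '2', '0', '0', '3', '2', '3', '1', '1']",
    "['1', '2', '2', '2', '3', '3', '0', '2', '2', '2', '2', '2', '2', '2', '2', '1']",
]


def parse_line(line):
    """Hypothetical helper: strip brackets and quotes, map 'N.v.t.' to NaN."""
    items = [item.strip(" []'") for item in line.split(",")]
    return [np.nan if item == "N.v.t." else float(item) for item in items]


scoresF = np.array([parse_line(line) for line in lines])

# Bounds taken from this array's own shape can never overrun it.
n_rows, n_cols = scoresF.shape
for row in range(n_rows):
    for column in range(n_cols):
        _ = scoresF[row, column]

# A column bound borrowed from a 17-column data set overruns this array.
try:
    scoresF[0, 16]
except IndexError as err:
    print("IndexError:", err)

# Mean per respondent, ignoring the NaN ('N.v.t.') answers.
average_score_fear = np.nanmean(scoresF, axis=1)
print(average_score_fear)
```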

Related

Write and store data from input as binary, like an array

I need to take an input (ranging from 1-12) from the user and store the data as binary. (cannot use arrays)
For example: if the user inputs 3, it would return 000000000100. (the 3rd digit from the right)
I was thinking that this would be possible with a log algorithm, but I don't really know where to start. How would I do this? Any help is appreciated.
If I am reading this problem correctly, you are given numbers from 1 to 12 and need to return a binary string in which the bits at the given positions are one (without lists/arrays). To achieve this you could read the values into a set and then construct a string where the entered positions are one and everything else is zero. Like this:
def read_in():
    positions = set()
    while True:
        print('Enter 1-12 or Q to stop:',end=' ')
        entry = input()
        if entry != 'Q':
            positions.add(int(entry))
        else:
            break
    ret = ''
    for i in range(12,0,-1):
        if i in positions:
            ret += '1'
        else:
            ret += '0'
    return ret
print(read_in())
If you want to update any index to 1 multiple times across multiple inputs, you might want to use a list containing 12 elements that you can tick. Then with that list, you can already get both the string value e.g. "000000000100" and the int value e.g. 4
# Initialize list that we will tick
bin_digits = ["0"] * 12
# Let's assume that the input from user is from 1 to 12
for num in range(1, 13):
    # Tick the target index
    bin_digits[-num] = "1"
    bin_str = "".join(bin_digits) # String value
    bin_int = int(bin_str, 2) # Int value
    print(bin_digits, bin_str, bin_int)
Output
['0', '0', '0', '0', '0', '0', '0', '0', '0', '0', '0', '1'] 000000000001 1
['0', '0', '0', '0', '0', '0', '0', '0', '0', '0', '1', '1'] 000000000011 3
['0', '0', '0', '0', '0', '0', '0', '0', '0', '1', '1', '1'] 000000000111 7
['0', '0', '0', '0', '0', '0', '0', '0', '1', '1', '1', '1'] 000000001111 15
['0', '0', '0', '0', '0', '0', '0', '1', '1', '1', '1', '1'] 000000011111 31
['0', '0', '0', '0', '0', '0', '1', '1', '1', '1', '1', '1'] 000000111111 63
['0', '0', '0', '0', '0', '1', '1', '1', '1', '1', '1', '1'] 000001111111 127
['0', '0', '0', '0', '1', '1', '1', '1', '1', '1', '1', '1'] 000011111111 255
['0', '0', '0', '1', '1', '1', '1', '1', '1', '1', '1', '1'] 000111111111 511
['0', '0', '1', '1', '1', '1', '1', '1', '1', '1', '1', '1'] 001111111111 1023
['0', '1', '1', '1', '1', '1', '1', '1', '1', '1', '1', '1'] 011111111111 2047
['1', '1', '1', '1', '1', '1', '1', '1', '1', '1', '1', '1'] 111111111111 4095
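Since the question rules out arrays, another option is to keep the state in a single integer and set bits with a shift. This is only a sketch; the function names `position_to_bits` and `positions_to_bits` are made up for illustration.

```python
def position_to_bits(position, width=12):
    """Return a `width`-digit binary string with only the given bit set.

    Positions are counted from the right, starting at 1.
    """
    if not 1 <= position <= width:
        raise ValueError(f"position must be between 1 and {width}")
    return format(1 << (position - 1), f"0{width}b")


def positions_to_bits(positions, width=12):
    """Combine several positions into one bit mask using bitwise OR."""
    mask = 0
    for position in positions:
        if not 1 <= position <= width:
            raise ValueError(f"position must be between 1 and {width}")
        mask |= 1 << (position - 1)
    return format(mask, f"0{width}b")


print(position_to_bits(3))        # 000000000100
print(positions_to_bits([1, 3]))  # 000000000101
```

Shifting `1` left by `position - 1` places avoids any logarithm: the integer itself is the storage, and `format` only adds the leading zeros for display.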

Grouping Lists into specific groups

I'm wondering if it is possible to split a record into specific groups so that I can place them in a table later on.
This is the output that I need to group. I converted it into a list so that I could easily divide it into columns.
f=open("sample1.txt", "r")
f.read()
Here's the output:
'0245984300999992018010100004+14650+121050FM-12+004699999V0203001N00101090001CN008000199+02141+01971101171ADDAY141021AY241021GA1021+006001081GA2061+090001021GE19MSL +99999+99999GF106991021999006001999999KA1120N+02111MD1210141+9999MW1051REMSYN10498430 31558 63001 10214 20197 40117 52014 70544 82108 333 20211 55062 56999 59012 82820 86280 555 60973=\n'
Here's what I have done already. I have managed to change it into a list which resulted in this output:
with open('sample1.txt', 'r') as file:
    data = file.read().replace('\n', '')
print (list(data))
The Output:
['0', '2', '4', '5', '9', '8', '4', '3', '0', '0', '9', '9', '9', '9', '9', '2', '0', '1', '8', '0', '1', '0', '1', '0', '0', '0', '0', '4', '+', '1', '4', '6', '5', '0', '+', '1', '2', '1', '0', '5', '0', 'F', 'M', '-', '1', '2', '+', '0', '0', '4', '6', '9', '9', '9', '9', '9', 'V', '0', '2', '0', '3', '0', '0', '1', 'N', '0', '0', '1', '0', '1', '0', '9', '0', '0', '0', '1', 'C', 'N', '0', '0', '8', '0', '0', '0', '1', '9', '9', '+', '0', '2', '1', '4', '1', '+', '0', '1', '9', '7', '1', '1', '0', '1', '1', '7', '1', 'A', 'D', 'D', 'A', 'Y', '1', '4', '1', '0', '2', '1', 'A', 'Y', '2', '4', '1', '0', '2', '1', 'G', 'A', '1', '0', '2', '1', '+', '0', '0', '6', '0', '0', '1', '0', '8', '1', 'G', 'A', '2', '0', '6', '1', '+', '0', '9', '0', '0', '0', '1', '0', '2', '1', 'G', 'E', '1', '9', 'M', 'S', 'L', ' ', ' ', ' ', '+', '9', '9', '9', '9', '9', '+', '9', '9', '9', '9', '9', 'G', 'F', '1', '0', '6', '9', '9', '1', '0', '2', '1', '9', '9', '9', '0', '0', '6', '0', '0', '1', '9', '9', '9', '9', '9', '9', 'K', 'A', '1', '1', '2', '0', 'N', '+', '0', '2', '1', '1', '1', 'M', 'D', '1', '2', '1', '0', '1', '4', '1', '+', '9', '9', '9', '9', 'M', 'W', '1', '0', '5', '1', 'R', 'E', 'M', 'S', 'Y', 'N', '1', '0', '4', '9', '8', '4', '3', '0', ' ', '3', '1', '5', '5', '8', ' ', '6', '3', '0', '0', '1', ' ', '1', '0', '2', '1', '4', ' ', '2', '0', '1', '9', '7', ' ', '4', '0', '1', '1', '7', ' ', '5', '2', '0', '1', '4', ' ', '7', '0', '5', '4', '4', ' ', '8', '2', '1', '0', '8', ' ', '3', '3', '3', ' ', '2', '0', '2', '1', '1', ' ', '5', '5', '0', '6', '2', ' ', '5', '6', '9', '9', '9', ' ', '5', '9', '0', '1', '2', ' ', '8', '2', '8', '2', '0', ' ', '8', '6', '2', '8', '0', ' ', '5', '5', '5', ' ', '6', '0', '9', '7', '3', '=']
My goal is to group them into something like these:
0245,984300,99999,2018,01,01,0000,4,+1....
The number of digits belonging to each column is predetermined, for example there are always 4 digits for the first column and 6 for the second, and so on.
I was thinking of concatenating them. But I'm not sure if it would be possible.
You can use operator.itemgetter
from operator import itemgetter
g = itemgetter(slice(0, 4), slice(4, 10))
with open('sample1.txt') as file:
    for line in file:
        print(g(line))
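Or even better, you can build the slices dynamically from the column widths using itertools.accumulate (the running totals of the widths give the end of each slice, and the same totals shifted by one give the start):
from itertools import accumulate
indexes = [4, 6, ...]
g = itemgetter(*map(slice, accumulate([0] + indexes), accumulate(indexes)))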
Then proceed as before
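As a worked sketch of the width-based slicing, the column widths below are read off the target format in the question (`0245,984300,99999,2018,01,01,0000,4`); the widths of the remaining columns are not given there, so only the first eight fields are cut.

```python
from itertools import accumulate
from operator import itemgetter

# Widths of the first eight columns, taken from the example in the question.
widths = [4, 6, 5, 4, 2, 2, 4, 1]

# Running totals give the end of each field; shifting them gives the starts.
ends = list(accumulate(widths))
starts = [0] + ends[:-1]
getter = itemgetter(*(slice(start, end) for start, end in zip(starts, ends)))

# Beginning of the sample record from the question.
line = "0245984300999992018010100004+14650"
fields = getter(line)
print(",".join(fields))  # 0245,984300,99999,2018,01,01,0000,4
```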
I would recommend naming everything if you actually want to use this data, and double-checking that all the lengths make sense. So to start you do
with open('sample1.txt', 'r') as file:
    # Drop the trailing '=' and newline of the record
    data = file.read().rstrip('=\n')
first, second, *rest = data.split()
if len(first) != 163:
    raise ValueError(f"The first part should be 163 characters long, but it's {len(first)}")
# 85 is the length of the second part in the sample record
if len(second) != 85:
    raise ValueError(f"The second part should be 85 characters long, but it's {len(second)}")
So now you have 3 variables
first is "0245984300999992018010100004+14650+121050FM-12+004699999V0203001N00101090001CN008000199+02141+01971101171ADDAY141021AY241021GA1021+006001081GA2061+090001021GE19MSL"
second is "+99999+99999GF106991021999006001999999KA1120N+02111MD1210141+9999MW1051REMSYN10498430"
rest is ['31558', '63001', '10214', '20197', '40117', '52014', '70544', '82108', '333', '20211', '55062', '56999', '59012', '82820', '86280', '555', '60973']
And then repeat that idea. The first part contains several '+' separators, so keep the leading chunk and collect the remaining pieces in a list:
date, *others = first.split('+')
and then for parsing the first part I would just have a list like
something = date[0:4]
something_else = date[4:10]
third_thing = date[10:15]
year = date[15:19]
month = date[19:21]
day = date[21:23]
and so on. And then you can use all these variables in the code that analyzes them.
If this is some sort of standard, you should look for a library that parses strings like that or write one yourself.
Obviously name the variables better

Deleting one position to another position elements from list in python

I have a list like the one below and I would like to delete all entries between any word (inclusive) and the next '0' (exclusive).
So for example this list:
array = ['1', '1', '0', '3', '0', '2', 'Continue', '1', '5', '1', '4', '0', '7', 'test', '3', '6', '0']
should become:
['1', '1', '0', '3', '0', '2', '0', '7', '0']
You can also do it by using exclusively list comprehension:
array = ['1', '1', '0', '3', '0', '2', 'Continue', '1', '5', '1', '4', '0', '7', 'test', '3', '6', '0']
# Find indices of strings in list
alphaIndex = [i for i in range(len(array)) if any(k.isalpha() for k in array[i])]
# Find indices of first zero following each string
zeroIndex = [array.index('0',i) for i in alphaIndex]
# Create a list with indices to be `blacklisted`
zippedIndex = [k for i,j in zip(alphaIndex, zeroIndex) for k in range(i,j)]
# Filter the original list
array = [i for j,i in enumerate(array) if j not in zippedIndex]
print(array)
Output:
['1', '1', '0', '3', '0', '2', '0', '7', '0']
array = ['1', '1', '0', '3', '0', '2', 'Continue', '1', '5', '1', '4', '0', '7', 'test', '3', '6', '0']
res = []
skip = False #Flag to skip elements after a word
for i in array:
    if not skip:
        if i.isalpha(): #Check if element is alpha
            skip = True
            continue
        else:
            res.append(i)
    else:
        if i.isdigit(): #Check if element is digit
            if i == '0':
                res.append(i)
                skip = False
print(res)
Output:
['1', '1', '0', '3', '0', '2', '0', '7', '0']
Kicking it old school -
array = ['1', '1', '0', '3', '0', '2', 'Continue', '1', '5', '1', '4', '0', '7', 'test', '3', '6', '0']
print(array)
array_op = []
i=0
while i < len(array):
    if not array[i].isdigit():
        i = array[i:].index('0')+i
        continue
    array_op.append(array[i])
    i += 1
print(array_op)

Python 2d array issues

I have been given a list of lists, each containing the numbers 1, 2 and 3 plus two 0s. Depending on the number and its position, I would like a corresponding counter to be incremented once for each occurrence.
ballots = [['1', '2', '3', '0', '0'],
['1', '3', '0', '2', '0'],
['1', '2', '3', '0', '0'],
['0', '3', '2', '0', '1'],
['1', '3', '0', '2', '0'],
['2', '0', '3', '1', '0'],
['0', '0', '2', '1', '3'],
['0', '1', '2', '3', '0'],
['0', '1', '0', '2', '3'],
['2', '3', '1', '0', '0'],
['3', '2', '0', '0', '1'],
['0', '1', '3', '2', '0'],
['0', '0', '1', '2', '3'],
['0', '0', '3', '2', '1'],
['1', '2', '3', '0', '0'],
['2', '1', '3', '0', '0'],
['0', '3', '2', '1', '0'],
['0', '2', '3', '0', '1'],
['1', '2', '3', '0', '0'],
['1', '0', '0', '3', '2'],
['2', '1', '3', '0', '0'],
['3', '1', '2', '0', '0'],
['2', '3', '0', '1', '0'],
['0', '0', '3', '1', '2'],
['0', '3', '1', '0', '2'],
['2', '1', '0', '0', '3'],
['2', '0', '0', '1', '3'],
['2', '0', '0', '1', '3'],
['3', '0', '1', '0', '2']]
For example, for the first list:
the 1 in position 1 would mean that candidate1vote1 += 1
the 2 in the 2nd position would mean that candidate2vote2 += 1
the 3 in the 3rd position would mean that candidate3vote3 += 1
All 0's are ignored but still counted as a space. For the second list:
the 1 in the first position would mean that candidate1vote1 += 1
the 3 in the 2nd position would mean that candidate2vote3 += 1
the 2 in the 4th position would mean that candidate4vote2 += 1
Basically the position corresponds to candidate1/2/3/4/5 and the value corresponds to either a 1st preference vote, 2nd preference vote or a 3rd preference vote.
Does anyone know how I'd be able to sort through the lists using for/while loops so that it goes through each ballot and each individual vote doing the corresponding sum?
First, to clarify: you intend to collect not just the votes for each candidate, but the count of each preference (1, 2, 3) per candidate?
You are dealing with a nested list, so you need to know how to index it. (The term array is normally used for the corresponding type in the numpy library.)
When you index a nested list, you access the data from the outside in, e.g. [outer][inner] (there can be more than two levels of nesting).
Given that you don't have memory or time constraints, I'd suggest a double for loop. Let's make a nested list of candidates with preferences: the outer index is the candidate number, and each inner list holds the preference counts.
len(ballots) gives you the number of rows, and there are 5 columns.
# 4 slots per candidate: index 0 counts the 0s, index p counts preference p
candidate = [[0]*4 for n in range(5)]
n = len(ballots)
for i in range(0, n):  # range includes the start number but not the stop
    for j in range(0, 5):
        if ballots[i][j] == '1':
            candidate[j][1] += 1
        elif ballots[i][j] == '2':
            candidate[j][2] += 1
        elif ballots[i][j] == '3':
            candidate[j][3] += 1
        else:  # '0'
            candidate[j][0] += 1
Like this you can put each answer in a list:
c1= list()
c2= list()
...
for i in ballots:
    c1.append(i[0])
    c2.append(i[1])
    ...
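The counting rule from the question (position selects the candidate, value selects the preference, 0 is skipped) can also be sketched with `enumerate`. The function name `tally` and its parameters are illustrative only, and the sample uses three ballots from the question.

```python
def tally(ballots, n_candidates=5, n_prefs=3):
    """Count, per candidate, how many 1st, 2nd and 3rd preferences were cast.

    counts[c][p - 1] is the number of ballots giving candidate c preference p.
    """
    counts = [[0] * n_prefs for _ in range(n_candidates)]
    for ballot in ballots:
        for candidate, mark in enumerate(ballot):
            pref = int(mark)
            if pref:  # a 0 holds the position but carries no vote
                counts[candidate][pref - 1] += 1
    return counts


sample_ballots = [
    ['1', '2', '3', '0', '0'],
    ['1', '3', '0', '2', '0'],
    ['0', '3', '2', '0', '1'],
]
counts = tally(sample_ballots)
for number, row in enumerate(counts, start=1):
    print(f"candidate {number}: {row}")
```

Converting the mark to an integer once replaces the chain of string comparisons, and subtracting 1 maps preferences 1 to 3 onto list indices 0 to 2.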

Read file character by character starting at a particular line

I am using tkinter to bring up a dialog box where the user can choose a file. I want to parse that file starting on line 11. The file looks like this:
(115,147)
(6,145)
(44,112)
(17,72)
(112,1)
(60,142)
(47,158)
(35,43)
(34,20)
(38,33)
11101111110111011111111110111111111111111111111011111111111111110111111111
111101111101111a11011122112011222222211112111221221111101111111111110111ab
..more down here
How do I retrieve each character when they are not separated by spaces? I know I have to start off like this:
# Bring up open-file dialog box
file_path = filedialog.askopenfilename(filetypes=[("Text files","*.txt")])
# Check if user clicked cancel
if not file_path:
    return False
# Read from file
with open(file_path, 'r') as f:
    # Do something here with f.read()
I want to get these numbers in a list (each at their own index):
11101111110111011111111110111111111111111111111011111111111111110111111111
111101111101111a11011122112011222222211112111221221111101111111111110111ab
Any help would be appreciated, thank you!
Firstly, you need to read() data from the file, and then split by newlines to get a list of lines in the file:
lines=f.read().split("\n")
If you only need from line 11 to the end, you can use:
lines=lines[10:]
And then iterate through it, using list() to split into characters:
characters=[list(line)for line in lines]
Output:
[['1', '1', '1', '0', '1', '1', '1', '1', '1', '1', '0', '1', '1', '1', '0', '1', '1', '1', '1', '1', '1', '1', '1', '1', '1', '0', '1', '1', '1', '1', '1', '1', '1', '1', '1', '1', '1', '1', '1', '1', '1', '1', '1', '1', '1', '1', '1', '0', '1', '1', '1', '1', '1', '1', '1', '1', '1', '1', '1', '1', '1', '1', '1', '1', '0', '1', '1', '1', '1', '1', '1', '1', '1', '1'], ['1', '1', '1', '1', '0', '1', '1', '1', '1', '1', '0', '1', '1', '1', '1', 'a', '1', '1', '0', '1', '1', '1', '2', '2', '1', '1', '2', '0', '1', '1', '2', '2', '2', '2', '2', '2', '2', '1', '1', '1', '1', '2', '1', '1', '1', '2', '2', '1', '2', '2', '1', '1', '1', '1', '1', '0', '1', '1', '1', '1', '1', '1', '1', '1', '1', '1', '1', '1', '0', '1', '1', '1', 'a', 'b']]
# consume 10 lines
for _ in range(10):
    f.readline()
# read one character at a time
# f.read(1) will return '' when you reach eof
c = f.read(1)
while c:
    # do something with c
    c = f.read(1)
For that matter, since a string can be indexed just like a list in Python, you could just say
# consume 10 lines
for _ in range(10):
    f.readline()
rest = f.read()
and then rest would be a string with everything in it, i.e. rest[0] is the first char, rest[1] the next, etc. Be aware that you will capture newlines too this way.
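To get one list of characters per line without the newlines, the header can be skipped with `itertools.islice`. This is a sketch only: `grid_after_header` is a made-up name, and an in-memory `io.StringIO` stands in for the file chosen in the dialog.

```python
import io
from itertools import islice


def grid_after_header(f, header_lines=10):
    """Skip `header_lines` lines, then split each remaining line into characters."""
    return [list(line.rstrip("\n")) for line in islice(f, header_lines, None)]


# Stand-in for the real file: 10 coordinate lines followed by two data lines.
sample = "".join(f"({i},{i})\n" for i in range(10)) + "1110\n11a1\n"
grid = grid_after_header(io.StringIO(sample))
print(grid)  # [['1', '1', '1', '0'], ['1', '1', 'a', '1']]
```

With a real file, the same function can be called as `grid_after_header(f)` inside the `with open(...) as f:` block.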
