removing an IndexError on python - python

To start off, I'm basically a beginner in Python and coding, since it's the first programming language that I've learned, so if y'all respond could you please simplify what you're trying to say, thanks.
So I have code that keeps outputting an IndexError: list index out of range error message, and yet it also gives me the output I'm looking for.
Here's the code:
import numpy as np

def read_voltages(f_name):
    f = open(f_name, "r")
    # list comprehension
    # inside np.array(), builds nested lists, separating each nested list by " " and \n for each line it reads
    data = np.array([line.strip("\n").split(" ") for line in f.readlines()])
    data_t = data.T
    return data_t

def rms(v, N):
    RMS = np.sqrt((1 / N) * ((v**2).sum()))
    return RMS

f_name = input("Enter file name(eg. file1.txt): ")
data_t = read_voltages(f_name)
# list comprehension ~ converts strings in original output to integer values
# inner comp builds list of int from sequence of valid objects in row
# outer comp builds list of results of inner comp applied to each term in data_t
int_list = [[int(column) for column in row] for row in data_t]
v = int_list
print("Voltage values: \n", v)
print()
size = len(v[0])
for i in range(size):
    if size > 0:
        N = size
        rms_val = rms(np.array(v[i]), N)
        print(rms_val)
    elif size == 0:
        print(None)
And here's the output:
Enter file name(eg. file1.txt): voltages_test.txt
Traceback (most recent call last):
File "C:\Users\amand\Documents\Voltages.py", line 79, in <module>
rms_val = rms(np.array(v[i]), N)
IndexError: list index out of range
Voltage values:
[[1, 2, 3, 4, 5, 6, 7, 8, 9, 10], [2, 2, 3, 4, 5, 6, 7, 8, 9, 10], [3, 2, 3, 4, 5, 6, 7, 8, 9, 10]]
6.2048368229954285
6.228964600958975
6.268971207462992
Process finished with exit code 1
For reference, here's the .txt file being used:
1 2 3
2 2 2
3 3 3
4 4 4
5 5 5
6 6 6
7 7 7
8 8 8
9 9 9
10 10 10
Like I said before, the code outputs what I want, but it gives me the IndexError and I don't want that since I need to add more code to finish the program I'm working on. When I removed the [i] in rms_val = rms(np.array(v[i]), N) it gave me a repeated set of 10 identical numbers which were calculated incorrectly. I'm not really sure what else to move around and am quite stuck on this :/.
If anyone can help that'd be great, thanks.
edit: The code is supposed to take in values for "v" which are taken in from a transposed list that comes from a user inputted .txt file. Then, it uses the v values to calculate the root-mean-square and outputs the values in a table of sorts, but that part I can manage.

You are defining the range of your loop from the length of v[0]:
size = len(v[0])
Later in the code you use the loop variable to index into v itself:
rms_val = rms(np.array(v[i]), N)
len(v) (3 rows) is shorter than len(v[0]) (10 values per row), so as soon as i reaches 3, v[i] raises the IndexError. The three correct values are printed before that happens, which is why you see both the expected output and the error.
Make sure that you use the appropriate length when looping through your data; here that is len(v).
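One way to avoid the mismatch altogether is to loop over the rows directly instead of over an index. The following is only a minimal sketch: it hard-codes the transposed values from the question, and it uses the standard math module rather than NumPy so that it runs on its own.

```python
import math

def rms(values):
    """Root-mean-square of a sequence of numbers."""
    return math.sqrt(sum(x * x for x in values) / len(values))

# Transposed voltage values, copied from the output in the question.
v = [
    [1, 2, 3, 4, 5, 6, 7, 8, 9, 10],
    [2, 2, 3, 4, 5, 6, 7, 8, 9, 10],
    [3, 2, 3, 4, 5, 6, 7, 8, 9, 10],
]

# Iterating over the rows means no index can ever run past the end of v.
rms_values = [rms(row) for row in v]
for value in rms_values:
    print(value)
```

Because each row carries its own length, N no longer has to be passed in separately.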

Related

Python: How can I use a String for a If-Statement?

In Python I have to build a (long) if statement dynamically.
How can I do this?
I tried the following test code to store the necessary if-statement in a string, built by the function "buildFilterCondition".
But this doesn't work...
Any ideas? What is going wrong?
Thank you very much.
Input = [1,2,3,4,5,6,7]
Filter = [4,7]
FilterCondition = ""
def buildFilterCondition():
    global FilterCondition
    for f in Filter:
        FilterCondition = FilterCondition + "(x==" + str(f) +") | "
    #remove the last "| " sign
    FilterCondition = FilterCondition[:-2]
    print("Current Filter: " + FilterCondition)
buildFilterCondition()
for x in Input:
    if( FilterCondition ):
        print(x)
With my function buildFilterCondition() I want to reach the following situation, because the function generates the string "(x==4) | (x==7)", but this doesn't work:
for x in Input:
    if( (x==4) | (x==7) ):
        print(x)
The output should be 4, 7 (i.e. the filtered values).
The background of my question actually had a different intention than to replace an if-statement.
I need a longer multiple condition to select specific columns of a pandas dataframe.
For example:
df2=df.loc[(df['Discount1'] == 1000) & (df['Discount2'] == 2000)]
I wanted to keep the column names and the values (1000, 2000) in 2 separate lists (or dictionary) to make my code a little more "generic".
colmnHeader = ["Discount1", "Discount2"]
filterValue = [1000, 2000]
To "filter" the data frame, I then only need to adjust the lists.
How do I now rewrite the call to the .loc method so that it works for iterating over the lists?
df2=df.loc[(df[colmnHeader[0]] == filterValue[0]) & (df[colmnHeader[1]] == filterValue[1])]
Unfortunately, my current attempt with the following code does not work, because the conditions must not be applied to .loc one after another, but all at once.
So I need ALL the conditions from the lists directly in the .loc call.
#FILTER
colmn = ["colmn1", "colmn2", "colmn3"]
cellContent = ["1000", "2000", "3000"]
# first make sure, the lists have the same size
if( len(colmn) == len(cellContent)):
    curIdx = 0
    for curColmnName in colmn:
        df_columns= df_columns.loc[df_columns [curColmnName]==cellContent[curIdx]]
        curIdx += 1
Thank you again!
Use the in operator
Because simple is better than complex.
inputs = [1, 2, 3, 4, 5, 6, 7]
value_filter = [4, 7]
for x in inputs:
    if x in value_filter:
        print(x, end=' ')
# 4 7
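For context, the original loop printed every value because if( FilterCondition ) only tests whether the string is non-empty; Python never evaluates the text inside it. A minimal sketch of that behaviour (the lowercase names are made up for this illustration):

```python
# A non-empty string is always truthy; its contents are never run as code.
filter_condition = "(x==4) | (x==7) "

printed = []
for x in [1, 2, 3, 4, 5, 6, 7]:
    if filter_condition:  # equivalent to: if bool(filter_condition)
        printed.append(x)

print(printed)
```

Every x ends up in printed, which matches the unfiltered output of the original attempt.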
Use the operator module
With the operator module, you can build a condition at runtime from a list of (operator, value) pairs that the current value is tested against.
import operator
inputs = [1, 2, 3, 4, 5, 6, 7]
# This list can be dynamically changed if you need to
conditions = [
    (operator.ge, 4),  # value needs to be greater than or equal to 4
    (operator.lt, 7),  # value needs to be lower than 7
]
for x in inputs:
    # all() combines the conditions with "and"; use any() for "or"
    if all(condition(x, value) for condition, value in conditions):
        print(x, end=' ')
# 4 5 6
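The same all(...) pattern can be driven by the two lists from the question. The sketch below uses plain dictionaries as stand-ins for DataFrame rows, so the sample data and the helper name matches_all are assumptions made for illustration and not part of pandas:

```python
column_names = ["Discount1", "Discount2"]
filter_values = [1000, 2000]

# Hypothetical sample rows standing in for the rows of a DataFrame.
rows = [
    {"Discount1": 1000, "Discount2": 2000, "item": "A"},
    {"Discount1": 1000, "Discount2": 500, "item": "B"},
    {"Discount1": 300, "Discount2": 2000, "item": "C"},
]

def matches_all(row, names, values):
    """Return True when row[name] == value for every (name, value) pair."""
    return all(row[name] == value for name, value in zip(names, values))

selected = [row for row in rows if matches_all(row, column_names, filter_values)]
print([row["item"] for row in selected])
```

To change the filter, only the two lists need editing. With pandas itself, one option along the same lines is to build one boolean mask per column and combine the masks with & before a single .loc call.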

for loop create a list around "x" based on "n"

Basically I'm trying to figure out how to create a for loop that generates a range around the main number "x" based on "n"
x = 10 # x = Actual
n = 5
because
Actual = input("What's the Actual") # Enter 10
Target = input("What's the Target") # Enter 15
n = Target - Actual # 5 = 15 - 10
Since Actual is 10
I would like to see..
5, 6, 7, 8, 9 , 10, 11, 12, 13, 14, 15
The code is:
n = 2
def price(sprice):
    for i in range(n*2):
        sprice = sprice + 1
        print(sprice)
price(200)
This code shows 201, 202, 203, 204, and the actual is 200.
I want to see 198, 199, 200, 201, 202, because n = 2 and n multiplied by 2 is 4, which gives a range of 4 values around 200.
According to the docs, range can accept two arguments that specify the start (inclusive) and end (exclusive) of the interval. So you can get an interval of the form [start, stop).
You would like to create the interval [Actual - n, Actual + n], so just translate it almost literally to Python, bearing in mind that range excludes its second argument, so you should add one to it:
>>> list(range(Actual - n, Actual + n + 1))
[5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15]
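Wrapped in a small function, the same expression covers both examples from the question. This is just a sketch; the name values_around is made up here. Note also that input() returns a string, so Actual and Target would need int(...) before they can be subtracted.

```python
def values_around(actual, n):
    """Return the integers from actual - n to actual + n, both inclusive."""
    # range() excludes its stop value, hence the + 1.
    return list(range(actual - n, actual + n + 1))

print(values_around(10, 5))
print(values_around(200, 2))
```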
ForceBru has already shown a Pythonic solution to your problem. I only want to add that your original code works as intended after some minor tweaks:
n = 2
def price(sprice):
    sprice -= n  # short way to say: sprice = sprice - n
    for i in range(n*2+1):  # +1 because the end of a 1-argument range is exclusive
        print(sprice)  # print first, then step to the next value
        sprice = sprice + 1
price(200)
Output:
198
199
200
201
202
Note that Python evaluates * before + regardless of the order in which they appear. Hence you could equally write 1+n*2 in place of n*2+1.

IndexError out of range but it Is in range

ratings_dict is a dictionary with the format: {'userName' : [0, 0, 1, 0, etc]}
I'm attempting to access each element in the list (in the dictionary) and it is giving me a specific error at 'if ratings_dict[name][i] is not 0:' and it is saying that IndexError: list index out of range.
Edited: This is my revised code. As you can see from the traceback, the value at ratings_dict['Martin'][0] is 1.
I have also revised my code in mind of your efficiency tips..
I am still at a loss at what to do.
def calculate_average_rating(ratings_dict):
    ratings = {}
    numBooks = len(ratings_dict)
    print ratings_dict['Martin'][0]
    for i in range(numBooks):
        x = 0
        sum = 0
        numR = 0
        for name in ratings_dict:
            if ratings_dict[name][x] != 0:
                sum = sum + ratings_dict[name][x]
                numR += 1
            x = x + 1
        if numR is 0:
            ratings[i] = 0
        if sum != 0:
            ratings[i] = float(sum) / float(numR)
    return ratings
Output:
1
Traceback (most recent call last):
File "C:\Users\Collin\Dropbox\Python Files\main.py", line 106, in <module>
main()
File "C:\Users\Collin\Dropbox\Python Files\main.py", line 103, in main
print calculate_average_rating(ratingsDict)
File "C:\Users\Collin\Dropbox\Python Files\main.py", line 10, in calculate_average_rating
if ratings_dict[name][x] != 0:
IndexError: list index out of range
If you're getting an index error but don't "believe" that you should be, just print out the list and you will see the problem.
a = [0]
Imagine you didn't know what a was and then ran this code:
for i in range(2):
    try:
        x = a[i]
    except IndexError:
        print('IndexError, list = ' + str(a) + ', index = ' + str(i))
Then you would see
IndexError, list = [0], index = 1
and so the problem is clear. If the list is too long to print nicely, you can simply print the length instead of the list.
As others have already pointed out, it is very hard to say what is causing the IndexError without looking at ratings_dict. Most likely, the error is happening because the lists containing the ratings have different lengths for different users. You are calculating the length using len(ratings_dict[Names[0]]), which is the length of the list corresponding to the first (i.e. 0th) user. If the second user's list is shorter, then you will get an IndexError.
In more detail, here's what I mean. Suppose your ratings_dict is as follows:
ratings_dict = {"martin" : [1, 4, 3, 4, 5],
"tom" : [0, 1, 2, 1, 5],
"christina" : [0, 0, 2, 2, 3],
}
and we use the following function (I basically simplified your function a little bit):
def calculate_average_rating(ratings_dict):
    names = [key for key in ratings_dict]
    numBooks = len(ratings_dict[names[0]])
    ratings = []
    for i in range(numBooks):
        total = 0
        for name in names:
            if ratings_dict[name][i] != 0:
                total = total + ratings_dict[name][i]
        ratings.append(float(total) / len(names))
    return ratings
then we get no IndexError.
calculate_average_rating(ratings_dict)
[0.3333333333333333,
1.6666666666666667,
2.3333333333333335,
2.3333333333333335,
4.333333333333333]
However, if Christina has reviewed an additional book and ratings_dict looks as follows:
ratings_dict = {"martin" : [1, 4, 3, 4, 5],
"tom" : [0, 1, 2, 1, 5],
"christina" : [0, 0, 2, 2, 3, 6],
}
then running the function gives an IndexError:
calculate_average_rating(ratings_dict)
---------------------------------------------------------------------------
IndexError Traceback (most recent call last)
<ipython-input-23-1ad9c374e8d8> in <module>()
----> 1 calculate_average_rating(ratings_dict)
<ipython-input-14-080a1623f225> in calculate_average_rating(ratings_dict)
6 total = 0
7 for name in names:
----> 8 if ratings_dict[name][i] != 0:
9 total = total + ratings_dict[name][i]
10
IndexError: list index out of range
It happens because the lists of different users do not have equal length.
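One way to surface the problem early is to compare the list lengths before doing any indexing. The following Python 3 sketch is a variation on the function above, not the original code; the name average_ratings and the choice to raise ValueError are assumptions made for illustration.

```python
def average_ratings(ratings_dict):
    """Average each book's ratings across users, refusing ragged input."""
    lengths = {name: len(ratings) for name, ratings in ratings_dict.items()}
    if len(set(lengths.values())) > 1:
        # Report every user's list length so the odd one out is easy to spot.
        raise ValueError("rating lists differ in length: %r" % lengths)
    # zip(*lists) yields one tuple per book position, so no index is needed.
    return [sum(book) / len(book) for book in zip(*ratings_dict.values())]

equal = {
    "martin": [1, 4, 3, 4, 5],
    "tom": [0, 1, 2, 1, 5],
    "christina": [0, 0, 2, 2, 3],
}
print(average_ratings(equal))
```

With the second ratings_dict, where one list has six entries, this raises a ValueError naming the lengths instead of failing with an IndexError partway through the loop.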
Slightly off topic, it is not considered good practice to use capital letters to start variable names, because it can be confused for being a Class. See PEP8 for more details. Therefore, consider changing Names to names.
Secondly, you shouldn't use is not for doing comparisons. In this particular case, it won't bite you. But in some cases, it will. For example:
>>> a = 1000
>>> b = 1000
>>> a is b
False
The is operator compares whether a and b are the same objects, and not their values. In this case, a and b are different objects with the same value, therefore a is b evaluates to False.

How to sum value of integers based on position?

The situation is as follows. I want to sum, and eventually average, specific values based on their positions. So far I have tried many different things and have come up with the following code, but I can't figure out how to match the different positions with their corresponding values.
count_pos = 0
for character in score:
    asci = ord(character)
    count_pos += 1
    print(count_pos,asci)
    if asci == 10 :
        count_pos = 0
The print call generates the following output:
1 35
2 52
3 61
4 68
5 70
6 70
1 35
2 49
3 61
4 68
5 68
6 70
The numbers 1-6 are the positions and the other integers are the values belonging to each position. So what I am basically trying to do is sum the values at position 1 (35+35), which should give me 70, then the values at position 2 (52+49), which should give me 101, and so on for all positions.
The only thing so far I thought about was comparing the counter like this:
if count_pos == count_pos:
#Do calculation
NOTE: This is just a part of the data. The real data goes on like this with more than 1000 of these counting and not just 2 like displayed here.
Solution
This would work:
from collections import defaultdict
score = '#4=DFF\n#1=DDF\n'
res = defaultdict(int)
for entry in score.splitlines():
    for pos, char in enumerate(entry, 1):
        res[pos] += ord(char)
Now:
>>> res
defaultdict(int, {1: 70, 2: 101, 3: 122, 4: 136, 5: 138, 6: 140})
>>> res[1]
70
>>> res[2]
101
In Steps
Your score string looks like this (extracted from your asci numbers):
score = '#4=DFF\n#1=DDF\n'
Instead of looking for asci == 10, just split at newline characters with
the string method splitlines().
The defaultdict from the collections module gives you a dictionary that
you can initialize with a function. We use int() here. That will call int() if we access a key that does not exist. So, if you do:
res[pos] += ord(char)
and the key pos does not exist yet, it will call int(), which gives 0,
and you can add your number to it. The next time around, if pos is
already a key in your dictionary, you will get the stored value and add
to it, summing up the values for each position.
The enumerate here:
for pos, char in enumerate(entry, 1):
gives you the position in each row named pos, starting with 1.
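Since the question also asks for the average per position, the same loop can keep a count next to each sum. This is a sketch only; the function name position_averages and the second defaultdict are additions for illustration.

```python
from collections import defaultdict

def position_averages(score):
    """Return {position: average ord() value} over all lines of score."""
    totals = defaultdict(int)
    counts = defaultdict(int)
    for entry in score.splitlines():
        for pos, char in enumerate(entry, 1):
            totals[pos] += ord(char)
            counts[pos] += 1
    # Dividing by the per-position count also copes with lines of unequal length.
    return {pos: totals[pos] / counts[pos] for pos in totals}

averages = position_averages('#4=DFF\n#1=DDF\n')
print(averages)
```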
If the values to be added are held in two lists, you can do this:
Using zip:
[x + y for x, y in zip(List1, List2)]
or
zipped_list = zip(List1,List2)
print([sum(item) for item in zipped_list])
Eg: If the lists were,
List1=[1, 2, 3]
List2=[4, 5, 6]
Output would be : [5, 7, 9]
Using Numpy:
import numpy as np
all = [list1,list2,list3 ...]
result = sum(map(np.array, all))
Eg:
>>> li=[1,3]
>>> li1=[1,3]
>>> li2=[1,3]
>>> li3=[1,3]
>>> import numpy as np
>>> all=[li,li1,li2,li3]
>>> mylist = sum(map(np.array, all))
>>> mylist
array([ 4, 12])

what this python code trying to do

The following python code is to traverse a 2D grid of (c, g) in some special order, which is stored in "jobs" and "job_queue". But I am not sure which kind of order it is after trying to understand the code. Is someone able to tell about the order and give some explanation for the purpose of each function? Thanks and regards!
import Queue
c_begin, c_end, c_step = -5, 15, 2
g_begin, g_end, g_step = 3, -15, -2
def range_f(begin,end,step):
    # like range, but works on non-integer too
    seq = []
    while True:
        if step > 0 and begin > end: break
        if step < 0 and begin < end: break
        seq.append(begin)
        begin = begin + step
    return seq
def permute_sequence(seq):
    n = len(seq)
    if n <= 1: return seq
    mid = int(n/2)
    left = permute_sequence(seq[:mid])
    right = permute_sequence(seq[mid+1:])
    ret = [seq[mid]]
    while left or right:
        if left: ret.append(left.pop(0))
        if right: ret.append(right.pop(0))
    return ret
def calculate_jobs():
    c_seq = permute_sequence(range_f(c_begin,c_end,c_step))
    g_seq = permute_sequence(range_f(g_begin,g_end,g_step))
    nr_c = float(len(c_seq))
    nr_g = float(len(g_seq))
    i = 0
    j = 0
    jobs = []
    while i < nr_c or j < nr_g:
        if i/nr_c < j/nr_g:
            # increase C resolution
            line = []
            for k in range(0,j):
                line.append((c_seq[i],g_seq[k]))
            i = i + 1
            jobs.append(line)
        else:
            # increase g resolution
            line = []
            for k in range(0,i):
                line.append((c_seq[k],g_seq[j]))
            j = j + 1
            jobs.append(line)
    return jobs
def main():
    jobs = calculate_jobs()
    job_queue = Queue.Queue(0)
    for line in jobs:
        for (c,g) in line:
            job_queue.put((c,g))
main()
EDIT:
There is a value for each (c,g). The code actually searches the 2D grid of (c,g) to find the grid point where the value is smallest. I guess the code is using some kind of heuristic search algorithm? The original code is here http://www.csie.ntu.edu.tw/~cjlin/libsvmtools/gridsvr/gridregression.py, which is a script that searches for the best values of the two SVM parameters c and g, i.e. those with minimum validation error.
permute_sequence reorders a list of values so that the middle value is first, then the midpoint of each half, then the midpoints of the four remaining quarters, and so on. So permute_sequence(range(1000)) starts out like this:
[500, 250, 750, 125, 625, 375, ...]
calculate_jobs alternately fills in rows and columns using the sequences of 1D coordinates provided by permute_sequence.
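To see that alternation on a small grid, here is a Python 3 sketch of the same loop. It takes the two sequences as parameters instead of reading the module-level constants, and the labels c0..c2 and g0..g1 are placeholders; both are changes made for illustration only.

```python
def calculate_jobs(c_seq, g_seq):
    """Group grid points into lines, alternating between new c and new g values."""
    nr_c = float(len(c_seq))
    nr_g = float(len(g_seq))
    i = j = 0
    jobs = []
    while i < nr_c or j < nr_g:
        if i / nr_c < j / nr_g:
            # Add the next c value, paired with every g value used so far.
            jobs.append([(c_seq[i], g_seq[k]) for k in range(j)])
            i += 1
        else:
            # Add the next g value, paired with every c value used so far.
            jobs.append([(c_seq[k], g_seq[j]) for k in range(i)])
            j += 1
    return jobs

jobs = calculate_jobs(["c0", "c1", "c2"], ["g0", "g1"])
order = [point for line in jobs for point in line]
print(order)
```

Each pass adds either one new c value or one new g value and pairs it with everything already visited on the other axis, so every grid point is queued exactly once.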
If you're going to search the entire 2D space eventually anyway, this does not help you finish sooner. You might as well just scan all the points in order. But I think the idea was to find a decent approximation of the minimum as early as possible in the search. I suspect you could do about as well by shuffling the list randomly.
xkcd readers will note that the urinal protocol would give only slightly different (and probably better) results:
[0, 1000, 500, 250, 750, 125, 625, 375, ...]
Here is an example of permute_sequence in action:
print permute_sequence(range(8))
# prints [4, 2, 6, 1, 5, 3, 7, 0]
print permute_sequence(range(12))
# prints [6, 3, 9, 1, 8, 5, 11, 0, 7, 4, 10, 2]
I'm not sure why it uses this order, because in main, it appears that all candidate pairs of (c,g) are still evaluated, I think.
