I have a function that has a parameter requiring a value, and I have the values stored within a list. (These values are the index numbers of a dataframe; the dfFunction takes a row from the dataframe using iloc, so a number between 0 and 99 is needed for the function to return a value).
Such as
IndexList = [0,1,2,3,4,...,99]
RowIndex = IndexList.index(0)
dfFunction (RowIndex)
#Output: 10
But I'd like to be able to run through the list of index values and apply them to the function, thus producing the function result for each index number.
However, the code I have at the moment, to iterate through the list, is returning <function __main__.IndexFunction()>
def IndexFunction():
    RowIndex = 0
    while RowIndex <= len(IndexList):
        RowIndexValue = IndexList.index(RowIndex)
        RowIndex += 1
    return RowIndexValue
I thought I could apply this function as the parameter, like: dfFunction(IndexFunction) but I also know this isn't correct.
Is there a while loop or way to apply enumerate() or something else so I can use the list values in the dfFunction and produce the results for all of them?
Ideally:
dfFunction(#insert solution)
#Output dfFunction results for each list index to put back in dataframe as a new column
OR:
IndexList = [0,1,2,3,4,5,...,99]
IndexNumber = 0
row_index = IndexList.index(IndexNumber)
Result = dfColumn1.iloc[row_index] / dfFunction(row_index)
Result
#Output
IndexNumber = 1
row_index = IndexList.index(IndexNumber)
Result = dfColumn1.iloc[row_index] / dfFunction(row_index)
Result
#Output
IndexNumber = 3
row_index = IndexList.index(IndexNumber)
Result = dfColumn1.iloc[row_index] / dfFunction(row_index)
Result
#Output
etc
Simplified so there doesn't have to be 100 repeats of those chunks of code, so one function spits out all the outputs in one go, like:
#Output of function from Index Position 1
#Output of function from Index Position 2
#Output of function from Index Position 3
#Output of function from Index Position 4
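One way to get every result in a single pass is to loop over the list and call the function once per value. The following is a minimal sketch under stated assumptions, not the asker's real code: `column1` is a plain list standing in for dfColumn1, and the body of `dfFunction` is a dummy (the real one would read the row with iloc).

```python
# Hypothetical stand-ins for the real dataframe column and function.
column1 = [float(10 * (i + 1)) for i in range(100)]  # plays the role of dfColumn1

def dfFunction(row_index):
    # Dummy body: the real function would look the row up with iloc.
    return row_index + 1

IndexList = list(range(100))

# Call the function once per index and collect the outputs in order.
function_results = [dfFunction(i) for i in IndexList]

# The "OR" variant: divide each column value by the function result.
ratios = [column1[i] / dfFunction(i) for i in IndexList]

print(function_results[:4])
print(ratios[:4])
```

Because IndexList holds 0 to 99 in order, IndexList.index(i) just returns i, so the lookup can be skipped. With a real dataframe, a list that has one entry per row can be assigned as a new column.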
Related
I would like to obtain the k-th largest/ k-th smallest value from numerical columns in a .xlsx file imported in Python. I have heard that a sorted array is required for the same.
So I tried isolating different columns into an array in Python using openpyxl like so
col_array = []
for i in range(1,1183):
    col_array = factor2_db.cell(row=i,column=2).value
print(col_array)
And then used the function below to find the kth Largest value but that resulted in an error in Line 2
0 class Solution(object):
1     def findKthLargest(self, nums, k):
2         nums_sorted = sorted(nums)  # TypeError: 'float' object is not iterable
3         if k == 1:
4             return nums_sorted[-1]
5         temp = 1
6         return nums_sorted[len(nums_sorted)-k]
7 ob1 = Solution()
8 print(ob1.findKthLargest(col_array,5))
Soln
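Strictly speaking, your variable col_array is not an array at all by the time you sort it. It starts out as an empty list, but the assignment inside the loop overwrites it with a single cell value on each of the 1182 iterations (i.e. range(1,1183)), so it ends up holding one float. That is why sorted() complains that a 'float' object is not iterable.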
I tried the following and had favourable results:
col_array=[]
for i in range(10): col_array.append(i)
print(col_array)
you can customise the append using something (untested) like this: col_array.append( factor2_db.cell(row=i, column= 2).value)
def kth(k, large=False):
    global col_array
    # Sort in place: ascending for the k-th smallest, descending for the k-th largest
    col_array.sort(key=None, reverse=large)
    print(f'col_array = {col_array}, col_array[{k - 1}] = {col_array[k - 1]}')
    return col_array[k - 1]
print(kth(4, True))
Note that the first element of a list has index 0, but you wouldn't ordinarily talk of returning the '0th' smallest / largest element, hence the adjustment of -1 in return col_array[k-1].
Additional notes:
To preserve the original ordering, copy col_array at the outset. One way to achieve this:
col_array_copy = list(col_array)
Then proceed as above after replacing 'col_array' with 'col_array_copy'.
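An alternative to copying and then sorting in place is to let sorted() build the new list. The sketch below is illustrative only: the function names `kth_largest` and `kth_smallest` and the sample values are made up, not taken from the question's spreadsheet.

```python
import heapq

def kth_largest(values, k):
    # sorted() returns a new list, so the caller's list keeps its order.
    return sorted(values, reverse=True)[k - 1]

def kth_smallest(values, k):
    return sorted(values)[k - 1]

col_array = [3.5, 1.0, 9.25, 7.0, 4.5, 8.0]  # made-up sample values

second_largest = kth_largest(col_array, 2)
second_smallest = kth_smallest(col_array, 2)

# heapq.nlargest avoids a full sort when k is small.
second_largest_heap = heapq.nlargest(2, col_array)[-1]

print(second_largest, second_smallest, second_largest_heap)
```

Here the -1 adjustment is the same as in kth above: k counts from 1, list indices count from 0.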
I'm trying to generate the Fibonacci series here. I'm not necessarily looking for an answer specific to that series, but rather for why the loop I've created here won't generate a list with up to 20 values for an input of '0'.
So far I've tried appending within and before the loop. The result I get is [0,1]. It doesn't seem to add to the list beyond that.
series = []
value = input("Enter an integer: \n")
i = int(value)
series.append(i)
if series[0] == 0:
    series.append(1)
    for i in series[2:20]:
        series[i]=series[i-1]+series[i-2]
        series.append(i)
print(series)
After series.append(1), series contains only [0, 1], so series[2:20] == [] and the loop iterates over nothing and fills in nothing. You also cannot assign to an index that does not exist yet, so series[i] = ... would fail for a position the list has not reached. You just need to append the values:
if series[0] == 0:
    series.append(1)
    for i in range(2, 20):
        series.append(series[i - 1] + series[i - 2])
series[2:20] returns the values of series from index 2 (included) to index 20 (excluded)
range(2,20) generates the integers from 2 (included) to 20 (excluded)
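The difference between the slice and the range can be seen directly in a short sketch that starts from the two-element list and then builds the series with append:

```python
series = [0, 1]

# Slicing past the end of a list is not an error; it just yields an empty list.
empty_slice = series[2:20]
print(empty_slice)

# range() does not depend on the list, so it always yields 2 through 19.
positions = list(range(2, 20))
print(positions[:3])

for i in range(2, 20):
    series.append(series[i - 1] + series[i - 2])

print(len(series), series[-1])
```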
I have a Pandas Dataframe with one column for the index of the row in the group. I now want to determine whether that row is in the beginning, middle, or end of the group based on this index. I wanted to apply a UDF that returns start (0) middle (1) or end(2) as output, and I want to save that output per row in a new column. Here is my UDF:
def add_position_within_group(group):
    length_of_group = group.max()
    three_lists = self.split_lists_into_three_parts([x for x in range(length_of_group)])
    result_list = []
    for x in group:
        if int(x) in three_lists[0]:
            result_list.append(0)
        elif int(x) in three_lists[1]:
            result_list.append(1)
        elif int(x) in three_lists[2]:
            result_list.append(2)
    return result_list
Here is the split_lists_into_three_parts method (tried and tested):
def split_lists_into_three_parts(self, event_list):
    k, m = divmod(len(event_list), 3)
    total_list = [event_list[i * k + min(i, m):(i + 1) * k + min(i + 1, m)] for i in range(3)]
    start_of_list = total_list[0]
    middle_of_list = total_list[1]
    end_of_list = total_list[2]
    return [start_of_list,middle_of_list,end_of_list]
Here is the line of code that groups the Dataframe and runs transform(). From what I have read, when transform() is called on a groupby it iterates over all the groups, passes each group's column to my UDF as a Series, and expects back a one-dimensional list or Series the same size as the group:
compound_data_frame["position_in_sequence"] = compound_data_frame.groupby('patient_id')["group_index"].transform(self.add_position_within_group)
I'm getting the following error :
shape mismatch: value array of shape (79201,) could not be broadcast to indexing result of shape (79202,)
I still can't figure out what kind of output my function has to have when passed to transform, or why I'm getting this error. Any help would be much appreciated.
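Well, I'm embarrassed to say this, but here goes: to create the three lists of indices I use range(group.max()), which stops at group.max() - 1 and so covers one index fewer than the group contains. The row with the largest index then matches none of the three lists, nothing is appended for it, and the returned list is shorter than the group. What I should have done is either use the group size or add 1 to group.max().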
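The fix described above can be sketched in plain Python, without pandas, to check that the output has one label per input row. The function names here are simplified stand-ins for the methods in the question, and the input is assumed to be the 0-based positions of one group.

```python
def split_into_three_parts(items):
    # Same slicing scheme as split_lists_into_three_parts in the question.
    k, m = divmod(len(items), 3)
    return [items[i * k + min(i, m):(i + 1) * k + min(i + 1, m)] for i in range(3)]

def position_within_group(group_index):
    values = [int(x) for x in group_index]
    # + 1 so that the largest index is covered by the range as well.
    parts = split_into_three_parts(list(range(max(values) + 1)))
    labels = []
    for x in values:
        for label, part in enumerate(parts):
            if x in part:
                labels.append(label)  # 0 = start, 1 = middle, 2 = end
                break
    return labels

labels = position_within_group(range(7))
print(labels)
```

With the + 1 in place every row finds a matching part, so the result has the same length as the group, which is what transform() needs.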
I need to compare two lists given as parameters to a function. The third parameter of the function is an integer. The first list is a list of thresholds. The second list is shorter than the first. When comparing the two lists, if the value in the second list is greater than the corresponding value in the first list for the given number of consecutive positions, the function returns that index. How can I write code for this? In particular, how do I compare the two lists to find where the second value is greater?
def compare_lists(lst_a, lst_b, num):
    count = 0  # number of consecutive times a < b
    result = None  # last index value to be returned
    for index, value in enumerate(lst_a):
        try:
            if value < lst_b[index]:
                count += 1
            else:
                count = 0
            if count == num:
                result = index  # if you need the first index, then set result = index - num + 1
                break
        except IndexError:
            break
    return result
a = [1, 2, 3, 4, 5]
b = [10, 20, 30]
print(compare_lists(a, b, 3))
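The same comparison can also be written with zip, which stops at the end of the shorter list, so no IndexError handling is needed. This is a sketch of one possible variant; the name `first_run_end` is made up for illustration.

```python
def first_run_end(thresholds, values, num):
    """Return the index at which `values` has exceeded `thresholds` for
    `num` consecutive positions, or None if that never happens."""
    count = 0
    # zip pairs the lists position by position and stops at the shorter one.
    for index, (threshold, value) in enumerate(zip(thresholds, values)):
        count = count + 1 if value > threshold else 0
        if count == num:
            return index
    return None

run_found = first_run_end([1, 2, 3, 4, 5], [10, 20, 30], 3)
run_missing = first_run_end([1, 2, 3, 4, 5], [10, 0, 30], 2)
print(run_found, run_missing)
```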
I have a specific scenario where I need to scan a portion of an array for the maximum value in that portion and return the position of that value with respect to the entire array.
for example
searchArray = [10,20,30,40,50,60,100,80,90,110]
I want to scan for the max value in portion 3 to 8, (40,50,60,100,80,90)
and then return the location of that value.
so in this case max value is 100 and location is 6
Is there a way to get that using Python alone or with the help of numpy?
First slice your list and then use index on the max function:
searchArray = [10,20,30,40,50,60,100,80,90,110]
slicedArray = searchArray[3:9]
print(slicedArray.index(max(slicedArray)) + 3)
This gives the index within the sliced list, plus the offset at which the slice begins (3).
Try this...Assuming you want the index of the max in the whole list -
import numpy as np
searchArray = [10,20,30,40,50,60,100,80,90,110]
start_index = 3
end_index = 8
print (np.argmax(searchArray[start_index:end_index+1]) + start_index)
Use enumerate to get pairs of index and value (it is actually a lazy iterator, so the pairs are produced one at a time rather than stored as a whole list of tuples), then use max with a key function to pick the pair with the greatest value:
searchArray = [10,20,30,40,50,60,100,80,90,110]
lower_bound = 3 # the lower bound is inclusive, i.e. element 3 is the first checked one
upper_bound = 9 # the upper bound is exclusive, i.e. element 8 (9-1) is the last checked one
max_index, max_value = max(enumerate(searchArray[lower_bound:upper_bound], lower_bound),
                           key=lambda x: x[1])
print(max_index, max_value)
# output: 6 100
See this code running on ideone.com
I'd do it like this:
sliced = searchArray[3:9]
m = max(sliced)
pos = sliced.index(m) + 3
I've added an offset of 3 to the position to give you the true index in the unmodified list.
With itemgetter:
from operator import itemgetter
pos = max(enumerate(searchArray[3:9], 3), key=itemgetter(1))[0]
I guess this is what you want:
maxVal = max(searchArray[3:9])  # max element in positions 3 to 8 inclusive
position = searchArray.index(maxVal, 3, 9)  # position of that element in the whole list
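The approaches above can also be wrapped in a small helper that never builds a slice and therefore needs no offset arithmetic. This is a sketch, and the name `argmax_in_range` is hypothetical; like a slice, it treats `stop` as exclusive.

```python
def argmax_in_range(values, start, stop):
    """Index (in the full list) of the largest value in values[start:stop]."""
    # range() yields positions in the original list; the key looks up each value.
    return max(range(start, stop), key=values.__getitem__)

searchArray = [10, 20, 30, 40, 50, 60, 100, 80, 90, 110]
pos = argmax_in_range(searchArray, 3, 9)
print(pos, searchArray[pos])
```

If the maximum occurs more than once in the range, max returns the first position at which it appears.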