Handling dataframe slice endpoints - python

I am writing a function that splits a dataframe into n equally sized slices, similar to np.array_split, but instead of starting from index 0, it starts from the index of the remainder.
First, I get the indices with which to split.
import math
import pandas as pd

lst = [1] * 100
df = pd.DataFrame(lst)

def df_split(df, n):
    step = math.floor(len(df)/n)  # rows per slice
    remainder = len(df) % step  # index at which the first slice starts
    splits = [remainder]
    while max(splits) < len(df):
        splits.append(max(splits) + step)
    return splits

splits = df_split(df, 3)
This returns [1, 34, 67, 100].
I would then like to get the sub-arrays, or slices:
arrays = []
for i in range(len(splits) - 1):
    st = splits[i]
    end = splits[i+1]
    arrays.append(df[st:end])
The final iteration of this for loop, however, is indexing df[67:100], which is exclusive of the last row in the df. I would like to make sure that the last row is included.
If I utilize df[67:101], I get an out-of-index error.
I could write an if statement that checks whether end is the last element in splits, and then simply returns df[67:], which would give the desired output, but I was wondering if there is a simpler way to achieve the same result.
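For what it's worth, a slice's end is exclusive, but positions run from 0 to len - 1. On 100 rows the slice 67:100 therefore already reaches the last row (position 99), and a slice end past the length is clipped rather than raising. Below is a minimal sketch using a plain list as a stand-in for the dataframe rows; df.iloc[st:end] follows the same positional rule. The helper split_points is hypothetical, and computing the remainder as length % n is an assumption about the intent.

```python
def split_points(length, n):
    """Return slice boundaries for n equal slices that skip the leading remainder."""
    step = length // n
    remainder = length % n  # assumption: leftover rows are dropped from the front
    return list(range(remainder, length + 1, step))

rows = list(range(100))  # stand-in for the 100 dataframe rows
points = split_points(len(rows), 3)
slices = [rows[st:end] for st, end in zip(points, points[1:])]

print(points)                        # [1, 34, 67, 100]
print(slices[-1][-1])                # 99, so the last row is included
print(rows[67:101] == rows[67:100])  # True, an oversized end is clipped
```

Pairing each boundary with the next one through zip(points, points[1:]) avoids the index arithmetic of the explicit loop.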

Related

The function is not returning an output under the if-else condition

After applying the queries on the given array the function is not returning the resulting array(res).
''' Reversing the Array according to the queries. Each pair of values in queries are the starting and ending indices of the given Array(Arr).
'''
Arr = [5,3,2,1,3]
Queries = [0,1,1,3,2,4]
def Reverse(Arr, Queries):
    res=[]
    for item in Arr:
        res.append(item)
    n = Queries[1]
    while(Queries[0] != n + 1):
        for i in range(Queries[0], Queries[1] + 1):
            res[Queries[0]] = Arr[Queries[1]]
            Queries[0] += 1
            Queries[1] -= 1
    Arr.clear()
    for item in res:
        Arr.append(item)
    Queries.pop(0)
    Queries.pop(0)
    #print(res)
    if len(Queries)== 0:
        return res
    else:
        Reverse(Arr, Queries)
print(Reverse(Arr, Queries))
In your Reverse function, if len(Queries) does not equal 0, nothing is explicitly returned, so the function returns None.
Even with that fixed, you will get an IndexError because of your convoluted logic.
Instead, let's copy arr (which is a list, rather than an array). Then we'll iterate over the indices in queries. By giving range three arguments we specify where to start, where to stop, and the amount to increment by. If queries has length 6, i will be 0, then 2, and finally 4.
We can then use i to slice the list and reverse the specified sections.
def reverse(arr, queries):
    res = arr.copy()
    for i in range(0, len(queries), 2):
        res[queries[i]:queries[i+1]+1] = list(reversed(res[queries[i]:queries[i+1]+1]))
    return res
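As a quick check against the question's inputs, here is a sketch that restates the same idea with slice notation ([::-1]) instead of reversed() and pairs up the start and end indices with zip. The name reverse_ranges is hypothetical and not from the original answer.

```python
def reverse_ranges(arr, queries):
    """Reverse each inclusive [start, end] range listed pairwise in queries."""
    res = arr.copy()  # leave the caller's list untouched
    for start, end in zip(queries[0::2], queries[1::2]):
        res[start:end + 1] = res[start:end + 1][::-1]
    return res

print(reverse_ranges([5, 3, 2, 1, 3], [0, 1, 1, 3, 2, 4]))  # [3, 1, 3, 5, 2]
```

Each query is applied to the result of the previous one, which is why the copy is updated in place instead of slicing the original list each time.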

Finding the index of an element that divides the array into two equal sums

Take an array and find an index i where the sum of the integers to the left of i is equal to the sum of the integers to the right of i. If there is no index that would make this happen, return -1.
Why doesn't this code work in all cases? What is wrong with it?
def find_even_index(arr):
    i=1
    size=len(arr)
    sum_left=0
    sum_right=0
    for i in range(size):
        sum_right=sum(arr[i+1:]) #sum of all elements in right of i
        sum_left=sum(arr[:i-1] ) #sum of all elements in left of i
        if(sum_right==sum_left):
            return i
        return -1
There are a few logical errors in your program. First, the assignment i = 1 before the for loop has no effect, because the loop rebinds i on every iteration; the start value is set by range() itself. Index 0 is a valid answer (the sum to its left is 0), so the loop should keep starting at 0:
for i in range(size):
Also, your slicing is off. Remember that when you slice a list with a start and an end index, the end is excluded, so the slice stops at end - 1.
For example,
>>> a = [10,20,30,40,50,60]
>>> print(a[2:4]) # prints the elements at indices 2 and 3 only, not index 4
[30, 40]
So arr[:i-1] leaves out the element at index i-1, and when i is 0 it becomes arr[:-1], which returns everything except the last element instead of an empty list.
For example,
>>> a = [10,20,30,40,50]
>>> print(a[0:])
[10, 20, 30, 40, 50]
>>> print(a[:-1]) # index -1 refers to the last element, which the slice excludes
[10, 20, 30, 40]
Finally, you have the return -1 statement inside the for loop, so the function returns during the first iteration; it should come after the loop.
Your corrected code should look like this:
def find_even_index(arr):
    size = len(arr)
    for i in range(size): # i starts at 0, where the left sum is 0
        sum_right = sum(arr[i+1:])
        sum_left = sum(arr[:i]) # corrected the slicing
        if sum_right == sum_left:
            return i
    return -1 # return statement moved outside the for loop
Hope this answer helps! :)
If you can't understand the reasoning even now, you can read about the topics here:
Negative Slicing
Python range
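Calling sum() on two slices at every index makes the loop quadratic. A possible alternative, sketched here under the problem statement given in the question, keeps a running left sum and derives the right sum from the total. It also checks index 0, where the left sum is 0. The name find_even_index_linear is hypothetical.

```python
def find_even_index_linear(arr):
    """Return the first index whose left and right sums match, or -1."""
    total = sum(arr)
    left = 0  # sum of the elements before index i
    for i, value in enumerate(arr):
        right = total - left - value  # sum of the elements after index i
        if left == right:
            return i
        left += value
    return -1

print(find_even_index_linear([1, 2, 3, 4, 3, 2, 1]))          # 3
print(find_even_index_linear([20, 10, -80, 10, 10, 15, 35]))  # 0
print(find_even_index_linear([1, 2, 3]))                      # -1
```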

pandas groupby + transform gives shape mismatch

I have a Pandas DataFrame with one column for the index of the row in the group. I now want to determine whether that row is in the beginning, middle, or end of the group based on this index. I wanted to apply a UDF that returns start (0), middle (1), or end (2) as output, and I want to save that output per row in a new column. Here is my UDF:
def add_position_within_group(self, group):
    length_of_group = group.max()
    three_lists = self.split_lists_into_three_parts([x for x in range(length_of_group)])
    result_list = []
    for x in group:
        if int(x) in three_lists[0]:
            result_list.append(0)
        elif int(x) in three_lists[1]:
            result_list.append(1)
        elif int(x) in three_lists[2]:
            result_list.append(2)
    return result_list
Here is the split_lists_into_three_parts method (tried and tested):
def split_lists_into_three_parts(self, event_list):
    k, m = divmod(len(event_list), 3)
    total_list = [event_list[i * k + min(i, m):(i + 1) * k + min(i + 1, m)] for i in range(3)]
    start_of_list = total_list[0]
    middle_of_list = total_list[1]
    end_of_list = total_list[2]
    return [start_of_list,middle_of_list,end_of_list]
Here is the line of code that groups the DataFrame and runs transform(). According to what I have read, when transform() is called on a groupby it iterates over all the groups, passes each group's column to my UDF as a Series, and expects a one-dimensional list or Series of the same size as the group:
compound_data_frame["position_in_sequence"] = compound_data_frame.groupby('patient_id')["group_index"].transform(self.add_position_within_group)
I'm getting the following error :
shape mismatch: value array of shape (79201,) could not be broadcast to indexing result of shape (79202,)
I still can't figure out what kind of output my function has to have when passed to transform, or why I'm getting this error. Any help would be much appreciated.
Well, I'm embarrassed to say this, but here goes: to create the three lists of indices I use range(group.max()), which produces a range one element shorter than the group. What I should have done is either use the group size or add 1 to group.max().
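The off-by-one can be reproduced without pandas. The sketch below assumes a zero-based group index and uses hypothetical helper names (split_into_three, positions). With range(max) the largest index matches none of the three lists, so the result comes out one element short, which is the kind of length mismatch transform() complains about.

```python
def split_into_three(items):
    """Split a list into three nearly equal consecutive parts."""
    k, m = divmod(len(items), 3)
    return [items[i * k + min(i, m):(i + 1) * k + min(i + 1, m)] for i in range(3)]

def positions(group_index, length):
    """Label each index 0 (start), 1 (middle) or 2 (end); unmatched indices are skipped."""
    parts = split_into_three(list(range(length)))
    result = []
    for x in group_index:
        for label, part in enumerate(parts):
            if x in part:
                result.append(label)
                break
    return result

group_index = [0, 1, 2, 3, 4, 5, 6]                   # zero-based index within one group
buggy = positions(group_index, max(group_index))      # range(6) never contains 6
fixed = positions(group_index, max(group_index) + 1)  # range(7) covers every index

print(len(buggy), len(fixed))  # 6 7
print(fixed)                   # [0, 0, 0, 1, 1, 2, 2]
```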

Index and Lists - Index out of range

Shouldn't the following code print 100 100?
price = 100 # assigns 'price' reference to 100
price = [price] # creates a 'price' list with 1 element: [100]
for i in range(1, 3):
    print(price[0]) # prints 100
    price[i] = price[i - 1]
    price.append(price[i])
    print(price[i])
I am getting IndexError: list assignment index out of range at the line price[i] = price[i - 1], but the line right before it prints 100 successfully. Shouldn't price[i] simply be assigned the value of price[0]?
You're trying to append items to a list, or more precisely to initialize a list with repeated copies of something. Here are Pythonic ways to do that:
# Use a list comprehension
>>> [100 for _ in range(3)]
[100, 100, 100]
# Use itertools.repeat
>>> import itertools
>>> list(itertools.repeat(100, 3))
[100, 100, 100]
Both are more concise than calling append() in a loop and avoid the per-call overhead. (A single append() is amortized O(1), so N appends are O(N) overall, not O(N^2).)
(Btw if you know a priori the list will have at least N elements, and N is large, you could initialize it to price = [None] * N and you get [None, None, None...]. Now you can directly assign to them. But, explicitly doing append is better practice for beginners.)
If you are just trying to append to a list, trying to do that with an index won't exactly work because that index isn't present in the list:
somelist = []
somelist[0] = 1
IndexError
So just use append:
for i in range(1,3):
    price.append(price[i-1])
The problem is that you are assigning at an index beyond the end of the list.
for i in range(1, 3):
This initializes i to 1. Since lists are zero-indexed, and the length of your list is 1 on the first pass, you will hit an assignment out of range error at the line you mentioned (when i=1).
Here is a minimal example showing the issue:
my_array = ["foo"]
my_array[1] = "bar" # raises IndexError: list assignment index out of range
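To make the difference concrete, here is a small sketch (not taken from any of the answers) that catches the error, shows that assigning to an existing index is fine, and then grows the list with append:

```python
price = [100]

try:
    price[1] = price[0]  # index 1 does not exist yet
except IndexError as exc:
    print(exc)  # list assignment index out of range

price[0] = 100  # assigning to an existing index is fine
for i in range(1, 3):
    price.append(price[i - 1])  # append grows the list by one element

print(price)  # [100, 100, 100]
```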
You can't assign to an index that doesn't exist yet, because the list has no predefined size. You have to use append to add elements at the end.
Check:
# Check the value at our initial position
print(price[0])
for i in range(1, 3):
    price.append(price[i-1])
    print(price[i])
That will print out:
100
100
100
You initialized a list with 1 element, so the size of that list is 1. However, the range in your for loop starts at 1. What's really happening is:
price = 100 # assigns 'price' to 100
price = [price] # creates a 'price' list with 1 element: [100]
for i in range(1, 3): # The list is 0-indexed, meaning price[0] contains 100
    print(price[0]) # prints 100 as it should
    price[i] = price[i - 1] # i is 1, but price[1] does not exist yet, so this raises IndexError
    price.append(price[i]) # This doesn't execute because an exception was thrown
    print(price[i]) # Neither does this
To get the result you're looking for, this would work:
price = [100] # creates a 'price' list with 1 element: [100]
for i in range(0, 2): # Start at index 0
    print(price[i]) # Print current index
    price.append(price[i]) # Append current value of price[i] to the price list
To ensure everything appended as you expected you can test it with len:
print(len(price))
Output: 3
However, the preferred way of building the list is the one @smci has shown in their answer.

I want to find the indices of the duplicate elements in an array

a=[2, 1, 3, 5, 3, 2]
def firstDuplicate(a):
    for i in range(0,len(a)):
        for j in range(i+1,len(a)):
            while a[i]==a[j]:
                num=[j]
                break
    print(num)
print(firstDuplicate(a))
The output should be 4 and 5, but I am only getting 4.
You can find the indices of all duplicates in an array in O(n) time and O(1) extra space with something like the following:
def get_duplicate_indices(arr):
    inds = []
    for i, val in enumerate(arr):
        val = abs(val)
        if arr[val] >= 0:
            arr[val] = -arr[val]
        else:
            inds.append(i)
    return inds
get_duplicate_indices(a)
[4, 5]
Note that this will modify the array in place! If you want to keep your input array unmodified, replace the first few lines in the above with:
def get_duplicate_indices(a):
    arr = a.copy() # so we don't modify in place. Drawback is it's now O(n) extra space
    inds = []
    for i, val in enumerate(a):
        # ...
Essentially this uses the sign of each element in the array as an indicator of whether a number has been seen before. If we come across a negative value, it means the number we reached has been seen before, so we append the number's index to our list of already-seen indices.
Note that this only works when every value is a valid index into the array, and a value of 0 cannot be marked because negating it changes nothing. If some values are greater than or equal to the length of the array, we can extend the working array so that the maximum value in the input is a valid index. Easy peasy.
There are some things wrong with your code. The following will collect the indexes of every first duplicate:
def firstDuplicate(a):
    num = [] # list to collect indexes of first dupes
    for i in range(len(a)-1): # second to last is enough here
        for j in range(i+1, len(a)):
            if a[i]==a[j]: # while-loop made little sense
                num.append(j) # grow list and do not override it
                break # stop inner loop after first duplicate
    print(num)
There are of course more performant algorithms to achieve this that are not quadratic.
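One such non-quadratic option, sketched here as an illustration and not as the answerer's method, remembers the values seen so far in a set and records the index of every repeat. It assumes the elements are hashable, and the name duplicate_indices is hypothetical.

```python
def duplicate_indices(values):
    """Return the indices of elements that already appeared earlier in the list."""
    seen = set()
    indices = []
    for i, value in enumerate(values):
        if value in seen:
            indices.append(i)  # value was seen before, so this position is a duplicate
        else:
            seen.add(value)
    return indices

print(duplicate_indices([2, 1, 3, 5, 3, 2]))  # [4, 5]
```

Unlike the sign-flipping approach, this leaves the input untouched and works for values that are not valid indices, at the cost of O(n) extra space for the set.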
