Python get the p-value of the array - python

My function takes *X, where each argument is an array with one column and n rows, but the arrays may have different numbers of rows (e.g. X[0] has 1 column and 10 rows, and X[2] has 1 column and 9 rows). I want the code to compute the p-value for every pair of arrays and return the lowest p-value together with the positions of the two arrays in X (e.g. X[1] means the first array, and so on). The code fails with "local variable 'ans_1' referenced before assignment". I don't know how to fix this.
    import pandas as pd
    import scipy.stats

    def mass_independent_ttest(*X):
        min_pvalue = 10
        for i in range(0, len(X)):
            for j in range(i+1, len(X)):
                df_1 = pd.DataFrame(X[i])
                df_2 = pd.DataFrame(X[j])
                df_first = df_1.loc[:,0]
                df_second = df_2.loc[:,0]
                temp = scipy.stats.ttest_ind(df_first, df_second)
                temp_pvalue = temp.pvalue
                if temp_pvalue < min_pvalue:
                    min_pvalue = temp_pvalue
                    ans_1 = i
                    ans_2 = j
        ans_tuple = (ans_1, ans_2, min_pvalue)
        return ans_tuple

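On the last iteration of i, range(i+1, len(X)) is empty, so the inner loop body does not run for that i. More generally, ans_1 and ans_2 are only assigned when the if condition is true at least once. If it never is (for example, fewer than two arrays are passed, or every p-value is NaN, since NaN < 10 is False), the names do not exist when ans_tuple = (ans_1, ans_2, min_pvalue) runs. So check how many times your outer and inner loops actually iterate, and give ans_1 and ans_2 initial values before the loops.
This example shows conceptually what is happening, for len(X) == 5: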
    for i in range(0, len(X)):
        print(list(range(i+1, len(X))))
=== Output: ===
    [1, 2, 3, 4]
    [2, 3, 4]
    [3, 4]
    [4]
    []
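One way to avoid the unassigned-variable error is to start from an explicit "no result yet" value and return it unchanged when nothing qualifies. The sketch below is only an illustration of that pattern, not the original poster's code: lowest_pvalue_pair and its pvalue_fn parameter are hypothetical names, and the p-value function is passed in so the pattern can be shown without depending on a particular statistics library.

```python
from itertools import combinations
import math


def lowest_pvalue_pair(samples, pvalue_fn):
    """Return (i, j, p) for the pair of samples with the smallest p-value.

    Returns None when no pair yields a usable p-value, e.g. when fewer than
    two samples are given or every p-value is NaN.
    """
    best = None  # explicit "nothing found yet" marker instead of unset names
    for i, j in combinations(range(len(samples)), 2):
        p = pvalue_fn(samples[i], samples[j])
        if math.isnan(p):
            continue  # NaN compares False with everything, so skip it
        if best is None or p < best[2]:
            best = (i, j, p)
    return best
```

With SciPy this could be called as lowest_pvalue_pair(X, lambda a, b: scipy.stats.ttest_ind(a, b).pvalue), assuming each element of X is a flat sequence of numbers. The caller then has to handle a None result explicitly.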

Related

Extra zeros appear at the end of existing items when appending new lists to a list

I'm trying to solve the Pascal's triangle problem on LeetCode: "Return a given number of Pascal's triangle rows". I've defined a function getNextRow(row) that calculates the next row given the current one; I call it a certain number of times and append the rows to my resulting list. For some reason an extra zero appears at the end of the previous row each time I add a new row.
E.g.
Input: 5 #5 rows needed
Output: [[1,0],[1,1,0],[1,2,1,0],[1,3,3,1,0],[1,4,6,4,1]]
Expected output: [[1],[1,1],[1,2,1],[1,3,3,1],[1,4,6,4,1]]
    def getNextRow(row):
        res = [1]
        if len(row) == 0:
            return res
        row.append(0)
        for i in range(len(row) - 1):
            res.append(row[i] + row[i+1])
        return res

    def generate(numRows):
        pascal = []  # Empty resulting triangle
        currentRow = []
        num = 0  # Counter
        while num < numRows:
            currentRow = getNextRow(currentRow)
            pascal.append(currentRow)
            num += 1
        return pascal

    if __name__ == '__main__':
        print(generate(5))
The issue is that the list you append to pascal is the same object you pass to getNextRow on the next iteration. The row.append(0) call in getNextRow modifies that list in place, so the row already stored in pascal gains a trailing 0. To fix this, you can either append a copy of currentRow to the pascal list, e.g.:
    pascal.append(currentRow.copy())
or copy the row variable within getNextRow like so:
    def getNextRow(row):
        row = row.copy()
Hope this helps!
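As a further illustration of the same idea, here is a minimal sketch that never mutates its argument, so no copy is needed. The names get_next_row and generate mirror the question's functions but are rewritten here as an assumption about one possible fix, not as the poster's code.

```python
def get_next_row(row):
    """Return the row that follows `row`, leaving `row` untouched."""
    if not row:
        return [1]
    padded = row + [0]  # `+` builds a new list, so the caller's list is not changed
    return [1] + [padded[i] + padded[i + 1] for i in range(len(row))]


def generate(num_rows):
    """Build the first `num_rows` rows of Pascal's triangle."""
    triangle = []
    current = []
    for _ in range(num_rows):
        current = get_next_row(current)
        triangle.append(current)
    return triangle


print(generate(5))
# [[1], [1, 1], [1, 2, 1], [1, 3, 3, 1], [1, 4, 6, 4, 1]]
```

Because row + [0] creates a fresh list, the rows already stored in the result are never touched again.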
You append zero to row in getNextRow(). That entry is never removed from the list, so you'll always see a redundant trailing zero.
Your code is also rather cumbersome. Here's a more concise implementation:
    def pascal(n):
        r = [[1]]
        for i in range(2, n+1):
            p = r[-1]
            r.append([1] + [p[j-1] + p[j] for j in range(1, i-1)] + [1])
        return r

    print(pascal(6))
Output:
    [[1], [1, 1], [1, 2, 1], [1, 3, 3, 1], [1, 4, 6, 4, 1], [1, 5, 10, 10, 5, 1]]

Finding the sum of column of all the rows in a matrix without using numpy

This is my code to find the sum of all the elements of all columns in a given matrix:
    row, col = map(int, input().split())
    mat1 = [list(map(int, input().split())) for i in range(row)]
    result = 0
    j = 0
    for i in range(row):
        result += mat1[i][j]
    print(result)
I am able to get the answer for the first column but I am unable to do it for other columns. Where should I increment j to +1 to get the result for other columns?
This is the input:
2 2
5 -1
19 8
This is the output:
24
7
I got 24 as the answer. How should I get 7 now?
EDIT:
Can I insert the code in a function? After I get the first column's answer, can I increment j outside the loop and then call the function again? I think that is known as recursion. I'm not sure, as I am new to programming.
You should use another for loop over j and reset result each time a new column starts:
    for j in range(col):
        result = 0
        for i in range(row):
            result += mat1[i][j]
        print(result)
"Can I insert the code in a function? After I got the first column's answer, I can increment j outside the loop and then call the function? I think it is known as recursion."
Yes, you can do this with recursion.
    matrix = [[5, -1], [19, 8]]
    row = 2
    column = 2

    def getResult(j, matrix, result):
        if j >= column:
            return result
        s = sum([matrix[i][j] for i in range(row)])
        result.append(s)
        return getResult(j + 1, matrix, result)

    result = getResult(0, matrix, [])
Output
    > result
    [24, 7]
You should loop over the columns (you have used a fixed value of j) and call your code for each column, like this:
    row, col = map(int, input().split())
    mat1 = [list(map(int, input().split())) for i in range(row)]

    def sum_column(mat1, row, j):
        result = 0  # initialize
        for i in range(row):  # loop on rows
            result += mat1[i][j]
        return result

    for j in range(col):  # loop on columns
        # call function for each column
        print(f'Column {j + 1} sum: {sum_column(mat1, row, j)}')
UPDATE:
Pay attention to how you read input from the user. The function only processes col columns, but in your code the user can enter more than col values per row. You can ignore the extra columns like this:
    mat1 = [list(map(int, input().split()[:col])) for i in range(row)]
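For comparison, the nested loops above can also be expressed with the built-in zip, which transposes rows into columns. This is a swapped-in technique rather than anything the answers above use, and column_sums is a hypothetical helper name; the sketch assumes every row has the same length.

```python
def column_sums(matrix):
    """Return the sum of each column of a list-of-lists matrix.

    zip(*matrix) yields one tuple per column; if the rows have different
    lengths, zip stops at the shortest row.
    """
    return [sum(column) for column in zip(*matrix)]


print(column_sums([[5, -1], [19, 8]]))  # [24, 7]
```

This avoids tracking i and j by hand, at the cost of hiding the loop structure that the question was asking about.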

How to choose specific minimum values in lists and do mathematical operations on them

After getting data from user input I put the input in lists like this:
x= [3, 2, 1, 0, 1, 2]
y= [1, 2, 0, 3, 4, 1]
I have managed to write this as:
    rows = 3
    weight = 0
    high = 0
    low = 0
    while rows>=3 and rows<=200:
        rows, weight = map(int, input().split())
        break
    count_input = 0
    while count_input<rows:
        while high>=0 and low<=100:
            high, low = map(int, input().split())
            i=i+1
            if count_input==rows:
                break
To choose the minimum number in a list I tried this:
    smallest = None
    for number in [1, 0, 3, 4, 5, 2]:
        if smallest is None or number < smallest:
            smallest = number
    print('Smallest:', smallest)
My questions are:
How can I determine the minimum values in these two lists and add them together, given that minimum values at the same position, like x[0] and y[0], or x[1] and y[1], cannot be added together?
Elements in diagonal positions to each other, like x[0] and y[1], or x[2] and y[3], cannot be added together either.
Also How to put a limit for number of chosen values, like choosing the minimum 4 values found in lists together
This is how I would approach finding the minimum of the data set, with the logic for not using values between the lists if they have the same index or a diagonal index:
    x = [3, 2, 1, 0, 1, 2]
    y = [1, 2, 0, 3, 4, 1]
    final_min = max(x) + max(y)
    for x_index in range(0, len(x)):
        for y_index in range(0, len(y)):
            if y_index == x_index - 1 or y_index == x_index or y_index == x_index + 1:
                pass
            else:
                test_min = x[x_index] + y[y_index]
                if test_min < final_min:
                    print(test_min)  # prints 3, 2, 1
                    final_min = test_min
    print(final_min)  # prints 1
This makes sense when you look at the data: there are 3 places where the sum is 1, and the only place it could be smaller (0) is 0 + 0, but that is a diagonal pair so it cannot be included.
Keep in mind that this is a computationally expensive approach, because it iterates through the whole y list for every index in the x list; if your lists are large this will take a long time. Each loop indexes only its own list, so lists of different lengths are handled without an IndexError.
I cannot help you with your final point because I do not understand what is meant by:
How to put a limit for number of chosen values, like choosing the minimum 4 values found in lists together
You would need to clarify for anybody to understand what is meant here.
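The brute-force search above can also be written so that it returns the positions of the chosen values as well as their sum. The following is a minimal sketch under the same rule (indices that are equal or differ by one are excluded); min_constrained_sum is a hypothetical name, and ties are broken by the smallest x index and then the smallest y index, which is an assumption the question does not specify.

```python
def min_constrained_sum(x, y):
    """Return (total, x_index, y_index) for the smallest allowed x[i] + y[j].

    A pair is allowed only when the indices differ by more than one, which
    excludes same-position and diagonal pairs. Returns None if no pair is allowed.
    """
    candidates = [
        (xv + yv, i, j)
        for i, xv in enumerate(x)
        for j, yv in enumerate(y)
        if abs(i - j) > 1
    ]
    return min(candidates) if candidates else None


x = [3, 2, 1, 0, 1, 2]
y = [1, 2, 0, 3, 4, 1]
print(min_constrained_sum(x, y))  # (1, 3, 0): x[3] + y[0] == 0 + 1
```

Like the loop version, this still examines every pair, so the running time grows with len(x) * len(y).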
Use min(..) and index(..).
This solution may not be entirely correct, but you get the idea...
    import math

    def add_min_with_constraints(a, b):
        if len(a) == 0 or len(b) == 0:
            return -math.inf
        min_a_i = a.index(min(a))
        min_b_i = b.index(min(b))
        if min_a_i == min_b_i or abs(min_a_i - min_b_i) == 1:  # same or diagonal indices
            # exclude the current minimums
            return add_min_with_constraints(a[:min_a_i] + a[min_a_i+1:],
                                            b[:min_b_i] + b[min_b_i+1:])
        # assumed completion: otherwise the two minimums may be added
        return a[min_a_i] + b[min_b_i]

If numbers in list are equal to n, print out their indices

The Task:
You are given two parameters, an array and a number. For all the numbers that make n in pairs of two, return the sum of their indices.
input is: arr = [1, 4, 2, 3, 0, 5] and n = 7
output: 11
since the perfect pairs are (4,3) and (2,5) with indices 1 + 3 + 2 + 5 = 11
So far I have this, which prints out the perfect pairs
    from itertools import combinations

    def pairwise(arr, n):
        for i in combinations(arr, 2):  # for index in combinations in arr, 2 elements
            if i[0] + i[1] == n:  # if their sum is equal to n
                print(i[0],i[1])
Output:
4,3 2,5
However, does anyone have tips on how to print the indices of the perfect pairs? Should I use numpy or should I change the whole function?
Instead of generating combinations of array elements you can generate combinations of indices.
    from itertools import combinations

    def pairwise(arr, n):
        s = 0
        for i in combinations(range(len(arr)), 2):  # for every pair of indices into arr
            if arr[i[0]] + arr[i[1]] == n:  # if their sum is equal to n
                # print(arr[i[0]],arr[i[1]])
                # print(i[0],i[1])
                s += i[0] + i[1]
                # print(s)
        return s
You can use a dictionary mapping the values to their indexes:
    from itertools import combinations

    def pairwise(arr, n):
        d = {b:a for a,b in enumerate(arr)}  # map each value to its index (assumes no duplicate values)
        for i in combinations(arr, 2):  # for each pair of elements in arr
            if i[0] + i[1] == n:  # if their sum is equal to n
                print(d[i[0]],d[i[1]])
Rather than generating combinations and checking if they add up to n, it's faster to turn your list into a dict where you can look up the exact number you need to add up to n. For each number x you can easily calculate n - x and then look up the index of that number in your dict.
This only works if the input list doesn't contain any duplicate numbers.
    arr = [1, 4, 2, 3, 0, 5]
    n = 7
    indices = {x: i for i, x in enumerate(arr)}
    total = 0
    for i, x in enumerate(arr):
        remainder = n - x
        if remainder in indices:
            idx = indices[remainder]
            total += i + idx
    # the loop counts each pair twice (once as [a,b] and once as [b,a]), so
    # we have to divide the result by two to get the correct value
    total //= 2
    print(total)  # output: 11
If the input does contain duplicate numbers, you have to rewrite the code to store more than one index in the dict:
    import collections

    arr = [1, 4, 2, 3, 0, 5, 2]
    n = 7
    indices = collections.defaultdict(list)
    for i, x in enumerate(arr):
        indices[x].append(i)
    total = 0
    for i, x in enumerate(arr):
        remainder = n - x
        for idx in indices[remainder]:
            total += i + idx
    # the loop counts each pair twice (once as [a,b] and once as [b,a]), so
    # we have to divide the result by two to get the correct value
    total //= 2
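One thing to watch in the dictionary approach is an element equal to n / 2, which can end up paired with itself. A possible variation, sketched here with the hypothetical name pair_index_sum, builds the dictionary while scanning, so each element is only matched against earlier positions; this avoids both the self-pairing and the final division by two. It counts every index pair (i, j) with i < j whose values sum to n, which is an assumption about what the task wants when duplicates are present.

```python
from collections import defaultdict


def pair_index_sum(arr, n):
    """Sum i + j over all index pairs i < j with arr[i] + arr[j] == n."""
    seen = defaultdict(list)  # value -> indices where it has appeared so far
    total = 0
    for i, x in enumerate(arr):
        # only earlier indices are in `seen`, so each pair is counted once
        for j in seen.get(n - x, ()):
            total += i + j
        seen[x].append(i)
    return total


print(pair_index_sum([1, 4, 2, 3, 0, 5], 7))  # 11
```

When values are mostly distinct this does roughly one dictionary lookup per element, compared with checking every pair in the combinations-based versions.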
You should use the naive approach here:
process each element of the array together with its index
for each element, test all elements after it (to avoid duplicates): if their sum is the expected number, add the sum of their indices
Code could be:
    def myfunc(arr, number):
        tot = 0
        for i, val in enumerate(arr):
            for j in range(i+1, len(arr)):
                if val + arr[j] == number:
                    tot += i + j
        return tot
Control:
>>> myfunc([1, 4, 2, 3, 0, 5], 7)
11
>>> myfunc([2, 4, 6], 8)
2

Find missing data indices using python

What is the optimum way to return the indices where a 1-D array has missing data? The missing data is represented by zeros, but a value may also be genuinely zero and not missing. We only want to return indices where the data is zero for 3 or more consecutive places. For example, for the array [1,2,3,4,0,1,2,3,0,0,0,1,2,3] the function should only return the indices of the second segment of zeros and not the first.
This is actually an interview question :) The challenge is to do it as efficiently as possible in one line.
Keep track of the count of zeros in the current run. Then if a run finishes that has at least three zeros calculate the indexes.
    def find_dx_of_missing(a):
        runsize = 3  # 3 or more, change to 4 if you need "more than 3"
        zcount = 0
        for i, n in enumerate(a):
            if n == 0:
                zcount += 1
            else:
                if zcount >= runsize:
                    for j in range(i - zcount, i):
                        yield j
                zcount = 0
        if zcount >= runsize:  # needed if sequence ends with missing
            i += 1
            for j in range(i - zcount, i):
                yield j
Examples:
>>> a = [1,2,3,4,0,1,2,3,0,0,0,1,2,3]
>>> list(find_dx_of_missing(a))
[8, 9, 10]
>>> a = [0,0,0,3,0,5,0,0,0,0,10,0,0,0,0,0]
>>> list(find_dx_of_missing(a))
[0, 1, 2, 6, 7, 8, 9, 11, 12, 13, 14, 15]
Edit: Since you need a one-liner, here are two candidates, assuming a is your list and n is the smallest run of zeros that counts as missing data. They are written for Python 3 (on Python 2, use xrange instead of range), and the first needs import itertools:
    [v for vals in (list(vals) for iszeros, vals in itertools.groupby(range(len(a)), lambda dx, a=a: a[dx]==0) if iszeros) for v in vals if len(vals) >= n]
Or
    sorted({dx for i in range(len(a)-n+1) for dx in range(i, i+n) if set(a[i:i+n]) == {0}})
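For readability, the groupby idea can also be spelled out over several lines. The following is a sketch rather than a one-liner; missing_indices is a hypothetical name, and it groups the values themselves while tracking the running position instead of grouping indices.

```python
from itertools import groupby


def missing_indices(a, n=3):
    """Return the indices of every run of at least `n` consecutive zeros in `a`."""
    result = []
    pos = 0  # index of the first element of the current run
    for is_zero, run in groupby(a, key=lambda value: value == 0):
        length = len(list(run))
        if is_zero and length >= n:
            result.extend(range(pos, pos + length))
        pos += length
    return result


print(missing_indices([1, 2, 3, 4, 0, 1, 2, 3, 0, 0, 0, 1, 2, 3]))  # [8, 9, 10]
```

It makes a single pass over the data, like the generator above, and handles a run of zeros at the end of the sequence without a special case.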
