How to perform Mann-Whitney U test in python with cycle? - python

I have a loop that produces new values of k1 and k2 on each iteration. In some cases in my dataset, all values in both k1 and k2 are zero. When the program reaches such a case, it raises an error and the loop stops, even though many calculations remain. How can I mark such cases as NA (or something similar) and let the loop continue?
python3
import pandas
from scipy.stats import mannwhitneyu
print(mannwhitneyu(k1, k2))
I run this Mann-Whitney U test for many different observations, and I want the loop not to stop at an error but simply to note that the test is impossible for that case.
Error example (the third call fails; the two results above it are normal):
MannwhitneyuResult(statistic=3240.0, pvalue=0.16166098643677973)
MannwhitneyuResult(statistic=2958.5, pvalue=0.008850960706454409)
Traceback (most recent call last):
File "ars1", line 95, in <module>
print(mannwhitneyu(k1, k2))
File "/storage/software/python-3.6.0/lib/python3.6/site-packages/scipy/stats/stats.py", line 4883, in mannwhitneyu
raise ValueError('All numbers are identical in mannwhitneyu')
ValueError: All numbers are identical in mannwhitneyu

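You can skip the iteration when the two arrays are equal. For instance, if:
k1 = [0,0,0,0,0]
k2 = [0,0,0,0,0]
then check whether k1 == k2 and, if so, use continue in your loop:
if k1 == k2: continue
Put this just before the call to mannwhitneyu(k1, k2). This comparison works for plain Python lists; for NumPy arrays or pandas Series, k1 == k2 is element-wise, so compare list(k1) == list(k2) instead. Keep in mind that the error is raised when all numbers in both samples are identical, so an equality check also skips equal samples that the test could handle, and it misses all-zero samples of different lengths.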
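Another way to keep the loop running, sketched here under stated assumptions, is to wrap the call in a helper that returns NaN whenever the test cannot be computed. The helper name safe_mannwhitneyu is hypothetical (it is not a SciPy function). Older SciPy releases raise the ValueError shown in the question, while newer ones may return a result instead, so this sketch checks for identical values explicitly and also catches the exception.

```python
import math

import numpy as np
from scipy.stats import mannwhitneyu


def safe_mannwhitneyu(k1, k2):
    """Return (statistic, pvalue), or (nan, nan) when the test is not meaningful.

    Hypothetical helper for illustration; not part of SciPy.
    """
    values = np.concatenate([np.asarray(k1, dtype=float),
                             np.asarray(k2, dtype=float)])
    # If every observation is identical, the ranks carry no information.
    if values.size == 0 or np.all(values == values[0]):
        return (math.nan, math.nan)
    try:
        result = mannwhitneyu(k1, k2, alternative="two-sided")
    except ValueError:
        # Any other input the installed SciPy version refuses to handle.
        return (math.nan, math.nan)
    return (result.statistic, result.pvalue)


# The all-zero case from the question is marked as NaN and the loop goes on.
pairs = [([1, 2, 3, 4, 5], [6, 7, 8, 9, 10]), ([0, 0, 0], [0, 0, 0])]
for k1, k2 in pairs:
    print(safe_mannwhitneyu(k1, k2))
```

Returning NaN rather than skipping the iteration keeps one result per input pair, which makes it easier to line the results up with the original observations afterwards.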

I tried this in a loop and also saved the results to a CSV file in a folder.
Convert each Series to a list and check for equality; that will work:
import pandas as pd
from scipy import stats

for y in df.columns:
    target = df[y]
    list_mann_white = []
    for x in df.columns:
        # skip columns identical to the target (this includes the target itself)
        if list(target) != list(df[x]):
            list_mann_white.append([stats.mannwhitneyu(target, df[x])[1], x])
    list_mann_white.sort()  # ascending by p-value
    mann_csv = pd.DataFrame(list_mann_white)
    # the Mann/ folder must already exist
    mann_csv.to_csv('Mann/target_{}.csv'.format(y))

Related

Getting error in student t-test using python

I would like to perform Student's t-test on the following two samples, with the null hypothesis that they have the same mean:
$cat data1.txt
2
3
4
5
5
5
5
2
$cat data2.txt
4
7
9
10
8
7
3
I got the idea and a script to perform t-test from https://machinelearningmastery.com/how-to-code-the-students-t-test-from-scratch-in-python/
My script is:
$cat ttest.py
from math import sqrt
from numpy import mean
from scipy.stats import sem
from scipy.stats import t
def independent_ttest(data1, data2, alpha):
    # calculate means
    mean1, mean2 = mean(data1), mean(data2)
    # calculate standard errors
    se1, se2 = sem(data1), sem(data2)
    # standard error on the difference between the samples
    sed = sqrt(se1**2.0 + se2**2.0)
    # calculate the t statistic
    t_stat = (mean1 - mean2) / sed
    # degrees of freedom
    df = len(data1) + len(data2) - 2
    # calculate the critical value
    cv = t.ppf(1.0 - alpha, df)
    # calculate the p-value
    p = (1.0 - t.cdf(abs(t_stat), df)) * 2.0
    # return everything
    return t_stat, df, cv, p
data1 = open('data1.txt')
data2 = open('data2.txt')
alpha = 0.05
t_stat, df, cv, p = independent_ttest(data1, data2, alpha)
print('t=%.3f, df=%d, cv=%.3f, p=%.3f' % (t_stat, df, cv, p))
# interpret via critical value
if abs(t_stat) <= cv:
    print('Accept null hypothesis that the means are equal.')
else:
    print('Reject the null hypothesis that the means are equal.')
# interpret via p-value
if p > alpha:
    print('Accept null hypothesis that the means are equal.')
else:
    print('Reject the null hypothesis that the means are equal.')
When I run this script with python3 ttest.py, I get the following error. I think I need to change the print statement, but I have not been able to do it.
Traceback (most recent call last):
File "t-test.py", line 28, in <module>
t_stat, df, cv, p = independent_ttest(data1, data2, alpha)
File "t-test.py", line 10, in independent_ttest
mean1, mean2 = mean(data1), mean(data2)
File "/home/kay/.local/lib/python3.5/site-packages/numpy/core/fromnumeric.py", line 3118, in mean
out=out, **kwargs)
File "/home/kay/.local/lib/python3.5/site-packages/numpy/core/_methods.py", line 87, in _mean
ret = ret / rcount
TypeError: unsupported operand type(s) for /: '_io.TextIOWrapper' and 'int'
The issue is that you are opening the files but not reading the data from them (or converting it to a list). Opening a file only prepares it to be read by Python; you need to read its contents separately.
As a quick side note, make sure to close the file when you are done, or you could run into issues if you run the code multiple times in quick succession. The code below should work for your needs: replace the calls to open with it, changing file names and other details as needed. The resulting array is the data to pass to independent_ttest.
array = []
with open("test1.txt") as file:
    while value := file.readline():
        array.append(int(value))
print(array)
We open the file using with to make sure it is closed at the end.
Then we use a while loop to read each line. The := operator assigns each line to value as the loop runs. It requires Python 3.8 or later; the traceback above shows Python 3.5, where a plain for line in file loop does the same job.
Finally, we convert each value from a string to an int and append it to the list. A blank line in the file would make int fail, so skip empty lines if your data may contain them.
Hope this helped! Let me know if you have any questions!
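For interpreters older than Python 3.8, the same reading step can be sketched with an ordinary loop over the file object. The helper name read_numbers and the temporary demo file are assumptions made for illustration; in the question's script you would call it with 'data1.txt' and 'data2.txt' and pass the resulting lists to independent_ttest.

```python
import os
import tempfile


def read_numbers(path):
    """Return the numbers in a text file that holds one value per line."""
    with open(path) as handle:
        # Iterating over a file yields one line at a time; blank lines are skipped.
        return [float(line) for line in handle if line.strip()]


# Demo: write the data1.txt values from the question to a temporary file,
# followed by a blank line to show that it is ignored.
with tempfile.NamedTemporaryFile("w", suffix=".txt", delete=False) as tmp:
    tmp.write("2\n3\n4\n5\n5\n5\n5\n2\n\n")
    demo_path = tmp.name

data1 = read_numbers(demo_path)
os.remove(demo_path)
print(data1)
```

Because the function returns a plain list of floats, numpy.mean, scipy.stats.sem and len all work on the result, which is what independent_ttest needs.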

How does one perform operations only when certain conditions are met for individual values when inputting a numpy array into a function?

I'm plotting a function over a range of values, so naturally I fed a numpy.arange() into the function to get the dependent values for the plot. However, some of the values are going to NaN or infinity. I know why this is happening, so I went back into the function and included some conditionals. If they were working, these would replace values leading to the NaN outputs only when conditions are met. However, based on the errors I'm getting back, it appears the operations are being performed on every entry in the inputs, rather than only when those conditions are met.
My code is as follows
import matplotlib.pyplot as plt
import numpy as np
import math
k=10
l=50
pForPlot = np.arange(0,1,0.01)
def ProbThingHappens(p,k,l):
    temp1 = (((1-p)**(k-1)*((k-1)*p+1))**l)
    temp2 = (((1-p)**(1-k))/((k-1)*p+1))
    if float(temp2) <= 1.0*10^-300:
        temp2 = 0
    print("Temp1 = " + str(temp1))
    print("Temp2 = " + str(temp2))
    temp = temp1*(temp2**l-1)
    if math.isnan(temp):
        temp = 1
    print("Temp = " + str(temp))
    return temp
plt.plot(pForPlot,ProbThingHappens(pForPlot,k,l))
plt.axis([0,1,0,1])
plt.xlabel("p")
plt.ylabel("Probability of thing happening")
plt.rcParams["figure.figsize"] = (20,20)
plt.rcParams["font.size"] = (20)
plt.show()
And the error it returns is:
---------------------------------------------------------------------------
TypeError Traceback (most recent call last)
<ipython-input-20-e707ad24af01> in <module>
20 return temp
21
---> 22 plt.plot(pForPlot,ProbAnyOverlapsAtAll(pForPlot,k,l))
23 plt.axis([0,1,0,0.01])
24 plt.xlabel("p")
<ipython-input-20-e707ad24af01> in ProbAnyOverlapsAtAll(p, k, l)
10 temp1 = (((1-p)**(k-1)*((k-1)*p+1))**l)
11 temp2 = (((1-p)**(1-k))/((k-1)*p+1))
---> 12 if float(temp2) <= 1.0*10^-300:
13 temp2 = 0
14 print("Temp1 = " + str(temp1))
TypeError: only size-1 arrays can be converted to Python scalars
How do I specify within the function that I only want to operate on specific values, rather than on the whole array of values being produced? I assume that if I know how to get float() to pick out just one value, it should be straightforward to do the same for the other conditionals. I'm terribly sorry if this question has been asked before, but I searched for answers using every phrasing I could imagine. I'm afraid that in this case I may simply not know the proper terminology. Any assistance would be greatly appreciated.
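One common way to apply a condition per element, offered here as a sketch and not as the only fix, is numpy.where, which picks between two values for each entry of an array. The function name prob_thing_happens is a renamed stand-in for the function in the question. Two details are assumptions on my part: the threshold is written as the float literal 1e-300 (in Python, ^ is bitwise XOR, not exponentiation), and NaN results are replaced with 1 only because the original code does so.

```python
import numpy as np


def prob_thing_happens(p, k, l):
    """Element-wise version of the question's function (illustrative sketch)."""
    p = np.asarray(p, dtype=float)
    # Silence the overflow / invalid-value warnings that the intermediate
    # steps are expected to trigger; those entries are patched afterwards.
    with np.errstate(over="ignore", invalid="ignore", divide="ignore"):
        temp1 = ((1 - p) ** (k - 1) * ((k - 1) * p + 1)) ** l
        temp2 = ((1 - p) ** (1 - k)) / ((k - 1) * p + 1)
        # Replace only the entries that meet the condition.
        temp2 = np.where(temp2 <= 1e-300, 0.0, temp2)
        temp = temp1 * (temp2 ** l - 1)
    # Same idea for the NaN check: np.isnan works on whole arrays.
    return np.where(np.isnan(temp), 1.0, temp)


p_for_plot = np.arange(0, 1, 0.01)
values = prob_thing_happens(p_for_plot, 10, 50)
print(values[:5])
```

The boolean array passed as the first argument of numpy.where plays the role of the if statement, evaluated separately for every element, so no call to float() or math.isnan() on the whole array is needed.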

fminbound for a simple equation

def profits(q):
    range_price = range_p(q)
    range_profits = [(x-c(q))*demand(q,x) for x in range_price]
    price = range_price[argmax(range_profits)] # recall from above that argmax(V) gives
                                               # the position of the greatest element in a vector V
                                               # further V[i] the element in position i of vector V
    return (price-c(q))*demand(q,price)
print(profits(0.6))
print(profits(0.8))
print(profits(1))
0.18
0.2
0.208333333333
With q (quality) in [0,1], we know that the profit-maximizing quality is 1. Now the question is, how can I find it numerically? I keep getting an error saying either that q is not defined (which is only natural, as we are looking for it) or that some of the arguments are wrong.
q_firm = optimize.fminbound(-profits(q),0,1)
This is what I've tried, but I get this error:
---------------------------------------------------------------------------
NameError Traceback (most recent call last)
<ipython-input-99-b0a80dc20a3d> in <module>()
----> 1 q_firm = optimize.fminbound(-profits(q),0,1)
NameError: name 'q' is not defined
Can someone help me out? If I need to supply you guys with more information to the question let me know, it's my first time using this platform. Thanks in advance!
fminbound needs a callable as its first argument, while -profits(q) tries to compute a single value (and fails because q does not exist yet). Use
fminbound(lambda q: -profits(q), 0, 1)
The lambda is needed only to build a function that returns the negative profit; it is cleaner to define a named function for -profits and pass that to fminbound.
Better still, use minimize_scalar instead of fminbound.
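A minimal sketch of the minimize_scalar variant follows. The profits function here is a hypothetical stand-in, chosen only because it is increasing on [0, 1] like the values printed in the question; the real one depends on range_p, c and demand, which are not shown.

```python
from scipy import optimize


def profits(q):
    """Hypothetical stand-in for the question's profits(q); increasing on [0, 1]."""
    return q - 0.4 * q ** 2


def negative_profits(q):
    # Minimizing the negative profit is the same as maximizing the profit.
    return -profits(q)


# Pass the function itself (no parentheses), plus the search interval.
result = optimize.minimize_scalar(negative_profits, bounds=(0, 1), method="bounded")
q_firm = result.x
print(q_firm, profits(q_firm))
```

The bounded method searches inside the interval, so when the maximum sits on the boundary the returned q_firm is very close to 1 and not exactly 1.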

Coding problems when define something, but can't plot them

import numpy as N
from matplotlib import pylab as plt
def function2(t):
    if (t-N.floor(t))<0.5:
        return -1
    else:
        return 1
def function3(t):
    if t<=5:
        return N.cos(40*N.pi*t)
    else:
        return 0
x2= N.linspace(0,10,1024)
y2= function2(x2)
x3= N.linspace(0,40,8192)
y3= function3(x3)
plt.plot(x2,y2)
plt.show()
Whether I try plot(x2,y2) or plot(x3,y3), I get an error message, but I can print any single value of function2 and function3 without any problems.
I'm stuck here. Thanks in advance.
You are getting:
Traceback (most recent call last):
File "b.py", line 16, in <module>
y2= function2(x2)
File "b.py", line 4, in function2
if (t-N.floor(t))<0.5:
ValueError: The truth value of an array with more than one element is ambiguous. Use a.any() or a.all()
which comes from the line:
if (t-N.floor(t))<0.5:
Here you are evaluating if array < 0.5, which is ambiguous. What is your intent?
If you want to check whether all elements of the array are < 0.5, you can use:
all(array < 0.5)
function2 and function3 are scalar functions. To apply them element-wise, convert them into vectorized ones:
x2 = N.linspace(0,10,1024)
y2 = N.vectorize(function2)(x2)
x3 = N.linspace(0,40,8192)
y3 = N.vectorize(function3)(x3)
See numpy.vectorize.
What you were doing was applying a comparison to an array, which gives you an array of True/False values. The if statement cannot reduce that array to a single True or False, so it raises an error.
While the solution proposed by @falsetru is acceptable, I strongly advise against using vectorize, since it adds unnecessary for loops. Instead, you can use NumPy's element-wise comparison operations. For example, if a is an array, (a>0) returns an element-wise boolean array of True (1) or False (0), which can then be used in arithmetic. Your code should look like this:
def function2(t):
    return 1-2*(t-N.floor(t)<0.5) # 1 - 2*True gives -1, 1 - 2*False gives 1
def function3(t):
    return (t<=5)*N.cos(40*N.pi*t) # the factor is 0 where t<=5 is False
x2= N.linspace(0,10,1024)
y2= function2(x2)
x3= N.linspace(0,40,8192)
y3= function3(x3)
plt.plot(x2,y2)
plt.ylim(-2,2)
plt.show()
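The same element-wise selection can also be written with numpy.where, which some readers find easier to follow than boolean arithmetic. This is a sketch of an alternative, not a replacement for the answers above; it imports NumPy as np and leaves out the plotting calls so that it runs on its own.

```python
import numpy as np


def function2(t):
    t = np.asarray(t, dtype=float)
    # -1 where the fractional part is below 0.5, otherwise 1.
    return np.where(t - np.floor(t) < 0.5, -1, 1)


def function3(t):
    t = np.asarray(t, dtype=float)
    # The cosine where t <= 5, otherwise 0.
    return np.where(t <= 5, np.cos(40 * np.pi * t), 0.0)


x2 = np.linspace(0, 10, 1024)
y2 = function2(x2)
x3 = np.linspace(0, 40, 8192)
y3 = function3(x3)
print(y2[:4], y3[-4:])
```

numpy.where evaluates both branches for the whole array and then selects per element, so it is only a drop-in choice when both branches are safe to compute everywhere, as they are here.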

Python appending a list in a for loop with numpy array data

I am writing a program that appends to a list a single element pulled from a two-dimensional NumPy array. So far, I have:
# For loop to get correlation data of selected (x,y) pixel for all bands
zdata = []
for n in d.bands:
    cor_xy = np.array(d.bands[n])
    zdata.append(cor_xy[y,x])
Every time I run my program, I get the following error:
Traceback (most recent call last):
File "/home/sdelgadi/scr/plot_pixel_data.py", line 36, in <module>
cor_xy = np.array(d.bands[n])
TypeError: only integer arrays with one element can be converted to an index
My method works when I try it from the python interpreter without using a loop, i.e.
>>> zdata = []
>>> a = np.array(d.bands[0])
>>> zdata.append(a[y,x])
>>> a = np.array(d.bands[1])
>>> zdata.append(a[y,x])
>>> print(zdata)
[0.59056658, 0.58640128]
What is different about creating a for loop and doing this manually, and how can I get my loop to stop causing errors?
You're treating n as if it were an index into d.bands, when it is actually an element of d.bands:
zdata = []
for n in d.bands:
    cor_xy = np.array(n)
    zdata.append(cor_xy[y,x])
You say a = np.array(d.bands[0]) works. On the first iteration, n is exactly the same object as d.bands[0], so np.array(n) is all you need.
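The difference between looping over elements and looping over indices can be sketched with a small stand-in for d.bands. The list bands and the pixel coordinates below are hypothetical, since the real d object is not shown in the question.

```python
import numpy as np

# Hypothetical stand-in for d.bands: a list of two small 2-D arrays.
bands = [np.arange(12).reshape(3, 4) * 0.1,
         np.arange(12).reshape(3, 4) * 0.2]
y, x = 1, 2  # pixel of interest (row, column)

# Looping over the container yields the elements themselves.
zdata_by_element = []
for band in bands:
    zdata_by_element.append(np.asarray(band)[y, x])

# Looping over range(len(...)) yields integer positions to index with.
zdata_by_index = []
for i in range(len(bands)):
    zdata_by_index.append(np.asarray(bands[i])[y, x])

print(zdata_by_element, zdata_by_index)
```

Both loops collect the same values; the error in the question comes from mixing the two styles, that is, using an element where an integer position is expected.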
