Getting an error in a Student's t-test using Python

I would like to perform a Student's t-test on the following two samples, with the null hypothesis that they have the same mean:
$cat data1.txt
2
3
4
5
5
5
5
2
$cat data2.txt
4
7
9
10
8
7
3
I got the idea and a script to perform t-test from https://machinelearningmastery.com/how-to-code-the-students-t-test-from-scratch-in-python/
My script is:
$cat ttest.py
from math import sqrt
from numpy import mean
from scipy.stats import sem
from scipy.stats import t
def independent_ttest(data1, data2, alpha):
    # calculate means
    mean1, mean2 = mean(data1), mean(data2)
    # calculate standard errors
    se1, se2 = sem(data1), sem(data2)
    # standard error on the difference between the samples
    sed = sqrt(se1**2.0 + se2**2.0)
    # calculate the t statistic
    t_stat = (mean1 - mean2) / sed
    # degrees of freedom
    df = len(data1) + len(data2) - 2
    # calculate the critical value
    cv = t.ppf(1.0 - alpha, df)
    # calculate the p-value
    p = (1.0 - t.cdf(abs(t_stat), df)) * 2.0
    # return everything
    return t_stat, df, cv, p
data1 = open('data1.txt')
data2 = open('data2.txt')
alpha = 0.05
t_stat, df, cv, p = independent_ttest(data1, data2, alpha)
print('t=%.3f, df=%d, cv=%.3f, p=%.3f' % (t_stat, df, cv, p))
# interpret via critical value
if abs(t_stat) <= cv:
    print('Accept null hypothesis that the means are equal.')
else:
    print('Reject the null hypothesis that the means are equal.')
# interpret via p-value
if p > alpha:
    print('Accept null hypothesis that the means are equal.')
else:
    print('Reject the null hypothesis that the means are equal.')
When I run this script with python3 ttest.py, I get the following error. I think I need to change the print statement, but I haven't been able to.
Traceback (most recent call last):
File "t-test.py", line 28, in <module>
t_stat, df, cv, p = independent_ttest(data1, data2, alpha)
File "t-test.py", line 10, in independent_ttest
mean1, mean2 = mean(data1), mean(data2)
File "/home/kay/.local/lib/python3.5/site-packages/numpy/core/fromnumeric.py", line 3118, in mean
out=out, **kwargs)
File "/home/kay/.local/lib/python3.5/site-packages/numpy/core/_methods.py", line 87, in _mean
ret = ret / rcount
TypeError: unsupported operand type(s) for /: '_io.TextIOWrapper' and 'int'

So your issue is that you are opening the files but not reading the data from them (or converting it to a list). Opening a file just prepares it to be read by Python: open() gives you a file object, not the numbers inside it, and numpy's mean fails when it tries to divide that file object by the item count. You need to read the values separately.
Also, as a quick side note, make sure to close the file when you are done, or else you could run into issues if you run the code multiple times in quick succession. The code below should work for your needs: replace the calls to open with it, changing the file names and other details as needed. The resulting array is the data you want to pass to independent_ttest.
array = []
with open("test1.txt") as file:
    while value := file.readline():
        array.append(int(value))
print(array)
We open our file using with to make sure it is closed at the end.
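Then we use a while loop to read each line. The := operator (an assignment expression, available from Python 3.8) assigns each line to value as we loop; at the end of the file readline() returns an empty string, which ends the loop.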
Finally, for each value we convert it from string to int and then append it to our list.
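Your traceback shows Python 3.5, where := is a syntax error. A minimal sketch that also runs on older versions, assuming one number per line: it iterates over the file directly, skips blank lines, and reads values as float rather than int so decimals work too. load_numbers is a hypothetical helper name, not part of any library.

```python
def load_numbers(path):
    """Read one number per line from a text file, skipping blank lines."""
    values = []
    with open(path) as f:          # the with block closes the file for us
        for line in f:             # iterating a file yields one line at a time
            line = line.strip()
            if line:               # ignore empty lines, e.g. a trailing newline
                values.append(float(line))
    return values

# Usage with the files from the question:
# data1 = load_numbers("data1.txt")
# data2 = load_numbers("data2.txt")
# t_stat, df, cv, p = independent_ttest(data1, data2, alpha)
```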
Hope this helped! Let me know if you have any questions!
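Once the data are lists, it can be worth cross-checking the hand-rolled function against SciPy's scipy.stats.ttest_ind. The function combines the standard errors as sqrt(se1² + se2²), which is the unpooled (Welch) standard error, but uses the pooled degrees of freedom n1 + n2 - 2, so its t statistic should match ttest_ind(..., equal_var=False) while its p-value can differ slightly from both SciPy variants. This sketch also swaps in the two-sided critical value t.ppf(1 - alpha/2, df) so that the critical-value and p-value interpretations agree; that change assumes a two-sided test was intended and is not part of the original script.

```python
from math import sqrt
from numpy import mean
from scipy.stats import sem, t, ttest_ind

def independent_ttest(data1, data2, alpha):
    """Hand-rolled t-test from the question, with a two-sided critical value."""
    mean1, mean2 = mean(data1), mean(data2)
    se1, se2 = sem(data1), sem(data2)          # sem uses ddof=1 by default
    sed = sqrt(se1**2.0 + se2**2.0)            # unpooled (Welch) standard error
    t_stat = (mean1 - mean2) / sed
    df = len(data1) + len(data2) - 2           # pooled degrees of freedom
    cv = t.ppf(1.0 - alpha / 2.0, df)          # two-sided critical value
    p = (1.0 - t.cdf(abs(t_stat), df)) * 2.0   # two-sided p-value
    return t_stat, df, cv, p

# The samples from data1.txt and data2.txt, already read into lists.
data1 = [2, 3, 4, 5, 5, 5, 5, 2]
data2 = [4, 7, 9, 10, 8, 7, 3]

t_stat, df, cv, p = independent_ttest(data1, data2, 0.05)
welch = ttest_ind(data1, data2, equal_var=False)   # Welch's t-test
student = ttest_ind(data1, data2)                  # Student's t-test (pooled variance)
print('t=%.3f, df=%d, cv=%.3f, p=%.3f' % (t_stat, df, cv, p))
print('Welch:  ', welch)
print('Student:', student)
```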

Related

Too many values to unpack? python-radar

I have a problem: when I run the code, it shows an error:
Traceback (most recent call last):
File "C:\Users\server\PycharmProjects\Publictest2\main.py", line 19, in <module>
Distance = radar.route.distance(Starts, End, modes='transit')
File "C:\Users\server\PycharmProjects\Publictest2\venv\lib\site-packages\radar\endpoints.py", line 612, in distance
(origin_lat, origin_lng) = origin
ValueError: too many values to unpack (expected 2)
My Code:
from radar import RadarClient
import pandas as pd
API_key = 'API'
radar = RadarClient(API_key)
file = pd.read_excel('files')
file['AntGeo'] = Sourced[['Ant_lat', 'Ant_long']].apply(','.join, axis=1)
file['BaseGeo'] = Sourced[['Base_lat', 'Base_long']].apply(','.join, axis=1)
antpoint = file['AntGeo']
basepoint = file['BaseGeo']
for antpoint in antpoint:
    dist= radar.route.distance(antpoint , basepoint, modes='transit')
    dist= dist['routes'][0]['distance']
    dist= dist / 1000
First, the traceback does not match the code sample you posted.
It is apparent you are working with the python library for the Radar API.
Your corresponding line 19 is dist= radar.route.distance(antpoint , basepoint, modes='transit')
According to the radar-python manual on PyPI, the route call should look like this:
## Routing
radar.route.distance(origin=[lat,lng], destination=[lat,lng], modes='car', units='metric')
Without seeing your dataset (file), one can nonetheless deduce or expect the following:
Your antpoint and basepoint must be a two-item list (or tuple).
For instance, your antpoint ought to have a coordinate like [40.7041029, -73.98706]
See the radar-python manual
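The relevant lines are 11 and 13 of your code, which join latitude and longitude into a single comma-separated string rather than a two-item list: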
file['AntGeo'] = Sourced[['Ant_lat', 'Ant_long']].apply(','.join, axis=1)
file['BaseGeo'] = Sourced[['Base_lat', 'Base_long']].apply(','.join, axis=1)
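A minimal sketch of building two-item [lat, lng] lists instead of joined strings, assuming numeric latitude and longitude columns and that each row's antenna should be routed to the base on the same row; to_coordinate_pairs and unpack_origin are hypothetical helpers. unpack_origin repeats the (origin_lat, origin_lng) = origin line from the traceback to show why a "lat,lng" string fails: unpacking a string iterates over its characters, so there are far more than two values.

```python
def to_coordinate_pairs(lats, lngs):
    """Pair latitude and longitude values into two-item [lat, lng] lists.

    Accepts plain lists or pandas Series such as file['Ant_lat'].
    """
    return [[float(lat), float(lng)] for lat, lng in zip(lats, lngs)]

def unpack_origin(origin):
    """Repeat the unpacking from the traceback: (origin_lat, origin_lng) = origin."""
    (origin_lat, origin_lng) = origin
    return origin_lat, origin_lng

# Hypothetical usage with the question's columns and the call from the manual:
# ant_points = to_coordinate_pairs(file['Ant_lat'], file['Ant_long'])
# base_points = to_coordinate_pairs(file['Base_lat'], file['Base_long'])
# for ant, base in zip(ant_points, base_points):
#     dist = radar.route.distance(origin=ant, destination=base, modes='transit')
```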
Your error is occurring in this part:
Distance = radar.route.distance(Starts, End, modes='transit')
(origin_lat, origin_lng) = origin
First of all, check how many values origin actually contains; I guess it does not match the two that the code expects.

TypeError: '>=' not supported between instances of 'tuple' and 'float'

I am trying to write a little python script that calculates the Tanimoto similarity index between a molecule of interest and a database of molecules. I am using pybel.
The database, in .smi format, has the chemical information (SMILES) of each molecule in the first column and its name in the second, and looks like this:
C[C@]12CC[C@H](C1(C)C)CC2=O (-)-CAMPHOR
CC1=CC[C@H](C(=C)C)C[C@@H]1O (-)-CARVEOL
CC1=CC[C@H](CC1=O)C(=C)C (-)-CARVONE
O=CC[C@@H](C)CCC=C(C)C (-)-CITRONELLAL
OCC[C@@H](C)CCC=C(C)C (-)-CITRONELLOL
C[C@@H]1CC[C@@H](C(=C)C)C[C@H]1O (-)-DIHYDROCARVEOL
C[C@@]12CC[C@@H](C1)C(C2=O)(C)C (-)-Fenchone
C[C@@H]1CC[C@H]([C@@H](C1)O)C(C)C (-)-MENTHOL
C[C@@H]1CC[C@H](C(=O)C1)C(C)C (-)-MENTHONE
C[C@@H]1CCCCCCCCCCCCC(=O)C1 (-)-MUSCONE
CC(=C)[C@H]1CCC(=CC1)C=O (-)-PERILLALDEHYDE
.
.
.
This version of the script works as I expect:
from openbabel import pybel
targetmol = next(pybel.readfile("smi", "/path/to/sample.smi"))
targetfp = targetmol.calcfp()  # calculate fingerprints of the sample
for mol in pybel.readfile("smi", "/path/to/db.smi"):
    fp = mol.calcfp()  # calculate fingerprints of the db
    tan = fp | targetfp  # calculate the Tanimoto index via the "|" operator
    if tan>=0.8:
        print(tan)
Output:
1.0
1.0
0.9285714285714286
0.8571428571428571
1.0
1.0
0.9285714285714286
0.8571428571428571
.
.
.
Clearly, in order to give meaning to the numbers I receive, I need to pair each Tanimoto index with the corresponding molecule name. I tried this:
from openbabel import pybel
targetmol = next(pybel.readfile("smi", "/path/to/sample.smi"))
targetfp = targetmol.calcfp()
for mol in pybel.readfile("smi", "/path/to/db.smi"):
    fp = mol.calcfp()
    tan = (fp | targetfp, mol.title)
    if tan>=0.8:
        print(tan, title)
As the title says, I receive the following error:
Traceback (most recent call last):
File "test3.py", line 15, in <module>
if tan>=0.8:
TypeError: '>=' not supported between instances of 'tuple' and 'float'
My guess is that Python cannot apply the if tan>=0.8 comparison to a string, but I really do not know how to overcome this problem since, as you can guess, I am very new to programming.
Any hints on how to correct this piece of code will be appreciated. Thank you for your time.
You just have to change the condition to:
if tan[0] >= 0.8:
The comma inside tan = (fp | targetfp, mol.title) is the syntax for a tuple, which is basically an immutable array, so you access its elements by index, as with lists: tan[0] is the Tanimoto score and tan[1] is the molecule name.
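Alternatively, you can keep the score and the name in separate variables so the comparison is done on the float alone. Note also that print(tan, title) in the question would then raise a NameError, because title is never defined; mol.title is what holds the name. A minimal sketch with plain (score, name) pairs standing in for the pybel results; report_hits is a hypothetical helper, and the pybel loop appears only in comments:

```python
def report_hits(scored, threshold=0.8):
    """Return the (score, name) pairs whose score meets the threshold."""
    hits = []
    for score, name in scored:      # unpack each tuple into its two parts
        if score >= threshold:      # compare the float, not the whole tuple
            hits.append((score, name))
    return hits

# The same idea inside the pybel loop from the question:
# for mol in pybel.readfile("smi", "/path/to/db.smi"):
#     score = mol.calcfp() | targetfp
#     if score >= 0.8:
#         print(score, mol.title)
```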

How to perform Mann-Whitney U test in python with cycle?

I have a loop that gives new values k1 and k2 each time, but the problem is that in my dataset there are cases where all values in both k1 and k2 are zero. When the program reaches them, it throws an error and does not complete the loop, although there are still many calculations left. How can I make such cases just be marked, as NA or something similar, and have the loop continue?
import pandas
from scipy.stats import mannwhitneyu
print(mannwhitneyu(k1, k2))
I run this Mann-Whitney U test on different observations, and I want the loop not to stop at an error but simply to note that the test is not possible there.
Error example (the first two lines are normal output; the third call fails):
MannwhitneyuResult(statistic=3240.0, pvalue=0.16166098643677973)
MannwhitneyuResult(statistic=2958.5, pvalue=0.008850960706454409)
Traceback (most recent call last):
File "ars1", line 95, in <module>
print(mannwhitneyu(k1, k2))
File "/storage/software/python-3.6.0/lib/python3.6/site-packages/scipy/stats/stats.py", line 4883, in mannwhitneyu
raise ValueError('All numbers are identical in mannwhitneyu')
ValueError: All numbers are identical in mannwhitneyu
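You can skip to the next iteration if the two arrays are equal. For instance, if:
k1 = [0,0,0,0,0]
k2 = [0,0,0,0,0]
then check whether k1 == k2 and, if it is true, just use continue in your loop, like this:
if k1 == k2: continue
just before you call mannwhitneyu(k1, k2). This gives a single True/False for plain Python lists; for NumPy arrays or pandas Series, k1 == k2 compares element by element, so use list(k1) == list(k2) instead.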
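Checking k1 == k2 only skips pairs that are identical to each other; the error in the traceback is about all numbers being identical, which can also happen when, for example, k1 = [0, 0, 0] and k2 = [0, 0]. A sketch that records NaN (playing the role of NA) for such cases and keeps looping, assuming it is acceptable to treat any ValueError from mannwhitneyu as "test not possible". safe_mwu_pvalue is a hypothetical helper, and the pairs list stands in for the k1/k2 values your loop produces.

```python
import math
from scipy.stats import mannwhitneyu

def safe_mwu_pvalue(k1, k2):
    """Return the Mann-Whitney U p-value, or NaN when the test cannot be run."""
    values = list(k1) + list(k2)
    # Every observation identical (e.g. all zeros): the SciPy version in the
    # traceback raises "All numbers are identical in mannwhitneyu", so record NaN.
    if len(set(values)) <= 1:
        return math.nan
    try:
        return float(mannwhitneyu(k1, k2).pvalue)
    except ValueError:
        # Any other input SciPy rejects is also recorded as missing.
        return math.nan

# Hypothetical (k1, k2) pairs standing in for the values your loop produces.
pairs = [
    ([1, 2, 3, 4, 5], [6, 7, 8, 9, 10]),
    ([0, 0, 0, 0, 0], [0, 0, 0, 0, 0]),
]
results = [safe_mwu_pvalue(k1, k2) for k1, k2 in pairs]
print(results)  # the second entry is nan, and the loop did not stop
```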
I tried it in a loop and also saved the results to CSV files in a folder. Convert your Series to lists and check them for equality; that works:
from scipy import stats
import pandas as pd
for y in df.columns:
    target = df[y]
    list_mann_white = []
    for x in df.columns:
        if list(target) == list(df[x]):
            # identical columns: skip the comparison
            pass
        else:
            list_mann_white.append([stats.mannwhitneyu(target, df[x])[1], x])
    list_mann_white.sort()
    # save the sorted [p-value, column] pairs for this target (the Mann/ folder must exist)
    mann_csv = pd.DataFrame(list_mann_white)
    mann_csv.to_csv('Mann/target_{}.csv'.format(y))

Converting speeds in my code

I had written some code beforehand, but my teacher only just told me that the speed is supposed to be in mph rather than m/s. I made the necessary changes, but I keep receiving an error. The context of the code isn't important.
import re
# DATA
distance = 0.06 # Distance between the Camera A and B; 0.06 = 600 metres
speed_limit = 20 # (meters per second)
number_plates = ["DV61 GGB", #UK
"D31 EG 2A", #F
"5314 10A02", #F
"24TEG 5063", #F
"TR09 TRE", #UK
"524 WAL 75", #F
"TR44 VCZ", #UK
"FR52 SWD", #UK
"100 GBS 12", #F
"HG55 BPO" #UK
]
enter = [7.12,7.17,7.22,7.12,7.23,7.41,7.18,7.25,7.11,7.38]
leave = [7.56,7.39,7.49,7.56,7.45,7.57,7.22,7.31,7.59,7.47]
mph=2.236936
# Find the non-UK plates
pattern = "(?![A-Z]{2}\d{2}\s+[A-Z]{3}$)"
foreign_numbers = list(filter(lambda x: re.match(pattern, x), number_plates))
# Calculations for speed
elapsed = [(l - e)/100 for l, e in zip(leave, enter)]
speed_mps = [distance/t for t in elapsed]
def mps_to_mph():
    speed = [s*h for s,h in zip(speed_mps,mph)]
mps_to_mph()
print(speed)
The error:
>>>
Traceback (most recent call last):
File "M:\ICT Coursework\Task 2.1.py", line 35, in <module>
mps_to_mph()
File "M:\ICT Coursework\Task 2.1.py", line 33, in mps_to_mph
speed = [s*h for s,h in zip(speed_mps,mph)]
TypeError: zip argument #2 must support iteration
Is speed = [s*h for s,h in zip(speed_mps,mph)] perhaps not the right way to convert the speed?
zip is for iterating through two (or more) sequences in parallel. You are passing it one sequence and one number. I think you mean this:
speed = [s*mph for s in speed_mps]
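There is a second problem in the question's code: speed is assigned only inside mps_to_mph, and the function returns nothing, so print(speed) at module level would still raise a NameError after the zip fix. A minimal sketch that returns the converted list instead; MPH_PER_MPS is a hypothetical name for the question's mph factor (1 m/s is about 2.236936 mph).

```python
MPH_PER_MPS = 2.236936  # 1 metre per second is about 2.236936 miles per hour

def mps_to_mph(speeds_mps):
    """Convert a list of speeds from m/s to mph and return the new list."""
    return [s * MPH_PER_MPS for s in speeds_mps]

# Usage with the question's variables:
# speed = mps_to_mph(speed_mps)
# print(speed)
```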

Getting TypeError: can't multiply sequence by non-int of type 'function'

I keep getting this error:
Traceback (most recent call last):
File "C:\Users\Andrew\Desktop\lab10.py", line 66, in <module>
main()
File "C:\Users\Andrew\Desktop\lab10.py", line 55, in main
volumeRectangle=VR(length,width,height)
File "C:\Users\Andrew\Desktop\lab10.py", line 20, in VR
volume=length*width*height
TypeError: can't multiply sequence by non-int of type 'function'
Code
import math
def AC(radius):
    area = math.pi * radius ** 2
    return area
def AR(length,width):
    area=length*width
    return area
def VC(radius,height):
    volume=math.pi*radius*radius*height
    return volume
def VR(length,width,height):
    volume=length*width*height
    return volume
# WRITE ALL THE OTHER FUNCTIONS
def main():
    inFile = open("lab10.input","r")
    # get calculation type, but wait on dimension(s)
    type = (inFile.readline()).strip()
    while (type != "###"):
        if (type == "AC"):
            radius = eval(inFile.readline())
            circleArea = AC(radius)
            print(format("Area of a Circle","30s"),format(circleArea,"15.2f"))
        if (type=='AR'):
            length=eval(inFile.readline())
            width=eval(inFile.readline())
            rsArea=ARR(length,width)
            print(format("Area of a Rectangle or Square",'30s'),format(rsArea,'15.2f'))
        if (type=='VC'):
            radius=eval(inFile.readline())
            height=eval(inFile.readline())
            volumeCylinder=VC(radius,height)
            print(format("Volume of a Cylinder",'30s'),format(volumeCylinder,'15.2f'))
        if (type=='VR'):
            length=eval(inFile.readline())
            width=eval(inFile.readline())
            height=eval(inFile.readline())
            volumeRectangle=VR(length,width,height)
            print(format("Volume of a Rectangle",'30s'),format(volumeRectangle,'15.2f'))
        # do the processing for all other types of calculations
        # get calculation type, but wait on dimension(s)
        type = (inFile.readline()).strip()
main()
This is what the input file looks like:
AC
7.5
SAC
4
VR
2, 3, 4.1
AR
13, 3.25
SAS
24
###
0
This seems to work. As Paul said, inputs on the same line were getting messed up, so read the whole line once and unpack it:
while (type != "###"):
    if (type == "AC"):
        radius = eval(inFile.readline())
        circleArea = AC(radius)
        print(format("Area of a Circle","30s"),format(circleArea,"15.2f"))
    if (type=='AR'):
        length, width=eval(inFile.readline().strip())
        rsArea=AR(length,width)
        print(format("Area of a Rectangle or Square",'30s'),format(rsArea,'15.2f'))
    if (type=='VC'):
        radius, height=eval(inFile.readline().strip())
        volumeCylinder=VC(radius,height)
        print(format("Volume of a Cylinder",'30s'),format(volumeCylinder,'15.2f'))
    if (type=='VR'):
        length, width, height =eval(inFile.readline().strip())
        volumeRectangle=VR(length,width,height)
        print(format("Volume of a Rectangle",'30s'),format(volumeRectangle,'15.2f'))
    # read the next calculation type so the loop advances
    type = (inFile.readline()).strip()
You are running into issues when you hit the VR command. This is because it has multiple input parameters. You 'got away with it' in the first command (AC) as it has only a single parameter and eval turned it into a single float value.
For VR there are 3 parameters, and you are trying to set each of them by reading a line from the file (that's 3 lines read). But all your parameters are actually together on a single line. So you may think you are reading the parameters 2, 3 and 4.1, but you are actually reading these three lines:
2, 3, 4.1
AR
13, 3.25
When you eval these, you get tuples for the 1st and 3rd lines, and since AR is a function you defined, eval('AR') returns that function. Multiplying a tuple by a function is exactly what the exception message complains about. This is one of the hazards of using eval: it can lead to unexpected results.
So your VR call looks like
VR((2, 3, 4.1), AR, (13, 3.25))
Which is doubtless not what you expected.
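The failure can be reproduced in isolation. A small sketch, assuming the three lines the VR branch actually reads; eval is given an explicit namespace containing only AR so that 'AR' resolves to the function, as it does in your script.

```python
def AR(length, width):
    return length * width

def VR(length, width, height):
    return length * width * height

# The three lines the VR branch reads, one per readline() call.
lines = ["2, 3, 4.1", "AR", "13, 3.25"]
namespace = {"AR": AR}  # stands in for the script's global names
length, width, height = [eval(line, namespace) for line in lines]

print(type(length), type(width), type(height))  # <class 'tuple'> <class 'function'> <class 'tuple'>
try:
    VR(length, width, height)
except TypeError as exc:
    print("TypeError:", exc)  # can't multiply sequence by non-int of type 'function'
```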
There are two ways you could resolve it. I've abbreviated your code for clarity; you'd obviously need to replicate the change in the other commands.
Stick with the tuples, but parse them with ast.literal_eval instead of eval:
import math
from ast import literal_eval
def VR(length,width,height):
    return length * width * height
def main():
    with open("lab10.input","r") as inFile:
        acttype = (inFile.readline()).strip()
        while (acttype != "###"):
            if (acttype=='VR'):
                params = literal_eval(inFile.readline())
                volumeRectangle = VR(*params)
                print(format("Volume of a Rectangle",'30s'),format(volumeRectangle,'15.2f'))
            acttype = (inFile.readline()).strip()
main()
Or avoid eval entirely by splitting the line on commas:
import math
def VR(length,width,height):
    return length * width * height
def main():
    with open("lab10.input","r") as inFile:
        acttype = (inFile.readline()).strip()
        while (acttype != "###"):
            if (acttype=='VR'):
                params = map(float, inFile.readline().split(','))
                volumeRectangle = VR(*params)
                print(format("Volume of a Rectangle",'30s'),format(volumeRectangle,'15.2f'))
            acttype = (inFile.readline()).strip()
main()
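If you want the split approach to cover all four calculations without an if block per type, one option is a dictionary that maps each code to its label and function. This is a sketch under a few assumptions: every command is followed by exactly one parameter line of comma-separated numbers, unknown codes such as SAC and SAS are skipped together with their parameter line, and COMMANDS and process are hypothetical names.

```python
import math

def AC(radius):
    return math.pi * radius ** 2

def AR(length, width):
    return length * width

def VC(radius, height):
    return math.pi * radius * radius * height

def VR(length, width, height):
    return length * width * height

# Hypothetical dispatch table: command code -> (label, function).
COMMANDS = {
    "AC": ("Area of a Circle", AC),
    "AR": ("Area of a Rectangle or Square", AR),
    "VC": ("Volume of a Cylinder", VC),
    "VR": ("Volume of a Rectangle", VR),
}

def process(lines):
    """Yield (label, result) for each known command until the ### marker."""
    it = iter(lines)
    for raw in it:
        code = raw.strip()
        if code == "###":
            break
        params_line = next(it, "")       # the parameter line that follows the code
        if code not in COMMANDS:
            continue                     # e.g. SAC/SAS: not implemented, skip it
        label, func = COMMANDS[code]
        params = [float(p) for p in params_line.split(",")]
        yield label, func(*params)

# Usage with the real input file:
# with open("lab10.input") as f:
#     for label, value in process(f):
#         print(format(label, "30s"), format(value, "15.2f"))
```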
