I had already written some code, but my teacher has just told me that the speed is supposed to be in mph rather than m/s. I made the necessary changes, but I keep getting an error. The context of the code isn't important.
import re
# DATA
distance = 0.06 # Distance between the Camera A and B; 0.06 = 600 metres
speed_limit = 20 # (meters per second)
number_plates = ["DV61 GGB", #UK
"D31 EG 2A", #F
"5314 10A02", #F
"24TEG 5063", #F
"TR09 TRE", #UK
"524 WAL 75", #F
"TR44 VCZ", #UK
"FR52 SWD", #UK
"100 GBS 12", #F
"HG55 BPO" #UK
]
enter = [7.12,7.17,7.22,7.12,7.23,7.41,7.18,7.25,7.11,7.38]
leave = [7.56,7.39,7.49,7.56,7.45,7.57,7.22,7.31,7.59,7.47]
mph=2.236936
# Find the non-UK plates
pattern = "(?![A-Z]{2}\d{2}\s+[A-Z]{3}$)"
foreign_numbers = list(filter(lambda x: re.match(pattern, x), number_plates))
# Calculations for speed
elapsed = [(l - e)/100 for l, e in zip(leave, enter)]
speed_mps = [distance/t for t in elapsed]
def mps_to_mph():
    speed = [s*h for s,h in zip(speed_mps,mph)]
mps_to_mph()
print(speed)
The error:
>>>
Traceback (most recent call last):
File "M:\ICT Coursework\Task 2.1.py", line 35, in <module>
mps_to_mph()
File "M:\ICT Coursework\Task 2.1.py", line 33, in mps_to_mph
speed = [s*h for s,h in zip(speed_mps,mph)]
TypeError: zip argument #2 must support iteration
Is speed = [s*h for s,h in zip(speed_mps,mph)] perhaps not the right way to convert the speed?
zip is for iterating through two (or more) sequences in parallel. You are passing it one sequence and one number. I think you mean this:
speed = [s*mph for s in speed_mps]
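A minimal sketch of the whole conversion, using made-up speeds in place of the values computed in the question. Note that the function in the question never returns its result, so print(speed) would fail with a NameError even after the zip fix; returning the list (an assumption about what was intended) avoids that. The names MPH_PER_MPS and speed_mph are illustrative only.

```python
MPH_PER_MPS = 2.236936  # same factor as `mph` in the question

def mps_to_mph(speeds_mps):
    """Convert a list of speeds from metres per second to miles per hour."""
    return [s * MPH_PER_MPS for s in speeds_mps]

# Hypothetical sample speeds in m/s, not the question's data.
speed_mps = [10.0, 20.0, 13.5]
speed_mph = mps_to_mph(speed_mps)
print(speed_mph)
```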
Function I tried to replicate:
I'm doing a coursework project in which I need to implement the blackbody function and manipulate it in a few ways.
I'm trying out alternate forms of the equation, and in two of them I keep getting an overflow error.
This is the error message:
alt2_2 = (1/((const_e**(freq/temp))-1))
OverflowError: (34, 'Result too large')
temp is given in kelvin (I'm using 5800 as my test value, as it is approximately the temperature of the sun).
freq is the speed of light divided by whatever wavelength is input:
freq = (3*(10**8))/wavelength
In this case I am using 0.00000005 as the test value for wavelength,
and const_e is 2.7182.
This is my first time using Stack, and also my first time doing a project on my own, so any help is appreciated.
This does the blackbody computation with your values.
import math
# Planck constant
h = 6.6e-34
# Boltzmann constant
k = 1.38e-23
# Speed of light
c = 3e+8
# Wavelength
wl = 0.00000005
# Temp
T = 5800
# Frequency
f = c/wl
# This is the exponent for e (about 49).
k1 = h*f / (k*T)
# This computes the spectral radiance.
Bvh = 2*f*f*f*h / (math.exp(k1)-1)
print(Bvh)
Output:
9.293819741690355e-08
Since we only used one or two digits on the way in, the resulting value is only good to one or two digits, 9.3E-08.
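The overflow in the question most likely comes from the exponent itself: freq/temp leaves out the Planck and Boltzmann constants, so it is about 1e12 instead of about 49. The sketch below compares the two exponents using the same rounded constants as above. The use of math.expm1 and the variable names are my own choices, not something from the question.

```python
import math

h = 6.6e-34   # Planck constant (J s), rounded as in the answer
k = 1.38e-23  # Boltzmann constant (J/K), rounded as in the answer
c = 3e8       # speed of light (m/s)

wavelength = 0.00000005  # metres, the question's test value
temp = 5800              # kelvin
freq = c / wavelength    # 6e15 Hz

# Exponent as written in the question: no h or k, so it is about 1e12.
bad_exponent = freq / temp
try:
    2.7182 ** bad_exponent
    overflowed = False
except OverflowError:
    overflowed = True
print("freq/temp =", bad_exponent, "overflows:", overflowed)

# Dimensionless exponent h*f/(k*T), about 49, well within float range.
good_exponent = h * freq / (k * temp)
# math.expm1(x) computes e**x - 1 without losing precision for small x.
occupation = 1 / math.expm1(good_exponent)
print("h*f/(k*T) =", good_exponent, "1/(e^x - 1) =", occupation)
```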
I would like to perform a Student's t-test on the following two samples, with the null hypothesis that they have the same mean:
$cat data1.txt
2
3
4
5
5
5
5
2
$cat data2.txt
4
7
9
10
8
7
3
I got the idea and a script to perform t-test from https://machinelearningmastery.com/how-to-code-the-students-t-test-from-scratch-in-python/
My script is:
$cat ttest.py
from math import sqrt
from numpy import mean
from scipy.stats import sem
from scipy.stats import t
def independent_ttest(data1, data2, alpha):
    # calculate means
    mean1, mean2 = mean(data1), mean(data2)
    # calculate standard errors
    se1, se2 = sem(data1), sem(data2)
    # standard error on the difference between the samples
    sed = sqrt(se1**2.0 + se2**2.0)
    # calculate the t statistic
    t_stat = (mean1 - mean2) / sed
    # degrees of freedom
    df = len(data1) + len(data2) - 2
    # calculate the critical value
    cv = t.ppf(1.0 - alpha, df)
    # calculate the p-value
    p = (1.0 - t.cdf(abs(t_stat), df)) * 2.0
    # return everything
    return t_stat, df, cv, p
data1 = open('data1.txt')
data2 = open('data2.txt')
alpha = 0.05
t_stat, df, cv, p = independent_ttest(data1, data2, alpha)
print('t=%.3f, df=%d, cv=%.3f, p=%.3f' % (t_stat, df, cv, p))
# interpret via critical value
if abs(t_stat) <= cv:
    print('Accept null hypothesis that the means are equal.')
else:
    print('Reject the null hypothesis that the means are equal.')
# interpret via p-value
if p > alpha:
    print('Accept null hypothesis that the means are equal.')
else:
    print('Reject the null hypothesis that the means are equal.')
When I run this script with python3 ttest.py, I get the following error. I think I need to change the print statement, but I haven't been able to work out how.
Traceback (most recent call last):
File "t-test.py", line 28, in <module>
t_stat, df, cv, p = independent_ttest(data1, data2, alpha)
File "t-test.py", line 10, in independent_ttest
mean1, mean2 = mean(data1), mean(data2)
File "/home/kay/.local/lib/python3.5/site-packages/numpy/core/fromnumeric.py", line 3118, in mean
out=out, **kwargs)
File "/home/kay/.local/lib/python3.5/site-packages/numpy/core/_methods.py", line 87, in _mean
ret = ret / rcount
TypeError: unsupported operand type(s) for /: '_io.TextIOWrapper' and 'int'
So your issue is that you are opening the files but not reading the data from the file (or converting it to a list). Basically, opening a file just prepares the file to be read by Python - you need to read it separately.
Also as a quick sidenote, make sure to close the file when you are done or else you could run into issues if you run the code multiple times in quick succession. The code below should work for your needs, just replace the calls to open with this code, replacing file names and other details as needed. The array here is the data you are looking for to pass to independent_ttest.
array = []
with open("test1.txt") as file:
    while value:=file.readline():
        array.append(int(value))
print(array)
We open our file using with to make sure it is closed at the end.
Then we use a while loop to read each line. The := assigns each line to value as they are looped through.
Finally, for each value we convert it from string to int and then append it to our list.
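An alternative sketch of the same idea, wrapped in a helper so that it can be reused for both files. The helper name read_numbers is hypothetical, and the temporary file only stands in for data1.txt so that the example runs on its own. It iterates over the file directly and skips blank lines, which avoids a conversion error on a trailing empty line.

```python
import os
import tempfile

def read_numbers(path):
    """Return the numbers in a one-value-per-line text file as a list of floats."""
    with open(path) as handle:
        # Skip blank lines so a trailing newline does not break float().
        return [float(line) for line in handle if line.strip()]

# Temporary file holding the data1.txt values from the question.
with tempfile.NamedTemporaryFile("w", suffix=".txt", delete=False) as tmp:
    tmp.write("2\n3\n4\n5\n5\n5\n5\n2\n")
    tmp_path = tmp.name

data1 = read_numbers(tmp_path)
os.remove(tmp_path)
print(data1)
```

With a helper like this, the two open(...) calls in the question would become data1 = read_numbers('data1.txt') and data2 = read_numbers('data2.txt') before calling independent_ttest.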
Hope this helped! Let me know if you have any questions!
I am trying to explore concurrency in Python. I saw a couple of posts about how to speed up processing by splitting the input data, processing the pieces separately, and joining the results afterwards. My task is to calculate the mean along the Z axis of a stack of rasters. I read the list of rasters from a text file and then create a stacked NumPy array with the data.
Then I wrote a simple function that takes the stacked array as input and calculates the mean. This task takes some minutes to complete, and I would like to process the NumPy array in chunks to speed up the script. However, when I do so using numpy.split (maybe it is not a good idea to split my 3D array), I get the following error:
Traceback (most recent call last):
File "C:\Users\~\AppData\Local\conda\conda\envs\geoprocessing\lib\site-packages\numpy\lib\shape_base.py", line 553, in split
len(indices_or_sections)
TypeError: object of type 'int' has no len()
During handling of the above exception, another exception occurred:
Traceback (most recent call last):
File "tf_calculation_numpy.py", line 69, in <module>
main()
File "tf_calculation_numpy.py", line 60, in main
subarrays = np.split(final_array, 4)
File "C:\Users\~\AppData\Local\conda\conda\envs\geoprocessing\lib\site-packages\numpy\lib\shape_base.py", line 559, in split
array split does not result in an equal division'
ValueError: array split does not result in an equal division
Code is:
import rasterio
import os
import numpy as np
import time
import concurrent.futures
def mean_py(array):
    print("Calculating mean of array")
    start_time = time.time()
    x = array.shape[1]
    y = array.shape[2]
    values = np.empty((x,y), type(array[0][0][0]))
    for i in range(x):
        for j in range(y):
            #no more need for append operations
            values[i][j] = ((np.mean(array[:, i, j])))
    end_time = time.time()
    hours, rem = divmod(end_time-start_time, 3600)
    minutes, seconds = divmod(rem, 60)
    print("{:0>2}:{:0>2}:{:05.2f}".format(int(hours),int(minutes),seconds))
    print(f"{'.'*80}")
    return values
def TF_mean(ndarray):
    sdir = r'G:\Mosaics\VH'
    final_array = np.asarray(ndarray)
    final_array = mean_py(final_array)
    out_name = (sdir + "/" + "MEAN_VH.tif")
    print(final_array.shape)
    with rasterio.open(out_name, "w", **profile) as dst:
        dst.write(final_array.astype('float32'), 1)
    print(out_name)
    print(f"\nDone!\n{'.'*80}")
def main():
    sdir = r'G:\Mosaics\VH'
    a = np.random.randn(250_000)
    b = np.random.randn(250_000)
    c = np.random.randn(250_000)
    e = np.random.randn(250_000)
    f = np.random.randn(250_000)
    g = np.random.randn(250_000)
    h = np.random.randn(250_000)
    arrays = [a, b, c, e, f, g, h]
    final_array = []
    for array in arrays:
        final_array.append(array)
        print(f"{array} added")
    print("Splitting nd-array!")
    final_array = np.asarray(final_array)
    subarrays = np.split(final_array, 4)
    with concurrent.futures.ProcessPoolExecutor() as executor:
        for subarray, mean in zip(subarrays, executor.map(TF_mean,subarrays)):
            print(f'Processing {subarray}')
if __name__ == '__main__':
    main()
I just expect to have four processes running in parallel and a way to obtain the four subarrays and write them out as a single GeoTIFF file.
The second exception is the important one here, in terms of describing the error: "array split does not result in an equal division"
final_array is a 2D array, with shape 7 by 250,000. numpy.split operates along an axis, defaulting to axis 0, so you just asked it to split a length seven axis into four equal parts. Obviously, this isn't possible, so it gives up.
To fix, you can:
1. Split more; you could just split in seven parts and process each separately. The executor is perfectly happy to do seven tasks, no matter how many workers you have; seven won't split evenly, so at the tail end of processing you'll likely have some workers idle while the rest finish up, but that's not the end of the world.
2. Split on a more fine-grained level. You could just flatten the array, e.g. final_array = final_array.reshape(final_array.size), which would make it a flat 1,750,000 element array, which can be split into four parts.
3. Split unevenly; instead of subarrays = np.split(final_array, 4) which requires axis 0 to be evenly splittable, do subarrays = np.split(final_array, (2,4,6)), which splits into three groups of two rows, plus one group with a single row.
There are many other options depending on your use case (e.g. split on axis=1 instead of the default axis=0), but those three are the least invasive (and #1 and #3 shouldn't change behavior meaningfully; #2 might, depending on whether the separation between 250K element blocks is meaningful).
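The uneven split can be sketched as follows. The array here is a small stand-in (7 rows of 10 values rather than 250,000) so that the example runs quickly. np.array_split is a further option not mentioned above: it behaves like np.split but tolerates a division that does not come out even.

```python
import numpy as np

# Small stand-in for the question's array: 7 rows, 10 columns.
final_array = np.arange(7 * 10, dtype=float).reshape(7, 10)

# np.split insists on equal parts, so 7 rows into 4 sections fails.
try:
    np.split(final_array, 4)
    split_failed = False
except ValueError:
    split_failed = True

# Explicit indices, as in option 3: rows [0:2], [2:4], [4:6], [6:7].
by_index = np.split(final_array, (2, 4, 6))

# np.array_split accepts an uneven division and picks the sizes itself.
uneven = np.array_split(final_array, 4)

print(split_failed)
print([part.shape for part in by_index])
print([part.shape for part in uneven])
```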
I'm profiling Node.js vs Python when reading a file (48 KB) synchronously.
Node.js code
var fs = require('fs');
var stime = new Date().getTime() / 1000;
for (var i=0; i<1000; i++){
    var content = fs.readFileSync('npm-debug.log');
}
console.log("Total time took is: " + ((new Date().getTime() / 1000) - stime));
Python Code
import time
stime = time.time()
for i in range(1000):
    with open('npm-debug.log', mode='r') as infile:
        ax = infile.read()
print("Total time is: " + str(time.time() - stime))
Timings are as follows:
$ python test.py
Total time is: 0.5195660591125488
$ node test.js
Total time took is: 0.25799989700317383
Where is the difference?
- In file I/O, or
- in Python's list data structure allocation?
Or am I not comparing apples to apples?
EDIT:
Updated python's readlines() to read() for a good comparison
Changed the iterations to 1000 from 500
PURPOSE:
To understand whether there is any truth in claims of the kind "Node.js is slower than Python, which is slower than C", and if so, where the slowness arises in this context.
readlines returns a list of lines in the file, so it has to read the data char by char, constantly comparing the current character to any of the newline characters, and keep composing a list of lines.
This is more complicated than simple file.read(), which would be the equivalent of what Node.js does.
Also, the length calculated by your Python script is the number of lines, while Node.js gets the number of characters.
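One further difference worth checking, offered as a possible factor rather than a measured cause: fs.readFileSync called without an encoding returns raw bytes (a Buffer), while Python's open(..., mode='r') decodes the bytes to str and handles newline translation. Opening the file with mode='rb' makes the Python side do more comparable work. A minimal sketch, using a temporary stand-in file of roughly 48 KB rather than npm-debug.log; the helper name time_reads is hypothetical.

```python
import os
import tempfile
import time

def time_reads(path, mode, repeats=1000):
    """Read the whole file `repeats` times; return (elapsed seconds, last content)."""
    start = time.perf_counter()
    for _ in range(repeats):
        with open(path, mode=mode) as infile:
            content = infile.read()
    return time.perf_counter() - start, content

# Roughly 48 KB of ASCII text as a stand-in for the log file.
with tempfile.NamedTemporaryFile("wb", delete=False) as tmp:
    tmp.write(b"npm info it worked if it ends with ok\n" * 1300)
    path = tmp.name

text_seconds, text = time_reads(path, "r")
bytes_seconds, raw = time_reads(path, "rb")
os.remove(path)

print(f"text mode:   {text_seconds:.4f} s, type {type(text).__name__}")
print(f"binary mode: {bytes_seconds:.4f} s, type {type(raw).__name__}")
```

The timings will vary by machine and Python version, so the sketch prints them rather than assuming which mode wins.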
If you want even more speed, use os.open instead of open:
import os, time
def Test_os(n):
    for x in range(n):
        f = os.open('Speed test.py', os.O_RDONLY)
        data = ""
        t = os.read(f, 1048576).decode('utf8')
        while t:
            data += t
            t = os.read(f, 1048576).decode('utf8')
        os.close(f)
def Test_open(n):
    for x in range(n):
        with open('Speed test.py') as f:
            data = f.read()
s = time.monotonic()
Test_os(500000)
print(time.monotonic() - s)
s = time.monotonic()
Test_open(500000)
print(time.monotonic() - s)
On my machine os.open is several seconds faster than open. The output is as follows:
53.68909174999999
58.12600833400029
As you can see, open is 4.4 seconds slower than os.open, although as the number of runs decreases, so does this difference.
Also, you should try tweaking the buffer size of the os.read function as different values may give very different timings:
Here 'operation' means a single call to Test_os.
If you get rid of bytes' decoding and use io.BytesIO instead of mere bytes objects, you'll get a considerable speedup:
import io
def Test_os(n, buf):
    for x in range(n):
        f = os.open('test.txt', os.O_RDONLY)
        data = io.BytesIO()
        # os.read returns b'' at end of file, so write() returns 0 and the loop stops
        while data.write(os.read(f, buf)):
            ...
        os.close(f)
Thus, the best result is now 0.038 seconds per call instead of 0.052 (~37% speedup).
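A rough sketch of how the buffer size could be varied, for anyone who wants to repeat the experiment. The file contents, the repeat count, and the three buffer sizes are arbitrary choices for illustration, and the helper name read_with_buffer is hypothetical; the timings it prints are not the ones quoted above.

```python
import io
import os
import tempfile
import time

def read_with_buffer(path, buf):
    """Read a whole file through os.read in chunks of at most `buf` bytes."""
    fd = os.open(path, os.O_RDONLY)
    data = io.BytesIO()
    try:
        # os.read returns b"" at end of file, so write() returns 0 and the loop ends.
        while data.write(os.read(fd, buf)):
            pass
    finally:
        os.close(fd)
    return data.getvalue()

with tempfile.NamedTemporaryFile("wb", delete=False) as tmp:
    tmp.write(b"x" * 200_000)  # arbitrary stand-in contents
    path = tmp.name

for buf in (4096, 65536, 1048576):  # buffer sizes chosen only for illustration
    start = time.perf_counter()
    for _ in range(200):
        content = read_with_buffer(path, buf)
    print(buf, f"{time.perf_counter() - start:.4f} s")

os.remove(path)
```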
I keep getting this error:
Traceback (most recent call last):
File "C:\Users\Andrew\Desktop\lab10.py", line 66, in <module>
main()
File "C:\Users\Andrew\Desktop\lab10.py", line 55, in main
volumeRectangle=VR(length,width,height)
File "C:\Users\Andrew\Desktop\lab10.py", line 20, in VR
volume=length*width*height
TypeError: can't multiply sequence by non-int of type 'function'
Code
import math
def AC(radius):
    area = math.pi * radius ** 2
    return area
def AR(length,width):
    area=length*width
    return area
def VC(radius,height):
    volume=math.pi*radius*radius*height
    return volume
def VR(length,width,height):
    volume=length*width*height
    return volume
# WRITE ALL THE OTHER FUNCTIONS
def main():
    inFile = open("lab10.input","r")
    # get calculation type, but wait on dimension(s)
    type = (inFile.readline()).strip()
    while (type != "###"):
        if (type == "AC"):
            radius = eval(inFile.readline())
            circleArea = AC(radius)
            print(format("Area of a Circle","30s"),format(circleArea,"15.2f"))
        if (type=='AR'):
            length=eval(inFile.readline())
            width=eval(inFile.readline())
            rsArea=ARR(length,width)
            print(format("Area of a Rectangle or Square",'30s'),format(rsArea,'15.2f'))
        if (type=='VC'):
            radius=eval(inFile.readline())
            height=eval(inFile.readline())
            volumeCylinder=VC(radius,height)
            print(format("Volume of a Cylinder",'30s'),format(volumeCylinder,'15.2f'))
        if (type=='VR'):
            length=eval(inFile.readline())
            width=eval(inFile.readline())
            height=eval(inFile.readline())
            volumeRectangle=VR(length,width,height)
            print(format("Volume of a Rectangle",'30s'),format(volumeRectangle,'15.2f'))
        # do the processing for all other types of calculations
        # get calculation type, but wait on dimension(s)
        type = (inFile.readline()).strip()
main()
This is what the input file looks like:
AC
7.5
SAC
4
VR
2, 3, 4.1
AR
13, 3.25
SAS
24
###
0
This seems to work. As Paul said, the inputs that share a line were getting mixed up.
while (type != "###"):
    if (type == "AC"):
        radius = eval(inFile.readline())
        circleArea = AC(radius)
        print(format("Area of a Circle","30s"),format(circleArea,"15.2f"))
    if (type=='AR'):
        length, width=eval(inFile.readline().strip())
        rsArea=AR(length,width)
        print(format("Area of a Rectangle or Square",'30s'),format(rsArea,'15.2f'))
    if (type=='VC'):
        radius, height=eval(inFile.readline().strip())
        volumeCylinder=VC(radius,height)
        print(format("Volume of a Cylinder",'30s'),format(volumeCylinder,'15.2f'))
    if (type=='VR'):
        length, width, height =eval(inFile.readline().strip())
        volumeRectangle=VR(length,width,height)
        print(format("Volume of a Rectangle",'30s'),format(volumeRectangle,'15.2f'))
You are running into issues when you hit the VR command. This is because it has multiple input parameters. You 'got away with it' in the first command (AC) as it has only a single parameter and eval turned it into a single float value.
For VR there are 3 parameters, and you are trying to set each of them by reading a separate line from the file (that's 3 lines read). But all your parameters are actually together on a single line. So if you thought you were reading the parameters
2, 3, 4.1
you are not; you are actually reading these three lines:
2, 3, 4.1
AR
13, 3.25
When you eval these you get tuples for the 1st and 3rd lines and since AR is a function you defined, eval('AR') returns you that function (which makes sense of the exception error message you received). This is one of the hazards of using eval, it can lead to unexpected results.
So your VR call looks like
VR((2, 3, 4.1), AR, (13, 3.25))
Which is doubtless not what you expected.
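The difference can be seen in a few lines. ast.literal_eval from the standard library only accepts literals (numbers, strings, tuples and so on), so it parses the parameter line but refuses the bare name. The AR function below is a stand-in for the one defined in the question.

```python
from ast import literal_eval

def AR(length, width):
    """Stand-in for the AR function defined in the question."""
    return length * width

# eval resolves the name AR to the function object defined above.
evaluated = eval("AR")
print(evaluated is AR)

# literal_eval only accepts literals such as numbers, strings, and tuples.
print(literal_eval("2, 3, 4.1"))

# A bare name is not a literal, so literal_eval raises ValueError.
try:
    literal_eval("AR")
    rejected = False
except ValueError:
    rejected = True
print(rejected)
```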
Two ways you could resolve it. I've abbreviated your code for clarity. You'd obviously need to replicate in the other commands.
Stick with the tuples and ast.literal_eval (use that instead of eval).
import math
from ast import literal_eval
def VR(length,width,height):
    return length * width * height
def main():
    with open("lab10.input","r") as inFile:
        acttype = (inFile.readline()).strip()
        while (acttype != "###"):
            if (acttype=='VR'):
                params = literal_eval(inFile.readline())
                volumeRectangle = VR(*params)
                print(format("Volume of a Rectangle",'30s'),format(volumeRectangle,'15.2f'))
            acttype = (inFile.readline()).strip()
main()
or avoid the eval by splitting the line on ,.
import math
def VR(length,width,height):
    return length * width * height
def main():
    with open("lab10.input","r") as inFile:
        acttype = (inFile.readline()).strip()
        while (acttype != "###"):
            if (acttype=='VR'):
                params = map(float, inFile.readline().split(','))
                volumeRectangle = VR(*params)
                print(format("Volume of a Rectangle",'30s'),format(volumeRectangle,'15.2f'))
            acttype = (inFile.readline()).strip()
main()
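One possible way to replicate the second approach across all commands is a lookup table, sketched below. The COMMANDS table and the process function are hypothetical names of my own. The sketch assumes every command is followed by exactly one comma-separated parameter line, and it skips commands that are not in the table (such as SAC and SAS). An in-memory copy of the input stands in for lab10.input.

```python
import io
import math

# Hypothetical table mapping each command to (function, label); only
# AC, AR, VC and VR from the question are included.
COMMANDS = {
    "AC": (lambda r: math.pi * r ** 2, "Area of a Circle"),
    "AR": (lambda l, w: l * w, "Area of a Rectangle or Square"),
    "VC": (lambda r, h: math.pi * r * r * h, "Volume of a Cylinder"),
    "VR": (lambda l, w, h: l * w * h, "Volume of a Rectangle"),
}

def process(lines):
    """Consume command/parameter line pairs from an iterator until '###'."""
    results = []
    for command in lines:
        command = command.strip()
        if command == "###":
            break
        # The parameter line always follows the command line.
        params = [float(p) for p in next(lines).split(",")]
        if command in COMMANDS:  # commands not in the table are skipped
            func, label = COMMANDS[command]
            results.append((label, func(*params)))
    return results

# In-memory copy of the question's input file.
sample = io.StringIO("AC\n7.5\nSAC\n4\nVR\n2, 3, 4.1\nAR\n13, 3.25\nSAS\n24\n###\n0\n")
results = process(sample)
for label, value in results:
    print(format(label, "30s"), format(value, "15.2f"))
```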