calculate equivalent width using python code - python

I have this Fortran program that computes the equivalent width of spectral lines.
I hope to find help writing Python code that implements the same algorithm (the input file contains two columns: wavelength and flux).
PARAMETER (N=195) ! N is the number of data
DO 10 I=1,N
DO I=2,N

Here's a fairly literal translation:
def main():
    N = 195  # number of data pairs
    x, y = [0 for i in range(N)], [0 for i in range(N)]
    with open('halpha.dat') as f:
        for i in range(N):
            x[i], y[i] = map(float, f.readline().split())
            print(x[i], y[i])
    sum = width(x, y, N)
    print(sum)

def width(x, y, N):
    sum = 0.0
    for i in range(1, N):
        sum = sum + (x[i-1] - x[i]) * ((1. - y[i-1]) + (1. - y[i]))
    sum = 0.5*abs(sum)
    return sum
However this would be a more idiomatic translation:
from math import fsum  # more accurate floating point sum of a series of terms

def main():
    with open('halpha.dat') as f:  # Read file into a list of tuples.
        pairs = [tuple(float(word) for word in line.split()) for line in f]
    for pair in pairs:
        print('{}, {}'.format(*pair))
    print(width(pairs))

def width(pairs):
    def term(prev, curr):
        return (prev[0] - curr[0]) * ((1. - prev[1]) + (1. - curr[1]))
    return 0.5 * abs(fsum(term(*pairs[i-1:i+1]) for i in range(1, len(pairs))))
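As a quick sanity check on either translation, the trapezoidal sum can be run on a tiny synthetic line profile. This is only a sketch: the function name equivalent_width and the five-point spectrum are made up for illustration, and the flux is assumed to be already normalised to a continuum of 1, as the Fortran original appears to assume.

```python
from math import fsum

def equivalent_width(wavelengths, fluxes):
    """Trapezoidal integral of (1 - flux) over wavelength.

    Assumes the flux is already normalised so that the continuum is 1.
    """
    points = list(zip(wavelengths, fluxes))
    terms = (
        (w1 - w0) * ((1.0 - f0) + (1.0 - f1))
        for (w0, f0), (w1, f1) in zip(points, points[1:])
    )
    return 0.5 * abs(fsum(terms))

# Hypothetical absorption line: a triangular dip of depth 0.5 centred at 2.0
wavelengths = [0.0, 1.0, 2.0, 3.0, 4.0]
fluxes = [1.0, 1.0, 0.5, 1.0, 1.0]
print(equivalent_width(wavelengths, fluxes))
```

The dip is a triangle with base 2 and depth 0.5, so the area it removes from the continuum, and therefore the equivalent width, is 0.5.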

I would suggest that a more natural way to do this in Python is to focus on the properties of the spectrum itself, and use your parameters in astropy's specutils.
In particular, the specutils.analysis documentation describes equivalent_width. For more general information, see the top-level specutils documentation and the specutils.analysis pages.
To use this package you need to create a Spectrum1D object whose first component is your wavelength axis and whose second is the flux. Details of how to create a Spectrum1D object are linked from the analysis page (at the end of the third line of the first paragraph).
It's a very powerful approach and has been developed by astronomers for astronomers.


Unique ordered ratio of integers

I have two ordered lists of consecutive integers m=0, 1, ... M and n=0, 1, 2, ... N. Each value of m has a probability pm, and each value of n has a probability pn. I am trying to find the ordered list of unique values r=m/n and their probabilities pr. I am aware that r is infinite if n=0 and is undefined if m=n=0.
In practice, I would like to run this with M and N each of the order of 2E4, meaning up to 4E8 values of r, which would mean about 3 GB of floats (assuming 8 bytes per float).
For this calculation, I have written the python code below.
The idea is to iterate over m and n, and for each new m/n, insert it in the right place with its probability if it isn't there yet, otherwise add its probability to the existing number. My assumption is that it is easier to sort things on the way instead of waiting until the end.
The cases related to 0 are added at the end of the loop.
I am using the Fraction class since we are dealing with fractions.
The code also tracks the multiplicity of each unique value of m/n.
I have tested up to M=N=100, and things are quite slow. Are there better approaches to the question, or more efficient ways to tackle the code?
M=N=30: 1 s
M=N=50: 6 s
M=N=80: 30 s
M=N=100: 82 s
import numpy as np
from fractions import Fraction
import time # For timing
start_time = time.time() # Timing
M, N = 6, 4
mList, nList = np.arange(1, M+1), np.arange(1, N+1) # From 1 to M inclusive, deal with 0 later
mProbList, nProbList = [1/(M+1)]*(M), [1/(N+1)]*(N) # Probabilities, here assumed equal (not general case)
# Deal with mn=0 later
pmZero, pnZero = 1/(M+1), 1/(N+1) # P(m=0) and P(n=0)
pNaN = pmZero * pnZero # P(0/0) = P(m=0)P(n=0)
pZero = pmZero * (1 - pnZero) # P(0) = P(m=0)P(n!=0)
pInf = pnZero * (1 - pmZero) # P(inf) = P(m!=0)P(n=0)
# Main list of r=m/n, P(r) and mult(r)
# Start with first line, m=1
rList = [Fraction(mList[0], n) for n in nList[::-1]] # Smallest first
rProbList = [mProbList[0] * nP for nP in nProbList[::-1]] # Start with first line
rMultList = [1] * len(rList) # Multiplicity of each element
# Main loop
for m, mP in zip(mList[1:], mProbList[1:]):
    for n, nP in zip(nList[::-1], nProbList[::-1]):  # Pick an n value
        r, rP, rMult = Fraction(m, n), mP*nP, 1
        for i in range(len(rList)-1):  # See where it fits in existing list
            if r < rList[i]:
                rList.insert(i, r)
                rProbList.insert(i, rP)
                rMultList.insert(i, 1)
                break
            elif r == rList[i]:
                rProbList[i] += rP
                rMultList[i] += 1
                break
            elif r < rList[i+1]:
                rList.insert(i+1, r)
                rProbList.insert(i+1, rP)
                rMultList.insert(i+1, 1)
                break
            elif r == rList[i+1]:
                rProbList[i+1] += rP
                rMultList[i+1] += 1
                break
        if r > rList[-1]:  # Larger than everything so far: goes at the end
            rList.append(r)
            rProbList.append(rP)
            rMultList.append(1)
# Deal with 0
rList.insert(0, Fraction(0, 1))
rProbList.insert(0, pZero)
rMultList.insert(0, N)
# Deal with infty
# Deal with undefined case
print(".... done in %s seconds." % round(time.time() - start_time, 2))
print("************** Final list\nr", 'Prob', 'Mult')
for r, rP, rM in zip(rList, rProbList, rMultList): print(r, rP, rM)
print("************** Checks")
print("mList", mList, 'nList', nList)
print("Sum of proba = ", np.sum(rProbList))
print("Sum of multi = ", np.sum(rMultList), "\t(M+1)*(N+1) = ", (M+1)*(N+1))
Based on the suggestion of @Prune, and on this thread about merging lists of tuples, I have modified the code as below. It's a lot easier to read, and runs about an order of magnitude faster for N=M=80 (I have omitted dealing with 0; it would be handled the same way as in the original post). I assume there may be ways to tweak the merge and the conversion back to lists further yet.
# Do calculations
data = [(Fraction(m, n), mProb(m) * nProb(n)) for n in range(1, N+1) for m in range(1, M+1)]
# Merge duplicates using a dictionary
d = {}
for r, p in data:
    if not (r in d): d[r] = [0, 0]
    d[r][0] += p
    d[r][1] += 1
# Convert back to lists
rList, rProbList, rMultList = [], [], []
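for k in sorted(d):  # sorted so the ratios come out in increasing order
    rList.append(k)
    rProbList.append(d[k][0])
    rMultList.append(d[k][1])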
I expect that "things are quite slow" because you've chosen a known inefficient sort. A single list insertion is O(K) (later list elements have to be bumped over, and there is added storage allocation on a regular basis). Thus a full-list insertion sort is O(K^2). For your notation, that is O((M*N)^2).
If you want any sort of reasonable performance, research and use the best-known methods. The most straightforward way to do this is to build your non-exception results with a simple list comprehension, and use the built-in sort for your penultimate list. Simply append your n=0 cases, and you're done in O(K log K) time.
In the expression below, I've assumed functions for m and n probabilities.
This is a notational convenience; you know how to directly compute them, and can substitute those expressions if you wish.
data = [(mProb(m) * nProb(n), Fraction(m, n))
        for n in range(1, N+1)
        for m in range(0, M+1)]
data.extend([
    # generate your "zero" cases here
])
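Putting the two ideas from this thread together (accumulate duplicates in a dictionary, then sort once at the end) might look like the sketch below. The function name ratio_distribution and the convention that m_probs[m] holds P(m) are assumptions made for illustration. Probabilities are given as Fraction objects here only so that the sums stay exact; floats would work the same way. The m=0 and n=0 cases are left out and would be appended separately, as discussed above.

```python
from collections import defaultdict
from fractions import Fraction

def ratio_distribution(m_probs, n_probs):
    """Return a sorted list of (r, P(r), multiplicity) for r = m/n with m, n >= 1.

    m_probs[m] is P(m) and n_probs[n] is P(n); index 0 is ignored here.
    """
    acc = defaultdict(lambda: [0, 0])  # r -> [probability, multiplicity]
    for m in range(1, len(m_probs)):
        for n in range(1, len(n_probs)):
            entry = acc[Fraction(m, n)]  # Fraction reduces 2/2 to 1, 2/4 to 1/2, ...
            entry[0] += m_probs[m] * n_probs[n]
            entry[1] += 1
    # One O(K log K) sort at the end instead of K list insertions
    return [(r, p, mult) for r, (p, mult) in sorted(acc.items())]

# Hypothetical example: M = N = 2 with uniform probabilities 1/3
uniform = [Fraction(1, 3)] * 3
for r, p, mult in ratio_distribution(uniform, uniform):
    print(r, p, mult)
```

Dictionary lookups are constant time on average, so the merge is linear in the number of (m, n) pairs and the final sort only touches the unique ratios.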

Exact math with big numbers in Python 3

I'm trying to implement a system for encryption similar to Shamir's Secret Sharing using Python. Essentially, I have code that will generate a list of points that can be used to find a password at the y-intercept of the line through these points. The password is a number in ASCII (using two digits per ASCII character), so it gets to be a pretty big number with longer passwords. For example, the password ThisIsAPassword will generate a list of points that looks like this:
x y
9556 66707086867915126140753213946756441607861037300900
4083 28502040182447127964404994111341362715565457349000
9684 67600608880657662915204624898507424633297513499300
9197 64201036847801292531159022293017356403707170463200
To be clear, these points are generated upon a randomly chosen slope (this is fine since it's the y-intercept that matters).
The problem arises in trying to make a program to decode a password. Using normal math, Python is unable to accurately find the password because of the size of the numbers. Here's the code I have:
def findYint(x,y):
    slope = (y[1] - y[0]) / (x[1] - x[0])
    yint = int(y[0] - slope * x[0])
    return yint

def asciiToString(num):
    chars = [num[i:i+3] for i in range(0, len(num), 3)]
    return ''.join(chr(int(i)) for i in chars)

def main():
    fi = open('pass.txt','r')
    x,y = [], []
    for i in fi:
        row = i.split()
        x.append(int(row[0]))
        y.append(int(row[1]))
    fi.close()
    yint = findYint(x,y)
    pword = asciiToString(str(yint))
    print(pword)
Output (with the password "ThisIsAPassword"):
͉)3 ǢΜĩũć»¢ǔ¼
Typically my code will work with shorter passwords such as "pass" or "word", but the bigger numbers presumably aren't computed with the exact accuracy needed to convert them into ASCII. Any solutions for using either precise math or something else?
Also here's the code for generating points in case it's important:
import random
def encryptWord(word):
numlist = []
for i in range(len(word)):
num = int("".join(numlist))
return num
def createPoints(pwd, pts):
    yint = pwd
    gradient = pwd*random.randint(10,100)
    xvals = []
    yvals = []
    for i in range(pts):
        n = random.randint(1000,10000)
        xvals.append(n)
        yvals.append(((n) * gradient) + pwd)
    return xvals, yvals

def main():
    pword = input("Enter a password to encrypt: ")
    pword = encryptWord(pword)
    numpoints = int(input("How many points to generate? "))
    if numpoints < 2:
        numpoints = 2
    xpts, ypts = createPoints(pword, numpoints)
    fi = open("pass.txt","w")
    for i in range(len(xpts)):
        fi.write(str(xpts[i]))
        fi.write(' ')
        fi.write(str(ypts[i]))
        fi.write('\n')
    fi.close()
    print("Sent to file (pass.txt)")
As you may know, Python's built-in int type can handle arbitrarily large integers, but the float type has limited precision. The only part of your code which deals with numbers that aren't ints seems to be this function:
def findYint(x,y):
    slope = (y[1] - y[0]) / (x[1] - x[0])
    yint = int(y[0] - slope * x[0])
    return yint
Here the division results in a float, even if the result would be exact as an int. Moreover, we can't safely do integer division here with the // operator, because slope will get multiplied by x[0] before the truncation is supposed to happen.
So either you need to do some algebra in order to get the same result using only ints, or you need to represent the fraction (y1 - y0) / (x1 - x0) with an exact non-integer number type instead of float. Fortunately, Python's standard library has a class named Fraction which will do what you want:
from fractions import Fraction

def findYint(x,y):
    slope = Fraction(y[1] - y[0], x[1] - x[0])
    yint = int(y[0] - slope * x[0])
    return yint
It should be possible to do this only with integer-based math:
def findYint(x,y):
    return (y[0] * (x[1] - x[0]) - (y[1] - y[0]) * x[0]) // (x[1] - x[0])
This way you avoid the floating point arithmetic and the precision constraints it has.
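To see the difference between the float, Fraction, and integer-only approaches side by side, here is a small self-contained sketch. The secret number, the gradient multiplier 37, and the function names are made up for illustration; the points are built the same way as in the question (y = n * gradient + secret).

```python
from fractions import Fraction

def y_intercept_float(x, y):
    # Original approach: true division produces a float and loses digits
    slope = (y[1] - y[0]) / (x[1] - x[0])
    return int(y[0] - slope * x[0])

def y_intercept_fraction(x, y):
    # Exact rational slope
    slope = Fraction(y[1] - y[0], x[1] - x[0])
    return int(y[0] - slope * x[0])

def y_intercept_int(x, y):
    # Same algebra with one common denominator, integers only
    dx = x[1] - x[0]
    return (y[0] * dx - (y[1] - y[0]) * x[0]) // dx

# Hypothetical 44-digit secret and two points on the line through it
secret = 84104105115073115065080097115115119111114100
gradient = secret * 37
xs = [9556, 4083]
ys = [n * gradient + secret for n in xs]

print("float    recovers secret:", y_intercept_float(xs, ys) == secret)
print("Fraction recovers secret:", y_intercept_fraction(xs, ys) == secret)
print("integer  recovers secret:", y_intercept_int(xs, ys) == secret)
```

A float carries only about 15 to 17 significant decimal digits, so a number of this size cannot survive the round trip, while the other two versions never leave exact arithmetic.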
Fractions, and rewriting for all integer math are good.
For truly large integers, you may find yourself wanting instead of the builtin int type. I've successfully used it for testing for large primes.
Or if you really do want numbers with a decimal point, maybe try decimal.Decimal("1") - just for example.

compute the infinity norm of the difference between the two solutions

In the following code I have been able to:
Implement Gaussian elimination with no pivoting for a general square linear system.
I have tested it by solving Ax=b, where A is a random 100x100 matrix and b is a random 100x1 vector.
I have compared my solution against the solution obtained using numpy.linalg.solve
However, in the final task I need to compute the infinity norm of the difference between the two solutions. I know the infinity norm of a matrix is the greatest absolute row sum. But how do I compute the infinity norm of the difference between my solution and the one from numpy.linalg.solve? Looking for some help with this!
import numpy as np
def GENP(A, b):
    '''
    Gaussian elimination with no pivoting.
    % input: A is an n x n nonsingular matrix
    %        b is an n x 1 vector
    % output: x is the solution of Ax=b.
    % post-condition: A and b have been modified.
    '''
    n = len(A)
    if b.size != n:
        raise ValueError("Invalid argument: incompatible sizes between A & b.", b.size, n)
    for pivot_row in range(n-1):
        for row in range(pivot_row+1, n):
            multiplier = A[row][pivot_row]/A[pivot_row][pivot_row]
            #the only one in this column since the rest are zero
            A[row][pivot_row] = multiplier
            for col in range(pivot_row + 1, n):
                A[row][col] = A[row][col] - multiplier*A[pivot_row][col]
            #Equation solution column
            b[row] = b[row] - multiplier*b[pivot_row]
    # Back substitution
    x = np.zeros(n)
    k = n-1
    x[k] = b[k]/A[k,k]
    while k >= 0:
        x[k] = (b[k] - np.dot(A[k,k+1:],x[k+1:]))/A[k,k]
        k = k-1
    return x

if __name__ == "__main__":
    A = np.round(np.random.rand(100, 100)*10)
    b = np.round(np.random.rand(100)*10)
    print (GENP(np.copy(A), np.copy(b)))
for example this code gives the following output for task 1 listed above:
[-6.61537666 0.95704368 1.30101768 -3.69577873 -2.51427519 -4.56927017
-1.61201589 2.88242622 1.67836096 2.18145556 2.60831672 0.08055869
-2.39347903 2.19672137 -0.91609732 -1.17994959 -3.87309152 -2.53330865
5.97476318 3.74687301 5.38585146 -2.71597978 2.0034079 -0.35045844
0.43988439 -2.2623829 -1.82137544 3.20545721 -4.98871738 -6.94378666
-6.5076601 3.28448129 3.42318453 -1.63900434 4.70352047 -4.12289961
-0.79514656 3.09744616 2.96397264 2.60408589 2.38707091 8.72909353
-1.33584905 1.30879264 -0.28008339 0.93560728 -1.40591226 1.31004142
-1.43422946 0.41875924 3.28412668 3.82169545 1.96675247 2.76094378
-0.90069455 1.3641636 -0.60520103 3.4814196 -1.43076816 5.01222382
0.19160657 2.23163261 2.42183726 -0.52941262 -7.35597457 -3.41685057
-0.24359225 -5.33856181 -1.41741354 -0.35654736 -1.71158503 -2.24469314
-3.26453092 1.0932765 1.58333208 0.15567584 0.02793548 1.59561909
0.31732915 -1.00695954 3.41663177 -4.06869021 3.74388762 -0.82868155
1.49789582 -1.63559124 0.2741194 -1.11709237 1.97177449 0.66410154
0.48397714 -1.96241854 0.34975886 1.3317751 2.25763568 -6.80055066
-0.65903682 -1.07105965 -0.40211347 -0.30507635]
then for task two my code gives the following:
my_solution = GENP(np.copy(A), np.copy(b))
numpy_solution = np.linalg.solve(A, b)
resulting in:
[-6.61537666 0.95704368 1.30101768 -3.69577873 -2.51427519 -4.56927017
-1.61201589 2.88242622 1.67836096 2.18145556 2.60831672 0.08055869
-2.39347903 2.19672137 -0.91609732 -1.17994959 -3.87309152 -2.53330865
5.97476318 3.74687301 5.38585146 -2.71597978 2.0034079 -0.35045844
0.43988439 -2.2623829 -1.82137544 3.20545721 -4.98871738 -6.94378666
-6.5076601 3.28448129 3.42318453 -1.63900434 4.70352047 -4.12289961
-0.79514656 3.09744616 2.96397264 2.60408589 2.38707091 8.72909353
-1.33584905 1.30879264 -0.28008339 0.93560728 -1.40591226 1.31004142
-1.43422946 0.41875924 3.28412668 3.82169545 1.96675247 2.76094378
-0.90069455 1.3641636 -0.60520103 3.4814196 -1.43076816 5.01222382
0.19160657 2.23163261 2.42183726 -0.52941262 -7.35597457 -3.41685057
-0.24359225 -5.33856181 -1.41741354 -0.35654736 -1.71158503 -2.24469314
-3.26453092 1.0932765 1.58333208 0.15567584 0.02793548 1.59561909
0.31732915 -1.00695954 3.41663177 -4.06869021 3.74388762 -0.82868155
1.49789582 -1.63559124 0.2741194 -1.11709237 1.97177449 0.66410154
0.48397714 -1.96241854 0.34975886 1.3317751 2.25763568 -6.80055066
-0.65903682 -1.07105965 -0.40211347 -0.30507635]
finally for task 3:
if np.allclose(my_solution, numpy_solution):
    print("These solutions agree")
else:
    print("These solutions do not agree")
resulting in:
These solutions agree
If all you want is the infinity norm of a matrix (the largest row sum of absolute values),
it generally looks something like this:
def inf_norm(matrix):
    return max(abs(row).sum() for row in matrix)
But since your my_solution and numpy_solution are just 1-D vectors, you
can either reshape them (I assume 100x1, which is what you have in your
example) for use with the function above:
alternative 1:
def inf_norm(matrix):
    return max(abs(row).sum() for row in matrix)
diff = my_solution - numpy_solution
inf_norm_result = inf_norm(diff.reshape((100, 1)))
alternative 2:
Or if you know they will always be 1-D vectors, you can omit the sum
(because the rows will all have length 1) and compute it directly:
abs(my_solution - numpy_solution).max()
alternative 3:
or, as it is written in the numpy.linalg.norm documentation (after reshaping to a column so that axis=1 exists):
np.max(np.sum(np.abs((my_solution - numpy_solution).reshape(-1, 1)), axis=1))
alternative 4:
or use numpy.linalg.norm() directly:
np.linalg.norm(my_solution - numpy_solution, np.inf)
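For reference, the definitions behind these alternatives can be checked on a tiny example without NumPy. This is only an illustrative sketch; the function names and the three-element vectors are made up.

```python
def vector_inf_norm(v):
    """Infinity norm of a vector: the largest absolute entry."""
    return max(abs(value) for value in v)

def matrix_inf_norm(rows):
    """Infinity norm of a matrix: the largest row sum of absolute values."""
    return max(sum(abs(value) for value in row) for row in rows)

# Two hypothetical solution vectors and their difference
mine = [1.0, -2.0, 3.0]
reference = [1.5, -2.0, 1.0]
difference = [a - b for a, b in zip(mine, reference)]  # [-0.5, 0.0, 2.0]

print(vector_inf_norm(difference))                  # vector definition
print(matrix_inf_norm([[d] for d in difference]))   # same vector as an n x 1 matrix
print(matrix_inf_norm([[1, -1]]))                   # abs is taken before summing
```

The first two calls agree because every row of an n x 1 matrix has a single entry. The last call shows why the absolute value has to be taken element by element: the row [1, -1] sums to 0, but its norm is 2.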

TypeError while trying to implement KNN algorithm with python

import csv
import random
import math
import operator
def loadDataset(filename,trainingSet=[],testSet=[]):
with open(filename, 'rt') as csvfile:
lines = csv.reader(csvfile)
dataset = list(lines)
z = len(dataset)-1
for x in range(len(dataset)-2):
for y in range(8,9):
dataset[x][y] = float (dataset[x][y])
for y in range(8,9):
dataset[z][y] = float (dataset[z][y])
def euclideanDistance(instance1, instance2):
distance = 0
X= (instance1[9] - instance2[9]) +(instance1[8] - instance2[8])
distance += pow(X, 2)
return math.sqrt(distance)
def getNeighbors(trainingSet, testInstance, k):
distances = []
for x in range(len(trainingSet)):
dist = euclideanDistance(testInstance, trainingSet[x])
distances.append((trainingSet[x], dist))
neighbors = []
for x in range(k):
return neighbors
def main():
loadDataset('G:\ABCD.csv', trainingSet, testSet)
print ('Train set: ' + repr(len(trainingSet)))
print ('Test set: ' + repr(len(testSet)))
k = 4
neighbors = getNeighbors(trainingSet, testSet[0], k)
print('Best Neighbor is: ' + a)
I am getting a TypeError while executing the code. In this program I am trying to find the Euclidean distance from a test point to each point in the given dataset and then, after sorting, get the neighbors with the least distance.
The error says you are trying to subtract a string from a string (line 22, in your euclideanDistance function).
You need to parse the two co-ordinates into numbers to be able to subtract them. The float function will be able to do that.
Example - you're using instance1[9] which is a string representing a floating point number, so float(instance1[9]) should give you a number.
Just leave a comment if you're still struggling and I'll show you the update you need to make.
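A minimal sketch of that conversion is shown below. It assumes, as in the question, that the two coordinates live in columns 8 and 9 of each CSV row; the helper names and the two-row sample are hypothetical. Note that it uses the usual Euclidean formula (sum of squared differences), which is not quite the expression in the question's code.

```python
import csv
import io
import math

def load_rows(csvfile, numeric_columns=(8, 9)):
    """Read CSV rows and convert the coordinate columns from str to float."""
    rows = []
    for row in csv.reader(csvfile):
        for col in numeric_columns:
            row[col] = float(row[col])
        rows.append(row)
    return rows

def euclidean_distance(a, b, columns=(8, 9)):
    """Euclidean distance using only the given (already numeric) columns."""
    return math.sqrt(sum((a[c] - b[c]) ** 2 for c in columns))

# Hypothetical two-row file; csv.reader yields every field as a string
sample = io.StringIO("a,b,c,d,e,f,g,h,1.0,2.0\na,b,c,d,e,f,g,h,4.0,6.0\n")
rows = load_rows(sample)
print(euclidean_distance(rows[0], rows[1]))
```

Without the float() calls in load_rows, the subtraction inside euclidean_distance would raise the same TypeError, because csv.reader never converts fields on its own.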

Python speed up large nested array processing

I'm wondering if there is a faster way to do this.
-data[number, number, number, number, number, number, number]
- ... etc. x 12000
-data[number, number, number, number, number, number, number]
- ... etc. x 12000
-data[number, number, number, number, number, number, number]
- ... etc. x 12000
-data[number, number, number, number, number, number, number]
- ... etc. x 12000
x and y are the first two numbers in each data array.
I need to scan each item in layers 1,2,3 against each item in the first layer (0) looking to see if they fall within a given search radius. This takes a while.
for i in range (len(data[0])):
x = data[0][i][0]
y = data[0][i][1]
for x in range (len(data[1])):
x1 = data[1][x][0]
y1 = data[1][x][1]
if( math.pow((x1 -x),2) + math.pow((y1 - y),2) < somevalue):
Thanks for any assistance!
First you should write more readable python code:
for x,y in data[0]:
for x1, y1 in data[1]:
if (x1 - x)**2 + (y1 - y)**2 < somevalue:
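Then you can vectorize the inner loop with numpy (assuming each data[k] is a 2-D array with x and y in the first two columns):
matches = []
x1, y1 = data[1][:, 0], data[1][:, 1]
for x, y in data[0][:, :2]:
    indices = (x1 - x)**2 + (y1 - y)**2 < somevalue
    matches.append(((x, y), data[1][indices]))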
For this specific problem scipy.spatial.KDTree or rather its Cython workalike scipy.spatial.cKDTree would appear to be tailor-made:
import numpy as np
from scipy.spatial import cKDTree
# create some random data
data = np.random.random((4, 12000, 7))
# in each record discard all but x and y
data_xy = data[..., :2]
# build trees
trees = [cKDTree(d) for d in data_xy]
somevalue = 0.001
# find all close pairs between reference layer and other layers
pairs = []
for tree in trees[1:]:
    pairs.append(trees[0].query_ball_tree(tree, np.sqrt(somevalue)))
This example takes less than a second. Please note that the output format is different to the one your script produces. For each of the three non-reference layers it is a list of lists, where the inner list at index k contains the indices of the points that are close to point k in the reference list.
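If SciPy is not an option, a similar effect can be had with a plain uniform grid instead of a k-d tree. To be clear, this is a different and simpler technique than the one used above, sketched under assumptions: the function names are made up, points are (x, y) pairs, and radius is the actual distance (so the square root of somevalue in the question). Each point is hashed into a square cell of side radius, so only the nine surrounding cells need to be scanned for each reference point.

```python
from collections import defaultdict
from math import floor

def build_grid(points, cell):
    """Map each grid cell (ix, iy) to the indices of the points inside it."""
    grid = defaultdict(list)
    for idx, (x, y) in enumerate(points):
        grid[(floor(x / cell), floor(y / cell))].append(idx)
    return grid

def neighbours_within(ref_points, other_points, radius):
    """For each reference point, list the indices of other_points closer than radius."""
    grid = build_grid(other_points, radius)
    r2 = radius * radius
    result = []
    for x, y in ref_points:
        cx, cy = floor(x / radius), floor(y / radius)
        close = []
        # A point within `radius` must lie in this cell or one of its 8 neighbours
        for gx in (cx - 1, cx, cx + 1):
            for gy in (cy - 1, cy, cy + 1):
                for idx in grid.get((gx, gy), ()):
                    ox, oy = other_points[idx]
                    if (ox - x) ** 2 + (oy - y) ** 2 < r2:
                        close.append(idx)
        result.append(sorted(close))
    return result

# Hypothetical layers: two reference points, four candidate points
layer0 = [(0.0, 0.0), (5.0, 5.0)]
layer1 = [(0.5, 0.0), (3.0, 3.0), (5.0, 5.2), (-0.3, -0.3)]
print(neighbours_within(layer0, layer1, 1.0))
```

The output has the same shape as the query_ball_tree result described above: one list of matching indices per reference point.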
I would suggest creating a function out of this and using the numba library with the decorator @jit(nopython=True).
Also, as suggested, you should use numpy arrays, as numba focuses on utilizing numpy operations.
import math
from numba import jit

@jit(nopython=True)
def search(data):
    matches1 = []
    matches2 = []
    for i in range(len(data[0])):
        x = data[0][i][0]
        y = data[0][i][1]
        for j in range(len(data[1])):
            x1 = data[1][j][0]
            y1 = data[1][j][1]
            if math.pow((x1 - x), 2) + math.pow((y1 - y), 2) < somevalue:
                # assumed: record the indices of the matching pair
                matches1.append(i)
                matches2.append(j)
    return matches1, matches2

if __name__ == '__main__':
    # Initialize
    # import your data however.
    m1, m2 = search(data)
The key is to make sure to only use the allowed functions supported by numba.
I have seen speed increases from 100x faster to ~300x faster.
This could also be a good place to use GPGPU computation. From Python you have pycuda and pyopencl, depending on your underlying hardware. OpenCL can also use some of the SIMD instructions on the CPU if you don't have a GPU.
If you don't want to go down the GPGPU road, then numpy or numba would also be useful, as mentioned before.
