Select region from sequence with Phobius output in Python

I need to use a certain program to validate some of my results. I am relatively new to Python. The output differs a lot from entry to entry; see a snippet below:
SEQENCE ID TM SP PREDICTION
YOL154W_Q12512_Saccharomyces_cerevisiae 0 Y n8-15c20/21o
YDR481C_P11491_Saccharomyces_cerevisiae 1 0 i34-53o
YAL007C_P39704_Saccharomyces_cerevisiae 1 Y n5-20c25/26o181-207i
YAR028W_P39548_Saccharomyces_cerevisiae 2 0 i51-69o75-97i
YBL040C_P18414_Saccharomyces_cerevisiae 7 0 o6-26i38-56o62-80i101-119o125-143i155-174o186-206i
YBR106W_P38264_Saccharomyces_cerevisiae 1 0 o28-47i
YBR287W_P38355_Saccharomyces_cerevisiae 8 0 o12-32i44-63o69-90i258-275o295-315i327-351o363-385i397-421o
So, I need the last transmembrane region; in this case it's always the last pair of numbers between o and i, or vice versa. If TM = 0 there is no transmembrane region, so I only want the numbers if TM > 0.
Output I need:
34-53
181-207
75-97
186-206
28-47
397-421
Preferably as separate values, like:
first_number = 34
second_number = 53
Because I will be using a loop, the values will be overwritten anyway. To summarize: I need the last region between o and i, or vice versa, in strings that vary a lot in both length and composition.
Trouble: if I just search (for example with a regular expression) for the last region between o and i, I will sometimes pick the wrong region.

If the Phobius output is stored in a file, change 'Phobius_output' to its path; the following code should then give the expected result:
with open('Phobius_output') as file:
    for line in file.readlines()[1:]:
        if int(line.split()[1]) > 0:
            prediction = line.split()[3]
            i_idx, o_idx = prediction.rfind('i'), prediction.rfind('o')
            last_region = prediction[i_idx + 1:o_idx] if i_idx < o_idx else prediction[o_idx + 1:i_idx]
            first_number, second_number = map(int, last_region.split('-'))
            print(last_region)
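As an alternative to the rfind approach above, here is a minimal regex sketch. It assumes that a transmembrane segment always has the form start-end and sits directly between two topology letters (i or o), which holds for the sample lines in the question but is not checked against the full Phobius format. The names TM_SEGMENT and last_tm_region are made up for this illustration.

```python
import re

# Assumption: a TM segment looks like "start-end" with an i or o directly
# before and after it. Lookarounds are used so that a letter shared by two
# neighbouring segments is not consumed by the first match.
TM_SEGMENT = re.compile(r'(?<=[io])(\d+)-(\d+)(?=[io])')


def last_tm_region(prediction):
    """Return (start, end) of the last TM segment, or None if there is none."""
    segments = TM_SEGMENT.findall(prediction)
    if not segments:
        return None
    first_number, second_number = map(int, segments[-1])
    return first_number, second_number


for pred in ('n8-15c20/21o', 'i34-53o', 'n5-20c25/26o181-207i'):
    print(pred, last_tm_region(pred))
```

Signal peptide parts such as n8-15c20/21o are skipped because they are preceded by n or c rather than by i or o, so the TM column does not need to be consulted separately.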

Related

Find the common ordered characters between two strings

Given two strings, find the common characters between the two strings which are in same order from left to right.
Example 1
string_1 = 'hcarry'
string_2 = 'sallyc'
Output - 'ay'
Example 2
string_1 = 'jenny'
string_2 = 'ydjeu'
Output - 'je'
Explanation for Example 1 -
Common characters between string_1 and string_2 are c,a,y. But since c comes before ay in string_1 and after ay in string_2, we won't consider character c in output. The order of common characters between the two strings must be maintained and must be same.
Explanation for Example 2 -
Common characters between string_1 and string_2 are j,e,y. But since y comes before je in string_2 and after je in string_1, we won't consider character y in output. The order of common characters between the two strings must be maintained and must be same.
My approach -
Find the common characters between the strings and then store it in another variable for each individual string.
Example -
string_1 = 'hcarry'
string_2 = 'sallyc'
Common_characters = c,a,y
string_1_com = cay
string_2_com = ayc
I used sorted, counter, enumerate functions to get string_1_com and string_2_com in Python.
Now find the longest common sub-sequence in between string_1_com and string_2_com . You get the output as the result.
This is the brute force solution.
What is the optimal solution for this?
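The filtering step described in the question can be sketched as follows. This is only one possible reading of "store the common characters for each string": it keeps every character that occurs anywhere in the other string, and the helper name keep_common is made up for this illustration.

```python
def keep_common(source, other):
    """Keep the characters of `source` that also occur in `other`, in order."""
    allowed = set(other)
    return ''.join(ch for ch in source if ch in allowed)


string_1 = 'hcarry'
string_2 = 'sallyc'
string_1_com = keep_common(string_1, string_2)
string_2_com = keep_common(string_2, string_1)
print(string_1_com, string_2_com)  # cay ayc
```

Characters that do not occur in the other string can never be part of a common subsequence, so dropping them first does not change the final answer; it only shortens the inputs.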
The algorithm for this is just called string matching in my book. It runs in O(mn) where m and n are the word lengths. I guess it might as well run on the full words, what's most efficient would depend on the expected number of common letters and how the sorting and filtering is performed. I will explain it for common letters strings as that's easier.
The idea is that you look at a directed acyclic graph of (m+1)*(n+1) nodes. Each path (from upper left to lower right) through this graph represents a unique way of matching the words. We want to match the strings, and additionally put in blanks (-) in the words so that they align with the highest number of common letters. For example the end state of cay and ayc would be
cay-
-ayc
Each node stores the highest number of matches for the partial matching which it represents, and at the end of the algorithm the end node will give us the highest number of matches.
We start at the upper left corner where nothing is matched with nothing and so we have 0 matching letters here (score 0).
    c a y
  0 . . .
a . . . .
y . . . .
c . . . .
We are to walk through this graph and for each node calculate the highest number of matching letters, by using the data from previous nodes.
The nodes are connected left->right, up->down and diagonally left-up->right-down.
Moving right represents consuming one letter from cay and matching the letter we arrive at with a - inserted in ayc.
Moving down represents the opposite (consuming from ayc and inserting - to cay).
Moving diagonally represents consuming one letter from each word and matching those.
Looking at the first node to the right of our starting node it represents the matching
c
-
and this node can (obviously) only be reached from the starting node.
All nodes in first row and first column will be 0 since they all represent matching one or more letters with an equal number of -.
We get the graph
    c a y
  0 0 0 0
a 0 . . .
y 0 . . .
c 0 . . .
That was the setup, now the interesting part begins.
Looking at the first unevaluated node, which represents matching the substrings c with a, we want to decide how we can get there with the most number of matching letters.
Alternative 1: We can get there from the node to the left. The node to the left represents the matching
-
a
so by choosing this path to get to our current node we arrive at
-c
a-
matching c with - gives us no correct match and thus the score for this path is 0 (taken from the last node) plus 0 (score for the match c/- just made). So 0 + 0 = 0 for this path.
Alternative 2: We can get to this node from above, this path represents moving from
c  ->  c-
-      -a
which also gives us 0 extra points. Score for this is 0.
Alternative 3: We can get to this node from upper-left. This is moving from the starting node (nothing at all) to consuming one character from each word. That is matching
c
a
Since c and a are different letters we get 0 + 0 = 0 for this path as well.
    c a y
  0 0 0 0
a 0 0 . .
y 0 . . .
c 0 . . .
But for the next node it looks better. We still have the three alternatives to look at.
Alternatives 1 and 2 never give us extra points, as they always represent matching a letter with -, so those paths add 0 to the score. Let's move on to alternative 3.
For our current node moving diagonally means going from
c  ->  ca
-      -a
IT'S A MATCH!
That means there is a path to this node that gives us 1 in score. We throw away the 0s and save the 1.
    c a y
  0 0 0 0
a 0 0 1 .
y 0 . . .
c 0 . . .
For the last node on this row we look at our three alternatives and realize we won't get any new points (new matches), but we can get to the node by using our previous 1 point path:
ca  ->  cay
-a      -a-
So this node is also 1 in score.
Doing this for all nodes we get the following complete graph
    c a y
  0 0 0 0
a 0 0 1 1
y 0 0 1 2
c 0 1 1 2
where the only increases in score come from
c  ->  ca   |   ca  ->  cay   |   -  ->  -c
-      -a   |   -a      -ay   |   y      yc
And so the end node tells us the maximal match is 2 letters.
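The score table from the walk-through can be reproduced with a few lines of Python. This is a minimal sketch that only tracks scores, not paths; the function name score_table and its parameter names are made up for this illustration, with the first word running across the top and the second down the left side as in the graphs above.

```python
def score_table(word_top, word_left):
    """Fill the (len(word_left)+1) x (len(word_top)+1) table of best scores."""
    rows, cols = len(word_left), len(word_top)
    # First row and first column stay 0: letters matched only against '-'.
    table = [[0] * (cols + 1) for _ in range(rows + 1)]
    for r in range(1, rows + 1):
        for c in range(1, cols + 1):
            match = 1 if word_top[c - 1] == word_left[r - 1] else 0
            table[r][c] = max(
                table[r][c - 1],              # alternative 1: from the left
                table[r - 1][c],              # alternative 2: from above
                table[r - 1][c - 1] + match,  # alternative 3: diagonal
            )
    return table


for row in score_table('cay', 'ayc'):
    print(row)
```

The last entry of the last row is the end node, which holds 2 for cay and ayc.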
Since in your case you wish to know the actual letters along the longest path (the one with score 2), you need to track, for each node, the path taken as well.
This graph is easily implemented as a matrix (or an array of arrays).
I would suggest that you as elements use a tuple with one score element and one path element and in the path element you just store the aligning letters, then the elements of the final matrix will be
      c       a         y
  0   0       0         0
a 0   0       (1, a)    (1, a)
y 0   0       (1, a)    (2, ay)
c 0   (1, c)  (1, a/c)  (2, ay)
At one place I noted a/c; this is because the strings ca and ayc have two different sub-sequences of maximum length. You need to decide what to do in those cases: either just go with one or save both.
EDIT:
Here's an implementation for this solution.
def longest_common(string_1, string_2):
    len_1 = len(string_1)
    len_2 = len(string_2)
    m = [[(0, "") for _ in range(len_1 + 1)] for _ in range(len_2 + 1)]  # initialize matrix
    for row in range(1, len_2+1):
        for col in range(1, len_1+1):
            diag = 0
            match = ""
            if string_1[col-1] == string_2[row-1]:  # score increases by one if the letters match in a diagonal move
                diag = 1
                match = string_1[col - 1]
            # find best alternative
            if m[row][col-1][0] >= m[row-1][col][0] and m[row][col-1][0] >= m[row-1][col-1][0]+diag:
                m[row][col] = m[row][col-1]  # path from left is best
            elif m[row-1][col][0] >= m[row-1][col-1][0]+diag:
                m[row][col] = m[row-1][col]  # path from above is best
            else:
                m[row][col] = (m[row-1][col-1][0]+diag, m[row-1][col-1][1]+match)  # path diagonally is best
    return m[len_2][len_1][1]
>>> print(longest_common("hcarry", "sallyc"))
ay
>>> print(longest_common("cay", "ayc"))
ay
>>> m
[[(0, ''), (0, ''), (0, ''), (0, '')],
[(0, ''), (0, ''), (1, 'a'), (1, 'a')],
[(0, ''), (0, ''), (1, 'a'), (2, 'ay')],
[(0, ''), (1, 'c'), (1, 'c'), (2, 'ay')]]
Here is a simple, dynamic programming based implementation for the problem:
def lcs(X, Y):
    m, n = len(X), len(Y)
    # using a 2D matrix for dynamic programming
    # L[i][j] stores the length of the longest common subsequence of X[0:i] and Y[0:j]
    L = [[0 for _ in range(n+1)] for _ in range(m+1)]
    for i in range(m+1):
        for j in range(n+1):
            if i == 0 or j == 0:
                L[i][j] = 0
            elif X[i-1] == Y[j-1]:
                L[i][j] = L[i-1][j-1] + 1
            else:
                L[i][j] = max(L[i-1][j], L[i][j-1])
    # The following code is used to find the common string
    index = L[m][n]
    # Create a character array to store the lcs string
    lcs_chars = [""] * (index+1)
    lcs_chars[index] = ""
    # Start from the right-most, bottom-most corner and
    # store the characters in lcs_chars one by one
    i = m
    j = n
    while i > 0 and j > 0:
        # If the current characters in X and Y are the same, then
        # the current character is part of the LCS
        if X[i-1] == Y[j-1]:
            lcs_chars[index-1] = X[i-1]
            i -= 1
            j -= 1
            index -= 1
        # If not the same, then find the larger of the two and
        # go in the direction of the larger value
        elif L[i-1][j] > L[i][j-1]:
            i -= 1
        else:
            j -= 1
    print("".join(lcs_chars))
But you already know the term "longest common subsequence", so you can find numerous descriptions of the dynamic programming algorithm for it.
Wiki link
Pseudocode:
function LCSLength(X[1..m], Y[1..n])
    C = array(0..m, 0..n)
    for i := 0..m
        C[i,0] = 0
    for j := 0..n
        C[0,j] = 0
    for i := 1..m
        for j := 1..n
            if X[i] = Y[j] //i-1 and j-1 if reading X & Y from zero
                C[i,j] := C[i-1,j-1] + 1
            else
                C[i,j] := max(C[i,j-1], C[i-1,j])
    return C[m,n]
function backtrack(C[0..m,0..n], X[1..m], Y[1..n], i, j)
    if i = 0 or j = 0
        return ""
    if X[i] = Y[j]
        return backtrack(C, X, Y, i-1, j-1) + X[i]
    if C[i,j-1] > C[i-1,j]
        return backtrack(C, X, Y, i, j-1)
    return backtrack(C, X, Y, i-1, j)
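Much easier solution (thank you!). Note that it only reports pairs of neighbouring common characters that also appear next to each other in the second string, so it is not a general longest-common-subsequence routine:
def f(s, s1):
    cc = list(set(s) & set(s1))
    ns = ''.join([S for S in s if S in cc])
    ns1 = ''.join([S for S in s1 if S in cc])
    found = []
    b = ns[0]
    for e in ns[1:]:
        cs = b+e
        if cs in ns1:
            found.append(cs)
        b = e
    return found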

Long multiplication of two numbers given as strings

I am trying to solve a problem of multiplication. I know that Python supports very large numbers and it can be done but what I want to do is
Enter 2 numbers as strings.
Multiply those two numbers in the same manner as we used to do in school.
Basic idea is to convert the code given in the link below to Python code but I am not very good at C++/Java. What I want to do is to understand the code given in the link below and apply it for Python.
https://www.geeksforgeeks.org/multiply-large-numbers-represented-as-strings/
I am stuck at the addition point.
I want to do it like in the image given below.
So I have made a list which stores the product of each digit of the second number with the whole first number. Please help me to solve the addition part.
def mul(upper_no,lower_no):
    upper_len=len(upper_no)
    lower_len=len(lower_no)
    list_to_add=[] #saves numbers in queue to add in the end
    for lower_digit in range(lower_len-1,-1,-1):
        q='' #A queue to store step by step multiplication of numbers
        carry=0
        for upper_digit in range(upper_len-1,-1,-1):
            num2=int(lower_no[lower_digit])
            num1=int(upper_no[upper_digit])
            print(num2,num1)
            x=(num2*num1)+carry
            if upper_digit==0:
                q=str(x)+q
            else:
                if x>9:
                    q=str(x%10)+q
                    carry=x//10
                else:
                    q=str(x%10)+q
                    carry=0
            num=x%10
        print(q)
        list_to_add.append(int(''.join(q)))
    print(list_to_add)
mul('234','567')
I get [1638, 1404, 1170] as the result of the function call mul('234','567'). I am supposed to add these numbers, but I am stuck because each one has to be shifted one more place to the left than the previous one. For example, 1404 is supposed to be added as 14040, so that its last 4 lines up with the 3 of 1638, its 0 with the 6, and so on. Like:
  1638
 1404x
1170xx
--------
132678
--------
I think this might help. I've added a place variable to keep track of what power of 10 each intermediate value should be multiplied by, and used the itertools.accumulate function to produce the intermediate accumulated sums that doing so produces (and you want to show).
Note I have also reformatted your code so it closely follows PEP 8 - Style Guide for Python Code in an effort to make it more readable.
from itertools import accumulate
import operator

def mul(upper_no, lower_no):
    upper_len = len(upper_no)
    lower_len = len(lower_no)
    list_to_add = []  # Saves numbers in queue to add in the end
    place = 0
    for lower_digit in range(lower_len-1, -1, -1):
        q = ''  # A queue to store step by step multiplication of numbers
        carry = 0
        for upper_digit in range(upper_len-1, -1, -1):
            num2 = int(lower_no[lower_digit])
            num1 = int(upper_no[upper_digit])
            print(num2, num1)
            x = (num2*num1) + carry
            if upper_digit == 0:
                q = str(x) + q
            else:
                if x > 9:
                    q = str(x%10) + q
                    carry = x//10
                else:
                    q = str(x%10) + q
                    carry = 0
            num = x%10
        print(q)
        list_to_add.append(int(''.join(q)) * (10**place))
        place += 1
    print(list_to_add)
    print(list(accumulate(list_to_add, operator.add)))

mul('234', '567')
Output:
7 4
7 3
7 2
1638
6 4
6 3
6 2
1404
5 4
5 3
5 2
1170
[1638, 14040, 117000]
[1638, 15678, 132678]
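If the addition should also be done digit by digit, as in school, rather than on Python integers, one option is to add every single-digit product straight into a shared array of result digits, where position i + j plays the role of the shift. The following is a minimal sketch under the assumption that both inputs are non-empty strings of decimal digits without a sign; the name multiply_strings is made up for this illustration.

```python
def multiply_strings(upper_no, lower_no):
    """Schoolbook multiplication of two non-negative decimal strings."""
    # result[k] holds the digit for 10**k (least significant digit first).
    result = [0] * (len(upper_no) + len(lower_no))
    for i, lower_digit in enumerate(reversed(lower_no)):
        carry = 0
        for j, upper_digit in enumerate(reversed(upper_no)):
            # i + j is the column: row i is shifted i places to the left.
            total = result[i + j] + int(lower_digit) * int(upper_digit) + carry
            result[i + j] = total % 10
            carry = total // 10
        # The column just left of this row has not been written yet.
        result[i + len(upper_no)] += carry
    digits = ''.join(map(str, reversed(result))).lstrip('0')
    return digits or '0'


print(multiply_strings('234', '567'))  # 132678
```

Because the partial products are added column by column while they are produced, no separate list of shifted rows is needed.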

Read specific Bits from bitstring.BitArray

I have a Bitarray and want to read from a certain position to another position.
I have the int variable length in a for loop, so for example I have:
length = 2
and my Bitarray looks something like:
msgstr = bitstring.BitArray('0b11110011001111110')
I then want to read the first two bits and convert them into an int, so that I have:
id == 3
And for the next round when length has changed in value it should start from the third bit etc.
id = bitstring.BitArray()
m = 0
while 5 != m:
    /////////////
    Length changes in value part of Code
    /////////////
    x = 0
    if m == 0:
        while length != x:
            id.append = msgstr[x] #msgstr is the BitArray that needs to be read
            x = x + 1
    m = m + 1
What you want here is called slicing.
for i in range(0, len(msgstr), length):
    print(msgstr[i:i+length].uint)
This code will get you what you are asking for as long as length stays the same: it takes the first two bits and converts them into an int, then takes the third and fourth bits and converts them to an int, etc.
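When length changes from round to round, the same slicing idea works with a running start position. The sketch below deliberately uses a plain string of '0' and '1' characters instead of a bitstring.BitArray so that it runs without the library; with a BitArray the slice would be msgstr[position:position+length].uint as in the answer above. The function name read_fields and the example lengths are assumptions made for this illustration.

```python
def read_fields(bits, lengths):
    """Read consecutive unsigned fields of the given bit lengths."""
    values = []
    position = 0  # index of the first bit that has not been read yet
    for length in lengths:
        chunk = bits[position:position + length]
        if length <= 0 or len(chunk) < length:
            raise ValueError('invalid length or not enough bits left')
        values.append(int(chunk, 2))  # interpret the slice as a binary number
        position += length
    return values


print(read_fields('11110011001111110', [2, 3, 4]))  # [3, 6, 6]
```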

How to randomly delete a number of lines from a big file?

I have a big text file of 13 GB with 158,609,739 lines, and I want to randomly select 155,000,000 of them.
I have tried to shuffle the file and then take the first 155,000,000 lines, but it seems that my RAM (16 GB) is not big enough for this. The pipelines I have tried are:
shuf file | head -n 155000000
sort -R file | head -n 155000000
Instead of selecting lines, I think it would be more memory-efficient to delete 3,609,739 random lines from the file, leaving a final file of 155,000,000 lines.
As you copy each line of the file to the output, assess the probability that it should be deleted. The first line should have a 3,609,739/158,609,739 chance of being deleted. If you generate a random number between 0 and 1 and that number is less than that ratio, don't copy the line to the output. If the first line was deleted, the odds for the second line are 3,609,738/158,609,738; if the second line is then not deleted, the odds for the third line are 3,609,738/158,609,737. Repeat until done.
Because the odds change with each line processed, this algorithm guarantees the exact line count. Once you've deleted 3,609,739 the odds go to zero; if at any time you would need to delete every remaining line in the file, the odds go to one.
You could always pre-generate which line numbers (a list of 3,609,739 random numbers selected without replacement) you plan on deleting, then just iterate through the file and copy to another, skipping lines as necessary. As long as you have space for a new file this would work.
You could select the random numbers with random.sample, e.g.,
random.sample(range(158609739), 3609739)
Proof of Mark Ransom's Answer
Let's use numbers easier to think about (at least for me!):
10 items
delete 3 of them
First time through the loop we will assume that the first three items get deleted -- here's what the probabilities look like:
first item: 3 / 10 = 30%
second item: 2 / 9 = 22%
third item: 1 / 8 = 12%
fourth item: 0 / 7 = 0 %
fifth item: 0 / 6 = 0 %
sixth item: 0 / 5 = 0 %
seventh item: 0 / 4 = 0 %
eighth item: 0 / 3 = 0 %
ninth item: 0 / 2 = 0 %
tenth item: 0 / 1 = 0 %
As you can see, once it hits zero, it stays at zero. But what if nothing is getting deleted?
first item: 3 / 10 = 30%
second item: 3 / 9 = 33%
third item: 3 / 8 = 38%
fourth item: 3 / 7 = 43%
fifth item: 3 / 6 = 50%
sixth item: 3 / 5 = 60%
seventh item: 3 / 4 = 75%
eighth item: 3 / 3 = 100%
ninth item: 2 / 2 = 100%
tenth item: 1 / 1 = 100%
So even though the probability varies per line, overall you get the results you are looking for. I went a step further and coded a test in Python for one million iterations as a final proof to myself -- remove seven items from a list of 100:
# python 3.2
from __future__ import division
from stats import mean # http://pypi.python.org/pypi/stats
import random

counts = dict()
for i in range(100):
    counts[i] = 0

removed_failed = 0

for _ in range(1000000):
    to_remove = 7
    from_list = list(range(100))
    removed = 0
    while from_list:
        current = from_list.pop()
        probability = to_remove / (len(from_list) + 1)
        if random.random() < probability:
            removed += 1
            to_remove -= 1
            counts[current] += 1
    if removed != 7:
        removed_failed += 1

print(counts[0], counts[1], counts[2], '...',
      counts[49], counts[50], counts[51], '...',
      counts[97], counts[98], counts[99])
print("remove failed: ", removed_failed)
print("min: ", min(counts.values()))
print("max: ", max(counts.values()))
print("mean: ", mean(counts.values()))
and here are the results from one of the several times I ran it (they were all similar):
70125 69667 70081 ... 70038 70085 70121 ... 70047 70040 70170
remove failed: 0
min: 69332
max: 70599
mean: 70000.0
A final note: Python's random.random() is [0.0, 1.0) (doesn't include 1.0 as a possibility).
I believe you're looking for "Algorithm S" from section 3.4.2 of Knuth (D. E. Knuth, The Art of Computer Programming. Volume 2: Seminumerical Algorithms, second edition. Addison-Wesley, 1981).
You can see several implementations at http://rosettacode.org/wiki/Knuth%27s_algorithm_S
The Perlmonks list has some Perl implementations of Algorithm S and Algorithm R that might also prove useful.
These algorithms rely on there being a meaningful interpretation of floating point numbers like 3609739/158609739, 3609738/158609738, etc. which might not have sufficient resolution with a standard Float datatype, unless the Float datatype is implemented using numbers of double precision or larger.
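One way to sidestep the floating point concern altogether is to make the same decision with integers only: drawing a uniform integer below the number of lines left and comparing it with the number of lines still to delete gives exactly the probability to_delete / remaining. The following is a minimal sketch of that idea, not a drop-in replacement for any of the implementations linked above; the generator name drop_random_lines and its parameters are made up for this illustration.

```python
import random


def drop_random_lines(lines, total, to_delete, rng=random):
    """Yield all but `to_delete` randomly chosen items of `lines`.

    `total` must be the exact number of items in `lines`.
    """
    remaining = total
    for line in lines:
        if remaining <= 0:
            raise ValueError('more items than `total`')
        # randrange(remaining) is uniform on 0..remaining-1, so this test
        # succeeds with probability to_delete / remaining, without floats.
        if rng.randrange(remaining) < to_delete:
            to_delete -= 1
        else:
            yield line
        remaining -= 1


kept = list(drop_random_lines(range(10), total=10, to_delete=3))
print(len(kept))  # 7
```

For the file in the question, lines would be the open input file, total would be 158609739 and to_delete would be 3609739, with each yielded line written to the output file.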
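Here's a possible solution using Python:
import random

# Placeholder paths; replace them with the real file names.
input_path = 'input.txt'
output_path = 'output.txt'

# A set makes each membership test fast; a list would be scanned for every line.
skipping = set(random.sample(range(158609739), 3609739))
with open(input_path) as infile, open(output_path, 'w') as outfile:
    for i, line in enumerate(infile):
        if i in skipping:
            continue
        outfile.write(line)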
Here's another using Mark's method:
import random

# Placeholder paths; replace them with the real file names.
input_path = 'input.txt'
output_path = 'output.txt'

lines_in_file = 158609739
lines_left_in_file = lines_in_file
lines_to_delete = lines_in_file - 155000000
with open(input_path) as infile, open(output_path, 'w') as outfile:
    try:
        for line in infile:
            current_probability = lines_to_delete / lines_left_in_file
            lines_left_in_file -= 1
            if random.random() < current_probability:
                lines_to_delete -= 1
                continue
            outfile.write(line)
    except ZeroDivisionError:
        print("More than %d lines in the file" % lines_in_file)
I wrote this code before seeing that Darren Yin had already described its principle.
I've modified my code to use the name skipping (I didn't dare to choose kangaroo ...) and the keyword continue, both taken from Ethan Furman, whose code follows the same principle too.
I defined default arguments for the function's parameters so that the function can be called several times without having to re-assign them at each call.
import random
import os.path

def spurt(ff, skipping):
    for i, line in enumerate(ff):
        if i in skipping:
            print('line %d excluded : %r' % (i, line))
            continue
        yield line

def randomly_reduce_file(filepath, nk=None,
                         d={0: 'st', 1: 'nd', 2: 'rd', 3: 'th'}, spurt=spurt,
                         sample=random.sample, splitext=os.path.splitext):
    # count of the lines of the original file
    with open(filepath) as f:
        nl = sum(1 for _ in f)
    # asking for the number of lines to keep, if not given as argument
    if nk is None:
        nk = int(input(' The file has %d lines.'
                       ' How many of them do you '
                       'want to randomly keep ? : ' % nl))
    # transfer of the lines to keep,
    # from one file to another file with different name
    if nk < nl:
        with open(filepath, 'rb') as f, \
             open('COPY'.join(splitext(filepath)), 'wb') as g:
            # sample(range(nl), nl-nk) gives the indices of the lines
            # to be excluded; the set keeps the lookup in spurt() fast
            g.writelines(spurt(f, set(sample(range(nl), nl - nk))))
    else:
        print(' %d is %s than the number of lines (%d) in the file\n'
              ' no operation has been performed'
              % (nk, 'the same' if nk == nl else 'greater', nl))
With the $RANDOM variable you can get a random number between 0 and 32,767.
With this, you could read in each line and check whether $RANDOM is less than 155,000,000 / 158,609,739 * 32,767 (which is 32,021); if so, let the line through.
Of course, this wouldn't give you exactly 155,000,000 lines, but it should come pretty close, depending on how uniform the random number generator is.
EDIT: Here is some code to get you started:
#!/bin/bash
# IFS= and -r keep leading whitespace and backslashes in each line intact
while IFS= read -r line; do
    if (( RANDOM < 32021 )); then
        printf '%s\n' "$line"
    fi
done
Call it like so:
thatScript.sh <inFile.txt >outFile.txt

Printing in a loop

I have the following file I'm trying to manipulate.
1 2 -3 5 10 8.2
5 8 5 4 0 6
4 3 2 3 -2 15
-3 4 0 2 4 2.33
2 1 1 1 2.5 0
0 2 6 0 8 5
The file just contains numbers.
I'm trying to write a program that subtracts each row from the next one and prints the results to a file. My program is below; dtest.txt is the name of the input file and make_distance.py is the name of the program.
from math import *
posnfile = open("dtest.txt","r")
posn = posnfile.readlines()
posnfile.close()
for i in range (len(posn)-1):
    for j in range (0,1):
        if (j == 0):
            Xp = float(posn[i].split()[0])
            Yp = float(posn[i].split()[1])
            Zp = float(posn[i].split()[2])
            Xc = float(posn[i+1].split()[0])
            Yc = float(posn[i+1].split()[1])
            Zc = float(posn[i+1].split()[2])
        else:
            Xp = float(posn[i].split()[3*j+1])
            Yp = float(posn[i].split()[3*j+2])
            Zp = float(posn[i].split()[3*j+3])
            Xc = float(posn[i+1].split()[3*j+1])
            Yc = float(posn[i+1].split()[3*j+2])
            Zc = float(posn[i+1].split()[3*j+3])
        Px = fabs(Xc-Xp)
        Py = fabs(Yc-Yp)
        Pz = fabs(Zc-Zp)
        print Px,Py,Pz
The program calculates the values correctly, but when I call it and redirect the output to a file,
mpipython make_distance.py > distance.dat
the output file (distance.dat) only contains 3 columns when it should contain 6. How do I tell the program which columns to print to for each step j = 0, 1, ...?
For j = 0 the program should output to the first 3 columns, for j = 1 to the second 3 columns (3,4,5), and so on.
Finally, the len function gives the number of rows in the input file, but what function gives the number of columns in the file?
Thanks.
Append a , to the end of your print statement so that it does not print a newline, and then, when you exit the for loop, add an additional print to move to the next row:
for j in range (0,1):
    ...
    print Px,Py,Pz,
print
Assuming all rows have the same number of columns, you can get the number of columns by using len(row.split()).
Also, you can definitely shorten your code quite a bit. I'm not sure what the purpose of j is, but the following should be equivalent to what you're doing now:
for j in range (0,1):
    Xp, Yp, Zp = map(float, posn[i].split()[3*j:3*j+3])
    Xc, Yc, Zc = map(float, posn[i+1].split()[3*j:3*j+3])
    ...
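To cover every group of three columns, the number of columns can be taken from the row itself and used to drive the loop over j. The following is a minimal sketch under the assumption that both rows have the same number of columns; the function name row_deltas and the group_size parameter are made up for this illustration.

```python
def row_deltas(previous_row, current_row, group_size=3):
    """Absolute differences between two rows, walked group by group."""
    prev = [float(x) for x in previous_row.split()]
    curr = [float(x) for x in current_row.split()]
    n_columns = len(curr)  # number of columns in this row
    deltas = []
    # start takes the values 0, 3, 6, ...; it corresponds to 3*j in the question
    for start in range(0, n_columns, group_size):
        stop = start + group_size
        for p, c in zip(prev[start:stop], curr[start:stop]):
            deltas.append(abs(c - p))
    return deltas


rows = ['1 2 -3 5 10 8.2', '5 8 5 4 0 6']
print(len(rows[0].split()))  # 6
print(' '.join(str(d) for d in row_deltas(rows[0], rows[1])))
```

Joining all values of a row before printing puts the six results on one output line, so no trailing comma is needed.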
You don't need to:
use numpy
read the whole file in at once
know how many columns
use awkward comma at end of print statement
use list subscripting
use math.fabs()
explicitly close your file
Try this (untested):
with open("dtest.txt", "r") as posnfile:
    previous = None
    for line in posnfile:
        current = [float(x) for x in line.split()]
        if previous:
            delta = [abs(c - p) for c, p in zip(current, previous)]
            print(' '.join(str(d) for d in delta))
        previous = current
Just in case your dtest.txt grows larger and you don't want to redirect your output but would rather write to distance.dat directly, especially if you want to use numpy. Thanks to John for pointing out my mistake in the old code ;-)
import numpy as np
pos = np.genfromtxt("dtest.txt")
dis = np.array([np.abs(pos[j+1] - pos[j]) for j in range(len(pos)-1)])
np.savetxt("distance.dat", dis)
