I need to compare all possible pairs of sequences in a fasta file. Normally, I would use
from Bio import SeqIO
input_file = open("input.fasta")
my_dict = SeqIO.to_dict(SeqIO.parse(input_file, "fasta"))
But my file is too large to load into memory. Instead, I would like to load the sequences sequentially, but haven't found a way to do it properly. So far I have attempted this:
file = "dummy.fa" # A file with five fasta sequences for this example. Ids are 1, 2, 3, 4, and 5
with open(file, mode = "r") as source:
    for record in SeqIO.parse(source, "fasta"):
        record_1 = record.id
        for record in SeqIO.parse(source, "fasta"):
            record_2 = record.id
            print(record_1, record_2)
#This produces the following output
1 3
1 4
1 5
But I am trying to get:
1 2
1 3
1 4
1 5
2 3
2 4
2 5
3 4
3 5
4 5
Is there a way to achieve this?
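One possible approach, sketched here under the assumption that the source can be re-read from the start as often as needed, is to open a fresh iterator for the inner loop instead of sharing one file handle between both loops. The names streamed_pairs and make_iter are hypothetical; with Biopython, make_iter could be something like lambda: SeqIO.parse("dummy.fa", "fasta"). The example below uses a plain list of ids as a stand-in for the parsed records.

```python
from itertools import islice


def streamed_pairs(make_iter):
    """Yield each unordered pair once, holding only two records in memory.

    make_iter is a hypothetical zero-argument callable that returns a fresh
    iterator over the records every time it is called.
    """
    for i, first in enumerate(make_iter()):
        # Skip the first i + 1 records so that (a, b) is produced but (b, a) is not.
        for second in islice(make_iter(), i + 1, None):
            yield first, second


# Stand-in for the five record ids in dummy.fa.
ids = ["1", "2", "3", "4", "5"]
pairs = list(streamed_pairs(lambda: iter(ids)))
for a, b in pairs:
    print(a, b)
```

The trade-off is that the source is re-read once per record, so this exchanges memory for repeated file reads.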
I have a .msh file that can be opened and edited in a text editor. I want to open it in Python, replace some specific rows with the rows of a numpy array, and save the result as a new .msh file. My numpy array has 9 columns and hundreds of rows, and my file also has hundreds of rows. I want to replace the rows of the file that have 9 columns with the rows of my numpy array. The number of rows in the array is the same as the number of 9-column rows in the file. For simplicity I show only two rows here, but in reality I have hundreds.
The numpy array is:
arr_1= np.array([[1, 2, 3, 4, 5, 6, 7, 8, 9],
[-1, 0, 0, 1, 46, 2, -11, 0, 0]])
My file (my_file) is as follows:
$MeshFormat
2.2 0 8
$EndMeshFormat
$Nodes
2929
1 26.66002035140991 0.75 1.25
-1 5 14 13.2 7.4444 11 9 -3 0.15
0.2 9 54.45 1 63 22.45 0 12 425.65
Then, I want to get a new saved file with a new name as:
$MeshFormat
2.2 0 8
$EndMeshFormat
$Nodes
2929
1 26.66002035140991 0.75 1.25
1 2 3 4 5 6 7 8 9
-1 0 0 1 46 2 -11 0 0
I could only come up with the following, but it does not work:
with open('my_file') as f:
new_data= line.split() for line in f if len(line.split()) == 9
for i in new_data:
for j in arr_1:
i = j
I tried it, but it was not successful at all, so I would appreciate any hint or help.
Cheers,
Ali
You can try this code, in which a new file named changed will be written. In order to reduce disk writes (to improve the performance, especially for large files), chunks of lines will be written to the new file.
chunk_size = 3
buffer = ""
i = 0
# index of the next row of arr_1 to write in place of a 9-column line
relevant_line = 0
# 'w' starts a fresh output file; 'a' would append to the output of an earlier run
with open('changed', 'w') as fout:
    with open('original', 'r') as fin:
        for line in fin:
            if len(line.split()) == 9:
                aux_string = ' '.join([str(num) for num in arr_1[relevant_line]])
                buffer += '%s\n' % aux_string
                relevant_line += 1
            else:
                buffer += line
            i += 1
            if i == chunk_size:
                fout.write(buffer)
                i = 0
                buffer = ""
    # make sure all remaining lines are written to the output file
    if buffer:
        fout.write(buffer)
Try the following:
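rows = iter(arr_1)
# 'new_file' is a placeholder name for the output file
with open('my_file') as f, open('new_file', 'w') as out:
    for line in f:
        if len(line.split()) == 9:
            # replace each 9-column line with the next row of arr_1
            line = ' '.join(str(v) for v in next(rows)) + '\n'
        out.write(line)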
I have two txt files.
The first one contains one number per line, like this:
22
15
32
53
.
.
and the other file contains 20 continuous numbers for each line like this:
0.1 2.3 4.5 .... 5.4
3.2 77.4 2.1 .... 8.1
....
.
.
I want to split the second file according to the numbers in the first file. For example, the first line of the first file is 22, so I take the whole first line of the second file (20 columns) plus the first two columns of its second line, and discard the rest of that second line. The second line of the first file is 15, so I then take 15 columns from the third line of the second file and discard the rest of that line, and so on. How can I do this?
with open ('numbers.txt', 'r') as f:
    with open ('contiuousNumbers.txt', 'r') as f2:
        with open ('results.txt', 'w') as fOut:
            for line in f:
                ...
Thanks.
For each number read from the first file, treat it as the total count of values still to be read. A while loop then calls next on the second file object to fetch one line at a time, takes at most the remaining count of values from that line, and subtracts the line's length from the total until the total reaches 0. Slicing with the smaller of the remaining total and the line's length ensures that only the requested number of values is output:
for line in f:
    output = []
    total = int(line)
    while total > 0:
        try:
            items = next(f2).split()
            output.extend(items[:min(total, len(items))])
            total -= len(items)
        except StopIteration:
            break
    fOut.write(' '.join(output) + '\n')
so that given the first file with:
3
6
1
5
and the second file with:
2 5
3 7
2 1
3 6
7 3
2 2
9 1
3 4
8 7
1 2
3 8
the output file will have:
2 5 3
2 1 3 6 7 3
2
9 1 3 4 8
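The same logic can be sketched as a standalone function that works on any iterables, which makes it easy to check against the example above without touching files. The name take_counts is hypothetical, and the rows are assumed to be already split into lists.

```python
def take_counts(counts, rows):
    """For each count, consume whole rows and keep only `count` values from them."""
    rows = iter(rows)
    result = []
    for count in counts:
        taken = []
        remaining = count
        while remaining > 0:
            items = next(rows, None)
            if items is None:  # the second source ran out of rows
                break
            # Slicing past the end of a list is safe, so no min() is needed here.
            taken.extend(items[:remaining])
            remaining -= len(items)
        result.append(taken)
    return result


counts = [3, 6, 1, 5]
rows = [[2, 5], [3, 7], [2, 1], [3, 6], [7, 3], [2, 2],
        [9, 1], [3, 4], [8, 7], [1, 2], [3, 8]]
grouped = take_counts(counts, rows)
for group in grouped:
    print(' '.join(str(v) for v in group))
```

Note that the unused values of a partially consumed row are dropped, which matches the behaviour described in the question.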
I have a text file as follows:
A B C D E
1 1 2 1 1e8
2 1 2 3 1e5
3 2 3 2 2000
50 2 3 2 2000
80 2 3 2 2000
...
1 2 5 6 1000
4 2 4 3 1e4
50 3 6 4 5000
120 3 5 2 2000
...
2 3 2 3 5000
3 3 4 5 1e9
4 3 2 3 1e6
7 3 2 3 43
...
I need code that goes through this text file, extracts the lines that share the same number in the first column (A), and saves each group in a separate file,
for example for the first column = 1 and ...
1 1 2 1 1e8
1 2 5 6 1000
I wrote code with a while loop, but the file is very big, and the loop also runs for numbers that do not occur in the file, so it takes a very long time to finish.
Thanks for your help
Warning
Both of the examples below will overwrite files called input_<number>.txt in the path they are run in.
Using awk
rm input_[0-9]*.txt; awk '/^[0-9]+[ \t]+/{ print >> "input_"$1".txt" }' input.txt
The front part /^[0-9]+[ \t]+/ does a regex match to select only lines which start with an integer number, the second part { print >> "input_"$1".txt" } prints those lines into a file named input_<number>.txt, with the corresponding lines for every number found in the first column of the file.
Using Python
import sys
import os
fn = sys.argv[1]
name, ext = os.path.splitext(fn)
with open(fn, 'r') as f:
    d = {}  # maps each first-column number to its open output file
    for line in f:
        try:
            ind = int(line.split()[0])
        except (ValueError, IndexError):
            # skip the header, blank lines, and lines not starting with an integer
            continue
        try:
            d[ind].write(line)
        except KeyError:
            d[ind] = open(name + "_{}".format(ind) + ext, "w")
            d[ind].write(line)
    for dd in d.values():
        dd.close()
Using Python (avoiding too many open file handles)
In this case the output files are opened in append mode, so you have to manually remove any old output files before you run the code, using rm input_[0-9]*.txt
import sys
import os
fn = sys.argv[1]
name, ext = os.path.splitext(fn)
with open(fn, 'r') as f:
    for line in f:
        try:
            ind = int(line.split()[0])
        except (ValueError, IndexError):
            continue
        # the output file is reopened for every line, so only one handle is open at a time
        with open(name + "_{}".format(ind) + ext, "a") as d:
            d.write(line)
Raising the limit of the number of open file handles
If you are a sudoer on your machine, you can increase the limit of open file handles for a process by using ulimit -n <number>, as per this answer.
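Before deciding between the two Python variants, it may help to check the current limit from Python itself. The following is a small sketch that assumes a Unix-like system, since the standard resource module is not available on Windows.

```python
import resource

# The soft limit is the one currently enforced for this process;
# the hard limit is the ceiling up to which the soft limit may be raised.
soft_limit, hard_limit = resource.getrlimit(resource.RLIMIT_NOFILE)
print("open file limit (soft, hard):", soft_limit, hard_limit)
```

If the number of distinct values in the first column is well below the soft limit, the first Python variant (one open handle per output file) should not run out of file handles.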
I'd like to read numbers from file into two dimensional array.
File contents:
line containing w, h
h lines containing w integers separated with space
For example:
4 3
1 2 3 4
2 3 4 5
6 7 8 9
Assuming you don't have extraneous whitespace:
with open('file') as f:
    w, h = [int(x) for x in next(f).split()] # read first line
    array = []
    for line in f: # read rest of lines
        array.append([int(x) for x in line.split()])
You could condense the last for loop into a nested list comprehension:
with open('file') as f:
    w, h = [int(x) for x in next(f).split()]
    array = [[int(x) for x in line.split()] for line in f]
To me, this kind of seemingly simple problem is what Python is all about. Especially if you're coming from a language like C++, where simple text parsing can be a pain in the butt, you'll really appreciate how Python lets you build the solution from small functional units. I'd keep it really simple with a couple of built-in functions and some generator expressions.
You'll need open(name, mode), myfile.readlines(), mystring.split(), int(myval), and then you'll probably want to use a couple of generators to put them all together in a pythonic way.
# This opens a handle to your file, in 'r' read mode
file_handle = open('mynumbers.txt', 'r')
# Read in all the lines of your file into a list of lines
lines_list = file_handle.readlines()
# Close the handle once the lines are in memory
file_handle.close()
# Extract dimensions from first line. Cast values to integers from strings.
cols, rows = (int(val) for val in lines_list[0].split())
# Do a double-nested list comprehension to get the rest of the data into your matrix
my_data = [[int(val) for val in line.split()] for line in lines_list[1:]]
Look up generator expressions here. They can really simplify your code into discrete functional units! Imagine doing the same thing in 4 lines in C++... It would be a monster. Especially the list comprehensions: when I was a C++ guy I always wished I had something like that, and I'd often end up building custom functions to construct each kind of array I wanted.
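As a small illustration of the difference between the two constructs mentioned above (the sample line is made up for this sketch): a list comprehension builds the whole list immediately, while a generator expression produces its values lazily and can be consumed only once.

```python
line = "1 2 3 4"

# List comprehension: the full list exists in memory right away.
as_list = [int(val) for val in line.split()]

# Generator expression: values are produced one at a time, on demand.
as_gen = (int(val) for val in line.split())
total = sum(as_gen)      # consuming the generator exhausts it
leftover = list(as_gen)  # nothing is left to yield

print(as_list, total, leftover)
```

This is why the dimensions can be unpacked directly from a generator expression, whereas the matrix itself is built with a list comprehension so that it can be indexed and reused.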
I'm not sure why you need w and h. If these values are actually required and mean that only the specified number of rows and columns should be read, then you can try the following:
output = []
with open(r'c:\file.txt', 'r') as f:
    w, h = map(int, f.readline().split())
    tmp = []
    for i, line in enumerate(f):
        if i == h:
            break
        # list() is needed on Python 3, where map returns a lazy iterator
        tmp.append(list(map(int, line.split()[:w])))
    output.append(tmp)
This works with both Python 2 (e.g. Python 2.7.10) and Python 3 (e.g. Python 3.6.4):
import numpy as np
with open('in.txt') as f:
    rows,cols=np.fromfile(f, dtype=int, count=2, sep=" ")
    data = np.fromfile(f, dtype=int, count=cols*rows, sep=" ").reshape((rows,cols))
Another way, which also works with both Python 2 (e.g. Python 2.7.10) and Python 3 (e.g. Python 3.6.4),
and for complex matrices as well (only change int to complex; see the example below):
with open('in.txt') as f:
    data = []
    cols,rows=list(map(int, f.readline().split()))
    for i in range(0, rows):
        data.append(list(map(int, f.readline().split()[:cols])))
    print (data)
I updated the code. This method works for any number of matrices and any kind of matrix (int, complex, float) in the initial in.txt file.
The program demonstrates matrix multiplication as an application. It is written for Python 2; to run it under Python 3, make the following changes:
print to print()
and
print "%7g" %a[i,j], to print ("%7g" %a[i,j],end="")
The script:
import numpy as np
def printMatrix(a):
    print ("Matrix["+("%d" %a.shape[0])+"]["+("%d" %a.shape[1])+"]")
    rows = a.shape[0]
    cols = a.shape[1]
    for i in range(0,rows):
        for j in range(0,cols):
            print "%7g" %a[i,j],
        print
    print
def readMatrixFile(FileName):
    rows,cols=np.fromfile(FileName, dtype=int, count=2, sep=" ")
    a = np.fromfile(FileName, dtype=float, count=rows*cols, sep=" ").reshape((rows,cols))
    return a
def readMatrixFileComplex(FileName):
    data = []
    rows,cols=list(map(int, FileName.readline().split()))
    for i in range(0, rows):
        data.append(list(map(complex, FileName.readline().split()[:cols])))
    a = np.array(data)
    return a
f = open('in.txt')
a=readMatrixFile(f)
printMatrix(a)
b=readMatrixFile(f)
printMatrix(b)
a1=readMatrixFile(f)
printMatrix(a1)
b1=readMatrixFile(f)
printMatrix(b1)
f.close()
print ("matrix multiplication")
c = np.dot(a,b)
printMatrix(c)
c1 = np.dot(a1,b1)
printMatrix(c1)
with open('complex_in.txt') as fid:
    a2=readMatrixFileComplex(fid)
    print(a2)
    b2=readMatrixFileComplex(fid)
    print(b2)
print ("complex matrix multiplication")
c2 = np.dot(a2,b2)
print(c2)
print ("real part of complex matrix")
printMatrix(c2.real)
print ("imaginary part of complex matrix")
printMatrix(c2.imag)
as input file I take in.txt:
4 4
1 1 1 1
2 4 8 16
3 9 27 81
4 16 64 256
4 3
4.02 -3.0 4.0
-13.0 19.0 -7.0
3.0 -2.0 7.0
-1.0 1.0 -1.0
3 4
1 2 -2 0
-3 4 7 2
6 0 3 1
4 2
-1 3
0 9
1 -11
4 -5
and complex_in.txt
3 4
1+1j 2+2j -2-2j 0+0j
-3-3j 4+4j 7+7j 2+2j
6+6j 0+0j 3+3j 1+1j
4 2
-1-1j 3+3j
0+0j 9+9j
1+1j -11-11j
4+4j -5-5j
and the output looks like:
Matrix[4][4]
1 1 1 1
2 4 8 16
3 9 27 81
4 16 64 256
Matrix[4][3]
4.02 -3 4
-13 19 -7
3 -2 7
-1 1 -1
Matrix[3][4]
1 2 -2 0
-3 4 7 2
6 0 3 1
Matrix[4][2]
-1 3
0 9
1 -11
4 -5
matrix multiplication
Matrix[4][3]
-6.98 15 3
-35.96 70 20
-104.94 189 57
-255.92 420 96
Matrix[3][2]
-3 43
18 -60
1 -20
[[ 1.+1.j 2.+2.j -2.-2.j 0.+0.j]
[-3.-3.j 4.+4.j 7.+7.j 2.+2.j]
[ 6.+6.j 0.+0.j 3.+3.j 1.+1.j]]
[[ -1. -1.j 3. +3.j]
[ 0. +0.j 9. +9.j]
[ 1. +1.j -11.-11.j]
[ 4. +4.j -5. -5.j]]
complex matrix multiplication
[[ 0. -6.j 0. +86.j]
[ 0. +36.j 0.-120.j]
[ 0. +2.j 0. -40.j]]
real part of complex matrix
Matrix[3][2]
0 0
0 0
0 0
imaginary part of complex matrix
Matrix[3][2]
-6 86
36 -120
2 -40
To keep the answer simple, here is a program that reads integers from a file and sorts them:
f = open("input.txt", 'r')
nums = f.readlines()
nums = [int(i) for i in nums]
This reads each line of the file and converts each string to an integer.
nums.sort()
This sorts the numbers.
f.close()
f = open("input.txt", 'w')
for num in nums:
    f.write("%d\n" %num)
f.close()
This writes them back to the file.
It is as easy as that. Hope this helps.
The shortest I can think of is:
with open("file") as f:
    (w, h), data = [int(x) for x in f.readline().split()], [int(x) for x in f.read().split()]
You can separate (w, h) and data into two statements if that looks neater.
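Note that data here is one flat list rather than a two-dimensional array. A minimal sketch of cutting it into h rows of w values follows; it uses an in-memory io.StringIO object as a stand-in for the real file, which is an assumption made only to keep the example self-contained.

```python
import io

# Stand-in for the file from the question: "w h" on the first line,
# followed by h rows of w integers.
sample = io.StringIO("4 3\n1 2 3 4\n2 3 4 5\n6 7 8 9\n")

w, h = [int(x) for x in sample.readline().split()]
data = [int(x) for x in sample.read().split()]

# Cut the flat list into h rows of w values each.
array = [data[r * w:(r + 1) * w] for r in range(h)]
print(array)
```

Because read().split() ignores line breaks, this variant relies on w and h to recover the row structure, unlike the line-by-line answers above.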