Read and replace the contents of a line using dictionary - python

I have a file below where I want to convert what is written on every fourth line into a number.
sample.fastq
#HISE
GGATCGCAATGGGTA
+
CC@!$%*&J#':AAA
#HISE
ATCGATCGATCGATA
+
()**D12EFHI@$;;
Each fourth line is a series of characters, each of which equates to a number (stored in a dictionary). I would like to convert each character into its corresponding number and then find the average of all those numbers on that line.
I have gotten as far as being able to display each of the characters individually, but I'm pretty stumped as to how to replace the characters with their numbers and then go on from there.
script.py
d = {
'!':0, '"':1, '#':2, '$':3, '%':4, '&':5, '\'':6, '(':7, ')':8,
'*':9, '+':10, ',':11, '-':12, '.':13, '/':14, '0':15,'1':16,
'2':17, '3':18, '4':19, '5':20, '6':21, '7':22, '8':23, '9':24,
':':25, ';':26, '<':27, '=':28, '>':29, '?':30, '@':31, 'A':32, 'B':33,
'C':34, 'D':35, 'E':36, 'F':37, 'G':38, 'H':39, 'I':40, 'J':41 }
with open('sample.fastq') as fin:
    for i in fin.readlines()[3::4]:
        for j in i:
            print(j)
The output should be as below and stored in a new file.
output.txt
#HISE
GGATCGCAATGGGTA
+
19 #From 34 34 31 0 3 4 9 5 41 2 6 25 32 32 32
#HISE
ATCGATCGATCGATA
+
23 #From 7 8 9 9 35 16 17 36 37 39 40 31 3 26 26
Is what I'm proposing possible?

You can do this with a for loop over the input file lines:
with open('sample.fastq') as fin, open('outfile.fastq', "w") as outf:
    for i, line in enumerate(fin):
        if i % 4 == 3:  # only change every fourth line
            # strip the trailing newline before looking up each character
            qualities = [d[ch] for ch in line.rstrip("\n")]
            # take the average quality score. Note that integer division
            # truncates the average to an integer
            average = sum(qualities) // len(qualities)
            # new version; average with \n at end
            line = str(average) + "\n"
        # write line (or new version thereof)
        outf.write(line)
This produces the following output (the second average, 22.6, is truncated to 22 rather than rounded to 23 as in your example):
#HISE
GGATCGCAATGGGTA
+
19
#HISE
ATCGATCGATCGATA
+
22

Assuming you read from stdin and write to stdout:
import sys
for i, line in enumerate(sys.stdin, 1):
    line = line.rstrip("\n")  # Remove newline
    if i % 4 != 0:
        print(line)
        continue
    nums = [d[c] for c in line]
    print(sum(nums) / float(len(nums)))
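The lookup table in the question appears to follow the common Phred+33 convention, where each score is the character's ASCII code minus 33 ('!' is 0, 'J' is 41). Assuming that holds for the whole file, a minimal sketch can skip the dictionary entirely. The names mean_quality and convert_fastq_lines are hypothetical helpers, not part of either answer above:

```python
def mean_quality(quality_line, offset=33):
    """Average score of one quality line, assuming Phred+33 encoding."""
    # ord(ch) - 33 reproduces the question's dictionary ('!' -> 0, 'J' -> 41)
    scores = [ord(ch) - offset for ch in quality_line.rstrip("\n")]
    return sum(scores) / float(len(scores))


def convert_fastq_lines(lines):
    """Return a copy of lines with every fourth line replaced by its truncated average."""
    converted = []
    for index, line in enumerate(lines):
        if index % 4 == 3:
            # int() truncates; use round() instead if rounding is wanted
            converted.append("%d\n" % int(mean_quality(line)))
        else:
            converted.append(line)
    return converted
```

The result of convert_fastq_lines(fin.readlines()) could then be passed to writelines() on the output file. An empty quality line would raise ZeroDivisionError, so real data may need a guard.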

Related

ValueError: invalid literal for int() with base 10: ':'

I am using Jupyter with Python 3. I tried to import data from a .tsp file, but I keep getting this error. I saw that some people had the same problem and solved it by converting the value, but that did not work for my code.
NAME: berlin52
TYPE: TSP
COMMENT: 52 locations in Berlin (Groetschel)
DIMENSION : 52
EDGE_WEIGHT_TYPE : EUC_2D
NODE_COORD_SECTION
1 565.0 575.0
2 25.0 185.0
3 345.0 750.0
4 945.0 685.0
5 845.0 655.0
6 880.0 660.0
7 25.0 230.0
8 525.0 1000.0
9 580.0 1175.0
10 650.0 1130.0
# Open input file
infile = open(r'C:\Users\13136\OneDrive\Desktop\AI\berlin52.tsp')
# Read instance header
Name = infile.readline().strip().split()[1] # NAME
FileType = infile.readline().strip().split()[1] # TYPE
Comment = infile.readline().strip().split()[1] # COMMENT
Dimension = infile.readline().strip().split()[1] # DIMENSION
EdgeWeightType = infile.readline().strip().split()[1] # EDGE_WEIGHT_TYPE
infile.readline()
# Read node list
nodelist = []
N = int(Dimension)
for i in range(0, int(Dimension)):
    x,y = infile.readline().strip().split()[1:]
    nodelist.append([float(x), float(y)])
# Close input file
infile.close()
ValueError Traceback (most recent call last)
<ipython-input-22-5e3fe725955a> in <module>
12 # Read node list
13 nodelist = []
---> 14 N = int(Dimension)
15 for i in range(0, int(Dimension)):
16 x,y = infile.readline().strip().split()[1:]
ValueError: invalid literal for int() with base 10: ':'
Name = infile.readline().strip().split(':')[1] # NAME
FileType = infile.readline().strip().split(':')[1] # TYPE
Comment = infile.readline().strip().split(':')[1] # COMMENT
Dimension = infile.readline().strip().split(':')[1] # DIMENSION
EdgeWeightType = infile.readline().strip().split(':')[1] # EDGE_WEIGHT_TYPE
The two lines for DIMENSION and EDGE_WEIGHT_TYPE in your file do not have the : immediately following the name, but have some extra space in between, so split() will split these lines at each space, into three parts, e.g.:
['DIMENSION', ':', '52']
You are selecting the second part, which cannot be interpreted as an int. You always want the part of the line that follows the :, so split the line on : rather than on whitespace, which is what split(':') does for you, e.g.:
['DIMENSION ', ' 52']
The extra whitespace could be removed with a .strip() call on the selected part, but int will also accept it as-is.
Dimension = infile.readline().split(':')[1].strip()
This will still cut off fields containing an extra :, but I suppose such special cases are not that important to you here.
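If fields with an extra : do matter, one option is to split at the first colon only. The following is a small sketch using str.partition; parse_header_line and the sample list are hypothetical names for illustration, not part of the original code:

```python
def parse_header_line(line):
    """Split 'KEY : value' into (key, value) at the first colon only."""
    key, _, value = line.partition(":")
    return key.strip(), value.strip()


# Sample header lines taken from the question's file
sample = [
    "NAME: berlin52",
    "DIMENSION : 52",
    "EDGE_WEIGHT_TYPE : EUC_2D",
]

header = {}
for raw in sample:
    key, value = parse_header_line(raw)
    header[key] = value

dimension = int(header["DIMENSION"])
```

Storing the header in a dict also means the code no longer depends on the header lines arriving in a fixed order.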

How to read a .txt file and add a space after specific positions/indexes in each line in Python

I want to read every line in a .txt file and add a space after specific positions/indexes.
Suppose my file contains:
1234567891011 12134516 17
In the above file, I want to add a space after indexes/positions [8], [10], [16], and [20], i.e. at irregular positions.
expected output :
123456789 10 11 121 3451 6 17
As a reminder: I don't want to replace any characters, just add a space after specific positions/indexes.
Note: all lines in the file have the same structure.
I need to read the .txt file, add the spaces at those positions, and write the result back.
My file contains:
1234567891011 12134516 17
6546546546456 35654554 54
expected output :
123456789 10 11 121 3451 6 17
654654654 64 56 356 5455 4 54
You can read the file line by line and put the lines in a list using the following code:
f=open('configuration.txt','r+')
lines=f.readlines()
Now you can access each line as a string, e.g. line 0 as lines[0], and you can insert a space at index 8 like this:
lines[0]=lines[0][:8] + ' ' + lines[0][8:]
To rewrite the file, move back to the start of the file and use "".join(lines) to convert the list to a string:
f.seek(0)
f.write("".join(lines))
As I understand it, the following code will solve your problem:
f=open('test.txt','r+')
lines=f.readlines()
for i in range(0,len(lines)):
    if lines[i]=='\n':
        continue
    # each later index is shifted right by the spaces already inserted
    lines[i]=lines[i][:9] + ' ' + lines[i][9:]
    lines[i]=lines[i][:12] + ' ' + lines[i][12:]
    lines[i]=lines[i][:19] + ' ' + lines[i][19:]
    lines[i]=lines[i][:24] + ' ' + lines[i][24:]
f.seek(0,0)
f.write("".join(lines))
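The hard-coded offsets above (9, 12, 19, 24) are the question's indexes shifted by the spaces already inserted. As an alternative sketch, the line can be cut at the original indexes and the pieces joined with spaces, so no shifting is needed. insert_spaces is a hypothetical helper name, and the code assumes every index is smaller than the line length:

```python
def insert_spaces(line, positions):
    """Insert one space after each 0-based index in positions (original indexes)."""
    pieces = []
    start = 0
    for pos in sorted(positions):
        # take everything up to and including the character at index pos
        pieces.append(line[start:pos + 1])
        start = pos + 1
    pieces.append(line[start:])  # remainder, including any trailing newline
    return " ".join(pieces)


positions = [8, 10, 16, 20]
result = insert_spaces("1234567891011 12134516 17", positions)
```

Applied to every line from readlines(), this gives a list that can be written back with writelines().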

Working with numbers in Python

Hello. I am very new to Python and programming in general.
I have 3 columns from CSV file
X,CH1,CH2,
Second,Volt,Volt,
2.66400e-02,4.00e-03,1.04e-03,
-2.66360e-02,4.00e-03,7.20e-04,
-2.66320e-02,4.00e-03,5.60e-04,
-2.66280e-02,4.00e-03,3.20e-04,
-2.66240e-02,4.00e-03,8.00e-05,
-2.66200e-02,4.00e-03,-2.40e-04,
-2.66160e-02,4.00e-03,-5.60e-04,
-2.66120e-02,4.00e-03,-7.20e-04,
-2.66080e-02,4.00e-03,-1.04e-03, ***for example.***
I am using:
with open('maximum.csv', 'rb') as f:
    reader = csv.reader(f, delimiter=',')
    for _ in xrange(2):
        next(f)
to skip the first two lines, as they are just text, and then
    for row in reader:
        x=(float(row[2]))
        print(x)
gives me
0.00104
0.00072
0.00056
0.00032
8e-05
-0.00024
-0.00056
-0.00072
-0.00104
So there is the question:
What should I write, so that it will give me an integer number instead of decimals, like
104
72
56
24
8
24
56
72
104
P.S I do not want just to multiply by 10^5
Thanks
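You do have to multiply by 10**5 (in Python, ^ is bitwise XOR, not exponentiation), because you actually want a bigger number.
Then apply int() to get 104 instead of 104.0. Rounding first avoids truncating a product that lands just below a whole number because of floating-point error.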
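A minimal sketch of that scaling step, using a few of the CH2 values from the question. scale_to_int is a hypothetical helper name, and the factor of 10**5 is taken from the question's expected output:

```python
def scale_to_int(value, factor=10**5):
    """Scale a small float and return the nearest integer."""
    # round() before int() guards against results such as 71.99999999999999
    return int(round(value * factor))


readings = [1.04e-03, 7.20e-04, 8.00e-05, -2.40e-04]
scaled = [scale_to_int(x) for x in readings]
```

Negative readings stay negative here; wrapping the value in abs() would be needed if only magnitudes are wanted.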

How to sort output data into columns and rows

I have an output that looks like this, where the first number corresponds to the count of the type below (e.g. 72 for Type 4, etc)
72
Type
4
51
Type
5
66
Type
6
78
Type
7
..etc
Is there a way to organize this data to look something like this:
Type 4 = 72 times
Type 5 = 51 times
Type 6 = 66 times
etc..
Essentially, the question is how to take a single column of data and sort /organize it into something more readable using bash, awk, python, etc. (Ideally, in bash, but interested to know how to do in Python).
Thank you.
Use paste to join 3 consecutive lines from stdin, then just rearrange the fields.
paste - - - < file | awk '{print $2, $3, "=", $1, "times"}'
It's simple enough with Python to read three lines of data at a time:
def perthree(iterable):
    return zip(*[iter(iterable)] * 3)
with open(inputfile) as infile:
    for count, type_, type_num in perthree(infile):
        print('{} {} = {} times'.format(type_.strip(), type_num.strip(), count.strip()))
The .strip() calls remove any extra whitespace, including the newline at the end of each line of input text.
Demo:
>>> with open(inputfile) as infile:
...     for count, type_, type_num in perthree(infile):
...         print('{} {} = {} times'.format(type_.strip(), type_num.strip(), count.strip()))
...
Type 4 = 72 times
Type 5 = 51 times
Type 6 = 66 times
Type 7 = 78 times
In Bash:
#!/bin/bash
A=() I=0
while read -r LINE; do
    if (( (M = ++I % 3) )); then
        A[M]=$LINE
    else
        printf "%s %s = %s times\n" "${A[2]}" "$LINE" "${A[1]}"
    fi
done
Running bash script.sh < file creates:
Type 4 = 72 times
Type 5 = 51 times
Type 6 = 66 times
Type 7 = 78 times
Note: With a default IFS ($' \t\n'), read would remove leading and trailing spaces by default.
Try this awk one liner:
$ awk 'NR%3==1{n=$1}NR%3==2{t=$1}NR%3==0{print t,$1,"=",n,"times"}' file
Type 4 = 72 times
Type 5 = 51 times
Type 6 = 66 times
Type 7 = 78 times
How it works:
awk '
NR%3==1{ # if we are on lines 1,4,7, etc. (NR is the record number, i.e. the line number)
    n=$1 # set the variable n to the first (and only) word
}
NR%3==2{ # if we are on lines 2,5,8, etc.
    t=$1 # set the variable t to the first (and only) word
}
NR%3==0{ # if we are on lines 3,6,9, etc.
    print t,$1,"=",n,"times" # print the desired output
}' file

Complex parsing query

I have a very complex parsing problem. Any thoughts would be appreciated here. I have a test.dat file. The file to be parsed looks like this:
* Number = 40
Time = 0
1 10.13 10 10.11 12 13
.
.
Time = n
1 10 10 10 12.50 13
.
.
There are N time blocks and each block has 40 lines like those shown above. What I would like to do is add e.g. the 1st line of the first block, then the 1st line of block #2, and so on, to a new file, test_1.dat. Similarly, the 2nd line of every block goes to test_2.dat, and so on. The lines in the block should be written as-is to the new test_n.dat file. Is there any way to do this? The number I have assumed here is 40, so if * Number = 40 there will be 40 lines under each time block.
regards,
Ris
You can read the file in as a list of strings (call it fileList), where each string is a different line:
f = open('filename')
fileList = f.readlines()
Then, remove the "header" part of your file with
fileList.pop(0)
fileList.pop(0)
Then, do
outFileContents = {} # This will be a dict, where number -> content of test_number.dat
numBlocks = (len(fileList) + 1) // 42 # number of time blocks actually present in the file
for outFileName in range(1,41): #outFileName will be the number going after the _ in your filename
    outFileContents[outFileName] = []
    for n in range(numBlocks): # Counting through the time blocks
        currentRowIndex = (42 * n) + outFileName # 42 to account for the Time = and blank row
        outFileContents[outFileName].append(fileList[currentRowIndex])
Finally you can loop through outFileContents and write the contents of each value to separate files.
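The grouping and the final writing step could be sketched as follows. Instead of a fixed stride of 42, this version assumes each block starts with a line beginning with "Time" and treats every non-blank line after it as a data row. group_rows_by_position and write_groups are hypothetical names, and the test_N.dat naming follows the question:

```python
from collections import defaultdict


def group_rows_by_position(lines):
    """Map row position within a block (1, 2, ...) to that row from every block."""
    rows = defaultdict(list)
    position = 0
    in_block = False
    for line in lines:
        stripped = line.strip()
        if stripped.startswith("Time"):
            # a new time block begins; restart the row counter
            position = 0
            in_block = True
        elif in_block and stripped:
            position += 1
            rows[position].append(line)
    return dict(rows)


def write_groups(rows, prefix="test"):
    """Write each group to prefix_<position>.dat, keeping the lines as-is."""
    for position, block_lines in rows.items():
        with open("%s_%d.dat" % (prefix, position), "w") as out:
            out.writelines(block_lines)
```

Calling write_groups(group_rows_by_position(fileList)) would then create one file per row position. Detecting block starts this way avoids hard-coding the block count, at the cost of assuming that no data row itself begins with "Time".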
