I was writing a Python script that converts an ASCII file containing one pair of numbers per line to a straight binary representation. Here is my script:
in_file = open("p02_0609.bin", 'r')
out_file = open("sta013.bin", 'w')
out_data = bytearray()
for line in in_file:
    addr, i2c_data = [int(x) for x in line.split(" ")]
    out_data.append(addr)
    out_data.append(i2c_data)
out_file.write(out_data)
out_file.close()
in_file.close()
And here is a sample of the file it reads (about 2000 lines in total):
58 1
42 4
40 0
41 0
32 0
33 0
34 0
35 0
36 0
37 0
38 0
39 0
40 1
40 2
33 143
40 3
33 0
40 4
40 5
40 6
40 7
40 8
40 9
40 10
40 11
The output file has an odd number of bytes, which it shouldn't since all the data comes in pairs, and it is about 80 bytes longer than expected. After poking around with a hex editor, I finally found the culprit: every byte with value 10 (ASCII LF) has had a CR inserted in front of it. How do I make it stop doing that?
TL;DR: Python is being a dumbass and adding a CR before every LF in binary data, where that makes no sense. How do I fix it?
You are opening the output file in text mode ('w'). On Windows, text mode translates every LF byte (0x0A) you write into CR LF, which is where the extra bytes come from. Use mode 'wb' to write the bytes unchanged (and 'rb' when you need to read raw bytes).
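A minimal sketch of the corrected script, assuming the same "addr data" line format as the sample. The function name convert and the blank-line check are illustrative additions, not part of the original. The essential change is opening the output with 'wb'; the with blocks also guarantee both files are closed:

```python
def convert(in_path, out_path):
    """Pack each "addr data" text line into two raw bytes."""
    out_data = bytearray()
    with open(in_path, "r") as in_file:
        for line in in_file:
            if not line.strip():
                continue  # skip blank lines, e.g. a trailing empty line
            # split() with no argument also discards the trailing newline
            addr, i2c_data = (int(x) for x in line.split())
            out_data.append(addr)
            out_data.append(i2c_data)
    # Binary mode: bytes are written exactly as given, no LF -> CR LF translation
    with open(out_path, "wb") as out_file:
        out_file.write(out_data)
    return len(out_data)
```

Every non-blank input line produces exactly two bytes, so the returned length should always be even.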
I want to convert a text file to an Excel file without deleting the spaces in each line.
Note that all lines of the file have the same number of columns.
The text file has the following format:
First row
05100079 0000001502 5 01 2 070 1924 02 06 1994 C508 2 8500 3 8500 3 3 1 1 012 10 0 98 00 4 8 8 9 0 40 01 2 15 26000 1748 C508 116 102 3 09 98 013 1 1 0 1 10 10 0 09003 50060 50060 0 0 369 99 9 1 4 4 5 8 0 0181 1 80 00 01 0 9 9 8 1 0 00 00 020 0
Second row
05100095 0000001502 2 01 2 059 1917 02 03 1977 C504 2 8500 3 8500 3 9 1 1 54-11-0999-00 2 9 0 90 01 2 12 26000 1744 C504 116 102 3 09 98 013 1 1 0 2 0 09011 50060 50060 0 36 9 9 1 9 9 5 8 0 3161 9 9 8 020 0
How can I edit the code to convert the text file to an Excel file without deleting the spaces between the data?
The code below deletes the spaces in each line.
I mean converting the file to an Excel sheet without any modification to the original content:
the spaces stay spaces and all other data keeps the same format.
import xlwt
import xlrd
book = xlwt.Workbook()
ws = book.add_sheet('First Sheet') # Add a sheet
f = open('testval.txt', 'r+')
data = f.readlines() # read all lines at once
for i in range(len(data)):
    row = data[i].split() # This will return a line of string data, you may need to convert to other formats depending on your use case
    for j in range(len(row)):
        ws.write(i, j, row[j]) # Write to cell i, j
book.save('testval' + '.xls')
f.close()
Expected output:
An Excel file with the same layout as the original text file.
If you have fixed-length fields, you need to split each line using index intervals.
For instance, you can do:
import io
import xlwt

book = xlwt.Workbook()
ws = book.add_sheet('First Sheet') # Add a sheet
with io.open("testval.txt", mode="r", encoding="utf-8") as f:
    for row_idx, row in enumerate(f):
        row = row.rstrip()
        # Slice each fixed-width field by its character offsets
        ws.write(row_idx, 0, row[0:8])
        ws.write(row_idx, 1, row[9:19])
        ws.write(row_idx, 2, row[20:21])
        ws.write(row_idx, 3, row[22:24])
        # and so on...
book.save("sample.xls")  # xlwt writes the legacy .xls format, not .xlsx
This gives you one cell per fixed-width field, with each field's text kept exactly as it appears in the file.
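The slicing can also be driven by a table of offsets instead of one ws.write call per field. This is a sketch: FIELD_SPANS, split_fixed_width and read_fixed_width are hypothetical names, and only the first four (start, end) pairs from the answer are filled in; the rest of the layout would have to be read off the real file. Nothing here depends on xlwt, so each returned list can be written cell by cell with ws.write(row_idx, col_idx, value):

```python
# Hypothetical layout: (start, end) character offsets of each field,
# taken from the first four slices in the answer; extend for the rest.
FIELD_SPANS = [(0, 8), (9, 19), (20, 21), (22, 24)]


def split_fixed_width(line, spans=FIELD_SPANS):
    """Cut one line into fields by position, keeping each field's text as-is."""
    line = line.rstrip("\r\n")
    return [line[start:end] for start, end in spans]


def read_fixed_width(path, spans=FIELD_SPANS, encoding="utf-8"):
    """Return a list of field lists, one per line of the file."""
    with open(path, encoding=encoding) as f:
        return [split_fixed_width(line, spans) for line in f]
```

Because the fields are cut by position rather than by split(), a span that covers spaces keeps them inside the cell.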
My binary file test.bin contains:
11 22 33 44 55 66 ...
I want to change the 3rd position to AA, so that my file looks like:
11 22 33 AA 55 66 ....
Open the file for update in binary mode, seek to the desired position in the file, then write the replacement byte. The following will work in Python 2 and 3 and will overwrite the 4th byte of the file (3rd position if counting from 0) with 0xAA.
with open('test.bin', 'rb+') as f:
    f.seek(3)
    f.write(b'\xAA')
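The same idea as a reusable helper; patch_bytes is a hypothetical name. Mode 'rb+' opens the existing file for reading and writing without truncating it (unlike 'wb'), so only the bytes at the given offset change:

```python
def patch_bytes(path, offset, new_bytes):
    """Overwrite len(new_bytes) bytes starting at offset; everything else is kept."""
    with open(path, "rb+") as f:
        f.seek(offset)   # move to the byte to replace (0-based)
        f.write(new_bytes)
```

For the example in the question this would be patch_bytes('test.bin', 3, b'\xAA'). Writing past the current end of the file extends it rather than raising an error.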
I have a text file with the following contents:
2
4 8 15 16 23 42
1 3 5
6
66 77
77
888
888 77
34
23 234 234
1
32
3
23 23 23
365
22 12
I need a way to read the file and sum all the numbers.
I have this code so far but am not sure what to do next. Thanks in advance!
lstComplete = []
fichNbr = open("nombres.txt", "r")
lstComplete = fichNbr
somme = 0
for i in lstComplete:
    i = i.split()
Split the whole content into a list of strings, convert each one to int, and sum them:
with open('nombres.txt', 'r') as f:
    num_list = f.read().split()
print(sum([int(n) for n in num_list]))
Prints 3227
Open the file, get its content with the read() method, convert each token to int, and use sum() to get the result:
>>> sum(map(int,open('nombres.txt').read().split()))
3227
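For a large file, a variant that reads it line by line avoids loading everything into memory at once. sum_file is a hypothetical name, and the sketch assumes every whitespace-separated token is an integer:

```python
def sum_file(path):
    """Sum every whitespace-separated integer in a text file."""
    total = 0
    with open(path) as f:
        for line in f:
            # split() ignores runs of spaces and the trailing newline
            total += sum(int(token) for token in line.split())
    return total
```

On the sample file from the question this returns 3227, the same result as the answers above.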
I have 3 files. In their headers they all have IDs. What I want to do is find the intersecting IDs (file 1 is the template) and then copy the columns with the matching ID behind that ID's column in the template file.
Here is an example:
File 1 is the template:
name 123 124 125 128 131 145 156
rdt4 35 12 23 21 36 34 37
gtf2 24 18 18 29 26 12 40
hzt7 40 23 26 25 13 21 28
File 2:
name 123 124 125 126 127 128 131 132 133 145 156
rdt4 F F F T T F T T T F T
gtf2 F F F T T F T T T F T
hzt7 F F F T T F T T T F T
File 3:
name 123_a 123_b 123_c 124_a 124_b 124_c 125_a 125_b 125_c 126_a 126_b 126_c 127_a 127_b 127_c 128_a 128_b 128_c and so on
rdt4 0,087 0,265 0,632 0,220 0,851 0,271 0,436 0,148 0,080 0,899 0,636 0,467 0,508 0,460 0,393 0,689 0,427 0,798
gtf2 0,770 0,971 0,231 0,969 0,494 0,181 0,989 0,155 0,351 0,131 0,204 0,553 0,581 0,138 0,982 0,287 0,702 0,522
hzt7 0,185 0,535 0,093 0,807 0,487 0,786 0,886 0,905 0,966 0,283 0,490 0,190 0,688 0,714 0,577 0,643 0,476 0,738
The final file should look like this:
name 123 123 123_b 124 124 124_b 125 125 125_b 128 128 128_b 131 131 131_b 145 145 145_b 156 156 156_b
rdt4 35 F 0,265 12 F 0,851 23 F 0,148 21 F 0,427 36 T 34 F 37 T
gtf2 24 F 0,971 18 F 0,494 18 F 0,155 29 F 0,702 26 T 12 F 40 T
hzt7 40 F 0,535 23 F 0,487 26 F 0,905 25 F 0,476 13 T 21 F 28 T
Note: I didn't type out everything for File 3. It has the same IDs as file 2, but in file 3 every ID has 3 columns and I need only one of them (in the example, column b).
What I tried so far:
I started by doing everything with only file 1 and file 2.
I copied the IDs to a new list and then found the positions of these IDs in file 2 to extract the data. But this seems to be very tricky (at least for me). The appending works so far, but the problem is that every list stored in the list final is the same. It would be nice if you could help me with this.
This is my code so far:
try:
    Expr_Matrix_1="file1.txt"
    #Expr_Matrix_1=raw_input('Name file with expression data: ')
    Expr_Matrix_2=open(Expr_Matrix_1)
    Expr_Matrix_3=open(Expr_Matrix_1)
except:
    print 'This is a wrong filename!'
    exit()
try:
    Probe_Detect_1="file2.txt"
    #Probe_Detect_1=raw_input('Name of file with probe detection: ')
    Probe_Detect_2=open(Probe_Detect_1)
    Probe_Detect_3=open(Probe_Detect_1)
except:
    print 'This is a wrong filename!'
    exit()
find_list=list()
for b, line2 in enumerate(Expr_Matrix_2):
    line2 = line2.rstrip()
    line2 = line2.split("\t")
    if b == 0:
        for item in line2:
            find_list.append(item)
find_list=find_list[7:]
find_list2=list()
for i, line in enumerate(Probe_Detect_2):
    line = line.rstrip()
    line = line.split("\t")
    if i == 0:
        for item in find_list:
            find_list2.append(line.index(item))
#print find_list2
index1=8
final=list()
for b, line2 in enumerate(Expr_Matrix_3):
    line2 = line2.rstrip()
    line2 = line2.split("\t")
    for c, line in enumerate(Probe_Detect_3):
        line = line.rstrip()
        line = line.split("\t")
        if line2[b]==line[c]:
            for item in find_list2:
                if len(line2)<1551:
                    line2.insert(index1, line[item])
                    index1=index1+2
            final.append(line2)
print final[1]
The first ID column in file 1 is column 7; that's why I used 7 for slicing.
The 1551 is the number of rows to which it should be copied, but I think this is a completely wrong approach. However, I wanted to show you my attempt!
Another note: all files start with the name column, but between this column and the first ID column there are some columns that shouldn't be considered. Because file 1 is the template, those columns should also be in the final file.
What is the solution?
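One possible approach, sketched under stated assumptions, is to stop searching column positions row by row and instead build lookup tables once: a header-to-column-index map for files 2 and 3, and a name-to-row map for each of them. The sketch assumes each file has already been split into a list of rows (for example with line.rstrip('\n').split('\t')), that rows are matched by the name in column 0, and that the wanted file-3 column is the ID plus '_b'. merge_by_id and first_id_col are hypothetical names; first_id_col plays the role of the 7 in the question (it is 1 in the example, where the IDs start right after name):

```python
def merge_by_id(t1, t2, t3, first_id_col=1, suffix="_b"):
    """Put the matching file-2 and file-3 columns behind each ID column of file 1.

    Each table is a list of rows (lists of strings). Row 0 is the header and
    column 0 holds the row name. Columns of t1 before first_id_col are copied
    unchanged; only IDs present in all three headers are kept.
    """
    h1, h2, h3 = t1[0], t2[0], t3[0]
    col2 = {name: i for i, name in enumerate(h2)}
    col3 = {name: i for i, name in enumerate(h3)}
    rows2 = {row[0]: row for row in t2[1:]}
    rows3 = {row[0]: row for row in t3[1:]}

    # (ID, column index in t1) for every ID that also appears in t2 and t3
    ids = [(name, i) for i, name in enumerate(h1)
           if i >= first_id_col and name in col2 and name + suffix in col3]

    header = list(h1[:first_id_col])
    for name, _ in ids:
        header += [name, name, name + suffix]
    merged = [header]

    for row in t1[1:]:
        key = row[0]
        if key not in rows2 or key not in rows3:
            continue  # row name missing from file 2 or file 3
        r2, r3 = rows2[key], rows3[key]
        out = list(row[:first_id_col])  # template columns are kept as-is
        for name, i in ids:
            out += [row[i], r2[col2[name]], r3[col3[name + suffix]]]
        merged.append(out)
    return merged
```

Each output row is built as a new list, so the rows in the result are independent of each other. With the truncated File 3 shown in the question, IDs 131, 145 and 156 would be dropped because they have no _b column there; with the full file they would be kept.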
In index.csv, the fourth column holds ten numbers ranging from 1 to 5. Each number can be regarded as an index, and each index corresponds to a row of numbers in filename.csv.
The row number in filename.csv is the index, and each row has three numbers. My question is about using a nested loop to transfer the numbers from filename.csv to index.csv.
from numpy import genfromtxt
import numpy as np
import csv
import collections
data1 = genfromtxt('filename.csv', delimiter=',')
data2 = genfromtxt('index.csv', delimiter=',')
out = np.zeros((len(data2),len(data1)))
for row in data2:
    for ch_row in range(len(data1)):
        if (row[3] == ch_row + 1):
            out = row.tolist() + data1[ch_row].tolist()
            print(out)
writer = csv.writer(open('dn.csv','w'), delimiter=',',quoting=csv.QUOTE_ALL)
writer.writerow(out)
For example, the fourth column of index.csv contains 1,2,5,3,4,1,4,5,2,3 and filename.csv contains:
# filename.csv
20 30 50
70 60 45
35 26 77
93 37 68
13 08 55
What I need is to write the indexed row from filename.csv into index.csv, storing those numbers in the 5th, 6th and 7th columns:
# index.csv
# 4 5 6 7
... 1 20 30 50
... 2 70 60 45
... 5 13 08 55
... 3 35 26 77
... 4 93 37 68
... 1 20 30 50
... 4 93 37 68
... 5 13 08 55
... 2 70 60 45
... 3 35 26 77
If I use print(out), I get the correct answer. However, when I type out in the shell afterwards, only one row appears, like [1.0, 1.0, 1.0, 1.0, 20.0, 30.0, 50.0].
What I need is to store all the rows in the out variable and write them to the dn.csv file.
This ought to do the trick for you:
Code:
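from csv import reader, writer

with open("filename.csv", "r", newline="") as f:
    data = list(reader(f, delimiter=" "))
# with blocks make sure output.csv is flushed and closed when done
with open("index.csv", "r", newline="") as f_in, open("output.csv", "w", newline="") as f_out:
    out = writer(f_out, delimiter=" ")
    for row in reader(f_in, delimiter=" "):
        out.writerow(row + data[int(row[3])])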
index.csv:
0 0 0 1
0 0 0 2
0 0 0 3
filename.csv:
20 30 50
70 60 45
35 26 77
93 37 68
13 08 55
This produces the output:
0 0 0 1 70 60 45
0 0 0 2 35 26 77
0 0 0 3 93 37 68
Note: there's no need to use numpy here. The standard library csv module will do most of the work for you.
I also had to modify your sample datasets a bit, because with 0-based indexing some of the indexes you showed (such as 5) would be out of bounds for the five rows in filename.csv.
Please also note that Python (like most languages) uses 0-based indexing, so you may have to adjust the code above to fit your needs exactly.
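Alternatively, reusing data1 and data2 from the question, write each row inside the loop instead of overwriting out. Since genfromtxt returns floats, the index has to be converted to int before it can be used to index data1:
import csv

with open('dn.csv', 'w', newline='') as f:
    writer = csv.writer(f, delimiter=',', quoting=csv.QUOTE_ALL)
    for row in data2:
        idx = int(row[3])  # genfromtxt yields floats; numpy needs an int index
        out = [idx] + [x for x in data1[idx-1]]
        writer.writerow(out)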
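Both answers can be combined into one small sketch that keeps the question's 1-based indexes (1 to 5) and writes every row instead of only the last one. join_by_index and join_files are hypothetical names, and the sketch assumes the index column holds integer text such as "3" and that both files use the same delimiter:

```python
import csv


def join_by_index(index_rows, data_rows, col=3):
    """Append data_rows[k - 1] to each index row, k being the 1-based value in column col."""
    return [row + data_rows[int(row[col]) - 1] for row in index_rows]


def join_files(index_path, data_path, out_path, delimiter=","):
    """Read both CSV files, join them by index and write all rows to out_path."""
    with open(data_path, newline="") as f:
        data_rows = list(csv.reader(f, delimiter=delimiter))
    with open(index_path, newline="") as f:
        index_rows = list(csv.reader(f, delimiter=delimiter))
    with open(out_path, "w", newline="") as f:
        csv.writer(f, delimiter=delimiter).writerows(join_by_index(index_rows, data_rows))
```

Keeping the join in a pure function that returns a list of rows makes it easy to inspect the whole result in the shell before writing it out.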