Suppose I have a file named num.txt with the following content:
1 2 3 4 5
6 7 8 9 0
I want to read 3 integers from this file, that is, 1 2 3.
I know that struct.unpack might do the trick, but I just cannot get it right.
Here is how I did it:
fp = open('num.txt', 'rb')
print struct.unpack('iii', fp.read(12)) #right?
Can anyone help me with this?
PS
This is how I got file num.txt:
fp = open('num.txt', 'wb')
fp.write('1 2 3 4 5\n6 7 8 9 0')
fp.close()
You don't use struct to read numbers from a text file. It is for reading data from a binary file -- one where the first byte is actually 0x01, rather than a byte order mark or the encoded value of the character '1' (0x31 in ASCII).
You just want
three_ints = [int(x) for x in numfile.readline().strip().split(' ')[:3]]
if you're only interested in the first three numbers, or
all_ints = [[int(x) for x in line.split()] for line in numfile]
if you want a list of lists of the ints on each line.
struct is used for C-style binary representations of numbers. If you have text representations instead then you should just pass them to int().
>>> [int(x) for x in '1 2 3 4 5'.split()]
[1, 2, 3, 4, 5]
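For contrast, struct.unpack is the right tool only when the file really is binary, i.e. it was written with struct.pack (or by a C program) rather than as text. A minimal sketch, with a made-up file name nums.bin:

import struct

# write three C-style native ints (normally 4 bytes each, 12 bytes total)
with open('nums.bin', 'wb') as f:
    f.write(struct.pack('iii', 1, 2, 3))

# read back exactly as many bytes as the format needs and unpack them
with open('nums.bin', 'rb') as f:
    print(struct.unpack('iii', f.read(struct.calcsize('iii'))))  # (1, 2, 3)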
While writing a simple Python script, I encountered a weird problem: two files with different content have the same size.
I have the same binary data in two forms, once as a string of characters and once as a list of ints:
char_list = '10101010'
int_list = [1, 0, 1, 0, 1, 0, 1, 0]
Then, I convert lists to bytearray:
bytes_from_chars = bytearray(char_list, "ascii")
bytes_from_ints = bytearray(int_list)
Printing these out gives me this result:
bytearray(b'10101010')
bytearray(b'\x01\x00\x01\x00\x01\x00\x01\x00')
But this is OK.
Writing this data to disk:
with open("from_chars.hex", "wb") as f:
f.write(bytes_from_chars)
with open("from_ints.hex", "wb") as f:
f.write(bytes_from_ints)
The sizes of the two files are the same, but they contain different data!
ls -l and hexdumps of the two files (output omitted here) confirm this: equal sizes, different bytes.
And my question is: why are the file sizes equal? As I understand it, to store the value 0 or 1 we need only 1 bit, while to store the byte values 0x30 or 0x31 (the ASCII codes of '0' and '1') we need 6 bits (110000 and 110001).
To write the value 0 or 1 you cannot get away with a single bit. Even at the bit level, how would you tell the difference between 3 = 11 and two separate 1s?
In both cases you are writing an array of 8 bytes. In the first case each byte holds the ASCII code of a character ('0' is 0x30, '1' is 0x31); in the second, each byte holds the integer value 0 or 1 directly.
Think of it as writing a word out of the letters 0 and 1: the word 1 is stored as 0000 0001. Without the leading 0s, you would not be able to tell where one word ends and the next begins.
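To see this concretely, here is a small sketch (binascii.hexlify is only used here to display the raw bytes; the variable names are taken from the question):

import binascii

char_list = '10101010'
int_list = [1, 0, 1, 0, 1, 0, 1, 0]

bytes_from_chars = bytearray(char_list, "ascii")  # ASCII codes: 0x31, 0x30, ...
bytes_from_ints = bytearray(int_list)             # raw values:  0x01, 0x00, ...

print(len(bytes_from_chars))                  # 8
print(len(bytes_from_ints))                   # 8 -- same size on disk
print(binascii.hexlify(bytes_from_chars))     # 3130313031303130 (ASCII '1','0',...)
print(binascii.hexlify(bytes_from_ints))      # 0100010001000100 (raw 1s and 0s)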
I have a 1 MB text file that contains numeric strings and letter strings with lengths of 3, 5, 8, 9 and 10 characters. How can I find all the numbers that are exactly 8 characters long? These numbers must then be extracted and saved to a file called extracted.txt. How can I do this?
Example...
file.txt
91664356
1665
00643
qouytyi
15790008
1567065
abcdeigf
qoiyytgxf
931467846
00851685
150033561246788
074226899
extracted.txt
91664356
15790008
15670654
00851685
Use:
import re

with open('data.txt', 'r') as myfile:
    data = myfile.read()

numbers = re.findall(r'\D(\d{8})\D', data)
It will catch numbers that are exactly 8 digits long, and it will not pick up tokens like 478319.3 that merely contain digits.
It outputs a list of such numbers.
Example
Let
123.32 is a good number 12 also 12345678 478319.3
be the contents of the file.
The output will be:
['12345678']
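The question also asks for the matches to be saved to extracted.txt; continuing from the numbers list above, a minimal sketch could be:

# write one 8-digit match per line to extracted.txt
with open('extracted.txt', 'w') as out:
    out.write('\n'.join(numbers) + '\n')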
Is your source file like this?
12345678 123456789 1234567 abcdefg abcdefgh abcdefghi
Then maybe this script can help you:
f = open('stack.txt')
s = open('save.txt', 'w')
for i in f.read().split(' '):
    if len(i) != 8:
        continue
    else:
        try:
            print(int(i))  # if it is not a number, int() will raise an error
            s.write(i)
        except ValueError:
            pass
f.close()
s.close()
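One caveat, not from the original answer: f.read().split(' ') only splits on single spaces, so tokens separated by newlines or runs of spaces stay glued together. If the input may contain arbitrary whitespace, split() with no argument is safer:

for i in f.read().split():  # splits on any run of whitespace, including newlines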
I'm dealing with a character separated hex file, where each field has a particular start code. I've opened the file as 'rb', but I was wondering, after I get the index of the startcode using .find, how do I read a certain number of bytes from this position?
This is how I am loading the file and what I am attempting to do
with open(someFile, 'rb') as fileData:
    startIndex = fileData.find('(G')
    data = fileData[startIndex:7]
where 7 is the number of bytes I want to read from the index returned by the find function. I am using python 2.7.3
You can get the position of a substring in a bytestring under python2.7 like this:
>>> with open('student.txt', 'rb') as f:
... data = f.read()
...
>>> data # holds the French word for student: élève
'\xc3\xa9l\xc3\xa8ve\n'
>>> len(data) # this shows we are dealing with bytes here, because "élève\n" would be 6 characters long, had it been properly decoded!
8
>>> len(data.decode('utf-8'))
6
>>> data.find('\xa8') # continue with the bytestring...
4
>>> bytes_to_read = 3
>>> data[4:4+bytes_to_read]
'\xa8ve'
You can look for the special characters directly, and for compatibility with Python 3 it is better to prefix the literal with a b, indicating that these are bytes (in Python 2.x it works without it, too):
>>> data.find(b'è') # in python2.x this works too (unfortunately, because it has lead to a lot of confusion): data.find('è')
3
>>> bytes_to_read = 3
>>> pos = data.find(b'è')
>>> data[pos:pos+bytes_to_read] # when you use the syntax 'n:m', it will read bytes in a bytestring
'\xc3\xa8v'
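Coming back to the original snippet: a file object has no find method, so the file has to be read into a bytestring first. A minimal sketch, reusing the '(G' start code and the 7-byte length from the question (someFile is assumed to be defined elsewhere):

with open(someFile, 'rb') as f:
    data = f.read()                 # the whole file as a bytestring

start = data.find(b'(G')            # index of the start code, or -1 if not found
if start != -1:
    chunk = data[start:start + 7]   # 7 bytes starting at the start code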
I want to write a 2D numpy array to a human-readable text file. I came across the question below, which was asked before, but it only uses an equal amount of space for every element in the array; there, all elements are padded to 10 characters. What I want is a different width for each column of my array.
Writing white-space delimited text to be human readable in Python
For example, I want a width of 7 characters for my 1st column, 10 for my 2nd column, 4 for my 3rd column, etc. Is there an analogue of numpy.savetxt(filename, X, delimiter = ',', fmt = '%-10s'), but where instead of '%-10s' I have, say, '%-7s, %-10s, %-4s', etc.?
Thank you
Here is an example of what it can look like (Python2&3):
l = [[1,2,3,4], [3,4,5,6]]
for row in l:
    print(u'{:<7} {:>7} {:^7} {:*^7}'.format(*row))
1             2    3    ***4***
3             4    5    ***6***
The formatting options are taken from http://docs.python.org/2/library/string.html
>>> '{:<30}'.format('left aligned')
'left aligned                  '
>>> '{:>30}'.format('right aligned')
'                 right aligned'
>>> '{:^30}'.format('centered')
'           centered           '
>>> '{:*^30}'.format('centered') # use '*' as a fill char
'***********centered***********'
If you need a file, then do this:
l = [[1,2,3,4], [3,4,5,6]]
with open('file.txt', 'wb') as f:
    f.write(u'\ufeff'.encode('utf-8'))
    for row in l:
        line = u'{:<7} {:>7} {:^7} {:*^7}\r\n'.format(*row)
        f.write(line.encode('utf-8'))
The content of the file is
1             2    3    ***4***
3             4    5    ***6***
And the encoding is UTF-8. This means that you can have not only numbers but also any letter in your heading: ☠ ⇗ ⌚ ② ☕ ☃ ⛷
heading = [u'☠', u'⇗', u'⌚', u'②']
with open('file.txt', 'wb') as f:
    f.write(u'\ufeff'.encode('utf-8'))
    line = u'{:<7} {:>7} {:^7} {:*^7}\r\n'.format(*heading)
    f.write(line.encode('utf-8'))
    for row in l:
        line = u'{:<7} {:>7} {:^7} {:*^7}\r\n'.format(*row)
        f.write(line.encode('utf-8'))
☠             ⇗    ⌚    ***②***
1             2    3    ***4***
3             4    5    ***6***
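Incidentally, numpy.savetxt itself accepts a sequence of per-column formats, which answers the original question directly. A minimal sketch, assuming a 4-column integer array:

import numpy as np

X = np.array([[1, 2, 3, 4],
              [3, 4, 5, 6]])

# one format per column: left-aligned, padded to 7, 10, 4 and 4 characters
np.savetxt('file.txt', X, fmt=['%-7d', '%-10d', '%-4d', '%-4d'])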
I need to sort the lines of a text file using the integer values of one of the columns (the first one).
The file (coord.xyz) looks like this
9 1 -1.379785 0.195902 -1.197553
5 4 -0.303549 0.242253 -0.810244
2 2 -0.582923 1.208243 1.566588
3 3 -0.494556 0.028594 0.763130
4 1 -0.749005 -1.209878 1.358057
1 1 -0.883509 1.111866 2.882335
6 1 -1.005786 -1.278486 2.719391
7 5 -1.128898 -0.088124 3.508042
10 1 -0.253070 -0.289294 5.424662
8 1 -1.243879 -0.217228 5.247915
I used the code
import numpy as np

with open("coord.xyz") as inf:
    data = []
    for line in inf:
        line = line.split()
        if len(line) == 5:
            data.append(line)

f_h = file('sorted.dat', 'a')
m = sorted(data, key=lambda data_entry: data_entry[0])
np.savetxt(f_h, m, fmt='%s', delimiter=' ')
f_h.close()
The resulting sorted.dat file looks like this:
1 1 -0.883509 1.111866 2.882335
10 1 -0.253070 -0.289294 5.424662
2 2 -0.582923 1.208243 1.566588
3 3 -0.494556 0.028594 0.763130
4 1 -0.749005 -1.209878 1.358057
5 4 -0.303549 0.242253 -0.810244
6 1 -1.005786 -1.278486 2.719391
7 5 -1.128898 -0.088124 3.508042
8 1 -1.243879 -0.217228 5.247915
9 1 -1.379785 0.195902 -1.197553
The 10 is considered smaller than 2.
Could someone help me fix this?
What you wrote sorts the lines as strings, and alphabetically "10" comes before "2".
Try writing your lambda as:
m = sorted(data, key=lambda data_entry: int(data_entry[0]))
If you used NumPy to import the data as well as to export it, you wouldn't have this problem. For example:
m = np.loadtxt("coord.xyz", dtype="i, i, f8, f8, f8")
Now you've got a 1D array of tuples of the appropriate types, and the default m.sort() will sort the tuples in the usual way, which is exactly what you want. So the whole thing reduces to three lines: read the array, sort the array, write the array.
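For illustration, those three steps might look like this (a sketch assuming the structured dtype above; the output step is written by hand rather than relying on savetxt's handling of structured arrays):

import numpy as np

# read: one structured record per line (two ints, three floats)
m = np.loadtxt("coord.xyz", dtype="i, i, f8, f8, f8")

# sort: structured arrays compare field by field, so the first column wins
m.sort()

# write: format each record back out as plain text
with open("sorted.dat", "w") as out:
    for a, b, x, y, z in m:
        out.write("%d %d %f %f %f\n" % (a, b, x, y, z))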
But let's show you what you did wrong with your attempt:
m = sorted(data, key=lambda data_entry: data_entry[0])
You're asking it to sort by the first string in the list of strings data_entry. So that's what it does. If you want it to sort by that first string as a number, you have to tell it that. Like this:
m = sorted(data, key=lambda data_entry: int(data_entry[0]))
And that's it.
Also, if you want to read (or write) CSV-like files without using NumPy, rather than writing your own string processing, the csv module in the standard library makes your life easier:
with open("coord.xyz") as inf:
data = list(csv.reader(inf, delimiter=' '))
m = sorted(data, key=lambda data_entry: int(data_entry[0]))
with open("sorted.dat", "a") as outf:
csv.writer(outf, delimiter=' ').writerows(m)