How to find floating point numbers in binary file with Python? - python

I have a binary file mixed with ASCII in which there are some floating point numbers I want to find. The file contains some lines like this:
1,1,'11.2','11.3';1,1,'100.4';
In my favorite regex tester I found that the correct regex should be ([0-9]+\.{1}[0-9]+).
Here's the code:
import re
data = open('C:\\Users\\Me\\file.bin', 'rb')
pat = re.compile(b'([0-9]+\.{1}[0-9]+)')
print(pat.match(data.read()))
I do not get a single match, why is that? I'm on Python 3.5.1.

You can try it like this:
import re
with open('C:\\Users\\Me\\file.bin', 'rb') as f:
    data = f.read()
re.findall("\d+\.\d+", data)
Output:
['11.2', '11.3', '100.4']
re.findall returns a list of strings. If you want to convert them to floats, you can do it like this:
>>> list(map(float, re.findall("\d+\.\d+", data)))
[11.2, 11.3, 100.4]

How to find floating point numbers in binary file with Python?
float_re = br"[+-]? *(?:\d+(?:\.\d*)?|\.\d+)(?:[eE][+-]?\d+)?"
for m in generate_tokens(r'C:\Users\Me\file.bin', float_re):
    print(float(m.group()))
where float_re is from this answer and generate_tokens() is defined here.
pat.match() tries to match at the very start of the input string and your string does not start with a float and therefore you "do not get a single match".
re.findall("\d+\.\d+", data) produces TypeError because the pattern is Unicode (str) but data is a bytes object in your case. Pass the pattern as bytes:
re.findall(b"\d+\.\d+", data)
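To make the difference between match(), search() and findall() concrete, here is a minimal sketch. The bytes literal is only an assumed stand-in for the file contents, built from the sample line in the question.

```python
import re

# Assumed sample standing in for what open(..., 'rb').read() would return.
data = b"1,1,'11.2','11.3';1,1,'100.4';"

# A bytes pattern (raw literal) so it can be applied to bytes data.
pat = re.compile(rb"\d+\.\d+")

# match() only tries position 0; the data starts with b"1," so nothing matches.
at_start = pat.match(data)

# search() scans forward and returns the first match anywhere in the data.
first = pat.search(data)

# findall() returns every non-overlapping match as bytes objects.
found = pat.findall(data)

# Decode each token before converting it to a float.
values = [float(tok.decode("ascii")) for tok in found]
print(at_start, first.group(), found, values)
```

Note that the matches are bytes objects, so they print with a b prefix until they are decoded or converted.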

Related

String decoding of format \x123 in Python

Does anybody know what type of string this is?
And how can I convert it to a readable format in Python?
This data is from the log file of a mobile app (it might be in Russian):
"title":"\x{41E}\x{442}\x{441}\x{440}\x{43E}\x{447}\x{43A}\x{430} \x{43F}\x{43E} \x{43A}\x{440}\x{435}\x{434}\x{438}\x{442}\x{443}"
Thanks ahead!
To me it looks like hex codes of characters. I would extract the codes, treat them as base-16 integers, and convert them to characters. That is:
title = r"\x{41E}\x{442}\x{441}\x{440}\x{43E}\x{447}\x{43A}\x{430} \x{43F}\x{43E} \x{43A}\x{440}\x{435}\x{434}\x{438}\x{442}\x{443}"
codes = [code.strip('{} ') for code in title.split(r"\x") if code]
characters = [chr(int(code, 16)) for code in codes]
output = ''.join(characters)
print(output)
Output:
Отсрочкапокредиту
import json
data = r'"\x{41E}\x{442}\x{441}\x{440}\x{43E}\x{447}\x{43A}\x{430} \x{43F}\x{43E} \x{43A}\x{440}\x{435}\x{434}\x{438}\x{442}\x{443} "'
print(json.loads(data.replace('{','').replace('}','').replace('x', 'u0')))
…and the output is Отсрочка по кредиту.
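Another option, shown here only as a sketch, is to let a regular expression find each \x{...} escape and substitute the matching character. This keeps the spaces between words and does not depend on every code having exactly three hex digits. The helper name decode_braced_hex is made up for this example.

```python
import re

# Sample from the question, kept as a raw string so the backslashes stay literal.
title = r"\x{41E}\x{442}\x{441}\x{440}\x{43E}\x{447}\x{43A}\x{430} \x{43F}\x{43E} \x{43A}\x{440}\x{435}\x{434}\x{438}\x{442}\x{443}"

# Matches a literal backslash, 'x', and one or more hex digits inside braces.
ESCAPE = re.compile(r"\\x\{([0-9A-Fa-f]+)\}")

def decode_braced_hex(text):
    # Hypothetical helper: swap each escape for the character at that code point.
    return ESCAPE.sub(lambda m: chr(int(m.group(1), 16)), text)

decoded = decode_braced_hex(title)
print(decoded)
```

Anything that is not an escape, such as the spaces, passes through unchanged.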

best ways to Decode and process Binary Bytes from files

I am trying to decode and process a binary data file. The data format is as follows:
input:9,data:443,gps:3
The file has more data in the same fashion, in [key:value] format.
Basically, I need to create a dictionary from the file so I can process it later.
input:b'input:9,data:443,gps:3'
Desired output:{'input': '9', 'data': '443', 'gps': '3'}
Your input data is bytes (a sequence of bytes). To convert it to a str object you can use bytes.decode(). Then you can work with the data as a string and split it on , and :. Code:
inp = b"input:9,data:443,gps:3"
out = dict(s.split(":") for s in inp.decode().split(","))
The string input:9,data:443,gps:3 is text, not binary data, so I am going to guess that it is a format template, not a sample of the file contents. This format would mean that your file has an "input" field that is 9 bytes long, followed by 443 bytes of "data", followed by a 3-byte "gps" value. This description does not specify the types of the fields, so it is incomplete; but it's a start.
The Python tool for structured binary files is the module struct. Here's how to extract the three fields as bytes objects:
import struct
with open("some_file.bin", "rb") as binfile:
    content = binfile.read()
input_, data, gps = struct.unpack("9s443s3s", content)
The function struct.unpack provides many other formats besides s; this is just an example. But there is no specifier for plain text strings, so if input_ is a text string, the next step is to convert it:
input_ = input_.decode("ascii") # or other encoding
Since you ask for a dictionary of the results, here is one way:
result = { "input":input_, "data": data, "gps": gps }
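To try the struct approach without a real file, here is a small round-trip sketch. It assumes the guessed layout above (9 + 443 + 3 bytes); the field contents are invented purely so the example runs.

```python
import struct

# Assumed layout from the guess above: 9-byte, 443-byte and 3-byte fields.
RECORD = struct.Struct("9s443s3s")

# Build a fake record in memory; these values are made up for illustration.
raw = RECORD.pack(b"sensor-01", b"\x00" * 443, b"gps")

# unpack() requires exactly RECORD.size bytes and returns one bytes object per field.
input_, data, gps = RECORD.unpack(raw)

result = {"input": input_.decode("ascii"), "data": data, "gps": gps}
print(RECORD.size, result["input"], len(result["data"]), result["gps"])
```

If the file holds several records back to back, RECORD.iter_unpack(content) yields one tuple per record, provided the total length is a multiple of RECORD.size.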
Solution using eval (only suitable for trusted input; the values come back as integers rather than strings):
inp = b"input:9,data:443,gps:3"
out = eval(b'dict(%s)' % inp.replace(b':', b'='))

Parsing floating number from ping output in text file

So I am writing a Python program that must extract the round-trip time from a text file containing numerous pings. A preview of the text file is below:
64 bytes from a104-100-153-112.deploy.static.akamaitechnologies.com (104.100.153.112): icmp_seq=1 ttl=60 time=12.6ms
64 bytes from a104-100-153-112.deploy.static.akamaitechnologies.com (104.100.153.112): icmp_seq=2 ttl=60 time=1864ms
64 bytes from a104-100-153-112.deploy.static.akamaitechnologies.com (104.100.153.112): icmp_seq=3 ttl=60 time=107.8ms
What I want to extract from the text file is the 12.6, 1864, and the 107.8. I used regex to do this and have the following:
import re
ping = open("pingoutput.txt")
rawping = ping.read()
roundtriptimes = re.findall(r'time=(\d+.\d+)', rawping)
roundtriptimes.sort()
print (roundtriptimes)
The issue I'm having is that I believe the numbers are being read into the roundtriptimes list as strings so when I go to sort them they do not sort as I would like them to.
Any idea how to modify my regex findall command to make sure it recognizes them as numbers would help tremendously! Thanks!
I don't know of a way to do that in RegEx, but if you add the following line before the sort, it should take care of it for you:
roundtriptimes[:] = [float(x) for x in roundtriptimes]
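A short sketch of why the conversion matters: strings are compared character by character, so they sort differently from the numbers they represent. The list below is typed in by hand to mimic what re.findall would return for the three sample lines.

```python
# Strings as re.findall would return them for the sample ping lines.
roundtriptimes = ["12.6", "1864", "107.8"]

# Lexicographic order: "107.8" sorts before "12.6" because "0" < "2".
as_text = sorted(roundtriptimes)

# Numeric order after converting every entry to float.
as_numbers = sorted(float(x) for x in roundtriptimes)

# Numeric order while keeping the original strings.
by_value = sorted(roundtriptimes, key=float)

print(as_text)     # ['107.8', '12.6', '1864']
print(as_numbers)  # [12.6, 107.8, 1864.0]
print(by_value)    # ['12.6', '107.8', '1864']
```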
Non-regex:
Simply split each line on whitespace, grab the last entry, split that on =, take the second part, and drop the last two characters (ms). Then cast the result to a float.
All of that is done in a list comprehension. Note that readlines is used to get a list containing each line of the file, which is much easier to manage.
with open('ping_results.txt') as f:
    data = f.readlines()
times = [float(line.split()[-1].split('=')[1][:-2]) for line in data]
print(times) # [12.6, 1864.0, 107.8]
regex:
The key thing here is to pay attention to the regex being used:
time=(\d*\.?\d+)
Look for time=, then start a capture group (), and grab optional leading digits (\d*), an optional decimal point (\.?), and one or more digits (\d+).
import re
with open('ping_results.txt') as f:
    data = f.readlines()
# Raw string so the backslashes reach the regex engine unchanged.
times = [float(re.findall(r'time=(\d*\.?\d+)', line)[0]) for line in data]
print(times) # [12.6, 1864.0, 107.8]

convert string into int()

I have a dataset that looks like this:
0 _ _ 23.0186E-03
10 _ _51.283E-03
20 _ _125.573E-03
where the numbers are lined up line by line (the underscores represent spaces).
The numbers in the right hand column are currently part of the line's string. I am trying to convert the numbers on the right into numerical values (0.0230186 etc). I can convert them with int() once they are in a simple numerical form, but I need to change the "E"s to get there. If you know how to change it for any value of E such as E-01, E-22 it would be very helpful.
Currently my code looks like so:
fin = open( 'stringtest1.txt', "r" )
fout = open("stringtest2.txt", "w")
while 1:
    x=fin.readline()
    a=x[5:-1]
    ##conversion code should go here
    if not x:
        break
fin.close()
fout.close()
I would suggest the following for the conversion:
float(x.split()[-1])
str.split() will split on white space when no arguments are provided, and float() will convert the string into a number, for example:
>>> '20 125.573E-03'.split()
['20', '125.573E-03']
>>> float('20 125.573E-03'.split()[-1])
0.12557299999999999
You should use context managers, and file objects are iterable:
with open('test1.txt') as fhi, open('test2.txt', 'w') as fho:
    for line in fhi:
        f = float(line.split()[-1])
        # Write one value per line so the numbers do not run together.
        fho.write(str(f) + '\n')
If I understand what you want to do correctly, there's no need to do anything with the E's: in python float('23.0186E-03') returns 0.0230186, which I think is what you want.
All you need is:
fout = open("stringtest2.txt", "w")
for line in open('stringtest1.txt', "r"):
    x = float(line.strip().split()[1])
    fout.write("%f\n"%x)
fout.close()
Using %f in the output string will make sure the output will be in decimal notation (no E's). If you just use str(x), you may get E's in the output depending on the original value, so the correct conversion method depends on which output you want:
>>> str(float('23.0186E-06'))
'2.30186e-05'
>>> "%f"%float('23.0186E-06')
'0.000023'
>>> "%.10f"%float('23.0186E-06')
'0.0000230186'
You can put a precision between the % and the f (for example %.10f) to control the number of decimal places. For more about string formatting with %, see http://rgruet.free.fr/PQR26/PQR2.6.html#stringMethods (scroll down to the "String formatting with the % operator" section).
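As a quick side-by-side, the sketch below formats the same value several ways. The variable names are just for illustration, and str.format is shown as an equivalent spelling rather than something the answer above relies on.

```python
x = float("23.0186E-06")

# %f defaults to six digits after the decimal point, so small values get truncated.
fixed_default = "%f" % x

# An explicit precision keeps more digits.
fixed_10 = "%.10f" % x

# %E keeps scientific notation with an upper-case exponent marker.
sci = "%.4E" % x

# str.format accepts the same precision syntax.
modern = "{:.10f}".format(x)

print(fixed_default, fixed_10, sci, modern)
```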
float("20 _ _125.573E-03".split()[-1].strip("_"))

python: find and replace numbers < 1 in text file

I'm pretty new to Python programming and would appreciate some help with a problem I have...
Basically I have multiple text files which contain velocity values as such:
0.259515E+03 0.235095E+03 0.208262E+03 0.230223E+03 0.267333E+03 0.217889E+03 0.156233E+03 0.144876E+03 0.136187E+03 0.137865E+00
etc for many lines...
What I need to do is convert all the values in the text file that are less than 1 (e.g. 0.137865E+00 above) to an arbitrary value of 0.100000E+01. While it seems pretty simple to replace specific values with the 'replace()' method and a while loop, how do you do this if you want to replace a range?
thanks
I think when you are beginning programming, it's useful to see some examples; and I assume you've tried this problem on your own first!
Here is a break-down of how you could approach this:
contents='0.259515E+03 0.235095E+03 0.208262E+03 0.230223E+03 0.267333E+03 0.217889E+03 0.156233E+03 0.144876E+03 0.136187E+03 0.137865E+00'
The split method works on strings. It returns a list of strings. By default, it splits on whitespace:
string_numbers=contents.split()
print(string_numbers)
# ['0.259515E+03', '0.235095E+03', '0.208262E+03', '0.230223E+03', '0.267333E+03', '0.217889E+03', '0.156233E+03', '0.144876E+03', '0.136187E+03', '0.137865E+00']
The map command applies its first argument (the function float) to each of the elements of its second argument (the list string_numbers). The float function converts each string into a floating-point object.
float_numbers=list(map(float,string_numbers))
print(float_numbers)
# [259.515, 235.095, 208.262, 230.223, 267.333, 217.889, 156.233, 144.876, 136.187, 0.137865]
You can use a list comprehension to process the list, converting numbers less than 1 into the number 1. The conditional expression (1 if num<1 else num) equals 1 when num is less than 1, otherwise, it equals num.
processed_numbers=[(1 if num<1 else num) for num in float_numbers]
print(processed_numbers)
# [259.515, 235.095, 208.262, 230.223, 267.333, 217.889, 156.233, 144.876, 136.187, 1]
This is the same thing, all in one line:
processed_numbers=[(1 if num<1 else num) for num in map(float,contents.split())]
To generate a string out of the elements of processed_numbers, you could use the str.join method:
comma_separated_string=', '.join(map(str,processed_numbers))
# '259.515, 235.095, 208.262, 230.223, 267.333, 217.889, 156.233, 144.876, 136.187, 1'
A typical technique would be:
read the file line by line
split each line into a list of strings
convert each string to a float
compare the converted value with 1
replace it when needed
write the result to a new file
As I don't see you having any code yet, I hope that this would be a good start
def float_filter(text):
    # Yield each token unchanged unless its value is below 1.
    for number in text.split():
        if float(number) < 1.0:
            yield "0.100000E+01"
        else:
            yield number
text = "0.259515E+03 0.235095E+03 0.208262E+03 0.230223E+03 0.267333E+03 0.217889E+03 0.156233E+03 0.144876E+03 0.136187E+03 0.137865E+00"
print(" ".join(float_filter(text)))
import numpy as np
a = np.genfromtxt('file.txt') # read file
a[a<1] = 1.0 # replace values below 1 with 1.0 (0.100000E+01)
np.savetxt('converted.txt', a) # save to file
You could use regular expressions for parsing the string. I'm assuming here that the mantissa is never larger than 1 (ie, begins with 0). This means that for the number to be less than 1, the exponent must be either 0 or negative. The following regular expression matches '0', '.', unlimited number of decimal digits (at least 1), 'E' and either '+00' or '-' and two decimal digits.
0\.\d+E(-\d\d|\+00)
Assuming that you have the file read into variable 'text', you can use the regexp with the following python code:
import re
result = re.sub(r"0\.\d+E(-\d\d|\+00)", "0.100000E+01", text)
Edit: Just realized that the description doesn't limit the valid range of input numbers to positive numbers. Negative numbers can be matched with the following regexp:
-0\.\d+E[-+]\d\d
This can be alternated with the first one using the (pattern1|pattern2) syntax which results in the following Python code:
result = re.sub(r"(0\.\d+E(-\d\d|\+00)|-0\.\d+E[-+]\d\d)", "0.100000E+01", text)
Also, if there's a chance that the exponent goes past 99, the regexp can be further modified by adding a '+' after each '\d\d' pattern. This allows matching exponents of two OR MORE digits.
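If the mantissa assumption does not hold, a variation on the regex idea is to match every number and let a callback decide by value. This is only a sketch: the NUMBER pattern and the clamp function are assumptions for this example, and the pattern expects tokens shaped like the sample data (a decimal point followed by E and a signed exponent).

```python
import re

# Assumed token shape: optional sign, optional digits, '.', digits, 'E', signed exponent.
NUMBER = re.compile(r"[-+]?\d*\.\d+E[-+]\d+")

def clamp(match):
    # Keep the original token unless its numeric value is below 1.
    token = match.group()
    return "0.100000E+01" if float(token) < 1.0 else token

text = "0.259515E+03 0.137865E+00 -0.5E+02 0.95E+01"
result = NUMBER.sub(clamp, text)
print(result)
```

Because the comparison is done on the float value, negative numbers and mantissas of 1 or more are handled without extra alternations, and untouched tokens keep their original text.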
I've got the script working as I want now...thanks people.
When writing the list to a new file I used the replace method to get rid of the brackets and commas - is there a simpler way?
ftext = open("C:\\Users\\hhp06\\Desktop\\out.grd", "r")
otext = open("C:\\Users\\hhp06\\Desktop\\out2.grd", "w+")
for line in ftext:
    stringnum = line.split()
    floatnum = map(float, stringnum)
    procnum = [(1.0 if num<1 else num) for num in floatnum]
    stringproc = str(procnum)
    s = (stringproc).replace(",", " ").replace("[", " ").replace("]", "")
    otext.writelines(s + "\n")
otext.close()
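One simpler way to avoid the brackets and commas is to format each number yourself and join the pieces with spaces, rather than calling str() on the whole list. The sketch below works on a single line of text; process_line and its fmt parameter are hypothetical names for this example.

```python
def process_line(line, fmt="%s"):
    # Convert each whitespace-separated token and raise values below 1 to 1.0.
    numbers = [(1.0 if num < 1 else num) for num in map(float, line.split())]
    # Format every number individually, then join with single spaces.
    return " ".join(fmt % num for num in numbers)

sample = "0.259515E+03 0.137865E+00\n"
plain = process_line(sample)
scientific = process_line(sample, fmt="%E")
print(plain)
print(scientific)
```

Inside the file loop this would become otext.write(process_line(line) + "\n"). Note that %E writes values as 2.595150E+02, which differs from the 0.259515E+03 layout of the input file.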
