Parsing floating number from ping output in text file - python

So I am writing this Python program that must extract the round-trip times from a text file that contains numerous pings; what's in the text file is previewed below:
64 bytes from a104-100-153-112.deploy.static.akamaitechnologies.com (104.100.153.112): icmp_seq=1 ttl=60 time=12.6ms
64 bytes from a104-100-153-112.deploy.static.akamaitechnologies.com (104.100.153.112): icmp_seq=2 ttl=60 time=1864ms
64 bytes from a104-100-153-112.deploy.static.akamaitechnologies.com (104.100.153.112): icmp_seq=3 ttl=60 time=107.8ms
What I want to extract from the text file are the values 12.6, 1864, and 107.8. I used a regex to do this and have the following:
import re
ping = open("pingoutput.txt")
rawping = ping.read()
roundtriptimes = re.findall(r'time=(\d+.\d+)', rawping)
roundtriptimes.sort()
print (roundtriptimes)
The issue I'm having is that I believe the numbers are being read into the roundtriptimes list as strings so when I go to sort them they do not sort as I would like them to.
Any idea how to modify my regex findall command to make sure it recognizes them as numbers would help tremendously! Thanks!

I don't know of a way to do that in RegEx, but if you add the following line before the sort, it should take care of it for you:
roundtriptimes[:] = [float(x) for x in roundtriptimes]
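Alternatively, if you only need the ordering, you can leave the values as strings and give the sort a numeric key:
roundtriptimes.sort(key=float)  # compares the strings by their float value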

Non-regex:
Simply split on whitespace, grab the last entry, split that on =, take the second part, and drop the last two characters (the ms). Then cast to a float.
All of that is done in a list-comprehension:
Note that readlines is used to have a list containing each line of the file, which will be much easier to manage.
with open('ping_results.txt') as f:
    data = f.readlines()

times = [float(line.split()[-1].split('=')[1][:-2]) for line in data]
print(times)  # [12.6, 1864.0, 107.8]
Regex:
The key thing here is to pay attention to the regex being used:
time=(\d*\.?\d+)
Look for time=, then start a capture group (), and grab digits (\d*), optional decimal (\.?), digits (\d+).
import re

with open('ping_results.txt') as f:
    data = f.readlines()

times = [float(re.findall(r'time=(\d*\.?\d+)', line)[0]) for line in data]
print(times)  # [12.6, 1864.0, 107.8]


How to find floating point numbers in binary file with Python?

I have a binary file mixed with ASCII in which there are some floating point numbers I want to find. The file contains some lines like this:
1,1,'11.2','11.3';1,1,'100.4';
In my favorite regex tester I found that the correct regex should be ([0-9]+\.{1}[0-9]+).
Here's the code:
import re
data = open('C:\\Users\\Me\\file.bin', 'rb')
pat = re.compile(b'([0-9]+\.{1}[0-9]+)')
print(pat.match(data.read()))
I do not get a single match, why is that? I'm on Python 3.5.1.
You can try it like this:
import re
with open('C:\\Users\\Me\\file.bin', 'rb') as f:
    data = f.read()

re.findall("\d+\.\d+", data)
Output:
['11.2', '11.3', '100.4']
re.findall returns a list of strings. If you want to convert them to floats, you can do it like this:
>>> list(map(float, re.findall("\d+\.\d+", data)))
[11.2, 11.3, 100.4]
float_re = br"[+-]? *(?:\d+(?:\.\d*)?|\.\d+)(?:[eE][+-]?\d+)?"

for m in generate_tokens(r'C:\Users\Me\file.bin', float_re):
    print(float(m.group()))
where float_re is from this answer and generate_tokens() is defined here.
pat.match() tries to match at the very start of the input string; your string does not start with a float, and therefore you "do not get a single match".
re.findall("\d+\.\d+", data) produces TypeError because the pattern is Unicode (str) but data is a bytes object in your case. Pass the pattern as bytes:
re.findall(b"\d+\.\d+", data)
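Putting both fixes together, a minimal sketch that scans the whole file with a bytes pattern and converts the matches:
import re

with open('C:\\Users\\Me\\file.bin', 'rb') as f:
    data = f.read()

# findall scans the whole input (unlike match, which is anchored to the start),
# and a bytes pattern matches the bytes data; float() accepts ASCII bytes
matches = re.findall(rb'[0-9]+\.[0-9]+', data)
print([float(m) for m in matches])  # expect [11.2, 11.3, 100.4] for the sample line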

converting matrix from logfile

I have a matrix written in this format inside a log file:
2014-09-08 14:10:20,107 - root - INFO - [[ 8.30857546 0.69993454 0.20645551
77.01797674 13.76705776]
[ 8.35205432 0.53417203 0.19969048 76.78598173 14.12810144]
[ 8.37066492 0.64428449 0.18623849 76.4181809 14.3806312 ]
[ 8.50493296 0.5110043 0.19731849 76.45838604 14.32835821]
[ 8.18900791 0.4955451 0.22524777 76.96966663 14.12053259]]
...some text
2014-09-08 14:12:22,211 - root - INFO - [[ 3.25142253e+01 1.11788106e+00 1.51065008e-02 6.16496299e+01
4.70315726e+00]
[ 3.31685887e+01 9.53522041e-01 1.49767860e-02 6.13449154e+01
4.51799710e+00]
[ 3.31101827e+01 1.09729703e+00 5.03347259e-03 6.11818594e+01
4.60562742e+00]
[ 3.32506957e+01 1.13837592e+00 1.51783456e-02 6.08651657e+01
4.73058437e+00]
[ 3.26809490e+01 1.06617279e+00 1.00110121e-02 6.17429172e+01
4.49994994e+00]]
I am writing this matrix using the python logging package:
logging.info(conf_mat)
However, logging.info does not give me a way to write the matrix in a %.3f float format, so I decided to parse the log file this way:
conf_mat = [[]]
cf = '[+-]?(?=\d*[.eE])(?=\.?\d)\d*\.?\d*(?:[eE][+-]?\d+)?'

with open(sys.argv[1]) as f:
    for line in f:
        epoch = re.findall(ep, line)  # find lines starting with epoch for other stuff
        if epoch:
            error_line = next(f)  # grab the next line, which is the error line
            error_value = error_line[error_line.rfind('=')+1:]
            data_points.append(map(float, epoch[0]+(error_value,)))  # get the error value for the specific epoch
            for i in range(N):
                cnf_mline = next(f)
                match = re.findall(cf, cnf_mline)
                if match:
                    conf_mat[count].append(map(float, match))
                else:
                    conf_mat.append([])
                    count += 1
However, the regex does not catch the break in the line when looking at the matrix, which becomes a problem when I try to convert the matrix using
conf_mtx = np.array(conf_mat)
Your regex string cf needs to be a raw string literal:
cf = r'[+-]?(?=\d*[.eE])(?=\.?\d)\d*\.?\d*(?:[eE][+-]?\d+)?'
in order to work properly. Backslash \ characters are interpreted as escape sequences in "regular" strings, but should not be in regexes. You can read about raw string literals at the top of the re module's documentation, and in this excellent SO answer. Alex Martelli explains them quite well, so I won't repeat everything he says here. Suffice it to say that were you not to use a raw literal, you'd have to escape each and every one of your backslashes with another backslash, and that just gets ugly and annoying fast.
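A quick way to see the difference in the interpreter:
>>> len('\t'), len(r'\t')  # the escape collapses to one character; the raw string keeps both
(1, 2)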
As for the rest of your code, it won't run without more information. The N in for i in range(N): is undefined, as is count a few lines later. Calling cnf_mline = next(f) doesn't really make sense either, because you're going to run out of lines in the file (by calling next() repeatedly) before you can iterate over all of them with the for line in f: loop. It's unclear whether your data really has that line break in the second half, where one of the members of the list is on the next line; I assume it does, given the next() attempt.
I think you should first try to clean up your input file into a regular format: strip each line, join the wrapped pieces back together, and write the result to a new file or list. Then you'll have a much easier time running regular expressions on it. To work on subsequent lines without running out your generator with excessive uses of next(), check out itertools.tee(); it returns n independent iterators from a single iterable, allowing you to advance the second one a line ahead of the first, as sketched below. Alternatively, you could read your file's lines into a list and just operate using indices i, i+1. You can then rewrite your matching loop to simply pull each number of the appropriate format out and insert it into your matrix at the correct position. The good news is your regex caught everything I threw at it, so you won't need to modify anything there.
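For illustration, a minimal sketch of the tee() look-ahead idea (the log file name is hypothetical, and the end-of-row test is just a guess at the format):
import itertools

with open('matrix.log') as f:  # hypothetical log file name
    cur, ahead = itertools.tee(f)
    next(ahead, None)  # put the second iterator one line ahead
    for line, following in zip(cur, ahead):
        # 'following' is the line after 'line', available without calling
        # next() on the iterator that drives the loop
        if not line.rstrip().endswith(']'):  # row wrapped onto the next line
            print(line.rstrip() + ' ' + following.strip())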
Good luck!

How to find null byte in a string in Python?

I'm having an issue parsing data after reading a file. What I'm doing is reading a binary file in, and I need to create a list of attributes from it; all of the data in the file is terminated with a null byte. What I'm trying to do is find every instance of a null-byte-terminated attribute.
Essentially taking a string like
Health\x00experience\x00charactername\x00
and storing it in a list.
The real issue is I need to keep the null bytes intact; I just need to be able to find each instance of a null byte and store the data that precedes it.
Python doesn't treat NUL bytes as anything special; they're no different from spaces or commas. So, split() works fine:
>>> my_string = "Health\x00experience\x00charactername\x00"
>>> my_string.split('\x00')
['Health', 'experience', 'charactername', '']
Note that split is treating \x00 as a separator, not a terminator, so we get an extra empty string at the end. If that's a problem, you can just slice it off:
>>> my_string.split('\x00')[:-1]
['Health', 'experience', 'charactername']
While it boils down to using split('\x00'), a convenience wrapper might be nice.
def readlines(f, bufsize):
    buf = b""  # bytes, since the file is opened in binary mode
    data = True
    while data:
        data = f.read(bufsize)
        buf += data
        lines = buf.split(b'\x00')
        buf = lines.pop()  # keep the trailing partial record for the next read
        for line in lines:
            yield line + b'\x00'
    yield buf + b'\x00'
then you can do something like
with open('myfile', 'rb') as f:
    mylist = [item for item in readlines(f, 524288)]
This has the added benefit of not needing to load the entire contents into memory before splitting the text.
To check if a string has a NUL byte in it, simply use the in operator, for example:
if b'\x00' in data:
To find its position, use find(), which returns the lowest index in the string where the substring is found (or -1 if it is absent). The optional start and end arguments behave like slice notation.
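A small sketch of both calls on the question's sample data:
data = b"Health\x00experience\x00charactername\x00"
if b'\x00' in data:
    print(data.find(b'\x00'))     # 6, index of the first NUL byte
    print(data.find(b'\x00', 7))  # 17, next NUL at or after index 7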
Split on null bytes; .split() returns a list:
>>> print("Health\x00experience\x00charactername\x00".split("\x00"))
['Health', 'experience', 'charactername', '']
If you know the data always ends with a null byte, you can slice the list to chop off the last empty string (like result_list[:-1]).

Python parsing a text file and logical methods

I'm a bit stuck with Python logic.
I'd like some advice on how to tackle a problem I'm having with Python and the methods for parsing data.
I've spent a bit of time reading the Python reference documents and going through this site, and I understand there are several ways to do what I'm trying to achieve; this is the path I've gone down.
I'm re-formatting some text files with data generated from some satellite hardware to be uploaded into a MySQL database.
This is the raw data
TP N: 1
Frequency: 12288.635 Mhz
Symbol rate: 3000 KS
Polarization: Vertical
Spectrum: Inverted
Standard/Modulation: DVB-S2/QPSK
FEC: 1/2
RollOff: 0.20
Pilot: on
Coding mode: ACM/VCM
Short frame
Transport stream
Single input stream
RF-Level: -49 dBm
Signal/Noise: 6.3 dB
Carrier width: 3.600 Mhz
BitRate: 2.967 Mbit/s
The above section is repeated for each transponder (TP N) on the satellite.
I'm using this script to extract the data I need
strings = ("Frequency", "Symbol", "Polar", "Mod", "FEC", "RF", "Signal", "Carrier", "BitRate")

sat_raw = open('/BLScan/reports/1520.txt', 'r')
sat_out = open('1520out.txt', 'w')

for line in sat_raw:
    if any(s in line for s in strings):
        for word in line.split():
            if ':' in word:
                sat_out.write(line.split(':')[-1])

sat_raw.close()
sat_out.close()
The output data is then formatted like this before it's sent to the database:
12288.635 Mhz
3000 KS
Vertical
DVB-S2/QPSK
1/2
-49 dBm
6.3 dB
3.600 Mhz
2.967 Mbit/s
This script is working fine but for some features I want to implement on MySQL I need to edit this further.
Remove the decimal point, the three digits after it, and the Mhz unit on the first "Frequency" line.
Remove all the trailing measurement units (KS, dBm, dB, Mhz, Mbit/s).
Join the 9 fields into a comma-delimited string so that each transponder (approx. 30 per file) is on its own line.
I'm unsure whether to continue down this path, adding onto this existing script (I'm stuck at the point where the output file is written), or to rethink my approach to the way I'm processing the raw file.
My solution is crude and might not work in corner cases, but it is a good start.
import re
import csv

strings = ("Frequency", "Symbol", "Polar", "Mod", "FEC", "RF", "Signal", "Carrier", "BitRate")

sat_raw = open('/BLScan/reports/1520.txt', 'r')
sat_out = open('1520out.txt', 'w')
csv_writer = csv.writer(sat_out)

csv_output = []
for line in sat_raw:
    if any(s in line for s in strings):
        try:
            m = re.match(r'^.*:\s+(\S+)', line)
            value = m.groups()[0]
            # Attempt to convert to int, thus removing the decimal part
            value = int(float(value))
        except ValueError:
            pass  # Ignore conversion
        except AttributeError:
            pass  # Ignore case when m is None (no match)
        csv_output.append(value)
    elif line.startswith('TP N'):
        # Before we start a new set of values, write out the old set
        if csv_output:
            csv_writer.writerow(csv_output)
        csv_output = []

# If we reach the end of the file, don't miss the last set of values
if csv_output:
    csv_writer.writerow(csv_output)

sat_raw.close()
sat_out.close()
Discussion
The csv module helps with the CSV output.
The re (regular expression) module helps parse each line and extract the value from it.
In the line that reads value = int(float(value)), we attempt to turn the string value into an integer, thus removing the dot and the digits after it.
When the code encounters a line that starts with 'TP N', which signals a new set of values, we write out the old set of values to the CSV file.
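If I trace the sample transponder block from the question through this code, the row written should come out something like:
12288,3000,Vertical,DVB-S2/QPSK,1/2,-49,6,3,2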
import math

strings = ("Frequency", "Symbol", "Polar", "Mod", "FEC", "RF", "Signal", "Carrier", "BitRate")
files = ['/BLScan/reports/1520.txt']

sat_out = open('1520out.txt', 'w')
combineOutput = []
for myfile in files:
    sat_raw = open(myfile, 'r')
    singleOutput = []
    for line in sat_raw:
        if any(s in line for s in strings):
            marker = line.split(':')[1]
            try:
                # floor the float to drop the decimal part, e.g. 12288.635 -> 12288
                data = str(int(math.floor(float(marker.split()[0]))))
            except ValueError:
                # non-numeric fields (e.g. "Vertical") are kept as-is
                data = marker.split()[0]
            singleOutput.append(data)
    sat_raw.close()
    combineOutput.append(",".join(singleOutput))

for rec in combineOutput:
    sat_out.write("%s\n" % rec)
sat_out.close()
Add all the files that you want to parse to the files list. It will write the output of each file as a separate line, with each field comma-separated.

convert string into int()

I have a dataset that looks like this:
0 _ _ 23.0186E-03
10 _ _51.283E-03
20 _ _125.573E-03
where the numbers are lined up line by line (the underscores represent spaces).
The numbers in the right-hand column are currently part of the line's string. I am trying to convert the numbers on the right into numerical values (0.0230186 etc.). I can convert them with int() once they are in a simple numerical form, but I need to change the "E"s to get there. If you know how to change it for any value of E, such as E-01 or E-22, it would be very helpful.
Currently my code looks like so:
fin = open('stringtest1.txt', "r")
fout = open("stringtest2.txt", "w")

while 1:
    x = fin.readline()
    a = x[5:-1]
    ## conversion code should go here
    if not x:
        break

fin.close()
fout.close()
I would suggest the following for the conversion:
float(x.split()[-1])
str.split() will split on white space when no arguments are provided, and float() will convert the string into a number, for example:
>>> '20 125.573E-03'.split()
['20', '125.573E-03']
>>> float('20 125.573E-03'.split()[-1])
0.12557299999999999
You should use context handlers, and file handles are iterable:
with open('test1.txt') as fhi, open('test2.txt', 'w') as fho:
    for line in fhi:
        f = float(line.split()[-1])
        fho.write(str(f))
If I understand what you want to do correctly, there's no need to do anything with the E's: in python float('23.0186E-03') returns 0.0230186, which I think is what you want.
All you need is:
fout = open("stringtest2.txt", "w")
for line in open('stringtest1.txt', "r"):
    x = float(line.strip().split()[1])
    fout.write("%f\n" % x)
fout.close()
Using %f in the output string will make sure the output will be in decimal notation (no E's). If you just use str(x), you may get E's in the output depending on the original value, so the correct conversion method depends on which output you want:
>>> str(float('23.0186E-06'))
'2.30186e-05'
>>> "%f"%float('23.0186E-06')
'0.000023'
>>> "%.10f"%float('23.0186E-06')
'0.0000230186'
You can add a precision to %f (e.g. %.3f) to control the number of digits. For more about string formatting with %, see http://rgruet.free.fr/PQR26/PQR2.6.html#stringMethods (scroll down to the "String formatting with the % operator" section).
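For example, with the value from the question's first data line:
>>> "%.3f" % float('23.0186E-03')
'0.023'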
float("20 _ _125.573E-03".split()[-1].strip("_"))
