I have a log file which shows data sent in the below format -
2019-10-17T00:00:02|Connection(10.0.0.89 :0 ) r=0 s=1024 d=0 t=0 q=0 # connected
2019-10-17T00:00:02|McSend (229.0.0.70 :20001) b=1635807 f=2104 d=0 t=0
There will be multiple lines per file
How can I graph the b= value against the time (near the beginning of the line), but only from the McSend lines?
Thanks
If you're not familiar with regular expressions, the Python regex documentation is a good place to start.
The simplest regex you probably need is r"^(\d\d\d\d-\d\d-\d\dT\d\d:\d\d:\d\d)\|.*McSend.*b=(\d+)"
The first group gives you the timestamp and the second gives you the b value.
import re

pattern = r"^(\d\d\d\d-\d\d-\d\dT\d\d:\d\d:\d\d)\|.*McSend.*b=(\d+)"
# result is a list of (timestamp, b-value) tuples; re.MULTILINE lets ^ match
# at the start of every line when some_input holds the whole file
result = re.findall(pattern, some_input, re.MULTILINE)
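Applied to the two sample lines from the question (assuming each log record is one line), that would look like:

```python
import re

log = (
    "2019-10-17T00:00:02|Connection(10.0.0.89 :0 ) r=0 s=1024 d=0 t=0 q=0 # connected\n"
    "2019-10-17T00:00:02|McSend (229.0.0.70 :20001) b=1635807 f=2104 d=0 t=0\n"
)

pattern = r"^(\d\d\d\d-\d\d-\d\dT\d\d:\d\d:\d\d)\|.*McSend.*b=(\d+)"
# Only the McSend line matches; the Connection line is ignored
result = re.findall(pattern, log, re.MULTILINE)
```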
You should read your file line by line, then check whether each line contains 'McSend'. If it does, retrieve the desired data.
You could do something like this:
b_values = []
dates = []
## Lets open the file and read it line by line
with open(filepath) as f:
    for line in f:
        ## If the line contains McSend
        if 'McSend' in line:
            ## We split the line by spaces (split() with no arguments does so)
            splited_line = line.split()
            ## First string chunk contains the header where the date is located
            header = splited_line[0]
            ## Then retrieve the b value
            for val in splited_line:
                if val.startswith('b='):
                    b_value = val.split("=", 1)[1]
                    ## Now you can add the values to the lists and then plot what you need
                    b_values.append(b_value)
                    dates.append(header.split("|", 1)[0])
## Do your plot
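Before plotting, it can help to convert the timestamp strings into datetime objects and the b values into ints, so matplotlib spaces the points correctly (a minimal sketch with made-up values; the plotting calls are left commented out):

```python
from datetime import datetime

# Hypothetical values in the format collected above
dates = ["2019-10-17T00:00:02", "2019-10-17T00:00:05"]
b_values = ["1635807", "1635900"]

# Parse the ISO-like timestamps and convert the byte counts to ints
times = [datetime.strptime(d, "%Y-%m-%dT%H:%M:%S") for d in dates]
values = [int(b) for b in b_values]

# import matplotlib.pyplot as plt
# plt.plot(times, values)
# plt.show()
```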
I have a file that contains blocks of information beginning and ending with the same phrase:
# Info block
Info line 1
Info line 2
Internal problem
ENDOFPARAMETERPOINT
I am trying to write Python code that deletes the entire block beginning with # Info block and ending with ENDOFPARAMETERPOINT once it detects the phrase Internal problem.
finds = '# Info block\nInfo line 1\nInfo line 2\nInternal problem\nENDOFPARAMETERPOINT'
with open(filename,"r+") as fp:
    pattern = re.compile(r'[,\s]+' + re.escape(finds) + r'[\s]+')
    textdata = fp.read()
    line = re.sub(pattern,'',textdata)
    fp.seek(0)
    fp.write(line)
This code only works for one line but not the entire paragraph. Any suggestions are appreciated.
EDIT:
The code that works now is:
with open(filename,"r+") as fp:
    pattern = re.compile(re.escape(finds))
    textdata = fp.read()
    line = re.sub(pattern,'',textdata)
    fp.seek(0)
    fp.write(line)
    fp.truncate()
Why can't you just use pattern = re.compile(re.escape(finds))?
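To see why that's enough, here is a small self-contained check (the surrounding "keep" lines are made up):

```python
import re

finds = ('# Info block\nInfo line 1\nInfo line 2\n'
         'Internal problem\nENDOFPARAMETERPOINT')
textdata = "keep this\n" + finds + "\nkeep that\n"

# re.escape neutralises the '#', and the embedded newlines are matched
# literally, so the whole multi-line block is removed in one substitution
cleaned = re.sub(re.compile(re.escape(finds)), '', textdata)
```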
You can build two lists, start_indexes and stop_indexes, which contain respectively the index each block starts at and the index it ends at. You can then merge the two lists with the zip function to get pairs holding the start index and end index of the rows to be removed. For each pair you can create a list with the lines in that range and then remove those values from the original list.
In this example the text to be processed, split into lines, is stored in vals.
vals = ['string', '#blabla', 'ciao', 'miao', 'bau', 'ENDOFPARAMETERPOINT', 'as']
start_indexes = []
stop_indexes = []
for index, line in enumerate(vals):
    if line[0] == '#':
        start_indexes.append(index)
    elif line == 'ENDOFPARAMETERPOINT':
        stop_indexes.append(index)
for start, stop in zip(start_indexes, stop_indexes):
    values_to_remove = [vals[x] for x in range(start, stop+1)]
    for v in values_to_remove:
        vals.remove(v)
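One caveat: list.remove deletes the first occurrence of a value, so duplicate lines across blocks can make it remove the wrong row. A single-pass variant (same assumed input) sidesteps that:

```python
vals = ['string', '#blabla', 'ciao', 'miao', 'bau', 'ENDOFPARAMETERPOINT', 'as']

result = []
skipping = False
for line in vals:
    if not skipping and line.startswith('#'):
        skipping = True      # entering a block to delete
    elif skipping and line == 'ENDOFPARAMETERPOINT':
        skipping = False     # block ends here; drop this line too
    elif not skipping:
        result.append(line)  # keep lines outside any block
```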
I'm trying to read xyz coordinates from a long file using Python.
Within the file there is a block which indicates that the xyz coordinates are within the next lines:
CARTESIAN COORDINATES (ANGSTROEM)
---------------------------------
C -0.283576 -0.776740 -0.312605
H -0.177080 -0.046256 -1.140653
Cl -0.166557 0.025928 1.189976
----------------------------
I'm using the following code to find the line which mentions "CARTESIAN COORDINATES (ANGSTROEM)" and then try to iterate until an empty line is found in order to read the coordinates. However, f.tell() says that I'm at line 0! Therefore, I cannot do either next(f) or f.readline() to go through the next lines (it just goes to line 1 from line 0). I don't know how this can be done with Python.
def read_xyz_out(self, out):
    atoms = []
    x = []
    y = []
    z = []
    f = open(out, "r")
    for line in open(out):
        if re.match(r'{}'.format(r'CARTESIAN COORDINATES \(ANGSTROEM\)'), line):
            print(f.tell())
            # data = line.split()
            # atoms.append(data[0])
            # x.append(float(data[1]))
            # y.append(float(data[2]))
            # z.append(float(data[3]))
Suppose you read your file into this string:
My dog has fleas.
CARTESIAN COORDINATES (ANGSTROEM)
---------------------------------
C -0.283576 -0.776740 -0.312605
H -0.177080 -0.046256 -1.140653
Cl -0.166557 0.025928 1.189976
----------------------------
My cat too.
You can then extract lines 4, 5 and 6 with the regular expression
/CARTESIAN COORDINATES \(ANGSTROEM\)\r?\n---------------------------------\r?\n(.+?)(?=\r?\n\r?\n)/s
demo
This expression reads, "match the string 'CARTESIAN...---\r?\n', then lazily match 1+ chars in capture group 1, followed by an empty line, with the flag '/s' enabling '.' to match line terminators".
The desired information can then be extracted with the regular expression
/ *([A-Z][a-z]*) +(-?\d+\.\d{6}) +(-?\d+\.\d{6}) +(-?\d+\.\d{6})\r?\n/
demo
The first step can be skipped if it is sufficient to look for a line that looks like this:
C -0.283576 -0.776740 -0.312605
without having to confirm it is preceded by "CARTESIAN...---".
demo
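In Python the same two-step idea might look like this (a sketch: re.S plays the role of the /s flag, and the dashed lines are matched with -+ rather than a fixed width):

```python
import re

text = """My dog has fleas.
CARTESIAN COORDINATES (ANGSTROEM)
---------------------------------
C -0.283576 -0.776740 -0.312605
H -0.177080 -0.046256 -1.140653
Cl -0.166557 0.025928 1.189976
----------------------------
My cat too.
"""

# Grab everything between the dashed line under the header and the next dashed line
block = re.search(
    r'CARTESIAN COORDINATES \(ANGSTROEM\)\r?\n-+\r?\n(.+?)\r?\n-+',
    text, re.S).group(1)

# Pull out (atom, x, y, z) tuples from the block
rows = re.findall(
    r'([A-Z][a-z]*) +(-?\d+\.\d{6}) +(-?\d+\.\d{6}) +(-?\d+\.\d{6})', block)
```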
You've opened out twice: once for the f variable and a second time for the for line in open(out): loop. Each file object has its own position, and you've only been reading from the second one (which hasn't been assigned to a variable so you can't get the position). The position of f is still at the beginning, since you never read from it.
You should use
for line in f:
and not call open(out) a second time. You can then call f.readline() inside the loop to read more lines of the file.
How about this (note: untested so there's bound to be bugs - think of this as a sketch of a solution):
def read_xyz_out(self, out):
    atoms = []
    x = []
    y = []
    z = []
    f = open(out, "r")
    # Read until you get to the data
    for line in f:
        if re.match(r'CARTESIAN COORDINATES \(ANGSTROEM\)', line):
            # skip the dashed line under the header too
            f.readline()
            break
    # Now you're into the data - the loop here picks up where the previous
    # one left off
    for line in f:
        data = line.split()
        if len(data) < 4:
            # blank line or dashed terminator - end of the block
            break
        atoms.append(data[0])
        x.append(float(data[1]))
        y.append(float(data[2]))
        z.append(float(data[3]))
    f.close()
I am trying to extract data from a .txt file in Python. My goal is to capture the last occurrence of a certain word and show the next line, so I reverse the text and read from the end. In this case, I search for the word 'MEC' and show the next line, but I capture all occurrences of the word, not just the first.
Any idea what I need to do?
Thanks!
This is what my code looks like:
import re
from file_read_backwards import FileReadBackwards

with FileReadBackwards("camdex.txt", encoding="utf-8") as file:
    for l in file:
        line = l
        while line:
            if re.match('MEC', line):
                x = (file.readline())
                x2 = (x.strip('\n'))
                print(x2)
                break
            line = file.readline()
The txt file contains this:
MEC
29/35
MEC
28,29/35
And with my code print this output:
28,29/35
29/35
And my objetive is print only this:
28,29/35
This will give you the result as well: loop through the lines, append each match to a list, then print the last element.
import re

with open(r"data\camdex.txt", encoding="utf-8") as file:
    result = []
    for line in file:
        if re.match('MEC', line):
            x = file.readline()
            result.append(x.strip('\n'))
print(result[-1])
Get rid of the extra imports and overhead. Read your file normally, remembering the last line that qualifies.
with open("camdex.txt", encoding="utf-8") as file:
    for line in file:
        if line.startswith("MEC"):
            last = line
print(last[4:-1])  # "4" gets rid of "MEC "; "-1" stops just before the line feed.
If the file is very large, then reading backwards makes sense -- seeking to the end and backing up will be faster than reading to the end.
This is data from a lab experiment (around 717 lines of data). Rather than working with it in Excel, I want to import and graph it in either Python or MATLAB. I'm new here btw... and am a student!
""
"Test Methdo","exp-l Tensile with Extensometer.msm"
"Sample I.D.","Sample108.mss"
"Speciment Number","1"
"Load (lbf)","Time (s)","Crosshead (in)","Extensometer (in)"
62.638,0.900,0.000,0.00008
122.998,1.700,0.001,0.00012
more numbers : see Screenshot of more data from my file
I just can't figure out how to read each line up to the first comma. Specifically, I need the Load numbers for one of my arrays/lists, so for example from the first line I only need 62.638 (which would be the first number at the first index of my list/array).
How can I get an array/list of this, something that iterates/reads the list and ignores strings?
Thanks!
NOTE: I use Anaconda + Jupyter Notebooks for Python & Matlab (school provided software).
EDIT: Okay, so I came home today and worked on it again. I hadn't dealt with CSV files before, but after some searching I was able to learn how to read my file, somewhat.
import csv
from itertools import islice

with open('Blue_bar_GroupD.txt','r') as BB:
    BB_csv = csv.reader(BB)
    x = 0
    BB_lb = []
    while x < 7:  # to skip the string data
        next(BB_csv)
        x += 1
    for row in islice(BB_csv, 0, 758):
        print(row[0])  # testing if I can read row data
Okay, here is where I am stuck. I want to make an array/list that has the 0th-index value of each row. Sorry if I'm a freaking noob!
Thanks again!
You can skip all lines till the first data row and then parse the data into a list for later use - 700+ lines can easily be processed in memory.
Therefore you need to:
read the file line by line
remember the last non-empty line before number/comma/dot ( == header )
see if the line is only number/comma/dot, else increase a skip-counter (== data )
seek to 0
skip enough lines to get to header or data
read the rest into a data structure
Create test file:
text = """
""
"Test Methdo","exp-l Tensile with Extensometer.msm"
"Sample I.D.","Sample108.mss"
"Speciment Number","1"
"Load (lbf)","Time (s)","Crosshead (in)","Extensometer (in)"
62.638,0.900,0.000,0.00008
122.998,1.700,0.001,0.00012
"""
with open("t.txt", "w") as w:
    w.write(text)
Some helpers and the skipping/reading logic:
import re
import csv

def convert_row(row):
    """Convert one row of data into a list of mixed floats and others.
    Float is the preferred data type, else string is used - no other tried."""
    d = []
    for v in row:
        try:
            # convert to float && add
            d.append(float(v))
        except ValueError:
            # not a float, append as is
            d.append(v)
    return d

def count_to_first_data(fh):
    """Count lines in fh not consisting of numbers, dots and commas.
    Side effect: will reset position in fh to 0."""
    skiplines = 0
    header_line = 0
    fh.seek(0)
    for line in fh:
        if re.match(r"^[\d.,]+$", line):
            fh.seek(0)
            return skiplines, header_line
        else:
            if line.strip():
                header_line = skiplines
            skiplines += 1
    raise ValueError("File does not contain pure number rows!")
Usage of helpers / data conversion:
data = []
skiplines = 0
with open("t.txt", "r") as csvfile:
    skip_to_data, skip_to_header = count_to_first_data(csvfile)
    for _ in range(skip_to_header):  # use skip_to_data if you do not want the headers
        next(csvfile)
    reader = csv.reader(csvfile, delimiter=',', quotechar='"')
    for row in reader:
        row_data = convert_row(row)
        if row_data:
            data.append(row_data)
print(data)
Output (reformatted):
[['Load (lbf)', 'Time (s)', 'Crosshead (in)', 'Extensometer (in)'],
[62.638, 0.9, 0.0, 8e-05],
[122.998, 1.7, 0.001, 0.00012]]
Docs:
re.match
csv.reader
Methods of file objects (e.g. seek())
With this you now have "clean" data that you can use for further processing - including your headers.
For visualization you can have a look at matplotlib
I would recommend reading your file with Python:
data = []
with open('my_txt.txt', 'r') as fd:
    # Suppress header lines
    for i in range(6):
        fd.readline()
    # Read the first column of the data lines
    for line in fd:
        index = line.find(',')
        if index >= 0:
            data.append(float(line[0:index]))
leads to a list containing your data of the first column
>>> data
[62.638, 122.998]
The MATLAB solution is less nice, since you have to know the number of data lines in your file (which you do not need to know in the python solution)
n_header = 6
n_lines = 2 % Insert here 717 (as you mentioned)
M = csvread('my_txt.txt', n_header, 0, [n_header 0 n_header+n_lines-1 0])
leads to:
>> M
M =
62.6380
122.9980
For the sake of clarity: You can also use MATLABs textscan function to achieve what you want without knowing the number of lines, but still, the python code would be the better choice in my opinion.
Based on your format, you will need three steps: one, read all lines; two, determine which lines to use; last, get the floats and assign them to a list.
Assuming your file name is name.txt, try:
f = open("name.txt", "r")
all_lines = f.readlines()
grid = []
for line in all_lines:
    if ('"' not in line) and (line != '\n'):
        grid.append(list(map(float, line.strip('\n').split(','))))
f.close()
The grid will then contain a series of lists containing your group of floats.
Explanation for fun:
In the for loop, I check for a double quote to eliminate any string line, since all the strings are enclosed in quotes. The other condition skips empty lines.
Based on your needs, you can use the list grid as you please. For example, to fetch the first line's first number, do
grid[0][0]
as python's list counts from 0 to n-1 for n elements.
This is super simple in Matlab, just 2 lines:
data = dlmread('data.csv', ',', 6,0);
column1 = data(:,1);
Where 6 and 0 should be replaced by the row and column offset you want. In this case the data starts at row 7 and you want all the columns; the second line then copies the data in column 1 into another vector.
As another note, try typing doc dlmread in matlab - it brings up the help page for dlmread. This is really useful when you're looking for matlab functions, as it has other suggestions for similar functions down the bottom.
In Python, I'm reading a large file with many many lines. Each line contains a number and then a string such as:
[37273738] Hello world!
[83847273747] Hey my name is James!
And so on...
After I read the txt file and put it into a list, I was wondering how I would be able to extract the number and then sort each whole line based on that number?
file = open("info.txt", "r")
myList = []
for line in file:
    line = line.split()
    myList.append(line)
What I would like to do:
since the number in message one falls between 37273700 and 38000000, I'll sort that (along with all other lines that follow that rule) into a separate list
This does exactly what you need (for the sorting part); note the slice [1:-1], which strips both brackets:
my_sorted_list = sorted(my_list, key=lambda line: int(line[0][1:-1]))
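For example, with lines already split as in the question (made-up data):

```python
my_list = [
    ['[83847273747]', 'Hey', 'my', 'name', 'is', 'James!'],
    ['[37273738]', 'Hello', 'world!'],
]

# line[0] is the bracketed number; [1:-1] strips the brackets before int()
my_sorted_list = sorted(my_list, key=lambda line: int(line[0][1:-1]))
```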
Use a tuple as the key value, converting the number to int so it sorts numerically rather than lexicographically:
for line in file:
    line = line.split()
    keyval = (int(line[0].replace('[','').replace(']','')), line[1:])
    print(keyval)
    myList.append(keyval)
Sort
my_sorted_list = sorted(myList, key=lambda line: line[0])
How about:
# ---
# Function which gets a number from a line like so:
# - searches for the pattern: start_of_line, [, sequence of digits
# - if that's not found (e.g. empty line) return 0
# - if it is found, try to convert it to a number type
# - return the number, or 0 if that conversion fails
def extract_number(line):
    import re
    search_result = re.findall(r'^\[(\d+)\]', line)
    if not search_result:
        num = 0
    else:
        try:
            num = int(search_result[0])
        except ValueError:
            num = 0
    return num

# ---
# Read all the lines into a list
with open("info.txt") as f:
    lines = f.readlines()

# Sort them using the number function above, and print them
lines = sorted(lines, key=extract_number)
print(''.join(lines))
It's more resilient in the case of lines without numbers, it's more adjustable if the numbers might appear in different places (e.g. spaces at the start of the line).
(Obligatory suggestion not to use file as a variable name because it's a builtin function name already, and that's confusing).
Now there's an extract_number() function, it's easier to filter:
lines2 = [L for L in lines if 37273700 < extract_number(L) < 38000000]
print(''.join(lines2))