Finding value in a text file - python

I have to find a value of a variable in a text file.
My file is in this form:
AccountingStoragePort=8544
AccountingStorageUser=root
Name=touktouk Cards=45 Files=2015 State=UNKNOWN
How to obtain for example the value of Files (thus 2015)
In this file there is either 1 "variable" / line or several separated by a space. Each value is unique (there are not 2 files in my file)
On the other hand, "Files" can be found at any line => at line 3 or line 150 for example.
I can not modify this file to be compatible with ConfigParser.
My conf file have comment who start with "#", I must exclude these line of my research.
for exclude comment line I try this but it's not work :
match = re.search(r"(?!#\s*)\bFiles=(\d+)", s)
I found the problem, if in my file I have return line, your code not apply comment restriction, for example:
import re
s = '''
AccountingStoragePort=8544
AccountingStorageUser=root
#Name=touktouk Cards=45 Files=2015 State=UNKNOWN
Name=touktouk Cards=45 Files=2017 State=UNKNOWN
'''
match = re.search(r'^[^#].*\bFiles=(\d+)', s, flags=re.MULTILINE)
if match:
value = match.group(1)
print (value)
the program return 2015 not 2017

You can use re.search. If the value is number and variable occurs 1 time or less, use:
import re
s = '''AccountingStoragePort=8544
AccountingStorageUser=root
Name=touktouk Cards=45 Files=2015 State=UNKNOWN'''
match = re.search(r'\bFiles=(\d+)', s)
if match:
value = match.group(1)
print value

A more
import re
with open(file.txt) as f:
kvlist = re.split(r'\s+', f.read())
pairs = [kvpair.split('=') for kvpair in kvlist]
kvdict = {pair[0]:pair[1] for pair in pairs}
print(kvdict['Files'])
This should give you a complete dictionary for reuse on other values.

Related

Iterate Previous Lines after find a pattern

I am searching for a pattern and then if I find that pattern(which can be multiples in a single file) then i want to iterate backwords and capture another pattern and pick the 1st instance.
For Example, if content of the file is as below:
SetSearchExpr("This is the Search Spec 1");
...
ExecuteQuery (ForwardOnly);
var Rec2=FirstRecord();
if(Rec2!=null);
{
Then the Expected Output:
ExecuteQuery Search Spec = "This is the Search Spec 1"
I have figured out by below to check if ExecuteQuery is present, but unable to get the logic to iterate back, my code as below:
import sys
import os
file = open("Sample_code.txt", 'r')
for line in file:
if "ExecuteQuery (" in line:
#if found then check previous lines for another pattern
If anyone help me with a steer then it would be of great help.
No need to go backwards. Just save the SetSearchExpr() line in a variable and use that when you find ExecuteQuery()
for line in file:
if 'SetSearchExpr(' in line:
search_line = line
elif 'ExecuteQuery (' in line:
m = re.match(r'SetSearchExpr\((".*")\)', search_line)
search_spec = m.group(1)
print(f'ExecuteQuery Search Spec = {search_spec}')

RegEx for capturing groups using dictionary key

I'm having trouble displaying the right named capture in my dictionary function. My program reads a .txt file and then transforms the text in that file into a dictionary. I already have the right regex formula to capture them.
Here is my File.txt:
file Science/Chemistry/Quantum 444 1
file Marvel/CaptainAmerica 342 0
file DC/JusticeLeague/Superman 300 0
file Math 333 0
file Biology 224 1
Here is the regex link that is able to capture the ones I want:
By looking at the link, the ones I want to display is highlighted in green and orange.
This part of my code works:
rx= re.compile(r'file (?P<path>.*?)( |\/.*?)? (?P<views>\d+).+')
i = sub_pattern.match(data) # 'data' is from the .txt file
x = (i.group(1), i.group(3))
print(x)
But since I'm making the .txt into a dictionary I couldn't figure out how to make .group(1) or .group(3) as keys to display specifically for my display function. I don't know how to make those groups display when I use print("Title: %s | Number: %s" % (key[1], key[3])) and it will display those contents. I hope someone can help me implement that in my dictionary function.
Here is my dictionary function:
def create_dict(data):
dictionary = {}
for line in data:
line_pattern = re.findall(r'file (?P<path>.*?)( |\/.*?)? (?P<views>\d+).+', line)
dictionary[line] = line_pattern
content = dictionary[line]
print(content)
return dictionary
I'm trying to make my output look like this from my text file:
Science 444
Marvel 342
DC 300
Math 333
Biology 224
You may create and populate a dictionary with your file data using
def create_dict(data):
dictionary = {}
for line in data:
m = re.search(r'file\s+([^/\s]*)\D*(\d+)', line)
if m:
dictionary[m.group(1)] = m.group(2)
return dictionary
Basically, it does the following:
Defines a dictionary dictionary
Reads data line by line
Searches for a file\s+([^/\s]*)\D*(\d+) match, and if there is a match, the two capturing group values are used to form a dictionary key-value pair.
The regex I suggest is
file\s+([^/\s]*)\D*(\d+)
See the Regulex graph explaining it:
Then, you may use it like
res = {}
with open(filepath, 'r') as f:
res = create_dict(f)
print(res)
See the Python demo.
You already used named group in your 'line_pattern', simply put them to your dictionary. re.findall would not work here. Also the character escape '\' before '/' is redundant. Thus your dictionary function would be:
def create_dict(data):
dictionary = {}
for line in data:
line_pattern = re.search(r'file (?P<path>.*?)( |/.*?)? (?P<views>\d+).+', line)
dictionary[line_pattern.group('path')] = line_pattern.group('views')
content = dictionary[line]
print(content)
return dictionary
This RegEx might help you to divide your inputs into four groups where group 2 and group 4 are your target groups that can be simply extracted and spaced with a space:
(file\s)([A-Za-z]+(?=\/|\s))(.*)(\d{3})

Regular Expression result don't match with tester

I'm new with Python...
After couple days if googling I'm still don't get it to work.
My script:
import re
pattern = '^Hostname=([a-zA-Z0-9.]+)'
hand = open('python_test_data.conf')
for line in hand:
line = line.rstrip()
if re.search(pattern, line) :
print line
Test file content:
### Option: Hostname
# Unique, case sensitive Proxy name. Make sure the Proxy name is known to the server!
# Value is acquired from HostnameItem if undefined.
#
# Mandatory: no
# Default:
# Hostname=
Hostname=bbg-zbx-proxy
Script results:
ubuntu-workstation:~$ python python_test.py
Hostname=bbg-zbx-proxy
But when I have tested regex in tester the result is: https://regex101.com/r/wYUc4v/1
I need some advice haw cant I get only bbg-zbx-proxy as script output.
You have already written a regular expression capturing one part of the match, so you could as well use it then. Additionally, change your character class to include - and get rid of the line.strip() call, it's not necessary with your expression.
In total this comes down to:
import re
pattern = '^Hostname=([-a-zA-Z0-9.]+)'
hand = open('python_test_data.conf')
for line in hand:
m = re.search(pattern, line)
if m:
print(m.group(1))
# ^^^
The simple solution would be to split on the equals sign. You know it will always contain that and you will be able to ignore the first item in the split.
import re
pattern = '^Hostname=([a-zA-Z0-9.]+)'
hand = open('testdata.txt')
for line in hand:
line = line.rstrip()
if re.search(pattern, line) :
print(line.split("=")[1]) # UPDATED HERE

how to get the number of occurrence of an expression in a file using python

I have a code that read files and find the matching expression with the user input and highlight it, using findall function in regular expression.
also i am trying to save in json file several information based on this matching.
like :
file name
matching expression
number of occurrence
the problem is that the program read the file and display the text with highlighted expression but in the json file it save the number of occurrence as the number of lines.
in this example the word this is the searched word it exist in the text file twice
the result in the json file is = 12 ==> that is the number of text lines
result of the json file and the highlighted text
code:
def MatchFunc(self):
self.textEdit_PDFpreview.clear()
x = self.lineEditSearch.text()
TextString=self.ReadingFileContent(self.FileListSelected())
d = defaultdict(list)
filename = os.path.basename(self.FileListSelected())
RepX='<u><b style="color:#FF0000">'+x+'</b></u>'
for counter , myLine in enumerate(filename):
self.textEdit_PDFpreview.clear()
thematch=re.sub(x,RepX,TextString)
thematchFilt=re.findall(x,TextString,re.M|re.I)
if thematchFilt:
d[thematchFilt[0]].append(counter + 1)
self.textEdit_PDFpreview.insertHtml(str(thematch))
else:
self.textEdit_PDFpreview.insertHtml('No Match Found')
OutPutListMetaData = []
for match , positions in d.items():
print ("this is match {}".format(match))
print("this is position {}".format(positions))
listMetaData = {"File Name":filename,"Searched Word":match,"Number Of Occurence":len(positions)}
OutPutListMetaData.append(listMetaData)
for p in positions:
print("on line {}".format(p))
jsondata = json.dumps(OutPutListMetaData,indent=4)
print(jsondata)
folderToCreate = "search_result"
today = time.strftime("%Y%m%d__%H-%M")
jsonFileName = "{}_searchResult.json".format(today)
if not(os.path.exists(os.getcwd() + os.sep + folderToCreate)):
os.mkdir("./search_result")
fpJ = os.path.join(os.getcwd()+os.sep+folderToCreate,jsonFileName)
print(fpJ)
with open(fpJ,"a") as jsf:
jsf.write(jsondata)
print("finish writing")
It's straightforward using Counter. Once you pass an iterable, it returns each one of them along with the number of occurrences as tuples.
As the re.findall function returns a list you can just do len(result).

Analysing a text file in Python

I have a text file that needs to be analysed. Each line in the file is of this form:
7:06:32 (slbfd) IN: "lq_viz_server" aqeela#nabltas1
7:08:21 (slbfd) UNSUPPORTED: "Slb_Internal_vlsodc" (PORT_AT_HOST_PLUS ) Albahraj#nabwmps3 (License server system does not support this feature. (-18,327))
7:08:21 (slbfd) OUT: "OFM32" Albahraj#nabwmps3
I need to skip the timestamp and the (slbfd) and only keep a count of the lines with the IN and OUT. Further, depending on the name in quotes, I need to increase a variable count for different variables if a line starts with OUT and decrease the variable count otherwise. How would I go about doing this in Python?
The other answers with regex and splitting the line will get the job done, but if you want a fully maintainable solution that will grow with you, you should build a grammar. I love pyparsing for this:
S ='''
7:06:32 (slbfd) IN: "lq_viz_server" aqeela#nabltas1
7:08:21 (slbfd) UNSUPPORTED: "Slb_Internal_vlsodc" (PORT_AT_HOST_PLUS ) Albahraj#nabwmps3 (License server system does not support this feature. (-18,327))
7:08:21 (slbfd) OUT: "OFM32" Albahraj#nabwmps3'''
from pyparsing import *
from collections import defaultdict
# Define the grammar
num = Word(nums)
marker = Literal(":").suppress()
timestamp = Group(num + marker + num + marker + num)
label = Literal("(slbfd)")
flag = Word(alphas)("flag") + marker
name = QuotedString(quoteChar='"')("name")
line = timestamp + label + flag + name + restOfLine
grammar = OneOrMore(Group(line))
# Now parsing is a piece of cake!
P = grammar.parseString(S)
counts = defaultdict(int)
for x in P:
if x.flag=="IN": counts[x.name] += 1
if x.flag=="OUT": counts[x.name] -= 1
for key in counts:
print key, counts[key]
This gives as output:
lq_viz_server 1
OFM32 -1
Which would look more impressive if your sample log file was longer. The beauty of a pyparsing solution is the ability to adapt to a more complex query in the future (ex. grab and parse the timestamp, pull email address, parse error codes...). The idea is that you write the grammar independent of the query - you simply convert the raw text to a computer friendly format, abstracting away the parsing implementation away from it's usage.
If I consider that the file is divided into lines (I don't know if it's true) you have to apply split() function to each line. You will have this:
["7:06:32", "(slbfd)", "IN:", "lq_viz_server", "aqeela#nabltas1"]
And then I think you have to be capable of apply any logic comparing the values that you need.
i made some wild assumptions about your specification and here is a sample code to help you start:
objects = {}
with open("data.txt") as data:
for line in data:
if "IN:" in line or "OUT:" in line:
try:
name = line.split("\"")[1]
except IndexError:
print("No double quoted name on line: {}".format(line))
name = "PARSING_ERRORS"
if "OUT:" in line:
diff = 1
else:
diff = -1
try:
objects[name] += diff
except KeyError:
objects[name] = diff
print(objects) # for debug only, not advisable to print huge number of names
You have two options:
Use the .split() function of the string (as pointed out in the comments)
Use the re module for regular expressions.
I would suggest using the re module and create a pattern with named groups.
Recipe:
first create a pattern with re.compile() containing named groups
do a for loop over the file to get the lines use .match() od the
created pattern object on each line use .groupdict() of the
returned match object to access your values of interest
In the mode of just get 'er done with the standard distribution, this works:
import re
from collections import Counter
# open your file as inF...
count=Counter()
for line in inF:
match=re.match(r'\d+:\d+:\d+ \(slbfd\) (\w+): "(\w+)"', line)
if match:
if match.group(1) == 'IN': count[match.group(2)]+=1
elif match.group(1) == 'OUT': count[match.group(2)]-=1
print(count)
Prints:
Counter({'lq_viz_server': 1, 'OFM32': -1})

Categories