Parsing with multiple loops opening files

I'm trying to count the number of lines contained by a file that looks like this:
-StartACheck
---Lines--
-EndACheck
-StartBCheck
---Lines--
-EndBCheck
with this:
count=0
z={}
for line in file:
    s=re.search(r'\-+Start([A-Za-z0-9]+)Check',line)
    if s:
        e=s.group(1)
        for line in file:
            z.setdefault(e,[]).append(count)
            q=re.search(r'\-+End',line)
            if q:
                count=0
                break
for a,b in z.items():
    print(a,len(b))
I want to basically store the number of lines present inside ACheck , BCheck etc in a dictionary but I keep getting the wrong output
Something like this
A,15
B,9
etc
I found out that even though the code should work, it doesn't because of the way the file is opened. I can't change the way it is opened and was looking for an implementation that only opens the file once but counts the same things and gives the exact same output without all the added functions of the newer python version.

This kind of problem can be solved with a finite state machine. That is a big topic that needs more explanation than I can give here, so it is worth reading up on to see what else you can do with it.
But first of all, I'm going to make a few assumptions:
The input file doesn't contain any errors
If more than one section has the same name, you want their counts to be combined
Even though you tagged this question python 2.7, you are using print(), so I'll assume you are on Python 3.x
Here's my suggestion:
import re
input_filename = "/home/evens/Temporaire/StackOverflow/Input_file-39339007.txt"
matchers = {
    'start_section' : re.compile(r'\-+Start([A-Za-z0-9]+)Check'),
    'end_section' : re.compile(r'\-+End'),
}
inside_section = False # Am I inside a section ?
section_name = None # Which section am I in ?
tally = {} # Sums of each section
with open(input_filename) as file_read:
    for line in file_read:
        line_matches = {k: v.match(line) for (k, v) in matchers.items()}
        # Unless something below says otherwise, the state stays the same
        future_inside_section = inside_section
        if inside_section:
            if line_matches['end_section']:
                future_inside_section = False
            else:
                future_inside_section = True
                if section_name in tally:
                    tally[section_name] += 1
                else:
                    tally[section_name] = 1
        else:
            if line_matches['start_section']:
                future_inside_section = True
                section_name = line_matches['start_section'].group(1)
        # Just before we go in the future
        inside_section = future_inside_section
for (a,b) in tally.items():
    print('Total of all "{}" sections: {}'.format(a, b))
What this code does is determine :
How it should change its state (Am I going to be inside or outside a section on the next line?)
What else should be done:
Change the name of the section I'm in ?
Count this line in the present section ?
But even this code has its problems:
It doesn't check to see if a section start has a matching section end (-StartACheck could be ended by -EndATotallyInvalidCheck)
It doesn't handle the case where two consecutive section starts (or ends) are detected (Error? Nested sections?)
It doesn't handle the case where there are lines outside a section
And probably other corner cases.
How you want to handle these cases is up to you
This code could probably be further simplified but I don't want to be too complex for now.
Hope this helps. Don't hesitate to ask if you need further explanations.
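For what it's worth, here is a minimal sketch of how some of the corner cases listed above could be handled. It is only one possible policy, not the only reasonable one: it assumes every end marker has the form -End<Name>Check, treats a mismatched or nested marker as an error, and silently ignores lines outside a section. The function name count_section_lines and the END pattern are my own choices. It takes any iterable of lines, so an already opened file object can be passed in and is read only once.

```python
import re

START = re.compile(r'-+Start([A-Za-z0-9]+)Check')
# Assumption: end markers mirror start markers, e.g. -EndACheck
END = re.compile(r'-+End([A-Za-z0-9]+)Check')

def count_section_lines(lines):
    """Return {section name: number of lines between its Start and End markers}."""
    tally = {}
    current = None  # name of the section we are inside, or None when outside
    for number, line in enumerate(lines, start=1):
        start = START.match(line)
        end = END.match(line)
        if start:
            if current is not None:
                raise ValueError('line {}: section {!r} starts inside {!r}'.format(
                    number, start.group(1), current))
            current = start.group(1)
            tally.setdefault(current, 0)  # repeated sections share one count
        elif end:
            if end.group(1) != current:
                raise ValueError('line {}: End{} does not close {!r}'.format(
                    number, end.group(1), current))
            current = None
        elif current is not None:
            tally[current] += 1
        # lines outside any section are ignored
    if current is not None:
        raise ValueError('section {!r} is never closed'.format(current))
    return tally
```

Calling count_section_lines(file) on the open file object and then looping over the returned dict gives the per-section totals in a single pass.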

Related

Eliminate Indentations in Python

I'm using the Google Docs API to retrieve the contents of a document and process it using Python. However, the document is of complex structure and I have to loop through multiple nodes of the returned JSON, so I have to use multiple for loops to get the desired content and do the filter necessary. Is there a way that I can eliminate some of the indentations to make the format look much more organized?
Here is a snippet of my loops:
for key, docContent in docs_api_result.json().items():
    if key == "body":
        content = docContent['content']
        for i, body_content in enumerate(content):
            if "table" in body_content:
                for sKey, tableContent in content[i]['table'].items():
                    if sKey == "tableRows":
                        for tableRowContent in tableContent:
                            for tableCellMain in tableRowContent['tableCells']:
                                for tableCellContent in tableCellMain['content']:
                                    hasBullet = False
                                    for tableCellElement in tableCellContent['paragraph']['elements']:
                                        if "bullet" in tableCellContent['paragraph']:
                                            ...
I know that instead of having
if True:
    # some code here
I can replace it with
if False:
    continue
# some code here
to remove some of the indents, but that only solves part of the problem. I still have 7 for-loops left and I hope that I could remove some of the indentations as well.
Any help is appreciated! :)

The general method for reducing indentation levels would be to identify blocks of code to go in their own functions.
E.g. looking at your loop, I guess I would try something like:
class ApiResultProcessor(object):
    def process_api_result(self, api_result):
        doc_dict = api_result.json()
        if "body" in doc_dict:
            self.process_body(doc_dict["body"])
    def process_body(self, body_dict):
        content = body_dict["content"]
        for i, content_element_dict in enumerate(content):
            if "table" in content_element_dict:
                self.process_table(content_element_dict["table"])
            ...
    def process_table(self, table_dict):
        for tableRowContent in table_dict["tableRows"]:
            for tableCellMain in tableRowContent["tableCells"]:
                for tableCellContent in tableCellMain['content']:
                    self.process_cell_content(tableCellContent)
    def process_cell_content(self, table_cell_dict):
        hasBullet = False
        for tableCellElement in table_cell_dict["paragraph"]["elements"]:
            if "bullet" in table_cell_dict["paragraph"]:
                ...
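The only other refactoring that I have done is trying to avoid the dreadful "for if" antipattern, i.e. looping over all the items of a dict only to test each key against a constant, instead of looking the key up directly.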
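Another option along the same lines is to hide the traversal in a generator, so the calling code needs only one loop. This is a rough sketch, assuming the JSON has the nesting shown in the question (body, content, table, tableRows, tableCells, content, paragraph); the name iter_table_paragraphs is made up for illustration.

```python
def iter_table_paragraphs(document):
    """Yield every paragraph dict found inside the table cells of the document dict."""
    for element in document.get("body", {}).get("content", []):
        table = element.get("table")
        if table is None:
            continue  # not a table, nothing to descend into
        for row in table.get("tableRows", []):
            for cell in row.get("tableCells", []):
                for cell_content in cell.get("content", []):
                    if "paragraph" in cell_content:
                        yield cell_content["paragraph"]


def bulleted_paragraphs(document):
    """Example consumer: keep only the paragraphs that carry a 'bullet' key."""
    return [p for p in iter_table_paragraphs(document) if "bullet" in p]
```

The nesting still exists, but it lives in one place, and the code that does the filtering (for example a loop over iter_table_paragraphs(docs_api_result.json())) stays at a single indentation level.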

I am not that experienced with Python, but I am pretty sure you can indent each level with a single space instead of four and it won't be an indentation error, as long as every block is indented consistently. It does go against the PEP 8 style guide, though...
So just replace every 4 spaces/tab of indentation in this bunch of code with 1 space.

Python - program for searching for relevant cells in excel does not work correctly

I've written a code to search for relevant cells in an excel file. However, it does not work as well as I had hoped.
In pseudocode, this is what it should do:
Ask for input excel file
Ask for input textfile containing keywords to search for
Convert input textfile to list containing keywords
For each keyword in list, scan the excelfile
If the keyword is found within a cell, write it into a new excelfile
Repeat with next word
The code works, but some keywords are not found while they are present within the input excelfile. I think it might have something to do with the way I iterate over the list, since when I provide a single keyword to search for, it works correctly. This is my whole code: https://pastebin.com/euZzN3T3
This is the part I suspect is not working correctly. Splitting the textfile into a list works fine (I think).
#IF TEXTFILE
elif btext == True:
    #Split each line of textfile into a list
    file = open(txtfile, 'r')
    #Keywords in list
    for line in file:
        keywordlist = file.read().splitlines()
    nkeywords = len(keywordlist)
    print(keywordlist)
    print(nkeywords)
    #Iterate over each string in list, look for match in .xlsx file
    for i in range(1, nkeywords):
        nfound = 0
        ws_matches.cell(row = 1, column = i).value = str.lower(keywordlist[i-1])
        for j in range(1, worksheet.max_row + 1):
            cursor = worksheet.cell(row = j, column = c)
            cellcontent = str.lower(cursor.value)
            if match(keywordlist[i-1], cellcontent) == True:
                ws_matches.cell(row = 2 + nfound, column = i).value = cellcontent
                nfound = nfound + 1
and my match() function:
def match(keyword, content):
    """Check if the keyword is present within the cell content, return True if found, else False"""
    if content.find(keyword) == -1:
        return False
    else:
        return True
I'm new to Python so my apologies if the way I code looks like a warzone. Can someone help me see what I'm doing wrong (or could be doing better?)? Thank you for taking the time!

Splitting the textfile into a list works fine (I think).
This is something you should actually test (hint: it is inelegant, and because file.read() is called inside the for line in file loop, the first line of the text file is consumed by the loop itself and never ends up in keywordlist). The best way to make easily testable code is to isolate functional units into separate functions, i.e. you could make a function that takes the name of a text file and returns a list of keywords. Then you can easily check whether that bit of code works on its own. A more pythonic way to read lines from a file (which is what you do, assuming one word per line) is as follows:
with open(filename) as f:
    keywords = f.read().splitlines()  # splitlines() drops the trailing newlines
The rest of your code may actually work better than you expect. I'm not able to test it right now (and don't have your spreadsheet to try it on anyway), but if you're relying on nfound to give you an accurate count for all keywords, you've made a small but significant mistake: it's set to zero inside the loop, and thus you only get a count for the last keyword. Move nfound = 0 outside the loop.
In Python, the way to iterate over lists - or just about anything - is not to increment an integer and then use that integer to index the value in the list. Rather loop over the list (or other iterable) itself:
for keyword in keywordlist:
    ...
As a hint, you shouldn't need nkeywords at all.
I hope this gets you on the right track. When asking questions in future, it'd be a great help to provide more information about what goes wrong, and preferably enough to be able to reproduce the error.
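To make the advice above concrete, here is a small sketch with the file reading and the matching split into two functions that can be tested without a spreadsheet. The names read_keywords and find_matches are made up, and the cell values are assumed to arrive as a plain list (for instance collected from one worksheet column); none of this is taken from the pastebin code. Note also that range(1, nkeywords) in the question stops one short of the last keyword, which looping over the list directly avoids.

```python
def read_keywords(filename):
    """Return the non-empty lines of a text file, stripped, one keyword per line."""
    with open(filename) as f:
        return [line.strip() for line in f if line.strip()]


def find_matches(keywords, cell_values):
    """Map each keyword to the cell values that contain it (case-insensitive)."""
    matches = {}
    for keyword in keywords:
        needle = keyword.lower()
        matches[keyword] = [
            str(value)
            for value in cell_values
            if value is not None and needle in str(value).lower()
        ]
    return matches
```

Because find_matches returns a dict of lists, writing the results into the output workbook becomes a separate step: one column per keyword and one row per entry in its list.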

Reading data from a text file in Python according to the parameters provided

I have a text file something like this
Mqtt_allowed=true
Mqtt_host=192.168.0.1
Mqtt_port=2223
<=============>
cloud_allowed=true
cloud_host=m12.abc.com
cloud_port=1232
<=============>
local_storage=true
local_path=abcd
I need to get the value for each parameter provided by the user.
What I am doing right now is:
def search(param):
    try:
        with open('config.txt') as configuration:
            for line in configuration:
                if not line:
                    continue
                function, f_input=line.split("=")
                if function == param:
                    result=f_input.split()
                    break
                else:
                    result="0"
    except FileNotFoundError:
        print("File not found: ")
    return result
mqttIsAllowed=search("Mqtt_allowed")
print mqttIsAllowed
Now when I call it only for the Mqtt parameters it works fine, but when I call it for cloud or anything after the "<==========>" separator it throws an error. Thanks

Just skip all the lines starting with <:
if not line or line.lstrip().startswith("<"):
    continue
Or, if you really, really want to match the separator exactly:
if line.strip() == "<=============>":
    continue
I think the first variant is better because if someone slightly modified the separator by accident, the second piece of code won't work at all.

Because you are trying to split on the = character in a style that seems to be standard INI format, it is safe to assume that your pairs will be at max size 2. I'm not a fan of using methods that rely on character checking (unless specifically called for), so give this a whirl:
def search(param):
    result = '0' # declare here
    try:
        with open('config.txt') as configuration:
            for line in configuration:
                if not line:
                    continue
                f_pair = line.strip().split("=") # remove \r\n, \n
                if len(f_pair) != 2: # your separator will be much longer
                    continue
                elif f_pair[0] == param:
                    result = f_pair[1]
                    # result = f_input.split() # why the 'split()' here?
                    break
    except FileNotFoundError:
        print("File not found: ")
    return result
mqttIsAllowed=search("Mqtt_allowed")
I'm pretty sure the error you were getting was a ValueError: too many values to unpack.
Here is how I know that:
When you call this function for any of the Mqtt_* values, the loop never encounters the separator string <=============>. As soon as you try to call anything below that first separator (for example a cloud_* key), the loop eventually reaches the first separator and tries to execute:
function, f_input = line.split('=')
But that won't work, in fact it will tell you:
ValueError: too many values to unpack (expected 2)
And that is because you are forcing the split() call to push into only 2 variables, but a split('=') on your separator string will return a list of 15 elements (a '<', a '>' and 13 ''). Thus, doing what I have posted above ensures that your split('=') still goes off, but checks to see if you hit a separator or not.
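If several parameters are needed, an alternative is to read the whole file into a dict once and look values up from there. This is only a sketch under the assumptions visible in the question (one key=value pair per line, separator lines starting with "<"); the function name parse_config is my own. It uses str.partition, which always returns three parts, so it cannot raise the unpacking error even when a line has no "=" or more than one.

```python
def parse_config(lines):
    """Build a dict from 'key=value' lines, skipping blanks and separator lines."""
    settings = {}
    for raw in lines:
        line = raw.strip()
        if not line or line.startswith("<"):
            continue  # blank line or <=====> separator
        key, sep, value = line.partition("=")  # splits on the first '=' only
        if not sep:
            continue  # no '=' on this line, so it is not a setting
        settings[key.strip()] = value.strip()
    return settings
```

With the file open as configuration, settings = parse_config(configuration) followed by settings.get("cloud_host", "0") mirrors the "0" default used in the question.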

Functional approach to file parsing in Python

I have a text file describing an electronic circuit and a few other things done with it. I've built a simple Python code that splits the file into different units which can then be further analyzed if needed.
The syntax of the simulation language defines these units as contained within the following lines:
subckt xxx .....
...
...
ends xxx ...
There is a few of these 'text blocks' and other stuff I'm parsing or leaving out - like comment lines.
To accomplish this, I use the following core:
with open('input') as f:
    for l in iter(f):
        if 'subckt' not in l:
            pass
        else:
            with open('output') as o:
                o.write(l)
                for l in iter(f):
                    if 'ends' in l:
                        o.write(l)
                        break
                    else:
                        o.write(l)
(can't easily paste the real code, there might be oversights)
The nice thing about it is the fact that iter(f) keeps scanning the file so when I break out of the inner loop as I reached the ends line of a subckt, the outer loop keeps going from that point onward, searching for new occurrences of the token subckt in subsequent lines.
I am looking for suggestions and/or guidance on how to transform the forest of if/then clauses into something more functional, i.e. based on 'pure' functions which just yield values (the file rows or lines) and are then composed as to bring to the final result.
Specifically, I am not sure how to approach the fact that the generator\map\filter should actually yield a different row based on the fact that it has found the subckt token or not.
I can think of a filter of the form:
line = filter(lambda x: 'subckt' in x, iter(f))
but this of course only gives me the lines where that string is present, whereas I would like - from that moment on - yield all lines, until the ends token is found.
Is this something I'd have to handle with recursion? Or maybe itertools.tee?
Seems to me that what I want is to have some form of state, i.e. "you have reached a subckt", but without resorting to a true state variable, which would be against the functional paradigm.

Not sure if this is what you are looking for. blocks(f) is a generator producing the blocks in your file f. Each block is an iterator over the lines between 'subckt' and 'ends'. If you want to include those two lines in the block, you'd have to do some more work in __block. But I hope this gives you an idea:
def __block(f):
    while 'subckt' not in next(f): pass # raises StopIteration at EOF
    return iter(next(iter([])) if 'ends' in l else l.strip() for l in f)
def blocks(f):
    while 1: yield __block(f) # StopIteration from __block will stop the generator
f = open('data.txt')
for block in blocks(f):
    # process block
    for line in block:
        # process line
        pass
The next(iter([])) expression is a little hack to terminate a comprehension/generator by raising StopIteration.
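One caveat: the hack relies on a StopIteration escaping from inside a generator, and since PEP 479 (the default behaviour from Python 3.7 on) such an exception is converted into a RuntimeError. Below is a minimal sketch of the same idea that does not depend on it, using itertools.takewhile instead; the name iter_blocks is mine, not part of the answer above.

```python
from itertools import takewhile


def iter_blocks(lines):
    """Yield one lazy iterator per block: the stripped lines between 'subckt' and 'ends'."""
    lines = iter(lines)  # share a single iterator between the loop and takewhile
    for line in lines:
        if 'subckt' in line:
            # takewhile consumes the 'ends' line itself, so the outer loop
            # resumes right after it once the block has been exhausted
            yield (l.strip() for l in takewhile(lambda l: 'ends' not in l, lines))
```

As with the original, each block has to be consumed completely before asking for the next one, because all blocks read from the same underlying file iterator.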

This answer also works, still very keen on hearing comments:
from itertools import takewhile, dropwhile
def start(l): return 'subckt' not in l
def stop(l): return 'ends' not in l
def sub(iter):
    while True:
        a = list(dropwhile(start,takewhile(stop,iter)))
        if len(a):
            yield a
        else:
            return
f = open('file.txt')
for b in sub(f):
    #process b
    pass
f.close()
Something I couldn't work out yet: enclose the last line (containing ends keyword) in the output.
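On that last point, takewhile always swallows the first line that fails its predicate, so the ends line cannot be recovered from it afterwards. One possible way around this, sketched here with a made-up name (sub_with_ends), is a plain generator that collects each block itself. It does keep a small piece of state (the current block), so it is less purely functional than the version above.

```python
def sub_with_ends(lines):
    """Yield each block as a list, from its 'subckt' line through its 'ends' line."""
    block = None  # None while we are outside a block
    for line in lines:
        if block is None:
            if 'subckt' in line:
                block = [line]  # the opening line starts a new block
        else:
            block.append(line)
            if 'ends' in line:
                yield block  # the closing line is included
                block = None
    # a block that is still open at end of file is dropped
```

It can be used the same way as sub(f): iterate over sub_with_ends(f) and process each list of lines.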

Copy and paste Python functions in Emacs

I have a program that looks something like (this is a silly example to illustrate my point, what it does is not very important)
count = 0
def average(search_term):
    average = 0
    page = 0
    current = download(search_term, page)
    while current:
        def add_up(downloaded):
            results = downloaded.body.get_results()
            count += len(results)
            return sum(result.score for result in results)
        total = average*count
        total += add_up(current)
        average = total/count
        print('Average so far: {:2f}'.format(average))
        page += 1
        current = download(search_term, page)
If I have the cursor on any of the lines 8–11 and press a key combination I want Emacs to copy or kill the add_up function, and then I want to move the cursor to line 2 and press a key combination and paste the function there, with the correct level of indentation for the context it is pasted in.
Is this possible, and if so, how would I do that?

With python-mode.el, py-kill-def followed by yank would do the job.
However, there are some restrictions. py-kill-def must be called from inside the def in question, so you need to move up from line 11 first.
Indenting after the insert also poses some problems: because indentation is syntax in Python, Emacs sometimes can't know which indentation is wanted. In the example below, an indent of 4 for the def line followed by a further 8 for the body of add_up is probably not wanted, however it's legal code. After indenting the first line in the body of add_up, py-indent-and-forward should be convenient for the remaining lines.
def average(search_term):
    average = 0
    def add_up(downloaded):
            results = downloaded.body.get_results()
            count += len(results)
            return sum(result.score for result in results)
    page = 0
    current = download(search_term, page)
    while current:
        total = average*count
        total += add_up(current)
        average = total/count
        print('Average so far: {:2f}'.format(average))
        page += 1
        current = download(search_term, page)

For this type of thing I usually use expand-region, which I choose to bind to C-=.
Using your example I can select the add_up() function by pressing C-= once, kill the region normally (C-w), move to line 2, and yank as usual (C-y).
Depending on what else you have configured for Python you may have to clean up some whitespace, or it may get cleaned up for you. For example, aggressive-indent would be helpful.
One manual option would be to reindent the pasted code with something like C-x C-x C-M-\ (exchange point and mark to reselect the yanked text, then indent-region).

I've been using smart-shift (available in Melpa) for this sort of thing. global-smart-shift-mode to enable (beware, it binds keys). Select the block you want to move (I'd use expand-region like Chris), and the default keybind C-S-c <arrow> starts moving it. Once you're shifting, the arrows (without C-S-c) shift further. Horizontal shifts use the major mode's indent offset (python-indent-offset for python.el).
