Eliminate Indentations in Python - python

I'm using the Google Docs API to retrieve the contents of a document and process it with Python. However, the document has a complex structure and I have to walk through multiple nodes of the returned JSON, so I end up with many nested for loops to reach the desired content and do the necessary filtering. Is there a way to eliminate some of the indentation so the code looks more organized?
Here is a snippet of my loops:
for key, docContent in docs_api_result.json().items():
    if key == "body":
        content = docContent['content']
        for i, body_content in enumerate(content):
            if "table" in body_content:
                for sKey, tableContent in content[i]['table'].items():
                    if sKey == "tableRows":
                        for tableRowContent in tableContent:
                            for tableCellMain in tableRowContent['tableCells']:
                                for tableCellContent in tableCellMain['content']:
                                    hasBullet = False
                                    for tableCellElement in tableCellContent['paragraph']['elements']:
                                        if "bullet" in tableCellContent['paragraph']:
                                            ...
I know that instead of having
if True:
    # some code here
I can replace it with
if False:
    continue
# some code here
to remove some of the indents, but that only solves part of the problem. I still have 7 for-loops left and I hope that I could remove some of the indentations as well.
Any help is appreciated! :)

The general method for reducing indentation levels would be to identify blocks of code to go in their own functions.
E.g. looking at your loop, I guess I would try something like:
class ApiResultProcessor(object):

    def process_api_result(self, api_result):
        doc_dict = api_result.json()
        if "body" in doc_dict:
            self.process_body(doc_dict["body"])

    def process_body(self, body_dict):
        content = body_dict["content"]
        for i, content_element_dict in enumerate(content):
            if "table" in content_element_dict:
                self.process_table(content_element_dict["table"])
            ...

    def process_table(self, table_dict):
        for tableRowContent in table_dict["tableRows"]:
            for tableCellMain in tableRowContent["tableCells"]:
                for tableCellContent in tableCellMain['content']:
                    self.process_cell_content(tableCellContent)

    def process_cell_content(self, table_cell_dict):
        hasBullet = False
        for tableCellElement in table_cell_dict["paragraph"]["elements"]:
            if "bullet" in table_cell_dict["paragraph"]:
                ...
The only refactoring that I have done is trying to avoid the dreadful "for if" antipattern.
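Another option along the same lines is to hide the traversal in a generator, so the code that does the real work only needs one loop. This is just a sketch: iter_table_paragraphs and sample_document are names made up for illustration, and the sample dict is a heavily trimmed stand-in for what docs_api_result.json() might return, not the real response shape in full.

```python
def iter_table_paragraphs(document):
    """Yield every paragraph dict found inside table cells of the document body."""
    body = document.get("body", {})
    for element in body.get("content", []):
        table = element.get("table")
        if table is None:
            continue  # guard clause instead of another nested "if"
        for row in table.get("tableRows", []):
            for cell in row.get("tableCells", []):
                for cell_content in cell.get("content", []):
                    paragraph = cell_content.get("paragraph")
                    if paragraph is not None:
                        yield paragraph


# Hypothetical, heavily trimmed stand-in for docs_api_result.json()
sample_document = {
    "body": {
        "content": [
            {"paragraph": {"elements": []}},  # not in a table, so it is skipped
            {"table": {"tableRows": [
                {"tableCells": [
                    {"content": [
                        {"paragraph": {"bullet": {},
                                       "elements": [{"textRun": {"content": "a"}}]}},
                        {"paragraph": {"elements": [{"textRun": {"content": "b"}}]}},
                    ]},
                ]},
            ]}},
        ]
    }
}

# The processing code now sits at a shallow indentation level
for paragraph in iter_table_paragraphs(sample_document):
    has_bullet = "bullet" in paragraph
    for element in paragraph["elements"]:
        print(has_bullet, element["textRun"]["content"])
```

The nesting does not disappear, but it lives in one small function that only knows how to walk the structure, while the filtering logic stays flat and readable.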

I am not that experienced with Python, but I am pretty sure you can indent each level with a single space instead of a multiple of four and it won't be an indentation error, although that goes against PEP 8.
So just replace every 4 spaces (or tab) of indentation in this piece of code with 1 space.

Related

Python - program for searching for relevant cells in excel does not work correctly

I've written a code to search for relevant cells in an excel file. However, it does not work as well as I had hoped.
In pseudocode, this is what it should do:
Ask for input excel file
Ask for input textfile containing keywords to search for
Convert input textfile to list containing keywords
For each keyword in list, scan the excelfile
If the keyword is found within a cell, write it into a new excelfile
Repeat with next word
The code works, but some keywords are not found while they are present within the input excelfile. I think it might have something to do with the way I iterate over the list, since when I provide a single keyword to search for, it works correctly. This is my whole code: https://pastebin.com/euZzN3T3
This is the part I suspect is not working correctly. Splitting the textfile into a list works fine (I think).
#IF TEXTFILE
elif btext == True:
    #Split each line of textfile into a list
    file = open(txtfile, 'r')
    #Keywords in list
    for line in file:
        keywordlist = file.read().splitlines()
    nkeywords = len(keywordlist)
    print(keywordlist)
    print(nkeywords)
    #Iterate over each string in list, look for match in .xlsx file
    for i in range(1, nkeywords):
        nfound = 0
        ws_matches.cell(row = 1, column = i).value = str.lower(keywordlist[i-1])
        for j in range(1, worksheet.max_row + 1):
            cursor = worksheet.cell(row = j, column = c)
            cellcontent = str.lower(cursor.value)
            if match(keywordlist[i-1], cellcontent) == True:
                ws_matches.cell(row = 2 + nfound, column = i).value = cellcontent
                nfound = nfound + 1
and my match() function:
def match(keyword, content):
    """Check if the keyword is present within the cell content, return True if found, else False"""
    if content.find(keyword) == -1:
        return False
    else:
        return True
I'm new to Python so my apologies if the way I code looks like a warzone. Can someone help me see what I'm doing wrong (or could be doing better?)? Thank you for taking the time!
"Splitting the textfile into a list works fine (I think)."
This is something you should actually test. (Hint: it does not quite work. The outer for line in file: loop consumes the first line before file.read() runs, so the first keyword never makes it into the list.) The best way to make code easy to test is to isolate functional units into separate functions, i.e. you could write a function that takes the name of a text file and returns a list of keywords. Then you can easily check whether that bit of code works on its own. A more Pythonic way to read the lines of a file (which is what you do, assuming one keyword per line) is as follows:
with open(filename) as f:
    # splitlines() drops the trailing newline that readlines() would keep
    keywords = f.read().splitlines()
The rest of your code may actually work better than you expect. I'm not able to test it right now (and don't have your spreadsheet to try it on anyway), but two things stand out. First, range(1, nkeywords) stops at nkeywords - 1, so the last keyword in the list is never searched for. Second, nfound is reset to zero for every keyword. That is what you want for placing matches in the right rows, but it means that after the loop it only holds the count for the last keyword. If you need a total across all keywords, keep a separate counter outside the loop.
In Python, the way to iterate over lists - or just about anything - is not to increment an integer and then use that integer to index the value in the list. Rather loop over the list (or other iterable) itself:
for keyword in keywordlist:
    ...
As a hint, you shouldn't need nkeywords at all.
I hope this gets you on the right track. When asking questions in future, it'd be a great help to provide more information about what goes wrong, and preferably enough to be able to reproduce the error.
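To make the two suggestions concrete (one function that turns the text file into a keyword list, and looping over the list itself), here is a minimal sketch. It deliberately leaves the spreadsheet out: load_keywords and find_matches are hypothetical helper names, and cell_values is assumed to be a plain list of values you have already pulled from the worksheet.

```python
import os
import tempfile


def load_keywords(path):
    """Return the non-empty lines of a text file, lowercased, one keyword per line."""
    with open(path) as f:
        return [line.strip().lower() for line in f if line.strip()]


def find_matches(keywords, cell_values):
    """Map each keyword to the list of cell values that contain it."""
    matches = {}
    for keyword in keywords:  # loop over the list itself, no index arithmetic
        matches[keyword] = [
            value for value in cell_values
            if value is not None and keyword in str(value).lower()
        ]
    return matches


# Small self-check with a temporary keyword file and made-up cell values
with tempfile.NamedTemporaryFile("w", suffix=".txt", delete=False) as tmp:
    tmp.write("Apple\nbanana\n\ncherry\n")
try:
    keywords = load_keywords(tmp.name)
finally:
    os.remove(tmp.name)

cells = ["Apple pie", "Banana split", None, "plain text", "Cherry and apple"]
result = find_matches(keywords, cells)
print(keywords)
print(result)
```

Because each function can be called on its own, you can check the keyword list first and only then worry about the matching, which makes it much easier to see where keywords go missing.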

Python .find() to return the end of found string

So I want to pick some data out of a text file, which looks like this:
##After some other stuff which could change
EASY:[5,500]
MEDIUM:[10,100]
HARD:[20,1000]
EXPERT:[30,2000]
EXTREME:[50,5000]
I'm writing a function which uses the difficulty ('EASY', 'HARD', etc.) to return the list that follows it. My current code looks like this:
def setAI(difficulty): #difficulty='EASY' or 'HARD' or...e.t.c)
    configFile=open('AISettings.txt')
    config=configFile.read()
    print(config[(config.find(difficulty)):(config.find(']',(config.find(difficulty))))]) #So it will return the chunk between the difficulty, and the next closed-square-bracket after that
This produces the following output:
>>> HARD:[20,1000
I tried fixing it like this:
print(config[(config.find(difficulty)+2):(config.find(']',(config.find(difficulty)+2))+1)])
which returns:
>>>RD:[20,1000]
The issue I'm trying to address is that I want it to start after the colon. I am aware that I could use the length of the difficulty string to solve this, but is there a simpler way of getting the end of the found string when using the .find() method?
P.S.: I couldn't find any duplicates for this, but it is a slightly odd question, so sorry if it's already on here somewhere. Thanks in advance.
EDIT: Thanks for the replies, I think you basically all solved the problem, but I chose this answer because I like the line-by-line iteration idea. Cheers guys :)
Well, if the file looks like this, why not just iterate line by line and do something like:
def setAI(difficulty): #difficulty='EASY' or 'HARD' or...e.t.c)
    configFile=open('AISettings.txt')
    config=configFile.readlines()
    for line in config:
        if line.startswith(difficulty.upper()):
            print(line[len(difficulty) + 1:])
Find returns the location. But ranges assume that their end number should not be included. Just add one to the end.
config = """
##After some other stuff which could change
EASY:[5,500]
MEDIUM:[10,100]
HARD:[20,1000]
EXPERT:[30,2000]
EXTREME:[50,5000]
"""
difficulty = 'HARD'
begin = config.find(difficulty)
end = config.find(']', begin)
print(config[begin:end+1])
The function find will always give you the position of the first letter of the string. Also consider that the notation string[start:end] will give you the substring including the character at start but excluding the character at end. Therefore you could use something like the following:
def setAI(difficulty):
    configFile = open('AISettings.txt')
    config = configFile.read()
    start = config.find(difficulty) + len(difficulty) + 1
    end = config.find(']', start) + 1
    print(config[start:end])
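If the goal is simply "everything after the colon", str.partition gives you that directly without any index arithmetic. The sketch below goes one step further and parses each list with json.loads. The function name parse_settings is made up, and it assumes every setting line has the form NAME:[numbers], as in the question.

```python
import json


def parse_settings(text):
    """Return {'EASY': [5, 500], ...} from lines of the form NAME:[a,b]."""
    settings = {}
    for line in text.splitlines():
        # partition splits at the first ':' and hands back the text after it
        name, sep, value = line.partition(":")
        value = value.strip()
        # Skip comments, blank lines and anything that is not NAME:[...]
        if not sep or line.startswith("#") or not value.startswith("["):
            continue
        settings[name.strip()] = json.loads(value)
    return settings


config_text = """
##After some other stuff which could change
EASY:[5,500]
MEDIUM:[10,100]
HARD:[20,1000]
EXPERT:[30,2000]
EXTREME:[50,5000]
"""

settings = parse_settings(config_text)
print(settings["HARD"])
```

This returns real Python lists rather than substrings, so the caller can index into the values straight away. A line whose bracketed part is not valid JSON would raise an error here, which may or may not be what you want.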

Parsing with multiple loops opening files

I'm trying to count the number of lines contained by a file that looks like this:
-StartACheck
---Lines--
-EndACheck
-StartBCheck
---Lines--
-EndBCheck
with this:
count=0
z={}
for line in file:
    s=re.search(r'\-+Start([A-Za-z0-9]+)Check',line)
    if s:
        e=s.group(1)
        for line in file:
            z.setdefault(e,[]).append(count)
            q=re.search(r'\-+End',line)
            if q:
                count=0
                break
for a,b in z.items():
    print(a,len(b))
I want to basically store the number of lines present inside ACheck , BCheck etc in a dictionary but I keep getting the wrong output
Something like this
A,15
B,9
etc
I found out that even though the code should work, it doesn't because of the way the file is opened. I can't change the way it is opened and was looking for an implementation that only opens the file once but counts the same things and gives the exact same output without all the added functions of the newer python version.
This kind of problem can be resolved with a finite state machine. This is a complex matter that would need more explanation than what I could write here. You should look into it to further understand what you can do with it.
But first of all, I'm going to make a few assumptions:
The input file doesn't have any errors
If you have more than one section with the same name, you want their count to be combined
Even though you have tagged this question python 2.7, because you are using print(), I'll presume you are using python 3.x
Here's my suggestion:
import re

input_filename = "/home/evens/Temporaire/StackOverflow/Input_file-39339007.txt"

matchers = {
    'start_section' : re.compile(r'\-+Start([A-Za-z0-9]+)Check'),
    'end_section' : re.compile(r'\-+End'),
}

inside_section = False # Am I inside a section ?
section_name = None # Which section am I in ?
tally = {} # Sums of each section

with open(input_filename) as file_read:
    for line in file_read:
        line_matches = {k: v.match(line) for (k, v) in matchers.items()}
        # By default, stay in the current state (avoids a NameError when the
        # first line of the file is not a section start)
        future_inside_section = inside_section
        if inside_section:
            if line_matches['end_section']:
                future_inside_section = False
            else:
                future_inside_section = True
                if section_name in tally:
                    tally[section_name] += 1
                else:
                    tally[section_name] = 1
        else:
            if line_matches['start_section']:
                future_inside_section = True
                section_name = line_matches['start_section'].group(1)
        # Just before we go in the future
        inside_section = future_inside_section

for (a,b) in tally.items():
    print('Total of all "{}" sections: {}'.format(a, b))
What this code does is determine :
How it should change its state (Am I going to be inside or outside a section on the next line?)
What else should be done:
Change the name of the section I'm in ?
Count this line in the present section ?
But even this code has its problems:
It doesn't check to see if a section start has a matching section end (-StartACheck could be ended by -EndATotallyInvalidCheck)
It doesn't handle the case where two consecutive section starts (or ends) are detected (Error? Nested sections?)
It doesn't handle the case where there are lines outside a section
And probably other corner cases.
How you want to handle these cases is up to you
This code could probably be further simplified but I don't want to be too complex for now.
Hope this helps. Don't hesitate to ask if you need further explanations.
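For comparison, the same two-state idea can be sketched more compactly by letting a single variable hold both "am I inside?" and "which section?". This is not a drop-in replacement for the code above, just an illustration: count_section_lines is a made-up name, and it takes any iterable of lines (an already opened file object works), so the input is read exactly once.

```python
import re
from collections import Counter

START = re.compile(r'-+Start([A-Za-z0-9]+)Check')
END = re.compile(r'-+End')


def count_section_lines(lines):
    """Count the lines between each -Start<Name>Check / -End... pair."""
    counts = Counter()
    current = None  # name of the section we are in, or None when outside
    for line in lines:
        if current is None:
            started = START.match(line)
            if started:
                current = started.group(1)
        elif END.match(line):
            current = None
        else:
            counts[current] += 1
    return counts


sample = """-StartACheck
line 1
line 2
-EndACheck
stray line
-StartBCheck
line 3
-EndBCheck
""".splitlines()

for name, total in sorted(count_section_lines(sample).items()):
    print(name, total)
```

It shares the limitations listed above: any line beginning with dashes and "End" closes the current section, whatever its name, and lines outside a section are silently ignored.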

Use list/set comprehension merely as a "for" loop?

I am creating a set of NUM_RECORDS tuples in Python. This is my code.
record_key_list = {(choice(tuple(studentID_list)),
                    choice(tuple(courseID_list)),
                    randint(2012, 2016),
                    choice(semesters),
                    choice(grades)[0])
                   for no_use in range(NUM_RECORDS)}
An alternative is to code the problem like this.
record_key_list = set()
while len(record_key_list) < NUM_RECORDS:
    record_key_list.add((choice(tuple(studentID_list)),
                         choice(tuple(courseID_list)),
                         randint(2012, 2016),
                         choice(semesters),
                         choice(grades)[0]))
I timed the two code snippets and they are roughly equally fast for 20000 records. I prefer the first version of the code stylistically.
Is the first version of the code a correct usage of set comprehension? Or should I always stick to the second method?
EDIT: Improved formatting as suggested. I mostly just copied and pasted from the IDE. Sorry about that, guys.
The first code snippet looks totally fine. If anything, I would extract the record creation to a function for clarity and easier refactoring.
def random_record():
    # tuple(...) is kept from the question: choice() needs a sequence,
    # so this still works if the ID collections are sets
    studentID = choice(tuple(studentID_list))
    courseID = choice(tuple(courseID_list))
    year = randint(2012, 2016)
    semester = choice(semesters)
    grade = choice(grades)[0]
    return (studentID, courseID, year, semester, grade)

# ...
record_key_list = {random_record() for _ in range(NUM_RECORDS)}
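One thing worth checking before picking a version: the two snippets in the question are not strictly equivalent. The comprehension makes NUM_RECORDS draws and the set merges any duplicates, so it can end up with fewer records, while the while loop keeps drawing until the set holds exactly NUM_RECORDS. The sketch below shows this with deliberately tiny, made-up value pools (student_ids and semesters here are illustrative, not the asker's data).

```python
from random import Random

rng = Random(0)  # seeded only so the demonstration is repeatable
NUM_RECORDS = 6
student_ids = (1, 2, 3, 4)       # hypothetical, deliberately tiny pools:
semesters = ("spring", "fall")   # only 8 distinct records are possible


def random_record():
    return (rng.choice(student_ids), rng.choice(semesters))


# Version 1: exactly NUM_RECORDS draws, duplicates silently merged by the set
from_comprehension = {random_record() for _ in range(NUM_RECORDS)}

# Version 2: keeps drawing until the set really holds NUM_RECORDS records
from_loop = set()
while len(from_loop) < NUM_RECORDS:
    from_loop.add(random_record())

print(len(from_comprehension), "records from the comprehension (at most", NUM_RECORDS, ")")
print(len(from_loop), "records from the while loop")
```

With large value pools, collisions are rare and the difference hardly matters. If you need exactly NUM_RECORDS unique records, though, the loop is the safer choice, provided at least that many distinct records exist.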

Selecting from multiple variables

I am attempting to find objects on the screen, see if they exist, and if so, select them. Using the Sikuli library to run this little automation.
while True:
    if exist("image/one.png", "image/two.png", "image/three.png"):
        click ("image/one.png", or "image/two.png", or "image/three.png")
        break
I get SyntaxError: mismatched input 'or' expecting RPAREN. I've done a quick search, but I saw nothing relevant to my particular issue.
I've even tried
while True:
    if exist("image/one.png", or "image/two.png", or "image/three.png"):
        click ("image/one.png", or "image/two.png", or "image/three.png")
        break
And that results in the same error.
@Stephan: New code snippet with error.
class gameImages():
    imageFiles = ["one.png", "two.png", "three,png"]

for imageFile in imageFiles:
    if exists(imageFile):
        click(imageFile)
The error now:
NameError: name 'imageFiles' is not defined
for imageFile in imageFiles:
    if exists(imageFile):
        click(imageFile)
Your while loop isn't doing anything, and neither is your break statement. This might do what you want, assuming I understand what you want to do.
After reading a little of the Sikuli docs, I think this might also do what you want.
for impath in ("image/one.png", "image/two.png", "image/three.png"):
    match = exists(impath)
    if match:
        click(match.getTarget())
Even easier, this is a perfect use of filter(exists, imageFiles). All of the (zero or more) elements that filter returns can then be used :). It's more concise and clearly conveys your intent, and it is much nicer to read than a chain of fors and ifs.
a = range(10)
# [0, 1, 2, 3, 4, 5, 6, 7, 8, 9]
print filter(lambda x: x > 5, a)
# [6, 7, 8, 9]
Also the or is a logical operator:
e.g.
a = 5
b = 6
c = 5
if( (a==c) or (b==c) ):
    print 'c is repeated'
# c is repeated
Your use of or here makes no sense, because it has no operands to operate on. The operands can even be two arbitrary objects, e.g.
1 or 2
since anything can be evaluated as a boolean.
A concise way to do what you want is:
# imagepaths = your list of image paths
# filter() keeps the paths for which exists() succeeds; click() accepts a path.
# map() runs eagerly in Python 2, which is what Sikuli's Jython uses.
map(click, filter(exists, imagepaths))
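If the intent behind "one or two or three" is to click only the first image that is actually on screen, a small helper that stops at the first hit may read more clearly. This is a sketch under that assumption. first_existing is a made-up name, and fake_exists and fake_click are stand-ins so the example runs outside Sikuli; inside a Sikuli script you would pass the real exists and call the real click instead.

```python
def first_existing(paths, exists):
    """Return the first path for which exists(path) is truthy, else None."""
    for path in paths:
        if exists(path):
            return path
    return None


# Stand-ins for Sikuli's exists() and click(); hypothetical, for illustration only
visible_on_screen = {"image/two.png", "image/three.png"}
clicked = []


def fake_exists(path):
    return path in visible_on_screen


def fake_click(path):
    clicked.append(path)


target = first_existing(
    ["image/one.png", "image/two.png", "image/three.png"], fake_exists)
if target is not None:
    fake_click(target)
print(clicked)
```

Unlike the filter/map one-liner, this clicks at most one image, which is closer to how or behaves: evaluation stops at the first operand that turns out to be true.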
