In our code base, I often find functions formatted in the following way:
some_function_with_a_very_long_name(parameter_a,
                                    parameter_b,
                                    parameter_c)
This moves a lot of information to the right side of the screen and is less readable than the cleaner alternative:
some_function_with_a_very_long_name(
    parameter_a,
    parameter_b,
    parameter_c,
)
This could be detected by checking that the number of indentations in any given line is at most one indentation level greater than the line before.
Is there any linting rule (in Flake8, SonarQube or similar) that I can use to automatically check that this is done properly in our CI/CD pipeline?
Here is a function I just wrote. It returns a list of all (or what should be all) blocks where there is over-indentation like the following, together with their line numbers (starting at line 1):
some_function_with_a_very_long_name(parameter_a,
                                    parameter_b,
                                    parameter_c)
You can then iterate over that list to print out all those cases.
Code:
def find_over_indentations(file_name, level_depth, search_possible=True):
    def count_spaces_before(lst):
        # Splitting on ' ' yields one empty string per leading space
        counter = 0
        for item in lst:
            if item == '':
                counter += 1
            else:
                break
        return counter

    faulty_blocks = list()
    faulty_block = ''
    previous_line = ''
    caution = False
    previous_indentation_level = 0
    with open(file_name) as file:
        for index, line in enumerate(file):
            split_by_space = line.split(' ')
            spaces_before = count_spaces_before(split_by_space)
            current_indentation_level = spaces_before // level_depth
            possible_fault = spaces_before % level_depth if search_possible else 0
            if ((current_indentation_level - previous_indentation_level > 1 or possible_fault != 0)
                    and faulty_block == ''):
                caution = True
                faulty_block += str(index) + ' ' + previous_line
                faulty_block += str(index + 1) + ' ' + line
            elif caution and current_indentation_level == previous_indentation_level:
                faulty_block += str(index + 1) + ' ' + line
            else:
                if faulty_block != '':
                    faulty_blocks.append(faulty_block)
                    faulty_block = ''
                caution = False
            previous_indentation_level = current_indentation_level
            previous_line = line
    if faulty_block != '':
        faulty_blocks.append(faulty_block)
    return faulty_blocks

if __name__ == '__main__':
    over_indents = find_over_indentations('test.py', 4)
    for over_indented in over_indents:
        print(over_indented)
Basically you can just use this code (replace the file name with whatever file you need) and it should print out all the lines where such issues occur. It also shows the "starting" line; in the case above, some_function_with_a_very_long_name(parameter_a, is printed too. The search_possible argument (default True) toggles the check for possible faults, i.e. indents that are not a multiple of the level depth, such as an extra space or two. The level_depth argument is the width of one indentation level, usually 4 spaces.
EDIT1: there was a slight issue with the first if statement when there was a possible fault (an indent outside the indentation depth): the if statement ran twice, so I added a condition so that it runs only if nothing has been appended to faulty_block yet. (This also leads to an interesting situation: code that is formatted correctly with respect to indentation level can still be appended to the list, which happens if the function is run on its own source file. That is why the output shows the line number and the previous line, so that a human can check those places manually without having to look all over the file.)
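A different way to look for the same pattern, offered only as a sketch and not as an existing lint rule: instead of counting spaces, parse the file with the standard ast module and flag every call that spans several lines while its first argument sits on the same line as the opening parenthesis. The function name find_hanging_calls is made up for this example, and the check assumes Python 3.8 or later (for end_lineno).

```python
import ast

def find_hanging_calls(source):
    """Return line numbers of multi-line calls whose first argument
    starts on the same line as the opening parenthesis."""
    flagged = []
    for node in ast.walk(ast.parse(source)):
        if not isinstance(node, ast.Call):
            continue
        # Positional and keyword arguments both count as "first argument".
        arguments = list(node.args) + [kw.value for kw in node.keywords]
        if not arguments:
            continue
        first = min(arguments, key=lambda a: (a.lineno, a.col_offset))
        spans_lines = node.end_lineno > node.lineno
        if spans_lines and first.lineno == node.func.end_lineno:
            flagged.append(node.lineno)
    return sorted(flagged)

bad = "result = f(a,\n           b,\n           c)\n"
good = "result = f(\n    a,\n    b,\n)\n"
print(find_hanging_calls(bad))   # [1]
print(find_hanging_calls(good))  # []
```

This also flags nested calls such as outer(inner( followed by a line break, so the result is a list of candidates to review rather than a verdict.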
I have made a script:
our_word = "Success"
def duplicate_encode(word):
    char_list = []
    final_str = ""
    changed_index = []
    base_wrd = word.lower()
    for k in base_wrd:
        char_list.append(k)
    for i in range(0, len(char_list)):
        count = 0
        for j in range(i + 1, len(char_list)):
            if j not in changed_index:
                if char_list[j] == char_list[i]:
                    char_list[j] = ")"
                    changed_index.append(j)
                    count += 1
            else:
                continue
        if count > 0:
            char_list[i] = ")"
        else:
            char_list[i] = "("
    print(changed_index)
    print(char_list)
    final_str = "".join(char_list)
    return final_str
print(duplicate_encode(our_word))
Essentially, the purpose of this script is to convert a string to a new string where each character is "(" if that character appears only once in the original string, or ")" if it appears more than once. I have made a rather layered script (I am relatively new to Python, so I didn't want to use any helpful built-in functions) that attempts to do this. My issue is that the check for whether the current index has previously been edited (to prevent it from changing again) seems to be ignored. So instead of the intended )())()) I get )()((((. I'd really appreciate an insightful answer explaining why I am getting this issue and ways to work around it, since I'm trying to build an intuitive understanding of Python. Thanks!
word = "Success"
print(''.join([')' if word.lower().count(c) > 1 else '(' for c in word.lower()]))
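The one-liner above calls count() once per character, which rescans the word each time. A minimal sketch of the same idea that counts every character once with collections.Counter (the function name encode_with_counter is made up for this example):

```python
from collections import Counter

def encode_with_counter(word):
    """Map each character to ')' if it occurs more than once, else '('."""
    lowered = word.lower()
    counts = Counter(lowered)  # one pass to count every character
    return "".join(")" if counts[c] > 1 else "(" for c in lowered)

print(encode_with_counter("Success"))  # )())())
```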
The issue here has nothing to do with your understanding of Python. It's purely algorithmic. If you retain this 'layered' algorithm, it is essential that you add one more check in the "i" loop.
our_word = "Success"
def duplicate_encode(word):
    char_list = list(word.lower())
    changed_index = []
    for i in range(len(word)):
        count = 0
        for j in range(i + 1, len(word)):
            if j not in changed_index:
                if char_list[j] == char_list[i]:
                    char_list[j] = ")"
                    changed_index.append(j)
                    count += 1
        if i not in changed_index:  # the new important check: don't turn an already assigned ')' back into '('
            char_list[i] = ")" if count > 0 else "("
    return "".join(char_list)
print(duplicate_encode(our_word))
Your algorithm can be greatly simplified if you avoid using char_list as both the input and output. Instead, you can create an output list of the same length filled with ( by default, and then only change an element when a duplicate is found. The loops will simply walk along the entire input list once for each character looking for any matches (other than self-matches). If one is found, the output list can be updated and the inner loop will break and move on to the next character.
The final code should look like this:
def duplicate_encode(word):
    char_list = list(word.lower())
    output = list('(' * len(word))
    for i in range(len(char_list)):
        for j in range(len(char_list)):
            if i != j and char_list[i] == char_list[j]:
                output[i] = ')'
                break
    return ''.join(output)
for our_word in (
    'Success',
    'ChJsTk(u cIUzI htBp#qX)OTIHpVtHHhQ',
):
    result = duplicate_encode(our_word)
    print(our_word)
    print(result)
Output:
Success
)())())
ChJsTk(u cIUzI htBp#qX)OTIHpVtHHhQ
))(()(()))))())))()()((())))()))))
I'm trying to decompress strings using recursion. For example, the input:
3[b3[a]]
should output:
baaabaaabaaa
but I get:
baaaabaaaabaaaabbaaaabaaaabaaaaa
I have the following code but it is clearly off. The first find_end function works as intended. I am absolutely new to using recursion and any help understanding / tracking where the extra letters come from or any general tips to help me understand this really cool methodology would be greatly appreciated.
def find_end(original, start, level):
    if original[start] != "[":
        message = "ERROR in find_error, must start with [:", original[start:]
        raise ValueError(message)
    indent = level * " "
    index = start + 1
    count = 1
    while count != 0 and index < len(original):
        if original[index] == "[":
            count += 1
        elif original[index] == "]":
            count -= 1
        index += 1
    if count != 0:
        message = "ERROR in find_error, mismatched brackets:", original[start:]
        raise ValueError(message)
    return index - 1
def decompress(original, level):
    # set the result to an empty string
    result = ""
    # for any character in the string we have not looked at yet
    for i in range(len(original)):
        # if the character at the current index is a digit
        if original[i].isnumeric():
            # the character of the current index is the number of repetitions needed
            repititions = int(original[i])
            # start = the next index containing the '[' character
            x = 0
            while x < (len(original)):
                if original[x].isnumeric():
                    start = x + 1
                    x = len(original)
                else:
                    x += 1
            # last = the index of the matching ']'
            last = find_end(original, start, level)
            # calculate a substring using `original[start + 1:last]`
            sub_original = original[start + 1 : last]
            # RECURSIVELY call decompress with the substring
            # sub = decompress(original, level + 1)
            # concatenate the result of the recursive call times the number of repetitions needed to the result
            result += decompress(sub_original, level + 1) * repititions
            # set the current index to the index of the matching ']'
            i = last
        # else
        else:
            # concatenate the letter at the current index to the result
            if original[i] != "[" and original[i] != "]":
                result += original[i]
    # return the result
    return result
def main():
    passed = True
    ORIGINAL = 0
    EXPECTED = 1
    # The test cases
    provided = [
        ("3[b]", "bbb"),
        ("3[b3[a]]", "baaabaaabaaa"),
        ("3[b2[ca]]", "bcacabcacabcaca"),
        ("5[a3[b]1[ab]]", "abbbababbbababbbababbbababbbab"),
    ]
    # Run the provided tests cases
    for t in provided:
        actual = decompress(t[ORIGINAL], 0)
        if actual != t[EXPECTED]:
            print("Error decompressing:", t[ORIGINAL])
            print(" Expected:", t[EXPECTED])
            print(" Actual: ", actual)
            print()
            passed = False
    # print that all the tests passed
    if passed:
        print("All tests passed")
if __name__ == '__main__':
    main()
From what I gathered from your code, it probably gives the wrong result because of the approach you've taken to find the last matching closing brace at a given level (I'm not 100% sure, the code was a lot). However, I can suggest a cleaner approach using stacks (almost similar to DFS, without the complications):
def decomp(s):
    stack = []
    for i in s:
        if i.isalnum():
            stack.append(i)
        elif i == "]":
            temp = stack.pop()
            count = stack.pop()
            if count.isnumeric():
                stack.append(int(count)*temp)
            else:
                stack.append(count+temp)
    for i in range(len(stack)-2, -1, -1):
        if stack[i].isnumeric():
            stack[i] = int(stack[i])*stack[i+1]
        else:
            stack[i] += stack[i+1]
    return stack[0]
print(decomp("3[b]")) # bbb
print(decomp("3[b3[a]]")) # baaabaaabaaa
print(decomp("3[b2[ca]]")) # bcacabcacabcaca
print(decomp("5[a3[b]1[ab]]")) # abbbababbbababbbababbbababbbab
This works on a simple observation: rather than evaluating a substring as soon as you read a [, evaluate it after encountering the matching ]. That allows you to build the result AFTER the pieces have been evaluated individually. (This is similar to prefix/postfix expression evaluation.)
(You can add error checking to this as well, if you wish. It would be easier to check that the string is well formed in one pass and evaluate it in another pass, rather than doing both in one go.)
Here is a solution with a similar idea to the one above:
we go through the string, pushing everything onto a stack until we find ']'; then we pop back to the matching '[', collecting everything on the way, read the number in front of it, multiply, and push the result back onto the stack.
It's much less expensive, as we don't concatenate strings but work with lists.
Note: the multiplier can't be more than 9, as we parse it as a single-character string.
def decompress(string):
    stack = []
    letters = []
    for i in string:
        if i != ']':
            stack.append(i)
        elif i == ']':
            letter = stack.pop()
            while letter != '[':
                letters.append(letter)
                letter = stack.pop()
            word = ''.join(letters[::-1])
            letters = []
            stack.append(''.join([word for j in range(int(stack.pop()))]))
    return ''.join(stack)
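For the recursive version in the question, here is a sketch of one possible fix. Two things in the original loop are worth noting: assigning to i inside a for loop over range() does not skip ahead (the next iteration overwrites it), so the characters inside the brackets get processed a second time; and the inner while loop always searches for the first digit from index 0 instead of using the current position. The sketch below uses a while loop with an explicit index. It assumes single-digit repeat counts that are always followed by '[', as in the question's test cases, and the name decompress_recursive and the simplified two-argument find_end are choices made for this example.

```python
def find_end(text, start):
    """Return the index of the ']' that matches the '[' at text[start]."""
    depth = 0
    for index in range(start, len(text)):
        if text[index] == "[":
            depth += 1
        elif text[index] == "]":
            depth -= 1
            if depth == 0:
                return index
    raise ValueError("mismatched brackets: " + text[start:])

def decompress_recursive(text):
    result = ""
    i = 0
    while i < len(text):
        if text[i].isdigit():
            repetitions = int(text[i])
            start = i + 1                  # position of the '[' after the digit
            last = find_end(text, start)   # position of the matching ']'
            result += decompress_recursive(text[start + 1:last]) * repetitions
            i = last + 1                   # really skip past the bracketed part
        else:
            result += text[i]
            i += 1
    return result

for compressed in ("3[b]", "3[b3[a]]", "3[b2[ca]]", "5[a3[b]1[ab]]"):
    print(compressed, "->", decompress_recursive(compressed))
```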
I'm looping through lines in a file to create a dict with the start/stop positions, however, am getting way too many results and I'm unsure why. It looks like every addition of the variable ref_start and ref_end is being added multiple times in the dictionary.
def main():
    #initialize variables for counts
    gb_count = 0
    glimmer_count = 0
    exact_count = 0
    five_prime_count = 0
    three_prime_count = 0
    no_matches_count = 0
    #protein_id list
    protein_id = []
    #initialize lists for start/stop coordinates
    reference = []
    prediction = []
    #read in GeneBank file
    for line in open('file'):
        line = line.rstrip()
        if "protein_id=" in line:
            pro_id = line.split("=")
            pro_id = pro_id[1].replace('"','')
            protein_id.append(pro_id)
        elif "CDS" in line:
            if "join" in line:
                continue
            elif "/translation" in line:
                continue
            elif "P" in line:
                continue
            elif "complement" in line:
                value = " ".join(line.split()).replace('CDS','').replace("(",'').replace(")",'').split("complement")
                newValue = value[1].split("..")
                ref_start = newValue[1]
                ref_end = newValue[0]
                gb_count += 1
            else:
                test = " ".join(line.split()).replace('CDS','').split("..")
                ref_start = test[0]
                ref_end = test[1]
                gb_count += 1
        reference.append({'refstart': ref_start, 'refend': ref_end})
    print(reference)
I initially posted something else that was wrong, but I copied over the code and ran a dummy file and I think I figured it out. Your problem is: for line in open('file').
What it is doing (what it did for me) is loading the file up by character. Instead of 'line' = "protein_id=", you're getting 'line' = "p" then 'line' = "r", etc.
The fix is simple. This is what I did:
file = open('file')
for line in file:
I'm not 100% sure about this explanation, but I think it has to do with the way Python is loading the file. Since it hasn't been established as one long string, it's loading up each individual element. Once it has been made a string, it can break it down by line. Hope this helped.
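Iterating over a file object yields one line per step whether or not the object is first bound to a name, so it may also be worth checking where the append happens. If it runs for every line rather than only when coordinates were just parsed, the most recent ref_start/ref_end pair gets added again and again. A minimal sketch that appends only inside the branches that parse coordinates; the function name collect_cds_ranges and the sample lines (including the protein id) are invented for illustration, and the filtering is simpler than in the question:

```python
def collect_cds_ranges(lines):
    """Return one {'refstart', 'refend'} dict per plain or complement CDS line."""
    reference = []
    for line in lines:
        line = line.rstrip()
        if "CDS" not in line or "join" in line:
            continue  # not a coordinate line we want
        coords = line.replace("CDS", "").strip()
        if coords.startswith("complement"):
            first, second = coords[len("complement("):-1].split("..")
            # Start and end are swapped on the complement strand, as in the question.
            reference.append({"refstart": second, "refend": first})
        else:
            start, end = coords.split("..")
            reference.append({"refstart": start, "refend": end})
    return reference

# Made-up sample lines in a GenBank-like layout
sample = [
    "     CDS             190..255",
    '                     /protein_id="AAA00001.1"',
    "     CDS             complement(300..450)",
    '                     /note="unrelated line"',
]
print(collect_cds_ranges(sample))
```

With the sample above, the result has exactly two entries, one per CDS line.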
I wrote the piece of code below a while back, and had this issue then as well. I ignored it at the time, and when I came back to it after asking an 'expert' to look at it, it was working fine.
The issue is that sometimes the program seems unable to run main() on my laptop, possibly because of how heavy the algorithm is. Is there a way around this? I would hate to keep having this problem in the future. The same code works perfectly on another computer, which I have limited access to.
(P.S. The laptop with the issue is a 2015 MacBook Air, and it should have no problem running the program. Also, it stops after printing "hi".)
It does not give an error message; it just doesn't print anything from main(). It's supposed to print a series of strings which progressively converge to "methinks it is like a weasel". In Eclipse, it shows that the code is still being processed, but it does not output anything that it is supposed to.
import random
def generateOne(strlen):
    alphabet = "abcdefghijklmnopqrstuvwxyz "
    res = ""
    for i in range(strlen):
        res = res + alphabet[random.randrange(27)]
    return res
def score(goal, teststring):
    numSame = 0
    for i in range(len(goal)):
        if goal[i] == teststring[i]:
            numSame = numSame + 1
    return numSame / len(goal)
def main():
    goalstring = "methinks it is like a weasel"
    chgoal = [0]*len(goalstring)
    newstring = generateOne(28)
    workingstring = list(newstring)
    countvar = 0
    finalstring = ""
    while score(list(goalstring), workingstring) < 1:
        if score(goalstring, newstring) > 0:
            for j in range(len(goalstring)):
                if goalstring[j] == newstring[j] and chgoal[j] == 0:
                    workingstring[j] = newstring[j]
                    chgoal[j] = 1
                    finalstring = "".join(workingstring)
                    countvar = countvar + 1
                    print(finalstring)
        newstring = generateOne(28)
    finalstring = "".join(workingstring)
    print(finalstring)
    print(countvar)
print("hi")
if __name__ == '__main__':
    main()
print("ho")
You can optimize a bit. Strings are immutable: every time you append one char to a string, a new string is created and replaces the old one. Use lists of chars instead. Also, there is no need to call "".join() every time just for printing; you can print the list of chars by unpacking it with a separator of "":
import datetime
import random
def generateOne(strlen):
    """Create the whole candidate in one random call and return it as a list; do not iteratively add to a string"""
    alphabet = "abcdefghijklmnopqrstuvwxyz "
    return random.choices(alphabet, k=strlen)
def score(goal, teststring):
    """Use zip and a generator expression for summing/scoring"""
    return sum(1 if a == b else 0 for a, b in zip(goal, teststring)) / len(goal)
def main():
    goalstring = list("methinks it is like a weasel")  # use a list
    newstring = generateOne(28)  # also returns a list
    workingstring = newstring[:]  # copy
    countvar = 0
    while score(goalstring, workingstring) < 1:
        if score(goalstring, newstring) > 0:
            for pos, c in enumerate(goalstring):  # enumerate for getting the index
                # test if equal, only change if not yet ok
                if c == newstring[pos] and workingstring[pos] != c:
                    workingstring[pos] = newstring[pos]  # could use c instead
                    countvar += 1
                    print(*workingstring, sep="")  # print unpacked with sep of ""
                                                   # instead of "".join()
        newstring = generateOne(28)
    finalstring = "".join(workingstring)  # create result once ...
                                          # although it's the same as goalstring
                                          # so we could just assign that one
    print(finalstring)
    print(countvar)
print("hi")
if __name__ == '__main__':
    s = datetime.datetime.now()
    main()
    print(datetime.datetime.now() - s)
print("ho")
Timings with printouts are very unreliable. If I comment out the print that shows the intermediate steps on the way to the final solution and use random.seed(42), I get for mine:
0:00:00.012536
0:00:00.012664
0:00:00.008590
0:00:00.012575
0:00:00.012576
and for yours:
0:00:00.017490
0:00:00.017427
0:00:00.013481
0:00:00.017657
0:00:00.013210
I am quite sure this won't solve your laptop's issues, but still, it is a bit faster.
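Since timings that include printing are unreliable, one way to compare variants is to time a print-free, seeded run with the standard timeit module. This is only a sketch: the function evolve and its seed parameter are made up for this example, and it is a simplified version of the loop above and not the exact code from either post.

```python
import random
import timeit

ALPHABET = "abcdefghijklmnopqrstuvwxyz "

def evolve(goal, seed=42):
    """Return how many random candidates were drawn until every position matched."""
    rng = random.Random(seed)  # private generator, so runs are repeatable
    goal = list(goal)
    working = rng.choices(ALPHABET, k=len(goal))
    draws = 1
    while working != goal:
        candidate = rng.choices(ALPHABET, k=len(goal))
        draws += 1
        for pos, c in enumerate(goal):
            if candidate[pos] == c:
                working[pos] = c  # keep every character that matches the goal
    return draws

goal = "methinks it is like a weasel"
draws = evolve(goal)
# Five independent timings of one full run each; the minimum is the least noisy.
timings = timeit.repeat(lambda: evolve(goal), number=1, repeat=5)
print(draws, min(timings))
```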
Hi Stack Overflow users,
I am wondering how to use a for loop with strings.
For example,
there is a file (file.txt) like this:
=====================
Initial Value
1 2 3
3 4 5
5 6 7
Middle Value <---From Here
3 5 6
5 8 8
6 9 8 <---To Here
Last Value
5 8 7
6 8 7
5 5 7
==================
I want to modify only the "Middle Value" section of the file and write an output file after modifying it.
I think this might be solved by using "if" and "for" statements.
I have thought of code like this:
with open('file.txt') as f, open('out.txt', 'w') as f2:
    for line in f:
        sp1 = line.split()
        line = " ".join(sp1) + '\n'
        if line == 'Middle':
            "Do something until line == 'Last'"
I am stuck with "Do something until line == 'Last'" part.
Any comments are appreciated.
Thanks.
There are three basic approaches.
The first is to use a state machine. You could build a real state machine, but in this case the states and transitions are so trivial that it's simpler to fake it by just using a flag:
state = 0
for line in f:
    sp1 = line.split()
    line = " ".join(sp1) + '\n'
    if state == 0:
        if line == 'Middle\n':
            state = 1
    elif state == 1:
        if line == 'Last\n':
            state = 2
        else:
            pass  # Thing you do until line == 'Last\n'
    else:
        pass  # nothing to do after Last, so you could leave it out
Note that I checked for 'Middle\n', not 'Middle'. If you look at the way you build line above, there's no way it could match the latter, because you always add '\n'. But also note that in your sample data, the line is 'Middle Value\n', not 'Middle', so if that's true in your real data, you have to deal with that here. Whether that's line == 'Middle Value\n', line.startswith('Middle'), or something else depends on your actual data, which only you know about.
Alternatively, you can just break it into loops:
for line in f:
    sp1 = line.split()
    line = " ".join(sp1) + '\n'
    if line == 'Middle\n':
        break
for line in f:
    sp1 = line.split()
    line = " ".join(sp1) + '\n'
    if line == 'Last\n':
        break
    else:
        pass  # Thing you do until line == 'Last\n'
for line in f:
    pass  # Nothing to do here, so you could leave the loop out
There are variations on this one as well. For example:
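from itertools import dropwhile, takewhile
lines = (" ".join(line.split()) + '\n' for line in f)
lines = dropwhile(lambda line: line != 'Middle\n', lines)
next(lines, None)  # skip the 'Middle\n' marker itself
middle = takewhile(lambda line: line != 'Last\n', lines)
for line in middle:
    pass  # Thing you want to do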
Finally, you can split up the file before turning it into lines, instead of after. This is harder to do iteratively, so let's just read the whole file into memory to show the idea:
contents = f.read()
_, _, rest = contents.partition('\nMiddle\n')
middle, _, _ = rest.partition('\nLast')
for line in middle.splitlines():
    pass  # Thing you want to do
If reading the whole file into memory wastes too much space or takes too long before you get going, mmap is your friend.
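A rough sketch of the mmap variant, assuming the markers are spelled 'Middle Value' and 'Last Value' as in the sample data and that the file is not empty (mmap refuses to map an empty file). The function name middle_lines and its parameters are made up for this example.

```python
import mmap
import os
import tempfile

def middle_lines(path, start_marker=b"Middle Value\n", end_marker=b"Last Value"):
    """Return the lines between the two markers without reading the whole file."""
    with open(path, "rb") as f, \
            mmap.mmap(f.fileno(), 0, access=mmap.ACCESS_READ) as mm:
        start = mm.find(start_marker)
        if start == -1:
            return []
        start += len(start_marker)
        end = mm.find(end_marker, start)
        if end == -1:
            end = len(mm)  # no end marker: take everything to the end of file
        return mm[start:end].decode().splitlines()

# Small demonstration with a temporary file shaped like the sample data
with tempfile.NamedTemporaryFile("w", suffix=".txt", delete=False) as tmp:
    tmp.write("Initial Value\n1 2 3\nMiddle Value\n3 5 6\n5 8 8\nLast Value\n5 8 7\n")
demo_lines = middle_lines(tmp.name)
os.remove(tmp.name)
print(demo_lines)  # ['3 5 6', '5 8 8']
```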
I would just code the process as a simple FSM (a Finite-State Machine or more specifically an event-driven Finite-state machine):
with open('file.txt') as f, open('out.txt', 'w') as f2:
    state = 1
    for line in f:
        if line == 'Middle Value\n':
            state = 2
            continue  # unless there's something to do upon entering the state
        elif line == 'Last Value\n':  # might want to just test for blank line '\n'
            state = 3
            continue  # unless there's something to do upon entering the state
        # otherwise process the line based on the current value of "state"
        if state == 1:  # before 'Middle Value' has been seen
            pass
        elif state == 2:  # after 'Middle Value' has been seen
            pass
        else:  # after 'Last Value' (or a blank line after
            pass  # 'Middle Value') has been seen
Just replace the pass statements with whatever is appropriate to do at that point of reading the input file.
In your if line == 'Middle': branch you could flip a boolean flag, and then use that flag to decide whether the current line belongs to the section whose numbers you want to modify. Note that Python spells the constants True and False, and that the header in your sample data is 'Middle Value'.
You can replace your for loop with this.
inMiddle = False
for line in f:
    sp1 = line.split()
    line = " ".join(sp1)
    if line == 'Middle Value':
        inMiddle = True
    elif line == 'Last Value':
        inMiddle = False
    elif inMiddle:
        pass  # MODIFY YOUR NUMBERS HERE
Forgive me, as I access files a bit differently:
with open('file.txt') as f:
    file_string = f.read()
middle_to_end = file_string.split('Middle Value\n')[-1]
just_middle = middle_to_end.split('Last Value\n')[0]
middle_lines = just_middle.splitlines()
for line in middle_lines:
    pass  # do something with line
Basically you are setting a flag to say you are 'in' the section. Below I optionally set a different flag when finished. You could bail out when the flag is 2, for example.
with open('file.txt') as f, open('out.txt', 'w') as f2:
    section = 0
    for line in f:
        if line.startswith("Middle"):
            section = 1
        elif line.startswith("Last"):
            section = 2
        if section == 1:
            # collect digits and output to other file
            f2.write(line)
        elif section == 2:
            # done; the with block closes both files on exit
            break
        else:
            continue
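Putting the flag idea together with the actual rewriting step, here is a minimal sketch that copies every line through and changes only the data lines between the two markers. The names rewrite_middle and double_numbers are made up, doubling the numbers is just a stand-in for whatever modification is needed, and the sketch assumes the arrows in the sample (such as "<---From Here") are annotations and not part of the real file.

```python
def rewrite_middle(lines, modify):
    """Yield every line unchanged except data lines between the markers."""
    in_middle = False
    for line in lines:
        stripped = line.strip()
        if stripped.startswith("Middle"):
            in_middle = True
        elif stripped.startswith("Last"):
            in_middle = False
        elif in_middle and stripped:
            line = modify(stripped) + "\n"
        yield line

def double_numbers(text):
    # Hypothetical modification: multiply every number on the line by two.
    return " ".join(str(int(n) * 2) for n in text.split())

sample = ["Initial Value\n", "1 2 3\n", "Middle Value\n", "3 5 6\n",
          "Last Value\n", "5 8 7\n"]
result = list(rewrite_middle(sample, double_numbers))
print("".join(result), end="")
```

With real files, the same generator can be passed straight to writelines: open file.txt and out.txt in one with statement as in the answers above and call f2.writelines(rewrite_middle(f, double_numbers)).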