Can't compare two strings in Python

I need to check whether a file contains a call to a certain function. I know it's not a .txt file, but when I read a line and check it with type() it says str, and I can print the whole file correctly. For some reason, though, the lines can't be compared, and I don't know why. The script runs without errors. The first line of the file is '#include <Arduino.h>', yet the variable found stays False.
import os
def BeforeBuild():
    found = False
    with open(r"\src\OTAWEBAP.cpp", "r") as f:
        for line in f:
            print (line)
            if(line == '#include <Arduino.h>'):
                found = True;
        if(not found):
            raise Exception("OtaIni function not found, you need to use it to preserve OTA functions in your new deploy")
        else:
            print('function OtaIni was found')
        f.close()
BeforeBuild()

Try replacing
if(line == '#include <Arduino.h>'):
    found = True;
with
if line.strip() == '#include <Arduino.h>':
    found = True
The strip() method removes all whitespace, including the trailing newline, from the beginning and end of the line.
P.S. Try to remember that in Python, if conditions don't need to be in parentheses and lines don't need semicolons at the end. Otherwise everyone will know that you're really a C programmer at heart.
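To see why the original comparison fails, it can help to look at repr(line), which makes the trailing newline visible. The following is a minimal sketch; io.StringIO stands in for the real file, and the sample contents are made up for illustration.

```python
import io

# Stand-in for the real .cpp file; the contents are made up for illustration.
fake_file = io.StringIO('#include <Arduino.h>\nvoid setup() {}\n')

first_line = fake_file.readline()

# repr() shows the newline that print() hides.
print(repr(first_line))

# The raw line still carries '\n', so only the stripped comparison succeeds.
raw_equal = first_line == '#include <Arduino.h>'
stripped_equal = first_line.strip() == '#include <Arduino.h>'
print(raw_equal, stripped_equal)
```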

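Every line read from the file ends with the newline character \n (except possibly the very last line), so it never equals the bare string.
Slicing off the last character may work for you: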
import os
def BeforeBuild():
    found = False
    with open(r"\src\OTAWEBAP.cpp", "r") as f:
        for line in f:
            print (line)
            if(line[:-1] == '#include <Arduino.h>'):
                found = True;
        if(not found):
            raise Exception("OtaIni function not found, you need to use it to preserve OTA functions in your new deploy")
        else:
            print('function OtaIni was found')
        f.close()
BeforeBuild()

Take care when comparing strings. In this case a whitespace character is causing the issue. Many whitespace characters are invisible but may still be present, so it is good practice to remove them when working with files like this.
You can use strip() to remove whitespace characters from both ends of a string.
import os
def BeforeBuild():
    found = False
    with open(r"\src\OTAWEBAP.cpp", "r") as f:
        for line in f:
            line = line.strip()
            print (line)
            if line == '#include <Arduino.h>':
                found = True
        if not found:
            raise Exception("OtaIni function not found, you need to use it to preserve OTA functions in your new deploy")
        else:
            print('function OtaIni was found')
        f.close()
BeforeBuild()
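The same check can be written more compactly with any(). This is only a sketch: file_has_line is a hypothetical helper name, and the temporary file stands in for the real source file.

```python
import os
import tempfile

def file_has_line(path, wanted):
    """Return True if any line equals `wanted`, ignoring surrounding whitespace."""
    with open(path, "r") as f:
        # any() stops reading as soon as one line matches.
        return any(line.strip() == wanted for line in f)

# Demonstration with a temporary file (made-up contents, trailing spaces included).
with tempfile.NamedTemporaryFile("w", suffix=".cpp", delete=False) as tmp:
    tmp.write("#include <Arduino.h>  \nvoid OtaIni();\n")
    demo_path = tmp.name

has_include = file_has_line(demo_path, "#include <Arduino.h>")
has_missing = file_has_line(demo_path, "#include <Missing.h>")
os.remove(demo_path)
print(has_include, has_missing)
```

Returning a boolean instead of raising inside the helper leaves it to the caller to decide whether a missing line is an error.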

Related

Replacement for isAlpha() to include underscores?

I am processing data using Python3 and I need to read a results file that looks like this:
ENERGY_BOUNDS
1.964033E+07 1.733253E+07 1.491825E+07 1.384031E+07 1.161834E+07 1.000000E+07 8.187308E+06 6.703200E+06
6.065307E+06 5.488116E+06 4.493290E+06 3.678794E+06 3.011942E+06 2.465970E+06 2.231302E+06 2.018965E+06
GAMMA_INTERFACE
0
EIGENVALUE
1.219034E+00
I want to search the file for a specific identifier (in this case ENERGY_BOUNDS), begin reading the numeric values after this identifier but not the identifier itself, and stop when I reach the next identifier. However, my problem is that I was using isAlpha to find the next identifier, and some of them contain underscores. Here is my code:
def read_data_from_file(file_name, identifier):
    with open(file_name, 'r') as read_obj:
        list_of_results = []
        # Read all lines in the file one by one
        for line in read_obj:
            # For each line, check if line contains the string
            if identifier in line:
                # If yes, read the next line
                nextValue = next(read_obj)
                while(not nextValue.strip().isalpha()): # Keep on reading until next identifier appears
                    list_of_results.extend(nextValue.split())
                    nextValue = next(read_obj)
    return(list_of_results)
I think I need to use regex, but I am stuck regarding how to phrase it. Any help would be much appreciated!

take = False
with open('path/to/input') as infile:
    for line in infile:
        if line.strip() == "ENERGY_BOUNDS":
            take = True
            continue # we don't actually want this line
        if all(char.isalpha() or char=="_" for char in line.strip()): # we've hit the next section
            take = False
        if take:
            print(line) # or whatever else you want to do with this line

Here's an option for you.
Just iterate over the file until you hit the identifier.
Then iterate over it in another for loop until the next identifier causes a ValueError.
def read_data_from_file(file_name, identifier):
    with open(file_name, 'r') as f:
        list_of_results = []
        for line in f:
            if identifier in line:
                break
        for line in f:
            try:
                list_of_results.extend(map(float, line.split()))
            except ValueError:
                break
    return list_of_results

You can use this regex: ^[A-Z]+(?:_[A-Z]+)*$
Additionally, you can modify the regex to restrict the length of the first segment, like this: ^[A-Z]{2,10}(?:_[A-Z]+)*$, where {2,10} is the {MIN,MAX} length:
You can find this demo here: https://regex101.com/r/9jESAH/35
See this answer for more details.
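As a rough illustration of how that regex could replace the isalpha() test, here is a small sketch. The helper name is_identifier is made up, and re.fullmatch is used in place of the ^ and $ anchors.

```python
import re

# Uppercase words joined by single underscores, e.g. ENERGY_BOUNDS.
IDENTIFIER = re.compile(r'[A-Z]+(?:_[A-Z]+)*')

def is_identifier(line):
    """Return True if the stripped line looks like an identifier."""
    return IDENTIFIER.fullmatch(line.strip()) is not None

samples = [
    "ENERGY_BOUNDS\n",
    "1.964033E+07 1.733253E+07\n",
    "EIGENVALUE\n",
    "0\n",
    "_BAD\n",
]
results = [is_identifier(s) for s in samples]
print(results)
```

In the question's loop, the condition `not nextValue.strip().isalpha()` would then become `not is_identifier(nextValue)`.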

Here is a simple function to verify that a string contains only uppercase letters, lowercase letters, and underscores:
import re
RE_PY_VAR_NAME = "^[a-zA-Z_]+$"
def isAlphaUscore(s: str) -> bool:
    assert s is not None, "s cannot be None"
    return re.search(RE_PY_VAR_NAME, s) is not None

Return value in a quite nested for-loop

I want nested loops to test whether all elements match the condition and then to return True. Example:
There's a given text file: file.txt, which includes lines of this pattern:
aaa:bb3:3
fff:cc3:4
Letters, colon, alphanumeric, colon, integer, newline.
Generally, I want to test whether all lines match this pattern. However, in this function I would like to check whether the first column includes only letters.
def opener(file):
    #Opens a file and creates a list of lines
    fi=open(file).read().splitlines()
    import string
    res = True
    for i in fi:
        #Checks whether any characters in the first column is not a letter
        if any(j not in string.ascii_letters for j in i.split(':')[0]):
            res = False
        else:
            continue
    return res
However, the function returns False even if all characters in the first column are letters. I would like to ask you for the explanation, too.

Your code evaluates the empty line after your data, hence False:
Your file contains a newline after its last line, so your code checks the line after your last data line, which does not fulfill your test. That is why you get False no matter the input:
aaa:bb3:3
fff:cc3:4
empty line that does not start with only letters
You can fix this by treating empty lines specially when they occur at the end. If an empty line sits between filled ones, the function returns False as well:
with open("t.txt","w") as f:
    f.write("""aaa:bb3:3
fff:cc3:4
""")
import string
def opener(file):
    letters = string.ascii_letters
    # Opens a file and creates a list of lines
    with open(file) as fi:
        res = True
        empty_line_found = False
        for i in fi:
            if i.strip(): # only check line if not empty
                if empty_line_found: # we had an empty line and now a filled line: error
                    return False
                #Checks whether any characters in the first column is not a letter
                if any(j not in letters for j in i.strip().split(':')[0]):
                    return False # immediately exit - no need to test the rest of the file
            else:
                empty_line_found = True
    return res # or True
print (opener("t.txt"))
Output:
True
If you use
# example with a file that contains an empty line between data lines - NOT ok
with open("t.txt","w") as f:
    f.write("""aaa:bb3:3

fff:cc3:4
""")
or
# example for file that contains empty line after data - which is ok
with open("t.txt","w") as f:
    f.write("""aaa:bb3:3
ff2f:cc3:4

""")
you get: False
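For comparison, here is a minimal sketch of the same first-column check that takes any iterable of lines, so it can be tried without a file. The name first_column_is_letters is made up. Unlike the function above, this version assumes blank lines may simply be skipped wherever they occur.

```python
import string

def first_column_is_letters(lines):
    """Return True if the first colon-separated field of every
    non-blank line consists only of ASCII letters."""
    letters = set(string.ascii_letters)
    for line in lines:
        line = line.strip()
        if not line:
            continue  # assumption: blank lines are ignored everywhere
        first = line.split(':')[0]
        # An empty first field (e.g. ":bb3:3") is treated as invalid.
        if not first or any(ch not in letters for ch in first):
            return False  # one bad line is enough to stop early
    return True

good = first_column_is_letters(["aaa:bb3:3\n", "fff:cc3:4\n", "\n"])
bad = first_column_is_letters(["aaa:bb3:3\n", "ff2f:cc3:4\n"])
print(good, bad)
```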

Colonoscopy
ASCII, and UNICODE, both define character 0x3A as COLON. This character looks like two dots, one over the other: :
ASCII, and UNICODE, both define character 0x3B as SEMICOLON. This character looks like a dot over a comma: ;
You were consistent in your use of the colon in your example: fff:cc3:4 and you were consistent in your use of the word semicolon in your descriptive text: Letters, semicolon, alphanumeric, semicolon, integer, newline.
I'm going to assume you meant colon (':') since that is the character you typed. If not, you should change it to a semicolon (';') everywhere necessary.
Your Code
Here is your code, for reference:
def opener(file):
    #Opens a file and creates a list of lines
    fi=open(file).read().splitlines()
    import string
    res = True
    for i in fi:
        #Checks whether any characters in the first column is not a letter
        if any(j not in string.ascii_letters for j in i.split(':')[0]):
            res = False
        else:
            continue
    return res
Your Problem
The problem you asked about was the function always returning false. The example you gave included a blank line between the first example and the second. I would caution you to watch out for spaces or tabs in those blank lines. You can fix this by explicitly catching blank lines and skipping over them:
for i in fi:
    if i.isspace():
        # skip blank lines
        continue
Some Other Problems
Now here are some other things you might not have noticed:
You provided a nice comment in your function. That should have been a docstring:
def opener(file):
    """ Opens a file and creates a list of lines.
    """
You import string in the middle of your function. Don't do that. Move the import up to the top of the module:
import string # at top of file
def opener(file): # Not at top of file
You opened the file with open() and never closed it. This is exactly why the with keyword was added to Python:
with open(file) as infile:
    fi = infile.read().splitlines()
You opened the file, read its entire contents into memory, then split it into lines, discarding the newlines at the end. All so that you could split it by colons and ignore everything but the first field.
It would have been simpler to just call readlines() on the file:
with open(file) as infile:
    fi = infile.readlines()
res = True
for i in fi:
It would have been even easier and even simpler to just iterate on the file directly:
with open(file) as infile:
    res = True
    for i in infile:
import logging
import re
# valid_line is the regular expression describing a valid line (not shown here)
bad_lines = 0
for line in infile:
    if line.isspace():
        continue
    if re.match(valid_line, line):
        continue
    logging.warning(f"Bad line: {line}")
    bad_lines += 1
return bad_lines == 0
Your names are bad. Your function includes the names file, fi, i, j, and res. The only one that barely makes sense is file.
Considering that you are asking people to read your code and help you find a problem, please, please use better names. If you just replaced those names with file (same), infile, line, ch, and result the code gets more readable. If you restructured the code using standard Python best practices, like with, it gets even more readable. (And has fewer bugs!)
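The regular-expression approach suggested above could be fleshed out roughly as follows. This is a sketch only: the pattern VALID_LINE is my guess at "letters, colon, alphanumeric, colon, integer" and is not taken from the answer, and count_bad_lines is a made-up name.

```python
import logging
import re

# Hypothetical pattern for "letters:alphanumeric:integer"; adjust to the real rules.
VALID_LINE = re.compile(r'[A-Za-z]+:[A-Za-z0-9]+:\d+')

def count_bad_lines(lines):
    """Count the non-blank lines that do not match VALID_LINE."""
    bad_lines = 0
    for line in lines:
        stripped = line.strip()
        if not stripped:
            continue  # skip blank lines
        if VALID_LINE.fullmatch(stripped):
            continue
        logging.warning("Bad line: %r", line)
        bad_lines += 1
    return bad_lines

clean = count_bad_lines(["aaa:bb3:3\n", "fff:cc3:4\n", "\n"])
dirty = count_bad_lines(["aaa:bb3:3\n", "ff2f:cc3:4\n", "abc:def\n"])
print(clean, dirty)
```

A file would then be considered valid when count_bad_lines(infile) == 0.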

Outputting line of string search in python

I'm new to Python and I'm trying to search a text file for a particular string, then output the whole line which contains that string. However, I want to split this across two separate files. The main file contains the following code:
def searchNoCase():
    f = open('text.txt')
    for line in f:
        if searchWord() in f:
            print(line)
        else:
            print("No result")
    f.close()
def searchWord(stuff):
    word=stuff
    return word
File 2 contains the following code:
import main
def bla():
    main.searchWord("he")
I'm sure this is a simple fix but I can't seem to figure it out. Help would be greatly appreciated.

I don't use Python 3 so I need to check exactly what changed with __init__.py but in the meantime, create an empty script with that name in the same directory as the following files.
I've tried to cover a few different topics for you to read up on. For example, the exception handler is basically useless here because input (in Python 3) always returns a string but it's something you would have to worry about.
This is main.py
def search_file(search_word):
    # Check we have a string input, otherwise converting to lowercase fails
    try:
        search_word = search_word.lower()
    except AttributeError as e:
        print(e)
        # Now break out of the function early and give nothing back
        return None
    # If we didn't fail, the function will keep going
    # Use a context manager (with) to open files. It will close them automatically
    # once you get out of its block
    with open('test.txt', 'r') as infile:
        for line in infile:
            # Break sentences into words
            words = line.split()
            # List comprehension to convert them to lowercase
            words = [item.lower() for item in words]
            if search_word in words:
                return line
    # If we found the word, we would again have broken out of the function by this point
    # and returned that line
    return None
This is file1.py
import main
def ask_for_input():
    search_term = input('Pick a word: ') # use 'raw_input' in Python 2
    check_if_it_exists = main.search_file(search_term)
    if check_if_it_exists:
        # If our function didn't return None then this is considered True
        print(check_if_it_exists)
    else:
        print('Word not found')
ask_for_input()
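Note that search_file above returns only the first matching line. If every matching line is wanted, a variant along these lines might do. It is a sketch with a made-up name, find_lines, and it takes any iterable of lines so it can be tried without a file.

```python
def find_lines(lines, search_word):
    """Return every line that contains search_word as a whole word, ignoring case."""
    wanted = search_word.lower()
    return [
        line.rstrip("\n")
        for line in lines
        if wanted in (word.lower() for word in line.split())
    ]

sample = ["He said hello\n", "nothing here\n", "then he left\n"]
matches = find_lines(sample, "he")
print(matches)
```

Because the line is split into words first, "here" does not count as a match for "he".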

Match an element of every line

I have a list of rules for a given input file for my function. If any of them are violated in the file given, I want my program to return an error message and quit.
Every gene in the file should be on the same chromosome
Thus, for lines such as:
NM_001003443 chr11 + 5997152 5927598 5921052 5926098 1 5928752,5925972, 5927204,5396098,
NM_001003444 chr11 + 5925152 5926098 5925152 5926098 2 5925152,5925652, 5925404,5926098,
NM_001003489 chr11 + 5925145 5926093 5925115 5926045 4 5925151,5925762, 5987404,5908098,
etc.
Each line in the file will be variations of this line
Thus, I want to make sure every line in the file is on chr11
Yet I may be given a file with a different list of chr(and any number of numbers). Thus I want to write a function that will make sure whatever number is found on chr in the line is the same for every line.
Should I use a regular expression for this, or what should I do? This is in python by the way.
Such as: chr\d+ ?
I am unsure how to make sure that whatever is matched is the same in every line though...
I currently have:
from re import *
for line in file:
    r = 'chr\d+'
    i = search(r, line)
    if i in line:
but I don't know how to make sure it is the same in every line...
In reference to sajattack's answer
fp = open(infile, 'r')
for line in fp:
    filestring = ''
    filestring +=line
    chrlist = search('chr\d+', filestring)
    chrlist = chrlist.group()
for chr in chrlist:
    if chr != chrlist[0]:
        print('Every gene in file not on same chromosome')

Just read the file and have a while loop check each line to make sure it contains chr11. There are string functions to search for substrings in a string. As soon as you find a line that returns false (does not contain chr11) then break out of the loop and set a flag valid = false.
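import re
fp = open(infile, 'r')
# discards the first line (drop this call if the file has no header line)
fp.readline()
# take the chr value of the next line as the reference
tar = re.findall(r'chr\d+', fp.readline())[0]
for line in fp:
    if (line.find(tar) == -1):
        print("Not valid")
        break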
This should search for a number in the line and check for validity.

Is it safe to assume that the first chr is the correct one? If so, use this:
import re
chrlist = re.findall("chr[0-9]+", open('file').read())
# ^ this is a list with all chr(whatever numbers)
for chr in chrlist:
    if chr != chrlist[0]:
        print("Chr does not match")
        break

My solution uses a "match group" to collect the matched numbers from the "chr" string.
import re
pat = re.compile(r'\schr(\d+)\s')
def chr_val(line):
    m = re.search(pat, line)
    if m is not None:
        return m.group(1)
    else:
        return ''
def is_valid(f):
    line = f.readline()
    v = chr_val(line)
    if not v:
        return False
    return all(chr_val(line) == v for line in f)
with open("test.txt", "r") as f:
    print("The file is {0}".format("valid" if is_valid(f) else "NOT valid"))
Notes:
Pre-compiles the regular expression for speed.
Uses a raw string (r'') to specify the regular expression.
The pattern requires white space (\s) on either side of the chr string.
is_valid() returns False if the first line doesn't have a good chr value. Then it returns a Boolean value that is true if all of the following lines match the chr value of the first line.
Your sample code just prints something like The file is True so I made it a bit friendlier.
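Another way to express "the same chr on every line" is to collect the distinct chr labels into a set and check that exactly one remains. The sketch below uses made-up names (CHR_PATTERN, chromosomes_in) and shortened sample lines. It assumes lines without any chr label can be ignored, which may not suit every file.

```python
import re

# Matches labels such as chr11 as whole words.
CHR_PATTERN = re.compile(r'\bchr\d+\b')

def chromosomes_in(lines):
    """Collect the distinct chr labels found in the given lines."""
    found = set()
    for line in lines:
        match = CHR_PATTERN.search(line)
        if match:
            found.add(match.group())
    return found

same = chromosomes_in(["NM_001003443 chr11 + 5997152", "NM_001003444 chr11 + 5925152"])
mixed = chromosomes_in(["NM_001003443 chr11 + 5997152", "NM_001003489 chr2 + 5925145"])

# Exactly one distinct label means every gene is on the same chromosome.
print(len(same) == 1, len(mixed) == 1)
```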

Printing Lines in a File

So here's the problem I have,
I can find the Search Term in my file but at the moment I can only print out the line that the Search Term is in. (Thanks to Questions posted by people earlier =)). But I cannot print out all the lines to the end of the file after the Search Term. Here is the coding I have so far:-
search_term = r'\b%s\b' % search_term
for line in open(f, 'r'):
    if re.match(search_term, line):
        print line,
Thanks in advance!

It can be much improved if you first compile the regex:
search_term_regex = re.compile(r'\b%s\b' % search_term)
found = False
for line in open(f):
    if not found:
        found = bool(search_term_regex.findall(line))
    if found:
        print line,
Then you're not repeating the print line.

You could set a boolean flag, e.g. "found = True";
and do a check for found==True, and if so print the line.
Code below:
search_term = r'\b%s\b' % search_term
found = False;
for line in open(f, 'r'):
    if found==True:
        print line,
    elif re.match(search_term, line):
        found = True;
        print line,
To explain this a bit: With the boolean flag you are adding some state to your code to modify its functionality. What you want your code to do is dependent on whether you have found a certain line of text in your file or not, so the best way to represent such a binary state (have I found the line or not found it?) is with a boolean variable like this, and then have the code do different things depending on the value of the variable.
Also, the elif is just a shortening of else if.
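The snippets above use the Python 2 print statement. A Python 3 sketch of the same flag idea could look like this. The function name lines_from_match is made up, and re.escape is my addition, on the assumption that the search term is meant as literal text rather than as a pattern.

```python
import re

def lines_from_match(lines, search_term):
    """Return the first line containing search_term as a whole word,
    plus every line after it."""
    pattern = re.compile(r'\b%s\b' % re.escape(search_term))
    found = False
    result = []
    for line in lines:
        # re.search looks anywhere in the line; re.match only checks the start.
        if not found and pattern.search(line):
            found = True
        if found:
            result.append(line)
    return result

sample = ["alpha\n", "beta gamma\n", "delta\n"]
tail = lines_from_match(sample, "gamma")
print("".join(tail), end="")
```

Passing an open file object instead of the sample list works the same way, since both yield lines one at a time.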
