I need to use or create a comparison function in Python. Perhaps there is already a way of doing this?
I need to compare a string with a value in a list, and I need to get a match even if it's a couple of characters off. I'll give an example so you can see what I mean.
Example 1:
Value in list : Name: This is the title
Value in search : Name This is the title
Example 2:
Value in list : Name and shortening m.m.
Value in search : Name and shortening m.m
As you can see, the values I want to compare and match are very similar. The values in the search are folder names, so they differ slightly: folder names cannot contain certain characters that appear in the list values.
Maybe the easiest way to accomplish this is to remove the disallowed characters (\/:*?"<>|) and any trailing periods from the string before making the comparison.
Any tips on the most efficient way to compare the strings and get a match?
Edit: Is this an ugly way of doing it?
def Cleanup(Str):
    Illegal = ['\\','/',':','*','?','"','<','>','|']
    return ''.join([char for char in Str if char not in Illegal]).rstrip('.')
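If stripping characters is not enough (for example, a value that differs by a character outside the illegal set), the standard library's difflib can add a fuzzy match on top of the cleanup. This is a minimal sketch, not a definitive implementation: `find_match` is a hypothetical helper, and the 0.9 similarity cutoff is an arbitrary assumption you would tune for your data.

```python
import difflib

ILLEGAL = set('\\/:*?"<>|')

def cleanup(s):
    """Drop characters that are illegal in folder names, then trailing periods."""
    return ''.join(ch for ch in s if ch not in ILLEGAL).rstrip('.')

def find_match(search, candidates, cutoff=0.9):
    """Return the original candidate closest to `search` after cleanup, or None."""
    # Map each cleaned value back to the original list value.
    cleaned = {cleanup(c): c for c in candidates}
    hits = difflib.get_close_matches(cleanup(search), list(cleaned), n=1, cutoff=cutoff)
    return cleaned[hits[0]] if hits else None

values = ["Name: This is the title", "Name and shortening m.m."]
print(find_match("Name This is the title", values))   # Name: This is the title
print(find_match("Name and shortening m.m", values))  # Name and shortening m.m.
```

A lower cutoff tolerates more typos but also makes false matches between similar titles more likely.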
I'm sure there's a better way to do this, but here's my crack at it
import string
a = "Name: This is the title"
b = "Name This is the title"
# remove punctuation and make all lower-case
def strippedAndLowered(myString):
    return "".join(i for i in myString if i not in string.punctuation).lower()
strippedAndLowered(a) == strippedAndLowered(b) # returns True
Use the following code to strip the punctuation, and then compare them:
def search(lst, item):
    for i in lst:
        i = ''.join([char for char in i if char.isalpha() or char == ' '])
        if item == i:
            return True
    return False
The str.translate method should be faster:
item = "Name: This is the title"
search = "Name This is the title"
illegal = r'\/:*?"<>|'
# In Python 3, build a translation table that deletes every illegal character
table = str.maketrans('', '', illegal)
def compare(s1, s2):
    return s1.translate(table) == s2.translate(table)
print(compare(search, item))
Gives:
True
And if you are really worried about the performance, and have many comparisons, you can cache the translated versions in a dictionary.
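The caching idea can be sketched as an index that translates every list value once and maps the cleaned form back to the original, so each search becomes a single dictionary lookup. A minimal sketch assuming Python 3; `build_index` is a hypothetical helper, and, like the translate answer, it does not strip trailing periods.

```python
ILLEGAL = r'\/:*?"<>|'
TABLE = str.maketrans('', '', ILLEGAL)  # maps each illegal character to None (deletes it)

def build_index(values):
    """Translate every list value once; map the cleaned form back to the original."""
    return {v.translate(TABLE): v for v in values}

values = ["Name: This is the title", "Name and shortening m.m."]
index = build_index(values)

# Each lookup is now one dict access instead of re-translating the whole list.
print(index.get("Name This is the title".translate(TABLE)))  # Name: This is the title
```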
Sorry for asking a fairly simple question, but how would I make a certain letter/symbol/number (for example "a") "activate" an if statement when it is in the element? If I write
thestring = ["a"]
(And then add the if statement ^^)
then it works, but if I add another letter or number next to it:
thestring = ["ab"]
(And then add the if statement ^^)
It wouldn't work anymore. So essentially, how can I fix this?
Sample Code:
thestring = ["fvrbgea"]
if "a" in thestring:
    print("True")
Output: (Nothing, since there's no else branch, I guess.)
As Klaus D. alluded to, your string is a list of one element ("fvrbgea"). That one element is not equal to the string "a". If you remove the brackets from thestring your code will execute the way you are expecting to.
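A short sketch of the difference: `in` on a list tests whether the value equals one of the elements, while `in` on a string tests for a substring. If you do need to keep a list of strings, `any()` checks each element in turn.

```python
thestring = ["fvrbgea"]

# `in` on a list checks whether "a" equals one of the elements...
print("a" in thestring)     # False: the only element is "fvrbgea"
# ...while `in` on a string checks for a substring.
print("a" in thestring[0])  # True

# To check every string in a list, test each element:
if any("a" in s for s in thestring):
    print("True")
```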
If you just want to validate that a string contains only certain letters/numbers/symbols, you can use a regex:
import re
string = "mystring"
# Validate: only digits, lowercase letters and dashes
if re.match("^[0-9a-z-]+$", string):
    print("True.")
else:
    print("False.")
thestring = ["fvrbgea"]
for el in thestring:
    if el.isnumeric():
        print("Numeric")
    elif el.isalnum():
        print("AlNum")
    # elif el.isPutSomethingHere
Python's str type has several "is" methods for checking a string's contents.
Just call variable.isSomething(), for example isalpha(), isdigit() or isupper(), to see.
And if you want to check whether a specific letter is in the string, follow the same logic; in some cases you will need to nest another for loop inside the first one. For example:
thestring = ["fvrbdoggea"]
control = list()
for el in thestring:
    for c in el:
        if "".join(control) != "dog":
            if c == "d" or c == "o" or c == "g":
                control.append(c)
print("".join(control))
You can automate that, but it's a bit laborious and I don't want to do it here; this should give you the idea.
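The loop above roughly looks for the letters d, o, g in order. Two simpler checks cover the common cases: `in` for a contiguous substring, and an iterator trick for letters that appear in order with gaps allowed. A minimal sketch; `contains_word` and `contains_in_order` are hypothetical helper names.

```python
def contains_word(s, word):
    """True if `word` appears as a contiguous substring of `s`."""
    return word in s

def contains_in_order(s, word):
    """True if the letters of `word` appear in `s` in order, possibly with gaps."""
    it = iter(s)
    # `c in it` consumes the iterator up to and including the first match,
    # so each later letter must be found after the previous one.
    return all(c in it for c in word)

thestring = ["fvrbdoggea"]
for el in thestring:
    print(contains_word(el, "dog"), contains_in_order(el, "dog"))  # True True
```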
I am trying to write a CLI for generating Python classes. Part of this requires validating the identifiers provided in user input, which for Python means making sure they conform to the PEP 8 naming conventions: classes in CapWords, fields in all_lowercase_with_underscores, packages and modules in short all-lowercase names, and so on.
# correcting is easy when the identifier contains
# underscores or whitespace; for a package name:
def package_correct_convention(item):
    return item.strip().lower().replace(" ", "").replace("_", "")
But when there are no spaces or underscores between tokens, I'm not sure how to correctly capitalize the first letter of each word in an identifier. Is it possible to implement something like that without using AI?
For example:
# providing "ClassA" returns "classa" because there is no delimiter between "class" and "a"
def class_correct_convention(item):
    if item.count(" ") or item.count("_"):
        # check whether space or underscore was used as the word delimiter
        if item.count(" ") > item.count("_"):
            item = item.split(" ")
        else:
            item = item.split("_")
        item = list(map(lambda x: x.title(), item))
        return ("".join(item)).replace("_", "").replace(" ", "")
    # if there is no whitespace, the best we can do is capitalize the first letter
    return item[0].upper() + item[1:]
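When the input does carry some word boundaries (spaces, underscores, or existing capitals as in "myField"), a regex tokenizer can recover them before re-joining the words in the target convention. A minimal sketch; `tokens`, `to_class_name` and `to_snake_case` are hypothetical helpers, and this cannot help with all-lowercase input such as "classa", which has no boundary information at all.

```python
import re

# An acronym run (e.g. "HTTP" in "HTTPServer"), or an optional capital followed by
# lowercase letters/digits. Spaces and underscores never match, so they act as separators.
_TOKEN = re.compile(r'[A-Z]+(?![a-z])|[A-Z]?[a-z0-9]+')

def tokens(identifier):
    """Split an identifier into words using spaces, underscores and capital letters."""
    return _TOKEN.findall(identifier)

def to_class_name(identifier):
    """CapWords: upper-case each word's first letter, keep acronyms as they are."""
    return "".join(t[0].upper() + t[1:] for t in tokens(identifier))

def to_snake_case(identifier):
    """all_lowercase_with_underscores, e.g. for fields."""
    return "_".join(t.lower() for t in tokens(identifier))

print(to_class_name("my class_name"))  # MyClassName
print(to_snake_case("myFieldName"))    # my_field_name
print(to_class_name("classa"))         # Classa
```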
Well, an AI-based approach would be difficult, imperfect, and a lot of work. If that isn't worth it, there is a simpler approach that should be comparably effective.
I understand the worst case is "todelineatewordsinastringlikethat".
I would recommend downloading an English word list (a text file with one word per line) and proceeding this way:
import re
string = "todelineatewordsinastringlikethat"
# with open("mydic.dat", "r") as msg:
#     lst = msg.read().splitlines()
lst = ['to', 'string', 'in']  # let's say the dict contains 3 words
lst = sorted(lst, key=len, reverse=True)
replaced = []
for elem in lst:
    if elem in string:  # very fast
        replaced_str = " ".join(replaced)  # faster to check elem in a string than elem in a list
        capitalized = elem[0].upper() + elem[1:]  # prepare your capitalized word
        if elem not in replaced_str:  # check if elem could be a substring of something you replaced already
            string = re.sub(elem, capitalized, string)
        elif elem in replaced_str:  # if elem is a substring of something you replaced, protect that word
            protect_replaced = [item for item in replaced if elem in item]  # replaced items containing the substring elem
            for protect in protect_replaced:  # uppercase the whole word to protect it, as re.sub() is case-sensitive
                string = re.sub(protect, protect.upper(), string)
            string = re.sub(elem, capitalized, string)
            for protect in protect_replaced:  # unprotect by doing the reverse: full uppercase back to capitalized
                string = re.sub(protect.upper(), protect, string)
        replaced.append(capitalized)  # append the replaced element to the list
print(string)
Output:
TodelIneatewordsInaStringlikethat
You can see that String was protected, but "delineate" was not (hence delIneate), because it was not in our word list.
This is certainly not optimal, but it should perform comparably to an AI-based approach, for a problem that would not be presented to an AI in this form anyway (input preparation is very important in AI).
Note that it is important to sort the word list by length in descending order, because you want to detect whole words first, not their substrings. For example, in "beforehand" you want to match the full word, not "before" or "and".
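A more systematic alternative to the replace-and-protect loop is dictionary-based word segmentation with dynamic programming: for every prefix of the string, remember the best way to split it into known words. A minimal sketch, assuming a small hypothetical word set in place of the downloaded word list; it prefers the split with the fewest words, which is only one possible heuristic (real segmenters often weight words by frequency instead).

```python
def segment(text, words):
    """Split `text` into words from `words`, preferring the fewest pieces.

    Returns a list of words, or None if `text` cannot be fully segmented.
    """
    max_len = max(map(len, words), default=0)
    # best[i] holds the best segmentation of text[:i] found so far (or None)
    best = [None] * (len(text) + 1)
    best[0] = []
    for end in range(1, len(text) + 1):
        for start in range(max(0, end - max_len), end):
            piece = text[start:end]
            if best[start] is not None and piece in words:
                candidate = best[start] + [piece]
                if best[end] is None or len(candidate) < len(best[end]):
                    best[end] = candidate
    return best[len(text)]

def to_cap_words(text, words):
    """Capitalize each dictionary word, or return None if no full split exists."""
    parts = segment(text.lower(), words)
    if parts is None:
        return None
    return "".join(p.capitalize() for p in parts)

words = {"to", "delineate", "words", "in", "a", "string", "like", "that", "class"}
print(to_cap_words("todelineatewordsinastringlikethat", words))
# ToDelineateWordsInAStringLikeThat
```

Unlike the substring replacement, a word such as "delineate" is never broken up by a shorter entry like "in", because every character must belong to exactly one dictionary word.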
First of all, I know there are many similar posts, which I have listed below; however, as far as I know, none of them answers my problem. All the ones I have found ask how to search for a string in 'dict.values()', not how to check every single character in a string against 'dict.values()' and, if any characters in the string appear among the characters of 'dict.values()', return which ones and how many.
Links to similar posts which don't answer the question but could be useful to some:
How to search if dictionary value contains certain string with Python
Find dictionary items whose key matches a substring
How can I check if the characters in a string are in a dictionary of values?
This is what I have so far, but it doesn't seem to work at all:
characters = {'small': 'abcdefghijklmnopqrstuvwxyz',
              'big': 'ABCDEFGHIJKLMNOPQRSTUVWXYZ',
              'nums': '0123456789',
              'special': "!#$%&()*+,-./:;<=>?#[\]^_{|}~",}
password = 'aAb'
def count(pass_word, char_set):
    num_of_char = 0
    char_list = []
    for char in pass_word:
        if i in char_set.values():
            num_of_char += 1
            char_list += i
    return char_list, num_of_char
#Print result
print(count(password,characters))
The output should be something similar to:
'a','A','b'
3
Hopefully you understand what I mean; if anything is unclear, please comment so that I can improve the question.
def count(password, char_dict):
    sanitized_pass = [char for char in password if any(char in v for v in char_dict.values())]
    return sanitized_pass, len(sanitized_pass)
Here's one way. Another would be to build a set of all the acceptable characters and pass that to the function along with the password:
from itertools import chain
char_set = set(chain.from_iterable(characters.values()))
def count(password, chars):
    sanitized_pass = [char for char in password if char in chars]
    return sanitized_pass, len(sanitized_pass)
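If you also want to know which category each character belongs to (the "which" part of the question), the same membership test can record the category name. A minimal sketch; `classify` is a hypothetical helper, and the 'special' string is copied from the question as a raw string so its backslash is kept deliberately.

```python
characters = {'small': 'abcdefghijklmnopqrstuvwxyz',
              'big': 'ABCDEFGHIJKLMNOPQRSTUVWXYZ',
              'nums': '0123456789',
              'special': r"!#$%&()*+,-./:;<=>?#[\]^_{|}~"}

def classify(password, char_dict):
    """Map each category name to the password characters that belong to it."""
    found = {name: [] for name in char_dict}
    for ch in password:
        for name, allowed in char_dict.items():
            if ch in allowed:
                found[name].append(ch)
                break  # the categories in this dict don't overlap, so stop at the first hit
    return found

result = classify('aAb1!', characters)
print(result)
print({name: len(chars) for name, chars in result.items()})
```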
I'm trying to find a way to parse a string that can contain variables, function calls, lists, or dicts written in Python syntax, separated by ",". Whitespace can appear anywhere, so the string should be split on "," only when the comma is not inside (), [] or {}.
Example string: "variable, function1(1,3), function2([1,3],2), ['list_item_1','list_item_2'],{'dict_key_1': "dict_item_1"}"
Another example string: "variable,function1(1, 3) , function2( [1,3],2), ['list_item_1','list_item_2'],{'dict_key_1': "dict_item_1"}"
Example output ["variable", "function1(1,3)", "function2([1,3],2)", "['list_item_1','list_item_2']", "{'dict_key_1': "dict_item_1"}"]
edit:
The reason for the code is to parse the string and then run it with exec("var = %s" % list[x]). (Yes, I know this might not be the recommended way to do things.)
I guess the main problem here is that the arrays and dicts also have commas in them, so just using str.split(",") wouldn't work. One way of doing it is to parse the string one character at a time, and keep track of whether all brackets are closed. If they are, we can append the current result to an array when we come across a comma. Here's my attempt:
s = "variable, function1(1,3),function2([1,3],2),['list_item_1','list_item_2'],{'dict_key_1': 'dict_item_1'}"
tokens = []
current = ""
open_brackets = 0
for char in s:
    current += char
    if char in "({[":
        open_brackets += 1
    elif char in ")}]":
        open_brackets -= 1
    elif (char == ",") and (open_brackets == 0):
        tokens.append(current[:-1].strip())
        current = ""
tokens.append(current.strip())
for t in tokens:
print(t)
"""
variable
function1(1,3)
function2([1,3],2)
['list_item_1','list_item_2']
{'dict_key_1': 'dict_item_1'}
"""
Regular expressions aren't very good for parsing the complexity of arbitrary code. What exactly are you trying to accomplish? You can (unsafely) use eval to just evaluate the string as code. Or if you're trying to understand it without evaling it, you can use the ast or dis modules for various forms of inspection.
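Following up on the ast suggestion: parsing the whole string as one Python expression gives the top-level items directly, and unlike bracket counting it also copes with commas inside string literals such as 'a,b'. A minimal sketch, assuming Python 3.8+ (for `ast.get_source_segment`); `split_top_level` is a hypothetical helper name.

```python
import ast

def split_top_level(source):
    """Return the source text of each top-level comma-separated item."""
    text = source.strip()
    body = ast.parse(text, mode="eval").body
    # "a, b" parses as one Tuple; a lone item parses as just that expression.
    items = body.elts if isinstance(body, ast.Tuple) else [body]
    return [ast.get_source_segment(text, node) for node in items]

s = "variable, function1(1,3), function2([1,3],2), ['list_item_1','list_item_2'],{'dict_key_1': 'dict_item_1'}"
for item in split_top_level(s):
    print(item)
```

Two caveats: input that is not valid Python raises SyntaxError, and a single parenthesized tuple such as "(1, 2)" is split into its elements.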
Have you tried using split?
>>> teststring = "variable, function1(1,3), function2([1,3],2), ['list_item_1','list_item_2'],{'dict_key_1': 'dict_item_1'}"
>>> teststring.split(", ")
['variable', 'function1(1,3)', 'function2([1,3],2)', "['list_item_1','list_item_2'],{'dict_key_1': 'dict_item_1'}"]
I've got a small problem with finding parts of a string in a list with Python.
I load the string from a file, and its value is one of the following: empty, one name from the list, or two, three or more names from the list joined together.
I need to perform different actions depending on whether the String equals "", the String equals one element from the list, or the String contains two or more elements. For example:
List = [ 'Aaron', 'Albert', 'Arcady', 'Leo', 'John' ... ]
String = "" #this is just example
String = "Aaron" #this is just example
String = "AaronAlbert" #this is just example
String = "LeoJohnAaron" #this is just example
I created something like this:
if String == "":  # this works well on Strings with 0 values
    print("something")
elif String in List:  # this works well on Strings with 1 value
    print("something else")
elif ...  # don't know what now
The best way would be to split this String with a pattern from a list. I was trying:
String.Find(x) #failed.
I tried to find similar posts but couldn't.
if String == "":  # this works well on Strings with 0 values
    print("something")
elif String in List:  # this works well on Strings with 1 value
    print("something else")
elif len([1 for x in List if x in String]) == 2:
    ...
This is called a list comprehension: it goes through the list, collects every list element that occurs as a substring of the string at hand, and then takes the length of the result.
Note that there may be issues if you have names like "Ann" and "Anna": "Anna" in the string will be counted twice. If you need a solution that accounts for that, I would suggest splitting the string on capital letters to separate it explicitly into individual names (if you want, I can update this answer to show how to do that with a regex).
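A minimal sketch of the capital-letter split with a regex, assuming every name is one capital letter followed only by lowercase letters (a name like "McDonald" would be split wrongly). `split_names` and `count_names` are hypothetical helpers, and "Ann"/"Anna" are added to the list to show the double-counting case.

```python
import re

NAMES = {'Aaron', 'Albert', 'Arcady', 'Leo', 'John', 'Ann', 'Anna'}

def split_names(s):
    """Split e.g. 'LeoJohnAaron' into ['Leo', 'John', 'Aaron'] at each capital letter."""
    return re.findall(r'[A-Z][a-z]*', s)

def count_names(s, names=NAMES):
    """Count the pieces of `s` that are exact names from `names`."""
    return sum(1 for part in split_names(s) if part in names)

for s in ["", "Aaron", "AaronAlbert", "LeoJohnAaron", "Anna"]:
    n = count_names(s)
    if n == 0:
        print(repr(s), "-> no names")
    elif n == 1:
        print(repr(s), "-> one name")
    else:
        print(repr(s), "->", n, "names")
```

Because each piece must equal a whole name, "Anna" counts once and is not also matched as "Ann".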
I think the most straightforward approach would be to loop over the list of names and, for each of them, check whether it's in your string.
for name in List:
    if name in String:
        print("do something here")
So, you want to find whether some string contains any members of the given list.
Iterate over the list and check whether the string contains the current item:
for data in List:
    if data in String:
        print("Found it!")