Ideas to improve language detection between Spanish and Catalan - python
I'm working on a text mining script in Python. I need to detect the language of a natural-language field in the dataset.
The problem is that 98% of the rows are in Spanish or Catalan. I tried some approaches, such as stopword counting and the langdetect library, but the two languages share many words, so these methods often get it wrong.
I'm looking for ideas to improve the detection.
One idea is to build a dictionary of words that are specific to Spanish or to Catalan, so that if a text contains any of those words, it is tagged with that language.
Approach 1: Distinguishing characters
Some characters appear in only one of the two languages (note: there will be exceptions for proper names and loanwords, e.g. Barça):
esp_chars = "ñÑáÁ"
cat_chars = "çÇàÀèÈòÒ·ŀĿ"
Example:
sample_texts = ["El año que es abundante de poesía, suele serlo de hambre.",
                "Cal no abandonar mai ni la tasca ni l'esperança."]
for text in sample_texts:
    if any(char in text for char in esp_chars):
        print("Spanish: {}".format(text))
    elif any(char in text for char in cat_chars):
        print("Catalan: {}".format(text))
Output:
Spanish: El año que es abundante de poesía, suele serlo de hambre.
Catalan: Cal no abandonar mai ni la tasca ni l'esperança.
If this isn't sufficient, you could extend this logic to search for language-exclusive digraphs, letter combinations, or words:
                    Spanish only             Catalan only
Words               como y su con él otro    com i seva amb ell altre
Initial digraphs    -                        d' l'
Digraphs            -                        ss tj qü l·l l.l
Terminal digraphs   -                        ig
Catalan letter combinations that only marginally appear in Spanish:
tx
tg (Spanish exceptions: postgrado, postgraduado, postguerra)
ny (Spanish exceptions: mostly words with the prefixes in-, en-, con- + y-)
ll (word-final) (Spanish exceptions, loanwords: detall, nomparell)
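The table above can be turned into a simple scoring function. This is only a sketch. The word sets and patterns are copied from the table, and the name guess_language is hypothetical. The function returns None when neither language has more markers, so you can fall back to the character check or to another method. It only recognizes the straight apostrophe ', not the typographic ’.

```python
import re

# Marker words from the table: Spanish-only and Catalan-only.
SPANISH_WORDS = {"como", "y", "su", "con", "él", "otro"}
CATALAN_WORDS = {"com", "i", "seva", "amb", "ell", "altre"}

# Catalan-only letter patterns from the table, searched in the lowercased text.
CATALAN_PATTERNS = [
    re.compile(r"\b[dl]'"),            # initial d' / l' (d'aigua, l'esperança)
    re.compile(r"ss|tj|qü|l·l|l\.l"),  # digraphs anywhere in a word
    re.compile(r"ig\b"),               # word-final "ig" (maig, puig)
]


def guess_language(text):
    """Return 'es', 'ca', or None (tie or no evidence) by counting markers."""
    lowered = text.lower()
    words = re.findall(r"\w+", lowered)  # \w includes accented letters such as ñ, é
    es_score = sum(word in SPANISH_WORDS for word in words)
    ca_score = sum(word in CATALAN_WORDS for word in words)
    ca_score += sum(len(p.findall(lowered)) for p in CATALAN_PATTERNS)
    if es_score > ca_score:
        return "es"
    if ca_score > es_score:
        return "ca"
    return None


print(guess_language("Vino con su hermano y otro amigo."))  # es
print(guess_language("Vaig anar amb ell a l'escola."))      # ca
print(guess_language("El año que es abundante de poesía, suele serlo de hambre."))  # None
```

The last sentence has none of the markers in the table. This is why combining the approaches (characters first, then markers) is likely to work better than either approach alone.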
Approach 2: googletrans library
You could also use the googletrans library to detect the language:
from googletrans import Translator

translator = Translator()
for text in sample_texts:
    lang = translator.detect(text).lang
    print(lang, ":", text)
Output:
es : El año que es abundante de poesía, suele serlo de hambre.
ca : Cal no abandonar mai ni la tasca ni l'esperança.
Another option is to count how many words of the text appear in lists of frequent words for each language (Catalan, Spanish and English here) and pick the language with the most hits:
DicCat = ['amb','cap','dalt','damunt','des','dintre','durant','excepte','fins','per','pro','sense','sota','llei','hi','ha','més','mes','moment','órgans', 'segóns','article','i','per','els','amb','és','com','dels','més','seu','seva','fou','també','però','als','després','aquest','fins','any','són','hi','pel','aquesta','durant','on','part','altres','anys','ciutat','cap','des','seus','tot','estat','qual','segle','quan','ja','havia','molt','rei','nom','fer','així','li','sant','encara','pels','seves','té','partit','està','mateix','pot','nord','temps','fill','només','dues','sota','lloc','això','alguns','govern','uns','aquests','mort','nou','tots','fet','sense','frança','grup','tant','terme','fa','tenir','segons','món','regne','exèrcit','segona','abans','mentre','quals','aquestes','família','catalunya','eren','poden','diferents','nova','molts','església','major','club','estats','seua','diversos','grans','què','arribar','troba','població','poble','foren','època','haver','eleccions','diverses','tipus','riu','dia','quatre','poc','regió','exemple','batalla','altre','espanya','joan','actualment','tenen','dins','llavors','centre','algunes','important','altra','terra','antic','tenia','obres','estava','pare','qui','ara','havien','començar','història','morir','majoria']
DicEsp = ['los','y','bajo','con', 'entre','hacia','hasta','para','por','según','segun','sin','tras','más','mas','ley','capítulo','capitulo','título','titulo','momento','y','las','por','con','su','para','lo','como','más','pero','sus','le','me','sin','este','ya','cuando','todo','esta','son','también','fue','había','muy','años','hasta','desde','está','mi','porque','qué','sólo','yo','hay','vez','puede','todos','así','nos','ni','parte','tiene','él','uno','donde','bien','tiempo','mismo','ese','ahora','otro','después','te','otros','aunque','esa','eso','hace','otra','gobierno','tan','durante','siempre','día','tanto','ella','sí','dijo','sido','según','menos','año','antes','estado','sino','caso','nada','hacer','estaba','poco','estos','presidente','mayor','ante','unos','algo','hacia','casa','ellos','ayer','hecho','mucho','mientras','además','quien','momento','millones','esto','españa','hombre','están','pues','hoy','lugar','madrid','trabajo','otras','mejor','nuevo','decir','algunos','entonces','todas','días','debe','política','cómo','casi','toda','tal','luego','pasado','medio','estas','sea','tenía','nunca','aquí','ver','veces','embargo','partido','personas','grupo','cuenta','pueden','tienen','misma','nueva','cual','fueron','mujer','frente','josé','tras','cosas','fin','ciudad','he','social','tener','será','historia','muchos','juan','tipo','cuatro','dentro','nuestro','punto','dice','ello','cualquier','noche','aún','agua','parece','haber','situación','fuera','bajo','grandes','nuestra','ejemplo','acuerdo','habían','usted','estados','hizo','nadie','países','horas','posible','tarde','ley','importante','desarrollo','proceso','realidad','sentido','lado','mí','tu','cambio','allí','mano','eran','estar','san','número','sociedad','unas','centro','padre','gente','relación','cuerpo','incluso','través','último','madre','mis','modo','problema','cinco','carlos','hombres','información','ojos','muerte','nombre','algunas','público','mujeres','siglo','todavía','meses','mañana','esos','nosotros','hora','muchas','pueblo','alguna','dar','don','da','tú','derecho','verdad','maría','unidos','podría','sería','junto','cabeza','aquel','luis','cuanto','tierra','equipo','segundo','director','dicho','cierto','casos','manos','nivel','podía','familia','largo','falta','llegar','propio','ministro','cosa','primero','seguridad','hemos','mal','trata','algún','tuvo','respecto','semana','varios','real','sé','voz','paso','señor','mil','quienes','proyecto','mercado','mayoría','luz','claro','iba','éste','pesetas','orden','español','buena','quiere','aquella','programa','palabras','internacional','esas','segunda','empresa','puesto','ahí','propia','libro','igual','político','persona','últimos','ellas','total','creo','tengo','dios','española','condiciones','méxico','fuerza','solo','único','acción','amor','policía','puerta','pesar','sabe','calle','interior','tampoco','ningún','vista','campo','buen','hubiera','saber','obras','razón','niños','presencia','tema','dinero','comisión','antonio','servicio','hijo','última','ciento','estoy','hablar','dio','minutos','producción','camino','seis','quién','fondo','dirección','papel','demás','idea','especial','diferentes','dado','base','capital','ambos','europa','libertad','relaciones','espacio','medios','ir','actual','población','empresas','estudio','salud','servicios','haya','principio','siendo','cultura','anterior','alto','media','mediante','primeros','arte','paz','sector','imagen','medida','deben','datos','consejo','personal','interés','julio','grupos','miembros','ninguna','existe','cara','edad','movimiento','visto','llegó','puntos','actividad','bueno','uso','niño','difícil','joven','futuro','aquellos','mes','pronto','soy','hacía','nuevos','nuestros','estaban','posibilidad','sigue','cerca','resultados','educación','atención','gonzález','capacidad','efecto','necesario','valor','aire','investigación','siguiente','figura','central','comunidad','necesidad','serie','organización','nuevas','calidad']
DicEng = ['all','my','have','do','and', 'or', 'what', 'can', 'you', 'the', 'on', 'it', 'at', 'since', 'for', 'ago', 'before', 'past', 'by', 'next', 'from','with', 'which','law','is','the','of','and','to','in','is','you','that','it','he','was','for','on','are','as','with','his','they','at','be','this','have','from','or','one','had','by','word','but','not','what','all','were','we','when','your','can','said','there','use','an','each','which','she','do','how','their','if','will','up','other','about','out','many','then','them','these','so','some','her','would','make','like','him','into','time','has','look','two','more','write','go','see','number','no','way','could','people','my','than','first','water','been','call','who','oil','its','now','find','long','down','day','did','get','come','made','may','part']
def WhichLanguage(text):
    # Split on spaces and collect the words found in each language's word list
    Input = text.lower().split(" ")
    CatScore = []
    EspScore = []
    EngScore = []
    for e in Input:
        if e in DicCat:
            CatScore.append(e)
        if e in DicEsp:
            EspScore.append(e)
        if e in DicEng:
            EngScore.append(e)
    # English needs a strict majority; a Catalan/Spanish tie defaults to Spanish
    if (len(EngScore) > len(EspScore)) and (len(EngScore) > len(CatScore)):
        Language = 'English'
    else:
        if len(CatScore) > len(EspScore):
            Language = 'Catala'
        else:
            Language = 'Espanyol'
    print(text)
    print("ESP= ", len(EspScore), EspScore)
    print("Cat = ", len(CatScore), CatScore)
    print("ING= ", len(EngScore), EngScore)
    print('Language is =', Language)
    print("-----")
    return Language

print(WhichLanguage("Hola bon dia"))
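The word-list approach above splits only on spaces. This means a word with punctuation attached, such as "dia," or "dia.", is never found, and each membership test scans the whole list. Below is a sketch of a variant that converts the lists to sets and tokenizes with a regex. The small sets are illustrative stand-ins for DicCat, DicEsp and DicEng, and which_language is a hypothetical name. Unlike the original, a tie returns None instead of defaulting to Spanish.

```python
import re
from collections import Counter


def which_language(text, lexicons):
    """Return the lexicon name with the most word hits, or None on a tie or no hits."""
    words = re.findall(r"\w+", text.lower())  # drops punctuation such as "dia,"
    scores = Counter({name: sum(w in vocab for w in words)
                      for name, vocab in lexicons.items()})
    ranked = scores.most_common(2)
    if not ranked or ranked[0][1] == 0:
        return None
    if len(ranked) > 1 and ranked[0][1] == ranked[1][1]:
        return None
    return ranked[0][0]


# Illustrative stand-ins; in practice use set(DicCat), set(DicEsp), set(DicEng).
LEXICONS = {
    "Catala": {"amb", "per", "dia", "bon", "més", "és"},
    "Espanyol": {"con", "para", "día", "buen", "más", "es"},
    "English": {"the", "and", "with", "day", "good"},
}

print(which_language("Hola bon dia, com estàs?", LEXICONS))  # Catala
```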
Related
Count occurrences of words in a text with special characters
I want to count the occurrences of each word in a text to find the key words that come up most often. This script works quite well, but the problem is that the text is written in French, so some important key words get missed. For example, the word Europe may appear in the text as l'Europe or en Europe. In the first case, the code removes the apostrophe, so l'Europe is counted as the single word leurope in the final result. How can I change the code to split l' from Europe?
import string

# Open the file in read mode
#text = open("debat.txt", "r")
text = ["Monsieur Mitterrand, vous avez parlé une minute et demie de moins que Monsieur Chirac dans cette première partie. Je préfère ne pas avoir parlé une minute et demie de plus pour dire des choses aussi irréelles et aussi injustes que celles qui viennent d'être énoncées. Si vous êtes d'accord, nous arrêtons cette première partie, nous arrêtons les chronomètres et nous repartons maintenant pour une seconde partie en parlant de l'Europe. Pour les téléspectateurs, M. Mitterrand a parlé 18 minutes 36 et M. Chirac, 19 minutes 56. Ce n'est pas un drame !... On vous a, messieurs, probablement jamais vus plus proches à la fois physiquement et peut-être politiquement que sur les affaires européennes... les Français vous ont vus, en effet, à la télévision, participer ensemble à des négociations, au coude à coude... voilà, au moins, un domaine dans lequel, sans aucun doute, vous connaissez fort bien, l'un et l'autre, les opinions de l'un et de l'autre. Nous avons envie de vous demander ce qui, aujourd'hui, au -plan européen, vous sépare et vous rapproche ?... et aussi lequel de vous deux a le plus évolué au cours des quelques années qui viennent de s'écouler ?..."]

# Create an empty dictionary
d = dict()

# Loop through each line of the file
for line in text:
    # Remove the leading spaces and newline character
    line = line.strip()
    # Convert the characters in line to
    # lowercase to avoid case mismatch
    line = line.lower()
    # Remove the punctuation marks from the line
    line = line.translate(line.maketrans("", "", string.punctuation))
    # Split the line into words
    words = line.split(" ")
    # Iterate over each word in line
    for word in words:
        # Check if the word is already in dictionary
        if word in d:
            # Increment count of word by 1
            d[word] = d[word] + 1
        else:
            # Add the word to dictionary with count 1
            d[word] = 1

sorted_tuples = sorted(d.items(), key=lambda item: item[1], reverse=True)
sorted_dict = {k: v for k, v in sorted_tuples}

# Print the contents of dictionary
for key in list(sorted_dict.keys()):
    print(key, ":", sorted_dict[key])
line = line.translate(line.maketrans("", "", string.punctuation)) removes all punctuation characters (l'Europe becomes leurope, since the line has already been lowercased). Instead, you may want to replace them with spaces, for example:
for p in string.punctuation:
    line = line.replace(p, ' ')
Where you currently have:
line = line.translate(line.maketrans("", "", string.punctuation))
... you can add the following line before it:
line = line.translate(line.maketrans("'", " "))
This replaces the ' character with a space wherever it's found. The line using string.punctuation then behaves exactly as before, except that it no longer encounters any ' characters, since they have already been replaced.
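Both answers can be combined into a single translation table that maps every punctuation character to a space. This sketch also handles the typographic apostrophe ’ (U+2019), which is common in French text but is not part of string.punctuation. The name count_words is hypothetical. Calling split() with no argument, instead of split(" "), avoids the empty strings that consecutive spaces would otherwise produce. collections.Counter replaces the manual dictionary.

```python
import string
from collections import Counter

# Map each ASCII punctuation character, plus the typographic apostrophe (U+2019),
# to a space so that "l'Europe" becomes "l europe" instead of "leurope".
PUNCT_TO_SPACE = str.maketrans({ch: " " for ch in string.punctuation + "\u2019"})


def count_words(lines):
    """Count lowercase words across an iterable of text lines."""
    counts = Counter()
    for line in lines:
        # split() without an argument also discards empty strings from repeated spaces
        counts.update(line.lower().translate(PUNCT_TO_SPACE).split())
    return counts


counts = count_words(["On parle de l'Europe. L\u2019Europe est en Europe !"])
for word, n in counts.most_common(3):
    print(word, ":", n)
```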
How can I modify an "if condition" in order to apply it to different lists at the same time?
I wrote a script that extracts, from a huge set of sentences, the ones containing a particular pattern. The problem is that for some patterns I check the value of the attribute at the beginning or end of the pattern to see whether the word is present in a particular list. I have 4 dictionaries, each with 2 lists: positive and negative words. So far the script works, and I can use my function with one dictionary. I'm wondering how to improve the function so that it uses all 4 dictionaries at the same time, without duplicating the block that loops over the dictionary. Here is an example with two dictionaries (the script is quite long, so I made a small example with all the necessary elements):
import os
import pandas as pd
import spacy.attrs
from spacy.attrs import POS
import spacy
from spacy import displacy
from spacy.lang.fr import French
from spacy.tokenizer import Tokenizer
from spacy.util import compile_prefix_regex, compile_infix_regex, compile_suffix_regex
from spacy.lemmatizer import Lemmatizer
nlp = spacy.load("fr_core_news_md")
from spacy.matcher import Matcher

#LIST
##################### List of lexicon
# Lexique Diko
lexicon = open(os.path.join('/h/Ressources/Diko.txt'), 'r', encoding='utf-8')
data = pd.read_csv(lexicon, sep=";", header=None)
data.columns = ["id", "terme", "pol"]
pol_diko_pos = data.loc[data.pol == 'positive', 'terme']
liste_pos_D = list(pol_diko_pos)
print(liste_pos_D[1])
pol_diko_neg = data.loc[data.pol == 'negative', 'terme']
liste_neg_D = list(pol_diko_neg)
#print(type(liste_neg))

# Lexique Polarimots
lexicon_p = open(os.path.join('/h/Ressources/polarimots.txt'), 'r', encoding='utf-8')
data_p = pd.read_csv(lexicon_p, sep="\t", header=None)
#data.columns = ["terme", "pol", "pos", "degre"]
data_p.columns = ["ind", "terme", "cat", "pol", "fiabilité"]
pol_polarimot_pos = data_p.loc[data_p.pol == 'POS', 'terme']
liste_pos_P = list(pol_polarimot_pos)
print(liste_pos_P[1])
pol_polarimot_neg = data_p.loc[data_p.pol == 'NEG', 'terme']
liste_neg_P = list(pol_polarimot_neg)
#print(type(liste_neg))

# ############################# Lists
sentence_not_extract_lexique_1 = []  # list of all sentences without the specified pattern
sentence_extract_lexique_1 = []      # list of sentences where pattern[0] is present in the first lexicon
sentence_not_extract_lexique_2 = []  # list of all sentences without the specified pattern
sentence_extract_lexique_2 = []      # list of sentences where pattern[0] is present in the second lexicon
list_token_pos = []                  # list of the tokens found in the lexicon
list_token_neg = []                  # list of the tokens found in the lexicon
list_token_not_found = []            # list of the tokens not found in the lexicon

#PATTERN
pattern1 = [{"POS": {"IN": ["VERB", "AUX", "ADV", "NOUN", "ADJ"]}}, {"IS_PUNCT": True, "OP": "*"}, {"LOWER": "mais"}]
pattern1_tup = (pattern1, 1, True)
pattern3 = [{"LOWER": {"IN": ["très", "trop"]}}, {"POS": {"IN": ["ADV", "ADJ"]}}]
pattern3_tup = (pattern3, 0, True)
pattern4 = [{"POS": "ADV"},  # adverbe de négation
            {"POS": "PRON", "OP": "*"},
            {"POS": {"IN": ["VERB", "AUX"]}},
            {"TEXT": {"IN": ["pas", "plus", "aucun", "aucunement", "point", "jamais", "nullement", "rien"]}},]
pattern4_tup = (pattern4, None, False)

#Tuple of pattern
pattern_list_tup = [pattern1_tup, pattern3_tup, pattern4_tup]
pattern_name = ['first', 'second', 'third', 'fourth']
length_of_list = len(pattern_list_tup)
print('length', length_of_list)

#index of the value of attribute to check in the lexicon
value_of_attribute = [0, -1, -1]

# List of lexicon to use ([negative, positive])
lexique_1 = [liste_neg_D, liste_pos_D]
lexique_2 = [liste_neg_P, liste_pos_P]

# text (example of some sentences)
file = ["Le film est superbe mais cette édition DVD est nulle !",
        "J'allais dire déplorable, mais je serais peut-être un peu trop extrême.",
        "Hélas, l'impression de violence, bien que très bien rendue, ne sauve pas cette histoire gothique moderne de la sécheresse scénaristique, le tout couvert d'un adultère dont le propos semble être gratuit, classique mais intéressant...",
        "Tout ça ne me donne pas envie d'utiliser un pieu mais plutôt d'aller au pieu (suis-je drôle).",
        "Oui biensur, il y a la superbe introduction des parapluies au debut, et puis lorsqu il sent des culs tout neufs et qu il s extase, j ai envie de faire la meme chose apres sur celui de ma voisine de palier (ma voisine de palier elle a un gros cul, mais j admets que je voudrais bien lui foute mon tarin), mais c est tout, apres c est un film tres noir, lent et qui te plonge dans le depression.",
        "Et bien hélas ce DVD ne m'a pas appris grand chose par rapport à la doc des agences de voyages et la petite dame qui fait ses dessins est bien gentille mais tout tourne un peu trop autour d'elle.",
        "Au final on passe de l'un a l'autre sans subtilité, et on n'arrive qu'à une caricature de plus : si Kechiche avait comme but initial déclaré de fustiger les préjugés, c'est le contraire qui ressort de ce ''film'' truffé de clichés très préjudiciables pour les quelques habitants de banlieue qui ne se reconnaîtront pas dans cette lourde farce.",
        "-ci écorche les mots, les notes... mais surtout nos oreilles !"]

# Loop to check each sentence and extract the sentences with the specified pattern from above
for pat in range(0, length_of_list):
    matcher = Matcher(nlp.vocab)
    matcher.add("matching_2", None, pattern_list_tup[pat][0])
    # print(pat)
    # print(pattern_list_tup[pat][0])
    for sent in file:
        doc = nlp(sent)
        matches = matcher(doc)
        for match_id, start, end in matches:
            span = doc[start:end].lemma_.split()
            #print(f"{pattern_name[pat]} pattern found: {span}")
The following part (still inside the loop) is what I want to modify so that it works with the other dictionaries. The goal is to retrieve the sentences extracted with 4 different dictionaries, compare them, and then check which sentences are present in more than two lists.
            # Condition to use the lexicon and extract the sentence
            if pattern_list_tup[pat][2]:
                if span[value_of_attribute[pat]] in lexique_1[pattern_list_tup[pat][1]]:
                    if sent not in sentence_extract_lexique_1:
                        sentence_extract_lexique_1.append(sent)
                    if pattern_list_tup[pat][1] == 1:
                        list_token_pos.append(span[value_of_attribute[pat]])
                    if pattern_list_tup[pat][1] == 0:
                        list_token_neg.append(span[value_of_attribute[pat]])
                else:
                    # the text form is not present in the lexicon, need the lemma form
                    list_token_not_found.append(span[value_of_attribute[pat]])
                    sentence_not_extract_lexique_1.append(sent)
            else:
                if sent not in sentence_extract_lexique_1:
                    sentence_extract_lexique_1.append(sent)

print(len(sentence_extract_lexique_1))
print(sentence_extract_lexique_1)
One solution I found is to duplicate the code above and change the name of the list where the sentences are stored. However, even with just 2 dictionaries, duplicating makes the code longer. Is there a way to loop over the 2 dictionaries (actually 4 in the original) and append each result to the right list? For example, when I use lexique_1, all the extracted sentences should go to sentence_extract_lexique_1, and so on for the others.
In my opinion, try an if-elif-else chain. If that doesn't fit, use just an if-elif block, because the elif statement catches the specific condition of interest, which is what you want to compare and check against the sentences. Keep in mind that the if-elif-else chain is a good method, but it only works when you need exactly one test to pass: Python runs the first branch whose test passes and skips the rest. This is very efficient and lets you test for one specific condition.
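Another way to avoid duplicating the block is to keep the lexicons in a dictionary keyed by name and store the results in a second dictionary with the same keys. You then loop over the lexicons inside the match loop. The sketch below leaves spaCy out. It assumes the matcher step already yields (sentence, word, polarity_index) triples, where polarity_index is 0 for the negative list and 1 for the positive list, as in lexique_1[pattern_list_tup[pat][1]] in the question. The lexicon contents and the function names are hypothetical.

```python
from collections import Counter, defaultdict

# Hypothetical lexicons, each a [negative_words, positive_words] pair like lexique_1.
LEXICONS = {
    "lexique_1": [{"nul", "déplorable"}, {"superbe", "intéressant"}],
    "lexique_2": [{"nul"}, {"superbe", "drôle"}],
}


def extract_by_lexicon(matches, lexicons):
    """Group sentences under every lexicon whose word list contains the matched word."""
    extracted = defaultdict(list)  # lexicon name -> extracted sentences (no duplicates)
    for sentence, word, polarity_index in matches:
        for name, word_lists in lexicons.items():
            if word in word_lists[polarity_index] and sentence not in extracted[name]:
                extracted[name].append(sentence)
    return extracted


def sentences_in_at_least(extracted, k):
    """Sentences that appear in the results of at least k lexicons."""
    counts = Counter(s for sentences in extracted.values() for s in sentences)
    return [s for s, n in counts.items() if n >= k]


matches = [
    ("Le film est superbe mais cette édition DVD est nulle !", "superbe", 1),
    ("J'allais dire déplorable, mais je serais peut-être un peu trop extrême.", "déplorable", 0),
]
extracted = extract_by_lexicon(matches, LEXICONS)
print(dict(extracted))
print(sentences_in_at_least(extracted, 2))
```

Adding a third or fourth lexicon then only means adding an entry to LEXICONS; the loop itself does not change.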
How to remove newline from end and beginning of every list element [python]
I've got the following code:
from itemspecs import itemspecs

x = itemspecs.split('###')
res = []
for item in x:
    res.append(item.split('##'))
print(res)
This imports a string from another document and gives me the following:
[['\n1\nWie was de Nederlandse scheepvaarder die de Spaanse zilvervloot veroverde?\n', '\nA. Michiel de Ruyter\nB. Piet Heijn\nC. De zilvervloot is nooit door de Nederlanders onderschept\n', '\nB\n', '\nAntwoord B De Nederlandse vlootvoogd werd hierdoor bekend.\n'], ['\n2\nIn welk land ligt Upernavik?\n', '\nA. Antartica\nB. Canada\nC. Rusland\nD. Groenland\nE. Amerika\n', '\nD\n', '\nAntwoord D Het is een dorp in Groenland met 1224 inwoners.\n']]
Now I want to remove the \n from the beginning and end of every element in this list. How can I do this?
You can do that by stripping "\n" as follows:
a = "\nhello\n"
stripped_a = a.strip("\n")
So you need to iterate through the list and apply strip to each string, as shown below:
res_1 = []
for i in res:
    tmp = []
    for j in i:
        tmp.append(j.strip("\n"))
    res_1.append(tmp)
The code above only removes \n from the start and end of each string. If you want to remove all the newlines in a string, use .replace("\n", " ") instead, which replaces every newline with a space:
res_1 = []
for i in res:
    tmp = []
    for j in i:
        tmp.append(j.replace("\n", " "))
    res_1.append(tmp)
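The same loops can be written as nested list comprehensions, which keep the original nesting (one inner list per question). Here is a short sketch using a shortened version of the data from the question. The second variant also trims the ends after replacing the newlines, which goes slightly beyond the answer's version.

```python
res = [
    ['\n1\nWie was de Nederlandse scheepvaarder die de Spaanse zilvervloot veroverde?\n', '\nB\n'],
    ['\n2\nIn welk land ligt Upernavik?\n', '\nD\n'],
]

# Strip newlines only at the start and end of each string
stripped = [[s.strip("\n") for s in item] for item in res]

# Replace every newline, including inner ones, with a space, then trim the ends
single_line = [[s.replace("\n", " ").strip() for s in item] for item in res]

print(stripped)
print(single_line)
```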
strip() is the easiest method in this case, but if you want to do any advanced text processing in the future, it isn't a bad idea to learn regular expressions:
import re
from pprint import pprint

l = [['\n1\nWie was de Nederlandse scheepvaarder die de Spaanse zilvervloot veroverde?\n', '\nA. Michiel de Ruyter\nB. Piet Heijn\nC. De zilvervloot is nooit door de Nederlandersonderschept\n', '\nB\n', '\nAntwoord B De Nederlandse vlootvoogd werd hierdoor bekend.\n'], ['\n2\nIn welk land ligt Upernavik?\n', '\nA. Antartica\nB. Canada\nC. Rusland\nD.Groenland\nE. Amerika\n', '\nD\n', '\nAntwoord D Het is een dorp in Groenland met 1224inwoners.\n']]
l = [[re.sub(r'^([\s]*)|([\s]*)$', '', j)] for i in l for j in i]
pprint(l, width=120)
Output:
[['1\nWie was de Nederlandse scheepvaarder die de Spaanse zilvervloot veroverde?'],
 ['A. Michiel de Ruyter\nB. Piet Heijn\nC. De zilvervloot is nooit door de Nederlandersonderschept'],
 ['B'],
 ['Antwoord B De Nederlandse vlootvoogd werd hierdoor bekend.'],
 ['2\nIn welk land ligt Upernavik?'],
 ['A. Antartica\nB. Canada\nC. Rusland\nD.Groenland\nE. Amerika'],
 ['D'],
 ['Antwoord D Het is een dorp in Groenland met 1224inwoners.']]
Note that this comprehension flattens the result into one single-element list per string. To keep one inner list per question, use [[re.sub(r'^([\s]*)|([\s]*)$', '', j) for j in i] for i in l] instead.
Find text between a list of keywords and the next period with RegEx in Python
# coding=utf-8
import re

m = "Hola esto es un ejemplo Objeto: esta es una de, las palabras."
keywords = ['Objeto:', 'OBJETO', 'Objeto social:', 'Objetos']
obj = re.compile(r'\b(?:{})\b\s*(.*?),'.format('|'.join(map(re.escape, keywords))))
print(obj.findall(m))
I want to print the text between one of the keywords and the next period. The output I want in this case: "esta es una de, las palabras."
The trailing \b prevents the match because your keyword ends with ":", and there is no word boundary between ":" and the following space. Simplify your regex by removing it. Also, (.*?), stops at the first comma, so it only extracts the part before it. I suppose you meant "up to the next period": (.*?)\.
obj = re.compile(r'\b(?:{})\s*(.*?)\.'.format('|'.join(map(re.escape, keywords))))
Result:
['esta es una de, las palabras']
Without the trailing word boundary, though, a keyword can also match the beginning of a longer word. You could force a non-word character with \W afterwards, which acts like a word boundary:
obj = re.compile(r'\b(?:{})\W\s*(.*?)\.'.format('|'.join(map(re.escape, keywords))))
Use \b(?:{0})\s*(.*?)(?=\b(?:{0})|$) with a lookahead instead:
import re

m = "Hola esto es un ejemplo Objeto: esta es una de, las palabras."
keywords = ['Objeto:', 'OBJETO', 'Objeto social:', 'Objetos']
obj = re.compile(r'\b(?:{0})\s*(.*?)(?=\b(?:{0})|$)'.format('|'.join(map(re.escape, keywords))))
print(obj.findall(m))
This outputs:
['esta es una de, las palabras.']
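If the keywords can appear in other capitalizations, or one keyword becomes a prefix of another once case is ignored (for example OBJETO and Objeto social:), the order of the alternatives matters, because the regex engine tries them left to right. Below is a sketch that matches case-insensitively up to the next period. text_after_keyword is a hypothetical helper, and it sorts the keywords longest first so that the most specific one wins.

```python
import re


def text_after_keyword(text, keywords, flags=re.IGNORECASE):
    # Longest keywords first, so "Objeto social:" is tried before "OBJETO".
    alternation = "|".join(map(re.escape, sorted(keywords, key=len, reverse=True)))
    # No trailing \b: keywords such as "Objeto:" end in a non-word character.
    pattern = re.compile(r"\b(?:{})\s*(.*?)\.".format(alternation), flags)
    return pattern.findall(text)


keywords = ['Objeto:', 'OBJETO', 'Objeto social:', 'Objetos']
print(text_after_keyword("Hola esto es un ejemplo Objeto: esta es una de, las palabras.", keywords))
print(text_after_keyword("Su OBJETO social: la venta de coches. Fin.", keywords))
```

With the original order and re.IGNORECASE, the second call would match OBJETO first and capture "social: la venta de coches" instead.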
Where is the error in my code? Trying to compare case-insensitively using Python
I have code that reads files and compares the content with user input, ignoring case. I used a list comprehension to loop through the content and compare it with the user input. The problem is that the list comprehension returns an empty list, although the entered word exists.
Example:
textContent:
Les hiboux
Charles Baudelaire
Cycle 3 * POESIE
Sous les ifs noirs qui les abritent
Les hiboux se tiennent rangés
Ainsi que des dieux étrangers
Dardant leur œil rouge. Ils méditent.
Sans remuer ils se tiendront
Jusqu'à l'heure mélancolique
Où, poussant le soleil oblique,
Les ténèbres s'établiront.
Leur attitude au sage enseigne
Qu'il faut en ce monde qu'il craigne
Le tumulte et le mouvement ;
L'homme ivre d'une ombre qui passe
Porte toujours le châtiment
D'avoir voulu changer de place.
Les Fleurs du Mal 1857
Charles Pierre Baudelaire (1821 – 1867) est un poète français.
user input: charl
words that should match: Charles, charle, CHARLE
x = self.lineEditSearch.text()
print(x)
textString = self.ReadingFileContent(Item)
#self.varStr = [c for c in textString if c.islower() or c.isupper() or c.capitalize()]
self.varStr = [i for i in textString if i.lower() == x.lower()]
print(self.varStr)
If
user_input = "charl"
word_exist = ["Charles", "charle", "CHARLE", "Hello"]
then
output = [item for item in word_exist if user_input.lower() in item.lower()]
print(output)  # ['Charles', 'charle', 'CHARLE']
Is this what you are looking for?
Your problem is that you only put into self.varStr the members of textString that satisfy i.lower() == x.lower(), which means "exactly equal to x, ignoring case". You want to pick up the members that contain x. You can do that by changing i.lower() == x.lower() into x.lower() in i.lower().
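The question doesn't show what ReadingFileContent returns. If it returns the file content as a single string, iterating over it (for i in textString) yields individual characters. A single character can never equal a multi-character input such as "charl", so the list stays empty even with the containment test. This sketch assumes a string input: it splits the text into words first and uses str.casefold() for the case-insensitive comparison. find_matching_words is a hypothetical name. Words keep any attached punctuation, so strip that first if it matters.

```python
def find_matching_words(text, query):
    """Return the words of text that contain query, ignoring case."""
    needle = query.casefold()
    return [word for word in text.split() if needle in word.casefold()]


text_content = ("Les hiboux\nCharles Baudelaire\nLes Fleurs du Mal 1857\n"
                "Charles Pierre Baudelaire (1821 – 1867) est un poète français.")

# Iterating over a str yields characters, which is why the original comprehension is empty
print([c for c in text_content if c.lower() == "charl"])  # []
print(find_matching_words(text_content, "charl"))  # ['Charles', 'Charles']
```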