Ideas to improve language detection between Spanish and Catalan

Ideas to improve language detection between Spanish and Catalan - python

I'm working on a text mining script in python. I need to detect the language of a natural language field from the dataset.
The thing is, 98% of the rows are in Spanish and Catalan. I tried using some algorithms like the stopwords one or the langdetect library, but these languages share a lot of words so they fail a lot.
I'm looking for some ideas to improve this algorithm.
One thought is, make a dictionary with some words that are specific to Spanish and Catalan, so if one text has any of these words, it's tagged as that language.

Approach 1: Distinguishing characters
Spanish and Catalan (note: there will be exceptions for proper names and loanwords e.g. Barça):
esp_chars = "ñÑáÁýÝ"
cat_chars = "çÇàÀèÈòÒ·ŀĿ"
Example:
sample_texts = ["El año que es abundante de poesía, suele serlo de hambre.",
"Cal no abandonar mai ni la tasca ni l'esperança."]
for text in sample_texts:
if any(char in text for char in esp_chars):
print("Spanish: {}".format(text))
elif any(char in text for char in cat_chars):
print("Catalan: {}".format(text))
>>> Spanish: El año que es abundante de poesía, suele serlo de hambre.
Catalan: Cal no abandonar mai ni la tasca ni l'esperança.
If this isn't sufficient, you could expand this logic to search for language exclusive digraphs, letter combinations, or words:
Spanish only
Catalan only
Words
como y su con él otro
com i seva amb ell altre
Initial digraphs
d' l'
Digraphs
ss tj qü l·l l.l
Terminal digraphs
ig
Catalan letter combinations that only marginally appear in Spanish
tx
tg          (Es. exceptions postgrado, postgraduado, postguerra)
ny          (Es. exceptions mostly prefixed in-, en-, con- + y-)
ll (terminal) (Es. exceptions (loanwords): detall, nomparell)
Approach 2: googletrans library
You could also use the googletrans library to detect the language:
from googletrans import Translator
translator = Translator()
for text in sample_texts:
lang = translator.detect(text).lang
print(lang, ":", text)
>>> es : El año que es abundante de poesía, suele serlo de hambre.
ca : Cal no abandonar mai ni la tasca ni l'esperança.

DicCat = ['amb','cap','dalt','damunt','des','dintre','durant','excepte','fins','per','pro','sense','sota','llei','hi','ha','més','mes','moment','órgans', 'segóns','Article','i','per','els','amb','és','com','dels','més','seu','seva','fou','també','però','als','després','aquest','fins','any','són','hi','pel','aquesta','durant','on','part','altres','anys','ciutat','cap','des','seus','tot','estat','qual','segle','quan','ja','havia','molt','rei','nom','fer','així','li','sant','encara','pels','seves','té','partit','està','mateix','pot','nord','temps','fill','només','dues','sota','lloc','això','alguns','govern','uns','aquests','mort','nou','tots','fet','sense','frança','grup','tant','terme','fa','tenir','segons','món','regne','exèrcit','segona','abans','mentre','quals','aquestes','família','catalunya','eren','poden','diferents','nova','molts','església','major','club','estats','seua','diversos','grans','què','arribar','troba','població','poble','foren','època','haver','eleccions','diverses','tipus','riu','dia','quatre','poc','regió','exemple','batalla','altre','espanya','joan','actualment','tenen','dins','llavors','centre','algunes','important','altra','terra','antic','tenia','obres','estava','pare','qui','ara','havien','començar','història','morir','majoria','qui','ara','havien','començar','història','morir','majoria']
DicEsp = ['los','y','bajo','con', 'entre','hacia','hasta','para','por','según','segun','sin','tras','más','mas','ley','capítulo','capitulo','título','titulo','momento','y','las','por','con','su','para','lo','como','más','pero','sus','le','me','sin','este','ya','cuando','todo','esta','son','también','fue','había','muy','años','hasta','desde','está','mi','porque','qué','sólo','yo','hay','vez','puede','todos','así','nos','ni','parte','tiene','él','uno','donde','bien','tiempo','mismo','ese','ahora','otro','después','te','otros','aunque','esa','eso','hace','otra','gobierno','tan','durante','siempre','día','tanto','ella','sí','dijo','sido','según','menos','año','antes','estado','sino','caso','nada','hacer','estaba','poco','estos','presidente','mayor','ante','unos','algo','hacia','casa','ellos','ayer','hecho','mucho','mientras','además','quien','momento','millones','esto','españa','hombre','están','pues','hoy','lugar','madrid','trabajo','otras','mejor','nuevo','decir','algunos','entonces','todas','días','debe','política','cómo','casi','toda','tal','luego','pasado','medio','estas','sea','tenía','nunca','aquí','ver','veces','embargo','partido','personas','grupo','cuenta','pueden','tienen','misma','nueva','cual','fueron','mujer','frente','josé','tras','cosas','fin','ciudad','he','social','tener','será','historia','muchos','juan','tipo','cuatro','dentro','nuestro','punto','dice','ello','cualquier','noche','aún','agua','parece','haber','situación','fuera','bajo','grandes','nuestra','ejemplo','acuerdo','habían','usted','estados','hizo','nadie','países','horas','posible','tarde','ley','importante','desarrollo','proceso','realidad','sentido','lado','mí','tu','cambio','allí','mano','eran','estar','san','número','sociedad','unas','centro','padre','gente','relación','cuerpo','incluso','través','último','madre','mis','modo','problema','cinco','carlos','hombres','información','ojos','muerte','nombre','algunas','público','mujeres','siglo','todavía','meses','mañana','esos','nosotros','hora','muchas','pueblo','alguna','dar','don','da','tú','derecho','verdad','maría','unidos','podría','sería','junto','cabeza','aquel','luis','cuanto','tierra','equipo','segundo','director','dicho','cierto','casos','manos','nivel','podía','familia','largo','falta','llegar','propio','ministro','cosa','primero','seguridad','hemos','mal','trata','algún','tuvo','respecto','semana','varios','real','sé','voz','paso','señor','mil','quienes','proyecto','mercado','mayoría','luz','claro','iba','éste','pesetas','orden','español','buena','quiere','aquella','programa','palabras','internacional','esas','segunda','empresa','puesto','ahí','propia','libro','igual','político','persona','últimos','ellas','total','creo','tengo','dios','española','condiciones','méxico','fuerza','solo','único','acción','amor','policía','puerta','pesar','sabe','calle','interior','tampoco','ningún','vista','campo','buen','hubiera','saber','obras','razón','niños','presencia','tema','dinero','comisión','antonio','servicio','hijo','última','ciento','estoy','hablar','dio','minutos','producción','camino','seis','quién','fondo','dirección','papel','demás','idea','especial','diferentes','dado','base','capital','ambos','europa','libertad','relaciones','espacio','medios','ir','actual','población','empresas','estudio','salud','servicios','haya','principio','siendo','cultura','anterior','alto','media','mediante','primeros','arte','paz','sector','imagen','medida','deben','datos','consejo','personal','interés','julio','grupos','miembros','ninguna','existe','cara','edad','movimiento','visto','llegó','puntos','actividad','bueno','uso','niño','difícil','joven','futuro','aquellos','mes','pronto','soy','hacía','nuevos','nuestros','estaban','posibilidad','sigue','cerca','resultados','educación','atención','gonzález','capacidad','efecto','necesario','valor','aire','investigación','siguiente','figura','central','comunidad','necesidad','serie','organizació','nuevas','calidad']
DicEng = ['all','my','have','do','and', 'or', 'what', 'can', 'you', 'the', 'on', 'it', 'at', 'since', 'for', 'ago', 'before', 'past', 'by', 'next', 'from','with', 'wich','law','is','the','of','and','to','in','is','you','that','it','he','was','for','on','are','as','with','his','they','at','be','this','have','from','or','one','had','by','word','but','not','what','all','were','we','when','your','can','said','there','use','an','each','which','she','do','how','their','if','will','up','other','about','out','many','then','them','these','so','some','her','would','make','like','him','into','time','has','look','two','more','write','go','see','number','no','way','could','people','my','than','first','water','been','call','who','oil','its','now','find','long','down','day','did','get','come','made','may','part','may','part']
def WhichLanguage(text):
Input = text.lower().split(" ")
CatScore = []
EspScore = []
EngScore = []
for e in Input:
if e in DicCat:
CatScore.append(e)
if e in DicEsp:
EspScore.append(e)
if e in DicEng:
EngScore.append(e)
if(len(EngScore) > len(EspScore)) and (len(EngScore) > len(CatScore)):
Language ='English'
else:
if(len(CatScore) > len(EspScore)):
Language ='Catala'
else:
Language ='Espanyol'
print(text)
print("ESP= ",len(EspScore),EspScore)
print("Cat = ",len(CatScore), CatScore)
print("ING= ",len(EngScore),EngScore)
print( 'Language is =', Language)
print("-----")
return(Language)
print(WhichLanguage("Hola bon dia"))

Related

Count occurrences of words in a text with special characters

I want to count occurrences of each word in a text to spot the key words coming over the most.
This script works quite well but the problem is that this text is written in French. So there are important key words that would be missed.
For example the word Europe may appear in the text like l'Europe or en Europe.
In the first case, the code will remove the apostrophe and l'Europe is considered as one unique word leurope in the final result.
How can I improve the code to split l' from Europe?
import string
# Open the file in read mode
#text = open("debat.txt", "r")
text = ["Monsieur Mitterrand, vous avez parlé une minute et demie de moins que Monsieur Chirac dans cette première partie. Je préfère ne pas avoir parlé une minute et demie de plus pour dire des choses aussi irréelles et aussi injustes que celles qui viennent d'être énoncées. Si vous êtes d'accord, nous arrêtons cette première partie, nous arrêtons les chronomètres et nous repartons maintenant pour une seconde partie en parlant de l'Europe. Pour les téléspectateurs, M. Mitterrand a parlé 18 minutes 36 et M. Chirac, 19 minutes 56. Ce n'est pas un drame !... On vous a, messieurs, probablement jamais vus plus proches à la fois physiquement et peut-être politiquement que sur les affaires européennes... les Français vous ont vus, en effet, à la télévision, participer ensemble à des négociations, au coude à coude... voilà, au moins, un domaine dans lequel, sans aucun doute, vous connaissez fort bien, l'un et l'autre, les opinions de l'un et de l'autre. Nous avons envie de vous demander ce qui, aujourd'hui, au -plan européen, vous sépare et vous rapproche ?... et aussi lequel de vous deux a le plus évolué au cours des quelques années qui viennent de s'écouler ?... #"]
# Create an empty dictionary
d = dict()
# Loop through each line of the file
for line in text:
# Remove the leading spaces and newline character
line = line.strip()
# Convert the characters in line to
# lowercase to avoid case mismatch
line = line.lower()
# Remove the punctuation marks from the line
line = line.translate(line.maketrans("", "", string.punctuation))
# Split the line into words
words = line.split(" ")
# Iterate over each word in line
for word in words:
# Check if the word is already in dictionary
if word in d:
# Increment count of word by 1
d[word] = d[word] + 1
else:
# Add the word to dictionary with count 1
d[word] = 1
sorted_tuples = sorted(d.items(), key=lambda item: item[1], reverse=True)
sorted_dict = {k: v for k, v in sorted_tuples}
# Print the contents of dictionary
for key in list(sorted_dict.keys()):
print(key, ":", sorted_dict[key])

line = line.translate(line.maketrans("", "", string.punctuation))
removes all punctuation characters (l'Europe becomes lEurope). Instead of that, you may want to replace them by spaces, using for example:
for p in string.punctuation:
line = line.replace(p, ' ')

Where you currently have:
line = line.translate(line.maketrans("", "", string.punctuation))
... you can add the following line before it:
line = line.translate(line.maketrans("'", " "))
This will replace the ' character with a space wherever it's found, and the line using string.punctuation will behave exactly as before, except that it will not encounter any ' characters since we have already replaced them.

How can I modify an "if condition" in order to apply it to different list at the same time?

I wrote a script to extract sentences in huge set which contains particular pattern. The problem lied in the fact that , for some patterns I checked the value of the attribute at the beginning or ending of the pattern to see if the word is present in a particular list. I have 4 dictionaries with 2 lists of positive and negative word. So far I wrote the script and I am able to use the function I wrote with one dictionary. I am thinking how can I improve the my function so that I can use it at the same time of the 4 dictionaries without duplicating the bloc which loop in the dictionary.
I give an example with two dictionaries (since the script is quite long I make a small example with all the necessary element
import spacy.attrs
from spacy.attrs import POS
import spacy
from spacy import displacy
from spacy.lang.fr import French
from spacy.tokenizer import Tokenizer
from spacy.util import compile_prefix_regex, compile_infix_regex, compile_suffix_regex
from spacy.lemmatizer import Lemmatizer
nlp = spacy.load("fr_core_news_md")
from spacy.matcher import Matcher#LIST
##################### List of lexicon
# Lexique Diko
lexicon = open(os.path.join('/h/Ressources/Diko.txt'), 'r', encoding='utf-8')
data = pd.read_csv(lexicon, sep=";", header=None)
data.columns = ["id", "terme", "pol"]
pol_diko_pos = data.loc[data.pol =='positive', 'terme']
liste_pos_D = list(pol_diko_pos)
print(liste_pos[1])
pol_diko_neg = data.loc[data.pol =='negative', 'terme']
liste_neg_D = list(pol_diko_neg)
#print(type(liste_neg))
# Lexique Polarimots
lexicon_p = open(os.path.join('/h/Ressources/polarimots.txt'), 'r', encoding='utf-8')
data_p = pd.read_csv(lexicon_p, sep="\t", header=None)
#data.columns = ["terme", "pol", "pos", "degre"]
data_p.columns = ["ind", "terme", "cat", "pol", "fiabilité"]
pol_polarimot_pos = data_p.loc[data_p.pol =='POS', 'terme']
liste_pos_P = list(pol_polarimot_pos)
print(liste_pos_P[1])
pol_polarimot_neg = data_p.loc[data_p.pol =='NEG', 'terme']
liste_neg_P = list(pol_polarimot_neg)
#print(type(liste_neg))
# ############################# Lists
sentence_not_extract_lexique_1 =[] #List of all sentences without the specified pattern
sentence_extract_lexique_1 = [] #list of sentences which the pattern[0] is present in the first lexicon
sentence_not_extract_lexique_2 =[] #List of all sentences without the specified pattern
sentence_extract_lexique_2 = [] #list of sentences which the pattern[0] is present in the second lexicon
list_token_pos = [] #list of the token found in the lexique
list_token_neg = [] #list of the token found in the lexique
list_token_not_found = [] #list of the token not found in the lexique
#PATTERN
pattern1 = [{"POS": {"IN": ["VERB", "AUX","ADV","NOUN","ADJ"]}}, {"IS_PUNCT": True, "OP": "*"}, {"LOWER": "mais"} ]
pattern1_tup = (pattern1, 1, True)
pattern3 = [{"LOWER": {"IN": ["très","trop"]}},
{"POS": {"IN": ["ADV","ADJ"]}}]
pattern3_tup = (pattern3, 0, True)
pattern4 = [{"POS": "ADV"}, # adverbe de négation
{"POS": "PRON","OP": "*"},
{"POS": {"IN": ["VERB", "AUX"]}},
{"TEXT": {"IN": ["pas", "plus", "aucun", "aucunement", "point", "jamais", "nullement", "rien"]}},]
pattern4_tup = (pattern4, None, False)
#Tuple of pattern
pattern_list_tup =[pattern1_tup, pattern3_tup, pattern4_tup]
pattern_name = ['first', 'second', 'third', 'fourth']
length_of_list = len(pattern_list_tup)
print('length', length_of_list)
#index of the value of attribute to check in the lexicon
value_of_attribute = [0,-1,-1]
# List of lexicon to use
lexique_1 = [lexique_neg, lexique_pos]
lexique_2 = [lexique_2neg, lexique_2pos]
# text (example of some sentences)
file =b= ["Le film est superbe mais cette édition DVD est nulle !",
"J'allais dire déplorable, mais je serais peut-être un peu trop extrême.",
"Hélas, l'impression de violence, bien que très bien rendue, ne sauve pas cette histoire gothique moderne de la sécheresse scénaristique, le tout couvert d'un adultère dont le propos semble être gratuit, classique mais intéressant...",
"Tout ça ne me donne pas envie d'utiliser un pieu mais plutôt d'aller au pieu (suis-je drôle).",
"Oui biensur, il y a la superbe introduction des parapluies au debut, et puis lorsqu il sent des culs tout neufs et qu il s extase, j ai envie de faire la meme chose apres sur celui de ma voisine de palier (ma voisine de palier elle a un gros cul, mais j admets que je voudrais bien lui foute mon tarin), mais c est tout, apres c est un film tres noir, lent et qui te plonge dans le depression.",
"Et bien hélas ce DVD ne m'a pas appris grand chose par rapport à la doc des agences de voyages et la petite dame qui fait ses dessins est bien gentille mais tout tourne un peu trop autour d'elle.",
"Au final on passe de l'un a l'autre sans subtilité, et on n'arrive qu'à une caricature de plus : si Kechiche avait comme but initial déclaré de fustiger les préjugés, c'est le contraire qui ressort de ce ''film'' truffé de clichés très préjudiciables pour les quelques habitants de banlieue qui ne se reconnaîtront pas dans cette lourde farce.",
"-ci écorche les mots, les notes... mais surtout nos oreilles !"]
# Loop to check each sentence and extract the sentences with the specified pattern from above
for pat in range(0, length_of_list):
matcher = Matcher(nlp.vocab)
matcher.add("matching_2", None, pattern_list_tup[pat][0])
# print(pat)
# print(pattern_list_tup[pat][0])
for sent in file:
doc =nlp(sent)
matches= matcher(doc)
for match_id, start, end in matches:
span = doc[start:end].lemma_.split()
#print(f"{pattern_name[pat]} pattern found: {span}")
This is the part I want ot modify to use it for another dictionary, the goal is to able to retrieve sentences extract by 4 different dictionaries to make a comparison and then check which sentences are present in more than two list.
# Condition to use the lexicon and extract the sentence
if (pattern_list_tup[pat][2]):
if (span[value_of_attribute[pat]] in lexique_1[pattern_list_tup[pat][1]]):
if sent not in sentence_extract:
sentence_extract_lexique_1.append(sent)
if (pattern_list_tup[pat][1] == 1):
list_token_pos.append(span[value_of_attribute[pat]])
if (pattern_list_tup[pat][1] == 0):
list_token_neg.append(span[value_of_attribute[pat]])
else:
list_token_not_found.append(span[value_of_attribute[pat]]) # the text form is not present in the lexicon need the lemma form
sentence_not_extract_lexique_1.append(sent)
else:
if sent not in sentence_extract:
sentence_extract_lexique_1.append(sent)
print(len(sentence_extract))
print(sentence_extract)
One solution I find is to duplicate the code abode and change the name of the list where the sentences are stored but since I have 2 dictionaries duplicating will make the code longer is there a way to combine the looping the 2 dictionaries (actually 4 dictionaries in the original) and append the result to the good list. So, for example, when I use lexique_1 , all the sentences extracted are send to "sentence_extract_lexique_1" and so on for the other.

In my opinion attempt using the if-elif-else chain. If not attempt only using the if-elif block simply because the elif statement catches the specific condition of interest. In which you're trying to catch a specific to compare and check with the sentences. Keep in mind if you try the if-elif-else chain its a good method, but it only works when you need one test to pass. Because Python finds one test to pass and it skips the rest. Its very efficient and allows you to test for one specific condition.

How to remove newline from end and beginning of every list element [python]

I've got the following code:
from itemspecs import itemspecs
x = itemspecs.split('###')
res = []
for item in x:
res.append(item.split('##'))
print(res)
This imports a string from another document. And gives me the following:
[['\n1\nWie was de Nederlandse scheepvaarder die de Spaanse zilvervloot veroverde?\n',
'\nA. Michiel de Ruyter\nB. Piet Heijn\nC. De zilvervloot is nooit door de Nederlanders
onderschept\n', '\nB\n', '\nAntwoord B De Nederlandse vlootvoogd werd hierdoor bekend.\n'],
['\n2\nIn welk land ligt Upernavik?\n', '\nA. Antartica\nB. Canada\nC. Rusland\nD.
Groenland\nE. Amerika\n', '\nD\n', '\nAntwoord D Het is een dorp in Groenland met 1224
inwoners.\n']]
But now I want to remove all the \n from every end and beginning of every element in this list. How can I do this?

You can do that by stripping "\n" as follow
a = "\nhello\n"
stripped_a = a.strip("\n")
so, what you need to do is iterate through the list and then apply the strip on the string as shown below
res_1=[]
for i in res:
tmp=[]
for j in i:
tmp.append(j.strip("\n"))
res_1.append(tmp)
The above answer just removes \n from start and end. if you want to remove all the new lines in a string, just use .replace('\n"," ") as shown below
res_1=[]
for i in res:
tmp=[]
for j in i:
tmp.append(j.replace("\n"))
res_1.append(tmp)

strip() is easiest method in this case, but if you want to do any advanced text processing in the future, it isn't bad idea to learn regular expressions:
import re
from pprint import pprint
l = [['\n1\nWie was de Nederlandse scheepvaarder die de Spaanse zilvervloot veroverde?\n',
'\nA. Michiel de Ruyter\nB. Piet Heijn\nC. De zilvervloot is nooit door de Nederlandersonderschept\n',
'\nB\n',
'\nAntwoord B De Nederlandse vlootvoogd werd hierdoor bekend.\n'],
['\n2\nIn welk land ligt Upernavik?\n',
'\nA. Antartica\nB. Canada\nC. Rusland\nD.Groenland\nE. Amerika\n',
'\nD\n', '\nAntwoord D Het is een dorp in Groenland met 1224inwoners.\n']]
l = [[re.sub(r'^([\s]*)|([\s]*)$', '', j)] for i in l for j in i]
pprint(l, width=120)
Output:
[['1\nWie was de Nederlandse scheepvaarder die de Spaanse zilvervloot veroverde?'],
['A. Michiel de Ruyter\nB. Piet Heijn\nC. De zilvervloot is nooit door de Nederlandersonderschept'],
['B'],
['Antwoord B De Nederlandse vlootvoogd werd hierdoor bekend.'],
['2\nIn welk land ligt Upernavik?'],
['A. Antartica\nB. Canada\nC. Rusland\nD.Groenland\nE. Amerika'],
['D'],
['Antwoord D Het is een dorp in Groenland met 1224inwoners.']]

Find text between list of keywords and point with RegEx in Python

# coding=utf-8
import re
m = "Hola esto es un ejemplo Objeto: esta es una de, las palabras."
keywords = ['Objeto:', 'OBJETO', 'Objeto social:', 'Objetos']
obj = re.compile(r'\b(?:{})\b\s*(.*?),'.format('|'.join(map(re.escape, keywords))))
print obj.findall(m)
I want to print text between one of words of keywords and the next point. Output that I want in these case: "esta es una de, las palabras."

the trailing \b prevents the match because your keyword ends with :
simplify your regex by removing it. Plus the greedy / comma (.*?), is only extracting the first part before comma, I suppose you meant "to the next point": (.*?)\.
obj = re.compile(r'\b(?:{})\s*(.*?)\.'.format('|'.join(map(re.escape, keywords))))
result:
['esta es una de, las palabras']
Removing the word boundary can match part of keywords in sentences though. You could force a non-word char with \W afterwards and it would work (acting like word boundary):
obj = re.compile(r'\b(?:{})\W\s*(.*?)\.'.format('|'.join(map(re.escape, keywords))))

Use \b(?:{0})\s*(.*?)(?=\b(?:{0})|$) with lookahead instead:
import re
m = "Hola esto es un ejemplo Objeto: esta es una de, las palabras."
keywords = ['Objeto:', 'OBJETO', 'Objeto social:', 'Objetos']
obj = re.compile(r'\b(?:{0})\s*(.*?)(?=\b(?:{0})|$)'.format('|'.join(map(re.escape, keywords))))
print(obj.findall(m))
This outputs:
['esta es una de, las palabras.']

where is the error in my code trying to compare case -insensitive using python

I have a code that read files and compare the content with a user-input with ignoring case-sensitive.
i used the list-comprehension in order to loop through the content and compare with user-input.
The problem is that the list comprehension return an empty list, although the entered word exist. Example:
textContent
Les hiboux
Charles Baudelaire
Cycle 3
*
POESIE
Sous les ifs noirs qui les abritent
Les hiboux se tiennent rangés
Ainsi que des dieux étrangers
Dardant leur œil rouge. Ils méditent.
Sans remuer ils se tiendront
Jusqu'à l'heure mélancolique
Où, poussant le soleil oblique,
Les ténèbres s'établiront.
Leur attitude au sage enseigne
Qu'il faut en ce monde qu'il craigne
Le tumulte et le mouvement ;
L'homme ivre d'une ombre qui passe
Porte toujours le châtiment
D'avoir voulu changer de place.
Les Fleurs du Mal
1857
Charles Pierre Baudelaire (1821 – 1867) est un poète français.
user-input: charl
word exist : Charles--charle--CHARLE
x=self.lineEditSearch.text()
print(x)
textString=self.ReadingFileContent(Item)
#self.varStr =[c for c in textString if c.islower() or c.isupper() or c.capitalize()]
self.varStr =[i for i in textString if i.lower() == x.lower()]
print(self.varStr)

If
user_input = "charl"
word_exist = ["Charles","charle","CHARLE","Hello"]
Then
output = [item for item in word_exist if user_input.lower() in item.lower()]
print(output)
# ['Charles', 'charle', 'CHARLE']
Is this what you are looking for?

Your problem is, you are only putting in self.varStr members of textString that satisfies i.lower() == x.lower(), which means "being completely the same (case insensitive) with x".
You want to pick up members that contains x.
You can do that by changing i.lower() == x.lower() into i.lower() in x.lower()

We Keep Coding

Python is a programming language that lets you work quickly and integrate systems more effectively.

Ideas to improve language detection between Spanish and Catalan - python

Related

Count occurrences of words in a text with special characters

How can I modify an "if condition" in order to apply it to different list at the same time?

How to remove newline from end and beginning of every list element [python]

Find text between list of keywords and point with RegEx in Python

where is the error in my code trying to compare case -insensitive using python

Categories

Resources