How do I make my search case insensitive?

I have a web app that searches through a few databases. Some of the saved data is in uppercase and some is a mix of upper and lowercase. When searching for a keyword, I want the search to ignore case and bring up every result that matches the word. For example, I want to search for "document_reference" without having to type it the way it is saved, which is "Document_Reference".
I was told to add case insensitivity to my index, however I'm not sure what to do or what to add there.
I tried this (found in the Whoosh documentation):
class CaseSensitivizer(analysis.Filter):
    def __call__(self, tokens):
        for t in tokens:
            yield t
            if t.mode == "index":
                low = t.text.lower()
                if low != t.text:
                    t.text = low
                    yield t
This is what my index and query parser look like:
def open_index(indexDirectory):
    # open the index and return an index object
    ix = index.open_dir(indexDirectory)
    return ix
def search_index(srch, ix):
    # Search the index and print results
    # ix = open_index(indexDirectory)
    results = ''
    lst = []
    qp = MultifieldParser(['Text', 'colname',
                           'tblname', 'Length', 'DataType', 'tag_name'],
                          schema=ix.schema, group=qparser.OrGroup)
    # qp = QueryParser('Text', schema=ix.schema)
    q = qp.parse(srch)
    with ix.searcher() as s:
        results = s.search(q, limit=None)
        for r in results:
            print('\n', r)
            lst.append(r.fields())
    if(DEBUG):
        print('Search Results:\n', lst)
        print('\nFinished in search.py')
    return lst
Currently it only ever gives results that exactly match what I typed in the search bar, so if I type "document" but the source is actually stored as "DOCUMENT", I don't get any results.

I know this is an older issue, but I thought I would reply in case somebody like me comes here looking for a solution.
The CaseSensitivizer class needs to be used when you define your schema. This is how you would use it to create the schema from the quickstart example in the docs:
>>> from whoosh.index import create_in
>>> from whoosh.fields import *
>>> from whoosh import analysis
>>> class CaseSensitivizer(analysis.Filter):
...     def __call__(self, tokens):
...         for t in tokens:
...             yield t
...             if t.mode == "index":
...                 low = t.text.lower()
...                 if low != t.text:
...                     t.text = low
...                     yield t
...
>>> myanalyzer = analysis.RegexTokenizer() | CaseSensitivizer()
>>> schema = Schema(title=TEXT(stored=True), path=ID(stored=True), content=TEXT(analyzer=myanalyzer))
Now you can use this schema to create your index and do what you were doing before to search. That worked for me.
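To see what this filter changes, here is a minimal pure-Python sketch that mimics its behaviour without Whoosh. The names index_tokens and matches are made up for illustration and are not part of the Whoosh API; the real filter works on Whoosh token objects, so treat this only as a model of the idea.

```python
def index_tokens(words):
    """Mimic the filter at index time: keep each word and, if it
    differs, also emit its lowercase form."""
    for word in words:
        yield word
        low = word.lower()
        if low != word:
            yield low


def matches(query, indexed):
    """At query time the term is used exactly as typed (no lowercasing)."""
    return query in indexed


indexed = set(index_tokens(["Document_Reference", "DOCUMENT", "notes"]))

print(matches("document_reference", indexed))  # lowercase query finds mixed-case text
print(matches("document", indexed))            # lowercase query finds upper-case text
print(matches("Document", indexed))            # mixed-case query needs an exact-case match
```

In this model a lowercase query matches text stored in any case, while a query containing capitals only matches text stored with exactly that capitalisation.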

Instead of using lower() or upper(), you can use casefold() for string comparison.
In short, an example is:
s1 = 'Apple'
s3 = 'aPPle'
s1.casefold() == s3.casefold()
returns True.
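As a small sketch of where casefold() differs from lower(), the helper below (same_text is a made-up name) compares two plain strings. Note that this only compares strings in Python; it does not change how a Whoosh index tokenizes text.

```python
def same_text(a, b):
    """Compare two strings while ignoring case."""
    return a.casefold() == b.casefold()


print(same_text("Apple", "aPPle"))            # True
print(same_text("Straße", "STRASSE"))         # True: casefold() maps "ß" to "ss"
print("Straße".lower() == "STRASSE".lower())  # False: lower() keeps "ß"
```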

Related

How can I properly use variables returned in other function / method?

Please bear with me, since I am just starting to learn coding and Python is my first language.
I'm struggling to understand how functions work.
I can't manage to call a function and use its result later, when I need it in another function.
Can someone please help me understand this in depth?
My code doesn't work, and I can't figure out how to grab the results from a function in order to use them for the end purpose.
This is something I tried in the project I am working on:
manly_coded_bag = []
female_coded_bag = []
feminine_coded_words = [
    "agree",
    "affectionate",
    "child",
    "cheer",
]
masculine_coded_words = [
    "active",
    "adventurous",
    "aggressive",
    "ambitios",
]
explanations = {
    "feminine-coded": (
        "This job ad uses more words that are subtly coded as feminine than words that are subtly coded as masculine"
    ),
    "masculine-coded": (
        "This job ad uses more words that are subtly coded as masculine than words that are subtly coded as feminine."
    )
}
def men_coded_words(masc_bag, text):
    add_text = text
    man_coded_bag = masc_bag
    for word in masculine_coded_words:
        if word in add_text:
            man_coded_bag.append(word)
    return man_coded_bag
def women_coded_words(fem_bag, text):
    add_text = text
    woman_coded_bag = fem_bag
    for word in feminine_coded_words:
        if word in add_text:
            woman_coded_bag.append(word)
    return woman_coded_bag
def analise_and_explain_results(text, count_man, count_fem):
    count_man_words = count_man
    count_man_words = len(man_coded_bag)
    count_woman_words = count_fem
    count_woman_words = len(woman_coded_bag)
    coding_score = count_woman_words - count_man_words
    strengths_of_coding = ""
    if coding_score == 0:
        if count_man_words:
            strengths_of_coding = "neutral"
        else:
            strengths_of_coding = "empty"
    elif coding_score > 0:
        strengths_of_coding = "feminine-coded"
    else:
        strengths_of_coding = "masculine-coded"
    return count_man_words, count_woman_words, strengths_of_coding
def get_results(text):
    user_input = text
    user_input = input("add text here:").lower()
    res = analise_and_explain_results(text, man_coded_bag,
                                      woman_coded_bag)
    # I am trying to use the returned variable strengths_of_coding
    # and it is not accessible.
    explain_results = explanations[strengths_of_coding]
    return res, explain_results
get_results("random text added here, really whatever for testing purposes")
Right, so when I am calling get_results('text'), I get this error and I know where it is coming from, "name 'strengths_of_coding' is not defined", but I just don't know how to access that variable...
I'm stuck here and a little bit frustrated because I understand it's a noob mistake, yet still I can't get the hang of it after a week of stress and frustration.
Any feedback is welcome.

So it's hard to explain everything if you barely have any knowledge of OOP or coding in general. But in Python, the return value of a function can be anything: None, an integer, a list, a tuple, a dictionary, an object. It can even be a class definition. Python does not declare return types, so you only know exactly what you get back by looking at what the function returns. This goes hand in hand with duck typing: "If it walks like a duck and it quacks like a duck, then it must be a duck."
In this case, your analise_and_explain_results function does not return one thing, but several since it does this:
return count_man_words, count_woman_words, strengths_of_coding
So it actually returns a tuple with those three values inside. And these variables are scoped to that specific function, you cannot use them outside that function anymore. Note: For the sake of simplicity; let's just stick to not using them outside of the function since it's bad practice.
In your code, you then do this:
res = analise_and_explain_results(user_input, man_coded_bag, woman_coded_bag)
Which means that res at this point is actually the tuple holding the three values you are interested in. You have several ways to resolve this, but the easiest to follow is to assign the values to variables like this:
count_man_words, count_woman_words, strengths_of_coding = analise_and_explain_results(user_input, man_coded_bag, woman_coded_bag)
This unpacks the tuple into three separate values, since it basically does this:
a, b, c = (1, 2, 3)
Where before you did:
d = (1, 2, 3)
Unpacking is easy, as long as the item you unpack holds exactly as many items as the names you're assigning to:
a, b, c = d
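Here is a small self-contained sketch of the same idea. The function analyse and its word lists are simplified stand-ins invented for this example, not the code from the question; it only shows how a multi-value return can be indexed or unpacked.

```python
def analyse(text):
    """Return three values at once; Python packs them into a tuple."""
    words = text.lower().split()
    masculine = sum(word in ("active", "aggressive") for word in words)
    feminine = sum(word in ("agree", "child") for word in words)
    if feminine > masculine:
        label = "feminine-coded"
    elif masculine > feminine:
        label = "masculine-coded"
    else:
        label = "neutral"
    return masculine, feminine, label


res = analyse("active child agree")
print(res)      # the whole tuple
print(res[2])   # the third element, accessed by index

# Unpacking binds each element to its own name in the caller's scope.
count_man, count_fem, strength = analyse("active child agree")
print(strength)
```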
If you have trouble grasping OOP and python I would suggest you learn to walk, before you run, which you're doing now IMO.
Follow some tutorials or videos explaining OOP and python. Or combine them like they do on realpython.

strengths_of_coding is only defined inside the analise_and_explain_results function. When you return the values from that function, they are no longer attached to the names you used inside the function.
return count_man_words, count_woman_words, strengths_of_coding can also be written as return (count_man_words, count_woman_words, strengths_of_coding). It means the return value of the function is a tuple with 3 elements, which are the values of each of the variables, and that tuple is assigned to res in res = analise_and_explain_results(user_input, man_coded_bag, woman_coded_bag).
The value of the variable called strengths_of_coding inside the function is available as res[2] in get_results after you do the assignment to res.

res = analise_and_explain_results(user_input, man_coded_bag, woman_coded_bag) makes res a tuple with 3 elements. strengths_of_coding is the 3rd element of this tuple, so you access it as res[2]. In Python, when a function returns several values and you assign the result to one variable, that variable holds a tuple. You can instead provide one variable per returned value, e.g. count_man_words, count_woman_words, strengths_of_coding = analise_and_explain_results(user_input, man_coded_bag, woman_coded_bag). Or, if you only need that one value, strengths_of_coding = analise_and_explain_results(user_input, man_coded_bag, woman_coded_bag)[2].

Sorry for the late answer. With the kind help of the people who answered, I have come to understand where I was making mistakes.
This is how I fixed my code, using the example above, in case someone else struggles with grasping the basics.
As a bonus for any beginner who finds this useful, here is something I learned from someone else.
The last print statement is a very useful way to debug your code:
print("results are: %s" % results)
For example, by adding it at the end you can see whether you ended up with the right results, as in my example, or you can add it elsewhere in your code to see what each function returns.
user_input = "Active, child, whatever random text, testing"
text = user_input.lower()
# declare the coded words, feminine and masculine
feminine_coded_words = [
    "agree",
    "affectionate",
    "child",
    "cheer",
]
masculine_coded_words = [
    "active",
    "adventurous",
    "aggressive",
    "ambitios",
]
# declare explanations to use when we explain the results to the user.
explanations = {
    "feminine-coded": (
        "This job ad uses more words that are subtly coded as feminine than words that are subtly coded as masculine"
    ),
    "masculine-coded": (
        "This job ad uses more words that are subtly coded as masculine than words that are subtly coded as feminine."
    ),
    "neutral": ("this is neutral"),
    "empty": ("empty text"),
}
# each function builds and returns a list of the coded words found in the text.
def men_coded_words(text):
    add_text = text
    manly_coded_bag = []
    for word in masculine_coded_words:
        if word in add_text:
            manly_coded_bag.append(word)
    return manly_coded_bag
def women_coded_words(text):
    add_text = text
    feminine_coded_bag = []
    for word in feminine_coded_words:
        if word in add_text:
            feminine_coded_bag.append(word)
    return feminine_coded_bag
def feminine_counted_words(text):
    woman_coded_bag = women_coded_words(text)
    return len(woman_coded_bag)
def masculine_counted_words(text):
    man_coded_bag = men_coded_words(text)
    return len(man_coded_bag)
def coding_score(text):
    count_fem_words = feminine_counted_words(text)
    count_masc_words = masculine_counted_words(text)
    return count_fem_words - count_masc_words
def explain_results(text):
    strengths_of_coding = ""
    count_masc_words = masculine_counted_words(text)
    coding_score_results = coding_score(text)
    if coding_score_results == 0:
        if count_masc_words:
            strengths_of_coding = "neutral"
        else:
            strengths_of_coding = "empty"
    elif coding_score_results > 0:
        strengths_of_coding = "feminine-coded"
    else:
        strengths_of_coding = "masculine-coded"
    return strengths_of_coding
def get_results(text):
    strength_of_coding = explain_results(text)
    return explanations[strength_of_coding]
results = get_results(text)
print("results are: %s" % results)

Parse plaintext API response

I'm getting plaintext responses from an API, and I would like to parse those values and assign them to variables.
Example:
If the response is:
TD_OK
3213513
I would like to convert this to:
TD_Result = TD_OK
TD_Number = 3213513
I tried something like this, but did not work:
result = """
TD_EXISTS
23433395"""
result2 = []
for r in result:
    result2.append(r)
TD_Result = result2[1]
TD_Number = result2[2]
print (TD_Result)
print (TD_Number)
Any idea about how to do that?

Change for r in result: to for r in result.splitlines():
or,
as @Matmarbon said, the version below is better:
result = """
TD_EXISTS
23433395
"""
td_result, td_number = result.split()
print(td_result)
print(td_number)
This version gets rid of the unnecessary intermediate list, uses iterable unpacking, and uses snake_case to comply with Python's naming convention.

You can do this using the split method as follows.
Also note that list indexes in Python start at zero instead of one.
result = """
TD_EXISTS
23433395"""
result2 = result.split()
TD_Result = result2[0]
TD_Number = result2[1]
print (TD_Result)
print (TD_Number)
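If the response might contain blank lines or stray whitespace, a slightly more defensive sketch could look like the following. The function name parse_response is made up, and the code assumes the second line is always an integer, which the question does not confirm.

```python
def parse_response(body):
    """Split a two-line plaintext response into (status, number)."""
    # Drop empty lines and surrounding whitespace before unpacking.
    lines = [line.strip() for line in body.splitlines() if line.strip()]
    if len(lines) != 2:
        raise ValueError("expected 2 non-empty lines, got %d" % len(lines))
    status, number = lines
    return status, int(number)


td_result, td_number = parse_response("""
TD_EXISTS
23433395""")
print(td_result)
print(td_number)
```

Raising ValueError on an unexpected shape makes a malformed response fail loudly instead of silently assigning the wrong values.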

Convert JSON to .ics (Python)

I am trying to convert a JSON file to an iCalendar file. My supervisor suggested using two functions convertTo(data) (which converts a JSON to a String) and convertFrom(data) (which converts a String to a JSON; I am not sure of the purpose of this function).
My current approach uses a lot of refactoring and multiple functions.
# returns a String
def __convert(data):
    convStr = __convertTo(data)
    convStr = __fields(convStr)
    return convStr
# convert JSON to a String
def __convertTo(data):
    str = "" + data
    return str
# takes string arg (prev converted from JSON) to split it into useful info
def __fields(data):
    #########
    iCalStr = __iCalTemplate(title, dtStart, UID, remType, email)
    return iCalStr
#
def __iCalTemplate(title, dtStart, UID, remType, email):
    icsTempStr = "BEGIN:VEVENT\nDTSTART:" + dtStart + "\nUID:" + UID + "\nDESCRIPTION:" + desc + "\nSUMMARY:" + title
    if remType is not None:
        icsTempStr += "\nBEGIN:VALARM\nACTION:" + remType + "DESCRIPTION:This is an event reminder"
        if remType is email:
            icsTempStr += "\nSUMMARY:Alarm notification\nATTENDEE:mailto:" + email
        icsTempStr += "\nEND:VALARM"
    return icsTempStr
Any hints or suggestions would be very helpful. I am fully aware that this code needs a LOT of work.

This isn't intended to be a complete answer, but as a longer tip.
There's a Python idiom that will be very helpful to you in building strings, especially potentially large ones. It's probably easier to see an example than explain:
>>> template = 'a value: {a}; b value: {b}'
>>> data = {'a': 'Spam', 'b': 'Eggs'}
>>> template.format(**data)
'a value: Spam; b value: Eggs'
This idiom has a number of advantages over string concatenation and could eliminate the need for a function altogether if you write the template correctly. Optional inserts could, for example, be given values of ''. Once you format your iCal template correctly, it's just a matter of retrieving the right data points from JSON... and if you name your template insert points the same as what you have in JSON, you might even be able to do that conversion in one step. With a bit of planning, your final answer could be something as simple as:
import json
template = 'full iCal template with {insert_point} spec goes here'
data = json.JSONDecoder().decode(your_json_data)
ical = template.format(**data)
To do a quick (and slightly different) interpreter example:
>>> import json
>>> decoder = json.JSONDecoder()
>>> json_example = '{"item_one" : "Spam", "item_two" : "Eggs"}'
>>> template = 'Item 1: {item_one}\nItem 2: {item_two}'
>>> print(template.format(**decoder.decode(json_example)))
Item 1: Spam
Item 2: Eggs
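Applied to a calendar event, the idiom might look like the sketch below. The template and the JSON keys uid, dtstart and title are assumptions chosen for illustration, since the question does not show the real JSON, and a real .ics file needs more properties than this.

```python
import json

# Hypothetical, deliberately minimal event template.
EVENT_TEMPLATE = (
    "BEGIN:VEVENT\n"
    "UID:{uid}\n"
    "DTSTART:{dtstart}\n"
    "SUMMARY:{title}\n"
    "END:VEVENT"
)


def json_to_event(json_text):
    """Decode a JSON object and substitute its fields into the template."""
    data = json.loads(json_text)
    # Keys missing from the JSON raise KeyError; extra keys are ignored.
    return EVENT_TEMPLATE.format(**data)


example = '{"uid": "42@example.com", "dtstart": "20240101T090000Z", "title": "Demo"}'
print(json_to_event(example))
```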

I ended up using a completely different, more efficient approach to accomplish this. In summary, my method traverses through a JSON, extracting each value from each field and manually places it in the appropriate place in an iCalendar template. It returns a string. Something like this...
def convert(self, json):
    template = 'BEGIN:VEVENT\n'
    template += 'DTSTART:%s\n' % json['event-start']
    ...
    return template

Iterating over references to variables in Python

I have an object called Song, which is defined as:
class Song(object):
    def __init__(self):
        self.title = None
        self.songauthor = None
        self.textauthor = None
        self.categories = None
Inside this class I have a method that parses a run-time property of that object, "metadata", which is basically just a text file with some formatted text that I parse with regular expressions. During this process, I have come up with the following code that I am pretty certain can be simplified to a loop.
re_title = re.compile("^title:(.*)$", re.MULTILINE)
re_textauthor = re.compile("^textauthor:(.*)$", re.MULTILINE)
re_songauthor = re.compile("^songauthor:(.*)$", re.MULTILINE)
re_categories = re.compile("^categories:(.*)$", re.MULTILINE)
#
# it must be possible to simplify the below code to a loop...
#
tmp = re_title.findall(self.metadata)
self.title = tmp[0] if len(tmp) > 0 else None
tmp = re_textauthor.findall(self.metadata)
self.textauthor = tmp[0] if len(tmp) > 0 else None
tmp = re_songauthor.findall(self.metadata)
self.songauthor = tmp[0] if len(tmp) > 0 else None
tmp = re_categories.findall(self.metadata)
self.categories = tmp[0] if len(tmp) > 0 else None
I'm guessing this can be done by encapsulating a reference to the property (e.g. self.title) and the corresponding regular expression (re_title) in a datatype (possibly tuple), and then iterate over a list of these data types.
I have tried using a tuple as such:
for x in ((self.title, re_title),
          (self.textauthor, re_textauthor),
          (self.songauthor, re_songauthor),
          (self.categories, re_categories)):
    data = x[1].findall(self.metadata)
    x[0] = data[0] if len(data) > 0 else None
This failed horribly as I cannot modify a tuple in run-time. Can anyone provide a suggestion as to how I can pull this off?

There are two problems with your code.
The big one is that x[0] is not a reference to self.title, it's a reference to the value of self.title. In other words, you're just copying the existing title into a tuple, then replacing that title in the tuple with a different one, which has no effect on the existing title.
The smaller one is that you can't replace elements in a tuple. You could fix that trivially by using a list instead of a tuple, but you're still going to have the big problem.
So, how do you create references to variables in Python? You can't. You need to think of a way to reorganize things. For example, maybe you can access these things by name, instead of by reference. Instead of four separate variables, store the four values in a single dictionary:
res = {
    'title': re.compile("^title:(.*)$", re.MULTILINE),
    'textauthor': re.compile("^textauthor:(.*)$", re.MULTILINE),
    'songauthor': re.compile("^songauthor:(.*)$", re.MULTILINE),
    'categories': re.compile("^categories:(.*)$", re.MULTILINE)
}
class Song(object):
    def __init__(self):
        self.properties = {}
    def parsify(self, text):
        for thing in ('title', 'textauthor', 'songauthor', 'categories'):
            data = res[thing].findall(self.metadata)
            self.properties[thing] = data[0] if len(data) > 0 else None
You could also use for thing in res: there, because that will iterate over all the keys (in arbitrary order, but you probably don't care about the order).
If you really need to have self.title, you've run into a common problem. Usually, there's a clear distinction between data (which should be referred to by runtime strings) and attributes (which should not). But sometimes, there isn't. So you have to bridge between them in some way. You can create four @property fields that return self.properties['title'] and so on, or you can use setattr(self, thing, …) instead of self.properties[thing], or various other possibilities. Which one is best comes down to whether they're more data-like or more attribute-like.
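As a rough sketch of the setattr route, assuming the metadata is passed to the constructor (the question does not show how it is set) and using made-up sample data:

```python
import re

FIELDS = ("title", "textauthor", "songauthor", "categories")
# One compiled pattern per field name, e.g. "^title:(.*)$".
PATTERNS = {
    name: re.compile("^%s:(.*)$" % name, re.MULTILINE) for name in FIELDS
}


class Song(object):
    def __init__(self, metadata):
        self.metadata = metadata
        for name in FIELDS:
            setattr(self, name, None)

    def parse(self):
        """Fill each attribute from the first matching metadata line."""
        for name, pattern in PATTERNS.items():
            found = pattern.findall(self.metadata)
            setattr(self, name, found[0] if found else None)


song = Song("title:My Song\nsongauthor:Jane Doe\n")
song.parse()
print(song.title)
print(song.categories)
```

Here the attribute name is the only thing that varies per field, so a single loop replaces the four copies of the findall code.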

Instead of assigning to the tuple, update the class members directly:
all_res = {'title': re_title,
           'textauthor': re_textauthor,
           'songauthor': re_songauthor,
           'categories': re_categories}
for k, v in all_res.items():
    tmp = v.findall(self.metadata)
    if tmp:
        setattr(self, k, tmp[0])
    else:
        setattr(self, k, None)
If you only care about the first match, you don't need to use findall.

abarnert's answer has given a good explanation of what is going wrong with your code, but I wanted to offer up an alternative solution. Rather than using a loop to assign each variable, try creating an iterable of the different values from the parsed file, then use a single unpacking-assignment to get them into the various variables.
Here's a two-statement solution using a list comprehension, which is made just a bit tricky by the fact that you need to reference the result of findall twice in if/else expression (thus the nested generator expression):
vals = [x[0] if len(x) > 0 else None
        for x in (regex.findall(self.metadata)
                  for regex in [re_title, re_textauthor,
                                re_songauthor, re_categories])]
self.title, self.textauthor, self.songauthor, self.categories = vals
You can probably simplify things a little bit in the first part of the list comprehension. To start with, you can just test if x rather than if len(x) > 0. Or, if you're not too attached to using findall, you could use search instead, then just use x and x.group(1) instead of the whole if/else bit (group(1) is the captured text, which is what findall returns for a pattern with one group). The search method returns None if no match was found, so the short-circuiting behavior of the and operator will do exactly what we want.
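A minimal sketch of the search variant, using made-up metadata and only two of the four patterns to keep it short. Note that group(1) is the captured part after the colon, which is what findall returns for a pattern with one group.

```python
import re

re_title = re.compile("^title:(.*)$", re.MULTILINE)
re_categories = re.compile("^categories:(.*)$", re.MULTILINE)

metadata = "title:My Song\nsongauthor:Jane Doe\n"

found = (regex.search(metadata) for regex in (re_title, re_categories))
# "m and m.group(1)" gives None when there was no match, else the capture.
title, categories = [m and m.group(1) for m in found]

print(title)
print(categories)
```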

An example would be to use a dictionary like this:
things = {}
for name, regex in (('title', re_title),
                    ('textauthor', re_textauthor),
                    ('songauthor', re_songauthor),
                    ('categories', re_categories)):
    found = regex.findall(self.metadata)
    if len(found):
        things[name] = found[0]
    else:
        things[name] = None
Could this be a possible solution?

Easiest way to substitute characters in the string

Suppose I have a main form named "main_form" and a few more forms with names like "w_main_form", "g_main_form", etc., which are based on "main_form" and vary according to the 'category'.
Is there an easy way to generate the names of the derived forms and to call them?
Suppose the category is "iron": then the form name should be "w_main_form", and when the category is "coal" the form name should be "g_main_form".

>>> main_name = "main_form"
>>> derived_names = []
>>> for prefix in ["w_", "g_"]:
...     derived_names.append("%s%s" % (prefix, main_name))
...
>>> derived_names
['w_main_form', 'g_main_form']
Or, with list comprehensions (my preferred method):
>>> derived_names = ["%s%s" % (prefix, main_name) for prefix in ["w_", "g_"]]
>>> derived_names
['w_main_form', 'g_main_form']
In general, so you can apply the same principle yourself, you want to think of the transform you want to do in terms of a function, f(main_name, data), and the data to provide to it. In this case, the operation was "prepend" (which I implemented with "%s%s" % (prefix, main_name)) and the data was all the prefixes.
EDIT: Yes.
>>> category_to_prefix = {'iron': 'w_', 'coal': 'g_'}
>>> def category_to_form_name(category):
...     return '%s%s' % (category_to_prefix.get(category, ""), 'main_form')
...
>>> category_to_form_name('iron')
'w_main_form'
>>> category_to_form_name('coal')
'g_main_form'
>>> category_to_form_name(None)
'main_form'
Please upvote and accept the answer (click the up arrow and the green checkmark) if it is what you were looking for.

This will do what your comment stated:
def generate_Name(base, category):
    if category == 'iron':
        derived_name = 'w_' + base
    elif category == 'coal':
        derived_name = 'g_' + base
    else:
        # unknown category: fall back to the base name
        derived_name = base
    return derived_name
iron_form = generate_Name('main_form', 'iron')
coal_form = generate_Name('main_form', 'coal')
print(iron_form)
print(coal_form)
gives
w_main_form
g_main_form
