Comparing Python dictionary values - Python

I'm creating a very basic search engine in Python and am working on a method for handling phrase queries: if the positions of two words are within 1 of each other, the words are next to each other in the document, and the method should output every document number where this happens.
I currently have a dictionary that looks like this:
{'8':[['1170', '1264', '1307', '1559', '1638'], ['197', '1169']],
'6':[['345', '772'], ['346']]}
This is just an example. The general layout, where w = word and p = position, is:
{doc1: [[w1p1, w1p2, w1p3], [w2p1, w2p2]]}
The key is the document ID. The value holds the positions of the first word in that document, followed by the positions of the second word. There are as many groups of positions as there are words in the query.
My question is: is there a way to compare the first, second, third, etc. groups of values for the same document ID? I want to check whether one word's position is exactly 1 more than the other word's position.
For example, in doc 6 word 2 (position 346) directly follows word 1 (position 345), so the key '6' should be returned.

There are a couple of ways to achieve what you're trying to do here. Based on the example you gave, I'm assuming that there are always only two words and that the position lists are always sorted.
Whichever method you choose, you'll want to iterate over the documents (the dictionary). Iterating over a dictionary is simple in Python: a for loop over it yields its keys. After that, the steps differ.
First option - less efficient, slightly simpler:
Iterate over each item (location) in list 1 (the locations of the first word).
Iterate over each item (location) in list 2 (the locations of the second word).
Compare the two locations, and if they're within 1 of each other, return the document id.
Example:
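def find_match_simple(docdictionary):
    for documentNumber in docdictionary:
        for word1location in docdictionary[documentNumber][0]:
            for word2location in docdictionary[documentNumber][1]:
                # Positions are stored as strings, so convert before subtracting
                if abs(int(word1location) - int(word2location)) == 1:
                    return documentNumber
    return None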
Second Option - more efficient, slightly more complicated:
Start at the beginning of each list of word locations, keeping track of where you are
Check the two values at the locations you're at.
If the two values are 1 word apart, return the document number
If the two values are not, check which list item (page position), has a lower value and move to the next item in that list, repeat
If one of the lists (ex. list 1) runs out of numbers, and the other list (list 2) is at a value that is greater than the last value of the first (list 1), return None.
Example:
def find_match_sorted(docdictionary):
    for documentNumber in docdictionary:
        # Positions are stored as strings, so convert them once per document
        list1 = [int(p) for p in docdictionary[documentNumber][0]]
        list2 = [int(p) for p in docdictionary[documentNumber][1]]
        list1pos = 0
        list2pos = 0
        # Once either list runs out, there can be no more matches
        while list1pos < len(list1) and list2pos < len(list2):
            difference = list1[list1pos] - list2[list2pos]
            if abs(difference) == 1:
                return documentNumber
            if difference < 0:  # Page location 2 is greater
                list1pos += 1
            else:  # Page location 1 is greater
                list2pos += 1
    return None
As a reminder, option 2 only works if the lists are always sorted. Also, you don't always need to return the document id right away. You could just add the document id to a list if you want all the documents that the pair happens in instead of the first one it finds. You could even use a dictionary to easily keep track of how many times the word pair appears in each document.
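The idea of collecting every matching document, with a count per document, can be sketched as below. This is only a rough illustration: the function name find_phrase_documents is made up for this example, and it assumes the dictionary layout from the question (one list of string positions per query word, in query order). Unlike the abs() check above, it only counts a match when each word sits exactly one position after the previous word, so word order matters.

```python
def find_phrase_documents(index):
    """Return {doc_id: match_count} for documents containing the words in order.

    Hypothetical helper: `index` maps a document ID to one list of positions
    per query word, in query order. Positions may be strings.
    """
    matches = {}
    for doc_id, position_lists in index.items():
        if not position_lists:
            continue
        first = [int(p) for p in position_lists[0]]
        # Sets give fast membership tests for the later words
        rest = [{int(p) for p in positions} for positions in position_lists[1:]]
        count = sum(
            1
            for start in first
            if all(start + offset in positions
                   for offset, positions in enumerate(rest, start=1))
        )
        if count:
            matches[doc_id] = count
    return matches


index = {
    '8': [['1170', '1264', '1307', '1559', '1638'], ['197', '1169']],
    '6': [['345', '772'], ['346']],
}
print(find_phrase_documents(index))  # {'6': 1}
```

Because the later position lists are turned into sets, this version does not depend on the lists being sorted, and it extends to queries with more than two words.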
Hope this helped! Please let me know if anything isn't clear.

Related

Automatically remove values from a list if they start with the same prefix

How can I remove values from a list when they start with the same characters, keeping only one value for each prefix even if many values share it?
For example, this is my list:
list_ph = ['8002378990','8001378990','8202378990','8002378920','8002375990','8002378990','8001378890','8202398990']
Filtering on the first five characters (i[:5]) should leave three values, so the result would look something like this:
['8002378990','8001378990','8202378990']
I don't want to supply any specific prefix values; the comparison should rely only on value[:5].
Here is how I would approach this problem.
I would first create two empty lists, one for comparing the first five digits, and the other to save your result, say
first_five = []
res = []
Now, I would loop through all the entries in your list_ph and add the number to res if the first five digits are not already stored in first_five
i.e.
for ph in list_ph:
    if ph[:5] not in first_five:
        first_five.append(ph[:5])
        res.append(ph)
All of your target numbers should be stored in res
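For long lists, the same approach could be written with a set for the prefix lookup, since membership tests on a set do not slow down as it grows. The sketch below is a variation on the loop above rather than a different method; the function name dedupe_by_prefix and the prefix_len parameter are made up for illustration.

```python
def dedupe_by_prefix(values, prefix_len=5):
    """Keep the first value seen for each distinct prefix (hypothetical helper)."""
    seen = set()   # prefixes already encountered
    result = []    # first value for each prefix, in original order
    for value in values:
        prefix = value[:prefix_len]
        if prefix not in seen:
            seen.add(prefix)
            result.append(value)
    return result


list_ph = ['8002378990', '8001378990', '8202378990', '8002378920',
           '8002375990', '8002378990', '8001378890', '8202398990']
print(dedupe_by_prefix(list_ph))
# ['8002378990', '8001378990', '8202378990']
```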

Finding consecutive numbers from multiple lists in python

Consider, for example, that I have 3 python lists of numbers, like this:
a = [1,5,7]
b = [2,6,8]
c = [4,9]
I need to be able to check whether there are consecutive numbers across these lists, one number from each list, and return true if there are.
In the above example, 7 from list a, 8 from list b and 9 from list c are consecutive, so the returned value should be true. This should be extendable to any number of lists (the number of lists is not known in advance, because they are created on the fly based on prior conditions).
Also, a value in one list is not present in any other list. For example, list a above contains the element '1', so '1' is not present in any other list.
Is there a way to accomplish this? It seems simple, yet it turns out to be complex. I am a Python newbie and have been trying all sorts of loops, but I am not even getting close to what I am looking for.
Looking for suggestions. Thanks in advance.
UPDATE: Here is the context for this question.
I am trying to implement a 'phrase search' in a sentence (which is part of a much bigger task).
Here is an example.
The sentence is:
My friend is my colleague.
I have created an index, which is a dictionary having the word as the key and a list of its positions as the value. So for the above sentence, I get:
{
'My': [0,3],
'friend': [1],
'is': [2],
'colleague': [4]
}
I need to search for the phrase 'friend is my' in the above sentence.
So I am trying to do something like this:
First get the positions of words in the phrase from the dictionary, to get:
{
'My': [0,3],
'friend': [1],
'is': [2],
}
Then check if the words in my phrase have consecutive positions, which goes back to my original question of finding consecutive numbers in different lists.
Since 'friend' is in position 1, 'is' is in position 2, and 'my' is in position 3, I should be able to conclude that the given sentence contains my phrase.
Two clarifying questions: can you assume the lists are sorted, and is O(n) memory usage acceptable?
As a start, you could merge the lists and then check for consecutive elements. This isn't a complete solution because it would match consecutive elements that all appear in a single list (see comments).
from itertools import chain, pairwise

# from https://docs.python.org/3/library/itertools.html#itertools-recipes
def triplewise(iterable):
    "Return overlapping triplets from an iterable"
    # triplewise('ABCDEFG') --> ABC BCD CDE DEF EFG
    for (a, _), (b, c) in pairwise(pairwise(iterable)):
        yield a, b, c

# The annotation on *lists describes each argument, i.e. one list of numbers
def consecutive_numbers_in_list(*lists: list[int]) -> bool:
    big_list = sorted(chain(*lists))
    for first, second, third in triplewise(big_list):
        if (first + 1) == second == (third - 1):
            return True
    return False

consecutive_numbers_in_list(a, b, c)
# True
Note that itertools.pairwise requires Python 3.10 or later.
If the lists are sorted but you need constant memory, then you can use an n pointer approach in which you have a pointer to the first element of each list, then advance the lowest pointer on each iteration and keep track of the last three values seen at all times.
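A rough sketch of that pointer-based idea follows. It is an interpretation, not a definitive implementation: it uses heapq.merge, which lazily merges already-sorted inputs, instead of hand-written pointers, and it generalizes "the last three values" to a sliding window with one slot per list. It also records which list each value came from, so a run taken entirely from one list is not counted. The names has_consecutive_run and _tagged are made up for this example.

```python
import heapq
from collections import deque


def _tagged(values, list_id):
    # Pair each value with the ID of the list it came from
    return ((value, list_id) for value in values)


def has_consecutive_run(*lists):
    """True if some run of consecutive numbers takes one value from each list.

    Assumes every input list is sorted in ascending order.
    """
    n = len(lists)
    if n == 0:
        return False
    # heapq.merge yields items in sorted order without building a big list
    merged = heapq.merge(*(_tagged(lst, i) for i, lst in enumerate(lists)))
    window = deque(maxlen=n)  # the last n (value, list_id) pairs seen
    for item in merged:
        window.append(item)
        if len(window) < n:
            continue
        values = [value for value, _ in window]
        sources = {list_id for _, list_id in window}
        if values == list(range(values[0], values[0] + n)) and len(sources) == n:
            return True
    return False


print(has_consecutive_run([1, 5, 7], [2, 6, 8], [4, 9]))  # True
```

Memory use here grows with the number of lists rather than with their total length. Like the merged-list version, this check ignores which list supplies which position in the run.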
Ultimately, your question doesn't make that much sense, in that this doesn't seem like a typical programming task. If you are a newbie to programming, you can ask what you are trying to accomplish, instead of how to implement your candidate solution, and we might be able to suggest a better method overall. See https://meta.stackexchange.com/questions/66377/what-is-the-xy-problem
UPDATE
You are implementing phrase search. So an additional requirement, compared to the original question, is that the first list contain the first index of the sequence, the second list contain the second index of the sequence, etc. (As I assume that "friend my is" is not an acceptable search result for the query "my friend is".)
Pseudocode:
for each index i in the first list:
    for each j from 2 to n:
        check whether i + j - 1 appears in list j
    if every check succeeded, the phrase starts at index i
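In Python, the pseudocode could look roughly like the sketch below. The function name contains_phrase is made up for illustration, and the index is assumed to have the shape shown in the question: a dictionary mapping each word to its list of positions, with keys matched exactly as stored (so 'My' rather than 'my').

```python
def contains_phrase(index, phrase_words):
    """Check whether the words occur at consecutive positions, in order."""
    if not phrase_words:
        return False
    try:
        position_lists = [index[word] for word in phrase_words]
    except KeyError:
        return False  # a word that is not indexed cannot be part of a match
    # Sets make the "does this position appear?" test fast
    later_words = [set(positions) for positions in position_lists[1:]]
    return any(
        all(start + offset in positions
            for offset, positions in enumerate(later_words, start=1))
        for start in position_lists[0]
    )


index = {'My': [0, 3], 'friend': [1], 'is': [2], 'colleague': [4]}
print(contains_phrase(index, ['friend', 'is', 'My']))  # True
print(contains_phrase(index, ['friend', 'My', 'is']))  # False
```

Here the offset counts from 1 for the second word, which corresponds to i + j - 1 in the pseudocode with j starting at 2.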
Depending on the characteristics of your data, you may find there are easier or more efficient approaches:
You can find all the documents matching all n of the search terms in the phrase, then do exact substring matching in each document.
If the search terms have a relatively short maximum token length, you can add n-grams to your search index.
This is a very general problem, so you can also look at implementations in popular search engines like ElasticSearch.

How to only check for two values existence in a list

I have a list of lists say, for example:
directions = [[-1,0,1],[1,0,4],[1,1,2],[-1,1,2]]
Now, in any of the nested lists, index [2] is of no importance in the test.
I want to find out whether the first two values in any of the nested lists match the inverse of the first two values of any other nested list. To clarify, by inverse I mean the negated value. I'd like this in Python code, preferably in one line, but if that is not possible, then a workaround that achieves the same effect.
If this condition is true, the third values of the two nested lists should be added together and stored in the list being checked, and the other list, the inverse one, should be deleted.
So
if nested list's first 2 values == -another nested list's first 2 values
add their third values together
list delete(inverse list)
I hope this makes a little more sense.
I have tried the code below, but I still can't get it to skip the third value (index 2):
listNum = 0
while len(directions) > listNum:
    if (-directions[listNum][0], -directions[listNum][1], anything(Idk)) in directions:
        index = index(-directions[listNum][0], -directions[listNum][1], anything(Idk))
        directions[listNum][2] += directions[index][2]
        directions.del(index)
But I don't know what to put where I put anything(Idk)
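One possible way to ignore the third value is to compare only the first two elements explicitly instead of searching for a whole nested list. The sketch below is just one reading of the requirement described above. The function name merge_inverse_pairs is made up for illustration, and it builds a new list rather than deleting items from directions while looping over it.

```python
def merge_inverse_pairs(directions):
    """Merge each entry into an earlier entry whose first two values are its negation."""
    merged = []
    for x, y, weight in directions:
        for entry in merged:
            # Compare only the first two values; index 2 plays no part in the test
            if entry[0] == -x and entry[1] == -y:
                entry[2] += weight  # add the third values together
                break
        else:
            # No inverse found among the kept entries, so keep this one
            merged.append([x, y, weight])
    return merged


directions = [[-1, 0, 1], [1, 0, 4], [1, 1, 2], [-1, 1, 2]]
print(merge_inverse_pairs(directions))
# [[-1, 0, 5], [1, 1, 2], [-1, 1, 2]]
```

In this example only [1, 0, 4] has an inverse, [-1, 0, 1], so its third value is added to that entry and the other two lists are kept unchanged.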

Count how many times a series of strings appears in a list - Python

I'm working on Python (and I'm very new to it) and I have several tuples of the type
A = ('T-ha', 'T-he', 'PRE-ma')
B = ('T-ha', 'M-ha', 'PRE-ma')
and I want to count how many times several strings appear in each tuple and, in case this number is higher than 1, delete the tuple.
The strings that I want to test are something like T, PRE and M.
In this case, I would delete the first tuple and keep the second.
I know that, with str.count(str2), I can check whether an individual one of those strings is present, but I need to check all of them at the same time (and once the count is higher than 1, stop counting and delete the tuple).
Any ideas?
Thanks in advance!
Probably not the most elegant solution, but this might do the trick:
search = ['T', 'PRE', 'M']
for i in search:
    if ''.join(B).count(i) > 1:
        del B
        break  # B no longer exists, so stop checking
Put the strings you want to test for into a list, temporarily convert your tuple into a single string and count occurrences of the items on your searchlist. If the count comes out > 1 delete the tuple.
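When there are several tuples, an alternative is to filter them into a new list instead of deleting variables. The sketch below assumes that the strings to test (T, PRE, M) are always the part of each item before the hyphen, which is a guess based on the examples in the question. The function name has_repeated_tag is made up for illustration.

```python
from collections import Counter


def has_repeated_tag(entry, tags=('T', 'PRE', 'M')):
    """True if any tag occurs more than once as the part before the hyphen."""
    counts = Counter(item.split('-', 1)[0] for item in entry)
    return any(counts[tag] > 1 for tag in tags)


A = ('T-ha', 'T-he', 'PRE-ma')
B = ('T-ha', 'M-ha', 'PRE-ma')

# Keep only the tuples in which no tag is repeated
kept = [entry for entry in (A, B) if not has_repeated_tag(entry)]
print(kept)  # [('T-ha', 'M-ha', 'PRE-ma')]
```

Counting the text before the hyphen avoids accidental matches elsewhere in the joined string, for example a capital M inside another item.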

Python recursive function seems to lose some variable values

I have a 4x4 table of letters and I want to find all possible paths through it. They are candidates for being words. I have problems with the variable "used". It is a list of all the places the path has already visited, so that the path doesn't go there again. There should be one used list for every path, but it doesn't work correctly. For example, I had a test print that printed the current word and the used list. Sometimes the word had only one letter, but the path had gone through all 16 cells/indices.
The for loop of size 8 covers all possible directions. The main function executes the chase function 16 times, once for every possible starting point. The move function returns the index after moving in a specific direction, and is_allowed tests whether it is allowed to move in a certain direction.
Sample input: oakaoastsniuttot (a 4x4 table, where the first 4 letters are the first row, etc.)
Sample output: all the real words that can be found in some word dictionary
In my case it might output one or two words but not nearly all of them, because it thinks some cells are used even though they are not.
def chase(current_place, used, word):
    used.append(current_place) #used === list of indices that have been used
    word += letter_list[current_place]
    if len(word)>=11:
        return 0
    for i in range(3,9):
        if len(word) == i and word in right_list[i-3]: #right_list === list of all words
            print word
            break
    for i in range(8):
        if is_allowed(current_place, i) and (move(current_place, i) not in used):
            chase(move(current_place, i), used, word)
The problem is that there's only one used list that gets passed around. You have two options for fixing this in chase():
Make a copy of used and work with that copy.
Before you return from the function, undo the append() that was done at the start.
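Both options can be illustrated on a tiny made-up example. The sketch below is not the code from the question: it uses a hypothetical 2x2 grid in which every cell neighbors every other cell, and the names NEIGHBORS, LETTERS, paths_with_copy and paths_with_undo exist only for this illustration. Each function collects every path as a string.

```python
# Hypothetical 2x2 grid, with indices laid out as:
#   0 1
#   2 3
# With diagonal moves allowed, every cell neighbors every other cell.
NEIGHBORS = {0: [1, 2, 3], 1: [0, 2, 3], 2: [0, 1, 3], 3: [0, 1, 2]}
LETTERS = "abcd"


def paths_with_copy(place, used, word, found):
    used = used + [place]  # option 1: build a new list, caller's list is untouched
    word += LETTERS[place]
    found.append(word)
    for nxt in NEIGHBORS[place]:
        if nxt not in used:
            paths_with_copy(nxt, used, word, found)


def paths_with_undo(place, used, word, found):
    used.append(place)
    word += LETTERS[place]
    found.append(word)
    for nxt in NEIGHBORS[place]:
        if nxt not in used:
            paths_with_undo(nxt, used, word, found)
    used.pop()  # option 2: undo the append before returning


found_copy, found_undo = [], []
for start in range(4):
    paths_with_copy(start, [], "", found_copy)
    paths_with_undo(start, [], "", found_undo)

print(found_copy == found_undo)  # True
```

With the second option, the undo has to happen on every way out of the function. In chase() that includes the early return when the word reaches 11 letters.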
