Getting rid of proper nouns in a nested list in Python

I'm trying to write a program that takes in a nested list and returns a new list with the proper nouns taken out.
Here is an example:
L = [['The', 'name', 'is', 'James'], ['Where', 'is', 'the', 'treasure'], ['Bond', 'cackled', 'insanely']]
I want to return:
['the', 'name', 'is', 'is', 'the', 'treasure', 'cackled', 'insanely']
Note that 'Where' is deleted. That's OK, since it does not appear anywhere else in the nested list. Each nested list is a sentence.
My approach is to append the first element of every nested list to newList, and then check whether the elements of newList appear elsewhere in the nested list, lowercasing each element of newList for the comparison. I'm halfway done with this program, but I'm running into an error when I try to remove an element from newList at the end. Once I have the updated list, I want to delete the items from the nested list that are in newList. Lastly, I'd append all the items of the nested list to a newerList and lowercase them. That should do it.
If someone has a more efficient approach I'd gladly listen.
def lowerCaseFirst(L):
    newList = []
    for nestedList in L:
        newList.append(nestedList[0])
    print newList
    for firstWord in newList:
        sum = 0
        firstWord = firstWord.lower()
        for nestedList in L:
            for word in nestedList[1:]:
                if firstWord == word:
                    print "yes"
                    sum = sum + 1
        print newList
        if sum >= 1:
            firstWord = firstWord.upper()
            newList.remove(firstWord)
    return newList
Note: this code is not finished because of the error in the second-to-last line.
Here it is with the newerList (updatedNewList):
def lowerCaseFirst(L):
    newList = []
    for nestedList in L:
        newList.append(nestedList[0])
    print newList
    updatedNewList = newList
    for firstWord in newList:
        sum = 0
        firstWord = firstWord.lower()
        for nestedList in L:
            for word in nestedList[1:]:
                if firstWord == word:
                    print "yes"
                    sum = sum + 1
        print newList
        if sum >= 1:
            firstWord = firstWord.upper()
            updatedNewList.remove(firstWord)
    return updatedNewList
Error message:
Traceback (most recent call last):
  File "/Applications/WingIDE.app/Contents/MacOS/src/debug/tserver/_sandbox.py", line 1, in <module>
    # Used internally for debug sandbox under external interpreter
  File "/Applications/WingIDE.app/Contents/MacOS/src/debug/tserver/_sandbox.py", line 80, in lowerCaseFirst
ValueError: list.remove(x): x not in list

The error in your first function occurs because you try to remove an uppercased version of firstWord from newList, which contains no all-uppercase words (you can see that from the printout). Remember that you store an upper- or lowercased copy of each word in a variable, but that doesn't change the contents of the original list.
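A minimal sketch of the first-word step with this fixed: compare using the lowercased word, but keep the original spelling so it can still be found in the list, and build a new list instead of calling remove() on the list you are looping over (removing while iterating also skips elements). The function name first_words_to_drop is hypothetical. Note that it only looks at sentence-initial words, so 'James' is not caught.

```python
def first_words_to_drop(sentences):
    """Return sentence-initial words whose lowercase form never appears
    later in any sentence (candidate proper nouns such as 'Bond')."""
    first_words = [sentence[0] for sentence in sentences]
    # Every word that is not at the start of a sentence.
    other_words = {word for sentence in sentences for word in sentence[1:]}
    # Compare the lowercased form, but keep the original spelling in the result.
    return [word for word in first_words if word.lower() not in other_words]


L = [['The', 'name', 'is', 'James'],
     ['Where', 'is', 'the', 'treasure'],
     ['Bond', 'cackled', 'insanely']]
print(first_words_to_drop(L))  # ['Where', 'Bond']
```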
I still don't understand your approach. As you describe your task, you want to do two things: 1) flatten a list of lists into a list of elements (always an interesting programming exercise), and 2) remove proper nouns from this list. This means you have to decide what counts as a proper noun. You could do that rudimentarily (all capitalized words that don't start a sentence, or an exhaustive list), or you could use a POS tagger (see: Finding Proper Nouns using NLTK WordNet). Unless I misunderstand your task completely, you needn't worry about the casing here.
The first task can be solved in many ways. Here is a nice way that illustrates what actually happens in the simple case where your list L is a list of lists (and not lists nested to arbitrary depth):
def flatten(L):
    newList = []
    for sublist in L:
        for elm in sublist:
            newList.append(elm)
    return newList
You could turn this function into flattenAndFilter(L) by checking each element, like this:
PN = ['James', 'Bond']
def flattenAndFilter(L):
    newList = []
    for sublist in L:
        for elm in sublist:
            if elm not in PN:
                newList.append(elm)
    return newList
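The same flattening and filtering can also be written as a single list comprehension, and itertools.chain.from_iterable does the one-level flattening on its own. Storing the proper nouns in a set turns each `not in` test into a hash lookup rather than a scan of a list. A sketch, reusing the example PN names from above:

```python
from itertools import chain

PN = {'James', 'Bond'}  # a set: 'elm not in PN' is a hash lookup


def flatten(L):
    # chain.from_iterable walks each sublist in turn (one level deep only)
    return list(chain.from_iterable(L))


def flattenAndFilter(L):
    return [elm for sublist in L for elm in sublist if elm not in PN]


L = [['The', 'name', 'is', 'James'],
     ['Where', 'is', 'the', 'treasure'],
     ['Bond', 'cackled', 'insanely']]
print(flattenAndFilter(L))
# ['The', 'name', 'is', 'Where', 'is', 'the', 'treasure', 'cackled', 'insanely']
```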
You might not have such a nice list of PNs, though; then you would have to expand on the checking, for instance by parsing the sentence and checking the POS tags.
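Without such a list, one rudimentary rule happens to reproduce the output asked for in the question. It is a heuristic, not real proper-noun detection: keep a capitalized word only if its lowercase form occurs somewhere else in the text, and lowercase everything that is kept. It will also drop ordinary words that only ever appear capitalized (such as a sentence-initial 'Where', or 'I'), so a POS tagger remains the more reliable option. The function name remove_proper_nouns is hypothetical.

```python
def remove_proper_nouns(sentences):
    # Words that occur in all-lowercase form somewhere in the text.
    lowercase_words = {w for sentence in sentences for w in sentence if w.islower()}
    result = []
    for sentence in sentences:
        for word in sentence:
            # Keep lowercase words, and capitalized words whose lowercase
            # form is also used elsewhere (e.g. 'The', because 'the' occurs).
            if word.islower() or word.lower() in lowercase_words:
                result.append(word.lower())
    return result


L = [['The', 'name', 'is', 'James'],
     ['Where', 'is', 'the', 'treasure'],
     ['Bond', 'cackled', 'insanely']]
print(remove_proper_nouns(L))
# ['the', 'name', 'is', 'is', 'the', 'treasure', 'cackled', 'insanely']
```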

Related

Try to find an element in the list based on a part of the string

I have a list = ['Assassin Bow', 'Bone Bow', 'Citadel Bow']
and I have a string = 'Bone Bow of Fire'
How can I get 'Bone Bow' as the output?
I just started coding, thanks for understanding.
l = ['Assassin Bow', 'Bone Bow', 'Citadel Bow']
s = 'Bone Bow of Fire'
# loop through each element in list 'l'
for x in l:
    # if the element is somewhere in the string 's'
    if x in s:
        print(x)
You can check each element of the list to see whether it is contained in the string.
The most basic way is a for loop:
result = []
for item in l:
    if item in s:
        result.append(item)
A more concise way, with a list comprehension:
result = [i for i in l if i in s]
This returns a list containing all elements that satisfy the condition. In your example it will be a list with one item, but in other cases there can be more.
Notes:
list is a built-in class in Python, so do NOT use it as a variable name.
string is technically allowed, but it's better practice not to use it (it is also the name of a standard-library module). Give your variables more meaningful names.
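If you only need a single match rather than a list, next() with a default returns the first item found in the string, or None when nothing matches. Keep in mind that `in` is a plain substring test, so a short item such as 'Bow' would match as well.

```python
l = ['Assassin Bow', 'Bone Bow', 'Citadel Bow']
s = 'Bone Bow of Fire'

# The generator stops at the first item contained in s; None is the fallback.
match = next((item for item in l if item in s), None)
print(match)  # Bone Bow

print(next((item for item in l if item in 'Iron Sword'), None))  # None
```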

Replacing list item based on another list using pseudo-token

So, I am new to Python. I want to replace the values in my list that also appear in another list with a specified value, the pseudo-token OOV. I have already turned the text into tokens and cleaned it up a little with a regex.
This is my code:
def replace_words(list1, list2):
    for word in list1:
        for words in list2:
            if word == words:
                word = "OOV"
replace_words(list1, list2)
list1.count("OOV") #this keeps showing 0, so something is wrong...
Your code is not working because you are assigning a new value, OOV, to the variable word. That is fine in itself, but it doesn't change the element inside list1. You need to change the item in place inside list1.
Try this:
def replace_words(list1, list2):
    for idx in range(len(list1)):
        if list1[idx] in list2:
            list1[idx] = "OOV"
Now when you execute list1.count("OOV"), it will not return 0 if list1 contains a value that is also in list2.
Hope this helps!
What you are doing wrong is assuming that setting word = "OOV" will replace the element in the list. It doesn't; you need to replace the element by accessing it through its index in the list.
The following should work:
def replace_words(list1, list2):
    for i in range(len(list1)):  # using the index, so list1 itself is modified
        for words in list2:
            if list1[i] == words:
                list1[i] = "OOV"
replace_words(list1, list2)
list1.count("OOV")
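A list comprehension can do the same in one pass. Assigning to the slice list1[:] replaces the contents of the existing list object in place (so every other reference to it sees the change), and turning list2 into a set makes each lookup fast. A sketch with made-up sample data:

```python
def replace_words(list1, list2):
    vocabulary = set(list2)  # set membership is a hash lookup
    # Slice assignment mutates list1 itself instead of rebinding a local name.
    list1[:] = ["OOV" if word in vocabulary else word for word in list1]


list1 = ["the", "cat", "sat", "on", "the", "mat"]
list2 = ["cat", "mat"]
replace_words(list1, list2)
print(list1)               # ['the', 'OOV', 'sat', 'on', 'the', 'OOV']
print(list1.count("OOV"))  # 2
```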

Replace string in specific index in list of lists python

How can I replace a string in a list of lists in Python, applying the change only at a specific index and not affecting the other index? Here is an example:
mylist = [["test_one", "test_two"], ["test_one", "test_two"]]
I want to change the word "test" to "my" so that the result only affects the second element (index 1):
mylist = [["test_one", "my_two"], ["test_one", "my_two"]]
I can figure out how to change both elements, but I can't figure out what to do if I only want to change one specific index.
Use indexing:
newlist = []
for l in mylist:
    l[1] = l[1].replace("test", "my")
    newlist.append(l)
print(newlist)
Or a one-liner, if each sublist always has exactly two elements:
newlist = [[i, j.replace("test", "my")] for i, j in mylist]
print(newlist)
Output:
[['test_one', 'my_two'], ['test_one', 'my_two']]
There is a way to do this in one line, but it is not coming to me at the moment. Here is how to do it in two lines:
for two_word_list in mylist:
    two_word_list[1] = two_word_list[1].replace("test", "my")
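If the sublists can have any length, enumerate() lets you pick out a single position. This sketch builds a new list and leaves mylist untouched; the helper name replace_at_index and its parameters are illustrative.

```python
def replace_at_index(rows, index, old, new):
    # Only the element at position `index` of each sublist is changed.
    return [[word.replace(old, new) if i == index else word
             for i, word in enumerate(row)]
            for row in rows]


mylist = [["test_one", "test_two"], ["test_one", "test_two"]]
print(replace_at_index(mylist, 1, "test", "my"))
# [['test_one', 'my_two'], ['test_one', 'my_two']]
print(mylist)  # unchanged: [['test_one', 'test_two'], ['test_one', 'test_two']]
```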

Why does my list skip certain elements when I iterate?

I am trying to write some code which adds each element of one list to another list and then removes it from the first list. It also should not add duplicates to the new list, which is where the if statement comes in.
However, when adding to the 'individuals' list and removing from the 'sentence_list' list, it skips certain words such as 'not' and 'for'. This is not random either: the same words are skipped each time. Any help?
sentence = "I am a yellow fish"
sentence_list = sentence.lower().split()
individuals = []
for i in sentence_list:
    if i in individuals:
        print ("yes")
        sentence_list.remove(i)
    else:
        individuals.append(i)
        sentence_list.remove(i)
print ("individuals", individuals)
print ("sentence_list", sentence_list)
The issue is that you are removing items from the list you are looping through. You can fix this just by making a copy of the list and looping through it instead, like this:
sentence = "ASK NOT WHAT YOUR COUNTRY CAN DO FOR YOU ASK WHAT YOU CAN DO FOR YOUR COUNTRY"
sentence_list = sentence.lower().split()
individuals = []
#We slice with [:] to make a copy of the list
orig_list = sentence_list[:]
for i in orig_list:
    if i in individuals:
        print ("yes")
        sentence_list.remove(i)
    else:
        individuals.append(i)
        sentence_list.remove(i)
print ("individuals", individuals)
print ("sentence_list", sentence_list)
The lists are now what was expected:
print(individuals)
print(sentence_list)
['ask', 'not', 'what', 'your', 'country', 'can', 'do', 'for', 'you']
[]
In general, you should not add elements to or remove elements from a list while you iterate over it. Given that you are removing every single element of the list, just remove the lines with sentence_list.remove(i).
If you actually need to remove only some elements from the list you're iterating over, I'd either make a new empty list and add the elements you want to keep to it, or keep track of which indices you want to remove as you iterate and then remove them after the loop.
For the first solution:
oldList = [1, 2, 3, 4]
newList = []
for i in oldList:
    shouldRemove = i % 2
    if not shouldRemove:
        newList.append(i)
For the second:
oldList = [1, 2, 3, 4]
indicesToKeep = []
for i, e in enumerate(oldList):
    shouldRemove = e % 2
    if not shouldRemove:
        indicesToKeep.append(i)
newList = [e for i, e in enumerate(oldList) if i in indicesToKeep]
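Both versions can be written as list comprehensions. In the second one, keeping the indices in a set makes the `i in indicesToKeep` test a constant-time lookup instead of a scan through a list:

```python
oldList = [1, 2, 3, 4]

# First solution: keep the elements that should not be removed (the even ones).
newList = [e for e in oldList if e % 2 == 0]

# Second solution: collect the indices to keep in a set, then filter by index.
indicesToKeep = {i for i, e in enumerate(oldList) if e % 2 == 0}
newList2 = [e for i, e in enumerate(oldList) if i in indicesToKeep]

print(newList)   # [2, 4]
print(newList2)  # [2, 4]
```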

Find how many lists in list have the same element

I am new to Python, so I'm having trouble with something. I have a few lists of strings inside one list:
list = [ [['AA','A0'],['AB','A0']],
         [['AA','B0'],['AB','A0']],
         [['A0','00'],['00','A0'], [['00','BB'],['AB','A0'],['AA','A0']] ]
       ]
And I have to find how many lists have the same element. For example, the correct result for the above list is 3 for the element ['AB','A0'] because it is the element that connects the most of them.
I wrote some code, but it's not good: it works for 2 lists inside the list, but not for more.
Please help!
This is my code for the above list:
for t in range(0, len(list) - 1):
    pattern = []
    flag = True
    pattern.append(list[t])
    count = 1
    rest = list[t+1:]
    for p in pattern:
        for j in p:
            if flag == False:
                break
            pair = j
            for i in rest:
                for y in i:
                    if pair == y:
                        count = count + 1
                        break
            if count == len(list):
                flag = False
                break
Since your data structure is rather complex, you might want to build a recursive function, that is a function that calls itself (http://en.wikipedia.org/wiki/Recursion_(computer_science)).
This function is rather simple. You iterate through all items of the original list. If the current item is equal to the value you are searching for, you increment the number of found objects by 1. If the item is itself a list, you will go through that sub-list and find all matches in that sub-list (by calling the same function on the sub-list, instead of the original list). You then increment the total number of found objects by the count in your sub-list. I hope my explanation is somewhat clear.
alist=[[['AA','A0'],['AB','A0']],[['AA','B0'],['AB','A0']],[['A0','00'],['00','A0'],[['00','BB'],['AB','A0'],['AA','A0']]]]
def count_in_list(val, arr):
    val_is_list = isinstance(val, list)
    ct = 0
    for item in arr:
        item_is_list = isinstance(item, list)
        if item == val or (val_is_list and item_is_list and sorted(item) == sorted(val)):
            ct += 1
        if item_is_list:
            ct += count_in_list(val, item)
    return ct
print count_in_list(['AB', 'A0'], alist)
This iterative approach, which also works in Python 3, gets the count of all sublists:
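from collections import defaultdict
d = defaultdict(int)
def counter(lst, d):
    lst = lst[:]  # work on a copy, because the loop below appends to the list
    it = iter(lst)
    nxt = next(it)
    while nxt:
        if isinstance(nxt, list):
            if nxt and isinstance(nxt[0], str):
                # a pair of strings: count it (and its reversed form if already seen)
                d[tuple(nxt)] += 1
                rev = tuple(reversed(nxt))
                if rev in d:
                    d[rev] += 1
            else:
                # a list of lists: append its items so the iterator visits them later
                lst += nxt
        nxt = next(it,"")
    return d
# lst is the nested list from the question
print(counter(lst, d)['AB', 'A0'])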
3
It only works on data shaped like your input; mixing plain strings in beside the lists will break the code.
Getting the count for a single sublist is easier:
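def counter(lst, ele):
    lst = lst[:]  # work on a copy, because the loop below appends to the list
    it = iter(lst)
    nxt = next(it)
    count = 0
    while nxt:
        if isinstance(nxt, list):
            if ele in (nxt, nxt[::-1]):
                count += 1
            else:
                # not a match: append its items so the iterator visits them later
                lst += nxt
        nxt = next(it, "")
    return count
print(counter(lst, ['AB', 'A0']))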
3
Okay, this maybe isn't very nice or straightforward code, but that's how I'd try to solve it. Please don't hurt me ;-)
First, I'd split the problem into three smaller ones:
1. Get rid of your multiple nested lists,
2. Count the occurrences of all value pairs in the inner lists, and
3. Extract the most frequent value pair from the counting results.
1.
I'd still use nested lists, but only two levels deep: an outer list to iterate through, with all the two-value lists inside it. You can find an awful lot of information about how to get rid of nested lists right here. As I'm just a beginner, I couldn't make much of all that very detailed information, but if you scroll down you'll find an example similar to mine. This is what I understand, and this is how I can do it.
Note that it's a recursive function. You mentioned in the comments that you think this isn't easy to understand, and I think you're right. I'll try to explain it somehow:
I don't know if the nesting depth is consistent in your list, and I don't want to extract the values themselves, as you want to work with lists. So this function loops through the outer list. For each element, it checks whether it's a list. If not, nothing happens. If it is a list, the function looks at the first element inside that list and checks again whether it's a list.
If the first element inside the current list is another list, the function calls itself again (recursively), but this time starting with the current inner list. This is repeated until the function finds a list whose first element is NOT a list.
In your example, it digs through the complete list of lists until it finds your first string values. Then it takes the list containing those values and puts it in another list, the one that is returned.
Oh boy, that sounds really crazy - tell me if that clarified anything... :-D
"Yo dawg, i herd you like lists, so i put a list in a list..."
def get_inner_lists(some_list):
    inner_lists = []
    for item in some_list:
        if hasattr(item, '__iter__') and not isinstance(item, basestring):
            if hasattr(item[0], '__iter__') and not isinstance(item[0], basestring):
                inner_lists.extend(get_inner_lists(item))
            else:
                inner_lists.append(item)
    return inner_lists
Anyway, call that function and you'll find your list rearranged a little:
>>> foo = [[['AA','A0'],['AB','A0']],[['AA','B0'],['AB','A0']],[['A0','00'],['00','A0'],[['00','BB'],['AB','A0'],['AA','A0']]]]
>>> print get_inner_lists(foo)
[['AA', 'A0'], ['AB', 'A0'], ['AA', 'B0'], ['AB', 'A0'], ['A0', '00'], ['00', 'A0'], ['00', 'BB'], ['AB', 'A0'], ['AA', 'A0']]
2.
Now I'd iterate through those lists and build a string from their values. This only works with lists of two values, but since that's what you showed in your example, it'll do. While iterating, I'd build up a dictionary with the strings as keys and the number of occurrences as values. That makes it really easy to add new values and increment the counters of existing ones:
def count_list_values(some_list):
    result = {}
    for item in some_list:
        key = item[0] + '-' + item[1]
        if key not in result:
            result[key] = 1
        else:
            result[key] += 1
    return result
There you have it, all the counting is done. I don't know if you need it, but as a side effect you get every value and its number of occurrences:
>>> print count_list_values(get_inner_lists(foo))
{'00-A0': 1, '00-BB': 1, 'A0-00': 1, 'AB-A0': 3, 'AA-A0': 2, 'AA-B0': 1}
3.
But you want a clear result, so let's loop through that dictionary, list all keys and all values, find the maximum value, and return the corresponding key. Since the string of two values was built with a separator (-), it's easy to split it and turn it back into a list:
def get_max_dict_value(some_dict):
    all_keys = []
    all_values = []
    for key, val in some_dict.items():
        all_keys.append(key)
        all_values.append(val)
    return all_keys[all_values.index(max(all_values))].split('-')
If you define these three little functions and call them together, this is what you get:
>>> print get_max_dict_value(count_list_values(get_inner_lists(foo)))
['AB', 'A0']
Ta-Daa! :-)
If you really have lists with only nine elements and you don't need to count values that often, do it manually, by reading the values and counting on your fingers. It'll be so much easier ;-)
Otherwise, here you go!
Or...
...you wait until some guru shows up with a super fast, elegant one-line Python command that I've never seen before, which does the same ;-)
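Steps 2 and 3 can also be done without joining and splitting strings: a tuple can be used as a dictionary key directly, and max() with key=counts.get returns the key with the highest count. (The join/split version would break if a value ever contained '-'.) A Python 3 sketch; count_pairs and most_common_pair are hypothetical names, and the input is the already flattened list from step 1.

```python
def count_pairs(pairs):
    counts = {}
    for pair in pairs:
        key = tuple(pair)  # lists are unhashable, tuples can be dict keys
        counts[key] = counts.get(key, 0) + 1
    return counts


def most_common_pair(counts):
    # max() compares the keys by their counts
    return list(max(counts, key=counts.get))


inner = [['AA', 'A0'], ['AB', 'A0'], ['AA', 'B0'], ['AB', 'A0'], ['A0', '00'],
         ['00', 'A0'], ['00', 'BB'], ['AB', 'A0'], ['AA', 'A0']]
print(most_common_pair(count_pairs(inner)))  # ['AB', 'A0']
```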
This is as simple as I can reasonably make it:
from collections import Counter
lst = [ [['AA','A0'],['AB','A0']],
        [['AA','B0'],['AB','A0']],
        [['A0','00'],['00','A0'], [['00','BB'],['AB','A0'],['AA','A0']] ]
      ]
def is_leaf(element):
    return (isinstance(element, list) and
            len(element) == 2 and
            isinstance(element[0], basestring) and
            isinstance(element[1], basestring))
def traverse(iterable):
    for element in iterable:
        if is_leaf(element):
            yield tuple(sorted(element))
        else:
            for value in traverse(element):
                yield value
value, count = Counter(traverse(lst)).most_common(1)[0]
print 'Value {!r} is present {} times'.format(value, count)
The traverse() generator yields a series of sorted tuples, one for each pair in your list. The Counter object counts the occurrences of each, and its .most_common(1) method returns a list holding the most common item together with its count.
You've said recursion is too difficult, but I beg to differ: it's the simplest way possible to attack this problem. The sooner you come to love recursion, the happier you'll be. :-)
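The code above is Python 2 (basestring and the print statement). Under Python 3 the same idea might look like this, with str in place of basestring and `yield from` delegating to the recursive call; this is a port sketch, not the answerer's own code.

```python
from collections import Counter

lst = [[['AA', 'A0'], ['AB', 'A0']],
       [['AA', 'B0'], ['AB', 'A0']],
       [['A0', '00'], ['00', 'A0'], [['00', 'BB'], ['AB', 'A0'], ['AA', 'A0']]]]


def is_leaf(element):
    # A leaf is a two-item list of strings, such as ['AB', 'A0'].
    return (isinstance(element, list) and len(element) == 2
            and all(isinstance(x, str) for x in element))


def traverse(iterable):
    for element in iterable:
        if is_leaf(element):
            # Sorting makes ['A0', 'AB'] and ['AB', 'A0'] count as the same pair.
            yield tuple(sorted(element))
        else:
            yield from traverse(element)


value, count = Counter(traverse(lst)).most_common(1)[0]
print('Value {!r} is present {} times'.format(value, count))
# Value ('A0', 'AB') is present 3 times
```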
Hopefully something like this is what you were looking for. It is a bit tenuous, and I would suggest that recursion is better, but since you didn't want it that way, here is some code that might work. I am not super good at Python, but I hope it will do the job:
def Compare(List):
    #Assuming that the list input is a simple list like ["A1","B0"]
    myList =[[['AA','A0'],['AB','A0']],[['AA','B0'],['AB','A0']],[['A0','00'],['00','A0'],[['00','BB'],['AB','A0'],['AA','A0']]]]
    #Create a counter that will count the outer lists containing List
    myCounter = 0
    for innerList1 in myList:
        found = False
        for innerList2 in innerList1:
            if innerList2 == List:
                found = True
            elif isinstance(innerList2[0], list) and List in innerList2:
                #One extra level of nesting, as in the third group of the example
                found = True
            if found:
                myCounter = myCounter + 1
                #I am putting the break here so that it counts how many lists have the
                #same elements, not how many elements are the same in the lists
                break
    return myCounter
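If you want to avoid recursion but still handle any nesting depth, an explicit stack works: pop a list, count it if it equals the target, otherwise push its sublists. This is a sketch with a hypothetical name, count_pairs_iteratively. It counts each occurrence of the exact pair (so ['A0', 'AB'] would not match ['AB', 'A0']) rather than each outer list; for the example data the two numbers are the same.

```python
def count_pairs_iteratively(nested, target):
    count = 0
    stack = [nested]  # lists still waiting to be examined
    while stack:
        current = stack.pop()
        if current == target:
            count += 1
        else:
            # Push only sublists; strings are never expanded.
            stack.extend(item for item in current if isinstance(item, list))
    return count


myList = [[['AA', 'A0'], ['AB', 'A0']],
          [['AA', 'B0'], ['AB', 'A0']],
          [['A0', '00'], ['00', 'A0'], [['00', 'BB'], ['AB', 'A0'], ['AA', 'A0']]]]
print(count_pairs_iteratively(myList, ['AB', 'A0']))  # 3
```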
