Replacing list item based on another list using pseudo-token - python

I am new to Python. I want to replace every value in my list that also appears in another list with a specified pseudo-token (OOV). I have already turned the input into tokens and cleaned it up a little with a regex.
This is my code:
def replace_words(list1, list2):
    for word in list1:
        for words in list2:
            if word == words:
                word = "OOV"

replace_words(list1, list2)
list1.count("OOV")  # this keeps showing 0, so something is wrong...

Your code is not working because you assign the new value "OOV" to the variable word. That is fine in itself, but it does not change the element inside list1. You need to change the item in place inside list1.
Try this:
def replace_words(list1, list2):
    for idx in range(len(list1)):
        if list1[idx] in list2:
            list1[idx] = "OOV"
Now list1.count("OOV") will return a non-zero count whenever list1 contains a value that is also in list2.
Hope this helps!

What you are doing wrong is assuming that setting word = "OOV" replaces the element in the list. It does not: the assignment only rebinds the loop variable. To change the list you have to assign to the element through its index.
The following should work
def replace_words(list1, list2):
    for i in range(len(list1)):  # iterate over every index, including the last one
        for words in list2:
            if list1[i] == words:
                list1[i] = "OOV"

replace_words(list1, list2)
list1.count("OOV")
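As a further option, the same replacement can be written so that it builds a new list instead of assigning by index. This is only a sketch: the token parameter and the sample word lists are made up for illustration, and it assumes the items are hashable (strings are), so that list2 can be turned into a set for faster lookups.

```python
def replace_words(words, vocabulary, token="OOV"):
    """Return a new list in which every word found in `vocabulary` becomes `token`."""
    lookup = set(vocabulary)  # set membership tests are O(1) on average
    return [token if word in lookup else word for word in words]


# Hypothetical sample data, not taken from the question
list1 = ["the", "cat", "sat", "on", "the", "mat"]
list2 = ["cat", "mat"]

result = replace_words(list1, list2)
print(result)               # ['the', 'OOV', 'sat', 'on', 'the', 'OOV']
print(result.count("OOV"))  # 2
```

If the original list has to be changed in place, list1[:] = replace_words(list1, list2) does that without rebinding the name.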

Related

The second list is being filled over and over. What is wrong with this python code?

def firstnonrepeatingchar(str1):
    list1=list(str1)
    list2=[]
    print(list1)
    for ch in list1:
        if ch not in list2:
            a=list1.count(ch)
            list2.append(a)
    print(list2)
    for x in list2:
        if(x==1):
            print(list1[x+2])

string1="aaabccc"
firstnonrepeatingchar(string1)
The output shows list2 as
[3,3,3,1,3,3,3]
How can I make it just [3,1,3]?
You are getting this [3,3,3,1,3,3,3] because you are appending every time you encounter the character.
A better approach would be to use an OrderedSet, which does not allow duplicates and preserves insertion order.
from orderedset import OrderedSet

def firstnonrepeatingchar(str1):
    s = OrderedSet(str1)
    list2 = []
    for ch in s:
        list2.append(str1.count(ch))
    # or list2 = [str1.count(c) for c in s]
    print(list2)

string1="aaabccc"
firstnonrepeatingchar(string1)
Errors in your code:
if ch not in list2:
ch will never be in list2, because you never append ch to list2; you append the count.
Fix for your code:
def firstnonrepeatingchar(str1):
    list1 = []
    list2 = []
    for ch in str1:
        if ch not in list1:
            list1.append(ch)
            list2.append(str1.count(ch))
    print(list2)
That said, I do not recommend the check if ch not in list1: because it performs a linear search on every iteration. A set is a better fit for this problem.
In the if statement in the first loop, if ch not in list2:, you are checking whether the current character is in list2, but what you append to that list are the counts. The check therefore always passes, and a count is added for every character in the string. I would suggest using a dictionary to store characters together with their counts: the if statement can then check whether the key (the character) exists and, if not, add the key with its count. After that you can find the first non-repeating character as the first entry in the dictionary with a count of 1. Dictionaries are guaranteed to remember insertion order as of Python 3.7 (CPython 3.6 already does so as an implementation detail); on older versions use an OrderedDict.
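A minimal sketch of the dictionary approach described above, assuming Python 3.7 or later so that the dictionary keeps insertion order. The function names char_counts and first_non_repeating_char are made up for this example.

```python
def char_counts(text):
    """Map each character to its number of occurrences, in first-seen order."""
    counts = {}
    for ch in text:
        counts[ch] = counts.get(ch, 0) + 1
    return counts


def first_non_repeating_char(text):
    """Return the first character that occurs exactly once, or None if there is none."""
    for ch, count in char_counts(text).items():
        if count == 1:
            return ch
    return None


string1 = "aaabccc"
print(list(char_counts(string1).values()))  # [3, 1, 3]
print(first_non_repeating_char(string1))    # b
```

This counts each character in a single pass over the string instead of calling count() once per distinct character.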

Replace string in specific index in list of lists python

How can I replace a string in a list of lists in Python so that the change applies only to a specific index and does not affect the other indexes? Here is an example:
mylist = [["test_one", "test_two"], ["test_one", "test_two"]]
I want to change the word "test" to "my" so that only the second element of each sublist is affected:
mylist = [["test_one", "my_two"], ["test_one", "my_two"]]
I can figure out how to change both elements, but I can't figure out what to do if I only want to change one specific index.
Use indexing:
newlist = []
for l in mylist:
    l[1] = l[1].replace("test", "my")
    newlist.append(l)
print(newlist)
Or as a one-liner, if you always have exactly two elements in each sublist:
newlist = [[i, j.replace("test", "my")] for i, j in mylist]
print(newlist)
Output:
[['test_one', 'my_two'], ['test_one', 'my_two']]
There is a way to do this in one line, but it is not coming to me at the moment. Here is how to do it in two lines.
for two_word_list in mylist:
    two_word_list[1] = two_word_list[1].replace("test", "my")
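For sublists that do not always have exactly two elements, the same idea can be generalized. This is a sketch only: replace_at is a hypothetical helper name, and it returns a new list of lists rather than modifying mylist in place.

```python
def replace_at(rows, index, old, new):
    """Return a copy of `rows` where `old` is replaced by `new` only at position `index` of each row."""
    return [
        [item.replace(old, new) if pos == index else item
         for pos, item in enumerate(row)]
        for row in rows
    ]


mylist = [["test_one", "test_two"], ["test_one", "test_two"]]
newlist = replace_at(mylist, 1, "test", "my")
print(newlist)  # [['test_one', 'my_two'], ['test_one', 'my_two']]
print(mylist)   # [['test_one', 'test_two'], ['test_one', 'test_two']]
```

Rows shorter than the chosen index are simply copied unchanged, because no position in them ever equals index.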

Python trouble with matching tuples

For reference this is my code:
list1 = [('10.180.13.101', '10.50.60.30', 'STCMGMTUNIX01')]
list2 = [('0.0.0.0', 'STCMGMTUNIX01')]
matching_app = set()  # not shown in the original snippet; the text below calls it a set
for i in list1:
    for j in list2:
        for k in j:
            print (k)
            if k.upper() in i:
                matching_app.add(j)
for i in matching_app:
    print (i)
When I run it, it does not match. Each tuple can contain two or three values, and I need a tuple to be added to the matching_app set if ANY value from list2 equals ANY value from list1. It does not work unless the tuples are of equal length.
Any pointers on how to resolve this logic error would be appreciated.
You can solve this in a few different ways. Here are two approaches:
Looping:
list1 = [('10.180.13.101', '10.50.60.30', 'STCMGMTUNIX01')]
list2 = [('0.0.0.0', 'STCMGMTUNIX01')]
matches = []
for i in list1[0]:
    if i in list2[0]:
        matches.append(i)
print(matches)
#['STCMGMTUNIX01']
List comprehension with a set:
merged = list(list1[0] + list2[0])
matches2 = set([i for i in merged if merged.count(i) > 1])
print(matches2)
#{'STCMGMTUNIX01'}
It is not clear to me what you want to do. You have two lists, each containing exactly one tuple. There also seems to be a comma missing in the first tuple.
To check whether an item from one list is in another list, you can do:
list1 = ['10.180.13.101', '10.50.60.30', 'STCMGMTUNIX01']
list2 = ['0.0.0.0', 'STCMGMTUNIX01']
for item in list2:
    if item.upper() in list1:  # Check if item is in list
        print(item, 'found in', list1)
Works the same way with tuples.
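When each list holds several tuples of different lengths, a set intersection is one way to express "any value of one tuple equals any value of the other". The following is a sketch under that reading of the question: find_matches is a made-up name, and the second tuple in list2 ('otherhost') is hypothetical data added to show a non-match.

```python
def find_matches(left, right):
    """Return the tuples from `right` that share at least one value (case-insensitive) with any tuple in `left`."""
    matching = set()
    for a in left:
        a_values = {value.upper() for value in a}
        for b in right:
            # A non-empty intersection means at least one value is shared
            if a_values & {value.upper() for value in b}:
                matching.add(b)
    return matching


list1 = [('10.180.13.101', '10.50.60.30', 'STCMGMTUNIX01')]
list2 = [('0.0.0.0', 'STCMGMTUNIX01'), ('0.0.0.0', 'otherhost')]

matching_app = find_matches(list1, list2)
print(matching_app)  # {('0.0.0.0', 'STCMGMTUNIX01')}
```

Because sets ignore length and position, tuples with two values and tuples with three values are compared the same way.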

getting rid of proper nouns in a nested list python

I'm trying to write a program that takes in a nested list and returns a new list with the proper nouns taken out.
Here is an example:
L = [['The', 'name', 'is', 'James'], ['Where', 'is', 'the', 'treasure'], ['Bond', 'cackled', 'insanely']]
I want to return:
['the', 'name', 'is', 'is', 'the', 'treasure', 'cackled', 'insanely']
Take note that 'where' is deleted. That is OK since it does not appear anywhere else in the nested list. Each nested list is a sentence. My approach is to append the first element of every nested list to a newList. Then I check whether the elements in newList appear in the nested list, lowercasing the elements of newList for the comparison. I'm halfway done with this program, but I'm running into an error when I try to remove the element from newList at the end. Once I get the updated list, I want to delete the items from the nested list that are in newList. Lastly, I'd append all the items in the nested list to a newerList and lowercase them. That should do it.
If someone has a more efficient approach I'd gladly listen.
def lowerCaseFirst(L):
    newList = []
    for nestedList in L:
        newList.append(nestedList[0])
    print newList
    for firstWord in newList:
        sum = 0
        firstWord = firstWord.lower()
        for nestedList in L:
            for word in nestedList[1:]:
                if firstWord == word:
                    print "yes"
                    sum = sum + 1
        print newList
        if sum >= 1:
            firstWord = firstWord.upper()
            newList.remove(firstWord)
    return newList
Note this code is not finished due to the error in the second to last line
Here is with the newerList (updatedNewList):
def lowerCaseFirst(L):
    newList = []
    for nestedList in L:
        newList.append(nestedList[0])
    print newList
    updatedNewList = newList
    for firstWord in newList:
        sum = 0
        firstWord = firstWord.lower()
        for nestedList in L:
            for word in nestedList[1:]:
                if firstWord == word:
                    print "yes"
                    sum = sum + 1
        print newList
        if sum >= 1:
            firstWord = firstWord.upper()
            updatedNewList.remove(firstWord)
    return updatedNewList
error message:
Traceback (most recent call last):
  File "/Applications/WingIDE.app/Contents/MacOS/src/debug/tserver/_sandbox.py", line 1, in <module>
    # Used internally for debug sandbox under external interpreter
  File "/Applications/WingIDE.app/Contents/MacOS/src/debug/tserver/_sandbox.py", line 80, in lowerCaseFirst
ValueError: list.remove(x): x not in list
The error in your first function occurs because you try to remove an uppercased version of firstWord from newList, which contains no fully uppercased words (you can see that from the printout). Remember that you store the upper/lowercased version of a word in a new variable; that does not change the contents of the original list.
I still don't understand your approach. As you describe your task, you want to do two things: 1) flatten a list of lists into a list of elements (always an interesting programming exercise) and 2) remove proper nouns from this list. This means that you have to decide what counts as a proper noun. You could do that in a rudimentary way (all capitalized words that do not start a sentence, or an exhaustive list), or you could use a POS tagger (see: Finding Proper Nouns using NLTK WordNet). Unless I misunderstand your task completely, you needn't worry about the casing here.
The first task can be solved in many ways. Here is a nice way that illustrates well what actually happens in the simple case where your list L is a list of lists (and not lists that can be nested to arbitrary depth):
def flatten(L):
    newList = []
    for sublist in L:
        for elm in sublist:
            newList.append(elm)
    return newList
You could turn this function into flattenAndFilter(L) by checking each element like this:
PN = ['James', 'Bond']
def flattenAndFilter(L):
    newList = []
    for sublist in L:
        for elm in sublist:
            if elm not in PN:
                newList.append(elm)
    return newList
You might not have such a nice list of PNs, though; in that case you would have to expand the check, for instance by parsing the sentence and checking the POS tags.
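Without a list of proper nouns or a POS tagger, the rudimentary rule from the question can be sketched as follows. This is only a heuristic under stated assumptions: a capitalized word that does not start a sentence is treated as a proper noun, and a sentence-initial word is kept (lowercased) only if its lowercase form also occurs somewhere else. The function name drop_proper_nouns is made up, and the sketch is written for Python 3.

```python
def drop_proper_nouns(sentences):
    """Flatten `sentences` and drop words that look like proper nouns (capitalization heuristic)."""
    # Lowercase words seen anywhere except at the start of a sentence
    known_lower = {word for sentence in sentences
                   for word in sentence[1:] if word.islower()}
    result = []
    for sentence in sentences:
        for pos, word in enumerate(sentence):
            if pos == 0:
                # Sentence-initial word: keep only if it is known in lowercase
                if word.lower() in known_lower:
                    result.append(word.lower())
            elif word.islower():
                # Capitalized words elsewhere are assumed to be proper nouns
                result.append(word)
    return result


L = [['The', 'name', 'is', 'James'],
     ['Where', 'is', 'the', 'treasure'],
     ['Bond', 'cackled', 'insanely']]
print(drop_proper_nouns(L))
# ['the', 'name', 'is', 'is', 'the', 'treasure', 'cackled', 'insanely']
```

As in the question, 'Where' is dropped because 'where' never appears in lowercase, which shows the limits of the heuristic.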

Python append adds an item to every variable

I'm iterating on a list and attempting to create sublists from its items. Every time I append to a variable, the value is added to every other variable that I have defined. I've stripped down the code substantially to illustrate.
item = 'things.separated.by.periods'.split('.')
list1 = list2 = []
i = item.pop(0)
print i
list1.append(i)
i = item.pop(0)
print i
list2.append(i)
print(item, list1, list2)
Returns:
things
separated
(['by', 'periods'], ['things', 'separated'], ['things', 'separated'])
What I expected:
things
separated
(['by', 'periods'], ['things'], ['separated'])
I think this may already be answered in another question, but I'm not sure how to apply that fix to my circumstances. Thanks in advance!
The problem is the line
list1 = list2 = []
This makes list1 and list2 refer to the exact same list, so that if you append an item to one you also append it to the other. Change it to
list1 = []
list2 = []
list1 = list2 = []
This sets list1 and list2 to the exact same list object, so both names refer to one list and every append shows up under both names.
To fix this, try something like this:
list1, list2 = [], []
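The difference between the two forms can be checked with the is operator, which tests whether two names refer to the same object. A small illustration in Python 3 (the names shared_a, shared_b, separate_a and separate_b are made up for this example):

```python
# Chained assignment: both names are bound to one list object
shared_a = shared_b = []
shared_a.append("things")
print(shared_a is shared_b)  # True
print(shared_b)              # ['things']

# Two list displays: each name gets its own list object
separate_a, separate_b = [], []
separate_a.append("things")
print(separate_a is separate_b)  # False
print(separate_b)                # []
```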
