I wrote some code that handles str data:
def characters(self, content):
    self.contentText = content.split()
    # self.contentText is a list here
I pass the self.contentText list to another module like this:
self.contentText = Formatter.formatter(self.contentText)
In that method, I have the following code:
remArticles = {' a ':'', ' the ':'', ' and ':'', ' an ':'', '&nbsp;':''}
contentText = [i for i in contentText if i not in remArticles.keys()]
But nothing gets removed. Should remArticles be a list rather than a dict?
I tried a list too, and it still would not remove anything.
Of course, with a list the code becomes:
contentText = [i for i in contentText if i not in remArticles]
This is a continuation of Accessing Python List Type.
Initially I was trying:
for i in remArticles:
    print type(contentText)
    print "1"
    contentText = contentText.replace(i, remArticles[i])
    print type(contentText)
But that threw errors:
contentText = contentText.replace(i, remArticles[i])
AttributeError: 'list' object has no attribute 'replace'
Your question is not clear but if your goal is to convert a string to a list, remove unwanted words, and then turn the list back into a string, then you can do it like this:
def clean_string(s):
    words_to_remove = ['a', 'the', 'and', 'an', '&nbsp;']
    list_of_words = s.split()
    cleaned_list = [word for word in list_of_words if word not in words_to_remove]
    new_string = ' '.join(cleaned_list)
    return new_string
This is how you could do the same without converting to a list:
def clean_string(s):
    words_to_remove = ['a', 'the', 'and', 'an', '&nbsp;']
    for word in words_to_remove:
        s = s.replace(word, '')
    return s
And if you wanted more flexibility in removing some words but replacing others, you could do the following with a dictionary:
def clean_string(s):
    words_to_replace = {'a': '', 'the': '', 'and': '&', 'an': '', '&nbsp;': ' '}
    for old, new in words_to_replace.items():
        s = s.replace(old, new)
    return s
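One caveat with the two str.replace versions above: replace works on substrings, so removing 'a' also removes the letter a inside other words. A possible alternative is sketched here with re.sub and word boundaries. The function name and the default word list are illustrative assumptions, not part of the original answer.

```python
import re

def remove_whole_words(s, words_to_remove=('a', 'the', 'and', 'an')):
    # \b anchors each match to word boundaries, so 'a' will not match inside 'cat'.
    pattern = r'\b(?:' + '|'.join(re.escape(w) for w in words_to_remove) + r')\b'
    without_words = re.sub(pattern, '', s)
    # Collapse the runs of whitespace left behind by the removed words.
    return ' '.join(without_words.split())
```

For comparison, 'a banana'.replace('a', '') strips every a in the string, while remove_whole_words('a banana') only drops the standalone article.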
Your problem is that the keys of your dict contain spaces, while the words produced by split() do not. The following code solves your problem:
[i for i in contentText if i not in map(lambda x: x.strip(), remArticles.keys())]
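To see why the original filter removed nothing, it helps to look at what split() actually produces. This is a minimal sketch, assuming a made-up sample sentence; the names sample, padded_keys and stop_words are illustrative and do not come from the question.

```python
# Hypothetical sample input; the real contentText comes from the parser.
sample = "the cat and a dog"
tokens = sample.split()          # split() never leaves spaces inside tokens

padded_keys = {' a ': '', ' the ': '', ' and ': ''}
# No token equals ' a ' (with the surrounding spaces), so nothing is filtered out.
unchanged = [t for t in tokens if t not in padded_keys]

# Stripping the keys once, up front, gives a set that the tokens can match.
stop_words = {key.strip() for key in padded_keys}
filtered = [t for t in tokens if t not in stop_words]
```

Building the stripped set once also avoids re-running map() for every element, which the one-liner above does.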
Related
I am trying to remove '' (empty strings) from a list, and it gives me IndexError: list index out of range.
I have tried new_string.remove('') as well.
def func_name(new_string):
    for k in range(len(new_string)):
        if new_string[k] == '':
            new_string = new_string[:k] + new_string[k+1:]
    return new_string
print func_name(['Title','','This','is','a','link','.'])
print func_name(['','','Hello','World!','',''])
print func_name(['',''])
You should try a more pythonic way, like this:
def func_name(new_string):
    return [s for s in new_string if s != '']
print func_name(['Title', '', 'This', 'is', 'a', 'link', '.'])
print func_name(['', '', 'Hello', 'World!', '', ''])
print func_name(['', ''])
Your problem is that the list is shortened, but the indices are still iterated as if it's the original list.
There are better ways of implementing this (see other answers), but to fix your implementation, you can iterate in reverse order thus avoiding the "overflow":
def remove_empty(s):
    for k in reversed(range(len(s))):
        if s[k] == '':
            del s[k]
This only works if you remove at most one element per iteration.
Note that this mutates s in place.
>>> a = ['Title','','This','is','a','link','.']
>>> remove_empty(a)
>>> a
['Title', 'This', 'is', 'a', 'link', '.']
Here is an example:
x = ["don't", '', 13]
while '' in x:
    x.remove('')
assert x == ["don't", 13]
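The while/remove loop rescans the list on every pass. If in-place mutation is what you need, one alternative is slice assignment, sketched here. The variable alias is only there to illustrate that the same list object is kept.

```python
x = ["don't", '', 13, '']
alias = x                       # second reference to the same list object

# Slice assignment replaces the contents without creating a new list object,
# so every reference to the list sees the change.
x[:] = [item for item in x if item != '']
```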
I am getting multiple lists as output from a function. I want to combine them all into a single list. Please help.
def words(*args):
    word =[args]
    tokens = nltk.wordpunct_tokenize(''.join(word))
    for word in tokens:
        final = wn.synsets(word)
        synonyms = set()
        for synset in final:
            for synwords in synset.lemma_names:
                synonyms.add(synwords)
        final = list(synonyms)
        dic = dict(zip(word,final))
        dic[word] = final
        return final
Use this wherever you call the words function (which returns a list of lists, according to your question):
wordlist = [word for sublist in words(args) for word in sublist]
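For reference, the same flattening can also be written with itertools.chain.from_iterable. This is a small sketch with made-up data standing in for the function's output; list_of_lists is a hypothetical name.

```python
from itertools import chain

# Hypothetical list of lists, standing in for whatever produces the sublists.
list_of_lists = [['good', 'well'], ['day'], [], ['test', 'trial']]

# chain.from_iterable walks each sublist in turn and yields its items.
flat = list(chain.from_iterable(list_of_lists))

# The nested list comprehension gives the same result.
same = [word for sublist in list_of_lists for word in sublist]
```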
Once your code is corrected, I cannot find any case where words() returns anything other than a flat list. If some input reproduces the problem, please update your question with it.
The correction is to pass args directly into join() rather than wrapping it in a list.
def words(*args):
    tokens = nltk.wordpunct_tokenize(''.join(args))
    for word in tokens:
        final = wn.synsets(word)
        synonyms = set()
        for synset in final:
            for synwords in synset.lemma_names:
                synonyms.add(synwords)
        final = list(synonyms)
        dic = dict(zip(word,final))
        dic[word] = final
        return final
>>> words('good day', ', this is a', ' test!', 'the end.')
['beneficial', 'right', 'secure', 'just', 'unspoilt', 'respectable', 'good', 'goodness', 'dear', 'salutary', 'ripe', 'expert', 'skillful', 'in_force', 'proficient', 'unspoiled', 'dependable', 'soundly', 'honorable', 'full', 'undecomposed', 'safe', 'adept', 'upright', 'trade_good', 'sound', 'in_effect', 'practiced', 'effective', 'commodity', 'estimable', 'well', 'honest', 'near', 'skilful', 'thoroughly', 'serious']
If I have a list of strings:
common = ['the','in','a','for','is']
and I have a sentence broken up into a list:
lst = ['the', 'man', 'is', 'in', 'the', 'barrel']
how can I compare the two and, if there are any words in common, print the full string again as a title? I have part of it working, but my end result prints out the newly changed common words as well as the originals.
new_title = lst.pop(0).title()
for word in lst:
    for word2 in common:
        if word == word2:
            new_title = new_title + ' ' + word
    new_title = new_title + ' ' + word.title()
print(new_title)
output:
The Man is Is in In the The Barrel
So I'm trying to get it so that the lowercase words in common stay in the new sentence, without the originals and without being changed to title case.
>>> new_title = ' '.join(w.title() if w not in common else w for w in lst)
>>> new_title = new_title[0].capitalize() + new_title[1:]
'The Man Is in the Barrel'
If all you’re trying to do is to see whether any of the elements of lst appear in common, you can do
>>> common = ['the','in','a','for']
>>> lst = ['the', 'man', 'is', 'in', 'the', 'barrel']
>>> list(set(common).intersection(lst))
['the', 'in']
and just check to see whether the resulting list has any elements in it.
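If only a yes/no answer is needed, there is no need to build the intersection list at all. A minimal sketch using set.isdisjoint, with has_common_word and shared as illustrative names:

```python
common = ['the', 'in', 'a', 'for']
lst = ['the', 'man', 'is', 'in', 'the', 'barrel']

# isdisjoint returns True when the two collections share no elements.
has_common_word = not set(common).isdisjoint(lst)

# Sorting makes the result order predictable; plain set order is arbitrary.
shared = sorted(set(common).intersection(lst))
```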
If you want the words in common to stay lowercase and all of the other words to be title-cased, do something like this:
def title_case(words):
    common = {'the','in','a','for'}
    partial = ' '.join(word.title() if word not in common else word for word in words)
    return partial[0].capitalize() + partial[1:]

words = ['the', 'man', 'is', 'in', 'the', 'barrel']
title_case(words) # gives "The Man Is in the Barrel"
If I have a string and want to return a word that includes a whitespace how would it be done?
For example, I have:
line = 'This is a group of words that include #this and #that but not ME ME'
response = [ word for word in line.split() if word.startswith("#") or word.startswith('#') or word.startswith('ME ')]
print response
['#this', '#that', 'ME']
So ME ME does not get printed because of the whitespace.
Thanks
You could just keep it simple:
line = 'This is a group of words that include #this and #that but not ME ME'
words = line.split()
result = []
pos = 0
try:
    while True:
        if words[pos].startswith(('#', '#')):
            result.append(words[pos])
            pos += 1
        elif words[pos] == 'ME':
            result.append('ME ' + words[pos + 1])
            pos += 2
        else:
            pos += 1
except IndexError:
    pass
print result
Think about speed only if it proves to be too slow in practice.
From the Python documentation:
string.split(s[, sep[, maxsplit]]): Return a list of the words of the string s. If the optional second argument sep is absent or None, the words are separated by arbitrary strings of whitespace characters (space, tab, newline, return, formfeed).
So the problem starts with the call to split(): it breaks the string on every space, so 'ME ME' becomes two separate words:
print line.split()
['This', 'is', 'a', 'group', 'of', 'words', 'that', 'include', '#this', 'and', '#that', 'but', 'not', 'ME', 'ME']
I recommend using the re module to split the string instead; see re.split(pattern, string, maxsplit=0, flags=0).
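As a rough sketch of the regular-expression route, re.findall can pull out the wanted pieces directly instead of splitting first. The pattern below is an assumption about what should count as a match (hash-prefixed words and the literal phrase 'ME ME'); adjust it to the real requirement.

```python
import re

line = 'This is a group of words that include #this and #that but not ME ME'

# '#\w+' matches a hash followed by word characters; 'ME ME' matches the
# two-word phrase, embedded space included.
response = re.findall(r'#\w+|ME ME', line)
```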
Hi guys, I'm a programming newbie trying to improve the procedure below so that when I pass it these arguments: split_string("After the flood ... all the colors came out.", " .") it returns this:
['After', 'the', 'flood', 'all', 'the', 'colors', 'came', 'out']
and not this:
['After', 'the', 'flood', '', '', '', '', 'all', 'the', 'colors', 'came', 'out', '']
Any hints on how to do this? (I could just iterate over the list again and delete the '' elements, but I wanted a more elegant solution.)
This is the procedure:
def split_string(source, separatorList):
    splited = [source]
    for separator in separatorList:
        source = splited
        splited = []
        print 'separator= ', separator
        for sequence in source:
            print 'sequence = ', sequence
            if sequence not in separatorList and sequence != ' ':
                splited = splited + sequence.split(separator)
    return splited
print split_string("This is a test-of the,string separation-code!", " ,!-")
print
print split_string("After the flood ... all the colors came out."," .")
You can filter out the empty strings in the return statement:
    return [x for x in splited if x]
As a side note, I think it would be easier to write your function based on re.split():
import re

def split_string(s, separators):
    pattern = "|".join(re.escape(sep) for sep in separators)
    return [x for x in re.split(pattern, s) if x]
print re.split('[. ]+', 'After the flood ... all the colors came out.')
or, better, the other way round
print re.findall('[^. ]+', 'After the flood ... all the colors came out.')
Let's first see where the empty strings come from. Try executing this in a shell:
>>> 'After  the'.split(' ')
result:
['After', '', 'the']
This happens because when the split method reaches the two consecutive spaces in the string, it finds nothing but '' between them.
So the solution is simple: just check the boolean value of every item returned by .split():
def split_string(source, separatorList):
    splited = [source]
    for separator in separatorList:
        # swapping the two variables on one line makes the code clearer
        source, splited = splited, []
        for sequence in source:
            # there's no need to check `sequence` in advance, just split it
            # if sequence not in separatorList and sequence != ' ':
            #     splited = splited + sequence.split(separator)
            # the `if i` check in the list comprehension keeps `''` out of the result
            # `+=` is equivalent to `= splited +`
            splited += [i for i in sequence.split(separator) if i]
    return splited
For more details about [i for i in a_list if i], see PEP 202.
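The difference between split(' ') and split() with no argument may be worth seeing side by side, since the no-argument form avoids the empty strings when whitespace is the only separator. A short sketch with illustrative variable names:

```python
text = 'After  the flood'        # note the two spaces after 'After'

# An explicit separator yields an empty string between adjacent separators.
with_separator = text.split(' ')

# With no argument, runs of whitespace count as one separator and
# no empty strings are produced.
without_separator = text.split()
```

This only helps for whitespace; for other separators such as '.', the filtering shown above is still needed.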