Remove a substr from string items in a list - python

list = [ 'u'adc', 'u'toto', 'u'tomato', ...]
What I want is to end up with a list of the kind:
list2 = [ 'adc', 'toto', 'tomato'... ]
Can you please tell me how to do that without using regex?
I'm trying:
for item in list:
    list.extend(str(item).replace("u'",''))
    list.remove(item)
but this ends up giving something of the form [ 'a', 'd', 'd', 'm'...]
In the list I may have an arbitrary number of strings.

In Python 2, you can encode it to "utf-8" like this:
list_a = [u'adc', u'toto', u'tomato']
list_b = list()
for i in list_a:
    list_b.append(i.encode("utf-8"))
list_b
Output:
['adc', 'toto', 'tomato']
Or you can use the str function:
list_c = list()
for i in list_a:
    list_c.append(str(i))
list_c
Output:
['adc', 'toto', 'tomato']

Use "u\'"
For example:
l = ["u'adc", "u'toto", "u'tomato"]
for item in l:
    print(item.replace("u\'", ""))
Will output:
adc
toto
tomato
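If the strings really do begin with the two literal characters u' (rather than being unicode strings whose repr merely shows a u prefix), replace() will also delete any later occurrence of u' inside a string. A minimal sketch that strips only a leading prefix could look like this; the helper name strip_prefix is made up for illustration.

```python
def strip_prefix(text, prefix="u'"):
    """Return text without a leading prefix; later occurrences are kept."""
    if text.startswith(prefix):
        return text[len(prefix):]
    return text


items = ["u'adc", "u'toto", "u'tomato"]
cleaned = [strip_prefix(item) for item in items]
print(cleaned)  # ['adc', 'toto', 'tomato']
```

A string such as "menu'" is left untouched by this helper, whereas replace("u'", "") would shorten it to "men".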

I tried the code from your question, but it raises a syntax error, which means the strings in the list are not declared properly. I have corrected that at input line 2 (In [2]) below.
In [1]: list = [ 'u'adc', 'u'toto', 'u'tomato']
File "<ipython-input-1-2c6e581e868e>", line 1
list = [ 'u'adc', 'u'toto', 'u'tomato']
^
SyntaxError: invalid syntax
In [2]: list = [ u'adc', u'toto', u'tomato']
In [3]: list = [ str(item) for item in list ]
In [4]: list
Out[4]: ['adc', 'toto', 'tomato']
In [5]:

Solution-1
input_list = [ u'adc', u'toto', u'tomato']
output_list=map(lambda x:str(x),input_list )
print output_list
The output looks like:
['adc', 'toto', 'tomato']
Solution-2
input_list = [ u'adc', u'toto', u'tomato']
output_list=map(lambda x:x.encode("utf-8"),input_list )
print output_list
The output looks like:
['adc', 'toto', 'tomato']

Try this:
for i, item in enumerate(list):
    list[i] = item.replace('u', '')
This goes through every item in the list and checks for the string 'u'. Wherever 'u' is found, the code replaces it with an empty string, essentially deleting it. Because strings are immutable, each item is replaced by a new string rather than changed character by character. Note that this removes every 'u', not only a leading one. The same call can remove combinations of letters ('abc', etc.).

Your input is nothing but JSON! You can dump each item in the list (each of which is JSON) to get the desired output.
Since the dumped output comes with quotes, you need to strip the leading and trailing ones.
import json
list = [ u'adc', u'toto', u'tomato']
print [json.dumps(i).strip('\"') for i in list]
Output:
['adc', 'toto', 'tomato']
Hope it helps!

Related

Is there a way to change this dictionary output?

I want to change my dictionary output. This is the initial code:
bow=[[i for i in all_docs[j] if i not in stopwords] for j in range(n_docs)]
bow=list(filter(None,bow))
bow
Here is the output of bow:
[['lunar',
'satellite',
'needs'],
['glad',
'see',
'griffin'] ]
worddict_two = [ (i,key) for i,key in enumerate(bow)]
worddict_two
I want to go from this output:
[(0,
['lunar',
'satellite',
'needs']),
(1,
['glad',
'see',
'griffin'])]
to this output:
[(0,'lunar satellite needs'),
(1,'glad see griffin')]
worddict_two = [ (i, " ".join(key)) for i,key in enumerate(bow)]
This would work. Use join to combine all items of each list into a single string, with a space as the separator.
You can just join the list with spaces like so:
worddict_two = [ (i,' '.join(key)) for i,key in enumerate(bow)]
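You can do it like this (note that this unpacks the words into separate tuple elements rather than joining them into one string):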
bow = [
['lunar','satellite','needs'],
['glad','see','griffin']
]
res = [(i,*key) for i,key in enumerate(bow)]
print(res)
Try this:
word_three = [(item[0], ', '.join(word for word in item[1])) for item in worddict_two]
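To make the join-based answers concrete, here is a small self-contained sketch using the sample data from the question. enumerate supplies the index, and ' '.join merges each word list into one string.

```python
bow = [
    ['lunar', 'satellite', 'needs'],
    ['glad', 'see', 'griffin'],
]

# Pair each document index with its words joined by single spaces.
worddict_two = [(i, ' '.join(words)) for i, words in enumerate(bow)]
print(worddict_two)
# [(0, 'lunar satellite needs'), (1, 'glad see griffin')]
```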

Removing specific set of characters in a list of strings

I have a list of strings and want to use another list of strings to remove every occurrence of the bad strings from my list, so that the output of the code below would be foo, bar, foobar, foofoo. So far I have tried a few things, for example:
mylist = ['foo!', 'bar\\n', 'foobar!!??!!', 'foofoo::!*']
remove_list = ['\\n', '!', '*', '?', ':']
for remove in remove_list:
    for strings in mylist:
        strings = strings.replace(bad, ' ')
The above code doesn't work. At one point I assigned the result to a new variable and appended it afterwards, but that didn't work well, because if there were two issues in a string it would be appended twice.
You changed the temporary variable, not the original list. Instead, assign the result back into mylist
for bad in remove_list:
    for pos, string in enumerate(mylist):
        mylist[pos] = string.replace(bad, ' ')
Try this:
mylist = ['foo!', 'bar\\n', 'foobar!!??!!', 'foofoo::!*']
bads = ['\\n', '!', '*', '?', ':']
result = []
for s in mylist:
    # s is a temporary copy
    for bad in bads:
        s = s.replace(bad, '')  # remove every occurrence of bad
    result.append(s)
print(result)
This could be written more concisely, but this way it is easier to understand.
I had a hard time interpreting the question, but I see you have the result desired at the top of your question.
mylist = ['foo!', 'bar\\n', 'foobar!!??!!', 'foofoo::!*']
remove_list = ['\\n', '!', '*', '?', ':']
output = []
for strings in mylist:
    for remove in remove_list:
        strings = strings.replace(remove, '')
    output.append(strings)
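import re
# Assumed pattern: the original snippet did not show how regex was defined.
regex = re.compile('[^a-zA-Z]')
for list1 in mylist:
    t = regex.sub('', list1)
    print(t)
If you just want to get rid of non-letter characters, do this. It works a lot better than comparing two separate lists.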
Why not have regex do the work for you? No nested loops this way (just make sure to escape correctly):
import re
mylist = ['foo!', 'bar\\n', 'foobar!!??!!', 'foofoo::!*']
remove_list = [r'\\n', r'\!', r'\*', r'\?', ':']
removals = re.compile('|'.join(remove_list))
print([removals.sub('', s) for s in mylist])
['foo', 'bar', 'foobar', 'foofoo']
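As a variation on the regex approach, the escaping can be left to re.escape so that the removal list stays identical to the one in the question. This is a sketch under that assumption, not the answerer's own code.

```python
import re

mylist = ['foo!', 'bar\\n', 'foobar!!??!!', 'foofoo::!*']
remove_list = ['\\n', '!', '*', '?', ':']

# re.escape makes every entry match literally, so no manual backslashes are needed.
pattern = re.compile('|'.join(re.escape(bad) for bad in remove_list))
cleaned = [pattern.sub('', s) for s in mylist]
print(cleaned)  # ['foo', 'bar', 'foobar', 'foofoo']
```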
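Another solution is to use a list comprehension to remove the characters you want and then delete duplicates. Note that, as written, each word is paired with each bad string separately, so the result also contains strings that are only partly cleaned.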
list_good = [word.replace(bad, '') for word in mylist for bad in remove_list]
list_good = list(set(list_good))
my_list = ["foo!", "bar\\n", "foobar!!??!!", "foofoo::*!"]
to_remove = ["!", "\\n", "?", ":", "*"]
for index, item in enumerate(my_list):
    for char in to_remove:
        if char in item:
            item = item.replace(char, "")
    my_list[index] = item
print(my_list)  # outputs ['foo', 'bar', 'foobar', 'foofoo']

Find List2 in List1 at starting Position

List1 = ['RELEASE', 'KM123', 'MOTOR', 'XS4501', 'NAME']
List2 = ['KM', 'XS', 'M']
Right now I am using code that finds List2 entries at any position in the List1 strings.
Result = [s for s in List1 if any(xs in s for xs in List2)]
Output:
['KM123', 'MOTOR', 'XS4501', 'NAME']
But I don't want 'NAME' in the list, because its 'M' is not at the start. Any help?
Use str.startswith() which checks if a string starts with a particular sequence of characters:
[s for s in List1 if any(s.startswith(xs) for xs in List2)]
Looks like you can use str.startswith
Ex:
List1 = ['RELEASE', 'KM123', 'MOTOR', 'XS4501', 'NAME']
List2 = ('KM', 'XS', 'M') #convert to tuple
result = [ s for s in List1 if s.startswith(List2)]
print(result) #-->['KM123', 'MOTOR', 'XS4501']

inplace removal of quotes from list of list

I changed this list
orig_list=['"jason","hello1,hello2,hello3","somegroup2","bundle1","loc1"', '"ruby","hello","somegroup","bundle2","loc2"', '"sam","hello3,hello2","somegroup3,somegroup4","bundle2","loc3"']
new_list=[x.split(",") for x in orig_list]
new_list=[['"jason"', '"hello1', 'hello2', 'hello3"', '"somegroup2"', '"bundle1"', '"loc1"'], ['"ruby"', '"hello"', '"somegroup"', '"bundle2"', '"loc2"'], ['"sam"', '"hello3', 'hello2"', '"somegroup3', 'somegroup4"', '"bundle2"', '"loc3"']]
What I intend to get is:
[['jason', 'hello1,hello2,hello3', 'somegroup2', 'bundle1', 'loc1'], ['ruby', 'hello', 'somegroup', 'bundle2', 'loc2'], ['sam', 'hello3,hello2', 'somegroup3,somegroup4', 'bundle2', 'loc3']]
Is it possible to do it inplace and not creating a new one?
Update: some elements may be in double quotes, all of them may be, or none of them may be, and the same applies to single quotes.
Instead of splitting on , split on ",":
new_list=[[l.replace('"','') for l in x.split('","')] for x in orig_list]
new_list
Out[99]: [['jason', 'hello1,hello2,hello3', 'somegroup2', 'bundle1', 'loc1'], ['ruby', 'hello', 'somegroup', 'bundle2', 'loc2'], ['sam', 'hello3,hello2', 'somegroup3,somegroup4', 'bundle2', 'loc3']]
If you need to remove the quotes in place, assign the list comprehension to the slice orig_list[:] instead of to the name:
orig_list = ['"jason","hello1,hello2,hello3","somegroup2","bundle1","loc1"', '"ruby","hello","somegroup","bundle2","loc2"', '"sam","hello3,hello2","somegroup3,somegroup4","bundle2","loc3"']
id1 = id(orig_list)
orig_list[:] = [w for w in orig_list]
orig_list[:] = [g.replace('"', "'") for g in orig_list]
orig_list[:] = [h.split("',") for h in orig_list]
orig_list[:] = [[j.replace("'", '') for j in k] for k in orig_list]
id2 = id(orig_list)
print(id1 == id2) # True
print(orig_list) # [['jason', 'hello1,hello2,hello3', 'somegroup2', 'bundle1', 'loc1'], ['ruby', 'hello', 'somegroup', 'bundle2', 'loc2'], ['sam', 'hello3,hello2', 'somegroup3,somegroup4', 'bundle2', 'loc3']]
Note the orig_list[:] = .... Slice assignment replaces the contents of the existing list object instead of binding the name to a new list, which is what makes the change in-place.
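The difference between slice assignment and plain assignment can be checked directly with a second reference to the same list. The sketch below uses shortened, made-up data and the illustrative names alias and rebound.

```python
orig_list = ['"a","b"', '"c","d"']
alias = orig_list  # second name for the same list object

# Slice assignment swaps the contents but keeps the list object.
orig_list[:] = [s.replace('"', '').split(',') for s in orig_list]
same_object = alias is orig_list  # True

# Plain assignment binds the name to a brand-new list instead.
rebound = orig_list
rebound = [row for row in rebound]
new_object = rebound is orig_list  # False

print(alias)  # [['a', 'b'], ['c', 'd']]
```

Because alias still sees the cleaned data, any other code holding a reference to the list sees the update as well.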
Valid list, preserving grouping of grouped elements
Use the reader function from the csv module:
from csv import reader
orig_list=['"jason","hello1,hello2,hello3","somegroup2","bundle1","loc1"', '"ruby","hello","somegroup","bundle2","loc2"', '"sam","hello3,hello2","somegroup3,somegroup4","bundle2","loc3"']
new_list = []
for line in reader(orig_list):
new_list.append(line)
This outputs the results you requested:
[['jason', 'hello1,hello2,hello3', 'somegroup2', 'bundle1', 'loc1'], ['ruby', 'hello', 'somegroup', 'bundle2', 'loc2'], ['sam', 'hello3,hello2', 'somegroup3,somegroup4', 'bundle2', 'loc3']]
Ungroup all elements
If you want to ungroup all the comma-delimited elements, you can convert the list to a string and then split it:
orig_list2=['jason,"hello1,hello2,hello3",somegroup2,bundle1,loc1', 'ruby,hello,somegroup,bundle2,loc2', 'sam','hello3,hello2',"somegroup3,somegroup4","bundle2",'loc3']
orig_list2 = str(orig_list2)
# list of characters to remove
bad_chars = ['\'','"','[',']',' ']
for c in bad_chars:
    orig_list2 = orig_list2.replace(c,'')
# put into a list
new_list2 = orig_list2.split(',')
If you're dealing with a string that looks like a list but is invalid because some quotes are not in complete pairs, like the example you left in a comment for JohnZ, you can also use this method, and you wouldn't need to convert it to a string first.

List of strings in a list

I have multiple lists of strings inside a list. I want to change the strings that are digits into integers.
For example:
L1=[['123','string','list'],['words','python','456'],['code','678','links']]
What I want is:
[[123,'string','list'],['words','python',456],['code',678,'links']]
I have tried using:
W=range(len(L1))
Q=range(2)
if (L1[W][Q]).isdigit():
    (L1[W][Q])=(int(L1[W][Q]))
When I tried the above code, I got an error.
Use str.isdigit():
L1=[['123','string','list'],['words','python','456'],['code','678','links']]
for item in L1:
    for i in range(0,len(item)):
        if(item[i].isdigit()):
            item[i] = int(item[i])
print(L1)
Something like this:
>>> mylist = [['123','string','list'], ['words','python','456'], ['code','678','links']]
>>> [ [(int(item) if item.isdigit() else item) for item in sublist] for sublist in mylist]
[[123, 'string', 'list'], ['words', 'python', 456], ['code', 678, 'links']]
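One caveat with str.isdigit() is that it returns False for negative numbers such as '-456', so those stay strings. A hedged alternative is to attempt the conversion and fall back to the original string. The helper name to_int_if_possible is made up for illustration, and the negative value is added to the sample data to show the difference. Note that int() also accepts a few forms that isdigit() rejects, such as surrounding whitespace.

```python
def to_int_if_possible(value):
    """Return int(value) when the string parses as an integer, else value."""
    try:
        return int(value)
    except ValueError:
        return value


L1 = [['123', 'string', 'list'], ['words', 'python', '-456'], ['code', '678', 'links']]
converted = [[to_int_if_possible(item) for item in sublist] for sublist in L1]
print(converted)
# [[123, 'string', 'list'], ['words', 'python', -456], ['code', 678, 'links']]
```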
