I have a list of lists and I want to remove the inner double quotes from each line.
Initially it was like this:
[['"MILK,BREAD,BISCUIT"'], ['"BREAD,MILK,BISCUIT,CORNFLAKES"']]
After fixing my code I got this:
[['"MILK', 'BREAD', 'BISCUIT"'], ['"BREAD', 'MILK', 'BISCUIT', 'CORNFLAKES"']]
I want it to look like this:
[['MILK', 'BREAD', 'BISCUIT'], ['BREAD', 'MILK', 'BISCUIT', 'CORNFLAKES']]
I tried my best but I am not able to figure out how to do it.
My code looks like this:
def getFeatureData(featureFile):
    x = []
    dFile = open(featureFile, 'r')
    for line in dFile:
        row = line.split()
        #row[-1]=row[-1].strip()
        x.append(row)
    dFile.close()
    print(x)
    return x
You can use str.replace and a list comprehension.
list_with_quotes = [['"MILK,BREAD,BISCUIT"'], ['"BREAD,MILK,BISCUIT,CORNFLAKES"']]
list_without_quotes = [[l[0].replace('"','')] for l in list_with_quotes]
print(list_without_quotes)
Output:
[['MILK,BREAD,BISCUIT'], ['BREAD,MILK,BISCUIT,CORNFLAKES']]
EDIT: Sorry, I answered quickly and didn't notice that my output is not exactly what you wanted. Here is a for loop that does the job:
list_without_quotes = []
for l in list_with_quotes:
    # get the single string inside the inner list
    with_quotes = l[0]
    # replace each comma with a space so that split() can separate the words
    separated_words = with_quotes.replace(",", " ")
    # remove the quotes from each word and rebuild the list
    words = [w.replace('"', '') for w in separated_words.split()]
    # append the word list to the final list
    list_without_quotes.append(words)
print(list_without_quotes)
Output:
[['MILK', 'BREAD', 'BISCUIT'], ['BREAD', 'MILK', 'BISCUIT', 'CORNFLAKES']]
Try this, using a list comprehension:
initial = [['"MILK,BREAD,BISCUIT"'], ['"BREAD,MILK,BISCUIT,CORNFLAKES"']]
final = [item[0].replace('"', '').split(',') for item in initial]
print(final)
Output:
[['MILK', 'BREAD', 'BISCUIT'], ['BREAD', 'MILK', 'BISCUIT', 'CORNFLAKES']]
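The same fix can also be applied while the file is read, so the quotes never reach the list. This is a minimal sketch. It assumes each line of the feature file holds one quoted, comma-separated record such as "MILK,BREAD,BISCUIT". The name parse_feature_lines is a hypothetical helper, not something from the question.

```python
def parse_feature_lines(lines):
    """Turn lines like '"MILK,BREAD"\\n' into ['MILK', 'BREAD'] (hypothetical helper)."""
    rows = []
    for line in lines:
        line = line.strip()  # drop the trailing newline and surrounding spaces
        if not line:
            continue  # skip blank lines
        # remove the double quotes, then split on commas instead of whitespace
        rows.append(line.replace('"', '').split(','))
    return rows


def getFeatureData(featureFile):
    # 'with' closes the file even if parsing raises an exception
    with open(featureFile, 'r') as dFile:
        return parse_feature_lines(dFile)


sample = ['"MILK,BREAD,BISCUIT"\n', '"BREAD,MILK,BISCUIT,CORNFLAKES"\n']
print(parse_feature_lines(sample))
# [['MILK', 'BREAD', 'BISCUIT'], ['BREAD', 'MILK', 'BISCUIT', 'CORNFLAKES']]
```

Splitting on ',' rather than calling split() with no argument is the key change. The bare split() only breaks on whitespace, which is why each record originally stayed in one piece.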
Related
I have a list of strings, and I want to use another list of strings to remove every occurrence of any of those "bad" strings from my list. For example, the output for the code below should be foo, bar, foobar, foofoo. I have tried a few things, for example:
mylist = ['foo!', 'bar\\n', 'foobar!!??!!', 'foofoo::!*']
remove_list = ['\\n', '!', '*', '?', ':']
for remove in remove_list:
    for strings in mylist:
        strings = strings.replace(remove, ' ')
The above code doesn't work. At one point I assigned the result to a new variable and appended it afterwards, but that didn't work well because if there were two issues in a string, it would be appended twice.
You changed the temporary loop variable, not the original list. Instead, assign the result back into mylist by index:
for bad in remove_list:
    for pos, string in enumerate(mylist):
        # '' deletes the bad substring; ' ' would leave a space behind
        mylist[pos] = string.replace(bad, '')
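To see why the original loop had no effect, compare rebinding the loop variable with assigning through an index. This small demonstration uses a hypothetical two-item list and a single bad character:

```python
mylist = ['foo!', 'bar!']

# Rebinding the loop variable only changes the local name 's';
# the strings stored in mylist are untouched.
for s in mylist:
    s = s.replace('!', '')
print(mylist)  # ['foo!', 'bar!']

# Assigning through the index replaces the element inside the list itself.
for pos, s in enumerate(mylist):
    mylist[pos] = s.replace('!', '')
print(mylist)  # ['foo', 'bar']
```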
Try this:
mylist = ['foo!', 'bar\\n', 'foobar!!??!!', 'foofoo::!*']
bads = ['\\n', '!', '*', '?', ':']
result = []
for s in mylist:
    # s is a temporary name for the current string
    for bad in bads:
        s = s.replace(bad, '')  # remove every occurrence of this bad substring
    result.append(s)
print(result)
This could be written more concisely, but this way it's easier to understand.
I had a hard time interpreting the question, but I see the desired result at the top of your question.
mylist = ['foo!', 'bar\\n', 'foobar!!??!!', 'foofoo::!*']
remove_list = ['\\n', '!', '*', '?', ':']
output = []
for strings in mylist:
    for remove in remove_list:
        strings = strings.replace(remove, '')
    output.append(strings)
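import re
# The answer never defined regex; this pattern is an assumption that
# removes every character that is not an ASCII letter or digit.
regex = re.compile(r'[^A-Za-z0-9]')
for list1 in mylist:
    t = regex.sub('', list1)
    print(t)
If you just want to get rid of non-alphanumeric characters, do this. It is simpler than checking every string against a separate list of characters to remove. Note that 'bar\\n' contains a literal backslash followed by n, so only the backslash is removed and that string becomes 'barn'.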
Why not let a regex do the work for you? There are no nested loops this way (just make sure to escape correctly):
import re
mylist = ['foo!', 'bar\\n', 'foobar!!??!!', 'foofoo::!*']
# raw strings keep each backslash intact for the regex engine
remove_list = [r'\\n', r'\!', r'\*', r'\?', ':']
removals = re.compile('|'.join(remove_list))
print([removals.sub('', s) for s in mylist])
Output:
['foo', 'bar', 'foobar', 'foofoo']
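Instead of escaping each pattern by hand, re.escape can do it, so the list can hold the literal strings to remove. This sketch reuses the question's data. Like the question, it assumes '\\n' means a literal backslash followed by n rather than a newline.

```python
import re

mylist = ['foo!', 'bar\\n', 'foobar!!??!!', 'foofoo::!*']
remove_list = ['\\n', '!', '*', '?', ':']

# re.escape turns each literal string into a safe regex fragment.
# Longer strings go first so that, if one bad string is a prefix of
# another, the alternation tries the longer one before the shorter one.
parts = [re.escape(r) for r in sorted(remove_list, key=len, reverse=True)]
removals = re.compile('|'.join(parts))

cleaned = [removals.sub('', s) for s in mylist]
print(cleaned)  # ['foo', 'bar', 'foobar', 'foofoo']
```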
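Another option is a list comprehension. However, a single comprehension with two for clauses, [word.replace(bad, '') for word in mylist for bad in remove_list], does not work. It produces one string per (word, bad) pair with only that one bad substring removed. Removing duplicates afterwards with set() therefore still leaves partially cleaned strings, and it also loses the order. Instead, apply every removal to each word, for example with functools.reduce:
from functools import reduce
list_good = [reduce(lambda s, bad: s.replace(bad, ''), remove_list, word) for word in mylist]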
my_list = ["foo!", "bar\\n", "foobar!!??!!", "foofoo::*!"]
to_remove = ["!", "\\n", "?", ":", "*"]
for index, item in enumerate(my_list):
    for char in to_remove:
        if char in item:
            item = item.replace(char, "")
    my_list[index] = item
print(my_list)  # outputs ['foo', 'bar', 'foobar', 'foofoo']
I am trying to split the strings in a list on whitespace.
My code:
line=['abc def','ghi jk']
for x in line:
    word = x.split(' ')
print(word)
Expected output:
['abc','def','ghi','jk']
But I keep getting the output as:
['ghi','jk']
Where am I going wrong?
You can join the strings with a space first before splitting the string by spaces:
' '.join(line).split()
This returns:
['abc', 'def', 'ghi', 'jk']
Every time word=x.split(' ') runs, the previous value of word is replaced by a new value. In your case, the first iteration of the loop creates word = ['abc','def'], and the second iteration replaces it with word = ['ghi','jk'].
You need to store the result of each iteration into a new list:
line = ['abc def', 'ghi jk']
result = []
for x in line:
    result.extend(x.split(' '))
print(result)
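The choice between append and extend matters here. append adds the split result as one nested element, while extend adds its items one by one:

```python
line = ['abc def', 'ghi jk']

nested = []
flat = []
for x in line:
    parts = x.split(' ')
    nested.append(parts)  # adds the whole list as a single element
    flat.extend(parts)    # adds each word separately

print(nested)  # [['abc', 'def'], ['ghi', 'jk']]
print(flat)    # ['abc', 'def', 'ghi', 'jk']
```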
A neat one-liner:
Given line = ['abc def', 'ghi jk']
word = [item for sublist in [elem.split() for elem in line] for item in sublist]
Then you have
>>> word
['abc', 'def', 'ghi', 'jk']
What happens here? The [elem.split() for elem in line] is a list-comprehension, taking elem.split() for each element in the original list line, and putting each result in a list.
>>> [elem.split() for elem in line]
[['abc', 'def'], ['ghi', 'jk']]
We then use a second list comprehension that takes every element of every sublist in this nested list and puts it into a single list. This is called flattening a list, and it has the form:
flattened_list = [item for sublist in nestedlists for item in sublist]
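The standard library also provides this flattening step as itertools.chain.from_iterable, which yields the items of each sublist in turn. This is an alternative to the nested comprehension above, not what that answer uses:

```python
from itertools import chain

line = ['abc def', 'ghi jk']

# chain.from_iterable walks each sublist in turn and yields its items.
word = list(chain.from_iterable(elem.split() for elem in line))
print(word)  # ['abc', 'def', 'ghi', 'jk']
```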
Split it, then flatten
line = ['abc def','ghi jk']
temp = [s.split(' ') for s in line]
res = [c for s in temp for c in s]
print(res)
Result:
['abc', 'def', 'ghi', 'jk']
Or by using operator.concat with functools.reduce:
import operator
from functools import reduce
line = [s.split(' ') for s in line]
res = reduce(operator.concat, line)
print(res)
Result:
['abc', 'def', 'ghi', 'jk']
I have a list:
lst = ['words in a list']
and I was hoping to split each one of these words in the string into their own separate indexes. So for example, it would look something like this:
lst = ['words','in','a','list']
I'm wondering if this is possible. I initially thought this would just be a simple lst.split() with a loop, but it seems this will throw an error.
Thanks for the help!
Use this:
print(lst[0].split())
If the list has more elements:
print([x for i in lst for x in i.split()])
split() is a string method; lists don't have it, so lst.split() raises an AttributeError. Index the list item first and then split:
lst = lst[0].split()
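A quick check of both behaviours. Calling split on the list itself fails, while indexing first gives the string, which can be split:

```python
lst = ['words in a list']

try:
    lst.split()
except AttributeError as err:
    print(err)  # 'list' object has no attribute 'split'

lst = lst[0].split()  # split the string stored at index 0
print(lst)  # ['words', 'in', 'a', 'list']
```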
Use this when you have a list of strings or a single string inside a list:
lst = ['this is string1', 'this is string2', 'this is string3']
result =' '.join(lst).split()
print(result)
# output : ['this', 'is', 'string1', 'this', 'is', 'string2', 'this', 'is', 'string3']
The goal of my code is to write a function that returns a list of strings, where the successive strings (fruit names) correspond to the consecutive headers #NO.1 ... #NO.5. Each fruit name is split over multiple lines, and I want each name in the list to be a single string with no whitespace.
I expect my code to return:
['Pear', 'Apple', 'Cherry', 'Banana', 'Peach']
but I got:
['', 'Pear', 'Apple', 'Cherry', 'Banana', 'Peach']
These are the contents of my file fruit.txt:
#NO.1
P
ear
#NO.2
A
pp
l
e
#NO.3
Cherry
#NO.4
Banan
a
#NO.5
Pea
c
h
This is my code:
def read(filename):
    myfile = open('fruit', 'r')
    seq = ''
    list1 = []
    for line in myfile:
        if line[0] != '#':
            seq += line.rstrip('\n')
        else:
            list1.append(seq)
            seq = ''
    list1.append(seq)
    return list1
How can I avoid appending the empty string? I suppose I just need to move a certain line of code; any suggestion is appreciated.
You could change the
else:
to
elif seq:
This checks whether seq is empty and only appends it if it's not.
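Here is a sketch of the question's function with that change applied. It also opens the filename argument instead of the hard-coded 'fruit' and closes the file with with. read_lines is a hypothetical helper that works on any iterable of lines, so the logic can be tried without a file:

```python
def read_lines(lines):
    """Join the lines after each '#' header into one string per header."""
    seq = ''
    list1 = []
    for line in lines:
        if line[0] != '#':
            seq += line.rstrip('\n')
        elif seq:  # header line: store the previous name, if there is one
            list1.append(seq)
            seq = ''
    if seq:  # the last name has no header after it
        list1.append(seq)
    return list1


def read(filename):
    with open(filename, 'r') as myfile:
        return read_lines(myfile)


sample = ['#NO.1\n', 'P\n', 'ear\n', '#NO.2\n', 'A\n', 'pp\n', 'l\n', 'e\n']
print(read_lines(sample))  # ['Pear', 'Apple']
```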
Alternatively, if you'd like a single-line solution:
with open('fruit.txt') as f:
    content = f.read()
output = [''.join(x.split('\n')[1:]) for x in content.split('#') if len(x.split('\n')) > 1]
Quick fix for removing empty strings from a list:
list1 = list(filter(None, list1))  # list() is needed in Python 3, where filter returns an iterator
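filter(None, ...) drops every falsy item, which for strings means the empty ones. In Python 3 it returns a lazy iterator, so wrap it in list() when you need a list. A list comprehension does the same job:

```python
list1 = ['', 'Pear', 'Apple', 'Cherry', 'Banana', 'Peach']

# filter(None, ...) keeps only truthy items; empty strings are falsy.
no_empty = list(filter(None, list1))

# Equivalent list comprehension.
no_empty_too = [s for s in list1 if s]

print(no_empty)      # ['Pear', 'Apple', 'Cherry', 'Banana', 'Peach']
print(no_empty_too)  # ['Pear', 'Apple', 'Cherry', 'Banana', 'Peach']
```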
How about this solution with regex? It is a two-step process. First, all whitespace (newlines, spaces, etc.) is removed. Then all words following your pattern #NO.\d are found:
import re
whitespace = re.compile(r'\s+')
fruitdef = re.compile(r'#NO\.\d(\w*)')
with open('fruit', 'r') as f:
    inputfile = f.read()
inputstring = whitespace.sub('', inputfile)
fruits = fruitdef.findall(inputstring)
print(fruits)
['Pear', 'Apple', 'Cherry', 'Banana', 'Peach']
Minified to a one-liner:
import re
print(re.findall(r'#NO\.\d(\w*)', re.sub(r'\s+', '', open('fruit').read())))
I converted this list:
orig_list=['"jason","hello1,hello2,hello3","somegroup2","bundle1","loc1"', '"ruby","hello","somegroup","bundle2","loc2"', '"sam","hello3,hello2","somegroup3,somegroup4","bundle2","loc3"']
new_list=[x.split(",") for x in orig_list]
new_list=[['"jason"', '"hello1', 'hello2', 'hello3"', '"somegroup2"', '"bundle1"', '"loc1"'], ['"ruby"', '"hello"', '"somegroup"', '"bundle2"', '"loc2"'], ['"sam"', '"hello3', 'hello2"', '"somegroup3', 'somegroup4"', '"bundle2"', '"loc3"']]
What I want to get is:
[['jason', 'hello1,hello2,hello3', 'somegroup2', 'bundle1', 'loc1'], ['ruby', 'hello', 'somegroup', 'bundle2', 'loc2'], ['sam', 'hello3,hello2', 'somegroup3,somegroup4', 'bundle2', 'loc3']]
Is it possible to do this in place, without creating a new list?
Update: some elements may be in double quotes, all of them may be, or none; the same goes for single quotes.
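Instead of splitting on a bare comma, split on the separator "," (a quote, a comma and a quote), then remove the remaining outer quotes:
new_list = [[l.replace('"', '') for l in x.split('","')] for x in orig_list]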
new_list
Out[99]: [['jason', 'hello1,hello2,hello3', 'somegroup2', 'bundle1', 'loc1'], ['ruby', 'hello', 'somegroup', 'bundle2', 'loc2'], ['sam', 'hello3,hello2', 'somegroup3,somegroup4', 'bundle2', 'loc3']]
If you need an in-place removal of quotes, you need to add in the [:] to the list comprehension assignment:
orig_list = ['"jason","hello1,hello2,hello3","somegroup2","bundle1","loc1"', '"ruby","hello","somegroup","bundle2","loc2"', '"sam","hello3,hello2","somegroup3,somegroup4","bundle2","loc3"']
id1 = id(orig_list)
orig_list[:] = [g.replace('"', "'") for g in orig_list]
orig_list[:] = [h.split("',") for h in orig_list]
orig_list[:] = [[j.replace("'", '') for j in k] for k in orig_list]
id2 = id(orig_list)
print(id1 == id2)  # True
print(orig_list)  # [['jason', 'hello1,hello2,hello3', 'somegroup2', 'bundle1', 'loc1'], ['ruby', 'hello', 'somegroup', 'bundle2', 'loc2'], ['sam', 'hello3,hello2', 'somegroup3,somegroup4', 'bundle2', 'loc3']]
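Note the orig_list[:] = ... slice assignment. It replaces the contents of the existing list object instead of binding the name to a new list, which is why id(orig_list) stays the same. (Each list comprehension on the right-hand side still builds a temporary list first.)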
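The difference between slice assignment and plain assignment shows up when a second name refers to the same list. Here is a small illustration with a hypothetical two-item list:

```python
orig_list = ['"a","b"', '"c","d"']
alias = orig_list  # a second name for the same list object

# Slice assignment replaces the contents of the existing object.
orig_list[:] = [s.replace('"', '').split(',') for s in orig_list]
print(alias is orig_list)  # True
print(alias)               # [['a', 'b'], ['c', 'd']]

# Plain assignment binds orig_list to a brand-new list; alias keeps the old one.
orig_list = [x[0] for x in orig_list]
print(alias is orig_list)  # False
print(orig_list)           # ['a', 'c']
```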
Valid list, preserving grouping of grouped elements
Use the reader function from the csv module:
from csv import reader
orig_list=['"jason","hello1,hello2,hello3","somegroup2","bundle1","loc1"', '"ruby","hello","somegroup","bundle2","loc2"', '"sam","hello3,hello2","somegroup3,somegroup4","bundle2","loc3"']
new_list = []
for line in reader(orig_list):
    new_list.append(line)
This outputs the results you requested:
[['jason', 'hello1,hello2,hello3', 'somegroup2', 'bundle1', 'loc1'], ['ruby', 'hello', 'somegroup', 'bundle2', 'loc2'], ['sam', 'hello3,hello2', 'somegroup3,somegroup4', 'bundle2', 'loc3']]
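The question's update also mentions single-quoted fields. csv.reader accepts a quotechar parameter, so single-quoted rows can be read the same way. This is a sketch only: one reader uses one quote character, so rows that mix ' and " quoting would need to be normalised first.

```python
from csv import reader

single_quoted = ["'sam','hello3,hello2','somegroup3,somegroup4',bundle2,loc3"]

# quotechar tells the parser which character encloses fields that contain commas.
rows = list(reader(single_quoted, quotechar="'"))
print(rows)  # [['sam', 'hello3,hello2', 'somegroup3,somegroup4', 'bundle2', 'loc3']]

# The default quotechar is '"', and unquoted fields are read as-is.
mixed = list(reader(['ruby,"hello1,hello2",bundle2']))
print(mixed)  # [['ruby', 'hello1,hello2', 'bundle2']]
```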
Ungroup all elements
If you want to ungroup all the comma-delimited elements, you can convert the list to a string and then split it:
orig_list2=['jason,"hello1,hello2,hello3",somegroup2,bundle1,loc1', 'ruby,hello,somegroup,bundle2,loc2', 'sam','hello3,hello2',"somegroup3,somegroup4","bundle2",'loc3']
orig_list2 = str(orig_list2)
# list of characters to remove
bad_chars = ['\'','"','[',']',' ']
for c in bad_chars:
    orig_list2 = orig_list2.replace(c, '')
# put into a list
new_list2 = orig_list2.split(',')
If you're dealing with a string that looks like a list but is invalid because some quotes are not complete pairs (like the example you left in a comment for JohnZ), you can also use this method; you just don't need to convert it to a string first.