I changed this list
orig_list=['"jason","hello1,hello2,hello3","somegroup2","bundle1","loc1"', '"ruby","hello","somegroup","bundle2","loc2"', '"sam","hello3,hello2","somegroup3,somegroup4","bundle2","loc3"']
new_list=[x.split(",") for x in orig_list]
new_list=[['"jason"', '"hello1', 'hello2', 'hello3"', '"somegroup2"', '"bundle1"', '"loc1"'], ['"ruby"', '"hello"', '"somegroup"', '"bundle2"', '"loc2"'], ['"sam"', '"hello3', 'hello2"', '"somegroup3', 'somegroup4"', '"bundle2"', '"loc3"']]
What I want to get is:
[['jason', 'hello1,hello2,hello3', 'somegroup2', 'bundle1', 'loc1'], ['ruby', 'hello', 'somegroup', 'bundle2', 'loc2'], ['sam', 'hello3,hello2', 'somegroup3,somegroup4', 'bundle2', 'loc3']]
Is it possible to do this in place, without creating a new list?
Update: the elements may be all in double quotes, partly in double quotes, or unquoted, and the same goes for single quotes.
Instead of splitting on , split on ",":
new_list=[[l.replace('"','') for l in x.split('","')] for x in orig_list]
new_list
Out[99]: [['jason', 'hello1,hello2,hello3', 'somegroup2', 'bundle1', 'loc1'], ['ruby', 'hello', 'somegroup', 'bundle2', 'loc2'], ['sam', 'hello3,hello2', 'somegroup3,somegroup4', 'bundle2', 'loc3']]
If you need an in-place removal of quotes, you need to add in the [:] to the list comprehension assignment:
orig_list = ['"jason","hello1,hello2,hello3","somegroup2","bundle1","loc1"', '"ruby","hello","somegroup","bundle2","loc2"', '"sam","hello3,hello2","somegroup3,somegroup4","bundle2","loc3"']
id1 = id(orig_list)
orig_list[:] = [w for w in orig_list]
orig_list[:] = [g.replace('"', "'") for g in orig_list]
orig_list[:] = [h.split("',") for h in orig_list]
orig_list[:] = [[j.replace("'", '') for j in k] for k in orig_list]
id2 = id(orig_list)
print(id1 == id2)  # True
print(orig_list)  # [['jason', 'hello1,hello2,hello3', 'somegroup2', 'bundle1', 'loc1'], ['ruby', 'hello', 'somegroup', 'bundle2', 'loc2'], ['sam', 'hello3,hello2', 'somegroup3,somegroup4', 'bundle2', 'loc3']]
Note the orig_list[:] = .... Slice assignment replaces the contents of the existing list object instead of rebinding the name to a new list, which is what keeps the operation in place (the id stays the same).
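To make the difference between slice assignment and plain assignment concrete, here is a minimal sketch. The names data and alias are made up for illustration, and the sample strings are shortened versions of the ones in the question.

```python
data = ['"a","b,c"', '"d","e"']
alias = data  # a second reference to the same list object

# Slice assignment: the existing list object receives the new contents
data[:] = [s.strip('"').split('","') for s in data]
same_after_slice = alias is data

# Plain assignment: the name is rebound to a brand-new list object
data = [row[:] for row in data]
same_after_rebind = alias is data

print(same_after_slice, same_after_rebind)
```

Anything else that holds a reference to the list (here alias) only sees the change when slice assignment is used.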
Valid list, preserving grouping of grouped elements
Use the reader function from the csv module:
from csv import reader
orig_list=['"jason","hello1,hello2,hello3","somegroup2","bundle1","loc1"', '"ruby","hello","somegroup","bundle2","loc2"', '"sam","hello3,hello2","somegroup3,somegroup4","bundle2","loc3"']
new_list = []
for line in reader(orig_list):
    new_list.append(line)
This outputs the results you requested:
[['jason', 'hello1,hello2,hello3', 'somegroup2', 'bundle1', 'loc1'], ['ruby', 'hello', 'somegroup', 'bundle2', 'loc2'], ['sam', 'hello3,hello2', 'somegroup3,somegroup4', 'bundle2', 'loc3']]
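Regarding the update about mixed quoting, csv.reader should also cope with lines where only some fields are quoted, and its quotechar parameter can be switched to a single quote. The sketch below assumes each string is one record and that a given line uses only one kind of quote; parse_quoted_lines is a hypothetical helper name, not something from the csv module.

```python
import csv

def parse_quoted_lines(lines, quotechar='"'):
    """Hypothetical helper: parse every string in `lines` as one CSV record."""
    return list(csv.reader(lines, quotechar=quotechar))

# Some fields quoted, some not
mixed = ['"jason","hello1,hello2",loc1', 'ruby,hello,"loc2"']
parsed_double = parse_quoted_lines(mixed)

# Single-quoted fields need a different quotechar
single = ["'sam','hello3,hello2',loc3"]
parsed_single = parse_quoted_lines(single, quotechar="'")

# In-place variant: slice assignment keeps the same list object
mixed[:] = parse_quoted_lines(mixed)
print(mixed)
```

If a single line mixes both quote styles, this approach would not be enough and the line would need extra preprocessing.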
Ungroup all elements
If you want to ungroup all the comma-delimited elements, you can convert the list to a string and then split it:
orig_list2=['jason,"hello1,hello2,hello3",somegroup2,bundle1,loc1', 'ruby,hello,somegroup,bundle2,loc2', 'sam','hello3,hello2',"somegroup3,somegroup4","bundle2",'loc3']
orig_list2 = str(orig_list2)
# list of characters to remove
bad_chars = ['\'','"','[',']',' ']
for c in bad_chars:
    orig_list2 = orig_list2.replace(c,'')
# put into a list
new_list2 = orig_list2.split(',')
If you're dealing with a string that looks like a list but is invalid because some quotes are not in complete pairs (like the example you left in a comment for JohnZ), you can use this method too, and you can skip the str() conversion because the data is already a string.
Related
I have a list of strings, and I want to use a second list of strings to remove every occurrence of its entries from the strings in the first list. For the example below, the output would be foo, bar, foobar, foofoo. I have tried a few things, for example:
mylist = ['foo!', 'bar\\n', 'foobar!!??!!', 'foofoo::!*']
remove_list = ['\\n', '!', '*', '?', ':']
for remove in remove_list:
    for strings in mylist:
        strings = strings.replace(remove, ' ')
The above code doesn't work. At one point I assigned the result to a new variable and appended that afterwards, but that didn't work well because if there were two issues in a string it would be appended twice.
You changed the temporary variable, not the original list. Instead, assign the result back into mylist:
for bad in remove_list:
    for pos, string in enumerate(mylist):
        mylist[pos] = string.replace(bad, ' ')
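A small sketch of why rebinding the loop variable has no effect on the list, while assigning through the index does. The list words and its contents are made up for illustration.

```python
words = ['foo!', 'bar?']

# Rebinding the loop variable only changes the local name `w`
for w in words:
    w = w.replace('!', '')
after_rebinding = list(words)  # snapshot: the list is unchanged

# Assigning through the index stores the new string in the list
for i, w in enumerate(words):
    words[i] = w.replace('!', '')

print(after_rebinding, words)
```

Strings are immutable, so replace always returns a new string; the list only changes if that new string is stored back into it.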
Try this:
mylist = ['foo!', 'bar\\n', 'foobar!!??!!', 'foofoo::!*']
bads = ['\\n', '!', '*', '?', ':']
result = []
for s in mylist:
    # s is a local name bound to the current string
    for bad in bads:
        s = s.replace(bad, '')  # remove every occurrence of this bad substring
    result.append(s)
print(result)
This could be written more concisely, but this way it's easier to understand.
I had a hard time interpreting the question, but I see you have the result desired at the top of your question.
mylist = ['foo!', 'bar\\n', 'foobar!!??!!', 'foofoo::!*']
remove_list = ['\\n', '!', '*', '?', ':']
output = []
for strings in mylist:
    for remove in remove_list:
        strings = strings.replace(remove, '')
    output.append(strings)
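import re
# Assumed pattern: the original definition of regex is missing; this one matches any non-letter
regex = re.compile(r'[^a-zA-Z]')
for list1 in mylist:
    t = regex.sub('', list1)
    print(t)
If you just want to get rid of non-letter characters, do this. It works a lot better than comparing two separate lists. Note that with this pattern a literal \n keeps its n, since only the backslash is a non-letter.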
Why not have regex do the work for you? No nested loops this way (just make sure to escape correctly):
import re
mylist = ['foo!', 'bar\\n', 'foobar!!??!!', 'foofoo::!*']
remove_list = [r'\\n', r'\!', r'\*', r'\?', ':']
removals = re.compile('|'.join(remove_list))
print([removals.sub('', s) for s in mylist])
['foo', 'bar', 'foobar', 'foofoo']
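If escaping by hand feels error-prone, re.escape can build the alternation from the plain strings. This is a sketch of the same idea, using the unescaped remove_list from the question; the names pattern and cleaned are made up here.

```python
import re

mylist = ['foo!', 'bar\\n', 'foobar!!??!!', 'foofoo::!*']
remove_list = ['\\n', '!', '*', '?', ':']

# re.escape turns each literal string into a pattern that matches it exactly
pattern = re.compile('|'.join(re.escape(item) for item in remove_list))

cleaned = [pattern.sub('', s) for s in mylist]
print(cleaned)
```

This keeps remove_list readable as plain text and leaves the regex details to the library.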
Another solution is a list comprehension that removes each unwanted substring from every word in turn:
from functools import reduce
list_good = [reduce(lambda w, bad: w.replace(bad, ''), remove_list, word) for word in mylist]
my_list = ["foo!", "bar\\n", "foobar!!??!!", "foofoo::*!"]
to_remove = ["!", "\\n", "?", ":", "*"]
for index, item in enumerate(my_list):
    for char in to_remove:
        if char in item:
            item = item.replace(char, "")
    my_list[index] = item
print(my_list)  # outputs ['foo', 'bar', 'foobar', 'foofoo']
I have a list of lists and I want to remove the double quotes inside each sublist.
Initially it was like this:
[['"MILK,BREAD,BISCUIT"'], ['"BREAD,MILK,BISCUIT,CORNFLAKES"']]
After fixing my code I got this:
[['"MILK', 'BREAD', 'BISCUIT"'], ['"BREAD', 'MILK', 'BISCUIT', 'CORNFLAKES"']]
I want it to look like this:
[['MILK', 'BREAD', 'BISCUIT'], ['BREAD', 'MILK', 'BISCUIT', 'CORNFLAKES']]
I tried my best but I am not able to figure out how to do it.
My code looks like this:
def getFeatureData(featureFile):
    x=[]
    dFile = open(featureFile, 'r')
    for line in dFile:
        row = line.split()
        #row[-1]=row[-1].strip()
        x.append(row)
    dFile.close()
    print(x)
    return x
You can use replace and list comprehension.
list_with_quotes = [['"MILK,BREAD,BISCUIT"'], ['"BREAD,MILK,BISCUIT,CORNFLAKES"']]
list_without_quotes = [[l[0].replace('"','')] for l in list_with_quotes]
print(list_without_quotes)
>>out
>>[['MILK,BREAD,BISCUIT'], ['BREAD,MILK,BISCUIT,CORNFLAKES']]
EDIT: sorry, I answered quickly and didn't notice that my output is not exactly what you wanted. Here is a for loop that does the job:
list_without_quotes = []
for l in list_with_quotes:
    # get the single quoted string held in this sublist
    with_quotes = l[0]
    # replace commas with spaces so that split() can separate the words
    separated_words = with_quotes.replace(",", " ")
    # remove the quotes from each word and rebuild the list
    words = [w.replace('"', '') for w in separated_words.split()]
    # append the cleaned list to the final list
    list_without_quotes.append(words)
print(list_without_quotes)
>>out
>>[['MILK', 'BREAD', 'BISCUIT'], ['BREAD', 'MILK', 'BISCUIT', 'CORNFLAKES']]
Try this, using a list comprehension:
initial = [['"MILK,BREAD,BISCUIT"'], ['"BREAD,MILK,BISCUIT,CORNFLAKES"']]
final = [item[0].replace('"', '').split(',') for item in initial]
print(final)
Output:
[['MILK', 'BREAD', 'BISCUIT'], ['BREAD', 'MILK', 'BISCUIT', 'CORNFLAKES']]
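The quotes could also be dealt with while the file is being read. The sketch below assumes each line of the file holds one double-quoted, comma-separated record, which is a guess based on the lists shown in the question. parse_feature_lines is a hypothetical name, and io.StringIO stands in for the real file.

```python
import io

def parse_feature_lines(handle):
    """Hypothetical helper: one quoted, comma-separated record per line."""
    rows = []
    for line in handle:
        # drop the newline first, then the surrounding double quotes
        line = line.strip().strip('"')
        if line:
            rows.append(line.split(','))
    return rows

# io.StringIO stands in for open(featureFile) in this sketch
sample = io.StringIO('"MILK,BREAD,BISCUIT"\n"BREAD,MILK,BISCUIT,CORNFLAKES"\n')
rows = parse_feature_lines(sample)
print(rows)
```

With a real file, the same function could be called inside a with open(featureFile) as dFile: block.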
I am trying to split the strings in a list on whitespace.
My code:
line=['abc def','ghi jk']
for x in line:
    word=x.split(' ')
print(word)
Expected output:
['abc','def','ghi','jk']
But I keep getting the output as:
['ghi','jk']
Where am I going wrong?
You can join the strings with a space first before splitting the string by spaces:
' '.join(line).split()
This returns:
['abc', 'def', 'ghi', 'jk']
Every time word=x.split(' ') runs, the previous value of word is replaced by a new value. In your case, the first iteration of the loop creates word = ['abc','def'], and the second iteration replaces it with word = ['ghi','jk'].
You need to store the result of each iteration into a new list:
line = ['abc def', 'ghi jk']
result = []
for x in line:
    result.extend(x.split(' '))
print(result)
A neat one-liner:
Given line = ['abc def', 'ghi jk']
word = [item for sublist in [elem.split() for elem in line] for item in sublist]
Then you have
>>> word
['abc', 'def', 'ghi', 'jk']
What happens here? The [elem.split() for elem in line] is a list-comprehension, taking elem.split() for each element in the original list line, and putting each result in a list.
>>> [elem.split() for elem in line]
[['abc', 'def'], ['ghi', 'jk']]
Suppose, then, that we use a list comprehension again: for each sublist of the new nested list, take each of its elements and put it in a single list. This procedure is called flattening a list, and it has the form:
flattened_list = [item for sublist in nestedlists for item in sublist]
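The same flattening can also be sketched with itertools.chain.from_iterable from the standard library, which some people find easier to read than the nested comprehension. The variable name words is made up for this example.

```python
from itertools import chain

line = ['abc def', 'ghi jk']

# split each string, then chain the resulting lists into one sequence
words = list(chain.from_iterable(s.split() for s in line))
print(words)
```

The generator expression means no intermediate nested list is built before flattening.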
Split it, then flatten
line = ['abc def','ghi jk']
temp = [s.split(' ') for s in line]
res = [c for s in temp for c in s]
print(res)
Result
['abc','def','ghi','jk']
Or by using operator and reduce
import operator
from functools import reduce
line = [s.split(' ') for s in line]
res = reduce(operator.concat, line)
print(res)
Result:
['abc', 'def', 'ghi', 'jk']
I have a file in this format
[('misure', 'di', 'protezione'), ('libertà', 'di', 'espressione', 'di', 'popolo')]
What I want is to eliminate from the tuple the preposition (di), and return the result in the same format. So I created this function to do this
lista = myfilelist
prep = prepositionfile
li = ast.literal_eval(lista)
for i in li:
    word = str(i)
    ll = word.split("', '")
    for w in ll:
        lll = w.strip("('')")
        if lll in prep:
            i = word.replace(lll, "")
            i.strip('')
print (nonlem(li))
but it returns the tuples in this format
[('misure', '', 'protezione'), ('libertà', '', 'espressione', '', 'popolo')]
The output needed is
[('misure', 'protezione'), ('libertà', 'espressione', 'popolo')] ### without ''
You can use ast.literal_eval to turn the string into a list of tuples
>>> from ast import literal_eval
>>> l = literal_eval("[('misure', 'di', 'protezione'), ('libertà', 'di', 'espressione')]")
Then a list comprehension to make new tuples out of the first and last element
>>> [(i[0], i[-1]) for i in l]
[('misure', 'protezione'), ('libertà', 'espressione')]
Then str to create a string from the list of tuples
>>> str([(i[0], i[-1]) for i in l])
"[('misure', 'protezione'), ('libertà', 'espressione')]"
Edit
If you want to remove all instances of the string 'di' it is the same idea, you can use a list comprehension
>>> [tuple(i for i in j if i != 'di') for j in l]
[('misure', 'protezione'), ('libertà', 'espressione', 'popolo')]
Edit 2
Even more generally, if you have a set of prepositions you want to exclude
>>> prepositions = {'di', 'a', 'al'}
>>> [tuple(i for i in j if i not in prepositions) for j in l]
[('misure', 'protezione'), ('libertà', 'espressione', 'popolo')]
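Putting the steps together, a small function could look like the sketch below. drop_words is a hypothetical name, and it assumes the input is the text of a Python list of tuples, as in the question.

```python
from ast import literal_eval

def drop_words(text, stopwords):
    """Hypothetical helper: parse the text, then rebuild each tuple without the stopwords."""
    tuples = literal_eval(text)
    return [tuple(w for w in t if w not in stopwords) for t in tuples]

source = "[('misure', 'di', 'protezione'), ('libertà', 'di', 'espressione', 'di', 'popolo')]"
result = drop_words(source, {'di', 'a', 'al'})
print(result)
```

Calling str(result) gives the list back as a string if the same textual format is needed.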
Tuples are immutable, so you cannot change them once they are created, but you can create new ones. Try this:
[tuple(j for j in i if 'di' not in j) for i in a]
Here is the working demo.
[tuple(j for j in i if 'di' != j) for i in li]
I modified sam2090's code, since the string 'di' can be part of a longer word such as 'dim', and a substring test would remove that word too.
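A quick sketch of the difference between the substring test and the equality test. The tuple words is invented for illustration.

```python
words = ('dim', 'di', 'luce')

# substring test: also drops 'dim', because 'di' occurs inside it
by_substring = tuple(w for w in words if 'di' not in w)

# equality test: drops only the exact word 'di'
by_equality = tuple(w for w in words if w != 'di')

print(by_substring, by_equality)
```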
my_list = ['1\tMelkor\tMorgoth\tSauronAtDolGoldul','2\tThingols\tHeirIsDior\tSilmaril','3\tArkenstone\tIsProbablyA\tSilmaril']
I'm trying to split this list into sublists separated by \t
output = [['1','Melkor','Morgoth','SauronAtDolGoldul'],['2','Thingols','HeirIsDior','Silmaril'],['3','Arkenstone','IsProbablyA','Silmaril']]
I was thinking something on the lines of
output = []
for k_string in my_list:
    temp = []
    for i in k_string:
        temp_s = ''
        if i != '\':
            temp_s = temp_s + i
        elif i == '\':
            break
    temp.append(temp_s)
It gets messed up with the t. I'm not sure how else I would go about doing it. I've seen people use .join for similar things, but I don't really understand how to use .join.
You want to use str.split(); a list comprehension lets you apply this to all elements in one line:
output = [sub.split('\t') for sub in my_list]
There is no literal \ in the string; the \t is an escape code that signifies the tab character.
Demo:
>>> my_list = ['1\tMelkor\tMorgoth\tSauronAtDolGoldul','2\tThingols\tHeirIsDior\tSilmaril','3\tArkenstone\tIsProbablyA\tSilmaril']
>>> [sub.split('\t') for sub in my_list]
[['1', 'Melkor', 'Morgoth', 'SauronAtDolGoldul'], ['2', 'Thingols', 'HeirIsDior', 'Silmaril'], ['3', 'Arkenstone', 'IsProbablyA', 'Silmaril']]
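To see that \t is a single character, and since .join came up in the question, here is a short sketch. The names tab, raw, record, fields and rejoined are made up for this example.

```python
tab = '\t'   # escape sequence: one tab character
raw = r'\t'  # raw string: a backslash followed by the letter t
print(len(tab), len(raw))

record = '1\tMelkor\tMorgoth'
fields = record.split('\t')    # split on the tab character
rejoined = '\t'.join(fields)   # join is the inverse: glue the fields back together
print(fields, rejoined == record)
```

So split breaks one string into a list on a separator, and join builds one string from a list using a separator.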
>>> import csv
>>> my_list = ['1\tMelkor\tMorgoth\tSauronAtDolGoldul','2\tThingols\tHeirIsDior\tSilmaril','3\tArkenstone\tIsProbablyA\tSilmaril']
>>> list(csv.reader(my_list, delimiter='\t'))
[['1', 'Melkor', 'Morgoth', 'SauronAtDolGoldul'], ['2', 'Thingols', 'HeirIsDior', 'Silmaril'], ['3', 'Arkenstone', 'IsProbablyA', 'Silmaril']]