I have a file in this format
[('misure', 'di', 'protezione'), ('libertà', 'di', 'espressione', 'di', 'popolo')]
I want to remove the preposition (di) from each tuple and return the result in the same format, so I wrote this function:
lista = myfilelist
prep = prepositionfile
li = ast.literal_eval(lista)
for i in li:
    word = str(i)
    ll = word.split("', '")
    for w in ll:
        lll = w.strip("('')")
        if lll in prep:
            i = word.replace(lll, "")
            i.strip('')
print (nonlem(li))
but it returns the tuples in this format
[('misure', '', 'protezione'), ('libertà', '', 'espressione', '', 'popolo')]
The output needed is
[('misure', 'protezione'), ('libertà', 'espressione', 'popolo')] ### without ''
You can use ast.literal_eval to turn the string into a list of tuples
>>> from ast import literal_eval
>>> l = literal_eval("[('misure', 'di', 'protezione'), ('libertà', 'di', 'espressione')]")
Then a list comprehension to make new tuples out of the first and last element
>>> [(i[0], i[-1]) for i in l]
[('misure', 'protezione'), ('libertà', 'espressione')]
Then str to create a string from the list of tuples
>>> str([(i[0], i[-1]) for i in l])
"[('misure', 'protezione'), ('libertà', 'espressione')]"
Edit
If you want to remove all instances of the string 'di', the idea is the same: use a list comprehension (here l holds the full list from the question, including the 'popolo' tuple)
>>> [tuple(i for i in j if i != 'di') for j in l]
[('misure', 'protezione'), ('libertà', 'espressione', 'popolo')]
Edit 2
Even more generally, if you have a set of prepositions you want to exclude
>>> prepositions = {'di', 'a', 'al'}
>>> [tuple(i for i in j if i not in prepositions) for j in l]
[('misure', 'protezione'), ('libertà', 'espressione', 'popolo')]
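Putting the three steps together (parse, filter, convert back to a string), a minimal sketch could look like this. The function name strip_prepositions and the sample preposition set are made up for illustration and are not part of the question.

```python
from ast import literal_eval


def strip_prepositions(text, prepositions):
    """Parse a string holding a list of tuples and drop the given words.

    `strip_prepositions` is a hypothetical helper name, not from the question.
    """
    tuples = literal_eval(text)
    # Build new tuples, since tuples cannot be modified in place.
    cleaned = [tuple(w for w in t if w not in prepositions) for t in tuples]
    return str(cleaned)


raw = "[('misure', 'di', 'protezione'), ('libertà', 'di', 'espressione', 'di', 'popolo')]"
result = strip_prepositions(raw, {'di', 'a', 'al'})
print(result)
```

Using a set for the prepositions keeps each membership test cheap even when the list of prepositions is long.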
Tuples are immutable, so you cannot change them once created, but you can create new ones. Try this:
[tuple(j for j in i if 'di' not in j) for i in a]
Here is the working demo.
[tuple(j for j in i if 'di' != j) for i in li]
I modified sam2090's code, since the string 'di' can also be part of a longer word such as 'dim', which a substring test would wrongly remove.
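To see the difference between the substring test and the equality test, here is a small sketch. The sample tuple with 'diritto' is invented for the demonstration.

```python
# Hypothetical sample: 'diritto' contains the letters 'di' but is not a preposition.
words = [('diritto', 'di', 'voto')]

# Substring test: drops every word that merely contains 'di'.
substring_filter = [tuple(w for w in t if 'di' not in w) for t in words]

# Equality test: drops only the exact word 'di'.
equality_filter = [tuple(w for w in t if w != 'di') for t in words]

print(substring_filter)
print(equality_filter)
```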
I am wondering how to construct a string that takes the 1st letter of each word in a list, then the 2nd letter of each word, and so on.
For example:
Input --> my_list = ['good', 'bad', 'father']
The words can have different lengths (though some words in the list may be equally long)
The output should be: 'gbfoaaodtdher'.
I tried:
def letters(my_list):
    string = ''
    for i in range(len(my_list)):
        for j in range(len(my_list)):
            string += my_list[j][i]
    return string
print(letters(['good', 'bad', 'father']))
and I got:
'gbfoaaodt'.
That's a good job for itertools.zip_longest:
from itertools import zip_longest
s = ''.join([c for x in zip_longest(*my_list) for c in x if c])
print(s)
Or more_itertools.interleave_longest:
from more_itertools import interleave_longest
s = ''.join(interleave_longest(*my_list))
print(s)
Output: gbfoaaodtdher
Used input:
my_list = ['good', 'bad', 'father']
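For anyone curious what zip_longest produces before the join, here is a small sketch. It passes fillvalue='' (my choice, not in the answer above) so the padding disappears when joined and no filtering is needed.

```python
from itertools import zip_longest

my_list = ['good', 'bad', 'father']

# Each tuple holds the i-th letter of every word; shorter words are padded with ''.
columns = list(zip_longest(*my_list, fillvalue=''))

# Joining the columns in order yields the interleaved string.
s = ''.join(''.join(col) for col in columns)
print(columns)
print(s)
```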
The answer by @mozway is the best approach, but if you want to stick with your original method, this is how:
def letters(my_list):
    string = ''
    max_len = max([len(s) for s in my_list])
    for i in range(max_len):
        for j in range(len(my_list)):
            if i < len(my_list[j]):
                string += my_list[j][i]
    return string
print(letters(['good', 'bad', 'father']))
Output: gbfoaaodtdher
We can do without zip_longest as well:
l = ['good', 'bad', 'father']
longest_string=max(l,key=len)
''.join(''.join([e[i] for e in l if len(e) > i]) for i in range(len(longest_string)))
#'gbfoaaodtdher'
I have a list of strings and want to use a second list of strings to remove every occurrence of those unwanted substrings from the items of my list. For the example below, the output should be foo, bar, foobar, foofoo... I have tried a few things so far, for example:
mylist = ['foo!', 'bar\\n', 'foobar!!??!!', 'foofoo::!*']
remove_list = ['\\n', '!', '*', '?', ':']
for remove in remove_list:
    for strings in mylist:
        strings = strings.replace(bad, ' ')
The above code doesn't work. At one point I assigned the result to a new variable and appended that afterwards, but that wasn't working well because if there were two issues in a string, it would be appended twice.
You changed the temporary variable, not the original list. Instead, assign the result back into mylist
for bad in remove_list:
    for pos, string in enumerate(mylist):
        mylist[pos] = string.replace(bad, ' ')
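A small sketch of why reassigning the loop variable has no effect on the list, while assigning through the index does. The two-item list is a shortened, made-up sample.

```python
mylist = ['foo!', 'bar?']

# Rebinding the loop variable only changes what the name `s` points to.
for s in mylist:
    s = s.replace('!', '')
after_rebinding = list(mylist)  # snapshot: still the original strings

# Assigning through the index stores the new string in the list itself.
for pos, s in enumerate(mylist):
    mylist[pos] = s.replace('!', '')

print(after_rebinding)
print(mylist)
```

Strings are immutable, so replace always returns a new string. The only way to get it into the list is to store it there explicitly.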
Try this:
mylist = ['foo!', 'bar\\n', 'foobar!!??!!', 'foofoo::!*']
bads = ['\\n', '!', '*', '?', ':']
result = []
for s in mylist:
    # s is a temporary copy
    for bad in bads:
        s = s.replace(bad, '')  # for all bad remove it
    result.append(s)
print(result)
It could be implemented more concisely, but this way it's easier to understand.
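For reference, one possible concise version uses functools.reduce to apply every replacement in turn. This is only a sketch of one option, and the names acc and word are mine.

```python
from functools import reduce

mylist = ['foo!', 'bar\\n', 'foobar!!??!!', 'foofoo::!*']
bads = ['\\n', '!', '*', '?', ':']

# For each word, fold over `bads`, removing one unwanted substring per step.
result = [
    reduce(lambda acc, bad: acc.replace(bad, ''), bads, word)
    for word in mylist
]
print(result)
```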
I had a hard time interpreting the question, but I see you have the result desired at the top of your question.
mylist = ['foo!', 'bar\\n', 'foobar!!??!!', 'foofoo::!*']
remove_list = ['\\n', '!', '*', '?', ':']
output = []
for strings in mylist:
    for remove in remove_list:
        strings = strings.replace(remove, '')
    output.append(strings)
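import re
# Assumed pattern (the definition of regex is missing here): any character that is not an ASCII letter
regex = re.compile(r'[^a-zA-Z]')
for list1 in mylist:
    t = regex.sub('', list1)
    print(t)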
If you just want to get rid of non-letter characters, do this. It works a lot better than comparing two separate lists.
Why not have regex do the work for you? No nested loops this way (just make sure to escape correctly):
import re
mylist = ['foo!', 'bar\\n', 'foobar!!??!!', 'foofoo::!*']
remove_list = [r'\\n', r'!', r'\*', r'\?', ':']
removals = re.compile('|'.join(remove_list))
print([removals.sub('', s) for s in mylist])
['foo', 'bar', 'foobar', 'foofoo']
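If escaping by hand feels error-prone, re.escape can do it for each item, so the removal list can stay as plain strings. A sketch under that assumption:

```python
import re

mylist = ['foo!', 'bar\\n', 'foobar!!??!!', 'foofoo::!*']
remove_list = ['\\n', '!', '*', '?', ':']  # plain strings, no manual escaping

# re.escape turns each item into a pattern that matches it literally.
pattern = re.compile('|'.join(re.escape(item) for item in remove_list))
cleaned = [pattern.sub('', s) for s in mylist]
print(cleaned)
```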
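Another approach is a list comprehension that removes the unwanted substrings, followed by deduplication through a set. Note that, as written, each replace call removes only one substring from one copy of a word, so a word containing several of them is not fully cleaned, and the set does not preserve order.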
list_good = [word.replace(bad, '') for word in mylist for bad in remove_list]
list_good = list(set(list_good))
my_list = ["foo!", "bar\\n", "foobar!!??!!", "foofoo::*!"]
to_remove = ["!", "\\n", "?", ":", "*"]
for index, item in enumerate(my_list):
    for char in to_remove:
        if char in item:
            item = item.replace(char, "")
    my_list[index] = item
print(my_list) # outputs ['foo', 'bar', 'foobar', 'foofoo']
I have a list of n lists. Each internal list contains a combination of (a) strings, (b) the empty list, or (c) a list containing one string. I would like to transform the inside lists so they only contain the strings.
I have a list like this for example:
[[[],["a"],"a"],[["ab"],[],"abc"]]
and I would like it to be like this:
[["","a","a"],["ab","","abc"]]
I know I could probably go through with a loop but I am looking for a more elegant solution, preferably with a list comprehension.
List comprehension:
>>> original = [[[],["a"],"a"],[["ab"],[],"abc"]]
>>> result = [['' if not item else ''.join(item) for item in sublist] for sublist in original]
>>> result
[['', 'a', 'a'], ['ab', '', 'abc']]
As every element of the list that you'd like to flatten is iterable, you can rely on duck typing instead of checking whether each element is an instance of some class (list, str):
>>> my_list = [[[],["a"],"a"],[["ab"],[],"abc"]]
>>> [list(map(lambda x: ''.join(x), elem)) for elem in my_list]
Or a more readable version:
result = []
for elem in my_list:
    flatten = map(lambda x: ''.join(x), elem)
    result.append(list(flatten))
Result:
[['', 'a', 'a'], ['ab', '', 'abc']]
It's quite Pythonic not to check what something is, but rather to rely on what each structure can do: here, every element can be iterated over and joined, whether it is a string or a list.
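A quick sketch of why ''.join works for all three kinds of element. The last sample, a list with two strings, is my addition: it shows that such a list would be concatenated, which may or may not be what you want.

```python
# Empty list, one-string list, plain string, and (hypothetically) a two-string list.
samples = [[], ["a"], "a", ["a", "b"]]

# join iterates over its argument: a string yields its characters,
# a list yields its items, and an empty list yields nothing.
joined = [''.join(item) for item in samples]
print(joined)  # ['', 'a', 'a', 'ab']
```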
Via list comprehension:
lst = [[[],["a"],"a"],[["ab"],[],"abc"]]
result = [ ['' if not v else (v[0] if isinstance(v, list) else v) for v in sub_l]
for sub_l in lst ]
print(result)
The output:
[['', 'a', 'a'], ['ab', '', 'abc']]
original_list = [[[],["a"],"a"],[["ab"],[],"abc"]]
flatten = lambda x: "" if x == [] else x[0] if isinstance(x, list) else x
flattened_list = [[flatten(i) for i in j] for j in original_list]
I changed this list
orig_list=['"jason","hello1,hello2,hello3","somegroup2","bundle1","loc1"', '"ruby","hello","somegroup","bundle2","loc2"', '"sam","hello3,hello2","somegroup3,somegroup4","bundle2","loc3"']
new_list=[x.split(",") for x in orig_list]
new_list=[['"jason"', '"hello1', 'hello2', 'hello3"', '"somegroup2"', '"bundle1"', '"loc1"'], ['"ruby"', '"hello"', '"somegroup"', '"bundle2"', '"loc2"'], ['"sam"', '"hello3', 'hello2"', '"somegroup3', 'somegroup4"', '"bundle2"', '"loc3"']]
What I actually want to get is
[['jason', 'hello1,hello2,hello3', 'somegroup2', 'bundle1', 'loc1'], ['ruby', 'hello', 'somegroup', 'bundle2', 'loc2'], ['sam', 'hello3,hello2', 'somegroup3,somegroup4', 'bundle2', 'loc3']]
Is it possible to do this in place rather than creating a new list?
Update: Elements may be in double quotes or in single quotes, and any given string may have some, all, or none of its elements quoted.
Instead of splitting on , split on ",":
new_list=[[l.replace('"','') for l in x.split('","')] for x in orig_list]
new_list
Out[99]: [['jason', 'hello1,hello2,hello3', 'somegroup2', 'bundle1', 'loc1'], ['ruby', 'hello', 'somegroup', 'bundle2', 'loc2'], ['sam', 'hello3,hello2', 'somegroup3,somegroup4', 'bundle2', 'loc3']]
If you need an in-place removal of quotes, you need to add in the [:] to the list comprehension assignment:
orig_list = ['"jason","hello1,hello2,hello3","somegroup2","bundle1","loc1"', '"ruby","hello","somegroup","bundle2","loc2"', '"sam","hello3,hello2","somegroup3,somegroup4","bundle2","loc3"']
id1 = id(orig_list)
orig_list[:] = [w for w in orig_list]
orig_list[:] = [g.replace('"', "'") for g in orig_list]
orig_list[:] = [h.split("',") for h in orig_list]
orig_list[:] = [[j.replace("'", '') for j in k] for k in orig_list]
id2 = id(orig_list)
print(id1 == id2) # True
print(orig_list) # [['jason', 'hello1,hello2,hello3', 'somegroup2', 'bundle1', 'loc1'], ['ruby', 'hello', 'somegroup', 'bundle2', 'loc2'], ['sam', 'hello3,hello2', 'somegroup3,somegroup4', 'bundle2', 'loc3']]
Note the orig_list[:] = .... Slice assignment replaces the contents of the existing list object instead of rebinding the name to a new list, which is why id1 and id2 are equal. The comprehension on the right-hand side still builds a temporary list.
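The difference between slice assignment and plain assignment is easiest to see with a second name bound to the same list. A minimal sketch with a made-up two-item list:

```python
data = ['"a"', '"b"']
alias = data  # second name for the same list object

# Slice assignment: the existing object gets new contents.
data[:] = [s.replace('"', '') for s in data]
same_object = alias is data       # True
alias_after_slice = list(alias)   # ['a', 'b'], the alias sees the change

# Plain assignment: the name `data` is rebound to a brand-new list.
data = [s.upper() for s in data]
rebound = alias is data           # False, alias still holds ['a', 'b']
```

This matters when other parts of the program hold a reference to the same list and should see the cleaned values.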
Valid list, preserving grouping of grouped elements
Use the reader function from the csv module:
from csv import reader
orig_list=['"jason","hello1,hello2,hello3","somegroup2","bundle1","loc1"', '"ruby","hello","somegroup","bundle2","loc2"', '"sam","hello3,hello2","somegroup3,somegroup4","bundle2","loc3"']
new_list = []
for line in reader(orig_list):
    new_list.append(line)
This outputs the results you requested:
[['jason', 'hello1,hello2,hello3', 'somegroup2', 'bundle1', 'loc1'], ['ruby', 'hello', 'somegroup', 'bundle2', 'loc2'], ['sam', 'hello3,hello2', 'somegroup3,somegroup4', 'bundle2', 'loc3']]
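Regarding the update about mixed quoting, here is a sketch of how csv.reader behaves on such input. The sample strings are shortened and invented. Single-quoted data needs the quotechar parameter, and a single reader call handles only one quote character at a time.

```python
from csv import reader

# Default dialect: double quotes group fields, unquoted fields are fine too.
mixed = ['jason,"hello1,hello2,hello3",somegroup2', '"ruby",hello,"somegroup"']
rows = list(reader(mixed))

# Single-quoted input: tell the reader which quote character to expect.
single = ["'sam','hello3,hello2',loc3"]
rows_single = list(reader(single, quotechar="'"))

# In-place variant: slice assignment keeps the same list object.
mixed[:] = rows
print(mixed)
print(rows_single)
```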
Ungroup all elements
If you want to ungroup all the comma-delimited elements, you can convert the list to a string and then split it:
orig_list2=['jason,"hello1,hello2,hello3",somegroup2,bundle1,loc1', 'ruby,hello,somegroup,bundle2,loc2', 'sam','hello3,hello2',"somegroup3,somegroup4","bundle2",'loc3']
orig_list2 = str(orig_list2)
# list of characters to remove
bad_chars = ['\'','"','[',']',' ']
for c in bad_chars:
    orig_list2 = orig_list2.replace(c,'')
# put into a list
new_list2 = orig_list2.split(',')
If you're dealing with a string that looks like a list but is invalid because some quotes are not in complete pairs, like the example you left in a comment for JohnZ, you can also use this method, and you wouldn't need to convert it to a string first.
I have a list of lists and I want to convert the last value in each list (index 4) to an int, since it is currently a string
[['hello','how','are','you','1'],['hello','how','are','you','2']]
I am trying to convert index 4 to an int in each list within this larger list but when I do
for hi in above:
    int(hi[4])
It is just returning the int when I print the list and not the entire list.
Just traverse the outer list and use int() to convert the element at index 4 of every inner list:
for li in my_list:
    li[4] = int(li[4])
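The loop in the question computes int(hi[4]) and then throws the value away, so the list never changes. A small sketch of two ways to keep the result, one that builds new lists and one that updates the existing ones (the names rows and converted are mine):

```python
rows = [['hello', 'how', 'are', 'you', '1'], ['hello', 'how', 'are', 'you', '2']]

# Non-mutating: build new inner lists, leaving `rows` untouched for now.
converted = [row[:4] + [int(row[4])] + row[5:] for row in rows]

# Mutating: store the converted value back into each inner list.
for row in rows:
    row[4] = int(row[4])

print(converted)
print(rows)
```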
This list comprehension is one way to do it:
a_list = [[int(a) if i == 4 else a for i, a in enumerate(item)] for item in a_list]
Demo:
>>> a_list = [['hello','how','are','you','1'],['hello','how','are','you','2']]
>>> a_list = [[int(a) if i == 4 else a for i, a in enumerate(item)] for item in a_list]
>>> a_list
[['hello', 'how', 'are', 'you', 1], ['hello', 'how', 'are', 'you', 2]]