extract sequences from python list


I have a list in python which looks like this:
['x','x','x','x','P','x','x','N','P','N','x','x','x','N','P','x','x','x','x','x','x','N','x','x','P','N','x','x','x', ...]
I need to process the list so that each consecutive run of P and N values is returned as its own list. In the above case I need to return:
[['P'],['N','P','N'],['N','P'],['N'],['P','N'], ...]
I have looked at itertools but have not found anything that can do this. I have a lot of lists to process in this way so efficiency is also important.

You can do it using itertools.groupby:
from itertools import groupby
data = ['x','x','x','x','P','x','x','N','P','N','x','x','x','N',
'P','x','x','x','x','x','x','N','x','x','P','N','x','x','x']
out = list(list(g) for k, g in groupby(data, lambda item: item in {'N', 'P'}) if k)
print(out)
# [['P'], ['N', 'P', 'N'], ['N', 'P'], ['N'], ['P', 'N']]
We group according to item in {'N', 'P'}, and keep only the groups for which this is True.
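Since the question mentions many lists to process, the same idea can be wrapped in a small helper. This is only a sketch: the name extract_runs and its wanted parameter are made up for illustration and are not part of itertools.

```python
from itertools import groupby

def extract_runs(items, wanted=frozenset({"N", "P"})):
    """Return each consecutive run of wanted values as its own list."""
    # groupby splits the input into runs that share the same key; here the
    # key is True for wanted values and False for everything else.
    return [
        list(group)
        for is_wanted, group in groupby(items, key=wanted.__contains__)
        if is_wanted
    ]

data = ['x','x','x','x','P','x','x','N','P','N','x','x','x','N',
        'P','x','x','x','x','x','x','N','x','x','P','N','x','x','x']
print(extract_runs(data))
# [['P'], ['N', 'P', 'N'], ['N', 'P'], ['N'], ['P', 'N']]
```

A run that reaches the very end of the list is returned as well, because groupby emits the final group when the input is exhausted.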

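A plain loop works too (data is the list defined above):
def get_desired_value(_list):
    main_list = []  # completed runs of 'N'/'P'
    new_list = []   # run currently being built
    for val in _list:
        if val in ('N', 'P'):
            new_list.append(val)
        elif new_list:
            # Any other value ends the current run.
            main_list.append(new_list)
            new_list = []
    if new_list:
        # Keep a run that reaches the end of the list.
        main_list.append(new_list)
    return main_list
print(get_desired_value(data))
# [['P'], ['N', 'P', 'N'], ['N', 'P'], ['N'], ['P', 'N']]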

Related

Python rearrange list based on another list

I want to rearrange a list based on another list that shares some elements with it.
my_list = ['q','s','b','f','l','c','x','a']
base_list = ['z','a','b','c']
The lists above have 'a', 'b' and 'c' in common. The expected outcome is:
my_result = ['a','b','c','q','s','f','l','x']
Thanks in advance,
Sky

my_list = ['q','s','b','f','l','c','x','a']
base_list = ['z','a','b','c']
res1=[x for x in base_list if x in my_list] # common elements, in base_list order
res2=[x for x in my_list if x not in res1] # remaining elements, in their original order
res3=res1+res2
print(res3)
Output:
['a', 'b', 'c', 'q', 's', 'f', 'l', 'x']

Create a custom key function for sorted. Return an arbitrarily high value for the letters that don't appear in base_list so that they end up at the back. Because sorted is stable, the letters that aren't in base_list keep their original relative order.
l = ['q','s','b','f','l','c','x','a']
base_list = ['z','a','b','c']
def custom_key(letter):
    try:
        return base_list.index(letter)
    except ValueError:
        # Not in base_list: sort after every element that is.
        return 1_000
sorted(l, key=custom_key)
# ['a', 'b', 'c', 'q', 's', 'f', 'l', 'x']

A (probably non optimal) way:
>>> sorted(my_list, key=lambda x: base_list.index(x) if x in base_list
else len(base_list)+1)
['a', 'b', 'c', 'q', 's', 'f', 'l', 'x']
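Both key functions above call base_list.index for every element, which scans base_list each time. A possible variant, sketched here under the assumption that base_list has no duplicates, builds the positions once in a dictionary (the name rank is made up for this example):

```python
my_list = ['q', 's', 'b', 'f', 'l', 'c', 'x', 'a']
base_list = ['z', 'a', 'b', 'c']

# Map each base_list element to its position once, so each lookup is O(1).
rank = {value: position for position, value in enumerate(base_list)}

# Elements missing from base_list all get the same key, which is larger than
# any real position. sorted() is stable, so they keep their original order.
result = sorted(my_list, key=lambda x: rank.get(x, len(base_list)))
print(result)
# ['a', 'b', 'c', 'q', 's', 'f', 'l', 'x']
```

Using len(base_list) as the fallback avoids picking an arbitrary large constant.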

List shuffling by range

I have a list full of strings. I want to take the first 10 values, shuffle them, and put them back in place of the first 10 values of the list, then do the same with values 11-20, then 21-30, and so on.
For example:
input_list = ['a','b','c','d','e','f','g','h','i','j','k','l','m','n','o','p','q','r','s','t']
and a function called:
shuffle10(input_list)
>>> ['d','b','c','f','j','i','h','a','e','g','m','n','s','r','k','p','l','q','o','t']
I thought it'd work if I defined an empty list and appended every 10 values randomized:
newlist=[]
for i in range(int(len(input_list) / 10)):
    newlist.append(shuffle(input_list[(i*10):(i+1)*10]))
    print(newlist)
but all this returns is:
[None]
[None, None]

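Use random.sample instead of shuffle. random.sample returns a new list, whereas shuffle works in place and returns None. Note that sampling 10 items raises ValueError if the final chunk is shorter than 10, so this assumes the list length is a multiple of 10.
>>> import random
>>> input_list = ['a','b','c','d','e','f','g','h','i','j','k','l','m','n','o','p','q','r','s','t']
>>> sum((random.sample(input_list[n:n+10], 10) for n in range(0,len(input_list),10)), [])
['f', 'i', 'd', 'a', 'g', 'j', 'e', 'c', 'b', 'h', 'p', 'l', 'r', 'q', 'm', 't', 's', 'n', 'o', 'k']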

You're shuffling a temporary slice in place, but shuffle returns None, so that is what gets appended. Pull out the relevant sublist, shuffle it, then add it to a new list:
from random import shuffle
new_list=[]
for i in range(0, len(input_list), 10):
    list_slice = input_list[i:i + 10]
    shuffle(list_slice)  # shuffles in place and returns None
    new_list.extend(list_slice)
print(new_list)
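The slice, shuffle, extend steps can be packaged as a function along the lines of the shuffle10 the question asks for. This is a minimal sketch: the name shuffle_blocks and the size parameter are chosen here for illustration.

```python
import random

def shuffle_blocks(items, size=10):
    """Return a new list in which each consecutive block of `size` items is shuffled."""
    result = []
    for start in range(0, len(items), size):
        block = items[start:start + size]  # slicing copies, so `items` is untouched
        random.shuffle(block)              # shuffles in place and returns None
        result.extend(block)
    return result

input_list = ['a','b','c','d','e','f','g','h','i','j',
              'k','l','m','n','o','p','q','r','s','t']
print(shuffle_blocks(input_list))
```

Because each block is shuffled rather than sampled with a fixed count, a final block shorter than size is handled without special cases.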

How to delete one of each item in a python list

I have a list that looks like this:
lst = ['p','p','p','p','p','m','m','m','n','n','n','n','d','d']
I want to remove one of each item. Currently my code looks like this:
for item in lst:
    if (lst[-1] == lst[-2]) == True:
        del(lst[-2])
That is if the last two items of the list are the same the second
from the last should be deleted, but my code does not work.

You can make a set of the unique characters, loop over your list, and remove each character from the set the first time you see it, adding every later occurrence to an output list:
lst = ['p','p','p','p','p','m','m','m','n','n','n','n','d','d']
chars_to_remove = set(lst)
output = []
for item in lst:
    if item in chars_to_remove:
        # First occurrence of this value: skip it.
        chars_to_remove.remove(item)
    else:
        output.append(item)
print(output)
Result:
['p', 'p', 'p', 'p', 'm', 'm', 'n', 'n', 'n', 'd']
Note: You still need to decide what happens when a string occurs only once in your list (should it be deleted as well?). In the code above, it is deleted. To keep it, add another condition to the loop body:
Sample input:
lst = ['p','p','p','p','p','m','m','m','q','n','n','n','n','d','d']
for item in lst:
    if lst.count(item) == 1:
        # Keep values that occur only once.
        output.append(item)
    elif item in chars_to_remove:
        chars_to_remove.remove(item)
    else:
        output.append(item)
Result:
['p', 'p', 'p', 'p', 'm', 'm', 'q', 'n', 'n', 'n', 'd']

You can also use sum and groupby:
from itertools import groupby
lst = ['p', 'p', 'p', 'p', 'p', 'm', 'm', 'm', 'n', 'n', 'n', 'n', 'd', 'd']
final = sum((list(g)[:-1] for _, g in groupby(lst)), [])
print(final)
Output:
['p', 'p', 'p', 'p', 'm', 'm', 'n', 'n', 'n', 'd']

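You could give this a shot. It drops the last item of each run of equal values but keeps a run of length one intact:
from itertools import groupby
result = []
for _, u in groupby(lst):
    new_u = list(u)
    last_index = max(1, len(new_u) - 1)
    result += new_u[:last_index]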

It is not clear what your expected result is. Your code does not work because you are mutating the list while iterating over it. Instead, iterate over a copy (lst[:]):
for item in lst[:]:
    if lst[-1] == lst[-2]:
        del lst[-2]
lst
# ['p', 'p', 'p', 'p', 'p', 'm', 'm', 'm', 'n', 'n', 'n', 'n', 'd']
However, this still does not do what you asked for:
"I want to remove one of each item"
Try this instead:
import itertools as it
lst = ['p','p','p','p','p','m','m','m','n','n','n','n','d','d']
list(it.chain.from_iterable(list(g)[:-1] for _, g in it.groupby(lst)))
# ['p', 'p', 'p', 'p', 'm', 'm', 'n', 'n', 'n', 'd']

Assuming you want to remove one occurrence of each distinct value, it can be done in one line:
[lst.remove(c) for c in set(lst)]
This does not return the answer; it modifies your list in place, so lst is now trimmed.
Or wrapped into a potentially more useful function that leaves the input untouched:
def remove_first_occurence(lst):
    l = lst.copy()
    for c in set(l):
        l.remove(c)  # list.remove deletes the first matching item
    return l
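Calling list.remove once per distinct value rescans the list each time. If that matters, a single pass that skips the first occurrence of each value gives the same result. The sketch below assumes the items are hashable; drop_first_of_each is a name invented for this example.

```python
def drop_first_of_each(items):
    """Return a new list without the first occurrence of each distinct value."""
    seen = set()
    result = []
    for item in items:
        if item in seen:
            result.append(item)  # a later occurrence: keep it
        else:
            seen.add(item)       # the first occurrence: skip it
    return result

lst = ['p','p','p','p','p','m','m','m','n','n','n','n','d','d']
print(drop_first_of_each(lst))
# ['p', 'p', 'p', 'p', 'm', 'm', 'n', 'n', 'n', 'd']
```

Unlike the groupby answers, this treats equal values as one group even when they are not adjacent, and a value that occurs only once is dropped.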

Take every nth block from list

Given a list:
import string
a = list(string.ascii_lowercase)
What is the Pythonic way to return every nth block of m elements? Note that this is different from just returning every nth element.
Desired result of taking every 1st of 3 blocks of 3 elements (take 3, skip 6, take 3, skip 6...):
['a', 'b', 'c', 'j', 'k', 'l', 's', 't', 'u']
I can get to this as:
import itertools
s1 = a[::9]
s2 = a[1::9]
s3 = a[2::9]
res = list(itertools.chain.from_iterable(zip(s1,s2, s3)))
Is there a cleaner way?

For a fixed order of select and skip, you can wrap indices taking the modulo on the total length of the window (9 here) and select only those beneath the given threshold, 3:
lst = [x for i, x in enumerate(a) if i % 9 < 3]
print(lst)
# ['a', 'b', 'c', 'j', 'k', 'l', 's', 't', 'u']
You can make this into a function that makes it more intuitive to use:
def select_skip(iterable, select, skip):
    return [x for i, x in enumerate(iterable) if i % (select+skip) < select]
print(select_skip(a, select=3, skip=6))
# ['a', 'b', 'c', 'j', 'k', 'l', 's', 't', 'u']

Perhaps just writing a simple generator is the most readable:
def thinger(iterable, take=3, skip=6):
    it = iter(iterable)
    try:
        while True:
            for i in range(take):
                yield next(it)
            for i in range(skip):
                next(it)
    except StopIteration:
        return
This has the advantage of working even if the input is infinite or not sliceable (e.g. data coming in from a socket).
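To illustrate the point about infinite input, here is a sketch of a similar generator built on itertools.islice and fed from itertools.count. It is a variant rather than the code above, and the name take_skip is made up for this example.

```python
from itertools import count, islice
import string

def take_skip(iterable, take=3, skip=6):
    """Yield `take` items, discard the next `skip` items, and repeat."""
    it = iter(iterable)
    while True:
        block = list(islice(it, take))
        if not block:
            return  # input exhausted
        yield from block
        # Advance past the skipped items without keeping them.
        for _ in islice(it, skip):
            pass

# An endless input is fine as long as the consumer stops at some point.
print(list(islice(take_skip(count()), 7)))
# [0, 1, 2, 9, 10, 11, 18]

print(list(take_skip(string.ascii_lowercase)))
# ['a', 'b', 'c', 'j', 'k', 'l', 's', 't', 'u']
```

One difference from the hand-written loop: this version collects each block of take items before yielding it, so on a slow stream the items of a block arrive together rather than one by one.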

more_itertools is a third-party library that implements itertools recipes and other helpful tools such as more_itertools.windowed.
> pip install more_itertools
Code
import string
from more_itertools import windowed, flatten
m, n = 3, 6
list(flatten(windowed(string.ascii_lowercase, m, step=m+n)))
# ['a', 'b', 'c', 'j', 'k', 'l', 's', 't', 'u']
By default, windowed advances one position per iteration, so consecutive windows overlap. Setting step=m+n makes each window of m items start m+n positions after the previous one, which skips the n items in between.

You can do it using a generic "chunks" recipe, meaning any helper that yields successive blocks of n items:
windows = chunks(original_iter, n=3)
Now that your data is split into blocks, use the step argument of islice to keep every third block (take one block, skip two):
# chain.from_iterable flattens the selected blocks
result = chain.from_iterable(islice(windows, 0, None, 3))
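The chunks helper is not defined in this answer and is not a standard-library function. One possible implementation, offered only as a sketch, is shown below together with the full pipeline for the pattern in the question (blocks of 3, keeping every third block):

```python
from itertools import chain, islice
import string

def chunks(iterable, n):
    """Yield successive lists of up to n items from iterable."""
    it = iter(iterable)
    while True:
        block = list(islice(it, n))
        if not block:
            return
        yield block

windows = chunks(string.ascii_lowercase, n=3)
# Start at block 0 and step by 3 blocks: take one block, skip two.
result = list(chain.from_iterable(islice(windows, 0, None, 3)))
print(result)
# ['a', 'b', 'c', 'j', 'k', 'l', 's', 't', 'u']
```

On Python 3.12 and later, itertools.batched could play the role of chunks; it yields tuples instead of lists.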

You can use a list comprehension and create a function that does this for any skip, take and list values:
import string
import itertools
a = list(string.ascii_lowercase)
def everyNthBlock(a, take, skip):
    res = [a[i:i + take] for i in range(0, len(a) ,skip + take)]
    return list(itertools.chain(*res))
print(everyNthBlock(a, 3, 6))
#^^^^ => ['a', 'b', 'c', 'j', 'k', 'l', 's', 't', 'u']
print(everyNthBlock(a, 4, 7))
#^^^^ => ['a', 'b', 'c', 'd', 'l', 'm', 'n', 'o', 'w', 'x', 'y', 'z']

Using incomprehensible list comprehension :D
m, n = 3, 3
[elem for blockstart in range(0, len(a), m*n) for elem in a[blockstart:blockstart+n]]
#> ['a', 'b', 'c', 'j', 'k', 'l', 's', 't', 'u']

Is that a tag list or something else?

I am new to NLP and NLTK, and I want to find ambiguous words, meaning words with at least n different tags. I have this method, but the output is more than confusing.
Code:
def MostAmbiguousWords(words, n):
    # wordsUniqeTags holds a list of uniqe tags that have been observed for a given word
    wordsUniqeTags = {}
    for (w,t) in words:
        if wordsUniqeTags.has_key(w):
            wordsUniqeTags[w] = wordsUniqeTags[w] | set(t)
        else:
            wordsUniqeTags[w] = set([t])
    # Starting to count
    res = []
    for w in wordsUniqeTags:
        if len(wordsUniqeTags[w]) >= n:
            res.append((w, wordsUniqeTags[w]))
    return res
MostAmbiguousWords(brown.tagged_words(), 13)
Output:
[("what's", set(['C', 'B', 'E', 'D', 'H', 'WDT+BEZ', '-', 'N', 'T', 'W', 'V', 'Z', '+'])),
("who's", set(['C', 'B', 'E', 'WPS+BEZ', 'H', '+', '-', 'N', 'P', 'S', 'W', 'V', 'Z'])),
("that's", set(['C', 'B', 'E', 'D', 'H', '+', '-', 'N', 'DT+BEZ', 'P', 'S', 'T', 'W', 'V', 'Z'])),
('that', set(['C', 'D', 'I', 'H', '-', 'L', 'O', 'N', 'Q', 'P', 'S', 'T', 'W', 'CS']))]
Now I have no idea what B, C, Q, etc. could represent. So, my questions:
What are these?
What do they mean? (In case they are tags)
I think they are not tags, because who and whats don't have the WH tag indicating "wh question words".
I'll be happy if someone could post a link that includes a mapping of all possible tags and their meaning.

It looks like you have a typo. In this line:
wordsUniqeTags[w] = wordsUniqeTags[w] | set(t)
you should have set([t]) (not set(t)), like you do in the else case.
This explains the behavior you're seeing because t is a string and set(t) is making a set out of each character in the string. What you want is set([t]) which makes a set that has t as its element.
>>> t = 'WHQ'
>>> set(t)
set(['Q', 'H', 'W']) # bad
>>> set([t])
set(['WHQ']) # good
By the way, you can correct the problem and simplify things by just changing that line to:
wordsUniqeTags[w].add(t)
But, really, you should make use of the setdefault method on dict and list comprehension syntax to improve the method overall. So try this instead:
def most_ambiguous_words(words, n):
    # wordsUniqeTags maps each word to the set of unique tags observed for it
    wordsUniqeTags = {}
    for (w,t) in words:
        wordsUniqeTags.setdefault(w, set()).add(t)
    # Keep the words with at least n distinct tags
    # (.items() works on both Python 2 and 3; iteritems() exists only on Python 2)
    return [(word,tags) for word,tags in wordsUniqeTags.items() if len(tags) >= n]

You are splitting your POS tags into single characters in this line:
wordsUniqeTags[w] = wordsUniqeTags[w] | set(t)
set('AT') results in set(['A', 'T']).

How about making use of the Counter and defaultdict classes in the collections module?
from collections import defaultdict, Counter
def most_ambiguous_words(words, n):
    counts = defaultdict(Counter)
    for (word, tag) in words:
        counts[word][tag] += 1
    # Keep the words seen with at least n distinct tags.
    return [(w, list(counts[w])) for w in counts if len(counts[w]) >= n]
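To see the whole-tag behavior without loading a corpus, here is a small Python 3 sketch that uses a set per word. The sample pairs are made-up toy data, not output from brown.tagged_words(), and the tags are sorted only to make the printed result deterministic.

```python
from collections import defaultdict

def most_ambiguous_words(tagged_words, n):
    """Return (word, sorted tags) for words seen with at least n distinct tags."""
    tags_by_word = defaultdict(set)
    for word, tag in tagged_words:
        # add() stores the whole tag string; set(tag) would split it into characters.
        tags_by_word[word].add(tag)
    return [(word, sorted(tags)) for word, tags in tags_by_word.items() if len(tags) >= n]

# Toy (word, tag) pairs for illustration only.
sample = [("that", "CS"), ("that", "DT"), ("that", "WPS"),
          ("run", "VB"), ("run", "NN"), ("the", "AT"), ("that", "CS")]
print(most_ambiguous_words(sample, 2))
# [('that', ['CS', 'DT', 'WPS']), ('run', ['NN', 'VB'])]
```

Passing brown.tagged_words() instead of sample should work the same way, since it also yields (word, tag) pairs.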
