How to count items in a nested list? - python

I'm trying to figure out how to count the number of items in a nested list. I'm stuck on how to even begin. For example, if I call NestLst([]) it should return 0, but if I call
NestLst([[2, [[9]], [1]], [[[[5]]], ['hat', 'bat'], [3.44], ['hat', ['bat']]]])
it should return 9. Any help on how to begin this or how to do it would be great.
Thanks!

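from collections.abc import Iterable  # the plain collections.Iterable alias was removed in Python 3.10

def NestLst(seq):
    # Strings are iterable, but each one should count as a single item
    if isinstance(seq, str) or not isinstance(seq, Iterable):
        return 1
    return sum(NestLst(x) for x in seq)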
>>> NestLst([[2, [[9]], [1]], [[[[5]]], ['hat', 'bat'], [3.44], ['hat', ['bat']]]])
9

def total_length(l):
    if isinstance(l, list):
        return sum(total_length(x) for x in l)
    else:
        return 1
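Both answers recurse once per level of nesting, so extremely deep nesting can hit Python's recursion limit (1000 by default; see sys.getrecursionlimit()). As a sketch only, here is the same count done with an explicit stack. The name count_items_iterative is hypothetical; like total_length, it treats only `list` objects as containers, and it assumes the top-level argument is a list.

```python
def count_items_iterative(nested):
    """Count the non-list items in a nested list without recursion.

    Only `list` objects are treated as containers; everything else
    (ints, floats, strings) counts as one item.
    """
    count = 0
    stack = [nested]  # lists still waiting to be scanned
    while stack:
        current = stack.pop()
        for item in current:
            if isinstance(item, list):
                stack.append(item)  # defer the sublist instead of recursing
            else:
                count += 1
    return count


print(count_items_iterative([[2, [[9]], [1]], [[[[5]]], ['hat', 'bat'], [3.44], ['hat', ['bat']]]]))
```

The order in which sublists are visited differs from the recursive version, but that does not matter for a count.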

The key idea here is recursion.
Create a function that iterates over the list: if it finds a non-list item, it adds one to the count, and if it finds a list, it calls itself recursively.
The issue with your code is that you are using the length instead of a recursive call.
Here is that idea in Python:
def count(lst):
    answer = 0
    for item in lst:
        if not isinstance(item, list):
            answer += 1
        else:
            # number of items in the sublist, found by recursion
            answer += count(item)
    return answer

You could try calling reduce() recursively. Something like this:
>>> from functools import reduce
>>> def accumulator(x, y):
...     if isinstance(y, list):
...         return reduce(accumulator, y, x)
...     else:
...         return x + 1
...
>>> reduce(accumulator, [10,20,30,40], 0)
4
>>> reduce(accumulator, [10,[20,30],40], 0)
4
>>> reduce(accumulator, [10,20,30,40,[]], 0)
4
>>> reduce(accumulator, [10,[20,[30,[40]]]], 0)
4
>>> reduce(accumulator, [10*i for i in range(1,5)], 0)
4
Some notes:
Empty collections count as 0 items (see the third example).
The 0 at the end of the reduce() call is the initial value. This can be a pitfall: if you omit it you still have a valid call, but the result will not be what you want. I strongly suggest wrapping the initial call in a utility function.
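A minimal sketch of such a wrapper, assuming the accumulator from the answer above; the name count_nested is hypothetical. It always supplies the initial value 0, so callers cannot forget it.

```python
from functools import reduce


def accumulator(count, item):
    # Sublists: fold over their elements, carrying the running count along
    if isinstance(item, list):
        return reduce(accumulator, item, count)
    # Anything else counts as one item
    return count + 1


def count_nested(lst):
    """Hypothetical wrapper that always passes the initial value 0."""
    return reduce(accumulator, lst, 0)


print(count_nested([10, [20, [30, [40]]]]))  # 4
```

To see the pitfall: without the initial value, reduce(accumulator, [10, 20, 30, 40]) starts from the first element, 10, and returns 13 instead of 4.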

Related

How can I count the depth in a list of lists?

I want to count the depth of a list of lists: not the number of elements, but the maximum depth the nesting reaches.
This is my function:
def max_level(lst):
    ...
print(max_level([1, [[[2, 3]]], [[3]]]))
should return 4
You can try:
def max_level(lst):
    return isinstance(lst, list) and max(map(max_level, lst)) + 1
print(max_level([1, [[[2, 3]]], [[3]]]))
Output:
4
Explanation:
First, check whether the object passed into the recursive function is a list:
def max_level(lst):
    return isinstance(lst, list)
If it is not, the `and` short-circuits and the function returns False (which behaves like 0). If it is, go on to evaluate:
and max(map(max_level, lst)) + 1
where max(map(max_level, lst)) returns the greatest depth among the list's elements (non-list elements contribute False, i.e. 0), and the + 1 adds one level for the current list.
If there can be empty lists, you can replace lst with lst or [0], because max() raises a ValueError on an empty sequence. The or tells Python to use the list on its left if it is not empty, and [0] otherwise:
def max_level(lst):
    return isinstance(lst, list) and max(map(max_level, lst or [0])) + 1
print(max_level([1, [], [[]]]))
Output:
3
Addressing @cdlane's comment: if you don't want to mix boolean values with integer values, you can wrap the isinstance() call in int():
def max_level(lst):
    return int(isinstance(lst, list)) and max(map(max_level, lst or [0])) + 1
I want to search through lists that are empty as well.
Give this a try:
def max_level(thing):
    return 1 + (max(map(max_level, thing)) if thing else 0) if isinstance(thing, list) else 0
I've reworked @AnnZen's initial solution to add an extra check for empty lists and also to avoid mixing booleans and integers.
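For comparison, the depth can also be found without recursion by walking the structure one level at a time (a breadth-first traversal, which is a different technique from the answers above). This sketch uses the hypothetical name max_level_bfs and follows the same conventions as the last answer: a non-list has depth 0 and an empty list has depth 1.

```python
def max_level_bfs(lst):
    """Depth of nested lists, computed level by level instead of recursively."""
    if not isinstance(lst, list):
        return 0
    depth = 0
    level = [lst]  # all lists found at the current depth
    while level:
        depth += 1
        # Collect every list nested directly inside the current level
        level = [child for node in level for child in node
                 if isinstance(child, list)]
    return depth


print(max_level_bfs([1, [[[2, 3]]], [[3]]]))  # 4
```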

How to add the list elements in python [duplicate]

This question already has answers here:
sum of nested list in Python
(14 answers)
Closed 8 years ago.
For example, I have a list of numbers like this:
a = [10,[20,30],40]
or
b = [[10,20],30]
I need to add up all the elements in these lists,
so that for the first list I get 10+20+30+40 = 100,
and for the second one, b, I get 10+20+30 = 60.
The solution should be written as a function.
I have tried the following, but it only works when there is no nested list:
def sum(t):
    total = 0
    for x in t:
        total = total+x
    return total
Can anyone help me solve this kind of problem in Python?
Thank you in advance!!!!!
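You can use reduce (in Python 3 it lives in functools) to flatten one level of nesting:
from functools import reduce
x = reduce(lambda prev, el: prev + ([x for x in el] if type(el) is list else [el]), x, [])
And use its result to feed your loop:
def sum(t):
    # flatten one level: [10, [20, 30], 40] -> [10, 20, 30, 40]
    t = reduce(lambda prev, el: prev + ([x for x in el] if type(el) is list else [el]), t, [])
    total = 0
    for x in t:
        total = total + x
    return total
Note that this only removes one level of nesting; deeper lists such as [1, [2, [3]]] still raise a TypeError.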
You can recursively flatten into a single list:
def flatten(lst, out=None):
    if out is None:
        out = []
    for item in lst:
        if isinstance(item, list):
            flatten(item, out)
        else:
            out.append(item)
    return out
Now you can just use sum:
>>> sum(flatten([10, [20, 30], 40]))
100
You need to define a recursive function to handle the nested lists:
rec = lambda x: sum(map(rec, x)) if isinstance(x, list) else x
Applied to a list, rec returns the sum of its elements (recursively); applied to a plain value, it returns the value itself.
result = rec(a)
It seems like the best approach would be to iterate over the top-level list and check each element's type (using isinstance(item, type)). If it's an integer, add it to the total; otherwise, if it's a list, recurse into that list.
Making your function recursive would make it most usable.
Edit: For anybody stumbling upon this question, here's an example.
def nested_sum(input_list):
    total = 0
    for element in input_list:
        if isinstance(element, int):
            total += element
        elif isinstance(element, list):
            total += nested_sum(element)
        else:
            raise TypeError("expected an int or a list, got %r" % (element,))
    return total
Usage:
my_list = [72, 5, [108, 99, [8, 5], 23], 44]
print(nested_sum(my_list))
# 364
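nested_sum raises a TypeError for floats such as 2.5, since it only accepts int. If other numeric types should be allowed, one option (a sketch with the hypothetical name nested_sum_numbers) is to test against numbers.Number from the standard library, checking for lists first. Note that bool also passes this check, just as it passes the original isinstance(element, int).

```python
from numbers import Number


def nested_sum_numbers(input_list):
    """Like nested_sum, but accepts any numeric type (int, float, Fraction, ...)."""
    total = 0
    for element in input_list:
        if isinstance(element, list):
            total += nested_sum_numbers(element)
        elif isinstance(element, Number):
            total += element
        else:
            raise TypeError("unsupported element: %r" % (element,))
    return total


print(nested_sum_numbers([72, 5, [108, 99, [8, 5], 23], 44]))  # 364
```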

Length of 2d list in python

I have a 2D list, for example mylist =[[1,2,3],[4,5,6],[7,8,9]].
Is there any way I can use the len() function to calculate the total number of elements across a range of indices? For example:
len(mylist[0:3])
len(mylist[1:3])
len(mylist[0:1])
Should give:
9
6
3
length = sum([len(arr) for arr in mylist])
sum([len(arr) for arr in mylist[0:3]])  # 9
sum([len(arr) for arr in mylist[1:3]])  # 6
sum([len(arr) for arr in mylist[2:3]])  # 3
Sum the length of each list in mylist to get the total number of elements.
This only works correctly if the list is 2D. If some elements of mylist are not sequences, len() raises a TypeError (for example on an int), and a string element is counted by its number of characters.
Additionally, you could bind this to a function:
len2 = lambda l: sum([len(x) for x in l])
len2(mylist[0:3])  # 9
len2(mylist[1:3])  # 6
len2(mylist[2:3])  # 3
You can flatten the list, then call len on it:
>>> mylist = [[1,2,3],[4,5,6],[7,8,9]]
>>> from collections.abc import Iterable
>>> def flatten(l):
...     for el in l:
...         if isinstance(el, Iterable) and not isinstance(el, str):
...             for sub in flatten(el):
...                 yield sub
...         else:
...             yield el
...
>>> len(list(flatten(mylist)))
9
>>> len(list(flatten(mylist[1:3])))
6
>>> len(list(flatten(mylist[0:1])))
3
You can use reduce to add up the lengths over a range of indices like this. It also handles the case where you pass in something like mylist[0:0]:
from functools import reduce
def myLen(myList):
    return reduce(lambda x, y: x + y, [len(x) for x in myList], 0)
myLen(mylist[0:3])  # 9
myLen(mylist[1:3])  # 6
myLen(mylist[0:1])  # 3
myLen(mylist[0:0])  # 0
I like @Haidro's answer, which works for arbitrary nesting, but I dislike the creation of the intermediate list. Here's a variant that avoids that.
try:
    reduce
except NameError:
    # python3 - reduce is in functools, there is no basestring
    from functools import reduce
    basestring = str
try:
    from collections.abc import Iterable  # Python 3.3+
except ImportError:
    from collections import Iterable  # Python 2
import operator

def rlen(item):
    """
    rlen - recursive len(), where the "length" of a non-iterable
    is just 1, but the length of anything else is the sum of the
    lengths of its sub-items.
    """
    if isinstance(item, Iterable):
        # A basestring is an Iterable that contains basestrings,
        # i.e., it's endlessly recursive unless we short circuit
        # here.
        if isinstance(item, basestring):
            return len(item)
        return reduce(operator.add, (rlen(x) for x in item), 0)
    return 1
For the heck of it I've included a generator-driven, fully recursive flatten as well. Note that this time there's a harder decision to make about strings (the short circuit above is trivially correct, since len(some_string) == sum(len(char) for char in some_string)).
def flatten(item, keep_strings=False):
    """
    Recursively flatten an iterable into a series of items. If given
    an already flat item, just yields it.
    """
    if isinstance(item, Iterable):
        # We may want to flatten strings too, but when they're
        # length 1 we have to terminate recursion no matter what.
        if isinstance(item, basestring) and (len(item) == 1 or keep_strings):
            yield item
        else:
            for elem in item:
                for sub in flatten(elem, keep_strings):
                    yield sub
    else:
        yield item
If you don't need arbitrary nesting (if you're always sure that this is just a list of lists, or a list of tuples, tuple of lists, etc.), the "best" method is probably the simple "sum of generator" variant of @Matt Bryant's answer:
len2 = lambda lst: sum(len(x) for x in lst)
How about len(ast.flatten(lst))? It only works in Python 2 (the compiler package was removed in Python 3).
The full form is:
from compiler import ast
len(ast.flatten(lst))
since
ast.flatten([1,2,3]) == [1,2,3]
ast.flatten(mylist[0:2]) == [1,2,3,4,5,6]
ast.flatten(mylist) == [1,2,3,4,5,6,7,8,9]
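Since the compiler package does not exist in Python 3, here is a rough stand-in. It is a sketch, not the original library code: the name flatten_like_compiler is hypothetical, and it assumes that, as with the old ast.flatten as far as I recall, only lists and tuples are expanded, so strings stay whole.

```python
def flatten_like_compiler(seq):
    """Return a flat list, expanding nested lists and tuples only."""
    out = []
    for item in seq:
        if isinstance(item, (list, tuple)):
            out.extend(flatten_like_compiler(item))  # expand nested containers
        else:
            out.append(item)  # strings, numbers, etc. are kept as-is
    return out


mylist = [[1, 2, 3], [4, 5, 6], [7, 8, 9]]
print(len(flatten_like_compiler(mylist)))  # 9
```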

Python iterate through list and count items of a certain value

Hi, I have a Python problem where I have to count which list elements contain the value 2, in a list with various levels of nesting. E.g.:
my_list = [[2,3,2,2], [[2,1,2,1], [2,1,1]], [1,1,1]]
This list can have up to three levels of nesting, but could also be two or only one level deep.
I have a piece of code which sort of works:
count = 0
for i in my_list:
    if len(i) > 1:
        for x in i:
            if 2 in x:
                count += 1
    elif i == 2:
        count += 1
However, apart from being very ugly, this does not account for the possibility of a list with a single element which is 2. It also fails because len() cannot be called on a single int.
I know a list comprehension should be able to take care of this, but I am a little stuck on how to deal with the potential nesting.
Any help would be much appreciated.
I'd use a variant of the flatten() generator from https://stackoverflow.com/a/2158532/367273
The original yields every element from an arbitrarily nested and irregularly shaped structure of iterables. My variant (below) yields the innermost iterables instead of yielding the scalars.
from collections.abc import Iterable
def flatten(l):
    for el in l:
        # Recurse only while el still contains iterables; otherwise el is innermost
        if isinstance(el, Iterable) and any(isinstance(subel, Iterable) for subel in el):
            for sub in flatten(el):
                yield sub
        else:
            yield el
my_list = [[2,3,2,2], [[2,1,2,1], [2,1,1]], [1,1,1]]
print(sum(1 for el in flatten(my_list) if 2 in el))
For your example it prints 3.
Here is another way to do this:
small_list = [2]
my_list = [[2,3,2,2], [[2,1,2,1], [2,1,1]], [1,1,1]]
another_list = [[2,3,2,2], [[2,1,2,1], [2,1,1]], [1,1,1], 2]
from collections.abc import Iterable
def count_lists_w_two(x):
    if not isinstance(x, Iterable):
        return 0
    else:
        # (2 in x) is True/False, which counts as 1/0 in the addition
        return (2 in x) + sum(count_lists_w_two(ele) for ele in x)
Result:
>>> count_lists_w_two(small_list)
1
>>> count_lists_w_two(my_list)
3
>>> count_lists_w_two(another_list)
4
Update and final answer:
My solution recurses through a nested list of lists. If the list contains 2, the count is set to one. Then the counts of any sub-lists are added. This approach supports heterogeneous lists where numbers and lists are mixed at the same level, as in the second usage example below:
from collections.abc import Iterable
def list_count(l):
    count = int(2 in l)
    for el in l:
        if isinstance(el, Iterable):
            count += list_count(el)
    return count
Here are a couple of test cases:
my_list = [[2,3,2,2], [[2,1,2,1], [2,1,1]], [1,1,1]]
print(list_count(my_list))
# 3 is printed
my_list = [2, [2,3,2,2], [[2,1,2,1], [2,1,1]], [1,1,1]]
print(list_count(my_list))
# 4 is printed
@Akavall's answer reminded me this could be collapsed to a one-liner. (But I always get (usually justified) readability complaints on SO when I do this.)
def list_count(l):
    return (int(2 in l) +
            sum(list_count(el) for el in l
                if isinstance(el, Iterable)))
Original Answer (not what original question was looking for)
Update: @NPE updated his answer after the expected result was specified. His current answer works as the original poster desires.
@NPE's (original) answer is close, but you asked:
count which list elements contain the value 2
Which I read as you needing:
my_list = [[2,3,2,2], [[2,1,2,1], [2,1,1]], [1,1,1]]
print(sum([1 for e in [flatten(sl) for sl in my_list] if 2 in e]))
But he's looking for 3. My code gives 2 because it iterates across each top-level element and counts it if it contains any 2, but that's not what he actually wants.

Removing an element from a list based on a predicate

I want to remove every element from a list that contains 'X' or 'N'. I have to apply this to a large genome. Here is an example:
input:
codon=['AAT','XAC','ANT','TTA']
expected output:
codon=['AAT','TTA']
For basic purposes:
>>> [x for x in ['AAT','XAC','ANT','TTA'] if "X" not in x and "N" not in x]
['AAT', 'TTA']
But if you have a huge amount of data, I suggest using a dict or a set.
And if you have many characters other than X and N, you can do it like this:
>>> [x for x in ['AAT','XAC','ANT','TTA'] if not any(ch for ch in list(x) if ch in ["X","N","Y","Z","K","J"])]
['AAT', 'TTA']
NOTE: list(x) can be just x, and ["X","N","Y","Z","K","J"] can be just "XNYZKJ". Also see gnibbler's answer, which is the best one.
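Applying both simplifications from the NOTE, the many-character version could look like this (a sketch; the variable name bad is just for illustration):

```python
codon = ['AAT', 'XAC', 'ANT', 'TTA']
bad = "XNYZKJ"  # every character that disqualifies a codon
# Keep a codon only if none of its characters appears in bad
kept = [x for x in codon if not any(ch in bad for ch in x)]
print(kept)  # ['AAT', 'TTA']
```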
Another way, not the fastest, but I think it reads nicely:
>>> [x for x in ['AAT','XAC','ANT','TTA'] if not any(y in x for y in "XN")]
['AAT', 'TTA']
>>> [x for x in ['AAT','XAC','ANT','TTA'] if not set("XN")&set(x)]
['AAT', 'TTA']
This way will be faster for long codons (assuming there is some repetition)
codon = ['AAT','XAC','ANT','TTA']
def pred(s, memo={}):
    # The mutable default dict persists between calls and acts as a cache
    if s not in memo:
        memo[s] = not any(y in s for y in "XN")
    return memo[s]
print(list(filter(pred, codon)))
Here is the method suggested by James Brooks; you'd have to test to see which is faster for your data:
codon = ['AAT','XAC','ANT','TTA']
def pred(s, memo={}):
    if s not in memo:
        memo[s] = not set("XN") & set(s)
    return memo[s]
print(list(filter(pred, codon)))
For this sample codon, the version using sets is about 10% slower.
There is also the method of doing it using filter:
lst = list(filter(lambda x: 'X' not in x and 'N' not in x, your_list))
list(filter(lambda x: 'N' not in x and 'X' not in x, your_list))
your_list = [x for x in your_list if 'N' not in x and 'X' not in x]
I like gnibbler's memoization approach a lot. Either method using memoization should be about equally fast in the big picture on large data sets, as the memo dictionary should quickly be filled and the actual test should rarely be performed. With this in mind, we should be able to improve performance even more for large data sets. (This comes at some cost for very small ones, but who cares about those?) The following code only has to look up an item in the memo dict once when it is present, instead of twice (once to check membership, and again to extract the value).
codon = ['AAT', 'XAC', 'ANT', 'TTA']
def pred(s, memo={}):
    try:
        return memo[s]  # a single lookup on a cache hit
    except KeyError:
        memo[s] = not any(y in s for y in "XN")
        return memo[s]
filtered = list(filter(pred, codon))
As I said, this should be noticeably faster when the genome is large (or at least not extremely small).
If you don't want to duplicate the list, but just iterate over the filtered items, do something like:
for item in (item for item in codon if pred(item)):
    do_something(item)
If you're dealing with extremely large lists, you want to use methods that don't traverse the entire list any more than you absolutely need to.
Your best bet is likely to be creating a filter function and using itertools.ifilter (Python 2; in Python 3 the built-in filter is already lazy), e.g.:
new_seq = itertools.ifilter(lambda x: 'X' not in x and 'N' not in x, seq)
This defers testing each element until you actually iterate over it. Note that you can filter a filtered sequence just as you can the original sequence:
new_seq1 = itertools.ifilter(some_other_predicate, new_seq)
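In Python 3, where itertools.ifilter no longer exists, the same lazy chaining works with the built-in filter, which returns an iterator. A small sketch using the sample data from the question (the two predicates are just for illustration):

```python
codon = ['AAT', 'XAC', 'ANT', 'TTA']

# Nothing is tested yet: filter() returns a lazy iterator in Python 3
no_x = filter(lambda s: 'X' not in s, codon)
# A filter over a filter is still lazy
no_x_or_n = filter(lambda s: 'N' not in s, no_x)

# Elements are pulled through both predicates only now
result = list(no_x_or_n)
print(result)  # ['AAT', 'TTA']
```

An iterator can only be consumed once, so wrap it in list() if you need to traverse the result more than once.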
Edit:
Also, a little testing shows that memoizing found entries in a set is likely to provide enough of an improvement to be worth doing, and using a regular expression is probably not the way to go:
>>> import re, timeit
>>> seq = ['AAT','XAC','ANT','TTA']
>>> p = re.compile('[XN]')
>>> timeit.timeit('[x for x in seq if not p.search(x)]', 'from __main__ import p, seq')
3.4722548536196314
>>> timeit.timeit('[x for x in seq if "X" not in x and "N" not in x]', 'from __main__ import seq')
1.0560532134670666
>>> s = set(('XAC', 'ANT'))
>>> timeit.timeit('[x for x in seq if x not in s]', 'from __main__ import s, seq')
0.87923730529996647
Any reason for duplicating the entire list? How about:
>>> def pred(item, haystack="XN"):
...     return any(needle in item for needle in haystack)
...
>>> lst = ['AAT', 'XAC', 'ANT', 'TTA']
>>> idx = 0
>>> while idx < len(lst):
...     if pred(lst[idx]):
...         del lst[idx]
...     else:
...         idx = idx + 1
...
>>> lst
['AAT', 'TTA']
I know that list comprehensions are all the rage these days, but if the list is long we don't want to duplicate it without a reason, right? You can take this a step further and create a nice utility function:
>>> def remove_if(coll, predicate):
...     idx = len(coll) - 1
...     while idx >= 0:  # walk backwards so deletions don't shift unvisited items
...         if predicate(coll[idx]):
...             del coll[idx]
...         idx = idx - 1
...     return coll
...
>>> lst = ['AAT', 'XAC', 'ANT', 'TTA']
>>> remove_if(lst, pred)
['AAT', 'TTA']
>>> lst
['AAT', 'TTA']
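Each del from the middle of a list shifts every later element, so removing many items one at a time can cost O(n²) in the worst case. A single-pass alternative that still works in place is a two-pointer compaction (a standard technique, not taken from the answer above): keep survivors at the front, then truncate once. The name remove_if_compact is hypothetical.

```python
def remove_if_compact(coll, predicate):
    """Remove items matching predicate from coll in place, in one pass."""
    write = 0
    for item in coll:
        if not predicate(item):
            # write never exceeds the read position, so unread items are untouched
            coll[write] = item
            write += 1
    del coll[write:]  # drop the leftover tail in a single operation
    return coll


lst = ['AAT', 'XAC', 'ANT', 'TTA']
remove_if_compact(lst, lambda s: 'X' in s or 'N' in s)
print(lst)  # ['AAT', 'TTA']
```

Survivors keep their original relative order, and the list object itself is modified, so other references to it see the result.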
As S.Mark requested, here is my version. It's probably slower, but it does make it easier to change what gets removed.
def filter_genome(genome, killlist=set("X N".split())):
    # keep a codon only if it shares no characters with killlist
    return [codon for codon in genome if 0 == len(set(codon) & killlist)]
It is (asymptotically) faster to use a regular expression than to search the same string many times for a certain character: with a regular expression, each sequence is read at most once (instead of twice when the letters are not found, as in gnibbler's original answer, for instance). With gnibbler's memoization, the regular expression approach reads:
import re
remove = re.compile('[XN]').search
codon = ['AAT','XAC','ANT','TTA']
def pred(s, memo={}):
    if s not in memo:
        memo[s] = not remove(s)
    return memo[s]
print(list(filter(pred, codon)))
This should be (asymptotically) faster than using the "in s" or the "set" checks (i.e., the code above should be faster for long enough strings s).
I originally thought that gnibbler's answer could be written in a faster and more compact way with dict.setdefault():
codon = ['AAT','XAC','ANT','TTA']
def pred(s, memo={}):
    return memo.setdefault(s, not any(y in s for y in "XN"))
print(list(filter(pred, codon)))
However, as gnibbler noted, the value in setdefault is always evaluated (even though, in principle, it could be evaluated only when the dictionary key is not found).
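A different way to get memoization without a mutable default argument is functools.lru_cache from the standard library (not what the answers above use). With maxsize=None the cache is unbounded, and the function body only runs on a cache miss, so the test is not evaluated eagerly the way the setdefault default is. A sketch:

```python
from functools import lru_cache


@lru_cache(maxsize=None)  # unbounded cache: each distinct codon is tested once
def pred(s):
    # Runs only on a cache miss; repeated codons are answered from the cache
    return not any(y in s for y in "XN")


codon = ['AAT', 'XAC', 'ANT', 'TTA']
print(list(filter(pred, codon)))  # ['AAT', 'TTA']
```

pred.cache_info() reports hits and misses, which is a quick way to check how much repetition your data actually has.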
If you want to modify the actual list instead of creating a new one, here is a simple set of functions that you can use:
from typing import TypeVar, Callable, List
T = TypeVar("T")
def list_remove_first(lst: List[T], accept: Callable[[T], bool]) -> None:
    for i, v in enumerate(lst):
        if accept(v):
            del lst[i]
            return  # stop immediately; the enumeration is no longer valid after del
def list_remove_all(lst: List[T], accept: Callable[[T], bool]) -> None:
    # iterate indices in reverse so deletions don't shift unvisited items
    for i in reversed(range(len(lst))):
        if accept(lst[i]):
            del lst[i]
