Related
np.unique() can return indices of first occurrence, indices to reconstruct, and occurrence count. Is there any function/library that can do the same for any Python object?
Not as such. You can get similar functionality using different classes depending on your needs.
unique with no extra flags has a similar result to set:
unique_value = set(x)
collections.Counter simulates return_counts:
counts = collections.Counter(x)
unique_values = list(counts.keys())
unique_counts = list(counts.values())
To mimic return_index, use list.index on a set or Counter. This assumes that the container is a list
first_indices = [x.index(k) for k in counts]
To simulate return_inverse, we look at how unique is actually implemented. unique sorts the input to get the runs of elements. A similar technique can be acheived via sorted (or in-place list.sort) and itertools.groupby:
s = sorted(zip(x, itertools.count()))
inverse = [0] * len(x)
for i, (k, g) in enumerate(itertools.groupby(s, operator.itemgetter(0))):
for v in g:
inverse[v[1]] = i
In fact, the groupby approach encodes all the options:
s = sorted(zip(x, itertools.count()))
unique_values = []
first_indices = []
unique_counts = []
inverse = [0] * len(x)
for i, (k, g) in enumerate(itertools.groupby(s, operator.itemgetter(0))):
unique_values.append(k)
count = 1
v = next(g)
inverse[v[1]] = i
first_indices.append(v[0])
for v in g:
inverse[v[1]] = i
count += 1
unique_counts.append(count)
You can use Counter:
> from collections import Counter
> bob = ['bob','bob','dob']
> Counter(bob)
Counter({'bob': 2, 'dob': 1})
> Counter(bob).keys()
dict_keys(['bob', 'dob'])
How to improve the following code?
1 )-> pass a 1-dimensional array of length 2**n
2 )-> for every index of the array get the binary representation
3 )-> reverse the binary represenation and use it as new integer index for the corresponding value
EXAMPLE:
[56,209,81,42]
[00,01,10,11] (binary representation of the indices)
-> REVERSE:[00,10,01,11]
-> BECOMES:[56,81,209,42]
Code:
def order_in_reversed_bits(data: np.ndarray) -> np.ndarray:
tobinary = lambda t: np.binary_repr(t, width=len(np.binary_repr(data.shape[0]-1)))[::-1]
func = np.vectorize(tobinary)
a = func(np.arange(0,data.shape[0]))
t = np.zeros(data.shape,dtype='float64')
for i,k in enumerate(a):
t[int(k,2)] = data[i]
return t
Which built-in functionality of Numpy or Python is handy?
You could use sorted with custom key, for example (thanks to #hpaulj for improved key function with bin()):
lst = [56,209,81,42]
def order_in_reversed_bits_python(lst):
return [v for _, v in sorted(enumerate(lst), key=lambda k: bin(k[0])[:1:-1])]
print(order_in_reversed_bits_python(lst))
Prints:
[56, 81, 209, 42]
Timings:
import timeit
from random import randint
def order_in_reversed_bits_python(lst):
return [v for _, v in sorted(enumerate(lst), key=lambda k: bin(k[0])[:1:-1])]
def order_in_reversed_bits(data):
tobinary = lambda t: np.binary_repr(t, width=len(np.binary_repr(data.shape[0]-1)))[::-1]
func = np.vectorize(tobinary)
a = func(np.arange(0,data.shape[0]))
t = np.zeros(data.shape,dtype='float64')
for i,k in enumerate(a):
t[int(k,2)] = data[i]
return t
# create some large array:
lst = np.array([randint(1, 100) for _ in range(2**16)])
t1 = timeit.timeit(lambda: order_in_reversed_bits_python(lst), number=1)
t2 = timeit.timeit(lambda: order_in_reversed_bits(lst), number=1)
print(t1)
print(t2)
Prints:
0.05821935099811526
0.22723246600071434
which is improvement ~3.9x
This problem is known as bit-reversal permutation. The only difficult part is to reverse the binary representation of an index. Here you will find ways to do it. I choose the simplest one:
def bit_reversal_permutation(n):
indices = range(2**n)
rev_bits = lambda x: int(format(x, f'0{n}b')[::-1], 2)
return np.fromiter(map(rev_bits, indices), dtype=int)
A faster version, based on the observation that:
Each permutation in this sequence can be generated by concatenating two sequences of numbers: the previous permutation, doubled, and the same sequence with each value increased by one.
def bit_reversal_permutation(n):
indices = range(2**(n-1))
rev_bits = lambda x: int(format(x, f'0{n-1}b')[::-1], 2)
rev_indices = np.fromiter(map(rev_bits, indices), dtype=int)
return np.concatenate([2*rev_indices, 2*rev_indices + 1])
Example:
n = 4
a = np.random.randn(2**n)
inds_rev = bit_reversal_permutation(n)
a[inds_rev]
I have a list of strings like this:
lst = ['23532','user_name=app','content=123',
'###########################',
'54546','user_name=bee','content=998 hello','source=fb',
'###########################',
'12/22/2015']
I want a similar method like string.split('#') that can give me output like this:
[['23532','user_name=app','content='123'],
['54546','user_name=bee',content='998 hello','source=fb'],
['12/22/2015']]
but I know list has not split attribute. I cannot use ''.join(lst) either because this list comes from part of a txt file I read in and my txt.file was too big, so it will throw an memory error to me.
I don't think there's a one-liner for this, but you can easily write a generator to do what you want:
def sublists(lst):
x = []
for item in lst:
if item == '###########################': # or whatever condition you like
if x:
yield x
x = []
else:
x.append(item)
if x:
yield x
new_list = list(sublists(old_list))
If you can't use .join(), you can loop through the list and save the index of any string that contains # then loop again to slice the list:
lst = ['23532', 'user_name=app', 'content=123', '###########################' ,'54546','user_name=bee','content=998 hello','source=fb','###########################','12/22/2015']
idx = []
new_lst = []
for i,val in enumerate(lst):
if '#' in val:
idx.append(i)
j = 0
for x in idx:
new_lst.append(lst[j:x])
j = x+1
new_lst.append(lst[j:])
print new_lst
output:
[['23532', 'user_name=app', 'content=123'], ['54546', 'user_name=bee', 'content=998 hello', 'source=fb'], ['12/22/2015']]
sep = '###########################'
def split_list(_list):
global sep
lists = list()
sub_list = list()
for x in _list:
if x == sep:
lists.append(sub_list)
sub_list = list()
else:
sub_list.append(x)
lists.append(sub_list)
return lists
l = ['23532','user_name=app','content=123',
'###########################',
'54546','user_name=bee','content=998 hello','source=fb',
'###########################',
'12/22/2015']
pprint(split_list(l))
Output:
[['23532', 'user_name=app', 'content=123'],
['54546', 'user_name=bee', 'content=998 hello', 'source=fb'],
['12/22/2015']]
You can achieve this by itertools.groupby
from itertools import groupby
lst = ['23532','user_name=app','content=123',
'###########################','54546','user_name=bee','content=998 hello','source=fb',
'###########################','12/22/2015']
[list(g) for k, g in groupby(lst, lambda x: x == '###########################') if not k ]
Output
[['23532', 'user_name=app', 'content=123'],
['54546', 'user_name=bee', 'content=998 hello', 'source=fb'],
['12/22/2015']]
I try to sum a list of nested elements
e.g, numbers=[1,3,5,6,[7,8]] should produce sum=30
I wrote the following code :
def nested_sum(L):
sum=0
for i in range(len(L)):
if (len(L[i])>1):
sum=sum+nested_sum(L[i])
else:
sum=sum+L[i]
return sum
The above code gives following error:
object of type 'int' has no len()
I also tried len([L[i]]), still not working.
Anyone can help? It is Python 3.3
You need to use isinstance to check whether an element is a list or not. Also, you might want to iterate over the actual list, to make things simpler.
def nested_sum(L):
total = 0 # don't use `sum` as a variable name
for i in L:
if isinstance(i, list): # checks if `i` is a list
total += nested_sum(i)
else:
total += i
return total
One alternative solution with list comprehension:
>>> sum( sum(x) if isinstance(x, list) else x for x in L )
30
Edit:
And for lists with more than two levels(thx #Volatility):
def nested_sum(L):
return sum( nested_sum(x) if isinstance(x, list) else x for x in L )
It is generally considered more pythonic to duck type, rather than explicit type checking. Something like this will take any iterable, not just lists:
def nested_sum(a) :
total = 0
for item in a :
try:
total += item
except TypeError:
total += nested_sum(item)
return total
I would sum the flattened list:
def flatten(L):
'''Flattens nested lists or tuples with non-string items'''
for item in L:
try:
for i in flatten(item):
yield i
except TypeError:
yield item
>>> sum(flatten([1,3,5,6,[7,8]]))
30
A quick recursion that uses a lambda to handle the nested lists:
rec = lambda x: sum(map(rec, x)) if isinstance(x, list) else x
rec, applied on a list, will return the sum (recursively), on a value, return the value.
result = rec(a)
This code also works.
def add_all(t):
total = 0
for i in t:
if type(i) == list: # check whether i is list or not
total = total + add_all(i)
else:
total += i
return total
An example using filter and map and recursion:
def islist(x):
return isinstance(x, list)
def notlist(x):
return not isinstance(x, list)
def nested_sum(seq):
return sum(filter(notlist, seq)) + map(nested_sum, filter(islist, seq))
And here is an example using reduce and recursion
from functools import reduce
def nested_sum(seq):
return reduce(lambda a,b: a+(nested_sum(b) if isinstance(b, list) else b), seq)
An example using plain old recursion:
def nested_sum(seq):
if isinstance(seq[0], list):
head = nested_sum(seq[0])
else:
head = seq[0]
return head + nested_sum(seq[1:])
An example using simulated recursion:
def nested_sum(seq):
stack = []
stack.append(seq)
result = 0
while stack:
item = stack.pop()
if isinstance(item, list):
for e in item:
stack.append(e)
else:
result += item
return result
Adjustment for handling self-referential lists is left as an exercise for the reader.
def sum_nest_lst(lst):
t=0
for l in lst:
if(type(l)==int):
t=t+l
if(type(l)==list):
t=t+sum(l)
print(t)
def nnl(nl): # non nested list function
nn = []
for x in nl:
if type(x) == type(5):
nn.append(x)
if type(x) == type([]):
n = nnl(x)
for y in n:
nn.append(y)
return sum(nn)
print(nnl([[9, 4, 5], [3, 8,[5]], 6])) # output:[9,4,5,3,8,5,6]
a = sum(nnl([[9, 4, 5], [3, 8,[5]], 6]))
print (a) # output: 40
A simple solution would be to use nested loops.
def nested_sum(t):
sum=0
for i in t:
if isinstance(i, list):
for j in i:
sum +=j
else:
sum += i
return sum
L = [1, 2, 3, [4, 5, 6], 5, [7, 8, 9]]
total = 0 # assign any var
for a in L: # assign index and start to iterate using if else
if (isinstance(a, list)): # since its a list you are basically repeating the prev step
for b in a:
total += b
else:
total += a
print(total)
def list_sum(L):
return sum(list_sum(x) if isinstance(x, list) else x for x in L)
def nested_sum(lists):
total = 0
for lst in lists:
s = sum(lst)
total += s
return total
#nested sum
l = [[1, 2], [3,5], [6,2], [4, 5, 6,9]]
def nested_sum(lst):
sum = 0
for i in lst:
for j in i:
sum = sum + j
print(sum)
nested_sum(l)
I have a few arrays containing integer and strings. For example:
myarray1 = [1,2,3,"ab","cd",4]
myarray2 = [1,"a",2,3,"bc","cd","e",4]
I'm trying to combine only the strings in an array that are next to each other. So I want the result to be:
newarray1= [1,2,3,"abcd",4]
newarray2= [1,"a",2,3,"bccde",4]
Does anyone know how to do this? Thank you!
The groupby breaks the list up into runs of strings and runs of integers. The ternary operation joins the groups of strings and puts them into a temporary sequence. The chain re-joins the strings and the runs of integers.
from itertools import groupby, chain
def joinstrings(iterable):
return list(chain.from_iterable(
(''.join(group),) if key else group
for key, group in
groupby(iterable, key=lambda elem: isinstance(elem, basestring))))
>>> myarray1 = [1,2,3,"ab","cd",4]
>>> newarray1 = [myarray1[0]]
>>> for item in myarray1[1:]:
... if isinstance(item, str) and isinstance(newarray1[-1], str):
... newarray1[-1] = newarray1[-1] + item
... else:
... newarray1.append(item)
>>> newarray1
[1, 2, 3, 'abcd', 4]
reduce(lambda x, (tp, it): tp and x + ["".join(it)] or x+list(it), itertools.groupby( myarray1, lambda x: isinstance(x, basestring) ), [])
a = [1,2,3,"ab","cd",4]
b = [1,a,2,3,"bc","cd","e",4]
def func(a):
ret = []
s = ""
for x in a:
if isinstance(x, basestring):
s = s + x
else:
if s:
ret.append(s)
s = ""
ret.append(x)
return ret
print func(a)
print func(b)