Find unique pairs of array with Python

I'm searching for a faster, Pythonic way to do this operation:
import numpy as np
von_knoten = np.array([0, 0, 1, 1, 1, 2, 2, 2, 3, 4])
zu_knoten = np.array([1, 2, 0, 2, 3, 0, 1, 4, 1, 2])
try:
    for i in range(0,len(von_knoten)-1):
        for j in range(0,len(von_knoten)-1):
            if (i != j) & ([von_knoten[i],zu_knoten[i]] == [zu_knoten[j],von_knoten[j]]):
                print(str(i)+".column equal " +str(j)+".column")
                von_knoten = sp.delete(von_knoten , j)
                zu_knoten = sp.delete(zu_knoten , j)
    print(von_knoten)
    print(zu_knoten)
except:
    print('end')
So I need the fastest way to get
[0 0 1 1 4]
[1 2 2 3 2]
from
[0 0 1 1 1 2 2 2 3 4]
[1 2 0 2 3 0 1 4 1 2]
Thanks ;)

Some comments about your code: as written, it does not do what you want; it only prints some stuff. Did you even try to run it? Could you show us what you get?
First, simply use range(len(von_knoten)): range starts at 0 by default and stops one step before its end argument, so the -1 makes you skip the last element.
If you delete items from the input arrays while looping over them and then access items near their end, you will likely get IndexErrors before you have finished analysing the inputs.
You call sp.delete, but sp is not defined anywhere (we don't know what it is, and neither does the code), so this raises a NameError; perhaps you meant np.delete.
Also, please do not use a bare except:. It catches exceptions you never dreamt of, and may explain why you don't understand what's wrong.
Then, how about using the built-in zip function to build sorted 2-tuples, and removing the duplicates with a set? Something like:
>>> von_knoten = [0, 0, 1, 1, 1, 2, 2, 2, 3, 4]
>>> zu_knoten = [1, 2, 0, 2, 3, 0, 1, 4, 1, 2]
>>> set(tuple(sorted([m, n])) for m, n in zip(von_knoten, zu_knoten))
{(0, 1), (0, 2), (1, 2), (1, 3), (2, 4)}
I'll leave it to you to adapt this to get exactly the output you're looking for (note that a set does not preserve the input order).
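One possible adaptation, sketched here as an assumption rather than as part of the original answer: use the sorted tuple only as a lookup key, and keep the first occurrence of each unordered pair in its original orientation and order. The function name unique_pairs_ordered is hypothetical.

```python
def unique_pairs_ordered(von, zu):
    seen = set()  # unordered pairs already emitted, stored as sorted tuples
    von_out, zu_out = [], []
    for m, n in zip(von, zu):
        key = tuple(sorted((m, n)))
        if key not in seen:
            seen.add(key)
            von_out.append(m)
            zu_out.append(n)
    return von_out, zu_out

von_knoten = [0, 0, 1, 1, 1, 2, 2, 2, 3, 4]
zu_knoten = [1, 2, 0, 2, 3, 0, 1, 4, 1, 2]
print(unique_pairs_ordered(von_knoten, zu_knoten))
# ([0, 0, 1, 1, 2], [1, 2, 2, 3, 4])
```

Because it keeps the first occurrence, it returns (2, 4) rather than the (4, 2) shown in the question; set lookups keep the whole pass linear in the input length.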

You are trying to build up a collection of pairs you haven't seen before.
You can use not in, but you need to check both orientations of each pair:
L = []
for x, y in zip(von_knoten, zu_knoten):
    if (x, y) not in L and (y, x) not in L:
        L.append((x, y))
This gives a list of tuples
[(0, 1), (0, 2), (1, 2), (1, 3), (2, 4)]
which you can reshape.
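For example, a short sketch of turning that list of tuples back into two NumPy arrays, one per original input:

```python
import numpy as np

L = [(0, 1), (0, 2), (1, 2), (1, 3), (2, 4)]
# np.array(L) has shape (5, 2); transposing gives one row per original array.
von_unique, zu_unique = np.array(L).T
print(von_unique)  # [0 0 1 1 2]
print(zu_unique)   # [1 2 2 3 4]
```

Each `not in` test on a list scans the whole list, so for large inputs a set of seen pairs would avoid the quadratic cost.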

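Here's a vectorized approach:
def unique_pairs(von_knoten, zu_knoten):
    # Encode each pair as a single integer, in both orientations
    s = np.max([von_knoten, zu_knoten])+1
    p1 = zu_knoten*s + von_knoten
    p2 = von_knoten*s + zu_knoten
    # The larger code is the same for (u, v) and (v, u)
    p = np.maximum(p1,p2)
    # Stable sort keeps the first occurrence first within each group of equal codes
    sidx = p.argsort(kind='mergesort')
    ps = p[sidx]
    # Mark the first element of each group of equal codes
    m = np.concatenate(([True],ps[1:] != ps[:-1]))
    sm = sidx[m]
    return von_knoten[sm],zu_knoten[sm]
Sample run: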
In [417]: von_knoten = np.array([0, 0, 1, 1, 1, 2, 2, 2, 3, 4])
...: zu_knoten = np.array([1, 2, 0, 2, 3, 0, 1, 4, 1, 2])
In [418]: unique_pairs(von_knoten, zu_knoten)
Out[418]: (array([0, 0, 1, 1, 2]), array([1, 2, 2, 3, 4]))
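Why np.maximum(p1, p2) works: for non-negative integers smaller than s, the code whose "high digit" holds the larger element always wins, so the result equals max(u, v)*s + min(u, v), which is identical for (u, v) and (v, u). A small check of that claim, assuming non-negative integer node ids (negative values would break the encoding); pair_keys is a hypothetical helper:

```python
import numpy as np

def pair_keys(a, b):
    # s is larger than every value, so max*s + min uniquely encodes {u, v}.
    s = np.max([a, b]) + 1
    return np.maximum(a, b) * s + np.minimum(a, b)

von_knoten = np.array([0, 0, 1, 1, 1, 2, 2, 2, 3, 4])
zu_knoten = np.array([1, 2, 0, 2, 3, 0, 1, 4, 1, 2])
s = np.max([von_knoten, zu_knoten]) + 1
# The p computed inside unique_pairs, recomputed here for comparison.
p = np.maximum(zu_knoten * s + von_knoten, von_knoten * s + zu_knoten)
print(pair_keys(von_knoten, zu_knoten).tolist())
# [5, 10, 5, 11, 16, 10, 11, 22, 16, 22]
print(np.array_equal(p, pair_keys(von_knoten, zu_knoten)))  # True
```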

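Using np.unique and the void-view trick, which lets np.unique treat each row of a 2-D array as a single element:
def unique_pairs(a, b):
    # Sort within each pair so (1, 0) and (0, 1) become the same row
    c = np.sort(np.stack([a, b], axis = 1), axis = 1)
    # View each row as one opaque np.void item so np.unique compares whole rows
    c_view = np.ascontiguousarray(c).view(np.dtype((np.void,
                                                    c.dtype.itemsize * c.shape[1])))
    _, i = np.unique(c_view, return_index = True)
    return a[i], b[i]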
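With NumPy 1.13 or later, np.unique accepts an axis argument, which removes the need for the void view. A sketch of the same idea under that assumption, with an extra np.sort on the indices so the pairs come back in input order (unique_pairs_axis0 is a hypothetical name):

```python
import numpy as np

def unique_pairs_axis0(a, b):
    # Sort within each pair so (1, 0) and (0, 1) become the same row.
    c = np.sort(np.stack([a, b], axis=1), axis=1)
    # axis=0 makes np.unique compare whole rows; return_index gives the
    # index of the first occurrence of each unique row.
    _, idx = np.unique(c, axis=0, return_index=True)
    # Sort the indices so the output follows the original input order.
    idx = np.sort(idx)
    return a[idx], b[idx]

von_knoten = np.array([0, 0, 1, 1, 1, 2, 2, 2, 3, 4])
zu_knoten = np.array([1, 2, 0, 2, 3, 0, 1, 4, 1, 2])
print(unique_pairs_axis0(von_knoten, zu_knoten))
# (array([0, 0, 1, 1, 2]), array([1, 2, 2, 3, 4]))
```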

Related

Replace consecutive identical elements at the beginning of an array with 0

I want to replace the first N identical consecutive numbers of an array with 0.
import numpy as np
x = np.array([1, 1, 1, 1, 2, 3, 1, 2, 3, 2, 2, 2, 3, 3, 3, 1, 1, 2, 2])
OUT -> np.array([0, 0, 0, 0, 2, 3, 1, 2, 3, 2, 2, 2, 3, 3, 3, 1, 1, 2, 2])
A loop works, but what would be a faster, vectorized implementation?
i = 0
first = x[0]
# Check the bound first, otherwise x[i] raises IndexError when every element is equal
while i < x.size and x[i] == first:
    x[i] = 0
    i += 1
You can use argmax on a boolean array to get the index of the first changing value.
Then slice and replace:
n = (x!=x[0]).argmax() # 4
x[:n] = 0
output:
array([0, 0, 0, 0, 2, 3, 1, 2, 3, 2, 2, 2, 3, 3, 3, 1, 1, 2, 2])
intermediate array:
(x!=x[0])
# n=4
# [False False False False True True True True True True True True
# True True True True True True True]
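One caveat: if every element equals x[0], the boolean array is all False and argmax returns 0, so nothing gets replaced. A sketch that guards against that case (zero_leading_run is a hypothetical name; a non-empty array is assumed):

```python
import numpy as np

def zero_leading_run(x):
    # Work on a copy so the caller's array is left untouched.
    x = x.copy()
    changed = x != x[0]
    # argmax of an all-False array is 0, so treat "no change" as "whole array".
    n = changed.argmax() if changed.any() else x.size
    x[:n] = 0
    return x

x = np.array([1, 1, 1, 1, 2, 3, 1, 2, 3, 2, 2, 2, 3, 3, 3, 1, 1, 2, 2])
print(zero_leading_run(x))  # [0 0 0 0 2 3 1 2 3 2 2 2 3 3 3 1 1 2 2]
print(zero_leading_run(np.array([7, 7, 7])))  # [0 0 0]
```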
My solution is based on itertools.groupby, so start with import itertools.
This function creates groups of consecutive equal values, unlike e.g. the pandas version of groupby, which collects all equal values from the input into a single group.
Another important feature is that you can set N to any value: only runs of at least N consecutive equal values are affected, and only their first N elements are replaced.
To test my code, I set N = 4 and defined the source array as:
x = np.array([1, 1, 1, 1, 2, 3, 1, 2, 3, 2, 2, 2, 3, 3, 3, 1, 1, 2, 2, 2, 2, 2])
Note that it contains 5 consecutive values of 2 at the end.
Then, to get the expected result, run:
rv = []
for key, grp in itertools.groupby(x):
    lst = list(grp)
    lgth = len(lst)
    if lgth >= N:
        lst[0:N] = [0] * N
    rv.extend(lst)
xNew = np.array(rv)
The result is:
[0, 0, 0, 0, 2, 3, 1, 2, 3, 2, 2, 2, 3, 3, 3, 1, 1, 0, 0, 0, 0, 2]
Note that a sequence of 4 zeroes occurs twice:
at the beginning (all 4 values of 1 have been replaced),
almost at the end (the first 4 of the 5 values of 2 have been replaced).
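Since the question asks for something vectorized, here is a sketch (not from the original answers) that finds run boundaries with NumPy and applies the same "first N of each run of at least N" rule; only the short loop over qualifying runs stays in Python. The function name is hypothetical.

```python
import numpy as np

def zero_first_n_of_long_runs(x, n):
    x = np.asarray(x)
    # True at index 0 and wherever the value changes: the start of each run.
    starts = np.flatnonzero(np.r_[True, x[1:] != x[:-1]])
    # Each run ends where the next one starts (or at the end of the array).
    lengths = np.diff(np.r_[starts, x.size])
    out = x.copy()
    # Zero the first n elements of every run that is at least n long.
    for s in starts[lengths >= n]:
        out[s:s + n] = 0
    return out

N = 4
x = np.array([1, 1, 1, 1, 2, 3, 1, 2, 3, 2, 2, 2, 3, 3, 3, 1, 1, 2, 2, 2, 2, 2])
print(zero_first_n_of_long_runs(x, N).tolist())
# [0, 0, 0, 0, 2, 3, 1, 2, 3, 2, 2, 2, 3, 3, 3, 1, 1, 0, 0, 0, 0, 2]
```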

Repeating each element of a vector by a number of times provided by another counts vector [duplicate]

Say I have an array with longitudes, lonPort:
lonPort = np.loadtxt('LongPorts.txt', delimiter=',')
for example:
lonPort=[0,1,2,3,...]
And I want to repeat each element a different amount of times. How do I do this? This is what I tried:
Repeat =[5, 3, 2, 3,...]
lonPort1=[]
for i in range (0,len(lenDates)):
    lonPort1[sum(Repeat[0:i])]=np.tile(lonPort[i],Repeat[i])
So the result would be:
lonPort1=[0,0,0,0,0,1,1,1,2,2,3,3,3,...]
The error I get is:
list assignment index out of range
How do I get rid of the error and make my array?
Thank you!
You can use np.repeat():
np.repeat(a, [5,3,2,3])
Example:
In [3]: a = np.array([0,1,2,3])
In [4]: np.repeat(a, [5,3,2,3])
Out[4]: array([0, 0, 0, 0, 0, 1, 1, 1, 2, 2, 3, 3, 3])
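The error itself comes from assigning to a position of an empty list: Python lists do not grow on index assignment. If you prefer to keep a loop, a sketch that extends the list instead (plain lists stand in for the loaded file data, and zip replaces the question's range(len(lenDates))):

```python
lonPort = [0, 1, 2, 3]
Repeat = [5, 3, 2, 3]

lonPort1 = []
for value, count in zip(lonPort, Repeat):
    # extend grows the list; assigning to lonPort1[k] on an empty list cannot
    lonPort1.extend([value] * count)

print(lonPort1)  # [0, 0, 0, 0, 0, 1, 1, 1, 2, 2, 3, 3, 3]
```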
Without relying on NumPy, you can create a generator that consumes your items one by one and repeats each of them the desired number of times.
x = [0, 1, 2, 3]
repeat = [4, 3, 2, 1]
def repeat_items(x, repeat):
    for item, r in zip(x, repeat):
        while r > 0:
            yield item
            r -= 1
for value in repeat_items(x, repeat):
    print(value, end=' ')
displays 0 0 0 0 1 1 1 2 2 3.
Providing a numpy-free solution for future readers that might want to use lists.
>>> lst = [0,1,2,3]
>>> repeat = [5, 3, 2, 3]
>>> [x for sub in ([x]*y for x,y in zip(lst, repeat)) for x in sub]
[0, 0, 0, 0, 0, 1, 1, 1, 2, 2, 3, 3, 3]
If lst contains mutable objects, be aware of the pitfalls of sequence multiplication for sequences holding mutable elements.
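Another standard-library option, sketched here as an alternative: itertools.repeat(item, count) produces each run lazily, and chain.from_iterable flattens the runs.

```python
from itertools import chain, repeat

lst = [0, 1, 2, 3]
counts = [5, 3, 2, 3]
# map calls repeat(item, count) for each pair; chain flattens the iterators.
result = list(chain.from_iterable(map(repeat, lst, counts)))
print(result)  # [0, 0, 0, 0, 0, 1, 1, 1, 2, 2, 3, 3, 3]
```

itertools.repeat also yields the same object each time, so the caveat about mutable elements applies here too.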

How to duplicate a specific value in a list/array?

Any advice on how to repeat a certain value in an array in Python?
For instance, I want to repeat only 2 in array_a:
array_a = [1, 2, 1, 2, 1, 1, 2]
The wanted outcome repeats each 2 and leaves the 1s alone:
array_a = [1, 2, 2, 1, 2, 2, 1, 1, 2, 2] # only the `2` should be repeated
I tried numpy and I could duplicate the entire array but not a certain value.
If you're interested in a numpy solution, you can repeat an array on itself using np.repeat.
>>> import numpy as np
>>> np.repeat(array_a, array_a)
array([1, 2, 2, 1, 2, 2, 1, 1, 2, 2])
This works only if you have 1s and 2s in your data. For a generic solution, consider
>>> n_repeats = 2
>>> temp = np.where(np.array(array_a) == 2, n_repeats, 1)
>>> np.repeat(array_a, temp)
array([1, 2, 2, 1, 2, 2, 1, 1, 2, 2])
Maybe you can use a dictionary that maps each unique element to the number of times it needs to be repeated, then use a list comprehension to build the result:
array_a = [1,2,1,2,1,1,2]
repeat_times = {1:1, 2:2} # 1 is 1 time and 2 is repeated two times
result = [i for i in array_a for j in range(repeat_times[i])]
print(result)
Output:
[1, 2, 2, 1, 2, 2, 1, 1, 2, 2]
This seems a good use-case for a generator:
>>> def repeater(iterable, repeat_map):
...     for value in iterable:
...         for i in range(repeat_map.get(value, 1)):
...             yield value
...
>>> array_a = [1,2,1,2,1,1,2]
>>> list(repeater(array_a, repeat_map={2: 2}))
[1, 2, 2, 1, 2, 2, 1, 1, 2, 2]
If you convert this to a list, you can loop through it and, whenever an element matches your criterion, add an extra copy. For example:
a = [1,2,1,2,1,1,2]
long_a = []
for x in a:
    long_a.append(x)
    if x == 2:
        long_a.append(x)
loop over the array (a 'list' in Python)
find the number
get the position of the matched number in the array
insert another number after each matched position
https://docs.python.org/3/reference/compound_stmts.html#for
https://docs.python.org/2/tutorial/datastructures.html#more-on-lists
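Taken literally (find each match, then insert a copy after it), those steps can be sketched in place with list.insert. Walking backwards keeps the positions still to visit valid. The name duplicate_value is hypothetical:

```python
def duplicate_value(a, target):
    # Walk backwards so each insert does not shift the positions still to visit.
    for pos in range(len(a) - 1, -1, -1):
        if a[pos] == target:
            a.insert(pos + 1, target)  # insert a copy right after the match
    return a

a = [1, 2, 1, 2, 1, 1, 2]
print(duplicate_value(a, 2))  # [1, 2, 2, 1, 2, 2, 1, 1, 2, 2]
```

This modifies the list in place, and each insert shifts the tail, so the append-based loop is the better choice for long lists.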
An attempt using comprehensions.
array = [1, 2, 1, 2, 1, 1, 2]
element_to_repeat = 2
result = [
repeats_element
for repeats in
((element,)*2 if element == element_to_repeat else (element,) for element in array)
for repeats_element in repeats
]
It basically spits out tuples, "repeats", which contain the element once if it's not the element to repeat, or twice if it's the element to repeat. Then all of the elements of these "repeats" tuples are flattened into the answer.
Using a generator.
array = [1, 2, 1, 2, 1, 1, 2]
element_to_repeat = 2
def add_repeats(array, element_to_repeat):
    for element in array:
        if element == element_to_repeat:
            yield element
            yield element
        else:
            yield element
Here is a handy one-liner using itertools and a list comprehension with a conditional expression (if/else) in it. It first builds a nested list (so that items at certain positions can be repeated) and then flattens it with chain.from_iterable():
from itertools import chain
array_a = [1, 2, 1, 2, 1, 1, 2]
list(chain.from_iterable([[item, item] if item == 2 else [item] for item in array_a]))
[1, 2, 2, 1, 2, 2, 1, 1, 2, 2] # output
The specific value to double sits in the conditional expression. Using a multiplier (instead of [item, item]) and a variable (instead of 2) makes this easily more generic, for example:
from itertools import chain
def repeat_certain_value(array, val, n):
    return list(chain.from_iterable(([i] * n if i == val else [i] for i in array)))
repeat_certain_value([1, 2, 1, 2, 1, 1, 2], 2, 2)
[1, 2, 2, 1, 2, 2, 1, 1, 2, 2] # output
repeat_certain_value([0, -3, 1], -3, 5)
[0, -3, -3, -3, -3, -3, 1] # output
While this approach is a handy one-liner using builtin libraries, the approach from coldspeed is faster:
%timeit for x in range(1000): repeat_certain_value([1, 1, 1, 2, 2, 2, 3, 3, 3] * 100, 2, 2)
10 loops, best of 3: 165 ms per loop
%timeit for x in range(1000): coldspeeds_solution([1, 1, 1, 2, 2, 2, 3, 3, 3] * 100, 2, 2)
10 loops, best of 3: 100 ms per loop
You can try a list comprehension together with a small flat helper function:
array_a = [1, 2, 1, 2, 1, 1, 2]
def flat(l):
    newl=[]
    for i in l:
        if isinstance(i,list):
            newl.extend(i)
        else:
            newl.append(i)
    return newl
print(flat([[i]*2 if i==2 else i for i in array_a]))
Output:
[1, 2, 2, 1, 2, 2, 1, 1, 2, 2]

Decompress an array in Python

I need to decompress an array and I am not sure where to start.
Here is the input of the function
def main():
    # Test case for Decompress function
    B = [6, 2, 7, 1, 3, 5, 1, 9, 2, 0]
    A = Decompress(B)
    print(A)
I want this output:
A = [2, 2, 2, 2, 2, 2, 1, 1, 1, 1, 1, 1, 1, 5, 5, 5, 9, 0, 0]
If you can't see the pattern: B[0] is how many times B[1] shows up in A, then B[2] is how many times B[3] shows up in A, and so on.
How do I write a function for this?
Compact version with zip() and itertools.chain.from_iterable:
from itertools import chain
list(chain.from_iterable([v] * c for c, v in zip(*([iter(B)]*2))))
Demo:
>>> B = [6, 2, 7, 1, 3, 5, 1, 9, 2, 0]
>>> from itertools import chain
>>> list(chain.from_iterable([v] * c for c, v in zip(*([iter(B)]*2))))
[2, 2, 2, 2, 2, 2, 1, 1, 1, 1, 1, 1, 1, 5, 5, 5, 9, 0, 0]
Breaking this down:
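zip(*([iter(B)]*2)) pairs counts with values (in Python 3, zip returns an iterator, so wrap it in list() to see the pairs):
>>> list(zip(*([iter(B)]*2)))
[(6, 2), (7, 1), (3, 5), (1, 9), (2, 0)]
It is a fairly standard Python trick to get pairs out of an input iterable: both arguments passed to zip are the same iterator, so each tuple takes two consecutive items.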
([v] * c for c, v in zip(*([iter(B)]*2))) is a generator expression that takes the counts and values and produces lists with the value repeated count times:
>>> next([v] * c for c, v in zip(*([iter(B)]*2)))
[2, 2, 2, 2, 2, 2]
chain.from_iterable takes the various lists produced by the generator expression and lets you iterate over them as if they were one long list.
list() turns it all back to a list.
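An alternative to the iter/zip pairing trick is plain slicing: B[0::2] holds the counts and B[1::2] the values. A sketch (lowercase decompress is an illustrative name, not the question's Decompress):

```python
def decompress(b):
    # Even positions hold the counts, odd positions hold the values.
    counts, values = b[0::2], b[1::2]
    return [v for c, v in zip(counts, values) for _ in range(c)]

B = [6, 2, 7, 1, 3, 5, 1, 9, 2, 0]
print(decompress(B))
# [2, 2, 2, 2, 2, 2, 1, 1, 1, 1, 1, 1, 1, 5, 5, 5, 9, 0, 0]
```

With NumPy, np.repeat(B[1::2], B[0::2]) produces the same values as an array.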
def unencodeRLE(i):
    i = list(i) # Copies the list to a new list, so the original one is not changed.
    r = []
    while i:
        count = i.pop(0)
        n = i.pop(0)
        r += [n for _ in range(count)]
    return r
One more one-liner:
def decompress(vl):
    return sum([vl[i] * [vl[i+1]] for i in range(0, len(vl), 2)], [])
A list comprehension extracts and unpacks pairs (range(0, len(vl), 2) iterates through the start indices of the pairs, vl[i] is the number of repetitions, vl[i+1] is what to repeat).
sum() joins the results together ([] is the initial value the unpacked lists are sequentially added to).
A slightly faster solution (with Python 2.7.3):
A=list(chain.from_iterable( [ B[i]*[B[i+1]] for i in xrange(0,len(B),2) ] ) )
>>> timeit.Timer(
setup='B=[6,2,7,1,3,5,1,9,2,0];from itertools import chain',
stmt='A=list(chain.from_iterable( [ B[i]*[B[i+1]] for i in xrange(0,len(B),2) ] ) )').timeit(100000)
0.22841787338256836
Comparing with:
>>> timeit.Timer(
setup='B=[6,2,7,1,3,5,1,9,2,0];from itertools import chain',
stmt='A=list(chain.from_iterable([v] * c for c, v in zip(*([iter(B)]*2))))').timeit(100000)
0.31104111671447754

How to split an integer into a list of digits?

Suppose I have an input integer 12345. How can I split it into a list like [1, 2, 3, 4, 5]?
Convert the number to a string so you can iterate over it, then convert each digit (character) back to an int inside a list-comprehension:
>>> [int(i) for i in str(12345)]
[1, 2, 3, 4, 5]
Return a list of strings:
>>> list(str(12345))
['1', '2', '3', '4', '5']
Return a list of integers (in Python 3, map returns an iterator, so wrap it in list()):
>>> list(map(int, str(12345)))
[1, 2, 3, 4, 5]
I'd rather not turn an integer into a string, so here's the function I use for this:
def digitize(n, base=10):
    if n == 0:
        yield 0
    while n:
        n, d = divmod(n, base)
        yield d
Examples:
tuple(digitize(123456789)) == (9, 8, 7, 6, 5, 4, 3, 2, 1)
tuple(digitize(0b1101110, 2)) == (0, 1, 1, 1, 0, 1, 1)
tuple(digitize(0x123456789ABCDEF, 16)) == (15, 14, 13, 12, 11, 10, 9, 8, 7, 6, 5, 4, 3, 2, 1)
As you can see, this will yield digits from right to left. If you'd like the digits from left to right, you'll need to create a sequence out of it, then reverse it:
reversed(tuple(digitize(x)))
You can also use this function for base conversion as you split the integer. The following example splits a hexadecimal number into binary nibbles as tuples:
import itertools as it
tuple(it.zip_longest(*[digitize(0x123456789ABCDEF, 2)]*4, fillvalue=0)) == ((1, 1, 1, 1), (0, 1, 1, 1), (1, 0, 1, 1), (0, 0, 1, 1), (1, 1, 0, 1), (0, 1, 0, 1), (1, 0, 0, 1), (0, 0, 0, 1), (1, 1, 1, 0), (0, 1, 1, 0), (1, 0, 1, 0), (0, 0, 1, 0), (1, 1, 0, 0), (0, 1, 0, 0), (1, 0, 0, 0))
Note that this method doesn't handle non-integers (though it could be adapted to), and it loops forever on negative input, because divmod(-1, 10) == (-1, 9).
[int(i) for i in str(number)]
or, if you do not want to use a list comprehension, or you want to use a base other than 10:
from __future__ import division # optional: // already floor-divides in both Python 2 and 3
def digits(number, base=10):
    assert number >= 0
    if number == 0:
        return [0]
    l = []
    while number > 0:
        l.append(number % base)
        number = number // base
    # Digits are collected least significant first
    return l
While list(map(int, str(x))) is the Pythonic approach, you can formulate logic to derive digits without any type conversion:
from math import log10
def digitize(x):
    n = int(log10(x))
    for i in range(n, -1, -1):
        factor = 10**i
        k = x // factor
        yield k
        x -= k * factor
res = list(digitize(5243))
[5, 2, 4, 3]
One benefit of a generator is you can feed seamlessly to set, tuple, next, etc, without any additional logic.
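One caveat with the log10-based version: math.log10(0) raises ValueError, and floating-point rounding can make int(log10(x)) one too large for some integers just below a power of ten (for example 999999999999999), which produces a spurious leading 0. A sketch of an integer-only variant that keeps the no-string-conversion goal, assuming a non-negative int input (digitize_int is a hypothetical name):

```python
def digitize_int(x):
    # Integer-only variant: no log10, so no floating-point rounding.
    if x == 0:
        yield 0
        return
    # Find the largest power of ten that is <= x.
    factor = 1
    while factor * 10 <= x:
        factor *= 10
    # Peel off digits from the most significant one down.
    while factor:
        k, x = divmod(x, factor)
        yield k
        factor //= 10

print(list(digitize_int(5243)))  # [5, 2, 4, 3]
```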
Like nd says, but using the built-in int() with a base argument to convert digits written in a different base:
>>> [ int(i,16) for i in '0123456789ABCDEF' ]
[0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15]
>>> [int(i,2) for i in "100 010 110 111".split()]
[4, 2, 6, 7]
Another solution that does not involve converting to/from strings:
from math import log10
def decompose(n):
    if n == 0:
        return [0]
    b = int(log10(n)) + 1
    return [(n // (10 ** i)) % 10 for i in reversed(range(b))]
Using join and split methods of strings:
>>> a=12345
>>> list(map(int,' '.join(str(a)).split()))
[1, 2, 3, 4, 5]
>>> [int(i) for i in ' '.join(str(a)).split()]
[1, 2, 3, 4, 5]
>>>
Here we also use map or a list comprehension to get a list.
Strings are just as iterable as arrays, so just convert it to string:
str(12345)
Simply turn it into a string, iterate over the characters, and convert each one back to an integer:
import numpy as np
nums = []
c = 12345
for i in str(c):
    nums.append(int(i))  # convert each digit character back to an int
np.array(nums)
