Numpy, duplicating data in different order - python

I have data for example
>>> a = np.array([1, 2, 3, 4])
>>> b = np.array([3, 4, 5, 6])
I want to duplicate each item in each vector as many times as the length of the vector. So the results are
>>> a2 = np.array([1, 1, 1, 1, 2, 2, 2, 2, 3, 3, 3, 3, 4, 4, 4, 4])
>>> b2 = np.array([3, 4, 5, 6, 3, 4, 5, 6, 3, 4, 5, 6, 3, 4, 5, 6])
Using np.tile(b, len(b)) can output b2. However, how can I get a2?

The two replications are a bit different. The first one can be obtained with .repeat(..) [numpy-doc]:
>>> a.repeat(len(a))
array([1, 1, 1, 1, 2, 2, 2, 2, 3, 3, 3, 3, 4, 4, 4, 4])
the second one with .tile(..) [numpy-doc]:
>>> np.tile(b, len(b))
array([3, 4, 5, 6, 3, 4, 5, 6, 3, 4, 5, 6, 3, 4, 5, 6])
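As a small sketch of the difference between the two calls, the snippet below builds both results and stacks them side by side. The names `a2`, `b2` and `pairs` are only illustrative, and pairing the two arrays is just one possible use of them, not something the question states.

```python
import numpy as np

a = np.array([1, 2, 3, 4])
b = np.array([3, 4, 5, 6])

# repeat: every element is repeated in place -> 1, 1, 1, 1, 2, 2, 2, 2, ...
a2 = np.repeat(a, len(a))

# tile: the whole array is repeated end to end -> 3, 4, 5, 6, 3, 4, 5, 6, ...
b2 = np.tile(b, len(b))

# Side by side, the rows enumerate every (a_i, b_j) combination.
pairs = np.column_stack((a2, b2))
print(pairs[:5])
```

`np.repeat(a, n)` and `a.repeat(n)` are equivalent; the function form is used here only for symmetry with `np.tile`.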

You can do both in one go using np.meshgrid
A,B = map(np.ravel,np.meshgrid(a,b,indexing='ij'))
A
# array([1, 1, 1, 1, 2, 2, 2, 2, 3, 3, 3, 3, 4, 4, 4, 4])
B
# array([3, 4, 5, 6, 3, 4, 5, 6, 3, 4, 5, 6, 3, 4, 5, 6])


Duplicate every nth row and column of a numpy array

I have a 2D NumPy array and want to duplicate every n-th (e.g. every 3rd) row and column.
Basically, if I had an np-array
a = np.array([
[1, 2, 3, 1, 2, 3],
[2, 3, 4, 2, 3, 4],
[3, 4, 5, 3, 4, 5],
[4, 5, 6, 4, 5, 6]
])
I would want to produce:
b = np.array([
[1, 2, 3, 3, 1, 2, 3, 3],
[2, 3, 4, 4, 2, 3, 4, 4],
[3, 4, 5, 5, 3, 4, 5, 5],
[3, 4, 5, 5, 3, 4, 5, 5],
[4, 5, 6, 6, 4, 5, 6, 6]
])
How could I do that?
rows
You can identify the Nth row using arithmetic, then duplicate it with np.repeat:
N = 3
out = np.repeat(a, (np.arange(a.shape[0])%N == (N-1)) + 1, axis=0)
Output:
array([[1, 2, 3, 1, 2, 3],
[2, 3, 4, 2, 3, 4],
[3, 4, 5, 3, 4, 5],
[3, 4, 5, 3, 4, 5],
[4, 5, 6, 4, 5, 6]])
intermediate:
(np.arange(a.shape[0])%N == (N-1)) + 1
# array([1, 1, 2, 1])
rows and cols
The same mechanism works on both dimensions:
N = 3
rows = (np.arange(a.shape[0])%N == (N-1)) + 1
# array([1, 1, 2, 1])
cols = (np.arange(a.shape[1])%N == (N-1)) + 1
# array([1, 1, 2, 1, 1, 2])
out = np.repeat(np.repeat(a, rows, axis=0), cols, axis=1)
output:
array([[1, 2, 3, 3, 1, 2, 3, 3],
[2, 3, 4, 4, 2, 3, 4, 4],
[3, 4, 5, 5, 3, 4, 5, 5],
[3, 4, 5, 5, 3, 4, 5, 5],
[4, 5, 6, 6, 4, 5, 6, 6]])
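The two steps can be wrapped in a small helper. This is only a sketch: the function name `duplicate_every_nth` is made up for illustration, and it assumes a 2-D input array.

```python
import numpy as np

def duplicate_every_nth(arr, n):
    """Return a copy of a 2-D array in which every n-th row and column appears twice."""
    # Repeat counts: 2 for indices n-1, 2n-1, ... and 1 for everything else.
    rows = (np.arange(arr.shape[0]) % n == n - 1) + 1
    cols = (np.arange(arr.shape[1]) % n == n - 1) + 1
    return np.repeat(np.repeat(arr, rows, axis=0), cols, axis=1)

a = np.array([
    [1, 2, 3, 1, 2, 3],
    [2, 3, 4, 2, 3, 4],
    [3, 4, 5, 3, 4, 5],
    [4, 5, 6, 4, 5, 6],
])
out = duplicate_every_nth(a, 3)
print(out.shape)  # (5, 8)
```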

Make function working for a single tuple, work for a list of tuples

I have a function
def series_score(sailor, races_to_discard):
    places = sailor[1]
    for i in range(races_to_discard):
        places.remove(max(places))
    print(places)
    sum_of_places = sum(places)
    print(sum_of_places)
that modifies a tuple like this
sailor = ("bob", [2, 4, 1, 1, 2, 5])
into this by removing the highest number in the list
sailor = ("bob", [2, 4, 1, 1, 2])
how could I adapt the function to work on list of tuples like this
list_of_sailors = [('Clare', [3, 1, 1, 2, 1, 1]), ('Bob', [2, 2, 3, 1, 2, 3]), ('Alice', [1, 3, 2, 3, 3, 2]), ('Eva', [4, 5, 4, 4, 5, 5]), ('Dennis', [5, 4, 5, 5, 4, 4])]
You could do something like this, calling the function once per tuple (here discarding one race each):
for sailor in list_of_sailors:
    series_score(sailor, 1)
Just include another loop. Here, I'm taking off the highest two numbers in every array:
def series_score(sailors, races_to_discard):
    for sailor in sailors:
        places = sailor[1]
        # remove the worst (highest) place, races_to_discard times
        for i in range(races_to_discard):
            places.remove(max(places))
        print(places)
        sum_of_places = sum(places)
        print(sum_of_places)
series_score([('Clare', [3, 1, 1, 2, 1, 1]), ('Bob', [2, 2, 3, 1, 2, 3]), ('Alice', [1, 3, 2, 3, 3, 2]), ('Eva', [4, 5, 4, 4, 5, 5]), ('Dennis', [5, 4, 5, 5, 4, 4])], 2)
>>>[1, 1, 1, 1]
4
[2, 2, 1, 2]
7
[1, 2, 3, 2]
8
[4, 4, 4, 5]
17
[4, 5, 4, 4]
17
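Note that `places.remove(...)` changes the lists inside the original tuples. If that is not wanted, a variant along these lines returns the scores instead of printing them and leaves the input untouched. This is a sketch, not the answerer's code: the name `series_scores` and the choice of a dict as the return type are assumptions.

```python
def series_scores(sailors, races_to_discard):
    """Map each sailor's name to the sum of their places after dropping the worst ones."""
    results = {}
    for name, places in sailors:
        # sorted() makes a new list, so the caller's data is not modified
        keep = max(len(places) - races_to_discard, 0)
        results[name] = sum(sorted(places)[:keep])
    return results

list_of_sailors = [
    ('Clare', [3, 1, 1, 2, 1, 1]),
    ('Bob', [2, 2, 3, 1, 2, 3]),
    ('Alice', [1, 3, 2, 3, 3, 2]),
    ('Eva', [4, 5, 4, 4, 5, 5]),
    ('Dennis', [5, 4, 5, 5, 4, 4]),
]
scores = series_scores(list_of_sailors, 2)
print(scores)
# {'Clare': 4, 'Bob': 7, 'Alice': 8, 'Eva': 17, 'Dennis': 17}
```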

In Python, how should I code a weighted random choice?

I want to know how to do a weighted random selection in Python, with these weights:
1:10%, 2:10%, 3:10%, 4:50%, 5:20%
Then I want to choose random numbers without duplication. How should I code it? Without weights, we would normally write:
from random import *
sample(range(1,6),1)
You should have a look at random.choices (https://docs.python.org/3/library/random.html#random.choices), which lets you define a weighting, if you are using Python 3.6 or newer
Example:
import random
choices = [1,2,3,4,5]
random.choices(choices, weights=[10,10,10,50,20], k=20)
Output:
[3, 5, 2, 4, 4, 4, 5, 3, 5, 4, 5, 4, 5, 4, 2, 4, 5, 2, 4, 4]
Try this:
from numpy.random import choice
list_of_candidates = [1,2,5,4,12]
number_of_items_to_pick = 120
probability_distribution = [0.1, 0, 0.3, 0.6, 0]
choice(list_of_candidates, number_of_items_to_pick, p=probability_distribution)
If you really wanted a sample-version you can prepare the range accordingly:
import random
nums = [1,2,3,4,5]
w = [10,10,10,50,20] # total of 100%
d = [x for y in ( [n]*i for n,i in zip(nums,w)) for x in y]
a_sample = random.sample(d,k=5)
print(a_sample)
print(d)
Output:
# 5 samples
[4, 2, 3, 1, 4]
# the whole sample input:
[1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3,
4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4,
4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5,
5, 5, 5, 5, 5, 5, 5, 5, 5, 5]
If you just need 1 number you can use random.choices. It draws with replacement, so only a single draw (k=1) is guaranteed to be free of duplicates.
import random
from collections import Counter
# draw and count 10k to show distribution works
print(Counter( random.choices([1,2,3,4,5], weights=[10,10,10,50,20], k=10000)).most_common())
Output:
[(4, 5019), (5, 2073), (3, 1031), (1, 978), (2, 899)]
Combining a "sample" without replacement with "weighted" is (to me) weird, because removing each drawn number from the range changes the weighting for every successive draw (that is my intuition; the math behind it may say otherwise).
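For completeness, NumPy can draw a weighted sample without duplicates directly, through the `replace=False` and `p` arguments of `Generator.choice`. The sketch below uses the weights from the question; the seed and the sample size of 3 are arbitrary choices for illustration. As far as I understand, each successive draw is made from the items still available, so the effective weights do shift after every pick.

```python
import numpy as np

rng = np.random.default_rng(seed=0)  # seeded only to make the run repeatable

nums = [1, 2, 3, 4, 5]
probs = [0.10, 0.10, 0.10, 0.50, 0.20]  # must sum to 1

# Three distinct values; 4 is the most likely to be among them.
picked = rng.choice(nums, size=3, replace=False, p=probs)
print(picked)
```

With `replace=False`, `size` cannot exceed the number of entries that have a non-zero probability.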

Find highest values in n unequal lists

I have a list containing n lists of different lengths.
data = [
[1, 2, 3, 4, 5, 6, 7, 8],
[2, 6, 3, 5, 9, 1, 1, 1, 2, 4, 5],
[8, 1, 4, 1, 2, 3, 4, 2, 5],
[3, 9, 1, 2, 2, 1, 1, 5, 9, 3]
]
How can I efficiently compare them and generate a list which always contains the highest value at the current position?
I don't know how I can do this since the boundaries for each list are different.
The output for the above example should be a list with these values:
[8,9,4,5,9,6,7,8,9,4,5]
The most idiomatic approach would be transposing the 2D list and calling max on each row in the transposed list. But in your case, you're dealing with ragged lists, so zip cannot be directly applied here (it zips up to the shortest list only).
Instead, use itertools.zip_longest (izip_longest for Python 2), and then apply max using map -
from itertools import zip_longest
r = list(map(max, zip_longest(*data, fillvalue=-float('inf'))))
Or, using @Peter DeGlopper's suggestion, with a list comprehension -
r = [max(x) for x in zip_longest(*data, fillvalue=-float('inf'))]
print(r)
[8, 9, 4, 5, 9, 6, 7, 8, 9, 4, 5]
Here, I use a fillvalue parameter to fill missing values with negative infinity. The intermediate result looks something like this -
list(zip_longest(*data, fillvalue=-float('inf')))
[(1, 2, 8, 3),
(2, 6, 1, 9),
(3, 3, 4, 1),
(4, 5, 1, 2),
(5, 9, 2, 2),
(6, 1, 3, 1),
(7, 1, 4, 1),
(8, 1, 2, 5),
(-inf, 2, 5, 9),
(-inf, 4, -inf, 3),
(-inf, 5, -inf, -inf)]
Now, applying max becomes straightforward - just do it over each row and you're done.
zip_longest is your friend in this case.
from itertools import zip_longest
data = [
[1, 2, 3, 4, 5, 6, 7, 8],
[2, 6, 3, 5, 9, 1, 1, 1, 2, 4, 5],
[8, 1, 4, 1, 2, 3, 4, 2, 5],
[3, 9, 1, 2, 2, 1, 1, 5, 9, 3],
]
output = list()
# fillvalue=0 is safe here because every value in data is positive
for x in zip_longest(*data, fillvalue=0):
    output.append(max(x))
print(output)
>>> [8, 9, 4, 5, 9, 6, 7, 8, 9, 4, 5]
Adding a pandas solution
import pandas as pd
pd.DataFrame(data).max().astype(int).tolist()
Out[100]: [8, 9, 4, 5, 9, 6, 7, 8, 9, 4, 5]
You don't need any external module; plain loops and a dictionary are enough:
data = [
[1, 2, 3, 4, 5, 6, 7, 8],
[2, 6, 3, 5, 9, 1, 1, 1, 2, 4, 5],
[8, 1, 4, 1, 2, 3, 4, 2, 5],
[3, 9, 1, 2, 2, 1, 1, 5, 9, 3]
]
new_data={}
for j in data:
    for k,m in enumerate(j):
        if k not in new_data:
            new_data[k] = [m]
        else:
            new_data[k].append(m)
final_data=[0]*len(new_data.keys())
for key,value in new_data.items():
    final_data[key]=max(value)
print(final_data)
output:
[8, 9, 4, 5, 9, 6, 7, 8, 9, 4, 5]
You can use itertools.izip_longest (itertools.zip_longest in Python3):
Python2:
import itertools
data = [
[1, 2, 3, 4, 5, 6, 7, 8],
[2, 6, 3, 5, 9, 1, 1, 1, 2, 4, 5],
[8, 1, 4, 1, 2, 3, 4, 2, 5],
[3, 9, 1, 2, 2, 1, 1, 5, 9, 3],
]
new_data = [max(filter(lambda x:x, i)) for i in itertools.izip_longest(*data)]
Output:
[8, 9, 4, 5, 9, 6, 7, 8, 9, 4, 5]
Python3:
import itertools
data = [
[1, 2, 3, 4, 5, 6, 7, 8],
[2, 6, 3, 5, 9, 1, 1, 1, 2, 4, 5],
[8, 1, 4, 1, 2, 3, 4, 2, 5],
[3, 9, 1, 2, 2, 1, 1, 5, 9, 3],
]
new_data = [max(filter(None, i)) for i in itertools.zip_longest(*data)]
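One caveat with `filter(None, ...)` and `filter(lambda x: x, ...)`: they drop every falsy value, so a genuine 0 in the data is discarded along with the `None` padding. A sketch that removes only the padding could look like this. The function name `columnwise_max` is made up, and it assumes the input lists never contain `None` themselves.

```python
from itertools import zip_longest

def columnwise_max(lists):
    """Return the maximum at each position across lists of unequal length."""
    # zip_longest pads the shorter lists with None; skip only that padding,
    # so zeros and negative numbers still take part in the comparison.
    return [
        max(v for v in column if v is not None)
        for column in zip_longest(*lists)
    ]

print(columnwise_max([[0, -5, 3], [0, -2], [-1]]))
# [0, -2, 3]
```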

Adding an array to the end of another Python

I'm very new to Python and I have been given the task of collecting several arrays into another array, inside a loop.
So if you had
a = np.array([2,3,4,3,4,4,5,3,2,3,4])
and
b = np.array([1,1,1,1,1,2,23,2,3,3,3])
and
c = np.array([])
and wanted the result
c = [[2,3,4,3,4,4,5,3,2,3,4],
[1,1,1,1,1,2,23,2,3,3,3]]
so if I did c[0,:] I would get [2,3,4,3,4,4,5,3,2,3,4]
I tried using c = [c, np.array(a)], and then in the next iteration c = [c, np.array(b)],
but if I do c[0,:] I get the error message "list indices must be integers, not tuple".
EDIT:
When I print c it gives [array([2,3,4,3,4,4,5,3,2,3,4], dtype=uint8)]
Do you have any ideas?
In [10]: np.vstack((a,b))
Out[10]:
array([[ 2, 3, 4, 3, 4, 4, 5, 3, 2, 3, 4],
[ 1, 1, 1, 1, 1, 2, 23, 2, 3, 3, 3]])
EDIT: Here's an example of using it in a loop to gradually build a matrix:
In [14]: c = np.random.randint(0, 10, 10)
In [15]: c
Out[15]: array([9, 5, 9, 7, 3, 0, 1, 9, 2, 0])
In [16]: for _ in range(10):
   ....:     c = np.vstack((c, np.random.randint(0, 10, 10)))
....:
In [17]: c
Out[17]:
array([[9, 5, 9, 7, 3, 0, 1, 9, 2, 0],
[0, 8, 1, 9, 7, 5, 4, 2, 1, 2],
[2, 1, 4, 2, 9, 6, 7, 1, 3, 2],
[6, 0, 7, 9, 1, 9, 8, 5, 9, 8],
[8, 1, 0, 9, 6, 6, 6, 4, 8, 5],
[0, 0, 5, 0, 6, 9, 9, 4, 6, 9],
[4, 0, 9, 8, 6, 0, 2, 2, 7, 0],
[1, 3, 4, 8, 2, 2, 8, 7, 7, 7],
[0, 0, 4, 8, 3, 6, 5, 6, 5, 7],
[7, 1, 3, 8, 6, 0, 0, 3, 9, 0],
[8, 5, 7, 4, 7, 2, 4, 8, 6, 7]])
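Calling `np.vstack` inside the loop copies the growing array on every iteration. A common alternative, sketched here under the assumption that every row has the same length, is to collect the rows in a plain list and stack them once at the end. The random rows stand in for whatever the real loop produces.

```python
import numpy as np

rng = np.random.default_rng(0)

rows = []
for _ in range(5):
    # stand-in for the array computed in each iteration
    rows.append(rng.integers(0, 10, size=10))

# One stacking step at the end turns the list of 1-D arrays into a 2-D array.
c = np.vstack(rows)
print(c.shape)  # (5, 10)
print(c[0, :])  # first row, same values as rows[0]
```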
The most numpythonic way is using np.array:
>>> c = np.array((a,b))
>>>
>>> c
array([[ 2, 3, 4, 3, 4, 4, 5, 3, 2, 3, 4],
[ 1, 1, 1, 1, 1, 2, 23, 2, 3, 3, 3]])
You may try this:
>>> c = [list(a), list(b)]
>>> c
[[2, 3, 4, 3, 4, 4, 5, 3, 2, 3, 4], [1, 1, 1, 1, 1, 2, 23, 2, 3, 3, 3]]
You can concatenate arrays in numpy. For this to work, they must have the same size in all dimensions except the concatenation direction.
If you just say
>>> c = np.concatenate([a,b])
you will get
>>> c
array([ 2, 3, 4, 3, 4, 4, 5, 3, 2, 3, 4, 1, 1, 1, 1, 1, 2,
23, 2, 3, 3, 3])
So in order to achieve what you want you first have to add another dimension to your vectors a and b like so
>>> a[None,:]
array([[2, 3, 4, 3, 4, 4, 5, 3, 2, 3, 4]])
or equivalently
>>> a[np.newaxis,:]
array([[2, 3, 4, 3, 4, 4, 5, 3, 2, 3, 4]])
So you could do the following:
>>> c = np.concatenate([a[None,:],b[None,:]],axis = 0)
>>> c
array([[ 2, 3, 4, 3, 4, 4, 5, 3, 2, 3, 4],
[ 1, 1, 1, 1, 1, 2, 23, 2, 3, 3, 3]])
