Numpy - How to shift values at indexes where change happened - python

I would like to shift the values in a 1D NumPy array at the positions where a change happens. The size of the shift should be configurable.
input = np.array([0,0,0,0,1,0,0,0,0,0,1,1,1,0,0,1,0,0,0,0])
shiftSize = 2
out = np.magic(input, shiftSize)  # np.magic stands for the function I am looking for
print(out)
# expected: np.array([0,0,1,1,1,1,1,0,1,1,1,1,1,1,1,1,1,1,0,0])
For example, the first switch happens at index 4, so indexes 2 and 3 become '1'.
The next happens at index 5, so indexes 5 and 6 become '1'.
EDIT: It is also important to do this without a for loop, because that might be slow (it is needed for large data sets).
EDIT2: indexes and variable name
I tried np.diff to find where the changes happen and then np.put, but with multiple index ranges it seems impossible.
Thank you for the help in advance!

What you want is called "binary dilation" and is contained in scipy.ndimage:
import numpy as np
import scipy.ndimage
input = np.array([0,0,0,0,1,0,0,0,0,0,1,1,1,0,0,1,0,0,0,0], dtype=bool)
out = scipy.ndimage.binary_dilation(input, iterations=2).astype(int)
# array([0, 0, 1, 1, 1, 1, 1, 0, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 0, 0])
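As a variation on the same idea, here is a minimal sketch that passes an explicit window through the structure parameter instead of iterating. The helper name dilate_1d is made up for illustration; it assumes the shift size should reach equally far to both sides.

```python
import numpy as np
from scipy import ndimage

def dilate_1d(mask, shift_size):
    """Mark every element within shift_size positions of a True element."""
    # A flat window of 2 * shift_size + 1 elements reaches shift_size
    # positions to the left and to the right of each True value.
    window = np.ones(2 * shift_size + 1, dtype=bool)
    return ndimage.binary_dilation(np.asarray(mask, dtype=bool), structure=window)

inp = np.array([0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 1, 1, 1, 0, 0, 1, 0, 0, 0, 0])
out = dilate_1d(inp, 2).astype(int)
print(out)
```

With a single pass and a wider window, the shift size is visible directly in the window length rather than in the iteration count.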

Nils' answer seems good. Here is an alternative using NumPy only:
import numpy as np
def dilate(ar, amount):
    # Convolve with a kernel as big as the dilation scope
    dil = np.convolve(np.abs(ar), np.ones(2 * amount + 1), mode='same')
    # Crop in case the convolution kernel was bigger than array
    dil = dil[-len(ar):]
    # Take non-zero and convert to input type
    return (dil != 0).astype(ar.dtype)
# Test
inp = np.array([0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 1, 1, 1, 0, 0, 1, 0, 0, 0, 0])
print(inp)
print(dilate(inp, 2))
Output:
[0 0 0 0 1 0 0 0 0 0 1 1 1 0 0 1 0 0 0 0]
[0 0 1 1 1 1 1 0 1 1 1 1 1 1 1 1 1 1 0 0]

Another numpy solution :
def dilatation(seed,shift):
    out=seed.copy()
    for sh in range(1,shift+1):
        out[sh:] |= seed[:-sh]
    for sh in range(-shift,0):
        out[:sh] |= seed[-sh:]
    return out
Example (shift = 2) :
in : [0 0 0 0 0 0 0 0 1 0 0 0 0 0 1 0 0 0 1 0 0 0 0 0 0 0 0 0 1 0]
out: [0 0 0 0 0 0 1 1 1 1 1 0 1 1 1 1 1 1 1 1 1 0 0 0 0 0 1 1 1 1]
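If the short Python loop over shifts is a concern, the same result can probably be obtained without it. The sketch below is one possible way, assuming NumPy 1.20 or later (for sliding_window_view); the function name dilate_windowed is only illustrative.

```python
import numpy as np
from numpy.lib.stride_tricks import sliding_window_view

def dilate_windowed(seed, shift):
    """Set each element to the maximum of its neighbourhood of radius shift."""
    # Zero-pad both ends so every element has a full window around it.
    padded = np.pad(seed, shift)
    # Each row of the view is the window centred on one original element.
    windows = sliding_window_view(padded, 2 * shift + 1)
    return windows.max(axis=1)

seed = np.array([0, 0, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 1,
                 0, 0, 0, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 0])
result = dilate_windowed(seed, 2)
print(result)
```

The window view itself does not copy the data, so the extra memory is limited to the padded copy and the output.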

Related

In a boolean matrix, what is the best way to make every value adjacent to True/1 to True?

I have a 2D NumPy boolean array of True/False values. I want to make every cell adjacent to a True value True as well. What's the best/fastest way of doing that in Python?
For example:
#Initial Matrix
1 0 0 0 0 1 0
0 0 0 1 0 0 0
0 0 0 0 0 0 0
#After operation
1 1 1 1 1 1 1
1 1 1 1 1 1 1
0 0 1 1 1 0 0
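It looks like you want to do dilation. OpenCV might be your best tool:
import cv2
import numpy as np
# cv2.dilate does not accept boolean arrays, so convert to uint8 first
dilatation_dst = cv2.dilate(src.astype(np.uint8), np.ones((3, 3), np.uint8))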
https://docs.opencv.org/3.4/db/df6/tutorial_erosion_dilatation.html
You can use scipy.signal.convolve2d.
import numpy as np
from scipy.signal import convolve2d
result = convolve2d(src, np.ones((3,3)), mode='same').astype(bool).astype(int)
print(result)
Or we can use scipy.ndimage.
from scipy import ndimage
result = ndimage.binary_dilation(src, np.ones((3,3))).astype(int)
print(result)
Output:
[[1 1 1 1 1 1 1]
 [1 1 1 1 1 1 1]
 [0 0 1 1 1 0 0]]
Given
arr = np.array([[1, 0, 0, 0, 0, 1, 0],
                [0, 0, 0, 1, 0, 0, 0],
                [0, 0, 0, 0, 0, 0, 0]])
You can do
from scipy.ndimage import shift
arr2 = arr | shift(arr, (0, 1), cval=0) | shift(arr, (0, -1), cval=0)
arr3 = arr2 | shift(arr2, (1, 0), cval=0) | shift(arr2, (-1, 0), cval=0)
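The shift-and-OR idea can also be written with NumPy alone. This is a rough sketch, assuming the 8-neighbourhood (diagonals included) shown in the question; the function name dilate_8_neighbours is made up for illustration.

```python
import numpy as np

def dilate_8_neighbours(mask):
    """Return a boolean array that is True wherever mask or any neighbour is True."""
    mask = np.asarray(mask, dtype=bool)
    rows, cols = mask.shape
    # Surround the array with a border of False so shifted views stay in bounds.
    padded = np.pad(mask, 1)
    out = np.zeros_like(mask)
    # OR together the nine views offset by -1, 0 and +1 in each direction.
    for dr in range(3):
        for dc in range(3):
            out |= padded[dr:dr + rows, dc:dc + cols]
    return out

arr = np.array([[1, 0, 0, 0, 0, 1, 0],
                [0, 0, 0, 1, 0, 0, 0],
                [0, 0, 0, 0, 0, 0, 0]])
dilated = dilate_8_neighbours(arr).astype(int)
print(dilated)
```

The loop always runs nine times regardless of the array size, so the work per step is a vectorized OR over the whole array.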

Sum n values of n-lists with the same index in Python

For example, I have 5 lists of 10 elements each, generated with random values simulating a coin toss.
I get my 5 lists with 10 elements in the following way:
import random as rnd
import numpy as np
result = [0,1] #0 is tail #1 is head
probability = [1/2,1/2]
N = 10
list = []
def list_generator(number): #this number would be 5 in this case
    for i in range(number):
        n_round = np.array(rnd.choices(result, probability, k=N))
        print(n_round)
list_generator(5)
And for example I would get this
[1 1 0 0 0 1 0 1 1 0]
[0 1 0 0 0 1 1 1 0 1]
[1 1 0 0 1 1 1 0 1 1]
[0 0 0 1 0 0 0 1 0 0]
[0 0 1 1 0 0 0 0 1 1]
How can I sum only the numbers in the same column? I would like a list whose first element is 1+0+1+0+0 (the first column), whose second element is the sum of the second coin toss of each round, i.e. 1+1+1+0+0 (the second column), and so on for all ten coin tosses.
(I need it in a list because I will use this to plot a graph)
I have thought about making a matrix with each array and summing only the nth column and append that value in the list but I do not know how to do that, I do not have much knowledge about using arrays.
Have your function return a 2d numpy array and then sum along the required axis. Separately, you don't need to pass probability to random.choices as equal probabilities are the default.
import random
import numpy as np
def list_generator(number):
    return np.array([np.array(random.choices([0,1], k=10)) for i in range(number)])
a = list_generator(5)
>>> a
array([[0, 1, 1, 1, 0, 1, 1, 0, 0, 0],
       [1, 0, 1, 0, 1, 1, 1, 1, 1, 0],
       [1, 1, 0, 1, 1, 1, 0, 0, 1, 1],
       [1, 1, 0, 0, 1, 1, 1, 1, 0, 0],
       [0, 1, 1, 0, 0, 1, 1, 1, 0, 0]])
>>> a.sum(axis=0)
array([3, 4, 3, 2, 3, 5, 4, 3, 2, 1])
You can use numpy.random.randint to generate your randomized data. Then use sum to get the sum of the columns:
import numpy as np
N = 10
data = np.random.randint(2, size=(N, N))
print(data)
print(data.sum(axis=0))
[[1 0 1 1 1 1 0 0 1 1]
 [0 0 1 1 0 0 1 1 1 0]
 [1 1 0 1 1 1 0 0 1 1]
 [1 1 0 0 0 0 1 1 1 1]
 [1 0 0 1 1 1 0 1 1 1]
 [1 0 1 1 0 1 0 1 1 1]
 [0 0 0 1 0 1 0 1 1 0]
 [0 0 0 1 0 1 0 1 0 1]
 [1 0 0 0 1 0 1 0 1 1]
 [1 0 1 1 0 1 0 0 0 1]]
[7 2 4 8 4 7 3 6 8 8]
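Since the question asks for a plain list to plot, here is a small sketch that needs no NumPy at all. It uses the five rounds printed in the question as fixed data; the variable names rounds and column_sums are only illustrative.

```python
# The five rounds printed in the question, one inner list per round.
rounds = [
    [1, 1, 0, 0, 0, 1, 0, 1, 1, 0],
    [0, 1, 0, 0, 0, 1, 1, 1, 0, 1],
    [1, 1, 0, 0, 1, 1, 1, 0, 1, 1],
    [0, 0, 0, 1, 0, 0, 0, 1, 0, 0],
    [0, 0, 1, 1, 0, 0, 0, 0, 1, 1],
]

# zip(*rounds) yields one tuple per column: the i-th toss of every round.
column_sums = [sum(column) for column in zip(*rounds)]
print(column_sums)  # [2, 3, 1, 2, 1, 3, 2, 3, 3, 3]
```

If the data is already a NumPy array, calling .tolist() on the result of sum(axis=0) gives the same kind of plain list.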

Python Index Error - Out of Bounds for axis 0

I have a dataset like the following in a txt file. (The first column is the userid, the second column is the locationid.)
Normally my dataset is big, but I created a dummy dataset to better explain my problem.
I'm trying to create a matrix as in the code below. The rows will be user ids and the columns location ids. Since this dataset shows the location ids visited by the users, I assign the value 1 to the locations they visited in the matrix.
I am getting an IndexError: index 801 is out of bounds for axis 0 with size 50
I tried different values of user_num and poi_num, but it still doesn't work.
datausers.txt
801 32332
801 14470
801 33847
501 10259
501 34041
501 10201
301 15810
301 34827
301 19264
401 34834
401 35407
401 36115
Code
import numpy as np
from collections import defaultdict
from itertools import islice
import pandas as pd
train_file = "datausers.txt"
user_num = 20
poi_num = 20
training_matrix = np.zeros((user_num, poi_num))
train_data = list(islice(open(train_file, 'r'), 10))
for eachline in train_data:
    uid, lid= eachline.strip().split()
    uid, lid = int(uid), int(lid)
    training_matrix[uid, lid] = 1.0
Error
Expected Output
4x12 Matrix because we have 4 unique users and 12 unique location
[1 0 1 0 1 0 0 0 0 0 0 0
0 1 0 1 0 1 0 0 0 0 0 0
...
]
For example for first row
1 0 1 0 1 0 0 0 0 0 0 0
User 801 visited 3 locations, and those are marked with 1. (The positions of the 1s may vary; I only gave this as an example.)
As you have tagged the question with pandas, here is one way of approaching the problem with the str.get_dummies method of a pandas Series:
import pandas as pd
df = pd.read_csv('datausers.txt', sep=r'\s+', names=['userid', 'locationid'], index_col=0)
# sum(level=0) was removed in pandas 2.0; group on the index level instead
out = df['locationid'].astype(str).str.get_dummies().groupby(level=0, sort=False).sum()
Result
For the sample data
>>> out
        10201  10259  14470  15810  19264  32332  33847  34041  34827  34834  35407  36115
userid
801         0      0      1      0      0      1      1      0      0      0      0      0
501         1      1      0      0      0      0      0      1      0      0      0      0
301         0      0      0      1      1      0      0      0      1      0      0      0
401         0      0      0      0      0      0      0      0      0      1      1      1
If you need numpy array instead:
>>> out.to_numpy()
array([[0, 0, 1, 0, 0, 1, 1, 0, 0, 0, 0, 0],
       [1, 1, 0, 0, 0, 0, 0, 1, 0, 0, 0, 0],
       [0, 0, 0, 1, 1, 0, 0, 0, 1, 0, 0, 0],
       [0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 1, 1]])
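The IndexError itself comes from using the raw ids (801, 32332, ...) as row and column positions. A NumPy-only sketch of one way around it is to map each id to a compact index first. The sample pairs are typed in here to keep the example self-contained (the file could presumably be read with np.loadtxt instead), and the names pairs, users, pois, user_idx and poi_idx are illustrative.

```python
import numpy as np

# (userid, locationid) pairs from the sample file in the question.
pairs = np.array([
    [801, 32332], [801, 14470], [801, 33847],
    [501, 10259], [501, 34041], [501, 10201],
    [301, 15810], [301, 34827], [301, 19264],
    [401, 34834], [401, 35407], [401, 36115],
])

# np.unique returns the sorted distinct ids and, with return_inverse=True,
# the position of every original id within that sorted list.
users, user_idx = np.unique(pairs[:, 0], return_inverse=True)
pois, poi_idx = np.unique(pairs[:, 1], return_inverse=True)

# One row per distinct user, one column per distinct location.
training_matrix = np.zeros((len(users), len(pois)))
training_matrix[user_idx, poi_idx] = 1.0

print(users)
print(training_matrix)
```

Rows follow the sorted user ids in users and columns follow the sorted location ids in pois, so those two arrays serve as the lookup back to the original ids.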

Create several new dataframes or dictionaries from one dataframe

I have a dataframe like this:
evt   pcle  bin_0  bin_1  bin_2  ...  bin_49
1     pi    1      0      0           0
1     pi    1      0      0           0
1     k     0      0      0           1
1     pi    0      0      1           0
2     pi    0      0      1           0
2     k     0      1      0           0
3     J     0      1      0           0
3     pi    0      0      0           1
3     pi    1      0      0           0
3     k     0      1      0           0
...
5000  J     0      0      1           0
5000  pi    0      1      0           0
5000  k     0      0      0           1
With this information, I want to create several other dataframes df_{evt} (or maybe dictionaries would be better?):
df_1 :
pcle  cant  bin_0  bin_1  bin_2  ...  bin_49
pi    3     2      0      1           0
k     1     0      0      0           1
df_2 :
pcle  cant  bin_0  bin_1  bin_2  ...  bin_49
pi    1     0      0      1           0
k     1     0      1      0           0
In total there would be 5000 dataframes (1 for each evt) where in each of them:
*the column "cant" has the occurrences of "pcle" in the particular "evt".
*bin_0 ... bin_49 have the sum of the values for this particular "pcle" in the particular "evt".
Which is the best way to achieve this goal?
Here's a possible solution:
import pandas as pd
import numpy as np
columns = ["evt", "pcle", "bin_0", "bin_1", "bin_2", "bin_3"]
data = [[1, "pi", 1, 0, 0, 0],
        [1, "pi", 0, 0, 0, 0],
        [1, "k", 0, 0, 0, 1],
        [1, "pi", 0, 0, 1, 0],
        [2, "pi", 0, 0, 1, 0],
        [2, "k", 0, 1, 0, 0],
        [3, "J", 0, 1, 0, 0],
        [3, "pi", 0, 0, 0, 1],
        [3, "pi", 1, 0, 0, 0],
        [3, "k", 0, 1, 0, 0]]
df = pd.DataFrame(data=data, columns=columns)
# group your data by the columns you want
grouped = df.groupby(["evt", "pcle"])
# compute the aggregates for the bin_X
df_t = grouped.sum()
# move pcle from index to column
df_t.reset_index(level=["pcle"], inplace=True)
# count occurrences of pcle
df_t["cant"] = grouped.size().values
# filter evt with .loc
df_t.loc[1]
If you want to make it into a dictionary then you can run:
d = {i:j.reset_index(drop=True) for i, j in df_t.groupby(df_t.index)}
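For reference, here is a compact sketch that goes from the raw rows to a dictionary keyed by evt, with "cant" placed right after "pcle" as in the layout the question asks for. It uses the first two events from the question as sample data; the names bin_cols, summary and per_evt are made up for illustration.

```python
import pandas as pd

columns = ["evt", "pcle", "bin_0", "bin_1", "bin_2", "bin_3"]
data = [[1, "pi", 1, 0, 0, 0],
        [1, "pi", 1, 0, 0, 0],
        [1, "k", 0, 0, 0, 1],
        [1, "pi", 0, 0, 1, 0],
        [2, "pi", 0, 0, 1, 0],
        [2, "k", 0, 1, 0, 0]]
df = pd.DataFrame(data, columns=columns)

bin_cols = [c for c in df.columns if c.startswith("bin_")]
grouped = df.groupby(["evt", "pcle"])

# Sum the bins per (evt, pcle) and put the row count in front as "cant".
summary = grouped[bin_cols].sum()
summary.insert(0, "cant", grouped.size())

# One small dataframe per evt, with pcle turned back into a regular column.
per_evt = {evt: part.droplevel("evt").reset_index()
           for evt, part in summary.groupby(level="evt")}

print(per_evt[1])
```

A dictionary keyed by evt is usually easier to handle than 5000 separately named variables, since per_evt[evt] can be looked up or looped over directly.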

Drawing a checkerboard out of 1s and 0s with a nested for loop

I'm using plain Python to make a checkerboard grid out of alternating 1s and 0s. I know that I can use a nested for loop with the modulus operator, but I don't know exactly what to do with the modulus inside the for loop.
def print_board(board):
    for i in range(len(board)):
        print(" ".join([str(x) for x in board[i]]))
my_grid = []
for i in range(8):
    my_grid.append([0] * 8)
    for j in range(8):
        pass  # This is where I'm stuck.
print_board(my_grid)
Here's my solution, using nested for loops. Note that whether i+j is even or odd is a good way to determine where it should be 1 and where it should be 0, as it always alternates between adjacent 'cells'.
def checkerboard(n):
    board = []
    for i in range(n):
        board.append([])
        for j in range(n):
            board[i].append((i+j) % 2)
    return board
for row in checkerboard(8):
    print(row)
Prints
[0, 1, 0, 1, 0, 1, 0, 1]
[1, 0, 1, 0, 1, 0, 1, 0]
[0, 1, 0, 1, 0, 1, 0, 1]
[1, 0, 1, 0, 1, 0, 1, 0]
[0, 1, 0, 1, 0, 1, 0, 1]
[1, 0, 1, 0, 1, 0, 1, 0]
[0, 1, 0, 1, 0, 1, 0, 1]
[1, 0, 1, 0, 1, 0, 1, 0]
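The same parity test can also be dropped straight into the loop from the question, at the spot marked as stuck. This is a minimal sketch that keeps the question's my_grid and fills it in place:

```python
my_grid = []
for i in range(8):
    my_grid.append([0] * 8)
    for j in range(8):
        # (i + j) is even on one colour of square and odd on the other,
        # so its remainder modulo 2 alternates along rows and columns.
        my_grid[i][j] = (i + j) % 2

for row in my_grid:
    print(" ".join(str(x) for x in row))
```

Using (i + j + 1) % 2 instead would start the first row with a 1.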
Perhaps we should first aim to solve a different problem: how to generate a list with a checkerboard pattern.
Such a list interleaves [0,1,0,...] rows with [1,0,1,...] rows.
Let us first construct the first row with length n. We can do this like:
[i%2 for i in range(n)]
Now the next row should be:
[(i+1)%2 for i in range(n)]
the next one can be:
[(i+2)%2 for i in range(n)]
Do you see a pattern emerge? We can construct such a pattern like:
[[(i+j)%2 for i in range(n)] for j in range(m)]
Now the only thing that is left is producing it as a string. We can do this by converting the data in the list to strings and joining them together (optionally using generators instead of list comprehensions). So:
'\n'.join(''.join(str((i+j)%2) for i in range(n)) for j in range(m))
So we can construct an m×n grid like:
def print_board(m,n):
    print('\n'.join(''.join(str((i+j)%2) for i in range(n)) for j in range(m)))
A 10x15 board then is:
>>> print_board(10,15)
010101010101010
101010101010101
010101010101010
101010101010101
010101010101010
101010101010101
010101010101010
101010101010101
010101010101010
101010101010101
N.B.: we can also use &1 instead of %2. For non-negative integers the two give the same result, and any speed difference is likely to be small:
def print_board(m,n):
    print('\n'.join(''.join(str((i+j)&1) for i in range(n)) for j in range(m)))
A simple approach
# Function to draw checkerboard
def drawBoard(length):
    for row in range(0, length):
        for col in range(0, length):
            # Even rows start with a 0 (offset of 1)
            # Odd rows start with a 1 (no offset)
            offset = 0
            if row % 2 == 0:
                offset = 1
            # alternate each column in a row between 1 and 0
            if (col + offset) % 2 == 0:
                print('1', end=' ')
            else:
                print('0', end=' ')
        # print new line at the end of a row
        print("")
drawBoard(8)
For even widths, you could avoid loops altogether and just multiply some strings:
def print_board(width):
    print(('0 1 ' * (width // 2) + '\n' + '1 0 ' * (width // 2) + '\n') * (width // 2))
print_board(10)
Giving:
0 1 0 1 0 1 0 1 0 1
1 0 1 0 1 0 1 0 1 0
0 1 0 1 0 1 0 1 0 1
1 0 1 0 1 0 1 0 1 0
0 1 0 1 0 1 0 1 0 1
1 0 1 0 1 0 1 0 1 0
0 1 0 1 0 1 0 1 0 1
1 0 1 0 1 0 1 0 1 0
0 1 0 1 0 1 0 1 0 1
1 0 1 0 1 0 1 0 1 0
This works as follows for a 10 x 10 grid:
Take the string
0 1
Multiply the string by 5 giving
0 1 0 1 0 1 0 1 0 1
Do the same with 1 0 giving:
1 0 1 0 1 0 1 0 1 0
Add a newline to the end of each and join them together:
0 1 0 1 0 1 0 1 0 1 \n1 0 1 0 1 0 1 0 1 0 \n
Now multiply this whole string by 5 to get the grid.
You can use list comprehension and a modulo:
new_board = [[0 if b%2 == 0 else 1 for b in range(8)] if i%2 == 0 else [1 if b%2 == 0 else 0 for b in range(8)] for i in range(8)]
for row in new_board:
print(row)
Output:
[0, 1, 0, 1, 0, 1, 0, 1]
[1, 0, 1, 0, 1, 0, 1, 0]
[0, 1, 0, 1, 0, 1, 0, 1]
[1, 0, 1, 0, 1, 0, 1, 0]
[0, 1, 0, 1, 0, 1, 0, 1]
[1, 0, 1, 0, 1, 0, 1, 0]
[0, 1, 0, 1, 0, 1, 0, 1]
[1, 0, 1, 0, 1, 0, 1, 0]
For a more custom finish:
for row in new_board:
print(' '.join(map(str, row)))
Output:
0 1 0 1 0 1 0 1
1 0 1 0 1 0 1 0
0 1 0 1 0 1 0 1
1 0 1 0 1 0 1 0
0 1 0 1 0 1 0 1
1 0 1 0 1 0 1 0
0 1 0 1 0 1 0 1
1 0 1 0 1 0 1 0
