Most efficient way to create a binary matrix of users/purchases? - python

I have data with N users and K possible items, stored as a dictionary of the form data[user] = [item1, item2, ...]. I want to turn this dictionary into an N x K matrix where the (n,k) entry is 1 if user n has purchased item k and 0 otherwise. Below is sample data.
import random
random.seed(10)
# Users
N = list(range(10))
# Items represented by an integer
K = list(range(1000))
# I have a dict of {user: [item1, item2...itemK]}
# where k differs by user
data = {x:random.sample(K, random.randint(1,50)) for x in N}
# Now I want to create an N x K matrix, where rows are users, columns are items, and the (n,k) entry
# is 1 if user n has item k in their list and 0 otherwise.

If I understand your question correctly, you can convert each user's list of items to a set and then test every item for membership.
Note: I lowered the number of items to 50 (so the output fits on screen):
import random
random.seed(10)
# Users
N = list(range(10))
# Items represented by an integer
K = list(range(50))
# I have a dict of {user: [item1, item2...itemK]}
# where k differs by user
data = {x: random.sample(K, random.randint(1, 50)) for x in N}
# create matrix:
matrix = []
for v in data.values():
    v = set(v)
    matrix.append([int(i in v) for i in K])
# print matrix:
for row in matrix:
    print(*row)
Prints (each row is a different user):
1 0 1 0 1 1 0 1 1 1 1 1 1 1 1 1 0 1 0 1 1 1 0 1 1 0 1 1 1 1 1 1 0 1 1 1 1 0 1 1 0 0 1 1 1 0 0 1 1 1
1 1 1 0 1 0 0 1 1 0 1 0 1 1 0 1 0 0 0 1 1 0 0 1 0 0 1 1 1 1 1 0 1 0 1 1 0 1 1 1 0 0 0 0 0 0 1 0 0 0
0 0 0 1 1 1 1 1 1 1 1 0 1 1 1 0 1 0 0 1 0 1 1 1 1 0 1 1 1 1 0 0 0 0 1 0 1 1 1 1 0 0 1 1 1 0 1 0 1 1
1 0 0 0 0 0 0 0 0 0 0 1 0 0 1 0 0 0 0 0 1 0 1 0 0 0 0 1 0 0 1 0 0 0 1 0 0 0 0 0 0 1 1 0 0 0 0 1 0 0
1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 0 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 0 1 0 1 1 0 1 1
0 0 0 0 0 0 0 0 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1 0 1 0 1 1 0 0 0 0 0 1 0 0 0 0 0 0 0 0 0 1 0
0 1 1 0 0 0 0 1 0 1 0 0 1 1 0 1 1 1 1 1 1 1 1 0 0 0 0 1 0 1 1 0 1 1 0 0 1 0 0 1 1 1 1 0 1 1 1 0 1 1
1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1
0 0 1 0 0 0 0 0 0 0 0 1 1 0 0 1 1 1 1 0 0 1 1 1 0 1 0 0 1 1 1 1 1 1 1 1 0 1 1 0 1 0 1 1 1 0 1 0 1 1
0 1 0 0 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1 1 0 0 0 0 0 1 0 0 0 0 0 0 0 0 0 1 1 1 0 1 0 0 0 0 0 1 0 0 1

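Any approach has to visit, at a minimum, every user in the dictionary and every item that user has.
# Assuming users are also represented by integers 0..N-1,
# and that N and K are the lists of users and items from the question
mat = [[0] * len(K) for _ in N]  # one independent row of zeros per user
for ui in data:
    for i in data[ui]:
        mat[ui][i] = 1
If a user's list can contain repeated items, you can deduplicate it first (the result is the same either way, since each write just sets the entry to 1):
mat = [[0] * len(K) for _ in N]
for ui in data:
    for i in set(data[ui]):
        mat[ui][i] = 1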
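One thing worth checking when preallocating such a matrix: multiplying a nested list, as in [[0] * k] * n, repeats references to a single row instead of creating n separate rows. A minimal sketch of the difference, using small hypothetical sizes and a hypothetical data dictionary (not the data from the question):

```python
# Hypothetical toy sizes and purchases, chosen only for illustration.
n_users, n_items = 3, 4
data = {0: [1, 3], 1: [0], 2: [2, 2, 3]}  # user 2 has a repeated item

# Pitfall: the outer multiplication copies references to ONE row list.
aliased = [[0] * n_items] * n_users
aliased[0][1] = 1
print(aliased)  # every row changed, because all rows are the same object

# Independent rows: build a fresh list for each user.
matrix = [[0] * n_items for _ in range(n_users)]
for user, items in data.items():
    for item in items:
        matrix[user][item] = 1  # a repeated item just sets the same entry again

for row in matrix:
    print(*row)
```

With the list comprehension each user gets their own row, so writing to one row leaves the others untouched.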

Related

Analyze values around a main value in 2D matrix

In Python, I have a series of 2D arrays containing both negative and positive decimal values. For each matrix I have to find the values that fall within a given range. Up to this point I have succeeded.
Once I have found those values and their indices, however, I have to analyze their surroundings (for example, a submatrix of known size) and, depending on the values found there (through a condition), assign the value 0 or 1.
Thanks in advance everyone
Update:
I expect to get a matrix made up of zones of 1s and zones of 0s, determined by the values surrounding each main value selected by the initial condition.
Part of my 2D matrix = [0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
3 3 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
1 1 2 3 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
1 1 1 1 2 3 3 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
1 1 1 1 1 1 1 3 3 0 0 0 0 0 0 0 0 0 0 0 0 0
1 1 1 1 1 1 1 1 1 3 3 0 0 0 0 0 0 0 0 0 0 0
1 1 1 1 1 1 1 1 1 1 1 2 3 0 0 0 0 0 0 0 0 0
1 1 1 1 1 1 1 1 1 1 1 1 1 2 3 0 0 0 0 0 0 0
1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 3 0 0 0 0 0
1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 2 3 0 0 0
1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 2 3 0
1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 2
1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1
1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1
1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1
1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1
1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1
1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1
1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1
1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1
]
I would like to analyze the neighborhood of each cell with the value 2. If its neighborhood contains only 1s, I assign a condition (e.g. True) or a specific value.
If, on the other hand, its surroundings contain other values equal to 2, I would like to extend the search (up to a maximum distance of 3 cells from the identified value) until the condition is satisfied (the neighborhood consists only of 1s).
Thanks
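The rule described above can be read in more than one way. The sketch below assumes one possible reading: look at the ring of cells at distance 1 around a 2; if that ring is all 1s the condition holds; if it contains another 2, move on to the ring at distance 2, then 3; any other value stops the search. The function names, the ring interpretation, and the small example grid are all assumptions made for illustration.

```python
def ring_values(grid, row, col, distance):
    """Values of the cells whose Chebyshev distance from (row, col) is exactly `distance`."""
    values = []
    for r in range(row - distance, row + distance + 1):
        for c in range(col - distance, col + distance + 1):
            on_ring = max(abs(r - row), abs(c - col)) == distance
            inside = 0 <= r < len(grid) and 0 <= c < len(grid[0])
            if on_ring and inside:
                values.append(grid[r][c])
    return values


def surrounded_by_ones(grid, row, col, max_distance=3):
    """True if some ring up to max_distance consists only of 1s.

    The search widens only while the current ring contains another 2
    (an assumed reading of the rule, not a confirmed one).
    """
    for distance in range(1, max_distance + 1):
        values = ring_values(grid, row, col, distance)
        if values and all(v == 1 for v in values):
            return True
        if 2 not in values:
            return False  # blocked by some other value, e.g. 0 or 3
    return False


def flag_matrix(grid, target=2):
    """0/1 matrix: 1 where a `target` cell satisfies the condition, else 0."""
    return [
        [int(v == target and surrounded_by_ones(grid, r, c)) for c, v in enumerate(line)]
        for r, line in enumerate(grid)
    ]


# Hypothetical 5x5 example, not the matrix from the question.
grid = [
    [1, 1, 1, 1, 1],
    [1, 1, 1, 1, 1],
    [1, 1, 2, 2, 1],
    [1, 1, 1, 1, 1],
    [1, 1, 1, 1, 1],
]
for line in flag_matrix(grid):
    print(*line)
```

Here both 2s see each other at distance 1, so the search widens to distance 2, where only 1s remain. If the intended neighborhood is a full square window rather than a ring, ring_values is the only part that would need to change.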

How can I make a square with a specified circumference and add margin?

I am trying to make a square path of a specified length.
I wrote a function, and if I pass in 20 I get a 6x6 matrix.
How can I add a margin of 0s that is, e.g., 3 fields thick?
Like this:
0 0 0 0 0 0 0 0 0 0 0 0
0 0 0 0 0 0 0 0 0 0 0 0
0 0 0 0 0 0 0 0 0 0 0 0
0 0 0 1 1 1 1 1 1 0 0 0
0 0 0 1 1 1 1 1 1 0 0 0
0 0 0 1 1 1 1 1 1 0 0 0
0 0 0 1 1 1 1 1 1 0 0 0
0 0 0 1 1 1 1 1 1 0 0 0
0 0 0 1 1 1 1 1 1 0 0 0
0 0 0 0 0 0 0 0 0 0 0 0
0 0 0 0 0 0 0 0 0 0 0 0
0 0 0 0 0 0 0 0 0 0 0 0
def square(length):
    return [
        [1 for _ in range(length//4+1)]
        for _ in range(length//4+1)
    ]
for x in square(24):
    print(x)
You can prepare a line pattern of 0s and 1s, then build a 2D matrix by intersecting them.
def square(size, margin=3):
    p = [0]*margin + [1]*(size-2*margin) + [0]*margin
    return [[r*c for r in p] for c in p]
for row in square(20):
    print(*row)
0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
0 0 0 1 1 1 1 1 1 1 1 1 1 1 1 1 1 0 0 0
0 0 0 1 1 1 1 1 1 1 1 1 1 1 1 1 1 0 0 0
0 0 0 1 1 1 1 1 1 1 1 1 1 1 1 1 1 0 0 0
0 0 0 1 1 1 1 1 1 1 1 1 1 1 1 1 1 0 0 0
0 0 0 1 1 1 1 1 1 1 1 1 1 1 1 1 1 0 0 0
0 0 0 1 1 1 1 1 1 1 1 1 1 1 1 1 1 0 0 0
0 0 0 1 1 1 1 1 1 1 1 1 1 1 1 1 1 0 0 0
0 0 0 1 1 1 1 1 1 1 1 1 1 1 1 1 1 0 0 0
0 0 0 1 1 1 1 1 1 1 1 1 1 1 1 1 1 0 0 0
0 0 0 1 1 1 1 1 1 1 1 1 1 1 1 1 1 0 0 0
0 0 0 1 1 1 1 1 1 1 1 1 1 1 1 1 1 0 0 0
0 0 0 1 1 1 1 1 1 1 1 1 1 1 1 1 1 0 0 0
0 0 0 1 1 1 1 1 1 1 1 1 1 1 1 1 1 0 0 0
0 0 0 1 1 1 1 1 1 1 1 1 1 1 1 1 1 0 0 0
0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
Here's one way. One caution here is that, because of the way I duplicated the zero rows, those are all the same list. If you modify one of the zero rows, it will modify all of them.
def square(length):
    zeros = [0]*(length//4+7)
    sq = [zeros] * 3
    sq.extend( [
        ([0,0,0] + [1 for _ in range(length//4+1)] + [0,0,0] )
        for _ in range(length//4+1)
    ])
    sq.extend( [zeros]*3 )
    return sq
for x in square(24):
    print(x)
Here's a numpy method.
import numpy as np
def square(length):
    c = length//4+1
    sq = np.zeros((c+6, c+6), dtype=int)
    sq[3:c+3, 3:c+3] = 1
    return sq
print( square(24) )
One way to do this is to build it as a flat string, then use textwrap to style the output into the right number of lines:
import textwrap
# The number of 1's in a row/column
count = 6
# The number of 0's to pad with
margin = 3
# The total 'size' of a row/column
size = margin + count + margin
pad_rows = "0" * size * margin
core = (("0" * margin) + ("1" * count) + ("0" * margin)) * count
print('\n'.join(textwrap.wrap(pad_rows + core + pad_rows, size)))
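The answers above each build the margin into the square itself. A more general variant, sketched here as an assumption rather than as any answerer's method, pads an existing matrix with a hypothetical helper add_margin and creates every row as a separate list, so the shared-row caution does not apply:

```python
def add_margin(matrix, margin=3, fill=0):
    """Return a new matrix with `margin` rows/columns of `fill` on every side."""
    width = len(matrix[0]) + 2 * margin
    edge = [fill] * margin
    padded = [[fill] * width for _ in range(margin)]        # top rows
    padded += [edge + list(row) + edge for row in matrix]   # padded body
    padded += [[fill] * width for _ in range(margin)]       # bottom rows
    return padded


def square(length):
    # Same sizing rule as the function in the question.
    side = length // 4 + 1
    return [[1] * side for _ in range(side)]


for row in add_margin(square(20)):
    print(*row)
```

For square(20) this gives the 6x6 block of 1s inside a 12x12 grid, as in the layout shown in the question.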

Creating a map by iterating through a list in pygame?

I am creating my first game with pygame and am trying to render a maze from a list, drawing a tile wherever a designated number appears: where the entry is 1 it draws a wall, where it is 2 a door, and so on. Right now it draws the same image for every tile, but the draw() function only picks up and draws the entries equal to 1.
import pygame
class Maze:
    def __init__(self, x=20, y=20):
        self.x = x
        self.y = y
        self.tile = pygame.image.load("assets/redsquare.png")
        self.tile = pygame.transform.scale(self.tile, (20, 20))
        self.screen = pygame.display.set_mode((800, 700))
        self.background = pygame.Surface(self.screen.get_size()).convert()
        self.maze = """
1 2 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1
1 0 1 0 0 0 0 0 4 0 0 0 0 0 0 0 0 0 0 1
1 0 1 0 1 1 1 0 1 0 1 1 1 1 1 1 1 1 1 1
1 0 4 0 1 0 0 0 1 0 0 0 0 0 0 0 0 4 0 1
1 0 1 1 1 0 1 1 1 1 1 1 1 1 1 0 1 1 0 1
1 0 1 0 0 0 1 0 0 0 4 0 0 0 1 0 1 1 0 1
1 4 1 1 1 1 1 0 1 1 1 1 1 0 1 1 1 1 0 1
1 0 0 0 1 0 0 0 4 0 0 0 1 0 1 0 4 0 0 1
1 1 1 0 1 0 1 1 1 1 1 0 1 0 1 0 1 1 0 1
1 0 0 0 1 0 1 0 4 0 1 0 1 0 1 0 1 1 0 1
1 1 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1 1 0 1
1 0 0 0 1 0 0 0 1 0 1 0 1 0 1 0 1 1 0 1
1 4 1 1 1 1 1 1 1 0 1 0 1 0 0 0 1 1 0 1
1 0 0 0 0 0 4 0 1 0 1 1 1 1 1 1 1 1 1 1
1 1 1 1 1 0 1 0 1 0 1 0 0 0 4 0 0 0 1 1
1 0 0 0 1 0 1 0 1 0 1 0 1 1 1 1 1 0 1 1
1 0 1 1 1 0 1 0 1 0 0 0 1 0 0 0 1 0 0 3
1 0 0 0 0 0 1 0 1 1 1 1 1 0 1 0 1 0 1 1
1 1 1 1 1 0 1 0 0 0 0 0 0 0 1 0 1 0 1 1
1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 """
        self.maze = self.maze.splitlines()
And this is my draw function.
    def draw(self):
        print(self.maze)
        for y, line in enumerate(self.maze):
            for x, c in enumerate(line):
                if c == "1":
                    self.screen.blit(self.tile, (x * 20, y * 20))
                if c == "2":
                    self.screen.blit(self.tile, (x * 20, y * 20))
                if c == "3":
                    self.screen.blit(self.tile, (x * 20, y * 20))
                if c == "4":
                    self.screen.blit(self.tile, (x * 20, y * 20))
The maze string contains a lot of whitespace. enumerate(line) yields every character, including the blanks, so the tiles do not end up where you expect.
I recommend creating a list of strings, where each string represents a single row of the grid
(you don't need splitlines at all):
self.maze = ["12111111111111111111",
"10100000400000000001",
"10101110101111111111",
"10401000100000000401",
"10111011111111101101",
"10100010004000101101",
"14111110111110111101",
"10001000400010104001",
"11101011111010101101",
"10001010401010101101",
"11101010101010101101",
"10001000101010101101",
"14111111101010001101",
"10000040101111111111",
"11111010101000400011",
"10001010101011111011",
"10111010100010001003",
"10000010111110101011",
"11111010000000101011",
"11111111111111111111"]
Alternatively you can replace all blanks by "nothing" (.replace(" ", "")):
self.maze = """
1 2 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1
1 0 1 0 0 0 0 0 4 0 0 0 0 0 0 0 0 0 0 1
1 0 1 0 1 1 1 0 1 0 1 1 1 1 1 1 1 1 1 1
1 0 4 0 1 0 0 0 1 0 0 0 0 0 0 0 0 4 0 1
1 0 1 1 1 0 1 1 1 1 1 1 1 1 1 0 1 1 0 1
1 0 1 0 0 0 1 0 0 0 4 0 0 0 1 0 1 1 0 1
1 4 1 1 1 1 1 0 1 1 1 1 1 0 1 1 1 1 0 1
1 0 0 0 1 0 0 0 4 0 0 0 1 0 1 0 4 0 0 1
1 1 1 0 1 0 1 1 1 1 1 0 1 0 1 0 1 1 0 1
1 0 0 0 1 0 1 0 4 0 1 0 1 0 1 0 1 1 0 1
1 1 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1 1 0 1
1 0 0 0 1 0 0 0 1 0 1 0 1 0 1 0 1 1 0 1
1 4 1 1 1 1 1 1 1 0 1 0 1 0 0 0 1 1 0 1
1 0 0 0 0 0 4 0 1 0 1 1 1 1 1 1 1 1 1 1
1 1 1 1 1 0 1 0 1 0 1 0 0 0 4 0 0 0 1 1
1 0 0 0 1 0 1 0 1 0 1 0 1 1 1 1 1 0 1 1
1 0 1 1 1 0 1 0 1 0 0 0 1 0 0 0 1 0 0 3
1 0 0 0 0 0 1 0 1 1 1 1 1 0 1 0 1 0 1 1
1 1 1 1 1 0 1 0 0 0 0 0 0 0 1 0 1 0 1 1
1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 """
self.maze = self.maze.replace(" ", "").splitlines()
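Another way to sidestep the whitespace issue is to split each line into tokens before drawing. The sketch below leaves pygame out so it can run on its own; parse_maze and tile_positions are hypothetical helper names, the maze is shortened, and treating code 0 as "nothing to draw" is an assumption.

```python
# Shortened, hypothetical maze for illustration; the codes follow the question
# (1 = wall, 2 = door, ...). The tile size of 20 is also taken from the question.
MAZE_TEXT = """
    1 2 1 1
    1 0 4 1
    1 1 3 1
"""
TILE_SIZE = 20


def parse_maze(text):
    """Turn the text into rows of integer codes, ignoring all whitespace."""
    return [[int(token) for token in line.split()]
            for line in text.splitlines() if line.strip()]


def tile_positions(grid, tile_size=TILE_SIZE):
    """Yield (code, x, y) pixel positions for every non-empty tile."""
    for row_index, row in enumerate(grid):
        for col_index, code in enumerate(row):
            if code != 0:  # assumption: 0 means there is no tile to draw
                yield code, col_index * tile_size, row_index * tile_size


grid = parse_maze(MAZE_TEXT)
for code, x, y in tile_positions(grid):
    print(code, (x, y))
```

Inside draw(), each (code, x, y) triple could then be passed to self.screen.blit with an image chosen by code, for example from a dictionary that maps codes to loaded surfaces.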

How do you remove values not in a cluster using a pandas data frame?

If I have a pandas data frame like this, made up of 0s and 1s:
1 1 1 0 0 0 0 1 0
1 1 1 1 1 0 0 0 0
1 1 1 0 0 0 0 1 0
1 0 0 0 0 1 0 0 0
How do I filter out outliers such that I get something like this:
1 1 1 0 0 0 0 0 0
1 1 1 1 1 0 0 0 0
1 1 1 0 0 0 0 0 0
1 0 0 0 0 0 0 0 0
Such that I remove the outliers.
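We can do this with a cumulative product over the second axis, using DataFrame.cumprod. The product becomes 0 at the first 0 in each row and stays 0 from there on, so only the leading run of 1s survives: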
>>> df.cumprod(axis=1)
0 1 2 3 4 5 6 7 8
0 1 1 1 0 0 0 0 0 0
1 1 1 1 1 1 0 0 0 0
2 1 1 1 0 0 0 0 0 0
3 1 0 0 0 0 0 0 0 0
The same result can be obtained here with DataFrame.cummin:
>>> df.cummin(axis=1)
0 1 2 3 4 5 6 7 8
0 1 1 1 0 0 0 0 0 0
1 1 1 1 1 1 0 0 0 0
2 1 1 1 0 0 0 0 0 0
3 1 0 0 0 0 0 0 0 0
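To see why a cumulative product has this effect, here is the same computation on plain lists, using itertools.accumulate from the standard library instead of pandas. This is only an illustration of the idea, with the rows copied from the question:

```python
from itertools import accumulate
from operator import mul

rows = [
    [1, 1, 1, 0, 0, 0, 0, 1, 0],
    [1, 1, 1, 1, 1, 0, 0, 0, 0],
    [1, 1, 1, 0, 0, 0, 0, 1, 0],
    [1, 0, 0, 0, 0, 1, 0, 0, 0],
]

# Running product along each row: it drops to 0 at the first 0 and stays there.
filtered = [list(accumulate(row, mul)) for row in rows]

for row in filtered:
    print(*row)
```

Note that this keeps only the run of 1s that starts in the first column. A cluster that begins further to the right would be zeroed out as well, so the approach fits data shaped like the example rather than clusters in arbitrary positions.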

How do I open a binary matrix and convert it into a 2D array or a dataframe?

I have a binary matrix in a txt file that looks as follows:
0011011000
1011011000
0011011000
0011011010
1011011000
1011011000
0011011000
1011011000
0100100101
1011011000
I want to make this into a 2D array or a dataframe where there is one number per column and the rows are as shown. I've tried using numpy and pandas, but the output has only one column that contains the whole number. I want to be able to call an entire column as a number.
One of the things I've tried is:
import pandas as pd
with open("a1data1.txt") as myfile:
    dat1 = myfile.read().split('\n')
dat1 = pd.DataFrame(dat1)
Use read_fwf with parameter widths:
df = pd.read_fwf("a1data1.txt", header=None, widths=[1]*10)
print (df)
0 1 2 3 4 5 6 7 8 9
0 0 0 1 1 0 1 1 0 0 0
1 1 0 1 1 0 1 1 0 0 0
2 0 0 1 1 0 1 1 0 0 0
3 0 0 1 1 0 1 1 0 1 0
4 1 0 1 1 0 1 1 0 0 0
5 1 0 1 1 0 1 1 0 0 0
6 0 0 1 1 0 1 1 0 0 0
7 1 0 1 1 0 1 1 0 0 0
8 0 1 0 0 1 0 0 1 0 1
9 1 0 1 1 0 1 1 0 0 0
After reading your text file into a single-column DataFrame df (as in the question), you can split each string into its characters with the following code:
pd.DataFrame(df[0].apply(list).values.tolist())
Out[846]:
0 1 2 3 4 5 6 7 8 9
0 0 0 1 1 0 1 1 0 0 0
1 1 0 1 1 0 1 1 0 0 0
2 0 0 1 1 0 1 1 0 0 0
3 0 0 1 1 0 1 1 0 1 0
4 1 0 1 1 0 1 1 0 0 0
5 1 0 1 1 0 1 1 0 0 0
6 0 0 1 1 0 1 1 0 0 0
7 1 0 1 1 0 1 1 0 0 0
8 0 1 0 0 1 0 0 1 0 1
9 1 0 1 1 0 1 1 0 0 0
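If a plain 2D list is enough, the same split can be sketched without pandas. The example below uses io.StringIO as a stand-in for the file so that it runs on its own; with the real file, open("a1data1.txt") would take its place. Only the first three rows from the question are included.

```python
import io

# Stand-in for open("a1data1.txt"); the first three rows of the file in the question.
sample = io.StringIO("0011011000\n1011011000\n0011011000\n")

# One integer per character, skipping blank lines (e.g. a trailing newline).
matrix = [[int(ch) for ch in line.strip()] for line in sample if line.strip()]

# zip(*matrix) transposes the rows, so columns[j] is the whole j-th column.
columns = [list(col) for col in zip(*matrix)]

print(matrix[1])    # second row
print(columns[0])   # first column
```

Unlike the apply(list) approach, the entries here are integers rather than one-character strings.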
