Expanding a Pandas.dataframe after encoding

I need to encode the strings in a data frame, as in the following step:
import pandas as pd
df = pd.DataFrame({"cool": list("ABC"), "not_cool": list("CBA")})
encoding = {"A": [0, 0, 1], "B": [0, 1, 0], "C": [1, 0, 0]}
The data frame is then encoded with:
df.applymap(encoding.get)
Now, what I have is a data frame where the elements are lists:
cool not_cool
[0, 0, 1] [1, 0, 0]
[0, 1, 0] [0, 1, 0]
[1, 0, 0] [0, 0, 1]
I need to expand this into a matrix. How can I do that? My first thought was to iterate through the rows, join each row with numpy.hstack, store the results, and then numpy.vstack the stored rows, but it doesn't work as intended.
Another way would be to use this data frame to create a new one, where every column holds the n-th element of the lists. If I had that data frame, pandas.DataFrame.values would give me what I need:
1, 2, 3, 4, 5, 6 # Column names
0, 0, 1, 1, 0, 0
0, 1, 0, 0, 1, 0
1, 0, 0, 0, 0, 1

Quick answer:
x = df.applymap(encoding.get)
(x.cool + x.not_cool).values  # array of concatenated lists, without the headers
# adding the labels you need from there should be straightforward
This adds the two columns together; adding lists concatenates them, so each row becomes a single list of six values. The .values attribute then returns those lists as an array.
Update for @mithrado's comment:
import numpy as np
pd.DataFrame(np.vstack((x.cool + x.not_cool).values), columns=range(6))
# gives a data frame with the required values
You seem to be asking for the column names as another row in the data frame. Why would you want it that way?
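If the column names should not be hard-coded, the nested lists can also be reshaped directly. This is only a minimal sketch, assuming every encoded list has the same length; the name matrix is illustrative, and Series.map is applied per column in place of applymap, which newer pandas releases deprecate:

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({"cool": list("ABC"), "not_cool": list("CBA")})
encoding = {"A": [0, 0, 1], "B": [0, 1, 0], "C": [1, 0, 0]}

# Encode column by column; every cell becomes a list of length 3.
x = df.apply(lambda col: col.map(encoding.get))

# x.values.tolist() is a nested 3 x 2 x 3 list; merge the last two axes
# so that each row of the data frame becomes one row of the matrix.
matrix = np.array(x.values.tolist()).reshape(len(x), -1)
print(matrix)
```

Wrapping matrix in pd.DataFrame(matrix) gives the expanded data frame with default integer column labels.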

Related

Replace all but the first 1 in an array with 0

I am trying to find a way to replace every 1 after the first one in each row with 0. As an example:
[[0,1,0,1,0],
[1,0,0,1,0],
[1,1,1,0,1]]
Should become:
[[0,1,0,0,0],
[1,0,0,0,0],
[1,0,0,0,0]]
I found a similar problem, "numpy: setting duplicate values in a row to 0", but its solution does not seem to work here.

Assuming the array contains only zeros and ones, numpy.argmax returns the index of the first maximum in each row, which is the position of the first 1. You can then use advanced indexing to copy the values at those positions into an array of zeros.
import numpy as np
arr = np.array([[0,1,0,1,0],
                [1,0,0,1,0],
                [1,1,1,0,1]])
res = np.zeros_like(arr)
idx = (np.arange(len(res)), np.argmax(arr, axis=1))  # (row, column of first maximum)
res[idx] = arr[idx]
res
array([[0, 1, 0, 0, 0],
       [1, 0, 0, 0, 0],
       [1, 0, 0, 0, 0]])

Try looping through each row of the grid.
In each row, find all the 1s. In particular, you want their indices (positions within the row). You can do this with a list comprehension and enumerate, which automatically gives an index for each element.
Then, still within that row, go through every 1 except the first and set it to zero.
grid = [[0, 1, 0, 1, 0], [1, 0, 0, 1, 0], [1, 1, 1, 0, 1]]
for row in grid:
    ones = [i for i, element in enumerate(row) if element == 1]
    for i in ones[1:]:
        row[i] = 0
print(grid)
Gives: [[0, 1, 0, 0, 0], [1, 0, 0, 0, 0], [1, 0, 0, 0, 0]]

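You can use cumsum:
(arr.cumsum(axis=1).cumsum(axis=1) == 1) * 1
The first cumulative sum counts the 1s seen so far in each row. Taking the cumulative sum a second time yields a value of exactly 1 only at the first 1 of each row, so comparing with 1 marks those positions.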
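To see why the cumulative sum is taken twice, it can help to look at the intermediate arrays. A small sketch using the example array from the question (the names first, second and res are illustrative only):

```python
import numpy as np

arr = np.array([[0, 1, 0, 1, 0],
                [1, 0, 0, 1, 0],
                [1, 1, 1, 0, 1]])

# Number of 1s seen so far in each row; zeros that follow the first 1
# also carry the value 1 here, so one cumsum is not enough.
first = arr.cumsum(axis=1)

# Running total of those counts: 0 before the first 1, exactly 1 at the
# first 1, and at least 2 everywhere after it.
second = first.cumsum(axis=1)

res = (second == 1).astype(int)
print(first)
print(second)
print(res)
```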

Splitting a numpy array at specific locations

Hey, so I basically have a problem like this:
I have a numpy array that contains a matrix of values, for example:
Data = np.array([
    [3, 0, 1, 5],
    [0, 0, 0, 7],
    [0, 3, 0, 0],
    [0, 0, 0, 6],
    [5, 1, 0, 0]])
Using another array, I want to extract specific values and sum them. This is a bit hard to explain, so I'll just show an example:
values = np.array([3,1,3,4,2])
This means we want the first 3 values of the first row, the first value of the second row, the first 3 values of the third row, the first 4 values of the fourth row, and the first 2 values of the last row, so we only want this data:
final_data = [  # rows have different lengths, so shown as a plain list
    [3, 0, 1],
    [0],
    [0, 3, 0],
    [0, 0, 0, 6],
    [5, 1]]
Then we want the sum of those values, which in this case is 19.
Is there an easy way to do this? Also, the data isn't always the same size, so I can't hard-code any dimensions.

An even better answer:
Data[np.arange(Data.shape[1])<values[:,None]].sum()
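The one-liner above relies on broadcasting a row of column indices against a column of per-row limits. Spelled out step by step as a sketch, with the illustrative names cols, mask and total:

```python
import numpy as np

Data = np.array([
    [3, 0, 1, 5],
    [0, 0, 0, 7],
    [0, 3, 0, 0],
    [0, 0, 0, 6],
    [5, 1, 0, 0]])
values = np.array([3, 1, 3, 4, 2])

cols = np.arange(Data.shape[1])   # column indices [0, 1, 2, 3]
# Compare shape (4,) with shape (5, 1): the result has shape (5, 4) and
# is True where the column index is below that row's limit.
mask = cols < values[:, None]
total = Data[mask].sum()          # sum over the selected entries only

print(mask.astype(int))
print(total)  # 19
```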

You can try:
sum([Data[i, :j].sum() for i, j in enumerate(values)])

You can accomplish this with advanced indexing. The advanced coordinates can be calculated separately before pulling them from the array.
Explicitly:
Data = np.array([
[3, 0, 1, 5],
[0, 0, 0, 7],
[0, 3, 0, 0],
[0, 0, 0, 6],
[5, 1, 0, 0]])
values = np.array([3,1,3,4,2])
X = [0,0,0,1,2,2,2,3,3,3,3,4,4]
Y = [0,1,2,0,0,1,2,0,1,2,3,0,1]
Data[X,Y]
Notice that X repeats each row index once for every value to take from that row, and Y holds the column to access for each entry of X. Both can be calculated from values directly:
X = np.concatenate([[n]*i for n,i in enumerate(values)])
Y = np.concatenate([np.arange(i) for i in values])
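Putting the pieces together, here is a sketch that builds the coordinates and takes the sum. It swaps in np.repeat for the list-multiplication trick when building X, which is an alternative and not part of the answer above:

```python
import numpy as np

Data = np.array([
    [3, 0, 1, 5],
    [0, 0, 0, 7],
    [0, 3, 0, 0],
    [0, 0, 0, 6],
    [5, 1, 0, 0]])
values = np.array([3, 1, 3, 4, 2])

# Row index n repeated values[n] times.
X = np.repeat(np.arange(len(values)), values)
# Column indices 0 .. values[n]-1 for each row, laid end to end.
Y = np.concatenate([np.arange(n) for n in values])

selected = Data[X, Y]   # the picked entries as a flat array
print(selected)
print(selected.sum())   # 19
```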

Python: Integrate the number of columns into a variable

I am new to Python and could use your help.
I have the variable 'sequence', which holds the optimal order of products.
sequence = seq_2
with for example:
seq_2 = [[0, 0], [1, 0]]
seq_4 = [[0, 0, 0, 0], [1, 0, 0, 0], [0, 1, 0, 0], [0, 0, 1, 0]]
What I want to do is to change the '2' according to the number of columns of a matrix I have generated.
For example, if the matrix has 6 columns (= 6 products), the variable should be:
sequence = seq_6
I know that the number of columns can be generated with:
columns = len(df.columns)
But how do I combine that result with my "sequence" variable?
Best regards
Amy

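Is this the answer you are looking for?
sequence = 'seq_' + str(len(df.columns))
str() converts the number of columns in df to a string, which is then concatenated to 'seq_'. Note that this yields the name as a string (for example 'seq_6'), not the list stored in the variable of that name.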
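If the goal is to get the list itself and not just its name, one common alternative is to keep the sequences in a dictionary keyed by the number of columns. A minimal sketch; the dictionary name sequences and the stand-in value for len(df.columns) are assumptions made for illustration:

```python
# Hypothetical lookup table: number of columns -> sequence.
sequences = {
    2: [[0, 0], [1, 0]],
    4: [[0, 0, 0, 0], [1, 0, 0, 0], [0, 1, 0, 0], [0, 0, 1, 0]],
}

n_columns = 4  # stands in for len(df.columns)
sequence = sequences[n_columns]
print(sequence)
```

A missing key raises KeyError, which makes an unsupported number of columns easy to notice.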

Creating n number of masked subarrays for all the n unique values in an array using python

I have an array created from a raster. This array has multiple unique values. I want to create new arrays for each unique value such that the places with that value are marked as '1' and the rest as '0'. I am using python for this.
A = [1, 1, 3, 2, 2, 1, 1, 3, 3] # Input array
b = numpy.unique(A) # gives unique values
a1 = [1, 1, 0, 0, 0, 1, 1, 0, 0] #new array for value 1
a2 = [0, 0, 0, 1, 1, 0, 0, 0, 0] #new array for value 2
a3 = [0, 0, 1, 0, 0, 0, 0, 1, 1] #new array for value 3
So basically the code would scan through the unique values, get the number of unique values and create individual arrays for each unique value.
I have used numpy.unique() to get the unique values in the array, and numpy.zeros() to create arrays that can be overwritten with the desired values. But I do not know how to get the code to count the unique values and create that many new arrays.
I have been trying to do this with a for loop, but I haven't been successful. My understanding of how to build such a nested for loop is not very clear yet.

You could do something like this:
>>> import numpy as np
>>> A = np.array([1, 1, 3, 2, 2, 1, 1, 3, 3])  # must be a NumPy array, not a list
>>> [(A == unique_val).astype(int) for unique_val in np.unique(A)]
[array([1, 1, 0, 0, 0, 1, 1, 0, 0]), array([0, 0, 0, 1, 1, 0, 0, 0, 0]), array([0, 0, 1, 0, 0, 0, 0, 1, 1])]
The core part of the program is:
(A == unique_val).astype(int)
It compares each element of the NumPy array with unique_val, which yields a boolean array. astype(int) then converts that boolean array to an integer array.

You can do:
a1 = (A == b[0]) * 1
Then, instead of hard-coding b[0], loop over range(len(b)) and use b[i]. Note that A must be a NumPy array for the comparison to work elementwise.

The easiest way is to do it with broadcasting:
locs = (A[None, :] == b[:, None]).astype(int)
out = {val: arr for val, arr in zip(list(b), list(locs))}
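A self-contained version of the broadcasting approach, as a sketch; it assumes A is converted to a NumPy array first, since a plain list supports neither A[None, :] nor elementwise comparison:

```python
import numpy as np

A = np.array([1, 1, 3, 2, 2, 1, 1, 3, 3])
b = np.unique(A)  # sorted unique values: [1, 2, 3]

# Compare shape (1, 9) with shape (3, 1): row k of the (3, 9) result
# marks where A equals b[k].
locs = (A[None, :] == b[:, None]).astype(int)

# Map each unique value to its mask; tolist() turns the keys into plain ints.
out = dict(zip(b.tolist(), locs))
for val, arr in out.items():
    print(val, arr)
```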

Make every possible combination in 2D array

I'm trying to make an array of 4x4 (16) pixel black and white images with all possible combinations. I made the following array as a template:
template = [[0,0,0,0], # start with all white pixels
[0,0,0,0],
[0,0,0,0],
[0,0,0,0]]
I then want to iterate through the template, changing 0 to 1 for every possible combination.
I tried to iterate with numpy and itertools but can only get 256 combinations, and by my calculations there should be 32000 (Edit: 65536! Don't know what happened there...). Anyone with mad skills who could help me out?

As you said, you can use the itertools module to do this, in particular the product function:
import itertools
import numpy as np
# generate all the combinations as tuples of 16 ints (0 or 1)
seq = itertools.product((0, 1), repeat=16)
for s in seq:
    # convert to numpy array and reshape to 4x4
    arr = np.fromiter(s, np.int8).reshape(4, 4)
    # do something with arr

There are 65536 such combinations for a (4 x 4) array. Here's a vectorized approach that generates all of them as a single (65536 x 4 x 4) array:
mask = ((np.arange(2**16)[:,None] & (1 << np.arange(16))) != 0)
out = mask.astype(int).reshape(-1,4,4)
Sample run -
In [145]: out.shape
Out[145]: (65536, 4, 4)
In [146]: out
Out[146]:
array([[[0, 0, 0, 0],
        [0, 0, 0, 0],
        [0, 0, 0, 0],
        [0, 0, 0, 0]],

       [[1, 0, 0, 0],
        [0, 0, 0, 0],
        [0, 0, 0, 0],
        [0, 0, 0, 0]],

       [[0, 1, 0, 0],
        [0, 0, 0, 0],
        [0, 0, 0, 0],
        [0, 0, 0, 0]],

       ...,

       [[1, 0, 1, 1],
        [1, 1, 1, 1],
        [1, 1, 1, 1],
        [1, 1, 1, 1]],

       [[0, 1, 1, 1],
        [1, 1, 1, 1],
        [1, 1, 1, 1],
        [1, 1, 1, 1]],

       [[1, 1, 1, 1],
        [1, 1, 1, 1],
        [1, 1, 1, 1],
        [1, 1, 1, 1]]])

One possibility, which relies on a for loop:
out = []
for i in range(2**16):
    bits = np.frombuffer("{:016b}".format(i).encode('utf8'), dtype=np.uint8)
    out.append(bits.reshape(4, 4) - 48)
Obviously you could make that a list comprehension if you like.
It takes advantage of Python string formatting, which can produce the binary representation of integers. The format string instructs it to use 16 places, filling with zeros on the left. The string is then encoded to give a bytes object, which numpy can interpret as an array of character codes.
Finally, we subtract 48, the code for the character "0", to get a proper 0. Luckily, "1" sits just above "0", so that's all we need to do.

First, iterate over all numbers from 0 to (2^16)-1 and create a 16-character binary string for each of them, which covers all possible combinations.
Then convert the string to a list of ints and build a 2D list out of it using a list comprehension and slicing.
all_combinations = []
for i in range(2**16):  # xrange in Python 2
    binary = '{0:016b}'.format(i)  # number to 16-character binary string
    binary = list(map(int, binary))  # string to list of ints
    template = [binary[j:j+4] for j in range(0, len(binary), 4)]  # 2D list of four rows
    all_combinations.append(template)
