Set Pandas DataFrame entries by boolean index to tuple values - python

I have a data frame
a b
0 1 (1, 1, 0)
1 1 (1, 1, 0)
2 2 (1, 1, 0)
3 1 (1, 1, 0)
(created by d = pd.DataFrame({'a':[1,1,2,1], 'b':[(1,1,0)]*4})).
I'd like to assign tuple values to entries indexed by boolean values, e.g.
d.loc[d['a']==1, 'b'] = [(0,0,1)] * 3
to change the values in rows 0,1,3 to (0,0,1). This does not work and throws a ValueError: Must have equal len keys and value when setting with an ndarray. Note that d.loc[d['a']==1, 'b'] = [((0,0,1),)]*3 does not throw an error, but the result is
a b
0 1 ((0, 0, 1),)
1 1 ((0, 0, 1),)
2 2 (1, 1, 0)
3 1 ((0, 0, 1),)
How do I get the result
a b
0 1 (0, 0, 1)
1 1 (0, 0, 1)
2 2 (1, 1, 0)
3 1 (0, 0, 1)
using logical indexing for rows?

Here's one way to do it:
# set values
ixs = [0,1,3]
vals = [[(0,0,1)]*len(ixs)]
# replace values
d.loc[ixs,['b']] = vals
a b
0 1 (0, 0, 1)
1 1 (0, 0, 1)
2 2 (1, 1, 0)
3 1 (0, 0, 1)
For pandas >= 1.0, you can do:
d.loc[ixs, 'b'] = pd.Series(vals[0], index=ixs)  # vals[0] is the flat list of tuples, one per label in ixs

Just double-wrap the tuple in a list:
d.loc[d['a']==1, 'b'] = [[(0, 0, 1)]]
Out[78]:
a b
0 1 (0, 0, 1)
1 1 (0, 0, 1)
2 2 (1, 1, 0)
3 1 (0, 0, 1)

One way is to create a Series. However, its index has to match the index of the selected rows:
d.loc[d['a']==1, 'b'] = pd.Series([(0,0,1)]*len(d.loc[d['a']==1, 'b']), index=d.loc[d['a']==1, 'b'].index)
This seems a bit cumbersome and I hope someone else posts a better solution.
(Using the naive d.loc[d['a']==1, 'b'] = pd.Series([(0,0,1)]*len(d.loc[d['a']==1, 'b'])) produces NaN in the last row, because the index 3 of the data frame is not met by a matching index in the series. This: d.loc[d['a']==1, 'b'] = pd.Series([(0,0,1)]*len(d)) also seems to work, but seems terribly inefficient especially when most conditions are false.)
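A slightly less cumbersome variant of the Series idea above is to compute the boolean mask once and reuse it for both the row selection and the Series index. This is only a sketch: the names mask and new_vals are mine, not from the question, and it relies on .loc aligning the assigned Series on its index.

```python
import pandas as pd

d = pd.DataFrame({'a': [1, 1, 2, 1], 'b': [(1, 1, 0)] * 4})

# Compute the boolean mask once and reuse it.
mask = d['a'] == 1

# One tuple per selected row, indexed by the labels of exactly those rows,
# so that .loc can align the values instead of unpacking the tuples.
new_vals = pd.Series([(0, 0, 1)] * int(mask.sum()),
                     index=d.index[mask.to_numpy()])
d.loc[mask, 'b'] = new_vals

print(d)
```

Because the Series carries the right labels, row 2 is left untouched and no NaN appears in row 3.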

here is a solution, you can change the values to fit with your answer

Related

Pandas DataFrame, How to obtain something like method .value_counts(), but its value as the index of that unique sequence of classes instead of count

Given my dataframe df, which has 13 rows and 5 classes, as follows:
import pandas as pd
df = pd.DataFrame( [[0,0,1,0,0],
[0,0,0,0,1],
[0,0,1,0,0],
[0,0,0,0,1],
[0,0,1,0,0],
[0,0,0,0,1],
[0,0,1,0,0],
[0,0,0,1,0],
[0,1,1,0,0],
[1,0,0,1,0],
[0,1,1,0,0],
[1,0,0,1,0],
[0,1,1,0,1]])
Note that the columns of df are [0,1,2,3,4] by default.
When you run df.value_counts(), you get
0 1 2 3 4
0 0 1 0 0 4
0 0 1 3
1 1 0 0 2
1 0 0 1 0 2
0 0 0 1 0 1
1 1 0 1 1
You can see that it returns every unique sequence of the 5 classes together with its count (the blank entries represent zero, I guess). Is there a way to get, as a dictionary, the row indices that contain each of these unique sequences?
For this case, it would return the following dictionary, where each key is a unique class sequence and each value is the list of row indices where it occurs:
{(0,0,1,0,0): [0,2,4,6],
(0,0,0,0,1): [1,3,5],
(0,1,1,0,0): [8,10],
(1,0,0,1,0): [9,11],
(0,0,0,1,0): [7],
(0,1,1,0,1): [12]}
Thank you in advance.
You can simply use the .groupby method.
Given a list of columns, .groupby groups the rows by every combination of values that occurs in those columns, and the .groups attribute then gives the dictionary you expect:
df.groupby([0,1,2,3,4]).groups
Result:
{(0, 0, 0, 0, 1): [1, 3, 5], (0, 0, 0, 1, 0): [7], (0, 0, 1, 0, 0): [0, 2, 4, 6], (0, 1, 1, 0, 0): [8, 10], (0, 1, 1, 0, 1): [12], (1, 0, 0, 1, 0): [9, 11]}
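If you want plain Python lists as the dictionary values (the .groups attribute maps each key to a pandas Index), a small conversion on top of the groupby call should do. A minimal sketch, where index_by_row is a name I made up:

```python
import pandas as pd

df = pd.DataFrame([[0, 0, 1, 0, 0],
                   [0, 0, 0, 0, 1],
                   [0, 0, 1, 0, 0],
                   [0, 0, 0, 0, 1],
                   [0, 0, 1, 0, 0],
                   [0, 0, 0, 0, 1],
                   [0, 0, 1, 0, 0],
                   [0, 0, 0, 1, 0],
                   [0, 1, 1, 0, 0],
                   [1, 0, 0, 1, 0],
                   [0, 1, 1, 0, 0],
                   [1, 0, 0, 1, 0],
                   [0, 1, 1, 0, 1]])

# Group on every column, then turn each key into a tuple of plain ints
# and each group's Index into a plain list of row labels.
index_by_row = {
    tuple(int(x) for x in key): idx.tolist()
    for key, idx in df.groupby(list(df.columns)).groups.items()
}

print(index_by_row[(0, 0, 1, 0, 0)])
```

Using list(df.columns) instead of a hard-coded [0,1,2,3,4] keeps the call working if the number of classes changes.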

Extract the index value from array

I have an array A1 with shape 2x3 and a list A2. I want to find the index in the array of a value taken from the list.
Example
A1 = [[0, 1, 2],
[3, 4, 5]] # Shape: 2 rows & 3 columns
A2 = [0,1,2,3,4,5]
Now, I want to write code that gets an element's index in array A1.
Expected Output
A2[3] = (1,0) # (1 = row & 0 = column) index of the value 3 in A1
Please help me. Thank you
There is some ambiguity in the question. Are we looking for the indices of elements by value, or by order?
Unravel an ordinal index
Assuming that the values in A1 are not important (i.e. this is not a search of certain values, but really finding the index corresponding to a location), you can use unravel_index for that.
Example:
>>> np.unravel_index(3, np.array(A1).shape)
(1, 0)
Or, on the whole A2 in one shot:
>>> np.unravel_index(A2, np.array(A1).shape)
(array([0, 0, 0, 1, 1, 1]), array([0, 1, 2, 0, 1, 2]))
which you may prefer as a list of tuples ("transpose" of the above):
>>> list(zip(*np.unravel_index(A2, np.array(A1).shape)))
[(0, 0), (0, 1), (0, 2), (1, 0), (1, 1), (1, 2)]
Search for a value
If, instead, you are searching for values, e.g., where in A1 there are values equal to A2[i], then, as in @dc_Bita98's answer (with A1 converted to a NumPy array first):
>>> A1 = np.array(A1)
>>> tuple(np.argwhere(A1 == A2[3]).squeeze().tolist())
(1, 0)
If you want all the locations in one shot, you need to do something to handle the fact that the shapes are different. Say also, for the sake of illustration, that:
A3 = np.array([9, 1, 0, 1])
Then, either:
>>> i, j, k = np.where(A1 == A3[:, None, None])
>>> out = np.full(A3.shape, None, dtype=object)
>>> for n, r, c in zip(i.tolist(), j.tolist(), k.tolist()):
...     out[n] = (r, c)
...
>>> out.tolist()
[None, (0, 1), (0, 0), (0, 1)]
which clearly indicates that the first value (9) was not found, and where to find the others.
Or:
>>> [tuple(np.argwhere(A1 == v).squeeze().tolist()) for v in A3]
[(), (0, 1), (0, 0), (0, 1)]
where an empty tuple marks a value that was not found.
If you can use numpy, check out argwhere:
import numpy as np
a1 = np.array([[0,1,2],[3,4,5]])
a2 = [0,1,2,3,4,5]
a3 = np.argwhere(a1 == a2[3]).squeeze() # -> array([1, 0]), i.e. row 1, column 0
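To put the two readings of the question side by side, here is a small sketch with one helper per interpretation. The function names are hypothetical; only np.unravel_index and np.argwhere come from the answers above.

```python
import numpy as np

A1 = np.array([[0, 1, 2],
               [3, 4, 5]])

def position_of_flat_index(flat_index, shape):
    """By order: convert a position in the flattened array to (row, col)."""
    return tuple(int(x) for x in np.unravel_index(flat_index, shape))

def positions_of_value(arr, value):
    """By value: every (row, col) where arr equals value; empty list if absent."""
    return [tuple(pos) for pos in np.argwhere(arr == value).tolist()]

print(position_of_flat_index(3, A1.shape))  # (1, 0)
print(positions_of_value(A1, 4))            # [(1, 1)]
print(positions_of_value(A1, 9))            # []
```

Returning a list from the value search avoids the squeeze step, so missing and repeated values are handled the same way as a single match.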

remove some elements from a matrix according to the indices

I have a matrix:
a = ([[1, 0, 0, 0],
[0, 0, 1, 1],
[0, 1, 0, 1],
[1, 0, 0, 1]])
and I want to print the 0s in the matrix but not all of the 0s. I only want to keep the 0s in every row with the smallest index and remove all subsequent zeros in the row.
For instance, in the first row of this matrix, the second element (a[0][1]) should be kept and the rest of elements in the first row should be deleted since they are all zeros.
I used pop() for 2D array but I got attribute error. And the output is not correct too. I don't know how to compare indices and select the smallest column index in every row.
This is my code:
for ix, row in enumerate(a):
    for iy, i in enumerate(row):
        if i==0 and (iy+ix<(iy+1)+ix) :
            a[ix].pop((iy+1))
            print(ix,iy)
        elif i==0 and (iy+ix>(iy+1)+ix):
            a[ix].pop(iy)
            print(ix,iy+1)
print(a)
my expected result is the set of indices and the modified matrix a.
0 1
1 0
2 0
3 1
a=[[1,0],[0,1,1],[0,1,1],[1,0,1]]
Could anyone help me?
This solution only works if there is at least one zero in every row.
indices = []
for x,row in enumerate(a):
    i = row.index(0)
    indices.append((x,i))
    a[x] = row[:i+1] + [e for e in row[i:] if e]
print(indices)
print(a)
Output
[(0, 1), (1, 0), (2, 0), (3, 1)]
[[1, 0], [0, 1, 1], [0, 1, 1], [1, 0, 1]]
Assuming there's a zero in every row, you can get its column index with
c = np.argmin(a, axis=1)
Alternatively, if the matrix can contain negative numbers, you can do
c = np.argmax(np.equal(a, 0), axis=1)
The rows are just
r = np.arange(len(a))
The result you want is then
result = np.stack((r, c), axis=-1)
If there are rows without a zero in them, you can filter the result with a mask:
mask = np.array(a)[r, c] == 0
result = result[mask, :]
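Putting the pieces of this answer together, a minimal runnable sketch could look like this. The fifth row (no zeros at all) is something I added for illustration, to show the mask doing its job; it is not part of the question's matrix.

```python
import numpy as np

a = np.array([[1, 0, 0, 0],
              [0, 0, 1, 1],
              [0, 1, 0, 1],
              [1, 0, 0, 1],
              [1, 1, 1, 1]])  # extra row without zeros, for illustration only

is_zero = a == 0
r = np.arange(len(a))
c = np.argmax(is_zero, axis=1)  # first True in each row; 0 if the row has none
has_zero = is_zero[r, c]        # False for rows that contain no zero
result = np.stack((r, c), axis=-1)[has_zero]

print(result.tolist())  # [[0, 1], [1, 0], [2, 0], [3, 1]]
```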
Looking at your example input
a = [[1,0,0,0],[0,0,1,1],[0,1,0,1],[1,0,0,1]]
and the expected output
>>[(0, 1), (1, 0), (2, 0), (3, 1)]
you can reframe the problem as finding the index of the element in each row which has the value zero (and where more than one element exists, return the first).
By framing it this way, the solution is as simple as iterating through each row of a and retrieving the index of the value 0 (whereby only the first element will be returned by default).
Using list comprehension that would look like this:
value_to_find = 0
desired_indexes = [
row.index(value_to_find) for row in a
]
or using map that would be:
value_to_find = 0
desired_indexes = map(lambda row:row.index(value_to_find), a)
Then you could enumerate them to pair the results with the row number (wrapping the call in list to materialize the pairs):
list(enumerate(desired_indexes))
Et voila!
>>[(0, 1), (1, 0), (2, 0), (3, 1)]
The entire solution can be written in a single line like so:
answer = list(enumerate(map(lambda row:row.index(0), a)))
Try this:
a = [[1, 0, 0, 0],
     [0, 0, 1, 1],
     [0, 1, 0, 1],
     [1, 0, 0, 1]]
b = []
for i in a:
    f = False  # becomes True once the first zero of the row has been kept
    c = []
    for j in i:
        if (j==0 and f==False) or j != 0:
            c.append(j)
            if j == 0: f = True
        else:
            continue
    b.append(c)
output:
[[1, 0], [0, 1, 1], [0, 1, 1], [1, 0, 1]]
To get the index of the first zero in each row, you can try this:
list({i : j.index(0) for i,j in enumerate(b)}.items())
# [(0, 1), (1, 0), (2, 0), (3, 1)]
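For completeness, here is a pure-Python sketch that returns both the indices and the trimmed matrix in one pass and also copes with rows that contain no zero. The function name keep_first_zero is made up, and leaving zero-free rows unchanged is my assumption about the desired behaviour.

```python
def keep_first_zero(matrix):
    """Keep only the first zero of each row; return (indices, trimmed rows)."""
    indices, trimmed = [], []
    for r, row in enumerate(matrix):
        if 0 not in row:
            # Assumption: a row without zeros is kept unchanged, no index recorded.
            trimmed.append(list(row))
            continue
        first = row.index(0)
        indices.append((r, first))
        # Everything up to and including the first zero, then only non-zeros.
        trimmed.append(row[:first + 1] + [v for v in row[first + 1:] if v != 0])
    return indices, trimmed

a = [[1, 0, 0, 0],
     [0, 0, 1, 1],
     [0, 1, 0, 1],
     [1, 0, 0, 1]]

indices, b = keep_first_zero(a)
print(indices)  # [(0, 1), (1, 0), (2, 0), (3, 1)]
print(b)        # [[1, 0], [0, 1, 1], [0, 1, 1], [1, 0, 1]]
```

Unlike the pop-based attempt in the question, this builds new rows instead of mutating a list while iterating over it.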

How to generate permutations of n (similar) elements in m slots with no duplicates

I've been trying to do something but can't wrap my head around it:
I want to generate all possible permutations of n elements in m slots.
To be more precise, I have an 8x8 two-dimensional array, but to keep it simple, let's say it's a 64-slot list (I will transform it back into a two-dimensional array later), all filled with 0. I want to place four 1s in this list and generate all possible permutations, with no duplicates.
For example, if I wanted to place 2 elements in a list of 4 slots, it would give these 6 lists:
0 0 1 1
0 1 0 1
1 0 0 1
0 1 1 0
1 0 1 0
1 1 0 0
I've tried using itertools, but none of the functions there seem to do the job, or I don't understand them well enough to find the right way to use them for this.
IF YOU HAVE A BIG NUMBER
from sympy.utilities.iterables import multiset_permutations
# for a 64-slot list containing four 1s
l = [0]*64
for i in range(4):
    l[i] = 1
perm = multiset_permutations(l)
# for i in perm:
#     print(i)
allPerms = list(perm)
print("Total permutations found: ", len(allPerms))
Total permutations found: 635376
ALTERNATE SOLUTION FOR SMALL NUMBERS
# permutations using itertools
from itertools import permutations
# Get all permutations
perm = permutations([1, 1, 0, 0])
print(list(set(perm)))
Output
[(1, 0, 1, 0), (1, 1, 0, 0), (1, 0, 0, 1), (0, 1, 1, 0), (0, 1, 0, 1), (0, 0, 1, 1)]
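Another option that stays within the standard library: since the 1s are interchangeable, each distinct arrangement is just a choice of which slots hold a 1, so itertools.combinations over the slot positions generates every arrangement exactly once, with no duplicates to filter out. A sketch (placements is a hypothetical helper name):

```python
from itertools import combinations
from math import comb

def placements(n_slots, n_ones):
    """Yield every list of length n_slots that contains exactly n_ones 1s."""
    for ones in combinations(range(n_slots), n_ones):
        row = [0] * n_slots
        for pos in ones:
            row[pos] = 1
        yield row

for row in placements(4, 2):
    print(row)

# Number of arrangements for four 1s in 64 slots, without generating them.
print(comb(64, 4))  # 635376
```

Because placements is a generator, the 64-slot case can be iterated lazily instead of holding all 635376 lists in memory at once.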

Create dense matrix from sparse matrix efficiently (numpy/scipy but NO sklearn)

I have a sparse.txt that looks like this:
# first column is label 0 or 1
# rest of the data is sparse data
# maximum value in the data is 4, so the future dense matrix will
# have 1+4 = 5 elements in a row
# file: sparse.txt
1 1:1 2:1 3:1
0 1:1 4:1
1 2:1 3:1 4:1
The required dense.txt is this:
# required file: dense.txt
1 1 1 1 0
0 1 0 0 1
1 0 1 1 1
Without using scipy coo_matrix, I did it in a simple way like this:
def create_dense(fsparse, fdense,fvocab):
    # number of lines in vocab
    lvocab = sum(1 for line in open(fvocab))
    # create dense file
    with open(fsparse) as fi, open(fdense,'w') as fo:
        for i, line in enumerate(fi):
            words = line.strip('\n').split(':')
            words = " ".join(words).split()
            label = int(words[0])
            indices = [int(w) for (i,w) in enumerate(words) if int(i)%2]
            row = [0]* (lvocab+1)
            row[0] = label
            # use listcomps
            row = [ 1 if i in indices else row[i] for i in range(len(row))]
            l = " ".join(map(str,row)) + "\n"
            fo.write(l)
            print('Writing dense matrix line: ', i+1)
Question
How can we get the label and data directly from the sparse data, without first creating a dense matrix, preferably using NumPy/SciPy?
Question:
How can we read the sparse data using numpy.fromregex ?
My attempt is:
def read_file(fsparse):
    regex = r'([0-1]\s)([0-9]):(1\s)*([0-9]:1)' + r'\s*\n'
    data = np.fromregex(fsparse,regex,dtype=str)
    print(data,file=open('dense.txt','w'))
It did not work!
Related links:
Parsing colon separated sparse data with pandas and numpy
Tweaking your code to create the dense array directly, rather than via a file:
fsparse = 'stack47266965.txt'
def create_dense(fsparse, lvocab):
    alist = []
    with open(fsparse) as fi:
        for i, line in enumerate(fi):
            words = line.strip('\n').split(':')
            words = " ".join(words).split()
            label = int(words[0])
            indices = [int(w) for (i,w) in enumerate(words) if int(i)%2]
            row = [0]* (lvocab+1)
            row[0] = label
            # use listcomps
            row = [ 1 if i in indices else row[i] for i in range(len(row))]
            alist.append(row)
    return alist
alist = create_dense(fsparse, 4)
print(alist)
import numpy as np
arr = np.array(alist)
from scipy import sparse
M = sparse.coo_matrix(arr)
print(M)
print(M.toarray())  # M.A in older SciPy versions
produces
0926:~/mypy$ python3 stack47266965.py
[[1, 1, 1, 1, 0], [0, 1, 0, 0, 1], [1, 0, 1, 1, 1]]
(0, 0) 1
(0, 1) 1
(0, 2) 1
(0, 3) 1
(1, 1) 1
(1, 4) 1
(2, 0) 1
(2, 2) 1
(2, 3) 1
(2, 4) 1
[[1 1 1 1 0]
[0 1 0 0 1]
[1 0 1 1 1]]
If you want to skip the dense arr, you need to generate the equivalent of the M.row,M.col, and M.data attributes (order doesn't matter)
[0 0 0 0 1 1 2 2 2 2]
[0 1 2 3 1 4 0 2 3 4]
[1 1 1 1 1 1 1 1 1 1]
I don't use regex much so I won't try to fix that. I assume you want to convert
'1 1:1 2:1 3:1'
into
['1' '1' '1' '2' '1' '3' '1']
But that just gets you to the words/label stage.
A direct-to-sparse version:
def create_sparse(fsparse, lvocab):
    row, col, data = [],[],[]
    with open(fsparse) as fi:
        for i, line in enumerate(fi):
            words = line.strip('\n').split(':')
            words = " ".join(words).split()
            label = int(words[0])
            row.append(i); col.append(0); data.append(label)
            indices = [int(w) for (i,w) in enumerate(words) if int(i)%2]
            for j in indices: # quick-n-dirty version
                row.append(i); col.append(j); data.append(1)
    return row, col, data
r,c,d = create_sparse(fsparse, 4)
print(r,c,d)
M = sparse.coo_matrix((d,(r,c)))
print(M)
print(M.toarray())  # M.A in older SciPy versions
producing
[0, 0, 0, 0, 1, 1, 1, 2, 2, 2, 2] [0, 1, 2, 3, 0, 1, 4, 0, 2, 3, 4] [1, 1, 1, 1, 0, 1, 1, 1, 1, 1, 1]
....
The only thing that's different is the one data item with value 0. sparse will take care of that.
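If you want the labels kept separate from the feature triplets, and want to skip comment lines like the ones in the sample file, a parser along these lines might be a starting point. It is only a sketch in plain Python: the function name, the choice to skip lines starting with '#', and reading the value after the colon (rather than assuming 1) are my assumptions.

```python
def parse_sparse_lines(lines):
    """Return (labels, rows, cols, vals) from 'label idx:val idx:val ...' lines."""
    labels, rows, cols, vals = [], [], [], []
    r = 0
    for line in lines:
        line = line.strip()
        if not line or line.startswith('#'):
            continue  # assumption: blank lines and '#' comments carry no data
        label, *pairs = line.split()
        labels.append(int(label))
        for pair in pairs:
            col, val = pair.split(':')
            rows.append(r)
            cols.append(int(col))
            vals.append(int(val))
        r += 1
    return labels, rows, cols, vals

sample = """\
# file: sparse.txt
1 1:1 2:1 3:1
0 1:1 4:1
1 2:1 3:1 4:1
""".splitlines()

labels, rows, cols, vals = parse_sparse_lines(sample)

# Optional check: rebuild the dense rows, with the label in column 0.
n_cols = max(cols) + 1
dense = [[lab] + [0] * (n_cols - 1) for lab in labels]
for r, c, v in zip(rows, cols, vals):
    dense[r][c] = v
print(labels)
print(dense)
```

The rows, cols and vals lists are in the (data, (row, col)) shape that sparse.coo_matrix accepts, as in the create_sparse call above, so the dense rebuild is only there to check the result.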
(Answered before explicitly disallowing sklearn)
This is basically the svmlight / libsvm format.
Just use scikit-learn's load_svmlight_file or the more efficient svmlight-loader. No need to reinvent the wheel here!
from sklearn.datasets import load_svmlight_file
X, y = load_svmlight_file('C:/TEMP/sparse.txt')
print(X)
print(y)
print(X.todense())
Output:
(0, 0) 1.0
(0, 1) 1.0
(0, 2) 1.0
(1, 0) 1.0
(1, 3) 1.0
(2, 1) 1.0
(2, 2) 1.0
(2, 3) 1.0
[ 1. 0. 1.]
[[ 1. 1. 1. 0.]
[ 1. 0. 0. 1.]
[ 0. 1. 1. 1.]]
