OpenCV Canny edge detection is not working properly on an ideal square - Python

I am using this 15x15 pixel binary image of a square.
0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
0 0 0 1 1 1 1 1 1 1 1 1 1 0 0
0 0 0 1 1 1 1 1 1 1 1 1 1 0 0
0 0 0 1 1 1 1 1 1 1 1 1 1 0 0
0 0 0 1 1 1 1 1 1 1 1 1 1 0 0
0 0 0 1 1 1 1 1 1 1 1 1 1 0 0
0 0 0 1 1 1 1 1 1 1 1 1 1 0 0
0 0 0 1 1 1 1 1 1 1 1 1 1 0 0
0 0 0 1 1 1 1 1 1 1 1 1 1 0 0
0 0 0 1 1 1 1 1 1 1 1 1 1 0 0
0 0 0 1 1 1 1 1 1 1 1 1 1 0 0
0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
I am applying the Canny edge detection provided by OpenCV (version 2.7) for object size measurement. My expected output should look like this:
0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
0 0 0 1 1 1 1 1 1 1 1 1 1 0 0
0 0 0 1 0 0 0 0 0 0 0 0 1 0 0
0 0 0 1 0 0 0 0 0 0 0 0 1 0 0
0 0 0 1 0 0 0 0 0 0 0 0 1 0 0
0 0 0 1 0 0 0 0 0 0 0 0 1 0 0
0 0 0 1 0 0 0 0 0 0 0 0 1 0 0
0 0 0 1 0 0 0 0 0 0 0 0 1 0 0
0 0 0 1 0 0 0 0 0 0 0 0 1 0 0
0 0 0 1 0 0 0 0 0 0 0 0 1 0 0
0 0 0 1 1 1 1 1 1 1 1 1 1 0 0
0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
But two of the edges (the top and left edges) always get shifted by one pixel.
The output of the Canny edge detection is:
0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
0 0 0 0 1 1 1 1 1 1 1 1 0 0 0
0 0 0 1 0 0 0 0 0 0 0 0 1 0 0
0 0 1 0 0 0 0 0 0 0 0 0 1 0 0
0 0 1 0 0 0 0 0 0 0 0 0 1 0 0
0 0 1 0 0 0 0 0 0 0 0 0 1 0 0
0 0 1 0 0 0 0 0 0 0 0 0 1 0 0
0 0 1 0 0 0 0 0 0 0 0 0 1 0 0
0 0 1 0 0 0 0 0 0 0 0 0 1 0 0
0 0 1 0 0 0 0 0 0 0 0 0 1 0 0
0 0 1 0 0 0 0 0 0 0 0 0 1 0 0
0 0 0 1 1 1 1 1 1 1 1 1 1 0 0
0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
Why is this pixel shift happening?
Is there any way I can avoid it? (I cannot manually adjust the pixel shift after the output, as I have to use edge detection on irregular shapes.) The same shift happens regardless of whether the number of pixels is odd or even.
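For reference, a minimal sketch of the kind of call involved (not the exact code from the post; building the image and the threshold values are assumptions, and the shift shows up for a wide range of thresholds):
import numpy as np
import cv2

# The 15x15 binary square described above, scaled to 0/255 because
# cv2.Canny expects an 8-bit single-channel image.
img = np.zeros((15, 15), dtype=np.uint8)
img[3:13, 3:13] = 255

# Placeholder thresholds; the binary image has a single strong gradient step.
edges = cv2.Canny(img, 50, 150)
print((edges > 0).astype(int))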

At first glance I was quite surprised when I came across this question; I did not believe that Canny edge detection could be so deceiving. So I took a similar image and applied Canny edge detection to it, and to my surprise I ran into the same problem you are facing. Why is that?
After digging into the documentation I came across the many operations that happen under the hood.
The documentation states that Gaussian filtering is done first to reduce noise. That is true, but it blurs the existing edges in the image as well, so when you blur a perfect square/rectangle its corners become rounded.
After Gaussian filtering, the next step is computing the edge gradient. As noted, by this point the perfect edge of the square/rectangle is gone due to the blurring; what is left are rounded/curved edges. Taking intensity gradients of rounded/curved edges will never yield a perfectly square/rectangular edge. I might be wrong, but I believe this is the main reason we do not get perfect edges from Canny edge detection.
If you want a perfect edge, my suggestion would be to find contours (as suggested by Micka) and draw a bounding rectangle.
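A sketch of that contour-based approach (not from the original answer; it assumes the same 0/255 binary input as above and allows for the differing findContours return signatures across OpenCV versions):
import numpy as np
import cv2

img = np.zeros((15, 15), dtype=np.uint8)
img[3:13, 3:13] = 255

# findContours returns (contours, hierarchy) in OpenCV 2.x/4.x and
# (image, contours, hierarchy) in 3.x, so pick the right element.
result = cv2.findContours(img.copy(), cv2.RETR_EXTERNAL, cv2.CHAIN_APPROX_SIMPLE)
contours = result[0] if len(result) == 2 else result[1]

outline = np.zeros_like(img)
for contour in contours:
    x, y, w, h = cv2.boundingRect(contour)
    # Draw a one-pixel-thick rectangle exactly on the square's border.
    cv2.rectangle(outline, (x, y), (x + w - 1, y + h - 1), 255, 1)

print((outline > 0).astype(int))
This reproduces the expected output above, with the outline sitting on the outermost object pixels rather than shifted by one.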

Related

How to determine if 2d array contains a contiguous box?

I'm trying to recreate a game in Python, and I need to be able to determine whether a given matrix contains a contiguous box and then return which elements are enclosed by that box. In this case, the matrix is the board and each element is a game piece.
For example, the matrix:
0 0 0 0 0 0 0 0
0 0 1 1 1 0 0 0
0 0 1 0 1 0 0 0
0 0 1 1 1 0 0 0
0 0 0 0 0 0 0 0
0 0 0 0 0 0 0 0
0 0 0 0 0 0 0 0
0 0 0 0 0 0 0 0
contains a contiguous box, and the element inside that block in this case is (4,3). We also assume that there is a hypothetical wall of ones on the outside of the array such that:
0 0 0 0 0 0 0 0
0 0 0 0 0 0 0 0
0 0 0 0 0 0 0 0
0 0 0 0 0 0 0 0
1 1 1 0 0 0 0 0
0 0 1 0 0 0 0 0
0 0 1 0 0 0 0 0
0 0 1 0 0 0 0 0
forms a contiguous box, with the elements (0,6),(0,7),(0,8),(1,6),(1,7),(1,8) being boxed in. The key is that it MUST be contiguous on all sides, so the matrix
0 0 0 0 0 0 0 0
0 0 0 0 0 0 0 0
0 0 0 0 0 0 0 0
0 0 0 0 0 0 0 0
1 0 1 0 0 0 0 0
0 0 1 0 0 0 0 0
0 0 1 0 0 0 0 0
0 0 1 0 0 0 0 0
would return nothing. (Additionally, diagonals do not apply here, only elements next to each other to the left, right, up or down.)
I've tried a few things to implement this, such as a recursive solution similar to the flood fill algorithm, but I was unable to get it to work in all cases. Any suggestions for solving this problem?
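Not from the thread, but one common way to handle the first example is a flood fill over the zero cells: any connected region of zeros that never reaches the border of the array is boxed in by ones. (The "hypothetical wall" variant would additionally treat the border itself as ones, which changes which regions qualify.) A minimal sketch:
from collections import deque

def boxed_in_cells(board):
    """Return the set of (row, col) zero cells enclosed on all sides by ones."""
    rows, cols = len(board), len(board[0])
    reachable = set()
    queue = deque()
    # Seed the search with every zero cell on the border of the board.
    for r in range(rows):
        for c in range(cols):
            if (r in (0, rows - 1) or c in (0, cols - 1)) and board[r][c] == 0:
                reachable.add((r, c))
                queue.append((r, c))
    # BFS flood fill through zero cells (no diagonals).
    while queue:
        r, c = queue.popleft()
        for dr, dc in ((1, 0), (-1, 0), (0, 1), (0, -1)):
            nr, nc = r + dr, c + dc
            if 0 <= nr < rows and 0 <= nc < cols and board[nr][nc] == 0 and (nr, nc) not in reachable:
                reachable.add((nr, nc))
                queue.append((nr, nc))
    # Zero cells the flood fill never reached are enclosed by ones.
    return {(r, c) for r in range(rows) for c in range(cols)
            if board[r][c] == 0 and (r, c) not in reachable}
For the first matrix above this returns {(2, 3)} with 0-based (row, col) indexing.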

Python keeps updating wrong list

I'm trying to write code to simulate the spread of something via a 2D list with an n x n structure. My issue is this: when I create a temp copy of my original list via temp = [*board], board[:], etc., it nonetheless updates both lists, and instead of returning
0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
0 0 0 0 0 0 0 1 0 0 0 0 0 0 0
0 0 0 0 0 0 1 1 1 0 0 0 0 0 0
0 0 0 0 0 0 0 1 0 0 0 0 0 0 0
0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
it returns
0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
0 0 0 0 0 0 0 1 1 1 1 1 1 1 1
0 0 0 0 0 0 1 1 1 1 1 1 1 1 1
0 0 0 0 0 0 1 1 1 1 1 1 1 1 1
0 0 0 0 0 0 1 1 1 1 1 1 1 1 1
0 0 0 0 0 0 1 1 1 1 1 1 1 1 1
0 0 0 0 0 0 1 1 1 1 1 1 1 1 1
0 0 0 0 0 0 1 1 1 1 1 1 1 1 1
0 0 0 0 0 0 1 1 1 1 1 1 1 1 1
0 0 0 0 0 0 1 1 1 1 1 1 1 1 1
my code is here:
def spread(board, iterations, size):
    temp = board[:]
    for iteration in range(iterations):
        for x in range(size):
            for y in range(size):
                if board[x][y] == 1:
                    if x+1 < size:
                        temp[x+1][y] = 1
                    if x-1 >= 0:
                        temp[x-1][y] = 1
                    if y+1 < size:
                        temp[x][y+1] = 1
                    if y-1 >= 0:
                        temp[x][y-1] = 1
        board = temp[:]
    return board
and I called it via
new_board = spread(my_board, 1, 15)
This is programming 101. Remember, lists live on the heap, and variables hold references (pointers) to them.
So the variable board really refers to the place on the heap where that list is stored. When you assign board to temp, you create a new reference to the same list, and a slice like board[:] only copies the outer list: its elements are still references to the same inner row lists, so mutating a row through temp also mutates it through board. I suggest taking a look at this using Python Tutor: https://pythontutor.com/visualize.html#mode=edit
For example:
b = [1,2,3,4,5]
a = b
a[0] = 2
print(b)
will output
[2, 2, 3, 4, 5]
Try it out in python tutor and you'll see what's happening!
To solve your problem, create a deep copy
def deep_copy(board):
    temp = []
    for i in range(len(board)):
        row_copy = []
        for j in range(len(board[0])):
            row_copy.append(board[i][j])
        temp.append(row_copy)
    return temp
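Equivalently, the standard library already covers this: copy.deepcopy copies the nested row lists as well, and for a plain 2D list of numbers a per-row slice does the same job.
import copy

temp = copy.deepcopy(board)        # recursive copy of the nested lists
# or, for a simple 2D list of numbers:
temp = [row[:] for row in board]   # copy each inner row list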

Efficient majority voting for 1-in-N classification with sliding window classifier over 2D Array

Short version: I would like to use the values in a 2D array to index the third dimension of a corresponding subset of a larger array - and then increment those elements.
I would appreciate help making the two incorporate_votes algorithms quicker. Actually sliding the classifier over the array and calculating optimal strides is not the point here.
Long version:
I have an algorithm which classifies each element of an R1 x C1 2D array as one of N classes.
I would like to classify a larger 2D array of size R2 x C2. Rather than tessellating the larger array into multiple R1 x C1 2D arrays, I would like to slide the classifier over the larger array, so that each element in the larger array is classified multiple times. This means that I will have an R2 x C2 x N array to store the results in, and as the window slides across the large array each pixel in the window will increment one of the elements in the third dimension (i.e. one of the N classes).
After all the sliding is finished we can simply take the argmax along the dimension corresponding to the classification to get the per-element classification.
I intend to scale this up to classify an array of several million pixels with a few dozen classes, so I am concerned about the efficiency of using the classification results to increment one value in the classification dimension per element.
Below is a toy version of the problem that I have been crafting all evening in Python 3. It has a naive double-for-loop implementation and a slightly better one obtained by index swizzling and some smart indexing. The classifier is just random.
import numpy as np

map_rows = 8
map_cols = 10
num_candidates = 3
vote_rows = 6
vote_cols = 5

def display_tally(the_tally):
    print("{:25s}{:25s}{:25s}".format("Class 0", "Class 1", "Class 2"))
    for i in range(map_rows):
        for k in range(num_candidates):
            for j in range(map_cols):
                print("{:<2}".format(the_tally[i, j, k]), end='')
            print(" ", end='')
        print("")

def incorporate_votes(current_tally, this_vote, left, top):
    for i in range(vote_rows):
        for j in range(vote_cols):
            current_tally[top + i, left + j, this_vote[i, j]] += 1
    return current_tally

def incorporate_votes2(current_tally, this_vote, left, top):
    for i in range(num_candidates):
        current_tally[i, top:top + vote_rows, left:left + vote_cols][this_vote == i] += 1
    return current_tally

tally = np.zeros((map_rows, map_cols, num_candidates), dtype=int)
swizzled_tally = np.zeros((num_candidates, map_rows, map_cols), dtype=int)

print("Before voting")
display_tally(tally)

print("\n Votes from classifier A (centered at (2,2))")
votes = np.random.randint(num_candidates, size=vote_rows*vote_cols).reshape((vote_rows, vote_cols))
print(votes)

tally = incorporate_votes(tally, votes, 0, 0)
swizzled_tally = incorporate_votes2(swizzled_tally, votes, 0, 0)

print("\nAfter classifier A voting (centered at (2,2))")
display_tally(tally)

print("\n Votes from classifier B (Centered at (5, 4))")
votes2 = np.random.randint(num_candidates, size=vote_rows*vote_cols).reshape((vote_rows, vote_cols))
print(votes2)

tally = incorporate_votes(tally, votes2, 3, 2)
swizzled_tally = incorporate_votes2(swizzled_tally, votes2, 3, 2)

print("\nAfter classifier B voting (Centered at (5, 4))")
print("Naive vote counting")
display_tally(tally)
print("\nSwizzled vote counting")
display_tally(np.moveaxis(swizzled_tally, [-2, -1], [0, 1]))

new_tally = np.moveaxis(tally, -1, 0)

classifications = np.argmax(swizzled_tally, axis=0)
print("\nNaive classifications")
print(classifications)

print("\nSwizzled classifications")
classifications = np.argmax(tally, axis=2)
print(classifications)
And some sample output:
Before voting
Class 0 Class 1 Class 2
0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
Votes from classifier A (centered at (2,2))
[[1 1 2 2 1]
[0 2 0 2 1]
[0 2 2 0 2]
[1 1 1 2 0]
[1 0 0 2 1]
[2 1 1 1 0]]
After classifier A voting (centered at (2,2))
Class 0 Class 1 Class 2
0 0 0 0 0 0 0 0 0 0 1 1 0 0 1 0 0 0 0 0 0 0 1 1 0 0 0 0 0 0
1 0 1 0 0 0 0 0 0 0 0 0 0 0 1 0 0 0 0 0 0 1 0 1 0 0 0 0 0 0
1 0 0 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1 1 0 1 0 0 0 0 0
0 0 0 0 1 0 0 0 0 0 1 1 1 0 0 0 0 0 0 0 0 0 0 1 0 0 0 0 0 0
0 1 1 0 0 0 0 0 0 0 1 0 0 0 1 0 0 0 0 0 0 0 0 1 0 0 0 0 0 0
0 0 0 0 1 0 0 0 0 0 0 1 1 1 0 0 0 0 0 0 1 0 0 0 0 0 0 0 0 0
0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
Votes from classifier B (Centered at (5, 4))
[[2 2 2 0 0]
[0 1 2 1 2]
[2 0 0 2 0]
[2 2 1 1 1]
[1 2 0 2 1]
[1 1 1 1 2]]
After classifier B voting (Centered at (5, 4))
Naive vote counting
Class 0 Class 1 Class 2
0 0 0 0 0 0 0 0 0 0 1 1 0 0 1 0 0 0 0 0 0 0 1 1 0 0 0 0 0 0
1 0 1 0 0 0 0 0 0 0 0 0 0 0 1 0 0 0 0 0 0 1 0 1 0 0 0 0 0 0
1 0 0 1 0 0 1 1 0 0 0 0 0 0 0 0 0 0 0 0 0 1 1 1 2 1 0 0 0 0
0 0 0 1 1 0 0 0 0 0 1 1 1 0 1 0 1 0 0 0 0 0 0 1 0 1 0 1 0 0
0 1 1 0 1 1 0 1 0 0 1 0 0 0 1 0 0 0 0 0 0 0 0 2 0 0 1 0 0 0
0 0 0 0 1 0 0 0 0 0 0 1 1 1 0 1 1 1 0 0 1 0 0 1 1 0 0 0 0 0
0 0 0 0 0 1 0 0 0 0 0 0 0 1 0 0 0 1 0 0 0 0 0 0 1 0 1 0 0 0
0 0 0 0 0 0 0 0 0 0 0 0 0 1 1 1 1 0 0 0 0 0 0 0 0 0 0 1 0 0
Swizzled vote counting
Class 0 Class 1 Class 2
0 0 0 0 0 0 0 0 0 0 1 1 0 0 1 0 0 0 0 0 0 0 1 1 0 0 0 0 0 0
1 0 1 0 0 0 0 0 0 0 0 0 0 0 1 0 0 0 0 0 0 1 0 1 0 0 0 0 0 0
1 0 0 1 0 0 1 1 0 0 0 0 0 0 0 0 0 0 0 0 0 1 1 1 2 1 0 0 0 0
0 0 0 1 1 0 0 0 0 0 1 1 1 0 1 0 1 0 0 0 0 0 0 1 0 1 0 1 0 0
0 1 1 0 1 1 0 1 0 0 1 0 0 0 1 0 0 0 0 0 0 0 0 2 0 0 1 0 0 0
0 0 0 0 1 0 0 0 0 0 0 1 1 1 0 1 1 1 0 0 1 0 0 1 1 0 0 0 0 0
0 0 0 0 0 1 0 0 0 0 0 0 0 1 0 0 0 1 0 0 0 0 0 0 1 0 1 0 0 0
0 0 0 0 0 0 0 0 0 0 0 0 0 1 1 1 1 0 0 0 0 0 0 0 0 0 0 1 0 0
Naive classifications
[[1 1 2 2 1 0 0 0 0 0]
[0 2 0 2 1 0 0 0 0 0]
[0 2 2 0 2 2 0 0 0 0]
[1 1 1 0 0 2 1 2 0 0]
[1 0 0 2 0 0 2 0 0 0]
[2 1 1 1 0 1 1 1 0 0]
[0 0 0 1 2 0 2 1 0 0]
[0 0 0 1 1 1 1 2 0 0]]
Swizzled classifications
[[1 1 2 2 1 0 0 0 0 0]
[0 2 0 2 1 0 0 0 0 0]
[0 2 2 0 2 2 0 0 0 0]
[1 1 1 0 0 2 1 2 0 0]
[1 0 0 2 0 0 2 0 0 0]
[2 1 1 1 0 1 1 1 0 0]
[0 0 0 1 2 0 2 1 0 0]
[0 0 0 1 1 1 1 2 0 0]]
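Not part of the original post, but a further step beyond incorporate_votes2 is to drop the per-class loop entirely and index the class axis of the (rows, cols, classes) tally directly with the vote array; a sketch under those assumptions:
import numpy as np

def incorporate_votes_fancy(current_tally, this_vote, left, top):
    # Row/column index grids covering the window placed at (top, left).
    rows, cols = this_vote.shape
    r_idx, c_idx = np.mgrid[top:top + rows, left:left + cols]
    # Each (row, col, class) triple occurs once within a single window, so a
    # fancy-indexed in-place add is safe here; np.add.at would also work.
    current_tally[r_idx, c_idx, this_vote] += 1
    return current_tally
Called as incorporate_votes_fancy(tally, votes2, 3, 2), it produces the same counts as the naive version.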

Reordering rows based on number formed by combining columns in Python

I have a dataframe formed with pandas as seen below:
a b c d e f g h i j k l m n o
1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
2 0 0 0 0 0 0 1 0 1 0 1 0 0 1 0
3 0 1 0 0 1 0 1 0 1 0 1 0 0 1 0
4 0 0 0 0 0 1 0 1 0 0 1 0 0 1 0
5 0 0 1 1 0 1 0 1 0 0 1 0 0 1 0
6 1 0 0 1 0 1 0 1 0 0 1 0 0 1 0
7 0 0 0 0 0 0 0 0 0 0 0 0 0 1 0
8 0 0 0 0 0 0 0 0 1 0 1 0 0 1 0
9 0 0 0 0 0 0 0 0 0 0 0 1 1 1 0
10 0 0 0 1 0 1 0 1 0 0 1 0 0 1 0
11 0 0 0 0 0 0 0 1 0 0 1 0 0 1 0
12 0 0 0 0 1 0 1 0 1 0 1 0 0 1 0
13 0 0 0 0 0 0 0 0 0 0 1 0 0 1 0
14 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1
15 0 0 0 0 1 0 1 0 1 0 1 0 0 1 0
16 0 0 0 0 0 0 0 0 0 0 0 1 0 1 0
I want to sort the rows so that they are ordered in descending order, where the value of a row is the number formed by concatenating its columns. For example, row 1 is 000000000000000 and row 2 is 000000101010010. The final result should have row 6 as the first row and row 1 as the last row. I've tried
dat.sort_values(by=['a'], ascending=False, axis=0)
but this only sorts by the first column. Is there another way I could reorder the rows?
Sort by all columns in their current order:
df.sort_values(by=df.columns.tolist(), ascending=False)
# a b c d e f g h i j k l m n o
#6 1 0 0 1 0 1 0 1 0 0 1 0 0 1 0
#3 0 1 0 0 1 0 1 0 1 0 1 0 0 1 0
#5 0 0 1 1 0 1 0 1 0 0 1 0 0 1 0
#10 0 0 0 1 0 1 0 1 0 0 1 0 0 1 0
#12 0 0 0 0 1 0 1 0 1 0 1 0 0 1 0
#15 0 0 0 0 1 0 1 0 1 0 1 0 0 1 0
#4 0 0 0 0 0 1 0 1 0 0 1 0 0 1 0
#2 0 0 0 0 0 0 1 0 1 0 1 0 0 1 0
#11 0 0 0 0 0 0 0 1 0 0 1 0 0 1 0
#8 0 0 0 0 0 0 0 0 1 0 1 0 0 1 0
#13 0 0 0 0 0 0 0 0 0 0 1 0 0 1 0
#9 0 0 0 0 0 0 0 0 0 0 0 1 1 1 0
#16 0 0 0 0 0 0 0 0 0 0 0 1 0 1 0
#7 0 0 0 0 0 0 0 0 0 0 0 0 0 1 0
#14 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1
#1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
Using a key for the sort: concatenate each row's digits into one string and sort by that string (all rows have the same width, so string order matches numeric order).
df.loc[df.astype(str).sum(1).sort_values(ascending=False).index]
Out[871]:
a b c d e f g h i j k l m n o
6 1 0 0 1 0 1 0 1 0 0 1 0 0 1 0
3 0 1 0 0 1 0 1 0 1 0 1 0 0 1 0
5 0 0 1 1 0 1 0 1 0 0 1 0 0 1 0
10 0 0 0 1 0 1 0 1 0 0 1 0 0 1 0
15 0 0 0 0 1 0 1 0 1 0 1 0 0 1 0
12 0 0 0 0 1 0 1 0 1 0 1 0 0 1 0
4 0 0 0 0 0 1 0 1 0 0 1 0 0 1 0
2 0 0 0 0 0 0 1 0 1 0 1 0 0 1 0
11 0 0 0 0 0 0 0 1 0 0 1 0 0 1 0
8 0 0 0 0 0 0 0 0 1 0 1 0 0 1 0
13 0 0 0 0 0 0 0 0 0 0 1 0 0 1 0
9 0 0 0 0 0 0 0 0 0 0 0 1 1 1 0
16 0 0 0 0 0 0 0 0 0 0 0 1 0 1 0
7 0 0 0 0 0 0 0 0 0 0 0 0 0 1 0
14 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1
1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
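Not from the original answers, but since every entry is 0 or 1 you can also interpret each row literally as a binary number (column a as the most significant bit) and sort by that value; a sketch assuming the DataFrame is named df:
import numpy as np

# Weight column a with the highest power of two and column o with 2**0.
weights = 2 ** np.arange(df.shape[1] - 1, -1, -1)
df_sorted = df.loc[df.dot(weights).sort_values(ascending=False).index]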

pandas: print all non-empty rows from a DataFrame

I have this data:
time-stamp ccount A B C D E F G H I
2015-03-03T23:43:33+0000 0 0 0 0 0 0 0 0 0 0
2015-03-04T06:33:28+0000 0 0 0 0 0 0 0 0 0 0
2015-03-04T06:18:38+0000 0 0 0 0 0 0 0 0 0 0
2015-03-04T05:36:43+0000 0 0 0 1 0 0 0 0 0 0
2015-03-04T05:29:09+0000 0 0 0 1 0 0 0 0 1 0
2015-03-04T07:01:11+0000 0 0 1 0 1 0 0 0 0 0
2015-03-03T15:27:06+0000 19 0 1 0 1 0 0 0 0 0
2015-03-03T15:43:38+0000 10 0 1 0 1 1 0 0 0 0
2015-03-03T18:16:26+0000 0 0 0 1 0 0 0 0 0 0
2015-03-03T18:19:48+0000 0 0 0 0 0 0 0 0 0 0
2015-03-03T18:20:02+0000 4 0 0 0 0 1 0 0 0 0
2015-03-03T20:21:55+0000 2 0 0 0 0 0 1 0 0 0
2015-03-03T20:37:36+0000 0 0 0 0 0 0 0 0 0 0
2015-03-04T03:03:51+0000 1 0 0 0 0 0 1 0 0 0
2015-03-03T16:33:04+0000 9 0 0 0 0 0 0 0 0 0
2015-03-03T16:18:13+0000 1 0 0 0 0 0 0 0 0 0
2015-03-03T16:34:18+0000 4 0 0 0 0 0 0 0 0 0
2015-03-03T18:11:36+0000 5 0 0 0 0 0 0 0 0 0
2015-03-03T18:24:35+0000 0 0 0 0 0 0 0 0 0 0
I want to slice all rows which have at least a single one ("1") in the columns A to I.
For the above data, the output will be:
time-stamp ccount A B C D E F G H I
2015-03-04T05:36:43+0000 0 0 0 1 0 0 0 0 0 0
2015-03-04T05:29:09+0000 0 0 0 1 0 0 0 0 1 0
2015-03-04T07:01:11+0000 0 0 1 0 1 0 0 0 0 0
2015-03-03T15:27:06+0000 19 0 1 0 1 0 0 0 0 0
2015-03-03T15:43:38+0000 10 0 1 0 1 1 0 0 0 0
2015-03-03T18:16:26+0000 0 0 0 1 0 0 0 0 0 0
2015-03-03T18:20:02+0000 4 0 0 0 0 1 0 0 0 0
2015-03-03T20:21:55+0000 2 0 0 0 0 0 1 0 0 0
2015-03-04T03:03:51+0000 1 0 0 0 0 0 1 0 0 0
We have ignored all the rows which don't have a "1" in any of the columns from A to I.
You could use any() and boolean indexing to select only the rows that have at least one entry equal to 1:
df[(df.loc[:,['A','B','C','D','E','F','G','H','I']] == 1).any(axis=1)]
Referring to columns by label is somewhat tedious if you have a lot of them, so you can use slicing to make things a little neater:
df[(df.loc[:, 'A':'I'] == 1).any(axis=1)]
a = open("a.txt", 'r')
for line in a:
    new = line.split(" ")
    if "1" in new[1:]:
        print line
OUTPUT:
2015-03-04T05:36:43+0000 0 0 0 1 0 0 0 0 0 0
2015-03-04T05:29:09+0000 0 0 0 1 0 0 0 0 1 0
2015-03-04T07:01:11+0000 0 0 1 0 1 0 0 0 0 0
2015-03-03T15:27:06+0000 19 0 1 0 1 0 0 0 0 0
2015-03-03T15:43:38+0000 10 0 1 0 1 1 0 0 0 0
2015-03-03T18:16:26+0000 0 0 0 1 0 0 0 0 0 0
2015-03-03T18:20:02+0000 4 0 0 0 0 1 0 0 0 0
2015-03-03T20:21:55+0000 2 0 0 0 0 0 1 0 0 0
2015-03-04T03:03:51+0000 1 0 0 0 0 0 1 0 0 0
2015-03-03T16:18:13+0000 1 0 0 0 0 0 0 0 0 0
Another solution, assuming that all the values in the columns A to I are nonnegative:
df[(df.drop(['time-stamp','ccount'], axis=1).sum(axis=1) > 0)]
Of course, the dropping part can be combined with the other solutions, as in the sketch below.
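A sketch combining the drop with any(), assuming the same df and 0/1 entries:
df[df.drop(['time-stamp', 'ccount'], axis=1).any(axis=1)]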
