How to apply conditions to str.split() - python

I have a dataframe that contains a string of varying length in each cell, e.g.:
Num
(1,2,3,4,5)
(6,7,8)
(9)
(10,11,12)
I want to avoid attempting to perform str.split(',') on the cells that only have one number in them. However, I want all of the single numbers to be converted to a list of one element.
Here is what I have tried; it raises an error that says "'int' object is not callable":
if(df['Num'].size() > 1):
    df['Num'] = df['Num'].str.split(',')
Update for clarification:
Index Num
0 2,6,7
1 1,3,6,7,8
2 2,4,7,8,9
3 3,5,8,9,10
4 4,9,10
5 1,2,7
6 1,2,3,6,8
7 2,3,4,7,9
8 3,4,5,8,10
9 4,5,9
10 2,3
11 1,3
12 1,2
13 2,3,4
14 1,3,4
15 1,2,4
16 1,2,3
17 2
18 1
I am trying to take this dataframe and convert each Num row from a string of numbers to a list. I want all of the indices that contain only one number (17 and 18) to be converted to a list containing a single element (itself).
This code below only works if every string is more than one number separated by a ','.
df['Adj'] = df['Adj'].str.split(',')
This is the output dataframe that I get when I run the above code. Notice that the elements that had only one number are now NaN.
Index Num
0 [2, 6, 7]
1 [1, 3, 6, 7, 8]
2 [2, 4, 7, 8, 9]
3 [3, 5, 8, 9, 10]
4 [4, 9, 10]
5 [1, 2, 7]
6 [1, 2, 3, 6, 8]
7 [2, 3, 4, 7, 9]
8 [3, 4, 5, 8, 10]
9 [4, 5, 9]
10 [2, 3]
11 [1, 3]
12 [1, 2]
13 [2, 3, 4]
14 [1, 3, 4]
15 [1, 2, 4]
16 [1, 2, 3]
17 NaN
18 NaN

Assuming the values in your column are all strings and you just want the individual numbers as a list of str, this should do the trick:
df['Num'].str.strip('()').str.split(',')
# 0 [1, 2, 3, 4, 5]
# 1 [6, 7, 8]
# 2 [9]
# 3 [10, 11, 12]
# Name: Num, dtype: object
Since not all your data are str type, you'll need to coerce them into str first to ensure the string methods are called properly:
df['Num'].astype(str).str.split(',')
# 0 [2, 6, 7]
# 1 [1, 3, 6, 7, 8]
# 2 [2, 4, 7, 8, 9]
# ...
# 16 [1, 2, 3]
# 17 [2]
# 18 [1]
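The NaN values in the question most likely arise because the single-number cells were parsed as integers, and the .str methods return NaN for elements that are not strings. Here is a minimal sketch of that behaviour, assuming a small hypothetical frame with mixed types (the sample data is made up for illustration, not taken from the question):

```python
import pandas as pd

# Hypothetical sample: multi-number cells are strings, single-number cells are ints.
df = pd.DataFrame({"Num": ["2,6,7", "1,3", 2, 1]})

# Without a cast, .str.split yields NaN for the non-string elements.
without_cast = df["Num"].str.split(",")

# Casting to str first gives every row a list, including one-element lists.
with_cast = df["Num"].astype(str).str.split(",")

print(without_cast.isna().tolist())
print(with_cast.tolist())
```

Note that the list elements are still strings; converting them to integers would be a separate step.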


How to store numbers in chunks from an array and create another array or list? [duplicate]

I have two arrays x and y:
x = [2, 3, 1, 1, 2, 5, 7, 3, 6]
y = [0, 0, 4, 2, 4, 5, 8, 4, 5, 6, 7, 0, 5, 3, 2, 8, 1, 3, 1, 0, 4, 2, 4, 5, 4, 4, 5, 6, 7, 0]
I want to create a list "z" that stores chunks of numbers from y, where the size of each chunk is given by the corresponding value in x,
so that z holds the numbers as
z = [[0,0],[4,2,4],[5],[8],[4,5],[6,7,0,5,3],[2,8,1,3,1,0,4],[2,4,5],[4,4,5,6,7,0]]
I tried this loop:
h=[]
for j in x:
    h=[[a] for i in range(j) for a in y[i:i+1]]
But it only stores the result for the last value of x.
Also I am not sure whether the title of this question is appropriate for this problem. Anyone can edit if it is confusing. Thank you so much.
You're reassigning h each time through the loop, so it ends up with just the last iteration's assignment.
You should append to it, not assign it.
h = []
start = 0
for j in x:
    # Take the next j items from y, then advance the start position.
    h.append(y[start:start+j])
    start += j
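The same append-and-advance loop can be wrapped in a small function and run on the data from the question. This is only a sketch; the function and parameter names are made up for illustration:

```python
def split_into_chunks(values, sizes):
    """Split values into consecutive chunks whose lengths are given by sizes."""
    chunks = []
    start = 0
    for size in sizes:
        # Slice out the next chunk, then move the start past it.
        chunks.append(values[start:start + size])
        start += size
    return chunks


x = [2, 3, 1, 1, 2, 5, 7, 3, 6]
y = [0, 0, 4, 2, 4, 5, 8, 4, 5, 6, 7, 0, 5, 3, 2, 8,
     1, 3, 1, 0, 4, 2, 4, 5, 4, 4, 5, 6, 7, 0]

z = split_into_chunks(y, x)
print(z)
```

Because slicing never raises on an out-of-range end, a sizes list that sums to more than len(values) would silently produce short or empty chunks, so it may be worth checking the totals first.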
Another way to do it is to create an iterator over y and consume it as you go, like so:
x = [2, 3, 1, 1, 2, 5, 7, 3, 6]
y = [0, 0, 4, 2, 4, 5, 8, 4, 5, 6, 7, 0, 5, 3, 2, 8, 1, 3, 1, 0, 4, 2, 4, 5, 4, 4, 5, 6, 7, 0]
yi = iter(y)
res = [[next(yi) for _ in range(i)] for i in x]
print(res) # -> [[0, 0], [4, 2, 4], [5], [8], [4, 5], [6, 7, 0, 5, 3], [2, 8, 1, 3, 1, 0, 4], [2, 4, 5], [4, 4, 5, 6, 7, 0]]
Aside from the problem you are facing, and as a general rule to live by, try to give your variables more meaningful names.

Using numpy to select rows based on a condition of one column

I have a file with various columns. Say
1 2 3 4 5 6
2 4 5 6 7 4
3 4 5 6 7 6
2 0 1 5 6 0
2 4 6 8 9 9
I would like to select the rows whose value in column two lies in the range [0, 2] and save them to a new file.
The answer in the new file should be
1 2 3 4 5 6
2 0 1 5 6 0
Kindly assist me. I prefer doing this with numpy in python.
For array a, you can use:
a[(a[:,1] <= 2) & (a[:,1] >= 0)]
Here, the condition filters the values in your second column.
For your example:
>>> a
array([[1, 2, 3, 4, 5, 6],
       [2, 4, 5, 6, 7, 4],
       [3, 4, 5, 6, 7, 6],
       [2, 0, 1, 5, 6, 0],
       [2, 4, 6, 8, 9, 9]])
>>> a[(a[:,1] <= 2) & (a[:,1] >= 0)]
array([[1, 2, 3, 4, 5, 6],
       [2, 0, 1, 5, 6, 0]])
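Since the question also asks about reading from and saving to a file, here is a rough sketch of the full round trip with np.loadtxt and np.savetxt. It assumes whitespace-separated integers and uses in-memory io.StringIO buffers as stand-ins for the real files, so the snippet is self-contained:

```python
import io

import numpy as np

# Stand-in for the input file; with a real file, pass its path to np.loadtxt.
raw = io.StringIO(
    "1 2 3 4 5 6\n"
    "2 4 5 6 7 4\n"
    "3 4 5 6 7 6\n"
    "2 0 1 5 6 0\n"
    "2 4 6 8 9 9\n"
)
a = np.loadtxt(raw, dtype=int)

# Keep the rows whose second column lies in [0, 2].
mask = (a[:, 1] >= 0) & (a[:, 1] <= 2)
selected = a[mask]

# Stand-in for the output file; with a real file, pass its path to np.savetxt.
out = io.StringIO()
np.savetxt(out, selected, fmt="%d")
print(out.getvalue())
```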

Multi-Index two columns, both of which have a common index

I have the following data frame:
Connector Pin Adj.
F123 1 [2, 6, 7]
2 [1, 3, 6, 7, 8]
3 [2, 4, 7, 8, 9]
4 [3, 5, 8, 9, 10]
5 [4, 9, 10]
6 [1, 2, 7]
7 [1, 2, 3, 6, 8]
8 [2, 3, 4, 7, 9]
9 [3, 4, 5, 8, 10]
10 [4, 5, 9]
C137 1 [2, 3]
2 [1, 3]
3 [1, 2]
Connector and Pin already form a multi-index. Is it possible to also multi-index that same Connector value with Adj.?
The code below is what I figured would be the ticket:
df = df.reset_index().set_index(['Connector' , 'Pin'])['Adj.']
df = df.reset_index().set_index(['Connector' , 'Pin']['Connector' , 'Adj.'])
On the second line I get the following error:
TypeError: list indices must be integers or slices, not tuple
First off, I thought this was a list and not a tuple. If it is a tuple, is it possible to convert this already populated tuple to a list of integers and then multi-index it back with Connector?
Update:
I am trying to make it so the dataframe has two columns at the end, not just one column with all 3 indexed together. So, a table that has the index value ('F123', 1) and then another 'Adj.' column that has ('F123', 2), ('F123', 6), ('F123', 7), all of which share the same index ('F123', 1).
Expected Output (I think):
Connector Pin Connector Adj.
F123 1 F123 2
1 F123 6
1 F123 7
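The thread has no answer for this one, so what follows is only a hedged sketch, not a confirmed solution. The TypeError comes from plain Python: a second pair of brackets placed directly after a list subscripts that list, and 'Connector', 'Adj.' inside the brackets is a tuple. One possible way to get one row per adjacent pin is DataFrame.explode, which repeats the (Connector, Pin) index for each list element. The sample frame below is a made-up subset of the data:

```python
import pandas as pd

# Reproduce the error: subscripting a list with a tuple of strings.
keys = ["Connector", "Pin"]
try:
    keys["Connector", "Adj."]
except TypeError as err:
    message = str(err)
print(message)

# Hypothetical subset of the data, indexed by (Connector, Pin).
df = pd.DataFrame(
    {
        "Connector": ["F123", "F123", "C137"],
        "Pin": [1, 2, 1],
        "Adj.": [[2, 6, 7], [1, 3], [2, 3]],
    }
).set_index(["Connector", "Pin"])

# One row per adjacent pin; the (Connector, Pin) index is repeated.
exploded = df.explode("Adj.")
print(exploded)
```

Whether the Connector should then be repeated as an extra column next to Adj. depends on how the result will be used; that part is left open here.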

Python - Delete row in matrix/array if row contains

If you have an x*n matrix how do you check for a row that contains a certain number and if so, how do you delete that row?
If you are using pandas, you can create a mask that you can use to index the dataframe, negating the mask with ~:
import numpy as np
import pandas as pd

df = pd.DataFrame(np.arange(12).reshape(3, 4))
#    0  1   2   3
# 0  0  1   2   3
# 1  4  5   6   7
# 2  8  9  10  11
value = 2
If you want to check if the value is contained in a specific column:
df[~(df[2] == value)]
#    0  1   2   3
# 1  4  5   6   7
# 2  8  9  10  11
Or if it can be contained in any column:
df[~(df == value).any(axis=1)]
#    0  1   2   3
# 1  4  5   6   7
# 2  8  9  10  11
Indexing returns a new DataFrame rather than modifying df in place, so just reassign the result to df afterwards.
This also works if you are using just numpy:
x = np.arange(12).reshape(3, 4)
# array([[ 0,  1,  2,  3],
#        [ 4,  5,  6,  7],
#        [ 8,  9, 10, 11]])
x[~(x == value).any(axis=1)]
# array([[ 4,  5,  6,  7],
#        [ 8,  9, 10, 11]])
And finally, if you are using plain Python and have a list of lists, use the built-in any in a list comprehension:
y = [[0, 1, 2, 3], [4, 5, 6, 7], [8, 9, 10, 11]]
[row for row in y if not any(x == value for x in row)]
# [[4, 5, 6, 7], [8, 9, 10, 11]]

Filter a 2D numpy array from an array of values

Let's say I have the following numpy array:
nonSortedNonFiltered=np.array([[9,8,5,4,6,7,1,2,3],[1,3,2,6,4,5,7,9,8]])
I want to :
- Sort the array according to nonSortedNonFiltered[1]
- Filter the array according to nonSortedNonFiltered[0] and an array of values
I currently do the sorting with :
sortedNonFiltered=nonSortedNonFiltered[:,nonSortedNonFiltered[1].argsort()]
Which gives: np.array([[9, 5, 8, 6, 7, 4, 1, 3, 2], [1, 2, 3, 4, 5, 6, 7, 8, 9]])
Now I want to filter sortedNonFiltered by an array of values, for example:
sortedNonFiltered=np.array([[9, 5, 8, 6, 7, 4, 1, 3, 2], [1, 2, 3, 4, 5, 6, 7, 8, 9]])
listOfValues=np.array([8, 6, 5, 2, 1])
# ...Something here...
# What I want to get in the end:
# np.array([[5, 8, 6, 1, 2], [2, 3, 4, 7, 9]])
Note : Each value in a column of my 2D array is exclusive.
You can use np.in1d to get a boolean mask and use it to filter columns in the sorted array, something like this -
output = sortedNonFiltered[:,np.in1d(sortedNonFiltered[0],listOfValues)]
Sample run -
In [76]: nonSortedNonFiltered
Out[76]:
array([[9, 8, 5, 4, 6, 7, 1, 2, 3],
       [1, 3, 2, 6, 4, 5, 7, 9, 8]])
In [77]: sortedNonFiltered
Out[77]:
array([[9, 5, 8, 6, 7, 4, 1, 3, 2],
       [1, 2, 3, 4, 5, 6, 7, 8, 9]])
In [78]: listOfValues
Out[78]: array([8, 6, 5, 2, 1])
In [79]: sortedNonFiltered[:,np.in1d(sortedNonFiltered[0],listOfValues)]
Out[79]:
array([[5, 8, 6, 1, 2],
       [2, 3, 4, 7, 9]])
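On recent NumPy versions, np.in1d is deprecated (as of NumPy 2.0) in favour of np.isin, which builds the same boolean mask here. A self-contained sketch of the sort-then-filter steps using np.isin; the snake_case variable names are substitutes for the ones in the question:

```python
import numpy as np

non_sorted = np.array([[9, 8, 5, 4, 6, 7, 1, 2, 3],
                       [1, 3, 2, 6, 4, 5, 7, 9, 8]])
values = np.array([8, 6, 5, 2, 1])

# Reorder the columns so that the second row is ascending.
sorted_cols = non_sorted[:, non_sorted[1].argsort()]

# True where the first-row entry is one of the wanted values.
mask = np.isin(sorted_cols[0], values)

# Keep only the matching columns.
output = sorted_cols[:, mask]
print(output)
```

The mask preserves the sorted column order, so the order of the entries in values does not affect the result.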
