How to recursively extract values from a pandas DataFrame?

How to recursively extract values from a pandas DataFrame? - python

I have the following pandas DataFrame:
df = pd.DataFrame([
[3, 2, 5, 2],
[8, 5, 4, 2],
[9, 0, 8, 6],
[9, 2, 7, 1],
[1, 9, 2, 3],
[8, 1, 1, 6],
[8, 8, 0, 0],
[0, 1, 3, 0],
[2, 4, 5, 3],
[4, 0, 9, 7]
])
I am trying to write a recursive function that extracts all the possible paths up until 3 iterations:
and saves them into a list. Several attempts but no results to post.
Desired Output:
[
[0, 3, 9, 4],
[0, 3, 9, 0],
[0, 3, 9, 9],
[0, 3, 9, 7],
[0, 3, 2, 9],
[0, 3, 2, 0],
...
]
Represented as a tree, this is how it looks like:

Since you use numeric naming for both rows and columns in your dataframe, it's faster to convert the frame to a 2-D numpy array. Try this;
arr = df.to_numpy()
staging = [[0]]
result = []
while len(staging) > 0:
s = staging.pop(0)
if len(s) == 4:
result.append(s)
else:
i = s[-1]
for j in range(4):
staging.append(s + [arr[i, j]])

Related

Python - Reshape matrix by taking n consecutive rows every n rows

There is a bunch of questions regarding reshaping of matrices using NumPy here on stackoverflow. I have found one that is closely related to what I am trying to achieve. However, this answer is not general enough for my application. So here we are.
I have got a matrix with millions of lines (shape m x n) that looks like this:
[[0, 0, 0, 0],
[1, 1, 1, 1],
[2, 2, 2, 2],
[3, 3, 3, 3],
[4, 4, 4, 4],
[5, 5, 5, 5],
[6, 6, 6, 6],
[7, 7, 7, 7],
[...]]
From this I would like to go to a shape m/2 x 2n like it can be seen below. For that one has to take n consecutive rows every n rows (in this example n = 2). The blocks of consecutively taken rows are then horizontally stacked to the untouched rows. In this example that would mean:
The first two rows stay like they are.
Take row two and three and horizontally concatenate them to row zero and one.
Take row six and seven and horizontally concatenate them to row four and five. This concatenated block then becomes row two and three.
...
[[0, 0, 0, 0, 2, 2, 2, 2],
[1, 1, 1, 1, 3, 3, 3, 3],
[4, 4, 4, 4, 6, 6, 6, 6],
[5, 5, 5, 5, 7, 7, 7, 7],
[...]]
How would I most efficiently (in terms of the least computation time possible) do that using Numpy? And would it make sense to speed the process up using Numba? Or is there not much to speed up?

Assuming your array's length is divisible by 4, here one way you can do it using numpy.hstack after creating the correct indices for selecting the rows for the "left" and "right" parts of the resulting array:
import numpy
# Create the array
N = 1000*4
a = np.hstack([np.arange(0, N)[:, None]]*4) #shape (4000, 4)
a
array([[ 0, 0, 0, 0],
[ 1, 1, 1, 1],
[ 2, 2, 2, 2],
...,
[3997, 3997, 3997, 3997],
[3998, 3998, 3998, 3998],
[3999, 3999, 3999, 3999]])
left_idx = np.array([np.array([0,1]) + 4*i for i in range(N//4)]).reshape(-1)
right_idx = np.array([np.array([2,3]) + 4*i for i in range(N//4)]).reshape(-1)
r = np.hstack([a[left_idx], a[right_idx]]) #shape (2000, 8)
r
array([[ 0, 0, 0, ..., 2, 2, 2],
[ 1, 1, 1, ..., 3, 3, 3],
[ 4, 4, 4, ..., 6, 6, 6],
...,
[3993, 3993, 3993, ..., 3995, 3995, 3995],
[3996, 3996, 3996, ..., 3998, 3998, 3998],
[3997, 3997, 3997, ..., 3999, 3999, 3999]])

Here's an application of the swapaxes answer in your link.
In [11]: x=np.array([[0, 0, 0, 0],
...: [1, 1, 1, 1],
...: [2, 2, 2, 2],
...: [3, 3, 3, 3],
...: [4, 4, 4, 4],
...: [5, 5, 5, 5],
...: [6, 6, 6, 6],
...: [7, 7, 7, 7]])
break the array into 'groups' with a reshape, keeping the number of columns (4) unchanged.
In [17]: x.reshape(2,2,2,4)
Out[17]:
array([[[[0, 0, 0, 0],
[1, 1, 1, 1]],
[[2, 2, 2, 2],
[3, 3, 3, 3]]],
[[[4, 4, 4, 4],
[5, 5, 5, 5]],
[[6, 6, 6, 6],
[7, 7, 7, 7]]]])
swap the 2 middle dimensions, regrouping rows:
In [18]: x.reshape(2,2,2,4).transpose(0,2,1,3)
Out[18]:
array([[[[0, 0, 0, 0],
[2, 2, 2, 2]],
[[1, 1, 1, 1],
[3, 3, 3, 3]]],
[[[4, 4, 4, 4],
[6, 6, 6, 6]],
[[5, 5, 5, 5],
[7, 7, 7, 7]]]])
Then back to the target shape. This final step creates a copy of the original (the previous steps were view):
In [19]: x.reshape(2,2,2,4).transpose(0,2,1,3).reshape(4,8)
Out[19]:
array([[0, 0, 0, 0, 2, 2, 2, 2],
[1, 1, 1, 1, 3, 3, 3, 3],
[4, 4, 4, 4, 6, 6, 6, 6],
[5, 5, 5, 5, 7, 7, 7, 7]])
It's hard to generalize this, since there are different ways of rearranging blocks. For example my first try produced:
In [16]: x.reshape(4,2,4).transpose(1,0,2).reshape(4,8)
Out[16]:
array([[0, 0, 0, 0, 2, 2, 2, 2],
[4, 4, 4, 4, 6, 6, 6, 6],
[1, 1, 1, 1, 3, 3, 3, 3],
[5, 5, 5, 5, 7, 7, 7, 7]])

How to add a new column in a empty list from another list in python

I have a list
L=[[1, 2, 3, 0, 3, 8], [4, 5, 6, 0, 3, 8], [7, 8, 9, 0, 3, 8]]
another list
col=[0,2,3]
and an empty list M = [].
col list has the index of columns of L list that has to be copied to M.
So M should be [[1,3,0],[4,6,0],[7,9,0]].
How can i do this??
I want M as a dataframe.

>>> L=[[1, 2, 3, 0, 3, 8], [4, 5, 6, 0, 3, 8], [7, 8, 9, 0, 3, 8]]
>>> col=[0,2,3]
>>> M = [[nums[i] for i in col] for nums in L]
>>> M
[[1, 3, 0], [4, 6, 0], [7, 9, 0]]

With numpy you can use a list as list index:
>>> import numpy as np
>>> L=np.array([[1, 2, 3, 0, 3, 8], [4, 5, 6, 0, 3, 8], [7, 8, 9, 0, 3, 8]])
>>> col=[0,2,3]
>>> M = [row[col] for row in L]
>>> M
[array([1, 3, 0]), array([4, 6, 0]), array([7, 9, 0])]
>>> M = [list(row[col]) for row in L]
>>> M
[[1, 3, 0], [4, 6, 0], [7, 9, 0]]

You can use operator.itemgetter along with simple list comprehension to fetch your desired elements
>>> from operator import itemgetter
>>> L = [[1, 2, 3, 0, 3, 8], [4, 5, 6, 0, 3, 8], [7, 8, 9, 0, 3, 8]]
>>> col = col=[0,2,3]
>>> M = [list(itemgetter(*col)(i)) for i in l]
>>> M
[[1, 3, 0], [4, 6, 0], [7, 9, 0]]
To Convert it to DataFrame you can do
>>> import pandas as pd
>>> df = pd.DataFrame(M)
>>> df
0 1 2
0 1 3 0
1 4 6 0
2 7 9 0

Just to put it out there, the enumerate solution:
L=[[1, 2, 3, 0, 3, 8], [4, 5, 6, 0, 3, 8], [7, 8, 9, 0, 3, 8]]
col=[0,2,3]
solution = [[j for i, j in enumerate(sub) if i in col] for sub in L]
#[[1, 3, 0], [4, 6, 0], [7, 9, 0]]
This would work faster if col was a set:
col={0,2,3}

Sum a staggered array "columns" in this way

Let's say I have the following array
import numpy as np
matrix = np.array([
[[1, 2, 3, 4], [0, 1], [2, 3, 4, 5]],
[[1, 2, 3], [4], [0, 1], [2, 0], [0, 0]],
[[2, 2], [3, 4, 0], [1, 1, 0, 0], [0]],
[[6, 3, 3, 4, 0], [4, 2, 3, 4, 5]],
[[1, 2, 3, 2], [0, 1, 2], [3, 4, 5]]])
As you can see, it's a staggered array. What I want to do is to sum the elements in a way so that the output is:
[11, 11, 15, 18, 0, 8, 9, 9, 12, 15]
I want to sum the elements in the "columns" of the matrix, but I don't know how to do it.

As mentioned by juanpa.arrivillaga in the comments, you don't have a multi-dimensional array, you have a 1-D array of lists of lists. You need to flatten the inner lists first :
>>> np.array([[z for y in x for z in y] for x in matrix])
array([[1, 2, 3, 4, 0, 1, 2, 3, 4, 5],
[1, 2, 3, 4, 0, 1, 2, 0, 0, 0],
[2, 2, 3, 4, 0, 1, 1, 0, 0, 0],
[6, 3, 3, 4, 0, 4, 2, 3, 4, 5],
[1, 2, 3, 2, 0, 1, 2, 3, 4, 5]])
It should be much easier to solve your problem now. This matrix has a shape of (5,10), and supports T for transposition and np.sum() for summing rows or columns.
You didn't write any code, so I won't solve the problem completely, but with this matrix, you're one step away from:
array([11, 11, 15, 18, 0, 8, 9, 9, 12, 15])

calculations for different columns in a numpy array

I have a 2D array with filled with some values (column 0) and zeros (rest of the columns). I would like to do pretty much the same as I do with MS excel but using numpy, meaning to put into the rest of the columns values from calculations based on the first column. Here it is a MWE:
import numpy as np
a = np.zeros(20, dtype=np.int8).reshape(4,5)
b = [1, 2, 3, 4]
b = np.array(b)
a[:, 0] = b
# don't change the first column
for column in a[:, 1:]:
a[:, column] = column[0]+1
The expected output:
array([[1, 2, 3, 4, 5],
[2, 3, 4, 5, 6],
[3, 4, 5, 6, 7],
[4, 5, 6, 7, 8]], dtype=int8)
The resulting output:
array([[1, 0, 0, 0, 0],
[1, 0, 0, 0, 0],
[1, 0, 0, 0, 0],
[1, 0, 0, 0, 0]], dtype=int8)
Any help would be appreciated.

Looping is slow and there is no need to loop to produce the array that you want:
>>> a = np.ones(20, dtype=np.int8).reshape(4,5)
>>> a[:, 0] = b
>>> a
array([[1, 1, 1, 1, 1],
[2, 1, 1, 1, 1],
[3, 1, 1, 1, 1],
[4, 1, 1, 1, 1]], dtype=int8)
>>> np.cumsum(a, axis=1)
array([[1, 2, 3, 4, 5],
[2, 3, 4, 5, 6],
[3, 4, 5, 6, 7],
[4, 5, 6, 7, 8]])
What went wrong
Let's start, as in the question, with this array:
>>> a
array([[1, 0, 0, 0, 0],
[2, 0, 0, 0, 0],
[3, 0, 0, 0, 0],
[4, 0, 0, 0, 0]], dtype=int8)
Now, using the code from the question, let's do the loop and see what column actually is:
>>> for column in a[:, 1:]:
... print(column)
...
[0 0 0 0]
[0 0 0 0]
[0 0 0 0]
[0 0 0 0]
As you can see, column is not the index of the column but the actual values in the column. Consequently, the following does not do what you would hope:
a[:, column] = column[0]+1
Another method
If we want to loop (so that we can do something more complex), here is another approach to generating the desired array:
>>> b = np.array([1, 2, 3, 4])
>>> np.column_stack([b+i for i in range(5)])
array([[1, 2, 3, 4, 5],
[2, 3, 4, 5, 6],
[3, 4, 5, 6, 7],
[4, 5, 6, 7, 8]])

Your usage of column is a little ambiguous: in for column in a[:, 1:], it is treated as a column and in the body, however, it is treated as index to the column. You can try this instead:
for column in range(1, a.shape[1]):
a[:, column] = a[:, column-1]+1
a
#array([[1, 2, 3, 4, 5],
# [2, 3, 4, 5, 6],
# [3, 4, 5, 6, 7],
# [4, 5, 6, 7, 8]], dtype=int8)

Python list of list of lists

I don't have a real reason for doing this, other than to gain understanding, but I'm trying to create a list of lists of lists using list comprehension.
I can create a list of lists just fine:
In[1]: [j for j in [range(3,k) for k in [k for k in range(5,10)]]]
Out[1]: [[3, 4], [3, 4, 5], [3, 4, 5, 6], [3, 4, 5, 6, 7], [3, 4, 5, 6, 7, 8]]
And I can create a list of lists of lists from either the results of that, for example:
In [2]: [range(0,i) for i in [3,4]]
Out[2]: [[0, 1, 2], [0, 1, 2, 3]]
In [3]: [range(0,i) for i in j]
Out[3]:
[[0, 1, 2],
[0, 1, 2, 3],
[0, 1, 2, 3, 4],
[0, 1, 2, 3, 4, 5],
[0, 1, 2, 3, 4, 5, 6],
[0, 1, 2, 3, 4, 5, 6, 7]]
But when I try to combine it into a single statement it goes awry:
In [4]: [range(0,i) for i in [j for j in [range(3,k) for k in [k for k in range(5,10)]]]]
---------------------------------------------------------------------------
TypeError: range() integer end argument expected, got list.
Am I missing some brackets somewhere?

Try the following:
[[range(0, j) for j in range(3, i)] for i in range(5, 10)]
This results in the following list of lists of lists:
>>> pprint.pprint([[range(0, j) for j in range(3, i)] for i in range(5, 10)])
[[[0, 1, 2], [0, 1, 2, 3]],
[[0, 1, 2], [0, 1, 2, 3], [0, 1, 2, 3, 4]],
[[0, 1, 2], [0, 1, 2, 3], [0, 1, 2, 3, 4], [0, 1, 2, 3, 4, 5]],
[[0, 1, 2],
[0, 1, 2, 3],
[0, 1, 2, 3, 4],
[0, 1, 2, 3, 4, 5],
[0, 1, 2, 3, 4, 5, 6]],
[[0, 1, 2],
[0, 1, 2, 3],
[0, 1, 2, 3, 4],
[0, 1, 2, 3, 4, 5],
[0, 1, 2, 3, 4, 5, 6],
[0, 1, 2, 3, 4, 5, 6, 7]]]
The best way to understand what is happening in a list comprehension is to try to roll it out into normal for loops, lets try that with yours and then mine to see what the difference is:
x = [range(0,i) for i in [j for j in [range(3,k) for k in [k for k in range(5,10)]]]]
# equivalent to
a, b, c, x = [], [], [], []
for k in range(5, 10):
a.append(k)
for k in a:
b.append(range(3, k))
for j in b:
c.append(j)
for i in c:
x.append(range(0, i))
At the end of this x would be equivalent to your list comprehension, however of course this code will not work because b (and c) will be lists of lists, so i will be a list and range(0, i) will cause an error. Now obviously this is not what you intended to do, since what you would really like to see is those for loops nested instead of one after the other.
Lets look at how mine works:
x = [[range(0, j) for j in range(3, i)] for i in range(5, 10)]
# equivalent to
x = []
for i in range(5, 10):
a = []
for j in range(3, i):
a.append(range(0, j)):
x.append(a)
Hope this helped to clarify!

From your question:
In[1]: [j for j in [range(3,k) for k in [k for k in range(5,10)]]]
Out[1]: [[3, 4], [3, 4, 5], [3, 4, 5, 6], [3, 4, 5, 6, 7], [3, 4, 5, 6, 7, 8]]
range takes integer parameters
So when you do
[range(0,i) for i in [j for j in [range(3,k) for k in [k for k in range(5,10)]]]]
it is the equivalent of saying
L = []
for j in [[3, 4], [3, 4, 5], [3, 4, 5, 6], [3, 4, 5, 6, 7], [3, 4, 5, 6, 7, 8]]:
L.append(range(0,i))
Of course, this will fail because each i is a list and range doesn't take list parameters.
Other answers here show you how to fix your error. This response was to explain what went wrong with your initial approach
Hope this helps

To troubleshoot this, I ran each step of the list comprehension at a time.
>>> [k for k in range(5,10)]
[5, 6, 7, 8, 9]
>>> [range(3,k) for k in [k for k in range(5,10)]]
[[3, 4], [3, 4, 5], [3, 4, 5, 6], [3, 4, 5, 6, 7], [3, 4, 5, 6, 7, 8]]
Your problem is here, because it feeds lists to the next range() instead of ints:
>>> [j for j in [range(3,k) for k in [k for k in range(5,10)]]]
[[3, 4], [3, 4, 5], [3, 4, 5, 6], [3, 4, 5, 6, 7], [3, 4, 5, 6, 7, 8]]
You can flatten this list using lambda and reduce:
# reduce(lambda x,y: x+y,l)
>>> reduce(lambda x,y: x+y, [range(3,k) for k in [k for k in range(5,10)]])
[3, 4, 3, 4, 5, 3, 4, 5, 6, 3, 4, 5, 6, 7, 3, 4, 5, 6, 7, 8]
But you will still only nest two lists deep:
>>> [range(0,i) for i in reduce(lambda x,y: x+y, [range(3,k) for k in [k for k in range(5,10)]])]
[[0, 1, 2], [0, 1, 2, 3], [0, 1, 2], [0, 1, 2, 3], [0, 1, 2, 3, 4], [0, 1, 2], [0, 1, 2, 3], [0, 1, 2, 3, 4], [0, 1, 2, 3, 4, 5], [0, 1, 2], [0, 1, 2, 3], [0, 1, 2, 3, 4], [0, 1, 2, 3, 4, 5], [0, 1, 2, 3, 4, 5, 6], [0, 1, 2], [0, 1, 2, 3], [0, 1, 2, 3, 4], [0, 1, 2, 3, 4, 5], [0, 1, 2, 3, 4, 5, 6], [0, 1, 2, 3, 4, 5, 6, 7]]
If you want to go three lists deep, you need to reconsider your program flow. List comprehensions are best suited for working with the outermost objects in an iterator. If you used list comprehensions on the left side of the for statement as well as the right, you could nest more deeply:
>>> [[[range(j, 5) for j in range(5)] for i in range(5)] for k in range(5)]

That's because:
[range(0,i) for i in [j for j in [range(3,k) for k in [k for k in range(5,10)]]]]
^here i == j == range(3,k) - it's a range, not integer
You probably wanted to do:
[range(0,j) for i in [j for j in [range(3,k) for k in [k for k in range(5,10)]]] for j in i]
^ i is still a list, iterate it with j here
You do of course know that your first statement can be reduced to :
[range(3,k) for k in range(5,10)]
??

[range(0,i) for i in [j for j in [range(3,k) for k in [k for k in range(5,10)]]]]
Because i is a list, you need list comprehend on that too:
[[range(0,h) for h in i] for i in [...]]
In your code:
print [
[range(0,h) for h in i] for i in [ # get a list for i here. Need to iterate again on i!
j for j in [ # [[3, 4], [3, 4, 5], [3, 4, 5, 6], [3, 4, 5, 6, 7], [3, 4, 5, 6, 7, 8]]
range(3,k) for k in [
k for k in range(5,10) # [5,6,7,8,9]
]
]
]
]
# output:
[[[0, 1, 2], [0, 1, 2, 3]], [[0, 1, 2], [0, 1, 2, 3], [0, 1, 2, 3, 4]], [[0, 1, 2], [0, 1, 2, 3], [0, 1, 2, 3, 4], [0, 1, 2, 3, 4, 5]], [[0, 1, 2], [0, 1, 2, 3], [0, 1, 2, 3, 4], [0, 1, 2, 3, 4, 5], [0, 1, 2, 3, 4, 5, 6]], [[0, 1, 2], [0, 1, 2, 3], [0, 1, 2, 3, 4], [0, 1, 2, 3, 4, 5], [0, 1, 2, 3, 4, 5, 6], [0, 1, 2, 3, 4, 5, 6, 7]]]
Also, you have one unnecessary comprehension in there. This will produce the same result:
print [
[range(0,i) for i in j] for j in [ # get a list for j here. Need to iterate again on j!
range(3,k) for k in [ # [[3, 4], [3, 4, 5], [3, 4, 5, 6], [3, 4, 5, 6, 7], [3, 4, 5, 6, 7, 8]]
k for k in range(5,10) # [5,6,7,8,9]
]
]
]
As an aside, this is about the point where a list comprehension becomes less desirable than a recursive function or simple nested for loops, if only for the sake of code readability.

Apparently, in the first range (range(0,i)), i is [[3, 4], [3, 4, 5], [3, 4, 5, 6], [3, 4, 5, 6, 7], [3, 4, 5, 6, 7, 8]]
You probably need to flatten that list before you call the range function with it.

We Keep Coding

Python is a programming language that lets you work quickly and integrate systems more effectively.

How to recursively extract values from a pandas DataFrame? - python

Related

Python - Reshape matrix by taking n consecutive rows every n rows

How to add a new column in a empty list from another list in python

Sum a staggered array "columns" in this way

calculations for different columns in a numpy array

Python list of list of lists

Categories

Resources