Convert pandas DataFrame into list of lists [duplicate] - python

This question already has answers here:
Pandas DataFrame to List of Lists
(14 answers)
Closed 3 years ago.
I have a pandas data frame like this:
admit gpa gre rank
0 3.61 380 3
1 3.67 660 3
1 3.19 640 4
0 2.93 520 4
Now I want to get a list of rows in pandas like:
[[0,3.61,380,3], [1,3.67,660,3], [1,3.19,640,4], [0,2.93,520,4]]
How can I do it?

There is a built-in method, which is also the fastest one: call tolist() on the .values NumPy array:
df.values.tolist()
[[0.0, 3.61, 380.0, 3.0],
[1.0, 3.67, 660.0, 3.0],
[1.0, 3.19, 640.0, 4.0],
[0.0, 2.93, 520.0, 4.0]]

You can also do it like this (on Python 3, wrap the call in list(), since map returns an iterator there):
list(map(list, df.values))

EDIT: as_matrix has been deprecated since version 0.23.0.
You can use the built-in values attribute or the to_numpy() method (the recommended option) on the DataFrame:
In [8]:
df.to_numpy()
Out[8]:
array([[ 0.9, 7. , 5.2, ..., 13.3, 13.5, 8.9],
[ 0.9, 7. , 5.2, ..., 13.3, 13.5, 8.9],
[ 0.8, 6.1, 5.4, ..., 15.9, 14.4, 8.6],
...,
[ 0.2, 1.3, 2.3, ..., 16.1, 16.1, 10.8],
[ 0.2, 1.3, 2.4, ..., 16.5, 15.9, 11.4],
[ 0.2, 1.3, 2.4, ..., 16.5, 15.9, 11.4]])
If you explicitly want lists and not a NumPy array, add .tolist():
df.to_numpy().tolist()

Related

How to pass argument of type char ** from Python to C API [duplicate]

As seen in How do I convert a Python list into a C array by using ctypes?, this code takes a Python list and transforms it into a C array.
import ctypes
arr = (ctypes.c_int * len(pyarr))(*pyarr)
What would be the way of doing the same with a list of lists, or a list of lists of lists?
For example, for the following variable
list3d = [[[40.0, 1.2, 6.0, 0.3], [50.0, 4.2, 0, 0]], [[40.0, 1.2, 6.0, 0.3], [50.0, 4.2, 0, 0]], [[40.0, 1.2, 6.0, 0.3], [50.0, 4.2, 0, 0]]]
I have tried the following with no luck:
([[ctypes.c_double * 4] *2]*3)(*list3d)
# *** TypeError: 'list' object is not callable
(ctypes.c_double * 4 *2 *3)(*list3d)
# *** TypeError: expected c_double_Array_4_Array_2 instance, got list
Thank you!
EDIT: Just to clarify, I am trying to get one object that contains the whole multidimensional array, not a list of objects. This object's reference will be an input to a C DLL that expects a 3D array.
It works with tuples if you don't mind doing a bit of conversion first:
from ctypes import *
list3d = [
[[0.0, 1.0, 2.0, 3.0], [4.0, 5.0, 6.0, 7.0]],
[[0.2, 1.2, 2.2, 3.2], [4.2, 5.2, 6.2, 7.2]],
[[0.4, 1.4, 2.4, 3.4], [4.4, 5.4, 6.4, 7.4]],
]
arr = (c_double * 4 * 2 * 3)(*(tuple(tuple(j) for j in i) for i in list3d))
Check that it's initialized correctly in row-major order:
>>> (c_double * 24).from_buffer(arr)[:]
[0.0, 1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0,
0.2, 1.2, 2.2, 3.2, 4.2, 5.2, 6.2, 7.2,
0.4, 1.4, 2.4, 3.4, 4.4, 5.4, 6.4, 7.4]
Or you can create an empty array and initialize it using a loop. enumerate over the rows and columns of the list and assign the data to a slice:
arr = (c_double * 4 * 2 * 3)()
for i, row in enumerate(list3d):
    for j, col in enumerate(row):
        arr[i][j][:] = col
I made the change accordingly
a = [[[40.0, 1.2, 6.0, 0.3], [50.0, 4.2, 0, 0]], [[40.0, 1.2, 6.0, 0.3], [50.0, 4.2, 0, 0]], [[40.0, 1.2, 6.0, 0.3], [50.0, 4.2, 0, 0]]]
arr = ((ctypes.c_float * len(a[0][0])) * len(a[0])) * len(a)
arr_instance = arr()
for i in range(len(a)):
    for j in range(len(a[0])):
        for k in range(len(a[0][0])):
            arr_instance[i][j][k] = a[i][j][k]
The arr_instance is what you want.
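A shorter route (my own addition, not from the answers above) is to let NumPy build the nested ctypes array: np.ctypeslib.as_ctypes converts a C-contiguous float64 array directly into the corresponding nested c_double array type. A minimal sketch:
import numpy as np

list3d = [[[40.0, 1.2, 6.0, 0.3], [50.0, 4.2, 0, 0]]] * 3

np_arr = np.array(list3d, dtype=np.float64)   # shape (3, 2, 4), C-contiguous
c_arr = np.ctypeslib.as_ctypes(np_arr)        # instance of c_double_Array_4_Array_2_Array_3
print(c_arr[0][1][0])                         # 50.0
Note that c_arr shares memory with np_arr, so keep np_arr alive while the C code uses the array.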

Convert an array stored as a string, to a proper numpy array

For long and tedious reasons, I have lots of arrays that are stored as strings:
tmp = '[[1.0, 3.0, 0.4]\n [3.0, 4.0, -1.0]\n [3.0, 4.0, 0.1]\n [3.0, 4.0, 0.2]]'
Now I obviously do not want my arrays as long strings, I want them as proper numpy arrays so I can use them. Consequently, what is a good way to convert the above to:
tmp_np = np.array([[1.0, 3.0, 0.4],
                   [3.0, 4.0, -1.0],
                   [3.0, 4.0, 0.1],
                   [3.0, 4.0, 0.2]])
such that I can do simple things like tmp_np.shape = (4,3) or simple indexing tmp_np[0,:] = [1.0, 3.0, 0.4] etc.
Thanks
You can use ast.literal_eval if you first replace the \n characters with commas:
import ast
tmp_np = np.array(ast.literal_eval(tmp.replace('\n', ',')))
Returns:
>>> tmp_np
array([[ 1. , 3. , 0.4],
[ 3. , 4. , -1. ],
[ 3. , 4. , 0.1],
[ 3. , 4. , 0.2]])
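An alternative (my own sketch, not from the answer above) that avoids literal_eval: strip the brackets and commas, let NumPy parse the flat whitespace-separated numbers, then restore the shape:
import numpy as np

tmp = '[[1.0, 3.0, 0.4]\n [3.0, 4.0, -1.0]\n [3.0, 4.0, 0.1]\n [3.0, 4.0, 0.2]]'

cleaned = tmp.replace('[', ' ').replace(']', ' ').replace(',', ' ')
tmp_np = np.array(cleaned.split(), dtype=float).reshape(-1, 3)  # 3 columns, as in the string
print(tmp_np.shape)  # (4, 3)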

How to efficiently generate matrix from vector in numpy? [duplicate]

This question already has answers here:
Most efficient way to map function over numpy array
(11 answers)
Numpy vectorize function with non-scalar output
(1 answer)
Closed 5 years ago.
I have a function f(x): [0, 1] → Rⁿ, for example:
>>> f(0.54)
array([0.2, 0.3, 4.0, 5.2, ... , 1.0])
How can I efficiently apply that to a vector, in order to generate a matrix?
Example:
>>> f([0.54, 0.32, 0.56, 0.21])
array([[0.2, 0.3, 4.0, 5.2, ... , 1.0],
       [0.6, 0.1, 0.0, 2.3, ... , 4.7],
       [0.1, 7.1, 0.2, 4.9, ... , 3.1],
       [1.3, 2.8, 1.2, 1.1, ... , 5.3]])
Note: numpy solutions are very welcome :)
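The question was closed as a duplicate, so no answer is included here; a minimal sketch of the two usual approaches (my own addition, with a hypothetical f used only for illustration):
import numpy as np

def f(x):
    # hypothetical scalar -> R^4 function, only for illustration
    return np.array([x, x ** 2, np.sin(x), 1.0 - x])

xs = np.array([0.54, 0.32, 0.56, 0.21])

# 1) Stack the per-element results; simple and usually fast enough.
mat = np.array([f(x) for x in xs])            # shape (4, 4)

# 2) If f is built from broadcasting-friendly operations, rewrite it to take
#    the whole vector at once and build the matrix without a Python-level loop.
mat2 = np.stack([xs, xs ** 2, np.sin(xs), 1.0 - xs], axis=1)

assert np.allclose(mat, mat2)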

how to merge the values of a list of lists and a list into 1 resulting list of lists

I have a list of lists (a) and a list (b) which have the same "length" (in this case "4"):
a = [
[1.0, 2.0],
[1.1, 2.1],
[1.2, 2.2],
[1.3, 2.3]
]
b = [3.0, 3.1, 3.2, 3.3]
I would like to merge the values to obtain the following (c):
c = [
[1.0, 2.0, 3.0],
[1.1, 2.1, 3.1],
[1.2, 2.2, 3.2],
[1.3, 2.3, 3.3]
]
currently I'm doing the following to achieve it:
c = []
for index, elem in enumerate(a):
    x = [a[index], [b[index]]]  # x assigned here for better readability
    c.append(sum(x, []))
my feeling is that there is an elegant way to do this...
note: the lists are a lot larger, for simplicity I shortened them. they are always(!) of the same length.
In Python 3.5+, use zip() within a list comprehension together with iterable unpacking:
In [7]: [[*j, i] for i, j in zip(b, a)]
Out[7]: [[1.0, 2.0, 3.0], [1.1, 2.1, 3.1], [1.2, 2.2, 3.2], [1.3, 2.3, 3.3]]
In Python 2:
In [8]: [j+[i] for i, j in zip(b, a)]
Out[8]: [[1.0, 2.0, 3.0], [1.1, 2.1, 3.1], [1.2, 2.2, 3.2], [1.3, 2.3, 3.3]]
Or use numpy.column_stack:
In [16]: import numpy as np
In [17]: np.column_stack((a, b))
Out[17]:
array([[ 1. , 2. , 3. ],
[ 1.1, 2.1, 3.1],
[ 1.2, 2.2, 3.2],
[ 1.3, 2.3, 3.3]])
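Note (my own addition): np.column_stack returns a NumPy array; if you want the plain nested-list form of c shown in the question, append .tolist():
c = np.column_stack((a, b)).tolist()
# [[1.0, 2.0, 3.0], [1.1, 2.1, 3.1], [1.2, 2.2, 3.2], [1.3, 2.3, 3.3]]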

Creating a Pandas rolling-window series of arrays

Suppose I have the following code:
import numpy as np
import pandas as pd
x = np.array([1.0, 1.1, 1.2, 1.3, 1.4])
s = pd.Series(x, index=[1, 2, 3, 4, 5])
This produces the following s:
1 1.0
2 1.1
3 1.2
4 1.3
5 1.4
Now what I want to create is a rolling window of size n, but I don't want to take the mean or standard deviation of each window, I just want the arrays. So, suppose n = 3. I want a transformation that outputs the following series given the input s:
1 array([1.0, nan, nan])
2 array([1.1, 1.0, nan])
3 array([1.2, 1.1, 1.0])
4 array([1.3, 1.2, 1.1])
5 array([1.4, 1.3, 1.2])
How do I do this?
Here's one way to do it
In [294]: arr = [s.shift(x).values[::-1][:3] for x in range(len(s))[::-1]]
In [295]: arr
Out[295]:
[array([ 1., nan, nan]),
array([ 1.1, 1. , nan]),
array([ 1.2, 1.1, 1. ]),
array([ 1.3, 1.2, 1.1]),
array([ 1.4, 1.3, 1.2])]
In [296]: pd.Series(arr, index=s.index)
Out[296]:
1 [1.0, nan, nan]
2 [1.1, 1.0, nan]
3 [1.2, 1.1, 1.0]
4 [1.3, 1.2, 1.1]
5 [1.4, 1.3, 1.2]
dtype: object
Here's a vectorized approach using NumPy broadcasting -
n = 3  # window length
idx = np.arange(n)[::-1] + np.arange(len(s))[:,None] - n + 1
out = s.values[idx]  # the original used s.get_values(), which is removed in current pandas
out[idx < 0] = np.nan
This gets you the output as a 2D array.
To get a series with each element holding each window as a list -
In [40]: pd.Series(out.tolist())
Out[40]:
0 [1.0, nan, nan]
1 [1.1, 1.0, nan]
2 [1.2, 1.1, 1.0]
3 [1.3, 1.2, 1.1]
4 [1.4, 1.3, 1.2]
dtype: object
If you wish to split the 2D output into a list of arrays, one per window, you can use np.split, like so -
out_split = np.split(out,out.shape[0],axis=0)
Sample run -
In [100]: s
Out[100]:
1 1.0
2 1.1
3 1.2
4 1.3
5 1.4
dtype: float64
In [101]: n = 3
In [102]: idx = np.arange(n)[::-1] + np.arange(len(s))[:,None] - n + 1
...: out = s.values[idx]
...: out[idx<0] = np.nan
...:
In [103]: out
Out[103]:
array([[ 1. , nan, nan],
[ 1.1, 1. , nan],
[ 1.2, 1.1, 1. ],
[ 1.3, 1.2, 1.1],
[ 1.4, 1.3, 1.2]])
In [104]: np.split(out,out.shape[0],axis=0)
Out[104]:
[array([[ 1., nan, nan]]),
array([[ 1.1, 1. , nan]]),
array([[ 1.2, 1.1, 1. ]]),
array([[ 1.3, 1.2, 1.1]]),
array([[ 1.4, 1.3, 1.2]])]
Memory-efficiency with strides
For memory efficiency, we can use a strided helper, strided_axis0, similar to @B. M.'s solution below, but a bit more generic.
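strided_axis0 is not defined in this excerpt; a minimal sketch of what such a helper could look like, reconstructed from how it is called below (an assumption, not the author's original code):
import numpy as np

def strided_axis0(a, fillval, L):
    # Pad L-1 fill values in front, then build a strided view of every
    # length-L window along axis 0 (no copy for the windows themselves).
    a_ext = np.concatenate(([fillval] * (L - 1), a))
    n = a_ext.strides[0]
    return np.lib.stride_tricks.as_strided(a_ext, shape=(len(a), L), strides=(n, n))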
So, to get the 2D array of values with NaNs preceding the first element -
In [35]: strided_axis0(s.values, fillval=np.nan, L=3)
Out[35]:
array([[nan, nan, 1. ],
[nan, 1. , 1.1],
[1. , 1.1, 1.2],
[1.1, 1.2, 1.3],
[1.2, 1.3, 1.4]])
To get the 2D array with the NaN fillers coming after the original elements in each row and the element order flipped, as stated in the problem -
In [36]: strided_axis0(s.values, fillval=np.nan, L=3)[:,::-1]
Out[36]:
array([[1. , nan, nan],
[1.1, 1. , nan],
[1.2, 1.1, 1. ],
[1.3, 1.2, 1.1],
[1.4, 1.3, 1.2]])
To get a series with each element holding each window as a list, simply wrap the earlier output with pd.Series(out.tolist()), where out is the 2D array -
In [38]: pd.Series(strided_axis0(s.values, fillval=np.nan, L=3)[:,::-1].tolist())
Out[38]:
0 [1.0, nan, nan]
1 [1.1, 1.0, nan]
2 [1.2, 1.1, 1.0]
3 [1.3, 1.2, 1.1]
4 [1.4, 1.3, 1.2]
dtype: object
Your data looks like a strided array:
data = np.lib.stride_tricks.as_strided(np.concatenate(([np.nan]*2, s))[2:], (5, 3), (8, -8))
"""
array([[ 1. , nan, nan],
[ 1.1, 1. , nan],
[ 1.2, 1.1, 1. ],
[ 1.3, 1.2, 1.1],
[ 1.4, 1.3, 1.2]])
"""
Then transform into a Series:
pd.Series(list(map(list, data)))
"""
0 [1.0, nan, nan]
1 [1.1, 1.0, nan]
2 [1.2, 1.1, 1.0]
3 [1.3, 1.2, 1.1]
4 [1.4, 1.3, 1.2]
dtype: object
""""
If you attach the missing NaNs at the beginning of the series, you can use a simple sliding window:
def wndw(s, size=3):
    # pad with size-1 NaNs in front so the first windows are only partially filled
    stretched = np.hstack([
        np.array([np.nan] * (size - 1)),
        s.values,
    ])
    for begin in range(len(s)):  # one window per original element
        end = begin + size
        yield stretched[begin:end][::-1]

for arr in wndw(s, 3):
    print(arr)
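To collect the windows back into a Series indexed like s (my own follow-up, not part of the answer above):
result = pd.Series(list(wndw(s, 3)), index=s.index)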
