How to one-hot-encode sentences at the character level? - python

I would like to convert a sentence to an array of one-hot vectors.
These vectors would be the one-hot representation of the alphabet.
It would look like the following:
"hello"  # h=7, e=4, l=11, o=14
would become
[[0, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0]
[0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0]
[0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0]
[0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0]
[0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0]]
Unfortunately, sklearn's OneHotEncoder does not take strings as input.
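(As an aside, recent scikit-learn releases, roughly 0.20 onward, do accept string categories in OneHotEncoder; a minimal sketch, in case upgrading is an option:)
import string
from sklearn.preprocessing import OneHotEncoder

# Fix the categories to the lowercase alphabet so every row has 26 columns
enc = OneHotEncoder(categories=[list(string.ascii_lowercase)])
one_hot = enc.fit_transform([[c] for c in "hello"]).toarray()  # shape (5, 26)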

Just compare the letters in the passed string to a given alphabet:
import string

def string_vectorizer(strng, alphabet=string.ascii_lowercase):
    vector = [[0 if char != letter else 1 for char in alphabet]
              for letter in strng]
    return vector
Note that with a custom alphabet (e.g. "defbcazk"), the columns will be ordered as the elements appear in that alphabet.
The output of string_vectorizer('hello'):
[[0, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0],
[0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0],
[0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0],
[0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0],
[0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0]]

This is a common task with Recurrent Neural Networks, and TensorFlow has a function for exactly this purpose, tf.one_hot, if you'd like to use it.
import numpy as np
import tensorflow as tf

# Map characters to indices (a partial map, enough for 'hello')
alphabets = {'a': 0, 'b': 1, 'c': 2, 'd': 3, 'e': 4, 'f': 5, 'g': 6, 'h': 7,
             'i': 8, 'j': 9, 'k': 10, 'l': 11, 'm': 12, 'n': 13, 'o': 14}
idxs = [alphabets[ch] for ch in 'hello']
print(idxs)
# [7, 4, 11, 11, 14]

# Divakar's approach (see the NumPy answer below):
idxs = np.fromstring("hello", dtype=np.uint8) - 97
# or, more readably:
idxs = np.fromstring('hello', dtype=np.uint8) - ord('a')

one_hot = tf.one_hot(idxs, 26, dtype=tf.uint8)
sess = tf.InteractiveSession()  # TensorFlow 1.x API
In [15]: one_hot.eval()
Out[15]:
array([[0, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0],
[0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0],
[0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0],
[0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0],
[0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0]], dtype=uint8)
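If you are on TensorFlow 2.x, sessions and eval() are gone and execution is eager; a minimal sketch of the same idea (the index computation here is plain ord() arithmetic, not from the original answer):
import tensorflow as tf

idxs = [ord(c) - ord('a') for c in "hello"]  # [7, 4, 11, 11, 14]
one_hot = tf.one_hot(idxs, 26, dtype=tf.uint8)
print(one_hot.numpy())  # same (5, 26) array as above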

With pandas, you can use pd.get_dummies by passing a categorical Series:
import pandas as pd
import string

s = 'hello'
low = string.ascii_lowercase
# older-pandas syntax; see the pd.Categorical variant below for recent versions
pd.get_dummies(pd.Series(list(s)).astype('category', categories=list(low)))
Out:
a b c d e f g h i j ... q r s t u v w x y z
0 0 0 0 0 0 0 0 1 0 0 ... 0 0 0 0 0 0 0 0 0 0
1 0 0 0 0 1 0 0 0 0 0 ... 0 0 0 0 0 0 0 0 0 0
2 0 0 0 0 0 0 0 0 0 0 ... 0 0 0 0 0 0 0 0 0 0
3 0 0 0 0 0 0 0 0 0 0 ... 0 0 0 0 0 0 0 0 0 0
4 0 0 0 0 0 0 0 0 0 0 ... 0 0 0 0 0 0 0 0 0 0
[5 rows x 26 columns]
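Newer pandas no longer accepts categories as a keyword to astype; a sketch of the same result on recent versions, using pd.Categorical instead:
import string
import pandas as pd

s = 'hello'
low = string.ascii_lowercase
# the Categorical carries the full alphabet, so get_dummies emits all 26 columns
pd.get_dummies(pd.Categorical(list(s), categories=list(low)))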

Here's a vectorized approach using NumPy broadcasting to give us an (N, 26)-shaped array -
ints = np.fromstring("hello",dtype=np.uint8)-97
out = (ints[:,None] == np.arange(26)).astype(int)
If you are looking for performance, I would suggest pre-allocating an array and then assigning -
out = np.zeros((len(ints),26),dtype=int)
out[np.arange(len(ints)), ints] = 1
Sample run -
In [153]: ints = np.fromstring("hello",dtype=np.uint8)-97
In [154]: ints
Out[154]: array([ 7, 4, 11, 11, 14], dtype=uint8)
In [155]: out = (ints[:,None] == np.arange(26)).astype(int)
In [156]: print(out)
[[0 0 0 0 0 0 0 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0]
[0 0 0 0 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0]
[0 0 0 0 0 0 0 0 0 0 0 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0]
[0 0 0 0 0 0 0 0 0 0 0 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0]
[0 0 0 0 0 0 0 0 0 0 0 0 0 0 1 0 0 0 0 0 0 0 0 0 0 0]]
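On newer NumPy, np.fromstring on text data is deprecated; a sketch of the same trick using frombuffer over the encoded bytes (otherwise identical to the answer above):
import numpy as np

ints = np.frombuffer("hello".encode('ascii'), dtype=np.uint8) - ord('a')
out = np.zeros((len(ints), 26), dtype=int)
out[np.arange(len(ints)), ints] = 1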

You asked about "sentences" but your example provided only a single word, so I'm not sure what you wanted to do about spaces. But as far as single words are concerned, your example could be implemented with:
def onehot(ltr):
    return [1 if i == ord(ltr) else 0 for i in range(97, 123)]

def onehotvec(s):
    return [onehot(c) for c in list(s.lower())]
onehotvec("hello")
[[0, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0],
[0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0],
[0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0],
[0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0],
[0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0]]
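For actual sentences, the question doesn't say how spaces should be treated; one possible sketch (an assumption on my part) is to drop anything outside a-z, reusing onehot from above:
def onehotsentence(s):
    # assumption: keep only the letters a-z, so spaces and punctuation are skipped
    return [onehot(c) for c in s.lower() if 'a' <= c <= 'z']

onehotsentence("hello world")  # 10 vectors; the space contributes nothing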

Related

How to convert values to their index

I have a numpy array containing 1's and 0's:
a = np.array([[0, 0, 0, 0, 0, 0, 0, 0, 0, 0],
[0, 0, 0, 0, 0, 1, 0, 1, 0, 0],
[0, 0, 1, 0, 0, 0, 0, 0, 0, 0],
[0, 1, 1, 0, 0, 0, 1, 0, 0, 0],
[0, 0, 1, 0, 0, 1, 0, 0, 0, 1],
[0, 0, 0, 0, 0, 1, 0, 0, 0, 1],
[0, 0, 1, 0, 0, 0, 1, 0, 0, 0],
[0, 0, 0, 1, 0, 0, 1, 0, 1, 0],
[1, 0, 0, 0, 0, 0, 0, 0, 0, 1],
[0, 1, 1, 0, 0, 1, 1, 0, 0, 0]])
I'd like to convert each 1 to the index in the subarray that it's occurring at, to get this:
e = np.array([[0, 0, 0, 0, 0, 0, 0, 0, 0, 0],
[0, 0, 0, 0, 0, 5, 0, 7, 0, 0],
[0, 0, 2, 0, 0, 0, 0, 0, 0, 0],
[0, 1, 2, 0, 0, 0, 6, 0, 0, 0],
[0, 0, 2, 0, 0, 5, 0, 0, 0, 9],
[0, 0, 0, 0, 0, 5, 0, 0, 0, 9],
[0, 0, 2, 0, 0, 0, 6, 0, 0, 0],
[0, 0, 0, 3, 0, 0, 6, 0, 8, 0],
[0, 0, 0, 0, 0, 0, 0, 0, 0, 9],
[0, 1, 2, 0, 0, 5, 6, 0, 0, 0]])
So far what I've done is multiply the array by a range:
a * np.arange(a.shape[0])
which is good, but I'm wondering if there's a better, simpler way to do it, like a single function call?
This modifies a in place:
In [4]: i, j = np.nonzero(a)
In [5]: a[i, j] = j
In [6]: a
Out[6]:
array([[0, 0, 0, 0, 0, 0, 0, 0, 0, 0],
[0, 0, 0, 0, 0, 5, 0, 7, 0, 0],
[0, 0, 2, 0, 0, 0, 0, 0, 0, 0],
[0, 1, 2, 0, 0, 0, 6, 0, 0, 0],
[0, 0, 2, 0, 0, 5, 0, 0, 0, 9],
[0, 0, 0, 0, 0, 5, 0, 0, 0, 9],
[0, 0, 2, 0, 0, 0, 6, 0, 0, 0],
[0, 0, 0, 3, 0, 0, 6, 0, 8, 0],
[0, 0, 0, 0, 0, 0, 0, 0, 0, 9],
[0, 1, 2, 0, 0, 5, 6, 0, 0, 0]])
Make a copy first if you don't want to modify a in place.
Or, applied to the original a, this creates a new array (in one line):
In [8]: np.arange(a.shape[1]) * a
Out[8]:
array([[0, 0, 0, 0, 0, 0, 0, 0, 0, 0],
[0, 0, 0, 0, 0, 5, 0, 7, 0, 0],
[0, 0, 2, 0, 0, 0, 0, 0, 0, 0],
[0, 1, 2, 0, 0, 0, 6, 0, 0, 0],
[0, 0, 2, 0, 0, 5, 0, 0, 0, 9],
[0, 0, 0, 0, 0, 5, 0, 0, 0, 9],
[0, 0, 2, 0, 0, 0, 6, 0, 0, 0],
[0, 0, 0, 3, 0, 0, 6, 0, 8, 0],
[0, 0, 0, 0, 0, 0, 0, 0, 0, 9],
[0, 1, 2, 0, 0, 5, 6, 0, 0, 0]])
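If a could contain nonzero values other than 1, a sketch using np.where (not part of the original answer) still yields the column indices:
import numpy as np

e = np.where(a != 0, np.arange(a.shape[1]), 0)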
Your approach is as fast as it gets, but it uses the wrong dimension for the multiplication (it would fail if the matrix weren't square).
Multiply the matrix by a range of column indexes:
import numpy as np
a = np.array([[0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0],
[0, 0, 0, 0, 0, 1, 0, 1, 0, 0, 0],
[0, 0, 1, 0, 0, 0, 0, 0, 0, 0, 0],
[0, 1, 1, 0, 0, 0, 1, 0, 0, 0, 0],
[0, 0, 1, 0, 0, 1, 0, 0, 0, 1, 0],
[0, 0, 0, 0, 0, 1, 0, 0, 0, 1, 0],
[0, 0, 1, 0, 0, 0, 1, 0, 0, 0, 1],
[0, 0, 0, 1, 0, 0, 1, 0, 1, 0, 0],
[1, 0, 0, 0, 0, 0, 0, 0, 0, 1, 0],
[0, 1, 1, 0, 0, 1, 1, 0, 0, 0, 0]])
e = a * np.arange(a.shape[1])
print(e)
[[ 0 0 0 0 0 0 0 0 0 0 0]
[ 0 0 0 0 0 5 0 7 0 0 0]
[ 0 0 2 0 0 0 0 0 0 0 0]
[ 0 1 2 0 0 0 6 0 0 0 0]
[ 0 0 2 0 0 5 0 0 0 9 0]
[ 0 0 0 0 0 5 0 0 0 9 0]
[ 0 0 2 0 0 0 6 0 0 0 10]
[ 0 0 0 3 0 0 6 0 8 0 0]
[ 0 0 0 0 0 0 0 0 0 9 0]
[ 0 1 2 0 0 5 6 0 0 0 0]]
I benchmarked the obligatory np.einsum approach, which was ~1.29x slower for larger arrays (100_000, 1000) than the corrected original solution. The in-place solution was ~8x slower than np.einsum.
np.einsum('ij,j->ij', a, np.arange(a.shape[1]))
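For reference, a rough sketch of how such a timing comparison could be run (the shape and density here are assumptions, smaller than the quoted benchmark, just to illustrate):
import timeit
import numpy as np

a = (np.random.rand(10_000, 1000) < 0.01).astype(int)
cols = np.arange(a.shape[1])

print(timeit.timeit(lambda: a * cols, number=10))                        # broadcasted multiply
print(timeit.timeit(lambda: np.einsum('ij,j->ij', a, cols), number=10))  # einsum variant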

Store python array in each entry of a column

I have an array 'multilabel' which looks like this:
[[0, 0, 0, 1, 0, 0, 0, 0],
[1, 0, 0, 0, 0, 0, 0, 0],
[1, 0, 0, 1, 0, 0, 0, 0],
...
[0, 0, 0, 1, 0, 0, 0, 0],
[1, 0, 0, 0, 0, 0, 0, 0]]
and I want to store each of those arrays as a single entry of my target variable, since I am facing a multi-label classification task. How can I achieve that? My code:
pd.DataFrame(multilabel)
Outputs multiple columns:
0 1 2 3 4 5 6 7
0 0 0 0 1 0 0 0 0
1 1 0 0 0 0 0 0 0
2 1 0 0 1 0 0 0 0
Thanks in advance!
df = pd.DataFrame(list(multilabel))
list_column = df.apply(lambda row: row.values, axis=1)
# build the frame from a dict so the single column gets the intended name
pd.DataFrame({'list_column': list_column})
The result is a one-column DataFrame whose list_column entries each hold one array.
Have you considered using the following trick?
import pandas as pd
arr = [[0, 0, 0, 1, 0, 0, 0, 0],
[1, 0, 0, 0, 0, 0, 0, 0],
[1, 0, 0, 1, 0, 0, 0, 0],
[0, 0, 0, 1, 0, 0, 0, 0],
[1, 0, 0, 0, 0, 0, 0, 0]]
pd.DataFrame([arr]).T
Output
0
0 [0, 0, 0, 1, 0, 0, 0, 0]
1 [1, 0, 0, 0, 0, 0, 0, 0]
2 [1, 0, 0, 1, 0, 0, 0, 0]
3 [0, 0, 0, 1, 0, 0, 0, 0]
4 [1, 0, 0, 0, 0, 0, 0, 0]
EDIT
In case you are using NumPy arrays, you can use the following:
import numpy as np
pd.DataFrame(np.array(arr))\
    .apply(lambda x: np.array(x), axis=1)
So, the real question is why... it doesn't seem like the most useful data structure.
That said, the one-dimensional data type in pandas is the Series:
>>> pd.Series(multilabel)
0 [0, 0, 0, 1, 0, 0, 0, 0]
1 [1, 0, 0, 0, 0, 0, 0, 0]
2 [1, 0, 0, 1, 0, 0, 0, 0]
3 [0, 0, 0, 1, 0, 0, 0, 0]
4 [1, 0, 0, 0, 0, 0, 0, 0]
dtype: object
You can then convert it further into a DataFrame:
>>> pd.DataFrame(pd.Series(multilabel))
0
0 [0, 0, 0, 1, 0, 0, 0, 0]
1 [1, 0, 0, 0, 0, 0, 0, 0]
2 [1, 0, 0, 1, 0, 0, 0, 0]
3 [0, 0, 0, 1, 0, 0, 0, 0]
4 [1, 0, 0, 0, 0, 0, 0, 0]
Edit: Per further discussion, this works if multilabel is a nested Python list, but not if it's a NumPy array.
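If multilabel is a 2-D NumPy array, a sketch that still gives one array per row is to split it into row arrays with list() first:
import numpy as np
import pandas as pd

multilabel = np.array([[0, 0, 0, 1, 0, 0, 0, 0],
                       [1, 0, 0, 0, 0, 0, 0, 0]])
pd.Series(list(multilabel))  # object Series holding one row array per entry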

Convert numpy binary string array back to string [duplicate]

This question already has answers here:
Convert binary to ASCII and vice versa
(8 answers)
Closed 6 years ago.
I have a numpy binary array like this:
np_bin_array = [0 1 0 0 1 0 0 0 0 1 1 0 0 1 0 1 0 1 1 0 1 1 0 0 0 1 1 0 1 1 0 0 0 1 1 0 1 1 1 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0]
It was originally the 8-bit ASCII codes of a word's characters, starting from the left, with 0s padding out the rest.
I need to convert this back into a string to form the word again, stripping the padding 0s; the output for the above should be 'Hello'.
Thanks for your help!
You can first pack the bits into an array of bytes using numpy.packbits(), wrap the result in bytearray(), then decode() it into a normal string.
The following code
import numpy
np_bin_array = [0, 1, 0, 0, 1, 0, 0, 0, 0, 1, 1, 0, 0, 1, 0, 1, 0, 1, 1, 0, 1, 1, 0, 0, 0, 1, 1, 0, 1, 1, 0, 0, 0, 1, 1, 0, 1, 1, 1, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0]
print(bytearray(numpy.packbits(np_bin_array)).decode().strip("\x00"))
gives
Hello
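Equivalently (a small variation, not from the original answer), the packed bits can be turned into bytes directly with tobytes():
import numpy as np

word = np.packbits(np_bin_array).tobytes().decode('ascii').rstrip('\x00')
print(word)  # Hello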
This one works for me. I had a boolean array, though, so I had to make an additional conversion.
def split_list(alist, max_size=1):
    """Yield successive max_size-sized chunks from alist."""
    for i in range(0, len(alist), max_size):
        yield alist[i:i+max_size]

result = "".join([chr(i) for i in (int("".join([str(int(j)) for j in letter]), base=2)
                                   for letter in split_list(np_bin_array, 8)) if i != 0])
import numpy as np
np_bin_array = np.array([0, 1, 0, 0, 1, 0, 0, 0, 0, 1, 1, 0, 0, 1, 0, 1, 0, 1, 1, 0, 1, 1, 0, 0, 0, 1, 1, 0, 1, 1, 0, 0, 0, 1, 1, 0, 1, 1, 1, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0])
bhello = ''.join(map(str, np_bin_array))
# [2:] drops the "0x" prefix; .strip("0x") would also strip trailing zero characters
xhello = hex(int(bhello, 2))[2:]
''.join(chr(int(xhello[i:i+2], 16)) for i in range(0, len(xhello), 2)).strip('\x00')
I got it working with this:
np_bin_array = [0, 1, 0, 0, 1, 0, 0, 0, 0, 1, 1, 0, 0, 1, 0, 1, 0, 1, 1, 0, 1, 1, 0, 0, 0, 1, 1, 0, 1, 1, 0, 0, 0, 1, 1, 0, 1, 1, 1, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0]
import numpy as np

yy_word = ""
yy = np.packbits(np_bin_array)
for i in yy:
    if i:
        yy_word += chr(i)
print(yy_word)

Github-API: How do I plot the number of commits to a repo?

I am a Python newbie, and I am trying to plot the number of commits, per day of the week.
I have the output as a pandas dataframe. However, I am unable to figure out how to plot this data using matplotlib.
Any help is greatly appreciated!
import requests
import json
import pandas as pd
r = requests.get('https://api.github.com/repos/thisismetis/dsp/stats/commit_activity')
raw = r.text
results = json.loads(raw)
print(pd.DataFrame(results))
Results:
days total week
0 [0, 0, 0, 0, 0, 0, 0] 0 1431216000
1 [0, 0, 0, 0, 0, 8, 0] 8 1431820800
2 [0, 19, 1, 4, 18, 8, 0] 50 1432425600
3 [0, 3, 23, 1, 0, 0, 0] 27 1433030400
4 [1, 0, 0, 0, 1, 0, 0] 2 1433635200
5 [0, 0, 0, 0, 0, 0, 0] 0 1434240000
6 [0, 2, 0, 0, 0, 0, 0] 2 1434844800
7 [0, 0, 0, 0, 0, 0, 0] 0 1435449600
8 [0, 0, 0, 0, 0, 0, 0] 0 1436054400
9 [0, 0, 0, 0, 0, 0, 0] 0 1436659200
10 [0, 0, 8, 0, 3, 0, 0] 11 1437264000
11 [0, 3, 36, 0, 1, 9, 0] 49 1437868800
12 [0, 2, 2, 2, 5, 1, 0] 12 1438473600
13 [0, 0, 0, 0, 0, 0, 0] 0 1439078400
14 [0, 2, 0, 0, 0, 0, 0] 2 1439683200
15 [0, 0, 0, 0, 0, 0, 0] 0 1440288000
16 [0, 0, 0, 0, 0, 0, 0] 0 1440892800
17 [0, 0, 0, 0, 0, 0, 0] 0 1441497600
18 [0, 0, 0, 0, 0, 3, 0] 3 1442102400
19 [0, 0, 0, 0, 0, 0, 0] 0 1442707200
20 [0, 0, 0, 0, 0, 0, 0] 0 1443312000
21 [0, 0, 0, 0, 0, 0, 0] 0 1443916800
22 [0, 0, 0, 0, 0, 0, 0] 0 1444521600
23 [0, 0, 10, 0, 0, 0, 0] 10 1445126400
24 [0, 0, 0, 0, 0, 0, 0] 0 1445731200
25 [1, 0, 0, 0, 0, 0, 0] 1 1446336000
26 [0, 0, 0, 0, 4, 3, 0] 7 1446940800
27 [0, 0, 0, 0, 0, 0, 0] 0 1447545600
28 [0, 0, 0, 0, 0, 0, 0] 0 1448150400
29 [0, 0, 0, 0, 0, 0, 0] 0 1448755200
30 [0, 0, 0, 0, 0, 0, 0] 0 1449360000
31 [0, 0, 0, 0, 0, 0, 0] 0 1449964800
32 [0, 0, 0, 0, 0, 0, 0] 0 1450569600
33 [0, 0, 0, 0, 0, 0, 1] 1 1451174400
34 [0, 0, 0, 0, 0, 0, 0] 0 1451779200
35 [0, 0, 0, 0, 0, 0, 0] 0 1452384000
36 [0, 0, 0, 0, 0, 0, 0] 0 1452988800
37 [0, 0, 0, 0, 0, 0, 0] 0 1453593600
38 [0, 0, 0, 0, 0, 0, 0] 0 1454198400
39 [0, 0, 5, 2, 0, 0, 0] 7 1454803200
40 [0, 0, 25, 2, 0, 0, 0] 27 1455408000
41 [1, 10, 0, 0, 3, 0, 0] 14 1456012800
42 [0, 0, 0, 0, 0, 0, 0] 0 1456617600
43 [0, 0, 0, 0, 0, 0, 0] 0 1457222400
44 [0, 0, 0, 2, 1, 0, 0] 3 1457827200
45 [0, 0, 0, 0, 0, 0, 0] 0 1458432000
46 [0, 0, 0, 0, 0, 0, 0] 0 1459036800
47 [0, 0, 0, 0, 0, 0, 0] 0 1459641600
48 [0, 0, 0, 0, 0, 0, 0] 0 1460246400
49 [0, 0, 0, 0, 0, 0, 0] 0 1460851200
50 [0, 0, 0, 0, 0, 0, 0] 0 1461456000
51 [0, 0, 0, 0, 0, 0, 0] 0 1462060800
You can plot the weekly totals this way (with df = pd.DataFrame(results) from above):
df['date'] = pd.to_datetime(df.week, unit='s')
df['week_no'] = df.apply(lambda x: '{:d}-{:02d}'.format(x['date'].year, x['date'].weekofyear), axis=1)
df.set_index('week_no')['total'].plot.bar()
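Since the question actually asks for commits per day of the week, here is a sketch that sums the 7-element days lists across all weeks (assuming, per the GitHub API, that they run Sunday through Saturday, and that df is the DataFrame built above):
import numpy as np
import matplotlib.pyplot as plt

per_day = np.sum(df['days'].tolist(), axis=0)  # element-wise sum over all weeks
labels = ['Sun', 'Mon', 'Tue', 'Wed', 'Thu', 'Fri', 'Sat']

plt.bar(range(7), per_day)
plt.xticks(range(7), labels)
plt.ylabel('commits')
plt.show()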

Converting an array of numpy arrays to DataFrame

I have a numpy object that contains the following:
17506 [0, 0, 0, 0, 0, 0]
17507 [0, 0, 0, 0, 0, 0]
17508 [0, 0, 0, 0, 0, 0]
17509 [0, 0, 0, 0, 0, 0]
17510 [0, 0, 0, 0, 0, 0]
17511 [0, 0, 0, 0, 0, 0]
17512 [0, 0, 0, 0, 0, 0]
17513 [0, 0, 0, 0, 0, 0]
17514 [0, 0, 0, 0, 0, 0]
17515 [0, 0, 0, 0, 0, 0]
17516 [0, 0, 0, 0, 0, 0]
17517 [0, 0, 0, 0, 0, 0]
17518 [0, 0, 0, 0, 0, 0]
17519 [0, 0, 0, 0, 0, 0]
(An array that contains arrays of dtype('int32'))
How can I efficiently convert this to a DataFrame in pandas and concatenate it (vertically) to an existing DataFrame?
What seems to be the problem? You may need to further describe your data.
>>> import numpy as np
>>> import pandas as pd
>>> a = np.array([np.zeros(6, dtype=int) for _ in range(3)])
>>> pd.DataFrame(a)
0 1 2 3 4 5
0 0 0 0 0 0 0
1 0 0 0 0 0 0
2 0 0 0 0 0 0
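To also concatenate the converted frame to an existing one, a sketch (existing_df and arr_of_arrays are placeholder names, not from the question):
import numpy as np
import pandas as pd

arr_of_arrays = np.array([np.zeros(6, dtype=np.int32) for _ in range(3)])
existing_df = pd.DataFrame(np.ones((2, 6), dtype=np.int32))

# stack the row arrays into a 2-D block, wrap it, and append it vertically
new_df = pd.DataFrame(np.vstack(arr_of_arrays))
combined = pd.concat([existing_df, new_df], ignore_index=True)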
