I have a pandas DataFrame and I want to convert it from shape (n, 1) to shape (n,). I probably have to use squeeze, but I can't figure out how.
I also tried z['0']=z['0'].squeeze() but it didn't help.
How can I convert?
z=z.squeeze() works best for me. Note that for a single-column DataFrame the result is a Series of shape (n,), not a DataFrame. I only had one column and did not check how it behaves with more columns.
>>> s = pd.DataFrame({"col" : range(5)})
>>> s.shape
(5,1)
>>> s.col.to_numpy().shape
(5,)
You might be interested in .to_numpy():
array = z['0'].to_numpy()
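To compare the two suggestions side by side, here is a minimal sketch. The frame below is made up and only mirrors the column name '0' from the question; it assumes a pandas version that provides to_numpy().

```python
import numpy as np
import pandas as pd

# Hypothetical single-column frame standing in for z from the question.
z = pd.DataFrame({"0": range(5)})

# squeeze along the column axis: the (5, 1) DataFrame becomes a Series.
as_series = z.squeeze("columns")

# to_numpy on the selected column: a plain NumPy array.
as_array = z["0"].to_numpy()

print(z.shape, as_series.shape, as_array.shape)
```

Both results have shape (5,); which one to use depends on whether you still need the pandas index.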
I have a quick question:
I have an array like this:
array([('A', 'B'),
       ('C', 'D')],
      dtype=[('group1', '<U4'), ('group2', '<U4')])
And I would like to combine group1 and group2 into 1 like this:
array([('A_B'),
       ('C_D')],
      dtype=[('group3', '<U4')])
I tried some different things from other answers like this:
combi = np.array([])
for group in test_array:
    combi = np.append(combi, np.array(group[0] + "_" + group[1]))
This does give me a new array with what I want, but when I try to add it to the original array I get an error that I can't figure out (I don't really know what it means):
np.append(test_array, combi, axis=1)
numpy.AxisError: axis 1 is out of bounds for array of dimension 1
I tried other things with concatenate as well, but they gave the same error.
Could someone help me?
The error means that you are trying to append a 1D array (shape (n,)) to another 1D array along the second dimension (axis=1), which is impossible because your arrays have only one dimension.
If you don't specify the axis (or axis=0) you'll end up, however, with just a 1D array like array(['A_B', 'C_D']). To get a structured array as requested you need to create a new array like np.array(combi, dtype=[('group3', '<U4')]).
You can do the same vectorized without a loop:
np.array(np.char.add(np.char.add(a['group1'], '_'), a['group2']), dtype=[('group3', '<U4')])
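As a hedged variation on the vectorized version, the sketch below fills a preallocated structured array field by field. The sample data copies the question; the wider '<U9' dtype is my own assumption, chosen so that two 4-character fields plus the underscore cannot be truncated.

```python
import numpy as np

# Sample structured array matching the one in the question.
a = np.array([("A", "B"), ("C", "D")],
             dtype=[("group1", "<U4"), ("group2", "<U4")])

# Element-wise string concatenation: group1 + "_" + group2.
joined = np.char.add(np.char.add(a["group1"], "_"), a["group2"])

# Assumption: '<U9' leaves room for 4 + 1 + 4 characters.
combined = np.empty(a.shape, dtype=[("group3", "<U9")])
combined["group3"] = joined

print(combined)
```

With '<U4' as in the question, any joined value longer than four characters would be silently cut off, which is why the field is widened here.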
Looking to print the minimum values of numpy array columns.
I am using a loop in order to do this.
The array is shaped (20, 3) and I want to find the min values of columns, starting with the first (i.e. col_value=0)
I have coded
col_value=0
for col_value in X:
    print(X[:, col_value].min)
    col_value += 1
However, it is coming up with an error
"arrays used as indices must be of integer (or boolean) type"
How do I fix this?
Let me suggest an alternative approach that you might find useful. NumPy's min() takes an axis argument that you can use to find the minimum values along a given dimension.
Example:
X = np.random.randn(20, 3)
print(X.min(axis=0))
This prints a NumPy array with the minimum value of each column of X.
You don't need col_value=0 nor do you need col_value+=1.
x = numpy.array([1,23,4,6,0])
print(x.min())
EDIT:
Sorry, I didn't see that you wanted to iterate through the columns.
import numpy as np
X = np.array([[1,2], [3,4]])
for col in X.T:
    print(col.min())
Transposing the matrix is one of the best solutions.
X=np.array([[11,2,14],
            [5,15, 7],
            [8,9,20]])
X=X.T #Transposing the array
for i in X:
    print(min(i))
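For completeness, here is a sketch of the loop from the question with its two problems fixed. Iterating over X directly yields rows (float arrays), which cannot be used as an index, and .min has to be called with parentheses. The random data and the name col_index are placeholders of mine.

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.standard_normal((20, 3))  # placeholder data with the shape from the question

loop_mins = []
# Iterate over integer column indices, not over the rows of X.
for col_index in range(X.shape[1]):
    loop_mins.append(X[:, col_index].min())  # call the method: .min()

print(loop_mins)
print(X.min(axis=0))  # the same values without a loop
```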
For a numpy array I have found that
x = numpy.array([]).reshape(0,4)
is fine and allows me to append (0,4) arrays to x without the array losing its structure (i.e. it doesn't just become a list of numbers). However, when I try
x = numpy.array([]).reshape(2,3)
it throws an error. Why is this?
This output shows what it means to reshape an array:
np.array([2, 3, 4, 5, 6, 7]).reshape(2, 3)
Output -
array([[2, 3, 4],
[5, 6, 7]])
So reshaping only rearranges the elements an array already has. Intuitively, reshape(0, 4) means converting the current array into one with 0 rows and 4 columns. Zero rows means zero elements, so it works because your array is empty. Similarly, (2, 3) means 2 rows and 3 columns, which is 6 elements, and an empty array cannot supply them.
reshape is not an 'append' function. It reshapes the array you give it to the dimensions you want.
np.array([]).reshape(0,4) works because you reshape a zero element array to a 0x4(=0 elements) array.
np.array([]).reshape(2,3) doesn't work because you're trying to reshape a zero element array to a 2x3(=6 elements) array.
To create a zero-filled array of that shape, use np.zeros((2,3)) instead.
And in case you're wondering, numpy arrays can't be appended to in place: np.append and np.concatenate return a new array each time. A common workaround is to collect the data in a list, append what you want, and then convert back to a numpy array. Preferably, you only create a numpy array when you don't mean to append data later.
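The points above can be sketched in a few lines. The row values and variable names are made up for illustration.

```python
import numpy as np

# An empty array with 0 rows and 4 columns: 0 * 4 = 0 elements, so this is valid.
x = np.empty((0, 4))

# np.append returns a new array; x itself is left unchanged.
row = np.array([[1, 2, 3, 4]])
y = np.append(x, row, axis=0)

# Collecting rows in a list and converting once at the end.
rows = [[1, 2, 3, 4], [5, 6, 7, 8]]
built = np.array(rows)

# Reshaping 0 elements into 2 * 3 = 6 elements fails.
reshape_failed = False
try:
    np.array([]).reshape(2, 3)
except ValueError:
    reshape_failed = True

print(x.shape, y.shape, built.shape, reshape_failed)
```

Building the list first avoids copying the whole array on every append.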
It looks to me like a bug in pandas.Series.
a = pd.Series([1,2,3,4])
b = a.reshape(2,2)
b
b has type Series but cannot be displayed: the last statement raises a very lengthy exception whose last line is "TypeError: %d format: a number is required, not numpy.ndarray". b.shape returns (2,2), which contradicts its type Series. I am guessing that pandas.Series does not implement reshape and that I am calling the version from np.array? Has anyone else seen this error? I am on pandas 0.9.1.
You can call reshape on the values array of the Series:
In [4]: a.values.reshape(2,2)
Out[4]:
array([[1, 2],
[3, 4]], dtype=int64)
I actually think it won't always make sense to apply reshape to a Series (do you ignore the index?), and that you're correct in thinking it's just numpy's reshape:
a.reshape?
Docstring: See numpy.ndarray.reshape
That said, I agree that the fact that it lets you try to do this looks like a bug.
The reshape function takes the new shape as a tuple rather than as multiple arguments:
In [4]: a.reshape?
Type: function
String Form:<function reshape at 0x1023d2578>
File: /Library/Frameworks/EPD64.framework/Versions/7.3/lib/python2.7/site-packages/numpy/core/fromnumeric.py
Definition: numpy.reshape(a, newshape, order='C')
Docstring:
Gives a new shape to an array without changing its data.
Parameters
----------
a : array_like
Array to be reshaped.
newshape : int or tuple of ints
The new shape should be compatible with the original shape. If
an integer, then the result will be a 1-D array of that length.
One shape dimension can be -1. In this case, the value is inferred
from the length of the array and remaining dimensions.
Reshape is actually implemented in Series and will return an ndarray:
In [11]: a
Out[11]:
0 1
1 2
2 3
3 4
In [12]: a.reshape((2, 2))
Out[12]:
array([[1, 2],
[3, 4]])
In the pandas version discussed here you can call a.reshape((2,2)) directly on a Series, but you cannot reshape a pandas DataFrame directly, because DataFrame has no reshape method. You can, however, reshape a numpy ndarray:
convert DataFrame to numpy ndarray
do reshape
convert back
e.g.
a = pd.DataFrame([[1,2,3],[4,5,6]])
b = a.values.reshape(3,2)  # as_matrix() was removed in pandas 1.0; .values works across versions
a = pd.DataFrame(b)
Just use the code below:
b=a.values.reshape(2,2)
I think it will help you.
You can also call reshape() directly on the Series, but it will give a FutureWarning.
For example, say we have a Series. We can change it to a DataFrame this way:
a = pd.DataFrame(a)
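Putting the answers in this thread together, here is a minimal sketch that goes through the underlying NumPy array rather than calling reshape on the pandas object. It assumes a pandas version that provides to_numpy(); on older versions .values plays the same role.

```python
import pandas as pd

a = pd.Series([1, 2, 3, 4])

# Reshape the underlying array; the Series index is dropped.
b = a.to_numpy().reshape(2, 2)

# DataFrame round trip: to ndarray, reshape, back to DataFrame.
df = pd.DataFrame([[1, 2, 3], [4, 5, 6]])
reshaped = pd.DataFrame(df.to_numpy().reshape(3, 2))

print(b)
print(reshaped)
```

The reshaped frame gets a fresh default index and default column labels, since the original labels no longer line up with the new shape.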
I am reading in census data using the matplotlib csv2rec function. It works fine and gives me a nice ndarray.
But there are several columns where all the values are "none", with dtype |O4. This is causing problems when I load the array into ATpy: "TypeError: object of NoneType has no len()". Something like '9999' or another missing-value code would work for me. A mask is not going to work in this case, because I am passing the real array to ATpy and it will not convert the mask. The put function in numpy, which I think would otherwise be the best way to change values, will not work with None values. I think some sort of boolean array is the way to go, but I can't get it to work.
So what is a good, fast way to change None values and/or uninitialized numpy array entries to something like '9999' or another recode value, without masking?
Thanks,
Matthew
Here is a solution to this problem, although if your data is a record array you should only apply this operation to your column, rather than the whole array:
import numpy as np
# initialise some data with None in it
a = np.array([1, 2, 3, None])
a = np.where(a == np.array(None), 9999, a)
Note that you need to cast None into a numpy array for this to work
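The question also asked about a boolean array. A sketch of that route is below, under the assumption that the column is an object array holding integers and None; the names is_none and recoded are mine.

```python
import numpy as np

# Object array with a missing entry, as in the answer above.
a = np.array([1, 2, 3, None], dtype=object)

# Build the boolean mask with an identity test, which is unambiguous for None.
is_none = np.array([value is None for value in a], dtype=bool)

# Recode on a copy, then convert to a numeric dtype now that no None remains.
recoded = a.copy()
recoded[is_none] = 9999
recoded = recoded.astype(np.int64)

print(recoded)
```

The list comprehension runs at Python speed, so for very large columns the np.where version may be the faster choice.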
You can use a masked array while you do the calculation, and when you pass the array to ATpy you can call the masked array's filled(9999) method to convert it to a normal array with the invalid values replaced by 9999.
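A minimal sketch of that masked-array workflow follows. The values and the mask are invented, and the hand-off to ATpy is not shown.

```python
import numpy as np

# Made-up data; the last entry is a placeholder for a missing value.
values = np.array([1.0, 2.0, 3.0, -1.0])
missing = np.array([False, False, False, True])

# Masked entries are ignored in calculations.
masked = np.ma.masked_array(values, mask=missing)
mean_of_valid = masked.mean()

# filled() returns an ordinary ndarray with masked entries replaced.
plain = masked.filled(9999)

print(mean_of_valid, plain)
```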