How to reshape a pandas.Series - python

It looks to me like a bug in pandas.Series.
a = pd.Series([1,2,3,4])
b = a.reshape(2,2)
b
b has type Series but cannot be displayed: the last statement raises a very lengthy exception whose final line is "TypeError: %d format: a number is required, not numpy.ndarray". b.shape returns (2,2), which contradicts its type Series. My guess is that pandas.Series does not implement reshape itself and I am calling the version from np.array. Has anyone else seen this error? I am on pandas 0.9.1.

You can call reshape on the values array of the Series:
In [4]: a.values.reshape(2,2)
Out[4]:
array([[1, 2],
[3, 4]], dtype=int64)
I actually think it won't always make sense to apply reshape to a Series (do you ignore the index?), and that you're correct in thinking it's just numpy's reshape:
a.reshape?
Docstring: See numpy.ndarray.reshape
That said, I agree that the fact that it lets you try to do this looks like a bug.
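For readers on a recent pandas, here is a minimal sketch of the approach above: go through the underlying NumPy array explicitly. It assumes pandas and NumPy are importable as pd and np; to_numpy() is the newer spelling of .values (pandas 0.24 and later), and the variable names are only illustrative.

```python
import numpy as np
import pandas as pd

a = pd.Series([1, 2, 3, 4])

# Reshape the underlying NumPy array; the Series index is dropped.
b = a.to_numpy().reshape(2, 2)

print(type(b).__name__)  # ndarray
print(b.shape)           # (2, 2)
```

Because the result is a plain ndarray, it displays normally and there is no mismatch between its type and its shape.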

The reshape function takes the new shape as a tuple rather than as multiple arguments:
In [4]: a.reshape?
Type: function
String Form:<function reshape at 0x1023d2578>
File: /Library/Frameworks/EPD64.framework/Versions/7.3/lib/python2.7/site-packages/numpy/core/fromnumeric.py
Definition: numpy.reshape(a, newshape, order='C')
Docstring:
Gives a new shape to an array without changing its data.
Parameters
----------
a : array_like
Array to be reshaped.
newshape : int or tuple of ints
The new shape should be compatible with the original shape. If
an integer, then the result will be a 1-D array of that length.
One shape dimension can be -1. In this case, the value is inferred
from the length of the array and remaining dimensions.
Reshape is actually implemented in Series and will return an ndarray:
In [11]: a
Out[11]:
0 1
1 2
2 3
3 4
In [12]: a.reshape((2, 2))
Out[12]:
array([[1, 2],
[3, 4]])
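As a sketch of the distinction in the docstring above (plain NumPy, no pandas involved): the module-level np.reshape function takes the shape as a single argument, while the ndarray.reshape method accepts either a tuple or separate integers. The variable names are illustrative only.

```python
import numpy as np

arr = np.array([1, 2, 3, 4])

by_function = np.reshape(arr, (2, 2))  # function form: the shape is one argument
by_tuple = arr.reshape((2, 2))         # method form with a tuple
by_args = arr.reshape(2, 2)            # method form with separate integers
inferred = arr.reshape(-1, 2)          # -1 lets NumPy infer that dimension

print(by_function)
print(inferred.shape)
```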

You can use a.reshape((2,2)) directly to reshape a Series, but you cannot reshape a pandas DataFrame directly, because DataFrame has no reshape function. You can, however, reshape a numpy ndarray:
convert DataFrame to numpy ndarray
do reshape
convert back
e.g.
a = pd.DataFrame([[1,2,3],[4,5,6]])
# as_matrix() was removed in pandas 1.0; .values works in old and new versions
b = a.values.reshape(3,2)
a = pd.DataFrame(b)
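The three steps above can be sketched as follows for a current pandas, assuming to_numpy() is available (pandas 0.24 and later). The names df, reshaped and df2 are illustrative. Note that the original row and column labels are not carried over by the round trip.

```python
import pandas as pd

df = pd.DataFrame([[1, 2, 3], [4, 5, 6]])  # shape (2, 3)

# 1. convert to ndarray, 2. reshape, 3. convert back
reshaped = df.to_numpy().reshape(3, 2)
df2 = pd.DataFrame(reshaped)

print(df2.shape)  # (3, 2)
print(df2)
```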

Just use the code below:
b=a.values.reshape(2,2)
I think it will help you.
You can also call reshape() directly on the Series, but it will give a FutureWarning.

For example, suppose we have a Series. We can convert it to a DataFrame like this:
a = pd.DataFrame(a)
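A small sketch of that conversion, assuming a current pandas. Series.to_frame() is an equivalent spelling, and the name "value" is a hypothetical column name chosen for illustration. Either way the result is a one-column DataFrame of shape (n, 1), not a reshaped (2, 2) table.

```python
import pandas as pd

a = pd.Series([1, 2, 3, 4], name="value")  # "value" is an illustrative name

as_frame = a.to_frame()       # one column named after the Series
also_frame = pd.DataFrame(a)  # same result via the constructor

print(as_frame.shape)  # (4, 1)
```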

Related

pandas shape (n,1) to shape (n,)

I have a pandas DataFrame and I want to convert it from shape (n,1) to shape (n,). I probably have to use squeeze, but I can't figure out how to (squeeze documentation).
I also tried z['0']=z['0'].squeeze() but it didn't help.
How can I convert?
z = z.squeeze() worked best for me and keeps the result as a pandas object (a single-column DataFrame squeezes to a Series). That may be because I had just one column; I didn't check it with more columns.
>>> s = pd.DataFrame({"col" : range(5)})
>>> s.shape
(5, 1)
>>> s.col.to_numpy().shape
(5,)
You might be interested in .to_numpy():
array = z['0'].to_numpy()
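The options from these answers can be compared side by side. This is a sketch assuming a single-column DataFrame whose column is labelled '0' as in the question; the other names are illustrative. Assigning z['0'].squeeze() back into z['0'] cannot help, because the column is already one-dimensional and the DataFrame around it stays (n, 1).

```python
import pandas as pd

z = pd.DataFrame({"0": range(5)})  # shape (5, 1)

as_series = z.squeeze("columns")  # Series of shape (5,)
column = z["0"]                   # selecting the column gives the same Series
as_array = z["0"].to_numpy()      # plain ndarray of shape (5,)

print(z.shape, as_series.shape, as_array.shape)
```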

Python How to convert ['0.12' '0.23'] <class 'numpy.ndarray'> to a normal numpy array

I am using a package that is fetching values from a csv file for me. If I print out the result I get ['0.12' '0.23']. I checked the type, which is <class 'numpy.ndarray'> I want to convert it to a numpy array like [0.12, 0.23].
I tried np.asarray(variabel) but that did not resolve the problem.
Solution
import numpy as np
array = array.astype(float)  # np.float was removed in NumPy 1.24; the builtin float is equivalent
# If you are just initializing the array you can do this
ar = np.array(your_list, dtype=float)
It might help to know how the csv was read. But for whatever reason, it appears to have created a numpy array with a string dtype:
In [106]: data = np.array(['0.12', '0.23'])
In [107]: data
Out[107]: array(['0.12', '0.23'], dtype='<U4')
In [108]: print(data)
['0.12' '0.23']
The str formatting of such an array omits the commas; the repr display keeps them.
A list equivalent also displays with commas:
In [109]: data.tolist()
Out[109]: ['0.12', '0.23']
We call this a numpy array, but technically it is of class numpy.ndarray:
In [110]: type(data)
Out[110]: numpy.ndarray
It can be converted to an array of floats with:
In [111]: data.astype(float)
Out[111]: array([0.12, 0.23])
It is still an ndarray; only the dtype is different. You may need to read more in the numpy docs about dtype.
The error:
If I want to calculate with it it gives me an error TypeError: only size-1 arrays can be converted to Python scalars
has a different source. data has 2 elements. You don't show the code that generates this error, but we often see it in plotting calls. The parameter is supposed to be a single number (often an integer), whereas your array (even with a numeric dtype) holds two numbers.
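Both points can be reproduced in a few lines. This is a sketch that assumes the data really is a NumPy string array like the one shown above. Calling float() on the whole array is only a stand-in for whatever call in the asker's code expected a single number; that code is not shown in the question.

```python
import numpy as np

data = np.array(['0.12', '0.23'])  # string dtype, as in the question
numbers = data.astype(float)       # same class (ndarray), numeric dtype

single = float(numbers[0])         # one element converts to a Python float

try:
    float(numbers)                 # two elements cannot become one scalar
    failed = False
except TypeError as exc:
    failed = True
    print("TypeError:", exc)

print(numbers.dtype, single)
```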

DataFrame of objects `astype(float)` behaviour different depending if lists or arrays

I'll preface this with the statement that I wouldn't do this in the first place and that I ran across this helping a friend.
Consider the data frame df
df = pd.DataFrame(pd.Series([[1.2]]))
df
0
0 [1.2]
This is a data frame of objects where the objects are lists. In my friend's code, they had:
df.astype(float)
Which breaks as I had hoped
ValueError: setting an array element with a sequence.
However, if those values were numpy arrays instead:
df = pd.DataFrame(pd.Series([np.array([1.2])]))
df
0
0 [1.2]
And I tried the same thing:
df.astype(float)
0
0 1.2
It's happy enough to do something and convert my 1-length arrays to scalars. This feels very dirty!
If instead they were not 1-length arrays
df = pd.DataFrame(pd.Series([np.array([1.2, 1.3])]))
df
0
0 [1.2, 1.3]
Then it breaks
ValueError: setting an array element with a sequence.
Question
Please tell me this is a bug and we can fix it. Or can someone explain why and in what world this makes sense?
Response to @root
You are right. Is this worth an issue? Do you expect/want this?
a = np.empty((1,), object)
a[0] = np.array([1.2])
a.astype(float)
array([ 1.2])
And
a = np.empty((1,), object)
a[0] = np.array([1.2, 1.3])
a.astype(float)
ValueError: setting an array element with a sequence.
This is due to the unsafe default value of the casting argument of astype. In the docs, the casting argument is described as follows:
"Controls what kind of data casting may occur. Defaults to ‘unsafe’ for backwards compatibility." (my emphasis)
Any of the other possible castings raises a TypeError.
a = np.empty((1,), object)
a[0] = np.array([1.2])
a.astype(float, casting='same_kind')
Results in:
TypeError: Cannot cast array from dtype('O') to dtype('float64') according to the rule 'same_kind'
This is true for all castings except unsafe, namely: no, equiv, safe, and same_kind.
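The difference between the default and a stricter casting rule can be seen without nesting arrays. In this sketch the object array deliberately holds plain Python floats, which is an assumption made to keep the example stable: how NumPy treats a length-1 array stored inside an object array may differ between NumPy versions.

```python
import numpy as np

a = np.array([1.2, 1.3], dtype=object)  # object dtype holding Python floats

# Default casting='unsafe': each object is converted individually.
unsafe_result = a.astype(float)

# Any stricter rule refuses to cast object to float64 at all.
try:
    a.astype(float, casting='same_kind')
    strict_raised = False
except TypeError as exc:
    strict_raised = True
    print("TypeError:", exc)

print(unsafe_result.dtype)
```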

Reshaping Numpy Arrays to a multidimensional array

For a numpy array I have found that
x = numpy.array([]).reshape(0,4)
is fine and allows me to append (0,4) arrays to x without the array losing its structure (i.e. it doesn't just become a flat list of numbers). However, when I try
x = numpy.array([]).reshape(2,3)
it throws an error. Why is this?
This output explains what it means to reshape an array:
np.array([2, 3, 4, 5, 6, 7]).reshape(2, 3)
Output -
array([[2, 3, 4],
[5, 6, 7]])
So reshaping just means giving the same elements a new shape. Intuitively, reshape(0, 4) means converting the current array into a layout with 0 rows and 4 columns. 0 rows means no elements, so it works because your array is empty. Similarly, (2, 3) means 2 rows and 3 columns, which is 6 elements, and an empty array does not have them.
reshape is not an 'append' function. It reshapes the array you give it to the dimensions you want.
np.array([]).reshape(0,4) works because you reshape a zero element array to a 0x4(=0 elements) array.
np.array([]).reshape(2,3) doesn't work because you're trying to reshape a zero element array to a 2x3(=6 elements) array.
To create a 2x3 array filled with zeros, use np.zeros((2,3)) instead.
And in case you're wondering, numpy arrays can't be appended to in place (np.append returns a new copy). You'll have to work around this by converting the array to a list, appending what you want, and then converting back to a numpy array. Preferably, only create a numpy array once you no longer need to append data.
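The element-count rule and the list workaround can be sketched as follows. np.vstack is used here as one way to add a row; like np.append, it returns a new array rather than growing the old one. The variable names are illustrative.

```python
import numpy as np

empty = np.array([])

x = empty.reshape(0, 4)              # fine: 0 * 4 == 0 elements
x = np.vstack([x, np.ones((1, 4))])  # new array with one row of four values

try:
    empty.reshape(2, 3)              # needs 6 elements, has 0
    reshape_failed = False
except ValueError as exc:
    reshape_failed = True
    print("ValueError:", exc)

# Workaround from the answer: collect rows in a list, convert once at the end.
rows = []
rows.append([1, 2, 3, 4])
rows.append([5, 6, 7, 8])
stacked = np.array(rows)

print(x.shape, stacked.shape)
```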

Recode missing data Numpy

I am reading in census data using the matplotlib csv2rec function; it works fine and gives me a nice ndarray.
But there are several columns where all the values are "none", with dtype |O4. This is causing problems when I load the data into ATpy: "TypeError: object of NoneType has no len()". Something like '9999' or another missing-value code would work for me. A mask is not going to work in this case because I am passing the real array to ATpy and it will not convert a masked array. The put function in numpy, which I think is the best way to change values, will not work with None values. I think some sort of boolean array is the way to go, but I can't get it to work.
So what is a good, fast way to change None values and/or uninitialized numpy array entries to something like '9999' or another recode? No masking.
Thanks,
Matthew
Here is a solution to this problem, although if your data is a record array you should only apply this operation to your column, rather than the whole array:
import numpy as np
# initialise some data with None in it
a = np.array([1, 2, 3, None])
a = np.where(a == np.array(None), 9999, a)
Note that you need to cast None into a numpy array for this to work.
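If the comparison against None is a concern, an explicit boolean mask is an alternative sketch of the same idea. It assumes a one-dimensional object array; the names is_missing and recoded and the final cast to int are illustrative choices, not part of the answer above.

```python
import numpy as np

a = np.array([1, 2, 3, None], dtype=object)

# Build the boolean mask with an identity test on each element.
is_missing = np.array([value is None for value in a])

recoded = a.copy()              # leave the original array untouched
recoded[is_missing] = 9999      # recode the missing entries
recoded = recoded.astype(int)   # every entry is now an integer

print(recoded)
```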
You can use a masked array while you do the calculation, and when you pass the array to ATpy, you can call the masked array's filled(9999) method to convert it to a normal array with the invalid values replaced by 9999.
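A minimal sketch of that workflow, assuming numeric data and a separately known mask of missing positions. Both arrays here are made up for illustration, and the hand-off to ATpy itself is not shown.

```python
import numpy as np

values = np.array([1.0, 2.0, 3.0, 0.0])
missing = np.array([False, False, False, True])  # last entry is missing

masked = np.ma.array(values, mask=missing)

mean_of_valid = masked.mean()  # masked entries are ignored in calculations
plain = masked.filled(9999)    # ordinary ndarray with 9999 in the masked slots

print(mean_of_valid)
print(plain)
```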
