Python: How to convert whole array into int - python

I want an int matrix in which only the first column is filled and the remaining elements are null. I come from an R background, so I know that leaving some elements null makes them easier to manage later, whereas filling them with 0 would cause lots of problems later.
I have the following code:
import numpy as np
import numpy.random as random
import pandas as pa

def getRowData():
    rowDt = np.full((80,20), np.nan)
    rowDt[:,0] = random.choice([1,2,3],80) # Set the first column
    return rowDt
I want this function to return ints, but it seems to give me floats.
I have seen this link, and tried the code below:
    return pa.to_numeric(rowDt)
But it did not help. Also, the rowDt object does not have .astype(<type>).
How can I convert the whole array to int?

You create a full (np.full) matrix of np.nan, which has a float dtype. This means you start off with a matrix defined to hold floating-point numbers, not integers.
To fix this, define a full matrix with the integer 0 as the initial value. That way, the array gets NumPy's default integer dtype (int64 on most platforms) and there is no need for astype or type casting.
rowDt = np.full((80,20), 0)
If you still wish to hold np.nan in your matrix, then I'm afraid you cannot use an integer numpy array for that: np.nan is a floating-point value, and integer dtypes have no way to represent it. An array holds either all integers or all floats.
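As a quick check of the point above, here is a minimal sketch comparing the dtype you get from each fill value. The variable names nan_filled and zero_filled are made up for illustration.

```python
import numpy as np

# Filling with NaN forces a floating-point dtype.
nan_filled = np.full((80, 20), np.nan)

# Filling with the integer 0 gives NumPy's default integer dtype.
zero_filled = np.full((80, 20), 0)
zero_filled[:, 0] = np.random.choice([1, 2, 3], 80)  # set the first column

print(nan_filled.dtype)   # a float dtype
print(zero_filled.dtype)  # an integer dtype
```

The trade-off is the one described in the question: with 0 as the fill value you can no longer tell "missing" from a real zero.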

You can use numpy.ma.masked_array() to create a numpy masked array.
The numpy masked array "remembers" which elements are "masked". It provides methods and functions similar to those of numpy arrays, but excludes the masked values from computations (such as mean()).
Once you have the masked array, you can always mask or unmask specific elements or rows or columns of elements whenever you want.
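A minimal sketch of how this could look for the 80x20 case from the question, assuming the first column holds the real values and everything else counts as missing. The names data, mask and row_dt are illustrative only.

```python
import numpy as np

# Integer data: only the first column carries real values.
data = np.zeros((80, 20), dtype=int)
data[:, 0] = np.random.choice([1, 2, 3], 80)

# True means "masked" (treated as missing); unmask the first column.
mask = np.ones(data.shape, dtype=bool)
mask[:, 0] = False

row_dt = np.ma.masked_array(data, mask=mask)

visible_count = row_dt.count()  # number of unmasked elements
visible_mean = row_dt.mean()    # computed over unmasked elements only

# Individual elements can be masked later on.
row_dt[0, 0] = np.ma.masked
count_after_masking = row_dt.count()
```

The array keeps its integer dtype, and the placeholder zeros never enter the statistics.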

Related

Converting pandas DataFrame to numpy array but with an inconsistency

I am running into a weird inconsistency. So I had to learn the difference between immutable and mutable data types. For my purpose, I need to convert my pandas DataFrame into a NumPy array, apply operations, and convert it back, as I do not wish to alter my input.
So I am converting as follows:
mix=pd.DataFrame(array,columns=columns)
def mix_to_pmix(mix,p_tank):
    previous=0
    columns,mix_in=np.array(mix) #<---
    mix_in*=p_tank
    previous=0
    for count,val in enumerate(mix_in):
        mix_in[count]=val+previous
        previous+=val
    return pd.DataFrame(mix_in,columns=columns)
This works perfectly fine, but the line:
columns,mix_in=np.array(mix)
does not seem to behave consistently, as in this case:
def to_molfrac(mix):
    columns,mix_in=np.array(mix)
    shape=mix_in.shape
    for i in range(shape[0]):
        mix_in[i,:]*=1/max(mix_in[i,:])
    for k in range(shape[1]-1,0,-1):
        mix_in[:,k]+=-mix_in[:,k-1]
    mix_in=mix_in/mix_in.sum(axis=1)[:,np.newaxis]
    return pd.DataFrame(mix_in,columns=columns)
I receive the error:
ValueError: too many values to unpack (expected 2)
The input of the latter function is the output of the previous function. So it should be the same case.
It's impossible to understand the input of to_molfrac and mix_to_pmix without an example.
But pandas objects have a .values attribute which gives you access to the underlying numpy array. So it's probably better to use mix_in = mix.values instead:
columns, mix_in = mix.columns, mix.values
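For what it's worth, here is a small sketch of why the unpacking can appear to work in one case and fail in another: unpacking a 2-D array iterates over its rows, so it only succeeds when there are exactly two rows, and it never yields the column names. The DataFrames and the helper scaled_copy are made up for illustration; to_numpy(copy=True) is used so the input is not altered.

```python
import numpy as np
import pandas as pd

two_rows = pd.DataFrame([[1.0, 2.0, 3.0], [4.0, 5.0, 6.0]], columns=["a", "b", "c"])
three_rows = pd.DataFrame(np.ones((3, 3)), columns=["a", "b", "c"])

# Unpacking iterates over the first axis, so this yields row 0 and row 1,
# not (column names, values).
first, second = np.array(two_rows)

# With any other number of rows the unpacking raises ValueError.
try:
    x, y = np.array(three_rows)
    unpack_failed = False
except ValueError:
    unpack_failed = True

def scaled_copy(df, factor):
    """Hypothetical helper: scale the values without touching the input."""
    columns = df.columns
    values = df.to_numpy(copy=True)  # explicit copy of the underlying data
    values *= factor
    return pd.DataFrame(values, columns=columns)

scaled = scaled_copy(three_rows, 2.0)
```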

Unable to insert NA's in numpy array

I was working on this piece of code and was stuck here.
import numpy as np
a = np.arange(10)
a[7:] = np.nan
In theory, it should insert missing values starting from index 7 until the end of the array. However, when I ran the code, some random values were inserted into the array instead of NAs.
Can someone explain what happened here and how should I insert missing values intentionally into numpy arrays?
Not-a-number (NaN) is a special floating-point value. By default, np.arange() creates an array of type int, which cannot store it. Casting the array to float allows you to insert NaNs:
import numpy as np
a = np.arange(10).astype(float)
a[7:] = np.nan
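A short sketch of the same idea, also showing that the float dtype can be requested up front instead of casting afterwards. The names a_float and b are illustrative.

```python
import numpy as np

a = np.arange(10)
print(a.dtype)  # an integer dtype, which has no representation for NaN

# Option 1: cast to float, then assign NaN.
a_float = a.astype(float)
a_float[7:] = np.nan

# Option 2: ask for a float array from the start.
b = np.arange(10, dtype=float)
b[7:] = np.nan

print(a_float)
```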

Normalising rows in numpy matrix

I am trying to normalize rows of a numpy matrix using L2 norm (unity length).
I am seeing a problem when I do that.
Assuming my matrix 'b' is as follows:
Now when I do the normalization of first row as below it works fine.
But when I try to do it by iterating through all the rows and converting the same matrix b as below it gives me all zeros.
Any idea why that is happening and how to get the correct normalization?
Is there a faster way of row-normalizing the matrix without having to iterate over each row? I don't want to use the scikit-learn normalization function, though.
Thanks
The problem comes from the fact that b has type int, so when you fill it in row by row, numpy automatically converts the results of your computation (float) to int, hence the zeros. One way to avoid that is to define b with type float, either by writing the values as 0., 1., etc. or by adding .astype(float) at definition.
The following does the computation in one go and does not require converting to float first:
b = b / np.linalg.norm(b, axis=1, keepdims=True)
This works because you are redefining the whole array rather than changing its rows one by one, and numpy is clever enough to make it float.
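Since the matrix from the question is not shown, here is a sketch with a made-up integer matrix that reproduces both behaviours: the row-by-row assignment that truncates to zeros, and the one-shot division that produces a float result.

```python
import numpy as np

# Hypothetical integer matrix; each row has an integer L2 norm (5, 10, 13).
b = np.array([[3, 4], [6, 8], [5, 12]])

# Row-by-row assignment into an int array truncates the float results.
in_place = b.copy()
for i in range(in_place.shape[0]):
    in_place[i] = in_place[i] / np.linalg.norm(in_place[i])

# Dividing the whole array creates a new float array instead.
normalized = b / np.linalg.norm(b, axis=1, keepdims=True)

print(in_place)    # all zeros, because values below 1 are truncated
print(normalized)  # rows of unit length
```

keepdims=True keeps the norms as a column of shape (3, 1), so the division broadcasts across each row.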

setting null values in a numpy array

How do I null certain values in a numpy array based on a condition?
I don't understand why I end up with 0 instead of null or empty values where the condition is not met. b is a numpy array populated with 0 and 1 values, and c is another fully populated numpy array. All arrays are 71x71x166.
a = np.empty(((71,71,166)))
d = np.empty(((71,71,166)))
for indexes, value in np.ndenumerate(b):
    i,j,k = indexes
    a[i,j,k] = np.where(b[i,j,k] == 1, c[i,j,k], d[i,j,k])
I want to end up with an array which only has values where the condition is met and is empty everywhere else, but without changing its shape.
FULL ISSUE FOR CLARIFICATION as asked for:
I start with a float populated array with shape (71,71,166)
I make an int array based on a cutoff applied to the float array basically creating a number of bins, roughly marking out 10 areas within the array with 0 values in between
What I want to end up with is an array with shape (71,71,166) which has the average values in a particular array direction (assuming vertical direction, if you think of a 3D array as a 3D cube) of a certain "bin"...
so I was trying to loop through the "bins" b == 1, b == 2 etc, sampling the float where that condition is met but being null elsewhere so I can take the average, and then recombine into one array at the end of the loop....
Not sure if I'm making myself understood. I'm using the np.where and using the indexing as I keep getting errors when I try and do it without although it feels very inefficient.
Consider this example:
import numpy as np
data = np.random.random((4,3))
mask = np.random.randint(0, 2, (4,3))  # random 0/1 values
data[mask==0] = np.nan
The data will be set to nan wherever the mask is 0. You can use any kind of condition you want, of course, or do something different for different values in b.
To erase everything except a specific bin, try the following:
c[b!=1] = np.nan
So, to make a copy of everything in a specific bin:
a = np.copy(c)
a[b!=1] = np.nan
To get the average of everything in a bin:
np.mean(c[b==1])
So perhaps this might do what you want (where bins is a list of bin values):
a = np.empty(c.shape)
a[b==0] = np.nan
for bin_value in bins:
    a[b==bin_value] = np.mean(c[b==bin_value])
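Put together as a runnable sketch on small made-up arrays: the shapes, the random data and the bin labels 1 and 2 are assumptions for illustration. Here np.full is swapped in for np.empty so that anything outside the listed bins is NaN by construction.

```python
import numpy as np

rng = np.random.default_rng(0)
c = rng.random((4, 3, 5))             # float data
b = rng.integers(0, 3, size=c.shape)  # 0 = background, 1 and 2 = bins
bins = [1, 2]

# Start from all-NaN so unassigned elements are clearly "missing".
a = np.full(c.shape, np.nan)
for bin_value in bins:
    selected = b == bin_value
    if selected.any():                # skip bins with no members
        a[selected] = c[selected].mean()
```

Each element of a then holds the mean of the bin it belongs to, and NaN where b is 0.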
np.empty sometimes fills the array with 0's; the contents of an np.empty() array are undefined, so 0 is perfectly valid. For example, try this instead:
d = np.nan * np.empty((71, 71, 166))
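An equivalent and arguably more direct way to get an all-NaN array is np.full, sketched here with the shape from the question:

```python
import numpy as np

# Every element is initialised to NaN explicitly, with no reliance on
# whatever np.empty happened to leave in memory.
d = np.full((71, 71, 166), np.nan)
```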
But consider using numpy's strength, and don't iterate over the array:
a = np.where(b, c, d)
(since b is 0 or 1, I've excluded the explicit comparison b == 1.)
You may even want to consider using a masked array instead:
a = np.ma.masked_where(b == 0, c)
which masks the elements where the condition is not met and seems to make more sense with respect to your question: "how do I null certain values in a numpy array based on a condition" (replace null with mask and you're done).
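A small side-by-side sketch of the two vectorised options on made-up data. Note that the mask here is placed where the condition is not met (b != 1), so that the values you want to keep stay visible; the array names are illustrative.

```python
import numpy as np

rng = np.random.default_rng(1)
c = rng.random((4, 3, 5))
b = rng.integers(0, 2, size=c.shape)  # 0/1 flags

# Option 1: NaN where the condition is not met.
a = np.where(b == 1, c, np.nan)

# Option 2: mask where the condition is not met.
masked = np.ma.masked_where(b != 1, c)

# Both ignore the "nulled" elements when averaging.
overall_nan_mean = np.nanmean(a)
overall_masked_mean = masked.mean()

# Averages along one direction; fully masked positions stay masked.
column_mean = masked.mean(axis=0)
```

Both results keep the original shape, which is what the question asks for.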

Recode missing data Numpy

I am reading in census data using the matplotlib csv2rec function, which works fine and gives me a nice ndarray.
But there are several columns where all the values are "none" with dtype |O4. This is causing problems when I load the array into ATpy: "TypeError: object of NoneType has no len()". Something like '9999' or another missing-value code would work for me. A mask is not going to work in this case because I am passing the real array to ATpy and it will not convert the mask. The put function in numpy, which I think is the best way to change values, will not work with None values. I think some sort of boolean array is the way to go, but I can't get it to work.
So what is a good, fast way to change None values and/or an uninitialized numpy array to something like '9999' or another recode? No masking.
Thanks,
Matthew
Here is a solution to this problem, although if your data is a record array you should only apply this operation to your column, rather than the whole array:
import numpy as np
# initialise some data with None in it
a = np.array([1, 2, 3, None])
a = np.where(a == np.array(None), 9999, a)
Note that you need to cast None into a numpy array for this to work.
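If the comparison against None feels fragile, one alternative sketch is to build the boolean array explicitly with an identity check and then recode. The names is_missing and recoded are illustrative, and the final astype(int) assumes every remaining value is an integer.

```python
import numpy as np

a = np.array([1, 2, 3, None])  # object dtype because of the None

# Explicit boolean array marking the None entries.
is_missing = np.array([value is None for value in a])

# Replace the missing entries, then convert to a plain integer array.
recoded = np.where(is_missing, 9999, a).astype(int)

print(recoded)
```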
You can use a masked array for your calculations, and when you pass the array to ATpy, call the masked array's filled(9999) method to convert it to a normal array with the invalid values replaced by 9999.
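A minimal sketch of that workflow, with made-up values and a made-up mask:

```python
import numpy as np

values = np.array([1, 2, 3, 0])                  # last entry is a placeholder
missing = np.array([False, False, False, True])  # True marks invalid entries

masked = np.ma.masked_array(values, mask=missing)

total = masked.sum()         # the masked entry is ignored in calculations
plain = masked.filled(9999)  # ordinary ndarray with 9999 where masked
```

plain is a regular numpy array, so it can be handed to code that does not understand masks.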
