Unable to insert NA's in numpy array - python

I was working on this piece of code and was stuck here.
import numpy as np
a = np.arange(10)
a[7:] = np.nan
In theory, this should insert missing values from index 7 to the end of the array. However, when I ran the code, seemingly random values were inserted into the array instead of NAs.
Can someone explain what happened here, and how I should intentionally insert missing values into numpy arrays?

Not-a-number (NaN) is a special floating-point value. When called with integer arguments, np.arange() creates an integer array, and integer arrays cannot store NaN. Casting the array to float allows you to insert NaNs:
import numpy as np
a = np.arange(10).astype(float)
a[7:] = np.nan
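As a small illustration of the same idea, the sketch below builds the float array directly through the dtype argument of np.arange and then locates the missing entries with np.isnan. The variable name missing is only illustrative. What happens when NaN is assigned into an integer array depends on the NumPy version (recent releases typically raise an error instead of writing arbitrary values), so that case is not shown.

```python
import numpy as np

# Build the array as float from the start so it can hold NaN.
a = np.arange(10, dtype=float)
a[7:] = np.nan  # NaN is a float value, so this assignment is valid

# NaN never compares equal to itself, so use np.isnan to locate it.
missing = np.isnan(a)
print(a)
print(missing.sum())  # number of missing entries
```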

Related

Is there a numpy function to find an array in multi dimensional array?

I have a numpy array with n rows and p columns.
I want to check if a given row is in my array and find its index.
For example, I have a numpy array like this:
[[1,0,8,7,2,2],[1,3,7,0,3,0],[1,7,1,0,1,0],[1,9,1,0,6,0],[1,8,1,7,9,0],....]
I want to check whether the array [6,0,5,8,2,1] is in my numpy array and, if so, where.
Is there a numpy function for that?
I'm sorry for asking a naive question, but I'm quite confused right now.
You can use == and .all(axis=1) to match entire rows, then use numpy.where() to get the index:
import numpy as np
a = np.array([[1,0,8,7,2,2],[1,3,7,0,3,0],[1,7,1,0,1,0],[1,9,1,0,6,0],[1,8,1,7,9,0], [6,0,5,8,2,1]])
b = np.array([6,0,5,8,2,1])
print(np.where((a==b).all(axis=1)))
Output:
(array([5], dtype=int32),)
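Since np.where returns a tuple of index arrays, it can be convenient to pull out a flat index array and check whether anything matched at all. The following is a minimal sketch of that, using a shortened version of the example array; the names row_matches, indices, found and absent are illustrative.

```python
import numpy as np

a = np.array([[1, 0, 8, 7, 2, 2],
              [1, 3, 7, 0, 3, 0],
              [6, 0, 5, 8, 2, 1]])
b = np.array([6, 0, 5, 8, 2, 1])

# a == b broadcasts b against every row; all(axis=1) keeps full-row matches.
row_matches = (a == b).all(axis=1)

# flatnonzero returns the matching row indices as a flat array.
indices = np.flatnonzero(row_matches)
found = indices.size > 0

print(found, indices)

# A row that is absent simply yields an empty index array.
absent = np.flatnonzero((a == np.array([9, 9, 9, 9, 9, 9])).all(axis=1))
print(absent.size)
```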

Looping through pandas dataframe and assigning values

This is probably a very stupid question, but I have been searching for examples on this site as well as others and have yet to find an answer that works. I am trying to multiply two arrays and assign the values to a 2D array within nested loops.
I have two variables, 'cars' and 'scrappage'. I would like to multiply every element of one by every element of the other and create a 2D array that is 10x10.
I was able to do this in MATLAB, but I am new to Python, so I know I am probably using the Pandas dataframe incorrectly. I have tried to debug the code, and everything runs exactly as it should (creation of data frames, loops, etc.) with the exception of the array multiplication and the values being assigned to the 2D array.
I know this is an indexing error: I keep receiving the "IndexError: single positional indexer is out-of-bounds" message.
Here is my code:
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
cars = pd.DataFrame([1000,2000,3000,4000,5000,6000,7000,8000,9000,10000])
scrappage = pd.DataFrame([1,.95,.86,.75,.62,.44,.30,.20,.12,.04])
Fleet = pd.DataFrame([])
i=0
j=0
for i in range(0,len(cars)):
    for j in range(0,len(scrappage)):
        Fleet.iloc[i,j]= cars.iloc[i,0] * scrappage.iloc[j,0]
        #This^ line is causing the error.
        j= j+1
    i=i+1
I'm sure this is probably very simple for most but I am struggling with the Pandas syntax. Thank you in advance for any help.
Instead of defining fleet as an empty DataFrame, you should give it a proper index and columns. You can try this:
import numpy as np
import pandas as pd
cars = pd.DataFrame([1000,2000,3000,4000,5000,6000,7000,8000,9000,10000])
scrappage = pd.DataFrame([1,.95,.86,.75,.62,.44,.30,.20,.12,.04])
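# Pre-allocate a 10x10 frame so that every (i, j) position exists
fleet = pd.DataFrame(index=range(len(cars)), columns=range(len(scrappage)), dtype=float)
# range() already advances i and j, so no manual counters are needed
for i in range(len(cars)):
    for j in range(len(scrappage)):
        fleet.iloc[i, j] = cars.iloc[i, 0] * scrappage.iloc[j, 0]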
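The nested loops compute every pairwise product of the two columns, which is an outer product. As an alternative that is not part of the answer above, the sketch below uses np.outer to build the same 10x10 table in one step; it assumes both frames keep their single default column labelled 0.

```python
import numpy as np
import pandas as pd

cars = pd.DataFrame([1000, 2000, 3000, 4000, 5000, 6000, 7000, 8000, 9000, 10000])
scrappage = pd.DataFrame([1, .95, .86, .75, .62, .44, .30, .20, .12, .04])

# np.outer multiplies every element of the first vector by every element
# of the second, giving a len(cars) x len(scrappage) matrix in one step.
fleet = pd.DataFrame(np.outer(cars[0].to_numpy(), scrappage[0].to_numpy()))

print(fleet.shape)
print(fleet.iloc[1, 1])  # cars[1] * scrappage[1]
```

Row i and column j of fleet then hold cars.iloc[i, 0] * scrappage.iloc[j, 0], matching the loop version.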

How to remove a line from a numpy array/table if it is empty

I have numpy arrays which are around 2000 elements long each, but not every element has a value; some are blank. As you can see at the end of the code, I've stacked them into one called 'match'. How would I remove a row in match if it is missing an element? For example, if a particular ID is missing the magnitude, the entire row should be removed. I'm only interested in keeping the rows that have data for all of the elements.
from astropy.table import Table
import numpy as np
data = '/home/myname/datable.fits'
data = Table.read(data, format="fits")
ID = np.array(data['ID'])
ID.astype(str)
redshift = np.array(data['z'])
redshift.astype(float)
radius = np.array(data['r'])
radius.astype(float)
mag = np.array(data['MAG'])
mag.astype(float)
match = (ID, redshift, radius, mag)
np.stack(match, axis=1)
Here you can use numpy.isnan, which returns True for missing (NaN) values and False for existing values. Note, however, that numpy.isnan can only be applied to NumPy arrays of a native numeric dtype (such as np.float64).
Your requirement can be achieved as follows:
Note: this assumes data is your numpy array.
import numpy as np
data = np.array(some_array) # set data as your numpy array
key_col = np.array(data[:,0], dtype=np.float64) # If you want to filter based on column 0
filtered_data = data[~np.isnan(key_col)] # ~ is the logical not here
For better flexibility, consider using pandas!!
Hope this helps!!
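To drop a row when any of its columns is missing, rather than filtering on a single key column, one option is to combine np.isnan with any(axis=1). The sketch below assumes the blanks are represented as NaN and uses small made-up arrays as hypothetical stand-ins for the columns read from the FITS table. The string IDs are kept in a separate array, because stacking them together with the floats would convert every value to a string.

```python
import numpy as np

# Hypothetical stand-ins for the columns read from the FITS table.
ID = np.array(["a1", "a2", "a3", "a4"])
redshift = np.array([0.1, np.nan, 0.3, 0.4])
radius = np.array([1.0, 2.0, 3.0, 4.0])
mag = np.array([20.5, 21.0, np.nan, 19.8])

# Stack only the numeric columns so the result stays float64.
numeric = np.column_stack((redshift, radius, mag))

# A row is complete when none of its numeric entries is NaN.
complete = ~np.isnan(numeric).any(axis=1)

# Apply the same boolean mask to the IDs and to the numeric table.
ID_kept = ID[complete]
numeric_kept = numeric[complete]
print(ID_kept)
print(numeric_kept)
```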

Python: How to convert whole array into int

I wish to have an int matrix which has only its first column filled, with the rest of the elements Null. Sorry, but I come from an R background, so I know that if I leave some elements Null it will be easier to manage them later, whereas if I fill them with 0 it will cause lots of problems later.
I have the following code:
import numpy as np
import numpy.random as random
import pandas as pa
def getRowData():
    rowDt = np.full((80,20), np.nan)
    rowDt[:,0] = random.choice([1,2,3],80) # Set the first column
    return rowDt
I want this function to return ints, but it seems to give me floats.
I have seen this link, and tried the below code:
return pa.to_numeric(rowDt)
But it did not help me. Also, the rowDt object does not have .astype(<type>).
How can I convert the array to int?
You create a full (np.full) matrix of np.nan, which has a float dtype. This means you start off with a matrix defined to hold float numbers, not integers.
To fix this, define a full matrix with the integer 0 as the initial value. That way, the array has an integer dtype (NumPy's default integer type) and there is no need for astype or type casting.
rowDt = np.full((80,20), 0)
If you still wish to hold np.nan in your matrix, then I'm afraid you cannot use an integer numpy array for that: np.nan is a float value, so the array holds either all integers or all floats.
You can use numpy.ma.masked_array() to create a numpy masked array.
The numpy masked array "remembers" which elements are "masked". It provides methods and functions similar to those of numpy arrays, but excludes the masked values from computations (such as mean()).
Once you have the masked array, you can always mask or unmask specific elements or rows or columns of elements whenever you want.
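A minimal sketch of the masked-array approach for this case follows. It keeps integer storage and marks everything except the first column as missing. The names values, mask and rng are illustrative, and the random generator is seeded only so that the sketch is repeatable.

```python
import numpy as np

rng = np.random.default_rng(0)  # seeded only to make the sketch repeatable

# Integer storage: the masked cells still hold 0, but the mask hides them.
values = np.zeros((80, 20), dtype=int)
values[:, 0] = rng.choice([1, 2, 3], size=80)

# True marks an element as missing; only the first column is valid here.
mask = np.ones(values.shape, dtype=bool)
mask[:, 0] = False

rowDt = np.ma.masked_array(values, mask=mask)

print(rowDt.dtype)    # integer dtype is preserved
print(rowDt.count())  # number of unmasked elements

# Unmask a single element later by assigning a value to it.
rowDt[0, 1] = 7
```

The array keeps its integer dtype, and reductions such as rowDt.mean() only use the unmasked entries.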

Recode missing data Numpy

I am reading in census data using the matplotlib csv2rec function. It works fine and gives me a nice ndarray.
But there are several columns where all the values are "none" with dtype |O4. This is causing problems when I load the array into ATpy: "TypeError: object of NoneType has no len()". Something like '9999' or another missing-value code would work for me. Masking is not going to work in this case because I am passing the real array to ATpy and it will not convert the mask. The put function in numpy, which I think is the best way to change values, will not work with None values. I think some sort of boolean array is the way to go, but I can't get it to work.
So what is a good/fast way to change None values and/or an uninitialized numpy array to something like '9999' or another recode? No masking.
Thanks,
Matthew
Here is a solution to this problem, although if your data is a record array you should only apply this operation to your column, rather than the whole array:
import numpy as np
# initialise some data with None in it
a = np.array([1, 2, 3, None])
a = np.where(a == np.array(None), 9999, a)
Note that you need to cast None into a numpy array for this to work.
You can use a masked array when you do calculations, and when you pass the array to ATpy, you can call the filled(9999) method of the masked array to convert it to a normal array with the invalid values replaced by 9999.
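The masked-array suggestion can be sketched roughly as below. The input column raw is a hypothetical stand-in for one of the all-None or partly-None columns, and the placeholder 0 stored under the mask is an arbitrary choice that never appears in results.

```python
import numpy as np

# Hypothetical column containing None, as in the question.
raw = np.array([1, 2, 3, None], dtype=object)

# Build a boolean mask of the missing entries, one check per element.
missing = np.array([value is None for value in raw])

# Put a placeholder under the mask so the data can be stored as numbers.
data = np.where(missing, 0, raw).astype(float)
masked = np.ma.masked_array(data, mask=missing)

print(masked.mean())        # masked entries are ignored
print(masked.filled(9999))  # plain ndarray with 9999 in place of missing values
```

The result of filled(9999) is an ordinary ndarray, so it can be handed to code that does not understand masks.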
