First of all, I'm a beginner and I'm having an issue with functions and returning values. I need to do some matrix operations to take the minimum value of the right column. However, since I cannot return these values (I could not figure out why), I'm not able to do any operations on them. The problem is that every time I try to use return, it gives me only the first or the last row of the matrix. If you can help, I'd really appreciate it. Thanks.
import numpy as np
import pandas as pd
df = pd.read_csv(r"C:\Users\Yunus Özer\Downloads/MA.csv")
df.head()
x = df["x"]
def minreg():
    for k in range(2, 16):
        x_pred = np.full(x.shape, np.nan)
        for t in range(k, x.size):
            x_pred[t] = np.mean(x[(t-k):t])
        mape_value = np.mean(np.abs(x - x_pred) / np.abs(x)) * 100
        m = np.array([k, mape_value])
        return m
print(minreg())
The return m statement basically terminates the function and returns m, so the function exits during the first iteration of the outer loop. First, you need to call return after your loop ends. Second, you need to append each m value generated in the loop to a list so all of them are stored, and return that list.
import numpy as np
import pandas as pd
df = pd.read_csv(r"C:\Users\Yunus Özer\Downloads/MA.csv")
df.head()
x = df["x"]
def minreg():
    m_arr = []
    for k in range(2, 16):
        x_pred = np.full(x.shape, np.nan)
        for t in range(k, x.size):
            x_pred[t] = np.mean(x[(t-k):t])
        mape_value = np.mean(np.abs(x - x_pred) / np.abs(x)) * 100
        m_arr.append(np.array([k, mape_value]))
    return m_arr
print(minreg())
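Since the original goal was to take the minimum value of the right column (the MAPE), you can then stack the returned list into a 2-D array and pick the row with the smallest MAPE. A minimal sketch, assuming minreg() as defined above (note that because x_pred starts with k NaN entries, you may also want np.nanmean inside minreg so the MAPE itself is not NaN):

results = np.array(minreg())                   # shape (14, 2): columns are [k, mape]
best_row = results[np.argmin(results[:, 1])]   # row with the smallest MAPE
print("best k:", int(best_row[0]), "MAPE:", best_row[1])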
I am trying to apply the argrelextrema function to the dataframe df, but I am unable to apply it correctly. Below is my code:
import numpy as np
import pandas as pd
from scipy.signal import argrelextrema

np.random.seed(42)

def maxloc(data):
    loc_opt_ind = argrelextrema(df.values, np.greater)
    loc_max = np.zeros(len(data))
    loc_max[loc_opt_ind] = 1
    data['loc_max'] = loc_max
    return data

values = np.random.rand(23000)
df = pd.DataFrame({'value': values})
np.all(maxloc(df).loc_max)
It gives me an error at the line loc_max[loc_opt_ind] = 1:
IndexError: too many indices for array
A pandas DataFrame is two-dimensional. That is, df.values is two-dimensional, even when the frame has only one column. As a result, loc_opt_ind contains both row and column indices (two tuples; just print loc_opt_ind to see), which can't be used to index the one-dimensional loc_max. You probably want to use either df['value'].values (the .values of a Series, which is one-dimensional) or np.squeeze(df.values) as input. Note that argrelextrema still returns a tuple in that case, just a one-element one, so you may need loc_opt_ind[0] (np.where has similar behaviour).
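To make that concrete, here is a minimal sketch of the question's function with a one-dimensional input (using np.squeeze, and using the data argument rather than the global df; the rest is unchanged from the question):

def maxloc(data):
    # squeeze the (n, 1) values array down to 1-D so argrelextrema
    # returns indices along a single axis
    loc_opt_ind = argrelextrema(np.squeeze(data.values), np.greater)
    loc_max = np.zeros(len(data))
    loc_max[loc_opt_ind] = 1
    data['loc_max'] = loc_max
    return data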
I have numpy arrays which are each around 2000 elements long, but not every element has a value; some are blank. As you can see at the end of the code, I've stacked them into one array called 'match'. How would I remove a row in match if it is missing an element? So, for example, if a particular ID is missing the magnitude, the entire row is removed. I'm only interested in keeping the rows that have data for all of the elements.
from astropy.table import Table
import numpy as np
data = '/home/myname/datable.fits'
data = Table.read(data, format="fits")
ID = np.array(data['ID']).astype(str)
redshift = np.array(data['z']).astype(float)
radius = np.array(data['r']).astype(float)
mag = np.array(data['MAG']).astype(float)
match = np.stack((ID, redshift, radius, mag), axis=1)
Here you can use numpy.isnan, which returns True for missing values and False for existing values. Note that numpy.isnan can only be applied to NumPy arrays of a native numeric dtype (such as np.float64).
Your requirement can be achieved as follows:
Note: here data is your stacked NumPy array.
import numpy as np
data = np.array(some_array) # set data as your numpy array
key_col = np.array(data[:,0], dtype=np.float64) # If you want to filter based on column 0
filtered_data = data[~np.isnan(key_col)] # ~ is the logical not here
For better flexibility, consider using pandas!
Hope this helps!
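If, as in the question, you want to keep only the rows of match that have data for all of the elements, one option is to build the mask from every numeric column at once. A sketch assuming the redshift, radius, mag and match arrays from the question (ID is a string column, so np.isnan is applied to the float arrays rather than to match itself):

numeric = np.stack((redshift, radius, mag), axis=1)  # the float columns only
keep = ~np.isnan(numeric).any(axis=1)                # True where the row is complete
match_clean = match[keep]                            # rows with data for every element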
I have a pandas dataframe from which I wish to construct some matrices using numpy arrays. These matrices will be constructed based on variables in the dataframe, and I would like to create these via a loop over a list of the dataframe variables. I would also like the numpy arrays to be named based on the variable, so that I can easily reference them.
Below is code to try to illustrate my problem. I create a dataframe with two categorical variables and an identifier. I then create a list 'vars' with the variable names I'd like to loop over. I show that my code runs outside the loop (although the object created is pandas not numpy). The commented piece at the end does not work, but shows my attempt at including the variable string in the loop.
import pandas as pd
import numpy as np
import random
mult_cat = [] # multiple categories
bin_cat = [] # binary categories
id = []
for i in range(0, 10):
    x = random.randint(0, 4)
    y = random.randint(0, 1)
    z = i + 1
    mult_cat.append(x)
    bin_cat.append(y)
    id.append(z)

data_2 = {'ID': id,
          'mult_cat': mult_cat,
          'bin_cat': bin_cat}

df = pd.DataFrame(data_2,
                  columns=['ID', 'mult_cat', 'bin_cat'])

vars = ['mult_cat', 'bin_cat']

twice_mult_cat = 2*df.mult_cat
print(mult_cat)
print(twice_mult_cat)

"""
for var in vars:
    twice_var = 2*df.var
    print(twice_var)
"""
I believe there are at least two issues here.
1) I am simply multiplying the pandas array, so the resulting object is not a numpy array.
2) The issue of naming, which is, I think, the more important issue here.
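A minimal sketch of one way to address both points, assuming the df and vars defined above: convert each column with .to_numpy() and store the arrays in a dictionary keyed by the variable name, so each one can still be referenced by its name.

arrays = {}
for var in vars:
    # df[var] accepts a string column name, unlike the df.var attribute syntax
    arrays[var] = 2 * df[var].to_numpy()

print(arrays['mult_cat'])  # a numpy array, looked up by variable name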
I am relatively new to Python and numpy. I am currently trying to replicate the following table (shown in the image) in Python using numpy.
As in the figure, I have the columns "group", "sub_group" and "value" populated. I want to transpose the column "sub_group" and do a simple calculation, i.e. value minus shift(value), and display the result in the lower triangle of the matrix for each group. If sub_group is "0", then assign the whole column as 0. The transposed sub_group columns can be named anything (preferably index numbers) if that makes it easier. I am ok with a pandas solution as well; I just think pandas may be slow.
Below is code in array form:
import numpy as np
a=np.array([(1,-1,10),(1,0,10),(1,-2,15),(1,-3,1),(1,-4,1),(1,0,12),(1,-5,16)], dtype=[('group',float),('sub_group',float),('value',float)])
Any help would be appreciated.
Regards,
S
Try this out:
import numpy as np
import pandas as pd
a=np.array([(1,-1,10),(1,0,10),(1,-2,15),(1,-3,1),(1,-4,1),(1,0,12),(1,-5,16)], dtype=[('group',float),('sub_group',float),('value',float)])
df = pd.DataFrame(a)
for i in df.index:
    col_name = str(int(df['sub_group'][i]))
    df[col_name] = None
    if df['sub_group'][i] == 0:
        df[col_name] = 0
    else:
        val = df['value'][i]
        for j in range(i, df.index[-1] + 1):
            df[col_name][j] = val - df['value'][j]
For the upper triangle of the matrix, I have put None values. You can replace them with whatever you want.
This piece of code does the calculation for the sub_group example. I am not sure if this is what you actually want; if not, post a comment here and I will edit.
import numpy as np
array_1 = np.array([(1,-1,10),(1,0,10),(1,-2,15),(1,-3,1),(1,-4,1),(1,0,12),(1,-5,16)])

# transpose the matrix
transposed_group = array_1.transpose()

# loop over the first row
first_value = transposed_group[0, 0]  # keep the first value before it is overwritten
for i in range(0, len(transposed_group[1, :])):
    # value[i] - first value of the row
    transposed_group[0, i] = transposed_group[0, i] - first_value

print(transposed_group)
In case you want to display that in the diagonal of the matrix, you can loop through the rows and columns, for example:
import numpy as np

# create an array of 0
array = np.zeros(shape=(3, 3))
print(array)

# loop over rows
for i in range(0, len(array[:, 1])):
    # loop over columns
    for j in range(0, len(array[1, :])):
        # fill the diagonal with 1
        if i == j:
            array[i, j] = 1

print(array)
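If only the diagonal is wanted, numpy also has a dedicated helper for this; a minimal sketch that produces the same result as the loop above:

import numpy as np

array = np.zeros(shape=(3, 3))
np.fill_diagonal(array, 1)  # sets array[i, i] = 1 for every i
print(array)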