I am relatively new to python and numpy. I am currently trying to replicate the following table as shown in the image in python using numpy.
As in the figure, I have the columns "group, sub_group, value" populated. I want to transpose column "sub_group" and do a simple calculation, i.e. value minus shift(value), and display the result in the lower triangle of the matrix for each group. If sub_group is "0", then assign the whole column as 0. The transposed sub_group columns can be named anything (preferably index numbers) if that makes it easier. I am OK with a pandas solution as well; I just suspect pandas may be slow.
Below is code in array form:
import numpy as np
a=np.array([(1,-1,10),(1,0,10),(1,-2,15),(1,-3,1),(1,-4,1),(1,0,12),(1,-5,16)], dtype=[('group',float),('sub_group',float),('value',float)])
Any help would be appreciated.
Regards,
S
Try this out:
import numpy as np
import pandas as pd
a=np.array([(1,-1,10),(1,0,10),(1,-2,15),(1,-3,1),(1,-4,1),(1,0,12),(1,-5,16)], dtype=[('group',float),('sub_group',float),('value',float)])
df = pd.DataFrame(a)
for i in df.index:
    col_name = str(int(df['sub_group'][i]))
    df[col_name] = None
    if df['sub_group'][i] == 0:
        df[col_name] = 0
    else:
        val = df['value'][i]
        for j in range(i, df.index[-1] + 1):
            # .loc avoids pandas' chained-assignment warning
            df.loc[j, col_name] = val - df['value'][j]
For the upper triangle of the matrix, I have put None values. You can replace them with whatever you want.
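If the loop turns out to be slow on larger data, the same lower-triangle table can be built without Python loops. This is only a sketch assuming a single group, with columns of the result standing in for the transposed sub_groups:

```python
import numpy as np

a = np.array([(1,-1,10),(1,0,10),(1,-2,15),(1,-3,1),(1,-4,1),(1,0,12),(1,-5,16)],
             dtype=[('group',float),('sub_group',float),('value',float)])

values = a['value']
n = len(values)
# diff[j, i] = value[i] - value[j], i.e. column i holds value[i] minus every later value
diff = np.subtract.outer(values, values).T
# keep only the lower triangle (row >= column); everything above becomes NaN
mask = np.tril(np.ones((n, n), dtype=bool))
result = np.where(mask, diff, np.nan)
# columns whose sub_group is 0 are zeroed wholesale, as in the question
result[:, a['sub_group'] == 0] = 0
```

np.where keeps the square shape and fills the upper triangle with NaN rather than None.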
This piece of code does the calculation for the sub_group example. I am not sure if this is what you actually want; if not, post a comment here and I will edit.
import numpy as np
array_1 = np.array([(1,-1,10),(1,0,10),(1,-2,15),(1,-3,1),(1,-4,1),(1,0,12),(1,-5,16)])
#transpose the matrix
transposed_group = array_1.transpose()
#loop over the columns of the transposed array
for i in range(len(transposed_group[1,:])):
    #subtract the first value of the row from value[i]
    transposed_group[0,i] = transposed_group[0,i] - transposed_group[0,0]
print(transposed_group)
In case you want to display that in the diagonal of the matrix, you can loop through the rows and columns, for example:
import numpy as np
#create an array of zeros
array = np.zeros(shape=(3,3))
print(array)
#loop over rows
for i in range(len(array[:,1])):
    #loop over columns
    for j in range(len(array[1,:])):
        #note: this fills every element with 1; restrict to i == j to touch only the diagonal
        array[i,j] = 1
print(array)
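If the intent really is to set only the diagonal (rather than every element, as the nested loop above does), numpy has built-ins for that; a sketch without explicit loops:

```python
import numpy as np

array = np.zeros(shape=(3, 3))
np.fill_diagonal(array, 1)   # writes 1 onto the main diagonal, in place
print(array)

# equivalently, np.eye builds the same matrix in one call
same = np.eye(3)
```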
Related
I have a problem with a DataFrame computed over a range. In the first line I calculate and add the data; each subsequent line depends on the previous one, so the first formula is "different" and the rest repeat. I did this in a DataFrame and it works, but very slowly. All the other data so far is in the DataFrame.
import pandas as pd
import numpy as np
calc = pd.DataFrame(np.random.binomial(n=10, p=0.2, size=(5,1)))
calc['op_ol'] = calc[0]
calc.loc[0, 'op_ol'] = calc[0][0]
for ee in range(1, 5):
    # .loc avoids pandas' chained-assignment warning
    calc.loc[ee, 'op_ol'] = 0 if calc['op_ol'][ee-1] == 0 else calc[0][ee-1] * calc['op_ol'][ee-1]
How could I speed this up?
Loops over pandas objects are generally slow. I suggest these lines:
calc = pd.DataFrame(np.random.binomial(n=10, p=0.2, size=(5,1)))
calc['op_ol'] = (calc[0].cumprod() * calc[0][0]).shift(fill_value=calc[0][0])
Here cumprod is the cumulative product, and the shift pushes everything one row down, inserting the first value at the front.
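A quick way to convince yourself the one-liner matches the loop is to run both on the same data; the seed below is arbitrary and only makes the check repeatable:

```python
import numpy as np
import pandas as pd

np.random.seed(0)
calc = pd.DataFrame(np.random.binomial(n=10, p=0.2, size=(5, 1)))

# original loop version
loop = np.empty(5)
loop[0] = calc[0][0]
for ee in range(1, 5):
    loop[ee] = 0 if loop[ee - 1] == 0 else calc[0][ee - 1] * loop[ee - 1]

# vectorized version
vec = (calc[0].cumprod() * calc[0][0]).shift(fill_value=calc[0][0])

print((loop == vec.to_numpy()).all())
```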
I got a numpy array from a csv file. It is 6 x 6. I want to pick a specific row and then find the index of its smallest element.
Since you did not provide any data, this creates a random 6x6 array and finds the index of the minimum in the specific row row_n.
import numpy as np
a = np.random.rand(6, 6)
print(a)
row_n = 0
print(a[row_n, :].argmin())
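If you later need the minimum position for every row at once, argmin also takes an axis argument; a small fixed array makes the result easy to check:

```python
import numpy as np

a = np.array([[3.0, 1.0, 2.0],
              [0.5, 4.0, 6.0]])

row_n = 0
print(a[row_n, :].argmin())   # 1: the smallest value in row 0 sits in column 1

# column index of the minimum in every row at once
print(a.argmin(axis=1))       # [1 0]
```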
I have a N x M numpy array / list. I want to save this matrix into a .csv file using Pandas. Unfortunately I don't know a priori the values of M and N which can be large. I am interested in Pandas because I find it manageable in terms of data columns access.
Let's start with this MWE:
import numpy as np
import pandas as pd
N,M = np.random.randint(10,100, size = 2)
A = np.random.randint(10, size = (N,M))
columns = []
for i in range(len(A[0, :])):
    columns.append("column_{}".format(i))  # trailing space in the name removed
I cannot do something like pd.append( ) i.e. appending columns with new additional indices via a for loop.
Is there a way to save A into a .csv file?
Following the comment of Quang Hoang, there are 2 possibilities:
pd.DataFrame(A).to_csv('yourfile.csv').
np.save("yourfile.npy",A) and then A = np.load("yourfile.npy").
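Putting the CSV route together end to end might look like the sketch below; it uses an in-memory buffer in place of 'yourfile.csv' so it runs anywhere, and small fixed N, M so the shapes are easy to follow:

```python
import io
import numpy as np
import pandas as pd

N, M = 4, 3  # stand-ins for the unknown sizes
A = np.random.randint(10, size=(N, M))
columns = ["column_{}".format(i) for i in range(M)]

df = pd.DataFrame(A, columns=columns)
buf = io.StringIO()          # in-memory stand-in for 'yourfile.csv'
df.to_csv(buf, index=False)

buf.seek(0)
back = pd.read_csv(buf)
```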
I need to be able to return the function abspec2(v) for each row of the array, and sum the returned functions over the rows (not sum each value of the array). Apologies if my code is unclear; I am quite new to this. Please comment if you need clarification.
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
df =pd.read_csv(r'C:\Users\adamf\OneDrive\Desktop\hitran_water.csv')
print(df)
arr = df.to_numpy()
Here I am setting my variables, with i as the row index. I need to be able to calculate and return the function abspec2(v) for each individual row.
i = 0  # row index; the original snippet used i without defining it
nu0 = arr[i,1]
s = arr[i,2]
gamma = arr[i,3]
def abspec2(v):
    a = s/np.pi  # the original had s1, which was never defined
    b = gamma/(gamma**2 + (v - nu0)**2)
    spec = a*b
    return spec
V = np.linspace(nu0 - 10, nu0 + 10, 1000)  # V was undefined; an example grid around the line centre
plt.plot(V, abspec2(V))
plt.show()
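Since the CSV itself isn't available, here is a sketch of the summing step with made-up line parameters standing in for the rows of arr: broadcasting evaluates abspec2 for every row at once, and sum(axis=1) adds the per-row curves.

```python
import numpy as np

# made-up (centre, strength, width) rows standing in for arr[:, 1:4]
params = np.array([[1000.0, 2.0, 0.5],
                   [1010.0, 1.0, 0.8]])
nu0, s, gamma = params[:, 0], params[:, 1], params[:, 2]

def abspec2(v):
    # v[:, None] broadcasts against the per-row parameters (shape: len(v) x n_rows)
    a = s / np.pi
    b = gamma / (gamma**2 + (v[:, None] - nu0)**2)
    # summing over axis 1 adds the contribution of every row
    return (a * b).sum(axis=1)

V = np.linspace(990.0, 1020.0, 301)
spectrum = abspec2(V)
```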
I have numpy arrays which are around 2000 elements long each, but not every element has a value; some are blank. As you can see at the end of the code, I've stacked them into one array called 'match'. How would I remove a row from match if it is missing an element? For example, if a particular ID is missing the magnitude, the entire row should be removed. I'm only interested in keeping the rows that have data for all of the elements.
from astropy.table import Table
import numpy as np
data = '/home/myname/datable.fits'
data = Table.read(data, format="fits")
# astype returns a new array rather than converting in place, so assign the result
ID = np.array(data['ID']).astype(str)
redshift = np.array(data['z']).astype(float)
radius = np.array(data['r']).astype(float)
mag = np.array(data['MAG']).astype(float)
# np.stack also returns a new array; keep it in a variable
match = np.stack((ID, redshift, radius, mag), axis=1)
Here you can use numpy.isnan, which gives True for missing values and False for existing ones. Note that numpy.isnan can only be applied to NumPy arrays of a native numeric dtype (such as np.float64), not to string or object arrays.
Your requirement can be achieved as follows:
Note: considering data is your numpy array.
import numpy as np
data = np.array(some_array) # set data as your numpy array
key_col = np.array(data[:,0], dtype=np.float64) # If you want to filter based on column 0
filtered_data = data[~np.isnan(key_col)] # ~ is the logical not here
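A tiny self-contained example of the same filtering, extended to drop a row if any column is NaN (which matches keeping only the rows that have data for all elements):

```python
import numpy as np

data = np.array([[1.0, 0.5, 20.1],
                 [2.0, np.nan, 19.8],
                 [3.0, 0.7, 21.3]])

# True where a row contains at least one NaN; ~ keeps the fully populated rows
filtered = data[~np.isnan(data).any(axis=1)]
print(filtered)
```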
For better flexibility, consider using pandas!!
Hope this helps!!