Vector from other two vectors pandas python [duplicate]

This question already has answers here:
Creating an element-wise minimum Series from two other Series in Python Pandas
(10 answers)
Closed 11 months ago.
I can't find a solution for my problem. I have two vectors of pandas.Series type, T = [a1, a2, a3, ..., an] and M = [b1, b2, b3, ..., bn]. I need to create a new vector in which every element is the minimum of the two corresponding elements in the given vectors. It should look like new_vector = [min(a1, b1), min(a2, b2), ..., min(an, bn)].
Is this possible with the functions in pandas?

Yes, you can use the pandas min() function to return the lowest value per element position:
Minimum Value Comparing Each Element of Pandas Series
import pandas as pd

T = pd.Series([6, 7, 4, 1, 4, 1, 6, 8, 0])
M = pd.Series([5, 3, 8, 1, 3, 7, 1, 7, 1])

# Stack the two Series as rows of a DataFrame, then take the column-wise minimum
new_vector = pd.DataFrame([T, M]).min()
print(new_vector)
Results:
idx minValue
0 5
1 3
2 4
3 1
4 3
5 1
6 1
7 7
8 0
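An equivalent one-liner, if numpy is acceptable: np.minimum compares two Series element-wise and returns a Series, so the same result can be had without building an intermediate DataFrame (a minimal sketch using the same data):
import numpy as np
import pandas as pd

T = pd.Series([6, 7, 4, 1, 4, 1, 6, 8, 0])
M = pd.Series([5, 3, 8, 1, 3, 7, 1, 7, 1])

# Element-wise minimum of the two Series; the result is a pandas Series
new_vector = np.minimum(T, M)
print(new_vector)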


Numpy array masking (Python) [duplicate]

This question already has answers here:
check for identical rows in different numpy arrays
(7 answers)
Closed 19 days ago.
I would like to ask a question about numpy array masking.
For instance given the array below:
a b
1 2
3 4
5 6
6 5
I have another array which is
a b
1 2
3 4
I want to compare two arrays and find the index numbers of second array in the first array.
For instance, the solution should be index=[0,1]
I have tried with np.where:
np.where(~(np.abs(a - b[:,None]).sum(-1)==0).any(0))
but it does not give me the final result.
Thanks for suggestions!
A possible solution, based on broadcasting, where ar1 and ar2 are the two arrays:
np.nonzero(np.any(np.all(ar1 == ar2[:,None], axis=2), axis=0))[0]
Output:
array([0, 1])
import numpy as np

a = np.array([[1, 2], [3, 4], [5, 6], [6, 5]])
b = np.array([[1, 2], [3, 4]])
# Second element of the np.where tuple holds the matching row indices of a
np.where(np.all(a == b[:, None], axis=2))[1]  # array([0, 1])
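If the broadcasting is hard to follow, here is the same computation broken down step by step (a sketch; the intermediate names are mine):
import numpy as np

a = np.array([[1, 2], [3, 4], [5, 6], [6, 5]])
b = np.array([[1, 2], [3, 4]])

pairwise = (a == b[:, None])       # shape (2, 4, 2): every row of b vs every row of a
row_match = pairwise.all(axis=2)   # shape (2, 4): True where an entire row matches
idx = np.where(row_match.any(axis=0))[0]
print(idx)                         # [0 1]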

Pandas: Replacing values with np.nan in Column A depending on value in column B [duplicate]

This question already has answers here:
Conditional Replace Pandas
(7 answers)
Pandas DataFrame: replace all values in a column, based on condition
(8 answers)
Replacing dataframe values with NaN based on condition while preserving shape of df
(2 answers)
Closed 4 months ago.
I have a dataframe that looks something like this:
wavelength normalized flux lof
0 5100.00 0.948305 1
1 5100.07 0.796783 1
2 5100.14 0.696425 1
3 5100.21 0.880586 1
4 5100.28 0.836257 1
... ... ... ...
4281 5399.67 1.076449 1
4282 5399.74 1.038198 1
4283 5399.81 1.004292 1
4284 5399.88 0.946977 1
4285 5399.95 0.894559 1
If lof = -1, I want to replace the normalized flux value with np.nan. Otherwise, just leave the normalized flux value as is. Is there a simple way to do this?
You can just assign:
df.loc[df['lof'] == -1, 'normalized flux'] = np.nan
Or use Series.mask, which replaces values with NaN where the condition holds:
df['normalized flux'] = df['normalized flux'].mask(df['lof'].eq(-1), np.nan)
df
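An equivalent approach with numpy.where, assuming the column really is named 'normalized flux' as shown in the question (a sketch, not the only way):
import numpy as np

# Keep the flux where lof != -1, otherwise insert NaN
df['normalized flux'] = np.where(df['lof'] == -1, np.nan, df['normalized flux'])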

Apply function to two columns and map the output to a new column [duplicate]

This question already has answers here:
How to apply a function to two columns of Pandas dataframe
(15 answers)
Closed 3 years ago.
I am new to Pandas. Would like to know how to apply a function to two columns in a dataframe and map the output from the function to a new column in the dataframe. Is this at all possible with pandas syntax or should I resort to native Python to iterate over the rows in the dataframe columns to generate the new column?
a b
1 2
3 1
2 9
The question is how to get, for example, the product of the two numbers into a new column c:
a b c
1 2 2
3 1 3
2 9 18
You can do this with pandas.
For example:
def funcMul(row):
    return row['a'] * row['b']
Then,
df['c'] = df.apply(funcMul, axis=1)
Output:
a b c
0 1 2 2
1 3 1 3
2 2 9 18
You can do the following with pandas:
import pandas as pd

def func(r):
    # Access the row by label; positional access like r[0] on a labeled Series is deprecated
    return r['a'] * r['b']

df = pd.DataFrame({'a': [1, 2, 3], 'b': [4, 5, 6]})
df['c'] = df.apply(func, axis=1)
Also, here is the official documentation https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.DataFrame.apply.html
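For this particular example, plain column arithmetic is simpler and much faster than apply, since pandas vectorizes it; this is presumably what the comment referenced in the next answer shows (a minimal sketch):
import pandas as pd

df = pd.DataFrame({'a': [1, 3, 2], 'b': [2, 1, 9]})
# Vectorized multiplication over whole columns; no per-row Python call
df['c'] = df['a'] * df['b']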
The comment by harvpan shows the simplest way to achieve your specific example, but here is a generic way to do what you asked:
def functionUsedInApply(row):
    """The function logic for the apply call goes here.

    row: a pandas Series containing one row of df.
    """
    return row['a'] * row['b']

def functionUsedInMap(value):
    """This function is used in the map after the apply.

    For this example, if the value is larger than 5,
    return the cube; otherwise, return the square.

    value: a value of whatever type is returned by functionUsedInApply.
    """
    if value > 5:
        return value**3
    else:
        return value**2

df['new_column_name'] = df.apply(functionUsedInApply, axis=1).map(functionUsedInMap)
The functions above first multiply columns a and b, then return the square of that value when a*b <= 5 and the cube when a*b > 5.

Extracting the nth element from each list and storing it in a new column [duplicate]

This question already has answers here:
How do I select an element in array column of a data frame?
(2 answers)
Closed 4 years ago.
I have a dataframe (called 'df') which contains a column called 'grades'. This column contains a list of grades; its data is of type 'object'.
student_id grades
0 11 [A,A,B,A]
1 12 [B,B,B,C]
2 13 [C,C,D,B]
3 21 [B,A,C,B]
I'm hoping to create a new column called 'maths_grade', which will store the 3rd element of each grades list.
Example Output:
student_id grades maths_grade
0 11 [A,A,B,A] B
1 12 [B,B,B,C] B
2 13 [C,C,D,B] D
3 21 [B,A,C,B] C
What's the best way to go about this?
Use indexing with .str, because it works with any iterable:
df['maths_grade'] = df['grades'].str[2]
Or use a list comprehension, if there are no missing values and performance is important:
df['maths_grade'] = [x[2] for x in df['grades']]
df['maths_grade'] = df['grades'].apply(lambda x: x[2]) will also do the job.
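For completeness, a self-contained sketch of the .str approach on the question's data (the DataFrame construction is mine; .str[2] returns NaN for rows whose list is missing or too short):
import pandas as pd

df = pd.DataFrame({'student_id': [11, 12, 13, 21],
                   'grades': [['A', 'A', 'B', 'A'],
                              ['B', 'B', 'B', 'C'],
                              ['C', 'C', 'D', 'B'],
                              ['B', 'A', 'C', 'B']]})
df['maths_grade'] = df['grades'].str[2]
print(df)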

Mapping rows of a Pandas dataframe to numpy array

Sorry, I know there are so many questions relating to indexing, and it's probably staring me in the face, but I'm having a little trouble with this. I am familiar with the .loc, .iloc, and .index methods and with slicing in general. The method .reset_index may not have been (and may not be able to be) called on our dataframe, and therefore the index labels may not be in order. The dataframe and numpy array(s) are actually different-length subsets of the dataframe, but for this example I'll keep them the same size (I can handle offsetting once I have an example).
I can pull columns of rows from the dataframe based on some search criteria:
idxlbls = df.index[df['timestamp'] == dt]
stuff = df.loc[idxlbls, 'col3':'col5']
But how do I map those to row numbers (array indices, not label indices) to be used as array indices in numpy (assuming the same row length)?
stuffprime = array[?, ?]
The reason I need it is because the dataframe is much larger and more complete and contains the column searching criteria, but the numpy arrays are subsets that have been extracted and modified prior in the pipeline (and do not have the same searching criteria in them). I need to search the dataframe and pull the equivalent data from the numpy arrays. Basically I need to correlate specific rows from a dataframe to the corresponding rows of a numpy array.
I would map pandas indices to numpy indices:
keys_dict = dict(zip(idxlbls, range(len(idxlbls))))
Then you may use the dictionary keys_dict to address the array elements by a pandas index: array[keys_dict[some_df_index], :]
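A minimal sketch of that idea. Here the dictionary is built from the full df.index so that each label maps to its positional row, assuming the array is row-aligned with the whole dataframe (the dataframe, labels, and arr below are hypothetical stand-ins):
import numpy as np
import pandas as pd

df = pd.DataFrame({'timestamp': ['a', 'b', 'a', 'c']}, index=[7, 3, 9, 1])
arr = np.arange(12).reshape(4, 3)                # row i of arr matches row i of df

keys_dict = dict(zip(df.index, range(len(df))))  # label -> positional row
idxlbls = df.index[df['timestamp'] == 'a']       # labels [7, 9]
rows = [keys_dict[lbl] for lbl in idxlbls]       # positions [0, 2]
print(arr[rows, :])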
I believe you need get_indexer to get positions from the filtered column names; for the index you can use the same approach, or numpy.where to get positions from a boolean mask:
import numpy as np
import pandas as pd

df = pd.DataFrame({'timestamp': list('abadef'),
                   'B': [4, 5, 4, 5, 5, 4],
                   'C': [7, 8, 9, 4, 2, 3],
                   'D': [1, 3, 5, 7, 1, 0],
                   'E': [5, 3, 6, 9, 2, 4]}, index=list('ABCDEF'))
print (df)
timestamp B C D E
A a 4 7 1 5
B b 5 8 3 3
C a 4 9 5 6
D d 5 4 7 9
E e 5 2 1 2
F f 4 3 0 4
idxlbls = df.index[df['timestamp'] == 'a']
stuff = df.loc[idxlbls, 'C':'E']
print (stuff)
C D E
A 7 1 5
C 9 5 6
a = df.index.get_indexer(stuff.index)
Or get positions by boolean mask:
a = np.where(df['timestamp'] == 'a')[0]
print (a)
[0 2]
b = df.columns.get_indexer(stuff.columns)
print (b)
[2 3 4]
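With the row positions a and the column positions b in hand, the matching block can be pulled from a row-aligned numpy array with fancy indexing (a sketch; arr stands in for the separately-built array from the question):
arr = df.to_numpy()              # stand-in for the real aligned array
stuffprime = arr[a[:, None], b]  # rows [0, 2] crossed with columns [2, 3, 4]
print(stuffprime)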
