I have the data frame below, and a variable ID = 1052107168068132864.
How can I filter the data frame so that the row with that ID, and every row after it, are dropped, giving the result below? In other words, I want to drop all rows from that ID onward, including the row itself.
Then I want to update ID to 1052121282324692992, the new current (top) value.
I want to repeat this in a loop, so that every time I get a new data frame the same operation runs again; if ID is already the top value, nothing should happen.
I have two solutions, but they only work when the index is sequential (0, 1, 2, ...):
df.iloc[:df[df.ID == '1052121282324692992'].index.item()]
or
idx = (df['ID'] == ID).idxmax()
new_df = df.iloc[:idx, :]
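Both attempts mix labels and positions: .index.item() and idxmax() return an index label, while iloc expects a position, so the two only agree when the index is 0, 1, 2, .... A minimal sketch of the idxmax variant that converts the label into a position with Index.get_loc first, assuming the index labels are unique (the sample frame and its small IDs are hypothetical):

```python
import pandas as pd

# Hypothetical frame with a non-sequential index.
df = pd.DataFrame({"ID": [30, 20, 10, 5]}, index=[7, 3, 9, 1])
ID = 10

label = (df["ID"] == ID).idxmax()  # index label of the first matching row
pos = df.index.get_loc(label)      # translate that label into a position
new_df = df.iloc[:pos]             # keep only the rows above the match
```

If ID does not occur at all, idxmax still returns the first label, so new_df comes back empty; check (df["ID"] == ID).any() first if that case matters.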
I have a big DataFrame (rates) that contains all the information, and a second DataFrame (aig_df) that contains a few of its rows.
I need a third DataFrame that is rates without the rows in aig_df, but the remaining rows must keep their original index.
With my current code I get the third DataFrame with all the information needed, but with an integer index, while I need each row's original index (Index = Stock Ticker).
rates = pd.read_sql("SELECT Ticker, Carrier, Product, Name, CDSC,StrategyTerm,ParRate,Spread,Fee,Cap FROM ProductRates ", conn).set_index('Ticker')
aig_df = rates.query('Product == "X5 Advantage AnnuitySM"')
competitors_df = pd.merge(rates, aig_df[['Carrier', 'Product', 'Name','CDSC','StrategyTerm','ParRate','Spread','Fee','Cap']],indicator=True,
how='outer').query('_merge=="left_only"').drop('_merge',axis=1)
Is there any way to do what I need?
Thanks for your attention
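In your specific case, you don't need a merge at all. A boolean filter keeps the original Ticker index, whereas merging on columns gives the result a new integer index: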
result = rates[rates["Product"] != "X5 Advantage AnnuitySM"]
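If the rows to exclude cannot be expressed as a simple filter, a merge-based anti-join can still keep the index: move Ticker into a column before merging and restore it afterwards. A minimal sketch, with three hypothetical rows (and only the Product and Cap columns) standing in for the ProductRates table:

```python
import pandas as pd

# Hypothetical stand-in for the ProductRates query result.
rates = pd.DataFrame({
    "Ticker": ["AAA", "BBB", "CCC"],
    "Product": ["X5 Advantage AnnuitySM", "Other", "Another"],
    "Cap": [5.0, 6.0, 7.0],
}).set_index("Ticker")

aig_df = rates.query('Product == "X5 Advantage AnnuitySM"')

# Ticker becomes an ordinary column on both sides, so it survives the merge
# (merge matches on all shared columns here: Ticker, Product, Cap).
merged = rates.reset_index().merge(aig_df.reset_index(), how="left", indicator=True)

competitors_df = (
    merged[merged["_merge"] == "left_only"]  # rows found only in rates
    .drop(columns="_merge")
    .set_index("Ticker")                     # restore the ticker index
)
```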
I have to reassign a column value for specific rows based on state. The DataFrame I am working with has only two columns, SET VALUE and AMOUNT, with STATE as the index. I need to set SET VALUE to 'YES' for the 3 customers with the highest AMOUNT in each state. How can I do this in pandas?
I tried looping over the states in the index, sorting each state's rows by AMOUNT, and assigning 'YES' to the first three rows of the SET VALUE column:
for state in trial.index:
    trial[trial.index == state].sort_values('AMOUNT', ascending = False)['SET VALUE'].iloc[0:3] = 'YES'
    print(trial[trial.index == state])
I expect each print in this loop to show 3 'YES' values, but all I get are 'NO' values (the column's default). It is unclear to me why this is happening.
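A minimal reproduction with made-up data shows why trial never changes: boolean indexing and sort_values each return a new object, so the assignment lands on a temporary at best.

```python
import warnings

import pandas as pd

# Made-up data: one state, four customers, every SET VALUE starts as "NO".
trial = pd.DataFrame(
    {"SET VALUE": ["NO"] * 4, "AMOUNT": [10, 40, 30, 20]},
    index=pd.Index(["CA"] * 4, name="STATE"),
)

subset = trial[trial.index == "CA"]                      # boolean indexing copies
ordered = subset.sort_values("AMOUNT", ascending=False)  # another new object

with warnings.catch_warnings():
    warnings.simplefilter("ignore")  # pandas may warn about chained assignment here
    # Changes `ordered` at most (nothing at all under copy-on-write), never `trial`.
    ordered["SET VALUE"].iloc[0:3] = "YES"

print(trial["SET VALUE"].tolist())  # ['NO', 'NO', 'NO', 'NO']
```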
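Your loop has no effect because trial[trial.index == state] and sort_values(...) each return a new DataFrame, so 'YES' is written into a temporary object and never into trial. More generally, I would advise against a repeated index for various reasons; this case is one of them, as it makes updating specific rows harder. Here's what I would do: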
# make STATE a column, and index continuous numbers
df = df.reset_index()
# get the actual indexes of the largest amounts
idx = df.groupby('STATE').AMOUNT.nlargest(3).index.get_level_values(1)
# update
df.loc[idx, 'SET VALUE'] = 'YES'
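The same idea as a self-contained sketch on hypothetical sample data. Instead of nlargest it sorts once and keeps the first 3 rows per state with groupby().head(3), which returns the original row labels, so they can be fed straight to .loc:

```python
import pandas as pd

# Hypothetical sample data: STATE is a repeated index, as in the question.
trial = pd.DataFrame(
    {"SET VALUE": ["NO"] * 7, "AMOUNT": [5, 50, 20, 40, 7, 3, 9]},
    index=pd.Index(["CA", "CA", "CA", "CA", "NY", "NY", "NY"], name="STATE"),
)

df = trial.reset_index()  # STATE becomes a column; the index is 0..n-1

# Sort by AMOUNT, then take the first 3 rows of each state; .index holds
# their original (unique) row labels.
top_idx = df.sort_values("AMOUNT", ascending=False).groupby("STATE").head(3).index
df.loc[top_idx, "SET VALUE"] = "YES"

result = df.set_index("STATE")  # put STATE back as the index if needed
```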
For the first question (dropping every row from a given ID onward), assuming the IDs are unique, using iloc:
df.iloc[:df[df.ID == '1052121282324692992'].index.item()]
Using idxmax
idx = (df['ID'] == ID).idxmax()
new_df = df.iloc[:idx, :]
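For the loop part of the question, a minimal sketch that carries the current ID across incoming frames. The helper name new_rows_since, the sample frames and their small IDs are hypothetical, and it assumes that when the current ID is missing from a frame, every row counts as new. It works with positions (argmax on a NumPy mask) rather than labels, so the index does not need to be sequential:

```python
import pandas as pd


def new_rows_since(df, last_id):
    """Hypothetical helper: rows above the first row whose ID equals last_id."""
    matches = (df["ID"] == last_id).to_numpy()
    if not matches.any():
        return df  # assumption: last_id absent, so keep everything
    return df.iloc[: int(matches.argmax())]  # positional slice, any index works


# Hypothetical stream of frames, newest ID first, with non-sequential indexes.
frames = [
    pd.DataFrame({"ID": [105, 104, 103]}, index=[10, 20, 30]),
    pd.DataFrame({"ID": [107, 106, 105, 104]}, index=[5, 9, 2, 8]),
    pd.DataFrame({"ID": [107, 106, 105]}, index=[1, 2, 3]),
]

last_id = 103
collected = []
for frame in frames:
    fresh = new_rows_since(frame, last_id)
    if fresh.empty:
        continue  # last_id is already the top row: nothing to do
    collected.append(fresh)
    last_id = fresh["ID"].iloc[0]  # newest ID becomes the current value
```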
I am trying to extract data from Quandl, and I want to get the Date and the 'Open' value for each row. However, I am not sure how to do it; the different methods I have tried haven't worked. Below is an example:
data = quandl.get("EOD/PG", trim_start="2011-12-12", trim_end="2011-12-30", authtoken=quandl.ApiConfig.api_key)
data = data.reset_index()
sta = data[['Date','Open']]
for row in sta:
    price = row.iloc[:,1]
    date = row.iloc[:, 0]
What you're doing with the code you have provided is iterating through the column names, i.e. you get 'Date' on the first iteration, and 'Open' on the next (and last).
To iterate through a DataFrame by row, you can use .iterrows() or .itertuples(). (.iteritems() iterates over columns rather than rows, and was removed in pandas 2.0 in favor of .items().)
For example,
for row in data.itertuples():
    price = row.Open
    date = row.Date
That said, iterating through a pandas DataFrame is really slow. Chances are, whatever you intend to do can be done faster with pandas' vectorized operations, i.e. without a loop.
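A small sketch of both styles side by side. The frame below is a hypothetical stand-in for the Quandl result after reset_index() (the prices are made up, not real PG data), and the 64.2 threshold is just an example filter:

```python
import pandas as pd

# Hypothetical stand-in for data.reset_index()[['Date', 'Open']].
sta = pd.DataFrame({
    "Date": pd.to_datetime(["2011-12-12", "2011-12-13", "2011-12-14"]),
    "Open": [64.5, 65.0, 64.0],
})

# Row by row: itertuples yields namedtuples whose fields are the column names.
pairs = [(row.Date, row.Open) for row in sta.itertuples(index=False)]

# Vectorized: whole-column operations, no Python-level loop.
open_change = sta["Open"].diff()                  # change from the previous open
dates_above = sta.loc[sta["Open"] > 64.2, "Date"]  # dates whose open exceeds 64.2
```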
I have been trying to wrap my head around this for a while now and have yet to come up with a solution.
My question: how do I change the values in multiple columns, based on each column's name, when a criterion is met?
I have survey data that I read from a CSV file into a pandas DataFrame:
import pandas as pd
df = pd.read_csv("survey_data")
I have created a dictionary mapping each column name to the value I want in that column wherever the current value is 1. Each column contains either 1 or NaN. Basically, a 1 in any column ending in '_SA' should become 5, '_A' 4, '_NO' 3 and '_D' 2, while '_SD' keeps its current value of 1. All NaN values should stay as they are. This is the dictionary:
op_dict = {
'op_dog_SA':5,
'op_dog_A':4,
'op_dog_NO':3,
'op_dog_D':2,
'op_dog_SD':1,
'op_cat_SA':5,
'op_cat_A':4,
'op_cat_NO':3,
'op_cat_D':2,
'op_cat_SD':1,
'op_fish_SA':5,
'op_fish_A':4,
'op_fish_NO':3,
'op_fish_D':2,
'op_fish_SD':1}
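Since the score depends only on the column-name suffix, the same mapping can also be derived from the names, which avoids typos in a hand-written dictionary. A minimal sketch, assuming every opinion column follows a hypothetical op_<topic>_<suffix> pattern with the topics listed here:

```python
# Score for each answer suffix, as described in the question.
SUFFIX_SCORES = {"SA": 5, "A": 4, "NO": 3, "D": 2, "SD": 1}

# Hypothetical: assumes every opinion column is named op_<topic>_<suffix>.
op_cols = [
    f"op_{topic}_{suffix}"
    for topic in ("dog", "cat", "fish")
    for suffix in SUFFIX_SCORES
]

# The text after the last underscore picks the score, so "op_dog_SA" -> 5.
op_dict = {col: SUFFIX_SCORES[col.rsplit("_", 1)[-1]] for col in op_cols}
```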
I have also created a list, op_cols, of the columns I want to change wherever the current value is 1. I have been trying something like this, which is meant to iterate through the values in those columns and replace each 1 with the mapped value from the dictionary:
for i in df[op_cols]:
    if i == 1:
        df[op_cols].apply(lambda x: op_dict.get(x,x))
df[op_cols]
It doesn't raise an error, but it doesn't replace the 1 values with the corresponding values from the dictionary; they remain 1.
Any advice on why this doesn't work, or on a more efficient approach, would be greatly appreciated.
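So if I understand your question, you want to replace all the 1s in each column with 1, 2, 3, 4 or 5 depending on the column name?
Your loop doesn't work because iterating over a DataFrame yields column names, so i == 1 is never true, and even if the apply line ran, its result is never assigned back to df. I think all you need to do is iterate through your list and multiply each column by the value your dict returns: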
for col in op_cols:
    df[col] = df[col]*op_dict[col]
This does what you describe and is far faster than replacing values one at a time. NaNs will stay NaN; you could handle them in the loop with fillna if you like.
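The loop can also be collapsed into one call: DataFrame.mul aligns a Series on the column labels, so each column is multiplied by its own score. A minimal sketch with a trimmed, hypothetical op_dict and made-up survey rows:

```python
import numpy as np
import pandas as pd

# Trimmed, hypothetical version of op_dict and the survey data.
op_dict = {"op_dog_SA": 5, "op_dog_A": 4, "op_dog_SD": 1}
op_cols = list(op_dict)

df = pd.DataFrame({
    "op_dog_SA": [1, np.nan, np.nan],
    "op_dog_A": [np.nan, 1, np.nan],
    "op_dog_SD": [np.nan, np.nan, 1],
    "respondent": ["r1", "r2", "r3"],  # a column that must stay untouched
})

# The Series index matches op_cols, so every column gets its own multiplier;
# NaN * score is still NaN.
df[op_cols] = df[op_cols].mul(pd.Series(op_dict))
```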