Modifying entries of a pandas dataframe within a loop

I want to add the probability for each record to a dataframe, so I used a for loop:
def map_score(dataframe,customers,prob):
    dataframe['Propensity'] = 0
    for i in range(len(dataframe)):
        for j in range(len(customers)):
            if dataframe['Client'].iloc[i] == customers[j]:
                dataframe["Propensity"].iloc[i] = prob[j]
I am able to map the probabilities associated with each client correctly, but pandas raises a warning:
A value is trying to be set on a copy of a slice from a DataFrame.
Try using .loc[row_indexer,col_indexer] = value instead
See the caveats in the documentation: http://pandas.pydata.org/pandas-docs/stable/indexing.html#indexing-view-versus-copy
from ipykernel import kernelapp as app
When I use .loc instead, the result is wrong and I get null values.
Please suggest a good way to update and add entries conditionally.

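You are attempting to assign to a copy.
dataframe["Propensity"] selects a single column, and the result may be a copy of that part of dataframe, so chaining .iloc[i] = ... onto it may never reach dataframe itself.
However, you are tracking the row position with i. So how do you use .loc, which works with labels, when you have the column name "Propensity" and an index position i? Passing i to .loc directly treats it as a label, which would probably explain the null values if your index is not 0, 1, 2, ...
Assign some variable, say idx, equal to dataframe.index at that position:
idx = dataframe.index[i]
Then you can use .loc for the assignment without issues:
dataframe.loc[idx, "Propensity"] = prob[j]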
def map_score(dataframe,customers,prob):
    dataframe['Propensity'] = 0
    for i in range(len(dataframe)):
        idx = dataframe.index[i]
        for j in range(len(customers)):
            if dataframe['Client'].iloc[i] == customers[j]:
                dataframe.loc[idx, "Propensity"] = prob[j]
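If the goal is only to look up each client's probability, the loops can probably be dropped altogether. Below is a minimal sketch, assuming customers and prob are equal-length lists and that clients without a match should keep 0, as in the loop version. The name map_score_vectorized and the sample data are made up for illustration.

```python
import pandas as pd


def map_score_vectorized(dataframe, customers, prob):
    """Hypothetical loop-free variant of map_score; returns a new DataFrame."""
    # Build a client -> probability lookup; a later duplicate wins,
    # which matches the behaviour of the nested loops.
    lookup = dict(zip(customers, prob))
    # Series.map leaves unmatched clients as NaN, so fill those with 0.
    propensity = dataframe["Client"].map(lookup).fillna(0)
    # assign() returns a new frame instead of writing into the caller's object.
    return dataframe.assign(Propensity=propensity)


# Made-up sample data with a non-default index.
df = pd.DataFrame({"Client": ["a", "b", "c"]}, index=[10, 20, 30])
scored = map_score_vectorized(df, customers=["c", "a"], prob=[0.9, 0.2])
print(scored)
```

Because the function returns a new frame rather than writing through a selection, there is no chained assignment for pandas to warn about.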

Related

apply max to varying-dimension subsets of pandas dataframe

For a dataframe with an indexed column with repeated indexes, I'm trying to get the maximum value found in a different column, by index, and assign it to a third column, so that for any given row, we can see the maximum value found in any row with the same index.
I'm doing this over a very large data set and would like it to be vectorized if possible. For now, I can't get it to work at all:
multiindexDF = pd.DataFrame([[1,2,3,3,4,4,4,4],[5,6,7,10,15,11,25,89]]).transpose()
multiindexDF.columns = ['theIndex','theValue']
multiindexDF['maxValuePerIndex'] = 0
uniqueIndices = multiindexDF['theIndex'].unique()
for i in uniqueIndices:
    matchingIndices = multiindexDF['theIndex'] == i
    maxValue = multiindexDF[matchingIndices == i]['theValue'].max()
    multiindexDF.loc[matchingIndices]['maxValuePerIndex'] = maxValue
This fails with a warning telling me I should use .loc, even though I'm already using it. I'm not sure what the warning means, or how to fix this so that I can vectorize it instead of looping through everything.
I'm looking for this:
targetDF = pd.DataFrame([[1,2,3,3,4,4,4,4],[5,6,10,7,15,11,25,89],[5,6,10,10,89,89,89,89]]).transpose()
targetDF

This looks like a good case for groupby with transform: it computes the maximum value per index group and broadcasts it back onto the original rows (rather than the grouped index):
multiindexDF['maxValuePerIndex'] = multiindexDF.groupby("theIndex")["theValue"].transform("max")
The reason you're getting the SettingWithCopyWarning is that your .loc call takes a slice of a slice and sets the value there. Note the two pairs of square brackets in:
multiindexDF.loc[matchingIndices]['maxValuePerIndex'] = maxValue
The .loc is followed by another [] in a chain, so the value is assigned to the intermediate slice rather than to the original DataFrame.
So using your original approach:
for i in uniqueIndices:
    matchingIndices = multiindexDF['theIndex'] == i
    maxValue = multiindexDF.loc[matchingIndices, 'theValue'].max()
    multiindexDF.loc[matchingIndices, 'maxValuePerIndex'] = maxValue
(Notice I've also changed the first .loc where you were incorrectly using the boolean index)
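To check the groupby/transform one-liner against the expected maxima, here is a small self-contained run on the question's data. The variable name expected is mine; everything else follows the snippets above.

```python
import pandas as pd

multiindexDF = pd.DataFrame(
    [[1, 2, 3, 3, 4, 4, 4, 4], [5, 6, 7, 10, 15, 11, 25, 89]]
).transpose()
multiindexDF.columns = ["theIndex", "theValue"]

# One maximum per group, broadcast back to every row of that group.
multiindexDF["maxValuePerIndex"] = (
    multiindexDF.groupby("theIndex")["theValue"].transform("max")
)

# Maxima listed in the question's target frame.
expected = [5, 6, 10, 10, 89, 89, 89, 89]
print(multiindexDF)
```

The column is assigned directly on multiindexDF in a single step, so there is no intermediate slice involved.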

Check whether column values are within range

Here's what I have in my dataframe-
RecordType  Latitude  Longitude  Name
L           28.2N     70W        Jon
L           34.3N     56W        Dan
L           54.2N     72W        Rachel
Note: The dtype of all the columns is object.
Now, in my final dataframe, I only want to include those rows in which the Latitude and Longitude fall in a certain range (say 24 < Latitude < 30 and 79 < Longitude < 87).
My idea is to apply a function to all the values in the Latitude and Longitude columns to first get float values like 28.2, etc. and then to compare the values to see if they fall into my range. So I wrote the following-
def numbers(value):
    return float(value[:-1])
result[u'Latitude'] = result[u'Latitude'].apply(numbers)
result[u'Longitude'] = result[u'Longitude'].apply(numbers)
But I get the following warning-
Warning:
A value is trying to be set on a copy of a slice from a DataFrame.
Try using .loc[row_indexer,col_indexer] = value instead
See the caveats in the documentation: http://pandas.pydata.org/pandas-docs/stable/indexing.html#indexing-view-versus-copy
I'm having a hard time understanding this since I'm new to Pandas. What's the best way to do this?

If you don't want to modify df, I would suggest getting rid of the apply and vectorising this. One option is using eval.
u = df.assign(Latitude=df['Latitude'].str[:-1].astype(float))
u['Longitude'] = df['Longitude'].str[:-1].astype(float)
df[u.eval("24 < Latitude < 30 and 79 < Longitude < 87")]
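You have more options using Series.between:
u = df['Latitude'].str[:-1].astype(float)
v = df['Longitude'].str[:-1].astype(float)
# pandas >= 1.3 spells exclusive bounds as inclusive="neither"; older versions used inclusive=False
df[u.between(24, 30, inclusive="neither") & v.between(79, 87, inclusive="neither")]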
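For a version-independent variant, the same filter can be written with plain comparisons. This is only a sketch: the fourth row ("Ana") is made up so that at least one record falls inside the range, and it assumes every value ends in exactly one hemisphere letter.

```python
import pandas as pd

df = pd.DataFrame({
    "RecordType": ["L", "L", "L", "L"],
    "Latitude": ["28.2N", "34.3N", "54.2N", "25.1N"],
    "Longitude": ["70W", "56W", "72W", "80.3W"],
    "Name": ["Jon", "Dan", "Rachel", "Ana"],  # "Ana" is a made-up row
})

# Strip the trailing hemisphere letter and convert to float,
# without writing anything back into df.
lat = df["Latitude"].str[:-1].astype(float)
lon = df["Longitude"].str[:-1].astype(float)

# Strict inequalities: 24 < Latitude < 30 and 79 < Longitude < 87.
mask = (lat > 24) & (lat < 30) & (lon > 79) & (lon < 87)
in_range = df[mask]
print(in_range)
```

Since lat and lon are separate Series and df is only read, nothing here assigns into a possible copy.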

As for why Pandas threw that particular A value is trying to be set on a copy of a slice... warning and how to avoid it:
First, using this syntax should prevent the warning:
result.loc[:,'Latitude'] = result['Latitude'].apply(numbers)
Pandas gave you the warning because it cannot tell whether the assignment modifies your dataframe or only a temporary copy of its data. The documentation you referenced (http://pandas.pydata.org/pandas-docs/stable/indexing.html#indexing-view-versus-copy) gives examples of how this can cause unexpected results in certain situations.
Pandas recommends syntax that makes the assignment target explicit, so that your dataframe ends up modified in the manner you expect. The .loc form above tells Pandas to set the values of that column on result itself, which in many cases stops the warning. If result was itself created as a slice of another dataframe, the warning can still appear; in that case, taking an explicit .copy() when result is created avoids it.

Assigning pandas selection to a variable and then modifying it

I am trying to select some rows from a pandas dataframe and store the subset/selection into a variable so I can perform multiple operations on this subset (including modification) without having to do the selection again. But I don't quite understand why it doesn't work.
For example, this doesn't work as expected (the original df doesn't get modified):
df = pd.DataFrame({"a":list(range(1,3))})
subDf = df.loc[df.a==2,:]
subDf.loc[:,"a"] = -1 # also throws SettingWithCopyWarning
# ... do more stuff with subDf...
But, this works as expected:
df = pd.DataFrame({"a":list(range(1,3))})
mask = (df.a==2)
df.loc[mask,"a"] = -1
After reading the pandas docs on indexing view vs copy, I was under the impression that selecting via .loc will return a view, but apparently that's not the case given the SettingWithCopyWarning. What am I misunderstanding here?

In subDf = df.loc[df.a==2,:] you are actually calling __getitem__ (df.loc.__getitem__), which is not guaranteed to return a view. When you assign through loc (for example df.loc[mask,"a"] = -1) you are actually calling __setitem__ (df.loc.__setitem__). Because that call has to write the value into the selected slice, it is guaranteed to operate on df itself.
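The difference can be seen in a short experiment. This is a sketch rather than a description of pandas internals: the explicit .copy() is my addition so the first half runs without the warning, and the names before and after are made up.

```python
import pandas as pd

df = pd.DataFrame({"a": [1, 2]})

# Reading through .loc (__getitem__): treat the result as independent data.
subDf = df.loc[df["a"] == 2, :].copy()
subDf.loc[:, "a"] = -1
before = df["a"].tolist()  # df has not been touched

# Writing through .loc (__setitem__): the assignment targets df directly.
mask = df["a"] == 2
df.loc[mask, "a"] = -1
after = df["a"].tolist()

print(before, after)
```

If several operations need the same subset, keeping the mask around and writing back with df.loc[mask, ...] each time avoids redoing the selection logic while still modifying df.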

Assigning New Column's Value From a List of Values Warning For Large DataSets - Pandas

I have a list of Data Frames like this:
sm = pd.DataFrame([["Forever", 'BenHarper'],["Steel My Kisses", 'Kack Johnson'],\
                   ["Diamond On the Inside",'Xavier Rudd'],[ "Count On Me", "Bruno Mars"]],\
                  columns=["Song", "Artist"])
pm = pd.DataFrame([["I am yours", 'Jack Johnson'],["Chasing Cars", 'Snow Patrol'],\
                   ["Kingdom Comes",'Cold Play'],[ "Time of your life", "GreenDay"]],\
                  columns=["Song", "Artist"])
df_list = [sm,pm]
Now, I have another list of values that I would like to assign as a new column to the DataFrames in my list.
years = ["1999", "2003"]
I used the following code (it works okay for smaller data sets):
df_with_year = []
for df in df_list:
    for j in years:
        df["Year"] = j
    df_with_year.append(df)
However, when I use the same logic on a bigger dataset, I get a warning:
SettingWithCopyWarning: A value is trying to be set on a copy of a
slice from a DataFrame. Try using .loc[row_indexer,col_indexer] =
value instead
See the caveats in the documentation:
http://pandas.pydata.org/pandas-docs/stable/indexing.html#indexing-view-versus-copy
Any ideas why I am getting this copy warning? I went through the provided link, but it talks about a column that already exists, in which case I can use .loc. In my case, I am creating a new column and assigning values.

If your DataFrame df is itself a sub-DataFrame of some other parent_df, this SettingWithCopyWarning is often triggered by lines like df["Year"] = j or even df.loc[:, "Year"] = j.
As long as you are not trying to use df["Year"] = j as a way to modify parent_df, you can safely ignore the SettingWithCopyWarning.
If you'd rather not see the warning anyway, you can silence it globally by setting:
pd.options.mode.chained_assignment = None
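An alternative to silencing the warning is to build new frames instead of writing into the existing ones. The sketch below assumes the intent is one year per DataFrame, paired by position (the nested loop in the question would leave every frame with the last year). The shortened sample data is made up.

```python
import pandas as pd

sm = pd.DataFrame({"Song": ["Forever", "Count On Me"],
                   "Artist": ["Ben Harper", "Bruno Mars"]})
pm = pd.DataFrame({"Song": ["Chasing Cars"],
                   "Artist": ["Snow Patrol"]})
df_list = [sm, pm]
years = ["1999", "2003"]

# Pair each frame with its year; assign() returns a new DataFrame,
# so nothing is written into a frame that might be a slice of a parent.
df_with_year = [df.assign(Year=year) for df, year in zip(df_list, years)]

for frame in df_with_year:
    print(frame)
```

Note that zip stops at the shorter of the two lists, so the lists should have the same length for every frame to get a year.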

Python view vs copy error wants me to use .loc in script only

I'm running a long script that has a dataframe df. As the script runs, building up and modifying df column by column, I get this warning over and over again in the command line:
A value is trying to be set on a copy of a slice from a DataFrame. Try
using .loc[row_indexer,col_indexer] = value instead See the caveats in
the documentation:
http://pandas.pydata.org/pandas-docs/stable/indexing.html#indexing-view-versus-copy
But then it prints the line that caused the warning, and the line doesn't look like a problem. Lines such as the following trigger it (each line triggered it separately):
df['ZIP_DENS'] = df['ZIP_DENS'].astype(str)
df['AVG_WAGE'] = df['AVG_WAGE'].astype(str).apply(lambda x:x if x != 'nan' else 'unknown')
df['TERM_BIN'] = df['TERMS'].map(terms_dict)
df['LOSS_ONE'] = 'T_'+ df['TERM'].astype(str) +'_C_'+ df['COMP'].astype(str) + df['SIZE']
# this one's inside a loop:
df[i + '_BIN'] = df[i + '_BIN'].apply(lambda x:x if x != 'nan' else 'unknown')
These are some examples of the mutations I'm making on the dataframe. This warning just started showing up, but I can't recreate the problem in the interpreter. When I open a terminal and try things like this, I get no warnings:
import pandas as pd
df = pd.DataFrame([list('ab'),list('ef')],columns=['first','second'])
df['third'] = df[['first','second']].astype('str')
Is there something I'm missing, something I don't understand about the nature of DataFrames that this warning is trying to tell me? Do you think perhaps I did something to this dataframe at the beginning of the script and then all subsequent mutations on the object are mutations on a view or a copy of it or something weird like that is going on?

As I mentioned in my comment, the likely issue is that somewhere upstream in your code, you assigned a slice of some other pd.DataFrame to df.
This is a common cause of confusion and is also explained under why-does-assignment-fail-when-using-chained-indexing in the link that the Warning mentions.
A minimal example:
data = pd.DataFrame({'a':range(7), 'b':list('abcccdb')})
df = data[data.a % 2 == 0] #making a subselection of the DataFrame
df['b'] = 'b'
/home/user/miniconda3/envs/myenv/lib/python3.6/site-packages/ipykernel_launcher.py:1:
SettingWithCopyWarning: A value is trying to be set on a copy of a
slice from a DataFrame. Try using .loc[row_indexer,col_indexer] =
value instead
See the caveats in the documentation:
http://pandas.pydata.org/pandas-docs/stable/indexing.html#indexing-view-versus-copy
"""Entry point for launching an IPython kernel.
Notice that this section:
df = data[data.a % 2 == 0] #making a subselection of the DataFrame
df['b'] = 'b'
could just as well be rewritten this way:
data[data.a % 2 == 0]['b'] = 'b' #obvious chained indexing
df = data[data.a % 2 == 0]
The correct way to write this is:
data = pd.DataFrame({'a':range(7), 'b':list('abcccdb')})
df = data.loc[data.a % 2 == 0].copy() #making a copy of the subselection
df.loc[:,'b'] = 'b'
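Which fix applies depends on what the script is meant to change. The sketch below shows both cases side by side, assuming the same sample data as above; the name even and the value "B" are made up for illustration.

```python
import pandas as pd

data = pd.DataFrame({"a": range(7), "b": list("abcccdb")})
even = data["a"] % 2 == 0  # boolean mask for the subselection

# Case 1: you want an independent subset. Copy once, then mutate freely.
df = data.loc[even].copy()
df["b"] = "b"  # modifies df only; data is untouched

# Case 2: you want to change the parent. Write through .loc on the parent.
data.loc[even, "b"] = "B"

print(df)
print(data)
```

In a long script, the .copy() belongs at the line where df is first carved out of the larger frame; the later column assignments can then stay as they are.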
