Cleaning Data: Replacing Current Column Values with Values mapped in Dictionary - python

I have been trying to wrap my head around this for a while now and have yet to come up with a solution.
My question is: how do I change the current values in multiple columns, based on the column name, when a criterion is met?
I have survey data which has been read from a CSV file into a pandas DataFrame:
import pandas as pd
df = pd.read_csv("survey_data")
I have created a dictionary with column names and the values I want in each column if the current column value is equal to 1. Each column contains 1 or NaN. Basically, in any column of the data frame whose name ends in '_SA' a 1 should become 5, '_A' becomes 4, '_NO' becomes 3, '_D' becomes 2, and '_SD' stays at its current value of 1. All of the NaN values remain as they are. This is the dictionary:
op_dict = {
'op_dog_SA':5,
'op_dog_A':4,
'op_dog_NO':3,
'op_dog_D':2,
'op_dog_SD':1,
'op_cat_SA':5,
'op_cat_A':4,
'op_cat_NO':3,
'op_cat_D':2,
'op_cat_SD':1,
'op_fish_SA':5,
'op_fish_A':4,
'op_fish_NO':3,
'op_fish_D':2,
'op_fish_SD':1}
I have also created a list, op_cols, of the columns within the data frame that I would like to be changed if the current column value is 1. I have been trying to use something like this, which is meant to iterate through the values in those columns and replace 1 with the mapped value in the dictionary:
for i in df[op_cols]:
    if i == 1:
        df[op_cols].apply(lambda x: op_dict.get(x,x))
df[op_cols]
It does not raise an error, but it does not replace the 1 values with the corresponding values from the dictionary either. They remain 1.
Any advice or suggestions on why this does not work, or on a more efficient way to do it, would be greatly appreciated.

So, if I understand your question, you want to replace all the ones in a column with 1, 2, 3, 4 or 5, depending on the column name?
I think all you need to do is iterate through your list and multiply by the value your dict returns:
for col in op_cols:
    df[col] = df[col]*op_dict[col]
This does what you describe and is far faster than replacing every value. NaNs will still be NaNs; you could handle those in the loop with fillna if you like.
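As a rough illustration of the multiplication idea, here is a minimal sketch on a made-up three-column frame (the sample data is an assumption, not the asker's survey). It also shows why the original loop never fired: iterating over a DataFrame yields column labels, not cell values, so `i == 1` is never true. Multiplying by a Series built from the dictionary is one loop-free variant of the same approach.

```python
import numpy as np
import pandas as pd

# Hypothetical miniature of the survey data: each column holds 1 or NaN.
df = pd.DataFrame({
    "op_dog_SA": [1, np.nan, np.nan],
    "op_dog_A": [np.nan, 1, np.nan],
    "op_dog_SD": [np.nan, np.nan, 1],
})
op_dict = {"op_dog_SA": 5, "op_dog_A": 4, "op_dog_SD": 1}
op_cols = list(op_dict)

# Iterating over a DataFrame yields its column labels, not its values.
labels = [i for i in df[op_cols]]

# Multiplying a DataFrame by a Series aligns the Series index with the
# column labels, so each column is scaled by its own dictionary value.
# NaN * anything is NaN, so missing answers stay missing.
scaled = df[op_cols] * pd.Series(op_dict)
```

Assigning `df[op_cols] = scaled` would write the result back into the original frame.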

Related

New column in DataFrame from other columns AND rows

I want to create a new column, V, in an existing DataFrame, df. I would like the value of the new column to be the difference between the value in the 'x' column in that row, and the value of the 'x' column in the row below it.
As an example, in the picture below, I want the value of the new column to be
93.244598 - 93.093285 = 0.151313.
I know how to create a new column based on existing columns in Pandas, but I don't know how to reference other rows using this method. Is there a way to do this that doesn't involve iterating over the rows in the dataframe? (since I have read that this is generally a bad idea)
You can use pandas.DataFrame.shift for your use case.
The last row has no row below it to subtract, so the value for that cell will be NaN.
df['temp_x'] = df['x'].shift(-1)
df['new_col'] = df['x'] - df['temp_x']
or as a one-liner:
df['new_col'] = df['x'] - df['x'].shift(-1)
The column new_col will contain the expected data.
A more concise solution is to use diff:
df['new'] = df['x'].diff(-1)
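To check that the two answers agree, here is a small sketch. The first two values come from the question; the third is a made-up extra row so that the frame has a last row with nothing below it.

```python
import numpy as np
import pandas as pd

# First two values are from the question; 92.5 is a hypothetical third row.
df = pd.DataFrame({"x": [93.244598, 93.093285, 92.5]})

# Each row minus the row below it, written two ways.
via_shift = df["x"] - df["x"].shift(-1)
via_diff = df["x"].diff(-1)

# Both give the same numbers, with NaN in the last row.
same = np.allclose(via_shift, via_diff, equal_nan=True)
```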

Replace column Values based on Index of other Dataframe

I am trying to replace the values in the "All Assortment" column of the "buyer" data frame.
I need to replace them with the data from the "All Stores" column of the "asl" data frame. The twist is that the index values of the asl data frame are the values that need to match for the replacement to work.
Hard to say without a minimal reproducible example, but try mapping the values of buyer['All Assortment'] to corresponding values from the asl['All Stores'] column based on the asl index:
buyer['All Assortment'] = buyer['All Assortment'].map(asl['All Stores'])
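Since the question comes without sample data, the frames below are invented purely to show how `map` behaves when it is given a Series: each value is looked up in the Series index, and values with no matching index label come back as NaN.

```python
import pandas as pd

# Hypothetical stand-ins for the asker's frames.
asl = pd.DataFrame({"All Stores": [10, 20]}, index=["A1", "B2"])
buyer = pd.DataFrame({"All Assortment": ["A1", "B2", "C3"]})

# Look up each 'All Assortment' value in the index of asl['All Stores'].
mapped = buyer["All Assortment"].map(asl["All Stores"])

# 'C3' is not in asl's index, so its result is NaN.
unmatched = buyer.loc[mapped.isna(), "All Assortment"].tolist()
```

Checking `unmatched` before overwriting the column is a cheap way to spot keys that are missing from the lookup frame.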

Creating a Pandas column based on a value in a specific row and column with .map or similar

I have a use case where I need to fill a new pandas column with the contents of a specific cell in the same table. There are 60 countries in Europe, so I need to fill a shared currency column with the contents of one country's currency (as an example only).
I need an SQL "Where" clause for Pandas - that:
1. Searches the dataframe rows for the single occurrence of "Britain" in column "country"
2. Returns a single, unique value "pound" from df['currency'].
3. Creates a new column filled with just this value = string "pound"
w['Euro_currency'] = w['Euro_currency'].map(w.loc["country"]=="Britain"["currency"])
# [Britain][currency] - contains the value - "Pound"
When this works correctly, every row in the new column 'Euro_currency' contains the value "pound"
How about you take the value from that cell and just create a new column with it, as below:
# note: this lookup works only if the country names are the index of w
p = w.loc["Britain"]["currency"]
w['Euro_currency'] = p
Does this work for you?
Thanks for the help. I found this answer by @anton-protopopov at "extract column value based on another column pandas dataframe":
currency_value = df.loc[df['country'] == 'Britain', 'currency'].iloc[0]
df['currency_britain'] = currency_value
@anderson-zhu also mentioned that .item() would work as well:
currency_value = df.loc[df['country'] == 'Britain', 'currency'].item()
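A self-contained sketch of the boolean-mask lookup, using a made-up two-row table. One difference between the two accessors is worth knowing: `.iloc[0]` takes the first match, while `.item()` raises a ValueError unless exactly one row matches.

```python
import pandas as pd

# Hypothetical sample table; only the column names come from the question.
df = pd.DataFrame({
    "country": ["Britain", "France"],
    "currency": ["pound", "euro"],
})

# The boolean mask plays the role of SQL's WHERE clause.
matches = df.loc[df["country"] == "Britain", "currency"]

# Take the single matching value and broadcast it to every row.
currency_value = matches.iloc[0]
df["currency_britain"] = currency_value
```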

How to expand a list in a pandas dataframe without repeating other column values

I was wondering how I would be able to expand out a list in a cell without repeating variables in other cells.
The goal is to get it so that the list is expanded but the first column is not repeated. I know how to expand the list out but I would not like to have the first column values repeated if that is possible. Thank you for any help!!
In order to get what you're asking for, you still have to use explode() to get what you need. You just have to take it a step further and change the values of the first column. Please note that this will destroy the association between the elements of the list and the letter of the row they were first in. You would be creating a third value for the column (an empty string) that would be repeated for every record not beginning with 1.
If you want to eliminate the value from the rows you are talking about but still want to have those records associated with the value that their list was associated with, you can't. It's not logically possible for a value to both be in a given cell but also not be in that cell. So, I will show you the steps for eliminating the original association.
For this example, I named the columns since they are not provided.
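import pandas as pd
data = [
    ["a",["1 hey","2 hi","3 hello"]],
    ["b",["1 what","2 how","3 say"]]
]
df = pd.DataFrame(data,columns=["first","second"])
# one row per list element; 'first' is repeated on every new row
df = df.explode("second")
# keep 'first' only on rows whose element starts with '1', blank it elsewhere
df['first'] = df.apply(lambda x: x['first'] if x['second'][0] == '1' else '', axis=1)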
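The approach above relies on every list starting with an element that begins with '1'. If that cannot be assumed, one possible variant (a sketch, not part of the original answer) uses the fact that `explode` repeats the original row label on every new row, so `index.duplicated()` marks everything after the first row of each group.

```python
import pandas as pd

data = [
    ["a", ["1 hey", "2 hi", "3 hello"]],
    ["b", ["1 what", "2 how", "3 say"]],
]
df = pd.DataFrame(data, columns=["first", "second"]).explode("second")

# After explode the index is 0, 0, 0, 1, 1, 1, so duplicated() is True
# for every row except the first one of each original record.
is_repeat = df.index.duplicated()

# Keep 'first' on the first row of each group, blank it on the repeats.
df["first"] = df["first"].where(~is_repeat, "")
```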

How to update a subset of Pandas DataFrame rows with new (different) values?

If I want to set the same value for all matches I can do:
df.loc[df.A.isin(some_list), 'C'] = value
The replace and update methods change existing values in some column, but don't seem to allow changing values in column C for the matching entries in column A as explained below.
Remap values in pandas column with a dict
But what if I have a dictionary (or two aligned lists - one with A and one with C values)?
I can loop over the keys in dict and change the values one-by-one, but that is awfully slow.
If I understand your problem correctly, you want to change the values in column C based on the values in column A, with the value assigned to C looked up in a dictionary, while leaving untouched the rows whose value in A is not a key of the dictionary.
Dictionary m is used for mapping values from column A to the target value:
import pandas
df = pandas.DataFrame({'A': [1,2,3,4,5,6,7,8,9], 'C': [0,0,0,0,0,0,0,0,0]})
m = {1:1,3:1,6:1,8:1}
Then build a boolean mask, select, that marks the rows whose value in A is one of the dictionary's keys. Map those values of column A through m and assign the result to the selected rows of column C. The other values remain as before.
select = df['A'].isin(m.keys())
df.loc[select, 'C'] = df.loc[select, 'A'].map(m)
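Here is a runnable sketch of the same pattern with distinct target values, so that it is easier to see which rows change. The data and the `alt` variant are illustrative assumptions. The variant maps every row and then falls back to the existing C value where the key is missing.

```python
import pandas as pd

# Hypothetical data: only A == 1 and A == 3 have entries in the mapping.
df = pd.DataFrame({"A": [1, 2, 3, 4], "C": [0, 0, 0, 0]})
m = {1: 10, 3: 30}

# Variant: map all rows, then fill the unmapped ones from the current C.
alt = df["A"].map(m).fillna(df["C"]).astype(df["C"].dtype)

# Mask-based update from the answer: only matching rows are written.
select = df["A"].isin(list(m))
df.loc[select, "C"] = df.loc[select, "A"].map(m)
```

Both routes avoid a Python-level loop over the dictionary keys, which is what the question describes as slow.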
