I have this dataframe
For some reason, after I ran all_data = all_data.set_index(['Location+Type']), the name 'Location+Type' is gone. How do I set the first column of the dataframe to have the name 'Location+Type' again?
By using set_index(['Location+Type']) you no longer have the column Location+Type; it is now the index of your dataframe, which is why the name does not appear when printing. If you wish to recover Location+Type as a column, you need to use:
all_data = all_data.reset_index()
This will create a new index, while returning the original index (Location+Type) to a column. As the documentation states:
When we reset the index, the old index is added as a column, and a new sequential index is used:
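A runnable sketch of that round trip, with invented column values standing in for the real data:

```python
import pandas as pd

# Hypothetical data in the shape of the question's frame
all_data = pd.DataFrame({
    'Location+Type': ['NY-retail', 'LA-retail'],
    'sales': [100, 200],
})

all_data = all_data.set_index(['Location+Type'])
# 'Location+Type' is no longer a column; it survives as the index name
print(all_data.columns.tolist())   # ['sales']
print(all_data.index.name)         # 'Location+Type'

all_data = all_data.reset_index()
# The old index returns as the first column and a new sequential index is used
print(all_data.columns.tolist())   # ['Location+Type', 'sales']
```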
Related
How can one rename a specific column in a pandas DataFrame when the column name is not unique?
Calling df.rename(columns={'old_name': 'new_name'}) will rename all the columns with the name "old_name".
Same question for dropping a column when there are duplicate column names.
Since calling df.rename(columns={'old_name': 'new_name'}) will rename all the columns called "old_name", renaming must be done by column index.
Get all the indexes of the column of interest:
[col_index for col_index, col_name in enumerate(df.columns) if col_name == col_name_to_find]
Rename:
Once you know which index you'd like to rename: df.columns.values[col_index] = new_col_name
Dropping:
One option is to use pandas' built-in duplicated method, which lets you keep only the first or last occurrence, or remove them all.
df.loc[:, ~df.columns.duplicated(keep="first")]
This is helpful only if you want to drop all, the first or last duplicate column.
If you have more than 2 duplicated columns and want to keep one that is not the first or last, you can:
Get all the indexes of the column of interest (as explained above) as a list.
Remove the index you want to keep in the df from the list.
Drop the remaining positions. Note that label-based dropping (e.g. df.drop('old_name', axis=1)) would remove every duplicate at once, so select by position instead: df = df.iloc[:, [i for i in range(len(df.columns)) if i not in list_of_column_indexes_to_drop]]
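Putting the steps above together on a toy frame (the column names and values here are invented):

```python
import pandas as pd

# Toy frame with a duplicated column name
df = pd.DataFrame([[1, 2, 3], [4, 5, 6]], columns=['a', 'dup', 'dup'])

# Get all positions of the column of interest
col_indexes = [i for i, name in enumerate(df.columns) if name == 'dup']
print(col_indexes)  # [1, 2]

# Rename only the second occurrence, by position
cols = df.columns.tolist()
cols[col_indexes[1]] = 'dup_renamed'
df.columns = cols

# Keep only the first of any remaining duplicates
df = df.loc[:, ~df.columns.duplicated(keep='first')]
print(df.columns.tolist())  # ['a', 'dup', 'dup_renamed']
```

Reassigning a full list to df.columns avoids mutating df.columns.values in place, which newer pandas versions discourage.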
I have a use case where I need to fill a new pandas column with the contents of a specific cell in the same table. There are 60 countries in Europe, so I need to fill a shared currency column with the contents of one country's currency (as an example only).
I need an SQL "Where" clause for Pandas - that:
1. Searches the dataframe rows for the single occurrence of "Britain" in column "country"
2. Returns a single, unique value "pound" from df['currency'].
3. Creates a new column filled with just this value = string "pound"
w['Euro_currency'] = w['Euro_currency'].map(w.loc["country"]=="Britain"["currency"])
# [Britain][currency] - contains the value - "Pound"
When this works correctly, every row in the new column 'Euro_currency' contains the value "pound"
How about you take the value from that cell and just create a new column with it as below:
p = w.loc["Britain"]["currency"]
w['Euro_currency'] = p
Does this work for you?
Thanks for the help. I found this answer by @anton-protopopov at "extract column value based on another column pandas dataframe":
currency_value = df.loc[df['country'] == 'Britain', 'currency'].iloc[0]
df['currency_britain'] = currency_value
@anderson-zhu also mentioned that .item() would work as well:
currency_value = df.loc[df['country'] == 'Britain', 'currency'].item()
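A self-contained version of that lookup, on a made-up table in the spirit of the question:

```python
import pandas as pd

# Invented lookup table: one row per country
df = pd.DataFrame({
    'country': ['France', 'Britain', 'Germany'],
    'currency': ['euro', 'pound', 'euro'],
})

# .iloc[0] takes the first match; .item() would instead raise if the match
# is not exactly one row, which is a useful sanity check here
currency_value = df.loc[df['country'] == 'Britain', 'currency'].iloc[0]
df['currency_britain'] = currency_value
print(currency_value)  # pound
```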
I'm trying to clean an excel file that has some random formatting. The file has blank rows at the top, with the actual column headings at row 8. I've gotten rid of the blank rows, and now want to use the row 8 string as the true column headings in the dataframe.
I use this code to get the position of the column headings by searching for the string 'Destination' in the whole dataframe, and then take the location of the True value in the Boolean mask to get the list for renaming the column headers:
boolmsk=df.apply(lambda row: row.astype(str).str.contains('Destination').any(), axis=1)
print(boolmsk)
hdrindex=boolmsk.index[boolmsk == True].tolist()
print(hdrindex)
hdrstr=df.loc[7]
print(hdrstr)
df2=df.rename(columns=hdrstr)
However, when I try to use hdrindex as a variable, I get errors when the second dataframe is created (i.e. when I try to use hdrstr to replace the column headings):
boolmsk=df.apply(lambda row: row.astype(str).str.contains('Destination').any(), axis=1)
print(boolmsk)
hdrindex=boolmsk.index[boolmsk == True].tolist()
print(hdrindex)
hdrstr=df.loc[hdrindex]
print(hdrstr)
df2=df.rename(columns=hdrstr)
How do I use a variable to specify an index, so that the resulting list can be used as column headings?
I assume the indicator of the actual header row in your dataframe is the string "Destination". Let's find where it is:
start_tag = df.eq("Destination").any(axis=1)
We'll keep the index of the first row containing "Destination" for further use:
start_row = df.loc[start_tag].index.min()
Using index number we will get list of values in the "header" row:
new_col_names = df.iloc[start_row].values.tolist()
And here we can assign new column names to dataframe:
df.columns = new_col_names
From here you can play with new dataframe, actual column names and proper indexing:
df2 = df.iloc[start_row+1:].reset_index(drop=True)
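The whole flow, run end to end on a simulated messy sheet (the junk rows and column contents are invented):

```python
import pandas as pd

# Simulated Excel import: junk rows first, real headers in a later row
df = pd.DataFrame([
    ['junk', 'junk'],
    ['Destination', 'Price'],
    ['Paris', 100],
    ['Rome', 90],
])

# Boolean mask of rows containing the marker string
start_tag = df.eq('Destination').any(axis=1)

# Index of the first row holding the marker
start_row = df.loc[start_tag].index.min()

# Promote that row's values to column headings, then keep only the data rows
df.columns = df.iloc[start_row].values.tolist()
df2 = df.iloc[start_row + 1:].reset_index(drop=True)
print(df2.columns.tolist())  # ['Destination', 'Price']
```

Note that df.iloc[start_row] is only safe here because the freshly-read frame has the default 0-based RangeIndex, so the label from .index.min() and the position coincide.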
I have a dataframe; after grouping, it looks like this:
Now I want to move the row index (name) to be the first column. How do I do that?
I tried this:
gr.reset_index(drop=True)
but the result is that the name field now contains the count information.
Don't pass drop=True; as the name implies, it drops the index entirely. It is also better to name the index first, since you already have a name column:
gr.index.name = "company"
gr = gr.reset_index()
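A minimal sketch of that fix, with an invented groupby result (company names in the index, counts in a column called name):

```python
import pandas as pd

# Stand-in for the grouped frame: index holds companies, 'name' holds counts
gr = pd.DataFrame({'name': [3, 5]}, index=['Acme', 'Bolt'])

gr.index.name = 'company'   # label the index so it doesn't collide with 'name'
gr = gr.reset_index()       # 'company' becomes the first column
print(gr.columns.tolist())  # ['company', 'name']
```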
I have been trying to wrap my head around this for a while now and have yet to come up with a solution.
My question is: how do I change the current values in multiple columns based on the column name, if a criterion is met?
I have survey data which has been read in as a pandas csv dataframe:
import pandas as pd
df = pd.read_csv("survey_data")
I have created a dictionary with column names and the values I want in each column if the current column value is equal to 1. Each column contains 1 or NaN. Basically, any 1 in a column ending in '_SA' becomes 5, '_A' becomes 4, '_NO' becomes 3, '_D' becomes 2, and '_SD' keeps the current value 1. All of the NaN values remain as-is. This is the dictionary:
op_dict = {
'op_dog_SA':5,
'op_dog_A':4,
'op_dog_NO':3,
'op_dog_D':2,
'op_dog_SD':1,
'op_cat_SA':5,
'op_cat_A':4,
'op_cat_NO':3,
'op_cat_D':2,
'op_cat_SD':1,
'op_fish_SA':5,
'op_fish_A':4,
'op_fish_NO':3,
'op_fish_D':2,
'op_fish__SD':1}
I have also created a list, op_cols, of the columns within the data frame I would like to change if the current column value is 1. I have been trying to use something like this that iterates through the values in those columns and replaces 1 with the mapped value from the dictionary:
for i in df[op_cols]:
    if i == 1:
        df[op_cols].apply(lambda x: op_dict.get(x, x))
df[op_cols]
It does not raise an error, but it is not replacing the 1 values with the corresponding value from the dictionary; they remain as 1.
Any advice on why this doesn't work, or on a more efficient approach, would be greatly appreciated.
So if I understand your question, you want to replace all the ones in a column with 1, 2, 3, 4, or 5 depending on the column name?
I think all you need to do is iterate through your list and multiply by the value your dict returns:
for col in op_cols:
    df[col] = df[col] * op_dict[col]
This does what you describe and is far faster than replacing every value individually. NaNs will still be NaNs; you could also handle those in the loop with fillna if you like.
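A small demonstration on an invented two-column slice of the survey frame, where each cell is 1 or NaN:

```python
import pandas as pd
import numpy as np

# Made-up survey slice: 1 means the box was ticked, NaN means it wasn't
df = pd.DataFrame({
    'op_dog_SA': [1, np.nan],
    'op_dog_A':  [np.nan, 1],
})
op_dict = {'op_dog_SA': 5, 'op_dog_A': 4}
op_cols = list(op_dict)

for col in op_cols:
    df[col] = df[col] * op_dict[col]   # 1 -> mapped value; NaN stays NaN
```

Multiplication works here precisely because the only non-missing value is 1, and NaN times anything is still NaN.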