I want to add a third column to an existing dataframe, holding the same values as the 2nd column, using the pandas iloc indexer.
What options do I have?
import pandas as pd
df = pd.DataFrame([*zip([1,2,3],[4,5,6])])
Here is one way to do it:
df.insert(len(df.columns), # position of the new column: one past the current last column
          len(df.columns), # name of the new column; here its position is reused as the name
          value=df.iloc[:,1] # values of the 2nd column (position 1), which is the last column here
          )
df
0 1 2
0 1 4 4
1 2 5 5
2 3 6 6
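Two other options, as a minimal sketch rather than the only way to do it: plain column assignment, and assign(), which returns a new frame. The column name 2 and the name second_copy are arbitrary choices for illustration, not part of the original question.

```python
import pandas as pd

df = pd.DataFrame([*zip([1, 2, 3], [4, 5, 6])])

# Plain assignment appends a new column; the name 2 is an arbitrary choice.
df[2] = df.iloc[:, 1]

# assign() returns a new frame and leaves df untouched;
# "second_copy" is a hypothetical column name used only for this sketch.
df_assigned = df.assign(second_copy=df.iloc[:, 1])

print(df)
print(df_assigned)
```

Assignment modifies df in place, while assign() is convenient when the original frame should stay as it is.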
Context: I'd like to "bump" the index level of a multi-index dataframe up. In other words, I'd like to put the index level of a dataframe at the same level as the columns of a multi-indexed dataframe
Let's say we have this dataframe:
tt = pd.DataFrame({'A':[1,2,3],'B':[4,5,6],'C':[7,8,9]})
tt.index.name = 'Index Column'
We then add a column level on top of the existing columns (like a label for the table):
tt = pd.concat([tt],keys=['Multi-Index Table Label'], axis=1)
Which results in this:
             Multi-Index Table Label
                                   A  B  C
Index Column
0                                  1  4  7
1                                  2  5  8
2                                  3  6  9
Desired Output: How can I make it so that the dataframe looks like this instead (notice the removal of the empty level on the dataframe/table):
Multi-Index Table Label
Index Column  A  B  C
0             1  4  7
1             2  5  8
2             3  6  9
Attempts: I was testing something out and you can essentially remove the index level by doing this:
tt.index.name = None
Which would result in:
  Multi-Index Table Label
                        A  B  C
0                       1  4  7
1                       2  5  8
2                       3  6  9
This removes the extra level/empty line, but I do want to keep the Index Column name, because it describes the type of data in the index (in this example just 0, 1, 2, but it could be years, dates, etc.).
How could I do that?
Thank you all in advance :)
How about this:
tt = pd.DataFrame({'A':[1,2,3],'B':[4,5,6],'C':[7,8,9]})
tt.insert(loc=0, column='Index Column', value=tt.index)
tt = pd.concat([tt],keys=['Multi-Index Table Label'], axis=1)
tt = tt.style.hide(axis="index")  # pandas >= 1.4; older versions use tt.style.hide_index()
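A variation on the same idea, sketched under the assumption that a plain-text rendering is enough: reset_index() turns the named index into an ordinary first column, and to_string(index=False) prints the frame without the index, so no Styler is needed. The names flat and labelled are illustrative only.

```python
import pandas as pd

tt = pd.DataFrame({"A": [1, 2, 3], "B": [4, 5, 6], "C": [7, 8, 9]})
tt.index.name = "Index Column"

# reset_index() moves the named index into a regular first column.
flat = tt.reset_index()

# Add the table label as a top column level, as in the question.
labelled = pd.concat([flat], keys=["Multi-Index Table Label"], axis=1)

# Print without the (now meaningless) default integer index.
print(labelled.to_string(index=False))
```

The trade-off is that Index Column is now data rather than an index, so label-based lookups by that column would need set_index again.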
When we make a new column in a dataset in pandas:
df["Max"] = df.iloc[:, 5:7].sum(axis=1)
If we only want the columns from index 5 to index 7, why do we also need to pass : (which looks like it means all the columns)?
pandas.DataFrame.iloc is an indexer used purely for integer-location based selection by position (see the pandas documentation for DataFrame.iloc). The : before the comma means all rows; the 5:7 after the comma selects the columns at positions 5 and 6 (the end of an iloc slice is excluded).
You are using .iloc to take a slice out of the dataframe and then applying an aggregate function across the columns of that slice.
Consider an example:
df = pd.DataFrame({"a":[0,1,2],"b":[2,3,4],"c":[4,5,6]})
df
would produce the following dataframe
a b c
0 0 2 4
1 1 3 5
2 2 4 6
You are using iloc to avoid dealing with named columns, so that
df.iloc[:,1:3]
would look as follows
b c
0 2 4
1 3 5
2 4 6
Now a slight modification of your code would get you a new column containing sums across columns
df.iloc[:,1:3].sum(axis=1)
0 6
1 8
2 10
Alternatively you could use function application:
df.apply(lambda x: x.iloc[1:3].sum(), axis=1)
0 6
1 8
2 10
This explicitly applies the sum across the columns of each row. However, your syntax is more succinct and is preferable to explicit function application. The result is the same, as one would expect.
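To make the role of the leading : concrete, here is a small sketch on the same example frame. It shows that a single slice passed to iloc is applied to rows, which is why the row part has to be spelled out when slicing columns. The variable names are illustrative.

```python
import pandas as pd

df = pd.DataFrame({"a": [0, 1, 2], "b": [2, 3, 4], "c": [4, 5, 6]})

cols = df.iloc[:, 1:3]   # all rows, columns at positions 1 and 2 ("b", "c")
rows = df.iloc[1:3]      # a single slice selects ROWS at positions 1 and 2

print(cols.shape)        # 3 rows, 2 columns
print(rows.shape)        # 2 rows, 3 columns

# Selecting the same columns by label gives the same row sums.
by_position = df.iloc[:, 1:3].sum(axis=1)
by_label = df[["b", "c"]].sum(axis=1)
print(by_position.equals(by_label))
```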
Is there any way to select the row by index (i.e. integer) and column by column name in a pandas data frame?
I tried using loc but it returns an error, and I understand iloc only works with integer positions.
Here are the first rows of the data frame df. I want to select the first row of the column named 'Volume', and tried df.loc[0,'Volume'].
Use the get_loc method of Index to get the integer location of a column name.
Suppose this dataframe:
>>> df
A B C
10 1 2 3
11 4 5 6
12 7 8 9
You can use .iloc like this:
>>> df.iloc[1, df.columns.get_loc('B')]
5
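For comparison, a short sketch of three ways to mix a positional row with a named column on the frame above. The variable names are illustrative, and which form reads best is a matter of taste.

```python
import pandas as pd

df = pd.DataFrame(
    {"A": [1, 4, 7], "B": [2, 5, 8], "C": [3, 6, 9]},
    index=[10, 11, 12],
)

# Option 1: translate the column name to a position, then use iloc.
via_iloc = df.iloc[1, df.columns.get_loc("B")]

# Option 2: translate the row position to a label, then use loc.
via_loc = df.loc[df.index[1], "B"]

# Option 3: select the column by name first, then the row by position.
via_chain = df["B"].iloc[1]

print(via_iloc, via_loc, via_chain)
```

Options 1 and 2 use a single indexing step, which is generally the safer choice if the result is later used for assignment.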
I have a series and df
s = pd.Series([1,2,3,5])
df = pd.DataFrame()
When I add columns to df like this
df.loc[:, "0-2"] = s.iloc[0:3]
df.loc[:, "1-3"] = s.iloc[1:4]
I get df
0-2 1-3
0 1 NaN
1 2 2.0
2 3 3.0
Why am I getting NaN? I tried creating a new series with the correct indices, but adding it to df still produces NaN.
What I want is
0-2 1-3
0 1 2
1 2 3
2 3 5
Try either of the following lines.
df.loc[:, "1-3"] = s.iloc[1:4].values
# -OR-
df.loc[:, "1-3"] = s.iloc[1:4].reset_index(drop=True)
Your original code is trying unsuccessfully to match the index of the data frame df to the index of the subset series s.iloc[1:4]. When it can't find the 0 index in the series, it places a NaN value in df at that location. You can get around this by only keeping the values so it doesn't try to match on the index or resetting the index on the subset series.
>>> s.iloc[1:4]
1 2
2 3
3 5
dtype: int64
Notice the index labels: the slice keeps the labels 1, 2, 3 from the original, unsliced series, which is the following.
>>> s
0 1
1 2
2 3
3 5
dtype: int64
The index of the first row in df is 0. By dropping the index with .values (an attribute, not a method call), you bypass the index matching that produces the NaN. By resetting the index in the second option, you make the indices the same.
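The alignment behaviour can be reproduced in a few lines. This is a sketch that builds df directly from the first slice instead of starting from an empty frame, and it uses to_numpy() as a stand-in for .values; the column names aligned and positional are made up for the illustration.

```python
import pandas as pd

s = pd.Series([1, 2, 3, 5])
df = pd.DataFrame({"0-2": s.iloc[0:3]})     # index labels 0, 1, 2

# Series assignment aligns on labels: the slice has labels 1, 2, 3,
# so label 0 in df has no match and becomes NaN (label 3 is dropped).
df["aligned"] = s.iloc[1:4]

# A NumPy array has no index, so values are placed in order.
df["positional"] = s.iloc[1:4].to_numpy()

print(df)
```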
I have the following DataFrame:
   index  PUBLICO  CLASSIFICACAO_PUBLICO
0     19   143643                      1
1     34   111879                      2
2     31    50382                      3
3      9    49204                      4
4     32    37541                      5
5      4    36095                      6
I need to convert the column named index into the index of the DataFrame.
For example:
index  PUBLICO  CLASSIFICACAO_PUBLICO
19      143643                      1
34      111879                      2
31       50382                      3
9        49204                      4
32       37541                      5
4        36095                      6
I tried df.set_index('index'), but it didn't work.
The column named index was previously the index of the DataFrame, but I called reset_index(); now I need to do the reverse.
The set_index method does not work in place by default, so you have to either reassign your dataframe or pass the option inplace=True:
df = df.set_index('index')
or
df.set_index('index',inplace = True)
see http://pandas.pydata.org/pandas-docs/stable/generated/pandas.DataFrame.set_index.html
You can try it this way:
df.set_index(df['index'], inplace=True)
This will set your index column as the index in your dataframe and your index column will still remain in your dataframe as well. Then, you can just drop that column.
df.drop('index', axis=1, inplace=True)
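A quick way to see why the original call seemed to do nothing, sketched on the first three rows of the data above: set_index returns a new frame and leaves the original untouched unless the result is reassigned. The name result is illustrative.

```python
import pandas as pd

df = pd.DataFrame({
    "index": [19, 34, 31],
    "PUBLICO": [143643, 111879, 50382],
    "CLASSIFICACAO_PUBLICO": [1, 2, 3],
})

# Passing the column NAME moves that column into the index
# and returns a new frame; df itself is unchanged.
result = df.set_index("index")

print(df.columns.tolist())      # still contains "index"
print(result.columns.tolist())  # "index" is no longer a column
print(result.index.tolist())
```

Passing the column name, as here, removes the column in the same step, so no separate drop is needed.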