Returning column from dataframe by name - python

I have a dataframe with named columns and I want to return the column with a specified name:
name_of_column = 'name1' # string variable
I tried to use this:
dataframe.iloc[:, name_of_column]
But it did not work. What should I do?

Use loc instead of iloc and your syntax will work. iloc is for indexing by integer position (this is what the i stands for), while loc is for indexing by label. So you can use:
dataframe.loc[:, name_of_column]
Having said this, the more usual way to retrieve a series is to use __getitem__ directly:
dataframe[name_of_column]
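A minimal sketch of the three access styles side by side. The frame contents and the second column name `name2` are made up for illustration; only `name_of_column` comes from the question.

```python
import pandas as pd

# Hypothetical data; only the column name 'name1' comes from the question.
dataframe = pd.DataFrame({"name1": [1, 2, 3], "name2": [4, 5, 6]})
name_of_column = "name1"

# Label-based selection: all rows, one column by name.
by_label = dataframe.loc[:, name_of_column]

# The usual shorthand, equivalent for a single column name.
by_getitem = dataframe[name_of_column]

# iloc needs an integer position, so the name has to be translated first.
position = dataframe.columns.get_loc(name_of_column)
by_position = dataframe.iloc[:, position]
```

All three return the same Series here; `iloc` only becomes necessary when you know the position rather than the label.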

You can just do:
dataframe[column_name]
This selects the column.
The iloc indexer selects by integer position, not by name.
More examples of selecting data can be found in the pandas documentation, "Indexing and Selecting Data".

Related

panda dataframe extracting values

I have a dataframe called "nums" and am trying to find the value of the column "angle" by specifying the values of other columns like this:
nums[(nums['frame']==300)&(nums['tad']==6)]['angl']
When I do so, I do not get a singular number and cannot do calculations on them. What am I doing wrong?
First of all, in general you should use .loc rather than chaining indexing operations like that:
>>> s = nums.loc[(nums['frame']==300)&(nums['tad']==6), 'angl']
Now, to get the float, you can use the .item() method, which works when the selection contains exactly one value.
>>> s.item()
-0.466331
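For a self-contained illustration, here is a sketch with invented rows (the real contents of `nums` are not shown in the question). It assumes the filter matches exactly one row; `.item()` raises a `ValueError` otherwise.

```python
import pandas as pd

# Hypothetical stand-in for the asker's data.
nums = pd.DataFrame({
    "frame": [300, 300, 301],
    "tad": [5, 6, 6],
    "angl": [0.12, -0.466331, 0.9],
})

# Build the boolean mask once, then select rows and column in a single .loc call.
mask = (nums["frame"] == 300) & (nums["tad"] == 6)
s = nums.loc[mask, "angl"]

# s is still a one-element Series; .item() extracts the scalar.
value = s.item()
```

Once `value` is a plain float it can be used in ordinary arithmetic.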

Pandas Group By and Sum, Header being removed

After I run the following code I seem to lose the headers of my dataframe. If I remove the line below, my headers exist.
unifiedview = unifiedview.groupby(['key','MTM'])['MTM'].sum()
When I use to_csv, the resulting file has no headers when opened in Excel.
I've tried:
unifiedview = unifiedview.groupby(['key','MTM'], as_index = False)['MTM'].sum()
unifiedview = unifiedview.reset_index()
Any help would be appreciated.
Calling
unifiedview.groupby(['key','MTM'])['MTM']
selects only the 'MTM' column from the grouped data...
Therefore, the expression
unifiedview.groupby(['key','MTM'])['MTM'].sum() will return a pandas Series holding the sum of the grouped 'MTM' column, not a DataFrame...
unifiedview.groupby(['key','MTM']).sum().reset_index() should return the sum of all columns in unifiedview of the int or float dtype.
Are you looking to preserve all columns from the original dataframe?
Also, you must place an aggregate function after the groupby clause...
unifiedview.groupby(['key','MTM']) must have a .count(), .sum(), .mean(), ... method to group your columns...
unifiedview.groupby(['key','MTM']).sum()
unifiedview.groupby(['key','MTM']).count()
unifiedview.groupby(['key','MTM']).mean()
Is this helping you get in the right direction?
What version of pandas are you using? If you check the to_csv documentation, it says this about the header parameter:
Write out the column names. If a list of strings is given it is assumed to be aliases for the column names.
Changed in version 0.24.0: Previously defaulted to False for Series
Since you are transforming your dataframe into a Series object, this might be the cause of your issue.
The documentation can be found here:
https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.DataFrame.to_csv.html
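To make the Series-versus-DataFrame difference concrete, here is a small sketch. The data are invented, and it assumes the intent was to group by 'key' alone and sum 'MTM' (grouping by 'MTM' and then summing 'MTM' is probably not what was wanted, but that is a guess).

```python
import pandas as pd

# Hypothetical data standing in for unifiedview.
unifiedview = pd.DataFrame({"key": ["a", "a", "b"], "MTM": [1.0, 2.0, 5.0]})

# Default grouping: the result is a Series indexed by 'key'.
as_series = unifiedview.groupby("key")["MTM"].sum()

# as_index=False keeps 'key' as a regular column, so the result is a DataFrame
# with both column headers intact.
as_frame = unifiedview.groupby("key", as_index=False)["MTM"].sum()

# Writing the DataFrame without the positional index gives a header row of key,MTM.
csv_text = as_frame.to_csv(index=False)
```

Calling `as_series.reset_index()` would produce the same two-column DataFrame as `as_frame`.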

Create a column that is a shift of the index in a Pandas DataFrame

I want to create a new column in my DataFrame equal to the Index column but shifted 1 position upwards.
I know how to apply the shift function to a column that can be referred to as df['column_name'], but I don't know how to do this with the index column.
I have tried df.index.shift(-1) but it doesn't work. df['index_name'].shift(-1) doesn't work either.
The desired result would be a column that holds the index values shifted, just as if I did df['column2'] = df['column1'].shift(-1).
This should work:
df["index_shifted"] = df.reset_index().iloc[:, 0].shift(-1)
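The reset_index function puts the index in the first column of the new DataFrame it creates, and iloc selects that column. The shift function then moves the values up by one position. Note that the assignment aligns on the index, so this only works as written when df has the default integer index.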
Documentation of these functions:
reset_index
iloc
shift
Don't use this.
Jezrael's answer is a better choice because it is more efficient, especially for big DataFrames.
Use Index.to_series, because shift is not yet implemented for some Index types, such as RangeIndex:
df["index_shifted"] = df.index.to_series().shift(-1)
From the Index.shift documentation:
Notes
This method is only implemented for datetime-like index classes, i.e., DatetimeIndex, PeriodIndex and TimedeltaIndex.
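A short sketch of the `to_series` approach on a non-datetime index. The frame, its labels, and the index name are invented for illustration. Because `to_series()` keeps the original index labels, the assignment aligns row by row.

```python
import pandas as pd

# Hypothetical frame with a string index, where Index.shift is not available.
df = pd.DataFrame(
    {"column1": [10, 20, 30]},
    index=pd.Index(["a", "b", "c"], name="index_name"),
)

# Turn the index into a Series (indexed by itself), then shift values up by one.
df["index_shifted"] = df.index.to_series().shift(-1)
```

The last row has no successor, so it receives a missing value.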

Retrieval of single entry by np.datetime index value failing, but not failing by range

I have a pandas dataframe, whose index is based on the numpy datetime type.
I can easily access a range of dataframe entries:
for t in df.index.values:
    print(df[:t])
However, I get a KeyError whenever I try to access a specific value.
for t in df.index.values:
    print(df[t])
I ended up with a workaround using .iloc, but it is messy.
Try this:
for t in df.index:
    print(df.loc[t])
df[t] is for column indexing; since you are iterating over your row index, use loc or iloc.
for t in df.index.values:
    print(df.loc[:t])
for i, t in enumerate(df.index.values):
    print(df.iloc[:i])
Since you mentioned that your index is based on the numpy datetime type: pandas also accepts slicing with date strings, as follows:
df.loc[: '2018-01-01'] #All entries up to date 1st of Jan 2018
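The behaviour described in this answer can be reproduced with a tiny frame. The dates and the `value` column are invented; the sketch shows the `KeyError` from `df[t]`, the working `.loc` lookup, and a date-string slice.

```python
import pandas as pd

# Hypothetical frame with a DatetimeIndex.
df = pd.DataFrame(
    {"value": [1, 2, 3]},
    index=pd.to_datetime(["2018-01-01", "2018-01-02", "2018-01-03"]),
)

t = df.index[1]

# df[t] looks for a COLUMN labelled t, which does not exist.
try:
    df[t]
    column_lookup_failed = False
except KeyError:
    column_lookup_failed = True

# .loc looks up ROW labels, so each timestamp resolves to one row (a Series).
rows = [df.loc[ts] for ts in df.index]

# Label slices include the end point, so this returns the first two rows.
up_to = df.loc[:"2018-01-02"]
```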
I found the simple, universal solution I was looking for.
For a single row:
df[t:t]
For multiple rows:
df[:t]
Both return a DataFrame, even for a single index value (without the .loc-style transformation where the column headers and index flip).
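The "flip" mentioned here is `.loc[t]` returning a Series whose index is the column names. A sketch with invented data compares it with the slice form; passing a one-element list to `.loc` is another way to keep the DataFrame shape, offered as an alternative rather than as part of the original answer.

```python
import pandas as pd

# Hypothetical two-column frame with a DatetimeIndex.
df = pd.DataFrame(
    {"x": [1, 2, 3], "y": [4, 5, 6]},
    index=pd.to_datetime(["2018-01-01", "2018-01-02", "2018-01-03"]),
)
t = df.index[1]

# Scalar label: a Series, with the column names as its index.
as_series = df.loc[t]

# Slice from t to t: a one-row DataFrame with the original layout.
as_frame_slice = df[t:t]

# List of labels: also a one-row DataFrame.
as_frame_list = df.loc[[t]]
```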

How can I set the index of a generated pandas Series to a column from a DataFrame?

In pandas this operation creates a Series:
q7.loc[:, list(q7)].max(axis=1) - q7.loc[:, list(q7)].min(axis=1)
I would like to be able to set the index to a list of values from a df column, i.e.
list(df['Colname'])
I've tried to create the series then update it with the series generated from the first code snippet. I've also searched the docs and don't see a method that will allow me to do this. I would prefer not to manually iterate over it.
Help is appreciated.
You can simply store that series in a variable, say S, and set the index as shown below:
S = (q7.loc[:, list(q7)].max(axis=1) - q7.loc[:, list(q7)].min(axis=1))
S.index = df['Colname']
This code assumes that the series and the DataFrame column have the same length. Hope this helps.
If you want to replace the index of a series s, you can do:
s.index = new_index_list
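A runnable sketch of the same idea with invented data. Since `list(q7)` is simply the list of all column names, `q7.loc[:, list(q7)]` selects every column, so the sketch calls `max` and `min` on `q7` directly. The values in `df['Colname']` are hypothetical.

```python
import pandas as pd

# Hypothetical inputs with matching lengths.
q7 = pd.DataFrame({"a": [1, 5, 3], "b": [4, 2, 9]})
df = pd.DataFrame({"Colname": ["x", "y", "z"]})

# Row-wise range: max minus min across the columns of each row.
S = q7.max(axis=1) - q7.min(axis=1)

# Replace the default integer index with the values from the other frame.
# The lengths must match, otherwise pandas raises a ValueError.
S.index = list(df["Colname"])
```

Values are matched by position here, not by label, so the row order of `q7` and `df` has to correspond.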
