Look for name in column and get their index - python

I have a dataframe with several names. I want to look for a name in the pair_stock column and get the index value of that name.
So if I do:
df['pair_stock'].get_loc("MMM-MO")
I want to get 0.
And if I do:
df['pair_stock'].get_loc("WU-ZBH")
I want to get 5539.
But instead it raises an error.

.get_loc gives you the integer position of a label within an Index; 'pair_stock' isn't the index, it's an ordinary column.
One option you have is to make it the index, which I think is actually what you want.
Another option (to get the index label for the row with that value) is something like this:
df.loc[df['pair_stock']=="MMM-MO"].index.values
That gives you an array. You can grab just the first item, but if you know the value is unique, maybe it should just be your index.
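For illustration, a minimal sketch of both options, using a small made-up frame (the real one obviously has many more rows):
import pandas as pd

df = pd.DataFrame({"pair_stock": ["MMM-MO", "ABT-ABBV", "WU-ZBH"], "value": [1, 2, 3]})

# Option 1: make pair_stock the index, then get_loc works directly
indexed = df.set_index("pair_stock")
print(indexed.index.get_loc("MMM-MO"))  # 0
print(indexed.index.get_loc("WU-ZBH"))  # 2

# Option 2: keep the column and look up the index label(s) for that value
print(df.loc[df["pair_stock"] == "WU-ZBH"].index.values)  # array([2])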

You can simply use df.index.
For example, to get all the indices that have a 'pair_stock' value of 'MMM-MO':
df.index[df['pair_stock'] == 'MMM-MO'].tolist()
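A self-contained demo of that lookup, with made-up data:
import pandas as pd

df = pd.DataFrame({"pair_stock": ["MMM-MO", "WU-ZBH", "MMM-MO"]})

# all index labels whose pair_stock equals 'MMM-MO'
print(df.index[df["pair_stock"] == "MMM-MO"].tolist())  # [0, 2]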

Related

Drop rows from dataframe based on date index gives KeyError: "['column_name'] not found in axis"

I'm trying to create a new dataframe excluding the rows for two dates; my date column is the index.
When I use
DF2 = DF.drop(DF.loc['03/01/2018':'03/02/2018'])
I get the error
KeyError: "['Column_name1' 'Column_name2'] not found in axis"
I've tried adding axis = 0 to specify that I want to drop rows, but still get the same error
DF2 = DF.drop(DF.loc['03/01/2018':'03/02/2018'], axis = 0)
If I try and print the 'loc' it returns the rows as expected
print(DF.loc['03/01/2018':'03/02/2018'])
Your statement just needs .index at the end. To do the slicing like this you need loc, but drop wants index labels as input.
DF2 = DF.drop(DF.loc['03/01/2018':'03/02/2018'].index)
If this doesn't work, then you should check the format of the index (it needs to be a string for the way you are trying to access it).
If the index is in datetime.date format you could do it like this:
DF2 = DF.drop(DF.loc[datetime.date(2018,3,1):datetime.date(2018,3,2)].index)
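For reference, a runnable sketch of the .index approach, assuming a small frame with a string date index standing in for the real DF:
import pandas as pd

DF = pd.DataFrame(
    {"Column_name1": [1, 2, 3, 4], "Column_name2": [10, 20, 30, 40]},
    index=["02/28/2018", "03/01/2018", "03/02/2018", "03/03/2018"],
)

# .loc slices the rows; .index turns that selection into labels that drop() understands
DF2 = DF.drop(DF.loc["03/01/2018":"03/02/2018"].index)
print(DF2)  # only the 02/28 and 03/03 rows remain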
Since you say your date is the index, when you use DF.loc['03/01/2018':'03/02/2018'], you are locating the rows that are between 03/01/2018 and 03/02/2018.
pandas.DataFrame.drop accepts index or column labels; by default it treats the labels you pass as index labels. You should use
DF2 = DF.drop(['03/01/2018', '03/02/2018'])
# or
DF2 = DF[~DF.index.isin(['03/01/2018', '03/02/2018'])]
I had a similar error with the drop function (without loc):
data.drop('column_name')
It was giving me a similar indexing error. I still had the error after adding .index, but this kwarg fixed it:
data.drop(columns='column_name')
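In other words, drop treats bare labels as row (index) labels; to drop a column you have to say so explicitly. A tiny illustration with a made-up frame:
import pandas as pd

data = pd.DataFrame({"column_name": [1, 2], "other": [3, 4]})

data.drop(columns="column_name")   # drops the column
data.drop("column_name", axis=1)   # equivalent
# data.drop("column_name")         # KeyError: the label is not in the (row) index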

How can I assign a list's elements to corresponding rows of a dataframe in pandas?

I have numbers in a List that should get assigned to certain rows of a dataframe consecutively.
List=[2,5,7,12….]
In my dataframe, which looks similar to the table below, I need to do the following:
every time Frame_Index == 1, assign the next element of List as Sequence_number.
The first time Frame_Index == 1, assign the first element of List; the next time Frame_Index == 1, assign the second element of List, and so on.
So my goal is to achieve a new dataframe like this:
I don't know which functions to use. In another language I would use a for loop and check where frame_index == 1, but my dataset is large and I need a pythonic (vectorized) way to achieve the described result. I appreciate any help.
EDIT: I tried the following to fill in my List values, intending to use fillna with ffill afterwards:
concatenated_df['Sequence_number']=[List[i] for i in
concatenated_df.index if (concatenated_df['Frame_Index'] == 1).any()]
But of course I'm getting "list index out of range" error.
I think you could do that in two steps.
Add a column and fill it with your list where frame_index == 1.
Then use df.fillna() with the method="ffill" kwarg.
import pandas as pd

df = pd.DataFrame({"frame_index": [1, 2, 3, 4, 1, 2]})
sequence = [2, 5]
# write the sequence only where frame_index == 1; the other rows become NaN
df.loc[df["frame_index"] == 1, "sequence_number"] = sequence
# forward-fill the NaNs so every row inherits the last assigned number
df.ffill(inplace=True)  # alias for df.fillna(method="ffill")
This leaves sequence_number as float64, which might be acceptable in your use case; if you want int64, you can force the dtype when creating the column (the df.loc assignment) or cast it later.
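If you do want an integer dtype at the end, one option (a sketch, using pandas' nullable Int64 so the intermediate NaNs are not a problem) is to cast after the fill:
# continuing from the df built above
df["sequence_number"] = df["sequence_number"].astype("Int64")
print(df.dtypes)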

Access to DataFrame element by column label containing special character hyphen '-' fails

So I have a
df = read_excel(...)
This loop does work:
for i, row in df.iterrows():           # loop through rows
    a = df[df.columns].SignalName[i]   # column "SignalName" of row i is read
    b = (row[7])                       # column "Bus-Signalname" of row i, taken primitively (hardcoded)
Access to a is OK. How can I replace the hardcoded b = (row[7]) with a dynamically found/located "Bus-Signalname" element from the Excel table? What are the many ways to do this?
b = df[df.columns].Bus-Signalname[i]
does not work.
To access the whole column, run: df['Bus-Signalname'].
So-called attribute notation (df.Bus-Signalname) will not work here, since "-" is not allowed as part of an attribute name. It is treated as the minus operator, so:
- the expression before it is df.Bus, but df probably has no column with this name, so an exception is thrown;
- what occurs after it (Signalname) is expected to be e.g. a variable, but you probably have no such variable, which is another reason an exception could be raised.
Note also that you then wrote [i].
As I understand, i is an integer and you want to access element number i of this column.
Note that the column you retrieved is a Series with the same index as your whole DataFrame.
If the index is the default one (consecutive numbers, starting from 0), you will succeed. Otherwise (if the index does not contain the value i) you will fail.
A more pandasonic syntax to access an element in a DataFrame is:
df.loc[i, 'Bus-Signalname']
where i is the index of the row in question and Bus-Signalname is the column name.
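A short sketch contrasting the notations, with a made-up frame standing in for the Excel data:
import pandas as pd

df = pd.DataFrame({"SignalName": ["s1", "s2"], "Bus-Signalname": ["b1", "b2"]})

print(df.SignalName[0])             # attribute notation works: no special characters
print(df["Bus-Signalname"][0])      # bracket notation is required because of the hyphen
print(df.loc[0, "Bus-Signalname"])  # the more idiomatic single-element access
# df.Bus-Signalname[0]              # parsed as (df.Bus) - (Signalname[0]) -> error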
@Valdi_Bo thank you. In the loop, both
df.loc[i, 'Bus-Signalname']
and
df['Bus-Signalname'][i]
work.

Pandas Groupby First Value in Column

Is there a way to get the first or last value in a particular column of a group in a pandas dataframe after performing a groupby?
For example, I want to get the first value in column_z, but this does not work:
df.groupby(by=['A', 'B']).agg({'x':np.sum, 'y':np.max, 'datetime':'count', 'column_z':first()})
The point of getting the first and last values in the group is that I would eventually like to get the difference between the two.
I know there is this function: http://pandas.pydata.org/pandas-docs/stable/groupby.html#taking-the-nth-row-of-each-group
But I don't know how to use it for my use case: getting the first value in a particular column after grouping.
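One common way to do this (a sketch, with the column names assumed from the question) is to pass the string aggregators 'first' and 'last' and then take the difference:
import pandas as pd

# hypothetical data matching the column names in the question
df = pd.DataFrame({"A": ["a", "a", "b"], "B": [1, 1, 2],
                   "x": [1, 2, 3], "y": [4, 5, 6],
                   "datetime": pd.to_datetime(["2020-01-01", "2020-01-02", "2020-01-03"]),
                   "column_z": [10, 30, 50]})

agg = df.groupby(["A", "B"]).agg(
    x=("x", "sum"), y=("y", "max"), n=("datetime", "count"),
    z_first=("column_z", "first"), z_last=("column_z", "last"))
agg["z_diff"] = agg["z_last"] - agg["z_first"]
print(agg)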

Cannot get right slice bound for non-unique label when indexing data frame with python-pandas

I have a data frame df like this:
a b
10 2
3 1
0 0
0 4
....
# about 50,000+ rows
I wish to select df.loc[:5, 'a']. But when I call df.loc[:5, 'a'], I get an error: KeyError: 'Cannot get right slice bound for non-unique label: 5'. When I call df.loc[5], the result contains 250 rows, while there is just one when I use df.iloc[5]. Why does this happen and how can I index it properly? Thank you in advance!
The error message is explained here: if the index is not monotonic, then both slice bounds must be unique members of the index.
The difference between .loc and .iloc is label-based vs integer-position-based indexing; see the docs. .loc is intended to select individual labels or slices of labels. That's why .loc[5] selects all rows where the index has the value 5 (which is why you get 250 rows, and why the error complains about a non-unique label). .iloc, in contrast, selects row number 5 (0-indexed). That's why you only get a single row, and its index value may or may not be 5. Hope this helps!
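A small sketch of the difference, using a deliberately non-unique, unsorted integer index:
import pandas as pd

df = pd.DataFrame({"a": [10, 3, 0, 0]}, index=[5, 2, 5, 1])

print(df.loc[5])    # both rows whose *label* is 5
print(df.iloc[3])   # the single row at *position* 3 (its label happens to be 1)
# df.loc[:5, "a"]   # KeyError: cannot get right slice bound for non-unique label 5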
To filter with non-unique indexes, try something like this:
df.loc[(df.index>0)&(df.index<2)]
The issue with the way you are indexing is that there are multiple rows with index 5, so loc does not know where the slice should stop. If you just do df.loc[5] you will see how many rows share that index value.
You can either sort the index using sort_index, or first aggregate the data by index and then retrieve the slice.
Hope this helps.
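For reference, a sketch of the sort_index route (a toy frame standing in for the real 50,000-row one):
import pandas as pd

df = pd.DataFrame({"a": [10, 3, 0, 0]}, index=[5, 2, 5, 1])

sorted_df = df.sort_index()
print(sorted_df.loc[:5, "a"])  # label slicing works once the index is monotonic
print(df.iloc[:5]["a"])        # or slice by position, if that is what you actually meant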
