I wonder if it is possible to retrieve just the last 5 values of an Excel column instead of all the values in that column.
Currently I am able to select all the data in the column with the following piece of code:
var = pd.read_excel("Path/MyFile.xlsx", 'MS3', skiprows=15)
xDate = list(var['Date'])
Is there a way to retrieve the last 5 values in this column?
Yes, you can use tail(), which works just like head():
var.tail(5)
Alternatively, since you already have the values in a list, you can simply slice it:
xDate[-5:]
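Both answers above give the same values; a minimal runnable sketch (with a toy DataFrame standing in for the Excel file, since the original path is not available here) shows the two approaches side by side:

```python
import pandas as pd

# Toy frame standing in for the data read from the Excel file.
var = pd.DataFrame({'Date': ['d1', 'd2', 'd3', 'd4', 'd5', 'd6', 'd7']})

last5_df = var.tail(5)       # last 5 rows, still a DataFrame
xDate = list(var['Date'])
last5_list = xDate[-5:]      # last 5 values as a plain Python list
```

The tail() route keeps the result as a DataFrame (useful if you need other columns later); the list slice gives you plain values.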
I can't seem to find a way to split the array values in a column of a dataframe.
I have managed to get all the array values using this code:
The dataframe is as follows:
I want to use value_counts() on the dataframe, and I get this:
I want the array values that are clubbed together to be split so that I can get the accurate count of every value.
Thanks in advance!
You could try .explode(), which would create a new row for every value in each list.
df_mentioned_id_exploded = pd.DataFrame(df_mentioned_id.explode('entities.user_mentions'))
With the above code you would create a new dataframe df_mentioned_id_exploded with a single column entities.user_mentions, which you could then use .value_counts() on.
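A minimal sketch of the explode-then-count approach, using made-up list values in place of the real entities.user_mentions data:

```python
import pandas as pd

# Hypothetical stand-in for the question's data: a column whose
# cells hold lists of values.
df_mentioned_id = pd.DataFrame({
    'entities.user_mentions': [['a', 'b'], ['a'], ['b', 'b']]
})

# explode() creates one row per list element, so counts are accurate.
exploded = df_mentioned_id.explode('entities.user_mentions')
counts = exploded['entities.user_mentions'].value_counts()
```

After exploding, each element counts once, so clubbed-together values no longer distort the tally.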
I have a dataframe that I created from a master table in SQL. That new dataframe is then grouped by type as I want to find the outliers for each group in the master table.
The function finds the outliers, showing where in the grouped DataFrame the outliers occur. How do I see these outliers as part of the original dataframe? Not just VOLUME but also location, SKU, group, etc.
dataframe: HOSIERY_df
Code:
##Sku Group Data Frames
grouped_skus = sku_volume.groupby('SKUGROUP')
HOSIERY_df = grouped_skus.get_group('HOSIERY')
hosiery_outliers = find_outliers_IQR(HOSIERY_df['VOLUME'])
hosiery_outliers
#.iloc[[hosiery_outliers]]
#hosiery_outliers
Picture to show code and output:
I know enough to see that I need to find the rows based on the location of the index, like VLOOKUP in Excel, but I need to do it within Python. I'm not sure how to pull only the rows at positions 5, 6, 7...3888 and 4482 from HOSIERY_df.
You can provide a list of index numbers as integers to iloc, which it looks like you have tried based on your commented-out code. So you may want to make sure that find_outliers_IQR is returning a list of ints so it will work properly with iloc, or convert its output.
It looks like it is currently returning a DataFrame. You can get the index of that frame as a list like this:
hosiery_outliers.index.tolist()
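A small end-to-end sketch of the idea. The question's find_outliers_IQR is not shown, so a standard IQR fence (an assumption, not necessarily the asker's exact function) stands in for it; the key point is the final .loc lookup by index label:

```python
import pandas as pd

# Hypothetical stand-in for HOSIERY_df with extra columns to recover.
HOSIERY_df = pd.DataFrame({
    'SKU': ['s1', 's2', 's3', 's4'],
    'VOLUME': [10, 12, 11, 500],
})

# Assumed IQR-based outlier rule (the question's find_outliers_IQR
# is not shown; this is one common definition).
vol = HOSIERY_df['VOLUME']
q1, q3 = vol.quantile(0.25), vol.quantile(0.75)
iqr = q3 - q1
hosiery_outliers = vol[(vol < q1 - 1.5 * iqr) | (vol > q3 + 1.5 * iqr)]

# Pull the full rows (all columns) from the original frame by index label.
outlier_rows = HOSIERY_df.loc[hosiery_outliers.index.tolist()]
```

Because the outlier Series keeps the original frame's index labels, .loc recovers the complete rows, SKU and all.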
I have multiple
I want to get rows of Name.
I know how to get rows by index using the dataframe, but I want to get them by row name, as the index might change.
like
(row=="Name" ) or (row== "name")
The output should be like:
Thanks in advance!
If you want the Name column:
name = df['Name']
I know that if we want to check if one value exists in a dataframe we use isin(). However, I want the position or positions where it is found in the other dataframe.
Like df1['Column1'].isin(df2['Column2']) only returns True if it is contained in df2. But I want the position where it is found in df2.
I do not want to loop over the dataframes because I have a very large dataset. Is there any function in pandas, or a quick way to do it, without having to loop?
Each row in a pandas dataframe has an index (0, 1, 2, ... by default, or whatever you changed it to). If you would like to get the positions, try .index:
df1[df1['Column1'].isin(df2['Column2'])].index
Updated:
df1['df1_index'] = pd.DataFrame(df1[df1['col1'].isin(df2['col1'])].index).astype('int')
df1['df2_index'] = pd.DataFrame(df2[df2['col1'].isin(df1['col1'])].index).astype('int')
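A minimal runnable sketch of the isin-plus-.index approach, with hypothetical contents for df1 and df2:

```python
import pandas as pd

# Made-up example frames; Column1/Column2 follow the question's naming.
df1 = pd.DataFrame({'Column1': ['x', 'y', 'z', 'x']})
df2 = pd.DataFrame({'Column2': ['x', 'q']})

# Index labels of the df1 rows whose value appears anywhere in df2.
match_positions = df1[df1['Column1'].isin(df2['Column2'])].index
```

No Python-level loop is needed: isin() does the membership test vectorised, and the boolean mask keeps the matching rows' index labels.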
You might try this:
filtered_df = df1[df1['Column1'] == df2['Column2']]
print(filtered_df)
Does this work?
Apologies if this is covered in a previous answer, but I've read this one: How to select rows from a DataFrame based on column values? and can't work out how to do what I need to do.
Suppose I have some pandas dataframe X, and one of the columns is 'timestamp'. The entries are formatted like '2010-11-03 09:44:05'. I want to select just those rows that correspond to a specific day; for example, just those rows for which the string in the timestamp column starts with '2010-11-03'. Is there a neat way to do this? Can I do it with a mask or Boolean indexing? Or should I write a separate line to peel the day off each entry and then select the rows? Bear in mind the dataframe is large, if that matters.
i.e. I want to write something like
X.loc[X['timestamp'].startswith('2010-11-03')]
or
mask = '2010-11-03' in X["timestamp"]
but these don't actually make any sense.
This should work:
X[X['timestamp'].str.startswith('2010-11-03')]
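A self-contained sketch of that answer, with a toy string-typed 'timestamp' column as described in the question:

```python
import pandas as pd

# Toy frame; the real X is assumed to hold timestamps as strings.
X = pd.DataFrame({'timestamp': ['2010-11-03 09:44:05',
                                '2010-11-04 10:00:00',
                                '2010-11-03 23:59:59']})

# Boolean mask via the .str accessor; keeps only the matching day.
day_rows = X[X['timestamp'].str.startswith('2010-11-03')]
```

If the column were datetime-typed rather than strings, comparing on pd.to_datetime(X['timestamp']).dt.date would be the more robust route, but for string data the .str.startswith mask does exactly what the question asks.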