Dropping rows in a Data Frame - python

I am trying to drop some specific rows in a DataFrame df where the column Time is anything except 06:00:00. I tried the following code, but it doesn't seem to work. I even tried adding another column, Index, to my file to aid the process, but it is still not working. Can you please help me? I am attaching the screenshots.
The variable val contains only the specific time 06:00:00. Also, please ignore the variable req. Thanks a lot.

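In pandas, drop is not an in-place operation by default: it returns a new DataFrame and leaves df unchanged. Try specifying df.drop(j, inplace=True), or assign the result back with df = df.drop(j).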
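A small sketch of the difference, using a made-up three-row frame (the Time values and the label 0 are hypothetical stand-ins for the asker's data and j):

```python
import pandas as pd

# Hypothetical sample data
df = pd.DataFrame({"Time": ["05:00:00", "06:00:00", "07:00:00"]})

result = df.drop(0)          # returns a new DataFrame; df itself is unchanged
print(len(df), len(result))  # 3 2

df.drop(0, inplace=True)     # modifies df directly and returns None
print(len(df))               # 2
```

Reassigning (df = df.drop(j)) and inplace=True end up with the same rows; reassignment is usually preferred because it also works inside method chains.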

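Have you tried this?
df = df.drop(df[df.a != "06:00:00"].index)
Or, even better, filter with a boolean mask and keep only the matching rows:
df = df[df.a.str.contains("06:00:00")]
where a is the name of the column that holds the time (this assumes the column stores strings). Prefixing the mask with ~, as in df[~df.a.str.contains("06:00:00")], does the opposite and drops the 06:00:00 rows instead.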
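A minimal sketch of both approaches, assuming Time is stored as strings such as "06:00:00" (if it holds datetime.time objects instead, compare against datetime.time(6, 0) rather than a string). The frame is hypothetical sample data:

```python
import pandas as pd

# Hypothetical sample data; Time stored as strings
df = pd.DataFrame({"Time": ["05:00:00", "06:00:00", "07:00:00", "06:00:00"],
                   "value": [1, 2, 3, 4]})

keep = df["Time"] == "06:00:00"   # True only for the rows we want to keep

# Option 1: drop the index labels of every row that is NOT 06:00:00
dropped = df.drop(df[~keep].index)

# Option 2: boolean filter that keeps only the matching rows
filtered = df[keep]

print(filtered)
```

An exact == comparison is stricter than str.contains, which would also match values like "06:00:00.5"; pick whichever matches how the times are stored.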

Related

SnowparkFetchDataException: (1406): Failed to fetch a Pandas Dataframe. The error is: Found non-unique column index

While running some code like this:
session = ...
return session.table([DB,SCHEMA, MANUAL_METRICS_BY_SIZE]).select("TECHNOLOGY","OBJECTTYPE","OBJECTTYPE","SIZE","EFFORT").to_pandas()
I got this error.
Any idea of what might be causing this?
Well, it was easier than I thought.
I had a duplicated column name (OBJECTTYPE is selected twice in the select call above), and pandas doesn't like that.
Just check your columns, for example with df.columns, and remove the duplicated column.
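To find an offending name on the pandas side, something like the following works on any DataFrame; the frame below is a made-up stand-in for the query result. For the Snowpark query itself, the cleaner fix is simply to list OBJECTTYPE only once in select(...).

```python
import pandas as pd

# Hypothetical frame with a repeated column name, mimicking the duplicated select
df = pd.DataFrame([[1, "x", "x", 3]],
                  columns=["TECHNOLOGY", "OBJECTTYPE", "OBJECTTYPE", "SIZE"])

# Index.duplicated() flags every occurrence after the first
dupes = df.columns[df.columns.duplicated()]
print(list(dupes))            # ['OBJECTTYPE']

# Keep only the first occurrence of each column name
deduped = df.loc[:, ~df.columns.duplicated()]
print(list(deduped.columns))  # ['TECHNOLOGY', 'OBJECTTYPE', 'SIZE']
```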

Ask Pandas to delete all rows beneath a certain row

I have imported an Excel file as a dataframe using pandas.
I now need to delete all rows from row 41,504 (index 41,505) and below.
I have tried df.drop(df.index[41504]), although that only catches the one row. How do I tell Pandas to delete onwards from that row?
I did not want to delete by an index range as the dataset has tens of thousands of rows, and I would prefer not to scroll through the whole thing.
Thank you for your help.
Kind regards
df = df.drop(df.index[41504:])
Drop the remaining range; drop returns a new DataFrame, so assign the result back. If you don't mind creating a new df, use a slice instead, keeping rows [:41504].
You can reassign the range you do want back into the variable instead of removing the range you do not want.
You can just take the first rows that you need, ignoring all the rest:
result = df[:41504]
Just another way:
df = df.iloc[:41504]
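A quick check on a toy frame that the suggestions above keep the same rows; cutoff = 4 is a hypothetical stand-in for 41504, the position of the first row to remove:

```python
import pandas as pd

df = pd.DataFrame({"n": range(10)})
cutoff = 4  # position of the first row to remove (stands in for 41504)

a = df.drop(df.index[cutoff:])  # drop every label from position cutoff onwards
b = df[:cutoff]                 # row slice: keep the first cutoff rows
c = df.iloc[:cutoff]            # explicit positional slice

print(len(a), len(b), len(c))   # 4 4 4
```

iloc is the most explicit of the three, since it is always position-based regardless of what the index labels look like.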

Why is the `df.columns` an empty list while I can see the column names if I print out the dataframe? Python Pandas

import pandas as pd
DATA = pd.read_csv(url)
DATA.head()
I have a large dataset that has dozens of columns. After loading it as above into Colab, I can see the name of each column. But running DATA.columns just returns Index([], dtype='object'). What's happening here?
Now I find it impossible to pick out a few columns without column names. One way is to specify names = [...] when I load it, but I'm reluctant to do that since there are too many columns. So I'm looking for a way to index columns by integers, as in R, where df[, c(1, 2, 3)] simply gives you the first three columns of a data frame. Pandas, though, seems to focus on column names and makes integer indexing very inconvenient.
So what I'm asking is: (1) What did I do wrong? Can I obtain those column names as well when I load the dataframe? (2) If not, how can I pick out the 0th, 1st and 10th columns with a list of integers?
It seems that the problem was in the loading, as DATA.shape returned (10000, 0). I reran the loading code a few times, and all of a sudden things went back to normal. Maybe Colab was taking a nap or something?
You can do that with df.iloc[:, [1, 2, 3]] (iloc selects by integer position, whereas loc selects by label), but I would suggest using the names, because if the columns ever change order or you insert new columns, position-based code can break.
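A sketch of positional selection on a hypothetical 11-column frame. Keep in mind that pandas positions are 0-based, so R's df[, c(1, 2, 3)] corresponds to df.iloc[:, [0, 1, 2]]:

```python
import pandas as pd

# Hypothetical frame with 11 columns named c0 .. c10
df = pd.DataFrame([list(range(11))], columns=[f"c{i}" for i in range(11)])

# iloc selects by integer position: the 0th, 1st and 10th columns
subset = df.iloc[:, [0, 1, 10]]
print(list(subset.columns))  # ['c0', 'c1', 'c10']

# loc selects by label; there are no integer labels here, so it raises KeyError
loc_failed = False
try:
    df.loc[:, [0, 1, 10]]
except KeyError:
    loc_failed = True
print(loc_failed)            # True
```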

Unable to rename and remove pandas Index - Python

I have a dataframe like the one shown below.
I know the column names are 'FR', 'ig' and 'te' with the help of the command below:
dataFramesDict['Tri'].columns
What does name = 'level_1' mean here? Moreover, I also don't see subject_ID in the columns or index list. What is subject_ID here?
How do I get output like the one shown below?
I tried the code below to rename 'level_1' to 'subject_ID', but it doesn't work:
dataFramesDict['Tri'].index = dataFramesDict['Tri'].index.rename('subject_ID')
Please note that this is just sample data. I am only interested in changing the first column name and dropping that 'level_1'; it has nothing to do with the data.
I am unable to create a dataframe of this type with sample code. The dataframe above is the result of another, more complex piece of code, so I have provided a screenshot of it.
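Try this. Here level_1 is most likely the name of the columns Index and subject_ID the name of the row index, so clear the former and move the latter into a regular column:
df.columns.name = None
df.reset_index(inplace=True)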
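A minimal reconstruction, assuming (as the answer does) that level_1 is the name of the columns Index and subject_ID is the name of the row index; the values and IDs are made up, since the original frame was only available as a screenshot:

```python
import pandas as pd

# Hypothetical reconstruction of the frame from the question
df = pd.DataFrame(
    {"FR": [0.1, 0.2], "ig": [0.3, 0.4], "te": [0.5, 0.6]},
    index=pd.Index([101, 102], name="subject_ID"),
)
df.columns.name = "level_1"   # the label printed above the column headers

df.columns.name = None        # remove the "level_1" label
df = df.reset_index()         # turn subject_ID into an ordinary first column

print(list(df.columns))       # ['subject_ID', 'FR', 'ig', 'te']
```

Setting the name to None rather than '' removes it completely; an empty string is still a name and may leave an empty header row in the printed output.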

Pandas dropna does not work as expected on a MultiIndex

I have a Pandas DataFrame with a MultiIndex. The index consists of a date and a text string. Some of the values are NaN, and when I use dropna(), the row disappears as expected. However, when I look at the index using df.index, the dropped dates are still there. This is a problem because when I use the to_panel function, the dropped dates reappear.
Am I using dropna incorrectly or how can I resolve this?
I think it is issue 2770.
The solution described there is to use:
index.get_level_values(level)
For me, this actually worked:
df1 = df1[pd.notnull(df1['Column Name'])]
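On pandas 0.20 or later, MultiIndex.remove_unused_levels() rebuilds the levels so that values no longer used by any row are discarded. A minimal sketch with made-up dates: get_level_values reflects only the rows that remain, while index.levels can still list the dropped date until the levels are pruned.

```python
import numpy as np
import pandas as pd

# Hypothetical MultiIndex of (date, text) with one NaN row
idx = pd.MultiIndex.from_tuples(
    [("2020-01-01", "a"), ("2020-01-02", "b"), ("2020-01-03", "c")],
    names=["date", "text"],
)
df = pd.DataFrame({"value": [1.0, np.nan, 3.0]}, index=idx)

cleaned = df.dropna()

# Dates actually used by the remaining rows
print(list(cleaned.index.get_level_values("date")))  # ['2020-01-01', '2020-01-03']

# Prune level values that no row refers to any more
cleaned.index = cleaned.index.remove_unused_levels()
print(list(cleaned.index.levels[0]))                  # ['2020-01-01', '2020-01-03']
```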
