Pandas dropna does not work as expected on a MultiIndex - python

I have a pandas DataFrame with a MultiIndex. The index consists of a date and a text string. Some of the values are NaN, and when I use dropna() the rows disappear as expected. However, when I look at the index using df.index, the dropped dates are still there. This is a problem because when I use the to_panel function, the dropped dates reappear.
Am I using dropna incorrectly or how can I resolve this?

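I think this is issue 2770.
The solution described there is to read the level through
index.get_level_values(level)
which returns one value per remaining row, so the dropped dates no longer show up.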
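A minimal sketch of the behaviour described above, built on made-up data (the column name value and the second level name ticker are hypothetical). It shows that dropna() leaves the dropped date in index.levels, that get_level_values only reports dates that still have rows, and that MultiIndex.remove_unused_levels() rebuilds the levels from the remaining rows.

```python
import numpy as np
import pandas as pd

# Hypothetical sample data: a (date, ticker) MultiIndex where one date is all NaN.
idx = pd.MultiIndex.from_product(
    [pd.to_datetime(["2020-01-01", "2020-01-02"]), ["A", "B"]],
    names=["date", "ticker"],
)
df = pd.DataFrame({"value": [1.0, 2.0, np.nan, np.nan]}, index=idx)

clean = df.dropna()

# The NaN rows are gone, but the level still lists both dates.
print(clean.index.levels[0])

# get_level_values returns one entry per remaining row, so the dropped date is absent.
print(clean.index.get_level_values("date").unique())

# remove_unused_levels rebuilds the levels from the rows that are left.
clean.index = clean.index.remove_unused_levels()
print(clean.index.levels[0])
```

Dropping unused levels this way is what keeps them from resurfacing in later reshaping steps.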

For me this actually worked:
df1 = df1[pd.notnull(df1['Column Name'])]
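A small sketch of what that filter does, on a hypothetical flat frame ("Column Name" is a placeholder, as in the answer). The boolean mask keeps only rows where the column is not missing, which gives the same rows as dropna restricted to that column.

```python
import numpy as np
import pandas as pd

# Hypothetical frame; "Column Name" stands in for the real column.
df1 = pd.DataFrame({"Column Name": [1.0, np.nan, 3.0], "other": ["x", "y", "z"]})

# Boolean mask: True where the column holds a non-missing value.
mask = pd.notnull(df1["Column Name"])
filtered = df1[mask]

# Same rows as dropna limited to that one column.
same = df1.dropna(subset=["Column Name"])
print(filtered.equals(same))  # True
```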

Related

I need to change the type of a few columns in a pandas DataFrame, but I can't do so using iloc

In a DataFrame with 40+ columns, I am trying to change the dtype of the first 27 columns from float to int using iloc:
df1.iloc[:,0:27]=df1.iloc[:,0:27].astype('int')
However, it's not working. I don't get any error, but the dtype doesn't change either; it remains float.
Now the strangest part:
If I first change the dtype of only the first column (like below):
df1.iloc[:,0]=df1.iloc[:,0].astype('int')
and then run the earlier line of code:
df1.iloc[:,0:27]=df1.iloc[:,0:27].astype('int')
It works as required.
Any help understanding this, and a solution, would be appreciated.
Thanks!
I guess it is a bug in 1.0.5: I tested on 1.0.5 and hit the same issue as you. .loc has the same problem, so I guess the pandas devs broke something in iloc/loc. You need to update to the latest pandas or use a workaround. If you need a workaround, assign through the column labels instead:
df1[df1.columns[0:27]] = df1.iloc[:, 0:27].astype('int')
I tested it, and this overcomes the bug. It converts the first 27 columns to an integer dtype (astype('int') maps to the platform's default integer, which was int32 in my test).
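A self-contained sketch of that workaround on a hypothetical 5-column frame, converting the first 3 columns instead of 27. It uses "int64" explicitly rather than 'int' so the result does not depend on the platform's default integer size. Assigning via df1[cols] replaces the selected columns outright, which is why the new dtype sticks.

```python
import pandas as pd

# Hypothetical frame: 5 float columns (the question has 40+ and converts the first 27).
df1 = pd.DataFrame({f"c{i}": [1.0, 2.0, 3.0] for i in range(5)})

# Select by column labels rather than through iloc, so whole columns are replaced
# and the new dtype is kept.
cols = df1.columns[0:3]
df1[cols] = df1[cols].astype("int64")

print(df1.dtypes)
```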
Alternatively, avoid iloc altogether: loop over the 27 columns and convert each one to the data type you want.
df.info()
# names of the first 27 columns
my_columns = df.columns.to_list()[0:27]
for i in my_columns:
    # reassigning the whole column lets the new dtype take effect
    df[i] = df[i].astype('int32')
df.info()
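The same loop can also be expressed as a single astype call that takes a {column: dtype} mapping. This is a sketch on a hypothetical 4-column frame, converting the first 2 columns instead of 27; the mapping form is a swapped-in alternative, not part of the answer above.

```python
import pandas as pd

# Hypothetical frame standing in for the 40+ column original.
df = pd.DataFrame({f"c{i}": [1.0, 2.0] for i in range(4)})

# Map each of the first N columns (N=2 here, 27 in the question) to the target dtype.
my_columns = df.columns.to_list()[0:2]
df = df.astype({col: "int32" for col in my_columns})

print(df.dtypes)
```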

How to convert "object" columns to "datetime" and keep NaNs as-is

I want to convert these "object" columns to "datetime".
I've tried this:
dashboard[['started_at_ahc', 'ended_at_ahc']] = dashboard[['started_at_ahc', 'ended_at_ahc']].apply(pd.to_datetime, errors="coerce")
I want to keep NaN values as NaN, but the code above converted them to Sep 21, 1677 2:17 AM. How can I convert the objects to datetime while keeping the NaNs as NaNs?
Pass errors='ignore' to the to_datetime function.
https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.to_datetime.html
The problem was simply in Streamlit itself. After the conversion, I used st.write(dashboard[['date_of_birth', 'started_at_ahc', 'ended_at_ahc']]), which displayed each NaT value as an initial date; I think Streamlit uses that as a default value for NaT. Using the same logic, and also trying your solution @Ismael EL ATIFI, in a Jupyter Notebook, the results were correct. The problem only occurs in Streamlit. I've posted an issue and am waiting for a reply.
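A quick way to confirm, outside Streamlit, that the conversion itself keeps missing values missing. This sketch uses a hypothetical stand-in for the dashboard frame. With errors="coerce", missing and unparseable entries become NaT, pandas' missing-value marker for datetimes, and isna() still reports them. The displayed date of Sep 21, 1677 is close to pd.Timestamp.min, which is consistent with the display layer, not pandas, rendering NaT that way.

```python
import numpy as np
import pandas as pd

# Hypothetical stand-in for the dashboard frame: object columns with missing values.
dashboard = pd.DataFrame({
    "started_at_ahc": ["2021-01-05", np.nan, "2021-03-10"],
    "ended_at_ahc": ["2021-02-01", "2021-04-15", np.nan],
})

cols = ["started_at_ahc", "ended_at_ahc"]
# errors="coerce" turns unparseable strings into NaT; missing values also become NaT.
dashboard[cols] = dashboard[cols].apply(pd.to_datetime, errors="coerce")

print(dashboard.dtypes)
print(dashboard.isna())
```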

Dropping rows in a Data Frame

I am trying to drop specific rows in a DataFrame df: every row where the column Time is anything other than 06:00:00. I tried the following code, but it doesn't seem to work. I even tried adding another column, Index, to my file to help, but it still isn't working. Can you please help me? I am attaching the screenshots.
The val just contains the specific time 06:00:00. Also, please ignore the variable req. Thanks a lot.
In pandas, drop isn't an in-place operation by default. Try df.drop(j, inplace=True).
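A short sketch of the difference, on hypothetical data: drop() returns a new DataFrame and leaves the original untouched, so you either reassign the result or pass inplace=True.

```python
import pandas as pd

df = pd.DataFrame({"Time": ["05:00:00", "06:00:00", "07:00:00"]})

# drop() returns a new DataFrame; df itself is untouched.
result = df.drop(0)
print(len(df), len(result))  # 3 2

# Either reassign the result ...
df = df.drop(0)
# ... or ask pandas to modify df directly (this call returns None).
df.drop(df.index[0], inplace=True)
print(df)
```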
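Have you tried:
df = df.drop(df[//expression here//].index)
Or, more simply, keep only the rows you want:
df = df[df.a.str.contains("06:00:00")]
where a is the name of the column you want to search for the time in (this assumes the times are stored as strings). Prefix the condition with ~ if you want to drop those rows instead.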
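A sketch of both forms on hypothetical data, assuming Time holds strings like "06:00:00". An exact equality test is used here instead of str.contains; both the mask and the drop()-based version keep only the 06:00:00 rows.

```python
import pandas as pd

# Hypothetical data; Time is assumed to be stored as strings.
df = pd.DataFrame({
    "Time": ["05:00:00", "06:00:00", "07:00:00", "06:00:00"],
    "value": [1, 2, 3, 4],
})

# Keep only rows at 06:00:00 (i.e. drop every other time).
kept = df[df["Time"] == "06:00:00"]

# Equivalent drop()-based form: drop the index labels of the unwanted rows.
dropped = df.drop(df[df["Time"] != "06:00:00"].index)

print(kept.equals(dropped))  # True
```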

Pandas Dataframe Sorting by Date Time [duplicate]

New to Pandas, so maybe I'm missing a big idea?
I have a Pandas DataFrame of register transactions with shape like (500,4):
Time datetime64[ns]
Net Total float64
Tax float64
Total Due float64
I'm working through my code in a Python3 Jupyter notebook. I can't get past sorting any column. Working through the different code examples for sort, I'm not seeing the output reorder when I inspect the df. So, I've reduced the problem to trying to order just one column:
df.sort_values(by='Time')
# OR
df.sort_values(['Total Due'])
# OR
df.sort_values(['Time'], ascending=True)
No matter which column title, or which boolean argument I use, the displayed results never change order.
Thinking it could be a Jupyter thing, I've previewed the results using print(df), df.head(), and HTML(df.to_html()) (the last example is for Jupyter notebooks). I've also rerun the whole notebook from import CSV to this code. And, I'm also new to Python3 (from 2.7), so I get stuck with that sometimes, but I don't see how that's relevant in this case.
Another post has a similar problem, Python pandas dataframe sort_values does not work. In that instance, the ordering was on a column type string. But as you can see all of the columns here are unambiguously sortable.
Why does my Pandas DataFrame not display new order using sort_values?
df.sort_values(['Total Due']) returns a sorted DF, but it doesn't update DF in place.
So do it explicitly:
df = df.sort_values(['Total Due'])
or
df.sort_values(['Total Due'], inplace=True)
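A minimal sketch with a hypothetical three-row frame that has the same Time and Total Due columns as the question. Calling sort_values without keeping its result leaves df in its original order; reassigning keeps the sorted copy.

```python
import pandas as pd

df = pd.DataFrame({
    "Time": pd.to_datetime(["2020-01-03", "2020-01-01", "2020-01-02"]),
    "Total Due": [30.0, 10.0, 20.0],
})

# The sorted result is discarded here, so df keeps its order.
df.sort_values(by="Time")
print(df["Total Due"].tolist())  # [30.0, 10.0, 20.0]

# Reassigning keeps the sorted copy.
df = df.sort_values(by="Time")
print(df["Total Due"].tolist())  # [10.0, 20.0, 30.0]
```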
My problem, FYI, was that I wasn't returning the resulting DataFrame, so PyCharm wasn't bothering to update it. Naming the DataFrame after the return keyword fixed the issue.
Edit:
I had a bare return at the end of my method instead of
return df
which the debugger must have noticed, because df wasn't being updated in spite of my explicit, in-place sort.
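One common way a missing return bites, sketched with hypothetical function names. This uses the reassignment pattern rather than the answerer's in-place sort, so it illustrates the general pitfall, not necessarily their exact case. Rebinding df inside a function does not change the caller's variable, and a bare return hands back None. Note that return df, with a trailing comma would return a 1-tuple.

```python
import pandas as pd

def sort_without_return(df):
    # Rebinds the local name only; the caller's variable still points at the old frame.
    df = df.sort_values("Total Due")
    return  # bare return: the caller receives None

def sort_with_return(df):
    df = df.sort_values("Total Due")
    return df  # no trailing comma, which would return a 1-tuple

frame = pd.DataFrame({"Total Due": [30.0, 10.0, 20.0]})
print(sort_without_return(frame))                     # None
print(frame["Total Due"].tolist())                    # [30.0, 10.0, 20.0]
print(sort_with_return(frame)["Total Due"].tolist())  # [10.0, 20.0, 30.0]
```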

Why does my Pandas DataFrame not display new order using `sort_values`?

New to Pandas, so maybe I'm missing a big idea?
I have a Pandas DataFrame of register transactions with shape like (500,4):
Time datetime64[ns]
Net Total float64
Tax float64
Total Due float64
I'm working through my code in a Python3 Jupyter notebook. I can't get past sorting any column. Working through the different code examples for sort, I'm not seeing the output reorder when I inspect the df. So, I've reduced the problem to trying to order just one column:
df.sort_values(by='Time')
# OR
df.sort_values(['Total Due'])
# OR
df.sort_values(['Time'], ascending=True)
No matter which column title, or which boolean argument I use, the displayed results never change order.
Thinking it could be a Jupyter thing, I've previewed the results using print(df), df.head(), and HTML(df.to_html()) (the last example is for Jupyter notebooks). I've also rerun the whole notebook from import CSV to this code. And, I'm also new to Python3 (from 2.7), so I get stuck with that sometimes, but I don't see how that's relevant in this case.
Another post has a similar problem, Python pandas dataframe sort_values does not work. In that instance, the ordering was on a column type string. But as you can see all of the columns here are unambiguously sortable.
Why does my Pandas DataFrame not display new order using sort_values?
df.sort_values(['Total Due']) returns a sorted DF, but it doesn't update DF in place.
So do it explicitly:
df = df.sort_values(['Total Due'])
or
df.sort_values(['Total Due'], inplace=True)
My problem, fyi, was that I wasn't returning the resulting dataframe, so PyCharm wasn't bothering to update said dataframe. Naming the dataframe after the return keyword fixed the issue.
Edit:
I had return at the end of my method instead of
return df,
which the debugger must of noticed, because df wasn't being updated in spite of my explicit, in-place sort.

Categories