Changing Values in multiindex pandas dataframe - python

I have loaded a multiindex matrix from Excel into a pandas dataframe.
df = pd.read_excel("filename.xlsx", sheet_name=sheet, header=1, index_col=[0,1,2], usecols="B:AZ")
The dataframe has three index columns and one header row, so it has 4 indices. Part of the dataframe looks like this:
When I want to show a particular value, it works like this:
df[index1][index2][index3][index4]
I now want to change certain values in the dataframe. After searching through different forums for a while, it seemed like df.at would be the right method to do this. So I tried this:
df.at[index1, index2, index3, index4] = 10 (or any random value)
but I got a ValueError: Not enough indexers for scalar access (setting)!. I also tried different orders for the indices inside the df.at brackets.
Any help on this would be much appreciated!

It seems you need something like this:
df.loc[('Kosten','LKW','Sofia'),'Ruse']=10
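A minimal sketch of why this works, using a made-up two-row frame with the same labels as the answer (the level names and second city are illustrative). The key point is that the three row-index levels go into one tuple, with the column label passed separately:

```python
import pandas as pd

# Illustrative MultiIndex frame; level names and values are assumptions.
idx = pd.MultiIndex.from_tuples(
    [("Kosten", "LKW", "Sofia"), ("Kosten", "LKW", "Varna")],
    names=["type", "vehicle", "city"],
)
df = pd.DataFrame({"Ruse": [1.0, 2.0]}, index=idx)

# .loc takes the row labels as ONE tuple, then the column label:
df.loc[("Kosten", "LKW", "Sofia"), "Ruse"] = 10

# .at works the same way for scalar access (this is what the
# original df.at[index1, index2, index3, index4] attempt was missing):
df.at[("Kosten", "LKW", "Varna"), "Ruse"] = 20
```

Passing the four labels as separate arguments is what triggers "Not enough indexers for scalar access"; wrapping the row levels in a tuple resolves it.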

Related

Unable to update new column values in rows which were derived from an existing column having multiple values separated by ','?

Original dataframe
Converted Dataframe using stack and split:
Adding new column to a converted dataframe:
What I am trying to do is add a new column using np.select(condition, values), but it is not updating the two additional rows derived from H1; they come back as 0 or NaN. Can someone please help me here?
Please note I have already done the reset index, but it still isn't helping.
I think using numpy in this situation is unnecessary.
You can use something like the following code:
df.loc[df.State == 'CT', 'H3'] = 4400000
(The chained form df[df.State == 'CT']['H3'] = ... assigns into a temporary copy, so the original dataframe never changes.)
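A small runnable sketch of the single-`.loc` assignment, using a hypothetical frame with just the State and H3 columns from the question:

```python
import pandas as pd

# Illustrative frame; the real data has more rows and columns.
df = pd.DataFrame({"State": ["CT", "NY", "CT"], "H3": [0, 0, 0]})

# One .loc call with a boolean row mask and the column label,
# so the assignment hits the original frame rather than a copy:
df.loc[df.State == "CT", "H3"] = 4400000
```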

What's the fastest way to do these tasks?

I originally have some time series data, which looks like this and have to do the following:
1. First import it as a dataframe
2. Set the date column as a datetime index
3. Add some indicators, such as a moving average, as new columns
4. Do some rounding (values of the whole column)
5. Shift a column one row up or down (just to manipulate the data)
6. Then convert the df to a list (because I need to loop over it based on some conditions, and that's a lot faster than looping over a df; I need speed)
7. But now I want to convert the df to a dict instead of a list, because I want to keep the column names; it's more convenient
But now I found out that converting to a dict takes a lot longer than converting to a list, even when I do it manually instead of using the built-in Python method.
My question is: is there a better way to do it? Maybe not import it as a dataframe in the first place, and still be able to do Points 2 to 5? At the end I need to convert to a dict that lets me do the loop and keeps the column names as keys. Thanks.
P.S. The dict should look something like this; the format is similar to the df, and each row is basically the date with the corresponding data.
On item #7: If you want to convert to a dictionary, you can use df.to_dict()
On item #6: You don't need to convert the df to a list or loop over it: Here are better options. Look for the second answer (it says DON'T)
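A minimal sketch of steps 2–5 plus the `to_dict` conversion from the first answer, using a hypothetical one-column price series (column names and values are made up):

```python
import pandas as pd

# Hypothetical time series standing in for the real data.
df = pd.DataFrame(
    {"close": [10.0, 11.0, 12.0]},
    index=pd.date_range("2023-01-01", periods=3, name="date"),
)
df["ma2"] = df["close"].rolling(2).mean()  # step 3: an indicator column
df = df.round(2)                           # step 4: round whole columns
df["prev_close"] = df["close"].shift(1)    # step 5: shift a column down

# Step 7: orient="records" gives one dict per row, with the
# column names preserved as keys.
records = df.reset_index().to_dict("records")
```

With `orient="records"` each row becomes `{"date": ..., "close": ..., "ma2": ..., "prev_close": ...}`, so the loop can keep addressing values by column name.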

Why is `df.columns` an empty list while I can see the column names if I print out the dataframe? Python Pandas

import pandas as pd
DATA = pd.read_csv(url)
DATA.head()
I have a large dataset that has dozens of columns. After loading it like above into Colab, I can see the name of each column. But running DATA.columns just returns Index([], dtype='object'). What's happening here?
Now I find it impossible to pick out a few columns without column names. One way is to specify names = [...] when I load it, but I'm reluctant to do that since there are too many columns. So I'm looking for a way to index columns by integers, like in R, where df[, c(1,2,3)] would simply give me the first three columns of a dataframe. Somehow pandas seems to focus on column names and makes integer indexing very inconvenient, though.
So what I'm asking is: (1) What did I do wrong? Can I obtain those column names as well when I load the dataframe? (2) If not, how can I pick out the [0, 1, 10]th columns by a list of integers?
It seems that the problem is in the loading, as DATA.shape returns (10000, 0). I reran the loading code a few times, and all of a sudden things went back to normal. Maybe Colab was taking a nap or something?
You can perfectly do that using df.iloc[:, [0, 1, 10]] (.iloc selects by integer position, while .loc selects by label), but I would suggest you use the names, because if the columns ever change order or you insert new columns, the code can break.
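A short sketch of positional column selection on a made-up four-column frame (column names are illustrative):

```python
import pandas as pd

# Small stand-in frame; the real one has dozens of columns.
df = pd.DataFrame({"a": [1], "b": [2], "c": [3], "d": [4]})

# .iloc selects columns by integer position, similar to
# R's df[, c(1, 2, 3)] (pandas positions are 0-based):
first_three = df.iloc[:, [0, 1, 2]]
```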

multiplication of two columns in dataframe SettingWithCopyWarning

I have a large dataframe. I'm trying to do a simple multiplication between two columns and put the result in a new column, but when I do that I get this error message:
SettingWithCopyWarning: A value is trying to be set on a copy of a slice from a DataFrame.
My code looks like this:
DF['mult'] = DF['price'] * DF['rate']
I tried loc but it didn't work. Does anyone have a solution?
You should use df.assign() in this case:
df2 = DF.assign(mult=DF['price']*DF['rate'])
You get back a new dataframe with a 'mult' column added.
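A runnable sketch of the `.assign()` route. The warning usually means DF is itself a slice of another frame, so the example below reproduces that situation with a hypothetical region column before applying the fix:

```python
import pandas as pd

# Hypothetical source frame; DF is a slice of it, which is the
# typical setup that triggers SettingWithCopyWarning on assignment.
orig = pd.DataFrame(
    {"price": [2.0, 3.0], "rate": [1.5, 2.0], "region": ["A", "B"]}
)
DF = orig[orig.region == "A"]

# .assign() returns a brand-new frame with the 'mult' column added,
# so nothing is written into the slice and no warning is raised:
df2 = DF.assign(mult=DF["price"] * DF["rate"])
```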

Python Pandas Dataframe Pulling cell value of Column B based on Column A

Struggling here. Probably missing something incredibly easy, but beating my head on my desk while trying to learn Python and realizing that's probably not going to solve this for me.
I have a dataframe df and need to pull the value of column B based on the value of column A.
Here's what I can tell you about my dataset that should make it easier. Column A (FiscalYear) is unique, but despite being a year it was converted with to_numeric. Column B (Sales) is not specifically unique and, like Column A, was converted with to_numeric. This is what I have been trying, as I was able to do this when finding the value of Sales using idxmax. However, at a specific value, this is returning an error:
v = df.at[df.FiscalYear == 2007.0, 'Sales']
I am getting ValueError: At based indexing on an integer index can only have integer indexers. I am certain that I am doing something wrong, but I can't quite put my finger on it.
And here's the code that is working for me.
v = df.at[df.FiscalYear.idxmax(), 'Sales']
No issues there, returning the proper value, etc.
Any help is appreciated. I saw a bunch of similar threads, but for some reason searching and blindly writing lines of code is failing me tonight.
You can use the .loc method:
df.Sales.loc[df.FiscalYear==2007.0]
This will be a pandas Series object.
If you want it as a list, you can do:
df.Sales.loc[df.FiscalYear==2007.0].tolist()
Can you try this:
v = df.at[df.index[df.FiscalYear.eq(2007.0)][0], 'Sales']
(Reduce the boolean mask to an index label first; .at needs a scalar row label, and .index[0] on the mask alone would just give the frame's first label.)
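Both approaches side by side, on a hypothetical three-row frame (the FiscalYear/Sales values are made up):

```python
import pandas as pd

# Illustrative frame; FiscalYear is unique, Sales need not be.
df = pd.DataFrame(
    {"FiscalYear": [2006.0, 2007.0, 2008.0],
     "Sales": [100.0, 250.0, 300.0]}
)

# Boolean mask + .loc: returns a Series of every matching Sales value.
s = df.Sales.loc[df.FiscalYear == 2007.0]

# .at for scalar access: first turn the mask into a concrete index
# label, since .at cannot take a boolean mask directly.
v = df.at[df.index[df.FiscalYear.eq(2007.0)][0], "Sales"]
```

Passing the raw mask to `.at` is what produced the "At based indexing on an integer index can only have integer indexers" error in the question.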
