Grouping by two columns and creating a DataFrame from the result gives me this multiindex table. I didn't manage to access an element from it as described in the documentation. The access fails with KeyError: ('110166987', 'Direct Mail'). What am I doing wrong here?
As a second question, can I somehow pivot this DataFrame so that the second index variable "Channel" becomes the columns?
Let's try:
df.loc[('110166987','Direct Mail')]
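If the tuple lookup still raises a KeyError, one possible cause (an assumption, since the original table isn't shown) is that the first index level holds integers rather than strings. For the second question, unstack moves an index level into the columns. A minimal sketch with made-up data; the column names "Customer" and "Amount" are hypothetical:

```python
import pandas as pd

# Hypothetical data standing in for the grouped result; column names are assumed.
raw = pd.DataFrame({
    "Customer": [110166987, 110166987, 110166988],
    "Channel": ["Direct Mail", "Email", "Direct Mail"],
    "Amount": [10, 20, 30],
})
grouped = raw.groupby(["Customer", "Channel"]).sum()

# Index levels keep the dtype of the source columns, so the key must match it:
# here the first level is an integer, not the string '110166987'.
value = grouped.loc[(110166987, "Direct Mail"), "Amount"]

# Move the "Channel" index level into the columns.
wide = grouped["Amount"].unstack("Channel")
print(wide)
```

Checking grouped.index.dtypes (or grouped.index.levels) shows which key type each level expects.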
While running some code like this:
session = ...
return session.table([DB,SCHEMA, MANUAL_METRICS_BY_SIZE]).select("TECHNOLOGY","OBJECTTYPE","OBJECTTYPE","SIZE","EFFORT").to_pandas()
I got this error.
Any idea of what might be causing this?
Well, it was easier than I thought.
I had a duplicated column name, and pandas doesn't like that.
Just check your columns, for example with df.columns, and remove the duplicated column.
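On the pandas side, a check like the following can spot repeated names. This is only a sketch on a hypothetical frame that mirrors the repeated "OBJECTTYPE" in the select call above; with Snowpark, the simpler fix is to list each column once in select.

```python
import pandas as pd

# Hypothetical frame with a repeated column name (values are made up).
df = pd.DataFrame(
    [[1, "a", "a", 2.0]],
    columns=["TECHNOLOGY", "OBJECTTYPE", "OBJECTTYPE", "SIZE"],
)

# Names that appear more than once.
dupes = df.columns[df.columns.duplicated()].tolist()

# Keep only the first occurrence of each column name.
deduped = df.loc[:, ~df.columns.duplicated()]
print(dupes, list(deduped.columns))
```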
I'm new to Python and am trying to redo my first project from MATLAB. I've written some code in VS Code to import an Excel file using pandas:
filename=r'C:\Users\user\Desktop\data.xlsx'
sheet=['data']
with pd.ExcelFile(filename) as xls:
    Dateee=pd.read_excel(xls, sheet,index_col=0)
Then I want to access data in a row and column.
I tried to print data using code below:
for key in dateee.keys():
    print(dateee.keys())
but this returns nothing.
Is there any way to access the data (as a list)?
You can iterate on each column, making the contents of each a list:
for c in df:
    print(df[c].to_list())
Here df is the name the DataFrame is assigned to. (The OP's variable names were inconsistent, so I didn't use them.)
Look into df.iterrows() or df.itertuples() if you want to iterate by row. Example:
for row in df.itertuples():
    print(row)
Look into df.iloc and df.loc for row and column selection of individual values, see Pandas iloc and loc – quickly select rows and columns in DataFrames.
Or use df.iat or df.at to get or set single values.
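To make the selection options concrete, here is a small sketch on a made-up frame (the labels "x", "y", "z" and columns "a", "b" are placeholders, not the OP's data). Note that passing a list of sheet names to read_excel returns a dict of DataFrames keyed by sheet name, so in the question's code the frame itself would be the value stored under 'data'.

```python
import pandas as pd

# Made-up frame; index labels and column names are placeholders.
df = pd.DataFrame({"a": [1, 2, 3], "b": [4.0, 5.0, 6.0]}, index=["x", "y", "z"])

row_by_label = df.loc["y"].to_list()   # one row, selected by index label
row_by_pos = df.iloc[0].to_list()      # one row, selected by integer position
single_label = df.at["z", "b"]         # one value, by row label and column name
single_pos = df.iat[2, 0]              # one value, by row and column position

print(row_by_label, row_by_pos, single_label, single_pos)
```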
Original dataframe
Converted Dataframe using stack and split:
Adding new column to a converted dataframe:
What I am trying to do is add a new column using np.select(condition, values), but it is not updating the two additional rows derived from H1; it returns 0 or NaN for them. Can someone please help me here?
Please note I have already reset the index, but it still doesn't help.
I think using NumPy in this situation is unnecessary.
You can use something like the following code:
df.loc[df.State == 'CT', 'H3'] = 4400000
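If you do want to stay with np.select, keep in mind that its default argument is 0, so every row that matches none of the conditions gets 0 unless a default is supplied. A minimal sketch on made-up data (only 'State', 'CT', 'H3' and 4400000 come from the thread; the other values and the "H3_new" name are hypothetical):

```python
import numpy as np
import pandas as pd

# Made-up frame; only the 'State' and 'H3' column names come from the thread.
df = pd.DataFrame({"State": ["CT", "NY", "CT"], "H3": [100, 200, 300]})

conditions = [df["State"] == "CT"]
values = [4400000]

# Rows matching no condition fall back to the existing H3 value instead of 0.
df["H3_new"] = np.select(conditions, values, default=df["H3"])
print(df)
```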
I have a dataframe with a column I want to explode ("genre") and then reorganize the dataframe because I get many duplicates.
I also don't want to lose information. I get the following dataframe after using split and explode:
https://i.stack.imgur.com/eVWzg.png
I want to get a dataframe without the duplicates but keeping the genre column as it is. I thought of stacking it or making it multiindex but how should I proceed?
This database is not exactly what I'm using but it's similar in the column I want to work with:
https://www.kaggle.com/PromptCloudHQ/imdb-data
I have a large dataframe. I'm trying to do a simple multiplication between two columns and put the result in a new column. When I do that, I get this error message:
SettingWithCopyWarning: A value is trying to be set on a copy of a slice from a DataFrame.
My code looks like this:
DF['mult']=DF['price']*DF['rate']
I tried loc, but it didn't work. Does anyone have a solution?
You should use df.assign() in this case:
df2 = DF.assign(mult=DF['price']*DF['rate'])
You get back a new dataframe with a 'mult' column added.
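The warning usually appears when DF was itself created by slicing or filtering another dataframe. That is an assumption here, since the question doesn't show how DF was built. In that case, taking an explicit copy first is another option. A sketch with made-up data; the "source" frame and "keep" column are hypothetical:

```python
import pandas as pd

# Made-up source frame; 'keep' is a hypothetical filter column.
source = pd.DataFrame({
    "price": [10.0, 20.0, 30.0],
    "rate": [0.5, 0.25, 2.0],
    "keep": [True, False, True],
})

# An explicit copy makes DF independent of 'source', so in-place assignment is unambiguous.
DF = source[source["keep"]].copy()
DF["mult"] = DF["price"] * DF["rate"]

# Equivalent result without modifying DF in place.
df2 = DF.assign(mult=DF["price"] * DF["rate"])
print(df2)
```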