AttributeError: 'DataFrame' object has no attribute 'group' - python

I'm clueless about this error.
First I try
import pandas as pd
datafile = "E:\...\DPA.xlsx"
data = pd.read_excel(datafile)
data
And everything is fine. Then...
data.boxplot('DPA', by='Liga', figsize=(12, 8))
Everything still works. Then...
ctrl = data['DPA'][data.group == 'PremierLeague']
grps = pd.unique(data.group.values)
d_data = {grp:data['DPA'][data.group == grp] for grp in grps}
k = len(pd.unique(data.group)) # number of conditions
N = len(data.values) # conditions times participants
n = data.groupby('Liga').size()[0] #Participants in each condition
And here is when I get this error:
AttributeError: 'DataFrame' object has no attribute 'group'
Any ideas? I'm following these steps https://www.marsja.se/four-ways-to-conduct-one-way-anovas-using-python/ to run an ANOVA.
Thank you.

A DataFrame has no attribute called group. pandas does let you access a column with the same dot syntax used for attributes and methods, i.e. if you have a column col, you can get the Series for that column through
df.col
but that only works when a column with that name exists. What happened here is that your data is different from what the tutorial uses: its dataframe has a column named group, while yours does not. Judging by your boxplot and groupby calls, your grouping column is called Liga.
To solve the problem, you can either (I) rename your column to match the tutorial, or (II) replace data.group with your own column name, e.g. data.Liga or data['Liga'].
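As a minimal sketch of option (II), assuming the grouping column really is Liga (as the boxplot and groupby calls suggest). The small dataframe below is a made-up stand-in for DPA.xlsx, and its league names and values are purely illustrative:

```python
import pandas as pd

# Hypothetical stand-in for the contents of DPA.xlsx.
data = pd.DataFrame({
    "Liga": ["PremierLeague", "PremierLeague", "LaLiga", "LaLiga", "SerieA", "SerieA"],
    "DPA": [1.2, 1.4, 0.9, 1.1, 1.0, 1.3],
})

# Same steps as in the tutorial, with data.group replaced by data["Liga"].
ctrl = data["DPA"][data["Liga"] == "PremierLeague"]
grps = pd.unique(data["Liga"])
d_data = {grp: data["DPA"][data["Liga"] == grp] for grp in grps}
k = len(grps)                             # number of conditions
N = len(data)                             # conditions times participants
n = data.groupby("Liga").size().iloc[0]   # participants in the first condition
```

Using data["Liga"] instead of data.Liga also keeps working if a column name ever clashes with a DataFrame method or contains spaces.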

Related

Drop/edit rows in dataframe where entry doesn't meet condition

I know this has been asked before but I cannot find an answer that is working for me. I have a dataframe df that contains a column age, but the values are not all integers, some are strings like 35-59. I want to drop those entries. I have tried these two solutions as suggested by kite but they both give me AttributeError: 'Series' object has no attribute 'isnumeric'
df.drop(df[df.age.isnumeric()].index, inplace=True)
df = df.query("age.isnumeric()")
df = df.reset_index(drop=True)
Additionally is there a simple way to edit the value of an entry if it matches a certain condition? For example instead of deleting rows that have age as a range of values, I could replace it with a random value within that range.
Try with:
df.drop(df[df.age.str.isnumeric() == False].index, inplace=True)
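If you check the documentation, isnumeric is a method of Series.str and not of Series. That's why you get that error.
You also need the == False because the column has mixed types: .str.isnumeric() returns NaN for the entries that are not strings, and comparing with == False turns the result into a purely boolean mask that is True only for the non-numeric strings.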
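To see what the mask looks like, here is a small sketch on a made-up age column that mixes an integer, a range string and a numeric string (the values are assumptions for illustration):

```python
import pandas as pd

# Hypothetical mixed-type column: an int, a range string, a numeric string.
df = pd.DataFrame({"age": [23, "35-59", "41"]})

# .str.isnumeric() gives NaN for the int, False for "35-59", True for "41".
flags = df.age.str.isnumeric()

# Comparing with False leaves True only where the entry is a non-numeric string.
mask = flags == False

df = df.drop(df[mask].index).reset_index(drop=True)
```

Note that the numeric string "41" is kept as a string; converting it to an integer would be a separate step.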
I'm posting this in case it also helps with your last question. You can use pandas.DataFrame.at together with pandas.DataFrame.itertuples to iterate over the rows of the dataframe and replace values:
for row in df.itertuples():
    # iterate over every row and change the value in that column
    if row.age == 'non_desirable_value':
        df.at[row.Index, "age"] = 'desirable_value'
Hence, it could be (row.age is a plain Python value here, so call the str method directly instead of going through the .str accessor):
for row in df.itertuples():
    # only strings have isnumeric(), so check the type first
    if (isinstance(row.age, str) and not row.age.isnumeric()) or row.age == 'non_desirable_value':
        df.at[row.Index, "age"] = 'desirable_value'
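For the idea of replacing a range with a random value inside that range, one possible sketch uses Series.apply with a small helper. The helper name resolve_age and the sample data are hypothetical, and it assumes ranges are always written as low-high with non-negative integers:

```python
import random

import pandas as pd

# Hypothetical sample data: ints, numeric strings and "low-high" ranges.
df = pd.DataFrame({"age": [23, "35-59", "41", "18-25"]})


def resolve_age(value):
    """Return an int age; a 'low-high' string becomes a random int in that range."""
    if isinstance(value, str) and "-" in value:
        low, high = (int(part) for part in value.split("-"))
        return random.randint(low, high)  # both bounds are inclusive
    return int(value)


df["age"] = df["age"].apply(resolve_age)
```

This avoids the explicit row loop and leaves the column with integer values only.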

Renaming pandas columns gives not found in index error

I have a data frame called v whose columns are ['self','id','desc','name','arch','rel']. When I rename them as follows, it won't let me drop columns and gives a "not found in axis" error.
case1:
for i in range(0,len(v.columns)):
    #I'm trying to add 'v_' prefix to all col names
    v.columns.values[i] = 'v_' + v.columns.values[i]
v.drop('v_self',1)
#leads to error
KeyError: "['v_self'] not found in axis"
But if I do it as follows then it works fine
case2:
v.columns = ['v_self','v_id','v_desc','v_name','v_arch','v_rel']
v.drop('v_self',1)
# no error
In both cases, the following gives the same result for the columns:
v.columns
#both cases gives
Index(['v_self', 'v_id', 'v_description', 'v_name', 'v_archived',
'v_released'],
dtype='object')
I can't understand why in the case1 it gives an error? Please help, thanks.
That's because .values returns the underlying values. You're not supposed to change those directly. Assigning directly to .columns is supported though.
Try something like this:
import pandas
df = pandas.DataFrame(
    [
        {key: 0 for key in ["self", "id", "desc", "name", "arch", "rel"]}
        for _ in range(100)
    ]
)
# Add a v_ to every column
df.columns = [f"v_{column}" for column in df.columns]
# Drop one column
df = df.drop(columns=["v_self"])
Regarding your "case 1":
You have run into a bug (#38547) in pandas: “Direct renaming of 1 column seems to be accepted, but only old name is working”.
It means that after that "renaming", you can delete the first column not by using
v.drop('v_self',1)
but by using the old name
v.drop('self',1)
Of course, the better option is to avoid such buggy renaming in current versions of pandas.
To rename columns by adding a prefix to every label:
There is a DataFrame method, .add_prefix(), made for exactly this:
v = df.add_prefix("v_")
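A short sketch of the supported route, using an empty dataframe with the column names from the question (the data itself is not needed to show the renaming):

```python
import pandas as pd

# Empty frame with the column names from the question.
v = pd.DataFrame(columns=["self", "id", "desc", "name", "arch", "rel"])

# add_prefix returns a new dataframe whose labels all start with "v_".
v = v.add_prefix("v_")

# The new label is now a real column label, so dropping it works.
v = v.drop(columns=["v_self"])
```

Passing columns= to drop instead of a positional axis argument also keeps the call valid on newer pandas versions.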

Excel column converter with a specific column does not work

I tried to write a program that lets the user enter a column, sort that column, and replace its cells with other entered information, but I get an error (probably a syntax mistake on my side).
I searched but could not find any solution.
import pandas as pd
data = pd.read_csv('List')
df = pd.DataFrame(data, columns = ['A','B','C','D','E','F','G','H','I','J','K','L','M','N','O'])
findL = ['example']
replaceL = ['convert']
col = 'C';
df[col] = df[col].replace(findL, replaceL)
TypeError: Cannot compare types 'ndarray(dtype=float64)' and 'str'
It seems that df[col] and the values in findL and replaceL do not have the same data type. Try running df[col] = df[col].astype(str) before you run df[col] = df[col].replace(findL, replaceL) and it should work.
If the column(s) you are dealing with have blank entries, set the na_filter parameter of .read_csv() to False.
That way, blank/empty entries are read as empty strings rather than NaN, so a text column stays entirely str.
The .replace() call will then not raise a TypeError, because it compares strings with strings and not 'ndarray(dtype=float64)' with 'str'.
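Both suggestions can be combined in a small sketch. The CSV text below is invented to stand in for the 'List' file, with one blank cell in column C:

```python
import io

import pandas as pd

# Hypothetical CSV content standing in for the 'List' file; row 2 has a blank C.
csv_text = "A,B,C\n1,x,example\n2,y,\n3,z,other\n"

# na_filter=False keeps blank cells as empty strings instead of NaN.
df = pd.read_csv(io.StringIO(csv_text), na_filter=False)

findL = ["example"]
replaceL = ["convert"]
col = "C"

# astype(str) makes sure the column holds only strings before replacing.
df[col] = df[col].astype(str).replace(findL, replaceL)
```

If the blanks should stay missing values, an alternative is to leave na_filter at its default and only call .astype(str) on the column being replaced, keeping in mind that NaN then becomes the string "nan".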

How to use df.loc and if conditions in python pandas to delete a row

I wanted to use the if condition and df.loc[..] to compare two values in the same column.
If the previous value is higher than the next one, I want to delete the complete row.
This is what I tried, with my example:
import pandas as pd
data = [('cycle',[1,1,2,2,3,3,4,4]),
('A',[0.1,0.5,0.2,0.6,0.15,0.43,0.13,0.59]),
('B',[ 500, 600, 510,580,512,575,499,598]),
('time',[0.0,0.2,0.5,0.4,0.6,0.7,0.5,0.8]),]
df = pd.DataFrame.from_items(data)
df = df.drop(df.loc[i,'time']<df.loc[i-1,'time'].index)
print(df)
and I got the following error :
TypeError: '<' not supported between instances of 'numpy.ndarray' and 'str'
Help is very much appreciated.
Try this:
df.drop(df.loc[df.time< df.time.shift()].index, inplace=True)
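One problem is that .index is applied to the second df.loc[...] before the comparison happens, because attribute access binds more tightly than <. Also, i is never defined, and comparing two single values gives a plain boolean that has no .index. To compare every row with the previous one, build the mask from the whole column, for example:
df = df.drop(df.index[df['time'] < df['time'].shift()])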
Try using pd.DataFrame.shift
Using shift:
df[df.time > df.time.shift()]
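df.time.shift() returns the series with its values moved down by one position, so each value can be compared with the one immediately before it. The first row has no predecessor and is compared with NaN, which is always False, so that row is filtered out. You can set the fill_value parameter to control what the first row is compared against:
df[df.time > df.time.shift(fill_value=0)]
Note that in the example data the first time is 0.0, and 0.0 > 0 is still False; pick a fill_value below the smallest time, or keep the rows with df[~(df.time < df.time.shift())] instead.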
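Here is a runnable sketch of the shift approach on the example data from the question. It builds the dataframe from a plain dict, since DataFrame.from_items is no longer available in current pandas:

```python
import pandas as pd

df = pd.DataFrame({
    "cycle": [1, 1, 2, 2, 3, 3, 4, 4],
    "A": [0.1, 0.5, 0.2, 0.6, 0.15, 0.43, 0.13, 0.59],
    "B": [500, 600, 510, 580, 512, 575, 499, 598],
    "time": [0.0, 0.2, 0.5, 0.4, 0.6, 0.7, 0.5, 0.8],
})

# True where a row's time is lower than the time in the previous row.
# The first row compares against NaN, which yields False, so it is kept.
went_back = df["time"] < df["time"].shift()

# Keep everything except the rows where time went backwards.
cleaned = df[~went_back]
```

Each row is compared with its original neighbour, so removing a row does not trigger a second comparison against the new neighbour; repeat the step if that is what you need.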

Using pandas apply() function on a dataframe to create a new dataframe

I have a problem annoying me for some time now. I have written a function that should, based on the row values of a dataframe, create a new dataframe filled with values based on a condition in the function. My function looks like this:
def intI():
    df_ = pd.DataFrame()
    df_ = df_.fillna(0)
    for index, row in Anno.iterrows():
        genes=row['AR_Genes'].split(',')
        df=pd.DataFrame()
        if 'intI1' in genes:
            df['Year']=row['Year']
            df['Integrase']= 1
            df_=df_.append(df)
        elif 'intI2' in genes:
            df['Year']=row['Year']
            df['Integrase']= 1
            df_=df_.append(df)
        else:
            df['Year']=row['Year']
            df['Integrase']= 0
            df_=df_.append(df)
    return df_
when I call it like this Newdf=Anno['AR_Genes'].apply(intI()), I get the following error:
TypeError: 'DataFrame' object is not callable
I really do not understand why it does not work. I have done similar things before, but there seems to be a difference that I do not get. Can anybody explain what is wrong here?
*******************EDIT*****************************
Anno in the function is the dataframe that the function shall be run on. Its AR_Genes column contains a string, for example a,b,c,ad,c
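Series.apply (like DataFrame.apply) takes a function and calls it on each element (or on each row/column). The error occurs because you wrote apply(intI()): the parentheses call intI right away, so what you pass to apply is the DataFrame it returns, not the function. apply then tries to call that DataFrame, hence 'DataFrame' object is not callable.
Why do you use .fillna(0) on a newly created, empty DataFrame?
Since intI already loops over Anno itself, wouldn't this work? Newdf = intI()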
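If the goal is to use apply, one possible sketch passes a per-element function (without calling it) and builds the new dataframe from the result. The helper name has_integrase and the contents of Anno below are assumptions for illustration:

```python
import pandas as pd

# Hypothetical stand-in for the asker's Anno dataframe.
Anno = pd.DataFrame({
    "Year": [2001, 2002, 2003],
    "AR_Genes": ["intI1,blaTEM", "tetA,sul1", "aadA,intI2"],
})


def has_integrase(gene_string):
    """Return 1 if the comma-separated gene list contains intI1 or intI2, else 0."""
    genes = gene_string.split(",")
    return int("intI1" in genes or "intI2" in genes)


# Pass the function itself (no parentheses); apply calls it once per element.
Newdf = pd.DataFrame({
    "Year": Anno["Year"],
    "Integrase": Anno["AR_Genes"].apply(has_integrase),
})
```

This also sidesteps DataFrame.append, which has been removed in pandas 2.0.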
