I am having trouble indexing a dataframe - python

So I have a csv, and I am trying to load it into a dataframe via
df = pd.read_csv("watchlist.csv", sep='\s{2,}',)
It seems to work fine when I print(df)
Also, when I print columns, this is the output I get.
print(df.columns) #- OUTPUT:
Index([',Name,Growth,Recommendation,CurrentRatio,TotalCash,Debt,Revenue,PercentageSharesOut,PercentageInstitutions,PercentageInsiders,PricetoBook,ShortRatio,RegularMarketPrice'], dtype='object')
The trouble I'm having is that when I then try to access a column with something like
med_debt = math.floor(df.Debt), or even
print(df.Debt)
I get an attribute error:
AttributeError: 'DataFrame' object has no attribute 'Debt'
Any assistance here would be appreciated.

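The sep='\s{2,}' parameter tells pandas to split on runs of two or more whitespace characters. A comma-separated file contains no such runs, so each line is read as a single field and the whole header becomes one column name (a single string). For example: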
>>> df = pd.read_csv("weather", sep='\s{2,}')
>>> df.columns
Index(['Date/Time,Temp (C),Dew Point Temp (C),Rel Hum (%),Wind Spd (km/h),
Visibility (km),Stn Press (kPa),Weather'], dtype='object')
>>> df.index
RangeIndex(start=0, stop=8784, step=1)
When you then try to access a specific column, math.floor(df.Debt) raises
AttributeError: 'DataFrame' object has no attribute 'Debt'
and df["Debt"] raises
raise KeyError(key) from err
KeyError: 'Debt'
To access individual columns this way, read the file with the default comma separator:
df = pd.read_csv("watchlist.csv")

The separator is not splitting the CSV correctly. Try leaving it out and letting read_csv use its default separator, a comma (,), instead.
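A minimal sketch of the difference between the two separators, using a small in-memory stand-in for watchlist.csv (the three rows and their values are made up for illustration; only the header layout follows the question). It also assumes the goal of math.floor(df.Debt) was the floored median, since math.floor expects a single number rather than a whole column.

```python
import io
import math

import pandas as pd

# Hypothetical miniature of watchlist.csv: comma-separated, with an unnamed
# first column (the header starts with a comma, as in the question).
csv_text = (
    ",Name,Debt\n"
    "0,AAA,10.5\n"
    "1,BBB,20.5\n"
    "2,CCC,30.7\n"
)

# Regex separator: the file has no run of 2+ whitespace characters,
# so every line is read as one field and the header is one column name.
bad = pd.read_csv(io.StringIO(csv_text), sep=r"\s{2,}", engine="python")
print(list(bad.columns))

# Default separator (comma); index_col=0 uses the unnamed first column as the index.
good = pd.read_csv(io.StringIO(csv_text), index_col=0)
print(list(good.columns))

# Reduce the column to one number first (median is an assumption), then floor it.
med_debt = math.floor(good["Debt"].median())
print(med_debt)
```

Printing df.columns right after loading, as the question does, is a quick way to spot this: a single long name containing commas means the separator did not match.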

Related

Expected unicode, got pandas._libs.properties.CachedProperty

I'm trying to add empty columns to my dataset on Colab, but it gives me this error. When I run the same code on my local machine, it works perfectly fine. Does anybody know a possible solution for this?
My code.
dataframe["Comp"] = ''
dataframe["Negative"] = ''
dataframe["Neutral"] = ''
dataframe["Positive"] = ''
dataframe
Error message
TypeError: Expected unicode, got pandas._libs.properties.CachedProperty
I ran into a similar issue today:
"Expected unicode, got pandas._libs.properties.CachedProperty"
My dataframe (called df) has a time index. When I add a new column to it and fill it with numpy.array data, it raises this error. I tried setting it with df.index or df.index.value. It always raises this error.
Finally, I solved it in 3 steps:
df = df.reset_index()
df['new_column'] = new_column_data # it is np.array format
df = df.set_index('original_index_name')
This question is the same as https://stackoverflow.com/a/67997139/16240186, and there's a simple way to solve it: df = df.asfreq('H') # freq can be min, D, M, S, 5min, etc.
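The three-step workaround above (reset the index, assign the column, restore the index) can be sketched on a small frame. The index name "ts", the column names, and the data are illustrative assumptions, and this only shows the mechanics of the steps; it does not reproduce the Colab error itself.

```python
import numpy as np
import pandas as pd

# Hypothetical frame with a named DatetimeIndex ("ts" is an illustrative name).
df = pd.DataFrame(
    {"value": [1.0, 2.0, 3.0]},
    index=pd.date_range("2021-01-01", periods=3, freq="D", name="ts"),
)
new_column_data = np.array([10, 20, 30])

df = df.reset_index()               # the index becomes an ordinary "ts" column
df["new_column"] = new_column_data  # the array is assigned by position
df = df.set_index("ts")             # restore the original index
print(df)
```

Because a NumPy array carries no index, the assignment is positional, so the array length has to match the number of rows.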

Index to column

I tried to convert my index to a column, but I get the error: AttributeError: 'DataFrame' object has no attribute 'reset_Seriennummer'. It should be simple, but it doesn't work.
My index is not called Index, but it is written the same way:
My df:
Seriennummer 0
701085.0 "(array([1.52558046e+03, 2.55900548e+02, 5.96901108e-01]), array([[ 9.41414894e+03, -2.07982124e+03, -2.30130078e+00],
[-2.07982124e+03, 1.44373786e+03, 9.59282709e-01],
[-2.30130078e+00, 9.59282709e-01, 7.75807643e-04]]))"
701086.0 "(array([1.19304206e+03, 2.71174688e+02, 6.59205468e-01]), array([[ 5.21906135e+03, -2.23855187e+03, -2.11896425e+00],
[-2.23855187e+03, 2.61036500e+03, 1.67396324e+00],
[-2.11896425e+00, 1.67396324e+00, 1.22581746e-03]]))"
What I tried so far:
df['Seriennummer'] = df.Seriennummer
or
df.reset_Seriennummer(level=0, inplace=True)
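The method is always named reset_index, whatever the index itself is called; the index name goes in the level argument. This will work: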
df.reset_index(level='Seriennummer')
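A short sketch of that call on a stand-in frame. The "fit" column and its string values are placeholders for the array tuples in the question; only the index name Seriennummer comes from the original.

```python
import pandas as pd

# Stand-in for the question's frame: the index is named "Seriennummer".
df = pd.DataFrame(
    {"fit": ["result_a", "result_b"]},
    index=pd.Index([701085.0, 701086.0], name="Seriennummer"),
)

# reset_index returns a new DataFrame; the named index becomes a column.
flat = df.reset_index(level="Seriennummer")
print(list(flat.columns))
```

Note that df itself is unchanged here, so assign the result back (df = df.reset_index(level='Seriennummer')) if the change should stick.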

'DataFrame' object is not callable error when I try to create a new df

I am trying to create a new df out of df_exo, but the error I get is 'DataFrame' object is not callable. df_exo is a DataFrame of shape (176, 1222). What is going wrong?
df_features = df_exo(['INDU.NL.INTM.1.BS.M', 'INDU.NL.CONS.1.BS.M_4',\
'INDU.NL.INTM.2.BS.M', 'INDU.NL.INTM.3.BS.M_12', 'INDU.NL.CONS.4.BS.M_10',\
'INDU.NL.INTM.COF.BS.M_3', 'INDU.NL.INTM.COF.BS.M_4', 'INDU.NL.INVE.5.BS.M_11',\
'INDU.NL.FOBE.7.BS.M_4', 'INDU.NL.TOT.1.BS.M_1', 'INDU.NL.TOT.6.BS.M_4',\
'INDU.NL.INTM.2.BS.M', 'SERV.NL.TOT.2.BS.M', 'SERV.NL.TOT.3.BS.M',\
'SERV.NL.TOT.1.BS.M_2', 'SERV.NL.TOT.1.BS.M_3', 'SERV.NL.TOT.3.BS.M_1',\
'SERV.NL.TOT.3.BS.M_2', 'SERV.NL.TOT.COF.BS.M_7', 'CONS.NL.TOT.7.BS.M',\
'CONS.NL.TOT.6.BS.M_12', 'CONS.NL.TOT.7.BS.M_1', 'CONS.NL.TOT.7.BS.M_2',\
'CONS.NL.TOT.7.BS.M_12', 'BUIL.NL.TOT.3.BS.M_12'])
Use square brackets to select columns:
df_features = df_exo[['col1', 'col2']]
not parentheses, which try to call the DataFrame as if it were a function:
df_features = df_exo(['col1', 'col2'])
Reference:
Selecting multiple columns in a pandas dataframe
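A minimal sketch showing both forms side by side on a made-up three-column frame (col1, col2, col3 are placeholder names, as in the answer above):

```python
import pandas as pd

# Placeholder frame standing in for df_exo.
df_exo = pd.DataFrame({"col1": [1, 2], "col2": [3, 4], "col3": [5, 6]})

# Square brackets with a list of names select those columns.
df_features = df_exo[["col1", "col2"]]
print(df_features.shape)

# Parentheses attempt a function call, which a DataFrame does not support.
try:
    df_exo(["col1", "col2"])
except TypeError as err:
    error_message = str(err)
    print(error_message)
```

One thing to check in the original list: 'INDU.NL.INTM.2.BS.M' appears twice, and a repeated name in the selection list yields that column twice in the result.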

Pandas Python user input an attribute for dataframe object

I'm trying to allow the user to input the attribute for the dataframe object.
I've tried changing my input into a string. I've also tried using my input saved to a variable. Both these options do not work.
data = pd.read_csv('2019FallEnrollees.csv')
input1_col = input("Enter comparison group A: ")
input2_col = input("Enter comparison group B: ")
input1_str= str(input1_col)
input2_str = str(input2_col)
test = data[['CUM_GPA', input1_str, input2_str]]
# error here! 'test' does not have attribute 'input1_str' or 'input1_col'
df_1 = test[(test.input1_str == 0) & (test.input2_str == 0)]
df_2 = test[(test.input1_col == 1) & (test.input2_col == 0)]
print(stats.ttest_ind(df_1.CUM_GPA, df_2.CUM_GPA, equal_var = False))
The error message says
AttributeError: 'DataFrame' object has no attribute 'input1_str'
or
AttributeError: 'DataFrame' object has no attribute 'input1_col'
Welcome!
Attribute access such as test.input1_str looks for a column literally named input1_str; it does not use the value stored in the variable.
Use data[column] instead, or in your case test[input1_col].
Before you do so, make sure the column exists and the user is not entering a nonexistent column name.
Sometimes a column name is an integer rather than a string, in which case converting the input to a string will not match it.
You can get all the dataframe's column names by running data.columns (if you want a regular list: list(data.columns)), and in fact you can rename the columns by running data.columns = ["Column Header 1", "Column Header 2", etc.]
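A sketch of the bracket-based version with a membership check. The helper name split_groups and the sample data are hypothetical; in the original script the two names would come from input(), and the returned frames would then go to stats.ttest_ind as before.

```python
import pandas as pd

def split_groups(data, col_a, col_b):
    """Hypothetical helper: split rows by the values of two user-named columns."""
    # Reject names that are not actual columns before indexing with them.
    missing = [c for c in (col_a, col_b) if c not in data.columns]
    if missing:
        raise KeyError(f"Unknown column(s): {missing}")

    test = data[["CUM_GPA", col_a, col_b]]
    # Bracket indexing uses the *value* of the variable as the column name.
    df_1 = test[(test[col_a] == 0) & (test[col_b] == 0)]
    df_2 = test[(test[col_a] == 1) & (test[col_b] == 0)]
    return df_1, df_2

# Made-up data; "A" and "B" stand in for whatever the user types.
data = pd.DataFrame({
    "CUM_GPA": [3.0, 3.5, 2.5, 4.0],
    "A": [0, 1, 0, 1],
    "B": [0, 0, 1, 0],
})
df_1, df_2 = split_groups(data, "A", "B")
print(df_1["CUM_GPA"].tolist(), df_2["CUM_GPA"].tolist())
```

Since input() already returns a string, the extra str() conversions in the question are not needed.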

How to select separate string values from an array and use them as column names in a pandas dataframe Python?

I have following array containing string values:
type(array)
pandas.core.indexes.base.Index
print(array)
Index(['hooiland_1_1', 'hooiland_1_2', 'hooiland_1_3', 'hooiland_1_4',
'roggeteelt_1_1', 'roggeteelt_1_2', 'roggeteelt_1_3', 'roggeteelt_1_4',
'zwartebraak_1_1', 'zwartebraak_1_2', 'zwartebraak_1_3',
'zwartebraak_1_4', 'hooiland_2_1', 'hooiland_2_2', 'hooiland_2_3',
'hooiland_2_4', 'roggeteelt_2_1', 'roggeteelt_2_2', 'roggeteelt_2_3',
'roggeteelt_2_4', 'zwartebraak_2_1', 'zwartebraak_2_2',
'zwartebraak_2_3', 'zwartebraak_2_4'],
dtype='object')
I want to use each of the string values in this array as a separate column name in an empty pandas dataframe.
tried:
pd.DataFrame(columns=["class",array]) ###first column is "class"
gives error: Invalid syntax
Also tried to split the array but without success:
array.split()
gives error :AttributeError: 'Index' object has no attribute 'split'
Is there an easy way to do this?
Expected output:
dataframe:
class hooiland_1_1 hooiland_1_2 hooiland_1_3 .... zwartebraak_2_4
class1 value value value value
class2
class3
I add the values later in the process, but I first need to construct the empty dataframe with the correct column names.
pd.DataFrame(columns=["class"] + array.tolist())
You need Index.insert
idx_arr = idx_arr.insert(0, 'class')
Out[444]:
Index(['class', 'hooiland_1_1', 'hooiland_1_2', 'hooiland_1_3', 'hooiland_1_4',
'roggeteelt_1_1', 'roggeteelt_1_2', 'roggeteelt_1_3', 'roggeteelt_1_4',
'zwartebraak_1_1', 'zwartebraak_1_2', 'zwartebraak_1_3',
'zwartebraak_1_4', 'hooiland_2_1', 'hooiland_2_2', 'hooiland_2_3',
'hooiland_2_4', 'roggeteelt_2_1', 'roggeteelt_2_2', 'roggeteelt_2_3',
'roggeteelt_2_4', 'zwartebraak_2_1', 'zwartebraak_2_2',
'zwartebraak_2_3', 'zwartebraak_2_4'],
dtype='object')
pd.DataFrame(columns=idx_arr)
Out[447]:
Empty DataFrame
Columns: [class, hooiland_1_1, hooiland_1_2, hooiland_1_3, hooiland_1_4,
roggeteelt_1_1, roggeteelt_1_2, roggeteelt_1_3, roggeteelt_1_4,
zwartebraak_1_1, zwartebraak_1_2, zwartebraak_1_3, zwartebraak_1_4,
hooiland_2_1, hooiland_2_2, hooiland_2_3, hooiland_2_4,
roggeteelt_2_1, roggeteelt_2_2, roggeteelt_2_3, roggeteelt_2_4,
zwartebraak_2_1, zwartebraak_2_2, zwartebraak_2_3, zwartebraak_2_4]
Index: []
Your code pd.DataFrame(columns=["class",array]) does not build a flat list of column names: ["class", array] is a two-element list whose second element is the entire Index.
If you want to add "class" to the beginning of the index, you could try:
pd.DataFrame(columns=array.insert(0, 'class')) # adding 'class' to the beginning of the index
By the way, you may want to avoid using array as your variable name...
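Both suggestions from the answers above, sketched on a shortened Index (three of the original names, under the variable name names instead of array):

```python
import pandas as pd

# Shortened stand-in for the Index in the question.
names = pd.Index(["hooiland_1_1", "roggeteelt_1_1", "zwartebraak_1_1"])

# Option 1: convert the Index to a plain list and concatenate.
empty_a = pd.DataFrame(columns=["class"] + names.tolist())

# Option 2: Index.insert returns a new Index with "class" at position 0.
empty_b = pd.DataFrame(columns=names.insert(0, "class"))

print(list(empty_a.columns))
print(list(empty_b.columns))
```

Index.insert does not modify names in place, which is why the earlier answer reassigns the result (idx_arr = idx_arr.insert(0, 'class')).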
