Pandas Python user input an attribute for dataframe object - python

I'm trying to allow the user to input the attribute for the dataframe object.
I've tried converting my input to a string, and I've also tried using the input saved in a variable. Neither option works.
data = pd.read_csv('2019FallEnrollees.csv')
input1_col = input("Enter comparison group A: ")
input2_col = input("Enter comparison group B: ")
input1_str= str(input1_col)
input2_str = str(input2_col)
test = data[['CUM_GPA', input1_str, input2_str]]
# error here! 'test' does not have attribute 'input1_str' or 'input1_col'
df_1 = test[(test.input1_str == 0) & (test.input2_str == 0)]
df_2 = test[(test.input1_col == 1) & (test.input2_col == 0)]
print(stats.ttest_ind(df_1.CUM_GPA, df_2.CUM_GPA, equal_var = False))
The error message says
AttributeError: 'DataFrame' object has no attribute 'input1_str'
or
AttributeError: 'DataFrame' object has no attribute 'input1_col'

Welcome!
Attribute access such as test.input1_str looks for a column literally named "input1_str"; it does not use the value stored in the variable.
When the column name is held in a variable, use bracket indexing instead: data[column], or in your case test[input1_col].
Before you do so, make sure the column exists and the user has not entered a nonexistent column name.
input() already returns a string, so the str() calls are redundant. Column names can also be integers, in which case the input string has to be converted before it will match.
You can list all the dataframe's columns with data.columns (or list(data.columns) if you want a regular Python list), and you can rename the columns by assigning to it: data.columns = ["Column Header 1", "Column Header 2", ...]
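A minimal sketch of the bracket-indexing approach, assuming a small in-memory frame in place of 2019FallEnrollees.csv. The column names ATHLETE and HONORS and the helper pick_groups are hypothetical, introduced only for illustration:

```python
import pandas as pd

# Hypothetical stand-in for 2019FallEnrollees.csv; ATHLETE and HONORS are assumed names
data = pd.DataFrame({
    "CUM_GPA": [3.1, 2.8, 3.6, 3.9],
    "ATHLETE": [0, 1, 0, 1],
    "HONORS": [0, 0, 0, 1],
})

def pick_groups(df, col_a, col_b):
    """Split df into two comparison groups using column names held in variables."""
    # Reject names that are not actual columns before indexing
    missing = [c for c in (col_a, col_b) if c not in df.columns]
    if missing:
        raise KeyError(f"Unknown column(s): {missing}")
    test = df[["CUM_GPA", col_a, col_b]]
    # Bracket indexing uses the *value* of col_a / col_b as the column name
    df_1 = test[(test[col_a] == 0) & (test[col_b] == 0)]
    df_2 = test[(test[col_a] == 1) & (test[col_b] == 0)]
    return df_1, df_2

# In the real script these two names would come from input(...).strip()
df_1, df_2 = pick_groups(data, "ATHLETE", "HONORS")
print(df_1["CUM_GPA"].tolist(), df_2["CUM_GPA"].tolist())
```

The two resulting frames can then be passed to stats.ttest_ind(df_1["CUM_GPA"], df_2["CUM_GPA"], equal_var=False) as in the question.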

Related

How to extract part of a row's data and add it to a new column?

df_projects_1 is a dataframe
I want to extract the author name which is in part of each row
df_projects_1["current_status"][0]["author"]['name'] works and gets the author
For each row, I extract the author name and add it to a new column named "Author_Name"
i = 0
while i < len(df_projects_1):
    df_projects_1["Author_Name"] = df_projects_1["current_status"][i]["author"]['name']
    i = i+1
df_projects_1
This returns the error 'NoneType' object is not subscriptable
The variable i here is causing the issue.
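One possible approach, sketched under the assumption that some rows hold None instead of a nested dict (which is what the error message suggests). Note also that the loop above assigns the whole column on every iteration, so each pass overwrites all rows. The sample data and the helper author_name are hypothetical:

```python
import pandas as pd

# Hypothetical data shaped like the question: nested dicts, with some values missing
df_projects_1 = pd.DataFrame({"current_status": [
    {"author": {"name": "Ada"}},
    None,
    {"author": None},
]})

def author_name(status):
    """Return status["author"]["name"], or None when any level is missing."""
    if not isinstance(status, dict):
        return None
    author = status.get("author")
    if not isinstance(author, dict):
        return None
    return author.get("name")

# apply() computes one value per row instead of overwriting the whole column each time
df_projects_1["Author_Name"] = df_projects_1["current_status"].apply(author_name)
print(df_projects_1["Author_Name"].tolist())
```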

I am having trouble indexing a dataframe

So I have a csv, and I am trying to load it into a dataframe via
df = pd.read_csv("watchlist.csv", sep='\s{2,}',)
It seems to work fine when I print(df)
Also, when I print columns, this is the output I get.
print(df.columns) #- OUTPUT:
Index([',Name,Growth,Recommendation,CurrentRatio,TotalCash,Debt,Revenue,PercentageSharesOut,PercentageInstitutions,PercentageInsiders,PricetoBook,ShortRatio,RegularMarketPrice'], dtype='object')
The trouble I'm having is that when I try to access a column with something like
med_debt = math.floor(df.Debt), or even
print(df.Debt)
I get an attribute error:
AttributeError: 'DataFrame' object has no attribute 'Debt'
Any assistance here would be appreciated.
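The sep='\s{2,}' parameter tells pandas to split on runs of two or more whitespace characters. A comma-separated file contains no such runs, so each line is read as one field and the whole header becomes a single column name (one long string). Example: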
>>> df = pd.read_csv("weather", sep='\s{2,}')
>>> df.columns
Index(['Date/Time,Temp (C),Dew Point Temp (C),Rel Hum (%),Wind Spd (km/h),
Visibility (km),Stn Press (kPa),Weather'], dtype='object')
>>> df.index
RangeIndex(start=0, stop=8784, step=1)
Because the only column is that one long string, accessing a specific column with math.floor(df.Debt) raises
AttributeError: 'DataFrame' object has no attribute 'Debt'
and df["Debt"] raises
raise KeyError(key) from err
(KeyError: 'Debt')
To access individual columns this way, read the file with the default separator:
df = pd.read_csv("watchlist.csv")
The custom separator does not split this csv correctly; leave it out and let the csv reader use its default value of , instead.
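The difference can be reproduced with a small in-memory file. This is a sketch only: the two-column data below is a hypothetical stand-in for watchlist.csv, and taking the median before math.floor is an assumption about what med_debt was meant to hold.

```python
import io
import math
import pandas as pd

# Hypothetical two-column stand-in for watchlist.csv
csv_text = "Name,Debt\nAAA,1.5\nBBB,2.5\nCCC,4.75\n"

# Regex separator: the text has no run of 2+ whitespace characters, so nothing is split
bad = pd.read_csv(io.StringIO(csv_text), sep=r"\s{2,}", engine="python")
print(list(bad.columns))   # a single column whose name is the whole header line

# Default comma separator: each header field becomes its own column
good = pd.read_csv(io.StringIO(csv_text))
print(list(good.columns))

# math.floor needs a single number, so reduce the column first (median is an assumption)
med_debt = math.floor(good["Debt"].median())
print(med_debt)
```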

Pandas nan not treated as string

I have a csv file containing multiple cryptocurrencies like this. Columns 2 to 4 are id, symbol, name.
6201,nano-dogecoin,indc,Nano Dogecoin
6202,nano-shiba-inu,NanoShiba,Nano Shiba Inu
6203,nantrade,nan,NanTrade
6204,naos-finance,naos,NAOS Finance
6205,napoleon-x,npx,Napoleon X
I have a function where I get ids by symbols like this:
def symbols_to_ids(self, symbols):
    ids = []
    df = pd.read_csv(os.getcwd() + "/Backtester/Results/Misc/allcoins.csv")
    for index, row in df.iterrows():
        for symbol in symbols:
            if str(row["symbol"].lower()) == str(symbol.lower()):
                ids.append(row["id"])
    return ids
However, I get an error because one of the symbols is nan. I am pretty sure it gets treated as a float, since this error is thrown when the row symbol is nan:
if str(row["symbol"].lower()) == str(symbol.lower()):
AttributeError: 'float' object has no attribute 'lower'
I tried to convert it to a string, but it does not work. I think this could be solved in pandas, but I don't know how.
You're calling .lower() before converting to a string, so it runs on the float NaN. Call str() first and .lower() on the result:
if str(row["symbol"]).lower() == str(symbol).lower():
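An alternative on the pandas side, sketched here as one option rather than the definitive fix: read_csv accepts keep_default_na=False, which stops it from converting text such as "nan" into a float NaN in the first place. The sample below is a hypothetical cut-down version of allcoins.csv (the leading numeric column is omitted), and the function is rewritten as a plain function taking the frame as an argument.

```python
import io
import pandas as pd

# Hypothetical cut-down version of allcoins.csv
csv_text = """id,symbol,name
nano-dogecoin,indc,Nano Dogecoin
nantrade,nan,NanTrade
naos-finance,naos,NAOS Finance
"""

# keep_default_na=False keeps the literal text "nan" as a string
df = pd.read_csv(io.StringIO(csv_text), keep_default_na=False)

def symbols_to_ids(df, symbols):
    """Return the ids whose symbol matches any of the given symbols, ignoring case."""
    wanted = {s.lower() for s in symbols}
    mask = df["symbol"].str.lower().isin(wanted)
    return df.loc[mask, "id"].tolist()

ids = symbols_to_ids(df, ["NAN", "naos"])
print(ids)
```

The vectorized comparison also avoids the nested iterrows loop, which may matter on a file with thousands of coins.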

How to use re.search on a column in a dataframe

This code works for a single string (inputx) but I can't get it to work when I replace it with the name of the column in my dataframe. What I want to do is split the string in column DESC so that the capitalized words (at the beginning of the string) are placed into column break2 and the remainder of the description is placed in column break3. Any assistance is appreciated. Thanks.
Example:
What I want the output to look like (but with the different DESC from each row):
Code that works for hardcoded string:
inputx= "STOCK RECORD INQUIRY This is a system that keeps track of the positions, location and ownership of the securities that the broker holds"
pos = re.search("[a-z]", inputx[::1]).start()
Before_df['break1'] = pos
Before_df['break2'] = inputx[:(pos-1)]
Before_df['break3'] = inputx[(pos-1):]
But if I replace with dataframe column, I get error message: TypeError: expected string or bytes-like object
inputx = Before_df['DESC']
pos = re.search("[a-z]", inputx[::1]).start()
Before_df['break1'] = pos
Before_df['break2'] = inputx[:(pos-1)]
Before_df['break3'] = inputx[(pos-1):]
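You can use a regex with the Series.str.split method. Pass n=1 so the string is split only at the first lowercase letter; without it, the pattern splits at every lowercase letter and produces far more than three columns.
df[['result','result2','result3']] = df['yourcol'].str.split("([a-z])", n=1, expand=True)
Because the pattern contains a capturing group, result2 holds the matched letter itself, so the pieces still need adjusting to get the exact split you want.
If you absolutely must use re.search (which sounds a little like homework...)
for i in df.index:
    df.at[i, 'columnName'] = re.search("[a-z]", df.at[i, 'inputColumn']).start()
The original TypeError occurs because re.search expects a single string, not a whole column, which is why it is called once per row here. This loops instead of using df.apply() because changing a dataframe while an apply is running over it is unreliable.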
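Another option is Series.str.extract, shown here as a sketch rather than the answer's own method. The pattern is an assumption chosen to mirror the slicing in the question: the first group takes everything up to the character before the first lowercase letter, the second group takes the rest. The second sample row is hypothetical.

```python
import pandas as pd

# First row is the string from the question (shortened); the second row is hypothetical
Before_df = pd.DataFrame({"DESC": [
    "STOCK RECORD INQUIRY This is a system that keeps track of positions",
    "TRADE BLOTTER Lists the trades entered today",
]})

# Group 1: leading text with no lowercase letters (backtracks one character)
# Group 2: the character just before the first lowercase letter, plus the remainder
parts = Before_df["DESC"].str.extract(r"^([^a-z]*)([^a-z][a-z].*)$")

Before_df["break2"] = parts[0].str.strip()  # drop the trailing space
Before_df["break3"] = parts[1]
print(Before_df[["break2", "break3"]])
```

Rows that contain no lowercase letter at all do not match the pattern and come back as missing values in both columns.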

ValueError: DataFrame constructor not properly called

I am trying to create a dataframe with Python, which works fine with the following command:
df_test2 = DataFrame(index = idx, data=(["-54350","2016-06-25T10:29:57.340Z","2016-06-25T10:29:57.340Z"]))
but, when I try to get the data from a variable instead of hard-coding it into the data argument; eg. :
r6 = ["-54350", "2016-06-25T10:29:57.340Z", "2016-06-25T10:29:57.340Z"]
df_test2 = DataFrame(index = idx, data=(r6))
I expect this is the same and it should work? But I get:
ValueError: DataFrame constructor not properly called!
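Reason for the error:
The DataFrame constructor raises this error when data is a single string rather than a list, dict, or array. A real list like the r6 shown above is accepted, so the error suggests that r6 actually holds the string representation of a list (for example, text read from a file or an API response).
Fix/Solutions:
import ast
# convert the string representation to a Python object (here, a list)
parsed = ast.literal_eval(r6)
# and use it as the input
df_test2 = DataFrame(index = idx, data=parsed)
which will solve the error, provided r6 is a string containing a valid Python literal.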
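A minimal sketch of that situation, assuming r6 is a string holding the text of a list. The idx labels are hypothetical, since the question does not show them:

```python
import ast
import pandas as pd

# Assumption: r6 arrives as text (note the surrounding quotes), not as a real list
r6 = '["-54350", "2016-06-25T10:29:57.340Z", "2016-06-25T10:29:57.340Z"]'
idx = ["value", "start", "end"]  # hypothetical index labels

# Passing the raw string fails, because a str is treated as a scalar
error_text = None
try:
    pd.DataFrame(index=idx, data=r6)
except ValueError as err:
    error_text = str(err)
print(error_text)

# literal_eval safely turns the text into a real list of three strings
parsed = ast.literal_eval(r6)
df_test2 = pd.DataFrame(index=idx, data=parsed)
print(df_test2)
```

ast.literal_eval only evaluates Python literals, so it is a safer choice than eval for text that comes from outside the program.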
