Pandas nan not treated as string

I have a CSV file containing multiple cryptocurrencies like this. Columns 2 to 4 are id, symbol, name.
6201,nano-dogecoin,indc,Nano Dogecoin
6202,nano-shiba-inu,NanoShiba,Nano Shiba Inu
6203,nantrade,nan,NanTrade
6204,naos-finance,naos,NAOS Finance
6205,napoleon-x,npx,Napoleon X
I have a function where I get ids by symbols like this:
def symbols_to_ids(self, symbols):
    ids = []
    df = pd.read_csv(os.getcwd() + "/Backtester/Results/Misc/allcoins.csv")
    for index, row in df.iterrows():
        for symbol in symbols:
            if str(row["symbol"].lower()) == str(symbol.lower()):
                ids.append(row["id"])
    return ids
However, I get an error because one of the symbols is nan. I am pretty sure it gets treated as a float, since this error is thrown when the row symbol is nan:
    if str(row["symbol"].lower()) == str(symbol.lower()):
AttributeError: 'float' object has no attribute 'lower'
I tried to convert it to a string, but it does not work. I think this could be solved in pandas, but I don't know how.

You're calling .lower() before the conversion to str, so it runs on the float NaN that pandas produced when it read the text nan. Move .lower() outside the conversion:
if str(row["symbol"]).lower() == str(symbol).lower():
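As an alternative sketch, pandas can be told not to parse the text nan as a missing value in the first place by passing keep_default_na=False to read_csv. The inline CSV text, its header names, and the standalone function signature below are assumptions for illustration, not the asker's actual file or class.

```python
import io
import pandas as pd

# Inline stand-in for allcoins.csv; the header names are assumed.
CSV_TEXT = """row,id,symbol,name
6201,nano-dogecoin,indc,Nano Dogecoin
6202,nano-shiba-inu,NanoShiba,Nano Shiba Inu
6203,nantrade,nan,NanTrade
6204,naos-finance,naos,NAOS Finance
"""

def symbols_to_ids(symbols, csv_source):
    # keep_default_na=False keeps the text "nan" as a string instead of NaN.
    df = pd.read_csv(csv_source, keep_default_na=False, dtype={"symbol": str})
    wanted = {s.lower() for s in symbols}
    # Vectorized, case-insensitive match instead of nested loops.
    mask = df["symbol"].str.lower().isin(wanted)
    return df.loc[mask, "id"].tolist()

ids = symbols_to_ids(["NAN", "naos"], io.StringIO(CSV_TEXT))
print(ids)
```

Note that keep_default_na=False applies to every column, so genuinely empty cells are read as empty strings rather than NaN.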

Related

Find the mode value of a column via a CSV file

I would like to get the mode value of a given column from a CSV file.
The code I've tried:
def mode_LVL(self):
    data = pd.read_csv('highscore.csv', sep=',')
    mode_lvl = data["LVL"].mode()
    return mode_lvl
Results in:
The mode value of LVL: 0 6
dtype: int64
I would like only the mode value, without the 0 and the dtype.
I attempted to resolve this as follows, but it failed:
mode_lvl = data.mode(axis = 'LVL', numeric_only=True )
Sorry, I know that this issue may be simple to solve, but I've had trouble finding the right solution.

You need to select the first value of the result, because mode() returns a Series and can contain multiple values when several values are tied for the highest count:
mode_lvl = data["LVL"].mode().iat[0]
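A small sketch of the difference between the Series returned by mode() and the scalar picked out with .iat[0]. The sample values stand in for highscore.csv and are made up.

```python
import pandas as pd

# Made-up stand-in for the LVL column of highscore.csv.
data = pd.DataFrame({"LVL": [6, 3, 6, 2, 3, 6]})

modes = data["LVL"].mode()        # a Series, even when there is one mode
mode_lvl = int(modes.iat[0])      # first (smallest) mode as a plain int
print(f"The mode value of LVL: {mode_lvl}")

# With a tie, mode() returns every tied value in sorted order.
tie_modes = pd.Series([1, 1, 2, 2]).mode().tolist()
print(tie_modes)
```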

Pandas Python user input an attribute for dataframe object

I'm trying to allow the user to input the attribute for the dataframe object.
I've tried changing my input into a string. I've also tried using my input saved to a variable. Both these options do not work.
data = pd.read_csv('2019FallEnrollees.csv')
input1_col = input("Enter comparison group A: ")
input2_col = input("Enter comparison group B: ")
input1_str= str(input1_col)
input2_str = str(input2_col)
test = data[['CUM_GPA', input1_str, input2_str]]
# error here! 'test' does not have attribute 'input1_str' or 'input1_col'
df_1 = test[(test.input1_str == 0) & (test.input2_str == 0)]
df_2 = test[(test.input1_col == 1) & (test.input2_col == 0)]
print(stats.ttest_ind(df_1.CUM_GPA, df_2.CUM_GPA, equal_var = False))
The error message says
AttributeError: 'DataFrame' object has no attribute 'input1_str'
or
AttributeError: 'DataFrame' object has no attribute 'input1_col'

Welcome!
When the column name is stored in a variable, you cannot access the column with attribute syntax such as test.input1_col; pandas then looks for a column literally named input1_col.
Use data[column] instead, or in your case test[input1_col].
Before you do so, make sure the column exists and the user has not entered a nonexistent column name.
A column name can also be an integer, in which case converting the input to a string would no longer match it.
You can get all the DataFrame's column names with data.columns (or list(data.columns) if you want a regular list), and you can rename the columns with data.columns = ["Column Header 1", "Column Header 2", ...].
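A minimal sketch of bracket-based selection with column names held in variables. The column names ATHLETE and HONORS, the sample values, and the helper select_group are hypothetical; in the asker's script the two names would come from input().

```python
import pandas as pd

# Made-up stand-in for 2019FallEnrollees.csv.
data = pd.DataFrame({
    "CUM_GPA": [3.1, 2.8, 3.6, 3.9],
    "ATHLETE": [0, 1, 0, 1],
    "HONORS": [0, 0, 1, 0],
})

def select_group(df, col_a, col_b, value_a, value_b):
    # Reject names that are not columns before indexing with them.
    for col in (col_a, col_b):
        if col not in df.columns:
            raise KeyError(f"Unknown column: {col!r}")
    # df[col] looks up the column named by the variable's value.
    mask = (df[col_a] == value_a) & (df[col_b] == value_b)
    return df.loc[mask, "CUM_GPA"]

group_1 = select_group(data, "ATHLETE", "HONORS", 0, 0)
group_2 = select_group(data, "ATHLETE", "HONORS", 1, 0)
print(group_1.tolist(), group_2.tolist())
```

The two resulting Series can then be passed to stats.ttest_ind as in the question.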

How to fix this “TypeError: sequence item 0: expected str instance, float found”

I am trying to combine the cell values (strings) in a dataframe column using groupby method, separating the cell values in the grouped cell using commas. I ran into the following error:
TypeError: sequence item 0: expected str instance, float found
The error occurs on the following line of code (see the code block below for the complete code):
toronto_df['Neighbourhood'] = toronto_df.groupby(['Postcode','Borough'])['Neighbourhood'].agg(lambda x: ','.join(x))
It seems that in the groupby function, the index corresponding to each row in the un-grouped dataframe is automatically added to the string before it was joined. This causes the TypeError. However, I have no idea how to fix the issue. I browsed a lot of threads but didn't find a solution. I would appreciate any guidance or assistance!
# Import Necessary Libraries
import numpy as np
import pandas as pd
from bs4 import BeautifulSoup
import requests
# Use BeautifulSoup to scrap information in the table from the Wikipedia page, and set up the dataframe containing all the information in the table
wiki_html = requests.get('https://en.wikipedia.org/wiki/List_of_postal_codes_of_Canada:_M').text
soup = BeautifulSoup(wiki_html, 'lxml')
# print(soup.prettify())
table = soup.find('table', class_='wikitable sortable')
table_columns = []
for th_txt in table.tbody.findAll('th'):
    table_columns.append(th_txt.text.rstrip('\n'))
toronto_df = pd.DataFrame(columns=table_columns)
for row in table.tbody.findAll('tr')[1:]:
    row_data = []
    for td_txt in row.findAll('td'):
        row_data.append(td_txt.text.rstrip('\n'))
    toronto_df = toronto_df.append({table_columns[0]: row_data[0],
                                    table_columns[1]: row_data[1],
                                    table_columns[2]: row_data[2]}, ignore_index=True)
toronto_df.head()
# Remove cells with a borough that is Not assigned
toronto_df.replace('Not assigned',np.nan, inplace=True)
toronto_df = toronto_df[toronto_df['Borough'].notnull()]
toronto_df.reset_index(drop=True, inplace=True)
toronto_df.head()
# If a cell has a borough but a Not assigned neighborhood, then the neighborhood will be the same as the borough
toronto_df['Neighbourhood'] = toronto_df.groupby(['Postcode','Borough'])['Neighbourhood'].agg(lambda x: ','.join(x))
toronto_df.drop_duplicates(inplace=True)
toronto_df.head()
The expected result of the 'Neighbourhood' column should separate the cell values in the grouped cell using commas, showing something like this (I cannot post images yet, so I just provide the link):
https://d3c33hcgiwev3.cloudfront.net/imageAssetProxy.v1/7JXaz3NNEeiMwApe4i-fLg_40e690ae0e927abda2d4bde7d94ed133_Screen-Shot-2018-06-18-at-7.17.57-PM.png?expiry=1557273600000&hmac=936wN3okNJ1UTDA6rOpQqwELESvqgScu08_Spai0aQQ

As mentioned in the comments, NaN is a float, so string operations such as ','.join fail on it; that is the reason for the error message.
Replace the last part of your code with the following.
The NaN values are filled using np.where on the isnull() mask, following the logic you specified in your comment:
# If a cell has a borough but a Not assigned neighborhood, then the neighborhood will be the same as the borough
toronto_df.Neighbourhood = np.where(toronto_df.Neighbourhood.isnull(),toronto_df.Borough,toronto_df.Neighbourhood)
toronto_df['Neighbourhood'] = toronto_df.groupby(['Postcode','Borough'])['Neighbourhood'].agg(lambda x: ','.join(x))
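The grouped result is indexed by (Postcode, Borough) rather than by the original row labels, so it may be easier to keep it as its own one-row-per-group frame. A sketch under that assumption, using fillna in place of np.where and a few made-up rows instead of the scraped table:

```python
import numpy as np
import pandas as pd

# Made-up rows standing in for the scraped Wikipedia table.
toronto_df = pd.DataFrame({
    "Postcode": ["M3A", "M5A", "M5A", "M7A"],
    "Borough": ["North York", "Downtown Toronto", "Downtown Toronto", "Queen's Park"],
    "Neighbourhood": ["Parkwoods", "Harbourfront", "Regent Park", np.nan],
})

# A missing neighbourhood falls back to the borough name.
toronto_df["Neighbourhood"] = toronto_df["Neighbourhood"].fillna(toronto_df["Borough"])

# One row per (Postcode, Borough), neighbourhoods joined with commas.
grouped = (
    toronto_df.groupby(["Postcode", "Borough"])["Neighbourhood"]
    .agg(",".join)
    .reset_index()
)
print(grouped)
```

Because every value is a string once the NaN entries are filled, ','.join no longer hits a float.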

How to use re.search column in dataframe

This code works for a single string (inputx), but I can't get it to work when I replace it with the name of the column in my dataframe. What I want to do is split the string in column DESC so that the capitalized words (at the beginning of the string) are placed into column break2 and the remainder of the description is placed in column break3. Any assistance is appreciated. Thanks.
Example:
What I want the output to look like (but with the different DESC from each row)
Code that works for hardcoded string:
inputx= "STOCK RECORD INQUIRY This is a system that keeps track of the positions, location and ownership of the securities that the broker holds"
pos = re.search("[a-z]", inputx[::1]).start()
Before_df['break1'] = pos
Before_df['break2'] = inputx[:(pos-1)]
Before_df['break3'] = inputx[(pos-1):]
But if I replace it with the dataframe column, I get the error message: TypeError: expected string or bytes-like object
inputx = Before_df['DESC']
pos = re.search("[a-z]", inputx[::1]).start()
Before_df['break1'] = pos
Before_df['break2'] = inputx[:(pos-1)]
Before_df['break3'] = inputx[(pos-1):]

You can use a regex with the Series.str.split method:
df[['result','result2','result3']] = df['yourcol'].str.split("([a-z])", expand= True)
If you absolutely must use re.search (which sounds a little like homework...):
for i in df.index:
    df.at[i, 'columnName'] = re.search("[a-z]", df.at[i, 'inputColumn'][::1]).start()
The reason for looping instead of using df.apply() is that DataFrames do not like being modified during an apply.
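Another option, swapped in here as a sketch rather than taken from the answer, is Series.str.extract with named groups, which yields exactly one column per group. The pattern assumes the description starts with all-caps words and that the remainder begins with a capital letter followed by a lowercase one; the second sample row is invented.

```python
import pandas as pd

Before_df = pd.DataFrame({
    "DESC": [
        "STOCK RECORD INQUIRY This is a system that keeps track of positions",
        "TRADE BLOTTER Lists the trades entered today",  # invented row
    ]
})

# break2: leading run of capitals and spaces; break3: the rest, which is
# assumed to start with a capitalized word such as "This".
pattern = r"^(?P<break2>[A-Z ]+)\s+(?P<break3>[A-Z][a-z].*)$"

parts = Before_df["DESC"].str.extract(pattern)  # columns named break2, break3
result = Before_df.join(parts)
print(result[["break2", "break3"]])
```

Rows that do not fit the assumed shape come back as missing values in both new columns.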

How to pass row of dataframe into API if System.ArgumentException being raised?

I have a df that looks like this:
id
1
2
3
I need to iterate through the dataframe (there is only 1 column in the frame df) and pass each ID into an API, after the equals sign of &leadId=, i.e. &leadId=1
I have been trying this code:
lst = []
for index,row in df.iterrows():
    url = 'https://url.com/path/to?username=string&password=string&leadId=index'
    xml_data1 = requests.get(url).text
    lst.append(xml_data1)
    print(xml_data1)
but I get error:
System.ArgumentException: Cannot convert index to System.Int32.
What am I doing wrong in my code to not pass the index value into the parameter in the api? I have also tried passing row into the API and get the same error.
Thank you in advance.
edit:
converted dataframe to int by this line of code:
df = df.astype(int)
changed following to row instead of index in API parameters.
for index,row in df.iterrows():
    url = 'https://url.com/path/to?username=string&password=string&leadId=row'
    xml_data1 = requests.get(url).text
    lst.append(xml_data1)
    print(xml_data1)
getting same error.
edit2:
full traceback:
System.ArgumentException: Cannot convert index to System.Int32.
Parameter name: type ---> System.FormatException: Input string was not in a correct format.
at System.Number.StringToNumber(String str, NumberStyles options, NumberBuffer& number, NumberFormatInfo info, Boolean parseDecimal)
at System.Number.ParseInt32(String s, NumberStyles style, NumberFormatInfo info)
at System.String.System.IConvertible.ToInt32(IFormatProvider provider)
at System.Convert.ChangeType(Object value, Type conversionType, IFormatProvider provider)
at System.Web.Services.Protocols.ScalarFormatter.FromString(String value, Type type)
--- End of inner exception stack trace ---
at System.Web.Services.Protocols.ScalarFormatter.FromString(String value, Type type)
at System.Web.Services.Protocols.ValueCollectionParameterReader.Read(NameValueCollection collection)
at System.Web.Services.Protocols.HttpServerProtocol.ReadParameters()
at System.Web.Services.Protocols.WebServiceHandler.CoreProcessRequest()

You should convert your int to the type which the API expects:
df.id=df.id.astype('int32')
With your current URL string, the literal text 'index' (or 'row') is sent as leadId, and the API cannot convert that string to an integer.
Update your URL string so the actual id value from the row is inserted:
url = 'https://url.com/path/to?username=string&password=string&leadId={}'.format(row['id'])
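A sketch of building one URL per id without any network call. BASE_URL and the credential values are the placeholders from the question, and build_urls is a hypothetical helper name.

```python
from urllib.parse import urlencode

import pandas as pd

BASE_URL = "https://url.com/path/to"  # placeholder from the question

df = pd.DataFrame({"id": [1, 2, 3]})

def build_urls(frame):
    urls = []
    # Iterate over the id values directly; iterrows() is not needed here.
    for lead_id in frame["id"]:
        query = urlencode({
            "username": "string",
            "password": "string",
            "leadId": int(lead_id),  # plain int, so the API sees "1", not "row"
        })
        urls.append(f"{BASE_URL}?{query}")
    return urls

urls = build_urls(df)
print(urls[0])
```

Each URL can then be passed to requests.get(url).text as in the original loop.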
