I want to create a new feature that converts any currency to EUR. The process is simple: I have two columns, one with the type of currency (e.g. USD) and the other with the amount. I am creating a third column called 'price_in_eur': it looks at the type of currency and, if it is not 'EUR', multiplies the amount by 1.1; otherwise the amount is left alone. But when I run the code I get the following error:

ValueError: either both or neither of x and y should be given

This is my code:
x = data[['type_of_currency', 'amount']]
x['type_of_currency'] = x['amount'].str.extract(r'(\d+)', expand=False)
x['type_of_currency'] = x['amount'].astype(float)
x['price_in_euro'] = np.where(x['type_of_currency'] == 'USD', x['amount'] * 1.1)
Can someone please help? I think it has something to do with the fact that the np.where statement is looking at the type_of_currency column, which is a string, but I'm not sure.
You need to provide both arguments to np.where after the condition: numpy.where(condition[, x, y])
Ex:
x['price_in_euro'] = np.where(x['type_of_currency']=='USD',x['amount']*1.1, x['amount'])
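For instance, on a tiny frame with hypothetical data, supplying both branches tells np.where what to return when the condition is False:

import numpy as np
import pandas as pd

# hypothetical sample data
x = pd.DataFrame({'type_of_currency': ['USD', 'EUR'], 'amount': [100.0, 50.0]})
x['price_in_euro'] = np.where(x['type_of_currency'] == 'USD',
                              x['amount'] * 1.1,  # used where the condition is True
                              x['amount'])        # used where the condition is False
print(x)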
I am going to train an SVM on an X dataframe and a y series that I have.
The X dataframe is shown below:
x:
Timestamp    Location of sensors    Pressure or Flow values
0.00000      138.22, 1549.64        28.92
0.08333      138.22, 1549.64        28.94
0.16667      138.22, 1549.64        28.96
In the X dataframe, the location of each sensor is represented as a node coordinate.
Y series is shown below:
y:
0
0
0
But when I fit the SVM to the training set, it returned ValueError: could not convert string to float: '361.51,1100.77', where (361.51, 1100.77) are the coordinates of a node.
Could you please give me some ideas to fix this problem?
Any advice is much appreciated.
'361.51,1100.77' is actually two numbers, right? Two coordinate components (361.51 and 1100.77). You would first need to split it into two strings. Here is one way to do it:
data = pd.DataFrame(data=[[0, "138.22,1549.64", 28.92]], columns=["Timestamp", "coordinate", "flow"])
data["latitude"] = data["coordinate"].apply(lambda x: float(x.split(",")[0]))
data["longitude"] = data["coordinate"].apply(lambda x: float(x.split(",")[1]))
This will give you two new columns in your dataframe each with a float of the values in the string.
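If you'd rather avoid apply, a vectorized sketch of the same idea uses pandas' str.split with expand=True on the same frame:

# split once into two columns, then convert both to float in one go
coords = data["coordinate"].str.split(",", expand=True).astype(float)
data["latitude"] = coords[0]
data["longitude"] = coords[1]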
I'm assuming that you are trying to convert the entire string "361.51,1100.77" into a float, and you can see why that's a problem: Python sees two decimal points with a comma in between, so it has no idea what to do.
Assuming you want the numbers to be separate, you could do something like this:
myStr = "361.51,1100.77"
x = float(myStr[0:myStr.index(",")])
y = float(myStr[myStr.index(",")+1:])
print(x)
print(y)
Which would get you an output of
361.51
1100.77
Assigning x to be myStr[0:myStr.index(",")] takes a substring of the original string from the start up to the first occurrence of a comma, getting you the first number.
Assigning y to be myStr[myStr.index(",")+1:] takes a substring of the original string starting just after the first comma and running to the end of the string, getting you the second number.
Both can easily be converted to floats from here using the built-in float() function, getting you two separate floats.
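Equivalently, a one-line sketch with str.split does both at once:

# split on the comma and convert each piece to float
x, y = map(float, myStr.split(","))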
Here is a helpful link to understand string slicing: https://www.geeksforgeeks.org/string-slicing-in-python/
I am trying to replace missing data from my dataframe.
Some of the data is replaced correctly according to what I want but the rest doesn't work.
For instance, I want to fill the missing data for my ['Gender'] column.
I tried 2 different methods:
Replacing using the mode:
for column in ['Gender']:
    df[column].fillna(df[column].mode().index[0], inplace=True)
It works for more than 95% of the missing data, but some missing values get replaced with '0' rather than the mode (Male or Female).
So I tried a second method, replacing with a random value:
df['Gender'].fillna(lambda x: random.choice(df[df['Gender'] != np.nan]['Gender']), inplace=True)
Same problem: about 95% is replaced correctly, and the rest gives me the following as the replacement value:
<function at 0x000001F4BB66DF70>
instead of Male or Female.
Does anybody know why and how to fix this issue?
You get zero because you are passing the index of the mode, not the mode itself. You should instead write:
for column in ['Gender']:
    df[column].fillna(df[column].mode()[0], inplace=True)
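As for the second method: fillna never calls a function, so the lambda object itself was stored in the cells, which is where the <function ...> values came from. If you still want random fills, a hedged sketch is to sample from the existing non-null values and let fillna align on the index:

# sample one existing value per missing cell (assumes df['Gender'] has some non-null values)
missing = df['Gender'].isna()
fill_values = df['Gender'].dropna().sample(n=missing.sum(), replace=True)
fill_values.index = df.index[missing]
df['Gender'] = df['Gender'].fillna(fill_values)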
I have the below as part of my dataframe. The Ccy column was built up from other columns, and I produced the First and Second columns from it with spot_table['First'] = spot_table['Ccy'].map(lambda x: x[0:7]) and spot_table['Second'] = spot_table['Ccy'].map(lambda x: x[8:15]).
Ccy              First    Second
AUD_USD/USD_INR  AUD_USD  USD_INR
AUD_USD/USD_SGD  AUD_USD  USD_SGD
USD_AUD          USD_AUD
AUD_USD/USD_CNH  AUD_USD  USD_CNH
USD_AUD          USD_AUD
They all have dtype object, while I need them to be strings. I need the null cells to be recognised so they can be filled with -1, something like: spot_table['Second'].fillna(-1, inplace=True).
The code runs, but unfortunately the null cells are not recognised, so nothing gets replaced.
And whenever I use spot_table['Second'].isnull().any(), the result comes out as False.
Can anyone help me, please?
Cheers,
Z
Here is a picture of the column with the whitespace, related to the comment about how isnull() behaves for object dtype and why it doesn't flag these cells.
Try this. Instead of taking a substring, I'm splitting the value into a list and only assigning the Second column a value if the list has 2 elements; otherwise I set it to NaN.
spot_table['First'] = spot_table['Ccy'].map(lambda x: x.split('/')[0])
spot_table['Second'] = spot_table['Ccy'].map(lambda x: x.split('/')[1] if len(x.split('/')) == 2 else np.nan)
spot_table['Second'].fillna(-1, inplace=True)
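A vectorized sketch of the same idea, if you prefer to avoid splitting twice (same column names assumed):

# expand=True splits into two columns; rows without '/' get NaN in the second
parts = spot_table['Ccy'].str.split('/', expand=True)
spot_table['First'] = parts[0]
spot_table['Second'] = parts[1].fillna(-1)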
I'm working with some csv files and using pandas to turn them into a dataframe. After that, I use an input to find values to delete.
I'm hung up on one small issue: for some columns it's adding '.0' to the values in the column. It only does this in columns with numbers, so I'm guessing it's reading those columns as floats. How do I prevent this from happening?
The part that really confuses me is that it only happens in a few columns, so I can't quite figure out a pattern. I need to chop off the '.0' so I can re-import the file, and I feel like it would be easiest to prevent it from happening in the first place.
Thanks!
Here's a sample of my code:
clientid = int(input('What client ID needs to be deleted?'))
df1 = pd.read_csv('Client.csv')
clientclean = df1.loc[df1['PersonalID'] != clientid]
clientclean.to_csv('Client.csv', index=None)
Ideally, I'd like all of the values to be the same as the original csv file, but without the rows with the clientid from the user input.
If PersonalID is the header of the problematic column, try this:
df1 = pd.read_csv('Client.csv', dtype={'PersonalID':np.int32})
Edit:
Since integer dtypes cannot hold NaN values, this will fail on columns that contain missing data. In that case, you can try this on each problematic column:
df1[col] = df1[col].fillna(-9999) # or 0 or any value you want here
df1[col] = df1[col].astype(int)
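Alternatively, newer pandas versions (0.24+) have a nullable integer dtype that can hold missing values directly, so no sentinel fill is needed; a sketch:

# 'Int64' (capital I) is pandas' nullable integer dtype; NaN becomes <NA>
df1[col] = df1[col].astype('Int64')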
You could go through each value and, if it is a number x, subtract int(x) from it; if the difference is 0.0, convert x to int(x). Or, if you're not dealing with any non-integers, you could just convert all values that are numbers to ints.
For an example of the latter (when your original data does not contain any non-integer numbers):
for index, row in df1.iterrows():
    for c, x in enumerate(row):
        if isinstance(x, float):
            df1.iloc[index, c] = int(x)
For an example of the former (if you want to keep non-integer numbers as non-integer numbers, but want to guarantee that integer numbers stay as integers):
import math
import sys

for col in df1.columns:
    # only float columns can carry a spurious .0
    if df1[col].dtype != float:
        continue
    foundNonInt = False
    for x in df1[col]:
        # treat NaN as non-integer, since it cannot be cast to int
        if math.isnan(x) or abs(x - int(x)) > sys.float_info.epsilon:
            foundNonInt = True
            break
    if not foundNonInt:
        df1[col] = df1[col].astype(int)
Note, the above method is not fool-proof: if, by chance, a non-integer column from the original data set happens to contain only values that are x.0000000 all the way to the last decimal place, it will be converted anyway.
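As a shortcut, if you're on pandas 1.0 or later, convert_dtypes() performs this kind of inference for you (a sketch, not guaranteed to fit every file):

# float columns whose values are all integral are inferred as nullable Int64
df1 = pd.read_csv('Client.csv').convert_dtypes()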
It was a datatype issue.
ALollz's comment led me in the right direction: pandas was assuming a float data type, which added the decimal points.
I specified the datatype as object (from Akarius's comment) when using read_csv, which resolved the issue.
I have gone through all the posts on the website and am not able to find a solution to my problem.
I have a dataframe with 15 columns. Some of them come with None or NaN values. I need help writing the if-else condition.
If the value in the column is not None or NaN, I need to format the datetime column. The current code is below:
for index, row in df_with_job_name.iterrows():
    start_time = df_with_job_name.loc[index, 'startTime']
    if not df_with_job_name.isna(df_with_job_name.loc[index, 'startTime']):
        start_time_formatted = datetime(*map(int, re.split(r'[^\d]', start_time)[:-1]))
The error that I am getting is
if not df_with_job_name.isna(df_with_job_name.loc[index,'startTime']) :
TypeError: isna() takes exactly 1 argument (2 given)
A direct way to take care of missing/invalid values is probably:
import math

def is_valid(val):
    if val is None:
        return False
    try:
        return not math.isnan(val)
    except TypeError:
        # non-numeric values (e.g. strings) cannot be NaN
        return True
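You could then apply it across the column; a sketch using the question's column name:

# Boolean mask of rows whose startTime is usable
mask = df_with_job_name['startTime'].apply(is_valid)
valid_times = df_with_job_name.loc[mask, 'startTime']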
Also, note that isna is not invoked with any argument: called as df.isna(), it returns a dataframe of boolean values (see the pandas docs). You can then iterate through both dataframes to determine whether each value is valid.
isna takes your entire data frame as the instance argument (that's self, if you're already familiar with classes) and returns a data frame of Boolean values, True wherever a value is missing. You tried to pass the individual value you're checking as a second input argument; isna doesn't work that way, it takes empty parentheses in the call.
You have a couple of options. One is to follow the individual-value checking tactic above. The other is to build a null map of the entire data frame and use that:
null_map_df = df_with_job_name.isna()

for index, row in df_with_job_name.iterrows():
    if not null_map_df.loc[index, 'startTime']:
        start_time = df_with_job_name.loc[index, 'startTime']
        start_time_formatted = datetime(*map(int, re.split(r'[^\d]', start_time)[:-1]))
Please double-check my use of row and column indices; the index, row handling in the original didn't look right. Also, you should be able to apply an any() operation to an entire row at once.
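As an aside, if the goal is simply to parse the non-null timestamps, a vectorized sketch with pd.to_datetime may avoid the loop entirely (assuming the strings are in a parseable format):

# errors='coerce' turns missing or unparseable values into NaT instead of raising
df_with_job_name['startTime_parsed'] = pd.to_datetime(df_with_job_name['startTime'], errors='coerce')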