Join 2 conditions using & operator [duplicate] - python

This question already has answers here:
Pandas: Filtering multiple conditions
(4 answers)
Closed 10 months ago.
I have 2 queries in pandas and need to join them together.
b.loc[b['Speed']=='100.0']
b.loc[b['Month']=='2022-01']
I need to join them using &, but I get an "unsupported operand type" error.

You are comparing your columns against str values, while their actual dtypes are float64 and period[M] respectively, as you mentioned in your comment. Match each comparison value to the column's data type. Try this:
b.loc[(b['Speed'] == 100.0) & (b['Month'] == pd.Period('2022-01'))]
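A minimal sketch of why this works, using hypothetical sample data with the question's column names. Note that the parentheses around each condition are required: & binds more tightly than ==, so omitting them is one common source of "unsupported operand type" errors.

```python
import pandas as pd

# Hypothetical sample data mirroring the question's columns and dtypes.
b = pd.DataFrame({
    'Speed': [100.0, 90.0, 100.0],
    'Month': pd.PeriodIndex(['2022-01', '2022-01', '2022-02'], freq='M'),
})

# Each condition is parenthesised, and each comparison value matches
# the column's dtype (float for Speed, Period for Month).
result = b.loc[(b['Speed'] == 100.0) & (b['Month'] == pd.Period('2022-01'))]
```

Here only the first row satisfies both conditions.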

Related

subset fail on np.meshgrid generated dataframe [duplicate]

This question already has answers here:
Working with floating point NumPy arrays for comparison and related operations
(1 answer)
What is the best way to compare floats for almost-equality in Python?
(18 answers)
Pandas Dataframe Comparison and Floating Point Precision
(1 answer)
Closed 19 days ago.
I generate a dataframe for lonlat like this
a=np.arange(89.7664, 89.7789, 1e-4)
b=np.arange(20.6897, 20.7050, 1e-4)
temp_arr=np.array(np.meshgrid(a, b)).T.reshape(-1, 2)
np_df=pd.DataFrame(temp_arr, columns = ['lon','lat'])
and it creates the dataframe I want.
When I tried to subset the first lon
len(np_df[np_df['lon']==89.7664])
it returns 153. But when I try to subset one of the last lon values
len(np_df[np_df['lon']==89.7788])
it returns 0.
I wonder what is wrong here. Thank you
Use numpy.isclose to compare floats within a tolerance:
len(np_df[np.isclose(np_df['lon'], 89.7788)])
If that still doesn't work, multiplying by 10000, rounding, and casting to integers should help:
len(np_df[np_df['lon'].mul(10000).round().astype(int).eq(897788)])
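A small sketch of the underlying problem: values built by np.arange's repeated float addition rarely compare equal to the decimal literal you typed. One caveat worth knowing: np.isclose's default rtol of 1e-5 amounts to roughly 9e-4 at these longitudes, which is wider than the 1e-4 grid spacing and could match several neighbouring points, so an explicit atol below half the spacing is used here.

```python
import numpy as np
import pandas as pd

# Rebuild the lon grid from the question, repeated as meshgrid would.
lon = np.arange(89.7664, 89.7789, 1e-4)
np_df = pd.DataFrame({'lon': np.tile(lon, 3)})

# Exact equality may miss values accumulated by repeated float addition...
exact = (np_df['lon'] == 89.7788).sum()

# ...while a tolerance below half the 1e-4 grid spacing matches exactly
# one grid value per repetition.
close = np.isclose(np_df['lon'], 89.7788, rtol=0, atol=5e-5).sum()
```

With the grid tiled three times, the tolerant comparison finds three rows.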

How to select rows from pandas dataframe by looking a feature' data types when a feature contains more than one type of value [duplicate]

This question already has answers here:
Select row from a DataFrame based on the type of the object(i.e. str)
(3 answers)
Closed 3 months ago.
I have a dataframe with 3 features: id, name and point. I need to select the rows where the 'point' value is a string.
id  name  point
0   x     5
1   y     6
2   z     ten
3   t     nine
4   q     two
How can I split the dataframe by looking only at the type of one feature's values?
I tried to adapt the select_dtypes method but got lost. I also tried to divide the dataset using
df[df['point'].dtype == str] or df[df['point'].dtype is str]
but neither worked.
Technically, the answer would be:
out = df[df['point'].apply(lambda x: isinstance(x, str))]
But this would also select rows containing a string representation of a number ('5').
If you want to select "strings" as opposed to "numbers" whether those are real numbers or string representations, you could use:
m = pd.to_numeric(df['point'], errors='coerce')
out = df[m.isna() & df['point'].notna()]
The question is now, what if you have '1A' or 'AB123' as value?
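A runnable sketch of both approaches on a hypothetical frame matching the question's table, to show where they differ:

```python
import pandas as pd

# Hypothetical data matching the question's table: numeric and string points.
df = pd.DataFrame({'id': [0, 1, 2, 3, 4],
                   'name': ['x', 'y', 'z', 't', 'q'],
                   'point': [5, 6, 'ten', 'nine', 'two']})

# Approach 1: rows whose 'point' value is a Python str object
# (would also catch a string representation of a number, like '5').
strings = df[df['point'].apply(lambda x: isinstance(x, str))]

# Approach 2: rows whose 'point' value cannot be parsed as a number,
# excluding genuine missing values.
m = pd.to_numeric(df['point'], errors='coerce')
non_numeric = df[m.isna() & df['point'].notna()]
```

On this data both select the rows with 'ten', 'nine' and 'two'; they would diverge if the column contained '5' as a string.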

How can I drop rows based on given conditions in pandas and how to force converting data types? [duplicate]

This question already has answers here:
Change column type in pandas
(16 answers)
Closed 4 months ago.
I want to drop rows that don't satisfy a condition. I tried the code below, but it's not working:
sample.drop(sample.loc[(sample.service == 'ftp') & (sample.is_ftp_login.isna())].index, inplace=False)
I also tried a loop, with the isna() condition and with ' ', but it didn't work:
for index, row in sample.iterrows():
    if row['service'] == 'ftp' and row['is_ftp_login'].isna():
        sample.drop([index])
I also want to change types from object to int and from float to int. I tried both of the lines below, but each returns "cannot convert to class int":
sample['ct_ftp_cmd']=int(sample['ct_ftp_cmd'])
sample['ct_ftp_cmd']=str(int(sample['ct_ftp_cmd']))
Do you have any idea how to solve this? Thanks.
Thank you, I solved the problem.
I was able to convert the string type to numeric using this:
sample['ct_ftp_cmd']= pd.to_numeric(sample['ct_ftp_cmd'])
and I dropped rows based on a given condition using this:
sample = sample.drop(sample[(sample.service == 'ftp') & (sample.is_ftp_login.isna())].index)
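The two fixes above can be sketched end to end on hypothetical sample data (the column names follow the question; the values are made up). The original attempts failed because drop() returns a new frame unless reassigned or given inplace=True, and because int() cannot be applied to a whole Series:

```python
import numpy as np
import pandas as pd

# Hypothetical rows mirroring the question's columns.
sample = pd.DataFrame({
    'service': ['ftp', 'ftp', 'http'],
    'is_ftp_login': [1.0, np.nan, np.nan],
    'ct_ftp_cmd': ['1', '2', '0'],
})

# drop() returns a new frame; reassign it (or pass inplace=True).
sample = sample.drop(sample[(sample.service == 'ftp')
                            & (sample.is_ftp_login.isna())].index)

# int(series) fails; convert element-wise with to_numeric, then cast.
sample['ct_ftp_cmd'] = pd.to_numeric(sample['ct_ftp_cmd']).astype(int)
```

The ftp row with a missing login is dropped, and the object column becomes int.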

Multiply by a float in pandas -> numbers with comma disappearing [duplicate]

This question already has answers here:
TypeError: can't multiply sequence by non-int of type 'float' (python 2.7)
(1 answer)
Finding non-numeric rows in dataframe in pandas?
(7 answers)
Change column type in pandas
(16 answers)
Closed 4 years ago.
I'm having an issue applying a currency rate in pandas.
Some numbers are converted to NaN whenever they contain a comma, e.g. 1,789 is treated as NaN.
I started with that code :
import pandas as pd
usd_rate = 0.77
salary = pd.read_csv("salary.csv")
#create revenue clean (convert usd to gbp)
salary['revenue_gbp'] = salary.usd_revenue * usd_rate
So I was getting that error :
TypeError: can't multiply sequence by non-int of type 'float'
I've read that you can't multiply a string column by a float, so I converted my column to numeric:
salary.net_revenue = pd.to_numeric(salary.usd_revenue, errors='coerce')
salary['revenue_gbp'] = salary.usd_revenue * usd_rate
Now I don't get any errors, yet when I look at my file, all of the numbers above 999.99 - the ones containing a comma - show up as NaN.
I thought it could be a conversion issue, but I'm getting confused here.
Any ideas?
Thanks a lot
usd_revenue is probably being read in as strings, and neither float() nor pd.to_numeric can parse the thousands separator - that is exactly why the comma values become NaN under errors='coerce'. Strip the commas before converting:
salary['usd_revenue'] = salary['usd_revenue'].astype(str).str.replace(',', '', regex=False).astype(float)
before your actual line:
salary['revenue_gbp'] = salary.usd_revenue * usd_rate
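A self-contained sketch of the whole flow, using a made-up in-memory CSV in place of the question's salary.csv (the quoted "1,789" stands in for the comma-formatted values):

```python
from io import StringIO

import pandas as pd

# Hypothetical CSV: the thousands separators force the column to object dtype.
csv = StringIO('usd_revenue\n950.50\n"1,789"\n"2,100.25"\n')
salary = pd.read_csv(csv)

usd_rate = 0.77

# Strip the commas before converting, so '1,789' survives instead of
# becoming NaN under errors='coerce'.
salary['usd_revenue'] = (salary['usd_revenue'].astype(str)
                         .str.replace(',', '', regex=False)
                         .astype(float))
salary['revenue_gbp'] = salary['usd_revenue'] * usd_rate
```

Alternatively, read_csv accepts a thousands=',' argument that handles this at load time, avoiding the string cleanup entirely.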

Pandas - ValueError: Error parsing datetime string "17-Jan-23" at position 3 [duplicate]

This question already has an answer here:
Error parsing datetime string "09-11-2017 00:02:00" at position 8
(1 answer)
Closed 4 years ago.
I have the following code where I am reading date column:
data = pd.DataFrame(array, columns=names)
data[['D_DATE']] = data[['D_DATE']].astype('datetime64')
But this is giving me error:
ValueError: Error parsing datetime string "17-Jan-23" at position 3
Can someone help me resolve this?
Try this:
data['D_DATE'] = pd.to_datetime(data['D_DATE'])
Indexing a single column with double brackets (df[['D_DATE']]) returns a DataFrame with one column named 'D_DATE'. Indexing with a single set of brackets (df['D_DATE']) returns a Series named 'D_DATE'. To create a new column in a DataFrame using the form df[new_col], use single brackets.
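A short sketch with made-up dates in the question's day-MonthName-year style. Passing an explicit format avoids per-element guessing, which is where the parsing error came from:

```python
import pandas as pd

# Hypothetical column matching the "17-Jan-23" style from the error message.
data = pd.DataFrame({'D_DATE': ['15-Jan-23', '16-Jan-23', '17-Jan-23']})

# Single brackets assign a Series to the column; the explicit format
# (%d day, %b abbreviated month name, %y two-digit year) parses reliably.
data['D_DATE'] = pd.to_datetime(data['D_DATE'], format='%d-%b-%y')
```

The column now has datetime64 dtype and supports the .dt accessor.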
