How to remove a string selectively from pandas series? - python

I am having a hard time getting rid of a string in my pandas Series. I'd like to remove the first two '-' strings but keep the last two numeric values. An example is below.
import pandas as pd
temp = pd.Series(['-', '-', '-0.3', '-0.9'])
print(temp)
Out[135]:
0 -
1 -
2 -0.3
3 -0.9
dtype: object
I can't use temp.str.replace("-", "") since it also removes the minus sign from the last two numeric values. Can anyone help me with this? Thanks in advance!

Use a regular expression:
temp = pd.Series(['-', '-', '-0.3', '-0.9'])
print(temp.str.replace('^-$', '', regex=True))
Output
0
1
2 -0.3
3 -0.9
dtype: object
Or simply use replace:
print(temp.replace('-', '')) # notice that there is no .str
Output
0
1
2 -0.3
3 -0.9
dtype: object
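The difference between the two calls is that Series.str.replace operates on substrings inside each value, while Series.replace (without .str) matches whole values by default. A minimal sketch of that contrast, reusing the question's data (the variable names are only illustrative):

```python
import pandas as pd

temp = pd.Series(['-', '-', '-0.3', '-0.9'])

# Series.str.replace works on substrings, so every '-' is removed,
# including the minus signs of the numbers
substring_result = temp.str.replace('-', '', regex=False)

# Series.replace matches whole values, so only the bare '-' entries change
whole_value_result = temp.replace('-', '')

print(substring_result.tolist())    # ['', '', '0.3', '0.9']
print(whole_value_result.tolist())  # ['', '', '-0.3', '-0.9']
```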

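You can convert the strings to numbers; errors='coerce' turns anything unparseable (here, the bare '-') into NaN, which fillna('') then replaces with an empty string: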
pd.to_numeric(temp, errors='coerce').fillna('')
Output:
0
1
2 -0.3
3 -0.9
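If the goal is numeric data rather than display, one possible variant is to skip the fillna('') step, since filling with '' puts strings back into the column. The sketch below assumes that dropping the placeholder rows is acceptable; the names numeric and cleaned are illustrative.

```python
import pandas as pd

temp = pd.Series(['-', '-', '-0.3', '-0.9'])

# Unparseable entries such as '-' become NaN; the result is a float64 Series
numeric = pd.to_numeric(temp, errors='coerce')

# Drop the NaN rows instead of filling them with empty strings
cleaned = numeric.dropna()

print(numeric.dtype)     # float64
print(cleaned.tolist())  # [-0.3, -0.9]
```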

You can remove the unwanted strings like this:
import pandas as pd
temp = pd.Series(['-', '-', '-0.3', '-0.9'])
# keep only the entries that are not exactly '-'
new_temp = temp[temp != '-']
print(new_temp)
Output:
2 -0.3
3 -0.9
dtype: object
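The same boolean-mask idea extends to several placeholder values with Series.isin. This is a sketch under the assumption that more than one placeholder may occur; 'n/a' is a made-up example and does not appear in the question.

```python
import pandas as pd

# 'n/a' is a hypothetical second placeholder, added only for illustration
temp = pd.Series(['-', 'n/a', '-0.3', '-0.9'])

placeholders = ['-', 'n/a']

# Keep rows whose value is not one of the placeholders, then renumber the index
kept = temp[~temp.isin(placeholders)].reset_index(drop=True)

print(kept.tolist())  # ['-0.3', '-0.9']
```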

Related

Formatting a string containing currency and commas

Does anyone know how I'd format this string (which is a column in a dataframe) to be a float so I can sort by the column please?
£880,000
£88,500
£850,000
£845,000
i.e. I want this to become
88,500
845,000
850,000
880,000
Thanks in advance!
Assuming 'col' is the column name.
If you just want to sort, and keep the values as strings, you can use natsort:
from natsort import natsort_key
df.sort_values(by='col', key=natsort_key)
# OR
from natsort import natsort_keygen
df.sort_values(by='col', key=natsort_keygen())
output:
col
1 £88,500
3 £845,000
2 £850,000
0 £880,000
If you want to convert to floats:
df['col'] = pd.to_numeric(df['col'].str.replace(r'[^\d.]', '', regex=True))
df.sort_values(by='col')
output:
col
1 88500
3 845000
2 850000
0 880000
If you want strings, you can use str.lstrip:
df['col'] = df['col'].str.lstrip('£')
output:
col
0 880,000
1 88,500
2 850,000
3 845,000
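If the formatted strings should stay in the column and only the ordering should be numeric, the key argument of sort_values can do the conversion on the fly. This is a sketch that does not need natsort; the helper name currency_to_number is illustrative, and it assumes every value contains only a currency symbol, digits, commas and an optional decimal point.

```python
import pandas as pd

df = pd.DataFrame({'col': ['£880,000', '£88,500', '£850,000', '£845,000']})

def currency_to_number(column):
    # Strip everything except digits and the decimal point, then parse
    return pd.to_numeric(column.str.replace(r'[^\d.]', '', regex=True))

# The key is used only for ordering; the stored strings are left untouched
sorted_df = df.sort_values(by='col', key=currency_to_number)

print(sorted_df['col'].tolist())
# ['£88,500', '£845,000', '£850,000', '£880,000']
```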

Pandas: How to display a series value with brackets

I need to display the values of a series in brackets ().
import pandas as pd
pd = pd.Series (['12.1','23.2','30.3', '40.0'])
print (pd)
0 12.1
1 23.2
2 30.3
3 40.0
dtype: object
OUTPUT should look like here:
0 (12.1)
1 (23.2)
2 (30.3)
3 (40.0)
Any suggestions?
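I don't know why you want to do this, but you can add the '(' and ')' manually. Here the Series is named s, because naming it pd shadows the pandas module and makes pd.Series unavailable:
s = pd.Series([f"({i})" for i in s])
I'm not sure if this is exactly what you want, but I hope it helps.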
This works:
import pandas as pd
s = pd.Series(['12.1', '23.2', '30.3', '40.0'])  # named s so the pandas module is not shadowed
print(s)
0 12.1
1 23.2
2 30.3
3 40.0
dtype: object
s = s.apply(lambda i: f"({i})")
print(s)
0 (12.1)
1 (23.2)
2 (30.3)
3 (40.0)
dtype: object
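For a Series of strings, plain string concatenation is also vectorised, so the brackets can be added without apply or a list comprehension. A small sketch (the names s, wrapped and numbers are illustrative); a numeric Series would need astype(str) first.

```python
import pandas as pd

s = pd.Series(['12.1', '23.2', '30.3', '40.0'])

# Element-wise concatenation on a string Series
wrapped = '(' + s + ')'
print(wrapped.tolist())  # ['(12.1)', '(23.2)', '(30.3)', '(40.0)']

# A numeric Series has to be converted to strings before concatenating
numbers = pd.Series([12.1, 23.2])
wrapped_numbers = '(' + numbers.astype(str) + ')'
print(wrapped_numbers.tolist())  # ['(12.1)', '(23.2)']
```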

Convert type str (with number and words) column into int pandas

I have a column that contains type str of both numbers and words:
ex.
['2','3','Amy','199','Happy']
And I want to convert all "str number" into int and remove (the rows with) the "str words".
So my expected output would be a list like below:
[2, 3, 199]
Since I have a pandas dataframe, and this is supposed to be one of the columns, it would be even better if it could be a Series as follows:
0 2.0
1 3.0
3 199.0
dtype: float64
As you mentioned you have a column (a series), so let's say it's called s:
s = pd.Series(['2', '3', 'Amy', '199', 'Happy'])
After that, call pd.to_numeric with errors='coerce' so that non-numeric strings become NaN, then remove the NaNs with dropna:
print(pd.to_numeric(s, errors='coerce').dropna())
Then the above code will output:
0 2.0
1 3.0
3 199.0
dtype: float64
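The result above is float64 because NaN forces a float column. If integers are wanted, as in the question's [2, 3, 199], one option is to cast after the NaNs are gone. A sketch, assuming all remaining values are whole numbers:

```python
import pandas as pd

s = pd.Series(['2', '3', 'Amy', '199', 'Happy'])

# Coerce, drop the NaN rows, then cast; the cast is safe only once no NaN remains
numbers = pd.to_numeric(s, errors='coerce').dropna().astype(int)

print(numbers.tolist())  # [2, 3, 199]
```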
Without using pandas, since you are supplying a list:
import re
data = ['2', '3', 'Amy', '199', 'Happy']
for item in data:
    # print only the items that consist entirely of digits
    if re.fullmatch(r'\d+', item):
        print(item)
will give
2
3
199
and
import re
data = ['2', '3', 'Amy', '199', 'Happy']
out = []
for item in data:
    # keep the items that consist entirely of digits, converted to int
    if re.fullmatch(r'\d+', item):
        out.append(int(item))
print(out)
will give
[2, 3, 199]
You can use isnumeric to filter out nonnumeric items.
s = pd.Series(['2','3','Amy','199','Happy'])
print(s[s.str.isnumeric()].astype(int))
Output:
0 2
1 3
3 199
dtype: int64
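One caveat with str.isnumeric is that it only accepts strings made up entirely of numeric characters, so signs and decimal points make it return False. The sketch below uses made-up values ('-5' and '1.5' are not in the question) to contrast it with pd.to_numeric:

```python
import pandas as pd

# '-5' and '1.5' are hypothetical extra values, added to show the difference
s = pd.Series(['2', '-5', '1.5', 'Amy'])

# isnumeric rejects the sign and the decimal point
by_isnumeric = s.str.isnumeric()

# to_numeric parses negative and decimal numbers as well
by_to_numeric = pd.to_numeric(s, errors='coerce').notna()

print(by_isnumeric.tolist())   # [True, False, False, False]
print(by_to_numeric.tolist())  # [True, True, True, False]
```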

how to remove zeros after decimal from string remove all zero after dot

I have a data frame with an object column, let's say col1, which has values like:
1.00,
1,
0.50,
1.54
I want to have the output like the below:
1,
1,
0.5,
1.54
Basically, remove the zeros after the decimal point if no nonzero digit follows them. Please note that I need an answer for a dataframe. pd.set_option and round don't work for me.
If you want to convert integer and float numbers to strings with no trailing zeros, use this with map or apply:
df = pd.DataFrame({'col1':[1.00, 1, 0.5, 1.50]})
df['new'] = df['col1'].map('{0:g}'.format)
#alternative solution
#df['new'] = df['col1'].apply('{0:g}'.format)
print (df)
col1 new
0 1.0 1
1 1.0 1
2 0.5 0.5
3 1.5 1.5
print (df['new'].apply(type))
0 <class 'str'>
1 <class 'str'>
2 <class 'str'>
3 <class 'str'>
Name: new, dtype: object
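The example above starts from a float column, whereas the question describes an object column holding strings. In that case the values would need to be parsed first. A sketch under that assumption, using the question's values:

```python
import pandas as pd

# Assumption: col1 holds strings, as the question's "object column" suggests
df = pd.DataFrame({'col1': ['1.00', '1', '0.50', '1.54']})

# Parse to numbers first, then format each value without trailing zeros
df['new'] = pd.to_numeric(df['col1']).map('{:g}'.format)

print(df['new'].tolist())  # ['1', '1', '0.5', '1.54']
```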
I think something like this should work:
from decimal import Decimal
if val.is_integer():
    val = int(val)
else:
    # go through str so Decimal sees '0.5' rather than the binary float expansion
    val = Decimal(str(val)).normalize()
This assumes that val is a float value from the dataframe's column. A whole-number value is simply cast to int; for any other float, the extra zeros are cut.
A quick-and-dirty solution is to use "%g" % value, which will convert floats 1.5 to 1.5 but 1.0 to 1 and so on. The negative side-effect is that large numbers will be represented in scientific notation like 4.44e+07.
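A short illustration of that behaviour, including a second limitation worth knowing: "%g" keeps six significant digits by default, so longer values are rounded. The sample values are chosen only for demonstration.

```python
values = [1.0, 1.5, 44400000.0, 1.2345678]

# "%g" drops trailing zeros, switches to scientific notation for large
# exponents, and rounds to six significant digits
formatted = ['%g' % v for v in values]

print(formatted)  # ['1', '1.5', '4.44e+07', '1.23457']
```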
Taken from this Stack Overflow answer, I think you'd like to change the display precision of pandas like so:
pd.set_option('display.precision', 0)
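How about the str.rstrip method? Like so (assuming your strings are in a list):
a = ["1.00", "1", "0.50", "1.50"]
# strip trailing zeros, then a dangling dot, but only for strings that contain a decimal point
b = [e.rstrip('0').rstrip('.') if '.' in e else e for e in a]
# b == ['1', '1', '0.5', '1.5']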
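Note that str.rstrip treats its argument as a set of characters rather than a suffix, so '100'.rstrip('.0') returns '1'. A sketch of a safer helper applied to a dataframe column, as the question asks; the function name strip_trailing_zeros and the extra test values '100' and '10.0' are illustrative.

```python
import pandas as pd

def strip_trailing_zeros(text):
    # Leave strings without a decimal point alone, so '100' stays '100'
    if '.' not in text:
        return text
    # Remove trailing zeros first, then a dot left dangling at the end
    return text.rstrip('0').rstrip('.')

df = pd.DataFrame({'col1': ['1.00', '1', '0.50', '1.54', '100', '10.0']})
df['new'] = df['col1'].map(strip_trailing_zeros)

print(df['new'].tolist())  # ['1', '1', '0.5', '1.54', '100', '10']
```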

Pandas substring

I have the following dataframe:
contract
0 WTX1518X22
1 WTX1518X20.5
2 WTX1518X19
3 WTX1518X15.5
I need to add a new column containing everything following the last 'X' from the first column. So the result would be:
contract result
0 WTX1518X22 22
1 WTX1518X20.5 20.5
2 WTX1518X19 19
3 WTX1518X15.5 15.5
So I figure I first need to find the string index position of the last 'X' (because there may be more than one 'X' in the string). Then get a substring containing everything following that index position for each row.
EDIT:
I have managed to get the index position of 'X' as required:
df['index_pos'] = df['contract'].str.rfind('X', start=0, end=None)
But I still can't seem to get a new column containing all characters following the 'X'. I am trying:
df['index_pos'] = df['index_pos'].convert_objects(convert_numeric=True)
df['result'] = df['contract'].str[df['index_pos']:]
But this just gives me an empty column called 'result'. This is strange because if I do the following then it works correctly:
df['result'] = df['contract'].str[8:]
So I just need a way to not hardcode '8' but to instead use the column 'index_pos'. Any suggestions?
Use vectorised str.split to split the string and cast the last split to float:
In [10]:
df['result'] = df['contract'].str.split('X').str[-1].astype(float)
df
Out[10]:
contract result
0 WTX1518X22 22.0
1 WTX1518X20.5 20.5
2 WTX1518X19 19.0
3 WTX1518X15.5 15.5
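Since only the text after the last 'X' is needed, a possible variant is str.rsplit with n=1, which splits once from the right instead of at every 'X'. A self-contained sketch using the question's data:

```python
import pandas as pd

df = pd.DataFrame({'contract': ['WTX1518X22', 'WTX1518X20.5',
                                'WTX1518X19', 'WTX1518X15.5']})

# Split once from the right, so only the last 'X' is used as the separator
df['result'] = df['contract'].str.rsplit('X', n=1).str[-1].astype(float)

print(df['result'].tolist())  # [22.0, 20.5, 19.0, 15.5]
```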
import pandas as pd
import re
df['result'] = df['contract'].map(lambda x: float(re.findall(r'([0-9.]+)$', x)[0]))
Out[34]:
contract result
0 WTX1518X22 22.0
1 WTX1518X20.5 20.5
2 WTX1518X19 19.0
3 WTX1518X15.5 15.5
This is similar to the approach by EdChump but uses a regular expression; it only assumes that the number is at the end of the string.
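The regular-expression idea can also be written with the vectorised str.extract, which avoids the Python-level lambda. This sketch assumes, as the question states, that the wanted text is whatever follows the last 'X':

```python
import pandas as pd

df = pd.DataFrame({'contract': ['WTX1518X22', 'WTX1518X20.5',
                                'WTX1518X19', 'WTX1518X15.5']})

# Capture everything after the final 'X': an 'X' followed by non-'X'
# characters up to the end of the string
df['result'] = df['contract'].str.extract(r'X([^X]+)$', expand=False).astype(float)

print(df['result'].tolist())  # [22.0, 20.5, 19.0, 15.5]
```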
