How to remove a string selectively from pandas series?

I am having a hard time getting rid of a string in my pandas Series. I'd like to remove the first two '-' strings but keep the last two numeric values. An example is below.
import pandas as pd
temp = pd.Series(['-', '-', '-0.3', '-0.9'])
print(temp)
Out[135]:
0 -
1 -
2 -0.3
3 -0.9
dtype: object
I can't use temp.str.replace("-", "") because it also removes the minus sign from the last two numeric values. Can anyone help me with this? Thanks in advance!

Use a regular expression:
temp = pd.Series(['-', '-', '-0.3', '-0.9'])
print(temp.str.replace('^-$', '', regex=True))
Output
0
1
2 -0.3
3 -0.9
dtype: object
Or simply use replace:
print(temp.replace('-', '')) # notice that there is no .str
Output
0
1
2 -0.3
3 -0.9
dtype: object
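The two calls above behave differently for a reason: Series.str.replace works on substrings inside each value, while Series.replace with a plain string only matches whole values. A minimal sketch of that difference, reusing temp from the question (the names by_regex and by_value are illustrative, not from the answer):

```python
import pandas as pd

temp = pd.Series(['-', '-', '-0.3', '-0.9'])

# .str.replace looks inside each string; the ^...$ anchors restrict
# the match to values that consist of a single '-' and nothing else.
by_regex = temp.str.replace(r'^-$', '', regex=True)

# Series.replace (no .str) compares whole values by default,
# so '-0.3' and '-0.9' are left alone.
by_value = temp.replace('-', '')

print(by_regex.tolist())
print(by_value.tolist())
```

Both results keep the minus sign on the numeric strings.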

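You can convert the strings to numbers. errors='coerce' turns anything non-numeric into NaN, which fillna('') then replaces with an empty string: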
pd.to_numeric(temp, errors='coerce').fillna('')
Output:
0
1
2 -0.3
3 -0.9
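If the goal is to do arithmetic on the values afterwards, it may be preferable to stop after the conversion and keep the float dtype instead of filling with empty strings. A minimal sketch under that assumption; the names numeric and cleaned are illustrative:

```python
import pandas as pd

temp = pd.Series(['-', '-', '-0.3', '-0.9'])

# Non-numeric entries such as '-' become NaN; the result is a float column.
numeric = pd.to_numeric(temp, errors='coerce')
print(numeric.dtype)

# Drop the NaN rows if the placeholders are not needed at all.
cleaned = numeric.dropna()
print(cleaned.tolist())
```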

You can remove the unwanted strings like this:
import pandas as pd
temp = pd.Series(['-', '-', '-0.3', '-0.9'])
# keep only the values that are not exactly '-'
new_temp = temp[temp != '-']
print(new_temp)
Output:
2 -0.3
3 -0.9
dtype: object

Related

Formatting a string containing currency and commas

Does anyone know how I'd format this string (which is a column in a dataframe) to be a float so I can sort by the column please?
£880,000
£88,500
£850,000
£845,000
i.e. I want this to become
88,500
845,000
850,000
880,000
Thanks in advance!

Assuming 'col' is the column name.
If you just want to sort and keep the values as strings, you can use natsort:
from natsort import natsort_key
df.sort_values(by='col', key=natsort_key)
# OR
from natsort import natsort_keygen
df.sort_values(by='col', key=natsort_keygen())
output:
col
1 £88,500
3 £845,000
2 £850,000
0 £880,000
If you want to convert to floats:
df['col'] = pd.to_numeric(df['col'].str.replace(r'[^\d.]', '', regex=True))
df.sort_values(by='col')
output:
col
1 88500
3 845000
2 850000
0 880000
If you want strings, you can use str.lstrip:
df['col'] = df['col'].str.lstrip('£')
output:
col
0 880,000
1 88,500
2 850,000
3 845,000
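To sort numerically while keeping the formatted strings for display, one option is to put the converted values in a separate column. This is only a sketch: the column name value is hypothetical, and the sample data is taken from the question.

```python
import pandas as pd

df = pd.DataFrame({'col': ['£880,000', '£88,500', '£850,000', '£845,000']})

# Strip everything that is not a digit or a decimal point, then convert.
df['value'] = pd.to_numeric(df['col'].str.replace(r'[^\d.]', '', regex=True))

# Sort on the numeric column; the original strings stay untouched in 'col'.
result = df.sort_values(by='value')
print(result)
```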

Pandas: How to display a series value with brackets

I need to display the values of a series in brackets ().
import pandas as pd
s = pd.Series(['12.1', '23.2', '30.3', '40.0'])
print(s)
0 12.1
1 23.2
2 30.3
3 40.0
dtype: object
OUTPUT should look like here:
0 (12.1)
1 (23.2)
2 (30.3)
3 (40.0)
Any suggestions?

I'm not sure why you want this, but you can add the parentheses manually. Here s is the series (named s rather than pd so that it does not shadow the pandas module):
s = pd.Series([f"({i})" for i in s])
I'm not sure if this is exactly what you want, but I hope it helps.

This works (the series is named s so that it does not shadow the pandas module):
import pandas as pd
s = pd.Series(['12.1', '23.2', '30.3', '40.0'])
print(s)
0 12.1
1 23.2
2 30.3
3 40.0
dtype: object
s = s.apply(lambda i: f"({i})")
print(s)
0 (12.1)
1 (23.2)
2 (30.3)
3 (40.0)
dtype: object
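An alternative sketch uses plain string concatenation, which pandas applies element-wise when every value is a string. The names s and wrapped are illustrative:

```python
import pandas as pd

s = pd.Series(['12.1', '23.2', '30.3', '40.0'])

# Adding a str to a Series of strings concatenates it to every element.
wrapped = '(' + s + ')'
print(wrapped.tolist())
```

If the series held floats instead of strings, it would need s.astype(str) first.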

Convert type str (with number and words) column into int pandas

I have a column of type str that contains both numbers and words,
e.g.
['2','3','Amy','199','Happy']
And I want to convert all "str number" into int and remove (the rows with) the "str words".
So my expected output would be a list like below:
[2, 3, 199]
Since I have a pandas dataframe, and this is supposed to be one of the columns, it would be even better if it could be a Series as follows:
0 2.0
1 3.0
3 199.0
dtype: float64

As you mentioned, you have a column (a Series), so let's say it's called s:
s = pd.Series(['2', '3', 'Amy', '199', 'Happy'])
Then call pd.to_numeric with errors='coerce', which turns the non-numeric strings into NaN, and remove the NaNs with dropna:
print(pd.to_numeric(s, errors='coerce').dropna())
The above code will output:
0 2.0
1 3.0
3 199.0
dtype: float64
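If integers are required rather than floats, the result can be cast once the NaN rows are gone (NaN cannot be stored in a plain integer column, which is why the coerced series is float). A sketch under that assumption; ints is an illustrative name:

```python
import pandas as pd

s = pd.Series(['2', '3', 'Amy', '199', 'Happy'])

# Coerce, drop the rows that failed to parse, then cast to int.
ints = pd.to_numeric(s, errors='coerce').dropna().astype(int)
print(ints.tolist())
```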

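Without pandas, since you are supplying a list:
import re
data = ['2', '3', 'Amy', '199', 'Happy']
for item in data:
    # re.findall returns [] for items without digits, so those print an empty line
    print(*re.findall(r'\d+', item))
will give (empty lines for the non-numeric items omitted)
2
3
199
and
import re
data = ['2', '3', 'Amy', '199', 'Happy']
out = []
for item in data:
    # fullmatch keeps only the items that consist entirely of digits
    if re.fullmatch(r'\d+', item):
        out.append(int(item))
print(out)
will give
[2, 3, 199]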

You can use isnumeric to filter out nonnumeric items.
s = pd.Series(['2','3','Amy','199','Happy'])
print(s[s.str.isnumeric()].astype(int))
Output:
0 2
1 3
3 199
dtype: int64
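One caveat worth checking against your data: str.isnumeric only accepts strings made up entirely of numeric characters, so signed or decimal values are filtered out as well. A small sketch with made-up sample values, compared against pd.to_numeric:

```python
import pandas as pd

s = pd.Series(['2', '-5', '2.5', 'Amy'])

# The minus sign and the decimal point are not numeric characters.
mask = s.str.isnumeric()
print(mask.tolist())

# pd.to_numeric parses signed and decimal strings; only 'Amy' is dropped.
parsed = pd.to_numeric(s, errors='coerce').dropna()
print(parsed.tolist())
```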

how to remove zeros after decimal from string remove all zero after dot

I have a data frame with an object column, let's say col1, which has values like:
1.00,
1,
0.50,
1.54
I want to have the output like the below:
1,
1,
0.5,
1.54
Basically, remove the zeros after the decimal point when no non-zero digit follows them. Please note that I need an answer for a dataframe. pd.set_option and round don't work for me.

To convert integers and floats to strings with no trailing zeros, use this with map or apply:
df = pd.DataFrame({'col1':[1.00, 1, 0.5, 1.50]})
df['new'] = df['col1'].map('{0:g}'.format)
#alternative solution
#df['new'] = df['col1'].apply('{0:g}'.format)
print (df)
col1 new
0 1.0 1
1 1.0 1
2 0.5 0.5
3 1.5 1.5
print (df['new'].apply(type))
0 <class 'str'>
1 <class 'str'>
2 <class 'str'>
3 <class 'str'>
Name: new, dtype: object

I think something like this should work:
from decimal import Decimal
if val.is_integer():
    val = int(val)
else:
    # go through str() so Decimal sees '0.5' rather than the float's binary expansion
    val = Decimal(str(val)).normalize()
This assumes that val is a float value from the dataframe's column. A whole number is simply cast to int.
For any other float, normalize() cuts the extra trailing zeros.

A quick-and-dirty solution is to use "%g" % value, which will convert floats 1.5 to 1.5 but 1.0 to 1 and so on. The negative side-effect is that large numbers will be represented in scientific notation like 4.44e+07.
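The behaviour described above can be checked with a few sample values. This is just an illustration; the list values is made up for the example:

```python
values = [1.0, 0.5, 1.54, 44400000.0]

# %g drops trailing zeros but switches to scientific notation
# once the exponent reaches the default precision of 6 digits.
formatted = ['%g' % v for v in values]
print(formatted)  # ['1', '0.5', '1.54', '4.44e+07']
```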

Taken from this Stack Overflow answer: I think you'd like to change the display precision of pandas like so (display.precision is the full, unambiguous option name):
pd.set_option('display.precision', 0)

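How about the str.rstrip method? Like so (assuming your strings are in a list). Note that rstrip removes a set of characters rather than a suffix, so strip the zeros first and then the dot, and only for strings that contain a dot:
a = ["1.00", "1", "0.50", "1.50"]
b = [e.rstrip('0').rstrip('.') if '.' in e else e for e in a]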
>>> ['1', '1', '0.5', '1.5']
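A caveat with rstrip: its argument is a set of characters to strip, not a literal suffix, so a single rstrip('.0') call also eats the zeros of whole numbers such as "10". The sketch below compares that naive call with a two-step version; strip_trailing_zeros is a hypothetical helper name and the extra samples are made up:

```python
def strip_trailing_zeros(text):
    """Remove trailing zeros after a decimal point (hypothetical helper)."""
    if '.' not in text:
        return text  # whole numbers such as '10' are left untouched
    return text.rstrip('0').rstrip('.')

samples = ['1.00', '1', '0.50', '1.54', '10', '100.0']

naive = [e.rstrip('.0') for e in samples]
safe = [strip_trailing_zeros(e) for e in samples]

print(naive)  # ['1', '1', '0.5', '1.54', '1', '1']
print(safe)   # ['1', '1', '0.5', '1.54', '10', '100']
```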

Pandas substring

I have the following dataframe:
contract
0 WTX1518X22
1 WTX1518X20.5
2 WTX1518X19
3 WTX1518X15.5
I need to add a new column containing everything following the last 'X' from the first column. So the result would be:
contract result
0 WTX1518X22 22
1 WTX1518X20.5 20.5
2 WTX1518X19 19
3 WTX1518X15.5 15.5
So I figure I first need to find the string index position of the last 'X' (because there may be more than one 'X' in the string). Then get a substring containing everything following that index position for each row.
EDIT:
I have managed to get the index position of 'X' as required:
df['index_pos'] = df['contract'].str.rfind('X', start=0, end=None)
But I still can't seem to get a new column containing all characters following the 'X'. I am trying:
df['index_pos'] = df['index_pos'].convert_objects(convert_numeric=True)
df['result'] = df['contract'].str[df['index_pos']:]
But this just gives me an empty column called 'result'. This is strange because if I do the following then it works correctly:
df['result'] = df['contract'].str[8:]
So I just need a way to not hardcode '8' but to instead use the column 'index_pos'. Any suggestions?

Use vectorised str.split to split the string and cast the last split to float:
In [10]:
df['result'] = df['contract'].str.split('X').str[-1].astype(float)
df

Out[10]:
contract result
0 WTX1518X22 22.0
1 WTX1518X20.5 20.5
2 WTX1518X19 19.0
3 WTX1518X15.5 15.5
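A small variation, sketched here as an alternative rather than as part of the answer above: str.rsplit with n=1 splits only once from the right, so the string is cut at the last 'X' no matter how many others precede it.

```python
import pandas as pd

df = pd.DataFrame(
    {'contract': ['WTX1518X22', 'WTX1518X20.5', 'WTX1518X19', 'WTX1518X15.5']}
)

# n=1 limits the split to the right-most 'X'; .str[-1] takes the tail piece.
df['result'] = df['contract'].str.rsplit('X', n=1).str[-1].astype(float)
print(df)
```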

import pandas as pd
import re
# capture the run of digits and dots at the end of each string
df['result'] = df['contract'].map(lambda x: float(re.findall(r'([0-9.]+)$', x)[0]))
Out[34]:
contract result
0 WTX1518X22 22.0
1 WTX1518X20.5 20.5
2 WTX1518X19 19.0
3 WTX1518X15.5 15.5
This is similar to the approach by EdChump but uses a regular expression; it only assumes that the number is at the end of the string.
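As for the attempt in the question: .str[start:] takes one slice that is applied to every row, so a Series of positions cannot serve as the start. One way to use per-row positions is a plain list comprehension, sketched below under the assumption that every contract contains at least one 'X' (pos + 1 skips the 'X' itself):

```python
import pandas as pd

df = pd.DataFrame(
    {'contract': ['WTX1518X22', 'WTX1518X20.5', 'WTX1518X19', 'WTX1518X15.5']}
)

# Position of the last 'X' in each string.
df['index_pos'] = df['contract'].str.rfind('X')

# Slice each string at its own position instead of one shared slice.
df['result'] = [
    text[pos + 1:] for text, pos in zip(df['contract'], df['index_pos'])
]
print(df)
```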
