String.split() after n characters - python

I can split a string like this:
string = 'ABC_elTE00001'
string = string.split('_elTE')[1]
print(string)
How do I automate this, so I don't have to pass '_elTE' to the function? Something like this:
string = 'ABC_elTE00001'
string = string.split('_' + 4 characters)[1]
print(string)

Use regex, regex has a re.split thing which is the same as str.split just you can split by a regex pattern, it's worth a look at the docs:
>>> import re
>>> string = 'ABC_elTE00001'
>>> re.split('_\w{4}', string)
['ABC', '00001']
>>>
The above example is using a regex pattern as you see.

split() on _ and take everything after the first four characters.
s = 'ABC_elTE00001'
# s.split('_')[1] gives elTE00001
# To get the string after 4 chars, we'd slice it [4:]
print(s.split('_')[1][4:])
OUTPUT:
00001

You can use Regular expression to automate the extraction that you want.
import re
string = 'ABC_elTE00001'
data = re.findall('.([0-9]*$)',string)
print(data)

This is a, quite horrible, version that exactly "translates" string.split('_' + 4 characters)[1]:
s = 'ABC_elTE00001'
s.split(s[s.find("_"):(s.find("_")+1)+4])[1]
>>> '00001'

Related

Getting word from string

How can i get word example from such string:
str = "http://test-example:123/wd/hub"
I write something like that
print(str[10:str.rfind(':')])
but it doesn't work right, if string will be like
"http://tests-example:123/wd/hub"
You can use this regex to capture the value preceded by - and followed by : using lookarounds
(?<=-).+(?=:)
Regex Demo
Python code,
import re
str = "http://test-example:123/wd/hub"
print(re.search(r'(?<=-).+(?=:)', str).group())
Outputs,
example
Non-regex way to get the same is using these two splits,
str = "http://test-example:123/wd/hub"
print(str.split(':')[1].split('-')[1])
Prints,
example
You can use following non-regex because you know example is a 7 letter word:
s.split('-')[1][:7]
For any arbitrary word, that would change to:
s.split('-')[1].split(':')[0]
many ways
using splitting:
example_str = str.split('-')[-1].split(':')[0]
This is fragile, and could break if there are more hyphens or colons in the string.
using regex:
import re
pattern = re.compile(r'-(.*):')
example_str = pattern.search(str).group(1)
This still expects a particular format, but is more easily adaptable (if you know how to write regexes).
I am not sure why do you want to get a particular word from a string. I guess you wanted to see if this word is available in given string.
if that is the case, below code can be used.
import re
str1 = "http://tests-example:123/wd/hub"
matched = re.findall('example',str1)
Split on the -, and then on :
s = "http://test-example:123/wd/hub"
print(s.split('-')[1].split(':')[0])
#example
using re
import re
text = "http://test-example:123/wd/hub"
m = re.search('(?<=-).+(?=:)', text)
if m:
print(m.group())
Python strings has built-in function find:
a="http://test-example:123/wd/hub"
b="http://test-exaaaample:123/wd/hub"
print(a.find('example'))
print(b.find('example'))
will return:
12
-1
It is the index of found substring. If it equals to -1, the substring is not found in string. You can also use in keyword:
'example' in 'http://test-example:123/wd/hub'
True

Slice string at last digit in Python

So I have strings with a date somewhere in the middle, like 111_Joe_Smith_2010_Assessment and I want to truncate them such that they become something like 111_Joe_Smith_2010. The code that I thought would work is
reverseString = currentString[::-1]
stripper = re.search('\d', reverseString)
But for some reason this doesn't always give me the right result. Most of the time it does, but every now and then, it will output a string that looks like 111_Joe_Smith_2010_A.
If anyone knows what's wrong with this, it would be super helpful!
You can use re.sub and $ to match and substitute alphabetical characters
and underscores until the end of the string:
import re
d = ['111_Joe_Smith_2010_Assessment', '111_Bob_Smith_2010_Test_assessment']
new_s = [re.sub('[a-zA-Z_]+$', '', i) for i in d]
Output:
['111_Joe_Smith_2010', '111_Bob_Smith_2010']
You could strip non-digit characters from the end of the string using re.sub like this:
>>> import re
>>> re.sub(r'\D+$', '', '111_Joe_Smith_2010_Assessment')
'111_Joe_Smith_2010'
For your input format you could also do it with a simple loop:
>>> s = '111_Joe_Smith_2010_Assessment'
>>> i = len(s) - 1
>>> while not s[i].isdigit():
... i -= 1
...
>>> s[:i+1]
'111_Joe_Smith_2010'
You can use the following approach:
def clean_names():
names = ['111_Joe_Smith_2010_Assessment', '111_Bob_Smith_2010_Test_assessment']
for name in names:
while not name[-1].isdigit():
name = name[:-1]
print(name)
Here is another solution using rstrip() to remove trailing letters and underscores, which I consider a pretty smart alternative to re.sub() as used in other answers:
import string
s = '111_Joe_Smith_2010_Assessment'
new_s = s.rstrip(f'{string.ascii_letters}_') # For Python 3.6+
new_s = s.rstrip(string.ascii_letters+'_') # For other Python versions
print(new_s) # 111_Joe_Smith_2010

python regex find characters from and end of the string

svn-backup-test,2014/09/24/18/Rev1223/FullSvnCheckout.tgz
from the following string I need to fetch Rev1233. So i was wondering if we can have any regexpression to do that. I like to do following string.search ("Rev" uptill next /)
so far I split this using split array
s1,s2,s3,s4,s5 = string ("/",4)
You don't need a regex to do this. It is as simple as:
str = 'svn-backup-test,2014/09/24/18/Rev1223/FullSvnCheckout.tgz'
str.split('/')[-2]
Here is a quick python example
>>> impot re
>>> s = 'svn-backup-test,2014/09/24/18/Rev1223/FullSvnCheckout.tgz'
>>> p = re.compile('.*/(Rev\d+)/.*')
>>> p.match(s).groups()[0]
'Rev1223'
Find second part from the end using regex, if preferred:
/(Rev\d+)/[^/]+$
http://regex101.com/r/cC6fO3/1
>>> import re
>>> m = re.search('/(Rev\d+)/[^/]+$', 'svn-backup-test,2014/09/24/18/Rev1223/FullSvnCheckout.tgz')
>>> m.groups()[0]
'Rev1223'

Changing only one letter when there are a lot simular in string

Suppose I have the following string:
I.like.football
sky.is.blue
I need to make a loop that changes the last '.' to '_' so it looks this way
I.like_football
sky.is_blue
They are all simular style(3 words, 3 dots).
How to do that in a loop?
str='I.like.football'
str=str.rsplit('.',1) #this split from right but only first '.'
print '_'.join(str) # then join it
#output I.like_football
in single line
str='_'.join(str.rsplit('.',1))
str.replace lets you specify the number of replacements. Unfortunately there is no str.rreplace, so you'd need to reverse the string before and after :) eg.
>>> def f(s):
... return s[::-1].replace(".", "_", 1)[::-1]
...
>>> f('I.like.football')
'I.like_football'
>>> f('sky.is.blue')
'sky.is_blue'
Alternatively you could use one of str.rpartition, str.rsplit, str.rfind
This doesn't even need to run in a loop:
import re
p = re.compile(ur'\.(?=[^\.]+$)', re.IGNORECASE | re.MULTILINE)
test_str = u"I.like.football\nsky.is.blue"
subst = u"_"
result = re.sub(p, subst, test_str)

Removing many types of chars from a Python string

I have some string X and I wish to remove semicolons, periods, commas, colons, etc, all in one go. Is there a way to do this that doesn't require a big chain of .replace(somechar,"") calls?
You can use the translate method with a first argument of None:
string2 = string1.translate(None, ";.,:")
Alternatively, you can use the filter function:
string2 = filter(lambda x: x not in ";,.:", string1)
Note that both of these options only work for non-Unicode strings and only in Python 2.
You can use re.sub to pattern match and replace. The following replaces h and i only with empty strings:
In [1]: s = 'byehibyehbyei'
In [1]: re.sub('[hi]', '', s)
Out[1]: 'byebyebye'
Don't forget to import re.
>>> import re
>>> foo = "asdf;:,*_-"
>>> re.sub('[;:,*_-]', '', foo)
'asdf'
[;:,*_-] - List of characters to be matched
'' - Replace match with nothing
Using the string foo.
For more information take a look at the re.sub(pattern, repl, string, count=0, flags=0) documentation.
Don't know about the speed, but here's another example without using re.
commas_and_stuff = ",+;:"
words = "words; and stuff!!!!"
cleaned_words = "".join(c for c in words if c not in commas_and_stuff)
Gives you:
'words and stuff!!!!'

Categories