Slice string at last digit in Python

So I have strings with a date somewhere in the middle, like 111_Joe_Smith_2010_Assessment and I want to truncate them such that they become something like 111_Joe_Smith_2010. The code that I thought would work is
reverseString = currentString[::-1]
stripper = re.search('\d', reverseString)
But for some reason this doesn't always give me the right result. Most of the time it does, but every now and then, it will output a string that looks like 111_Joe_Smith_2010_A.
If anyone knows what's wrong with this, it would be super helpful!

You can use re.sub and $ to match and substitute alphabetical characters
and underscores until the end of the string:
import re
d = ['111_Joe_Smith_2010_Assessment', '111_Bob_Smith_2010_Test_assessment']
new_s = [re.sub('[a-zA-Z_]+$', '', i) for i in d]
Output:
['111_Joe_Smith_2010', '111_Bob_Smith_2010']

You could strip non-digit characters from the end of the string using re.sub like this:
>>> import re
>>> re.sub(r'\D+$', '', '111_Joe_Smith_2010_Assessment')
'111_Joe_Smith_2010'
For your input format you could also do it with a simple loop:
>>> s = '111_Joe_Smith_2010_Assessment'
>>> i = len(s) - 1
>>> while not s[i].isdigit():
...     i -= 1
...
>>> s[:i+1]
'111_Joe_Smith_2010'
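The loop above raises an IndexError if the string contains no digit at all. A minimal sketch of a more defensive variant follows; the helper name truncate_after_last_digit is made up for illustration, and returning digit-free input unchanged is an assumption about the desired behaviour:

```python
import re

def truncate_after_last_digit(s):
    # Hypothetical helper, not part of the original answers.
    # '.*' is greedy, so the match runs up to the LAST digit in the string.
    match = re.match(r'.*\d', s, flags=re.DOTALL)
    # Assumption: strings without any digit are returned unchanged.
    return match.group(0) if match else s

print(truncate_after_last_digit('111_Joe_Smith_2010_Assessment'))  # 111_Joe_Smith_2010
```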

You can use the following approach:
def clean_names():
    names = ['111_Joe_Smith_2010_Assessment', '111_Bob_Smith_2010_Test_assessment']
    for name in names:
        while not name[-1].isdigit():
            name = name[:-1]
        print(name)

Here is another solution using rstrip() to remove trailing letters and underscores, which I consider a pretty smart alternative to re.sub() as used in other answers:
import string
s = '111_Joe_Smith_2010_Assessment'
new_s = s.rstrip(f'{string.ascii_letters}_') # For Python 3.6+
new_s = s.rstrip(string.ascii_letters+'_') # For other Python versions
print(new_s) # 111_Joe_Smith_2010

Related

String.split() after n characters

I can split a string like this:
string = 'ABC_elTE00001'
string = string.split('_elTE')[1]
print(string)
How do I automate this, so I don't have to pass '_elTE' to the function? Something like this:
string = 'ABC_elTE00001'
string = string.split('_' + 4 characters)[1]
print(string)
Use a regular expression. re.split works like str.split, except that it splits on a regex pattern; the docs are worth a look:
>>> import re
>>> string = 'ABC_elTE00001'
>>> re.split(r'_\w{4}', string)
['ABC', '00001']
The pattern _\w{4} matches an underscore followed by exactly four word characters, so the prefix no longer has to be spelled out.
Alternatively, split() on _ and take everything after the first four characters:
s = 'ABC_elTE00001'
# s.split('_')[1] gives elTE00001
# To get the string after 4 chars, we'd slice it [4:]
print(s.split('_')[1][4:])
OUTPUT:
00001
You can use a regular expression to automate the extraction:
import re
string = 'ABC_elTE00001'
data = re.findall(r'\d+$', string)  # the run of digits at the end of the string
print(data)  # ['00001']
This is a quite horrible version that exactly "translates" string.split('_' + 4 characters)[1]:
>>> s = 'ABC_elTE00001'
>>> s.split(s[s.find("_"):(s.find("_")+1)+4])[1]
'00001'
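The same idea can be written without regex or nested find() calls. This is only a sketch: the helper name after_separator and its parameters sep and skip are hypothetical, and it assumes the separator occurs at least once in the input.

```python
def after_separator(s, sep='_', skip=4):
    # Hypothetical helper: drop everything up to and including the first
    # `sep`, then drop the next `skip` characters.
    head, found, tail = s.partition(sep)
    if not found:
        # Assumption: a missing separator is an error rather than a no-op.
        raise ValueError(f'separator {sep!r} not found in {s!r}')
    return tail[skip:]

print(after_separator('ABC_elTE00001'))  # 00001
```

str.partition splits on the first occurrence only, so any later underscores in the string stay in the result.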

How to move all special characters to the end of the string in Python?

I'm trying to move all non-alphanumeric characters to the end of the strings. I am having a hard time with the regex since I don't know where the special characters will be. Here are a couple of simple examples.
hello*there*this*is*a*str*ing*with*asterisks
and&this&is&a&str&ing&&with&ampersands&in&i&t
one%mo%refor%good%mea%sure%I%think%you%get%it
How would I go about sliding all the special characters to the end of the string?
Here is what I tried, but I didn't get anything.
re.compile(r'(.+?)(\**)')
r.sub(r'\1\2', string)
Edit:
Expected output for the first string would be:
hellotherethisisastringwithasterisks********
There's no need for regex here. Just use str.isalnum (the question asks about non-alphanumeric characters, so digits should stay in place) and build up two lists, then join them:
strings = ['hello*there*this*is*a*str*ing*with*asterisks',
           'and&this&is&a&str&ing&&with&ampersands&in&i&t',
           'one%mo%refor%good%mea%sure%I%think%you%get%it']
for s in strings:
    a = []  # alphanumeric characters, in original order
    b = []  # everything else, in original order
    for c in s:
        if c.isalnum():
            a.append(c)
        else:
            b.append(c)
    print(''.join(a+b))
Result:
hellotherethisisastringwithasterisks********
andthisisastringwithampersandsinit&&&&&&&&&&&
onemoreforgoodmeasureIthinkyougetit%%%%%%%%%%
Alternative print() call for Python 3.5 and higher:
print(*a, *b, sep='')
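The two-list approach can also be collapsed into a single stable sort. The sketch below is one possible variant, not taken from the answers; the function name move_special_to_end is hypothetical, and it uses str.isalnum because the question asks about non-alphanumeric characters.

```python
def move_special_to_end(s):
    # The key is False (sorts first) for alphanumeric characters and True
    # for everything else. sorted() is stable, so the relative order inside
    # each group is preserved.
    return ''.join(sorted(s, key=lambda c: not c.isalnum()))

print(move_special_to_end('hello*there*this*is*a*str*ing*with*asterisks'))
# hellotherethisisastringwithasterisks********
```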
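Here is my proposed solution for this with regex:
import re
def move_nonalpha(string, char):
    pattern = re.escape(char)  # escape so that characters such as * match literally
    char_list = re.findall(pattern, string)  # every occurrence of char
    items = re.split(pattern, string)  # the pieces between the occurrences
    # If char does not occur, this simply returns the original string.
    return ''.join(items) + ''.join(char_list)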
Usage:
string = "hello*there*this*is*a*str*ing*with*asterisks"
print (move_nonalpha(string,"*"))
Gives me output:
hellotherethisisastringwithasterisks********
I tried with your other input patterns as well and it's working. Hope it'll help.

Python: How to remove [' and ']?

I want to remove [' from start and '] characters from the end of a string.
This is my text:
"['45453656565']"
I need to have this text:
"45453656565"
I've tried to use str.replace
text = text.replace("['","");
but it does not work.
You need to strip your text by passing the unwanted characters to the str.strip() method:
>>> s = "['45453656565']"
>>> s.strip("[']")
'45453656565'
Or, if you want to convert it to an integer, you can simply pass the stripped result to the int function:
>>> try:
...     val = int(s.strip("[']"))
... except ValueError:
...     print("Invalid string")
...
>>> val
45453656565
Using re.sub:
>>> my_str = "['45453656565']"
>>> import re
>>> re.sub(r"['\]\[]", "", my_str)
'45453656565'
You could iterate over the characters and keep only the digits:
>>> number_array = "['34325235235']"
>>> int(''.join(c for c in number_array if c.isdigit()))
34325235235
This solution works for both "['34325235235']" and '["34325235235"]', and for any other mix of digits and other characters.
You can also use a regular expression:
>>> import re
>>> theString = "['34325235235']"
>>> int(re.sub(r'\D', '', theString))  # Optionally parse to int
34325235235
Instead of hacking your data by stripping brackets, you should edit the script that created it to print out just the numbers. E.g., instead of lazily doing
output.write(str(mylist))
you can write
for elt in mylist:
    output.write(elt + "\n")
Then when you read your data back in, it'll contain the numbers (as strings) without any quotes, commas or brackets.
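If the producing script cannot be changed, and assuming the text is always a valid Python list literal (as in the question's example), another option is to parse it rather than strip characters. A minimal sketch using the standard library's ast.literal_eval:

```python
import ast

text = "['45453656565']"

# literal_eval safely evaluates a string containing a Python literal;
# here it yields a real list with one string element.
values = ast.literal_eval(text)
number = values[0]

print(number)  # 45453656565
```

Unlike stripping, this would also cope with lists of several elements, but it raises an exception if the text is not a well-formed literal.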

Python: Change uppercase letter

I can't figure out how to replace the second uppercase letter in a string in python.
for example:
string = "YannickMorin"
I want it to become yannick-morin
As of now I can make it all lowercase by doing string.lower() but how to put a dash when it finds the second uppercase letter.
You can use a regular expression:
>>> import re
>>> split_res = re.findall('[A-Z][^A-Z]*', 'YannickMorin')
>>> split_res
['Yannick', 'Morin']
>>> '-'.join(split_res).lower()
'yannick-morin'
This is more a task for regular expressions. Capture the lowercase letter as well, otherwise it is consumed by the match and dropped from the result:
result = re.sub(r'([a-z])([A-Z])', r'\1-\2', inputstring).lower()
Demo:
>>> import re
>>> inputstring = 'YannickMorin'
>>> re.sub(r'([a-z])([A-Z])', r'\1-\2', inputstring).lower()
'yannick-morin'
Find uppercase letters that are not at the beginning of the word and insert a dash before. Then convert everything to lowercase.
>>> import re
>>> re.sub(r'\B([A-Z])', r'-\1', "ThisIsMyText").lower()
'this-is-my-text'
The lower() method does not change the string in place; it returns a new string that has to be printed or assigned to a variable. One solution, which hard-codes the position of the second uppercase letter (index 7 in "YannickMorin"), is:
strAsList = list(string)
strAsList[0] = strAsList[0].lower()
strAsList[7] = strAsList[7].lower()
strAsList.insert(7, '-')
print (''.join(strAsList))
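The hard-coded index only works for this one input. A sketch of the same idea without regex and without fixed positions follows; the function name camel_to_kebab is hypothetical, and it assumes every uppercase letter after the first character starts a new word:

```python
def camel_to_kebab(text):
    parts = []
    for i, ch in enumerate(text):
        # Insert a dash before every uppercase letter except the first character.
        if ch.isupper() and i > 0:
            parts.append('-')
        parts.append(ch.lower())
    return ''.join(parts)

print(camel_to_kebab('YannickMorin'))  # yannick-morin
```

Note that runs of capitals are split letter by letter under this assumption, so 'HTTPServer' becomes 'h-t-t-p-server'.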

Removing many types of chars from a Python string

I have some string X and I wish to remove semicolons, periods, commas, colons, etc, all in one go. Is there a way to do this that doesn't require a big chain of .replace(somechar,"") calls?
You can use the translate method with a first argument of None:
string2 = string1.translate(None, ";.,:")
Alternatively, you can use the filter function:
string2 = filter(lambda x: x not in ";,.:", string1)
Note that both of these options only work for non-Unicode strings and only in Python 2.
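For Python 3, a rough equivalent of both options looks like the sketch below. The sample value of string1 is made up for illustration; str.maketrans with a third argument builds a table that deletes those characters.

```python
string1 = "a;b.c,d:e"  # sample input, made up for illustration

# Option 1: translate with a deletion table (third argument of maketrans).
table = str.maketrans('', '', ';.,:')
string2 = string1.translate(table)
print(string2)  # abcde

# Option 2: filter() returns an iterator in Python 3, so join the result.
string3 = ''.join(filter(lambda ch: ch not in ';,.:', string1))
print(string3)  # abcde
```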
You can use re.sub to match a pattern and replace it. The following replaces only h and i with empty strings:
In [1]: s = 'byehibyehbyei'
In [2]: re.sub('[hi]', '', s)
Out[2]: 'byebyebye'
Don't forget to import re.
>>> import re
>>> foo = "asdf;:,*_-"
>>> re.sub('[;:,*_-]', '', foo)
'asdf'
[;:,*_-] - List of characters to be matched
'' - Replace match with nothing
Using the string foo.
For more information take a look at the re.sub(pattern, repl, string, count=0, flags=0) documentation.
Don't know about the speed, but here's another example without using re.
commas_and_stuff = ",+;:"
words = "words; and stuff!!!!"
cleaned_words = "".join(c for c in words if c not in commas_and_stuff)
Gives you:
'words and stuff!!!!'
