How to extract the first numbers in a string - Python

How do I remove all the numbers before the first letter in a string? For example,
myString = "32cl2"
I want it to become:
"cl2"
I need it to work for any number of leading digits, so "2h2" should become "h2", "4563nh3" should become "nh3", etc.
EDIT:
The numbers here are not separated by spaces, so this is not the same as the other question, and I want to remove only the leading numbers, not all of them.

If you want to solve it without regular expressions, you can use itertools.dropwhile():
>>> from itertools import dropwhile
>>>
>>> ''.join(dropwhile(str.isdigit, "32cl2"))
'cl2'
>>> ''.join(dropwhile(str.isdigit, "4563nh3"))
'nh3'
Or, using re.sub(), replacing one or more digits at the beginning of a string:
>>> import re
>>> re.sub(r"^\d+", "", "32cl2")
'cl2'
>>> re.sub(r"^\d+", "", "4563nh3")
'nh3'
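If you also want to keep the leading digits rather than discard them (the title talks about extracting them), one option is to capture both parts with re.match(). This is a minimal sketch; `split_leading_digits` is a hypothetical helper name, not part of any library.

```python
import re

# Group 1: the run of leading digits (possibly empty); group 2: everything after it.
# DOTALL lets .* also cover newlines in the remainder.
_LEADING = re.compile(r"(\d*)(.*)", re.DOTALL)

def split_leading_digits(s):
    """Return (leading_digits, rest); leading_digits is '' if s starts with a non-digit."""
    # \d* can match the empty string, so match() always succeeds here
    digits, rest = _LEADING.match(s).groups()
    return digits, rest

print(split_leading_digits("32cl2"))    # ('32', 'cl2')
print(split_leading_digits("4563nh3"))  # ('4563', 'nh3')
print(split_leading_digits("cl2"))      # ('', 'cl2')
```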

Use lstrip:
myString.lstrip('0123456789')
or
import string
myString.lstrip(string.digits)
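For ASCII input the dropwhile, re.sub and lstrip answers give the same result. A quick sketch comparing them on the examples from the question (the function names are illustrative only). They can disagree on non-ASCII digits: str.isdigit() and \d both accept, for example, the Arabic-Indic digit '٣', while string.digits contains only 0-9.

```python
import re
import string
from itertools import dropwhile

def strip_dropwhile(s):
    # Skip characters while str.isdigit() is true, keep the rest
    return ''.join(dropwhile(str.isdigit, s))

def strip_regex(s):
    # Remove one or more digits anchored at the start of the string
    return re.sub(r"^\d+", "", s)

def strip_lstrip(s):
    # Remove any leading characters that appear in '0123456789'
    return s.lstrip(string.digits)

for s in ["32cl2", "2h2", "4563nh3", "cl2", "123"]:
    results = {strip_dropwhile(s), strip_regex(s), strip_lstrip(s)}
    assert len(results) == 1, (s, results)  # all three agree on ASCII input
    print(repr(s), "->", repr(results.pop()))

# Non-ASCII digit: isdigit() and \d treat it as a digit, string.digits does not
print(strip_dropwhile("٣x"), strip_regex("٣x"), strip_lstrip("٣x"))  # x x ٣x
```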


How to format a number with comma every four digits in Python?

I have a number 12345 and I want the result '1,2345'. I tried the following code, but it groups the digits in threes:
>>> n = 12345
>>> f"{n:,}"
'12,345'
Regex will work for you:
import re

def format_number(n):
    return re.sub(r"(\d)(?=(\d{4})+(?!\d))", r"\1,", str(n))
>>> format_number(123)
'123'
>>> format_number(12345)
'1,2345'
>>> format_number(12345678)
'1234,5678'
>>> format_number(123456789)
'1,2345,6789'
Explanation:
Match:
(\d) Match a digit...
(?=(\d{4})+(?!\d)) ...that is followed by one or more groups of exactly 4 digits, with no further digit after the last group.
Replace:
\1, Replace the matched digit with itself followed by a comma.
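The same pattern can be written with re.VERBOSE so that each piece of the explanation sits next to the part of the regex it describes. This is only a restatement of the regex above, not a different technique; `format_number_verbose` is an illustrative name.

```python
import re

FOUR_DIGIT_GROUPS = re.compile(
    r"""
    (\d)            # a single digit (captured as group 1)...
    (?=             # ...that is followed by (lookahead, consumes nothing)
        (\d{4})+    #   one or more groups of exactly four digits
        (?!\d)      #   with no further digit after the last group
    )
    """,
    re.VERBOSE,
)

def format_number_verbose(n):
    # Insert a comma after each digit that is followed by a multiple of four digits
    return FOUR_DIGIT_GROUPS.sub(r"\1,", str(n))

print(format_number_verbose(12345))      # 1,2345
print(format_number_verbose(123456789))  # 1,2345,6789
```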
Sounds like a locale thing (*). This prints 12,3456,7890:
import locale
n = 1234567890
locale._override_localeconv["thousands_sep"] = ","
locale._override_localeconv["grouping"] = [4, 0]
print(locale.format_string('%d', n, grouping=True))
That's a somewhat hackish way (it relies on the private locale._override_localeconv), based on this answer. The other answer there suggests using babel, which may be a cleaner way to achieve it.
(*) A quick search found this page about Chinese grouping digits in fours, and the OP's name seems somewhat Chinese, so...
Using babel:
>>> from babel.numbers import format_decimal
>>> format_decimal(1234, format="#,####", locale="en")
'1234'
>>> format_decimal(12345, format="#,####", locale="en")
'1,2345'
>>> format_decimal(1234567890, format="#,####", locale="en")
'12,3456,7890'
This format syntax is specified in the Unicode Locale Data Markup Language (LDML). Some light bedtime reading there.
Using stdlib only (hackish):
>>> from textwrap import wrap
>>> n = 12345
>>> ",".join(wrap(str(n)[::-1], width=4))[::-1]
'1,2345'
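Another stdlib-only sketch that avoids reversing the string twice: compute the size of the leftmost (possibly shorter) group, then slice the rest in steps of four. `group_by_four` is a hypothetical name, and the handling of negative numbers is an addition not present in the answers here.

```python
def group_by_four(n, sep=","):
    """Group the digits of an integer in fours from the right, e.g. 12345 -> '1,2345'."""
    sign = "-" if n < 0 else ""
    digits = str(abs(n))
    # Length of the leftmost group: 1 to 4 digits
    head = len(digits) % 4 or 4
    groups = [digits[:head]]
    # Remaining groups are exactly four digits each
    groups += [digits[i:i + 4] for i in range(head, len(digits), 4)]
    return sign + sep.join(groups)

print(group_by_four(12345))       # 1,2345
print(group_by_four(1234567890))  # 12,3456,7890
print(group_by_four(-1234))       # -1234
```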
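You can break your (non-negative) number into four-digit chunks using modulus and integer division by 10000 via divmod(), then str.join the chunks with ',' delimiters:
def commas(n):
    s = []
    while n > 0:
        n, chunk = divmod(n, 10000)  # peel off the lowest four digits
        # every chunk except the leftmost must keep its leading zeros
        s.append(f"{chunk:04d}" if n else str(chunk))
    return ','.join(reversed(s)) or '0'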
>>> commas(123456789)
'1,2345,6789'
>>> commas(123)
'123'

Python: Change uppercase letter

I can't figure out how to replace the second uppercase letter in a string in python.
for example:
string = "YannickMorin"
I want it to become yannick-morin
As of now I can make it all lowercase with string.lower(), but how do I put a dash before the second uppercase letter?
You can use a regex:
>>> import re
>>> split_res = re.findall('[A-Z][^A-Z]*', 'YannickMorin')
>>> split_res
['Yannick', 'Morin']
>>> '-'.join(split_res).lower()
'yannick-morin'
This is more a task for regular expressions:
result = re.sub(r'([a-z])([A-Z])', r'\1-\2', inputstring).lower()  # \1 keeps the lowercase letter, \2 is the uppercase one
Demo:
>>> import re
>>> inputstring = 'YannickMorin'
>>> re.sub(r'([a-z])([A-Z])', r'\1-\2', inputstring).lower()
'yannick-morin'
Find uppercase letters that are not at the beginning of a word and insert a dash before each one, then convert everything to lowercase:
>>> import re
>>> re.sub(r'\B([A-Z])', r'-\1', "ThisIsMyText").lower()
'this-is-my-text'
The lower() method does not change the string in place; it returns a new string, which you need to print or assign to a variable. To insert the dash, one solution is:
strAsList = list(string)
strAsList[0] = strAsList[0].lower()
# index 7 is the 'M' in "YannickMorin"; this only works for that exact string
strAsList[7] = strAsList[7].lower()
strAsList.insert(7, '-')
print(''.join(strAsList))
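The hardcoded index 7 only works for "YannickMorin". A more general sketch without regular expressions, assuming the goal is to put a dash before every uppercase letter except the first character and then lowercase everything (`camel_to_dashed` is a hypothetical helper name):

```python
def camel_to_dashed(s):
    """Insert '-' before each uppercase letter after the first character, then lowercase."""
    out = []
    for i, ch in enumerate(s):
        # The first character never gets a dash, even if it is uppercase
        if i > 0 and ch.isupper():
            out.append('-')
        out.append(ch.lower())
    return ''.join(out)

print(camel_to_dashed("YannickMorin"))  # yannick-morin
print(camel_to_dashed("ThisIsMyText"))  # this-is-my-text
```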

python regex find characters from the end of the string

svn-backup-test,2014/09/24/18/Rev1223/FullSvnCheckout.tgz
From this string I need to fetch Rev1223, so I was wondering whether a regular expression can do that. I'd like to do something like string.search("Rev" up to the next /).
So far I split it with split():
s1, s2, s3, s4, s5 = string.split("/", 4)
You don't need a regex to do this. It is as simple as:
s = 'svn-backup-test,2014/09/24/18/Rev1223/FullSvnCheckout.tgz'
s.split('/')[-2]  # second-to-last '/'-separated part: 'Rev1223'
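Since the string is a slash-separated path, another option is pathlib.PurePosixPath, which treats the string as a path without touching the filesystem; .parent.name is the second-to-last component. This sketch assumes the Rev... part is always the directory that directly contains the file.

```python
from pathlib import PurePosixPath

s = 'svn-backup-test,2014/09/24/18/Rev1223/FullSvnCheckout.tgz'
# PurePosixPath only manipulates the string; it never accesses the filesystem
rev = PurePosixPath(s).parent.name
print(rev)  # Rev1223
```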
Here is a quick Python example:
>>> import re
>>> s = 'svn-backup-test,2014/09/24/18/Rev1223/FullSvnCheckout.tgz'
>>> p = re.compile(r'.*/(Rev\d+)/.*')
>>> p.match(s).groups()[0]
'Rev1223'
Find the second part from the end using a regex, if preferred:
/(Rev\d+)/[^/]+$
http://regex101.com/r/cC6fO3/1
>>> import re
>>> m = re.search(r'/(Rev\d+)/[^/]+$', 'svn-backup-test,2014/09/24/18/Rev1223/FullSvnCheckout.tgz')
>>> m.groups()[0]
'Rev1223'

Parsing String with Python

How can I parse a string such as ['FED590498'] in Python so that I can get the numeric part 590498 and the letters FED separately?
Some Samples:
['ICIC889150']
['FED889150']
['MFL541606']
The [ ] are not part of the string.
If the number of letters is variable, it's easiest to use a regular expression:
import re
characters, numbers = re.search(r'([A-Z]+)(\d+)', inputstring).groups()
This assumes that:
The letters are uppercase ASCII
There is at least one letter and one digit in each input string.
You can lock the pattern down further by using {3,4} instead of + on the letters, to limit the repetition to exactly 3 or 4 instead of at least 1, etc.
Demo:
>>> import re
>>> inputstring = 'FED590498'
>>> characters, numbers = re.search(r'([A-Z]+)(\d+)', inputstring).groups()
>>> characters
'FED'
>>> numbers
'590498'
Given the requirement that there are always 3 or 4 letters you can use:
import re
characters, numbers = re.findall(r'([A-Z]{3,4})(\d+)', 'FED590498')[0]
characters, numbers
#('FED', '590498')
Or even:
ids = ['ICIC889150', 'FED889150', 'MFL541606']
[re.search(r'([A-Z]{3,4})(\d+)', id).groups() for id in ids]
#[('ICIC', '889150'), ('FED', '889150'), ('MFL', '541606')]
As suggested by Martijn, re.search() is the preferred way.
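If each string must consist of exactly the letters followed by the digits, re.fullmatch() rejects inputs with extra characters instead of silently matching a substring. A sketch, assuming 3 or 4 uppercase ASCII letters as stated above; `split_code` is a hypothetical helper name.

```python
import re

CODE = re.compile(r'([A-Z]{3,4})(\d+)')

def split_code(s):
    """Return (letters, digits), or raise ValueError if s is not 3-4 letters + digits."""
    m = CODE.fullmatch(s)  # the whole string must match, not just a part of it
    if m is None:
        raise ValueError(f"unexpected format: {s!r}")
    return m.groups()

for code in ['ICIC889150', 'FED889150', 'MFL541606']:
    print(split_code(code))
# ('ICIC', '889150')
# ('FED', '889150')
# ('MFL', '541606')
```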

python regular expression substitute

I need to find the value of "taxid" in a large number of strings similar to the one given below. For this particular string, the 'taxid' value is '9606'. I need to discard everything else. The "taxid" may appear anywhere in the text, but will always be followed by a ":" and then a number.
score:0.86|taxid:9606(Human)|intact:EBI-999900
How do I write a regular expression for this in Python?
>>> import re
>>> s = 'score:0.86|taxid:9606(Human)|intact:EBI-999900'
>>> re.search(r'taxid:(\d+)', s).group(1)
'9606'
If there are multiple taxids, use re.findall, which returns a list of all matches:
>>> re.findall(r'taxid:(\d+)', s)
['9606']
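import re
lines = ['score:0.86|taxid:9606(Human)|intact:EBI-999900']  # sample input
for line in lines:
    match = re.match(r".*?\btaxid:(\d+)", line)  # capture only the digits after "taxid:"
    if match:  # re.match returns None if the line has no taxid
        print(match.group(1))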
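For a large number of strings, compiling the pattern once and collecting every taxid per string is a small extension of the answers above. A sketch; `extract_taxids` and the sample strings other than the one from the question are made up for illustration.

```python
import re

# Compiled once, reused for every line
TAXID = re.compile(r'taxid:(\d+)')

def extract_taxids(lines):
    """Map each input string to the list of taxid numbers it contains (as strings)."""
    # With one capturing group, findall returns just the captured digits
    return [TAXID.findall(line) for line in lines]

lines = [
    'score:0.86|taxid:9606(Human)|intact:EBI-999900',
    'taxid:10090|score:0.5',   # hypothetical sample
    'score:0.1|no id here',    # hypothetical sample without a taxid
]
print(extract_taxids(lines))  # [['9606'], ['10090'], []]
```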
