Using regex to get multiple numbers in a string in Python

I have 3 strings
"1:1 (1 stock – 1 voting right)"
"1 stock – 1 voting right"
"01 stock – 01 voting right"
How can I get all the numbers in a string and return them in the format "num:num" using regex in Python? Can you give me a general regex that handles all cases?
Example:
For the first string, it should return 1:1 because it already contains 1:1.
For the other strings, it should also return 1:1.
Thank you

My two cents assuming:
At least two integers separated by at least a single non-digit;
Order of appearance of these integers is kept as input for 1st:2nd integer in final format;
Remove possible leading zeros from integers.
Try:
^\D*0*(\d+)\D+0*(\d+).*$
Replace with \1:\2. See an online demo
The above would match:
^ - Start-line anchor;
\D*0* - 0+ (Greedy) non-digits followed by 0+ zeros to prevent the following capture group from holding leading zeros;
(\d+) - A 1st capture group to catch the 1st integer of 1+ digits;
\D+0* - 1+ (Greedy) non-digits followed by 0+ zeros to prevent the following capture group from holding leading zeros;
(\d+) - A 2nd capture group to catch the 2nd integer of 1+ digits;
.*$ - 0+ (Greedy) characters up to the end-line anchor.
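In Python, the pattern and replacement above could be applied roughly as follows. This is a minimal sketch rather than part of the original answer: the helper name to_ratio is hypothetical, and returning None when fewer than two integers are found is an assumption.

```python
import re

# Pattern from the answer: two integers separated by non-digits,
# with leading zeros consumed outside the capture groups.
PATTERN = re.compile(r"^\D*0*(\d+)\D+0*(\d+).*$")


def to_ratio(text):
    """Return 'first:second' from the first two integers, or None (assumed behaviour)."""
    m = PATTERN.match(text)
    return m.expand(r"\1:\2") if m else None


samples = [
    "1:1 (1 stock – 1 voting right)",
    "1 stock – 1 voting right",
    "01 stock – 01 voting right",
]
ratios = [to_ratio(s) for s in samples]
print(ratios)  # ['1:1', '1:1', '1:1']
```

Match.expand applies the same \1:\2 template that the answer suggests as the replacement string.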

Related

Regex for Alternating Numbers

I am trying to write a regex pattern for phone numbers consisting of 9 fixed digits.
I want to identify numbers in which two digits alternate four times, such as 5XYXYXYXY
I used the sample below
number = 561616161
I tried the pattern below, but it is not accurate
^5(\d)(?=\d\1).+
Can someone point out what I am doing wrong?
I would use:
^(?=\d{9}$)\d*(\d)(\d)(?:\1\2){3}\d*$
Demo
Here is an explanation of the pattern:
^ from the start of the number
(?=\d{9}$) assert exactly 9 digits
\d* match optional leading digits
(\d) capture a digit in \1
(\d) capture another digit in \2
(?:\1\2){3} match the XY combination 3 more times
\d* more optional digits
$ end of the number
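To see why the attempt in the question is "not accurate", it can be compared with the pattern above in Python. This is a sketch under assumptions: the names ATTEMPT and ALTERNATING and the extra sample values are made up for illustration, and the number is converted to a string first because the question stores it as an integer.

```python
import re

# The OP's attempt: checks a single repetition only, then .+ accepts anything.
ATTEMPT = re.compile(r"^5(\d)(?=\d\1).+")
# The answer's pattern: exactly 9 digits containing XY repeated 4 times.
ALTERNATING = re.compile(r"^(?=\d{9}$)\d*(\d)(\d)(?:\1\2){3}\d*$")

number = 561616161
samples = [str(number), "561699999", "123456789"]

# Map each sample to (matched by attempt, matched by answer).
results = {s: (bool(ATTEMPT.match(s)), bool(ALTERNATING.match(s))) for s in samples}
for s, flags in results.items():
    print(s, flags)
# 561616161 (True, True)
# 561699999 (True, False)
# 123456789 (False, False)
```

The second sample shows the gap: the lookahead in the attempt only verifies that the first digit after 5 reappears two positions later, so the rest of the number is never checked.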
If you want to match the same pair of digits repeated 4 times, with 9 digits in total:
^(?=\d{9}$)\d*(\d\d)\1{3}\d*$
Explanation
^ Start of string
(?=\d{9}$) Positive lookahead, assert 9 digits till the end of the string
\d* Match optional digits
(\d\d)\1{3} Capture group 1, match 2 digits and then repeat what is captured in group 1 3 times
\d* Match optional digits
$ End of string
Regex demo
If you want to match a pair of digits repeated 4 times, where the 2 digits in the pair are not the same:
^(?=\d{9}$)\d*((\d)(?!\2)\d)\1{3}\d*$
Explanation
^ Start of string
(?=\d{9}$) Positive lookahead, assert 9 digits till the end of the string
\d* Match optional digits
( Capture group 1
(\d) Capture group 2, match a single digit
(?!\2)\d Negative lookahead, assert not the same char as captured in group 2. If that is the case, then match a single digit
) Close group 1
\1{3} Repeat the captured value of capture group 1 3 times
\d* Match optional digits
$ End of string
Regex demo
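The difference between the last two patterns shows up on inputs whose repeated pair consists of one digit. A small Python sketch, with hypothetical names SAME_PAIR and DISTINCT_PAIR and made-up sample values:

```python
import re

# Any pair of digits repeated 4 times within exactly 9 digits.
SAME_PAIR = re.compile(r"^(?=\d{9}$)\d*(\d\d)\1{3}\d*$")
# As above, but the two digits of the pair must differ.
DISTINCT_PAIR = re.compile(r"^(?=\d{9}$)\d*((\d)(?!\2)\d)\1{3}\d*$")

samples = ["561616161", "161616165", "511111111", "56161616"]
results = {s: (bool(SAME_PAIR.match(s)), bool(DISTINCT_PAIR.match(s))) for s in samples}
for s, flags in results.items():
    print(s, flags)
# 561616161 (True, True)
# 161616165 (True, True)    the extra digit may also come last
# 511111111 (True, False)   pair "11" is rejected by (?!\2)
# 56161616 (False, False)   only 8 digits, so the lookahead fails
```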
My first guess, based on the OP's own attempt ^5(\d)(?=\d\1).+ and without any additions of my own, was that a regex is needed to verify numbers starting with 5 and followed by 4 repetitions of the same pair of digits.
^5(\d\d)\1{3}$
Demo at regex101
The same idea with the "added guess" to disallow numbers whose repeated digits are all the same, e.g. 511111111:
^5((\d)(?!\2)\d)\1{3}$
Demo at regex101
Guessing further that 5 is a variable value, and assuming the one variable digit may sit at the start or at the end, with the idea of rejecting invalid values early (having already seen the other nice answers provided):
^(?=\d?(\d\d)\1{3})\d{9}$
Demo at regex101
Solution 3 combined with solution 2's assumption of two different digits in the first pair:
^(?=\d?((\d)(?!\2)\d)\1{3})\d{9}$
Demo at regex101
Solutions 3 and 4 are fairly obvious variations on #4thBird's nice answer, with the parts in a different order.
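The four solutions in this answer can be compared side by side in Python. This is only a sketch: the dictionary keys and the sample values are chosen for illustration and are not from the answer.

```python
import re

SOLUTIONS = {
    "1: starts with 5": re.compile(r"^5(\d\d)\1{3}$"),
    "2: starts with 5, distinct pair": re.compile(r"^5((\d)(?!\2)\d)\1{3}$"),
    "3: any extra digit": re.compile(r"^(?=\d?(\d\d)\1{3})\d{9}$"),
    "4: any extra digit, distinct pair": re.compile(r"^(?=\d?((\d)(?!\2)\d)\1{3})\d{9}$"),
}

samples = ["561616161", "511111111", "761616161", "161616165"]

# For each sample, record which of the four solutions accept it.
table = {s: [bool(p.match(s)) for p in SOLUTIONS.values()] for s in samples}
for s, row in table.items():
    print(s, row)
# 561616161 [True, True, True, True]
# 511111111 [True, False, True, False]
# 761616161 [False, False, True, True]
# 161616165 [False, False, True, True]
```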

Test for comma delimited string, ignoring any encountered periods, say from real numbers?

The following works for a simple comma-delimited string that has no periods, but it breaks as soon as the string contains periods, for example in real numbers.
pattern = re.compile(r"^(\w+)(,\s*\w+)*$")
How can I modify the above so that periods are accepted, while still validating that the given string is comma delimited?
A sample test string is "23,HIGH,1.0,LOW,1.0,HIGH,1.0,LOW,1.0".
\w matches "word" characters: letters, digits and _. It doesn't match a dot. If you want to match dots as well, use [\w.] instead of \w:
pattern = re.compile(r"^([\w.]+)(,\s*[\w.]+)*$")
You might also want to add -, if you expect negative numbers. To put - in a character class, you either have to backslash escape it or make sure it's either the first or last character in the class:
[-.\w]
[\w.-]
[\w\-.]
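A quick check of the original pattern against the widened character classes, as a sketch in Python. The constant names and the sample containing a negative number are assumptions added for illustration.

```python
import re

ORIGINAL = re.compile(r"^(\w+)(,\s*\w+)*$")
WITH_DOTS = re.compile(r"^([\w.]+)(,\s*[\w.]+)*$")
# The hyphen is placed last in the class so it is taken literally.
WITH_DOTS_AND_MINUS = re.compile(r"^([\w.-]+)(,\s*[\w.-]+)*$")

sample = "23,HIGH,1.0,LOW,1.0,HIGH,1.0,LOW,1.0"
negative = "23,HIGH,-1.0"

checks = {
    "original on sample": bool(ORIGINAL.match(sample)),
    "dots on sample": bool(WITH_DOTS.match(sample)),
    "dots on negative": bool(WITH_DOTS.match(negative)),
    "dots+minus on negative": bool(WITH_DOTS_AND_MINUS.match(negative)),
    "dots on empty field": bool(WITH_DOTS.match("23,,HIGH")),
}
print(checks)
```

With these inputs only the second and fourth checks are True: the original pattern stops at the first period, and an empty field between two commas is still rejected.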
If a value containing a dot can only be a number, and matching values made of dots alone is not desired, you can use an alternation to match either word characters or a number.
^(?:[+-]?\d*\.?\d+|\w+)(?:,(?:[+-]?\d*\.?\d+|\w+))*$
Explanation
^ Start of string
(?: Non capture group
[+-]?\d*\.?\d+ Match an optional + or -, then optional digits, optional dot and 1+ digits
| Or
\w+ Match 1+ word characters
) Close non capture group
(?: Non capture group
, Match the comma
(?:[+-]?\d*\.?\d+|\w+) The same pattern as in the first part
)* Close non capture group and optionally repeat to also match a single occurrence
$ End of string
Regex demo
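The stricter alternation can be exercised in Python like this. It is a sketch with made-up sample strings; note that, unlike the pattern in the question, this one has no \s* after the comma, so a space after a comma is rejected.

```python
import re

VALUES = re.compile(r"^(?:[+-]?\d*\.?\d+|\w+)(?:,(?:[+-]?\d*\.?\d+|\w+))*$")

samples = [
    "23,HIGH,1.0,LOW,1.0,HIGH,1.0,LOW,1.0",  # sample string from the question
    "-1.5,HIGH,+2,.5",                       # signed numbers and a bare fraction
    "1..0,LOW",                              # two dots: not a number
    "...,LOW",                               # dots only
    "23, HIGH",                              # whitespace after the comma
]
valid = [bool(VALUES.match(s)) for s in samples]
print(valid)  # [True, True, False, False, False]
```

The character-class version [\w.]+ would accept the third and fourth samples, which is the case this alternation is meant to exclude.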

Regex - Ignore if group has prefix

I am trying to capture 8 digit phone numbers in free text. This should be ignored if a particular string appears before.
My regex:
(\b(\+?001|002)?[-]?\d{4}(-|\s)?\d{4}\b)
To Capture:
+001 12345678
12345678
Not Capture:
TTT-12345678-123
TTT-12345678
I am trying to use negative look behind as below example:
\w*(?<!foo)bar
But the above works only if the regex doesn't have subsequent groups.
You may use
(?<!TTT-)(?<!\w)(?:\+?001|002)?[-\s]?\d{4}[-\s]?\d{4}\b
See the regex demo
Details
(?<!TTT-) - no TTT- allowed immediately on the left
(?<!\w) - no word char allowed immediately on the left
(?:\+?001|002)? - an optional non-capturing group matching 1 or 0 occurrences of +001, 001 or 002
[-\s]? - an optional - or whitespace
\d{4} - any four digits
[-\s]?\d{4} - an optional - or whitespace and any four digits
\b - a word boundary.
If the number can be glued to a word char on the right, replace the \b word boundary with the right-hand digit boundary, (?!\d).
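A short Python check of the pattern against the four samples from the question. This is a sketch; the name PHONE and the found dictionary are introduced here for illustration only.

```python
import re

# Two lookbehinds: no "TTT-" prefix and no word character directly before the match.
PHONE = re.compile(r"(?<!TTT-)(?<!\w)(?:\+?001|002)?[-\s]?\d{4}[-\s]?\d{4}\b")

samples = ["+001 12345678", "12345678", "TTT-12345678-123", "TTT-12345678"]

found = {}
for s in samples:
    m = PHONE.search(s)
    found[s] = m.group() if m else None

for s, hit in found.items():
    print(repr(s), "->", repr(hit))
# '+001 12345678' -> '+001 12345678'
# '12345678' -> '12345678'
# 'TTT-12345678-123' -> None
# 'TTT-12345678' -> None
```

Both lookbehinds are needed: (?<!TTT-) blocks a match starting right after the prefix, and (?<!\w) stops the engine from skipping one digit ahead and matching from the middle of the number.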

Searching multiple repeating patterns of text using regular expressions

I am trying to search for texts from a document, which have repeating portions and occur multiple times in the document. However, using the regex.match, it shows only the first match from the document and not others.
The patterns which I want to search looks like:
clauses 5.3, 12 & 15
clause 10 C, 10 CA & 10 CC
The following line shows the regular expression which I am using.
regex_crossref_multiple_1=r'(clause|Clause|clauses|Clauses)\s*\d+[.]?\d*\s*[a-zA-Z]*((,|&|and)\s*\d+[.]?\d*\s*[A-Z]*)+'
The code used for matching and the results are shown below:
cross=regex.search(regex_crossref_multiple_1,des)
(des is string containing text)
For printing the results, I am using print(cross.group()).
Result:
clauses 5.3, 12 & 15
However, there are other patterns as well in des which I am not getting in the result.
Please let me know what can be the problem.
The input string (des) can be found at the following link.
https://docs.google.com/document/d/1LPmYaD6VE724OYoXDGPfInvx8WTu5JfrTqTOIv8zAlg/edit?usp=sharing
In case, the contractor completes the work ahead of stipulated date of
completion or justified extended date of completion as determined
under clauses 5.3, 12 & 15, a bonus # 0.5 % (zero point five per cent) of
the tendered value per month computed on per day basis, shall be
payable to the contractor, subject to a maximum limit of 2 % (two
percent) of the tendered value. Provided that justified time for extra
work shall be calculated on pro-rata basis as cost of extra work excluding
amount payable/ paid under clause 10 C, 10 CA & 10 CC X stipulated
period /tendered value. The amount of bonus, if payable, shall be paid
along with final bill after completion of work. Provided always that
provision of the Clause 2A shall be applicable only when so provided in
‘Schedule F’
You could match clause or clauses followed by digits with an optional decimal part and optional chars A-Z, and then use a repeating group to match each following comma and number.
For the last part of the pattern, you can optionally match either a ,, & or and, followed by digits and optional chars A-Z.
\b[Cc]lauses?\s+\d+(?:\.\d+)?(?:\s*[A-Z]+)?(?:,\s+\d+(?:\.\d+)?(?:\s*[A-Z]+)?)*(?:\s+(?:[,&]|and)\s+\d+(?:\.\d+)?(?:\s*[A-Z]+)?)?\b
Explanation
\b Word boundary
[Cc]lauses?\s+\d+(?:\.\d+)? Match clauses followed by digits and optional decimal part
(?:\s*[A-Z]+)? Optionally match whitespace chars and 1+ chars A-Z
(?: Non capture group
,\s+\d+(?:\.\d+)? Match a comma, digits and optional decimal part
(?:\s*[A-Z]+)? Optionally match whitespace chars and 1+ chars A-Z
)* Close group and repeat 0+ times
(?: Non capture group
\s+(?:[,&]|and) Match 1+ whitespace char and either ,, & or and
\s+\d+(?:\.\d+)? Match 1+ whitespace chars, 1+ digits with an optional decimal part
(?:\s*[A-Z]+)? Match optional whitespace chars and 1+ chars A-Z
)? Close group and make optional
\b Word boundary
Regex demo
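On the question of getting only the first result: search returns a single match object for the first occurrence, so collecting every occurrence needs finditer (or findall). The sketch below uses the standard-library re module, which offers the same search and finditer calls the question uses; the des string is an abridged excerpt of the text above, shortened for illustration.

```python
import re

CLAUSE_REF = re.compile(
    r"\b[Cc]lauses?\s+\d+(?:\.\d+)?(?:\s*[A-Z]+)?"
    r"(?:,\s+\d+(?:\.\d+)?(?:\s*[A-Z]+)?)*"
    r"(?:\s+(?:[,&]|and)\s+\d+(?:\.\d+)?(?:\s*[A-Z]+)?)?\b"
)

# Abridged excerpt of the document text (assumption: shortened for this example).
des = (
    "completion as determined under clauses 5.3, 12 & 15, a bonus shall be payable. "
    "Amount payable/ paid under clause 10 C, 10 CA & 10 CC X stipulated period. "
    "Provision of the Clause 2A shall be applicable."
)

first = CLAUSE_REF.search(des).group()                   # first occurrence only
matches = [m.group() for m in CLAUSE_REF.finditer(des)]  # every occurrence
print(first)
print(matches)
# clauses 5.3, 12 & 15
# ['clauses 5.3, 12 & 15', 'clause 10 C, 10 CA & 10 CC', 'Clause 2A']
```

Because this pattern contains only non-capturing groups, findall would return the same full-match strings; with the capturing groups of the pattern in the question, findall would return tuples of groups instead.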

How to extract different types of sub-strings from a string in python using regular expression?

As the title says, I'm supposed to get some sub-strings from a string which looks like this: "-23/45 + 14/9". What I need to get from that string is the four numbers and the operator in the middle. What confuses me is how to do this with only one regular expression pattern. Below is the requirement:
Write a regular expression patt that can be used to extract
(numerator,denominator,operator,numerator,denominator)
from a string containing a fraction, an arithmetic operator, and a fraction. You may
assume there is a space before and after the arithmetic operator and no spaces
surrounding the / character in a fraction. And all fractions will have a numerator and
denominator.
Example:
>>> s = "-23/45 + 14/9"
>>> re.findall(patt,s)
[( "-23","45","+","14","9")]
>>> s = "-23/45 * 14/9"
>>> re.findall(patt,s)
[( "-23","45","*","14","9")]
In general, your code should handle any of the operators +, -, * and /.
Note: see the operator module for the two-argument function equivalents of the arithmetic
(and other) operators
My problem is how to do this with only one regular expression. I have thought about capturing the sub-strings that contain digits and stopping at any character which is not a digit, but this will miss the operator in the middle. Another idea is to include all the operators (+ - * /) and stop at white space, but this will keep the first two numbers together and the last two numbers together. Can anybody point me in the right direction on how to solve this with only one regular expression pattern? Thanks a lot!
Try this regex:
(-?\d+)\s*\/\s*(\d+) *([+*\/-])\s*(-?\d+)\s*\/(\d+)
Click for regex Demo
You can extract the required information from Group 1 to Group 5
Explanation:
(-?\d+) - matches an optional - followed by 1+ occurrences of a digit and capture it in Group 1
\s*\/\s* - matches 0+ occurrences of a whitespace followed by a / followed by 0+ occurrences of a whitespace
(\d+) - matches 1+ occurrences of a digit and capture it in Group 2
" *" (a space followed by *) - matches 0+ occurrences of a space
([+*\/-]) - matches one of the operators in +,-,/,* and captures it in Group 3
\s* - matches 0+ occurrences of a whitespace
(-?\d+) - matches an optional - followed by 1+ occurrences of a digit and capture it in Group 4
\s*\/ - matches 0+ occurrences of a whitespace followed by /
(\d+) - matches 1+ occurrences of a digit and capture it in Group 5
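Used with re.findall as in the assignment, the pattern yields one 5-tuple per expression. The sketch below also follows the assignment's hint about the operator module; the OPS table, the evaluate helper and the use of fractions.Fraction are assumptions added for illustration, not part of the original answer.

```python
import operator
import re
from fractions import Fraction

patt = r"(-?\d+)\s*\/\s*(\d+) *([+*\/-])\s*(-?\d+)\s*\/(\d+)"

# Hypothetical lookup table from operator symbol to its two-argument function.
OPS = {"+": operator.add, "-": operator.sub, "*": operator.mul, "/": operator.truediv}


def evaluate(expr):
    """Evaluate 'a/b op c/d' exactly (illustrative helper, assumes one valid expression)."""
    n1, d1, op, n2, d2 = re.findall(patt, expr)[0]
    return OPS[op](Fraction(int(n1), int(d1)), Fraction(int(n2), int(d2)))


parts = re.findall(patt, "-23/45 + 14/9")
print(parts)                      # [('-23', '45', '+', '14', '9')]
print(evaluate("-23/45 + 14/9"))  # 47/45
```

The division operator works as well, because the first (\d+) stops at the space before the operator, so "-23/45 / 14/9" is split into ('-23', '45', '/', '14', '9').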
