Python - Adding comments into a triple-quote string

Is there a way to add comments into a multiline string, or is it not possible? I'm trying to write data into a csv file from a triple-quote string. I'm adding comments in the string to explain the data. I tried doing this, but Python just assumed that the comment was part of the string.
"""
1,1,2,3,5,8,13 # numbers in the Fibonacci sequence
1,4,9,16,25,36,49 # numbers of the square number sequence
1,1,2,5,14,42,132,429 # numbers in the Catalan number sequence
"""

No, it's not possible to have comments in a string. How would Python know that the hash sign # in your string is supposed to be a comment, and not just a hash sign? It makes a lot more sense to interpret the # character as part of the string than as a comment.
As a workaround, you can make use of automatic string literal concatenation:
s = (
"1,1,2,3,5,8,13\n" # numbers in the Fibonacci sequence
"1,4,9,16,25,36,49\n" # numbers of the square number sequence
"1,1,2,5,14,42,132,429" # numbers in the Catalan number sequence
)
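A quick way to check that the comments really are dropped: the parenthesized literals below form one ordinary string with no "#" in it. This is a minimal sketch; the name data is only for illustration.

```python
# Adjacent string literals inside parentheses are joined at compile time,
# so each trailing "#" comment is an ordinary Python comment, not string content.
data = (
    "1,1,2,3,5,8,13\n"         # numbers in the Fibonacci sequence
    "1,4,9,16,25,36,49\n"      # numbers of the square number sequence
    "1,1,2,5,14,42,132,429\n"  # numbers in the Catalan number sequence
)

print("#" in data)           # False
print(data.splitlines()[2])  # 1,1,2,5,14,42,132,429
```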

If you add comments into the string, they become part of the string. If that weren't true, you'd never be able to use a # character in a string, which would be a pretty serious problem.
However, you can post-process the string to remove comments, as long as you know this particular string isn't going to have any other # characters.
For example:
import re
s = """
1,1,2,3,5,8,13 # numbers in the Fibonacci sequence
1,4,9,16,25,36,49 # numbers of the square number sequence
1,1,2,5,14,42,132,429 # numbers in the Catalan number sequence
"""
s = re.sub(r'#.*', '', s)
If you also want to remove trailing whitespace before the #, change the regex to r'\s*#.*'.
If you don't understand what these regexes are matching and how, see regex101 for a nice visualization.
If you plan to do this many times in the same program, you can even use a trick similar to the popular D = textwrap.dedent idiom:
import functools
import re
C = functools.partial(re.sub, r'#.*', '')
And now:
s = C("""
1,1,2,3,5,8,13 # numbers in the Fibonacci sequence
1,4,9,16,25,36,49 # numbers of the square number sequence
1,1,2,5,14,42,132,429 # numbers in the Catalan number sequence
""")
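Since the goal in the question is CSV data, here is a minimal sketch that strips the comments and then parses the result with the standard csv module, assuming (as above) that the data itself never contains a "#". It uses [ \t]* instead of \s* before the "#" so that a line consisting only of a comment cannot swallow the preceding newline; the names COMMENT, raw, clean and out are illustrative.

```python
import csv
import io
import re

# Remove optional spaces/tabs, the "#", and the rest of the line.
# "." does not match a newline, so each line keeps its own line break.
COMMENT = re.compile(r'[ \t]*#.*')

raw = """
1,1,2,3,5,8,13 # numbers in the Fibonacci sequence
1,4,9,16,25,36,49 # numbers of the square number sequence
1,1,2,5,14,42,132,429 # numbers in the Catalan number sequence
"""

# Drop the comments, then the blank first/last lines of the triple-quoted string.
clean = COMMENT.sub('', raw).strip()

# Parse the cleaned text as CSV rows.
rows = list(csv.reader(io.StringIO(clean)))
print(rows[0])  # ['1', '1', '2', '3', '5', '8', '13']

# Write the rows back out as CSV; with a real file you would open it
# with newline='' and pass that file object to csv.writer instead.
out = io.StringIO()
csv.writer(out).writerows(rows)
```

With the default dialect, csv.writer ends each row with "\r\n", which is the usual CSV line terminator.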

Split string by special pattern

I have a long string, which can consist of a few sub-strings (not always: sometimes it's one string, sometimes there are 4 sub-strings stuck together). Each one starts with a byte length, for example 4D or 4E. Below is an example big string which consists of 4 sub-strings:
4D44B9096268182113077A95C84005D55FCD9D79476DDA4346C7EF1F4F07D4B46693F51812C8B74E4E44B9097368182113077A340040058D55E7E8D3924C57182F6E07A4D3617E100D1652169668636CB54E44B9096868182113077A37004005705FE9461E85F69A4C8E1B00CE03E6337B8F3D853A51C447B9694E44B9096668182113077AA400400555C9FAADA21F1EC93DBD5B579E4E07DDAF75A45D095E72010DBB
After splitting by pattern, the output SHOULD BE:
4D44B9096268182113077A95C84005D55FCD9D79476DDA4346C7EF1F4F07D4B46693F51812C8B74E
4E44B9097368182113077A340040058D55E7E8D3924C57182F6E07A4D3617E100D1652169668636CB5
4E44B9096868182113077A37004005705FE9461E85F69A4C8E1B00CE03E6337B8F3D853A51C447B969
4E44B9096668182113077AA400400555C9FAADA21F1EC93DBD5B579E4E07DDAF75A45D095E72010DBB
Each long string has an ID - in this case it's 44B909 - and each line has this ID right after the length bytes. My original code took the first 6 characters (4D44B9) and split the string on them. It works in 95% of cases, where EACH line has the same length, for example 4D. The problem is that the lines don't always have the same length, as in the string above. Here is my code:
def repeat():
    string = input('Please paste string below:'+'\n')
    code = string[:6]
    print('\n')
    print('SPLITTED:')
    string = string.replace(code, '\n'+'\n'+code)
    print(string)

while True:
    repeat()
When you paste this one long string, it won't split it, because the first line has 4D and the rest have 4E. I'd like it to "ignore" the first 2 characters (4E) for a moment and take the next six characters as the split pattern. The output should be the 4 lines above! I changed the code a bit, but I got some strange results, like below:
44B9096268182113077A95C84005D55FCD9D79476DDA4346C7EF1F4F07D4B46693F51812C8B74E
44B9097368182113077A340040058D55E7E8D3924C57182F6E07A4D3617E100D1652169668636CB54E
44B9096868182113077A37004005705FE9461E85F69A4C8E1B00CE03E6337B8F3D853A51C447B9694E
44B9096668182113077AA400400555C9FAADA21F1EC93DBD5B579E4E07DDAF75A45D095E72010DBB
How can I make it work?
If the first two characters encode the string's length in hex, why not use that to decide how much of the string to consume?
However, the prefixes in your example don't seem to match the lengths directly: hex 4D is decimal 77, but the first sub-string is 80 hex characters long (78 after the prefix), while each 4E sub-string is 82 characters long (80 after the prefix).
For the question about how to split on a slightly variable pattern, a regular expression seems like a good solution.
import re
splitted = re.split(r'4[DE](?=44B909)', string)
In so many words, this says "use 4D or 4E as the delimiter to split on, but only if it's immediately followed by 44B909".
(There will be an empty string before the first value, but that's easy to shift off; or change the regex to r'(?<!^)4[DE](?=44B909)'.)
If you don't want to discard anything, include everything in the lookahead:
splitted = re.split(r'(?<!^)(?=4[DE]44B909)', string)
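The question also asks to "ignore" the first two characters and take the next six as the pattern. A minimal sketch of that idea, under two assumptions not confirmed by the question: every record starts with exactly two hex characters, and all records share the six-character ID found at positions 2 to 7 of the first one. Allowing any two hex digits (not only 4D/4E) also makes it more likely to split wrongly if the ID happens to appear inside the data. The name split_records is hypothetical, and zero-width splits need Python 3.7 or later.

```python
import re

# The sample from the question, one record per line (adjacent literals are concatenated).
SAMPLE = (
    "4D44B9096268182113077A95C84005D55FCD9D79476DDA4346C7EF1F4F07D4B46693F51812C8B74E"
    "4E44B9097368182113077A340040058D55E7E8D3924C57182F6E07A4D3617E100D1652169668636CB5"
    "4E44B9096868182113077A37004005705FE9461E85F69A4C8E1B00CE03E6337B8F3D853A51C447B969"
    "4E44B9096668182113077AA400400555C9FAADA21F1EC93DBD5B579E4E07DDAF75A45D095E72010DBB"
)

def split_records(data):
    # Assumption: skip the 2-character length prefix and read the 6-character ID.
    record_id = data[2:8]
    # Split at the empty position before "<two hex digits><ID>",
    # but never at the very start of the string.
    pattern = r'(?<!^)(?=[0-9A-F]{2}' + re.escape(record_id) + ')'
    return re.split(pattern, data)

for record in split_records(SAMPLE):
    print(record)
```

In the question's loop, the replace/print lines could become print('\n'.join(split_records(string))).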

Regex substitution that returns a trimmed version of the input?

I am dealing with a variety of "five and two" strings that refer to an individual. The strings have the first five letters of an individual's last name, and then the first two letters of the individual's first name. Each string concludes with a two-digit numeral that acts as a "tiebreaker" if two or more individuals have the same "five and two." The numerals are to be considered strings. If an individual's last name is shorter than five letters, the entire last name is included in the string with no extra characters to fill the gap.
Examples:
adamsjo02
allenje01
alstoga01
ariasge01
aucoide01
ayraujo01
belkti01 #This individual has a last name with only four letters
I wish to convert each of these strings into a "four and one" string that has a three digit numeral. The result of the above examples after being converted should look like this:
adamj002
allej001
alstg001
ariag001
aucod001
ayraj001
belkt001
I am using Python throughout my project. I suspect that a regex substitution would be the best course of action to achieve what I need. I have little experience with regexes, and have come up with this so far to match the strings:
re.compile(r'(/w){2,5}(/w/w)(/w/w)')
While this does not work for me, it does show that I see three groupings in each string: the last name portion, the first name portion, and the numerals (to be treated as strings). Each of those groupings ought to undergo a change, with the exception of the last name of any individual whose last name has four or fewer letters.
You can do this with the proper escape character \ (instead of /) and an f-string:
import re
text = '''adamsjo02
allenje01
alstoga01
ariasge01
aucoide01
ayraujo01
belkti01
maja01'''
p = re.compile(r"(\w{2,5})(\w{2})(\d{2})")
output = [f"{m.group(1):_<4.4}{m.group(2):1.1}{m.group(3):0>3}" for m in map(p.search, text.splitlines())]
print(output)
# ['adamj002', 'allej001', 'alstg001', 'ariag001', 'aucod001', 'ayraj001', 'belkt001', 'ma__j001']
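Since the question title asks for a regex substitution specifically, here is a minimal sketch using re.sub with a replacement function. It is a variation, not the answer's code: the pattern is anchored, and it assumes the names are lowercase ASCII, so it uses [a-z] instead of \w. The name four_one is hypothetical.

```python
import re

# Anchored pattern: 1-5 last-name letters, 2 first-name letters, 2 digits.
# Assumption: names are lowercase ASCII letters.
FIVE_TWO = re.compile(r'^([a-z]{1,5})([a-z]{2})(\d{2})$')

def four_one(code: str) -> str:
    # Keep at most 4 last-name letters and the first initial,
    # and zero-pad the tiebreaker to 3 digits.
    # Strings that don't match the pattern are returned unchanged.
    return FIVE_TWO.sub(lambda m: m[1][:4] + m[2][0] + m[3].zfill(3), code)

for code in ["adamsjo02", "belkti01", "jorma03"]:
    print(four_one(code))
```

Backtracking handles the short last names: for "belkti01" the first group first grabs "belkt", fails, and settles on "belk".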
In this case, since you have a very specific format, I'd say a regex is not necessary, though it does the job. So here is an alternative solution that doesn't use one.
def to_four_one(code: str) -> str:
    last, first, number = code[:-4][:4], code[-4:-2], int(code[-2:])
    return f"{last}{first[-2]}{number:03}"
It's a simple function that rearranges the elements of the string. It gets the last name, first name and number as separate elements and rewrites them in the new format (clipping the last name to at most four characters and the first name to one character, and formatting the number as three digits).
Usage below. I added two more names with even fewer characters to show that it doesn't break in those cases.
codes = [
    "adamsjo02",
    "allenje01",
    "alstoga01",
    "ariasge01",
    "aucoide01",
    "ayraujo01",
    "belkti01",
    "jorma03",
    "baka02",
]
for code in codes:
    print(to_four_one(code))

Output:
adamj002
allej001
alstg001
ariag001
aucod001
ayraj001
belkt001
jorm003
bak002

regex extraction with comma and thousand separators of various sizes [duplicate]

I am wondering what a regular expression for validating the number format of the German culture would look like.
In German, comma is used as decimal mark and dot is used to separate thousands.
Therefore:
1.000 equals 1000
1,000 equals 1
1.000,89 equals 1000.89
1.000.123.456,89 equals 1000123456.89
The real trick, it seems to me, is to make sure that there can be several dots, optionally followed by a comma separator.
This is the regex I would use:
^-?\d{1,3}(?:\.\d{3})*(?:,\d+)?$
And this is a code example to interpret it as a valid floating point (notice the parseFloat() after the string replacements).
Edit: as mentioned in Severin Klug's answer, the code below assumes that the numbers are known to be in German format. Attempting to "detect" whether a string contains a German-format or US-format number is not trivial and is out of scope for this question: '1.234' is valid in both formats but with different actual values, so without context it is impossible to know for sure which format was meant.
var numbers = ['1.000', '1,000', '1.000,89', '1.000.123.456,89'];
document.getElementById('out').value=numbers.map(function(str) {
return parseFloat(str.replace(/\./g, '').replace(',', '.'));
}).join('\n');
<textarea id="out" rows="10" style="width:100%"></textarea>
I would have posted this as a comment, but I don't have enough reputation.
@funkwurm, your post https://stackoverflow.com/a/28361329/7329611 contains this JavaScript:
var numbers = ['1.000', '1,000', '1.000,89', '1.000.123.456,89', '1.2'];
numbers.map(function(str) {
return parseFloat(str.replace(/\./g, '').replace(',', '.'));
}).join('\n');
which is meant to convert German numbers to English/international ones. It does so for every number with exactly three digits after each German thousands dot, like the numbers in your example array. BUT, and this is the critical use-case error: it also deletes dots from any other string that does not have three digits after them.
So if you insert a string like '1.2' it returns 12, and if you insert '1.23' it returns 123.
This is very critical behavior if anyone takes the above snippet and assumes it will correctly convert any given number into English format, because numbers that are already in English format will be corrupted! So please be careful.
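One way to avoid that corruption is to validate before converting. A minimal Python sketch, reusing the anchored pattern from the first answer with re.fullmatch and parsing into Decimal to avoid float rounding; the name parse_german is hypothetical. It still assumes the input is known to be German-formatted: '1.234' remains ambiguous and is read as 1234, and a plain '1000' without a thousands dot is rejected, just as the original pattern rejects it.

```python
import re
from decimal import Decimal

# Optional minus, 1-3 leading digits, groups of exactly three digits after
# each dot, and an optional comma followed by the decimal digits.
GERMAN_NUMBER = re.compile(r'-?\d{1,3}(?:\.\d{3})*(?:,\d+)?')

def parse_german(text: str) -> Decimal:
    # Validate first, so '1.2' is rejected instead of silently becoming 12.
    if not GERMAN_NUMBER.fullmatch(text):
        raise ValueError(f"not a German-format number: {text!r}")
    # Drop the thousands dots, then turn the decimal comma into a point.
    return Decimal(text.replace('.', '').replace(',', '.'))

print(parse_german('1.000.123.456,89'))  # 1000123456.89
```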
This regex should work:
([0-9]{1,3}(?:\.[0-9]{3})*(?:\,[0-9]+)?)
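A good regex would be something like this:
using System.Globalization;
using System.Text.RegularExpressions;

// Verbatim string (@"...") so \d and \. reach the regex engine unchanged.
Regex regex = new Regex(@"-?\d{1,3}(?:\.\d{3})*(?:,\d+)?");
Match match = regex.Match(input);   // input: the string to parse
decimal result = decimal.Zero;
if (match.Success)
    result = decimal.Parse(match.Value, new CultureInfo("de-DE"));
result then holds the German number as a parsed decimal value.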
Try this; it will match your inputs:
^(\d+\.)*\d+(,\d+)?
This regex works for positive numbers:
/^[0-9]{0,3}(\.[0-9]{3})*(,[0-9]{0,2})?$/
Breakdown
[0-9]{0,3} - this section allows zero to 3 digits. An empty value is valid, and '1', '26', '789' are valid. '1589' is invalid.
(\.[0-9]{3})* - this section allows zero or more dots; if there's a dot, there must be exactly three digits after it. '2.589' is valid. '2.5896' and '2.45' are invalid.
(,[0-9]{0,2})? - this section allows zero or one comma, with zero to 2 digits after it. '25,', '25,5', '25,45' are valid. '25,456' and '25,45,8' are invalid.
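The breakdown can be checked directly. A small Python sketch runs the pattern above (without the JavaScript slashes) against the examples from each bullet. Note one side effect of allowing an empty leading group: strings such as '.589' or ',5' are accepted as well.

```python
import re

# The positive-number pattern from the answer above.
PATTERN = re.compile(r'^[0-9]{0,3}(\.[0-9]{3})*(,[0-9]{0,2})?$')

valid = ['', '1', '26', '789', '2.589', '25,', '25,5', '25,45']
invalid = ['1589', '2.5896', '2.45', '25,456', '25,45,8']

for s in valid + invalid:
    print(repr(s), bool(PATTERN.match(s)))
```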
Hope this is helpful

How can I generate random numbers based on a pattern from a given list of numbers?

I'm trying to generate x random numbers based on lists I will provide (each containing as many numbers as I want generated).
So far I have this code:
import random

numbers = []  # renamed from "list" so the built-in list type isn't shadowed
while len(numbers) < 10:
    x = random.randint(1, 100)
    if x not in numbers:
        numbers.append(x)
numbers.sort()
print(numbers)
The question is, how do I input the lists I have so that Python can read some pattern (for lack of a better word) and generate numbers?
I tried Googling it but found nothing so far.
Thanks.
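As an aside that doesn't address the "pattern" part: the loop in the question draws values until it has ten distinct ones. The standard library's random.sample does that in one step. A minimal sketch with the same range and count as the question's code:

```python
import random

# random.sample picks 10 distinct values from 1..100 (range excludes 101),
# replacing the draw/check-duplicates/append loop; sorted() returns a new list.
numbers = sorted(random.sample(range(1, 101), 10))
print(numbers)
```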
With Python, a file can be read and split on whitespace into a list using str.split() with no argument, like this:
tokens = []
with open('filename') as f:
    for line in f:
        # split() with no argument splits on runs of whitespace and ignores
        # leading/trailing whitespace, so no strip() is needed
        for token in line.split():
            tokens.append(token)
If the file uses a different separator, such as a colon, you can split on it the same way when the separator is a single character or a fixed sequence of characters, e.g. split(':') or split('==='); for anything more flexible, split on a regular expression with re.split('some_regex', 'text2split'). It can also be useful to validate the format of numeric data so that invalid values don't cause errors or other undesirable behavior in later processing.
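A short illustration of the difference between the two kinds of split, using a made-up line with mixed separators:

```python
import re

line = "1,2;3  4"  # hypothetical input mixing commas, semicolons and spaces

# str.split with a fixed separator only splits on that exact text.
by_comma = line.split(',')
print(by_comma)   # ['1', '2;3  4']

# re.split with a character class handles all three separators; the "+"
# treats a run such as two spaces as a single separator.
by_regex = re.split(r'[,;\s]+', line)
print(by_regex)   # ['1', '2', '3', '4']
```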
Below is a complete example that extracts comma-separated numbers from a file and appends them to a list, keeping only the elements that match one of three forms defined by regular expressions: (1) '\d+' (one or more decimal digits); (2) '\d+\.\d*' (one or more decimal digits, followed by a period, followed by zero or more decimal digits); or (3) '\d*\.\d+' (zero or more decimal digits, followed by a period, followed by one or more decimal digits). In this example the regex for these forms is compiled once, before the loop. The three alternatives are wrapped in a non-capturing group so that ^ and $ apply to all of them; without the group, ^ binds only to the first alternative and $ only to the last, so a value such as '12abc' would be accepted.
import re

numList = []
# ^ and $ anchor the whole group, so each element must be entirely a number
regex = re.compile(r'^(?:\d+|\d+\.\d*|\d*\.\d+)$')
with open('filename') as f:
    for data in f:
        tmpList = data.strip().split(',')  # same result as re.split(',', data.strip())
        for element in tmpList:
            if regex.match(element):
                numList.append(element)
After running this, the numbers in numList can be iterated like this:
for item in numList:
    print(item)
    # do other things, such as calculations; the items are still strings,
    # so convert them first, e.g. with float(item)
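Putting the pieces together, here is a minimal sketch that also converts the accepted values to float. To keep it self-contained it reads from an io.StringIO standing in for the file; read_numbers and the sample text are illustrative. It strips spaces around each element and uses fullmatch, which tries all three alternatives against the whole element.

```python
import io
import re

# The same three numeric forms; fullmatch requires the whole element to match.
NUMBER = re.compile(r'\d+|\d+\.\d*|\d*\.\d+')

def read_numbers(stream):
    numbers = []
    for line in stream:
        for element in line.strip().split(','):
            element = element.strip()  # tolerate "1, 2" style spacing
            if NUMBER.fullmatch(element):
                numbers.append(float(element))
    return numbers

sample = io.StringIO("1,2.5,abc\n.75,3.,12x\n")  # stands in for open('filename')
print(read_numbers(sample))  # [1.0, 2.5, 0.75, 3.0]
```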

python: regular expressions, how to match a string of undefined length which has a structure and finishes with a specific group

I need to create a regexp to match strings like this: 999-123-222-...-22
The string can end with &Ns=(any number) or without it. So valid strings for me are:
999-123-222-...-22
999-123-222-...-22&Ns=12
And the following is not valid:
999-123-222-...-22&N=1
I have been trying for several hours but haven't managed to solve it; I really need some help.
Not sure if you want to literally match 999-123-222-...-22 or if that can be any sequence of numbers and dashes. Here are two different regexes:
/^[\d-]+(&Ns=\d+)?$/
/^999-123-222-\.\.\.-22(&Ns=\d+)?$/
The key idea is the (&Ns=\d+)?$ part, which matches an optional &Ns=<digits>, and is anchored to the end of the string with $.
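To see how the two patterns differ, here they are in Python (without the JavaScript-style slashes), run against the question's examples plus one made-up string with a real number in place of "...". The first pattern only allows digits and dashes, so it rejects a literal "...".

```python
import re

ANY_DIGITS = re.compile(r'^[\d-]+(&Ns=\d+)?$')
LITERAL = re.compile(r'^999-123-222-\.\.\.-22(&Ns=\d+)?$')

tests = [
    '999-123-222-...-22',
    '999-123-222-...-22&Ns=12',
    '999-123-222-...-22&N=1',
    '999-123-222-456-22&Ns=7',  # hypothetical: "..." replaced by digits
]
for s in tests:
    print(s, bool(ANY_DIGITS.match(s)), bool(LITERAL.match(s)))
```

One Python-specific detail: $ also matches just before a trailing newline, so use re.fullmatch (and drop ^ and $) if a string ending in "\n" must be rejected.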
If you only want to allow the exact strings 999-123-222-...-22 and 999-123-222-...-22&Ns=12, you are better off using a plain string comparison.
If you want to allow any numbers between - you can use the regex:
^(\d+-){3}[.]{3}-\d+(&Ns=\d+)?$
If the numbers must be of only 3 digits and the last number of only 2 digits you can use:
^(\d{3}-){3}[.]{3}-\d{2}(&Ns=\d{2})?$
This looks like a phone number with extension information.
Why not make things simpler for yourself (and anyone who has to read this later) and split the input rather than use a complicated regex?
s = '999-123-222-...-22&Ns=12'
parts = s.split('&Ns=') # splits on '&Ns=' and removes it
If the piece before the "&" is a phone number, you could do another split and get the area code etc into separate fields, like so:
phone_parts = parts[0].split('-') # breaks up the digit string and removes the '-'
area_code = phone_parts[0]
The portion after the optional '&Ns=' can be checked for being numeric with the string method isdigit(), which returns True if all characters in the string are digits and there is at least one character, and False otherwise.
if len(parts) > 1:
    extra_digits_ok = parts[1].isdigit()
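The split-based checks can be combined into a single validator. A minimal sketch, assuming "..." in the question stands for more dash-separated numbers; is_valid is a hypothetical name, and it uses str.partition (a close relative of split) so the "&Ns=" separator is found at most once. Keep in mind that str.isdigit() also accepts some non-ASCII digit characters such as '²'.

```python
def is_valid(s: str) -> bool:
    # partition splits at the first '&Ns='; sep is '' when it is absent.
    head, sep, tail = s.partition('&Ns=')
    # If '&Ns=' is present, it must be followed by at least one digit.
    if sep and not tail.isdigit():
        return False
    # Every dash-separated piece before it must be a non-empty run of digits.
    return all(piece.isdigit() for piece in head.split('-'))

for s in ['999-123-222-456-22', '999-123-222-456-22&Ns=12', '999-123-222-456-22&N=1']:
    print(s, is_valid(s))
```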
