Very weird bug in Python 3.3 - python

Okay, so I made this little function that allows me to make a string into multiplier of 32 characters, but when I use String .replace I get some really, really weird bug. Since its making me pull my hair, can you guys take a look and see what I'm missing.
Variables:
self.blockSize = 32
self.interrupt = '$^EnD#Block^$'
self.filler = '#'
Functions:
def pad(self, data):
joint1 = ''.join([data, self.interrupt])
joint2 = self.filler * ((self.blockSize - len(joint1)) % self.blockSize)
return ''.join([joint1, joint2])
def unpad(self, data):
data = str(data).rstrip(self.interrupt)
return data.replace(self.filler, '')
Call:
p = e.pad('this is not a very good idea yo')
print(p)
print(e.unpad(p))
Output:
Jans-MacBook-Pro:test2 jan$ ../../bin/python3 data.py
this is not a very good idea yo123$^EnD#Block^$################
this is not a very good idea yo123
Jans-MacBook-Pro:test2 jan$ ../../bin/python3 data.py
this is not a very good idea yo$^EnD#Block^$###################
this is not a very good idea y
Jans-MacBook-Pro:test2 jan$
It makes o in yo disappear. Ahhhh! But nothing disappears if I add some random numbers after.
SOLUTION - EDIT: My bad. I have misplaced self.filler and self.interrupt. I am so embarrassed now. The code should have been:
def unpad(self, data):
data = str(data).rstrip(self.filler)
return data.replace(self.interrupt, '')

Read the documentation for rstrip:
The chars argument is a string specifying the set of characters to be removed.
rstrip removes all trailing characters present in the passed set of characters. It doesn't remove a trailing substring consisting of those characters in that order. 'abczyx'.rstrip('xyz') gives 'abc', and 'abczyx'.rstrip('zyx') also gives 'abc'.

Related

Extend Formatter features to f-string syntax

In a project of mine, I'm passing strings to a Formatter subclass whic formats it using the format specifier mini-language. In my case it is customized (using the features of the Formatter class) by adding additional bang converters : !u converts the resulting string to lowercase, !c to titlecase, !q doubles any square bracket (because reasons), and some others.
For example, using a = "toFu", "{a!c}" becomes "Tofu"
How could I make my system use f-string syntax, so I can have "{a+a!c}" be turned into "Tofutofu" ?
NB: I'm not asking for a way of making f"{a+a!c}" (note the presence of an f) resolve itself as "Tofutofu", which is what hook into the builtin python f-string format machinery covers, I'm asking if there is a way for a function or any form of python code to turn "{a+a!c}" (note the absence of an f) into "Tofutofu".
Not sure I still fully understand what you need, but from the details given in the question and some comments, here is a function that parses strings with the format you specified and gives the desired results:
import re
def formatter(s):
def replacement(match):
expr, frmt = match[1].split('!')
if frmt == 'c':
return eval(expr).title()
return re.sub(r"{([^{]+)}", replacement, s)
a = "toFu"
print(formatter("blah {a!c}"))
print(formatter("{a+a!c}blah"))
Outputs:
blah Tofu
Tofutofublah
This uses the function variation of the repl argument of the re.sub function. This function (replacement) can be further extended to support all other !xs.
Main disadvantages:
Using eval is evil.
This doesn't take in count regular format specifiers, i.e. :0.3
Maybe someone can take it from here and improve.
Evolved from #Tomerikoo 's life-saving answer, here's the code:
import re
def formatter(s):
def replacement(match):
pre, bangs, suf = match.group(1, 2, 3)
# pre : the part before the first bang
# bangs : the bang (if any) and the characters going with it
# suf : the colon (if any) and the characters going with it
if not bangs:
return eval("f\"{" + pre + suf + "}\"")
conversion = set(bangs[1:]) # the first character is always a bang
sra = conversion - set("tiqulc")
conversion = conversion - sra
if sra:
sra = "!" + "".join(sra)
value = eval("f\"{" + pre + (sra or "") + suf + "}\"")
if "q" in conversion:
value = value.replace("{", "{{")
if "u" in conversion:
value = value.upper()
if "l" in conversion:
value = value.lower()
if "c" in conversion and value:
value = value.capitalize()
return value
return re.sub(r"{([^!:\n]+)((?:![^!:\n]+)?)((?::[^!:\n]+)?)}", replacement, s)
The massive regex results in the three groups I commented about at the top.
Caveat: it still uses eval (no acceptable way around it anyway), it doesn't allow for multiline replacement fields, and it may cause issues and/or discrepancies to put spaces between the ! and the :.
But these are acceptable for the use I have.
Please check specifcation
only those characters are allowed : 's', 'r', or 'a'
https://peps.python.org/pep-0498/

python regex - no match in script, although it should

I am programming a parser for an old dictionary and I'm trying to find a pattern like re.findall("{.*}", string) in a string.
A control print after the check proves, that only a few strings match, although all strings contain a pattern like {...}.
Even copying the string and matching it interactively in the idle shell
gives a match, but inside the rest of the code, it simply does not.
Is it possible that this problem is caused by the actual python interpreter?
I cannot figure out any other problem...
thanks for your help
the code snippet looks like that:
for aParse in chunklist:
aSigle = aParse[1]
aParse = aParse[0]
print("to be parsed", aParse)
aContext = Context()
aContext._init_("")
aContext.ID = contextID
aContext.source = aSigle
# here, aParse is the string containing {Abriss}
# which is part of a lexicon entry
metamatches = re.findall("\{.*\}", aParse)
print("metamatches: ", metamatches)
for meta in metamatches:
aMeta = meta.replace("{", "").replace("}", "")
aMeta = aMeta.split()
for elem in aMeta:
...
Try this:
re = {0: "{.test1}",1: "{.test1}",2: "{.test1}",3: "{.test1}"}
for value in re.itervalues():
if "{" in value:
value = value.replace("{"," ")
print value
or if you want to remove both "{}”
for value in re.itervalues():
if "{" in value:
value = value.strip('{}')
print value
Try this
data=re.findall(r"\{([^\}]*)}",aParse,re.I|re.S)
DEMO
So, in a really simplified scenario, a lexical entry looks like that:
"headword" {meta, meaning} context [reference for context].
So, I was chunking (split()) the entry at [...] with a regex. that works fine so far. then, after separating the headword, I tried to find the meta/meaning with a regex that finds all patterns of the form {...}. Since that regex didn't work, I replaced it with this function:
def findMeta(self, string, alist):
opened = 0
closed = 0
for char in enumerate(string):
if char[1] == "{":
opened = char[0]
elif char[1] == "}":
closed = char[0]
meta = string[opened:closed+1]
alist.append(meta)
string.replace(meta, "")
Now, its effectively much faster and the meaning component is correctly analysed. The remaining question is: in how far are the regex which I use to find other information (e.g. orthographic variants, introduced by "s.}") reliable? should they work or is it possible that the IDLE shell is simply not capable of parsing a 1000 line program correctly (and compiling all regex)? an example for a string whose meta should actually have been found is: " {stm.} {der abbruch thut, den armen das gebührende vorenthält} [Renn.]"
the algorithm finds the first, saying this word is a noun, but the second, it's translation, is not recognized.
... This is medieval German, sorry for that! Thank you for all your help.

How to convert string to Title Case in Python?

Example:
HILO -> Hilo
new york -> New York
SAN FRANCISCO -> San Francisco
Is there a library or standard way to perform this task?
Why not use title Right from the docs:
>>> "they're bill's friends from the UK".title()
"They'Re Bill'S Friends From The Uk"
If you really wanted PascalCase you can use this:
>>> ''.join(x for x in 'make IT pascal CaSe'.title() if not x.isspace())
'MakeItPascalCase'
This one would always start with lowercase, and also strip non alphanumeric characters:
def camelCase(st):
output = ''.join(x for x in st.title() if x.isalnum())
return output[0].lower() + output[1:]
def capitalizeWords(s):
return re.sub(r'\w+', lambda m:m.group(0).capitalize(), s)
re.sub can take a function for the "replacement" (rather than just a string, which is the usage most people seem to be familiar with). This repl function will be called with an re.Match object for each match of the pattern, and the result (which should be a string) will be used as a replacement for that match.
A longer version of the same thing:
WORD_RE = re.compile(r'\w+')
def capitalizeMatch(m):
return m.group(0).capitalize()
def capitalizeWords(s):
return WORD_RE.sub(capitalizeMatch, s)
This pre-compiles the pattern (generally considered good form) and uses a named function instead of a lambda.
Potential library: https://pypi.org/project/stringcase/
Example:
import stringcase
stringcase.camelcase('foo_bar_baz') # => "fooBarBaz"
Though it's questionable whether it will leave spaces in. (Examples show it removing space, but there is a bug tracker issue noting that it leaves them in.)
Why not write one? Something like this may satisfy your requirements:
def FixCase(st):
return ' '.join(''.join([w[0].upper(), w[1:].lower()]) for w in st.split())
Note: Why am I providing yet another answer? This answer is based on the title of the question and the notion that camelcase is defined as: a series of words that have been concatenated (no spaces!) such that each of the original words start with a capital letter (the rest being lowercase) excepting the first word of the series (which is completely lowercase). Also it is assumed that "all strings" refers to ASCII character set; unicode would not work with this solution).
simple
Given the above definition, this function
import re
word_regex_pattern = re.compile("[^A-Za-z]+")
def camel(chars):
words = word_regex_pattern.split(chars)
return "".join(w.lower() if i is 0 else w.title() for i, w in enumerate(words))
, when called, would result in this manner
camel("San Francisco") # sanFrancisco
camel("SAN-FRANCISCO") # sanFrancisco
camel("san_francisco") # sanFrancisco
less simple
Note that it fails when presented with an already camel cased string!
camel("sanFrancisco") # sanfrancisco <-- noted limitation
even less simple
Note that it fails with many unicode strings
camel("México City") # mXicoCity <-- can't handle unicode
I don't have a solution for these cases(or other ones that could be introduced with some creativity). So, as in all things that have to do with strings, cover your own edge cases and good luck with unicode!
just use .title(), and it will convert first letter of every word in capital, rest in small:
>>> a='mohs shahid ss'
>>> a.title()
'Mohs Shahid Ss'
>>> a='TRUE'
>>> b=a.title()
>>> b
'True'
>>> eval(b)
True
def camelCase(st):
s = st.title()
d = "".join(s.split())
d = d.replace(d[0],d[0].lower())
return d
I would like to add my little contribution to this post:
def to_camelcase(str):
return ' '.join([t.title() for t in str.split()])
From code wars - Write simple .camelCase method in Python for strings. All words must have their first letter capitalized without spaces.
camelcase("hello case") => HelloCase
camelcase("camel case word") => CamelCaseWord
def camel_case(string):
titled_string = string.title()
space_joined_string = titled_string.replace(' ', '')
return space_joined_string

Replacing leading text in Python

I use Python 2.6 and I want to replace each instance of certain leading characters (., _ and $ in my case) in a string with another character or string. Since in my case the replacement string is the same, I came up with this:
def replaceLeadingCharacters(string, old, new = ''):
t = string.lstrip(old)
return new * (len(string) - len(t)) + t
which seems to work fine:
>>> replaceLeadingCharacters('._.!$XXX$._', '._$', 'Y')
'YYY!$XXX$._'
Is there a better (simpler or more efficient) way to achieve the same effect in Python ?
Is there a way to achieve this effect with a string instead of characters? Something like str.replace() that stops once something different than the string-to-be-replaced comes up in the input string? Right now I've come up with this:
def replaceLeadingString(string, old, new = ''):
n = 0
o = 0
s = len(old)
while string.startswith(old, o):
n += 1
o += s
return new * n + string[o:]
I am hoping that there is a way to do this without an explicit loop
EDIT:
There are quite a few answers using the re module. I have a couple of questions/issues with it:
Isn't it significantly slower than the str methods when used as a replacement for them?
Is there an easy way to properly quote/escape strings that will be used in a regular expression? For example if I wanted to use re for replaceLeadingCharacters, how would I ensure that the contents of the old variable will not mess things up in ^[old]+ ? I'd rather have a "black-box" function that does not require its users to pay attention to the list of characters that they provide.
Your replaceLeadingCharacters() seems fine as is.
Here's replaceLeadingString() implementation that uses re module (without the while loop):
#!/usr/bin/env python
import re
def lreplace(s, old, new):
"""Return a copy of string `s` with leading occurrences of
substring `old` replaced by `new`.
>>> lreplace('abcabcdefabc', 'abc', 'X')
'XXdefabc'
>>> lreplace('_abc', 'abc', 'X')
'_abc'
"""
return re.sub(r'^(?:%s)+' % re.escape(old),
lambda m: new * (m.end() / len(old)),
s)
Isn't it significantly slower than the str methods when used as a replacement for them?
Don't guess. Measure it for expected input.
Is there an easy way to properly quote/escape strings that will be used in a regular expression?
re.escape()
re.sub(r'^[._$]+', lambda m: 'Y' * m.end(0), '._.!$XXX$._')
But IMHO your first solution is good enough.

Returning all characters before the first underscore

Using re in Python, I would like to return all of the characters in a string that precede the first appearance of an underscore. In addition, I would like the string that is being returned to be in all uppercase and without any non-alpanumeric characters.
For example:
AG.av08_binloop_v6 = AGAV08
TL.av1_binloopv2 = TLAV1
I am pretty sure I know how to return a string in all uppercase using string.upper() but I'm sure there are several ways to remove the . efficiently. Any help would be greatly appreciated. I am still learning regular expressions slowly but surely. Each tip gets added to my notes for future use.
To further clarify, my above examples aren't the actual strings. The actual string would look like:
AG.av08_binloop_v6
With my desired output looking like:
AGAV08
And the next example would be the same. String:
TL.av1_binloopv2
Desired output:
TLAV1
Again, thanks all for the help!
Even without re:
text.split('_', 1)[0].replace('.', '').upper()
Try this:
re.sub("[^A-Z\d]", "", re.search("^[^_]*", str).group(0).upper())
Since everyone is giving their favorite implementation, here's mine that doesn't use re:
>>> for s in ('AG.av08_binloop_v6', 'TL.av1_binloopv2'):
... print ''.join(c for c in s.split('_',1)[0] if c.isalnum()).upper()
...
AGAV08
TLAV1
I put .upper() on the outside of the generator so it is only called once.
You don't have to use re for this. Simple string operations would be enough based on your requirements:
tests = """
AG.av08_binloop_v6 = AGAV08
TL.av1_binloopv2 = TLAV1
"""
for t in tests.splitlines():
print t[:t.find('_')].replace('.', '').upper()
# Returns:
# AGAV08
# TLAV1
Or if you absolutely must use re:
import re
pat = r'([a-zA-Z0-9.]+)_.*'
pat_re = re.compile(pat)
for t in tests.splitlines():
print re.sub(r'\.', '', pat_re.findall(t)[0]).upper()
# Returns:
# AGAV08
# TLAV1
He, just for fun, another option to get text before the first underscore is:
before_underscore, sep, after_underscore = str.partition('_')
So all in one line could be:
re.sub("[^A-Z\d]", "", str.partition('_')[0].upper())
import re
re.sub("[^A-Z\d]", "", yourstr.split('_',1)[0].upper())

Categories