Why doesn't this python regex work? - python

I wrote a simple class that takes an input zip or postal code and either returns that value or zero-pads it out to five digits if it happens to be all-numeric and less than 5 digits long.
Why doesn't my code work?
import re
class ZipOrPostalCode:
    def __init__(self, data):
        self.rawData = data
    def __repr__(self):
        if re.match(r"^\d{1,4}$", self.rawData):
            return self.rawData.format("%05d")
        else:
            return self.rawData
if __name__ == "__main__":
    z=ZipOrPostalCode("2345")
    print(z)
The output I expect is 02345. It outputs 2345.
Running it in the debugger, it is clear that the regular expression didn't match.

Your regex works; it's the format call that doesn't. You're applying an integer format ("%05d") to a string, the format and the data are the wrong way round, and the spec uses old-style % syntax that str.format doesn't understand. Since "2345" contains no {} placeholders, format returns it unchanged.
In str.format, the string object carries the format (using {}-style syntax), and the strings, integers or other objects to format are passed as arguments.
Replace it with (for instance):
        if re.match(r"^\d{1,4}$", self.rawData):
            return "{:05}".format(int(self.rawData))
Without format, you can also use zfill to left-pad with zeroes (faster, since you don't have to convert to an integer):
            return self.rawData.zfill(5)
You probably don't even need to test the number of digits: just zfill no matter what, or only if the zip code is all digits:
    def __repr__(self):
        return self.rawData.zfill(5) if self.rawData.isdigit() else self.rawData
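Putting the pieces above together, here is a minimal Python 3 sketch of the whole class. It is only one way to do it: the attribute name raw_data and the use of __str__ instead of __repr__ are my own choices, not something from the question.

```python
import re


class ZipOrPostalCode:
    """Holds a zip or postal code; pads short all-digit codes to five digits."""

    def __init__(self, data):
        self.raw_data = data

    def __str__(self):
        # Only pad values made up of 1 to 4 ASCII digits; leave the rest untouched.
        if re.fullmatch(r"[0-9]{1,4}", self.raw_data):
            return self.raw_data.zfill(5)
        return self.raw_data


print(ZipOrPostalCode("2345"))     # 02345
print(ZipOrPostalCode("K1A 0B1"))  # K1A 0B1
```

The regex is kept here only to mirror the question; as noted above, zfill(5) leaves strings of five or more characters unchanged anyway.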

You've got your format code backwards.
return "{:05d}".format(int(self.rawData))

Related

What is the best way to format a function's return value to include spaces between variables?

I want my foo function to only return a value, without containing an actual print statement. What is the best way to format that return value? The code in the example makes the main function print "11", for example, instead of "1 1" or "1 & 1".
I really appreciate y'all!
def foo(x,y):
    if x == y:
        return x*2
def main():
    x=input("Enter value 1")
    y=input("Enter value 2")
    print(int(max(x,y)))
main()
Even though the code that you've provided isn't matching your actual question (in the title), I'll try to answer what I think is your question.
You can try returning a string with a space in between the elements, like so:
def foo(x,y):
    if x == y:
        string = str(x)+" "+str(y) #there's a space between the quotation marks
        return string
    #this works even if x and y are not integers due to `str(x)` & `str(y)`
Another approach is:
def foo(x,y):
    if x==y:
        n = 2 #this is the number of times you want x to appear in the output
        x_new = str(x)+" " #x converted to a string and concatenated with a space
        return (x_new*n).rstrip()
    #here, rstrip() is a string method that removes
    #whitespace from the *right* end of the string
The advantage of concatenation of strings is that you can concatenate any character (or string!) to str(x). For example, you can return str(x)+"&"+str(y) and you'll get 1&1 for x=1!
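The same idea can be sketched with str.join, which lets the caller pick the separator. The sep parameter is a hypothetical addition of mine, not part of the question's foo:

```python
def foo(x, y, sep=" "):
    """Return x and y joined by sep when they are equal, otherwise None."""
    if x == y:
        # str() makes this work for integers as well as strings
        return sep.join(str(value) for value in (x, y))
    return None


print(foo(1, 1))           # 1 1
print(foo(1, 1, " & "))    # 1 & 1
```

The function still only returns a value; the printing stays in the caller.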
In the image below, I have defined foo(x,y) thrice and then printed the value returned by the function. NOTE: This is in the interactive Python IDLE Shell.

array handling in python

I am new to Python and object-oriented programming. I am trying to loop over self.a, self.b and self.c with a for loop, but the self.(value) line gives an error. The output I want is x = [AA, EE, II], using classes and loops. Please help me out.
import string
A = ["AA","BB","CC","DD"]
B = ["EE","FF","GG","HH"]
C = ["II","JJ","KK","LL"]
class User:
    def __init__(self,A,B,C):
        self.a= A
        self.b= B
        self.c= C
    def User1(self):
        x=[]
        for i in range(ord('a'), ord('c')+1):
            value= chr(i)
            x.append= self.(value)[0] ///for getting first elemen from A,B,C
            i+=1
        return x
honey= User(A,B,C)
print(honey.User1())
What you want is getattr, but a few other things are broken there (to start with, the comment character in Python is #, not the // sequence).
So, your User1 method could be something like:
    def User1(self):
        x=[]
        for value in "abc":
            x.append(getattr(self, value)[0])
        return x
Note as well that the for statement always iterates over a sequence, so you don't need to go the long way round of converting your sequence to numbers just to convert those numbers back to the desired elements. As a string is also a sequence of characters, just looping over "abc" will yield your desired letters.
As stated above, the getattr built-in will then retrieve the desired attribute from self, given the attribute name as a string, contained in the value variable.
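For completeness, a runnable sketch of the whole class using getattr. The method name first_items is made up for illustration; the lists are the ones from the question.

```python
A = ["AA", "BB", "CC", "DD"]
B = ["EE", "FF", "GG", "HH"]
C = ["II", "JJ", "KK", "LL"]


class User:
    def __init__(self, a, b, c):
        self.a = a
        self.b = b
        self.c = c

    def first_items(self):
        # getattr(self, "a") is the same as self.a, but the name can be a variable
        return [getattr(self, name)[0] for name in "abc"]


honey = User(A, B, C)
print(honey.first_items())  # ['AA', 'EE', 'II']
```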

string formatting / value passing not working for format

Given the following two methods:
def test():
    string = "{test}"
    print convert(string, test='test')
def convert(string, **test):
    return string.format(test)
Why does this throw a KeyError: 'test'?
As I have seen in other threads, this should be a valid way of passing values, shouldn't it?
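As shown in the question you linked to, you need to expand the keyword-argument dictionary when passing it to format. Calling string.format(test) passes the whole dictionary as a single positional argument, so there is no keyword named test for {test} to look up:
    return string.format(**test)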
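A small Python 3 sketch of the difference (the question uses the Python 2 print statement; the parameter names template and values are mine):

```python
def convert(template, **values):
    # ** unpacks the dict back into keyword arguments for format()
    return template.format(**values)


print(convert("{test}", test="test"))  # test

# Without **, the dict becomes positional argument 0, so it has to be
# addressed by index instead of by keyword:
print("{0[test]}".format({"test": "test"}))  # test

try:
    "{test}".format({"test": "test"})
except KeyError as exc:
    print("KeyError:", exc)  # KeyError: 'test'
```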

Check if a string is in List case-insensitive [duplicate]

I love using the expression
if 'MICHAEL89' in USERNAMES:
    ...
where USERNAMES is a list.
Is there any way to match items with case insensitivity or do I need to use a custom method? Just wondering if there is a need to write extra code for this.
username = 'MICHAEL89'
if username.upper() in (name.upper() for name in USERNAMES):
    ...
Alternatively:
if username.upper() in map(str.upper, USERNAMES):
    ...
Or, yes, you can make a custom method.
str.casefold is recommended for case-insensitive string matching. @nmichaels's solution can trivially be adapted.
Use either:
if 'MICHAEL89'.casefold() in (name.casefold() for name in USERNAMES):
Or:
if 'MICHAEL89'.casefold() in map(str.casefold, USERNAMES):
As per the docs:
Casefolding is similar to lowercasing but more aggressive because it
is intended to remove all case distinctions in a string. For example,
the German lowercase letter 'ß' is equivalent to "ss". Since it is
already lowercase, lower() would do nothing to 'ß'; casefold()
converts it to "ss".
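To see the difference the docs describe, here is a small sketch. The sample USERNAMES list and the helper name contains_casefold are invented for the example:

```python
USERNAMES = ["Michael89", "Straße", "ANNA"]  # hypothetical sample data


def contains_casefold(name, names):
    """Case-insensitive membership test based on str.casefold."""
    target = name.casefold()
    return any(target == candidate.casefold() for candidate in names)


print("ß".lower())      # ß
print("ß".casefold())   # ss
print(contains_casefold("MICHAEL89", USERNAMES))  # True
print(contains_casefold("STRASSE", USERNAMES))    # True
print("STRASSE".lower() in (n.lower() for n in USERNAMES))  # False
```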
I would make a wrapper so you can be non-invasive. Minimally, for example...:
class CaseInsensitively(object):
    def __init__(self, s):
        self.__s = s.lower()
    def __hash__(self):
        return hash(self.__s)
    def __eq__(self, other):
        # ensure proper comparison between instances of this class
        try:
            other = other.__s
        except (TypeError, AttributeError):
            try:
                other = other.lower()
            except:
                pass
        return self.__s == other
Now, if CaseInsensitively('MICHAEL89') in whatever: should behave as required (whether the right-hand side is a list, dict, or set). (It may require more effort to achieve similar results for string inclusion, avoid warnings in some cases involving unicode, etc).
Usually (in oop at least) you shape your object to behave the way you want. name in USERNAMES is not case insensitive, so USERNAMES needs to change:
class NameList(object):
    def __init__(self, names):
        self.names = names
    def __contains__(self, name): # implements `in`
        return name.lower() in (n.lower() for n in self.names)
    def add(self, name):
        self.names.append(name)
# now this works
usernames = NameList(USERNAMES)
print someone in usernames
The great thing about this is that it opens the path for many improvements, without having to change any code outside the class. For example, you could change the self.names to a set for faster lookups, or compute the (n.lower() for n in self.names) only once and store it on the class and so on ...
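As a rough Python 3 sketch of the "set for faster lookups" improvement mentioned above: the _folded attribute is my own invention, and I swapped lower() for casefold(), which is a different (more aggressive) normalization than the original class used.

```python
class NameList:
    def __init__(self, names):
        self.names = list(names)
        # Folded copies, computed once, so `in` is a set lookup instead of a scan
        self._folded = {n.casefold() for n in self.names}

    def __contains__(self, name):  # implements `in`
        return name.casefold() in self._folded

    def add(self, name):
        self.names.append(name)
        self._folded.add(name.casefold())


usernames = NameList(["Michael89", "anna"])
print("MICHAEL89" in usernames)  # True
usernames.add("Bob")
print("bob" in usernames)        # True
```

Code outside the class keeps using name in usernames exactly as before.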
Here's one way:
if string1.lower() in string2.lower():
    ...
For this to work, both string1 and string2 objects must be of type string.
I think you have to write some extra code. For example:
if 'MICHAEL89' in map(lambda name: name.upper(), USERNAMES):
    ...
In this case we are forming a new list with all entries in USERNAMES converted to upper case and then comparing against this new list.
Update
As @viraptor says, it is even better to use a generator instead of map. See @Nathon's answer.
You could do
matcher = re.compile('MICHAEL89', re.IGNORECASE)
filter(matcher.match, USERNAMES)
Update: played around a bit and am thinking you could get a better short-circuit type approach using
matcher = re.compile('MICHAEL89', re.IGNORECASE)
if any( ifilter( matcher.match, USERNAMES ) ):
    #your code here
The ifilter function is from itertools, one of my favorite modules within Python. Like a generator, it only creates the next item when called upon, so any() can stop at the first match.
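itertools.ifilter only exists in Python 2. A possible Python 3 equivalent is a generator expression inside any(). In this sketch I also use re.escape and fullmatch, which are my additions: match only anchors at the start, so it would accept "Michael890" as well. The sample list is made up.

```python
import re

USERNAMES = ["michael89", "Michael890", "anna"]  # hypothetical sample data

# re.escape protects against regex metacharacters in the name being searched
matcher = re.compile(re.escape("MICHAEL89"), re.IGNORECASE)

# any() stops at the first name for which fullmatch succeeds
found = any(matcher.fullmatch(name) for name in USERNAMES)
print(found)  # True

# match() only anchors at the start, so a longer name also matches
print(bool(matcher.match("Michael890")))      # True
print(bool(matcher.fullmatch("Michael890")))  # False
```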
To have it in one line, this is what I did:
if any(([True if 'MICHAEL89' in username.upper() else False for username in USERNAMES])):
    print('username exists in list')
I didn't test it time-wise though. I am not sure how fast/efficient it is.
Example from this tutorial:
list1 = ["Apple", "Lenovo", "HP", "Samsung", "ASUS"]
s = "lenovo"
s_lower = s.lower()
res = s_lower in (string.lower() for string in list1)
print(res)
My 5 (wrong) cents
'a' in "".join(['A']).lower()
UPDATE
Ouch, totally agree @jpp, I'll keep it as an example of bad practice :(
I needed this for a dictionary instead of a list. Jochen's solution was the most elegant for that case, so I modded it a bit:
class CaseInsensitiveDict(dict):
    ''' requests special dicts are case insensitive when using the in operator,
    this implements a similar behaviour'''
    def __contains__(self, name): # implements `in`
        return name.casefold() in (n.casefold() for n in self.keys())
Now you can convert a dictionary like so: USERNAMESDICT = CaseInsensitiveDict(USERNAMESDICT), and then use if 'MICHAEL89' in USERNAMESDICT:

Python: Lazy String Decoding

I'm writing a parser, and there is LOTS of text to decode but most of my users will only care about a few fields from all the data. So I only want to do the decoding when a user actually uses some of the data. Is this a good way to do it?
class LazyString(str):
    def __init__(self, v) :
        self.value = v
    def __str__(self) :
        r = ""
        s = self.value
        for i in xrange(0, len(s), 2) :
            r += chr(int(s[i:i+2], 16))
        return r
def p_buffer(p):
    """buffer : HASH chars"""
    p[0] = LazyString(p[2])
Is that the only method I need to override?
I'm not sure how implementing a string subclass is of much benefit here. It seems to me that if you're processing a stream containing petabytes of data, whenever you've created an object that you don't need to you've already lost the game. Your first priority should be to ignore as much input as you possibly can.
You could certainly build a string-like class that did this:
class mystr(str):
    def __init__(self, value):
        self.value = value
        self._decoded = None
    @property
    def decoded(self):
        if self._decoded is None:
            self._decoded = self.value.decode("hex")
        return self._decoded
    def __repr__(self):
        return self.decoded
    def __len__(self):
        return len(self.decoded)
    def __getitem__(self, i):
        return self.decoded.__getitem__(i)
    def __getslice__(self, i, j):
        return self.decoded.__getslice__(i, j)
and so on. A weird thing about doing this is that if you subclass str, every method that you don't explicitly implement will be called on the value that's passed to the constructor:
>>> s = mystr('a0a1a2')
>>> s
 ¡¢
>>> len(s)
3
>>> s.capitalize()
'A0a1a2'
I don't see any kind of lazy evaluation in your code. The fact that you use xrange only means that the list of integers from 0 to len(s) will be generated on demand. The whole string r will be decoded during string conversion anyway.
The best way to implement a lazy sequence in Python is using generators. You could try something like this:
def lazy(v):
    for i in xrange(0, len(v), 2):
        yield int(v[i:i+2], 16)
list(lazy("0a0a0f"))
Out: [10, 10, 15]
What you're doing is built in already:
s = "i am a string!".encode('hex')
# what you do
r = ""
for i in xrange(0, len(s), 2) :
    r += chr(int(s[i:i+2], 16))
# but decoding is builtin
print r==s.decode('hex') # => True
As you can see your whole decoding is s.decode('hex').
But "lazy" decoding sounds like premature optimization to me. You'd need gigabytes of data to even notice it. Try profiling; the .decode is 50 times faster than your old code already.
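The 'hex' codec on str is Python 2 only. In Python 3 the same comparison could be sketched with bytes.hex and bytes.fromhex; I'm assuming plain ASCII text here, and no timing claims are implied by this snippet.

```python
encoded = "i am a string!".encode("ascii").hex()

# The manual loop from the question, ported to Python 3
manual = "".join(chr(int(encoded[i:i + 2], 16)) for i in range(0, len(encoded), 2))

# The built-in way
builtin = bytes.fromhex(encoded).decode("ascii")

print(manual == builtin == "i am a string!")  # True
```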
Maybe you want something like this:
class DB(object): # dunno what data it is ;)
    def __init__(self, data):
        self.data = data
        self.decoded = {} # maybe cache if the field data is long
    def __getitem__(self, name):
        try:
            return self.decoded[name]
        except KeyError:
            # this copies the fields data
            self.decoded[name] = ret = self.data[ self._get_field_slice( name ) ].decode('hex')
            return ret
    def _get_field_slice(self, name):
        # find out what part to decode, return the index in the data
        return slice( ... )
db = DB(encoded_data)
print db["some_field"] # find out where the field is, get its data and decode it
The methods you need to override really depend on how you are planning to use your new string type.
However, your str-based type looks a little suspicious to me. Have you looked into the implementation of str to check that it has the value attribute that you are setting in your __init__()? Performing a dir(str) does not indicate that there is any such attribute on str. This being the case, the normal str methods will not be operating on your data at all. I doubt that is the effect you want; otherwise, what would be the advantage of sub-classing?
Sub-classing base data types is a little strange anyway unless you have very specific requirements. For the lazy evaluation you want, you are probably better off creating your own class that contains a string, rather than sub-classing str, and writing your client code to work with that class. You will then be free to add the just-in-time evaluation you want in a number of ways. An example using the descriptor protocol can be found in this presentation: Python's Object Model (search for "class Jit(object)" to get to the relevant section).
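This is not the Jit class from that presentation, just a minimal Python 3.8+ sketch of the same "contain a string, decode on first use" idea. It relies on functools.cached_property, which is itself a descriptor; the class name LazyHex and the latin-1 choice (one character per byte, like chr() in the question) are assumptions of mine.

```python
from functools import cached_property  # Python 3.8+


class LazyHex:
    """Wraps a hex-encoded string and decodes it on first access only."""

    def __init__(self, raw):
        self.raw = raw

    @cached_property
    def decoded(self):
        # Runs once; afterwards the result is stored in the instance __dict__
        return bytes.fromhex(self.raw).decode("latin-1")

    def __str__(self):
        return self.decoded


field = LazyHex("68656c6c6f")
print("decoded" in vars(field))  # False
print(field)                     # hello
print("decoded" in vars(field))  # True
```

Fields that nobody prints or reads are never decoded at all.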
The question is incomplete, in that the answer will depend on details of the encoding you use.
Say you encode a list of strings as Pascal strings (i.e. each prefixed with its length, encoded as a fixed-size integer), and you want to read the 100th string from the list. You can then seek() forward past each of the first 99 strings without reading their contents at all. This will give some performance gain if the strings are large.
If, OTOH, you encode a list of strings as concatenated 0-terminated strings, you would have to read all bytes up to the 100th 0.
Also, you're speaking about some "fields" but your example looks completely different.
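To illustrate the length-prefixed case, here is a small sketch using an in-memory stream. The 4-byte little-endian prefix and the function names write_strings and read_nth are assumptions made for the example, not anything from the question's format.

```python
import io
import struct

LENGTH = struct.Struct("<I")  # assumed: 4-byte little-endian length prefix


def write_strings(stream, strings):
    """Write each string as <length><payload>."""
    for text in strings:
        payload = text.encode("utf-8")
        stream.write(LENGTH.pack(len(payload)))
        stream.write(payload)


def read_nth(stream, n):
    """Return the n-th (0-based) string, seeking past earlier payloads unread."""
    stream.seek(0)
    for _ in range(n):
        (size,) = LENGTH.unpack(stream.read(LENGTH.size))
        stream.seek(size, io.SEEK_CUR)  # skip the payload without reading it
    (size,) = LENGTH.unpack(stream.read(LENGTH.size))
    return stream.read(size).decode("utf-8")


buffer = io.BytesIO()
write_strings(buffer, ["alpha", "beta", "gamma"])
print(read_nth(buffer, 2))  # gamma
```

Only the length prefixes of the skipped strings are read; with 0-terminated strings every byte would have to be scanned instead.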
