Padding Function (Python) string.zfill - python

I would like to change the Python function below to cover all situations in which my business_code needs padding. The string.zfill function handles this case by padding on the left with zeros until a given width is reached, but I have never used it before.
#function for formatting business codes
def formatBusinessCodes(code):
    """ Function that formats business codes. Pass in a business code which will convert to a string with 6 digits """
    busCode=str(code)
    if len(busCode)==1:
        busCode='00000'+busCode
    elif len(busCode)==2:
        busCode='0000'+busCode
    else:
        if len(busCode)==3:
            busCode='000'+busCode
    return busCode
#pad extra zeros
df2['business_code']=df2['business_code'].apply(lambda x: formatBusinessCodes(x))
businessframe['business_code']=businessframe['business_code'].apply(lambda x: formatBusinessCodes(x))
financialframe['business_code']=financialframe['business_code'].apply(lambda x: formatBusinessCodes(x))
The code above pads a business_code to a length of 6, but I'm finding that the required length varies: some are shorter than 6 and some are longer. I'm validating data state by state, and each state uses a different business_code length (IL: 6 characters, OH: 8 characters). All codes must be padded evenly, so a code of 10 for IL should produce 000010, etc. I need to handle all of these cases using a command-line parameter (argparse) and string.zfill.

You could use str.format:
def formatBusinessCodes(code):
    """ Function that formats business codes. Pass in a business code which will convert to a string with 6 digits """
    return '{:06d}'.format(code)
In [25]: formatBusinessCodes(1)
Out[25]: '000001'
In [26]: formatBusinessCodes(10)
Out[26]: '000010'
In [27]: formatBusinessCodes(123)
Out[27]: '000123'
The format {:06d} can be understood as follows:
{...} means replace the following with an argument from format (e.g. code).
: begins the format specification
0 enables zero-padding
6 is the width of the string. Note that numbers with more than 6 digits will NOT be truncated, however.
d means the argument (e.g. code) should be of integer type.
Note that in Python 2.6 the format string needs an extra 0:
def formatBusinessCodes(code):
    """ Function that formats business codes. Pass in a business code which will convert to a string with 6 digits """
    return '{0:06d}'.format(code)
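Since the question asks about zfill and a width that differs per state, here is a minimal sketch of that route. The names pad_code, parse_width and the --width option are illustrative assumptions, not part of the original script. Like the format approach, str.zfill pads on the left with zeros and leaves longer values untouched.

```python
import argparse


def pad_code(code, width):
    """Left-pad code with zeros to width characters (never truncates)."""
    return str(code).zfill(width)


def parse_width(argv=None):
    # Hypothetical command-line option; the real script may name it differently.
    parser = argparse.ArgumentParser()
    parser.add_argument("--width", type=int, default=6,
                        help="target length of the business code")
    return parser.parse_args(argv).width


width = parse_width(["--width", "8"])   # e.g. OH uses 8 characters
print(pad_code(10, 6))       # 000010
print(pad_code(10, width))   # 00000010
print(pad_code(1234567, 6))  # 1234567 (already longer than 6, left as is)
```

With pandas, something like df2['business_code'].astype(str).str.zfill(width) should give the same result without apply.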

parser.add_argument('-b',help='Specify length of the district code')
businessformat=int(args.b.strip())
df2['business_code']=df2['business_code'].apply(lambda x: str(x))
def formatBusinessCodes(code):
    # zfill returns a new string padded on the left with zeros
    bus=code.zfill(businessformat)
    return bus
df2['business_code']=df2['business_code'].apply(formatBusinessCodes)

Related

How to convert a three digit integer (xxx) with a 1 decimal place float (xx.x)?

Currently I'm getting data from some sensors with voltage (V) and current (C) values, which is decoded into text as V040038038039C125067 to be stored in a MySQL DB table. The voltage part contains 4 different voltage values combined, while the current part contains 2 different current values combined. Each value is represented by 3 digits in the format xx.x. For example, the current value C125067 is actually 12.5 A and 06.7 A respectively. I tried to use Python slicing and some simple math to achieve this by dividing the values by 10, e.g. C125067 = 125/10 = 12.5. While this works for integers with a non-zero first digit (e.g. 125), when I tried the same for values such as 040 or 067, I got the error SyntaxError: leading zeros in decimal integer literals are not permitted. Are there any better ways to achieve the desired xx.x output, or to insert a decimal point before the last digit? Thanks.
v1 = voltage[1:4]
v2 = voltage[4:7]
v3 = voltage[7:10]
v4 = voltage[10:13]
c1 = current[1:4]
c2 = current[4:7]
volt_1 = int(v1)/10
volt_2 = int(v2)/10
volt_3 = int(v3)/10
volt_4 = int(v4)/10
curr_1 = int(c1)/10
curr_2 = int(c2)/10
Which version of Python are you using? int should convert strings such as '040' just fine.
Python 3.9.13 | packaged by conda-forge | (main, May 27 2022, 16:56:21)
Type 'copyright', 'credits' or 'license' for more information
IPython 8.4.0 -- An enhanced Interactive Python. Type '?' for help.
In [1]: int('040')
Out[1]: 40
In [2]:
Are you by any chance typing int(040) instead of int('040')? The former is a decimal integer literal (which cannot have leading zeros), while the latter is a string.
Leading zeros are not allowed in Python?
Using python 3.9.13, your code works without problems.
voltage = "V040038038039C125067"
v1 = voltage[1:4]
v2 = voltage[4:7]
v3 = voltage[7:10]
v4 = voltage[10:13]
volt_1 = int(v1)/10
volt_2 = int(v2)/10
volt_3 = int(v3)/10
volt_4 = int(v4)/10
print(v1, v2, v3, v4, volt_1, volt_2, volt_3, volt_4)
# 040 038 038 039 4.0 3.8 3.8 3.9
Use a regex to get a list of 6 string values from your sql data (grouped by 3 digits).
The most efficient way to use the regex is to compile it at the beginning then use the compiled regex on your sql rows.
Use a list comprehension to obtain a list of floats converted from the strings. int() already ignores leading zeros in a string, so no stripping is needed (stripping would also break on '000', which becomes an empty string).
Use sequence unpacking to separate into a voltage list and a current list.
import re
pattern = re.compile(r"(\d{3})")
data = "V040038038039C125067"
values = [int(x) / 10.0 for x in pattern.findall(data)]
voltage, current = values[:4], values[4:]
print(voltage, current) # [4.0, 3.8, 3.8, 3.9] [12.5, 6.7]
You can make a function of that, to easily apply to your sql rows.
def parse(data):
    values = [int(x) / 10.0 for x in pattern.findall(data)]
    return values[:4], values[4:]
voltage, current = parse("V040038038039C125067")
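If the rows may contain malformed records, it can help to check the layout before converting. The following is a sketch only: it assumes every record is exactly 'V' plus four 3-digit voltages followed by 'C' plus two 3-digit currents, as in the example, and the names RECORD, decode and tenths are made up for illustration.

```python
import re

# Assumed layout: 'V' + four 3-digit voltages + 'C' + two 3-digit currents.
RECORD = re.compile(r"V(\d{12})C(\d{6})")


def decode(record):
    """Return (voltages, currents) as lists of floats in xx.x form."""
    match = RECORD.fullmatch(record)
    if match is None:
        raise ValueError("unexpected record format: %r" % record)

    def tenths(digits):
        # Split into 3-digit groups; int() accepts leading zeros in strings.
        return [int(digits[i:i + 3]) / 10 for i in range(0, len(digits), 3)]

    return tenths(match.group(1)), tenths(match.group(2))


voltages, currents = decode("V040038038039C125067")
print(voltages, currents)  # [4.0, 3.8, 3.8, 3.9] [12.5, 6.7]
```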
This is a simple problem.
All you have to do is convert the digits to an integer and divide by 10. In Python 3 the / operator already returns a float, so the float() call below is optional.
a = int(input("Enter a random number: "))
print(float(a/10))
Now apply it to your problem.
volt_1 = float(int(v1)/10)
volt_2 = float(int(v2)/10)
volt_3 = float(int(v3)/10)
volt_4 = float(int(v4)/10)
curr_1 = float(int(c1)/10)
curr_2 = float(int(c2)/10)

Convert integer to hours and minutes

I'm working on a Python script that will read a file and grab a string total_time. Currently, this is what I have.
if("Total time" in data):
    total_time=int(filter(str.isdigit, data))
    print(total_time)
Output: 419
I'm trying to find the best way to read lots of files, grab this total time, and convert 419 into 4 hours and 19 minutes so that I can do some statistics and analytics with it.
Passing format argument to datetime in Pandas:
t="419"
a = pd.to_datetime(t, format='%H%M')
print(a.hour)
print(a.minute)
The built-in function divmod() seems appropriate here!
>>> a = 5
>>> b = 3
>>> divmod(a,b) # (a // b, a % b)
(1, 2)
For your specific situation:
def dataToTime(data):
    ''' Returns a list of (hour, minute) tuples from
    a list of strings '''
    total_times = filter(str.isdigit,data)
    return [divmod(int(time),100) for time in total_times]
If you would like to parse the data as you are reading it, try the re module, which has the function re.sub() for regex substitution:
>>> import re
>>> s = '| Total time | 4:19 | | |--------------+--------+------| –'
>>> h = int(re.sub(r':.*$|[^0-9]','',s))
>>> m = int(re.sub(r'^.*:|[^0-9]','',s))
>>> print(h, m)
4 19
Given some string set as
s = '419'
you can get the upper and lower digits by converting to an integer, then using modulo and integer division. The integer conversion can be encapsulated in a try-except block catching ValueError if you have a reasonable response to invalid inputs:
n = int(s)
hours = n // 100 # Truncating integer division
minutes = n % 100 # Modulo removes the upper digits
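The try-except wrapping described above could look like the sketch below. The function name to_hours_minutes and the choice of returning None for invalid input are assumptions; pick whatever response suits your files. divmod(n, 100) performs the integer division and the modulo in one call.

```python
def to_hours_minutes(text):
    """Split a digit string such as '419' into (hours, minutes)."""
    try:
        n = int(text)
    except ValueError:
        return None  # assumed policy: signal invalid input with None
    return divmod(n, 100)


print(to_hours_minutes("419"))   # (4, 19)
print(to_hours_minutes("1205"))  # (12, 5)
print(to_hours_minutes("n/a"))   # None

hours, minutes = to_hours_minutes("419")
total_minutes = hours * 60 + minutes  # handy for statistics
print(total_minutes)  # 259
```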

Converting OID of public key,etc in HEX data to the dot format

Hello guys,
I have CV1 RSA certificates that are slightly modified, so I don't want to use asn1wrap to parse a 'der' file, as that sometimes makes things too complex. Instead, since the tags are already fixed for a CV1 certificate, I can parse the 'der' file by converting the binary data to hex and extracting the required range of data.
However, for representation I want the OID to be in dot format,
e.g.: ABSOLUTE OID 1.3.6.33.4.11.5318.2888.18.10377.5
I can extract the hex string for this from the whole hex data as:
'060D2B0621040BA946964812D10905'
Is there any Python 3 function that can do this conversion directly? Or can anyone help with the logic for the conversion?
Found an answer for anyone who's interested. Apart from pyasn1 and asn1crypto, I did not find any package that converts the hexadecimal value to OID notation.
So I browsed around, mixed code from other languages, and created a version in Python.
def notation_OID(oidhex_string):
    ''' Input is a hex string and as one byte is 2 characters I take an
    empty list and insert 2 characters per element of the list.
    So for a string 'DEADBEEF' it would be ['DE','AD','BE','EF']. '''
    hex_list = []
    for char in range(0,len(oidhex_string),2):
        hex_list.append(oidhex_string[char]+oidhex_string[char+1])
    ''' I have deleted the first two elements of the list as my hex string
    includes the standard OID tag '06' and the OID length '0D'.
    These values are not required for the calculation as I've used
    absolute OID and not using any ASN.1 modules. Can be removed if you
    have only the data part of the OID in hex string. '''
    del hex_list[0]
    del hex_list[0]
    # An empty string to append the value of the OID in standard notation after
    # processing each element of the list.
    OID_str = ''
    # Convert the list with hex data in str format to int format for
    # calculations.
    for element in range(len(hex_list)):
        hex_list[element] = int(hex_list[element],16)
    # Convert the OID to its standard notation. Sourced from code in other
    # languages and adapted for python.
    # The first two digits of the OID are calculated differently from the rest.
    x = int(hex_list[0] / 40)
    y = int(hex_list[0] % 40)
    if x > 2:
        y += (x-2)*40
        x = 2
    OID_str += str(x)+'.'+str(y)
    val = 0
    for byte in range(1,len(hex_list)):
        val = ((val<<7) | ((hex_list[byte] & 0x7F)))
        if (hex_list[byte] & 0x80) != 0x80:
            OID_str += "."+str(val)
            val = 0
    # print the OID in dot notation.
    print (OID_str)
notation_OID('060D2B0621040BA946964812D10905')
Hope this helps... cHEErs !
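For comparison, here is a more compact sketch of the same decoding that uses bytes.fromhex and returns the string instead of printing it. The function name oid_from_hex is made up, and the sketch assumes the input starts with the '06' tag followed by a short-form length byte, as in the example above. It decodes every sub-identifier first and only then splits the first one into two arcs.

```python
def oid_from_hex(hex_string):
    """Decode a DER-encoded OID ('06' tag + length + content) to dot notation."""
    raw = bytes.fromhex(hex_string)
    if raw[0] != 0x06:
        raise ValueError("not an OBJECT IDENTIFIER tag")
    length = raw[1]  # short-form length only (content under 128 bytes)
    if length >= 0x80:
        raise ValueError("long-form length not handled in this sketch")
    content = raw[2:2 + length]

    # Each sub-identifier is base-128, high bit set on all but its last byte.
    subids, val = [], 0
    for byte in content:
        val = (val << 7) | (byte & 0x7F)
        if not byte & 0x80:
            subids.append(val)
            val = 0

    # The first sub-identifier packs the first two arcs as 40 * x + y.
    first = subids[0]
    x = min(first // 40, 2)
    y = first - 40 * x
    return ".".join(str(n) for n in [x, y] + subids[1:])


print(oid_from_hex("060D2B0621040BA946964812D10905"))
# 1.3.6.33.4.11.5318.2888.18.10377.5
```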

How can I convert a byte array to an integer more elegantly in Python

I'm receiving a byte array via serial communication and converting part of the byte array to an integer. The code is as follows:
data = conn.recv(40)
print(data)
command = data[0:7]
if(command == b'FORWARD' and data[7] == 3):
    value = 0
    counter = 8
    while (data[counter] != 4):
        value = value * 10 + int(data[counter] - 48)
        counter = counter + 1
In short, I unpack the bytearray data starting at location 8 and going until I hit a delimiter of b'\x03'. So I'm unpacking an integer of 1 to 3 digits and putting the numeric value into value.
This brute force method works. But is there a more elegant way to do it in Python? I'm new to the language and would like to learn better ways of doing some of these things.
You can find the delimiter, convert the substring of the bytearray to str and then int. Here's a little function to do that:
def intToDelim( ba, delim ):
    i=ba.find( delim )
    return int(str(ba[0:i]))
which you can invoke with
value = intToDelim( data[8:], b'\x04' )
(or with b'\x03' if that's your delimiter). This works in Python 2.7. In Python 3, str() of a bytes object gives its repr (such as "b'123'"), so drop the str() call and use int(ba[0:i]) instead; int() accepts ASCII digits in bytes directly.
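A Python 3 variant might look like the sketch below. The name int_to_delim and the sample frame are made up for illustration. Note that find returns -1 when the delimiter is missing, which would otherwise silently drop the last byte of the slice, so the sketch checks for it.

```python
def int_to_delim(payload, delim=b"\x04"):
    """Parse the ASCII digits that precede delim in a bytes-like object."""
    end = payload.find(delim)
    if end == -1:
        raise ValueError("delimiter not found")
    return int(payload[:end])  # int() accepts ASCII digits as bytes


data = b"FORWARD\x03125\x04"  # made-up frame matching the layout in the question
print(int_to_delim(data[8:]))  # 125
```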

Check if a string is hexadecimal

I know the easiest way is using a regular expression, but I wonder if there are other ways to do this check.
Why do I need this? I am writing a Python script that reads text messages (SMS) from a SIM card. In some situations, hex messages arrive and I need to do some processing on them, so I need to check whether a received message is hexadecimal.
When I send the following SMS:
Hello world!
my script receives
00480065006C006C006F00200077006F0072006C00640021
But in some situations, I receive normal text messages (not hex). So I need to do a hex check.
I am using Python 2.6.5.
UPDATE:
The reason for the problem is that (somehow) messages I send are received as hex, while messages sent by the operator (info messages and ads) are received as a normal string. So I decided to add a check to ensure that I have the message in the correct string format.
Some extra details: I am using a Huawei 3G modem and PyHumod to read data from the SIM card.
Possible best solution to my situation:
The best way to handle such strings is to use a2b_hex (a.k.a. unhexlify) and decode the result as UTF-16 big endian (as @JonasWielicki mentioned):
from binascii import unhexlify # unhexlify is another name of a2b_hex
mystr = "00480065006C006C006F00200077006F0072006C00640021"
unhexlify(mystr).decode("utf-16-be")
# u'Hello world!'
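In Python 3 the check and the decoding can be folded into one step, as in the sketch below. The name decode_sms and the fallback of returning the text unchanged are assumptions. One caveat: a normal message that happens to consist only of hex digits and has a length that is a multiple of 4 (for example "cafe") would also be decoded.

```python
def decode_sms(text):
    """Return decoded text if it looks like hex-encoded UTF-16-BE, else text as is."""
    try:
        return bytes.fromhex(text).decode("utf-16-be")
    except ValueError:
        # bytes.fromhex rejects non-hex input; UnicodeDecodeError (a ValueError
        # subclass) covers byte strings that are not valid UTF-16.
        return text


print(decode_sms("00480065006C006C006F00200077006F0072006C00640021"))  # Hello world!
print(decode_sms("Your balance is low"))  # Your balance is low
```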
(1) Using int() works nicely for this, and Python does all the checking for you :)
int('00480065006C006C006F00200077006F0072006C00640021', 16)
6896377547970387516320582441726837832153446723333914657L
will work. In case of failure you will receive a ValueError exception.
Short example:
int('af', 16)
175
int('ah', 16)
...
ValueError: invalid literal for int() with base 16: 'ah'
(2) An alternative would be to traverse the data and make sure all characters fall within the range of 0..9 and a-f/A-F. string.hexdigits ('0123456789abcdefABCDEF') is useful for this as it contains both upper and lower case digits.
import string
all(c in string.hexdigits for c in s)
will return either True or False based on the validity of your data in string s.
Short example:
s = 'af'
all(c in string.hexdigits for c in s)
True
s = 'ah'
all(c in string.hexdigits for c in s)
False
Notes:
As @ScottGriffiths correctly notes in a comment below, the int() approach will work if your string contains 0x at the start, while the character-by-character check will fail on it. Also, checking against a set of characters is faster than checking against a string of characters, but it is doubtful this will matter with short SMS strings unless you process many (many!) of them in sequence, in which case you could convert string.hexdigits to a set with set(string.hexdigits).
You can:
test whether the string contains only hexadecimal digits (0…9,A…F)
try to convert the string to integer and see whether it fails.
Here is the code:
import string
def is_hex(s):
    hex_digits = set(string.hexdigits)
    # if s is long, then it is faster to check against a set
    return all(c in hex_digits for c in s)
def is_hex(s):
    try:
        int(s, 16)
        return True
    except ValueError:
        return False
I know the op mentioned regular expressions, but I wanted to contribute such a solution for completeness' sake:
import re
def is_hex(s):
    # the + quantifier is required; without it only one-character strings match
    return re.fullmatch(r"[0-9a-fA-F]+", s or "") is not None
Performance
In order to evaluate the performance of the different solutions proposed here, I used Python's timeit module. The input strings are generated randomly for three different lengths, 10, 100, 1000:
s=''.join(random.choice('0123456789abcdef') for _ in range(10))
Levon's solutions:
# int(s, 16)
10: 0.257451018987922
100: 0.40081690801889636
1000: 1.8926858339982573
# all(_ in string.hexdigits for _ in s)
10: 1.2884491360164247
100: 10.047717947978526
1000: 94.35805322701344
Other answers are variations of these two. Using a regular expression:
# re.fullmatch(r'^[0-9a-fA-F]$', s or '')
10: 0.725040541990893
100: 0.7184272820013575
1000: 0.7190397029917222
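Picking the right solution thus depends on the length of the input string and whether exceptions can be handled safely. For short strings, int() is the fastest. The flat regular-expression timings should be read with care, though: the benchmarked pattern ^[0-9a-fA-F]$ has no quantifier, so fullmatch rejects any input longer than one character immediately, and these numbers do not show how a correct pattern such as [0-9a-fA-F]+ scales with length.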
One more simple and short solution based on transformation of string to set and checking for subset (doesn't check for '0x' prefix):
import string
def is_hex_str(s):
    return set(s).issubset(string.hexdigits)
Another option:
def is_hex(s):
    hex_digits = set("0123456789abcdef")
    for char in s:
        if not (char in hex_digits):
            return False
    return True
Most of the solutions proposed above do not take into account that any decimal integer may be also decoded as hex because decimal digits set is a subset of hex digits set. So Python will happily take 123 and assume it's 0123 hex:
>>> int('123',16)
291
This may sound obvious but in most cases you'll be looking for something that was actually hex-encoded, e.g. a hash and not anything that can be hex-decoded. So probably a more robust solution should also check for an even length of the hex string:
In [1]: def is_hex(s):
   ...:     try:
   ...:         int(s, 16)
   ...:     except ValueError:
   ...:         return False
   ...:     return len(s) % 2 == 0
   ...:
In [2]: is_hex('123')
Out[2]: False
In [3]: is_hex('f123')
Out[3]: True
This covers the case where the string starts with '0x' or '0X'. Note that it also accepts an x or X anywhere in the string, not only in the prefix:
d='0X12a'
all(c in 'xX' + string.hexdigits for c in d)
True
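In Python 3, I tried:
def is_hex(s):
    try:
        tmp=bytes.fromhex(s).decode('utf-8')
        return ''.join([i for i in tmp if i.isprintable()])
    except ValueError:
        return ''
This returns the decoded printable text rather than a boolean, and an empty string when s is not valid hex or not valid UTF-8. It should work better than the int(x, 16) approach.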
If you are looking to determine True or False in Python, I would use eumero's is_hex method over Levon's first method. The following code contains a gotcha...
if int(input_string, 16):
    print 'it is hex'
else:
    print 'it is not hex'
It incorrectly reports the string '00' as not hex because zero evaluates to False.
Since all the regular expression timings above were about the same, I would guess that most of the time was spent converting the string to a regular expression. Below is the data I got when pre-compiling the regular expression.
int_hex
0.000800 ms 10
0.001300 ms 100
0.008200 ms 1000
all_hex
0.003500 ms 10
0.015200 ms 100
0.112000 ms 1000
fullmatch_hex
0.001800 ms 10
0.001200 ms 100
0.005500 ms 1000
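For reference, pre-compiling might look like the sketch below. It is not the code that produced the timings above. The names HEX_RE and is_hex are illustrative, and the pattern uses a + quantifier so that strings longer than one character can match.

```python
import re

# The + quantifier is needed so that strings longer than one character match.
HEX_RE = re.compile(r"[0-9a-fA-F]+")


def is_hex(s):
    """True if s is a non-empty string made only of hex digits."""
    return HEX_RE.fullmatch(s or "") is not None


print(is_hex("00480065006C"))  # True
print(is_hex("ah"))            # False
print(is_hex(""))              # False
```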
Simple solution in case you need a pattern to validate prefixed hex or binary along with decimal
\b(0x[\da-fA-F]+|[\d]+|0b[01]+)\b
Sample: https://regex101.com/r/cN4yW7/14
Then doing int('0x00480065006C006C006F00200077006F0072006C00640021', 0) in python gives
6896377547970387516320582441726837832153446723333914657
The base 0 invokes prefix guessing behaviour.
This has saved me a lot of hassle. Hope it helps!
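A small sketch of the base 0 behaviour follows. The helper name parse_prefixed and its None fallback are assumptions. With base 0, int() reads the prefix to choose the base, so unprefixed hex digits such as 'ff' are rejected, as are decimal strings with a leading zero.

```python
def parse_prefixed(s):
    """Parse decimal, 0x hex, 0o octal or 0b binary text using int(s, 0)."""
    try:
        return int(s, 0)
    except ValueError:
        return None  # assumed policy for text that is not a number


print(parse_prefixed("0x10"))   # 16
print(parse_prefixed("0b101"))  # 5
print(parse_prefixed("12"))     # 12
print(parse_prefixed("ff"))     # None (no prefix, so not read as hex)
print(parse_prefixed("012"))    # None (leading zero is rejected in base 0)
```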
Most of the solutions do not handle strings with the 0x prefix properly:
>>> is_hex_string("0xaaa")
False
>>> is_hex_string("0x123")
False
>>> is_hex_string("0xfff")
False
>>> is_hex_string("fff")
True
Here's my solution:
def to_decimal(s):
    '''input should be int10 or hex'''
    if isinstance(s, str):
        try:
            # plain decimal such as '12'; decimal digits are also valid hex
            # digits, so decimal has to be tried first
            return int(s)
        except ValueError:
            # hex, with or without a 0x prefix
            return int(s, 16)
    else:
        return int(s)
a = to_decimal(12)
b = to_decimal(0x10)
c = to_decimal('12')
d = to_decimal('0x10')
print(a, b, c, d) # 12 16 12 16
