Fastest way to parse (split) binary bits in Python

We are counting photons and time-tagging them with an FPGA counter. We get about 500 MB of data per minute. I am getting 32 bits of data as a hex string* (*actually 32-bit signed integers stored using little-endian byte order; see Update 1). Currently I am doing it like this:
import collections
import numpy as np

def getall(file):
    data1 = np.memmap(file, dtype='<i4', mode='r')
    d0 = 0
    raw_counts = []
    for i in data1:
        binary = bin(i)[2:].zfill(8)
        decimal = int(binary[5:], 2)
        if binary[:1] == '1':
            raw_counts.append(decimal)
    counter = collections.Counter(raw_counts)
    sorted_counts = sorted(counter.items(), key=lambda pair: pair[0], reverse=False)
    return counter, counter.keys(), counter.values()
I think this part (binary = bin(i)[2:].zfill(8); decimal = int(binary[5:], 2)) is slowing down the process. (No, it is not; I found that out by profiling my program.) Is there any way to speed it up? So far I only need the binary bits from [5:], not all 32 bits, so I think parsing the 32 bits down to the last 27 bits is taking most of the time. Thanks.
*Update 1
J.F.Sebastian pointed out that the data is not a hex string.
*Update 2
Here is the final code if anyone needs it. I ended up using np.unique instead of collections.Counter. At the end, I converted back to collections.Counter because I want cumulative counting.
# http://stackoverflow.com/questions/10741346/numpy-most-efficient-frequency-counts-for-unique-values-in-an-array
def myc(x):
    unique, counts = np.unique(x, return_counts=True)
    return np.asarray((unique, counts)).T

def getallfast(file):
    data1 = np.memmap(file, dtype='<i4', mode='r')
    data2 = data1[np.nonzero((~data1 & (31 << 1)))] & 0x7ffffff  # See J.F.Sebastian's comment.
    counter = myc(data2)
    raw_counts = dict(zip(counter[:, 0], counter[:, 1]))
    counter = collections.Counter(raw_counts)
    return counter, counter.keys(), counter.values()
However, the version below turned out to be the fastest for me: data1[np.nonzero((~data1 & (31 << 1)))] & 0x7ffffff is slower than counting first and converting the data afterwards with binary = bin(counter[i,0])[2:].zfill(8).
def myc(x):
    unique, counts = np.unique(x, return_counts=True)
    return np.asarray((unique, counts)).T

def getallfast(file):
    data1 = np.memmap(file, dtype='<i4', mode='r')
    counter = myc(data1)
    xnew = []
    ynew = []
    raw_counts = dict()
    for i in range(len(counter)):
        binary = bin(counter[i, 0])[2:].zfill(8)
        decimal = int(binary[5:], 2)
        xnew.append(decimal)
        ynew.append(counter[i, 1])
        raw_counts[decimal] = counter[i, 1]
    counter = collections.Counter(raw_counts)
    return counter, xnew, ynew
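For the cumulative counting mentioned in Update 2, a minimal sketch of how the per-file Counter objects could be accumulated (the file names here are placeholders):
import collections

# hypothetical list of data files produced by the FPGA counter
files = ["run1.bin", "run2.bin", "run3.bin"]

total = collections.Counter()
for f in files:
    counter, keys, values = getallfast(f)
    total += counter  # Counter addition merges the per-file counts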

I guess you could try one of these two approaches. You could just take the bits with a binary mask, fivebits = my_int & 0x1f, or if you want the five bits at the other end, fivebits = my_int >> (32 - 5). But really, in my experience converting to a string is quite fast; I thought that was a bottleneck many years ago, and after profiling it I found it wasn't.
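For completeness, the same masking can be applied to the whole array at once with NumPy, much like the update in the question above; a minimal sketch, assuming the flag of interest is bit 31 and the payload is the low 27 bits:
import numpy as np

def low_bits_vectorized(filename):
    data = np.memmap(filename, dtype='<i4', mode='r')
    flagged = data[data < 0]    # for signed int32, bit 31 set means the value is negative
    return flagged & 0x7ffffff  # keep only the low 27 bits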


How to create a script that gives me every combination possible of a six digit code

A friend and I want to create a script that gives us every possible permutation of a six-digit code, comprised of 36 alphanumeric characters (0-9 and a-z), in alphabetical order, and then lets us see them in a .txt file.
I also want it to use all of the CPU and RAM it can, so that it takes less time to complete the task.
So far, this is the code:
import random

charset = "0123456789abcdefghijklmnopqrstuvwxyz"
links = []
file = open("codes.txt", "a")
for g in range(0, 36**6):
    key = ""
    base = ""
    print(str(g))
    for i in range(0, 6):
        char = random.choice(charset)
        key += char
    base += key
    file.write(base + "\n")
file.close()
This code randomly generates the combinations and immediately writes them to a .txt file, while printing the number of codes it has already created. But it isn't in alphabetical order (I have to sort it afterwards), and it takes too long.
How can the code be improved to give the desired outcome?
Thanks to @R0Best for providing the best answer
Although this post already has 6 answers, I'm not content with any of them, so I've decided to contribute a solution of my own.
First, note that many of the answers provide the combinations or permutations of letters, whereas the post actually wants the Cartesian product of the alphabet with itself (repeated N times, where N=6). There are (at this time) two answers that do this, however they both write an excessive number of times, resulting in subpar performance, and they also concatenate their intermediate results in the hottest portion of the loop (also bringing down performance).
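To make that distinction concrete, here is a minimal sketch (using a 3-character alphabet and length 2 purely for illustration) contrasting itertools.permutations with the Cartesian product:
from itertools import permutations, product

alphabet = "abc"

# permutations never repeat a character within one code: 6 results
print(["".join(p) for p in permutations(alphabet, 2)])
# ['ab', 'ac', 'ba', 'bc', 'ca', 'cb']

# the Cartesian product allows repeats, which is what a 6-character code needs: 9 results
print(["".join(p) for p in product(alphabet, repeat=2)])
# ['aa', 'ab', 'ac', 'ba', 'bb', 'bc', 'ca', 'cb', 'cc']
Note also that product emits the codes in sorted order when the alphabet is sorted, which matches the alphabetical-order requirement.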
In the interest of taking optimization to the absolute max, I present the following code:
from string import digits, ascii_lowercase
from itertools import chain

ALPHABET = (digits + ascii_lowercase).encode("ascii")

def fast_brute_force():
    # Define some constants to make the following sections more readable
    base_size = 6
    suffix_size = 4
    prefix_size = base_size - suffix_size
    word_size = base_size + 1

    # define two containers
    #   word_blob - placeholder words, with hyphens in the unpopulated characters (followed by newline)
    #   sleds - a tuple of repeated bytes, used for substituting a bunch of characters in a batch
    word_blob = bytearray(b"-" * base_size + b"\n")
    sleds = tuple(bytes([char]) for char in ALPHABET)

    # iteratively extend word_blob and sleds, filling in unpopulated characters using the sleds
    # in doing so, we construct a single "blob" that contains concatenated suffixes of the desired
    # output with placeholders so we can quickly substitute in the prefix, write, repeat, in batches
    for offset in range(prefix_size, base_size)[::-1]:
        word_blob *= len(ALPHABET)
        word_blob[offset::word_size] = chain.from_iterable(sleds)
        sleds = tuple(sled * len(ALPHABET) for sled in sleds)

    with open("output.txt", "wb") as f:
        # I've expanded out the logic for substituting in the prefixes into explicit nested for loops
        # to avoid both redundancy (reassigning the same value) and the overhead associated with
        # a recursive implementation
        # I assert this below, so any changes in suffix_size will fail loudly
        assert prefix_size == 2
        for sled1 in sleds:
            word_blob[0::word_size] = sled1
            for sled2 in sleds:
                word_blob[1::word_size] = sled2
                # we write to the raw FileIO since we know we don't need buffering or other fancy
                # bells and whistles, however in practice it doesn't seem that much faster
                f.raw.write(word_blob)
There's a lot of magic happening in that code block, but in a nutshell:
I batch the writes, so that I'm writing 36**4 or 1679616 entries at once, so there's less context switching.
I update all 1679616 entries per batch simultaneously with the new prefix, using bytearray slicing / assignment.
I operate on bytes, write to the raw FileIO, expand the loops for the prefix assignments, and other small optimizations to avoid encoding/buffering/function call overhead/other performance hits.
Note: unless you have a very fast disk and a slowish CPU, you won't see much benefit from the smaller optimizations, probably just from the write batching.
On my system, it takes about 45 seconds to produce and write the 14880348 KB file, and that's writing to my slowest disk. On my NVMe drive, it takes 6.868 seconds.
The fastest way I can think of is using pypy3 with this code:
import functools
import time
from string import digits, ascii_lowercase

@functools.lru_cache(maxsize=128)
def main():
    cl = []
    cs = digits + ascii_lowercase
    for letter in cs:
        cl.append(letter)
    ct = tuple(cl)
    with open("codes.txt", "w") as file:
        for p1 in ct:
            for p2 in ct:
                for p3 in ct:
                    for p4 in ct:
                        for p5 in ct:
                            for p6 in ct:
                                file.write(f"{p1}{p2}{p3}{p4}{p5}{p6}\n")

if __name__ == '__main__':
    start = time.time()
    main()
    print(f"Done!\nTook {time.time() - start} seconds!")
It writes at around 10-15 MB/s. The total file is around 15 GB, I believe, so it would take roughly 990-1500 seconds to generate. Those results are from an Unraid VM with one 3.4 GHz core of a server CPU and an old SATA3 SSD. You will probably get better results with an NVMe drive and a faster single-core CPU.
random can be very inefficient. You can try:
from itertools import permutations
from pandas import Series

charset = list("0123456789abcdefghijklmnopqrstuvwxyz")
links = []
file = open("codes.txt", "a")

comb = permutations(charset, 6)
comb = list(comb)
comb = list(map(lambda x: ''.join(x), comb))

mySeries = Series(comb)
mySeries = mySeries.sort_values()

for k in mySeries:
    file.write(k + "\n")
file.close()
You could use itertools.permutations from the standard itertools library. You can also specify the number of characters in the combination.
from itertools import permutations

charset = "0123456789abcdefghijklmnopqrstuvwxyz"
c = permutations(charset, 6)
with open('code.txt', 'w') as f:
    for i in c:
        f.write("".join(i) + '\n')
It runs on my computer in about 200 milliseconds to create the permutations, then spends a lot of time writing to the file.
For permutations, this would do the trick:
from itertools import permutations

charset = "0123456789abcdefghijklmnopqrstuvwxyz"
links = []
with open("codes.txt", "w") as f:
    for permutation in permutations(charset, 6):
        f.write(''.join(permutation) + '\n')
FYI, it would create a 7.8 GigaByte file
For combinations, this would do the trick:
from itertools import combinations

charset = "0123456789abcdefghijklmnopqrstuvwxyz"
links = []
with open("codes.txt", "w") as f:
    for comb in combinations(charset, 6):
        f.write(''.join(comb) + '\n')
FYI, it would create a 10.8 megabyte file
First thing: there are better ways to do this, but I want to write something clear and understandable.
Pseudo Code:
base = "";
for (x1 = 0; x1 < charset.length(); x1++)
  for (x2 = 0; x2 < charset.length(); x2++)
    for (x3 = 0; x3 < charset.length(); x3++)
      .
      .
      .
      {
        base = charset[x1] + charset[x2] + charset[x3] + ..... + charset[x6];
        file.write(base + "\n")
      }
This is a combination problem where you are trying to get combinations of length 6 from a character set of length 36. This will produce an output of size 36!/(30!*6!). You can refer to itertools for solving a combination problem like yours; see the combinations function in the itertools documentation. It is not recommended to perform such a performance-intensive computation using Python.
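For a sense of scale, a quick check of the two counts (math.comb is available in Python 3.8+):
from math import comb

# 6-element combinations (no repeats, order ignored) from 36 characters:
print(comb(36, 6))   # 1947792, i.e. 36! / (30! * 6!)

# 6-character codes when repeats are allowed (Cartesian product):
print(36 ** 6)       # 2176782336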

Why are the wrong bits extracted in Python?

I need k bits at position p extracted in order to convert them to a decimal value. I use a standard function, but when I test it on 6/7/8-byte-long binary codes the result is not correct. When I have a 1-byte code it is little-endian, but once I use an 8-byte code it is shifted. For one signal (a 7-byte code) it was shifted by +7 bits, but another signal (another ID, an 8-byte code) was shifted by -21 bits. I cannot explain this to myself, so I thought of playing around and manually adding or subtracting bits in order to use the correct bits for the calculation. Do you have any idea why this is happening?
Example (8 bytes):
extract_k_bits('100001111000000001110111011001000111011111111000001110100111101',16,0)
Output: 001110100111101 instead of 1000011110000000
extract_k_bits('100001111000000001110111011001000111011111111000001110100111101',16,24)
Output: 0110010001110111 instead of 1011001000111011
This is the code I am working with:
import openpyxl
from openpyxl import Workbook

theFile = openpyxl.load_workbook('Adapted_T013.xlsx')
allSheetNames = theFile.sheetnames
print("All sheet names {} ".format(theFile.sheetnames))
sheet = theFile.active

def extract_k_bits(inputBIN, k, p):
    end = len(inputBIN) - p
    start = end - k + 1
    kBitSub = inputBIN[start : end+1]
    print(kBitSub)
    Dec_Values = int(kBitSub, 2)
Here's a working solution:
def extract_k_bits(inputBIN, k):
    # Since you always extract 16 bits
    # just use the point where you want to extract from
    kBitSub = inputBIN[k : k+16]
    print(kBitSub)

extract_k_bits('100001111000000001110111011001000111011111111000001110100111101', 0)
extract_k_bits('100001111000000001110111011001000111011111111000001110100111101', 23)
Output:
1000011110000000
1011001000111011
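If arbitrary widths are still needed, here is a minimal left-indexed sketch (the function name is an assumption, generalizing the answer above) that also returns the decimal value:
def extract_k_bits_left(input_bin, k, p):
    # Take k bits starting at bit offset p, counted from the left (MSB) end.
    return int(input_bin[p:p + k], 2)

code = '100001111000000001110111011001000111011111111000001110100111101'
print(bin(extract_k_bits_left(code, 16, 0)))   # 0b1000011110000000, the value the question expected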

Python turn array of booleans to binary

I am preparing a new driver for one of our new hardware devices.
One of the options to set it up is a single byte that holds 8 settings; every bit turns something on or off.
So, basically, what I need to do is take 8 zeros or ones and create one byte out of them.
What I did is prepare a helper function for it:
@staticmethod
def setup2byte(setup_array):
    """Turn setup array (of 8 booleans) into byte"""
    data = ''
    for b in setup_array:
        data += str(int(b))
    return int(data, 2)
Called like this:
settings = [echo, reply, presenter, presenter_brake, doors_action, header, ticket_sensor, ext_paper_sensor]
data = self.setup2byte(settings)
packet = "{0:s}{1:s}{2:d}{3:s}".format(CONF_STX, 'P04', data, ETX)
self.queue_command.put(packet)
and I wonder if there is an easier way to do it, some built-in function or something like that. Any ideas?
I believe you want this:
convert2b = lambda ls: bytes("".join([str(int(b)) for b in ls]), 'utf-8')
Where ls is a list of booleans. Works in python 2.7 and 3.x. Alternative more like your original:
convert2b = lambda ls: int("".join([str(int(b)) for b in ls]), 2)
that's basically what you are already doing, but shorter:
data = int(''.join(['1' if i else '0' for i in settings]), 2)
But here is the answer you are looking for:
Bool array to integer
I think the previous answers created 8 bytes. This solution creates one byte only:
settings = [False, True, False, True, True, False, False, True]
# LSB first
integerValue = 0
# init value of your settings
for idx, setting in enumerate(settings):
    integerValue += setting * 2**idx

# initialize an empty byte
mybyte = bytearray(b'\x00')
mybyte[0] = integerValue
print(mybyte)
For more examples, visit this great site: binary python
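If the first flag in the list is meant to be the most significant bit (as in the string-based helper in the question), a minimal bit-shift sketch; the MSB-first ordering here is an assumption:
def pack_flags(flags):
    # Pack up to 8 booleans into a single integer, first flag as the most significant bit.
    value = 0
    for flag in flags:
        value = (value << 1) | int(flag)
    return value

settings = [True, False, True, True, False, False, True, False]
data = pack_flags(settings)   # 0b10110010 == 178
packet_byte = bytes([data])   # a single byte, ready to put into a packet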

Fast extraction of chunks of lines from large CSV file

I have a large CSV file full of stock-related data formatted as such:
Ticker Symbol, Date, [some variables...]
So each line starts off with the symbol (like "AMZN"), then has the date, then has 12 variables related to price or volume on the selected date. There are about 10,000 different securities represented in this file, and I have a line for each day that the stock has been publicly traded, for each of them. The file is ordered first alphabetically by ticker symbol and second chronologically by date. The entire file is about 3.3 GB.
The sort of task I want to solve would be to be able to extract the most recent n lines of data for a given ticker symbol with respect to the current date. I have code that does this, but based on my observations it seems to take, on average, around 8-10 seconds per retrieval (all tests have been extracting 100 lines).
I have functions I'd like to run that require me to grab such chunks for hundreds or thousands of symbols, and I would really like to reduce the time. My code is inefficient, but I am not sure how to make it run faster.
First, I have a function called getData:
def getData(symbol, filename):
    out = ["Symbol","Date","Open","High","Low","Close","Volume","Dividend",
           "Split","Adj_Open","Adj_High","Adj_Low","Adj_Close","Adj_Volume"]
    l = len(symbol)
    beforeMatch = True
    with open(filename, 'r') as f:
        for line in f:
            match = checkMatch(symbol, l, line)
            if beforeMatch and match:
                beforeMatch = False
                out.append(formatLineData(line[:-1].split(",")))
            elif not beforeMatch and match:
                out.append(formatLineData(line[:-1].split(",")))
            elif not beforeMatch and not match:
                break
    return out
(This code has a couple of helper functions, checkMatch and formatLineData, which I will show below.) Then, there is another function called getDataColumn that gets the column I want with the correct number of days represented:
def getDataColumn(symbol, col=12, numDays=100, changeRateTransform=False):
    dataset = getData(symbol)
    if not changeRateTransform:
        column = [day[col] for day in dataset[-numDays:]]
    else:
        n = len(dataset)
        column = [(dataset[i][col] - dataset[i-1][col]) / dataset[i-1][col]
                  for i in range(n - numDays, n)]
    return column
(changeRateTransform converts raw numbers into daily change rate numbers if True.) The helper functions:
from datetime import datetime

def checkMatch(symbol, symbolLength, line):
    out = False
    if line[:symbolLength+1] == symbol + ",":
        out = True
    return out

def formatLineData(lineData):
    out = [lineData[0]]
    out.append(datetime.strptime(lineData[1], '%Y-%m-%d').date())
    out += [float(d) for d in lineData[2:6]]
    out += [int(float(d)) for d in lineData[6:9]]
    out += [float(d) for d in lineData[9:13]]
    out.append(int(float(lineData[13])))
    return out
Does anyone have any insight on what parts of my code run slow and how I can make this perform better? I can't do the sort of analysis I want to do without speeding this up.
EDIT:
In response to the comments, I made some changes to the code in order to utilize the existing methods in the csv module:
def getData(symbol, database):
    out = ["Symbol","Date","Open","High","Low","Close","Volume","Dividend",
           "Split","Adj_Open","Adj_High","Adj_Low","Adj_Close","Adj_Volume"]
    l = len(symbol)
    beforeMatch = True
    with open(database, 'r') as f:
        databaseReader = csv.reader(f, delimiter=",")
        for row in databaseReader:
            match = (row[0] == symbol)
            if beforeMatch and match:
                beforeMatch = False
                out.append(formatLineData(row))
            elif not beforeMatch and match:
                out.append(formatLineData(row))
            elif not beforeMatch and not match:
                break
    return out
def getDataColumn(dataset, col=12, numDays=100, changeRateTransform=False):
    if not changeRateTransform:
        out = [day[col] for day in dataset[-numDays:]]
    else:
        n = len(dataset)
        out = [(dataset[i][col] - dataset[i-1][col]) / dataset[i-1][col]
               for i in range(n - numDays, n)]
    return out
Performance was worse using the csv.reader class. I tested on two stocks, AMZN (near top of file) and ZNGA (near bottom of file). With the original method, the run times were 0.99 seconds and 18.37 seconds, respectively. With the new method leveraging the csv module, the run times were 3.04 seconds and 64.94 seconds, respectively. Both return the correct results.
My thought is that the time is being taken up more from finding the stock than from the parsing. If I try these methods on the first stock in the file, A, the methods both run in about 0.12 seconds.
When you're going to do lots of analysis on the same dataset, the pragmatic approach would be to read it all into a database. It is made for fast querying; CSV isn't. Use the sqlite command line tools, for example, which can directly import from CSV. Then add a single index on (Symbol, Date) and lookups will be practically instantaneous.
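To make that concrete, here is a minimal sketch of that route using Python's built-in sqlite3 module; the file names and the exact column list are assumptions based on the header row described in the question:
import csv
import sqlite3

# One-time preparation: load the CSV into SQLite and index it.
conn = sqlite3.connect("stocks.db")
conn.execute("""CREATE TABLE IF NOT EXISTS prices (
    Symbol TEXT, Date TEXT, Open REAL, High REAL, Low REAL, Close REAL,
    Volume INTEGER, Dividend REAL, Split REAL,
    Adj_Open REAL, Adj_High REAL, Adj_Low REAL, Adj_Close REAL, Adj_Volume INTEGER)""")
with open("prices.csv") as f:
    reader = csv.reader(f)
    next(reader)  # skip the header row
    conn.executemany("INSERT INTO prices VALUES (" + ",".join("?" * 14) + ")", reader)
conn.execute("CREATE INDEX IF NOT EXISTS idx_symbol_date ON prices (Symbol, Date)")
conn.commit()

# Each subsequent lookup is a fast indexed query.
rows = conn.execute(
    "SELECT * FROM prices WHERE Symbol = ? ORDER BY Date DESC LIMIT 100", ("AMZN",)
).fetchall()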
If for some reason that is not feasible, for example because new files can come in at any moment and you cannot afford the preparation time before starting your analysis of them, you'll have to make the best of dealing with CSV directly, which is what the rest of my answer will focus on. Remember that it's a balancing act, though. Either you pay a lot upfront, or a bit extra for every lookup. Eventually, for some amount of lookups it would have been cheaper to pay upfront.
Optimization is about maximizing the amount of work not done. Using generators and the built-in csv module aren't going to help much with that in this case. You'd still be reading the whole file and parsing all of it, at least for line breaks. With that amount of data, it's a no-go.
Parsing requires reading, so you'll have to find a way around it first. Best practices of leaving all intricacies of the CSV format to the specialized module bear no meaning when they can't give you the performance you want. Some cheating must be done, but as little as possible. In this case, I suppose it is safe to assume that the start of a new line can be identified as b'\n"AMZN",' (sticking with your example). Yes, binary here, because remember: no parsing yet. You could scan the file as binary from the beginning until you find the first line. From there read the number of lines you need, decode and parse them the proper way, etc. No need for optimization there, because 100 lines are nothing to worry about compared to the hundreds of thousands of irrelevant lines you're not doing that work for.
Dropping all that parsing buys you a lot, but the reading needs to be optimized as well. Don't load the whole file into memory first and skip as many layers of Python as you can. Using mmap lets the OS decide what to load into memory transparently and lets you work with the data directly.
Still you're potentially reading the whole file, if the symbol is near the end. It's a linear search, which means the time it takes is linearly proportional to the number of lines in the file. You can do better though. Because the file is sorted, you could improve the function to instead perform a kind of binary search. The number of steps that will take (where a step is reading a line) is close to the binary logarithm of the number of lines. In other words: the number of times you can divide your file into two (almost) equally sized parts. When there are one million lines, that's a difference of five orders of magnitude!
Here's what I came up with, based on Python's own bisect_left with some measures to account for the fact that your "values" span more than one index:
import csv
from itertools import islice
import mmap

def iter_symbol_lines(f, symbol):
    # How to recognize the start of a line of interest
    ident = b'"' + symbol.encode() + b'",'
    # The memory-mapped file
    mm = mmap.mmap(f.fileno(), 0, access=mmap.ACCESS_READ)
    # Skip the header
    mm.readline()
    # The inclusive lower bound of the byte range we're still interested in
    lo = mm.tell()
    # The exclusive upper bound of the byte range we're still interested in
    hi = mm.size()
    # As long as the range isn't empty
    while lo < hi:
        # Find the position of the beginning of a line near the middle of the range
        mid = mm.rfind(b'\n', 0, (lo+hi)//2) + 1
        # Go to that position
        mm.seek(mid)
        # Is it a line that comes before lines we're interested in?
        if mm.readline() < ident:
            # If so, ignore everything up to right after this line
            lo = mm.tell()
        else:
            # Otherwise, ignore everything from right before this line
            hi = mid
    # We found where the first line of interest would be expected; go there
    mm.seek(lo)
    while True:
        line = mm.readline()
        if not line.startswith(ident):
            break
        yield line.decode()

with open(filename) as f:
    r = csv.reader(islice(iter_symbol_lines(f, 'AMZN'), 10))
    for line in r:
        print(line)
No guarantees about this code; I didn't pay much attention to edge cases, and I couldn't test with (any of) your file(s), so consider it a proof of concept. It is plenty fast, however; think tens of milliseconds on an SSD!
So I have an alternative solution which I ran and tested on my own as well, with a sample data set from Quandl that appears to have all the same headers and similar data. (This assumes that I haven't misunderstood the end result you're trying to achieve.)
I have a command line tool that one of our engineers built for us for parsing massive CSVs, since I deal with absurd amounts of data on a day-to-day basis. It is open source and you can get it here: https://github.com/DataFoxCo/gocsv
I also already wrote a short bash script for it in case you don't want to pipeline the commands, but it does also support pipelining.
The command to run the following short script follows a super simple convention:
bash tickers.sh wikiprices.csv 'AMZN' '2016-12-\d+|2016-11-\d+'
#!/bin/bash
dates="$3"

cat "$1" \
  | gocsv filter --columns 'ticker' --regex "$2" \
  | gocsv filter --columns 'date' --regex "$dates" > "$2"'-out.csv'
Both arguments, for the ticker and for the dates, are regexes.
You can add as many variations as you want into that one regex, separating them by |.
So if you wanted AMZN and MSFT, then you would simply modify it to this: AMZN|MSFT
I did something very similar with the dates, but I only limited my sample run to dates from this month or last month.
End Result
Starting data:
myusername$ gocsv dims wikiprices.csv
Dimensions:
Rows: 23946
Columns: 14
myusername$ bash tickers.sh wikiprices.csv 'AMZN|MSFT' '2016-12-\d+'
myusername$ gocsv dims AMZN|MSFT-out.csv
Dimensions:
Rows: 24
Columns: 14
Here is a sample run where I limited the output to only those 2 tickers and then to December only.
Voila: in a matter of seconds you have a second file saved with just the data you care about.
The gocsv program has great documentation, by the way, and a ton of other functions, e.g. running a vlookup basically at any scale (which is what inspired the creator to make the tool).
In addition to using csv.reader, I think using itertools.groupby would speed up looking for the wanted sections, so the actual iteration could look something like this:
import csv
from itertools import groupby
from operator import itemgetter  # for the keyfunc for groupby

def getData(wanted_symbol, filename):
    with open(filename) as file:
        reader = csv.reader(file)
        # so each line in reader is basically line[:-1].split(",") from the plain file
        for symb, lines in groupby(reader, itemgetter(0)):
            # so here symb is the symbol at the start of each line of lines
            # and lines is the lines that all have that symbol in common
            if symb != wanted_symbol:
                continue  # skip this whole section if it has a different symbol
            for line in lines:
                # here we have each line as a list of fields
                # for only the lines that have `wanted_symbol` as the first element
                <DO STUFF HERE>
So in the space of <DO STUFF HERE> you could have out.append(formatLineData(line)) to do what your current code does. However, the code for that function has a lot of unnecessary slicing and += operators, which I think are pretty expensive for lists (might be wrong). Another way you could apply the conversions is to have a list of all the conversions:
from datetime import datetime

def conv_date(date_str):
    return datetime.strptime(date_str, '%Y-%m-%d').date()

# the conversions applied to each element (taken from the original formatLineData)
castings = [str, conv_date,              # 0, 1
            float, float, float, float,  # 2:6
            int, int, int,               # 6:9
            float, float, float, float,  # 9:13
            int]                         # 13
then use zip to apply these to each field in a line in a list comprehension:
[conv(val) for conv, val in zip(castings, line)]
so you would replace <DO STUFF HERE> with out.append of that comprehension.
I'd also wonder if switching the order of groupby and reader would be better, since you don't need to parse most of the file as CSV, just the parts you are actually iterating over. You could use a keyfunc that separates just the first field of the string:
def getData(wanted_symbol, filename):
    out = []  # why are you starting this with strings in it?
    def checkMatch(line):  # define the function to only take the line
        # this would be the keyfunc for groupby in this example
        return line.split(",", 1)[0]  # only split once, return the first element
    with open(filename) as file:
        for symb, lines in groupby(file, checkMatch):
            # so here symb is the symbol at the start of each line of lines
            if symb != wanted_symbol:
                continue  # skip this whole section if it has a different symbol
            for line in csv.reader(lines):
                out.append([typ(val) for typ, val in zip(castings, line)])
    return out

I need to change a zip code into a series of dots and dashes (a barcode), but I can't figure out how

Here's what I've got so far:
def encodeFive(zip):
    zero  = "||:::"
    one   = ":::||"
    two   = "::|:|"
    three = "::||:"
    four  = ":|::|"
    five  = ":|:|:"
    six   = ":||::"
    seven = "|:::|"
    eight = "|::|:"
    nine  = "|:|::"
    codeList = [zero, one, two, three, four, five, six, seven, eight, nine]
    allCodes = zero + one + two + three + four + five + six + seven + eight + nine
    code = ""
    digits = str(zip)
    for i in digits:
        code = code + i
    return code
With this I'll get the original zip code in a string, but none of the numbers are encoded into the barcode. I've figured out how to encode one number, but it won't work the same way with five numbers.
codeList = ["||:::", ":::||", "::|:|", "::||:", ":|::|",
            ":|:|:", ":||::", "|:::|", "|::|:", "|:|::"]

barcode = "".join(codeList[int(digit)] for digit in str(zipcode))
Perhaps use a dictionary:
barcode = {'0': "||:::",
           '1': ":::||",
           '2': "::|:|",
           '3': "::||:",
           '4': ":|::|",
           '5': ":|:|:",
           '6': ":||::",
           '7': "|:::|",
           '8': "|::|:",
           '9': "|:|::",
           }

def encodeFive(zipcode):
    return ''.join(barcode[n] for n in str(zipcode))

print(encodeFive(72353))
# |:::|::|:|::||::|:|:::||:
PS. It is better not to name a variable zip, since doing so overrides the builtin function zip. And similarly, it is better to avoid naming a variable code, since code is a module in the standard library.
You're just adding i (the character in digits) to the string where I think you want to be adding codeList[int(i)].
The code would probably be much simpler by just using a dict for lookups.
I find it easier to use split() to create lists of strings:
codes = "||::: :::|| ::|:| ::||: :|::| :|:|: :||:: |:::| |::|: |:|::".split()

def zipencode(numstr):
    return ''.join(codes[int(x)] for x in str(numstr))

print zipencode("32345")
This is made in Python:
number = ["||:::", ":::||", "::|:|", "::||:", ":|::|",
          ":|:|:", ":||::", "|:::|", "|::|:", "|:|::"]

def encode(num):
    return ''.join(map(lambda x: number[int(x)], str(num)))

print encode(32345)
I don't know what language you are using, so I made an example in C#:
int zip = 72353;
string[] codeList = {
    "||:::", ":::||", "::|:|", "::||:", ":|::|",
    ":|:|:", ":||::", "|:::|", "|::|:", "|:|::"
};
string code = String.Empty;
while (zip > 0) {
    code = codeList[zip % 10] + code;
    zip /= 10;
}
return code;
Note: Instead of converting the zip code to a string and then converting each character back to a number, I calculated the digits numerically.
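The same numeric digit-by-digit idea in Python, as a minimal sketch:
codeList = ["||:::", ":::||", "::|:|", "::||:", ":|::|",
            ":|:|:", ":||::", "|:::|", "|::|:", "|:|::"]

def encode_five(zipcode):
    code = ""
    while zipcode > 0:
        code = codeList[zipcode % 10] + code  # take the last digit and prepend its bar pattern
        zipcode //= 10
    return code

print(encode_five(72353))   # |:::|::|:|::||::|:|:::||:
# Note: a ZIP with a leading zero would lose that digit in this purely numeric approach.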
Just for fun, here's a one-liner:
return String.Concat(zip.ToString().Select(c => "||::::::||::|:|::||::|::|:|:|::||::|:::||::|:|:|::".Substring(((c-'0') % 10) * 5, 5)).ToArray());
It appears you're trying to generate a "postnet" barcode. Note that the five-digit ZIP postnet barcodes were obsoleted by ZIP+4 postnet barcodes, which were obsoleted by ZIP+4+2 delivery point postnet barcodes, all of which are supposed to include a checksum digit and leading and ending framing bars. In any case, all of those forms are being obsoleted by the new "intelligent mail" 4-state barcodes, which require a lot of computational code to generate and no longer rely on straight digit-to-bars mappings. Search USPS.COM for more details.
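For illustration, a minimal sketch of a five-digit POSTNET encoding with the check digit and framing bars mentioned above, assuming the usual rule that the check digit brings the digit sum up to a multiple of 10:
codeList = ["||:::", ":::||", "::|:|", "::||:", ":|::|",
            ":|:|:", ":||::", "|:::|", "|::|:", "|:|::"]

def postnet(zipcode):
    digits = [int(d) for d in str(zipcode).zfill(5)]
    check = (10 - sum(digits) % 10) % 10          # check digit pads the sum to a multiple of 10
    bars = "".join(codeList[d] for d in digits + [check])
    return "|" + bars + "|"                       # leading and trailing framing (full) bars

print(postnet(72353))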
