Python: iterables & generators to replace my while true loops? - python

Let's start with my question: can you write better code than the code below?
FRAME_DELIMITER = b'\x0a\x0b\x0c\x0d'
def get_data():
    f = bytearray()
    # detect frame delimiter
    while True:
        f += read_byte()
        if f[-4:] == FRAME_DELIMITER:
            start = len(f)-2
            break
    # read data until next frame delimiter
    while True:
        f += self._read_byte()
        if f[-4:] == FRAME_DELIMITER:
            return f[start:-2]
In short, this code reads a data stream and returns an entire frame. Each frame is delimited by 0x0a 0x0b 0x0c. The read_byte function reads one byte from the data stream (it might be more convenient to retrieve a buffer of x bytes).
I had a look at the Python documentation to try writing this code in a more Pythonic way (and with better performance?).
I came across generators and iterators.
We could imagine creating a generator like this one:
def my_generator(self):
    while True:
        yield self._read_byte()
and play around with comprehensions and itertools like this:
f = b''.join(itertools.takewhile(lambda c: c != b'\x03', self.my_generator()))
But in fact I'm stuck, because I need to check for a delimiter pattern and not only a single character.
Could you point me in the right direction? Or maybe my code above is just what I need?
Thanks!

It's not practical to perform the test you're going for without some state, but you can hide the state in your generator!
You could make your generator read the frame itself, assuming the delimiter is a constant value (or you pass in the required delimiter). A collections.deque can allow it to easily preserve state only for the last four characters, so it's not just hiding large data storage in state:
import collections
def read_until_delimiter(self):
    # Note: If FRAME_DELIMITER is bytes, and _read_byte returns len 1 bytes objects
    # rather than ints, you'll need to tweak the definition of frame_as_deque to make it store bytes
    frame_as_deque = collections.deque(FRAME_DELIMITER)
    window = collections.deque(maxlen=len(FRAME_DELIMITER))
    while window != frame_as_deque:
        byte = self._read_byte()
        yield byte
        window.append(byte)  # Automatically ages off bytes to keep constant length after filling
Now your caller can just do:
f = bytearray(self.read_until_delimiter())
# Or bytearray().join(self.read_until_delimiter()) if reading bytes objects, not ints
start = len(f) - 2
Note: I defined the maxlen in terms of the length of FRAME_DELIMITER; your end-of-delimiter test would almost never pass, because you sliced off the last four bytes and compared them to a constant containing only three bytes.
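To make the sliding-window idea easy to try without the asker's class, here is a minimal self-contained sketch. It assumes a file-like source (io.BytesIO stands in for the real data stream), the three-byte delimiter described in the question's text, and two hypothetical helper names, read_until_delimiter and read_frame. Unlike the original get_data, it returns only the payload between two delimiters.

```python
import collections
import io

# Assumption: the three-byte delimiter from the question's description.
FRAME_DELIMITER = b'\x0a\x0b\x0c'

def read_until_delimiter(stream, delimiter=FRAME_DELIMITER):
    """Yield byte values (ints) up to and including the first delimiter."""
    target = collections.deque(delimiter)  # iterating bytes gives ints
    window = collections.deque(maxlen=len(delimiter))
    while window != target:
        chunk = stream.read(1)
        if not chunk:
            return  # stream ended before a full delimiter was seen
        window.append(chunk[0])  # oldest value drops off automatically
        yield chunk[0]

def read_frame(stream, delimiter=FRAME_DELIMITER):
    """Return the payload between the next two delimiters, or None."""
    # Discard everything up to and including the opening delimiter.
    for _ in read_until_delimiter(stream, delimiter):
        pass
    data = bytearray(read_until_delimiter(stream, delimiter))
    if not data.endswith(delimiter):
        return None  # incomplete frame: no closing delimiter arrived
    return bytes(data[:-len(delimiter)])

stream = io.BytesIO(b'AA\x0a\x0b\x0cBBpayload\x0a\x0b\x0ctail')
frame = read_frame(stream)
```

Only len(delimiter) values are kept as state, so memory use does not grow while skipping to the first delimiter.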

I think that by "better code" you mean code that does not slice the concatenated byte sequence, but uses a smart generator instead, with only one while loop:
# just to simulate your method
data = b'AA\x0a\x0b\x0cBBqfdqsfqsfqsvcwccvxcvvqsfq\x0a\x0b\x0cqsdfqs'
index = -1
def get_bytes():
    # you used two methods
    # return read_byte() if count == 2 else self._read_byte()
    global index
    index += 1
    return data[index:index + 1]
FRAME_DELIMITER = b'\x0a\x0b\x0c'
def get_data():
    def update_last_delimiter(v):
        """Update the trailing window with the last element read."""
        nonlocal last_four_byte
        if len(last_four_byte) < len(FRAME_DELIMITER):
            last_four_byte += v
        else:
            last_four_byte = last_four_byte[1:] + v
    count = 2
    last_four_byte = b''
    while True:
        # because you have two methods to extract bytes
        # replace get_bytes() by (read_byte() if count == 2 else self._read_byte())
        update_last_delimiter(get_bytes())
        # only yield items once the first DELIMITER has been found
        if count < 2:
            yield last_four_byte[1:2]
        if last_four_byte == FRAME_DELIMITER:
            count -= 1
            if not count:
                break
            else:
                # when the first DELIMITER is found we should yield the [-2] element
                yield last_four_byte[1:2]
print(b''.join(get_data()))
# b'\x0b\x0cBBqfdqsfqsfqsvcwccvxcvvqsfq\n\x0b'
The key here is to keep track of the last len(FRAME_DELIMITER) bytes.

Related

Create a function in python that replaces "to be honest" in a sentence with "TBH"

Create a function in python that replaces at least four different words or phrases with internet slang acronyms such as LOL, OMG, TBH. For example, if the user enters a sentence "Oh my god, I am scared to be honest." The output should be "OMG I am scared TBH". The program must not use any built-in find, replace, encode, index, or translate functions. The program can use indexing (i.e., [ ] ), slicing (i.e., :), the in operator, and the len() function.
This is what I have so far:
user_string = (input("Please enter a string: ")).lower()
punctuations = '''.,!@#$%^&*()[]{};:-'"\|<>/?_~'''
new_string = ""
list = []
for i in range(0, len(user_string)):
    if (user_string[i] not in punctuations):
        new_string = new_string + user_string[i]
print(new_string)
slang = "to be honest"
for i in range(0, len(slang)):
    for j in range(0, len(new_string)):
        if (new_string[j] == slang[i]):
            list.append(j)
            if (i < len(slang)):
                i = i + 1
        elif (new_string[j] != slang[i]):
            if (len(list) > 0):
                list.pop()
print(list)
First I am getting the sentence from the user and removing all the punctuations from the sentence. Then I have created a variable called slang which holds the slang that I want to replace in the sentence with the acronym "TBH".
I have nested for loops which compare the string that the user has entered to the first letter of the slang variable. If the letters are the same, it compares the next letter of the string with the next letter of the slang.
I'm getting an error from the last part. How do I check if "to be honest" is in the string that the user has entered? And if it is in the string, how do I replace it with "TBH"?
I cannot see any Python errors that your code will actually produce, given the number of guard clauses, so I will assume that what you mean by error is actually the program not working as you intended.
With that in mind, the main problem with your code is that you have nested for loops. This means that for any one character in slang, you check it against every character in new_string.
If you run through your code with this in mind, you will see that for every character in slang, you are attempting to add one value to the list and remove len(slang) - 1 values from it. Your clause, however, prevents this from causing a Python error.
I would also like to mention that the statement
if (i < len(slang)):
    i = i + 1
is completely unnecessary because i is already automatically incremented by the for loop, which could cause issues later. It is guarded by a clause though, which is why it isn't a problem yet.
If you're still stuck on this problem, here's my version on how to solve this exercise:
# This is a dictionary so we can automate the replacement in the `__main__` scope
targets = {'to be honest': 'TBH', 'oh my god': 'OMG'}
# Returns a list of intervals that tell where all occurrences of the
# `sequence` passed as parameter reside inside `source`.
#
# If `sequence` is not present, the list will be empty.
def findSequences(source, sequence):
    # This is our return value.
    intervals = []
    # len is O(1). But if you need to implement your own len function,
    # this might be handy to save for the linear complexity.
    srcLength = len(source)
    seqLength = len(sequence)
    # If the sequence is larger than source, it's not inside
    if (seqLength > srcLength):
        return intervals
    # If it's smaller than or equal to source, it might be
    else:
        buffer = ''
        for i in range(srcLength):
            buffer = ''
            # From a starting character on index `i`, we will create
            # a temporary buffer with the length of sequence.
            for j in range(seqLength):
                # We must take care to not go out of the source string
                # otherwise, there's no point in continuing on building
                # buffer.
                if (i+j >= srcLength):
                    break
                else:
                    buffer += source[i+j]
            # If this temporary buffer equals sequence, we found the
            # substring!
            if (buffer == sequence):
                # Record the interval of the substring
                intervals.append((i, i+j))
        # Out of the for-loop.
        return intervals
# Takes out any characters inside `punctuation` from source.
#
# Uses the `in` keyword on the if-statement. But as the post says,
# it's allowed.
def takeOutPunctuation(source, punctuation='.,!@#$%^&*()[]{};:-\'"\\|<>/?_~'):
    buffer = ''
    for char in source:
        if (char not in punctuation):
            buffer += char
    return buffer
# A naive approach would be not to find all intervals, but to find the first
# `phrase` occurrence inside the `source` string, replace it, and repeat. If
# you do that, replacing "TBH" with "TBH2" will loop forever, always
# appending "2" to the string.
#
# This function is smart enough to avoid that.
#
# It replaces all occurrences of the `phrase` string with the `target` string.
#
# As `findSequences` returns a list of all captures' intervals, the
# replacement will not get stuck in an infinite loop if we use
# parameters such as: myReplace(..., "TBH", "TBH2")
def myReplace(source, phrase, target):
    intervals = findSequences(source, phrase)
    if (len(intervals) == 0):
        return source
    else:
        # Append everything until the first capture
        buffer = source[:intervals[0][0]]
        # We insert this first interval just to write less code inside the for-loop.
        #
        # This is not a capture, it's just so we can access (i-1) when the iteration
        # starts.
        intervals.insert(0, (0, intervals[0][0]))
        # Start at the second position of the `intervals` array so we can access (i-1)
        # at the start of the iteration.
        for i in range(1, len(intervals)):
            # For every `phrase` capture, we append:
            # - everything that comes before the capture
            # - the `target` string
            buffer += source[intervals[i-1][1]+1:intervals[i][0]] + target
        # Once the iteration ends, we must append everything that comes
        # after the last capture.
        buffer += source[intervals[-1][1]+1:]
        # Return the modified string
        return buffer
if __name__ == '__main__':
    # Note: I didn't write input() here so we can see what the actual input is.
    user_string = 'Oh my god, I am scared to be honest and to be honest and to be honest!'.lower()
    user_string = takeOutPunctuation(user_string)
    # Automated Replacement
    for key in targets:
        user_string = myReplace(user_string, key, targets[key])
    # Print the output:
    print(user_string)
    # -> OMG i am scared TBH and TBH and TBH
Note: I used Python 3.10.2 to run this script.
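Since the exercise explicitly allows slicing, the search and the replacement can also be folded into a single pass. The following is only a compact sketch of that idea, not a replacement for the answer above; replace_phrase is a hypothetical name, and punctuation is assumed to have been stripped already.

```python
def replace_phrase(source, phrase, target):
    """Replace every occurrence of phrase in source using only slicing and len()."""
    n = len(phrase)
    if n == 0:
        return source  # nothing to search for; avoids an endless loop
    result = ''
    i = 0
    while i < len(source):
        # Compare a window of len(phrase) characters starting at i.
        if source[i:i + n] == phrase:
            result += target
            i += n  # jump past the match so it is not scanned again
        else:
            result += source[i]
            i += 1
    return result

sentence = 'oh my god i am scared to be honest'
for phrase, acronym in (('to be honest', 'TBH'), ('oh my god', 'OMG')):
    sentence = replace_phrase(sentence, phrase, acronym)
print(sentence)
```

Because the index jumps past each match, a replacement such as "TBH" to "TBH2" cannot trigger itself again.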

Variable table width with .format

I'm trying to display data from a CSV file in a text table. I've got to the point where it displays everything that I need; however, the table width still has to be set by hand, meaning that if the data is longer than the number set, issues begin.
I currently print the table using .format to sort out formatting. Is there a way to set the width of the data to a variable that is dependent on the length of the longest piece of data?
for i in range(len(list_l)):
    if i == 0:
        print(h_dashes)
        print('{:^1s}{:^26s}{:^1s}{:^26s}{:^1s}{:^26s}{:^1s}{:^26s}{:^1s}'.format('|', (list_l[i][0].upper()),'|', (list_l[i][1].upper()),'|',(list_l[i][2].upper()),'|', (list_l[i][3].upper()),'|'))
        print(h_dashes)
    else:
        print('{:^1s}{:^26s}{:^1s}{:^26s}{:^1s}{:^26s}{:^1s}{:^26s}{:^1s}'.format('|', list_l[i][0], '|', list_l[i][1], '|', list_l[i][2],'|', list_l[i][3],'|'))
I realise that the code is far from perfect; however, I'm still a newbie, so it's pieced together from various tutorials.
You can use a two-pass approach: the first pass gets the maximum length for each field, and the second does what you're currently doing, with the calculated rather than fixed lengths. As per your example with four fields per line, the following shows the basic idea:
# Can set MINIMUM lengths here if desired, eg: lengths = [10, 0, 41, 7]
lengths = [0] * 4
fmtstr = None
# Note: "pass" is a reserved word in Python, so the loop variable is pass_num.
for pass_num in range(2):
    for i in range(len(list_l)):
        if pass_num == 0:
            # First pass sets lengths as per data.
            for field in range(4):
                lengths[field] = max(lengths[field], len(list_l[i][field]))
        else:
            # Second pass prints the data.
            # First, set format string if not yet set.
            if fmtstr is None:
                fmtstr = '|'
                for item in lengths:
                    fmtstr += '{:^%ds}|' % (item)
            # Now print item (and header stuff if first item).
            if i == 0: print(h_dashes)
            print(fmtstr.format(list_l[i][0].upper(), list_l[i][1].upper(), list_l[i][2].upper(), list_l[i][3].upper()))
            if i == 0: print(h_dashes)
The construction of the format string is done the first time you process an item in pass two.
It does so by taking a collection like [31,41,59] and giving you the string:
|{:^31s}|{:^41s}|{:^59s}|
There's little point using all those {:^1s} format specifiers when the | is not actually a varying item - you may as well code it directly into the format string.
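As a variation on the same two-pass idea, str.format also accepts a nested replacement field for the width, so no format string has to be assembled by hand. The sketch below uses made-up sample rows in place of list_l and a hypothetical helper called format_row; only the header row is upper-cased here, as in the question.

```python
# Made-up sample data standing in for list_l (first row is the header).
rows = [
    ["name", "city", "role", "id"],
    ["Ada Lovelace", "London", "Analyst", "1"],
    ["Grace Hopper", "New York City", "Rear Admiral", "2"],
]

# Pass 1: the widest cell in each column decides that column's width.
widths = [max(len(row[col]) for row in rows) for col in range(len(rows[0]))]

def format_row(row, widths):
    """Centre each cell in its column width and join the cells with '|'."""
    cells = ['{:^{w}}'.format(cell, w=w) for cell, w in zip(row, widths)]
    return '|' + '|'.join(cells) + '|'

# The rule line is as wide as the cells plus one '|' per column, plus one.
h_dashes = '-' * (sum(widths) + len(widths) + 1)

# Pass 2: build the table.
lines = [h_dashes, format_row([cell.upper() for cell in rows[0]], widths), h_dashes]
lines += [format_row(row, widths) for row in rows[1:]]
print('\n'.join(lines))
```

Deriving h_dashes from the computed widths keeps the rule lines aligned with the table when the data changes.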

Transposition Cipher in Python

I'm currently trying to code a transposition cipher in Python. However, I have reached a point where I'm stuck.
My code:
key = "german"
length = len(key)
plaintext = "if your happy and you know it clap your hands, clap your hands"
formatted = "".join(plaintext.split()).replace(",","")
def split_text(formatted,length):
    return [formatted[i:i + length] for i in range(0, len(formatted), length)]
split = split_text(formatted,length)
def encrypt():
    pass  # not written yet
I use that to count the length of the string, and I then use the length to determine how many columns to create within the program. So it would create this:
GERMAN
IFYOUR
HAPPYA
NDYOUK
NOWITC
LAPYOU
RHANDS
CLAPYO
URHAND
S
This is now where I'm stuck, as I want to get the program to create a string by combining the columns together. So it would combine each column to create:
IHNNLRCUSFADOAHLRYPYWPAAH .....
I know I would need a loop of some sort, but I'm unsure how I would tell the program to create such a string.
Thanks
You can use slices of the string to get every sixth letter, that is, in steps of length:
print(formatted[0::length])
#output:
ihnnlrcus
Then just loop through all the possible start indices in range(length) and link them all together:
def encrypt(formatted,length):
    return "".join([formatted[i::length] for i in range(length)])
Note that this doesn't actually use split_text; it takes formatted directly:
print(encrypt(formatted,length))
The problem with using split_text is that you then cannot make use of tools like zip, since they stop when the first iterator stops (so because the last group only has one character in it, you only get the one group from zip(*split)):
for i in zip("stuff that is important","a"):
    print(i)
#output:
('s', 'a')
#nothing else, since one of the iterators finished.
In order to use something like that, you would have to redefine the way zip works by allowing some of the iterators to finish and continuing until all of them are done:
def myzip(*iterators):
    iterators = tuple(iter(it) for it in iterators)
    while True: #broken when none of iterators still have items in them
        group = []
        for it in iterators:
            try:
                group.append(next(it))
            except StopIteration:
                pass
        if group:
            yield group
        else:
            return #none of the iterators still had items in them
then you can use this to process the split up data like this:
encrypted_data = ''.join(''.join(x) for x in myzip(*split))
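The standard library already has a zip variant that keeps going until the longest input is exhausted: itertools.zip_longest. A minimal sketch of the same column read-out with it follows; passing an empty string as fillvalue is what makes the short last row contribute nothing to the later columns. The function name encrypt_columns is just illustrative.

```python
from itertools import zip_longest

def split_text(text, length):
    """Cut text into rows of `length` characters (the last row may be shorter)."""
    return [text[i:i + length] for i in range(0, len(text), length)]

def encrypt_columns(text, length):
    """Read the rows column by column, padding short rows with ''."""
    rows = split_text(text, length)
    columns = zip_longest(*rows, fillvalue="")
    return "".join("".join(column) for column in columns)

plaintext = "if your happy and you know it clap your hands, clap your hands"
formatted = "".join(plaintext.split()).replace(",", "")
ciphertext = encrypt_columns(formatted, len("german"))
print(ciphertext)
```

This should produce the same string as the slice-based encrypt above, since both read the grid one column at a time.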

GET Request Flask

I have written something that works, but I am 100% sure that there is an even more efficient and faster way of doing what I did.
The code that I have written, essentially uses OpenBayes' library and creates a network with its nodes, relationships between nodes, and the probabilities and distributions associated with each of the nodes. Now, I was creating a GET request using Flask, in order to process the conditional probabilities by simply sending the request.
I will send some evidence (given values), and set the node in which I want its probability (observed value). Mathematically it looks like this:
Observed Value = O and Evidence = E1, ..., En, where n >= 1
P( O | E1, E2, ..., En)
My final goal would be to have a client/server ping the server hosting this code(with the right parameters) and constantly give me the final values of the observed probability, given the evidence (which could be 1 or more values). The code I have written so far for the GET request portion is:
@app.route('/evidence/evidence=<evidence>&observed=<obv>', methods=['GET'])
def get_evidence(evidence, obv):
    # Take <evidence> and <obv> split them up. For example:
    # 'cloudy1rain0sprinkler1' to 'cloudy1', 'rain0' and 'sprinkler1', all in a nice list.
    analyzeEvidence, observedNode = evidence.upper().strip(), obv.upper().strip()
    string, count, newCount, listOfEvidence = "", 0, 0, {}
    counter = sum(character.isdigit() for character in analyzeEvidence)
    # This portion is to set up all the evidences.
    for y in xrange(0, counter):
        string, newCount = "", count
        for x in xrange(newCount, len(analyzeEvidence)):
            count += 1
            if analyzeEvidence[x].isalpha() == True:
                string += str(analyzeEvidence[x])
            elif analyzeEvidence[x].isdigit() == True and string in allNodes:
                if int(analyzeEvidence[x]) == 1 or int(analyzeEvidence[x]) == 0:
                    listOfEvidence[string] = int(analyzeEvidence[x])
                    break
                else: abort(400)
                break
            else: abort(400)
    net.SetObs(listOfEvidence) # This would set the evidence like this: {"CLOUDY": 1, "RAIN":0}
    # This portion is to set up one single observed value
    string = ""
    for x in xrange(0, len(observedNode)):
        if observedNode[x].isalpha() == True:
            string += str(observedNode[x])
            if string == "WETGRASS":
                string = "WET GRASS"
        elif observedNode[x].isdigit() == True and string in allNodes:
            if int(observedNode[x]) == 1 or int(observedNode[x]) == 0:
                observedValue = int(observedNode[x])
                observedNode = string
                break
            else: abort(400)
        else: abort(400)
    return str(net.Marginalise(observedNode)[observedValue]) # Output returned is the value like: 0.7452
Given my code, is there any way to optimize it? Also, Is there a better way of passing these parameters that doesn't take so many lines like my code does? I was planning on setting fixed key parameters, but because my number of evidence can change per request, I thought this would be one way in doing so.
You can easily split your evidence input into a list of strings with this:
import re
# 'cloudy1rain0sprinkler1' => ['cloudy1', 'rain0', 'sprinkler1']
evidence_dict = {}
input_evidence = 'cloudy1rain0sprinkler1'
# looks for a run of letters followed by one or more digits
evidence_list = re.findall(r'([a-z]+\d+)', input_evidence.lower())
for evidence in evidence_list:
    # the capturing group keeps the digits in the result: ['cloudy', '1', '']
    name, val, _ = re.split(r'(\d+)', evidence)
    if name in allNodes:
        evidence_dict[name] = int(val)
# evidence_dict = {'cloudy': 1, 'rain': 0, 'sprinkler': 1}
You should be able to do something similar with the observations.
I would suggest you use an HTTP POST. That way you can send a JSON object which will already have the separation of variable names and values done for you, all you'll have to do is check that the variable names sent are valid in allNodes. It will also allow your variable list to grow somewhat arbitrarily.
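To illustrate the JSON suggestion, here is a minimal sketch of the validation step on its own, using only the json module. The payload shape (an "evidence" object plus an "observed" object with "node" and "value"), the ALL_NODES set and the parse_query name are all assumptions made for this example; in a Flask view the parsed body would typically come from request.get_json() rather than json.loads.

```python
import json

# Hypothetical stand-in for the asker's allNodes collection.
ALL_NODES = {"CLOUDY", "RAIN", "SPRINKLER", "WET GRASS"}

def parse_query(body, all_nodes=ALL_NODES):
    """Validate a JSON request body and return (evidence, node, value)."""
    payload = json.loads(body)
    evidence = {}
    for name, value in payload["evidence"].items():
        key = name.upper()
        # Every evidence entry must name a known node and be 0 or 1.
        if key not in all_nodes or value not in (0, 1):
            raise ValueError("bad evidence entry: %r" % name)
        evidence[key] = value
    node = payload["observed"]["node"].upper()
    value = payload["observed"]["value"]
    if node not in all_nodes or value not in (0, 1):
        raise ValueError("bad observed entry: %r" % node)
    return evidence, node, value

body = ('{"evidence": {"cloudy": 1, "rain": 0}, '
        '"observed": {"node": "wet grass", "value": 1}}')
evidence, node, value = parse_query(body)
```

With the names and values already separated by the JSON structure, the character-by-character parsing disappears, and the number of evidence entries can vary freely per request. A view could translate the ValueError into abort(400).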

Python, I need the following code to finish quicker

I need the following code to finish quicker without threads or multiprocessing. If anyone knows of any tricks that would be greatly appreciated. maybe for i in enumerate() or changing the list to a string before calculating, I'm not sure.
For the example below, I have attempted to recreate the variables using a random sequence, however this has rendered some of the conditions inside the loop useless ... which is ok for this example, it just means the 'true' application for the code will take slightly longer.
Currently on my i7, the example below (which will mostly bypass some of its conditions) completes in 1 second, I would like to get this down as much as possible.
import random
import time
import collections
import cProfile
def random_string(length=7):
    """Return a random string of given length"""
    return "".join([chr(random.randint(65, 90)) for i in range(length)])
LIST_LEN = 18400
original = [[random_string() for i in range(LIST_LEN)] for j in range(6)]
LIST_LEN = 5
SufxList = [random_string() for i in range(LIST_LEN)]
LIST_LEN = 28
TerminateHook = [random_string() for i in range(LIST_LEN)]
#^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ Exclude above from benchmark
ListVar = original[:]
for b in range(len(ListVar)):
    for c in range(len(ListVar[b])):
        #If its an int ... remove
        try:
            int(ListVar[b][c].replace(' ', ''))
            ListVar[b][c] = ''
        except: pass
        #if any second sufxList delete
        for d in range(len(SufxList)):
            if ListVar[b][c].find(SufxList[d]) != -1: ListVar[b][c] = ''
        for d in range(len(TerminateHook)):
            if ListVar[b][c].find(TerminateHook[d]) != -1: ListVar[b][c] = ''
    #remove all '' from list
    while '' in ListVar[b]: ListVar[b].remove('')
    print(ListVar[b])
ListVar = original[:]
That makes a shallow copy of original, so your changes to the second-level lists are going to affect the original as well. Are you sure that is what you want? It would be much better to build the new modified list from scratch.
for b in range(len(ListVar)):
    for c in range(len(ListVar[b])):
Yuck: whenever possible, iterate directly over lists.
#If its an int ... remove
try:
    int(ListVar[b][c].replace(' ', ''))
    ListVar[b][c] = ''
except: pass
You want to ignore spaces in the middle of numbers? That doesn't sound right. If the numbers can be negative you may want to use the try..except, but if they are only positive just use .isdigit().
#if any second sufxList delete
for d in range(len(SufxList)):
    if ListVar[b][c].find(SufxList[d]) != -1: ListVar[b][c] = ''
Is that just bad naming? SufxList implies you are looking for suffixes; if so, just use .endswith() (and note that you can pass a tuple in to avoid the loop). If you really do want to find out whether the suffix is anywhere in the string, use the in operator.
for d in range(len(TerminateHook)):
    if ListVar[b][c].find(TerminateHook[d]) != -1: ListVar[b][c] = ''
Again, use the in operator. Also, any() is useful here.
#remove all '' from list
while '' in ListVar[b]: ListVar[b].remove('')
And that while is O(n^2), i.e. it will be slow. You could use a list comprehension instead to strip out the blanks, but it is better just to build clean lists to begin with.
print(ListVar[b])
I think maybe your indentation was wrong on that print.
Putting these suggestions together gives something like:
suffixes = tuple(SufxList)
newListVar = []
for row in original:
    newRow = []
    newListVar.append(newRow)
    for value in row:
        if (not value.isdigit() and
                not value.endswith(suffixes) and
                not any(th in value for th in TerminateHook)):
            newRow.append(value)
    print(newRow)
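The shallow-copy remark at the top of this answer is easy to check on a tiny list. The sketch below uses made-up data; it shows that original[:] copies only the outer list, while building new rows (as the code above does) or using copy.deepcopy leaves the original untouched.

```python
import copy

original = [["AAA", "123"], ["BBB", "456"]]

# Building new rows from scratch never touches the original.
rebuilt = [[value for value in row if not value.isdigit()] for row in original]

# copy.deepcopy duplicates the inner lists as well.
deep = copy.deepcopy(original)
deep[1][1] = ''

# A slice copies only the outer list: the inner lists are shared,
# so this assignment also changes original[0].
shallow = original[:]
shallow[0][1] = ''

print(original)
print(rebuilt)
```

After this runs, original[0][1] has been blanked through the shallow copy, while original[1][1] is unaffected by the change made to the deep copy.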
