My question aims to use the else condition of a for-loop in a list comprehension.
example:
empty_list = []
def example_func(text):
    for a in text.split():
        for b in a.split(","):
            empty_list.append(b)
        else:
            empty_list.append(" ")
I would like to make it cleaner by using a list comprehension with both for-loops.
But how can I do this while including an escape clause for one of the loops (in this case the 2nd)?
I know I can use if with and without else in a list comprehension. But how about using else without an if statement?
Is there a way, so the interpreter will understand it as escape-clause of a for loop?
Any help is much appreciated!
EDIT:
Thanks for the answers! In fact I'm trying to translate Morse code.
The input is a string, containing morse codes.
Each word is separated by 3 spaces. Each letter of each word is separated by 1 space.
def decoder(code):
    str_list = []
    for i in code.split("   "):
        for e in i.split():
            str_list.append(morse_code_dic[e])
        else:
            str_list.append(" ")
    return "".join(str_list[:-1]).capitalize()
print(decoder(".. -   .-- .- ...   .-   --. --- --- -..   -.. .- -.--"))
I want to break down the whole sentence into words, then translate each word.
After the inner loop (translation of one word) is finished, it will run its else clause, adding a space, so that the structure of the whole sentence is preserved. That way, the 3 spaces will be translated to one space.
As noted in comments, that else does not really make all that much sense, since the purpose of an else after a for loop is actually to hold code for conditional execution if the loop terminates normally (i.e. not via break), which your loop always does, thus it is always executed.
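To make that concrete, here is a minimal sketch of how for/else behaves. The function name find_first_negative is made up for illustration and does not come from the question.

```python
def find_first_negative(numbers):
    """Return the first negative number, or None if there is none."""
    for n in numbers:
        if n < 0:
            result = n
            break  # leaving via break skips the else clause
    else:
        # Runs only when the loop finished without hitting break.
        result = None
    return result


print(find_first_negative([1, -2, 3]))  # -2
print(find_first_negative([1, 2, 3]))   # None
```

Without a break anywhere in the loop body, the else clause runs every time, which is why it adds nothing in the original decoder.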
So this is not really an answer to the question how to do that in a list comprehension, but more of an alternative. Instead of adding spaces after all words, then removing the last space and joining everything together, you could just use two nested join generator expressions, one for the sentence and one for the words:
def decoder(code):
    return " ".join("".join(morse_code_dic[e] for e in i.split())
                    for i in code.split("   ")).capitalize()
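For a version that can be run end to end, here is a sketch of the nested-join approach. The morse_code_dic below is a hypothetical partial table covering only the letters in the sample sentence (the question's real dictionary is not shown), and the three-space word gap is spelled as " " * 3 so it cannot be lost when copying.

```python
# Hypothetical partial table; only the letters needed for the sample.
morse_code_dic = {
    "..": "I", "-": "T", ".--": "W", ".-": "A", "...": "S",
    "--.": "G", "---": "O", "-..": "D", "-.--": "Y",
}

WORD_GAP = " " * 3  # words are separated by three spaces, letters by one


def decoder(code):
    # Outer join puts one space between words, inner join glues letters.
    return " ".join(
        "".join(morse_code_dic[symbol] for symbol in word.split())
        for word in code.split(WORD_GAP)
    ).capitalize()


sample = WORD_GAP.join(
    [".. -", ".-- .- ...", ".-", "--. --- --- -..", "-.. .- -.--"]
)
print(decoder(sample))  # It was a good day
```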
As mentioned in the comments, the else clause in your particular example is pointless because it always runs. Let's contrive an example that would let us investigate the possibility of simulating a break and else.
Take the following string:
s = 'a,b,c b,c,d c,d,e, d,e,f'
Let's say you wanted to split the string by spaces and commas as before, but you only wanted to preserve the elements of the inner split up to the first occurrence of c:
out = []
for i in s.split():
    for e in i.split(','):
        if e == 'c':
            break
        out.append(e)
    else:
        out.append('-')
The break can be simulated using the arcane two-arg form of iter, which accepts a callable and a termination value:
>>> x = list('abcd')
>>> list(iter(iter(x).__next__, 'c'))
['a', 'b']
You can implement the else by chaining the inner iterable with ['-'].
>>> from itertools import chain
>>> x = list('abcd')
>>> list(iter(chain(x, ['-']).__next__, 'c'))
['a', 'b']
>>> y = list('def')
>>> list(iter(chain(y, ['-']).__next__, 'c'))
['d', 'e', 'f', '-']
Notice that the placement of chain is crucial here. If you were to chain the dash to the outer iterator, it would always be appended, not only when c is not encountered:
>>> list(chain(iter(iter(x).__next__, 'c'), ['-']))
['a', 'b', '-']
You can now simulate the entire nested loop with a single expression:
from itertools import chain
out = [e for i in s.split() for e in iter(chain(i.split(','), ['-']).__next__, 'c')]
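As a quick check, the sketch below runs the explicit loops and the single expression side by side on the same string and compares the results. The name expected is introduced here only for the comparison.

```python
from itertools import chain

s = 'a,b,c b,c,d c,d,e, d,e,f'

# Explicit nested loops with break/else.
expected = []
for i in s.split():
    for e in i.split(','):
        if e == 'c':
            break
        expected.append(e)
    else:
        expected.append('-')

# Single expression: two-arg iter() stops at 'c', and the chained '-'
# is only reached when 'c' never shows up.
out = [e for i in s.split()
       for e in iter(chain(i.split(','), ['-']).__next__, 'c')]

print(out)  # ['a', 'b', 'b', 'd', 'e', 'f', '-']
```

Note that the sentinel is compared with ==, so this trick only works when the stop condition is equality with a single value.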
I need to know how to identify prefixes in strings in a list. For example,
list = ['nomad', 'normal', 'nonstop', 'noob']
The answer should be 'no', since every string in the list starts with 'no'.
I was wondering if there is a method that iterates over the letters of all the strings in the list at the same time and checks whether the letters at each position are all the same.
Use os.path.commonprefix; it will do exactly what you want.
In [1]: list = ['nomad', 'normal', 'nonstop', 'noob']
In [2]: import os.path as p
In [3]: p.commonprefix(list)
Out[3]: 'no'
As an aside, naming a list "list" will make it impossible to access the list class, so I would recommend using a different variable name.
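The same call as a plain script, using a variable name that does not shadow the built-in (the name words is just an illustrative choice). The second call is a reminder that commonprefix compares character by character, so on real paths the result may not be a valid path.

```python
import os.path

words = ['nomad', 'normal', 'nonstop', 'noob']
print(os.path.commonprefix(words))  # no

# Character-wise comparison, not path-component-wise.
print(os.path.commonprefix(['/usr/lib', '/usr/local']))  # /usr/l

# An empty list yields an empty string.
print(repr(os.path.commonprefix([])))  # ''
```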
Here is some code that uses no libraries (l is the list of strings):
for i in range(len(l[0])):
    if False in [l[0][:i] == j[:i] for j in l]:
        print(l[0][:i-1])
        break
gives output:
no
There is no built-in function to do this. If you are looking for short python code that can do this for you, here's my attempt:
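def longest_common_prefix(words):
    i = 0
    # Never look past the shortest word; otherwise identical words loop forever.
    limit = min(len(word) for word in words)
    while i <= limit and len(set([word[:i] for word in words])) <= 1:
        i += 1
    return words[0][:i-1]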
Explanation: words is an iterable of strings. The list comprehension
[word[:i] for word in words]
uses string slices to take the first i letters of each string. At the beginning, these would all be empty strings. Then, it would consist of the first letter of each word. Then the first two letters, and so on.
Casting to a set removes duplicates. For example, set([1, 2, 2, 3]) = {1, 2, 3}. By casting our list of prefixes to a set, we remove duplicates. If the length of the set is less than or equal to one, then they are all identical.
The counter i just keeps track of how many letters are identical so far.
We return words[0][:i-1]. We arbitrarily choose the first word and take the first i-1 letters (which would be the same for any word in the list). The reason that it's i-1 and not i is that i gets incremented before we check if all of the words still share the same prefix.
Here's a fun one:
l = ['nomad', 'normal', 'nonstop', 'noob']
def common_prefix(lst):
    for s in zip(*lst):
        if len(set(s)) == 1:
            yield s[0]
        else:
            return
result = ''.join(common_prefix(l))
Result:
'no'
To answer the spirit of your question - zip(*lst) is what allows you to "iterate letters in every string in the list at the same time". For example, list(zip(*lst)) would look like this:
[('n', 'n', 'n', 'n'), ('o', 'o', 'o', 'o'), ('m', 'r', 'n', 'o'), ('a', 'm', 's', 'b')]
Now all you need to do is find out the common elements, i.e. the len of set for each group, and if they're common (len(set(s)) == 1) then join it back.
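The steps above can also be sketched with itertools.takewhile, which stops at the first column whose letters differ. This is a variation on the generator shown earlier, not a replacement for it; the names columns and shared are illustrative.

```python
from itertools import takewhile


def common_prefix(words):
    columns = zip(*words)  # one tuple of characters per position
    # Keep columns only while every string has the same character there.
    shared = takewhile(lambda column: len(set(column)) == 1, columns)
    return ''.join(column[0] for column in shared)


print(common_prefix(['nomad', 'normal', 'nonstop', 'noob']))  # no
```

Because zip stops at the shortest string, the prefix can never be longer than the shortest input.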
As an aside, you probably don't want to call your list by the name list. Any time you call list() afterwards is gonna be a headache. It's bad practice to shadow built-in names (list is a built-in name, not a keyword, which is why Python lets you reassign it).
I'm trying to extract numbers that are mixed in sentences. I am doing this by splitting the sentence into elements of a list, and then I will iterate through each character of each element to find the numbers. For example:
String = "is2 Thi1s T4est 3a"
LP = String.split()
result = []
for e in LP:
    for i in e:
        if i in ('123456789'):
            result += i
This gives me the result I want, which is ['2', '1', '4', '3']. Now I want to write this as a list comprehension. After reading the post "List comprehension on a nested list?", I understood that the right code should be:
[i for e in LP for i in e if i in ('123456789') ]
My original code for the list comprehension approach was wrong, but I'm trying to wrap my head around the result I get from it.
My original incorrect code, which reversed the order:
[i for i in e for e in LP if i in ('123456789') ]
The result I get from that is:
['3', '3', '3', '3']
Could anyone explain the process that leads to this result please?
Just reverse the same process you found in the other post. Nest the loops in the same order:
for i in e:
    for e in LP:
        if i in ('123456789'):
            print(i)
The code requires both e and LP to be set beforehand, so the outcome you see depends entirely on other code run before your list comprehension.
If we presume that e was set to '3a' (the last element in LP from your code that ran full loops), then for i in e will run twice, first with i set to '3'. We then get a nested loop, for e in LP, and given your output, LP is 4 elements long. So that iterates 4 times, and on each iteration i == '3', so the if test passes and '3' is added to the output. The next iteration of for i in e: sets i = 'a'; the inner loop runs 4 times again, but now the if test fails.
However, we can't know for certain, because we don't know what code was run last in your environment that set e and LP to begin with.
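Here is a small sketch that reproduces the behaviour under the assumption made above, namely that e was left over as '3a'. The function reversed_order is hypothetical; it passes the stale value in explicitly so the example does not depend on earlier code.

```python
LP = "is2 Thi1s T4est 3a".split()

# Correct order: the for clauses read left to right, like nested loops.
digits = [ch for word in LP for ch in word if ch in "123456789"]
print(digits)  # ['2', '1', '4', '3']


def reversed_order(words, e):
    # The first iterable (e) is evaluated from the enclosing scope,
    # so it is whatever value e happened to have beforehand.
    return [i for i in e for e in words if i in "123456789"]


print(reversed_order(LP, "3a"))  # ['3', '3', '3', '3']
```

With a different leftover value for e the output changes, which is why the result cannot be explained from the comprehension alone.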
I'm not sure why your original code uses str.split(), then iterates over all the characters of each word. Whitespace would never pass your if filter anyway, so you could just loop directly over the full String value. The if test can be replaced with a str.isdigit() test:
digits = [char for char in String if char.isdigit()]
or even a regular expression:
digits = re.findall(r'\d', String)
and finally, if this is a reordering puzzle, you'd want to split out your strings into a number (for ordering) and the remainder (for joining); sort the words on the extracted number, and extract the remainder after sorting:
# to sort on numbers, extract the digits and turn to an integer
sortkey = lambda w: int(re.search(r'\d+', w).group())
# 'is2' -> 2, 'Thi1s' -> 1, etc.
# sort the words by sort key
reordered = sorted(String.split(), key=sortkey)
# -> ['Thi1s', 'is2', '3a', 'T4est']
# replace digits in the words and join again
rejoined = ' '.join(re.sub(r'\d+', '', w) for w in reordered)
# -> 'This is a Test'
From the question you asked in a comment ("how would you proceed to reorder the words using the list that we got as index?"):
We can use custom sorting to accomplish this. (Note that regex is not required, but makes it slightly simpler. Use any method to extract the number out of the string.)
import re
test_string = 'is2 Thi1s T4est 3a'
words = test_string.split()
words.sort(key=lambda s: int(re.search(r'\d+', s).group()))
print(words) # ['Thi1s', 'is2', '3a', 'T4est']
To remove the numbers:
words = [re.sub(r'\d', '', w) for w in words]
Final output is:
['This', 'is', 'a', 'Test']
I got this question in a quiz last week, a lot of people got it wrong, so I am pretty sure it will be on our midterm:
Write a function that takes as a parameter a list of strings and
returns a list containing the first letter of each of the strings.
That is, if the input parameter is ["Daniel","Nashyl","Orla",
"Simone","Zakaria"], your function should return ['D', 'N', 'O', 'S',
'Z']. The file you submit should include a main() function that
demonstrates that your function works.
I know you can use this [#:#] to print any letters of a word or sentence.
>>> x = "Java, Python, Ruby"
>>> x[:13]
'Java, Python,'
>>> x[:-1]
'Java, Python, Rub'
>>> x[:1]
'J'
But I get confused when it comes to printing the first letter of a bunch of words. I also think that the ".split" function is needed here. I am using python 3.3.3
def first_letters(lst):
    return [s[:1] for s in lst]
def main():
    lst = ["Daniel","Nashyl","Orla", "Simone","Zakaria"]
    assert first_letters(lst) == ['D', 'N', 'O', 'S', 'Z']
if __name__=="__main__":
    main()
str.split takes a string and breaks it into a list of strings. Your input is already a list of strings, therefore you do not need .split.
"mystring"[:1] gets the first character of the string (or "" if the string is "" to begin with). Apply this to each string in the input list, and return the result.
You can do this with a list comprehension. You'll definitely want to read about them! Here's a minimal example that does what you're looking for:
>>> L = ["Daniel","Nashyl","Orla", "Simone","Zakaria"]
>>> [item[0] for item in L]
['D', 'N', 'O', 'S', 'Z']
This loops through each name in your list and creates a new list from the first letter of each item in the original list. For example, "Daniel"[0] == 'D'. No .split is needed.
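One difference between indexing with [0] and slicing with [:1] only shows up with empty strings. The sketch below assumes the input might contain one, which the quiz example does not.

```python
def first_letters(lst):
    # Slicing never fails: an empty string simply yields ''.
    return [s[:1] for s in lst]


names = ["Daniel", "", "Orla"]
print(first_letters(names))  # ['D', '', 'O']

try:
    [s[0] for s in names]  # indexing an empty string raises
except IndexError as exc:
    print("indexing failed:", exc)
```

If every name is guaranteed to be non-empty, both spellings give the same list.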
List comprehensions are cool, and you should learn to use them indeed, but let me explain a bit what's going on here, since in your question you said you're confused how to do it with a bunch of strings.
So, you have a list of strings. Lists are an iterable collection, which means we can iterate through it using, for example, a for loop:
words = ["Daniel","Nashyl","Orla", "Simone","Zakaria"]
for word in words:
    print(word[:1])
I'm sure you were taught about loops like this in class. Now, instead of printing the first letter, let's construct a new list that contains those letters:
result = []
for word in words:
    result.append(word[:1])
Here I created a new list, then for every word, I appended the starting letter of that word to the new list. A list comprehension does the same thing, with a more obscure syntax, more elegance, and a bit more efficiency:
result = [word[:1] for word in words]
This is the gist of it.
My current Python project will require a lot of string splitting to process incoming packages. Since I will be running it on a pretty slow system, I was wondering what the most efficient way to go about this would be. The strings would be formatted something like this:
Item 1 | Item 2 | Item 3 <> Item 4 <> Item 5
Explanation: This particular example would come from a list where the first two items are a title and a date, while item 3 to item 5 would be invited people (the number of those can be anything from zero to n, where n is the number of registered users on the server).
From what I see, I have the following options:
1. Repeatedly use split()
2. Use a regular expression and regular expression functions
3. Some other Python functions I have not thought about yet (there are probably some)
Solution 1 would include splitting at | and then splitting the last element of the resulting list at <> for this example, while solution 2 would probably result in a regular expression like:
((.+)|)+((.+)(<>)?)+
Okay, this regular expression is horrible, I can see that myself. It is also untested. But you get the idea.
Now, I am looking for the way that a) takes the least amount of time and b) ideally uses the least amount of memory. If only one of the two is possible, I would prefer less time. The ideal solution would also work for strings that have more items separated with | and strings that completely lack the <>. At least the regular expression-based solution would do that.
My understanding would be that split() would use more memory (since you basically get two resulting lists, one that splits at | and the second one that splits at <>), but I don't know enough about Python's implementation of regular expressions to judge how the regular expression would perform. split() is also less dynamic than a regular expression if it comes to different numbers of items and the absence of the second separator. Still, I can't shake the impression that Python can do this better without regular expressions, and that's why I am asking.
Some notes:
Yes, I could just benchmark both solutions, but I'm trying to learn something about Python in general and how it works here, and if I just benchmark these two, I still don't know what Python functions I have missed.
Yes, optimizing at this level is only really required for high-performance stuff, but as I said, I am trying to learn things about Python.
Addition: in the original question, I completely forgot to mention that I need to be able to distinguish the parts that were separated by | from the parts separated by <>, so a simple flat list as generated by re.split("\||<>", input) (as proposed by obmarg) will not work too well. Solutions fitting this criterion are much appreciated.
To sum the question up: Which solution would be the most efficient one, for what reasons?
Due to multiple requests, I have run some timeit on the split()-solution and the first proposed regular expression by obmarg, as well as the solutions by mgibsonbr and duncan:
import timeit
import re
def splitit(input):
    res0 = input.split("|")
    res = []
    for element in res0:
        t = element.split("<>")
        if t != [element]:
            res0.remove(element)
            res.append(t)
    return (res0, res)
def regexit(input):
    return re.split( "\||<>", input )
def mgibsonbr(input): # Solution by mgibsonbr
    items = re.split(r'\||<>', input) # Split input in items
    offset = 0
    result = [] # The result: strings for regular items, lists for <> separated ones
    acc = None
    for i in items:
        delimiter = '|' if offset+len(i) < len(input) and input[offset+len(i)] == '|' else '<>'
        offset += len(i) + len(delimiter)
        if delimiter == '<>': # Will always put the item in a list
            if acc is None:
                acc = [i] # Create one if doesn't exist
                result.append(acc)
            else:
                acc.append(i)
        else:
            if acc is not None: # If there was a list, put the last item in it
                acc.append(i)
            else:
                result.append(i) # Add the regular items
            acc = None # Clear the list, since what will come next is a regular item or a new list
    return result
def split2(input): # Solution by duncan
    res0 = input.split("|")
    res1, res2 = [], []
    for r in res0:
        if "<>" in r:
            res2.append(r.split("<>"))
        else:
            res1.append(r)
    return res1, res2
print "mgibs:", timeit.Timer("mgibsonbr('a|b|c|de|f<>ge<>ah')","from __main__ import mgibsonbr").timeit()
print "split:", timeit.Timer("splitit('a|b|c|de|f<>ge<>ah')","from __main__ import splitit").timeit()
print "split2:", timeit.Timer("split2('a|b|c|de|f<>ge<>ah')","from __main__ import split2").timeit()
print "regex:", timeit.Timer("regexit('a|b|c|de|f<>ge<>ah')","from __main__ import regexit").timeit()
print "mgibs:", timeit.Timer("mgibsonbr('a|b|c|de|f<>ge<>aha|b|c|de|f<>ge<>aha|b|c|de|f<>ge<>aha|b|c|de|f<>ge<>aha|b|c|de|f<>ge<>aha|b|c|de|f<>ge<>aha|b|c|de|f<>ge<>aha|b|c|de|f<>ge<>aha|b|c|de|f<>ge<>aha|b|c|de|f<>ge<>aha|b|c|de|f<>ge<>ah')","from __main__ import mgibsonbr").timeit()
print "split:", timeit.Timer("splitit('a|b|c|de|f<>ge<>aha|b|c|de|f<>ge<>aha|b|c|de|f<>ge<>aha|b|c|de|f<>ge<>aha|b|c|de|f<>ge<>aha|b|c|de|f<>ge<>aha|b|c|de|f<>ge<>aha|b|c|de|f<>ge<>aha|b|c|de|f<>ge<>aha|b|c|de|f<>ge<>aha|b|c|de|f<>ge<>ah')","from __main__ import splitit").timeit()
print "split2:", timeit.Timer("split2('a|b|c|de|f<>ge<>aha|b|c|de|f<>ge<>aha|b|c|de|f<>ge<>aha|b|c|de|f<>ge<>aha|b|c|de|f<>ge<>aha|b|c|de|f<>ge<>aha|b|c|de|f<>ge<>aha|b|c|de|f<>ge<>aha|b|c|de|f<>ge<>aha|b|c|de|f<>ge<>aha|b|c|de|f<>ge<>ah')","from __main__ import split2").timeit()
print "regex:", timeit.Timer("regexit('a|b|c|de|f<>ge<>aha|b|c|de|f<>ge<>aha|b|c|de|f<>ge<>aha|b|c|de|f<>ge<>aha|b|c|de|f<>ge<>aha|b|c|de|f<>ge<>aha|b|c|de|f<>ge<>aha|b|c|de|f<>ge<>aha|b|c|de|f<>ge<>aha|b|c|de|f<>ge<>aha|b|c|de|f<>ge<>ah')","from __main__ import regexit").timeit()
The results:
mgibs: 14.7349407408
split: 6.403942732
split2: 3.68306812233
regex: 5.28414318792
mgibs: 107.046683735
split: 46.0844590775
split2: 26.5595985591
regex: 28.6513302646
At the moment, it looks like split2 by duncan beats all other algorithms, regardless of length (with this limited dataset at least), and it also looks like mgibsonbr's solution has some performance issues (sorry about that, but thanks for the solution regardless).
I was slightly surprised that split() performed so badly in your code, so I looked at it a bit more closely and noticed that you're calling list.remove() in the inner loop. Also you're calling split() an extra time on each string. Get rid of those and a solution using split() beats the regex hands down on shorter strings and comes a pretty close second on the longer one.
import timeit
import re
def splitit(input):
    res0 = input.split("|")
    res = []
    for element in res0:
        t = element.split("<>")
        if t != [element]:
            res0.remove(element)
            res.append(t)
    return (res0, res)
def split2(input):
    res0 = input.split("|")
    res1, res2 = [], []
    for r in res0:
        if "<>" in r:
            res2.append(r.split("<>"))
        else:
            res1.append(r)
    return res1, res2
def regexit(input):
    return re.split( "\||<>", input )
rSplitter = re.compile("\||<>")
def regexit2(input):
    return rSplitter.split(input)
print("split: ", timeit.Timer("splitit( 'a|b|c|de|f<>ge<>ah')","from __main__ import splitit").timeit())
print("split2:", timeit.Timer("split2( 'a|b|c|de|f<>ge<>ah')","from __main__ import split2").timeit())
print("regex: ", timeit.Timer("regexit( 'a|b|c|de|f<>ge<>ah')","from __main__ import regexit").timeit())
print("regex2:", timeit.Timer("regexit2('a|b|c|de|f<>ge<>ah')","from __main__ import regexit2").timeit())
print("split: ", timeit.Timer("splitit( 'a|b|c|de|f<>ge<>aha|b|c|de|f<>ge<>aha|b|c|de|f<>ge<>aha|b|c|de|f<>ge<>aha|b|c|de|f<>ge<>aha|b|c|de|f<>ge<>aha|b|c|de|f<>ge<>aha|b|c|de|f<>ge<>aha|b|c|de|f<>ge<>aha|b|c|de|f<>ge<>aha|b|c|de|f<>ge<>ah')","from __main__ import splitit").timeit())
print("split2:", timeit.Timer("split2( 'a|b|c|de|f<>ge<>aha|b|c|de|f<>ge<>aha|b|c|de|f<>ge<>aha|b|c|de|f<>ge<>aha|b|c|de|f<>ge<>aha|b|c|de|f<>ge<>aha|b|c|de|f<>ge<>aha|b|c|de|f<>ge<>aha|b|c|de|f<>ge<>aha|b|c|de|f<>ge<>aha|b|c|de|f<>ge<>ah')","from __main__ import split2").timeit())
print("regex: ", timeit.Timer("regexit( 'a|b|c|de|f<>ge<>aha|b|c|de|f<>ge<>aha|b|c|de|f<>ge<>aha|b|c|de|f<>ge<>aha|b|c|de|f<>ge<>aha|b|c|de|f<>ge<>aha|b|c|de|f<>ge<>aha|b|c|de|f<>ge<>aha|b|c|de|f<>ge<>aha|b|c|de|f<>ge<>aha|b|c|de|f<>ge<>ah')","from __main__ import regexit").timeit())
print("regex2:", timeit.Timer("regexit2('a|b|c|de|f<>ge<>aha|b|c|de|f<>ge<>aha|b|c|de|f<>ge<>aha|b|c|de|f<>ge<>aha|b|c|de|f<>ge<>aha|b|c|de|f<>ge<>aha|b|c|de|f<>ge<>aha|b|c|de|f<>ge<>aha|b|c|de|f<>ge<>aha|b|c|de|f<>ge<>aha|b|c|de|f<>ge<>ah')","from __main__ import regexit2").timeit())
Which gives the following result:
split: 1.8427431439631619
split2: 1.0897291360306554
regex: 1.6694280610536225
regex2: 1.2277749050408602
split: 14.356198082969058
split2: 8.009285948995966
regex: 9.526430513011292
regex2: 9.083608677960001
And of course split2() gives the nested lists that you wanted whereas the regex solution doesn't.
Compiling the regex will improve performance. It does make a slight difference, but Python caches compiled regular expressions so the saving is not as much as you might expect. I think usually it isn't worth doing it for speed (though it can be in some cases), but it is often worthwhile to make the code clearer.
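On the list.remove() call mentioned above: besides the extra cost, removing from a list while iterating over it can skip elements. The benchmark string has only one <> group at the very end, so this does not show up in the timings. Here is a minimal sketch with a made-up input that has two adjacent <> groups (the parameter is renamed to text, otherwise the functions match the ones above).

```python
def splitit(text):
    res0 = text.split("|")
    res = []
    for element in res0:
        t = element.split("<>")
        if t != [element]:
            res0.remove(element)  # mutates the list being iterated
            res.append(t)
    return (res0, res)


def split2(text):
    res1, res2 = [], []
    for r in text.split("|"):
        if "<>" in r:
            res2.append(r.split("<>"))
        else:
            res1.append(r)
    return res1, res2


sample = "a|b<>c|d<>e|f"
print(splitit(sample))  # (['a', 'd<>e', 'f'], [['b', 'c']])
print(split2(sample))   # (['a', 'f'], [['b', 'c'], ['d', 'e']])
```

After 'b<>c' is removed, 'd<>e' slides into its slot and the loop moves past it, so it is never split.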
I'm not sure if it's the most efficient, but certainly the easiest to code seems to be something like this:
>>> input = "Item 1 | Item 2 | Item 3 <> Item 4 <> Item 5"
>>> re.split( "\||<>", input )
['Item 1 ', ' Item 2 ', ' Item 3 ', ' Item 4 ', ' Item 5']
I would think there's a fair chance of it being more efficient than a plain old split as well (depending on the input data) since you'd need to perform the second split operation on every string output from the first split, which doesn't seem likely to be efficient for either memory or time.
Though having said that I could easily be wrong, and the only way to be sure would be to time it.
Calling split multiple times is likely to be inefficient, because it might create unneeded intermediary strings. Using a regex like the one you proposed won't work, since a repeated capturing group only keeps its last match, not every one of them. Splitting using a regex, like obmarg suggested, seems to be the best route, assuming a "flattened" list is what you're looking for.
If you don't want a flattened list, you can split using a regex first and then iterate over the results, checking the original input to see which delimiter was used:
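items = re.split(r'\||<>', input)
offset = 0
for i in items:
    # The length check avoids an IndexError on the last item, which has no delimiter after it
    delimiter = '|' if offset+len(i) < len(input) and input[offset+len(i)] == '|' else '<>'
    offset += len(i) + len(delimiter)
    # Do something with i, depending on whether | or <> was the delimiter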
Finally, if you don't want the substrings created at all (using only the start and end indices to save space, for instance), re.finditer might do the job. Iterate over the delimiters, and do something to the text between them depending on which delimiter (| or <>) was found. It's a more complex operation, since you'll have to handle many corner cases, but might be worth it depending on your needs.
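A rough sketch of that re.finditer idea follows. The helper name spans and the (start, end, delimiter) tuple layout are my own choices for illustration, and the corner cases mentioned above (empty items, surrounding whitespace) are not handled.

```python
import re

DELIMS = re.compile(r"\||<>")


def spans(text):
    """Yield (start, end, delimiter) for each item in text.

    delimiter is the separator that follows the item, or None for the
    last item. No substrings are created; only indices are reported.
    """
    pos = 0
    for match in DELIMS.finditer(text):
        yield pos, match.start(), match.group()
        pos = match.end()
    yield pos, len(text), None


for start, end, delimiter in spans("a|b<>c"):
    print(start, end, delimiter)
```

The caller can slice text[start:end] only for the items it actually needs.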
Update: for your particular case, where the input format is uniform, obmarg's solution is the best one. If you must, post-process the result to have a nested list:
split_result = re.split( "\||<>", input )
result = [split_result[0], split_result[1], [i for i in split_result[2:] if i]]
(that last list comprehension is to ensure you'll get [] instead of [''] if there are no items after the last |)
Update 2: After reading the updated question, I finally understood what you're trying to achieve. Here's the full example, using the framework suggested earlier:
items = re.split(r'\||<>', input) # Split input in items
offset = 0
result = [] # The result: strings for regular items, lists for <> separated ones
acc = None
for i in items:
    delimiter = '|' if offset+len(i) < len(input) and input[offset+len(i)] == '|' else '<>'
    offset += len(i) + len(delimiter)
    if delimiter == '<>': # Will always put the item in a list
        if acc is None:
            acc = [i] # Create one if doesn't exist
            result.append(acc)
        else:
            acc.append(i)
    else:
        if acc is not None: # If there was a list, put the last item in it
            acc.append(i)
        else:
            result.append(i) # Add the regular items
        acc = None # Clear the list, since what will come next is a regular item or a new list
print(result)
Tested it with your example, the result was:
['a', 'b', 'c', 'de', ['f', 'ge', 'aha'],
'b', 'c', 'de', ['f', 'ge', 'aha'],
'b', 'c', 'de', ['f', 'ge', 'aha'],
'b', 'c', 'de', ['f', 'ge', 'aha'],
'b', 'c','de', ['f', 'ge', 'aha'],
'b', 'c', 'de', ['f', 'ge', 'aha'],
'b', 'c', 'de', ['f', 'ge', 'aha'],
'b', 'c', 'de', ['f', 'ge', 'aha'],
'b', 'c', 'de', ['f', 'ge', 'aha'],
'b', 'c', 'de', ['f', 'ge', 'aha'],
'b', 'c', 'de', ['f', 'ge', 'ah']]
If you know that <> is not going to appear elsewhere in the string then you could replace '<>' with '|' followed by a single split:
>>> input = "Item 1 | Item 2 | Item 3 <> Item 4 <> Item 5"
>>> input.replace("<>", "|").split("|")
['Item 1 ', ' Item 2 ', ' Item 3 ', ' Item 4 ', ' Item 5']
This will almost certainly be faster than doing multiple splits. It may or may not be faster than using re.split - timeit is your friend.
On my system with the sample string you supplied, my version is more than three times faster than re.split:
>>> timeit input.replace("<>", "|").split("|")
1000000 loops, best of 3: 980 ns per loop
>>> import re
>>> timeit re.split(r"\||<>", input)
100000 loops, best of 3: 3.07 us per loop
(N.B.: This is using IPython, which has timeit as a built-in command).
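Outside IPython, the same comparison can be sketched with the standard timeit module. The function names replace_split and regex_split are illustrative, and no timings are quoted here because they depend on the machine and Python version.

```python
import re
import timeit

sample = "Item 1 | Item 2 | Item 3 <> Item 4 <> Item 5"


def replace_split(text):
    return text.replace("<>", "|").split("|")


def regex_split(text):
    return re.split(r"\||<>", text)


# Both approaches should agree on the flat result before being timed.
assert replace_split(sample) == regex_split(sample)

for func in (replace_split, regex_split):
    seconds = timeit.timeit(lambda: func(sample), number=10000)
    print(func.__name__, seconds)
```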
You can make use of replace. First replace <> with |, and then split by |.
def replace_way(input):
    return input.replace('<>','|').split('|')
Time performance:
import timeit
import re
def splitit(input):
    res0 = input.split("|")
    res = []
    for element in res0:
        t = element.split("<>")
        if t != [element]:
            res0.remove(element)
            res.append(t)
    return (res0, res)
def regexit(input):
    return re.split( "\||<>", input )
def replace_way(input):
    return input.replace('<>','|').split('|')
print "split:", timeit.Timer("splitit('a|b|c|de|f<>ge<>ah')","from __main__ import splitit").timeit()
print "regex:", timeit.Timer("regexit('a|b|c|de|f<>ge<>ah')","from __main__ import regexit").timeit()
print "replace:",timeit.Timer("replace_way('a|b|c|de|f<>ge<>ah')","from __main__ import replace_way").timeit()
print "split:", timeit.Timer("splitit('a|b|c|de|f<>ge<>aha|b|c|de|f<>ge<>aha|b|c|de|f<>ge<>aha|b|c|de|f<>ge<>aha|b|c|de|f<>ge<>aha|b|c|de|f<>ge<>aha|b|c|de|f<>ge<>aha|b|c|de|f<>ge<>aha|b|c|de|f<>ge<>aha|b|c|de|f<>ge<>aha|b|c|de|f<>ge<>ah')","from __main__ import splitit").timeit()
print "regex:", timeit.Timer("regexit('a|b|c|de|f<>ge<>aha|b|c|de|f<>ge<>aha|b|c|de|f<>ge<>aha|b|c|de|f<>ge<>aha|b|c|de|f<>ge<>aha|b|c|de|f<>ge<>aha|b|c|de|f<>ge<>aha|b|c|de|f<>ge<>aha|b|c|de|f<>ge<>aha|b|c|de|f<>ge<>aha|b|c|de|f<>ge<>ah')","from __main__ import regexit").timeit()
print "replace:",timeit.Timer("replace_way('a|b|c|de|f<>ge<>aha|b|c|de|f<>ge<>aha|b|c|de|f<>ge<>aha|b|c|de|f<>ge<>aha|b|c|de|f<>ge<>aha|b|c|de|f<>ge<>aha|b|c|de|f<>ge<>aha|b|c|de|f<>ge<>aha|b|c|de|f<>ge<>aha|b|c|de|f<>ge<>aha|b|c|de|f<>ge<>ah')","from __main__ import replace_way").timeit()
Results on my machine:
split: 11.8682055461
regex: 12.7430856814
replace: 2.54299265006
split: 79.2124379066
regex: 68.6917008003
replace: 10.944842347
def display_hand(hand):
    for letter in hand.keys():
        for j in range(hand[letter]):
            print letter,
This prints something like: b e h q u w x. This is the desired output.
How can I modify this code to get the output only when the function has finished its loops?
Something like below code causes me problems as I can't get rid of dictionary elements like commas and single quotes when printing the output:
def display_hand(hand):
    dispHand = []
    for letter in hand.keys():
        for j in range(hand[letter]):
            ##code##
    print dispHand
UPDATE
I find John's answer very elegant. Allow me, however, to expand on Kugel's response:
Kugel's approach answered my question. However, I kept running into an additional issue: the function would always return None as well as the output. Reason: whenever you don't explicitly return a value from a function in Python, None is implicitly returned. I couldn't find a way to explicitly return the hand. With Kugel's approach I got closer, but the hand is still buried in a for loop.
You can do this in one line with a generator expression that has two for clauses:
print ' '.join(letter for letter, count in hand.iteritems() for i in range(count))
Let's break that down piece by piece. I'll use a sample dictionary that has a couple of counts greater than 1, to show the repetition part working.
>>> hand
{'h': 3, 'b': 1, 'e': 2}
Get the letters and counts in a form that we can iterate over.
>>> list(hand.iteritems())
[('h', 3), ('b', 1), ('e', 2)]
Now just the letters.
>>> [letter for letter, count in hand.iteritems()]
['h', 'b', 'e']
Repeat each letter count times.
>>> [letter for letter, count in hand.iteritems() for i in range(count)]
['h', 'h', 'h', 'b', 'e', 'e']
Use str.join to join them into one string.
>>> ' '.join(letter for letter, count in hand.iteritems() for i in range(count))
'h h h b e e'
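If the goal is to get the string back from the function instead of None, one option is to return the joined string and print it separately. This is a sketch written for Python 3, so it uses hand.items() where the lines above use iteritems(); the helper name hand_to_string is made up.

```python
def hand_to_string(hand):
    # Repeat each letter `count` times and join with single spaces.
    return ' '.join(
        letter for letter, count in hand.items() for _ in range(count)
    )


def display_hand(hand):
    # Printing stays in its own function, which still returns None.
    print(hand_to_string(hand))


hand = {'h': 3, 'b': 1, 'e': 2}
text = hand_to_string(hand)
display_hand(hand)
```

On Python 3.7 and later a dict keeps insertion order, so the letters come out in the order they were added.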
Your ##code perhaps?
dispHand.append(letter)
Update:
To print your list then:
for item in dispHand:
    print item,
Another option, without a nested loop:
"".join((x+' ') * y for x, y in hand.iteritems()).strip()
Use
" ".join(sequence)
to print a sequence without commas and the enclosing brackets.
If you have integers or other stuff in the sequence
" ".join(str(x) for x in sequence)