python: what does the comma do in += s,? - python

I am doing a problem, the input is strings:
["abc","bcd","acef","xyz","az","ba","a","z"]
The code is listed below.
def groupStrings(self, strings):
    groups = collections.defaultdict(list)
    for s in strings:
        tmp=[0]*len(s)
        for i in range(len(s)):
            tmp[i]=(ord(s[i])-ord(s[0]))%26
        tmptuple=tuple(tmp)
        groups[tmptuple] += s,
    return groups.values()
So in groups[tmptuple]+=s,
if I remove the comma ','
I get
[["a","b","c","b","c","d","x","y","z"],["a","c","e","f"],["a","z"],["a","z","b","a"]]
instead of
[["abc","bcd","xyz"],["acef"],["a","z"],["az","ba"]]
Without the comma, groups does not add the whole string s. Can anyone explain why the comma makes a difference, and why it does not work without the comma?

The trailing comma makes a tuple, with a single element, s. Python doesn't require parentheses to make a tuple unless there is ambiguity (e.g. with function call parens); aside from the empty tuple (()), you can usually make tuples with just commas, no parentheses at all. In this case, a single trailing comma, s,, is equivalent to (s,).
Since groups has list values, this means that doing += s, is equivalent to .append(s) (technically, it's closer to .extend((s,)), but the end result is the same). Someone is probably trying to save a few keystrokes.
If you omitted the comma, it would be doing list += str, interpreting the str as a sequence of characters and extending the list with each of the resulting len 1 strings, as you observed.
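The difference is easy to see in isolation. This is only a small sketch; the variable names are illustrative and not taken from the question:

```python
# A trailing comma builds a one-element tuple, with or without parentheses.
s = "abc"
one_tuple = s,            # same as (s,)

with_comma = []
with_comma += s,          # extends the list with the tuple's single element

without_comma = []
without_comma += s        # extends the list with each character of the string

appended = []
appended.append(s)        # the more conventional spelling

print(one_tuple)          # ('abc',)
print(with_comma)         # ['abc']
print(without_comma)      # ['a', 'b', 'c']
print(appended)           # ['abc']
```

In new code, .append(s) is probably the clearer choice, since the trailing comma is easy to miss.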

Related

How to remove a substring from a list of strings?

I have a list of strings that all share a common form: "pp:actual_string". I do not know for sure what the prefix "pp" will be; the : acts as a delimiter, and everything before it should be excluded from the result.
I have solved the problem using the brute force approach, but I would like to see a clever method, maybe something like regex.
Note : Some strings might not have this "pp:string" format, and could be already a perfect string, i.e. without the delimiter.
This is my current solution:
ll = ["pp17:gaurav","pp17:sauarv","pp17:there","pp17:someone"]
res=[]
for i in ll:
    g=""
    for j in range(len(i)):
        if i[j] == ':':
            index=j+1
    res.append(i[index:len(i)])
print(res)
Is there a way that I can do it without creating an extra list ?
Whilst regex is an incredibly powerful tool with a lot of capabilities, using a "clever method" is not necessarily the best idea if you are unfamiliar with its principles.
Your problem is one that can be solved without regex by splitting on the : character using the str.split() method, and just returning the last part by using the [-1] index value to represent the last (or only) string that results from the split. This will work even if there isn't a :.
list_with_prefixes = ["pp:actual_string", "perfect_string", "frog:actual_string"]
cleaned_list = [x.split(':')[-1] for x in list_with_prefixes]
print(cleaned_list)
This is a list comprehension that takes each of the strings in turn (x) and splits it on the : character. The split returns a list containing the prefix (if it exists) and the suffix, and the comprehension builds a new list with only the suffix (i.e. item [-1] in the list that results from the split). In this example, it returns:
['actual_string', 'perfect_string', 'actual_string']
Here are a few options, based upon different assumptions.
Most explicit
if s.startswith('pp:'):
    s = s[len('pp:'):] # aka 3
If you want to remove anything before the first :
s = s.split(':', 1)[-1]
Regular expressions:
Same as startswith
s = re.sub('^pp:', '', s)
Same as split, but more careful with 'pp:' and slower
s = re.match('(?:^pp:)?(.*)', s).group(1)
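The options above behave the same on simple input but diverge once a string contains more than one colon. A quick comparison, using made-up sample strings (the "a:b:c" case is an assumption added for illustration):

```python
import re

samples = ["pp:actual_string", "perfect_string", "a:b:c"]

# Keep only what follows the last colon.
after_last = [s.split(":")[-1] for s in samples]

# Keep everything after the first colon.
after_first = [s.split(":", 1)[-1] for s in samples]

# Strip only a literal "pp:" prefix.
literal_prefix = [re.sub(r"^pp:", "", s) for s in samples]

print(after_last)      # ['actual_string', 'perfect_string', 'c']
print(after_first)     # ['actual_string', 'perfect_string', 'b:c']
print(literal_prefix)  # ['actual_string', 'perfect_string', 'a:b:c']
```

Which one is right depends on whether the actual strings may themselves contain a colon.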

Operate on part of sequence while returning whole sequence

I want to shorten a python class name by truncating all but the last part ie: module.path.to.Class => mo.pa.to.Class.
This could be accomplished by splitting the string, storing the list in a variable, operating on all but the last part, and then joining the parts back together.
I would like to know if there is a way to do this in one step ie:
split to parts
create two copies of sequence (tee ?)
apply truncation to one sequence and not the other
join selected parts of sequence
Something like:
'.'.join( [chain(map(lambda x: x[:2], foo[:-1]), bar[-1]) for foo, bar in tee(name.split('.'))] )
But I'm unable to figure out working with ...foo, bar in tee(...
If you want to do it by splitting, you can split once on the last dot first, and then process only the first part by splitting it again to get the package indices, then shorten each to its first two characters, and finally join everything back together in the end. If you insist on doing it inline:
name = "module.path.to.Class"
short = ".".join([[x[:2] for x in p.split(".")] + [n] for p, n in [name.rsplit(".", 1)]][0])
print(short) # mo.pa.to.Class
This creates unnecessary lists just so it can traverse the list comprehension waters safely. In reality it probably ends up being slower than just doing it in a normal, procedural fashion:
def shorten_path(source):
    indices = source.split(".")
    return ".".join(x[:2] for x in indices[:-1]) + "." + indices[-1]
name = "module.path.to.Class"
print(shorten_path(name)) # mo.pa.to.Class
You could do this in one line with a regular expression:
>>> re.sub(r'(\b\w{2})\w*(\.)', r'\1\2', 'module.path.to.Class')
'mo.pa.to.Class'
The pattern r'(\b\w{2})\w*(\.)' captures two matches: the first two letters of a word, and the dot at the end of the word.
The substitution pattern r'\1\2' concatenates the two captured groups - the first two letters of the word and the dot.
No count parameter is passed to re.sub so all occurrences of the pattern are substituted.
The final word - the class name - is not truncated because it isn't followed by a dot, so it doesn't match the pattern.
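To check the regex against a few edge cases, it can be wrapped in a function and compared with a split-based version. The function names are hypothetical, and shorten_split is a sketch using str.rpartition rather than code from either answer:

```python
import re

def shorten_regex(name):
    # Truncate every component that is followed by a dot to two characters.
    return re.sub(r"(\b\w{2})\w*(\.)", r"\1\2", name)

def shorten_split(name):
    # Split off the class name once, from the right.
    package, dot, cls = name.rpartition(".")
    if not dot:
        return name  # no dots at all, nothing to shorten
    return ".".join(part[:2] for part in package.split(".")) + "." + cls

for name in ["module.path.to.Class", "a.Class", "Class"]:
    print(name, "->", shorten_regex(name), "|", shorten_split(name))
```

Both leave a bare class name untouched, and both leave one-letter package names as they are.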

Confusion with string split method in python

Consider the following example
a= 'Apple'
b = a.split(',')
print(b)
Output is ['Apple'].
I don't understand why it returns a list even though there is no ',' character in Apple.
There might be cases where we use split expecting more than one element in the list, but because the separator is not present in the string, there is only one element. Wouldn't it be better if this mistake were caught by the split method itself?
The behaviour of a.split(',') when no commas are present in a is perfectly consistent with the way it behaves when there are a positive number of commas in a.
a.split(',') says to split string a into a list of substrings that are delimited by ',' in a; the delimiter is not preserved in the substrings.
If 1 comma is found you get 2 substrings in the list, if 2 commas are found you get 3 substrings in the list, and in general, if n commas are found you get n+1 substrings in the list. So if 0 commas are found you get 1 substring in the list.
If you want 0 substrings in the list, then you'll need to supply a string with -1 commas in it. Good luck with that. :)
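The n commas, n+1 substrings rule can be checked directly. A small sketch with made-up sample strings:

```python
samples = ["Apple", "Apple,Pear", "Apple,Pear,Fig", ",", ""]
results = {text: text.split(",") for text in samples}

for text, parts in results.items():
    # n separators always give n + 1 parts, including n = 0
    assert len(parts) == text.count(",") + 1
    print(repr(text), "->", parts)
```

Even the empty string follows the rule: it contains zero commas, so it splits into one (empty) substring.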
The docstring of that method says:
Return a list of the words in the string S, using sep as the delimiter string.
The delimiter is used to separate multiple parts of the string; having only one part is not an error.
That's the way the split() function works. If you do not want that behaviour, you can implement your own my_split() function as follows:
def my_split(s, d=' '):
    return s.split(d) if d in s else s
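If the goal is to have the missing separator caught as a mistake, as the question suggests, another option is to raise an exception instead of returning the string unchanged. This is only a sketch; strict_split is a hypothetical name, not a built-in:

```python
def strict_split(s, sep):
    # Hypothetical helper: refuse to split when the separator never occurs.
    if sep not in s:
        raise ValueError(f"separator {sep!r} not found in {s!r}")
    return s.split(sep)

print(strict_split("Apple,Pear", ","))  # ['Apple', 'Pear']

try:
    strict_split("Apple", ",")
except ValueError as exc:
    print("caught:", exc)
```

Unlike my_split(), this always returns a list when it returns at all, so callers do not have to handle two different return types.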

Python help: generating all possible strings given optional character

I'm trying to write a function in Python that, given a string and an optional character, generates all possible strings from the given string. The big picture is using this function to eventually help with turning a CFG into Chomsky normal form.
For example, given a string 'ASA' and optional character 'A', I want to be able to generate the following array:
['SA', 'AS', 'S']
Since these are all the possible strings that can be generated by omitting one or both of the A's of the original string.
For reference, I've looked at the following question: generating all possible strings given a grammar rule, but the problem seemed to be slightly different since the rules of the grammar were defined in the original string.
Here is my thinking on how to go about solving the problem: Have a recursive function that takes a string and an optional character, loops through the string to find the first optional character, then create a new string that has the first optional character omitted, add this to a return array, and call itself again with the string it just generated and the same optional character.
Then, after all recursions return, go back to the original string and omit the second occurrence of the optional character, and repeat the process.
This would continue on until all occurrences of the optional character were omitted.
I was wondering if there was any better way of doing this than by using the type of logic I just described.
As was mentioned in the comments it could also be done with itertools. Here's a quick demonstration:
import itertools
mystr='ABCDABCDAABCD'
optional_letter='A'
indices=[i for i,char in enumerate(list(mystr)) if char==optional_letter]
def remover(combination,mystr):
    mylist=list(mystr)
    # delete from the highest index down so earlier indices stay valid
    for index in combination[::-1]:
        del mylist[index]
    return ''.join(mylist)
all_strings=[remover(combination,mystr)
             for n in range(len(indices)+1)
             for combination in itertools.combinations(indices,n)]
for string in all_strings: print(string)
It first finds the indices of all occurrences of your character, then removes every combination of these indices from your string. If you have two optional letters in a row in the string, you will get duplicates, which can be removed by using:
set(all_strings)
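The same idea can be packaged as a function that collects results in a set from the start, so duplicates never accumulate. A minimal sketch; omit_optional is a hypothetical name:

```python
from itertools import combinations

def omit_optional(s, optional):
    # Positions where the optional character occurs.
    positions = [i for i, ch in enumerate(s) if ch == optional]
    results = set()
    for n in range(len(positions) + 1):
        for drop in combinations(positions, n):
            dropped = set(drop)
            results.add("".join(ch for i, ch in enumerate(s) if i not in dropped))
    return results

print(sorted(omit_optional("ASA", "A")))  # ['AS', 'ASA', 'S', 'SA']
print(sorted(omit_optional("AAB", "A")))  # ['AAB', 'AB', 'B']
```

Note that the result includes the original string, because dropping zero occurrences is one of the combinations.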
This is based on a combinations function that returns a list of all possible combinations (without regard to order) of the elements of a list. Pass it a list of the indexes of the occurrences of your character, and the rest is straightforward:
def indexes(string, char):
    return [i for i in range(len(string)) if string[i] == char]
def combinations(chars, max_length=None):
    if max_length is None:
        max_length = len(chars)
    if len(chars) == 0:
        return [[]]
    nck = []
    for sub_list in combinations(chars[1:], max_length):
        nck.append(sub_list)
        if len(sub_list) < max_length:
            nck.append(chars[:1] + sub_list)
    return nck
def substringsOmitting(string, char):
    subbies = []
    for combo in combinations(indexes(string, char)):
        keepChars = [string[i] for i in range(len(string)) if not i in combo]
        subbies.append(''.join(keepChars))
    return subbies
if __name__ == '__main__':
    print(substringsOmitting('ASA', 'A'))
output: ['ASA', 'SA', 'AS', 'S']
It does contain the string itself, too. But this should be a good starting point.
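To get exactly the list from the question, the unchanged original has to be left out. One way to sketch that is a different technique, itertools.product over keep-or-drop choices, rather than the index combinations used above; omissions is a hypothetical name:

```python
from itertools import product

def omissions(s, optional):
    # For each character, list the choices: an optional character may be
    # kept or replaced by the empty string, any other character is kept.
    choices = [(ch, "") if ch == optional else (ch,) for ch in s]
    variants = {"".join(pick) for pick in product(*choices)}
    variants.discard(s)   # leave out the unchanged original
    return variants

print(sorted(omissions("ASA", "A")))  # ['AS', 'S', 'SA']
```

Using a set also removes the duplicates that appear when optional characters are adjacent.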

Remove items in a sequence from a string Python

Okay so I'm trying to make a function that will take a string and a sequence of items (in the form of either a list, a tuple or a string) and remove all items from that list from the string.
So far my attempt looks like this:
def eliminate(s, bad_characters):
    for item in bad_characters:
        s = s.strip(item)
    return s
However, for some reason when I try this or variations of this, it only returns either the original string or a version with only the first item in bad_characters removed.
>>> eliminate("foobar",["o","b"])
'foobar'
Is there a way to remove all items in bad_characters from the given string?
Your solution doesn't work because str.strip() only removes characters from the ends of the string, i.e. a run of matching characters at the leftmost or rightmost end. So, in the case of 'foobar', str.strip() with a single-character argument would only have an effect if that character were 'f' or 'r'.
You could eliminate more of the inner characters with strip, but you would need to include one of the outer characters as well.
>>> 'foobar'.strip('of')
'bar'
>>> 'foobar'.strip('o')
'foobar'
Here's how to do it by string-joining a generator expression:
def eliminate(s, bad_characters):
    bc = set(bad_characters)
    return ''.join(c for c in s if c not in bc)
Try replacing the bad characters with empty strings.
def eliminate(s, bad_characters):
    for item in bad_characters:
        s = s.replace(item, '')
    return s
strip() doesn't work because it only removes characters from the beginning and end of the original string.
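The set-based filter and the replace-based loop agree for single characters, but they differ when an item is longer than one character. A side-by-side sketch, with both functions renamed here (hypothetical names) so they can coexist:

```python
def eliminate_chars(s, bad_characters):
    # Character-by-character filter: each item is one set member,
    # so only single-character items can ever match.
    bad = set(bad_characters)
    return "".join(c for c in s if c not in bad)

def eliminate_items(s, bad_items):
    # Substring removal: each item is removed wherever it occurs,
    # whatever its length.
    for item in bad_items:
        s = s.replace(item, "")
    return s

print(eliminate_chars("foobar", ["o", "b"]))   # far
print(eliminate_items("foobar", ("o", "b")))   # far
print(eliminate_chars("foobar", "ob"))         # far
print(eliminate_chars("foobar", ["oo", "b"]))  # fooar
print(eliminate_items("foobar", ["oo", "b"]))  # far
```

Both accept a list, a tuple or a string, as the question asks.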
strip is not the right choice for this task, as it only removes characters from the leading and trailing ends of the string. Instead you can use the str.translate method (in Python 3, with a deletion table built by str.maketrans):
>>> s,l="foobar",["o","b"]
>>> s.translate(str.maketrans('','',''.join(l)))
'far'
Try this recursive version; it may be time-consuming:
def eliminate(s, seq):
    # note: seq.pop() consumes the list that was passed in
    while seq:
        return eliminate(s.replace(seq.pop(),""), seq)
    return s
>>> eliminate("foobar",["o","b"])
'far'
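One thing to watch with the recursive version is that seq.pop() empties the caller's list, and that it needs a list in the first place (tuples and strings have no pop). Passing a copy sidesteps both. A sketch, with the function restated using if in place of while, which behaves the same here:

```python
def eliminate(s, seq):
    # Same logic as the recursive answer above.
    if seq:
        return eliminate(s.replace(seq.pop(), ""), seq)
    return s

bad = ["o", "b"]
print(eliminate("foobar", list(bad)))   # far (a copy was passed)
print(bad)                              # ['o', 'b']
print(eliminate("foobar", bad))         # far
print(bad)                              # []
print(eliminate("foobar", list("ob")))  # far (string converted to a list first)
```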
