Why can't I call remove() directly after splitting a string in Python?

I want to remove a member from a delimited string, for example remove "a" from "a,b,c" and get back "b,c". The position of "a" in the string does not matter (it could be "a,b,c", "b,a,c", and so on).
DELIMITER = ","
def remove(member, members_string):
    """removes target from string"""
    members = members_string.split(DELIMITER)
    members.remove(member)
    return DELIMITER.join(members)
print remove("a","b,a,c")
output: b,c
The above function works as expected.
My question is that I accidentally modified my code so that it looks like this:
def remove_2(member, members_string):
    """removes target from string"""
    members = members_string.split(DELIMITER).remove(member)
    return DELIMITER.join(members)
You can see that I modified
members = members_string.split(DELIMITER)
members.remove(member)
to
members = members_string.split(DELIMITER).remove(member)
After that the function is broken; it throws:
Traceback (most recent call last):
  File "test.py", line 15, in <module>
    remove_2("a","b,a,c")
  File "test.py", line 11, in remove_2
    return DELIMITER.join(members)
TypeError
Based on my understanding, members_string.split(DELIMITER) is a list, so calling remove() on it is allowed, and I expected it to return the new list, which would then be stored in members. But when I print members_string.split(DELIMITER).remove(member), it is None, which explains the TypeError. My question is: why does it return None rather than a list with the elements "b" and "c"?

remove() does not return anything (that is, it returns None). It modifies the list it is called on: lists are mutable, so creating a new list would waste CPU time and memory, and returning the same list would be somewhat pointless.
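A minimal sketch of the difference, written in Python 3 syntax. The name remove_member is only an illustrative stand-in for the question's remove function:

```python
DELIMITER = ","

def remove_member(member, members_string):
    """Return members_string without the first occurrence of member."""
    members = members_string.split(DELIMITER)
    members.remove(member)   # mutates members in place and returns None
    return DELIMITER.join(members)

parts = "b,a,c".split(DELIMITER)
result = parts.remove("a")   # the return value is None, not the list
print(result)                # None
print(parts)                 # ['b', 'c']
print(remove_member("a", "b,a,c"))  # b,c
```

Keeping the split() result in a variable before calling remove() is what keeps the list reachable afterwards.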

This was already answered here.
Quote from the pythondocs:
You might have noticed that methods like insert, remove or sort that only modify the list have no return value printed – they return the default None. This is a design principle for all mutable data structures in Python.
Mutable objects like lists are modified in place by their data-manipulation methods, like remove(), insert() and append(), which therefore return None.
Immutable objects like strings cannot be changed in place, so their data-manipulation methods, like replace() or upper(), return a new object instead.
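The contrast can be illustrated with built-in types only (Python 3 syntax; the variable names are arbitrary):

```python
numbers = [3, 1, 2]
in_place = numbers.sort()        # list.sort() mutates the list and returns None
print(in_place, numbers)         # None [1, 2, 3]

fresh = sorted([3, 1, 2])        # sorted() builds and returns a new list
print(fresh)                     # [1, 2, 3]

text = "a,b,c"
upper = text.replace(",", "-").upper()   # each str method returns a new string
print(text, upper)               # a,b,c A-B-C
```

Because every string method hands back a new string, calls can be chained; a mutating list method ends the chain with None.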
Method chaining
The next sample shows that your intended method-chaining works with strings:
# Every replace() call catches a different position of member
# in member_string, like
# a,b,member
# member,b,c
# a,member,c
# (this works on substrings, so it is only safe when no member
# is contained in another member)
DELIMITER = ","
def remove(member, member_string):
    members = member_string.replace(DELIMITER + member, '').replace(member + DELIMITER, '').replace(DELIMITER + member + DELIMITER, '').upper()
    return members
# puts out B,C
print remove("a","b,a,c")
List comprehension
For clever list manipulation, Python has a feature named list comprehension (it is usually faster than an equivalent for loop that appends to a list). You can read about it in the Python documentation.
DELIMITER = ","
def remove(member, members_string):
    members = [m.upper() for m in members_string.split(DELIMITER) if m != member]
    return DELIMITER.join(members)
# puts out B,C
print remove("a","b,a,c")
In addition, you could look into generators, which are covered in the Python docs, but I don't know much about them.
BTW, flame me as a noob, but I hate it when people call Python a beginner language. Even though the list comprehension above looks easy, it could be intimidating for a beginner, couldn't it?

Related

Why does print() on tuple call __repr__ for elements of tuple and not __str__? [duplicate]

I've noticed that when an instance with an overloaded __str__ method is passed to the print function as an argument, it prints as intended. However, when passing a container that contains one of those instances to print, it uses the __repr__ method instead. That is to say, print(x) displays the correct string representation of x, and print(x, y) works correctly, but print([x]) or print((x, y)) prints the __repr__ representation instead.
First off, why does this happen? Secondly, is there a way to correct that behavior of print in this circumstance?
The problem with the container using the objects' __str__ would be the total ambiguity -- what would it mean, say, if print L showed [1, 2]? L could be ['1, 2'] (a single item list whose string item contains a comma) or any of four 2-item lists (since each item can be a string or int). The ambiguity of type is common for print of course, but the total ambiguity for number of items (since each comma could be delimiting items or part of a string item) was the decisive consideration.
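A short sketch of that ambiguity in Python 3 syntax. Tag is a hypothetical class introduced only for illustration:

```python
class Tag:
    """Hypothetical class whose str() and repr() differ."""
    def __init__(self, name):
        self.name = name
    def __str__(self):
        return self.name
    def __repr__(self):
        return "Tag(%r)" % self.name

items = [Tag("1, 2")]
print(items[0])                      # 1, 2
print(items)                         # [Tag('1, 2')]
print(str([1, 2]), str(["1, 2"]))    # [1, 2] ['1, 2']
```

If the list used str() on its items, the one-element list above would print as [1, 2], indistinguishable from a list of two integers.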
I'm not sure why exactly the __str__ method of a list returns the __repr__ of the objects contained within - so I looked it up: [Python-3000] PEP: str(container) should call str(item), not repr(item)
Arguments for it:
-- containers refuse to guess what the user wants to see on str(container) - surroundings, delimiters, and so on;
-- repr(item) usually displays type information - apostrophes around strings, class names, etc.
So it's more clear about what exactly is in the list (since the object's string representation could have commas, etc.). The behavior is not going away, per Guido "BDFL" van Rossum:
Let me just save everyone a lot of
time and say that I'm opposed to this
change, and that I believe that it
would cause way too much disturbance
to be accepted this close to beta.
Now, there are two ways to resolve this issue for your code.
The first is to subclass list and implement your own __str__ method.
class StrList(list):
    def __str__(self):
        string = "["
        for index, item in enumerate(self):
            string += str(item)
            if index != len(self)-1:
                string += ", "
        return string + "]"
class myClass(object):
    def __str__(self):
        return "myClass"
    def __repr__(self):
        return object.__repr__(self)
And now to test it:
>>> objects = [myClass() for _ in xrange(10)]
>>> print objects
[<__main__.myClass object at 0x02880DB0>, #...
>>> objects = StrList(objects)
>>> print objects
[myClass, myClass, myClass #...
>>> import random
>>> sample = random.sample(objects, 4)
>>> print sample
[<__main__.myClass object at 0x02880F10>, ...
I personally think this is a terrible idea. Some functions - such as random.sample, as demonstrated - actually return list objects - even if you sub-classed lists. So if you take this route there may be a lot of result = StrList(function(mylist)) calls, which could be inefficient. It's also a bad idea because then you'll probably have half of your code using regular list objects since you don't print them and the other half using StrList objects, which can lead to your code getting messier and more confusing. Still, the option is there, and this is the only way to get the print function (or statement, for 2.x) to behave the way you want it to.
The other solution is just to write your own function strList() which returns the string the way you want it:
def strList(theList):
    string = "["
    for index, item in enumerate(theList):
        string += str(item)
        if index != len(theList)-1:
            string += ", "
    return string + "]"
>>> mylist = [myClass() for _ in xrange(10)]
>>> print strList(mylist)
[myClass, myClass, myClass #...
Both solutions require that you refactor existing code, unfortunately - but the behavior of str(container) is here to stay.
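As a possible variation on the helper-function approach, the same formatting can be sketched with str.join in Python 3 syntax. The names str_list and MyClass are illustrative and not taken from the answer:

```python
def str_list(items):
    """Format items like a list literal, but using str() on each element."""
    return "[" + ", ".join(str(item) for item in items) + "]"

class MyClass:
    def __str__(self):
        return "myClass"

print(str_list([MyClass(), MyClass()]))   # [myClass, myClass]
print(str_list([]))                        # []
```

join handles the separators, so no index bookkeeping is needed for the last element.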
Because when you print the list, generally you're looking from the programmer's perspective, or debugging. If you meant to display the list, you'd process its items in a meaningful way, so repr() is used.
If you want your objects to be printed that way while in containers, define __repr__:
class MyObject:
    def __str__(self): return ""
    __repr__ = __str__
Of course, __repr__ should ideally return a string that could be used as code to recreate your object, but you can do what you want.

Understanding a Python function

I need some help understanding a function that i want to use but I'm not entirely sure what some parts of it do. I understand that the function is creating dictionaries from reads out of a Fasta-file. From what I understand this is supposed to generate pre- and suffix dictionaries for ultimately extending contigs (overlapping dna-sequences).
The code:
def makeSuffixDict(reads, lenSuffix = 20, verbose = True):
    lenKeys = len(reads[0]) - lenSuffix
    dict = {}
    multipleKeys = []
    i = 1
    for read in reads:
        if read[0:lenKeys] in dict:
            multipleKeys.append(read[0:lenKeys])
        else:
            dict[read[0:lenKeys]] = read[lenKeys:]
        if verbose:
            print("\rChecking suffix", i, "of", len(reads), end = "", flush = True)
            i += 1
    for key in set(multipleKeys):
        del(dict[key])
    if verbose:
        print("\nCreated", len(dict), "suffixes with length", lenSuffix, \
            "from", len(reads), "Reads. (", len(reads) - len(dict), \
            "unambigous)")
    return(dict)
Additional Information: reads = readFasta("smallReads.fna", verbose = True)
This is how the function is called:
if __name__ == "__main__":
    reads = readFasta("smallReads.fna", verbose = True)
    suffixDicts = makeSuffixDicts(reads, 10)
The smallReads.fna file contains strings of bases (Dna):
"> read 1
TTATGAATATTACGCAATGGACGTCCAAGGTACAGCGTATTTGTACGCTA
"> read 2
AACTGCTATCTTTCTTGTCCACTCGAAAATCCATAACGTAGCCCATAACG
"> read 3
TCAGTTATCCTATATACTGGATCCCGACTTTAATCGGCGTCGGAATTACT
Here are the parts I don't understand:
lenKeys = len(reads[0]) - lenSuffix
What does the value [0] mean? From what I understand "len" returns the number of elements in a list.
Why is "reads" automatically a list? edit: It seems a Fasta-file can be declared as a List. Can anybody confirm that?
if read[0:lenKeys] in dict:
Does this mean "from 0 to 'lenKeys'"? Still confused about the value.
In another function there is a similar line: if read[-lenKeys:] in dict:
What does the "-" do?
def makeSuffixDict(reads, lenSuffix = 20, verbose = True):
Here I don't understand the parameters: How can reads be a parameter? What is lenSuffix = 20 in the context of this function other than a value subtracted from len(reads[0])?
What is verbose? I have read about a "verbose-mode" ignoring whitespaces but i have never seen it used as a parameter and later as a variable.
The tone of your question makes me feel like you're confusing things like program features (len, functions, etc) with things that were defined by the original programmer (the type of reads, verbose, etc).
def some_function(these, are, arbitrary, parameters):
    pass
This function defines a bunch of parameters. They don't mean anything at all, other than the value I give to them implicitly. For example if I do:
def reverse_string(s):
    pass
s is probably a string, right? In your example we have:
def makeSuffixDict(reads, lenSuffix = 20, verbose = True):
    lenKeys = len(reads[0]) - lenSuffix
    ...
From these two lines we can infer a few things:
the function will probably return a dictionary (from its name)
lenSuffix is an int, and verbose is a bool (from their default parameters)
reads can be indexed (string? list? tuple?)
the items inside reads have length (string? list? tuple?)
Since Python is dynamically typed, this is ALL WE CAN KNOW about the function so far. The rest would be explained by its documentation or the way it's called.
That said: let me cover all your questions in order:
What does the value [0] mean?
some_object[0] is grabbing the first item in a container. [1,2,3][0] == 1, "Hello, World!"[0] == "H". This is called indexing, and is governed by the __getitem__ magic method
From what I understand "len" returns the number of elements in a list.
len is a built-in function that returns the length of an object. It is governed by the __len__ magic method. len('abc') == 3, also len([1, 2, 3]) == 3. Note that len(['abc']) == 1, since it is measuring the length of the list, not the string inside it.
Why is "reads" automatically a list?
reads is a parameter. It is whatever the calling scope passes to it. It does appear that it expects a list, but that's not a hard and fast rule!
(various questions about slicing)
Slicing is doing some_container[start_idx : end_idx [ : step_size]]. It does pretty much what you'd expect: "0123456"[0:3] == "012". Slice indexes are zero-based and lie between the elements, so [0:1] selects the same element as [0], except that a slice returns a sequence of the same type as the original rather than an individual item (so [1, 2, 3][0] == 1 but [1, 2, 3][0:1] == [1]; for a string both results are one-character strings, 'abc'[0] == 'abc'[0:1] == 'a'). If you omit the start or end index, it is treated as the beginning or end of the sequence respectively. I won't go into step size here.
Negative indexes count from the back, so '0123456'[-3:] == '456'. Note that [-0] is not the last value, [-1] is (since -0 == 0, [-0] is the same as [0]). This contrasts with [0] being the first value.
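The indexing and slicing rules can be checked directly (Python 3 syntax). The short read and len_suffix value below are made up for illustration and are not from smallReads.fna:

```python
s = "0123456"
print(s[0], s[0:1])        # 0 0   (for a string, both are one-character strings)
print(s[:3], s[-3:])       # 012 456
print(s[-1], s[-0])        # 6 0   (-0 is just 0)

nums = [10, 20, 30]
print(nums[0], nums[0:1])  # 10 [10]  (indexing gives the item, slicing gives a list)

# The same split the function performs on each read: key part, then suffix.
read = "TTATGAATAT"
len_suffix = 4
len_keys = len(read) - len_suffix
print(read[0:len_keys], read[len_keys:])   # TTATGA ATAT
```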
How can reads be a parameter?
Because the function is defined as makeSuffixDict(reads, ...). That's what a parameter is.
What is lenSuffix = 20 in the context of this function
Looks like it's the length of the expected suffix!
What is verbose?
verbose has no meaning on its own. It's just another parameter. Looks like the author included the verbose flag so you could get output while the function ran. Notice all the if verbose blocks seem to do nothing, just provide feedback to the user.
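To see what the function builds, here is a stripped-down sketch of the same logic in Python 3 without the verbose output. The name make_suffix_dict and the three short reads are assumptions for illustration, and it assumes, as the original does, that every read has the length of the first one:

```python
def make_suffix_dict(reads, len_suffix=2):
    """Map each read's prefix to its suffix, dropping prefixes seen more than once."""
    len_keys = len(reads[0]) - len_suffix   # key length taken from the first read
    suffixes = {}
    duplicates = set()
    for read in reads:
        key = read[:len_keys]
        if key in suffixes:
            duplicates.add(key)             # ambiguous prefix, remove it later
        else:
            suffixes[key] = read[len_keys:]
    for key in duplicates:
        del suffixes[key]
    return suffixes

reads = ["AACGT", "AACTT", "GGCAT"]
print(make_suffix_dict(reads))   # {'GGC': 'AT'}
```

The prefix "AAC" occurs twice with different suffixes, so it is dropped; only the unambiguous prefix "GGC" remains.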

string[i:length] giving memory error

def suffix(stng):
    list = []
    length = len(stng)
    for i in range(length):
        x = stng[i:length] ## This gives a Memory Error..See below
        list.append(x)
    return list
This piece of code is part of my solution to a problem on interviewstreet.com, but when I submit it I get a MemoryError. How can I correct it?
This is the traceback:
Original exception was:
Traceback (most recent call last):
File "/run-1342184337-542152202/solution.py", line 35, in
listofsuffix=suffix(var)
File "/run-1342184337-542152202/solution.py", line 13, in suffix
x=stng[i:length]
MemoryError
A MemoryError means you have consumed all your RAM. You are creating a list containing all trailing parts of an original string. If your original string is too long, you will consume a lot of memory.
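To get a feel for the size involved: a string of length n has n suffixes whose lengths add up to n(n+1)/2 characters. A small sketch in Python 3 syntax (total_suffix_chars is a hypothetical helper name):

```python
def total_suffix_chars(n):
    """Characters stored when every suffix of an n-character string is kept."""
    return n * (n + 1) // 2

s = "banana"
suffix_list = [s[i:] for i in range(len(s))]
print(sum(len(x) for x in suffix_list), total_suffix_chars(len(s)))   # 21 21
print(total_suffix_chars(100000))   # 5000050000
```

So an input of 100,000 characters already needs about five billion characters if all suffixes are held at once.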
One possibility is to use a generator to produce the suffixes one at a time instead of creating a list of all of them:
def suffixes(stng):
    for i in xrange(len(stng)):
        yield stng[i:]
If the caller of suffixes simply iterates over the result, you don't even have to change the caller. If you truly needed an explicit list, then you'll need a different solution.
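For reference, a Python 3 version of the same generator (range replaces xrange there), with a caller that consumes the suffixes one at a time:

```python
def suffixes(s):
    """Yield the suffixes of s one at a time, longest first."""
    for i in range(len(s)):
        yield s[i:]

gen = suffixes("abc")
print(next(gen))     # abc
print(list(gen))     # ['bc', 'c']

# Only one suffix exists at any moment while this runs.
print(max(len(x) for x in suffixes("banana")))   # 6
```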
"I need to return a list" -- This is highly unlikely. You just need to return an object which looks enough like a list to make it work.
class FakeList(object):
    def __init__(self,strng):
        self.string=strng
        self._idx=0
    def __getitem__(self,i):
        # suffix starting at position i, like stng[i:length] in the question
        return self.string[i:]
    def __len__(self):
        return len(self.string)
    def __iter__(self):
        return self
    def __contains__(self,other):
        return other in self.string
    def next(self):
        if(self._idx<len(self)):
            self._idx+=1
            return self[self._idx-1]
        else:
            raise StopIteration
a=FakeList("My String")
print a[3]
print a[4]
for i in a:
    print i
This creates an object which you can access randomly and iterate over like a list. It also allows you to call len(my_fake_list). It doesn't support slicing or the myriad other list methods (pop, append, extend, ...). Which of those you need to add depends on which ones you use.
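In Python 3, one way to sketch the same idea is to derive from collections.abc.Sequence, which supplies iteration, `in`, index() and count() once __len__ and __getitem__ exist. SuffixView is a hypothetical name, and this is an alternative to the class above, not the answer's own code:

```python
from collections.abc import Sequence

class SuffixView(Sequence):
    """Read-only, list-like view of the suffixes of a string."""
    def __init__(self, text):
        self._text = text

    def __len__(self):
        return len(self._text)

    def __getitem__(self, index):
        if isinstance(index, slice):
            # Slices return a real list of the selected suffixes.
            return [self._text[i:] for i in range(*index.indices(len(self)))]
        if index < 0:
            index += len(self)
        if not 0 <= index < len(self):
            raise IndexError("suffix index out of range")
        return self._text[index:]

view = SuffixView("My String")
print(len(view), view[3], view[-1])   # 9 String g
print("ring" in view)                  # True
print(view[7:])                        # ['ng', 'g']
```

Note that `in` here tests whether a value is one of the suffixes, which differs from the substring test in FakeList.__contains__.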

Dynamic class instance naming in Python

I'm trying to write a program that determines whether two words are cognates. I've written two classes: featTup (basically a wrapper around a tuple containing the values of a letter), and featWord (basically a wrapper around a list of featTup objects).
(Sorry this is all so long!)
Here's some (hopefully relevant) code:
class featTup(object):
    def __init__(self,char):
        self.char = char
        self.phone_vals = None
        self.dia_vals = None
        if self.char in phone_codes:
            self.phone_vals = phone_feats[phone_codes.index(char)]
        elif self.char in dia_codes:
            self.dia_vals = dia_feats[dia_codes.index(char)]
    ...
class featWord(list):
    def do_dia(self,char_feats,dia_feats):
        #This method handles the changes diacritics make to preceding phones
        for val in dia_feats:
            if dia_val:
                char_feats.change_val(tup,char_feats.index(dia_val),dia_val)
    def get_featWord(self):
        return self.word_as_feats
    def __init__(self,word):
        self.word = word
        self.word_as_feats = [featTup(char) for char in self.word]
        for char in self.word_as_feats:
            if char.is_dia():
                i = self.word_as_feats.char_index(char)
                self.word_as_feats.do_dia(self.word_as_feats[i-1],self.word_as_feats[i])
    def word_len(self):
        return len(self.get_featWord())
    def char_index(self,char):
        return self.word_as_feats.index(char)
The issue is that I want to take a list of words and make featWord objects for all of them. I don't know how long each list will be, nor do I know how many characters will be in each word.
More code:
def get_words(text1,text2):
    import codecs
    textin1 = codecs.open(text1,encoding='utf8')
    word_list1 = textin1.readlines()
    textin1.close()
    textin2 = codecs.open(text2,encoding='utf8')
    word_list2 = textin2.readlines()
    textin2.close()
    print word_list1,word_list2
    fixed_words1 = []
    fixed_words2 = []
    for word in word_list1:
        fixed_word = word.replace('\n','')
        fixed_words1.append(fixed_word)
    for word in word_list2:
        fixed_word = word.replace('\n','')
        fixed_words2.append(fixed_word)
    print fixed_words1,fixed_words2
    words1 = [(featWord(word)) for word in fixed_words1]
    words2 = [(featWord(word)) for word in fixed_words2]
    # for word1 in fixed_words1:
    # for x in xrange(len(fixed_words1)):
    words1.append(featWord(word))
    for word2 in fixed_words2:
        #for x in xrange(len(fixed_words2)):
        words2.append(featWord(word))
    print words1
    #words1 = [featWord(word) for word in fixed_words1]
    #words2 = [featWord(word) for word in fixed_words2]
    return words1,words2
def get_cog_dict(text1,text2,threshold=10,print_results=True):
    #This is the final method, running are_cog over all words in
    #both lists.
    word_list1,word_list2 = get_words(text1,text2)
    print word_list1, word_list2
As it stands, when I call either of these last two methods, I get lists of empty lists; when I instantiate new featWord objects from strings I just give it (e.g. x = featWord("ten"), or whatever) it works fine. A relevant thing is that featWord seems to return an empty list instead of (when I instantiate featWord from IDLE, as above, it comes back as a list of featTup instances, which is good). I'm not sure why/if that's the problem.
It seems to me that (at least part of) my problem stems from improperly initializing the featWord. I'm constructing them, or whatever, but not assigning them names. I've tried just about everything I can think of (as the commented-out sections prove), and I'm stumped. There're answers on here about using dictionaries to name class instances and such, but since I can't pre-define a dictionary (each word and wordlist is potentially a different length), I'm not sure what to do.
Any help would be GREATLY appreciated. I'm kind of driving myself insane over here. Thanks.
Your featWord class derives from list, but you never append anything to self, and you have overridden __init__, so list's __init__ never gets called either.
So a featWord instance is just an empty list with some attributes and methods.
Its __repr__ is list's __repr__, which is why a list of featWords displays as a list of empty lists.
So: implement a meaningful __repr__, do not subclass list, or append something meaningful to self. Any of those will solve your problem.
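One of those options, populating the list itself, might look like the following Python 3 sketch. FeatTup and FeatWord are simplified stand-ins for the question's classes and leave out the phone and diacritic handling entirely:

```python
class FeatTup:
    """Stand-in for the question's featTup; it only stores the character."""
    def __init__(self, char):
        self.char = char

    def __repr__(self):
        return "FeatTup(%r)" % self.char

class FeatWord(list):
    def __init__(self, word):
        # Fill the list itself instead of a separate word_as_feats attribute.
        super().__init__(FeatTup(char) for char in word)
        self.word = word

words = [FeatWord(w) for w in ["ten", "zehn"]]
print(words[0])        # [FeatTup('t'), FeatTup('e'), FeatTup('n')]
print(len(words[1]))   # 4
```

Because the items now live in the list, len(), indexing and the inherited __repr__ all reflect the word's contents.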

Defining dynamic functions to a string

I have a small Python script which I use every day. It reads a file and, for each line, applies different string functions like strip(), replace(), etc. I'm constantly editing the file and commenting code in and out to change the functions, because I use different functions depending on the file I'm dealing with. For example, for one file I need to use line.replace(' ','') and line.strip() on each line.
What's the best way to make all of these part of my script, so that I can assign a number to each function and just say "apply functions 1 and 4 to each line"?
First of all, many functions in the string module, including string.strip and string.replace, are deprecated. The following answer uses string methods instead. (Instead of string.strip(" Hello "), I use the equivalent of " Hello ".strip().)
Here's some code that will simplify the job for you. The following code assumes that whatever methods you call on your string, that method will return another string.
class O(object):
    c = str.capitalize
    r = str.replace
    s = str.strip
def process_line(line, *ops):
    i = iter(ops)
    while True:
        try:
            # consume the arguments two at a time: a method, then its argument list
            op = next(i)
            args = next(i)
        except StopIteration:
            break
        line = op(line, *args)
    return line
The O class exists so that your highly abbreviated method names don't pollute your namespace. When you want to add more string methods, you add them to O in the same format as those given.
The process_line function is where all the interesting things happen. First, here is a description of the argument format:
The first argument is the string to be processed.
The remaining arguments must be given in pairs.
The first argument of the pair is a string method. Use the shortened method names here.
The second argument of the pair is a list representing the arguments to that particular string method.
The process_line function returns the string that emerges after all these operations have been performed.
Here is some example code showing how you would use the above code in your own scripts. I've separated the arguments of process_line across multiple lines to show the grouping of the arguments. Of course, if you're just hacking away and using this code in day-to-day scripts, you can compress all the arguments onto one line; this actually makes it a little easier to read.
f = open("parrot_sketch.txt")
for line in f:
    p = process_line(
        line,
        O.r, ["He's resting...", "This is an ex-parrot!"],
        O.c, [],
        O.s, []
    )
    print p
Of course, if you very specifically wanted to use numerals, you could name your functions O.f1, O.f2, O.f3… but I'm assuming that wasn't the spirit of your question.
If you insist on numbers, you can't do much better than a dict (as gimel suggests) or list of functions (with indices zero and up). With names, though, you don't necessarily need an auxiliary data structure (such as gimel's suggested dict), since you can simply use getattr to retrieve the method to call from the object itself or its type. E.g.:
def all_lines(somefile, methods):
    """Apply a sequence of methods to all lines of some file and yield the results.
    Args:
        somefile: an open file or other iterable yielding lines
        methods: a string that's a whitespace-separated sequence of method names.
            (note that the methods must be callable without arguments beyond the
            str to which they're being applied)
    """
    tobecalled = [getattr(str, name) for name in methods.split()]
    for line in somefile:
        for tocall in tobecalled: line = tocall(line)
        yield line
It is possible to map string operations to numbers:
>>> import string
>>> ops = {1:string.split, 2:string.replace}
>>> my = "a,b,c"
>>> ops[1](",", my)
[',']
>>> ops[1](my, ",")
['a', 'b', 'c']
>>> ops[2](my, ",", "-")
'a-b-c'
>>>
But maybe string descriptions of the operations will be more readable.
>>> ops2={"split":string.split, "replace":string.replace}
>>> ops2["split"](my, ",")
['a', 'b', 'c']
>>>
Note:
Instead of using the string module, you can use the str type for the same effect.
>>> ops={1:str.split, 2:str.replace}
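Functions such as string.split and string.replace no longer exist in the Python 3 string module, so the str-based mapping is the form that carries over. A brief check in Python 3 syntax:

```python
ops = {1: str.split, 2: str.replace}
my = "a,b,c"
print(ops[1](my, ","))        # ['a', 'b', 'c']
print(ops[2](my, ",", "-"))   # a-b-c
```

The unbound methods take the string as their first argument, which mirrors the call order used with the string module above.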
To map names (or numbers) to different string operations, I'd do something like
OPERATIONS = dict(
    strip = str.strip,
    lower = str.lower,
    removespaces = lambda s: s.replace(' ', ''),
    maketitle = lambda s: s.title().center(80, '-'),
    # etc
)
def process(myfile, ops):
    for line in myfile:
        for op in ops:
            line = OPERATIONS[op](line)
        yield line
which you use like this
for line in process(afile, ['strip', 'removespaces']):
    ...
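If numbers really are preferred, as in "apply functions 1 and 4", a list of functions indexed from 1 is one possible sketch (Python 3 syntax). The particular operations and the name apply_numbered are assumptions for illustration:

```python
OPERATIONS = [
    str.strip,                       # 1
    str.lower,                       # 2
    lambda s: s.replace(" ", ""),    # 3
    str.title,                       # 4
]

def apply_numbered(line, numbers):
    """Apply the operations with the given 1-based numbers, in order."""
    for number in numbers:
        line = OPERATIONS[number - 1](line)
    return line

print(apply_numbered("  hello world \n", [1, 4]))      # Hello World
print(apply_numbered("  Hello World \n", [1, 3, 2]))   # helloworld
```

The numbers could come from the command line or a config line, so the script itself no longer needs editing for each file.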
