Even-positioned numbers are skipped, unfortunately (Python 3) - python

I have a function:
def fun(l):
    for i in l:
        if len(i)==10:
            l.append('+91 {} {}'.format(i[:5],i[5:]))
            l.remove(i)
        if len(i)==11:
            j=list(''.join(i))
            j.remove(i[0])
            l.append('+91 {} {}'.format(''.join(j[:5]),''.join(j[5:])))
            l.remove(i)
        if len(i)==12:
            j=list(''.join(i))
            j.remove(i[0])
            j.remove(i[1])
            l.append('+91 {} {}'.format(''.join(j[:5]),''.join(j[5:])))
            l.remove(i)
        if len(i)==13:
            j=list(''.join(i))
            j.remove(i[0])
            j.remove(i[1])
            j.remove(i[2])
            l.append('+91 {} {}'.format(''.join(j[:5]),''.join(j[5:])))
            l.remove(i)
    return l
Say l=['9195969878','07895462130','919875641230'].
I am getting the output:
['+91 91959 69878','7895462130','+91 98756 41230']
But I am supposed to get:
['+91 91959 69878','+91 78954 62130','+91 98756 41230']
The function seems to skip every item at an even position in the list l. Kindly suggest a fix.

The first problem is that you're mutating the list while iterating over it. In this particular case, that caused the loop to skip some items, because removing an item shifts the later ones to earlier positions. In other Python versions it might trigger an error. But you're returning your result, so I don't see why you're mutating the list at all.
Secondly your code does some roundabout things, in particular ''.join(i) which is absolutely redundant (it literally rebuilds the same string), and series of remove() calls which almost certainly don't do what you expect. If you remove the first item from [1,2,3], the list becomes [2,3], and if you follow that by removing the second item (index 1) you end up with [2]. This is the same sort of issue your for loop has with the other remove.
I would also restructure the code a bit to avoid code duplication. I get something like:
def fun(l):
    return ['+91 {} {}'.format(i[-10:-5],i[-5:])
            for i in l]
This never alters l, makes one single pass, and joins all the different length behaviours by observing that we're using parts at a fixed distance from the end. There is one caveat: other lengths aren't handled separately. I don't know if those occur, or how you actually want them handled (the old code would leave them as is). We can easily enough specify other behaviour:
def fun(l):
    return ['+91 {} {}'.format(i[-10:-5],i[-5:]) if 10<=len(i)<=13
            else i
            for i in l]
This still doesn't reproduce the behaviour that reformatted numbers were appended at the end, but I'm not sure you really wanted that. It made little sense for the loop to process its own output in the first place.

You are modifying the list l as you go. I would suggest creating a new list and adding things to it. Is there a reason you want to mutate in place?
If you are intent on mutating in place, why not just do something like this?
l[index] = '+91 {} {}'.format(i[:5],i[5:])
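The one-line assignment above leaves index undefined. A minimal sketch of how it could be filled in, assuming the index comes from enumerate and that entries of 10 to 13 characters should be reformatted from their last ten digits (the function name format_in_place is made up for illustration):

```python
def format_in_place(numbers):
    """Rewrite each 10- to 13-character entry in place as '+91 XXXXX XXXXX'."""
    # Assigning to an existing index never changes the list's length,
    # so the iteration does not skip anything.
    for index, number in enumerate(numbers):
        if 10 <= len(number) <= 13:
            numbers[index] = '+91 {} {}'.format(number[-10:-5], number[-5:])
    return numbers


numbers = ['9195969878', '07895462130', '919875641230']
format_in_place(numbers)
print(numbers)
# ['+91 91959 69878', '+91 78954 62130', '+91 98756 41230']
```

Unlike append plus remove, this keeps every number at its original position.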
Also, here is the first google result for "python phone number library": https://github.com/daviddrysdale/python-phonenumbers as it may be of use to you. (Never used it, am not the maintainer.)

Related

Pythonic way not working with list

I have a seemingly simple problem, but the code that I believe should solve it is not behaving as expected, while less elegant code that I find functionally equivalent does behave as expected. Can you help me understand?
The task: create a list, drop a specific value.
The specific usecase is that I am dropping a specific list of columns of pd.df, but that is not the part I want to focus on. It's that I seem to be unable to do it in a nice, pythonic single-line operation.
What I think should work:
result = list(df.columns).remove(x)
This results in object of type 'NoneType'
However, the following works fine:
result = list(df.columns)
result.remove(x)
These look functionally equivalent to me, and the first approach is clearer and preferable, but it does not work. Why?
The reason is that remove changes the list, and does not return a new one, so you can't chain it.
What about the following way?
result = [item for item in df.columns if item != x]
Please note that this code is not exactly equivalent to the one you provided, as it will remove all occurrences of x, not just the first one as with remove(x).
Those are definitely not functionally equivalent.
The first piece of code puts the result of the last called method into result, so whatever remove returns. remove always returns None since it returns nothing.
The second piece of code puts the list into result, then removes the item from the list (which is already stored in result). You are discarding the return value of remove, as you should. The equivalent and wrong thing to do would be:
result = list(df.columns)
result = result.remove(x)
The two pieces of code are not really equivalent. In the second one, the variable result holds your list. You then call remove on that list, and the element is removed. So far so good.
In the first piece of code you try to assign the return value of remove() to result, so this would be the same as:
result = list(df.columns)
result = result.remove(x)
And since remove has no return value, result will be None (an object of type NoneType).
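To make the difference between the three variants discussed above concrete, here is a small sketch. The list columns is a hypothetical stand-in for list(df.columns), so no pandas is needed to run it:

```python
columns = ['a', 'b', 'c', 'b']  # hypothetical stand-in for list(df.columns)

# list.remove mutates the list and returns None, so chaining loses the list.
chained = list(columns).remove('b')
print(chained)  # None

# Two-step version: keep a reference to the copy, then mutate it.
result = list(columns)
result.remove('b')
print(result)  # ['a', 'c', 'b']  (only the first 'b' is removed)

# Comprehension: builds a new list and drops every 'b'.
filtered = [item for item in columns if item != 'b']
print(filtered)  # ['a', 'c']
```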

What is the most efficient way to replace a switch statement in my case?

I needed to search and sort huge piles of data in a biglist and put them, sorted, in other lists in Python 3.5.
As I finished coding, I realized that if I only need to check for the item in biglist, I should use a switch statement to make my code more efficient. I am a beginner in Python. I searched for the switch statement in Python 3.5 and was shocked that such a statement doesn't exist in Python. (I have programmed a little in C, Java and JavaScript, and they all have a switch statement, so I thought it had to exist in every language for flow control.)
The part of my code to search through biglist looks currently something like this:
for item in biglist:
    if item == String1:
        list1.append(biglist[biglist.index(item) + 1])
        continue
    #
    #this goes on till String10 and ends like this
    #
    elif item == String10:
        list10.append(biglist[biglist.index(item) + 1])
        continue
    else:
        break
The whole program took about 12 hours to finish for one dataset. I need to do this 4 times more, but before I do so, I would love some suggestions or even solutions of how to make my code more efficient and faster, if I haven't implemented the most efficient solution already.
Please, also explain why the solution is more efficient, because I want to understand it.
The inefficiency has nothing to do with the presence or absence of "switch", but with the fact that you use the .index() method, which scans the list from the start to find your item. There's no need to do this; you can get the index from the enumerate function:
for index, item in enumerate(biglist):
    if item == String1:
        list1.append(biglist[index + 1])
Most likely the performance problem is not the switch-like if statements but the biglist.index(item) operation which runs in O(n) (Complexity of list.index(x) in Python).
Use something like:
for idx, item in enumerate(biglist):
    print(idx, item)
to keep track of the index of the item.
If you still want to replace the if statements you could use a dictionary which has a list stored for each possible item value.
This can mimic switch.
def switch(x):
    return {
        'String1': list1,
        'String10': list10
    }.get(x)
for item in biglist:
    try:
        switch(item).append(biglist[biglist.index(item) + 1])
    except AttributeError:
        pass  # Do some other work
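The two suggestions above (enumerate instead of .index(), and a dictionary instead of a chain of if statements) can be combined. The following is only a sketch under assumptions: the function name sort_into_lists, the markers parameter and the sample data are invented for illustration, and unlike the loop in the question it skips unknown items instead of stopping at the first one.

```python
def sort_into_lists(biglist, markers):
    """Collect the element that follows each marker into a per-marker list.

    `markers` is a hypothetical iterable of the marker values
    (String1 ... String10 in the question).
    """
    buckets = {marker: [] for marker in markers}
    for index, item in enumerate(biglist):
        bucket = buckets.get(item)  # dict lookup, O(1) on average
        if bucket is not None and index + 1 < len(biglist):
            bucket.append(biglist[index + 1])
    return buckets


data = ['A', 'x1', 'B', 'y1', 'A', 'x2', 'C', 'z1']
print(sort_into_lists(data, ['A', 'B']))
# {'A': ['x1', 'x2'], 'B': ['y1']}
```

Each element is visited once and each lookup takes constant time on average, so the whole pass is linear in the length of biglist. Calling .index() inside the loop rescans the list from the start every time, and it always returns the position of the first occurrence, so repeated markers would keep yielding the same neighbour.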

why doesn't following Python removing duplicates function work? [duplicate]

As an experiment, I did this:
letters=['a','b','c','d','e','f','g','h','i','j','k','l']
for i in letters:
    letters.remove(i)
print letters
The last print shows that not all items were removed (every other one was).
IDLE 2.6.2
>>> ================================ RESTART ================================
>>>
['b', 'd', 'f', 'h', 'j', 'l']
>>>
What's the explanation for this? How could this be rewritten to remove every item?
Some answers explain why this happens and some explain what you should've done. I'll shamelessly put the pieces together.
What's the reason for this?
Because the Python language is designed to handle this use case differently. The documentation makes it clear:
It is not safe to modify the sequence being iterated over in the loop (this can only happen for mutable sequence types, such as lists). If you need to modify the list you are iterating over (for example, to duplicate selected items) you must iterate over a copy.
Emphasis mine. See the linked page for more -- the documentation is copyrighted and all rights are reserved.
You could easily understand why you got what you got, but it's basically undefined behavior that can easily change with no warning from build to build. Just don't do it.
It's like wondering why i += i++ + ++i does whatever the hell it is it that line does on your architecture on your specific build of your compiler for your language -- including but not limited to trashing your computer and making demons fly out of your nose :)
How could this be rewritten to remove every item?
del letters[:] (if you need to change all references to this object)
letters[:] = [] (if you need to change all references to this object)
letters = [] (if you just want to work with a new object)
Maybe you just want to remove some items based on a condition? In that case, you should iterate over a copy of the list. The easiest way to make a copy is to make a slice containing the whole list with the [:] syntax, like so:
#remove unsafe commands
commands = ["ls", "cd", "rm -rf /"]
for cmd in commands[:]:
    if "rm " in cmd:
        commands.remove(cmd)
If your check is not particularly complicated, you can (and probably should) filter instead:
commands = [cmd for cmd in commands if not is_malicious(cmd)]
You cannot iterate over a list and mutate it at the same time; instead, iterate over a slice:
letters=['a','b','c','d','e','f','g','h','i','j','k','l']
for i in letters[:]: # note the [:] creates a slice
    letters.remove(i)
print letters
That said, for a simple operation such as this, you should simply use:
letters = []
You cannot modify the list you are iterating, otherwise you get this weird type of result. To do this, you must iterate over a copy of the list:
for i in letters[:]:
    letters.remove(i)
It removes the first occurrence and then moves on to the next position in the sequence. Since the sequence has changed, it lands on the next odd-positioned item, and so on:
take "a"
remove "a" -> the first item is now "b"
take the next item, "c"
...
What you want to do is:
letters[:] = []
or
del letters[:]
This will preserve original object letters was pointing to. Other options like, letters = [], would create a new object and point letters to it: old object would typically be garbage-collected after a while.
The reason not all values were removed is that you're changing list while iterating over it.
ETA: if you want to filter values from a list you could use list comprehensions like this:
>>> letters=['a','b','c','d','e','f','g','h','i','j','k','l']
>>> [l for l in letters if ord(l) % 2]
['a', 'c', 'e', 'g', 'i', 'k']
Probably Python uses pointers and the removal starts at the front. The variable "letters" in the second line partially has a different value than the variable "letters" in the third line. When i is 1, a is removed; when i is 2, b has moved to position 1 and c is removed. You can try using "while".
#!/usr/bin/env python
import random
a=range(10)
while len(a):
    print a
    for i in a[:]:
        if random.random() > 0.5:
            print "removing: %d" % i
            a.remove(i)
        else:
            print "keeping: %d" % i
print "done!"
a=range(10)
while len(a):
    print a
    for i in a:
        if random.random() > 0.5:
            print "removing: %d" % i
            a.remove(i)
        else:
            print "keeping: %d" % i
print "done!"
I think this explains the problem a little better: the top block of code works, whereas the bottom one doesn't.
Some items in the bottom block never get printed as "keeping" or "removing", because you are modifying the list you are iterating over, which is a recipe for disaster.
OK, I'm a little late to the party here, but I've been thinking about this and after looking at Python's (CPython) implementation code, have an explanation I like. If anyone knows why it's silly or wrong, I'd appreciate hearing why.
The issue is moving through a list using an iterator, while allowing that list to change.
All the iterator is obliged to do is tell you which item in the (in this case) list comes after the current item (i.e. with the next() function).
I believe the way iterators are currently implemented, they only keep track of the index of the last element they iterated over. Looking in iterobject.c one can see what appears to be a definition of an iterator:
typedef struct {
    PyObject_HEAD
    Py_ssize_t it_index;
    PyObject *it_seq; /* Set to NULL when iterator is exhausted */
} seqiterobject;
where it_seq points to the sequence being iterated over and it_index gives the index of the last item supplied by the iterator.
When the iterator has just supplied the nth item and one deletes that item from the sequence, the correspondence between subsequent list elements and their indices changes. The former (n+1)st item becomes the nth item as far as the iterator is concerned. In other words, the iterator now thinks that what was the 'next' item in the sequence is actually the 'current' item.
So, when asked to give the next item, it will give the former (n+2)nd item (i.e. the new (n+1)st item).
As a result, for the code in question, the iterator's next() method is going to give only the n+0, n+2, n+4, ... elements from the original list. The n+1, n+3, n+5, ... items will never be exposed to the remove statement.
Although the intended activity of the code in question is clear (at least for a person), it would probably require much more introspection for an iterator to monitor changes in the sequence it iterates over and, then, to act in a 'human' fashion.
If iterators could return prior or current elements of a sequence, there might be a general work-around, but as it is, you need to iterate over a copy of the list, and be certain not to delete any items before the iterator gets to them.
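The index-based explanation above can be imitated with an explicit counter. This is only a rough model written in Python 3 syntax, not CPython's actual code; the variable index plays the role of it_index, and visited is an invented name for the items the loop gets to see:

```python
letters = list('abcdefghijkl')
visited = []
index = 0  # plays the role of it_index in the struct above
while index < len(letters):
    item = letters[index]
    visited.append(item)
    letters.remove(item)  # everything after `index` shifts left by one
    index += 1            # ...so this step jumps over the shifted item

print(visited)  # ['a', 'c', 'e', 'g', 'i', 'k']
print(letters)  # ['b', 'd', 'f', 'h', 'j', 'l']
```

The items left in letters are exactly the ones from the question's output: each removal pulls the next item into the slot the counter has just left, so it is never visited.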
Initially i refers to a, the first element of the list. As the loop runs, the element in the first position is removed and the element in the second position moves into the first position, but the pointer moves on to the second position. This repeats, which is why b, d, f, h, j and l are never deleted.

How to delete elements in a list based on another list?

Suppose I have a list called icecream_flavours, and two lists called new_flavours and unavailable. I want to remove the elements in icecream_flavours that appear in unavailable, and add those in new_flavours to the original list. I wrote the following program:
for i in unavailable:
    icecream_flavours.remove(i)
for j in new_flavours:
    icecream_flavours.append(j)
the append one is fine, but it keeps showing 'ValueError: list.remove(x): x not in list' for the first part of the program. What's the problem?
thanks
There are two possibilities here.
First, maybe there should never be anything in unavailable that wasn't in icecream_flavours, but, because of some bug elsewhere in your program, that isn't true. In that case, you're going to need to debug where things first go wrong, whether by running under the debugger or by adding print calls all over the code. At any rate, since the problem is most likely in code that you haven't shown us here, we can't help if that's the problem.
Alternatively, maybe it's completely reasonable for something to appear in unavailable even though it's not in icecream_flavours, and in that case you just want to ignore it.
That's easy to do, you just need to write the code that does it. As the docs for list.remove explain, it:
raises ValueError when x is not found in s.
So, if you want to ignore cases when i is not found in icecream_flavours, just use a try/except:
for i in unavailable:
    try:
        icecream_flavours.remove(i)
    except ValueError:
        # We already didn't have that one... which is fine
        pass
That being said, there are better ways to organize your code.
First, using the right data structure always makes things easier. Assuming you don't want duplicate flavors, and the order of flavors doesn't matter, what you really want here is sets, not lists. And if you had sets, this would be trivial:
icecream_flavours -= unavailable
icecream_flavours |= new_flavours
Even if you can't do that, it's usually simpler to create a new list than to mutate one in-place:
icecream_flavours = [flavour for flavour in icecream_flavours
                     if flavour not in set(unavailable)]
(Notice that I converted unavailable to a set, so we don't have to brute-force search for each flavor in a list.)
Either one of these changes makes the code shorter, and makes it more efficient. But, more importantly, they both make the code easier to reason about, and eliminate the possibility of bugs like the one you're trying to fix.
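One detail worth noting about the comprehension above: the condition is evaluated for every element, so writing set(unavailable) inside it rebuilds the set on each iteration. A minimal sketch that builds the set once first (the sample flavours and the name unavailable_set are made up for illustration):

```python
icecream_flavours = ['vanilla', 'chocolate', 'mint', 'strawberry']
unavailable = ['mint', 'pistachio']  # 'pistachio' was never in the list
new_flavours = ['mango']

unavailable_set = set(unavailable)   # built once; membership tests are O(1) on average
icecream_flavours = [flavour for flavour in icecream_flavours
                     if flavour not in unavailable_set]
icecream_flavours.extend(new_flavours)

print(icecream_flavours)  # ['vanilla', 'chocolate', 'strawberry', 'mango']
```

Because nothing is removed by value, a flavour that is in unavailable but not in icecream_flavours no longer raises ValueError.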
To add all the new_flavours that are not unavailable, you can use a list comprehension, then use the += operator to add it to the existing flavors.
icecream_flavours += [i for i in new_flavours if i not in unavailable]
If there are already flavors in the original list you want to remove, you can remove them in the same way
icecream_flavours = [i for i in icecream_flavours if i not in unavailable]
If you first want to remove all the unavailable flavours from icecream_flavours and then add the new flavours, you can use this list comprehension:
icecream_flavours = [i for i in icecream_flavours if i not in unavailable] + new_flavours
Your error is caused because unavailable contains flavours that are not in icecream_flavours.
Unless order is important, you could use set instead of list as they have operations for differences and unions and you don't need to worry about duplicates
If you must use lists, a list comprehension is a better way to filter the list
icecream_flavours = [x for x in icecream_flavours if x not in unavailable]
You can extend the list of flavours like this
icecream_flavours += new_flavours
assuming there are no duplicates.

Python: Nested for loops or "next" statement

I'm a rookie hobbyist and I nest for loops when I write python, like so:
dict = {
    key1: {subkey/value1: value2}
    ...
    keyn: {subkeyn/valuen: valuen+1}
}
for key in dict:
    for subkey/value in key:
        do it to it
I'm aware of a "next" keyword that would accomplish the same goal in one line (I asked a question about how to use it but I didn't quite understand it).
So to me, a nested for loop is much more readable. Why, then, do people use "next"? I read somewhere that Python is a dynamically-typed and interpreted language and, because + both concatenates strings and sums numbers, that it must check variable types for each loop iteration in order to know what the operators are, etc. Does using "next" prevent this in some way, speeding up the execution, or is it just a matter of style/preference?
next is precious to advance an iterator when necessary, without that advancement controlling an explicit for loop. For example, if you want "the first item in S that's greater than 100", next(x for x in S if x > 100) will give it to you, no muss, no fuss, no unneeded work (as everything terminates as soon as a suitable x is located) -- and you get an exception (StopIteration) if unexpectedly no x matches the condition. If a no-match is expected and you want None in that case, next((x for x in S if x > 100), None) will deliver that. For this specific purpose, it might be clearer to you if next was actually named first, but that would betray its much more general use.
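The three behaviours described above (first match, default on no match, StopIteration on no match) can be tried with a small list. The values in S and the name first_big are arbitrary sample choices:

```python
S = [3, 50, 120, 7, 300]

# First item greater than 100; the generator stops as soon as one is found.
first_big = next(x for x in S if x > 100)
print(first_big)  # 120

# With a default, a missing match gives None instead of an exception.
print(next((x for x in S if x > 1000), None))  # None

# Without a default, a missing match raises StopIteration.
try:
    next(x for x in S if x > 1000)
except StopIteration:
    print('no match')
```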
Consider, for example, the task of merging multiple sequences (e.g., a union or intersection of sorted sequences -- say, sorted files, where the items are lines). Again, next is just what the doctor ordered, because none of the sequences can dominate over the others by controlling a "main for loop". So, assuming for simplicity no duplicates can exist (a condition that's not hard to relax if needed), you keep pairs (currentitem, itsfile) in a list controlled by heapq, and the merging becomes easy... but only thanks to the magic of next to advance the correct file once its item has been used, and that file only.
import heapq
def merge(*theopentextfiles):
    theheap = []
    for afile in theopentextfiles:
        theitem = next(afile, '')
        if theitem: theheap.append((theitem, afile))
    heapq.heapify(theheap)
    while theheap:
        theitem, afile = heapq.heappop(theheap)
        yield theitem
        theitem = next(afile, '')
        if theitem: heapq.heappush(theheap, (theitem, afile))
Just try to do anything anywhere this elegant without next...!-)
One could go on for a long time, but the two use cases "advance an iterator by one place (without letting it control a whole for loop)" and "get just the first item from an iterator" account for most important uses of next.
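For comparison with the hand-written merge above, the standard library also provides heapq.merge, which performs the same kind of k-way merge over already-sorted iterables. A short sketch, using io.StringIO objects as hypothetical stand-ins for open sorted text files:

```python
import heapq
import io

# In-memory stand-ins for two sorted text files (sample data).
file_a = io.StringIO('apple\ncherry\n')
file_b = io.StringIO('banana\ndate\n')

# heapq.merge lazily yields lines from both inputs in sorted order.
merged = list(heapq.merge(file_a, file_b))
print(merged)  # ['apple\n', 'banana\n', 'cherry\n', 'date\n']
```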
