get the flagged bit index in pure python - python

i involve here 2 questions: one is for 'how', and second is for 'is this great solution sounds ok?'
the thing is this: i have an object with int value that stores all the persons' ids that used that object. it's done using a flagging technique (person id is 0-10).
i got to a situation where in case this value is flagged with only one id, i want to get this id.
for the first test i used value & (value-1) which is nice, but as for the second thing, i started to wonder what's the best way to do it (the reason i wonder about it is because this calculation happens at least 300 times in a second in a critical place).
so the 1st way i thought about is using math.log(x,2), but i feel a little uncomfortable with this solution since it involves "hard" math on a value, instead of very simple bits operation, and i feel like i'm missing something.
the 2nd way i thought about is to count value<<1 until it reaches 1, but it as you could see in the benchmark test, it was just worse.
the 3rd way i was implemented is a non-calc way, and was the fastest, and it's using a dictionary with all the possible values for ids 0-10.
so like i said before: is there a 'right' way for doing it in pure python?
is a dictionary-based solution is a "legitimate" solution? (readability/any-other-reason-why-not-to?)
import math
import time
def find_bit_using_loop(num,_):
c=0
while num!=1:
c+=1
num=num>>1
return c
def find_bit_using_dict(num,_):
return options[num]
def get_bit_idx(num, func):
t=time.time()
for i in xrange(100000):
a=func(num,2)
t=time.time()-t
#print a
return t
options={}
for i in xrange(20):
options[1<<i]=i
num=256
print "time using log:", get_bit_idx(num, math.log)
print "time using loop:", get_bit_idx(num, find_bit_using_loop)
print "time using dict:", get_bit_idx(num, find_bit_using_dict)
output:
time using log: 0.0450000762939
time using loop: 0.156999826431
time using dict: 0.0199999809265
(there's a very similar question here: return index of least significant bit in Python , but first, in this case i know there's only 1 flagged bit, and second, i want to keep it a pure python solution)

If you are using Python 2.7 or 3.1 and above, you can use the bit_length method on integers. It returns the number of bits necessary to represent the integer, i.e. one more than the index of the most significant bit:
>>> (1).bit_length()
1
>>> (4).bit_length()
3
>>> (32).bit_length()
6
This is probably the most Pythonic solution as it's part of the standard library. If you find that dict performs better and this is a performance bottleneck, I see no reason not to use it though.

Related

Need help finding GCD (noob approach)

I am currently going through Math adventures with Python book by Peter Farrel. Now I am simply trying to improve my math skills while learning Python in a fun way. So we made a factors function as seen below:
def factors(num):
factorList = []
for i in range(1, num+1):
if num % i == 0:
factorList.append(i)
return factorList
Exercise 3-1 is asking to make GCF (Greatest Common Factor) function. All the answers here are how we could use builtin Python modules or recursive or Euclid algorithm. I have no clue what any of these things mean, let alone trying it on this assignment. I came with the following solution using the above function:
def gcFactor(num1, num2):
fnum1 = factors(num1)
fnum2 = factors(num2)
gcf = list(set(fnum1).intersection(fnum2))
return max(gcf)
print(gcFactor(28,21))
Is this the best way of doing it? Using the .intersection() function seems a little cheaty to me.
So what I wanted to do is if I could use a loop and separate the list values in fnum1 & fnum2 and compare them and then return the value that matches (which would make common factors) and is greatest (which would be GCF).
The idea behind your algorithm is sound, but there are a few problems:
In your original version, you used gcf[-1] to get the greatest factor, but that will not always work, since converting a set to list does not guarantee that the elements will be in sorted order, even if they were sorted before converting to set. Better use max (you already changed that).
Using set.intersection is definitely not "cheating" but just making good use of what the languages provides. It might be considered cheating to just use math.gcd, but not basic set or list functions.
Your algorithm is rather inefficient. I don't know the book, but I don't think you should actually use the factors function to calculate the gcf, but that was just an exercise to teach you stuff like loops and modulo. Consider two very different numbers as inputs, say 23764372 and 6. You'd calculate all the factors of 23764372 first, before testing the very few values that could actually be common factors. Instead of using factors directly, try to rewrite your gcFactor function to test which values up to the min of the two numbers are factors of both numbers.
Even then, your algorithm will not be very efficient. I would suggest reading up on Euclid's Algorithm and trying to implement that next. If unsure if you did it right, you can use your first function as a reference for testing, and to see the difference in performance.
About your factors function itself: Note that there is a symmetry: if i is a factor, so is n//i. If you use this, you do not have to test all the values up to n but just up to sqrt(n), which is a speed-up equivalent to reducing running time from O(n²) to O(n).

Check if float is an integer: is_integer() vs. modulo 1

I've seen a number of questions asking how to check if a float is an integer. Majority of answers seem to recommend using is_integer():
(1.0).is_integer()
(1.55).is_integer()
I have also occasionally seen math.floor() being used:
import math
1.0 == math.floor(1.0)
1.55 == math.floor(1.55)
I'm wondering why % 1 is rarely used or recommended?
1.0 % 1 == 0
1.55 % 1 == 0
Is there a problem with using modulo for this purpose? Are there edge cases that this doesn't catch? Performance issues for really large numbers?
If % 1 is a fine alternative, then I'm also wondering why is_integer() was introduced to the standard library?
It seems that % is much more flexible. For example, its common to use % 2 to check if a number is odd/even, or % n to check if something is a multiple of n. Given this flexibility, why introduce a new method (is_integer) that does the same thing, or use math.floor, both of which require knowing/remembering that they exist and knowing how to use them? I know that math.floor has uses beyond just integer checking but still...
All are valid for the purpose. The math.floor option requires exact matching between a specific value and the result of the floor function. Which is not very convenient if you want to encapsulate it in a generic method. So it boils down to the first and third option. Both are valid and will do the job. So the key difference is simple - performance:
from timeit import Timer
def with_isint(num):
return num.is_integer()
def with_mod(num):
return num % 1 == 0
Timer(lambda: with_isint(10.0)).timeit(number=10000000)
#output: 2.0617980659008026
Timer(lambda: with_mod(10.0)).timeit(number=10000000)
#output: 2.6560597440693527
Naturally this is a simple operation so you'd need a lot of calls in order to see a considerable difference, as you can see in the example.
One soft reason is definitely: readability
If a function called is_integer() returns True, it is obvious what you have been testing.
However, using the modulo solution, one has to think through the process to see, that it is actually testing if a float is an integer. If you wrap your modulo formalism in a function with an obvious name such as simon_says_its_an_integer(), I think it's just as fine (apart from needlessly introducing an already existing function).

Most efficient way to determine if an element is in a list

So I have alist = [2,4,5,6,9,10], and b = 6. What is the more efficient way to determine if b is in alist?
(1)
if b in alist:
print " b is in alist"
(2)
def split_list(alist,b):
midpoint = len(alist)/2
if b<=alist[midpoint]:
alist =alist[:midpoint]:
split_list(alist,b)
else:
alist=alist[midPoint:]
split_list(alist,b)
I thought method number 1 is better because it is only one line of code, but I've read that method 2 is better because it searchs from middle of list rather than from the beginning the.
Actually the difference between the functions you have shown lies in the matter of time saving during execution. If you are sure that your list will always have more than 2 members then function 2 is better but not too much.
Here is how it works
Function 1
if b in alist:
print " b is in alist"
This will loop through all element in the list only looking for b and when it finds it makes it true but what if your list has 200 members times become sensitive for your program
Function 2
def split_list(alist,b):
midpoint = len(alist)/2
if b<=alist[midpoint]:
alist =alist[:midpoint]:
split_list(alist,b)
else:
alist=alist[midPoint:]
split_list(alist,b)
This does the same except now you are testing a condition first using that midpoint so as to know where might "b" be so as to save the task of looping through the whole list now you will loop half the time, Note:You will make sure that your list has much members may be more than 3 to be reasonable to do that remainder because it may make your logic easy and readable in the future. So in some way it has helped you but consider the fact that what if your list has 200 elements and you divide that by two will it be too helpful to divide it by two and use 100 loop?
No!It still take significant time!
My suggestion according to your case is that if you want to work with small lists your function 1 is better. But if you want to work with huge lists!! Here are some functions which will solve your problem will saving much of your time if you want the best performance for your program. This function uses some built in functions which does take small time to finish because of some list information are in already in memory
def is_inside(alist,b):
how_many=alist.count(b) #return the number of times x appears in the list
if how_many==0:
return False
else:
return True
#you can also modify the function in case you want to check if an element appears more than once!
But if you don't want it to say how many times an element appears and only one satisfy your need! This also another way of doing so using some built in functions for lists
def is_inside(alist,b):
try:
which_position=alist.index(b) #this methods throws an error if b is not in alist
return True
except Error:
return False
So life becomes simple when using built functions specifically for lists. You should consider reading how to use lists well when they long for performance of the programs stuffs like dequeue,stacks,queue,sets
Good source is the documentation itself Read here!
The expected way to find something in a list in python is using the in keyword. If you have a very large dataset, then you should use a data structure that is designed for efficient lookup, such as a set. Then you can still do a find via in.

Questions related to performance/efficiency in Python/Django

I have few questions which are bothering me since few days back. I'm a beginner Python/Django Programmer so I just want to clear few things before I dive into real time product development.(for Python 2.7.*)
1) saving value in a variable before using in a function
for x in some_list/tuple:
func(do_something(x))
for x in some_list/tuple:
y = do_something(x)
func(y)
Which one is faster or which one I SHOULD use.
2)Creating a new object of a model in Django
def myview(request):
u = User(username="xyz12",city="TA",name="xyz",...)
u.save()
def myview(request):
d = {'username':"xyz12",'city':"TA",'name':"xyz",...}
u = User(**d)
u.save()
3) creating dictionary
var = Dict(key1=val1,key2=val2,...)
var = {'key1':val1,'key2':val2,...}
4) I know .append() is faster than += but what if I want to append a list's elements to another
a = [1,2,3,],b=[4,5,6]
a += b
or
for i in b:
a.append(i)
This is a very interesting question, but I think you don't ask it for the good reason. The performances gained by such optimisations are negligible, especially if you're working with small number of elements.
On the other hand, what is really important is the ease of reading the code and it's clarity.
def myview(request):
d = {'username':"xyz12",'city':"TA",'name':"xyz",...}
u = User(**d)
u.save()
This code for example isn't "easy" to read and to understand at first sight. It requires to think about it before finding what is actually does. Unless you need the intermediary step, don't do it.
For the 4th point, I'd go for the first solution, way much clearer (and it avoids the function call overhead created by calling the same function in a loop). You could also use more specialised function for better performances such as reduce (see this answer : https://stackoverflow.com/a/11739570/3768672 and this thread as well : What is the fastest way to merge two lists in python?).
The 1st and 3rd points are usually up to what you prefer, as both are really similar and will probably be optimised when compiled to bytecode anyway.
If you really want to optimise more your code, I advise you to go check this out : https://wiki.python.org/moin/PythonSpeed/PerformanceTips
PS : Ultimately, you can still do your own tests. Write two functions doing the exact same things with the two different methods you want to test, measure the execution times of these methods and compare them (be careful, do the tests multiple time to reduce the uncertainties).

Python dictionary instead of switch/case

I've recently learned that python doesn't have the switch/case statement. I've been reading about using dictionaries in its stead, like this for example:
values = {
value1: do_some_stuff1,
value2: do_some_stuff2,
valueN: do_some_stuffN,
}
values.get(var, do_default_stuff)()
What I can't figure out is how to apply this to do a range test. So instead of doing some stuff if value1=4 say, doing some stuff if value1<4. So something like this (which I know doesn't work):
values = {
if value1 <val: do_some_stuff1,
if value2 >val: do_some_stuff2,
}
values.get(var, do_default_stuff)()
I've tried doing this with if/elif/else statements. It works fine but it seems to go considerably slower compared to the situation where I don't need the if statements at all (which is maybe something obvious an inevitable). So here's my code with the if/elif/else statement:
if sep_ang(val1,val2,X,Y)>=ROI :
main.removeChild(source)
elif sep_ang(val1,val2,X,Y)<=5.0:
integral=float(spectrum[0].getElementsByTagName("parameter")[0].getAttribute("free"))
index=float(spectrum[0].getElementsByTagName("parameter")[0].getAttribute("free"))
print name,val1,val2,sep_ang(val1,val2,X,Y),integral,index
print >> reg,'fk5;point(',val1,val2,')# point=cross text={',name,'}'
else:
spectrum[0].getElementsByTagName("parameter")[0].setAttribute("free","0") #Integral
spectrum[0].getElementsByTagName("parameter")[1].setAttribute("free","0") #Index
integral=float(spectrum[0].getElementsByTagName("parameter")[0].getAttribute("free"))
index=float(spectrum[0].getElementsByTagName("parameter")[0].getAttribute("free"))
print name,val1,val2,sep_ang(val1,val2,X,Y),integral,index
print >> reg,'fk5;point(',val1,val2,')# point=cross text={',name,'}'
Which takes close to 5 min for checking about 1500 values of the var sep_ang. Where as if I don't want to use setAttribute() to change values in my xml file based on the value of sep_ang, I use this simple if else:
if sep_ang(val1,val2,X,Y)>=ROI :
main.removeChild(source)
else:
print name,val1,val2,ang_sep(val1,val2,X,Y);print >> reg,'fk5;point(',val1,val2,')# point
Which only takes ~30sec. Again I know it's likely that adding that elif statement and changing values of that attribute inevitably increases the execution time of my code by a great deal, I was just curious if there's a way around it.
Edit:
Is the benefit of using bisect as opposed to an if/elif statement in my situation that it can check values over some range quicker than using a bunch of elif statements?
It seems like I'll still need to use elif statements. Like this for example:
range=[10,100]
options='abc'
def func(val)
return options[bisect(range, val)]
if func(val)=a:
do stuff
elif func(val)=b:
do other stuff
else:
do other other stuff
So then my elif statement are only checking against a single value.
Thanks much for the help, it's greatly appreciated.
A dictionary is the wrong structure for this. The bisect examples show an example of this sort of range test.
Whilst the dictionary approach works well for single values, if you want ranges, if ... else if ... else if is probably the simplest approach.
If you're looking for a single value this a good match to a dictionary - since this is what dictionaries are for - but if you're looking for a range it doesn't work. You could do it with a dict using something like:
values = {
lambda x: x < 4: foo,
lambda x: x > 4: bar
}
and then loop through all the key-value pairs in the dictionary, passing your value key and running the value as a function if the key function returns true.
However, this wouldn't give you any benefit over a number of if statements and would be harder to maintain and debug. So don't do it, and just use if instead.
In that case you would use an if/then/else. You cannot do this with a switch, either.
The idea of a switch statement is that you have a value V that you test for identity against N possible outcomes. You can do this with an if-construct - however that would take O(N) runtime on average. The switch gives you constant O(1) every time.
This is obviously not possible for ranges (since they are not easily hashable) and thus you use if-constructs for these cases.
Example
if value1 <val: do_some_stuff1()
elif value2 >val: do_some_stuff2()
Note that this is actually smaller than trying to use a dictionary.
dict is not for doing this (nor is switch!).
A couple posters have suggested a dict with containment functions, but this is not the solution you want at all. It is O(n) (like an if statement), it doesn't really work (because you could have overlapping conditions), is unpredictable (because you do not know what order you will do the loop), and is much less clear than the equivalent if statement. The if statement is probably the way you want to go if you have a short, static-length list of conditions to apply.
If you have tons of conditions or if they could change as a result of your program, you want a different data structure. You could implement a binary tree or keep a sorted list and use the bisect module to find a value associated with the given range.
I don't know of any practicable solution. If you want to go with the guess what it does approach though you could do something like this:
obsure_switch = {
lambda x: 1<x<6 : some_function,
...
}
[action() for condition,action in obscure_switch.iteritems() if condition(var)]
Finally figured out what to do!
So instead of using a bunch of elif statements I did this:
range=[10,100]
options='abc'
def func(val)
choose=str(options[bisect(range,val)])
exec choose+"()"
def a():
do_stuff
def b():
do_other_stuff
def c():
do_other_other stuff
Not only does it work but it goes almost as fast as my original 4 line code where I'm not changing any values of things!

Categories