Which way is more Pythonic? - python

I want to write a function that creates an empty square matrix of size NxN.
I have two ways to write this:
1:
s_matrix = []
create_empty_square_matrix(s_matrix, N)
2:
s_matrix = empty_square_matrix(N)
(Of course, the two functions will differ a bit. create_empty_square_matrix works like a procedure: it only manipulates s_matrix. empty_square_matrix creates and returns a matrix.)
Which way is more Pythonic and clearer?
Do you have any suggestions about naming style? I'm not sure about empty_square_matrix and create_empty_square_matrix.

I'd always prefer the second way.
The problem with the first is that you pass the object you want to write to as a parameter (s_matrix), and the caller of the function has to know that it must be passed an empty list. What happens if the caller passes a dict, or a list that is not empty?
By the way, if you want to do matrix calculations, take a look at the NumPy library; it offers many things that standard Python does not.
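A minimal sketch of the second approach, assuming a plain list-of-lists representation. The fill parameter is an illustrative addition, not something from the question. Each row is built separately so that the rows are not aliases of one another.

```python
def empty_square_matrix(n, fill=None):
    """Return a new n x n matrix as a list of n independent rows."""
    # The comprehension creates a fresh row list on every iteration.
    # [[fill] * n] * n would instead repeat one and the same row object n times.
    return [[fill] * n for _ in range(n)]


s_matrix = empty_square_matrix(3)
s_matrix[0][0] = 1  # only the first row changes
print(s_matrix)
```

Because the function returns a new object, the caller never has to prepare an empty list first, which is the point made above.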

Attempting to use np.insert in a created class which has subscripts yields "object does not support item assignment" debug

I have defined my own class which takes in any matrix and is defined in such a way to convert this matrix into three numpy arrays inside a parenthesis (which I assume means it's a tuple). Furthermore, I have added a getitem method which allows output arrays to be subscript-able just like normal arrays.
My class is called MatrixConverter, and say x is some random matrix, then:
q=MatrixConverter(x)
Where q gives:
q=(array[1,2,3,4],array[5,6,7,8],array[9,10,11,12])
(Note that this is just an example, it does not produce three arrays with consecutive numbers)
Then, for example, by my getitem method, it allows for:
q[0] = array[1,2,3,4]
q[0][1] = 2
Now, I'm attempting to design a method to add an element into one of the arrays using the np.insert function, such as the following:
class MatrixConverter:
    # some code here
    def __change__(self, n, x):
        self[1] = np.insert(self[1], n, x)
        return self
Then, my desired output for the case where n=2 and x=70 is the following:
In:q.__change__(2,70)
Out:(array[1,2,3,4],array[5,6,70,7,8],array[9,10,11,12])
However, this gives me a TypeError: 'MatrixConverter' object does not support item assignment.
Any help or debugging tips? Should I perhaps use np.concatenate instead?
Thank you!
Change your method to:
def __change__(self, n, x):
    temp = np.insert(self[1], n, x)
    self[1] = temp
    return self
This will help you distinguish between a problem with the insert and a problem with the self[1] = ... setting.
I don't think the problem is with the insert call, but you need to write code that doesn't confuse you on such matters. That's a basic part of debugging.
Beyond that, you haven't given us enough code to help you. For example, what does the "getitem" look like?
Expressions like array[1,2,3,4] tell me that you aren't actually copying from your code. That's not a valid Python expression, or array display.
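The error message itself hints at a likely cause: Python raises "object does not support item assignment" when a class defines __getitem__ but not __setitem__. Below is a minimal sketch under that assumption. The original class was not shown, so the arrays attribute, the constructor and the method name insert_into are all hypothetical.

```python
import numpy as np


class MatrixConverter:
    def __init__(self, arrays):
        # Hypothetical storage: a mutable list of NumPy arrays.
        self.arrays = [np.asarray(a) for a in arrays]

    def __getitem__(self, index):
        return self.arrays[index]

    def __setitem__(self, index, value):
        # Without this method, self[1] = ... raises the TypeError from the question.
        self.arrays[index] = value

    def insert_into(self, row, n, x):
        # np.insert returns a new array, so the result has to be stored back.
        self[row] = np.insert(self[row], n, x)
        return self


q = MatrixConverter([[1, 2, 3, 4], [5, 6, 7, 8], [9, 10, 11, 12]])
q.insert_into(1, 2, 70)
print(q[1])
```

If the arrays are actually kept in a tuple, assigning to one position fails for the same reason, and the tuple would have to be rebuilt instead.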

Handle multiple returns from a function in Python

I wrote a function (testFunction) with four return values in Python:
diff1, diff2, sameCount, vennPlot
where the first 3 values (in the output tuple) were used to plot "vennPlot" inside of the function.
A similar question was asked: How can I plot output from a function which returns multiple values in Python?, but in my case I also want to know two additional things:
I will likely use this function later, and it seems I need to memorize the order of the returns so that I can extract the correct return for downstream work. Am I correct here? If so, are there better ways to refer to the tuple return than output[1] or output[2]? (output=testFunction(...))
Generally speaking, is it appropriate to have multiple outputs from a function? (E.g. in my case, I could just return the first three values and draw the venn diagram outside of the function.)
Technically, every function returns exactly one value; that value, however, can be a tuple, a list, or some other type that contains multiple values.
That said, you can return something that uses something other than just the order of values to distinguish them. You can return a dict:
def testFunction(...):
    ...
    return dict(diff1=..., diff2=..., sameCount=..., venn=...)

x = testFunction(...)
print(x['diff1'])
or you can define a named tuple:
import collections

ReturnType = collections.namedtuple('ReturnType', 'diff1 diff2 sameCount venn')

def testFunction(...):
    ...
    return ReturnType(diff1=..., diff2=..., sameCount=..., venn=...)

x = testFunction(...)
print(x.diff1)  # or x[0], if you still want to use the index
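A runnable sketch of the named-tuple variant, assuming the function compares two sets. The name compare_sets, the field names and the omission of the plot are illustrative choices, not the asker's actual testFunction.

```python
from collections import namedtuple

SetComparison = namedtuple("SetComparison", "diff1 diff2 same_count")


def compare_sets(a, b):
    """Hypothetical stand-in for testFunction, without the plotting step."""
    a, b = set(a), set(b)
    return SetComparison(diff1=a - b, diff2=b - a, same_count=len(a & b))


result = compare_sets([1, 2, 3], [2, 3, 4])
print(result.same_count)            # access by name
print(result[0])                    # index access still works
diff1, diff2, same_count = result   # and so does unpacking
```

A named tuple keeps the tuple behaviour (ordering, unpacking) while removing the need to remember which position holds which value.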
To answer your first question, you can unpack tuples returned from a function as such:
diff1, diff2, samecount, vennplot = testFunction(...)
Secondly, there is nothing wrong with multiple outputs from a function, though using multiple return statements within the same function is typically best avoided if possible for clarity's sake.
I will likely use this function later, and it seems I need to memorize the order of the returns so that I can extract the correct return for downstream work. Am I correct here?
It seems you're correct (depends on your use case).
If so, are there better ways to refer to the tuple return than output[1] or output[2]? (output=testFunction(...))
You could use a namedtuple (see the collections docs)
or, if order is not important, you could just return a dictionary, so you can access the values by name.
Generally speaking, is it appropriate to have multiple outputs from a function? (E.g. in my case, I could just return the first three values and draw the venn diagram outside of the function.)
Sure, as long as it's documented, then it's just what the function does and the programmer knows then how to handle the return values.
Python supports direct unpacking into variables. So downstream, when you call the function, you can retrieve the return values into separate variables as simply as:
diff1, diff2, sameCount, vennPlot = testFunction(...)
EDIT: You can even "swallow" the ones you don't need. For example:
diff1, *stuff_in_the_middle, vennPlot = testFunction(...)
in which case stuff_in_the_middle will contain a list of the two middle values.
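A small sketch of starred unpacking, using a made-up function four_values in place of testFunction. Note that the starred name always receives a list, even when the function returns a tuple.

```python
def four_values():
    # Hypothetical function returning a 4-tuple.
    return "d1", "d2", 3, "plot"


first, *middle, last = four_values()
print(middle)  # the two middle values, collected into a list

# "_" is a common throwaway name for values you do not need.
first, _, _, last = four_values()
```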
It is quite appropriate AFAIK, even standard library modules return tuples.
For example - Popen.communicate() from the subprocess module.

Most efficient way to determine if an element is in a list

So I have alist = [2,4,5,6,9,10], and b = 6. What is the more efficient way to determine if b is in alist?
(1)
if b in alist:
    print("b is in alist")
(2)
def split_list(alist, b):
    midpoint = len(alist) // 2
    if b <= alist[midpoint]:
        alist = alist[:midpoint]
        split_list(alist, b)
    else:
        alist = alist[midpoint:]
        split_list(alist, b)
I thought method 1 was better because it is only one line of code, but I've read that method 2 is better because it searches from the middle of the list rather than from the beginning.
Actually, the difference between the two functions you have shown is a matter of execution time. If you are sure that your list will always have more than 2 members, then function 2 is better, but not by much.
Here is how it works.
Function 1
if b in alist:
    print("b is in alist")
This loops through the elements of the list looking for b, and evaluates to True as soon as it finds it. But what if your list has 200 members? Then the running time starts to matter for your program.
Function 2
def split_list(alist, b):
    midpoint = len(alist) // 2
    if b <= alist[midpoint]:
        alist = alist[:midpoint]
        split_list(alist, b)
    else:
        alist = alist[midpoint:]
        split_list(alist, b)
This does the same, except that it first tests a condition against the midpoint to find out which half b might be in (which only works if the list is sorted). That saves looping through the whole list: you now loop over half of it. Note: make sure your list has enough members, maybe more than 3, for this to be reasonable; keeping that in mind may make your logic easier to read in the future. So in some ways it has helped you, but consider this: if your list has 200 elements and you divide it in two, is it really that helpful to loop over 100 elements instead?
No! It still takes significant time!
My suggestion for your case: if you work with small lists, function 1 is better. If you work with huge lists, here are some functions built on the list's own methods. They run in C, like the in operator, but note that they are still linear scans of the list, so they do not reduce the number of elements examined.
def is_inside(alist, b):
    how_many = alist.count(b)  # number of times b appears in the list
    if how_many == 0:
        return False
    else:
        return True
# you can also modify the function in case you want to check whether an element appears more than once!
But if you don't need to know how many times an element appears and a single occurrence is enough, here is another way to do it with a built-in list method:
def is_inside(alist, b):
    try:
        which_position = alist.index(b)  # this method raises ValueError if b is not in alist
        return True
    except ValueError:
        return False
So life becomes simpler when you use the built-in methods made specifically for lists. When lists get long, you should also read up on the data structures that matter for program performance, such as deques, stacks, queues and sets.
A good source is the documentation itself.
The expected way to find something in a list in python is using the in keyword. If you have a very large dataset, then you should use a data structure that is designed for efficient lookup, such as a set. Then you can still do a find via in.
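A sketch of the lookup options, assuming the values are hashable (for the set) or the list is kept in ascending order (for the binary search). The helper name contains_sorted is made up for this example; bisect_left comes from the standard library's bisect module. Unlike the split_list sketch in the question, it has a well-defined end condition and returns a result.

```python
from bisect import bisect_left

alist = [2, 4, 5, 6, 9, 10]
b = 6

# Linear scan: fine for small lists.
print(b in alist)

# Set: build it once, then each lookup takes constant time on average.
aset = set(alist)
print(b in aset)


def contains_sorted(sorted_list, value):
    """Binary search; only valid if sorted_list is in ascending order."""
    i = bisect_left(sorted_list, value)
    return i < len(sorted_list) and sorted_list[i] == value


print(contains_sorted(alist, b))
```

Building the set costs one pass over the list, so it pays off when many lookups are made against the same data.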

Numpy: vectorizing two-branch test (ternary-operator like)

I am vectorizing a test in Numpy for the following idea: perform elementwise some test and pick expr1 or expr2 according to the test. This is like the ternary-operator in C: test?expr1:expr2
I see two major ways for performing that; I would like to know if there is a good reason to choose one rather than the other one; maybe also other tricks are available and I would be very happy to know about them. Main goal is speed; for that reason I don't want to use np.vectorize with an if-else statement.
For my example, I will re-build the min function; please, don't tell me about some Numpy function for computing that; this is a mere example!
Idea 1: Use the arithmetic value of the booleans in a multiplication:
# a and b have similar shape
test = a < b
ntest = np.logical_not(test)
out = test*a + ntest*b
Idea 2: More or less following the APL/J style of coding (by using the conditional expression as an index for an array made with one dimension more than initial arrays).
# a and b have similar shape
np.choose(a<b, np.array([b,a]))
This is a better way to use choose
np.choose(a<b, [b,a])
In my small timings it is faster. Also, the choose doc says: If choices is itself an array (not recommended), ....
(a<b).choose([b,a])
saves one level of function redirection.
Another option:
out = b.copy(); out[test] = a[test]
In quick tests this is actually faster. masked.filled uses np.copyto for this sort of 'where' copy, though it doesn't seem to be any faster.
A variation on the choose is where:
np.where(test,a,b)
Or use where (or np.nonzero) to convert boolean index to a numeric one:
I = np.where(test); out = b.copy(); out[I] = a[I]
For some reason this times faster than the one-piece where.
I've used the multiplication approach in the past; if I recall correctly, even with APL (though that's decades ago). An old trick to avoid divide by 0 was to add a zero test to the divisor, as in a/(b+(b==0)). But it's not as generally applicable: a*0 and a*1 have to make sense.
choose looks nice, but with the mode parameter it may be more powerful (and hence more complicated) than needed.
I'm not sure there is a 'best' way. Timing tests can evaluate certain situations, but I don't know whether they can be generalized across all cases.
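The variants discussed above can be put side by side. This is a small sketch with made-up sample arrays that only checks the variants agree with each other; it makes no claim about which one is fastest.

```python
import numpy as np

a = np.array([1, 7, 3, 9])
b = np.array([4, 2, 8, 6])
test = a < b

# Idea 1: arithmetic on the boolean mask.
by_arithmetic = test * a + np.logical_not(test) * b

# Idea 2: choose, with a plain list of choices rather than an array.
by_choose = np.choose(test, [b, a])

# where: the closest equivalent of test ? a : b.
by_where = np.where(test, a, b)

# Copy one branch, then overwrite the selected positions with the other.
by_mask = b.copy()
by_mask[test] = a[test]

for out in (by_arithmetic, by_choose, by_where, by_mask):
    assert np.array_equal(out, np.minimum(a, b))
```

For timing, the standard timeit module can be run on each expression with arrays of the size and dtype you actually use.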

Python 'pointer arithmetic' - Quicksort

In idiomatic C fashion, one can implement quicksort in a simple way with two arguments:
void quicksort(int inputArray[], int numelems);
We can safely use two arguments for later subdivisions (i.e. the partitions, as they're commonly called) via pointer arithmetic:
//later on in our quicksort routine...
quicksort(inputArray+last+1, numelems-last-1);
In fact, I even asked about this before on SO because I was untrained in pointer arithmetic at the time: see Passing an array to a function with an odd format - “v+last+1”
Basically, is it possible to replicate the same behavior in Python and if so, how? I have noticed that lists can be subdivided with the colon inside of square brackets (the slicing operator), but the slice operator does not pass the list from that point on; that is to say that the 1st element (0th index) is still the same in both cases.
As you're aware, Python's slice syntax makes a copy, so in order to manipulate a subsection of a list (not "array", in Python) in place, you need to pass around both the list and the start-index and size (or end-index) of the portion under discussion, much as you could in C. The signature of the recursive function would be something like:
def quicksort( inputList, numElems, startIndex = 0 ):
And the recursive call would be something like:
quicksort( inputList, numElems-last-1, last+1 )
Throughout the function you'd add startIndex to whatever list accesses you would make.
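A minimal sketch of that idea, using a start index and an end index rather than a start index and a size. The parameter names and the choice of the first element as pivot are assumptions made for this example, not part of the answer above.

```python
def quicksort(items, start=0, end=None):
    """Sort items[start:end] in place, passing index bounds instead of slices."""
    if end is None:
        end = len(items)
    if end - start < 2:
        return  # zero or one element: already sorted

    # Partition around the first element of the range.
    pivot = items[start]
    last = start
    for i in range(start + 1, end):
        if items[i] < pivot:
            last += 1
            items[last], items[i] = items[i], items[last]
    items[start], items[last] = items[last], items[start]

    # Recurse on both sides of the pivot; the list itself is never copied.
    quicksort(items, start, last)
    quicksort(items, last + 1, end)


data = [5, 2, 9, 1, 5, 6, 0]
quicksort(data)
print(data)
```

The second recursive call plays the role of quicksort(inputArray+last+1, numelems-last-1) in the C version: the offset travels as an argument instead of being folded into a pointer.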
I suppose if you want to do something like that you could do the following:
# list we want to mutate
sort_list = [1,2,3,4,5,6,7,8,9,0]

# wrapper just so everything looks pretty, process could go here if we wanted
def wrapper(a, numlems):
    cut = len(a) - numlems
    # overwrites a part of the list with another one
    a[cut:] = process(a[cut:])

# processing of the slice
def process(a):
    # just to show it works
    a[1] = 15
    return a

wrapper(sort_list, 2)
print(sort_list)
wrapper(sort_list, 4)
print(sort_list)
wrapper(sort_list, 6)
print(sort_list)
This is probably considered pretty evil in Python and I wouldn't really recommend it, but it does emulate the functionality you wanted.
For Python you only really need:
def quicksort(inputList, startIndex):
Then creating and concatenating slices would work fine without the need for pointer-like functionality.
