For example I have this list containing ranges.
x=[[1,4],
[6,7],
[9,9]]
where the first value of each item (e.g. [1,4]) is the start position (1) and, the second value is the end (4) position.
I want to convert this list of ranges into a boolean list, wherein the value is True if the position is between (any of) the ranges (i.e. the start and end positions) indicated in the list above, otherwise the value should be False.
[False, True, True, True, True, False, True, True, False, True]
This is obviously possible using a for loop. However, I am looking for other options that are one-liners. Ideally, I am looking for an approach that could also be applied to a pandas Series.
Note: This is essentially an opposite problem of this question: Get ranges of True values (start and end) in a boolean list (without using a for loop)
A hopefully efficient way using numpy:
import numpy as np
low, high = np.array(x).T[:, :, None]  # rearrange the limits into a 3d array in a convenient shape
a = np.arange(high.max() + 1) # make a range from 0 to 9
print(((a >= low) & (a <= high)).any(axis=0))
An alternative that edits the array in a python loop:
result = np.zeros(np.array(x).max() + 1, dtype=bool)
for start, end in x:
    result[start:end+1] = True
This could be faster depending on the speed of editing a slice of an array relative to numpy 2d matrix comparisons.
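To make the comparison concrete, here is a minimal sketch that wraps the broadcasting approach in a function and checks it against the expected output from the question. The function name ranges_to_mask is just a label chosen for this example, and it assumes the ranges are inclusive and non-negative, as in the question.

```python
import numpy as np

def ranges_to_mask(ranges):
    """Return a boolean array that is True at every position covered by an inclusive [start, end] range."""
    ranges = np.asarray(ranges)
    low = ranges[:, 0, None]    # start positions as a column, shape (n, 1)
    high = ranges[:, 1, None]   # end positions as a column, shape (n, 1)
    positions = np.arange(ranges.max() + 1)   # 0 .. largest end position
    # Broadcasting compares every position with every range: shape (n, positions)
    return ((positions >= low) & (positions <= high)).any(axis=0)

x = [[1, 4], [6, 7], [9, 9]]
mask = ranges_to_mask(x)
print(mask.tolist())

# The slice-assignment loop should give the same answer
loop_mask = np.zeros(np.array(x).max() + 1, dtype=bool)
for start, end in x:
    loop_mask[start:end + 1] = True
print(np.array_equal(mask, loop_mask))
```

Since the result is an ordinary boolean NumPy array, it should be possible to pass it straight to a pandas Series constructor if that is the container you need.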
What explains the difference in behavior of boolean and bitwise operations on lists vs NumPy arrays?
I'm confused about the appropriate use of & vs and in Python, illustrated in the following examples.
mylist1 = [True, True, True, False, True]
mylist2 = [False, True, False, True, False]
>>> len(mylist1) == len(mylist2)
True
# ---- Example 1 ----
>>> mylist1 and mylist2
[False, True, False, True, False]
# I would have expected [False, True, False, False, False]
# ---- Example 2 ----
>>> mylist1 & mylist2
TypeError: unsupported operand type(s) for &: 'list' and 'list'
# Why not just like example 1?
>>> import numpy as np
# ---- Example 3 ----
>>> np.array(mylist1) and np.array(mylist2)
ValueError: The truth value of an array with more than one element is ambiguous. Use a.any() or a.all()
# Why not just like Example 4?
# ---- Example 4 ----
>>> np.array(mylist1) & np.array(mylist2)
array([False, True, False, False, False], dtype=bool)
# This is the output I was expecting!
This answer and this answer helped me understand that and is a boolean operation but & is a bitwise operation.
I read about bitwise operations to better understand the concept, but I am struggling to use that information to make sense of my above 4 examples.
Example 4 led me to my desired output, so that is fine, but I am still confused about when/how/why I should use and vs &. Why do lists and NumPy arrays behave differently with these operators?
Can anyone help me understand the difference between boolean and bitwise operations to explain why they handle lists and NumPy arrays differently?
and tests whether both expressions are logically True while & (when used with True/False values) tests if both are True.
In Python, empty built-in objects are typically treated as logically False while non-empty built-ins are logically True. This facilitates the common use case where you want to do something if a list is empty and something else if the list is not. Note that this means that the list [False] is logically True:
>>> if [False]:
...     print('True')
...
True
So in Example 1, the first list is non-empty and therefore logically True, so the truth value of the and is the same as that of the second list. (In our case, the second list is non-empty and therefore logically True, but identifying that would require an unnecessary step of calculation.)
For example 2, lists cannot meaningfully be combined in a bitwise fashion because they can contain arbitrary unlike elements. Things that can be combined bitwise include: Trues and Falses, integers.
NumPy objects, by contrast, support vectorized calculations. That is, they let you perform the same operations on multiple pieces of data.
Example 3 fails because NumPy arrays (of length > 1) have no truth value as this prevents vector-based logic confusion.
Example 4 is simply a vectorized bit and operation.
Bottom Line
If you are not dealing with arrays and are not performing math manipulations of integers, you probably want and.
If you have vectors of truth values that you wish to combine, use numpy with &.
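A short sketch of the bottom line, using the two lists from the question. The zip-based comprehension is one possible way to get an element-wise result from plain lists; it is not something the answer above prescribes.

```python
import numpy as np

mylist1 = [True, True, True, False, True]
mylist2 = [False, True, False, True, False]

# `and` only looks at the truthiness of the first operand and hands back
# one of the operands unchanged; the elements are never inspected.
print((mylist1 and mylist2) is mylist2)

# Element-wise "and" on plain lists needs an explicit pairing of elements.
pairwise = [a and b for a, b in zip(mylist1, mylist2)]
print(pairwise)

# With NumPy arrays, & is applied element by element.
vectorized = np.array(mylist1) & np.array(mylist2)
print(vectorized.tolist())
```

Both the comprehension and the NumPy version should produce the [False, True, False, False, False] that the question was expecting.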
About list
First a very important point, from which everything will follow (I hope).
In ordinary Python, list is not special in any way (except having cute syntax for constructing, which is mostly a historical accident). Once a list [3,2,6] is made, it is for all intents and purposes just an ordinary Python object, like a number 3, set {3,7}, or a function lambda x: x+5.
(Yes, it supports changing its elements, and it supports iteration, and many other things, but that's just what a type is: it supports some operations, while not supporting some others. int supports raising to a power, but that doesn't make it very special - it's just what an int is. lambda supports calling, but that doesn't make it very special - that's what lambda is for, after all:).
About and
and is not an operator (you can call it "operator", but you can call "for" an operator too:). Operators in Python are (implemented through) methods called on objects of some type, usually written as part of that type. A method has no way to hold off the evaluation of some of its operands, but and can (and must) do that.
The consequence of that is that and cannot be overloaded, just like for cannot be overloaded. It is completely general, and communicates through a specified protocol. What you can do is customize your part of the protocol, but that doesn't mean you can alter the behavior of and completely. The protocol is:
Imagine Python interpreting "a and b" (this doesn't happen literally this way, but it helps understanding). When it comes to "and", it looks at the object it has just evaluated (a), and asks it: are you true? (NOT: are you True?) If you are an author of a's class, you can customize this answer. If a answers "no", and (skips b completely, it is not evaluated at all, and) says: a is my result (NOT: False is my result).
If a doesn't answer, and asks it: what is your length? (Again, you can customize this as an author of a's class). If a answers 0, and does the same as above - considers it false (NOT False), skips b, and gives a as result.
If a answers something other than 0 to the second question ("what is your length"), or it doesn't answer at all, or it answers "yes" to the first one ("are you true"), and evaluates b, and says: b is my result. Note that it does NOT ask b any questions.
The other way to say all of this is that a and b is almost the same as b if a else a, except a is evaluated only once.
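The protocol can be watched in action with two small classes. Both Talkative and Sized are hypothetical names made up for this sketch; the first answers "are you true?" through __bool__ and counts how often it is asked, the second has no __bool__ and so only answers the length question.

```python
class Talkative:
    """Hypothetical class that counts how often it is asked 'are you true?'."""
    def __init__(self, truth):
        self.truth = truth
        self.asked = 0

    def __bool__(self):
        self.asked += 1
        return self.truth


class Sized:
    """Hypothetical class without __bool__; truthiness falls back to __len__."""
    def __init__(self, n):
        self.n = n

    def __len__(self):
        return self.n


a = Talkative(False)
b = Talkative(True)

result = a and b
print(result is a)         # a said "no", so a itself is the result
print(a.asked, b.asked)    # a was asked once, b was never asked

a2 = Talkative(True)
result2 = a2 and b
print(result2 is b)        # a2 said "yes", so b is the result
print(b.asked)             # and b still has not been asked anything

empty = Sized(0)
print((empty and b) is empty)    # length 0 counts as false
print((Sized(3) and b) is b)     # non-zero length counts as true
```

Note that the result is always one of the two operands, never a freshly made True or False.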
Now sit for a few minutes with a pen and paper, and convince yourself that when {a,b} is a subset of {True,False}, it works exactly as you would expect of Boolean operators. But I hope I have convinced you it is much more general, and as you'll see, much more useful this way.
Putting those two together
Now I hope you understand your example 1. and doesn't care if mylist1 is a number, list, lambda or an object of a class Argmhbl. It just cares about mylist1's answer to the questions of the protocol. And of course, mylist1 answers 5 to the question about length, so and returns mylist2. And that's it. It has nothing to do with elements of mylist1 and mylist2 - they don't enter the picture anywhere.
Second example: & on list
On the other hand, & is an operator like any other, like + for example. It can be defined for a type by defining a special method on that class. int defines it as bitwise "and", and bool defines it as logical "and", but that's just one option: for example, sets and some other objects like dict keys views define it as a set intersection. list just doesn't define it, probably because Guido didn't think of any obvious way of defining it.
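As an illustration, here is how a type can opt in to & by defining __and__. The Flags class is a hypothetical wrapper invented for this sketch; it picks an element-wise meaning, which is only one of several a list-like type could choose.

```python
class Flags:
    """Hypothetical list-like wrapper that defines & as an element-wise 'and'."""
    def __init__(self, values):
        self.values = list(values)

    def __and__(self, other):
        if not isinstance(other, Flags):
            return NotImplemented   # let Python raise the usual TypeError
        return Flags(a and b for a, b in zip(self.values, other.values))


print(6 & 3)                   # int: bitwise and of 0b110 and 0b011
print(True & False)            # bool: logical and
print({1, 2, 3} & {2, 3, 4})   # set: intersection

combined = Flags([True, True, False]) & Flags([True, False, False])
print(combined.values)

try:
    [True] & [False]           # list defines no __and__ at all
except TypeError as exc:
    print("TypeError:", exc)
```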
numpy
On the other leg:-D, numpy arrays are special, or at least they are trying to be. Of course, numpy.array is just a class, it cannot override and in any way, so it does the next best thing: when asked "are you true", numpy.array raises a ValueError, effectively saying "please rephrase the question, my view of truth doesn't fit into your model". (Note that the ValueError message doesn't speak about and - because numpy.array doesn't know who is asking it the question; it just speaks about truth.)
For &, it's completely different story. numpy.array can define it as it wishes, and it defines & consistently with other operators: pointwise. So you finally get what you want.
HTH,
The short-circuiting boolean operators (and, or) can't be overridden because there is no satisfying way to do this without introducing new language features or sacrificing short circuiting. As you may or may not know, they evaluate the first operand for its truth value, and depending on that value, either evaluate and return the second argument, or don't evaluate the second argument and return the first:
something_true and x -> x
something_false and x -> something_false
something_true or x -> something_true
something_false or x -> x
Note that the (result of evaluating the) actual operand is returned, not truth value thereof.
The only way to customize their behavior is to override __nonzero__ (renamed to __bool__ in Python 3), so you can affect which operand gets returned, but not return something different. Lists (and other collections) are defined to be "truthy" when they contain anything at all, and "falsey" when they are empty.
NumPy arrays reject that notion: For the use cases they aim at, two different notions of truth are common: (1) Whether any element is true, and (2) whether all elements are true. Since these two are completely (and silently) incompatible, and neither is clearly more correct or more common, NumPy refuses to guess and requires you to explicitly use .any() or .all().
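A small sketch of that refusal, and of the two explicit alternatives. The array contents are arbitrary example values.

```python
import numpy as np

arr = np.array([True, False, True])

try:
    bool(arr)    # the same question that `and`, `or` and `if` ask
except ValueError as exc:
    message = str(exc)
    print("ValueError:", message)

print(arr.any())   # is at least one element true?
print(arr.all())   # are all elements true?
```

For this array the two reductions disagree, which is exactly the ambiguity NumPy declines to resolve on your behalf.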
& and | (and ~, by the way; not itself always returns a plain bool) can be fully overridden, as they don't short circuit. They can return anything at all when overridden, and NumPy makes good use of that to do element-wise operations, as it does with practically any other scalar operation. Lists, on the other hand, don't broadcast operations across their elements. Just as mylist1 - mylist2 doesn't mean anything and mylist1 + mylist2 means something completely different, there is no & operator for lists.
Example 1:
This is how the and operator works.
x and y =>
if x is false, then x, else y
So in other words, since mylist1 is not False, the result of the expression is mylist2. (Only empty lists evaluate to False.)
Example 2:
The & operator is for a bitwise and, as you mention. Bitwise operations only work on numbers. The result of a & b is a number composed of 1s in bits that are 1 in both a and b. For example:
>>> 3 & 1
1
It's easier to see what's happening using binary literals (same numbers as above) and bin() to show the result in binary:
>>> bin(0b0011 & 0b0001)
'0b1'
Bitwise operations are similar in concept to boolean (truth) operations, but they work only on bits.
So, given a couple statements about my car
My car is red
My car has wheels
The logical "and" of these two statements is:
(is my car red?) and (does car have wheels?) => logical true or false value
Both of which are true, for my car at least. So the value of the statement as a whole is logically true.
The bitwise "and" of these two statements is a little more nebulous:
(the numeric value of the statement 'my car is red') & (the numeric value of the statement 'my car has wheels') => number
If python knows how to convert the statements to numeric values, then it will do so and compute the bitwise-and of the two values. This may lead you to believe that & is interchangeable with and, but as with the above example they are different things. Also, for the objects that can't be converted, you'll just get a TypeError.
Example 3 and 4:
Numpy implements arithmetic operations for arrays:
Arithmetic and comparison operations on ndarrays are defined as element-wise operations, and generally yield ndarray objects as results.
But it does not implement logical operations for arrays, because you can't overload logical operators in Python. That's why example three doesn't work, but example four does.
So to answer your and vs & question: Use and.
The bitwise operations are used for examining the structure of a number (which bits are set, which bits aren't set). This kind of information is mostly used in low-level operating system interfaces (unix permission bits, for example). Most python programs won't need to know that.
The logical operations (and, or, not), however, are used all the time.
In Python, the expression X and Y returns Y if bool(X) is True; otherwise it returns X without evaluating Y, e.g.:
>>> True and 20
20
>>> False and 20
False
>>> 20 and []
[]
The bitwise operator is simply not defined for lists. But it is defined for integers, where it operates on the binary representation of the numbers. Consider 16 (10000) and 31 (11111):
>>> 16 & 31
16
NumPy is not psychic: it does not know whether you mean that, e.g., [False, False] should count as True in a logical expression. In this it departs from standard Python behaviour, which is: "Any empty collection with len(collection) == 0 is False".
This is probably the behaviour you would expect from the & operator of NumPy arrays.
For the first example, and based on django's doc:
It will always return the second list. A non-empty list is seen as a True value by Python, so Python returns the 'last' True value, which is the second list.
In [74]: mylist1 = [False]
In [75]: mylist2 = [False, True, False, True, False]
In [76]: mylist1 and mylist2
Out[76]: [False, True, False, True, False]
In [77]: mylist2 and mylist1
Out[77]: [False]
Operations with a Python list operate on the list. list1 and list2 will check if list1 is empty, and return list1 if it is, and list2 if it isn't. list1 + list2 will append list2 to list1, so you get a new list with len(list1) + len(list2) elements.
Operators that only make sense when applied element-wise, such as &, raise a TypeError, as element-wise operations aren't supported without looping through the elements.
Numpy arrays support element-wise operations. array1 & array2 will calculate the bitwise and for each corresponding element in array1 and array2. array1 + array2 will calculate the sum for each corresponding element in array1 and array2.
This does not work for and and or.
array1 and array2 is essentially a short-hand for the following code:
if bool(array1):
    return array2
else:
    return array1
For this you need a good definition of bool(array1). For global operations like those used on Python lists, the definition is that bool(list) == True if list is not empty, and False if it is empty. For numpy's element-wise operations, it is ambiguous whether to check if any element evaluates to True, or if all elements evaluate to True. Because both are arguably correct, numpy doesn't guess and raises a ValueError when bool() is (indirectly) called on an array.
Good question. Similar to what you observed in examples 1 and 4 (or should I say 1 & 4 :) ) about the logical and and bitwise & operators, I ran into the same thing with sum. NumPy's sum and Python's built-in sum behave differently as well. For example:
Suppose "mat" is a numpy 5x5 2d array such as:
array([[ 1, 2, 3, 4, 5],
[ 6, 7, 8, 9, 10],
[11, 12, 13, 14, 15],
[16, 17, 18, 19, 20],
[21, 22, 23, 24, 25]])
Then numpy.sum(mat) gives the total sum of the entire matrix, whereas Python's built-in sum(mat) adds the rows together, i.e. it sums along the first axis only and returns the column totals. See below:
np.sum(mat) ## --> gives 325
sum(mat) ## --> gives array([55, 60, 65, 70, 75])
I want to create permutations of a matrix, which has 10 rows with 70 items each.
Every item contains either True or False. I need to create permutations of this matrix.
The problem is that I would need to write 1400 for statements.
Is there a better way to do these permutations?
matrix = [[False for i in range(0, 70)] for i in range(0, 10)]
possible_items = [True, False]
Edit: Loop through all possible combinations of all True and False items in the matrix.
I agree 100% with the comment made by @user2357112: there must be an underlying issue in your design that prompted you to pursue such a solution.
However, if for any reason you do want a solution to this you might consider using itertools.product.
import itertools

VALUES = (True, False)
rows = itertools.product(VALUES, repeat=70)  # lazy iterator over all 2**70 possible rows
This will produce all rows of 70 items of VALUES. I do not suggest running it to completion.
You can then easily extend this to be a solution to your problem, but I repeat, this is probably not a good way to do this.
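One way the extension to whole matrices might look, shown on a tiny 2 x 2 stand-in so that it actually finishes. The helper name all_matrices and the small ROWS and COLS values are assumptions made for this sketch; the real 10 x 70 case has 2 ** 700 combinations, a number with over 200 decimal digits, so it can never be enumerated.

```python
import itertools

VALUES = (True, False)
ROWS, COLS = 2, 2   # tiny stand-in for the 10 x 70 matrix


def all_matrices(rows, cols):
    """Lazily yield every rows x cols matrix of True/False values."""
    for flat in itertools.product(VALUES, repeat=rows * cols):
        # Cut the flat tuple into `rows` consecutive chunks of length `cols`
        yield [list(flat[r * cols:(r + 1) * cols]) for r in range(rows)]


matrices = list(all_matrices(ROWS, COLS))
print(len(matrices))         # 2 ** (ROWS * COLS)
print(matrices[0])
print(len(str(2 ** 700)))    # decimal digits in the count for the 10 x 70 case
```

Because all_matrices is a generator, nothing is built until you iterate, but that only helps with memory, not with the running time.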
This is definitely more of a notional question, but I wanted to get others' expert input on this topic at SO. Most of my programming lately involves NumPy arrays. I've been matching items in two or more arrays of different sizes. Most of the time I reach for a for-loop or, even worse, a nested for-loop. I'm ultimately trying to avoid for-loops as I gain more experience in Data Science, because for-loops perform slower.
I am well aware of NumPy and the pre-defined commands I can research, but for those of you who are experienced, do you have a general school of thought when you iterate through something?
Something similar to the following:
small_array = np.array(["a", "b"])
big_array = np.array(["a", "b", "c", "d"])
for i in range(len(small_array)):
    for p in range(len(big_array)):
        if small_array[i] == big_array[p]:
            print("This item is matched: ", small_array[i])
I'm well aware there is more than one way to skin a cat with this, but I am interested in others' approaches and ways of thinking.
Since I've been working with array languages for decades (APL, MATLAB, numpy) I can't help with the starting steps. But I suspect I work mostly from patterns, things I've seen and used in the past. And I do a lot of experimentation in an interactive session.
To take your example:
In [273]: small_array = np.array(["a", "b"])
     ...: big_array = np.array(["a", "b", "c", "d"])
     ...:
     ...: for i in range(len(small_array)):
     ...:     for p in range(len(big_array)):
     ...:         if small_array[i] == big_array[p]:
     ...:             print( "This item is matched: ", small_array[i])
     ...:
This item is matched: a
This item is matched: b
Often I run the iterative case just to get a clear(er) idea of what is desired.
In [274]: small_array
Out[274]:
array(['a', 'b'],
      dtype='<U1')
In [275]: big_array
Out[275]:
array(['a', 'b', 'c', 'd'],
      dtype='<U1')
I've seen this before: iterating over two arrays, and doing something with the paired values. This is a kind of outer operation. There are various tools, but the one I like best makes use of numpy broadcasting. I turn one array into an (n,1) array, and use it with the other (m,) array:
In [276]: small_array[:,None]
Out[276]:
array([['a'],
       ['b']],
      dtype='<U1')
The result of (n,1) operating with (1,m) is a (n,m) array:
In [277]: small_array[:,None]==big_array
Out[277]:
array([[ True, False, False, False],
       [False,  True, False, False]], dtype=bool)
Now I can take an any or all reduction on either axis:
In [278]: _.all(axis=0)
Out[278]: array([False, False, False, False], dtype=bool)
In [280]: __.all(axis=1)
Out[280]: array([False, False], dtype=bool)
I could also use np.where to reduce that boolean to indices.
Oops, I should have used any:
In [284]: (small_array[:,None]==big_array).any(0)
Out[284]: array([ True, True, False, False], dtype=bool)
In [285]: (small_array[:,None]==big_array).any(1)
Out[285]: array([ True, True], dtype=bool)
Having played with this, I remember that there's an in1d function that does something similar:
In [286]: np.in1d(big_array, small_array)
Out[286]: array([ True, True, False, False], dtype=bool)
But when I look at the code for in1d (see the [source] link in the docs), I see that in some cases it actually iterates over the small array:
In [288]: for x in small_array:
     ...:     print(x==big_array)
     ...:
[ True False False False]
[False True False False]
Compare that to Out[277]. x==big_array compares a scalar with an array. In numpy, doing something like ==, +, * etc with an array and scalar is easy, and should become second nature. Doing the same thing with 2 arrays of matching shapes is the next step. And from there do it with broadcastable shapes.
In other cases it uses np.unique and np.argsort.
This pattern of creating a higher dimension array by broadcasting the inputs against each other, and then combining values with some sort of reduction (any, all, sum, mean, etc) is very common.
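As a compact recap of that pattern, here is a minimal sketch that reuses the two arrays from the question: broadcast once to build the (n,m) table, then apply different reductions to it. The variable names table, in_small and match_count are my own, and np.isin is the newer spelling of in1d in current NumPy releases.

```python
import numpy as np

small_array = np.array(["a", "b"])
big_array = np.array(["a", "b", "c", "d"])

# (2, 1) compared with (4,) broadcasts to a (2, 4) boolean table.
table = small_array[:, None] == big_array

in_small = table.any(axis=0)     # per big_array item: does it occur in small_array?
match_count = table.sum(axis=1)  # per small_array item: number of matches in big_array
rows, cols = np.where(table)     # index pairs (i, p) with small_array[i] == big_array[p]

print(in_small)     # [ True  True False False]
print(match_count)  # [1 1]
print(rows, cols)   # [0 1] [0 1]

# The membership mask agrees with the library routine.
print(np.array_equal(in_small, np.isin(big_array, small_array)))  # True
```

The table costs n*m comparisons and n*m booleans of memory, so for large inputs a dedicated routine such as np.isin is likely the safer choice.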
I will interpret your question in a more specific way:
How do I quit using index variables?
How do I start writing list comprehensions instead of normal loops?
To quit using index variables, the key is to understand that "for" in Python is not the "for" of other languages. It should be called "for each".
for x in small_array:
    for y in big_array:
        if x == y:
            print("This item is matched: ", x)
That's much better.
I also find myself in situations where I would write code with normal loops (or actually do it) and then start wondering whether it would be clearer and more elegant with a list comprehension.
List comprehensions are really a domain-specific language for creating lists, so the first step would be to learn their basics. A typical statement would be:
l = [f(x) for x in list_expression if g(x)]
Meaning "give me a list of f(x), for all x out of list_expression that meet condition g".
So you could write it in this way:
matched = [x for x in small_array if x in big_array]
Et voilà, you are on the road to pythonic style!
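A minimal sketch of that template on plain Python lists follows. The extra item "e" and the names small_list, big_list and big_set are illustrative additions, not from the question. It shows the filter part g(x) and the transform part f(x) separately, plus an optional set for the membership test, which assumes the items are hashable.

```python
small_list = ["a", "b", "e"]
big_list = ["a", "b", "c", "d"]

# Same shape as l = [f(x) for x in list_expression if g(x)],
# with f(x) = x and g(x) = "x is in big_list".
matched = [x for x in small_list if x in big_list]
print(matched)  # ['a', 'b']

# Assumption: items are hashable, so a set can serve the membership test
# without rescanning big_list for every x.
big_set = set(big_list)
matched_upper = [x.upper() for x in small_list if x in big_set]
print(matched_upper)  # ['A', 'B']
```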
As you said, you had better use vectorized operations to speed things up. Learning this is a long path. You have to get used to matrix multiplication if you aren't already. Once you are, try to translate your data into matrices and see which multiplications you can do. Usually you can't do what you want with this alone and end up with super-matrices (more than 2 dimensions). That's where numpy gets useful.
Numpy provides functions like np.where; know how to use them. Know shortcuts like small_array[small_array == 'a'] = 'z'. Try to combine numpy functions with native Python ones (map, filter...).
To handle multi-dimensional matrices, there's no secret: practice, and use paper to understand what you're doing. But beyond 4 dimensions it starts getting very tricky.
For loops are not necessarily slow. That is a piece of MATLAB folklore that has spread over time, and MATLAB itself is to blame for it. Vectorization is "for" looping, but at a lower level. You need to get a handle on what kind of data and architecture you are working with and which kind of function you are executing over your data.
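The tools named above can be tried on a tiny array. This is only a sketch: the three-item array is made up for illustration, and the names positions and replaced are my own.

```python
import numpy as np

small_array = np.array(["a", "b", "a"])

# np.where with one argument returns the indices where the condition holds.
positions = np.where(small_array == "a")[0]
print(positions)  # [0 2]

# np.where with three arguments picks between two values element by element
# and returns a new array, leaving the input untouched.
replaced = np.where(small_array == "a", "z", small_array)
print(replaced)  # ['z' 'b' 'z']

# The boolean-mask shortcut changes the array in place.
small_array[small_array == "a"] = "z"
print(small_array)  # ['z' 'b' 'z']

# A native Python tool (filter) applied to the array's items.
print(list(filter(lambda s: s != "z", small_array.tolist())))  # ['b']
```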