I'm a Matlab user who needs to use Python for a few things; I would really appreciate it if someone could help me out with Python syntax:
(1) Is it true that lists can be indexed by tuples in Python? If so, how do I do this? For example, I would like to use that to represent a matrix of data.
(2) Assuming I can use a list indexed by tuples, say, data[(row,col)], how do I remove an entire column? I know in Matlab, I can do something like
new_data = [data(:,1:x-1) data(:,x+1:end)];
if I wanted to remove column x from data.
(3) How can I easily count the number of non-negative elements in each row? For example, in Matlab, I can do something like this:
sum(data>=0,2)
this would give me a column vector containing the number of non-negative entries in each row.
Thanks a lot!
You should look into numpy, it's made for just this sort of thing.
No, but dicts can.
Sounds like you want a "2d array", matrix type, or something else. Have you looked at numpy yet?
Depends on what you choose from #2, but Python does have sum and other functions that work directly on iterables. Look at gen-exprs (generator expressions) and list comprehensions. For example:
row_count_of_non_neg = sum(1 for n in row if n >= 0)
# or:
row_count_of_non_neg = sum(n >= 0 for n in row)
# "abusing" True == 1 and False == 0
I agree with everyone. Use Numpy/Scipy. But here are specific answers to your questions.
Yes. And the index can be either a built-in list or a Numpy array. Suppose x = numpy.array([10, 11, 12, 13]) and y = numpy.array([0, 2]). Then x[[0, 2]] and x[y] both return the same thing.
new_data = numpy.delete(data, x, axis=1)
(data>=0).sum(axis=1)
Careful: the axis argument is a common stumbling block for Matlab users. In Numpy, axis=0 always operates along the first dimension (rows) and axis=1 along the second (columns), consistently across functions, so delete(..., axis=1) removes column x while (data>=0).sum(axis=1) counts across each row. Also remember that Numpy indexing is 0-based, unlike Matlab's 1-based, column-major conventions.
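To make the axis behavior concrete, here is a small sketch of the two calls above on a toy matrix (plain numpy; older scipy releases simply re-exported these numpy functions):
import numpy as np

data = np.array([[1, -2, 3],
                 [4, 5, -6]])

# Remove column 1 (axis=1 walks across columns)
np.delete(data, 1, axis=1)
# array([[ 1,  3],
#        [ 4, -6]])

# Count non-negative entries in each row (axis=1 sums across each row)
(data >= 0).sum(axis=1)
# array([2, 2])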
Here's an example of how to easily create an array (matrix) in numpy:
>>> import numpy
>>> a = numpy.array([[1,2,3],[4,5,6],[7,8,9]])
here is how it is displayed
>>> a
array([[1, 2, 3],
[4, 5, 6],
[7, 8, 9]])
and how to get a row or column:
>>> a[0,:]
array([1, 2, 3])
>>> a[:,0]
array([1, 4, 7])
Hope the syntax is clear from the example! Numpy is rather powerful.
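To tie this back to question (2), removing a column works the same way, for example with numpy.delete or by keeping only the columns you want:
>>> numpy.delete(a, 1, axis=1)
array([[1, 3],
       [4, 6],
       [7, 9]])
>>> a[:, [0, 2]]
array([[1, 3],
       [4, 6],
       [7, 9]])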
You can expand list functionality to allow indexing with tuples by overloading the __getitem__ and __setitem__ methods of the built-in list. Try the following code:
class my_list(list):
    def __getitem__(self, key):
        if isinstance(key, tuple) and len(key) > 0:
            temp = []
            for k in key: temp.append(list.__getitem__(self, k))
            return temp
        else:
            return list.__getitem__(self, key)
    def __setitem__(self, key, data):
        if isinstance(key, tuple) and len(key) > 0:
            for k in key: list.__setitem__(self, k, data)
        else:
            list.__setitem__(self, key, data)

if __name__ == '__main__':
    L = my_list([1, 2, 3, 4, 5])
    T = (1, 3)
    print(L[T])   # [2, 4]
(1)
I don't think you can use a tuple as an index into a Python list. You can use a list of lists (e.g. a[i][j]), but it seems that's not your point. You can use a dictionary whose keys are tuples:
d = { (1,1):1, (2,1):2 ... }
(2)
If you don't mind too much about performance, you can rebuild the dict without the column:
d = { k: v for k, v in d.items() if k[1] != col_number }
(3)
You can also filter the items and sum over them to do that:
sum(v >= 0 for (row, col), v in d.items() if row == row_num)
No, a list can't be indexed by anything but an integer (or a slice). A dictionary, however, is another case. A dictionary is a hash table consisting of key-value pairs. Keys must be unique and immutable. Values can be objects of any type, including integers, tuples, lists, or other dictionaries. For your example, tuples can serve as keys, since they are immutable. Lists, on the other hand, aren't, and thus can't be dictionary keys.
Some of the capabilities you've asked about could be implemented as a combination of a dictionary and list comprehensions. Others would require subclassing the dictionary and adding methods to implement your desired functionality.
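As a rough sketch of that combination (all names here are made up for illustration), a tuple-keyed dict plus a couple of helpers can cover questions (2) and (3):
# Sketch only: a tuple-keyed dict standing in for a 2x2 matrix
data = {(r, c): v for r, row in enumerate([[1, -2], [3, 4]])
                  for c, v in enumerate(row)}

def remove_column(d, col):
    # rebuild the dict without the given column, shifting later columns left
    return {(r, c if c < col else c - 1): v
            for (r, c), v in d.items() if c != col}

def nonneg_per_row(d, num_rows):
    return [sum(v >= 0 for (r, c), v in d.items() if r == row)
            for row in range(num_rows)]

print(nonneg_per_row(data, 2))   # [1, 2]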
Using native Python, you could use:
my_list = [0, 1, 2, 3]
index_tuple = (1,2)
x = [item for idx, item in enumerate(my_list) if idx in index_tuple]
Related
I want to find the index of the maximum element in each row of a nested list.
I got this error:
maxe = array[i].index(max(e_object.x for e_object in array[i]))
ValueError: 5 is not in list
class e_object():
    def __init__(self, x):
        self.x = x

array = []
array.append([])
array[0].append(e_object(0))
array[0].append(e_object(2))
array[0].append(e_object(-3))
array[0].append(e_object(5))
array.append([])
array[1].append(e_object(0))
array[1].append(e_object(2))
array[1].append(e_object(8))
array[1].append(e_object(5))

max_array = []
for i in range(len(array)):
    maxe = array[i].index(max(e_object.x for e_object in array[i]))
    max_array.append(maxe)
print(max_array)
How can I get this result?
[3,2]
Use a list comprehension to convert the nested list into one of x values, and then use np.argmax:
import numpy as np
np.argmax([[element.x for element in row] for row in array], axis=1)
Output:
array([3, 2], dtype=int64)
The problem is line
maxe = array[i].index(max(e_object.x for e_object in array[i]))
You are asking for the index of object.x, but the list actually contains the objects themselves. Change the max call so it compares by the x attribute but returns the object:
maxe = array[i].index(max(array[i], key=lambda o: o.x))
The error happens because your max effectively operates on a list of integers [0, 2, -3, 5], while index searches in a list of e_objects. There are any number of ways of fixing this issue.
The simplest is probably to just have max return the index:
max_array = [max(range(len(a)), key=lambda x: a[x].x) for a in array]
This is very similar to using numpy's argmax, but without the heavy import and intermediate memory allocations. Notice that this version does not require two passes over each list since you don't call index on the result of max.
A more long term solution would be to add the appropriate comparison methods, like __eq__ and __gt__/__lt__ to the e_object class.
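A sketch of what that longer-term fix might look like; with __eq__ and __lt__ defined, max compares the objects directly and index can locate the result:
import functools

@functools.total_ordering
class e_object():
    def __init__(self, x):
        self.x = x
    def __eq__(self, other):
        return self.x == other.x
    def __lt__(self, other):
        return self.x < other.x

row = [e_object(0), e_object(2), e_object(-3), e_object(5)]
print(row.index(max(row)))   # 3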
I have a very long array (over 2 million values) with repeating value. It looks something like this:
array = [1,1,1,1,......,2,2,2.....3,3,3.....]
With a bunch of different values. I want to create an individual array for each group of points, i.e. an array for the ones, an array for the twos, and so forth. So something that would look like:
array1 = [1,1,1,1...]
array2 = [2,2,2,2.....]
array3 = [3,3,3,3....]
...
None of the values occurs an equal number of times, however, and I don't know in advance how many times each value occurs. Any advice?
Assuming that repeated values are grouped together (otherwise you simply need to sort the list), you can create a nested list (rather than a new list for every different value) using itertools.groupby:
from itertools import groupby
array = [1,1,1,1,2,2,2,3,3]
[list(v) for k,v in groupby(array)]
[[1, 1, 1, 1], [2, 2, 2], [3, 3]]
Note that this is more convenient than dynamically creating n new lists (as shown, for instance, in this post), since you have no idea how many lists will be created, and you would have to refer to each list by its name rather than simply indexing a nested list.
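If the input isn't grouped, or you'd rather look groups up by value than by position, a small variation (sort first, keep the group keys) gives a dict instead; a sketch:
from itertools import groupby

array = [3, 1, 1, 2, 1, 3, 2]
groups = {k: list(v) for k, v in groupby(sorted(array))}
# {1: [1, 1, 1], 2: [2, 2], 3: [3, 3]}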
You can use bisect.bisect_left to find the indices of the first occurrence of each element. This works only if the list is sorted:
from bisect import bisect_left
def count_values(l, values=None):
    if values is None:
        values = range(1, l[-1]+1)  # Default: assume list is [1..n]
    counts = {}
    consumed = 0
    val_iter = iter(values)
    curr_value = next(val_iter)
    next_value = next(val_iter)
    while True:
        ind = bisect_left(l, next_value, consumed)
        counts[curr_value] = ind - consumed
        consumed = ind
        try:
            curr_value, next_value = next_value, next(val_iter)
        except StopIteration:
            break
    counts[next_value] = len(l) - consumed
    return counts
l = [1,1,1,1,1,1,1,1,1,2,2,2,2,2,2,2,2,3,3,3,3,3,3,3]
print(count_values(l))
# {1: 9, 2: 8, 3: 7}
This avoids scanning the entire list, trading that for a binary search for each value. Expect this to be more performant where there are very many of each element, and less performant where there are few of each element.
Well, it seems to be wasteful and redundant to create all those arrays, each of which just stores repeating values.
You might want to just create a dictionary of unique values and their respective counts.
From this dictionary, you can always selectively create any of the individual arrays easily, whenever you want, and whichever particular one you want.
To create such a dictionary, you can use:
from collections import Counter
my_counts_dict = Counter(my_array)
Once you have this dict, you can get the number of 23's, for example, with my_counts_dict[23].
And if this returns 200, you can create your list of 200 23's with:
my_list23 = [23]*200
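And if you really do want all of the individual lists at once, they fall straight out of the same counts dict:
all_lists = [[value] * count for value, count in sorted(my_counts_dict.items())]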
Use this code (PHP):
<?php
$arrayName = array(2,2,5,1,1,1,2,3,3,3,4,5,4,5,4,6,6,6,7,8,9,7,8,9,7,8,9);
$arr = array();
foreach ($arrayName as $value) {
    $arr[$value][] = $value;   // group each value under its own key
}
sort($arr);
print_r($arr);
?>
Solution with no helper functions:
array = [1,1,2,2,2,3,4]
result = [[array[0]]]
for i in array[1:]:
    if i == result[-1][-1]:
        result[-1].append(i)
    else:
        result.append([i])
print(result)
# [[1, 1], [2, 2, 2], [3], [4]]
I have a list say l = [10,10,20,15,10,20]. I want to assign each unique value a certain "index" to get [1,1,2,3,1,2].
This is my code:
a = list(set(l))
res = [a.index(x) for x in l]
Which turns out to be very slow.
l has 1M elements, and 100K unique elements. I have also tried map with lambda and sorting, which did not help. What is the ideal way to do this?
You can do this in O(N) time using a defaultdict and a list comprehension:
>>> from itertools import count
>>> from collections import defaultdict
>>> lst = [10, 10, 20, 15, 10, 20]
>>> d = defaultdict(count(1).next)
>>> [d[k] for k in lst]
[1, 1, 2, 3, 1, 2]
In Python 3 use __next__ instead of next.
If you're wondering how it works:
The default_factory (i.e. count(1).next in this case) passed to defaultdict is called only when Python encounters a missing key. So for the first 10 the value is 1; the next 10 is no longer a missing key, so the previously computed 1 is used; 20 is again a missing key, so Python calls the default_factory again to get its value; and so on.
d at the end will look like this:
>>> d
defaultdict(<method-wrapper 'next' of itertools.count object at 0x1057c83b0>,
{10: 1, 20: 2, 15: 3})
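For reference, here is the same snippet written for Python 3 (only the factory attribute name changes):
from itertools import count
from collections import defaultdict

lst = [10, 10, 20, 15, 10, 20]
d = defaultdict(count(1).__next__)   # __next__ rather than .next
print([d[k] for k in lst])           # [1, 1, 2, 3, 1, 2]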
The slowness of your code arises because a.index(x) performs a linear search and you perform that linear search for each of the elements in l. So for each of the 1M items you perform (up to) 100K comparisons.
The fastest way to transform one value to another is looking it up in a map. You'll need to create the map and fill in the relationship between the original values and the values you want. Then retrieve the value from the map when you encounter another of the same value in your list.
Here is an example that makes a single pass through l. There may be room for further optimization to eliminate the need to repeatedly reallocate res when appending to it.
res = []
conversion = {}
i = 0
for x in l:
    if x not in conversion:
        value = conversion[x] = i
        i += 1
    else:
        value = conversion[x]
    res.append(value)
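One compact variation on the same idea, a sketch using dict.setdefault so the membership test and the insert happen in a single dict operation (numbered from 1 here to match the question's expected output):
conversion = {}
res = [conversion.setdefault(x, len(conversion) + 1) for x in l]
# res == [1, 1, 2, 3, 1, 2]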
Well, I guess it depends on whether you want the indexes returned in that specific order or not. If you want the example to return:
[1,1,2,3,1,2]
then you can look at the other answers submitted. However, if you only care about getting a unique index for each unique number, then I have a fast solution for you:
import numpy as np
l = [10,10,20,15,10,20]
a = np.array(l)
x,y = np.unique(a,return_inverse = True)
and for this example the output of y is:
y = [0,0,2,1,0,2]
I tested this for 1,000,000 entries and it was done essentially immediately.
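If you do want the numbering to follow first appearance (matching the [1,1,2,3,1,2] in the question), np.unique's return_index output can be used to re-rank the unique values; a sketch:
x, first, inv = np.unique(a, return_index=True, return_inverse=True)
order = first.argsort().argsort()   # rank each unique value by first appearance
y = order[inv] + 1                  # array([1, 1, 2, 3, 1, 2])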
Your solution is slow because its complexity is O(nm) with m being the number of unique elements in l: a.index() is O(m) and you call it for every element in l.
To make it O(n), get rid of index() and store indexes in a dictionary:
>>> idx, indexes = 1, {}
>>> for x in l:
... if x not in indexes:
... indexes[x] = idx
... idx += 1
...
>>> [indexes[x] for x in l]
[1, 1, 2, 3, 1, 2]
If l contains only integers in a known range, you could also store indexes in a list instead of a dictionary for faster lookups.
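A sketch of that list-based variant, assuming the values are non-negative ints with a known upper bound (MAX_VALUE here is hypothetical):
MAX_VALUE = 20                   # hypothetical known upper bound on values in l
table = [0] * (MAX_VALUE + 1)    # 0 marks "no index assigned yet"
idx = 1
res = []
for x in l:
    if table[x] == 0:
        table[x] = idx
        idx += 1
    res.append(table[x])
# res == [1, 1, 2, 3, 1, 2]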
You can use collections.OrderedDict() to preserve the unique items in order of first appearance, enumerate them to build a dict mapping each item to its index, and then apply operator.itemgetter(*lst) to that dictionary to get the corresponding index for each item:
>>> from collections import OrderedDict
>>> from operator import itemgetter
>>> lst = [10, 10, 20, 15, 10, 20]
>>> itemgetter(*lst)({j:i for i,j in enumerate(OrderedDict.fromkeys(lst),1)})
(1, 1, 2, 3, 1, 2)
For completeness, you can also do it eagerly:
from itertools import count
wordid = dict(zip(set(list_), count(1)))
This uses a set to obtain the unique words in list_, pairs each of those unique words with the next value from count() (which counts upwards), and constructs a dictionary from the results. Note that a set has no defined order, so the numbering won't necessarily follow first appearance in list_.
Original answer, written by nneonneo.
I want to remove the list items found in list B from the list A. This is the function I wrote:
def remove(A, B):
    to_remove = []
    for i in range(len(A)):
        for j in range(len(B)):
            if B[j] == A[i]:
                to_remove.append(i)
    for j in range(len(to_remove)):
        A.pop(to_remove[j])
Is this the normal way to do it? Although this seems to work fine (barring typos), I think there might be a more pythonic way to do it. Please suggest.
Convert B to a set first and then create a new array from A using a list comprehension:
s = set(B)
A = [item for item in A if item not in s]
Item lookup in a set is an O(1) operation.
If you don't want to change the id() of A, then:
A[:] = [item for item in A if item not in s]
First, note that your function doesn't work right. Try this:
A = [1, 2, 3]
B = [1, 2, 3]
remove(A, B)
You'll get an IndexError, because the correct indices to delete change each time you do a .pop().
You'll doubtless get answers recommending using sets, and that's indeed much better if the array elements are hashable and comparable, but in general you may need something like this:
def remove(A, B):
    A[:] = [avalue for avalue in A if avalue not in B]
That works for any kinds of array elements (provided only they can be compared for equality), and preserves the original ordering. But it takes worst-case time proportional to len(A) * len(B).
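If you want a single function that takes the set speedup whenever it can, here is a sketch combining both approaches:
def remove(A, B):
    try:
        lookup = set(B)        # O(1) membership tests when elements are hashable
    except TypeError:          # unhashable elements, e.g. lists
        lookup = B             # fall back to O(len(B)) membership tests
    A[:] = [avalue for avalue in A if avalue not in lookup]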
List comprehension to the rescue:
[item for item in A if item not in B]
This however creates a new list. You can return the list from the function.
Or, if you are OK with losing any duplicates in list A, or there are no duplicates, you can use set difference:
return list(set(A) - set(B))
One caveat is, this won't preserve the order of elements in A. So, if you want elements in order, this is not what you want. Use the 1st approach instead.
What about list comprehension?
def remove(removeList, fromList):
    return [x for x in fromList if x not in removeList]
Also, to make life easier and the removal faster, you can make a set from removeList, leaving only unique elements:
def remove(removeList, fromList):
    removeSet = set(removeList)
    return [x for x in fromList if x not in removeSet]

>>> print(remove([1,2,3], [1,2,3,4,5,6,7]))
[4, 5, 6, 7]
And, of course, you can use the built-in filter function, though someone will say that it's non-pythonic and that you should use a list comprehension instead. Either way, here is an example (wrapped in list() so it also works on Python 3, where filter returns an iterator):
def remove(removeList, fromList):
    removeSet = set(removeList)
    return list(filter(lambda x: x not in removeSet, fromList))
I'm trying to write a piece of code that can automatically factor an expression. For example,
if I have two lists [1,2,3,4] and [2,3,5], the code should be able to find the common elements in the two lists, [2,3], and combine the rest of the elements together in a new list, being [1,4,5].
From this post: How to find list intersection?
I see that the common elements can be found by
set([1,2,3,4]) & set([2,3,5]).
Is there an easy way to retrieve non-common elements from each list, in my example being [1,4] and [5]?
I can go ahead and do a for loop:
lists = [[1,2,3,4],[2,3,5]]
common = [2,3]
nonCommon = []
for eachList in lists:
    for elem in eachList:
        if elem not in common:
            nonCommon.append(elem)
But this seems redundant and inefficient. Does Python provide any handy function that can do that? Thanks in advance!!
Use the symmetric difference operator for sets (aka the XOR operator):
>>> set([1,2,3]) ^ set([3,4,5])
set([1, 2, 4, 5])
Old question, but it looks like Python has a built-in set method that provides exactly what you're looking for: .difference().
EXAMPLE
list_one = [1,2,3,4]
list_two = [2,3,5]
one_not_two = set(list_one).difference(list_two)
# set([1, 4])
two_not_one = set(list_two).difference(list_one)
# set([5])
This could also be written as:
one_not_two = set(list_one) - set(list_two)
Timing
I ran some timing tests on both and it appears that .difference() has a slight edge, to the tune of 10 - 15% but each method took about an eighth of a second to filter 1M items (random integers between 500 and 100,000), so unless you're very time sensitive, it's probably immaterial.
Other Notes
It appears the OP is looking for a solution that provides two separate lists (or sets): one containing the items of the first that are not in the second, and vice versa. Most of the previous answers return a single list or set that includes all of those items.
There is also the question as to whether items that may be duplicated in the first list should be counted multiple times, or just once.
If the OP wants to maintain duplicates, a list comprehension could be used, for example:
one_not_two = [ x for x in list_one if x not in list_two ]
two_not_one = [ x for x in list_two if x not in list_one ]
...which is roughly the same solution as posed in the original question, only a little cleaner. This method would maintain duplicates from the original list but is considerably (like multiple orders of magnitude) slower for larger data sets.
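If you need to keep duplicates and still handle large inputs, a small tweak recovers most of the speed by testing membership against sets rather than lists; a sketch:
set_one, set_two = set(list_one), set(list_two)
one_not_two = [x for x in list_one if x not in set_two]
two_not_one = [x for x in list_two if x not in set_one]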
You can use the intersection concept to deal with this kind of problem:
b1 = [1,2,3,4,5,9,11,15]
b2 = [4,5,6,7,8]
set(b1).intersection(b2)
Out[22]: {4, 5}
The best thing about this approach is that it is fast even for large data: with b1 containing 607,139 elements and b2 containing 296,029, this logic gets my results in 2.9 seconds.
You can use the set.__xor__ method directly:
set([1,2,3,4]).__xor__(set([2,3,5]))
or
a = set([1,2,3,4])
b = set([2,3,5])
a.__xor__(b)
You can use the symmetric_difference method:
x = {1,2,3}
y = {2,3,4}
z = x.symmetric_difference(y)
Output will be: z = {1, 4}
This should get the common and remaining elements
lis1=[1,2,3,4,5,6,2,3,1]
lis2=[4,5,8,7,10,6,9,8]
common = list(dict.fromkeys([l1 for l1 in lis1 if l1 in lis2]))
remaining = list(filter(lambda i: i not in common, lis1+lis2))
common = [4, 5, 6]
remaining = [1, 2, 3, 2, 3, 1, 8, 7, 10, 9, 8]
All the good solutions, from basic DSA style to built-in functions:
# Time: O(n)
def solution1(arr1, arr2):
    seen = {}   # value -> [seen in arr1, seen in arr2]
    maxLength = max(len(arr1), len(arr2))
    for i in range(maxLength):
        if i < len(arr1):
            if not seen.get(arr1[i]):
                seen[arr1[i]] = [True, False]
            else:
                seen[arr1[i]][0] = True
        if i < len(arr2):
            if not seen.get(arr2[i]):
                seen[arr2[i]] = [False, True]
            else:
                seen[arr2[i]][1] = True
    res = []
    for key, value in seen.items():
        if value[0] == False or value[1] == False:
            res.append(key)
    return res

def solution2(arr1, arr2):
    return set(arr1) ^ set(arr2)

def solution3(arr1, arr2):
    return (set(arr1).difference(arr2), set(arr2).difference(arr1))

def solution4(arr1, arr2):
    return set(arr1).__xor__(set(arr2))
print(solution1([1,2,3], [2,4,6]))
print(solution2([1,2,3], [2,4,6]))
print(solution3([1,2,3], [2,4,6]))
print(solution4([1,2,3], [2,4,6]))