Python 2 dimensional table and distinct entries - python

I'm making a function called different(). It needs to take a two-dimensional table as input and return the number of distinct entries in the table. I'm not sure how to start it, and I would really appreciate some suggestions. When used, it should look like this in the shell:
t = [[1,0,1], [0,1,0]]
different(t)
>>2
This is what I have so far:
def different()-> int
''' takes a two-dimensional table and returns number of distinct entries'''
t = []
while
#use set method?

I think there are two possible interpretations. Either you count the distinct individual values:
>>> set(j for i in t for j in i)
set([0, 1])
or you count the distinct rows:
>>> set(tuple(i) for i in t) # equivalent to set(map(tuple, t))
set([(0, 1, 0), (1, 0, 1)])
Either way, different should return the len of the set:
def different(t):
    return len(set(...))
If you like itertools, you can do the following
from itertools import chain
def different(t):
    return len(set(chain.from_iterable(t)))

def different(t):
    return len(set(tuple(item) for item in t))
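The two interpretations discussed in this thread can be compared side by side. This is only a sketch; the function names count_distinct_entries and count_distinct_rows are made up for illustration and are not from the original posts.

```python
from itertools import chain

def count_distinct_entries(table):
    """Count distinct individual values across all rows."""
    return len(set(chain.from_iterable(table)))

def count_distinct_rows(table):
    """Count distinct rows; each row becomes a tuple so it is hashable."""
    return len(set(map(tuple, table)))

t = [[1, 0, 1], [0, 1, 0]]
print(count_distinct_entries(t))   # 2
print(count_distinct_rows(t))      # 2

t2 = [[1, 0, 1], [0, 1, 0], [1, 2, 3], [1, 0, 1]]
print(count_distinct_entries(t2))  # 4
print(count_distinct_rows(t2))     # 3
```

The example from the question happens to give 2 under both readings, so a second table is needed to tell them apart.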

You can solve this with basic Python:
t = [[1,0,1], [0,1,0], [1,2,3], [1,0,1]]
a = []
for i in t:
    if i not in a:
        a.append(i)
print(len(a))
This creates a new list named a and appends each row of t that is not already in it, so a ends up holding only the distinct rows. Afterwards, the length of a is the number of distinct rows.

Related

Python list comprehension - generating multiple data

import random
M = 4
N = 3
def generisanjevol1(nekalista, m):
    return random.choices(nekalista, k=m)
def generisanjevol2(nekalista, m,n):
    #obj = [[random.choice(nekalista,k=m)] for i in range(N)]]
    obj = [[random.choice(nekalista)] for i in range(n)]
    return obj
#def poredjenje()
listaslova = ['A', 'B', 'C', 'D', 'E']
lista = generisanjevol1(listaslova, M)
lista2 = generisanjevol2(listaslova, M, N)
print(lista)
print(lista2)
So above is my attempt (generisanjevol2(nekalista, m,n)).
What I am trying to do is this:
I want to generate N arrays and fill them with strings picked by the random.choice function, so they must still be strings from listaslova.
Say N=3 (N is the number of arrays) and M=4 (M is the length of each array). I should get something like this (the data doesn't have to be the same, because of course it is randomly generated):
[A,C,D,E]
([A,C,E,D] [E,C,B,A] [E,D,D,A])
But the results I get are the following:
[A,D,E,C]
[[B],[D],[E]]
P.S. If I try the commented-out line, I get an error.
The error in your commented line is because you have an extra ]. And random.choice should be random.choices, since random.choice does not accept a k argument.
But you also shouldn't put another list around the call to random.choices(). It already returns a list.
def generisanjevol2(nekalista, m,n):
    obj = [random.choices(nekalista,k=m) for i in range(n)]
    return obj
Like @Barmar said, you indeed have an extra pair of [].
Your function should look like this:
def generisanjevol2(nekalista, m,n):
    obj = [random.choices(nekalista, k=m) for i in range(n)]
    return obj
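A quick way to check that the corrected comprehension produces the requested shape is to seed a private generator and inspect the result. This is a sketch only: the name generate_lists and the seed parameter are assumptions added for illustration, and random.choices requires Python 3.6 or later.

```python
import random

def generate_lists(source, m, n, seed=None):
    """Return n lists, each holding m items drawn from source with replacement."""
    rng = random.Random(seed)  # private generator, so a seed makes runs repeatable
    return [rng.choices(source, k=m) for _ in range(n)]

letters = ['A', 'B', 'C', 'D', 'E']
result = generate_lists(letters, m=4, n=3, seed=0)
print(result)
print(len(result), [len(row) for row in result])
```

Because random.choices samples with replacement, a letter may appear more than once within a list, as in the [E,D,D,A] example from the question.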

Optimize search to find next matching value in a list

I have a program that goes through a list and, for each object, finds the next instance that has a matching value. When it does, it prints out the location of each object. The program runs perfectly fine, but the trouble I am running into is that when I run it with a large volume of data (~6,000,000 objects in the list) it takes much too long. If anyone could provide insight into how I can make the process more efficient, I would greatly appreciate it.
def search(list):
    original = list
    matchedvalues = []
    count = 0
    for x in original:
        targetValue = x.getValue()
        count = count + 1
        copy = original[count:]
        for y in copy:
            if (targetValue == y.getValue()):
                print(str(x.getLocation()) + ',' + str(y.getLocation()))
                break
Perhaps you can make a dictionary that contains a list of indexes that correspond to each item, something like this:
values = [1,2,3,1,2,3,4]
from collections import defaultdict
def get_matches(x):
    my_dict = defaultdict(list)
    for ind, ele in enumerate(x):
        my_dict[ele].append(ind)
    return my_dict
Result:
>>> get_matches(values)
defaultdict(<type 'list'>, {1: [0, 3], 2: [1, 4], 3: [2, 5], 4: [6]})
Edit:
I added this part, in case it helps:
values = [1,1,1,1,2,2,3,4,5,3]
def get_next_item_ind(x, ind):
    my_dict = get_matches(x)
    indexes = my_dict[x[ind]]
    temp_ind = indexes.index(ind)
    if len(indexes) > temp_ind + 1:
        return indexes[temp_ind + 1]
    return None
Result:
>>> get_next_item_ind(values, 0)
1
>>> get_next_item_ind(values, 1)
2
>>> get_next_item_ind(values, 2)
3
>>> get_next_item_ind(values, 3)
>>> get_next_item_ind(values, 4)
5
>>> get_next_item_ind(values, 5)
>>> get_next_item_ind(values, 6)
9
>>> get_next_item_ind(values, 7)
>>> get_next_item_ind(values, 8)
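Note that get_next_item_ind rebuilds the whole dictionary on every call, so calling it once per element of a very large list would be slow. One possible alternative, sketched here under the assumption that the values are hashable, is to walk the list once from the end and remember the nearest index seen so far for each value. The name next_match_indices is hypothetical; for the objects in the question, the dictionary key would presumably be x.getValue().

```python
def next_match_indices(values):
    """For each position, return the index of the next equal value, or None."""
    nearest = {}                       # value -> closest index to the right
    result = [None] * len(values)
    for i in range(len(values) - 1, -1, -1):
        v = values[i]
        result[i] = nearest.get(v)     # None if v does not occur again
        nearest[v] = i
    return result

values = [1, 1, 1, 1, 2, 2, 3, 4, 5, 3]
print(next_match_indices(values))
# [1, 2, 3, None, 5, None, 9, None, None, None]
```

This does a single pass with one dictionary lookup per element, and for the sample list it agrees with the results shown above.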
There are a few ways you could make this search more efficient by minimising additional memory use (particularly when your data is BIG):
You can operate directly on the list you are passing in and don't need to make copies of it. This way you won't need original = list or copy = original[count:].
You can use slices of the original list to test against, and enumerate(p) to iterate through the list. You won't need the extra variable count, and enumerate(p) is efficient in Python.
Re-implemented, this would become:
def search(p):
    # iterate over p
    for i, value in enumerate(p):
        # if value occurs more than once, print locations
        # do not re-test values that have already been tested (if value not in p[:i])
        if value not in p[:i] and value in p[(i + 1):]:
            # the last number is the offset of the next match within p[(i + 1):]
            print(value, ':', i, p[(i + 1):].index(value))
v = [1,2,3,1,2,3,4]
search(v)
1 : 0 2
2 : 1 2
3 : 2 2
Implementing it this way will only print out the values / locations where a value is repeated (which I think is what you intended in your original implementation).
Other considerations:
More than 2 occurrences of value: If the value repeats many times in the list, then you might want to implement a function to walk recursively through the list. As it is, the question doesn't address this - and it may be that it doesn't need to in your situation.
Using a dictionary: I completely agree with Akavall above, dictionaries are a great way of looking up values in Python, especially if you need to look up values again later in the program. This will work best if you construct a dictionary instead of a list when you originally create the list. But if you are only doing this once, it is going to cost you more time to construct the dictionary and query over it than simply iterating over the list as described above.
Hope this helps!

Python:reduce list but keep details

Say I have a list of items, some of which are similar up to a point
but then differ by a number after a dot:
['abc.1',
'abc.2',
'abc.3',
'abc.7',
'xyz.1',
'xyz.3',
'xyz.11',
'ghj.1',
'thj.1']
I want to produce from this list a new list which collapses multiples but preserves some of their data, namely the number suffixes,
so the above list should produce a new list:
[('abc',('1','2','3','7'))
('xyz',('1','3','11'))
('ghj',('1'))
('thj',('1'))]
What I have thought is that the first list can be split by the dot into pairs,
but then how do I group the pairs by the first part without losing the second?
I'm sorry if this question is noobish, and thanks in advance.
...
Wow, I didn't expect so many great answers so fast, thanks.
from collections import defaultdict
d = defaultdict(list)
for el in elements:
    key, nr = el.split(".")
    d[key].append(nr)
# convert the dict to a list of (key, numbers) pairs
newlist = list(d.items())
Map the list with a separator function, use itertools.groupby with a key that takes the first element, and collect the second element into the result.
from itertools import groupby
list1 = ["abc.1", "abc.2", "abc.3", "abc.7", "xyz.1", "xyz.3", "xyz.11", "ghj.1", "thj.1"]
def break_up(s):
    a, b = s.split(".")
    return a, int(b)
def prefix(broken_up): return broken_up[0]
def suffix(broken_up): return broken_up[1]
result = []
# map works here in both Python 2 and 3 (itertools.imap exists only in Python 2)
for key, sub in groupby(map(break_up, list1), prefix):
    result.append((key, tuple(map(suffix, sub))))
print(result)
Output:
[('abc', (1, 2, 3, 7)), ('xyz', (1, 3, 11)), ('ghj', (1,)), ('thj', (1,))]
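One caveat with the groupby approach: itertools.groupby only merges adjacent items with the same key, so it relies on the input already being grouped by prefix, as it is in the question. If that cannot be guaranteed, a dictionary-based grouping along the lines of the defaultdict answer could be used. The sketch below is an illustration, not code from the thread: the name collapse is hypothetical, and it assumes Python 3.7+ so that the dictionary keeps prefixes in first-seen order.

```python
from collections import defaultdict

def collapse(items):
    """Group 'prefix.number' strings by prefix, keeping suffixes as strings."""
    groups = defaultdict(list)
    for item in items:
        prefix, suffix = item.rsplit(".", 1)   # split on the last dot only
        groups[prefix].append(suffix)
    return [(prefix, tuple(suffixes)) for prefix, suffixes in groups.items()]

items = ["abc.1", "xyz.1", "abc.2", "ghj.1", "xyz.11"]
print(collapse(items))
# [('abc', ('1', '2')), ('xyz', ('1', '11')), ('ghj', ('1',))]
```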

loop through list of dictionaries

I have a list of dictionaries. There are several points inside the list, and some occur multiple times. When there is a multiple entry, I want to calculate the average of the x and the y of this point. My problem is that I don't know how to loop through the list of dictionaries to compare the ids of the points!
When I use something like this:
for i in list:
    for j in list:
        if i['id'] == j['id']:
            point = getPoint(i['geom'])
            ....
Sorry, the formatting is a little bit tricky... the second loop is inside the first one...
I think it compares the first entry of the list with itself, so it's the same... So I have to start the second loop at the second entry, but I can't do that with i-1 because i is the whole dictionary...
Does anyone have an idea?
Thanks in advance!
for j in range(1, len(NEWPoint)):
if i['gid']==j['gid']:
allsamePoints.append(j)
for k in allsamePoints:
for l in range(1, len(allsamePoints)):
if k['gid']==l['gid']:
Point1 = k['geom']
Point2=l['geom']
X=(Point1.x()+Point2.x())/2
Y=(Point1.y()+Point2.y())/2
AVPoint = QgsPoint(X, Y)
NEWReturnList.append({'gid': j['gid'], 'geom': AVPoint})
del l
for m in NEWReturnList:
for n in range(1, len(NEWReturnList)):
if m['gid']==n['gid']:
Point1 = m['geom']
Point2=n['geom']
X=(Point1.x()+Point2.x())/2
Y=(Point1.y()+Point2.y())/2
AVPoint = QgsPoint(X, Y)
NEWReturnList.append({'gid': j['gid'], 'geom': AVPoint})
del n
else:
pass
OK, I think... at the moment that's more confusing :)...
One way would be changing the way you store your points, because as you already noticed, it's hard to get what you want out of it.
A much more useful structure would be a dict where the id maps to a list of points:
from collections import defaultdict
points_dict = defaultdict(list)
# make the new dict
for point in point_list:
    id = point["id"]
    points_dict[id].append(point['geom'])
def avg( lst ):
    """ average of a `lst` of numbers """
    return 1.0 * sum(lst)/len(lst)
# now it's simple to get the average
# (avg assumes numeric entries; for point objects, average x and y separately)
for id in points_dict:
    print(id, avg( points_dict[id] ))
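Since the question is about averaging x and y separately, the grouping idea can be sketched as follows. This is an illustration under stated assumptions: each 'geom' is taken to be a plain (x, y) tuple and the function name average_points is made up. With QGIS point objects like those in the question, the coordinates would presumably be read with .x() and .y() instead.

```python
from collections import defaultdict

def average_points(point_list):
    """Return one dict per id, with 'geom' set to the mean (x, y) of that id."""
    grouped = defaultdict(list)
    for point in point_list:
        grouped[point["id"]].append(point["geom"])
    averaged = []
    for point_id, geoms in grouped.items():
        mean_x = sum(x for x, _ in geoms) / len(geoms)
        mean_y = sum(y for _, y in geoms) / len(geoms)
        averaged.append({"id": point_id, "geom": (mean_x, mean_y)})
    return averaged

points = [
    {"id": 1, "geom": (0.0, 0.0)},
    {"id": 2, "geom": (5.0, 5.0)},
    {"id": 1, "geom": (2.0, 4.0)},
]
print(average_points(points))
# [{'id': 1, 'geom': (1.0, 2.0)}, {'id': 2, 'geom': (5.0, 5.0)}]
```

Grouping first means each point is visited once, and an id that occurs three or more times is averaged over all its occurrences rather than pairwise.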
I'm not totally sure what you want to do, but I think list filtering would help you. There's a built-in function, filter, which iterates over a sequence and, for each item, calls a user-defined function to determine whether to include that item in the result.
For instance:
def is4(number):
    return number == 4
l = [1, 2, 3, 4, 5, 6, 4, 7, 8, 4, 4]
list(filter(is4, l)) # returns [4, 4, 4, 4]
So, having a list of dictionaries, to filter out all dictionaries with certain entry equal to a given value, you could do something like this:
def filter_dicts(dicts, entry, value):
    def filter_function(d):
        if entry not in d:
            return False
        return d[entry] == value
    # list() so the result is a list in Python 3 as well
    return list(filter(filter_function, dicts))
With this function, to get all dictionaries with the "id" entry equal to 2, you can do:
result = filter_dicts(your_list, "id", 2)
With this, your main loop could look something like this:
processed_ids = set()
for item in list:
    id = item['id']
    if id in processed_ids:
        continue
    processed_ids.add(id)
    same_ids = filter_dicts(list, "id", id)
    # now do something with same_ids
I hope I understood you correctly and that this is helpful to you.

Python List indexed by tuples

I'm a Matlab user needing to use Python for some things. I would really appreciate it if someone could help me out with Python syntax:
(1) Is it true that lists can be indexed by tuples in Python? If so, how do I do this? For example, I would like to use that to represent a matrix of data.
(2) Assuming I can use a list indexed by tuples, say, data[(row,col)], how do I remove an entire column? I know in Matlab, I can do something like
new_data = [data(:,1:x-1) data(:,x+1:end)];
if I wanted to remove column x from data.
(3) How can I easily count the number of non-negative elements in each row. For example, in Matlab, I can do something like this:
sum(data>=0,1)
this would give me a column vector that represents the number of non-negative entries in each row.
Thanks a lot!
You should look into numpy, it's made for just this sort of thing.
(1) No, but dicts can.
(2) Sounds like you want a "2d array", matrix type, or something else. Have you looked at numpy yet?
(3) Depends on what you choose from #2, but Python does have sum and other functions that work directly on iterables. Look at gen-exprs (generator expressions) and list comprehensions. For example:
row_count_of_non_neg = sum(1 for n in row if n >= 0)
# or:
row_count_of_non_neg = sum(n >= 0 for n in row)
# "abusing" True == 1 and False == 0
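If the matrix is kept as a plain list of rows rather than a numpy array, both operations from the question can be written with comprehensions. This is a sketch under that assumption; the names remove_column and count_non_negative are hypothetical, and x is a zero-based column index, unlike Matlab's one-based indexing.

```python
def remove_column(data, x):
    """Return a copy of data (a list of rows) without column x."""
    return [row[:x] + row[x + 1:] for row in data]

def count_non_negative(data):
    """Return the number of non-negative entries in each row."""
    return [sum(1 for n in row if n >= 0) for row in data]

data = [[1, -2, 3], [-4, -5, 6], [7, 8, 9]]
print(remove_column(data, 1))     # [[1, 3], [-4, 6], [7, 9]]
print(count_non_negative(data))   # [2, 1, 3]
```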
I agree with everyone. Use Numpy/Scipy. But here are specific answers to your questions.
(1) Yes, for Numpy arrays. The index can be either a built-in list or a Numpy array. Suppose x = numpy.array([10, 11, 12, 13]) and y = numpy.array([0, 2]). Then x[[0, 2]] and x[y] both return the same thing.
(2) new_data = numpy.delete(data, x, axis=1)
(3) (data>=0).sum(axis=1)
Careful with the axis argument: axis=0 refers to the first dimension of an array (rows), axis=1 to the second (columns), and so on. In Example 2, axis=1 tells delete to remove column x (axis=0 would remove row x instead). In Example 3, axis=1 tells sum to add up along the columns within each row, so the result has one count per row.
Here's an example of how to easily create an array (matrix) in numpy:
>>> import numpy
>>> a = numpy.array([[1,2,3],[4,5,6],[7,8,9]])
here is how it is displayed
>>> a
array([[1, 2, 3],
       [4, 5, 6],
       [7, 8, 9]])
and how to get a row or column:
>>> a[0,:]
array([1, 2, 3])
>>> a[:,0]
array([1, 4, 7])
Hope the syntax is clear from the example! Numpy is rather powerful.
You can expand list functionality to allow indexing with tuples by overloading the __getitem__ and __setitem__ methods of the built-in list. Try the following code:
class my_list(list):
    def __getitem__(self, key):
        if isinstance(key, tuple) and len(key) > 0:
            temp = []
            for k in key: temp.append(list.__getitem__(self, k))
            return temp
        else:
            return list.__getitem__(self, key)
    def __setitem__(self, key, data):
        if isinstance(key, tuple) and len(key) > 0:
            for k in key: list.__setitem__(self, k, data)
        else:
            list.__setitem__(self, key, data)
if __name__ == '__main__':
    L = my_list([1, 2, 3, 4, 5])
    T = (1,3)
    print(L[T])
(1)
I don't think you can use a tuple as an index into a Python list. You could use a list of lists (e.g. a[i][j]), but it seems that's not what you're after. You can use a dictionary whose keys are (row, col) tuples.
d = { (1,1):1, (2,1):2 ... }
(2)
If you don't mind about the performance, rebuild the dictionary without the keys in that column:
d = dict((k, v) for k, v in d.items() if k[1] != col_number)
(3)
You can filter the items in the same way and count the matches:
sum(1 for k, v in d.items() if k[0] == row_num and v >= 0)
No, a list can't be indexed by anything but an integer (or a slice). A dictionary, however, is another case. A dictionary is a hash table consisting of key-value pairs. Keys must be unique and hashable, which in practice means immutable. The values can be objects of any type, including integers, tuples, lists, or other dictionaries. For your example, tuples can serve as keys, since they are immutable. Lists, on the other hand, aren't and, thus, can't be dictionary keys.
Some of the capabilities you've asked about could be implemented as a combination of a dictionary and list comprehensions. Others would require subclassing the dictionary and adding methods to implement your desired functionality.
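As a rough illustration of the dictionary-plus-comprehension idea, the sketch below stores a small matrix in a dict keyed by (row, col) tuples. The helper names remove_column and non_negative_in_row, the zero-based indices, and the choice to renumber the later columns after a removal are all assumptions made for this example.

```python
data = {(0, 0): 1, (0, 1): -2, (1, 0): -4, (1, 1): 5}

def remove_column(d, col):
    """Drop column col and shift the columns after it left by one."""
    return {
        (r, c - 1 if c > col else c): v
        for (r, c), v in d.items()
        if c != col
    }

def non_negative_in_row(d, row):
    """Count the non-negative values stored in the given row."""
    return sum(1 for (r, _), v in d.items() if r == row and v >= 0)

print(data[(0, 1)])                  # -2
print(remove_column(data, 0))        # {(0, 0): -2, (1, 0): 5}
print(non_negative_in_row(data, 0))  # 1
```

Every operation here scans all entries of the dictionary, so for large numeric data an array library such as numpy is likely the better fit, as the other answers suggest.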
Using native python you could use:
my_list = [0, 1, 2, 3]
index_tuple = (1,2)
x = [item for idx, item in enumerate(my_list) if idx in index_tuple]
