loop through list of dictionaries - python

I have a list of dictionaries. The list contains several points, and some of them appear more than once. When a point has multiple entries, I want to calculate the average of its x and y values. My problem is that I don't know how to loop through the list of dictionaries to compare the ids of the points!
When I use something like this:
for i in list:
    for j in list:
        if i['id'] == j['id']:
            point = getPoint(i['geom'])
            ....
Sorry, the formatting is a little bit tricky... the second loop is inside the first one.
I think it compares the first entry of the list with itself, so the ids are always the same. So I have to start the second loop at the second entry, but I can't do that with i-1 because i is the whole dictionary...
Does anyone have an idea?
Thanks in advance!
for j in range(1, len(NEWPoint)):
    if i['gid']==j['gid']:
        allsamePoints.append(j)
for k in allsamePoints:
    for l in range(1, len(allsamePoints)):
        if k['gid']==l['gid']:
            Point1 = k['geom']
            Point2=l['geom']
            X=(Point1.x()+Point2.x())/2
            Y=(Point1.y()+Point2.y())/2
            AVPoint = QgsPoint(X, Y)
            NEWReturnList.append({'gid': j['gid'], 'geom': AVPoint})
            del l
for m in NEWReturnList:
    for n in range(1, len(NEWReturnList)):
        if m['gid']==n['gid']:
            Point1 = m['geom']
            Point2=n['geom']
            X=(Point1.x()+Point2.x())/2
            Y=(Point1.y()+Point2.y())/2
            AVPoint = QgsPoint(X, Y)
            NEWReturnList.append({'gid': j['gid'], 'geom': AVPoint})
            del n
        else:
            pass
OK, I think... at the moment that's more confusing :)

One way would be changing the way you store your points, because as you already noticed, it's hard to get what you want out of it.
A much more useful structure would be a dict where the id maps to a list of points:
from collections import defaultdict

points_dict = defaultdict(list)

# make the new dict: each id maps to the list of its geometries
for point in point_list:
    id = point["id"]
    points_dict[id].append(point['geom'])

def avg(lst):
    """Average of the numbers in `lst`."""
    return 1.0 * sum(lst) / len(lst)

# now it's simple to get the average
# (avg() expects a list of numbers, so for point objects pass the x and y values separately)
for id in points_dict:
    print(id, avg(points_dict[id]))
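Since the values stored per id are point geometries rather than plain numbers, x and y have to be averaged separately. A minimal sketch of that step, assuming each geometry exposes x and y; the Point namedtuple and the sample data are made up for illustration (the QgsPoint from the question uses x() and y() methods instead):

```python
from collections import defaultdict, namedtuple

# Hypothetical stand-in for the real geometry type.
Point = namedtuple("Point", ["x", "y"])

# Made-up sample data in the shape described in the question.
point_list = [
    {"id": 1, "geom": Point(0.0, 0.0)},
    {"id": 1, "geom": Point(2.0, 4.0)},
    {"id": 2, "geom": Point(5.0, 5.0)},
]

# Group the geometries by id.
grouped = defaultdict(list)
for point in point_list:
    grouped[point["id"]].append(point["geom"])

# Average x and y separately for every id.
averaged = {}
for point_id, geoms in grouped.items():
    avg_x = sum(g.x for g in geoms) / len(geoms)
    avg_y = sum(g.y for g in geoms) / len(geoms)
    averaged[point_id] = Point(avg_x, avg_y)

print(averaged)
```

Points that occur only once come out unchanged, so no special case is needed for them.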

I'm not totally sure what you want to do, but I think list filtering would help you. There's a built-in function, filter, which iterates over a sequence and calls a user-defined function for each item to determine whether to include that item in the result.
For instance:
def is4(number):
    return number == 4
l = [1, 2, 3, 4, 5, 6, 4, 7, 8, 4, 4]
list(filter(is4, l)) # returns [4, 4, 4, 4]
So, having a list of dictionaries, to filter out all dictionaries with certain entry equal to a given value, you could do something like this:
def filter_dicts(dicts, entry, value):
    def filter_function(d):
        if entry not in d:
            return False
        return d[entry] == value
    # list() is needed in Python 3, where filter returns an iterator
    return list(filter(filter_function, dicts))
With this function, to get all dictionaries with the "id" entry equal to 2, you can do:
result = filter_dicts(your_list, "id", 2)
With this, your main loop could look something like this:
processed_ids = set()
for item in your_list:
    id = item['id']
    if id in processed_ids:
        continue
    processed_ids.add(id)
    same_ids = filter_dicts(your_list, "id", id)
    # now do something with same_ids
I hope I understood you correctly and that this is helpful to you.

Related

Finding first time value occurs in an array when you don't know what it is

I have a very long array (over 2 million values) with repeating values. It looks something like this:
array = [1,1,1,1,......,2,2,2.....3,3,3.....]
With a bunch of different values. I want to create individual arrays for each group of points. IE: an array for the ones, an array for the twos, and so forth. So something that would look like:
array1 = [1,1,1,1...]
array2 = [2,2,2,2.....]
array3 = [3,3,3,3....]
...
The values do not all occur the same number of times, and I don't know how many times each value occurs. Any advice?
Assuming that repeated values are grouped together (otherwise you simply need to sort the list), you can create a nested list (rather than a new list for every different value) using itertools.groupby:
from itertools import groupby
array = [1,1,1,1,2,2,2,3,3]
[list(v) for k,v in groupby(array)]
# [[1, 1, 1, 1], [2, 2, 2], [3, 3]]
Note that this is more convenient than dynamically creating n new lists (as shown, for instance, in this post): you have no idea how many lists will be created, and you would have to refer to each list by its name rather than by simply indexing a nested list.
You can use bisect.bisect_left to find the index of the first occurrence of each element. This works only if the list is sorted:
from bisect import bisect_left
def count_values(l, values=None):
    if values is None:
        values = range(1, l[-1]+1) # Default assume list is [1..n]
    counts = {}
    consumed = 0
    val_iter = iter(values)
    curr_value = next(val_iter)
    next_value = next(val_iter)
    while True:
        ind = bisect_left(l, next_value, consumed)
        counts[curr_value] = ind - consumed
        consumed = ind
        try:
            curr_value, next_value = next_value, next(val_iter)
        except StopIteration:
            break
    counts[next_value] = len(l) - consumed
    return counts

l = [1,1,1,1,1,1,1,1,1,2,2,2,2,2,2,2,2,3,3,3,3,3,3,3]
print(count_values(l))
# {1: 9, 2: 8, 3: 7}
This avoids scanning the entire list, trading that for a binary search for each value. Expect this to be more performant where there are very many of each element, and less performant where there are few of each element.
Well, it seems to be wasteful and redundant to create all those arrays, each of which just stores repeating values.
You might want to just create a dictionary of unique values and their respective counts.
From this dictionary, you can always selectively create any of the individual arrays easily, whenever you want, and whichever particular one you want.
To create such a dictionary, you can use:
from collections import Counter
my_counts_dict = Counter(my_array)
Once you have this dict, you can get the number of 23's, for example, with my_counts_dict[23].
And if this returns 200, you can create your list of 200 23's with:
my_list23 = [23]*200
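A short sketch of the idea above, rebuilding the per-value lists from the counts only when they are needed. The sample data and the `groups` name are made up for illustration:

```python
from collections import Counter

my_array = [1, 1, 1, 2, 2, 3]  # made-up sample data
my_counts_dict = Counter(my_array)

# Rebuild the individual lists on demand from the stored counts.
groups = {value: [value] * count for value, count in my_counts_dict.items()}

print(my_counts_dict[1])  # number of 1's
print(groups[2])          # the list of 2's
```

A Counter returns 0 for values that never occur, so looking up a missing value does not raise a KeyError.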
Use this code:
<?php
$arrayName = array(2,2,5,1,1,1,2,3,3,3,4,5,4,5,4,6,6,6,7,8,9,7,8,9,7,8,9);
$arr = array();
foreach ($arrayName as $value) {
    $arr[$value][] = $value;
}
sort($arr);
print_r($arr);
?>
Solution with no helper functions:
array = [1,1,2,2,2,3,4]
result = [[array[0]]]
for i in array[1:]:
    if i == result[-1][-1]:
        result[-1].append(i)
    else:
        result.append([i])
print(result)
# [[1, 1], [2, 2, 2], [3], [4]]

Python recursion list

This is the situation: A list consisting of ids - every one of those Ids can be related to any number (0 to n) other ids, which again can be related to other ids, etc. As a result I want a list of all relations, no matter the "depth".
At least to me this screams recursion but I can't quite wrap my head around how to do it.
def dive(rels):
    if dive(rels) == []:
        return rels
    else:
        for item in rels:
            rels.append(getRelation(item))
        rels = list(set(flattenAndClean(rels)))
        return dive(rels)
This is my first (not working) attempt, where the function getRelation returns a list of relations of this item and the function flattenAndClean takes nested lists and returns flat ones.
Edit: Example:
Items={1:[4,5,6],2:[6,8],3:[],4:[7],5:[],6:[],7:[4],8:[]}
List = [1,2,3]
def getRelation(id):
    return Items[id]
In: dive(List)
Out: [4,5,6,7,8]
def rels(items, L):
    answer = set(L)
    for i in L:  # L grows while we iterate, so newly found ids get visited too
        targs = tuple(t for t in items[i] if t not in L)
        L.extend(targs)
        answer.update(set(targs))
    return answer
Output
In [29]: rels({1:[4,5,6],2:[6,8],3:[],4:[7],5:[],6:[],7:[4],8:[]}, [1,2,3])
Out[29]: {1, 2, 3, 4, 5, 6, 7, 8}
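The function above also returns the starting ids and modifies the list passed in. If only the related ids are wanted, as in the question's expected output, an explicit stack avoids both issues and handles cycles such as 4 -> 7 -> 4. This is a sketch under that assumption; the name all_relations is made up for illustration:

```python
def all_relations(items, start):
    """Collect every id reachable from `start`, at any depth."""
    seen = set()
    stack = list(start)  # copy, so the caller's list is left untouched
    while stack:
        current = stack.pop()
        for related in items.get(current, []):
            if related not in seen:  # skip ids already found (handles cycles)
                seen.add(related)
                stack.append(related)
    return sorted(seen)


Items = {1: [4, 5, 6], 2: [6, 8], 3: [], 4: [7], 5: [], 6: [], 7: [4], 8: []}
print(all_relations(Items, [1, 2, 3]))
# [4, 5, 6, 7, 8]
```

An iterative loop is used instead of recursion so that deep relation chains cannot hit Python's recursion limit.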

How to find second smallest UNIQUE number in a list?

I need to create a function that returns the second smallest unique number, which means if
list1 = [5,4,3,2,2,1], I need to return 3, because 2 is not unique.
I've tried:
def second(list1):
    result = sorted(list1)[1]
    return result
and
def second(list1):
    result = list(set((list1)))
    return result
but they all return 2.
EDIT1:
Thanks guys! I got it working using this final code:
def second(list1):
    b = [i for i in list1 if list1.count(i) == 1]
    b.sort()
    result = sorted(b)[1]
    return result
EDIT 2:
Okay guys... really confused. My Prof just told me that if list1 = [1,1,2,3,4], it should return 2 because 2 is still the second smallest number, and if list1 = [1,2,2,3,4], it should return 3.
The code in EDIT1 won't work if list1 = [1,1,2,3,4].
I think I need to do something like:
If the duplicate number is in position list1[0], then remove all duplicates and return the second number.
Else, if the duplicate number is not in position list1[0], then just use the code in EDIT1.
Without using anything fancy, why not just get a list of uniques, sort it, and get the second list item?
>>> a = [5,4,3,2,2,1] #second smallest is 3
>>> b = [i for i in a if a.count(i) == 1]
>>> b.sort()
>>> b[1]
3
>>> a = [5,4,4,3,3,2,2,1] #second smallest is 5
>>> b = [i for i in a if a.count(i) == 1]
>>> b.sort()
>>> b[1]
5
Obviously you should test that your list has at least two unique numbers in it. In other words, make sure b has a length of at least 2.
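The length check mentioned above could look like the following sketch. The function name and the choice to raise ValueError are assumptions made for illustration; Counter is used in place of count() so the list is scanned only once:

```python
from collections import Counter


def second_smallest_unique(numbers):
    """Return the second smallest value that occurs exactly once."""
    counts = Counter(numbers)
    uniques = sorted(n for n, c in counts.items() if c == 1)
    if len(uniques) < 2:
        # Not enough unique values to have a "second smallest".
        raise ValueError("need at least two unique numbers")
    return uniques[1]


print(second_smallest_unique([5, 4, 3, 2, 2, 1]))        # 3
print(second_smallest_unique([5, 4, 4, 3, 3, 2, 2, 1]))  # 5
```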
Remove the non-unique elements, using sort/itertools.groupby or collections.Counter.
Then sort the remaining values and take the second one. min would find the minimum in O(n) instead of the O(n log n) of a sort, but since the OP wants the second minimum, sorting is still the better option here. (In any case, if you are using groupby the data is already sorted.)
Sample Code
Using Counter
>>> from collections import Counter
>>> list1 = [5,4,3,2,2,1]
>>> sorted(k for k, v in Counter(list1).items() if v == 1)[1]
3
Using Itertools
>>> from itertools import groupby
>>> sorted(k for k, g in groupby(sorted(list1)) if len(list(g)) == 1)[1]
3
Here's a fancier approach that doesn't use count (which means it should have significantly better performance on large datasets).
from collections import defaultdict
def getUnique(data):
    dd = defaultdict(int)
    for value in data:
        dd[value] += 1
    result = [key for key in dd.keys() if dd[key] == 1]
    result.sort()
    return result
a = [5,4,3,2,2,1]
b = getUnique(a)
print(b)
# [1, 3, 4, 5]
print(b[1])
# 3
Okay guys! I got the working code thanks to all your help and helping me to think on the right track. This code works:
def second(list1):
    if len(list1)!= len(set(list1)):
        result = sorted(list1)[2]
        return result
    elif len(list1) == len(set(list1)):
        result = sorted(list1)[1]
        return result
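Note that sorted(list1)[2] is 2 for list1 = [1,2,2,3,4], not the 3 expected in EDIT 2. One possible reading of the professor's rule, and it is only an assumption, is: the smallest value always counts as the smallest, even when it is duplicated, and the answer is the next larger value that occurs exactly once. A sketch under that reading, with a made-up function name:

```python
from collections import Counter


def second_by_rule(list1):
    """Skip the smallest value, then return the next value that occurs once."""
    counts = Counter(list1)
    distinct = sorted(counts)  # distinct values in ascending order
    for value in distinct[1:]:
        if counts[value] == 1:
            return value
    raise ValueError("no qualifying number in the list")


print(second_by_rule([1, 1, 2, 3, 4]))     # 2
print(second_by_rule([1, 2, 2, 3, 4]))     # 3
print(second_by_rule([5, 4, 3, 2, 2, 1]))  # 3
```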
Okay, using set() on the list is not going to help here. It doesn't purge the duplicated elements. What I mean is:
l1 = [5,4,3,2,2,1]
print(set(l1))
Prints
{1, 2, 3, 4, 5}
Here the duplicated value 2 is not removed; set() only collapses it to a single copy.
In your example you want to remove all duplicated elements.
Try something like this.
l1 = [5,4,3,2,2,1]
newlist = []
for i in l1:
    if l1.count(i) == 1:
        newlist.append(i)
print(newlist)
In this example, this prints
[5, 4, 3, 1]
Then you can use heapq to get the second smallest number in the list, like this:
import heapq
print(heapq.nsmallest(2, newlist)[-1])
The above snippet prints 3 for you.
This should do the trick. Cheers!

Python List indexed by tuples

I'm a Matlab user needing to use Python for some things, I would really appreciate it if someone can help me out with Python syntax:
(1) Is it true that lists can be indexed by tuples in Python? If so, how do I do this? For example, I would like to use that to represent a matrix of data.
(2) Assuming I can use a list indexed by tuples, say, data[(row,col)], how do I remove an entire column? I know in Matlab, I can do something like
new_data = [data(:,1:x-1) data(:,x+1:end)];
if I wanted to remove column x from data.
(3) How can I easily count the number of non-negative elements in each row. For example, in Matlab, I can do something like this:
sum(data>=0,1)
this would give me a column vector that represents the number of non-negative entries in each row.
Thanks a lot!
You should look into numpy, it's made for just this sort of thing.
1. No, but dicts can.
2. Sounds like you want a "2d array", matrix type, or something else. Have you looked at numpy yet?
3. Depends on what you choose from #2, but Python does have sum and other functions that work directly on iterables. Look at gen-exprs (generator expressions) and list comprehensions. For example:
row_count_of_non_neg = sum(1 for n in row if n >= 0)
# or:
row_count_of_non_neg = sum(n >= 0 for n in row)
# "abusing" True == 1 and False == 0
I agree with everyone. Use Numpy/Scipy. But here are specific answers to your questions.
1. Yes. And the index can either be a built-in list or a Numpy array. Suppose x = numpy.array([10, 11, 12, 13]) and y = numpy.array([0, 2]). Then x[[0, 2]] and x[y] both return the same thing.
2. new_data = numpy.delete(data, x, axis=1)
3. (data>=0).sum(axis=1)
Careful: Examples 2 and 3 illustrate a common pitfall with the axis argument in Numpy. Axis 0 runs along the rows and axis 1 along the columns. delete(data, x, axis=1) removes column x, while sum(axis=1) adds up the entries across the columns and so returns one value per row.
Here's an example of how to easily create an array (matrix) in numpy:
>>> import numpy
>>> a = numpy.array([[1,2,3],[4,5,6],[7,8,9]])
here is how it is displayed
>>> a
array([[1, 2, 3],
[4, 5, 6],
[7, 8, 9]])
and how to get a row or column:
>>> a[0,:]
array([1, 2, 3])
>>> a[:,0]
array([1, 4, 7])
Hope the syntax is clear from the example! Numpy is rather powerful.
You can expand list functionality to allow indexing with tuples by overloading the __getitem__ and __setitem__ methods of the built-in list. Try the following code:
class my_list(list):
    def __getitem__(self, key):
        if isinstance(key, tuple) and len(key) > 0:
            temp = []
            for k in key: temp.append(list.__getitem__(self, k))
            return temp
        else:
            return list.__getitem__(self, key)

    def __setitem__(self, key, data):
        if isinstance(key, tuple) and len(key) > 0:
            for k in key: list.__setitem__(self, k, data)
        else:
            list.__setitem__(self, key, data)

if __name__ == '__main__':
    L = my_list([1, 2, 3, 4, 5])
    T = (1,3)
    print(L[T])
(1)
I don't think you can use a tuple as an index of a Python list. You could use a list of lists (e.g. a[i][j]), but that doesn't seem to be what you are after. You can use a dictionary whose keys are tuples.
d = { (1,1):1, (2,1):2 ... }
(2)
If you don't mind the performance,
for key in [k for k in d if k[1] == col_number]:
    del d[key]
(3)
You can also use filter to do that.
len(list(filter(lambda item: item[0][0] == row_num and item[1] >= 0, d.items())))
No, it isn't the case that a list can be indexed by anything but an integer (or a slice). A dictionary, however, is another case. A dictionary is a hash table consisting of key-value pairs. Keys must be unique and immutable. The values can be objects of any type, including integers, tuples, lists, or other dictionaries. For your example, tuples can serve as keys, since they are immutable. Lists, on the other hand, are mutable and thus can't be dictionary keys.
Some of the capabilities you've asked about could be implemented as a combination of a dictionary and list comprehensions. Others would require subclassing the dictionary and adding methods to implement your desired functionality.
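As a rough sketch of the dictionary-plus-comprehension approach, the following stores a small matrix in a dict keyed by (row, col) tuples. The sample data and both helper names are made up for illustration, and this is far less efficient than a numpy array for real matrix work:

```python
# Made-up 2x3 matrix stored as a dict keyed by (row, col) tuples.
data = {
    (0, 0): 1, (0, 1): -2, (0, 2): 3,
    (1, 0): -4, (1, 1): -5, (1, 2): 6,
}


def remove_column(d, col):
    """Return a new dict without column `col`, shifting later columns left."""
    return {
        (r, c if c < col else c - 1): v
        for (r, c), v in d.items()
        if c != col
    }


def count_non_negative_per_row(d):
    """Map each row index to the number of entries >= 0 in that row."""
    counts = {}
    for (r, _), v in d.items():
        counts[r] = counts.get(r, 0) + (v >= 0)  # True counts as 1
    return counts


print(data[(0, 2)])                      # index with a tuple
print(remove_column(data, 1))            # drop the middle column
print(count_non_negative_per_row(data))  # non-negative entries per row
```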
Using native python you could use:
my_list = [0, 1, 2, 3]
index_tuple = (1,2)
x = [item for idx, item in enumerate(my_list) if idx in index_tuple]

Storing and updating lists in Python dictionaries: why does this happen?

I have a list of data that looks like the following:
// timestep,x_position,y_position
0,4,7
0,2,7
0,9,5
0,6,7
1,2,5
1,4,7
1,9,0
1,6,8
... and I want to make this look like:
0, (4,7), (2,7), (9,5), (6,7)
1, (2,5), (4,7), (9,0), (6,8)
My plan was to use a dictionary, where the value of t is the key for the dictionary, and the value against the key would be a list. I could then append each (x,y) to the list. Something like:
# where t = 0, c = (4,7), d = {}
# code 1
d[t].append(c)
Now this causes IDLE to fail. However, if I do:
# code 2
d[t] = []
d[t].append(c)
... this works.
So the question is: why does code 2 work, but code 1 doesn't?
PS Any improvement on what I'm planning on doing would be of great interest!! I think I will have to check the dictionary on each loop through the input to see if the dictionary key already exists, I guess by using something like max(d.keys()): if it is there, append data, if not create the empty list as the dictionary value, and then append data on the next loop through.
Let's look at
d[t].append(c)
What is the value of d[t]? Try it.
d = {}
t = 0
d[t]
What do you get? Oh. There's nothing in d that has a key of t.
Now try this.
d[t] = []
d[t]
Ahh. Now there's something in d with a key of t.
There are several things you can do.
Use example 2.
Use setdefault. d.setdefault(t,[]).append(c).
Use collections.defaultdict. You'd use a defaultdict(list) instead of a simple dictionary, {}.
Edit 1. Optimization
Given input lines from a file in the above form: ts, x, y, the grouping process is needless. There's no reason to go from a simple list of ( ts, x, y ) to a more complex list of ( ts, (x,y), (x,y), (x,y), ... ). The original list can be processed exactly as it arrived.
import collections
d = collections.defaultdict(list)
for ts, x, y in someFileOrListOrQueryOrWhatever:
    d[ts].append( (x,y) )
Edit 2. Answer Question
"when initialising a dictionary, you need to tell the dictionary what the key-value data structure will look like?"
I'm not sure what the question means. Since, all dictionaries are key-value structures, the question's not very clear. So, I'll review the three alternatives, which may answer the question.
Example 2.
Initialization
d= {}
Use
if t not in d:
    d[t] = list()
d[t].append( c )
Each dictionary value must be initialized to some useful structure. In this case, we check to see if the key is present; when the key is missing, we create the key and assign an empty list.
Setdefault
Initialization
d= {}
Use
d.setdefault(t,list()).append( c )
In this case, we exploit the setdefault method to either fetch a value associated with a key or create a new value associated with a missing key.
default dict
Initialization
import collections
d = collections.defaultdict(list)
Use
d[t].append( c )
The defaultdict uses an initializer function for missing keys. In this case, we provide the list function so that a new, empty list is created for a missing key.
I think you want to use setdefault. It's a bit weird to use but does exactly what you need.
d.setdefault(t, []).append(c)
The .setdefault method will return the element (in our case, a list) that's bound to the dict's key t if that key exists. If it doesn't, it will bind an empty list to the key t and return it. So either way, a list will be there that the .append method can then append the tuple c to.
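Applied to the data from the question, the setdefault approach could look like this sketch. The `lines` list stands in for the real input file and is an assumption made for illustration:

```python
# Sample input lines in the "timestep,x_position,y_position" form.
lines = ["0,4,7", "0,2,7", "0,9,5", "0,6,7",
         "1,2,5", "1,4,7", "1,9,0", "1,6,8"]

d = {}
for line in lines:
    t, x, y = (int(v) for v in line.split(","))
    # Fetch the list for timestep t, creating it first if it is missing.
    d.setdefault(t, []).append((x, y))

for t in sorted(d):
    print(t, d[t])
```

No explicit check for an existing key is needed; setdefault creates the empty list the first time each timestep is seen.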
dict = []  # it's not a dict, it's a list; a dictionary would be dict = {}
elem = [1,2,3]
dict.append(elem)
You can access a single element this way:
print(dict[0])  # 0 is the index
The output will be:
[1, 2, 3]
In the case your data is not already sorted by desired criteria, here's the code that might help to group the data:
#!/usr/bin/env python
"""
$ cat data_shuffled.txt
0,2,7
1,4,7
0,4,7
1,9,0
1,2,5
0,6,7
1,6,8
0,9,5
"""
from itertools import groupby
from operator import itemgetter
# load the data and make sure it is sorted by the first column
sortby_key = itemgetter(0)
with open('data_shuffled.txt') as f:
    data = sorted(([int(v) for v in line.split(',')] for line in f),
                  key=sortby_key)

# group by the first column
grouped_data = []
for key, group in groupby(data, key=sortby_key):
    assert key == len(grouped_data) # assume the first column is 0,1, ...
    grouped_data.append([trio[1:] for trio in group])

# print the data
for i, pairs in enumerate(grouped_data):
    print(i, pairs)
Output:
0 [[2, 7], [4, 7], [6, 7], [9, 5]]
1 [[4, 7], [9, 0], [2, 5], [6, 8]]
