Organizing a random list of objects in Python

So I have a list that I want to convert into a list that contains one list for each group of objects,
i.e.
['objA.attr1', 'objC', 'objA.attr55', 'objB.attr4']
would return
[['objA.attr1', 'objA.attr55'], ['objC'], ['objB.attr4']]
Currently, this is what I use:
givenList = ['a.attr1', 'b', 'a.attr55', 'c.attr4']
trgList = []
objNames = []
for val in givenList:
    obj = val.split('.')[0]
    if obj in objNames:
        # list.index() scans objNames, so each lookup is linear
        idx = objNames.index(obj)
        trgList[idx].append(val)
    else:
        objNames.append(obj)
        trgList.append([val])
# print(trgList)
It runs at a decent speed when the original list has around 100,000 IDs, but I am curious whether there is a better way to do this. The order of the objects or attributes does not matter. Any ideas?

This needs to be better defined: what do you do when there is no property? In what order do you want the final list? What about duplicates?
A general algorithm would be to use a multi-map: a map that has multiple values per key.
You then scan through the original list, separate each element into an "object" and a "property", and add a key-value pair for the object and property. At the end of this pass, you will have a mapping from each object to its set of properties. You can then iterate over this mapping to build your final list.
You can use a third-party multimap or implement one yourself by mapping each key to a sequence.
You might want to create a dummy property for cases when the object does not have a property.
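A minimal sketch of the multi-map approach, using the standard library's collections.defaultdict(list) as the multimap (a swapped-in choice; the answer doesn't name a specific implementation). It assumes the object name is everything before the first '.', as in the question's split('.'), and it keeps an element with no property under its own object name instead of adding a dummy property. The function name group_by_object is hypothetical.

```python
from collections import defaultdict


def group_by_object(items):
    """Group 'obj.attr' strings by the part before the first '.'."""
    groups = defaultdict(list)  # multimap: object name -> list of original strings
    for val in items:
        obj = val.split('.', 1)[0]  # 'b' (no attribute) maps to itself
        groups[obj].append(val)     # average O(1) hash lookup, unlike objNames.index()
    return list(groups.values())


given = ['a.attr1', 'b', 'a.attr55', 'c.attr4']
print(group_by_object(given))  # [['a.attr1', 'a.attr55'], ['b'], ['c.attr4']] on Python 3.7+
```

The design point is that the dictionary replaces the linear objNames.index() scan with a hash lookup, so the whole pass is linear in the length of the input list.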

Create a list of an inner value from a dict of dicts

I am trying to figure out the max and min values for an inner value of a dict of dicts.
The dict looks like this:
{'ALLEN PHILLIP K': {'bonus': 4175000,
'exercised_stock_options': 1729541,
'expenses': 13868},
'BADUM JAMES P': {'bonus': 'NaN',
'exercised_stock_options': 257817,
'expenses': 3486},
...
}
I want to figure out the minimum and maximum exercised_stock_options across all dictionaries.
I tried using pandas to do this, but couldn't find a way to shape the data appropriately. Then, I tried a simple for-loop in Python. My code for the for-loop doesn't work, and I can't figure out why (the dict of dicts is called data_dict):
stock_options=[]
for person in range(len(data_dict)):
    stock_options.append(data_dict[person]['exercised_stock_options'])
print(stock_options)
Then I was going to take the max and min values of the list.
Any idea why this code doesn't work? Any alternative methods for figuring out the max and min of an inner value of a dict of dicts?
Here's a method that uses a list comprehension to get the exercised_stock_options from each dictionary and then prints the minimum and maximum values. The sample data is only illustrative; modify it to suit your needs.
d = {'John Smith': {'exercised_stock_options': 99},
     'Roger Park': {'exercised_stock_options': 50},
     'Tim Rogers': {'exercised_stock_options': 10}}
data = [d[person]['exercised_stock_options'] for person in d]
print(min(data), max(data))
You are using range() to get index numbers for your main dictionary, but data_dict is keyed by name, not by position, so data_dict[0] raises a KeyError. Iterate over the keys instead, so that person is each person's name: when person == 'ALLEN PHILLIP K', data_dict[person] returns the inner dictionary for that key.
Note that the "Use items() to iterate across dictionary" best practice quoted below recommends looping with for key, value in data_dict.items() rather than looping over the keys and indexing back into the dictionary. Also note the difference between Python 2 and Python 3.
people=[]
stock_options=[]
for person, stock_data in data_dict.items():
    people.append(person)
    stock_options.append(stock_data['exercised_stock_options'])
# This lets you keep track of the people as well for future use
print(stock_options)
mymin = min(stock_options)
mymax = max(stock_options)
# process min and max values.
Best-practice
Use items() to iterate across dictionary
The updated code below demonstrates the Pythonic style for iterating
through a dictionary. When you define two variables in a for loop in
conjunction with a call to items() on a dictionary, Python
automatically assigns the first variable as the name of a key in that
dictionary, and the second variable as the corresponding value for
that key.
d = {"first_name": "Alfred", "last_name": "Hitchcock"}
for key, val in d.items():
    print("{} = {}".format(key, val))
Difference between Python 2 and Python 3
In Python 2.x, items() in the above examples returns a new list of
(key, value) tuples for the dictionary. To avoid building that list and
loading all of the dictionary's keys and values into memory at once,
prefer the iteritems() method, which returns an iterator instead of a
list. In Python 3.x, iteritems() was removed and items() returns a view
object. The benefit of a view over a list of tuples is that every change
made to the dictionary is reflected in the view.
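A short Python 3 illustration of the view behaviour described above, using sample data: the view is created first, the dictionary is changed afterwards, and the view still sees the change.

```python
d = {"first_name": "Alfred", "last_name": "Hitchcock"}
items_view = d.items()          # a live view in Python 3, not a copied list
d["middle_name"] = "Joseph"     # mutate the dict after the view was created
print(len(items_view))          # 3: the view reflects the new key
print(("middle_name", "Joseph") in items_view)  # True
```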
You need to iterate over your dictionary's .values() and get the value of "exercised_stock_options" from each one. A simple list comprehension retrieves those values:
>>> values = [value['exercised_stock_options'] for value in d.values()]
>>> values
[257817, 1729541]
>>> min(values)
257817
>>> max(values)
1729541
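The data in the question stores a missing bonus as the string 'NaN'. Assuming exercised_stock_options can also be 'NaN' for some people (not shown in the question's excerpt), min() and max() on a mix of ints and strings raise TypeError in Python 3. A sketch that keeps only numeric values; the 'DOE JANE' record is hypothetical and exists only to show a missing value:

```python
import numbers

data_dict = {
    'ALLEN PHILLIP K': {'bonus': 4175000, 'exercised_stock_options': 1729541, 'expenses': 13868},
    'BADUM JAMES P': {'bonus': 'NaN', 'exercised_stock_options': 257817, 'expenses': 3486},
    # Hypothetical record with missing values, for illustration only
    'DOE JANE': {'bonus': 'NaN', 'exercised_stock_options': 'NaN', 'expenses': 'NaN'},
}

# Keep only real numbers; strings such as 'NaN' are skipped
values = [
    record['exercised_stock_options']
    for record in data_dict.values()
    if isinstance(record.get('exercised_stock_options'), numbers.Number)
]
print(min(values), max(values))  # 257817 1729541
```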
I released lifter a few weeks ago for exactly this kind of task; I think you may find it useful.
The only problem here is that you have a mapping (a dict of dicts) instead of a regular iterable.
Here is an answer using lifter:
from lifter.models import Model
# We create a model representing our data
Person = Model('Person')
# We convert your data to a regular iterable
iterable = []
for name, data in your_data.items():
    data['name'] = name
    iterable.append(data)
# we load this into lifter
manager = Person.load(iterable)
# We query the data
results = manager.aggregate(
(Person.exercised_stock_options, min),
(Person.exercised_stock_options, max),
)
You can of course achieve the same result using list comprehensions; however, a dedicated library is sometimes handy, especially if you want to filter the data with complex queries before fetching your results. For example, you could get the min and max values only for people with expenses below 10000:
# We filter the data
queryset = manager.filter(Person.expenses < 10000)
# we apply our aggregate on the filtered queryset
results = queryset.aggregate(
(Person.exercised_stock_options, min),
(Person.exercised_stock_options, max),
)

Is there a quick/optimal way to get a list of unique values for particular key?

I'd like to get all unique values for a particular key in a MongoDB collection. I can loop through the entire collection to get them:
values = []
for item in collection.find():
    if item['key'] not in values:
        values.append(item['key'])
But this seems incredibly inefficient, since I have to check every entry, and loop through the list each time (which gets slow as the number of values gets high). Alternatively, I can put all the values in a list and then make a set (which I think is faster, though I haven't tried to figure out how to test speed yet):
values = []
for item in collection.find():
    values.append(item['key'])
unique_values = set(values)
Or with a list comprehension:
unique_values = set([item['key'] for item in collection.find()])
But I'm wondering if there's a built-in function that wouldn't require looping through the entire collection (like if these values are stored in hash tables or something), or if there's some better way to get this.
The distinct() method does this. It returns an array (a list, in PyMongo) of the distinct values for the given key:
unique_values = collection.distinct("key")
MongoDB has a built-in method for this problem:
db.collection.distinct(FIELD)
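If you do have to deduplicate on the client side, the set-based version from the question is the one to keep: a set comprehension makes a single pass with average constant-time membership checks, instead of rescanning a growing list. A sketch using a hypothetical in-memory list of documents in place of collection.find(), plus dict.fromkeys() for when first-seen order matters:

```python
# Hypothetical stand-in for collection.find(): a list of documents (dicts)
documents = [
    {'key': 'red', 'n': 1},
    {'key': 'blue', 'n': 2},
    {'key': 'red', 'n': 3},
    {'n': 4},  # a document without 'key'
]

# Unordered unique values in one pass; documents missing 'key' are skipped
unique_values = {doc['key'] for doc in documents if 'key' in doc}

# Same values in first-seen order (dicts keep insertion order on Python 3.7+)
ordered_unique = list(dict.fromkeys(doc['key'] for doc in documents if 'key' in doc))

print(sorted(unique_values))  # ['blue', 'red']
print(ordered_unique)         # ['red', 'blue']
```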

Accessing class variables from within a list without using for loop?

I have a list of objects (class instances), each with two attributes, x and y. This is my current code to grab an individual object from the list:
for obj in object_list:
    if obj.x == 10 and obj.y == 10:
        current_object = obj
        break
And then I can do operations on the object by referencing current_object. However, my problem is that the list contains 2000 class object entries, and I worry that it will be very inefficient to iterate through the list like that until I find the desired object.
Is there a more efficient way for me to get my requested object?
If you are going to do the lookup again and again, you can turn your list into a dictionary, like this:
lookup = {(obj.x, obj.y): obj for obj in object_list}
This will create a dictionary with keys as the tuples of x and y values from objects.
Now, you can simply do the lookup like this
lookup[(x_value, y_value)]
or, if you want a default value when the key is not in the dictionary, use dict.get, like this:
lookup.get((x_value, y_value), None)
This does not raise a KeyError if the key is missing; it returns None instead (None is also get's default, so lookup.get((x_value, y_value)) behaves the same).
Is there a more efficient way for me to get my requested object?
The dictionary approach suggested above is very fast: a dictionary lookup takes constant time on average (dictionaries are hash tables internally), whereas searching the list takes linear time, because the elements must be checked one by one. Building the dictionary is itself a single linear pass, so it pays off when you look up many objects.
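A runnable sketch of the dictionary lookup; the Tile class is a hypothetical stand-in for the question's objects with x and y attributes. One behavioural difference worth knowing: if two objects share the same (x, y), the dictionary keeps the last one, while the original loop returns the first match.

```python
class Tile:
    """Hypothetical class with the two attributes from the question."""

    def __init__(self, x, y):
        self.x = x
        self.y = y


object_list = [Tile(x, y) for x in range(50) for y in range(40)]  # 2000 objects

# Build the index once (linear); later lookups are average O(1)
lookup = {(obj.x, obj.y): obj for obj in object_list}

current_object = lookup.get((10, 10))   # the matching Tile
missing = lookup.get((999, 999))        # None instead of a KeyError
print(current_object.x, current_object.y, missing)  # 10 10 None
```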

Datastructure choice issue

I'm new to Python. I need a data structure to contain a tuple of two elements: date and file path. I need to be able to change their values from time to time, hence I'm not sure a tuple is a good idea as it is immutable. Every time I need to change it I must create a new tuple and reference it, instead of really changing its values; so, we may have a memory issue here: a lot of tuples allocated.
On the other hand, I thought of a list, but a list isn't fixed in size, so the user could potentially add more than 2 elements, which is not ideal.
Lastly, I would also want to reference each element in a reasonable name; that is, instead of list[0] (which maps to the date) and list[1] (which maps to the file path), I would prefer a readable solution, such as associative arrays in PHP:
$tuple = array();
$tuple['Date'] = '12.6.15';
$tuple['FilePath'] = 'C:\somewhere\only\we\know';
What is the Pythonic way to handle such situation?
Sounds like you're describing a dictionary (dict)
# Creating a dict
>>> d = {'Date': "12.6.15", 'FilePath': r"C:\somewhere\only\we\know"}
# Accessing a value based on a key
>>> d['Date']
'12.6.15'
# Changing the value associated with that key
>>> d['Date'] = '12.15.15'
# Displaying the representation of the updated dict
>>> d
{'Date': '12.15.15', 'FilePath': 'C:\\somewhere\\only\\we\\know'}
Why not use a dictionary? Dictionaries map a 'Key' to a 'Value'.
For example, you can define a dictionary like this:
record = {'Date': "12.6.15", 'Filepath': r"C:\somewhere\only\we\know"}
and you can easily change it like this:
record['Date'] = 'newDate'
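A dict also accepts extra keys, so it doesn't enforce the question's fixed-size requirement. If that matters, one alternative not mentioned in the answers above is a dataclass that declares __slots__: it gives named, mutable fields and rejects any other attribute. A sketch; the class name FileEntry and the field names are hypothetical:

```python
from dataclasses import dataclass


@dataclass
class FileEntry:
    # __slots__ fixes the set of attributes: assigning any other name fails
    __slots__ = ('date', 'file_path')
    date: str
    file_path: str


entry = FileEntry('12.6.15', r'C:\somewhere\only\we\know')
entry.date = '12.15.15'          # fields can be changed in place
print(entry)

try:
    entry.extra = 'not allowed'  # there is no slot named 'extra'
except AttributeError as exc:
    print('rejected:', exc)
```

Unlike a tuple, the same object is updated in place, so no new object is allocated on each change.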

In Python, when to use a Dictionary, List or Set?

When should I use a dictionary, list or set?
Are there scenarios that are more suited for each data type?
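A list keeps order and a set doesn't (a dict didn't either before Python 3.7; since then, dicts preserve insertion order): when you care about the order of a plain sequence of values, therefore, you should use list (if your choice of containers is limited to these three, of course ;-) ).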
dict associates each key with a value, while list and set just contain values: very different use cases, obviously.
set requires items to be hashable, list doesn't: if you have non-hashable items, therefore, you cannot use set and must instead use list.
set forbids duplicates, list does not: also a crucial distinction. (A "multiset", which maps duplicates into a different count for items present more than once, can be found in collections.Counter -- you could build one as a dict, if for some weird reason you couldn't import collections, or, in pre-2.7 Python as a collections.defaultdict(int), using the items as keys and the associated value as the count).
Checking for membership of a value in a set (or dict, for keys) is blazingly fast (taking about a constant, short time), while in a list it takes time proportional to the list's length in the average and worst cases. So, if you have hashable items, don't care either way about order or duplicates, and want speedy membership checking, set is better than list.
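A small sketch of the set and multiset points above, with made-up data: a set drops duplicates and answers membership questions, collections.Counter keeps the counts, and the last part builds the same multiset by hand with a plain dict, as the answer suggests.

```python
from collections import Counter

words = ['spam', 'eggs', 'spam', 'ham', 'spam']

# set: duplicates are dropped, membership checks are average O(1)
unique_words = set(words)
print('eggs' in unique_words)   # True

# Counter: a multiset that remembers how many times each item occurred
counts = Counter(words)
print(counts['spam'])           # 3
print(counts['bacon'])          # 0 (missing items count as zero)

# The same multiset built by hand with a plain dict
manual = {}
for w in words:
    manual[w] = manual.get(w, 0) + 1
print(manual == dict(counts))   # True
```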
Do you just need an ordered sequence of items? Go for a list.
Do you just need to know whether or not you've already got a particular value, but without ordering (and you don't need to store duplicates)? Use a set.
Do you need to associate values with keys, so you can look them up efficiently (by key) later on? Use a dictionary.
When you want an unordered collection of unique elements, use a set. (For example, when you want the set of all the words used in a document).
When you want to collect an immutable ordered list of elements, use a tuple. (For example, when you want a (name, phone_number) pair that you wish to use as an element in a set, you would need a tuple rather than a list, since set elements must be hashable and lists are not.)
When you want to collect a mutable ordered list of elements, use a list. (For example, when you want to append new phone numbers to a list: [number1, number2, ...]).
When you want a mapping from keys to values, use a dict. (For example, when you want a telephone book which maps names to phone numbers: {'John Smith' : '555-1212'}). Note that before Python 3.7 the keys in a dict were unordered: if you iterated through a dict (telephone book), the keys (names) could show up in any order. Since Python 3.7, iteration follows insertion order.
Use a dictionary when you have a set of unique keys that map to values.
Use a list if you have an ordered collection of items.
Use a set to store an unordered set of items.
In short, use:
list - if you need an ordered sequence of items.
dict - if you need to associate values with keys.
set - if you need to keep only unique elements.
Detailed Explanation
List
A list is a mutable sequence, typically used to store collections of homogeneous items.
A list implements all of the common sequence operations:
x in l and x not in l
l[i], l[i:j], l[i:j:k]
len(l), min(l), max(l)
l.count(x)
l.index(x[, i[, j]]) - index of the first occurrence of x in l (at or after index i and before index j)
A list also implements all of the mutable sequence operations:
l[i] = x - item i of l is replaced by x
l[i:j] = t - slice of l from i to j is replaced by the contents of the iterable t
del l[i:j] - same as l[i:j] = []
l[i:j:k] = t - the elements of l[i:j:k] are replaced by those of t
del l[i:j:k] - removes the elements of l[i:j:k] from the list
l.append(x) - appends x to the end of the sequence
l.clear() - removes all items from l (same as del l[:])
l.copy() - creates a shallow copy of l (same as l[:])
l.extend(t) or l += t - extends l with the contents of t
l *= n - updates l with its contents repeated n times
l.insert(i, x) - inserts x into l at the index given by i
l.pop([i]) - retrieves the item at i and also removes it from l
l.remove(x) - remove the first item from l where l[i] is equal to x
l.reverse() - reverses the items of l in place
A list can be used as a stack by taking advantage of the methods append and pop.
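A minimal sketch of a list used as a stack: append pushes onto the end, and pop with no argument removes from the same end, giving last-in, first-out order.

```python
stack = []
stack.append('a')   # push
stack.append('b')
stack.append('c')
top = stack.pop()   # pop from the end: last in, first out
print(top, stack)   # c ['a', 'b']
```

Popping from the end is cheap; popping from the front with pop(0) shifts every remaining element, which is why a list is a good stack but a poor queue.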
Dictionary
A dictionary maps hashable values to arbitrary objects. A dictionary is a mutable object. The main operations on a dictionary are storing a value with some key and extracting the value given the key.
In a dictionary, you cannot use as keys values that are not hashable, that is, values containing lists, dictionaries or other mutable types.
Set
A set is an unordered collection of distinct hashable objects. Sets are commonly used for membership testing, removing duplicates from a sequence, and computing mathematical operations such as intersection, union, difference, and symmetric difference.
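The operations listed above map directly onto Python's set operators; a short sketch with sample data:

```python
a = {1, 2, 3, 4}
b = {3, 4, 5}

inter = a & b        # intersection, equal to {3, 4}
union = a | b        # union, equal to {1, 2, 3, 4, 5}
diff = a - b         # difference, equal to {1, 2}
sym_diff = a ^ b     # symmetric difference, equal to {1, 2, 5}

# Removing duplicates from a sequence (the original order is not preserved)
deduped = set([3, 1, 3, 2, 1])   # equal to {1, 2, 3}

print(sorted(inter), sorted(union), sorted(diff), sorted(sym_diff), sorted(deduped))
```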
For C++ I always had this flow chart in mind: In which scenario do I use a particular STL container?, so I was curious whether something similar is available for Python 3 as well, but I had no luck.
What you need to keep in mind for Python is that there is no single Python standard as there is for C++. Hence there might be huge differences between Python interpreters (e.g. CPython, PyPy). The following flow chart is for CPython.
Additionally, I found no good way to incorporate the following data structures into the diagram: bytes, bytearray, tuple, namedtuple, ChainMap, Counter, and array.
OrderedDict and deque are available via the collections module.
heapq is available from the heapq module.
LifoQueue, Queue, and PriorityQueue are available via the queue module, which is designed for concurrent (threaded) access. (There is also multiprocessing.Queue, which is meant for passing data between processes rather than threads.)
dict, set, frozenset, and list are built-ins, of course.
I would be grateful to anyone who can improve this answer and provide a better diagram in every respect. Feel free; you're welcome to.
PS: the diagram was made with yEd. The GraphML file is here
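A small sketch of two of the structures mentioned above, collections.deque and the heapq module, with made-up data: deque gives cheap appends and pops at both ends, and heapq keeps a plain list arranged as a min-heap so the smallest item comes out first.

```python
import heapq
from collections import deque

# deque: O(1) appends and pops at both ends (list.pop(0) is O(n))
queue = deque(['a', 'b'])
queue.append('c')          # enqueue on the right
first = queue.popleft()    # dequeue from the left -> 'a'

# heapq: a binary min-heap stored in a plain list
heap = []
for priority, task in [(3, 'write'), (1, 'plan'), (2, 'test')]:
    heapq.heappush(heap, (priority, task))
next_task = heapq.heappop(heap)   # smallest priority first -> (1, 'plan')

print(first, list(queue), next_task)
```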
Although this doesn't cover sets, it is a good explanation of dicts and lists:
Lists are what they seem - a list of values. Each one of them is
numbered, starting from zero - the first one is numbered zero, the
second 1, the third 2, etc. You can remove values from the list, and
add new values to the end. Example: Your many cats' names.
Dictionaries are similar to what their name suggests - a dictionary.
In a dictionary, you have an 'index' of words, and for each of them a
definition. In python, the word is called a 'key', and the definition
a 'value'. The values in a dictionary aren't numbered - they aren't in
any specific order, either - the key does the same thing. You can add,
remove, and modify the values in dictionaries. Example: telephone book.
http://www.sthurlow.com/python/lesson06/
In addition to lists, dicts and sets, there is another interesting Python object: OrderedDict.
Ordered dictionaries are just like regular dictionaries but they remember the order that items were inserted. When iterating over an ordered dictionary, the items are returned in the order their keys were first added.
OrderedDicts can be useful when you need to preserve the order of the keys, for example when working with documents: it's common to need the vector representation of all terms in a document. Using an OrderedDict, you can efficiently check whether a term has been read before, add terms, and extract terms, and after all the manipulations you can extract their ordered vector representation.
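A minimal sketch of the document example, assuming a hypothetical, already-tokenized document: the OrderedDict records each term the first time it is read, so its keys give a stable term order and its values line up with them as a count vector. (Since Python 3.7 a plain dict also preserves insertion order; OrderedDict still adds extras such as move_to_end() and order-sensitive equality.)

```python
from collections import OrderedDict

tokens = ['the', 'cat', 'sat', 'on', 'the', 'mat']  # hypothetical document

term_counts = OrderedDict()
for term in tokens:
    if term in term_counts:        # average O(1) "seen before?" check
        term_counts[term] += 1
    else:
        term_counts[term] = 1      # first occurrence fixes the term's position

vocabulary = list(term_counts)         # terms in first-seen order
vector = list(term_counts.values())    # counts aligned with vocabulary
print(vocabulary)  # ['the', 'cat', 'sat', 'on', 'mat']
print(vector)      # [2, 1, 1, 1, 1]
```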
This may be off topic with respect to the question the OP asked:
List: An unhashable, ordered, mutable collection of objects.
Tuple: An ordered, immutable collection of objects, like a list that can't be changed; it is hashable if all of its elements are hashable.
Set: An unhashable, unordered, mutable collection of distinct (hashable) objects.
Frozenset: A hashable, unordered, immutable collection of distinct objects.
Dictionary: An unhashable, mutable mapping from hashable keys to arbitrary values (insertion-ordered since Python 3.7).
To compare them visually, at a glance, see the image-
Lists are what they seem - a list of values. Each one of them is numbered, starting from zero - the first one is numbered zero, the second 1, the third 2, etc. You can remove values from the list, and add new values to the end. Example: Your many cats' names.
Tuples are just like lists, but you can't change their values. The values you give a tuple at first are the values you are stuck with for the rest of the program. Again, each value is numbered starting from zero, for easy reference. Example: the names of the months of the year.
Dictionaries are similar to what their name suggests - a dictionary. In a dictionary, you have an 'index' of words, and for each of them a definition. In python, the word is called a 'key', and the definition a 'value'. The values in a dictionary aren't numbered - they aren't in any specific order, either - the key does the same thing. You can add, remove, and modify the values in dictionaries. Example: telephone book.
When using them, I refer to this exhaustive cheat sheet of their methods:
class ContainerMethods:
    """Public methods of the built-in containers, grouped by purpose."""

    def __init__(self):
        self.list_methods_11 = {
            'Add': {'append', 'extend', 'insert'},
            'Subtract': {'pop', 'remove'},
            'Sort': {'reverse', 'sort'},
            'Search': {'count', 'index'},
            'Entire': {'clear', 'copy'},
        }
        self.tuple_methods_2 = {'Search': {'count', 'index'}}
        self.dict_methods_11 = {
            'Views': {'keys', 'values', 'items'},
            'Add': {'update'},
            'Subtract': {'pop', 'popitem'},
            'Extract': {'get', 'setdefault'},
            'Entire': {'clear', 'copy', 'fromkeys'},
        }
        self.set_methods_17 = {
            'Add': {'add', 'update'},
            # In-place versions of the set operations (a set cannot hold
            # lists, so they get their own group instead of a nested list)
            'Update': {'difference_update', 'symmetric_difference_update',
                       'intersection_update'},
            'Subtract': {'pop', 'remove', 'discard'},
            'Relation': {'isdisjoint', 'issubset', 'issuperset'},
            'Operation': {'union', 'intersection', 'difference',
                          'symmetric_difference'},
            'Entire': {'clear', 'copy'},
        }
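A cheat sheet like this can drift from the real types, so it can be worth checking it against dir(). A small sketch (the helper name public_methods is hypothetical); on CPython 3 it should report 11 list, 2 tuple, 11 dict, and 17 set methods, matching the counts in the attribute names list_methods_11, tuple_methods_2, dict_methods_11, and set_methods_17 above.

```python
def public_methods(tp):
    """Sorted names of a type's public (non-underscore) attributes."""
    return sorted(name for name in dir(tp) if not name.startswith('_'))


for tp in (list, tuple, dict, set):
    names = public_methods(tp)
    print(tp.__name__, len(names), names)
```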
Dictionary: A Python dictionary is used like a hash table with key as index and object as value.
List: A list is used for holding objects in an array indexed by position of that object in the array.
Set: A set is a collection with functions that can tell if an object is present or not present in the set.
Dictionary: When you want to look up something using something other than indexes. Example:
dictionary_of_transport = {
"cars": 8,
"boats": 2,
"planes": 0
}
print("I have the following amount of planes:")
print(dictionary_of_transport["planes"])
#Output: 0
Lists and sets: When you want to add and remove values.
Lists: To look up values using indexes.
Sets: To store unique values and check membership quickly; you cannot access elements by index or key, only iterate over them or test whether a value is present.