Datastructure choice issue - python

I'm new to Python. I need a data structure that holds two elements: a date and a file path. I need to be able to change their values from time to time, so I'm not sure a tuple is a good idea, since it is immutable. Every time I need to change it, I must create a new tuple and rebind the reference instead of actually changing its values, so we may have a memory issue here: a lot of tuples allocated.
On the other hand, I thought of a list, but a list does not have a fixed size, so the user could potentially add more than two elements, which is not ideal.
Lastly, I would also like to refer to each element by a meaningful name; that is, instead of list[0] (which maps to the date) and list[1] (which maps to the file path), I would prefer a readable solution, such as associative arrays in PHP:
tuple = array()
tuple['Date'] = "12.6.15"
tuple['FilePath'] = "C:\somewhere\only\we\know"
What is the Pythonic way to handle such a situation?

Sounds like you're describing a dictionary (dict):
# Creating a dict (the raw string r"..." keeps the backslashes in the Windows path literal)
>>> d = {'Date': "12.6.15", 'FilePath': r"C:\somewhere\only\we\know"}
# Accessing a value based on a key
>>> d['Date']
'12.6.15'
# Changing the value associated with that key
>>> d['Date'] = '12.15.15'
# Displaying the representation of the updated dict
>>> d
{'Date': '12.15.15', 'FilePath': 'C:\\somewhere\\only\\we\\know'}

Why not use a dictionary? Dictionaries let you map a key to a value.
For example, you can define a dictionary like this:
record = {'Date': "12.6.15", 'FilePath': r"C:\somewhere\only\we\know"}
and you can easily change a value like this:
record['Date'] = 'newDate'
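A dict does not restrict which keys can be added. If a fixed set of named, mutable fields matters, one alternative (not proposed in the answers above, so treat it as a sketch) is a dataclass from the standard library, available from Python 3.7. The class name FileRecord and its field names are hypothetical, chosen only for illustration.

```python
from dataclasses import dataclass


@dataclass
class FileRecord:
    """Hypothetical record type with exactly two named, mutable fields."""
    date: str
    file_path: str


record = FileRecord(date="12.6.15", file_path=r"C:\somewhere\only\we\know")
record.date = "12.15.15"   # fields are changed in place, no new object is created
print(record.date)
print(record.file_path)
```

The constructor accepts exactly the two declared fields, so passing a third value raises a TypeError instead of silently growing the record.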

Related

Python: Sorting a list of dicts based on an element that is a time string

I have a list of dicts. Among other elements, each dict in the list has a date-and-time element that looks like 2018-08-14T14:42:14. I have written a function that compares two such strings and returns the most recent one. How do I use this to sort the list (most recent first)? Also, each dict is quite large, so if possible I would like to get the indices of the list sorted (according to the time element) rather than the whole list. I have seen similar questions on this site, but all of them are about sorting based on a known data type like int or string.

Dates written in ISO format have one nice property: if you sort them alphabetically, you also sort them chronologically (if they all belong to one time zone, of course). Just use the list.sort() method, which sorts the list in place; pass reverse=True to put the most recent first. You should not worry about memory in any case: creating a second, sorted list does not take much memory, because it only holds references to the dictionaries in the first list.
a = [
{'time': '2018-01-02T00:00:00Z'},
{'time': '2018-01-01T00:00:00Z'},
]
a.sort(key=lambda x: x['time'])
print(a)
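The question also asks for the sorted indices rather than a sorted copy. A minimal sketch of that, assuming every dict has a 'time' key in the same ISO format and time zone (the sample records are made up for illustration):

```python
records = [
    {'time': '2018-01-02T00:00:00Z'},
    {'time': '2018-01-03T00:00:00Z'},
    {'time': '2018-01-01T00:00:00Z'},
]

# Sort the positions, not the dicts; reverse=True puts the most recent first.
order = sorted(range(len(records)), key=lambda i: records[i]['time'], reverse=True)
print(order)  # [1, 0, 2]

# The dicts themselves are untouched and can be visited in sorted order on demand.
for i in order:
    print(records[i]['time'])
```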

We sort on the time by converting it to a Python datetime object, which has a natural ordering, like int. That way the result does not depend on how the time strings happen to sort as text.
import datetime
# l is the list of dicts; each dict contains a key "time".
l = sorted(l, key=lambda x: datetime.datetime.strptime(x["time"], '%Y-%m-%dT%H:%M:%S'))

Create a list of an inner value from a dict of dicts

I am trying to figure out the max and min values for an inner value of a dict of dicts.
The dict looks like this:
{'ALLEN PHILLIP K': {'bonus': 4175000,
'exercised_stock_options': 1729541,
'expenses': 13868},
'BADUM JAMES P': {'bonus': 'NaN',
'exercised_stock_options': 257817,
'expenses': 3486},
...
}
I want to figure out the minimum and maximum exercised_stock_options across all dictionaries.
I tried using pandas to do this, but couldn't find a way to shape the data appropriately. Then, I tried a simple for-loop in Python. My code for the for-loop doesn't work, and I can't figure out why (the dict of dicts is called data_dict):
stock_options=[]
for person in range(len(data_dict)):
    stock_options.append(data_dict[person]['exercised_stock_options'])
print stock_options
Then I was going to take the max and min values of the list.
Any idea why this code doesn't work? Any alternative methods for figuring out the max and min of an inner value of a dict of dicts?

Here's a method that uses a list comprehension to get the exercised_stock_options from each inner dictionary and then prints the minimum and maximum of those values. The data below is only a sample; modify it to suit your needs.
d = {'John Smith':{'exercised_stock_options':99},
     'Roger Park':{'exercised_stock_options':50},
     'Tim Rogers':{'exercised_stock_options':10}}
data = [d[person]['exercised_stock_options'] for person in d]
print(min(data), max(data))
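The sample in the question stores missing values as the string 'NaN' (shown there for bonus). Assuming exercised_stock_options can hold the same placeholder, a sketch like the following filters those entries out first. The third entry is hypothetical and was added only to show a missing value being skipped.

```python
data_dict = {
    'ALLEN PHILLIP K': {'bonus': 4175000, 'exercised_stock_options': 1729541, 'expenses': 13868},
    'BADUM JAMES P': {'bonus': 'NaN', 'exercised_stock_options': 257817, 'expenses': 3486},
    # Hypothetical entry, not from the original data set.
    'EXAMPLE PERSON': {'bonus': 'NaN', 'exercised_stock_options': 'NaN', 'expenses': 'NaN'},
}

# Keep only real numbers: on Python 3, min()/max() raise TypeError
# when ints and strings are mixed in the same list.
options = [
    person['exercised_stock_options']
    for person in data_dict.values()
    if isinstance(person['exercised_stock_options'], (int, float))
]
print(min(options), max(options))  # 257817 1729541
```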

You are using range to get an index number into your main dictionary. What you should do instead is iterate over the dictionary's keys, not over indices. That is, person should be the name of each entry, so that when person == 'ALLEN PHILLIP K', data_dict[person] returns the inner dictionary for that key.
Note that the "Use items() to iterate across dictionary" best practice quoted below recommends writing for k, v in data_dict.items() rather than looping over the dictionary itself. Also note the difference between Python 2 and Python 3.
people = []
stock_options = []
for person, stock_data in data_dict.items():
    people.append(person)
    stock_options.append(stock_data['exercised_stock_options'])
    # This lets you keep track of the people as well for future use
print(stock_options)
mymin = min(stock_options)
mymax = max(stock_options)
# process min and max values.
Best-practice
Use items() to iterate across dictionary
The updated code below demonstrates the Pythonic style for iterating
through a dictionary. When you define two variables in a for loop in
conjunction with a call to items() on a dictionary, Python
automatically assigns the first variable as the name of a key in that
dictionary, and the second variable as the corresponding value for
that key.
d = {"first_name": "Alfred", "last_name":"Hitchcock"}
for key,val in d.items():
    print("{} = {}".format(key, val))
Difference Python 2 and Python 3
In Python 2.x, the above examples using items would return a list of
tuples containing copies of the dictionary's key-value pairs. To avoid
copying, and with that loading all of the dictionary's keys and
values into a list in memory, you should prefer the iteritems
method, which simply returns an iterator instead of a list. In Python
3.x, iteritems is removed and the items method returns view objects.
The benefit of these view objects compared to the tuples
containing copies is that every change made to the dictionary is
reflected in the view objects.
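A short sketch of what "reflected in the view objects" means on Python 3. The key "profession" is made up for illustration.

```python
d = {"first_name": "Alfred", "last_name": "Hitchcock"}

items_view = d.items()            # a view onto d, not a copied list
size_before = len(items_view)

d["profession"] = "director"      # change the dict after the view was created
size_after = len(items_view)

print(size_before, size_after)                    # 2 3
print(("profession", "director") in items_view)   # True
```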

You need to iterate over your dictionary's .values() and pick out the value of "exercised_stock_options" from each one. You can use a simple list comprehension to retrieve those values:
>>> values = [value['exercised_stock_options'] for value in d.values()]
>>> values
[257817, 1729541]
>>> min(values)
257817
>>> max(values)
1729541

I released lifter a few weeks ago for exactly this kind of task; I think you may find it useful.
The only problem here is that you have a mapping (a dict of dicts) instead of a regular iterable.
Here is an answer using lifter:
from lifter.models import Model
# We create a model representing our data
Person = Model('Person')
# We convert your data to a regular iterable
iterable = []
for name, data in your_data.items():
    data['name'] = name
    iterable.append(data)
# we load this into lifter
manager = Person.load(iterable)
# We query the data
results = manager.aggregate(
(Person.exercised_stock_options, min),
(Person.exercised_stock_options, max),
)
You can of course achieve the same result using list comprehensions. However, it is sometimes handy to use a dedicated library, especially if you want to filter data using complex queries before fetching your results. For example, you could get your min and max values only for people whose expenses are below 10000:
# We filter the data
queryset = manager.filter(Person.expenses < 10000)
# we apply our aggregate on the filtered queryset
results = queryset.aggregate(
(Person.exercised_stock_options, min),
(Person.exercised_stock_options, max),
)

How to store an array in a dictionary using python

I am currently attempting to modify a series of programs to use dictionaries instead of arrays. I have columns of raw information in a file, which is then read into an ASCII CSV file. I need to convert this file into a dictionary, so that it can be fed into another program.
I used numpy.genfromtxt to pull out the information I need from the CSV file, in this form:
a,b,c,d = np.genfromtxt("file",delimiter = ',', unpack = True)
This step works completely fine.
I then attempt to build a dictionary:
outputDict = dict([a,a],[b,b],[c,c],[d,d])
As I understand it, this should make the key "a" in the dictionary correspond to the array "a".
Thus if:
a = [1,2,3,4]
then:
outputDict[a][0] = 1
However, when I attempt to create this dictionary I receive the following error:
TypeError: unhashable type: 'numpy.ndarray'
Why can't I construct a dictionary in this fashion, and what is the workaround, if any? Any help will be greatly appreciated!

You can do this using Python's built-in collection types.
Declare your dictionary as:
Dictionary = {}  # {} makes it a key-value dictionary
Then add the array under a string key:
Dictionary['a'] = [1,2,3,4]  # [] makes it a list (array)
So now your dictionary will look like
{'a': [1, 2, 3, 4]}
This means that for the key 'a' you have an array, which you can access like Dictionary['a'][0], which gives the value 1, and so on. :)
By the way, if you look into examples of dictionaries, arrays, key-value pairs, and nested dictionaries, the concept will become clearer.

Copied from my comment:
Correct dictionary formats:
{'a':a, 'b':b,...}, or
dict(a=a, b=b,...), or
dict([('a', a), ('b', b),...])
The goal is to make the strings 'a', 'b', etc. the keys, not the variable values.
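A minimal runnable sketch of the three forms above. Plain lists stand in for the NumPy arrays so the example needs no third-party install; like numpy.ndarray, a list is unhashable and cannot be used as a dict key.

```python
# Lists stand in for the arrays returned by np.genfromtxt (an assumption for this sketch).
a = [1, 2, 3, 4]
b = [5, 6, 7, 8]

# Using the array itself as a key fails, because dict keys must be hashable.
try:
    bad = {a: a}
except TypeError as exc:
    print("TypeError:", exc)

# Using the *names* as string keys works, in any of the three spellings.
output_dict = {'a': a, 'b': b}
same_dict = dict(a=a, b=b)
also_same = dict([('a', a), ('b', b)])

print(output_dict['a'][0])                     # 1
print(output_dict == same_dict == also_same)   # True
```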

How to declare and add items to an array in Python?

I'm trying to add items to an array in Python.
I run
array = {}
Then, I try to add something to this array by doing:
array.append(valueToBeInserted)
There doesn't seem to be a .append method for this. How do I add items to an array?

{} represents an empty dictionary, not an array/list. For lists or arrays, you need [].
To initialize an empty list do this:
my_list = []
or
my_list = list()
To add elements to the list, use append
my_list.append(12)
To extend the list to include the elements from another list use extend
my_list.extend([1,2,3,4])
my_list
--> [12,1,2,3,4]
To remove an element from a list use remove
my_list.remove(2)
Dictionaries represent a collection of key/value pairs also known as an associative array or a map.
To initialize an empty dictionary use {} or dict()
Dictionaries have keys and values
my_dict = {'key':'value', 'another_key' : 0}
To extend a dictionary with the contents of another dictionary you may use the update method
my_dict.update({'third_key' : 1})
To remove a value from a dictionary
del my_dict['key']

If you do it this way:
array = {}
you are making a dictionary, not an array.
If you need an array (which is called a list in Python), you declare it like this:
array = []
Then you can add items like this:
array.append('a')

Arrays (called lists in Python) use the [] notation. {} is for dicts (also called hash tables, associative arrays, etc. in other languages), so you won't have 'append' for a dict.
If you actually want an array (list), use:
array = []
array.append(valueToBeInserted)

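Just for the sake of completeness, you can also do this:
array = []
array += [valueToBeInserted]
Be careful with strings, though: += extends the list with each element of the right-hand side, so the following adds the individual characters 's', 't', 'r', 'i', 'n', 'g' rather than one string:
array += 'string'
To add the whole string as a single item, wrap it in a list: array += ['string'].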
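A short sketch of how += behaves on a list, since the right-hand side is treated as an iterable of items to add:

```python
array = []
array += ['first']      # a one-element list: adds one item
array += 'ab'           # a string is iterable, so 'a' and 'b' are added separately
print(array)            # ['first', 'a', 'b']

array += ['string']     # wrapped in a list: the whole string is one item
print(array)            # ['first', 'a', 'b', 'string']
```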

In some languages, like Java, you define an array using curly braces as follows, but in Python they have a different meaning:
Java:
int[] myIntArray = {1,2,3};
String[] myStringArray = {"a","b","c"};
However, in Python, curly braces are used to define dictionaries, which need key:value pairs, as in {'a':1, 'b':2}
To actually define an array (which is called a list in Python) you can do:
Python:
mylist = [1,2,3]
or other examples like:
mylist = list()
mylist.append(1)
mylist.append(2)
mylist.append(3)
print(mylist)
>>> [1,2,3]

With NumPy arrays, you can also do:
array = numpy.append(array, value)
Note that numpy.append() returns a new array rather than modifying the original in place, so to update your initial array you have to assign the result back: array = ...

Isn't it a good idea to learn how to add to an array in the most performant way?
It's really simple to create an array and put values into it:
my_array = ["B","C","D","E","F"]
But now we have two ways to insert one more value into this array:
Slow mode:
my_array.insert(0,"A")
This moves all existing values one position to the right to place "A" at position zero:
"A" --> "B","C","D","E","F"
Fast mode:
my_array.append("A")
This adds the value "A" at the last position of the array, without touching the other positions:
"B","C","D","E","F", "A"
If you need to display the data sorted, sort it later when necessary. Use whichever way is most useful to you, but it is worth understanding the performance of each method.
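One way to see the difference yourself is to time both approaches with the standard timeit module. This is only a sketch: the helper names and the size n are arbitrary choices for illustration, and the actual numbers depend on your machine and Python version.

```python
import timeit


def build_with_insert(n):
    items = []
    for i in range(n):
        items.insert(0, i)   # every existing element is shifted one slot right
    return items


def build_with_append(n):
    items = []
    for i in range(n):
        items.append(i)      # written at the end; nothing is shifted
    return items


n = 5000  # arbitrary size, chosen only for illustration
print("insert(0, x):", timeit.timeit(lambda: build_with_insert(n), number=5))
print("append(x):   ", timeit.timeit(lambda: build_with_append(n), number=5))
```

On a typical CPython build the insert version should take noticeably longer as n grows, because each insert at position zero has to move everything already in the list.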

I believe you need to do:
array = [] in order to define it, and then:
array.append("hello") to add to it.

Organizing a random list of objects in Python

So I have a list that I want to convert to a list that contains a list for each group of objects.
i.e.
['objA.attr1', 'objC', 'objA.attr55', 'objB.attr4']
would return
[['objA.attr1', 'objA.attr55'], ['objC'], ['objB.attr4']]
currently this is what I use:
givenList = ['a.attr1', 'b', 'a.attr55', 'c.attr4']
trgList = []
objNames = []
for val in givenList:
    obj = val.split('.')[0]
    if obj in objNames:
        id = objNames.index(obj)
        trgList[id].append(val)
    else:
        objNames.append(obj)
        trgList.append([val])
#print trgList
It seems to run at a decent speed when the original list has around 100,000 ids, but I am curious whether there is a better way to do this. The order of the objects or attributes does not matter. Any ideas?

This needs to be better defined: what do you do when there is no property? In what order do you want the final list? What about duplicates?
A general algorithm would be to use a multimap: a map that has multiple values per key.
You then scan through the original list, separate each element into an "object" and a "property", and add a key-value pair for the object and property. At the end of this pass, you have a mapping from each object to its set of properties. You can then iterate over this mapping to build your final list.
You can use a third-party multimap or implement one yourself by mapping each key to a sequence.
You might want to create a dummy property for cases where the object does not have a property.
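One way to implement the "map each key to a sequence" idea is collections.defaultdict from the standard library. This is a sketch under the assumption that the object name is everything before the first dot, and that an entry without a dot is its own object. It stores the original strings rather than a separate dummy property.

```python
from collections import defaultdict

given_list = ['objA.attr1', 'objC', 'objA.attr55', 'objB.attr4']

groups = defaultdict(list)      # object name -> list of original entries
for val in given_list:
    obj = val.split('.')[0]     # text before the first dot; the whole string if there is no dot
    groups[obj].append(val)     # dict lookup instead of scanning a list with index()

result = list(groups.values())
print(result)  # [['objA.attr1', 'objA.attr55'], ['objC'], ['objB.attr4']]
```

Compared with the loop in the question, this avoids the `obj in objNames` and `objNames.index(obj)` scans, which become slower as the number of distinct objects grows.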
