Data Structure Recommendation, Nuclides - Python

I'm writing a program to do various calculations involving nuclides: binding energies, magnetic moments, etc. Part of the program will need to store the data in some dictionary, list, or something I'm unaware of as a novice Python programmer. I'd like to (by hand) create a collection that contains Z, N, masses, etc. Specifically, I'd like a structure that attaches multiple traits to each entry. I've thought of making a nested dictionary (accessing an attribute as something like nuclides[C14[attribute]]), but I don't think this is intuitive. Here's the trickiest part: I'd like nuclides to be referable either by Z and N or by a string (e.g. nuclides['14C'] or nuclides[6,8]). As far as I know, dictionary entries are only referenced by a single key, so I'm not sure dictionaries are ideal.
TL;DR
What's the best format for storing numerous sets of integers/floats plus a unique string, where each set can be referenced either by its string or by its pair of numbers?
An example application: given, say, 238Pu, find the daughter nuclide from alpha decay and its mass (both of which are in this table/data).
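One way to get that dual lookup is to store each record once and register it in the same dict under both the string key and the (Z, N) tuple key (tuples, unlike lists, are hashable). A minimal sketch, where the field names are placeholders and the masses (in u) are approximate:
nuclides = {}

def add_nuclide(symbol, Z, N, mass):
    """Register one record under both the '14C'-style key and the (Z, N) key."""
    record = {"symbol": symbol, "Z": Z, "N": N, "mass": mass}
    nuclides[symbol] = record
    nuclides[Z, N] = record  # tuple key; points at the very same record

add_nuclide("14C", 6, 8, 14.0032)
add_nuclide("238Pu", 94, 144, 238.0496)
add_nuclide("234U", 92, 142, 234.0410)

assert nuclides["14C"] is nuclides[6, 8]  # both keys reach the same record

# Alpha decay: the daughter nuclide has Z-2 and N-2.
parent = nuclides["238Pu"]
daughter = nuclides[parent["Z"] - 2, parent["N"] - 2]
print(daughter["symbol"], daughter["mass"])  # 234U 234.041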

Related

Are nested sets and dictionaries an anti-pattern in Python?

I need to create a sort of similarity matrix based on user_id values. I'm currently using Pandas to store the majority of my data, but I know that iterating over a DataFrame is considered an anti-pattern, so I'm considering a nested set/dictionary structure to store the similarities, similar to some of the proposed structures here.
I would only be storing the N nearest similarities, so it would amount to something like this:
{
    'user_1': {'user_2': 0.5, 'user_4': 0.9, 'user_3': 1.0},
    'user_2': ...,
}
This would let me access a neighbourhood quite easily with dict_name[user_id].
Essentially, each outermost dictionary key would be a user_id that maps to another dictionary of its N closest neighbours, stored as user_id: similarity_value key-value pairs.
For more context, I'm just writing a simple KNN recommender. I'm doing it from scratch, as I've tried Surpriselib and sklearn, but they don't have the context-aware flexibility I require.
This seems like a reasonable way to store these values to me, but is it very anti-pythonic, or should I be looking to do this using some other structures (e.g. NumPy or Pandas or something else I don't yet know about)?
As the comment says, there is nothing inherently wrong or anti-pythonic with using (one level of) nested dictionaries and writing everything from scratch.
Performance-wise, you can probably beat your hand-written solution by using an existing data structure whose API matches the transformations/operations you intend to perform. NumPy/Pandas will only help if your operations can be expressed as vectorized operations that act on all (pairs of) elements along a common axis, e.g. all users in your top-level dictionary.
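For instance, if all pairwise similarities fit in one array, the top-N neighbourhood of every user falls out of a single argsort. A rough sketch, assuming users are identified by row index rather than by string IDs:
import numpy as np

rng = np.random.default_rng(0)
sims = rng.random((5, 5))          # hypothetical similarity matrix
np.fill_diagonal(sims, -np.inf)    # exclude self-similarity from the top-N

N = 3
# Sort each row descending and keep the first N columns: one vectorized
# call instead of a Python-level loop over users.
top_n = np.argsort(-sims, axis=1)[:, :N]
print(top_n)                                    # each user's N nearest neighbours
print(np.take_along_axis(sims, top_n, axis=1))  # the matching similarities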

How does one let a for-loop add a number to the end of a variable name

So basically I'm trying to build a calculator using Python. I understand this might not be the most optimal approach, but I love Python and wanted a challenge. I'm currently trying to figure out a way to make an "infinite" amount of numbers and operators possible, so I would like to know whether it is possible, for example, to add a number after variable n, making n1, n2, n3, and a number after operator op, making op1, op2, op3, etc.
If there are better ways to do that, that'd be great, as I'm still relatively new when it comes to Python.
If there are better ways to do that, that'd be great, as I'm still relatively new when it comes to Python.
Yes. Lists. Or dicts, but probably lists. Possibly deque (double-ended queue) depending on the exact way you want to use it (it's quite inefficient to add or remove items from the front of a normal list, especially as the size of the list increases). Wanting to create variable names dynamically is a huge red flag.
I'm currently trying to figure out a way to make an "infinite" amount
Look up iterators and generators, as well as itertools. These are tools to lazily build and compose infinite sequences of things.
Though I don't quite get why you'd want that for a calculator, unless you want to generate a stream of data and operations for testing? In which case I'd suggest taking a gander at hypothesis.
What you asked for (NB: not what you need)
so I would like to know whether it is possible, for example, to add a number after variable n, making n1, n2, n3, and a number after operator op, making op1, op2, op3, etc.
By inserting into the dictionary of global variables returned by globals(), you can actually create global variables, thereby allowing you to use string manipulation for their names:
globals_dict = globals()
for i in range(50):
    # As an example, let's put i squared into variable n{i},
    # i.e., 0 squared (= 0) into n0, 1 squared (= 1) into n1,
    # 2 squared (= 4) into n2, etc.
    globals_dict[f"n{i}"] = i**2
assert n11 == 121  # Check that n11 is actually 11 squared, i.e., 121.
In some implementations of Python, this even works for the dictionary of local variables returned by locals() for creating local variables, but unlike with globals(), that behavior is unspecified and should not be relied on.
To use these variables without hard-coding their names, you'd have to go through globals() or globals_dict again, which is entirely impractical.
What you actually need
There's a pythonic (i.e., idiomatic in Python) way to put multiple values into a single variable: Collection types. The most general purpose (and thus most commonly used) ones are built into the language itself:
dict
list
set
tuple
More collection types are available in the standard library module collections.
If you are unsure how to use the built-in collections, going through the Python tutorial in the official Python documentation would probably be a good idea.
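For the calculator, that means keeping the numbers in one list and the operators in another instead of minting names like n1 and op1. A minimal sketch, evaluating strictly left to right with no operator precedence:
import operator

ops = {"+": operator.add, "-": operator.sub,
       "*": operator.mul, "/": operator.truediv}

numbers = [3.0, 4.0, 5.0]   # instead of n0, n1, n2 ...
operators = ["+", "*"]      # instead of op0, op1 ...

result = numbers[0]
for op, number in zip(operators, numbers[1:]):
    result = ops[op](result, number)
print(result)  # (3 + 4) * 5 = 35.0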

Python Array Data Structure with History

I recently needed to store large array-like data (sometimes numpy, sometimes key-value indexed) whose values would be changed over time (t=1 one element changes, t=2 another element changes, etc.). This history needed to be accessible (some time in the future, I want to be able to see what t=2’s array looked like).
An easy solution was to keep a list of arrays, one per timestep, but this became too memory-intensive. I ended up writing a small class that handled this by keeping all data "elements" in a dict, with each element represented by a list of (this_value, timestamp_for_this_value) pairs. That let me recreate the state at an arbitrary timestamp by looking for the last change before some time t, but it was surely not as efficient as it could have been.
Are there data structures available for python that have these properties natively? Or some sort of class of data structure meant for this kind of thing?
Have you considered writing a log file? A good use of memory would be to have the arrays contain only the current relevant values, but to build in a procedure where each update triggers a logging function. That function could write to a text file, a database, or an array/dictionary of some sort. These kinds of audit trails are pretty common in the database world.
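A rough sketch of that update-triggered logging, writing to an in-memory dict instead of a file, with bisect recovering the value a key held at an arbitrary time (all names here are placeholders):
import bisect

current = {}   # key -> current value only
history = {}   # key -> list of (timestamp, value), appended in time order

def update(key, value, t):
    current[key] = value
    history.setdefault(key, []).append((t, value))  # the "logging function"

def value_at(key, t):
    """Return the last value written to key at or before time t."""
    entries = history[key]
    i = bisect.bisect_right(entries, (t, float("inf")))  # past all entries with timestamp <= t
    return entries[i - 1][1] if i else None

update("x", 1.0, t=1)
update("x", 2.5, t=4)
print(value_at("x", 2))  # 1.0 -- what x looked like at t=2
print(value_at("x", 5))  # 2.5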

More efficient use of dictionaries

I'm going to store on the order of 10,000 securities × 300 date pairs × 2 types in some caching mechanism.
I'm assuming I'm going to use a dictionary.
Question Part 1:
Which is more efficient or faster? Assume that I'll generally be looking entries up knowing a list of security IDs plus the two dates and the type. If there is a big efficiency gain from tweaking my lookup, I'm happy to do that. Also assume I can be wasteful of memory to an extent.
Method 1: store and look up using keys that look like strings "securityID_date1_date2_type"
Method 2: store and look up using keys that look like tuples (securityID, date1, date2, type)
Method 3: store and look up using nested dictionaries of some variation mentioned in methods 1 and 2
Question Part 2:
Is there an easy and better way to do this?
It's going to depend a lot on your use case. Is lookup the only activity, or will you do other things, e.g.:
Iterate all keys/values? For simplicity, you wouldn't want to nest dictionaries if iteration is relatively common.
What about iterating a subset of keys with a given securityID, type, etc.? Nested dictionaries (each keyed on one or more components of your key) would be beneficial if you needed to iterate "keys" with one component having a given value.
What about if you need to iterate based on a different subset of the key components? If that's the case, a plain dict is probably not the best idea; you may want a relational database, either via the built-in sqlite3 module or a third-party module for a more "production grade" DBMS.
Aside from that, it matters quite a bit how you construct and use keys. Strings cache their hash code (and can be interned for even faster comparisons), so if you reuse a string for lookup after storing it elsewhere, it's going to be fast. But tuples are usually safer: strings built from multiple pieces can accidentally produce the same string from different keys if the separation between components isn't well maintained, and you can easily recover the original components from a tuple, whereas a string would need to be parsed.
Nested dicts aren't likely to win a simple contest of lookup speed (and they require some finesse with methods like setdefault to populate properly), so they're only likely to be beneficial when you iterate a subset of the data for a single component of the key.
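For instance, the string-key collision is easy to trip over when a component can itself contain the separator; a tiny sketch of the hazard (the values are made up):
key_a = "_".join(["AB_C", "2020-01-01", "2020-06-30", "call"])
key_b = "_".join(["AB", "C_2020-01-01", "2020-06-30", "call"])
assert key_a == key_b                                    # different components, same string
assert ("AB_C", "2020-01-01") != ("AB", "C_2020-01-01")  # tuple keys stay distinct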
If you want to benchmark, I'd suggest populating a dict with sample data, then using the timeit module (or IPython's %timeit magic) to test something approximating your use case. Just make sure it's a fair test: don't look up the same key each time (cycling through a few hundred keys with itertools.cycle works better), since dict optimizes for that scenario, and make sure the key is constructed each time rather than reused (unless reuse would be common in the real scenario), so string's caching of hash codes doesn't interfere.
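A rough benchmark along those lines; the sizes and key shapes below are made up, not taken from the question:
import itertools
import timeit

data = {(sec, d1, d2, ty): float(sec)
        for sec in range(1000) for d1 in (1, 2) for d2 in (3, 4) for ty in ("A", "B")}

keys = itertools.cycle(list(data)[:500])  # cycle a few hundred distinct keys

def lookup():
    sec, d1, d2, ty = next(keys)
    # Rebuild the key each time so cached hash codes don't flatter the result.
    return data[(sec, d1, d2, ty)]

print(timeit.timeit(lookup, number=1_000_000))  # seconds for a million lookups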

Best data structures for 3D array in Python

I am developing a moving average filter for position "tracks" in Touch Designer, which embeds a Python runtime. I'm new to Python and I'm unclear on the best data structures to use. The pseudocode is roughly:
Receive a new list of tracks formatted as:
id, posX, posY
2001, 0.54, 0.21
2002, 0.43, 0.23
...
Add incoming X and Y values to an existing data structure keyed on "id," if that id is already present
Create new entries for new ids
Remove any id entries that are not present in the incoming data
Return a moving average of X and Y values per id
Question: Would it be a good idea to do this as a hashtable where the key is id and the value is a list of lists? E.g.:
ids = {2001: posVals, 2002: posVals2}
where posVals is a list of [x,y] pairs?
I think this is like a 3D array, but where I want to use id as a key for lots of operations.
Thanks
First, if the IDs are all relatively small positive integers, and the array is almost completely dense (that is, almost all IDs will exist), a dict is just adding extra overhead. On the other hand, if there are large numbers, or large gaps, or keys of different types, using a dict as a "sparse array" makes perfect sense.
Meanwhile, for the other two dimensions… how many IDs are you going to have, and how many pairs for each? If we're talking a handful of pairs per ID and a few thousand pairs total across all IDs, a list of pairs per ID is perfectly fine (although I'd probably represent each pair as a tuple rather than a list), and anything else would be adding unnecessary complexity. But if there are going to be a lot of pairs per ID, or a lot in total, you may run into problems with storage or performance.
If you can use the third-party library numpy, it can store a 2D array of numbers in much less memory than a list of pairs of numbers, and it can perform calculations like moving averages with much briefer/more readable code, and with much less CPU time. In fact, it can even store a sparse 3D array for you.
If you can only use the standard library, the array module can get you most of the same memory benefits, but without the simplicity benefits (in fact, your code becomes slightly more complex, as you have to represent the 2D array as a 1D array, old-school-C-style—although you can wrap that up pretty easily), or the time performance benefits, and it can't help with the sparseness.
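To give a feel for the numpy option, a window-3 moving average over one id's samples (the sample values are made up):
import numpy as np

xy = np.array([[0.54, 0.21], [0.56, 0.20], [0.55, 0.22], [0.57, 0.19]])
kernel = np.ones(3) / 3                              # uniform window of length 3
avg_x = np.convolve(xy[:, 0], kernel, mode="valid")  # one averaged x per full window
avg_y = np.convolve(xy[:, 1], kernel, mode="valid")
print(avg_x, avg_y)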
Yes, this is how I would do it. It's very intuitive this way, assuming you're always looking things up by their id and don't need to sort in some other way.
Also, the terminology in Python is dict (as in dictionary), rather than hashtable.
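A minimal sketch of that dict-keyed approach, using collections.deque(maxlen=...) so the moving window trims itself; the window size and function names are placeholders:
from collections import deque

WINDOW = 5
tracks = {}  # id -> deque of (x, y) tuples, oldest first

def update(incoming):
    """incoming: list of (id, x, y) rows for the current frame."""
    seen = set()
    for track_id, x, y in incoming:
        seen.add(track_id)
        # setdefault creates a bounded deque for new ids; maxlen drops old samples.
        tracks.setdefault(track_id, deque(maxlen=WINDOW)).append((x, y))
    for stale in set(tracks) - seen:  # drop ids absent from this frame
        del tracks[stale]

def moving_average(track_id):
    pts = tracks[track_id]
    n = len(pts)
    return sum(x for x, _ in pts) / n, sum(y for _, y in pts) / n

update([(2001, 0.54, 0.21), (2002, 0.43, 0.23)])
update([(2001, 0.56, 0.20)])  # 2002 absent, so its entry is removed
print(moving_average(2001))   # ((0.54 + 0.56) / 2, (0.21 + 0.20) / 2)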
