Use string content as a variable in Python

I am using a package that has operations inside a class (not sure what either is, really), and normally the data is accessed like data[package.operation]. Since I have to do multiple operations, I thought of shortening it and doing the following:
list =["o1", "o2", "o3", "o4", "o5", "o6"]
for i in list:
print data[package.i]
but since package.i looks up an attribute literally named i rather than using the string the variable holds, it doesn't do the operation, and if I take away the quotes then it is an undefined variable. Is there a way around this, or will I just have to write it out the long way?
In particular, I am using pymatgen and its Orbital class, and with the .operation I want to call specific suborbitals. A real example of how it would be used is data[0][Orbital.s], where the first [0] denotes the element for which to get the s orbital (that's why I omitted it in the code above).

You can use getattr in order to dynamically select attributes from objects (the Orbital package in your case; for example getattr(Orbital, 's')).
So your loop would be rewritten to:
for op in ['o1', 'o2', 'o3', 'o4', 'o5', 'o6']:
    print(data[getattr(package, op)])
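Applied to the pymatgen example from the question, a minimal sketch could look like this (the list of suborbital names is illustrative, and the import path assumes the usual pymatgen layout):
from pymatgen.electronic_structure.core import Orbital

suborbitals = ['s', 'py', 'pz', 'px']  # hypothetical selection of suborbital names
for name in suborbitals:
    # getattr turns the string 'py' into the enum member Orbital.py
    print(data[0][getattr(Orbital, name)])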

Related

What are the advantages of using column objects instead of strings in PySpark

In PySpark one can use column objects and strings to select columns. Both ways return the same result. Is there any difference? When should I use column objects instead of strings?
For example, I can use a column object:
import pyspark.sql.functions as F
df.select(F.lower(F.col('col_name')))
# or
df.select(F.lower(df['col_name']))
# or
df.select(F.lower(df.col_name))
Or I can use a string instead and get the same result:
df.select(F.lower('col_name'))
Read the PySpark style guide from Palantir (it is hosted on GitHub); it explains when to use F.col() and when not to, and covers related best practices.
In many situations the first style (referencing columns directly through a dataframe variable, e.g. df1.colA) can be simpler, shorter and visually less polluted. However, we have found that it faces a number of limitations that lead us to prefer the second style (F.col('colA')):
If the dataframe variable name is large, expressions involving it quickly become unwieldy;
If the column name has a space or other unsupported character, the bracket operator must be used instead. This generates inconsistency, and df1['colA'] is just as difficult to write as F.col('colA');
Column expressions involving the dataframe aren't reusable and can't be used for defining abstract functions;
Renaming a dataframe variable can be error-prone, as all column references must be updated in tandem.
Additionally, the dot syntax encourages the use of short, non-descriptive variable names for dataframes, which we have found to be harmful for maintainability. Remember that dataframes are containers for data, and descriptive names are a helpful way to quickly set expectations about what's contained within.
By contrast, F.col('colA') will always reference a column designated colA in the dataframe being operated on (named df in this case). It does not require keeping track of other dataframes' states at all, so the code becomes more local and less susceptible to "spooky interaction at a distance," which is often challenging to debug.
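To illustrate the reusability point from the guide, here is a minimal sketch (the helper lowered and the second dataframe other_df are made-up names): because F.col() is not tied to any particular dataframe variable, the same helper works on any dataframe that has the column.
import pyspark.sql.functions as F

def lowered(col_name):
    # F.col() resolves against whatever dataframe the expression is applied to
    return F.lower(F.col(col_name))

df.select(lowered('col_name'))
other_df.select(lowered('col_name'))  # reusable; an expression built from df.col_name would not be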
It depends on how the functions are implemented in Scala.
In Scala, the signature of a function is part of the function itself. For example, func(foo: String) and func(bar: Int) are two different functions, and Scala can tell which one you are calling from the type of the argument you pass.
F.col('col_name'), df['col_name'] and df.col_name all produce the same type of object, a Column. It makes almost no difference which syntax you use. One small difference is that you could write, for example:
df_2.select(F.lower(df.col_name)) # Where the column is from another dataframe
# Spoiler alert : It may raise an error !!
When you call df.select(F.lower('col_name')), if the function lower(smth: String) is not defined in Scala, then you will get an error. Some functions are defined with a string as input, others only take Column objects. Try it to find out whether it works, and then use it; otherwise, you can make a pull request on the Spark project to add the new signature.
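A quick way to try it, as a sketch (this assumes an active SparkSession named spark; whether the string form is accepted can depend on the function and your Spark version):
import pyspark.sql.functions as F

df = spark.createDataFrame([("ABC",)], ["col_name"])
df.select(F.lower(F.col("col_name"))).show()  # the Column form always works
df.select(F.lower("col_name")).show()         # the string form works only if lower() accepts a string here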

Solutions for a Dynamic Infinite Tree Structure in Python

I am trying to build a tree structure, starting at a point 1, which can branch in infinitely many directions. Every point can branch into infinitely many other points (1.1, 1.2, 1.3, ...) and each of those points can also branch into infinitely many points (1.1.1, 1.2.1, 1.2.2, ...).
My plan was to store an object at every point and be able to refer to each one by a position such as 1.1.1. I also decided to generate every point dynamically, so the tree starts at 1 and only branches when an object is created.
Since I tend to overcomplicate things, I used a nested dictionary, so I could refer to an object using dict[1][1]["data"], but I'm struggling with the use of an arbitrarily nested dictionary:
How do I use a dictionary if the number of "[1]" lookups varies? (Think dict[1][1][1]...[1]["data"].)
I can walk down the nested dict to find the data, along these lines:
point = tree
for k in (1, 1, 1):
    point = point[k]
But I can't find a way to open new dictionary branches, or store data, when the number of "[1]" lookups is unknown.
Basically, I want to know whether a simpler solution exists and how to deal with too many nested "[]" brackets.
You might want a different way of retrieving values than using [], since as you said it's hard to do when you don't know how deep something is.
Instead you can use a simple recursive function, and use a list for your key instead of a string:
def fetch_field(subtree, key_list):
    # No keys left: we've reached the target point, return its stored data
    if not key_list:
        return subtree["data"]
    # Otherwise descend one level and recurse with the remaining keys
    return fetch_field(subtree[key_list[0]], key_list[1:])
key = "1.2.1.3"
# Instead of using a string, split it into a list:
key = key.split(".")
fetch_field(tree, key)
You can tweak the function to accept a string instead of a list if you like; I personally prefer working with a list rather than messing around with strings.
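Since the question also asks about opening new branches and storing data at an arbitrary depth, here is a minimal companion sketch in the same spirit (store_field is a made-up name, and it assumes the same nested layout with a "data" key):
def store_field(subtree, key_list, value):
    if not key_list:
        subtree["data"] = value
        return
    # setdefault opens a new branch when the key does not exist yet
    store_field(subtree.setdefault(key_list[0], {}), key_list[1:], value)

tree = {}
store_field(tree, "1.2.1".split("."), "some object")
print(fetch_field(tree, "1.2.1".split(".")))  # -> some object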

How do I convert a GP to a string and back again using DEAP in Python?

I'm doing a project in Genetic Programming and I need to be able to convert a genetic program (of class deap.creator.Individual) to a string, change some things (while keeping the program 100% syntactically aligned with DEAP), and then put it back into a population of individuals for further evolution.
However, I've only been able to convert my string back to the gp.PrimitiveTree class using the from_string method.
The only constructors for creator.Individual that I can see either generate entire populations blindly or construct an Individual from existing Individuals; there is no method to create a single individual from an existing gp.PrimitiveTree.
So, does anybody have any idea how I go about that?
Note: Individual is self-defined, but it is standard across all DEAP examples and is created using
creator.create("FitnessMax", base.Fitness, weights=(1.0,))
creator.create("Individual", gp.PrimitiveTree, fitness=creator.FitnessMax)
After many many hours I believe I've figured this out.
So, I'd become confused between two of the DEAP modules: 'creator' and 'toolbox'.
In order for me to create an individual with a given PrimitiveTree I simply needed to do:
creator.Individual(myPrimitiveTree)
What you do not do is:
toolbox.individual(myPrimitiveTree)
as that usually gets set up as the initialiser itself, and thus doesn't take arguments.
I hope that this can save somebody a decent chunk of time at some point in the future.
Individual to string: str(individual)
To create an Individual from a string: PrimitiveTree has the class method from_string:
https://deap.readthedocs.io/en/master/api/gp.html#deap.gp.PrimitiveTree.from_string
In your DEAP evolution, to create an individual from a string, you can try something like this (note the use of creator vs toolbox):
creator.Individual.from_string("add(IN0, IN1)", pset)
But the individual expression, as a string, needs to look exactly as it would if you did str(individual); in other words, stick to your pset when creating your string. So for my example string above, I believe you would need a pset similar to:
pset = gp.PrimitiveSetTyped("MAIN", [float]*2, float, "IN")
pset.addPrimitive(operator.add, [float,float], float)
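Putting the pieces together, a minimal round-trip sketch might look like this (the primitive set and the expression string are illustrative; with two arguments and the "IN" prefix, the default argument names are IN0 and IN1):
import operator

from deap import base, creator, gp

pset = gp.PrimitiveSetTyped("MAIN", [float] * 2, float, "IN")
pset.addPrimitive(operator.add, [float, float], float)

creator.create("FitnessMax", base.Fitness, weights=(1.0,))
creator.create("Individual", gp.PrimitiveTree, fitness=creator.FitnessMax)

ind = creator.Individual.from_string("add(IN0, IN1)", pset)
print(str(ind))  # prints the expression back, e.g. add(IN0, IN1)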

Using list elements as variable names in Python

I'm guessing this will be a really simple problem but I have no solution yet.
I have a long piece of code that does modelling and updates the values of variables for optimisation. The code is initially written like this:
def init_old(x, y):
    return {k: olddict[k][x][0] * prod[y] for k in realnames}

Q_house = init_old("Q_house", "P_house")
Q_car = init_old("Q_car", "P_car")
Q_holiday = init_old("Q_holiday", "P_holiday")
I can already simplify it a bit with a comprehension:
ListOfExpenses = ["house", "car", "holiday"]
Q_house, Q_car, Q_holiday = [init_old("Q_" + i, "P0_" + i) for i in ListOfExpenses]
I am trying to find an equivalent but more flexible way of writing that final line, so that I can change the list of expenses and the "Q_..." variables together easily:
ListOfExpenses = ["house", "car", "holiday"]
ListOfCost = ["Q_house", "Q_car", "Q_holiday"]
Elements_Of_ListOfCost = [init_old("Q_" + i, "P0_" + i) for i in ListOfExpenses]
So when I look for Q_house, Q_car or Q_holiday later, it returns the same Q_house = init_old("Q_house", "P_house") calculated in the original code.
I don't want to use dictionaries for now, as they would require major changes to the rest of the code, and calling dictionaries causes problems in some of the other functions. Thanks in advance for the help.

array vs hash key search

So I'm a longtime Perl scripter who's been getting used to Python since I changed jobs a few months back. Often in Perl, if I had a list of values that I needed to check a variable against (simply to see if there is a match in the list), I found it easier to generate a hash to check against, instead of putting the values into an array, like so:
$checklist{'val1'} = undef;
$checklist{'val2'} = undef;
...
if (exists $checklist{$value_to_check}) { ... }
Obviously this wastes some memory because of the need for a useless right-hand value, but IMO it is more efficient and easier to code than looping through an array.
Now in Python, the code for this is exactly the same whether you're searching a list or a dictionary:
if value_to_check in checklist_which_can_be_list_or_dict:
    <code>
So my real question here is: in Perl, the hash method was preferred for speed of processing versus iterating through an array, but is that true in Python? Given that the code is the same, I'm wondering whether Python does list iteration better. Should I still use the dictionary method for larger lists?
Dictionaries are hashes. An in test on a list has to walk through every element to check it against, while an in test on a dictionary uses hashing to see if the key exists. Python just doesn't make you explicitly loop through the list.
Python also has a set datatype. It's basically a hash/dictionary without the right-hand values. If what you want is to be able to build up a collection of things, then test whether something is already in that collection, and you don't care about the order of the things or whether a thing is in the collection multiple times, then a set is exactly what you want!
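For example, a minimal sketch of the set-based membership test (the values are just illustrative):
checklist = {"val1", "val2"}   # a set: hash-based, O(1) average-case membership test
checklist.add("val3")          # build up the collection as you go

value_to_check = "val2"
if value_to_check in checklist:
    print("found it")

# The same test on a list also works, but it scans every element (O(n)):
checklist_as_list = ["val1", "val2", "val3"]
if value_to_check in checklist_as_list:
    print("found it")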
