String representation of attribute in Python - python

Is there a way to get the string representation of a single NamedTuple (or perhaps another type's) attribute?
The snippet below instantiates a pandas.Series indexing a value using a str key matching where the value came from.
If you later want to change the name of attr_1 to some_useful_attribute_name, and want this reflected in the pandas series index, the code below requires two changes.
import pandas as pd
from typing import NamedTuple
class MyTuple(NamedTuple):
attr_1: int
attr_2: float
mt = MyTuple(1, 1)
ser = pd.Series(data=[mt.attr_1], index=['attr_1'])
People often miss the second change when the series instantiation is far away from the class definition.
Refactoring tools such as Pycharm's help somewhat, as they can identify all the attr_1 strings. However, if instead of attr_1 the string is more common, such as level or whatever else, it becomes rather tedious to identify the correct strings to change.
Instead of the last line of code above I'd therefore like to do something like
ser = pd.Series(data=[mt.attr_1], index=[repr(mt.attr_1)])
except that of course repr() doesn't give me the name of the attribute, but rather the string representation of its contents.
I am also aware of the NamedTuple._asdict() method. However the use case here is one where only selected attributes go into the series rather than the entire dictionary.
EDIT:
Might something be achieved by combining Enum and NamedTuple in a clever way?
Thanks for the help.

Related

What are the advantages of using column objects instead of strings in PySpark

In PySpark one can use column objects and strings to select columns. Both ways return the same result. Is there any difference? When should I use column objects instead of strings?
For example, I can use a column object:
import pyspark.sql.functions as F
df.select(F.lower(F.col('col_name')))
# or
df.select(F.lower(df['col_name']))
# or
df.select(F.lower(df.col_name))
Or I can use a string instead and get the same result:
df.select(F.lower('col_name'))
What are the advantages of using column objects instead of strings in PySpark
Read this PySpark style guide from Palantir here which explains when to use F.col() and not and best practices.
Git Link here
In many situations the first style can be simpler, shorter and visually less polluted. However, we have found that it faces a number of limitations, that lead us to prefer the second style:
If the dataframe variable name is large, expressions involving it quickly become unwieldy;
If the column name has a space or other unsupported character, the bracket operator must be used instead. This generates inconsistency, and df1['colA'] is just as difficult to write as F.col('colA');
Column expressions involving the dataframe aren't reusable and can't be used for defining abstract functions;
Renaming a dataframe variable can be error-prone, as all column references must be updated in tandem.
Additionally, the dot syntax encourages use of short and non-descriptive variable names for the dataframes, which we have found to be harmful for maintainability. Remember that dataframes are containers for data, and descriptive names is a helpful way to quickly set expectations about what's contained within.
By contrast, F.col('colA') will always reference a column designated colA in the dataframe being operated on, named df, in this case. It does not require keeping track of other dataframes' states at all, so the code becomes more local and less susceptible to "spooky interaction at a distance," which is often challenging to debug.
It depends on how the functions are implemented in Scala.
In scala, the signature of the function is part of the function itself. For example, func(foo: str) and func(bar: int) are two different functions and Scala can make the difference whether you call one or the other depending on the type of argument you use.
F.col('col_name')), df['col_name'] and df.col_name are the same type of object, a column. It is almost the same to use one syntax or another. A little difference is that you could write for example :
df_2.select(F.lower(df.col_name)) # Where the column is from another dataframe
# Spoiler alert : It may raise an error !!
When you call df.select(F.lower('col_name')), if the function lower(smth: str) is not defined in Scala, then you will have an error. Some functions are defined with str as input, others take only columns object. Try it to know if it works and then uses it. otherwise, you can make a pull request on the spark project to add the new signature.

What is the Python equivalent for the R function names( )?

The function names() in R gets or sets the names of an object. What is the Python equivalent to this function, including import?
Usage:
names(x)
names(x) <- value
Arguments:
(x) an R object.
(value) a character vector of up to the same length as x, or NULL.
Details:
Names() is a generic accessor function, and names<- is a generic replacement function. The default methods get and set the "names" attribute of a vector (including a list) or pairlist.
Continue R Documentation on Names( )
In Python (pandas) we have .columns function which is equivalent to names() function in R:
Ex:
# Import pandas package
import pandas as pd
# making data frame
data = pd.read_csv("Filename.csv")
# Extract column names
list(data.columns)
not sure if there is anything directly equivalent, especially for getting names. some objects, like dicts, provide .keys() method that allows getting things out
sort of relevant are the getattr and setattr primitives, but it's pretty rare to use these in production code
I was going to talk about Pandas, but I see user2357112 has just pointed that out already!
There is no equivalent. The concept does not exist in Python. Some specific types have roughly analogous concepts, like the index of a Pandas Series, but arbitrary Python sequence types don't have names for their elements.

use string content as variable - python

I am using a package that has operations inside the class (? not sure what either is really), and normally the data is called this way data[package.operation]. Since I have to do multiple operations thought of shortening it and do the following
list =["o1", "o2", "o3", "o4", "o5", "o6"]
for i in list:
print data[package.i]
but since it's considering i as a string it doesnt do the operation, and if I take away the string then it is an undefined variable. Is there a way to go around this? Or will I just have to write it the long way?.
In particular I am using pymatgen, its package Orbital and with the .operation I want to call specific suborbitals. A real example of how it would be used is data[0][Orbital.s], the first [0] denotes the element in question for which to get the orbitals s (that's why I omitted it in the code above).
You can use getattr in order to dynamically select attributes from objects (the Orbital package in your case; for example getattr(Orbital, 's')).
So your loop would be rewritten to:
for op in ['o1', 'o2', 'o3', 'o4', 'o5', 'o6']:
print(data[getattr(package, op)])

How do I convert a GP to a string and back again using DEAP in Python?

I'm doing a project in Genetic Programming and I need to be able to convert a genetic program (of class deap.creator.Individual) to string, change some things (while keeping the problem 100% syntactically aligned with DEAP), and then put it back into a population of individuals for further evolution.
However, I've only been able to convert my string back to class gp.PrimitiveTree using the from_string method.
The only constructors for creator.Individual I see generate entire populations blindly or construct an Individual from an existing Individual/s. No methods to only create one individual from an existing gp.PrimitiveTree.
So, does anybody have any idea how I go about that?
Note: Individual is self-defined, but it is standard across all DEAP examples and is created using
creator.create("FitnessMax", base.Fitness, weights=(1.0,))
creator.create("Individual", gp.PrimitiveTree, fitness=creator.FitnessMax)
After many many hours I believe I've figured this out.
So, I'd become confused between two of the DEAP modules: 'creator' and 'toolbox'.
In order for me to create an individual with a given PrimitiveTree I simply needed to do:
creator.Individual(myPrimativeTree)
What you do not do is:
toolbox.individual(myPrimativeTree)
as that usually gets setup as the initialiser itself, and thus doesn't take arguments.
I hope that this can save somebody a decent chunk of time at some point in the future.
Individual to string: str(individual)
In order to create an Individual from a string: Primitive Tree has class method from_string:
https://deap.readthedocs.io/en/master/api/gp.html#deap.gp.PrimitiveTree.from_string
In your Deap evolution, to create an individual from string, you can try something like(note use of creator vs toolbox):
creator.Individual.from_string("add(IN1, IN2)", pset)
But the individual expression, as a string, needs to be such as it would if you did str(individual), aka stick to your pset when creating your string. So in my above example string I believe you would need to have a pset similar to:
pset = gp.PrimitiveSetTyped("MAIN", [float]*2, float, "IN")
pset.addPrimitive(operator.add, [float,float], float)

How do I get a formatted list of methods from a given object?

I'm a beginner, and the answers I've found online so far for this have been too complicated to be useful, so I'm looking for an answer in vocabulary and complexity similar to this writing.
I'm using python 2.7 in ipython notebook environment, along with related modules as distributed by anaconda, and I need to learn about the library-specific objects in the course of my daily work. The case I'm using here is a pandas dataframe object but the answer must work for any object of python or of an imported module.
I want to be able to print a list of methods for the given object. Directly from my program, in a concise and readable format. Even if it's just the method names in a list by alphabetical order, that would be great. A bit more detail would be even better, an ordering based on what it does is fine, but I'd like the output to look like a table, one row per method, and not big blocks of text. What i've tried is below, and it fails for me because it's unreadable. It puts copies of my data between each line, and it has no formatting.
(I love stackoverflow. I aspire to have enough points someday to upvote all your wonderful answers.)
import pandas
import inspect
data_json = """{"0":{"comment":"I won\'t go to school"}, "1":{"note":"Then you must stay in bed"}}"""
data_df = pandas.io.json.read_json(data_json, typ='frame',
dtype=True, convert_axes=True,
convert_dates=True, keep_default_dates=True,
numpy=False, precise_float=False,
date_unit=None)
inspect.getmembers(data_df, inspect.ismethod)
Thanks,
- Sharon
Create an object of type str:
name = "Fido"
List all its attributes (there are no “methods” in Python) in alphabetical order:
for attr in sorted(dir(name)):
print attr
Get more information about the lower (function) attribute:
print(name.lower.__doc__)
In an interactive session, you can also use the more convenient
help(name.lower)
function.

Categories