I am making a PyCharm plugin that will generate code examples based on a Python class. The user will be able to select a class and the plugin will generate code examples that utilise the attributes and methods of the class.
Basic Class
class Rectangle:
# Hardcoded values
width = 5
length = 10
def calculate_area(self):
return self.width*self.length
Basic Example
r = Rectangle()
r.calculate_area()
The problem arises when the class contains methods that take parameters. For example, if the class contains a method that takes a string parameter, then I want the generated examples to pass either an empty or a random string to the method. Is there a good way to analyse the potential type of the parameters (when not strongly defined) to generate code examples?
Potential Solutions
So far I have come up with 2 applicable solutions (passing None parameters when calling the method is not applicable):
A potential solution would be to keep track of the parameters and analyse the operations that they use/are used in to determine their type. In the code below num will most likely be an integer, as determined by the method body:
def calculate_even(num):
return num % 2 == 0
I know that you can execute a string containing Python code and check if it compiles correctly (source). A possible solution (one that I would love to avoid) is to generate code examples, passing different types as method parameters and determining which ones work after executing said code examples.
I can see huge downsides with both solutions, as the first one will require the handling of lots of operations for the different types and will still most likely be inaccurate, and the second one becomes incredibly slow for methods that take multiple parameters. I am really hoping that there's a more elegant solution that I have missed.
Edit
I am looking for a possible solution that won't require the use of type annotations/hints
This may be a straight-up unwise idea so I'd best explain the context. I am finding that some of my functions have multiple and sometimes mutually exclusive or interdependent keyword arguments - ie, they offer the user the ability to input a certain piece of data as (say) a numpy array or a dataframe. And then if a numpy array, an index can be separately passed, but not if it it's a dataframe.
Which has led me to wonder if it's worth creating some kind of keyword parser function to handle these exclusivities/dependencies. One issue with this is that the keyword parser function would then need to return any variables created (and ex-ante, we would not know their number or their names) into the namespace of the function that called it. I'm not sure if that's possible, at least in a reasonable way (I imagine it could be achieved by directly changing the local dict but that's sometimes said to be a bad idea).
So my question is:
1. Is this a bad idea in the first place? Would creating separate functions depending on whether the input was a dataframe or ndarray be more sensible and simpler?
2. Is it possible without too much hacking to have a function return an unspecified number of variables into the local namespace?
Apologies for the slightly vague nature of this question but any thoughts gratefully received.
A dict is a good way to package a variable number of named values. If the parser returns a dict, then there is a single object that can be queried to get those names and values, avoiding the problem of needing to know the number and names ahead of time.
Another possibility would be to put the parser into a class, either as a factory method (classmethod or staticmethod returning an instance) or as a regular method (invoked during or after __init__), where the class instance holds the parsed values.
I wrote a class with many different parameters, depending on the parameter the class uses different actions.
I can easily differentiate my methods into different cases, each set of methods belonging to certain parameters.
This resulted in a huge .py file, implementing all methods in the one class. For better readability, is it possible to write multiple methods in an own file and load it (similar as a package) into the class to treat them as class methods?
To give more details, my class is a decision tree. A parameter for example is the pruning method, used to shrink the tree. As I use different pruning methods, this takes a lot of lines in my class. I need to have a set of methods for each pruning parameter. It would be nice to simply load the methods for pruning from another file into the class and therefore shrinking the size of my decision tree .py file.
For better readability, I got a few suggestions.
Collapse all functions definitions, which can easily be achieved by
one click in most of the popular text editors.
Place related methods next to each other.
Give proper names to categorise and differentiate methods, for
example.
def search_board_first(): pass
def search_deep_first(): pass
Regarding splitting a huge class into a Object oriented behaviour, my rule of thumb is to consider the re-usability. If functions can be reused by other classes, they should be put in separate files and make them independent(static) of other classes.
If the methods are tied to the class and no where else, it is better to just to enclose that method within the class itself. Think it this way, to review the code, you need to refer the class properties anyway. Logically it doesn't quite make sense to split files just for splitting.
I am new to programming and would appreciate any advice regarding my assignment. And before you read what is following, please accept my appologies for being so silly.
Context
Every week I receive several .txt documents. Each document goes something like this:
№;Date;PatientID;Checkpoint;Parameter1;Parameter2;Parameter3/n
1;01.02.2014;0001;1;25.3;24.2;40.0/n
2;01.02.2014;0002;1;22.1;19.1;40.7/n
3;02.02.2014;0003;1;20.1;19.3;44.2/n
4;04.02.2014;0001;2;22.8;16.1;39.3/n
...
The first line contains column names, and every other line represents an observation. In fact there are over 200 columns and about 3000 lines in each .txt file I get. Moreover, every week column names may be slightly different from what they were a week before, and every week the number of observations increases.
My job is to select the observations that satisfy certain parameter requirements and build boxplots for some of the parameters.
What I think I should do
I want to make a program in Python 2.7.6 that would consist of four parts.
Code that would turn every observation into an object, so that I can access attributes like this:
obs1.checkpoint = 1
obs4.patientid = "0001"
I literary want column names to become attribute names.
Having done this it would be nice to create an object for every unique PatientID. And I would like objects representing observations related to this patient to be attributes of patient objects. My goal here is to make it easy to check if from checkpoint 1 to checkpoint 2 the patient's parameter increases.
Code that would select the observations I need.
Code that would build boxplots.
Code that would combine the three parts above into one program.
What I've found so far
I have found some working code that dynamically adds attributes to instances:
http://znasibov.info/blog/html/2010/03/10/python-classes-dynamic-properties.html
I'm afraid I don't fully understand how it works yet, but I think it might be of use in my case to turn column names into attribute names.
I have also found that creating variables dynamically is frowned upon:
http://nedbatchelder.com/blog/201112/keep_data_out_of_your_variable_names.html
Questions
Is it a good idea to turn every line in the table into an object?
If so, how do I go about it?
How do I create as many objects as there are lines in the table, and how do I name these objects?
What is the best way to turn column names into class attributes?
I'm new to Python, and I'm learning about the class function. Does anyone know of any good sources/examples for this? Here's an example I wrote up. I'm trying to read more about self and init
class Example:
def __init__(a, b, c, d):
self.a = a
self.b = b
self.c = c
self.d = d
test = Example(1, 1, 1, 1)
I've been to python.org, as well as this site. I've also been reading beginners Python books, but I'd like more information.
Thanks.
A couple of generic clarifications here:
Python's "class" keyword is not a function, it's a statement which signals to the language that the following code describes a class (a user-defined data type and its associated behavior). "class" takes a name (and a possibly empty list of "parent" classes) ... and introduces a "suite" (an indented block of code).
The "def" keyboard is a similar statement which defines a function. In your example, which should have read: *def _init_(self, a, b, c)*) you're defining a special type of function which is "part of" (associated with, bound to) the Example class. It's also possible (and fairly common) to create unbound functions. Commonly, in Python, unbound functions are simple called "functions" while those which are part of a class are called "methods" ... or "instance functions."
classes are templates for instantiating objects. The terms "instance" and "object" are synonymous in this context. Your example "test" is an instance ... and the Python interpreter "instantiates" and initializes that object according to the class description.
A class is also a "type", that is to say that it's a user definition of a type of data and its associated methods. "class" and "type" are somewhat synonymous in Python though they are conventionally used in different ways. The core "types" of Python data (integers, real numbers, imaginary/complex numbers, strings, lists, tuples, and dictionaries) are all referred to as "types" while the more complex data/operational structures are called classes. Early versions of Python were implemented with constraints that made the distinction between "type" and "class" more than merely a matter of terminological difference. However, the last several versions of Python have eliminated those underlying technical distinctions. Those distinctions related to "subclassing" (inheritance).
classes can be described as a set of additions and modifications to another class. This is called "inheritance" and the class which is derived from another in this manner is referred to as a "subclass." It's common for programmers to create hierarchies of classes ... with specific variations all deriving from more common bases. It's also common to define related functionality within the same files or sets of files. These are "class libraries" and sometimes they are built as "packages."
_init_() is a method; in particular it's the initializer for Python objects (instances of a class).
Python generally uses _..._ (prefixing and suffixing pairs of underscore characters around selected keywords) for "special" method or attribute names ... which is intended to reduce the likelihood that its naming choices will conflict with the meaningful names that you might wish to give to your own methods. While you can name your other methods and attributes with this _XXXX_ --- Python will not inherently treat that as an error --- it's an extremely bad idea to do so. Even if you don't pick any of the currently defined special names there's no guarantee that some future version of Python won't conflict with your usage later.
"methods" are functions ... but they are a type of function which is bound (associated with) a particular instance of a particular class. There are also "class methods" which are associated with the class rather than with a specific instance of the class.
In your example self.b, self.c and so on are "attributes" or "members" of the object (instance). (The terms are synonymous).
In general the purpose of object orient programming is to provide ways of describing types of data and operations on those types of data in a way that's amenable to both human comprehension and computerized interpretation and execution. (All programming languages are intended to strike some balance between human readability/comprehension and machine parsing/execution; but object-oriented languages do so specifically with a focus on the description and definition of "types," and the instantiation of those types into "objects" and finally the various interactions among those objects.
"self" is a Python specific convention for naming the special (and generally required) first argument to any bound method. It's a "self" reference (a way for the code in a method to refer to the object/instance's own attributes and other methods without any ambiguity). While you can call your first argument to your bound methods "a" (as you've unwittingly done in your Example) it's an extremely bad idea to do so. Not only will it likely confuse you later ... it will make no sense to anyone else trying to read your Python code).
The term "object-oriented" is confusing unless one is aware of the comparisons to other forms of programming language. It's an evolution from "procedural" programming. The simplest gist of that comparison is when you consider the sorts of functions one would define in a procedural language were one might have to define and separately name different functions to perform analogous operations on different types of data: print_student_record(this_student) vs. print_teacher_report(some_report) --- a programming model which necessitates a fair amount of overhead on the part of the programmer, to keep track of which functions work on which types. This sort of problem is eliminated in OO (object oriented) programming where one can, conceivably, call on this.print_() ... and, assuming one has created compatible classes, this will "print" regardless of whether "this" is a student (record/object) or a teacher (report/record/object). That's greatly oversimplified but useful for understanding the pressures which led to the development and adoption of OO based programming.
In Python it's possible to create classes with little or no functionality. Your example does nothing, yet, but transfer a set of arguments into "attributes" (members) during initialization (instantiation). After that you could use these attributes in programming statements like: test.a += 1 and print (test.a). This is possible because Python is a "multi-paradigm" language. It supports procedural as well as object-orient programming styles. Objects used this way are very similar to "structs" from the C programming language (predecessor to C++) and to the "records" in Pascal, etc. That style of programming is largely considered to be obsolete (particularly when using a modern, "very high level" language such as Python).
The gist of what I'm getting at is this ... you'll want to learn how to think of your data as the combination of it's "parts" (attributes) and the functionality that changes, manipulates, validates, and handles input, output, and possibly storage, of those attributes.
For example if you were writing a "code breaker" program to solve simply ciphers you might implement a "Histogram" object which counts the letter frequencies of a given coded message. That would have attributes (one integer for every letter) and behavior (feeding ports of the coded message(s) into the instance, splitting the strings into individual characters, filtering out all the non-letter characters, converting all the letters to upper or lower case, and counting them --- that is incrementing the integer corresponding to each letter). Additionally you'd need to have some way of querying the histogram ... for example getting list of the letters sorted by their frequency in the cipher text.
Once you had such a "histogram" class then you could think of ways to use that for your solver. For example to solve a cryptogram puzzle you might computer the histogram then try substituting "etaon" for the five most common ciphered letters ... then check how many of the "partial" strings (t.e for "the") match words, trying permutations, and so on. Each of these might be it's own class. A key point of programming is that your histogram class might be useful for counting all sorts of other things (even in a simple voting system or popularity context). A particular subclass or instantiation might make it a histogram of letters while others could be re-used for other types of "things" you want counted. Similarly the code that iterates over permutions of some list might be used in any number of simulation, optimization, and related programs. (In fact Python's standard libraries already including "counters" and "permutations" functions in the "collections" and "itertools" modules, respectively).
Of course you're going to hear of all of these concepts repeatedly as you study programming. This has been a rather rambling attempt to kickstart that process. I know I've been a bit repetitious in a few points here --- part of that is because I'm typing this a 4am after having started work at 7am yesterday; but part of it serves a pedagogical purpose as well.
There's an error in your class definition. Always include the variable self in your __init__ method. It represents the instance of the object itself and should be included as the first parameter to all of your methods.
What do you want to accomplish with this class? Up until now, it just stores a few variables. Try adding a few methods to spice things up a little bit! There is a plethora of available resources on classes in Python. To start, you may wanna give this one a try:
Python Programming - Classes
I'm learning python now also, and there is an intro class that's pretty good on codecademy.com
http://www.codecademy.com/tracks/python
It has a section that goes through an exercise on classes. Hope this helps