Alternative to locals() for printing a table with a header - Python

[Python 3.1]
Edit: fixed a mistake in the original code.
I need to print a table. The first row should be a header, which consists of column names separated by tabs. The following rows should contain the data (also tab-separated).
To clarify, let's say I have columns "speed", "power", "weight". I originally wrote the following code, with the help from a related question I asked earlier:
column_names = ['speed', 'power', 'weight']

def f(row_number):
    # some calculations here to populate variables speed, power, weight
    # e.g., power = retrieve_avg_power(row_number) * 2.5
    # e.g., speed = math.sqrt(power) / 2
    # etc.
    locals_ = locals()
    return {x: locals_[x] for x in column_names}

def print_table(rows):
    print(*column_names, sep='\t')
    for row_number in range(rows):
        row = f(row_number)
        print(*[row[x] for x in component_names], sep='\t')
But then I learned that I should avoid using locals() if possible.
Now I'm stuck. I don't want to type the list of all the column names more than once. I don't want to rely on the fact that every dictionary I create inside f() is likely to iterate through its keys in the same order. And I don't want to use locals().
Note that the functions print_table() and f() do a lot of other stuff; so I have to keep them separate.
How should I write the code?

class Columns:
    pass

def f(row_number):
    c = Columns()
    c.power = retrieve_avg_power(row_number) * 2.5
    c.speed = math.sqrt(c.power) / 2
    return c.__dict__
This also lets you specify which of the variables are meant as columns, as opposed to the ones that are merely temporaries inside the function.

You could use an OrderedDict to fix the order of the dictionaries. But as I see it, that isn't even necessary: you always take the keys from the column_names list (except in the last line, where I assume component_names is a typo for column_names), so the order of the values will always be the same.
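For illustration, a minimal sketch of the OrderedDict variant (retrieve_avg_power() is assumed to exist, as in the question, and the weight computation is elided):

import math
from collections import OrderedDict

def f(row_number):
    row = OrderedDict()  # iterates its keys in insertion order
    row['power'] = retrieve_avg_power(row_number) * 2.5
    row['speed'] = math.sqrt(row['power']) / 2
    return row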

An alternative to locals() is to use the inspect module:
import inspect

def f(row_number):
    # some calculations here to populate variables speed, power, weight
    # e.g., power = retrieve_avg_power(row_number) * 2.5
    # e.g., speed = math.sqrt(power) / 2
    # etc.
    locals_ = inspect.currentframe().f_locals
    return {x: locals_[x] for x in column_names}

Related

Change part of a CSV string with pandas

I would like to replace the word consolidation (which appears twice in the string below) with another value taken from a variable (e.g. breakout / outofconsolidation / inside).
Can you help me achieve this, please?
dfconsolidationcsv.to_csv(r'symbols\stocks_consolidation_sp500.csv', index = False)
a = 'breakout'
df{a}csv.to_csv(r'symbols\stocks_{a}_sp500.csv', index = False)
Unless there is a justifiable reason to be creating dynamic variable assignments, I would avoid doing so. In this case, defining your DataFrame variables in a dict is probably sufficient:
# store each df in a dict instead of separate variables
df_dict = dict()
df_dict['consolidation'] = dfconsolidationcsv
df_dict['breakout'] = dfbreakoutcsv
...
# invoke the command for a specific variable
a = 'breakout'
df_dict[a].to_csv(r'symbols\stocks_%s_sp500.csv' % a, index=False)
Now, if there is an overwhelming reason why you HAVE to use pre-existing variable names that need to be changed dynamically, I think you can do something like this:
a = 'breakout'
exec("df%scsv.to_csv(r'symbols\stocks_%s_sp500.csv', index=False)" % (a, a))

Dynamically creating a class

I have a function which returns me two lists, symbols and data where the corresponding values are with the same index. For example symbols[i] gives the variable name and data[i] gives the actual value (int).
I would like to use these two lists to dynamically create a class with static values of the following format:
class a:
    symbols[i] = data[i]
    symbols[i+1] = data[i+1]
and so on so that I could later refer to the values like this:
a.symbols[i]
a.symbols[i+1]
where symbols[i] and symbols[i+1] should be replaced with the wanted variable name, like a.var1 or a.var2
How could this be achieved?
Edit: added detail below
So I have a main program, let's say def main(), which should read in a list.dat of this style:
dimension1;0.1
dimension2;0.03
dimension3;0.15
and separate the values into symbols and data lists.
So I don't know exactly how many values there are in these lists. I want to create a class dynamically to be able to refer to the values in the main program, and to give the class to sub functions as an argument, like def sub1(NewClass, argument1, argument2) etc. At the moment I am using a manually created simple Python list (list.py) of the following format:
dimension1 = 0.1
dimension2 = 0.03
dimension3 = 0.15
and then using from list import * in the main program and also in the sub functions, which causes a SyntaxWarning telling me that import * is only allowed at module level. So what I actually want is a smart and consistent way of handling the parameter list and passing it to other functions.
You can create a class dynamically with type. If I understand what you want to achieve here, your code will look like:
my_classes = []
for i in range(0, len(data), 2):
    my_classes.append(
        type('A%d' % i, (), {'var1': data[i], 'var2': data[i+1]})
    )
I suspect what you actually want, re-reading the description, is to use type as follows:
NewClass = type('NewClass', (object,), dict(zip(symbols, data)))
Given a minimal example:
>>> symbols = 'foo bar baz'.split()
>>> data = range(3)
The outcome would be:
>>> NewClass.foo
0
>>> NewClass.bar
1
>>> NewClass.baz
2
Using zip allows you to easily create a dictionary from a list of keys and a list of associated values, which you can use as the __dict__ for your new class.
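Putting the pieces together, a minimal sketch that parses the list.dat format from the question and builds such a class (the file name and ';' separator are taken from the question; load_params is a hypothetical helper):

def load_params(path='list.dat'):
    symbols, data = [], []
    with open(path) as fh:
        for line in fh:
            name, _, value = line.strip().partition(';')
            symbols.append(name)
            data.append(float(value))
    return type('Params', (object,), dict(zip(symbols, data)))

params = load_params()
print(params.dimension1)  # -> 0.1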
However, it's not clear why you want this to be a class, specifically.

Find Nth item in comma separated list in Python

I have a large CSV with comma separated lines of varying length. Sorting another set of data I used split(',') in a loop to separate fields, but this method requires each line to have the same number of entries. Is there a way I can look at a line and, independent of the total number of entries, just pull the Nth item? For reference, the method I was using will only work with a line that looks like AAA,BBB,CCC,DDD
entry = 'A,B,C,D'
(a,b,c,d) = entry.split(',')
print a,b,c,d
But I would like to pull A and C even if it looks like A,B,C,D,E,F or A,B,C.
Use a list instead of separate variables.
values = entry.split(',')
print values[0], values[2]
Just use a list:
xyzzy = entry.split(",");
print xyzzy[0], xyzzy[2]
But be aware that, once you allow the possibility of variable element counts, you'd probably better allow for too few:
entry = 'A,B'
xyzzy = entry.split(",")
(a, c) = ('?', '?')
if len(xyzzy) > 0: a = xyzzy[0]
if len(xyzzy) > 2: c = xyzzy[2]
print a, c
If you don't want to index the results, it's not difficult to write your own function to deal with the situation where there are either too few or too many values. Although it requires a few more lines of code to set up, an advantage is that you can give the results meaningful names instead of anonymous ones like results[0] and results[2].
def splitter(s, take, sep=',', default=None):
    r = s.split(sep)
    if len(r) < take:
        r.extend(default for _ in xrange(take - len(r)))
    return r[:take]

entry = 'A,B,C'
a, b, c, d = splitter(entry, 4)
print a, b, c, d  # --> A B C None

entry = 'A,B,C,D,E,F'
a, b, c, d = splitter(entry, 4)
print a, b, c, d  # --> A B C D

References to references in Python

I have a function that takes given initial conditions for a set of variables and puts the result into another global variable. For example, let's say two of these variables are x and y. Note that x and y must be global variables (because it is too messy/inconvenient to pass large numbers of references between many functions).
x = 1
y = 2

def myFunction():
    global x, y, solution
    print(x)
    < some code that evaluates using a while loop >
    solution = <the result from many iterations of the while loop>
I want to see how the result changes given a change in the initial condition of x and y (and other variables). For flexibility and scalability, I want to do something like this:
varSet = {'genericName0': x, 'genericName1': y}  # dict of all variables whose initial conditions I wish to alter
R = list(range(10))
for r in R:
    varSet['genericName0'] = r  # This doesn't work the way I want...
    myFunction()
Such that the 'print' line in 'myFunction' outputs the values 0,1,2,...,9 on successive calls.
So basically I'm asking how do you map a key to a value, where the value isn't a standard data type (like an int) but is instead a reference to another value? And having done that, how do you reference that value?
If it's not possible to do it the way I intend: What is the best way to change the value of any given variable by changing the name (of the variable that you wish to set) only?
I'm using Python 3.4, so would prefer a solution that works for Python 3.
EDIT: Fixed up minor syntax problems.
EDIT2: I think maybe a clearer way to ask my question is this:
Consider that you have two dictionaries, one containing round objects and the other containing fruit. An item can belong to both (apples are fruit and round). Now suppose the key 'apple' appears in both dictionaries, and its value is the number of apples. When updating the number of apples in one dictionary, you want that number to carry over to the round-objects dictionary under the key 'apple', without updating the other dictionary manually yourself. What's the most pythonic way to handle this?
Instead of making x and y global variables with a separate dictionary to refer to them, make the dictionary directly contain "x" and "y" as keys.
varSet = {'x': 1, 'y': 2}
Then, in your code, whenever you want to refer to these parameters, use varSet['x'] and varSet['y']. When you want to update them use varSet['x'] = newValue and so on. This way the dictionary will always be "up to date" and you don't need to store references to anything.
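A minimal sketch of that approach, with myFunction reworked to read from the dict (the actual computation is a placeholder):

varSet = {'x': 1, 'y': 2}

def myFunction():
    print(varSet['x'])  # always sees the current value
    # <while loop elided>
    varSet['solution'] = varSet['x'] + varSet['y']  # placeholder computation

for r in range(10):
    varSet['x'] = r  # the update is visible inside myFunction
    myFunction()     # prints 0, 1, 2, ..., 9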
Let's take the fruit example from your second edit:
def set_round_val(fruit_dict, round_dict):
    fruit_set = set(fruit_dict)
    round_set = set(round_dict)
    common_set = fruit_set.intersection(round_set)  # keys present in both dicts
    for key in common_set:
        round_dict[key] = fruit_dict[key]  # copy the modified value into round_dict
    return round_dict

fruit_dict = {'apple': 34, 'orange': 30, 'mango': 20}
round_dict = {'bamboo': 10, 'apple': 34, 'orange': 20}  # values can even be the same as fruit_dict

for r in range(1, 10):
    fruit_dict['apple'] = r
    round_dict = set_round_val(fruit_dict, round_dict)
    print(round_dict)
Hope this helps.
From what I've gathered from the responses from #BrenBarn and #ebarr, this is the best way to go about the problem (and directly answer EDIT2).
Create a class which encapsulates the common variable:
class Count:
    def __init__(self, value):
        self.value = value
Create instances of that class (the Count class above is assumed to be saved in Count.py):
import Count

no_of_apples = Count.Count(1)
no_of_tennis_balls = Count.Count(5)
no_of_bananas = Count.Count(7)
Create dictionaries with the common variable in both of them:
round = {'tennis_ball': no_of_tennis_balls, 'apple': no_of_apples}
fruit = {'banana': no_of_bananas, 'apple': no_of_apples}

print(round['apple'].value)  # prints 1
fruit['apple'].value = 2
print(round['apple'].value)  # prints 2

Converting an imperative algorithm that "grows" a table into pure functions

My program, written in Python 3, has many places where it starts with a (very large) table-like numeric data structure and adds columns to it following a certain algorithm. (The algorithm is different in every place.)
I am trying to convert this into a pure functional approach, since I run into problems with the imperative one (hard to reuse, hard to memoize interim steps, hard to achieve "lazy" computation, bug-prone due to reliance on state, etc.).
The Table class is implemented as a dictionary of dictionaries: the outer dictionary contains rows, indexed by row_id; the inner contains values within a row, indexed by column_title. The table's methods are very simple:
# return the value at the specified row_id, column_title
get_value(self, row_id, column_title)
# return the inner dictionary representing row given by row_id
get_row(self, row_id)
# add a column new_column_title, defined by func
# func signature must be: take a row and return a value
add_column(self, new_column_title, func)
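For concreteness, a minimal sketch of that dict-of-dicts Table as described (the imperative version the question starts from; the method bodies are assumptions based on the comments above):

class Table:
    def __init__(self):
        self.rows = {}  # row_id -> {column_title: value}

    def get_value(self, row_id, column_title):
        return self.rows[row_id][column_title]

    def get_row(self, row_id):
        return self.rows[row_id]

    def add_column(self, new_column_title, func):
        # func takes a row (an inner dict) and returns the new value
        for row in self.rows.values():
            row[new_column_title] = func(row)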
Until now, I simply added columns to the original table, and each function took the whole table as an argument. As I'm moving to pure functions, I'll have to make all arguments immutable. So, the initial table becomes immutable. Any additional columns will be created as standalone columns and passed only to those functions that need them. A typical function would take the initial table, and a few columns that are already created, and return a new column.
The problem I run into is how to implement the standalone column (Column)?
I could make each of them a dictionary, but it seems very expensive. Indeed, if I ever need to perform an operation on, say, 10 fields in each logical row, I'll need to do 10 dictionary lookups. And on top of that, each column will contain both the key and the value, doubling its size.
I could make Column a simple list, and store in it a reference to the mapping from row_id to the array index. The benefit is that this mapping could be shared between all columns that correspond to the same initial table; furthermore, once a row_id is looked up, the resulting index works for all of those columns. But does this create any other problems?
If I do this, can I go further, and actually store the mapping inside the initial table itself? And can I place references from the Column objects back to the initial table from which they were created? It seems very different from how I imagined a functional approach to work, but I cannot see what problems it would cause, since everything is immutable.
In general, does the functional approach frown on the return value keeping a reference to one of the arguments? It doesn't seem like it would break anything (like optimization or lazy evaluation), since the argument was already known anyway. But maybe I'm missing something.
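For illustration, a minimal sketch of the shared-mapping idea for Column described above (all names are hypothetical):

class Column:
    """A standalone column: values plus a shared row_id -> index mapping."""
    def __init__(self, index_map, values):
        self.index_map = index_map  # shared by all columns of the same table
        self.values = values

    def get(self, row_id):
        return self.values[self.index_map[row_id]]

index_map = {'r1': 0, 'r2': 1}  # built once per initial table
speed = Column(index_map, [3.0, 4.5])
power = Column(index_map, [22.5, 50.6])
print(speed.get('r2'), power.get('r2'))  # 4.5 50.6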
Here is how I would do it:

Derive your table class from frozenset. Each row should be a subclass of tuple.

Now you can't modify the table -> immutability, great! The next step could be to consider each function a mutation which you apply to the table to produce a new one:

f T -> T'

That should be read as "apply the function f on the table T to produce a new table T'". You may also try to objectify the actual processing of the table data and see it as an Action which you apply or add to the table:

add(T, A) -> T'

The great thing here is that add could be subtract instead, giving you an easy way to model undo. When you get into this mindset, your code becomes very easy to reason about because you have no state that can screw things up.

Below is an example of how one could implement and process a table structure in a purely functional way in Python. Imho, Python is not the best language to learn FP in, because it makes it too easy to program imperatively. Haskell, F# or Erlang are better choices, I think.
class Table(frozenset):
    def __new__(cls, names, rows):
        return frozenset.__new__(cls, rows)

    def __init__(self, names, rows):
        self.names = names

def add_column(rows, func):
    return [row + (func(row, idx),) for (idx, row) in enumerate(rows)]

def table_process(t, action):
    name, func = action
    return Table(
        t.names + (name,),
        add_column(t, lambda row, idx: func(row))
    )

def table_filter(t, action):
    name, func = action
    names = t.names
    idx = names.index(name)
    return Table(
        names,
        [row for row in t if func(row[idx])]
    )

def table_rank(t, name):
    names = t.names
    idx = names.index(name)
    rows = sorted(t, key=lambda row: row[idx])
    return Table(
        names + ('rank',),
        add_column(rows, lambda row, idx: idx)
    )

def table_print(t):
    format_row = lambda r: ' '.join('%15s' % c for c in r)
    print(format_row(t.names))
    print('\n'.join(format_row(row) for row in t))

if __name__ == '__main__':
    from functools import reduce
    from random import randint

    cols = ('c1', 'c2', 'c3')
    T = Table(
        cols,
        [tuple(randint(0, 9) for x in cols) for x in range(10)]
    )
    table_print(T)

    # Columns to add to the table; this is a perfect fit for a
    # reduce. I'd honestly use a boring for loop instead, but reduce
    # is a perfect example of how in FP data and code "become one."
    # In fact, this whole program could have been written as just one
    # big reduce.
    actions = [
        ('max', max),
        ('min', min),
        ('sum', sum),
        ('avg', lambda r: sum(r) / float(len(r)))
    ]
    T = reduce(table_process, actions, T)
    table_print(T)

    # Ranking is different because it requires an ordering, which a
    # table does not have.
    T2 = table_rank(T, 'sum')
    table_print(T2)

    # Simple where filter: select * from T2 where c2 < 5.
    T3 = table_filter(T2, ('c2', lambda c: c < 5))
    table_print(T3)
