For loop inside a function - python

I'm still learning Python, and I'd love to know a way to make the following work:
a_function(
    for n,item in enumerate(list):
        inside_function(code code code,
                        code code code,
                        code code code))
So there's a function nested inside another function, and I need to run the inside function a number of times, but not the outside one. The code I'm working with is not mine so I can't change the way these functions work.
I can edit in the actual code if anyone needs it; it's something from PyChart.
Edit: actual code:
ar = area.T(y_coord = category_coord.T(data, 0),
            x_grid_style=line_style.gray50_dash1,
            x_grid_interval=chartlength/5, x_range = (0,chartlength),
            x_axis=axis.X(label="X label"),
            y_axis=axis.Y(label="Y label"))
chart_object.set_defaults(interval_bar_plot.T, direction="horizontal",
                          width=5, cluster_sep = 0, data=data)
ar.add_plot(
    for n,item in enumerate(compactlist):
        interval_bar_plot.T(line_styles = [None, None],
                            fill_styles = [fill_style.red, None],
                            label=compactlist[n], cluster=(n,len(compactlist)))
)
can = canvas.default_canvas()
can.set_title("Chromosome comparison")
can.set_author("Foo")
ar.draw()
The ar.add_plot function creates a working area in the canvas (as I understand it), while the interval_bar_plot function creates the bars, one by one. So I need multiple interval_bar_plot calls but only the one add_plot call, or it simply repeats the first bar n times.
Edit: and the error:
File "intvlbar.py", line 105
for n,item in enumerate(compactlist):
^
SyntaxError: invalid syntax

What you are trying to do is pass several bar plot objects to the add_plot method (documented here). One way you can do this is to pass them each explicitly. For example:
ar.add_plot(bar1, bar2, bar3)
Examples of this are in the sample code sections of the PyChart documentation for bar plots and interval bar plots, for example.
You do not want to do this because your compactlist might be inconveniently long or of varying length between runs. Another option is to use argument unpacking. Create a list containing your bar plot objects:
bars = [interval_bar_plot.T(line_styles = [None, None],
                            fill_styles = [fill_style.red, None],
                            label=compactlist[n], cluster=(n,len(compactlist)))
        for n,item in enumerate(compactlist)]
Now call add_plot with your bars:
ar.add_plot(*bars)
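If the star syntax is new to you, here is a minimal, self-contained sketch of what it does (add_plot_demo is a hypothetical stand-in for a variadic method like add_plot):
def add_plot_demo(*plots):
    # *plots gathers however many positional arguments were passed into a tuple
    return len(plots)

bars = ["bar1", "bar2", "bar3"]
add_plot_demo(*bars)   # identical to add_plot_demo("bar1", "bar2", "bar3") -> 3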

The error you are getting is because a for loop is a statement, not an expression: it does not produce a value by itself, yet it has been placed inside the call ar.add_plot() where the arguments should go. So Python is telling you "ar.add_plot() needs arguments, but this for loop isn't going to give them to me".
What parameters does ar.add_plot() need?
You need something closer to this (though this probably isn't correct):
ar.add_plot()
for n,item in enumerate(compactlist):
    interval_bar_plot.T(line_styles = [None, None],
                        fill_styles = [fill_style.red, None],
                        label=compactlist[n], cluster=(n,len(compactlist)))
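A version of that idea that should actually run calls add_plot once per bar inside the loop. This is only a sketch, assuming repeated add_plot calls accumulate plots on the area (consistent with the other answer passing several plots in one call):
for n,item in enumerate(compactlist):
    bar = interval_bar_plot.T(line_styles = [None, None],
                              fill_styles = [fill_style.red, None],
                              label=compactlist[n], cluster=(n,len(compactlist)))
    ar.add_plot(bar)   # one plot per call; assumes calls accumulate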

Related

Pandas Styler.to_latex() - how to pass commands and do simple editing

How do I pass the following commands into the latex environment?
\centering (I need landscape tables to be centered)
and
\caption* (I need to skip for a panel the table numbering)
In addition, I would need to add parentheses and asterisks to the t-statistics, meaning row-specific formatting on the dataframes.
For example:
Current:

variable             value
const                2.439628
t stat               13.921319
FamFirm              0.114914
t stat               0.351283
founder              0.154914
t stat               2.351283
Adjusted R Square    0.291328
I want this:

variable             value
const                2.439628
t stat               (13.921319)***
FamFirm              0.114914
t stat               (0.351283)
founder              0.154914
t stat               (1.651283)**
Adjusted R Square    0.291328
I'm doing my research papers in DataSpell. All empirical work is in Python, and then I use LaTeX (TexiFy) to create the pdf within DataSpell. Due to this workflow, I can't edit tables in the LaTeX code, because they get overwritten every time I run the Jupyter notebook.
In case it helps, here's an example of how I pass a table to the latex environment:
# drop index to column
panel_a.reset_index(inplace=True)
# write Latex index and cut names to appropriate length
ind_list = [
    "ageFirm",
    "meanAgeF",
    "lnAssets",
    "bsVol",
    "roa",
    "fndrCeo",
    "lnQ",
    "sic",
    "hightech",
    "nonFndrFam"
]
# assign the list of values to the column
panel_a["index"] = ind_list
# format column names
header = ["", "count","mean", "std", "min", "25%", "50%", "75%", "max"]
panel_a.columns = header
with open(
    os.path.join(r"/.../tables/panel_a.tex"), "w"
) as tf:
    tf.write(
        panel_a
        .style
        .format(precision=3)
        .format_index(escape="latex", axis=1)
        .hide(level=0, axis=0)
        .to_latex(
            caption = "Panel A: Summary Statistics for the Full Sample",
            label = "tab:table_label",
            hrules=True,
        ))
You're asking three questions in one. I think I can do you two out of three (I hear that "ain't bad").
How to pass \centering to the LaTeX env using Styler.to_latex?
Use the position_float parameter. Simplified:
df.style.to_latex(position_float='centering')
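A quick self-contained check of the effect, using a throwaway frame (position_float requires pandas >= 1.3):
import pandas as pd

df = pd.DataFrame({'variable': ['const'], 'value': [2.439628]})
print(df.style.to_latex(position_float='centering', hrules=True))
# the emitted block now contains \centering right after \begin{table}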
How to pass \caption*?
This one I don't know. Perhaps useful: Why is caption not working.
How to apply row-specific formatting?
This one's a little tricky. Let me give an example of how I would normally do this:
df = pd.DataFrame({'a':['some_var','t stat'],'b':[1.01235,2.01235]})
df.style.format({'a': str,
                 'b': lambda x: "{:.3f}".format(x) if x < 2 else '({:.3f})***'.format(x)})
Result: 1.012 is rendered plain, while 2.012 becomes (2.012)***.
You can see from this example that style.format accepts a callable (here nested inside a dict, but you could also do: .format(func, subset='value')). So, this is great if each value itself is evaluated (x < 2).
The problem in your case is that the evaluation is over some other value, namely a (not supplied) P value combined with panel_a['variable'] == 't stat'. Now, assuming you have those P values in a different column, I suggest you create a for loop to populate a list that becomes like this:
fmt_list = ['{:.3f}','({:.3f})***','{:.3f}','({:.3f})','{:.3f}','({:.3f})***','{:.3f}']
Now, we can apply a function to df.style.format, and pop/select from the list like so:
fmt_list = ['{:.3f}','({:.3f})***','{:.3f}','({:.3f})','{:.3f}','({:.3f})***','{:.3f}']

def func(v):
    fmt = fmt_list.pop(0)
    return fmt.format(v)

panel_a.style.format({'variable': str, 'value': func})
Result: the 'value' column is rendered with the formats from fmt_list applied in row order.
This solution is admittedly a bit "hacky", since mutating a globally declared list inside a function is far from good practice; if you modify the list again before calling func, you are unlikely to get the expected behaviour or, worse, you may get an error that is difficult to track down. I'm not sure how to remedy this other than simply converting all the floats in panel_a.value to strings in place. In that case you no longer need .format, but it alters your df, which is also not ideal. You could make a copy first (df2 = df.copy()), but that costs memory.
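For completeness, a minimal sketch of that string-conversion alternative, assuming panel_a has a 'value' column and fmt_list lines up with its rows:
panel_b = panel_a.copy()  # work on a copy so the original frame survives
panel_b['value'] = [fmt.format(v) for fmt, v in zip(fmt_list, panel_b['value'])]
# panel_b['value'] now holds pre-formatted strings; no Styler.format needed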
Anyway, hope this helps. So, in full you add this as follows to your code:
fmt_list = ['{:.3f}','({:.3f})***','{:.3f}','({:.3f})','{:.3f}','({:.3f})***','{:.3f}']

def func(v):
    fmt = fmt_list.pop(0)
    return fmt.format(v)

with open(fname, "w") as tf:
    tf.write(
        panel_a
        .style
        .format({'variable': str, 'value': func})
        ...
        .to_latex(
            ...
            position_float='centering'
        ))

Python Error: "divide by zero encountered in log" in function, how to adjust? (Beginner here)

For my current assignment, I am to establish the stability of intersection/equilibrium points between two nullclines, which I have defined as follows:
def fNullcline(F):
    P = (1/k)*((1/beta)*np.log(F/(1-F))-c*F+v)
    return P

def pNullcline(P):
    F = (1/delta)*(pD-alpha*P+(r*P**2)/(m**2+P**2))
    return F
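The warning in the title comes from np.log(F/(1-F)) blowing up at the edges of the unit interval; a tiny illustration:
import numpy as np

F = np.array([0.0, 0.5, 1.0])
with np.errstate(divide='ignore', invalid='ignore'):
    print(np.log(F/(1-F)))   # [-inf  0.  inf]: F=0 hits log(0), F=1 divides by zero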
I also have a method "stability" that applies the Hurwitz criterion to the underlying system's Jacobian:
def dPdt(P,F):
    return pD-delta*F-alpha*P+(r*P**2)/(m**2+P**2)

def dFdt(P,F):
    return s*(1/(1+sym.exp(-beta*(-v+c*F+k*P)))-F)

def stability(P,F):
    x = sym.Symbol('x')
    ax = sym.diff(dPdt(x, F),x)
    ddx = sym.lambdify(x, ax)
    a = ddx(P)
    # shortening the code here: the same happens for b, c, d
    matrix = [[a, b],[c,d]]
    eigenvalues, eigenvectors = np.linalg.eig(matrix)
    e1 = eigenvalues[0]
    e2 = eigenvalues[1]
    if(e1 >= 0 or e2 >= 0):
        return 0
    else:
        return 1
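For reference, the core of that test can be checked standalone. Note that the eigenvalues of a general Jacobian can be complex, so it is their real parts that decide stability (the numbers below are hypothetical, just for illustration):
import numpy as np

J = np.array([[-1.0, 0.5],
              [0.2, -2.0]])                    # hypothetical Jacobian entries
eigenvalues = np.linalg.eigvals(J)
stable = all(e.real < 0 for e in eigenvalues)  # stable iff all real parts are negative
print(eigenvalues, stable)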
The solution I was looking for was later provided. Basically, values became too small! So this code was added to make sure that values that are too small are not used when checking the stability:
set = {0}   # caution: this shadows Python's built-in set type
for j in range(1,210):
    for i in range(1,410):
        x = i*0.005
        y = j*0.005
        x, y = fsolve(System, [x, y])
        nexist = 1
        for i in set:                  # compare y against every root found so far
            if (abs(y-i)) < 0.00001:
                nexist = 0
        if (nexist):
            set.add(y)
set.discard(0)
I'm still pretty new to coding, so the function in and of itself is still a bit of a mystery to me, but it eventually helped in making the little program run smoothly :) I would again like to express gratitude for all the help I have received on this question. Below there are still some helpful comments, which is why I will leave this question up in case anyone runs into this problem in the future and can find a solution thanks to this thread.
After a bit of back and forth, I came to realise that to keep the log from being fed unwanted values, I can instead define set as an array:
set = np.arange(0, 2, 0.001)
I get a list of values within this array as output, complete with their corresponding stabilities. This is not a perfect solution as I still get runtime errors (in fact, I now get... three error messages), but I got what I wanted out of it, so I'm counting that as a win?
Edit: I am further elaborating on this in the original post to improve the documentation, however, I would like to point out again here that this solution does not seem to be working, after all. I was too hasty! I apologise for the confusion. It's a very rocky road for me. The correct solution has since been provided, and is documented in the original question.

How to insert all occurrences in list[] into CompositeVideoClip

I have a video where I want to insert a dynamic amount of TextClip(s). I have a while loop that handles the logic for actually creating the different TextClips and giving them individual durations & start_times (this works). I do however have a problem with actually "compiling" the video itself with inserting these texts.
Code for creating a TextClip (that works).
text = mpy.TextClip(str(contents),
                    color='white', size=[1700, 395], method='caption').set_duration(
    int(list[i - 1])).set_start(currentTime).set_position(("center", 85))
print(str(i) + " written")
textList.append(text)
Code to "compile" the video. (that doesn't work)
final_clip = CompositeVideoClip([clip, len(textList)])
final_clip.write_videofile("files/final/TEST.mp4")
I tried several approaches but now I'm stuck and can't figure out a way to continue. Before I get a lot of "answers" telling me to do a while loop on the compiling, let me just say that the actual compiling takes about 5 minutes, and I have 100-500 different texts I need implemented in the final video, which would take days. Instead I want to add them one by one and then do one big final compile, which I know will take slightly longer than 5 minutes, but still a lot quicker than 2-3 days.
For those of you that may not have used moviepy before I will post a snippet of "my code" that actually works, not in the way I need it to though.
final_clip = CompositeVideoClip([clip, textList[0], textList[1], textList[2]])
final_clip.write_videofile("files/final/TEST.mp4")
This works exactly as intended (adding 3 texts). However, I don't/can't know beforehand how many texts there will be in each video, so I need to somehow insert a dynamic amount of textList[] into the function.
Kind regards,
Unsure what the arguments after clip do (you could clarify), but if the problem's solved by inserting a variable number of textList[i] args, the solution's simple:
CompositeVideoClip([clip, *textList])
The star unpacks the list (or any iterable). For example, with arg = (4, 5) and def f(a, b): return a + b, calling f(*arg) returns 9. If you have many textLists, you can manage them via a nested list or a dictionary:
textListDict = {'1': textList1, '2': textList2, ...}
textListNest = [textList1, textList2, ...]  # probably preferable - using for below

# now iterate:
for textList in textListNest:
    final_clip = CompositeVideoClip([clip, *textList])
Unpacking demo:
def show_then_join(a, b, c):
    print("a={}, b={}, c={}".format(a, b, c))
    print(''.join("{}{}{}".format(a, b, c)))

some_list = [1, 2, 'dog']
show_then_join(*some_list)  # only one arg is passed in, but it is unpacked into three
# >> a=1, b=2, c=dog
# >> 12dog

Create a summary Pandas DataFrame using concat/append via a for loop

Can't get my mind around this...
I read a bunch of spreadsheets, do a bunch of calculations and then want to create a summary DF from each set of calculations. I can create the initial df but don't know how to control my loops so that I:
create the initial DF (1st time through the loop)
if it has been created, append the next DF (last two rows) for each additional tab.
I just can't wrap my head around how to create the right nested loop so that once the 1st one is done the subsequent ones get appended.
My current code looks like this (it just dumbly prints each tab's results separately rather than creating a new consolidated sumdf with just the last 2 rows of each tab's results):
#make summary
area_tabs=['5','12']
for area_tabs in area_tabs:
    actdf,aname = get_data(area_tabs)
    lastq,fcast_yr,projections,yrahead,aname,actdf,merged2,mergederrs,montdist,ols_test,mergedfcst = do_projections(actdf)
    sumdf = merged2[-2:]
    sumdf['name'] = aname #<<< I'll be doing a few more calculations here as well
    print sumdf
Still a newb learning basic python loop techniques :-(
Often a neater way than writing for loops, especially if you are planning on using the result, is to use a list comprehension over a function:
def get_sumdf(area_tab): # perhaps you can name better?
    actdf,aname = get_data(area_tab)
    lastq,fcast_yr,projections,yrahead,aname,actdf,merged2,mergederrs,montdist,ols_test,mergedfcst = do_projections(actdf)
    sumdf = merged2[-2:]
    sumdf['name'] = aname #<<< I'll be doing a few more calculations here as well
    return sumdf
[get_sumdf(area_tab) for area_tab in area_tabs]
and concat:
pd.concat([get_sumdf(area_tab) for area_tab in area_tabs])
or you can also use a generator expression:
pd.concat(get_sumdf(area_tab) for area_tab in area_tabs)
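As a toy, self-contained version of the pattern (get_sumdf_demo stands in for the real get_sumdf):
import pandas as pd

def get_sumdf_demo(tab):
    # pretend these are the last two rows of a per-tab calculation
    return pd.DataFrame({'name': [tab, tab], 'value': [1, 2]})

summary = pd.concat(get_sumdf_demo(tab) for tab in ['5', '12'])
# summary stacks both per-tab frames into a single DataFrame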
To explain my comment re named tuples and dictionaries, I think this line is difficult to read and ripe for bugs:
lastq,fcast_yr,projections,yrahead,aname,actdf,merged2,mergederrs,montdist,ols_test,mergedfcst=do_projections(actdf)
A trick is to have do_projections return a named tuple, rather than a tuple:
from collections import namedtuple
Projection = namedtuple('Projection', ['lastq', 'fcast_yr', 'projections', 'yrahead', 'aname', 'actdf', 'merged2', 'mergederrs', 'montdist', 'ols_test', 'mergedfcst'])
then inside do_projections:
return (1, 2, 3, 4, ...) # don't do this
return Projection(1, 2, 3, 4, ...) # do this
return Projection(last_q=last_q, fcast_yr=f_cast_yr, ...) # or this
I think this avoids bugs and is a lot cleaner, especially to access the results later.
projections = do_projections(actdf)
projections.aname
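A compact, runnable illustration of the pattern (field list shortened; do_projections_demo is hypothetical):
from collections import namedtuple

Projection = namedtuple('Projection', ['lastq', 'aname', 'merged2'])

def do_projections_demo():
    return Projection(lastq='Q4', aname='area5', merged2=[1, 2, 3])

p = do_projections_demo()
p.aname    # 'area5' - read by name instead of unpacking 11 values
p.merged2  # [1, 2, 3]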
Do the initialisation outside the for loop. Something like this:
#make summary
area_tabs=['5','12']
if not area_tabs:
    return # nothing to do
# init the first frame
actdf,aname = get_data(area_tabs[0])
lastq,fcast_yr,projections,yrahead,aname,actdf,merged2,mergederrs,montdist,ols_test,mergedfcst = do_projections(actdf)
sumdf = merged2[-2:]
sumdf['name'] = aname
# append the last two rows for every remaining tab
for area_tab in area_tabs[1:]:
    actdf,aname = get_data(area_tab)
    lastq,fcast_yr,projections,yrahead,aname,actdf,merged2,mergederrs,montdist,ols_test,mergedfcst = do_projections(actdf)
    nextdf = merged2[-2:]
    nextdf['name'] = aname #<<< I'll be doing a few more calculations here as well
    sumdf = pd.concat([sumdf, nextdf])
print sumdf
You can further improve the code by putting the common steps into a function.

Iterate over two big arrays at once

I have to iterate over two arrays which are 1000x1000 big. I already reduced the resolution to 100x100 to make the iteration faster, but it still takes about 15 minutes for ONE array!
So I tried to iterate over both at the same time, for which I found this:
for index, (x,y) in ndenumerate(izip(x_array,y_array)):
but then I get the error:
ValueError: too many values to unpack
Here is my full Python code. I hope you can help me make this a lot faster, because this is for my master's thesis and in the end I have to run it about 100 times...
area_length=11
d_circle=(area_length-1)/2

xdis_new=xdis.copy()
ydis_new=ydis.copy()

ie,je=xdis_new.shape

while (np.isnan(np.sum(xdis_new))) and (np.isnan(np.sum(ydis_new))):
    xdis_interpolated=xdis_new.copy()
    ydis_interpolated=ydis_new.copy()
    # itx=np.nditer(xdis_new,flags=['multi_index'])
    # for x in itx:
    #     print 'next x and y'
    for index, (x,y) in ndenumerate(izip(xdis_new,ydis_new)):
        if np.isnan(x):
            print 'index',index[0],index[1]
            print 'interpolate'
            # define indices of interpolation area
            i1=index[0]-(area_length-1)/2
            if i1<0:
                i1=0
            i2=index[0]+((area_length+1)/2)
            if i2>ie:
                i2=ie
            j1=index[1]-(area_length-1)/2
            if j1<0:
                j1=0
            j2=index[1]+((area_length+1)/2)
            if j2>je:
                j2=je
            # -->
            print 'i1',i1,'','i2',i2
            print 'j1',j1,'','j2',j2
            area_values=xdis_new[i1:i2,j1:j2]
            print area_values
            b=area_values[~np.isnan(area_values)]
            if len(b)>=((area_length-1)/2)*4:
                xi,yi=meshgrid(arange(len(area_values[0,:])),arange(len(area_values[:,0])))
                weight=zeros((len(area_values[0,:]),len(area_values[:,0])))
                d=zeros((len(area_values[0,:]),len(area_values[:,0])))
                weight_fac=zeros((len(area_values[0,:]),len(area_values[:,0])))
                weighted_area=zeros((len(area_values[0,:]),len(area_values[:,0])))
                d=sqrt((xi-xi[(area_length-1)/2,(area_length-1)/2])*(xi-xi[(area_length-1)/2,(area_length-1)/2])+(yi-yi[(area_length-1)/2,(area_length-1)/2])*(yi-yi[(area_length-1)/2,(area_length-1)/2]))
                weight=1/d
                weight[where(d==0)]=0
                weight[where(d>d_circle)]=0
                weight[where(np.isnan(area_values))]=0
                weight_sum=np.sum(weight.flatten())
                weight_fac=weight/weight_sum
                weighted_area=area_values*weight_fac
                print 'weight'
                print weight_fac
                print 'values'
                print area_values
                print 'weighted'
                print weighted_area
                m=nansum(weighted_area)
                xdis_interpolated[index]=m
                print 'm',m
            else:
                print 'insufficient elements'
        if np.isnan(y):
            print 'index',index[0],index[1]
            print 'interpolate'
            # define indices of interpolation area
            i1=index[0]-(area_length-1)/2
            if i1<0:
                i1=0
            i2=index[0]+((area_length+1)/2)
            if i2>ie:
                i2=ie
            j1=index[1]-(area_length-1)/2
            if j1<0:
                j1=0
            j2=index[1]+((area_length+1)/2)
            if j2>je:
                j2=je
            # -->
            print 'i1',i1,'','i2',i2
            print 'j1',j1,'','j2',j2
            area_values=ydis_new[i1:i2,j1:j2]
            print area_values
            b=area_values[~np.isnan(area_values)]
            if len(b)>=((area_length-1)/2)*4:
                xi,yi=meshgrid(arange(len(area_values[0,:])),arange(len(area_values[:,0])))
                weight=zeros((len(area_values[0,:]),len(area_values[:,0])))
                d=zeros((len(area_values[0,:]),len(area_values[:,0])))
                weight_fac=zeros((len(area_values[0,:]),len(area_values[:,0])))
                weighted_area=zeros((len(area_values[0,:]),len(area_values[:,0])))
                d=sqrt((xi-xi[(area_length-1)/2,(area_length-1)/2])*(xi-xi[(area_length-1)/2,(area_length-1)/2])+(yi-yi[(area_length-1)/2,(area_length-1)/2])*(yi-yi[(area_length-1)/2,(area_length-1)/2]))
                weight=1/d
                weight[where(d==0)]=0
                weight[where(d>d_circle)]=0
                weight[where(np.isnan(area_values))]=0
                weight_sum=np.sum(weight.flatten())
                weight_fac=weight/weight_sum
                weighted_area=area_values*weight_fac
                print 'weight'
                print weight_fac
                print 'values'
                print area_values
                print 'weighted'
                print weighted_area
                m=nansum(weighted_area)
                ydis_interpolated[index]=m
                print 'm',m
            else:
                print 'insufficient elements'
        else:
            print 'no need to interpolate'
    xdis_new=xdis_interpolated
    ydis_new=ydis_interpolated
Some advice:
Profile your code to see what is the slowest part. It may not be the iteration but the computations that need to be done each time.
Reduce function calls as much as possible; function calls are not free in Python.
Rewrite the slowest part as a C extension and then call that C function in your Python code (see Extending and Embedding the Python interpreter).
This page has some good advice as well.
You specifically asked for iterating two arrays in a single loop. Here is a way to do that
l1 = ["abc", "def", "hi"]
l2 = ["ghi", "jkl", "lst"]
for f,s in zip(l1,l2):
    print "%s : %s" %(f,s)
This uses zip, which works in both Python 2 and 3; in Python 2 you can use itertools.izip instead to get a lazy iterator rather than a list.
You may use this as your for loop:
for index, x in ndenumerate((x_array,y_array)):
But it won't help you much, because your computer can't do two things at the same time.
Profiling is definitely a good start to identify where all the time spent actually goes.
I usually use the cProfile module, as it requires minimal overhead and gives me more than enough information.
import cProfile
import pstats
cProfile.run('main()', "ProfileData.txt", 'tottime')
p = pstats.Stats('ProfileData.txt')
p.sort_stats('cumulative').print_stats(100)
In your example you would have to wrap your code into a main() function to be able to use this code snippet at the very end of your file.
Comment #1: You don't want to use ndenumerate on the izip iterator: it will hand you back the iterator object itself rather than element pairs, which isn't what you want.
Comment #2:
i1=index[0]-(area_length-1)/2
if i1<0:
    i1=0
could be simplified to i1 = max(index[0]-(area_length-1)/2, 0) (max, not min, since this clamps from below), and you could store your (area_length+/-1)/2 values in dedicated variables.
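With the halves precomputed, the window bounds become a short, readable block (half_lo/half_hi are illustrative names):
half_lo = (area_length-1)/2
half_hi = (area_length+1)/2
i1 = max(index[0]-half_lo, 0)    # clamp the lower bound to the array edge
i2 = min(index[0]+half_hi, ie)   # clamp the upper bound likewise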
Idea #1 : try to iterate on flat versions of the arrays, i.e. with something like
for (i, (x, y)) in enumerate(izip(xdis_new.flat,ydis_new.flat)):
You could get the original indices via divmod(i, xdis_new.shape[-1]), as you should be iterating by rows first.
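For instance, one line recovers the 2-D position from the flat index (assuming row-major order as noted):
row, col = divmod(i, xdis_new.shape[-1])   # (row, col) of flat index i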
Idea #2 : Iterate only on the nans, i.e. indexing your arrays with np.isnan(xdis_new)|np.isnan(ydis_new), that could save you some iterations
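A short sketch of that idea, assuming xdis_new and ydis_new are the arrays with NaN holes:
mask = np.isnan(xdis_new) | np.isnan(ydis_new)
for i, j in zip(*np.where(mask)):
    # only cells where at least one array holds NaN are visited here
    pass  # interpolate (i, j) with the window average from the question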
EDIT #1
You probably don't need to initialize d, weight_fac and weighted_area in your loop, as you compute them separately.
Your weight[where(d==0)] = 0 assignments (and the similar lines) can be simplified to weight[d==0] = 0: boolean indexing works without where.
Do you need weight_fac ? Can't you just compute weight then normalize it in place ? That should save you some temporary arrays.
