Hey guys so I have a 2D array that looks like this:
12381.000 63242.000 0.000 0.000 0.000 8.000 9.200 0.000 0.000
12401.000 8884.000 0.000 0.000 96.000 128.000 114.400 61.600 0.000
12606.000 74204.000 56.000 64.000 72.000 21.600 18.000 0.000 0.000
12606.000 105492.000 0.000 0.000 0.000 0.000 0.000 0.000 45.600
12606.000 112151.000 2.400 4.000 0.000 0.000 0.000 0.000 0.000
12606.000 121896.000 0.000 0.000 0.000 0.000 0.000 60.800 0.000
(A couple of columns are cut off due to formatting.)
Each row contains the employee ID and department ID, followed by 12 columns giving the hours worked by that employee in each month. My 2D array is essentially a list of lists, where each row is a list in its own right. I am trying to convert each nonzero value to a one while keeping all the zeros. There are 857 rows and 14 columns. My code is as follows:
def convBin(A):
    """Nonzero values are converted into 1s and zero values are kept constant.
    """
    for i in range(len(A)):
        for j in range(len(A[i])):
            if A[i][j] > 0:
                A[i][j] == 1
            else:
                A[i][j] == 0
    return A
Can someone tell me what I am doing wrong?
You are doing equality evaluation, not assignment, inside your loop:
A[i][j] == 1
should be
A[i][j] = 1
# ^ note only one equals sign
Also, there is no need to return A; A is being modified in place, so it is conventional to implicitly return None by removing the explicit return A line.
You should also bear in mind that:

- you don't actually want to do anything in the else case; and
- iterating over range(len(...)) is not Pythonic - use e.g. enumerate.
Your function could therefore be simplified to:
def convBin(A):
    """Convert non-zero values in 2-D array A into 1s."""
    for row in A:
        for j, val in enumerate(row):
            if val:
                row[j] = 1
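Since the rows are numeric, the same conversion can also be written without an explicit Python loop. A minimal NumPy sketch, assuming the data fits in a float array; note that, unlike the function above, it returns a new array rather than modifying A in place:

```python
import numpy as np

def conv_bin(a):
    # Build a boolean mask of non-zero entries, then cast it back to the
    # original dtype so zeros stay 0.0 and everything else becomes 1.0.
    a = np.asarray(a, dtype=float)
    return (a != 0).astype(a.dtype)

A = [[12381.0, 63242.0, 0.0, 8.0],
     [12401.0, 0.0, 96.0, 0.0]]
print(conv_bin(A))
```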
I'm trying to profile my Python script using cProfile and displaying the results with pstats. In particular, I'm trying to use the pstats function p.sort_stats('time').print_callers(20) to print only the top 20 functions by time, as described in the documentation.
I expect to get only the top 20 results (functions profiled and their calling functions ordered by time), instead, I get a seemingly unfiltered list of over 1000 functions that completely saturates my terminal (hence I'm estimating over a 1000 functions).
Why is my restriction argument (i.e. 20) being ignored by print_callers() and how can I fix this?
I've tried looking up an answer and couldn't find one. And I tried to create a minimal reproducible example, but when I do, I can't reproduce the problem (i.e. it works fine).
My profiling code is:
import cProfile
import pstats

if __name__ == '__main__':
    cProfile.run('main()', 'mystats')
    p = pstats.Stats('mystats')
    p.sort_stats('time').print_callers(20)
I'm trying to avoid having to post my full code, so if someone else has encountered this issue before, and can answer without seeing my full code, that would be great.
Thank you very much in advance.
Edit 1:
Partial output:
Ordered by: internal time
List reduced from 1430 to 1 due to restriction <1>
Function was called by...
ncalls tottime cumtime
{built-in method builtins.isinstance} <- 2237 0.000 0.000 <frozen importlib._bootstrap>:997(_handle_fromlist)
9 0.000 0.000 <frozen importlib._bootstrap_external>:485(_compile_bytecode)
44 0.000 0.000 <frozen importlib._bootstrap_external>:1117(_get_spec)
4872 0.001 0.001 C:\Users\rafael.natan\AppData\Local\Continuum\anaconda3\lib\_strptime.py:321(_strptime)
5 0.000 0.000 C:\Users\rafael.natan\AppData\Local\Continuum\anaconda3\lib\abc.py:196(__subclasscheck__)
26 0.000 0.000 C:\Users\rafael.natan\AppData\Local\Continuum\anaconda3\lib\calendar.py:58(__getitem__)
14 0.000 0.000 C:\Users\rafael.natan\AppData\Local\Continuum\anaconda3\lib\calendar.py:77(__getitem__)
2 0.000 0.000 C:\Users\rafael.natan\AppData\Local\Continuum\anaconda3\lib\distutils\version.py:331(_cmp)
20 0.000 0.000 C:\Users\rafael.natan\AppData\Local\Continuum\anaconda3\lib\enum.py:797(__or__)
362 0.000 0.000 C:\Users\rafael.natan\AppData\Local\Continuum\anaconda3\lib\enum.py:803(__and__)
1 0.000 0.000 C:\Users\rafael.natan\AppData\Local\Continuum\anaconda3\lib\inspect.py:73(isclass)
30 0.000 0.000 C:\Users\rafael.natan\AppData\Local\Continuum\anaconda3\lib\json\encoder.py:182(encode)
2 0.000 0.000 C:\Users\rafael.natan\AppData\Local\Continuum\anaconda3\lib\ntpath.py:34(_get_bothseps)
1 0.000 0.000 C:\Users\rafael.natan\AppData\Local\Continuum\anaconda3\lib\ntpath.py:75(join)
4 0.000 0.000 C:\Users\rafael.natan\AppData\Local\Continuum\anaconda3\lib\ntpath.py:122(splitdrive)
3 0.000 0.000 C:\Users\rafael.natan\AppData\Local\Continuum\anaconda3\lib\ntpath.py:309(expanduser)
4 0.000 0.000 C:\Users\rafael.natan\AppData\Local\Continuum\anaconda3\lib\os.py:728(check_str)
44 0.000 0.000 C:\Users\rafael.natan\AppData\Local\Continuum\anaconda3\lib\re.py:249(escape)
4 0.000 0.000 C:\Users\rafael.natan\AppData\Local\Continuum\anaconda3\lib\re.py:286(_compile)
609 0.000 0.000 C:\Users\rafael.natan\AppData\Local\Continuum\anaconda3\lib\site-packages\dateutil\parser\_parser.py:62(__init__)
1222 0.000 0.000 C:\Users\rafael.natan\AppData\Local\Continuum\anaconda3\lib\site-packages\numpy\core\_methods.py:48(_count_reduce_items)
1222 0.000 0.000 C:\Users\rafael.natan\AppData\Local\Continuum\anaconda3\lib\site-packages\numpy\core\_methods.py:58(_mean)
1 0.000 0.000 C:\Users\rafael.natan\AppData\Local\Continuum\anaconda3\lib\site-packages\numpy\core\arrayprint.py:834(__init__)
1393 0.000 0.000 C:\Users\rafael.natan\AppData\Local\Continuum\anaconda3\lib\site-packages\numpy\core\fromnumeric.py:1583(ravel)
1239 0.000 0.000 C:\Users\rafael.natan\AppData\Local\Continuum\anaconda3\lib\site-packages\numpy\core\fromnumeric.py:1966(sum)
...
I figured out the issue.
As usual, the Python library does not have a bug; rather, I misunderstood the output of the function call.
I'm elaborating on it here as an answer in case it helps anyone clear up this misunderstanding in the future.
When I asked the question, I didn't understand why p.print_callers(20) prints out to terminal over a thousand lines, even though I am restricting it to the top 20 function calls (by time).
What is actually happening is that the restriction does limit the list to the top 20 "most time consuming" functions, but print_callers then prints every function that called each of those top 20 functions.
Since each of the top 20 functions was called by about 100 different functions on average, each top function had about 100 lines associated with it. So 20 * 100 = 2000, and p.print_callers(20) printed well over a thousand lines and saturated my terminal.
I hope this saves someone some time and debugging headache :)
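A side note that may help: print_stats prints one line per function, so its restriction does bound the total output, whereas print_callers prints one block per function. A self-contained sketch (the profiled work function here is made up for illustration):

```python
import cProfile
import io
import pstats

def work():
    return sum(i * i for i in range(10000))

pr = cProfile.Profile()
pr.enable()
work()
pr.disable()

out = io.StringIO()
stats = pstats.Stats(pr, stream=out)

# One line per function: the restriction bounds the total output.
stats.sort_stats('time').print_stats(2)

# One *block* per function: every caller of each of the top 2 functions
# gets its own line, so this can print far more than 2 lines.
stats.sort_stats('time').print_callers(2)

print(out.getvalue())
```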
I have the following function, which takes a numpy array of floats and an integer as its arguments. Each row in the array 'counts' is the result of some experiment, and I want to randomly draw a list of the experiments and add them up, then repeat this process to create lots of samples groups.
import numpy as np

def my_function(counts, nSamples):
    '''Create multiple randomly drawn (with replacement)
    samples from the raw data'''
    nSat, nRegions = counts.shape
    sampleData = np.zeros((nSamples, nRegions))
    for i in range(nSamples):
        rc = np.random.randint(0, nSat, size=nSat)
        sampleData[i] = counts[rc].sum(axis=0)
    return sampleData
This function seems quite slow, typically counts has around 100,000 rows (and 4 columns) and nSamples is around 2000. I have tried using numba and implicit for loops to try and speed up this code with no success.
What are some other methods to try and increase the speed?
I have run cProfile on the function and got the following output.
8005 function calls in 60.208 seconds
Ordered by: standard name
ncalls tottime percall cumtime percall filename:lineno(function)
1 0.000 0.000 60.208 60.208 <string>:1(<module>)
2000 0.010 0.000 13.306 0.007 _methods.py:31(_sum)
1 40.950 40.950 60.208 60.208 optimize_bootstrap.py:25(bootstrap)
1 0.000 0.000 0.000 0.000 {method 'disable' of '_lsprof.Profiler' objects}
2000 5.938 0.003 5.938 0.003 {method 'randint' of 'mtrand.RandomState' objects}
2000 13.296 0.007 13.296 0.007 {method 'reduce' of 'numpy.ufunc' objects}
2000 0.015 0.000 13.321 0.007 {method 'sum' of 'numpy.ndarray' objects}
1 0.000 0.000 0.000 0.000 {numpy.core.multiarray.zeros}
1 0.000 0.000 0.000 0.000 {range}
Are you sure that
rc = np.random.randint(0,nSat,size=nSat)
is what you want, instead of size=someconstant? Otherwise you're summing over all the rows with many repeats.
edit
does it help to replace the slicing altogether with a matrix product:

rcvec = np.zeros(nSat, dtype=int)
for r in rc:
    rcvec[r] += 1
sampleData[i] = rcvec.dot(counts)
(maybe there is a function in numpy that can give you rcvec faster)
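np.bincount is exactly such a function. A small sketch with made-up numbers, checking that it reproduces rcvec and that the matrix product matches the sliced sum:

```python
import numpy as np

nSat = 5
counts = np.arange(20, dtype=float).reshape(nSat, 4)  # toy stand-in for the real data
rc = np.array([0, 2, 2, 4, 4])                        # one draw of row indices

# How many times each row index was drawn, without an explicit Python loop
rcvec = np.bincount(rc, minlength=nSat)
print(rcvec)  # [1 0 2 0 2]

# The weighted row sum equals summing the selected rows directly
print(np.allclose(rcvec.dot(counts), counts[rc].sum(axis=0)))  # True
```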
Simply generate all indices in one go with a 2D size for np.random.randint, use those to index into the counts array, and then sum along the first axis, just like you were doing with the loopy version.
Thus, one vectorized and therefore faster way would be:
RC = np.random.randint(0,nSat,size=(nSat, nSamples))
sampleData_out = counts[RC].sum(axis=0)
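A quick shape check with small made-up numbers (the actual samples differ from the loopy version, since the random draws are consumed in a different order):

```python
import numpy as np

nSat, nRegions, nSamples = 100, 4, 20
counts = np.random.rand(nSat, nRegions)

RC = np.random.randint(0, nSat, size=(nSat, nSamples))
# counts[RC] has shape (nSat, nSamples, nRegions); summing over axis 0
# collapses the nSat drawn rows, leaving one summed sample per column.
sampleData_out = counts[RC].sum(axis=0)
print(sampleData_out.shape)  # (20, 4)
```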
Sorting a list of tuples (dictionary key/value pairs, where the key is a random string) is faster when I do not explicitly specify that the key should be used (edit: after adding operator.itemgetter(0) from the comment by @chepner, the key version is now faster!):
import timeit
setup = """
import random
import string
import operator
random.seed('slartibartfast')
d = {}
for i in range(1000):
    d[''.join(random.choice(string.ascii_uppercase) for _ in range(16))] = 0
"""
print min(timeit.Timer('for k,v in sorted(d.iteritems()): pass',
                       setup=setup).repeat(7, 1000))
print min(timeit.Timer('for k,v in sorted(d.iteritems(), key=lambda x: x[0]): pass',
                       setup=setup).repeat(7, 1000))
print min(timeit.Timer('for k,v in sorted(d.iteritems(), key=operator.itemgetter(0)): pass',
                       setup=setup).repeat(7, 1000))
Gives:
0.575334150664
0.579534521128
0.523808984422 (the itemgetter version!)
If, however, I create a custom object, then passing key=lambda x: x[0] explicitly to sorted makes it faster:
setup = """
import random
import string
import operator
random.seed('slartibartfast')
d = {}

class A(object):
    def __init__(self):
        self.s = ''.join(random.choice(string.ascii_uppercase)
                         for _ in range(16))
    def __hash__(self):
        return hash(self.s)
    def __eq__(self, other):
        return self.s == other.s
    def __ne__(self, other):
        return self.s != other.s
    # def __cmp__(self, other): return cmp(self.s, other.s)

for i in range(1000):
    d[A()] = 0
"""
print min(timeit.Timer('for k,v in sorted(d.iteritems()): pass',
                       setup=setup).repeat(3, 1000))
print min(timeit.Timer('for k,v in sorted(d.iteritems(), key=lambda x: x[0]): pass',
                       setup=setup).repeat(3, 1000))
print min(timeit.Timer('for k,v in sorted(d.iteritems(), key=operator.itemgetter(0)): pass',
                       setup=setup).repeat(3, 1000))
Gives:
4.65625458083
1.87191002252
1.78853626684
Is this expected? It seems like the second element of the tuple is used in the second case, but shouldn't the keys compare unequal?
Note: uncommenting the comparison method gives worse results, but the key versions still take roughly half the time:
8.11941771831
5.29207000173
5.25420037046
As expected, the built-in (address) comparison is faster.
EDIT: here are the profiling results from my original code that triggered the question - without the key method:
12739 function calls in 0.007 seconds
Ordered by: cumulative time
ncalls tottime percall cumtime percall filename:lineno(function)
1 0.000 0.000 0.007 0.007 <string>:1(<module>)
1 0.000 0.000 0.007 0.007 __init__.py:6527(_refreshOrder)
1 0.002 0.002 0.006 0.006 {sorted}
4050 0.003 0.000 0.004 0.000 bolt.py:1040(__cmp__) # here is the custom object
4050 0.001 0.000 0.001 0.000 {cmp}
4050 0.000 0.000 0.000 0.000 {isinstance}
1 0.000 0.000 0.000 0.000 {method 'sort' of 'list' objects}
291 0.000 0.000 0.000 0.000 __init__.py:6537(<lambda>)
291 0.000 0.000 0.000 0.000 {method 'append' of 'list' objects}
1 0.000 0.000 0.000 0.000 bolt.py:1240(iteritems)
1 0.000 0.000 0.000 0.000 {method 'iteritems' of 'dict' objects}
1 0.000 0.000 0.000 0.000 {method 'disable' of '_lsprof.Profiler' objects}
and here are the results when I specify the key:
7027 function calls in 0.004 seconds
Ordered by: cumulative time
ncalls tottime percall cumtime percall filename:lineno(function)
1 0.000 0.000 0.004 0.004 <string>:1(<module>)
1 0.000 0.000 0.004 0.004 __init__.py:6527(_refreshOrder)
1 0.001 0.001 0.003 0.003 {sorted}
2049 0.001 0.000 0.002 0.000 bolt.py:1040(__cmp__)
2049 0.000 0.000 0.000 0.000 {cmp}
2049 0.000 0.000 0.000 0.000 {isinstance}
1 0.000 0.000 0.000 0.000 {method 'sort' of 'list' objects}
291 0.000 0.000 0.000 0.000 __init__.py:6538(<lambda>)
291 0.000 0.000 0.000 0.000 __init__.py:6533(<lambda>)
291 0.000 0.000 0.000 0.000 {method 'append' of 'list' objects}
1 0.000 0.000 0.000 0.000 bolt.py:1240(iteritems)
1 0.000 0.000 0.000 0.000 {method 'iteritems' of 'dict' objects}
1 0.000 0.000 0.000 0.000 {method 'disable' of '_lsprof.Profiler' objects}
Apparently it is the __cmp__ and not the __eq__ method that is called (edit: because that class defines __cmp__ but not __eq__; see here for the order of resolution of equality and comparison).
In the code here __eq__ method is indeed called (8605 times) as seen by adding debug prints (see the comments).
So the difference is as stated in the answer by @chepner. The last thing I am not quite clear on is why those tuple equality calls are needed (in other words, why __eq__ is called instead of calling __cmp__ directly).
FINAL EDIT: I asked about this last point here: Why in comparing python tuples of objects is __eq__ and then __cmp__ called? - it turns out to be an optimization: tuple comparison calls __eq__ on the tuple elements, and only calls __cmp__ for elements that are not equal. So this is now perfectly clear. I thought it called __cmp__ directly, so initially it seemed to me that specifying the key was just unneeded, and even after chepner's answer I still wasn't seeing where the equality calls came in.
Gist: https://gist.github.com/Utumno/f3d25e0fe4bd0f43ceb9178a60181a53
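The behavior described in the final edit can be seen directly in a short Python 3 sketch (Elem and calls are made-up names used just to record what gets invoked):

```python
calls = []

class Elem:
    def __init__(self, v):
        self.v = v
    def __eq__(self, other):
        calls.append('eq')
        return self.v == other.v
    def __lt__(self, other):
        calls.append('lt')
        return self.v < other.v

# The first elements compare equal, so the tuple comparison settles the
# ordering with the second elements and never needs Elem.__lt__.
result = (Elem(1), 'a') < (Elem(1), 'b')
print(result, calls)  # True ['eq']
```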
There are two issues at play.
Comparing two values of builtin types (such as int) happens in C. Comparing two values of a class with an __eq__ method happens in Python; repeatedly calling __eq__ imposes a significant performance penalty.
The function passed with key is called once per element, rather than once per comparison. This means that lambda x: x[0] is called once per tuple to build a list of A instances to be compared. Without key, you need to make O(n lg n) tuple comparisons, each of which requires a call to A.__eq__ to compare the first elements of the tuples.
The first explains why your first pair of results is under a second while the second takes several seconds. The second explains why using key is faster regardless of the values being compared.
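The call counts are easy to verify with a short Python 3 sketch (counter and key_func are made-up names; Python 3 sorts with __lt__ rather than __cmp__):

```python
import random

counter = {'key': 0, 'lt': 0}

class A:
    def __init__(self, v):
        self.v = v
    def __lt__(self, other):
        counter['lt'] += 1
        return self.v < other.v

def key_func(x):
    counter['key'] += 1
    return x

random.seed(0)
data = [A(random.random()) for _ in range(1000)]
sorted(data, key=key_func)

# The key function runs exactly once per element; comparisons are O(n lg n).
print(counter['key'])        # 1000
print(counter['lt'] > 1000)  # True
```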
I am trying to make a DataFrame in Python and update the various rows and columns in the DataFrame through a loop, based on various calculations. The calculations are all correct, but when I try to display the DataFrame once the loop is complete, some of the calculated numbers are displayed, yet I mostly see zeros. Below is an example code:
import numpy as np
import pandas as pd

map = pd.DataFrame(np.zeros([16, 6]), columns=['A', 'B', 'C', 'D', 'E', 'F'])
for i in range(0, len(map)):
    map.A[i] = 1 + 1  # Some calculations
    map.B[i] = map.A[i] + 2
print map
Result (just an example):
A B C D E F
1 2 4 0.000 0.000 0.000 0.000
2 0.000 0.000 0.000 0.000 0.000 0.000
3 0.000 0.000 0.000 0.000 0.000 0.000
4 0.000 0.000 0.000 0.000 0.000 0.000
5 0.000 0.000 0.000 0.000 0.000 0.000
(continues for 16 rows)
However, if I were to print a specific column, I would get the real calculated numbers. Also, column B uses the correct numbers from column A, so it has to be just a print error. I am guessing it has something to do with initializing the array and the memory, but I am not sure. I originally used np.empty([16, 6]), but the same result occurred. How do I get the DataFrame to print the actual numbers, not the zeros?
Below is a function I wrote to label certain rows based on ranges of indexes. For convenience, I'm making the two function arguments, samples and matdat available for download in pickle format.
from operator import itemgetter
from itertools import izip, imap
import pandas as pd
def _insert_design_columns(samples, matdat):
    """Add columns for design factors, label lines that correspond to a given
    trial and then fill in said columns with the appropriate value on lines
    that belong to a trial.

    samples : DataFrame
        DataFrame of eyetracker samples.
        column `t`: time sample, in ms
        column `event`: TTL event
        columns x, y: x and y coordinates of gaze
        column cr: corneal reflection area
    matdat : dict of numpy arrays
        dict mapping matlab variable name to numpy array

    returns : modified `samples` dataframe
    """
    ## This is fairly trivial preparation and data formatting for the nested
    # for-loop below. We're just fixing types, adding empty columns, and
    # ensuring that our numpy arrays have the right shape.

    # Grab variables from the dict & squeeze the numpy arrays
    key = ('cuepos', 'targetpos', 'targetorientation', 'soa', 'normalizedResp')
    cpos, tpos, torient, soa, resp = map(pd.np.squeeze, imap(matdat.get, key))
    cpos = cpos.astype(float)
    cpos[cpos < 0] = pd.np.nan
    cong = tpos == cpos
    cong[pd.isnull(cpos)] = pd.np.nan

    # Add empty columns for each factor. These will contain the factor level
    # that corresponds to a trial (i.e. between a `TrialStart` and a
    # `ReportCueOnset` in `samples.event`)
    samples['soa'] = pd.np.nan
    samples['cpos'] = pd.np.nan
    samples['tpos'] = pd.np.nan
    samples['cong'] = pd.np.nan
    samples['torient'] = pd.np.nan
    samples['normalizedResp'] = pd.np.nan

    ## This is important, but not the part we need to optimize.
    # Here, we're finding the start and end indexes for every trial. Trials
    # are composed of continuous slices of rows.

    # Assign trial numbers
    tstart = samples[samples.event == 'TrialStart'].t  # each trial starts on a `TrialStart`
    tstop = samples[samples.event == 'ReportCueOnset'].t  # ... and ends on a `ReportCueOnset`
    samples['trial'] = pd.np.nan  # make an empty column which will contain trial num

    ## This is the sub-optimal part. Here, we're iterating through our
    # start/end index pairs, slicing the dataframe to get the rows we need,
    # and then:
    # 1. Assigning a trial number to that slice of rows
    # 2. Assigning the correct value to corresponding columns (see `factor_names`)
    samples.set_index(['t'], inplace=True)
    for i, (start, stop) in enumerate(izip(tstart, tstop)):
        samples.loc[start:stop, 'trial'] = i + 1  # label the interval's trial number
        # Now that we've labeled a range of rows as a trial, we can add factor
        # levels to the corresponding columns
        idx = itemgetter(i - 1)
        # factor_values/names has the same length as the number of trials we're
        # going to find. Get the corresponding value for the current trial so
        # that we can assign it.
        factor_values = imap(idx, (cpos, tpos, torient, soa, resp, cong))
        factor_names = ('cpos', 'tpos', 'torient', 'soa', 'resp', 'cong')
        for c, v in izip(factor_names, factor_values):  # loop through columns and assign
            samples.loc[start:stop, c] = v
    samples.reset_index(inplace=True)
    return samples
I've performed a %prun, the first few lines of which read:
548568 function calls (547462 primitive calls) in 9.380 seconds
Ordered by: internal time
ncalls tottime percall cumtime percall filename:lineno(function)
11360 6.074 0.001 6.084 0.001 index.py:604(__contains__)
2194 0.949 0.000 0.949 0.000 {method 'copy' of 'numpy.ndarray' objects}
1430 0.730 0.001 0.730 0.001 {pandas.lib.infer_dtype}
1098 0.464 0.000 0.467 0.000 internals.py:277(set)
1093/1092 0.142 0.000 9.162 0.008 indexing.py:157(_setitem_with_indexer)
1100 0.106 0.000 1.266 0.001 frame.py:1851(__setitem__)
166 0.047 0.000 0.047 0.000 {method 'astype' of 'numpy.ndarray' objects}
107209 0.037 0.000 0.066 0.000 {isinstance}
14 0.029 0.002 0.029 0.002 {numpy.core.multiarray.concatenate}
39362/38266 0.026 0.000 6.101 0.000 {getattr}
7829/7828 0.024 0.000 0.030 0.000 {numpy.core.multiarray.array}
1092 0.023 0.000 0.457 0.000 internals.py:564(setitem)
5 0.023 0.005 0.023 0.005 {pandas.algos.take_2d_axis0_float64_float64}
4379 0.021 0.000 0.108 0.000 index.py:615(__getitem__)
1101 0.020 0.000 0.582 0.001 frame.py:1967(_sanitize_column)
2192 0.017 0.000 0.946 0.000 internals.py:2236(apply)
8 0.017 0.002 0.017 0.002 {method 'repeat' of 'numpy.ndarray' objects}
Judging by the line that reads 1093/1092 0.142 0.000 9.162 0.008 indexing.py:157(_setitem_with_indexer), I strongly suspect my nested loop assignment with loc to be the culprit. The whole function takes about 9.3 seconds to execute and has to be performed 144 times in total (i.e. ~22 minutes).
Is there a way to vectorize or otherwise optimize the assignment I'm trying to do?