By using this Python code (I'm working with Python 3.6):
length = 4
overall = [["row" + str(length + 1)] +
           [1.0] + [0.0] * (length - 1)]
for i in range(1, length):
    overall += [["row" + str(i + length + 1)] +
                [0.0] * i + [1.0] + [0.0] * (length - (i + 1))]
I obtain the following list of lists:
OUTPUT 1:
overall = [['row5', 1.0, 0.0, 0.0, 0.0],
['row6', 0.0, 1.0, 0.0, 0.0],
['row7', 0.0, 0.0, 1.0, 0.0],
['row8', 0.0, 0.0, 0.0, 1.0]]
Now, I'd like to parametrize the piece of code above.
Given a parameter, for example, n_repetitions = 3, I'd like to obtain:
OUTPUT 2:
overall = [['row5', 1.0, 0.0, 0.0, 0.0, 1.0, 0.0, 0.0, 0.0, 1.0, 0.0, 0.0, 0.0],
['row6', 0.0, 1.0, 0.0, 0.0, 0.0, 1.0, 0.0, 0.0, 0.0, 1.0, 0.0, 0.0],
['row7', 0.0, 0.0, 1.0, 0.0, 0.0, 0.0, 1.0, 0.0, 0.0, 0.0, 1.0, 0.0],
['row8', 0.0, 0.0, 0.0, 1.0, 0.0, 0.0, 0.0, 1.0, 0.0, 0.0, 0.0, 1.0]]
where, in each row, the initial group of 4 numbers has been repeated n_repetitions times (3, in this example).
What is a good way to do that automatically (e.g. with a for loop, a list comprehension, ...)?
Yes, you can use a list comprehension plus list addition/multiplication, like so:
overall = [['row5', 1.0, 0.0, 0.0, 0.0],
['row6', 0.0, 1.0, 0.0, 0.0],
['row7', 0.0, 0.0, 1.0, 0.0],
['row8', 0.0, 0.0, 0.0, 1.0]]
repeat = 3  # number of repetitions (n_repetitions in the question)
overall = [[row[0]] + row[1:] * repeat for row in overall]
I want to confirm. Following your code, I get this output:
overall = [['row5', 1.0, 0.0, 0.0, 0.0],
['row6', 0.0, 1.0, 0.0, 0.0],
['row7', 0.0, 0.0, 1.0, 0.0],
['row8', 0.0, 0.0, 0.0, 1.0]]
What output are you hoping for?
overall = [["row5"], [1], [0], [0], [0],
["row6"], [0], [1], [0], [0],
["row7"], [0], [0], [1], [0],
["row8"], [0], [0], [0], [1]]
This?
length = 4
n_repetitions = 3
arr = [1.0] + [0.0] * (length - 1)
overall = [["row" + str(length + 1)] + arr * n_repetitions]
for i in range(1, length):
    unit = [0.0] * i + [1.0] + [0.0] * (length - (i + 1))
    overall += [["row" + str(i + length + 1)] + unit * n_repetitions]
overall is a list of lists
type(overall)
list
In matrix terms, this is your identity matrix, i.e. each row without its first column:
identity = [l[1:] for l in overall]
And this is your label column:
labels = [[l[0]] for l in overall]
You can then isolate the first element of each list and repeat the rest:
n_repetitions = 3
result = [[l[0]] + l[1:]*n_repetitions for l in overall]
result
[['row5', 1.0, 0.0, 0.0, 0.0, 1.0, 0.0, 0.0, 0.0, 1.0, 0.0, 0.0, 0.0],
['row6', 0.0, 1.0, 0.0, 0.0, 0.0, 1.0, 0.0, 0.0, 0.0, 1.0, 0.0, 0.0],
['row7', 0.0, 0.0, 1.0, 0.0, 0.0, 0.0, 1.0, 0.0, 0.0, 0.0, 1.0, 0.0],
['row8', 0.0, 0.0, 0.0, 1.0, 0.0, 0.0, 0.0, 1.0, 0.0, 0.0, 0.0, 1.0]]
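The rows can also be generated directly with the repetition built in, instead of post-processing OUTPUT 1. The following is a minimal sketch; the helper name `build_rows` is made up for illustration, and the "row" labels follow the numbering used in the question (starting at `length + 1`).

```python
def build_rows(length, n_repetitions=1):
    """Build `length` labelled rows, each holding a one-hot block
    of size `length` repeated `n_repetitions` times."""
    rows = []
    for i in range(length):
        unit = [0.0] * length
        unit[i] = 1.0  # the 1.0 moves one position to the right per row
        label = "row" + str(i + length + 1)
        rows.append([label] + unit * n_repetitions)
    return rows


for row in build_rows(4, n_repetitions=3):
    print(row)
```

With `n_repetitions=1` this reproduces OUTPUT 1, and with `n_repetitions=3` it reproduces OUTPUT 2.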
The string to be converted is:
strng= "0.000000000000000E+000, 2*2.400000000000000E-002 , 97*0.000000000000000E+000 ,"
which I was able to split using strng.split(',')
[' 0.000000000000000E+000', ' 2*2.400000000000000E-002 ', ' 97*0.000000000000000E+000 ', '\n']
However, ' 2*2.400000000000000E-002 ' is not meant as a single string but as two elements of a list,
and ' 97*0.000000000000000E+000 ' as 97 elements of a list.
Here is an ugly attempt to accomplish this:
a = strng.split(',')
lst = []
for item in a:
    print("item=", item)
    i = item.split("*")
    if len(i) > 1:
        print(int(i[0]) * i[1])
        lst.append(int(i[0]) * i[1])
    else:
        lst.append(i[0])
    print("================")
print(lst)
What is a more elegant way to accomplish this?
Considering
string = "0.000000000000000E+000, 2*2.400000000000000E-002 , 97*0.000000000000000E+000 ,"
You can apply the following steps:
Split and strip the original string:
# Split by commas
expressions = string.split(',')
# Remove leading and trailing white space of each element
expressions = [expression.strip() for expression in expressions]
# Remove empty elements
expressions = [expression for expression in expressions if expression]
or, alternatively,
expressions = [expression.strip() for expression in string.split(',') if expression.strip()]
This will evaluate to
['0.000000000000000E+000', '2*2.400000000000000E-002', '97*0.000000000000000E+000']
Create a new list and evaluate each expression:
result = []
for expression in expressions:
    # Split by the multiplication sign
    operands = expression.split('*')
    # A single element, just add to `result`
    if len(operands) == 1:
        result.append(float(operands[0]))
    # Two operands, add to `result` repeatedly
    elif len(operands) == 2:
        o1, o2 = operands
        o1 = int(o1)
        o2 = float(o2)
        for _ in range(o1):
            result.append(o2)
This will evaluate to
[0.0, 0.024, 0.024, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0]
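The two steps above (clean the tokens, then expand each one) can be folded into a single function. This is a compact sketch of the same idea, not a different algorithm; the name `expand_repeats` is made up here, and it assumes every token is either `value` or `count*value`.

```python
def expand_repeats(text):
    """Expand a comma-separated string of 'value' and 'count*value' tokens
    into a flat list of floats."""
    values = []
    for token in text.split(","):
        token = token.strip()
        if not token:
            continue  # skip the empty piece after a trailing comma
        # rpartition gives ('', '', token) when there is no '*'
        count, sep, value = token.rpartition("*")
        repeat = int(count) if sep else 1
        values.extend([float(value)] * repeat)
    return values


string = "0.000000000000000E+000, 2*2.400000000000000E-002 , 97*0.000000000000000E+000 ,"
result = expand_repeats(string)
print(len(result), result[:4])
```

List multiplication replaces the inner `for _ in range(o1)` loop; since floats are immutable, repeating the same object is harmless.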
Here is another way to do this with string manipulation and eval, shown only to help you understand how it works. Note that eval executes arbitrary code, so it should only be used on trusted input.
string = "0.000000000000000E+000, 2*2.400000000000000E-002 , 97*0.000000000000000E+000 ,"
out = [eval(i.strip().replace('*','*[')+']' if '*' in i else '['+i.strip()+']') for i in string.split(',') if len(i)>0]
print(out)
[[0.0], [0.024, 0.024], [0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0]]
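The result above is nested, with one sublist per token. If a single flat list is wanted, the groups can be chained together. A small sketch using a shortened stand-in for `out` (the variable `groups` is illustrative, not the full data):

```python
from itertools import chain

# Shortened stand-in for the nested `out` list shown above
groups = [[0.0], [0.024, 0.024], [0.0] * 3]

# chain.from_iterable concatenates the sublists lazily; list() materialises them
flat = list(chain.from_iterable(groups))
print(flat)
```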
Assume that I have the following dictionary:
scenario_summary = {'Day1': {'22459-22585': 0.0, '22585-22711': 0.0, '22711-22837': 0.0, '22837-22963': 0.0, '22963-23089': 0.0, '23089-23215': 0.0, '23215-23341': 0.0, '23341-23467': 0.0, '23467-23593': 0.0, '23593-23719': 0.0, '23719-23845': 0.0, '23845-23971': 0.0, '23971-24097': 0.0, '24097-24223': 0.0, '24223-24349': 0.0, '24349-24475': 0.0, '24475-24601': 0.0, '24601-24727': 0.0, '24727-24853': 0.0, '24853-24979': 0.0, '24979-25105': 0.0, '25105-25231': 0.0, '25231-25357': 0.0, '25357-25483': 0.0, '25483-25609': 0.0, '25609-25735': 0.0, '25735-25861': 0.0, '25861-25987': 0.0, '25987-26113': 1.0, '26113-26239': 1.0, '26239-26365': 0.0, '26365-26491': 2.0, '26491-26617': 5.0, '26617-26743': 5.0, '26743-26869': 5.0, '26869-26995': 12.0, '26995-27121': 19.0, '27121-27247': 7.000000000000001, '27247-27373': 11.0, '27373-27499': 15.0, '27499-27625': 7.000000000000001, '27625-27751': 4.0, '27751-27877': 4.0, '27877-28003': 2.0, '28003-28129': 0.0, '28129-28255': 0.0, '28255-28381': 0.0, '28381-28507': 0.0, '28507-28633': 0.0, '28633-28759': 0.0, '28759-28885': 0.0, '28885-29011': 0.0, '29011-29137': 0.0, '29137-29263': 0.0, '29263-29389': 0.0, '29389-29515': 0.0, '29515-29641': 0.0, '29641-29767': 0.0, '29767-29893': 0.0, '29893-30019': 0.0, '30019-30145': 0.0, '30145-30271': 0.0, '30271-30397': 0.0, '30397-30523': 0.0, '30523-30649': 0.0, '30649-30775': 0.0, '30775-30901': 0.0, '30901-31027': 0.0, '31027-31153': 0.0, '31153-31279': 0.0, '31279-31405': 0.0, '31405-31531': 0.0, '31531-31657': 0.0, '31657-31783': 0.0, '31783-31909': 0.0, '31909-32035': 0.0, '32035-32161': 0.0, '32161-32287': 0.0, '32287-32413': 0.0, '32413-32539': 0.0, '32539-32665': 0.0, '32665-32791': 0.0, '32791-32917': 0.0, '32917-33043': 0.0, '33043-33169': 0.0, '33169-33295': 0.0, '33295-33421': 0.0, '33421-33547': 0.0, '33547-33673': 0.0, '33673-33799': 0.0, '33799-33925': 0.0, '33925-34051': 0.0, '34051-34177': 0.0, '34177-34303': 0.0, '34303-34429': 0.0, '34429-34555': 0.0, '34555-34681': 
0.0, '34681-34807': 0.0}}
As you can see, the dictionary maps ranges of values (stored as strings) to their frequencies. I would like to plot this as a histogram, but I don't know how to transform the strings into a form that pandas or plotly would understand. What would your approach be? Is there an easier way to do it than hardcoding things? Or would another module be an easier option?
Thanks!
Since the bins (ranges) are already defined and their counts are already aggregated at an initial level, maybe it can help if you build something that overlays a histogram (distribution) on the top of the existing bin ranges:
import matplotlib
import pandas as pd
%matplotlib inline

def plot_hist(bins, input_dict):
    # One row per original range; the range labels end up in the 'index' column
    df1 = pd.DataFrame(input_dict).reset_index()
    df1['min'] = df1['index'].apply(lambda x: x.split('-')[0]).astype(int)
    df1['max'] = df1['index'].apply(lambda x: x.split('-')[1]).astype(int)
    # Assign each original range to one of `bins` groups
    df1['group'] = pd.cut(df1['max'], bins, labels=False)
    df2 = df1.groupby('group')[['Day1', 'min', 'max']].agg({'min': 'min', 'max': 'max', 'Day1': 'sum'}).reset_index()
    df2['range_new'] = df2['min'].astype(str) + '-' + df2['max'].astype(str)
    df2.plot(x='range_new', y='Day1', kind='bar')
...and call the function with a number of bins smaller than the 98 ranges already in the dictionary. For example, to aggregate the distribution into 20 groups:
plot_hist(20,scenario_summary)
hope it helps...
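If pandas is not at hand, the same coarsening idea can be sketched in plain Python. This is an illustrative variant, not the function above: the helper name `merge_bins` and its `group_size` parameter are made up here, and it merges a fixed number of adjacent ranges rather than cutting on value as `pd.cut` does.

```python
def merge_bins(counts_by_label, group_size):
    """Merge every `group_size` adjacent 'low-high' ranges into one,
    summing their counts."""
    items = []
    for label, count in counts_by_label.items():
        low, high = label.split("-")
        items.append((int(low), int(high), count))
    items.sort()  # order ranges by their lower bound

    merged = {}
    for start in range(0, len(items), group_size):
        chunk = items[start:start + group_size]
        new_label = "{}-{}".format(chunk[0][0], chunk[-1][1])
        merged[new_label] = sum(count for _, _, count in chunk)
    return merged


sample = {'26869-26995': 12.0, '26995-27121': 19.0, '27121-27247': 7.0}
print(merge_bins(sample, 2))
```

The merged dictionary has the same shape as the inner dictionary of `scenario_summary`, so it can be plotted the same way as the original ranges.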
A histogram is basically a simple bar chart, where each bar represents a bin (usually in the form of a range) and the frequency of the elements that fall into that bin.
This is exactly the data that you already have. So instead of computing values for a histogram (as plt.hist would do), you can simply pass your data to plt.bar as it is.
The code with your data, as an MCVE:
import matplotlib.pyplot as plt
scenario_summary = { 'Day1': {
'22459-22585': 0.0, '22585-22711': 0.0, '22711-22837': 0.0,
'22837-22963': 0.0, '22963-23089': 0.0, '23089-23215': 0.0,
'23215-23341': 0.0, '23341-23467': 0.0, '23467-23593': 0.0,
'23593-23719': 0.0, '23719-23845': 0.0, '23845-23971': 0.0,
'23971-24097': 0.0, '24097-24223': 0.0, '24223-24349': 0.0,
'24349-24475': 0.0, '24475-24601': 0.0, '24601-24727': 0.0,
'24727-24853': 0.0, '24853-24979': 0.0, '24979-25105': 0.0,
'25105-25231': 0.0, '25231-25357': 0.0, '25357-25483': 0.0,
'25483-25609': 0.0, '25609-25735': 0.0, '25735-25861': 0.0,
'25861-25987': 0.0, '25987-26113': 1.0, '26113-26239': 1.0,
'26239-26365': 0.0, '26365-26491': 2.0, '26491-26617': 5.0,
'26617-26743': 5.0, '26743-26869': 5.0, '26869-26995': 12.0,
'26995-27121': 19.0, '27121-27247': 7.0, '27247-27373': 11.0,
'27373-27499': 15.0, '27499-27625': 7.0, '27625-27751': 4.0,
'27751-27877': 4.0, '27877-28003': 2.0, '28003-28129': 0.0,
'28129-28255': 0.0, '28255-28381': 0.0, '28381-28507': 0.0,
'28507-28633': 0.0, '28633-28759': 0.0, '28759-28885': 0.0,
'28885-29011': 0.0, '29011-29137': 0.0, '29137-29263': 0.0,
'29263-29389': 0.0, '29389-29515': 0.0, '29515-29641': 0.0,
'29641-29767': 0.0, '29767-29893': 0.0, '29893-30019': 0.0,
'30019-30145': 0.0, '30145-30271': 0.0, '30271-30397': 0.0,
'30397-30523': 0.0, '30523-30649': 0.0, '30649-30775': 0.0,
'30775-30901': 0.0, '30901-31027': 0.0, '31027-31153': 0.0,
'31153-31279': 0.0, '31279-31405': 0.0, '31405-31531': 0.0,
'31531-31657': 0.0, '31657-31783': 0.0, '31783-31909': 0.0,
'31909-32035': 0.0, '32035-32161': 0.0, '32161-32287': 0.0,
'32287-32413': 0.0, '32413-32539': 0.0, '32539-32665': 0.0,
'32665-32791': 0.0, '32791-32917': 0.0, '32917-33043': 0.0,
'33043-33169': 0.0, '33169-33295': 0.0, '33295-33421': 0.0,
'33421-33547': 0.0, '33547-33673': 0.0, '33673-33799': 0.0,
'33799-33925': 0.0, '33925-34051': 0.0, '34051-34177': 0.0,
'34177-34303': 0.0, '34303-34429': 0.0, '34429-34555': 0.0,
'34555-34681': 0.0, '34681-34807': 0.0}}
data = scenario_summary['Day1']
x = range(len(data))
y = list(data.values())
plt.figure(figsize=(16, 9))
plt.bar(x, y)
plt.subplots_adjust(bottom=0.2)
plt.xticks(x, data.keys(), rotation='vertical')
plt.show()
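The bar chart above places the bars at positions 0, 1, 2, ... and uses the range strings only as tick labels. If a numeric x axis is preferred, the labels can be parsed into edges first. A sketch under the assumption that every key has the form `low-high` with integer bounds; `parse_bins` and `plot_bins` are illustrative names, and matplotlib is imported lazily so the parsing part runs without it.

```python
def parse_bins(counts_by_label):
    """Turn {'low-high': count} into a sorted list of (low, high, count)."""
    bins = []
    for label, count in counts_by_label.items():
        low, high = label.split("-")
        bins.append((int(low), int(high), count))
    return sorted(bins)


def plot_bins(counts_by_label):
    """Draw each range as a bar spanning its real numeric extent."""
    import matplotlib.pyplot as plt  # only needed when actually plotting

    bins = parse_bins(counts_by_label)
    lefts = [low for low, _, _ in bins]
    widths = [high - low for low, high, _ in bins]
    heights = [count for _, _, count in bins]
    # align='edge' anchors each bar at its left edge instead of its centre
    plt.bar(lefts, heights, width=widths, align="edge")
    plt.xlabel("range")
    plt.ylabel("frequency")
    plt.show()


sample = {'26995-27121': 19.0, '26869-26995': 12.0, '27121-27247': 7.0}
print(parse_bins(sample))
```

Calling `plot_bins(scenario_summary['Day1'])` would then draw the full data set on a numeric axis.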
You can use the pandas module to convert the dictionary into a data frame:
import pandas as pd
import matplotlib.pyplot as plt
scenario_summary = {'Day1': {'22459-22585': 0.0, '22585-22711': 0.0, '22711-22837': 0.0,
'22837-22963': 0.0, '22963-23089': 0.0, '23089-23215': 0.0,
'23215-23341': 0.0, '23341-23467': 0.0, '23467-23593': 0.0,
'23593-23719': 0.0, '23719-23845': 0.0, '23845-23971': 0.0,
'23971-24097': 0.0, '24097-24223': 0.0, '24223-24349': 0.0,
'24349-24475': 0.0, '24475-24601': 0.0, '24601-24727': 0.0,
'24727-24853': 0.0, '24853-24979': 0.0, '24979-25105': 0.0,
'25105-25231': 0.0, '25231-25357': 0.0, '25357-25483': 0.0,
'25483-25609': 0.0, '25609-25735': 0.0, '25735-25861': 0.0,
'25861-25987': 0.0, '25987-26113': 1.0, '26113-26239': 1.0,
'26239-26365': 0.0, '26365-26491': 2.0, '26491-26617': 5.0,
'26617-26743': 5.0, '26743-26869': 5.0, '26869-26995': 12.0,
'26995-27121': 19.0, '27121-27247': 7.000000000000001, '27247-27373': 11.0,
'27373-27499': 15.0, '27499-27625': 7.000000000000001, '27625-27751': 4.0,
'27751-27877': 4.0, '27877-28003': 2.0, '28003-28129': 0.0,
'28129-28255': 0.0, '28255-28381': 0.0, '28381-28507': 0.0,
'28507-28633': 0.0, '28633-28759': 0.0, '28759-28885': 0.0,
'28885-29011': 0.0, '29011-29137': 0.0, '29137-29263': 0.0,
'29263-29389': 0.0, '29389-29515': 0.0, '29515-29641': 0.0,
'29641-29767': 0.0, '29767-29893': 0.0, '29893-30019': 0.0,
'30019-30145': 0.0, '30145-30271': 0.0, '30271-30397': 0.0,
'30397-30523': 0.0, '30523-30649': 0.0, '30649-30775': 0.0,
'30775-30901': 0.0, '30901-31027': 0.0, '31027-31153': 0.0,
'31153-31279': 0.0, '31279-31405': 0.0, '31405-31531': 0.0,
'31531-31657': 0.0, '31657-31783': 0.0, '31783-31909': 0.0,
'31909-32035': 0.0, '32035-32161': 0.0, '32161-32287': 0.0,
'32287-32413': 0.0, '32413-32539': 0.0, '32539-32665': 0.0,
'32665-32791': 0.0, '32791-32917': 0.0, '32917-33043': 0.0,
'33043-33169': 0.0, '33169-33295': 0.0, '33295-33421': 0.0,
'33421-33547': 0.0, '33547-33673': 0.0, '33673-33799': 0.0,
'33799-33925': 0.0, '33925-34051': 0.0, '34051-34177': 0.0,
'34177-34303': 0.0, '34303-34429': 0.0, '34429-34555': 0.0,
'34555-34681': 0.0, '34681-34807': 0.0}}
# convert to data frame
data_frame = pd.DataFrame.from_dict(scenario_summary)
# plot data
plt.hist(data_frame['Day1'], density=1, bins=20)
plt.show()
How can I make every number in this list positive?
l =[[u'Contribution', -2.6, -2.6, -2.6, -1.3, 0.0, 0.0,
0.0, 0.0, 0.0, 0.0, 0.0, 0.0, -9.100000000000001], [u'Tax ',
-569.72, -569.72, -569.72, -284.86, 0.0, 0.0, 0.0, 0.0,
0.0, 0.0, 0.0, 0.0, -1994.02], [u'CityTax', -387.32,
-387.32, -387.32, -193.66, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0,
0.0,-1355.6200000000001]]
The output should look like this:
[[u'Contribution', 2.6, 2.6, 2.6, 1.3, 0.0, 0.0,
0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 9.100000000000001], [u'Tax ',
569.72, 569.72, 569.72, 284.86, 0.0, 0.0, 0.0, 0.0,
0.0, 0.0, 0.0, 0.0, 1994.02], [u'CityTax', 387.32,
387.32, 387.32, 193.66, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0,
0.0,1355.6200000000001]]
Negative values should be converted to positive ones, and the structure of the list should stay as it is.
I've tried this:
p = []
k = []
for i in l:
    p.append([abs(x) for x in i[1:]])
    k.append(i[0])
for j in p:
    j.append(k[j])
This comprehension will work:
[[x[0]] + [abs(y) for y in x[1:]] for x in l]
As an alternative to the inner comprehension, you can use map:
[[x[0]] + list(map(abs, x[1:])) for x in l]
Or, most generically:
[[abs(y) if hasattr(y, '__abs__') else y for y in x] for x in l]
Another option, if non-number elements are spread across the sublists (also avoids slicing):
[[abs(x) if isinstance(x,(int,float)) else x for x in sublist] for sublist in l]
Your method is also close to correct; we would just need to join p and k:
[[k_sub]+p_sub for k_sub,p_sub in zip(k,p)]
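The type-checking variant can be wrapped in a small function. This is a sketch, with `abs_numbers` as a made-up name; it uses `numbers.Real` so that ints, floats and similar types are covered, and it leaves booleans alone because `bool` is a subclass of `int` in Python.

```python
from numbers import Real


def abs_numbers(rows):
    """Return a copy of `rows` with every real number replaced by its
    absolute value; strings and other objects are kept unchanged."""
    return [
        [abs(v) if isinstance(v, Real) and not isinstance(v, bool) else v
         for v in row]
        for row in rows
    ]


l = [['Contribution', -2.6, -1.3, 0.0], ['Tax ', -569.72, 0.0, -1994.02]]
print(abs_numbers(l))
```

The original list is not modified; a new list of new sublists is returned.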
for example:
a = [[0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0],
[0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0],
[0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0],
[0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0],
[0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0],
[0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0],
[0.0, 0.0, 1.2852976787772832, 0.7965388321000092, 0.0, 0.0, 0.0, 0.0, 0.0],
[0.0, 0.0, 0.0, 6.409872844109646, 0.17506688391255013, 0.0, 0.0, 0.0, 0.0],
[0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0],
[0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0],
[0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0],
[0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0],
[0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0]]
print max(max(a))
print max(a)
The result is:
1.28529767878
[0.0, 0.0, 1.2852976787772832, 0.7965388321000092, 0.0, 0.0, 0.0, 0.0, 0.0]
This is clearly not what I want; the max value should be 6.409872844109646. Flattening the list first gives the expected value:
b = []
for i in a:
    b.extend(i)
print max(b)
6.40987284411
This is Python 2.7, CPython.
Thank you very much.
Lists are compared element-wise (lexicographically): the first position where two lists differ decides which one is greater. Since 1.2852976787772832 sits one index before 6.409872844109646 in the candidate sublists, the list containing the former gets picked as the maximum.
At that index the second list has a 0.0, and 1.2852976787772832 is clearly greater than 0.0:
[0.0, 0.0, 1.2852976787772832, 0.7965388321000092, 0.0, 0.0, 0.0, 0.0, 0.0],
[0.0, 0.0, 0.0, 6.409872844109646, 0.17506688391255013, 0.0, 0.0, 0.0, 0.0]
# ^ here's your tie-breaker
In fact, the next index containing 6.4... is never checked.
I'm not sure how you expect the maximum sublist to be selected: sublist with maximum sum, sublist containing maximum number? You'll have to code the behavior you want if the default behavior does not cut it.
Moses already explained why you got the wrong result: when comparing lists, the first element that is greater than its counterpart "wins".
To get the maximum value you have to flatten your list:
print(max(x for l in a for x in l))
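To make the difference concrete, here is a short sketch on a reduced version of the data (the two-row list is illustrative). It shows the default lexicographic comparison next to three explicit choices: the largest value overall, the row containing the largest value, and the row with the largest sum.

```python
a = [[0.0, 1.5, 0.2],
     [0.0, 0.0, 6.4]]

# Default: rows are compared position by position, so 1.5 > 0.0 decides
lexicographic_max = max(a)

# Largest single value across all rows
largest_value = max(x for row in a for x in row)

# Row that contains the largest value, and row with the largest sum
row_with_largest_value = max(a, key=max)
row_with_largest_sum = max(a, key=sum)

print(lexicographic_max)       # [0.0, 1.5, 0.2]
print(largest_value)           # 6.4
print(row_with_largest_value)  # [0.0, 0.0, 6.4]
```

Passing a `key` function is the usual way to tell `max` which notion of "largest row" is meant.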
I'm trying to use scipy.optimize.curve_fit to estimate the frequency and phase of an on/off sequence.
This is the code I'm using:
from numpy import *
from scipy import optimize
row = array([0.0, 0.0, 0.0, 0.0, 0.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 1.0, 1.0, 1.0, 1.0, 0.0, 1.0, 1.0, 1.0, 1.0, 1.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 1.0, 1.0, 0.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 1.0, 0.0, 1.0, 1.0, 1.0, 1.0, 1.0, 0.0, 1.0, 1.0, 0.0, 0.0,])
def fit_func(x, a, b, c, d):
    return c * sin(a * x + b) + d
p0 = [(pi/10.0), 5.0, row.std(), row.mean()]
result = optimize.curve_fit(fit_func, arange(len(row)), row, p0)
print result
This works. But on some rows, even though they seem perfectly ok, it fails.
Example of failing row:
row = array([1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0,])
The error is:
RuntimeError: Optimal parameters not found: Both actual and predicted relative reductions in the sum of squares are at most 0.000000 and the relative error between two consecutive iterates is at most 0.000000
Which tells me very little about what's happened.
A quick test shows that varying the parameters in p0 will cause that row to succeed... and others to fail. Why is that?
I tried both rows of data that you provided and both worked for me just fine. I'm using Scipy 0.8.0rc3. What version are you using? Another thing that might help is to set c and d to fixed values since they really should be the same every time. I set c to 0.6311786 and d to .5. You could also use an fft with zero padding and quadratic fitting around the peak to find the frequency if you want another method. Really, any pitch estimation method is applicable since you are looking for the fundamental frequency.
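The zero-padded FFT idea can be sketched without any dependencies. The code below is an illustration under assumptions, not the answerer's implementation: it evaluates a plain DFT on a fine frequency grid (which is what zero padding an FFT amounts to, only slower), then refines the peak with a parabola through its two neighbours. The function name `estimate_frequency` and the `n_grid` parameter are made up here.

```python
import cmath
import math


def estimate_frequency(samples, n_grid=2048):
    """Estimate the dominant frequency of `samples` in cycles per sample."""
    n = len(samples)
    mean = sum(samples) / n
    centered = [s - mean for s in samples]  # remove the DC offset

    def magnitude(k):
        # DFT magnitude at frequency k / n_grid cycles per sample
        w = -2j * math.pi * k / n_grid
        return abs(sum(c * cmath.exp(w * t) for t, c in enumerate(centered)))

    mags = [magnitude(k) for k in range(n_grid // 2)]
    peak = max(range(1, len(mags) - 1), key=mags.__getitem__)

    # Parabolic (quadratic) interpolation around the peak
    left, mid, right = mags[peak - 1], mags[peak], mags[peak + 1]
    denom = left - 2 * mid + right
    shift = 0.5 * (left - right) / denom if denom else 0.0
    return (peak + shift) / n_grid


# Synthetic on/off sequence: 10 samples on, 10 samples off (period 20)
square = [1.0 if (t // 10) % 2 == 0 else 0.0 for t in range(100)]
freq = estimate_frequency(square)
print("period estimate:", 1.0 / freq)
```

The estimate could serve as a starting value for the fit: the parameter `a` in `fit_func` is an angular frequency, so `a` would be roughly `2 * pi * freq`.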