How to make a scatter plot using dictionary? - python

I have the following dictionary of keys and values as lists:
comp = {
0: [0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0],
1: [0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0],
2: [0.2073837448663338, 0.19919737000568305, 0.24386659105843467, 0.25659375810265855, 0.0, 0.2703400161511446, 0.0, 0.0],
3: [0.2752555116304319, 0.19919737000568305, 0.21704752129294347, 0.25659375810265855, 0.0, 0.2703400161511446, 0.0, 0.0],
4: [0.2752555116304319, 0.19919737000568305, 0.21782751590851177, 0.25659375810265855, 0.0, 0.2703400161511446, 0.0, 0.0],
5: [0.2752555116304319, 0.19919737000568305, 0.21782751590851177, 0.25659375810265855, 0.0, 0.2703400161511446, 0.0, 0.0],
6: [0.2752555116304319, 0.19919737000568305, 0.21782751590851177, 0.25659375810265855, 0.0, 0.2691379068024452, 0.0, 0.0],
7: [0.2752555116304319, 0.19919737000568305, 0.21782751590851177, 0.25659375810265855, 0.0, 0.2691379068024452, 0.0, 0.0]
}
Each list holds 8 values, one per node/person. The keys can be thought of as time-stamps, so the values are recorded for 8 nodes/persons at time-stamps 0 to 7.
I want to draw a scatter plot with the time-stamps on the x-axis and the values on the y-axis, where each point is a node/person placed at its corresponding x and y.
The plot should form a cluster of 8 points (nodes) on each time-stamp. I have the following code that partly works, but I think it takes the average of all the 8 values in each list and plots the points as one in the time-stamps:
import pylab
import matplotlib.pyplot as plt
for key in comp:
    #print(key)
    for idx, item in enumerate(comp[key]):
        x = idx
        y = item
        if idx == 0:
            pylab.scatter(x, y, label=key)
        else:
            pylab.scatter(x, y)
pylab.legend()
pylab.show()
Not sure how to create the cluster that I want. Any help is appreciated.
(Using Ubuntu 14.04 32-Bit VM and Python 2.7)

I think you are slightly overcomplicating it. If you loop over the keys of the dictionary, you can get the values with comp[key_name] and pass them straight to plt.scatter(). Because scatter needs one x for every y, you have to repeat the key 8 times using [key] * 8 in order to pass the whole list of values at once:
import matplotlib.pyplot as plt
comp = {
0: [0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0],
1: [0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0],
2: [0.2073837448663338, 0.19919737000568305, 0.24386659105843467, 0.25659375810265855, 0.0, 0.2703400161511446, 0.0, 0.0],
3: [0.2752555116304319, 0.19919737000568305, 0.21704752129294347, 0.25659375810265855, 0.0, 0.2703400161511446, 0.0, 0.0],
4: [0.2752555116304319, 0.19919737000568305, 0.21782751590851177, 0.25659375810265855, 0.0, 0.2703400161511446, 0.0, 0.0],
5: [0.2752555116304319, 0.19919737000568305, 0.21782751590851177, 0.25659375810265855, 0.0, 0.2703400161511446, 0.0, 0.0],
6: [0.2752555116304319, 0.19919737000568305, 0.21782751590851177, 0.25659375810265855, 0.0, 0.2691379068024452, 0.0, 0.0],
7: [0.2752555116304319, 0.19919737000568305, 0.21782751590851177, 0.25659375810265855, 0.0, 0.2691379068024452, 0.0, 0.0]
}
for key in comp:
    plt.scatter([key]*8, comp[key], label=key)
plt.legend()
plt.show()
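The [key] * 8 repetition assumes that every list has exactly 8 values. As a minimal sketch that avoids the hardcoded length, the dictionary can first be flattened into two parallel lists of coordinates. The shortened comp below is only a stand-in for the real data, and the names xs and ys are illustrative:

```python
# Shortened stand-in for the real dictionary: timestamp -> values per node.
comp = {0: [0.0, 0.1], 1: [0.2, 0.3], 2: [0.4, 0.5]}

# Repeat each timestamp once per value, so xs and ys line up pairwise.
xs = [timestamp for timestamp, values in comp.items() for _ in values]
ys = [value for values in comp.values() for value in values]

print(list(zip(xs, ys)))
```

Passing xs and ys to a single plt.scatter(xs, ys) call would then draw every point at once, at the cost of losing the per-timestamp legend entries.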
Update: To get the colors you want, you can do the following, which is a modified version of the answer given by @lkriener:
import numpy as np
array = np.zeros((8,8))
for key in comp:
    array[:,key] = comp[key]
x = range(8)
for i in range(8):
    plt.scatter(x, array[i,:], label=i)
plt.legend()
plt.show()
Which gives the figure:
You can move the legend by passing arguments to plt.legend(). The most important ones are loc and bbox_to_anchor, both of which are described in the matplotlib documentation for legend.
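As a rough sketch of how loc and bbox_to_anchor work together, the following places the legend just outside the right edge of the axes. The data is made up for illustration, and the Agg backend is selected only so that the snippet also runs without a display:

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend; drop this line for interactive use
import matplotlib.pyplot as plt

fig, ax = plt.subplots()
for node in range(3):
    # Made-up values: one horizontal row of points per node.
    ax.scatter(range(4), [node * 0.1] * 4, label="node {}".format(node))

# loc names the corner of the legend box that gets pinned;
# bbox_to_anchor gives the point (in axes coordinates) it is pinned to.
legend = ax.legend(loc="upper left", bbox_to_anchor=(1.02, 1.0), borderaxespad=0.0)
```

When saving such a figure, fig.savefig(..., bbox_inches="tight") is one way to keep a legend that sits outside the axes from being cropped.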

A slight alternative here, if you're able to use a few other modules. A standard scatter plot is useful, but your data features a large number of overlapping points, which aren't visible in the final graph. For this, seaborn's swarmplot might be useful.
To make life a little easier, I use pandas to reshape the data into a DataFrame and then call swarmplot directly:
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns
comp = {
'0': [0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0],
'1': [0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0],
'2': [0.2073837448663338, 0.19919737000568305, 0.24386659105843467, 0.25659375810265855, 0.0, 0.2703400161511446, 0.0, 0.0],
'3': [0.2752555116304319, 0.19919737000568305, 0.21704752129294347, 0.25659375810265855, 0.0, 0.2703400161511446, 0.0, 0.0],
'4': [0.2752555116304319, 0.19919737000568305, 0.21782751590851177, 0.25659375810265855, 0.0, 0.2703400161511446, 0.0, 0.0],
'5': [0.2752555116304319, 0.19919737000568305, 0.21782751590851177, 0.25659375810265855, 0.0, 0.2703400161511446, 0.0, 0.0],
'6': [0.2752555116304319, 0.19919737000568305, 0.21782751590851177, 0.25659375810265855, 0.0, 0.2691379068024452, 0.0, 0.0],
'7': [0.2752555116304319, 0.19919737000568305, 0.21782751590851177, 0.25659375810265855, 0.0, 0.2691379068024452, 0.0, 0.0],
}
df = pd.DataFrame.from_dict(comp, orient='index')
df.index.rename('Observation', inplace=True)
stacked = df.stack().reset_index()
stacked.rename(columns={'level_1': 'Person', 0: 'Value'}, inplace=True)
sns.swarmplot(data=stacked, x='Observation', y='Value', hue='Person')
plt.show()
This gives the following plot:

To plot the values of the same node in the same color you could do something like this:
import numpy as np
import matplotlib.pyplot as plt
comp = {
'0': [0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0],
'1': [0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0],
'2': [0.2073837448663338, 0.19919737000568305, 0.24386659105843467, 0.25659375810265855, 0.0, 0.2703400161511446, 0.0, 0.0],
'3': [0.2752555116304319, 0.19919737000568305, 0.21704752129294347, 0.25659375810265855, 0.0, 0.2703400161511446, 0.0, 0.0],
'4': [0.2752555116304319, 0.19919737000568305, 0.21782751590851177, 0.25659375810265855, 0.0, 0.2703400161511446, 0.0, 0.0],
'5': [0.2752555116304319, 0.19919737000568305, 0.21782751590851177, 0.25659375810265855, 0.0, 0.2703400161511446, 0.0, 0.0],
'6': [0.2752555116304319, 0.19919737000568305, 0.21782751590851177, 0.25659375810265855, 0.0, 0.2691379068024452, 0.0, 0.0],
'7': [0.2752555116304319, 0.19919737000568305, 0.21782751590851177, 0.25659375810265855, 0.0, 0.2691379068024452, 0.0, 0.0],
}
array = np.zeros([8,8])
for i, key in enumerate(comp.keys()):
    for j in range(8):
        array[j, i] = comp[key][j]
plt.xlim((-1,8))
plt.ylim((-0.05,0.3))
plt.xlabel('timestamps')
plt.ylabel('values of nodes')
for i in range(8):
    plt.plot(range(8), array[i], ls='--', marker='o', label='node {}'.format(i))
plt.legend(loc='upper left')
plt.savefig('temp.png')
plt.show()
This would give you the following picture:
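The nested loop that fills the array can also be written as a single transpose. This is only a sketch on a shortened stand-in dictionary (2 time-stamps, 3 nodes); sorting the string keys with key=int is an assumption that the keys are always numeric strings:

```python
import numpy as np

# Stand-in data: 2 timestamps (string keys), 3 nodes each.
comp = {'0': [0.0, 0.1, 0.2], '1': [0.3, 0.4, 0.5]}

# Order timestamps numerically, stack them as rows, then transpose so that
# each row of `array` holds one node's values over time.
timestamps = sorted(comp, key=int)
array = np.array([comp[t] for t in timestamps]).T

print(array.shape)  # (nodes, timestamps)
```

Each row array[i] can then be passed to plt.plot or plt.scatter exactly as in the loop above.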

Related

Parameterizing the construction of a Python list

By using this Python code (I'm working with Python 3.6):
length = 4
overall = [["row" + str(length + 1)] +
           [1.0] + [0.0] * (length - 1)]
for i in range(1, length):
    overall += [["row" + str(i + length + 1)] +
                [0.0] * i + [1.0] + [0.0] * (length - (i + 1))]
I obtain the following list of lists:
OUTPUT 1:
overall = [['row5', 1.0, 0.0, 0.0, 0.0],
['row6', 0.0, 1.0, 0.0, 0.0],
['row7', 0.0, 0.0, 1.0, 0.0],
['row8', 0.0, 0.0, 0.0, 1.0]]
Now, I'd like to parametrize the piece of code above.
Given a parameter, for example, n_repetitions = 3, I'd like to obtain:
OUTPUT 2:
overall = [['row5', 1.0, 0.0, 0.0, 0.0, 1.0, 0.0, 0.0, 0.0, 1.0, 0.0, 0.0, 0.0],
['row6', 0.0, 1.0, 0.0, 0.0, 0.0, 1.0, 0.0, 0.0, 0.0, 1.0, 0.0, 0.0],
['row7', 0.0, 0.0, 1.0, 0.0, 0.0, 0.0, 1.0, 0.0, 0.0, 0.0, 1.0, 0.0],
['row8', 0.0, 0.0, 0.0, 1.0, 0.0, 0.0, 0.0, 1.0, 0.0, 0.0, 0.0, 1.0]]
where, in each "row", the initial "group" made of 4 numerical one-element lists has been repeated n_repetitions times (3, in this example).
What is a good way to do that automatically (e.g. by using a for loop, a list comprehension, ...)?
Yes, you can use a list comprehension together with list addition and multiplication, like so:
overall = [['row5', 1.0, 0.0, 0.0, 0.0],
['row6', 0.0, 1.0, 0.0, 0.0],
['row7', 0.0, 0.0, 1.0, 0.0],
['row8', 0.0, 0.0, 0.0, 1.0]]
n_repetitions = 3
overall = [[row[0]] + row[1:] * n_repetitions for row in overall]
I want to confirm what you are asking.
Following your code, I get this output:
overall = [['row5', 1.0, 0.0, 0.0, 0.0],
['row6', 0.0, 1.0, 0.0, 0.0],
['row7', 0.0, 0.0, 1.0, 0.0],
['row8', 0.0, 0.0, 0.0, 1.0]]
What output are you hoping for? Is it this?
overall = [["row5"], [1], [0], [0], [0],
["row6"], [0], [1], [0], [0],
["row7"], [0], [0], [1], [0],
["row8"], [0], [0], [0], [1]]
length = 4
n_repetitions = 3
arr = [1.0] + [0.0] * (length - 1)
overall = [["row" + str(length + 1)] + arr * n_repetitions]
for i in range(1, length):
    _ = [0.0] * i + [1.0] + [0.0] * (length - (i + 1))
    overall += [["row" + str(i + length + 1)] + _ * n_repetitions]
overall is a list of lists
type(overall)
list
In matrix terms, dropping the first column leaves your identity matrix:
identity = [l[1:] for l in overall]
and the first column holds your labels:
labels = [[l[0]] for l in overall]
You can then keep the first element of each list and repeat the rest:
n_repetitions = 3
result = [[l[0]] + l[1:]*n_repetitions for l in overall]
result
[['row5', 1.0, 0.0, 0.0, 0.0, 1.0, 0.0, 0.0, 0.0, 1.0, 0.0, 0.0, 0.0],
['row6', 0.0, 1.0, 0.0, 0.0, 0.0, 1.0, 0.0, 0.0, 0.0, 1.0, 0.0, 0.0],
['row7', 0.0, 0.0, 1.0, 0.0, 0.0, 0.0, 1.0, 0.0, 0.0, 0.0, 1.0, 0.0],
['row8', 0.0, 0.0, 0.0, 1.0, 0.0, 0.0, 0.0, 1.0, 0.0, 0.0, 0.0, 1.0]]
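If the goal is to parametrize the construction itself rather than post-process an existing list, one possible sketch is to wrap it in a function. The name build_rows is made up for this example; the labelling scheme ("row" followed by length + 1 + i) is taken from the question's code:

```python
def build_rows(length, n_repetitions):
    """Build `length` labelled rows, each a one-hot block repeated n_repetitions times."""
    rows = []
    for i in range(length):
        # One-hot block of size `length` with the 1.0 at position i.
        one_hot = [1.0 if j == i else 0.0 for j in range(length)]
        rows.append(["row" + str(length + 1 + i)] + one_hot * n_repetitions)
    return rows


overall = build_rows(4, 3)
for row in overall:
    print(row)
```

With length = 4 and n_repetitions = 3 this reproduces OUTPUT 2, and n_repetitions = 1 gives back OUTPUT 1.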

Plotting a histogram using a range of values and their frequency as a dictionary

Assume that I have the following dictionary:
scenario_summary = {'Day1': {'22459-22585': 0.0, '22585-22711': 0.0, '22711-22837': 0.0, '22837-22963': 0.0, '22963-23089': 0.0, '23089-23215': 0.0, '23215-23341': 0.0, '23341-23467': 0.0, '23467-23593': 0.0, '23593-23719': 0.0, '23719-23845': 0.0, '23845-23971': 0.0, '23971-24097': 0.0, '24097-24223': 0.0, '24223-24349': 0.0, '24349-24475': 0.0, '24475-24601': 0.0, '24601-24727': 0.0, '24727-24853': 0.0, '24853-24979': 0.0, '24979-25105': 0.0, '25105-25231': 0.0, '25231-25357': 0.0, '25357-25483': 0.0, '25483-25609': 0.0, '25609-25735': 0.0, '25735-25861': 0.0, '25861-25987': 0.0, '25987-26113': 1.0, '26113-26239': 1.0, '26239-26365': 0.0, '26365-26491': 2.0, '26491-26617': 5.0, '26617-26743': 5.0, '26743-26869': 5.0, '26869-26995': 12.0, '26995-27121': 19.0, '27121-27247': 7.000000000000001, '27247-27373': 11.0, '27373-27499': 15.0, '27499-27625': 7.000000000000001, '27625-27751': 4.0, '27751-27877': 4.0, '27877-28003': 2.0, '28003-28129': 0.0, '28129-28255': 0.0, '28255-28381': 0.0, '28381-28507': 0.0, '28507-28633': 0.0, '28633-28759': 0.0, '28759-28885': 0.0, '28885-29011': 0.0, '29011-29137': 0.0, '29137-29263': 0.0, '29263-29389': 0.0, '29389-29515': 0.0, '29515-29641': 0.0, '29641-29767': 0.0, '29767-29893': 0.0, '29893-30019': 0.0, '30019-30145': 0.0, '30145-30271': 0.0, '30271-30397': 0.0, '30397-30523': 0.0, '30523-30649': 0.0, '30649-30775': 0.0, '30775-30901': 0.0, '30901-31027': 0.0, '31027-31153': 0.0, '31153-31279': 0.0, '31279-31405': 0.0, '31405-31531': 0.0, '31531-31657': 0.0, '31657-31783': 0.0, '31783-31909': 0.0, '31909-32035': 0.0, '32035-32161': 0.0, '32161-32287': 0.0, '32287-32413': 0.0, '32413-32539': 0.0, '32539-32665': 0.0, '32665-32791': 0.0, '32791-32917': 0.0, '32917-33043': 0.0, '33043-33169': 0.0, '33169-33295': 0.0, '33295-33421': 0.0, '33421-33547': 0.0, '33547-33673': 0.0, '33673-33799': 0.0, '33799-33925': 0.0, '33925-34051': 0.0, '34051-34177': 0.0, '34177-34303': 0.0, '34303-34429': 0.0, '34429-34555': 0.0, '34555-34681': 
0.0, '34681-34807': 0.0}}
As you can see, the dictionary consists of ranges of values, stored as strings, and their frequencies. I would like to plot this as a histogram, but I don't know how to transform the strings into a form that pandas or plotly would understand. What would your approach be? Is there an easier way to do it than hardcoding things? Or would another module be an easier option?
Thanks!
Since the bins (ranges) are already defined and their counts are already aggregated at an initial level, maybe it can help if you build something that overlays a histogram (distribution) on the top of the existing bin ranges:
import pandas as pd
import matplotlib.pyplot as plt
# In a Jupyter/IPython notebook, also run the following magic:
%matplotlib inline
def plot_hist(bins, input_dict):
    df1 = pd.DataFrame(input_dict).reset_index()
    # Split each 'low-high' label into its numeric bounds
    df1['min'] = df1['index'].apply(lambda x: x.split('-')[0]).astype(int)
    df1['max'] = df1['index'].apply(lambda x: x.split('-')[1]).astype(int)
    # Assign every original range to one of `bins` coarser groups
    df1['group'] = pd.cut(df1['max'], bins, labels=False)
    df2 = df1.groupby('group')[['Day1', 'min', 'max']].agg({'min': 'min', 'max': 'max', 'Day1': 'sum'}).reset_index()
    df2['range_new'] = df2['min'].astype(str) + '-' + df2['max'].astype(str)
    df2.plot(x='range_new', y='Day1', kind='bar')
...and call the function with a number of bins smaller than the 98 ranges that are already in the dictionary. For example, to aggregate the distribution into 20 groups:
plot_hist(20,scenario_summary)
hope it helps...
A histogram is basically a simple bar chart, where each bar represents a bin (usually in the form of a range) and a frequency of the elements that fall into that bin.
This is exactly the data that you already have. So instead of computing values for a histogram (as would be done with plt.hist), you can simply pass your data to plt.bar as it is.
The code with your data, as an MCVE:
import matplotlib.pyplot as plt
scenario_summary = { 'Day1': {
'22459-22585': 0.0, '22585-22711': 0.0, '22711-22837': 0.0,
'22837-22963': 0.0, '22963-23089': 0.0, '23089-23215': 0.0,
'23215-23341': 0.0, '23341-23467': 0.0, '23467-23593': 0.0,
'23593-23719': 0.0, '23719-23845': 0.0, '23845-23971': 0.0,
'23971-24097': 0.0, '24097-24223': 0.0, '24223-24349': 0.0,
'24349-24475': 0.0, '24475-24601': 0.0, '24601-24727': 0.0,
'24727-24853': 0.0, '24853-24979': 0.0, '24979-25105': 0.0,
'25105-25231': 0.0, '25231-25357': 0.0, '25357-25483': 0.0,
'25483-25609': 0.0, '25609-25735': 0.0, '25735-25861': 0.0,
'25861-25987': 0.0, '25987-26113': 1.0, '26113-26239': 1.0,
'26239-26365': 0.0, '26365-26491': 2.0, '26491-26617': 5.0,
'26617-26743': 5.0, '26743-26869': 5.0, '26869-26995': 12.0,
'26995-27121': 19.0, '27121-27247': 7.0, '27247-27373': 11.0,
'27373-27499': 15.0, '27499-27625': 7.0, '27625-27751': 4.0,
'27751-27877': 4.0, '27877-28003': 2.0, '28003-28129': 0.0,
'28129-28255': 0.0, '28255-28381': 0.0, '28381-28507': 0.0,
'28507-28633': 0.0, '28633-28759': 0.0, '28759-28885': 0.0,
'28885-29011': 0.0, '29011-29137': 0.0, '29137-29263': 0.0,
'29263-29389': 0.0, '29389-29515': 0.0, '29515-29641': 0.0,
'29641-29767': 0.0, '29767-29893': 0.0, '29893-30019': 0.0,
'30019-30145': 0.0, '30145-30271': 0.0, '30271-30397': 0.0,
'30397-30523': 0.0, '30523-30649': 0.0, '30649-30775': 0.0,
'30775-30901': 0.0, '30901-31027': 0.0, '31027-31153': 0.0,
'31153-31279': 0.0, '31279-31405': 0.0, '31405-31531': 0.0,
'31531-31657': 0.0, '31657-31783': 0.0, '31783-31909': 0.0,
'31909-32035': 0.0, '32035-32161': 0.0, '32161-32287': 0.0,
'32287-32413': 0.0, '32413-32539': 0.0, '32539-32665': 0.0,
'32665-32791': 0.0, '32791-32917': 0.0, '32917-33043': 0.0,
'33043-33169': 0.0, '33169-33295': 0.0, '33295-33421': 0.0,
'33421-33547': 0.0, '33547-33673': 0.0, '33673-33799': 0.0,
'33799-33925': 0.0, '33925-34051': 0.0, '34051-34177': 0.0,
'34177-34303': 0.0, '34303-34429': 0.0, '34429-34555': 0.0,
'34555-34681': 0.0, '34681-34807': 0.0}}
data = scenario_summary['Day1']
x = range(len(data))
y = list(data.values())
plt.figure(figsize=(16, 9))
plt.bar(x, y)
plt.subplots_adjust(bottom=0.2)
plt.xticks(x, data.keys(), rotation='vertical')
plt.show()
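The bar chart above places the bars at positions 0, 1, 2, ... and uses the range strings only as tick labels. If a numeric x-axis is preferred, the labels can be split into their bounds first. The following is a small sketch under the assumption that every key has the form 'low-high' with integer bounds; parse_bins is a made-up helper name and the sample holds three entries taken from the data:

```python
def parse_bins(freq_by_range):
    """Turn {'low-high': count} into left edges, widths and counts, sorted by left edge."""
    rows = []
    for label, count in freq_by_range.items():
        low, high = label.split("-")
        rows.append((int(low), int(high), count))
    rows.sort()
    lefts = [low for low, _, _ in rows]
    widths = [high - low for low, high, _ in rows]
    counts = [count for _, _, count in rows]
    return lefts, widths, counts


# Three entries from the question's data, deliberately out of order.
sample = {"26995-27121": 19.0, "26869-26995": 12.0, "27121-27247": 7.0}
lefts, widths, counts = parse_bins(sample)
print(lefts, widths, counts)
```

The three lists can then be handed to plt.bar(lefts, counts, width=widths, align='edge'), which draws each bar from its lower to its upper bound.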
You can use the pandas module to convert the dictionary data into a data frame:
import pandas as pd
import matplotlib.pyplot as plt
scenario_summary = {'Day1': {'22459-22585': 0.0, '22585-22711': 0.0, '22711-22837': 0.0,
'22837-22963': 0.0, '22963-23089': 0.0, '23089-23215': 0.0,
'23215-23341': 0.0, '23341-23467': 0.0, '23467-23593': 0.0,
'23593-23719': 0.0, '23719-23845': 0.0, '23845-23971': 0.0,
'23971-24097': 0.0, '24097-24223': 0.0, '24223-24349': 0.0,
'24349-24475': 0.0, '24475-24601': 0.0, '24601-24727': 0.0,
'24727-24853': 0.0, '24853-24979': 0.0, '24979-25105': 0.0,
'25105-25231': 0.0, '25231-25357': 0.0, '25357-25483': 0.0,
'25483-25609': 0.0, '25609-25735': 0.0, '25735-25861': 0.0,
'25861-25987': 0.0, '25987-26113': 1.0, '26113-26239': 1.0,
'26239-26365': 0.0, '26365-26491': 2.0, '26491-26617': 5.0,
'26617-26743': 5.0, '26743-26869': 5.0, '26869-26995': 12.0,
'26995-27121': 19.0, '27121-27247': 7.000000000000001, '27247-27373': 11.0,
'27373-27499': 15.0, '27499-27625': 7.000000000000001, '27625-27751': 4.0,
'27751-27877': 4.0, '27877-28003': 2.0, '28003-28129': 0.0,
'28129-28255': 0.0, '28255-28381': 0.0, '28381-28507': 0.0,
'28507-28633': 0.0, '28633-28759': 0.0, '28759-28885': 0.0,
'28885-29011': 0.0, '29011-29137': 0.0, '29137-29263': 0.0,
'29263-29389': 0.0, '29389-29515': 0.0, '29515-29641': 0.0,
'29641-29767': 0.0, '29767-29893': 0.0, '29893-30019': 0.0,
'30019-30145': 0.0, '30145-30271': 0.0, '30271-30397': 0.0,
'30397-30523': 0.0, '30523-30649': 0.0, '30649-30775': 0.0,
'30775-30901': 0.0, '30901-31027': 0.0, '31027-31153': 0.0,
'31153-31279': 0.0, '31279-31405': 0.0, '31405-31531': 0.0,
'31531-31657': 0.0, '31657-31783': 0.0, '31783-31909': 0.0,
'31909-32035': 0.0, '32035-32161': 0.0, '32161-32287': 0.0,
'32287-32413': 0.0, '32413-32539': 0.0, '32539-32665': 0.0,
'32665-32791': 0.0, '32791-32917': 0.0, '32917-33043': 0.0,
'33043-33169': 0.0, '33169-33295': 0.0, '33295-33421': 0.0,
'33421-33547': 0.0, '33547-33673': 0.0, '33673-33799': 0.0,
'33799-33925': 0.0, '33925-34051': 0.0, '34051-34177': 0.0,
'34177-34303': 0.0, '34303-34429': 0.0, '34429-34555': 0.0,
'34555-34681': 0.0, '34681-34807': 0.0}}
# convert to data frame
data_frame = pd.DataFrame.from_dict(scenario_summary)
# plot data
plt.hist(data_frame['Day1'], density=1, bins=20)
plt.show()
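Note that plt.hist applied to the Day1 column bins the frequency values themselves (how many ranges have a count of 0, 1, 2, and so on) rather than redrawing the original distribution. If the aim is to regroup the existing ranges into fewer, wider bins, one possible sketch is to treat each range's midpoint as a sample weighted by its count. The four entries below are taken from the data; the variable names and the choice of two coarse bins are illustrative assumptions:

```python
import numpy as np

# Four consecutive ranges from the question's data.
sample = {"26365-26491": 2.0, "26491-26617": 5.0,
          "26617-26743": 5.0, "26743-26869": 5.0}

lows = np.array([int(label.split("-")[0]) for label in sample])
highs = np.array([int(label.split("-")[1]) for label in sample])
counts = np.array(list(sample.values()))

# Represent each range by its midpoint and weight it by its count.
mids = (lows + highs) / 2.0

# Regroup into 2 equally wide bins spanning the full range.
coarse_edges = np.linspace(lows.min(), highs.max(), 3)
coarse_counts, _ = np.histogram(mids, bins=coarse_edges, weights=counts)
print(coarse_edges, coarse_counts)
```

The same mids, coarse_edges and counts can be passed to plt.hist(mids, bins=coarse_edges, weights=counts) to draw the regrouped histogram directly.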

y-axis labels don't get displayed properly with Plotly

I've a problem with displaying the y-axis labels properly with plotly.
This is my index:
index = ['2015-11','2015-12','2016-01','2016-02','2016-03','2016-04','2016-05',
'2016-06','2016-07','2016-08','2016-09','2016-10','2016-11']
the data
data = [[0.115, 0.077, 0.0, 0.038, 0.0, 0.038, 0.038, 0.077, 0.0, 0.077, 0.077, 0.038],
[0.073, 0.055, 0.083, 0.055, 0.018, 0.055, 0.073, 0.037, 0.028, 0.037, 0.009, 0.0],
[0.099, 0.027, 0.036, 0.045, 0.063, 0.153, 0.027, 0.045, 0.063, 0.027, 0.0, 0.0],
[0.076, 0.038, 0.053, 0.061, 0.098, 0.068, 0.038, 0.061, 0.023, 0.0, 0.0, 0.0],
[0.142, 0.062, 0.027, 0.08, 0.097, 0.044, 0.071, 0.027, 0.0, 0.0, 0.0, 0.0],
[0.169, 0.026, 0.026, 0.026, 0.013, 0.013, 0.091, 0.0, 0.0, 0.0, 0.0, 0.0],
[0.138, 0.121, 0.052, 0.017, 0.034, 0.017, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0],
[0.297, 0.081, 0.054, 0.054, 0.054, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0],
[0.095, 0.016, 0.024, 0.04, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0],
[0.102, 0.023, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0],
[0.054, 0.027, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0],
[0.087, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0],
[0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0]]
I create a heatmap with the following code:
import re
import numpy as np
import pandas as pd
import plotly.figure_factory as ff
from plotly.offline import iplot
cols = range(12)
df = pd.DataFrame(data, columns=cols)
df.index = index
x = df.columns.tolist()
y = df.index.tolist()
z = df.values
annotation_text = np.char.mod('%.0f%%', df*100).tolist()
annotation_text = [[re.sub('^0%$','', x) for x in l] for l in annotation_text]
colorscale=[[0.0, 'rgb(248, 248, 255)'],
[0.04, 'rgb(224, 228, 236)'],
[0.08, 'rgb(196, 210, 226)'],
[0.12, 'rgb(158, 178, 226)'],
[0.16, 'rgb(134, 158, 227)'],
[0.2, 'rgb(122, 146, 227)'],
[1.0, 'rgb(65, 105, 225)'],
]
fig = ff.create_annotated_heatmap(z, x=x, y=y, colorscale=colorscale,
                                  annotation_text=annotation_text)
fig.layout.yaxis.autorange = 'reversed'
iplot(fig, filename='annotated_heatmap_color.html')
This produces the correct heatmap, but with the y-axis labels missing.
When I change the index to shorter values like '5-11' with
index = [x[3:] for x in index]
the labels show up.
I don't understand the logic behind that and would like to know how to fix it.
Plotly.py uses plotly.js under the hood, which interprets your date-like strings as dates, converts them to a numerical date format, and then misplaces them on what should be a non-numerical axis.
To make the axis explicitly categorical, you just have to add:
fig.layout.yaxis.type = 'category'

python why does max(max(float_2d_array)) give wrong answer?

for example:
a = [[0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0],
[0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0],
[0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0],
[0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0],
[0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0],
[0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0],
[0.0, 0.0, 1.2852976787772832, 0.7965388321000092, 0.0, 0.0, 0.0, 0.0, 0.0],
[0.0, 0.0, 0.0, 6.409872844109646, 0.17506688391255013, 0.0, 0.0, 0.0, 0.0],
[0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0],
[0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0],
[0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0],
[0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0],
[0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0]]
print max(max(a))
print max(a)
The result is:
1.28529767878
[0.0, 0.0, 1.2852976787772832, 0.7965388321000092, 0.0, 0.0, 0.0, 0.0, 0.0]
This is clearly wrong, the max value should be 6.409872844109646.
b = []
for i in a:
b.extend(i)
print max(b)
6.40987284411
This is python 2.7, Cpython.
Thank you very much.
Lists are compared element-wise (lexicographically), so max(a) returns the sublist that wins this comparison, not the sublist that holds the largest number. Since 1.2852976787772832 sits one index earlier than 6.409872844109646 in the candidate sublists, the list containing the former gets picked as the maximum.
At that same index, the second list has a 0, and 1.2852976787772832 is clearly greater than 0:
[0.0, 0.0, 1.2852976787772832, 0.7965388321000092, 0.0, 0.0, 0.0, 0.0, 0.0],
[0.0, 0.0, 0.0, 6.409872844109646, 0.17506688391255013, 0.0, 0.0, 0.0, 0.0]
# ^ here's your tie-breaker
In fact, the next index containing 6.4... is never checked.
I'm not sure how you expect the maximum sublist to be selected: sublist with maximum sum, sublist containing maximum number? You'll have to code the behavior you want if the default behavior does not cut it.
Moses already explained why you got the wrong result: when comparing lists, the first position where the elements differ decides which list "wins".
To get the maximum value, you have to flatten your list:
print(max(x for l in a for x in l))
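To make the difference concrete, here is a small sketch (Python 3 syntax) on a reduced two-row list. It shows the lexicographic result, the flattened maximum, and one way to pick the row containing the largest value, which is one of the possible intentions raised above. The variable names are made up for the example:

```python
from itertools import chain

a = [[0.0, 1.5, 0.0],
     [0.0, 0.0, 6.4]]

# Lexicographic comparison: the rows tie at index 0, and 1.5 > 0.0 at index 1.
largest_row = max(a)

# Largest single number across all rows.
largest_value = max(chain.from_iterable(a))

# Row that contains the largest number: compare rows by their own maximum.
row_with_largest = max(a, key=max)

print(largest_row, largest_value, row_with_largest)
```

Using key=sum instead of key=max would select the row with the largest total.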

Scipy optimize.curve_fit sometimes won't converge

I'm trying to use scipy.optimize.curve_fit to estimate the frequency and phase of an on/off sequence.
This is the code I'm using:
from numpy import *
from scipy import optimize
row = array([0.0, 0.0, 0.0, 0.0, 0.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 1.0, 1.0, 1.0, 1.0, 0.0, 1.0, 1.0, 1.0, 1.0, 1.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 1.0, 1.0, 0.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 1.0, 0.0, 1.0, 1.0, 1.0, 1.0, 1.0, 0.0, 1.0, 1.0, 0.0, 0.0,])
def fit_func(x, a, b, c, d):
    return c * sin(a * x + b) + d
p0 = [(pi/10.0), 5.0, row.std(), row.mean()]
result = optimize.curve_fit(fit_func, arange(len(row)), row, p0)
print result
This works. But on some rows, even though they seem perfectly ok, it fails.
Example of failing row:
row = array([1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0,])
The error is:
RuntimeError: Optimal parameters not found: Both actual and predicted relative reductions in the sum of squares are at most 0.000000 and the relative error between two consecutive iterates is at most 0.000000
Which tells me very little about what's happened.
A quick test shows that varying the parameters in p0 will cause that row to succeed... and others to fail. Why is that?
I tried both rows of data that you provided and both worked fine for me. I'm using SciPy 0.8.0rc3; what version are you using? It might also help to fix c and d, since they should really be the same every time: I set c to 0.6311786 and d to 0.5. As another method, you could use an FFT with zero padding and quadratic fitting around the peak to find the frequency. Really, any pitch estimation method is applicable, since you are looking for the fundamental frequency.
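A rough sketch of the FFT idea, assuming NumPy is available: zero-pad the mean-removed signal, find the largest spectral peak, and refine its position with a parabola through the peak bin and its two neighbours. The function name estimate_frequency, the pad_factor parameter and the synthetic square wave are all assumptions made for illustration, not part of the original code:

```python
import numpy as np


def estimate_frequency(signal, pad_factor=8):
    """Estimate the dominant frequency in cycles per sample."""
    x = np.asarray(signal, dtype=float)
    x = x - x.mean()                      # remove the DC offset
    n_fft = pad_factor * len(x)           # zero padding gives a finer frequency grid
    spectrum = np.abs(np.fft.rfft(x, n=n_fft))

    # Search interior bins only, so both neighbours of the peak exist.
    k = int(np.argmax(spectrum[1:-1])) + 1
    left, peak, right = spectrum[k - 1], spectrum[k], spectrum[k + 1]

    # Vertex of the parabola through the three points, as an offset in bins.
    denom = left - 2.0 * peak + right
    offset = 0.0 if denom == 0 else 0.5 * (left - right) / denom
    return (k + offset) / n_fft


# Synthetic on/off sequence: 10 samples on, 10 samples off, 100 samples total.
t = np.arange(100)
square = ((t % 20) < 10).astype(float)
freq = estimate_frequency(square)
print(freq)  # close to 1/20 cycles per sample
```

Since fit_func uses sin(a * x + b), multiplying the estimate by 2 * pi would give a starting value for a that could be passed in p0.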
