Pie plot in matplotlib in Python

I am trying to plot a pie chart from a .txt file whose data set looks like:
asp: 2.11
glu: 1.11
arg: 0.99
his: 5.11
acid: 11.1
base: 2.11
Now:
1) I want to plot a pie chart of the first 4 entries, with proper labels.
2) Then I want another pie chart of the last 2 entries.
I tried the following code, but I am getting errors. My code is:
from pylab import *
inp = open('c:/users/rox/desktop/xx.txt','r').read().strip().replace(': ',' ').split('\n')
for line in map(str.split,inp):
    x = line[0]
    z = line[1]
    fracs = [x]
    labels = [z]
pie(fracs,labels=labels,explode=None,autopct='%1.1f%%',shadow=False)
show()
But this code produces an error: Could not convert string to float...
Also, do I need to use a temporary file to plot the first 4 entries of the .txt file?
And if I want to plot a pie chart of the last two lines of the data set, can it be done using slicing?

Edit: made the input more general, so multiple plots can be read from the same file:
import matplotlib.pyplot as plt

def read_data(f, num_lines=1, split_on=':'):
    # Read the next num_lines lines from the open file f
    lines = (next(f) for i in range(num_lines))
    pieces = (line.split(split_on) for line in lines)
    # float() tolerates the leading space and trailing newline
    data = ((a, float(b)) for a, b in pieces)
    return zip(*data)

with open("xx.txt") as inf:
    amino_names, amino_values = read_data(inf, 4)
    ph_names, ph_values = read_data(inf, 2)

fig = plt.figure(figsize=(2,1))
p1 = fig.add_subplot(1,2,1)
p1.pie(amino_values, labels=amino_names)
p2 = fig.add_subplot(1,2,2)
p2.pie(ph_values, labels=ph_names)
plt.show()
[Figure: the two resulting pie charts, side by side]
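The question also asks whether a temporary file is needed and whether slicing works. A minimal sketch, assuming the file holds one `name: value` pair per line as in the sample above: read every pair once into two lists, then slice `[:4]` and `[-2:]` for the two charts, with no temporary file. The helper names `read_pairs` and `plot_two_pies` are hypothetical.

```python
import matplotlib.pyplot as plt


def read_pairs(lines, sep=":"):
    """Split 'name: value' lines into a list of names and a list of floats."""
    names, values = [], []
    for line in lines:
        if not line.strip():
            continue  # ignore blank lines, e.g. a trailing newline
        name, value = line.split(sep, 1)
        names.append(name.strip())
        values.append(float(value))  # float() tolerates surrounding whitespace
    return names, values


def plot_two_pies(names, values):
    """Draw the first four entries and the last two entries as separate pies."""
    fig, (ax1, ax2) = plt.subplots(1, 2, figsize=(8, 4))
    ax1.pie(values[:4], labels=names[:4], autopct="%1.1f%%")
    ax2.pie(values[-2:], labels=names[-2:], autopct="%1.1f%%")
    return fig
```

Usage would be along the lines of `with open("xx.txt") as f: fig = plot_two_pies(*read_pairs(f))`, followed by `plt.show()`. Passing an iterable of lines rather than a path keeps the parser easy to test and works directly with an open file.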


Trying to plot real time data in plotly

I have this code where I want to draw a live chart:
import datetime as dt
import matplotlib.pyplot as plt
import matplotlib.animation as animation
from matplotlib import style

style.use('fivethirtyeight')

# Create figure for plotting
fig = plt.figure()
ax = fig.add_subplot(1, 1, 1)
xs = []
ys = []

def animate(i, xs, ys):

    # Read temperature (Celsius) from TMP102
    polarity = open("sentiment2.txt", "r").read()
    lines = polarity.split('\n')
    print(lines)
    for line in lines:
        if len(line) > 1:
            x,y = line.split(',')
            xs.append(dt.datetime.now().strftime('%H:%M:%S.%f'))
            ys.append(line)
    # Add x and y to lists

    # Limit x and y lists to 20 items
    xs = xs[-20:]
    ys = ys[-20:]

    # Draw x and y lists
    ax.clear()
    ax.plot(xs, ys)

    # Format plot
    plt.xticks(rotation=45, ha='right')
    plt.subplots_adjust(bottom=0.30)
    plt.title('Ethereum Sentiment')
    plt.ylabel('Sentiment')

# Set up plot to call animate() function periodically
ani = animation.FuncAnimation(fig, animate, fargs=(xs, ys), interval=60000)
plt.show()
When I run it, I get this error:
File "ploty.py", line 23, in animate
x,y = line.split(',')
ValueError: not enough values to unpack (expected 2, got 1)
I took this code from a tutorial that does the same thing I am trying to achieve, so I'm not sure what the problem is.
What I am trying to achieve is to read the value (sentiment) from my text file: I run a sentiment analysis every 10 minutes, and whenever my sentiment.txt file gets updated I want my chart to update.
The content of my sentiment.txt file:
-8.944388227513231
-7.731292989417991
-8.493252615440113
0.5413275613275612
Perhaps look at how you generated the sentiment2.txt file.
Each line of your sentiment2.txt file has only one number and no comma,
so line.split(',') returns a single value, while the code expects TWO: x AND y.
Updated:
Currently, the chart plots xs against ys: ys are the values from the file, and xs is the time at which each value was read. If that is intended, the split line is redundant and can be removed, and your source file does not need commas. In that case, append float(line) rather than line to ys, so the values are plotted as numbers rather than as text labels.
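A minimal sketch of that single-value case, assuming sentiment2.txt keeps one number per line and that each tick should add only the newest value (an assumption: the original loop re-appends every line on every tick). The helper names `parse_sentiment` and `push_point` are hypothetical. Note that `xs = xs[-20:]` inside `animate` only rebinds a local name, so the lists passed through `fargs` keep growing; trimming them in place with `del` avoids that.

```python
import datetime as dt


def parse_sentiment(lines):
    """Convert non-empty lines (one number each) to floats."""
    return [float(line) for line in lines if line.strip()]


def push_point(xs, ys, x, y, limit=20):
    """Append one point and keep only the newest `limit` points, in place."""
    xs.append(x)
    ys.append(y)
    # del on a slice mutates the same list objects that FuncAnimation passes in again
    del xs[:-limit]
    del ys[:-limit]


def animate(i, xs, ys, ax, path="sentiment2.txt"):
    """FuncAnimation callback (hypothetical signature): plot the newest value in `path`."""
    with open(path) as f:
        values = parse_sentiment(f)
    if values:
        push_point(xs, ys, dt.datetime.now().strftime("%H:%M:%S"), values[-1])
    ax.clear()
    ax.plot(xs, ys)
```

In the script it would be wired up as `ani = animation.FuncAnimation(fig, animate, fargs=(xs, ys, ax), interval=60000)`, with the formatting calls (`plt.xticks`, `plt.title`, ...) added after `ax.plot` as before.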
However, if the sentiment file should contain both x-axis and y-axis values, then x and y should be appended to xs and ys instead:
x,y = line.split(',')
xs.append(x)
ys.append(y)
The latter can be improved further by working with pandas, which reads CSV files directly; see: Plot from CSV with Plotly Express

Plot 2D array with Pandas, Matplotlib, and Numpy

As a result of simulations, I parsed the output using pandas groupby(). I am having some difficulty plotting the data the way I want. Here is the pandas output file (truncated for simplicity) that I'm trying to plot:
                   Avg-del    Min-del    Max-del   Avg-retx   Min-retx   Max-retx
Prob Producers
0.3  1            8.060291   0.587227  26.709371  42.931779   5.130041 136.216642
     5            8.330889   0.371387  54.468836  43.166326   3.340193 275.932170
     10           1.012147   0.161975   4.320447   6.336965   2.026241  19.177802
0.5  1            8.039639   0.776463  26.053635  43.160880   5.798276 133.090358
     5            4.729875   0.289472  26.717824  25.732373   2.909811 135.289244
     10           1.043738   0.160671   4.353993   6.461914   2.015735  19.595393
My y-axis is the delay and my x-axis is the number of producers. I want one set of error bars for probability p=0.3 and another for p=0.5.
My Python script is the following:
import sys
import time
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt

pd.set_option('display.expand_frame_repr', False)

outputFile = 'averages.txt'
f_out = open(outputFile, 'w')

data = pd.read_csv(sys.argv[1], delimiter=",")
result = data.groupby(["Prob", "Producers"]).mean()
print("Writing to output file: " + outputFile)
result_s = str(result)
f_out.write(result_s)
f_out.close()
*** Update from James ***
for prob_index in result.index.levels[0]:
    r = result.loc[prob_index]
    labels = [col for col in r]
    lines = plt.plot(r)
    for col, line in zip(labels, lines):
        line.set_label(str(prob_index) + " " + col)

ax = plt.gca()
ax.legend()
ax.set_xticks(r.index)
ax.set_ylabel('Latency (s)')
ax.set_xlabel('Number of producer nodes')
plt.show()
Now I have 4 sliced arrays, one for each probability.
How do I split them again into delay (del) and retx, and plot error bars based on avg, min, and max?
Ok, there is a lot going on here. First, it is plotting 6 lines. When your code calls
plt.plot(np.transpose(np.array(result)[0:3, 0:3]), label = 'p=0.3')
plt.plot(np.transpose(np.array(result)[3:6, 0:3]), label = 'p=0.5')
it is calling plt.plot on a 3x3 array of data. plt.plot interprets this input not as x and y, but rather as 3 separate series of y-values (with 3 points each), one per column of the array; because of np.transpose, each column is one row of result. For the x values, it imputes 0, 1, 2. In other words, for the first plot call it is plotting the data:
x = [0,1,2]; y = [8.060291, 0.587227, 26.709371]
x = [0,1,2]; y = [8.330889, 0.371387, 54.468836]
x = [0,1,2]; y = [1.012147, 0.161975, 4.320447]
Based on your x-label, I think you want the values to be x = [1,5,10]. Try this to see if it gets the plot you want.
# iterate over the first dataframe index
for prob_index in result.index.levels[0]:
    r = result.loc[prob_index]
    labels = [col for col in r]
    lines = plt.plot(r)
    for col, line in zip(labels, lines):
        line.set_label(str(prob_index) + " " + col)

ax = plt.gca()
ax.legend()
ax.set_xticks(r.index)
ax.set_ylabel('Latency (s)')
ax.set_xlabel('Number of producer nodes')
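The answer above does not yet draw error bars. A minimal sketch of one way to do it, assuming the averaged table shown in the question (rebuilt here by hand as a `(Prob, Producers)` MultiIndex frame): for each probability, use the `Avg-` column as the point and pass the distances to the `Min-` and `Max-` columns as asymmetric `yerr` to `Axes.errorbar`. The helper names `asymmetric_err` and `plot_errorbars` are hypothetical; calling it once with `"del"` and once with `"retx"` splits delay from retransmissions.

```python
import matplotlib.pyplot as plt
import pandas as pd

# The averaged table from the question, rebuilt as a (Prob, Producers) MultiIndex frame
index = pd.MultiIndex.from_product([[0.3, 0.5], [1, 5, 10]], names=["Prob", "Producers"])
result = pd.DataFrame(
    [[8.060291, 0.587227, 26.709371, 42.931779, 5.130041, 136.216642],
     [8.330889, 0.371387, 54.468836, 43.166326, 3.340193, 275.932170],
     [1.012147, 0.161975, 4.320447, 6.336965, 2.026241, 19.177802],
     [8.039639, 0.776463, 26.053635, 43.160880, 5.798276, 133.090358],
     [4.729875, 0.289472, 26.717824, 25.732373, 2.909811, 135.289244],
     [1.043738, 0.160671, 4.353993, 6.461914, 2.015735, 19.595393]],
    index=index,
    columns=["Avg-del", "Min-del", "Max-del", "Avg-retx", "Min-retx", "Max-retx"],
)


def asymmetric_err(r, metric):
    """Return (avg, [avg - min, max - avg]) for metric 'del' or 'retx'."""
    avg = r[f"Avg-{metric}"]
    lower = avg - r[f"Min-{metric}"]
    upper = r[f"Max-{metric}"] - avg
    return avg, [lower.to_numpy(), upper.to_numpy()]


def plot_errorbars(result, metric="del", ylabel="Latency (s)"):
    """One errorbar series per probability, against the number of producers."""
    fig, ax = plt.subplots()
    for prob in result.index.get_level_values("Prob").unique():
        r = result.loc[prob]  # rows are now indexed by Producers only
        avg, yerr = asymmetric_err(r, metric)
        ax.errorbar(r.index, avg, yerr=yerr, capsize=4, marker="o", label=f"p={prob}")
    ax.set_xticks(result.index.get_level_values("Producers").unique())
    ax.set_xlabel("Number of producer nodes")
    ax.set_ylabel(ylabel)
    ax.legend()
    return ax


ax_del = plot_errorbars(result, "del")
```

A second figure for retransmissions would be `plot_errorbars(result, "retx", ylabel="Retransmissions")` (the label text is an assumption), and `plt.show()` displays both. With the real script, `result` comes from the `groupby(...).mean()` call instead of being typed in.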

Python: Legend has wrong colors on Pandas MultiIndex plot

I'm trying to plot data from two separate MultiIndex DataFrames that have the same values as levels in each.
Currently, this generates two separate plots, and I'm unable to customise the legend by appending a string to distinguish each line on the graph. Any help would be appreciated!
Here is the method so far:
def plot_lead_trail_res(df_ante, df_post, symbols=[]):
    if len(symbols) < 1:
        print("Try again with a symbol list. (Time constraints)")
    else:
        df_ante = df_ante.loc[symbols]
        df_post = df_post.loc[symbols]
        ante_leg = [str(x)+'_ex-ante' for x in df_ante.index.levels[0]]
        post_leg = [str(x)+'_ex-post' for x in df_post.index.levels[0]]
        print("ante_leg", ante_leg)
        ax = df_ante.unstack(0).plot(x='SHIFT', y='MUTUAL_INFORMATION', legend=ante_leg)
        ax = df_post.unstack(0).plot(x='SHIFT', y='MUTUAL_INFORMATION', legend=post_leg)
        ax.set_xlabel('Time-shift of sentiment data (days) with financial data')
        ax.set_ylabel('Mutual Information')
Using this function call:
sentisignal.plot_lead_trail_res(data_nasdaq_top_100_preprocessed_mi_res, data_nasdaq_top_100_preprocessed_mi_res_validate, ['AAL', 'AAPL'])
I obtain the following figure:
[Figure: Current plots]
Ideally, both sets of lines would be on the same graph with the same axes!
Update 2 [Concatenation Solution]
I've solved the issue of plotting from multiple frames using concatenation; however, the legend does not match the line colors on the graph.
There are no specific calls to legend, and the label parameter of plot() has not been used.
Code:
import pandas as pd

df_ante = data_nasdaq_top_100_preprocessed_mi_res
df_post = data_nasdaq_top_100_preprocessed_mi_res_validate
symbols = ['AAL', 'AAPL']
df_ante = df_ante.loc[symbols]
df_post = df_post.loc[symbols]
# Suffix the first index level; set_levels returns a new index (inplace= was removed in pandas 2.0)
df_ante.index = df_ante.index.set_levels([str(x)+'_ex-ante' for x in df_ante.index.levels[0]], level=0)
df_post.index = df_post.index.set_levels([str(x)+'_ex-post' for x in df_post.index.levels[0]], level=0)
df_merge = pd.concat([df_ante, df_post])
df_merge['SHIFT'] = abs(df_merge['SHIFT'])
df_merge.unstack(0).plot(x='SHIFT', y='MUTUAL_INFORMATION')
[Figure: MultiIndex plot]
I think that with
ax = df_ante.unstack(0).plot(x='SHIFT', y='MUTUAL_INFORMATION', legend=ante_leg)
you store the output of plot() in ax, which is then overwritten by the second call. Since no ax argument is passed, each DataFrame.plot() call draws on a new figure, which is why the two sets of lines end up in separate plots.
The usual approach is rather something like:
fig = plt.figure(figsize=(5, 5)) # size in inch
ax = fig.add_subplot(111) # if you want only one axes
Now you have an Axes object in ax, and you can pass it as input to the next plots (for pandas, via the ax= argument of DataFrame.plot()).
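A minimal sketch of that idea, assuming (hypothetically) that each frame has already been reshaped to one column per symbol with the time shift as its index, and using made-up numbers purely for illustration: passing the same `ax` to both `DataFrame.plot` calls draws all lines on one Axes, and `add_suffix` gives every legend entry its own label, so no `legend=` list is needed. `plot_both` is a hypothetical name.

```python
import matplotlib.pyplot as plt
import pandas as pd


def plot_both(df_ante, df_post, symbols):
    """Plot ex-ante and ex-post series for `symbols` on one shared Axes."""
    fig, ax = plt.subplots(figsize=(6, 4))
    # Suffixed column names become the legend labels
    df_ante[symbols].add_suffix("_ex-ante").plot(ax=ax)
    # Reusing ax draws onto the same Axes; pandas keeps the earlier legend entries
    df_post[symbols].add_suffix("_ex-post").plot(ax=ax, linestyle="--")
    ax.set_xlabel("Time-shift of sentiment data (days) with financial data")
    ax.set_ylabel("Mutual Information")
    return ax


# Made-up toy data, one column per symbol, indexed by the shift
shift = pd.Index([0, 1, 2, 3], name="SHIFT")
ante = pd.DataFrame({"AAL": [0.10, 0.12, 0.09, 0.08],
                     "AAPL": [0.20, 0.18, 0.22, 0.21]}, index=shift)
post = ante * 0.9
ax = plot_both(ante, post, ["AAL", "AAPL"])
```

The dashed style on the second call is only a convenience so the ex-ante and ex-post lines of the same symbol are easy to tell apart; pandas assigns the colors.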

Python Bar Plot pylab not reading in all data from csv

I am attempting to create a bar plot from a large text file of data, stored as a CSV file, that looks like the following:
#DowntonPBS, 23
#DowntonAbbey, 12
#Download, 8
#Download:, 2
#Downloads, 2
#DownstairsMixtape, 1
#DownWithAssad, 1
#DownYoTLParty, 1
#DowntonAbbey?, 1
#Downtonabbey, 1
#DowntownAbbey, 1
The following code is where I'm at, and while this method has worked in the past for different plotting scripts, I've done something wrong here that I just can't seem to find. Instead of plotting all of the data, I only seem to be getting three records.
import pylab as p
import sys
from matplotlib.mlab import csv2rec
y = []
fig = p.figure()
ax = fig.add_subplot(1,1,1)
input = open(sys.argv[1], 'r')
data = csv2rec(input, names=['tag', 'count'])
for item in data['count']:
    y.append(item)
N = len(y)
ind = range(N)
ax.bar(ind, y, align='center')
ax.set_ylabel('Counts')
ax.set_title('HashTag Diversity')
ax.set_xticks(ind)
group_labels = data['tag']
ax.set_xticklabels(group_labels)
fig.autofmt_xdate()
p.show()
If I add print statements for y and N and run the script against my larger dataset, I end up with:
[45, 37, 36]
3
These values should form a large array of 1000 values, and the length N should be 1000. I'm not sure what's going on here.
csv2rec() ignores lines starting with "#" by default (its comments argument defaults to '#'); you can change this with:
csv2rec(input, names=['tag', 'count'], comments="something")
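Recent matplotlib releases no longer provide `matplotlib.mlab.csv2rec`, so a sketch using the standard `csv` module instead may be useful. `csv.reader` has no comment handling at all, so rows starting with `#` are kept. The names `read_counts` and `plot_counts` are hypothetical, and the plotting part mirrors the question's code.

```python
import csv

import matplotlib.pyplot as plt


def read_counts(lines):
    """Parse 'tag, count' rows; '#'-prefixed tags are kept because csv has no comment syntax."""
    tags, counts = [], []
    for row in csv.reader(lines, skipinitialspace=True):
        if len(row) < 2:
            continue  # skip blank or malformed lines
        tags.append(row[0])
        counts.append(int(row[1]))
    return tags, counts


def plot_counts(tags, counts):
    """Bar chart of the counts with one tick label per tag."""
    fig, ax = plt.subplots()
    ind = range(len(counts))
    ax.bar(ind, counts, align="center")
    ax.set_ylabel("Counts")
    ax.set_title("HashTag Diversity")
    ax.set_xticks(ind)
    ax.set_xticklabels(tags)
    fig.autofmt_xdate()  # rotate the long tag labels so they do not overlap
    return ax
```

With a file it would be called as `with open(sys.argv[1], newline="") as f: ax = plot_counts(*read_counts(f))`, then `plt.show()`.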

How to plot data from multiple two column text files with legends in Matplotlib?

How do I open multiple text files from different directories and plot them on a single graph with legends?
This is relatively simple if you use pylab (included with matplotlib) instead of matplotlib directly. Start off with a list of filenames and legend names, like [ ('name of file 1', 'label 1'), ('name of file 2', 'label 2'), ...]. Then you can use something like the following:
import pylab
datalist = [ ( pylab.loadtxt(filename), label ) for filename, label in list_of_files ]
for data, label in datalist:
    pylab.plot( data[:,0], data[:,1], label=label )
pylab.legend()
pylab.title("Title of Plot")
pylab.xlabel("X Axis Label")
pylab.ylabel("Y Axis Label")
You might also want to pass a format string such as 'o' as the third positional argument to the plot command, e.g. pylab.plot(data[:,0], data[:,1], 'o', label=label), to draw points instead of a line. By default, matplotlib with pylab plots onto the same figure without clearing it, so you can simply run the plot command multiple times.
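A minimal sketch of the same approach with the object-oriented API, assuming whitespace-delimited files whose first two columns are x and y; `load_series` and `plot_series` are hypothetical names. Since each entry of `list_of_files` is just a path, the files can live in different directories. The format string (for example `'o'` for points) is passed positionally to `Axes.plot`.

```python
import matplotlib.pyplot as plt
import numpy as np


def load_series(list_of_files):
    """Turn (filename, label) pairs into (label, x, y) tuples."""
    series = []
    for filename, label in list_of_files:
        data = np.loadtxt(filename, ndmin=2)  # ndmin=2 keeps one-row files two-dimensional
        series.append((label, data[:, 0], data[:, 1]))
    return series


def plot_series(series, fmt="-"):
    """Plot every series on one Axes, with one legend entry each."""
    fig, ax = plt.subplots()
    for label, x, y in series:
        ax.plot(x, y, fmt, label=label)
    ax.legend()
    return ax
```

For example, `plot_series(load_series([("run1/out.txt", "run 1"), ("run2/out.txt", "run 2")]), fmt="o")` followed by `plt.show()`, where the paths are placeholders.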
Assume your file looks like this and is named test.txt (space delimited):
1 2
3 4
5 6
7 8
Then:
#!/usr/bin/python

import matplotlib.pyplot as plt

with open("test.txt") as f:
    data = f.read()

# strip() drops the trailing newline, so there is no empty last row
data = data.strip().split('\n')

# convert to float so the values are plotted as numbers, not as text labels
x = [float(row.split(' ')[0]) for row in data]
y = [float(row.split(' ')[1]) for row in data]

fig = plt.figure()

ax1 = fig.add_subplot(111)

ax1.set_title("Plot title...")
ax1.set_xlabel('your x label..')
ax1.set_ylabel('your y label...')

ax1.plot(x,y, c='r', label='the data')

leg = ax1.legend()

plt.show()
I find browsing the gallery of plots on the matplotlib site helpful for figuring out legends and axis labels.
I feel the simplest way would be:
from matplotlib import pyplot
from numpy import genfromtxt

mat0 = genfromtxt("data0.txt")
mat1 = genfromtxt("data1.txt")
pyplot.plot(mat0[:,0], mat0[:,1], label = "data0")
pyplot.plot(mat1[:,0], mat1[:,1], label = "data1")
pyplot.legend()
pyplot.show()
label is the string that is displayed in the legend.
You can plot as many data series as you like before show() to draw all of them on the same graph.
This is a simple way to plot simple graphs. For other genfromtxt options, see the NumPy genfromtxt documentation.
