Python Bar Plot pylab not reading in all data from csv

I am attempting to create a bar plot of a large text file of data that looks like the following, stored in a csv file:
#DowntonPBS, 23
#DowntonAbbey, 12
#Download, 8
#Download:, 2
#Downloads, 2
#DownstairsMixtape, 1
#DownWithAssad, 1
#DownYoTLParty, 1
#DowntonAbbey?, 1
#Downtonabbey, 1
#DowntownAbbey, 1
The following code is where I'm at, and while this method has worked in the past for different plotting scripts, I've done something wrong here that I just can't seem to find. Instead of plotting all of the data, I only seem to be getting three records.
import pylab as p
import sys
from matplotlib.mlab import csv2rec
y = []
fig = p.figure()
ax = fig.add_subplot(1,1,1)
input = open(sys.argv[1], 'r')
data = csv2rec(input, names=['tag', 'count'])
for item in data['count']:
    y.append(item)
N = len(y)
ind = range(N)
ax.bar(ind, y, align='center')
ax.set_ylabel('Counts')
ax.set_title('HashTag Diversity')
ax.set_xticks(ind)
group_labels = data['tag']
ax.set_xticklabels(group_labels)
fig.autofmt_xdate()
p.show()
If I add print statements for y and N, and run the script against my larger dataset I end up with:
[45, 37, 36]
3
These should be much larger: y should hold about 1000 values and the length (N) should be 1000. I'm not sure what's going on here.

csv2rec() ignores lines that start with "#" by default, because it treats them as comments. You can change this by passing a different comment marker:
csv2rec(input, names=['tag', 'count'], comments="something")
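As far as I know, csv2rec was removed from matplotlib.mlab in Matplotlib 3.1, so the comments argument is not available on a current install. Below is a minimal sketch of the same idea using the standard csv module, which has no comment handling and therefore keeps every row. The sample rows are copied from the question, and the helper name read_tag_counts is only illustrative.

```python
import csv
import io

# Sample rows copied from the question; in practice pass an open file instead.
SAMPLE = """#DowntonPBS, 23
#DowntonAbbey, 12
#Download, 8
"""

def read_tag_counts(fileobj):
    """Return (tags, counts) from 'tag, count' rows, keeping tags that start with '#'."""
    tags, counts = [], []
    for row in csv.reader(fileobj, skipinitialspace=True):
        if len(row) != 2:
            continue  # skip blank or malformed rows
        tags.append(row[0])
        counts.append(int(row[1]))
    return tags, counts

tags, counts = read_tag_counts(io.StringIO(SAMPLE))
print(tags, counts)
```

The two lists can then be passed to ax.bar and ax.set_xticklabels in place of data['count'] and data['tag'].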

Related

Trying to plot real time data in plotly

I have this code where I want to draw a live chart:
import datetime as dt
import matplotlib.pyplot as plt
import matplotlib.animation as animation
from matplotlib import style
style.use('fivethirtyeight')
# Create figure for plotting
fig = plt.figure()
ax = fig.add_subplot(1, 1, 1)
xs = []
ys = []
def animate(i, xs, ys):
    # Read temperature (Celsius) from TMP102
    polarity = open("sentiment2.txt", "r").read()
    lines = polarity.split('\n')
    print(lines)
    for line in lines:
        if len(line) > 1:
            x,y = line.split(',')
            xs.append(dt.datetime.now().strftime('%H:%M:%S.%f'))
            ys.append(line)
    # Add x and y to lists
    # Limit x and y lists to 20 items
    xs = xs[-20:]
    ys = ys[-20:]
    # Draw x and y lists
    ax.clear()
    ax.plot(xs, ys)
    # Format plot
    plt.xticks(rotation=45, ha='right')
    plt.subplots_adjust(bottom=0.30)
    plt.title('Etehreum Sentiment')
    plt.ylabel('Sentiment')
# Set up plot to call animate() function periodically
ani = animation.FuncAnimation(fig, animate, fargs=(xs, ys), interval=60000)
plt.show()
When I run it I get this error:
File "ploty.py", line 23, in animate
x,y = line.split(',')
ValueError: not enough values to unpack (expected 2, got 1)
I took this code from a tutorial, and its author does the same thing I am trying to achieve, so I'm not sure what the problem is.
What I am trying to achieve is to read the sentiment value from my text file. I run a sentiment analysis every 10 minutes, and whenever my sentiment.txt file is updated I want to update my chart.
The content of my sentiment.txt file:
-8.944388227513231
-7.731292989417991
-8.493252615440113
0.5413275613275612

Look at how you generated the sentiment2.txt file.
Each line of your sentiment2.txt file holds only one number, and there is no comma.
So line.split(',') returns a single item, while the code expects two, x and y.
Updated:
Currently, the chart plots ys against xs; ys are the values from the file and xs is the wall-clock time at which each value was read. If that is intended, the split line is redundant and can be removed, and the source file does not need commas.
However, if the sentiment file should contain both x-axis and y-axis values, x and y should be appended to xs and ys instead:
x,y = line.split(',')
xs.append(x)
ys.append(y)
The latter can be improved further by using pandas, which reads CSV files directly; see: Plot from CSV with Plotly Express
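For the one-number-per-line file shown in the question, the parsing step could look roughly like the sketch below. The helper name read_sentiment is hypothetical, and the demo writes the four values from the question to a temporary file purely so that the example runs on its own.

```python
import os
import tempfile

def read_sentiment(path):
    """Return the floats found in a file that holds one number per line."""
    values = []
    with open(path) as f:
        for line in f:
            line = line.strip()
            if line:  # ignore empty lines, e.g. a trailing newline
                values.append(float(line))
    return values

# Demo with the four values shown in the question.
with tempfile.NamedTemporaryFile("w", suffix=".txt", delete=False) as tmp:
    tmp.write("-8.944388227513231\n-7.731292989417991\n"
              "-8.493252615440113\n0.5413275613275612\n")
values = read_sentiment(tmp.name)
os.remove(tmp.name)

print(values[-20:])  # keep at most the last 20 points, as in the question
```

Inside animate(), ys could then be taken from read_sentiment("sentiment2.txt")[-20:]. Converting to float matters because Matplotlib treats strings as categories rather than as numbers on the axis.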

Pandas groupby results on the same plot

I am dealing with the following data frame (only for illustration, actual df is quite large):
   seq      x1      y1
0    2  0.7725  0.2105
1    2  0.8098  0.3456
2    2  0.7457  0.5436
3    2  0.4168  0.7610
4    2  0.3181  0.8790
5    3  0.2092  0.5498
6    3  0.0591  0.6357
7    5  0.9937  0.5364
8    5  0.3756  0.7635
9    5  0.1661  0.8364
I am trying to plot a multi-line graph of the above coordinates (x is "x1", y is "y1").
Rows with the same "seq" form one path and have to be plotted as one separate line: all the x, y coordinates with seq = 2 belong to one line, and so on.
I am able to plot them, but only on separate graphs. I want all the lines on the same graph. I tried using subplots, but I am not getting it right.
import matplotlib as mpl
import matplotlib.pyplot as plt
%matplotlib notebook
df.groupby("seq").plot(kind = "line", x = "x1", y = "y1")
This creates hundreds of graphs (one per unique seq). Please suggest a way to get all the lines on the same graph.
UPDATE
To resolve the above problem, I implemented the following code:
fig, ax = plt.subplots(figsize=(12,8))
df.groupby('seq').plot(kind='line', x = "x1", y = "y1", ax = ax)
plt.title("abc")
plt.show()
Now, I want a way to plot the lines with specific colors. I am clustering path from seq = 2 and 5 in cluster 1; and path from seq = 3 in another cluster.
So, there are two lines under cluster 1 which I want in red and 1 line under cluster 2 which can be green.
How should I proceed with this?

You need to create the axes before plotting and pass them to plot, as in this example:
import pandas as pd
import matplotlib.pylab as plt
import numpy as np
# random df
df = pd.DataFrame(np.random.randint(0,10,size=(25, 3)), columns=['ProjID','Xcoord','Ycoord'])
# plot groupby results on the same canvas
fig, ax = plt.subplots(figsize=(8,6))
df.groupby('ProjID').plot(kind='line', x = "Xcoord", y = "Ycoord", ax=ax)
plt.show()

Consider the dataframe df
df = pd.DataFrame(dict(
ProjID=np.repeat(range(10), 10),
Xcoord=np.random.rand(100),
Ycoord=np.random.rand(100),
))
Then we create abstract art like this
df.set_index('Xcoord').groupby('ProjID').Ycoord.plot()

Another way:
for k,g in df.groupby('ProjID'):
    plt.plot(g['Xcoord'],g['Ycoord'])
plt.show()
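The explicit loop also makes it straightforward to choose a color per group, which is what the update to the question asks for. The sketch below uses the data frame from the question. The seq_color dictionary is an assumed, hand-written mapping (seq 2 and 5 in cluster 1, seq 3 in cluster 2), not something produced by the clustering itself.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend; remove this line for interactive use
import matplotlib.pyplot as plt
import pandas as pd

# Data copied from the question.
df = pd.DataFrame({
    "seq": [2, 2, 2, 2, 2, 3, 3, 5, 5, 5],
    "x1": [0.7725, 0.8098, 0.7457, 0.4168, 0.3181,
           0.2092, 0.0591, 0.9937, 0.3756, 0.1661],
    "y1": [0.2105, 0.3456, 0.5436, 0.7610, 0.8790,
           0.5498, 0.6357, 0.5364, 0.7635, 0.8364],
})

# Assumed mapping from seq to color: cluster 1 (seq 2 and 5) red, cluster 2 (seq 3) green.
seq_color = {2: "red", 5: "red", 3: "green"}

fig, ax = plt.subplots(figsize=(12, 8))
for seq, group in df.groupby("seq"):
    # One line per path, colored by the cluster its seq belongs to.
    ax.plot(group["x1"], group["y1"], color=seq_color[seq], label=f"seq {seq}")
ax.legend()
# plt.show()  # uncomment when running with an interactive backend
```

With a large data frame, the mapping would more likely come from a column holding the cluster label than from a hand-written dictionary.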

Here is a working example including the ability to adjust legend names.
grp = df.groupby('groupCol')
legendNames = grp.apply(lambda x: x.name) #Get group names using the name attribute.
#legendNames = list(grp.groups.keys()) #Alternative way to get group names. Someone else might be able to speak on speed. This might iterate through the grouper and find keys which could be slower? Not sure
plots = grp.plot('x1','y1',legend=True, ax=ax)
for txt, name in zip(ax.legend_.texts, legendNames):
    txt.set_text(name)
Explanation:
Legend values are stored in the attribute ax.legend_, which in turn contains a list of Text() objects, one per group; the Text class is part of the matplotlib.text API. To set the text of each object, you can use the setter method set_text(self, s).
As a side note, the Text class has a number of set_X() methods that allow you to change the font sizes, fonts, colors, etc. I haven't used those, so I don't know for sure they work, but can't see why not.

Based on Serenity's answer, I made the legend better.
import pandas as pd
import matplotlib.pylab as plt
import numpy as np
# random df
df = pd.DataFrame(np.random.randint(0,10,size=(25, 3)), columns=['ProjID','Xcoord','Ycoord'])
# plot groupby results on the same canvas
grouped = df.groupby('ProjID')
fig, ax = plt.subplots(figsize=(8,6))
grouped.plot(kind='line', x = "Xcoord", y = "Ycoord", ax=ax)
ax.legend(labels=grouped.groups.keys()) ## better legend
plt.show()
and you can also do it like:
grouped = df.groupby('ProjID')
fig, ax = plt.subplots(figsize=(8,6))
g_plot = lambda x:x.plot(x = "Xcoord", y = "Ycoord", ax=ax, label=x.name)
grouped.apply(g_plot)
plt.show()

Plot 2D array with Pandas, Matplotlib, and Numpy

I parsed the output of my simulations using Pandas groupby(). I am having some difficulty plotting the data the way I want. Here's the Pandas output (abbreviated for simplicity) that I'm trying to plot:
                 Avg-del   Min-del    Max-del   Avg-retx  Min-retx    Max-retx
Prob Producers
0.3  1          8.060291  0.587227  26.709371  42.931779  5.130041  136.216642
     5          8.330889  0.371387  54.468836  43.166326  3.340193  275.932170
     10         1.012147  0.161975   4.320447   6.336965  2.026241   19.177802
0.5  1          8.039639  0.776463  26.053635  43.160880  5.798276  133.090358
     5          4.729875  0.289472  26.717824  25.732373  2.909811  135.289244
     10         1.043738  0.160671   4.353993   6.461914  2.015735   19.595393
My y-axis is delay and my x-axis is the number of producers. I want to have errorbars for probability p=0.3 and another one for p=0.5.
My python script is the following:
import sys
import time
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
pd.set_option('display.expand_frame_repr', False)
outputFile = 'averages.txt'
f_out = open(outputFile, 'w')
data = pd.read_csv(sys.argv[1], delimiter=",")
result = data.groupby(["Prob", "Producers"]).mean()
print("Writing to output file: " + outputFile)
result_s = str(result)
f_out.write(result_s)
f_out.close()
*** Update from James ***
for prob_index in result.index.levels[0]:
    r = result.loc[prob_index]
    labels = [col for col in r]
    lines = plt.plot(r)
    [line.set_label(str(prob_index)+" "+col) for col, line in zip(labels, lines)]
ax = plt.gca()
ax.legend()
ax.set_xticks(r.index)
ax.set_ylabel('Latency (s)')
ax.set_xlabel('Number of producer nodes')
plt.show()
Now I have 4 sliced arrays, one for each probability.
How do I slice them again based on delay (del) and retx, and plot error bars based on avg, min, and max?

Ok, there is a lot going on here. First, it is plotting 6 lines. When your code calls
plt.plot(np.transpose(np.array(result)[0:3, 0:3]), label = 'p=0.3')
plt.plot(np.transpose(np.array(result)[3:6, 0:3]), label = 'p=0.5')
it is calling plt.plot on a 3x3 array of data. plt.plot interprets this input not as x and y values, but as 3 separate series of y-values (with 3 points each). For the x values, it fills in the indices 0, 1, 2. In other words, the first plot call plots the data:
x = [0,1,2]; y = [8.060291, 8.330889, 1.012147]
x = [0,1,2]; y = [0.587227, 0.371387, 0.161975]
x = [0,1,2]; y = [26.709371, 54.468836, 4.320447]
Based on your x-label, I think you want the values to be x = [1,5,10]. Try this to see if it gets the plot you want.
# iterate over the first dataframe index
for prob_index in result.index.levels[0]:
    r = result.loc[prob_index]
    labels = [col for col in r]
    lines = plt.plot(r)
    [line.set_label(str(prob_index)+" "+col) for col, line in zip(labels, lines)]
ax = plt.gca()
ax.legend()
ax.set_xticks(r.index)
ax.set_ylabel('Latency (s)')
ax.set_xlabel('Number of producer nodes')
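For the follow-up about error bars, one possible sketch is to plot the average as the line and use the distances down to the minimum and up to the maximum as asymmetric error bars. The frame below rebuilds only the delay columns from the table in the question. Treating min and max as the error-bar ends is an assumption about what is wanted, not the only reasonable reading.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend; remove this line for interactive use
import matplotlib.pyplot as plt
import pandas as pd

# Delay columns copied from the table in the question.
result = pd.DataFrame(
    {
        "Avg-del": [8.060291, 8.330889, 1.012147, 8.039639, 4.729875, 1.043738],
        "Min-del": [0.587227, 0.371387, 0.161975, 0.776463, 0.289472, 0.160671],
        "Max-del": [26.709371, 54.468836, 4.320447, 26.053635, 26.717824, 4.353993],
    },
    index=pd.MultiIndex.from_product(
        [[0.3, 0.5], [1, 5, 10]], names=["Prob", "Producers"]
    ),
)

fig, ax = plt.subplots()
for prob in result.index.levels[0]:
    r = result.loc[prob]
    avg = r["Avg-del"].to_numpy()
    # Asymmetric error bars: distance from the average down to min and up to max.
    lower = avg - r["Min-del"].to_numpy()
    upper = r["Max-del"].to_numpy() - avg
    ax.errorbar(r.index.to_numpy(), avg, yerr=[lower, upper],
                marker="o", capsize=3, label=f"p={prob}")

ax.set_xticks([1, 5, 10])
ax.set_xlabel("Number of producer nodes")
ax.set_ylabel("Latency (s)")
ax.legend()
# plt.show()  # uncomment when running with an interactive backend
```

The retx columns can be handled the same way by swapping in Avg-retx, Min-retx, and Max-retx, ideally on a second axes since the scales differ.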

Using Python loop to compile single graph

I'm attempting to compile 12 boxplots on a single set of axes using a loop. However, when I run the code as shown below, I end up with 12 different figures. Can anyone provide input on how to make sure they all stay on one figure?
import numpy as np
import matplotlib.pyplot as plt
mons =['Jan','Feb','Mar','Apr','May','Jun','Jul','Aug','Sep','Oct','Nov','Dec']
for i in range(len(mons)):
    data = np.genfromtxt('Z:/Dan/SnowStatData/Errors/'+mons[i]+'ErrorStats.txt',skip_header=4)
    lat = data[:,1]
    lon = data[:,2]
    corr = data[:,3]
    corp = data[:,4]
    RMSE = data[:,5]
    MAE = data[:,6]
    SDE = data[:,7]
    sigcorrs = []
    clat = []
    clon = []
    for j in range(len(lat)):
        if corp[j] < 0.05:
            sigcorrs = np.append(sigcorrs,corr[j])
            clat = np.append(clat,lat[j])
            clon = np.append(clon,lon[j])
        else:
            continue
    plt.boxplot(sigcorrs,positions=[i])
    plt.show()

The trick is to create the figure before you start looping:
fig = plt.figure()
ax = plt.subplot(111)
Now, in the loop don't use plt.boxplot which might create a new figure, but call the function directly from the axes object:
ax.boxplot(sigcorrs,positions=[i])
And then call plt.show() once, after the loop.

It looks like you're calling plt.show() every time through the loop, so it's drawing the figure every time. Try putting it after the loop.
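Putting both suggestions together might look like the sketch below. Because the per-month files from the question are not available here, the values are random placeholders generated with NumPy; only the structure (one figure created before the loop, one boxplot per month, ticks set afterwards) is the point.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend; remove this line for interactive use
import matplotlib.pyplot as plt
import numpy as np

mons = ['Jan', 'Feb', 'Mar', 'Apr', 'May', 'Jun',
        'Jul', 'Aug', 'Sep', 'Oct', 'Nov', 'Dec']
rng = np.random.default_rng(0)  # synthetic stand-in for the ErrorStats files

fig, ax = plt.subplots()  # created once, before the loop
for i, mon in enumerate(mons):
    # Placeholder data: the real script would use the significant correlations for `mon`.
    sigcorrs = rng.uniform(-1, 1, size=30)
    ax.boxplot(sigcorrs, positions=[i])

# Each boxplot call adjusts the ticks, so set them once at the end.
ax.set_xticks(range(len(mons)))
ax.set_xticklabels(mons)
ax.set_xlim(-0.5, len(mons) - 0.5)
# plt.show()  # call once, after the loop, with an interactive backend
```

As a side note, the inner loop over j in the question can probably be replaced by boolean masking, for example sigcorrs = corr[corp < 0.05], which selects the same values without repeated np.append calls.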

pie plot in matplotlib in Python

I am trying to plot a pie chart from a .txt file whose data set looks like:
asp: 2.11
glu: 1.11
arg: 0.99
his: 5.11
acid: 11.1
base: 2.11
now,
1) I want to plot a pie chart with first 4 entries, with proper labeling.
2) and then another pie plot using last 2 entries.
I tried the following code, but I am getting errors. My code is:
from pylab import *
inp = open('c:/users/rox/desktop/xx.txt','r').read().strip().replace(': ',' ').split('\n')
for line in map(str.split,inp):
    x = line[0]
    z = line[1]
    fracs = [x]
    labels = [z]
pie(fracs,labels=labels,explode=None,autopct='%1.1f%%',shadow=False)
show()
But this code raises an error: Could not convert string to float...
Do I need to use tempfile to plot the first 4 entries in the .txt file?
And if I want to plot a pie chart from the last two lines of the data set, can that be done using slicing?

Edit: make input more general, so multiple plots can be read from same file:
import matplotlib.pyplot as plt
def read_data(f, num_lines=1, split_on=':'):
    # Take the next num_lines lines from the open file (next(f) works on Python 3)
    lines = [next(f) for i in range(num_lines)]
    pieces = (line.split(split_on) for line in lines)
    data = ((a, float(b)) for a, b in pieces)
    return zip(*data)
with open("xx.txt") as inf:
    amino_names, amino_values = read_data(inf, 4)
    ph_names, ph_values = read_data(inf, 2)
fig = plt.figure(figsize=(2,1))
p1 = fig.add_subplot(1,2,1)
p1.pie(amino_values, labels=amino_names)
p2 = fig.add_subplot(1,2,2)
p2.pie(ph_values, labels=ph_names)
plt.show()
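On the slicing question: yes, slicing works once all lines are parsed into a list. The error in the original code comes from x = line[0] being the label (for example "asp"), which is then passed to pie() as the value. A minimal sketch follows, with the lines from the question embedded as a string so that it runs on its own; the helper name parse is hypothetical.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend; remove this line for interactive use
import matplotlib.pyplot as plt

# Lines copied from the question; in practice use open("xx.txt").read().
TEXT = """asp: 2.11
glu: 1.11
arg: 0.99
his: 5.11
acid: 11.1
base: 2.11
"""

def parse(text):
    """Return a list of (label, value) pairs from 'label: value' lines."""
    pairs = []
    for line in text.splitlines():
        if not line.strip():
            continue  # skip blank lines
        label, value = line.split(":")
        pairs.append((label.strip(), float(value)))
    return pairs

pairs = parse(TEXT)

fig, (ax1, ax2) = plt.subplots(1, 2, figsize=(8, 4))
# First four entries on the left, last two entries on the right.
for ax, chunk in ((ax1, pairs[:4]), (ax2, pairs[-2:])):
    labels, values = zip(*chunk)
    ax.pie(values, labels=labels, autopct="%1.1f%%")
# plt.show()  # uncomment when running with an interactive backend
```

No tempfile is needed for this; the slices pairs[:4] and pairs[-2:] select the two groups directly.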
