Units on x axis of graph do not have the same scale - python

I have the following code
import numpy as np
import matplotlib.pyplot as plt
#x axis
data = open("spectra.dat","r")
linesColumn = data.readlines()
xaxis = []
for x in linesColumn:
    xaxis.append(x.split()[0])
data.close()
#y data
data = open("spectra.dat","r")
linesColumn1 = data.readlines()
firstColumn = []
for x in linesColumn1:
    firstColumn.append(x.split()[1])
data.close()
plt.plot(xaxis, firstColumn)
plt.show()
The data is here https://drive.google.com/file/d/177kzRGXIoSKvH1fC9XJZubfK3AHzHsFF/view?usp=sharing
When I plot the graph I get a linear function because the units on the x axis do not scale the same way. In the beginning one unit is 0.1 and in the end it's 5, but it's still displayed as the same distance on x axis.
How do I fix that?
Also, is there a way to optimize the column splitting (doing it through loop or something) and storing each column as one list?

The problem is that the values read from the file are of type str, so matplotlib treats them as categories and spaces them evenly regardless of their numeric value. Converting them to float gives the expected figure. The code is shown below:
import numpy as np
from matplotlib import pyplot as plt
#x axis
data = open("spectra.dat","r")
linesColumn = data.readlines()
xaxis = []
for x in linesColumn:
    xaxis.append(float(x.split()[0]))
data.close()
#y data (converted to float as well, otherwise the y axis is categorical)
data = open("spectra.dat","r")
linesColumn1 = data.readlines()
firstColumn = []
for x in linesColumn1:
    firstColumn.append(float(x.split()[1]))
data.close()
plt.plot(xaxis, firstColumn)
plt.show()
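The second part of the question (reading every column in one pass) could be handled along these lines. This is only a sketch: `read_columns` is a hypothetical helper name, and the sample values stand in for the real spectra.dat, which is assumed to contain whitespace-separated numbers.

```python
import os
import tempfile


def read_columns(path):
    """Read a whitespace-separated numeric file and return one list per column."""
    with open(path) as fh:
        # Parse every non-empty line into a row of floats.
        rows = [[float(value) for value in line.split()] for line in fh if line.strip()]
    # zip(*rows) transposes the list of rows into columns.
    return [list(column) for column in zip(*rows)]


# Hypothetical sample data standing in for spectra.dat.
sample = "0.1 10.0 1.0\n0.2 12.5 2.0\n5.0 11.0 3.0\n"
with tempfile.NamedTemporaryFile("w", suffix=".dat", delete=False) as tmp:
    tmp.write(sample)

columns = read_columns(tmp.name)
os.remove(tmp.name)

xaxis, first_column = columns[0], columns[1]
print(xaxis)         # [0.1, 0.2, 5.0]
print(first_column)  # [10.0, 12.5, 11.0]
```

The file is opened once and every column comes back as its own list of floats. If numpy arrays are acceptable, `np.loadtxt("spectra.dat", unpack=True)` returns the columns in a single call.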

Related

Wanted to partially remove items on x axis while using matplotlib.pyplot

I am designing a currency converter app and had the idea of adding graphical currency analysis to it.
For this I've started using matplotlib.pyplot. I take a start date (the date from which the graph compares data) as input from the user, and use it to fetch real-time currency data from certain sources.
But here is the main issue: when I draw the graph, the x-axis is really bad 😫.
I'll include the output I am getting (a graph) and a rough version of my code. The main issue I want to eliminate is that I want only certain parts of the x-axis to be visible.
import matplotlib.pyplot as plt
import requests
x = []
y = []
for i in range(fyear,tyear):
    for j in range(fmonth,tmonth):
        for k in range(fday,tday):
            response = requests.get("https://api.ratesapi.io/api/{}-{}-{}?base={}&symbols={}".format(i,j,k,inp_curr,out_curr))
            data = response.json()
            rate = data['rates'][out_curr]
            y.append(rate)
            x.append("{}/{}/{}".format(j,i,k))
plt.plot(x,y)
Obtained output: (image not included)
I need an answer quickly.
If by "parts" you mean showing only a few labels along the x axis, you could use xticks and locator_params. See the docs here: https://matplotlib.org/3.1.1/api/_as_gen/matplotlib.pyplot.xticks.html
import matplotlib.pyplot as plt
import numpy as np
import requests
# use some fake data for testing - use your params
fyear = 2019
tyear = 2020
fmonth = 1
tmonth = 13
fday=1
tday=28
inp_curr = "EUR"
out_curr = "GBP"
# init lists
x = []
y = []
for i in range(fyear,tyear):
    for j in range(fmonth,tmonth):
        for k in range(fday,tday):
            response = requests.get("https://api.ratesapi.io/api/{}-{}-{}?base={}&symbols={}".format(i,j,k,inp_curr,out_curr))
            data = response.json()
            rate = data['rates'][out_curr]
            y.append(rate)
            x.append("{}/{}/{}".format(j,i,k))
# create subplot
fig, ax = plt.subplots(1,1, figsize=(20, 11))
# plot image
img = ax.plot(x, y)
# set a tick at every x position
ax.set_xticks(np.arange(len(x)))
# set the label for each tick (the x list itself)
ax.set_xticklabels(x)
# set the number of ticks you want to display
# you can just pick a number, e.g. 10, and only 10 ticks will be shown
# to show, say, the first day of each month, use this
n = round(len(x)/(tday-fday))
plt.locator_params(axis='x', nbins=n)
# change labels position to oblique
ax.get_figure().autofmt_xdate()
fig.tight_layout()
Remember to import numpy! Hope it helps you. (Output image not included.)
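One way to make sure each visible tick keeps its own date label is to choose the tick positions and their labels with the same slice, instead of relying on the locator to thin them. The sketch below uses a hypothetical helper, `thin_ticks`, and made-up date strings in the question's month/year/day format. It assumes x was plotted as a list of unique strings, so the categories sit at positions 0, 1, 2, and so on.

```python
def thin_ticks(labels, step):
    """Return (positions, labels) keeping every `step`-th label."""
    if step < 1:
        raise ValueError("step must be at least 1")
    positions = list(range(0, len(labels), step))
    return positions, [labels[p] for p in positions]


# Hypothetical date labels in the same month/year/day format as the question.
x = ["{}/{}/{}".format(month, 2019, day) for month in range(1, 4) for day in range(1, 28)]

positions, shown = thin_ticks(x, 27)  # 27 days per month in this sample
print(positions)  # [0, 27, 54]
print(shown)      # ['1/2019/1', '2/2019/1', '3/2019/1']
```

With an existing axes object, the result would be applied as `ax.set_xticks(positions)` followed by `ax.set_xticklabels(shown)`.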

Plot certain range of values with pandas and matplotlib

I have parsed data from a .json file and plotted it, but I only want a certain range of it,
e.g. year-month = 2014-12 to 2020-03.
The code is:
import pandas as pd
import matplotlib.pyplot as plt
data = pd.read_json("observed-solar-cycle-indices.json", orient='records')
data = pd.DataFrame(data)
print(data)
x = data['time-tag']
y = data['ssn']
plt.plot(x, y, 'o')
plt.xlabel('Year-day'), plt.ylabel('SSN')
plt.show()
Here is the result (image not included); as you can see, there are too many points.
Here is the JSON file: https://services.swpc.noaa.gov/json/solar-cycle/observed-solar-cycle-indices.json
How can I either parse out certain values from the JSON file or plot only a certain range?
The following should work:
Select the data using a start and end date
ndata = data[ (data['time-tag'] > '2014-01') & (data['time-tag'] < '2020-12')]
Plot the data. The x-axis labeling is adapted to display only every 12th label
x = ndata['time-tag']
y = ndata['ssn']
fig, ax = plt.subplots()
plt.plot(x, y, 'o')
every_nth = 12
for n, label in enumerate(ax.xaxis.get_ticklabels()):
    if n % every_nth != 0:
        label.set_visible(False)
plt.xlabel('Year-Month')
plt.xticks(rotation='vertical')
plt.ylabel('SSN')
plt.show()
You could search for the index of your start and end dates and use those indices to cut both x and y down to smaller lists that you can plot.
For example, it might be something like
x = data['time-tag'].tolist()
y = data['ssn'].tolist()
# list.index returns the position of the first matching value
start_index = x.index('2014-12')
end_index = x.index('2020-03')
# + 1 so that the end date itself is included
x_subsection = x[start_index : end_index + 1]
y_subsection = y[start_index : end_index + 1]
plt.plot(x_subsection, y_subsection, 'o')
plt.xlabel('Year-Month'), plt.ylabel('SSN')
plt.show()
The columns are converted with tolist() because a pandas Series has no list-style index() method. This assumes the time-tag values are strings such as '2014-12'.
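Another option is to parse the time tags into real timestamps and filter on those, so that the comparison is chronological rather than textual. The sketch below uses a few made-up rows in place of the NOAA file; the ssn values are illustrative only, and the "YYYY-MM" format of "time-tag" is an assumption based on the range given in the question.

```python
import pandas as pd

# Hypothetical sample rows shaped like the NOAA file ("time-tag" as "YYYY-MM").
data = pd.DataFrame({
    "time-tag": ["2014-11", "2014-12", "2015-01", "2020-03", "2020-04"],
    "ssn": [70.1, 78.0, 67.0, 1.5, 5.2],
})

# Parse the strings into timestamps so comparisons are chronological.
data["time-tag"] = pd.to_datetime(data["time-tag"], format="%Y-%m")

# Series.between is inclusive on both ends by default.
start = pd.Timestamp("2014-12-01")
end = pd.Timestamp("2020-03-01")
ndata = data[data["time-tag"].between(start, end)]

print(ndata)
```

Plotting `ndata['time-tag']` against `ndata['ssn']` then gives a true date axis, and matplotlib chooses the tick spacing itself instead of drawing one label per row.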

How to grid plot 2D categorical data

I have data that looks like:
Name X Y
A HIGH MID
B LOW LOW
C MID LOW
D HIGH MID
How can I plot this data in a 2-D diagram with a 3x3 grid, adding a random variation to the position of each data point (labelled with its name) so that the points have enough spacing between each other?
It should look somewhat like this (image not included):
I tried the following, but I don't know how to plot the values in between the grid lines rather than exactly on them, so that they do not overlap.
import pandas as pd
import matplotlib.pyplot as plt
### Mock Data ###
data = """A0,LOW,LOW
A,MID,MID
B,LOW,MID
C,MID,HIGH
D,LOW,MID
E,HIGH,HIGH"""
df = pd.DataFrame([x.split(',') for x in data.split('\n')])
df.columns = ['name','X','Y']
### Plotting ###
fig,axs = plt.subplots()
axs.scatter(df.X,df.Y,label=df.name)
axs.set_xlabel('X')
axs.set_ylabel('Y')
for i,p in enumerate(df.name):
    axs.annotate(p, (df.X[i],df.Y[i]))
axs.grid()
axs.set_axisbelow(True)
fig.tight_layout()
plt.show()
Result: (image not included)
You can control the positions directly and change the labels on the axes. There are a few problems with your drawing because it does not take some issues into account, such as "what label will you show if more than one point falls at the same location?".
In any case, here is a possible solution:
import pandas as pd
import matplotlib.pyplot as plt
import numpy as np
### Mock Data ###
data = """A0,LOW,LOW
A,MID,MID
B,LOW,MID
C,MID,HIGH
D,LOW,MID
E,HIGH,HIGH"""
df = pd.DataFrame([x.split(',') for x in data.split('\n')])
df.columns = ['name','X','Y']
pos = [0, 1, 2]
lbls = ["LOW", "MID", "HIGH"]
trans = {lbls[i]:pos[i] for i in range(len(pos))}
mat = np.zeros((3, 3), dtype="U10") # This is limited to 10 characters
xxs = []
yys = []
offset = 0.05
for i in range(df.shape[0]):
    xc, yc = trans[df.X[i]], trans[df.Y[i]]
    if mat[xc, yc]=="":
        mat[xc, yc] = df.name[i]
    else:
        mat[xc, yc] = mat[xc, yc] + ";" + df.name[i]
    xxs.append(xc)
    yys.append(yc)
fig,axs = plt.subplots()
axs.scatter(xxs, yys)
for i in range(df.shape[0]):
    name = mat[xxs[i], yys[i]]
    axs.text(xxs[i]+offset, yys[i]+offset, name)
axs.set_xticks(pos)
axs.set_xticklabels(lbls)
axs.set_yticks(pos)
axs.set_yticklabels(lbls)
for i in pos:
    axs.axhline(pos[i]-0.5, color="black")
    axs.axvline(pos[i]-0.5, color="black")
axs.set_xlim(-0.5, 2.5)
axs.set_ylim(-0.5, 2.5)
plt.show()
This results in a 3x3 grid plot (image not included).
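The question also asks for a random variation so that points sharing a cell do not sit on top of each other. A minimal sketch of that idea is to map each category to an integer and add a small uniform offset. The seed and the `max_jitter` value of 0.3 are assumptions chosen for illustration. Random offsets make overlap unlikely but do not guarantee a minimum spacing.

```python
import numpy as np

levels = {"LOW": 0, "MID": 1, "HIGH": 2}

# Mock data from the question: (name, X, Y).
rows = [("A0", "LOW", "LOW"), ("A", "MID", "MID"), ("B", "LOW", "MID"),
        ("C", "MID", "HIGH"), ("D", "LOW", "MID"), ("E", "HIGH", "HIGH")]

rng = np.random.default_rng(0)  # fixed seed so the layout is reproducible
max_jitter = 0.3                # assumed spread; below 0.5 keeps points inside their cell

# Integer cell centres, then a uniform random offset per point.
xs = np.array([levels[x] for _, x, _ in rows], dtype=float)
ys = np.array([levels[y] for _, _, y in rows], dtype=float)
xs += rng.uniform(-max_jitter, max_jitter, size=len(rows))
ys += rng.uniform(-max_jitter, max_jitter, size=len(rows))

for (name, _, _), x, y in zip(rows, xs, ys):
    print(f"{name}: ({x:.2f}, {y:.2f})")
```

The jittered `xs` and `ys` can be passed to `axs.scatter` and `axs.text` in place of the integer positions, with the tick labels and grid lines set as in the answer above.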

Removing Data Below A Line In A Scatterplot (Python)

So I had code that graphed a 2D histogram of my dataset. I plotted it like so:
histogram = plt.hist2d(fehsc, ofesc, bins=nbins, range=[[-1,.5],[0.225,0.4]])
I wanted to only look at data above a certain line though, so I added the following and it worked just fine:
counts = histogram[0]
xpos = histogram[1]
ypos = histogram[2]
image = histogram[3]
newcounts = counts #we're going to iterate over this
for i in range (nbins):
    xin = xpos[i]
    yin = ypos
    yline = m*xin + b
    reset = np.where(yin < yline) #anything less than yline we want to be 0
    #index = index[0:len(index)-1]
    countout = counts[i]
    countout[reset] = 0
    newcounts[i] = countout
However, I now need to draw a regression line through that cut region. Doing so is not possible (AFAIK) with plt.hist2d, so I'm using plt.scatter. The problem is that I don't know how to make that cut anymore, since I can't index the scatterplot.
I have this now:
plt.xlim(-1,.5)
plt.ylim(.225, .4)
scatter = plt.scatter(fehsc,ofesc, marker = ".")
and I only want to retain the data above some line:
xarr = np.arange(-1,0.5, 0.015)
yarr = m*xarr + b
plt.plot(xarr, yarr, color='r')
I've tried running the loop with some variations of the variables but I don't actually understand or know how to get it to work.
You could define a mask for your data before you plot and then plot only the data points that meet your criteria. Below is an example in which all data points above a certain line are plotted in green and all data points below the line are plotted in black.
from matplotlib import pyplot as plt
import numpy as np
#the scatterplot data
xvals = np.random.rand(100)
yvals = np.random.rand(100)
#the line
b = 0.1
m = 1
x = np.linspace(0,1,num=100)
y = m*x+b
mask = yvals > m*xvals+b
plt.scatter(xvals[mask],yvals[mask],color='g')
plt.scatter(xvals[~mask],yvals[~mask],color='k')
plt.plot(x,y,'r')
plt.show()
The result looks like this (image not included).
Hope this helps.
EDIT:
If you want to create a 2D histogram, where the portion below the line is set to zero, you can do that by first generating the histogram using numpy (as an array) and then setting the values inside that array to zero, if the bins fall below the line. After that, you can plot the matrix using plt.pcolormesh:
from matplotlib import pyplot as plt
import numpy as np
#the scatterplot data
xvals = np.random.rand(1000)
yvals = np.random.rand(1000)
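histogram,xbins,ybins = np.histogram2d(xvals,yvals,bins=50)
#histogram2d returns shape (nx, ny); transpose to (ny, nx) to match meshgrid and pcolormesh
histogram = histogram.T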
#computing the bin centers from the bin edges:
xcenters = 0.5*(xbins[:-1]+xbins[1:])
ycenters = 0.5*(ybins[:-1]+ybins[1:])
#the line
b = 0.1
m = 1
x = np.linspace(0,1,num=100)
y = m*x+b
#hiding the part of the histogram below the line
xmesh,ymesh = np.meshgrid(xcenters,ycenters)
mask = m*xmesh+b > ymesh
histogram[mask] = 0
#making the plot
mat = plt.pcolormesh(xcenters,ycenters,histogram)
line = plt.plot(x,y,'r')
plt.xlim([0,1])
plt.ylim([0,1])
plt.show()
The result would be something like this (image not included).
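The question also asks for a regression line through the retained points. One possible sketch is a degree-1 least-squares fit with `np.polyfit` on the masked data. The random data, the seed, and the line parameters are placeholders in the same spirit as the examples above, not the asker's `fehsc` and `ofesc` arrays.

```python
import numpy as np

# Placeholder scatter data; replace with the real arrays.
rng = np.random.default_rng(1)
xvals = rng.random(200)
yvals = rng.random(200)

# The cut line y = m*x + b.
m, b = 1.0, 0.1

# Keep only the points above the line.
mask = yvals > m * xvals + b

# Degree-1 least-squares fit; polyfit returns [slope, intercept].
slope, intercept = np.polyfit(xvals[mask], yvals[mask], 1)

# Points on the fitted line, ready to be drawn over the scatter plot.
x_fit = np.linspace(xvals[mask].min(), xvals[mask].max(), 50)
y_fit = slope * x_fit + intercept

print(slope, intercept)
```

Calling `plt.plot(x_fit, y_fit)` after the `plt.scatter` call for the masked points would draw the fit on the same axes.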

Plot 2D array with Pandas, Matplotlib, and Numpy

As a result from simulations, I parsed the output using Pandas groupby(). I am having a bit of difficulty to plot the data the way I want. Here's the Pandas output file (suppressed for simplicity) that I'm trying to plot:
                  Avg-del   Min-del    Max-del   Avg-retx  Min-retx    Max-retx
Prob Producers
0.3  1           8.060291  0.587227  26.709371  42.931779  5.130041  136.216642
     5           8.330889  0.371387  54.468836  43.166326  3.340193  275.932170
     10          1.012147  0.161975   4.320447   6.336965  2.026241   19.177802
0.5  1           8.039639  0.776463  26.053635  43.160880  5.798276  133.090358
     5           4.729875  0.289472  26.717824  25.732373  2.909811  135.289244
     10          1.043738  0.160671   4.353993   6.461914  2.015735   19.595393
My y-axis is delay and my x-axis is the number of producers. I want to have errorbars for probability p=0.3 and another one for p=0.5.
My python script is the following:
import sys
import time
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
pd.set_option('display.expand_frame_repr', False)
outputFile = 'averages.txt'
f_out = open(outputFile, 'w')
data = pd.read_csv(sys.argv[1], delimiter=",")
result = data.groupby(["Prob", "Producers"]).mean()
print("Writing to output file: " + outputFile)
result_s = str(result)
f_out.write(result_s)
f_out.close()
*** Update from James ***
for prob_index in result.index.levels[0]:
    r = result.loc[prob_index]
    labels = [col for col in r]
    lines = plt.plot(r)
    [line.set_label(str(prob_index)+" "+col) for col, line in zip(labels, lines)]
ax = plt.gca()
ax.legend()
ax.set_xticks(r.index)
ax.set_ylabel('Latency (s)')
ax.set_xlabel('Number of producer nodes')
plt.show()
Now I have 4 sliced arrays, one for each probability.
How do I slice them again based on delay(del) and retx, and plot errorbars based on ave, min, max?
Ok, there is a lot going on here. First, it is plotting 6 lines. When your code calls
plt.plot(np.transpose(np.array(result)[0:3, 0:3]), label = 'p=0.3')
plt.plot(np.transpose(np.array(result)[3:6, 0:3]), label = 'p=0.5')
it is calling plt.plot on a 3x3 array of data. plt.plot interprets this input not as an x and a y, but as 3 separate series of y-values (with 3 points each). For the x values, it fills in the indices 0, 1, 2. In other words, the first plot call is plotting the data:
x = [0,1,2]; y = [8.060291, 8.330889, 1.012147]
x = [0,1,2]; y = [0.587227, 0.371387, 0.161975]
x = [0,1,2]; y = [26.709371, 54.468836, 4.320447]
Based on your x-label, I think you want the values to be x = [1,5,10]. Try this to see if it gets the plot you want.
# iterate over the first dataframe index
for prob_index in result.index.levels[0]:
    r = result.loc[prob_index]
    labels = [col for col in r]
    lines = plt.plot(r)
    [line.set_label(str(prob_index)+" "+col) for col, line in zip(labels, lines)]
ax = plt.gca()
ax.legend()
ax.set_xticks(r.index)
ax.set_ylabel('Latency (s)')
ax.set_xlabel('Number of producer nodes')
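For the error bars asked about in the question, one possible approach is to turn the Avg, Min and Max columns into an asymmetric error array per probability. The sketch below rebuilds the delay columns from the table in the question. `asymmetric_errors` is a hypothetical helper name, and the column naming pattern ("Avg-", "Min-", "Max-" followed by the metric) is taken from that table.

```python
import numpy as np
import pandas as pd

# Delay columns copied from the table in the question.
index = pd.MultiIndex.from_product([[0.3, 0.5], [1, 5, 10]], names=["Prob", "Producers"])
result = pd.DataFrame({
    "Avg-del": [8.060291, 8.330889, 1.012147, 8.039639, 4.729875, 1.043738],
    "Min-del": [0.587227, 0.371387, 0.161975, 0.776463, 0.289472, 0.160671],
    "Max-del": [26.709371, 54.468836, 4.320447, 26.053635, 26.717824, 4.353993],
}, index=index)


def asymmetric_errors(frame, metric):
    """Return (avg, yerr), where yerr has shape (2, N): lower then upper distances."""
    avg = frame["Avg-" + metric].to_numpy()
    lower = avg - frame["Min-" + metric].to_numpy()
    upper = frame["Max-" + metric].to_numpy() - avg
    return avg, np.vstack([lower, upper])


for prob in result.index.levels[0]:
    r = result.loc[prob]  # rows for one probability, indexed by Producers
    avg, yerr = asymmetric_errors(r, "del")
    print(prob, r.index.tolist(), avg, yerr.shape)
```

Inside the loop, `plt.errorbar(r.index, avg, yerr=yerr, label="p={}".format(prob))` would draw one series per probability. Passing "retx" as the metric would work the same way once the retx columns are included in the frame.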
