I am quite new to python so sorry if my question is very basic.
I have a couple of data files (let's say 4), each containing 9 columns and n rows (I need to skip the first row because it is the name of each column). I would like to have the flexibility of plotting any two columns against each other and at the same time do it in one graph.
Let's say I want to take column 2 and 4 from all the data files and plot against each other and have them in one graph to compare them.
What is the general way to do it please?
I looked at a lot of different examples but I couldn't really find the one that addresses this specific case.
Below is a piece of code I have for plotting two columns against each other for one file:
from pylab import *
### processing function
def store(var,textFile):
    data=loadtxt(textFile,skiprows=1)
    it=[]
    eps=[]
    sig=[]
    tc=[]
    sc=[]
    te=[]
    se=[]
    ubf=[]
    for i in range(0,len(data)):
        it.append(float(data[i,1]))
        eps.append(float(data[i,0]))
        sig.append(float(data[i,4]))
        tc.append(float(data[i,6]))
        sc.append(float(data[i,2]))
        te.append(float(data[i,7]))
        se.append(float(data[i,3]))
        ubf.append(float(data[i,8]))
    var.append(it)
    var.append(eps)
    var.append(sig)
    var.append(tc)
    var.append(sc)
    var.append(te)
    var.append(se)
    var.append(ubf)
### data input
dataFile1='555_20K_tensionTestCentreCrack_L5a0_r0.01'
a1=[]
store(a1,dataFile1)
rcParams.update({'legend.numpoints':1,'font.size': 20,'axes.labelsize':25,'xtick.major.pad':10,'ytick.major.pad':10,'legend.fontsize':20})
lw=2
ms=10
### plots
crossSection=0.04
figure(0,figsize=(10,10))
ax1=subplot(1,1,1)
grid()
xlabel('iteration [-]')
ax1.plot(a1[0],[x/1e6 for x in a1[2]],'-k',linewidth=lw)
ylabel(r'$\sigma_1$ [MPa]')
#axis(ymin=0,ymax=10)
ax2 = ax1.twinx()
ax2.plot(a1[0],a1[7],'-r',linewidth=lw)
ylabel('unbForce [-]')
figure(1,figsize=(10,10))
ax1=subplot(1,1,1)
grid()
xlabel(r'$\varepsilon_1$ [millistrain]')
#axis(xmin=0,xmax=0.12)
plot([x*1e3 for x in a1[1]],[x/1e6 for x in a1[2]],'-k',linewidth=lw)
ylabel(r'$\sigma_1$ [MPa]')
#axis(ymin=0,ymax=10)
#savefig(dataFile1+'_sigVSeps.eps',dpi=1000,format='eps',transparent=False)
figure(2,figsize=(10,10))
ax1=subplot(1,1,1)
grid()
xlabel(r'$\varepsilon_1$ [millistrain]')
axis(xmin=0,xmax=0.12)
ax1.plot([x*1e3 for x in a1[1]],[x/1e6 for x in a1[2]],'-k',linewidth=lw)
ylabel(r'$\sigma_1$ [MPa]')
#axis(ymin=0,ymax=10)
ax2 = ax1.twinx()
ax2.plot([x*1e3 for x in a1[1]],a1[3],'-b',linewidth=lw)
ax2.plot([x*1e3 for x in a1[1]],a1[4],'-r',linewidth=lw)
ylabel('cumulative number of microcracks [-]')
legend(('tensile','shear'))
#savefig(dataFile1+'_sig&cracksVSeps.eps',dpi=1000,format='eps',transparent=False)
### show or save
show()
The names of the columns in the data file are it, eps, sig, tc, sc, te, se, ubf and i.
The data file name is 555_20K_tensionTestCentreCrack_L5a0_r0.01.
As you see using this code I am able to plot any two different columns against each other. But I can do it only for one data file.
How can I change this code in order to be able to call different data files?
I actually didn't write this piece of code myself, so I don't get what this line means:
ax1.plot(a1[0],[x/1e6 for x in a1[2]],'-k',linewidth=lw)
and also this line:
plot([x*1e3 for x in a1[1]],[x/1e6 for x in a1[2]],'-k',linewidth=lw)
Sorry again for my weird question.
Please tell me if you need more details about my case.
Thanks a lot
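A minimal sketch of one possible approach, assuming every file is whitespace-separated with one header row, as in the code above. The function name plot_columns and its xscale/yscale parameters are made up for illustration. On the two lines asked about: [x/1e6 for x in a1[2]] is a list comprehension that builds a new list with every value of a1[2] divided by 1e6 (presumably Pa to MPa, judging by the axis label), and [x*1e3 for x in a1[1]] does the same with a multiplication. With a numpy array the same scaling can be written directly as data[:, col] / 1e6, which is what the sketch does.

```python
import numpy as np
import matplotlib.pyplot as plt


def plot_columns(files, xcol, ycol, xscale=1.0, yscale=1.0, ax=None):
    """Plot column ycol against column xcol for every file on one set of axes.

    Column indices are zero-based. xscale and yscale multiply the raw values,
    e.g. yscale=1e-6 has the same effect as [y/1e6 for y in ...].
    """
    if ax is None:
        _, ax = plt.subplots(figsize=(10, 10))
    for path in files:
        data = np.loadtxt(path, skiprows=1)  # skip the header row
        ax.plot(data[:, xcol] * xscale, data[:, ycol] * yscale, label=str(path))
    ax.grid(True)
    ax.legend()
    return ax
```

Usage would look like plot_columns(['file1', 'file2', 'file3', 'file4'], 1, 3) followed by plt.show(); the file names here are placeholders.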
I'm trying to plot a multi-line graph from a pandas dataframe using seaborn. Below is a .csv of the data and the desired plot. In Excel I simply selected the whole dataset and swapped the axes. Technically there are 110 lines (rows) on this, but many aren't visible because they only contain 0's.
This is my code:
individual_burst_data = {'nb001':nb001, 'nb002':nb002, 'nb003':nb003, 'nb004':nb004, 'nb005':nb005, 'nb006':nb006, 'nb007':nb007, 'nb008':nb008, 'nb009':nb009, 'nb010':nb010, 'nb011':nb011, 'nb012':nb012, 'nb013':nb013, 'nb015':nb015, 'nb016':nb016 }
ibd_panda_conv = pd.DataFrame(individual_burst_data)
sns.lineplot(data = ibd_panda_conv, x = individual_burst_data, y =ibd_panda_conv)
Other sources seem to only extract one column, whereas I need all the columns.
I tried to create an index for the y-axis
index_data = list(range(0,len(individual_burst_data)))
but this didn't work either.
The seaborn lineplot() documentation says:
Passing the entire wide-form dataset to data plots a separate line for each column
Since you want a line for each row instead, you need to transpose your dataframe, so try this:
sns.lineplot(data=ibd_panda_conv.T, dashes=False)
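To see what the transpose does, here is a small sketch with made-up numbers (the frame wide and its values are invented for illustration). It uses pandas' own plotting rather than seaborn, but the wide-form idea is the same: one line per column, so transposing first gives one line per original row.

```python
import pandas as pd
import matplotlib.pyplot as plt

# Hypothetical wide-form data: 3 rows, 4 columns.
wide = pd.DataFrame({
    'nb001': [0, 1, 2],
    'nb002': [3, 4, 5],
    'nb003': [6, 7, 8],
    'nb004': [9, 10, 11],
})

# wide.plot() would draw 4 lines (one per column).
# After transposing, each original row becomes a column, so 3 lines are drawn.
ax = wide.T.plot(legend=False)
```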
I have a bunch of data files, each holding a number of samples per time frame that depends on the system. For instance, at time t=1 one file might have 10 items and another 20, but within a given file every later time has the same number of items. The format is time, x, y, z in columns, loaded into a numpy array. The time values show which frame a row belongs to; let's go with 10 items per frame as an example. So each frame is a (10,4) numpy array whose time values are identical, and with, let's say, 100 frames in the file, the whole array is (1000,4).
I want to plot the data with time on the x-axis and manipulations of the other data on the y-axis, but I am unsure how to do this with the line plot methods in matplotlib. Normally, to provide both x and y values I believe I need to do a scatter plot, so I'm hoping there's a better way. Ideally, I want to treat each line that has the same time code as a different series (so it is coloured differently), and the data for that same line number in the next frame (time value) should get the same colour, giving good contiguous lines. We can look at the time column and figure out how many items share a time code; let's call it "n". Sample code:
a = numpy.loadtxt('sampledata.txt')
plt.plot(a[:0,:,n],a[:1,:1])
plt.show()
I think this code expresses what I'm going for, though it doesn't work.
Edit:
I hope this is what you wanted.
seaborn scatterplot can categorize data to some groups which have the same codes (time code in this case) and use the same colors to them.
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns
df = pd.read_csv(r"E:\Programming\Python\Matplotlib\timecodes.csv",
                 names=["time","x","y","z","code"]) #use your file
df["time"]=pd.to_datetime(df["time"]) #recognize the data as Time
df["x"]=df["time"].dt.day # I changed the data into "Date only" and imported to x column. Easier to see on graph.
#just used random numbers in y and z in my data.
sns.scatterplot(x="x", y="y", data=df, hue="code") #hue does the grouping; recent seaborn versions require x and y as keywords
plt.show()
I used a CSV file here, but you can read your text file as well by adding sep="\t" to the arguments. I also added a code column to the file. If you have one, hue can group the data in the graph by it, so you don't have to separate the data or build a hierarchical index. If you want to change colors or grouping, please see the seaborn website.
Hope this helps.
Alternatively, here is the method I used; Tim's answer is still accurate as well. Since the time codes are not date/time information, I modified my own code to add tags as a second column I call "p" (they're polymers).
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns
datain = np.loadtxt('somefile.txt')
df = pd.DataFrame(data = datain, columns = ["t","p","x","y","z"])
ax = sns.scatterplot(x="t", y="x", data=df, hue="p")
plt.show()
And of course the other columns can be plotted similarly if desired.
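For the line-plot version the question originally asked about, a rough sketch with plain numpy and matplotlib is below. It assumes the rows are ordered frame by frame, that the items appear in the same order within every frame, and that a time value never repeats across frames. The function name plot_tracks is made up.

```python
import numpy as np
import matplotlib.pyplot as plt


def plot_tracks(a, col=1, ax=None):
    """Draw one line per item from an array with columns time, x, y, z.

    a has shape (frames * n, 4); col picks the column shown on the y-axis.
    """
    # n = number of rows that share the first time code
    n = np.count_nonzero(a[:, 0] == a[0, 0])
    # Regroup into (frames, n, columns) so that frames[:, k] is item k over time
    frames = a.reshape(-1, n, a.shape[1])
    t = frames[:, 0, 0]  # one time value per frame
    if ax is None:
        _, ax = plt.subplots()
    # A 2D y array is plotted column by column: one line (and colour) per item
    return ax.plot(t, frames[:, :, col])
```

Something like plot_tracks(np.loadtxt('sampledata.txt'), col=1) followed by plt.show() would then plot x against time for every item.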
I have dataframes with columns containing x,y coordinates for multiple points. One row can consist of several points.
I'm trying to find out an easy way to be able to plot lines between each point generating a curve for each row of data.
Here is a simplified example where two lines are represented by two points each.
line1 = {'p1_x':1, 'p1_y':10, 'p2_x':2, 'p2_y':11 }
line2 = {'p1_x':2, 'p1_y':9, 'p2_x':3, 'p2_y':12 }
df = pd.DataFrame([line1,line2])
df.plot(y=['p1_y','p2_y'], x=['p1_x','p2_x'])
When trying to plot them I expect line 1 to start where x=1 and line 2 to start where x=2.
Instead, the x-axis contains two value pairs, (1,2) and (2,3), and both lines have the same start and end point on the x-axis.
How do I get around this problem?
Edit:
If using matplotlib, the following hardcoded values generate the plot I'm interested in:
plt.plot([[1,2],[2,3]],[[10,9],[11,12]])
While I'm sure that there should be a more succinct way using pure pandas, here's a simple approach using matplotlib and some derivatives from the original df. (I hope I understood the question correctly.)
Assumption: In df, you place x values in even columns and y values in odd columns
Obtain x values
x = df.loc[:, df.columns[::2]]
x
   p1_x  p2_x
0     1     2
1     2     3
Obtain y values
y = df.loc[:, df.columns[1::2]]
y
   p1_y  p2_y
0    10    11
1     9    12
Then plot using a for loop
for i in range(len(df)):
    plt.plot(x.iloc[i,:], y.iloc[i,:])
One does not need to create additional data frames. One can loop through the rows to plot these lines:
line1 = {'p1_x':1, 'p1_y':10, 'p2_x':2, 'p2_y':11 }
line2 = {'p1_x':2, 'p1_y':9, 'p2_x':3, 'p2_y':12 }
df = pd.DataFrame([line1,line2])
for i in range(len(df)): # for each row:
    # plt.plot([list of Xs], [list of Ys])
    plt.plot([df.iloc[i,0],df.iloc[i,2]],[df.iloc[i,1],df.iloc[i,3]])
plt.show()
The lines will be drawn in different colors. To get lines of the same color, one can add the option c='k' or whatever color one wants.
plt.plot([df.iloc[i,0],df.iloc[i,2]],[df.iloc[i,1],df.iloc[i,3]], c='k')
I generally don't use the pandas plotting because I think it is rather limited. If using matplotlib is not an issue, the following code works:
from matplotlib import pyplot as plt
plt.plot(df.p1_x,df.p1_y)
plt.plot(df.p2_x,df.p2_y)
plt.show()
If you have lots of lines to plot, you can use a for loop.
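A loop-free sketch that reproduces the hardcoded plt.plot call from the question's edit is below. It assumes the x columns end in _x and the y columns end in _y, and that both appear in the same point order. Note that plotting df.p1_x against df.p1_y connects the first points of all rows, whereas this version connects the points within each row.

```python
import pandas as pd
import matplotlib.pyplot as plt

line1 = {'p1_x': 1, 'p1_y': 10, 'p2_x': 2, 'p2_y': 11}
line2 = {'p1_x': 2, 'p1_y': 9, 'p2_x': 3, 'p2_y': 12}
df = pd.DataFrame([line1, line2])

# Select x and y columns by suffix instead of by position
x_cols = [c for c in df.columns if c.endswith('_x')]
y_cols = [c for c in df.columns if c.endswith('_y')]

fig, ax = plt.subplots()
# matplotlib draws one line per column of a 2D array, so transpose:
# after .T each column holds the points of one dataframe row.
lines = ax.plot(df[x_cols].to_numpy().T, df[y_cols].to_numpy().T)
```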
I am trying to fill between two lines using Bokeh. My two datasets contain sections of NaNs. The patch renders correctly for the last section of data, but fails for sections before any NaN blocks. The following example illustrates the problem:
from bokeh.plotting import figure, output_file, show
import numpy as np
p = figure(plot_width=400, plot_height=300)
mx = np.array(np.random.randint(20, 25, 30), dtype=float)
mx[7:11] = np.nan
mx[19:23] = np.nan
mn = mx-10
x = np.arange(0, len(mn))
wX = np.append(x, x[::-1])
wY = np.append(mx, mn[::-1])
p.patch(wX, wY)
show(p)
This produces the following figure:
I would like the first two parallel line sections to plot with a fill-between as the final section is plotting. Instead, these sections seem to be applying the patch just to the line segments themselves. I have a solution that creates individual patches by looping over each contiguous section of data, but it is too slow over many 100s of patches.
As far as I can tell, Bokeh renders the patches you provide correctly. Keep in mind that NaNs separate individual patches. That makes it a bit strange to pass multiple consecutive NaNs, which doesn't add anything. It's also a bit confusing to specify valid X coordinates together with NaN Y coordinates.
Just as with line() and multi_line(), NaN values can be passed to patch() and patches() glyphs. In this case, you end up with single logical patch objects, that have multiple disjoint components when rendered
http://docs.bokeh.org/en/latest/docs/user_guide/plotting.html
I have added the x,y coordinates to the first patch I get when running your code. Perhaps you intend something different, but Bokeh is correctly rendering what you specify.
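If the goal is a filled band for every NaN-free section, one possible sketch is to build a closed polygon per contiguous run with numpy first. The helper name band_polygons is made up, and it assumes x, upper and lower are 1D arrays of equal length.

```python
import numpy as np


def band_polygons(x, upper, lower):
    """Return lists of polygon x and y coordinates, one per NaN-free run."""
    valid = ~(np.isnan(upper) | np.isnan(lower))
    # Pad with False so runs touching either end still get a start and a stop
    padded = np.concatenate(([False], valid, [False]))
    edges = np.flatnonzero(padded[1:] != padded[:-1])
    starts, stops = edges[::2], edges[1::2]  # stop indices are exclusive
    xs, ys = [], []
    for a, b in zip(starts, stops):
        # Walk along the upper line, then back along the lower line
        xs.append(np.concatenate((x[a:b], x[a:b][::-1])))
        ys.append(np.concatenate((upper[a:b], lower[a:b][::-1])))
    return xs, ys
```

The resulting lists could then be handed to a single p.patches(xs, ys) call (the plural glyph method), which may avoid the cost of creating one glyph per section. I have not timed this against the loop described in the question.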
I'm writing a Python function to produce boxplots of data using matplotlib. I require this function to dynamically determine the number of figures and subplots based on the number of columns in the data. To make the figures readable I want the maximum number of subplots to be 4 by 4. If there are more than 16, I want the function to fill as many 4 by 4 figures as necessary, then partially fill a final figure with the remainder.
For example, the data I currently have has 43 columns. I therefore want this function to produce 2 full figures containing 16 subplots each and one partially filled figure containing the remaining subplots. My problem is that I am stuck with the logic of writing such a function.
My code:
import matplotlib.pyplot as plt
def boxplot_data(self,parameters_file):
    data = read_csv(parameters_file)
    header = data.keys()
    number_of_full_subplots = len(header)/16
    remainder = len(header)-(16*number_of_full_subplots)
    for i in range(0,number_of_full_subplots):
        plt.figure(i)
        for j in range(0,16):
            plt.subplot(4,4,j)
            boxplot(data[header[0:16]])
My plan was to iterate over the 'full subplots' first, then iterate over the remainder. However, this method produces two identical figures.
Does anybody have any suggestions?
Thanks
The figures are identical because this line
boxplot(data[header[0:16]])
does not change when i and j are incremented.
Replace it with
boxplot(data[header[16*i+j]])
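import matplotlib.pyplot as plt
from pandas import read_csv

def boxplot_data(self, parameters_file):
    data = read_csv(parameters_file)
    header = data.keys()
    number_of_full_subplots = len(header) // 16  # integer division, so range() accepts it
    for i in range(number_of_full_subplots):
        plt.figure(i)
        for j in range(16):
            plt.subplot(4, 4, j + 1)  # subplot indices start at 1, not 0
            plt.boxplot(data[header[16*i+j]])  # a different column for every i and j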
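The code above only covers the full figures. One way to also get the partially filled last figure is to step through the columns in chunks of 16, so the remainder needs no separate branch. This is a sketch under a few assumptions: the data is already loaded into a pandas DataFrame, and the name boxplot_pages and its rows/cols parameters are made up.

```python
import matplotlib.pyplot as plt
import pandas as pd


def boxplot_pages(data, rows=4, cols=4):
    """Draw one boxplot per column, at most rows * cols subplots per figure."""
    per_page = rows * cols
    columns = list(data.columns)
    figures = []
    # start takes the values 0, 16, 32, ...; the last slice is simply shorter
    for start in range(0, len(columns), per_page):
        fig = plt.figure()
        for k, name in enumerate(columns[start:start + per_page]):
            ax = fig.add_subplot(rows, cols, k + 1)  # subplot indices are 1-based
            ax.boxplot(data[name].dropna().to_numpy())
            ax.set_title(name)
        figures.append(fig)
    return figures
```

With 43 columns this gives three figures holding 16, 16 and 11 subplots.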