pandas plot line segments for each row - python

I have dataframes with columns containing x,y coordinates for multiple points. One row can consist of several points.
I'm trying to find out an easy way to be able to plot lines between each point generating a curve for each row of data.
Here is a simplified example where two lines are represented by two points each.
line1 = {'p1_x':1, 'p1_y':10, 'p2_x':2, 'p2_y':11 }
line2 = {'p1_x':2, 'p1_y':9, 'p2_x':3, 'p2_y':12 }
df = pd.DataFrame([line1,line2])
df.plot(y=['p1_y','p2_y'], x=['p1_x','p2_x'])
When I plot them, I expect line 1 to start at x=1 and line 2 to start at x=2.
Instead, the x axis is labelled with the two value pairs (1,2) and (2,3), and both lines share the same start and end positions on the x axis.
How do I get around this problem?
Edit:
If using matplotlib, the following hardcoded values generate the plot I'm interested in:
plt.plot([[1,2],[2,3]],[[10,9],[11,12]])

While I'm sure there is a more succinct way using pure pandas, here's a simple approach using matplotlib and two frames derived from the original df. (I hope I understood the question correctly.)
Assumption: in df, the x values are in the even-numbered columns (positions 0, 2, ...) and the y values are in the odd-numbered columns (positions 1, 3, ...).
Obtain x values
x = df.loc[:, df.columns[::2]]
x
   p1_x  p2_x
0     1     2
1     2     3
Obtain y values
y = df.loc[:, df.columns[1::2]]
y
   p1_y  p2_y
0    10    11
1     9    12
Then plot using a for loop
for i in range(len(df)):
    plt.plot(x.iloc[i,:], y.iloc[i,:])
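For what it's worth, the loop can probably be avoided. plt.plot draws one line per column when it is given 2-D inputs, so transposing the x and y arrays should turn each row of df into its own line. This is only a sketch: it assumes the columns follow the p<k>_x / p<k>_y naming from the question, and the regex-based column selection is my own choice, not something from the answer above.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend so the sketch runs without a display; remove to use plt.show()
import matplotlib.pyplot as plt
import pandas as pd

line1 = {'p1_x': 1, 'p1_y': 10, 'p2_x': 2, 'p2_y': 11}
line2 = {'p1_x': 2, 'p1_y': 9, 'p2_x': 3, 'p2_y': 12}
df = pd.DataFrame([line1, line2])

# Select coordinates by column-name suffix (assumes the p<k>_x / p<k>_y naming).
x = df.filter(regex='_x$').to_numpy()
y = df.filter(regex='_y$').to_numpy()

# plt.plot treats each *column* of a 2-D input as one line,
# so transpose to make each DataFrame row a line.
lines = plt.plot(x.T, y.T)
```

Each element of lines is then one row of the frame; call plt.show() or plt.savefig(...) afterwards to see the result.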

One does not need to create additional data frames. One can loop through the rows to plot these lines:
line1 = {'p1_x':1, 'p1_y':10, 'p2_x':2, 'p2_y':11 }
line2 = {'p1_x':2, 'p1_y':9, 'p2_x':3, 'p2_y':12 }
df = pd.DataFrame([line1,line2])
for i in range(len(df)):  # for each row:
    # plt.plot([list of Xs], [list of Ys])
    plt.plot([df.iloc[i,0],df.iloc[i,2]],[df.iloc[i,1],df.iloc[i,3]])
plt.show()
The lines will be drawn in different colors. To draw them all in the same color, pass c='k' (or whatever color you want) inside the loop:
plt.plot([df.iloc[i,0],df.iloc[i,2]],[df.iloc[i,1],df.iloc[i,3]], c='k')
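If there are many rows, a LineCollection might be a reasonable alternative to calling plt.plot once per row. The sketch below builds one (n_points, 2) vertex array per row and adds them all at once. It assumes the same p<k>_x / p<k>_y column naming as the question; the variable names are mine.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend so the sketch runs without a display; remove to use plt.show()
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd
from matplotlib.collections import LineCollection

df = pd.DataFrame([
    {'p1_x': 1, 'p1_y': 10, 'p2_x': 2, 'p2_y': 11},
    {'p1_x': 2, 'p1_y': 9, 'p2_x': 3, 'p2_y': 12},
])

x = df.filter(regex='_x$').to_numpy()
y = df.filter(regex='_y$').to_numpy()

# segments[i] holds the (x, y) vertices of row i, shape (n_rows, n_points, 2)
segments = np.stack([x, y], axis=-1)

fig, ax = plt.subplots()
ax.add_collection(LineCollection(segments, colors='k'))
ax.autoscale()  # rescale the axes so the whole collection is visible
```

All lines share the color given in colors; passing a list of colors instead should color them individually.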

I generally don't use the pandas plotting because I think it is rather limited. If using matplotlib is not an issue, the following code works:
from matplotlib import pyplot as plt
plt.plot(df.p1_x,df.p1_y)
plt.plot(df.p2_x,df.p2_y)
plt.show()
If you have lots of lines to plot, you can use a for loop.

Related

Why is Shapely plotting two lines when I change a coordinate?

I'm trying to understand how Shapely works.
I can draw a simple line with the following code:
import matplotlib.pyplot as plt
from shapely.geometry import Point, LineString
A = Point(0,0)
B = Point(1,1)
AB = LineString([A,B])
plt.plot(AB)
However when I alter the coordinates:
A = Point(1,0)
B = Point(3,4)
AB = LineString([A,B])
plt.plot(AB)
Shapely decides to plot two lines, which is behaviour I don't understand.
Using Shapely 1.7.0
You are using plt.plot() incorrectly.
plt.plot() plots y versus x as lines and/or markers.
In the docs, you can see that since the call plot(AB) has only 1 argument, AB is being passed as the Y values.
The X value, in this case, is the index of the elements in the array of Y values.
It is the same as calling plt.plot([(1,0),(3,4)]). Since you have 2 tuples of Y values, you will get 2 different lines: [(0,1),(1,3)] and [(0,0),(1,4)]. (Notice the x values are 0 and 1, the index of the corresponding tuple of Y value.)
You can see in the screenshot of the output that two lines are also plotted in the first case. With those specific values, however, plt.plot([(0,0),(1,1)]) draws the same line twice.
If you just want to graph a line from point A to point B, you can use:
A = Point(1,0)
B = Point(3,4)
AB = LineString([A,B])
plt.plot(*AB.xy)
plt.show()
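The single-argument behaviour described above can be checked without Shapely at all. This small sketch passes the same two coordinate tuples to plot and inspects the Line2D objects it returns, then plots the intended segment with explicit x and y lists. The variable names are only for illustration.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend so the sketch runs without a display; remove to use plt.show()
import matplotlib.pyplot as plt

fig, ax = plt.subplots()

# One 2-D argument: every column becomes a separate set of y values,
# and x defaults to the row index (0, 1).
lines = ax.plot([(1, 0), (3, 4)])
y_values = [list(line.get_ydata()) for line in lines]
x_values = [list(line.get_xdata()) for line in lines]

# Two arguments: a single line from (1, 0) to (3, 4).
segment = ax.plot([1, 3], [0, 4])
```

Here lines contains two Line2D objects while segment contains one, which matches the two-lines-versus-one behaviour from the question.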

How to make a 2D plot with color density as the 3rd argument in python 3

I'd like to make a plot where each point has its x and y value plus a third value expressing the color density at that point. With Mathematica I can do this by feeding it the output of the Python code below, but now I want to do it using only Python (preferably with matplotlib).
def printMath2DTableMethod():
    print('{', end="")
    for i in range(0, lines, 1):
        print('{', end="")
        for j in range(0, columns, 1):
            f = int(columns * rearrange_.rearrangeMethod(i) + rearrange_.rearrangeMethod(j))
            print('%d' % size[f], end = '')
            if (j < columns - 1):
                print(',', end='')
        if (i < lines - 1):
            print('},')
        else:
            print('}}')
The plotting should look something similar to the images of these two questions
How can I make a scatter plot colored by density in matplotlib?
How to plot a density map in python?
It should have a colorbar at the side, and the points with the highest density should be drawn on top of the other points (if they overlap).
The data that this method produces is appended to a file and looks like this:
1,2,4,5,6,2,6 x256 columns in total
3,2,4,5,1,6,4
4,2,5,6,1,7,5
x256 rows in total
The plot can be made either by using the code directly or by reading the data from the file. What I don't know is how to assign values to x (the i of the first for loop in the code above), to y (the j of the second for loop) and especially to the third argument, the one that sets the color density (size[f] in the code above), since it depends on both i and j.
I have been trying to research and solve this myself for days without much success, so any help would be highly appreciated. Thanks in advance :)
Here are examples for both plots you linked
import matplotlib.pyplot as plt
import numpy as np  # the sp.randn, sp.zeros, ... aliases are no longer available in current SciPy
# scatterplot as link 1
Data = np.random.randn(1000,3)
plt.scatter(Data[:,0],Data[:,1],c=Data[:,2],cmap='magma')
plt.colorbar()
# density matrix as link 2
Nbins = 50
M = np.zeros((Nbins+1,Nbins+1))
xinds = np.digitize(Data[:,0],np.linspace(-3,3,Nbins)) # choose limits accordingly
yinds = np.digitize(Data[:,1],np.linspace(-3,3,Nbins))
# sort ascending so that, within a bin, the highest density is written last and wins
sort_inds = np.argsort(Data[:,2])
Data = Data[sort_inds,:]
xinds = xinds[sort_inds]
yinds = yinds[sort_inds]
for i in range(Data.shape[0]):
    M[xinds[i],yinds[i]] = Data[i,2]
plt.matshow(M,cmap='magma',
            extent=(Data[:,0].min(),Data[:,0].max(),Data[:,1].max(),Data[:,1].min()),
            aspect='equal')
plt.colorbar()
plt.show()
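For the table from the question specifically, where the value at row i and column j is the density, something along these lines might work. It is a sketch under a few assumptions: the small grid below is just the sample rows from the question (the real table is 256x256), and the names grid, i_idx, j_idx and order are mine.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend so the sketch runs without a display; remove to use plt.show()
import matplotlib.pyplot as plt
import numpy as np

# Sample rows from the question; the real data has 256 rows and 256 columns.
grid = np.array([[1, 2, 4, 5, 6, 2, 6],
                 [3, 2, 4, 5, 1, 6, 4],
                 [4, 2, 5, 6, 1, 7, 5]])

# i_idx[i, j] == i and j_idx[i, j] == j, so every cell gets its own (i, j) pair.
i_idx, j_idx = np.indices(grid.shape)

# Ascending order: scatter draws points in sequence, so the largest values end up on top.
order = np.argsort(grid, axis=None)
x = i_idx.ravel()[order]
y = j_idx.ravel()[order]
c = grid.ravel()[order]

fig, ax = plt.subplots()
points = ax.scatter(x, y, c=c, cmap='magma')
fig.colorbar(points, ax=ax, label='density')
```

If the table is stored as comma-separated text, np.loadtxt(path, delimiter=',') should produce grid directly, where path is whatever file the values were appended to.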

Plotting boxplots for a groupby object

I would like to plot boxplots for several datasets based on a criterion.
Imagine a dataframe similar to the example below:
df = pd.DataFrame({'Group':[1,1,1,2,3,2,2,3,1,3],'M':np.random.rand(10),'F':np.random.rand(10)})
df = df[['Group','M','F']]
   Group         M         F
0      1  0.465636  0.537723
1      1  0.560537  0.727238
2      1  0.268154  0.648927
3      2  0.722644  0.115550
4      3  0.586346  0.042896
5      2  0.562881  0.369686
6      2  0.395236  0.672477
7      3  0.577949  0.358801
8      1  0.764069  0.642724
9      3  0.731076  0.302369
In this case, I have three groups, so I would like to make a boxplot for each group and for M and F separately having the groups on Y axis and the columns of M and F colour-coded.
This answer is very close to what I want to achieve, but I would prefer something more robust, applicable for larger dataframes with greater number of groups. I feel that groupby is the way to go, but I am not familiar with groupby objects and I am failing to even slice them.
The desired output would look something like this:
Looks like years ago, someone had the same problem, but got no answers :( Having a boxplot as a graphical representation of the describe function of groupby
My questions are:
How to implement groupby to feed the desired data into the boxplot
What is the correct syntax for the box plot if I want to control what is displayed rather than just use the default settings (I don't even know what they are, and I find the documentation rather vague)? To be specific, can I have the box cover the mean +/- standard deviation and keep the vertical line at the median value?
I think you should use the Seaborn library, which makes it easy to create this type of customized plot. In your case, I first melted your dataframe to convert it into the proper (long) format and then created the boxplot of your choice.
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns
dd=pd.melt(df,id_vars=['Group'],value_vars=['M','F'],var_name='sex')
sns.boxplot(y='Group',x='value',data=dd,orient="h",hue='sex')
The plot looks similar to your required plot.
Finally, I found a solution by slightly modifying this answer. It does not use a groupby object, so it is more tedious to prepare the data, but so far it looks like the best solution to me. Here it is:
import numpy as np
import matplotlib.pyplot as plt

# here I prepare the data (group them manually and then store in lists)
Groups=[1,2,3]
Columns=df.columns.tolist()[1:]
print(Columns)
Mgroups=[]
Fgroups=[]
for g in Groups:
    dfgc = df[df['Group']==g]
    m=dfgc['M'].dropna()
    f=dfgc['F'].dropna()
    Mgroups.append(m.tolist())
    Fgroups.append(f.tolist())
fig=plt.figure()
ax = plt.axes()
def setBoxColors(bp,cl):
    plt.setp(bp['boxes'], color=cl, linewidth=2.)
    plt.setp(bp['whiskers'], color=cl, linewidth=2.5)
    plt.setp(bp['caps'], color=cl,linewidth=2)
    plt.setp(bp['medians'], color=cl, linewidth=3.5)
# whis=(0, 100) extends the whiskers to the min and max (formerly whis='range')
bpl = plt.boxplot(Mgroups, positions=np.arange(len(Mgroups))*3.0-0.4,vert=False,whis=(0, 100), sym='', widths=0.6)
bpr = plt.boxplot(Fgroups, positions=np.arange(len(Fgroups))*3.0+0.4,vert=False,whis=(0, 100), sym='', widths=0.6)
setBoxColors(bpr, '#D7191C') # colors are from http://colorbrewer2.org/
setBoxColors(bpl, '#2C7BB6')
# draw temporary red and blue lines and use them to create a legend
plt.plot([], c='#D7191C', label='F')
plt.plot([], c='#2C7BB6', label='M')
plt.legend()
plt.yticks(range(0, len(Groups) * 3, 3), Groups)
plt.ylim(-3, len(Groups)*3)
#plt.xlim(0, 8)
plt.show()
The result looks mostly like what I wanted (as far as I have been able to find, the box always ranges from first to third quartile, so it is not possible to set it to +/- standard deviation). So I am a bit disappointed there is no one-line solution, but I am glad it is possible. However, for hundreds of groups this would not be good enough...
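On the two open points (feeding the data from groupby and a box spanning mean +/- standard deviation), one option that may be worth trying is matplotlib's lower-level Axes.bxp, which draws boxes from statistics you supply yourself. The sketch below is not the method used above: it uses a small made-up frame with only the M column, and it stores mean -/+ std in the 'q1'/'q3' slots that bxp uses for the box edges.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend so the sketch runs without a display; remove to use plt.show()
import matplotlib.pyplot as plt
import pandas as pd

# Small deterministic stand-in for the question's df (values are made up).
df = pd.DataFrame({'Group': [1, 1, 1, 2, 2, 2],
                   'M': [1.0, 2.0, 3.0, 2.0, 5.0, 8.0]})

stats = []
# Iterating over a grouped column yields (group key, Series of values) pairs.
for g, values in df.groupby('Group')['M']:
    mean, std = values.mean(), values.std()
    stats.append({
        'label': str(g),
        'med': values.median(),   # line drawn inside the box
        'q1': mean - std,         # box edges: mean -/+ one standard deviation
        'q3': mean + std,
        'whislo': values.min(),   # whiskers: full data range
        'whishi': values.max(),
        'fliers': [],             # no outlier markers
    })

fig, ax = plt.subplots()
ax.bxp(stats, showfliers=False)
```

Because the loop runs over whatever groups groupby finds, it should scale to many groups without listing them by hand. Note that for small or skewed groups mean -/+ std can fall outside the min/max range, in which case the whiskers end inside the box. Passing vert=False, as in the code above, gives horizontal boxes.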

unable to plot two columns from DataFrame after using pandas.read_csv

I'm trying to plot two columns that have been read in using pandas.read_csv, the code:-
from pandas import read_csv
from matplotlib import pyplot
data = read_csv('Stats.csv', sep=',')
#data = data.astype(float)
data.plot(x = 1, y = 2)
pyplot.show()
the csv file snippet:-
1,a4,2000,125,1.9,2.8,25.6
2,a4,7000,125,1.7,2.3,18
3,a2,7000,30,0.84,1.1,8.11
4,a2,5000,30,0.83,1.05,6.87
5,a2,4000,45,2.8,3.48,16.54
When x = 1 and y = 2, it plots the second column against the fourth, not the third as I expected.
When I try to plot the third column against the fourth (x = 2, y = 3), it plots the third against the fifth.
I'm trying to plot the third against the fourth right now. When both x and y = 2 it does plot the third column against the fourth, but the values are incorrect. What am I missing? Is read_csv changing the order of the columns?
Your input csv has no header row, which doesn't help clarity (see Murali's comment). But I think the problem stems from the nature of the column that contains a4, a2.
This column can be used for the x axis but not for the y axis (non-numeric data on an x axis appears to be just read in order). Hence the offset in the count: y skips over the non-numeric column at index 1 (all 0-indexed), but x does not.
Conducting
data.plot(x=1,y=0)
and
data.plot(x=0,y=1)
and inspecting the axis helps visualise what's going on.
Bizarrely this means you can do
data.plot(x=1,y=1)
to get what you want.
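A less surprising route may be to name the columns and plot by name instead of by position. Since the file has no header row, read_csv by default uses the first data row as column names, which can add to the confusion; header=None avoids that. In the sketch below the column names are made up for illustration, and the csv text is the snippet from the question.

```python
import io
import matplotlib
matplotlib.use("Agg")  # non-interactive backend so the sketch runs without a display; remove to use pyplot.show()
import pandas as pd

csv_text = """1,a4,2000,125,1.9,2.8,25.6
2,a4,7000,125,1.7,2.3,18
3,a2,7000,30,0.84,1.1,8.11
4,a2,5000,30,0.83,1.05,6.87
5,a2,4000,45,2.8,3.48,16.54
"""

# Hypothetical column names; the question does not say what the columns mean.
names = ['id', 'label', 'c3', 'c4', 'c5', 'c6', 'c7']

# header=None tells pandas that the first line is data, not column names.
data = pd.read_csv(io.StringIO(csv_text), header=None, names=names)

# Plot the third column against the fourth, selected by name.
ax = data.plot(x='c3', y='c4')
```

With a real file, the io.StringIO(csv_text) argument would simply be replaced by the file name, e.g. 'Stats.csv'.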

Producing boxplot figures dynamically depending on number of columns in data

I'm writing a python function to produce boxplots of data using python's matplotlib. I require this function to dynamically determine the number of figures and subplots based on the number of columns in the data. To make the figures readable I want the maximum number of subplots to be 4 by 4. If there are more than 16 I want the function to fill as many 4 by 4 figures as necessary and then partially fill a final figure with the remainder.
For example, the data I currently have has 43 columns. I therefore want this function to produce 2 full figures containing 16 subplots each and one partially filled figure containing the remaining subplots. My problem is that I am stuck with the logic of writing such a function.
My code:
import matplotlib.pyplot as plt
def boxplot_data(self,parameters_file):
    data = read_csv(parameters_file)
    header = data.keys()
    number_of_full_subplots = len(header)/16
    remainder = len(header)-(16*number_of_full_subplots)
    for i in range(0,number_of_full_subplots):
        plt.figure(i)
        for j in range(0,16):
            plt.subplot(4,4,j)
            boxplot(data[header[0:16]])
My plan was to iterate over the 'full subplots' first then iterate over the remainder however this method produces two identical figures.
Does anybody have any suggestions?
Thanks
the figures are identical because this line
boxplot(data[header[0:16]])
does not change when i and j are incremented.
Replace it with
boxplot(data[header[16*i+j]])
import matplotlib.pyplot as plt
from pandas import read_csv
def boxplot_data(self, parameters_file):
    data = read_csv(parameters_file)
    header = data.keys()
    number_of_full_subplots = len(header) // 16  # integer division, so range() accepts it
    for i in range(number_of_full_subplots):
        plt.figure(i)
        for j in range(16):
            plt.subplot(4, 4, j + 1)  # subplot indices start at 1
            plt.boxplot(data[header[16*i+j]])
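The partially filled last figure can probably be handled by splitting the column list into chunks of at most 16 and hiding the unused axes. A rough sketch follows; it assumes all columns are numeric, and the helper names (chunks, boxplot_columns) and the random stand-in data are mine, not from the question.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend so the sketch runs without a display; remove to use plt.show()
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd

def chunks(items, size):
    """Split a list into consecutive pieces of at most `size` items."""
    return [items[k:k + size] for k in range(0, len(items), size)]

def boxplot_columns(data, nrows=4, ncols=4):
    """Draw one boxplot per column, at most nrows*ncols subplots per figure."""
    figures = []
    for cols in chunks(list(data.columns), nrows * ncols):
        fig, axes = plt.subplots(nrows, ncols, figsize=(12, 12))
        axes = axes.ravel()
        for ax, col in zip(axes, cols):
            ax.boxplot(data[col].dropna())
            ax.set_title(str(col))
        # Hide the axes left over on a partially filled figure.
        for ax in axes[len(cols):]:
            ax.set_visible(False)
        figures.append(fig)
    return figures

# Stand-in data with 43 numeric columns, matching the count in the question.
demo = pd.DataFrame(np.random.rand(20, 43), columns=[f"c{k}" for k in range(43)])
figs = boxplot_columns(demo)
```

With 43 columns this should give two full figures and a third one showing the remaining 11 boxplots.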
