Using matplotlib/plotly to make a histogram


I have to make a histogram using matplotlib and plotly, but I am stuck: there are so many options available that I can't get a proper histogram out of any of the online tutorials.
My data is a matrix with two columns and 20000 rows. I tried the commands below, but they didn't work.
Here is my code:
import numpy as np
import matplotlib.pyplot as plt
# Load the two-column data file
with open('rmsd.dat') as f:
    v = np.loadtxt(f, delimiter=' ')
plt.hist(v, bins=100)
plt.xlabel("G-r0")
plt.ylabel('# of stars')
plt.title("RMSD histogram")
plt.show()
Next, the histogram has to be horizontal and placed next to another plot that uses the same data.
I tried both matplotlib and plotly, but the result was a big mess.
That's all.

Your data has two columns, so you must indicate which column you want to plot.
import matplotlib.pyplot as plt
# data has shape (3000, 2); data[:, 0] selects the first column
plt.hist(data[:, 0], bins=100)
Or horizontal:
plt.hist(data[:,0],bins=100,orientation='horizontal')
If I just use plt.hist(data, bins=30), each column gets its own set of bars drawn side by side, so the result looks like a simple bar plot.
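For the second part of the question (a horizontal histogram next to another plot of the same data), here is a minimal sketch. It assumes the layout wanted is a main panel with the histogram on its right, sharing the y axis; the random array is only a stand-in for the contents of rmsd.dat, and the axis labels are placeholders.

```python
import numpy as np
import matplotlib.pyplot as plt

# Stand-in for the real file (assumption: 20000 rows, 2 columns).
rng = np.random.default_rng(0)
data = rng.normal(size=(20000, 2))

# Two panels side by side that share the y axis; the left one is wider.
fig, (ax_main, ax_hist) = plt.subplots(
    1, 2, sharey=True, figsize=(8, 4),
    gridspec_kw={"width_ratios": [3, 1]},
)

# Left: column 0 against column 1, one pixel per point.
ax_main.plot(data[:, 0], data[:, 1], ",")
ax_main.set_xlabel("column 0")
ax_main.set_ylabel("column 1")

# Right: horizontal histogram of column 1, lined up with the shared y axis.
counts, edges, patches = ax_hist.hist(
    data[:, 1], bins=100, orientation="horizontal"
)
ax_hist.set_xlabel("count")

fig.tight_layout()
```

Call plt.show() afterwards to display the figure. Because the y axis is shared, zooming in one panel keeps the histogram aligned with the data.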

Related

How do you change the spread of the Y axis of pandas box plot?

I am plotting 100 data points for 9 different groups. One group's data points are much larger than those of all the other groups, so when I make a box plot using pandas only that group is visible, while all the other groups are squashed to the bottom. Here is what it looks like now (image: smushed box plot).
I would like the Y axis to be more spread out so that I can see the other groups' box plots. Here is similar data in a scatter plot that has the spacing I am looking for (image: well spaced scatter plot).
What I have
What I need
Here is my code at the moment:
import pandas as pd
import matplotlib.pyplot as plt
df = pd.read_csv("residues.csv")
df.plot.box()
plt.show()

It looks like you want y to be log-scaled:
df.plot.box(logy=True)
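To see the effect without the original file, here is a small sketch. The DataFrame is a made-up stand-in for residues.csv (9 groups of 100 positive values, one of them about 1000 times larger); the column names are hypothetical. Note that a log scale only makes sense for positive values.

```python
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt

rng = np.random.default_rng(1)

# Hypothetical stand-in for residues.csv: 100 rows, 9 groups.
df = pd.DataFrame(
    rng.lognormal(mean=0.0, sigma=0.5, size=(100, 9)),
    columns=[f"group_{i}" for i in range(9)],
)
df["group_8"] *= 1000  # one group is much larger than the rest

# logy=True puts the y axis on a log scale, so all boxes stay readable.
ax = df.plot.box(logy=True)
ax.set_ylabel("value (log scale)")
```

Call plt.show() to display it. With the default linear axis, the eight small groups would collapse onto the bottom of the plot.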

Try this:
boxplot = df.boxplot(column=df.columns)
plt.show()
Reference
See the pandas documentation on boxplot: https://pandas.pydata.org/docs/reference/api/pandas.DataFrame.boxplot.html

A problem of python plot for large number of data

I am new to Python and am trying to plot a color-magnitude diagram (CMD) for a selected cluster with matplotlib. There are 3400000 stars to plot; for each star, the color goes on the x axis and the magnitude on the y axis. My code reads two columns from a CSV file and plots them. When I use part of the data (3000 stars), I can plot a CMD successfully, but when I use all the data the plot is a mess (see figure below), and it seems that points are plotted by their position in the column instead of by their value. For example, a point with data (0.92, 20.64) should be close to the y axis, but it is actually located at the far right of the plot just because it sits near the end of the dataset. So I want to know how I can plot the entire dataset and get a plot like the first figure. Thanks for your time. This is my code:
import matplotlib.pyplot as plt
import pandas as pd
data = pd.read_csv(r'C:\Users\Peter\Desktop\F275W test.csv', low_memory=False)
# Select the color (x) and magnitude (y) columns
x = data['F275W-F336W']
y = data['F275W']
# Hide the axes
plt.axis('off')
plt.plot(x,y, ',')
plt.show()
This is the plot I got for 3000 stars; it is a CMD.
This is the plot I got for the entire dataset, which is a mess.
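One possible cause, which the post does not confirm: if the full file contains even a few non-numeric entries, pandas reads the columns as strings, and matplotlib then places each point by order of appearance (a categorical axis) rather than by value. A minimal sketch of checking for that and coercing to numbers, using a tiny made-up table in place of the real CSV:

```python
import pandas as pd
import matplotlib.pyplot as plt

# Made-up stand-in for the CSV: one bad entry, so the columns hold strings.
data = pd.DataFrame({
    "F275W-F336W": ["0.92", "1.10", "bad", "0.35"],
    "F275W": ["20.64", "19.80", "21.00", "18.20"],
})
print(data.dtypes)  # object dtype here means the values are not numeric

# Convert to numbers; entries that cannot be parsed become NaN and are dropped.
cols = ["F275W-F336W", "F275W"]
clean = data[cols].apply(pd.to_numeric, errors="coerce").dropna()

fig, ax = plt.subplots()
ax.plot(clean["F275W-F336W"], clean["F275W"], ",")
```

If data.dtypes already reports float64 for both columns on the real file, this is not the problem and the cause lies elsewhere.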

How to plot certain row and column using panda dataframe?

I have a very simple data frame, but I could not plot a line using a row and a column. I would like to plot a "line" that connects the values (the original post showed them in an image).
I tried to plot it, but the x-axis disappeared. I would also like to swap the axes. I could not find an easy way to plot this simple thing.

Try:
import matplotlib.pyplot as plt
# Categories will be the x axis, Seconds will be the y axis
plt.plot(data["Categories"], data["Seconds"])
plt.show()
Matplotlib generates the axis dynamically, so if you want the labels of the x-axis to appear you'll have to increase the size of your plot.
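A minimal sketch of the same idea with an explicit figure size. The column names come from the snippet above; the values are invented, since the original table was only shown as an image.

```python
import pandas as pd
import matplotlib.pyplot as plt

# Invented values; only the column names are taken from the answer above.
data = pd.DataFrame({
    "Categories": ["A", "B", "C", "D"],
    "Seconds": [12.5, 7.0, 9.3, 15.1],
})

# A wider figure leaves room for the x-axis tick labels.
fig, ax = plt.subplots(figsize=(8, 4))
line, = ax.plot(data["Categories"], data["Seconds"], marker="o")
ax.set_xlabel("Categories")
ax.set_ylabel("Seconds")
fig.tight_layout()
```

To swap the axes, pass the two columns in the opposite order and swap the labels. Call plt.show() to display the result.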

Violin Plot with python

I want to create 10 violin plots within one diagram. I looked at many examples like this one: Violin plot matplotlib, which shows what I would like to have at the end.
But I did not know how to adapt it to a real data set. They all just generate some random data that is normally distributed.
I have data in the form D[10,730], and I tried to adapt the example from the link above:
example:
axes[0].violinplot(all_data,showmeans=False,showmedians=True)
my code:
axes[0].violinplot(D,showmeans=False,showmedians=True)
It does not work.
It should draw 10 violin plots side by side (one per entry along the first dimension of D).
So what does my data need to look like to get the same type of violin plot?

You just need to transpose your data array D.
axes[0].violinplot(D.T,showmeans=False,showmedians=True)
This looks like a small inconsistency in matplotlib: a list of 1D arrays is treated as one dataset per list element, whereas a 2D array is treated as one dataset per column.
import numpy as np
import matplotlib.pyplot as plt
n_datasets = 10
n_samples = 730
data = np.random.randn(n_datasets,n_samples)
fig, axes = plt.subplots(1,3)
# http://matplotlib.org/examples/statistics/boxplot_vs_violin_demo.html
axes[0].violinplot([d for d in data])
# should be equivalent to:
axes[1].violinplot(data)
# is actually equivalent to
axes[2].violinplot(data.T)
You should file a bug report.
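A quick way to check that the transpose gives one violin per row of D is to count the bodies that violinplot returns. This is a sketch with random data standing in for the real D[10,730] array:

```python
import numpy as np
import matplotlib.pyplot as plt

rng = np.random.default_rng(2)
D = rng.normal(size=(10, 730))  # stand-in for the real data: 10 datasets of 730 samples

fig, ax = plt.subplots()
# Each column of the array passed in becomes one violin, so pass D.T.
parts = ax.violinplot(D.T, showmeans=False, showmedians=True)

# violinplot returns a dict; "bodies" holds one filled shape per violin.
n_violins = len(parts["bodies"])
print(n_violins)
```

Passing D without the transpose would instead produce 730 violins of 10 samples each.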

Pyplot colormap line by line

I'm getting started with plotting in Python using the very nice pyplot. I want to show the evolution of two data series over time. Instead of the usual plot of data as a function of time, I'd like a scatter plot (data1, data2) where the time component is shown as a color gradient.
In my two-column file, time is given by the line number. It could either be written as a third column in the file, or pyplot could work out the line number on its own, if it has that capability.
Can anyone help me do that?
Thanks a lot.
Nicolas

When plotting with matplotlib.pyplot.scatter you can pass a third array via the keyword argument c. The values in this array determine the colors of the scatter points. You then also pick an appropriate colormap from matplotlib.cm and assign it with the cmap keyword argument.
This toy example creates two datasets, data1 and data2. It then also creates an array colors, an array of continuous values equally spaced between 0 and 1, with the same length as data1 and data2. It doesn't need to know the "line number"; it just needs to know the total number of data points, and then spaces the colors equally.
I've also added a colorbar. You can remove this by removing the plt.colorbar() line.
import matplotlib.pyplot as plt
from matplotlib import cm
import numpy as np
N = 500
data1 = np.random.randn(N)
data2 = np.random.randn(N)
colors = np.linspace(0,1,N)
plt.scatter(data1, data2, c=colors, cmap=cm.Blues)
plt.colorbar()
plt.show()
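If the data comes from a two-column file, the line number itself can serve as the color array. Here is a sketch under that assumption: the file contents are generated in memory (io.StringIO stands in for a real file path), and the "viridis" colormap and colorbar label are just one possible choice.

```python
import io
import numpy as np
import matplotlib.pyplot as plt

# Made-up two-column file contents; replace the StringIO with a real path.
text = "\n".join(
    f"{np.sin(i / 10):.4f} {np.cos(i / 7):.4f}" for i in range(200)
)
data = np.loadtxt(io.StringIO(text))

# The row index (0, 1, 2, ...) plays the role of time.
line_number = np.arange(len(data))

fig, ax = plt.subplots()
sc = ax.scatter(data[:, 0], data[:, 1], c=line_number, cmap="viridis")
cbar = fig.colorbar(sc, ax=ax, label="line number")
```

Call plt.show() to display it. The colorbar then reads directly in line numbers rather than in values between 0 and 1.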
