A problem with a Python plot for a large amount of data

I am new to Python and am trying to plot a color-magnitude diagram (CMD) for a selected cluster with matplotlib. There are 3,400,000 stars to plot; for each star the color goes on the x axis and the magnitude on the y axis, and my code reads these two columns from a CSV file. When I use part of the data (3,000 stars), I can plot the CMD successfully, but when I use all the data the plot is a mess (see the figure below). It seems that the points are plotted by their position in the column instead of by their value. For example, a point with data (0.92, 20.64) should be close to the y axis, but it actually lands at the far right of the plot, just because it sits in the last few rows of the dataset. How can I plot the entire dataset and get a plot like the first figure? Thanks for your time. This is my code:
import matplotlib.pyplot as plt
import pandas as pd
data = pd.read_csv(r'C:\Users\Peter\Desktop\F275W test.csv', low_memory=False)
# Select the color (x) and magnitude (y) columns
x = data['F275W-F336W']
y = data['F275W']
# Remove the axes
plt.axis('off')
# ',' draws each star as a single pixel
plt.plot(x, y, ',')
plt.show()
This is the plot I got for 3,000 stars, which looks like a proper CMD:
This is the plot I got for the entire dataset, which is a mess:
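A possible cause, offered as an assumption since the CSV itself is not available: if even one entry in a column is not a number (a repeated header line, a blank, or a placeholder such as `--`), pandas reads that whole column as text instead of floats. Matplotlib then treats the text values as categories and places them in order of first appearance rather than by value, which matches the symptom of points landing according to their position in the file. The `low_memory=False` argument in the question suggests pandas may have warned about mixed types (its `DtypeWarning`). The sketch below uses a tiny made-up CSV with the question's column names; `pd.to_numeric(..., errors="coerce")` turns unparseable entries into NaN so they can be dropped before plotting.

```python
import io

import matplotlib
matplotlib.use("Agg")  # headless backend for this sketch; drop it and call plt.show() interactively
import matplotlib.pyplot as plt
import pandas as pd

# Made-up stand-in for the real file: the "--" entry makes pandas read the color column as text
csv_text = """F275W-F336W,F275W
0.92,20.64
1.10,19.80
--,21.00
0.45,18.20
"""
data = pd.read_csv(io.StringIO(csv_text))
raw_dtype = data["F275W-F336W"].dtype  # not float64: the column was read as text

# Convert both columns to floats; anything unparseable becomes NaN and is dropped
cols = ["F275W-F336W", "F275W"]
for col in cols:
    data[col] = pd.to_numeric(data[col], errors="coerce")
data = data.dropna(subset=cols)

fig, ax = plt.subplots()
# ',' draws one pixel per star, which keeps millions of points cheap to render
ax.plot(data["F275W-F336W"], data["F275W"], ",")
```

On the real file, checking `data.dtypes` right after `read_csv` is a quick way to see whether either column came back as text.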

Related

How do you change the spread of the Y axis of pandas box plot?

I am plotting 100 data points for 9 different groups. One group's values are much larger than all the other groups', so when I make a box plot with pandas only that group is visible, while all the other groups are squashed at the bottom. Here is what it looks like now: (figure: squashed box plot)
I would like the y axis to be more spread out so that I can see the other groups' boxes. Here is similar data in a scatter plot that has the spacing I am looking for: (figure: well-spaced scatter plot)
What I have
What I need
Here is my code at the moment:
import pandas as pd
import matplotlib.pyplot as plt
df = pd.read_csv("residues.csv")
df.plot.box()
plt.show()

It looks like you want y to be log-scaled:
df.plot.box(logy=True)
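A self-contained sketch of the log-scaled box plot. Since `residues.csv` is not available, it builds a hypothetical frame of 9 groups with 100 positive values each, one group about 1000 times larger than the rest, and checks that the y axis really ends up on a log scale.

```python
import matplotlib
matplotlib.use("Agg")  # headless backend for this sketch; drop it and call plt.show() interactively
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd

# Hypothetical stand-in for residues.csv: 9 groups of 100 positive values,
# with the last group roughly 1000 times larger than the others
rng = np.random.default_rng(0)
df = pd.DataFrame({f"group{i}": rng.uniform(1, 10, size=100) for i in range(9)})
df["group8"] *= 1000

# logy=True puts the y axis on a log scale, so the small groups are no longer squashed
ax = df.plot.box(logy=True)
```

A log scale only works for positive values. If the residues can be zero or negative, pandas also accepts `logy="sym"`, which uses a symmetric log scale instead.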

Try this:
boxplot = df.boxplot(column=df.columns)
plt.show()
Reference
See the pandas documentation on boxplot: https://pandas.pydata.org/docs/reference/api/pandas.DataFrame.boxplot.html

Smoothing the curve in a line plot - Values interval x axis

I'm trying to recreate the following plot:
Using an online tool, I created the dataset (135 data points), which I saved in a CSV file with the following structure:
Year,Number of titles available
1959,1.57480315
1959,1.57480315
1959,1.57480315
...
1971,221.4273356
1971,215.2494175
1971,211.5426666
I created a Python file with the following code:
import pandas as pd
import matplotlib.pyplot as plt
df = pd.read_csv('file.csv')
df.plot.line(x='Year', y='Number of titles available')
plt.show()
and I'm getting the following plot:
What can I do to get a smooth line like in the original plot?
How can I get the same x-axis values as in the original plot?
EDIT: I reworked the data set and formatted the dates properly; the plot is now better.
This is how the data set looks now:
Date,Number of available titles
1958/07/31,2.908816952
1958/09/16,3.085527674
1958/11/02,4.322502727
1958/12/19,5.382767059
...
1971/04/13,221.6766907
1971/05/30,215.4918154
1971/06/26,211.7808903
This is the plot I can get with the same code posted above:
The question now is: how can I have the same date range as in the original plot (1958 - mid 1971)?

Try taking the mean of your values grouped by year. This averages out the jumps you get within each year. If that does not help, apply one of the many available smoothing filters (for example, a rolling mean).
df.groupby('Year').mean().plot(kind='line')
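A sketch of the filter route combined with a fixed date range, assuming the EDIT's file format (`Date` and `Number of available titles` columns). It uses only the seven rows shown in the question, a centred rolling mean over 3 consecutive rows (the window size is an arbitrary assumption to tune), and `set_xlim` to pin the x axis to the start of 1958 through mid-1971.

```python
import io

import matplotlib
matplotlib.use("Agg")  # headless backend for this sketch; drop it and call plt.show() interactively
import matplotlib.pyplot as plt
import pandas as pd

pd.plotting.register_matplotlib_converters()  # let matplotlib accept pandas Timestamps

# The rows shown in the question's EDIT; the real file has 135 rows
csv_text = """Date,Number of available titles
1958/07/31,2.908816952
1958/09/16,3.085527674
1958/11/02,4.322502727
1958/12/19,5.382767059
1971/04/13,221.6766907
1971/05/30,215.4918154
1971/06/26,211.7808903
"""
df = pd.read_csv(io.StringIO(csv_text), parse_dates=["Date"], index_col="Date")
series = df["Number of available titles"]

# Centred rolling mean over 3 consecutive rows; min_periods=1 keeps the end points
smoothed = series.rolling(window=3, center=True, min_periods=1).mean()

fig, ax = plt.subplots()
ax.plot(smoothed.index, smoothed.to_numpy())
# Pin the x range to the original plot's span (start of 1958 to mid-1971)
ax.set_xlim(pd.Timestamp("1958-01-01"), pd.Timestamp("1971-07-01"))
```

Because the rolling window counts rows rather than days, the amount of smoothing depends on how evenly the dates are spaced; with a `DatetimeIndex`, pandas also accepts a time-based window string such as `"90D"` instead of an integer.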

How to plot a certain row and column using a pandas DataFrame?

I have a very simple data frame, but I could not plot a line using a row and a column. Here is an image of it; I would like to plot a line that connects the points.
I tried to plot it, but the x axis disappeared. I would also like to swap the axes. I could not find an easy way to plot this simple thing.

Try:
import matplotlib.pyplot as plt
# Categories will be on the x axis, Seconds on the y axis
plt.plot(data["Categories"], data["Seconds"])
plt.show()
Matplotlib lays out the axes automatically, so if the x-axis labels do not appear, you may have to increase the size of your figure.
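A self-contained version of this answer. The `Categories` and `Seconds` values below are hypothetical stand-ins for the question's frame; the figure is made wider with `figsize` so the category labels have room, and `tight_layout` keeps the labels inside the figure.

```python
import matplotlib
matplotlib.use("Agg")  # headless backend for this sketch; drop it and call plt.show() interactively
import matplotlib.pyplot as plt
import pandas as pd

# Hypothetical stand-in for the question's data frame
data = pd.DataFrame({
    "Categories": ["Loading", "Parsing", "Sorting", "Writing"],
    "Seconds": [1.2, 3.4, 0.8, 2.1],
})

# A wider figure leaves room for the category labels on the x axis
fig, ax = plt.subplots(figsize=(8, 4))
ax.plot(data["Categories"], data["Seconds"], marker="o")  # line connecting the points
ax.set_xlabel("Categories")
ax.set_ylabel("Seconds")
fig.tight_layout()
```

Swapping the axes only means swapping the arguments: `ax.plot(data["Seconds"], data["Categories"])` puts the categories on the y axis.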

Plotting points extracted from a .txt file in python

I am trying to create a plot from points extracted from a .txt file. The points are separated by tabs only. There are too many points to fit in a single column, so they have been spread over 3 columns. However, when I plot them with matplotlib, I suspect I am not seeing all the numbers: the data may be plotted only from the first column, ignoring the other two.
Here is the sample example of such data file: https://www.dropbox.com/s/th6uwrk2xdnmhyi/n1l2m2.txt?dl=0
Here is the simple code I am using to plot:
import matplotlib.pyplot as plt
%matplotlib inline
import numpy
exp_sum = '/home/trina/Downloads/n1l2m2.txt'
# unpack=True transposes the result, so a has one row per column of the file
a = numpy.loadtxt(exp_sum, unpack=True)
plt.plot(a)
plt.show()
and here is the output image:
I would like to know whether this plot covers all the points in my data file. Any suggestions are much appreciated.

By calling plt.plot(a), you are passing a 2-D array with 3 rows (one per column of your file, because of unpack=True) to a single 2-D plot.
From the matplotlib docs for plot:
If x and/or y is 2-dimensional, then the corresponding columns will be plotted.
So, your graph output is:
column 0 values at x = 0
column 1 values at x = 1
column 2 values at x = 2
Adding the following to the code:
for i in range(len(a)):
    print('a' + str(i), max(a[i]), min(a[i]))
Outputs the following:
stats max min
a0 0.9999 0.0
a1 0.9856736 0.3736717
a2 -0.003469009 -0.08896232
Using the mouseover position readout with matplotlib, this looks correct.
As a general point about graphs, I'd recommend histograms, box plots or violin plots if you want to visualise the frequency (and other statistics) of data sets. See the matplotlib examples for histograms, box plots and violin plots.
Edit: judging from the shading on your graph, it does seem to contain all the points. Each of your data columns has a long tail when plotted on its own, and those long tails match the shading in the graph you have.
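A sketch that makes the answer's point concrete. Since the Dropbox file is not reproduced here, it writes synthetic tab-separated data with 3 columns and 500 rows (both sizes are assumptions), reads it back with `loadtxt(..., unpack=True)`, and compares what `plot(a)` draws with plotting one line per file column.

```python
import io

import matplotlib
matplotlib.use("Agg")  # headless backend for this sketch; drop it and call plt.show() interactively
import matplotlib.pyplot as plt
import numpy as np

# Synthetic stand-in for n1l2m2.txt: 500 rows, 3 tab-separated columns
rng = np.random.default_rng(1)
table = np.column_stack([rng.random(500), rng.random(500), -rng.random(500) / 10])
buf = io.StringIO()
np.savetxt(buf, table, delimiter="\t")
buf.seek(0)

a = np.loadtxt(buf, unpack=True)  # unpack=True -> shape (3, 500), one row per file column

# What plt.plot(a) does: one line per column of a, i.e. 500 lines of 3 points at x = 0, 1, 2
fig_all, ax_all = plt.subplots()
per_row_lines = ax_all.plot(a)

# One line per file column instead, with the row number on the x axis
fig, ax = plt.subplots()
for i, column in enumerate(a):
    ax.plot(column, label=f"column {i}")
ax.legend()
```

If the three columns are really one long series wrapped across columns, `np.concatenate(a)` joins them as all of column 0, then column 1, then column 2; whether that is the right order depends on how the file was written.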

Using matplotlib/plotly to make a histogram

I have to make a histogram using matplotlib and plotly, but I am stuck: there are so many options available that I can't get a proper histogram by following the online tutorials.
My data is a matrix of two columns and 20000 rows. I use the commands below, but they didn't work.
here is my code:
import matplotlib.pyplot as plt
import numpy as np
with open('rmsd.dat') as f:
    v = np.loadtxt(f, delimiter=' ')
plt.hist(v, bins=100)
plt.xlabel("G-r0")
plt.ylabel('# of stars')
plt.title("RMSD histogram")
plt.show()
Next, the histogram has to be horizontal and placed next to another plot of the same data.
I tried to use matplotlib and plotly, but the result was a big mess.
That's all.

Your data has two columns, so you must indicate which column you want to plot.
import matplotlib.pyplot as plt
# data has shape (n_rows, 2); data[:, 0] selects the first column
plt.hist(data[:, 0], bins=100)
(figure: Example 1)
Or horizontal:
plt.hist(data[:,0],bins=100,orientation='horizontal')
If you just use plt.hist(data, bins=30), it looks like a simple bar plot, because each of the two columns gets its own set of bars side by side.
(figure: Example 2)
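For the second requirement (a horizontal histogram next to another plot of the same data), one matplotlib-only sketch is two side-by-side axes sharing the y axis. The data below is synthetic (20000 rows, 2 columns, as in the question), and choosing a scatter of the two columns as the "other plot" is an assumption.

```python
import matplotlib
matplotlib.use("Agg")  # headless backend for this sketch; drop it and call plt.show() interactively
import matplotlib.pyplot as plt
import numpy as np

# Synthetic stand-in for rmsd.dat: 20000 rows, 2 columns
rng = np.random.default_rng(2)
data = np.column_stack([rng.normal(0, 1, 20000), rng.normal(5, 2, 20000)])

# Scatter on the left, horizontal histogram of column 1 on the right;
# sharey=True keeps the histogram bins aligned with the scatter's y axis
fig, (ax_main, ax_hist) = plt.subplots(
    1, 2, sharey=True, figsize=(8, 4), gridspec_kw={"width_ratios": [3, 1]}
)
ax_main.scatter(data[:, 0], data[:, 1], s=1)
counts, bins, patches = ax_hist.hist(data[:, 1], bins=100, orientation="horizontal")
ax_hist.set_xlabel("count")
```

Sharing the y axis is the design choice that matters here: zooming or panning the scatter moves the histogram with it.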

We Keep Coding
