python: scatter plot logarithmic scale

In my code, I take the natural logarithm of two data series and plot them. I would like to relabel each x-axis tick with its antilog, i.e. e raised to the power of the tick value.
In other words, I want to graph the logarithms of both series but have the x-axis labeled in levels.
Here is the code that I'm using:
from pylab import scatter
import pylab
import matplotlib.pyplot as plt
import pandas as pd
from pandas import Series, DataFrame
import numpy as np
file_name = '/Users/joedanger/Desktop/Python/scatter_python.csv'
data = DataFrame(pd.read_csv(file_name))
y = np.log(data['o_value'], dtype='float64')
x = np.log(data['time_diff_day'], dtype='float64')
fig = plt.figure()
plt.scatter(x, y, c='blue', alpha=0.05, edgecolors='none')
fig.suptitle('test title', fontsize=20)
plt.xlabel('time_diff_day', fontsize=18)
plt.ylabel('o_value', fontsize=16)
plt.xticks([-8,-7,-6,-5,-4,-3,-2,-1,0,1,2,3,4])
plt.grid(True)
pylab.show()

Let matplotlib take the log for you:
fig = plt.figure()
ax = plt.gca()
# pass the raw (unlogged) values; x and y as in the question
ax.scatter(data['time_diff_day'], data['o_value'], c='blue', alpha=0.05, edgecolors='none')
ax.set_yscale('log')
ax.set_xscale('log')
If all your markers have the same size and color, it is faster to use plot:
fig = plt.figure()
ax = plt.gca()
ax.plot(data['time_diff_day'], data['o_value'], 'o', c='blue', alpha=0.05, markeredgecolor='none')
ax.set_yscale('log')
ax.set_xscale('log')
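If you would rather keep plotting the logged values, as the question does, another option is to relabel the ticks with their antilog. The following is only a sketch under assumptions: the data are synthetic stand-ins for the CSV columns, and exp_label is a hypothetical helper name, not part of matplotlib.

```python
import numpy as np
import matplotlib.pyplot as plt
from matplotlib.ticker import FuncFormatter


def exp_label(value, pos=None):
    """Format a tick located at ln(level) as the level itself."""
    return f"{np.exp(value):.3g}"


# Synthetic stand-ins for data['time_diff_day'] and data['o_value'].
rng = np.random.default_rng(0)
levels_x = rng.lognormal(mean=0.0, sigma=2.0, size=500)
levels_y = rng.lognormal(mean=0.0, sigma=1.0, size=500)

fig, ax = plt.subplots()
# The points are still positioned by their natural logarithms.
ax.scatter(np.log(levels_x), np.log(levels_y), c='blue', alpha=0.3, edgecolors='none')
# Only the x tick labels change: each label shows e**tick.
ax.xaxis.set_major_formatter(FuncFormatter(exp_label))
ax.set_xlabel('time_diff_day (levels)')
ax.set_ylabel('log(o_value)')
```

Call plt.show() to display the figure. The tick positions stay evenly spaced in log units, so the labels will not be round numbers unless you also choose the tick locations yourself.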

The accepted answer is a bit out of date. Pandas, from at least version 0.25, natively supports log axes:
# logarithmic X
df.plot.scatter(..., logx=True)
# logarithmic Y
df.plot.scatter(..., logy=True)
# both
df.plot.scatter(..., loglog=True)
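As a concrete illustration, here is a minimal sketch with the elided arguments filled in. The DataFrame is synthetic and only borrows the column names from the question; it is not the asker's data.

```python
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt

# Synthetic, strictly positive data (log axes cannot show zero or negative values).
rng = np.random.default_rng(0)
df = pd.DataFrame({
    "time_diff_day": rng.lognormal(mean=0.0, sigma=2.0, size=300),
    "o_value": rng.lognormal(mean=0.0, sigma=1.0, size=300),
})

# loglog=True puts both axes on a log scale; the returned object is a matplotlib Axes.
ax = df.plot.scatter(x="time_diff_day", y="o_value", loglog=True, alpha=0.3)
```

Because the result is an ordinary Axes, it can be customized further with the usual matplotlib calls before plt.show().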

Related

How to plot a density bar next to my density scatter plot? [duplicate]

I'm working with data that has 3 plotting parameters: x, y, c. How do you create a custom color value for a scatter plot?
Extending this example I'm trying to do:
import matplotlib.pyplot as plt
cm = plt.get_cmap('RdYlBu')
colors = [cm(1.*i/20) for i in range(20)]
xy = range(20)
plt.subplot(111)
colorlist = [colors[x//2] for x in xy]  # actually some other non-linear relationship
plt.scatter(xy, xy, c=colorlist, s=35, vmin=0, vmax=20)
plt.colorbar()
plt.show()
but the result is TypeError: You must first set_array for mappable
From the matplotlib docs on scatter:
cmap is only used if c is an array of floats
So colorlist needs to be a list of floats rather than a list of tuples as you have it now.
plt.colorbar() wants a mappable object, like the PathCollection that plt.scatter() returns.
vmin and vmax can then control the limits of your colorbar. Things outside vmin/vmax get the colors of the endpoints.
How does this work for you?
import matplotlib.pyplot as plt
cm = plt.get_cmap('RdYlBu')  # plt.cm.get_cmap is gone in recent matplotlib releases
xy = range(20)
z = xy
sc = plt.scatter(xy, xy, c=z, vmin=0, vmax=20, s=35, cmap=cm)
plt.colorbar(sc)
plt.show()
Here is the OOP way of adding a colorbar:
fig, ax = plt.subplots()
im = ax.scatter(x, y, c=c)
fig.colorbar(im, ax=ax)
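The snippet above leaves x, y and c undefined. A self-contained sketch of the same object-oriented pattern might look like this; the arrays are made up for illustration, and c is an arbitrary nonlinear function of x and y.

```python
import numpy as np
import matplotlib.pyplot as plt

rng = np.random.default_rng(0)
x = rng.uniform(0, 20, size=50)
y = rng.uniform(0, 20, size=50)
c = np.sqrt(x * y)  # hypothetical third variable, always within [0, 20]

fig, ax = plt.subplots()
# c is an array of floats, so cmap, vmin and vmax are honored.
sc = ax.scatter(x, y, c=c, cmap="RdYlBu", vmin=0, vmax=20, s=35)
# Pass the mappable returned by scatter explicitly to the colorbar.
cbar = fig.colorbar(sc, ax=ax)
cbar.set_label("c")
```

Keeping a reference to the object returned by scatter is the key step: the colorbar takes its colormap and limits from that mappable.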
If you're looking to scatter by two variables and color by the third, Altair can be a great choice.
Creating the dataset
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd
df = pd.DataFrame(40*np.random.randn(10, 3), columns=['A', 'B','C'])
Altair plot
import altair as alt
alt.Chart(df).mark_circle().encode(x='A', y='B', color='C').properties(width=200, height=150)

How to set title position inside graph in scatter plot?

I would like the title positioned inside the plot area, as in this graph:
Here is my code (MWE):
import matplotlib.pyplot as plt
import numpy as np
import random
fig, ax = plt.subplots()
x = random.sample(range(256),200)
y = random.sample(range(256),200)
cor=np.corrcoef(x,y)
plt.scatter(x,y, color='b', s=5, marker=".")
#plt.scatter(x,y, label='skitscat', color='b', s=5, marker=".")
ax.set_xlim(0,300)
ax.set_ylim(0,300)
plt.xlabel('x')
plt.ylabel('y')
plt.title('Correlation Coefficient: %f'%cor[0][1])
#plt.legend()
fig.savefig('plot.png', dpi=fig.dpi)
#plt.show()
But this gives:
How do I fix this title position?
Pass x and y values to plt.title. Note that for the title to appear inside the graph, both values should be in the interval (0, 1), because they are given in axes coordinates. Here is some sample code:
import matplotlib.pyplot as plt
A= [2,1,4,5]; B = [3,2,-2,1]
plt.scatter(A,B)
plt.title("title", x=0.9, y=0.9)
plt.xlabel("x-axis")
plt.ylabel("y-axis")
plt.show()
Moving the title to some arbitrary position inside the axes is unnecessarily complicated.
Instead, create a text object at the desired position.
import matplotlib.pyplot as plt
import numpy as np
fig, ax = plt.subplots()
x = np.random.randint(256,size=200)
y = np.random.randint(256,size=200)
cor=np.corrcoef(x,y)
ax.scatter(x,y, color='b', s=5, marker=".")
ax.set_xlim(0,300)
ax.set_ylim(0,300)
ax.set_xlabel('x')
ax.set_ylabel('y')
ax.text(0.9, 0.9, 'Correlation Coefficient: %f'%cor[0][1],
        transform=ax.transAxes, ha="right")
plt.show()

How to create histogram with multiple arrays with various length, with percentage on y axes with matplotlib

I would like to create a histogram plot for multiple arrays that share a percentage y-axis.
For example, this plots correctly:
import numpy as np
from matplotlib import pyplot as plt
import matplotlib.ticker as ticker
# these are my measurements, unsorted
num_of_points = 10000
num_of_bins = 20
data = np.random.randn(num_of_points) # generate random numbers from a gaussian distribution
fig, ax = plt.subplots()
ax.hist(data, bins=num_of_bins, edgecolor='black', alpha=0.3)
ax.set_title("Histogram")
ax.set_xlabel("X axis")
ax.set_ylabel("Percentage")
ax.yaxis.set_major_formatter(ticker.PercentFormatter(xmax=len(data)))
plt.show()
But when I add a second array with a different length, the percentages are wrong for one of them, because PercentFormatter takes a single xmax (here len(data2)).
num_of_points = 10000
num_of_points2 = 30000
num_of_bins = 20
data = np.random.randn(num_of_points) # generate random numbers from a gaussian distribution
data2 = np.random.randn(num_of_points2)
fig, ax = plt.subplots()
ax.hist(data, bins=num_of_bins, edgecolor='black', alpha=0.3)
ax.hist(data2, bins=num_of_bins, edgecolor='black', alpha=0.3)
ax.set_title("Histogram")
ax.set_xlabel("X axis")
ax.set_ylabel("Percentage")
ax.yaxis.set_major_formatter(ticker.PercentFormatter(xmax=len(data2)))
plt.show()
So how can I have a shared percentage y-axis that is correct for both data arrays?
I think one way to solve this issue would be plotting the second data using the secondary y-axis.
Try this!
import numpy as np
from matplotlib import pyplot as plt
import matplotlib.ticker as ticker
# these are my measurements, unsorted
num_of_points = 10000
num_of_bins = 20
data = np.random.randn(num_of_points) # generate random numbers from a gaussian distribution
fig, ax = plt.subplots()
ax.hist(data, bins=num_of_bins, color='blue', edgecolor='black', alpha=0.1)
ax.set_title("Histogram")
ax.set_xlabel("X axis")
ax.set_ylabel("Percentage of data")
ax.yaxis.set_major_formatter(ticker.PercentFormatter(xmax=len(data)))
ax1 = ax.twinx()
num_of_points2 = 30000
data2 = np.random.randn(num_of_points2)
ax1.hist(data2, bins=num_of_bins, color='orange', edgecolor='black', alpha=0.1)
ax1.set_ylabel("Percentage of data2")
ax1.yaxis.set_major_formatter(ticker.PercentFormatter(xmax=len(data2)))
plt.show()
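If a single shared axis is preferred over twin axes, one alternative (not taken from the answer above) is to weight every point by 1/len(array), so that each histogram's bar heights are fractions of its own array. The sketch below assumes synthetic Gaussian data and common bin edges computed from both arrays.

```python
import numpy as np
import matplotlib.pyplot as plt
import matplotlib.ticker as ticker

rng = np.random.default_rng(0)
data = rng.standard_normal(10000)
data2 = rng.standard_normal(30000)

# Common bin edges so the bars of both arrays line up.
edges = np.histogram_bin_edges(np.concatenate([data, data2]), bins=20)

fig, ax = plt.subplots()
fractions = []
for values, color in ((data, "blue"), (data2, "orange")):
    # Each point contributes 1/len(values), so the bar heights sum to 1.
    weights = np.full(len(values), 1.0 / len(values))
    heights, _, _ = ax.hist(values, bins=edges, weights=weights,
                            color=color, edgecolor="black", alpha=0.3)
    fractions.append(heights)

ax.set_title("Histogram")
ax.set_xlabel("X axis")
ax.set_ylabel("Percentage")
# Heights are already fractions, so 1.0 corresponds to 100%.
ax.yaxis.set_major_formatter(ticker.PercentFormatter(xmax=1))
```

With this weighting the formatter no longer depends on the length of either array, so both histograms can be read off the same axis.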

Superimposition of histogram and density in Pandas/Matplotlib in Python

I've got a Pandas dataframe named clean which contains a column v for which I would like to draw a histogram and superimpose a density plot. I know I can plot one under the other this way:
import pandas as pd
import matplotlib.pyplot as plt
Maxv=200
plt.subplot(211)
plt.hist(clean['v'],bins=40, range=(0, Maxv), color='g')
plt.ylabel("Number")
plt.subplot(212)
ax=clean['v'].plot(kind='density')
ax.set_xlim(0, Maxv)
plt.xlabel("Orbital velocity (km/s)")
ax.get_yaxis().set_visible(False)
But when I try to superimpose them, the y scales don't match (and I lose the y-axis ticks and labels):
yhist, xhist, _hist = plt.hist(clean['v'],bins=40, range=(0, Maxv), color='g')
plt.ylabel("Number")
ax=clean['v'].plot(kind='density') #I would like to insert here a normalization to max(yhist)/max(ax)
ax.set_xlim(0, Maxv)
plt.xlabel("Orbital velocity (km/s)")
ax.get_yaxis().set_visible(False)
Any hints? (Additional question: how can I change the width of the density smoothing?)
Based on your code, this should work:
ax = clean.v.plot(kind='hist', bins=40, density=True)  # density=True replaces the removed normed=True
clean.v.plot(kind='kde', ax=ax, secondary_y=True)
ax.set(xlim=[0, Maxv])
You might not even need the secondary_y anymore.
Now I tried this:
ax = clean.v.plot(kind='hist', bins=40, range=(0, Maxv))
clean.v.plot(kind='kde', ax=ax, secondary_y=True)
But the range part doesn't work, and there's still the left y-axis problem.
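Seaborn makes this easy. distplot is deprecated in recent seaborn releases; histplot with kde=True is its replacement:
import seaborn as sns
sns.histplot(df['numeric_column'], bins=25, kde=True)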
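To keep the count axis and still overlay a density, one option is to scale the density curve by the number of points times the bin width, which converts it to expected counts per bin. The sketch below is an assumption-laden illustration: it uses scipy.stats.gaussian_kde directly rather than the pandas density plot, and the clean DataFrame is synthetic. The bw_method argument addresses the side question about smoothing width.

```python
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
from scipy.stats import gaussian_kde

# Synthetic stand-in for the asker's DataFrame.
rng = np.random.default_rng(0)
clean = pd.DataFrame({"v": rng.gamma(shape=4.0, scale=20.0, size=2000)})
Maxv = 200
bins = 40
bin_width = Maxv / bins

fig, ax = plt.subplots()
counts, edges, _ = ax.hist(clean["v"], bins=bins, range=(0, Maxv), color="g", alpha=0.5)

# A scalar bw_method sets the bandwidth factor; smaller values smooth less.
kde = gaussian_kde(clean["v"], bw_method=0.2)
xs = np.linspace(0, Maxv, 400)
# density * N * bin_width gives the curve in the same units as the counts.
scaled = kde(xs) * len(clean) * bin_width
ax.plot(xs, scaled, color="k")

ax.set_xlim(0, Maxv)
ax.set_xlabel("Orbital velocity (km/s)")
ax.set_ylabel("Number")
```

Because both artists live on the same Axes, the left y-axis ticks and labels are preserved.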

seaborn: how to make a tsplot square

I would like to create a tsplot where the x and y axes are the same length. In other words, the aspect ratio of the graph should be 1.
This does not work:
fig, ax = plt.subplots()
fig.set_size_inches(2, 2)
sns.tsplot(data=df, condition=' ', time='time', value='value', unit=' ', ax=ax)
You could change the aspect ratio of your plots by controlling the aspect
parameter of a matplotlib object as shown:
import numpy as np
import seaborn as sns
import matplotlib.pyplot as plt
np.random.seed(22)
sns.set_style("whitegrid")
gammas = sns.load_dataset("gammas")
fig = plt.figure()
ax = fig.add_subplot(111, aspect=2) #Use 'equal' to have the same scaling for x and y axes
sns.tsplot(time="timepoint", value="BOLD signal", unit="subject",
           condition="ROI", data=gammas, ax=ax)
plt.tight_layout()
plt.show()
A little more direct is ax.set_box_aspect(1).
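A minimal sketch of the box-aspect approach, using a plain matplotlib line plot as a stand-in for the seaborn plot (any plotting function that accepts an ax argument should behave the same way). It assumes matplotlib 3.3 or newer, where set_box_aspect is available.

```python
import numpy as np
import matplotlib.pyplot as plt

t = np.linspace(0, 10, 200)

fig, ax = plt.subplots()
ax.plot(t, np.sin(t))
# Box aspect is the height/width ratio of the axes box itself,
# independent of the data limits on either axis.
ax.set_box_aspect(1)
```

Unlike aspect='equal', this does not force one data unit on x to match one data unit on y; it only makes the drawn axes square.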
