While trying to plot a simple graph in Jupyter Notebook with matplotlib, I came across a strange problem that I had never had before.
I've seen that it has happened to other people before, but the answers talk about backends and other complicated things that I can't follow, since I only have a rather basic knowledge of Python.
Here comes the code:
import numpy as np
import matplotlib.pyplot as plt
time_samples = np.arange(17000)
force_samples = np.arange(17000)
plt.plot(time_samples,force_samples)
plt.show()
time_samples2 = np.random.rand(1,1000)
force_samples2 = np.random.rand(1,1000)
plt.plot(time_samples2,force_samples2)
plt.show()
And this is what I get:
I have no clue why this is happening.
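I think the array dimension is the issue: x and y should be 1D arrays. np.random.rand(1,1000) returns a 2D array of shape (1, 1000), and plt.plot treats each column of a 2D input as a separate line, so you get 1000 lines of one point each, which are invisible without markers. Use np.random.rand(1000) instead: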
import numpy as np
import matplotlib.pyplot as plt
time_samples = np.arange(17000)
force_samples = np.arange(17000)
plt.plot(time_samples,force_samples)
plt.show()
time_samples2 = np.random.rand(1000)
force_samples2 = np.random.rand(1000)
plt.plot(time_samples2,force_samples2)
plt.show()
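To make the shape issue concrete, here is a minimal sketch (the variable names are only illustrative, and the Agg backend line is an assumption so that it runs outside a notebook). It counts how many line objects plt.plot creates for 2D versus 1D input:

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend; drop this line inside Jupyter
import matplotlib.pyplot as plt
import numpy as np

# 2D input: one row, 1000 columns
x2d = np.random.rand(1, 1000)
y2d = np.random.rand(1, 1000)

fig, ax = plt.subplots()
lines_2d = ax.plot(x2d, y2d)   # each column becomes its own line
n_lines_2d = len(lines_2d)

# 1D input: flatten the same data to shape (1000,)
x1d = x2d.ravel()
y1d = y2d.ravel()

fig2, ax2 = plt.subplots()
lines_1d = ax2.plot(x1d, y1d)  # a single line through 1000 points
n_lines_1d = len(lines_1d)

print(x2d.shape, n_lines_2d)
print(x1d.shape, n_lines_1d)
```

If the data already arrives as a (1, N) array, flattening it with ravel() before plotting should be enough.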
Hi experts, I have written a simple Python script that goes through a list of values and does some calculation in a for loop, but when plotting (x, y) it doesn't show the plot. My program is given below. I hope some expert will help me fix the problem. Thanks in advance.
import math
import numpy as np
import matplotlib.pyplot as plt
a=1978780
b=[4,40,90,100,600,785,900]
for i in range(len(b)):
    zx=math.exp(b[i]/a)*5
    # print(b[i],zx)
    plt.scatter(b[i],zx)
plt.show()
This will solve the issue (although it is not recommended; try the other solution, which is faster and more efficient).
Or this:
a=1978780
b=np.array([4,40,90,100,600,785,900])
zx=np.exp(b/a)*5
plt.plot(b,zx)
plt.show()
Or this:
a=1978780
b=np.array([4,40,90,100,600,785,900])
for i in range(len(b)):
    zx=math.exp(b[i]/a)*5
    # print(b[i],zx)
    plt.scatter(b[i],zx)
plt.ylim(4.999,5.005)
plt.show()
Can't you apply the vectorized operation on the numpy array and plot it?
import matplotlib.pyplot as plt
import numpy as np
a=1978780
b=np.array([4,40,90,100,600,785,900])
zx=np.exp(b/a)*5
plt.plot(b,zx)
plt.show()
You want to use numpy's ability to work on arrays. That way, no need to iterate over lists anymore and you can write your function in a single line.
import matplotlib.pyplot as plt
import numpy as np
a = 1978780
b = np.array([4, 40, 90, 100, 600, 785, 900])
zx = np.exp(b/a)*5
fig, ax = plt.subplots()
ax.plot(b, zx)
fig.show()
I was trying to plot with matplotlib and xarray on Python 3.6.5.
My code is actually from the xarray documentation:
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import xarray as xr
airtemps = xr.tutorial.open_dataset('air_temperature')
print(airtemps)
air = airtemps.air - 273.15
air.attrs = airtemps.air.attrs
air.attrs['units'] = 'deg C'
air1d = air.isel(lat=10, lon=10)
air1d.plot() # using xarray.DataArray's plot()
When I ran the code (within the virtualenv), it didn't show any plotting window and no error was thrown.
My mistake: I forgot to call plt.show() at the end.
I am a newbie in data science and I was trying to draw a scatter plot for a dataset with 4000 rows. I am running Jupyter Notebook on a MacBook. It took more than five minutes for the scatter plot to appear in the notebook. My laptop was bought recently; it has a 2.3 GHz Intel Core i5 and 8 GB of memory.
I have two questions: why did it take so long, and why was the plot so congested (for example, all the x-axis tick labels were small, ran together, and could not be read clearly)? The dataset is here: https://raw.githubusercontent.com/datascienceinc/learn-data-science/master/Introduction-to-K-means-Clustering/Data/data_1024.csv
I would really appreciate any enlightenment.
Here is my code:
import numpy as np
import pandas as pd
import matplotlib
from matplotlib import pyplot as plt
%matplotlib inline
from sklearn.cluster import KMeans
df= pd.read_csv('/users/kyaw/Downloads/data_1024.csv')
df = df.join(df['Driver_ID'].str.split(expand=True))
df = df.drop(["Driver_ID"], axis=1)
df.columns=['Driver_ID','Distance_Feature','Speeding_Feature']
f1 = df['Distance_Feature'].values
f2 = df['Speeding_Feature'].values
X=np.array(list(zip(f1,f2)))
fig=plt.gcf()
fig.set_size_inches(10,8)
kmeans = KMeans(n_clusters=3).fit(X)
plt.scatter(X[:,0], X[:,1], c=kmeans.labels_, cmap='rainbow')
plt.scatter(kmeans.cluster_centers_[:,0] ,kmeans.cluster_centers_[:,1], color='black')
plt.show()
I tried to run your code and it didn't work. I made the following corrections:
import numpy as np
import pandas as pd
import matplotlib
from matplotlib import pyplot as plt
# %matplotlib inline  --> removed; it is a Jupyter magic and is not valid in a plain script
from sklearn.cluster import KMeans
df = pd.read_csv('./data_1024.csv', sep='\t')  # specify tab as the separator
# removed the other instructions, which are no longer needed
f1 = df['Distance_Feature'].values
f2 = df['Speeding_Feature'].values
X=np.array(list(zip(f1,f2)))
fig=plt.gcf()
fig.set_size_inches(10,8)
kmeans = KMeans(n_clusters=3).fit(X)
plt.scatter(X[:,0], X[:,1], c=kmeans.labels_, cmap='rainbow')
plt.scatter(kmeans.cluster_centers_[:,0] ,kmeans.cluster_centers_[:,1], color='black')
plt.show()
I got this image
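As a rough illustration of why the separator matters, here is a small sketch using a made-up two-row sample in the same tab-separated layout (the numbers are invented, not taken from the real file). Without sep='\t', pandas reads everything into a single text column. A likely explanation for the slow, crowded plot is that the values split out of that column stayed strings, which matplotlib plots as categories with one tick per distinct value.

```python
import io
import pandas as pd

# Made-up sample mimicking the layout of data_1024.csv (values are illustrative only)
raw = (
    "Driver_ID\tDistance_Feature\tSpeeding_Feature\n"
    "1\t71.24\t28.0\n"
    "2\t52.53\t25.0\n"
)

# Default separator is ',', so each whole line ends up in one column
df_default = pd.read_csv(io.StringIO(raw))

# With the tab separator, three numeric columns are parsed
df_tab = pd.read_csv(io.StringIO(raw), sep="\t")

print(df_default.shape)
print(df_tab.shape)
print(df_tab.dtypes)
```

Checking df.dtypes before plotting is a quick way to confirm that the feature columns are numeric rather than object (string) columns.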
I have a series of data which consists of values from several experiments (1-40; in the MWE it is 1-5). My original data has about 4,000,000 entries overall, which I am trying to smooth in order to display it:
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
from scipy.interpolate import spline
from statsmodels.nonparametric.smoothers_lowess import lowess
df = pd.DataFrame()
df["values"] = np.random.randint(100000, 200000, 1000)
df["id"] = [1,2,3,4,5] * 200
plt.figure(1, figsize=(11.69,8.27))
# Both fail for my amount of data:
plt.plot(spline(df["values"], df["id"], range(100)), "r-")
plt.plot(lowess(df["values"], df["id"]), "r-")
Both scipy.interpolate and statsmodels.nonparametric.smoothers_lowess.lowess throw out-of-memory exceptions for my data. Is there an efficient way to solve this, as in GNU R using ggplot2 and geom_smooth()?
I can't quite tell what you're getting at with all the dimensions to your data, but one very simple thing you can try is to just use the 'markevery' kwarg like so:
import numpy as np
import matplotlib.pyplot as plt
x=np.linspace(1,100,int(1E7))  # the number of samples must be an integer
y=x**2
plt.figure(1, figsize=(11.69,8.27))
plt.plot(x,y,markevery=100)
plt.show()
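This draws a marker only at every nth point (n=100 here). Note that markevery affects markers only; the line itself is still drawn through all points, so to actually reduce the number of plotted points you would slice the arrays, e.g. plt.plot(x[::100], y[::100]).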
If that doesn't help then you may want to try just a simple numpy interpolation with fewer samples like so:
x_large=np.linspace(1,100,int(1E7))   # the number of samples must be an integer
y_large=x_large**2
x_small=np.linspace(1,100,int(1E3))   # far fewer points for display
y_small=np.interp(x_small,x_large,y_large)
plt.plot(x_small,y_small)
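For noisy data, interpolation alone picks individual samples rather than smoothing them. One cheap alternative is block averaging, sketched below. This is not LOWESS or a spline, just a plain mean over consecutive chunks, and block_average is a hypothetical helper name rather than a library function.

```python
import numpy as np

def block_average(values, block_size):
    """Return the mean of each consecutive block of `block_size` samples.

    Hypothetical helper: any incomplete block at the end is dropped.
    """
    values = np.asarray(values, dtype=float)
    n_blocks = len(values) // block_size
    trimmed = values[: n_blocks * block_size]
    # One row per block, then average across each row
    return trimmed.reshape(n_blocks, block_size).mean(axis=1)

# Synthetic data in the same value range as the question's example
rng = np.random.default_rng(0)
y = rng.integers(100000, 200000, size=1_000_000)

# One million samples reduced to 1000 averaged points
y_small = block_average(y, 1000)
print(y_small.shape)
```

The reduced array can then be passed to plt.plot as usual. Memory use stays proportional to the input size because no pairwise distance or weight matrix is built.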
Python (and matplotlib) newbie here coming over from R, so I hope this question is not too idiotic. I'm trying to make a loglog plot on a natural log scale. But after some googling I cannot somehow figure out how to force pyplot to use a base e scale on the axes. The code I have currently:
import matplotlib.pyplot as pyplot
import math
e = math.exp(1)
pyplot.loglog(range(1,len(degrees)+1),degrees,'o',basex=e,basey=e)
Where degrees is a vector of counts at each value of range(1,len(degrees)+1). For some reason when I run this code, pyplot keeps giving me a plot with powers of 2 on the axes. I feel like this ought to be easy, but I'm stumped...
Any advice is greatly appreciated!
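When plotting using plt.loglog you can pass the keyword arguments basex and basey as shown below. (In Matplotlib 3.3 and later these were replaced by a single base keyword, and the old names were later removed.)
From numpy you can get the constant e with numpy.e (or np.e if you import numpy as np).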
import numpy as np
import matplotlib.pyplot as plt
# Generate some data.
x = np.linspace(0, 2, 1000)
y = x**np.e
plt.loglog(x,y, basex=np.e, basey=np.e)
plt.show()
Edit
Additionally if you want pretty looking ticks you can use matplotlib.ticker to choose the format of your ticks, an example of which is given below.
import numpy as np
import matplotlib.pyplot as plt
import matplotlib.ticker as mtick
x = np.linspace(1, 4, 1000)
y = x**3
fig, ax = plt.subplots()
ax.loglog(x,y, basex=np.e, basey=np.e)
def ticks(y, pos):
    return r'$e^{:.0f}$'.format(np.log(y))
ax.xaxis.set_major_formatter(mtick.FuncFormatter(ticks))
ax.yaxis.set_major_formatter(mtick.FuncFormatter(ticks))
plt.show()
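On recent Matplotlib versions the same plot can be sketched with the base keyword on the axis scales. This assumes Matplotlib 3.3 or newer, and the Agg backend line is only there so the sketch runs without a display. The formatter here also wraps the exponent in braces so that multi-digit or negative exponents are typeset as a whole.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend; drop this line to get a window
import matplotlib.pyplot as plt
import matplotlib.ticker as mtick
import numpy as np

x = np.linspace(1, 4, 1000)
y = x**3

fig, ax = plt.subplots()
ax.plot(x, y)
# Assumes Matplotlib >= 3.3, where the log scale takes `base`
ax.set_xscale("log", base=np.e)
ax.set_yscale("log", base=np.e)

def ticks(value, pos):
    # Label each tick as e^{k}, with k the natural log of the tick value
    return r"$e^{{{:.0f}}}$".format(np.log(value))

ax.xaxis.set_major_formatter(mtick.FuncFormatter(ticks))
ax.yaxis.set_major_formatter(mtick.FuncFormatter(ticks))
fig.savefig("loglog_base_e.png")
```

Replace fig.savefig with plt.show() when working interactively.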
This also works for semilogx and semilogy, to show the axis in base e and change the tick labels.
import matplotlib.ticker as mtick
fig, ax = plt.subplots()
def ticks(y, pos):
    return r'$e^{:.0f}$'.format(np.log(y))
plt.semilogy(Time_Series, California_Pervalence ,'gray', basey=np.e )
ax.yaxis.set_major_formatter(mtick.FuncFormatter(ticks))
plt.show()
Take a look at the image.