I am struggling a bit with the pandas transformations needed to render data in 3D with matplotlib. The data I have is usually in columns of numbers (typically time and some value). So let's create some test data to illustrate.
import pandas as pd
pattern = ("....1...."
"....1...."
"..11111.."
".1133311."
"111393111"
".1133311."
"..11111.."
"....1...."
"....1....")
# create the data and coords
Zdata = list(map(lambda d:0 if d == '.' else int(d), pattern))
Zinverse = list(map(lambda d:1 if d == '.' else -int(d), pattern))
Xdata = [x for y in range(1,10) for x in range(1,10)]
Ydata = [y for y in range(1,10) for x in range(1,10)]
# pivot the data into columns
data = [d for d in zip(Xdata,Ydata,Zdata,Zinverse)]
# create the data frame
df = pd.DataFrame(data, columns=['X','Y','Z',"Zi"], index=zip(Xdata,Ydata))
df.head(5)
Edit: This block of data is demo data that would normally come from a database query and might need more cleaning and transforming before plotting. In this case the data is already aligned and there are no problems aside from having one more column than we need (Zi).
So the numbers in pattern become height data in the Z column of df ('Zi' being the inverse image). With that data frame, I've struggled to come up with this pivot method, which takes 3 separate operations. I wonder if it can be done better.
import matplotlib.pyplot as plt
from mpl_toolkits.mplot3d import Axes3D
import matplotlib.cm as cm
fig = plt.figure()
ax = fig.add_subplot(111, projection='3d')
Xs = df.pivot(index='X', columns='Y', values='X').values
Ys = df.pivot(index='X', columns='Y', values='Y').values
Zs = df.pivot(index='X', columns='Y', values='Z').values
ax.plot_surface(Xs,Ys,Zs, cmap=cm.RdYlGn)
plt.show()
Although I have something working, I feel there must be a better way than what I'm doing. On a big data set, I would imagine that doing 3 pivots is an expensive way to plot something. Is there a more efficient way to transform this data?
I think you can avoid some of the data preparation steps by not using pandas at all (only numpy arrays) and by using convenience functions provided by numpy, such as linspace and meshgrid.
I rewrote your code to do so, trying to keep the same logic and the same variable names:
import numpy as np
import matplotlib.pyplot as plt
import matplotlib.cm as cm
pattern = ("....1...."
"....1...."
"..11111.."
".1133311."
"111393111"
".1133311."
"..11111.."
"....1...."
"....1....")
# Extract the value according to your logic
Zdata = list(map(lambda d:0 if d == '.' else int(d), pattern))
# Assuming the pattern is always a square
size = int(len(Zdata) ** 0.5)
# Create a mesh grid for plotting the surface
Xdata = np.linspace(1, size, size)
Ydata = np.linspace(1, size, size)
Xs, Ys = np.meshgrid(Xdata, Ydata)
# Convert the Zdata to a numpy array with the appropriate shape
Zs = np.array(Zdata).reshape((size, size))
fig = plt.figure()
ax = fig.add_subplot(111, projection='3d')
# Plot the surface
ax.plot_surface(Xs, Ys, Zs, cmap=cm.RdYlGn)
plt.show()
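If the data has to stay in a pandas DataFrame (for example because it comes from a database query), a single pivot may be enough: pivot only Z and build the coordinate grids from the pivot's index and columns. The following is a minimal sketch under that assumption. The name `grid` is introduced here for illustration, and the test frame mirrors the one from the question.

```python
import numpy as np
import pandas as pd

pattern = ("....1...."
           "....1...."
           "..11111.."
           ".1133311."
           "111393111"
           ".1133311."
           "..11111.."
           "....1...."
           "....1....")

# Same long-format test data as in the question (X, Y, Z columns).
z = [0 if c == '.' else int(c) for c in pattern]
xs = [x for y in range(1, 10) for x in range(1, 10)]
ys = [y for y in range(1, 10) for x in range(1, 10)]
df = pd.DataFrame({'X': xs, 'Y': ys, 'Z': z})

# One pivot for the heights only.
grid = df.pivot(index='X', columns='Y', values='Z')

# Coordinates come from the pivot's axes; indexing='ij' keeps
# rows aligned with X and columns aligned with Y, like the pivot.
Xs, Ys = np.meshgrid(grid.index.to_numpy(), grid.columns.to_numpy(),
                     indexing='ij')
Zs = grid.to_numpy()
```

`Xs`, `Ys`, and `Zs` can then be passed to `ax.plot_surface(Xs, Ys, Zs, cmap=cm.RdYlGn)` as before.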
I have a sequence of data files, each containing two columns of data (x value and z value). I want to assign each file a unique constant y value in a loop and then use the x, y, z values to make a contour plot.
import glob
import matplotlib.pyplot as plt
import numpy as np
# raw string so that the backslashes in the Windows path are not treated as escapes
files=glob.glob(r'C:\Users\DDT\Desktop\DATA TIANYU\materials\AB2O4\synchronchron\OX1\YbFe1Mn1O4_2cyc_600_meth_ox1-*.xye')
s1=1
for file in files:
    t1=s1/3
    x,z = np.loadtxt(file,skiprows=3,unpack=True, usecols=[0,1])
    def f(x, y):
        return x*0 +y*0 +z
    l1=np.size(x)
    y=np.full(l1, t1,dtype=int)
    X,Y=np.meshgrid(x,y)
    Z = f(X,Y)
    plt.contour(X,Y,Z)
    s1=s1+1
    continue
plt.show()
The code runs without errors, but all I get is an empty figure.
What mistake did I make?
It is hard to guess what you're trying to do, but here is an attempt. It assumes that all x-arrays are equal and that the y values really make sense (which is doubtful if the files are read in an unspecified order). To get a useful plot, the data from all the files should be collected before starting to plot.
import glob
import matplotlib.pyplot as plt
import numpy as np
files = glob.glob('........')
zs = []
for file in files:
    x, z = np.loadtxt(file, skiprows=3, unpack=True, usecols=[0, 1])
    zs.append(z)
# without creating a new x, the x from the last file will be used
# x = np.linspace(0, 15, 10)
y = np.linspace(-100, 1000, len(zs))
zs = np.array(zs)
fig, axs = plt.subplots(ncols=2)
axs[0].scatter(np.tile(x, y.size), np.repeat(y, x.size), c=zs)
axs[1].contour(x, y, zs)
plt.show()
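To try this approach without the original files, the loading step can be replaced by simulated data. The sketch below is only an illustration: the file names, the numeric-suffix sorting rule, and the `fake_load` helper are hypothetical stand-ins, not part of the original data. It sorts the names numerically so that each row of `zs` gets a meaningful y, and it builds `zs` with the shape `(len(y), len(x))` that `contour(x, y, zs)` expects.

```python
import re
import numpy as np

# Hypothetical file names; glob.glob() returns names in arbitrary order.
files = ['scan-10.xye', 'scan-2.xye', 'scan-1.xye', 'scan-3.xye']

def numeric_suffix(path):
    """Return the integer between the last '-' and '.xye' (assumed naming scheme)."""
    match = re.search(r'-(\d+)\.xye$', path)
    return int(match.group(1)) if match else -1

# Sort by the number in the name, not alphabetically ('10' would sort before '2').
files = sorted(files, key=numeric_suffix)

# Shared x axis, assumed identical for every file.
x = np.linspace(0, 15, 50)
rng = np.random.default_rng(0)

def fake_load(index):
    """Stand-in for np.loadtxt: a noisy peak that drifts with the file index."""
    return np.exp(-(x - 5 - index) ** 2) + 0.05 * rng.random(x.size)

# One row of z values per file, stacked into a 2D array.
zs = np.array([fake_load(i) for i, _ in enumerate(files)])

# One y value per file, here simply the number taken from the file name.
y = np.array([numeric_suffix(f) for f in files], dtype=float)
```

With these arrays, `plt.contour(x, y, zs)` draws one row per file. The numeric sort also keeps `y` increasing, which contour plots rely on.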
With simulated random data, the scatter plot and the contour plot would look like:
I am trying to compare two CSV files and find the proper points of intersection between them. I am plotting them as graphs for better understanding.
However, one graph comes out very small compared to the other.
See the following:
Here is the data: trade-volume.csv
Here is the real graph:
Here is the data: miners-revenue.csv
Here is the real graph:
Here is the program I wrote for comparison:
import pandas as pd
import matplotlib.pyplot as plt
dat2 = pd.read_csv("trade-volume.csv", parse_dates=['time'])
dat3 = pd.read_csv("miners-revenue.csv", parse_dates=['time'])
dat2['timeDiff'] = (dat2['time'] - dat2['time'][0]).astype('timedelta64[D]')
dat3['timeDiff'] = (dat3['time'] - dat3['time'][0]).astype('timedelta64[D]')
fig, ax = plt.subplots()
ax.plot(dat2['timeDiff'], dat2['Value'])
ax.plot(dat3['timeDiff'], dat3['Value'])
plt.show()
I got the output like the following:
As one can see, the orange graph is so low that I cannot make out its points. I would like to overlay the graphs and then compare them.
Please help me make this possible with my existing code, with as little alteration as possible.
The problem comes down to your y axis. One has a maximum of 60,000,000 while the other has a maximum of 6,000,000,000. Trying to plot these on the same graph is going to lead to one "looking" like a straight line even though it isn't if you zoom in.
A possible solution is to use a second y axis (you can change the color of the lines using the color= argument in ax.plot()):
import pandas as pd
import matplotlib.pyplot as plt
dat2 = pd.read_csv("trade-volume.csv", parse_dates=['time'])
dat3 = pd.read_csv("miners-revenue.csv", parse_dates=['time'])
dat2['timeDiff'] = (dat2['time'] - dat2['time'][0]).astype('timedelta64[D]')
dat3['timeDiff'] = (dat3['time'] - dat3['time'][0]).astype('timedelta64[D]')
fig, ax = plt.subplots()
ax.plot(dat2['timeDiff'], dat2['Value'], color="blue")
ax2=ax.twinx()
ax2.plot(dat3['timeDiff'], dat3['Value'], color="red")
plt.show()
The two datasets live on very different scales. You can normalize both in order to compare them.
import pandas as pd
import matplotlib.pyplot as plt
dat2 = pd.read_csv("trade-volume.csv", parse_dates=['time'])
dat3 = pd.read_csv("miners-revenue.csv", parse_dates=['time'])
dat2['timeDiff'] = (dat2['time'] - dat2['time'][0]).astype('timedelta64[D]')
dat3['timeDiff'] = (dat3['time'] - dat3['time'][0]).astype('timedelta64[D]')
fig, ax = plt.subplots()
ax.plot(dat2['timeDiff'], dat2['Value']/dat2['Value'].values.max())
ax.plot(dat3['timeDiff'], dat3['Value']/dat3['Value'].values.max())
plt.show()
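The normalization step can be checked on a small made-up example. The sketch below uses invented values and column contents (only the column names `time` and `Value` come from the question), so it says nothing about the real CSV files. It also uses `.dt.days` to compute the day offset, an alternative to `astype('timedelta64[D]')`, which, depending on the pandas version, may not be accepted.

```python
import pandas as pd

# Made-up stand-ins for the two CSV files, on very different scales.
volume = pd.DataFrame({
    'time': pd.to_datetime(['2017-01-01', '2017-01-02', '2017-01-04']),
    'Value': [2.0e9, 6.0e9, 3.0e9],
})
revenue = pd.DataFrame({
    'time': pd.to_datetime(['2017-01-01', '2017-01-03', '2017-01-04']),
    'Value': [1.5e7, 6.0e7, 3.0e7],
})

for frame in (volume, revenue):
    # Days elapsed since the first row of each frame.
    frame['timeDiff'] = (frame['time'] - frame['time'].iloc[0]).dt.days
    # Scale so that the largest value of each series becomes 1.
    frame['scaled'] = frame['Value'] / frame['Value'].max()
```

After this, `ax.plot(volume['timeDiff'], volume['scaled'])` and `ax.plot(revenue['timeDiff'], revenue['scaled'])` share one axis running from 0 to 1. Note that dividing by the maximum compares shapes only; the absolute values are no longer readable from the plot.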
I have a scatter plot that I'd like to output as SVG (Python 3.5). However, when used with agg as backend, some points are simply missing. See the data and the PNG and SVG output. Is this some kind of misconfiguration or a bug?
Code:
import matplotlib
matplotlib.use('agg')
import matplotlib.pyplot as plt
x = [22752.9597858324,33434.3100283611,None,None,3973.2239542398,None,None,None
,None,None,None,None,None,960.6513071797,None,None,None,None,None,None,None
,None,None,None,None,None,749470.931292081,None,None,None,None,None,None
,None,None,None,None,None,None,None,None,23045.262784499,None,None,None
,None,None,None,None,1390.8383822667,None,None,9802.5632611025
,3803.3240362092,None,None,None,None,None,2058.1191666219,None
,3777.5383953988,None,91224.0759036624,23296.1857550166,27956.249381887
,None,237247.707648005,None,None,None,None,None,None,None,None,None
,760.3493458787,None,321687.799104496,None,None,22339.5617383239,None,None
,None,None,None,28135.0261453192,None,None,None,None,None,None,None
,1687.4387356974,None,None,29037.8494868489,None,None,None,None,None,None
,None,3937.3066755226,None,None,None,None]
y = [63557.4319306279,None,None,None,9466.0204228915,None,None,None,None,None
,None,None,None,3080.3393940948,None,None,None,None,None,None,None,None
,None,None,None,None,592184.803802073,None,None,None,None,None,None,None
,None,None,None,None,None,None,None,18098.725166318,None,None,None,None
,None,None,None,789.2710621298,None,None,7450.9539135753,4251.6033622036
,None,None,None,None,None,1277.1691956597,None,4273.5950324508,None
,51861.5572682614,19415.3369388317,2117.2407148378,None,160776.887146683
,None,None,None,None,None,None,None,None,None,1550.3003177484,None
,402333.163939038,None,None,16604.3340243551,None,None,None,None,None
,32545.0784355136,None,None,None,None,None,None,None,2567.9264180605,None
,None,45786.935597305,None,None,None,None,None,None,None,5645.5218715636
,None,None,None,None]
fig = plt.figure()
ax = fig.add_subplot(111)
ax.plot(x, y, '.')
fig.savefig('/home/me/test_svg', format='svg')
fig.savefig('/home/me/test_png', format='png')
The result:
PNG:
SVG:
The problem seems to be related to the None values. Although no point is drawn where one of the coordinates is missing, the None entries still seem to influence the rendering of the SVG. Removing both entries of a pair whenever either coordinate is None fixes the issue.
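import numpy as np
data = np.array([x, y])
data = data.transpose()
# Filter out pairs of points of which at least one is None.
# Testing "is not None" (rather than truthiness) keeps points with a value of 0.
data = [pair for pair in data if pair[0] is not None and pair[1] is not None]
data = np.array(data, dtype=float).transpose()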
x = data[0]
y = data[1]
ax.plot(x, y, '.')
fig.savefig('/home/me/test_svg', format='svg')
fig.savefig('/home/me/test_png', format='png')
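The same filtering can also be written with a boolean mask instead of a list comprehension. This is a minimal sketch on made-up values (not the data from the question), assuming the lists contain only numbers and None; converting with `dtype=float` turns each None into NaN.

```python
import numpy as np

# Made-up sample with gaps; None marks a missing coordinate.
x = [1.5, None, 0.0, 4.0, None]
y = [2.0, 3.0, 5.0, None, None]

# dtype=float converts None to NaN.
x_arr = np.array(x, dtype=float)
y_arr = np.array(y, dtype=float)

# Keep a pair only if both coordinates are present.
valid = ~(np.isnan(x_arr) | np.isnan(y_arr))
x_clean = x_arr[valid]
y_clean = y_arr[valid]
```

`x_clean` and `y_clean` can then be passed to `ax.plot(x_clean, y_clean, '.')`. Unlike a truthiness test, the mask keeps points whose value is exactly 0.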
Update
This looks like a bug that was fixed some time between matplotlib 2.0.0 and 3.1.1. Upgrading solved the problem for me.
Original Answer
I ran into the same problem, so I created a minimal example to reproduce it:
import numpy as np
from matplotlib import pyplot as plt
data = np.array([1.0, np.nan, 1.0])
plt.plot(data, 'o')
plt.savefig('example.svg')
plt.savefig('example.png')
It works fine as a PNG:
However, the left point is missing from the SVG.
Using your suggestion of removing invalid data, I used the numpy indexing features:
import numpy as np
from matplotlib import pyplot as plt
data = np.array([1.0, np.nan, 1.0])
indexes = np.arange(data.size)
# np.negative is not supported on boolean arrays; use ~ to invert the mask
is_valid = ~np.isnan(data)
plt.plot(indexes[is_valid], data[is_valid], 'o')
plt.savefig('example.svg')
plt.savefig('example.png')
Now the PNG and the SVG display both points.