HDF5 file to diagram in python

I'm trying to generate some diagrams from an .h5 file, but I don't know how to do it.
I'm using PyTables, NumPy and matplotlib.
The HDF5 files I use contain 2 sets of data, i.e. 2 different curves.
My goal is to get diagrams like this one.
This is what I have managed to do so far:
import tables as tb
import numpy as np
import matplotlib.pyplot as plt
h5file = tb.openFile(args['FILE'], "a")
for group in h5file.walkGroups("/"):
    for array in h5file.walkNodes("/","Array"):
        if(isinstance(array.atom.dflt, int)):
            tab = np.array(array.read())
            x = tab[0]
            y = tab[1]
            plt.plot(x, y)
            plt.show()
x and y values are good but I don't know how to use them, so the result is wrong. I get a triangle instead of what I want ^^
Thank you for your help
EDIT
I solved my problem.
Here is the code :
fig = plt.figure()
tableau = np.array(array.read())
x = tableau[0]
y = tableau[1]
ax1 = fig.add_subplot(211)
ax2 = fig.add_subplot(212)
ax1.plot(x)
ax2.plot(y)
plt.title(array.name)
plt.show()
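
The fix above amounts to plotting each row of the array against its sample index on its own axes, instead of plotting row 0 against row 1. Here is a minimal, self-contained sketch of that idea, assuming the array read from the file has shape (2, N). The synthetic `tableau` and the helper name `plot_two_curves` are stand-ins for illustration and are not part of the original code:

```python
import matplotlib
matplotlib.use("Agg")  # headless backend; remove this line to display with plt.show()
import matplotlib.pyplot as plt
import numpy as np


def plot_two_curves(tableau, title):
    """Plot row 0 and row 1 of a (2, N) array on two stacked axes."""
    fig, (ax1, ax2) = plt.subplots(nrows=2, sharex=True)
    ax1.plot(tableau[0])  # first curve against its sample index
    ax2.plot(tableau[1])  # second curve against its sample index
    fig.suptitle(title)   # one title for the whole figure
    return fig


# Synthetic stand-in for np.array(array.read()); real data would come from the file
t = np.linspace(0, 2 * np.pi, 200)
tableau = np.vstack([np.sin(t), np.cos(t)])
fig = plot_two_curves(tableau, "demo_array")
```

Using `fig.suptitle` rather than `plt.title` puts the array name above both subplots instead of only above the last one.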

Related

How to use pandas with matplotlib to create 3D plots

I am struggling a bit with the pandas transformations needed to make data render in 3D in matplotlib. The data I have is usually in columns of numbers (typically time and some value). So let's create some test data to illustrate.
import pandas as pd
pattern = ("....1...."
"....1...."
"..11111.."
".1133311."
"111393111"
".1133311."
"..11111.."
"....1...."
"....1....")
# create the data and coords
Zdata = list(map(lambda d:0 if d == '.' else int(d), pattern))
Zinverse = list(map(lambda d:1 if d == '.' else -int(d), pattern))
Xdata = [x for y in range(1,10) for x in range(1,10)]
Ydata = [y for y in range(1,10) for x in range(1,10)]
# pivot the data into columns
data = [d for d in zip(Xdata,Ydata,Zdata,Zinverse)]
# create the data frame
df = pd.DataFrame(data, columns=['X','Y','Z',"Zi"], index=zip(Xdata,Ydata))
df.head(5)
Edit: This block of data is demo data that would normally come from a query on a database and might need more cleaning and transformation before plotting. In this case the data is already aligned, and there are no problems aside from having one more column we don't need (Zi).
So the numbers in pattern are transferred into height data in the Z column of df ('Zi' being the inverse image). With that as the data frame, I've struggled to come up with this pivot method, which takes 3 separate operations. I wonder whether that can be done better.
import matplotlib.pyplot as plt
from mpl_toolkits.mplot3d import Axes3D
import matplotlib.cm as cm
fig = plt.figure()
ax = fig.add_subplot(111, projection='3d')
Xs = df.pivot(index='X', columns='Y', values='X').values
Ys = df.pivot(index='X', columns='Y', values='Y').values
Zs = df.pivot(index='X', columns='Y', values='Z').values
ax.plot_surface(Xs,Ys,Zs, cmap=cm.RdYlGn)
plt.show()
Although I have something working, I feel there must be a better way than what I'm doing. On a big data set, I would imagine that doing 3 pivots is an expensive way to plot something. Is there a more efficient way to transform this data?

I guess you can avoid some steps during the preparation of the data by not using pandas (but only numpy arrays) and by using some convenience functions provided by numpy, such as linspace and meshgrid.
I rewrote your code to do so, trying to keep the same logic and the same variable names:
import numpy as np
import matplotlib.pyplot as plt
import matplotlib.cm as cm
pattern = ("....1...."
"....1...."
"..11111.."
".1133311."
"111393111"
".1133311."
"..11111.."
"....1...."
"....1....")
# Extract the value according to your logic
Zdata = list(map(lambda d:0 if d == '.' else int(d), pattern))
# Assuming the pattern is always a square
size = int(len(Zdata) ** 0.5)
# Create a mesh grid for plotting the surface
Xdata = np.linspace(1, size, size)
Ydata = np.linspace(1, size, size)
Xs, Ys = np.meshgrid(Xdata, Ydata)
# Convert the Zdata to a numpy array with the appropriate shape
Zs = np.array(Zdata).reshape((size, size))
fig = plt.figure()
ax = fig.add_subplot(111, projection='3d')
# Plot the surface
ax.plot_surface(Xs, Ys, Zs, cmap=cm.RdYlGn)
plt.show()
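
If the data has to stay in a DataFrame, the three pivots can probably be reduced to one: pivot only the Z column and rebuild the coordinate grids from the pivot's row and column labels. The following is a sketch under the assumption that every (X, Y) pair occurs exactly once, as in the demo data; the variable name `grid` is introduced here for illustration:

```python
import numpy as np
import pandas as pd

pattern = ("....1...."
           "....1...."
           "..11111.."
           ".1133311."
           "111393111"
           ".1133311."
           "..11111.."
           "....1...."
           "....1....")

# Same demo data as in the question, without the unused Zi column
Zdata = [0 if d == "." else int(d) for d in pattern]
Xdata = [x for y in range(1, 10) for x in range(1, 10)]
Ydata = [y for y in range(1, 10) for x in range(1, 10)]
df = pd.DataFrame({"X": Xdata, "Y": Ydata, "Z": Zdata})

# One pivot gives the Z grid; rows are X values, columns are Y values
grid = df.pivot(index="X", columns="Y", values="Z")

# indexing="ij" makes Xs vary along rows and Ys along columns, matching grid
Xs, Ys = np.meshgrid(grid.index.to_numpy(), grid.columns.to_numpy(), indexing="ij")
Zs = grid.to_numpy()
```

`Xs`, `Ys` and `Zs` have the same layout as the arrays produced by the three pivots in the question, so they can be passed to `ax.plot_surface` unchanged.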

Contour plot with multiple files

I have a sequence of data files, each containing two columns of data (x value and z value). I want to assign each file a unique constant y value in a loop and then use the x, y, z values to make a contour plot.
import glob
import matplotlib.pyplot as plt
import numpy as np
# raw string so the backslashes in the Windows path are not treated as escapes
files=glob.glob(r'C:\Users\DDT\Desktop\DATA TIANYU\materials\AB2O4\synchronchron\OX1\YbFe1Mn1O4_2cyc_600_meth_ox1-*.xye')
s1=1
for file in files:
    t1=s1/3
    x,z = np.loadtxt(file,skiprows=3,unpack=True, usecols=[0,1])
    def f(x, y):
        return x*0 +y*0 +z
    l1=np.size(x)
    y=np.full(l1, t1,dtype=int)
    X,Y=np.meshgrid(x,y)
    Z = f(X,Y)
    plt.contour(X,Y,Z)
    s1=s1+1
    continue
plt.show()
There is no error in this code, however what I got is an empty figure with nothing.
What mistake did I make?

It is very hard to guess what you're trying to do. Here is an attempt. It assumes that all x-arrays are equal, and that the y value really makes sense (although that is hard if the files are read in an unspecified order). To get a useful plot, the data from all the files should be collected before starting to plot.
import glob
import matplotlib.pyplot as plt
import numpy as np
files = glob.glob('........')
zs = []
for file in files:
    x, z = np.loadtxt(file, skiprows=3, unpack=True, usecols=[0, 1])
    zs.append(z)
# without creating a new x, the x from the last file will be used
# x = np.linspace(0, 15, 10)
y = np.linspace(-100, 1000, len(zs))
zs = np.array(zs)
fig, axs = plt.subplots(ncols=2)
axs[0].scatter(np.tile(x, y.size), np.repeat(y, x.size), c=zs)
axs[1].contour(x, y, zs)
plt.show()
With simulated random data, the scatter plot and the contour plot would look like:
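
A minimal, self-contained sketch of such a simulation, with random arrays standing in for the files. The number of files, the number of points per file and the value ranges are assumptions chosen only for illustration:

```python
import matplotlib
matplotlib.use("Agg")  # headless backend; remove this line to display with plt.show()
import matplotlib.pyplot as plt
import numpy as np

rng = np.random.default_rng(0)
n_files, n_points = 8, 50              # assumed sizes, not taken from the real data
x = np.linspace(0, 15, n_points)       # x grid shared by all simulated "files"

# One simulated z curve per file: a random walk, stacked into shape (n_files, n_points)
zs = np.array([rng.normal(size=n_points).cumsum() for _ in range(n_files)])

# One constant y value per file
y = np.linspace(-100, 1000, n_files)

fig, axs = plt.subplots(ncols=2)
# tile/repeat expand x and y so that every z value gets its own (x, y) position
axs[0].scatter(np.tile(x, y.size), np.repeat(y, x.size), c=zs.ravel())
# contour expects z with shape (len(y), len(x))
axs[1].contour(x, y, zs)
```

The key point is the shape of `zs`: one row per file and one column per x value, which is what `contour(x, y, zs)` requires.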

Data Points not being plotted on a Matplotlib plot

Hello, I am attempting to write a program that plots graphs of various data sets from an Excel workbook. (The x axis is a fixed set of values, while the data values can be selected from the other columns.) However, the plot that is produced only contains the axes, while the data points are completely missing. The code I have used is as follows:
import xlrd
import matplotlib.pyplot as plt
from matplotlib.figure import *
loc = ("C:\\Users\\yeoho\\DCO_Raw_Data.xlsx")
wb = xlrd.open_workbook(loc)
sheet = wb.sheet_by_index(0)
sheet.cell_value(0,0)
x = [[sheet.cell_value(r,0)]for r in range(6,sheet.nrows)]
checkOn = True
while checkOn:
    FileName = [[sheet.cell_value(0,c)]for c in range(1,13)]
    print(FileName)
    print("Enter the Integer (1-n) corresponding to the file name that you would like to plot")
    z = int(input())
    y = [[sheet.cell_value(r,z)]for r in range(6,sheet.nrows)]
    fig = plt.figure()
    ax = fig.add_subplot(111)
    assert len(x) == len(y)
    for i in range(len(x)):
        plt.plot(x[i],y[i],color='black')
    plt.show()
    break
The code in lines 16-21 was taken from another Stack Overflow page: How to plot two lists of tuples with Matplotlib
The original code did not have a color parameter but I have found out that that is not the source of the issue.
I am unsure of what the issue here is. Thank you for taking your time to read this and I hope you can help me with this issue.
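
One likely cause, offered as an assumption since the spreadsheet itself is not available: each `plt.plot(x[i], y[i])` call receives exactly one point, and a line through a single point with no marker draws nothing visible. The sketch below uses synthetic one-element lists in place of the values read with `sheet.cell_value`, flattens them, and plots everything in one call with a marker:

```python
import matplotlib
matplotlib.use("Agg")  # headless backend; remove this line to display with plt.show()
import matplotlib.pyplot as plt

# Synthetic stand-ins for the one-element lists built from sheet.cell_value
x = [[float(r)] for r in range(10)]
y = [[float(r) ** 2] for r in range(10)]

# Flatten [[v0], [v1], ...] into [v0, v1, ...]
x_flat = [row[0] for row in x]
y_flat = [row[0] for row in y]

fig, ax = plt.subplots()
# A single call draws one connected line; the marker keeps every point visible
(line,) = ax.plot(x_flat, y_flat, color="black", marker="o")
```

Building `x` and `y` as flat lists in the first place, for example `[sheet.cell_value(r, 0) for r in range(6, sheet.nrows)]`, would make the flattening step unnecessary.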

Not getting the proper graph comparison using Python

I am trying to compare two CSV files and find the points where their curves intersect. I am plotting them to get a better understanding.
However, one curve appears very small compared to the other.
See the following:
Here is the data: trade-volume.csv
Here is the real graph:
Here is the data: miners-revenue.csv
Here is the real graph:
Here is the program I wrote for comparison:
import pandas as pd
import matplotlib.pyplot as plt
dat2 = pd.read_csv("trade-volume.csv", parse_dates=['time'])
dat3 = pd.read_csv("miners-revenue.csv", parse_dates=['time'])
dat2['timeDiff'] = (dat2['time'] - dat2['time'][0]).astype('timedelta64[D]')
dat3['timeDiff'] = (dat3['time'] - dat3['time'][0]).astype('timedelta64[D]')
fig, ax = plt.subplots()
ax.plot(dat2['timeDiff'], dat2['Value'])
ax.plot(dat3['timeDiff'], dat3['Value'])
plt.show()
I got the output like the following:
As one can see, the orange curve is so flattened that I cannot make out its points. I would like to overlay the two graphs and then compare them.
Please help me achieve this, ideally with as few changes to my existing code as possible.

The problem comes down to your y axis. One has a maximum of 60,000,000 while the other has a maximum of 6,000,000,000. Trying to plot these on the same graph is going to lead to one "looking" like a straight line even though it isn't if you zoom in.
A possible solution is to use a second y axis (you can change the color of the lines using the color= argument in ax.plot()):
import pandas as pd
import matplotlib.pyplot as plt
dat2 = pd.read_csv("trade-volume.csv", parse_dates=['time'])
dat3 = pd.read_csv("miners-revenue.csv", parse_dates=['time'])
dat2['timeDiff'] = (dat2['time'] - dat2['time'][0]).astype('timedelta64[D]')
dat3['timeDiff'] = (dat3['time'] - dat3['time'][0]).astype('timedelta64[D]')
fig, ax = plt.subplots()
ax.plot(dat2['timeDiff'], dat2['Value'], color="blue")
ax2=ax.twinx()
ax2.plot(dat3['timeDiff'], dat3['Value'], color="red")
plt.show()
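
The same idea as a self-contained sketch that runs without the CSV files. The two random series below are synthetic stand-ins whose magnitudes only roughly mimic the scales mentioned above; the axis labels are likewise assumptions:

```python
import matplotlib
matplotlib.use("Agg")  # headless backend; remove this line to display with plt.show()
import matplotlib.pyplot as plt
import numpy as np

days = np.arange(365)
rng = np.random.default_rng(1)
small = 6e7 * (0.5 + 0.5 * rng.random(days.size))  # values between 3e7 and 6e7
large = 6e9 * (0.5 + 0.5 * rng.random(days.size))  # values between 3e9 and 6e9

fig, ax = plt.subplots()
ax.plot(days, small, color="blue")
ax.set_ylabel("small series", color="blue")   # left axis belongs to the blue line

ax2 = ax.twinx()                              # second y axis sharing the same x axis
ax2.plot(days, large, color="red")
ax2.set_ylabel("large series", color="red")   # right axis belongs to the red line
```

Coloring each axis label like its line makes it clearer which scale applies to which curve. Note that crossings seen on such a plot depend on the two axis ranges and do not indicate equal values.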

The two datasets live on very different scales. You can normalize both in order to compare them.
import pandas as pd
import matplotlib.pyplot as plt
dat2 = pd.read_csv("trade-volume.csv", parse_dates=['time'])
dat3 = pd.read_csv("miners-revenue.csv", parse_dates=['time'])
dat2['timeDiff'] = (dat2['time'] - dat2['time'][0]).astype('timedelta64[D]')
dat3['timeDiff'] = (dat3['time'] - dat3['time'][0]).astype('timedelta64[D]')
fig, ax = plt.subplots()
ax.plot(dat2['timeDiff'], dat2['Value']/dat2['Value'].values.max())
ax.plot(dat3['timeDiff'], dat3['Value']/dat3['Value'].values.max())
plt.show()
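
The normalization step in isolation, as a small sketch with made-up numbers. The helper name `scale_by_max` is hypothetical, and the sketch assumes each series has a positive maximum:

```python
import numpy as np


def scale_by_max(values):
    """Divide a series by its maximum so that its largest value becomes 1."""
    values = np.asarray(values, dtype=float)
    return values / values.max()


# Made-up values on two very different scales
small = np.array([1.0e7, 3.0e7, 6.0e7])
large = np.array([2.0e9, 6.0e9, 3.0e9])

small_scaled = scale_by_max(small)
large_scaled = scale_by_max(large)
```

Dividing by the maximum keeps the shape of each curve but discards its absolute magnitude, so an intersection of the normalized curves shows where the relative levels match, not where the original values are equal.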

Matplotlib: Points do not show in SVG

I have a scatter plot that I'd like to output as SVG (Python 3.5). However, when used with agg as backend, some points are simply missing. See the data and the PNG and SVG output. Is this some kind of misconfiguration or a bug?
Code:
import matplotlib
matplotlib.use('agg')
import matplotlib.pyplot as plt
x = [22752.9597858324,33434.3100283611,None,None,3973.2239542398,None,None,None
,None,None,None,None,None,960.6513071797,None,None,None,None,None,None,None
,None,None,None,None,None,749470.931292081,None,None,None,None,None,None
,None,None,None,None,None,None,None,None,23045.262784499,None,None,None
,None,None,None,None,1390.8383822667,None,None,9802.5632611025
,3803.3240362092,None,None,None,None,None,2058.1191666219,None
,3777.5383953988,None,91224.0759036624,23296.1857550166,27956.249381887
,None,237247.707648005,None,None,None,None,None,None,None,None,None
,760.3493458787,None,321687.799104496,None,None,22339.5617383239,None,None
,None,None,None,28135.0261453192,None,None,None,None,None,None,None
,1687.4387356974,None,None,29037.8494868489,None,None,None,None,None,None
,None,3937.3066755226,None,None,None,None]
y = [63557.4319306279,None,None,None,9466.0204228915,None,None,None,None,None
,None,None,None,3080.3393940948,None,None,None,None,None,None,None,None
,None,None,None,None,592184.803802073,None,None,None,None,None,None,None
,None,None,None,None,None,None,None,18098.725166318,None,None,None,None
,None,None,None,789.2710621298,None,None,7450.9539135753,4251.6033622036
,None,None,None,None,None,1277.1691956597,None,4273.5950324508,None
,51861.5572682614,19415.3369388317,2117.2407148378,None,160776.887146683
,None,None,None,None,None,None,None,None,None,1550.3003177484,None
,402333.163939038,None,None,16604.3340243551,None,None,None,None,None
,32545.0784355136,None,None,None,None,None,None,None,2567.9264180605,None
,None,45786.935597305,None,None,None,None,None,None,None,5645.5218715636
,None,None,None,None]
fig = plt.figure()
ax = fig.add_subplot(111)
ax.plot(x, y, '.')
fig.savefig('/home/me/test_svg', format='svg')
fig.savefig('/home/me/test_png', format='png')
The result:
PNG:
SVG:

The problem seems to be related to the None values. Although no point is drawn where a coordinate is missing, the None entries still seem to affect how the SVG is rendered. Removing every pair in which either coordinate is None fixes the issue.
import numpy as np
data = np.array([x, y])
data = data.transpose()
# Filter out pairs of points of which at least one is None.
data = [pair for pair in data if pair[0] is not None and pair[1] is not None]
data = np.array(data, dtype=float).transpose()
x = data[0]
y = data[1]
ax.plot(x, y, '.')
fig.savefig('/home/me/test_svg', format='svg')
fig.savefig('/home/me/test_png', format='png')
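
An alternative way to do the same filtering with a boolean mask, shown as a sketch on a shortened, rounded excerpt of the data. It relies on NumPy converting None to nan when a list is converted with `dtype=float`:

```python
import numpy as np

# Shortened, rounded excerpt for illustration; not the full data set
x = [22752.96, 33434.31, None, 3973.22, None]
y = [63557.43, None, None, 9466.02, None]

# dtype=float turns every None into nan
x_arr = np.array(x, dtype=float)
y_arr = np.array(y, dtype=float)

# Keep only the pairs in which both coordinates are present
valid = ~(np.isnan(x_arr) | np.isnan(y_arr))
x_clean, y_clean = x_arr[valid], y_arr[valid]
```

Unlike a truthiness test on each value, the mask does not discard legitimate zeros.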

Update
This looks like a bug that was fixed some time between matplotlib 2.0.0 and 3.1.1. Upgrading solved the problem for me.
Original Answer
I ran into the same problem, so I created a minimal example to reproduce it:
import numpy as np
from matplotlib import pyplot as plt
data = np.array([1.0, np.nan, 1.0])
plt.plot(data, 'o')
plt.savefig('example.svg')
plt.savefig('example.png')
It works fine as a PNG:
However, the left point is missing from the SVG.
Using your suggestion of removing invalid data, I used the numpy indexing features:
import numpy as np
from matplotlib import pyplot as plt
data = np.array([1.0, np.nan, 1.0])
indexes = np.arange(data.size)
# np.negative is not supported for boolean arrays; logical_not inverts the mask
is_valid = np.logical_not(np.isnan(data))
plt.plot(indexes[is_valid], data[is_valid], 'o')
plt.savefig('example.svg')
plt.savefig('example.png')
Now the PNG and the SVG display both points.
