plot data from CSV file with matplotlib - python

I have a CSV file at e:\dir1\datafile.csv.
It contains three columns and 10 heading and trailing lines need to be skipped.
I would like to plot it with numpy.loadtxt(), for which I haven't found any rigorous documentation.
Here is what I started to write from the several tries I found on the web.
import matplotlib as mpl
import numpy as np
import matplotlib.pyplot as plt
import matplotlib.cbook as cbook
def read_datafile(file_name):
# the skiprows keyword is for heading, but I don't know if trailing lines
# can be specified
data = np.loadtxt(file_name, delimiter=',', skiprows=10)
return data
data = read_datafile('e:\dir1\datafile.csv')
x = ???
y = ???
fig = plt.figure()
ax1 = fig.add_subplot(111)
ax1.set_title("Mains power stability")
ax1.set_xlabel('time')
ax1.set_ylabel('Mains voltage')
ax1.plot(x,y, c='r', label='the data')
leg = ax1.legend()
plt.show()

According to the docs numpy.loadtxt is
a fast reader for simply formatted files. The genfromtxt function provides more sophisticated handling of, e.g., lines with missing values.
so there are only a few options to handle more complicated files.
As mentioned numpy.genfromtxt has more options. So as an example you could use
import numpy as np
data = np.genfromtxt('e:\dir1\datafile.csv', delimiter=',', skip_header=10,
skip_footer=10, names=['x', 'y', 'z'])
to read the data and assign names to the columns (or read a header line from the file with names=True) and than plot it with
ax1.plot(data['x'], data['y'], color='r', label='the data')
I think numpy is quite well documented now. You can easily inspect the docstrings from within ipython or by using an IDE like spider if you prefer to read them rendered as HTML.

I'm guessing
x= data[:,0]
y= data[:,1]

Related

How to scatter plot with 1 million points

I'm trying to make a program that draws a graph using given points from a csv file which contains 4 strings for each row (Number of the point, x pos, y pos, color), but the time it takes is ridiculously high, so i'm looking for ideas to make it faster.
from matplotlib import pyplot as plt
from matplotlib import style
import csv
style.use('ggplot')
s = 0.5
with open('total.csv') as f:
f_reader = csv.reader(f, delimiter=',')
for row in f_reader:
plt.scatter(str(row[1]), str(row[2]), color=str(row[3]), s=s)
plt.savefig("graph.png", dpi=1000)
The first step would be to call scatter once instead of for every points, without adding a dependency on numpy and pandas it could look like:
from matplotlib import pyplot as plt
from matplotlib import style
import csv
style.use("ggplot")
s = 0.5
x = []
y = []
c = []
with open("total.csv") as f:
f_reader = csv.reader(f, delimiter=",")
for row in f_reader:
x.append(row[1])
y.append(row[2])
c.append(row[3])
plt.scatter(x, y, color=c, s=s)
plt.savefig("graph.png", dpi=1000)
Then maybe try pandas.read_csv which would give you an pandas dataframe allowing you to access the columns of your CSV without a for loop, which would probably be faster.
Each time you try a variation, measure the time it take (possibly on a smaller file) to know what help and what don't, in other words, don't try to enhance perfs blindly.
Using pandas it would look like:
from matplotlib import pyplot as plt
from matplotlib import style
import pandas as pd
style.use("ggplot")
total = pd.read_csv("total.csv")
plt.scatter(total.x, total.y, color=total.color, s=0.5)
plt.savefig("graph.png", dpi=1000)
If you want to learn more on pandas good practices for performance, I like the No more sad pandas talk, take a look at it.

Matplotlib: Generating Subplots for Multiple Time Series

I have the following dataset that was randomly generated through a simulation I am building:
https://drive.google.com/drive/folders/1JF5QrliE9s8VPMaGc8Z-mwpFhNWkeYtk?usp=sharing
For debugging purposes, I would like to be able to view this data in a series of small multiples. Like this:
I am attempting to do this using matplotlib and pandas. Here is my code for that:
import pandas as pd
import matplotlib.pyplot as plt
from matplotlib.backends.backend_pdf import PdfPages
def graph_file(f: str):
"""
Graphs a single file of data
and exports it as a pdf of separate charts.
"""
data = pd.read_csv(f)
header = data.columns
fname = f[:-4] + '.pdf'
with PdfPages(fname) as pdf:
n = len(header)
time: str = header[0]
# Multiple charts on one page
fig = plt.figure()
for i in range(1, n):
y: str = header[i]
ax = fig.add_subplot()
data.plot(x=time, y=y)
pdf.savefig(bbox_inches='tight')
When I open up the .csv file and try to run the function using a Jupyter notebook, I get the same deprecation warning over and over again:
<ipython-input-5-0563709f3c08>:24: MatplotlibDeprecationWarning: Adding an axes using the same arguments as a previous axes currently reuses the earlier instance. In a future version, a new instance will always be created and returned. Meanwhile, this warning can be suppressed, and the future behavior ensured, by passing a unique label to each axes instance.
ax = fig.add_subplot()
The resulting pdf file does not contain a single page with multiple graphs (which is what I want like in the first image) but just a single page with a single graph:
What exactly am I doing wrong? I greatly appreciate any feedback you can give.
Here is a solution that should meet your needs. It reads the csv file into a dataframe and iterates through the columns of the dataframe to plot corresponding subplots.
import pandas as pd
import matplotlib.pyplot as plt
from matplotlib.backends.backend_pdf import PdfPages
def graph_file(f: str):
df = pd.read_csv(f)
fig, axs = plt.subplots(nrows=3, ncols=3)
fig.set_size_inches(20, 10)
fig.subplots_adjust(wspace=0.5)
fig.subplots_adjust(hspace=0.5)
fname = f[:-4] + '.pdf'
with PdfPages(fname) as pdf:
for col, ax in zip(df.columns[1:], axs.flatten()):
ax.plot(df['time (days)'], df[col])
ax.set(xlabel='time (days)', ylabel=col)
ax.tick_params(axis='x', labelrotation=30)
pdf.savefig(bbox_inches='tight')
plt.show()

HDF5 file to diagram in python

I'm trying to generate some diagrams from an .h5 file but I don't know how to do it.
I'm using pytables, numpy and matplotlib.
The hdf5 files I use contains 2 sets of data, 2 differents curves.
My goal is to get diagrams like this one.
This is what I managed to do for the moment:
import tables as tb
import numpy as np
import matplotlib.pyplot as plt
h5file = tb.openFile(args['FILE'], "a")
for group in h5file.walkGroups("/"):
for array in h5file.walkNodes("/","Array"):
if(isinstance(array.atom.dflt, int)):
tab = np.array(array.read())
x = tab[0]
y = tab[1]
plt.plot(x, y)
plt.show()
x and y values are good but I don't know how to use them, so the result is wrong. I get a triangle instead of what I want ^^
Thank you for your help
EDIT
I solved my problem.
Here is the code :
fig = plt.figure()
tableau = np.array(array.read())
x = tableau[0]
y = tableau[1]
ax1 = fig.add_subplot(211)
ax2 = fig.add_subplot(212)
ax1.plot(x)
ax2.plot(y)
plt.title(array.name)
plt.show()

Extract header names from a CSV and use it to plot against each other in Python?

I am pretty new to python and coding in general. I have this code so far.
import numpy as np
import matplotlib.pyplot as plt
data = np.loadtxt('data.csv', delimiter=',', skiprows=1)
mSec = data[:,0]
Airspeed = data[:,10]
AS_Cmd = data[:,25]
airspeed = data[:,3]
plt.rc('xtick', labelsize=25) #increase xaxis tick size
plt.rc('ytick', labelsize=25) #increase yaxis tick size
fig, ax = plt.subplots(figsize=(40,40), edgecolor='b')
ax.patch.set_facecolor('white')
ax.plot(mSec, Airspeed, label='Ground speed [m/s]')
ax.plot(mSec, AS_Cmd, label='Voltage [V]')
plt.legend(loc='best',prop={'size':20})
fig.savefig('trans2.png', dpi=(200), bbox_inches='tight') #borderless on save
However, I don't want to individually read every data column there is. I want to be able to load a csv file and have it read out all column names, then asks the users what you want for your x-axis and y-axis and plots that graph. The csv file format is:
time(s),speed(mph),heading,bvoltage(v)
20,30,50,10
25,45,50,10
30,50,55,9
Here is my attempt at the code but I am missing a lot of information:
import numpy as np
import matplotlib.pyplot as plt
data = np.loadtxt('data.csv', delimiter=',')
## names = where I will store the column names
white True:
## display names to user
print ('Pick your x-axis')
xaxis = input()
print ('Pick your y-axis')
yaxis1 = input()
print('pick a 2nd y-axis or enter none')
yaxis2 = input()
if input()= 'none'
break;
else continue
#plot xaxis vs yaxis vs 2nd yaxis
I understand the loop is not correct. I don't want anyone to correct me on that I will figure it out myself, however, I would like a way to access those values from the CSV file so that I can use it in that method.
Using pandas you can do:
import pandas as pd
data = pd.read_csv("yourFile.csv", delimiter=",")
and plot columns with names ColName1, ColName2 against each other with:
data.plot(x='Col1', y='Col2')
If you have a first line in the csv file with the desired names of the columns, pandas will pick those automatically, otherwise you can play with the header argument of read_csv.
http://pandas.pydata.org/pandas-docs/stable/generated/pandas.read_csv.html
If you don't mind using/installing another module then pandas should do it.

Error messages in Matplotlib

I am trying to chart this data in matplotlib and I am getting the following error message:
raise TypeError('Unrecognized argument type %s to close'%type(arg))
TypeError: Unrecognized argument type <type 'list'> to close
The data I am sending to it is not a string, it is a float as you can see from the code below:
import os
import csv
import glob as g
import pprint as p
import matplotlib.pyplot as plt
os.chdir('F:\\')
def graphWriter():
for file in g.glob('*.TXT'):
for col in csv.DictReader(open(file,'rU')):
set_ = int(col[' Set'])
iriR = float(col[' IRI R e'])
iriL = float(col['IRI LWP '])
rutL = float(col[' RUT L e'])
rutR = float(col[' RUT R e'])
start = float(col['Start-Mi'])
end = float(col[' End-Mi'])
fig = plt.plot(iriR,iriL)
plt.show()
plt.close(fig)
graphWriter()
Though the window is coming up to chart the data and the units are correct, there is also no line in the chart, probably that's stemming from the apparent data issue. So the question is whats causing the error message, and whats causing there to be no data lines in the chart. But the two are most likely related. Here is some of the input data though I am only trying to graph the two datasets to the right side which would be iriR and iriL as show above:
(194.449, 194.549, 90.0, 77.9)
(194.549, 194.649, 84.6, 81.5)
(194.649, 194.749, 88.4, 84.1)
(194.749, 194.849, 69.5, 82.9)
(194.849, 194.949, 76.2, 71.0)
The problem is that the function plt.plot returns a list of lines (that were added to the plot), and not a Figure object --- while plt.close only accepts a Figure object. There are numerous ways to work around this,
First, get the figure object ("get current figure"):
fig = plt.gcf()
plt.close(fig)
Second, call close with no arguments: plt.close() --- this will automatically close the active figure.
Third, close all figures: plt.close('all').
All of these usages are covered in the matplotlib.pyplot.close documentation.
Edit:
The next issue is that you're not storing an array of values to your variables, instead you're just storing a single floating value. You can initialize a list, and store new elements to it.
os.chdir('F:\\')
iriR = [] # Initialize a list
def graphWriter():
for file in g.glob('*.TXT'):
for col in csv.DictReader(open(file,'rU')):
set_ = int(col[' Set'])
iriR.append(float(col[' IRI R e'])) # Append new entry
Do the same thing for the other variables that you want to plot.
Maybe this will work.
import pandas as pd
import matplotlib.pyplot as plt
import glob as g
def graphWriter():
data = {}
for file in g.glob('*.TXT'):
data[file] = pd.read_csv(file)
# Removes ')' and turn it into float
data[file][3] = data[file][3].apply(lambda x:x[:-1]).astype(float)
fig, ax = plt.subplots()
for d in data.itervalues():
ax.plot(d[:,2], d[:,3])
plt.show()
plt.close(fig)
graphWriter()
The function will obtain a list of files ended in .TXT then it will load them into a dictionary in which the keys are the names of the files. Later will plot them.
Update
Since the OP posted that pandas is unavailable, its is possible to use numpy.
import numpy as np
import matplotlib.pyplot as plt
import glob as g
def graphWriter():
data = {}
for file in g.glob('*.TXT'):
data[file] = np.fromregex(file, '\d*\.\d*',
dtype=[('1', float), ('2', float),
('3', float), ('4', float)])
fig, ax = plt.subplots()
for d in data.itervalues():
ax.plot(d['3'], d['4'])
plt.show()
plt.close(fig)
graphWriter()

Categories