Converting strings to floats from a .csv file and generating a histogram - python

I've been working with .csv files to generate a histogram from the data. The data looks something like this:
102.919 103.36
102.602 103.05
104.106 104.57
108.791 109.26
104.045 104.52
104.324 104.77
105.106 105.57
102.619 103.08
102.124 102.6
Here's the code I have written
# histplot.py
import numpy as np
import matplotlib.pyplot as plt
import csv
with open('datafile.csv', 'rU') as data:
    reader = csv.DictReader(data, delimiter=' ', quoting=csv.QUOTE_NONNUMERIC)
    for line in reader:
        t = float(line)
        data.append(t)
    reader.close()
# generate the histogram
hist, bin_edges=np.histogram(data, bins=50, range=[80,135])
# generate histogram figure
plt.hist(data, bin_edges)
plt.savefig('chart_file', format="pdf")
plt.show()
Running this code gives me an error: ValueError: could not convert string to float: '102.919,103.36'
Can someone give me a few ideas on converting strings to floats when reading a csv file?
Thank you in advance.

First of all, with open('datafile.csv', 'rU') as data: means that data is a file handle to the file. You can iterate over this file handle, but you cannot append anything to it.
Second, csv.DictReader provides access to each row as a dictionary. In this case I would recommend csv.reader, which gives each row as a list.
Third, you cannot convert a whole line, be it a dictionary or a list, to a float. You can only do that with a single element of the list. (This is where the error comes from.) Conversion to float isn't even necessary here, since with quoting=csv.QUOTE_NONNUMERIC the reader already converts unquoted fields to float.
Now, you can simply append the elements line by line to an initially empty list and supply this list to the histogram function.
import numpy as np
import matplotlib.pyplot as plt
import csv
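data = []  # create empty list
# the 'rU' mode was removed in Python 3.11; newline='' is what the csv docs recommend
with open('datafile.csv', newline='') as f:
    reader = csv.reader(f, delimiter=' ', quoting=csv.QUOTE_NONNUMERIC)
    for line in reader:
        data.extend(line)  # each line is already a list of floats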
# generate the histogram
hist, bin_edges=np.histogram(data, bins=50, range=[80,135])
# generate histogram figure
plt.hist(data, bin_edges)
#plt.savefig('chart_file', format="pdf")
plt.show()
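To see what the reader does with quoting=csv.QUOTE_NONNUMERIC, here is a minimal sketch that feeds it in-memory text instead of a file. The helper name read_floats and the sample strings are made up for illustration. The last part is an assumption on my side: the error message in the question quotes '102.919,103.36', which hints that the real file may be comma-separated, and a delimiter that does not match the file would produce exactly that kind of ValueError.

```python
import csv
import io


def read_floats(text, delimiter):
    """Flatten every unquoted numeric field in `text` into one list of floats."""
    reader = csv.reader(io.StringIO(text), delimiter=delimiter,
                        quoting=csv.QUOTE_NONNUMERIC)
    values = []
    for row in reader:
        values.extend(row)  # the reader has already converted each field
    return values


space_separated = "102.919 103.36\n102.602 103.05\n"
comma_separated = "102.919,103.36\n102.602,103.05\n"

print(read_floats(space_separated, " "))
print(read_floats(comma_separated, ","))

# With the wrong delimiter the whole line is one field, which float() rejects.
try:
    read_floats(comma_separated, " ")
except ValueError as exc:
    print("mismatched delimiter:", exc)
```

If the last call fails for your file in the same way, changing delimiter to ',' is probably the quickest thing to try.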
Let me just mention that the whole data reading can be done in a much simpler way, using numpy.loadtxt.
Also, plotting the histogram may be simplified, in case no further data processing needs to take place.
import numpy as np
import matplotlib.pyplot as plt
data = np.loadtxt('datafile.csv').flatten()
plt.hist(data, bins=50, range=[80,135])
plt.show()
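If you want to check the numbers before plotting, the same loadtxt call can be tried on a few lines of in-memory text. This is only a sketch: the three sample rows are copied from the question, and io.StringIO stands in for the real file.

```python
import io

import numpy as np

# Three rows from the question, used in place of datafile.csv.
sample = io.StringIO("102.919 103.36\n102.602 103.05\n104.106 104.57\n")

# loadtxt splits on whitespace by default and returns a 2-D float array;
# flatten() turns it into the 1-D array that a histogram expects.
data = np.loadtxt(sample).flatten()

# Same binning as in the answer: 50 bins between 80 and 135.
counts, bin_edges = np.histogram(data, bins=50, range=(80, 135))

print(data.shape)        # number of values read
print(counts.sum())      # values that fell inside the range
print(len(bin_edges))    # one more edge than there are bins
```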

Related

Matplotlib: blank plot and window won't close

I'm trying to plot a curve using the data from a csv file using:
import matplotlib.pyplot as plt
from csv import reader
with open('transmission_curve_HST_ACS_HRC.F606W.csv', 'rw') as f:
    data = list(reader(f))
wavelength_list = [i[0] for i in data[1::]]
percentage = [i[1] for i in data[1::]]
plt.plot(wavelength_list, percentage)
plt.show()
But all it does is open a completely blank window, and I can't close it unless I close the terminal.
The csv file looks like this:
4565,"0,00003434405472044760"
4566,"0,00004045191689260860"
4567,"0,00004656394357747830"
4568,"0,00005267963655205460"
4569,"0,00005879949856084820"
Do you have any idea why?
You need to modify three things in your code:
Change 'rw' to 'r' when you read from the file
Correct the way you iterate over data
Convert the numbers from the second column to float
import matplotlib.pyplot as plt
from csv import reader
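with open('transmission_curve_HST_ACS_HRC.F606W.csv', 'r') as f:
    data = list(reader(f))
# convert the first column too, so matplotlib treats it as numbers, not labels
wavelength_list = [float(i[0]) for i in data]
# the second column uses a decimal comma, so swap it for a point first
percentage = [float(i[1].replace(',', '.')) for i in data]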
plt.plot(wavelength_list, percentage)
plt.show()
Content of the csv file:
4564,"0,00002824029270045730"
4565,"0,00003434405472044760"
4566,"0,00004045191689260860"
4567,"0,00004656394357747830"
4568,"0,00005267963655205460"
4569,"0,00005879949856084820"
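The decimal-comma conversion can be tried without the file or a plot window. A small sketch, where parse_decimal_comma is a made-up helper name and the two rows are taken from the file contents above:

```python
import csv
import io

# Two rows from the file above; the quotes keep the decimal comma in one field.
sample = io.StringIO(
    '4565,"0,00003434405472044760"\n'
    '4566,"0,00004045191689260860"\n'
)


def parse_decimal_comma(text):
    """Convert a number written with a decimal comma, e.g. '0,5', to float."""
    return float(text.replace(",", "."))


wavelengths, percentages = [], []
for wavelength, percentage in csv.reader(sample):
    wavelengths.append(float(wavelength))
    percentages.append(parse_decimal_comma(percentage))

print(wavelengths)
print(percentages)
```

Both lists now hold floats, so plt.plot(wavelengths, percentages) draws a numeric axis.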

How to convert from a string to a float when taking the data from excel

I imported data from an excel sheet, only reading the first two rows. Whenever I try to run the code my error is that it could not convert from string to float. Is there any way to fix this? My code is below.
%matplotlib notebook
import numpy as np
import matplotlib.pyplot as plt
import math
file = 'Cepheid_proj.csv'
Cepheid1 = np.loadtxt(file, skiprows=1)
fig= plt.figure()
plt.title('Luminosity vs Period\n')
axes=fig.add_subplot(111)
plt.ylabel('Luminosity (W)')
plt.xlabel('Period (days)')
plt.xlim((44500.0,46000.0))
plt.ylim((9.0,12.0))
axes.plot(Cepheid1[:,0], Cepheid1[:,1], label='Ca ces')
plt.legend(loc=1, prop={'size': 7})
plt.show()
You can try something like this:
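import csv
import numpy
# csv.reader needs a text-mode file in Python 3, so "rb" would fail
with open("Cepheid_proj.csv", newline="") as f:
    output = csv.reader(f, delimiter=",")
    next(output)  # skip the header row, like skiprows=1 in the question
    x = list(output)
float_output = numpy.array(x).astype("float")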
If this code does not work, take a look at the following steps:
1. You have a CSV file; read up on how to use it, see this
2. With numpy you can use something like this:
from numpy import genfromtxt
# skip_header=1 skips the header row, like skiprows=1 in the question
csv_data = genfromtxt('Cepheid_proj.csv', delimiter=',', skip_header=1)
3. Read about Python data type conversion, see this link
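It may also be enough to keep np.loadtxt from the question and just tell it the delimiter. This is a sketch under assumptions: I am guessing that the file is comma-separated with one header row, and the column names and numbers below are invented sample data, not the asker's.

```python
import io

import numpy as np

# Invented stand-in for Cepheid_proj.csv: one header row, comma-separated values.
sample = io.StringIO(
    "period,luminosity\n"
    "44600.0,9.5\n"
    "45000.0,10.2\n"
    "45800.0,11.7\n"
)

# Without delimiter=',' loadtxt splits on whitespace, sees '44600.0,9.5'
# as a single field and cannot convert it to float.
cepheid = np.loadtxt(sample, delimiter=",", skiprows=1)

period = cepheid[:, 0]
luminosity = cepheid[:, 1]
print(cepheid.shape, period, luminosity)
```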

Error in Python 3.4 (Spyder) while plotting on Open Mandriva

I was trying to plot an IR spectrum from csv file, like this :
import matplotlib.pyplot as plt
file=open('261.1_2014-12-10t16-33-55.csv')
for line in file :
    data.append(line)
    pointset=data[6:]
    for point in pointset:
        res=point.split(',')
        h=float(res[0])
        wn.append(h)
        y=float(res[1])
        Ads.append(y)
plt.plot(wn,Ads)
plt.show()
But instead of a single line, I get a huge number of them.
The variables Ads and wn have many more entries than pointset and data.
What is wrong?
You are iterating over the lines in the file twice. For each line in the file, you iterate over each point in pointset, but pointset is just the set of all lines read so far except the first six.
I think this is what you want:
from matplotlib import pyplot as plt
wn, Ads = [], []  # the lists must exist before anything is appended
with open('filename.csv') as file:
    for ii, line in enumerate(file):
        if ii >= 6:  # skip lines 0, 1, 2, 3, 4, 5
            fields = line.split(",")
            wn.append(float(fields[0]))
            Ads.append(float(fields[1]))
plt.plot(wn, Ads)
plt.show()
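The single-pass idea can be checked on in-memory text. In this sketch itertools.islice does the header skipping instead of the enumerate test, which is just an alternative and not what the answer above uses. The six header lines and the two data rows are invented.

```python
import io
from itertools import islice

# Invented file contents: six header lines followed by two data rows.
text = "".join(f"header line {n}\n" for n in range(6))
text += "4000.0,0.12\n3999.5,0.15\n"

wn, ads = [], []
# islice(..., 6, None) yields every line from index 6 onwards, exactly once.
for line in islice(io.StringIO(text), 6, None):
    fields = line.split(",")
    wn.append(float(fields[0]))
    ads.append(float(fields[1]))  # float() ignores the trailing newline

print(wn, ads)
```

Each data row ends up in the lists once, so wn and ads have the same length as the number of data lines.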

How do I make a histogram from a csv file which contains a single column of numbers in python?

I have a csv file (excel spreadsheet) of a column of roughly a million numbers. I want to make a histogram of this data with the frequency of the numbers on the y-axis and the number quantities on the x-axis. I know matplotlib can plot a histogram, but my main problem is converting the csv file from string to float since a string can't be graphed. This is what I have:
import matplotlib.pyplot as plt
import csv
with open('D1.csv', 'rb') as data:
    rows = csv.reader(data, quoting = csv.QUOTE_NONNUMERIC)
    floats = [[item for number, item in enumerate(row) if item and (1 <= number <= 12)] for row in rows]
plt.hist(floats, bins=50)
plt.title("histogram")
plt.xlabel("value")
plt.ylabel("frequency")
plt.show()
You can do it in one line with pandas:
import pandas as pd
pd.read_csv('D1.csv', quoting=2)['column_you_want'].hist(bins=50)
Okay I finally got something to work with headings, titles, etc.
import matplotlib.pyplot as plt
import pandas as pd
data = pd.read_csv('D1.csv', quoting=2)
data.hist(bins=50)
plt.xlim([0,115000])
plt.title("Data")
plt.xlabel("Value")
plt.ylabel("Frequency")
plt.show()
My first problem was that matplotlib is needed to actually show the graph. Also, I needed to assign the result of
pd.read_csv('D1.csv', quoting=2)
to data so I could plot its histogram with
data.hist
Thank you all for the help.
pandas' read_csv is very powerful, but if your csv file is simple (no headers, NaNs, or comments) you do not need pandas; you can use NumPy:
import numpy as np
import matplotlib.pyplot as plt
data = np.loadtxt('D1.csv')
plt.hist(data, density=True, bins='auto')  # density replaces normed, which was removed in matplotlib 3.1
plt.show()
(In fact loadtxt can deal with some headers and comments, but read_csv is more versatile)
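For a single-column file, loadtxt returns a plain 1-D array. The sketch below uses five made-up numbers in place of D1.csv and np.histogram with density=True to show what a normalized histogram means: the bar heights times the bin widths add up to 1.

```python
import io

import numpy as np

# Made-up single-column data standing in for D1.csv.
sample = io.StringIO("1.0\n2.0\n2.5\n3.0\n10.0\n")
data = np.loadtxt(sample)  # one value per line gives a 1-D float array

# density=True scales the counts so that the histogram integrates to 1.
density, edges = np.histogram(data, bins=3, density=True)
area = float(np.sum(density * np.diff(edges)))

print(data.shape, area)
```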

plot data from CSV file with matplotlib

I have a CSV file at e:\dir1\datafile.csv.
It contains three columns, and 10 heading lines and 10 trailing lines need to be skipped.
I would like to load it with numpy.loadtxt() and plot it, but I haven't found any rigorous documentation for that function.
Here is what I have pieced together so far from several attempts I found on the web.
import matplotlib as mpl
import numpy as np
import matplotlib.pyplot as plt
import matplotlib.cbook as cbook
def read_datafile(file_name):
    # the skiprows keyword is for heading, but I don't know if trailing lines
    # can be specified
    data = np.loadtxt(file_name, delimiter=',', skiprows=10)
    return data
data = read_datafile('e:\dir1\datafile.csv')
x = ???
y = ???
fig = plt.figure()
ax1 = fig.add_subplot(111)
ax1.set_title("Mains power stability")
ax1.set_xlabel('time')
ax1.set_ylabel('Mains voltage')
ax1.plot(x,y, c='r', label='the data')
leg = ax1.legend()
plt.show()
According to the docs numpy.loadtxt is
a fast reader for simply formatted files. The genfromtxt function provides more sophisticated handling of, e.g., lines with missing values.
so there are only a few options to handle more complicated files.
As mentioned numpy.genfromtxt has more options. So as an example you could use
import numpy as np
# raw string, so the backslashes in the Windows path are taken literally
data = np.genfromtxt(r'e:\dir1\datafile.csv', delimiter=',', skip_header=10,
                     skip_footer=10, names=['x', 'y', 'z'])
to read the data and assign names to the columns (or read a header line from the file with names=True) and then plot it with
ax1.plot(data['x'], data['y'], color='r', label='the data')
I think numpy is quite well documented now. You can easily inspect the docstrings from within IPython, or use an IDE like Spyder if you prefer to read them rendered as HTML.
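The genfromtxt options mentioned above can be tried on in-memory text before pointing them at the real file. This is a sketch with invented contents: 2 heading lines and 2 trailing lines instead of 10, and made-up voltage readings.

```python
import io

import numpy as np

# Invented file contents: 2 heading lines, 3 data rows, 2 trailing lines.
text = (
    "logger,model,unknown\n"
    "started,at,midnight\n"
    "0.0,229.8,50.0\n"
    "1.0,230.1,50.0\n"
    "2.0,230.4,49.9\n"
    "stopped,at,noon\n"
    "end,of,log\n"
)

# skip_header and skip_footer drop lines at both ends;
# names turns the result into a structured array with named columns.
data = np.genfromtxt(io.StringIO(text), delimiter=",",
                     skip_header=2, skip_footer=2, names=["x", "y", "z"])

x, y = data["x"], data["y"]
print(x, y)
```

With the real file you would pass the path and use 10 for both skip counts, as in the answer.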
I'm guessing
x= data[:,0]
y= data[:,1]
