Python - ValueError: could not convert string to float

I am a beginner in Python and I'm trying to plot some data from a file. The code is the following:
import matplotlib.pyplot as plt
import pandas as pd
from scipy.signal import find_peaks
import os
dataFrame = pd.read_csv('soporte.txt', sep='\t',skiprows=1, encoding = 'utf-8-sig')
x = dataFrame['Wavelength nm.']
y = dataFrame['Abs.']
indices, _ = find_peaks(y, threshold=1)
plt.plot(x, y)
plt.show()
And I get the following error:
ValueError: could not convert string to float: '-0,04008'
I'll show you a piece of the file I am trying to work with:
"soporte.spc - RawData"
"Wavelength nm." "Abs."
180,0 -0,04008
181,0 -0,00084
182,0 -0,00746
183,0 0,00854
184,0 -0,01525
185,0 -0,00354
Thank you very much!!!

Use the decimal=',' option of pd.read_csv so that pandas treats the comma as the decimal separator:
dataFrame = pd.read_csv('soporte.txt', sep='\t',skiprows=1, encoding = 'utf-8-sig', decimal=',')
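The error comes from Python's float() accepting only a dot as the decimal separator, so '-0,04008' cannot be parsed. A minimal sketch that checks decimal=',' on an in-memory copy of the sample above, assuming the file is tab-separated as the question's sep='\t' implies (io.StringIO stands in for the real file):

```python
import io

import pandas as pd

# In-memory stand-in for soporte.txt, assuming tab-separated columns
text = (
    '"soporte.spc - RawData"\n'
    '"Wavelength nm."\t"Abs."\n'
    "180,0\t-0,04008\n"
    "181,0\t-0,00084\n"
    "182,0\t-0,00746\n"
)

# decimal=',' makes pandas read "-0,04008" as the float -0.04008
df = pd.read_csv(io.StringIO(text), sep="\t", skiprows=1, decimal=",")
print(df.dtypes)

# Plain float() fails on the comma-decimal string
try:
    float("-0,04008")
except ValueError as exc:
    print(exc)
```

If the data is already loaded as strings, df['Abs.'].str.replace(',', '.').astype(float) is another way to convert a single column.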

Related

pandas.read_csv() returns strings from columns instead numbers

I am trying to make a linear regression plot for the data provided:
import pandas
from pandas import DataFrame
import matplotlib.pyplot
data = pandas.read_csv('cost_revenue_clean.csv')
data.describe()
X = DataFrame(data,columns=['production_budget_usd'])
y = DataFrame(data,columns=['worldwide_gross_usd'])
When I try to plot it:
matplotlib.pyplot.scatter(X,y)
matplotlib.pyplot.show()
the plot is completely empty,
and when I print the type of each element of X:
for element in X:
    print(type(element))
it shows that the type is string. Where am I going wrong?
There is no need to create new DataFrames for X and y. Select the columns directly and use astype(float) to make them numeric:
X = data['production_budget_usd'].astype(float)
y = data['worldwide_gross_usd'].astype(float)
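Note that iterating over a DataFrame, as in the question's for loop, yields the column labels rather than the cell values, so type(element) is str for any column named by a string; checking dtypes is a more reliable test. A small sketch with made-up values (the real contents of cost_revenue_clean.csv are not shown), also showing pd.to_numeric with errors='coerce' as an alternative to astype(float) that turns unparseable entries into NaN instead of raising:

```python
import pandas as pd

# Made-up stand-in for cost_revenue_clean.csv, with numbers stored as strings
data = pd.DataFrame({
    "production_budget_usd": ["1000", "2500", "4000"],
    "worldwide_gross_usd": ["3000", "n/a", "9000"],
})

X = data[["production_budget_usd"]]

# Looping over a DataFrame gives column labels, not cell values
labels = [element for element in X]
print(labels)

# dtypes shows what the columns actually hold
print(data.dtypes)

# astype(float) works when every entry is a valid number
x = data["production_budget_usd"].astype(float)

# to_numeric(errors="coerce") turns entries such as "n/a" into NaN
y = pd.to_numeric(data["worldwide_gross_usd"], errors="coerce")
print(x.dtype, y.dtype)
```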

Drawing a graph using matplotlib

I have a text file that looks like this:
14:49:15
0.00152897834778
14:49:22
0.00193500518799
14:49:29
0.00154614448547
14:49:36
0.0024299621582
14:49:43
0.00161910057068
14:49:50
0.00165987014771
14:49:57
0.00150108337402
I want to create a graph with the plot() method in which every odd line of the text file is a coordinate on the x axis and every even line is the y coordinate for its respective x (the line before it).
In this particular case, 14:49:15 would be the first x and 0.00152897834778 the first y.
You could convert the times to numbers with matplotlib.dates.date2num and plot them:
import numpy as np
import matplotlib.pyplot as plt
import matplotlib.dates
import datetime
string = """14:49:15
0.00152897834778
14:49:22
0.00193500518799
14:49:29
0.00154614448547
14:49:36
0.0024299621582
14:49:43
0.00161910057068
14:49:50
0.00165987014771"""
x = string.split('\n')[::2]
x = matplotlib.dates.date2num([datetime.datetime.strptime(xi, '%H:%M:%S') for xi in x])
y = np.array(string.split('\n')[1::2], dtype=float)
plt.plot(x, y)
plt.show()
You can split the input at line breaks with .split("\n"), convert every other line to a datetime object and the lines in between to floats. Then plt.plot() (or the older plt.plot_date(), deprecated since Matplotlib 3.9) gives a plot showing the times.
import datetime
import numpy as np
import matplotlib.pyplot as plt
u = u"""14:49:15
0.00152897834778
14:49:22
0.00193500518799
14:49:29
0.00154614448547
14:49:36
0.0024299621582
14:49:43
0.00161910057068
14:49:50
0.00165987014771"""
# split the string by linebreaks
l = u.split("\n")
# take every second substring and convert it to datetime
x = [datetime.datetime.strptime(i, "%H:%M:%S") for i in l[::2] ]
# take every second substring starting at the second one and convert it to float
y = [float(v) for v in l[1::2]]
# plt.plot accepts datetime objects directly
plt.plot(x, y)
plt.show()
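Both answers parse an embedded string. A minimal sketch of the same pairing when reading from a file object instead, skipping blank lines; read_pairs is a hypothetical helper name, and io.StringIO stands in for open('your_file.txt'):

```python
import datetime
import io

def read_pairs(f):
    """Return (times, values) from alternating time / value lines."""
    lines = [line.strip() for line in f if line.strip()]
    # lines 1, 3, 5, ... (1-based) are times; strptime fills in 1900-01-01 as the date
    times = [datetime.datetime.strptime(t, "%H:%M:%S") for t in lines[::2]]
    # lines 2, 4, 6, ... (1-based) are the matching y values
    values = [float(v) for v in lines[1::2]]
    return times, values

sample = io.StringIO("14:49:15\n0.00152897834778\n14:49:22\n0.00193500518799\n")
x, y = read_pairs(sample)
print(x[0].time(), y[0])
```

With a real file this would be used as `with open(path) as f: x, y = read_pairs(f)` followed by plt.plot(x, y).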

ValueError: Could not convert string to float: "nbformat":4

After a long calculation, I have files that contain the following lines
(values are separated by "\t" and each line ends with "\n"):
0.0000008375000 829.685601736 555.939928236
0.0000008376000 829.511081539 555.889353246
0.0000008377000 829.336613968 555.838785601
0.0000008378000 829.162199002 555.7882253
0.0000008379000 828.987836621 555.737672342
0.0000008380000 828.813526805 555.687126727
0.0000008381000 828.639269533 555.636588453
Then I tried to plot these files (their names start with P):
import glob as gl
import numpy as np
import matplotlib.pyplot as plt
fList = np.array(gl.glob("P*"))
for i in fList:
    f = open(i, "r")
    data = f.read()
    data = data.replace("\n", "\t")
    data = np.array(data.split("\t"))[:-1].reshape(-1,3)
    plt.plot(data[:,0], data[:,1], label=i)
Then I got the following error
(the traceback points to the line plt.plot(data[:,0], data[:,1], label=i)):
ValueError: could not convert string to float: "nbformat": 4,
I've looked at some other tutorials and walkthroughs but unfortunately could not figure out how to fix this issue. Any help or advice would be greatly appreciated.
You can use NumPy to read each file directly into three arrays:
import numpy as np
import matplotlib.pyplot as plt
from glob import glob
fList = glob("P*")
for i in fList:
    x,y,z = np.loadtxt(i, unpack=True)
    plt.plot(x,y, label=i)
plt.legend()
plt.show()
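The string "nbformat": 4 is a key in the JSON of a Jupyter notebook (.ipynb) file, so the pattern P* most likely also matched a notebook whose name starts with P, and np.loadtxt would fail on that file too. A hedged sketch that filters notebooks out; exclude_notebooks is a hypothetical helper, and if the data files share an extension, a narrower pattern such as "P*.txt" (the extension is an assumption) does the same job:

```python
from glob import glob

def exclude_notebooks(paths):
    """Drop Jupyter notebook files from a list of paths."""
    return sorted(p for p in paths if not p.endswith(".ipynb"))

# Usage with the pattern from the question
fList = exclude_notebooks(glob("P*"))

# Example with made-up file names
print(exclude_notebooks(["P_run1.dat", "Plotting.ipynb", "P_run2.dat"]))
```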

pandas.DataFrame returns Series not a Dataframe

I am working with a series of images. I read them first and store them in a list, then I convert them to a DataFrame, and finally I would like to apply Isomap. When I read the images (I have 84 of them) I get an 84x2303 DataFrame of objects, and each object by itself also looks like a DataFrame. How can I convert all of it to numeric (e.g. with to_numeric) so that I can run Isomap on it and then plot it?
Here is my code:
import pandas as pd
from scipy import misc
from mpl_toolkits.mplot3d import Axes3D
import matplotlib
import matplotlib.pyplot as plt
import glob
from sklearn import manifold
samples = []
path = 'Datasets/ALOI/32/*.png'
files = glob.glob(path)
for name in files:
    # note: scipy.misc.imread was removed in SciPy 1.2; imageio.v2.imread is a common replacement
    img = misc.imread(name)
    img = img[::2, ::2]
    x = (img/255.0).reshape(-1,3)
    samples.append(x)
df = pd.DataFrame.from_records(samples)
print(df.dtypes)
print(df.shape)
Thanks!
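One way to get purely numeric input, sketched under assumptions: instead of building a DataFrame whose cells are per-pixel arrays (which pandas stores as objects), flatten each image into one row of a 2-D float array. Random arrays stand in for the preprocessed ALOI images, and all images are assumed to have the same size:

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(0)

# Stand-ins for the preprocessed images: each is (pixels, 3), like x in the question
samples = [rng.random((48 * 48, 3)) for _ in range(5)]

# Flatten each image to one row; np.stack needs every image to have the same size
X = np.stack([s.ravel() for s in samples])
print(X.shape, X.dtype)

# A DataFrame built from the 2-D array has float64 columns, not objects
df = pd.DataFrame(X)
print(df.dtypes.unique())
```

The resulting (n_samples, n_features) array is the layout scikit-learn estimators expect, so X could then be passed to something like manifold.Isomap(n_components=3).fit_transform(X).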

Python Importing data with multiple delimiters

In Python, how can I import data that looks like this:
waveform [0]
t0 26/11/2014 10:53:03.639218
delta t 2.000000E-5
time[0] Y[0]
26/11/2014 10:53:03.639218 1.700977E-2
26/11/2014 10:53:03.639238 2.835937E-4
26/11/2014 10:53:03.639258 2.835937E-4
26/11/2014 10:53:03.639278 -8.079492E-3
There are two delimiters, : and white space. I want to get rid of the date 26/11/2014 and delete the colons so that the time array looks like 105303.639218, etc. Is there a way to specify two delimiters in the code, or is there a better way to analyse the data?
So far I have got:
import numpy as np
import matplotlib.pyplot as plt
_, time, y = np.loadtxt('data.txt', delimiter=':', skiprows=5)
plt.plot(time,y)
plt.show()
For each row, you can split the time string at the colons and join the pieces back together:
time = '10:34:20.454068'
list_ = time.split(':')
''.join(list_)
# '103420.454068'
Maybe it's sort of a roundabout way of doing this, but...
import numpy as np
import matplotlib.pyplot as plt
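mydata = np.loadtxt('data.txt', dtype=str, skiprows=5)
# column 0 is the date, column 1 the time of day, column 2 the value
time = mydata[:,1]
# drop the colons so '10:53:03.639218' becomes 105303.639218
time = np.array([s.replace(':','') for s in time], dtype=float)
y = mydata[:,2].astype(float)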
plt.plot(time,y)
plt.show()
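Removing the colons gives numbers like 105303.639218, which jump at each minute boundary (10:53:59.9 becomes 105359.9 and is followed by 10:54:00.0, i.e. 105400.0). If evenly spaced x values matter, converting each time to seconds since midnight is an alternative; a minimal sketch with a hypothetical helper using datetime.strptime and the format %H:%M:%S.%f:

```python
from datetime import datetime

def to_seconds(time_str):
    """Convert 'HH:MM:SS.ffffff' to seconds since midnight."""
    t = datetime.strptime(time_str, "%H:%M:%S.%f")
    return t.hour * 3600 + t.minute * 60 + t.second + t.microsecond / 1e6

a = to_seconds("10:53:03.639218")
b = to_seconds("10:53:03.639238")
# the step between consecutive samples should match "delta t 2.000000E-5" in the header
print(b - a)
```

Applied to the loaded data, this would be time = np.array([to_seconds(s) for s in mydata[:,1]]).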
