Problems with importing data into code - python

I'm trying to write code that imports data from a text file and graphs it with matplotlib. Here is what I have so far:
import matplotlib.pyplot as plt
x = []
y = []
readFile = open ('C:/Users/Owner/Documents/forcecurve.txt', 'r')
sepFile = readFile.read().split('\n')
readFile.close()
for plotPair in sepFile:
    xAndY = plotPair.split('\t')
    x.append(int (xAndY[0]))
    y.append(int (xAndY[1]))
print x
print y
plt.plot (x, y)
plt.xlabel('Distance (Nanometers)')
plt.ylabel('Force (Piconewtons)')
plt.show()
When I run this, I get the error:
ValueError: invalid literal for int() with base 10: '1,40.9'

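Your file appears to be comma-delimited (1,40.9), not tab-delimited, so you need to split on commas rather than tabs. Change
xAndY = plotPair.split('\t')
to
xAndY = plotPair.split(',')
Note that values such as 40.9 are not integers, so int() will still raise a ValueError on them; convert with float() instead.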
Alternatively, it might be easier to use the csv module to read in the file. As a simple example:
import csv
x = []
y = []
# the with statement closes the file automatically
with open('C:/Users/Owner/Documents/forcecurve.txt', 'r') as readFile:
    r = csv.reader(readFile)
    for x1, y1 in r:
        # float(), not int(): values such as 40.9 are not integers
        x.append(float(x1))
        y.append(float(y1))
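A minimal sketch of the same idea in Python 3 syntax, assuming each line holds two comma-separated numbers and that the file may end with a blank line. The helper name read_xy and the sample data are hypothetical, and io.StringIO stands in for the real file.

```python
import csv
import io

def read_xy(lines):
    """Parse comma-separated 'x,y' rows into two lists of floats."""
    x, y = [], []
    for row in csv.reader(lines):
        if not row:          # skip blank lines, e.g. a trailing newline
            continue
        x.append(float(row[0]))
        y.append(float(row[1]))
    return x, y

# Hypothetical sample data standing in for forcecurve.txt
sample = io.StringIO("1,40.9\n2,41.5\n\n")
x, y = read_xy(sample)
print(x, y)
```

The two lists can then be passed to plt.plot(x, y) as in the question.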

Related

Matplotlib is making positives into negatives

I'm just trying to graph some simple data and whether I try to do it with plot or subplot it comes out the same. All values in my lists are positive but the y axis is acting like a number line with only positives.
import matplotlib.pyplot as plt
xVal = []
yVal1 = []
yVal2 = []
yVal3 = []
data = []
# load data
with open(r"path", 'r') as f:
    data = f.readlines()
yVal1 = data[0].split(",")
yVal2 = data[1].split(",")
yVal3 = data[2].split(",")
del yVal1[-1]
del yVal2[-1]
del yVal3[-1]
print(yVal1)
print(yVal2)
print(yVal3)
# graph dem bois
xVal = [*range(0, len(yVal1))]
'''fig, ax = plt.subplots(3)
ax[0].plot(xVal, yVal1)
ax[0].set_title("pm5")
ax[1].plot(xVal, yVal2)
ax[1].set_title("pm7.5")
ax[2].plot(xVal, yVal3)
ax[2].set_title("pm10")
fig.suptitle("Particulate Levels over time")'''
plt.plot(xVal, yVal3)
plt.show()

As per the comment by Jody Klymak I converted the string lists into float lists and it worked.
fyVal1 = [float(x) for x in yVal1]
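As far as I understand, matplotlib treats a list of strings as categorical data and places the values on the axis in order of appearance, which is why converting to float fixes the plot. A small sketch of the conversion, assuming each line is comma-separated with a trailing comma as in the question; the helper name parse_floats and the sample line are hypothetical.

```python
def parse_floats(line):
    """Split a comma-separated line and convert the non-empty fields to float."""
    return [float(field) for field in line.split(",") if field.strip()]

# Hypothetical line with a trailing comma, like the rows in the question
line = "12.5,3.0,7.25,\n"
values = parse_floats(line)
print(values)
```

Filtering out empty fields also makes the `del yVal1[-1]` step unnecessary.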

path, angle = line.strip().split() ValueError: too many values to unpack (expected 2)

Code:
from __future__ import division
import cv2
import os
import numpy as np
import scipy
import pickle
import matplotlib.pyplot as plt
from itertools import islice
LIMIT = None
DATA_FOLDER = 'driving_dataset'
TRAIN_FILE = os.path.join(DATA_FOLDER, 'data.txt')
def preprocess(img):
    resized = cv2.resize((cv2.cvtColor(img, cv2.COLOR_RGB2HSV))[:, :, 1], (100, 100))
    return resized
def return_data():
    X = []
    y = []
    features = []
    with open(TRAIN_FILE) as fp:
        for line in islice(fp, LIMIT):
            path, angle = line.strip().split()
            full_path = os.path.join(DATA_FOLDER, path)
            X.append(full_path)
            # using angles from -pi to pi to avoid rescaling the atan in the network
            y.append(float(angle) * scipy.pi / 180)
    for i in range(len(X)):
        img = plt.imread(X[i])
        features.append(preprocess(img))
    features = np.array(features).astype('float32')
    labels = np.array(y).astype('float32')
    with open("features", "wb") as f:
        pickle.dump(features, f, protocol=4)
    with open("labels", "wb") as f:
        pickle.dump(labels, f, protocol=4)
return_data()
Error:
path, angle = line.strip().split()
ValueError: too many values to unpack (expected 2)
I got this Autopilot code, and when I use it to extract the data I get the error above. I don't know exactly what to do.
I am using the latest version of Python.
Thanks in advance.

That means there is a line in your data.txt file with more than two space separated values. You are trying to put more than two values into two variables, which causes an error.
If you only want the first two values try this:
path, angle, *_ = line.strip().split()
This will assign the remaining values into _.
If this is not what you want, then either your data.txt file is the problem or you need to add more variables, for example:
path, angle, some, more, variables = line.strip().split()
EDIT
If I understand correctly, a single line looks like this:
0.jpg 0.000000,2018-07-01 17:09:44:912
and you are trying to get '0.jpg' as path and 0.000000 as angle. To achieve this, you first have to get rid of everything after the comma, then split the remaining string on spaces. For example:
line = line.strip().split(',')[0] # get rid of everything after the comma
path, angle = line.strip().split() # split the rest on spaces
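Both approaches can be sketched as follows, assuming every line has the form shown above (file name, space, angle, comma, timestamp). The function name parse_line is hypothetical.

```python
def parse_line(line):
    """Return (path, angle) from a line such as '0.jpg 0.000000,2018-07-01 17:09:44:912'."""
    before_comma = line.strip().split(",")[0]   # drop the timestamp after the comma
    path, angle = before_comma.split()          # split the rest on whitespace
    return path, float(angle)

path, angle = parse_line("0.jpg 0.000000,2018-07-01 17:09:44:912\n")
print(path, angle)

# Star-unpacking keeps the first two fields and collects the rest in a list
first, second, *rest = "a b c d".split()
print(first, second, rest)
```

Note that star-unpacking alone would not be enough for this file, because the second field would still contain the comma and the date.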

Python input for Spectral Clustering

I am using the code from https://github.com/pin3da/spectral-clustering/blob/master/spectral/utils.py to spectrally cluster data in https://cs.joensuu.fi/sipu/datasets/s1.txt
May I know how I can change the code so that it can take a txt file as input?
I have given the original code below for reference
Original code from GitHub
import numpy
import scipy.io
import h5py
def load_dot_mat(path, db_name):
    try:
        mat = scipy.io.loadmat(path)
    except NotImplementedError:
        mat = h5py.File(path)
    return numpy.array(mat[db_name]).transpose()
I do not understand the purpose of the variable, db_name

The code you show here just opens a given mat or h5 file. The path to the file (path) and the name of the data set within the file (db_name) are provided as arguments to the load_dot_mat function.
To load your txt file, we can create our own little load function:
def load_txt(filename):
    with open(filename, "r") as f:
        data = [[int(x) for x in line.split(" ") if x != ""] for line in f]
    return np.array(data)
This function takes the path to your txt file as an argument and returns a numpy array with the data from your file (it assumes numpy has been imported as np). The data array has shape (5000, 2) for the file you provided. You may want to use float instead of int if other files contain float values and not only integers.
The complete clustering step for your data could then look like this:
from itertools import cycle, islice
import matplotlib.pyplot as plt
import numpy as np
import seaborn
from spectral import affinity, clustering
seaborn.set()
def load_txt(filename):
    with open(filename, "r") as f:
        data = [[int(x) for x in line.split(" ") if x != ""] for line in f]
    return np.array(data)
data = load_txt("s1.txt")
A = affinity.com_aff_local_scaling(data)
n_cls = 15 # found by looking at your data
Y = clustering.spectral_clustering(A, n_cls)
colors = np.array(list(islice(cycle(seaborn.color_palette()), int(max(Y) + 1))))
fig = plt.figure(1)
ax = fig.add_subplot(111)
ax.scatter(data[:, 0], data[:, 1], color=colors[Y], s=6, alpha=0.6)
plt.show()
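A slightly more forgiving variant of load_txt can be sketched with the standard library only, assuming whitespace-separated numbers with one row per line. The name load_rows, its convert parameter, and the sample rows are hypothetical; the returned list of lists can be passed to np.array as before, and numpy.loadtxt may also be able to read such a file directly.

```python
def load_rows(lines, convert=float):
    """Parse whitespace-separated numbers, one row per line, skipping blank lines."""
    rows = []
    for line in lines:
        fields = line.split()        # split() with no argument collapses runs of whitespace
        if fields:
            rows.append([convert(f) for f in fields])
    return rows

# Hypothetical rows in the same layout as s1.txt (leading spaces, two columns)
sample = ["    664159    550946\n", "    665845    557965\n", "\n"]
rows = load_rows(sample, convert=int)
print(rows)
```

An open file object can be passed in place of the sample list, since iterating over a file yields its lines.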

Only last graph is getting pasted in pdf file in python

I am reading parameters from different CSV files and creating graphs after comparing the parameters across the CSVs. The problem is that only the last graph, for the last parameter, ends up in the PDF.
with PdfPages('example.pdf') as pdf:
    for arg in sys.argv[1:]:
        file_reader= open(arg, "rt", encoding='ascii')
        read = csv.reader(file_reader)
        for row in read:
            if operation_OnDut in row:
                column_Result = row[10]
                resultOfOperations_OnDut_List.append(column_Result)
                buildNumber = row[0]
                buildName_List.append(buildNumber)
    N = len(resultOfOperations_OnDut_List)
    ind = np.arange(N)
    #Draw graph for operations performed in that TEST CASE
    y = resultOfOperations_OnDut_List
    width = .1
    fig, ax = plt.subplots()
    plt.bar(ind, y, width, label = column_Parameters, color="blue")
    plt.xticks(ind, buildName_List)
    plt.title("Performance and Scale")
    plt.ylabel('Result of Operations')
    plt.xlabel('Execution Builds')
    plt.legend()
    plt.tight_layout()
    pdf.savefig()
    plt.close()
    resultOfOperations_OnDut_List = []
    buildName_List = []

You probably got the indentation wrong.
Try:
with PdfPages('example.pdf') as pdf:
    for arg in sys.argv[1:]:
        file_reader= open(arg, "rt", encoding='ascii')
        read = csv.reader(file_reader)
        for row in read:
            if operation_OnDut in row:
                column_Result = row[10]
                ....
        # one level deeper
        N = len(resultOfOperations_OnDut_List)
        ind = np.arange(N)
        #Draw graph for operations performed in that TEST CASE
        ...
Note that the section starting with N = len(resultOfOperations_OnDut_List) has been indented by four more spaces so that it sits inside the first for loop. If you want it to be inside the second for loop, add four more spaces.
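The effect of the indentation level can be illustrated without matplotlib. In this sketch, appending to a list is a stand-in for pdf.savefig(); the function name pages_saved and the file names are hypothetical.

```python
def pages_saved(files, save_inside_loop):
    """Simulate which inputs end up as pages; pages.append stands in for pdf.savefig()."""
    pages = []
    for name in files:
        current = name               # stand-in for building the figure for this file
        if save_inside_loop:
            pages.append(current)    # runs once per file
    if not save_inside_loop:
        pages.append(current)        # runs once, after the loop: only the last file
    return pages

files = ["build1.csv", "build2.csv", "build3.csv"]   # hypothetical file names
print(pages_saved(files, save_inside_loop=False))
print(pages_saved(files, save_inside_loop=True))
```

Saving after the loop keeps only the last input, which matches the symptom described in the question.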

Python - Outputting two data sets (lists?) to data file as two columns

I am very new to Python; I have done most of my programming in C++. I have a program that computes the fast Fourier transform of a data set and plots both the data and the FFT in two windows using matplotlib. Instead of plotting, I want to output the data to a file. This would be a simple task for me in C++, but I can't seem to figure it out in Python. So the question is: how can I output powerx and powery to a data file in which the two data sets are in separate columns? Below is the program:
import matplotlib.pyplot as plt
from fft import fft
from fft import fft_power
from numpy import array
import math
import time
# data downloaded from ftp://ftp.cmdl.noaa.gov/ccg/co2/trends/co2_mm_mlo.txt
print ' C02 Data from Mauna Loa'
data_file_name = 'co2_mm_mlo.txt'
file = open(data_file_name, 'r')
lines = file.readlines()
file.close()
print ' read', len(lines), 'lines from', data_file_name
window = False
yinput = []
xinput = []
for line in lines :
    if line[0] != '#' :
        try:
            words = line.split()
            xval = float(words[2])
            yval = float( words[4] )
            yinput.append( yval )
            xinput.append( xval )
        except ValueError :
            print 'bad data:',line
N = len(yinput)
log2N = math.log(N, 2)
if log2N - int(log2N) > 0.0 :
    print 'Padding with zeros!'
    pads = [300.0] * (pow(2, int(log2N)+1) - N)
    yinput = yinput + pads
    N = len(yinput)
    print 'Padded : '
    print len(yinput)
# Apply a window to reduce ringing from the 2^n cutoff
if window :
    for iy in xrange(len(yinput)) :
        yinput[iy] = yinput[iy] * (0.5 - 0.5 * math.cos(2*math.pi*iy/float(N-1)))
y = array( yinput )
x = array([ float(i) for i in xrange(len(y)) ] )
Y = fft(y)
powery = fft_power(Y)
powerx = array([ float(i) for i in xrange(len(powery)) ] )
Yre = [math.sqrt(Y[i].real**2+Y[i].imag**2) for i in xrange(len(Y))]
plt.subplot(2, 1, 1)
plt.plot( x, y )
ax = plt.subplot(2, 1, 2)
p1, = plt.plot( powerx, powery )
p2, = plt.plot( x, Yre )
ax.legend( [p1, p2], ["Power", "Magnitude"] )
plt.yscale('log')
plt.show()

You can use csv.writer() to achieve this; here is the reference: https://docs.python.org/2.6/library/csv.html
Basic usage:
Zip your lists into rows:
rows = zip(powery, powerx)
Use a csv writer to write the rows to a CSV file (in Python 3, open the file with 'w' and newline='' instead of 'wb'):
import csv
with open('test.csv', 'wb') as f:
    writer = csv.writer(f)
    for row in rows:
        writer.writerow(row)
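A Python 3 sketch of the same approach, assuming the two sequences have equal length. The function name write_columns and the sample values are hypothetical, and a temporary directory is used so the example does not overwrite anything.

```python
import csv
import os
import tempfile

def write_columns(path, xs, ys):
    """Write two equal-length sequences as two comma-separated columns."""
    # newline='' lets the csv module control line endings (Python 3)
    with open(path, "w", newline="") as f:
        writer = csv.writer(f)
        writer.writerows(zip(xs, ys))

# Hypothetical values standing in for powerx and powery
powerx = [0.0, 1.0, 2.0]
powery = [10.5, 3.25, 0.5]
out_path = os.path.join(tempfile.mkdtemp(), "power.csv")
write_columns(out_path, powerx, powery)

# Read the file back to check what was written
with open(out_path, newline="") as f:
    rows_back = list(csv.reader(f))
print(rows_back)
```

The csv module reads every field back as a string, so convert with float() when loading the file again.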

Depending on what you want to use the file for, I'd suggest either the csv module or the json module.
Writing the file as CSV data will give you the ability to open it with a spreadsheet, graph it, edit it, etc.
Writing the file as JSON data will give you the ability to quickly import it into other programming languages, and to inspect it (generally read-only -- if you want to do serious editing, go with CSV).
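For the JSON option, a minimal sketch using the standard json module. The key names and sample values are hypothetical; json.dump(obj, f) would write to an open file in the same way that json.dumps returns a string here.

```python
import json

# Hypothetical values standing in for powerx and powery
powerx = [0.0, 1.0, 2.0]
powery = [10.5, 3.25, 0.5]

# Serialize both lists under named keys, then parse the text back
text = json.dumps({"powerx": powerx, "powery": powery}, indent=2)
restored = json.loads(text)
print(restored["powery"])
```

Unlike CSV, this stores each list as a whole rather than as side-by-side columns, so it suits machine reading better than viewing in a spreadsheet.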

This is how you can write data from two different lists into a text file in two columns.
# Two random lists
index = [1, 2, 3, 4, 5]
value = [4.5, 5, 7.0, 11, 15.7]
# Opening file for output
file_name = "output.txt"
fwm = open(file_name, 'w')
# Writing data in file
for i in range(len(index)):
    fwm.write(str(index[i])+"\t")
    fwm.write(str(value[i])+"\n")
# Closing file after writing
fwm.close()
If your lists already contain strings, you can drop the str() calls when writing to the file.
If you want to save the data as a CSV file, replace
fwm.write(str(index[i])+"\t")
with
fwm.write(str(index[i])+",")
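The same output can be produced a little more compactly with zip and an f-string, assuming Python 3.6 or later. The function name format_columns and its sep parameter are hypothetical.

```python
def format_columns(index, value, sep="\t"):
    """Return the two lists as text, one 'index<sep>value' pair per line."""
    return "".join(f"{i}{sep}{v}\n" for i, v in zip(index, value))

index = [1, 2, 3]
value = [4.5, 5, 7.0]
text = format_columns(index, value)
print(text, end="")

# To save it: with open("output.txt", "w") as f: f.write(text)
```

Passing sep="," gives the comma-separated form described above.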
