Python input for Spectral Clustering - python

I am using the code from https://github.com/pin3da/spectral-clustering/blob/master/spectral/utils.py to spectrally cluster data in https://cs.joensuu.fi/sipu/datasets/s1.txt
May I know how I can change the code so that it takes a txt file as input?
I have given the original code below for reference
Original code from GitHub
import numpy
import scipy.io
import h5py
def load_dot_mat(path, db_name):
    try:
        mat = scipy.io.loadmat(path)
    except NotImplementedError:
        mat = h5py.File(path)
    return numpy.array(mat[db_name]).transpose()
I do not understand the purpose of the variable db_name.

The code you show here just opens a given mat or h5 file. The path to the file (path) and the name of the data set within the file (db_name) are provided as arguments to the load_dot_mat function.
To load your txt file, we can create our own little load function:
import numpy as np
def load_txt(filename):
    with open(filename, "r") as f:
        data = [[int(x) for x in line.split(" ") if x != ""] for line in f]
    return np.array(data)
This function takes the path to your txt file as an argument and returns a numpy array with the data from your file. The data array has shape (5000, 2) for the file you provided. You may want to use float instead of int if other files contain float values and not only integers.
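As a rough sketch of the float variant mentioned above, the loader below uses only the standard library. The names load_txt_rows and convert are hypothetical, and the two sample rows are made-up demo values, not a claim about the contents of s1.txt. It also assumes that blank lines should simply be skipped.

```python
import os
import tempfile

def load_txt_rows(filename, convert=float):
    """Read whitespace-separated numbers into a list of rows."""
    rows = []
    with open(filename, "r") as f:
        for line in f:
            fields = line.split()  # splits on any run of spaces or tabs
            if fields:             # ignore blank lines
                rows.append([convert(x) for x in fields])
    return rows

# Demo on a small temporary file with made-up values.
with tempfile.NamedTemporaryFile("w", suffix=".txt", delete=False) as tmp:
    tmp.write("    664159    550946\n    665845    557965\n\n")
rows = load_txt_rows(tmp.name)
os.remove(tmp.name)
print(rows)
```

Passing the result to np.array(rows) should give the same kind of two-column array that load_txt returns, with float entries.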
The complete clustering step for your data could then look like this:
from itertools import cycle, islice
import matplotlib.pyplot as plt
import numpy as np
import seaborn
from spectral import affinity, clustering
seaborn.set()
def load_txt(filename):
    with open(filename, "r") as f:
        data = [[int(x) for x in line.split(" ") if x != ""] for line in f]
    return np.array(data)
data = load_txt("s1.txt")
A = affinity.com_aff_local_scaling(data)
n_cls = 15 # found by looking at your data
Y = clustering.spectral_clustering(A, n_cls)
colors = np.array(list(islice(cycle(seaborn.color_palette()), int(max(Y) + 1))))
fig = plt.figure(1)
ax = fig.add_subplot(111)
ax.scatter(data[:, 0], data[:, 1], color=colors[Y], s=6, alpha=0.6)
plt.show()

Related

path, angle = line.strip().split() ValueError: too many values to unpack (expected 2)

Code:
from __future__ import division
import cv2
import os
import numpy as np
import scipy
import pickle
import matplotlib.pyplot as plt
from itertools import islice
LIMIT = None
DATA_FOLDER = 'driving_dataset'
TRAIN_FILE = os.path.join(DATA_FOLDER, 'data.txt')
def preprocess(img):
    resized = cv2.resize((cv2.cvtColor(img, cv2.COLOR_RGB2HSV))[:, :, 1], (100, 100))
    return resized
def return_data():
    X = []
    y = []
    features = []
    with open(TRAIN_FILE) as fp:
        for line in islice(fp, LIMIT):
            path, angle = line.strip().split()
            full_path = os.path.join(DATA_FOLDER, path)
            X.append(full_path)
            # using angles from -pi to pi to avoid rescaling the atan in the network
            y.append(float(angle) * scipy.pi / 180)
    for i in range(len(X)):
        img = plt.imread(X[i])
        features.append(preprocess(img))
    features = np.array(features).astype('float32')
    labels = np.array(y).astype('float32')
    with open("features", "wb") as f:
        pickle.dump(features, f, protocol=4)
    with open("labels", "wb") as f:
        pickle.dump(labels, f, protocol=4)
return_data()
Error:
path, angle = line.strip().split()
ValueError: too many values to unpack (expected 2)
I already have the Autopilot code, but when I run it to extract the data I get the error above, and I don't know what to do.
I am using the latest version of Python.
Thanks in advance.
That means there is a line in your data.txt file with more than two space separated values. You are trying to put more than two values into two variables, which causes an error.
If you only want the first two values try this:
path, angle, *_ = line.strip().split()
This will assign the remaining values into _.
If this is not what you want then either your data.txt file is the problem, or you need to add more variables, for example:
path, angle, third, fourth, fifth = line.strip().split()  # one name per value on the line
EDIT
If I understand correctly, a single line looks like this:
0.jpg 0.000000,2018-07-01 17:09:44:912
and you are trying to get '0.jpg' as path and 0.000000 as angle. To achieve this, you first have to get rid of everything after the comma, then split the remaining string on spaces. For example:
line = line.strip().split(',')[0] # get rid of everything after the comma
path, angle = line.strip().split() # split the rest on spaces
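The two steps above can be wrapped in a small helper, sketched here under the assumption that every line has the shape shown in the EDIT (a file name, an angle, and optionally a comma followed by a timestamp). The name parse_line is hypothetical and not part of the original code.

```python
def parse_line(line):
    """Return (path, angle) from a line such as '0.jpg 0.000000,2018-07-01 17:09:44:912'."""
    # Drop everything after the first comma (the timestamp, if present).
    head = line.strip().split(",", 1)[0]
    # Split the rest on whitespace; this raises ValueError if there are not exactly two fields.
    path, angle = head.split()
    return path, float(angle)

print(parse_line("0.jpg 0.000000,2018-07-01 17:09:44:912"))
print(parse_line("1.jpg -3.5"))
```

Inside return_data, the unpacking line could then become path, angle = parse_line(line), with angle already converted to a float.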

spline fit using python code version 2.7 (reading from excel file the values of x and y)

I would like to read two columns of values, X and Y, from Sheet1 of my Excel file and fit a spline curve to the data, so that I can pick any desired value of x (say 2.6) and calculate y.
Can you please guide me on this?
I have tried the two methods below, but neither of them works.
First method:
from scipy import interpolate
import numpy as np
from openpyxl import load_workbook
#read from excel file
wb = load_workbook('python_excel_read.xlsx')
sheet1 = wb.get_sheet_by_name('Sheet1')
x = np.zeros(sheet1.max_row)
y = np.zeros(sheet1.max_row)
for i in range(0,sheet1.max_row):
    x[i]=sheet1.cell(row=i+1, column=1).value
    y[i]=sheet1.cell(row=i+1, column=2).value
def f(x):
    x_current = [x]
    y_voltage = [y]
    tck = interpolate.splrep(x_current, y_voltage)
    return interpolate.splev(x, tck)
print f(2.6)
my second method:
from scipy import interpolate
import xlwings as xw
wb = xw.Book('python_excel_read.xlsx')
sht = xw.Book('Sheet1')
x_current = sht.range1('A2:A40').value
y_voltage = sht.range2('B2:B40').value
def f(x):
    x_current = [sht.range1]
    y_voltage = [sht.range2]
    tck = interpolate.splrep(x_current, y_voltage)
    return interpolate.splev(x, tck)
print f(2.6)

Importing data as an array for plotting in Python

I am new to this and hope to benefit from your advice. Sorry if the question is amateurish.
I have the following code, which ends by showing a plot. I am only showing part of the code here.
...
cov = np.dot(A, A.T)
samps2 = np.random.multivariate_normal([0]*ndim, cov, size=nsamp)
print(samps2)
names = ["x%s"%i for i in range(ndim)]
labels = ["x_%s"%i for i in range(ndim)]
samples2 = MCSamples(samples=samps2,names = names, labels = labels, label='Second set')
g = plots.getSubplotPlotter()
g.triangle_plot([samples2], filled=True)
This works without problems. The plot is drawn from the data in samps2. Printing it with print(samps2) shows:
[[-0.11213986 -0.0582685 ]
[ 0.20346731 0.25309022]
[ 0.22737737 0.2250694 ]
[-0.09544588 -0.12754274]
[-1.05491483 -1.15432073]
[-0.31340717 -0.36144749]
[-0.99158936 -1.12785124]
[-0.5218308 -0.59193326]
[ 0.76552123 0.82138362]
[ 0.65083618 0.70784292]]
My question is: if I want to read this data from a txt file instead, what should I do?
Thank you.
There are several ways. The ones that come to mind are:
plain python:
data = []
with open(filename, 'r') as f:
    for line in f:
        data.append([float(num) for num in line.split()])
numpy:
import numpy as np
data = np.genfromtxt(filename, ...)
pandas:
import pandas as pd
df = pd.read_table(filename, sep=r'\s+', header=None)
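One case the readers above do not cover is a file that holds the output of print(samps2) verbatim, square brackets included. That is only an assumption about how the txt file might have been produced; if the file contains bare numbers, the plain Python reader above is enough. A minimal sketch for the bracketed case follows, where parse_printed_array is a hypothetical helper name:

```python
def parse_printed_array(text):
    """Parse text shaped like print(samps2) output into a list of float rows."""
    rows = []
    for line in text.splitlines():
        # Remove the square brackets that NumPy adds when printing an array.
        cleaned = line.replace("[", " ").replace("]", " ")
        values = [float(tok) for tok in cleaned.split()]
        if values:  # skip blank lines
            rows.append(values)
    return rows

# The first three rows of the printed output from the question.
sample = """[[-0.11213986 -0.0582685 ]
 [ 0.20346731  0.25309022]
 [ 0.22737737  0.2250694 ]]"""
data = parse_printed_array(sample)
print(data)
```

The resulting list of rows can be converted with np.array(data) before it is handed to MCSamples.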

Python - Outputting two data sets (lists?) to data file as two columns

I am a novice when it comes to Python; I have done most of my programming in C++. I have a program which generates the fast Fourier transform of a data set and plots both the data and the FFT in two windows using matplotlib. Instead of plotting, I want to output the data to a file. This would be a simple task for me in C++, but I can't seem to figure it out in Python. So the question is: how can I output powerx and powery to a data file in which the two data sets are in separate columns? Below is the program:
import matplotlib.pyplot as plt
from fft import fft
from fft import fft_power
from numpy import array
import math
import time
# data downloaded from ftp://ftp.cmdl.noaa.gov/ccg/co2/trends/co2_mm_mlo.txt
print ' C02 Data from Mauna Loa'
data_file_name = 'co2_mm_mlo.txt'
file = open(data_file_name, 'r')
lines = file.readlines()
file.close()
print ' read', len(lines), 'lines from', data_file_name
window = False
yinput = []
xinput = []
for line in lines :
    if line[0] != '#' :
        try:
            words = line.split()
            xval = float(words[2])
            yval = float( words[4] )
            yinput.append( yval )
            xinput.append( xval )
        except ValueError :
            print 'bad data:',line
N = len(yinput)
log2N = math.log(N, 2)
if log2N - int(log2N) > 0.0 :
    print 'Padding with zeros!'
    pads = [300.0] * (pow(2, int(log2N)+1) - N)
    yinput = yinput + pads
    N = len(yinput)
    print 'Padded : '
    print len(yinput)
# Apply a window to reduce ringing from the 2^n cutoff
if window :
    for iy in xrange(len(yinput)) :
        yinput[iy] = yinput[iy] * (0.5 - 0.5 * math.cos(2*math.pi*iy/float(N-1)))
y = array( yinput )
x = array([ float(i) for i in xrange(len(y)) ] )
Y = fft(y)
powery = fft_power(Y)
powerx = array([ float(i) for i in xrange(len(powery)) ] )
Yre = [math.sqrt(Y[i].real**2+Y[i].imag**2) for i in xrange(len(Y))]
plt.subplot(2, 1, 1)
plt.plot( x, y )
ax = plt.subplot(2, 1, 2)
p1, = plt.plot( powerx, powery )
p2, = plt.plot( x, Yre )
ax.legend( [p1, p2], ["Power", "Magnitude"] )
plt.yscale('log')
plt.show()
You can use a csv.writer() to achieve this task, here is the reference: https://docs.python.org/2.6/library/csv.html
Basic usage:
Zip your lists into rows:
rows=zip(powery,powerx)
Use a csv writer to write the data to a csv file:
import csv
with open('test.csv', 'wb') as f:  # 'wb' is for Python 2; on Python 3 use open('test.csv', 'w', newline='')
    writer = csv.writer(f)
    for row in rows:
        writer.writerow(row)
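For Python 3, the same zip-and-write idea might look like the sketch below. The function name write_two_columns is hypothetical, and the short lists are stand-in values rather than real FFT output. An in-memory buffer is used so the example runs without touching the disk.

```python
import csv
import io

def write_two_columns(stream, xs, ys):
    """Write paired values from xs and ys as two CSV columns."""
    writer = csv.writer(stream)
    # zip pairs the i-th x with the i-th y; extra items in the longer list are dropped.
    writer.writerows(zip(xs, ys))

# Stand-in values; in the question these would be powerx and powery.
powerx = [0.0, 1.0, 2.0]
powery = [10.5, 3.25, 0.75]

buffer = io.StringIO()  # swap for open("test.csv", "w", newline="") to write a real file
write_two_columns(buffer, powerx, powery)
print(buffer.getvalue())
```

Passing delimiter="\t" to csv.writer would produce tab-separated columns instead of commas.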
Depending on what you want to use the file for, I'd suggest either the csv module or the json module.
Writing the file as CSV data will give you the ability to open it with a spreadsheet, graph it, edit it, etc.
Writing the file as JSON data will give you the ability to quickly import it into other programming languages, and to inspect it (generally read-only -- if you want to do serious editing, go with CSV).
This is how you can write data from two different lists into a text file in two columns.
# Two random lists
index = [1, 2, 3, 4, 5]
value = [4.5, 5, 7.0, 11, 15.7]
# Opening file for output
file_name = "output.txt"
fwm = open(file_name, 'w')
# Writing data in file
for i in range(len(index)):
    fwm.write(str(index[i])+"\t")
    fwm.write(str(value[i])+"\n")
# Closing file after writing
fwm.close()
If your lists already contain strings, you can drop the str() calls when writing the data to the file.
If you want to save the data in a csv file, replace
fwm.write(str(index[i])+"\t")
with
fwm.write(str(index[i])+",")

Problems with importing data into code

I'm trying to write code that imports data from a text file and graphs it with matplotlib. Here is what I have so far:
import matplotlib.pyplot as plt
x = []
y = []
readFile = open ('C:/Users/Owner/Documents/forcecurve.txt', 'r')
sepFile = readFile.read().split('\n')
readFile.close()
for plotPair in sepFile:
    xAndY = plotPair.split('\t')
    x.append(int (xAndY[0]))
    y.append(int (xAndY[1]))
print x
print y
plt.plot (x, y)
plt.xlabel('Distance (Nanometers)')
plt.ylabel('Force (Piconewtons)')
plt.show()
When I run this I get the error:
ValueError: invalid literal for int() with base 10: '1,40.9'
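Your file appears to be comma-delimited (1,40.9), not tab-delimited, so you need to split on commas rather than tabs. Note also that 40.9 is not an integer, so int() will still reject it once the split is fixed; convert the values with float() instead. For the delimiter, change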
xAndY = plotPair.split('\t')
to
xAndY = plotPair.split(',')
Alternatively, it might be easier to use the csv module to read in the file. As a simple example:
import csv
x = []
y = []
with open('C:/Users/Owner/Documents/forcecurve.txt', 'r') as readFile:
    r = csv.reader(readFile)
    for x1, y1 in r:
        # float, not int: values such as 40.9 are not integers
        x.append(float(x1))
        y.append(float(y1))
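A slightly more defensive version of the csv approach is sketched below. It assumes the file holds one x,y pair per line and may contain blank lines, for example a trailing newline at the end. The name read_xy is hypothetical, and apart from 1,40.9 the sample values are made up for the demo.

```python
import csv
import io

def read_xy(stream):
    """Read comma-delimited x,y pairs from a text stream into two lists of floats."""
    xs, ys = [], []
    for row in csv.reader(stream):
        if not row:  # csv.reader yields [] for blank lines
            continue
        x_text, y_text = row
        xs.append(float(x_text))
        ys.append(float(y_text))
    return xs, ys

# In-memory stand-in for forcecurve.txt, including a blank line.
sample = io.StringIO("1,40.9\n2,41.3\n\n3,39.8\n")
x, y = read_xy(sample)
print(x, y)
```

With a real file, the stream would come from open(path, 'r', newline=''), and x and y could then go straight into plt.plot(x, y).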
