Python - Error scatter plotting with Matplotlib: Index out of range

I'm very new to Python, and I have a CSV file with three columns. They represent a transmission time in milliseconds, signal amplitude, and FM radio frequency in kHz. There are a lot of lines, but they look something like this:
My task is to find out which radio frequency is generating random noise and which is a structured signal. To do this, I'm first trying to find the unique values in the frequency column of my data file (column 3) and then plot them individually to find the structured data. My guess is that the 71.231012 frequency is the white noise (it seemed less frequent in the file), so I'm basically trying to plot both frequencies to see if my guess is somewhat correct.
So far, this is my code:
from __future__ import division
import matplotlib.pyplot as mplot
import numpy as np

file = open("data.csv", "r")
data = file.read()
data = data.replace(" ", ",")
data = data.split("\n")

xscatter = []
yscatter = []
for row in data:
    row = row.split(",")
    row[2] = float(row[2])
    if row[2] == 71.231012:
        xscatter.append(row[2])
        yscatter.append(row[1])

mplot.scatter(xscatter, yscatter, color="blue", marker="o")
mplot.show()
But I keep getting this error:
row[2]=float(row[2])
IndexError: list index out of range
I'm not sure why this is the case; I thought that, with the split, I would have three indexes per row (0,1,2). And because I'm so new to Python, I'm also not sure how accurate or efficient my code is at doing what I want, but it's a start. I'd greatly appreciate some help.
EDIT: Here is a sample of my output after splitting the file, before the for loop:

The code row=row.split(",") sets the row variable to something like ['0.000000', '', '0.000000', '', '0.000000']: the doubled commas produce empty strings. Your code raises an IndexError because the empty string '' (the blank line after the final newline) splits to just [''], which has no index 2.
There are two ways of fixing this:
Remove those annoying empty strings in the array by changing your row=row.split(",") to row=row.split(",,"); this will work.
Or change your data=data.replace(" ", ",") to data=data.replace("  ", ",") (two whitespaces); that will also work.
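Putting the second fix together with a guard for the blank line at the end of the file (which splits to [''] and would otherwise still raise the IndexError), a minimal sketch might look like the following. The sample lines are illustrative and assumed to be separated by two spaces, as described above; time goes on x and amplitude on y.

```python
# Hedged sketch: sample rows assumed two-space separated; values illustrative.
raw = "0.000000  0.000000  71.231012\n0.001000  1.500000  7.614170\n"

xscatter, yscatter = [], []
for line in raw.split("\n"):
    # Drop the empty fields produced by the doubled commas.
    fields = [f for f in line.replace(" ", ",").split(",") if f]
    if not fields:  # skip the blank line after the final newline
        continue
    t, amp, freq = (float(f) for f in fields)
    if freq == 71.231012:
        xscatter.append(t)    # time on x
        yscatter.append(amp)  # amplitude on y
```

The collected lists can then be passed to mplot.scatter as in the question.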

If you have an input csv file like the following,
0,1.62435,7.61417
0,-0.611756,7.61417
0,-0.528172,71.231
0,-1.07297,71.231
0,0.865408,7.61417
0,-2.30154,7.61417
0,1.74481,7.61417
0,-0.761207,7.61417
0,0.319039,71.231
0,-0.24937,71.231
1,1.46211,71.231
1,-2.06014,7.61417
1,-0.322417,71.231
1,-0.384054,7.61417
1,1.13377,7.61417
1,-1.09989,71.231
1,-0.172428,71.231
1,-0.877858,7.61417
1,0.0422137,71.231
1,0.582815,71.231
You can read it in using numpy.loadtxt and plot it separated by frequency by looping over the unique values in the last column.
import numpy as np
import matplotlib.pyplot as plt

data = np.loadtxt("data/filename.csv", delimiter=",")

for freq in np.unique(data[:,2]):
    thisdata = data[data[:,2] == freq]
    plt.scatter(thisdata[:,0], thisdata[:,1], label="{}".format(freq))

plt.legend()
plt.show()

Related

Enumerating through a list of data to find averages, but the lines aren't just numbers

I am new to Python. I am enumerating through a large list of data, as shown below, and would like to find the mean of every line.
for index, line in enumerate(data):
    # calculate the mean
However, the lines of this particular set of data are as such:
[array([[2.3325655e-10, 2.4973504e-10],
        [1.3025138e-10, 1.3025231e-10]], dtype=float32)].
I would like to find the mean of both 2x1s separately, then the average of both means, so it outputs a single number.
Thanks in advance.
You probably do not need to enumerate through the list to achieve what you want. You can do it in two steps using list comprehension.
For example,
data = [[2.3325655e-10, 2.4973504e-10],
        [1.3025138e-10, 1.3025231e-10]]
# Calculate the average for 2x1s, i.e. each row
avgs_along_x = [sum(line)/len(line) for line in data]
# Calculate the average along y
avg_along_y = sum(avgs_along_x)/len(avgs_along_x)
There are other ways to calculate the mean of a list in Python.
If you are using numpy this can be done in one line.
import numpy as np
np.average(data, 1) # calculate the mean along x-axes denoted as 1
# To get what you want, we can pass tuples of axes.
np.average(data, (1,0))
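Applied to the structure actually shown in the question, a list holding 2x2 float32 arrays, the same idea can be sketched like this (values copied from the question):

```python
import numpy as np

# The question's data: a list containing one 2x2 float32 array.
data = [np.array([[2.3325655e-10, 2.4973504e-10],
                  [1.3025138e-10, 1.3025231e-10]], dtype=np.float32)]

means = []
for index, line in enumerate(data):
    row_means = np.mean(line, axis=1)        # mean of each 2x1 separately
    means.append(float(np.mean(row_means)))  # average of both means
```

Each entry of means is then the single number requested for the corresponding line.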

Calculating intermittent average

I have a huge dataframe with a lot of zero values. And, I want to calculate the average of the numbers between the zero values. To make it simple, the data shows for example 10 consecutive values then it renders zeros then values again. I just want to tell python to calculate the average of each patch of the data.
The pic shows an example
First of all, I'm a little bit confused about why you are using a DataFrame; this would more likely be stored in a pd.Series, and I would suggest storing numeric data in a NumPy array. Assuming that you have a pd.Series in front of you and you are trying to calculate the moving average between two consecutive points, there are two approaches you can follow:
zero-padding for the last integer:
assuming circularity and taking the average between the first and the last value
Here is the expected code:
import numpy as np
import pandas as pd
data_series = pd.Series([0,0,0.76231, 0.77669,0,0,0,0,0,0,0,0,0.66772, 1.37964, 2.11833, 2.29178, 0,0,0,0,0])
np_array = np.array(data_series)
# assuming zero-padding
np_array_zero_pad = np.hstack((np_array, 0))
mvavrg_zeropad = [np.mean([np_array_zero_pad[i], np_array_zero_pad[i+1]])
                  for i in range(len(np_array_zero_pad)-1)]
# assuming circularity: wrap around by appending the first value
np_array_circ_arr = np.hstack((np_array, np_array[0]))
np_array_circ_arr = [np.mean([np_array_circ_arr[i], np_array_circ_arr[i+1]])
                     for i in range(len(np_array_circ_arr)-1)]
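Note that the question as pictured asks for the average of each patch of data between the zeros, rather than a moving average. A sketch of that reading, using the same series, could find each run of consecutive nonzero values and average it:

```python
import numpy as np

values = np.array([0, 0, 0.76231, 0.77669, 0, 0, 0, 0, 0, 0, 0, 0,
                   0.66772, 1.37964, 2.11833, 2.29178, 0, 0, 0, 0, 0])

# Pad the nonzero mask with zeros so every patch has a clear start and end,
# then read off (start, exclusive-end) pairs from the sign changes.
mask = np.concatenate(([0], (values != 0).astype(int), [0]))
edges = np.flatnonzero(np.diff(mask))
starts, ends = edges[::2], edges[1::2]
patch_means = [values[s:e].mean() for s, e in zip(starts, ends)]
```

For the series above this yields one mean per nonzero patch, which is what "the average of each patch of the data" suggests.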

Export a numpy matrix of complex numbers to CSV

I am having the following trouble in Python. Assume a numpy.matrix A with entries of dtype complex128. I want to export A in CSV format so that the entries are separated by commas and each line of the output file corresponds to a row of A. I also need 18 decimal points of precision for both the real and imaginary parts, and no spaces within an entry. For example, I need this
`6.103515626000000000e+09+1.712134684679831166e+05j`
instead of
`6.103515626000000000e+09 + 1.712134684679831166e+05j`
The following command works, but only for a 1-by-1 matrix:
numpy.savetxt('A.out', A, fmt='%.18e%+.18ej', delimiter=',')
If I use:
numpy.savetxt('A.out', A, delimiter=',')
there are two problems. First, I don't know how many decimal points are preserved by default. Second, each complex entry is put in parentheses like
(6.103515626000000000e+09+1.712134684679831166e+05j)
and I cannot read the file in Matlab.
What do you suggest?
This is probably not the most efficient way of converting the data in a large matrix, and I am sure a more efficient one-line solution exists, but you can try the code below and see if it works. Here I use pandas to save the data to a CSV file: the first and second columns of the generated file are the real and imaginary parts, respectively. I also assume that the dimension of the input matrix is Nx1.
import pandas as pd
import numpy as np
def to_csv(t, nr_of_decimal=18):
    # Split the complex column into real and imaginary columns.
    t_new = np.column_stack((np.asarray(t).ravel().real,
                             np.asarray(t).ravel().imag))
    t_new = np.round(t_new, decimals=nr_of_decimal)
    pd.DataFrame(t_new).to_csv('out.csv', index=False, header=False)

# Assume t is your complex matrix
t = np.matrix([[6.103515626000000000e+09+1.712134684679831166e+05j],
               [6.103515626000000000e+09+1.712134684679831166e+05j]])
to_csv(t)
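As an alternative that stays within NumPy and keeps the asker's re+imj format: savetxt accepts a single format string containing one specifier for every real and imaginary part, so repeating the per-entry format once per column may work. A sketch (matrix values are illustrative, built from the question's example entry):

```python
import numpy as np

A = np.array([[6.103515626e+09 + 1.712134684679831166e+05j, 1.5 - 2.25j],
              [3.0 + 4.0j, -1.0 - 0.5j]])

# One '%.18e%+.18ej' per column; savetxt interleaves real and imaginary parts.
fmt = ','.join(['%.18e%+.18ej'] * A.shape[1])
np.savetxt('A.out', A, fmt=fmt)

# Round-trip check: Python's complex() parses the re+imj form directly.
back = np.loadtxt('A.out', dtype=complex, delimiter=',')
```

This writes entries like 6.103515626000000000e+09+1.712134684679831166e+05j with no parentheses or spaces, which Matlab should be able to read.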

Averaging datasets of varying length

I have a series of datasets outputted from a program. My goal is to plot an average of the datasets as a line graph in pyplot or numpy. My problem is that the length of the outputted datasets is not controllable.
For example, I have four data sets of lengths varying between 200 and 400 points with x values normalised to figures from 0 to 1, and I want to calculate the median line for the four datasets.
All I can think of so far is to interpolate (linearly would be sufficient) to add extra data points to the shorter sequences, or somehow periodically remove values from the longer sequences. Does anyone have any suggestions?
At the moment I am importing with csv reader and appending row by row to a list, so the output is a list of lists, each with a set of xy coordinates which I think is the same as a 2d array?
I was actually thinking it may be easier to delete excess data points than to interpolate. For example, starting with four lists, I could remove unnecessary points along the x axis, since they are normalised and increasing, and then cull points with too small a step size by referencing the step sizes of the shortest list (this explanation may not be so clear; I will try to write up an example and put it up tomorrow).
An example data set would be
line1=[[0.33,2],[0.66,5],[1,5]]
line2=[[0.25,43],[0.5,53],[0.75,6.5],[1,986]]
So the solution I used was to interpolate, as suggested above; I've included a simplified version of the code below.
First, the data is imported into a dictionary for ease of access and manipulation:
def average(files, newfile):
    import csv
    dict = {}
    ln = []
    max = 0
    for x in files:
        with open(x + '.csv', 'rb') as file:
            reader = csv.reader(file, delimiter=',')
            l = []
            for y in reader:
                l.append(y)
            dict[x] = l
            ln.append(x)
Next, the length of the longest data set is established:
    for y in ln:
        if max == 0:
            max = len(dict[y])
        elif len(dict[y]) >= max:
            max = len(dict[y])
Next, the number of iterations required for each dataset is defined:
    for y in ln:
        dif = max - len(dict[y])
Finally, the intermediate values are calculated by linear interpolation and inserted into the dataset:
        for i in range(dif):
            loc = int(i * len(dict[y]) / dif)
            if loc == 0:
                loc = 1
            new = [(float(dict[y][loc-1][x]) + float(dict[y][loc][x])) / 2
                   for x in range(len(dict[y][loc]))]
            dict[y].insert(loc, new)
Then taking the average is simple:
    avg = []
    for x in range(len(dict[ln[0]])):
        t = [sum(float(dict[u][x][0]) for u in ln) / len(ln),
             sum(float(dict[u][x][1]) for u in ln) / len(ln)]
        avg.append(t)
I'm not saying it's pretty code, but it does what I needed it to...
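With NumPy available, the interpolation approach can be sketched much more compactly with np.interp: resample every dataset onto a common x grid, then average pointwise. This uses the question's own example data; the grid size is a hypothetical choice.

```python
import numpy as np

# Example datasets from the question (x normalised to [0, 1]).
line1 = np.array([[0.33, 2], [0.66, 5], [1.0, 5]])
line2 = np.array([[0.25, 43], [0.5, 53], [0.75, 6.5], [1.0, 986]])

# Resample each dataset onto one common x grid, then average pointwise.
grid = np.linspace(0, 1, 101)
ys = [np.interp(grid, line[:, 0], line[:, 1]) for line in (line1, line2)]
avg_y = np.mean(ys, axis=0)
```

Because np.interp handles the differing lengths, there is no need to insert or cull points by hand.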

Saving/loading a table (with different column lengths) using numpy

A bit of context: I am writing a code to save the data I plot to a text file. This data should be stored in such a way that it can be loaded back using a script, so it can be displayed again (but this time without performing any calculation). The initial idea was to store the data in columns with a format x1,y1,x2,y2,x3,y3...
I am using a code which would be simplified to something like this (incidentally, I am not sure if using a list to group my arrays is the most efficient approach):
import numpy as np
MatrixResults = []
x1 = np.array([1,2,3,4,5,6])
y1 = np.array([7,8,9,10,11,12])
x2 = np.array([0,1,2,3])
y2 = np.array([0,1,4,9])
MatrixResults.append(x1)
MatrixResults.append(y1)
MatrixResults.append(x2)
MatrixResults.append(y2)
MatrixResults = np.array(MatrixResults)
TextFile = open('/Users/UserName/Desktop/Datalog.txt',"w")
np.savetxt(TextFile, np.transpose(MatrixResults))
TextFile.close()
However, this code gives an error when any of the data sets have different lengths. I have read similar questions:
Can numpy.savetxt be used on N-dimensional ndarrays with N>2?
Table, with the different length of columns
However, these require breaking the format (either by flattening or by padding the shorter columns with filler strings).
My issue summarises as:
1) Is there any method by which, while transposing the arrays, they are saved individually as consecutive columns?
2) Or is there any way to append columns to a text file (given a certain number of rows and columns to skip)?
3) Should I try this with another library such as pandas?
Thank you very much for any advice.
Edit 1:
After looking a bit more, it seems that leaving blank spaces is more inefficient than filling the lists.
In the end I wrote my own routine (not sure if there is a numpy function for this) in which I match the array lengths with "nan" values.
To get the data back I use the genfromtxt method and then I use this line:
x = x[~isnan(x)]
to remove these cells from the arrays.
If I find a better solution I will post it :)
To save your arrays you can use np.savez and read them back with np.load:
# Write each array to file (saved as arr_0, arr_1, ...)
np.savez(filename, *matrixResults)
# Read back
with np.load(filename + '.npz') as data:
    matrixResults = [data['arr_%d' % i] for i in range(len(data.files))]
As a side note, you should follow naming conventions, i.e. only class names start with upper-case letters.
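The nan-padding workaround described in the question's edit can also be sketched directly, keeping the plain-text column format. Array contents are taken from the question; 'Datalog.txt' mirrors the question's target file.

```python
import numpy as np

arrays = [np.array([1, 2, 3, 4, 5, 6], dtype=float),
          np.array([7, 8, 9, 10, 11, 12], dtype=float),
          np.array([0, 1, 2, 3], dtype=float),
          np.array([0, 1, 4, 9], dtype=float)]

# Pad every array to the longest length with NaN and save them as columns.
n = max(len(a) for a in arrays)
padded = np.column_stack([np.concatenate([a, np.full(n - len(a), np.nan)])
                          for a in arrays])
np.savetxt('Datalog.txt', padded)

# Load back and strip the NaN padding column by column.
loaded = np.genfromtxt('Datalog.txt')
columns = [c[~np.isnan(c)] for c in loaded.T]
```

This keeps the x1,y1,x2,y2 column layout readable in a text editor, at the cost of the nan filler values the asker mentioned.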
