Unable to append a list with numpy values - python

I want to calculate the average vector length from a file that contains coordinates. Ultimately I want to store the vector_length values in a list called pair_length. I will calculate the average of the pair_length list later on in my program using an average() function. Here is a snippet of my code:
from numpy import sqrt
from itertools import islice
from statistics import mean

data = open("coords.txt", "r")

def average():
    return mean()

pair_length = []
for line in islice(data, 1, None):  # the first line is the number of pairs
    fields = line.split(" ")
    pair_num = int(fields[0])  # the first field is the pair number
    x_cord = float(fields[1])  # x-coordinate
    y_cord = float(fields[2])  # y-coordinate
    vector_length = sqrt(x_cord**2 + y_cord**2)  # vector length (all numbers in coords.txt are real and positive)
    vector_length.append(pair_length)
I receive the error:
AttributeError: 'numpy.float64' object has no attribute 'append'

Here vector_length stores a single float value, and append is a list method in Python, so calling it on a float fails.
So instead of
vector_length.append(pair_length)
do the reverse and append the value to the list:
pair_length.append(vector_length)
Hope this works.
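To round this out, here is a minimal runnable sketch of the corrected loop, using an in-memory list of lines to stand in for coords.txt (the sample values are made up for the demo) and statistics.mean for the final average:

```python
from math import sqrt
from itertools import islice
from statistics import mean

# Hypothetical stand-in for coords.txt:
# first line is the pair count, then "pair_num x y" per line.
lines = ["2", "1 3.0 4.0", "2 6.0 8.0"]

pair_length = []
for line in islice(lines, 1, None):  # skip the header line
    fields = line.split()
    x_cord = float(fields[1])
    y_cord = float(fields[2])
    # append the value to the list, not the other way around
    pair_length.append(sqrt(x_cord**2 + y_cord**2))

print(pair_length)        # [5.0, 10.0]
print(mean(pair_length))  # 7.5
```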

Related

Nested for loop producing more values than expected - Python

Background: I have two catalogues consisting of positions of spatial objects. My aim is to find the objects that appear in both catalogues, allowing a maximum difference in angular distance of a certain value. One catalogue is called bss and the other is called super.
Here is the full code I wrote
import numpy as np

def crossmatch(bss_cat, super_cat, max_dist):
    matches = []
    no_matches = []

    def find_closest(bss_cat, super_cat):
        dist_list = []

        def angular_dist(ra1, dec1, ra2, dec2):
            r1 = np.radians(ra1)
            d1 = np.radians(dec1)
            r2 = np.radians(ra2)
            d2 = np.radians(dec2)
            a = np.sin(np.abs(d1 - d2)/2)**2
            b = np.cos(d1)*np.cos(d2)*np.sin(np.abs(r1 - r2)/2)**2
            rad = 2*np.arcsin(np.sqrt(a + b))
            d = np.degrees(rad)
            return d

        for i in range(len(bss_cat)):  # The problem arises here
            for j in range(len(super_cat)):
                # While this is supposed to produce single floating point values,
                # it produces a numpy.ndarray consisting of three entries
                distance = angular_dist(bss_cat[i][1], bss_cat[i][2], super_cat[j][1], super_cat[j][2])
                dist_list.append(distance)  # This list now contains numpy.ndarrays instead of numpy.float values
        for k in range(len(dist_list)):
            if dist_list[k] < max_dist:
                element = (bss_cat[i], super_cat[j], dist_list[k])
                matches.append(element)
            else:
                element = bss_cat[i]
                no_matches.append(element)
        return (matches, no_matches)
When run separately, the function angular_dist(ra1, dec1, ra2, dec2) produces a single numpy.float value as expected. But when used inside the for loop of this crossmatch(bss_cat, super_cat, max_dist) function, it produces numpy.ndarrays instead of numpy.float values. I've noted this inside the code as well. I don't know where the code goes wrong. Please help.
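For what it's worth, angular_dist itself behaves as expected on scalars. NumPy's trig functions broadcast over array inputs, so if a catalogue entry such as bss_cat[i][1] is itself a three-element array rather than a single float, the result becomes a three-element array. A minimal sketch illustrating this (the sample row below is hypothetical):

```python
import numpy as np

def angular_dist(ra1, dec1, ra2, dec2):
    # Haversine formula on the sphere, as in the question
    r1, d1, r2, d2 = map(np.radians, (ra1, dec1, ra2, dec2))
    a = np.sin(np.abs(d1 - d2) / 2)**2
    b = np.cos(d1) * np.cos(d2) * np.sin(np.abs(r1 - r2) / 2)**2
    return np.degrees(2 * np.arcsin(np.sqrt(a + b)))

# Scalar inputs give a single scalar result:
print(angular_dist(0.0, 0.0, 90.0, 0.0))            # 90.0
print(np.shape(angular_dist(0.0, 0.0, 90.0, 0.0)))  # ()

# Array inputs broadcast and give an array result; this is the likely
# cause if a catalogue entry holds three values instead of one float:
row = np.array([1.0, 2.0, 3.0])  # hypothetical 3-element entry
print(np.shape(angular_dist(row, 0.0, 90.0, 0.0)))  # (3,)
```

So the fix is to check how bss_cat and super_cat rows are built, not the distance formula itself.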

How do I get a text output from a string created from an array to remain unshortened?

Python/Numpy problem. Final year Physics undergrad... I have a small piece of code that creates an array (essentially an n×n matrix) from a formula. I reshape the array to a single column of values, create a string from that, format it to remove extraneous brackets etc., then output the result to a text file saved in the user's Documents directory, which is then used by another piece of software. The trouble is that above a certain value for "n" the output gives me only the first and last three values, with "...," in between. I think that Python is automatically abridging the final result to save time and resources, but I need all those values in the final text file, regardless of how long it takes to process, and I can't for the life of me find how to stop it doing so. Relevant code copied beneath...
import numpy as np; import os.path ; import os
'''
Create a single column matrix in text format from Gaussian Eqn.
'''
save_path = os.path.join(os.path.expandvars("%userprofile%"), "Documents")
name_of_file = 'outputfile'  # <---- change this as required.
completeName = os.path.join(save_path, name_of_file + ".txt")
matsize = 32

def gaussf(x, y):  # defining gaussian but can be any f(x,y)
    pisig = 1/(np.sqrt(2*np.pi) * matsize)  # first term
    sumxy = (-(x**2 + y**2))  # sum of squares term
    expden = (2 * (matsize/1.0)**2)  # 2 sigma squared
    expn = pisig * np.exp(sumxy/expden)  # and put it all together
    return expn

matrix = [[gaussf(x, y)]
          for x in range(-matsize/2, matsize/2)
          for y in range(-matsize/2, matsize/2)]
zmatrix = np.reshape(matrix, (matsize*matsize, 1))  # single column
string2 = (str(zmatrix).replace('[', '').replace(']', '').replace(' ', ''))
zbfile = open(completeName, "w")
zbfile.write(string2)
zbfile.close()
print completeName
num_lines = sum(1 for line in open(completeName))
print num_lines
Any help would be greatly appreciated!
Generally you should iterate over the array/list if you just want to write the contents.
zmatrix = np.reshape(matrix, (matsize*matsize, 1))
with open(completeName, "w") as zbfile:  # with closes your files automatically
    for row in zmatrix:
        zbfile.writelines(map(str, row))
        zbfile.write("\n")
Output:
0.00970926751178
0.00985735189176
0.00999792646484
0.0101306077521
0.0102550302672
0.0103708481917
0.010477736974
0.010575394844
0.0106635442315
.........................
But using numpy we simply need to use tofile:
zmatrix = np.reshape(matrix, (matsize*matsize, 1))
# pass sep or you will get binary output
zmatrix.tofile(completeName,sep="\n")
Output is in the same format as above.
Calling str on the matrix gives you the same formatted output you see when you print it, so what you are writing to the file is that formatted, truncated representation.
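Incidentally, the truncation itself comes from NumPy's print options: str() on a large array abbreviates it with "..." unless the threshold is raised, which is why the written file was shortened. A small sketch demonstrating this (iterating or using tofile, as above, is usually still cleaner than raising the threshold):

```python
import sys
import numpy as np

arr = np.arange(2000).reshape(-1, 1)

# Large arrays are abbreviated by default (threshold is 1000 elements)
truncated = '...' in str(arr)
print(truncated)  # True

# Raising the threshold makes str() emit every element
np.set_printoptions(threshold=sys.maxsize)
full = '...' not in str(arr)
print(full)  # True
```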
Since you are using Python 2, xrange would be more efficient than range, which creates a whole list. Also, having multiple imports separated by semicolons on one line is not recommended; you can simply:
import numpy as np, os.path, os
Also, variable and function names should use underscores: z_matrix, zb_file, complete_name, etc.
You shouldn't need to fiddle with the string representations of numpy arrays. One way is to use tofile:
zmatrix.tofile('output.txt', sep='\n')

'bool' object has no attribute 'index' Error

I'm trying to write code that solves a facility location problem. I have created a data structure in the variable data. data is a list with 4 lists in it. data[0] is a list of city names with a length of 128. The other three are irrelevant for now. There is also a function called nearbyCities(cityname, radius, data) which takes a city name, a radius, and the data, and outputs a list of cities within the radius. Assuming that all the code mentioned is correct, why is the error:
File "/Applications/Wing101.app/Contents/MacOS/src/debug/tserver/_sandbox.py", line 232, in locateFacilities
File "/Applications/Wing101.app/Contents/MacOS/src/debug/tserver/_sandbox.py", line 162, in served
File "/Applications/Wing101.app/Contents/MacOS/src/debug/tserver/_sandbox.py", line 131, in nearbyCities
AttributeError: 'bool' object has no attribute 'index'
popping up?
Here are the three functions in question. r is the radius within which I am trying to serve cities. The first two are just helpers for the third, which I am trying to call. The error is in the while loop, I think.
def served(city, r, data, FalseList):  # Helper Function 1
    nearbycity = nearbyCities(city, r, data)
    for everycity in nearbycity:
        dex1 = data[0].index(everycity)
        FalseList[dex1] = True
    return FalseList

def CountHowManyCitiesAreInRThatAreNotServed(city, FalseList, r, data):  # Helper Function 2
    NBC = nearbyCities(city, r, data)
    notserved = 0
    for element in NBC:
        if FalseList[data[0].index(element)] == False:
            notserved = notserved + 1
    return notserved

def locateFacilities(data, r):
    FalseList = [False]*128
    Cities = data[0]
    Radius = []
    output = []
    for everycity in Cities:
        Radius.append(len(nearbyCities(everycity, r, data)))
    maxito = max(Radius)  # Take Radius and find the city that has the most cities within radius r of it.
    dex = Radius.index(maxito)
    firstserver = Cities[dex]
    output.append(firstserver)
    FalseList = served(firstserver, r, data, FalseList)
    while FalseList.count(False) > 0:
        WorkingCityList = []
        Radius2 = []
        temp = []
        for everycity in Cities:
            if FalseList[Cities.index(everycity)] == False:
                Radius2.append(CountHowManyCitiesAreInRThatAreNotServed(everycity, FalseList, r, data))
                temp.append(everycity)
        maxito = max(Radius2)
        dex = Radius2.index(maxito)
        serverC = temp[dex]
        output.append(serverC)
        FalseList = served(serverC, r, FalseList, data)
    output.sort()
    return output
This is how the rest of the code starts
import re  # Import Regular Expressions

def createDataStructure():
    f = open('miles.dat')  # Opens file
    # Regular expression whose pattern captures 4 different groups. r"..." is a raw
    # string, and each parenthesised part is a group. The first group takes the string
    # up to the opening bracket. The second and third are the two integers inside the
    # brackets, separated by a comma. The fourth is the integer after the brackets.
    CITY_REG = re.compile(r"([^[]+)\[(\d+),(\d+)\](\d+)")
    CITY_TYPES = (str, int, int, int)  # Conversion functions to change the raw strings to the desired types.
    # Initialized lists
    Cities = []
    Coordinates = []
    Populations = []
    TempD = []
    FileDistances = []
    # Loop that reads the file line by line
    line = f.readline()
    while line:
        match = CITY_REG.match(line)  # Matches the compiled pattern; returns None if not matched.
        if match:
            temp = [type(dat) for dat, type in zip(match.groups(), CITY_TYPES)]  # Converts the matched strings into the desired types.
            # Moves the matched values into individual lists
            Cities.append(temp[0])
            Coordinates.append([temp[1], temp[2]])
            Populations.append(temp[3])
            if TempD:  # Once the distance line(s) are over and a city line is matched, this appends the distances to a distance list.
                FileDistances.append(TempD)
                TempD = []
        elif not line.startswith('*'):  # Runs if the line isn't commented out with a "*" and isn't a matched line (one that starts with a city).
            g = line.split()  # This chunk takes a string of numbers and converts it into a list of integers.
            i = 0
            intline = []
            while i != len(g):
                intline.append(int(g[i]))
                i += 1
            TempD.extend(intline)
        line = f.readline()
    f.close()  # End parsing file
    FileDistances.append(TempD)  # Appends the last distance line
    FileDistances.insert(0, [])  # For the first list
    i = 0
    j = 1
    while i != 128:  # Loop takes the lists of distances and makes them len(128) with corresponding distances
        FileDistances[i].reverse()  # Reverses the current distance list to correspond with the distance from the city listed before.
        FileDistances[i].append(0)  # Appends 0 because at this point the distance is from the city to itself.
        counter = i + 1
        while len(FileDistances[i]) != 128:  # Loop that appends the other distances.
            FileDistances[i].append(FileDistances[counter][-j])
            counter = counter + 1
        j += 1
        i += 1
    cities = []
    for i in Cities:  # Removes the commas. I don't know why we need to get rid of the commas...
        new = i.replace(',', '')
        cities.append(new)
    # Final product <3
    MasterList = [cities, Coordinates, Populations, FileDistances]
    return MasterList
getCoordinates
def getCoordinates(cityname, data):  # Basic search function
    INDEX = data[0].index(cityname)
    return data[1][INDEX]
getPopulation
def getPopulation(cityname, data):  # Basic search function
    INDEX = data[0].index(cityname)
    return data[2][INDEX]
getDistance
def getDistance(cityname1, cityname2, data):  # Basic search function
    INDEX = data[0].index(cityname1)
    INDEX2 = data[0].index(cityname2)
    return data[3][INDEX][INDEX2]
nearbyCities
def nearbyCities(cityname, radius, data):
    Cities = data[0]
    INDEX = Cities.index(cityname)
    workinglist = data[3][INDEX]  # data[3] is the distance list
    IndexList = []
    index = 0
    while index < len(workinglist):  # Goes through the list and collects the indexes of cities within radius r
        if workinglist[index] <= radius:
            IndexList.append(index)
        index += 1
    output = []
    for i in IndexList:  # Looks up the indexes and appends the city names to an output list
        output.append(Cities[i])
    output.sort()
    return output
The file miles.dat can be found at http://mirror.unl.edu/ctan/support/graphbase/miles.dat
Well, it appears that data[0] contains a boolean, not a string. I tried this in an empty interpreter and was able to raise the same exception.
It appears that there is an error in your data list's format. We would need to see the actual data in order to figure out the true issue.
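To see how that exception arises, here is a minimal repro. (Note, incidentally, that the two calls to served in locateFacilities pass their last two arguments in opposite orders, which would hand nearbyCities the list of booleans where data belongs; the city name below is just an example.)

```python
# data is expected to be a list whose first element, data[0], is a list
# of city names. If a list of booleans ends up in its place, then
# data[0] is a bare bool, and bools have no .index method:
data = [False] * 128

msg = ""
try:
    data[0].index("Youngstown, OH")  # example city name
except AttributeError as e:
    msg = str(e)

print(msg)  # 'bool' object has no attribute 'index'
```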

How to find the average of values in a .txt file

I need to find the minimum, maximum, and average of the values given in a .txt file. I've been able to find the minimum and maximum values, but I'm struggling with finding the average. I haven't written any code for determining the average, as I have no clue where to start. My current code is:
def summaryStats():
    filename = input("Enter a file name: ")
    file = open(filename)
    data = file.readlines()
    data = data[0:]
    print("The minimum value is " + min(data))
    print("The maximum value is " + max(data))
I need to be able to return the average of these values. As of now the .txt document has the following values:
893
255
504
I'm struggling to find the average of these because every way I try to compute the sum, my result is 0.
Thanks
(sorry I'm just learning to work with files)
You should convert the data retrieved from the file to integers first, because your data list contains strings, not numbers. After the conversion to integers, the average can be found easily:
Why conversion to int is required?
>>> '2' > '10' #strings are compared lexicographically
True
Code:
def summaryStats():
    filename = input("Enter a file name: ")
    with open(filename) as f:
        data = [int(line) for line in f]
    print("The minimum value is ", min(data))
    print("The maximum value is ", max(data))
    print("The average value is ", sum(data)/len(data))
Output:
Enter a file name: abc1
The minimum value is 255
The maximum value is 893
The average value is 550.6666666666666
Don't reinvent the wheel; with numpy it takes a couple of instructions. You can load a txt file into a numpy array and then use the built-in functions to perform the operations you want:
>>> import numpy as np
>>> data = np.loadtxt('data.txt')
>>> np.average(data)
550.666666667
>>> np.max(data)
893.0
>>> np.min(data)
255.0

Why doesn't this return the average of the column of the CSV file?

def averager(filename):
    f = open(filename, "r")
    avg = f.readlines()
    f.close()
    avgr = []
    final = ""
    x = 0
    i = 0
    while i < range(len(avg[0])):
        while x < range(len(avg)):
            avgr += str((avg[x[i]]))
            x += 1
        final += str((sum(avgr)/(len(avgr))))
        clear(avgr)
        i += 1
    return final
The error I get is:
File "C:\Users\konrad\Desktop\exp\trail3.py", line 11, in averager
avgr+=str((avg[x[i]]))
TypeError: 'int' object has no attribute '__getitem__'
x is just an integer, so you can't index it.
So, this:
x[i]
Should never work. That's what the error is complaining about.
UPDATE
Since you asked for a recommendation on how to simplify your code (in a below comment), here goes:
Assuming your CSV file looks something like:
-9,2,12,90...
1423,1,51,-12...
...
You can read the file in like this:
with open(<filename>, 'r') as file_reader:
    file_lines = file_reader.read().split('\n')
Notice that I used .split('\n'). This causes the file's contents to be stored in file_lines as, well, a list of the lines in the file.
So, assuming you want the ith column to be summed, this can easily be done with comprehensions:
ith_col_sum = sum(float(line.split(',')[i]) for line in file_lines if line)
So then to average it all out you could just divide the sum by the number of lines:
average = ith_col_sum / len(file_lines)
Others have pointed out the root cause of your error. Here is a different way to write your method:
import csv

def csv_average(filename, column):
    """ Returns the average of the values in
        column for the csv file """
    column_values = []
    with open(filename) as f:
        reader = csv.reader(f)
        for row in reader:
            column_values.append(float(row[column]))  # csv values are strings, so convert
    return sum(column_values) / len(column_values)
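As a quick sanity check, here is that helper exercised on a small hypothetical file (the sample values and the temp-file path are made up for the demo; note that csv.reader yields strings, so the values must be converted before summing):

```python
import csv
import os
import tempfile

def csv_average(filename, column):
    """Return the average of the values in the given column of a csv file."""
    column_values = []
    with open(filename) as f:
        reader = csv.reader(f)
        for row in reader:
            column_values.append(float(row[column]))  # convert string -> float
    return sum(column_values) / len(column_values)

# Hypothetical sample file for the demo
path = os.path.join(tempfile.mkdtemp(), "nums.csv")
with open(path, "w") as f:
    f.write("1,2,3\n3,2,1\n5,2,2\n")

print(csv_average(path, 0))  # (1 + 3 + 5) / 3 = 3.0
print(csv_average(path, 1))  # 2.0
```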
Let's pick through this code:
def averager(filename):
averager as a name is not as clear as it could be. How about averagecsv, for example?
f=open(filename, "r")
avg=f.readlines()
avg is poorly named. It isn't the average of everything! It's a bunch of lines. Call it csvlines for example.
f.close()
avgr=[]
avgr is poorly named. What is it? Names should be meaningful, otherwise why give them?
final=""
x=0
i=0
while i < range(len(avg[0])):
while x < range(len(avg)):
As mentioned in comments, you can replace these with for loops, as in for i in range(len(avg[0])):. This saves you from needing to declare and increment the variable in question.
avgr+=str((avg[x[i]]))
Huh? Let's break this line down.
The poorly named avg is our lines from the csv file.
So, we index into avg by x, okay, that would give us the line number x. But... x[i] is meaningless, since x is an integer, and integers don't support array access. I guess what you're trying to do here is... split the file into rows, then the rows into columns, since it's csv. Right?
So let's ditch the code. You want something like this, using the str.split function (http://docs.python.org/2/library/stdtypes.html#str.split):
totalaverage = 0
for col in range(len(csvlines[0].split(","))):
    average = 0
    for row in range(len(csvlines)):
        average += int(csvlines[row].split(",")[col])
    totalaverage += average/len(csvlines)
return totalaverage
BUT wait! There's more! Python has a built in csv parser that is safer than splitting by ,. Check it out here: http://docs.python.org/2/library/csv.html
In response to OP asking how he should go about this in one of the comments, here is my suggestion:
import csv
from collections import defaultdict

with open('numcsv.csv') as f:
    reader = csv.reader(f)
    numbers = defaultdict(list)  # used so that each column starts with a list we can append to
    for row in reader:
        for column, value in enumerate(row, start=1):
            # convert the value to a float: 1. the number may be a float and
            # 2. when we calc the average we need to force float division
            numbers[column].append(float(value))

# simple comprehension to print the averages: %d = integer, %f = float. items() goes over key,value pairs
print('\n'.join(["Column %d had average of: %f" % (i, sum(column)/(len(column))) for i, column in numbers.items()]))
Producing
>>>
Column 1 had average of: 2.400000
Column 2 had average of: 2.000000
Column 3 had average of: 1.800000
For a file:
1,2,3
1,2,3
3,2,1
3,2,1
4,2,1
Here are two methods. The first one just gets the average for a line (which is what your code above looks like it's doing). The second gets the average for a column (which is what your question asked).
''' This just gets the avg length of a line '''
def averager(filename):
    f = open(filename, "r")
    avg = f.readlines()
    f.close()
    count = 0
    for i in xrange(len(avg)):
        count += len(avg[i])
    return count/len(avg)

''' This gets the avg for all "columns".
    char is what we split on: , ; | (etc.)
'''
def averager2(filename, char):
    f = open(filename, "r")
    avg = f.readlines()
    f.close()
    count = 0  # count of items
    total = 0  # sum of all the lengths
    for i in xrange(len(avg)):
        cols = avg[i].split(char)
        count += len(cols)
        for j in xrange(len(cols)):
            total += len(cols[j].strip())  # Remove line endings
    return total/float(count)
