Having an issue with using median function in numpy - python

I am having an issue with the median function in numpy. The code worked on my previous computer, but when I ran it on my new machine I got the error "cannot perform reduce with flexible type". To try to fix this, I used map() to make sure my list contained floats, and got this error message instead: could not convert string to float: .
After some more debugging, it seems that the problem is in how I split the lines of my input file. The lines are of the form 2456893.248202,4.490 and I want to split on the ",". However, when I print out the list for the second column of that line, I get
4
.
4
9
0
so it seems to be splitting the value into individual characters, though I'm not sure how. The relevant section of code is below. I appreciate any thoughts or ideas, and thanks in advance.
def curve_split(fn):
    with open(fn) as f:
        for line in f:
            line = line.strip()
            time,lc = line.split(",")
            #debugging stuff
            g=open('test.txt','w')
            l1=map(lambda x:x+'\n',lc)
            g.writelines(l1)
            g.close()
            #end debugging stuff
    return time,lc
if __name__ == '__main__':
    # place where I keep the lightcurve files from the image subtraction
    dirname = '/home/kuehn/m4/kepler/subtraction/detrending'
    files = glob.glob(dirname + '/*lc')
    print(len(files))
    # in order to create our lightcurve array, we need to know
    # the length of one of our lightcurve files
    lc0 = curve_split(files[0])
    lcarr = np.zeros([len(files),len(lc0)])
    # loop through every file
    for i,fn in enumerate(files):
        time,lc = curve_split(fn)
        lc = map(float, lc)
        # debugging
        print(fn[5:58])
        print(lc)
        print(time)
        # end debugging
        lcm = lc/np.median(float(lc))
        #lcm = ((lc[qual0]-np.median(lc[qual0]))/
        #       np.median(lc[qual0]))
        lcarr[i] = lcm
        print(fn,i,len(files))
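One likely reading of the symptom, assuming curve_split hands back only the last line's pair of strings (the original indentation of the post is not preserved, so this is an assumption): lc is a single string such as "4.490", and mapping over a string visits one character at a time, so float(".") fails. A minimal sketch that collects every row into lists of floats instead. The name read_curve and the second sample row are made up for illustration.

```python
import io

def read_curve(lines):
    """Parse 'time,flux' lines into two lists of floats."""
    times, fluxes = [], []
    for line in lines:
        line = line.strip()
        if not line:
            continue  # skip blank lines
        t, flux = line.split(",")
        times.append(float(t))
        fluxes.append(float(flux))
    return times, fluxes

# Iterating over a string yields its characters, which matches the "4 . 4 9 0" output.
chars = list("4.490")
print(chars)  # ['4', '.', '4', '9', '0']

# io.StringIO stands in for an open lightcurve file.
sample = io.StringIO("2456893.248202,4.490\n2456893.268636,4.512\n")
times, fluxes = read_curve(sample)
print(fluxes)  # [4.49, 4.512]
```

With a list of floats in hand, np.array(fluxes) / np.median(fluxes) should give the normalized curve the question is after.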

Related

python: issue with handling csv object in an iteration

First, sorry if the title is not clear. I (noob) am baffled by this...
Here's my code:
import csv
from random import random
from collections import Counter
def rn(dic, p):
    for ptry in parties:
        if p < float(dic[ptry]):
            return ptry
        else:
            p -= float(dic[ptry])
def scotland(r):
    r['SNP'] = 48
    r['Con'] += 5
    r['Lab'] += 1
    r['LibDem'] += 5
def n_ireland(r):
    r['DUP'] = 9
    r['Alliance'] = 1
    # SF = 7
def election():
    results = Counter([rn(row, random()) for row in data])
    scotland(results)
    n_ireland(results)
    return results
parties = ['Con', 'Lab', 'LibDem', 'Green', 'BXP', 'Plaid', 'Other']
with open('/Users/andrew/Downloads/msp.csv', newline='') as f:
    data = csv.DictReader(f)
    for i in range(1000):
        print(election())
What happens is that in every iteration after the first one, the variable data seems to have vanished: the function election() creates a Counter object from a list obtained by processing data, but on every pass after the first one, this object is empty, so the function just returns the hard coded data from scotland() and n_ireland(). (msp.csv is a csv file containing detailed polling data). I'm sure I'm doing something stupid but would welcome anyone gently pointing out where...
I'm going to place a bet on your definition of newline. Are you sure you don't want newline="\n"? Otherwise it will interpret the entire file as a single line, which explains what you're seeing.
EDIT
I now see another issue. The file object in python acts as a generator for each line. The problem is once the generator is finished (you hit the end of the file), you have no more data generated. To solve this: reset your file pointer to the beginning of the file like so:
with open('/Users/andrew/Downloads/msp.csv') as f:
    data = csv.DictReader(f)
    for i in range(1000):
        print(election())
        f.seek(0)
Here the call to f.seek(0) will reset the file pointer to the beginning of your file. You are correct that data is a global object given the way you've defined it at the module level, there's no need to pass it as a parameter.
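An alternative to rewinding, sketched under the assumption that the file fits comfortably in memory: read the rows into a list once, so every call to election() sees the same data. One caveat with f.seek(0) on a DictReader that has already read its header is that the header line comes back as an ordinary row on the next pass. Separately, opening the file with newline='' is what the csv module documentation recommends, so that argument is unlikely to be the cause. The column names and values below are invented for illustration, and io.StringIO stands in for the real file.

```python
import csv
import io

# Stand-in for msp.csv; the columns and values are made up.
f = io.StringIO("Con,Lab\n0.4,0.6\n0.5,0.5\n")

reader = csv.DictReader(f)
first_pass = list(reader)    # consumes the reader
second_pass = list(reader)   # nothing left to read
print(len(first_pass), len(second_pass))  # 2 0

# Rewinding works, but the header line now comes back as a data row.
f.seek(0)
header_as_row = next(reader)
print(dict(header_as_row))

# Reading everything into a list once avoids both problems.
f.seek(0)
data = list(csv.DictReader(f))
for _ in range(3):
    print(len(data), data[0]["Con"])
```

In the question's code this would amount to writing data = list(csv.DictReader(f)) inside the with block.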
I agree with @smassey, you might need to change the code to
with open('/Users/andrew/Downloads/msp.csv', newline='\n') as f:
or simply try not using that argument
with open('/Users/andrew/Downloads/msp.csv') as f:

How to read in multiple documents with same code?

So I have a couple of documents, each of which has an x and a y coordinate (among other stuff). I wrote some code which is able to filter out said x and y coordinates and store them in float variables.
Ideally I'd want to find a way to run the same code on all the documents I have (the number is not fixed, but let's say 3 for now), extract the x and y coordinates of each document, and calculate an average of these 3 x-values and 3 y-values.
How would I approach this? I have never done this before.
I successfully created the code to extract the relevant data from 1 file.
Also note: In reality each file has more than just 1 set of x and y coordinates but this does not matter for the problem discussed at hand.
I'm just saying that so that the code does not confuse you.
with open('TestData.txt', 'r' ) as f:
    full_array = f.readlines()
del full_array[1:31]
del full_array[len(full_array)-4:len(full_array)]
single_line = full_array[1].split(", ")
x_coord = float(single_line[0].replace("1 Location: ",""))
y_coord = float(single_line[1])
size = float(single_line[3].replace("Size: ",""))
#Remove unnecessary stuff
category= single_line[6].replace(" Type: Class: 1D Descr: None","")
In the end I'd like to avoid writing the same code again for each file, especially since the number of files may vary. Right now I have 3 files, which means 3 sets of coordinates, but on another day I might have 5, for example.
Use os.walk to find the files that you want. Then do your calculation for each file.
https://docs.python.org/2/library/os.html#os.walk
First of all, create a function that reads a file given its file name and does the parsing your way. Then iterate through the directory; I assume the files are all in the same directory.
Here is the basic code:
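import os
def readFile(path):
    # Return the file's contents, or an empty string if it cannot be read.
    try:
        with open(path, 'r') as file:
            return file.read()
    except (OSError, UnicodeDecodeError):
        return ""
directory = 'C:\\Users\\UserName\\Documents'
for filename in os.listdir(directory):
    # os.listdir returns bare names, so join each one with the directory
    data = readFile(os.path.join(directory, filename))
    print(data)
    #parse here
    #do the calculation here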
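To make the averaging step concrete, here is a minimal sketch. It assumes a deliberately simplified file layout (a first line of the form "x, y"), so parse_coords is a hypothetical stand-in for the asker's own extraction code, and average_coords is likewise an invented name. The demo writes three throwaway files so that it can run anywhere.

```python
import glob
import os
import tempfile

def parse_coords(path):
    """Hypothetical parser: return (x, y) from a first line of the form 'x, y'."""
    with open(path) as f:
        x_text, y_text = f.readline().split(",")[:2]
    return float(x_text), float(y_text)

def average_coords(paths):
    """Average the x and y values over all given files (paths must be non-empty)."""
    coords = [parse_coords(p) for p in paths]
    n = len(coords)
    avg_x = sum(x for x, _ in coords) / n
    avg_y = sum(y for _, y in coords) / n
    return avg_x, avg_y

# Demo with three temporary files; in practice the pattern would match the real documents.
with tempfile.TemporaryDirectory() as tmp:
    for i, (x, y) in enumerate([(1.0, 2.0), (2.0, 4.0), (3.0, 6.0)]):
        with open(os.path.join(tmp, f"doc{i}.txt"), "w") as f:
            f.write(f"{x}, {y}\n")
    paths = sorted(glob.glob(os.path.join(tmp, "*.txt")))
    result = average_coords(paths)

print(result)  # (2.0, 4.0)
```

Because the list of paths comes from a glob pattern, the same code handles 3 files one day and 5 the next without changes.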

Script skips second for loop when reading a file

I am trying to read a log file and compare certain values against preset thresholds. My code manages to log the raw data with the first for loop in my function.
I have added print statements to try and figure out what was going on and I've managed to deduce that my second for loop never "happens".
This is my code:
def smartTest(log, passed_file):
    # Threshold values based on averages, subject to change if need be
    RRER = 5
    SER = 5
    OU = 5
    UDMA = 5
    MZER = 5
    datafile = passed_file
    # Log the raw data
    log.write('=== LOGGING RAW DATA FROM SMART TEST===\r\n')
    for line in datafile:
        log.write(line)
    log.write('=== END OF RAW DATA===\r\n')
    print 'Checking SMART parameters...',
    log.write('=== VERIFYING SMART PARAMETERS ===\r\n')
    for line in datafile:
        if 'Raw_Read_Error_Rate' in line:
            line = line.split()
            if int(line[9]) < RRER and datafile == 'diskOne.txt':
                log.write("Raw_Read_Error_Rate SMART parameter is: %s. Value under threshold. DISK ONE OK!\r\n" %int(line[9]))
            elif int(line[9]) < RRER and datafile == 'diskTwo.txt':
                log.write("Raw_Read_Error_Rate SMART parameter is: %s. Value under threshold. DISK TWO OK!\r\n" %int(line[9]))
            else:
                print 'FAILED'
                log.write("WARNING: Raw_Read_Error_Rate SMART parameter is: %s. Value over threshold!\r\n" %int(line[9]))
                rcode = mbox(u'Attention!', u'One or more hardrives may need replacement.', 0x30)
This is how I am calling this function:
dataOne = diskOne()
smartTest(log, dataOne)
print 'Disk One Done'
diskOne() looks like this:
def diskOne():
    if os.path.exists(r"C:\Dejero\HDD Guardian 0.6.1\Smartctl"):
        os.chdir(r"C:\Dejero\HDD Guardian 0.6.1\Smartctl")
        os.system("Smartctl -a /dev/csmi0,0 > C:\Dejero\Installation-Scripts\diskOne.txt")
        # Store file in variable
        os.chdir(r"C:\Dejero\Installation-Scripts")
        datafile = open('diskOne.txt', 'rb')
        return datafile
    else:
        log.write('Smart utility not found.\r\n')
I have tried googling similar issues to mine and have found none. I tried moving my first for loop into diskOne() but the same issue occurs. There is no syntax error and I am just not able to see the issue at this point.
It is not skipping your second loop. After the first loop has read the whole file, the file offset sits at the end of the file, so the second loop has nothing left to read. You need to move the offset back to the start, which can be done by adding the line
datafile.seek(0)
before the second loop.
Ref: Documentation
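The effect is easy to reproduce in isolation. The following is a minimal sketch using a small temporary file in place of the SMART log; the file contents are made up.

```python
import os
import tempfile

# Write a small throwaway file to stand in for the log.
with tempfile.NamedTemporaryFile("w", suffix=".txt", delete=False) as tmp:
    tmp.write("line one\nline two\n")
    path = tmp.name

with open(path) as datafile:
    first = [line for line in datafile]       # reads the whole file
    exhausted = [line for line in datafile]   # offset is at end of file, so empty
    datafile.seek(0)                          # rewind to the start
    again = [line for line in datafile]       # the lines are available again

os.remove(path)
print(len(first), len(exhausted), len(again))  # 2 0 2
```

If the file is small, another option is to read the lines into a list once with readlines() and loop over that list as many times as needed.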

Writing a random amount of random numbers to a file and returning their squares

So, I'm trying to write a random amount of random whole numbers (in the range of 0 to 1000), square these numbers, and return these squares as a list. Initially, I started off writing to a specific txt file that I had already created, but it didn't work properly. I looked for some methods I could use that might make things a little easier, and I found the tempfile.NamedTemporaryFile method that I thought might be useful. Here's my current code, with comments provided:
# This program calculates the squares of numbers read from a file, using several functions
# reads file- or writes a random number of whole numbers to a file -looping through numbers
# and returns a calculation from (x * x) or (x**2);
# the results are stored in a list and returned.
# Update 1: after errors and logic problems, found Python method tempfile.NamedTemporaryFile:
# This function operates exactly as TemporaryFile() does, except that the file is guaranteed to have a visible name in the file system, and creates a temporary file that can be written on and accessed
# (say, for generating a file with a list of integers that is random every time).
import random, tempfile
# Writes to a temporary file for a length of random (file_len is >= 1 but <= 100), with random numbers in the range of 0 - 1000.
def modfile(file_len):
    with tempfile.NamedTemporaryFile(delete = False) as newFile:
        for x in range(file_len):
            newFile.write(str(random.randint(0, 1000)))
        print(newFile)
    return newFile
# Squares random numbers in the file and returns them as a list.
def squared_num(newFile):
    output_box = list()
    for l in newFile:
        exp = newFile(l) ** 2
        output_box[l] = exp
    print(output_box)
    return output_box
print("This program reads a file with numbers in it - i.e. prints numbers into a blank file - and returns their conservative squares.")
file_len = random.randint(1, 100)
newFile = modfile(file_len)
output = squared_num(file_name)
print("The squared numbers are:")
print(output)
Unfortunately, now I'm getting this error in line 15, in my modfile function: TypeError: 'str' does not support the buffer interface. As someone who's relatively new to Python, can someone explain why I'm having this, and how I can fix it to achieve the desired result? Thanks!
EDIT: now fixed code (many thanks to unutbu and Pedro)! Now: how would I be able to print the original file numbers alongside their squares? Additionally, is there any minimal way I could remove decimals from the outputted float?
By default tempfile.NamedTemporaryFile creates a binary file (mode='w+b'). To open the file in text mode and be able to write text strings (instead of byte strings), you need to change the temporary file creation call to not use the b in the mode parameter (mode='w+'):
tempfile.NamedTemporaryFile(mode='w+', delete=False)
You need to put newlines after each int, lest they all run together creating a huge integer:
newFile.write(str(random.randint(0, 1000))+'\n')
(Also set the mode, as explained in PedroRomano's answer):
with tempfile.NamedTemporaryFile(mode = 'w+', delete = False) as newFile:
modfile returns a closed filehandle. You can still get a filename out of it, but you can't read from it. So in modfile, just return the filename:
return newFile.name
And in the main part of your program, pass the filename on to the squared_num function:
filename = modfile(file_len)
output = squared_num(filename)
Now inside squared_num you need to open the file for reading.
with open(filename, 'r') as f:
    for l in f:
        exp = float(l)**2 # `l` is a string. Convert to float before squaring
        output_box.append(exp) # build output_box with append
Putting it all together:
import random, tempfile
def modfile(file_len):
    with tempfile.NamedTemporaryFile(mode = 'w+', delete = False) as newFile:
        for x in range(file_len):
            newFile.write(str(random.randint(0, 1000))+'\n')
        print(newFile)
    return newFile.name
# Squares random numbers in the file and returns them as a list.
def squared_num(filename):
    output_box = list()
    with open(filename, 'r') as f:
        for l in f:
            exp = float(l)**2
            output_box.append(exp)
    print(output_box)
    return output_box
print("This program reads a file with numbers in it - i.e. prints numbers into a blank file - and returns their conservative squares.")
file_len = random.randint(1, 100)
filename = modfile(file_len)
output = squared_num(filename)
print("The squared numbers are:")
print(output)
PS. Don't write lots of code without running it. Write little functions, and test that each works as expected. For example, testing modfile would have revealed that all your random numbers were being concatenated. And printing the argument sent to squared_num would have shown it was a closed filehandle.
Testing the pieces gives you firm ground to stand on and lets you develop in an organized way.
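On the follow-up in the question's edit (printing each number next to its square, without decimals), here is a minimal sketch. It assumes the file holds one whole number per line, as the corrected modfile writes it. Converting with int() instead of float() keeps the squares as integers. The function name squares_with_originals is hypothetical, and the demo values are made up.

```python
import os
import tempfile

def squares_with_originals(filename):
    """Return (number, square) pairs for a file with one integer per line."""
    pairs = []
    with open(filename, "r") as f:
        for line in f:
            line = line.strip()
            if line:                 # ignore blank lines
                n = int(line)        # int keeps the square free of decimals
                pairs.append((n, n ** 2))
    return pairs

# Demo on a small temporary file written in text mode.
with tempfile.NamedTemporaryFile(mode="w+", delete=False) as tmp:
    tmp.write("3\n10\n25\n")
    name = tmp.name

pairs = squares_with_originals(name)
os.remove(name)
for n, sq in pairs:
    print(n, sq)
```

Keeping each number and its square together in a tuple means the two can never get out of step when printed.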

Python: How to extract string from text file to use as data

This is my first time writing a Python script and I'm having some trouble getting started. Let's say I have a txt file named Test.txt that contains this information.
                         x       y      z      Type of atom
ATOM  1  C1  GLN  D  10  26.395  3.904  4.923  C
ATOM  2  O1  GLN  D  10  26.431  2.638  5.002  O
ATOM  3  O2  GLN  D  10  26.085  4.471  3.796  O
ATOM  4  C2  GLN  D  10  26.642  4.743  6.148  C
What I want to do is eventually write a script that will find the center of mass of these atoms. So basically I want to sum up all of the x values in that txt file, with each number multiplied by a given value depending on the type of atom.
I know I need to define the positions for each x-value, but I'm having trouble figuring out how to make these x-values be represented as numbers instead of text from a string. I have to keep in mind that I'll need to multiply these numbers by a value for the type of atom, so I need a way to keep them defined for each atom type. Can anyone push me in the right direction?
mass_dictionary = {'C':12.0107,
                   'O':15.999
                   #Others...?
                  }
# If your files are this structured, you can just
# hardcode some column assumptions.
coords_idxs = [6,7,8]
type_idx = 9
# Open file, get lines, close file.
# Probably prudent to add try-except here for bad file names.
f_open = open("Test.txt",'r')
lines = f_open.readlines()
f_open.close()
# Initialize an array to hold needed intermediate data.
output_coms = []; total_mass = 0.0;
# Loop through the lines of the file.
for line in lines:
    # Split the line on white space.
    line_stuff = line.split()
    # If the line is empty or fails to start with 'ATOM', skip it.
    if (not line_stuff) or (not line_stuff[0]=='ATOM'):
        pass
    # Otherwise, append the mass-weighted coordinates to a list and increment total mass.
    else:
        output_coms.append([mass_dictionary[line_stuff[type_idx]]*float(line_stuff[i]) for i in coords_idxs])
        total_mass = total_mass + mass_dictionary[line_stuff[type_idx]]
# After getting all the data, finish off the averages.
avg_x, avg_y, avg_z = tuple(map( lambda x: (1.0/total_mass)*sum(x), [[elem[i] for elem in output_coms] for i in [0,1,2]]))
# A lot of this will be better with NumPy arrays if you'll be using this often or on
# larger files. Python Pandas might be an even better option if you want to just
# store the file data and play with it in Python.
Using the open function in Python you can open any file, so you can do something like the following. The snippet is not a solution to the whole problem, just an approach.
def read_file():
    f = open("filename", 'r')
    for line in f:
        line_list = line.split()
        ....
        ....
    f.close()
From this point on you have a nice setup for working with these values. The second line opens the file for reading. The third line defines a for loop that reads the file one line at a time, with each line going into the line variable.
The fourth line breaks the string at every whitespace into a list, so line_list[0] will be the value in your first column and so forth. From this point, if you have any programming experience, you can use if statements and such to get the logic that you want.
** Also keep in mind that the values stored in that list will all be strings, so if you want to perform arithmetic operations such as addition you have to convert them first.
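To illustrate the point about strings, here is a minimal sketch using the first ATOM line from the question. The column positions 6 to 9 follow from splitting that line on whitespace.

```python
line = "ATOM 1 C1 GLN D 10 26.395 3.904 4.923 C"
fields = line.split()

# The fields are strings, so '+' concatenates instead of adding.
joined = fields[6] + fields[7]
print(joined)  # 26.3953.904

# Convert to float first to get numeric arithmetic.
x, y, z = (float(v) for v in fields[6:9])
atom_type = fields[9]
print(x + y, atom_type)
```

The atom type stays a string, which makes it usable as a key for looking up the atomic mass in a dictionary.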
* Edited for syntax correction
If you have pandas installed, check out the read_fwf function, which imports a fixed-width file and creates a DataFrame (a 2-d tabular data structure). It will save you lines of code on import and also give you a lot of data munging functionality if you want to do any additional data manipulation.
