So I have a set of positional data that comes from a factory sensor. It produces x, y and z info in meters relative to a known lat/long position. I have a function that will convert a distance in meters from the lat/long, but I need to use the x and y data in a Pythagoras function to determine that distance. Let me try to clarify with an example of the JSON data the sensor gives.
[
{
"id": "84eb18677194",
"name": "forklift_0001",
"areaId": "Tracking001",
"areaName": "Hall1",
"color": "#FF0000",
"coordinateSystemId": "CoordSys001",
"coordinateSystemName": null,
"covarianceMatrix": [
0.82,
-0.07,
-0.07,
0.55
],
"position": [ #this is the x,y and z data, in meters from the ref point
18.11,
33.48,
2.15
],
In this example the forklift is 18.11m along and 33.48m up from the reference lat/long. The sensor is 2.15m high and that is a constant piece of info I don't need. To work out the distance from the reference point I need to use Pythagoras and then convert that data back into lat/long so my analysis tool can present it.
My problem (as far as Python goes) is that I can't figure out how to make it see 18.11 & 33.48 as the x & y and tell it to disregard 2.15 entirely. Here is what I have so far.
import math
import json
import pprint
import os
from glob import iglob
rootdir_glob = 'C:/Users/username/Desktop/test_folder**/*"' # Note the added asterisks, use forward slash
# This will return absolute paths
file_list = [f for f in
             iglob('C:/Users/username/Desktop/test_folder/13/00**/*', recursive=True)
             if os.path.isfile(f)]
for f in file_list:
    print('Input file: ' + f) # Replace with desired operations
    with open(f, 'r') as f:
        distros = json.load(f)
    output_file = 'position_data_blob_14' + str(output_nr) + '.csv' #output file name may be changed
def pythagoras(a,b):
    value = math.sqrt(a*a + b*b)
    return value
result = pythagoras(str(distro['position'])) #I am totally stuck here :/
print(result)
This piece of script is part of a wider project to parse the file by machine and people and also by work and non work times of day.
If someone could give me some tips on how to make the Pythagoras part work I'd be really grateful. I am not sure if I should define it as a function, but as I've typed this I am wondering if it should be a 'for' loop which uses the x & y and ignores the z.
All help really appreciated.
Try this:
position = distro['position'] # Get the full list
result = pythagoras(position[0], position[1]) # Get the first and second element from the list
print(result)
Why do you use str() for the argument of the function? What were you trying to do?
You're passing one input, a list of numbers, into a function that takes two numbers as input. There are two solutions to this - either change what you pass in, or change the function.
distro['position'] = [18.11, 33.48, 2.15], so for the first solution all you need to do is pass in distro['position'][0] and distro['position'][1]:
result = pythagoras(distro['position'][0], distro['position'][1])
Alternatively (which in my opinion is more elegant), pass in the list to the function and have the function extract the values it cares about:
def pythagoras(input_triple):
    a,b,c = input_triple
    value = math.sqrt(a*a + b*b)
    return value
result = pythagoras(distro['position'])
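As a minimal sketch of the same idea, the standard library's math.hypot computes the square root of the sum of squares directly, and slicing keeps only the first two values. The function name planar_distance is hypothetical and the sample list is the position from the question.

```python
import math

def planar_distance(position):
    """Distance in the x/y plane; any third (z) value is ignored."""
    x, y = position[:2]          # keep x and y, drop z
    return math.hypot(x, y)      # same as sqrt(x*x + y*y)

distance = planar_distance([18.11, 33.48, 2.15])
print(round(distance, 2))
```

Slicing with [:2] rather than unpacking three names means the function also accepts a plain [x, y] pair.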
The solution I used was
for f in file_list:
    print('Input file: ' + f) # Replace with desired operations
    with open(f, 'r') as f:
        distros = json.load(f)
    output_file = '13_01' + str(output_nr) + '.csv' #output file name may be changed
    with open(output_file, 'w') as text_file:
        for distro in distros:
            position = distro['position']
            # no trailing comma here, otherwise result becomes a one-element tuple
            result = math.sqrt(position[0]*position[0] + position[1]*position[1])
            print(result, file=text_file)
    print('Output written to file: ' + output_file)
    output_nr = output_nr + 1
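A hedged variation on the loop above: if the output should be a real CSV with the record name next to each distance, the csv module can do the writing. The function name write_distances, the column names and the one-record sample string are assumptions for illustration; an in-memory buffer stands in for the output file.

```python
import csv
import io
import json
import math

# One-record stand-in for a sensor file (fields taken from the question)
sample = '[{"name": "forklift_0001", "position": [18.11, 33.48, 2.15]}]'

def write_distances(records, handle):
    """Write one CSV row per record: name and x/y distance in meters."""
    writer = csv.writer(handle)
    writer.writerow(["name", "distance_m"])
    for record in records:
        x, y = record["position"][:2]   # ignore z
        writer.writerow([record["name"], round(math.hypot(x, y), 2)])

buffer = io.StringIO()                   # swap for open(output_file, 'w', newline='')
write_distances(json.loads(sample), buffer)
csv_text = buffer.getvalue()
print(csv_text)
```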
Did you check the data type of the parameters you're passing?
def pythagoras(a,b):
    value = math.sqrt(int(a)**2 + int(b)**2)
    return value
This is for the case of integers; note that int() would truncate values such as 18.11 to 18.
So I have a couple of documents, each of which has an x and y coordinate (among other stuff). I wrote some code which is able to filter out said x and y coordinates and store them in float variables.
Now ideally I'd want to find a way to run the same code on all the documents I have (number not fixed, but let's say 3 for now), extract the x and y coordinates of each document and calculate an average of these 3 x-values and 3 y-values.
How would I approach this? I have never done this before.
I successfully created the code to extract the relevant data from 1 file.
Also note: In reality each file has more than just 1 set of x and y coordinates but this does not matter for the problem discussed at hand.
I'm just saying that so that the code does not confuse you.
with open('TestData.txt', 'r' ) as f:
    full_array = f.readlines()
del full_array[1:31]
del full_array[len(full_array)-4:len(full_array)]
single_line = full_array[1].split(", ")
x_coord = float(single_line[0].replace("1 Location: ",""))
y_coord = float(single_line[1])
size = float(single_line[3].replace("Size: ",""))
#Remove unnecessary stuff
category= single_line[6].replace(" Type: Class: 1D Descr: None","")
In the end I'd like to not have to write the same code for each file another time, especially since the amount of files may vary. Now I have 3 files which equals to 3 sets of coordinates. But on another day I might have 5 for example.
Use os.walk to find the files that you want. Then do your calculation for each file.
https://docs.python.org/2/library/os.html#os.walk
First of all, create a function that reads a file given its file name and does the parsing your way. Then iterate through the directory; I guess the files are all in the same directory.
Here is the basic code:
import os
def readFile(filename):
    try:
        with open(filename, 'r') as file:
            data = file.read()
        return data
    except IOError:
        return ""
folder = 'C:\\Users\\UserName\\Documents'
for filename in os.listdir(folder):
    #print(filename)
    # os.listdir returns bare names, so join them with the folder
    data = readFile(os.path.join(folder, filename))
    print(data)
    #parse here
    #do the calculation here
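To connect this back to the averaging part of the question, here is a small sketch that parses every file in a folder with one function and averages the results. The file format (a single "x, y" line), the names parse_coords and average_coords, and the temporary demo files are all assumptions; the real parsing from the question would go inside parse_coords.

```python
import tempfile
from pathlib import Path
from statistics import mean

def parse_coords(path):
    """Hypothetical parser: expects the first line to look like 'x, y'."""
    x_text, y_text = path.read_text().splitlines()[0].split(", ")
    return float(x_text), float(y_text)

def average_coords(folder):
    """Parse every .txt file in folder and average the x and y values."""
    coords = [parse_coords(p) for p in sorted(Path(folder).glob("*.txt"))]
    xs, ys = zip(*coords)
    return mean(xs), mean(ys)

# Demo with three throwaway files; the number of files does not matter
with tempfile.TemporaryDirectory() as folder:
    for i, (x, y) in enumerate([(1.0, 2.0), (3.0, 4.0), (5.0, 9.0)]):
        Path(folder, "doc{}.txt".format(i)).write_text("{}, {}\n".format(x, y))
    avg_x, avg_y = average_coords(folder)

print(avg_x, avg_y)
```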
I have been working with some code that exports layers individually filled with important data into a folder. The next thing I want to do is bring each one of those layers into a different program so that I can combine them and do some different tests. The current way that I know how to do it is by importing them one by one (as seen below).
fn0 = 'layer0'
f0 = np.genfromtxt(fn0 + '.csv', delimiter=",")
fn1 = 'layer1'
f1 = np.genfromtxt(fn1 + '.csv', delimiter=",")
The issue with continuing this way is that I may have to deal with up to 100 layers at a time, and it would be very inconvenient to have to import each layer individually.
Is there a way I can change my code to do this iteratively so that I can have a code similar to such:
N = 100
for i in range(N)
fn(i) = 'layer(i)'
f(i) = np.genfromtxt(fn(i) + '.csv', delimiter=",")
Please let me know if you know of any ways!
You can use string formatting as follows:
N = 100
f = [] #create an empty list
for i in range(N):
    fn_i = 'layer%d' % i #fn_i instead of fn(i): parentheses!
    f.append(np.genfromtxt(fn_i + '.csv', delimiter=",")) #add to f
What I mean by parentheses! is that they are 'important' characters. They indicate function calls and tuples, so you shouldn't use them in variable names (ever!)
The answer of Mohammad Athar is correct. However, consider not using % formatting any longer. PEP 3101 (https://www.python.org/dev/peps/pep-3101/) introduced format() as its intended replacement, although % still works. Moreover, as you may have 100 or more files, a zero-padded format like layer_007.csv is probably appreciated.
Try something like:
dataDict=dict()
for counter in range(214):
    fileName = 'layer_{number:03d}.csv'.format(number=counter)
    dataDict[fileName] = np.genfromtxt( fileName, delimiter="," )
When using a dictionary, like here, you can directly access your data later by using the file name. Before Python 3.7 a dict did not guarantee any ordering though, so you might prefer the list version of Mohammad Athar.
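Both answers boil down to building each file name inside the loop and collecting the loaded data in a container. The sketch below shows that pattern with only the standard library so it runs without NumPy; load_layers and the tiny generated demo files are assumptions, and np.genfromtxt could replace the csv parsing line for line.

```python
import csv
import tempfile
from pathlib import Path

def load_layers(folder, count):
    """Load layer0.csv .. layer<count-1>.csv into a list of row lists."""
    layers = []
    for i in range(count):
        path = Path(folder) / "layer{}.csv".format(i)
        with open(path, newline="") as handle:
            rows = [[float(value) for value in row] for row in csv.reader(handle)]
        layers.append(rows)              # layers[i] holds the data of layer i
    return layers

# Demo: create three small layer files, then load them back by index
with tempfile.TemporaryDirectory() as folder:
    for i in range(3):
        Path(folder, "layer{}.csv".format(i)).write_text("{0},{0}\n".format(i))
    layers = load_layers(folder, 3)

print(layers[2])
```

A list keeps the layers addressable by number (layers[7]), which matches how the question names its files.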
I have 3 CSV files in which I would like to change one column to a running number that depends on the number of rows in the file.
For example, file 1 has 400 rows, file 2 has 240, and file 3 has 100.
So the added column for file 1 will be a running number from 1 to 400.
The added column for file 2 will be a running number from 401 to 640.
The added column for file 3 will be a running number from 641 to 740.
What I wrote is this:
file1 = str(path) + "file1"
file2 = str(path) + "file2"
file3 = str(path) + "file3"
files = [file1, file2, file3]
class File_Editor():
    def line_len(self):
        for k in range(0,2):
            file_name = open(files[k] + ".csv")
            numline = len(file_name.readlines())
            print (numline)
I am stuck on making the running number for each file remember the number of rows that were in the previous file.
Thanks a lot!
+++++EDIT+++++
@roganjosh Thanks a lot. I used your code with a small fix for running_number = 1: I have put it inside the def, so that both files share the same running number.
One last thing: how can I put a column name, for example "Number", in the first row
and then start the running number from the 2nd row?
Thanks
Looking at your previous questions that are left open, the common theme is a fundamental issue with understanding how to use functions in Python that isn't being addressed. I will try to unpick part of this to prevent similar questions arising. I'm assuming you come from a scientific background like me so I'll stick to that.
You never pass arguments to your functions, only self. Instead you try to reference globals from within the function, but there is no need and it is confusing. For example, I might have the equation y = x^2 + 3x + 5 that is both a mathematical function and can be a python function.
def quadratic(value_of_x):
    y = (value_of_x **2) + (3*value_of_x) + 5
    return y
eg_1 = quadratic(5)
print (eg_1)
eg_2 = quadratic(3)
print (eg_2)
# But this will fail
#print (y)
y exists only within the Python function as a local variable and is destroyed once you leave the def / return block. In this case, eg_1, eg_2 assume the value of y at the end of the function and value_of_x assumes the value that I put in brackets on the function call (the argument/variable). That's the point of functions, they can be used over and over.
I can also pass multiple arguments to the function.
def new_quadratic(value_of_x, coefficient):
    y = coefficient*(value_of_x **2) + (3*value_of_x) + 5
    return y
eg_3 = new_quadratic(5, 2)
print (eg_3)
Not only can I not get a value for y outside of the scope of a function, but a function does nothing unless it's called. This does nothing; it's the equivalent of knowing the formula in your head but never running a number through it - you're just defining it as something that your script could use.
starting_number = 5
def modify_starting_number(starting_number):
    starting_number = starting_number * 2
    return starting_number
print (starting_number)
Whereas this does what you expected it to do. You call the function i.e. pass the number through the formula.
starting_number = 5
def modify_starting_num(starting_num):
    starting_num = starting_num * 2
    return starting_num
starting_number = modify_starting_num(starting_number) # Calling the function
print (starting_number)
With that out of the way, on to your question.
import csv
files = ['file_1', 'file_2']
def running_number_in_csv(filename_list):
    """ running_number resets every time the function is called, but is
    remembered within the function itself"""
    running_number = 1
    for individual_file in filename_list:
        new_rows = [] # Make something to hold row + extra column
        # Read contents of each row and append the running number to the list
        with open(individual_file + '.csv', 'r') as infile:
            reader = csv.reader(infile)
            for row in reader:
                row.append(running_number)
                new_rows.append(row)
                running_number += 1 # Increments every row, regardless of file name number
        # Write the list containing the extra column for running number
        with open(individual_file + '.csv', 'w') as outfile: # Python 3: pass newline=''; Python 2 on Windows: use 'wb'
            writer = csv.writer(outfile)
            writer.writerows(new_rows)
get_running_number = running_number_in_csv(files) # CALL THE FUNCTION :)
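For the follow-up in the question's edit (a "Number" label in the first row, with numbering starting on the second row), one possible sketch is shown here on in-memory rows rather than files. It assumes that the first row of every file is a header row; the function name number_tables and the sample rows are made up for illustration.

```python
def number_tables(tables, label="Number"):
    """Append a running-number column that continues across tables.

    Assumption: rows[0] of every table is a header row, which gets the
    label instead of a number.
    """
    running_number = 1
    result = []
    for rows in tables:
        header, body = rows[0], rows[1:]
        numbered = [header + [label]]            # first row: column name
        for row in body:
            numbered.append(row + [running_number])
            running_number += 1                  # carries over to the next table
        result.append(numbered)
    return result

tables = [
    [["name"], ["a"], ["b"]],   # stands in for file 1
    [["name"], ["c"]],          # stands in for file 2
]
numbered = number_tables(tables)
print(numbered)
```

The same loop body could replace the row-appending part of running_number_in_csv, with csv.reader and csv.writer supplying and consuming the rows.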
@roganjosh I have fixed my code.
I know the length of each file; now I need to add a column with running numbers like:
file1
1 to 400
file2
401 to 640
file 3
641 to 740
Thanks a lot!
I am working with datasets stored in large text files. For the analysis I am carrying out, I open the files, extract parts of the dataset and compare the extracted subsets. My code works like so:
from math import ceil
with open("seqs.txt","rb") as f:
    f = f.readlines()
assert type(f) == list, "ERROR: file object not converted to list"
fives = int( ceil(0.05*len(f)) )
thirds = int( ceil(len(f)/3) )
## top/bottom 5% of dataset
low_5=f[0:fives]
top_5=f[-fives:]
## top/bottom 1/3 of dataset
low_33=f[0:thirds]
top_33=f[-thirds:]
## Write lists to file
# top-5
with open("high-5.out","w") as outfile1:
    for i in top_5:
        outfile1.write("%s" %i)
# low-5
with open("low-5.out","w") as outfile2:
    for i in low_5:
        outfile2.write("%s" %i)
# top-33
with open("high-33.out","w") as outfile3:
    for i in top_33:
        outfile3.write("%s" %i)
# low-33
with open("low-33.out","w") as outfile4:
    for i in low_33:
        outfile4.write("%s" %i)
I am trying to find a more clever way of automating the process of writing the lists out to files. In this case there are only four, but in future cases I may end up with as many as 15-25 lists, so I would like some function to take care of this. I wrote the following:
def write_to_file(*args):
    for i in args:
        with open(".out", "w") as outfile:
            outfile.write("%s" %i)
but the resulting file only contains the final list when I call the function like so:
write_to_file(low_33,low_5,top_33,top_5)
I understand that I have to define an output file for each list (which I am not doing in the function above), I'm just not sure how to implement this. Any ideas?
Make your variable names match your filenames and then use a dictionary to hold them instead of keeping them in the global namespace:
data = {'high_5': f[-fives:],
        'low_5': f[:fives],
        'high_33': f[-thirds:],
        'low_33': f[:thirds]}
for key in data:
    with open('{}.out'.format(key), 'w') as output:
        for i in data[key]:
            output.write(i)
Keeps your data in a single easy to use place, and assuming you want to apply the same actions to them you can continue using the same paradigm.
As mentioned by PM2Ring below, it would be advisable to use underscores (as you do in the variable names) instead of dashes (as you do in the filenames), because then you can pass the dictionary keys as keyword arguments into a writing function:
write_to_file(**data)
This would equate to:
write_to_file(low_5=f[:fives], high_5=f[-fives:],...) # and the rest of the data
From this you could use one of the functions defined by the other answers.
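A writing function that accepts those keyword arguments could look roughly like the sketch below. The extra folder parameter, the sample lines and the key names are assumptions added so the example runs in a temporary directory; each keyword name becomes the stem of one output file.

```python
import tempfile
from pathlib import Path

def write_to_file(folder, **named_lists):
    """Write each keyword argument to '<keyword>.out' inside folder."""
    for name, lines in named_lists.items():
        with open(Path(folder) / "{}.out".format(name), "w") as outfile:
            outfile.writelines(lines)    # lines already end with '\n'

lines = ["a\n", "b\n", "c\n", "d\n"]
data = {"low_half": lines[:2], "high_half": lines[-2:]}

with tempfile.TemporaryDirectory() as folder:
    write_to_file(folder, **data)        # same as low_half=..., high_half=...
    written = {p.name: p.read_text() for p in Path(folder).iterdir()}

print(sorted(written))
```

This only works when every key is a valid Python identifier, which is why underscores are needed instead of dashes.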
You could have one output file per argument by incrementing a counter for each argument. For example:
def write_to_file(*args):
    for index, i in enumerate(args):
        with open("{}.out".format(index+1), "w") as outfile:
            outfile.write("%s" %i)
The example above will create output files "1.out", "2.out", "3.out", and "4.out".
Alternatively, if you had specific names you wanted to use (as in your original code), you could do something like the following:
def write_to_file(args):
    for name, data in args:
        with open("{}.out".format(name), "w") as outfile:
            outfile.write("%s" % data)
args = [('low-33', low_33), ('low-5', low_5), ('high-33', top_33), ('high-5', top_5)]
write_to_file(args)
which would create output files "low-33.out", "low-5.out", "high-33.out", and "high-5.out".
Don't try to be clever. Instead aim to have your code readable, easy to understand. You can group repeated code into a function, for example:
from math import ceil
def save_to_file(data, filename):
    with open(filename, 'wb') as f:
        for item in data:
            f.write('{}'.format(item))
with open('data.txt') as f:
    numbers = list(f)
five_percent = int(len(numbers) * 0.05)
thirty_three_percent = int(ceil(len(numbers) / 3.0))
# Why not: thirty_three_percent = int(len(numbers) * 0.33)
save_to_file(numbers[:five_percent], 'low-5.out')
save_to_file(numbers[-five_percent:], 'high-5.out')
save_to_file(numbers[:thirty_three_percent], 'low-33.out')
save_to_file(numbers[-thirty_three_percent:], 'high-33.out')
Update
If you have quite a number of lists to write, then it makes sense to use a loop. I suggest to have two functions: save_top_n_percent and save_low_n_percent to help with the job. They contain a little duplicated code, but by separating them into two functions, it is clearer and easier to understand.
def save_to_file(data, filename):
    with open(filename, 'wb') as f:
        for item in data:
            f.write(item)
def save_top_n_percent(n, data):
    n_percent = int(len(data) * n / 100.0)
    save_to_file(data[-n_percent:], 'top-{}.out'.format(n))
def save_low_n_percent(n, data):
    n_percent = int(len(data) * n / 100.0)
    save_to_file(data[:n_percent], 'low-{}.out'.format(n))
with open('data.txt') as f:
    numbers = list(f)
for n_percent in [5, 33]:
    save_top_n_percent(n_percent, numbers)
    save_low_n_percent(n_percent, numbers)
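One detail worth checking with slices like these: when the computed count is 0 (a short list and a small percentage), data[-0:] is the same as data[0:] and returns the whole list rather than nothing. A small sketch of a split that avoids this is below; the name split_extremes and the choice to round up with ceil (as the question does) are assumptions.

```python
from math import ceil

def split_extremes(data, percent):
    """Return (lowest, highest) slices holding `percent` of the items each."""
    count = int(ceil(len(data) * percent / 100.0))
    low = data[:count]
    high = data[len(data) - count:]   # safe for count == 0, unlike data[-count:]
    return low, high

numbers = list(range(10))
low, high = split_extremes(numbers, 5)
empty_low, empty_high = split_extremes(numbers, 0)
print(low, high, empty_low, empty_high)
```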
On this line you are opening up a file called .out each time and writing to it.
with open(".out", "w") as outfile:
You need to make the ".out" unique for each i in args. you can achieve this by passing in a list as the args and the list will contain the file name and data.
def write_to_file(*args):
    for i in args:
        with open("%s.out" % i[0], "w") as outfile:
            outfile.write("%s" % i[1])
And pass in arguments like so...
write_to_file(["low_33",low_33],["low_5",low_5],["top_33",top_33],["top_5",top_5])
You are creating a file called '.out' and overwriting it each time.
def write_to_file(*args):
    for i in args:
        filename = i + ".out"
        contents = globals()[i]  # look the variable up by its name
        with open(filename, "w") as outfile:  # use the per-list name, not ".out"
            outfile.write("%s" %contents)
write_to_file("low_33", "low_5", "top_33", "top_5")
https://stackoverflow.com/a/6504497/3583980 (variable name from a string)
This will create low_33.out, low_5.out, top_33.out, top_5.out and their contents will be the lists stored in these variables.
Python/Numpy Problem. Final year Physics undergrad... I have a small piece of code that creates an array (essentially an n×n matrix) from a formula. I reshape the array to a single column of values, create a string from that, format it to remove extraneous brackets etc, then output the result to a text file saved in the user's Documents directory, which is then used by another piece of software. The trouble is above a certain value for "n" the output gives me only the first and last three values, with "...," in between. I think that Python is automatically abridging the final result to save time and resources, but I need all those values in the final text file, regardless of how long it takes to process, and I can't for the life of me find how to stop it doing it. Relevant code copied beneath...
import numpy as np; import os.path ; import os
'''
Create a single column matrix in text format from Gaussian Eqn.
'''
save_path = os.path.join(os.path.expandvars("%userprofile%"),"Documents")
name_of_file = 'outputfile' #<---- change this as required.
completeName = os.path.join(save_path, name_of_file+".txt")
matsize = 32
def gaussf(x,y): #defining gaussian but can be any f(x,y)
    pisig = 1/(np.sqrt(2*np.pi) * matsize) #first term
    sumxy = (-(x**2 + y**2)) #sum of squares term
    expden = (2 * (matsize/1.0)**2) # 2 sigma squared
    expn = pisig * np.exp(sumxy/expden) # and put it all together
    return expn
matrix = [[ gaussf(x,y) ]\
          for x in range(-matsize/2, matsize/2)\
          for y in range(-matsize/2, matsize/2)]
zmatrix = np.reshape(matrix, (matsize*matsize, 1)) # single column
string2 = (str(zmatrix).replace('[','').replace(']','').replace(' ', ''))
zbfile = open(completeName, "w")
zbfile.write(string2)
zbfile.close()
print completeName
num_lines = sum(1 for line in open(completeName))
print num_lines
Any help would be greatly appreciated!
Generally you should iterate over the array/list if you just want to write the contents.
zmatrix = np.reshape(matrix, (matsize*matsize, 1))
with open(completeName, "w") as zbfile: # with closes your files automatically
    for row in zmatrix:
        zbfile.writelines(map(str, row))
        zbfile.write("\n")
Output:
0.00970926751178
0.00985735189176
0.00999792646484
0.0101306077521
0.0102550302672
0.0103708481917
0.010477736974
0.010575394844
0.0106635442315
.........................
But using numpy we simply need to use tofile:
zmatrix = np.reshape(matrix, (matsize*matsize, 1))
# pass sep or you will get binary output
zmatrix.tofile(completeName,sep="\n")
Output is in the same format as above.
Calling str on the matrix gives you the same formatted output you get when you print it, so what you are writing to the file is that formatted, truncated output.
Considering you are using Python 2, using xrange would be more efficient than using range, which creates a list. Also, having multiple imports separated by semicolons is not recommended; you can simply:
import numpy as np, os.path, os
Also, variable and function names should use underscores: z_matrix, zb_file, complete_name etc.
You shouldn't need to fiddle with the string representations of numpy arrays. One way is to use tofile:
zmatrix.tofile('output.txt', sep='\n')
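To see the abbreviation and one more way around it, here is a short sketch using np.savetxt, which writes one row per line without going through str(). The 2000-element stand-in array, the temporary output path and the "%.12g" format are assumptions; it relies on NumPy's default print threshold of 1000 elements.

```python
import os
import tempfile

import numpy as np

# Stand-in for zmatrix: a single column with 2000 values
values = np.arange(2000, dtype=float).reshape(-1, 1)

# str() summarizes arrays above the print threshold with '...'
abbreviated = "..." in str(values)

# savetxt writes every row, one value per line for a single column
path = os.path.join(tempfile.mkdtemp(), "outputfile.txt")
np.savetxt(path, values, fmt="%.12g")

with open(path) as handle:
    num_lines = sum(1 for _ in handle)

print(abbreviated, num_lines)
```

If the string form is really needed, np.set_printoptions(threshold=...) raises the limit at which arrays are summarized, but writing the values directly is usually simpler.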