I wrote a function to write a file based on different inputs. The function works fine when I used it for individual part. However, when I used a for loop to do it, it does not work, I do not know why.
Below is my code
import controDictRead as CD
import numpy as np
Totallength=180000
x=np.arange(0.,Totallength,1.)
h=300*np.ones(len(x))
Relaxationzone=1000 # Relaxation zone width
#I=2
#x_input=1000
#y_start=-200
y_end=9
z=0.05
nPoints=4000
dx=20000
threshold=1000
x_input=np.zeros(int(np.floor(Totallength/dx)))
y_start=np.zeros(len(x_input))
I=np.zeros(len(x_input))
for i in range(len(x_input)):
if i==0:
x_input[i]=Relaxationzone
else:
x_input[i]=dx*i
if x_input[i]>=Totallength-Relaxationzone:
x_input[i]=Totallength-Relaxationzone
I[i]=i+1
y_start[i]=np.interp(x_input[i],x,h)
controlDict_new=CD.controlDicWritter(I,x_input[i],y_start[i],y_end,z,nPoints)
All I tried to do is to write a file called controlDict, It has a list of locations that has a format like the one below
location1
{
type uniform;
axis y;
start (1000 -300 0.05 );
end (1000 9 0.05 );
nPoints 3000;
}
All my controlDictRead function does is to find the location name (e.g. location1) I am interested in, and then replace the values (e.g. start, end, nPoints) to the one I inputted. This function works fine when I input these values one by one, for example
controlDict_new=CD.controlDicWritter(3,x_input[2],y_start[2],y_end,z,nPoints)
However, when I did it using a loop like the one shown at the beginning, it does not work, the loop stuck at the first location and kept writing values onto the first location throughout the loop, I am not sure why. Any suggestions?
It would be helpful to also see the source of the controDictRead module, but I have a suspicion what might be wrong:
Your code seems to be passing an entire array to controlDictWriter, but your example code appears to be passing a single integer. Is that intentional?
Related
Hello friends!
Summarization:
I got a ee.FeatureCollection containing around 8500 ee.Point-objects. I would like to calculate the distance of these points to a given coordinate, lets say (0.0, 0.0).
For this i use the function geopy.distance.distance() (ref: https://geopy.readthedocs.io/en/latest/#module-geopy.distance). As input the the function takes 2 coordinates in the form of 2 tuples containing 2 floats.
Problem: When i am trying to convert the coordinates in form of an ee.List to float, i always use the getinfo() function. I know this is a callback and it is very time intensive but i don't know another way to extract them. Long story short: To extract the data as ee.Number it takes less than a second, if i want them as float it takes more than an hour. Is there any trick to fix this?
Code:
fc_containing_points = ee.FeatureCollection('projects/ee-philadamhiwi/assets/Flensburg_100') #ee.FeatureCollection
list_containing_points = fc_containing_points.toList(fc_containing_points.size()) #ee.List
fc_containing_points_length = fc_containing_points.size() #ee.Number
for index in range(fc_containing_points_length.getInfo()): #i need to convert ee.Number to int
point_tmp = list_containing_points.get(i) #ee.ComputedObject
point = ee.Feature(point_tmp) #transform ee.ComputedObject to ee.Feature
coords = point.geometry().coordinates() #ee.List containing 2 ee.Numbers
#when i run the loop with this function without the next part
#i got all the data i want as ee.Number in under 1 sec
coords_as_tuple_of_ints = (coords.getInfo()[1],coords.getInfo()[0]) #tuple containing 2 floats
#when i add this part to the function it takes hours
PS: This is my first question, pls be patient with me.
I would use .map instead of your looping. This stays server side until you export the table (or possibly do a .getInfo on the whole thing)
fc_containing_points = ee.FeatureCollection('projects/eephiladamhiwi/assets/Flensburg_100')
fc_containing_points.map(lambda feature: feature.set("distance_to_point", feature.distance(ee.Feature(ee.Geometry.Point([0.0,0.0])))
# Then export using ee.batch.Export.Table.toXXX or call getInfo
(An alternative might be to useee.Image.paint to convert the target point to an image then, use ee.Image.distance to calculate the distance to the point (as an image), then use reduceRegions over the feature collection with all points but 1) you can only calculate distance to a certain distance and 2) I don't think it would be any faster.)
To comment on your code, you are probably aware loops (especially client side loops) are frowned upon in GEE (primarily for the performance reasons you've run into) but also note that any time you call .getInfo on a server side object it incurs a performance cost. So this line
coords_as_tuple_of_ints = (coords.getInfo()[1],coords.getInfo()[0])
Would take roughly double the time as this
coords_client = coords.getInfo()
coords_as_tuple_of_ints = (coords_client[1],coords_client[0])
Finally, you could always just export your entire feature collection to a shapefile (using ee.batch.Export.Table.... as above) and do all the operations using geopy locally.
My program is a function that converts numbers from one base to another. It takes three arguments: the initial value, the base of the initial value, then the base it is to be converted to.
The thing has several errors. For one, the thing won't accept any value that contains a letter for cnum. I don't know why. And I can't seem to figure out how to force the thing to recognize the argument 'cnum' as a string within the function call. I have to convert it into a function in the code itself.
Also, I can't get the second half, the part that converts the number to the final base, to work. Either it gives me an infinite loop (for some reason I can't figure out), or it doesn't do the complete calculation. This one, if I enter fconbase(100, 10, 12) Should convert 100 from base 10 to base 12. It only spits out 8. The answer should be 84.
Here's my entire function.
#delcaring variables
cnum=0 #number to be converted
cbase1=0 #base the number is written in
cbase2=0 #base the number will be converted to
cnumlen=0 #number of digits
digitNum=0 #used to fetch out each digit one by one in order
exp=0 #used to calculate position in result
currentDigit="blank" #stores the digit that's been pulled from the string
result=0 #stores the result of internal calculations
decimalResult=0 #stores cnum as a base 10 number
finalResult=0 #the final result of the conversion
def fconbase(cnum, cbase1, cbase2):
#converts number into base 10, because the math must be done in base 10
#resets variables used in calculations
exp=0
result=0
decimalResult=0
currentDigit="blank"
cnumlen=len(str(cnum)) #finds length of cnum, stays constant
digitNum=cnumlen #sets starting placement
while exp<cnumlen:
currentDigit=str(cnum)[digitNum-1:digitNum]
#the following converts letters into their corresponding integers
if currentDigit=="a" or currentDigit=="A":
currentDigit="10"
if currentDigit=="b" or currentDigit=="B":
currentDigit="11"
if currentDigit=="c" or currentDigit=="C":
currentDigit="12"
if currentDigit=="d" or currentDigit=="D":
currentDigit="13"
if currentDigit=="e" or currentDigit=="E":
currentdigit="14"
if currentDigit=="f" or currentDigit=="F":
currentDigit="15"
result=int(currentDigit)
decimalResult=decimalResult+result*(cbase1**exp)
exp=exp+1
digitNum=digitNum-1
#this part converts the decimal number into the target base
#resetting variables again
exp=0
result=0
finalResult=""
while int(decimalResult)>(cbase2**exp):
exp=exp+1
exp=exp-1
while int(decimalResult)/cbase2**exp!=int(decimalResult):
result=int(decimalResult/(cbase2**exp))
if result==10:
result="a"
if result==11:
result="b"
if result==12:
result="c"
if result==13:
result="d"
if result==14:
result="e"
if result==15:
result="f"
finalResult=str(finalResult)+str(result)
decimalResult=decimalResult%cbase2**exp
exp=exp+1
print(finalResult)
Here is what is supposed to happen in the latter half of the equation:
The program solves cbase2^exp. Exp starts at 0. If that number is less than the decimalResult, then it increases the exp(onent) by 1 and tries again until it results in a number that's greater than the decimalResult.
Then, it divides the decimalResult by cbase2^exp. It converts numbers between 10 and 15 as letters (for bases higher than 10), then appends the result to the final result. It should be concatenating the results together to form the final result that gets printed. I don't understand why its not doing that.
Why does it not generate the right result and why can't I enter a string into the function call?
Without going into specific problems with your code, which as you stated are many, I'll give a brief answer to the actual question in the title
What could be wrong and how can I find [the errors in my code]?
Rather than treating your code as one big complicated function that you have to stare at and understand all at once (I can rarely hold more than 10 lines of code in my own internal brain cache at once), try to break it down into smaller pieces "first I do this and expect this result. Then I take that result and do this to it, and expect another result."
From your description of the problem it seems like you're already thinking that way, but you still dumped this big chunk of code and seemed to struggle with figuring out exactly where the problem is. A lot of beginners will write some big pile of code, and then treat it as a black box while testing it. Like "I'm not getting the right answer and I don't know where the problem begins." This is where learning good debugging skills is crucial.
I would first break things into smaller pieces to just try out at the interactive Python prompt. Put in dummy values for different variables and make sure small snippets of code (1 to 5 lines or so, small small enough that it's easy to reason about) do exactly what you expect to do with different values of the variables.
If that doesn't help, then for starters the tried and true method, often for beginners and advanced developers alike, is to riddle your code with print statements. In as many places as you think is necessary, put a statement to print the values of one or more variables. Like print("exp = %s; result = %s" % (exp, result). Put something this in as many places as you need to trace the values of some variables through the execution. See where it starts to give answers that don't make sense.
Sometimes this is hard to do though. You might not be able to guess the most effective places to put print statements, or even what's important to print. In cases like this (and IMO in most cases) it is more effective to use an interactive debugger like Python's built in pdb. There are many good resources to learn pdb but the basics shouldn't take too long to get down and will save you a whole lot of headache.
pdb will run your code line-by-line, stopping after each line (and in loops it will step through each loop through the loop), allowing you to examine the contents of each variable before advancing to the next line. This gives you full power to check that each part of your code does or doesn't do what you expect, and should help you pinpoint numerous problem areas.
You should use the exp you find in the first step:
while int(decimalResult)>=(cbase2**exp):
exp=exp+1
exp -= 1
while exp >= 0:
...
finalResult=str(finalResult)+str(result)
decimalResult=decimalResult%cbase2**exp
exp -= 1
First of all, the entire first part of the code is not needed, as the int function does it for you. Instead of all that, you can do this.
int(cnum, base=cbase1)
This converts cnum from cbase1 to base 10.
The second part might go to an infinite loop because at the bottom, it says
exp = exp + 1
When it should say
exp = exp - 1
Since you want to go from (for example) 5^2 to 5^0.
The resulting not having the last digit is because it breaks out of the loop at exp = 0.
It doesn't actually add the digit to the result. A simple fix for that is
finalResult = str(finalResult) + str(decimalResult)
I have some .sav files that I want to check for bad data. What I mean by bad data is irrelevant to the problem. I have written a script in python using the spss module to check the cases and then delete them if they are bad. I do that within a datastep by defining a dataset object and then getting its case list. I then use
del datasetObj.cases[k]
to delete the problematic cases within the datastep.
Here is my problem:
Say I have a data set foo.sav and it is the active data set in spss, then I can run something like:
BEGIN PROGRAM PYTHON.
import spss
spss.StartDataStep()
datasetObj = spss.Dataset()
caselist = datasetObj.cases
del caselist[k]
spss.EndDataStep()
END PROGRAM.
from within the spss client and it will delete the case k from the data set foo.sav. But, if I run something like the following using the directory of foo.sav as the working directory:
import os, spss
pathname = os.curdir()
foopathname = os.path.join(pathname, 'foo.sav')
spss.Submit("""
GET FILE='%(foopathname)s'.
DATASET NAME file1.
DATASET ACTIVATE file1.
""" %locals())
spss.StartDataStep()
datasetObj = spss.Dataset()
caselist = datasetObj.cases
del caselist[3]
spss.EndDataStep()
from command line, then it doesn't delete the case k. Similar code which gets values will work fine. E.g.,
print caselist[3]
will print case k (when it is in the data step). I can even change the values for the various entries of a case. But it will not delete cases. Any ideas?
I am new to python and spss, so there may be something that I am not seeing which is obvious to others; hence why I am asking the question.
Your first piece of code did not work for me. I adjusted it as follows to get it working:
BEGIN PROGRAM PYTHON.
import spss
spss.StartDataStep()
datasetObj = spss.Dataset()
del datasetObj.cases[k]
spss.EndDataStep()
END PROGRAM.
Notice that, in your code, caselist is just a list, containing values taken from the datasetObj in SPSS. The attribute .cases belongs to datasetObj.
With spss.Submit, you can also delete cases (or actually, not select them) using the SPSS command SELECT IF. For example, if your file has a variable (column) named age, with values ranging from 0 to 100, you can delete all cases with an age lower than (in SPSS: lt or <) 25 using:
BEGIN PROGRAM PYTHON.
import spss
spss.Submit("""
SELECT IF age lt 25.
""")
END PROGRAM.
Don't forget to add some code to save the edited file.
caselist is not actually a regular list containing the dataset values. Although its interface is the list interface, it actually works directly with the dataset, so it does not contain a list of values. It just accesses operations on the SPSS side to retrieve, change, or delete values. The most important difference is that since Statistics is not keeping the data in memory, the size of the caselist is not limited by memory.
However, if you are trying to iterate over the cases with a loop using
range(spss.GetCaseCount())
and deleting some, the loop will eventually fail, because the actual case count reflects the deletions, but the loop limit doesn't reflect that. And datasetObj.cases[k] might not be the case you expect if an earlier case has been deleted. So you need to keep track of the deletions and adjust the limit or the k value appropriately.
HTH
I'm using influxdb in my project and I'm facing an issue with query when multiple points are written at once
I'm using influxdb-python to write 1000 unique points to influxdb.
In the influxdb-python there is a function called influxclient.write_points()
I have two options now:
Write each point once every time (1000 times) or
Consolidate 1000 points and write all the points once.
The first option code looks like this(pseudo code only) and it works:
thousand_points = [0...9999
while i < 1000:
...
...
point = [{thousand_points[i]}] # A point must be converted to dictionary object first
influxclient.write_points(point, time_precision="ms")
i += 1
After writing all the points, when I write a query like this:
SELECT * FROM "mydb"
I get all the 1000 points.
To avoid the overhead added by every write in every iteration, I felt like exploring writing multiple points at once. Which is supported by the write_points function.
write_points(points, time_precision=None, database=None,
retention_policy=None, tags=None, batch_size=None)
Write to multiple time series names.
Parameters: points (list of dictionaries, each dictionary represents
a point) – the list of points to be written in the database
So, what I did was:
thousand_points = [0...999]
points = []
while i < 1000:
...
...
points.append({thousand_points[i]}) # A point must be converted to dictionary object first
i += 1
influxclient.write_points(points, time_precision="ms")
With this change, when I query:
SELECT * FROM "mydb"
I only get 1 point as the result. I don't understand why.
Any help will be much appreciated.
You might have a good case for a SeriesHelper
In essence, you set up a SeriesHelper class in advance, and every time you discover a data point to add, you make a call. The SeriesHelper will batch up the writes for you, up to bulk_size points per write
I know this has been asked well over a year ago, however, in order to publish multiple data points in bulk to influxdb each datapoint needs to have a unique timestamp it seems, otherwise it will just be continously overwritten.
I'd import a datetime and add the following to each datapoint within the for loop:
'time': datetime.datetime.now().strftime("%Y-%m-%dT%H:%M:%SZ")
So each datapoint should look something like...
{'fields': data, 'measurement': measurement, 'time': datetime....}
Hope this is helpful for anybody else who runs into this!
Edit: Reading the docs show that another unique identifier is a tag, so you could instead include {'tag' : i} (supposedly each iteration value is unique) if you wish to specify time. (However this I haven't tried)
the following is code I have written that tries to open individual files, which are long strips of data and read them into an array. Essentially I have files that run over 15 times (24 hours to 360 hours), and each file has an iteration of 50, hence the two loops. I then try to open the files into an array. When I try to print a specific element in the array, I get the error "'file' object has no attribute 'getitem'". Any ideas what the problem is? Thanks.
#!/usr/bin/python
############################################
#
import csv
import sys
import numpy as np
import scipy as sp
#
#############################################
level = input("Enter a level: ");
LEVEL = str(level);
MODEL = raw_input("Enter a model: ");
NX = 360;
NY = 181;
date = 201409060000;
DATE = str(date);
#############################################
FileList = [];
data = [];
for j in range(1,51,1):
J = str(j);
for i in range(24,384,24):
I = str(i);
fileName = '/Users/alexg/ECMWF_DATA/DAT_FILES/'+MODEL+'_'+LEVEL+'_v_'+J+'_FT0'+I+'_'+DATE+'.dat';
FileList.append(fileName);
fo = open(fileName,"rb");
data.append(fo);
fo.close();
print data[1][1];
print FileList;
EDITED TO ADD:
Below, find the CORRECT array that the python script should be producing (sorry it wont let me post this inline yet):
http://i.stack.imgur.com/ItSxd.png
The problem I now run into, is that the first three values in the first row of the output matrix are:
-7.090874
-7.004936
-6.920952
These values are actually the first three values of the 11th row in the array below, which is the how it should look (performed in MATLAB). The next three values the python script outputs (as what it believes to be the second row) are:
-5.255577
-5.159874
-5.064171
These values should be found in the 22nd row. In other words, python is placing the 11th row of values in the first position, the 22nd in the second and so on. I don't have a clue as to why, or where in the code I'm specifying it do this.
You're appending the file objects themselves to data, not their contents:
fo = open(fileName,"rb");
data.append(fo);
So, when you try to print data[1][1], data[1] is a file object (a closed file object, to boot, but it would be just as broken if still open), so data[1][1] tries to treat that file object as if it were a sequence, and file objects aren't sequences.
It's not clear what format your data are in, or how you want to split it up.
If "long strips of data" just means "a bunch of lines", then you probably wanted this:
data.append(list(fo))
A file object is an iterable of lines, it's just not a sequence. You can copy any iterable into a sequence with the list function. So now, data[1][1] will be the second line in the second file.
(The difference between "iterable" and "sequence" probably isn't obvious to a newcomer to Python. The tutorial section on Iterators explains it briefly, the Glossary gives some more information, and the ABCs in the collections module define exactly what you can do with each kind of thing. But briefly: An iterable is anything you can loop over. Some iterables are sequences, like list, which means they're indexable collections that you can access like spam[0]. Others are not, like file, which just reads one line at a time into memory as you loop over it.)
If, on the other hand, you actually imported csv for a reason, you more likely wanted something like this:
reader = csv.reader(fo)
data.append(list(reader))
Now, data[1][1] will be a list of the columns from the second row of the second file.
Or maybe you just wanted to treat it as a sequence of characters:
data.append(fo.read())
Now, data[1][1] will be the second character of the second file.
There are plenty of other things you could just as easily mean, and easy ways to write each one of them… but until you know which one you want, you can't write it.