How to fix negative values in log? - python

So, I am getting the data from a txt file and I want to get specific data within the whole set. In the code, I am trying to grab it by specifying which indexes and which frequencies are being used for those indexes. But my log is showing a negative value and I don't know how to fix that. Code is below, thanks!
indexes = [9,10,11,12,13]
frequenciesmh = [151,610,1400,4860,18000]
frequenciesgh = [i*10**-3 for i in frequenciesmh]
bigclusterallfluxes = bigcluster[indexes]
bigclusterlogflux151mhandredshift = [i[indexes] for i in bigcluster]
shiftedlogflux151mh = [np.interp(np.log10((151*10**-3)*i[0]),np.log10(frequenciesgh),i[1:])
                       for i in bigclusterlogflux151mhandredshift]
shiftflux151mh = [10**i for i in shiftedlogflux151mh]
bigclusterflux151mhandredshift = np.array(list(zip(shiftflux151mh,np.transpose(bigcluster)[9])))

I don't know exactly what you are trying to fix, but I would definitely NOT simply flip the sign of the negative values: a negative log corresponds to a value below 1, so making it positive changes the value itself (if you know some maths you will see that 1/16 would become 16, while 16 stays 16).
What you probably want, as you are working with frequencies (which always lie between 0 and 1 if you normalize them by dividing each one by the sum of all of them, so their logarithm is always less than or equal to 0), is to take the negative log10 of each value so that everything is positive. That is quite a common quantity to work with: 1 corresponds to 1/10, 2 to 1/100, and so on (in genetics, at least, I believe these are called Phred values).
In summary: always use the minus log, not the log.
-math.log10(0.0001)
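As a quick check of the point about signs, here is a minimal sketch using only the standard library. It assumes the GHz values are the question's MHz frequencies multiplied by 10**-3; the variable names are made up for illustration. It shows that a negative log10 only means the value is below 1, and that raising 10 to that power recovers the original value.

```python
import math

frequencies_mhz = [151, 610, 1400, 4860, 18000]
frequencies_ghz = [f * 1e-3 for f in frequencies_mhz]

for f in frequencies_ghz:
    log_f = math.log10(f)
    # log_f is negative exactly when f < 1; 10 ** log_f gives f back.
    print(f, log_f, 10 ** log_f)

# Negative log10 of the values below 1: these are all positive.
neg_logs = [-math.log10(f) for f in frequencies_ghz if f < 1]
print(neg_logs)
```

So a negative entry in the log array is not an error in itself, as long as the final 10**i step is applied to the untouched log values.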

The abs() function is what you are looking for.

Related

Combining functions to produce a desired integer output

I'm not sure if this question belongs on stackoverflow or math, but I'll give stackoverflow a shot.
I'm attempting to create an algorithm / function that will allow me to solve a puzzle, but I am simply unable to. The puzzle is as stated.
Let a unit be a function that accepts an integer i between 0 and 15.
A unit can add or subtract any number in the range 0 - 15 to/from i.
Additionally, instead of adding or subtracting a number to/from i, a unit can also contain a number from 0-15 and subtract i from that.
A unit can only perform 2 operations on a number, and the operation producing the largest value will be the output of the unit.
Values only go from 0 - 15, so 9 - 15 = 0 and 13 + 5 = 15.
We may combine units together to produce a more complex result.
The first unit may only accept numbers ranging from 0 - 9.
In my examples I will string together 3 units.
This is a problem that is unrelated to coding, but it seems that I need a program to figure out possible solutions. I've attempted to create a brute force algorithm to find solutions, but I've been unable to do so, as I'm not that great with coding.
For example, a problem could be:
For values 1 and 4, let the output be 0. For all others, e.g. 0, 2, 3, 5, 6, 7, 8, 9, the output must be greater than 0.
A solution here might be:
def unit1(input):
    return max(5 - input, input)
def unit2(input):
    return max(14 - input, input)
def unit3(input):
    return max(10 - input, input - 10)
print(unit3(unit2(unit1(4))))
Another example might be:
For values 4, 5, 6 and 8 the output must be 3 or greater. For all others, e.g. 0, 1, 2, 3, 7, 9, the output must be less than 3.
A solution here might be:
def unit1(input):
    return max(4 - input, input - 4)
def unit2(input):
    return max(2, input)
def unit3(input):
    return max(1 - input, input - 6)
print(unit3(unit2(unit1(5))))
Given an example as the two stated above, is there a general algorithm / formula I can use to find my desired output?
Additionally, is it possible to solve the problems above using only 2 units?
Please do let me know if I need to elaborate on something, and know that your help is extremely appreciated!
There seems to be basically two kinds of things you have to do: map inputs that should be handled the same way to contiguous ranges, and then move contiguous ranges to the right place.
max(A-x,x-B) is the only kind of unit that can map non-contiguous ranges together. It has limitations: It always maps 2 inputs onto one output, and you've gotta be careful never to map two inputs that have to be handled differently onto the same output.
In terms of what gets mapped together, you only need one parameter, since max(x,A-x) handles all cases. You can try all 16 possibilities to see if any of them help. Sometimes you may need to do a saturating add before the max to collapse inputs at the top or bottom of the range.
In your first example, you need to map 1 and 4 together.
In max(x,A-x), we need 4 = A-1.
Solving that we get A=5, so we start with
max(x, 5-x)
That maps 4 and 1 to 4, and everything else to other values.
Now we need to combine the ranges above and below 4. Everything less than 4 has to map to something higher than 4. We solve 5 = A-3 to get A = 8:
max(x, 8-x)
Now the ranges of things that need to be handled the same way are contiguous, so we just need to move them to the right place. We have values >=4 and we need 4->0. We could add a subtraction unit, but it's shorter just to shift the previous max by subtracting 4 from both cases. We're left with a final solution
max(x, 5-x)
max(x-4, 4-x)
You haven't really defined all the possible questions you might be asked, but it looks like they can all be solved by this two step combine-and-shift process. Sometimes there will be no solution because you can't combine ranges in any valid way with max.
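The two-unit answer for the first example can be checked mechanically. This is a small sketch, assuming every intermediate result saturates to the range 0 to 15 as the question describes; the helper names clamp, unit_a and unit_b are made up for illustration.

```python
def clamp(value):
    """Saturate to the puzzle's 0..15 range (9 - 15 -> 0, 13 + 5 -> 15)."""
    return max(0, min(15, value))

def unit_a(x):
    # max(x, 5 - x): maps 1 and 4 onto the same value (4).
    return max(clamp(x), clamp(5 - x))

def unit_b(x):
    # max(x - 4, 4 - x): sends 4 to 0 and every other value above 0.
    return max(clamp(x - 4), clamp(4 - x))

# The first unit only receives inputs 0..9.
outputs = {x: unit_b(unit_a(x)) for x in range(10)}
print(outputs)
```

Inputs 1 and 4 come out as 0 and every other input comes out greater than 0, which is what the first example asks for.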
I think your missing piece you need to proceed is a higher-order function. That's a function that returns another function. Something like:
def createUnit(sub, add):
    def unit(input):
        return max(sub - input, input + add)
    return unit
You can use it like:
unit1 = createUnit(5, 0)
unit2 = createUnit(14, 0)
unit3 = createUnit(10, -10)
Your first example can be solved with two units, but your second example can't.
I think the key to solving this efficiently is to work backwards from the output to the input, but I haven't spent enough time on it to figure out precisely how to do so.
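Since the question asked for a brute-force search, here is one possible sketch. It rests on my reading of the rules, which is an assumption: a unit is the larger of any two operations, each operation is "add k", "subtract k" or "k minus input" with k from 0 to 15, and every result saturates to 0..15. The names unit_tables and find_chain are made up for illustration. Each unit is stored as a 16-entry lookup table so that duplicate units are only tried once.

```python
from itertools import product

def clamp(value):
    return max(0, min(15, value))

def unit_tables():
    """Every distinct unit, stored as a 16-entry lookup table (index = input)."""
    ops = set()
    for k in range(16):
        ops.add(tuple(clamp(x + k) for x in range(16)))  # add k
        ops.add(tuple(clamp(x - k) for x in range(16)))  # subtract k
        ops.add(tuple(clamp(k - x) for x in range(16)))  # subtract input from k
    # A unit outputs the larger of its two operations.
    return sorted({tuple(map(max, a, b)) for a in ops for b in ops})

def find_chain(accepts, n_units, inputs=range(10)):
    """Return the first chain of n_units tables satisfying accepts(x, out), or None."""
    for chain in product(unit_tables(), repeat=n_units):
        for x in inputs:
            out = x
            for table in chain:
                out = table[out]
            if not accepts(x, out):
                break
        else:
            return chain
    return None

def first_example(x, out):
    # Inputs 1 and 4 must give 0; all other inputs must give more than 0.
    return out == 0 if x in (1, 4) else out > 0

chain = find_chain(first_example, 2)
print(chain)
```

The result is a tuple of lookup tables rather than formulas, so chain[0][x] is the first unit's output for input x. The search space grows very quickly with n_units, so three or more units would need pruning or the work-backwards idea mentioned above.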

SHA Hashing for training/validation/testing set split

Following is a small snippet from the full code
I am trying to understand the logical process of this methodology of split.
SHA1 encoding is 40 characters in hexadecimal. What kind of probability has been computed in the expression ?
What is the reason for (MAX_NUM_IMAGES_PER_CLASS + 1) ? Why add 1 ?
Does setting different values to MAX_NUM_IMAGES_PER_CLASS have an effect on the split quality ?
How good a quality of split would we get out of this? Is this a recommended way of splitting datasets?
# We want to ignore anything after '_nohash_' in the file name when
# deciding which set to put an image in, the data set creator has a way of
# grouping photos that are close variations of each other. For example
# this is used in the plant disease data set to group multiple pictures of
# the same leaf.
hash_name = re.sub(r'_nohash_.*$', '', file_name)
# This looks a bit magical, but we need to decide whether this file should
# go into the training, testing, or validation sets, and we want to keep
# existing files in the same set even if more files are subsequently
# added.
# To do that, we need a stable way of deciding based on just the file name
# itself, so we do a hash of that and then use that to generate a
# probability value that we use to assign it.
hash_name_hashed = hashlib.sha1(compat.as_bytes(hash_name)).hexdigest()
percentage_hash = ((int(hash_name_hashed, 16) %
                    (MAX_NUM_IMAGES_PER_CLASS + 1)) *
                   (100.0 / MAX_NUM_IMAGES_PER_CLASS))
if percentage_hash < validation_percentage:
    validation_images.append(base_name)
elif percentage_hash < (testing_percentage + validation_percentage):
    testing_images.append(base_name)
else:
    training_images.append(base_name)
result[label_name] = {
    'dir': dir_name,
    'training': training_images,
    'testing': testing_images,
    'validation': validation_images,
}
This code is simply distributing file names “randomly” (but reproducibly) over a number of bins and then grouping the bins into just the three categories. The number of bits in the hash is irrelevant (so long as it’s “enough”, which is probably about 35 for this sort of work).
Reducing modulo n+1 produces a value on [0,n], and multiplying that by 100/n obviously produces a value on [0,100], which is being interpreted as a percentage. n being MAX_NUM_IMAGES_PER_CLASS is meant to control the rounding error in the interpretation to be no more than “one image”.
This strategy is reasonable, but looks a bit more sophisticated than it is (since there is still rounding going on, and the remainder introduces a bias—although with numbers this large it is utterly unobservable). You could make it simpler and more accurate by simply precalculating ranges over the whole space of 2^160 hashes for each class and just checking the hash against the two boundaries. That still notionally involves rounding, but with 160 bits it’s only that intrinsic to representing decimals like 31% in floating point.
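To see the binning in action, here is a self-contained sketch of the same idea. It is not the original script: which_set is a made-up helper, str.encode stands in for compat.as_bytes, and the value of MAX_NUM_IMAGES_PER_CLASS is an assumption (any large n behaves the same way).

```python
import hashlib
import re

MAX_NUM_IMAGES_PER_CLASS = 2 ** 27 - 1  # assumed value; any large n works

def which_set(file_name, validation_percentage, testing_percentage):
    # Ignore everything after '_nohash_' so variants of one photo stay together.
    hash_name = re.sub(r'_nohash_.*$', '', file_name)
    digest = hashlib.sha1(hash_name.encode('utf-8')).hexdigest()
    # Reduce the 160-bit hash to [0, n], then scale it to [0, 100].
    percentage_hash = ((int(digest, 16) % (MAX_NUM_IMAGES_PER_CLASS + 1)) *
                       (100.0 / MAX_NUM_IMAGES_PER_CLASS))
    if percentage_hash < validation_percentage:
        return 'validation'
    elif percentage_hash < testing_percentage + validation_percentage:
        return 'testing'
    return 'training'

# 1000 hypothetical leaves, two photos of each.
names = ['leaf%d_nohash_%d.jpg' % (i, j) for i in range(1000) for j in range(2)]
counts = {'training': 0, 'testing': 0, 'validation': 0}
for name in names:
    counts[which_set(name, 10, 10)] += 1
print(counts)
```

Because the assignment depends only on the file name, rerunning it after adding more files never moves an existing file to a different set, and both photos of the same leaf always land together.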

Python - Avoiding zero division errors when taking the log of very small values (several hundred decimals)

I am working with very small p-values (several hundred decimals) and I am trying to detect the smallest one in the list. It seems like Python detects many of them as zero so I get a zero division error when I log.
To avoid this, I have written this code:
smallest_val = min(np.array(p_value)[np.array(p_value) > 0])
for i in range(len(p_value)):
    if p_value[i] == 0:
        p_value[i] = smallest_val
p_value_log = []
for i in p_value:
    b = log(i)
    p_value_log.append(b)
Of course, this does not solve my problem as several small p-values are then equal to smallest_val and I can't identify the smallest. Any idea on the best way to go about this?
Instead of replacing 0's with smallest_val, replace them with smallest_val/2 or something else between smallest_val and 0, so that you can still identify your replacements.
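A minimal sketch of that suggestion, assuming the p-values are ordinary Python floats in a list; log_p_values is a made-up helper name. Zeros are replaced by half of the smallest positive value, so they still sort below every value that did not underflow.

```python
import math

def log_p_values(p_values):
    positive = [p for p in p_values if p > 0]
    if not positive:
        raise ValueError("every p-value underflowed to zero")
    # Anything strictly between 0 and the smallest positive value will do.
    floor = min(positive) / 2
    if floor == 0.0:
        # The smallest positive value was already at the float limit.
        raise ValueError("cannot represent a value below the smallest p-value")
    return [math.log(p if p > 0 else floor) for p in p_values]

logs = log_p_values([1e-300, 0.0, 0.05])
print(logs)
```

All zeros still share one replacement value, so this only separates "underflowed" from "did not underflow". Telling the underflowed values apart would need them to be computed in log space, or with higher precision, in the first place.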

python - rescale a value between two numerical ranges

I am trying to convert a value, comprised in the range of two integers, to another integer comprised in an other given range. I want the output value to be the corresponding value of my input integer (respecting the proportions). To be more clear, here is an example of the problem I'd like to solve:
Suppose I have three integers, let's say 10, 28 and 13, that are comprised in the range (10, 28). 10 is the minimum possible value, 28 the maximum possible value.
I want, as output, integers converted to the 'corresponding' number in the range (5, 20). In this precise example, 10 would be converted to 5, 28 to 20, and 13 to a value between 5 and 20, rescaled to keep the proportions intact. Afterwards, this value is converted to an integer.
How is it possible to achieve such 'rescaling' in python ? I tried the usual calculation (value/max of first range)*max of second range but it gives wrong values except in rare cases.
At first I had problems with the division with int in Python, but after correcting this, I still have wrong values. For instance, using the values given in the example, int((float(10)/28)*20) will give me 7 as result, where it should return 5 because it's the minimum possible value in the first range.
I feel like it is a bit obvious (in terms of logic and math) and I am missing something.
If you are getting wrong results, you are likely using Python 2, where dividing one integer by another always yields an integer, so you will get lots of rounding errors and zeros when it comes to scaling factors.
Python 3 corrected this so that division returns a float. In Python 2 the workaround is either to put from __future__ import division on the first line of your code (preferred), or to explicitly convert at least one of the division operands to float on every division.
from __future__ import division
def renormalize(n, range1, range2):
    delta1 = range1[1] - range1[0]
    delta2 = range2[1] - range2[0]
    return (delta2 * (n - range1[0]) / delta1) + range2[0]
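Applied to the numbers from the question, the function gives the expected endpoints. The sketch below repeats it so that it runs on its own under Python 3, where / is already true division. The key difference from the question's (value/max)*max attempt is that the minimum of each range is subtracted before scaling and added back afterwards.

```python
def renormalize(n, range1, range2):
    delta1 = range1[1] - range1[0]
    delta2 = range2[1] - range2[0]
    # Shift to zero, scale by the ratio of the range widths, shift back.
    return (delta2 * (n - range1[0]) / delta1) + range2[0]

for value in (10, 13, 28):
    scaled = renormalize(value, (10, 28), (5, 20))
    print(value, scaled, int(scaled))
```

For 13 the scaled value is 7.5, because 13 sits 3/18 of the way through the first range and 3/18 of 15 is 2.5. Note that int() truncates toward zero, so use round() instead if you want the nearest integer.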

In what contexts do programming languages make real use of an Infinity value?

So in Ruby there is a trick to specify infinity:
1.0/0
=> Infinity
I believe in Python you can do something like this
float('inf')
These are just examples though, I'm sure most languages have infinity in some capacity. When would you actually use this construct in the real world? Why would using it in a range be better than just using a boolean expression? For instance
(0..1.0/0).include?(number) == (number >= 0) # True for all values of number
=> true
To summarize, what I'm looking for is a real world reason to use Infinity.
EDIT: I'm looking for real world code. It's all well and good to say this is when you "could" use it, when have people actually used it.
Dijkstra's algorithm typically assigns infinity as the initial distance to every node in a graph. This doesn't have to be "infinity", just some arbitrarily large constant, but in Java I typically use Double.POSITIVE_INFINITY. I assume Ruby could be used similarly.
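For illustration, here is a compact Python sketch of Dijkstra's algorithm that uses float('inf') as the starting distance. The graph format (a dict of adjacency lists) and the sample graph are assumptions made for the example.

```python
import heapq

def dijkstra(graph, source):
    """graph: {node: [(neighbour, weight), ...]} with non-negative weights."""
    # Every node starts out "infinitely far away" until a path is found.
    dist = {node: float('inf') for node in graph}
    dist[source] = 0
    queue = [(0, source)]
    while queue:
        d, node = heapq.heappop(queue)
        if d > dist[node]:
            continue  # stale queue entry, a shorter path was already found
        for neighbour, weight in graph[node]:
            candidate = d + weight
            if candidate < dist[neighbour]:
                dist[neighbour] = candidate
                heapq.heappush(queue, (candidate, neighbour))
    return dist

graph = {
    'a': [('b', 7), ('c', 2)],
    'b': [('d', 1)],
    'c': [('b', 3)],
    'd': [],
    'e': [],  # unreachable from 'a'
}
distances = dijkstra(graph, 'a')
print(distances)
```

The comparison candidate < dist[neighbour] needs no special case for unvisited nodes, and an unreachable node such as 'e' simply keeps its infinite distance.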
Off the top of my head, it can be useful as an initial value when searching for a minimum value.
For example:
min = float('inf')
for x in somelist:
    if x<min:
        min=x
Which I prefer to setting min initially to the first value of somelist
Of course, in Python, you should just use the min() built-in function in most cases.
There seems to be an implied "Why does this functionality even exist?" in your question. And the reason is that Ruby and Python are just giving access to the full range of values that one can specify in floating point form as specified by IEEE.
This page seems to describe it well:
http://steve.hollasch.net/cgindex/coding/ieeefloat.html
As a result, you can also have NaN (Not-a-number) values and -0.0, while you may not immediately have real-world uses for those either.
In some physics calculations you can normalize irregularities (i.e., infinite quantities) of the same order against each other, canceling them both and allowing an approximate result to come through.
When you deal with limits, calculations like (infinity / infinity) approaching a finite number can be carried out. It's useful for the language to have the ability to override the regular divide-by-zero error.
Use Infinity and -Infinity when implementing a mathematical algorithm calls for it.
In Ruby, Infinity and -Infinity have nice comparative properties so that -Infinity < x < Infinity for any real number x. For example, Math.log(0) returns -Infinity, extending to 0 the property that x > y implies that Math.log(x) > Math.log(y). Also, Infinity * x is Infinity if x > 0, -Infinity if x < 0, and 'NaN' (not a number; that is, undefined) if x is 0.
For example, I use the following bit of code in part of the calculation of some log likelihood ratios. I explicitly reference -Infinity to define a value even if k is 0 or n AND x is 0 or 1.
Infinity = 1.0/0.0
def Similarity.log_l(k, n, x)
  unless x == 0 or x == 1
    return k * Math.log(x.to_f) + (n-k) * Math.log(1.0-x)
  end
  -Infinity
end
Alpha-beta pruning
I use it to specify the mass and inertia of a static object in physics simulations. Static objects are essentially unaffected by gravity and other simulation forces.
In Ruby, infinity can be used to implement lazy lists. Say I want N numbers starting at 200 which get successively larger by 100 units each time:
Inf = 1.0 / 0.0
(200..Inf).step(100).take(N)
More info here: http://banisterfiend.wordpress.com/2009/10/02/wtf-infinite-ranges-in-ruby/
I've used it for cases where you want to define ranges of preferences / allowed.
For example in 37signals apps you have like a limit to project number
Infinity = 1 / 0.0
FREE = 0..1
BASIC = 0..5
PREMIUM = 0..Infinity
then you can do checks like
if PREMIUM.include? current_user.projects.count
  # do something
end
I used it for representing camera focus distance and to my surprise in Python:
>>> float("inf") is float("inf")
False
>>> float("inf") == float("inf")
True
I wonder why is that.
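The difference comes from what the two operators test: is checks whether both sides are the same object, and each float("inf") call may build a new float object, while == compares the values. A short sketch of the checks that are safe to rely on:

```python
import math

a = float('inf')
b = float('inf')

print(a == b)         # compares values, so two infinities are equal
print(math.isinf(a))  # explicit test for positive or negative infinity
print(a > 10 ** 308)  # infinity compares greater than any finite number
```

For floats in general, == (or math.isinf when you specifically mean infinity) is the right test; is should be kept for identity checks such as x is None.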
I've used it in the minimax algorithm. When I'm generating new moves, if the min player wins on that node then the value of the node is -∞. Conversely, if the max player wins then the value of that node is +∞.
Also, if you're generating nodes/game states and then trying out several heuristics you can set all the node values to -∞/+∞ which ever makes sense and then when you're running a heuristic its easy to set the node value:
node_val = -∞
node_val = max(heuristic1(node), node_val)
node_val = max(heuristic2(node), node_val)
node_val = max(heuristic2(node), node_val)
I've used it in a DSL similar to Rails' has_one and has_many:
has 0..1 :author
has 0..INFINITY :tags
This makes it easy to express concepts like Kleene star and plus in your DSL.
I use it when I have a Range object where one or both ends need to be open
I've used symbolic values for positive and negative infinity in dealing with range comparisons to eliminate corner cases that would otherwise require special handling:
Given two ranges A=[a,b) and C=[c,d) do they intersect, is one greater than the other, or does one contain the other?
A > C iff a >= d
A < C iff b <= c
etc...
If you have values for positive and negative infinity that respectively compare greater than and less than all other values, you don't need to do any special handling for open-ended ranges. Since floats and doubles already implement these values, you might as well use them instead of trying to find the largest/smallest values on your platform. With integers, it's more difficult to use "infinity" since it's not supported by hardware.
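As an illustration of that point, here is a small Python sketch with half-open ranges stored as (start, end) tuples. The function names are made up for the example. Open-ended ranges are written with math.inf and go through exactly the same comparisons as bounded ones.

```python
import math

def intersects(a, c):
    """True if the half-open ranges [a0, a1) and [c0, c1) overlap."""
    return a[0] < c[1] and c[0] < a[1]

def contains(outer, inner):
    """True if inner lies entirely within outer."""
    return outer[0] <= inner[0] and inner[1] <= outer[1]

everything = (-math.inf, math.inf)
from_five_up = (5, math.inf)
small = (0, 3)

print(intersects((4, 10), from_five_up))
print(intersects(small, from_five_up))
print(contains(everything, from_five_up))
```

No branch is needed for a missing upper or lower bound, which is the corner case that would otherwise require special handling.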
I ran across this because I'm looking for an "infinite" value to set for a maximum, if a given value doesn't exist, in an attempt to create a binary tree. (Because I'm selecting based on a range of values, and not just a single value, I quickly realized that even a hash won't work in my situation.)
Since I expect all numbers involved to be positive, the minimum is easy: 0. Since I don't know what to expect for a maximum, though, I would like the upper bound to be Infinity of some sort. This way, I won't have to figure out what "maximum" I should compare things to.
Since this is a project I'm working on at work, it's technically a "Real world problem". It may be kindof rare, but like a lot of abstractions, it's convenient when you need it!
Also, to those who say that this (and other examples) are contrived, I would point out that all abstractions are somewhat contrived; that doesn't mean they aren't useful when you contrive them.
When working in a problem domain where trig is used (especially tangent) infinity is an answer that can come up. Trig ends up being used heavily in graphics applications, games, and geospatial applications, plus the obvious math applications.
I'm sure there are other ways to do this, but you could use Infinity to check for reasonable inputs in a String-to-Float conversion. In Java, at least, the Float.isNaN() static method will return false for numbers with infinite magnitude, indicating they are valid numbers, even though your program might want to classify them as invalid. Checking against the Float.POSITIVE_INFINITY and Float.NEGATIVE_INFINITY constants solves that problem. For example:
// Some sample values to test our code with
String stringValues[] = {
    "-999999999999999999999999999999999999999999999",
    "12345",
    "999999999999999999999999999999999999999999999"
};
// Loop through each string representation
for (String stringValue : stringValues) {
    // Convert the string representation to a Float representation
    Float floatValue = Float.parseFloat(stringValue);
    System.out.println("String representation: " + stringValue);
    System.out.println("Result of isNaN: " + floatValue.isNaN());
    // Check the result for positive infinity, negative infinity, and
    // "normal" float numbers (within the defined range for Float values).
    if (floatValue == Float.POSITIVE_INFINITY) {
        System.out.println("That number is too big.");
    } else if (floatValue == Float.NEGATIVE_INFINITY) {
        System.out.println("That number is too small.");
    } else {
        System.out.println("That number is jussssst right.");
    }
}
Sample Output:
String representation: -999999999999999999999999999999999999999999999
Result of isNaN: false
That number is too small.
String representation: 12345
Result of isNaN: false
That number is jussssst right.
String representation: 999999999999999999999999999999999999999999999
Result of isNaN: false
That number is too big.
It is used quite extensively in graphics. For example, any pixel in a 3D image that is not part of an actual object is marked as infinitely far away. So that it can later be replaced with a background image.
I'm using a network library where you can specify the maximum number of reconnection attempts. Since I want mine to reconnect forever:
my_connection = ConnectionLibrary(max_connection_attempts = float('inf'))
In my opinion, it's more clear than the typical "set to -1 to retry forever" style, since it's literally saying "retry until the number of connection attempts is greater than infinity".
Some programmers use Infinity or NaNs to show a variable has never been initialized or assigned in the program.
It is useful if you want the largest number from user input but the user might enter very large negative numbers. If I enter -13543124321.431, it still works out as the largest number, since it's bigger than -inf.
initial_value = float('-inf')
while True:
    try:
        x = input('gimmee a number or type the word, stop ')
    except KeyboardInterrupt:
        print("we done - by yo command")
        break
    if x == "stop":
        print("we done")
        break
    try:
        x = float(x)
    except ValueError:
        print('not a number')
        continue
    if x > initial_value: initial_value = x
print("The largest number is: " + str(initial_value))
You can use:
import decimal
decimal.Decimal("Infinity")
or:
from decimal import *
Decimal("Infinity")
For sorting
I've seen it used as a sort value, to say "always sort these items to the bottom".
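A short Python sketch of that sorting trick, with made-up data: items that have no priority get an infinite sort key, so they always end up after every ranked item.

```python
import math

items = [
    {'name': 'b', 'priority': 2},
    {'name': 'unranked', 'priority': None},
    {'name': 'a', 'priority': 1},
]

def sort_key(item):
    priority = item['priority']
    # Missing priority: sort after every real number.
    return math.inf if priority is None else priority

ordered = sorted(items, key=sort_key)
print([item['name'] for item in ordered])
```

Using -math.inf instead would pin the same items to the top.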
To specify a non-existent maximum
If you're dealing with numbers, nil represents an unknown quantity, and should be preferred to 0 for that case. Similarly, Infinity represents an unbounded quantity, and should be preferred to (arbitrarily_large_number) in that case.
I think it can make the code cleaner. For example, I'm using Float::INFINITY in a Ruby gem for exactly that: the user can specify a maximum string length for a message, or they can specify :all. In that case, I represent the maximum length as Float::INFINITY, so that later when I check "is this message longer than the maximum length?" the answer will always be false, without needing a special case.
