Pitfalls in R for Python programmers


I have mostly programmed in Python, but I am now learning the statistical programming language R. I have noticed some differences between the languages that tend to trip me up.
Suppose v is a vector/array with the integers from 1 to 5 inclusive.
v[3] # in R: gives me the 3rd element of the vector: 3
# in Python: is zero-based, gives me the integer 4
v[-1] # in R: removes the element with that index
# in Python: gives me the last element in the array
Are there any other pitfalls I have to watch out for?
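For reference, the Python half of the comparison above can be checked with a plain list. This is only a small sketch; the R behaviour appears in comments and is not executed, and the name without_first is purely illustrative.

```python
v = [1, 2, 3, 4, 5]

# Python indices start at 0, so index 3 is the fourth element.
fourth = v[3]

# A negative index counts from the end; nothing is removed from v.
last = v[-1]

# The closest analogue of R's v[-1] (everything except the first element)
# is a slice that skips index 0.
without_first = v[1:]

print(fourth, last, without_first)
```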

Having written tens of thousands of lines of code in both languages, I find R a lot more idiosyncratic and less consistent than Python. It's really nice for doing quick plots and investigation on a small to medium size dataset, mainly because its built-in dataframe object is nicer than the numpy/scipy equivalent, but you'll find all kinds of weirdness as you do things more complicated than one-liners. My advice is to use rpy2 (which unfortunately has a much worse UI than its predecessor, rpy) and do as little as possible in R, with the rest in Python.
For example, consider the following matrix code:
> u = matrix(1:9,nrow=3,ncol=3)
> v = u[,1:2]
> v[1,1]
[1] 1
> w = u[,1]
> w[1,1]
Error in w[1, 1] : incorrect number of dimensions
How did that fail? If the submatrix you select has extent one along any axis (here, a single column), R "helpfully" drops that dimension and changes the type of the variable. So w is a vector of integers rather than a matrix:
> class(v)
[1] "matrix"
> class(u)
[1] "matrix"
> class(w)
[1] "integer"
To avoid this, you need to actually pass an obscure keyword parameter:
> w2 = u[,1,drop=FALSE]
> w2[1,1]
[1] 1
> class(w2)
[1] "matrix"
There are a lot of nooks and crannies like that. Your best friends at the beginning will be introspection and online help tools like str, class, example, and of course help. Also, make sure to look at the example code on the R Graph Gallery and in Ripley's Modern Applied Statistics with S-Plus book.
EDIT: Here's another great example with factors.
> xx = factor(c(3,2,3,4))
> xx
[1] 3 2 3 4
Levels: 2 3 4
> yy = as.numeric(xx)
> yy
[1] 2 1 2 3
Holy cow! Converting a factor back to numeric didn't do the conversion you thought it would. Instead, as.numeric operates on the factor's internal integer codes. This is a source of hard-to-find bugs for people who aren't aware of it, because it still returns integers and will in fact work some of the time (when the distinct values happen to be exactly 1, 2, ..., k, so that the codes coincide with the values).
This is what you actually need to do:
> as.numeric(levels(xx))[xx]
[1] 3 2 3 4
Yeah, sure, that fact is on the factor help page, but you only end up there once you've lost a few hours to this bug. This is another example of how R does not do what you intend. Be very, very careful with anything involving type conversions or accessing elements of arrays and lists.
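For readers coming from Python, the factor behaviour can be modelled roughly as follows. This is a simplified sketch of the idea (sorted distinct levels plus 1-based integer codes), not how R implements factors internally; the variable names are made up for illustration.

```python
values = [3, 2, 3, 4]

# A factor keeps the sorted distinct values (its "levels") ...
levels = sorted(set(values))

# ... and, for each element, the 1-based position of its level.
codes = [levels.index(v) + 1 for v in values]

# as.numeric(xx) hands back these codes, not the original values.
# Indexing the levels by the codes recovers the values, which is
# the idea behind as.numeric(levels(xx))[xx].
recovered = [levels[c - 1] for c in codes]

print(levels, codes, recovered)
```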

This isn't specifically addressing the Python vs. R background, but the R inferno is a great resource for programmers coming to R.

The accepted answer to this post is possibly a bit outdated. The Pandas Python library now provides amazing R-like DataFrame support.

There may be... but before you embark on that have you tried some of the available Python extensions? Scipy has a list.

Related

8 Queens on a chessboard | PYTHON | Memory Error

I came across this question where 8 queens should be placed on a chessboard such that none can kill each other. This is how I tried to solve it:
import itertools
def allAlive(position):
    qPosition=[]
    for i in range(8):
        qPosition.append(position[2*i:(2*i)+2])
    hDel=list(qPosition) #Horizontal
    for i in range(8):
        a=hDel[0]
        del hDel[0]
        l=len(hDel)
        for j in range(l):
            if a[:1]==hDel[j][:1]:
                return False
    vDel=list(qPosition) #Vertical
    for i in range(8):
        a=vDel[0]
        l=len(vDel)
        for j in range(l):
            if a[1:2]==vDel[j][1:2]:
                return False
    cDel=list(qPosition) #Cross
    for i in range(8):
        a=cDel[0]
        l=len(cDel)
        for j in range(l):
            if abs(ord(a[:1])-ord(cDel[j][:1]))==1 and abs(int(a[1:2])-int(cDel[j][1:2]))==1:
                return False
    return True
chessPositions=['A1','A2','A3','A4','A5','A6','A7','A8','B1','B2','B3','B4','B5','B6','B7','B8','C1','C2','C3','C4','C5','C6','C7','C8','D1','D2','D3','D4','D5','D6','D7','D8','E1','E2','E3','E4','E5','E6','E7','E8','F1','F2','F3','F4','F5','F6','F7','F8','G1','G2','G3','G4','G5','G6','G7','G8','H1','H2','H3','H4','H5','H6','H7','H8']
qPositions=[''.join(p) for p in itertools.combinations(chessPositions,8)]
for i in qPositions:
    if allAlive(i)==True:
        print(i)
Traceback (most recent call last):
qPositions=[''.join(p) for p in itertools.combinations(chessPositions,8)]
MemoryError
I'm still a newbie. How can I overcome this error? Or is there any better way to solve this problem?

What you are trying to do is impossible ;)!
qPositions=[''.join(p) for p in itertools.combinations(chessPositions,8)]
means that you will get a list with length 64 choose 8 = 4426165368, since len(chessPositions) = 64, which you cannot store in memory. Why not? Combining what I stated in the comments and what @augray says in his answer, the result of the above operation would be a list which would take
(64 choose 8) * 2 * 8 bytes ~ 66GB
of RAM, since it will have 64 choose 8 elements, each element will have 8 substrings like 'A1', and each such substring consists of 2 characters. One character takes 1 byte.
You have to find another way. I am not going to answer that because that is your job. The n-queens problem is usually solved with backtracking rather than brute force. I suggest you google 'n queens problem python' and search for an answer. Then try to understand the code and the technique behind it.
I did the searching for you; take a look at this video. As suggested by @Jean François-Fabre: backtracking. Your job now is to watch the video once, twice, ... until you understand the solution to the problem. Then open up your favourite editor (mine is Vi :D) and code it up!

This is one case where it's important to understand the "science" (or more accurately, math) part of computer science as much as it is important to understand the nuts and bolts of programming.
From the documentation for itertools.combinations, we see that the number of items returned is n! / r! / (n-r)!, where n is the length of the input collection (in your case the number of chess positions, 64) and r is the length of the subsequences you want returned (in your case 8). As @campovski has pointed out, this results in 4,426,165,368. Each returned subsequence consists of 8*2 characters, each of which takes 1 byte (not to mention the overhead of the other data structures needed to hold these and calculate the answer). So just counting the memory consumed by the resulting subsequences gives 4,426,165,368*2*8 = 70,818,645,888 bytes. Dividing this by 1024^3 gives the number of gigabytes of memory held by these subsequences, about 66GB.
I'm assuming you don't have that much memory :-) . Calculating the answer to this question will require a well thought out algorithm, not just "brute force". I recommend doing some research on the problem- Wikipedia looks like a good place to start.
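The arithmetic above can be reproduced with a few lines of Python. This is only a sketch of the estimate: it assumes Python 3.8+ for math.comb and, like the answers here, counts one byte per character and ignores Python's per-object overhead, so real memory use would be higher still.

```python
import math

n_positions = 64   # squares on the board
n_queens = 8       # squares chosen per combination

# Number of ways to choose 8 squares out of 64.
n_combinations = math.comb(n_positions, n_queens)

# Each combination is joined into eight 2-character names, 1 byte per character.
bytes_per_combination = n_queens * 2
total_bytes = n_combinations * bytes_per_combination
total_gib = total_bytes / 1024 ** 3

print(n_combinations)   # 4426165368
print(total_bytes)      # 70818645888
print(total_gib)        # a little under 66
```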

As the other answers stated, you can't get every combination to fit in memory, and you shouldn't use brute force because it will be slow. However, if you want to use brute force, you could constrain the problem: eliminate shared rows and columns up front, and then only check the diagonals.
from itertools import permutations
#All possible letters
letters = ['a','b','c','d','e','f','g','h']
#All possible numbers
numbers = [str(i) for i in range(1,len(letters)+1)]
#All possible permutations given rows != to each other and columns != to each other
r = [zip(letters, p) for p in permutations(numbers,8)]
#Formatted for your function
points = [''.join([''.join(z) for z in b]) for b in r]
Also as a note, this line of code attempts to first find all of the combinations, then feed your function, which is a waste of memory.
qPositions=[''.join(p) for p in itertools.combinations(chessPositions,8)]
If you decide you do want to use a brute force method, it is possible. itertools.combinations is already a lazy iterator, so rather than collecting its output in a list, loop over it directly and feed your check function one combination at a time.
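Putting the constrained idea together, a minimal sketch could look like this. It is not the code from any answer above: the function name queens_solutions is hypothetical, queens are represented as one column index per row instead of strings like 'A1', and the permutations are consumed lazily so nothing large is ever stored.

```python
from itertools import permutations

def queens_solutions(n=8):
    """Yield every placement of n non-attacking queens.

    A placement is a tuple cols where cols[row] is the queen's column.
    Using a permutation already guarantees distinct rows and columns,
    so only the two diagonal directions need checking.
    """
    for cols in permutations(range(n)):
        # Queens share a diagonal when row + col or row - col repeats.
        rising = {row + col for row, col in enumerate(cols)}
        falling = {row - col for row, col in enumerate(cols)}
        if len(rising) == n and len(falling) == n:
            yield cols

# Consume the generator one placement at a time.
for placement in queens_solutions(8):
    print(placement)
    break  # show just the first solution found
```

This checks 8! = 40,320 candidates instead of roughly 4.4 billion, which is why it finishes quickly even though it is still brute force.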

Gaussian Elimination in modulo 2 python code

I was wondering whether Gaussian elimination modulo 2 (or even modulo k in general, for that matter) has already been implemented somewhere, so that I do not have to reinvent the wheel and can just use the available resources?

The pseudo code of the algorithm you are looking for exists and is:
// A is n by m binary matrix
i := 1 // row and column index
for i := 1 to m do // for every column
    // find non-zero element in column i, starting in row i:
    maxi := i
    for k := i to n do
        if A[k,i] = 1 then maxi := k
    end for
    if A[maxi,i] = 1 then
        swap rows i and maxi in A and b, but do not change the value of i
        // Now A[i,i] will contain the old value of A[maxi,i], that is 1
        for u := i+1 to n do // every row below the pivot row
            Add A[u,i] * row i to row u, do this for BOTH, matrix A and RHS vector b
            // Now A[u,i] will be 0
        end for
    else
        declare error: more than one solution exists
    end if
end for
if n>m and if you can find zero row in A with nonzero RHS element, then
    declare error: no solution.
end if
// now, matrix A is in upper triangular form and solution can be found
use back substitution to find vector x
Taken from this pdf
Binary arithmetic means arithmetic modulo 2, which is what you are asking about in your question, if I am not mistaken.
Unfortunately I don't code in Python, but if you are familiar with Python, you can translate the pseudo code above to Python line by line in your own way; this task should be neither difficult nor long.
I googled "gaussian elimination modulo 2 python" but didn't find the Python code you are looking for. I think that is for the best, because during the translation you may come to understand the algorithm and the method better.
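A translation along those lines might look like the sketch below. It is not taken from any of the linked sources: the function name solve_gf2 is hypothetical, entries are assumed to be 0 or 1, and it uses the Gauss-Jordan variant (clearing each pivot column both above and below the pivot) so that no separate back-substitution pass is needed. Free variables are simply set to 0, so it returns one solution instead of reporting non-uniqueness.

```python
def solve_gf2(A, b):
    """Solve A x = b over the integers modulo 2.

    A is a list of n rows with m entries each (0 or 1); b has n entries.
    Returns one solution as a list of m bits, or None if none exists.
    """
    n = len(A)
    m = len(A[0]) if n else 0
    # Work on an augmented copy so the inputs are left untouched.
    rows = [list(row) + [rhs] for row, rhs in zip(A, b)]

    pivot_cols = []
    r = 0  # index of the next pivot row
    for c in range(m):
        # Find a row at or below r with a 1 in column c.
        pivot = next((k for k in range(r, n) if rows[k][c]), None)
        if pivot is None:
            continue  # no pivot here: column c is a free variable
        rows[r], rows[pivot] = rows[pivot], rows[r]
        # Adding modulo 2 is XOR; clear column c in every other row.
        for u in range(n):
            if u != r and rows[u][c]:
                rows[u] = [x ^ y for x, y in zip(rows[u], rows[r])]
        pivot_cols.append(c)
        r += 1
        if r == n:
            break

    # A zero row with a non-zero right-hand side means no solution.
    if any(row[m] for row in rows[r:]):
        return None

    x = [0] * m
    for i, c in enumerate(pivot_cols):
        x[c] = rows[i][m]
    return x

A = [[1, 1, 0],
     [0, 1, 1],
     [1, 0, 1]]
print(solve_gf2(A, [1, 0, 1]))
print(solve_gf2(A, [1, 0, 0]))
```

In the example the third row is the sum of the first two modulo 2, so the first right-hand side is solvable and the second is inconsistent.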
EDIT 1: If you are also familiar with C# and it is no effort for you to translate C# to Python, then Michael Anderson's answer to this question may also help you.
EDIT 2: After posting the answer, I continued the search and found this
"over any field" covers "modulo 2", and more generally "modulo k" for any prime k (the integers modulo k form a field only when k is prime).
It contains source code for a Java version and a Python version too.
According to the last link I gave you, for the Python version fieldmath.py includes the class BinaryField, which is supposed to be the modulo 2 field you want.
Enjoy!
I just hope that Gauss-Jordan elimination and Gaussian Elimination are not two different things.
EDIT 3: If you are also familiar with VC++ and translating VC++ to Python is no effort for you, then you can also try this.
I hope this answers your question well.

Lua: Decompose a number by powers of 2

This question is a parallel to python - How do I decompose a number into powers of 2?. Indeed, it is the same question, but rather than using Python (or Javascript, or C++, as these also seem to exist), I'm wondering how it can be done using Lua. I have a very basic understanding of Python, so I took the code first listed in the site above and attempted to translate it to Lua, with no success. Here's the original, and following, my translation:
Python
def myfunc(x):
    powers = []
    i = 1
    while i <= x:
        if i & x:
            powers.append(i)
        i <<= 1
    return powers
Lua
function powerfind(n)
    local powers = {}
    i = 1
    while i <= n do
        if bit.band(i, n) then -- bitwise and check
            table.insert(powers, i)
        end
        i = bit.shl(i, 1) -- bitwise shift to the left
    end
    return powers
end
Unfortunately, my version locks and "runs out of memory". This was after using the number 12 as a test. It's more than likely that my primitive knowledge of Python is failing me, and I'm not able to translate the code from Python to Lua correctly, so hopefully someone can offer a fresh set of eyes and help me fix it.

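Thanks to the comments from user2357112, I've got it fixed, so I'm posting the answer in case anyone else comes across this issue. In Lua only nil and false are falsy, so the 0 returned by bit.band(i, n) still counts as true; the result has to be compared against 0 explicitly: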
function powerfind(n)
    local powers = {}
    i = 1
    while i <= n do
        if bit.band(i, n) ~= 0 then -- bitwise and check
            table.insert(powers, i)
        end
        i = bit.shl(i, 1) -- bitwise shift to the left
    end
    return powers
end

I saw that in the other one, it became a sort of speed contest. This one should also be easy to understand.
i is the current power. It isn't used for calculations.
n is the current place in the array.
r is the remainder after a division of x by two.
If the remainder is 1 then you know that i is a power of two which is used in the binary representation of x.
local function powerfind(x)
    local powers={
        nil,nil,nil,nil,
        nil,nil,nil,nil,
        nil,nil,nil,nil,
        nil,nil,nil,nil,
    }
    local i,n=1,0
    while x~=0 do
        local r=x%2
        if r==1 then
            x,n=x-1,n+1
            powers[n]=i
        end
        x,i=x/2,2*i
    end
    return powers
end
Running a million iterations, x from 1 to 1000000, takes me 0.29 seconds. I initialize the size of the powers table to 16.

Subtract Unless Negative Then Return 0

I'll preface with, this is solely to satisfy my curiosity rather than needing help on a coding project. But I was wanting to know if anyone knows of a function (particularly in python, but I'll accept a valid mathematical concept) kind of like absolute value, that given a number will return 0 if negative or return that number if positive.
Pseudo code:
def myFunc(x):
    if x > 0:
        return x
    else:
        return 0
Again, I'm not asking the question out of complexity, just curiosity. I've needed it a couple of times now, and was wondering if I really did need to write my own function or if one already existed. If there isn't a function to do this, is there a way to write it in one line, using an expression that isn't evaluated twice?
i.e.
myVar = x-y if x-y>0 else 0
I'd be fine with a solution like that if x-y wasn't evaluated twice. So if anyone out there has any solution, I'd appreciate it.
Thanks

One way...
>>> max(0, x)

This should do it:
max(x-y, 0)
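Wrapped in a function, this might look like the sketch below; the name subtract_clamped is made up for illustration. Because Python evaluates the arguments before calling max, the difference x - y is computed only once.

```python
def subtract_clamped(x, y):
    """Return x - y, or 0 when that difference would be negative."""
    # x - y is evaluated once, then compared against 0 inside max().
    return max(x - y, 0)

print(subtract_clamped(5, 3))
print(subtract_clamped(3, 5))
```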

Sounds like an analysis type question. numpy can come to the rescue!
If you have your data in an array:
import numpy as np
x = np.arange(-5, 11)
print(x)
[-5 -4 -3 -2 -1 0 1 2 3 4 5 6 7 8 9 10]
# Now do the subtraction and replacement in one step.
y = 3  # example value; y is not defined in the original snippet
x = np.maximum(x - y, 0)
If I understand your question correctly, you want to replace values in x where x-y<0 with zeros, otherwise replace with x-y.
NOTE: the solution above works well for subtracting an integer from an array, or for operating on two arrays of equal dimensions. However, Daniel's solution is more elegant when working on two lists of equal length. It all depends on your needs (and whether you want to venture into the world of numpy or not).

An alternative expression would be
x -= min(x, y)

What category of combinatorial problems appear on the logic games section of the LSAT?

EDIT: See Solving "Who owns the Zebra" programmatically? for a similar class of problem
There's a category of logic problem on the LSAT that goes like this:
Seven consecutive time slots for a broadcast, numbered in chronological order 1 through 7, will be filled by six song tapes (G, H, L, O, P, S) and exactly one news tape. Each tape is to be assigned to a different time slot, and no tape is longer than any other tape. The broadcast is subject to the following restrictions:
L must be played immediately before O.
The news tape must be played at some time after L.
There must be exactly two time slots between G and P, regardless of whether G comes before P or whether G comes after P.
I'm interested in generating a list of permutations that satisfy the conditions as a way of studying for the test and as a programming challenge. However, I'm not sure what class of permutation problem this is. I've generalized the type problem as follows:
Given an n-length array A:
How many ways can a set of n unique items be arranged within A? Eg. How many ways are there to rearrange ABCDEFG?
If the length of the set of unique items is less than the length of A, how many ways can the set be arranged within A if items in the set may occur more than once? Eg. ABCDEF => AABCDEF; ABBCDEF, etc.
How many ways can a set of unique items be arranged within A if the items of the set are subject to "blocking conditions"?
My thought is to encode the restrictions and then use something like Python's itertools to generate the permutations. Thoughts and suggestions are welcome.
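As a rough sketch of that idea, the three restrictions can be written as one predicate and applied to itertools.permutations. The names TAPES, is_valid and valid_orders are illustrative, and N is an assumed label for the news tape. "Exactly two time slots between G and P" is read as their slot numbers differing by 3.

```python
from itertools import permutations

TAPES = "GHLOPSN"  # N stands for the news tape (an assumed label)

def is_valid(order):
    """Check one ordering of the seven tapes against the restrictions."""
    # Map each tape to its time slot, numbered 1 through 7.
    pos = {tape: slot for slot, tape in enumerate(order, start=1)}
    return (
        pos["O"] - pos["L"] == 1             # L immediately before O
        and pos["N"] > pos["L"]              # news tape some time after L
        and abs(pos["G"] - pos["P"]) == 3    # exactly two slots between G and P
    )

# With only 7! = 5040 orderings, filtering all of them is cheap.
valid_orders = ["".join(p) for p in permutations(TAPES) if is_valid(p)]
print(len(valid_orders))
print(valid_orders[:5])
```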

This is easy to solve (a few lines of code) as an integer program. Using a tool like the GNU Linear Programming Kit, you specify your constraints in a declarative manner and let the solver come up with the best solution. Here's an example of a GLPK program.
You could code this using a general-purpose programming language like Python, but this is the type of thing you'll see in the first few chapters of an integer programming textbook. The most efficient algorithms have already been worked out by others.
EDIT: to answer Merjit's question:
Define:
matrix Y, where Y_(ij) = 1 if tape i is played before tape j, and 0 otherwise.
vector C, where C_i indicates the time slot when i is played (e.g. 1,2,3,4,5,6,7).
a large constant M (look up the term "big M" in an optimization textbook).
Minimize the sum of the vector C subject to the following constraints:
Y_(ij) != Y_(ji) // If i is before j, then j must not be before i
C_j < C_k + M*Y_(kj) // the time slot of j is greater than the time slot of k only if Y_(kj) = 1
C_O - C_L = 1 // L must be played immediately before O
C_N > C_L // news tape must be played at some time after L
|C_G - C_P| = 3 // exactly two slots between G and P means their slot numbers differ by 3. You will need to manipulate this a bit to make it a linear constraint
That should get you most of the way there. You want to write up the above constraints in the MathProg language's syntax (as shown in the links), and make sure I haven't left out any constraints. Then run the GLPK solver on the constraints and see what it comes up with.

Okay, so the way I see it, there are two ways to approach this problem:
1. Go about writing a program that will approach this problem head first. This is going to be difficult.
2. But combinatorics teaches us that the easier way to do this is to count all permutations and subtract the ones that don't satisfy your constraints.
I would go with number 2.
You can find all permutations of a given string or list by using this algorithm. Using this algorithm, you can get a list of all permutations. You can now apply a number of filters on this list by checking for the various constraints of the problem.
def L_before_O(s):
    # L immediately before O: O sits exactly one slot after L
    return (s.index('O') - s.index('L') == 1)
def N_after_L(s):
    return (s.index('L') < s.index('N'))
def G_and_P(s):
    # exactly two slots between G and P: their positions differ by 3
    return (abs(s.index('G') - s.index('P')) == 3)
def all_perms(s): #this is from the link
    if len(s) <=1:
        yield s
    else:
        for perm in all_perms(s[1:]):
            for i in range(len(perm)+1):
                yield perm[:i] + s[0:1] + perm[i:]
def get_the_answer():
    permutations = [i for i in all_perms('GHLOPSN')] #N is the news tape
    a = [i for i in permutations if L_before_O(i)]
    b = [i for i in a if N_after_L(i)]
    c = [i for i in b if G_and_P(i)]
    return c
I haven't tested this, but this is the general idea of how I would go about coding such a question.
Hope this helps
