Sampling a matrix with conditions (no zeros or repeated columns) - python

In case you are interested in the background of the question, I'm trying to work out how to solve this post; incidentally, if you solve it there, I'll just delete this question. Ideally, I'd like an analytical or algebraic solution (a constrained non-capturing rook problem), but short of that I'd settle for a simulation. I also posted a related question with less detail, in case it is easier to tackle.
But you don't have to leave this page. Basically there are pairings of two lists of soccer teams, and some pairings are good, while others are forbidden by the rules. This is the matrix:
So, to generate multiple samplings that match the teams in the row names (on the left) with the opposing teams in the column names (at the top), I need a conditional sampling procedure, but I have no idea how to build one.
This is what I have attempted so far:
BCN = c(0,2,3,4,0,0,7,8)
ATL = c(0,0,3,4,5,0,7,8)
DOR = c(0,0,3,4,5,6,7,0)
MON = c(1,2,3,0,5,6,7,0)
ARS = c(1,2,3,0,0,6,7,8)
LEI = c(1,2,3,4,0,6,0,8)
JUV = c(1,2,3,4,5,0,7,8)
NAP = c(1,2,0,4,5,6,7,8)
chessboard = t(as.matrix(data.frame(BCN, ATL, DOR, MON, ARS, LEI, JUV, NAP)))
colnames(chessboard) = c("MAD", "BYN", "BEN", "PSG", "MCY", "SEV", "OPO", "LEV")
chessboard
MAD BYN BEN PSG MCY SEV OPO LEV
BCN 0 2 3 4 0 0 7 8
ATL 0 0 3 4 5 0 7 8
DOR 0 0 3 4 5 6 7 0
MON 1 2 3 0 5 6 7 0
ARS 1 2 3 0 0 6 7 8
LEI 1 2 3 4 0 6 0 8
JUV 1 2 3 4 5 0 7 8
NAP 1 2 0 4 5 6 7 8
match = function(){
vec = rep(0,8)
for(i in 1:8){
tryCatch({vec[i] = as.numeric(sample(as.character(chessboard[i,][!(chessboard[i,] %in% vec) & chessboard[i,] > 0]),1))
last=chessboard[8,][!(chessboard[8,] %in% vec) & chessboard[i,] > 0]
},error=function(e){})
}
vec
}
match()
set.seed(0)
nsim = 100000
matches = t(replicate(nsim, match()))
matches = subset(matches, matches[,8]!=0)
colnames(matches) = c("BCN", "ATL", "DOR", "MON", "ARS", "LEI", "JUV", "NAP")
head(matches)
table = apply(matches, 2, function(x) table(x)/nrow(matches))
table
$BCN
x
2 3 4 7 8
0.1969821 0.2125814 0.1967272 0.1967166 0.1969927
$ATL
x
3 4 5 7 8
0.2016226 0.1874462 0.2357732 0.1875737 0.1875843
$DOR
x
3 4 5 6 7
0.1773264 0.1686188 0.2097673 0.2787270 0.1655605
$MON
x
1 2 3 5 6 7
0.2567882 0.2031199 0.1172017 0.1341921 0.1789617 0.1097365
$ARS
x
1 2 3 6 7 8
0.2368882 0.1907169 0.1104480 0.1651358 0.1026112 0.1941999
$LEI
x
1 2 3 4 6 8
0.2129743 0.1717302 0.1019210 0.1856410 0.1511081 0.1766255
$JUV
x
1 2 3 4 5 7 8
0.15873252 0.12940289 0.07889902 0.14203948 0.22837179 0.12845781 0.13409648
$NAP
x
1 2 4 5 6 7 8
0.1346168 0.1080481 0.1195272 0.1918956 0.2260675 0.1093436 0.1105011

Maybe try this:
matches = setNames(as.list(rep(NA,8)), rownames(mat))
set.seed(1)
# For each row, sample a column, then drop that column.
# 'sample.int' will automatically renormalize the probabilities.
for (i in sample.int(8)) {
team_i = rownames(mat)[i]
j = sample.int(ncol(mat), 1, prob=mat[i,])
matches[[team_i]] = colnames(mat)[j]
mat = mat[,-j,drop=FALSE]
}
> matches
# $Barcelona
# [1] "Oporto"
#
# $Atletico
# [1] "Benfica"
#
# $Dortmund
# [1] "Paris"
#
# $Juventus
# [1] "City"
#
# $Arsenal
# [1] "Sevilla"
#
# $Napoli
# [1] "Leverkusen"
#
# $Monaco
# [1] "Bayern"
#
# $Leicester
# [1] "Madrid"
Might be a good idea to add restrictions so you don't end up with a row of zeros.
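Since the question is tagged python, here is a minimal Python sketch of the same idea with the suggested restriction added: if a team is left with no allowed opponent (a "row of zeros"), the whole draw is thrown away and restarted. The ALLOWED matrix is the question's chessboard reduced to 0/1; the name sample_matching is a hypothetical helper, not something from the original posts. This is an assumption-laden sketch: restarting on dead ends guarantees a valid matching, but it does not by itself make all valid matchings equally likely.

```python
import random

TEAMS = ["BCN", "ATL", "DOR", "MON", "ARS", "LEI", "JUV", "NAP"]
OPPONENTS = ["MAD", "BYN", "BEN", "PSG", "MCY", "SEV", "OPO", "LEV"]

# 1 = pairing allowed, 0 = pairing forbidden (the question's chessboard, as 0/1)
ALLOWED = [
    [0, 1, 1, 1, 0, 0, 1, 1],  # BCN
    [0, 0, 1, 1, 1, 0, 1, 1],  # ATL
    [0, 0, 1, 1, 1, 1, 1, 0],  # DOR
    [1, 1, 1, 0, 1, 1, 1, 0],  # MON
    [1, 1, 1, 0, 0, 1, 1, 1],  # ARS
    [1, 1, 1, 1, 0, 1, 0, 1],  # LEI
    [1, 1, 1, 1, 1, 0, 1, 1],  # JUV
    [1, 1, 0, 1, 1, 1, 1, 1],  # NAP
]


def sample_matching(rng):
    """Draw one complete valid pairing, restarting whenever a dead end is hit."""
    while True:
        free = list(range(len(OPPONENTS)))  # opponent columns not yet used
        pairs = {}
        # visit the teams in random order, as the R answer does
        for i in rng.sample(range(len(TEAMS)), len(TEAMS)):
            options = [j for j in free if ALLOWED[i][j]]
            if not options:
                break  # dead end: this team has no allowed opponent left
            j = rng.choice(options)
            pairs[TEAMS[i]] = OPPONENTS[j]
            free.remove(j)
        else:
            return pairs  # loop finished without a dead end


print(sample_matching(random.Random(0)))
```

Calling sample_matching repeatedly and tabulating the results gives simulated pairing frequencies under this particular drawing procedure.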

Related

python panda apply compare to external list and remove part of list

I have a parking lot with cars of different models (nr), and the cars are so closely packed that in order for one to get out, some others might need to be moved. It is a little like a 15 puzzle, except that I can take one or more cars out of the parking lot. Ordered_car_List includes the cars that will be picked up today, and they need to be taken out of the parking lot while moving as few non-ordered cars as possible. There are more columns in this DataFrame, but this is the part I can't figure out.
I have a program that works well for small data sets, but it seems that this is not the pandas way :-)
I have this:
import pandas as pd

cars = pd.DataFrame({'x': [1,1,1,1,1,2,2,2,2],
                     'y': [1,2,3,4,5,1,2,3,4],
                     'order_number':[6,6,7,6,7,9,9,10,12]})
cars['order_number_no_dublicates_down'] = None
Ordered_car_List = [6,9,9,10,28]
i=0
while i < len(cars):
    temp_val = cars.at[i, 'order_number']
    if temp_val in Ordered_car_List:
        # keep this car and consume one matching entry from the order list
        cars.at[i, 'order_number_no_dublicates_down'] = temp_val
        Ordered_car_List.remove(temp_val)
    i+=1
If I use cars.apply(lambda..., how can I change the Ordered_car_List in each iteration?
Is there another approach that I can take?
I found this page, and it made me want to be faster. The Lambda approach is in the middle when it comes to speed, but it still is so much faster than what I am doing now.
https://towardsdatascience.com/how-to-make-your-pandas-loop-71-803-times-faster-805030df4f06
Updating cars
We can vectorize this based on two counters:
cumcount() to cumulatively count each unique value in cars['order_number']
collections.Counter() to count each unique value in Ordered_car_List
from collections import Counter

cumcount = cars.groupby('order_number').cumcount().add(1)
maxcount = cars['order_number'].map(Counter(Ordered_car_List))
# order_number cumcount maxcount
# 0 6 1 1
# 1 6 2 1
# 2 7 1 0
# 3 6 3 1
# 4 7 2 0
# 5 9 1 2
# 6 9 2 2
# 7 10 1 1
# 8 12 1 0
So then we only want to keep cars['order_number'] where cumcount <= maxcount:
either use DataFrame.loc[]
cars.loc[cumcount <= maxcount, 'nodup'] = cars['order_number']
or Series.where()
cars['nodup'] = cars['order_number'].where(cumcount <= maxcount)
or Series.mask() with the condition inverted
cars['nodup'] = cars['order_number'].mask(cumcount > maxcount)
Updating Ordered_car_List
The final Ordered_car_List is a Counter() difference:
Used_car_List = cars.loc[cumcount <= maxcount, 'order_number']
# [6, 9, 9, 10]
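# .elements() keeps multiplicities; list(Counter) alone would return each leftover value only once
Ordered_car_List = list((Counter(Ordered_car_List) - Counter(Used_car_List)).elements())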
# [28]
Final output
cumcount = cars.groupby('order_number').cumcount().add(1)
maxcount = cars['order_number'].map(Counter(Ordered_car_List))
cars['nodup'] = cars['order_number'].where(cumcount <= maxcount)
# x y order_number nodup
# 0 1 1 6 6.0
# 1 1 2 6 NaN
# 2 1 3 7 NaN
# 3 1 4 6 NaN
# 4 1 5 7 NaN
# 5 2 1 9 9.0
# 6 2 2 9 9.0
# 7 2 3 10 10.0
# 8 2 4 12 NaN
Used_car_List = cars.loc[cumcount <= maxcount, 'order_number']
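# .elements() keeps multiplicities; list(Counter) alone would return each leftover value only once
Ordered_car_List = list((Counter(Ordered_car_List) - Counter(Used_car_List)).elements())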
# [28]
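As a sanity check, the following self-contained sketch runs the question's row-by-row loop and the vectorized counter approach side by side on the same data and compares the results. The variable names (ordered, kept_loop, kept_vec and so on) are introduced here for illustration only; the sketch assumes a default integer index, as in the question.

```python
from collections import Counter

import pandas as pd

cars = pd.DataFrame({'order_number': [6, 6, 7, 6, 7, 9, 9, 10, 12]})
ordered = [6, 9, 9, 10, 28]

# Reference: the loop from the question, working on a copy of the order list
remaining_loop = list(ordered)
kept_loop = []
for value in cars['order_number']:
    if value in remaining_loop:
        kept_loop.append(int(value))
        remaining_loop.remove(value)
    else:
        kept_loop.append(None)

# Vectorized: keep the n-th occurrence of a value only while n <= number of orders for it
cumcount = cars.groupby('order_number').cumcount().add(1)
maxcount = cars['order_number'].map(Counter(ordered))  # Counter returns 0 for missing keys
keep = cumcount <= maxcount
kept_vec = [int(v) if k else None for v, k in zip(cars['order_number'], keep)]

# Leftover orders, with multiplicities preserved via .elements()
used = Counter(cars.loc[keep, 'order_number'])
remaining_vec = sorted((Counter(ordered) - used).elements())

print(kept_vec == kept_loop, remaining_vec == remaining_loop)
```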
Timings
Note that your loop is still very fast with small data, but the vectorized counter approach scales much better.

Create dynamic nested for loops

I have some arrays of m rows by 2 columns (like a series of coordinates), and I want to automate my code so that I don't need one nested loop per coordinate array. Here is my code; it runs well and gives the right coordinates, but I want to make the loops dynamic:
import numpy as np
A = np.array([[1,5,7,4,6,2,2,6,7,2],[2,8,2,9,3,9,8,5,6,2],[3,4,0,2,4,3,0,2,6,7],\
[1,5,7,3,4,5,2,7,9,7],[6,2,8,8,6,7,9,6,9,7],[0,2,0,3,3,5,2,3,5,5],[5,5,5,0,6,6,8,5,9,0]\
,[0,5,7,6,0,6,9,9,6,7],[5,5,8,5,0,8,5,3,5,5],[0,0,6,3,3,3,9,5,9,9]])
number = 8292
number = np.asarray([int(i) for i in str(number)]) #split number into array
#the coordinates of every single value contained in required number
coord1=np.asarray(np.where(A == number[0])).T
coord2=np.asarray(np.where(A == number[1])).T
coord3=np.asarray(np.where(A == number[2])).T
coord4=np.asarray(np.where(A == number[3])).T
coordinates = np.array([[0,0]]) #initialize the array that will return all the desired coordinates
solutions = 0 #initialize the counter for the number of solutions
for j in coord1:
    j = j.reshape(1, -1)
    for i in coord2 :
        i=i.reshape(1, -1)
        if (i[0,0]==j[0,0]+1 and i[0,1]==j[0,1]) or (i[0,0]==j[0,0]-1 and i[0,1]==j[0,1]) or (i[0,0]==j[0,0] and i[0,1]==j[0,1]+1) or (i[0,0]==j[0,0] and i[0,1]==j[0,1]-1) :
            for ii in coord3 :
                ii=ii.reshape(1, -1)
                if (np.array_equal(ii,j)==0 and ii[0,0]==i[0,0]+1 and ii[0,1]==i[0,1]) or (np.array_equal(ii,j)==0 and ii[0,0]==i[0,0]-1 and ii[0,1]==i[0,1]) or (np.array_equal(ii,j)==0 and ii[0,0]==i[0,0] and ii[0,1]==i[0,1]+1) or (np.array_equal(ii,j)==0 and ii[0,0]==i[0,0] and ii[0,1]==i[0,1]-1) :
                    for iii in coord4 :
                        iii=iii.reshape(1, -1)
                        if (np.array_equal(iii,i)==0 and iii[0,0]==ii[0,0]+1 and iii[0,1]==ii[0,1]) or (np.array_equal(iii,i)==0 and iii[0,0]==ii[0,0]-1 and iii[0,1]==ii[0,1]) or (np.array_equal(iii,i)==0 and iii[0,0]==ii[0,0] and iii[0,1]==ii[0,1]+1) or (np.array_equal(iii,i)==0 and iii[0,0]==ii[0,0] and iii[0,1]==ii[0,1]-1) :
                            point = np.concatenate((j,i,ii,iii))
                            coordinates = np.append(coordinates,point,axis=0)
                            solutions +=1
coordinates = np.delete(coordinates, (0), axis=0)
import itertools
A = [1, 2, 3]
B = [4, 5, 6]
C = [7, 8, 9]
for (a, b, c) in itertools.product(A, B, C):
    print(a, b, c)
outputs:
1 4 7
1 4 8
1 4 9
1 5 7
1 5 8
1 5 9
1 6 7
1 6 8
1 6 9
2 4 7
2 4 8
2 4 9
2 5 7
2 5 8
2 5 9
2 6 7
2 6 8
2 6 9
3 4 7
3 4 8
3 4 9
3 5 7
3 5 8
3 5 9
3 6 7
3 6 8
3 6 9
See documentation for details.
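Applied to the question's problem, itertools.product can take one candidate list per digit via argument unpacking, so the number of "loops" follows the number of digits. The sketch below assumes the same rules as the question's nested loops: consecutive cells must share an edge, and a cell may not equal the cell two steps earlier. The names adjacent and find_paths are hypothetical helpers. Note that product enumerates every combination before filtering, so this is only practical for short numbers.

```python
import itertools

import numpy as np


def adjacent(p, q):
    """True when cells p and q share an edge (4-neighbourhood)."""
    return abs(p[0] - q[0]) + abs(p[1] - q[1]) == 1


def find_paths(grid, number):
    """Return every sequence of adjacent cells that spells out the digits of `number`."""
    digits = [int(ch) for ch in str(number)]
    # one list of (row, col) candidates per digit, like coord1..coord4 in the question
    candidates = [
        [(int(r), int(c)) for r, c in zip(*np.where(grid == d))]
        for d in digits
    ]
    paths = []
    for combo in itertools.product(*candidates):
        steps_ok = all(adjacent(a, b) for a, b in zip(combo, combo[1:]))
        # same rule as the nested loops: do not step straight back
        no_backtrack = all(a != b for a, b in zip(combo, combo[2:]))
        if steps_ok and no_backtrack:
            paths.append(combo)
    return paths


small = np.array([[8, 2],
                  [2, 9]])
for path in find_paths(small, 8292):
    print(path)
```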

How to delete a matrix cell's neighbors which are the same value with it

I have a matrix as shown below (read from a txt file given as an argument), and every cell has neighbors. Once you pick a cell, that cell and all neighboring cells that contain the same number disappear.
1 0 4 7 6 8
0 5 4 4 5 5
2 1 4 4 4 6
4 1 3 7 4 4
I've tried to do this using recursion. I split the function into four parts: up(), down(), left() and right(). But I got an error message: RecursionError: maximum recursion depth exceeded in comparison
cmd=input("Row,column:")
cmdlist=cmd.split(",")
row,column=int(cmdlist[0]),int(cmdlist[1])
num=lines[row-1][column-1]
def up(x,y):
    if lines[x-2][y-1]==num and x>1:
        left(x,y)
        right(x,y)
        lines[x-2][y-1]=None
def left(x,y):
    if lines[x-1][y-2]==num and y>1:
        up(x,y)
        down(x,y)
        lines[x-1][y-2]=None
def right(x,y):
    if lines[x-1][y]==num and y<len(lines[row-1]):
        up(x,y)
        down(x,y)
        lines[x-1][y]=None
def down(x,y):
    if lines[x][y-1]==num and x<len(lines):
        left(x,y)
        right(x,y)
        lines[x][y-1]=None
up(row,column)
down(row,column)
for i in lines:
    print(str(i).strip("[]").replace(",","").replace("None"," "))
When I give the input (3,3) which represents the number of "4", the output must be like this:
1 0 7 6 8
0 5 5 5
2 1 6
4 1 3 7
I don't need fixed code, just the main idea will be enough. Thanks a lot.
A recursion error happens when your recursion does not terminate.
You can solve this without recursion by using sets of indexes:
collect all indexes that contain the number you are looking for into all_num_idx
add the index you are currently at (your input) to a set tbd (to be deleted)
loop over tbd and add every index from all_num_idx that differs by exactly -1/+1 in row or column from an index already in the set
repeat until tbd no longer grows
delete all indexes in tbd:
t = """4 0 4 7 6 8
0 5 4 4 5 5
2 1 4 4 4 6
4 1 3 7 4 4"""
data = [k.strip().split() for k in t.splitlines()]
# zero-based row and column, separated by a comma
row,column=map(int,input("Row,column:").strip().split(","))
num = data[row][column]
len_r =len(data)
len_c = len(data[0])
all_num_idx = set((r,c) for r in range(len_r) for c in range(len_c) if data[r][c]==num)
tbd = set( [ (row,column)] ) # initial field
tbd_size = 0 # different size to enter while
done = set() # we processed those already
while len(tbd) != tbd_size: # loop while growing
    tbd_size=len(tbd)
    for t in tbd:
        if t in done:
            continue
        # only 4-piece neighbourhood +1 or -1 in one direction
        poss_neighbours = set( [(t[0]+1,t[1]), (t[0],t[1]+1),
                                (t[0]-1,t[1]), (t[0],t[1]-1)] )
        # 8-way neighbourhood with diagonals
        # poss_neighbours = set((t[0]+a,t[1]+b) for a in range(-1,2) for b in range(-1,2))
        tbd = tbd.union( poss_neighbours & all_num_idx)
        # reduce all_num_idx by all those that we already added
        all_num_idx -= tbd
        done.add(t)
# delete the indexes we collected
for r,c in tbd:
    data[r][c]=None
# output
for line in data:
    print(*(c or " " for c in line) , sep=" ")
Output:
Row,column: 3,4
4 0 7 6 8
0 5 5 5
2 1 6
4 1 3 7
This is a variant of a flood-fill algorithm that floods only cells of a certain value. See https://en.wikipedia.org/wiki/Flood_fill
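For comparison, the same flood fill can be written with an explicit queue (a breadth-first search) instead of growing sets. This is a swapped-in variant, not the code above: the function name flood_clear is hypothetical, the grid holds integers rather than strings, and indexes are zero-based, so the question's input (3,3) becomes (2, 2).

```python
from collections import deque


def flood_clear(grid, row, col):
    """Blank the cell at (row, col) and every 4-connected cell holding the same value."""
    target = grid[row][col]
    queue = deque([(row, col)])
    seen = {(row, col)}  # cells already queued, so nothing is processed twice
    while queue:
        r, c = queue.popleft()
        grid[r][c] = None
        for nr, nc in ((r + 1, c), (r - 1, c), (r, c + 1), (r, c - 1)):
            inside = 0 <= nr < len(grid) and 0 <= nc < len(grid[nr])
            if inside and (nr, nc) not in seen and grid[nr][nc] == target:
                seen.add((nr, nc))
                queue.append((nr, nc))
    return grid


grid = [[1, 0, 4, 7, 6, 8],
        [0, 5, 4, 4, 5, 5],
        [2, 1, 4, 4, 4, 6],
        [4, 1, 3, 7, 4, 4]]
flood_clear(grid, 2, 2)
for line in grid:
    print(*(" " if cell is None else cell for cell in line))
```

Because every cell is queued at most once, the loop always terminates, which is the property the recursive version in the question lacks.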
Maybe you should replace
def right(x,y):
    if lines[x-1][y]==num and y<len(lines[row-1]):
        up(x,y)
        down(x,y)
        lines[x-1][y]=None
by
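def right(x,y):
    # check the bound first, so the index is never read out of range
    if y<len(lines[x-1]) and lines[x-1][y]==num:
        lines[x-1][y]=None
        # continue from the cell to the right, which is (x, y+1) in 1-based terms
        up(x,y+1)
        down(x,y+1)
        right(x,y+1)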
and do the same for all the other functions.
Putting lines[x-1][y]=None first ensures that your algorithm stops, and changing the indices ensures that the next step of your algorithm starts from the neighbouring cell.

pandas display: truncate column display rather than wrapping

With lengthy column names, DataFrames will display in a very messy form seemingly no matter what options are set.
Info: I'm in Jupyter QtConsole, pandas 0.20.1, with the following relevant options specified at startup:
pd.set_option('display.max_colwidth', 20)
pd.set_option('expand_frame_repr', False)
pd.set_option('display.max_rows', 25)
Question: how can I truncate the DataFrame if necessary rather than wrapping the columns to the next line, while keeping expand_frame_repr=False?
Here's an example. Again, the issue doesn't depend on the number of columns but on the length of the column names.
This will not cause an issue:
df = pd.DataFrame(np.random.randn(1000, 1000),
columns=['col' + str(i) for i in range(1000)])
As the output is perfectly readable and looks like:
The same DataFrame with long column names causes the issue I'm talking about:
df = pd.DataFrame(np.random.randn(1000, 1000),
columns=['very_long_col_name_'
+ str(i) for i in range(1000)])
Is there any way to conform the second output to be like the first that I'm missing? (Through specifying an option, not through using .iloc every time I want to view.)
Use max_columns
import numpy as np
import pandas as pd
from string import ascii_letters

df = pd.DataFrame(np.random.randint(10, size=(5, 52)), columns=list(ascii_letters))
with pd.option_context(
    'display.max_colwidth', 20,
    'expand_frame_repr', False,
    'display.max_rows', 25,
    'display.max_columns', 5,
):
    print(df.add_prefix('really_long_column_name_'))
really_long_column_name_a really_long_column_name_b ... really_long_column_name_Y really_long_column_name_Z
0 8 1 ... 1 9
1 8 5 ... 2 1
2 5 0 ... 9 9
3 6 8 ... 0 9
4 1 2 ... 7 1
[5 rows x 52 columns]
Another idea... Obviously not exactly what you want, but maybe you can twist it to your needs.
d1 = df.add_suffix('_really_long_column_name')
with pd.option_context('display.max_colwidth', 4, 'expand_frame_repr', False):
    mw = pd.get_option('display.max_colwidth')
    print(d1.rename(columns=lambda x: x[:mw-3] + '...' if len(x) > mw else x))
a... b... c... d... e... f... g... h... i... j... ... Q... R... S... T... U... V... W... X... Y... Z...
0 6 5 5 5 8 3 5 0 7 6 ... 9 0 6 9 6 8 4 0 6 7
1 0 5 4 7 2 5 4 3 8 7 ... 8 1 5 3 5 9 4 5 5 3
2 7 2 1 6 5 1 0 1 3 1 ... 6 7 0 9 9 5 2 8 2 2
3 1 8 7 1 4 5 5 8 8 3 ... 3 6 5 7 1 0 8 1 4 0
4 7 5 6 2 4 9 7 9 0 5 ... 6 8 1 6 3 5 4 2 3 2
Looks like it will need an enhancement. The relevant code in the repr function appears to be here:
max_rows = get_option("display.max_rows")
max_cols = get_option("display.max_columns")
show_dimensions = get_option("display.show_dimensions")
if get_option("display.expand_frame_repr"):
    width, _ = console.get_console_size()
else:
    width = None
self.to_string(buf=buf, max_rows=max_rows, max_cols=max_cols,
               line_width=width, show_dimensions=show_dimensions)
So either you pass expand_frame_repr=True and it wraps on the line width, or you pass expand_frame_repr=False and it shouldn't. But it looks like there is a bug in the code (this should be pandas 0.20.3 iirc):
in pd.io.formats.format.DataFrameFormatter:
def _chk_truncate(self):
    """
    Checks whether the frame should be truncated. If so, slices
    the frame up.
    """
    from pandas.core.reshape.concat import concat
    # Column of which first element is used to determine width of a dot col
    self.tr_size_col = -1
    # Cut the data to the information actually printed
    max_cols = self.max_cols
    max_rows = self.max_rows
    if max_cols == 0 or max_rows == 0: # assume we are in the terminal
                                       # (why else = 0)
        (w, h) = get_terminal_size()
        self.w = w
        self.h = h
        if self.max_rows == 0:
            dot_row = 1
            prompt_row = 1
            if self.show_dimensions:
                show_dimension_rows = 3
            n_add_rows = (self.header + dot_row + show_dimension_rows +
                          prompt_row)
            # rows available to fill with actual data
            max_rows_adj = self.h - n_add_rows
            self.max_rows_adj = max_rows_adj
        # Format only rows and columns that could potentially fit the
        # screen
        if max_cols == 0 and len(self.frame.columns) > w:
            max_cols = w
        if max_rows == 0 and len(self.frame) > h:
            max_rows = h
Looks like it intended to do what you wanted, but was unfinished. It's checking max_cols against the number of columns, not the total width of the columns.
So you could either create a show_df function that would calculate the correct number of columns and show it in an option_context like pi2Squared's answer, or fix it here (and maybe submit a patch if you need it distributed).
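A rough sketch of such a show_df helper follows. Everything here is an assumption rather than pandas behaviour: the helper names show_df and columns_that_fit are hypothetical, the width estimate (longest of header and cell text, plus two characters of padding) only approximates how pandas lays out columns, and column names are assumed to be unique.

```python
import shutil

import numpy as np
import pandas as pd


def columns_that_fit(df, width):
    """Estimate how many leading columns fit into `width` characters."""
    index_width = max(len(str(i)) for i in df.index)
    used, count = index_width, 0
    for col in df.columns:
        widest_cell = int(df[col].astype(str).str.len().max())
        col_width = max(len(str(col)), widest_cell) + 2  # assumed padding of 2
        if used + col_width > width:
            break
        used += col_width
        count += 1
    return max(count, 1)  # always show at least one column


def show_df(df, width=None):
    """Print `df` truncated to the estimated number of columns that fit the terminal."""
    width = width or shutil.get_terminal_size().columns
    n_cols = columns_that_fit(df, width)
    with pd.option_context('display.max_columns', n_cols,
                           'display.expand_frame_repr', False):
        print(df)


df = pd.DataFrame(np.random.randint(10, size=(5, 10)),
                  columns=['very_long_col_name_' + str(i) for i in range(10)])
show_df(df, width=80)
```

pandas may still show fewer columns than the estimate, because the "..." placeholder column takes up room as well; the point is only that the limit is derived from the column widths rather than from the column count.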
As others have pointed out, Pandas itself seems to be bugged or badly designed here, so a workaround is required.
Most of the time this problem occurs with numerical columns, since numbers are relatively short. Pandas will split the column heading onto multiple lines if there are spaces in it, so you can "hack in" the correct behavior by inserting spaces into column headings for numerical columns when you display the dataframe. I have a one-liner to do this:
def colfix(df, L=5): return df.rename(columns=lambda x: ' '.join(x.replace('_', ' ')[i:i+L] for i in range(0,len(x),L)) if df[x].dtype in ['float64','int64'] else x )
To display your dataframe, simply type
colfix(your_df)
Note that the renaming does not permanently change the dataframe; it only adds spaces to the names for the purpose of displaying it that one time.
Results (in a Jupyter Notebook), with colfix and without: [screenshots not preserved]

All 6-Number Permutations from a List

I'm writing a program, and the goal is to take a list of numbers and return all the six-number combinations of it using a recursive function (without importing a function to do it for me). Say, for example, my numbers are "1 2 3 4 5 6 7 8 9"; the output would be:
1 2 3 4 5 6
1 2 3 4 5 7
1 2 3 4 5 8
1 2 3 4 5 9
1 2 3 4 6 7
1 2 3 4 6 8
1 2 3 4 6 9
1 2 3 4 7 8
... etcetera, all the way down to
4 5 6 7 8 9
I'm not looking for code per se, just a push in the right direction conceptually. What I've attempted thus far has failed, and I've driven myself into a logical rut.
I've included the code I used before below, but it isn't really a recursive function and only seems to work for 6-8-digit values. It's very messy, and I'd be fine with scrapping it entirely:
# Function prints all the possible 6-number combinations for a group of numbers
def lotto(constantnumbers, variablenumbers):
# Base case: No more constant variables, or only 6 numbers to begin with
if len(constantnumbers) == 0 or len(variablenumbers) == 0:
if len(constantnumbers) == 0:
print(" ".join(variablenumbers[1:7]))
else:
print(" ".join(constantnumbers[0:6]))
i = 6 - len(constantnumbers)
outvars = variablenumbers[1:i + 1]
if len(variablenumbers) > len(outvars) + 1:
print(" ".join(constantnumbers + outvars))
for index in range(len(outvars), 0, -1):
outvars[index - 1] = variablenumbers[index + 1]
print(" ".join(constantnumbers + outvars))
else:
i = 6 - len(constantnumbers)
outvars = variablenumbers[1:i + 1]
print(" ".join(constantnumbers + outvars))
if len(variablenumbers) > len(outvars) + 1:
for index in range(len(outvars), 0, -1):
outvars[index - 1] = variablenumbers[index + 1]
print(" ".join(constantnumbers + outvars))
#Reiterates the function until there are no more constant numbers
lotto(constantnumbers[0:-1], constantnumbers[-1:] + variablenumbers)
import itertools
for combo in itertools.combinations(range(1,10), 6):
    print(" ".join(str(c) for c in combo))
which gives
1 2 3 4 5 6
1 2 3 4 5 7
1 2 3 4 5 8
...
3 4 6 7 8 9
3 5 6 7 8 9
4 5 6 7 8 9
Edit: ok, here is a recursive definition:
def combinations(basis, howmany):
    for index in range(0, len(basis) - howmany + 1):
        if howmany == 1:
            yield [basis[index]]
        else:
            this, remainder = basis[index], basis[index+1:]
            for rest in combinations(remainder, howmany - 1):
                yield [this] + rest
Edit2:
Base case: A 1-item combination is any basis item.
Induction: An N-item combination is any basis item plus an (N-1)-item combination from the remaining basis.
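To see the base case and the induction step at work, the recursive generator can be run on the question's numbers and its count compared with the binomial coefficient from math.comb (Python 3.8+). This is only a usage sketch; the variable names numbers and combos are introduced here for illustration.

```python
from math import comb


def combinations(basis, howmany):
    """Yield every `howmany`-item combination of `basis`, preserving input order."""
    for index in range(0, len(basis) - howmany + 1):
        if howmany == 1:
            # base case: a 1-item combination is any single basis item
            yield [basis[index]]
        else:
            # induction: pick one item, then combine it with (howmany - 1) later items
            this, remainder = basis[index], basis[index + 1:]
            for rest in combinations(remainder, howmany - 1):
                yield [this] + rest


numbers = list(range(1, 10))
combos = list(combinations(numbers, 6))
print(len(combos), comb(9, 6))
print(" ".join(str(n) for n in combos[0]))
print(" ".join(str(n) for n in combos[-1]))
```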
