Need to add column names to numpy array

I am trying to create a Connect 4 game with a 6x7 array in Python, and I need column headers so that column 0 is named a, column 1 is named b, and so on. The purpose is for moves to be made by typing 'a' (drops a token in the first column), 'b' (drops a token in the second), etc. This is my code to create the array:
import numpy as np
def clear_board():
    board = np.zeros((6, 7))
    return board

If you need column names, the easiest way is to use a pandas DataFrame instead of a numpy array:
import numpy as np
import pandas as pd
def clear_board():
    board = pd.DataFrame(np.zeros((6, 7)), columns=list('ABCDEFG'))
    return board
>>> clear_board()
     A    B    C    D    E    F    G
0  0.0  0.0  0.0  0.0  0.0  0.0  0.0
1  0.0  0.0  0.0  0.0  0.0  0.0  0.0
2  0.0  0.0  0.0  0.0  0.0  0.0  0.0
3  0.0  0.0  0.0  0.0  0.0  0.0  0.0
4  0.0  0.0  0.0  0.0  0.0  0.0  0.0
5  0.0  0.0  0.0  0.0  0.0  0.0  0.0
Beyond that, take a look at the options provided in this answer
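If the board should stay a plain numpy array, a small lookup from letter to column index may be enough. The sketch below assumes lowercase letters a to g, that 0 marks an empty cell, and that row 5 is the bottom of the board; COLUMN_INDEX and drop_token are hypothetical names, not part of the original code.

```python
import string
import numpy as np

ROWS, COLS = 6, 7
# Hypothetical lookup: 'a' -> 0, 'b' -> 1, ..., 'g' -> 6
COLUMN_INDEX = {letter: i for i, letter in enumerate(string.ascii_lowercase[:COLS])}

def clear_board():
    """Return an empty board; 0 is assumed to mean 'no token'."""
    return np.zeros((ROWS, COLS), dtype=int)

def drop_token(board, letter, player):
    """Place `player` in the lowest empty cell of the named column and return its row."""
    col = COLUMN_INDEX[letter.lower()]
    empty_rows = np.flatnonzero(board[:, col] == 0)
    if empty_rows.size == 0:
        raise ValueError(f"column {letter!r} is full")
    row = int(empty_rows[-1])  # the largest row index is the bottom of the board
    board[row, col] = player
    return row

board = clear_board()
drop_token(board, "a", 1)
print(board)
```

A second call on the same column lands one row higher, because the bottom cell is no longer empty.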

Related

.argmax(axis =1) not working on a numpy array

Hi, I am trying to use the argmax function on a numpy array, but it raises an error.
---------------------------------------------------------------------------
AttributeError                            Traceback (most recent call last)
<ipython-input-25-914c12a3a737> in <module>()
      5 # TODO - Check for data issues
      6 # Hint: You can convert from one-hot to integers with argmax
----> 7 train_df1 = train_df1.argmax(axis = 1)
      8
      9 # Initialise
AttributeError: 'function' object has no attribute 'argmax'
code:
train_df1
<bound method DataFrame.to_numpy of MEL NV BCC AKIEC BKL DF VASC
0 0.0 1.0 0.0 0.0 0.0 0.0 0.0
1 0.0 1.0 0.0 0.0 0.0 0.0 0.0
2 0.0 1.0 0.0 0.0 0.0 0.0 0.0
3 0.0 1.0 0.0 0.0 0.0 0.0 0.0
4 1.0 0.0 0.0 0.0 0.0 0.0 0.0
.. ... ... ... ... ... ... ...
194 0.0 1.0 0.0 0.0 0.0 0.0 0.0
195 0.0 1.0 0.0 0.0 0.0 0.0 0.0
196 0.0 1.0 0.0 0.0 0.0 0.0 0.0
197 0.0 1.0 0.0 0.0 0.0 0.0 0.0
198 0.0 0.0 1.0 0.0 0.0 0.0 0.0
[199 rows x 7 columns]>
train_df1 = train_df1.argmax(axis = 1)
Does anyone understand why I am getting this?
Thanks

You'll need to post more code for us to duplicate that behaviour, but we can see from the error messages what's going on.
train_df1 is a function. You need to call the function:
my_values = train_df1()
my_max = my_values.argmax(axis = 1)
The main clue is that when you type train_df1 at the prompt, it tells you it's a "bound method", which is another name for a function you need to call.
I'm also assuming your train_df1 function returns a numpy array, such that the argmax function call will work. The output names DataFrame.to_numpy, so that should be the case.
In one line, it's just:
my_max = train_df1().argmax(axis = 1)
You definitely won't want to redefine train_df1 as a variable, so be sure to pick a new variable name to hold the max values returned.
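The printed repr suggests train_df1 was assigned DataFrame.to_numpy without parentheses. A minimal sketch of that situation, using a small made-up one-hot frame (the data and the names one_hot, not_called, and labels are illustrative assumptions):

```python
import pandas as pd

# Made-up one-hot data standing in for the asker's frame.
one_hot = pd.DataFrame(
    [[0.0, 1.0, 0.0], [1.0, 0.0, 0.0], [0.0, 0.0, 1.0]],
    columns=["MEL", "NV", "BCC"],
)

not_called = one_hot.to_numpy   # missing parentheses: this is the bound method itself
array = one_hot.to_numpy()      # calling it returns the ndarray

has_argmax = hasattr(not_called, "argmax")  # the method object has no argmax
labels = array.argmax(axis=1)               # position of the 1 in each row
print(has_argmax, labels)
```

Keeping the array under its own name also avoids overwriting whatever train_df1 originally referred to.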

Getting column name where a condition matches in a row

I have a pandas dataframe which looks like this:
     A    B    C    D    E    F    G    H    I
1  0.0  1.0  0.0  0.0  0.0  0.0  0.0  0.0  0.0
2  1.0  0.0  0.0  0.0  0.0  0.0  0.0  0.0  0.0
3  0.0  1.0  0.0  0.0  0.0  0.0  0.0  0.0  0.0
Now, for each row, I have to check which column contains 1 and then record this column name in a new column. The final dataframe would look like this:
     A    B    C    D    E    F    G    H    I IsTrue
1  0.0  1.0  0.0  0.0  0.0  0.0  0.0  0.0  0.0      B
2  1.0  0.0  0.0  0.0  0.0  0.0  0.0  0.0  0.0      A
3  0.0  1.0  0.0  0.0  0.0  0.0  0.0  0.0  0.0      B
Is there any faster and pythonic way to do it?

Here's one way using DataFrame.dot:
df['isTrue'] = df.astype(bool).dot(df.columns)
     A    B    C    D    E    F    G    H    I isTrue
1  0.0  1.0  0.0  0.0  0.0  0.0  0.0  0.0  0.0      B
2  1.0  0.0  0.0  0.0  0.0  0.0  0.0  0.0  0.0      A
3  0.0  1.0  0.0  0.0  0.0  0.0  0.0  0.0  0.0      B
For even better performance, you can use:
df['isTrue'] = df.columns[df.to_numpy().argmax(1)]

What you described is the definition of idxmax
>>> df.idxmax(1)
1    B
2    A
3    B
dtype: object
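One caveat worth checking: both argmax and idxmax pick the first maximum, so a row with no 1 at all still gets a column name. A sketch with a made-up third row of zeros (has_one and is_true are hypothetical names) that masks such rows:

```python
import pandas as pd

# Made-up frame: the last row contains no 1 at all.
df = pd.DataFrame(
    [[0.0, 1.0, 0.0], [1.0, 0.0, 0.0], [0.0, 0.0, 0.0]],
    index=[1, 2, 3],
    columns=list("ABC"),
)

values = df.to_numpy()
# Column name at the first maximum of each row (same choice idxmax makes).
first_hit = pd.Series(df.columns[values.argmax(axis=1)], index=df.index)
# True only for rows that really contain a 1.
has_one = pd.Series((values == 1).any(axis=1), index=df.index)
# Keep the name where a 1 exists, NaN otherwise.
is_true = first_hit.where(has_one)
print(is_true)
```

If every row is guaranteed to hold exactly one 1, the mask is unnecessary and either one-liner above is enough.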

Iterating over pandas DataFrame with identical columns header

I am trying to iterate through rows and columns of the Pandas DataFrame and write that result in a new DataFrame if some condition is met. I am able to iterate on the following DataFrame which has different names for row and column.
      W0O5  W1O5  W2O5  W3O5
W0O5   0.0   0.0   0.0   0.0
W1O5   0.0   0.0   1.0   0.0
W2O5   0.0   1.0   0.0   0.0
W3O5   0.0   0.0   0.0   0.0
I used the following approach:
for i in pandas_df.index:
    for j in pandas_df.columns:
        print(i, j)
        print(pandas_df.at[i, j])
        if pandas_df.at[i, j] == 1:
            single_pandas_df.at['WO5', 'WO5_corner'] = 1
where single_pandas_df is the new DataFrame I created, to which I want to add the value at the corresponding row and column.
However, when I try to iterate through a DataFrame with identical row and column headers, as below:
     WO5  WO5  WO5  WO5
WO5  0.0  0.0  0.0  0.0
WO5  0.0  0.0  1.0  0.0
WO5  0.0  1.0  0.0  0.0
WO5  0.0  0.0  0.0  0.0
I get an AttributeError saying:
AttributeError: 'BlockManager' object has no attribute 'T'
I know the error is due to duplicate column names. I was curious whether there is any way to handle such a case in pandas. All of my DataFrames look like the second case, and I need to get the value at each row and column position.
Thanks in advance.
Update after Yolo's comment:
Actually, I have many such DataFrames, as below:
      DyO7  DyO7  DyO6  DyO7  DyO7  DyO6
DyO7   0.0   3.0   1.0   2.0   1.0   0.0
DyO7   3.0   0.0   0.0   1.0   0.0   1.0
DyO6   1.0   0.0   0.0   0.0   1.0   0.0
DyO7   2.0   1.0   0.0   0.0   3.0   1.0
DyO7   1.0   0.0   1.0   3.0   0.0   0.0
DyO6   0.0   1.0   0.0   1.0   0.0   0.0
and next one as
      TaO6  TaO6
TaO6   0.0   1.0
TaO6   1.0   0.0
In these DataFrames, 1, 2 and 3 represent corner, edge and face sharing. So if item (i, j) in the DataFrame is 1, it goes to "..."_corner; if 2, it goes to edge; and if 3, it goes to face.
My initial single_pandas_df DataFrame looks like the following:
      DyO6_corner  DyO6_edge  DyO6_face  DyO7_corner  DyO7_edge  DyO7_face  TaO6_corner  TaO6_edge  TaO6_face  WO5_corner  WO5_edge  WO5_face
DyO6          0.0        0.0        0.0          0.0        0.0        0.0          0.0        0.0        0.0         0.0       0.0       0.0
DyO7          0.0        0.0        0.0          0.0        0.0        0.0          0.0        0.0        0.0         0.0       0.0       0.0
TaO6          0.0        0.0        0.0          0.0        0.0        0.0          0.0        0.0        0.0         0.0       0.0       0.0
WO5           0.0        0.0        0.0          0.0        0.0        0.0          0.0        0.0        0.0         0.0       0.0       0.0
After my script above updates single_pandas_df, there is a 1 at ('WO5', 'WO5_corner') and it becomes:
      DyO6_corner  DyO6_edge  DyO6_face  DyO7_corner  DyO7_edge  DyO7_face  TaO6_corner  TaO6_edge  TaO6_face  WO5_corner  WO5_edge  WO5_face
DyO6          0.0        0.0        0.0          0.0        0.0        0.0          0.0        0.0        0.0         0.0       0.0       0.0
DyO7          0.0        0.0        0.0          0.0        0.0        0.0          0.0        0.0        0.0         0.0       0.0       0.0
TaO6          0.0        0.0        0.0          0.0        0.0        0.0          0.0        0.0        0.0         0.0       0.0       0.0
WO5           0.0        0.0        0.0          0.0        0.0        0.0          0.0        0.0        0.0         1.0       0.0       0.0
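One possible way around duplicate labels is to iterate by position rather than by label, and only look up the labels afterwards. The sketch below uses a cut-down version of the DyO7/DyO6 frame; SHARING, summary, and the choice to write a flag of 1 (as in the loop above) are assumptions about the intended result.

```python
import numpy as np
import pandas as pd

# Cut-down example with duplicate row and column labels.
labels = ["DyO7", "DyO7", "DyO6"]
shared = pd.DataFrame(
    [[0.0, 3.0, 1.0], [3.0, 0.0, 0.0], [1.0, 0.0, 0.0]],
    index=labels,
    columns=labels,
)

# Assumed meaning of the codes, taken from the description above.
SHARING = {1: "corner", 2: "edge", 3: "face"}

# Target frame with unique labels, so label-based .at is safe here.
names = ["DyO6", "DyO7"]
summary = pd.DataFrame(
    0.0,
    index=names,
    columns=[f"{name}_{kind}" for name in names for kind in SHARING.values()],
)

values = shared.to_numpy()
# np.nonzero yields integer positions, which stay unambiguous with duplicate labels.
for i, j in zip(*np.nonzero(values)):
    kind = SHARING.get(int(values[i, j]))
    if kind is not None:
        summary.at[shared.index[i], f"{shared.columns[j]}_{kind}"] = 1

print(summary)
```

To count occurrences instead of flagging them, the assignment could be changed to `+= 1`.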

Finding the difference between rows of columns using shift

I've been coming here for almost two years now and have always been able to figure things out but I'm stumped now. Hopefully this is a quick answer.
https://github.com/MPhillips55/Capstone-Project-2---League-of-Legends/blob/master/EDA/test_case.csv
The link there is what my data looks like. 'min_0', 'min_1' and so on are gold values for League of Legends games at 1 minute intervals, that continue on to 'min_80'. The csv should be available to download.
I want to subtract the red values from the blue values and store that number on the blue rows for each minute.
Then I want to subtract the blue values from the red values and store that number on the red rows for each minute.
For clarity, I am only interested in the comparison for matching 'match_id's.
Here is an image of my desired output: [image: Desired Output]
I think the right answer is likely something like this:
gold_df.loc[gold_df['red_or_blue_side'] == 'blue', :] = \
BLUE_VALUES - BLUE_VALUES.shifted_down
gold_df.loc[gold_df['red_or_blue_side'] == 'red', :] = \
RED_VALUES - RED_VALUES.shifted_up
I'm not clear on two things with that code. I need to select all the columns except the first two to calculate the differences. I also don't know how to select the values and the shifted values across all the relevant columns.
Thank you for the help. Please let me know if more information is needed.
-Mike

You could group by match_id, compute the difference in each direction with .diff, and then add the two components.
g = df.groupby('match_id', sort=False)[df.columns[2:]]
df = g.diff().fillna(0) + g.diff(-1).fillna(0)
df
min_0 min_1 min_2 min_3 min_4 min_5 min_6 min_7 min_8 min_9 \
0 0.0 15.0 46.0 -133.0 -60.0 -904.0 -505.0 -852.0 -763.0 -1224.0
1 0.0 -15.0 -46.0 133.0 60.0 904.0 505.0 852.0 763.0 1224.0
2 0.0 0.0 0.0 89.0 -92.0 -174.0 191.0 69.0 253.0 362.0
3 0.0 0.0 0.0 -89.0 92.0 174.0 -191.0 -69.0 -253.0 -362.0
4 0.0 0.0 17.0 -106.0 -136.0 400.0 363.0 829.0 1532.0 1862.0
5 0.0 0.0 -17.0 106.0 136.0 -400.0 -363.0 -829.0 -1532.0 -1862.0
... min_71 min_72 min_73 min_74 min_75 min_76 min_77 min_78 \
0 ... 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0
1 ... 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0
2 ... 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0
3 ... 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0
4 ... 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0
5 ... 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0
min_79 min_80
0 0.0 0.0
1 0.0 0.0
2 0.0 0.0
3 0.0 0.0
4 0.0 0.0
5 0.0 0.0
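To see what the two diff calls contribute, here is a small sketch on made-up gold values, assuming exactly two rows per match_id (one blue, one red) and that the first two columns are match_id and red_or_blue_side:

```python
import pandas as pd

# Made-up data: two matches, one blue row and one red row each.
gold_df = pd.DataFrame({
    "match_id": [1, 1, 2, 2],
    "red_or_blue_side": ["blue", "red", "blue", "red"],
    "min_0": [500, 500, 500, 500],
    "min_1": [515, 500, 480, 520],
})

value_cols = gold_df.columns[2:]  # everything except the first two columns
g = gold_df.groupby("match_id", sort=False)[value_cols]

# diff() fills the second row of each pair (row minus the row above);
# diff(-1) fills the first row (row minus the row below).
# Each leaves NaN in the other row, so after fillna(0) the sum covers both.
gold_diff = g.diff().fillna(0) + g.diff(-1).fillna(0)
print(gold_diff)
```

With more than two rows per match the two components would overlap on the middle rows, so this only fits the paired layout assumed here.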

To select all columns except the first two:
df[df.columns[2:]]

To select all columns except the first two:
df.iloc[:,2:]

"Cannot reindex from a duplicate axis" when groupby.apply() on MultiIndex columns

I'm playing around with computing subtotals within a DataFrame that looks like this (note the MultiIndex):
       0    1    2    3    4    5
A 1  0.0  0.0  0.0  0.0  0.0  0.0
  2  0.0  0.0  0.0  0.0  0.0  0.0
B 1  0.0  0.0  0.0  0.0  0.0  0.0
  2  0.0  0.0  0.0  0.0  0.0  0.0
I can successfully add the subtotals with the following code:
(
    df
    .groupby(level=0)
    .apply(
        lambda df: pd.concat(
            [df.xs(df.name), df.sum().to_frame('Total').T]
        )
    )
)
And it looks like this:
           0    1    2    3    4    5
A 1      0.0  0.0  0.0  0.0  0.0  0.0
  2      0.0  0.0  0.0  0.0  0.0  0.0
  Total  0.0  0.0  0.0  0.0  0.0  0.0
B 1      0.0  0.0  0.0  0.0  0.0  0.0
  2      0.0  0.0  0.0  0.0  0.0  0.0
  Total  0.0  0.0  0.0  0.0  0.0  0.0
However, when I work with the transposed DataFrame, it does not work. The DataFrame looks like:
     A         B
     1    2    1    2
0  0.0  0.0  0.0  0.0
1  0.0  0.0  0.0  0.0
2  0.0  0.0  0.0  0.0
3  0.0  0.0  0.0  0.0
4  0.0  0.0  0.0  0.0
5  0.0  0.0  0.0  0.0
And I use the following code:
(
    df2
    .groupby(level=0, axis=1)
    .apply(
        lambda df: pd.concat(
            [df.xs(df.name, axis=1), df.sum(axis=1).to_frame('Total')],
            axis=1
        )
    )
)
I have specified axis=1 everywhere I can think of, but I get an error:
ValueError: cannot reindex from a duplicate axis
I would expect the output to be:
     A                B
     1    2  Total    1    2  Total
0  0.0  0.0    0.0  0.0  0.0    0.0
1  0.0  0.0    0.0  0.0  0.0    0.0
2  0.0  0.0    0.0  0.0  0.0    0.0
3  0.0  0.0    0.0  0.0  0.0    0.0
4  0.0  0.0    0.0  0.0  0.0    0.0
5  0.0  0.0    0.0  0.0  0.0    0.0
Is this a bug? Or have I not specified the axis correctly everywhere? As a workaround, I can obviously transpose the DataFrame, produce the totals, and transpose back, but I'd like to know why it's not working here, and submit a bug report if necessary.
The problem DataFrame can be generated with:
df2 = pd.DataFrame(
    np.zeros([6, 4]),
    columns=pd.MultiIndex.from_product([['A', 'B'], [1, 2]])
)
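A possible alternative to the transpose workaround is to build each group's totals separately and concatenate the pieces under their top-level keys. This sketch fills the frame with ones instead of zeros so the totals are visible; add_total and with_totals are hypothetical names, and it does not explain the original error.

```python
import numpy as np
import pandas as pd

# Same shape as df2 in the question, but filled with ones so totals are non-zero.
df2 = pd.DataFrame(
    np.ones([6, 4]),
    columns=pd.MultiIndex.from_product([["A", "B"], [1, 2]]),
)

def add_total(block):
    """Return a copy of one top-level group with a row-wise 'Total' column appended."""
    block = block.copy()
    block["Total"] = block.sum(axis=1)
    return block

top_keys = df2.columns.get_level_values(0).unique()
# Selecting df2[key] drops the top level; the dict keys restore it on concat.
with_totals = pd.concat({key: add_total(df2[key]) for key in top_keys}, axis=1)
print(with_totals)
```

This avoids grouping along the column axis altogether, at the cost of an explicit loop over the top-level keys.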
