This question already has answers here:
Converting pandas dataframe to structured arrays
(4 answers)
Closed 4 years ago.
I want to convert a DataFrame to an array without losing the column headers. I tried `.values`:
data = data.values
array([[True, 33],
       [True, 32],
       [True, 31],
       ...,
       [True, 2],
       [True, 0],
       [True, 0]], dtype=object)
but I have no idea how to keep the column headers so I can refer to them later. How can I do this using pandas?
What you're looking for is a way to turn the DataFrame into a structured array. You can find instructions for doing this in the question Converting pandas dataframe to structured arrays
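As a minimal sketch of that approach (the column names `flag` and `count` are illustrative, not from the question): `DataFrame.to_records` produces a NumPy record array whose dtype carries the original column names, so you can still access columns by name afterwards.

```python
import pandas as pd

df = pd.DataFrame({"flag": [True, True, False], "count": [33, 32, 31]})

# to_records(index=False) returns a numpy record array whose
# dtype preserves the original column names
arr = df.to_records(index=False)

print(arr.dtype.names)  # ('flag', 'count')
print(arr["count"])     # columns remain accessible by name
```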
This question already has answers here:
dataframe to long format
(2 answers)
Reshape wide to long in pandas
(2 answers)
Split pandas column and add last element to a new column
(2 answers)
Get last "column" after .str.split() operation on column in pandas DataFrame
(5 answers)
Closed 8 months ago.
I have a dataframe like this.
df = pd.DataFrame(np.array([[1, 2, 3, 4], [4, 5, 6, 4], [7, 8, 9, 4]]),
columns=['NP_A', 'NP_B', 'NP_C', "OP_A"])
I would like to plot it with sns.lineplot, but instead of using the column names directly as 'hue', I would like to split each name:
"NP", "A"
"NP", "B"
...
so that the legend reflects the two parts of each name.
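One possible approach (a sketch, not from the original thread; the column names `prefix` and `suffix` are my own labels): melt the frame to long format, then split the variable names on the underscore so each part can serve as `hue` and `style` in seaborn.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame(np.array([[1, 2, 3, 4], [4, 5, 6, 4], [7, 8, 9, 4]]),
                  columns=['NP_A', 'NP_B', 'NP_C', 'OP_A'])

# reshape to long format: one row per (row index, column) pair
long_df = df.reset_index().melt(id_vars='index')

# split 'NP_A' into two new columns, e.g. 'NP' and 'A'
long_df[['prefix', 'suffix']] = long_df['variable'].str.split('_', expand=True)

# with seaborn installed, the two parts can drive the legend separately:
# sns.lineplot(data=long_df, x='index', y='value',
#              hue='prefix', style='suffix')
print(long_df.head())
```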
This question already has an answer here:
Cumsum as a new column in an existing Pandas data
(1 answer)
Closed 2 years ago.
Let's say I have a DataFrame with column A:
A = (1, 2, 3, 4, 5, 6, ..., n)
I want to create column B like this:
B = (1, 3, 6, 10, 15, 21, ...)
Explicitly: each element of B is the corresponding element of A plus the sum of all previous elements (a cumulative sum).
Probably simple, but hard for me :P Very new to programming.
Thanks!
from itertools import accumulate
A = [1, 2, 3, 4, 5, 6]
B = list(accumulate(A))  # -> [1, 3, 6, 10, 15, 21]
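Since the question is about a DataFrame column, pandas' built-in `cumsum` does the same thing directly (a sketch assuming the column is named "A", as in the question):

```python
import pandas as pd

df = pd.DataFrame({"A": [1, 2, 3, 4, 5, 6]})

# cumulative sum of column A, stored as the new column B
df["B"] = df["A"].cumsum()

print(df["B"].tolist())  # [1, 3, 6, 10, 15, 21]
```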
Consider a Pandas DataFrame with a MultiIndex with all boolean-typed levels (example below). Trying to access specific rows of such a DataFrame by using a boolean label leads to an error:
df = pd.DataFrame([[False, False, 1],
[False, True, 2],
[True, False, 3]], columns=["A", "B", "C"])
df.set_index(["A", "B"], inplace=True)
print( df.loc[[False, False]] ) # IndexError: Item wrong length 2 instead of 3.
How can I access rows in a DataFrame with a boolean-typed MultiIndex?
You can slice using pd.IndexSlice.
>>> df.loc[pd.IndexSlice[False, False]]
C 1
Name: (False, False), dtype: int64
Access the row using a tuple.
print( df.loc[(False, False)] )
If you only have the values as a list, convert them to a tuple before accessing.
x = [False, False] # possibly result of some previous computation
print( df.loc[tuple(x)] )
Background
Usually, rows in a Pandas DataFrame with a MultiIndex can easily be accessed using lists of values like so:
df = pd.DataFrame([[1, 2, 3],
[4, 5, 6],
[7, 8, 9]], columns=["A", "B", "C"])
df.set_index(["A", "B"], inplace=True)
print( df.loc[[1, 2]] ) # prints first row of the DataFrame
The only thing that changed in this example from the question's is the data type of the MultiIndex. The issue with using a boolean data type for the MultiIndex is that it conflicts with boolean indexing.
Boolean indexing allows selecting rows on the basis of some condition. For example, we could select all rows with a value in the C-column of less than 7 using df.loc[df["C"] < 7]. The inner condition results in an array of booleans, so this will be the same as df.loc[[True, True, False]].
This obviously conflicts with the MultiIndex -- If both MultiIndex access using lists and boolean indexing were supported, there would be no way for Pandas to tell which of both is intended. By using a tuple it is made clear that the supplied value is in fact a label, not the result of evaluating some condition.
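The boolean-indexing equivalence described above can be checked directly with the Background example's DataFrame:

```python
import pandas as pd

df = pd.DataFrame([[1, 2, 3],
                   [4, 5, 6],
                   [7, 8, 9]], columns=["A", "B", "C"])
df.set_index(["A", "B"], inplace=True)

# the condition evaluates to an array of booleans ...
mask = df["C"] < 7
print(mask.tolist())  # [True, True, False]

# ... so these two selections return the same rows
left = df.loc[df["C"] < 7]
right = df.loc[[True, True, False]]
print(left.equals(right))  # True
```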
This question already has answers here:
Translate integers in a numpy array to a contiguous range 0...n
(2 answers)
Closed 4 years ago.
I have a NumPy array with some numbers and I would like to replace each item with its rank in ascending order (equal values get the same rank).
For example, I have a list:
[4, 25, 100, 4, 50]
And I would like to use a function to get this:
[1, 2, 4, 1, 3]
Any ideas how to do this?
There is a convenient method via pandas:
import pandas as pd
lst = [4, 25, 100, 4, 50]
res = pd.factorize(lst, sort=True)[0] + 1
# [1 2 4 1 3]
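If you'd rather stay in NumPy (an alternative sketch, not from the original answer), `np.unique` with `return_inverse=True` yields the same 1-based ranks: the inverse indices point into the sorted unique values.

```python
import numpy as np

lst = [4, 25, 100, 4, 50]

# np.unique returns the sorted unique values and, with return_inverse=True,
# the position of each original element within that sorted array
_, inverse = np.unique(lst, return_inverse=True)
res = inverse + 1  # shift from 0-based positions to 1-based ranks

print(res)  # [1 2 4 1 3]
```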
This question already has answers here:
Remove duplicate rows of a numpy array [duplicate]
(3 answers)
Closed 6 years ago.
Say I have the following array:
import numpy as np
data = np.array([[51001, 121, 1, 121212],
                 [51001, 121, 1, 125451],
                 [51001, 125, 1, 127653]])
I want to remove duplicate rows only by the first 3 elements in a row (first 3 columns).
So the result I will get is:
print(data)
[[51001, 121, 1, 121212],
[51001, 125, 1, 127653]]
It doesn't matter which row we keep and which we delete, as long as the result is unique by the first 3 columns.
Here's one way, using drop_duplicates in pandas:
In [179]: pd.DataFrame(data).drop_duplicates([0, 1, 2]).values
Out[179]:
array([[ 51001, 121, 1, 121212],
[ 51001, 125, 1, 127653]])