I have a dataframe like this
  group         b         c         d         e  label
0     A  0.577535  0.299304  0.617103  0.378887      1
1     A  0.167907  0.244972  0.615077  0.311497      0
2     B  0.640575  0.768187  0.652760  0.822311      0
3     B  0.424744  0.958405  0.659617  0.998765      1
4     B  0.077048  0.407182  0.758903  0.273737      0
I want to reshape it into a 3D array which an LSTM could use as input, using padding. So group A (2 rows) should feed in a padded sequence of length 3, and group B (3 rows) a sequence of length 3. Desired output is something like
array1 = [[[0.577535, 0.299304, 0.617103, 0.378887],
[0.167907, 0.244972, 0.615077, 0.311497],
[0, 0, 0, 0]],
[[0.640575, 0.768187, 0.652760, 0.822311],
[0.424744, 0.958405, 0.659617, 0.998765],
[0.077048, 0.407182, 0.758903, 0.273737]]]
and then the labels have to be reshaped accordingly too
array2 = [[1,
0,
0],
[0,
1,
0]]
How can I put in the padding and reshape my data?
You can first use GroupBy.cumcount to create a counter within each group, reindex by MultiIndex.from_product (filling the new rows with 0), and finally export to nested lists:
import pandas as pd

df["count"] = df.groupby("group").cumcount()
mux = pd.MultiIndex.from_product([df["group"].unique(), range(df["count"].max() + 1)],
                                 names=["group", "count"])
df = df.set_index(["group", "count"]).reindex(mux, fill_value=0)
print(df.iloc[:, :4].groupby(level=0).apply(lambda g: g.values.tolist()).tolist())
[[[0.577535, 0.299304, 0.617103, 0.378887],
[0.167907, 0.24497199999999997, 0.6150770000000001, 0.31149699999999997],
[0.0, 0.0, 0.0, 0.0]],
[[0.640575, 0.768187, 0.65276, 0.822311],
[0.42474399999999995, 0.958405, 0.659617, 0.998765],
[0.077048, 0.40718200000000004, 0.758903, 0.273737]]]
print(df.groupby(level=0)["label"].apply(list).tolist())
[[1, 0, 0], [0, 1, 0]]
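If the LSTM then needs real numpy arrays rather than nested lists, a minimal follow-up sketch (assuming the reindexed df from above):
import numpy as np

X = np.array(df.iloc[:, :4].groupby(level=0).apply(lambda g: g.values.tolist()).tolist())
y = np.array(df.groupby(level=0)["label"].apply(list).tolist())
print(X.shape, y.shape)  # (2, 3, 4) and (2, 3), i.e. (samples, timesteps, features)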
I'm assuming your group column consists of many values and not just one 'A' and one 'B'. This code worked for me; you can give it a try as well:
import pandas as pd

df = pd.read_csv('file2.csv')
vals = df['group'].unique()
feature_cols = ['b', 'c', 'd', 'e']  # exclude 'group' and 'label' from the features

array1 = []
array2 = []
for val in vals:
    val_df = df[df.group == val]
    val_label = val_df.label
    smaller_array = []
    label_small_array = []
    for label in val_label:
        label_small_array.append(label)
    array2.append(label_small_array)
    for i in range(val_df.shape[0]):
        smallest_array = []
        for j in feature_cols:
            smallest_array.append(val_df.iloc[i][j])  # append the value, not the column name
        smaller_array.append(smallest_array)
    array1.append(smaller_array)
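Note this produces ragged lists with no padding yet. If TensorFlow is available, one way to pad every group to the same length is Keras' pad_sequences (a sketch, not part of the original answer):
from tensorflow.keras.preprocessing.sequence import pad_sequences

# pad shorter groups with rows of zeros at the end ('post')
X = pad_sequences(array1, padding='post', dtype='float32', value=0.0)
y = pad_sequences(array2, padding='post', value=0)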
I have a Dataframe as follows:
import pandas as pd
df = pd.DataFrame({'Target': [0 ,1, 2],
'Source': [1, 0, 3],
'Count': [1, 1, 1]})
I have to count how many pairs of Sources and Targets there are. (1,0) and (0,1) are treated as duplicates, so that pair's count is 2.
I need to do it several times as I have 79 nodes in total. Any help will be much appreciated.
import pandas as pd
# instantiate without the 'count' column to start over
In[1]: df = pd.DataFrame({'Target': [0, 1, 2],
                          'Source': [1, 0, 3]})
Out[1]:
   Target  Source
0       0       1
1       1       0
2       2       3
Counting pairs regardless of their order is possible by converting to a numpy.ndarray and sorting each row, which makes (1, 0) and (0, 1) identical:
In[1]: array = df.values
In[2]: array.sort(axis=1)
In[3]: array
Out[3]: array([[0, 1],
               [0, 1],
               [2, 3]])
And then turn it back to a DataFrame to perform .value_counts():
In[1]: df_sorted = pd.DataFrame(array, columns=['value1', 'value2'])
In[2]: df_sorted.value_counts()
Out[2]:
value1  value2
0       1         2
2       3         1
dtype: int64
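The row-sort and the count also fit into a single pipeline (a sketch assuming pandas >= 1.1, where DataFrame.value_counts was added):
import numpy as np
import pandas as pd

df = pd.DataFrame({'Target': [0, 1, 2], 'Source': [1, 0, 3]})
# np.sort returns a row-sorted copy, so df itself stays untouched
pairs = pd.DataFrame(np.sort(df.to_numpy(), axis=1), columns=['value1', 'value2'])
print(pairs.value_counts())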
I want to make sure column 2 is smaller than column 1, and where it isn't, set it to 0. My attempt:
x = np.array([[0,1],[1,0]])
x = np.where((x[1] > (x[0])), 0, x)
print(x)  # expected output: [[0, 0], [1, 0]], but this doesn't produce it
Maybe this helps you:
arr = np.array([[0,1],[1,0]])
arr[arr[:,1] > arr[:,0], 1] = 0
print(arr)
Output:
array([[0, 0],
[1, 0]])
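If you would rather stay close to the np.where attempt from the question, the fix is to compare the columns x[:, 0] and x[:, 1] instead of the rows x[0] and x[1] (a small sketch under that reading):
import numpy as np

arr = np.array([[0, 1], [1, 0]])
# replace column 1 where it exceeds column 0, keep it otherwise
arr[:, 1] = np.where(arr[:, 1] > arr[:, 0], 0, arr[:, 1])
print(arr)  # [[0 0]
            #  [1 0]]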
You started with a list (of lists), so I'll give you a list answer.
First define a simple helper function:
def foo(row):
    if row[1] > row[0]:
        row[1] = 0
    return row
And apply it to x row by row:
In [37]: x = [[0,1],[1,0]]
In [38]: [foo(row) for row in x]
Out[38]: [[0, 0], [1, 0]]
I have a 2d NumPy array that looks like this:
array([[1, 1],
[1, 2],
[2, 1],
[2, 2],
[3, 1],
[5, 1],
[5, 2]])
and I want to group it and have an output that looks something like this:
Col1 Col2
group 1: 1-2, 1-2
group 2: 3-3, 1-1
group 3: 5-5, 1-2
I want to group the columns based on if they are consecutive.
So, for a unique value in column 1, group the data in the second column if the values are consecutive between rows. Then, for a unique grouping of column 2, group column 1 if it is consecutive between rows.
The result can be thought of as corner points of a grid. In the above example, group 1 is a square grid, group 2 is a point, and group 3 is a flat line.
My system won't allow me to use pandas, so I cannot use groupby from that library, but I can use other standard libraries.
Any help is appreciated. Thank you
Here you go ...
Steps are:
Get a sorted list xUnique of the unique column 1 values (the input is already ordered by column 1).
Build a list xRanges of items of the form [col1_value, [col2_min, col2_max]] holding the column 2 ranges for each column 1 value.
Build a list xGroups of items of the form [[col1_min, col1_max], [col2_min, col2_max]] where the [col1_min, col1_max] part is created by merging the col1_value part of consecutive items in xRanges if they differ by 1 and have identical [col2_min, col2_max] value ranges for column 2.
Turn the ranges in each item of xGroups into strings and print with the required row and column headings.
Also package and print as a numpy.array to match the form of the input.
import numpy as np
data = np.array([
[1, 1],
[1, 2],
[2, 1],
[2, 2],
[3, 1],
[5, 1],
[5, 2]])
xUnique = sorted({pair[0] for pair in data})  # a set alone would not guarantee order, so sort it
xRanges = list(zip(xUnique, [[0, 0] for _ in range(len(xUnique))]))
rows, cols = data.shape
iRange = -1
for i in range(rows):
    if i == 0 or data[i, 0] > data[i - 1, 0]:
        iRange += 1
        xRanges[iRange][1][0] = data[i, 1]  # start of a new column-2 range
    xRanges[iRange][1][1] = data[i, 1]      # running maximum of the range

xGroups = []
for i in range(len(xRanges)):
    if i and xRanges[i][0] - xRanges[i - 1][0] == 1 and xRanges[i][1] == xRanges[i - 1][1]:
        xGroups[-1][0][1] = xRanges[i][0]   # merge into the previous group
    else:
        xGroups += [[[xRanges[i][0], xRanges[i][0]], xRanges[i][1]]]

xGroupStrs = [[f'{a}-{b}' for a, b in row] for row in xGroups]
groupArray = np.array(xGroupStrs)
print(groupArray)

print()
print(f'{"":<10}{"Col1":<8}{"Col2":<8}')
for i, (col1, col2) in enumerate(xGroupStrs):
    print(f'{"group " + str(i) + ":":<10}{col1:<8}{col2:<8}')
Output:
[['1-2' '1-2']
['3-3' '1-1']
['5-5' '1-2']]
          Col1    Col2
group 0:  1-2     1-2
group 1:  3-3     1-1
group 2:  5-5     1-2
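As an aside, since you can use the standard library, the same merge logic also fits compactly with itertools.groupby (a sketch assuming, as in your example, that the input rows are sorted by column 1):
from itertools import groupby
import numpy as np

data = np.array([[1, 1], [1, 2], [2, 1], [2, 2], [3, 1], [5, 1], [5, 2]])

# column-2 (min, max) range per unique column-1 value
ranges = []
for k, rows in groupby(data.tolist(), key=lambda r: r[0]):
    ys = [r[1] for r in rows]
    ranges.append((k, (min(ys), max(ys))))

# merge consecutive column-1 values that share the same column-2 range
groups = []
for x, yr in ranges:
    if groups and x - groups[-1][0][1] == 1 and groups[-1][1] == yr:
        groups[-1][0][1] = x
    else:
        groups.append([[x, x], yr])

print([[f'{a}-{b}' for a, b in g] for g in groups])
# [['1-2', '1-2'], ['3-3', '1-1'], ['5-5', '1-2']]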
I'd like to split my time-series data into X and y by shifting the data. The first two rows of my dummy dataframe look like:
   x-2  x-1  x0  x1  x2
0    3    0   5   7   1
1    2    3   0   5   6
i.e. if the time step equals 2, X and y look like: X=[3,0] -> y=[5],
X=[0,5] -> y=[7] (this should be applied to all samples (rows)).
I wrote the function below, but it returns empty matrices when I pass pandas dataframe to the function.
def create_dataset(dataset, time_step=1):
    dataX, dataY = [], []
    for i in range(len(dataset) - time_step - 1):
        a = dataset.iloc[:, i:(i + time_step)]
        dataX.append(a)
        dataY.append(dataset.iloc[:, i + time_step])
    return np.array(dataX), np.array(dataY)
Thank you for any solutions.
Here is an approach that replicates the expected output, IIUC:
import pandas as pd
# function to process each row
def process_row(s):
    assert isinstance(s, pd.Series)
    return pd.concat([
        s.rename('timestep'),
        s.shift(-1).rename('x_1'),
        s.shift(-2).rename('x_2'),
        s.shift(-3).rename('y')
    ], axis=1).dropna(how='any', axis=0).astype(int)

# test case for the example
process_row(pd.Series([2, 3, 0, 5, 6]))

# type in the first two rows of the data frame
df = pd.DataFrame(
    {'x-2': [3, 2], 'x-1': [0, 3],
     'x0': [5, 0], 'x1': [7, 5], 'x2': [1, 6]})

# perform the transformation
ts = list()
for idx, row in df.iterrows():
    t = process_row(row)
    t.index = [idx] * t.index.size
    ts.append(t)

print(pd.concat(ts))
# results
   timestep  x_1  x_2  y
0         3    0    5  7
0         0    5    7  1
1         2    3    0  5   <-- first part of expected results
1         3    0    5  6   <-- second part
Do you mean something like this:
df = df.shift(periods=-2, axis='columns')
# you can also pass a fill_value parameter
df = df.shift(periods=-2, axis='columns', fill_value = 0)
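For reference, applied to the two example rows this shifts every value two columns to the left and fills the tail (with NaN by default, or 0 via fill_value), while the column names stay put:
import pandas as pd

df = pd.DataFrame({'x-2': [3, 2], 'x-1': [0, 3],
                   'x0': [5, 0], 'x1': [7, 5], 'x2': [1, 6]})
print(df.shift(periods=-2, axis='columns', fill_value=0))
#    x-2  x-1  x0  x1  x2
# 0    5    7   1   0   0
# 1    0    5   6   0   0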
I have this code:
import numpy as np
result = {}
result['depth'] = [1,1,1,2,2,2]
result['generation'] = [1,1,1,2,2,2]
result['dimension'] = [1,2,3,1,2,3]
result['data'] = [np.array([0,0,0]), np.array([0,0,0]), np.array([0,0,0]), np.array([0,0,0]), np.array([0,0,0]), np.array([0,0,0])]
for v in np.unique(result['depth']):
    temp_v = (result['depth'] == v)
    values_v = [result[string][temp_v] for string in result.keys()]
    this_v = dict(zip(result.keys(), values_v))
in which I want to create a new dict called 'this_v', with the same keys as the original dict result, but fewer values.
The line:
values_v = [result[string][temp_v] for string in result.keys()]
gives an error
TypeError: only integer scalar arrays can be converted to a scalar index
which I don't understand, since I can create ex = result[list(result.keys())[0]][temp_v] just fine. It just does not let me do this with a for loop so that I can fill the list.
Any idea as to why it does not work?
In order to solve your problem (finding and dropping duplicates) I encourage you to use pandas. It is a Python module that makes your life absurdly simple:
import numpy as np
result = {}
result['depth'] = [1,1,1,2,2,2]
result['generation'] = [1,1,1,2,2,2]
result['dimension'] = [1,2,3,1,2,3]
result['data'] = [np.array([0,0,0]), np.array([0,0,0]), np.array([0,0,0]),\
np.array([0,0,0]), np.array([0,0,0]), np.array([0,0,0])]
# Here comes pandas!
import pandas as pd
# Converting your dictionary of lists into a beautiful dataframe
df = pd.DataFrame(result)
#>         data  depth  dimension  generation
#  0  [0, 0, 0]      1          1           1
#  1  [0, 0, 0]      1          2           1
#  2  [0, 0, 0]      1          3           1
#  3  [0, 0, 0]      2          1           2
#  4  [0, 0, 0]      2          2           2
#  5  [0, 0, 0]      2          3           2
# Dropping duplicates... in one single command!
df = df.drop_duplicates('depth')
#>         data  depth  dimension  generation
#  0  [0, 0, 0]      1          1           1
#  3  [0, 0, 0]      2          1           2
If you want your data back in the original format... you need yet again just one line of code!
df.to_dict('list')
#> {'data': [array([0, 0, 0]), array([0, 0, 0])],
# 'depth': [1, 2],
# 'dimension': [1, 1],
# 'generation': [1, 2]}
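For completeness, the TypeError itself comes from indexing a plain Python list with a boolean numpy array: result['depth'] == v broadcasts to a boolean array (because v is a numpy scalar), but list[mask] is not valid indexing. Converting the lists to arrays makes the original loop work, a sketch using the result dict from above:
import numpy as np

for v in np.unique(result['depth']):
    temp_v = np.asarray(result['depth']) == v                    # boolean mask
    this_v = {k: np.asarray(result[k])[temp_v] for k in result}  # mask each column
    print(this_v)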