I have a data frame with two columns, and I also have a nested list containing the elements of the first column. For each element in the nested list, I want to append its row index and the corresponding value from the second column.
| Column A | Column B |
| -------- | -------- |
| 100.2    | A        |
| 101.5    | B        |
| 103.6    | C        |
| 104.6    | D        |
| 105.7    | E        |
The nested list looks like this:
[[100.2, 101.5], [103.6, 104.6], [105.7]]
The desired output:
[[[0, 100.2, 'A'], [1, 101.5, 'B']], [[2, 103.6, 'C'], [3, 104.6, 'D']], [[4, 105.7, 'E']]]
How can I go from the dataframe to this nested list?
You can solve this with pandas quite simply:
import pandas as pd

df = pd.DataFrame({'Column A': [100.2, 101.5, 103.6, 103.6, 105.7],
                   'Column B': ['A', 'B', 'C', 'D', 'E']})

grouped = df.groupby('Column A')
result = [[[i, row['Column A'], row['Column B']] for i, row in group.iterrows()]
          for _, group in grouped]
print(result)
Result:
[[[0, 100.2, 'A']], [[1, 101.5, 'B']], [[2, 103.6, 'C'], [3, 103.6, 'D']], [[4, 105.7, 'E']]]
We can use the pandas reset_index() and tolist() functions to solve this:
import pandas as pd

df = pd.DataFrame({'Column A': [100.2, 101.5, 103.6, 103.6, 105.7],
                   'Column B': ['A', 'B', 'C', 'D', 'E']})

df = df.reset_index(level=0)
print(df.values.tolist())
output:
[[0, 100.2, 'A'],
[1, 101.5, 'B'],
[2, 103.6, 'C'],
[3, 103.6, 'D'],
[4, 105.7, 'E']]
import pandas as pd

df = pd.DataFrame({'A': [100.2, 101.5, 103.6, 104.6, 105.7],
                   'B': ['A', 'B', 'C', 'D', 'E']})
a = [[100.2, 101.5], [103.6, 104.6], [105.7]]

df1 = df.reset_index().set_index('A', drop=False)
[[df1.loc[v].tolist() for v in row] for row in a]
produces
[[[0, 100.2, 'A'], [1, 101.5, 'B']],
[[2, 103.6, 'C'], [3, 104.6, 'D']],
[[4, 105.7, 'E']]]
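The same result can also be produced without pandas label lookups. Here is a minimal sketch (my own variant, not from the answers above), assuming the values in the nested list match column A exactly and are unique: build a value-to-[index, value, B] dictionary once, then walk the nested list.

import pandas as pd

df = pd.DataFrame({'A': [100.2, 101.5, 103.6, 104.6, 105.7],
                   'B': ['A', 'B', 'C', 'D', 'E']})
a = [[100.2, 101.5], [103.6, 104.6], [105.7]]

# Map each column-A value to [positional index, value, column-B entry].
lookup = {v: [i, v, b] for i, (v, b) in enumerate(zip(df['A'], df['B']))}
result = [[lookup[v] for v in row] for row in a]
print(result)
# [[[0, 100.2, 'A'], [1, 101.5, 'B']], [[2, 103.6, 'C'], [3, 104.6, 'D']], [[4, 105.7, 'E']]]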
Related
org = [['A', 'a', 1],
       ['A', 'b', 2],
       ['A', 'c', 3],
       ['B', 'a', 4],
       ['B', 'b', 5],
       ['B', 'c', 6],
       ['C', 'a', 7],
       ['C', 'b', 8],
       ['C', 'c', 9]]
I want to convert 'org' to a standard matrix form like the one below.
transform = [['\t', 'A', 'B', 'C'],
             ['a', 1, 4, 7],
             ['b', 2, 5, 8],
             ['c', 3, 6, 9]]
I wrote a small function that does this conversion; the code is below:
import numpy as np

def matrix(li):
    column = ['\t']
    row = []
    result = []
    rest = []
    for i in li:
        if i[0] not in column:
            column.append(i[0])
        if i[1] not in row:
            row.append(i[1])
    result.append(column)
    for i in li:
        for r in row:
            if r == i[1]:
                rest.append([i[2]])
    rest = np.array(rest).reshape((len(row), len(column) - 1)).tolist()
    for i in range(len(rest)):
        rest[i] = [row[i]] + rest[i]
    result += rest
    for i in result:
        print(i)

matrix(org)
The result was this (shown for my real data rather than the toy 'org' above):
['\t', 'school', 'kids', 'really']
[72, 0.008962252017017516, 0.04770759762717251, 0.08993156334317577]
[224, 0.004180594204995023, 0.04450803342634945, 0.04195010047081213]
[385, 0.0021807662921382335, 0.023217182598008267, 0.06564858527712682]
I don't think this is efficient, since I use so many for loops.
Is there a more efficient way to do this?
Since you are already using third-party libraries, this is a task well suited to pandas.
There is some slightly messy, but not inefficient, work to incorporate the index and columns into the output as you require.
import pandas as pd

org = [['A', 'a', 1],
       ['A', 'b', 2],
       ['A', 'c', 3],
       ['B', 'a', 4],
       ['B', 'b', 5],
       ['B', 'c', 6],
       ['C', 'a', 7],
       ['C', 'b', 8],
       ['C', 'c', 9]]
df = pd.DataFrame(org)
pvt = df.pivot_table(index=0, columns=1, values=2)
cols = ['\t'] + pvt.columns.tolist()
res = pvt.values.T.tolist()
res.insert(0, pvt.index.tolist())
res = [[i]+j for i, j in zip(cols, res)]
print(res)
[['\t', 'A', 'B', 'C'],
['a', 1, 4, 7],
['b', 2, 5, 8],
['c', 3, 6, 9]]
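As an aside (my own variation, not part of the original answer): if every (row key, column key) pair occurs exactly once, df.pivot can be used instead of pivot_table, and orienting it with index=1 avoids the transpose step. Continuing with the df = pd.DataFrame(org) defined above:

pvt = df.pivot(index=1, columns=0, values=2)
res = ([['\t'] + pvt.columns.tolist()]
       + [[i] + r for i, r in zip(pvt.index.tolist(), pvt.values.tolist())])
print(res)
# [['\t', 'A', 'B', 'C'], ['a', 1, 4, 7], ['b', 2, 5, 8], ['c', 3, 6, 9]]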
Here's another "manual" way using only numpy:
org_arr = np.array(org)
key1 = np.unique(org_arr[:,0])
key2 = np.unique(org_arr[:,1])
values = org_arr[:,2].reshape((len(key1),len(key2))).transpose()
np.block([
    ["\t", key1],
    [key2[:, None], values]
])

""" # alternatively, for numpy < 1.13.0
np.vstack((
    np.hstack(("\t", key1)),
    np.hstack((key2[:, None], values))
))
"""
For simplicity, it requires the input rows to be strictly ordered (the first column is the major sort key and both keys are ascending).
Output:
Out[58]:
array([['\t', 'A', 'B', 'C'],
       ['a', '1', '4', '7'],
       ['b', '2', '5', '8'],
       ['c', '3', '6', '9']],
      dtype='<U1')
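If the input is not ordered, a hedged sketch of the same idea (my assumption, not part of the answer above) can scatter the values into place using the inverse indices that np.unique returns, assuming each (key1, key2) pair occurs exactly once. Continuing with org_arr from the snippet above:

# Unique keys plus, for each row, the position of its key within the unique array.
key1, col_idx = np.unique(org_arr[:, 0], return_inverse=True)
key2, row_idx = np.unique(org_arr[:, 1], return_inverse=True)

# Place every value directly at its (row, column) position; no ordering needed.
values = np.empty((len(key2), len(key1)), dtype=org_arr.dtype)
values[row_idx, col_idx] = org_arr[:, 2]

np.block([
    ["\t", key1],
    [key2[:, None], values]
])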
I am very new to Python. I have a data frame that looks like this:
   A  B
E  0  1
F  2  3
I want to convert this data frame to a list that looks like this:
[[E, A, 0], [E, B, 1], [F, A, 2], [F, B, 3]]
Any idea?
Use
In [197]: df.stack().reset_index().values.tolist()
Out[197]: [['E', 'A', 0L], ['E', 'B', 1L], ['F', 'A', 2L], ['F', 'B', 3L]]
Or, using melt (the row order differs):
df.reset_index().melt('index').values.tolist()
Out[1423]: [['E', 'A', 0], ['F', 'A', 2], ['E', 'B', 1], ['F', 'B', 3]]
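For completeness, here is a self-contained sketch of the stack() approach, assuming the frame is constructed as shown in the question:

import pandas as pd

# The example frame from the question.
df = pd.DataFrame([[0, 1], [2, 3]], columns=['A', 'B'], index=['E', 'F'])

# stack() moves the columns into an inner index level, reset_index() turns both
# index levels back into columns, and values.tolist() yields the nested list.
print(df.stack().reset_index().values.tolist())
# [['E', 'A', 0], ['E', 'B', 1], ['F', 'A', 2], ['F', 'B', 3]]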
I have a dataframe which looks like this:
df = pd.DataFrame([
    [123, 'abc', '121'],
    [123, 'abc', '121'],
    [456, 'def', '121'],
    [123, 'abc', '122'],
    [123, 'abc', '122'],
    [456, 'def', '145'],
    [456, 'def', '145'],
    [456, 'def', '121'],
], columns=['userid', 'name', 'dt'])
From this question, I have managed to transpose it.
So, the desired df would be:
userid1_date1 name_1 name_2 ... name_n
userid1_date2 name_1 name_2 ... name_n
userid2 name_1 name_2 ... name_n
userid3_date1 name_1 name_2 ... name_n
But I want to separate the rows depending on the date. For example, if user 123 has data on two days, then there should be a separate row for each day's API events.
I won't really need the userid after the transformation, so you can use it however you like.
My plan was:
Group the df by the dt column.
Pivot each group so that it looks like this:
userid1_date1 name_1 name_2 ... name_n
Then concatenate the pivoted groups.
But I have no clue how to do this in pandas!
Try:
def tweak(df):
    return df.reset_index().name

df.set_index('userid').groupby(level=0).apply(tweak)
Demonstration
df = pd.DataFrame([[1, 'a'], [1, 'c'], [1, 'c'], [1, 'd'], [1, 'e'],
                   [1, 'a'], [1, 'c'], [1, 'c'], [1, 'd'], [1, 'e'],
                   [2, 'a'], [2, 'a'], [2, 'c'], [2, 'd'], [2, 'e'],
                   [2, 'a'], [2, 'a'], [2, 'c'], [2, 'd'], [2, 'e'],
                   ], columns=['userid', 'name'])

def tweak(df):
    return df.reset_index().name

df.set_index('userid').groupby(level=0).apply(tweak)
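If you need each (userid, dt) pair on its own row, as described in the question, here is a hedged sketch of one possible approach (my own variant, not part of the answer above): number the names within each userid/dt group with cumcount() and unstack that counter into name_1 ... name_n columns.

import pandas as pd

# The frame from the question, with the dt column.
df = pd.DataFrame([[123, 'abc', '121'], [123, 'abc', '121'],
                   [456, 'def', '121'], [123, 'abc', '122'],
                   [123, 'abc', '122'], [456, 'def', '145'],
                   [456, 'def', '145'], [456, 'def', '121']],
                  columns=['userid', 'name', 'dt'])

# Number each name within its (userid, dt) group: 1, 2, ..., n.
key = df.groupby(['userid', 'dt']).cumcount() + 1

# Move userid, dt and the counter into the index, then spread the counter
# level out into columns, giving one row per (userid, dt) pair.
out = df.set_index(['userid', 'dt', key])['name'].unstack()
out.columns = ['name_{}'.format(c) for c in out.columns]
print(out)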
Say I have some data that looks like the table below. I want to count, for each pair of tags, how many ids have both tags at the same time.
tag id
a A
b B
a B
b A
c A
The result I want:
tag1 tag2 count
a b 2
a c 1
b c 1
In plain Python I could write pseudocode like this:
d = defaultdict(set)
for tag, id_ in rows:   # iterate over (tag, id) pairs
    d[tag].add(id_)
for tag1, tag2 in itertools.combinations(d.keys(), 2):
    print(tag1, tag2, len(d[tag1] & d[tag2]))
Not the most efficient way, but it should work. I already have the data stored in a pandas DataFrame. Is there a more pandas-like way to achieve the same result?
Here is my attempt:
from itertools import combinations
import pandas as pd
import numpy as np
In [123]: df
Out[123]:
  tag id
0   a  A
1   b  B
2   a  B
3   b  A
4   c  A
In [124]: a = np.asarray(list(combinations(df.tag, 2)))
In [125]: a
Out[125]:
array([['a', 'b'],
       ['a', 'a'],
       ['a', 'b'],
       ['a', 'c'],
       ['b', 'a'],
       ['b', 'b'],
       ['b', 'c'],
       ['a', 'b'],
       ['a', 'c'],
       ['b', 'c']],
      dtype='<U1')
In [126]: a = a[a[:,0] != a[:,1]]
In [127]: a
Out[127]:
array([['a', 'b'],
       ['a', 'b'],
       ['a', 'c'],
       ['b', 'a'],
       ['b', 'c'],
       ['a', 'b'],
       ['a', 'c'],
       ['b', 'c']],
      dtype='<U1')
In [129]: np.ndarray.sort(a)
In [130]: pd.DataFrame(a).groupby([0,1]).size()
Out[130]:
0  1
a  b    4
   c    2
b  c    2
dtype: int64
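Note that this attempt forms pairs from rows of the tag column rather than from the tags attached to each id, which is why it reports 4/2/2 instead of the desired 2/1/1. A hedged alternative that does match the desired output (my own suggestion, not the original answer) is to merge the frame with itself on id, keep each unordered tag pair once, and count the distinct ids:

import pandas as pd

# The frame from the question.
df = pd.DataFrame({'tag': ['a', 'b', 'a', 'b', 'c'],
                   'id': ['A', 'B', 'B', 'A', 'A']})

# Pair up tags that share an id by merging the frame with itself on id.
pairs = df.merge(df, on='id')
# Keep each unordered pair once and drop same-tag pairs.
pairs = pairs[pairs['tag_x'] < pairs['tag_y']]
# Count how many distinct ids carry both tags of each pair.
result = (pairs.groupby(['tag_x', 'tag_y'])['id']
               .nunique()
               .reset_index(name='count'))
print(result)
#   tag_x tag_y  count
# 0     a     b      2
# 1     a     c      1
# 2     b     c      1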
I want to extract a range of elements from a specific column of a csv file.
I've simplified the problem to this:
data = [['a',1,'A',100],['b',2,'B',200],['c',3,'C',300],['d',4,'D',400]]
print(data[0:2][:],'\nROWS 0&1')
print(data[:][0:2],'\nCOLS 0&1')
I thought that meant
'show me all columns for just rows 0 and 1'
'show me all the rows for just columns 0 and 1'
But the output is always just showing me rows 0 and 1, never the columns:
[['a', 1, 'A', 100], ['b', 2, 'B', 200]]
ROWS 0&1
[['a', 1, 'A', 100], ['b', 2, 'B', 200]]
COLS 0&1
when I want to see this:
['a', 1, 'A', 100, 'b', 2, 'B', 200]  # ... i.e. ROWS 0 and 1
['a', 'b', 'c', 'd', 1, 2, 3, 4]      # ... i.e. COLS 0 and 1
Is there a nice way to do this?
Your problem here is that data[:] is just a copy of data:
>>> data
[['a', 1, 'A', 100], ['b', 2, 'B', 200], ['c', 3, 'C', 300], ['d', 4, 'D', 400]]
>>> data[:]
[['a', 1, 'A', 100], ['b', 2, 'B', 200], ['c', 3, 'C', 300], ['d', 4, 'D', 400]]
... so both your attempts at slicing are giving you the same result as data[0:2].
You can get just columns 0 and 1 with a list comprehension:
>>> [x[0:2] for x in data]
[['a', 1], ['b', 2], ['c', 3], ['d', 4]]
... which can be rearranged to the order you want with zip():
>>> list(zip(*(x[0:2] for x in data)))
[('a', 'b', 'c', 'd'), (1, 2, 3, 4)]
To get a single list rather than a list of 2 tuples, use itertools.chain.from_iterable():
>>> from itertools import chain
>>> list(chain.from_iterable(zip(*(x[0:2] for x in data))))
['a', 'b', 'c', 'd', 1, 2, 3, 4]
... which can also be used to collapse data[0:2]:
>>> list(chain.from_iterable(data[0:2]))
['a', 1, 'A', 100, 'b', 2, 'B', 200]
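As an aside, if the csv is loaded into a pandas DataFrame (a reasonable option for this kind of slicing, though not what the question used), iloc handles both directions directly. A minimal sketch with the same data:

import pandas as pd

data = [['a', 1, 'A', 100], ['b', 2, 'B', 200], ['c', 3, 'C', 300], ['d', 4, 'D', 400]]
df = pd.DataFrame(data)

print(df.iloc[0:2, :].values.tolist())   # rows 0 and 1
print(df.iloc[:, 0:2].values.tolist())   # columns 0 and 1, one sublist per row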