How to concatenate two lists into pandas DataFrame? - python

Hey, I have two different lists.
The first is a list of strings:
['A',
'B',
'C',
'D',
'E']
The second list contains tuples of floats:
[(-0.07154222477384509, 0.03681057318023705),
(-0.23678194754416643, 3.408617573881597e-12),
(-0.24277881018771763, 6.991906304566735e-13),
(-0.16858465905189185, 7.569580517034595e-07),
(-0.21850787663602167, 1.1718560531238815e-10)]
I want one DataFrame with three columns that looks like this:
var_name val1 val2
A -0.07154222477384509 0.03681057318023705
Ideally the new DataFrame shouldn't display scientific notation, but I don't want the values stored as strings.

Use a list comprehension with zip to build a list of tuples and pass it to the DataFrame constructor:
import pandas as pd

a = ['A',
'B',
'C',
'D',
'E']
b = [(-0.07154222477384509, 0.03681057318023705),
(-0.23678194754416643, 3.408617573881597e-12),
(-0.24277881018771763, 6.991906304566735e-13),
(-0.16858465905189185, 7.569580517034595e-07),
(-0.21850787663602167, 1.1718560531238815e-10)]
df = pd.DataFrame([(name, *vals) for name, vals in zip(a, b)])
print(df)
0 1 2
0 A -0.071542 3.681057e-02
1 B -0.236782 3.408618e-12
2 C -0.242779 6.991906e-13
3 D -0.168585 7.569581e-07
4 E -0.218508 1.171856e-10
To set the column names:
df = pd.DataFrame([(name, *vals) for name, vals in zip(a, b)],
                  columns=['var_name', 'val1', 'val2'])
print (df)
var_name val1 val2
0 A -0.071542 3.681057e-02
1 B -0.236782 3.408618e-12
2 C -0.242779 6.991906e-13
3 D -0.168585 7.569581e-07
4 E -0.218508 1.171856e-10
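The question also asked to avoid scientific notation without storing the values as strings. Scientific notation is only a display setting, so one option is pandas' display.float_format option; a sketch:

```python
import pandas as pd

a = ['A', 'B', 'C', 'D', 'E']
b = [(-0.07154222477384509, 0.03681057318023705),
     (-0.23678194754416643, 3.408617573881597e-12),
     (-0.24277881018771763, 6.991906304566735e-13),
     (-0.16858465905189185, 7.569580517034595e-07),
     (-0.21850787663602167, 1.1718560531238815e-10)]

df = pd.DataFrame([(name, *vals) for name, vals in zip(a, b)],
                  columns=['var_name', 'val1', 'val2'])

# The values stay float64; only the printed representation changes.
pd.set_option('display.float_format', '{:.18f}'.format)
print(df)
```

Note this changes display globally; pd.reset_option('display.float_format') restores the default.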

Related

how to groupby and join multiple rows from multiple columns at a time?

I want to know how to group by a single column and join the strings from multiple columns within each group.
Here's an example dataframe:
df = pd.DataFrame(np.array([['a', 'a', 'b', 'b'], [1, 1, 2, 2],
['k', 'l', 'm', 'n']]).T,
columns=['a', 'b', 'c'])
print(df)
a b c
0 a 1 k
1 a 1 l
2 b 2 m
3 b 2 n
I've tried something like,
df.groupby(['b', 'a'])['c'].apply(','.join).reset_index()
b a c
0 1 a k,l
1 2 b m,n
But that is not my required output.
Desired output:
a b c
0 1 a,a k,l
1 2 b,b m,n
How can I achieve this? I need a scalable solution because I'm dealing with millions of rows.
I think you need to group by column b only, and then if necessary select the list of columns to aggregate with GroupBy.agg:
df1 = df.groupby('b')[['a', 'c']].agg(','.join).reset_index()
#alternative if want join all columns without b
#df1 = df.groupby('b').agg(','.join).reset_index()
print (df1)
b a c
0 1 a,a k,l
1 2 b,b m,n
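If different columns will eventually need different aggregations, the same result can be written with a per-column dict passed to agg; a sketch of the same grouping:

```python
import pandas as pd

df = pd.DataFrame({'a': ['a', 'a', 'b', 'b'],
                   'b': ['1', '1', '2', '2'],
                   'c': ['k', 'l', 'm', 'n']})

# One aggregation function per column; easy to mix ','.join for one
# column with e.g. 'count' for another later on.
df1 = df.groupby('b').agg({'a': ','.join, 'c': ','.join}).reset_index()
print(df1)
```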

pandas filter Series with a list

I have a Series and a list like this
import pandas as pd

s = pd.Series(data=[1, 2, 3, 4], index=['A', 'B', 'C', 'D'])
filter_list = ['A', 'C', 'D']
print(s)
A 1
B 2
C 3
D 4
How can I create a new Series with row B removed using s and filter_list?
I mean I want to create a Series new_s with the following content
print(new_s)
A 1
C 3
D 4
s.isin(filter_list) doesn't work, because I want to filter based on the index of the Series, not its values.
Use Series.loc if all the values in the list exist in the index:
new_s = s.loc[filter_list]
print (new_s)
A 1
C 3
D 4
dtype: int64
If some of them might not exist, use Index.intersection, or isin as in Yusuf Baktir's solution:
filter_list = ['A', 'C', 'D', 'E']
new_s = s.loc[s.index.intersection(filter_list)]
print (new_s)
A 1
C 3
D 4
dtype: int64
Another alternative with numpy.in1d:
import numpy as np

filter_list = ['A', 'C', 'D', 'E']
new_s = s[np.in1d(s.index, filter_list)]
print (new_s)
A 1
C 3
D 4
dtype: int64
Basically, those are the index values, so filtering on the index will work:
s[s.index.isin(filter_list)]
for i in filter_list:
print(i,s[i])
A 1
C 3
D 4
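The same filtering can also be framed the other way around: drop the labels that are not in the list. A sketch; Index.difference only returns labels that actually exist in the index, so extra entries in filter_list are harmless:

```python
import pandas as pd

s = pd.Series(data=[1, 2, 3, 4], index=['A', 'B', 'C', 'D'])
filter_list = ['A', 'C', 'D', 'E']

# Drop every index label that is NOT in filter_list.
new_s = s.drop(s.index.difference(filter_list))
print(new_s)
```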

How to select list of list elements and make different columns in a single dataframe?

List1 = [[1,A,!,a],[2,B,#,b],[7,C,&,c],[1,B,#,c],[4,D,#,p]]
Output should be like this:
Each different column should contain 1 value of each sublist elements
for example
column1:[1,2,7,1,4]
column2:[A,B,C,B,D]
column3:[!,#,&,#,#]
column4:[a,b,c,c,p]
in the same dataframe
Assuming that you actually meant for List1 to be this (all elements are strings):
list1 = [["1","A","!","a"],["2","B","#","b"],["7","C","&","c"],["1","B","#","c"],["4","D","#","p"]]
I don't think that you need to do anything except pass list1 to the DataFrame constructor. There are several ways to pass information to a DataFrame. Using lists of lists constructs unnamed columns.
print(pd.DataFrame(list1))
0 1 2 3
0 1 A ! a
1 2 B # b
2 7 C & c
3 1 B # c
4 4 D # p
Given the list below:
l = [['1', 'A', '!', 'a'], ['2', 'B', '#', 'b'], ['7', 'C', '&', 'c'], ['1', 'B', '#', 'c'], ['4', 'D', '#', 'p']]
You can use pandas.DataFrame to convert it as below:
import pandas as pd
pd.DataFrame(l, columns=['c1', 'c2', 'c3', 'c4'])
# columns parameter for passing customized column names
Result:
c1 c2 c3 c4
0 1 A ! a
1 2 B # b
2 7 C & c
3 1 B # c
4 4 D # p
As commented (and illustrated by John L.'s answer), pandas.DataFrame should be sufficient. If what you actually want is a transposed dataframe, transpose it manually:
import pandas as pd
df = pd.DataFrame(List1).T
Or transpose beforehand using zip:
df = pd.DataFrame(list(zip(*List1)))
Both of which return:
0 1 2 3 4
0 1 2 7 1 4
1 A B C B D
2 ! # & # #
3 a b c c p
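One follow-up worth noting: because the sublists hold strings, every column comes out as object dtype. If the first element of each sublist is really numeric, cast it after construction; a sketch:

```python
import pandas as pd

list1 = [["1", "A", "!", "a"], ["2", "B", "#", "b"], ["7", "C", "&", "c"],
         ["1", "B", "#", "c"], ["4", "D", "#", "p"]]

df = pd.DataFrame(list1, columns=['c1', 'c2', 'c3', 'c4'])
# The first column arrived as strings; cast it back to integers.
df['c1'] = df['c1'].astype(int)
print(df.dtypes)
```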

Python - Group multiple values from a column to create "Other" values

I have this dataset:
Field
A
A
A
B
C
C
C
D
C
C
C
A
This has been read into pandas through the following code:
from pandas import read_csv

data = read_csv('data.csv', header=None)
print(data.describe())
How can I transform the column to get the below result?
Field
A
A
A
Others
C
C
C
Others
C
C
C
A
I want to transform values B and D, since they have low frequency, to an aggregate value "Others".
Here is one way:
import pandas as pd
df = pd.DataFrame({'Field': ['A', 'A', 'A', 'B', 'C', 'C', 'C',
'D', 'C', 'C', 'C', 'C', 'A']})
n = 2
counts = df['Field'].value_counts()
others = set(counts[counts < n].index)
df['Field'] = df['Field'].replace(list(others), 'Others')
Result
Field
0 A
1 A
2 A
3 Others
4 C
5 C
6 C
7 Others
8 C
9 C
10 C
11 C
12 A
Explanation
First get the counts of each value in Field via value_counts.
Filter for values which occur less than n times. n is user-configurable.
Finally replace those values with 'Others'.
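The three steps can also be collapsed into a vectorized alternative with Series.where, mapping each row to the frequency of its own value; a sketch (not the answer above, just an equivalent one-pass variant):

```python
import pandas as pd

df = pd.DataFrame({'Field': ['A', 'A', 'A', 'B', 'C', 'C', 'C',
                             'D', 'C', 'C', 'C', 'C', 'A']})
n = 2

# Map every row to the frequency of its value, then mask the rare ones.
freq = df['Field'].map(df['Field'].value_counts())
df['Field'] = df['Field'].where(freq >= n, 'Others')
print(df)
```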

count unique lists in dataframe

I have a pandas dataframe with a column of lists, and I would like to find a way to return a dataframe with the lists in one column and the total counts in another. My problem is finding a way to add together lists that contain the same values; for example, I want ['a', 'b'] and ['b', 'a'] to be combined in the end.
So for example the dataframe:
Lists Count
['a','b'] 2
['a','c'] 4
['b','a'] 3
would return:
Lists Count
['a','b'] 5
['a','c'] 4
Lists are unhashable, so sort each one and convert it to a tuple:
In [80]: df
Out[80]:
count lists
0 2 [a, b]
1 4 [a, c]
2 3 [b, a]
In [82]: df['lists'] = df['lists'].map(lambda x: tuple(sorted(x)))
In [83]: df
Out[83]:
count lists
0 2 (a, b)
1 4 (a, c)
2 3 (a, b)
In [76]: df.groupby('lists').sum()
Out[76]:
count
lists
(a, b) 5
(a, c) 4
You can also use sets (after coercing them to strings).
df = pd.DataFrame({'Lists': [['a', 'b'], ['a', 'c'], ['b', 'a']],
'Value': [2, 4, 3]})
df['Sets'] = df.Lists.apply(set).astype(str)
>>> df.groupby(df.Sets).Value.sum()
Sets
set(['a', 'b']) 5
set(['a', 'c']) 4
Name: Value, dtype: int64
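A third option that avoids both sorting and string coercion is mapping each list to a frozenset, which is hashable and order-insensitive; a sketch, reusing the first answer's column names:

```python
import pandas as pd

df = pd.DataFrame({'Lists': [['a', 'b'], ['a', 'c'], ['b', 'a']],
                   'Count': [2, 4, 3]})

# frozenset is hashable, so it can serve directly as a group key,
# and ['a', 'b'] and ['b', 'a'] collapse into the same key.
out = df.groupby(df['Lists'].map(frozenset), sort=False)['Count'].sum()
print(out)
```

Note that frozenset only works when duplicates within a single list don't matter; with duplicates, the sorted-tuple approach is the safe one.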
