How do I add or subtract a row to an entire pandas dataframe? - python

I have a dataframe like this:
| | a | b | c |
|---|---|---|---|
| 0 | 0 | 0 | 0 |
| 1 | 5 | 5 | 5 |
I have a dataframe row (or series) like this:
| | a | b | c |
|---|---|---|---|
| 0 | 1 | 2 | 3 |
I want to add the row to the entire dataframe (and likewise be able to subtract it) to obtain this:
| | a | b | c |
|---|---|---|---|
| 0 | 1 | 2 | 3 |
| 1 | 6 | 7 | 8 |
Any help is appreciated, thanks.

Use DataFrame.add or DataFrame.sub and convert the one-row DataFrame to a Series, e.g. with DataFrame.iloc for the first row:
df = df1.add(df2.iloc[0])
#alternative select by row label
#df = df1.add(df2.loc[0])
print (df)
a b c
0 1 2 3
1 6 7 8
Detail:
print (df2.iloc[0])
a 1
b 2
c 3
Name: 0, dtype: int64
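A self-contained sketch of both directions, rebuilding the frames from the question (the names `df1`/`df2` are assumptions):

```python
import pandas as pd

df1 = pd.DataFrame({'a': [0, 5], 'b': [0, 5], 'c': [0, 5]})
df2 = pd.DataFrame({'a': [1], 'b': [2], 'c': [3]})

# .iloc[0] turns the one-row frame into a Series; add/sub broadcast it
# across every row of df1, aligning on column labels.
added = df1.add(df2.iloc[0])
subtracted = df1.sub(df2.iloc[0])
```

Because alignment is by label, this works even if the columns of the two frames are in a different order.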

You can convert the second dataframe to numpy array:
df1 + df2.values
Output:
a b c
0 1 2 3
1 6 7 8
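A minimal sketch of the NumPy route (same assumed frames as above). Note the caveat: `.values` drops the labels, so this relies on positional broadcasting rather than column alignment.

```python
import pandas as pd

df1 = pd.DataFrame({'a': [0, 5], 'b': [0, 5], 'c': [0, 5]})
df2 = pd.DataFrame({'a': [1], 'b': [2], 'c': [3]})

# df2.values has shape (1, 3); NumPy broadcasting repeats it across
# df1's rows. Unlike .add(), labels are ignored: column order must match.
out = df1 + df2.values
```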

Related

Movement of a specific string from one column to another

Hi, I have a DataFrame with values like
| ID | Value | comments |
|----|-------|----------|
| 1 | a | |
| 2 | b | |
| 3 | a;b;c | |
| 4 | b;c | |
| 5 | d;a;c | |
I need to move a and b from Value to comments for every row they appear in, so that only values other than a and b remain in Value.
The new df would look like this:
| ID | Value | comments |
|----|-------|----------|
| 1 | | a |
| 2 | | b |
| 3 | c | a;b |
| 4 | c | b |
| 5 | d;c | a |
Can you point me in the direction I should look for the answer to this?
(i) Use str.split to split on ';' and explode the "Value" column
(ii) Use boolean indexing to filter the rows where 'a' or 'b' appears, take them out, group by the original index, and join them back with ';' as the separator
exploded_series = df['Value'].str.split(';').explode()
mask = exploded_series.isin(['a','b'])
df['comments'] = exploded_series[mask].groupby(level=0).apply(';'.join)
df['Value'] = exploded_series[~mask].groupby(level=0).apply(';'.join)
df = df.fillna('')
Output:
ID Value comments
0 1 a
1 2 b
2 3 c a;b
3 4 c b
4 5 d;c a
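A runnable version of the steps above, with the sample frame rebuilt from the question (variable names are assumptions):

```python
import pandas as pd

df = pd.DataFrame({'ID': [1, 2, 3, 4, 5],
                   'Value': ['a', 'b', 'a;b;c', 'b;c', 'd;a;c']})

exploded = df['Value'].str.split(';').explode()   # one value per row
mask = exploded.isin(['a', 'b'])                  # values to move

# group back by the original row index and re-join with ';'
df['comments'] = exploded[mask].groupby(level=0).apply(';'.join)
df['Value'] = exploded[~mask].groupby(level=0).apply(';'.join)
df = df.fillna('')                                # rows with nothing left
```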
Explode your Value column, then label each value with its destination column:
import numpy as np

out = df.assign(Value=df['Value'].str.split(';')).explode('Value')
out['col'] = np.where(out['Value'].isin(['a', 'b']), 'comments', 'Value')
print(out)
# Intermediate output
ID Value comments col
0 1 a NaN comments
1 2 b NaN comments
2 3 a NaN comments
2 3 b NaN comments
2 3 c NaN Value
3 4 b NaN comments
3 4 c NaN Value
4 5 d NaN Value
4 5 a NaN comments
4 5 c NaN Value
Now pivot your dataframe:
out = out.pivot_table(index='ID', columns='col', values='Value', aggfunc=';'.join) \
         .fillna('').reset_index().rename_axis(columns=None)
print(out)
# Final output
ID Value comments
0 1 a
1 2 b
2 3 c a;b
3 4 c b
4 5 d;c a
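End to end, the same pipeline runs as follows (sample frame reconstructed from the question):

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({'ID': [1, 2, 3, 4, 5],
                   'Value': ['a', 'b', 'a;b;c', 'b;c', 'd;a;c']})

# one row per value, tagged with the column it belongs in
out = df.assign(Value=df['Value'].str.split(';')).explode('Value')
out['col'] = np.where(out['Value'].isin(['a', 'b']), 'comments', 'Value')

# pivot the tag into columns, re-joining multiple values with ';'
out = (out.pivot_table(index='ID', columns='col', values='Value',
                       aggfunc=';'.join)
          .fillna('')
          .reset_index()
          .rename_axis(columns=None))
```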

How to create a new Data Frame with columns from lists of values (better described below)?

I got a Data Frame like this. In the values column there is a list of numbers per row; in the categories column there is a list of categories per row. Values are of type int and categories of type string. Each value in the values column always corresponds to the category at the same position of the list in the categories column. You can think of it as recipes. For example: for the recipe in the first row you need 2 of a, 4 of c, 3 of d and 5 of e.
| values | categories |
| ------ | ---------- |
| [2,4,3,5] | ['a','c','d','e'] |
| [1,6,7] | ['b','c','e'] |
| [3,5] | ['c','f'] |
I need to create a new Data Frame with pandas/ python so that it takes the distinct categories as columns and fills the rows with the corresponding values. So that it looks like this:
| a | b | c | d | e | f |
| - | - | - | - | - | - |
| 2 | 0 | 4 | 3 | 5 | 0 |
| 0 | 1 | 6 | 0 | 7 | 0 |
| 0 | 0 | 3 | 0 | 0 | 5 |
Thank you for your help.
Another option with explode and pivot:
df.apply(pd.Series.explode).pivot(columns='categories').fillna(0)
Output:
values
categories a b c d e f
0 2 0 4 3 5 0
1 0 1 6 0 7 0
2 0 0 3 0 0 5
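A runnable sketch of that one-liner, with `values='values'` added so the result has flat column labels instead of the MultiIndex shown above (frame names are assumptions):

```python
import pandas as pd

df = pd.DataFrame({'values': [[2, 4, 3, 5], [1, 6, 7], [3, 5]],
                   'categories': [['a', 'c', 'd', 'e'],
                                  ['b', 'c', 'e'],
                                  ['c', 'f']]})

# explode both list columns in step, then pivot categories into columns
out = (df.apply(pd.Series.explode)
         .pivot(columns='categories', values='values')
         .fillna(0)
         .astype(int))
```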
Use a list comprehension to build a list of dictionaries, pass it to the DataFrame constructor, then replace missing values with 0 and sort the column names:
L = [dict(zip(a, b)) for a, b in zip(df['categories'], df['values'])]
df = pd.DataFrame(L, index=df.index).fillna(0).astype(int).sort_index(axis=1)
print (df)
a b c d e f
0 2 0 4 3 5 0
1 0 1 6 0 7 0
2 0 0 3 0 0 5
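Put together, with the sample frame reconstructed from the question:

```python
import pandas as pd

df = pd.DataFrame({'values': [[2, 4, 3, 5], [1, 6, 7], [3, 5]],
                   'categories': [['a', 'c', 'd', 'e'],
                                  ['b', 'c', 'e'],
                                  ['c', 'f']]})

# one dict per row: {'a': 2, 'c': 4, ...}; missing categories become NaN
L = [dict(zip(a, b)) for a, b in zip(df['categories'], df['values'])]
out = pd.DataFrame(L, index=df.index).fillna(0).astype(int).sort_index(axis=1)
```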
Another idea is to create a dictionary from all unique sorted column names and use the {**dict1, **dict2} merge trick:
d = dict.fromkeys(sorted(set([y for x in df['categories'] for y in x])), 0)
L = [{ **d, **dict(zip(a, b))} for a, b in zip(df['categories'], df['values'])]
df = pd.DataFrame(L, index=df.index)
print (df)
a b c d e f
0 2 0 4 3 5 0
1 0 1 6 0 7 0
2 0 0 3 0 0 5

How to drop level of column in Pandas when the second level is in the first column

I have the following data
   b  c
a
0  1  2
1  3  4
I would like to drop the level of column a and have the following as output:
a  b  c
0  1  2
1  3  4
I have tried df.droplevel(1) but got the following error:
IndexError: Too many levels: Index has only 1 level, not 2
Any help is appreciated.
As suggested by @Alollz & @Ben, I did the following:
df = df.reset_index()
And got the following output
   a  b  c
0  0  1  2
1  1  3  4
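To reproduce: the frame's single row index is named a, which is why droplevel(1) fails (there is only one index level, not two). reset_index moves the named index back to an ordinary column:

```python
import pandas as pd

# a frame whose row index is *named* 'a' -- it looks like an extra
# header level in the printout but is really just the index name
df = pd.DataFrame({'b': [1, 3], 'c': [2, 4]},
                  index=pd.Index([0, 1], name='a'))

out = df.reset_index()   # 'a' becomes an ordinary column again
```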

Find count of unique value of each column and save in CSV

I have data like this:
+---+---+---+
| A | B | C |
+---+---+---+
| 1 | 2 | 7 |
| 2 | 2 | 7 |
| 3 | 2 | 1 |
| 3 | 2 | 1 |
| 3 | 2 | 1 |
+---+---+---+
I need to count the unique values of each column and report them like below:
+---+---+---+
| A | 3 | 3 |
| A | 2 | 1 |
| A | 1 | 1 |
| B | 2 | 5 |
| C | 1 | 3 |
| C | 7 | 2 |
+---+---+---+
I have no issue when the number of columns is limited and I can name them manually, but when the input file is big that becomes hard. I need a simple way to produce the output.
here is the code I have
import pandas as pd
df=pd.read_csv('1.csv')
A=df['A']
B=df['B']
C=df['C']
df1=A.value_counts()
df2=B.value_counts()
df3=C.value_counts()
all = {'A': df1,'B': df2,'C': df3}
result = pd.concat(all)
result.to_csv('out.csv')
Use DataFrame.stack with SeriesGroupBy.value_counts and then convert the Series to a DataFrame with Series.rename_axis and Series.reset_index:
df=pd.read_csv('1.csv')
result = (df.stack()
            .groupby(level=1)
            .value_counts()
            .rename_axis(['X','Y'])
            .reset_index(name='Z'))
print (result)
X Y Z
0 A 3 3
1 A 1 1
2 A 2 1
3 B 2 5
4 C 1 3
5 C 7 2
result.to_csv('out.csv', index=False)
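Runnable with the sample data rebuilt from the question:

```python
import pandas as pd

df = pd.DataFrame({'A': [1, 2, 3, 3, 3],
                   'B': [2, 2, 2, 2, 2],
                   'C': [7, 7, 1, 1, 1]})

# stack to a (row, column) -> value Series, group by column name,
# and count the values within each column
result = (df.stack()
            .groupby(level=1)
            .value_counts()
            .rename_axis(['X', 'Y'])
            .reset_index(name='Z'))
```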
You can loop over the columns and insert them into a dictionary.
You can initialize the dictionary with all={}. To be scalable, read the columns with colm=df.columns, which gives you every column in your df.
Try this code:
import pandas as pd
df=pd.read_csv('1.csv')
all={}
colm=df.columns
for i in colm:
    all.update({i:df[i].value_counts()})
result = pd.concat(all)
result.to_csv('out.csv')
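A sketch of that loop, renaming the dictionary to avoid shadowing the Python builtin all:

```python
import pandas as pd

df = pd.DataFrame({'A': [1, 2, 3, 3, 3],
                   'B': [2, 2, 2, 2, 2],
                   'C': [7, 7, 1, 1, 1]})

counts = {}                       # 'counts' instead of the builtin 'all'
for col in df.columns:            # scales to any number of columns
    counts[col] = df[col].value_counts()

result = pd.concat(counts)        # MultiIndex (column, value) -> count
```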
To find the unique values of a column:
df.A.unique()
To know the count of the unique values:
len(df.A.unique())
unique() creates an array; to find the count, use the len() function.

How to groupby count across multiple columns in pandas

I have the following sample dataframe in Python pandas:
+---+------+------+------+
| | col1 | col2 | col3 |
+---+------+------+------+
| 0 | a | d | b |
+---+------+------+------+
| 1 | a | c | b |
+---+------+------+------+
| 2 | c | b | c |
+---+------+------+------+
| 3 | b | b | c |
+---+------+------+------+
| 4 | a | a | d |
+---+------+------+------+
I would like to perform a count of all the 'a,' 'b,' 'c,' and 'd' values across columns 1-3 so that I would end up with a dataframe like this:
+---+--------+-------+
| | letter | count |
+---+--------+-------+
| 0 | a | 4 |
+---+--------+-------+
| 1 | b | 5 |
+---+--------+-------+
| 2 | c | 4 |
+---+--------+-------+
| 3 | d | 2 |
+---+--------+-------+
One way I can do this is to stack the columns on top of each other and THEN do a groupby count, but I feel like there has to be a better way. Can someone help me with this?
You can stack() the dataframe to put all columns into rows and then do value_counts:
df.stack().value_counts()
b 5
c 4
a 4
d 2
dtype: int64
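The same idea reshaped into the two-column frame the question asks for (the letter/count names are taken from the question):

```python
import pandas as pd

df = pd.DataFrame({'col1': ['a', 'a', 'c', 'b', 'a'],
                   'col2': ['d', 'c', 'b', 'b', 'a'],
                   'col3': ['b', 'b', 'c', 'c', 'd']})

# stack flattens all columns into one Series, then count every value
out = (df.stack()
         .value_counts()
         .rename_axis('letter')
         .reset_index(name='count'))
```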
You can apply value_counts per column and then sum across the columns:
print (df.apply(pd.value_counts))
col1 col2 col3
a 3.0 1 NaN
b 1.0 2 2.0
c 1.0 1 2.0
d NaN 1 1.0
df1 = df.apply(pd.value_counts).sum(1).reset_index()
df1.columns = ['letter','count']
df1['count'] = df1['count'].astype(int)
print (df1)
letter count
0 a 4
1 b 5
2 c 4
3 d 2
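The same approach end to end; note that the top-level pd.value_counts is deprecated in recent pandas, so this sketch uses pd.Series.value_counts instead:

```python
import pandas as pd

df = pd.DataFrame({'col1': ['a', 'a', 'c', 'b', 'a'],
                   'col2': ['d', 'c', 'b', 'b', 'a'],
                   'col3': ['b', 'b', 'c', 'c', 'd']})

# per-column counts (NaN where a letter is absent), then sum across
# columns; sum skips NaN by default
df1 = df.apply(pd.Series.value_counts).sum(axis=1).reset_index()
df1.columns = ['letter', 'count']
df1['count'] = df1['count'].astype(int)
```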
