Append column to pandas dataframe - python

This is probably easy, but I have the following data:
In data frame 1:
index dat1
0 9
1 5
In data frame 2:
index dat2
0 7
1 6
I want a data frame with the following form:
index dat1 dat2
0 9 7
1 5 6
I've tried using the append method, but I get a cross join (i.e. cartesian product).
What's the right way to do this?

It seems in general you're just looking for a join:
> dat1 = pd.DataFrame({'dat1': [9,5]})
> dat2 = pd.DataFrame({'dat2': [7,6]})
> dat1.join(dat2)
dat1 dat2
0 9 7
1 5 6

You can also use:
dat1 = pd.concat([dat1, dat2], axis=1)

Both join() and concat() can solve the problem. One caveat, though: reset the index before you join() or concat() if either frame was built by selecting rows from another DataFrame, because the old row labels are used for alignment.
The example below shows this behavior of join and concat:
dat1 = pd.DataFrame({'dat1': range(4)})
dat2 = pd.DataFrame({'dat2': range(4,8)})
dat1.index = [1,3,5,7]
dat2.index = [2,4,6,8]
# way1 join 2 DataFrames
print(dat1.join(dat2))
# output
dat1 dat2
1 0 NaN
3 1 NaN
5 2 NaN
7 3 NaN
# way2 concat 2 DataFrames
print(pd.concat([dat1,dat2],axis=1))
#output
dat1 dat2
1 0.0 NaN
2 NaN 4.0
3 1.0 NaN
4 NaN 5.0
5 2.0 NaN
6 NaN 6.0
7 3.0 NaN
8 NaN 7.0
#reset index
dat1 = dat1.reset_index(drop=True)
dat2 = dat2.reset_index(drop=True)
# now both ways give the same result
print(dat1.join(dat2))
dat1 dat2
0 0 4
1 1 5
2 2 6
3 3 7
print(pd.concat([dat1,dat2],axis=1))
dat1 dat2
0 0 4
1 1 5
2 2 6
3 3 7

Perhaps too simple, but anyways...
dat1 = pd.DataFrame({'dat1': [9,5]})
dat2 = pd.DataFrame({'dat2': [7,6]})
dat1['dat2'] = dat2 # Uses indices from dat1
Result:
dat1 dat2
0 9 7
1 5 6

You can assign a new column. Indices are used to align the corresponding rows:
df1 = pd.DataFrame({'A': [1, 2, 3], 'B': [10, 20, 30]}, index=[0, 1, 2])
df2 = pd.DataFrame({'C': [100, 200, 300]}, index=[1, 2, 3])
df1['C'] = df2['C']
Result:
A B C
0 1 10 NaN
1 2 20 100.0
2 3 30 200.0
Ignore indices:
df1['C'] = df2['C'].reset_index(drop=True)
Result:
A B C
0 1 10 100
1 2 20 200
2 3 30 300
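If you want to ignore df2's index without resetting it, assigning the raw array also works; a small sketch (my addition): .to_numpy() strips the index, so values are assigned positionally.

```python
import pandas as pd

df1 = pd.DataFrame({'A': [1, 2, 3], 'B': [10, 20, 30]}, index=[0, 1, 2])
df2 = pd.DataFrame({'C': [100, 200, 300]}, index=[1, 2, 3])

# .to_numpy() drops df2's index, so values land row-by-row
df1['C'] = df2['C'].to_numpy()
print(df1)
#    A   B    C
# 0  1  10  100
# 1  2  20  200
# 2  3  30  300
```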

Just a matter of the right google search (note that DataFrame.append was deprecated and removed in pandas 2.0):
data = dat_1.append(dat_2)
data = data.groupby(data.index).sum()
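Since DataFrame.append is gone in pandas 2.0, a pd.concat-based sketch of the same trick (assuming frames like those in the question) might look like this:

```python
import pandas as pd

dat_1 = pd.DataFrame({'dat1': [9, 5]})
dat_2 = pd.DataFrame({'dat2': [7, 6]})

# stack the frames, then collapse rows that share an index label;
# sum skips NaN, so each cell keeps its single real value
data = pd.concat([dat_1, dat_2])
data = data.groupby(data.index).sum(min_count=1)
print(data)
#    dat1  dat2
# 0   9.0   7.0
# 1   5.0   6.0
```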

Hide some columns

I have code working as below.
from yahoo_fin import stock_info as si
import yfinance as yf
import talib as ta
import pandas as pd
import datetime

df = pd.read_csv('portfoy.csv')

def anlik_fiyat(data):
    return si.get_live_price(data)

hisseler = df["hisse"]
liste = []
for hisse in hisseler:
    fiyat = round(anlik_fiyat(hisse), 2)
    print(hisse, " geldi")
    liste.append(fiyat)
df.insert(5, 'guncel', liste)

# Do the calculations
m = df['satis_fiyati'].isna()
acik = (df.loc[m, 'alis_fiyati'] * df.loc[m, 'miktar']).sum()
print("\n", "-"*32, "AÇIK POZİSYONLAR", "-"*32, "\n")
print(df.loc[df['satis_fiyati'].isna()])
print("Açık Pozisyonlar:", acik)
When it works, the results are as follows.
tip hisse alis_tarihi alis_fiyati miktar guncel satis_fiyati satis_tarihi
1 hisse ISCTR.IS 27-06-2022 4.56 21 4.93 NaN NaN
2 hisse SAHOL.IS 04-07-2022 19.21 5 19.73 NaN NaN
5 hisse SAHOL.IS 07-07-2022 18.50 5 19.73 NaN NaN
6 hisse AYGAZ.IS 21-07-2022 35.20 3 35.50 NaN NaN
7 hisse KCHOL.IS 21-07-2022 36.12 3 36.00 NaN NaN
Açık Pozisyonlar: 498.27
I don't want to see tip, satis_fiyati and satis_tarihi columns among the results.
PS: if I add the following lines:
df.drop(['tip'], inplace=True, axis=1)
df.drop(['satis_fiyati'], inplace=True, axis=1)
df.drop(['satis_tarihi'], inplace=True, axis=1)
it gives an error, and repeating drop per column is verbose anyway.
How can I solve this easily?
Hope this small example will help you.
Let's say we have a dataframe like this, and our goal is to exclude (or include) some columns from the results:
df = pd.DataFrame({'a':[1, 2, 3], 'b':[7,9,5], 'c' : [9, 6, 4], 'd':[0, 0, 0]})
a b c d
0 1 7 9 0
1 2 9 6 0
2 3 5 4 0
List of columns we want to include/exclude:
col = ['a', 'c']
To include columns:
df.loc[:, df.columns.isin(col)]
# or df.loc[:, col]
# or df[col]
a c
0 1 9
1 2 6
2 3 4
To exclude columns:
df.loc[:, ~df.columns.isin(col)]
b d
0 7 0
1 9 0
2 5 0
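Applied to the asker's frame, the three separate drop calls can also be collapsed into a single drop(columns=...) call; a sketch with a toy frame (column names taken from the question):

```python
import pandas as pd

# toy frame reusing the question's column names
df = pd.DataFrame({'tip': ['hisse'], 'hisse': ['ISCTR.IS'], 'guncel': [4.93],
                   'satis_fiyati': [None], 'satis_tarihi': [None]})

# drop several columns in one call; errors='ignore' skips names that are absent
trimmed = df.drop(columns=['tip', 'satis_fiyati', 'satis_tarihi'], errors='ignore')
print(trimmed.columns.tolist())  # ['hisse', 'guncel']
```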

How can I fill NaN data frames with the values of the previous data frame in the same list

I have a list A of data frames. Most of them are all-NaN; some are not. I would like to fill every NaN data frame with the values of the previous non-NaN data frame in the list.
Here's a small example:
import numpy as np
import pandas as pd

A=[]
data = {'set_of_numbers': [1,2,3,4,4,5,9]}
df1 = pd.DataFrame(data,columns=['set_of_numbers'])
data2 = {'set_of_numbers': [0,np.nan,np.nan,np.nan,np.nan,np.nan,np.nan]}
df2 = pd.DataFrame(data2,columns=['set_of_numbers'])
data3 = {'set_of_numbers': [3,3,3,8,4,5,8]}
df3 = pd.DataFrame(data3,columns=['set_of_numbers'])
data4 = {'set_of_numbers': [0,np.nan,np.nan,np.nan,np.nan,np.nan,np.nan]}
df4 = pd.DataFrame(data4,columns=['set_of_numbers'])
A.append(df1)
A.append(df2)
A.append(df3)
A.append(df4)
A
I would like each NaN data frame to be filled with the values of the previous data frame.
If I understand correctly:
for i, df in enumerate(A):
    df[df.isnull()] = A[i-1]
or, if you wish to change the dtype of a previously non-nan df:
for i, df in enumerate(A):
    if df.isnull().all().all():
        A[i] = A[i-1].copy()
per OP's EDIT on the question:
for i, df in enumerate(A):
    if df.isnull().any().any():
        A[i] = A[i-1].copy()
output:
[ set_of_numbers
0 1
1 2
2 3
3 4
4 4
5 5
6 9, set_of_numbers
0 1
1 2
2 3
3 4
4 4
5 5
6 9, set_of_numbers
0 3
1 3
2 3
3 8
4 4
5 5
6 8, set_of_numbers
0 3
1 3
2 3
3 8
4 4
5 5
6 8]
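If the frames are only partially NaN, combine_first fills cell-by-cell from the previous frame instead of replacing it wholesale; a hedged sketch (my variation, not the answer above):

```python
import numpy as np
import pandas as pd

A = [pd.DataFrame({'set_of_numbers': [1, 2, 3, 4, 4, 5, 9]}),
     pd.DataFrame({'set_of_numbers': [0, np.nan, np.nan, np.nan,
                                      np.nan, np.nan, np.nan]})]

# combine_first keeps existing values and fills only the NaN cells
# from the previous frame in the list
for i in range(1, len(A)):
    A[i] = A[i].combine_first(A[i - 1])
print(A[1])
```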

How to create a data frame from lists of different lengths using python?

I want to create a pandas data frame from multiple lists of different lengths. Below is my python code.
import random  # missing in the original snippet

import pandas as pd

A = [1, 2]
B = [1, 2, 3]
C = [1, 2, 3, 4, 5, 6]
lenA = len(A)
lenB = len(B)
lenC = len(C)
df = pd.DataFrame(columns=['A', 'B', 'C'])
for i, v1 in enumerate(A):
    for j, v2 in enumerate(B):
        for k, v3 in enumerate(C):
            if i < random.randint(0, lenA):
                if j < random.randint(0, lenB):
                    if k < random.randint(0, lenC):
                        df = df.append({'A': v1, 'B': v2, 'C': v3}, ignore_index=True)
print(df)
My lists are as below:
A=[1,2]
B=[1,2,3]
C=[1,2,3,4,5,6,7]
In each run I get a different output, which is correct. But each run does not cover all list items. In one run I got this output:
A B C
0 1 1 3
1 1 2 1
2 1 2 2
3 2 2 5
In the above output, all items of list 'A' (1, 2) are there. But list 'B' has only items (1, 2); item 3 is missing. Likewise list 'C' has only items (1, 2, 3, 5); items (4, 6, 7) are missing. My expectation is: each item of each list should appear in the data frame at least once, and the items of list 'C' should appear exactly once. My expected sample output is below:
A B C
0 1 1 3
1 1 2 1
2 1 2 2
3 2 2 5
4 2 3 4
5 1 1 7
6 2 3 6
Guide me to get my expected output. Thanks in advance.
You can pad each list with random choices from itself up to the maximum length, and then use DataFrame.sample:
import numpy as np
import pandas as pd

A=[1,2]
B=[1,2,3]
C=[1,2,3,4,5,6]
L = [A,B,C]
m = max(len(x) for x in L)
print (m)
6
a = [np.hstack((np.random.choice(x, m - len(x)), x)) for x in L]
df = pd.DataFrame(a, index=['A', 'B', 'C']).T.sample(frac=1)
print (df)
A B C
2 2 2 3
0 2 1 1
3 1 1 4
4 1 2 5
5 2 3 6
1 2 2 2
You can use transpose to achieve the same.
EDIT: Used random to randomize the output as requested.
import pandas as pd
from random import shuffle, choice
A=[1,2]
B=[1,2,3]
C=[1,2,3,4,5,6]
shuffle(A)
shuffle(B)
shuffle(C)
data = [A,B,C]
df = pd.DataFrame(data)
df = df.transpose()
df.columns = ['A', 'B', 'C']
# fillna with inplace=True on a .loc slice can act on a copy, so assign instead
df['A'] = df['A'].fillna(choice(A))
df['B'] = df['B'].fillna(choice(B))
After the transpose and before the fillna calls, the frame looks like this (row order will vary because of the shuffles):
A B C
0 1.0 1.0 1.0
1 2.0 2.0 2.0
2 NaN 3.0 3.0
3 NaN 4.0 4.0
4 NaN NaN 5.0
5 NaN NaN 6.0

How can I extract a column from a dataframe and append it as rows while keeping other columns intact

How can I extract a column from a pandas dataframe and append its values as rows, while keeping the other columns the same?
This is my example dataset.
import pandas as pd
import numpy as np
df = pd.DataFrame({'ID': np.arange(0, 5),
                   'sample_1': [5, 6, 7, 8, 9],
                   'sample_2': [10, 11, 12, 13, 14],
                   'group_id': ["A", "B", "C", "D", "E"]})
The output I'm looking for is:
df2 = pd.DataFrame({'ID': [0, 1, 2, 3, 4, 0, 1, 2, 3, 4],
                    'sample_1': [5, 6, 7, 8, 9, 10, 11, 12, 13, 14],
                    'group_id': ["A", "B", "C", "D", "E", "A", "B", "C", "D", "E"]})
I have tried to slice the dataframe and concat using pd.concat but it was giving NaN values.
My original dataset is large.
You could do this using stack: Set the index to the columns you don't want to modify, call stack, sort by the "sample" column, then reset your index:
df.set_index(['ID','group_id']).stack().sort_values(0).reset_index([0,1]).reset_index(drop=True)
ID group_id 0
0 0 A 5
1 1 B 6
2 2 C 7
3 3 D 8
4 4 E 9
5 0 A 10
6 1 B 11
7 2 C 12
8 3 D 13
9 4 E 14
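The stack approach leaves the value column labelled 0; a small runnable sketch (the final rename is my addition, not part of the original answer) to match the desired column name:

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({'ID': np.arange(0, 5),
                   'sample_1': [5, 6, 7, 8, 9],
                   'sample_2': [10, 11, 12, 13, 14],
                   'group_id': ["A", "B", "C", "D", "E"]})

# stack, sort by value, restore ID/group_id as columns, then rename 0 -> sample_1
res = (df.set_index(['ID', 'group_id']).stack()
         .sort_values()
         .reset_index([0, 1])
         .reset_index(drop=True)
         .rename(columns={0: 'sample_1'}))
print(res)
```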
Using pd.wide_to_long:
res = pd.wide_to_long(df, stubnames='sample_', i='ID', j='group_id')
res.index = res.index.droplevel(1)
res = res.rename(columns={'sample_': 'sample_1'}).reset_index()
print(res)
ID group_id sample_1
0 0 A 5
1 1 B 6
2 2 C 7
3 3 D 8
4 4 E 9
5 0 A 10
6 1 B 11
7 2 C 12
8 3 D 13
9 4 E 14
The function you are looking for is called melt.
For example:
df2 = pd.melt(df, id_vars=['ID', 'group_id'], value_vars=['sample_1', 'sample_2'], value_name='sample')
df2 = df2.drop('variable', axis=1).rename(columns={'sample': 'sample_1'})
(Recent pandas versions raise a ValueError if value_name matches an existing column, so melt to a temporary name and rename afterwards.)

Filling Pandas columns with lists of unequal lengths

I am having trouble filling Pandas dataframes with values from lists of unequal lengths.
nx_lists_into_df is a list of numpy arrays.
I get the following error:
ValueError: Length of values does not match length of index
The code is below:
from itertools import zip_longest

import pandas as pd
from numpy import array

# Column headers
df_cols = ["f1", "f2"]
# Create one dataframe for each sheet
df1 = pd.DataFrame(columns=df_cols)
df2 = pd.DataFrame(columns=df_cols)
# Create list of dataframes to iterate through
df_list = [df1, df2]
# Lists to be put into the dataframes
nx_lists_into_df = [[array([0, 1, 3, 4, 7]),
                     array([2, 5, 6, 8])],
                    [array([0, 1, 2, 6, 7]),
                     array([3, 4, 5, 8])]]
# Loop through each sheet (i.e. each round of k folds)
for df, test_index_list in zip_longest(df_list, nx_lists_into_df):
    counter = -1
    # Loop through each column in that sheet (i.e. each fold)
    for col in df_cols:
        print(col)
        counter += 1
        # Add 1 to each index value to start indexing at 1
        df[col] = test_index_list[counter] + 1
Thank you for your help.
Edit: this is how the result should hopefully look:
print(df1)
f1 f2
0 0 2
1 1 5
2 3 6
3 4 8
4 7 NaN
print(df2)
f1 f2
0 0 3
1 1 4
2 2 5
3 6 8
4 7 NaN
We'll leverage pd.Series, which attaches an appropriate index and lets us use the pd.DataFrame constructor without it complaining about unequal lengths.
df1, df2 = (
    pd.DataFrame(dict(zip(df_cols, map(pd.Series, d))))
    for d in nx_lists_into_df
)
print(df1)
f1 f2
0 0 2.0
1 1 5.0
2 3 6.0
3 4 8.0
4 7 NaN
print(df2)
f1 f2
0 0 3.0
1 1 4.0
2 2 5.0
3 6 8.0
4 7 NaN
Setup
from numpy import array
nx_lists_into_df = [[array([0, 1, 3, 4, 7]),
                     array([2, 5, 6, 8])],
                    [array([0, 1, 2, 6, 7]),
                     array([3, 4, 5, 8])]]
# Column headers
df_cols = ["f1","f2"]
You could predefine the size of your DataFrames by setting the index range to the length of the longest column you want to add (or any larger size), like so:
df1 = pd.DataFrame(columns=df_cols, index=range(5))
df2 = pd.DataFrame(columns=df_cols, index=range(5))
print(df1)
f1 f2
0 NaN NaN
1 NaN NaN
2 NaN NaN
3 NaN NaN
4 NaN NaN
(df2 is the same)
The DataFrame will be filled with NaNs automatically.
Then you use .loc to access each entry separately like so:
for x in range(len(nx_lists_into_df)):
    for col_idx, y in enumerate(nx_lists_into_df[x]):
        df_list[x].loc[range(len(y)), df_cols[col_idx]] = y
print(df1)
f1 f2
0 0 2
1 1 5
2 3 6
3 4 8
4 7 NaN
print(df2)
f1 f2
0 0 3
1 1 4
2 2 5
3 6 8
4 7 NaN
The first loop iterates over the first dimension of your array (or the number of DataFrames you want to create).
The second loop iterates over the column values for the DataFrame, where y are the values for the current column and df_cols[col_idx] is the respective column (f1 or f2).
Since the row & col indices are the same size as y, you don't get the length mismatch.
Also check out the enumerate(iterable, start=0) function to get around those "counter" variables.
Hope this helps.
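For reference, the enumerate hint above replaces the manual counter like this (tiny sketch, my addition):

```python
# enumerate yields (index, value) pairs, replacing counter = -1 / counter += 1
df_cols = ["f1", "f2"]
for counter, col in enumerate(df_cols):
    print(counter, col)
# prints:
# 0 f1
# 1 f2
```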
If I understand correctly, this is possible via pd.concat.
But see #pir's solution for an extendable version.
# Lists to be put into the dataframes
nx_lists_into_df = [[array([0, 1, 3, 4, 7]),
array([2, 5, 6, 8])],
[array([0, 1, 2, 6, 7]),
array([3, 4, 5, 8])]]
df1 = pd.concat([pd.DataFrame({'A': nx_lists_into_df[0][0]}),
                 pd.DataFrame({'B': nx_lists_into_df[0][1]})],
                axis=1)
# A B
# 0 0 2.0
# 1 1 5.0
# 2 3 6.0
# 3 4 8.0
# 4 7 NaN
df2 = pd.concat([pd.DataFrame({'C': nx_lists_into_df[1][0]}),
                 pd.DataFrame({'D': nx_lists_into_df[1][1]})],
                axis=1)
# C D
# 0 0 3.0
# 1 1 4.0
# 2 2 5.0
# 3 6 8.0
# 4 7 NaN
