I am trying to add a column to a pandas dataframe, like so:
df = pd.DataFrame()
df['one'] = pd.Series({'1':4, '2':6})
print (df)
df['two'] = pd.Series({'0':4, '2':6})
print (df)
This yields:
one two
1 4 NaN
2 6 6
However, I would like the result to be:
one two
0 NaN 4
1 4 NaN
2 6 6
How do you do that?
One possibility is to use pd.concat:
ser1 = pd.Series({'1':4, '2':6})
ser2 = pd.Series({'0':4, '2':6})
df = pd.concat((ser1, ser2), axis=1)
to get
0 1
0 NaN 4
1 4 NaN
2 6 6
You can use join, telling pandas exactly how you want to do it:
df = pd.DataFrame()
df['one'] = pd.Series({'1':4, '2':6})
df.join(pd.Series({'0':4, '2':6}, name = 'two'), how = 'outer')
This results in
one two
0 NaN 4
1 4 NaN
2 6 6
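Note that join returns a new DataFrame rather than modifying df in place, so if you want to keep working with the combined frame, assign the result back (a small usage note, not part of the original answer):
df = df.join(pd.Series({'0':4, '2':6}, name='two'), how='outer')
print (df)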
I have the following code, which works:
from yahoo_fin import stock_info as si
import yfinance as yf
import talib as ta
import pandas as pd
import datetime
df = pd.read_csv('portfoy.csv')
def anlik_fiyat(data):
    return si.get_live_price(data)

hisseler = df["hisse"]
liste = []
for hisse in hisseler:
    fiyat = round(anlik_fiyat(hisse), 2)
    print(hisse, " geldi")
    liste.append(fiyat)
df.insert(5, 'guncel', liste)
# Do the calculations
m = df['satis_fiyati'].isna()
acik=(df.loc[m, 'alis_fiyati']*df.loc[m, 'miktar']).sum()
print("\n", "-"*32, "AÇIK POZİSYONLAR", "-"*32, "\n")
print(df.loc[df['satis_fiyati'].isna()])
print("Açık Pozisyonlar:", acik)
When it runs, the results are as follows.
tip hisse alis_tarihi alis_fiyati miktar guncel satis_fiyati satis_tarihi
1 hisse ISCTR.IS 27-06-2022 4.56 21 4.93 NaN NaN
2 hisse SAHOL.IS 04-07-2022 19.21 5 19.73 NaN NaN
5 hisse SAHOL.IS 07-07-2022 18.50 5 19.73 NaN NaN
6 hisse AYGAZ.IS 21-07-2022 35.20 3 35.50 NaN NaN
7 hisse KCHOL.IS 21-07-2022 36.12 3 36.00 NaN NaN
Açık Pozisyonlar: 498.27
I don't want to see the tip, satis_fiyati and satis_tarihi columns in the results.
P.S.: If I add the following lines,
df.drop(['tip'], inplace=True, axis=1)
df.drop(['satis_fiyati'], inplace=True, axis=1)
df.drop(['satis_tarihi'], inplace=True, axis=1)
it gives an error. Also, this is a very long-winded approach.
How can I solve this easily?
Hope this small example will help you.
Let's say we have a dataframe like this, and our goal is to exclude (or include) some columns from the results:
df = pd.DataFrame({'a':[1, 2, 3], 'b':[7,9,5], 'c' : [9, 6, 4], 'd':[0, 0, 0]})
a b c d
0 1 7 9 0
1 2 9 6 0
2 3 5 4 0
List of columns we want to include/exclude:
col = ['a', 'c']
To include columns:
df.loc[:, df.columns.isin(col)]
# or df.loc[:, col]
# or df[col]
a c
0 1 9
1 2 6
2 3 4
To exclude columns:
df.loc[:, ~df.columns.isin(col)]
b d
0 7 0
1 9 0
2 5 0
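Applied to the original question (assuming the column names shown in the posted output), you can either hide the columns only when printing, or drop them all in a single call instead of three separate drop statements:
cols_to_hide = ['tip', 'satis_fiyati', 'satis_tarihi']
# hide the columns only when printing, keeping df intact for the calculations
print(df.loc[df['satis_fiyati'].isna(), ~df.columns.isin(cols_to_hide)])
# or, if the columns are not needed at all, drop them in one call
# df = df.drop(columns=cols_to_hide)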
I have two dataframes,
df1:
hash a b c
ABC 1 2 3
def 5 3 4
Xyz 3 2 -1
df2:
hash v
Xyz 3
def 5
I want to make
df:
hash a b c
ABC 1 2 3 (= as is, because no matching 'ABC' in df2)
def 25 15 20 (= 5*5 3*5 4*5)
Xyz 9 6 -3 (= 3*3 2*3 -1*3)
As shown above, I want to make a dataframe whose values come from multiplying df1 by df2 wherever their index (or first column) matches.
As df2 only has one column (v), all of df1's columns except the first one (the index) should be affected.
Is there any neat, Pythonic pandas way to achieve it?
df1.set_index(['hash']).mul(df2.set_index(['hash'])) and similar attempts do not seem to work.
One approach:
df1 = df1.set_index("hash")
df2 = df2.set_index("hash")["v"]
res = df1.mul(df2, axis=0).combine_first(df1)
print(res)
Output
a b c
hash
ABC 1.0 2.0 3.0
Xyz 9.0 6.0 -3.0
def 25.0 15.0 20.0
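Here mul aligns both objects on the hash index, so rows with no match in df2 (ABC) become NaN, and combine_first fills those rows back in from the original df1. An equivalent sketch of my own (not part of the answer) reindexes df2 with a fill value of 1, so unmatched rows are simply multiplied by 1:
# reindex the Series to df1's index, filling missing hashes with 1 (neutral for multiplication)
res = df1.mul(df2.reindex(df1.index, fill_value=1), axis=0)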
One Method:
# We'll make this for convenience
cols = ['a', 'b', 'c']
# Merge the DataFrames, keeping everything from df1
df = df1.merge(df2, 'left').fillna(1)
# We'll make the v column integers again since it's been filled.
df.v = df.v.astype(int)
# Broadcast the multiplication across axis 0
df[cols] = df[cols].mul(df.v, axis=0)
# Drop the no-longer needed column:
df = df.drop('v', axis=1)
print(df)
Output:
hash a b c
0 ABC 1 2 3
1 def 25 15 20
2 Xyz 9 6 -3
Alternative Method:
# Set indices
df1 = df1.set_index('hash')
df2 = df2.set_index('hash')
# Apply multiplication and fill values
df = (df1.mul(df2.v, axis=0)
      .fillna(df1)
      .astype(int)
      .reset_index())
# Output:
hash a b c
0 ABC 1 2 3
1 Xyz 9 6 -3
2 def 25 15 20
The function you are looking for is actually multiply.
Here's how I have done it:
>>> df
hash a b
0 ABC 1 2
1 DEF 5 3
2 XYZ 3 -1
>>> df2
hash v
0 XYZ 4
1 ABC 8
df = df.merge(df2, on='hash', how='left').fillna(1)
>>> df
hash a b v
0 ABC 1 2 8.0
1 DEF 5 3 1.0
2 XYZ 3 -1 4.0
df[['a','b']] = df[['a','b']].multiply(df['v'], axis='index')
>>> df
hash a b v
0 ABC 8.0 16.0 8.0
1 DEF 5.0 3.0 1.0
2 XYZ 12.0 -4.0 4.0
You can actually drop v at the end if you don't need it.
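For completeness, dropping it is a one-liner (drop returns a new DataFrame unless inplace=True is passed):
df = df.drop(columns='v')
print(df)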
Having two dataframes df1 and df2 (with the same number of rows), how can we, very simply, take all the columns from df2 and add them to df1? Using join, we would join them on the index or on a given column, but assume their indexes are completely different and they have no columns in common. Is that doable (without the obvious way of looping over each column in df2 and adding it as a new column to df1)?
EDIT: added an example.
Note: no index or column names are mentioned, since they should not matter (that is the "problem").
df1 = [[1, 3, 2],
       [11, 20, 33]]
df2 = [["bird", np.nan, 37, np.sqrt(2)],
       ["dog", 0.123, 3.14, 0]]
pd.some_operation(df1, df2)
# [[1, 3, 2, "bird", np.nan, 37, np.sqrt(2)],
#  [11, 20, 33, "dog", 0.123, 3.14, 0]]
Samples:
df1 = pd.DataFrame({
'A':list('abcdef'),
'B':[4,5,4,5,5,4],
'C':[7,8,9,4,2,3],
}, index = list('QRSTUW'))
df2 = pd.DataFrame({
'D':[1,3,5,7,1,0],
'E':[5,3,6,9,2,4],
'F':list('aaabbb')
}, index = list('KLMNOP'))
Pandas always aligns on index values when using join or concat with axis=1, so for correct alignment it is necessary to create the same index values:
df = df1.join(df2.set_index(df1.index))
df = pd.concat([df1, df2.set_index(df1.index)], axis=1)
print (df)
A B C D E F
Q a 4 7 1 5 a
R b 5 8 3 3 a
S c 4 9 5 6 a
T d 5 4 7 9 b
U e 5 2 1 2 b
W f 4 3 0 4 b
Or create a default index in both DataFrames:
df = df1.reset_index(drop=True).join(df2.reset_index(drop=True))
df = pd.concat([df1.reset_index(drop=True), df2.reset_index(drop=True)], axis=1)
print (df)
A B C D E F
0 a 4 7 1 5 a
1 b 5 8 3 3 a
2 c 4 9 5 6 a
3 d 5 4 7 9 b
4 e 5 2 1 2 b
5 f 4 3 0 4 b
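For contrast, a quick sketch of what happens without aligning the indexes first: since QRSTUW and KLMNOP are disjoint, pandas builds the union of both indexes and fills the non-overlapping halves with NaN:
print (pd.concat([df1, df2], axis=1))
# 12 rows: D, E, F are NaN for index Q..W and A, B, C are NaN for index K..P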
I have a dataframe containing some data, which I want to transform, so that the values of one column define the new columns.
>>> import pandas as pd
>>> df = pd.DataFrame([['a','a','b','b'],[6,7,8,9]], index=['A','B']).T
>>> df
A B
0 a 6
1 a 7
2 b 8
3 b 9
The values of the column A shall be the column names of the new dataframe. The result of the transformation should look like this:
a b
0 6 8
1 7 9
What I came up with so far didn't work completely:
>>> pd.DataFrame({ k : df.loc[df['A'] == k, 'B'] for k in df['A'].unique() })
a b
0 6 NaN
1 7 NaN
2 NaN 8
3 NaN 9
Besides this being incorrect, I guess there probably is a more efficient way anyway. I'm just really having a hard time understanding how to handle things with pandas.
You were almost there, but you need .values to get the underlying arrays (so that the original index is dropped), and then you can provide the column names.
pd.DataFrame(pd.DataFrame({ k : df.loc[df['A'] == k, 'B'].values for k in df['A'].unique() }), columns=df['A'].unique())
Output:
a b
0 6 8
1 7 9
Using a dictionary comprehension with groupby:
res = pd.DataFrame({col: vals['B'].values for col, vals in df.groupby('A')})
print(res)
a b
0 6 8
1 7 9
Use set_index, groupby, cumcount, and unstack:
(df.set_index(['A', df.groupby('A').cumcount()])['B']
   .unstack(0)
   .rename_axis([None], axis=1))
Output:
a b
0 6 8
1 7 9
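To see why cumcount is needed: it assigns each row its position within its A group, so the (A, position) index pairs are unique and unstack(0) can pivot the A level into columns without collisions. A quick way to inspect that intermediate step:
print(df.groupby('A').cumcount())  # 0, 1, 0, 1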
In a pandas dataframe matrix, I would like to find the rows (indices) containing NaN.
For finding NaN in columns I would do
idx_nan = matrix.columns[np.isnan(matrix).any(axis=1)]
but it doesn't work with matrix.rows.
What is the equivalent for finding items in rows?
I think you need DataFrame.isnull with any and boolean indexing:
print (df[df.isnull().any(axis=1)].index)
Sample:
df = pd.DataFrame({'A':[1,2,3],
'B':[4,5,6],
'C':[np.nan,8,9],
'D':[1,3,5],
'E':[5,3,6],
'F':[7,4,3]})
print (df)
A B C D E F
0 1 4 NaN 1 5 7
1 2 5 8.0 3 3 4
2 3 6 9.0 5 6 3
print (df[df.isnull().any(axis=1)].index)
Int64Index([0], dtype='int64')
Other solutions:
idx_nan = df[np.isnan(df).any(axis=1)].index
print (idx_nan)
Int64Index([0], dtype='int64')
idx_nan = df.index[np.isnan(df).any(axis=1)]
print (idx_nan)
Int64Index([0], dtype='int64')
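If you want the rows themselves rather than just their index, the same boolean mask can be used directly (a small sketch using isna, the non-deprecated spelling of isnull):
print (df[df.isna().any(axis=1)])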