I have the following working code:
from yahoo_fin import stock_info as si
import yfinance as yf
import talib as ta
import pandas as pd
import datetime
df = pd.read_csv('portfoy.csv')
def anlik_fiyat(data):
    return si.get_live_price(data)

hisseler = df["hisse"]
liste = []
for hisse in hisseler:
    fiyat = round(anlik_fiyat(hisse), 2)
    print(hisse, " geldi")
    liste.append(fiyat)
df.insert(5, 'guncel', liste)
# Do the calculations
m = df['satis_fiyati'].isna()
acik=(df.loc[m, 'alis_fiyati']*df.loc[m, 'miktar']).sum()
print("\n", "-"*32, "AÇIK POZİSYONLAR", "-"*32, "\n")
print(df.loc[df['satis_fiyati'].isna()])
print("Açık Pozisyonlar:", acik)
When it runs, the output is as follows:
tip hisse alis_tarihi alis_fiyati miktar guncel satis_fiyati satis_tarihi
1 hisse ISCTR.IS 27-06-2022 4.56 21 4.93 NaN NaN
2 hisse SAHOL.IS 04-07-2022 19.21 5 19.73 NaN NaN
5 hisse SAHOL.IS 07-07-2022 18.50 5 19.73 NaN NaN
6 hisse AYGAZ.IS 21-07-2022 35.20 3 35.50 NaN NaN
7 hisse KCHOL.IS 21-07-2022 36.12 3 36.00 NaN NaN
Açık Pozisyonlar: 498.27
I don't want to see the tip, satis_fiyati and satis_tarihi columns in the results.
PS: if I add the following lines
df.drop(['tip'], inplace=True, axis=1)
df.drop(['satis_fiyati'], inplace=True, axis=1)
df.drop(['satis_tarihi'], inplace=True, axis=1)
it gives an error. It is also a very long-winded way of doing it.
How can I solve this easily?
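For what it's worth, a common cause of that drop error is asking for a column that no longer exists (e.g. after a re-run has already removed it). A minimal sketch of a single drop call that tolerates missing columns, on hypothetical stand-in data rather than the real portfoy.csv:

```python
import pandas as pd

# Hypothetical one-row stand-in for portfoy.csv
df = pd.DataFrame({'tip': ['hisse'], 'hisse': ['ISCTR.IS'],
                   'alis_fiyati': [4.56], 'miktar': [21],
                   'satis_fiyati': [float('nan')], 'satis_tarihi': [float('nan')]})

# One drop call handles all three columns; errors='ignore' skips any that are absent
view = df.drop(columns=['tip', 'satis_fiyati', 'satis_tarihi'], errors='ignore')
print(view.columns.tolist())  # ['hisse', 'alis_fiyati', 'miktar']
```

Dropping into a new variable (instead of inplace=True) also leaves the original frame untouched for later calculations.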
Hope this small example helps.
Let's say we have a dataframe like this, and our goal is to exclude (or include) some columns in the results:
df = pd.DataFrame({'a':[1, 2, 3], 'b':[7,9,5], 'c' : [9, 6, 4], 'd':[0, 0, 0]})
a b c d
0 1 7 9 0
1 2 9 6 0
2 3 5 4 0
List of columns we want to include/exclude:
col = ['a', 'c']
To include columns:
df.loc[:, df.columns.isin(col)]
# or df.loc[:, col]
# or df[col]
a c
0 1 9
1 2 6
2 3 4
To exclude columns:
df.loc[:, ~df.columns.isin(col)]
b d
0 7 0
1 9 0
2 5 0
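For completeness, filter and drop are equivalent spellings of include and exclude, shown here on the same toy frame:

```python
import pandas as pd

df = pd.DataFrame({'a': [1, 2, 3], 'b': [7, 9, 5], 'c': [9, 6, 4], 'd': [0, 0, 0]})
col = ['a', 'c']

included = df.filter(items=col)   # keep only a and c
excluded = df.drop(columns=col)   # keep everything except a and c
print(included.columns.tolist())  # ['a', 'c']
print(excluded.columns.tolist())  # ['b', 'd']
```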
Related
I have two dataframes,
df1:
hash a b c
ABC 1 2 3
def 5 3 4
Xyz 3 2 -1
df2:
hash v
Xyz 3
def 5
I want to make
df:
hash a b c
ABC 1 2 3 (= as is, because no matching 'ABC' in df2)
def 25 15 20 (= 5*5 3*5 4*5)
Xyz 9 6 -3 (= 3*3 2*3 -1*3)
As shown above, I want to build a dataframe whose values are df1 multiplied by df2 wherever their index (or first column) matches.
Since df2 has only one column (v), all of df1's columns except the first one (the index) should be affected.
Is there a neat, Pythonic pandas way to achieve this?
df1.set_index(['hash']).mul(df2.set_index(['hash'])) and similar attempts do not seem to work.
One approach:
df1 = df1.set_index("hash")
df2 = df2.set_index("hash")["v"]
res = df1.mul(df2, axis=0).combine_first(df1)
print(res)
Output
a b c
hash
ABC 1.0 2.0 3.0
Xyz 9.0 6.0 -3.0
def 25.0 15.0 20.0
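Reproduced end to end on the question's data: mul leaves NaN for rows that df2 does not cover, and combine_first restores those rows from df1.

```python
import pandas as pd

df1 = pd.DataFrame({'hash': ['ABC', 'def', 'Xyz'],
                    'a': [1, 5, 3], 'b': [2, 3, 2], 'c': [3, 4, -1]})
df2 = pd.DataFrame({'hash': ['Xyz', 'def'], 'v': [3, 5]})

df1 = df1.set_index('hash')
v = df2.set_index('hash')['v']
res = df1.mul(v, axis=0).combine_first(df1)  # fill unmatched rows from df1
print(res)
```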
One Method:
# We'll make this for convenience
cols = ['a', 'b', 'c']
# Merge the DataFrames, keeping everything from df
df = df1.merge(df2, 'left').fillna(1)
# We'll make the v column integers again since it's been filled.
df.v = df.v.astype(int)
# Broadcast the multiplication across axis 0
df[cols] = df[cols].mul(df.v, axis=0)
# Drop the no-longer needed column:
df = df.drop('v', axis=1)
print(df)
Output:
hash a b c
0 ABC 1 2 3
1 def 25 15 20
2 Xyz 9 6 -3
Alternative Method:
# Set indices
df1 = df1.set_index('hash')
df2 = df2.set_index('hash')
# Apply multiplication and fill values
df = (df1.mul(df2.v, axis=0)
.fillna(df1)
.astype(int)
.reset_index())
# Output:
hash a b c
0 ABC 1 2 3
1 Xyz 9 6 -3
2 def 25 15 20
The function you are looking for is actually multiply.
Here's how I have done it:
>>> df
hash a b
0 ABC 1 2
1 DEF 5 3
2 XYZ 3 -1
>>> df2
hash v
0 XYZ 4
1 ABC 8
df = df.merge(df2, on='hash', how='left').fillna(1)
>>> df
hash a b v
0 ABC 1 2 8.0
1 DEF 5 3 1.0
2 XYZ 3 -1 4.0
df[['a','b']] = df[['a','b']].multiply(df['v'], axis='index')
>>> df
hash a b v
0 ABC 8.0 16.0 8.0
1 DEF 5.0 3.0 1.0
2 XYZ 12.0 -4.0 4.0
You can actually drop v at the end if you don't need it.
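Put together as one runnable snippet, with the small sample frames from above and v dropped at the end:

```python
import pandas as pd

df = pd.DataFrame({'hash': ['ABC', 'DEF', 'XYZ'], 'a': [1, 5, 3], 'b': [2, 3, -1]})
df2 = pd.DataFrame({'hash': ['XYZ', 'ABC'], 'v': [4, 8]})

df = df.merge(df2, on='hash', how='left').fillna(1)   # unmatched rows get v = 1
df[['a', 'b']] = df[['a', 'b']].multiply(df['v'], axis='index')
df = df.drop('v', axis=1)                             # v is no longer needed
print(df)
```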
How can I extract a column from a pandas dataframe and attach it to the rows, while keeping the other columns the same?
This is my example dataset.
import pandas as pd
import numpy as np
df = pd.DataFrame({'ID': np.arange(0,5),
'sample_1' : [5,6,7,8,9],
'sample_2' : [10,11,12,13,14],
'group_id' : ["A","B","C","D","E"]})
The output I'm looking for is:
df2 = pd.DataFrame({'ID': [0, 1, 2, 3, 4, 0, 1, 2, 3, 4],
'sample_1' : [5,6,7,8,9,10,11,12,13,14],
'group_id' : ["A","B","C","D","E","A","B","C","D","E"]})
I tried slicing the dataframe and concatenating with pd.concat, but it gave NaN values.
My original dataset is large.
You could do this using stack: Set the index to the columns you don't want to modify, call stack, sort by the "sample" column, then reset your index:
df.set_index(['ID','group_id']).stack().sort_values(0).reset_index([0,1]).reset_index(drop=True)
ID group_id 0
0 0 A 5
1 1 B 6
2 2 C 7
3 3 D 8
4 4 E 9
5 0 A 10
6 1 B 11
7 2 C 12
8 3 D 13
9 4 E 14
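A slightly cleaner variant of the same stack chain, renaming the value column at the end (runnable under the question's setup):

```python
import pandas as pd
import numpy as np

df = pd.DataFrame({'ID': np.arange(0, 5),
                   'sample_1': [5, 6, 7, 8, 9],
                   'sample_2': [10, 11, 12, 13, 14],
                   'group_id': ['A', 'B', 'C', 'D', 'E']})

out = (df.set_index(['ID', 'group_id'])   # protect the columns we keep
         .stack()                         # long Series over sample_1/sample_2
         .sort_values()                   # sample_1 values first, then sample_2
         .reset_index([0, 1])             # ID and group_id back as columns
         .reset_index(drop=True)
         .rename(columns={0: 'sample_1'}))
print(out)
```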
Using pd.wide_to_long:
res = pd.wide_to_long(df, stubnames='sample_', i='ID', j='group_id')
res.index = res.index.droplevel(1)
res = res.rename(columns={'sample_': 'sample_1'}).reset_index()
print(res)
ID group_id sample_1
0 0 A 5
1 1 B 6
2 2 C 7
3 3 D 8
4 4 E 9
5 0 A 10
6 1 B 11
7 2 C 12
8 3 D 13
9 4 E 14
The function you are looking for is called melt
For example:
df2 = pd.melt(df, id_vars=['ID', 'group_id'], value_vars=['sample_1', 'sample_2'], value_name='value')
df2 = df2.drop('variable', axis=1).rename(columns={'value': 'sample_1'})
(Note: value_name must not match an existing column, so melt to a temporary name and rename afterwards.)
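End to end on the question's data, melting to a temporary value name (melt rejects a value_name that collides with an existing column) and renaming afterwards:

```python
import pandas as pd
import numpy as np

df = pd.DataFrame({'ID': np.arange(0, 5),
                   'sample_1': [5, 6, 7, 8, 9],
                   'sample_2': [10, 11, 12, 13, 14],
                   'group_id': ['A', 'B', 'C', 'D', 'E']})

df2 = (pd.melt(df, id_vars=['ID', 'group_id'],
               value_vars=['sample_1', 'sample_2'], value_name='value')
         .drop('variable', axis=1)
         .rename(columns={'value': 'sample_1'}))
print(df2)
```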
code to make test data:
import pandas as pd
import numpy as np
testdf = {'date': range(10),
'event': ['A', 'A', np.nan, 'B', 'B', 'A', 'B', np.nan, 'A', 'B'],
'id': [1] * 7 + [2] * 3}
testdf = pd.DataFrame(testdf)
print(testdf)
gives
date event id
0 0 A 1
1 1 A 1
2 2 NaN 1
3 3 B 1
4 4 B 1
5 5 A 1
6 6 B 1
7 7 NaN 2
8 8 A 2
9 9 B 2
Subset testdf:
df_sub = testdf.loc[testdf.event == 'A',:]
print(df_sub)
date event id
0 0 A 1
1 1 A 1
5 5 A 1
8 8 A 2
(Note: not re-indexed)
create conditional boolean index
bool_sliced_idx1 = df_sub.date < 4
bool_sliced_idx2 = (df_sub.date > 4) & (df_sub.date < 6)
I want to insert conditional values using this new index into the original df, like
testdf['new_column'] = np.nan
testdf.loc[bool_sliced_idx1, 'new_column'] = 'new_conditional_value'
which obviously (now) gives error:
pandas.core.indexing.IndexingError: Unalignable boolean Series key provided
bool_sliced_idx1 looks like
>>> print(bool_sliced_idx1)
0 True
1 True
5 False
8 False
Name: date, dtype: bool
I tried testdf.ix[(bool_sliced_idx1==True).index,:], but that doesn't work because
>>> (bool_sliced_idx1==True).index
Int64Index([0, 1, 5, 8], dtype='int64')
IIUC, you can just combine all of your conditions at once, instead of trying to chain them. For example, df_sub.date < 4 is really just (testdf.event == 'A') & (testdf.date < 4). So, you could do something like:
# Create the conditions.
cond1 = (testdf.event == 'A') & (testdf.date < 4)
cond2 = (testdf.event == 'A') & (testdf.date.between(4, 6, inclusive='neither'))  # inclusive=False in pandas < 1.3
# Make the assignments.
testdf.loc[cond1, 'new_col'] = 'foo'
testdf.loc[cond2, 'new_col'] = 'bar'
Which would give you:
date event id new_col
0 0 A 1 foo
1 1 A 1 foo
2 2 NaN 1 NaN
3 3 B 1 NaN
4 4 B 1 NaN
5 5 A 1 bar
6 6 B 1 NaN
7 7 NaN 2 NaN
8 8 A 2 NaN
9 9 B 2 NaN
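The two assignments can also be collapsed into a single np.select call; a sketch on the test data above:

```python
import numpy as np
import pandas as pd

testdf = pd.DataFrame({'date': range(10),
                       'event': ['A', 'A', np.nan, 'B', 'B', 'A', 'B', np.nan, 'A', 'B'],
                       'id': [1] * 7 + [2] * 3})

cond1 = (testdf.event == 'A') & (testdf.date < 4)
cond2 = (testdf.event == 'A') & (testdf.date > 4) & (testdf.date < 6)

# default=None keeps the result as an object column instead of casting NaN to the string 'nan'
testdf['new_col'] = np.select([cond1, cond2], ['foo', 'bar'], default=None)
print(testdf)
```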
This worked:
idx = np.where(bool_sliced_idx1)[0]        # positions of the True values
## or
# np.ravel(np.where(bool_sliced_idx1))
idx_original = df_sub.index[idx]           # map positions back to the original labels
testdf.loc[idx_original, :]
I am trying to add a column to a pandas dataframe, like so:
df = pd.DataFrame()
df['one'] = pd.Series({'1':4, '2':6})
print (df)
df['two'] = pd.Series({'0':4, '2':6})
print (df)
This yields:
one two
1 4 NaN
2 6 6
However, I would like the result to be:
one two
0 NaN 4
1 4 NaN
2 6 6
How do you do that?
One possibility is to use pd.concat:
ser1 = pd.Series({'1':4, '2':6})
ser2 = pd.Series({'0':4, '2':6})
df = pd.concat((ser1, ser2), axis=1)
to get
0 1
0 NaN 4
1 4 NaN
2 6 6
You can use join, telling pandas exactly how you want to do it:
df = pd.DataFrame()
df['one'] = pd.Series({'1':4, '2':6})
df.join(pd.Series({'0':4, '2':6}, name = 'two'), how = 'outer')
This results in
one two
0 NaN 4
1 4 NaN
2 6 6
This is probably easy, but I have the following data:
In data frame 1:
index dat1
0 9
1 5
In data frame 2:
index dat2
0 7
1 6
I want a data frame with the following form:
index dat1 dat2
0 9 7
1 5 6
I've tried using the append method, but I get a cross join (i.e. cartesian product).
What's the right way to do this?
It seems in general you're just looking for a join:
> dat1 = pd.DataFrame({'dat1': [9,5]})
> dat2 = pd.DataFrame({'dat2': [7,6]})
> dat1.join(dat2)
dat1 dat2
0 9 7
1 5 6
You can also use:
dat1 = pd.concat([dat1, dat2], axis=1)
Both join() and concat() can solve the problem. However, there is one caveat to mention: reset the index before you join() or concat() if either DataFrame was built by selecting rows from another DataFrame.
One example below shows some interesting behavior of join and concat:
dat1 = pd.DataFrame({'dat1': range(4)})
dat2 = pd.DataFrame({'dat2': range(4,8)})
dat1.index = [1,3,5,7]
dat2.index = [2,4,6,8]
# way1 join 2 DataFrames
print(dat1.join(dat2))
# output
dat1 dat2
1 0 NaN
3 1 NaN
5 2 NaN
7 3 NaN
# way2 concat 2 DataFrames
print(pd.concat([dat1,dat2],axis=1))
#output
dat1 dat2
1 0.0 NaN
2 NaN 4.0
3 1.0 NaN
4 NaN 5.0
5 2.0 NaN
6 NaN 6.0
7 3.0 NaN
8 NaN 7.0
#reset index
dat1 = dat1.reset_index(drop=True)
dat2 = dat2.reset_index(drop=True)
#both 2 ways to get the same result
print(dat1.join(dat2))
dat1 dat2
0 0 4
1 1 5
2 2 6
3 3 7
print(pd.concat([dat1,dat2],axis=1))
dat1 dat2
0 0 4
1 1 5
2 2 6
3 3 7
Perhaps too simple, but anyway:
dat1 = pd.DataFrame({'dat1': [9,5]})
dat2 = pd.DataFrame({'dat2': [7,6]})
dat1['dat2'] = dat2 # Uses indices from dat1
Result:
dat1 dat2
0 9 7
1 5 6
You can assign a new column. Indices are used to align the corresponding rows:
df1 = pd.DataFrame({'A': [1, 2, 3], 'B': [10, 20, 30]}, index=[0, 1, 2])
df2 = pd.DataFrame({'C': [100, 200, 300]}, index=[1, 2, 3])
df1['C'] = df2['C']
Result:
A B C
0 1 10 NaN
1 2 20 100.0
2 3 30 200.0
Ignore indices:
df1['C'] = df2['C'].reset_index(drop=True)
Result:
A B C
0 1 10 100
1 2 20 200
2 3 30 300
Just a matter of the right Google search:
data = pd.concat([dat1, dat2])  # DataFrame.append was removed in pandas 2.0
data = data.groupby(data.index).sum()