I've created my own prediction function (R = C^3 + Z^2 + 3) to predict my target variable. The problem is that I am now dealing with a plain function rather than a fitted model, so .predict from scikit-learn won't work. How can I get my predictions?
def objective(C, Z):
    return C**3 + Z**2 + 3
Here is how to apply it row-wise in pandas:
import pandas as pd

def objective(C, Z):
    return C**3 + Z**2 + 3

data = {'C': [1, 2, 3], 'Z': [4, 5, 6]}
df = pd.DataFrame(data)
df['R'] = df.apply(lambda x: objective(x.C, x.Z), axis=1)
print(df)
   C  Z   R
0  1  4  20
1  2  5  36
2  3  6  66
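Since the formula only uses arithmetic operators, a row-wise apply is not strictly needed; pandas column arithmetic is vectorized, so the same result can be computed directly on the columns:

```python
import pandas as pd

df = pd.DataFrame({'C': [1, 2, 3], 'Z': [4, 5, 6]})
# Column arithmetic is vectorized: this evaluates R = C^3 + Z^2 + 3
# for every row at once, without a Python-level loop
df['R'] = df['C']**3 + df['Z']**2 + 3
```

This is both shorter and considerably faster than apply on large frames.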
I currently have a CSV dataset with a column that is supposed to contain dates. However, the dates were stored as 5-digit numbers, and I need to convert these to date (dd/mm/yyyy) format in Python.
For example, I need to convert 22580 in the CSV to 27/10/2021. I would appreciate a solution that works on a whole dataframe.
Thanks in advance!
These appear to be days since 1 January 1960 (the SAS date epoch), so you can create a timedelta with that many days and add it to the base date:
>>> import pandas as pd
>>> from datetime import date, timedelta
>>> df = pd.DataFrame({"a":[1,2,3], 'b':[22580, 22587, 22590]})
>>> df
   a      b
0  1  22580
1  2  22587
2  3  22590
>>> df['c'] = df['b'].apply(lambda x: date(1960, 1, 1) + timedelta(days=x))
>>> df
   a      b           c
0  1  22580  2021-10-27
1  2  22587  2021-11-03
2  3  22590  2021-11-06
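The same conversion can be done without apply: pd.to_datetime accepts a day count via unit='D' together with a custom origin. A minimal sketch, assuming the values are day counts from 1960-01-01 as above:

```python
import pandas as pd

df = pd.DataFrame({"a": [1, 2, 3], "b": [22580, 22587, 22590]})
# unit='D' treats each value as a number of days; origin sets the epoch
df["c"] = pd.to_datetime(df["b"], unit="D", origin="1960-01-01")
```

This yields a proper datetime64 column, so the usual .dt accessor (e.g. df["c"].dt.strftime("%d/%m/%Y")) is available for formatting.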
I am trying to clean data in a pandas dataframe, where I need to concatenate rows in which the values in two columns (name, phone_no) match, i.e.:
What I have: [image]
Expected result: [image]
P.S. It would be much better if you could provide a sample of the dataset instead of images. Next time you can use df.to_clipboard() and paste the result as a code snippet in the question for reproducibility.
Now to the answer. You can use pandas groupby and then a custom aggregation.
First I created a dataset for the example:
import pandas as pd

df = pd.DataFrame({"A": ["a", "b", "a", "b", "c"],
                   "B": list(map(str, range(5))),
                   "C": list(map(str, range(5, 10)))})
It looks as follows:
   A  B  C
0  a  0  5
1  b  1  6
2  a  2  7
3  b  3  8
4  c  4  9
Then you can concatenate rows with matching keys (in your case the keys are name and phone_no):
gdf = df.groupby("A").agg({
    "B": ",".join,
    "C": ",".join,
})
print(gdf)
And the results are as follows (note that the grouping key A becomes the index):
     B    C
A
a  0,2  5,7
b  1,3  6,8
c    4    9
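Applied to the question's own columns, the same pattern groups on both name and phone_no at once; as_index=False keeps the keys as regular columns. A sketch with made-up data (the email column here is a hypothetical stand-in for whatever columns need concatenating):

```python
import pandas as pd

# Hypothetical data shaped like the question: two rows share (name, phone_no)
df = pd.DataFrame({
    "name": ["Ann", "Ann", "Bob"],
    "phone_no": ["111", "111", "222"],
    "email": ["a@x.com", "a@y.com", "b@x.com"],
})
# Rows with the same (name, phone_no) pair collapse into one,
# with their email values joined into a single string
gdf = df.groupby(["name", "phone_no"], as_index=False).agg({"email": ", ".join})
```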
Hi guys, I want to change the index of my dataframe based on the result of str.contains.
A quick example:
import pandas as pd

df = pd.DataFrame({'Product': ['Banana_1', 'Banana_2', 'Orange_a', 'Orange_b'],
                   'Value': [5.10, 5.00, 2.10, 2.00]})
df2 = df[df['Product'].str.contains('Banana')]
print(df2)
Is there a way to use the df2 filter to change the index of df?
Thanks
You can control the index based on cell values like the following, which is (I think) along the lines of what you want:
In [28]: df.index = [i if 'Banana' in df.iloc[i, 0] else i + len(df) for i in range(len(df))]

In [29]: df
Out[29]:
    Product  Value
0  Banana_1    5.1
1  Banana_2    5.0
6  Orange_a    2.1
7  Orange_b    2.0
This is what you need:
In [1230]: index_list = df2.index.tolist()

In [1236]: index_map = {}

In [1237]: for i in index_list:
      ...:     index_map[i] = 'myindex'
      ...:

In [1250]: df.rename(index=index_map, inplace=True)

In [1251]: df
Out[1251]:
          Product  Value
myindex  Banana_1    5.1
myindex  Banana_2    5.0
2        Orange_a    2.1
3        Orange_b    2.0
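As a vectorized alternative to the list comprehension, the boolean mask from str.contains can drive the re-indexing directly via np.where. A sketch of the same "shift non-matches by len(df)" idea:

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({'Product': ['Banana_1', 'Banana_2', 'Orange_a', 'Orange_b'],
                   'Value': [5.10, 5.00, 2.10, 2.00]})
mask = df['Product'].str.contains('Banana')
# Keep the original position for matching rows; shift the rest past len(df)
df.index = np.where(mask, df.index, df.index + len(df))
```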
I have a csv file below:
name,apply,percent
A,101,98%
B,388,79%
C,637,88%
D,541,75%
E,345,98%
A,446,85%
D,211,49%
I tried to split the dataframe into multiple dataframes named df_A, df_B, df_C, df_D and df_E:
for name in df.groupby('name'):
    locals()['df_'.name] = df[(df.name == name)]
print(df_A)
It doesn't work. How to fix the code? Many thanks.
You can try this,
>>> import pandas as pd
>>> df = pd.read_csv('a.csv')
>>> for name in df['name'].unique():
...     locals()['df_' + name] = df[(df.name == name)]
...
>>> df_A
  name  apply percent
0    A    101     98%
5    A    446     85%
>>> df_B
  name  apply percent
1    B    388     79%
>>> df_C
  name  apply percent
2    C    637     88%
>>> df_D
  name  apply percent
3    D    541     75%
6    D    211     49%
>>> df_E
  name  apply percent
4    E    345     98%
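Writing into locals() is fragile (it is not guaranteed to create variables inside a function); the usual alternative is a dictionary of dataframes keyed by name, built straight from the groupby iterator. A sketch using the question's CSV inlined via StringIO:

```python
import io
import pandas as pd

csv = """name,apply,percent
A,101,98%
B,388,79%
C,637,88%
D,541,75%
E,345,98%
A,446,85%
D,211,49%
"""
df = pd.read_csv(io.StringIO(csv))
# One sub-DataFrame per name, accessed as dfs['A'], dfs['B'], ...
dfs = {name: group for name, group in df.groupby('name')}
```

dfs['A'] then holds exactly the rows where name == 'A', with their original index.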
So I'm trying to split a pandas DataFrame into two separate dataframes by a single binary variable. groupby seems a decent option, except it returns a GroupBy object rather than a dataframe, and I can't access any values from within it. I ran a simple df.groupby('Type') statement and would like to partition the data from there, i.e. output the two groups to two new dataframes. Any help would be sincerely appreciated. (The last question I posted was met with childish admonitions not to post homework questions. Needless to say, neither that one nor this one is homework, so please spare me such remarks.) As always, thanks so much.
If you use groupby, you can iterate through the groups as follows:
g = df.groupby('class')
for k, v in g.groups.items():
    print(k)          # the group key, e.g. 'a'
    print(df.loc[v])  # the group's rows; v holds the index labels for the group
    print()
Output:
a
  class     data1       data2
0     a -0.173070  141.437719
2     a -0.087673  200.815709
6     a  1.220608  159.456053
8     a  0.428373   -6.491034
9     a -0.123463  -96.898025

c
  class     data1       data2
5     c -0.358996  162.715982
7     c -1.339496   23.043417

b
  class     data1       data2
1     b -1.761652  -12.405066
3     b  1.366879   22.988654
4     b  1.125314   60.489373
Note: on older Python versions, dict iteration order is not guaranteed, so the groups may not come out in the order shown.
How's this?
import numpy as np
import pandas as pd

np.random.seed(0)
df = pd.DataFrame({'class': np.random.choice(list('abc'), size=10),
                   'data1': np.random.randn(10),
                   'data2': np.random.randn(10) * 100})

df_a = df[df['class'] == 'a']
df_b = df[df['class'] == 'b']
df_c = df[df['class'] == 'c']

print(df, '\n')
print(df_a)
print(df_b)
print(df_c)
Gives:
  class     data1       data2
0     a -0.173070  141.437719
1     b -1.761652  -12.405066
2     a -0.087673  200.815709
3     b  1.366879   22.988654
4     b  1.125314   60.489373
5     c -0.358996  162.715982
6     a  1.220608  159.456053
7     c -1.339496   23.043417
8     a  0.428373   -6.491034
9     a -0.123463  -96.898025

  class     data1       data2
0     a -0.173070  141.437719
2     a -0.087673  200.815709
6     a  1.220608  159.456053
8     a  0.428373   -6.491034
9     a -0.123463  -96.898025

  class     data1       data2
1     b -1.761652  -12.405066
3     b  1.366879   22.988654
4     b  1.125314   60.489373

  class     data1       data2
5     c -0.358996  162.715982
7     c -1.339496   23.043417
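If you already have the GroupBy object, get_group pulls a single group out as a plain DataFrame without any boolean filtering. A minimal sketch with a small made-up frame using the same 'class' column:

```python
import pandas as pd

df = pd.DataFrame({'class': ['a', 'b', 'a', 'c'],
                   'data1': [1.0, 2.0, 3.0, 4.0]})
g = df.groupby('class')
# Extract one group as a regular DataFrame
df_a = g.get_group('a')
```

This is convenient when you only need one or two of the groups and want to keep the groupby object around.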