Closed. This question needs to be more focused. It is not currently accepting answers.
Closed 3 years ago.
I have a pandas DataFrame and want to do a calculation based on an existing column.
However, the apply function is not working for some reason.
It's something like this:
df = pd.DataFrame({'Age': age, 'Input': input})
and the Input column is something like [1.10001, 1.49999, 1.60001].
Now I want to add a new column to the DataFrame that does the following:
Add 0.0001 to each element in the column
Multiply each value by 10
Transform each value of the new column to int
Use Series.add, Series.mul and Series.astype:
import pandas as pd

# input is a Python builtin, so better not to use it as a variable name
inp = [1.10001, 1.49999, 1.60001]
age = [10, 20, 30]
df = pd.DataFrame({'Age': age, 'Input': inp})
# add 0.0001, multiply by 10, then truncate to int
df['new'] = df['Input'].add(0.0001).mul(10).astype(int)
print(df)
   Age    Input  new
0   10  1.10001   11
1   20  1.49999   15
2   30  1.60001   16
You could make a simple function and then apply it by row.
def f(row):
    # the column is named 'Input' (capital I); the original had an extra ')'
    return int((row['Input'] + 0.0001) * 10)

df['new'] = df.apply(f, axis=1)
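A sketch of the same function applied per value instead of per row, reusing the question's sample data: `Series.apply` on the `Input` column calls the function once per element, so no row object has to be built for each row as with `axis=1`. The column names `new_apply` and `new_vec` are illustrative only; the second one repeats the vectorized chain from the other answer so the two results can be compared.

```python
import pandas as pd

inp = [1.10001, 1.49999, 1.60001]
age = [10, 20, 30]
df = pd.DataFrame({'Age': age, 'Input': inp})

def f(value):
    # Same arithmetic as the row-wise version, applied to a single value
    return int((value + 0.0001) * 10)

# Series.apply passes one value at a time, no axis=1 needed
df['new_apply'] = df['Input'].apply(f)

# Vectorized chain (add, mul, astype) for comparison
df['new_vec'] = df['Input'].add(0.0001).mul(10).astype(int)
print(df)
```

For large frames the vectorized chain is usually the faster of the two, since it avoids calling a Python function per element.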
Closed. This question needs details or clarity. It is not currently accepting answers.
Closed 2 years ago.
I have a dictionary with 3 DataFrames: {0: DataFrame, 1: DataFrame, 2: DataFrame}.
Each DataFrame has the same size: 6 columns and 25 rows.
I'd like to sum the 'Income' column of each DataFrame and put the sums in a list,
like this:
list_of_sums = [Sum of income DF0, Sum of income DF1, Sum of income DF2]
Try this:
# input_dict is your dict of DataFrames, e.g. {0: df0, 1: df1, 2: df2}
list_of_sums = []
for key in input_dict:
    # sum the 'Income' column and append the result to the output list
    list_of_sums.append(input_dict[key]['Income'].sum())
print(list_of_sums)
Not the most optimized approach, but this works:
import pandas as pd

d, dd, ddd = pd.DataFrame(), pd.DataFrame(), pd.DataFrame()
d['Income'] = [3, 4, 5, 6]
dd['Income'] = [30, 40, 50, 60]
ddd['Income'] = [300, 400, 500, 600]
dfs = {0: d, 1: dd, 2: ddd}
# print the sum of the 'Income' column of each DataFrame
for dc in dfs.values():
    print(sum([s for s in dc['Income']]))
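Since the question starts from a dict, a list comprehension over its keys can build `list_of_sums` in one step. This sketch uses hypothetical DataFrames holding the same Income values as the example above; sorting the keys makes the order of the sums follow 0, 1, 2 explicitly, and `int()` turns the NumPy scalars into plain Python ints.

```python
import pandas as pd

# Hypothetical stand-in for the question's dict of three DataFrames
input_dict = {
    0: pd.DataFrame({'Income': [3, 4, 5, 6]}),
    1: pd.DataFrame({'Income': [30, 40, 50, 60]}),
    2: pd.DataFrame({'Income': [300, 400, 500, 600]}),
}

# Sum each 'Income' column with Series.sum, in key order
list_of_sums = [int(input_dict[k]['Income'].sum()) for k in sorted(input_dict)]
print(list_of_sums)
```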
Closed. This question is opinion-based. It is not currently accepting answers.
Closed 2 years ago.
I have a DataFrame in which each row shows one transaction, i.e. the items purchased together. Here is what my DataFrame looks like:
items
['A','B','C']
['A','C']
['C','F']
...
I need to create a dictionary that shows how many times items have been purchased together, something like this:
{'A': [('B', 1), ('C', 5)], 'B': [('A', 1), ('C', 6)], ...}
Right now, I define a variable freq, loop through my DataFrame, and update the dictionary freq, but it takes a very long time.
What's the efficient way of calculating this without looping through the DataFrame?
You can speed this up with sklearn's MultiLabelBinarizer:
from sklearn.preprocessing import MultiLabelBinarizer
Transform your data using:
mlb = MultiLabelBinarizer()
df = pd.DataFrame(mlb.fit_transform(df['items']),
columns=mlb.classes_,
index=df.index)
to get it in the following format:
   A  B  C  F
0  1  1  1  0
1  1  0  1  0
2  0  0  1  1
Then you can define a trivial function like:
get_num_coexisting = lambda x, y: (df[x] & df[y]).sum()
And use it like so:
get_num_coexisting('A', 'C')
# 2
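To build the whole co-occurrence dictionary the question asks for, one option (a sketch, not tested against the asker's real data) is to multiply the one-hot matrix by its own transpose: entry (x, y) of `onehot.T @ onehot` counts the transactions that contain both x and y. To stay free of sklearn, this sketch one-hot encodes with pandas' `Series.str.join` and `Series.str.get_dummies` instead of `MultiLabelBinarizer`, and it assumes item names never contain the `'|'` separator. The counts come from the three sample rows, so they differ from the question's illustrative numbers.

```python
import pandas as pd

# Sample transactions from the question
df = pd.DataFrame({'items': [['A', 'B', 'C'], ['A', 'C'], ['C', 'F']]})

# One-hot encode: join each list into 'A|B|C', then split into 0/1 columns
onehot = df['items'].str.join('|').str.get_dummies(sep='|')

# Item-by-item matrix: co.loc[x, y] = number of transactions containing x and y
co = onehot.T.dot(onehot)

# Reshape into {item: [(other_item, count), ...]}, skipping the diagonal
# (an item paired with itself) and pairs that never occur together
freq = {
    item: [(other, int(n)) for other, n in co[item].items() if other != item and n > 0]
    for item in co.columns
}
print(freq)
```

The diagonal of `co` holds how many transactions contain each item, which can be handy on its own.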
Closed. This question needs details or clarity. It is not currently accepting answers.
Closed 3 years ago.
The user inputs two lists: one of sizes and one of minutes. For example, they can input sizes 111, 121 and minutes 5, 10, 15.
I want the DataFrame to have columns named by each size and minute (I use a for loop to extract each size and minute). For example, I want the columns to be 111,5; 111,10; 111,15, etc. I tried df[size+minute] = values (values is the data I want to put into each column), but because size and minute are numbers, the column name is their sum, so I got 116 instead of 111,5.
If you have two lists:
l = [111,121]
l2 = [5,10,15]
Then you can use a list comprehension to form your column names:
col_names = [str(x)+';'+str(y) for x in l for y in l2]
print(col_names)
['111;5', '111;10', '111;15', '121;5', '121;10', '121;15']
And create a dataframe with these column names using pandas.DataFrame:
df = pd.DataFrame(columns=col_names)
If we add a row of data:
row = pd.DataFrame([[1, 2, 3, 4, 5, 6]], columns=col_names)
# DataFrame.append was removed in pandas 2.0; use pd.concat instead
df = pd.concat([df, row], ignore_index=True)
We can see that our DataFrame looks like this:
print(df)
  111;5 111;10 111;15 121;5 121;10 121;15
0     1      2      3     4      5      6
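A sketch of two ways to apply this to the loop from the question: format a string key with an f-string instead of adding the numbers, or keep size and minute as separate labels in a two-level column index built with `pd.MultiIndex.from_product`. The placeholder value 0 and the names `sizes`, `minutes` and `df2` are assumptions standing in for the asker's data.

```python
import pandas as pd

sizes = [111, 121]
minutes = [5, 10, 15]

# Option 1: fix the loop by building a string key
# (size + minute is integer addition: 111 + 5 == 116)
df = pd.DataFrame(index=[0])
for size in sizes:
    for minute in minutes:
        df[f"{size},{minute}"] = 0  # hypothetical placeholder for `values`

# Option 2: a two-level column index (size, minute)
cols = pd.MultiIndex.from_product([sizes, minutes], names=['size', 'minute'])
df2 = pd.DataFrame([[1, 2, 3, 4, 5, 6]], columns=cols)
print(df.columns.tolist())
print(df2[111])  # the three minute columns for size 111
```

The MultiIndex version makes it easy to select all columns for one size without parsing strings.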
Closed. This question needs to be more focused. It is not currently accepting answers.
Closed 2 years ago.
My csv file looks like this:
5783,145v
g656,4589,3243,tt56
6579
How do I read this with pandas (or otherwise)?
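(The table should contain empty cells.)
You could pass a dummy separator so that each line is read into a single column, and then split it on "," with str.split(",", expand=True):
import pandas as pd
# sep=" " works as a dummy separator here because the lines contain no spaces
df = pd.read_csv('path/to/file.csv', sep=" ", header=None)
df = df[0].str.split(",", expand=True).fillna("")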
print(df)
Output
      0     1     2     3
0  5783  145v
1  g656  4589  3243  tt56
2  6579
I think that the solution proposed by @researchnewbie is good. If you need to replace the NaN values with, say, zero, you could add this line after the read:
dataFrame.fillna(0, inplace=True)
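Try doing the following:
import pandas as pd
# header=None: the file has no header row; names: declare the widest row's
# column count, otherwise the longer second line raises a parser error
dataFrame = pd.read_csv(filename, header=None, names=list(range(4)))
Your empty cells should contain the NaN value, which is essentially null.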
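If the widest row isn't known in advance, one sketch is to scan the data once with Python's `csv` module to find the maximum field count, then pass that many column names to `read_csv`. Without `names`, pandas takes the column count from the first line and raises a parser error on the longer second line. The sample is read from an in-memory string for illustration (swap in the file path or an open file for real use); `dtype=str` is an added assumption that keeps 3243 from becoming the float 3243.0 because of the missing cells in its column.

```python
import csv
import io

import pandas as pd

text = "5783,145v\ng656,4589,3243,tt56\n6579\n"

# First pass: find the widest row so we know how many columns to declare
width = max(len(row) for row in csv.reader(io.StringIO(text)))

# Second pass: shorter rows are padded with NaN up to `width` columns
df = pd.read_csv(io.StringIO(text), header=None,
                 names=list(range(width)), dtype=str)
print(df.fillna(""))
```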
Closed. This question needs details or clarity. It is not currently accepting answers.
Closed 4 years ago.
I have a DataFrame df with 3 columns:
df=pd.DataFrame({
'User':['A','A','B','A','C','B','C'],
'Values':['x','y','z','p','q','r','s'],
'Date':[14,11,14,12,13,10,14]
})
I want to create a new DataFrame that contains, for each user, the row with the highest value in the 'Date' column. For the DataFrame above, the desired result was posted as a JPEG image.
Can anyone help me with this problem?
This answer assumes that each user has a single row with the maximum Date (if there are ties, all tied rows are returned):
In [10]: def get_max(group):
    ...:     return group[group.Date == group.Date.max()]
    ...:

In [12]: df.groupby('User').apply(get_max).reset_index(drop=True)
Out[12]:
   Date User Values
0    14    A      x
1    14    B      z
2    14    C      s
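Two common alternatives that avoid `groupby.apply`, sketched on the question's data: `idxmax` picks exactly one row per user (the first maximum if there are ties), while comparing each Date against `transform('max')` keeps every tied row, which matches what `get_max` returns. The names `latest` and `ties` are illustrative.

```python
import pandas as pd

df = pd.DataFrame({
    'User': ['A', 'A', 'B', 'A', 'C', 'B', 'C'],
    'Values': ['x', 'y', 'z', 'p', 'q', 'r', 's'],
    'Date': [14, 11, 14, 12, 13, 10, 14],
})

# One row per user: idxmax gives the index label of each group's max Date
latest = df.loc[df.groupby('User')['Date'].idxmax()].reset_index(drop=True)

# All tied rows: keep rows whose Date equals their user's maximum
ties = df[df['Date'] == df.groupby('User')['Date'].transform('max')]
print(latest)
```

The `transform` version also preserves the original row index, which can help when joining the result back to other data.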