I have a DataFrame with 100 columns (however I provide only three columns here) and I want to build a new DataFrame with two columns. Here is the DataFrame:
import pandas as pd
df = pd.DataFrame()
df ['id'] = [1,2,3]
df ['c1'] = [1,5,1]
df ['c2'] = [-1,6,5]
df
I want to stick the values of all columns for each id and put them in one columns. For example, for id=1 I want to stick 2, 3 in one column. Here is the DataFrame that I want.
Note: df.melt does not solve my question. Since I want to have the ids also.
Note2: I already use the stack and reset_index, and it can not help.
df = df.stack().reset_index()
df.columns = ['id','c']
df
You could first set_index with "id"; then stack + reset_index:
out = (df.set_index('id').stack()
.droplevel(1).reset_index(name='c'))
Output:
id c
0 1 1
1 1 -1
2 2 5
3 2 6
4 3 1
5 3 5
I got two DataFrame and want remove rows in df1 where we have same value in column 'a' in df2. Moreover one common value in df2 will only remove one row.
df1 = pd.DataFrame({'a':[1,1,2,3,4,4],'b':[1,2,3,4,5,6],'c':[6,5,4,3,2,1]})
df2 = pd.DataFrame({'a':[2,4,2],'b':[1,2,3],'c':[6,5,4]})
result = pd.DataFrame({'a':[1,1,3,4],'b':[1,2,4,6],'c':[6,5,3,1]})
Use Series.isin + Series.duplicated to create a boolean mask and use this mask to filter the rows from df1:
m = df1['a'].isin(df2['a']) & ~df1['a'].duplicated()
df = df1[~m]
Result:
print(df)
a b c
0 1 1 6
1 1 2 5
3 3 4 3
5 4 6 1
Try This:
import pandas as pd
df1=pd.DataFrame({'a':[1,1,2,3,4,4],'b':[1,2,3,4,5,6],'c':[6,5,4,3,2,1]})
df2=pd.DataFrame({'a':[2,4,2],'b':[1,2,3],'c':[6,5,4]})
df2a = df2['a'].tolist()
def remove_df2_dup(x):
if x in df2a:
df2a.remove(x)
return False
return True
df1[df1.a.apply(remove_df2_dup)]
It creates a list from df2['a'], then checks that list against each value of df1['a'], removing values from the list each time there's a match in df1
try this
df1=pd.DataFrame({'a':[1,1,2,3,4,4],'b':[1,2,3,4,5,6],'c':[6,5,4,3,2,1]})
df2=pd.DataFrame({'a':[2,4,2],'b':[1,2,3],'c':[6,5,4]})
for x in df2.a:
if x in df1.a:
df1.drop(df1[df1.a==x].index[0], inplace=True)
print(df1)
i have a dataframe look like this:
enter image description here
how can i get the column index base on the cell value exist .xml which is 4 in my dataframe ?
Here is my code:
df = fileserver
for index in df:
df1 = df[index].str.contains(".xml")
print(df1)
for the result True, i don't know how to find the index, please help and thank you everyone
You may track which column does indeed contain a pattern .xml using the following code. For the example below, the column of interest has index 2.
First, create a sample dataframe for your example.
import pandas as pd
>>> df = pd.DataFrame([[1,2,"3.xml"], [4,5,"6.xml"]])
>>> df
0 1 2
0 1 2 3.xml
1 4 5 6.xml
Now, check for your pattern among all cells of the dataframe.
>>> df_filtered = df.applymap(lambda cell: str(cell).endswith(".xml"))
>>> df_filtered
0 1 2
0 False False True
1 False False True
Finally, keep columns if at least one cell was found with the desired pattern.
>>> [column for column, count in df_filtered.sum().to_dict().items()
if count > 0]
[2]
I have two dataframe like this:
df1 = pd.DataFrame({'a':[1,2]})
df2 = pd.DataFrame({'a':[1,1,1,2,2,3,4,5,6,7,8]})
I want to count the two numbers of df1 separately in df2, the correct answer like:
No Amount
1 3
2 2
Instead of:
No Amount
1 5
2 5
How can I solve this problem?
First filter df2 for values that are contained in df1['a'], then apply value_counts. The rest of the code just presents the data in your desired format.
result = (
df2[df2['a'].isin(df1['a'].unique())]['a']
.value_counts()
.reset_index()
)
result.columns = ['No', 'Amount']
>>> result
No Amount
0 1 3
1 2 2
In pandas 0.21.0 you can use set_axis to rename columns as chained method. Here's a one line solution:
df2[df2.a.isin(df1.a)]\
.squeeze()\
.value_counts()\
.reset_index()\
.set_axis(['No','Amount'], axis=1, inplace=False)
Output:
No Amount
0 1 3
1 2 2
You can simply find value_counts of second df and map that with first df i.e
df1['Amount'] = df1['a'].map(df2['a'].value_counts())
df1 = df1.rename(columns={'a':'No'})
Output :
No Amount
0 1 3
1 2 2
I have two df. One maps values to IDs. The other one has multiple entries of these IDs. I want to have a df with the first dataframe with the values assigned to the respective IDs.
df1 =
Val1 Val2 Val3
x 1000 2 0
y 2000 3 9
z 3000 1 8
df2=
foo ID bar
0 something y a
1 nothing y b
2 everything x c
3 who z d
result=
foo ID bar Val1 Val2 Val3
0 something y a 2000 3 9
1 nothing y b 2000 3 9
2 everything x c 1000 2 0
3 who z d 3000 1 8
I've tried merge and join (obviously incorrectly) but I am getting a bunch of NaNs when I do that. It appears that I am getting NaNs on every alternate ID.
I have also tried indexing both DFs by ID but that didn't seem to help either. I am obviously missing something that I am guessing is a core functionality but I can't get my head around it.
merge and join could both get you the result DataFrame you want. Since one of your DataFrames is indexed (by ID) and the other has just a integer index, merge is the logical choice.
Merge:
# use ID as the column to join on in df2 and the index of df1
result = df2.merge(df1, left_on="ID", right_index=True, how="inner")
Join:
df2.set_index("ID", inplace=True) # index df2 in place so you can use join, which merges by index by default
result = df2.join(df1, how="inner") # join df1 by index