Difference in Pandas between df['myColumn'] and df.myColumn [duplicate]

This question already has answers here:
Pandas column creation
(3 answers)
Accessing Pandas column using squared brackets vs using a dot (like an attribute)
(5 answers)
pandas dataframe where clause with dot versus brackets column selection
(1 answer)
Closed 5 years ago.
I just thought I added a column to a pandas dataframe with
df.NewColumn = np.log1p(df.ExistingColumn)
but when I looked, it wasn't there! No error was raised either. I executed it many times, not believing what I was seeing, but it still wasn't there. So then I did this:
df['NewColumn'] = np.log1p(df.ExistingColumn)
and now the new column was there.
Does anyone know the reason for this confusing behaviour? I thought those two ways of operating on a column were equivalent.
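A minimal sketch of the difference, assuming a toy dataframe (the values here are made up for illustration). Attribute assignment to a name that is not already a column stores a plain Python attribute on the DataFrame object, while bracket assignment creates a column. Recent pandas versions emit a UserWarning for the attribute form; it is silenced here only to keep the output clean.

```python
import warnings

import numpy as np
import pandas as pd

# Hypothetical data, only for illustration.
df = pd.DataFrame({"ExistingColumn": [0.0, 1.0, 2.0]})

# Attribute assignment: sets an attribute on the object, not a column.
with warnings.catch_warnings():
    warnings.simplefilter("ignore", UserWarning)
    df.NewColumn = np.log1p(df.ExistingColumn)

attr_is_column = "NewColumn" in df.columns

# Bracket assignment: creates a real column.
df["NewColumn2"] = np.log1p(df.ExistingColumn)
bracket_is_column = "NewColumn2" in df.columns

print(attr_is_column, bracket_is_column)
```

Dot access is equivalent to bracket access only for reading or overwriting a column that already exists; creating a column needs the bracket form (or a method such as assign).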

Related

How do I get a count on the affected rows from an operation on a Pandas dataframe? [duplicate]

This question already has answers here:
Python Pandas Counting the Occurrences of a Specific value
(8 answers)
Closed 11 months ago.
Given the following data set, loaded into a Pandas DataFrame
BARCODE
ALTERNATE_BARCODE
123
456
789
Imagine I have the following pandas statement:
users.loc[users["BARCODE"] == "", "BARCODE"] = users["ALTERNATE_BARCODE"]
Is there any way - without rewriting this terse statement too much - that would allow me to access the number of rows in the DataFrame that got affected?
Edit: I am mainly looking for a library, or something built into Pandas, that has knowledge of the last operation and could provide me with some metadata about it. Computing deltas is a good workaround, but not what I am after, since it would clutter the code.
Before replacing the values, take the length of the selection returned by the same .loc expression:
len(users.loc[users["BARCODE"] == "", "BARCODE"].index)
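A variation on the same idea, as a sketch: store the boolean mask once, count its True values, and reuse it for the assignment. The sample rows below are an assumption, since the layout of the table in the question is not fully recoverable.

```python
import pandas as pd

# Hypothetical rows; empty strings mark missing barcodes.
users = pd.DataFrame({
    "BARCODE": ["123", "", ""],
    "ALTERNATE_BARCODE": ["", "456", "789"],
})

# Build the condition once so it can be counted and reused.
mask = users["BARCODE"] == ""
affected = int(mask.sum())  # number of rows the assignment will touch

users.loc[mask, "BARCODE"] = users["ALTERNATE_BARCODE"]

print(affected)
```

This counts the rows selected by the condition, which is the number of rows the statement writes to. It does not come from pandas tracking the last operation.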

Can pandas join tables if the common headers are different names? [duplicate]

This question already has answers here:
Joining pandas DataFrames by Column names
(3 answers)
Pandas Merging 101
(8 answers)
Closed last year.
I am following this article, but I could only get the merge to work when both tables had a matching column title. Both tables contain computer names, but the column is titled differently in each. How could I modify my command so that it still references the same column? Is that possible?
lj_df2 = pd.merge(d2, d3, on="PrimaryUser", how="left")
For example, I have this, but in my other CSV the column is called Employee #, not PrimaryUser.
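One way this could look, as a sketch: pd.merge accepts left_on and right_on for key columns that are named differently in each table. The contents of d2 and d3 below are invented, and it is an assumption that PrimaryUser in one file corresponds to Employee # in the other.

```python
import pandas as pd

# Hypothetical tables standing in for the two CSV files.
d2 = pd.DataFrame({"PrimaryUser": ["alice", "bob"],
                   "Computer": ["PC-1", "PC-2"]})
d3 = pd.DataFrame({"Employee #": ["alice", "carol"],
                   "Department": ["IT", "HR"]})

# Name the key column of each side separately.
lj_df2 = pd.merge(d2, d3, left_on="PrimaryUser", right_on="Employee #", how="left")

print(lj_df2)
```

Both key columns are kept in the result. Alternatively, one of them can be renamed with DataFrame.rename before merging so that on= works.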

How do you remove every second row in a pandas dataframe? [duplicate]

This question already has answers here:
pandas read_csv remove blank rows
(4 answers)
Closed 1 year ago.
I’ve read a file into a dataframe, and every second row is n/a. How do I remove the offending blank rows?
I assume there are many ways to do this, but I just use iloc to keep every second row, starting from the first:
df = df.iloc[::2,:]
Try it and let me know if it worked for you.
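A short sketch comparing the positional slice with dropping rows by content, on made-up data. The iloc version relies on the blank rows sitting exactly at the odd positions; dropna(how="all") removes a row only when every value in it is missing, so it does not depend on position.

```python
import numpy as np
import pandas as pd

# Hypothetical frame where every second row is entirely missing.
df = pd.DataFrame({
    "a": [1.0, np.nan, 2.0, np.nan],
    "b": ["x", np.nan, "y", np.nan],
})

by_position = df.iloc[::2, :]        # keep rows 0, 2, 4, ...
by_content = df.dropna(how="all")    # drop rows where all values are NaN

print(by_position.index.tolist(), by_content.index.tolist())
```

If the blank rows might not alternate perfectly, the dropna form is the safer of the two.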

Avoid 'SettingWithCopyWarning:' when setting values with loc and conditions (inside a function) [duplicate]

This question already has answers here:
How to deal with SettingWithCopyWarning in Pandas
(20 answers)
Closed 2 years ago.
I'm trying to set some values in one column (col_2) coming from another column (col_3) when values in col_1 fulfill a condition.
I'm doing that with the .loc operator because, after researching the alternatives, it appears to be the best practice.
data.loc[data.col_1.isin(['A','B']),col_2]=data[col_3]
Nevertheless, this still triggers the 'SettingWithCopyWarning:'.
Is there any other way to do this operation without raising that warning?
Following Cameron's advice: the line of code was part of a function, to which data was passed as an argument.
Starting the function with data = data.copy() solves the issue.
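A sketch of that fix, assuming the function receives a filtered slice of a larger dataframe. The function name, the default column names, and the data are all hypothetical. Copying first means the .loc assignment works on an independent dataframe, so the caller's data is left untouched.

```python
import pandas as pd

def fill_col_2(data, col_2="col_2", col_3="col_3"):
    # Work on an independent copy so the assignment cannot hit a slice
    # of the caller's dataframe.
    data = data.copy()
    data.loc[data["col_1"].isin(["A", "B"]), col_2] = data[col_3]
    return data

# Hypothetical input: a filtered subset of a larger frame.
full = pd.DataFrame({
    "col_1": ["A", "B", "C", "A"],
    "col_2": [0, 0, 0, 0],
    "col_3": [10, 20, 30, 40],
})
subset = full[full["col_1"] != "B"]

result = fill_col_2(subset)
print(result)
```

The trade-off is the extra memory for the copy, and the caller has to use the returned dataframe rather than expect in-place changes.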

Assigning dataframe to dataframe in Pandas Python [duplicate]

This question already has answers here:
Copy Pandas DataFrame using '=' trick [duplicate]
(2 answers)
Closed 4 years ago.
When I assign one dataframe to another, making changes to one dataframe also affects the other.
Code:
interest_margin_data = initial_margin_data
interest_margin_data['spanReq'] = (interest_margin_data['spanReq']*interest_margin_data['currency'].map(interestrate_dict))/(360*100*interest_margin_data['currency'].map(currency_dict))
initial_margin_data['spanReq'] /= initial_margin_data['currency'].map(currency_dict)
The second line changes the values in initial_margin_data as well.
Why is this so? How can I avoid it?
Use .copy to create a separate dataframe in memory:
interest_margin_data = initial_margin_data.copy()
.copy() creates a different object in memory, whereas plain assignment just makes a second name point to the same object.
Assignment works this way so that referring to a dataframe under another name does not require substantial extra memory; operations through either name index and calculate on the same source data.
In your case however you do not want this.
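A minimal sketch of the difference, with a made-up spanReq column and hypothetical variable names: after plain assignment both names refer to one object, while .copy() yields an independent dataframe.

```python
import pandas as pd

# Hypothetical data standing in for initial_margin_data.
initial = pd.DataFrame({"spanReq": [100.0, 200.0]})

alias = initial               # second name for the same object
independent = initial.copy()  # separate object with its own data

# Changing the column through the alias also changes `initial`.
alias["spanReq"] = alias["spanReq"] / 2

same_object = alias is initial
print(same_object)
print(initial["spanReq"].tolist(), independent["spanReq"].tolist())
```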
