Can I split the value in a pandas row for searching? - python

I'm searching a DataFrame using:
df.loc[df['Ticker'] == 'ibm']
The problem is that df['Ticker'] is formatted with another value after it (for example 'ibm US').
Normally for a string I can do something like .split(" ")[0] to find the match, but that doesn't work in my pandas search above (df.loc[df['Ticker'].split(" ")[0] == 'ibm'] fails with AttributeError: 'Series' object has no attribute 'split').
What can I do to achieve my goal?

Are you looking for str.contains?:
new_df = df[df['Ticker'].str.contains(r'ibm', case=False)]
which will create a new dataframe from the rows whose 'Ticker' column contains 'ibm'.
You can also use regex alternation (|) and case=False (case insensitive) in str.contains:
new_df = df[df['Ticker'].str.contains(r'ibm|msft|google|..', case=False)]
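If you want to literally split on whitespace and compare the first token, the .str accessor also exposes split on a Series. A minimal sketch with made-up data, assuming values look like 'ibm US':
import pandas as pd

df = pd.DataFrame({'Ticker': ['ibm US', 'msft US', 'ibm LN']})

# split each value on spaces and compare the first token
matches = df.loc[df['Ticker'].str.split(' ').str[0] == 'ibm']
print(matches)
#    Ticker
# 0  ibm US
# 2  ibm LN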

Related

Python pandas lower data AttributeError: 'Series' object has no attribute 'lower'

I want to lowercase data taken from a pandas column, trim all spaces, and then look for an equality.
df['ColumnA'].loc[lambda x: x.lower().replace(" ", "") == var_name]
Code is above.
It says a pandas Series has no lower method. But I need to search for data inside column A via pandas while lowercasing all letters and trimming the whitespace.
Any other idea how I can achieve this in pandas?
In your lambda function, x is a Series, not a string, so you have to use the str accessor (note that replace needs it too, otherwise Series.replace does whole-value replacement rather than substring replacement):
df['ColumnA'].loc[lambda x: x.str.lower().str.replace(" ", "") == var_name]
Another way:
df.loc[df['ColumnA'].str.lower().str.replace(' ', '') == var_name, 'ColumnA']
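A minimal sketch with made-up data to show the second form in action (var_name is assumed to already be lowercased with spaces removed):
import pandas as pd

df = pd.DataFrame({'ColumnA': ['Foo Bar', 'baz qux']})
var_name = 'foobar'

result = df.loc[df['ColumnA'].str.lower().str.replace(' ', '') == var_name, 'ColumnA']
print(result)
# 0    Foo Bar
# Name: ColumnA, dtype: object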

How can I extract a specific text string after a given word in a pandas DataFrame with Python?

I would like to extract the text inside the "text: ....." field from this dataframe and create another column with that value.
Here is what I tried on my pandas DataFrame:
issues_df['new_column'] = issues_df['fields.description.content'].apply(lambda x: x['text'])
However, it returns the following error:
issues_df['new_column'] = issues_df['fields.description.content'].apply(lambda x: x['text'])
TypeError: Object 'float' is not writable.
Any suggestions?
Thanks in advance.
The problem is NaN values in the column; you can try the .str accessor, which propagates NaN instead of raising:
issues_df['new_column'] = issues_df['fields.description.content'].str[0].str['content'].str[0].str['text']
That could be a good task for the rather efficient json_normalize:
issues_df['new_column'] = pd.json_normalize(
    issues_df['fields.description.content'], 'content'
)['text']
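For context, both answers assume each cell holds a nested list-of-dicts structure; a minimal sketch with a made-up layout (the real field structure may differ):
import pandas as pd

issues_df = pd.DataFrame({
    'fields.description.content': [
        [{'content': [{'text': 'first issue text'}]}],
        float('nan'),  # the NaN row that broke the original .apply
    ]
})

# .str indexing propagates NaN instead of raising
issues_df['new_column'] = (
    issues_df['fields.description.content']
    .str[0].str['content'].str[0].str['text']
)
print(issues_df['new_column'])
# 0    first issue text
# 1                 NaN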

Drop/edit rows in dataframe where entry doesn't meet condition

I know this has been asked before, but I cannot find an answer that works for me. I have a dataframe df that contains a column age, but the values are not all integers; some are strings like 35-59. I want to drop those entries. I have tried these two solutions as suggested by Kite, but they both give me AttributeError: 'Series' object has no attribute 'isnumeric':
df.drop(df[df.age.isnumeric()].index, inplace=True)
df = df.query("age.isnumeric()")
df = df.reset_index(drop=True)
Additionally is there a simple way to edit the value of an entry if it matches a certain condition? For example instead of deleting rows that have age as a range of values, I could replace it with a random value within that range.
Try with:
df.drop(df[df.age.str.isnumeric() == False].index, inplace=True)
If you check the documentation, isnumeric is a method of Series.str, not of Series itself. That's why you get that error.
You also need the == False: with mixed types, str.isnumeric() returns NaN for non-string entries, and comparing with == False turns the result into a purely boolean Series that can safely be used as a mask.
I'm posting this in case it also helps with your last question. You can use pandas.DataFrame.at with pandas.DataFrame.itertuples to iterate over the rows of the dataframe and replace values:
for row in df.itertuples():
    # iterate over every row and change the value of that column
    if row.age == 'non_desirable_value':
        df.at[row.Index, "age"] = 'desirable_value'
Hence, it could be (note that row.age is a scalar here, not a Series, so the .str accessor does not apply; wrapping it in str() also guards against non-string entries):
for row in df.itertuples():
    if not str(row.age).isnumeric() or row.age == 'non_desirable_value':
        df.at[row.Index, "age"] = 'desirable_value'
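For the follow-up question about replacing a range with a random value inside it, here is a minimal sketch with made-up data, assuming the ranges always look like 'low-high' strings:
import random
import pandas as pd

df = pd.DataFrame({'age': ['25', '35-59', '41']})

for row in df.itertuples():
    age = str(row.age)
    if not age.isnumeric() and '-' in age:
        # replace a range like '35-59' with a random value inside it
        low, high = map(int, age.split('-'))
        df.at[row.Index, 'age'] = str(random.randint(low, high))

df['age'] = df['age'].astype(int)
print(df)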

Using count inside pivot table Pandas

I want to count values from a column called "profitable_trades". These values are "TRUE" or "FALSE" depending on whether the trade is < 0 or not. The problem I'm facing is that it seems you can't use COUNT from NumPy. Is that correct?
What I want to know is how many TRUE or FALSE values I have for each item.
Here is the code I'm using:
currencies = ['audcad','audchf','audjpy','cadchf','eurcad','gbpaud','gbpchf','nzdusd']
filtered_df = df[df['Item'].isin(currencies)]
df3 = pd.pivot_table(filtered_df,index=["profitable_trades","month"],columns=["Item"],values=["profitable_trades"],aggfunc=[np.count],margins=True)
Here is the output:
AttributeError: module 'numpy' has no attribute 'count'
Any idea about how to use count inside a pivot table?
Thanks!
np.count does not exist. You can just use aggfunc=len:
df3 = pd.pivot_table(
    filtered_df,
    index=["profitable_trades", "month"],
    columns=["Item"],
    aggfunc=len,
    margins=True
)
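As an aside, pd.crosstab is another common way to count TRUE/FALSE per item; a minimal sketch with made-up data:
import pandas as pd

df = pd.DataFrame({
    'Item': ['audcad', 'audcad', 'eurcad', 'eurcad'],
    'profitable_trades': [True, False, True, True],
})

# rows are items, columns are the True/False counts
counts = pd.crosstab(df['Item'], df['profitable_trades'], margins=True)
print(counts)
# profitable_trades  False  True  All
# Item
# audcad                 1     1    2
# eurcad                 0     2    2
# All                    1     3    4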

Using pd.DataFrame.replace with an apply function as the replace value

I have several dataframes that have some columns mixed in with dates in this ASP.NET format: "/Date(1239018869048)/". I've figured out how to parse this into Python's datetime format for a given column. However, I would like to put this logic into a function so that I can pass it any dataframe and have it replace all the dates it finds that match a regex, using pd.DataFrame.replace.
Something like:
def pretty_dates(df):
    # messy logic here
    ...

df.replace(to_replace=r'\/Date\((\d+)\)\/', value=pretty_dates(df), regex=True)
The problem with this is that what gets passed to pretty_dates is the whole dataframe, not just the cell that needs to be replaced.
So the concept I'm trying to figure out is whether the value used by df.replace can be a function instead of a static value.
Thank you so much in advance
EDIT
To add some clarity: I have many columns in the dataframe, over a hundred, that contain this date format. I would like not to have to list out every single column that has a date. Is there a way to apply the function to clean my dates across all the columns in my dataset? I do not want to clean one column but all the hundreds of columns of my dataframe.
I'm sure you can use regex to do this in one step, but here is how to apply it to a whole column at once:
import pandas as pd

s = pd.Series(['/Date(1239018869048)/',
               '/Date(1239018869048)/'], dtype=str)
s = s.str.replace(r'\/Date\(', '', regex=True)
s = s.str.replace(r'\)\/', '', regex=True)
print(s)
0    1239018869048
1    1239018869048
dtype: object
As far as I understand, you need to apply a custom function to selected cells in a specified column. I hope the following example helps you:
import pandas as pd
df = pd.DataFrame({'x': ['one', 'two', 'three']})
selection = df.x.str.contains('t', regex=True) # put your regexp here
df.loc[selection, 'x'] = df.loc[selection, 'x'].map(lambda x: x+x) # do some logic instead
You can apply this procedure to all columns of the df in a loop:
for col in df.columns:
    selection = df.loc[:, col].str.contains('t', regex=True)  # put your regexp here
    df.loc[selection, col] = df.loc[selection, col].map(lambda x: x + x)  # do some logic instead
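To answer the broader question directly: Series.str.replace accepts a callable replacement (like re.sub), so the replacement value can effectively be a function. A minimal sketch looping over all columns; pretty_date is a hypothetical helper, and the data is made up:
import pandas as pd

def pretty_date(match):
    # convert the captured ASP.NET millisecond timestamp to a readable string
    return str(pd.to_datetime(int(match.group(1)), unit='ms'))

df = pd.DataFrame({'a': ['/Date(1239018869048)/'],
                   'b': ['no date here']})

for col in df.columns:
    df[col] = df[col].str.replace(r'/Date\((\d+)\)/', pretty_date, regex=True)

print(df)
#                          a             b
# 0  2009-04-06 12:34:29.048  no date here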
