Split a column value based on condition [duplicate] - python

This question already has answers here:
Split Pandas Series into DataFrame by delimiter
(2 answers)
Closed last year.
I am trying to split a column based on whether a slash ('/') is present in a cell of that column. Not all cells contain slashes. Most contain only 3 letters (e.g. 'ABC').
I am trying to avoid for loops since they affect performance. I have tried the following code:
df.column.split('/',expand=True)
I get the following output:
AttributeError: 'Series' object has no attribute 'split'

Almost there. String methods on a Series live under the .str accessor:
df.column.str.split('/',expand=True)
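A minimal sketch of what that call returns, assuming a small made-up frame (the sample values are hypothetical; only the column name `column` comes from the question). Cells without a slash simply get a missing value in the second output column, so no loop or explicit condition is needed:

```python
import pandas as pd

# Hypothetical sample data: some cells contain a slash, most do not.
df = pd.DataFrame({"column": ["ABC", "DEF/GHI", "JKL"]})

# .str.split is vectorised; expand=True returns one output column per piece.
parts = df["column"].str.split("/", expand=True)

# Rows without a slash have a missing value in the second column.
print(parts)
```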

Related

How to read the column using Pandas [duplicate]

This question already has answers here:
Pandas column access w/column names containing spaces
(6 answers)
Closed 10 months ago.
I am trying to filter with Pandas in Python so that only the rows whose revenue is greater than or equal to a certain number are printed.
I am using the line "df[df.Total Revenue>=6678690.38]".
I am getting the error "SyntaxError: invalid syntax" because the column name (Total Revenue) contains a space. How do I access a column whose header contains a space?
Note: df is the DataFrame I read my file into.
This should work:
df[df['Total Revenue']>=6678690.38]
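For illustration, here is a small runnable sketch with invented data (the `Region` column and all values are assumptions, not from the question). It shows the bracket form from the answer and, as an alternative, `DataFrame.query`, where a column name containing a space is wrapped in backticks:

```python
import pandas as pd

# Hypothetical data; only the 'Total Revenue' column name is from the question.
df = pd.DataFrame({
    "Region": ["North", "South", "East"],
    "Total Revenue": [5_000_000.00, 6_678_690.38, 9_100_000.50],
})

# Bracket access works for any column name, including ones with spaces.
high = df[df["Total Revenue"] >= 6678690.38]

# query() accepts such names when they are quoted with backticks.
same = df.query("`Total Revenue` >= 6678690.38")

print(high)
```

Attribute access (df.name) only works when the column name is a valid Python identifier, which is why the version with a space is a syntax error.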

syntax for data frames to use for 2 options [duplicate]

This question already has answers here:
How to use str.contains() with multiple expressions in pandas dataframes
(3 answers)
Filter pandas DataFrame by substring criteria
(17 answers)
How to test if a string contains one of the substrings in a list, in pandas?
(4 answers)
Pandas filtering for multiple substrings in series
(3 answers)
Closed 10 months ago.
I have a dataframe, df. One of the columns is Text. I want to find the rows where the text contains ABC.
Hence, I write the code:
df["Text"].str.contains("ABC")
Now, I want to search which text contains ABC or XYZ.
What will be the syntax?
The | (pipe) is what you need, either inside the regular expression:
df['Text'].str.contains('ABC|XYZ')
or as an element-wise OR of two boolean Series:
df['Text'].str.contains('ABC') | df['Text'].str.contains('XYZ')
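If the substrings come from a list, one possible approach is to build the pattern programmatically. The sketch below uses made-up sample rows and a hypothetical `terms` list; `re.escape` is there so that characters such as `.` in a search term are treated literally rather than as regex syntax:

```python
import re

import pandas as pd

# Hypothetical sample data; only the 'Text' column name is from the question.
df = pd.DataFrame({"Text": ["ABC one", "two XYZ", "neither", "a.c"]})

# Join the (escaped) terms with | so the regex matches any one of them.
terms = ["ABC", "XYZ"]
pattern = "|".join(re.escape(t) for t in terms)
mask = df["Text"].str.contains(pattern)

# Equivalent result from OR-ing two separate boolean masks.
either = df["Text"].str.contains("ABC") | df["Text"].str.contains("XYZ")

print(df[mask])
```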

Identifying line breaks anywhere in a dataframe [duplicate]

This question already has answers here:
How can I check for a new line in string in Python 3.x?
(4 answers)
Closed 1 year ago.
I am looking for a way to run the equivalent of a 'find' in Python so that I can identify line breaks anywhere in a dataframe.
I tried this, but unexpectedly it returned no results:
df[df.isin(['\n']).any(axis=1)]
The str accessor has a function to search for substrings.
df["colA"].str.contains(r"\n")
Use it in conjunction with apply to check every column (this assumes every column holds strings):
df.apply(lambda s: s.str.contains(r"\n"))
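If you want the matching rows as a pd.DataFrame, filter with the boolean mask (here testdf is the dataframe and 'B' the column being searched):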
df1 = testdf[testdf['B'].str.contains('\n')]
Another solution would be with iloc and np.where:
testdf.iloc[np.where(testdf['B'].str.contains('\n', regex=False))]
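Putting the pieces together, here is a rough sketch for searching every column at once. The frame and its values are invented for the example. The isin attempt in the question finds nothing because isin compares whole cell values, so it only matches a cell that is exactly '\n'. Casting to str first is an assumption made here so that numeric columns do not break the .str accessor:

```python
import pandas as pd

# Hypothetical frame mixing numeric and text columns.
df = pd.DataFrame({
    "A": [1, 2, 3],
    "B": ["x", "line\nbreak", "y"],
    "C": ["p", "q", "r\n"],
})

# Cast everything to str, then test each column for a literal newline.
has_newline = df.astype(str).apply(
    lambda col: col.str.contains("\n", regex=False)
)

# Keep the rows where at least one cell contains a line break.
rows = df[has_newline.any(axis=1)]

print(rows)
```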

findall string that starts with letter "CU" and return full string [duplicate]

This question already has answers here:
pandas select from Dataframe using startswith
(5 answers)
Closed 3 years ago.
It seems like a straightforward thing, but I could not find an appropriate SO answer.
I have a column called title which contains strings. I want to find the rows that start with the letters "CU".
I've tried using df.loc, but it gives me an IndexError.
Using regex, re.findall(r'^CU', string)
returns 'CU' instead of full name ex: 'CU abcd'. How can I get full name that starts with 'CU'?
EDIT: SORRY, I did not notice it was a duplicate question, problem solved by reading duplicate question.
You can try:
string.startswith("CU")
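For a pandas column rather than a single string, the same test is available through the .str accessor. A minimal sketch, assuming a made-up frame (only the column name `title` comes from the question):

```python
import pandas as pd

# Hypothetical titles; "xCU" contains CU but does not start with it.
df = pd.DataFrame({"title": ["CU abcd", "AB cdef", "CU wxyz", "xCU"]})

# Boolean mask of rows whose title begins with "CU", used to filter the frame.
matches = df[df["title"].str.startswith("CU")]

print(matches["title"].tolist())
```

Filtering with the mask returns the full strings, whereas re.findall(r'^CU', ...) returns only the part of the string that the pattern matched.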

Why does this RE not work and the group function failed? [duplicate]

This question already has answers here:
Extract part of a regex match
(11 answers)
Closed 3 years ago.
match_next = re.search(r'(再来週)の(.曜日)', '再来週の月曜日')
When I run match_next.group[1], I get the following:
TypeError: 'builtin_function_or_method' object is not subscriptable
Even if the match fails, why does the group function report this error?
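The docs list the attributes and methods of the Match object and how to use them. group is a method, so it must be called with parentheses and one or more integers indicating which groups you want. Writing group[1] tries to index the method itself, which is what raises the TypeError. Had the match failed, match_next would be None and you would get an AttributeError instead.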
match_next = re.search(r'(再来週)の(.曜日)', '再来週の月曜日')
match_next.group(1)
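A slightly fuller sketch of the same call, using the pattern and string from the question. The None check is an addition here, since re.search returns None when nothing matches:

```python
import re

match_next = re.search(r"(再来週)の(.曜日)", "再来週の月曜日")

# re.search returns None on failure, so guard before calling group().
if match_next is not None:
    week = match_next.group(1)      # first parenthesised group
    day = match_next.group(2)       # second parenthesised group
    both = match_next.group(1, 2)   # several groups at once, as a tuple
    print(week, day, both)
```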
