This question already has answers here:
str.startswith with a list of strings to test for
(3 answers)
Closed 4 years ago.
I am working on a Python script that splits text into different blocks based on keywords used in the text.
Currently I split the text into blocks with something like this (this is for one block; the others have much the same structure):
if (line.strip().lower().startswith('ключевые навыки')
or line.strip().lower().startswith('дополнительная информация')
or line.strip().lower().startswith('знания')
or line.strip().lower().startswith('личные качества')
or line.strip().lower().startswith('профессиональные навыки')
or line.strip().lower().startswith('навыки')):
But the list of keywords may grow. Is there a way to generate the multiple or conditions from an array of possible keywords?
Try this code:
values = ['ключевые навыки', 'дополнительная информация', 'знания']
# add any other words you want to check to the list
for i in values:
    if line.strip().lower().startswith(i):
        # whatever code you want to run for a match
        break  # stop at the first matching keyword
Hope it helps :)
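As the linked duplicate's title suggests, str.startswith also accepts a tuple of prefixes and returns True if the string starts with any of them, which replaces the whole chain of or conditions. A minimal sketch; the helper name starts_with_keyword is hypothetical, and the comments give rough English translations of the keywords:

```python
# Keywords from the question; extend this tuple as the list grows.
KEYWORDS = (
    'ключевые навыки',            # key skills
    'дополнительная информация',  # additional information
    'знания',                     # knowledge
    'личные качества',            # personal qualities
    'профессиональные навыки',    # professional skills
    'навыки',                     # skills
)


def starts_with_keyword(line, keywords=KEYWORDS):
    # Hypothetical helper. str.startswith needs a tuple (a list raises
    # TypeError), so convert in case a list is passed in.
    return line.strip().lower().startswith(tuple(keywords))


line = '  Ключевые навыки: Python, SQL'
if starts_with_keyword(line):
    print('start of the skills block')
```

Because the keywords live in one tuple (or list), extending the check means adding one entry rather than another or clause.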
This question already has answers here:
Why doesn't print output show up immediately in the terminal when there is no newline at the end?
(1 answer)
Python print immediately?
(1 answer)
Closed 2 years ago.
I have the following code:
import numpy as np

for file_name, content in corpus.items():
    print('here')
    content = [list(filter(lambda index: index not in remove_indices, content))]
    corpus[file_name] = np.array(content).astype(np.uint32)
where corpus is a dictionary of 800,000 entries with string keys and array values.
Things were taking forever, so I decided to check how fast each iteration was by adding that print statement.
If I comment out the last two lines, it prints "here" many times very quickly, so there is no problem with my iterator. What's really weird is that when I uncomment the last two lines, "here" takes a long time to print, even the first one! It's as if the print statement were somehow aware of the lines that follow it.
I guess my question speaks for itself. I'm working in a Jupyter notebook, if that helps.
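The linked duplicates point to output buffering: print writes into a buffer that is not necessarily displayed right away, so a slow loop body can make even the first "here" appear late. A sketch under two assumptions: passing flush=True to print forces the text out immediately, and, if remove_indices is a list (the question does not say), turning it into a set makes each "not in" test much cheaper. The function name clean_corpus and the progress interval are hypothetical:

```python
import numpy as np


def clean_corpus(corpus, remove_indices, report_every=10000):
    # Assumption: remove_indices may be a list; a set gives average O(1)
    # membership tests instead of scanning the whole list each time.
    remove = set(remove_indices)
    for n, (file_name, content) in enumerate(corpus.items()):
        if n % report_every == 0:
            # flush=True pushes the text out of the stdout buffer now,
            # so progress is visible while the loop is still running.
            print('processed', n, flush=True)
        kept = [index for index in content if index not in remove]
        # Same 2-D shape as the original code: one row per file.
        # Replacing values (not keys) while iterating over items() is safe.
        corpus[file_name] = np.array([kept]).astype(np.uint32)
    return corpus
```

Only the flush changes when "here" appears on screen; the set conversion is what would actually shorten each iteration, if remove_indices is large.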
This question already has answers here:
pandas select from Dataframe using startswith
(5 answers)
Closed 3 years ago.
This seems like a straightforward thing, but I could not find an appropriate SO answer.
I have a column called title which contains strings. I want to find the rows that start with the letters "CU".
I've tried using df.loc, but it gives me an IndexError.
Using a regex, re.findall(r'^CU', string) returns 'CU' instead of the full name, e.g. 'CU abcd'. How can I get the full name that starts with 'CU'?
EDIT: Sorry, I did not notice it was a duplicate question; reading the duplicate solved the problem.
You can try:
string.startswith("CU")
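For a whole column, the vectorised equivalent is Series.str.startswith, which yields a boolean mask that can be passed to df.loc. The regex only returned "CU" because the pattern ends there; extending it with .* returns the full title. A sketch with made-up data (only the column name title comes from the question):

```python
import re

import pandas as pd

# Made-up titles; only the column name "title" comes from the question.
df = pd.DataFrame({"title": ["CU abcd", "AB CU", "CUxyz", "cu lower"]})

# Boolean mask, True where the title starts with "CU" (case-sensitive).
mask = df["title"].str.startswith("CU")
cu_rows = df.loc[mask]
print(cu_rows)

# ^CU.* matches from "CU" to the end of the line, so findall returns
# the whole title instead of only the prefix.
print(re.findall(r"^CU.*", "CU abcd"))
```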
This question already has answers here:
How to match a whole word with a regular expression?
(4 answers)
Closed 3 years ago.
I'm looking for a cell in a data frame that contains the word "Scan". Unfortunately, there is also a word "Scan-Steuerung" which I would like to ignore.
How can I do this in python?
Is it also possible to get the index of this cell?
Edit: I think it would be sufficient if I could read these two rows separately. At the moment, I use:
line = df[df["Name:"].str.contains("Scan")]
and when I print it, I get both rows at once.
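Use regex word boundaries, \b. For example:
df["Col"].str.contains(r"\bScan\b")
Note that a hyphen is not a word character, so \b also matches between "Scan" and "-", and this pattern still matches "Scan-Steuerung". To exclude hyphenated compounds as well, forbid a hyphen next to the word with lookarounds, e.g. r"(?<![\w-])Scan(?![\w-])".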
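A runnable comparison of the two patterns that also returns the index labels the question asks about. The cell values are made up; only the column name "Name:" and the two words come from the question:

```python
import pandas as pd

# Made-up cells; "Name:", "Scan" and "Scan-Steuerung" come from the question.
df = pd.DataFrame({"Name:": ["Scan", "Scan-Steuerung", "Start Scan now", "Scanner"]})

# \bScan\b alone still hits "Scan-Steuerung": "-" is a non-word character,
# so there is a word boundary between "n" and "-".
plain = df["Name:"].str.contains(r"\bScan\b")

# The lookarounds additionally reject a word character or hyphen directly
# before or after "Scan".
mask = df["Name:"].str.contains(r"(?<![\w-])Scan(?![\w-])")

# Index labels of the matching rows.
indices = df.index[mask].tolist()

# Handle the matches one row at a time instead of printing them together.
for idx, value in df.loc[mask, "Name:"].items():
    print(idx, value)
```

If the target cell is exactly "Scan" with nothing around it, a plain equality test, df["Name:"] == "Scan", is simpler still.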
This question already has answers here:
How to test multiple variables for equality against a single value?
(31 answers)
Closed 7 years ago.
I am currently using:
if "not" or "missing" in temp_reader:
However, it does not seem to work.
It always goes into the if block, even when neither "not" nor "missing" is there.
How can I fix this so that it only enters the if block when the text file contains "not" or "missing"?
Change to something like this:
if "not" in temp_reader or "missing" in temp_reader:
The original condition was being parsed like this:
if ("not") or ("missing" in temp_reader):
Since "not" is a non-empty string, it is always truthy, so the branch was taken every time.
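When the list of words grows, any() over a generator keeps each test in the explicit "word in text" form. A sketch, assuming temp_reader holds the file's contents as a single string (for example from f.read()); if it is a file object, "in" compares against whole lines rather than searching for substrings. The helper name contains_any is hypothetical:

```python
def contains_any(text, words):
    # Each check is a full "word in text" test; any() stops at the first hit.
    return any(word in text for word in words)


# Hypothetical file contents, already read into one string.
temp_reader = "status: value missing"

if contains_any(temp_reader, ("not", "missing")):
    print("found")

# The original condition: the non-empty string "not" is truthy, so the
# whole expression evaluates to "not" whatever the text contains.
print("not" or "missing" in "hello world")
```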
This question already has answers here:
How to extract the substring between two markers?
(22 answers)
Closed 4 years ago.
I'm looking to build a string function that extracts the contents between two markers and returns a list of the extracted strings:
def extract(raw_string, start_marker, end_marker):
    ... function ...
    return extraction_list
I know this can be done with a regex, but is that fast? The function will be called billions of times in my process. What is the fastest way to do this?
What happens if the markers are the same and appear an odd number of times?
The function should return multiple strings if the start and end markers appear more than once.
You probably can't go faster than:
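def extract(raw_string, start_marker, end_marker):
    # str.index raises ValueError if a marker is missing.
    start = raw_string.index(start_marker) + len(start_marker)
    # Search for the end marker only after the start marker.
    end = raw_string.index(end_marker, start)
    # Note: only the first occurrence is returned.
    return raw_string[start:end]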
But if you want to try a regex, benchmark it; the standard-library timeit module is good for that.
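The extract above stops at the first pair of markers, while the question asks for every occurrence. A sketch of a find-based loop and a regex version under that requirement; the names extract_all and extract_all_regex are hypothetical, markers are assumed non-empty, and when both markers are the same string they are consumed in pairs, so a leftover unpaired marker is ignored. The timeit call is one way to compare the two on your own data; no timings are claimed here:

```python
import re
import timeit


def extract_all(raw_string, start_marker, end_marker):
    # Collect every substring between a start marker and the next end marker.
    if not start_marker or not end_marker:
        raise ValueError("markers must be non-empty")  # avoids an endless loop
    results = []
    pos = 0
    while True:
        start = raw_string.find(start_marker, pos)
        if start == -1:
            break
        start += len(start_marker)
        end = raw_string.find(end_marker, start)
        if end == -1:
            break  # unpaired start marker: ignore the rest
        results.append(raw_string[start:end])
        pos = end + len(end_marker)  # resume after the end marker
    return results


def extract_all_regex(raw_string, start_marker, end_marker):
    # re.escape treats the markers literally; the non-greedy (.*?) stops at
    # the first end marker, and DOTALL lets a match span newlines.
    pattern = re.escape(start_marker) + r"(.*?)" + re.escape(end_marker)
    return re.findall(pattern, raw_string, flags=re.DOTALL)


sample = "x[1]y[22]z" * 100
for func in (extract_all, extract_all_regex):
    print(func.__name__, timeit.timeit(lambda: func(sample, "[", "]"), number=1000))
```

For a fair comparison with many calls, the regex could also be compiled once with re.compile and reused, since the markers are presumably fixed.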