How to use BeautifulSoup find_previous with a condition or method? - python

Can find_previous be used to skip over tags that don't meet certain criteria?
for e in soup.select('p'):
    number = e.find_previous('b').isnumeric().get_text(strip=True)
I am trying to find the previous tag with numeric text, but adding the .isnumeric() call doesn't work.
find_previous has a parameter text= but that doesn't serve my purposes either.
To clarify: I need to skip over previous tags whose text isn't numeric until I reach a tag whose text is numeric.

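.isnumeric() does not return a string. It returns a boolean telling you whether the string is numeric, so you cannot chain .get_text() onto it.
You can use find_previous_sibling() with a filter that calls .isnumeric() to achieve this. Note that this matches text nodes that are previous siblings of the <p> tag, not text inside <b> tags.
See the example below and adapt it to your own code.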
for e in soup.select('p'):
    prev_sibling = e.find_previous_sibling(text=lambda text: text.strip().isnumeric())
    if prev_sibling:
        number = prev_sibling.strip()
    else:
        number = None

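You cannot call .isnumeric() on the result of find_previous(), because it is a string method and find_previous() returns a Tag, not a string. Extract the text first, then test it. Note that this only tests the nearest <b>; it does not keep searching further back.
Corrected:
for e in soup.select('p'):
    number = e.find_previous('b').get_text(strip=True)
    if number.isnumeric():
        # do something
        pass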
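To actually skip over non-numeric tags, one option is to pass a function to find_previous instead of a tag name; BeautifulSoup calls it with each Tag it walks back over and returns the first one for which it returns True. This is only a sketch: the HTML and the helper name is_numeric_b are made up for illustration, so adapt them to your page.

```python
from bs4 import BeautifulSoup

# Hypothetical markup: each <p> is preceded by a numeric and a non-numeric <b>.
html = "<div><b>12</b><b>abc</b><p>first</p><b>7</b><b>note</b><p>second</p></div>"
soup = BeautifulSoup(html, "html.parser")

def is_numeric_b(tag):
    """Filter: True only for <b> tags whose stripped text is numeric."""
    return tag.name == "b" and tag.get_text(strip=True).isnumeric()

numbers = []
for p in soup.select("p"):
    # Walks backwards through the document and skips tags the filter rejects.
    prev = p.find_previous(is_numeric_b)
    numbers.append(prev.get_text(strip=True) if prev else None)

print(numbers)
```

If no earlier tag passes the filter, find_previous returns None, which is why the loop checks prev before reading its text.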

Related

How to search for an element inside another one with selenium

I am searching for elements that contain a certain string with find_elements. Then, for each of these elements, I need to ensure that it contains a span with a certain text. Is there a way to write a for loop over the list of elements I got?
today = self.dataBrowser.find_elements(By.XPATH, f'//tr[child::td[child::div[child::strong[text()="{self.date}"]]]]')
I believe you should be able to search within each element of a loop, something like this:
today = self.dataBrowser.find_elements(By.XPATH, f'//tr[child::td[child::div[child::strong[text()="{self.date}"]]]]')
for element in today:
    span = element.find_element(By.TAG_NAME, "span")
    if span.text == "The text you want to check":
        ...  # do something
Let me know if that works.
Sure, you can.
You can do something like the following:
today = self.dataBrowser.find_elements(By.XPATH, f'//tr[child::td[child::div[child::strong[text()="{self.date}"]]]]')
for element in today:
    span = element.find_elements(By.XPATH, './/span[contains(text(),"the_text_in_span")]')
    if not span:
        print("current element doesn't contain the desired span with text")
Using find_elements makes sure your code doesn't throw an exception when some element has no span with the desired text. It returns a list of WebElements, or just an empty list when nothing matches (whereas find_element raises NoSuchElementException). An empty list is falsy in Python, while a non-empty list is truthy.

Find function in Python 3.x

When we have the following:
tweet2 = 'Want cheap snacks? Visit #cssu office in BA2283'
print(tweet2[tweet2.find('cheap')]) results in the output 'c' and I can't wrap my head around how it does this. I tried the visualizer and it didn't show anything. Could anyone please explain?
tweet2.find('cheap') returns the index at which the beginning of "cheap" is found, and when that index is used in tweet2[index], it returns the character at that index, which is "c".
It's because str.find(sub) searches the string for sub and returns the index of its first occurrence, or -1 if sub is not found. So tweet2.find('cheap') returns the index where 'cheap' first occurs.
You should consider reading the Python documentation on string methods and lists.
# define string variable tweet2
tweet2 = 'Want cheap snacks? Visit #cssu office in BA2283'
# find the position of substring 'cheap', which is 5 (strings have 0-based indices in Python)
n = tweet2.find('cheap')
# print the character at index 5, which is 'c'
print(tweet2[n])
find returns an index, not a slice.
If you want the full string you can get it like so:
to_find = 'cheap'
ind = tweet2.find(to_find)
print(tweet2[ind:ind+len(to_find)])
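One thing worth adding to the answers above: find returns -1 when the substring is absent, and -1 is itself a valid index (the last character), so indexing with an unchecked result can silently give the wrong character. A minimal sketch of guarding against that; the helper name char_at_match and the search word 'pricey' are made up for illustration.

```python
tweet2 = 'Want cheap snacks? Visit #cssu office in BA2283'

def char_at_match(text, sub):
    """Return the character where sub starts in text, or None if sub is absent."""
    index = text.find(sub)
    if index == -1:
        # Without this check, text[-1] would return the last character.
        return None
    return text[index]

found = char_at_match(tweet2, 'cheap')
missing = char_at_match(tweet2, 'pricey')
print(found, missing)
```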

Python regular expression: how to return search value?

I'm sure that this is a very elementary question, so thank you in advance for your patience.
I'm using a simple regular expression to identify whether a year is present in a line of text (line is a dictionary so I'm calling the particular field I want to search). Then I want to do something with the year. My question is, how can I return the year that my search finds? Everything I'm finding online tells me how to replace or edit it, but in this case the value is the thing I want to extract/use.
p = re.compile(r'\d{4}')
if p.search(line['productionStartYear']):
    # do something with the value the regular expression found
    pass
else:
    # do something else
    pass
You just need to store the match object in a variable, then call its group() method to get the matched string.
p = re.compile(r'\d{4}')
matched_obj = p.search(line['productionStartYear'])
if matched_obj:
    # do something with the value the regular expression found
    print(matched_obj.group(0))
else:
    # do something else
    pass
match.group([group1, ...])
Returns one or more subgroups of the match. If there is a single argument, the result is a single string; if there are multiple arguments, the result is a tuple with one item per argument. Without arguments, group1 defaults to zero (the whole match is returned). If a groupN argument is zero, the corresponding return value is the entire matching string; if it is in the inclusive range [1..99], it is the string matching the corresponding parenthesized group.
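To make that concrete, here is a small sketch that pulls the year out as an integer and also lists every four-digit run in the field. The contents of line are invented for the example, since the question doesn't show the real data.

```python
import re

year_pattern = re.compile(r"\d{4}")

# Hypothetical record; the real dictionary is not shown in the question.
line = {"productionStartYear": "Released in 1998 (remastered 2005)"}

match = year_pattern.search(line["productionStartYear"])
# group() with no arguments returns the whole match; search() gives None on no match.
year = int(match.group()) if match else None

# findall returns every non-overlapping match as a list of strings.
all_years = year_pattern.findall(line["productionStartYear"])
print(year, all_years)
```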

Display the number of lower case letters in a string

This is what I have so far:
count=0
mystring=input("enter")
for ch in mystring:
    if mystring.lower():
        count+=1
print(count)
I figured out how to make a program that displays the number of lower case letters in a string, but it requires that I list each letter individually: if ch=='a' or ch=='b' or ch=='c', etc. I am trying to figure out how to use a command to do so.
This sounds like homework! Anyway, this is a fun way of doing it:
#the operator module contains functions that can be used like
#their operator counterparts. The eq function works like the
#'==' operator; it takes two arguments and tests them for equality.
from operator import eq
#I want to give a warning about the input function. In python2
#the equivalent function is called raw_input. python2's input
#function is very different, and in this case would require you
#to add quotes around strings. I mention this in case you have
#been manually adding quotes if you are testing in both 2 and 3.
mystring = input('enter')
#So what this line below does is a little different in python 2 vs 3,
#but comes to the same result in each.
#First, map is a function that takes a function as its first argument,
#and applies that to each element of the rest of the arguments, which
#are all sequences. Since eq is a function of two arguments, you can
#use map to apply it to the corresponding elements in two sequences.
#in python2, map returns a list of the elements. In python3, map
#returns a map object, which uses a 'lazy' evaluation of the function
#you give on the sequence elements. This means that the function isn't
#actually used until each item of the result is needed. The 'sum' function
#takes a sequence of values and adds them up. The results of eq are all
#True or False, which are really just special names for 1 and 0 respectively.
#Adding them up is the same as adding up a sequence of 1s and 0s.
#so, map is using eq to check each element of two strings (i.e. each letter)
#for equality. mystring.lower() is a copy of mystring with all the letters
#lowercase. sum adds up all the Trues. Caveat: this counts every character
#that lower() leaves unchanged, so digits, spaces and punctuation count too.
sum(map(eq, mystring, mystring.lower()))
or the one-liner:
#What I am doing here is using a generator expression.
#I think reading it is the best way to understand what is happening.
#For every letter in the input string, check if it is lower, and pass
#that result to sum. sum sees this like any other sequence, but this sequence
#is also 'lazy,' each element is generated as you need it, and it isn't
#stored anywhere. The results are just given to sum.
sum(c.islower() for c in input('enter: '))
You have a typo in your code. Instead of:
if mystring.lower():
It should be:
if ch.islower():
If you have any questions ask below. Good luck!
I'm not sure if this will handle UTF or special characters very nicely but should work for at least ASCII in Python3, using the islower() function.
count=0
mystring=input("enter:")
for ch in mystring:
    if ch.islower():
        count+=1
print(count)
The correct version of your code would be:
count=0
mystring=input("enter")
for ch in mystring:
    if ch.islower():
        count += 1
print(count)
The method lower converts a string/char to lowercase. Here you want to know if it IS lowercase (you want a boolean), so you need islower.
Tip: With a bit of wizardry you can even write this:
mystring= input("enter")
count = sum(map(lambda x: x.islower(), mystring))
or
count = sum([x.islower() for x in mystring])
(True is automatically converted to 1 and False to 0)
:)
I think you can use the following method:
mystring=input("enter:")
[char.islower() for char in mystring].count(True)
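The answers above fall into two camps: testing each character with islower(), and comparing each character with its lowercased form. These are not the same thing, because characters that lower() leaves untouched (digits, spaces, punctuation) also compare equal. A quick sketch to show the difference; the function names and the sample string are made up, and the sample stands in for input("enter").

```python
from operator import eq

def count_lowercase(text):
    """Count characters for which str.islower() is true."""
    return sum(ch.islower() for ch in text)

def count_unchanged_by_lower(text):
    """Count characters equal to their lowercased form.

    This also counts digits, spaces and punctuation.
    """
    return sum(map(eq, text, text.lower()))

sample = "Hello World 42!"  # hypothetical input
lowercase_count = count_lowercase(sample)
unchanged_count = count_unchanged_by_lower(sample)
print(lowercase_count, unchanged_count)
```

So if you only want actual lowercase letters, the islower() versions are the ones to use.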

Is there a python special character for a blank value?

I don't know how to ask this so I'm going to explain what I'm doing instead. I'm trying to search a list of lists, each with only 2 values. I don't care about the first value; however, I need to check the second, and if it exists I need the first value. Example:
list = [[1,'text1'],[1,'text2'],[3,'text3'],[2,'text4']]
So basically I need to know if there is a character like % or ! that, when used in find, means any value. Like find(!,'text2') would get me the value of 1 (I know that wouldn't work like that). I know I have the option of searching just the second value in the lists, but that's unnecessary code if there is such a special character.
There is no specific character or value for that, but you can either create your own sentinel object or you can use None for this. Just make sure to use the is operator to detect it within your code.
# fmod.py
Any = object()
def find(first=Any, second=Any):
    if first is Any:
        ...
    ...
import fmod
fmod.find(None, 'text2')
fmod.find(fmod.Any, 'text2')
You can do this with a list comprehension:
def findit(word, lst):
    return [el[0] for el in lst if el[1] == word][0]
Try the None value.
You can read more information about it here: Python 3 Constants
When you say "the second how ever I need to check and if it exists I need the first value", I'm thinking you need to check if the second value is a) there (if the list has two elements) and b) of nonzero length (assuming they'll always be strings). This leads me to:
list_to_search = [[1,'text1'],[1,'text2'],[3,'text3'],[2,'text4'], [3], [4, '']]
firsts_with_seconds = [item[0] for item in list_to_search
                       if len(item) > 1 and len(item[1]) > 0]
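Putting the sentinel idea together with a list comprehension, a wildcard-style find could look roughly like this. The names ANY and find are invented for this sketch, and it assumes every inner list has exactly two values, as in the question.

```python
ANY = object()  # hypothetical sentinel meaning "match any value"

def find(pairs, first=ANY, second=ANY):
    """Return every [first, second] pair that matches the given values."""
    return [
        pair for pair in pairs
        # "is" compares identity, so no real value can be mistaken for ANY.
        if (first is ANY or pair[0] == first)
        and (second is ANY or pair[1] == second)
    ]

pairs = [[1, 'text1'], [1, 'text2'], [3, 'text3'], [2, 'text4']]

matches = find(pairs, second='text2')
first_value = matches[0][0] if matches else None
print(first_value)
```

Using a dedicated object rather than None as the wildcard keeps None available as a real value to search for.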
