Closed. This question needs details or clarity. It is not currently accepting answers.
Closed 5 days ago.
I want to capture a string from a request. I can't use the JSON part of the response; I can only capture the value by looking for text that starts with https and ends with a ".
I tried everything I came across: rsplit, rfind, find, split. What tripped me up was matching the closing ". When I tried to pass '"' as a separator, I still got an error.
r = session.get('https://google.com')
if "https://" in r.text:
    addrss = ...  # want: the substring starting at https:// and ending just before the next "
I want to capture only one URL, but it can start anywhere. There may be many URLs or just one, so I want to capture the first URL that appears and store it in a variable named addrss. I tried using
r.text.split('https', '"')
But it didn't work: str.split() takes a separator and an optional maxsplit count, not two separators, so passing '"' as the second argument raises a TypeError. To me this seemed like the most logical approach.
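One way to do this (not from the thread itself) is a regular expression: match https:// followed by any run of characters that are not a double quote. A minimal sketch, where the sample text is an assumed stand-in for r.text:

```python
import re

# Assumed sample standing in for r.text
text = 'foo <a href="https://example.com/page?id=1">link</a> "https://other.com"'

# First match of: https:// followed by anything that is not a double quote
match = re.search(r'https://[^"]+', text)
addrss = match.group(0) if match else None
print(addrss)  # https://example.com/page?id=1
```

re.search returns only the first match, which is exactly the "first URL that always comes" behaviour asked for.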
Closed. This question is opinion-based. It is not currently accepting answers.
Closed 1 year ago.
My question is about Python code to consume existing REST APIs. The script is based on requests.
Let's say we have the following base URL:
www.something.com/search/countries/{country_code}/cities/{city_code}/
To sidestep URL-encoding issues, we assume the codes are numeric.
It seems that format is a perfect method for replacing the placeholders with actual values:
URL.format(country_code=..., city_code=...)
Is this pythonic enough?
I think that .format is a perfectly fine way to do that, but it's not super readable. You can use Python f-strings to make that more readable like:
country_code = 'foo'
city_code = 'bar'
url = f'www.something.com/search/countries/{country_code}/cities/{city_code}/'
The f will tell Python to interpolate the values inside the curly brackets with the variables defined above.
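For comparison, the str.format version from the question looks like this (the numeric codes are illustrative):

```python
URL = 'www.something.com/search/countries/{country_code}/cities/{city_code}/'

# .format fills the named placeholders; an f-string does the same at definition time
url = URL.format(country_code=123, city_code=456)
print(url)  # www.something.com/search/countries/123/cities/456/
```

The practical difference: the f-string interpolates immediately, while the .format template can be defined once and filled in later, which suits a reusable API base URL.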
Closed. This question needs to be more focused. It is not currently accepting answers.
Closed 2 years ago.
I am trying to capture any alphanumeric characters between single quotes ('').
Regex
'(.*.doc)' will only capture .doc files.
'(\w)' should capture any alpha numeric character.
But I am looking to capture any characters between quotes, except the ---- characters.
Here you can use the following regular expression: ([^\-\[\][\n']+)
An example:
regexr.com/5btcs
Is this good?
'[^'-]*'
It means: a single ', then anything that is not ' or -, then another '.
If you wish to capture things around the dashes though, you might have to capture inclusively and filter them out.
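A quick check of that pattern in Python (the sample text is illustrative: one quoted span without dashes, one with):

```python
import re

# Quoted span with no dashes matches; the dashed one is skipped
text = "x 'abc123' y '--skip--' z"
matches = re.findall(r"'[^'-]*'", text)
print(matches)  # ["'abc123'"]
```

Note that with adjacent quoted spans the pattern can pair a closing quote with the next opening quote, which is why capturing inclusively and filtering afterwards may be safer, as noted above.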
Closed. This question needs details or clarity. It is not currently accepting answers.
Closed 3 years ago.
I have texts in an excel file that looks something like this:
alpha123_4rf
45beta_Frank
Red5Great_Sam_Fun
dan.dan_mmem_ber
huh_k
han.jk_jj
huhu
I am trying to use a regex to match all of these words and save them into a set().
I have tried r"(\w+..*?_.*?\w+)", but it can't seem to capture the word huhu, which has no special characters.
Your regex captures words that have a _ in them, and huhu doesn't.
You could change your regex to match letters, numbers, underscores, and dots, one or more times:
([\w.]+)
I've forked your regex101.
If you wish to match something more precise, you might need to give us more information about your context and what exactly you are trying to match.
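A sketch of that suggestion applied to the sample data from the question:

```python
import re

text = """alpha123_4rf
45beta_Frank
Red5Great_Sam_Fun
dan.dan_mmem_ber
huh_k
han.jk_jj
huhu"""

# One or more word characters (letters, digits, _) or dots
words = set(re.findall(r"[\w.]+", text))
print(sorted(words))
```

Each line is matched whole, including huhu, since a _ is no longer required.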
Closed. This question needs to be more focused. It is not currently accepting answers.
Closed 3 years ago.
I am using scrapy to scrape the date that a comment was posted on a forum. I have been able to scrape the contents of the div that contains the date, but the string has escaped characters on both sides that make it unusable. I need to create a regex that matches everything except the escaped characters.
The string I am working with is "\r\n\t\t\t\r\n\t\t\t\t08-07-2019, 11:37:16 AM\r\n\t\t\t\r\n\t\t\t\r\n\t\t\t". I want only to match the date inside.
The pattern I was trying to use was (?<!\\\\)\\+[\\w-]+, as recommended in other threads, but it doesn't match anything in that string.
You don't need regex if you want to match everything. I strongly recommend using Item Loaders in Scrapy to process your fields (with .strip() etc.).
Also you can remove unwanted characters from your string using XPath normalize-space():
event_time = response.xpath('normalize-space(string(//YOUR/XPATH/HERE))').get()
But if you want to match part of a complex string, you can of course use a regular expression:
event_time = response.xpath('//YOUR/XPATH/HERE').re_first(r'(\d{2}-\d{2}-\d{4},\s+\d{2}:\d{2}:\d{2}\s+\w{2})')
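Outside Scrapy, the same date pattern works with plain re on the raw string from the question:

```python
import re

raw = "\r\n\t\t\t\r\n\t\t\t\t08-07-2019, 11:37:16 AM\r\n\t\t\t\r\n\t\t\t\r\n\t\t\t"

# Same pattern as the re_first() call above
m = re.search(r'\d{2}-\d{2}-\d{4},\s+\d{2}:\d{2}:\d{2}\s+\w{2}', raw)
event_time = m.group(0) if m else None
print(event_time)  # 08-07-2019, 11:37:16 AM

# For this particular string, stripping surrounding whitespace is enough on its own
assert raw.strip() == event_time
```

Those "escaped characters" are just carriage returns, newlines, and tabs, which is why .strip() alone also solves it here.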
Closed. This question needs to be more focused. It is not currently accepting answers.
Closed 4 years ago.
I fetch a link with requests.get, and when I check r.history it's empty, although the link redirects to another address when I open it in my browser. What is the problem?
import requests
r=requests.get('http://dir.iran.ir/home?p_p_id=webdirectorydisplay_WAR_webdirectoryportlet&p_p_lifecycle=0&p_p_state=exclusive&p_p_mode=view&_webdirectorydisplay_WAR_webdirectoryportlet_itemEntryId=14439&_webdirectorydisplay_WAR_webdirectoryportlet_cmd=redirectToLink')
result=r.history
but result is an empty list, and the final link is http://www.dps.ir/
You should check the result of that URL first.
>>> r.content
'<script type="text/javascript">window.location.href="http://www.dps.ir";</script> '
The requests library doesn't execute JavaScript, which explains why there is no history: the "redirect" here is a script, not an HTTP 3xx response.
PS: Btw you could give phantomjs a shot.
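If a headless browser is overkill, the redirect target in this particular response can also be pulled out with a regex (a sketch; the HTML shape is taken from the r.content shown above):

```python
import re

content = '<script type="text/javascript">window.location.href="http://www.dps.ir";</script> '

# Extract the URL assigned to window.location.href
m = re.search(r'window\.location\.href="([^"]+)"', content)
final_url = m.group(1) if m else None
print(final_url)  # http://www.dps.ir
```

The extracted URL can then be fetched with a second requests.get call.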