My Python code is shown below. I am joining two parts of a URL using urljoin from urllib.parse. The problem is that the placeholders for the search terms are never filled in: the whole list a ends up in the start field, and the start date ends up in the end field. My actual and expected output are both shown below.
To summarize, I want the user to enter the number of search terms, and the entered terms should be passed into the query part of the URL (i.e. terms[]=...&terms[]=...). I am not sure what I am missing.
Thanks in advance for any help!
Code
from urllib.parse import urljoin
num_terms=int(input("Enter total number of search terms:")) #Asking user for number of terms
a=input("Enter all search terms: ").split(",",num_terms) #User enters all the terms
start,end=input("Enter start and end date").split() #User enters start and end date
base_url="http://mytest.org"
join_url="/comments/data?"+"terms[]={}"+"&terms[]={}"*int(num_terms-1)+"&start={}&end={}".format(a,start,end)
url=urljoin(base_url,join_url) #Joining url
url
Output:
Enter total number of search terms:3
Enter all search terms: ty ou io
Enter start and end date2345 7890
"http://mytest.org/comments/data?terms[]={}&terms[]={}&terms[]={}start=['ty ou io']&end=2345"
Expected Output
"http://mytest.org/comments/data?terms[]=ty&terms[]=ou&terms[]=io&start=2345&end=7890"
One issue I spotted: your sample input separates the search terms with spaces, but the code splits the string on commas (","), so the whole input ends up as a single term.
# the base URL, including the "?" that starts the query string
url_base = "http://mytest.org/comments/data?"
# you don't need to ask for the number of search terms here,
# the split below will do the job
# ask for the search terms directly, there must be at least one
# (empty items are dropped so that the check below can actually fail)
a = [t for t in input("Enter all search terms (separate by ,): ").split(",") if t]
while len(a) < 1:
    a = [t for t in input("Enter all search terms (separate by ,): ").split(",") if t]
# ask for the start and end dates, there is no guarantee they are correct,
# so loop until the user has entered exactly two values
dates = input("Enter the start and end date (separate by ,): ").split(",")
while len(dates) != 2:
    dates = input("Enter the start and end date (separate by ,): ").split(",")
# form the query parts
url_search = "&".join([f"terms[]={x}" for x in a])
url_date = "start=" + dates[0] + "&end=" + dates[1]
# the final result: url_base already ends with "?", so only the query parts are joined with "&"
url_final = url_base + "&".join([url_search, url_date])
# print the result
print(url_final)
The output looks like this:
Enter all search terms (separate by ,): ty,ou,io
Enter the start and end date (separate by ,): 2000,2022
http://mytest.org/comments/data?terms[]=ty&terms[]=ou&terms[]=io&start=2000&end=2022
As the author mentioned in a comment, they will use requests to make the API call, so building the URL by hand isn't necessary; you can rely on the module you are already using. requests builds the query string internally if you pass a dict of URL parameters to the params argument (see "Passing Parameters In URLs" in the requests documentation):
import requests
response = requests.get(
    "http://mytest.org/comments/data",
    params={
        "terms[]": ["ty", "ou", "io"],
        "start": 2345,
        "end": 7890,
    },
)
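If you only need the URL string, without making a request, the standard library can build the same query. This is a minimal sketch, assuming the base URL and path from the question; build_url is a name made up for this example. urlencode with doseq=True emits one terms[] pair per list item, and safe="[]" keeps the brackets literal, since urlencode would otherwise percent-encode them as %5B%5D.

```python
from urllib.parse import urlencode, urljoin


def build_url(base_url, terms, start, end):
    """Hypothetical helper: build the search URL from a list of terms and two dates."""
    # doseq=True expands the list into one "terms[]=..." pair per item;
    # safe="[]" leaves the brackets unescaped.
    query = urlencode(
        {"terms[]": terms, "start": start, "end": end},
        doseq=True,
        safe="[]",
    )
    return urljoin(base_url, "/comments/data?" + query)


url = build_url("http://mytest.org", ["ty", "ou", "io"], 2345, 7890)
print(url)
```

Terms containing spaces or other reserved characters are percent-encoded by urlencode, which the hand-written string concatenation in the question does not do.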
One problem is that your code only formats the last piece of the URL. A method call binds more tightly than +, so in
"&start={}&end={}".format(a,start,end)
.format applies to that final string literal only; you need to wrap the whole concatenation in parentheses before calling .format.
The other problem is that the list of terms, a, has to be unpacked in the .format call:
join_url=("/comments/data?"+"terms[]={}"+"&terms[]={}"*int(num_terms-1)+"&start={}&end={}").format(*a,start,end)
But I'd recommend using f-strings instead of .format:
join_url=("/comments/data?"+'&'.join([f"terms[]={term}" for term in a])+f"&start={start}&end={end}")
(I also used str.join for the terms instead of string multiplication.)
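To make the precedence point concrete, here is a small sketch. build_path is a hypothetical helper, not part of the question's code; it creates one placeholder per term and unpacks the list with *terms.

```python
# A method call binds tighter than "+", so .format only sees the last literal.
unparenthesized = "terms[]={}" + "&start={}".format("2345")
# With parentheses, .format sees the whole concatenated template.
parenthesized = ("terms[]={}" + "&start={}").format("ty", "2345")


def build_path(terms, start, end):
    """Hypothetical helper: one "terms[]={}" placeholder per term, filled by unpacking."""
    template = (
        "/comments/data?"
        + "&".join(["terms[]={}"] * len(terms))
        + "&start={}&end={}"
    )
    return template.format(*terms, start, end)


print(unparenthesized)  # the first placeholder is left unfilled
print(parenthesized)
print(build_path(["ty", "ou", "io"], "2345", "7890"))
```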
A simple for loop should suffice:
terms = ""
for i in range(num_terms):
    terms += f"terms[]={a[i]}&"
Basically, each {} placeholder in format takes a single argument; passing the list a fills one placeholder with the whole list rather than iterating over it, as you wanted. The loop above is a simple way to achieve your goal. You could use a list comprehension as well:
[f"terms[]={term}" for term in a]
Output:
Enter total number of search terms:3
Enter all search terms: au,io,ua
Enter start and end date233 444
http://mytest.org/comments/data?terms[]=au&terms[]=io&terms[]=ua&&start=233&end=444
I'm writing a web-scraping program and need to bulk search on FedEx. To do this I concatenate all my tracking numbers with "\n" between them, which should be equivalent to pasting text from an Excel column.
The issue is that when I send my string to their search box, the numbers are entered as if there were no delimiter, so the search box sees one long tracking number rather than several (as it would if pasted from Excel). Any idea how I can get the string formatted, or sent to the search box, correctly?
This is what it looks like when I paste 2 tracking numbers, 12345 and abcdefg:
And here's what it should look like:
Here is my code for sending the string to the search box:
def fedex_bulk(tn_list):
    # you can mostly ignore until the end of this function, all this is setup for the driver #
    # all relevant formatting is in the creating of variable search_str #
    driver = start_uc()
    loaded = False
    size=3
    tn_list = [tn_list[i:i+size] if i+size <= len(tn_list) else tn_list[i:len(tn_list)] for i in range(0,len(tn_list),size)]
    tn_dict = []
    for sublist in tn_list:
        tries = 0
        ### concatenate all tracking numbers with "\n" delimiter
        search_str = ''
        for tn in sublist:
            search_str+=tn+'\n'
        ### loop until numbers searched or tried 4 times
        while not loaded:
            try:
                if tries==4:
                    break
                tries+=1
                ### refresh until loaded
                driver.get("https://www.fedex.com/en-us/tracking.html")
                page_loaded = False
                while not page_loaded:
                    try:
                        inputform = driver.find_element(By.XPATH, "//input[@class='form-input__element ng-pristine ng-invalid ng-touched']")
                        page_loaded = True
                    except:
                        driver.refresh()
                        sleep(5)
                ### search_str sent to search box, formatted incorrectly
                inputform.send_keys(search_str)
                sleep(1)
                driver.find_element(By.XPATH, "//button[@type = 'submit']").click()
Thank you in advance!
I think the problem is the following:
Inside the for sublist in tn_list: loop you append '\n' to each tracking number tn in the sublist, so search_str contains the tracking numbers concatenated with '\n' between them.
But inside the while not page_loaded: loop you locate only the first input field, and you then send that whole long string, containing multiple tracking numbers, to it.
The search input element on the page probably accepts only valid input characters, so it simply ignores all the '\n' characters.
Also, you are not entering your tracking numbers into the other search input fields, as shown in your picture of how it should look.
So, to make your code work as you want, you will probably need to either submit a single tracking number at a time or enter each number into its own search input field.
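The second option could be sketched roughly as follows. This is only an illustration under assumptions: it presumes the page exposes one input element per tracking number and that you have already located them (for example as a list of elements from the driver); the selector for those fields is not known here. fill_tracking_fields and FakeField are names made up for the example, and FakeField merely stands in for a located element so the logic can be run without a browser.

```python
def fill_tracking_fields(fields, tracking_numbers):
    """Send one tracking number to each input field; return the numbers left over."""
    for field, number in zip(fields, tracking_numbers):
        # each field receives a single number, so no "\n" delimiter is needed
        field.send_keys(number)
    return list(tracking_numbers[len(fields):])


class FakeField:
    """Hypothetical stand-in for a located input element (demonstration only)."""

    def __init__(self):
        self.typed = ""

    def send_keys(self, text):
        self.typed += text


fields = [FakeField(), FakeField()]
leftover = fill_tracking_fields(fields, ["12345", "abcdefg", "zzz"])
print([f.typed for f in fields], leftover)
```

Numbers returned as leftover did not fit into the available fields and would have to go into a later search.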
I have written this code to fetch a random anime quote from the given website.
Could someone help me make the code check the number of words in the quote and validate it accordingly?
E.g. if the quote has just 3 words, I want it to try again.
Also, if the quote has more than 20 words, I want it to try again.
I tried writing a while loop for it, but it wasn't working. Any answers would be appreciated.
url = "https://animechan.vercel.app/api/random"
data = requests.get(url).json()
anime = data["anime"]
quote =data["quote"]
character =data["character"]
print("Anime : "+anime)
print(quote)
print(" '"+character+"'")
The while loop solution I came up with:
quote = ""
word_list = quote.split()
number = len(word_list)
while number<=3 and number>=15:
    url = "https://animechan.vercel.app/api/random"
    data = requests.get(url).json()
    anime = data["anime"]
    quote =data["quote"]
    character =data["character"]
print("Anime : "+anime)
print(quote)
print(" '"+character+"'")
I think this might be what you're looking for (I'm assuming you wanted to stop on the first quote you found that was between 3 and 15 words long):
quote = ""
word_list = quote.split()
number = len(word_list)
while number<=3 or number>=15:
    url = "https://animechan.vercel.app/api/random"
    data = requests.get(url).json()
    anime = data["anime"]
    quote =data["quote"]
    character =data["character"]
    word_list = quote.split()
    number = len(word_list)
print("Anime : "+anime)
print(quote)
print(" '"+character+"'")
I only made a couple of changes to your original:
Your while loop needed a change from and to or (the length can never be both less than or equal to 3 and greater than or equal to 15, so the loop never got the chance to run)
You need to recalculate the value of number each time you get a new value from the API
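The same retry logic can be sketched without the network, which makes it easier to check. The bounds here are an assumption taken from the question's wording (retry on 3 words or fewer, and on more than 20); word_count_ok, first_acceptable_quote and the max_tries limit are names and choices made up for this example, and fetch_quote stands in for the requests call.

```python
def word_count_ok(quote, min_words=4, max_words=20):
    """True if the quote has between min_words and max_words words, inclusive."""
    return min_words <= len(quote.split()) <= max_words


def first_acceptable_quote(fetch_quote, max_tries=10):
    """Call fetch_quote until it returns an acceptable quote, or give up with None."""
    for _ in range(max_tries):
        quote = fetch_quote()
        # the word count is recomputed for every new quote
        if word_count_ok(quote):
            return quote
    return None


# Canned quotes stand in for the API responses.
samples = iter(["Too short.", "This sample quote has exactly seven words.", "unused"])
picked = first_acceptable_quote(lambda: next(samples))
print(picked)
```

Capping the number of attempts avoids looping forever if the API keeps returning quotes outside the range.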
Your while loop needs to use or instead of and, and the last condition should be number>=20:
while number<=3 or number>=20:
Keep in mind, though, that you don't update number inside the loop body, so the condition will never change, even with the corrected number<=3 or number>=20 condition.
Finally, this is the solution, in case someone needs the same thing:
import requests
number = 0
while number<=3 or number>=30:
    url = "https://animechan.vercel.app/api/random"
    data = requests.get(url).json()
    anime = data["anime"]
    quote =data["quote"]
    character =data["character"]
    # every attempt is printed; the loop stops after the first quote with an acceptable word count
    print("Anime : "+anime)
    print(quote)
    print(" '"+character+"'")
    word_list = quote.split()
    number = len(word_list)
    print(number)
As the title says, how do I check whether an input string consists only of letters from another string in Python?
The string specifies that only the letters A to G ('ABCDEFG') may be used in the input string. However, my attempt does not give the results I want: input strings whose letters appear in order, such as 'ABC' and 'ABCD', are accepted, while those not in order, such as 'BADD' and 'EFEG', are rejected.
Please refer to my attempt below.
ID = 'ABCDEFG'
addcode=input('Enter new product code: ')
Code = []
if addcode in ID:
    Code.append(addcode)
    print(Code)
else:
    print("Product code is invalid")
Ideally, as long as the input string contains only letters from A to G, it should be appended to Code regardless of the order. How do I modify my code to get the results I want? Thank you.
You can use a regular expression. re.fullmatch requires every character of the string to be in the allowed range:
re.fullmatch('[A-G]+', string)
You can convert your input string (addcode) to a set and check whether it is a subset of ID. ID itself does not need to be converted to a set, because issubset accepts any iterable, including a string:
ID = 'ABCDEFG'
addcode = input('Enter new product code: ')
Code = []
if set(addcode).issubset(ID):
    Code.append(addcode)
    print(Code)
else:
    print("Product code is invalid")
If you want to use a RegEx-based approach, you can do this:
import re
pattern = re.compile("^[A-G]+$")
addcode = input('Enter new product code: ')
Code = []
if pattern.findall(addcode):
    Code.append(addcode)
    print(Code)
else:
    print("Product code is invalid")
Here we check whether the input string contains only characters from A to G, i.e. A, B, C, D, E, F, G. If it does, we append the input string and print it.
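The difference between the substring test in the question and the two checks above can be seen side by side in a short sketch. The function names are made up for this example, and rejecting the empty string is an assumption (an empty set is a subset of everything, so it has to be excluded explicitly).

```python
import re

ALLOWED = "ABCDEFG"


def is_valid_by_set(code, allowed=ALLOWED):
    """Every character of code must occur in allowed; the empty string is rejected."""
    return bool(code) and set(code).issubset(allowed)


def is_valid_by_regex(code):
    """Same rule expressed as a pattern: one or more characters from A to G."""
    return re.fullmatch(r"[A-G]+", code) is not None


# "in" on two strings is a substring test, which is why order mattered in the question.
print("ABC" in ALLOWED, "BADD" in ALLOWED)
print(is_valid_by_set("BADD"), is_valid_by_regex("BADD"))
print(is_valid_by_set("ABH"), is_valid_by_regex("ABH"))
```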
The in operator on two strings tests for a substring, so you should instead check whether each letter of the product code is present in ID.
To make that explicit, you can store ID as a tuple of letters instead of a single string.
ID = ('A','B','C','D','E','F','G')
addcode=input('Enter new product code: ')
Code = []
for letter in addcode:
    if letter not in ID:
        print("Product code is invalid")
        break
else:
    # no invalid letter was found, so the whole code is accepted
    Code.append(addcode)
    print(Code)
I'm currently trying to count the number of times each date occurs within a chat log. For example, the file I'm reading from may look something like this:
*username* (mm/dd/yyyy hh:mm:ss): *message here*
However, I need to split the date from the time, as I currently treat them as one value. I'm struggling to solve this, so any help is appreciated. Below is some sample code that I'm using to try to get the date count working. I'm currently using a Counter, but I'm wondering whether there are other ways to count dates.
filename = tkFileDialog.askopenfile(filetypes=(("Text files", "*.txt") ,))
mtxtr = filename.read()
date = []
number = []
occurences = Counter(date)
mtxtformat = mtxtr.split("\r\n")
print 'The Dates in the chat are as follows'
print "--------------------------------------------"
for mtxtf in mtxtformat:
    participant = mtxtf.split("(")[0]
    date = mtxtf.split("(")[-1]
    message = date.split(")")[0]
    date.append(date1.strip())
for item in date:
    if item not in number:
        number.append(item)
for item in number:
    occurences = date.count(item)
    print("Date Occurences " + " is: " + str(occurences))
The easiest way would be to use a regex and count the matches of the date pattern in the log file. It would be faster, too.
If you know the date and time will be enclosed in the first pair of parentheses on each line (i.e. no other (...) appears before the one containing the date and time):
*username* (mm/dd/yyyy hh:mm:ss): *message here*
Then you can extract based on the parens:
import re
...
parens = re.compile(r'\((.+)\)')
for mtxtf in mtxtformat:
    match = parens.search(mtxtf)
    date.append(match.group(1).split(' ')[0])
...
Note: If the message itself contains parens, this may match more than just the needed (mm/dd/yyyy hh:mm:ss). Doing match.group(1).split(' ')[0] would still give you the information you are looking for assuming there is no information enclosed in parens before your date-time information (for the current line).
Note2: Ideally enclose this in a try-except to continue on to the next line if the current line doesn't contain useful information.
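Putting the extraction and the counting together, a minimal sketch might look like this. It assumes the log format shown above; the sample lines, the stricter pattern and the name count_dates are made up for illustration, and lines without a timestamp are skipped with a check for None rather than a try-except.

```python
import re
from collections import Counter

# Captures mm/dd/yyyy from the first "(mm/dd/yyyy hh:mm:ss)" group on a line.
DATE_IN_PARENS = re.compile(r"\((\d{2}/\d{2}/\d{4}) \d{2}:\d{2}:\d{2}\)")


def count_dates(lines):
    """Count how many chat lines fall on each date."""
    counts = Counter()
    for line in lines:
        match = DATE_IN_PARENS.search(line)
        if match:  # lines without a timestamp are ignored
            counts[match.group(1)] += 1
    return counts


sample = [
    "alice (01/02/2015 10:15:00): hi (again)",
    "bob (01/02/2015 10:16:30): hello",
    "alice (01/03/2015 09:00:00): morning",
    "not a chat line",
]
counts = count_dates(sample)
for day, n in sorted(counts.items()):
    print(day, n)
```

Because the pattern spells out the timestamp layout, parentheses inside the message itself do not affect the match.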
I'm working on a program that lets the user enter a sequence they want to find inside a FASTA file, after which the program shows the description line and the sequence that belongs to it.
The FASTA can be found at hugheslab.ccbr.utoronto.ca/supplementary-data/IRC/IRC_representative_cdna.fa.gz, it's approx. 87 MB.
The idea is to first create a list with the location of description lines, which always start with a >. Once you know what are the description lines, you can search for the search_term in the lines between two description lines. This is exactly what is done in the fourth paragraph, this results in a list of 48425 long, here is an idea of what the results are: http://imgur.com/Lxy8hnI
Now the fifth paragraph is meant to search between two description lines, let's take lines 0 and 15 as example, this would be description_list[a] and description_list[a+1] as a = 0 and a+1 = 1, and description_list[0] = 0 and description_list[1] = 15. Between these lines the if-statement searches for the search term, if it finds one it will save description_list[a] into the start_position_list and description_list[a+1] into the stop_position_list, which will be used later on.
So, as you can imagine, a simple term like 'ATCG' will occur often, which means start_position_list and stop_position_list will contain a lot of duplicates. These are removed using list(set(start_position_list)), and the lists are sorted afterwards. That way start_position_list[0] and stop_position_list[0] will be 0 and 15, like this: http://imgur.com/QcOsuhM, which can then be used as the range of lines to print out to show the sequence.
Now, of course, the big issue is that line 15, for i in range(description_list[a], description_list[a+1]):, will eventually reach [a+1] when a is already the last index of description_list, and will therefore give a list index out of range error, as you can see here as well: http://imgur.com/hi7d4tr
What would be the best solution for this? It's still necessary to go through all the description lines, and I can't come up with a better structure to go through them all.
file = open("IRC_representative_cdna.fa")
file_list = list(file)
search_term = input("Enter your search term: ")
description_list = []
start_position_list = []
stop_position_list = []
for x in range (0, len(file_list)):
    if ">" in file_list[x]:
        description_list.append(x)
for a in range(0, len(description_list)):
    for i in range(description_list[a], description_list[a+1]):
        if search_term in file_list[i]:
            start_position_list.append(description_list[a])
            stop_position_list.append(description_list[a+1])
The way to avoid the subscript out of range error is to shorten the loop. Replace the line
for a in range(0, len(description_list)):
by
for a in range(0, len(description_list)-1):
Note that this skips the last record, since no description line follows it.
Also, I think that you can use a list comprehension to build up description_list:
description_list = [x for x, line in enumerate(file_list) if line.startswith('>')]
In addition to being shorter, it is more efficient, since it doesn't do a linear search over the entire line when only the first character is relevant.
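An alternative to shortening the loop is to give the last record an explicit stop position, namely the end of the file. This is a minimal sketch on a tiny made-up sample rather than the real FASTA file; record_ranges and records_containing are hypothetical names. As in the question, the search is done line by line, so a term that spans a line break is not found, and unlike the question's loop the description line itself is not searched.

```python
def record_ranges(lines):
    """Return (start, stop) line-index pairs, one per FASTA record."""
    starts = [i for i, line in enumerate(lines) if line.startswith(">")]
    # each record stops where the next one starts; the last one stops at end of file
    stops = starts[1:] + [len(lines)]
    return list(zip(starts, stops))


def records_containing(lines, search_term):
    """Return the (start, stop) pairs of records with search_term on a sequence line."""
    hits = []
    for start, stop in record_ranges(lines):
        # start + 1 skips the description line itself
        if any(search_term in lines[i] for i in range(start + 1, stop)):
            hits.append((start, stop))
    return hits


sample = [">seq1\n", "AAAT\n", "CGGG\n", ">seq2\n", "TTTT\n", ">seq3\n", "GATCGA\n"]
print(record_ranges(sample))
print(records_containing(sample, "ATCG"))
```

Each record is appended at most once, so the set-and-sort step for removing duplicates is no longer needed.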
Here is a solution that uses the biopython package, thus saving you the headache of parsing interleaved FASTA yourself:
from Bio import SeqIO
file = open("IRC_representative_cdna.fa")
search_term = input("Enter your search term: ")
for record in SeqIO.parse(file, "fasta"):
    rec_seq = record.seq
    if search_term in rec_seq:
        print(record.id)
        print(rec_seq)
It wasn't very clear to me what your desired output is, but this code can easily be changed to fit it.