So what I want to do: I have a list of URLs with multiple parameters, such as:
https://www.somesite.com/path/path2/path3?param1=value1&param2=value2
and what I want to get is something like this:
https://www.somesite.com/path/path2/path3?param1=PAYLOAD&param2=value2
https://www.somesite.com/path/path2/path3?param1=value1&param2=PAYLOAD
That is, I want to iterate through every parameter (basically every match of "=" and "&") and replace each value, one at a time. Thank you in advance.
from urllib.parse import urlparse
import re

urls = ["https://www.somesite.com/path/path2/path3?param1=value1&param2=value2&param3=value3",
        "https://www.anothersite.com/path/path2/path3?param1=value1&param2=value2&param3=value3"]
parseds = [urlparse(url) for url in urls]
newurls = []
for parsed in parseds:
    params = parsed.query.split("&")
    for i, param in enumerate(params):
        # replace everything from the first "=" onward with "=PAYLOAD"
        newparam = re.sub("=.+", "=PAYLOAD", param)
        newurls.append(
            parsed.scheme + "://" + parsed.netloc + parsed.path
            + "?" + parsed.query.replace(param, newparam)
        )
newurls is then:
['https://www.somesite.com/path/path2/path3?param1=PAYLOAD&param2=value2&param3=value3',
'https://www.somesite.com/path/path2/path3?param1=value1&param2=PAYLOAD&param3=value3',
'https://www.somesite.com/path/path2/path3?param1=value1&param2=value2&param3=PAYLOAD',
'https://www.anothersite.com/path/path2/path3?param1=PAYLOAD&param2=value2&param3=value3',
'https://www.anothersite.com/path/path2/path3?param1=value1&param2=PAYLOAD&param3=value3',
'https://www.anothersite.com/path/path2/path3?param1=value1&param2=value2&param3=PAYLOAD']
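A sketch of the same idea built on urllib.parse.parse_qsl() and urlencode() instead of str.replace(). One reason to prefer it: replace() swaps every occurrence of the substring, so in a query like a=1&ba=1, replacing a=1 also hits ba=1. The function name payload_variants is made up for this example.

```python
from urllib.parse import urlsplit, urlunsplit, parse_qsl, urlencode


def payload_variants(url, payload="PAYLOAD"):
    """Return one URL per query parameter, with only that parameter's value replaced."""
    parts = urlsplit(url)
    # keep_blank_values=True so parameters like "a=" are not silently dropped
    pairs = parse_qsl(parts.query, keep_blank_values=True)
    variants = []
    for i in range(len(pairs)):
        # copy the pairs and swap in the payload at position i only
        modified = list(pairs)
        modified[i] = (modified[i][0], payload)
        variants.append(urlunsplit(parts._replace(query=urlencode(modified))))
    return variants


urls = [
    "https://www.somesite.com/path/path2/path3?param1=value1&param2=value2&param3=value3",
    "https://www.anothersite.com/path/path2/path3?param1=value1&param2=value2&param3=value3",
]
newurls = [variant for url in urls for variant in payload_variants(url)]
for u in newurls:
    print(u)
```

Note that urlencode() percent-encodes special characters in the payload (and turns spaces into +), so if the raw payload must be sent verbatim, join the pairs with "&" yourself instead.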
I've solved it:
from urllib.parse import urlparse

url = "https://github.com/search?p=2&q=user&type=Code&name=djalel"
parsed = urlparse(url)
params = parsed.query.split("&")
new_query = []
for i, param in enumerate(params):
    # copy the list so each variant replaces exactly one value
    # (params.index() would pick the wrong slot if a parameter repeats)
    modified = list(params)
    modified[i] = param.split("=")[0] + "=PAYLOAD"
    new_query.append("&".join(modified))
for query in new_query:
    print(parsed.scheme + "://" + parsed.netloc + parsed.path + "?" + query)
Output:
https://github.com/search?p=PAYLOAD&q=user&type=Code&name=djalel
https://github.com/search?p=2&q=PAYLOAD&type=Code&name=djalel
https://github.com/search?p=2&q=user&type=PAYLOAD&name=djalel
https://github.com/search?p=2&q=user&type=Code&name=PAYLOAD
Related
I have a string https://www.exampleurl.com/
How would I insert a word in the middle of the string so it looks like this: https://www.subdomain.exampleurl.com/
I know I can insert the word like this:
url = 'https://www.exampleurl.com/'
url[:12] + 'subdomain'
That gives me https://www.subdomain, but I can't figure out how to append the rest of the string dynamically so it adjusts to whatever subdomain is being inserted.
My goal is for the end result to look like this: https://www.subdomain.exampleurl.com/
url = 'https://www.exampleurl.com/'
content = url.split("www.")
url = content[0] + "www." + "subdomain." + content[1]
url = 'https://www.exampleurl.com/'
text = url.split(".")
url = text[0] + '.subdomain.' + text[1] + '.' + text[2]
Final output: https://www.subdomain.exampleurl.com/
Better: split on the first "." only:
l = url.split('.', 1)
l[0] + '.subdomain.' + l[1]
## OR if subdomain is a variable:
f'{l[0]}.{subdomain}.{l[1]}'
output: 'https://www.subdomain.exampleurl.com/'
Using replace() on the first occurrence only:
url = 'https://www.exampleurl.com/'
url = url.replace(".", ".subdomain.", 1)  # only replaces the first "." to get the desired result
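The split/replace answers above all rely on the first "." being in the host. A sketch that edits only the host part (netloc) via urllib.parse.urlsplit(), keeping a leading www. in front as the question wants. The helper add_subdomain is hypothetical, and it assumes the netloc contains no user:password@ prefix.

```python
from urllib.parse import urlsplit, urlunsplit


def add_subdomain(url, subdomain):
    """Insert subdomain into the host, after a leading 'www.' if there is one."""
    parts = urlsplit(url)
    host = parts.netloc
    if host.startswith("www."):
        new_host = "www." + subdomain + "." + host[len("www."):]
    else:
        new_host = subdomain + "." + host
    # only the netloc changes; path, query and fragment are kept as they were
    return urlunsplit(parts._replace(netloc=new_host))


print(add_subdomain("https://www.exampleurl.com/", "subdomain"))
print(add_subdomain("https://exampleurl.com/a.b/c", "subdomain"))
```

Because only the netloc is touched, dots in the path (like a.b above) are never mistaken for the split point.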
I have been searching online for a long time with no luck. Please help, or give me some ideas on how to achieve this.
When I run print(html.text), I get all the JSON data:
[{"Site":"屏東(琉球)","county":"屏東縣","PM25":"6","DataCreationDate":"2020-04-19 03:00","ItemUnit":"μg/m3"},
{"Site":"臺南(北門)","county":"臺南市","PM25":"25","DataCreationDate":"2020-04-19 03:00","ItemUnit":"μg/m3"}, ...
But if I loop with for Site in jsondata: ..., I only get one record:
SITE:基隆CYTY:基隆市P25:21DATE:2020-04-19 14:00UNIT:μg/m3
Why? Thank you sincerely for your answers.
import json
import requests

url1 = 'https://opendata.epa.gov.tw/ws/Data/ATM00625/?$format=json'
html = requests.get(url1)
# html.encoding = "BIG5"
html.encoding = html.apparent_encoding
# print(html.text)
jsondata = eval(html.text)
# jsondata = json.loads(html.text)
for Site in jsondata:
    Sitename = Site["Site"]
    countyname = Site["county"]
    PM25name = Site["PM25"]
    DataCreationDatename = Site["DataCreationDate"]
    ItemUnitname = Site["ItemUnit"]
print("SITE:" + Sitename + "CYTY:" + countyname + "P25:" + PM25name + "DATE:" + DataCreationDatename + "UNIT:" + ItemUnitname)
Never use eval() to parse JSON when json.loads() is available:
import json
import requests

url1 = 'https://opendata.epa.gov.tw/ws/Data/ATM00625/?$format=json'
html = requests.get(url1)
# a requests Response has no .decode(); decode its raw bytes (.content) instead
text = html.content.decode('utf8')
jsondata = json.loads(text)
for Site in jsondata:
    Sitename = Site["Site"]
    countyname = Site["county"]
    PM25name = Site["PM25"]
    DataCreationDatename = Site["DataCreationDate"]
    ItemUnitname = Site["ItemUnit"]
    # print inside the loop so every record is printed, not just the last one
    print("SITE:" + Sitename + "CYTY:" + countyname + "P25:" + PM25name + "DATE:" + DataCreationDatename + "UNIT:" + ItemUnitname)
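The indentation of both snippets above was lost, so this is an assumption: a common cause of seeing only one record is a print call placed after the for loop instead of inside it, which prints only the variables left over from the last iteration. The sketch below parses two records copied from the question with json.loads() (no network request) and prints inside the loop. With requests, Response.json() would also parse the body directly.

```python
import json

# Two records copied from the question's output, used here instead of a live request.
sample_text = """[
 {"Site": "屏東(琉球)", "county": "屏東縣", "PM25": "6", "DataCreationDate": "2020-04-19 03:00", "ItemUnit": "μg/m3"},
 {"Site": "臺南(北門)", "county": "臺南市", "PM25": "25", "DataCreationDate": "2020-04-19 03:00", "ItemUnit": "μg/m3"}
]"""

records = json.loads(sample_text)  # a list of dicts, one per monitoring site
lines = []
for site in records:
    # build and print the line inside the loop so every record is shown
    line = ("SITE:" + site["Site"] + " COUNTY:" + site["county"] + " PM25:" + site["PM25"]
            + " DATE:" + site["DataCreationDate"] + " UNIT:" + site["ItemUnit"])
    lines.append(line)
    print(line)
```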
The problem lies somewhere in how I'm parsing and or reassembling urls. I'm losing the ?id=1 and getting ?d=1.
What I am trying to do is manipulate any query parameter and reassemble the URL before sending it back out modified. That is, I would modify the dictionary and then use urlencode(modified_dict) to reassemble url + query.
Can someone give me a pointer on what I'm doing wrong here.
from urlparse import parse_qs, urlparse, urlsplit
from urllib import urlencode
import os
import sys
import mechanize
from collections import OrderedDict
import urllib2

scrape_post_urls = []
get_inj_tests = []

#check multiple values to strip out duplicate and useless checks
def parse_url(url):
    parsed = urlparse(url,allow_fragments=False)
    if parsed.query:
        if url not in get_inj_tests:
            get_inj_tests.append(url)
            #print url
            '''get_inj_tests.append(url)
            print url
            #print 'scheme  :', parsed.scheme
            #print 'netloc  :', parsed.netloc
            print 'path    :', parsed.path
            print 'params  :', parsed.params
            print 'query   :', parsed.query
            print 'fragment:', parsed.fragment
            #print 'hostname:', parsed.hostname, '(netloc in lower case)'
            #print 'port    :', parsed.port
            '''
    else:
        if url not in scrape_post_urls:
            scrape_post_urls.append(url)
            #print url

def main():
    unparsed_urls = open('in.txt','r')
    for urls in unparsed_urls:
        try:
            parse_url(urls)
        except:
            pass

    print(len(scrape_post_urls))
    print(len(get_inj_tests))
    clean_list = list(OrderedDict.fromkeys(get_inj_tests))
    reaasembled_url = ""
    #print clean_list
    for query_test in clean_list:
        url_object = urlparse(query_test,allow_fragments=False)
        #parse query parameters
        url = query_test.split("?")[1]
        dicty = {x[0] : x[1] for x in [x.split("=") for x in url[1:].split("&") ]}
        query_pairs = [(k,v) for k,vlist in dicty.iteritems() for v in vlist]
        reaasembled_url = "http://" + str(url_object.netloc) + str(url_object.path) + '?'
        reaasembled_query = urlencode(query_pairs)
        full_url = reaasembled_url + reaasembled_query
        print dicty

main()
Well, quite simply, you're not using the existing tools:
1/ to parse a query string, use urllib.parse.parse_qsl() (urlparse.parse_qsl() in Python 2).
2/ to reassemble the query string, use urllib.parse.urlencode() (urllib.urlencode() in Python 2).
And forget about dicts: query strings can have multiple values for the same key, i.e. ?foo=1&foo=2 is perfectly valid.
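A minimal Python 3 sketch of that round trip; the example.com URL and the new value "X" are illustrative only. Using a list of pairs rather than a dict keeps both foo values and their order.

```python
from urllib.parse import urlsplit, urlunsplit, parse_qsl, urlencode

url = "http://example.com/page?id=1&foo=1&foo=2"
parts = urlsplit(url)

# parse_qsl keeps every (key, value) pair in order, including repeated keys
pairs = parse_qsl(parts.query, keep_blank_values=True)

# modify a value: here every "foo" gets the new value "X"
pairs = [(k, "X" if k == "foo" else v) for k, v in pairs]

# urlencode turns the pairs back into a query string; urlunsplit rebuilds the URL
new_url = urlunsplit(parts._replace(query=urlencode(pairs)))
print(new_url)
```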
First of all, url is a misleading name for a variable that holds only the query string, and it can cause confusion:
>>> url = "https://url.domian.com?id=22&param1=1&param2=2".split("?")[1]
>>> url
'id=22&param1=1&param2=2'
>>> url.split("&")
['id=22', 'param1=1', 'param2=2']
The error is in url[1:].split("&"): split("?")[1] has already removed the "?", so url[1:] cuts off the first character of the query string, which is how id=... turns into d=...
Solution:
>>> dicty = {x[0] : x[1] for x in [x.split("=") for x in url.split("&") ]}
>>> dicty
{'id': '22', 'param1': '1', 'param2': '2'}
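The symptom can be reproduced with a made-up query string that starts with id=, comparing url[1:] with url:

```python
query = "id=1&param1=1"

# Slicing with [1:] drops the first character of the query string ("i")
buggy = dict(pair.split("=") for pair in query[1:].split("&"))
fixed = dict(pair.split("=") for pair in query.split("&"))

print(buggy)  # {'d': '1', 'param1': '1'}
print(fixed)  # {'id': '1', 'param1': '1'}
```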
NOTE: The URL is not fixed, so I can't count on it always being this exact URL. I want code that works for any URL.
For example:
http://januapp.com/demo/search.php?search=aaa
http://januapp.com/demo/search.php?other=aaa
Now I want to change them to:
http://januapp.com/demo/search.php?search=bbb
http://januapp.com/demo/search.php?other=bbb
I don't know how to do it.
I tried this:
import optparse
import requests
import urlparse

parser = optparse.OptionParser()
parser.add_option("-t","--Host", dest="Target", help="Please provide the target", default="true")
options, args = parser.parse_args()
url = options.Target
xss = []
xss.append("bbb")
try:
    url2 = urlparse.urlparse(url)
    print url2
    url3 = urlparse.parse_qs(url2.query)
    parametervalue = [key for key, key in url3.iteritems()] #[['aaa']]
    parsed = parametervalue.append(xss[0])
    print parsed
    finalurl = urljoin(url, parsed)
    print finalurl
except Exception as e:
    print e
So when I run:
xss3.py -t http://januapp.com/demo/search.php?search=aaa
I get the following output in the terminal:
ParseResult(scheme='http', netloc='januapp.com', path='/demo/search.php', params='', query='search=aaa', fragment='')
None
name 'urljoin' is not defined
Note the None: that's the problem.
I am using Python 2.7.
Thank you very much; I hope the problem is clear.
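Two things go wrong in the code above: list.append() modifies the list in place and returns None, so parsed is None; and urljoin was never imported (in Python 2.7 it lives in the urlparse module, as urlparse.urljoin). urljoin also resolves relative links rather than rebuilding a query string. A sketch in Python 3 (the 2.7 equivalents are urlparse.urlparse/parse_qs/urlunparse and urllib.urlencode) that replaces the value of every parameter, whatever its name; replace_all_values is a hypothetical helper name.

```python
from urllib.parse import urlparse, urlunparse, parse_qs, urlencode


def replace_all_values(url, new_value):
    """Return url with every query parameter's value set to new_value."""
    parsed = urlparse(url)
    params = parse_qs(parsed.query, keep_blank_values=True)  # e.g. {'search': ['aaa']}
    # build a new dict instead of relying on list.append(), which returns None
    new_params = {key: [new_value] * len(values) for key, values in params.items()}
    # doseq=True expands each list into repeated key=value pairs
    new_query = urlencode(new_params, doseq=True)
    return urlunparse(parsed._replace(query=new_query))


print(replace_all_values("http://januapp.com/demo/search.php?search=aaa", "bbb"))
print(replace_all_values("http://januapp.com/demo/search.php?other=aaa", "bbb"))
```

Because parse_qs groups repeated keys together, the order of interleaved keys (a=1&b=2&a=3) may change; parse_qsl with a list of pairs avoids that if order matters.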
You can try an approach like this:
url = 'http://januapp.com/demo/search.php?search=aaa'
# First get all your query params
base_url, params = url.split('?', 1)  # base_url: 'http://januapp.com/demo/search.php', params: 'search=aaa'
# Now separate out all the query parameters and their values
param_value_dict = {}  # {'search': 'aaa'}
for pair in params.split('&'):
    # partition splits on the first '=' only: 'search=aaa' -> ('search', '=', 'aaa')
    key, _, value = pair.partition('=')
    param_value_dict[key] = value
# now if you want to change the value of search from 'aaa' to 'bbb', just change it in the dictionary
param_value_dict['search'] = 'bbb'
# now build the new url from the dictionary, joining the pairs with '&'
new_url = base_url + '?' + '&'.join(name + '=' + val for name, val in param_value_dict.items())
print(new_url)
How about:
ext = "bbb"
a = "http://januapp.com/demo/search.php?search="
print a+ext
Here ext is the value you want to search for and a is the link; just concatenate them.
Or you could replace values like this:
ext = "bbb"
a = "http://januapp.com/demo/search.php?search=aaa"
print a.replace('aaa', ext)
Using regex:
import re
ext = "bbb"
a = "http://januapp.com/demo/search.php?search=aaa"
b=re.compile(r".+search=")
print re.search(b,a).group()+ext
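The regex above only works for a parameter named search. A hedged generalization for Python 3 (on 2.7, urlsplit/urlunsplit live in the urlparse module): apply re.sub() to the query part only and replace every value. regex_replace_values is a hypothetical name; the new value is inserted as-is, not URL-encoded.

```python
import re
from urllib.parse import urlsplit, urlunsplit


def regex_replace_values(url, new_value):
    """Replace the value of every key=value pair in the query string using a regex."""
    parts = urlsplit(url)
    # "=[^&]*" matches the "=" and everything up to the next "&" (or the end)
    # a function replacement keeps backslashes in new_value from being read as escapes
    new_query = re.sub(r"=[^&]*", lambda m: "=" + new_value, parts.query)
    return urlunsplit(parts._replace(query=new_query))


print(regex_replace_values("http://januapp.com/demo/search.php?other=aaa", "bbb"))
```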
I am able to do the following to build a url:
base_url = 'http://google.com/'
qs = urllib.urlencode({'q': string})
url = base_url + '?' + qs
Is there a way to url-encode a string? For example, I would like to be able to do:
url = 'http://google.com/?q=' + urlencode('this is my search')
Use urllib.quote or urllib.quote_plus, e.g.:
>>> urllib.quote('this is my string')
'this%20is%20my%20string'
>>> urllib.quote_plus('this is my string')
'this+is+my+string'
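In Python 3 these functions moved to urllib.parse. A sketch of the same calls there, plus urlencode() for a full query:

```python
from urllib.parse import quote, quote_plus, urlencode

print(quote("this is my string"))       # this%20is%20my%20string
print(quote_plus("this is my string"))  # this+is+my+string

# urlencode applies quote_plus to each key and value by default
url = "http://google.com/?" + urlencode({"q": "this is my search"})
print(url)  # http://google.com/?q=this+is+my+search
```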