Loop URL list and save as txt file

I have a URL as follows: https://www.vq.com/36851082/?p=1. I want to create a file named list_of_urls.txt which contains the URLs from p=1 to p=20, separated by spaces, and save it as a txt file.
Here is what I have tried, but it only prints the last one:
url = "https://www.vq.com/36851082/?p="
list_of_urls = []
for page in range(20):
    list_of_urls = url + str(page)
print(list_of_urls)
The expected txt file inside would be like this:

This is a good occasion to use f-strings, available since Python 3.6 and fully described in PEP 498 -- Literal String Interpolation.
url_base = "https://www.vq.com/36851082/?p="
with open('your.txt', 'w') as f:
    for page in range(1, 20 + 1):
        f.write(f'{url_base}{page} ')
        #f.write('{}{} '.format(url_base, page))
        #f.write('{0}{1} '.format(url_base, page))
        #f.write('{u}{p} '.format(u=url_base, p=page))
        #f.write('{u}{p} '.format(**{'u':url_base, 'p':page}))
        #f.write('%s%s '%(url_base, page))
Notice the space character at the end of each formatting expression.

Be careful with range - it starts from 0 by default and the last number of the range is not included. Hence, if you want the numbers 1 to 20 you need to use range(1, 21).
url_template = "https://www.vq.com/36851082/?p={page}"
urls = [url_template.format(page=page) for page in range(1, 21)]
with open("/tmp/urls.txt", "w") as f:
    f.write(" ".join(urls))
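To make the endpoint rule concrete, here is a minimal sketch. `build_urls` and `URL_TEMPLATE` are names made up for this illustration, not something from the question.

```python
URL_TEMPLATE = "https://www.vq.com/36851082/?p={page}"


def build_urls(first_page, last_page):
    """Return one URL per page number, with both endpoints included."""
    # range() excludes its stop value, hence the + 1.
    return [URL_TEMPLATE.format(page=page)
            for page in range(first_page, last_page + 1)]


urls = build_urls(1, 20)
# join() puts a space between items only, so there is no trailing space.
line = " ".join(urls)
```

Joining a list also avoids the extra space that a per-item write of 'url ' leaves at the end of the file.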

Try this :)
url = "https://www.vq.com/36851082/?p="
list_of_urls = ""
for page in range(1, 21):  # pages 1 to 20; range(20) would give 0 to 19
    list_of_urls = list_of_urls + url + str(page) + " "
print(list_of_urls)

Not sure if you want one line inside your file but if so:
url = "https://www.vq.com/36851082/?p=%i"
with open("expected.txt", "w") as f:
    f.write(' '.join([url % i for i in range(1, 21)]))
Output:
https://www.vq.com/36851082/?p=1 https://www.vq.com/36851082/?p=2 https://www.vq.com/36851082/?p=3 https://www.vq.com/36851082/?p=4 https://www.vq.com/36851082/?p=5 https://www.vq.com/36851082/?p=6 https://www.vq.com/36851082/?p=7 https://www.vq.com/36851082/?p=8 https://www.vq.com/36851082/?p=9 https://www.vq.com/36851082/?p=10 https://www.vq.com/36851082/?p=11 https://www.vq.com/36851082/?p=12 https://www.vq.com/36851082/?p=13 https://www.vq.com/36851082/?p=14 https://www.vq.com/36851082/?p=15 https://www.vq.com/36851082/?p=16 https://www.vq.com/36851082/?p=17 https://www.vq.com/36851082/?p=18 https://www.vq.com/36851082/?p=19 https://www.vq.com/36851082/?p=20

This one also works, thanks to my colleague!
url = "https://www.vq.com/36851082/?p=%d"
result = " ".join([url % (x + 1) for x in range(20)])
with open("list_of_urls.txt", "w") as f:
    f.write(result)

Related

How would I insert a word into a string?

I have a string https://www.exampleurl.com/
How would I insert a word in the middle of a string so it could look like this: https://www.subdomain.exampleurl.com/
I know I can insert the word if I did this:
url = 'https://www.exampleurl.com/'
url[:12] + 'subdomain'
It prints me https://www.subdomain, but I can't figure out how to print the rest of the string dynamically so it would adjust to the subdomain that is being appended to the string.
My goal is for the end result to look like the following https://www.subdomain.exampleurl.com/

url = 'https://www.exampleurl.com/'
content = url.split("www.")
url = content[0] + "www." + "subdomain." + content[1]

url = 'https://www.exampleurl.com/'
text = url.split(".")
url = text[0] + '.subdomain.' + text[1] + '.' + text[2]
Final output : https://www.subdomain.exampleurl.com/

Better split on the first .:
l = url.split('.', 1)
l[0] + '.subdomain.' + l[1]
## OR if subdomain is a variable:
f'{l[0]}.{subdomain}.{l[1]}'
output: 'https://www.subdomain.exampleurl.com/'

Using replace (once)
url = 'https://www.exampleurl.com/'
url = url.replace(".", ".subdomain.", 1)  # replaces only the first "." to get the desired result
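A different approach from the string-splitting answers above is to parse the URL first, so that dots in the path cannot interfere. This is only a sketch using the standard library's urllib.parse; `insert_subdomain` is a hypothetical helper name, and the handling of a leading "www." is an assumption based on the example in the question.

```python
from urllib.parse import urlsplit, urlunsplit


def insert_subdomain(url, subdomain):
    """Insert subdomain after a leading 'www.', or at the front of the host."""
    parts = urlsplit(url)
    host = parts.netloc
    if host.startswith("www."):
        host = "www." + subdomain + "." + host[len("www."):]
    else:
        host = subdomain + "." + host
    # SplitResult is a named tuple, so _replace returns a modified copy.
    return urlunsplit(parts._replace(netloc=host))


result = insert_subdomain("https://www.exampleurl.com/", "subdomain")
```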

[UPDATE]Google Search Python Module

I have a problem with the Google search module.
I am trying to use this module to run multiple requests, but I cannot get it to run one query per word.
alpha = input(colored ("[{}*{}] Enter Path of you're Word : ",'yellow'))
word = open(alpha, 'r')
Lines = word.readlines()
query = Lines
try:
    print(colored("[{}+{}] Scan started! Please wait... :)",'red'))
    for gamma in search(query, start=0, tld=beta, num=1000 , pause=2):
        print(colored ('[+] Found > ' ,'yellow') + (gamma) )
        with open("googleurl.txt","a") as f:
            f.write(gamma + "/" + "\n")
except:
    print("[{}-{}] Word Liste not found!")
I think it's not possible to run multiple queries, because my dorks are loaded into my Python program but the queries are not run. If I change
query = "test"
I get about 100 requests for the word test. I think I did something wrong when running the queries from the text file.
I'm sorry for my bad English. I'm a beginner with English and also with Python.
I hope you can help me.
This is my program now:
alpha = input(colored ("[{}*{}] Wordlist : ",'yellow'))
Word = open(alpha, 'r')
Lines = Word.readlines()
query = Lines
beta = random.choice(TLD)
Word_number = 0
for line in Lines:
    Word_number+=1
for query in Lines:
    print("Number of words: "+str(Word_number))
    for i in search(query, start=0, tld=beta, num=1000 , pause=2, stop=None):
        print(colored ('[+] Found > ' ,'yellow') +(i))
        URL_number+=1
        with open("googleurl.txt","a") as f:
            f.write(i + "/" + "\n")
            f.close()
print(colored("[{}+{}] Total Google URL : ",'red') + str(URL_number))
And this is what my program does:
It found only 98 websites and stopped, and it only checked the first word.

word.readlines() returns a list of strings, where each item is the next line in the file. This means that query is a list.
The search() function wants query to be a string, so you'll have to loop through Lines to get each individual query:
for query in Lines:
    # perform search with this query
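One more detail worth noting: each item returned by readlines() still ends with its newline, so it may help to strip the lines and skip blank ones before searching. Below is a rough sketch of that loop. `clean_queries`, `run_all` and `fake_search` are hypothetical names, and `fake_search` is only a stand-in so the example runs without network access; the real search() call from the question would take its place.

```python
def clean_queries(lines):
    """Strip surrounding whitespace from each line and skip blank lines."""
    return [line.strip() for line in lines if line.strip()]


def run_all(lines, search_one):
    """Call search_one once per query and collect every result."""
    found = []
    for query in clean_queries(lines):
        found.extend(search_one(query))
    return found


def fake_search(query):
    # Stand-in for the real search function; returns one made-up URL.
    return ["https://example.com/?q=" + query]


results = run_all(["first\n", "\n", "second\n"], fake_search)
```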

Hey, I finally updated my code, and now I have a problem with proxies.
The code is fixed for requests with dorks, but I can't find how to add a proxy. My code is:
alpha = input(colored ("[{}*{}] Dorklist : ",'yellow'))
dorks = open(alpha, 'r')
Lines = dorks.readlines()
query = Lines
beta = random.choice(TLD)
ceta = input(colored ("[{}*{}] Proxylist :",'yellow'))
prox = open(ceta, 'r')
Lines2 = prox.readlines()
proxy = Lines2
Dorks_number = 0
Proxy_number = 0
for line in Lines:
    Dorks_number+=1
for line in Lines2:
    Proxy_number+=1
print("Number of dorks: "+str(Dorks_number))
print("Number of proxies: "+str(Proxy_number))
s = requests.Session(proxies=proxy)
s.cookies.set_policy(BlockAll())
for query in Lines:
    for i in search(query, start=0, tld=beta, num=1000 , pause=2, stop=None):
        print(colored ('[+] Found > ' ,'yellow') +(i))
        URL_number+=1
        with open("googleurl.txt","a") as f:
            f.write(i + "/" + "\n")
            f.close()
print(colored("[{}+{}] Total Google URL : ",'red') + str(URL_number))
My error:
s = requests.Session(proxies=proxy)
TypeError: __init__() got an unexpected keyword argument 'proxies'
Does anyone have an idea how to do this?
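A note on the TypeError above: as far as I know, requests.Session() takes no constructor arguments, and proxies are instead set afterwards on the session's proxies attribute, which is a dict mapping a scheme to a proxy URL (not a list of lines). The sketch below only builds such a mapping. It assumes one host:port entry per line in the proxy list, which the question does not show; `proxy_mapping` is a hypothetical helper and the addresses are made up.

```python
def proxy_mapping(line):
    """Turn one 'host:port' line into a scheme -> proxy URL dict."""
    address = line.strip()
    if "://" not in address:
        address = "http://" + address  # assumption: plain HTTP proxies
    return {"http": address, "https": address}


lines2 = ["10.0.0.1:8080\n", "10.0.0.2:3128\n"]  # made-up addresses
mappings = [proxy_mapping(line) for line in lines2 if line.strip()]

# With requests installed, one mapping could then be applied like this:
#     s = requests.Session()          # no arguments here
#     s.proxies.update(mappings[0])
```

This only affects requests sent through that session. Whether search() uses the session at all is not visible in the question, so that part remains open.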

need help regarding this error: can only concatenate list (not "str") to list

I'm learning Python so I am pretty new to it.
I've been working on a class assignment and I've been facing some errors, such as the one in the title.
This is my code:
import random
def getWORDS(filename):
    f = open(filename, 'r')
    templist = []
    for line in f:
        templist.append(line.split("\n"))
    return tuple(templist)
articles = getWORDS("articles.txt")
nouns = getWORDS("nouns.txt")
verbs = getWORDS("verbs.txt")
prepositions = getWORDS("prepositions.txt")
def sentence():
    return nounphrase() + " " + verbphrase()
def nounphrase():
    return random.choice(articles) + " " + random.choice(nouns)
def verbphrase():
    return random.choice(verbs) + " " + nounphrase() + " " + \
        prepositionalphrase()
def prepositionalphrase():
    return random.choice(prepositions) + " " + nounphrase()
def main():
    number = int(input("enter the number of sentences: "))
    for count in range(number):
        print(sentence())
main()
However, whenever I run it I get this error:
TypeError: can only concatenate list (not "str") to list.
Now, I know there are tons of questions like this, but I have tried many times and I am not able to fix it. I'm new to programming and have been learning the basics since last week.
Thank you

Here I've modified the function slightly - it'll fetch every word into a tuple. Use with to open the files - it will close the file once the values have been fetched.
I hope this will work for you!
def getWORDS(filename):
    result = []
    with open(filename) as f:
        file = f.read()
        texts = file.splitlines()
        for line in texts:
            result.append(line)
    return tuple(result)

I think the problem is in this line:
templist.append(line.split("\n"))
split() will return a list that is then appended to templist. If you're wanting to remove the newline character from the end of the line use rstrip() as this will return a string.
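A small sketch of the difference, using a made-up line "the\n" in place of a real line from articles.txt:

```python
line = "the\n"

as_list = line.split("\n")    # a list: the word plus an empty string
as_text = line.rstrip("\n")   # a plain string without the newline

# Concatenating the list with a string reproduces the error from the question.
try:
    as_list + " " + "dog"
except TypeError as error:
    message = str(error)

# With the string version, concatenation works as intended.
phrase = as_text + " " + "dog"
```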

One way to get the whole file as a single string is the read() method:
file = f.read()
To split the file to lines and add to a list, you first split, then append line by line.
file = f.read()
lines = file.split("\n")
for line in lines:
    templist.append(line)
In your case, you are using the list of lines as-is, so I would write:
file = f.read()
templist = file.split("\n")
Edit 1:
Another useful tool when working with files is f.readline(), which returns the first line when calling it for the first time, second when calling it once again... third... and so on, although the previous ways I showed would be more efficient here.
Edit 2:
When you are done using the file, call the close() method, or open the file with a with ... as statement, which closes the file at the end of the code block.
Code example using with ... as (The best written code in this answer):
def getWORDS(filename):
    with open(filename, 'r') as f:
        file = f.read()
        templist = file.split("\n")
    return tuple(templist)
Code example using close():
def getWORDS(filename):
    f = open(filename, 'r')
    file = f.read()
    templist = file.split("\n")
    f.close()
    return tuple(templist)
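One caveat with split("\n"), sketched below with a made-up two-word file content: if the file ends with a newline, the result contains a trailing empty string, which random.choice could then pick. str.splitlines() does not produce that extra item.

```python
text = "a\nthe\n"  # stands in for the content of a file ending with a newline

with_split = text.split("\n")     # keeps an empty string after the last newline
with_splitlines = text.splitlines()  # no trailing empty item

# Filtering is another option if blank lines may appear anywhere in the file.
non_empty = [word for word in with_split if word]
```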

This is how I would write the full code.
(fixed file opening and reading + fixed capitalization)
import random
def getWORDS(filename):
    with open(filename, 'r') as f:
        file = f.read()
        templist = file.split("\n")
    return tuple(templist)
articles = getWORDS("articles.txt")
nouns = getWORDS("nouns.txt")
verbs = getWORDS("verbs.txt")
prepositions = getWORDS("prepositions.txt")
def sentence():
    sentence = nounphrase() + " " + verbphrase()
    sentence = sentence.split(" ")
    sentence[0] = sentence[0].capitalize()
    sentence = " ".join(sentence)
    return sentence
def nounphrase():
    return random.choice(articles).lower() + " " + random.choice(nouns).capitalize()
def verbphrase():
    return random.choice(verbs).lower() + " " + nounphrase() + " " + \
        prepositionalphrase()
def prepositionalphrase():
    return random.choice(prepositions).lower() + " " + nounphrase()
def main():
    number = int(input("enter the number of sentences: "))
    for count in range(number):
        print(sentence())
main()

insert content in a specific place inside an existing file python

I am trying to append the parameters passed to a function to a specific place in an existing text file.
txt file:
query{
text:"",
source_language:"",
target_language:"",
},
data_type:[16],
params{
client:"xyz"
}
python:
def function(text,source_language,target_language):
    f = open("file.txt", "w")
    f.write( 'text:' + text + '\n' )
    f.write( 'source_language:' + source_language + '\n' )
    f.write( 'target_language:' + target_language + '\n' )
    f.close()
But it's not working. Is there a way to write the parameters directly into the file, including the " " and , characters? I am trying to add just the parameters into the existing file, at the specified position, without losing the data already there.

Solution
Following up on your comments: given that this only applies to the example above and only needs to alter those three specific lines, this will accomplish the task (I included if location: so that, if the keyword is not matched, your file is not erased by open('w')).
def change_text(new_text):
    content[1] = list(content[1])
    y = list(new_text)
    content[1] = content[1][:6] + y + content[1][-2:]
    content[1] = ''.join(content[1])
def change_source_l(new_source_l):
    content[2] = list(content[2])
    y = list(new_source_l)
    content[2] = content[2][:17] + y + content[2][-2:]
    content[2] = ''.join(content[2])
def change_target_l(new_target_l):
    content[3] = list(content[3])
    y = list(new_target_l)
    content[3] = content[3][:17] + y + content[3][-2:]
    content[3] = ''.join(content[3])
filename = open('query.txt', 'r')
content = filename.read()
content = content.split()
filename.close()
name = open('query.txt', 'w')
change_text('something')
change_source_l('this')
change_target_l('that')
name.write('\n'.join(content))
name.close()
Output
(xenial)vash@localhost:~/python/LPTHW$ cat query.txt
query{
text:"something",
source_language:"this",
target_language:"that",
},
data_type:[16],
params{
client:"xyz"

Open file in r+ mode
Use .seek method from Python file I/O and then write your content.
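A minimal sketch of the r+ and seek idea. Note that writing at a seek position overwrites what is there rather than inserting, so this version reads the whole file, edits the text in memory, and writes it back from position 0. `fill_field` is a hypothetical helper, and the file content is a shortened copy of the one in the question, written to a temporary directory so the example can run on its own.

```python
import os
import tempfile


def fill_field(path, key, value):
    """Rewrite key:"" as key:"value" in place, using r+ and seek."""
    with open(path, "r+") as f:
        content = f.read()
        content = content.replace(key + ':""', key + ':"' + value + '"', 1)
        f.seek(0)      # go back to the start before writing
        f.write(content)
        f.truncate()   # drop leftover bytes if the new text is shorter


# Demo on a temporary file.
path = os.path.join(tempfile.mkdtemp(), "query.txt")
with open(path, "w") as f:
    f.write('query{\ntext:"",\nsource_language:"",\n},\n')

fill_field(path, "text", "something")

with open(path) as f:
    updated = f.read()
```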

How to Modify Python Code in Order to Print Multiple Adjacent "Location" Tokens to Single Line of Output

I am new to python, and I am trying to print all of the tokens that are identified as locations in an .xml file to a .txt file using the following code:
from bs4 import BeautifulSoup
soup = BeautifulSoup(open('exercise-ner.xml', 'r'))
tokenlist = soup.find_all('token')
output = ''
for x in tokenlist:
    readeachtoken = x.ner.encode_contents()
    checktoseeifthetokenisalocation = x.ner.encode_contents().find("LOCATION")
    if checktoseeifthetokenisalocation != -1:
        output += "\n%s" % x.word.encode_contents()
z = open('exercise-places.txt','w')
z.write(output)
z.close()
The program works, and spits out a list of all of the tokens that are locations, each of which is printed on its own line in the output file. What I would like to do, however, is to modify my program so that any time beautiful soup finds two or more adjacent tokens that are identified as locations, it can print those tokens to the same line in the output file. Does anyone know how I might modify my code to accomplish this? I would be entirely grateful for any suggestions you might be able to offer.

This question is very old, but I just got your note @Amanda and I thought I'd post my approach to the task in case it might help others:
import glob, codecs
from bs4 import BeautifulSoup
inside_location = 0
location_string = ''
with codecs.open("washington_locations.txt","w","utf-8") as out:
    for i in glob.glob("/afs/crc.nd.edu/user/d/dduhaime/java/stanford-corenlp-full-2015-01-29/processed_washington_correspondence/*.xml"):
        locations = []
        with codecs.open(i,'r','utf-8') as f:
            soup = BeautifulSoup(f.read())
            tokens = soup.findAll('token')
            for token in tokens:
                if token.ner.string == "LOCATION":
                    inside_location = 1
                    location_string += token.word.string + u" "
                else:
                    if location_string:
                        locations.append( location_string )
                        location_string = ''
        out.write( i + "\t" + "\t".join(l for l in locations) + "\n" )
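The grouping step on its own can also be sketched with itertools.groupby from the standard library. This is a different technique from the loop above, and it assumes the tokens have already been extracted into (word, ner) pairs; the sample tokens and the `location_phrases` name are made up for illustration.

```python
from itertools import groupby


def location_phrases(tokens):
    """Join runs of adjacent LOCATION tokens into single strings."""
    phrases = []
    # groupby yields consecutive runs of tokens that share the same key.
    for is_location, run in groupby(tokens, key=lambda t: t[1] == "LOCATION"):
        if is_location:
            phrases.append(" ".join(word for word, _ in run))
    return phrases


tokens = [("visited", "O"), ("New", "LOCATION"), ("York", "LOCATION"),
          ("and", "O"), ("Boston", "LOCATION")]
phrases = location_phrases(tokens)
```

Because groupby closes each run by itself, a location that ends at the very last token is still collected.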
