I have this code here:
url = requests.get("https://raw.githubusercontent.com/deppokol/testrepo/main/testfile.txt")
File = url.text
for line in File:
    print(line)
The output looks like this:
p
i
l
l
o
w
and so on...
Instead, I want it to look like this:
pillow
fire
thumb
and so on...
I know I can add end="" inside print(line), but I want a variable that is equal to those lines. For example:
Word = line
and when you print Word, it should look like this:
pillow
fire
thumb
The .text of a requests response is a str; you can use .splitlines() to iterate over its lines as follows:
import requests
url = requests.get("https://raw.githubusercontent.com/deppokol/testrepo/main/testfile.txt")
for line in url.text.splitlines():
    print(line)
Note that .splitlines() handles the different newline conventions, so you can use it without worrying about which newlines are actually used (using .split("\n") is fine as long as you are sure you are working with Unix-style newlines).
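A quick illustration of the difference, using inline sample text (the words are taken from the question):

```python
text = "pillow\r\nfire\nthumb"

# splitlines() handles \r\n and \n uniformly
print(text.splitlines())  # ['pillow', 'fire', 'thumb']

# split('\n') leaves the \r attached when the file uses Windows newlines
print(text.split('\n'))   # ['pillow\r', 'fire', 'thumb']
```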
You cannot do for line in url.text to get lines, because url.text is not an IO (file) object. Instead, you can either print it directly (since the \n line breaks will automatically print as line breaks) or, if you really need to split on newlines, do for line in url.text.split('\n').
import requests
url = requests.get("https://raw.githubusercontent.com/deppokol/testrepo/main/testfile.txt")
for line in url.text.split('\n'):
    print(line)
Edit: You might also want to call .strip() to remove extra line breaks.
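For example, a file that ends with a newline yields a trailing empty string from split('\n'), which .strip() avoids:

```python
text = "pillow\nfire\nthumb\n"   # typical file content with a trailing newline

print(text.split('\n'))          # ['pillow', 'fire', 'thumb', '']
print(text.strip().split('\n'))  # ['pillow', 'fire', 'thumb']
```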
The response's .text is a str object, which you need to split() first, as in:
import requests
url = requests.get("https://raw.githubusercontent.com/deppokol/testrepo/main/testfile.txt")
file = url.text.split()
for line in file:
    print(line)
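Note that split() with no argument splits on any whitespace, not just newlines, so this only works while each line holds a single word; a small illustration:

```python
text = "hello world\nfoo bar"

print(text.split())       # splits on ALL whitespace: ['hello', 'world', 'foo', 'bar']
print(text.splitlines())  # keeps each line intact: ['hello world', 'foo bar']
```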
You can also use split("\n"):
import requests
for l in requests.get("https://raw.githubusercontent.com/deppokol/testrepo/main/testfile.txt").text.split("\n"):
    print(l)
What's the way to extract only lines containing a specific word from requests (an online text file) and write them to a new text file? I am stuck here...
This is my code:
r = requests.get('http://website.com/file.txt'.format(x))
with open('data.txt', 'a') as f:
    if 'word' in line:
        f.write('\n')
        f.writelines(str(r.text))
        f.write('\n')
If I remove if 'word' in line:, it works, but for all lines, so it's just copying every line from one file to the other.
Any idea how to give the correct command to extract (filter) only lines with specific word?
Update: This is working, but if that word exists in the requested file, it starts copying ALL lines; I need to copy only the line containing 'SOME WORD'.
I have added this code:
for line in r.text.split('\n'):
    if 'SOME WORD' in line:
*Thank you guys for all the answers, and sorry if I didn't make myself clear.
Perhaps this will help.
Whenever you invoke POST/GET or whatever, always check the HTTP response code.
Now let's assume that the lines within the response text are delimited with newline ('\n') and that you want to write a new file (change the mode to 'a' if you want to append). Then:
import requests

(r := requests.get('SOME URL')).raise_for_status()
with open('SOME FILENAME', 'w') as outfile:
    for line in r.text.split('\n'):
        if 'SOME WORD' in line:
            print(line, file=outfile)
            break  # stop after the first matching line
Note:
You will need Python 3.8+ to use the walrus operator (:=) in this code; on older versions, assign r = requests.get('SOME URL') on one line and call r.raise_for_status() on the next.
I would suggest these steps for handling the file properly:
Step 1: Stream the downloaded file to a temporary file
Step 2: Read lines from the temporary file
Step 3: Generate the main file based on your filter
Step 4: Delete the temporary file
Below is code that performs the steps above:
import requests
import os
def read_lines(file_name):
    with open(file_name, 'r') as fp:
        for line in fp:
            yield line

if __name__ == "__main__":
    word = 'ipsum'
    temp_file = 'temp_file.txt'
    main_file = 'main_file.txt'
    url = 'https://filesamples.com/samples/document/txt/sample3.txt'
    with open(temp_file, 'wb') as out_file:
        content = requests.get(url, stream=True).content
        out_file.write(content)
    with open(main_file, 'w') as mf:
        out = filter(lambda x: word in x, read_lines(temp_file))
        for i in out:
            mf.write(i)
    os.remove(temp_file)
Well, there is a missing line you have to add in order to check with the if statement.
import requests
r = requests.get('http://website.com/file.txt').text
with open('data.txt', 'a') as f:
    for line in r.splitlines():  # this is your loop, where you get hold of each line
        if 'word' in line:  # so that you can check for your 'word'
            f.write(line)  # write the line that contains your word
I am running this program to get the page source code of a website I put in. It saves the source to a file, and what I want is for it to look for a specific string, which is basically # for the emails. However, I can't get it to work.
import requests
import re
url = 'https://www.youtube.com/watch?v=GdKEdN66jUc&app=desktop'
data = requests.get(url)
# dump resulting text to file
with open("data6.txt", "w") as out_f:
    out_f.write(data.text)

with open("data6.txt", "r") as f:
    searchlines = f.readlines()

for i, line in enumerate(searchlines):
    if "#" in line:
        for l in searchlines[i:i+3]:
            print(l)
You can use the regex method findall to find all email addresses in your text content, and use file.read() instead of file.readlines() to get the content as one string rather than a list of separate lines.
For example:
import re
with open("data6.txt", "r") as file:
    content = file.read()

emails = re.findall(r"[\w\.]+#[\w\.]+", content)
Maybe cast to a set afterwards for uniqueness, and then save to a file however you like.
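A sketch of the deduplication step, with hypothetical inline content standing in for data6.txt:

```python
import re

# Hypothetical content in place of the downloaded file
content = "contact one#example.com or two#example.com, also one#example.com"

emails = re.findall(r"[\w\.]+#[\w\.]+", content)
unique_emails = sorted(set(emails))  # set removes duplicates
print(unique_emails)  # ['one#example.com', 'two#example.com']
```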
I'm making a file type to store information from my program. The file type can include lines starting with #, like:
# This is a comment.
As shown, the # in front of a line denotes a comment.
I've written a program in Python that can read these files:
fileData = []
file = open("Tutorial.rdsf", "r")
line = file.readline()
while line != "":
    fileData.append(line)
    line = file.readline()
for item in list(fileData):
    item.strip()
fileData = list(map(lambda s: s.strip(), fileData))
print(fileData)
As you can see, it takes the file, adds every line as an item in a list, and strips the items of \n. So far, so good.
But often these files contain comments I've made, and such the program adds them to the list.
Is there a way to delete all items in the list starting with #?
Edit: To make things a bit clearer: Comments won't be like this:
Some code:
{Some Code} #Foo
They'll be like this:
#Foo
Some code:
{Some Code}
You can process lines directly in a for loop:
with open("Tutorial.rdsf", "r") as file:
    for line in file:
        if line.startswith('#'):
            continue  # skip comments
        line = line.strip()
        # do more things with this line
Only put them into a list if you need random access (e.g. you need to access lines at specific indices).
I used a with statement to manage the open file; when Python reaches the end of the with block, the file is automatically closed for you.
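If you do need the list, the same filter fits in a list comprehension; the sample file here is created inline so the sketch is self-contained:

```python
# Create a small sample file matching the question's format
with open("Tutorial.rdsf", "w") as f:
    f.write("# This is a comment.\nSome code:\n{Some Code}\n")

# Keep only non-comment lines, stripped, for random access by index
with open("Tutorial.rdsf") as f:
    fileData = [line.strip() for line in f if not line.startswith("#")]

print(fileData)  # ['Some code:', '{Some Code}']
```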
It's easy to check for leading # signs.
Change this:
while line != "":
    fileData.append(line)
    line = file.readline()
to this:
while line != "":
    if not line.startswith("#"):
        fileData.append(line)
    line = file.readline()
But your program is a bit complicated for what it does. Look in the documentation, where it explains the for line in file: idiom.
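For reference, the whole while loop collapses into a for loop over the file; a sketch of that simplification (the sample file is built inline for illustration):

```python
# Build a sample Tutorial.rdsf so the sketch is self-contained
with open("Tutorial.rdsf", "w") as f:
    f.write("# This is a comment.\nSome code:\n{Some Code}\n")

fileData = []
with open("Tutorial.rdsf", "r") as file:
    for line in file:  # iterates line by line until EOF
        if not line.startswith("#"):
            fileData.append(line.strip())

print(fileData)  # ['Some code:', '{Some Code}']
```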
I have stored my HL7 version 2.6 content in example.txt. I have no issue reading the file, but while parsing it, the first line gets parsed and then hl7.ishl7(line) reports that the second line is not HL7, so it does not get parsed and execution stops after the second line. I don't know what the issue in the content is.
The content of the text file is:
MSH|^~\&|AcmeMed|Lab|Main HIS|St. Micheals|20130408031655||ADT^A01|6306E85542000679F11EEA93EE38C18813E1C635CB09673815639B8AD55D6775|P|2.6|
EVN||20050622101634||||20110505110517|
PID|||231331||Garland^Tracy||19010201|F||EU|147 Yonge St.^^LA^CA^58818|||||||28-457-773|291-697-644|
NK1|1|Smith^Sabrina|Second Cousin|
NK1|2|Fitzgerald^Sabrina|Second Cousin|
NK1|3|WHITE^Tracy|Second Cousin|
PV1||||||||^Fitzgerald^John^F|||||||||||5778985|||||||||||||||||||||||||20020606051116|
OBX|||WT^WEIGHT||78|pounds|
OBX|||HT^HEIGHT||57|cm|
For the code:
import json
import hl7
import re

i = 0
with open('example.txt', 'r') as f:
    for line in f:
        print hl7.isfile(line)
        print line
        h = hl7.parse(line)
        i = i + 1
        print i
You should parse the whole message, not line by line. HL7 message lines (segments) by themselves are not valid HL7 messages.
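A sketch of joining the file's lines into one message before parsing; the python-hl7 library expects segments separated by carriage returns, so the lines are re-joined with '\r'. The segment content is abbreviated from the question, and the hl7.parse call is shown commented since the library may not be installed here:

```python
# Sample segments, abbreviated from the question's example.txt
raw = """MSH|^~\\&|AcmeMed|Lab|Main HIS|St. Micheals|20130408031655||ADT^A01|
EVN||20050622101634||||20110505110517|
PID|||231331||Garland^Tracy||19010201|F|
"""

# Drop blank lines and trailing newlines, then join with \r as hl7.parse expects
segments = [line.strip() for line in raw.splitlines() if line.strip()]
message = "\r".join(segments)

print(message.startswith("MSH|"))  # True
print(len(segments))               # 3
# h = hl7.parse(message)           # parse once, for the whole message
```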
I think you can go
result = hl7.parse(f)
Then access the desired segment: the first index selects the segment (line), the next selects the field.
name = result[2][4]
It's well detailed here
Edit: And watch out for the line break character: some systems break lines with \r and some with \n, and this can be a subtle source of bugs.
I am trying to write a program that pulls the urls from each line of a .txt file and performs a PyQuery to scrape lyrics data off of LyricsWiki, and everything seems to work fine until I actually put the PyQuery stuff in. For example, when I do:
full_lyrics = ""
#open up the input file
links = open('links.txt')
for line in links:
    full_lyrics += line
print(full_lyrics)
links.close()
It prints everything out as expected, one big string with all the data in it. However, when I implement the actual html parsing, it only pulls the lyrics from the last url and skips through all the previous ones.
import requests, re, sqlite3
from pyquery import PyQuery
from collections import Counter

full_lyrics = ""
#open up the input file
links = open('links.txt')
output = open('web.txt', 'w')
output.truncate()
for line in links:
    r = requests.get(line)
    #create the PyQuery object and parse text
    results = PyQuery(r.text)
    results = results('div.lyricbox').remove('script').text()
    full_lyrics += (results + " ")
output.write(full_lyrics)
links.close()
output.close()
I'm writing to a txt file to avoid encoding issues with PowerShell. Anyway, after I run the program and open the txt file, it only shows the lyrics of the last link in links.txt.
For reference, 'links.txt' should contain several links to lyricswiki song pages, like this:
http://lyrics.wikia.com/Taylor_Swift:Shake_It_Off
http://lyrics.wikia.com/Maroon_5:Animals
'web.txt' should be a blank output file.
Why does pyquery break the for loop? It clearly works when it's doing something simpler, like just concatenating the individual lines of a file.
The problem is the additional newline character in every line that you read from links.txt. Try adding another line to links.txt and you'll see that even the last entry will no longer be processed.
I recommend that you do a right strip on the line variable inside the for loop, like this:
for line in links:
    line = line.rstrip()
    r = requests.get(line)
    ...
It should work.
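To see why, compare the raw line with the stripped one (URL taken from the question):

```python
line = "http://lyrics.wikia.com/Maroon_5:Animals\n"  # as read from links.txt

print(repr(line))           # 'http://lyrics.wikia.com/Maroon_5:Animals\n'
print(repr(line.rstrip()))  # 'http://lyrics.wikia.com/Maroon_5:Animals'
```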
I also think you don't need requests to get the HTML: PyQuery can fetch a URL directly, so try results = PyQuery(line) and see if it works.