So I'm new to Python and I desperately need help.
I have a text file with a bunch of IDs (integer values) written in it.
Now I need to pass each ID in the file into a URL.
For example "https://example.com/[id]"
It will be done in this way
A = json.load(urllib.urlopen("https://example.com/(the first id present in the text file)"))
print A
Essentially, this reads certain information about the ID in the URL above and displays it. I want this to work in a loop that reads every ID in the text file, passes each one to the URL used for 'A', and displays the values one after another. Is there a way to do this?
I'd be very grateful if someone could help me out!
Old-style string formatting (the % operator) can be used:
>>> id = "3333333"
>>> url = "https://example.com/%s" % id
>>> print url
https://example.com/3333333
>>>
The new style string formatting:
>>> url = "https://example.com/{0}".format(id)
>>> print url
https://example.com/3333333
>>>
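Reading the file, as mentioned by avasal, with a small change:
import json
import urllib

f = open('file.txt', 'r')
for line in f.readlines():
    id = line.strip()  # drop the trailing newline and stray whitespace
    url = "https://example.com/{0}".format(id)
    urlobj = urllib.urlopen(url)
    body = urlobj.read()  # read the response once so it can be reused below
    try:
        # json.loads expects a string, not the response object
        json_data = json.loads(body)
        print json_data
    except ValueError:
        # the response was not valid JSON, so show it as-is
        print body
f.close()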
lazy style:
url = "https://example.com/" + first_id
A = json.load(urllib.urlopen(url))
print A
old style:
url = "https://example.com/%s" % first_id
A = json.load(urllib.urlopen(url))
print A
new style 2.6+:
url = "https://example.com/{0}".format( first_id )
A = json.load(urllib.urlopen(url))
print A
new style 2.7+:
url = "https://example.com/{}".format( first_id )
A = json.load(urllib.urlopen(url))
print A
Python 3.6+
f-strings (formatted string literals) are supported from Python 3.6 onward and are a more readable way to format a string.
Here's a good article on the subject: Python 3's f-Strings
In this case, it can be formatted as
url = f"https://example.com/{id}"
Detailed example
When you want to pass multiple params to the URL it can be done as below.
name = "test_api_4"
owner = "jainik#test.com"
url = f"http://localhost:5001/files/create" \
      f"?name={name}" \
      f"&owner={owner}"
We are using multiple f-strings here, joined by ending each line except the last with a backslash ('\'). Adjacent string literals are concatenated into a single string, so no newline character is inserted between them.
For values that contain spaces
For such values, add from urllib.parse import quote to your Python file and then quote the string, for example: quote("firstname lastname")
This replaces each space character with %20.
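As a rough sketch of the same idea, the standard library can build and escape the whole query string in one step. The helper name build_create_url is made up for illustration; urlencode and quote come from urllib.parse, and quote_via=quote is passed so that spaces become %20 rather than +.

```python
from urllib.parse import quote, urlencode


def build_create_url(base_url, name, owner):
    """Return base_url with name and owner appended as an escaped query string."""
    # urlencode escapes every value; quote_via=quote encodes spaces as %20
    query = urlencode({"name": name, "owner": owner}, quote_via=quote)
    return f"{base_url}?{query}"


url = build_create_url(
    "http://localhost:5001/files/create", "test_api_4", "firstname lastname"
)
print(url)
```

This avoids quoting each value by hand and also escapes characters such as & or # that would otherwise change the meaning of the URL.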
The first thing you need to do is know how to read each line from a file. First, you have to open the file; you can do this with a with statement:
with open('my-file-name.txt') as intfile:
This opens a file and stores a reference to that file in intfile, and it will automatically close the file at the end of your with block. You then need to read each line from the file; you can do that with a regular old for loop:
for line in intfile:
This will loop through each line in the file, reading them one at a time. In your loop, you can access each line as line. Note that each line still ends with a newline character, so strip it with line.strip() before using it. All that's left is to make the request to your website using the code you gave. The one bit you're missing is what's called "string interpolation", which allows you to format a string with other strings, numbers, or anything else. In your case, you'd like to put a string (the line from your file) inside another string (the URL). To do that, you use the %s flag along with the string interpolation operator, %:
url = 'http://example.com/?id=%s' % line.strip()
A = json.load(urllib.urlopen(url))
print A
Putting it all together, you get:
with open('my-file-name.txt') as intfile:
    for line in intfile:
        # strip() removes the trailing newline from each line
        url = 'http://example.com/?id=%s' % line.strip()
        A = json.load(urllib.urlopen(url))
        print A
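The snippets in this thread use Python 2 (urllib.urlopen and the print statement). A minimal Python 3 sketch of the same loop might look like the following. The names iter_id_urls and fetch_json and the file name ids.txt are assumptions made for illustration, and the example.com base URL is the placeholder from the question.

```python
import json
from urllib.request import urlopen


def iter_id_urls(lines, base="https://example.com/"):
    """Yield one URL per non-empty line, with surrounding whitespace removed."""
    for line in lines:
        item_id = line.strip()
        if item_id:  # skip blank lines
            yield base + item_id


def fetch_json(url):
    """Download url and parse the response body as JSON."""
    with urlopen(url) as response:
        return json.load(response)


# Usage (not run here because it needs network access and a real endpoint):
# with open("ids.txt") as id_file:
#     for url in iter_id_urls(id_file):
#         print(fetch_json(url))
```

Keeping the URL building separate from the download makes it easy to check the generated URLs before any request is sent.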
I'm trying to find a specific string in a file and have it return the text that follows it.
The file contains the following: "releaseDate": "2022-07-11T07:15:00.000Z"
I want to search for releaseDate but have it return 2022-07-11T07:15:00.000Z
I can find it, but I honestly have no idea how to return the info I need.
dateOccurence=open('scriptFile.txt', 'r').read().find('releaseDate')
Your file is JSON content, so parsing it as JSON would be the best idea, but then you'd need to know the path of keys to reach the value.
A regex approach is easier here:
import re

with open('scriptFile.txt', 'r') as f:
    content = f.read()
# capture the quoted value that follows the "releaseDate" key
date = re.search(r'"releaseDate":\s+"([^"]+)"', content)[1]
print(date) # 2022-07-11T07:15:00.000Z
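If the file really is valid JSON, the key path does not have to be known in advance: a small recursive search can walk the parsed structure. This is only a sketch; find_key is a hypothetical helper, and the sample document below is invented to show the idea.

```python
import json


def find_key(node, key):
    """Return the first value stored under key anywhere in node, or None."""
    if isinstance(node, dict):
        if key in node:
            return node[key]
        children = node.values()
    elif isinstance(node, list):
        children = node
    else:
        return None  # strings, numbers, etc. have no children to search
    for child in children:
        found = find_key(child, key)
        if found is not None:
            return found
    return None


# Invented sample data; a real file would be loaded with json.load(f)
sample = json.loads('{"items": [{"name": "x", "releaseDate": "2022-07-11T07:15:00.000Z"}]}')
print(find_key(sample, "releaseDate"))
```

Unlike the regex, this does not depend on how the file is spaced or quoted, but it does require the whole file to parse as JSON.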
I have this code here:
url = requests.get("https://raw.githubusercontent.com/deppokol/testrepo/main/testfile.txt")
File = url.text
for line in File:
    print(line)
The output looks like this:
p
i
l
l
o
w
and so on...
Instead, I want it to look like this:
pillow
fire
thumb
and so on...
I know I can add end="" inside of print(line) but I want a variable to be equal to those lines. For example
Word = line
and when you print Word, it should look like this:
pillow
fire
thumb
.text of requests' response is str, you might use .splitlines for iterating over lines as follows:
import requests
url = requests.get("https://raw.githubusercontent.com/deppokol/testrepo/main/testfile.txt")
for line in url.text.splitlines():
    print(line)
Note that .splitlines() handles the different newline conventions, so you can use it without worrying about exactly which newlines are used (using .split("\n") is fine as long as you are sure you are working with Linux-style newlines).
Iterating with for line in url.text goes character by character, because url.text is a str and not a file object. Instead, you can either print it directly (since \n, the line break character, will automatically print as a line break) or, if you really need to split on new lines, do for line in url.text.split('\n').
import requests
url = requests.get("https://raw.githubusercontent.com/deppokol/testrepo/main/testfile.txt")
for line in url.text.split('\n'):
    print(line)
Edit: You might also want to do .strip() as well to remove extra line breaks.
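The response text is a str object, which you need to split() first (note that split() with no arguments splits on any whitespace, so a line containing spaces would be broken into several items):
import requests
url = requests.get("https://raw.githubusercontent.com/deppokol/testrepo/main/testfile.txt")
file = url.text.split()
for line in file:
    print(line)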
You can also use split("\n"):
import requests
for l in requests.get("https://raw.githubusercontent.com/deppokol/testrepo/main/testfile.txt").text.split("\n"):
    print(l)
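The three splitting approaches suggested in these answers behave differently on mixed newlines and on lines that contain spaces. The short comparison below uses an invented sample string rather than the real file, so it runs without network access.

```python
# Invented sample: one Windows-style line ending, a line with a space, a trailing newline
text = "red pillow\r\nfire\nthumb\n"

by_splitlines = text.splitlines()   # handles \r\n and \n, no trailing empty item
by_newline = text.split("\n")       # keeps the \r and adds a trailing empty string
by_whitespace = text.split()        # also splits "red pillow" into two items

print(by_splitlines)
print(by_newline)
print(by_whitespace)
```

For text that should stay one item per line, splitlines() is usually the safest of the three.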
Question
I have a text file that records metadata of research papers requested from the SemanticScholar API. However, when I wrote the requested data, I forgot to add "\n" after each individual record. This results in something that looks like
{<metadata1>}{<metadata2>}{<metadata3>}...
whereas it should look like this if I had added "\n":
{<metadata1>}
{<metadata2>}
{<metadata3>}
...
Now, I would like to read the data. As all the metadata is now stored in one line, I need to do some hacks.
First, I split the cluttered dicts using "{".
Then I tried to convert each string back to a dict. Note that I allow for the possibility that a line might not be in proper JSON format.
import json
with open("metadata.json", "r") as f:
    for line in f.readline().split("{"):
        print(json.loads("{" + line.replace("\'", "\"")))
However, there is still an error message
JSONDecodeError: Expecting property name enclosed in double quotes: line 1 column 2 (char 1)
I am wondering what should I do to recover all the metadata I collected?
MWE
Note: to generate the metadata.json file I use, run the following code; it should work out of the box.
import json
import urllib.parse
import requests

baseURL = "https://api.semanticscholar.org/v1/paper/"
paperIDList = ["200794f9b353c1fe3b45c6b57e8ad954944b1e69",
               "b407a81019650fe8b0acf7e4f8f18451f9c803d5",
               "ff118a6a74d1e522f147a9aaf0df5877fd66e377"]
for paperID in paperIDList:
    response = requests.get(urllib.parse.urljoin(baseURL, paperID))
    metadata = response.json()
    record = dict()
    record["title"] = metadata["title"]
    record["abstract"] = metadata["abstract"]
    record["paperId"] = metadata["paperId"]
    record["year"] = metadata["year"]
    record["citations"] = [item["paperId"] for item in metadata["citations"] if item["paperId"]]
    record["references"] = [item["paperId"] for item in metadata["references"] if item["paperId"]]
    # append each record to the file (note: no newline is written between records)
    with open("metadata.json", "a") as fileObject:
        fileObject.write(json.dumps(record))
The problem is that when you do the split("{") you get a first item that is empty, corresponding to the text before the opening {. Just ignore the first element and everything works fine. The quote replacement is not needed: json.dumps already wrote the records with double quotes, and replacing every ' would corrupt apostrophes inside the abstracts. Note that this only works as long as no value (for example an abstract) contains a { character:
with open("metadata.json", "r") as f:
    for line in f.readline().split("{")[1:]:
        print(json.loads("{" + line))
As suggested in the comments, I would actually recommend recreating the file or saving a new version where you replace }{ by }\n{:
with open("metadata.json", "r") as f:
    data = f.read()
data_lines = data.replace("}{","}\n{")
with open("metadata_mod.json", "w") as f:
    f.write(data_lines)
That way you will have the metadata of a paper per line as you want.
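Both approaches above rely on the characters { or }{ never appearing inside a value such as an abstract. A sketch that avoids this assumption uses json.JSONDecoder.raw_decode from the standard library, which parses one object and reports where it ended. The function name iter_json_objects and the sample string are made up for illustration.

```python
import json


def iter_json_objects(text):
    """Yield each JSON value from a string of back-to-back JSON documents."""
    decoder = json.JSONDecoder()
    pos = 0
    while pos < len(text):
        # raw_decode does not skip leading whitespace, so do it here
        while pos < len(text) and text[pos].isspace():
            pos += 1
        if pos >= len(text):
            break
        obj, pos = decoder.raw_decode(text, pos)  # pos becomes the end of obj
        yield obj


# Invented sample: the first record contains "}{" inside a string value
sample = '{"title": "a}{b"}{"year": 2020}\n{"refs": ["x"]}'
records = list(iter_json_objects(sample))
print(records)
```

With the real file, the text would come from open("metadata.json").read(), and each yielded record could then be written back out with a trailing newline.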
I wrote this code to download an srt subtitle file, but it doesn't work. Please review it and help me find the mistake I'm making. Thanks.
from urllib import request
srt_url = "https://subscene.com/subtitle/download?mac=LkM2jew_9BdbDSxdwrqLkJl7hDpIL_HnD-s4XbfdB9eqPHsbv3iDkjFTSuKH0Ee14R-e2TL8NQukWl82yNuykti8b_36IoaAuUgkWzk0WuQ3OyFyx04g_vHI_rjnb2290"
def download_srt_file(srt_url):
    response = request.urlopen(srt_url)
    srt = response.read()
    srt_str = str(srt)
    lines = srt_str.split('\\n')
    dest_url = r'srtfile.srt'
    fx = open('dest_url' , 'w')
    for line in lines:
        fx.write(line)
    fx.close()
    download_srt_file(srt_url)
A number of things are wrong or can be improved.
You are calling the function from within the function, so you are not actually calling it at all. You never enter it to begin with.
'dest_url' in quotes is a string literal, not the variable, so fx = open('dest_url', 'w') writes to a file literally named dest_url instead of srtfile.srt.
To avoid handling the closing and flushing of the file you are writing, just use the with statement.
Your split('\\n') is also wrong. The doubled backslash escapes the backslash itself, so you split on the two characters \ and n rather than on a newline. To split into lines it has to be split('\n').
Finally, response.read() returns bytes, and str(srt) produces the printable representation (b'...') rather than the text. Decode it with srt.decode(...) if you need a string, or write the bytes directly to a file opened in binary mode.
A return statement at the end of the function is optional; the function returns None either way.
Below is a modified and hopefully functioning version of your code with the above implemented.
from urllib import request

def download_srt_file(srt_url):
    response = request.urlopen(srt_url)
    srt = response.read()  # bytes, exactly as downloaded
    dest_url = 'srtfile.srt'
    # binary mode writes the downloaded bytes unchanged, newlines included
    with open(dest_url, 'wb') as fx:
        fx.write(srt)

srt_url = "https://subscene.com/subtitle/download?mac=LkM2jew_9BdbDSxdwrqLkJl7hDpIL_HnD-s4XbfdB9eqPHsbv3iDkjFTSuKH0Ee14R-e2TL8NQukWl82yNuykti8b_36IoaAuUgkWzk0WuQ3OyFyx04g_vHI_rjnb2290"
download_srt_file(srt_url)
Tell me if it works for you.
A final remark is that you are not setting the target directory for the file you are writing. Are you sure you want to do that?
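The difference between str() on downloaded bytes and actually decoding them can be seen without any network access. The sample below is invented, and UTF-8 is an assumption: a real subtitle file may use a different encoding.

```python
# Invented sample standing in for what response.read() returns
raw = b"1\n00:00:01,000 --> 00:00:02,000\nHello\n"

as_repr = str(raw)            # the printable form "b'...'" with literal \n sequences
as_text = raw.decode("utf-8")  # the real text, with actual newline characters

print(as_repr)
print(as_text.splitlines())
```

This is why the original code only appeared to work with split('\\n'): it was splitting the printable representation, not the subtitle text itself.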
I think this block of code is pretty close to being right, but something is throwing it off. I'm trying to loop through 10 URLs and download the contents of each to a text file, and make sure everything is structured orderly, in a dataframe.
import pandas as pd
rawHtml = ''
url = r'http://www.pga.com/golf-courses/search?page=" + i + "&searchbox=Course%20Name&searchbox_zip=ZIP&distance=50&price_range=0&course_type=both&has_events=0'
g = open("C:/Users/rshuell001/Desktop/MyData.txt", "w")
for i in range(0, 10):
    df = pd.DataFrame.from_csv(url)
    print(df)
    g.write(str(df))
g.close()
The error that I get says:
CParserError: Error tokenizing data.
C error: Expected 1 fields in line 22, saw 2
I have no idea what that means. I only have 9 lines of code, so I don't know why it's mentioning a problem on line 22.
Can someone give me a push to get this working?
pandas.DataFrame.from_csv() takes a first argument which is either a path or a file-like handle, either of which is supposed to point at a valid CSV file.
You are providing it with a URL, and the page behind that URL appears to be HTML rather than CSV. The "line 22" in the error refers to line 22 of the fetched data, not of your script.
It seems that you want to use a different function: the top-level pandas.read_csv. This function will actually fetch the data for you from a valid URL, then parse it. It still expects the URL to return CSV data.
If for any reason you insist on using pandas.DataFrame.from_csv(), you will have to:
Get the text from the page.
Persist the text, or parts thereof, as a valid CSV file, or a file-like object.
Provide the path to the file, or the handle of the file-like object, as the first argument to pandas.DataFrame.from_csv().
I finally got it working. This is what I was trying to do all along.
import requests
from bs4 import BeautifulSoup
link = "http://www.pga.com/golf-courses/search?page=1&searchbox=Course%20Name&searchbox_zip=ZIP&distance=50&price_range=0&course_type=both&has_events=0"
html = requests.get(link).text
soup = BeautifulSoup(html, "lxml")
res = soup.findAll("div", {"class": "views-field-nothing"})
for r in res:
    print("Address: " + r.find("span", {'class': 'field-content'}).text)
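The code above fetches page 1 only, while the question asked for 10 pages. In the question's code the " + i + " sat inside the string literal, so the page number never changed. A small sketch of building the per-page URLs is shown here; page_urls is a hypothetical helper, and the page range 0 to 9 is taken from the question's range(0, 10).

```python
# URL from the question with the page number turned into a placeholder
BASE = ("http://www.pga.com/golf-courses/search?page={page}"
        "&searchbox=Course%20Name&searchbox_zip=ZIP&distance=50"
        "&price_range=0&course_type=both&has_events=0")


def page_urls(first=0, count=10):
    """Return the search URL for each page number, starting at first."""
    return [BASE.format(page=page) for page in range(first, first + count)]


urls = page_urls()
print(urls[0])
```

Each URL from this list could then be passed to requests.get and BeautifulSoup in place of the fixed link.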