Python : Read .JS file without missing any data - python

I'm trying to read a '.js' file, but the data I read back does not match what is in the file: some lines go missing. Can anyone please help?
The code I tried is
import json
file_path = 'C:/Users/smith/Desktop/inject.bundle.js'
with open(file_path, 'r', encoding='utf-8') as dataFile:
    data = dataFile.read()
print(data)
The js file I tried is here.

Why not download the JS file with an HTTP request?
import requests
res = requests.get("https://raw.githubusercontent.com/venkatesanramasekar/jstesting/master/inject.bundle.js")
data = res.text
# to save to a file
with open("inject.bundle.js", "w", encoding="utf-8") as f:
    f.write(data)
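To check whether anything was actually lost, comparing sizes is more reliable than reading the printed output, since a console may truncate or wrap the very long lines of a minified bundle. A minimal sketch, assuming the file is UTF-8 without a byte order mark; the helper name read_and_report is made up for illustration:

```python
import os


def read_and_report(path):
    """Read a text file and return (text, bytes_on_disk, bytes_read)."""
    # newline="" disables newline translation, so the text matches the file exactly
    with open(path, "r", encoding="utf-8", newline="") as f:
        text = f.read()
    bytes_on_disk = os.path.getsize(path)
    bytes_read = len(text.encode("utf-8"))
    return text, bytes_on_disk, bytes_read
```

If the two byte counts agree, the whole file was read and the missing lines are a display issue rather than a reading issue.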

Related

Write request.get response to json file in python

I am using requests.get in python for a URL to get the data as below:
import requests
username = 'ROAND'
password = dbutils.secrets.get("lab-secrets","ROSecret")
response = requests.get('https://pit.service.com/api/table', auth=(username,password))
The response headers report the total record count:
print(response.headers)
'X-Total-Count': '799434'
I'm trying to write this to a JSON file as below:
data = response.content
with open('/path/file.json', 'wb') as f:
    f.write(data)
But the file contains only 1439 records.
The json file content looks like the below image:
I've tried multiple ways, but without success.
I just want to write everything returned by requests.get to a JSON file.
Kindly help.

How to replace HTML codes in an HTML file using Python?

I'm trying to replace all HTML codes in my HTML file in a for loop (not sure if this is the easiest approach) without changing the formatting of the original file. When I run the code below, the codes are not replaced. Does anyone know what could be wrong?
import re
tex=open('ALICE.per-txt.txt', 'r')
tex=tex.read()
for i in tex:
    if i =='&otilde;':
        i=='õ'
    elif i == '&ccedil;':
        i=='ç'
with open('Alice1.replaced.txt', "w") as f:
    f.write(tex)
    f.close()
You can use html.unescape.
>>> import html
>>> html.unescape('&otilde;')
'õ'
With your code:
import html
with open('ALICE.per-txt.txt', 'r') as f:
    html_text = f.read()
html_text = html.unescape(html_text)
with open('ALICE.per-txt.txt', 'w') as f:
    f.write(html_text)
Please note that I opened the files with a with statement. This takes care of closing the file after the with block - something you forgot to do when reading the file.
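The same idea can be wrapped in a small function that writes to a separate output file, so the original stays untouched. This is only a sketch: the function name unescape_file and the UTF-8 default are assumptions, not something given in the question.

```python
import html


def unescape_file(src_path, dst_path, encoding="utf-8"):
    """Write a copy of src_path with HTML character references decoded."""
    with open(src_path, "r", encoding=encoding) as src:
        text = src.read()
    # html.unescape converts named and numeric references in a single pass
    with open(dst_path, "w", encoding=encoding) as dst:
        dst.write(html.unescape(text))
```

For the files in the question this would be called as unescape_file('ALICE.per-txt.txt', 'Alice1.replaced.txt'). Everything that is not a character reference, including line breaks, is copied as read.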

How can I download a .mat file from a website?

I was trying to do this in MATLAB using:
url = 'the url of the file';
file_name = 'data.mat';
outfilename = websave(filename,url);
load(outfilename);
but it didn't work. How can I do this using Python 3? Note that I want the .mat file exactly as it is, not HTML, CSV, or any other format; I just want the file downloaded. I can do it manually, but I have hundreds of files, which is why I need to automate it.
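Using urllib.request (urllib2 exists only in Python 2; in Python 3 its functionality lives in urllib.request):
import urllib.request
response = urllib.request.urlopen("the url")
# open in binary mode so the .mat bytes are written unchanged
with open("filename.mat", 'wb') as file:
    file.write(response.read())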
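For larger files, or for hundreds of them, the response can be streamed to disk instead of being read into memory first. A minimal sketch using only the standard library; the function name download is hypothetical, and it assumes the URL can be fetched without authentication:

```python
import shutil
import urllib.request


def download(url, dest_path):
    """Stream the resource at url to dest_path without decoding it."""
    # Binary mode keeps the .mat bytes exactly as the server sent them
    with urllib.request.urlopen(url) as response, open(dest_path, "wb") as out:
        shutil.copyfileobj(response, out)
    return dest_path
```

With a list of URLs this could be called in a loop, for example download(url, "data_%d.mat" % i) for each index i. If SciPy is installed, scipy.io.loadmat is one way to read the downloaded files afterwards.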

JSON Line issue when loading from import.io using Python

I'm having a hard time trying to load an API response from import.io into a file or a list.
The endpoint I'm using is https://data.import.io/extractor/{0}/json/latest?_apikey={1}
Previously all my scripts were set to use normal JSON and all was working well, but now they have decided to use JSON Lines, and somehow the output seems malformed.
The way I tried to adapt my scripts is to read the API response in the following way:
url_call = 'https://data.import.io/extractor/{0}/json/latest?_apikey={1}'.format(extractors_row_dict['id'], auth_key)
r = requests.get(url_call)
with open(temporary_json_file_path, 'w') as outfile:
    json.dump(r.content, outfile)
data = []
with open(temporary_json_file_path) as f:
    for line in f:
        data.append(json.loads(line))
The problem with this is that when I check data[0], the entire JSON file content has been dumped into it, and accessing data[1] raises:
IndexError: list index out of range
Here is an example of data[0][:300]:
u'{"url":"https://www.example.com/de/shop?condition[0]=new&page=1&lc=DE&l=de","result":{"extractorData":{"url":"https://www.example.com/de/shop?condition[0]=new&page=1&lc=DE&l=de","resourceId":"23455234","data":[{"group":[{"Brand":[{"text":"Brand","href":"https://www.example.com'
Does anyone have experience with the response of this API?
All other jsonline reads I do from other sources work fine except this one.
EDIT based on comment:
print repr(open(temporary_json_file_path).read(300))
gives this:
'"{\\"url\\":\\"https://www.example.com/de/shop?condition[0]=new&page=1&lc=DE&l=de\\",\\"result\\":{\\"extractorData\\":{\\"url\\":\\"https://www.example.com/de/shop?condition[0]=new&page=1&lc=DE&l=de\\",\\"resourceId\\":\\"df8de15cede2e96fce5fe7e77180e848\\",\\"data\\":[{\\"group\\":[{\\"Brand\\":[{\\"text\\":\\"Bra'
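You've got a bug in your code where you are double encoding:
with open(temporary_json_file_path, 'w') as outfile:
    json.dump(r.content, outfile)
r.content is already JSON Lines text, so json.dump wraps the whole response in one JSON string. Write the raw bytes instead:
with open(temporary_json_file_path, 'wb') as outfile:
    outfile.write(r.content)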
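If the temporary file is not needed, the response body can also be parsed directly. The following is a sketch only: the helper name parse_json_lines is made up, and it assumes the body is UTF-8 with one JSON document per line.

```python
import json


def parse_json_lines(raw):
    """Parse JSON Lines content (bytes or str) into a list of Python objects."""
    if isinstance(raw, bytes):
        raw = raw.decode("utf-8")
    # Split on "\n" only; str.splitlines() would also break on characters
    # such as U+2028 that may legally appear inside JSON strings
    return [json.loads(line) for line in raw.split("\n") if line.strip()]
```

In the question's script this would be called as data = parse_json_lines(r.content). The double encoding is easy to reproduce: json.loads(json.dumps(text)) returns the original text as a single str, which is why data[0] held the whole file.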

In Python, downloading an HTML file and storing it in a file

I'm using Python to download an HTML file and store it in a file.
Here's the code:
url = "http://www.nytimes.com/roomfordebate/2014/09/24/protecting-student-privacy-in-online-learning"
page = requests.get(url)
# save html content
file_name = url.split('/')[-1]
text_file = open(file_name, 'w+')
text_file.write(page.text())
text_file.close()
I got the following error:
File "scraper.py", line 15, in scrape_Page
text_file.write(page.text())
TypeError: 'unicode' object is not callable
Could anyone tell me how to store the text successfully, or why I got this error?
Thanks
page.text is an attribute, not a method, so you should not call it. You should not use it to download a file either; use .content instead, because you want the undecoded bytes, not the decoded Unicode value:
text_file.write(page.content)
To download content, you may want to stream it to the file instead:
import requests
import shutil
r = requests.get(url, stream=True)
file_name = url.rpartition('/')[-1]
with open(file_name, 'wb') as f:
    r.raw.decode_content = True
    shutil.copyfileobj(r.raw, f)
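An alternative to copying from r.raw is to iterate over the response in chunks with iter_content, which is part of the requests API. A rough sketch, where save_chunks is a hypothetical helper that accepts any iterable of bytes objects:

```python
def save_chunks(chunks, path):
    """Write an iterable of bytes chunks to path and return the byte count."""
    written = 0
    with open(path, "wb") as f:
        for chunk in chunks:
            if chunk:  # skip empty chunks
                written += f.write(chunk)
    return written
```

With requests this might be used as save_chunks(requests.get(url, stream=True).iter_content(chunk_size=8192), file_name). The chunk size of 8192 is an arbitrary choice for illustration.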
