I have code like this:
import requests
user_agent_url = 'http://www.user-agents.org/allagents.xml'
xml_data = requests.get(user_agent_url).content
This parses an online XML file into xml_data. How can I parse it from a file on my local disk instead? I tried replacing the URL with a local path, but got an error:
raise InvalidSchema("No connection adapters were found for '%s'" % url)
InvalidSchema: No connection adapters were found
What has to be done?
Note that the code you quote does NOT parse the file - it simply puts the XML data into xml_data. The equivalent for a local file doesn't need to use requests at all: simply write
with open("/path/to/XML/file") as f:
    xml_data = f.read()
If you are determined to use requests then see this answer for how to write a file URL adapter.
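If all you need is a quick stand-in for such an adapter, the standard library's urllib.request already understands file:// URLs (the file name and XML content below are made up for the demonstration):

```python
import os
import tempfile
from urllib.request import pathname2url, urlopen

# Write a small XML file to fetch back (hypothetical content)
path = os.path.join(tempfile.gettempdir(), 'allagents_demo.xml')
with open(path, 'wb') as f:
    f.write(b'<agents><agent>demo</agent></agents>')

# urlopen() accepts file:// URLs out of the box, no adapter required
xml_data = urlopen('file:' + pathname2url(path)).read()
print(xml_data)
```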
You can read the file content using the built-in open() function and then parse it with the xml.etree.ElementTree module's XML() function.
It returns an Element object that you can loop over.
Example
from xml.etree.ElementTree import XML

content = open("file.xml").read()
etree = XML(content)
print(etree.text, etree.attrib, list(etree))
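As a variant on the above (the file name and XML content here are invented for the demo), ElementTree's parse() also accepts a filename directly, so the separate read() step is optional; note that getchildren() was removed in Python 3.9, so iterate the element instead:

```python
import os
import tempfile
import xml.etree.ElementTree as ET

# Create a small demo XML file (hypothetical content)
path = os.path.join(tempfile.gettempdir(), 'demo.xml')
with open(path, 'w', encoding='utf-8') as f:
    f.write('<root><child>hi</child><child>there</child></root>')

# parse() reads straight from the file and returns an ElementTree
tree = ET.parse(path)
root = tree.getroot()
for child in root:  # iterating replaces the removed getchildren()
    print(child.tag, child.text)
```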
Related
I want to download text files using Python. How can I do so?
I used urlopen(url).read(), but it gives me the bytes representation of the file.
For me, I had to do the following (Python 3):
from urllib.request import urlopen
data = urlopen("[your url goes here]").read().decode('utf-8')
# Do what you need to do with the data.
You have multiple options.
For the simplest solution, you can use urllib:
import urllib.request

file_url = 'https://someurl.com/text_file.txt'
for line in urllib.request.urlopen(file_url):
    print(line.decode('utf-8'))
For a requests-based solution:
import requests

file_url = 'https://someurl.com/text_file.txt'
response = requests.get(file_url)
if response.status_code == 200:
    for line in response.text.split('\n'):
        print(line)
When downloading text files with Python, I like to use the wget module:
import wget
remote_url = 'https://www.google.com/test.txt'
local_file = 'local_copy.txt'
wget.download(remote_url, local_file)
If that doesn't work, try using urllib:
from urllib import request
remote_url = 'https://www.google.com/test.txt'
file = 'copy.txt'
request.urlretrieve(remote_url, file)
When you use the requests module, you read the file directly from the internet, which is why you see the text in byte format. Try writing the text to a file, then open it on your desktop to view it:
import requests

remote_url = 'https://test.com/test.txt'
local_file = 'local_file.txt'
data = requests.get(remote_url)
with open(local_file, 'wb') as file:
    file.write(data.content)
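The "bytes representation" from the question is simply undecoded data; a minimal local sketch (no network, made-up bytes) showing that .decode() is all that separates bytes from text:

```python
# Fake the raw bytes a download would return, then decode them to text
raw = b'first line\nsecond line\n'
text = raw.decode('utf-8')
print(type(raw).__name__, type(text).__name__)
print(text.splitlines())
```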
I've been reviewing examples of how to parse HTML from websites using XPath and lxml. For some reason, when I try it with a local file I keep running into this error.
AttributeError: 'str' object has no attribute 'content'
This is the code
with open(r'H:\Python\Project\File','r') as f:
    file = f.read()
    f.close()
tree = html.fromstring(file.content)
You have a few problems with your code. It looks like you are adapting code that parses HTML from an http/https request; in that case, .content extracts the bytes from the response object.
However, when reading from a file, your with block already reads in the contents of the file as a string. Also, you don't need to call .close(); the context manager takes care of that for you.
Try this:
with open(r'H:\Python\Project\File','r') as f:
    tree = html.fromstring(f.read())
Try encoding='utf-8'
f1 = open(new_file + '.html', 'r', encoding="utf-8")
With Python 3, I want to read an XML web page and save it to my local drive.
Also, if the file already exists, it must be overwritten.
I tried a script like this:
import urllib.request
xml = urllib.request.urlopen('URL')
data = xml.read()
file = open("file.xml","wb")
file.writelines(data)
file.close()
But I get this error:
TypeError: a bytes-like object is required, not 'int'
First suggestion: do what even the official urllib docs say and use requests instead.
Your problem is that you used .writelines(), which expects a list of lines, not a bytes object (for once in Python, the error message is not very helpful). Use .write() instead:
import requests
resp = requests.get('URL')
with open('file.xml', 'wb') as foutput:
    foutput.write(resp.content)
I found a solution :
from urllib.request import urlopen
xml = open("import.xml", "w")  # "w" creates the file if needed and truncates it, so an existing file is overwritten
xml.write(urlopen('URL').read().decode('utf-8'))
xml.close()
Thanks for your help.
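One caveat with this solution: the open mode matters for the overwrite requirement. "r+" writes in place without truncating, so a shorter replacement leaves stale bytes from the old file, while "w" truncates first. A quick sketch with a made-up file:

```python
import os
import tempfile

path = os.path.join(tempfile.gettempdir(), 'overwrite_demo.xml')
with open(path, 'w', encoding='utf-8') as f:
    f.write('<old>much longer original content</old>')

# "w" truncates before writing, so the old content is gone entirely
with open(path, 'w', encoding='utf-8') as f:
    f.write('<new/>')
print(open(path, encoding='utf-8').read())
```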
I need to read the content of a remote file using Python, but I am facing some challenges. My code is below:
import subprocess
path = 'http://securityxploded.com/remote-file-inclusion.php'
subprocess.Popen(["rsync", host-ip+path],stdout=subprocess.PIPE)
for line in ssh.stdout:
line
Here I get the error NameError: name 'host' is not defined. I do not know what the host-ip value should be, because I run my Python file from the terminal (python sub.py). I need to read the content of the remote file http://securityxploded.com/remote-file-inclusion.php.
You need the urllib library. Also, your code refers to names (host, ip, ssh) that are never defined.
Try something like this:
import urllib.request
fp = urllib.request.urlopen("http://www.stackoverflow.com")
mybytes = fp.read()
mystr = mybytes.decode("utf8")
fp.close()
print(mystr)
Note: this is for Python 3.
For Python 2.7, use this:
import urllib
fp = urllib.urlopen("http://www.stackoverflow.com")
myfile = fp.read()
print myfile
If you want to read remote content via HTTP, requests and urllib2 are both good choices.
For Python 2, using requests:
import requests
resp = requests.get('http://example.com/')
print resp.text
will work.
This is the code that results in an error message:
import urllib
import xml.etree.ElementTree as ET
url = raw_input('Enter URL:')
urlhandle = urllib.urlopen(url)
data = urlhandle.read()
tree = ET.parse(data)
The error:
I'm new to Python. I did read the documentation and a couple of tutorials, but clearly I have still done something wrong. I don't believe it is the XML file itself, because the same thing happens with two different XML files.
Consider using ElementTree's fromstring():
import urllib
import xml.etree.ElementTree as ET
url = raw_input('Enter URL:')
# http://feeds.bbci.co.uk/news/rss.xml?edition=int
urlhandle = urllib.urlopen(url)
data = urlhandle.read()
tree = ET.fromstring(data)
print ET.tostring(tree, encoding='utf8', method='xml')
data is a reference to the XML content as a string, but the parse() function expects a filename or file object as its argument. That's why there is an error.
urlhandle is a file object, so tree = ET.parse(urlhandle) should work for you.
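To make the filename-versus-content distinction concrete, here is a small sketch (Python 3 syntax, invented XML) using io.BytesIO to stand in for the urlopen() response object:

```python
import io
import xml.etree.ElementTree as ET

xml_bytes = b'<rss><channel><title>demo</title></channel></rss>'

# parse() wants a filename or file object; fromstring() wants the content itself
tree = ET.parse(io.BytesIO(xml_bytes))
root = tree.getroot()
print(root.find('channel/title').text)

same_root = ET.fromstring(xml_bytes)
print(same_root.tag)
```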
The error message indicates that your code is trying to open a file whose name is stored in the variable source.
It's failing to open that file (IOError) because the variable source contains a bunch of XML, not a file name.