I am trying to read the content of an XML file for parsing using the boto3 library, and I am getting the error below while doing that.
I am using the Python code below.
import xml.etree.ElementTree as et
import boto3
s3 = boto3.resource('s3')
bucket = s3.Bucket('bucket_name')
key = 'audit'
for obj in bucket.objects.filter(Prefix="Folder/XML.xml"):
    key = obj.key
    body = obj.get()['Body'].read()
    parsed_xml = et.fromstring(body)
I am getting the error below when printing the parsed_xml variable or body.
---------------------------------------------------------------------------
NameError Traceback (most recent call last)
in ()
----> 1 parsed_xml
NameError: name 'parsed_xml' is not defined
If I print body in the above code, it should show the content in XML tags.
You have to define parsed_xml outside of the for loop:
parsed_xml = ''
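For example, a minimal sketch of that change applied to the code above (using None instead of an empty string so you can also tell whether the Prefix actually matched anything):

import xml.etree.ElementTree as et
import boto3

s3 = boto3.resource('s3')
bucket = s3.Bucket('bucket_name')

# Define parsed_xml before the loop so it exists even if the filter matches nothing
parsed_xml = None
for obj in bucket.objects.filter(Prefix="Folder/XML.xml"):
    body = obj.get()['Body'].read()
    parsed_xml = et.fromstring(body)

if parsed_xml is not None:
    print(parsed_xml.tag)
else:
    print('No objects matched the Prefix')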
I am brand new at all of this and I am completely lost, even after Googling, watching hours of YouTube videos, and reading posts on this site for the past week.
I am using a Jupyter notebook.
I have a config file with my API keys; it is called config.ipynb.
I have a different file where I am trying to call (I am not sure if this is the correct terminology) my config file so that I can connect to the Twitter API, but I am getting an AttributeError.
Here is my code
import numpy as np
import pandas as pd
import tweepy as tw
import configparser
#Read info from the config file named config.ipynb
config = configparser.ConfigParser()
config.read(config.ipynb)
api_key = config[twitter][API_key]
print(api_key) #to test if I did this correctly
---------------------------------------------------------------------------
AttributeError Traceback (most recent call last)
Cell In [17], line 4
1 #Read info from the config file named config.ipynb
3 config = configparser.ConfigParser()
----> 4 config.read(config.ipynb)
5 api_key = config[twitter][API_key]
AttributeError: 'ConfigParser' object has no attribute 'ipynb'
After fixing my read() mistake I received a MissingSectionHeaderError.
MissingSectionHeaderError: File contains no section headers.
file: 'config.ipynb', line: 1 '{\n'.
The header in my config file is [twitter], but that gives me a NameError saying [twitter] is not defined... I have updated this many times based on my reading, but I always get the same error.
My config.ipynb file code is below:
['twitter']
API_key = "" #key between the ""
API_secret = "" #key between the ""
Bearer_token = "" #key between the ""
Client_ID = "" #key between the ""
Client_Secret = "" #key between the ""
I have tried [twitter], ['twitter'], and ["twitter"], but all of them render a MissingSectionHeaderError.
Per your last comment on Brance's answer, this is probably related to your file path. If the file path is not correct, configparser silently reads nothing and you get a KeyError when you look up the section (the NameError you saw comes from leaving the quotation marks off twitter and API_key).
Tested and working in Jupyter:
Note that no quotation marks are used around twitter in the section header.
# stackoverflow.txt
[twitter]
API_key = 6556456fghhgf
API_secret = afsdfsdf45435
import configparser
import os
# Define file path and make sure path is correct
file_name = "stackoverflow.txt"
# Config file stored in the same directory as the script.
# Get current working directory with os.getcwd()
file_path = os.path.join(os.getcwd(), file_name)
# Confirm that the file exists.
assert os.path.isfile(file_path) is True
# Read info from the config file named stackoverflow.txt
config = configparser.ConfigParser()
config.read(file_path)
# Will raise KeyError if the file path is not correct
api_key = config["twitter"]["API_key"]
print(api_key)
You are using the read() method incorrectly: the input should be the filename as a string, so if your filename is config.ipynb then you need to call the method as
config.read('config.ipynb')
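Putting both fixes together, a minimal sketch of the corrected snippet, assuming config.ipynb really is a plain INI-style text file with a [twitter] section:

import configparser

config = configparser.ConfigParser()
# read() takes the filename as a string
config.read('config.ipynb')

# Section and key names are looked up as strings, so quote them
api_key = config['twitter']['API_key']
print(api_key)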
I have an issue that I hope someone can help me with.
I have a process that saves some images into an S3 bucket.
Then I have a Lambda function, written in Python, that is supposed to create a PDF file displaying these images.
I am using the xhtml2pdf library for that, which I have uploaded to my Lambda environment as a layer.
My first approach was to download the image from the S3 bucket and save it into the Lambda '/tmp' directory, but I was getting this error from xhtml2pdf:
Traceback (most recent call last):
File "/opt/python/xhtml2pdf/xhtml2pdf_reportlab.py", line 359, in __init__
raise RuntimeError('Imaging Library not available, unable to import bitmaps only jpegs')
RuntimeError: Imaging Library not available, unable to import bitmaps only jpegs fileName=
<_io.BytesIO object at 0x7f1eaabe49a0>
Then I thought that if I converted the image to base64 this issue would be solved, but I got the same error.
Can anybody here please give me some guidance on the best way to do this?
Thank you
This is a small piece of my lambda code:
import boto3
import botocore
from xhtml2pdf import pisa

def getFileFromS3(fileKey, fileName):
    try:
        localFileName = f'/tmp/{fileName}'
        bot_utils.log(f'fileKey : {fileKey}')
        bot_utils.log(f'fileName : {fileName}')
        bot_utils.log(f'localFileName : {localFileName}')
        s3 = boto3.client('s3')
        bucketName = 'fileholder'
        s3.download_file(bucketName, fileKey, localFileName)
        return 'data:image/jpeg;base64,' + getImgBase64(localFileName)
    except botocore.exceptions.ClientError as e:
        raise

htmlText = '<table>'
for i in range(0, len(shoppingLines), 2):
    product = shoppingLines[i]
    text = product['text']
    folderName = product['folder']
    tmpFile = getFileFromS3(f"pannings/{folderName}/{product['photo_id']}.jpg", f"{product['photo_id']}.jpg")
    htmlText += f"""<tr><td align="center"><img src="{tmpFile}" width="40" height="55"></td><td>{text}</td></tr>"""
htmlText += '</table>'

result_file = open('/tmp/file.pdf', "w+b")
pisa_status = pisa.CreatePDF(htmlText, dest=result_file)
result_file.close()
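(getImgBase64 is not defined in this snippet; a minimal sketch of such a helper, assuming it simply base64-encodes the downloaded file, might look like this:)

import base64

def getImgBase64(localFileName):
    # Read the downloaded image and return its contents as a base64 string
    with open(localFileName, 'rb') as imageFile:
        return base64.b64encode(imageFile.read()).decode('utf-8')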
For future Google searches:
It seems like the issue is with the PIL/Pillow library.
I found a version of this library in this GitHub repo (https://github.com/keithrozario/Klayers).
When I use this version, it works.
I am trying to use this code to download an image from the given URL
import urllib.request
resource = urllib.request.urlretrieve("http://farm2.static.flickr.com/1184/1013364004_bcf87ed140.jpg")
output = open("file01.jpg","wb")
output.write(resource)
output.close()
However, I get the following error:
TypeError Traceback (most recent call last)
<ipython-input-39-43fe4522fb3b> in <module>()
41 resource = urllib.request.urlretrieve("http://farm2.static.flickr.com/1184/1013364004_bcf87ed140.jpg")
42 output = open("file01.jpg","wb")
---> 43 output.write(resource)
44 output.close()
TypeError: a bytes-like object is required, not 'tuple'
I get that it's the wrong data type for the .write() method, but I don't know how to feed resource into output.
Right. Use urllib.request.urlretrieve like this:
import urllib.request
resource, headers = urllib.request.urlretrieve("http://farm2.static.flickr.com/1184/1013364004_bcf87ed140.jpg")
image_data = open(resource, "rb").read()
with open("file01.jpg", "wb") as f:
    f.write(image_data)
PS: urllib.request.urlretrieve returns a tuple; the first element is the location of a temporary file. You can read the bytes of that temp file and save them to a new file.
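As an aside, urlretrieve can also write straight to a target file if you pass a filename as the second argument, which skips the temp-file step; a minimal sketch:

import urllib.request

# Passing a filename makes urlretrieve save directly to that path
urllib.request.urlretrieve(
    "http://farm2.static.flickr.com/1184/1013364004_bcf87ed140.jpg",
    "file01.jpg",
)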
From the official documentation:
The following functions and classes are ported from the Python 2 module urllib (as opposed to urllib2). They might become deprecated at some point in the future.
So I would recommend using urllib.request.urlopen instead; try the code below:
import urllib.request
resource = urllib.request.urlopen("http://farm2.static.flickr.com/1184/1013364004_bcf87ed140.jpg")
output = open("file01.jpg", "wb")
output.write(resource.read())
output.close()
I have an XML file sitting in S3 and I need to open it from a lambda function and write strings to a DynamoDB table. I am using etree to parse the file. However, I don't think any content is actually getting read from the file. Below is my code, the error, and some sample xml.
Code:
import boto3
import lxml
from lxml import etree

def lambda_handler(event, context):
    output = 'Lambda ran successfully!'
    return output

def WriteItemToTable():
    s3 = boto3.resource('s3')
    obj = s3.Object('bucket', 'object')
    body = obj.get()['Body'].read()
    image_id = etree.fromstring(body.content).find('.//IMAGE_ID').text
    print(image_id)

WriteItemToTable()
Error:
'str' object has no attribute 'content'
XML:
<HOST_LIST>
<HOST>
<IP network_id="X">IP</IP>
<TRACKING_METHOD>EC2</TRACKING_METHOD>
<DNS><![CDATA[i-xxxxxxxxxx]]></DNS>
<EC2_INSTANCE_ID><![CDATA[i-xxxxxxxxx]]></EC2_INSTANCE_ID>
<EC2_INFO>
<PUBLIC_DNS_NAME><![CDATA[xxxxxxxxxxxx]]></PUBLIC_DNS_NAME>
<IMAGE_ID><![CDATA[ami-xxxxxxx]]></IMAGE_ID>
I am trying to pull the AMI ID inside of the <IMAGE_ID> tag.
The content is being read; what you get is just an attribute error. body is already a string and has no content attribute. Instead of fromstring(body.content), just do fromstring(body).
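A minimal corrected version of the function, keeping the placeholder bucket and object names from the question:

import boto3
from lxml import etree

def WriteItemToTable():
    s3 = boto3.resource('s3')
    obj = s3.Object('bucket', 'object')
    # read() already returns the raw XML, so pass it straight to fromstring()
    body = obj.get()['Body'].read()
    image_id = etree.fromstring(body).find('.//IMAGE_ID').text
    print(image_id)

WriteItemToTable()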
This is the code that results in an error message:
import urllib
import xml.etree.ElementTree as ET
url = raw_input('Enter URL:')
urlhandle = urllib.urlopen(url)
data = urlhandle.read()
tree = ET.parse(data)
The error:
I'm new to Python. I did read the documentation and a couple of tutorials, but clearly I have still done something wrong. I don't believe it is the XML file itself, because it does this with two different XML files.
Consider using ElementTree's fromstring():
import urllib
import xml.etree.ElementTree as ET
url = raw_input('Enter URL:')
# http://feeds.bbci.co.uk/news/rss.xml?edition=int
urlhandle = urllib.urlopen(url)
data = urlhandle.read()
tree = ET.fromstring(data)
print ET.tostring(tree, encoding='utf8', method='xml')
data is a reference to the XML content as a string, but the parse() function expects a filename or file object as its argument. That's why there is an error.
urlhandle is a file object, so tree = ET.parse(urlhandle) should work for you.
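For example, a minimal sketch of that variant, reusing the BBC feed URL from the comment above:

import urllib
import xml.etree.ElementTree as ET

# parse() accepts a file object, and urlopen() returns one
urlhandle = urllib.urlopen('http://feeds.bbci.co.uk/news/rss.xml?edition=int')
tree = ET.parse(urlhandle)
print ET.tostring(tree.getroot(), encoding='utf8', method='xml')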
The error message indicates that your code is trying to open a file whose name is stored in the variable source.
It's failing to open that file (IOError) because the variable source contains a bunch of XML, not a file name.