Create XML file - Python

I'm struggling with printing out an XML file.
What I want to do is iterate through my list of test numbers until the list is empty.
But I never get to the point where anything is printed, because I get a syntax error on the print command (the ***-line below).
I would be fine with printing to the console, but right now I'm stuck on the syntax error.
Any ideas on how to fix it?
import lxml.etree
import lxml.builder

E = lxml.builder.ElementMaker()
ITEM = E.item
attributeValue = E.attributeValue
part = E.part

a = 300  # test value
l = [10, 20, 30, 44, 76876, 9009809]  # person numbers
i = 0  # index

while len(l) > 0:
    Document = ITEM(
        attributeValue(str("2019-12-03"), id="1026"),
        attributeValue(str("2019-12-03"), id="1028"),
        attributeValue(str("FATCA Schriftverkehr"), id="1023"),
        attributeValue(str(l[i]), id="1022"),
        attributeValue(str("FATCA"), id="1031"),
        attributeValue(str(a), id="1025", url="Test-URL", contenttype="application/pdf/"),
        part(str(a), type="base"),
        type="1007"
    )
    ***print lxml.etree.tostring(Document, pretty_print=True)
    del l[i]
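For what it's worth, `print value` is Python 2 statement syntax; under Python 3 the print statement was removed, so that line is a SyntaxError no matter what follows it. Calling print as a function fixes it. A minimal sketch of the tail of the loop, assuming Python 3 (note that tostring() returns bytes, so decode() makes the output readable):

    # print is a function in Python 3; tostring() returns bytes, not str
    print(lxml.etree.tostring(Document, pretty_print=True).decode("utf-8"))
    del l[i]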

Related

TypeError: list indices must be integers or slices, not Tag. One of my loops isn't working

The ultimate goal of this is to output selected data columns to a .csv. I had it working once, but it only got the first table on the page and I needed both. I'm quite new to Python and I don't know how I got to this point in the first place. I needed both the call and the put table, but on the web page the calls come first, and when I used .find I only got the calls. I am working on this with a friend who wrote the last two functions; he could get the columns I wanted, but then we only got the calls. I tried to fix it, and now it raises the error in the title.
import bs4
import requests
import pandas as pd
import csv
from bs4 import BeautifulSoup

# sets desired ticker. in the future you could make this long
def ticker():
    ticker = ['GME', 'NYMT']
    return ticker

# creates list of urls for scraper to grab
def ticker_site():
    ticker_site = ['https://finance.yahoo.com/quote/' + x + '/options?p=' + x for x in ticker()]
    return ticker_site

optionRows = []
for i in range(len(ticker_site())):
    optionRows.append([])

def ticker_gets():
    option_page = ticker_site()
    requested_page = requests.get(option_page[i])
    ticker_soup = BeautifulSoup(requested_page.text, 'html.parser')
    return ticker_soup

def soup_search():
    table = ticker_gets()
    both_tables = table.find_all('table')
    call_table = both_tables[0]
    put_table = both_tables[1]
    call_rows = call_table.find('tr')
    put_rows = put_table.find('tr')
    # makes the call table
    for call in call_rows:
        whole_call_table = call.find_all('td')
        call_row = [y.text for y in whole_call_table]
        optionRows[call].append(call_row)
    # makes the put table
    for put in put_rows:
        whole_put_table = put.find_all('td')
        put_row = [z.text for z in whole_put_table]
        optionRows[put].append(put_row)
    for i in range(len(optionRows)):
        optionRows[i] = optionRows[i][1:len(optionRows[i])]
    return optionRows

def getColumns(columnIndexes=[2, 4, 5]):
    newList = []
    for tickerIndex in range(len(soup_search())):
        newList.append([])
        indexCount = 0
        for j in soup_search()[tickerIndex]:
            newList[tickerIndex].append([])
            for i in columnIndexes:
                newList[tickerIndex][indexCount].append(j[i])
            indexCount += 1
    return newList

def csvOutputer():
    rows = getColumns()
    fields = ["Ticker", "Strike", "Bid", "Ask"]
    with open('newcsv', 'w') as f:
        write = csv.writer(f)
        write.writerow(fields)
        for i in range(len(ticker())):
            for j in rows[i]:
                j.insert(0, ticker()[i])
                write.writerow(j)

csvOutputer()
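For what it's worth, the TypeError comes from optionRows[call] (and optionRows[put]): the loop variable is a BeautifulSoup Tag, not an integer, so it cannot be used as a list index. A second likely problem is that find('tr') returns only the first row, so the loop iterates over that row's children rather than over rows; find_all('tr') returns them all. A hedged sketch of soup_search with both issues addressed (it keeps the module-level index i, just as the original ticker_gets already does):

def soup_search():
    soup = ticker_gets()
    call_table, put_table = soup.find_all('table')[:2]
    for table in (call_table, put_table):
        for tr in table.find_all('tr'):          # find_all, not find
            cells = [td.text for td in tr.find_all('td')]
            if cells:                            # header rows hold th, not td
                optionRows[i].append(cells)      # integer index, not a Tag
    return optionRows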

Error with xmltodict

EDIT:
I can print rev['contributor'] for a while, but then every attempt to access rev['contributor'] raises the following:
TypeError: string indices must be integers
ORIGINAL POST:
I'm trying to extract data from an XML file using xmltodict with this code:
import xmltodict, json

with open('Sockpuppet_articles.xml', encoding='utf-8') as xml_file:
    dic_xml = xmltodict.parse(xml_file.read(), xml_attribs=False)
print("parsed")
for page in dic_xml['mediawiki']['page']:
    for rev in page['revision']:
        for user in open("Sockpuppet_names.txt", "r", encoding='utf-8'):
            user = user.strip()
            if 'username' in rev['contributor'] and rev['contributor']['username'] == user:
                dosomething()
I get this error on the last line, at the if-statement:
TypeError: string indices must be integers
The weird thing is that it works on another XML file.
I got the same error when the next level has only one element.
...
## Read XML
pastas = [os.path.join(caminho, name) for name in os.listdir(caminho)]
pastas = filter(os.path.isdir, pastas)
for pasta in pastas:
    for arq in glob.glob(os.path.join(pasta, "*.xml")):
        xmlData = codecs.open(arq, 'r', encoding='utf8').read()
        xmlDict = xmltodict.parse(xmlData, xml_attribs=True)["XMLBIBLE"]
        bible_name = xmlDict["@biblename"]
        list_verse = []
        for xml_inBook in xmlDict["BIBLEBOOK"]:
            bnumber = xml_inBook["@bnumber"]
            bname = xml_inBook["@bname"]
            for xml_chapter in xml_inBook["CHAPTER"]:
                cnumber = xml_chapter["@cnumber"]
                for xml_verse in xml_chapter["VERS"]:
                    vnumber = xml_verse["@vnumber"]
                    vtext = xml_verse["#text"]
...
TypeError: string indices must be integers
The error occurs when the book is "Obadiah", which has only one chapter. Inspecting the CHAPTER value shows the same structure in both cases, so xml_chapter was supposed to come out the same. That is only true when the book has more than one chapter: with a single chapter, CHAPTER is one OrderedDict rather than a list of them, so the loop yields its keys (e.g. "@cnumber") instead of OrderedDicts.
I solved it by converting the OrderedDict to a list when there is only one chapter:
...
if len(xml_inBook["CHAPTER"]) == 2:
    xml_chapter = list(xml_inBook["CHAPTER"].items())
    cnumber = xml_chapter[0][1]
    for xml_verse in xml_chapter[1][1]:
        vnumber = xml_verse["@vnumber"]
        vtext = xml_verse["#text"]
...
I am using Python 3.6.
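A cleaner fix, assuming a reasonably recent xmltodict, is the force_list argument of xmltodict.parse: it makes the named elements always parse as lists, even when they occur only once under their parent, which removes the single-chapter special case (and the analogous single-revision case in the question above). A minimal sketch:

import xmltodict

# force_list guarantees these elements are always wrapped in a list,
# even when they appear exactly once under their parent
xmlDict = xmltodict.parse(xmlData, xml_attribs=True,
                          force_list=('BIBLEBOOK', 'CHAPTER', 'VERS'))["XMLBIBLE"]
for xml_inBook in xmlDict["BIBLEBOOK"]:
    for xml_chapter in xml_inBook["CHAPTER"]:   # always a list now
        cnumber = xml_chapter["@cnumber"]
        for xml_verse in xml_chapter["VERS"]:
            vnumber = xml_verse["@vnumber"]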

ValueError: can only parse strings python

I am trying to gather a bunch of links using XPath, which then need to be scraped from the next page. However, I keep getting the error that it "can only parse strings". I looked at the type of lk, and it was a string after I cast it. What is wrong?
def unicode_to_string(types):
    try:
        types = unicodedata.normalize("NFKD", types).encode('ascii', 'ignore')
        return types
    except:
        return types

def getData():
    req = "http://analytical360.com/access-points"
    page = urllib2.urlopen(req)
    tree = etree.HTML(page.read())
    i = 0
    for lk in tree.xpath('//a[@class="sabai-file sabai-file-image sabai-file-type-jpg "]//@href'):
        print "Scraping Vendor #" + str(i)
        trees = etree.HTML(urllib2.urlopen(unicode_to_string(lk)))
        for ll in trees.xpath('//table[@id="archived"]//tr//td//a//@href'):
            final = etree.HTML(urllib2.urlopen(unicode_to_string(ll)))
You should pass in strings, not the file-like object returned by urllib2.urlopen: etree.HTML expects the markup itself, so call .read() on the response.
Perhaps change the code like so:
trees = etree.HTML(urllib2.urlopen(unicode_to_string(lk)).read())
for i, ll in enumerate(trees.xpath('//table[@id="archived"]//tr//td//a//@href')):
    final = etree.HTML(urllib2.urlopen(unicode_to_string(ll)).read())
Also, you never increment i; enumerate takes care of that.

FBX SDK - Get the FbxNode from an ENodeId of a Character

Is it possible to get the associated FbxNode from the enum ENodeId?
For example, if I want to get the FbxNode for Character::eLeftHand.
I tried to use Character::GetCharacterLink(ENodeId, FbxCharacterLink) and then extract the FbxNode from the FbxCharacterLink simply by reading FbxCharacterLink::mNode.
However, for most ENodeIds this call returns False, so no FbxCharacterLink is filled in.
character = myScene.GetCharacter(0)
lefthand = character.eLeftHand
lefthand_node = FbxCharacterLink()
character.GetCharacterLink(lefthand, lefthand_node) # False
lefthand_node = lefthand_node.mNode
When I was scripting inside MotionBuilder using Python and pyfbxsdk, this was very easy: no matter how the skeleton objects are named, I can get the FBXObject for them
m = character.GetModel(self.BodyNodeObject[o])
and BodyNodeObject is generated with
def BodyNodes(self):
    for i in dir(FBBodyNodeId):
        if i[0] == 'k':
            try:
                self.BodyNodeObject[BodyNodesId[i]] = getattr(FBBodyNodeId, i)
            except:
                pass
BodyNodesId is simply a dictionary
BodyNodesId = OrderedDict({
    'kFBChestNodeId': 'Spine1',
    'kFBHeadNodeId': 'Head',
    'kFBHipsNodeId': 'Hips',
    'kFBLeftAnkleNodeId': 'LeftFoot',
    'kFBLeftCollarNodeId': 'LeftShoulder',
    'kFBLeftElbowNodeId': 'LeftForeArm',
    ...
})
This worked for me:
from fbx import *

for i in range(myscene.GetCharacterCount()):
    character = myscene.GetCharacter(i)
    node = character.eHips
    link = FbxCharacterLink()
    while character.GetCharacterLink(node, link):
        print node, link.mNode.GetName()
        node = character.ENodeId(int(node + 1))

KeyError and TypeError in my python web scraper

Sorry about the vague and confusing title, but there is no better way for me to summarize my problem in one sentence.
I am trying to get the student and grade information from a French website. The link is this: http://www.bankexam.fr/resultat/2014/BACCALAUREAT/AMIENS?filiere=BACS
My code is as follows:
import time
import urllib2
from bs4 import BeautifulSoup

regions = {'R\xc3\xa9sultats Bac Amiens 2014': '/resultat/2014/BACCALAUREAT/AMIENS'}
base_url = 'http://www.bankexam.fr'
tests = {'es': '?filiere=BACES', 's': '?filiere=BACS', 'l': '?filiere=BACL'}

for i in regions:
    for x in tests:
        # create the output file
        output_file = open('/Users/student project/' + i + '_' + x + '.txt', 'a')
        time.sleep(2)  # compassionate scraping
        section_url = base_url + regions[i] + tests[x]  # now goes to the x test page of region i
        request = urllib2.Request(section_url)
        response = urllib2.urlopen(request)
        soup = BeautifulSoup(response, 'html.parser')
        content = soup.find('div', id='zone_res')
        for row in content.find_all('tr'):
            if row.td:
                student = row.find_all('td')
                name = student[0].strong.string.encode('utf8').strip()
                try:
                    school = student[1].strong.string.encode('utf8')
                except AttributeError:
                    school = 'NA'
                result = student[2].span.string.encode('utf8')
                output_file.write('%s|%s|%s\n' % (name, school, result))
        # Find the maximum pages to go through
        if soup.find('div', 'pagination'):
            import re
            page_info = soup.find('div', 'pagination')
            pages = []
            for i in page_info.find_all('a', re.compile('elt')):
                try:
                    pages.append(int(i.string.encode('utf8')))
                except ValueError:
                    continue
            max_page = max(pages)
            # Now goes through page 2 to max page
            for i in range(1, max_page):
                page_url = '&p=' + str(i) + '#anchor'
                section2_url = section_url + page_url
                request = urllib2.Request(section2_url)
                response = urllib2.urlopen(request)
                soup = BeautifulSoup(response, 'html.parser')
                content = soup.find('div', id='zone_res')
                for row in content.find_all('tr'):
                    if row.td:
                        student = row.find_all('td')
                        name = student[0].strong.string.encode('utf8').strip()
                        try:
                            school = student[1].strong.string.encode('utf8')
                        except AttributeError:
                            school = 'NA'
                        result = student[2].span.string.encode('utf8')
                        output_file.write('%s|%s|%s\n' % (name, school, result))
A little more description of the code:
I created a 'regions' dictionary and a 'tests' dictionary because there are 30 other regions I need to collect; I include just one here as a showcase. I'm only interested in the results of three tests (ES, S, L), hence the 'tests' dictionary.
Two errors keep showing up.
One is
KeyError: 2
and it is linked to line 12,
section_url = base_url + regions[i] + tests[x]
The other is
TypeError: cannot concatenate 'str' and 'int' objects
and this is linked to line 10.
I know there is a lot of information here and I'm probably not listing the most important parts, but let me know what I can do to fix this!
Thanks
The issue is that you're using the variable i in more than one place.
Near the top of the file, you do:
for i in regions:
So, in some places i is expected to be a key into the regions dictionary.
The trouble comes when you use it again later. You do so in two places:
for i in page_info.find_all('a',re.compile('elt')):
And:
for i in range(1,max_page):
The second of these is what is causing your exceptions, as the integer values that get assigned to i don't appear in the regions dict (nor can an integer be added to a string).
I suggest renaming some or all of those variables. Give them meaningful names, if possible (i is perhaps acceptable for an "index" variable, but I'd avoid using it for anything else unless you're code golfing).
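Concretely, a minimal sketch of the rename (page_link and page_number are hypothetical names; the loop bodies stay as in the question):

for region in regions:                  # was: for i in regions
    for test in tests:
        output_file = open('/Users/student project/' + region + '_' + test + '.txt', 'a')
        section_url = base_url + regions[region] + tests[test]
        # ... same scraping as before ...
        for page_link in page_info.find_all('a', re.compile('elt')):   # was: for i in ...
            pages.append(int(page_link.string.encode('utf8')))
        for page_number in range(1, max_page):                         # was: for i in ...
            section2_url = section_url + '&p=' + str(page_number) + '#anchor'

With region always bound to a string key, regions[region] and the string concatenation in open() can no longer see an integer, which removes both the KeyError and the TypeError.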
