Python - SimpleJSON Issue

I'm working with the Mega API and Python in the hope of producing a folder tree readable by Python. At the moment I'm working with the JSON responses Mega's API gives, but for some reason I'm having trouble parsing them. In the past I would simply use simplejson in the format below, but right now it's not working. At the moment I'm just trying to get the file name. Any help is appreciated!
import simplejson
megaResponseToFileSearch = "(u'BMExefXbYa', {u'a': {u'n': u'A Bullet For Pretty Boy - 1 - Dial M For Murder.mp3'}, u'h': u'BMExXbYa', u'k': (5710166, 21957970, 11015946, 7749654L), u'ts': 13736999, 'iv': (7949460, 15946811, 0, 0), u'p': u'4FlnwBTb', u's': 5236864, 'meta_mac': (529642, 2979591L), u'u': u'xpz_tb-YDUg', u't': 0, 'key': (223xx15874, 642xx8505, 1571620, 26489769L, 799460, 1596811, 559642, 279591L)})"
jsonRespone = simplejson.loads(megaResponseToFileSearch)
print jsonRespone[u'a'][u'n']
ERROR:
Traceback (most recent call last):
  File "D:/Projects/Mega Sync/megasync.py", line 18, in <module>
    jsonRespone = simplejson.loads(file4)
  File "D:\Projects\Mega Sync\simplejson\__init__.py", line 453, in loads
    return _default_decoder.decode(s)
  File "D:\Projects\Mega Sync\simplejson\decoder.py", line 429, in decode
    obj, end = self.raw_decode(s)
  File "D:\Projects\Mega Sync\simplejson\decoder.py", line 451, in raw_decode
    raise JSONDecodeError("No JSON object could be decoded", s, idx)
simplejson.decoder.JSONDecodeError: No JSON object could be decoded: line 1 column 0 (char 0)
EDIT:
I was asked where I got the string from. It's the response to searching for a file using the Mega API. I'm using the module found here: https://github.com/richardasaurus/mega.py
The code itself looks like this:
from mega import Mega
mega = Mega({'verbose': True})
m = mega.login(email, password)
file = m.find('A Bullet For Pretty Boy - 1 - Dial M For Murder.mp3')
print file

What you are getting from m.find is just a Python tuple, whose second element (index 1) is a dictionary:
(u'99M1Tazb',
 {u'a': {u'n': u'test.txt'},
  u'h': u'99M1Tazb',
  u'k': (1145485578, 1435138417, 702505527, 274874292),
  u'ts': 1373482712,
  'iv': (1883603069, 763415510, 0, 0),
  u'p': u'9td12YaY',
  u's': 0,
  'meta_mac': (1091379956, 402442960),
  u'u': u'79_166PAQCA',
  u't': 0,
  'key': (872626551, 2013967015, 1758609603, 127858020, 1883603069, 763415510, 1091379956, 402442960)})
To get the filename, just use:
print file[1]['a']['n']
So, no need to use simplejson at all.
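Note that the string you were trying to parse is not JSON at all: it is the repr of a Python tuple (hence the parentheses, the u'' prefixes, and the L suffixes), which is why simplejson rejects it at character 0. If you ever genuinely have only that repr as a string, a safer route than simplejson would be the standard library's ast.literal_eval; a minimal sketch (the string pasted above contains redacted 'xx' digits, so this would only work with the real output):
import ast

# Parse the repr of the tuple back into a real Python object.
# literal_eval only evaluates literals, so it is safe on untrusted text.
parsed = ast.literal_eval(megaResponseToFileSearch)
print parsed[1]['a']['n']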


Parsing a YAML file with --- separators in Python

How can I parse a file that contains multiple configs separated by --- in Python?
My config file, temp.yaml, looks like this:
%YAML 1.2
---
name: first
cmp:
- Some: first
  top:
    top_rate: 16000
    audio_device: "pulse"
---
name: second
components:
- name: second
  parameters:
    always_on: true
    timeout: 200000
When I read it with
import yaml

with open('./temp.yaml', 'r') as f:
    temp = yaml.load(f)
I am getting the following error:
temp = yaml.load(f)
Traceback (most recent call last):
  File "temp.py", line 4, in <module>
    temp = yaml.load(f)
  File "/home/pranjald/.local/lib/python3.6/site-packages/yaml/__init__.py", line 114, in load
    return loader.get_single_data()
  File "/home/pranjald/.local/lib/python3.6/site-packages/yaml/constructor.py", line 41, in get_single_data
    node = self.get_single_node()
  File "/home/pranjald/.local/lib/python3.6/site-packages/yaml/composer.py", line 43, in get_single_node
    event.start_mark)
yaml.composer.ComposerError: expected a single document in the stream
  in "./temp.yaml", line 3, column 1
but found another document
  in "./temp.yaml", line 10, column 1
Your input is composed of multiple YAML documents. For that you will need yaml.load_all() or better yet yaml.safe_load_all(). (The latter will not construct arbitrary Python objects outside of data-like structures such as list/dict.)
import yaml

with open('temp.yaml') as f:
    temp = yaml.safe_load_all(f)
    # Iterate over temp here, while the file is still open.
As hinted at by the error message, yaml.load() is strict about accepting only a single YAML document.
Note that safe_load_all() returns a generator of Python objects which you'll need to iterate over.
>>> gen = yaml.safe_load_all(f)
>>> next(gen)
{'name': 'first', 'cmp': [{'Some': 'first', 'top': {'top_rate': 16000, 'audio_device': 'pulse'}}]}
>>> next(gen)
{'name': 'second', 'components': [{'name': 'second', 'parameters': {'always_on': True, 'timeout': 200000}}]}
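If you would rather have all the documents at once instead of a generator, you can materialize the list while the file is still open; a minimal sketch against the temp.yaml above:
import yaml

with open('temp.yaml') as f:
    # Consume the generator before the file closes
    docs = list(yaml.safe_load_all(f))

print(docs[0]['name'])  # 'first'
print(docs[1]['name'])  # 'second'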

Why am I receiving this error using BackTrader on Python?

I am trying to learn how to use the backtrader module in Python. I copied the code directly from the website but am receiving an error message.
Here is the website: https://www.backtrader.com/docu/quickstart/quickstart/
I downloaded S&P 500 stock data from Yahoo Finance and saved it through Excel as a file named 'SPY.csv'. Here is the code so far:
from __future__ import (absolute_import, division, print_function,
                        unicode_literals)

import datetime  # For datetime objects
import os.path  # To manage paths
import sys  # To find out the script name (in argv[0])

# Import the backtrader platform
import backtrader as bt

if __name__ == '__main__':
    # Create a cerebro entity
    cerebro = bt.Cerebro()

    # Datas are in a subfolder of the samples. Need to find where the script is
    # because it could have been called from anywhere
    modpath = os.path.dirname(os.path.abspath(sys.argv[0]))
    datapath = os.path.join(modpath, 'C:\\Users\\xboss\\Desktop\\SPY.csv')

    # Create a Data Feed
    data = bt.feeds.YahooFinanceCSVData(
        dataname=datapath,
        # Do not pass values before this date
        fromdate=datetime.datetime(2000, 1, 1),
        # Do not pass values after this date
        todate=datetime.datetime(2000, 12, 31),
        reverse=False)

    # Add the Data Feed to Cerebro
    cerebro.adddata(data)

    # Set our desired cash start
    cerebro.broker.setcash(100000.0)

    # Print out the starting conditions
    print('Starting Portfolio Value: %.2f' % cerebro.broker.getvalue())

    # Run over everything
    cerebro.run()

    # Print out the final result
    print('Final Portfolio Value: %.2f' % cerebro.broker.getvalue())
Here is the error that I am receiving:
C:\Users\xboss\PycharmProjects\BackTraderDemo\venv\Scripts\python.exe C:/Users/xboss/PycharmProjects/BackTraderDemo/backtrader_quickstart.py
Traceback (most recent call last):
  File "C:/Users/xboss/PycharmProjects/BackTraderDemo/backtrader_quickstart.py", line 39, in <module>
    cerebro.run()
  File "C:\Users\xboss\PycharmProjects\BackTraderDemo\venv\lib\site-packages\backtrader\cerebro.py", line 1127, in run
    runstrat = self.runstrategies(iterstrat)
  File "C:\Users\xboss\PycharmProjects\BackTraderDemo\venv\lib\site-packages\backtrader\cerebro.py", line 1212, in runstrategies
    data.preload()
  File "C:\Users\xboss\PycharmProjects\BackTraderDemo\venv\lib\site-packages\backtrader\feed.py", line 688, in preload
    while self.load():
Starting Portfolio Value: 100000.00
  File "C:\Users\xboss\PycharmProjects\BackTraderDemo\venv\lib\site-packages\backtrader\feed.py", line 479, in load
    _loadret = self._load()
  File "C:\Users\xboss\PycharmProjects\BackTraderDemo\venv\lib\site-packages\backtrader\feed.py", line 710, in _load
    return self._loadline(linetokens)
  File "C:\Users\xboss\PycharmProjects\BackTraderDemo\venv\lib\site-packages\backtrader\feeds\yahoo.py", line 129, in _loadline
    dt = date(int(dttxt[0:4]), int(dttxt[5:7]), int(dttxt[8:10]))
ValueError: invalid literal for int() with base 10: '1/29'

Process finished with exit code 1
Does anyone have any suggestions? Any help would be greatly appreciated. Thank you so much for your time!
You get the error because you are using a custom CSV file with the YahooFinanceCSVData feed.
You should import it using the GenericCSVData feed instead:
import backtrader.feeds as btfeed

data = btfeed.GenericCSVData(
    dataname='SPY.csv',
    fromdate=datetime.datetime(2000, 1, 1),
    todate=datetime.datetime(2000, 12, 31),
    nullvalue=0.0,
    dtformat='%Y-%m-%d',
    # The column indices below must match the layout of your CSV file
    datetime=0,
    high=1,
    low=2,
    open=3,
    close=4,
    volume=5,
    openinterest=-1
)
For more information, see the GenericCSVData section of the backtrader documentation.
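One caveat: the ValueError in the traceback fails on the token '1/29', which suggests the dates in your exported file look like 1/29/2000 rather than 2000-01-29 (Excel often rewrites date formats on save). If that is what your file contains, the dtformat argument would have to match it; for example (hypothetical, check your actual file):
dtformat='%m/%d/%Y'  # for dates such as 1/29/2000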

Interpolating R objects into R code strings

I am using rpy2 to run some R commands. Don't ask why; it's necessary at the moment. So here's a part of the code.
import pandas.rpy.common as com
from rpy2.robjects import r

# Load emotionsCART decision tree. Successful.
r_dataframe = com.convert_to_r_dataframe(data)
print type(r_dataframe)
# <class 'rpy2.robjects.vectors.DataFrame'>
r('pred = predict(emotionsCART, newdata = %s)') %(r_dataframe)
What I want to do here is pass this r_dataframe into the calculation: I'm using the decision tree that I loaded earlier to predict the values. But the last line gives me an error. It says:
Traceback (most recent call last):
  File "<pyshell#38>", line 1, in <module>
    r('pred = predict(emotionsCART, newdata = %s)') %(r_dataframe)
  File "C:\Python27\lib\site-packages\rpy2\robjects\__init__.py", line 245, in __call__
    p = rinterface.parse(string)
ValueError: Error while parsing the string.
Any ideas why this is happening?
I think that:
r('pred = predict(emotionsCART, newdata = %s)') %(r_dataframe)
should be:
r('pred = predict(emotionsCART, newdata = %s)' % (r_dataframe))
The % (r_dataframe) was being applied to the result of the r() call, when it should be applied to the format string itself.
But it is hard to check without a reproducible example.
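An alternative that sidesteps string interpolation entirely is to bind the data frame to a name in R's global environment and refer to it by that name. A minimal sketch, assuming emotionsCART is already loaded in the embedded R session (newdata_df is just a hypothetical name):
from rpy2.robjects import r, globalenv

# Expose the converted data frame to R under a name of our choosing
globalenv['newdata_df'] = r_dataframe
# The R code can now reference it directly, with no %-formatting
pred = r('predict(emotionsCART, newdata = newdata_df)')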

BeautifulSoup trips up on meta tags

I have this function to read HTML files saved on the computer:
def get_doc_ondrive(self, mypath):
    the_file = open(mypath, "r")
    line = the_file.readline()
    if (line != "") and (line != None):
        self.soup = BeautifulSoup(line)
    else:
        print "Something is wrong with line:\n\n%r\n\n" % line
        quit()
    print "\t\t------------ line: %r ---------------\n" % line
    while line != "":
        line = the_file.readline()
        print "\t\t------------ line: %r ---------------\n" % line
        if (line != "") and (line != None):
            print "\t\t\tinner if executes: line: %r\n" % line
            self.soup.feed(line)
    self.get_word_vector()
    self.has_doc = True
Doing self.soup = BeautifulSoup(open(mypath,"r")) returns None, but feeding it line by line at least crashes and gives me something to look at.
I edited the functions listed by the traceback in BeautifulSoup.py and sgmllib.py
When I try to run this, I get:
me#GIGABYTE-SERVER:code$ python test_docs.py
in sgml.finish_endtag
in _feed: inDocumentEncoding: None, fromEncoding: None, smartQuotesTo: 'html'
in UnicodeDammit.__init__: markup: '<!DOCTYPE html>\n'
in UnicodeDammit._detectEncoding: xml_data: '<!DOCTYPE html>\n'
in sgmlparser.feed: rawdata: '', data: u'<!DOCTYPE html>\n' self.goahead(0)
------------ line: '<!DOCTYPE html>\n' ---------------
------------ line: '<html dir="ltr" class="client-js ve-not-available" lang="en"><head>\n' ---------------
inner if executes: line: '<html dir="ltr" class="client-js ve-not-available" lang="en"><head>\n'
in sgmlparser.feed: rawdata: u'', data: '<html dir="ltr" class="client-js ve-not-available" lang="en"><head>\n' self.goahead(0)
in sgmlparser.goahead: end: 0,rawdata[i]: u'<', i: 0,literal:0
in sgmlparser.parse_starttag: i: 0, __starttag_text: None, start_pos: 0, rawdata: u'<html dir="ltr" class="client-js ve-not-available" lang="en"><head>\n'
in sgmlparser.goahead: end: 0,rawdata[i]: u'<', i: 61,literal:0
in sgmlparser.parse_starttag: i: 61, __starttag_text: None, start_pos: 61, rawdata: u'<html dir="ltr" class="client-js ve-not-available" lang="en"><head>\n'
------------ line: '<meta http-equiv="content-type" content="text/html; charset=UTF-8">\n' ---------------
inner if executes: line: '<meta http-equiv="content-type" content="text/html; charset=UTF-8">\n'
in sgmlparser.feed: rawdata: u'', data: '<meta http-equiv="content-type" content="text/html; charset=UTF-8">\n' self.goahead(0)
in sgmlparser.goahead: end: 0,rawdata[i]: u'<', i: 0,literal:0
in sgmlparser.parse_starttag: i: 0, __starttag_text: None, start_pos: 0, rawdata: u'<meta http-equiv="content-type" content="text/html; charset=UTF-8">\n'
in sgml.finish_starttag: tag: u'meta', attrs: [(u'http-equiv', u'content-type'), (u'content', u'text/html; charset=UTF-8')]
in start_meta: attrs: [(u'http-equiv', u'content-type'), (u'content', u'text/html; charset=UTF-8')] declaredHTMLEncoding: u'UTF-8'
in _feed: inDocumentEncoding: u'UTF-8', fromEncoding: None, smartQuotesTo: 'html'
in UnicodeDammit.__init__: markup: None
in UnicodeDammit._detectEncoding: xml_data: None
and the Traceback:
Traceback (most recent call last):
  File "test_docs.py", line 28, in <module>
    newdoc.get_doc_ondrive(testeee)
  File "/home/jddancks/Capstone/Python/code/pkg/vectors/DOCUMENT.py", line 117, in get_doc_ondrive
    self.soup.feed(line)
  File "/usr/lib/python2.7/sgmllib.py", line 104, in feed
    self.goahead(0)
  File "/usr/lib/python2.7/sgmllib.py", line 139, in goahead
    k = self.parse_starttag(i)
  File "/usr/lib/python2.7/sgmllib.py", line 298, in parse_starttag
    self.finish_starttag(tag, attrs)
  File "/usr/lib/python2.7/sgmllib.py", line 348, in finish_starttag
    self.handle_starttag(tag, method, attrs)
  File "/usr/lib/python2.7/sgmllib.py", line 385, in handle_starttag
    method(attrs)
  File "/usr/lib/python2.7/dist-packages/BeautifulSoup.py", line 1618, in start_meta
    self._feed(self.declaredHTMLEncoding)
  File "/usr/lib/python2.7/dist-packages/BeautifulSoup.py", line 1172, in _feed
    smartQuotesTo=self.smartQuotesTo, isHTML=isHTML)
  File "/usr/lib/python2.7/dist-packages/BeautifulSoup.py", line 1776, in __init__
    self._detectEncoding(markup, isHTML)
  File "/usr/lib/python2.7/dist-packages/BeautifulSoup.py", line 1922, in _detectEncoding
    '^<\?.*encoding=[\'"](.*?)[\'"].*\?>').match(xml_data)
TypeError: expected string or buffer
so this line
<meta http-equiv="content-type" content="text/html; charset=UTF-8">\n
is somehow causing None to be passed to UnicodeDammit as the markup. Why is this happening?
I just read through the source and I think I understand the problem. Essentially, here’s how BeautifulSoup thinks things are supposed to go:
1. You call BeautifulSoup with the entire markup.
2. It sets self.markup to that markup.
3. It calls _feed on itself, which resets the document and parses it in the initially-detected encoding.
4. While feeding itself, it finds a meta tag that states a different encoding.
5. To use this new encoding, it calls _feed on itself again, which reparses self.markup.
6. After the first _feed, as well as the _feed it recursed into, has finished, it sets self.markup to None. (After all, we’ve parsed everything now; <sarcasm>who could ever need the original markup any more?</sarcasm>)
But the way you’re using it:
1. You call BeautifulSoup with the first line of the markup.
2. It sets self.markup to the first line of the markup and calls _feed.
3. _feed sees no interesting meta tag on the first line, so it finishes successfully.
4. The constructor thinks we’re done parsing, so it sets self.markup back to None and returns.
5. You call feed on the BeautifulSoup object, which goes straight to the SGMLParser.feed implementation, which is not overridden by BeautifulSoup.
6. It sees an interesting meta tag and calls _feed to parse the document in this new encoding.
7. _feed tries to construct a UnicodeDammit object with self.markup.
8. It explodes, since self.markup is None: it assumed it would only be needed during that little chunk of time in the constructor of BeautifulSoup.
Moral of the story is that feed is an unsupported way of sending input to BeautifulSoup. You have to pass it all the input at once.
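For example, here is a minimal rewrite of the function from the question that hands BeautifulSoup the whole document in one call (keeping the BeautifulSoup 3 / Python 2 style of the original):
def get_doc_ondrive(self, mypath):
    # Read the entire file and parse it in a single constructor call
    with open(mypath, "r") as the_file:
        self.soup = BeautifulSoup(the_file.read())
    self.get_word_vector()
    self.has_doc = True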
As for why BeautifulSoup(open(mypath, "r")) returns None, I’ve no idea; I don’t see a __new__ defined on BeautifulSoup, so it seems like it has to return a BeautifulSoup object.
All that said, you might want to look into using BeautifulSoup 4 rather than 3. Here’s the porting guide. In order to support Python 3, it had to remove the dependency on SGMLParser, and I wouldn’t be surprised if during that part of the rewrite whatever bug you’re encountering was fixed.
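In BeautifulSoup 4 the same whole-document pattern would look something like this (a sketch, assuming the bs4 package is installed):
from bs4 import BeautifulSoup

with open(mypath) as the_file:
    soup = BeautifulSoup(the_file.read(), "html.parser")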

How to parse XML in Python with lxml?

Here's my project: I'm graphing weather data from WeatherBug using RRDTool. I need a simple, efficient way to download the weather data from WeatherBug. I was using a terribly inefficient bash-script scraper, then moved on to BeautifulSoup. Its performance is just too slow (this runs on a Raspberry Pi), so I need to use lxml.
What I have so far:
from lxml import etree

doc = etree.parse('weather.xml')
print doc.xpath("//aws:weather/aws:ob/aws:temp")
But I get an error message. Weather.xml is this:
<?xml version="1.0" encoding="UTF-8"?>
<aws:weather xmlns:aws="http://www.aws.com/aws">
  <aws:api version="2.0"/>
  <aws:WebURL>http://weather.weatherbug.com/PA/Tunkhannock-weather.html?ZCode=Z5546&Units=0&stat=TNKCN</aws:WebURL>
  <aws:InputLocationURL>http://weather.weatherbug.com/PA/Tunkhannock-weather.html?ZCode=Z5546&Units=0</aws:InputLocationURL>
  <aws:ob>
    <aws:ob-date>
      <aws:year number="2013"/>
      <aws:month number="1" text="January" abbrv="Jan"/>
      <aws:day number="11" text="Friday" abbrv="Fri"/>
      <aws:hour number="10" hour-24="22"/>
      <aws:minute number="26"/>
      <aws:second number="00"/>
      <aws:am-pm abbrv="PM"/>
      <aws:time-zone offset="-5" text="Eastern Standard Time (USA)" abbrv="EST"/>
    </aws:ob-date>
    <aws:requested-station-id/>
    <aws:station-id>TNKCN</aws:station-id>
    <aws:station>Tunkhannock HS</aws:station>
    <aws:city-state zipcode="18657">Tunkhannock, PA</aws:city-state>
    <aws:country>USA</aws:country>
    <aws:latitude>41.5663871765137</aws:latitude>
    <aws:longitude>-75.9794464111328</aws:longitude>
    <aws:site-url>http://www.tasd.net/highschool/index.cfm</aws:site-url>
    <aws:aux-temp units="&deg;F">-100</aws:aux-temp>
    <aws:aux-temp-rate units="&deg;F">0</aws:aux-temp-rate>
    <aws:current-condition icon="http://deskwx.weatherbug.com/images/Forecast/icons/cond013.gif">Cloudy</aws:current-condition>
    <aws:dew-point units="&deg;F">40</aws:dew-point>
    <aws:elevation units="ft">886</aws:elevation>
    <aws:feels-like units="&deg;F">41</aws:feels-like>
    <aws:gust-time>
      <aws:year number="2013"/>
      <aws:month number="1" text="January" abbrv="Jan"/>
      <aws:day number="11" text="Friday" abbrv="Fri"/>
      <aws:hour number="12" hour-24="12"/>
      <aws:minute number="18"/>
      <aws:second number="00"/>
      <aws:am-pm abbrv="PM"/>
      <aws:time-zone offset="-5" text="Eastern Standard Time (USA)" abbrv="EST"/>
    </aws:gust-time>
    <aws:gust-direction>NNW</aws:gust-direction>
    <aws:gust-direction-degrees>323</aws:gust-direction-degrees>
    <aws:gust-speed units="mph">17</aws:gust-speed>
    <aws:humidity units="%">98</aws:humidity>
    <aws:humidity-high units="%">100</aws:humidity-high>
    <aws:humidity-low units="%">61</aws:humidity-low>
    <aws:humidity-rate>3</aws:humidity-rate>
    <aws:indoor-temp units="&deg;F">77</aws:indoor-temp>
    <aws:indoor-temp-rate units="&deg;F">-1.1</aws:indoor-temp-rate>
    <aws:light>0</aws:light>
    <aws:light-rate>0</aws:light-rate>
    <aws:moon-phase moon-phase-img="http://api.wxbug.net/images/moonphase/mphase01.gif">0</aws:moon-phase>
    <aws:pressure units=""">30.09</aws:pressure>
    <aws:pressure-high units=""">30.5</aws:pressure-high>
    <aws:pressure-low units=""">30.08</aws:pressure-low>
    <aws:pressure-rate units=""/h">-0.01</aws:pressure-rate>
    <aws:rain-month units=""">0.11</aws:rain-month>
    <aws:rain-rate units=""/h">0</aws:rain-rate>
    <aws:rain-rate-max units=""/h">0.12</aws:rain-rate-max>
    <aws:rain-today units=""">0.09</aws:rain-today>
    <aws:rain-year units=""">0.11</aws:rain-year>
    <aws:temp units="&deg;F">41</aws:temp>
    <aws:temp-high units="&deg;F">42</aws:temp-high>
    <aws:temp-low units="&deg;F">29</aws:temp-low>
    <aws:temp-rate units="&deg;F/h">-0.9</aws:temp-rate>
    <aws:sunrise>
      <aws:year number="2013"/>
      <aws:month number="1" text="January" abbrv="Jan"/>
      <aws:day number="11" text="Friday" abbrv="Fri"/>
      <aws:hour number="7" hour-24="07"/>
      <aws:minute number="29"/>
      <aws:second number="53"/>
      <aws:am-pm abbrv="AM"/>
      <aws:time-zone offset="-5" text="Eastern Standard Time (USA)" abbrv="EST"/>
    </aws:sunrise>
    <aws:sunset>
      <aws:year number="2013"/>
      <aws:month number="1" text="January" abbrv="Jan"/>
      <aws:day number="11" text="Friday" abbrv="Fri"/>
      <aws:hour number="4" hour-24="16"/>
      <aws:minute number="54"/>
      <aws:second number="19"/>
      <aws:am-pm abbrv="PM"/>
      <aws:time-zone offset="-5" text="Eastern Standard Time (USA)" abbrv="EST"/>
    </aws:sunset>
    <aws:wet-bulb units="&deg;F">40.802</aws:wet-bulb>
    <aws:wind-speed units="mph">3</aws:wind-speed>
    <aws:wind-speed-avg units="mph">1</aws:wind-speed-avg>
    <aws:wind-direction>S</aws:wind-direction>
    <aws:wind-direction-degrees>163</aws:wind-direction-degrees>
    <aws:wind-direction-avg>SE</aws:wind-direction-avg>
  </aws:ob>
</aws:weather>
I used http://www.xpathtester.com/test to test my xpath and it worked there. But I get the error message:
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "lxml.etree.pyx", line 2043, in lxml.etree._ElementTree.xpath (src/lxml/lxml.etree.c:47570)
  File "xpath.pxi", line 376, in lxml.etree.XPathDocumentEvaluator.__call__ (src/lxml/lxml.etree.c:118247)
  File "xpath.pxi", line 239, in lxml.etree._XPathEvaluatorBase._handle_result (src/lxml/lxml.etree.c:116911)
  File "xpath.pxi", line 224, in lxml.etree._XPathEvaluatorBase._raise_eval_error (src/lxml/lxml.etree.c:116728)
lxml.etree.XPathEvalError: Undefined namespace prefix
This is all very new to me -- Python, XML, and LXML. All I want is the observed time and the temperature.
Do my problems have anything to do with that aws: prefix in front of everything? What does that even mean?
Any help you can offer is greatly appreciated!
The problem has everything "to do with that aws: prefix in front of everything"; it is a namespace prefix, which you have to define when you evaluate the XPath expression. This is easily done, as in:
print doc.xpath('//aws:weather/aws:ob/aws:temp',
                namespaces={'aws': 'http://www.aws.com/aws'})[0].text
The need for this mapping between namespace prefixes and namespace URIs is documented at http://lxml.de/xpathxslt.html.
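Extending the same idea to pull out both things the question asks for, the observed time and the temperature; a small sketch against the weather.xml above (NSMAP is just a local name for the prefix mapping):
from lxml import etree

NSMAP = {'aws': 'http://www.aws.com/aws'}
doc = etree.parse('weather.xml')

# Element text for the temperature, attribute values for the time parts
temp = doc.xpath('//aws:ob/aws:temp', namespaces=NSMAP)[0].text
hour = doc.xpath('//aws:ob/aws:ob-date/aws:hour/@hour-24', namespaces=NSMAP)[0]
minute = doc.xpath('//aws:ob/aws:ob-date/aws:minute/@number', namespaces=NSMAP)[0]
print "Observed at %s:%s, temperature %s F" % (hour, minute, temp)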
Try something like this:
from lxml import etree

ns = etree.FunctionNamespace("http://www.aws.com/aws")
ns.prefix = "aws"

doc = etree.parse('weather.xml')
print doc.xpath("//aws:weather/aws:ob/aws:temp")[0].text
See this link: http://lxml.de/extensions.html
