I'm using pyravendb + RavenDB to store webpages. The main issue is that when the URL is in this form:
http://www.somedomain.com/nicepage.html?stuff=param&id=021345
pyravendb seems to get lost and tries to find an index named 021345 (which obviously doesn't exist).
An example would be the following:
the url
http://www.example.com/ebx/LinkResolverServlet?classofcontent=Standard&id=63935
the query
session.query().where_equals("url",url).select("Id","html","date","metadata")
gives this stack trace:
File "/home/myusername/***********/somepythonfile.py", line 60, in getDocumentbyURL
query_result = list(session.query().where_equals("url",url).select("Id","html","date","metadata"))
File "/usr/local/lib/python3.5/dist-packages/pyravendb/store/session_query.py", line 92, in __iter__
return self._execute_query().__iter__()
File "/usr/local/lib/python3.5/dist-packages/pyravendb/store/session_query.py", line 332, in _execute_query
includes=self.includes)
File "/usr/local/lib/python3.5/dist-packages/pyravendb/d_commands/database_commands.py", line 286, in query
raise exceptions.ErrorResponseException(response["Error"][:100])
pyravendb.custom_exceptions.exceptions.ErrorResponseException: Could not find index named: 63935
Could not find index named: 63935
Which is expected, since there is no index named 63935; pyravendb seems to be mistaking the id parameter inside the URL for part of the query itself.
Any help on how to fix it?
Thank you!
Fixed via a new version of RavenDB!
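If upgrading isn't an option right away, a possible workaround (just a sketch, not the library's documented fix; the url_key field and helper below are hypothetical and assume you control how the documents are stored) is to save a normalized key alongside each document and query on that, so the value passed to where_equals never contains ? or &:

import hashlib

def url_key(url):
    # hypothetical helper: hash the raw URL so the stored key contains no ? or & characters
    return hashlib.md5(url.encode("utf-8")).hexdigest()

# store url_key(url) on each document when saving it, then query on that field:
query_result = list(session.query()
                    .where_equals("url_key", url_key(url))
                    .select("Id", "html", "date", "metadata"))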
I am taking the CS50 class and am currently on Week 7.
Prior to this, Python was working perfectly fine.
Now I am using SQL commands within a Python file in VS Code.
The cs50 module is working fine through the venv.
When I execute the Python file, I should be asked "Title: " so that I can type any title and see the outcome.
I should get the counter as output, which tracks the number of occurrences of the title I typed.
import csv
from cs50 import SQL
db = SQL("C:\\Users\\wf user\\Desktop\\CODING\\CS50\\shows.db")
title = input("Title: ").strip()
#uses a SQL command to return the number of occurrences of the title the user typed.
rows = db.execute("SELECT COUNT(*) AS counter FROM shows WHERE title LIKE ?", title) #? is a placeholder for title.
#db.execute always returns a list of rows even if it's just one row.
#rows[0] is the first (and only) row; it is a dict keyed by column name.
row = rows[0]
#passing the key called counter prints out the value from that row.
print(row["counter"])
I have shows.db in the path.
But the output just prints "Found". It doesn't even ask for a title to input.
PS C:\Users\wf user\Desktop\CODING\CS50> python favoritesS.py
Found
I am expecting the program to ask me "Title: ", but instead it prints "Found".
In CS50, the professor encountered the same problem when he was coding phonebook.py, and the way he solved it was to put the Python file into a separate folder called "tmp".
I tried the same approach, but then I got a long error message:
PS C:\Users\wf user\Desktop\CODING\CS50> cd tmp
PS C:\Users\wf user\Desktop\CODING\CS50\tmp> python favoritesS.py
Traceback (most recent call last):
File "C:\Users\wf user\Desktop\CODING\CS50\tmp\favoritesS.py", line 5, in <module>
db = SQL("C:\\Users\\wf user\\Desktop\\CODING\\CS50\\shows.db")
File "C:\Users\wf user\AppData\Local\Programs\Python\Python311\Lib\site-packages\cs50\sql.py", line 74, in __init__
self._engine = sqlalchemy.create_engine(url, **kwargs).execution_options(autocommit=False, isolation_level="AUTOCOMMIT")
File "<string>", line 2, in create_engine
File "C:\Users\wf user\AppData\Local\Programs\Python\Python311\Lib\site-packages\sqlalchemy\util\deprecations.py", line 309, in warned
return fn(*args, **kwargs)
File "C:\Users\wf user\AppData\Local\Programs\Python\Python311\Lib\site-packages\sqlalchemy\engine\create.py", line 518, in create_engine
u = _url.make_url(url)
File "C:\Users\wf user\AppData\Local\Programs\Python\Python311\Lib\site-packages\sqlalchemy\engine\url.py", line 732, in make_url
return _parse_url(name_or_url)
File "C:\Users\wf user\AppData\Local\Programs\Python\Python311\Lib\site-packages\sqlalchemy\engine\url.py", line 793, in _parse_url
raise exc.ArgumentError(
sqlalchemy.exc.ArgumentError: Could not parse SQLAlchemy URL from string 'C:\Users\wf user\Desktop\CODING\CS50\shows.db'
Here is the proof that the code I posted is the same code I am working on.
When I use Start Debugging under the Run menu in VS Code, it works! But not when I run without debugging.
Is this the library you are using? https://cs50.readthedocs.io/
It may be that one of your intermediate results is not doing what you think it is. I would recommend you put print() statements at every step of the way to see the values of the intermediate variables.
If you have learned how to use a debugger, that is even better.
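Also, the SQLAlchemy traceback ("Could not parse SQLAlchemy URL from string ...") suggests that SQL() expects a database URL rather than a bare Windows path. A minimal sketch, assuming the cs50 library accepts SQLAlchemy-style sqlite URLs (which is what its documentation shows):

from cs50 import SQL

# sqlite URL of the form sqlite:///<path>, not a plain filesystem path;
# this assumes shows.db sits in the same folder as the script
db = SQL("sqlite:///shows.db")

If the database lives elsewhere, an absolute form such as sqlite:///C:/Users/.../shows.db (forward slashes) is the usual SQLAlchemy spelling on Windows, but verify that against the cs50 docs.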
Goal:
The goal is to build a bot on Replit that iteratively scrapes Yahoo Finance pages like this Amazon page and tracks the dynamic 'Volume' data point for abnormally large changes. I'm currently trying to reliably pull this exact data point down, and I have been using the yahoo_fin API to do so. I have also considered using bs4, but I'm not sure whether BS4 can extract dynamic data. (I'd greatly appreciate it if you happen to know the answer to this: can bs4 extract dynamic data?)
Problem:
The script seems to work, but it does not stay online due to what appears to be an error in yahoo_fin. Usually within around 5 minutes of turning the bot on, it throws the following error:
File "/home/runner/goofy/scrape.py", line 13, in fetchCurrentVolume
table = si.get_quote_table(ticker)
File "/opt/virtualenvs/python3/lib/python3.8/site-packages/yahoo_fin/stock_info.py", line 293, in get_quote_table
tables = pd.read_html(requests.get(site, headers=headers).text)
File "/opt/virtualenvs/python3/lib/python3.8/site-packages/pandas/util/_decorators.py", line 311, in wrapper
return func(*args, **kwargs)
File "/opt/virtualenvs/python3/lib/python3.8/site-packages/pandas/io/html.py", line 1098, in read_html
return _parse(
File "/opt/virtualenvs/python3/lib/python3.8/site-packages/pandas/io/html.py", line 926, in _parse
raise retained
File "/opt/virtualenvs/python3/lib/python3.8/site-packages/pandas/io/html.py", line 906, in _parse
tables = p.parse_tables()
File "/opt/virtualenvs/python3/lib/python3.8/site-packages/pandas/io/html.py", line 222, in parse_tables
tables = self._parse_tables(self._build_doc(), self.match, self.attrs)
File "/opt/virtualenvs/python3/lib/python3.8/site-packages/pandas/io/html.py", line 552, in _parse_tables
raise ValueError("No tables found")
ValueError: No tables found
However, this usually happens after a number of tables have already been found.
Here is the fetchCurrentVolume function:
import yahoo_fin.stock_info as si

def fetchCurrentVolume(ticker):
    table = si.get_quote_table(ticker)
    currentVolume = table['Volume']
    return currentVolume
and the API documentation is found above under Goal. Whenever this error message is displayed, the bot exits its @tasks.loop and goes offline. If you know of a way to fix the current use of yahoo_fin, OR any other way to obtain the dynamic data found at this xpath: '//div[@id="quote-summary"]/div/table/tbody/tr', then you will have pulled me out of a three-week-long debacle with this issue! Thank you.
If you are able to retrieve some data and then it cuts out, it is probably due to a rate limit. Try adding a sleep of a few seconds between each request.
see here for how to use sleep
Maybe the web server bonks out when the tables are being rewritten every so often, or something like that.
If you use a try/except, wait a few seconds, and then try again before bailing out to a failure, that might work if it is just a hiccup once in a while:
import yahoo_fin.stock_info as si
import time

def fetchCurrentVolume(ticker):
    try:
        table = si.get_quote_table(ticker)
        currentVolume = table['Volume']
    except Exception:
        # hopefully this was just a hiccup and it will be back up in 5 seconds
        time.sleep(5)
        table = si.get_quote_table(ticker)
        currentVolume = table['Volume']
    return currentVolume
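Combining that with the rate-limit suggestion above, a minimal polling sketch using the fetchCurrentVolume defined here (the 60-second interval and the AMZN ticker are placeholders, not values from the question):

import time

while True:
    volume = fetchCurrentVolume("AMZN")  # placeholder ticker
    print(volume)
    time.sleep(60)  # pause between requests so we don't hammer Yahoo and trip a rate limit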
I have a Neo4j database with nodes that have the labels "Program" and "Session". In the database I've enforced uniqueness constraints on the properties "name" and "href". From :schema:
Constraints
ON (program:Program) ASSERT program.href IS UNIQUE
ON (program:Program) ASSERT program.name IS UNIQUE
ON (session:Session) ASSERT session.name IS UNIQUE
ON (session:Session) ASSERT session.href IS UNIQUE
I want to periodically query another API (thus storing the name and API endpoint href as properties), and only add new nodes when they're not already in the database.
This is how I'm creating the nodes:
newprogram, = graph_db.create(node(name = programname, href = programhref))
newprogram.add_labels('Program')
newsession, = graph_db.create(node(name = sessionname, href = sessionhref))
newsession.add_labels('Session')
I'm running into the following error:
Traceback (most recent call last):
File "/Users/jedc/google-cloud-sdk/platform/google_appengine/lib/webapp2-2.5.2/webapp2.py", line 1535, in __call__
rv = self.handle_exception(request, response, e)
File "/Users/jedc/google-cloud-sdk/platform/google_appengine/lib/webapp2-2.5.2/webapp2.py", line 1529, in __call__
rv = self.router.dispatch(request, response)
File "/Users/jedc/google-cloud-sdk/platform/google_appengine/lib/webapp2-2.5.2/webapp2.py", line 1278, in default_dispatcher
return route.handler_adapter(request, response)
File "/Users/jedc/google-cloud-sdk/platform/google_appengine/lib/webapp2-2.5.2/webapp2.py", line 1102, in __call__
return handler.dispatch()
File "/Users/jedc/google-cloud-sdk/platform/google_appengine/lib/webapp2-2.5.2/webapp2.py", line 572, in dispatch
return self.handle_exception(e, self.app.debug)
File "/Users/jedc/google-cloud-sdk/platform/google_appengine/lib/webapp2-2.5.2/webapp2.py", line 570, in dispatch
return method(*args, **kwargs)
File "/Users/jedc/appfolder/applicationapis.py", line 42, in post
newprogram.add_labels('Program')
File "/Users/jedc/appfolder/py2neo/util.py", line 99, in f_
return f(*args, **kwargs)
File "/Users/jedc/appfolder/py2neo/core.py", line 1638, in add_labels
if err.response.status_code == BAD_REQUEST and err.cause.exception == 'ConstraintViolationException':
AttributeError: 'ConstraintViolationException' object has no attribute 'exception'
My thought was that if I try to add nodes that are already in the database, they just won't be added.
I've put a try/except AttributeError block around the creation/add_labels lines, but when I did that I managed to duplicate everything that was already in the database, even though I had the constraints shown. (?!?) (How can py2neo manage to violate those constraints??)
I'm really confused, and would appreciate any help in figuring out how to add nodes only when they don't already exist.
The problem seems to be that you are first creating nodes without a label and only adding the label afterwards.
That is
graph_db.create(node(name = programname, href = programhref))
and
graph_db.create(node(name = sessionname, href = sessionhref))
This first creates nodes without any labels, which means the constraints do not apply to them, since they only cover nodes with the labels Program and Session.
Once you call newprogram.add_labels('Program') and newsession.add_labels('Session'), Neo4j attempts to add the labels to the nodes and raises an exception because the constraint assertions cannot be met.
Py2neo may be creating duplicate nodes, although I'm sure that if you inspect them you'll find one set of nodes has the labels and the other set does not.
Can you use py2neo in a way that it adds the label at the same time as creation?
Otherwise you could use a Cypher query:
CREATE (program:Program{name: {programname}, href: {programhref}})
CREATE (session:Session{name: {sessionname}, href: {sessionhref}})
Using py2neo, you should be able to do this as suggested in the docs:
graph_db = neo4j.GraphDatabaseService()
qs = '''CREATE (program:Program{name: {programname}, href: {programhref}})
CREATE (session:Session{name: {sessionname}, href: {sessionhref}})'''
query = neo4j.CypherQuery(graph_db, qs)
query.execute(programname=programname, programhref=programhref,
              sessionname=sessionname, sessionhref=sessionhref)
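Since the stated goal is to only add nodes when they're not already in the database, Cypher's MERGE (available from Neo4j 2.0, the same release that introduced these label constraints) may fit better than CREATE; a sketch using the same CypherQuery API as above:

qs = '''MERGE (program:Program {name: {programname}, href: {programhref}})
MERGE (session:Session {name: {sessionname}, href: {sessionhref}})'''
query = neo4j.CypherQuery(graph_db, qs)
query.execute(programname=programname, programhref=programhref,
              sessionname=sessionname, sessionhref=sessionhref)

Note that MERGE matches on the whole pattern, so a node that already exists with the same name but a different href would still trigger a constraint violation rather than being reused.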
Firstly, the stack trace that you've shown highlights a bug that should be fixed in the latest version of py2neo (1.6.4 at the time of writing). There was an issue whereby the error detail dropped an expected "exception" key; this has now been fixed, so upgrading should give you a better error message.
However, this only addresses the error reporting bug. In terms of the constraint question itself, it is correct that the node creation and application of labels are necessarily carried out in two steps. This is due to a limitation in the REST API that does not allow a direct method for creating a node with label detail.
The next version of py2neo will make this easier/possible in a single step via batching. But for now, you probably want to look at a Cypher statement to carry out the creation and labelling as mentioned in the other answer here.
I am trying to make an app similar to StumbleUpon using Python as a back end, for a personal project. From the database I retrieve a website name and then I open that website with webbrowser.open("http://www.website.com"). Sounds pretty straightforward, right? But there is a problem. When I try to open the website with webbrowser.open("website.com"), it returns the following error:
File "fetchall.py", line 18, in <module>
webbrowser.open(x)
File "/usr/lib/python2.6/webbrowser.py", line 61, in open
if browser.open(url, new, autoraise):
File "/usr/lib/python2.6/webbrowser.py", line 190, in open
for arg in self.args]
TypeError: expected a character buffer object
Here is my code:
import sqlite3
import webbrowser
conn = sqlite3.connect("websites.sqlite")
cur = conn.cursor()
cur.execute("SELECT WEBSITE FROM COLUMN")
x = cur.fetchmany(1)
webbrowser.open(x)
EDIT
Okay thanks for the reply, but now I'm receiving this: "Error showing URL: Error stating file '/home/user/(u'http:bbc.co.uk,)': No such file or directory".
What's going on?
webbrowser.open is expecting a character buffer, but fetchmany returns a list. So webbrowser.open(x[0]) should do the trick.
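One note on the EDIT above: fetchmany(1) returns a list of row tuples, so x[0] is still a tuple, which would explain why the follow-up error shows (u'http:bbc.co.uk',) being treated as a file path. A sketch that pulls out the actual string, assuming the URL is the first column of the row:

import sqlite3
import webbrowser

conn = sqlite3.connect("websites.sqlite")
cur = conn.cursor()
cur.execute("SELECT WEBSITE FROM COLUMN")
row = cur.fetchone()           # one row as a tuple, e.g. (u'http://bbc.co.uk',)
if row is not None:
    webbrowser.open(row[0])    # row[0] is the URL string itself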
I am writing a Google App Engine app and I get this KeyError on incoming requests.
From the backtrace, I merely access the headers and that causes the KeyError:
self.request.headers
The entire code snippet is here; I just forward the headers unmodified:
response = fetch(
    "%s%s?%s" % (
        self.getApiServer(),
        self.request.path.replace("/twitter/", ""),
        self.request.query_string
    ),
    self.request.body,
    method,
    self.request.headers,
)
and the get method handling the request and calling proxy():
# handle http get
def get(self, *args):
    parameters = self.convertParameters(self.request.query_string)
    # self.prepareHeader("GET", parameters)
    self.request.query_string = "&".join("%s=%s" % (quote(key), quote(value)) for key, value in parameters.items())
    self.proxy(GET, *args)
def convertParameters(self, source):
    parameters = {}
    for pairs in source.split("&"):
        item = pairs.split("=")
        if len(item) == 2:
            parameters[item[0]] = unquote(item[1])
    return parameters
The error backtrace:
'CONTENT_TYPE'
Traceback (most recent call last):
File "/base/python_runtime/python_lib/versions/1/google/appengine/ext/webapp/__init__.py", line 513, in __call__
handler.post(*groups)
File "/base/data/home/apps/waytosing/1.342850593213842824/com/blogspot/zizon/twitter/RestApiProxy.py", line 67, in post
self.proxy(POST, *args)
File "/base/data/home/apps/waytosing/1.342850593213842824/com/blogspot/zizon/twitter/RestApiProxy.py", line 47, in proxy
self.request.headers,
File "/base/python_runtime/python_lib/versions/1/google/appengine/api/urlfetch.py", line 240, in fetch
allow_truncated, follow_redirects)
File "/base/python_runtime/python_lib/versions/1/google/appengine/api/urlfetch.py", line 280, in make_fetch_call
for key, value in headers.iteritems():
File "/base/python_runtime/python_dist/lib/python2.5/UserDict.py", line 106, in iteritems
yield (k, self[k])
File "/base/python_runtime/python_lib/versions/1/webob/datastruct.py", line 40, in __getitem__
return self.environ[self._trans_name(item)]
KeyError: 'CONTENT_TYPE'
Any idea why it happens or is this a known bug?
This looks weird. The docs mention that response "Headers objects do not raise an error when you try to get or delete a key that isn't in the wrapped header list. Getting a nonexistent header just returns None". It's not clear from the request documentation whether request.headers are also objects of this class, but even if they were regular dictionaries, iteritems seems to be misbehaving. So this might be a bug.
It might be worth inspecting self.request.headers before calling fetch, to see 1) its actual type, 2) its keys, and 3) whether getting self.request.headers['CONTENT_TYPE'] raises an error at that point.
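For example, a quick throwaway debugging block (hypothetical, just for inspection) placed right before the fetch() call inside proxy():

import logging

# temporary debugging: log what the headers object actually is before forwarding it
logging.info("headers type: %r", type(self.request.headers))
logging.info("headers keys: %r", list(self.request.headers.keys()))
try:
    logging.info("CONTENT_TYPE: %r", self.request.headers['CONTENT_TYPE'])
except KeyError:
    logging.info("CONTENT_TYPE is missing from the request headers")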
But, if you simply want to solve your problem and move forward, you can try to bypass it like:
if 'CONTENT_TYPE' not in self.request.headers:
    self.request.headers['CONTENT_TYPE'] = None
(I'm suggesting setting it to None, because that's what a response Header object should return on non-existing keys)
Here's my observation about this problem:
When the content type is application/x-www-form-urlencoded and the POST data is empty (e.g. jquery.ajax GET, Twitter's favorite and retweet API...), the content type is dropped by Google App Engine.
You can add:
self.request.headers.update({'content-type':'application/x-www-form-urlencoded'})
before urlfetch.
Edit: indeed, looking at the error more carefully, it doesn't seem to be related to convertParameters, as the OP points out in the comments. I'm retiring this answer.
I'm not entirely sure what you mean by "just forward the headers unmodified", but have you taken a look at self.request.query_string before and after you call convertParameters? More to the point, you're leaving out any (valid) GET parameters of the form "key=" (that is, keys with empty values).
Maybe your original query_string had a value like "CONTENT_TYPE=", and your convertParameters is stripping it out.
Known issue: http://code.google.com/p/googleappengine/issues/detail?id=3427, with potential workarounds here: http://code.google.com/p/googleappengine/issues/detail?id=2040