I have a database with a bunch of regular documents that look something like this (example from wiki):
{
"_id":"some_doc_id",
"_rev":"D1C946B7",
"Subject":"I like Plankton",
"Author":"Rusty",
"PostedDate":"2006-08-15T17:30:12-04:00",
"Tags":["plankton", "baseball", "decisions"],
"Body":"I decided today that I don't like baseball. I like plankton."
}
I'm working in Python with couchdb-python and I want to know if it's possible to add a field to each document. For example, if I wanted to have a "Location" field or something like that.
Thanks!
Regarding IDs
Every document in couchdb has an id, whether you set it or not. Once the document is stored you can access it through the doc._id field.
If you want to set your own ids you'll have to assign the id value to doc._id. If you don't set it, then couchdb will assign a uuid.
If you want to update a document, then you need to make sure you have the same id and a valid revision. If say you are working from a blog post and the user adds the Location, then the url of the post may be a good id to use. You'd be able to instantly access the document in this case.
So what's a revision
In your code snippet above you have the doc._rev element. This is the identifier of the revision. If you save a document with an id that already exists, couchdb requires you to prove that the document is still the valid doc and that you are not trying to overwrite someone else's document.
So how do I update a document
If you have the id of your document, you can just access each document by using the db.get(id) function. You can then update the document like this:
doc = db.get(id)
doc['Location'] = "On a couch"
db.save(doc)
I have an example where I store weather forecast data. I update the forecasts approximately every 2 hours. A separate process is looking for data that I get from a different provider looking at characteristics of tweets on the day.
This looks something like this.
doc = db.get(id)
doc_with_loc = GetLocationInformationFromOtherProvider(doc) # takes about 40 seconds.
doc_with_loc["_rev"] = doc["_rev"]
db.save(doc_with_loc) # This will fail if weather update has also updated the file.
If you have concurring processes, then the _rev will become invalid, so you have to have a failsave, eg. this could do:
doc = db.get(id)
doc_with_loc = GetLocationInformationFromAltProvider(doc)
update_outstanding = true
while update_outstanding:
doc = db.get(id) //reretrieve this to get
doc_with_loc["_rev"] = doc["_rev"]
update_outstanding = !db.save(doc_with_loc)
So how do I get the Ids?
One option suggested above is that you actively set the id, so you can retrieve it. Ie. if a user sets a given location that is attached to a URL, use the URL. But you may not know which document you want to update - or even have a process that finds all the document that don't have a location and assign one.
You'll most likely be using a view for this. Views have a mapper and a reducer. You'll use the first one, forget about the last one. A view with a mapper does the following:
It returns a simplyfied/transformed way of looking at your data. You can return multiple values per data or skip some. It gives the data you emit a key, and if you use the _include_docs function it will give you the document (with _id and rev alongside).
The simplest view is the default view db.view('_all_docs') this will return all documents and you may not want to update all of them. Views for example will be stored as a document as well when you define these.
The next simple way is to have view that only returns items that are of the type of the document. I tend to have a _type="article in my database. Think of this as marking that a document belongs to a certain table if you had stored them in a relational database.
Finally you can filter elements that have a location so you'd have a view where you can iterate over all those docs that still need a location and identify this in a separate process. The best documentation on writing view can be found here.
Related
I have an established connection with a notes database and I am able to loop through all the records in a view. What I am curious about if it is possible to open a document and get the data from it using python. (Like double clicking on a record from an HCL Notes Client).
Here is my code simplified:
import noteslib
db = noteslib.Database('my-domino-server','my-db.nsf', 'mypassword')
view = db.GetView('my_view')
doc = view.GetFirstDocument()
while doc:
print(doc.ColumnValues)
#here after printing the column values, I want to open the document and store it's values in a variable.
doc = view.GetNextDocument(doc)
I tried googling about LotusScript and I found the Open() method, but doc.Open() did not work.
Just use the LotusScript documentation to find examples for everything you need.
In your case you start with the NotesDatabase - class, then get an object of type NotesView and finally get a NotesDocument object.
This doc object does not need to be opened. You can directly access all items in that document either by their name or -if you don't know the name- by cycling through all items.
If you e.g. know the name of an item (can be found in the document properties box on the second tab, found with Alt + Enter) then you can read the value like this:
#Subject of a given mail
subject = doc.GetitemValue( "Subject" )[0]
#Start date of a calendar entry
startdate = doc.GetItemValue( "StartDate" )[0]
# cycle through all items
for item in doc.Items
print(item.Name)
value = item.Values
Take care: items are always arrays, even if they contain only a single value. To get the value of a single value item, always access the element at position 0.
I am using the python library to programmatically create documents in a collection as such:
user = client.query(q.create(q.collection("my_collection"), {
"data": {
"UTC_datetime": str(datetime.now(pytz.UTC)),
"item_one": str(value_one),
"item_two": str(value_two),
"item_three": str(value_three)
}
}))
Upon certain conditions being met the python app executes again.
If item_two on the next app execution has the same value again I do not want a new document to be created.
How do I craft the above query to perform this?
Currently, I am reading the previous document, extracting the value from item_two and performing an if/else statement to either proceed to store a new document or sys.exit().
I'm positive there is a more elegant solution that is based within Fauna's logic instead of Python's, however, I have not been able to achieve this.
You can create a unique index (https://docs.fauna.com/fauna/current/api/fql/indexes) for item_two to ensure that duplicates are not possible. You may also want to an upsert implementation
https://forums.fauna.com/t/multi-document-upsert/488/3
https://forums.fauna.com/t/does-fauna-supports-upserts/208
q.If(
q.Exists(q.Match(q.Index('unique_item_two'), str(value_two))),
q.Update(...),
q.Create(...)
)
I'm currently trying to create a small python program using SolrClient to index some files.
My need is that I want to index some file content and then add some attributes to enrich the document.
I used the post command line tool to index the files. Then I use a python program trying to enrich documents, something like this:
doc = solr.get('collection', id)
doc['new_attribute'] = 'value'
solr.index_json('collection',json.dumps([doc]))
solr.commit(openSearcher=True)
Problem is that I have the feeling that we lost file content index. If I run a query with a word present in all attributes of the doc, I find it.
If I run a query with a word only in the file, it does not work (it works indexing only the file with post without my update tentative).
I'm not sure to understand how to update the doc keeping the index created by the post command.
I hope I'm clear enough, maybe I misunderstood the way it works...
thanks a lot
If I understand correctly, you want to modify an existing record. You should be able to do something like this without using a solr.get:
doc = [{'id': 'value', 'new_attribute':{'set': 'value'}}]
solr.index_json('collection',json.dumps([doc]))
See also:
https://cwiki.apache.org/confluence/display/solr/Updating+Parts+of+Documents
It has worked for me in this way, it can be useful for someone
from SolrClient import SolrClient
solrConect = SolrClient("http://xx.xx.xxx.xxx:8983/solr/")
doc = [{'id': 'my_id', 'count_related_like':{'set': 10}}]
solrConect.index_json("my_collection", json.dumps(doc) )
solrConect.commit("my_collection", softCommit=True)
Trying with Curl did not change anything. I did it differently so now it works. Instead of adding the file with the post command and trying to modify it afterwards, I read the file in a string and index in a "content" field. It means every document is added in one shot.
The content field is defined as not stored, so I just index it.
It works fine and suits my needs. It's also more simple since it removes many attributes set by post command that I don't need.
If I find some time, I'll try again the partial update and update the post.
Thanks
RĂ©mi
I have a simple Firebase that I mostly interact with via Javascript, which works really well. However, I also have a Python program that needs to get data from existing children and put/update data on existing children. I tried python-firebasin, which would do what I want, but it is unreliable (hangs, fails, etc.).
So I'm looking at the python-firebase REST wrapper. This seems efficient, and works well. However, every time I try to post() data, I get not just the data I'm posting, but some kind of unique string paired with it, all inserted as a child.
For example, via Javascript, I might say:
db = new Firebase('https://myfirebase.firebaseio.com/testval/');
db.transaction(function(current) { return 1; });
This would then give me a Firebase that looked like:
|---testval: 1
But when I try to do something similar with the Python Firebase REST wrapper, such as:
db = firebase.FirebaseApplication('https://myfirebase.firebaseio.com/')
db.post('/testval/',1)
My Firebase looks something like this:
|---testval:
|---JI4BiBbICSEAnM9mDXf: 1
In other words, it inserts a new child, gives it a new string, and then appends the data. Is there any way to insert/modify data on my Firebase using the REST wrapper that would do it cleanly like I'm doing with Javascript? Without adding children, without adding these unique strings?
Try this instead:
db.put(1)
db.post() is the equivalent of .push() in the JavaScript API, so it creates a unique ID for you. db.put() is equivalent to .set() and will just set the data, which appears to be what you want.
Note that there is no equivalent for transactions in the REST API, but your example was just using a transaction to do a .set() so hopefully you don't actually need them.
Try this:
db = firebase.FirebaseApplication('https://myfirebase.firebaseio.com/')
db.put('', 'testval', 1)
put takes three arguments : first is url or path, second is the key name or the snapshot name and third is the data(json)
Even with all I do know about the AppEngine datastore, I don't know the answer to this. I'm trying to avoid having to write and run all the code it would take to figure it out, hoping someone already knows the answer.
I have code like:
class AddlInfo(db.Model)
user = db.ReferenceProperty(User)
otherstuff = db.ListProperty(db.Key, indexed=False)
And create the record with:
info = AddlInfo(user=user)
info.put()
To get this object I can do something like:
# This seems excessively wordy (even though that doesn't directly translate into slower)
info = AddlInfo.all().filter('user =', user).fetch(1)
or I could do something like:
class AddlInfo(db.Model)
# str(user.key()) is the key to this record
otherstuff = db.ListProperty(db.Key, indexed=False)
Creation looks like:
info = AddlInfo(key_name=str(user.key()))
info.put()
And then get the info with:
info = AddlInfo.get(str(user.key()))
I don't need the reference_property in the AddlInfo, (I got there using the user object in the first place). Which is faster/less resource intensive?
==================
Part of why I was doing it this way is that otherstuff could be a list of 100+ keys and I only need them sometimes (probably less than 50% of the time) I was trying to make it more efficient by not having to load those 100+ keys on every request.....
Between those 2 options, the second is marginally cheaper, because you're determining the key by inference rather than looking it up in a remote index.
As Wooble said, it's cheaper still to just keep everything on one entity. Consider an Expando if you just need a way to store a bunch of optional, ad-hoc properties.
The second approach is the better one, with one modification: There's no need to use the whole key of the user as the key name of this entity - just use the same key name as the User record.