Bulk Undelete CouchDB Docs with CouchDB-Python

I accidentally deleted all the docs in a CouchDB database and I want to undelete them.
CouchDB version: 2.2.0
Python: 2.7, using the CouchDB-Python library.
My CouchDB is not running compaction, and 1260 documents are listed in doc_del_count when I GET http://couchip:5984/my_db.
I followed directions here:
Retrieve just deleted document
and adjusted it in Python like so:
docs_to_put_back = []
ids_to_put_back = []

# Collect the id of every document listed in the changes feed
for row in db.changes()['results']:
    ids_to_put_back.append(row['id'])

for id in ids_to_put_back:
    rev = db.get(id, revs=True, open_revs='all')
    current_revision = rev[0]['ok']['_rev']
    current_number = rev[0]['ok']['_revisions']['start']
    rev_id = rev[0]['ok']['_revisions']['ids'][1]  # The previous revision id
    old_doc = db.get(id, rev=str(current_number - 1) + '-' + rev_id)
I started printing old_doc from here, and most of the time it returned None, but some iterations did print the documents I wanted to restore, so I added this to the code:
if old_doc is not None:
    db.save(old_doc, rev=current_revision)
This didn't work, and nothing was restored to my database. Now when I try to look at old revisions of these documents, I can't get anything back for old_doc except None. I tried looping through all the revisions like this:
for id in ids_to_put_back:
    rev = db.get(id, revs=True, open_revs='all')
    current_revision = rev[0]['ok']['_rev']
    current_number = rev[0]['ok']['_revisions']['start']
    rev_list = rev[0]['ok']['_revisions']['ids']
    counter = 1
    # Walk backwards through the revision history until a readable body turns up
    for revision in rev_list[1:]:
        old_doc = db.get(id, rev=str(current_number - counter) + '-' + revision)
        if old_doc is None:
            counter += 1
            continue
        docs_to_put_back.append(old_doc)
        break
docs_to_put_back comes back as an empty list. From my understanding, if my database is not compacting, I should be able to get the old documents as long as I have their old revision numbers. However, from what I've been reading, it seems that may not be the case.
I've also tried putting a document back into the database first and then fetching an old revision with curl, like so:
curl -X PUT http://localhost:5984/db/id
{"ok": true, "id":"id", "rev":""3-b7ff1b0135c051822dd2958aec1a1b9c"}
curl -X GET http://localhost:5984/db/id?rev=2-1301c6dd3257decf978655f553ae8fa4
{"_id":"id", "rev":"2-1301c6dd3257decf978655f553ae8fa4", "_deleted":true}
curl -X GET http://localhost:5984/db/id?rev=1-42283a6b30639b12adddb814ba9ee4dc
{"error":"not_found", "reason":"missing"}
Am I hosed? Does CouchDB not keep access to all the old revisions (which I've read is something of a misnomer)?
This wasn't my best day so if you can please help, that would be awesome! Thank you!

Turns out I am hosed, so no, you can't get those deleted documents back if you wait too long. By default, CouchDB checks your databases every hour and runs compaction if a database is over something like 131 kB. After that happens, all the old revision bodies are "missing."
If only CouchDB had an Undo button for user error....
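For anyone who gets here before compaction has run, here is a minimal sketch of the restore flow discussed above, written against the plain HTTP API with requests instead of CouchDB-Python. The host, database name, and lack of authentication are assumptions; it only works while the pre-delete revision bodies still exist.

import requests

COUCH = 'http://localhost:5984'  # assumption: local server, no auth
DB = 'my_db'                     # assumption: database name

# The changes feed flags deleted docs and gives the rev of the deletion
changes = requests.get('%s/%s/_changes' % (COUCH, DB)).json()
for row in changes['results']:
    if not row.get('deleted'):
        continue
    doc_id = row['id']
    deleted_rev = row['changes'][0]['rev']

    # Fetch the deletion "tombstone" together with its revision history
    tombstone = requests.get(
        '%s/%s/%s' % (COUCH, DB, doc_id),
        params={'rev': deleted_rev, 'revs': 'true'},
    ).json()
    start = tombstone['_revisions']['start']
    ids = tombstone['_revisions']['ids']
    if len(ids) < 2:
        continue  # no earlier revision recorded

    # The revision just before the delete
    prev_rev = '%d-%s' % (start - 1, ids[1])
    old = requests.get('%s/%s/%s' % (COUCH, DB, doc_id),
                       params={'rev': prev_rev})
    if old.status_code != 200:
        continue  # body already compacted away, nothing to restore

    # Write the old body back on top of the tombstone to undelete it
    body = old.json()
    body['_rev'] = deleted_rev
    requests.put('%s/%s/%s' % (COUCH, DB, doc_id), json=body)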

Related

Django - error from call to values_list: 'flat' is not defined

Here's a loop I'm using to process an Excel sheet with openpyxl. On occasion the partno field is missing and I try to derive it from another database table (booking) using a field called transmittal. I'm having trouble with this lookup and it returns an error. When I run the same thing under the Django shell, it works fine.
parser = parse_defrev(book)
brs = []
xmtl_part = {}
invalid = []
for r in parser.process():
    # If no partno, try to look it up in the bookings table
    if r.get('partno') is None:
        xmtl = r.get('transmittal')
        if xmtl:
            r['partno'] = xmtl_part.get(xmtl)
            if not r['partno']:
                parts = set(Booking.objects.filter(bookingtransmittal=r['transmittal']).values_list('bookingpartno', flat=True))
                if len(parts) == 1:
                    r['partno'] = parts.pop()
                    xmtl_part[xmtl] = r['partno']
process is a generator method of the parse_defrev class that yields a dictionary of processed fields for each row of the workbook. If partno is missing, the code first tries to retrieve it from a caching dictionary and then from the booking table. The values_list call returns any matching partno values, and if there is more than one match it just logs the problem and skips to the next row.
I get the following error, and don't understand how it could fail to recognize the flat keyword.
parts = set(Booking.objects.filter(bookingtransmittal=r['transmittal']).values_list('bookingpartno', flat=True))
NameError: name 'flat' is not defined
However, it works in the Django shell. After importing Booking, I do this:
>>> r = dict()
>>> r['transmittal'] = '21/05 01'
>>> parts = set(Booking.objects.filter(bookingtransmittal=r['transmittal']).values_list('bookingpartno', flat=True))
& irrespective of whether there are zero or more fields in the lookup result, it returns the expected result, which is a set containing the part numbers.
Python 3.9.5, Django 3.2.4, Windows client under Visual Studio Code with git bash shell, to CentOS PostgreSQL back end.
There's an easy workaround, shown below, but I'd still like to understand why the flat keyword isn't recognized here.
parts = set(Booking.objects.filter(bookingtransmittal=r['transmittal']).values_list('bookingpartno'))
if len(parts) == 1:
    r['partno'] = parts.pop()[0]
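For reference, this is what the flat keyword changes, and why the workaround needs the [0]: without flat, values_list yields one tuple per row; with flat=True it yields bare values. A quick shell illustration (the part numbers are made up):
>>> Booking.objects.values_list('bookingpartno')
<QuerySet [('P-100',), ('P-200',)]>
>>> Booking.objects.values_list('bookingpartno', flat=True)
<QuerySet ['P-100', 'P-200']>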

How can I efficiently get a filtered list of svn commits in python?

I'm making an application in python that needs to be able to get information (date, author, files changed) about the latest commits to an svn repository.
I've been trying with the svn library and my application can get the information it needs by manually filtering through something like this:
import svn.local

repo = svn.local.LocalClient('/my/svn/repo')
for rel_path, entry in repo.list_recursive():
    revision = entry['commit_revision']
    date = entry['date']
The problem with this is that it iterates through every file in the repo, getting commit info on the file, and the load time is nearly a minute long on a fairly powerful machine.
If it's possible, I'm looking for some way to iterate through a list of commits starting with the latest revision going back N revisions where N will be provided by the user.
According to the documentation there is a log_default method that makes this much easier, and it even has a limit option that can be used to fetch just the latest N entries. For example:
import svn.local

def print_commits(repo, limit=5):
    client = svn.local.LocalClient(repo)
    for commit in client.log_default(limit=limit):
        revision = commit.revision
        date = commit.date
        print("{}:{}".format(date, revision))

print_commits("repo")
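Since the question also asks for the author, note that each entry returned by log_default is a named tuple which, as far as I can tell from the library's source, also carries author and msg attributes; worth verifying against your installed version of the svn package:

for commit in client.log_default(limit=5):
    print("r{} by {}: {}".format(commit.revision, commit.author, commit.msg))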

How to query specific information for an issue?

I'm trying to use the JIRA Python API to retrieve the list of tickets that have been raised in the last 30 days, but whenever I run
Issue = main.jira.issue("PLAT-38592")
i = main.jira.issue(Issue, "Summary")
print(i)
All that gets returned is
PLAT-38592
Then I try to poke at the issue:
Issue = main.jira.issue("PLAT-38592")
print (Issue)
And all that gets returned is
PLAT-38592
I need to be able to receive information from this ticket, but it only returns a string.
Issues are objects. You can access their content through their fields.
If, for example, you want to access the summary, you can use (according to the docs):
issue = jira.issue('PLAT-38592', fields='summary')
summary = issue.fields.summary
print(summary)
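Separately, since the stated goal is the list of tickets raised in the last 30 days, a JQL search is the more direct route. A minimal sketch using the same jira client (the project key PLAT is an assumption based on the issue key above):

# 'created >= -30d' restricts the search to the last 30 days
recent = jira.search_issues('project = PLAT AND created >= -30d', maxResults=50)
for issue in recent:
    print(issue.key, issue.fields.summary)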

django-mongodb-engine can't update an object

I am writing a Django-based back end using django-mongodb-engine for an android app and I'm trying to get data from a PUT request to update a record in my database. I'm getting a username and filtering the database for the user object with that name (successfully), but the save function doesn't seem to be saving the changes. I can tell the changes aren't being saved because when I go onto mLab's online database management tool the changes aren't there.
Here's the code:
existing_user = User.objects.filter(userName = user_name)
if existing_user == None:
    response_string += "<error>User not identified: </error>"
elif (existing_user[0].password != user_pwd):
    response_string += "<error>Password error.</error>"
# if we have a validated user, then manipulate user data
else:
    existing_user[0].star_list.append(new_star)
    existing_user[0].save()
I'm not getting any error messages, but the data remains the same; the star_list remains empty after the above. In fact, as a test, I even tried replacing the else clause above with:
else:
    existing_user[0].userName = "Barney"
    existing_user[0].save()
And following this call, existing_user[0].userName still has its original value ("Fred" rather than "Barney")!
Found an answer to this question. I'm not sure why it wasn't working as posted, but there were problems trying to access the object through the list... in other words, the following didn't work:
existing_user[0].userName = "Barney"
existing_user[0].save()
But this did:
found_user = existing_user[0]
found_user.userName = "Barney"
found_user.save()
Based on my understanding of Python lists, either one should be the same... but maybe some Python genius out there can explain why they aren't?
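A likely explanation, assuming django-mongodb-engine keeps standard Django QuerySet semantics: existing_user is a lazy QuerySet, not a list. Until the queryset has been evaluated and cached, every existing_user[0] issues a fresh query and constructs a brand-new model instance, so the attribute assignment lands on one instance while save() is called on a different, unmodified one. A sketch of the difference:

users = User.objects.filter(userName="Fred")  # lazy QuerySet, not a list

a = users[0]   # query #1 -> instance A
b = users[0]   # query #2 -> instance B, a fresh object
print(a is b)  # False: two distinct instances of the same record

# Mutates instance A, then saves a pristine instance B:
users[0].userName = "Barney"
users[0].save()

# Binding one instance to a name makes both operations hit the same object:
found = users[0]
found.userName = "Barney"
found.save()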

Tumblr API paging bug when fetching followers?

I'm writing a little python app to fetch the followers of a given tumblr, and I think I may have found a bug in the paging logic.
The tumblr I am testing with has 593 followers, and I know the API returns them in blocks of at most 20 per call. After successful authentication, the fetch logic looks like this:
offset = 0
while True:
    response = client.followers(blog, limit=20, offset=offset)
    bunch = len(response["users"])
    if bunch == 0:
        break
    for user in response["users"]:
        print user["name"]
    offset += bunch
What I observe is that on the third call into the API, with offset=40, the first name returned is one I saw in the previous block; it's actually the 38th name. This behavior (seeing one or more names I've seen before) repeats randomly from that point on, though not on every call to the API; some calls give me a fresh 20 names. It's repeatable across multiple test runs, and the sequence I see the names in is the same as on Tumblr's site; I just see many of them twice.
An interesting coincidence is that the total number of non-unique followers returned is the same as the "Followers" count shown on the blog itself (593), but only 516 of them are unique.
For what it's worth, running the query on Tumblr's console page returns the same results regardless of the language I choose, so I'm not inclined to think this is a bug in the PyTumblr client, but something lower, at the API level.
Any ideas?
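No definitive answer on the API behavior here, but if the endpoint really does return overlapping blocks, a cheap client-side defense is to de-duplicate by name while paging. A sketch against the same pytumblr client as above:

seen = set()
offset = 0
while True:
    response = client.followers(blog, limit=20, offset=offset)
    users = response["users"]
    if not users:
        break
    for user in users:
        if user["name"] not in seen:
            seen.add(user["name"])
            print user["name"]
    offset += len(users)

print "%d unique followers" % len(seen)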
