reformat string of index field when creating RethinkDB index - python

I am trying to create an index in RethinkDB.
The documents look like this:
{ "pinyin" : "a1 ai3"}
To make searching easier, I would like to preprocess the index entries and remove spaces and numbers, so the entry should simply be "aai" in this case. What I tried are various variants of the following:
r.index_create('pinyin', lambda doc: doc['pinyin'].replace("1", "")).run()
This is a most simple case to build from, but even here I get an error
Expected 2 arguments but found 3 in:
r.table('collection').index_create('pinyin', lambda var_7: var_7['pinyin'].replace('1', ''))
It's obvious that I do not understand what's going on. Can anybody help? I gather that the lambda expression has to follow Python syntax, but since it runs on the server, does it have to be JavaScript?

I do not have experience with RethinkDB, but if you would like to remove all digits and spaces from a string value, the snippet below might help.
import re
doc = { "pinyin" : "a1 ai3"}
removeIntSpace = lambda doc: re.sub(r"\d", "", doc['pinyin'].replace(" ", ""))
print(removeIntSpace(doc))
#r.index_create('pinyin', removeIntSpace(doc)).run()
Output:
aai

OK, I figured it out and am posting here for others who might run into the same problem. This is how the index can be created in the data explorer of the rethinkdb admin console:
r.db("dics").table("collection").indexCreate('pinyin', r.row("pinyin").split("").filter(function(char) { return r.expr(["1", "2", "3", "4", " "]).contains(char).not(); }).reduce(function(a, b) { return a.add(b); }))
A corresponding python version would use an anonymous function with lambda in the filter part.
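The split/filter/reduce logic of that index function can be checked locally in plain Python before creating the index. This is only a sketch of the same transformation, not ReQL: the names `STRIPPED` and `index_key` are made up for illustration, and the ReQL line in the trailing comment is an untested translation that assumes the Python driver's `split`, `filter`, `contains`, `not_` and `reduce` terms.

```python
from functools import reduce

# Characters dropped by the index function (same list as in the ReQL query).
STRIPPED = ["1", "2", "3", "4", " "]


def index_key(pinyin):
    """Mimic split("") -> filter(not contains) -> reduce(add) locally."""
    chars = list(pinyin)                                # like .split("")
    kept = [c for c in chars if c not in STRIPPED]      # like .filter(... .not())
    return reduce(lambda a, b: a + b, kept)             # like .reduce(a.add(b))


print(index_key("a1 ai3"))  # aai

# Untested ReQL translation (assumption, needs a running server and `conn`):
# r.db("dics").table("collection").index_create(
#     "pinyin",
#     lambda doc: doc["pinyin"].split("")
#         .filter(lambda ch: r.expr(STRIPPED).contains(ch).not_())
#         .reduce(lambda a, b: a + b)
# ).run(conn)
```

Note that `reduce` raises an error on an empty sequence, so a string made up only of stripped characters would need separate handling.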


Write a custom JSON interpreter for a file that looks like JSON but isn't, using Python

What I need to do is write a module that can read and write files that use the PDX script language. This language looks a lot like JSON but has enough differences that a custom encoder/decoder is needed to do anything with those files (without a mess of regex substitutions, which would make maintenance hell). I originally just read them as text files and used regex find-and-replace to convert them to valid JSON. This led me to my current point, where any addition requires far more code than I would like, just to support some small new thing. With a custom JSON-like parser I could write code that declares what the valid key:value pairs are, then use that to handle the files. To me that would be a lot less code and a lot easier to maintain.
So what does this code look like? In general it looks like this (tried to put all possible syntax, this is not an example of a working file):
#key = value # this is the definition for the scripted variable
key = {
    # This is a comment. No multiline comments
    function # This is a single key, usually optimize_memory
    # These are the accepted key:value pairs. The quoted version is being phased out
    key = "value"
    key = value
    key = #key # This key is using a scripted variable, defined either in the file it's in or in the `scripted_variables` folder. (see above for example on how these are initially defined)
    # type is what the key type is. Like trigger:planet_stability where planet_stability is a trigger
    key = type:key
    # Variables like this allow for custom names to be set. Mostly used for flags and such things
    [[VARIABLE_NAME]
        math_key = $VARIABLE_NAME$
    ]
    # this is inline math, I don't actually understand how this works in the script language yet as it's new. The "<" can be replaced with any math symbol.
    # Valid example: planet_stability < #[ stabilitylevel2 + 10 ]
    key < #[ key + 10 ]
    # This is used a lot to handle code blocks. Valid example:
    # potential = {
    #     exists = owner
    #     owner = {
    #         has_country_flag = flag_name
    #     }
    # }
    key = {
        key = value
    }
    # This is just a list. Inline brackets are used a lot, which annoys me...
    key = { value value }
}
The major differences between JSON and PDX script are the nearly complete lack of quotation marks, an equals sign instead of a colon as the separator, and no commas at the end of lines. Now before you ask me to change the PDX code: I can't. It's not mine. This is what I have to work with, and I can't make any changes to the syntax. And no, I don't want to convert back and forth; as I have already mentioned, this would require a lot of work. I have looked for examples of this, but all I can find are references for converting already valid JSON to a Python object, which is not what I want. So I can't give any examples of what I have already done, as I can't find anywhere to even start.
Some additional info:
Order of key:value pairs does not technically matter, however it is expected to be in a certain order, and when not in that order causes issues with mods and conflict solvers
bool properties always use yes or no rather than true or false
Lowercase is expected and in some cases required
Math operators are used as separators as well, e.g. >=, <=, etc.
The list of syntax is not exhaustive, but should contain most of the syntax used in the language
Past work:
My last attempts at this all revolved around converting it from a text file to a JSON file. This was a lot of work just to get a small piece of this to work.
Example:
potential = {
    exists = owner
    owner = {
        is_regular_empire = yes
        is_fallen_empire = no
    }
    NOR = {
        has_modifier = resort_colony
        has_modifier = slave_colony
        uses_habitat_capitals = yes
    }
}
And this is what I did to get most of the way to JSON (I couldn't find a way to add quotes):
test_string = test_string.replace("\n", ",")
test_string = test_string.replace("{,", "{")
test_string = test_string.replace("{", "{\n")
test_string = test_string.replace(",", ",\n")
test_string = test_string.replace("}, ", "},\n")
test_string = "{\n" + test_string + "\n}"
# Replace the equals sign with a colon
test_string = test_string.replace(" =", ":")
This resulted in this:
{
    potential: {
        exists: owner,
        owner: {
            is_regular_empire: yes,
            is_fallen_empire: no,
        },
        NOR: {
            has_modifier: resort_colony,
            has_modifier: slave_colony,
            uses_habitat_capitals: yes,
        },
    }
}
Very, very close, yes, but I could not find a way to add quotation marks to each word (I think I did try a regex sub, but wasn't able to get it to work, since this whole thing is just one unbroken string). That left this attempt stuck, and it also shows how much work is required just to get a very simple potential block to mostly work. However, this is not the method I want anymore: one, because it's a lot of work, and two, because I couldn't find anything to finish it. So a custom JSON interpreter is what I want.
The classical approach (potentially leading to more code, but also more "correctness"/elegance) is probably to build a "recursive descent parser": a set of conditionals/checks, loops and (sometimes recursive) functions/handlers, each dealing with one kind of element/character encountered on the input stream. An implicit parse/call tree might be sufficient if you directly output/print the JSON equivalent; otherwise you could also build a representation/model in memory for later output/conversion.
A related book recommendation would be "Language Implementation Patterns" by Terence Parr, so that I avoid promoting my own interpreters and introductory materials :-) If you need further help, feel free to write to me.
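To make the recursive descent idea concrete, here is a minimal sketch for a subset of the syntax shown in the question. Everything in it (`tokenize`, `parse`, `parse_entries`, `parse_value`, the token regex) is an illustrative assumption, not an existing PDX library. It treats everything after "#" as a comment, so scripted variables (`#key`), inline math (`#[ ... ]`) and `[[VARIABLE_NAME] ... ]` blocks are deliberately not handled. Entries are kept as an ordered list of `(key, operator, value)` tuples rather than a dict, because keys such as `has_modifier` repeat and the order is expected to be preserved.

```python
import re

# Braces, comparison operators, quoted strings, or bare words.
TOKEN_RE = re.compile(r'[{}]|[<>]=?|=|"[^"]*"|[^\s{}=<>"]+')
OPERATORS = {"=", "<", ">", "<=", ">="}


def tokenize(text):
    tokens = []
    for line in text.splitlines():
        line = line.split("#", 1)[0]  # simplification: "#" always starts a comment
        tokens.extend(TOKEN_RE.findall(line))
    return tokens


def parse(text):
    tokens = tokenize(text)
    entries, pos = parse_entries(tokens, 0)
    if pos != len(tokens):
        raise ValueError("unexpected '}' at token %d" % pos)
    return entries


def parse_entries(tokens, pos):
    """Parse entries until a closing brace or the end of the input."""
    entries = []
    while pos < len(tokens) and tokens[pos] != "}":
        key = tokens[pos]
        if key == "{" or key in OPERATORS:
            raise ValueError("unexpected %r at token %d" % (key, pos))
        pos += 1
        if pos < len(tokens) and tokens[pos] in OPERATORS:
            op = tokens[pos]
            value, pos = parse_value(tokens, pos + 1)
            entries.append((key, op, value))
        else:
            entries.append(key.strip('"'))  # bare word: list item or lone key
    return entries, pos


def parse_value(tokens, pos):
    """Parse either a nested { ... } block or a single scalar token."""
    if pos >= len(tokens) or tokens[pos] == "}" or tokens[pos] in OPERATORS:
        raise ValueError("missing value at token %d" % pos)
    if tokens[pos] == "{":
        entries, pos = parse_entries(tokens, pos + 1)  # recurse into the block
        if pos >= len(tokens):
            raise ValueError("missing closing brace")
        return entries, pos + 1  # skip the "}"
    return tokens[pos].strip('"'), pos + 1


sample = """
potential = {
    exists = owner   # trailing comment
    NOR = {
        has_modifier = resort_colony
        has_modifier = slave_colony
    }
    tags = { a b }
}
"""
print(parse(sample))
```

Writing the files back out would be the mirror image: walk the same nested list and emit `key op value` lines, indenting on each nested block.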

str.format a list by joining its values

Say I have a dictionary:
data = {
"user" : {
"properties" : ["i1", "i2"]
}
}
And the following string:
txt = "The user has properties {user[properties]}"
I want to have:
txt.format(**data)
to equal:
The user has properties i1, i2
I believe to achieve this, I could subclass the formatter used by str.format but I am unfortunately unsure how to proceed. I rarely subclass standard Python classes. Note that writing {user[properties][0]}, {user[properties][1]} is not an ideal option for me here. I don't know how many items are in the list so I would need to do a regex to identify matches, then find the relevant value in data and replace the matched text with {user[properties][0]}, {user[properties][1]}. str.format takes care of all the indexing from the string's value so it is very practical.
Just join the items in data["user"]["properties"]
txt = "The user has properties {properties}"
txt.format(properties = ", ".join(data["user"]["properties"]))
Here you have a live example
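If the template has to stay as `{user[properties]}`, the subclassing route mentioned in the question is also possible. The following is a minimal sketch, assuming Python 3; `JoiningFormatter` is a made-up name. It overrides `string.Formatter.format_field` so that lists and tuples are joined with ", " before formatting, while field lookup (`user[properties]`) is left to the standard machinery.

```python
import string


class JoiningFormatter(string.Formatter):
    """Formatter that renders list/tuple values as comma-separated text."""

    def format_field(self, value, format_spec):
        if isinstance(value, (list, tuple)):
            value = ", ".join(str(item) for item in value)
        return super().format_field(value, format_spec)


data = {"user": {"properties": ["i1", "i2"]}}
txt = "The user has properties {user[properties]}"

result = JoiningFormatter().format(txt, **data)
print(result)  # The user has properties i1, i2
```

The call changes from `txt.format(**data)` to `JoiningFormatter().format(txt, **data)`; the template strings themselves stay untouched.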
I ended up using the jinja2 package for all of my formatting needs. It's extremely powerful and I really recommend it!

Python Dict Transform

I've been having some strange difficulty trying to transform a dataset that I have.
I currently have a dictionary coming from a form as follows:
data['content']['answers']
I would like to have the ['answers'] appended to the first element of a list like so:
data['content'][0]['answers']
However when I try to create it as so, I get an empty dataset.
data['content'] = [data['content']['answers']]
I can't for the life of me figure out what I am doing wrong.
EDIT: Here is the opening JSON
I have:
{
"content" : {
"answers" : {
"3" : {
But I need it to be:
{
"content" : [
{
"answers" : {
"3" : {
thanks
You can do what you want by using a dictionary comprehension (which is one of the most elegant and powerful features in Python.)
In your case, the following should work:
d = {k:[v] for k,v in d.items()}
You mentioned JSON in your question. Rather than rolling your own parser (which it seems like you might be trying to do), consider using the json module.
If I've understood the question correctly, it sounds like you need data['content'] to be equal to a list where each element is a dictionary that was previously contained in data['content']?
I believe this might work (works in Python 2.7 and 3.6):
# assuming that data['content'] is equal to {'answers': {'3':'stuff'}}
data['content'] = [{key:contents} for key,contents in data['content'].items()]
>>> [{'answers': {'3': 'stuff'}}]
The list comprehension preserves each key/value pair that was originally in data['content'] and returns them as a list of single-key dictionaries.
Python 2 doc: https://docs.python.org/2/tutorial/datastructures.html#list-comprehensions
Python 3 doc:
https://docs.python.org/3/tutorial/datastructures.html#list-comprehensions
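For the exact structure shown in the question, wrapping the existing dictionary in a list may already be enough. The attempt in the question indexes into ['answers'] before wrapping, which drops the "answers" key. A small sketch, with "stuff" as placeholder data:

```python
# Placeholder data shaped like the JSON in the question.
data = {"content": {"answers": {"3": "stuff"}}}

# Wrap the whole inner dictionary, not data['content']['answers'].
data["content"] = [data["content"]]

print(data["content"][0]["answers"])  # {'3': 'stuff'}
```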
It would be best if you gave us a concrete example of 'data' (what the dictionary looks like), the code you tried to run, the result you got and what you expected. I think I have an idea but can't be sure.
Your question isn't clear and lacks an explicit example.
By the way, could something like this work for you?
data_list = list()
for content in data.keys():
    data_list.append(data[content])

Find documents which contain a particular value - Mongo, Python

I'm trying to add a search option to my website but it doesn't work. I looked up solutions but they all refer to using an actual string, whereas in my case I'm using a variable, and I can't make those solutions work. Here is my code:
cursor = source.find({'title': search_term}).limit(25)
for document in cursor:
    result_list.append(document)
Unfortunately this only gives back results which match the search_term variable's value exactly. I want it to give back any results where the title contains the search term, regardless of what else the title contains. How can I do it if I want to pass a variable to it, and not an actual string? Thanks.
You can use $regex to do contains searches.
cursor = collection.find({'field': {'$regex':'regular expression'}})
And to make it case insensitive:
cursor = collection.find({'field': {'$regex':'regular expression', '$options':'i'}})
Please try cursor = source.find({'title': {'$regex':search_term}}).limit(25)
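Because the search term comes from a variable (and presumably from user input), any regex metacharacters in it would be interpreted as a pattern. A hedged sketch of building the filter with the term escaped first: `title_contains_query` is a made-up helper, and it assumes that Python's `re.escape` output is acceptable to MongoDB's regex engine for typical input. Only the query dictionary is built here; the `find` call from the question is left as a comment since it needs a live collection.

```python
import re


def title_contains_query(search_term):
    """Build a case-insensitive 'title contains search_term' filter."""
    return {"title": {"$regex": re.escape(search_term), "$options": "i"}}


query = title_contains_query("C++ (2nd ed.)")
print(query)

# With the collection from the question (assumed to be named `source`):
# cursor = source.find(query).limit(25)
```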
$text
You can perform a text search using $text & $search. You first need to set a text index, then use it:
$ db.docs.createIndex( { title: "text" } )
$ db.docs.find( { $text: { $search: "search_term" } } )
$regex
You may also use $regex, as answered here: https://stackoverflow.com/a/10616781/641627
$ db.users.findOne({"username" : {$regex : ".*son.*"}});
Both solutions compared
Full Text Search vs. Regular Expressions
... The regular expression search takes longer for queries with just a
few results while the full text search gets faster and is clearly
superior in those cases.

Syntax Error (FROM) in Python, I do not want to use it as function but rather use it as to print something

I am trying to print out usernames from Instagram. When I type print i.from.username, I get a syntax error because Python treats from as a keyword, which is not what I intend.
for i in a:
    print i.from.username
Is there any way to work around it? I tried building a string instead, but it is still wrong. What I tried was:
for i in a:
    print i+ ".from." +username
Based on the comments:
I'm not trying to put from as a key attribute. What I'm trying to do is collect data from Instagram API.
The a represents the comments, so basically I'm going into the comments to collect the usernames that commented.
"text": "This is #kimsoohyun 's "house" in The Producer!",
"from": {
"username": "lilingchen",
},
If I use i.text, it prints out every comment. Now I want to print the username of each commenter, so I tried i.from.username.
print getattr(i, 'from').username
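The reason this works: `from` is a reserved word, so `i.from` can never be parsed, but `getattr` takes the attribute name as an ordinary string. A self-contained sketch in Python 3 syntax, using `types.SimpleNamespace` objects as stand-ins for the API's comment objects (the real objects come from the Instagram client, which is not shown here):

```python
from types import SimpleNamespace

# Stand-in for one comment object returned by the API (assumption).
user = SimpleNamespace(username="lilingchen")
comment = SimpleNamespace(text="example comment")
setattr(comment, "from", user)  # attribute names are just strings here

a = [comment]
for i in a:
    # i.from.username is a SyntaxError; getattr avoids the keyword.
    print(getattr(i, "from").username)  # lilingchen
```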
