Automate conversion of SQL query to python code - python

I am trying to write a Python script that converts a SQL query into Python code using a regex. Can someone suggest some ideas for how to achieve this in Python?
============SQL Query========
SELECT
alert_id,
Count(star_rating) as total_rating,
Max(star_rating) AS best_rating,
Min(star_rating) AS worst_rating
FROM
alerts
WHERE
verified_purchase = 'Y'
AND review_date BETWEEN '1995-07-22' AND '2015-08-31'
AND country IN
(
'DE','US','UK','FR','JP'
)
GROUP BY
alert_id
ORDER BY
total_rating asc,
alert_id desc,
best_rating
LIMIT 10;
Below is the expected result:
alerts.filter("verified_purchase = 'Y' AND review_date BETWEEN '1995-07-22' AND '2015-08-31' AND country IN ('DE', 'US', 'UK', 'FR', 'JP')")
.groupBy("alert_id")
.agg(count(col("star_rating")).alias('total_rating'),max(col("star_rating")).alias('best_rating'),min(col("star_rating")).alias('worst_rating'))
.select("alert_id","total_rating","best_rating","worst_rating")
.orderBy(col("total_rating").asc(),col("alert_id").desc(),col("best_rating").asc())
.limit(10)

I found a project that does it for SQLAlchemy code.
https://github.com/pglass/sqlitis
Its feature matrix seems to be incomplete, but it's open source, so contributing to the project may be better than writing a converter from scratch (and it seems like a fun and active project).
Definitely don't use regex. I recommend the classic SO answer explaining why not: "RegEx match open tags except XHTML self-contained tags".
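As a rough sketch of the non-regex half of the job: once a real SQL parser has turned the query into a structure, emitting the method chain is plain string building. The dict layout and the to_pyspark helper below are hypothetical, invented only for illustration; they are not part of sqlitis or of any parser, and the parsing step itself is not shown.

```python
def to_pyspark(query):
    """Render an already-parsed query (hypothetical dict layout) as a call chain."""
    quoted = lambda names: ", ".join(f'"{name}"' for name in names)

    lines = [f'{query["table"]}.filter("{query["where"]}")']
    lines.append(f'.groupBy({quoted(query["group_by"])})')

    # Each aggregate is a (function, column, alias) triple.
    aggs = ", ".join(
        f'{func}(col("{column}")).alias("{alias}")'
        for func, column, alias in query["aggregates"]
    )
    lines.append(f".agg({aggs})")

    # Select the grouping columns followed by the aggregate aliases.
    select_cols = query["group_by"] + [alias for _, _, alias in query["aggregates"]]
    lines.append(f".select({quoted(select_cols)})")

    order = ", ".join(f'col("{name}").{direction}()' for name, direction in query["order_by"])
    lines.append(f".orderBy({order})")
    lines.append(f".limit({query['limit']})")
    return "\n".join(lines)


# Hand-written stand-in for what a SQL parser would produce.
query = {
    "table": "alerts",
    "where": "verified_purchase = 'Y' AND country IN ('DE', 'US')",
    "group_by": ["alert_id"],
    "aggregates": [
        ("count", "star_rating", "total_rating"),
        ("max", "star_rating", "best_rating"),
    ],
    "order_by": [("total_rating", "asc"), ("alert_id", "desc")],
    "limit": 10,
}

print(to_pyspark(query))
```

The point of the split is that all the fragile work (nested parentheses, quoted strings, keywords inside literals) stays in the parser, where a regex would struggle.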

Related

Converting company name to ticker

Hey so I have an excel document that has a mapping of company names to their respective tickers. I currently have this function
def ticker(ticker):
    mapping = pd.read_excel('ticker.xlsx', header=3, parse_cols='A,B')
    for index, row in mapping.iterrows():
        if ticker.upper() in row['Name'].upper().split():
            ticker = row['Ticker']
    return ticker
The reason I am using "in" on line 4 instead of "==" is because in the excel document "Apple" is listed as "Apple Inc." and since the user isn't likely to type that I want ticker("apple") to return "AAPL".
In the code above the if statement never gets executed and I was curious on the best possible solution here.
I haven't seen this type of syntax before. It must be nltk syntax.
That being said, I will try to be helpful.
If the in operator works the same way as SQL's IN, then it means exactly equal, meaning 'Apple' in ('Apple Inc') would be false.
You want to do something like if ('Apple Inc' like '%Apple%'),
or perhaps a .Match using regex. That's about the extent to which I can make suggestions, as I don't do Python.
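For reference, Python's in operator is not SQL's IN: on a list it tests whether an element is present, and on a string it tests for a substring. The sketch below shows both, using a small hand-written dict in place of the Excel sheet; find_ticker and the sample mapping are hypothetical names made up for this example.

```python
def find_ticker(name, mapping):
    """Return the ticker whose company name contains `name` as a whole word."""
    wanted = name.upper()
    for company, symbol in mapping.items():
        # Strip punctuation so "Inc." and "Inc" compare the same way.
        words = company.upper().replace(".", "").replace(",", "").split()
        if wanted in words:  # list membership: whole-word match
            return symbol
    return None


# Hypothetical stand-in for the rows read from the spreadsheet.
mapping = {"Apple Inc.": "AAPL", "Microsoft Corporation": "MSFT"}

print(find_ticker("apple", mapping))      # whole word matches
print(find_ticker("app", mapping))        # partial word does not match a list element
print("APP" in "APPLE INC.")              # substring test on a string
print("APP" in "APPLE INC.".split())      # membership test on a list
```

Whole-word matching on the split list avoids accidental hits on fragments, while the plain substring test accepts any fragment of the name.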

Parsing conditional statements

I've written a small utility in Python3 to help me copy my music collection from my NAS to a mobile device. The usefulness of this is that it will auto-convert flac files to ogg-vorbis (to save space) and also exclude some files based on their audio tags (i.e. artist, album, date, etc).
I'm not happy with the limited nature of the exclude feature and I want to improve it but I've hit a mental block and I'm looking for advice on how to proceed.
I would like the user to write an exclude file which will look something like this:
exclude {
    artist is "U2"
    artist is "Uriah Heep" {
        album is "Spellbinder"
        album is "Innocent Victim"
    }
}
This would translate to:
exclude if
(artist = "U2") OR
(artist = "Uriah Heep" AND (album = "Spellbinder" OR album = "Innocent Victim"))
There will be more conditionals such as sub-string matching and date ranges.
I've been checking out PLY but I'm struggling with the concepts of how to parse this type of nested structure and also how to represent the resulting conditional so that I can execute it in code when applying the exclude filter during the copy operation.
Your data structure is almost a dict, so why not just use JSON? To go one better, you could use Lucene Query Syntax.
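A minimal sketch of the JSON route, assuming a made-up layout in which "any" means OR, "all" means AND, and a leaf compares one tag for equality. The key names and the matches function are hypothetical, chosen only for illustration; sub-string matching and date ranges would need extra leaf types.

```python
import json

# The exclude file from the question, rewritten in a hypothetical JSON layout.
RULES_JSON = """
{"any": [
    {"field": "artist", "is": "U2"},
    {"all": [
        {"field": "artist", "is": "Uriah Heep"},
        {"any": [
            {"field": "album", "is": "Spellbinder"},
            {"field": "album", "is": "Innocent Victim"}
        ]}
    ]}
]}
"""


def matches(rule, tags):
    """Evaluate a nested rule against a dict of audio tags."""
    if "any" in rule:  # OR over the child rules
        return any(matches(child, tags) for child in rule["any"])
    if "all" in rule:  # AND over the child rules
        return all(matches(child, tags) for child in rule["all"])
    return tags.get(rule["field"]) == rule["is"]  # leaf: exact comparison


rules = json.loads(RULES_JSON)

print(matches(rules, {"artist": "U2", "album": "War"}))
print(matches(rules, {"artist": "Uriah Heep", "album": "Spellbinder"}))
print(matches(rules, {"artist": "Uriah Heep", "album": "Firefly"}))
```

The same recursive evaluator would work on a tree produced by a PLY grammar; only the front end that builds the nested structure changes.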

Efficient Django full-text search without Haystack

What's the next best option for database-agnostic full-text search for Django without Haystack?
I have a model like:
from django.db import models

class Paper(models.Model):
    title = models.CharField(max_length=1000)

class Person(models.Model):
    name = models.CharField(max_length=100)

class PaperReview(models.Model):
    paper = models.ForeignKey(Paper)
    person = models.ForeignKey(Person)
I need to search for papers by title and reviewer name, but I also want to search from the perspective of a person and find which papers they have and haven't reviewed. With Haystack, it's trivial to implement a full-text index to search by title and name fields, but as far as I can tell, there's no way to do the "left outer join" necessary to find papers without a review by a specific person.
Haystack is just a wrapper that exposes a few different search engine backends:
Solr
ElasticSearch
Whoosh
Xapian
There might be other backends as well available as plugins.
So the real question here is, is there a search backend that gives me the desired functionality, and does haystack expose that functionality?
The answer to that is: you can probably use elasticsearch*, but note the asterisk.
Generally, when creating a search index, it's a good idea to think about the documents the same way you would if you were designing a non-relational (NoSQL) database: you want those documents to be as flat as possible.
So one possibility might be to have an array of char fields on a paperreview index. The array would contain all of the related foreign key references.
Another might be to use "nested documents" in elasticsearch.
And lastly, to use "parent/child documents" in elasticsearch.
You can still use haystack for indexing, with some hacking, but you will probably want to use one of the raw backends directly, such as pyelasticsearch or pyes.
http://www.elasticsearch.org/guide/reference/mapping/nested-type/
http://www.elasticsearch.org/guide/reference/mapping/parent-field/
http://pyelasticsearch.readthedocs.org/en/latest/
http://pyes.readthedocs.org/en/latest/
I know this question is old, but I spent some time investigating this recently (and have also answered it elsewhere). It is actually not too hard to implement this yourself, so I wanted to share.
I found the SearchVector/SearchQuery approach actually does not catch all cases, for example partial words (see https://www.fusionbox.com/blog/detail/partial-word-search-with-postgres-full-text-search-in-django/632/ for reference). You can implement your own without much trouble, depending on your constraints.
Example, within a viewset's get_queryset method:
# imports at module level
import re
from functools import reduce
from django.db.models import Q

...other params...
search_terms = self.request.GET.get('q')
if search_terms:
    # remove possible other delimiters and other chars
    # that could interfere
    cleaned_terms = re.sub(r'[!\'()|&;,]', ' ', search_terms).strip()
    if cleaned_terms:
        # Check against all the params we want
        # apply to previous terms' filtered results
        q = reduce(
            lambda p, n: p & n,
            map(
                lambda word:
                    Q(your_property__icontains=word)
                    | Q(second_property__icontains=word)
                    | Q(third_property__icontains=word),
                cleaned_terms.split()
            )
        )
        qs = YourModel.objects.filter(q)
return qs
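To see what that reduce/map expression computes without a database, here is a plain-Python analogue of the same logic: every search word must appear (case-insensitively) in at least one of the chosen fields. The function name, the record, and the field names are hypothetical; the real filtering is done in SQL by the Q objects above.

```python
import re


def matches_all_terms(search_terms, record, fields):
    """Return True if every word occurs in at least one of the given fields."""
    # Same cleaning step as above: replace delimiters with spaces.
    cleaned_terms = re.sub(r"[!'()|&;,]", " ", search_terms).strip()
    words = cleaned_terms.split()
    if not words:
        return True  # no usable terms means no filtering
    # AND across words, OR across fields (mirrors reduce(&) over Q(...) | Q(...)).
    return all(
        any(word.lower() in str(record[field]).lower() for field in fields)
        for word in words
    )


record = {"title": "Deep Learning for Search", "author": "Ada Lovelace"}

print(matches_all_terms("deep, ada", record, ["title", "author"]))
print(matches_all_terms("deep turing", record, ["title", "author"]))
```

Because each word is matched with a substring test, partial words such as "learn" also match, which is the case the partial-word article linked above is concerned with.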
I use Haystack + Elasticsearch and so far it's working pretty well. I don't think it's trivial. You can easily implement your requirement if there's an association between paper and person.
I ended up using djorm-ext-pgfulltext, which provides a simple Django interface for PostgreSQL's built-in full text search features.

QuerySet: LEFT JOIN with AND

I use the old Django version 1.1 with a hack that supports joins in extra(). It works, but now it is time for a change. Django 1.2 uses RawQuerySet, so I've rewritten my code for that solution. The problem is that RawQuerySet doesn't support filters etc., and I have many of them in my code.
Digging through Google, I found on CaktusGroup that I could use query.join().
That would be great, but in my code I have:
LEFT OUTER JOIN "core_rating" ON
    ("core_film"."parent_id" = "core_rating"."parent_id"
    AND "core_rating"."user_id" = %i)
In query.join() I've written the first part, "core_film"."parent_id" = "core_rating"."parent_id", but I don't know how to add the second part after the AND.
Is there any solution for Django that lets me use custom JOINs without rewriting all the filter code as raw SQL?
This is our current fragment of code in extra()
top_films = top_films.extra(
    select=dict(guess_rating='core_rating.guess_rating_alg1'),
    join=['LEFT OUTER JOIN "core_rating" ON ("core_film"."parent_id" = "core_rating"."parent_id" and "core_rating"."user_id" = %i)' % user_id] + extra_join,
    where=['core_film.parent_id in (select parent_id from core_film EXCEPT select film_id from filmbasket_basketitem where "wishlist" IS NOT NULL and user_id=%i)' % user_id,
           '( ("core_rating"."type"=1 AND "core_rating"."rating" IS NULL) OR "core_rating"."user_id" IS NULL)',
           ' "core_rating"."last_displayed" IS NULL'],
)
Unfortunately, the answer here is no.
The Django ORM, like most of Django, follows a philosophy that easy things should be easy and hard things should be possible. In this case, you are definitely in the "hard things" area and the "possible" solution is to simply write the raw query. There are definitely situations like this where writing the raw query can be difficult and feels kinda gross, but from the project's perspective situations like this are too rare to justify the cost of adding such functionality.
Try this patch: https://code.djangoproject.com/ticket/7231
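For anyone wondering why the user_id condition has to sit inside the ON clause rather than in a WHERE filter, here is a small sketch using the standard-library sqlite3 module. The two tables are heavily simplified, hypothetical versions of core_film and core_rating, with made-up rows.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE core_film (parent_id INTEGER PRIMARY KEY, title TEXT);
    CREATE TABLE core_rating (parent_id INTEGER, user_id INTEGER, rating INTEGER);
    INSERT INTO core_film VALUES (1, 'A'), (2, 'B');
    INSERT INTO core_rating VALUES (1, 10, 5), (2, 20, 3);
""")

user_id = 10

# Condition inside ON: every film is kept; films without a rating
# from this user get NULL in the rating column.
on_rows = conn.execute("""
    SELECT f.parent_id, r.rating
    FROM core_film f
    LEFT OUTER JOIN core_rating r
        ON f.parent_id = r.parent_id AND r.user_id = ?
    ORDER BY f.parent_id
""", (user_id,)).fetchall()

# Condition in WHERE: rows where r.user_id is NULL or belongs to
# another user are discarded after the join.
where_rows = conn.execute("""
    SELECT f.parent_id, r.rating
    FROM core_film f
    LEFT OUTER JOIN core_rating r ON f.parent_id = r.parent_id
    WHERE r.user_id = ?
    ORDER BY f.parent_id
""", (user_id,)).fetchall()

print(on_rows)     # [(1, 5), (2, None)]
print(where_rows)  # [(1, 5)]
```

This is why a plain filter() on the rating's user cannot stand in for the custom join: it would drop the unrated films that the query is trying to find.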

Implementing "Starts with" and "Ends with" queries with Google App Engine

I am wondering if anyone can provide some guidance on how I might implement a "starts with" or "ends with" query against a Datastore model using Python?
In pseudo code, it would work something like...
Query for all entities A where property P starts with X
or
Query for all entities B where property P ends with X
Thanks, Matt
You can do a 'starts with' query by using inequality filters:
MyModel.all().filter('prop >=', prefix).filter('prop <', prefix + u'\ufffd')
Doing an 'ends with' query would require storing the reverse of the string, then applying the same tactic as above.
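The range trick can be tried out without the Datastore. The sketch below imitates an ordered index with a sorted Python list and the bisect module; starts_with and ends_with are hypothetical helper names, and the sample values are made up.

```python
import bisect


def starts_with(sorted_values, prefix):
    """Return values v with prefix <= v < prefix + U+FFFD (a range scan)."""
    low = bisect.bisect_left(sorted_values, prefix)
    high = bisect.bisect_left(sorted_values, prefix + "\ufffd")
    return sorted_values[low:high]


def ends_with(values, suffix):
    """Emulate 'ends with' by indexing reversed strings and prefix-scanning them."""
    reversed_index = sorted(value[::-1] for value in values)
    hits = starts_with(reversed_index, suffix[::-1])
    return sorted(hit[::-1] for hit in hits)


names = sorted(["alpha", "beta", "betamax", "bets", "gamma"])

print(starts_with(names, "bet"))  # ['beta', 'betamax', 'bets']
print(ends_with(names, "a"))      # ['alpha', 'beta', 'gamma']
```

In the Datastore the reversed string would be stored as an extra property at write time, so that the inequality filters can run against its index.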
Seems you can't do it for the general case, but can do it for prefix searches (starts with):
Wildcard search on Appengine in python
