Capturing information for customers such as referral URL and conversion - Python

I was hoping to create my own in-house analytics so I can tell my customers how many visits their company page got on my site and which URL they came from. I am coding this in Python (Flask), and I wondered if anyone could tell me what the standard, or at least a sensible, approach to this problem is.
I think it might be to have some sort of Redis queue that is triggered when a visitor arrives, with the information added to the database later so the site doesn't seem slow.

The standard, and sensible, approach is to use Google Analytics. If you must roll your own, you have two options: JavaScript that executes on every page (like GA) and records this kind of info in a database, or parsing the log files on the server. AWStats is a good bet for the latter.
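If you do roll your own in Flask, the Redis-queue idea from the question could look something like this minimal sketch. It assumes a local Redis server and a separate worker process; the "visits" queue name and the helper function are illustrative, not a standard API.

    import json
    import redis
    from flask import Flask, request

    app = Flask(__name__)
    r = redis.Redis()

    @app.before_request
    def capture_visit():
        # Enqueue only; pushing onto a Redis list is O(1), so the
        # request stays fast and persistence happens elsewhere.
        r.rpush("visits", json.dumps({
            "path": request.path,
            "referrer": request.referrer,      # URL the visitor came from
            "remote_addr": request.remote_addr,
        }))

    # In a separate worker process:
    def drain_visits(db_save):
        while True:
            _, raw = r.blpop("visits")   # blocks until a visit is queued
            db_save(json.loads(raw))     # write to your analytics table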

Related

Connecting Two Websites with Coding

I am new to website development; my issue is figuring out what code I need so that my two websites talk to one another. My two websites are built like an Amazon-style platform, where one site is for clients to input their content and the other is for consumers to browse and purchase. I have both built, but need to figure out what code I need so that when a client enters new content or edits content, it is reflected on the consumer site.
Thank you for any advice or coding possibilities.
Elizabeth
I can't show you any code, but you need something like a database. If the websites are on the same server and both have access to the same folders, you could in principle use a plain file, but I would recommend a database. I don't know much about databases myself, but I am quite sure it would work. Have a look on YouTube for how to work with databases. I think you also need a bit of knowledge of JavaScript, and depending on the database you choose, MySQL or whatever query language that database uses. I hope this helped; if not, just reply to this answer.
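For instance, here is a minimal sketch of two sites sharing one database (SQLite here just to keep the example self-contained; the table and function names are made up for illustration). The client site writes, the consumer site reads, and no direct site-to-site "talking" is needed.

    import sqlite3

    DB_PATH = "shared_catalog.db"  # both sites point at the same database

    def save_listing(title, price):
        # Called by the client-facing site when content is added or edited.
        con = sqlite3.connect(DB_PATH)
        con.execute("CREATE TABLE IF NOT EXISTS listings (title TEXT, price REAL)")
        con.execute("INSERT INTO listings VALUES (?, ?)", (title, price))
        con.commit()
        con.close()

    def browse_listings():
        # Called by the consumer-facing site to show the current content.
        con = sqlite3.connect(DB_PATH)
        rows = con.execute("SELECT title, price FROM listings").fetchall()
        con.close()
        return rows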

Can Pybossa be used as a micro tasking / contest platform?

I was wondering if it is possible to use Pybossa as a micro-tasking / contest platform?
I am looking for something where I can register users and get them to complete micro-tasks such as Twitter upvotes, retweets and comments, as well as the same for Reddit, YouTube and similar sites.
One platform that currently does something similar is vyper.io.
I was looking for an open-source alternative that I can customise myself.
Can Pybossa do this? Or if not, do you know of something else similar that can?
Thank you
You can do that with PYBOSSA. Basically, PYBOSSA lets you work with any type of data, thanks to its JSON task storage: you can put in images, audio, web maps, anything that can be rendered on the web, and then ask users to complete those micro-tasks.
PYBOSSA has webhook APIs, allowing you to react in real time to the feedback users send you. Imagine you are asking people to upvote a given image and you want at least 10 people to participate. When the tenth person sends their feedback, PYBOSSA will notify you via the webhook that this task has been completed. Another micro-service could then pick it up, do some statistical analysis and report that, say, 8 out of 10 agree this is the best image. You can then push that info to any other service like Twitter, Facebook, etc.
I hope it helps you.
NOTE: I'm the creator of PYBOSSA.
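To give an idea, here is a minimal sketch of a micro-service receiving such a webhook, assuming Flask on the receiving end. The payload field names below are illustrative, not a guaranteed schema; check the PYBOSSA docs for the exact format.

    from flask import Flask, request

    app = Flask(__name__)

    @app.route("/pybossa-webhook", methods=["POST"])
    def task_completed():
        event = request.get_json()
        task_id = event.get("task_id")              # assumed field name
        project = event.get("project_short_name")   # assumed field name
        # Fetch the task's results via the PYBOSSA API, aggregate the
        # votes, then push the outcome to Twitter, Facebook, etc.
        print("Task %s in project %s is complete" % (task_id, project))
        return "", 200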

Simple API's to play around with for Python/Django?

Does anyone know of simple and well-documented APIs with plenty of hand-holding examples that assume very little or no prior knowledge of web development?
I've been messing around with Pyfacebook and Facebook-Python-SDK, trying to create a simple photo display app, but I haven't been able to make much headway after spending the last few days on it. The main reason is simply that I wasn't able to find a good tutorial that walks me through all the steps. So I'm putting this mini project on pause and looking for lower-hanging fruit.
In terms of skill level, I'm pretty ok on the basics of Python and Django.
Update
I've done the tutorials at http://www.djangoproject.com/ already. I'm really looking for ideas and suggestions on webapp projects that utilise an API, e.g. a Twitter app that displays a user's most frequently used keywords in a tag cloud.
Update2
Side note: Having messed around with Twitter's API for a bit, I would definitely recommend starting with Twitter rather than Facebook. It's easier and better documented.
The best place to start is with the tutorials on djangoproject.com.
Have you tried the Django tutorial? It is pretty basic, but touches on all the important points required to develop your own basic app.
django-basic-apps contains a collection of apps you might enjoy reading.
Edit: Check out this good list of web services I found. :)
As far as I know you can't write Facebook apps with Django; Facebook uses its own API, and the two are completely different things.
And for the Twitter API thing, I have an idea: develop a Django app that can scrape and back up tweets.
The scenario: during any FOSS conference, people use a #hashtag to identify tweets related to that conference, but after some time those tweets stop showing up even in search. For example, we used the #inpycon2010 tag for the PyCon conference in India, but now when I search for that tag, nothing shows up.
So what you could do is allow users to register a hashtag and set a time interval. Within that time interval your app should scrape all the tweets and back them up, and the user should be able to retrieve them later.
If you start this as a FOSS project, I'm ready to jump in :)
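A rough sketch of the scraping half, using tweepy against Twitter's v2 recent-search API: a bearer token is required, and recent search only covers roughly the last week, which is exactly why archiving as you go matters. The token and database names are placeholders.

    import sqlite3
    import tweepy

    client = tweepy.Client(bearer_token="YOUR_BEARER_TOKEN")  # placeholder

    def archive_hashtag(tag, db_path="tweets.db"):
        con = sqlite3.connect(db_path)
        con.execute("CREATE TABLE IF NOT EXISTS tweets (id TEXT PRIMARY KEY, text TEXT)")
        resp = client.search_recent_tweets(query=tag, max_results=100)
        for tweet in resp.data or []:
            con.execute("INSERT OR IGNORE INTO tweets VALUES (?, ?)",
                        (str(tweet.id), tweet.text))
        con.commit()
        con.close()

    archive_hashtag("#inpycon2010")  # run periodically within the user's interval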

I'm searching for a messaging platform (like XMPP) that allows tight integration with a web application

At the company I work for, we are building a cluster of web applications for collaboration. Things like accounting, billing, CRM etc.
We are using a RESTful approach:
For the database we use CouchDB.
Different applications communicate with one another and with the database via HTTP.
We also have a single sign-on solution, so that when you log in to one application, you are automatically logged in to the others.
For all apps we use Python (Pylons).
Now we need to add instant messaging to the stack.
We need to support both web and desktop clients. But just being able to chat is not enough.
We need to be able to achieve all of the following (and more similar things).
When somebody gets assigned to a task, they must receive a message. I guess this is possible with some system daemon.
There must be an option to automatically group people by lots of different properties. For example, there must be groups divided by geographical location, by company division and by job type (all the programmers from different cities and different company divisions must form a group), so that one can send mass messages to a group of choice.
Rooms should be automatically created and destroyed. For example when several people visit the same invoice, a room for them must be automatically created (and they must auto-join). And when all leave the invoice, the room must be destroyed.
Authentication and authorization from our applications.
I could implement this using a custom solution like hookbox (http://hookbox.org/docs/intro.html), but then I'd have lots of problems supporting desktop clients.
I have no prior experience with instant messaging. I've been reading about this lately, looking mostly at things like ejabberd, but it has been hard going and I can't tell whether what I want is possible at all.
So I'd be happy if people with experience in this field could help me with some advice, articles, tales of what is possible etc.
Like frx suggested in his answer, the StropheJS folks have an excellent book about web+XMPP coding, but since you mentioned you have no experience with this type of coding, I would suggest talking to some folks who do :) It will save you time in the long run. Not that I'm saying don't try to implement what frx outlines; it could be a fun project :)
I know of one group who has implemented something similar and chatting with them would help solidify what you have in mind: http://andyet.net/ (I'm not affiliated with them at all except for the fact that the XMPP dev community is small and we tend to know each other :)
All of these goals could be achieved with ejabberd, Strophe and a little server-side scripting:
When someone gets assigned to a task, a server-side script can authenticate to the XMPP server and send a message stanza to the assignee's JID. That is a trivial task (see the sketch after this list).
Grouping people can easily be done from the web chat app if those user properties are stored somewhere: just join them to the appropriate multi-user chat room after authentication.
ejabberd has an option to automatically create and destroy rooms.
ejabberd supports various authentication methods, including database and external-script auth.
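For the first point, here is a minimal sketch of such a notification script using slixmpp (a modern successor to the SleekXMPP/xmpppy libraries of that era). The JIDs, password and task text are placeholders; the script logs in, sends one message stanza, and exits.

    import slixmpp

    class NotifyBot(slixmpp.ClientXMPP):
        def __init__(self, jid, password, recipient, body):
            super().__init__(jid, password)
            self.recipient = recipient
            self.body = body
            self.add_event_handler("session_start", self.start)

        async def start(self, event):
            self.send_presence()
            await self.get_roster()
            # The actual notification: one message stanza to the assignee.
            self.send_message(mto=self.recipient, mbody=self.body, mtype="chat")
            self.disconnect()

    bot = NotifyBot("bot@example.com", "secret",
                    "assignee@example.com", "You have been assigned task #42")
    bot.connect()
    bot.process(forever=False)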
You could also take a look at the StropheJS library; its authors have released a great (paperback) book. I really recommend reading it: http://professionalxmpp.com/

Efficient storage of and access to web pages with Python

So like many people I want a way to download, index/extract information from, and store web pages efficiently. My first thought is to use MySQL and simply shove the pages in, which would let me use FULLTEXT searches and do ad hoc queries easily (in case I want to see if something exists and extract it, etc.). But of course performance-wise I have some concerns, especially with large objects/pages and high volumes of data. So that leads me to look at things like CouchDB, search engines, etc. To summarize, my basic requirements are:
It must be Python compatible (libraries/etc.)
Store metadata (URL, time retrieved, any GET/POST data I sent, response code, etc.) of the page I requested.
Store a copy of the original web page as sent by the server (might be content, might be 404 search page, etc.).
Extract information from the web page and store it in a database.
Have the ability to do ad hoc queries on the existing corpus of original web pages (for example, for a new type of information I want to extract, or to see how many of the pages have the string "fizzbuzz" or whatever in them).
And of course it must be open source/Linux compatible; I have no interest in something I can't modify or fiddle with.
So I'm thinking several broad options are:
Toss everything into MySQL, use FULLTEXT, go nuts, shard the content if needed.
Toss metadata into MySQL, store the data on the file system or in something like CouchDB, and write some custom search stuff.
Toss metadata into MySQL, store the data on a file system served by a web server (maybe /YYYY/MM/DD/HH/MM/SS/URL/), make sure no default index.html etc. is specified (in other words, let every directory be directory-indexed), and use a search engine like Lucene or Sphinx to index the content and search with that. The biggest downside I see here is the inefficiency of repeatedly crawling the site.
Other solutions?
When answering please include links to any technologies you mention and if possible what programming languages it has libraries for (i.e. if it's Scala only or whatever it's probably not that useful since this is a Python project). If this question has already been asked (I'm sure it must have been) please let me know (I searched, no luck).
Why do you think solution (3), the Sphinx-based one, requires "repeatedly crawling the site"? Sphinx can accept and index many different data sources, including MySQL and PostgreSQL "natively" (there are contributed add-ons for other DBs such as Firebird) -- you can keep your HTML docs as columns in your DB if you like (modern PostgreSQL versions should have no trouble with that, and I imagine MySQL wouldn't either), and just use Sphinx's superior indexing and full-text search (including stemming &c). Your metadata all comes from the headers, after all (plus the HTTP request body if you want to track requests in which you POSTed data, but not the HTTP response body at any rate).
One important practical consideration: I would recommend standardizing on UTF-8 -- html will come to you in all sorts of weird encodings, but there's no need to get crazy supporting that at search time -- just transcode every text page to UTF-8 upon arrival (from whatever funky encoding it came in), before storing and indexing it, and live happily ever after.
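For instance, a minimal sketch of that transcode-on-arrival step, assuming the requests library; apparent_encoding falls back to charset detection when the Content-Type header is missing or lying.

    import requests

    def fetch_as_utf8(url):
        resp = requests.get(url, timeout=10)
        if not resp.encoding:
            resp.encoding = resp.apparent_encoding  # detect from the bytes
        return resp.text.encode("utf-8")            # store and index this

    page_bytes = fetch_as_utf8("http://example.com/")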
Maybe you could special-case non-textual responses to keep those in files (I can imagine that devoting gigabytes in the DB to storing e.g. videos which can't be body-indexed anyway might not be a good use of resources).
And BTW, Sphinx does come with Python bindings, as you request.
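For example, here is a minimal sketch of an ad hoc query against searchd using the sphinxapi module that ships with Sphinx. The "pages" index name is made up; it would be defined in sphinx.conf against your MySQL/PostgreSQL source.

    import sphinxapi

    client = sphinxapi.SphinxClient()
    client.SetServer("localhost", 9312)          # searchd's default API port
    result = client.Query("fizzbuzz", "pages")   # ad hoc full-text query

    if result:
        for match in result["matches"]:
            # Matched doc IDs point back into your MySQL/PostgreSQL rows.
            print(match["id"], match["weight"])
    else:
        print(client.GetLastError())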
You may be trying to achieve too much with the storage of the HTML (and supporting files). It seems you wish this repository to both
allow displaying a particular page as it was on its original site
provide indexing for locating pages relevant to particular search criteria
The HTML underlying a web page once looked a bit like a self-standing document, but pages crawled off the net nowadays are much messier: JavaScript, Ajax snippets, advertisement sections, image blocks etc.
This reality may cause you to rethink the one-storage-for-all-HTML approach. (And also the parsing/pre-processing of the crawled material, but that's another story...)
On the other hand, the distinction between metadata and the true text content associated with the page doesn't need to be so marked. (By "true text content" I mean the possibly partially marked-up text from the web pages that is otherwise free of all the "Web 2.0 noise".) Many search engines, including Solr (since you mentioned Lucene), now allow mixing the two genres in the form of semi-structured data. For operational purposes (e.g. to task the crawlers etc.) you may keep a relational store with management-related metadata, but the idea is that for search purposes, fielded and free-text info can coexist nicely (at the cost of pre-processing much of the input data).
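A minimal sketch of that mixed fielded/free-text indexing, using pysolr against an assumed local core named "pages"; the field names are illustrative and would have to match your Solr schema.

    import pysolr

    solr = pysolr.Solr("http://localhost:8983/solr/pages", timeout=10)

    solr.add([{
        "id": "2010-06-01T12:00:00Z/http://example.com/",
        "url": "http://example.com/",             # fielded metadata...
        "retrieved_at": "2010-06-01T12:00:00Z",
        "status_code": 200,
        "content": "...extracted page text...",   # ...plus free text, together
    }])

    # One ad hoc query mixing a fielded filter with free-text search.
    for doc in solr.search("content:fizzbuzz AND status_code:200"):
        print(doc["url"])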
It sounds to me like you need a content management system. Check out Plone. If that's not what you want, maybe a web framework like Grok, BFG, Django, TurboGears, or anything on this list. If that isn't good either, then I don't know what you are asking. :-)
