I'm making a Windows desktop application that needs to transcribe videos, and I'm looking for a good free API to help me achieve that.
I've searched a lot, but most of the APIs I've found have poor accuracy.
This doesn't work with .NET Core but if you're using the legacy .NET Framework (which is supported) you can use System.Speech to both recognize and synthesize speech offline.
https://learn.microsoft.com/en-us/dotnet/api/system.speech.recognition?view=netframework-4.8
https://learn.microsoft.com/en-us/dotnet/api/system.speech.recognition.speechrecognitionengine?view=netframework-4.8
Update 3/1/21: System.Speech has now been ported to .NET Core. The NuGet package is available at: https://www.nuget.org/packages/System.Speech
Google's Speech-to-Text API has state-of-the-art accuracy, a simple interface, and client libraries in many languages. You get 60 minutes free per month.
Link: https://cloud.google.com/speech-to-text/
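For illustration, here is a minimal Python sketch of a synchronous call. It assumes the `google-cloud-speech` client library is installed, credentials are already configured, and the audio track has already been extracted from the video as a short 16 kHz, 16-bit mono clip. The function names, the sample rate, and the language code are assumptions made for this example, not anything stated in the question.

```python
from typing import Iterable


def join_transcripts(results: Iterable) -> str:
    """Concatenate the top alternative of each recognition result."""
    return " ".join(
        r.alternatives[0].transcript.strip() for r in results if r.alternatives
    )


def transcribe_short_clip(path: str, language_code: str = "en-US") -> str:
    """Send one short audio clip for synchronous recognition."""
    # Imported here so join_transcripts stays usable without the library.
    from google.cloud import speech

    client = speech.SpeechClient()
    with open(path, "rb") as f:
        audio = speech.RecognitionAudio(content=f.read())
    config = speech.RecognitionConfig(
        encoding=speech.RecognitionConfig.AudioEncoding.LINEAR16,
        sample_rate_hertz=16000,  # assumed; must match the actual audio
        language_code=language_code,
    )
    response = client.recognize(config=config, audio=audio)
    return join_transcripts(response.results)
```

Synchronous recognition is meant for short clips, so a full video would probably need to be split into chunks or sent through the library's long-running recognition call instead.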
If you want an online API that is totally free, you most likely will not find one.
If you are willing to go offline, you will probably have to come up with a custom solution using the weights of some openly available deep learning model. Read some papers on state-of-the-art transcription models and see if any of the weights are available on GitHub. Keep in mind that performing such a task offline is very computationally expensive, and might require a GPU to give you results in a reasonable amount of time.
I'm what Cornelis van Lit from Digital Orientalist calls a "centaur," or a scholar who devotes time to developing software solutions to humanities research problems. I've run into a problem that I have NOT been able to solve by searching Stack Overflow or other online resources.
I developed a FileMaker solution to manage 150,000 digital surrogates of original sources, similar to the way Reddit user restricteddata suggested a few years ago.
I want to extend my solution with Google's Vision API. In particular, I want to use Vision to perform OCR on these digital surrogates. I saw a YouTube video which does exactly what I'm asking but with Amazon's Textract API. I've tried Textract on my digital surrogates and found the results unsatisfactory. My surrogates are in Spanish and a significant number of them are handwritten. Google's Vision API, in my case, has produced better results. Also, Vision has a Python client library that I'm very familiar with.
So my problem and question deal with Python integration with FileMaker Pro Advanced (NOT HOSTED ON FILEMAKER SERVER).
Is there a way to pass a PDF from a container field to Python? And after Python does its thing--splitting the PDF, processing individual images, sending said images to Vision, parsing results, and recombining them--send the output string back to FileMaker in a new field?
The trigger would be from the FileMaker side, so using available Python libs or making the solution an ODBC source would not be useful. There are some FileMaker plugins that, I think, can run a Python script from FileMaker much as Python would run a subprocess, but there's no clear direction on how to do that. I'm a graduate student, so paying for consulting is out of the question. Is there anyone out there who can help?
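One possible shape for the Python side of such a bridge is a small command-line script that exchanges files with FileMaker. This is only a sketch under assumptions: FileMaker first writes the container field to a temporary PDF (for example with the Export Field Contents script step), some plugin or OS-level mechanism then launches the script with two paths, and FileMaker reads the resulting text file back into a field. The `ocr_pdf` function is a hypothetical placeholder for the split/Vision/parse/recombine pipeline, and both argument names are invented for this example.

```python
import argparse
import sys
from pathlib import Path


def ocr_pdf(pdf_path: Path) -> str:
    """Hypothetical placeholder for the split / Vision / merge pipeline."""
    # No OCR happens here; the stub only reports what it was given.
    return f"[no OCR performed] {pdf_path.name}: {pdf_path.stat().st_size} bytes"


def main(argv=None) -> int:
    parser = argparse.ArgumentParser(description="OCR one exported PDF.")
    parser.add_argument("pdf", type=Path, help="PDF exported from the container field")
    parser.add_argument("out", type=Path, help="text file FileMaker reads back")
    args = parser.parse_args(argv)

    if not args.pdf.is_file():
        print(f"missing input: {args.pdf}", file=sys.stderr)
        return 2

    # Write under a temporary name, then rename, so FileMaker never
    # picks up a half-written result.
    tmp = args.out.with_name(args.out.name + ".part")
    tmp.write_text(ocr_pdf(args.pdf), encoding="utf-8")
    tmp.replace(args.out)
    return 0

# When saved as a script, call main() from the usual
# `if __name__ == "__main__":` guard and exit with its return code.
```

Exchanging files rather than strings keeps the FileMaker side simple: it only has to export, launch, wait for the output file to appear, and import. The non-zero return code gives the calling script something to check when the export step failed.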
I just came across the googletrans Python package. This package translates quite well and seems to use the Google translation API. To my knowledge, the Google translation API is not free. What is googletrans doing internally for the translations? Is it legal to use googletrans?
The official documentation has information on this:
https://pypi.python.org/pypi/googletrans#how-does-this-library-work
You may wonder why this library works properly, whereas other approaches such like goslate won’t work since Google has updated its translation service recently with a ticket mechanism to prevent a lot of crawler programs.
I eventually figure out a way to generate a ticket by reverse engineering on the obfuscated and minified code used by Google to generate such token, and implemented on the top of Python. However, this could be blocked at any time.
As for the legality of this approach, this kind of stuff depends on the laws of the countries you live in, and is probably slightly off-topic.
I am learning Pyprocessing. It comes with the regular Processing platform, which was originally written in Java. Many of the example projects bundled with Processing have also been written in Python, but none of the audio libraries/examples have.
I tried searching Google but haven't found anything yet.
Does anyone know of a good resource where I can learn to do basic things with the audio library in pyprocessing such as playing audio and filtering audio?
I've used PyAudio and SWMixer for basic audio needs on a project.
Other Python audio resources I found useful at the time:
Scott W Harden's blog post on FFT analysis in Python (lots of neat things there)
PyAudioMixer
python-sounddevice
I haven't used these extensively enough, though, to advise on which one is most stable and easiest to use.
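To give a feel for what "filtering audio" can mean before picking a library, here is a standard-library-only sketch. It is not Pyprocessing's audio API and uses none of the packages listed above. It generates a sine tone, applies a crude moving-average low-pass filter, and writes the result to a WAV file that any of those libraries could then play. The sample rate and all function names are assumptions for this example.

```python
import math
import struct
import wave

RATE = 8000  # samples per second (assumed for this sketch)


def tone(freq_hz: float, seconds: float, rate: int = RATE) -> list:
    """Generate a sine wave as floats in [-1, 1]."""
    n = int(seconds * rate)
    return [math.sin(2 * math.pi * freq_hz * i / rate) for i in range(n)]


def moving_average(samples: list, width: int) -> list:
    """Crude low-pass filter: average each sample with its predecessors."""
    out, total = [], 0.0
    for i, s in enumerate(samples):
        total += s
        if i >= width:
            total -= samples[i - width]  # drop the sample leaving the window
        out.append(total / min(i + 1, width))
    return out


def write_wav(path: str, samples: list, rate: int = RATE) -> None:
    """Write the samples as mono 16-bit PCM."""
    frames = b"".join(
        struct.pack("<h", int(max(-1.0, min(1.0, s)) * 32767)) for s in samples
    )
    with wave.open(path, "wb") as w:
        w.setnchannels(1)
        w.setsampwidth(2)
        w.setframerate(rate)
        w.writeframes(frames)


def peak(samples: list) -> float:
    """Largest absolute sample value."""
    return max(abs(s) for s in samples)
```

With these settings a 4-sample window spans exactly one period of a 2000 Hz tone, so that tone is almost entirely removed, while a 100 Hz tone passes nearly unchanged. Real projects would more likely use a proper filter design, but the structure (samples in, samples out, then hand off to a playback library) stays the same.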
After much research, I've come up with a list of what I think might be the best way of putting together a Python based social network/cms, but have some questions about how some of these components fit together.
Before I ask about the particular components, here are some of the key features of the site to be built:
a modern almost desktop-like gui
future ability to host an advanced HTML5 sub-application (e.g. http://www.lucidchart.com)
high scalability both for functionality and user load
user ability to password protect and permission manage content on per item/group basis
typical social network features
ability to build a scaled down mobile version in the future
Here's the list of tools I'm considering using:
Google App Engine
Python
Django
Pinax
Pyjamas
wxPython
And the questions:
Google App Engine -- this is an attempt to cut to the chase as many pieces of the puzzle seem to be in place.
Question: Am I limiting my options with this choice? Example: datastore not being relational? Should I wait
for SQL support under the Business version?
Python -- I considered Drupal at first, but in the end decided that being dependent on modules that may or
may not exist tomorrow + limitations of its templating system are a no-no. Learning its API, too, would be useless elsewhere,
whereas Python seems like a Swiss Army knife of languages -- good for almost anything.
Question: v.2.5.2 is required by GAE, but python.org recommends 2.5.5. Which do I install?
Django -- v.0.96 is built into GAE. You seem to be able to upgrade it.
Questions: Any reason not to upgrade to the latest version? Ways to get around the lack of HTML5 support?
Pinax (http://pinaxproject.com) Rides on top of Django and appears to provide most of the social network functionality
anyone would want.
Question: Reasons NOT to use it? Alternatives?
Pyjamas and wxPython -- this is the part that gets a little confusing. The basic idea behind these is the ability
to build a GUI. I've considered Silverlight and Flash, before the GAE/Python route, but a few working versions of
HTML5 apps convinced me that enough of it ALREADY runs on the latest batch of browsers to choose the HTML5/JavaScript
route instead.
Question: How do I extend/supplement Python/Django to build an app-like HTML5 interface? Are Pyjamas and wxPython
the way to go? Or should I change my thinking completely?
Answers to some/any of these questions would be of great help. Please excuse my ignorance if any of this doesn't make much sense.
My last venture into web programming was a decent sized LAMP website some 5-6 years ago. On the desktop side of things,
my programming experience boils down to very high level scripting languages that I keep on learning to accomplish very specific
tasks :)
As someone who has deployed a Django site to GAE, I can tell you that you are not going to reach the ideal solution. Django on GAE misses some of the best aspects of Django because the ORM doesn't work right. The best compromise may be to use Django-nonrel to add the features back in.
This introduces its own problems though: because of the large number of files and the memory used by a Django app, your code will be unloaded from memory quickly after the app becomes idle. That means that visitors will frequently hit an approximately 6 second delay on the first page view after the site's code has been unloaded from memory while GAE uncompresses the zipped modules. Once your site is busy this won't be a problem, but while your site is still young and unknown it will cause the appearance of performance problems. :-(
Second, I've also worked for a company that built a custom CMS and can tell you that the first 80% is pretty easy, especially with modern frameworks. However, the rest can be quite challenging. For example, user roles and custom content types are two challenging aspects. Therefore strongly consider standing on the shoulders of giants: find a CMS or CMS framework that almost perfectly meets your needs and then extend it to do that extra bit you need.
So, that said, answering your points:
1. Yes, you're limiting your options but that may be OK. Most developers are more comfortable with the relational model than the NoSQL model. Therefore more open source software is built with it in mind. Also, GAE is a closed source platform, which is also a deterrent to open source developers. App Engine Oil is a CMS framework that may suit you well and is optimized for App Engine. Also look at web2py, which has support for GAE.
2. I've found myself to be extremely productive with Python. I used to write a lot of PHP; now I find it ugly. That said, think about the total line count of code you'll have to write. If you can make Drupal work with high quality pre-made modules you may find yourself only needing 1/10th of the code. By the way, the trick with Drupal is to mainly use only high quality modules. Look at the history, and make sure not to use development versions. Try to contact the authors on IRC. I'm not saying you should use Drupal, but it is possible to have a reliable site with it (for example, whitehouse.gov).
3. You're in the classic GAE/Django problem. If you use 0.96 you get great performance but you miss a lot of the great 1.0+ features and you don't get the ORM and all of its benefits. If you use a newer version of Django you get the performance/memory problems mentioned above.
4. I'm about to investigate Pinax for my company. I've only taken a very cursory glance at it. I don't know if it has good support for non-relational model backends. You'll probably need to look at django-nonrel. However, know that you're going to be investing in relatively untried solutions here. A small percentage of Django users use Pinax and an even smaller percentage, if any, use it on a non-relational backend. Therefore you're going to be in the highly experimental scenario you mentioned in point 2 above.
5. I can't offer personal experience on it. I've investigated Pyjamas a few times. However, I like writing HTML, CSS, and JS. I like to have control. I like progressive enhancement and knowing what users will see if they don't have the full capabilities. Also, I think any new app that doesn't explicitly address mobile clients is implicitly shooting itself in the foot. As many as 15% of Internet users only use the Internet via their smartphone. What kind of experience will they get with Pyjamas?
You didn't mention this, but one thing I consider when choosing a platform is vendor lock-in and portability. If you develop your solution for GAE and find that you're not able to do what you want, will you be able to port it to another solution elsewhere? How much work will it take? If you code heavily for GAE or make commitments to its architecture, you're stuck with it or with rewriting to move. Using Django or web2py can help mitigate this.
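As a rough illustration of the portability point, one common way to limit lock-in is to keep platform-specific storage calls behind a small interface of your own. The sketch below uses plain modern Python rather than Django, web2py, or any GAE API, and every name in it (`Post`, `PostStore`, `InMemoryPostStore`, `rename_post`) is hypothetical.

```python
from dataclasses import dataclass
from typing import Dict, Optional, Protocol


@dataclass
class Post:
    post_id: str
    title: str


class PostStore(Protocol):
    """Everything the rest of the app is allowed to know about storage."""

    def get(self, post_id: str) -> Optional[Post]: ...

    def put(self, post: Post) -> None: ...


class InMemoryPostStore:
    """Stand-in backend; a datastore or SQL version would expose the same methods."""

    def __init__(self) -> None:
        self._rows: Dict[str, Post] = {}

    def get(self, post_id: str) -> Optional[Post]:
        return self._rows.get(post_id)

    def put(self, post: Post) -> None:
        self._rows[post.post_id] = post


def rename_post(store: PostStore, post_id: str, title: str) -> bool:
    """Application logic written only against the interface."""
    post = store.get(post_id)
    if post is None:
        return False
    store.put(Post(post_id, title))
    return True
```

If the platform has to change later, only the class implementing the store needs rewriting; functions like `rename_post` stay as they are.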
That said, the big benefit of Python GAE is that you get to be very productive, see your results instantly, get hosting for free while your site is small and get excellent scalability. These are not small things. There is great value there.
I'm doing a tech review and looking at AMF integration with various backends (Rails, Python, Grails etc).
Lots of options are out there. The question is, what do the Adobe products (BlazeDS etc.) do that something like RubyAMF / PyAMF doesn't?
Other than NIO (RTMP) channels, LCDS also includes the "data management" features.
Using this feature, you basically implement, in an ActionScript class, a CRUD-like interface defined by LCDS, and you get:
automatic progressive list loading (large lists/datagrids loads while scrolling)
automatic CRUD management (you get an object locally in Flash, modify it, send it back, and the DB gets updated automatically)
a feature for conflict resolution (if multiple users try to update the same record at the same time)
if I remember correctly, also some improved integration with the LiveCycle ES workflow engine
IMO, it can be very fast to develop this way, but only if you have only basic requirements and a simple architecture (forget SOA, that otherwise works so well with Flex). I'm fine with BlazeDS.
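To make the conflict-resolution item concrete, here is a generic sketch of one common way such conflicts are detected: optimistic concurrency with a version number on each record. This is not LCDS's API or its actual mechanism, only an illustration of the idea in Python, and all names are hypothetical.

```python
from dataclasses import dataclass, replace
from typing import Dict


class Conflict(Exception):
    """Raised when a client saves changes based on a stale copy."""


@dataclass(frozen=True)
class Record:
    record_id: str
    data: str
    version: int = 0


class Table:
    def __init__(self) -> None:
        self._rows: Dict[str, Record] = {}

    def insert(self, record_id: str, data: str) -> Record:
        rec = Record(record_id, data)
        self._rows[record_id] = rec
        return rec

    def update(self, edited: Record) -> Record:
        """Accept the edit only if it was based on the current version."""
        current = self._rows[edited.record_id]
        if current.version != edited.version:
            raise Conflict(
                f"{edited.record_id}: edit based on version {edited.version}, "
                f"current is {current.version}"
            )
        saved = replace(edited, version=current.version + 1)
        self._rows[edited.record_id] = saved
        return saved
```

Two clients that both start from the same copy illustrate the behaviour: the first `update` succeeds and bumps the version, and the second raises `Conflict`, at which point the application can reload, merge, or ask the user.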
The data management features for LCDS described here are certainly valid, however I believe they do not let you actually develop a solution faster. A developer still has to write ALL the data access code, query execution, and extraction of data from datareaders into value objects. ALL of this has been solved a dozen times with code generators. For instance, the data management approach in WebORB for Java (much like in WebORB for .NET and PHP) is based on code generation which creates code for both client side AND server side. You get all the ActionScript APIs out of the code generator to do full CRUD.
Additionally, WebORB provides video streaming and real-time messaging features and goes WAY beyond what both BlazeDS and LCDS offer combined, especially considering that the product is free. Just google it.
Adobe has two products: LiveCycle Data Services ES (LCDS) and BlazeDS. BlazeDS contains a subset of LCDS features and was made open source. Unfortunately NIO channels (RTMP NIO/HTTP) and the data management features are implemented only in LCDS, not BlazeDS.
BlazeDS can be used only to integrate Flex with a Java backend. It offers not only remoting services using AMF serialization (as RubyAMF does) but also messaging and collaboration features - take a look at this link (http://livedocs.adobe.com/blazeds/1/blazeds_devguide/help.html?content=lcoverview_3.html). Also, I suppose that the support is better compared with RubyAMF/PyAMF.
If your backend is Java and you want to use only a free product, you can also use GraniteDS or WebORB (BlazeDS competitors).
Good question. I'm not a Ruby guy (I use Java with Flex), but what I believe differentiates BlazeDS from the commercial LiveCycle DS is:
Streaming protocol support (RTMP) - competition for Comet and such, delivering video
Some advanced stuff for Hibernate detached objects and large result set caching that I don't fully understand or need
support?
Might be others but those are the ones I know off the top of my head.