To save a lot of work, I'm trying to generate a sound file of a script for a play that includes several different voices. These voices should be computer generated. I have software (NaturalReader13) for generating these voices. Since I don't want the entire play read in one voice, I can't upload the whole text into NaturalReader and export it.
I could export several voice files and then mix them into a coherent whole, but this takes a long time and patience. I tried this already using Audacity to mix. Instead, I want to automate this by interfacing with the program using Python. I have no idea how to do this.
There are free voice generators online that would work for this task, but they are much lower quality. If interfacing with NaturalReaders is too complex, getting the data from the web might be easier.
So basically, the script I have is of the form:
Character: These are the lines that need to be read...
Any ideas on how to approach this?
Related
I'm new to programming and recently started getting into Python a lot more seriously. However, I've done some projects that required programming in my company, so I have some background on how it works (or how to scour the internet! lol).
Recently however, we have had a client that sends us invoices in PDF formats and we would like to automate all the invoices to compile into one .csv file.
I've been picking up a few OCR codes (I ran my first image-to-text output recently), however I don't think I'm 100% capable of creating such automation yet since I'm still very fresh in programming. It would require at least a few weeks, and I'm not sure if it's worth it if we could just ask the client to set up a more accurate excel spreadsheet to send over every time.
That's why I'm turning to an already available OCR tool. I recently found this gem: https://www.pdftoexcel.com/ however it is a very manual process and not as automated as we could like. If there is a way to program a script to upload an available PDF file from a certain folder in order to upload it to the website and export it into an Excel file every time we receive an invoice, would it be possible to share?
It would also be a big plus if there would be a way to upload a batch of invoices and identify different charges, providing a summary across the scanned invoices, particularly in the categories
I hope what I'm asking for makes sense. Let me know if you'd require more clarification.
Cheers
There's lots of stuff available for Python, if you have a quick search on Google or StackOverflow. I believe I used Tesseract OCR in the past.
My experience is that you will get serviceable OCR with some of the popular Python libraries, but the excellent stuff will come with a price tag.
Try some tests with your PDF invoices, but if you are getting even slightly questionable results, you may have to consider more expensive alternatives (or even standalone machinery!).
If the client is sending you clear, nicely formatted PDFs with a good font, I don't see why free Python libraries wouldn't be sufficient.
I have been having some issues for the past few days to get this to work. All I need is to learn it once to get it later. So what I need is a good example of working source code to play a simple wav file. I do not want to use an external library only to get this to work. I honestly don't see the point in getting a huge library to substitute for one problem :/. So if I can get a (Once again, NON-EXTERNAL) example, that would be great. (I'm using windows, so winsound should work, but I can't get the winsound.PlaySound('Example'.wav, SND_FILENAME) thing to work.) Thanks!
I'm working on a side project where we want to process images in a hadoop mapreduce program (for eventual deployment to Amazon's elastic mapreduce). The input to the process will be a list of all the files, each with a little extra data attached (the lat/long position of the bottom left corner - these are aerial photos)
The actual processing needs to take place in Python code so we can leverage the Python Image Library. All the Python streaming examples I can find use stdin and process text input. Can I send image data to Python through stdin? If so, how?
I wrote a Mapper class in Java that takes the list of files and saves the names, the extra data, and the binary contents to a sequence file. I was thinking maybe I need to write a custom Java mapper that takes in the sequence file and pipes it to Python. Is that the right approach? If so, what should the Java to pipe the images out and the Python to read them in look like?
In case it's not obvious, I'm not terribly familiar with Java OR Python, so it's also possible I'm just biting off way more than I can chew with this as my introduction to both languages...
There are a few possible approaches that I can see:
Use both the extra data and the file contents as input to your python program. The tricky part here will be the encoding. I frankly have no idea how streaming works with raw binary content, and I'm assuming that basic answer is "not well." The main issue is that the stdin/stdout communication between processes is very text-based, relying on delimiting input with tabs and newlines, and things like that. You would need to worry about the encoding of the image data, and probably have some sort of pre-processing step, or a custom InputFormat so that you could represent the image as text.
Use only the extra data and the file location as input to your python program. Then the program can independently read the actual image data from the file. The hiccup here is making sure that the file is available to the python script. Remember this is a distributed environment, so the files would have to be in HDFS or somewhere similar, and I don't know if there are good libraries for reading files from HDFS in python.
Do the java-python interaction yourself. Write a java mapper that uses the Runtime class to start the python process itself. This way you get full control over exactly how the two worlds communicate, but obviously its more code and a bit more involved.
I use Windows 7. All I want to do is create raw audio and stream it to a speaker. After that, I want to create classes that can generate sine progressions (basically, a tone that slowly gets more and more shrill). After that, I want to put my raw audio into audio codecs and containers like .WAV and .MP3 without going insane. How might I be able to achieve this in Python without using dependencies that don't come with a standard install?
I looked up a great deal of files, descriptions, and related questions from here and all over the internet. I read about PCM and ADPCM, as well as A/D Converters. Where I get lost is somewhere between the ratio of byte input --> Kbps output, and all that stuff.
Really, all I want is for somebody to please be able to point me in the right direction to learn the audio formats precisely, and how to use them in Python (but first I want to start with raw audio).
This questions really has 2 parts:
How do I generate audio signals
How do I play audio signals through the speakers.
I wrote a simple wrapper around the python std lib's wave module, called pydub, which you can look at (on github) as a point of reference for how to manipulate raw audio data.
I generally just export the audio data to a file and then play it using VLC player. IMHO there's no reason to write a bunch of code to playback audio unless you're making a synthesizer or a game or some other realtime app.
Anyway, I hope that helps you get started :)
I am looking for a high level audio library that supports crossfading for python (and that works in linux). In fact crossfading a song and saving it is about the only thing I need.
I tried pyechonest but I find it really slow. Working with multiple songs at the same time is hard on memory too (I tried to crossfade about 10 songs in one, but I got out of memory errors and my script was using 1.4Gb of memory). So now I'm looking for something else that works with python.
I have no idea if there exists anything like that, if not, are there good command line tools for this, I could write a wrapper for the tool.
A list of Python sound libraries.
Play a Sound with Python
PyGame or Snack would work, but for this, I'd use something like audioop.
— basic first steps here : merge background audio file
A scriptable solution using external tools AviSynth and avs2wav or WAVI:
Create an AviSynth script file:
test.avs
v=ColorBars()
a1=WAVSource("audio1.wav").FadeOut(50)
a2=WAVSource("audio2.wav").Reverse.FadeOut(50).Reverse
AudioDub(v,a1+a2)
Script fades out on audio1 stores that in a1 then fades in on audio2 and stores that in a2.
a1 & a2 are concatenated and then dubbed with a Colorbar screen pattern to make a video.
You can't just work with audio alone - a valid video must be generated.
I kept the script as simple as possible for demonstration purposes. Google for more details on audio processing via AviSynth.
Now using avs2wav (or WAVI) you can render the audio:
avs2wav.exe test.avs combined.wav
or
wavi.exe test.avs combined.wav
Good luck!
Some references:
How to edit with Avisynth
AviSynth filters reference