How to define prompt weights for Hugging Face's diffusers.StableDiffusionInpaintPipeline? - python

I am tweaking a Python script that uses the diffusers inpainting pipeline for a custom video-generation idea.
I would like to gradually shift the weights of certain words in the prompt.
As I understand it, the prompt_embeds argument is exactly what I need,
but I could not figure out how to construct it. Can someone please provide an example?
I know there are frameworks out there where you can add weights to certain words with the following syntax:
"This is a SD prompt with plus 50% weight added to the last (word:1.5)"
That would be a great solution as well; however, this syntax does not work with diffusers.StableDiffusionInpaintPipeline.
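Not from the thread, but one way to get a gradual shift across video frames is to build prompt_embeds yourself and interpolate between the embeddings of two prompts. The diffusers calls are only sketched in comments (check your diffusers version's API before relying on them); the interpolation itself is plain elementwise math:

```python
def lerp(a, b, t):
    """Elementwise linear interpolation between two embeddings
    flattened to lists: (1 - t) * a + t * b."""
    return [(1.0 - t) * x + t * y for x, y in zip(a, b)]

# Sketch (assumes a loaded StableDiffusionInpaintPipeline `pipe`):
#   ids_a = pipe.tokenizer("a photo of a cat", padding="max_length",
#                          max_length=pipe.tokenizer.model_max_length,
#                          return_tensors="pt").input_ids
#   emb_a = pipe.text_encoder(ids_a.to(pipe.device))[0]
#   ... same for a second prompt -> emb_b ...
#   for t in frame_schedule:
#       image = pipe(prompt_embeds=torch.lerp(emb_a, emb_b, t), ...).images[0]

# The interpolation itself, on toy 4-dimensional "embeddings":
emb_a = [1.0, 0.0, 2.0, -1.0]
emb_b = [0.0, 1.0, 0.0, 3.0]
for t in (0.0, 0.5, 1.0):
    print(t, lerp(emb_a, emb_b, t))
```

At t=0 you get the first prompt's embedding unchanged, at t=1 the second's, and intermediate frames blend smoothly between them.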

Need help understanding the math behind a CVAE

I am trying to use the following link to understand how a CVAE works. Although I can see how this works for something like a 28x28x1 input image, I'm not sure how to modify it for an input image of size 64x64x3.
I have tried looking at other sources, but all of them use the MNIST dataset from the example above. None of them really explain why they chose the numbers for filters, kernels, or strides. I need help understanding this and modifying the network to work for a 64x64x3 input.
None of them really explain why they chose the numbers for filters,
kernels, or strides.
I'm new to CNNs too, but from what I understand it is mostly a matter of experimentation; there is no exact formula that gives you the number of filters to use or the correct size. It depends on your problem. If the object you are trying to recognize has small features that make it "recognizable" to the network, a filter of small size may be best; if you think the distinguishing features are "bigger", then larger filters may work better. But again, from what I've learned, these are just rules of thumb: you may well end up with a CNN that has a completely different configuration.
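As a concrete starting point, the standard convolution output-size formula helps when adapting the MNIST layer sizes to a 64x64x3 input. The filter counts below are illustrative assumptions, not taken from the linked example:

```python
def conv_out(size, kernel, stride, padding=0):
    """Spatial output size of a conv layer: floor((W - K + 2P) / S) + 1."""
    return (size - kernel + 2 * padding) // stride + 1

# A hypothetical encoder for a 64x64x3 image, halving the spatial size
# with stride-2 convolutions (kernel 3, padding 1) at each step:
size = 64
for filters in (32, 64, 128):
    size = conv_out(size, kernel=3, stride=2, padding=1)
    print(f"{filters} filters -> {size}x{size} feature map")
# 64 -> 32 -> 16 -> 8, so the flattened input to the latent layers
# would be 8 * 8 * 128 values.
```

The same formula, run in reverse, tells you what strides and paddings the decoder's transposed convolutions need to get back to 64x64x3.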

custom binary algorithm and neural network

I would like to understand machine learning techniques better. I have read and watched a lot about Python, sklearn, and supervised feed-forward nets, but I am still struggling to see how I can apply all of this to my project and where to start. Maybe it is a little too ambitious yet.
I have the following algorithm, which generates nice patterns as binary inputs in a CSV file. The goal is to predict the next row.
The simplified logic of this algorithm is that the prediction of the next line (the top line being the most recent one) would be 0,0,1,1,1,0, and the one after that would either become 0,0,0,1,1,0 or come back to its previous step 0,1,1,1,0. However, as you can see, the model is slightly more complex and noisy, which is why I would like to introduce some machine learning here. I am aware that to get a reliable prediction I will need to introduce other relevant inputs afterwards.
Would someone please help me to get started and stand on my feet here?
I don't like throwing this out here without being able to provide a single piece of code, but I am confused about where to start.
Should I pass each previous line (line-1) as an input vector, with the associated output being the top line? Should I build the array manually from my whole dataset?
I guess I have to use the sigmoid function, and Python seems the most common way to answer this, but for the synapses (or weights) I understand I also need to provide a bias constant; should this be 1?
Finally, assuming I want this to run continuously, what would be required?
Would you please share readings or simpler exercises that could help me increase my knowledge of all this?
Many thanks.
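One way to frame the setup described above (a minimal sketch, not an answer from the thread): treat each row as the input vector and the following row as the target, and use one sigmoid unit per output bit with a bias term. The bias is just another learned weight, so its starting value (0, 1, or anything small) matters little. The rows below are made-up placeholders, not the author's data:

```python
import math

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

# Hypothetical rows from the CSV, oldest first (placeholder data):
rows = [
    [0, 1, 1, 1, 1, 0],
    [0, 0, 1, 1, 1, 0],
    [0, 0, 0, 1, 1, 0],
]

# Supervised pairs: each row is the input, the next row is the target.
pairs = [(rows[i], rows[i + 1]) for i in range(len(rows) - 1)]

# One weight vector and one bias per output bit, initialized to zero;
# training (e.g. gradient descent on cross-entropy) would update them.
weights = [[0.0] * len(rows[0]) for _ in rows[0]]
biases = [0.0] * len(rows[0])

def predict(x):
    return [sigmoid(sum(w * xi for w, xi in zip(ws, x)) + b)
            for ws, b in zip(weights, biases)]

print(predict(rows[-1]))  # untrained output: 0.5 for every bit
```

Running continuously would then just mean appending each newly observed row as a fresh training pair and taking another gradient step.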

Implementation of BEGAN (Boundary Equilibrium GAN) Using CNTK Python API

I found an implementation for BEGAN using CNTK.
(https://github.com/2wins/BEGAN-cntk)
This uses MNIST dataset instead of Celeb A which was used in the original paper.
However, I don't understand the result images, which look quite deterministic:
Output images of the trained generator (iter: 30000)
For different noise samples, I expect different outputs. But that doesn't happen, regardless of the hyper-parameters. Which part of the code causes this problem? Please explain.
Use a higher gamma (for example gamma=1 or 1.3; values above 1, actually). That will certainly improve the results, though it won't make them perfect. Also train for enough iterations, e.g. 200k.
Please look at the paper carefully; it says the parameter gamma controls diversity.
One of the results I obtained is shown below (image not included here).
I'm also looking for the best parameters and best results, but haven't found them yet.
Looks like your model might be getting stuck in a particular mode. One idea would be to add an additional condition on the class labels. Conditional GANs have been proposed to overcome such limitations.
http://www.foldl.me/uploads/2015/conditional-gans-face-generation/paper.pdf
This is an idea that would be worth exploring.
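As a rough illustration of the conditioning idea (a sketch assuming the networks take flat input vectors; this is not the repository's actual code): in a conditional GAN, both generator and discriminator receive the class label, typically one-hot encoded and concatenated with their usual input, so different labels force different output modes.

```python
def one_hot(label, num_classes):
    """One-hot encode a class label."""
    v = [0.0] * num_classes
    v[label] = 1.0
    return v

def conditional_input(noise, label, num_classes=10):
    """Generator input for a conditional GAN: the noise vector
    concatenated with the one-hot class label (MNIST: 10 classes)."""
    return noise + one_hot(label, num_classes)

z = [0.3, -0.7, 0.1]  # noise sample (placeholder values)
g_in = conditional_input(z, label=4)
print(len(g_in))  # 3 noise dims + 10 label dims = 13
```

The discriminator gets the same label concatenated with (a flattening of) the image, so it can penalize the generator for producing the wrong class.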

Neural Network Architecture for OCR on Printed Documents

I'm learning neural networks by using TensorFlow to build an OCR system for printed documents.
Would you mind giving me advice on which neural network architecture is good for recognizing characters?
I'm confused because I'm a newbie and there are a lot of neural network designs.
I found MNIST classifiers, but their architectures only cover digits.
I don't know whether those architectures can work with characters or not.
Thank you.
As you correctly point out, recognizing documents is a different thing from recognizing single characters. It is a complex system that will take time to implement from scratch. First, there is the problem of preprocessing. You need to find where the text is, perhaps slightly rotate it, etc. That can be done with heuristics and a library like OpenCV. You'll also have to detect things like page numbers, header/footers, tables/figures, etc.
Then, in some cases, you could take the "easy" route and use heuristics to segment the text into characters. That works for block characters, but not cursive scripts.
If the segmentation is given and you don't have to guess it, you have to solve multiple related problems: each is like MNIST, but they are related in that the decisions are not independent. You can look up MEMMs (Maximum-Entropy Markov Models) vs. HMMs (Hidden Markov Models), Hidden Conditional Random Fields, and Segmental Conditional Random Fields, and study the differences between them. You can also read about seq2seq.
So if you're keeping it simple, you can essentially run MNIST-style classifiers multiple times, once the segmentation is known (via some heuristic in OpenCV). On top of that, you run a dynamic program which finds the best final sequence based on the score of each decision and a "language model" that assigns likelihoods to letters occurring close to each other.
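The dynamic program just described can be sketched as a Viterbi-style decoder: per-character classifier scores are combined with a bigram language model, and the best-scoring sequence is recovered. All scores below are made-up placeholders:

```python
import math

# Per-position classifier log-probabilities (placeholder values):
# two segmented character images, each scored over the letters {a, b}.
emissions = [
    {"a": math.log(0.7), "b": math.log(0.3)},
    {"a": math.log(0.4), "b": math.log(0.6)},
]
# Bigram language-model log-probabilities (placeholder values).
bigram = {
    ("a", "a"): math.log(0.1), ("a", "b"): math.log(0.9),
    ("b", "a"): math.log(0.5), ("b", "b"): math.log(0.5),
}

def decode(emissions, bigram):
    """Viterbi decoding: argmax over letter sequences of
    sum(classifier log-score) + sum(bigram log-score)."""
    # scores[c] = best total log-score of any sequence ending in c
    scores = dict(emissions[0])
    back = [{}]
    for em in emissions[1:]:
        new, ptr = {}, {}
        for cur, e in em.items():
            prev = max(scores, key=lambda p: scores[p] + bigram[(p, cur)])
            new[cur] = scores[prev] + bigram[(prev, cur)] + e
            ptr[cur] = prev
        scores, back = new, back + [ptr]
    # Trace the best path backwards.
    last = max(scores, key=scores.get)
    path = [last]
    for ptr in reversed(back[1:]):
        path.append(ptr[path[-1]])
    return "".join(reversed(path))

print(decode(emissions, bigram))  # "ab"
```

Here the classifier alone would read "ab" too, but with different scores the language model can overturn a shaky per-character decision, which is exactly why the decisions are not independent.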
If you're starting from scratch, it's not an easy thing. It may take months for you to get a basic understanding. Happy hacking!

Python: Create Nomograms from Data (using PyNomo)

I am working in Python 2.7. I want to create nomograms based on the data of various variables in order to predict one variable. I have been looking into the PyNomo package and have installed it.
However, from the documentation here and here and the examples, it seems that nomograms can only be made when you have equation(s) relating these variables, not from the data itself. For example, the examples here show how to use equations to create nomograms. What I want is to create a nomogram from the data and use that to predict things. How do I do that? In other words, how do I make the nomograph take data as input rather than a function? Is it even possible?
Any input would be helpful. If PyNomo cannot do it, please suggest another package (in any language). For example, I am trying the nomogram function from the rms package in R, but I am not having luck figuring out how to use it properly. I have asked a separate question about that here.
The term "nomogram" has become somewhat confused of late as it now refers to two entirely different things.
A classic nomogram performs a full calculation: you mark two scales, draw a straight line across the marks, and read your answer from a third scale. This is the type of nomogram that PyNomo produces, and, as you correctly say, you need a formula. Producing a nomogram like this from data is therefore a two-step process: first fit an equation to the data, then feed that equation to PyNomo.
The other use of the term (very popular recently) is to refer to regression nomograms. These are graphical depictions of regression models (usually logistic regression models). For these, a group of parallel predictor variables is depicted with a common scale on the bottom; for each predictor you read the "score" from the scale and add these up. These nomograms have become very popular in the last few years, and that's what the rms package will draw. I haven't used it, but my understanding is that it works directly from the data.
Hope this is of some use! :-)
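The two-step route (fit an equation to the data, then hand the equation to a tool like PyNomo) can be sketched as follows. The data points and the linear model form are made-up assumptions, and the PyNomo step is only indicated in a comment:

```python
# Step 1: fit a formula to the data, e.g. y = a*x + b, by ordinary
# least squares (computed by hand here to avoid external dependencies).
data = [(1.0, 3.1), (2.0, 4.9), (3.0, 7.2), (4.0, 8.8)]  # placeholder (x, y) points

n = len(data)
sx = sum(x for x, _ in data)
sy = sum(y for _, y in data)
sxx = sum(x * x for x, _ in data)
sxy = sum(x * y for x, y in data)
a = (n * sxy - sx * sy) / (n * sxx - sx * sx)
b = (sy - a * sx) / n
print(f"fitted equation: y = {a:.3f}*x + {b:.3f}")

# Step 2: this fitted equation (not the raw data) is what PyNomo
# would take as input to lay out the nomogram's scales.
```

With more predictors you would fit a multiple regression (sklearn or statsmodels makes this one line), but the principle is the same: the data only ever enters the nomogram through the fitted formula.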
