Using my phone, held in my hand to represent a toy gun, I move it about ("aim", if you will) and transfer its orientation data (pitch, yaw, roll) to my laptop, where I render a moving crosshair over a forward-facing webcam feed.
I start the application by having the user press Enter on the laptop while holding the phone straight out in front of them. This is the initial calibration step, and I use the initial yaw/pitch as the reference for aiming at the center.
Then, during the camera-feed loop where I draw the crosshair, I measure changes in pitch/yaw relative to those calibrated values and use them to redraw the crosshair left/right/up/down.
This is my current code:
pitchDiff = initPitch - newPitch     # corresponds to the Y axis
yawDiff = -(initYaw - newYaw)        # corresponds to the X axis
pitchChangeFactor = 10               # pixels of crosshair movement per unit of pitch change
yawChangeFactor = 10                 # pixels of crosshair movement per unit of yaw change
xC = int((imgWidth / 2) + yawDiff * yawChangeFactor)
yC = int((imgHeight / 2) - pitchDiff * pitchChangeFactor)

## THE TARGETING GUI
cR = 40                              # circle radius
cv2.circle(img, (xC, yC), cR, (20, 20, 255), 3)
What I'm asking in this question is how I can do this more accurately and smoothly. The gyro data is noisy, so although I sample at 30 frames per second, I actually take the median of about 60 gyro readings for pitch/yaw at each sample.
Also, I believe my dynamical model is wrong: I'm simply moving the targeting crosshair a constant number of pixels per unit of angle change. I would think trigonometric functions are needed, but it's not clear to me what I should try. Using only one camera, I clearly lack depth data; however, I am OK with assuming that the target I would like to aim at is, say, 3-4 meters in front of me.
Thank you for any help
"How can I do this more accurately and smoothly?"
Gyroscopes measure yaw rate, roll rate, and pitch rate but can't measure roll, pitch, and yaw directly. When you request the pitch and yaw angles from your phone it combines gyro and accelerometer data (and possibly magnetometer data) to give you an estimate of pitch and yaw.
By gyro data I assume that you mean the estimated yaw and pitch that your phone provides.
Make sure that you are using the yaw and pitch angles and not the yaw rate, pitch rate etc.
If you are using the estimated angles, then you can look at different methods of filtering those signals before you do your calculations. You mentioned a 60-sample median filter. Have you tried other filters? Median filters are good if you get large, sudden spikes in your signal, but a simple low-pass filter or moving average may perform better for your situation. The proper way to do this would be to fix the position of your phone while acquiring data for some time, analyze the frequency content of the noise, and choose a filter with an appropriate cut-off frequency to remove as much noise as you can, though this may be overkill for what you're trying to do.
I would suggest experimenting with a simple moving average or low-pass filter to start. There are plenty of resources on the web showing how to implement filters in software (http://en.wikipedia.org/wiki/Low-pass_filter).
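For example, a first-order low-pass filter (an exponentially weighted average) is only a few lines. This is a minimal sketch; the class name and the alpha value are illustrative choices you would tune, not anything from your code:

class LowPass:
    # First-order low-pass (exponentially weighted average).
    # alpha is a tuning guess: closer to 1.0 = smoother but laggier.
    def __init__(self, alpha=0.9):
        self.alpha = alpha
        self.value = None

    def update(self, sample):
        if self.value is None:
            self.value = sample          # seed with the first reading
        else:
            self.value = self.alpha * self.value + (1.0 - self.alpha) * sample
        return self.value

# pitchFilter = LowPass(alpha=0.9)
# smoothedPitch = pitchFilter.update(newPitch)   # feed each raw reading as it arrives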
Dynamical Model
As for your calculations, they look fine to me, as long as your phone is providing angles and not rates, as discussed above.
You do not need trigonometric functions. If you are just trying to map the angle to a screen position, then a simple linear mapping like you've done is all you need, so that a change of angle of x degrees corresponds to a change in screen position of M*x pixels. (Strictly, projecting onto a flat target plane a few metres away involves the tangent of the angle, but over the small angles involved here tan is very nearly linear, so a constant scale factor is a reasonable approximation.)
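That said, if you did want to tie the scaling constant to real geometry (the question mentions a target roughly 3-4 m away), a hedged sketch could look like this; both constants are assumptions you would measure or calibrate:

import math

# Illustrative constants: the assumed distance to the aiming plane and how many
# image pixels one metre at that distance spans (depends on your camera's FOV).
TARGET_DISTANCE_M = 3.5
PIXELS_PER_METER = 200

def angle_to_pixel_offset(angle_deg):
    # Exact planar projection of an angle change onto the target plane;
    # for the small angles involved this is almost exactly linear in angle_deg.
    return TARGET_DISTANCE_M * math.tan(math.radians(angle_deg)) * PIXELS_PER_METER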
Another thing that could help is if your phone can provide data faster than 30 Hz. You can still update the screen at 30 Hz, but your filtering may be more effective when sampling faster. This all depends on the nature of the noise, though; experiment with the sampling rate if you can.
Good luck with your project.
I have a picture of a human eye taken roughly 10 cm away using a mobile phone (no specifications regarding the camera). After some detection and contouring, I got 113 px as the Euclidean distance between the center of the detected iris and the outermost edge of the iris in the image. Dimensions of the image: 483x578 px.
I tried converting the pixels into mm by simply multiplying the number of pixels by the size of a pixel in mm (1 px is roughly 0.264 mm), but that gives the proper length only if the image is at a 1:1 scale with respect to the real eye, which is not the case here.
Edit:
Device used: One Plus 7T
Field of view = 117 degrees
Aperture = f/2.2
Distance photo was taken = 10 cm (approx)
Question:
Is there an optimal way to find the real-world radius of this particular eye with the amount of information I have gathered through processing thus far, without including a reference object in the image?
P.S. The actual HVID of the volunteer's iris is 12.40 mm, measured using Sirus (a high-end device for calculating iris radius; I'm trying to simulate the same measurement using Python and OpenCV).
After months, I was able to come up with the result after a ton of research and a lot of trial and error. This is not the most ideal answer, but it gave me the expected results with decent precision.
Simply put, in order to measure object size/distance from an image we need multiple parameters. In my case, I was trying to measure the diameter of the iris from a smartphone camera.
To make that possible we need to know the following details prior to the calculation
1. The Size of the physical sensor (height and width) (usually in mm)
(this is the camera sensor inside the smartphone; its details can be obtained from websites on the internet, but you need to know the exact brand and version of the smartphone used)
Note: You cannot use random values for these, otherwise you will get inaccurate results. Every step/constraint must be considered carefully.
2. The Size of the image taken (pixels).
Note: The size of the image can easily be obtained using img.shape, but make sure the image is not cropped. This method relies on the total width/height of the original smartphone image, so any modifications/inconsistencies will produce inaccurate results.
3. Focal Length of the Physical Sensor (mm)
Note: Info regarding the focal length of the sensor can be found on the internet, and random values should not be used. Make sure you take images with the autofocus feature disabled so the focal length is preserved. If you have autofocus on, the focal length will be constantly changing and the results will be all over the place.
4. Distance at which the image is taken (Very Important)
Note: As "Christoph Rackwitz" told in the comment section. The distance from which the image is taken must be known and should not be arbitrary. Head cannoning a number as input will always result in inaccuracy for sure. Make sure you properly measure the distance from sensor to the object using some sort of measuring tool. There are some depth detection algorithms out there in the internet but they are not accurate in most cases and need to calibrated after every single try. That is indeed an option if you dont have any setup to take consistent photos but inaccuracies are inevitable especially in objects like iris which requires medical precision.
Once you have gathered all these "proper" information the rest is to dump these into a very simple equation which is a derivative of the "Similar Traingles".
Object height/width on sensor (mm) = Sensor height/width (mm) × Object height/width (pixels) / Sensor height/width (pixels)
Real Object height (in units) = Distance to Object (in units) × Object height on sensor (mm) / Focal Length (mm)
In the first equation, you must decide along which axis you want to measure. For instance, if the image is taken in portrait and you are measuring the width of the object in the image, then input the width of the image in pixels and the width of the sensor in mm.
Sensor height/width in pixels is nothing but the size of the "image"
Also you must acquire the object size in pixels by any means.
If you are taking image in landscape, make sure you are passing the correct width and height.
Equation 2 is pretty simple as well.
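To make the two equations concrete, here is a minimal Python sketch; every number below is a placeholder, not data from this setup, and must be replaced with the real values for your phone and photo:

# Illustrative values only -- substitute the real numbers for your phone and photo.
SENSOR_WIDTH_MM = 6.4       # physical sensor width from the phone's spec sheet
IMAGE_WIDTH_PX = 4000       # width of the original, uncropped photo
FOCAL_LENGTH_MM = 4.7       # true focal length of the lens (not the 35 mm equivalent)
DISTANCE_MM = 100.0         # measured camera-to-object distance (10 cm)
object_width_px = 1200      # measured width of the object in the image

# Equation 1: size of the object as projected onto the sensor.
object_on_sensor_mm = SENSOR_WIDTH_MM * object_width_px / IMAGE_WIDTH_PX

# Equation 2: similar triangles give the real-world size.
real_object_width_mm = DISTANCE_MM * object_on_sensor_mm / FOCAL_LENGTH_MM
print(round(real_object_width_mm, 2), "mm")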
Things to consider:
No magnification (Digital magnification can destroy any depth info)
No Autofocus (Already Explained)
No cropping/editing image size/resizing (Already Explained)
No image skewing.(Rotating the image can make the image unfit)
Do not substitute random values for any of these inputs (Golden Advice)
Do not tilt the camera while taking images (Tilting the camera can distort the image so the object height/width will be altered)
Make sure the object and the camera are exactly in line.
Don't use the EXIF data of the image (the depth information in EXIF data is absolute garbage since it is not accurate at all; do not rely on it).
Things I'm unsure of till now:
Lens distortion / Manufacturing defects
Effects of field of view
Perspective Foreshortening due to camera tilt
Depth field cameras
DISCLAIMER: There are multiple ways to solve this problem, but I chose this method, and I highly recommend you explore more and see what you can come up with. You can basically extend this idea to measure pretty much any object using a smartphone (given images that a normal smartphone can take).
(Please don't try to measure the size of an amoeba with this. It simply won't work, but you can indeed use some of the advice I have given to your advantage.)
If you have cool ideas or issues with my answer, please feel free to let me know; I would love to have a discussion. Feel free to correct me if I have made any mistakes or misunderstood any of these concepts.
Final Note:
No matter how hard you try, you cannot make a smartphone work and behave like a camera sensor that is specifically designed to take images for measurement purposes. A smartphone can never beat those, but we can certainly push the smartphone camera to achieve similar results up to a certain degree. Keep this in mind; I learnt it the hard way.
I'm training a neural network on stimuli which are being developed to mimic a sensory neuroscience task to compare performance to human results.
The task is based on spatial localization of audio. I need to generate white-noise audio in Python to present to the neural network, but I also need to alter the audio as if it were presented from different locations. I understand how I'd generate the audio, but I'm not sure how to generate the white noise from different theoretical locations.
You can add a delay to the right or left track to account for the different arrival times at the two ears. If I recall correctly, it amounts to at most a fraction of a millisecond (on the order of 0.6-0.7 ms), depending on the angle. The travel-distance disparity from the source to the two ears can be calculated with basic trigonometry, and then divided by the speed of sound in air to get the delay length. (IDK what Python has for controlling delays or to what granularity delay lengths can be specified.)
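As a rough sketch of that delay idea: NumPy, a 44.1 kHz sample rate, and a simple ear-spacing-times-sin(azimuth) path-difference model are all assumptions here, not part of the original question.

import numpy as np

EAR_SPACING_M = 0.18     # approximate distance between the ears (an assumption)
SPEED_OF_SOUND = 343.0   # m/s in air
FS = 44100               # sample rate, also an assumption

def spatial_white_noise(duration_s, azimuth_deg, fs=FS, rng=None):
    # White noise with a simple interaural time difference (ITD):
    # path difference ~= ear spacing * sin(azimuth), divided by the speed
    # of sound to get the delay applied to the far ear.
    rng = rng or np.random.default_rng()
    n = int(duration_s * fs)
    noise = rng.standard_normal(n).astype(np.float32)

    itd_s = EAR_SPACING_M * np.sin(np.radians(azimuth_deg)) / SPEED_OF_SOUND
    delay = int(round(abs(itd_s) * fs))
    delayed = np.concatenate([np.zeros(delay, dtype=np.float32), noise])[:n]

    # Positive azimuth = source on the right, so the left channel is delayed.
    left, right = (delayed, noise) if itd_s > 0 else (noise, delayed)
    return np.stack([left, right], axis=1)   # (n, 2) stereo buffer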
Most of the other cues we have for spatial location are a lot harder to quantify. Most commonly we use volume, of course. Especially for higher-pitched content (wavelengths smaller than the width of the head), the head itself can block the sound and cause some volume differences, depending on the angle.
But a lot comes from reverberation for environmental cues, from timbral roll-off as a function of distance (a quiet sound with lots of highs in the mix can really sound like it is right next to your ear), from moving the head to capture the sound from different angles, and from the filtering effects of the pinna of the ear. Because everyone's ear shape is different, I don't know that there is a universal thumbnail algorithm for what causes a sound to be sensed as originating from a particular altitude at a given angle. I think to some extent we all just learn by experiencing sounds with our own particular ears while observing the sound source visually.
I am planning to acquire position in 3D cartesian coordinates from an IMU (Inertial Sensor) containing Accelerometer and Gyroscope. I'm using this to track the objects position and trajectory in 3D.
1 - From my limited knowledge I was under the assumption that the accelerometer alone would be enough, giving acceleration along the xyz axes A(Ax, Ay, Az), which would need to be integrated twice to get velocity and then position. But integrating adds an unknown constant value, and this error, called drift, increases with time. How do I remove this error?
2 - Furthermore, why is there a need for a gyroscope in the first place? Can't we just translate the x-y-z acceleration into displacement? If the accelerometer tells us the axis of motion, why check orientation with a gyroscope? Sorry, this is a very basic question; everywhere I checked, both gyro and accelerometer were used, but I don't know why.
3 - Even when stationary and not in any motion, the earth's gravitational force acts on the sensor, which will always give readings larger than those caused by the motion of the sensor alone. How do you remove gravity?
Once this has been done, I'll apply a Kalman filter to fuse them and smooth the values. How accurate is this method for trajectory estimation of an object in environments where GPS is not an option? I'm getting the accelerometer and gyroscope values from an Arduino and then importing them into Python, where they will be plotted on a 3D graph updating in real time. Any help would be highly appreciated, especially links to similar code.
1 - An accelerometer can be calibrated to account for some of this drift, but in the end no sensor is perfect and inaccuracy will inevitably cause drift. To fix this you need some filter, such as a Kalman filter, that uses the accelerometer for short-term, high-frequency data and a secondary sensor, such as a camera, to periodically get the absolute position and correct the integrated position. This is the fundamental idea behind the Kalman filter.
2 - Accelerometers aren't very good for high-frequency rotational data. Using only the accelerometer's data, the system could not differentiate between a horizontal linear acceleration and a change in rotational position. The gyroscope is used for the high-frequency data, while the accelerometer is used for low-frequency data to adjust for and counteract the rotational drift. A Kalman filter is one possible solution to this problem, and there are many great online resources explaining it.
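To illustrate that high-/low-frequency split, here is a one-line complementary filter, a simpler cousin of the Kalman filter; the gain alpha and the time step dt are tuning assumptions, not values from your setup:

def complementary_filter(angle_prev, gyro_rate, accel_angle, dt, alpha=0.98):
    # One filter step: integrate the gyro for the high-frequency part and pull
    # toward the accelerometer-derived angle to cancel low-frequency drift.
    return alpha * (angle_prev + gyro_rate * dt) + (1.0 - alpha) * accel_angle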
3 - You would have to use gyro/accel sensor fusion to get the 3D orientation of the sensor and then use vector math to subtract 1 g along that orientation.
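A sketch of that vector subtraction is below; the axis order and sign conventions are assumptions (a level, stationary sensor is assumed to read roughly [0, 0, +G]), so adapt them to your IMU before relying on it:

import numpy as np

G = 9.81  # m/s^2

def linear_acceleration(accel_body, roll, pitch):
    # Remove gravity from a body-frame accelerometer reading (m/s^2).
    # roll/pitch (radians) are assumed to come from your gyro+accel fusion.
    cr, sr = np.cos(roll), np.sin(roll)
    cp, sp = np.cos(pitch), np.sin(pitch)
    gravity_body = G * np.array([-sp, sr * cp, cr * cp])  # gravity expressed in the sensor frame
    return np.asarray(accel_body, dtype=float) - gravity_body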
You would most likely be better off looking at some online resources to get the gist of it and then using a pre-built sensor fusion system, whether a library or the fusion engine built into the IMU itself (present on most modern parts, including the MPU6050). These onboard systems typically do a better job than a simple Kalman filter and can combine other sensors, such as magnetometers, for even more accuracy.
I have an air drone with four motors and wanted to make it fly between two straight lines.
The first problem:
Its initial position will be in the middle at a certain height, but because of air disturbances it may deviate (up or down) or (left or right). I have calculated the error when it deviates left or right using the camera, but I still don't know how to calculate the height error (also using the camera, without a pressure sensor).
The second problem:
after calculating these errors, how do I convert them from a number into an actual movement?
Sorry, I couldn't provide my code; it is too large and complicated.
1) Using a single camera to calculate distance is not enough.
However, if you're using a stereo camera, you can get distance data pretty easily. If you want to avoid using a pressure sensor, you may want to consider using a distance sensor (LIDAR or ultrasonic; check the maximum range on these) to measure the height at which your drone will fly. In addition to this, you'll need an error-control algorithm, e.g. a PID controller, to make your drone fly at a constant height.
This is a fantastic source for understanding the fundamentals of PID.
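For a feel of how the height error turns into a correction, here is a minimal PID sketch; the gains, setpoint, and sensor names are hypothetical and would need tuning on the actual drone:

class PID:
    # Minimal PID controller for holding a target height.
    def __init__(self, kp, ki, kd, setpoint):
        self.kp, self.ki, self.kd = kp, ki, kd
        self.setpoint = setpoint
        self.integral = 0.0
        self.prev_error = 0.0

    def update(self, measurement, dt):
        error = self.setpoint - measurement        # e.g. metres of height error
        self.integral += error * dt
        derivative = (error - self.prev_error) / dt
        self.prev_error = error
        return self.kp * error + self.ki * self.integral + self.kd * derivative

# Example (hypothetical names): correct the throttle from a LIDAR height reading
# taken every 20 ms.
# altitude_pid = PID(kp=0.8, ki=0.1, kd=0.3, setpoint=1.5)
# throttle_correction = altitude_pid.update(lidar_height_m, dt=0.02)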
2) For implementation:
In my opinion, this video is great for understanding how your sensor data gets converted into actual movement and will help you create an analogy. You'll also get a head start from the code provided.
How can I find the actual real world velocity of an object using the optical flow information obtained from two images? Can anyone help me out?
As the commenters have already said, we need some more information about your problem.
Basically: yes, it is possible to calculate real-world velocity from images.
But all of this depends on the following things:
Is your camera fixed, or is it possibly moving?
Are you trying to calculate the velocity of any object moving anywhere in the scene, or do you have a fixed lane, like a street filmed with a mounted camera where objects (cars) always move along one lane?
If the latter, can you take measurements on the street in the real world? For example, marking points on the boardwalk (permanently, or just temporarily to find out how many pixels a distance of x meters in the real world spans in your camera image).
If you cannot do those measurements in the real-world scene, you will need to provide information on the angle of the camera to the scene/ground level, the distance of the camera to the scene, and the parameters of your camera.
For calculating the velocity of any tracked object on the scene you'd probably need all the latter stuff to really calculate the distances in the scene. But this is much more difficult.
If you have the case of a fixed lane where you i.e. want to measure a car's velocity I would prefer the method with measuring or marking points in real world.
Because if you have that information:
x m = y px
and an object has moved z px in time t (you get that time from the refresh rate of your calculation), you can work out how many pixels it moves in one second, and since you know how many pixels correspond to one meter, you know its speed in meters per second (or any other unit you prefer).
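In code that conversion is just a couple of lines; a sketch with made-up calibration numbers:

# Known from a real-world measurement: a marked stretch of x metres spans y pixels.
METERS_PER_PIXEL = 5.0 / 400.0     # e.g. 5 m of road = 400 px (illustrative numbers)

def speed_mps(pixel_displacement, dt):
    # Convert a pixel displacement between two frames into metres per second;
    # dt is the time between the frames (e.g. 1/fps for consecutive frames).
    return pixel_displacement * METERS_PER_PIXEL / dt

# Example: the tracked object moved 24 px between frames of a 30 fps video,
# giving 24 * 0.0125 / (1/30) = 9 m/s with the illustrative scale above.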
You could also just set two marks in the scene and simply measure how many frames (and therefore how much time) the object needed to move from one marking to the other. This gives you a more averaged velocity, because if you do the calculation in small time steps you might get a noisy result, due to segmentation problems or simply because the changes between frames get smaller the shorter the measured timespan is.
And for segmentation you could simply try a subtraction method: subtract two or three consecutive frames from each other. Moving objects (and therefore image parts that have changed) will produce non-zero values, whereas the color values of a static image region should subtract to roughly zero.
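A minimal OpenCV version of that subtraction idea; the file names and threshold value are placeholders:

import cv2

prev = cv2.cvtColor(cv2.imread("frame_0.png"), cv2.COLOR_BGR2GRAY)   # hypothetical files
curr = cv2.cvtColor(cv2.imread("frame_1.png"), cv2.COLOR_BGR2GRAY)

diff = cv2.absdiff(curr, prev)                        # non-zero where something moved
_, mask = cv2.threshold(diff, 25, 255, cv2.THRESH_BINARY)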
Maybe that helps you with your problem... but of course this depends on your setup and your desired goal... You'll need to provide more information, then...
This method is quite long but in short:
What you can do is set a value that specifies the distance of object from camera.
Then capture first frame and save it somewhere.
Capture last frame and save it somewhere.
Apply a threshold to both frames.
Trim all the pixels from the left of the first frame, then do the same for the second frame.
For a detailed tutorial, I think this article may help you a bit:
http://morefunscience.blogspot.in/2012/05/calculating-speed-using-webcam.html