In the last office hours Sebastian mentioned the idea of creating local maps to use as 'landmarks' and then stitching those maps together via techniques like GraphSLAM. I had somewhat mixed results trying to identify traditional landmarks in image data, as mentioned in this thread, so I thought I would give this a try. I have previously worked on identifying the edges of the sidewalk in images. A couple of samples are below, with the blue lines representing the innermost edges of the sidewalk, which I am using as guidelines, and the green lines representing other lines that meet at the vanishing point, which is marked with a green dot.
Below each image is a top-down view of the 'local map' from that image. The left-right position of the lines (X) is along the horizontal axis and the distance from the camera (Z) is along the vertical axis. I create the map using the camera's intrinsic parameters and a few assumptions: namely, that the points on the blue lines lie on the ground plane, which is 15 in below the camera, and that the lines are parallel to each other, so that Z = infinity at the vanishing point.
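In case it helps anyone, here is a minimal sketch of that flat-ground back-projection. The focal length and principal point below are made-up pixel values, not my actual calibration; the key idea is that the horizon row v0 (the vanishing point's row) gives Z = infinity, and rows below it map to finite depths.

```python
def ground_point_from_pixel(u, v, f, u0, v0, cam_height):
    """Back-project image pixel (u, v) to ground-plane coordinates (X, Z),
    assuming the point lies on flat ground cam_height below a level camera.
    f is the focal length in pixels; (u0, v0) is the principal point,
    with v0 also the horizon row. All intrinsics here are hypothetical."""
    if v <= v0:
        raise ValueError("pixel at or above the horizon is not on the ground")
    Z = f * cam_height / (v - v0)  # depth grows as v approaches the horizon
    X = (u - u0) * Z / f           # lateral offset scales with depth
    return X, Z

# Example: a pixel 100 rows below the horizon, 100 columns right of center,
# with f = 500 px and the camera 15 in above the ground.
X, Z = ground_point_from_pixel(420, 340, 500.0, 320.0, 240.0, 15.0)
```

With these toy numbers the pixel lands 75 in ahead and 15 in to the right, and moving the pixel closer to row v0 pushes Z out toward infinity, matching the vanishing-point assumption.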
When I calculate the distance between the guidelines for points that are relatively close to the camera, I get a little more than 4 ft 9 in. Shockingly, when I went out and measured the actual distance between the lines, it turned out to be 4 ft 10 in (sorry, I don't seem to have a metric tape measure). So from this limited data sample I am going to say this is working well enough.
I think my next step is to impose a grid on each of these maps and use dynamic programming to calculate the optimal path to the far side, taking into account penalties for hitting the guidelines and for driving on the 'wrong side' of the sidewalk. As for localization, it seems like, at least at the local level, a Monte Carlo method would be most appropriate.
This seems like a good way to make progress, but I wonder if anyone with more background has any particular concerns or suggestions for improvement.
So, just a final update from me on this in case anyone is still watching the forum. I went ahead and created a local map using 6 inch grid cells, marking each cell with a '1' if a guideline was present in it and a '0' otherwise. I won't post it because it's not that interesting: just a 10x50 matrix with 0's almost everywhere and 1's along most of the edges. In the future, when I can detect obstacles, I can mark their cells with a 1 as well, but I am not doing that yet.
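For anyone who wants to reproduce this, here is roughly how such a map can be built. The guideline offsets below are placeholders (not my measured values), and I am assuming straight guidelines so each one occupies a single column at every depth:

```python
CELL = 6.0  # inches per grid cell, as described above

def make_local_map(rows, cols, guideline_xs):
    """Build a rows x cols occupancy grid (rows = depth away from the
    camera, cols = left-right) and mark a 1 in every cell a guideline
    passes through. guideline_xs are lateral offsets in inches from
    the left edge of the map; these are hypothetical values."""
    grid = [[0] * cols for _ in range(rows)]
    for x in guideline_xs:
        col = int(x // CELL)
        if 0 <= col < cols:
            for r in range(rows):
                grid[r][col] = 1  # straight line: same column at every depth
    return grid

# A 10x50 map (60 in wide, 300 in deep) with the two sidewalk edges
# 58 in (4 ft 10 in) apart, offset 1 in from the map's left border.
local_map = make_local_map(50, 10, [1.0, 59.0])
```

Obstacle cells could later be marked with a 1 in exactly the same way.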
The next step is to use dynamic programming to create the value and policy matrices. I decided it would be useful to have 5 commands: straight ahead, 90 degree turns to the right and left, and 45 degree turns to the right and left. That means I have 8 possible orientations: up, up-left, left, down-left, down, down-right, right, and up-right. It also means the car can move along diagonals of the map, which is what will happen in the real world. I have to penalize turns relative to going straight, but more interesting is that I also have to penalize moving straight along a diagonal relative to moving straight up-down or left-right, because moving from cell to cell along a diagonal is longer by a factor of sqrt(2).
To me the really cool thing about dynamic programming is that the optimal policy is precalculated for every grid cell and orientation, so all I have to do is run the policy from a given starting point to obtain the optimal path through the local map and the steering commands. Also cool is that if the car somehow finds itself somewhere unexpected, it doesn't have to recalculate anything; it just reads out the policy for its position and orientation and follows directions.
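To make the scheme concrete, here is a toy sketch of the value/policy computation on a small grid. The cost numbers (turn penalties, guideline penalty) are made up for illustration; the goal is any cell on the far side of the map:

```python
import math

# Grid values: 0 = free, 1 = guideline cell (heavily penalized).
# Orientations 0..7 counter-clockwise from 'up' (row index decreasing):
# up, up-left, left, down-left, down, down-right, right, up-right.
DIRS = [(-1, 0), (-1, -1), (0, -1), (1, -1),
        (1, 0), (1, 1), (0, 1), (-1, 1)]
# Actions: (change in orientation, turn penalty). Straight is free;
# 45 degree turns cost 1, 90 degree turns cost 3 (assumed numbers).
ACTIONS = [(0, 0.0), (1, 1.0), (-1, 1.0), (2, 3.0), (-2, 3.0)]
GUIDELINE_COST = 100.0

def plan(grid, goal_row):
    """Value iteration over (row, col, orientation). A diagonal step
    costs sqrt(2) instead of 1, and each turn adds its penalty."""
    rows, cols = len(grid), len(grid[0])
    value = [[[float('inf')] * 8 for _ in range(cols)] for _ in range(rows)]
    policy = [[[None] * 8 for _ in range(cols)] for _ in range(rows)]
    changed = True
    while changed:
        changed = False
        for r in range(rows):
            for c in range(cols):
                for o in range(8):
                    if r == goal_row:  # any far-side cell is a goal
                        if value[r][c][o] > 0.0:
                            value[r][c][o] = 0.0
                            changed = True
                        continue
                    for a, (do, turn_cost) in enumerate(ACTIONS):
                        o2 = (o + do) % 8
                        dr, dc = DIRS[o2]
                        r2, c2 = r + dr, c + dc
                        if not (0 <= r2 < rows and 0 <= c2 < cols):
                            continue
                        step = math.sqrt(2.0) if dr and dc else 1.0
                        cost = step + turn_cost
                        if grid[r2][c2]:
                            cost += GUIDELINE_COST
                        v = value[r2][c2][o2] + cost
                        if v < value[r][c][o]:
                            value[r][c][o] = v
                            policy[r][c][o] = a
                            changed = True
    return value, policy

def run_policy(policy, start, goal_row):
    """Read the precomputed policy out from any (row, col, orientation):
    no replanning needed if the car starts somewhere unexpected."""
    r, c, o = start
    path = [(r, c)]
    while r != goal_row:
        a = policy[r][c][o]
        if a is None:
            return None  # no route from this state
        o = (o + ACTIONS[a][0]) % 8
        r, c = r + DIRS[o][0], c + DIRS[o][1]
        path.append((r, c))
    return path
```

Calling `run_policy` from any cell and orientation just walks the policy table, which is what makes the "car finds itself somewhere unexpected" case free.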
Here is a sample path through the map:
In this case the car found itself pointed up but a few grid cells to the left of where it wanted to be, so it took a 45 degree right turn, went straight for a couple of cells, and then took a 45 degree left turn.
Here is another somewhat contrived example where the car finds itself facing down-left and makes two 90 degree turns to head back to the path.
The next step before hooking this up to an actual car is to implement localization so that the car can figure out which cell it is in. My plan for left-right localization is to just use the calculated distance from each of the guidelines, which should be plenty to put the car in the correct column. Localization in the Z direction is going to be trickier, but I have an idea based on tracking the relative movement of strong corners in each image and calculating the delta Z from that.
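A rough sketch of both ideas, reusing the flat-ground model from earlier in the thread. The cell size matches the map above, but the focal length, horizon row, and camera height are placeholder numbers, and the corner-tracking part assumes the tracked corner sits on the ground:

```python
CELL = 6.0  # inches per grid column, as above

def column_from_guideline(d_left, cell=CELL):
    """Left-right localization: the measured distance in inches from the
    left guideline implies a specific grid column."""
    return int(d_left // cell)

def delta_z_from_corner(v1, v2, f, v0, cam_height):
    """Forward-motion estimate from one tracked ground corner: convert its
    image row in each frame to a depth via Z = f * h / (v - v0), then
    difference the two depths. v1, v2 are the corner's rows in consecutive
    frames; f, v0, cam_height are hypothetical calibration values."""
    z1 = f * cam_height / (v1 - v0)
    z2 = f * cam_height / (v2 - v0)
    return z1 - z2
```

In practice many corners would be tracked and the per-corner delta-Z estimates averaged or fed to the Monte Carlo filter, since any single corner (or the flat-ground assumption behind it) can be wrong.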
Actually Michael is demonstrating how 3D content can be derived from a single image with some knowledge of perspective. This is a great lesson Michael! Stereo would be nice as well, but there is so much information in one image already.
Some time ago I read about a SLAM implementation (sorry, I do not remember where) in which the authors detected horizontal lines in images from two cameras (as I recall, with a standard algorithm available in OpenCV). They then calculated the equation of each line in 3D space by correlating the images from the two cameras. Such 3D lines could then be used as unique landmarks in 3D space.
So I think it would be cool if you could use your line detection algorithm to deduce the equation of a 3D line, which would then serve as a good unique landmark in 3D space.
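A minimal sketch of what that might look like with a rectified stereo pair: triangulate two points on the detected line and take the line through them as the landmark. The focal length, baseline, and principal point below are made-up values, and disparity is taken as uL - uR:

```python
def triangulate(uL, uR, v, f, baseline, u0, v0):
    """Standard rectified-stereo triangulation of one point.
    uL, uR are the point's columns in the left/right images (same row v);
    f, baseline, (u0, v0) are hypothetical calibration values."""
    disparity = uL - uR
    Z = f * baseline / disparity
    X = (uL - u0) * Z / f
    Y = (v - v0) * Z / f
    return (X, Y, Z)

def line_3d(p_a, p_b):
    """A 3D line landmark from two triangulated points on the same
    detected image line, returned as (point, direction)."""
    direction = tuple(b - a for a, b in zip(p_a, p_b))
    return p_a, direction
```

Triangulating the two endpoints of each detected line segment would give one such (point, direction) landmark per line.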
answered 04 Apr '12, 05:20
Pretty cool work. Can you share the code?
answered 04 Apr '12, 07:59