3D Scanning with Kinect v2

Getting an accurate surface scan with the Kinect V2

May 11, 2015

Video with brief overview

What the data can usually look like


The Kinect V2 measures the time it takes infrared light to leave the sensor and come back to it. This measurement is not perfectly accurate, so each pixel's depth varies within a certain range. The bumps you see in the picture above will be different on each frame from the Kinect V2.
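To put rough numbers on that measurement, here is a minimal sketch of the idealized round-trip relationship, where distance is the speed of light times the travel time, divided by two. The function names are made up for illustration, and this is only the textbook relationship, not a description of how the sensor processes its signal internally.

```python
SPEED_OF_LIGHT_M_PER_S = 299_792_458  # speed of light in a vacuum


def distance_from_round_trip(seconds):
    """Distance to a surface given the light's out-and-back travel time."""
    # The light covers the distance twice, so halve the total path.
    return SPEED_OF_LIGHT_M_PER_S * seconds / 2


def round_trip_for_distance(meters):
    """Out-and-back travel time for a surface at the given distance."""
    return 2 * meters / SPEED_OF_LIGHT_M_PER_S


if __name__ == "__main__":
    print(round_trip_for_distance(1.0))    # seconds for a surface 1 m away
    print(round_trip_for_distance(0.001))  # seconds for a 1 mm difference
```

A surface 1 m away corresponds to a round trip of roughly 6.7 nanoseconds, and a 1 mm change in depth shifts that by only a few picoseconds, which gives a feel for why every frame comes back slightly different.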

The error range also grows toward the edge of the sensor's range and wherever less infrared light is available, as can be seen in the increased number of bumps in the picture of a flat wall above.

Just averaging won’t work

Sometimes a pixel from the Kinect V2 is bad data: either a dead pixel (no data) or a wild pixel that is way off from the actual point.

If you average multiple frames together and exclude only the dead pixels, the wild pixels are still included, and they can throw the average outside the error range.
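A quick illustration with made-up numbers: six readings of the same pixel, in millimetres, where one frame is dead and one is wild. The values, and the use of 0 to mark a dead pixel, are assumptions for the example rather than real capture data.

```python
# Hypothetical depth readings (mm) for one pixel across six frames.
# 0 stands in for a dead pixel; 1500 is a wild reading.
frames = [1000, 1002, 0, 998, 1001, 1500]

# Drop the dead pixel, then take a plain average of what is left.
live = [depth for depth in frames if depth != 0]
naive = sum(live) / len(live)

print(naive)  # 1100.2
```

The good readings all sit within a few millimetres of 1000, yet the plain average lands about 100 mm away from them because the single wild reading is still counted.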

Staying within Error Range

If a pixel stays within the error range of where the actual position is, then the pixel is good data and can be used in calculations that merge different meshes together.

If a pixel is altered so that it might no longer fall within the error range of the actual position, the scan will appear to have a different real-world structure when compared with another scan in which the same pixel sits on the far side of the error range.

Averaging within the trend

To handle dead and wild pixels, a pixel is examined across several frames. Dead pixels are excluded from the comparison and the remaining pixels are measured to see how far each one is from all of the other pixels.

Each pixel's total distance from all the others is used to determine where the 'cluster' of pixels is located: the lower the total, the closer that pixel is to the group. The pixels are sorted, and the one in the middle of the group is used as the middle distance. If a pixel's total distance from every other pixel ends up being more than 1.5 times the middle distance, it is excluded from the average. (The 1.5 factor is just a value I picked arbitrarily; I haven't optimized it, and it is subject to change.)

The calculation handles both large and small error ranges. If one pixel is off on its own, it won't sway the average unless the trend is that all the pixels are spread out.
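The steps above can be sketched for a single pixel as follows. This is a minimal illustration under a few assumptions, not the actual Scan from Life code: the function name is made up, a dead pixel is assumed to be reported as 0, and "the one in the middle" is read as the median of the per-sample total distances.

```python
def trend_average(samples, dead_value=0, cutoff=1.5):
    """Average one pixel's depth across frames, skipping dead and wild samples.

    samples    -- depth readings for the same pixel, one per frame
    dead_value -- value that marks a dead pixel (assumed to be 0 here)
    cutoff     -- multiple of the middle distance beyond which a sample is wild
    """
    # Dead pixels are excluded before any comparison.
    live = [s for s in samples if s != dead_value]
    if not live:
        return None  # no usable data for this pixel
    if len(live) < 3:
        return sum(live) / len(live)  # too few samples to spot a trend

    # Total distance from each sample to every other sample.
    totals = [sum(abs(s - other) for other in live) for s in live]

    # Sort the totals and take the middle one as the reference distance.
    middle = sorted(totals)[len(totals) // 2]

    # Keep only samples whose total is within cutoff * middle.
    kept = [s for s, t in zip(live, totals) if t <= cutoff * middle]
    return sum(kept) / len(kept)


if __name__ == "__main__":
    print(trend_average([1000, 1002, 0, 998, 1001, 1500]))  # 1000.25
    print(trend_average([990, 1000, 1010, 1020, 1030]))     # 1010.0
```

In the first call the dead reading is dropped and the wild 1500 is rejected, because its total distance (1999) is far above 1.5 times the middle total (505). In the second call the readings are evenly spread, so none of them stands out from the trend and all five are averaged.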

The result

Single capture with no alteration done


Below is a capture of 6 frames with the dead and wild pixels excluded. No data alteration was done, so this frame still falls within the error range. It's an average of where the pixels are trending.


If you don't see the differences, check out these comparisons. On the left is a single frame; on the right is the frame with the trend averaged.

Left: Single Frame.  Right: Trending Frame

Left: Single Frame. Right: Trending Frame

Note: Scan from Life is in no way affiliated with Microsoft. We are an entirely separate company that has created a product that depends on a Microsoft-owned product.