Awesome

Hardware

Camera Technology & Parameters

Camera Manufacturers

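Company	Main business	Positions	Base
Hikvision (海康威视)	Security surveillance, machine vision solutions	Machine vision algorithms, 3D reconstruction algorithms, stereo vision algorithms	Hangzhou
Dahua (大华)	Video surveillance, machine vision systems	Machine vision algorithms, stereo vision algorithms	Hangzhou
Orbbec (奥比中光)	3D cameras, 3D vision solutions	3D vision, 3D reconstruction, stereo vision	Shenzhen/Xi'an/Shanghai
Luster LightTech Group (北京凌云光技术集团)	Industrial vision, solutions, 3D cameras	Image algorithms	Beijing
Daheng Imaging (大恒图像)	Machine vision systems, 3D cameras, 3D sensors	Image algorithms, machine vision algorithms	Beijing
Keyence (基恩士)	Machine vision solutions, vision products		Shanghai/Suzhou
Cognex (康耐视)	Machine vision solutions, sensors	Machine vision	Shanghai/Suzhou/Hangzhou
埃尔森智能科技	Robotic 3D vision, 3D cameras, structured light		Zhengzhou
	Stereo vision, binocular perception		Beijing
Aqrose (阿丘科技)	AI vision solutions	AI algorithms, vision algorithms, machine vision	Beijing/Suzhou/Shenzhen
Percipio (图漾科技)	3D vision systems and products	Vision algorithms, image processing	Shanghai
精锐视觉	Industrial AI vision	Image algorithms	Shenzhen/Shanghai
华夏视科	Industrial vision, image inspection	Image algorithms	Beijing/Shanghai
SICK (Germany)	Machine vision solutions	Machine vision engineer	Beijing/Shanghai/Shenzhen/Guangzhou
Deptrum (光鉴科技)	3D vision solutions	TOF algorithms, 3D vision algorithms	Shanghai
征图新视	Machine vision solutions	Machine vision, deep learning	Shenzhen/Changzhou/Suzhou
Vimicro (中星微电子)	Imaging chips	Deep learning, video and image processing	Beijing/Shanghai
捷尚视觉	Intelligent video analytics	Image algorithms	Hangzhou
Shining 3D (先临三维科技)	3D scanners, 3D imaging	3D vision algorithms, point cloud algorithms	Hangzhou
Huaray Technology (华睿科技)	Machine vision	Machine vision engineer	Hangzhou
蓝芯科技	Vision systems, 3D vision sensors		Hangzhou
微视图像	Machine vision, industrial cameras, 3D cameras		Beijing
库柏特科技	Robots, 3D vision products		Wuhan
辰视智能	Structured light, 3D systems, binocular and multi-camera systems		Shenzhen
星上维智能科技	Structured light, 3D machine vision, 3D scanners		Guangzhou
创科视觉	Machine vision systems, 3D cameras		Shenzhen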

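Camera Calibration

Surveys

A Survey of Line-Scan Camera Calibration Methods

Single-Camera Calibration

Hand-Eye Calibration

A New Method for Calibrating the Robot Hand-Eye Relationship

Other

An Infrared Target System Based on Zhang Zhengyou's Calibration Method

3D Panoramic Cameras

360° Surround View

Fisheye Camera Calibration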

A Practical Toolbox for Calibrating Omnidirectional Cameras

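Multi-Camera Stitching

Stitching is mainly done by detecting and matching corresponding feature points with algorithms such as SIFT, SURF, and Harris.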
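A minimal sketch of the matching step, assuming descriptors have already been extracted (real SIFT or SURF descriptors are stood in for by plain lists of floats, and the 0.75 ratio is an assumed, tunable threshold). It uses brute-force nearest-neighbour search with Lowe's ratio test, which rejects a match when the best and second-best candidates are too similar to be trusted:

```python
import math

def l2(a, b):
    """Euclidean distance between two descriptor vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def ratio_test_match(desc_a, desc_b, ratio=0.75):
    """Match each descriptor in desc_a to desc_b by brute force.

    A match (i, j) is kept only if the nearest distance d1 satisfies
    d1 < ratio * d2, where d2 is the second-nearest distance (Lowe's ratio test).
    """
    matches = []
    if len(desc_b) < 2:
        return matches  # the ratio test needs at least two candidates
    for i, da in enumerate(desc_a):
        ranked = sorted((l2(da, db), j) for j, db in enumerate(desc_b))
        (d1, j1), (d2, _) = ranked[0], ranked[1]
        if d1 < ratio * d2:
            matches.append((i, j1))
    return matches
```

For example, with `desc_a = [[0, 0], [5, 5], [2.5, 2.5]]` and `desc_b = [[0.1, 0], [5, 5.1], [10, 10]]`, the third descriptor is rejected because it is almost equally close to two candidates, which is exactly the ambiguity the ratio test is meant to filter out before geometric verification.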

3D Vision Resources

Books

Vision Measurement [Zhang Guangjun]
Robot Vision Measurement and Control [Xu De, Tan Min, Li Yuan]
Machine Vision 2016: Automated Visual Inspection: Theory, Practice and Applications

Resources

https://github.com/timzhang642/3D-Machine-Learning

https://github.com/sunglok/3dv_tutorial (code examples covering SLAM and multi-view geometry)

SLAM

Curated Open-Source Projects

https://github.com/OpenSLAM/awesome-SLAM-list

https://github.com/tzutalin/awesome-visual-slam

https://github.com/kanster/awesome-slam

https://github.com/YoujieXia/Awesome-SLAM

Recent_SLAM_Research

https://github.com/youngguncho/awesome-slam-datasets

https://github.com/marknabil/SFM-Visual-SLAM

https://github.com/ckddls1321/SLAM_Resources

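Laser SLAM

Laser SLAM is divided into a front end and a back end: the front end mainly performs matching and pose estimation, while the back end applies further optimization constraints.

Overall, SLAM can be roughly divided into a front end and a back end. The front end is essentially VO (visual odometry) and studies the transformation between frames: feature points are extracted from each image and matched between adjacent frames, gross outliers are removed with RANSAC, and the remaining matches yield a pose (position and orientation). Attitude information from an IMU (inertial measurement unit) can also be fused in through filtering.

The back end optimizes the front-end results, using filtering theory (EKF, UKF, PF) or optimization frameworks such as TORO and g2o to optimize a tree or graph, and finally produces the optimal pose estimate.

Data Preprocessing

Point Cloud Matching

Map Building

Visual SLAM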

Books

14 Lectures on Visual SLAM - Gao Xiang
State Estimation for Robotics
Probabilistic Robotics
Simultaneous Localization and Mapping for Mobile Robots: Introduction and Methods by Juan-Antonio Fernández-Madrigal and José Luis Blanco Claraco, 2012
Simultaneous Localization and Mapping: Exactly Sparse Information Filters by Zhan Wang, Shoudong Huang and Gamini Dissanayake, 2011
An Invitation to 3-D Vision -- from Images to Geometric Models by Yi Ma, Stefano Soatto, Jana Kosecka and Shankar S. Sastry, 2005
Multiple View Geometry in Computer Vision by Richard Hartley and Andrew Zisserman, 2004
Numerical Optimization by Jorge Nocedal and Stephen J. Wright, 1999

Courses & Lectures

SLAM Tutorial@ICRA 2016
Geometry and Beyond - Representations, Physics, and Scene Understanding for Robotics at Robotics: Science and Systems (2016)
Robotics - UPenn on Coursera by Vijay Kumar (2016)
Robot Mapping - UniFreiburg by Gian Diego Tipaldi and Wolfram Burgard (2015-2016)
Robot Mapping - UniBonn by Cyrill Stachniss (2016)
Introduction to Mobile Robotics - UniFreiburg by Wolfram Burgard, Michael Ruhnke and Bastian Steder (2015-2016)
Computer Vision II: Multiple View Geometry - TUM by Daniel Cremers (Spring 2016)
Advanced Robotics - UCBerkeley by Pieter Abbeel (Fall 2015)
Mapping, Localization, and Self-Driving Vehicles at CMU RI seminar by John Leonard (2015)
The Problem of Mobile Sensors: Setting future goals and indicators of progress for SLAM sponsored by Australian Centre for Robotics and Vision (2015)
Robotics - UPenn by Philip Dames and Kostas Daniilidis (2014)
Autonomous Navigation for Flying Robots on EdX by Jurgen Sturm and Daniel Cremers (2014)
Robust and Efficient Real-time Mapping for Autonomous Robots at CMU RI seminar by Michael Kaess (2014)
KinectFusion - Real-time 3D Reconstruction and Interaction Using a Moving Depth Camera by David Kim (2012)

Code

Project	Language	License
COSLAM	C++	GNU General Public License
DSO-Direct Sparse Odometry	C++	GPLv3
DTSLAM-Deferred Triangulation SLAM	C++	modified BSD
LSD-SLAM	C++/ROS	GNU General Public License
MAPLAB-ROVIOLI	C++/ROS	Apache 2.0
OKVIS: Open Keyframe-based Visual-Inertial SLAM	C++	BSD
ORB-SLAM	C++	GPLv3
REBVO - Realtime Edge Based Visual Odometry for a Monocular Camera	C++	GNU General Public License
SVO: Semi-direct Visual Odometry	C++/ROS	GNU General Public License

Computer Vision

Resource Collections

Books

Computer Vision: Models, Learning, and Inference - Simon J. D. Prince 2012
Computer Vision: Algorithms and Applications - Richard Szeliski 2010
Computer Vision: A Modern Approach (2nd edition) - David Forsyth and Jean Ponce 2011
Multiple View Geometry in Computer Vision - Richard Hartley and Andrew Zisserman 2004
Visual Object Recognition synthesis lecture - Kristen Grauman and Bastian Leibe 2011
Computer Vision for Visual Effects - Richard J. Radke, 2012
High dynamic range imaging: acquisition, display, and image-based lighting - Reinhard, E., Heidrich, W., Debevec, P., Pattanaik, S., Ward, G., Myszkowski, K 2010
Numerical Algorithms: Methods for Computer Vision, Machine Learning, and Graphics - Justin Solomon 2015

Courses

EENG 512 / CSCI 512 - Computer Vision - William Hoff (Colorado School of Mines)
3D Computer Vision: Past, Present, and Future
Visual Object and Activity Recognition - Alexei A. Efros and Trevor Darrell (UC Berkeley)
Computer Vision - Steve Seitz (University of Washington)
Visual Recognition Spring 2016, Fall 2016 - Kristen Grauman (UT Austin)
Language and Vision - Tamara Berg (UNC Chapel Hill)
Convolutional Neural Networks for Visual Recognition - Fei-Fei Li and Andrej Karpathy (Stanford University)
Computer Vision - Rob Fergus (NYU)
Computer Vision - Derek Hoiem (UIUC)
Computer Vision: Foundations and Applications - Kalanit Grill-Spector and Fei-Fei Li (Stanford University)
High-Level Vision: Behaviors, Neurons and Computational Models - Fei-Fei Li (Stanford University)
Advances in Computer Vision - Antonio Torralba and Bill Freeman (MIT)
Computer Vision - Bastian Leibe (RWTH Aachen University)
Computer Vision 2 - Bastian Leibe (RWTH Aachen University)
Computer Vision - Pascal Fua (EPFL)
Computer Vision 1 - Carsten Rother (TU Dresden)
Computer Vision 2 - Carsten Rother (TU Dresden)
Multiple View Geometry - Daniel Cremers (TU Munich)

Deep Learning

GitHub links

1. https://github.com/ChristosChristofidis/awesome-deep-learning

2. https://github.com/endymecy/awesome-deeplearning-resources

Machine Learning

GitHub link

1. https://github.com/josephmisiti/awesome-machine-learning

3D Point Clouds

Point Cloud Annotation Tools

Open Source

Semantic.editor

Commercial

There are many commercial tools; Alibaba, Tencent, Baidu, and JD.com all offer related services.

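Point Cloud Acquisition

Traditional point cloud acquisition techniques fall into two categories, non-contact measurement and contact measurement. The key difference is whether the probe touches the workpiece surface during measurement.

Non-contact measurement acquires data using optical principles, for example structured light, ranging, and interferometry. It is fast and accurate and yields dense point clouds, but its accuracy is susceptible to external interference, and both the light reflected by the object's surface and the ambient light affect the result.

Contact measurement, by contrast, brings the sensor on the probe into contact with the object's surface and reads the 3D coordinates of surface points as the probe moves. Its advantages are that the probe structure is relatively fixed and the results are unaffected by the material and surface properties of the object. Its drawbacks are that the probe wears from prolonged contact with the surface, measurement is slow, and it is unsuitable for objects with complex geometry.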

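Point Cloud Applications

Reverse engineering, game character reconstruction, cultural heritage preservation, digital museums, medical assistance, 3D city modeling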

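Point Cloud Types

Different acquisition techniques produce different types of point cloud data. By the distribution of the points, point clouds can be divided into the following four types.

Scattered point clouds

In a scattered point cloud, all points are distributed randomly in space and no topological connection exists between any two points. Point clouds from laser point-measurement systems, and from coordinate measuring machines in random scanning mode, are generally scattered point clouds.

Scan-line point clouds

The 3D points captured by the device form a number of straight or curved lines, and there is some topological connection between points. This type is common in scanned point cloud data.

Gridded point clouds

In a gridded point cloud, every point corresponds to a vertex of a uniform grid in its parameter domain. Interpolating a scattered point cloud onto a grid yields a gridded point cloud.

Polygonal point clouds

A polygonal point cloud lies in a set of mutually parallel planes, and connecting the nearest points within each plane forms a planar polygon. This type is common in data from contour-line measurement, CT measurement, and similar techniques.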

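Point Cloud Denoising & Filtering

Common methods include bilateral filtering, Gaussian filtering, conditional filtering, pass-through filtering, random sample consensus (RANSAC) filtering, and VoxelGrid filtering.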
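Two of the filters listed above are simple enough to sketch in plain Python. This is an illustrative version, not PCL's implementation: points are assumed to be (x, y, z) tuples, the pass-through filter keeps points inside a range on one axis, and the VoxelGrid-style filter replaces all points in each cubic voxel with their centroid (the leaf size is an assumed parameter).

```python
import math
from collections import defaultdict

def pass_through(points, axis, lo, hi):
    """Pass-through filter: keep points whose coordinate on `axis`
    (0 = x, 1 = y, 2 = z) lies inside [lo, hi]."""
    return [p for p in points if lo <= p[axis] <= hi]

def voxel_grid(points, leaf):
    """VoxelGrid-style downsampling: bucket points into cubic voxels with
    edge length `leaf` and replace each occupied voxel by the centroid of
    its points. Output order follows the sorted voxel indices."""
    buckets = defaultdict(list)
    for p in points:
        key = tuple(math.floor(c / leaf) for c in p)  # integer voxel index
        buckets[key].append(p)
    centroids = []
    for key in sorted(buckets):
        pts = buckets[key]
        centroids.append(tuple(sum(p[k] for p in pts) / len(pts) for k in range(3)))
    return centroids
```

Typical use is to crop first (for example `pass_through(cloud, 2, 0.0, 1.0)` to keep a depth range) and then downsample the remaining points with `voxel_grid(cropped, 0.05)`; a larger leaf size removes more points.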

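Triangle Mesh Denoising Algorithms

Ordered Point Cloud Denoising

Isolated-point rejection, curve fitting, chord-height difference, global energy, and filtering methods.

The isolated-point rejection method inspects the point cloud and removes points that deviate significantly from the scan line. It is simple and removes obvious noise points, but it only provides preliminary denoising and cannot remove noise points that are mixed in with the true data.

The curve fitting method fits a curve, typically of degree 3 or 4, through the given data points from the first point to the last by least squares or a similar method, then computes the distance from each intermediate point to the curve. A point farther than a given threshold is treated as noise and removed; a point within the threshold is treated as normal and kept.

The chord-height difference method connects the first and last points of the given point set to form a chord, then computes the distance from each intermediate point to the chord. A point within a given threshold is treated as normal and kept; a point beyond it is treated as noise and removed.

The global energy method is typically used for denoising gridded point clouds. It builds an energy equation for the whole surface and minimizes the energy subject to constraints. This is a global optimization problem: because the number of grid cells is large, it consumes considerable computing resources and time, and because the constraint equations are built on the whole grid, it handles local shape details poorly.

The filtering method is another common approach for ordered point clouds. It applies signal-processing techniques with a suitable filter function; common choices include Gaussian, mean, and median filtering.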
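A minimal sketch of the chord-height difference method described above, assuming the input is one short segment of an ordered scan line given as (x, y, z) tuples and that the threshold is chosen by the user. The first and last points define the chord, and interior points farther than the threshold from it are dropped. On strongly curved segments this also removes valid points, so the segment should be kept short.

```python
import math

def point_to_line_distance(p, a, b):
    """Distance from point p to the line through a and b (all 3D tuples)."""
    ab = [b[i] - a[i] for i in range(3)]
    ap = [p[i] - a[i] for i in range(3)]
    # |ab x ap| / |ab|
    cx = ab[1] * ap[2] - ab[2] * ap[1]
    cy = ab[2] * ap[0] - ab[0] * ap[2]
    cz = ab[0] * ap[1] - ab[1] * ap[0]
    len_ab = math.sqrt(sum(v * v for v in ab))
    if len_ab == 0.0:
        return math.sqrt(sum(v * v for v in ap))  # degenerate chord
    return math.sqrt(cx * cx + cy * cy + cz * cz) / len_ab

def chord_height_denoise(segment, threshold):
    """Keep both end points plus every interior point whose distance
    to the chord (first point -> last point) is <= threshold."""
    if len(segment) < 3:
        return list(segment)
    first, last = segment[0], segment[-1]
    kept = [first]
    for p in segment[1:-1]:
        if point_to_line_distance(p, first, last) <= threshold:
            kept.append(p)
    kept.append(last)
    return kept
```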

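Unordered Point Cloud Denoising & Denoising of Scattered Point Clouds

Current denoising methods for scattered point clouds fall into two categories: methods based on a mesh model and methods that operate directly on the point data.

Mesh-based methods first build a triangle mesh of the point cloud, then compute the aspect ratio of every triangle and the curvature at each vertex and compare them with the corresponding thresholds: points below the thresholds are treated as normal and kept, and the rest are treated as noise and removed. Because the point cloud must be triangulated first, these methods tend to be complex and computationally expensive.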

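Point Cloud Simplification

Point clouds captured by 3D laser scanners are usually very dense, often containing tens of millions or even hundreds of millions of points. Even after denoising the point count remains large, so the raw data is usually not used directly for surface reconstruction and similar tasks: doing so makes subsequent processing slow and resource-hungry, and the reconstructed surface is not necessarily more accurate and may even show larger errors. Point cloud simplification is therefore usually applied to high-density data before surface reconstruction. Its goal is to remove as many redundant points as possible while preserving the shape and geometric features of the original point cloud.

Current simplification methods for scattered point clouds fall into two categories: methods based on a triangle mesh model and methods that work directly on the data points.

Triangle-mesh-based methods first triangulate the point cloud to build its mesh topology, then process the mesh by merging triangles in regions where the shape changes little, and finally delete the corresponding mesh vertices. Building the mesh is complex and storing it consumes considerable resources. These methods are also sensitive to noise: a mesh built from noisy data may be distorted, so a surface reconstructed from the simplified points can differ greatly from one reconstructed from the original points. Methods that work directly on the point data have therefore become mainstream. They establish topological relationships from the spatial positions of the points, compute geometric features for each point from these relationships, and simplify the point cloud based on those features. Because they need not compute or store a complex triangle mesh, they are comparatively efficient. This section therefore focuses on simplification algorithms that work directly on the point data.

Point-based simplification methods mainly include the bounding-box method, clustering-based methods, the normal deviation method, curvature-based simplification, the average point distance method, and uniform grid division.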

Paper

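Geometry Image-Based Simplification of Point Models
A Similarity-Based Simplification Algorithm for Point Models
A Fast Point Cloud Simplification Algorithm Based on Minimum Surface Distance
Selection and Simplification of Large-Scale Point Clouds
A Fuzzy Clustering-Based Simplification Method for Massive Measurement Data
Point Model Simplification Based on Mean-Shift Clustering
Scattered Point Cloud Simplification Based on Local Surface Fitting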

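Point Cloud Keypoints

Common 3D keypoint detection algorithms include ISS3D, Harris3D, NARF, and SIFT3D. All of them are implemented in the PCL library, and NARF is one of the most widely used.

Point Cloud Descriptors

To describe a 3D point cloud, point positions alone are not enough; additional attributes such as normal direction, curvature, and texture features usually have to be computed. Just as with image features, the features of a 3D point cloud need to be described in an analogous way.

Common feature description algorithms include normal and curvature estimation, eigenvalue analysis, PFH, FPFH, SHOT, VFH, CVFH, 3D Shape Context, and Spin Image. PFH is the Point Feature Histogram descriptor; FPFH, the Fast Point Feature Histogram, is a simplified form of PFH.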

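Line and Surface Fitting for Point Clouds

Line fitting: RANSAC, least squares, plane intersection

Curve fitting: Lagrange interpolation, least squares, Bézier curve fitting, B-spline curve fitting (quadratic and cubic B-splines)

Plane fitting: principal component analysis, least squares, gross error detection, robust estimation

Surface fitting: least squares (orthogonal least squares, moving least squares), NURBS, Bézier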
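The sketch below combines two of the plane-fitting options above under stated assumptions: RANSAC (hypotheses from three random points, support counted with a point-to-plane distance threshold) finds the inliers, and ordinary least squares then refines a plane of the form z = a·x + b·y + c on those inliers. The z = a·x + b·y + c parameterization is an assumption that fails for vertical planes; a PCA-based fit would avoid that limitation.

```python
import math
import random

def plane_from_points(p1, p2, p3):
    """Unit normal n and offset d with n·x + d = 0, or None if collinear."""
    u = [p2[i] - p1[i] for i in range(3)]
    v = [p3[i] - p1[i] for i in range(3)]
    n = [u[1] * v[2] - u[2] * v[1], u[2] * v[0] - u[0] * v[2], u[0] * v[1] - u[1] * v[0]]
    norm = math.sqrt(sum(c * c for c in n))
    if norm < 1e-12:
        return None
    n = [c / norm for c in n]
    return n, -sum(n[i] * p1[i] for i in range(3))

def ransac_plane(points, threshold, iterations=200, seed=0):
    """Return the inliers of the plane hypothesis with the most support."""
    rng = random.Random(seed)
    best = []
    for _ in range(iterations):
        model = plane_from_points(*rng.sample(points, 3))
        if model is None:
            continue  # degenerate (collinear) sample
        n, d = model
        inliers = [p for p in points
                   if abs(sum(n[i] * p[i] for i in range(3)) + d) <= threshold]
        if len(inliers) > len(best):
            best = inliers
    return best

def det3(m):
    """Determinant of a 3x3 matrix given as nested lists."""
    return (m[0][0] * (m[1][1] * m[2][2] - m[1][2] * m[2][1])
            - m[0][1] * (m[1][0] * m[2][2] - m[1][2] * m[2][0])
            + m[0][2] * (m[1][0] * m[2][1] - m[1][1] * m[2][0]))

def least_squares_plane(points):
    """Fit z = a*x + b*y + c via the normal equations (Cramer's rule)."""
    sx = sum(p[0] for p in points)
    sy = sum(p[1] for p in points)
    sz = sum(p[2] for p in points)
    sxx = sum(p[0] * p[0] for p in points)
    syy = sum(p[1] * p[1] for p in points)
    sxy = sum(p[0] * p[1] for p in points)
    sxz = sum(p[0] * p[2] for p in points)
    syz = sum(p[1] * p[2] for p in points)
    A = [[sxx, sxy, sx], [sxy, syy, sy], [sx, sy, float(len(points))]]
    rhs = [sxz, syz, sz]
    D = det3(A)
    if abs(D) < 1e-12:
        raise ValueError("degenerate point set")
    solution = []
    for k in range(3):
        Ak = [row[:] for row in A]
        for i in range(3):
            Ak[i][k] = rhs[i]  # replace column k with the right-hand side
        solution.append(det3(Ak) / D)
    return tuple(solution)  # (a, b, c)
```

RANSAC makes the fit robust to gross outliers, while the least-squares step uses every inlier instead of only the three sampled points, which usually gives a smoother estimate.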

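An Automatic Plane Extraction Algorithm for 3D Laser Scanning Data
A New Method for Plane Fitting of Point Clouds
Research on Surface Reconstruction Algorithms for Massive Scattered Points
A Robust Plane Fitting Method for Point Cloud Data
Application of an Iterative Slicing Algorithm to Point Cloud Surface Fitting
Research on Least-Squares-Based Leaf Surface Fitting for Point Clouds
Extraction of Surface Boundary Lines from Point Clouds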

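Point Cloud Volume Computation

Algorithms for computing the volume of a physical object from a 3D point cloud fall roughly into the following four categories.

1. Convex hull method: approximates the irregular body with a convex hull model, then either slices the hull and sums the slices, or splits the hull into upper and lower triangle meshes, computes the volume under each by orthographic projection, and takes the difference. It suits convex models; errors are large for non-convex models.

2. Model reconstruction method: after acquiring the point cloud, builds a model of the object from triangular facets and computes its volume. It is strongly affected by point density, the number of triangles generated, and point accuracy, and holes are common.

3. Slicing method: slices the point cloud along a coordinate axis, computes the area of the upper and lower surfaces of each slice, and sums the slice volumes. Accuracy depends on slice thickness: thinner slices are more accurate but less efficient.

4. Projection method: projects the point cloud onto a plane and triangulates the projection, then builds a pentahedron from each projected triangle and its original points and sums the pentahedron volumes. This method is also prone to holes.

Whether these algorithms first build a model from the point cloud or compute the volume directly by geometric means, the reconstructed model has large errors and the volume is inaccurate when the LiDAR point cloud has uneven density or the objects contain transition zones or transition lines.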
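The projection method reduces to summing truncated-prism (pentahedron) volumes once the projected points are triangulated. A minimal sketch, assuming the triangulation of the XY projection is already available as index triples (computing it, for example with a 2D Delaunay triangulation, is out of scope here) and that the volume is measured above a reference plane z = base_z. Each pentahedron's volume is its projected area times the mean height of its three vertices:

```python
def triangle_area_xy(a, b, c):
    """Area of triangle abc after projection onto the XY plane."""
    return abs((b[0] - a[0]) * (c[1] - a[1]) - (c[0] - a[0]) * (b[1] - a[1])) / 2.0

def projected_volume(points, triangles, base_z=0.0):
    """Sum of pentahedron volumes between the surface triangles and z = base_z.

    `points` are (x, y, z) tuples; `triangles` are index triples assumed to
    come from a triangulation of the points' XY projection.
    """
    total = 0.0
    for i, j, k in triangles:
        a, b, c = points[i], points[j], points[k]
        mean_height = ((a[2] - base_z) + (b[2] - base_z) + (c[2] - base_z)) / 3.0
        total += triangle_area_xy(a, b, c) * mean_height
    return total
```

The result is exact for a piecewise-planar surface over the triangulation; any error comes from how well the triangles approximate the real surface, which is where the density and hole problems mentioned above enter.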

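Point Cloud Recognition & Classification

Classification approaches: point-based classification, segmentation-based classification, supervised classification, and unsupervised classification.

In addition, classification can be based on descriptor vectors or keypoint descriptors.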

3D ShapeNets: A Deep Representation for Volumetric Shapes
PartNet: A Large-scale Benchmark for Fine-grained and Hierarchical Part-level 3D Object Understanding
Revisiting Point Cloud Classification: A New Benchmark Dataset and Classification Model on Real-World Data
[ICCV2017] Escape from Cells: Deep Kd-Networks for the Recognition of 3D Point Cloud Models.
[ICCV2017] Colored Point Cloud Registration Revisited.
[ICRA2017] SegMatch: Segment based place recognition in 3D point clouds.
[IROS2017] 3D object classification with point convolution network.
[CVPR2018] Pointwise Convolutional Neural Networks.
[CVPR2018] SO-Net: Self-Organizing Network for Point Cloud Analysis.
[CVPR2018] PointNetVLAD: Deep Point Cloud Based Retrieval for Large-Scale Place Recognition.
[CVPR2018] PointGrid: A Deep Network for 3D Shape Understanding.
[CVPR2019] Spherical Fractal Convolutional Neural Networks for Point Cloud Recognition.
[MM] MMJN: Multi-Modal Joint Networks for 3D Shape Recognition.

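Point Cloud Matching, Registration & Alignment

Point cloud registration is analogous to registration of 2D images, except that 2D image registration yields affine transformation parameters such as x, y, alpha, and beta, whereas 3D point cloud registration models the motion and alignment of a point cloud and yields a rotation matrix and a translation vector. These are usually combined into a 4×4 homogeneous transformation matrix whose upper-left 3×3 block is the rotation matrix and whose 3×1 last column is the translation vector. Strictly speaking there are only 6 free parameters, because the rotation matrix can be converted into a 3×1 rotation vector with the Rodrigues formula.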
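To make the 6-parameter representation concrete, here is a minimal sketch of Rodrigues' formula, R = I + sin θ·K + (1 - cos θ)·K², where θ is the length of the rotation vector and K is the skew-symmetric matrix of its unit axis, together with packing R and t into a 4×4 homogeneous matrix. Plain nested lists are used instead of a linear-algebra library, an assumption made only to keep the example self-contained:

```python
import math

def rodrigues(rvec):
    """Rotation vector (unit axis * angle in radians) -> 3x3 rotation matrix."""
    theta = math.sqrt(sum(c * c for c in rvec))
    if theta < 1e-12:
        return [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]]
    kx, ky, kz = (c / theta for c in rvec)
    K = [[0.0, -kz, ky], [kz, 0.0, -kx], [-ky, kx, 0.0]]  # skew matrix of the axis
    K2 = [[sum(K[i][m] * K[m][j] for m in range(3)) for j in range(3)] for i in range(3)]
    s, c = math.sin(theta), math.cos(theta)
    return [[(1.0 if i == j else 0.0) + s * K[i][j] + (1.0 - c) * K2[i][j]
             for j in range(3)] for i in range(3)]

def homogeneous(R, t):
    """Pack rotation R (3x3) and translation t (length 3) into a 4x4 matrix."""
    return [R[0] + [t[0]], R[1] + [t[1]], R[2] + [t[2]], [0.0, 0.0, 0.0, 1.0]]

def transform(T, p):
    """Apply the 4x4 transform T to a 3D point p (homogeneous coordinate w = 1)."""
    x = list(p) + [1.0]
    return tuple(sum(T[i][j] * x[j] for j in range(4)) for i in range(3))
```

For example, a rotation vector of (0, 0, π/2) rotates (1, 0, 0) to (0, 1, 0); with t = (1, 2, 3) the transformed point is approximately (1, 3, 3).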

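The two most commonly used point cloud registration algorithms are the Normal Distributions Transform (NDT) and the well-known Iterative Closest Point (ICP) algorithm. Many other algorithms exist, listed below:

ICP: robust ICP, point-to-plane ICP, point-to-line ICP, MbICP, GICP

NDT 3D, Multi-Layer NDT

FPCS, KFPCS, SAC-IA

Line Segment Matching, ICL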
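A minimal point-to-point ICP sketch, reduced to 2D so that the best rigid transform for a set of paired points has a closed form (θ = atan2(Σ a×b, Σ a·b) on centred coordinates) instead of the SVD used in 3D. Correspondences come from brute-force nearest neighbours, and the iteration count and tolerance are assumed values. Like any ICP, it only converges to the right answer from a reasonable initial alignment:

```python
import math

def nearest(p, pts):
    """Brute-force nearest neighbour of p in pts."""
    return min(pts, key=lambda q: (q[0] - p[0]) ** 2 + (q[1] - p[1]) ** 2)

def best_rigid_2d(src, dst):
    """Least-squares rotation angle and translation mapping src[i] onto dst[i]."""
    n = len(src)
    sx = sum(p[0] for p in src) / n
    sy = sum(p[1] for p in src) / n
    dx = sum(q[0] for q in dst) / n
    dy = sum(q[1] for q in dst) / n
    dot = cross = 0.0
    for (px, py), (qx, qy) in zip(src, dst):
        ax, ay = px - sx, py - sy  # centred source point
        bx, by = qx - dx, qy - dy  # centred target point
        dot += ax * bx + ay * by
        cross += ax * by - ay * bx
    theta = math.atan2(cross, dot)
    c, s = math.cos(theta), math.sin(theta)
    # the translation moves the rotated source centroid onto the target centroid
    return theta, dx - (c * sx - s * sy), dy - (s * sx + c * sy)

def icp_2d(src, dst, iterations=30, tol=1e-10):
    """Align src to dst; returns (theta, tx, ty, aligned_points)."""
    theta, tx, ty = 0.0, 0.0, 0.0
    cur = list(src)
    for _ in range(iterations):
        pairs = [nearest(p, dst) for p in cur]     # data association
        dth, dtx, dty = best_rigid_2d(cur, pairs)  # incremental update
        c, s = math.cos(dth), math.sin(dth)
        cur = [(c * x - s * y + dtx, s * x + c * y + dty) for x, y in cur]
        # compose with the accumulated transform: R_d (R p + t) + t_d
        theta += dth
        tx, ty = c * tx - s * ty + dtx, s * tx + c * ty + dty
        if max(abs(dth), abs(dtx), abs(dty)) < tol:
            break
    return theta, tx, ty, cur
```

The point-to-plane, GICP, and NDT variants listed above change the error metric or the data association, but keep this same alternate-associate-then-solve loop.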

An ICP variant using a point-to-line metric
Generalized-ICP
Linear Least-Squares Optimization for Point-to-Plane ICP Surface Registration
Metric-Based Iterative Closest Point Scan Matching for Sensor Displacement Estimation
NICP: Dense Normal Based Point Cloud Registration
[CVPR2017] Efficient Global Point Cloud Alignment using Bayesian Nonparametric Mixtures.
[CVPR2017] 3DMatch: Learning Local Geometric Descriptors from RGB-D Reconstructions.
[CVPR2018] Density Adaptive Point Set Registration.
[CVPR2018] Inverse Composition Discriminative Optimization for Point Cloud Registration.
[CVPR2018] PPFNet: Global Context Aware Local Features for Robust 3D Point Matching.
[ECCV2018] Learning and Matching Multi-View Descriptors for Registration of Point Clouds.
[ECCV2018] 3DFeat-Net: Weakly Supervised Local 3D Features for Point Cloud Registration.
[ECCV2018] Efficient Global Point Cloud Registration by Matching Rotation Invariant Features Through Translation Search.
[IROS2018] Robust Generalized Point Cloud Registration with Expectation Maximization Considering Anisotropic Positional Uncertainties.
[CVPR2019] PointNetLK: Point Cloud Registration using PointNet.
[CVPR2019] SDRSAC: Semidefinite-Based Randomized Approach for Robust Point Cloud Registration without Correspondences.
[CVPR2019] The Perfect Match: 3D Point Cloud Matching with Smoothed Densities.
[CVPR2019] FilterReg: Robust and Efficient Probabilistic Point-Set Registration using Gaussian Filter and Twist Parameterization.
[CVPR2019] 3D Local Features for Direct Pairwise Registration.
[ICCV2019] DeepICP: An End-to-End Deep Neural Network for 3D Point Cloud Registration.
[ICCV2019] Deep Closest Point: Learning Representations for Point Cloud Registration.
[ICRA2019] 2D3D-MatchNet: Learning to Match Keypoints across 2D Image and 3D Point Cloud.
[ICCV2019] Robust Variational Bayesian Point Set Registration.
[ICRA2019] Robust low-overlap 3-D point cloud registration for outlier rejection.
[CVPR2020] Learning Multiview 3D Point Cloud Registration.

Point Cloud Matching Quality Assessment

[IROS2017] Analyzing the quality of matched 3D point clouds of objects.

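Point Cloud Segmentation

Point cloud segmentation is a large topic in its own right. The extra dimension brings many problems that 2D images do not have, and point cloud segmentation spans region extraction, line and plane extraction, semantic segmentation, clustering, and more. The field is too broad to summarize in a few words, so specific problems are best classified as they arise. Generally speaking, point cloud segmentation is the foundation of object recognition.

The main segmentation approaches are edge-based region segmentation, surface-based region segmentation, clustering-based region segmentation, hybrid region segmentation, and deep learning methods.

Segmentation algorithms: region growing, RANSAC line and plane extraction, NDT-RANSAC, K-Means, spectral clustering, Normalized Cut, 3D Hough Transform (line and plane extraction), connected component analysis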
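Connected component analysis, often called Euclidean clustering for point clouds, can be sketched as a breadth-first search in which two points are neighbours when they are closer than a radius. This is an O(n²) illustration; a practical implementation would use a k-d tree for neighbour queries, and `radius` and `min_size` are assumed parameters:

```python
from collections import deque

def euclidean_clusters(points, radius, min_size=1):
    """Connected components of the graph linking points closer than `radius`.

    Returns a list of clusters, each a sorted list of point indices;
    clusters smaller than `min_size` are discarded.
    """
    r2 = radius * radius
    n = len(points)
    visited = [False] * n
    clusters = []
    for seed in range(n):
        if visited[seed]:
            continue
        visited[seed] = True
        queue, component = deque([seed]), []
        while queue:  # breadth-first expansion from the seed point
            i = queue.popleft()
            component.append(i)
            for j in range(n):
                if not visited[j] and sum((points[i][k] - points[j][k]) ** 2 for k in range(3)) <= r2:
                    visited[j] = True
                    queue.append(j)
        if len(component) >= min_size:
            clusters.append(sorted(component))
    return clusters
```

Points that are far from each other can still end up in one cluster through a chain of close neighbours, which is why the radius must be smaller than the gap between separate objects.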

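Research on Scattered Point Cloud Segmentation Based on Local Surface Convexity
A Survey of 3D Scattered Point Cloud Segmentation Techniques
Research on Clustering-Based Point Cloud Segmentation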
SceneEncoder: Scene-Aware Semantic Segmentation of Point Clouds with A Learnable Scene Descriptor
From Planes to Corners: Multi-Purpose Primitive Detection in Unorganized 3D Point Clouds
Learning and Memorizing Representative Prototypes for 3D Point Cloud Semantic and Instance Segmentation
JSNet: Joint Instance and Semantic Segmentation of 3D Point Clouds
PointNet: Deep Learning on Point Sets for 3D Classification and Segmentation
PointNet++: Deep Hierarchical Feature Learning on Point Sets in a Metric Space
[CVPR2017] SyncSpecCNN: Synchronized Spectral CNN for 3D Shape Segmentation.
[ICRA2017] SegMatch: Segment based place recognition in 3D point clouds.
[3DV2017] SEGCloud: Semantic Segmentation of 3D Point Clouds.
[CVPR2018] Recurrent Slice Networks for 3D Segmentation of Point Clouds.
[CVPR2018] SGPN: Similarity Group Proposal Network for 3D Point Cloud Instance Segmentation.
[CVPR2018] Large-scale Point Cloud Semantic Segmentation with Superpoint Graphs.
[ECCV2018] 3D Recurrent Neural Networks with Context Fusion for Point Cloud Semantic Segmentation.
[CVPR2019] JSIS3D: Joint Semantic-Instance Segmentation of 3D Point Clouds with Multi-Task Pointwise Networks and Multi-Value Conditional Random Fields.
[CVPR2019] PartNet: A Recursive Part Decomposition Network for Fine-grained and Hierarchical Shape Segmentation.
[ICCV2019] 3D Instance Segmentation via Multi-Task Metric Learning.
[IROS2019] PASS3D: Precise and Accelerated Semantic Segmentation for 3D Point Cloud.

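Point Cloud Object Retrieval

This is a more application-oriented problem in point cloud processing. In short, the Hausdorff distance is often used for object recognition and retrieval on depth maps, and many 3D face recognition systems are built on this technique.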
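For reference, the symmetric Hausdorff distance between point sets A and B is H(A, B) = max(h(A, B), h(B, A)), where the directed distance h(A, B) is the largest distance from a point of A to its nearest point in B. A brute-force sketch (quadratic in the number of points, so suitable for illustration only; `math.dist` requires Python 3.8 or later):

```python
import math

def directed_hausdorff(a, b):
    """h(A, B): the largest nearest-neighbour distance from A to B."""
    return max(min(math.dist(p, q) for q in b) for p in a)

def hausdorff(a, b):
    """Symmetric Hausdorff distance H(A, B) = max(h(A, B), h(B, A))."""
    return max(directed_hausdorff(a, b), directed_hausdorff(b, a))
```

The directed distance is asymmetric: if A is a subset of B, h(A, B) is zero while h(B, A) can be large, which is why retrieval usually uses the symmetric form.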

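Point Cloud 3D Reconstruction

The point clouds we acquire consist of isolated points. Recovering a complete surface from these isolated points is the subject of 3D reconstruction.

When experimenting with KinectFusion, you may notice the surface gradually becoming smoother; this is the effect of the reconstruction algorithm iterating continuously. Acquired point clouds are full of noise and outliers, so reconstruction algorithms must deal with this noise to produce a visually clean surface.

Common 3D reconstruction algorithms and techniques include:

Poisson reconstruction, Delaunay triangulation

Surface reconstruction, human body reconstruction, building reconstruction, input reconstruction

Real-time reconstruction: reconstructing paper cups or the 4D growth of crops, human pose recognition, facial expression recognition

An Improved 3D Reconstruction Algorithm for Point Cloud Data
[CVPR2017] Scalable Surface Reconstruction from Point Clouds with Extreme Scale and Density Diversity.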
[ICCV2017] PolyFit: Polygonal Surface Reconstruction from Point Clouds.
[ICCV2017] From Point Clouds to Mesh using Regression.
[ECCV2018] Efficient Dense Point Cloud Object Reconstruction using Deformation Vector Fields.
[ECCV2018] HGMR: Hierarchical Gaussian Mixtures for Adaptive 3D Registration.
[AAAI2018] Learning Efficient Point Cloud Generation for Dense 3D Object Reconstruction.
[CVPR2019] Robust Point Cloud Based Reconstruction of Large-Scale Outdoor Scenes.
[AAAI2019] CAPNet: Continuous Approximation Projection For 3D Point Cloud Reconstruction Using 2D Supervision.
[MM] L2G Auto-encoder: Understanding Point Clouds by Local-to-Global Reconstruction with Hierarchical Self-Attention.
SurfNet: Generating 3D shape surfaces using deep residual networks

Other Point Cloud Topics

[CVPR2018] Reflection Removal for Large-Scale 3D Point Clouds.
[ICML2018] Learning Representations and Generative Models for 3D Point Clouds.
[3DV] PCN: Point Completion Network.
[CVPR2019] PartNet: A Large-scale Benchmark for Fine-grained and Hierarchical Part-level 3D Object Understanding.
[CVPR2019] ClusterNet: Deep Hierarchical Cluster Network with Rigorously Rotation-Invariant Representation for Point Cloud Analysis.
[ICCV2019] LPD-Net: 3D Point Cloud Learning for Large-Scale Place Recognition and Environment Analysis.
[ICRA2019] Speeding up Iterative Closest Point Using Stochastic Gradient Descent.

Point Cloud Datasets

[KITTI] The KITTI Vision Benchmark Suite.
[ModelNet] The Princeton ModelNet.
[ShapeNet] A collaborative dataset between researchers at Princeton, Stanford and TTIC.
[PartNet] The PartNet dataset provides fine grained part annotation of objects in ShapeNetCore.
[PartNet] PartNet benchmark from Nanjing University and National University of Defense Technology.
[S3DIS] The Stanford Large-Scale 3D Indoor Spaces Dataset.
[ScanNet] Richly-annotated 3D Reconstructions of Indoor Scenes.
[Stanford 3D] The Stanford 3D Scanning Repository.
[UWA Dataset]
[Princeton Shape Benchmark] The Princeton Shape Benchmark.
[SYDNEY URBAN OBJECTS DATASET] This dataset contains a variety of common urban road objects scanned with a Velodyne HDL-64E LIDAR, collected in the CBD of Sydney, Australia. There are 631 individual scans of objects across classes of vehicles, pedestrians, signs and trees.
[ASL Datasets Repository(ETH)] This site is dedicated to provide datasets for the Robotics community with the aim to facilitate result evaluations and comparisons.
[Large-Scale Point Cloud Classification Benchmark(ETH)] This benchmark closes the gap and provides a large labelled 3D point cloud data set of natural scenes with over 4 billion points in total.
[Robotic 3D Scan Repository] The Canadian Planetary Emulation Terrain 3D Mapping Dataset is a collection of three-dimensional laser scans gathered at two unique planetary analogue rover test facilities in Canada.
[Radish] The Robotics Data Set Repository (Radish for short) provides a collection of standard robotics data sets.
[IQmulus & TerraMobilita Contest] The database contains 3D MLS data from a dense urban environment in Paris (France), composed of 300 million points. The acquisition was made in January 2013.
[Oakland 3-D Point Cloud Dataset] This repository contains labeled 3-D point cloud laser data collected from a moving platform in an urban environment.
[Robotic 3D Scan Repository] This repository provides 3D point clouds from robotic experiments, log files of robot runs and standard 3D data sets for the robotics community.
[Ford Campus Vision and Lidar Data Set] The dataset is collected by an autonomous ground vehicle testbed, based upon a modified Ford F-250 pickup truck.
[The Stanford Track Collection] This dataset contains about 14,000 labeled tracks of objects as observed in natural street scenes by a Velodyne HDL-64E S2 LIDAR.
[PASCAL3D+] Beyond PASCAL: A Benchmark for 3D Object Detection in the Wild.
[3D MNIST] The aim of this dataset is to provide a simple way to get started with 3D computer vision problems such as 3D shape recognition.
[WAD] [ApolloScape] The datasets are provided by Baidu Inc.
[nuScenes] The nuScenes dataset is a large-scale autonomous driving dataset.
[PreSIL] Depth information, semantic segmentation (images), point-wise segmentation (point clouds), ground point labels (point clouds), and detailed annotations for all vehicles and people.
[3D Match] Keypoint Matching Benchmark, Geometric Registration Benchmark, RGB-D Reconstruction Datasets.
[BLVD] (a) 3D detection, (b) 4D tracking, (c) 5D interactive event recognition and (d) 5D intention prediction. [ICRA 2019 paper]
[PedX] 3D Pose Estimation of Pedestrians, more than 5,000 pairs of high-resolution (12MP) stereo images and LiDAR data along with providing 2D and 3D labels of pedestrians. [ICRA 2019 paper]
[H3D] Full-surround 3D multi-object detection and tracking dataset. [ICRA 2019 paper]
[Matterport3D] RGB-D: 10,800 panoramic views from 194,400 RGB-D images. Annotations: surface reconstructions, camera poses, and 2D and 3D semantic segmentations. Keypoint matching, view overlap prediction, normal prediction from color, semantic segmentation, and scene classification. [3DV 2017 paper]
[SynthCity] SynthCity is a 367.9M point synthetic full colour Mobile Laser Scanning point cloud. Nine categories.
[Lyft Level 5] Include high quality, human-labelled 3D bounding boxes of traffic agents, an underlying HD spatial semantic map.
[SemanticKITTI] Sequential Semantic Segmentation, 28 classes, for autonomous driving. All sequences of KITTI odometry labeled. [ICCV 2019 paper]
[NPM3D] The Paris-Lille-3D has been produced by a Mobile Laser System (MLS) in two different cities in France (Paris and Lille).
[The Waymo Open Dataset] The Waymo Open Dataset is comprised of high resolution sensor data collected by Waymo self-driving cars in a wide variety of conditions.
[A*3D] An Autonomous Driving Dataset in Challenging Environments.
[PointDA-10 Dataset] Domain Adaptation for point clouds.
[Oxford Robotcar] The dataset captures many different combinations of weather, traffic and pedestrians.

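3D Reconstruction

Resource list: https://github.com/openMVG/awesome_3DReconstruction_list

Monocular Images

Methods fall mainly into SfM-based 3D reconstruction and deep-learning-based 3D reconstruction. SfM is covered in detail in the SFM section below; deep-learning approaches mainly generate depth maps from RGB images.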

Paper

Unsupervised Monocular Depth Estimation with Left-Right Consistency
Unsupervised Learning of Depth and Ego-Motion from Video
Deep Ordinal Regression Network for Monocular Depth Estimation
Depth from Videos in the Wild
Attention-based Context Aggregation Network for Monocular Depth Estimation
Depth Map Prediction from a Single Image using a Multi-Scale Deep Network (NIPS 2014)
Predicting depth, surface normals and semantic labels with a common multi-scale convolutional architecture (ICCV 2015)
Deeper Depth Prediction with Fully Convolutional Residual Networks
Multi-Scale Continuous CRFs as Sequential Deep Networks for Monocular Depth Estimation(CVPR2017)
Single View Stereo Matching

Project with code

Project	Venue	Framework
3D-R2N2: A unified approach for single and multi-view 3D object reconstruction	ECCV 2016	Theano
Learning a predictable and generative vector representation for objects	ECCV 2016	Caffe
Learning a probabilistic latent space of object shapes via 3d generative-adversarial modeling	NIPS 2016	Torch 7
Perspective transformer nets: Learning single-view 3d object reconstruction without 3d supervision	NIPS 2016	Torch 7
Deep disentangled representations for volumetric reconstruction	ECCV 2016
Multi-view 3D Models from Single Images with a Convolutional Network	ECCV 2016	Tensorflow
Single Image 3D Interpreter Network	ECCV 2016	Torch 7
Weakly-Supervised Generative Adversarial Networks for 3D Reconstruction	3DV 2017	Theano
Hierarchical Surface Prediction for 3D Object Reconstruction	3DV 2017	Torch 7
Octree generating networks: Efficient convolutional architectures for high-resolution 3d outputs	ICCV 2017	Caffe
Multi-view Supervision for Single-view Reconstruction via Differentiable Ray Consistency	CVPR 2017	Torch 7
SurfNet: Generating 3D shape surfaces using deep residual networks	CVPR 2017	Matlab
A Point Set Generation Network for 3D Object Reconstruction from a Single Image	CVPR 2017	Tensorflow
O-CNN: Octree-based Convolutional Neural Networks for 3D Shape Analysis	SIGGRAPH 2017	Caffe
Rethinking Reprojection: Closing the Loop for Pose-aware Shape Reconstruction from a Single Image	ICCV 2017
Scaling CNNs for High Resolution Volumetric Reconstruction From a Single Image	ICCV 2017
Large-Scale 3D Shape Reconstruction and Segmentation from ShapeNet Core55	ICCV 2017
Learning a Hierarchical Latent-Variable Model of 3D Shapes	3DV 2018	Tensorflow
Learning Efficient Point Cloud Generation for Dense 3D Object Reconstruction	AAAI 2018	Tensorflow
DeformNet: Free-Form Deformation Network for 3D Shape Reconstruction from a Single Image	ACCV 2018	Tensorflow
Image2Mesh: A Learning Framework for Single Image 3D Reconstruction	ACCV 2018	Pytorch
Neural 3D Mesh Renderer	CVPR 2018	Chainer
Multi-view Consistency as Supervisory Signal for Learning Shape and Pose Prediction	CVPR 2018	Torch 7
Matryoshka Networks: Predicting 3D Geometry via Nested Shape Layers	CVPR 2018	Pytorch
AtlasNet: A Papier-Mâché Approach to Learning 3D Surface Generation	CVPR 2018	Pytorch
Pixel2Mesh: Generating 3D Mesh Models from Single RGB Images	ECCV 2018	Tensorflow
Multiresolution Tree Networks for 3D Point Cloud Processing	ECCV 2018	Pytorch
Adaptive O-CNN: A Patch-based Deep Representation of 3D Shapes	SIGGRAPH Asia 2018
Learning Implicit Fields for Generative Shape Modeling	CVPR 2019	Tensorflow
Occupancy Networks: Learning 3D Reconstruction in Function Space	CVPR 2019	Pytorch
DeepSDF: Learning Continuous Signed Distance Functions for Shape Representation	CVPR 2019	Pytorch

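Structured Light

Structured-light projection 3D imaging is currently the main approach to 3D visual perception for robots. A structured-light imaging system consists of one or more projectors and cameras; typical configurations include single projector + single camera, single projector + dual cameras, single projector + multiple cameras, single camera + dual projectors, and single camera + multiple projectors.

The basic working principle of structured-light 3D imaging is as follows: the projector casts a specific structured-light pattern onto the target, the camera captures the image as modulated by the target, and image processing together with a vision model recovers the target's 3D information. Common projector types include liquid crystal display (LCD) projectors, digital light processing (DLP) projectors [e.g. based on a digital micromirror device (DMD)], and direct laser or LED pattern projection. By the number of projections, structured-light 3D imaging can be divided into single-shot and multi-shot methods.

By scanning mode, it can also be divided into line-scan structured light and area structured light.

Reference link: https://zhuanlan.zhihu.com/p/29971801

Structured-Light 3D Surface Imaging: A Survey (Part 1)

Structured-Light 3D Surface Imaging: A Survey (Part 2)

Structured-Light 3D Surface Imaging: A Survey (Part 3)

Surveys

Structured-light 3D surface imaging: a tutorial

A Survey of 3D Imaging Techniques for Robot Vision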

Real-time structured light profilometry: a review

A state of the art in structured light patterns for surface profilometry

Phase shifting algorithms for fringe projection profilometry: a review

Overview of the 3D profilometry of phase shifting fringe projection

Temporal phase unwrapping algorithms for fringe projection profilometry: a comparative review

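Lectures & Videos

Calibration

Single-Shot Projection Imaging

Single-shot structured light is mainly implemented with spatial-multiplexing and frequency-multiplexing codes. Common coding schemes include color coding, grayscale indexing, geometric shape coding, and random speckle. In robotic hand-eye applications where 3D accuracy requirements are modest, such as palletizing, depalletizing, and 3D grasping, projecting pseudo-random speckle to obtain 3D information is currently popular.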

One-shot pattern projection for dense and accurate 3D acquisition in structured light
A single-shot structured light means by encoding both color and geometrical features
Dynamic 3D surface profilometry using a novel colour pattern encoded with a multiple triangular mode
Review of single-shot 3D shape measurement by phase calculation-based fringe projection techniques
Robust pattern decoding in shape-coded structured light

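Multi-Shot Projection Imaging

Multi-shot 3D methods mainly use time-multiplexed coding. Common pattern codes include binary coding, multi-frequency phase shifting, and hybrid coding (for example Gray code + phase-shifted fringes).

However, the Gray code method can only divide the projection space discretely, and its spatial resolution is limited by the imaging devices. Raising the resolution requires projecting more patterns with narrower Gray-code stripes, but stripes that are too narrow suffer from edge effects that cause decoding errors.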
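A minimal sketch of Gray-code column encoding and per-pixel decoding, assuming each camera pixel's observations have already been binarized (for example by comparing against all-white and all-black reference images, not shown). Gray code is used because adjacent columns differ in exactly one bit, so a misread stripe edge shifts the decoded column by at most one:

```python
def binary_to_gray(n):
    """Reflected binary Gray code of a non-negative integer."""
    return n ^ (n >> 1)

def gray_to_binary(g):
    """Inverse of binary_to_gray: XOR together all right shifts of g."""
    n = 0
    while g:
        n ^= g
        g >>= 1
    return n

def gray_code_patterns(width, bits):
    """`bits` stripe patterns, most significant bit first.
    patterns[k][x] == 1 means projector column x is lit in pattern k."""
    return [[(binary_to_gray(x) >> (bits - 1 - k)) & 1 for x in range(width)]
            for k in range(bits)]

def decode_pixel(observed_bits):
    """Rebuild the projector column from one pixel's binarized observations."""
    g = 0
    for bit in observed_bits:
        g = (g << 1) | bit
    return gray_to_binary(g)
```

With `bits` patterns the projector is divided into 2**bits columns, which is the discrete resolution limit described above; each extra pattern halves the stripe width.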

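Sinusoidal fringe projection overcomes the discrete spatial division of Gray codes and has become one of the most widely used types of structured light. For complex shapes with holes, steps, or occlusions, however, single-frequency phase-shifting fringe projection faces the phase-unwrapping problem. In addition, recovering absolute phase from a fringe sequence requires embedding markers in the fringes, such as a point or a line, as a phase reference; these markers may fall into occluded or shadowed regions or be corrupted by ambient light, and when they are lost the measurement becomes less accurate. Multi-frequency phase shifting is therefore commonly used for objects with complex profiles.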
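The phase computation behind multi-frequency phase shifting can be sketched as follows, assuming the N-step model I_n = A + B·cos(φ + 2πn/N) with N ≥ 3, which gives φ = atan2(-Σ I_n sin δ_n, Σ I_n cos δ_n) with δ_n = 2πn/N. The unwrapping step is a simple two-frequency temporal scheme that uses an unambiguous single-period phase to pick the fringe order of a high-frequency phase; the frequencies and the synthetic intensities from the hypothetical `synth` helper are assumptions for illustration:

```python
import math

def wrapped_phase(intensities):
    """N-step phase shifting (N >= 3) for I_n = A + B*cos(phi + 2*pi*n/N).
    Returns the wrapped phase in [0, 2*pi)."""
    n_steps = len(intensities)
    s = sum(I * math.sin(2 * math.pi * n / n_steps) for n, I in enumerate(intensities))
    c = sum(I * math.cos(2 * math.pi * n / n_steps) for n, I in enumerate(intensities))
    return math.atan2(-s, c) % (2 * math.pi)

def unwrap_with_reference(phi_high, phi_low, freq_ratio):
    """Two-frequency temporal unwrapping. phi_low must be unambiguous
    (one fringe period across the field); freq_ratio = f_high / f_low."""
    k = round((freq_ratio * phi_low - phi_high) / (2 * math.pi))  # fringe order
    return phi_high + 2 * math.pi * k

def synth(phi, n_steps=4, a=0.5, b=0.4):
    """Hypothetical test helper: ideal N-step intensities for one pixel."""
    return [a + b * math.cos(phi + 2 * math.pi * n / n_steps) for n in range(n_steps)]
```

The offset A and modulation B cancel out of the atan2 ratio, which is why phase shifting tolerates uneven surface reflectance; the unwrapping step fails when the low-frequency phase error, scaled by `freq_ratio`, exceeds half a fringe period, which limits how large the frequency ratio can be in practice.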

Research on Gray-Code and Phase-Shift Light Coding for 3D Reconstruction
Pattern codification strategies in structured light systems
Binary coded linear fringes for three-dimensional shape profiling
3D shape measurement based on complementary Gray-code light
Phase shifting algorithms for fringe projection profilometry: a review
Overview of the 3D profilometry of phase shifting fringe projection
Temporal phase unwrapping algorithms for fringe projection profilometry: a comparative review
A multi-frequency inverse-phase error compensation method for projector nonlinear in 3D shape measurement

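Deflectometry Imaging

For rough surfaces, structured light can be projected directly onto the object for visual measurement. For highly reflective smooth surfaces and specular objects, however, structured light cannot be projected directly onto the surface under test, and 3D measurement must rely on specular deflectometry.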

Principles of shape from specular reflection
Deflectometry: 3D-metrology from nanometer to meter
Three-dimensional shape measurement of a highly reflected specular surface with structured light method
Three-dimensional shape measurements of specular objects using phase-measuring deflectometry

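Because single-shot projection has short exposure and measurement times and good vibration resistance, it suits 3D measurement of moving objects, such as real-time robot motion guidance and hand-eye robots grasping products that move continuously on a production line. However, spatial resolution in the depth (vertical) direction is limited by the field of view, lens magnification, camera pixel count, and similar factors, and is hard to improve for large fields of view.

Multi-shot methods (such as multi-frequency fringe methods) offer high spatial resolution and handle abrupt surface slope changes and holes effectively. Their drawbacks are: ① for continuous phase-shifting projection, 3D reconstruction accuracy is susceptible to projector and camera nonlinearity and to environmental changes; ② vibration resistance is poor, so they are unsuitable for measuring continuously moving objects; ③ in Eye-in-Hand vision guidance systems, the robot arm cannot easily perform 3D imaging and guidance while moving continuously; ④ real-time performance is poor, although real-time multi-shot 3D imaging keeps improving as projector frame rates and CCD/CMOS acquisition speeds increase.

For complex surfaces, deflectometry usually relies on multi-shot projection and therefore shares its drawbacks. Deflectometry also has difficulty with surfaces whose curvature varies strongly: the reflected ray's angle changes at twice the rate of the surface normal's angle, so the method is sensitive to curvature variation and occlusion occurs easily.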

Other Papers

Code

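Scanning 3D Imaging

Scanning 3D imaging methods include scanning ranging, active triangulation, and chromatic confocal methods. Their greatest strength is high measurement accuracy, and the chromatic confocal method in particular is well suited to transparent, highly reflective, and smooth surfaces, which other methods struggle with. Their drawbacks are low speed and efficiency: mounted at the end of a robot arm they enable high-accuracy 3D measurement but are unsuitable for real-time 3D guidance and positioning, so their applications are limited. Moreover, active triangulation scanning is prone to occlusion when measuring complex structures, which has to be addressed by careful planning of the end-effector path and pose.

Scanning Ranging

Scanning ranging scans the entire target surface with a collimated beam performing 1D range measurements. It mainly includes single-point time of flight, laser scattering interferometry, and confocal methods.

Among single-point ranging scan methods, single-point time of flight suits long-range scanning but has low accuracy, generally at the millimeter level. Other single-point scanning methods include single-point laser interferometry, confocal methods, and single-point active laser triangulation; these are more accurate, but interferometry places high demands on the environment. Line scanning offers moderate accuracy and high efficiency. The methods best suited to 3D measurement at the end of a robot arm are active laser triangulation and chromatic confocal sensing.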

Paper

Active optical range imaging sensor
Active and passive range sensing for robotics

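Active Triangulation

Active triangulation is based on the triangulation principle: a collimated beam, or one or more light planes, scans the target surface to achieve 3D measurement. The beams are commonly produced by laser collimation, beam expansion with cylindrical or quadric-cylinder optics, incoherent light (such as white light or LED sources) projected through pinholes or slits (gratings), or diffraction of coherent light. Active triangulation comes in three types: single-point scanning, single-line scanning, and multi-line scanning.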

Paper

Review of different 3D scanners and scanning techniques
3D metrology using a collaborative robot with a laser triangulation sensor
Introductory review on Flying Triangulation: a motion-robust optical 3D measurement principle
Flying triangulation: an optical 3D sensor for the motion-robust acquisition of complex objects
Hand-Guided 3D Surface Acquisition by Combining Simple Light Sectioning with Real-Time Algorithms

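Chromatic Confocal Method

Chromatic confocal sensing can scan and measure rough and smooth, opaque and transparent objects, such as mirror surfaces and transparent glass, and is currently popular in applications such as 3D inspection of phone cover glass. There are three types of chromatic confocal scanning: single-point 1D absolute ranging scans, multi-point array scans, and continuous line scans.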

Paper

Spectral characteristics of chromatic confocal imaging systems
Spectrally multiplexed chromatic confocal multipoint sensing
Chromatic confocal matrix sensor with actuated pinhole arrays
Multiplex acquisition approach for high speed 3d measurements with a chromatic confocal microscope
Fast 3D in line-sensor for specular and diffuse surfaces combining the chromatic confocal and triangulation principle
Single-shot depth-section imaging through chromatic slit-scan confocal microscopy
Three-dimensional surface profile measurement using a beam scanning chromatic confocal microscope

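Stereo Vision 3D Imaging

Literally, stereo vision means perceiving 3D structure with one or two eyes; in general it refers to reconstructing an object's 3D structure or depth from two or more images taken from different viewpoints. Depth-perception cues can be divided into monocular cues and binocular cues (binocular disparity). Stereo 3D imaging can currently be achieved with monocular vision, binocular vision, multi-view vision, and light field 3D imaging (electronic compound eyes or camera arrays).

Books

Robot Vision (机器视觉)

Tutorials

Surveys

Monocular Vision Imaging

Monocular depth cues usually include perspective, focus differences, multi-view imaging, occlusion, shading, and motion parallax.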

Depth map extracting based on geometric perspective: an applicable 2D to 3D conversion technology
Focus cues affect perceived depth
3D image acquisition system based on shape from focus technique
Multi-view stereo: a tutorial
3D reconstruction from multiple images: part 1 principles
Three-dimensional reconstruction of hybrid surfaces using perspective shape from shading
Numerical methods for shape-from-shading: a new survey with benchmarks
The neural basis of depth perception from motion parallax
Motion parallax in stereo 3D
3D image sensor based on parallax motion

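Binocular Vision

In machine vision, two cameras capture images of the same scene from two viewpoints, and the disparity of corresponding points between the two images yields the scene's 3D depth. A typical binocular stereo pipeline has four steps: image distortion correction, stereo rectification, image matching, and triangulation to reproject the disparity map into depth.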
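Once the images are rectified and a disparity d (in pixels) is found, triangulation reduces to Z = f·B / d, with the focal length f in pixels and the baseline B in metres. A first-order error estimate, δZ ≈ Z²/(f·B)·δd, also shows why depth accuracy degrades quadratically with distance and improves with a longer baseline. A minimal sketch with assumed example values:

```python
def disparity_to_depth(disparity_px, focal_px, baseline_m):
    """Depth Z = f * B / d for a rectified stereo pair.
    Returns None for non-positive disparity (no valid match)."""
    if disparity_px <= 0:
        return None
    return focal_px * baseline_m / disparity_px

def depth_error(depth_m, focal_px, baseline_m, disparity_err_px=1.0):
    """First-order depth uncertainty: dZ ~= Z**2 / (f * B) * dd."""
    return depth_m ** 2 / (focal_px * baseline_m) * disparity_err_px
```

For example, with f = 700 px, B = 0.12 m, and d = 42 px, the depth is 2 m, and a 1-pixel disparity error corresponds to roughly 4.8 cm of depth error; doubling the baseline halves that error.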

双目视觉的难点：

1、光照敏感，被动光

2、双目视觉系统估计视差没那么容易，立体匹配是计算机视觉典型的难题，基线宽得到远目标测距准，而基线短得到近目标测距结果好。谈到双目系统的难点，除了立体匹配，还有标定。标定后的系统会出现“漂移”的，所以在线标定是必须具有的。

Surveys

A survey of binocular stereo vision matching techniques (双目立体视觉匹配技术综述)

Disparity and depth computation

Real-time depth computation using stereo imaging
Binocular disparity and the perception of depth
Fast Stereo Disparity Maps Refinement By Fusion of Data-Based And Model-Based Estimations

Stereo matching

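Matching methods fall into two categories, global and local. Practical systems mostly use local methods, because global methods are too slow.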

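(1) Stereo matching based on global constraints: these are essentially optimization algorithms that cast stereo matching as minimizing a global energy function. Representative algorithms include graph cuts, belief propagation and cooperative optimization. Global algorithms achieve a lower overall mismatch rate, but their high complexity makes real-time operation difficult, which limits their use in practical engineering. Common algorithms include DP (dynamic programming) and BP.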

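(2) Stereo matching based on local constraints: these methods compute matches from the local information around each point. Because they use less information, matching is fast, and they have therefore attracted wide attention. Representative algorithms include SAD, SSD, ZSAD and NCC.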
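The sketch below illustrates the local approach with winner-take-all SAD block matching. For each pixel it tries every disparity up to a limit and keeps the one with the smallest sum of absolute differences over a square window. It assumes rectified grayscale images given as lists of rows, so that matches lie on the same row; the function name is hypothetical. It performs no subpixel refinement, no left-right consistency check and no cost aggregation beyond the window. Replacing the absolute difference with a squared difference gives SSD, and subtracting each window's mean first gives ZSAD.

```python
def sad_disparity(left, right, max_disp, half=1):
    """Winner-take-all SAD block matching on rectified grayscale images (lists of rows).

    For each left pixel (y, x) and each candidate d, compare a (2*half+1)^2 window
    in the left image with the window at (y, x - d) in the right image and keep the
    d with the smallest sum of absolute differences. Border pixels get disparity 0.
    """
    h, w = len(left), len(left[0])
    disp = [[0] * w for _ in range(h)]
    for y in range(half, h - half):
        for x in range(half, w - half):
            best_cost, best_d = None, 0
            # Limit d so the right-image window stays inside the image.
            for d in range(0, min(max_disp, x - half) + 1):
                cost = 0
                for dy in range(-half, half + 1):
                    for dx in range(-half, half + 1):
                        cost += abs(left[y + dy][x + dx] - right[y + dy][x + dx - d])
                if best_cost is None or cost < best_cost:
                    best_cost, best_d = cost, d
            disp[y][x] = best_d
    return disp
```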

Multi-view vision

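Multi-view imaging, also called multi-view stereo imaging, uses one or more cameras to capture several images of the same scene from multiple viewpoints. The scene's 3D information is then reconstructed from these images.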

Adaptive structure from motion with a contrario model estimation
A comparison and evaluation of multi-view stereo reconstruction algorithms
Multiple view geometry in computer vision

Light field imaging

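Light field 3D imaging differs structurally from conventional CCD and CMOS camera imaging. In a conventional camera, light passes through the lens and forms an image directly on the image plane behind it, generally a 2D image. A light field camera adds a microlens array in front of the sensor plane. Light entering through the main lens passes through each microlens before reaching the photosensitive array, so both the direction and the position of the rays are recorded. The image can therefore be processed afterwards, giving a "shoot first, focus later" effect.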
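The "shoot first, focus later" effect can be illustrated with the classic shift-and-add refocusing idea. The microlens images are rearranged into sub-aperture views. A synthetic focal plane is then chosen by shifting every view in proportion to its aperture coordinate and averaging the results. Below is a minimal 1D sketch with a hypothetical function name. It assumes the sub-aperture views are already extracted and uses integer shifts only; real implementations interpolate sub-pixel shifts and work in 2D.

```python
def refocus(views, slope):
    """Shift-and-add refocusing of 1D sub-aperture views.

    views maps an integer aperture coordinate u to a 1D intensity list (all the
    same length). Each view is sampled at x + slope*u and the samples are averaged;
    samples falling outside a view are skipped. Choosing slope equal to a scene
    point's per-view shift brings that point into focus.
    """
    n = len(next(iter(views.values())))
    out = []
    for x in range(n):
        samples = [v[x + slope * u] for u, v in views.items() if 0 <= x + slope * u < n]
        out.append(sum(samples) / len(samples) if samples else 0.0)
    return out
```

Consider a point that shifts by one pixel per view. Refocusing with slope 1 concentrates it into a single bright pixel. With slope 0, its energy is spread over three pixels.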

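Advantages of light field cameras: a single camera can perform 3D imaging, and lateral and depth spatial resolution can range from about 20 μm up to the millimeter level. The depth of field is also several times larger than that of an ordinary camera. This makes them well suited to 3D measurement and guidance in Eye-in-Hand systems. However, commercial light field cameras with moderate accuracy are currently expensive.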

Light field imaging: models, calibrations, reconstructions, and applications
Extracting depth information from stereo vision system using a correlation and a feature based methods
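Fast multi-target ranging method based on a microlens-array light field camera (基于微透镜阵列型光场相机的多目标快速测距方法)
Watermarking and quality assessment of 4D light field images based on light field cameras (基于光场相机的四维光场图像水印及质量评价)
Computational reconstruction of depth-plane light fields based on light field cameras (基于光场相机的深度面光场计算重构)
Error analysis of visual measurement with light field cameras (光场相机视觉测量误差分析)
A calibration method for focused light field cameras based on light field images (一种基于光场图像的聚焦光场相机标定方法)
A survey of light field camera imaging models and parameter calibration methods (光场相机成像模型及参数标定方法综述)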

SFM

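Structure from Motion (SfM) is the problem of estimating camera parameters and 3D point positions. A basic SfM pipeline works as follows:
- detect feature points in each 2D image;
- match feature points between each pair of images, keeping only matches that satisfy geometric constraints;
- run an iterative, robust SfM procedure to recover the cameras' intrinsic and extrinsic parameters.

3D point coordinates are obtained by triangulation and then refined with Bundle Adjustment.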
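Bundle Adjustment jointly refines all cameras and points by minimizing the reprojection error. One standard way to write the objective is shown below; the notation is introduced here and is not taken from the source. V is the set of (camera i, point j) pairs in which point j is observed at pixel x_ij, and π is the perspective division. ρ is an optional robust loss; taking it as the identity gives plain least squares. Lens distortion is omitted.

```latex
\min_{\{K_i, R_i, \mathbf{t}_i\},\,\{\mathbf{X}_j\}}
\sum_{(i,j)\in\mathcal{V}}
\rho\!\left(\left\lVert \mathbf{x}_{ij} - \pi\!\left(K_i\,(R_i\mathbf{X}_j + \mathbf{t}_i)\right)\right\rVert^{2}\right),
\qquad
\pi\!\left([u, v, w]^{\top}\right) = \left(\frac{u}{w}, \frac{v}{w}\right)
```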

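SfM (Structure from Motion) is based mainly on multi-view geometry. It performs 3D reconstruction from motion, that is, it infers 3D information from 2D images that have no time ordering. It is an important branch of computer vision.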

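A single camera with fixed intrinsic parameters captures multiple images from different viewpoints, and the 3D information of the target scene is reconstructed from them. The technique is commonly used to track a large number of control points in the scene and to continuously recover the scene's 3D structure and the camera's pose and position.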

SfM methods can be divided into incremental (sequential) SfM, global SfM, hybrid SfM and hierarchical SfM. There are also semantic SfM and deep-learning-based SfM.

Incremental SfM

Global SfM

Hierarchical SfM

Multi-Stage SfM

Non Rigid SfM

References

A survey of 3D reconstruction algorithms based on monocular vision (基于单目视觉的三维重建算法综述)

Tutorial

Open Source Structure-from-Motion. M. Leotta, S. Agarwal, F. Dellaert, P. Moulon, V. Rabaud. CVPR 2015 Tutorial (material).
Large-scale 3D Reconstruction from Images (https://home.cse.ust.hk/~tshenaa/sub/ACCV2016/ACCV_2016_Tutorial.html). T. Shen, J. Wang, T. Fang, L. Quan. ACCV 2016 Tutorial.

Incremental SfM

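Incremental SfM first extracts feature points with the SIFT detector and computes their descriptors. It then matches them with ANN (approximate nearest neighbor) search and discards image pairs whose number of matches falls below a threshold. For the remaining pairs, the fundamental matrix is estimated with RANSAC and the eight-point algorithm. Matches classified as outliers during this estimation are treated as wrong and removed. The matches that satisfy these geometric constraints are merged into tracks, and the scene structure is then recovered incrementally. The process starts from a good initial image pair, which should have:

(1) enough matches;

(2) a wide baseline.

Cameras are then added incrementally. The intrinsic and extrinsic parameters of each new camera are estimated, 3D point coordinates are obtained by triangulation, and the result is refined with Bundle Adjustment.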
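The RANSAC loop used for fundamental-matrix estimation can be sketched generically. To keep the example short and dependency-free, it fits a 2D line from minimal two-point samples instead of a fundamental matrix, and the function names are hypothetical. In the fundamental-matrix case, `fit_line` would be replaced by an eight-point (or seven-point) solve, and the point-to-line residual by an epipolar error such as the Sampson distance. The fixed iteration count and seed are illustrative assumptions. Practical implementations usually adapt the iteration count to the observed inlier ratio and refit the model on all inliers at the end.

```python
import random


def fit_line(p, q):
    """Line a*x + b*y + c = 0 through two points, with (a, b) normalised to unit length."""
    (x1, y1), (x2, y2) = p, q
    a, b = y2 - y1, x1 - x2
    norm = (a * a + b * b) ** 0.5
    if norm == 0:
        return None  # degenerate sample: identical points
    a, b = a / norm, b / norm
    return a, b, -(a * x1 + b * y1)


def ransac_line(points, threshold, iterations=200, seed=0):
    """Return (line, inlier_indices) for the hypothesis with the largest consensus set."""
    rng = random.Random(seed)
    best_line, best_inliers = None, []
    for _ in range(iterations):
        line = fit_line(*rng.sample(points, 2))  # minimal sample: 2 points
        if line is None:
            continue
        a, b, c = line
        # Point-to-line distance, since (a, b) has unit length.
        inliers = [i for i, (x, y) in enumerate(points) if abs(a * x + b * y + c) < threshold]
        if len(inliers) > len(best_inliers):
            best_line, best_inliers = line, inliers
    return best_line, best_inliers
```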

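Incremental SfM is a common approach for computing 3D reconstructions from unordered image collections. As shown in Figure 3, it can be divided into the following stages: image feature extraction, feature matching, geometric verification, reconstruction initialization, image registration, triangulation, outlier filtering, and Bundle Adjustment.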
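Triangulation, one of the stages listed above, recovers a 3D point from two or more viewing rays. Below is a minimal sketch of the midpoint method with hypothetical function names. This is one of several triangulation methods and not necessarily the one used by COLMAP, openMVG or Theia. It assumes each ray is already expressed in world coordinates as a camera center plus a direction. In practice, the direction comes from back-projecting a homogeneous pixel x as R^T K^-1 x, and the camera center is -R^T t. The near-parallel check also shows why the initial pair needs a wide baseline.

```python
def dot(u, v):
    return sum(a * b for a, b in zip(u, v))


def triangulate_midpoint(c1, d1, c2, d2, eps=1e-12):
    """Midpoint of the shortest segment between rays c1 + s*d1 and c2 + t*d2."""
    w0 = [p - q for p, q in zip(c1, c2)]
    a, b, c = dot(d1, d1), dot(d1, d2), dot(d2, d2)
    d, e = dot(d1, w0), dot(d2, w0)
    denom = a * c - b * b
    if abs(denom) < eps:
        raise ValueError("rays are (nearly) parallel: baseline too small to triangulate")
    # Closest-point parameters on each ray (least-squares solution of a 2x2 system).
    s = (b * e - c * d) / denom
    t = (a * e - b * d) / denom
    p1 = [ci + s * di for ci, di in zip(c1, d1)]
    p2 = [ci + t * di for ci, di in zip(c2, d2)]
    return [(x + y) / 2 for x, y in zip(p1, p2)]
```

When the two rays intersect exactly, the midpoint is the intersection. When they are skew, as with noisy matches, the result lies halfway between their closest points.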

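Advantages of incremental SfM: the system is robust to outliers in feature matching and in the epipolar geometry, and the reconstructed scene is highly accurate. RANSAC continually filters outliers during calibration, and bundle adjustment continually refines the scene structure.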

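Disadvantages of incremental SfM:
- sensitivity to the choice of the initial image pair and to the order in which cameras are added;
- scene drift, i.e. accumulated error when reconstructing large scenes;
- limited efficiency, since repeated bundle adjustment takes a lot of computation time.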

Frameworks implementing incremental SfM include COLMAP, openMVG and Theia.

Photo Tourism: Exploring Photo Collections in 3D. N. Snavely, S. M. Seitz, and R. Szeliski. SIGGRAPH 2006.
Towards linear-time incremental structure from motion. C. Wu. 3DV 2013.
Structure-from-Motion Revisited. Schönberger, Frahm. CVPR 2016.

Global SfM

Global SfM estimates the rotation matrices and positions of all cameras at once, then triangulates the initial scene points.

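Advantages:
- errors are distributed evenly over the epipolar graph, so there is no accumulated error;
- the choice of initial images and the order in which images are added do not matter;
- bundle adjustment runs only once, so reconstruction is efficient.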

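Disadvantages:
- limited robustness: when solving for rotations the L1 norm is relatively robust to outliers, but when solving for camera positions the relative translations are sensitive to matching outliers;
- reduced scene completeness: filtering epipolar-geometry edges may lose some images.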

Combining two-view constraints for motion estimation. V. M. Govindu. CVPR, 2001.
Lie-algebraic averaging for globally consistent motion estimation. V. M. Govindu. CVPR, 2004.
Robust rotation and translation estimation in multiview reconstruction. D. Martinec and T. Pajdla. CVPR, 2007.
Non-sequential structure from motion. O. Enqvist, F. Kahl, and C. Olsson. ICCV OMNIVIS Workshops 2011.
Global motion estimation from point matches. M. Arie-Nachimson, S. Z. Kovalsky, I. Kemelmacher-Shlizerman, A. Singer, and R. Basri. 3DIMPVT 2012.
Global Fusion of Relative Motions for Robust, Accurate and Scalable Structure from Motion. P. Moulon, P. Monasse and R. Marlet. ICCV 2013.
A Global Linear Method for Camera Pose Registration. N. Jiang, Z. Cui, P. Tan. ICCV 2013.
Global Structure-from-Motion by Similarity Averaging. Z. Cui, P. Tan. ICCV 2015.
Linear Global Translation Estimation from Feature Tracks. Z. Cui, N. Jiang, C. Tang, P. Tan. BMVC 2015.

Hybrid SfM

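Hybrid SfM [5] combines, to some extent, the respective advantages of incremental and global SfM. The HSfM pipeline can be summarized as: estimate camera rotations globally, estimate camera positions incrementally, and triangulate the initial scene points.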

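For the global part, it proposes a community-based rotation averaging method that accounts for the accuracy of both the epipolar geometry and the pairwise geometry. Given the estimated absolute camera rotations, the camera centers are then estimated incrementally. For each added camera, its rotation and intrinsics are kept fixed, while a modified BA refines the camera centers and the scene structure.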

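Hierarchical SfM likewise draws on the strengths of incremental and global SfM. It builds on partitioned incremental SfM and global SfM instead of splitting the pipeline into two stages as hybrid SfM does.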

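In SfM, the points used for reconstruction come from feature matching, so SfM cannot directly produce a dense point cloud. MVS, by contrast, matches almost every pixel in the images and reconstructs the 3D coordinates of nearly every pixel. The density of the resulting points can therefore approach the resolution of the images.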

Hierarchical SfM

Structure-and-Motion Pipeline on a Hierarchical Cluster Tree. M. Farenzena, A. Fusiello, R. Gherardi. Workshop on 3-D Digital Imaging and Modeling, 2009.
Randomized Structure from Motion Based on Atomic 3D Models from Camera Triplets. M. Havlena, A. Torii, J. Knopp, and T. Pajdla. CVPR 2009.
Efficient Structure from Motion by Graph Optimization. M. Havlena, A. Torii, and T. Pajdla. ECCV 2010.
Hierarchical structure-and-motion recovery from uncalibrated images. R. Toldo, R. Gherardi, M. Farenzena, A. Fusiello. CVIU 2015.

Multi-Stage SfM

Parallel Structure from Motion from Local Increment to Global Averaging. S. Zhu, T. Shen, L. Zhou, R. Zhang, J. Wang, T. Fang, L. Quan. arXiv 2017.
Multistage SFM: Revisiting Incremental Structure from Motion. R. Shah, A. Deshpande, P. J. Narayanan. 3DV 2014. Extended version: Multistage SFM: A Coarse-to-Fine Approach for 3D Reconstruction, arXiv 2016.
HSfM: Hybrid Structure-from-Motion. H. Cui, X. Gao, S. Shen and Z. Hu, ICCV 2017.

Non Rigid SfM

Robust Structure from Motion in the Presence of Outliers and Missing Data. G. Wang, J. S. Zelek, J. Wu, R. Bajcsy. 2016.

Project&code

Project	Language	License
Bundler	C++	GNU General Public License - contamination
Colmap	C++	BSD 3-clause license - Permissive
TeleSculptor	C++	BSD 3-Clause license - Permissive
MicMac	C++	CeCILL-B
MVE	C++	BSD 3-Clause license + parts under the GPL 3 license
OpenMVG	C++	MPL2 - Permissive
OpenSfM	Python	Simplified BSD license - Permissive
TheiaSfM	C++	New BSD license - Permissive

TOF

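A time-of-flight (TOF) camera measures, at each pixel, the flight time of light to obtain the object's depth. TOF imaging suits large-field-of-view, long-range, low-precision and low-cost 3D image acquisition. Its strengths are fast detection, a large field of view, a long working distance and a low price. Its weaknesses are low accuracy and susceptibility to ambient light.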
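TOF cameras come in pulsed (direct) and continuous-wave (indirect, phase-based) variants. In the common continuous-wave scheme, each pixel samples the correlation between emitted and received modulated light at four phase offsets. The recovered phase shift is then converted to depth. Below is a minimal sketch with hypothetical function names. It assumes the sampling convention stated in the docstring, although sign and ordering conventions differ between sensors, and a single modulation frequency f_mod. A constant ambient-light offset B cancels in the differences c0 - c2 and c1 - c3, but its shot noise does not. This is one reason accuracy degrades under strong ambient light.

```python
import math

C = 299_792_458.0  # speed of light, m/s


def cw_tof_depth(c0, c1, c2, c3, f_mod):
    """Depth from four correlation samples of a continuous-wave ToF pixel.

    Assumes c_k = B + A*cos(phi - k*pi/2) for k = 0..3, so that
    phi = atan2(c1 - c3, c0 - c2). Depth is C*phi / (4*pi*f_mod), which is only
    unambiguous up to C / (2*f_mod).
    """
    phi = math.atan2(c1 - c3, c0 - c2) % (2 * math.pi)  # wrap phase to [0, 2*pi)
    return C * phi / (4 * math.pi * f_mod)


def ambiguity_range(f_mod):
    """Largest depth that can be measured without phase wrapping."""
    return C / (2 * f_mod)
```

With f_mod = 20 MHz, the unambiguous range is about 7.49 m, so a target at 8 m would be reported at about 0.51 m.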

Tutorials

Paper

https://arxiv.org/pdf/1511.07212.pdf

Multi-view Stereo

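Multi-view stereo (MVS) generalizes stereo vision. It observes and captures images of a scene from multiple viewpoints, looking from the outside in, and uses them for matching and depth estimation. In a sense, SLAM/SfM and MVS are similar. The former involves a moving camera, while the latter uses multiple camera viewpoints, which can be several views from one camera or multi-view images from several cameras. Put another way, the former can "walk through" the environment, while the latter is more like "watching" it from outside.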
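Matching across views in MVS is driven by a photo-consistency score between image patches. Zero-mean normalized cross-correlation (NCC) is a widely used choice, because it is invariant to gain and offset changes in brightness between views. Below is a minimal sketch over flattened patches with a hypothetical function name. It assumes the patches have already been resampled into correspondence, for example by warping with a candidate depth and normal. Returning 0 for a constant patch is one convention for the textureless case, where NCC is undefined.

```python
import math


def ncc(patch_a, patch_b):
    """Zero-mean normalized cross-correlation of two equal-length patches, in [-1, 1]."""
    n = len(patch_a)
    mean_a = sum(patch_a) / n
    mean_b = sum(patch_b) / n
    da = [a - mean_a for a in patch_a]
    db = [b - mean_b for b in patch_b]
    denom = math.sqrt(sum(x * x for x in da) * sum(y * y for y in db))
    if denom == 0:
        return 0.0  # textureless patch: correlation undefined, treat as no evidence
    return sum(x * y for x, y in zip(da, db)) / denom
```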

The multi-view stereo pipeline is as follows:

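Collect images;

Compute camera parameters for each image;

Reconstruct the 3D geometry of the scene from the image set and the corresponding camera parameters;

Optionally, reconstruct the shape and texture color of the scene.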

Reference link: 多视角立体视觉MVS简介 (An introduction to multi-view stereo, MVS)

paper

Surveys

Multi-view stereo: A tutorial
State of the Art 3D Reconstruction Techniques. N. Snavely, Y. Furukawa. CVPR 2014 tutorial slides: Introduction; MVS with priors; Large scale MVS.

Point cloud computation

Accurate, Dense, and Robust Multiview Stereopsis. Y. Furukawa, J. Ponce. CVPR 2007. PAMI 2010
State of the art in high density image matching. F. Remondino, M.G. Spera, E. Nocerino, F. Menna, F. Nex. The Photogrammetric Record 29(146), 2014.
Progressive prioritized multi-view stereo. A. Locher, M. Perdoch and L. Van Gool. CVPR 2016.
Pixelwise View Selection for Unstructured Multi-View Stereo. J. L. Schönberger, E. Zheng, M. Pollefeys, J.-M. Frahm. ECCV 2016.
TAPA-MVS: Textureless-Aware PAtchMatch Multi-View Stereo. A. Romanoni, M. Matteucci. ICCV 2019

Surface computation & refinements

Efficient Multi-View Reconstruction of Large-Scale Scenes using Interest Points, Delaunay Triangulation and Graph Cuts. P. Labatut, J-P. Pons, R. Keriven. ICCV 2007
Multi-View Stereo via Graph Cuts on the Dual of an Adaptive Tetrahedral Mesh. S. N. Sinha, P. Mordohai and M. Pollefeys. ICCV 2007.
Towards high-resolution large-scale multi-view stereo. H.-H. Vu, P. Labatut, J.-P. Pons, R. Keriven. CVPR 2009.
Refinement of Surface Mesh for Accurate Multi-View Reconstruction. R. Tylecek and R. Sara. IJVR 2010.
High Accuracy and Visibility-Consistent Dense Multiview Stereo. H.-H. Vu, P. Labatut, J.-P. Pons, R. Keriven. PAMI 2012.
Exploiting Visibility Information in Surface Reconstruction to Preserve Weakly Supported Surfaces M. Jancosek et al. 2014.
A New Variational Framework for Multiview Surface Reconstruction. B. Semerjian. ECCV 2014.
Photometric Bundle Adjustment for Dense Multi-View 3D Modeling. A. Delaunoy, M. Pollefeys. CVPR 2014.
Global, Dense Multiscale Reconstruction for a Billion Points. B. Ummenhofer, T. Brox. ICCV 2015.
Efficient Multi-view Surface Refinement with Adaptive Resolution Control. S. Li, S. Yu Siu, T. Fang, L. Quan. ECCV 2016.
Multi-View Inverse Rendering under Arbitrary Illumination and Albedo, K. Kim, A. Torii, M. Okutomi, ECCV 2016.
Shading-aware Multi-view Stereo, F. Langguth and K. Sunkavalli and S. Hadap and M. Goesele, ECCV 2016.
Scalable Surface Reconstruction from Point Clouds with Extreme Scale and Density Diversity, C. Mostegel, R. Prettenthaler, F. Fraundorfer and H. Bischof. CVPR 2017.
Multi-View Stereo with Single-View Semantic Mesh Refinement, A. Romanoni, M. Ciccone, F. Visin, M. Matteucci. ICCVW 2017

Machine Learning based MVS

Matchnet: Unifying feature and metric learning for patch-based matching, X. Han, Thomas Leung, Y. Jia, R. Sukthankar, A. C. Berg. CVPR 2015.
Stereo matching by training a convolutional neural network to compare image patches, J., Zbontar, and Y. LeCun. JMLR 2016.
Efficient deep learning for stereo matching, W. Luo, A. G. Schwing, R. Urtasun. CVPR 2016.
Learning a multi-view stereo machine, A. Kar, C. Häne, J. Malik. NIPS 2017.
Learned multi-patch similarity, W. Hartmann, S. Galliani, M. Havlena, L. V. Gool, K. Schindler. ICCV 2017.
Surfacenet: An end-to-end 3d neural network for multiview stereopsis, Ji, M., Gall, J., Zheng, H., Liu, Y., Fang, L. ICCV 2017.
DeepMVS: Learning Multi-View Stereopsis, Huang, P. and Matzen, K. and Kopf, J. and Ahuja, N. and Huang, J. CVPR 2018.
RayNet: Learning Volumetric 3D Reconstruction with Ray Potentials, D. Paschalidou and A. O. Ulusoy and C. Schmitt and L. Gool and A. Geiger. CVPR 2018.
MVSNet: Depth Inference for Unstructured Multi-view Stereo, Y. Yao, Z. Luo, S. Li, T. Fang, L. Quan. ECCV 2018.
Learning Unsupervised Multi-View Stereopsis via Robust Photometric Consistency, T. Khot, S. Agrawal, S. Tulsiani, C. Mertz, S. Lucey, M. Hebert. 2019.
DPSNet: End-to-End Deep Plane Sweep Stereo, Sunghoon Im, Hae-Gon Jeon, Stephen Lin, In So Kweon. 2019.
Point-based Multi-view Stereo Network, Rui Chen, Songfang Han, Jing Xu, Hao Su. ICCV 2019.

Multiple View Mesh Texturing

Seamless image-based texture atlases using multi-band blending. C. Allène, J-P. Pons and R. Keriven. ICPR 2008.
Let There Be Color! - Large-Scale Texturing of 3D Reconstructions. M. Waechter, N. Moehrle, M. Goesele. ECCV 2014

Texture Mapping

3D Textured Model Encryption via 3D Lu Chaotic Mapping

Courses

Image Manipulation and Computational Photography - Alexei A. Efros (UC Berkeley)
Computational Photography - Alexei A. Efros (CMU)
Computational Photography - Derek Hoiem (UIUC)
Computational Photography - James Hays (Brown University)
Digital & Computational Photography - Fredo Durand (MIT)
Computational Camera and Photography - Ramesh Raskar (MIT Media Lab)
Computational Photography - Irfan Essa (Georgia Tech)
Courses in Graphics - Stanford University
Computational Photography - Rob Fergus (NYU)
Introduction to Visual Computing - Kyros Kutulakos (University of Toronto)
Computational Photography - Kyros Kutulakos (University of Toronto)
Computer Vision for Visual Effects - Rich Radke (Rensselaer Polytechnic Institute)
Introduction to Image Processing - Rich Radke (Rensselaer Polytechnic Institute)

Software

MATLAB Functions for Multiple View Geometry
Peter Kovesi's Matlab Functions for Computer Vision and Image Analysis
OpenGV - geometric computer vision algorithms
MinimalSolvers - Minimal problems solver
Multi-View Environment
Visual SFM
Bundler SFM
openMVG: open Multiple View Geometry - Multiple View Geometry; Structure from Motion library & softwares
Patch-based Multi-view Stereo V2
Clustering Views for Multi-view Stereo
Floating Scale Surface Reconstruction
Large-Scale Texturing of 3D Reconstructions
Multi-View Stereo Reconstruction

Project&code

Project	Language	License
Colmap	C++ CUDA	BSD 3-clause license - Permissive (Can use CGAL -> GNU General Public License - contamination)
Gipuma + fusibile	C++ CUDA	GNU General Public License - contamination
HPMVS	C++	GNU General Public License - contamination
MICMAC	C++	CeCILL-B
MVE	C++	BSD 3-Clause license + parts under the GPL 3 license
OpenMVS	C++ (CUDA optional)	AGPL3
PMVS	C++ CUDA	GNU General Public License - contamination
SMVS Shading-aware Multi-view Stereo	C++	BSD-3-Clause license

3D reconstruction with depth cameras

This covers 3D reconstruction with Kinect-style depth cameras, including KinectFusion, Kintinuous, ElasticFusion, InfiniTAM and BundleFusion.
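KinectFusion-style systems fuse each new depth frame into a truncated signed distance function (TSDF) volume. Each voxel keeps a weighted running average, and the surface lies at the zero crossing. Below is a simplified sketch along a single camera ray, with a hypothetical function name. It assumes voxel positions are given as depths along that ray, a weight of 1 per measurement, and a cap `max_weight` on the accumulated weight. Making the signed distance positive in front of the surface is one common convention. A real implementation projects every voxel of a 3D grid into the depth image.

```python
def integrate_tsdf(tsdf, weights, voxel_depths, measured_depth, trunc, max_weight=100.0):
    """Fuse one depth measurement into a row of voxels along a camera ray (in place).

    Signed distance is measured_depth - voxel_depth (positive in front of the
    surface), truncated and normalised to [-1, 1]. Voxels more than trunc behind
    the surface are left untouched. Each voxel keeps a weighted running average.
    """
    for i, z in enumerate(voxel_depths):
        sdf = measured_depth - z
        if sdf < -trunc:
            continue  # occluded: this measurement says nothing about the voxel
        f = min(1.0, sdf / trunc)
        w_old = weights[i]
        tsdf[i] = (w_old * tsdf[i] + f) / (w_old + 1.0)
        weights[i] = min(w_old + 1.0, max_weight)
```

Suppose the same ray is integrated twice, with measured depths 1.0 m and 1.05 m. The zero crossing of the averaged TSDF then lies between the voxels at 1.0 m and 1.05 m, reflecting the averaged surface.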

Line- and plane-based 3D reconstruction

Surface Reconstruction from 3D Line Segments

Planar Reconstruction

Reference: https://github.com/BigTeacher-777/Awesome-Planar-Reconstruction

Papers

[PlaneRCNN] PlaneRCNN: 3D Plane Detection and Reconstruction from a Single Image [CVPR2019(Oral)][Pytorch]
[PlanarReconstruction] Single-Image Piece-wise Planar 3D Reconstruction via Associative Embedding [CVPR2019][Pytorch]
[Planerecover] Recovering 3D Planes from a Single Image via Convolutional Neural Networks [ECCV2018][Tensorflow]
[PlaneNet] PlaneNet: Piece-wise Planar Reconstruction from a Single RGB Image [CVPR2018][Tensorflow]

Datasets

ScanNet Dataset (PlaneNet) [Train][Test]
ScanNet Dataset (PlaneRCNN)[Link]
NYU Depth Dataset [Link]
SYNTHIA Dataset [Link]

3D face reconstruction

1. Nonlinear 3D Face Morphable Model

2. On Learning 3D Face Morphable Model from In-the-wild Images

3. Cascaded Regressor based 3D Face Reconstruction from a Single Arbitrary View Image

4. Joint Face Alignment and 3D Face Reconstruction

5. Photo-Realistic Facial Details Synthesis From Single Image

6. FML: Face Model Learning from Videos

7. Large Pose 3D Face Reconstruction from a Single Image via Direct Volumetric

8. Joint 3D Face Reconstruction and Dense Alignment with Position Map Regression Network

9. Joint 3D Face Reconstruction and Dense Face Alignment from A Single Image with 2D-Assisted Self-Supervised Learning

10. Face Alignment Across Large Poses: A 3D Solution

Texture/material analysis and synthesis

Texture Synthesis Using Convolutional Neural Networks (2015)[Paper]
Two-Shot SVBRDF Capture for Stationary Materials (SIGGRAPH 2015) [Paper]
Reflectance Modeling by Neural Texture Synthesis (2016) [Paper]
Modeling Surface Appearance from a Single Photograph using Self-augmented Convolutional Neural Networks (2017)[Paper]
High-Resolution Multi-Scale Neural Texture Synthesis (2017) [Paper]
Reflectance and Natural Illumination from Single Material Specular Objects Using Deep Learning (2017) [Paper]
Joint Material and Illumination Estimation from Photo Sets in the Wild (2017) [Paper]
TextureGAN: Controlling Deep Image Synthesis with Texture Patches (2018 CVPR) [Paper]
Gaussian Material Synthesis (2018 SIGGRAPH) [Paper]
Non-stationary Texture Synthesis by Adversarial Expansion (2018 SIGGRAPH) [Paper]
Synthesized Texture Quality Assessment via Multi-scale Spatial and Statistical Texture Attributes of Image and Gradient Magnitude Coefficients (2018 CVPR) [Paper]
LIME: Live Intrinsic Material Estimation (2018 CVPR) [Paper]
Learning Material-Aware Local Descriptors for 3D Shapes (2018) [Paper]

Scene synthesis/reconstruction

Make It Home: Automatic Optimization of Furniture Arrangement (2011, SIGGRAPH) [Paper]
Interactive Furniture Layout Using Interior Design Guidelines (2011) [Paper]
Synthesizing Open Worlds with Constraints using Locally Annealed Reversible Jump MCMC (2012) [Paper]
Example-based Synthesis of 3D Object Arrangements (2012 SIGGRAPH Asia) [Paper]
Sketch2Scene: Sketch-based Co-retrieval and Co-placement of 3D Models (2013) [Paper]
Action-Driven 3D Indoor Scene Evolution (2016) [Paper]
The Clutterpalette: An Interactive Tool for Detailing Indoor Scenes (2015) [Paper]
Relationship Templates for Creating Scene Variations (2016) [Paper]
IM2CAD (2017) [Paper]
Predicting Complete 3D Models of Indoor Scenes (2017) [Paper]
Complete 3D Scene Parsing from Single RGBD Image (2017) [Paper]
Adaptive Synthesis of Indoor Scenes via Activity-Associated Object Relation Graphs (2017 SIGGRAPH Asia) [Paper]
Automated Interior Design Using a Genetic Algorithm (2017) [Paper]
SceneSuggest: Context-driven 3D Scene Design (2017) [Paper]
A fully end-to-end deep learning approach for real-time simultaneous 3D reconstruction and material recognition (2017)[Paper]
Human-centric Indoor Scene Synthesis Using Stochastic Grammar (2018, CVPR)[Paper] [Supplementary] [Code]
FloorNet: A Unified Framework for Floorplan Reconstruction from 3D Scans (2018) [Paper] [Code]
ScanComplete: Large-Scale Scene Completion and Semantic Segmentation for 3D Scans (2018) [Paper]
Configurable 3D Scene Synthesis and 2D Image Rendering with Per-Pixel Ground Truth using Stochastic Grammars (2018) [Paper]
Holistic 3D Scene Parsing and Reconstruction from a Single RGB Image (ECCV 2018) [Paper]
Automatic 3D Indoor Scene Modeling from Single Panorama (2018 CVPR) [Paper]
Single-Image Piece-wise Planar 3D Reconstruction via Associative Embedding (2019 CVPR) [Paper] [Code]
3D Scene Reconstruction with Multi-layer Depth and Epipolar Transformers (ICCV 2019) [Paper]

Pose estimation

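Pose estimation determines the relationship between an object and the camera or reference coordinate frame, mainly from RGB, RGB-D or point cloud data.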

The main approaches are holistic methods, Hough voting methods, keypoint-based methods and dense correspondence methods.

Annotation tools

LabelFusion: https://github.com/RobotLocomotion/LabelFusion

By approach

Holistic methods

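Holistic methods directly estimate the 3D position and orientation of the object in a given image. Classic template-based methods construct rigid templates and scan the image to find the best-matching pose, but such handcrafted templates are not reliable in cluttered scenes. More recently, several methods based on deep neural networks have been proposed to directly regress the 6D pose of the camera or object. However, the nonlinearity of the rotation space makes it hard for data-driven DNNs to learn and generalize.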

Discriminative mixture-of-templates for viewpoint classification
Gradient response maps for realtime detection of textureless objects.
Comparing images using the hausdorff distance
Densefusion: 6d object pose estimation by iterative dense fusion
Posecnn: A convolutional neural network for 6d object pose estimation in cluttered scenes.
Viewpoints and keypoints
Implicit 3d orientation learning for 6d object detection from rgb images.
Render for cnn: Viewpoint estimation in images using cnns trained with rendered 3d model views.

Keypoint-based methods

Current keypoint-based methods first detect the object's 2D keypoints in the image and then estimate the 6D pose with a PnP algorithm.

Surf: Speeded up robust features.
Object recognition from local scale-invariant features
3d object modeling and recognition using local affine-invariant image descriptors and multi-view spatial constraints.
Stacked hourglass networks for human pose estimation
Making deep heatmaps robust to partial occlusions for 3d object pose estimation.
Bb8: A scalable, accurate, robust to partial occlusion method for predicting the 3d poses of challenging objects without using depth
Real-time seamless single shot 6d object pose prediction.
Discovery of latent 3d keypoints via end-to-end geometric reasoning.
Pvnet: Pixel-wise voting network for 6dof pose estimation.

Dense correspondence and Hough voting methods

independent object class detection using 3d feature maps.
Depth-encoded hough voting for joint object detection and shape recovery.
aware object detection and pose estimation.
Learning 6d object pose estimation using 3d object coordinates.
Global hypothesis generation for 6d object pose estimation.
Deep learning of local rgb-d patches for 3d object detection and 6d pose estimation.
Cdpn: Coordinates-based disentangled pose network for real-time rgb-based 6-dof object pose estimation.
Pix2pose: Pixel-wise coordinate regression of objects for 6d pose estimation.
Normalized object coordinate space for category-level 6d object pose and size estimation.
Recovering 6d object pose and predicting next-best-view in the crowd.