awesome-local-global-descriptor

This is my personal note on local and global descriptors, written to help anyone get into these fields more easily. If you find anything you want to add, feel free to open an issue or email me.

<img src="https://github.com/shamangary/awesome-local-global-descriptor/blob/master/ur2kid_summary.png" height="540"/>

This repo is also a side product of the survey I did for our paper UR2KiD. If you find this repo useful, please also consider citing our paper.

@article{yang2020ur2kid,
  title={UR2KiD: Unifying Retrieval, Keypoint Detection, and Keypoint Description without Local Correspondence Supervision},
  author={Yang*, Tsun-Yi and Nguyen*, Duy-Kien and Heijnen, Huub and Balntas, Vassileios},
  journal={arXiv preprint arXiv:2001.07252},
  year={2020}
}

This repo will be constantly updated.

Author: Tsun-Yi Yang (shamangary@hotmail.com)

Online talks

| Year | Topic | Link |
|---|---|---|
| [ECCV20] | MLAD Workshop | morning, afternoon |
| [3DV20] | 3DGV Talk: Marc Pollefeys - 3D geometric vision | youtube |
| [CVPR20] | Image Matching Workshop | youtube |
| [CVPR20] | CVPR2020 tutorial: Local Features: From SIFT to Differentiable Methods | youtube |
| [CVPR20] | Deep Visual SLAM Frontends: SuperPoint, SuperGlue, and SuperMaps | youtube |

Local matching pipeline

In this section, I review sparse keypoint matching and its pipeline.

1. Keypoint detection

This subsection reviews keypoint detection and the estimation of keypoint orientation, scale, or affine transformation.

| Year | Paper | Link | Code |
|---|---|---|---|
| [CVPR20] | Holistically-Attracted Wireframe Parsing | arXiv | github |
| [CVPR20] | KeyPose: Multi-View 3D Labeling and Keypoint Estimation for Transparent Objects | arXiv | link |
| [3DV19] | SIPs: Succinct Interest Points from Unsupervised Inlierness Probability Learning | arXiv | Github |
| [ICCV19] | Key.Net: Keypoint Detection by Handcrafted and Learned CNN Filters | PDF | Github |
| [ECCV18] | Repeatability Is Not Enough: Learning Discriminative Affine Regions via Discriminability | arXiv | Github |
| [CVPR17] | Learning Discriminative and Transformation Covariant Local Feature Detectors | PDF | Github |
| [CVPR17] | Quad-networks: unsupervised learning to rank for interest point detection | PDF | - |
| [CVPR16] | Learning to Assign Orientations to Feature Points | - | Github |
| [CVPR15] | TILDE: a Temporally Invariant Learned DEtector | arXiv | Github |

| Year | Paper | Link | Code |
|---|---|---|---|
| [ECCV20] | DH3D: Deep Hierarchical 3D Descriptors for Robust Large-Scale 6DoF Relocalization | link | github |
| [ICCV19] | USIP: Unsupervised Stable Interest Point Detection from 3D Point Clouds | arXiv | Github |
| [arXiv19] | Self-Supervised 3D Keypoint Learning for Ego-motion Estimation | arXiv | Github |

2. Keypoint description (local descriptor)

Over the last few decades, most work has focused on patch descriptors.
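
As a rough illustration of how such descriptors are used, the sketch below matches two small descriptor sets by nearest-neighbour search and filters ambiguous matches with the ratio test described in the SIFT paper listed below ([IJCV04]). The function names, the toy 2-D descriptors, and the 0.8 ratio are assumptions chosen for illustration; real descriptors are typically 128-D and would be matched with an optimized library rather than plain Python.

```python
import math


def l2(a, b):
    """Euclidean distance between two descriptors of equal length."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def ratio_test_matches(desc_a, desc_b, ratio=0.8):
    """Match each descriptor in desc_a to its nearest neighbour in desc_b.

    A match is kept only if the nearest distance is clearly smaller than the
    second-nearest one (nearest < ratio * second_nearest).
    Returns a list of (index_in_a, index_in_b) pairs.
    """
    matches = []
    for i, da in enumerate(desc_a):
        # Rank candidate indices in desc_b by distance to the query descriptor.
        ranked = sorted(range(len(desc_b)), key=lambda j: l2(da, desc_b[j]))
        if len(ranked) < 2:
            continue  # the ratio test needs at least two candidates
        best, second = ranked[0], ranked[1]
        if l2(da, desc_b[best]) < ratio * l2(da, desc_b[second]):
            matches.append((i, best))
    return matches


# Toy 2-D "descriptors" (hypothetical values, for illustration only).
desc_a = [(0.0, 0.0), (1.0, 1.0), (0.5, 0.5)]
desc_b = [(0.05, 0.0), (1.0, 0.95), (5.0, 5.0)]
matches = ratio_test_matches(desc_a, desc_b)
print(matches)
```

The third query descriptor lies halfway between two candidates, so the ratio test rejects it and only the pairs (0, 0) and (1, 1) survive.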

| Year | Paper | Link | Code |
|---|---|---|---|
| [CVPR16] | Accumulated Stability Voting: A Robust Descriptor from Descriptors of Multiple Scales | PDF | Github |
| [CVPR15] | Domain-Size Pooling in Local Descriptors: DSP-SIFT | PDF | - |
| [CVPR15] | BOLD - Binary Online Learned Descriptor For Efficient Image Matching | PDF | Github |
| [CVPR13] | Boosting binary keypoint descriptors | - | - |
| [CVPR12] | FREAK: Fast retina keypoint | - | - |
| [CVPR12] | Three things everyone should know to improve object retrieval | PDF | - |
| [IPOL11] | ASIFT: An Algorithm for Fully Affine Invariant Comparison | - | - |
| [ICCV11] | BRISK: Binary robust invariant scalable keypoints | - | - |
| [ICCV11] | ORB: An efficient alternative to SIFT or SURF | - | - |
| [ICCV11] | Local intensity order pattern for feature description | - | - |
| [CVIU06] | Speeded-up robust features (SURF) | - | - |
| [ECCV06] | SURF: Speeded up robust features | - | - |
| [IJCV04] | Distinctive image features from scale-invariant keypoints | - | Github |

| Year | Paper | Link | Code |
|---|---|---|---|
| [TIP19] | Learning Local Descriptors by Optimizing the Keypoint-Correspondence Criterion: Applications to Face Matching, Learning from Unlabeled Videos and 3D-Shape Retrieval | arXiv | Github |
| [ICCV19] | Beyond Cartesian Representations for Local Descriptors | PDF | - |
| [CVPR19] | SOSNet: Second Order Similarity Regularization for Local Descriptor Learning | arXiv, Page | Github |
| [ECCV18] | GeoDesc: Learning Local Descriptors by Integrating Geometry Constraints | - | Github |
| [CVPR18] | Local Descriptors Optimized for Average Precision | Page | - |
| [NIPS17] | Working hard to know your neighbor's margins: Local descriptor learning loss | arXiv | Github |
| [ICCV17] | DeepCD: Learning Deep Complementary Descriptors for Patch Representations | PDF | Github |
| [CVPR17] | L2-Net: Deep Learning of Discriminative Patch Descriptor in Euclidean Space | PDF | Github |
| [arXiv16] | PN-Net: Conjoined Triple Deep Network for Learning Local Image Descriptors | arXiv | Github |
| [BMVC16] | Learning local feature descriptors with triplets and shallow convolutional neural networks | PDF | Github |
| [ICCV15] | Discriminative Learning of Deep Convolutional Feature Point Descriptors | Page | Github |
| [CVPR15] | MatchNet: Unifying Feature and Metric Learning for Patch-Based Matching | PDF | - |
| [CVPR15] | Learning to compare image patches via convolutional neural networks | PDF | Github |

| Year | Paper | Link | Code |
|---|---|---|---|
| [arXiv19] | DeepPoint3D: Learning Discriminative Local Descriptors Using Deep Metric Learning on 3D Point Clouds | arXiv | - |

3. End-to-end matching pipeline

Recently, more and more papers try to embed the whole matching pipeline (keypoint detection and keypoint description) into one framework.

| Year | Paper | Link | Code |
|---|---|---|---|
| [arXiv20] | Dense Semantic 3D Map Based Long-Term Visual Localization with Hybrid Features | arXiv | - |
| [arXiv20] | D2D: Learning to find good correspondences for image matching and manipulation | arXiv | - |
| [arXiv20] | DISK: Learning local features with policy gradient | arXiv | - |
| [arXiv20] | D2D: Keypoint Extraction with Describe to Detect Approach | arXiv | - |
| [arXiv20] | HDD-Net: Hybrid Detector Descriptor with Mutual Interactive Learning | arXiv | - |
| [arXiv20] | Learning Feature Descriptors using Camera Pose Supervision | arXiv | - |
| [arXiv20] | Efficient Neighbourhood Consensus Networks via Submanifold Sparse Convolutions | arXiv | github |
| [arXiv20] | S2DNet: Learning Accurate Correspondences for Sparse-to-Dense Feature Matching | arXiv | - |
| [CVPR20] | ASLFeat: Learning Local Features of Accurate Shape and Localization | arXiv | github, tfmatch |
| [CVPR20] | Reinforced Feature Points: Optimizing Feature Detection and Description for a High-Level Task | arXiv | - |
| [WACV19] | DGC-Net: Dense Geometric Correspondence Network | arXiv | github |
| [NIPS19] | R2D2: Repeatable and Reliable Detector and Descriptor | arXiv, Page | Github |
| [ICCV19] | ELF: Embedded Localisation of Features in Pre-Trained CNN | PDF | Github |
| [CVPR19] | RF-Net: An End-to-End Image Matching Network based on Receptive Field | arXiv | Github |
| [CVPR19] | D2-Net: A Trainable CNN for Joint Description and Detection of Local Features | arXiv, Page | Github |
| [BMVC19] | Matching Features without Descriptors: Implicitly Matched Interest Points | PDF | github |
| [CVPRW18] | SuperPoint: Self-Supervised Interest Point Detection and Description | arXiv | Github, 3rd_party |
| [NIPS18] | LF-Net: Learning Local Features from Images | PDF | Github |
| [ECCV16] | LIFT: Learned Invariant Feature Points | - | Github |

| Year | Paper | Link | Code |
|---|---|---|---|
| [CVPR20] | D3Feat: Joint Learning of Dense Detection and Description of 3D Local Features | arXiv | github |
| [arXiv20] | StickyPillars: Robust feature matching on point clouds using Graph Neural Networks | arXiv | - |

3.5. Dense descriptor

Unlike local keypoint descriptors, which depend on detected keypoints, some works compute a dense descriptor representation for the whole image.

| Year | Paper | Link | Code |
|---|---|---|---|
| [ICRA20] | GN-Net: The Gauss-Newton Loss for Multi-Weather Relocalization | arXiv, MyNote | Web |
| [ICCV17] | CLKN: Cascaded Lucas-Kanade Networks for Image Alignment | PDF | - |

4. Geometric verification or learning based matcher

After the matching, standard RANSAC and its variants are usually adopted for outlier removal.
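
To make the idea concrete, here is a minimal RANSAC sketch in plain Python. It assumes the simplest possible geometric model, a pure 2-D translation between matched keypoints, so that one correspondence is a minimal sample; real pipelines usually fit a homography, fundamental, or essential matrix instead. `ransac_translation`, its parameters, and the toy points are hypothetical names and values, not taken from any paper in this list.

```python
import random


def ransac_translation(pts_a, pts_b, threshold=1.0, iterations=100, seed=0):
    """Estimate a 2-D translation t with pts_b[i] ~ pts_a[i] + t using RANSAC.

    pts_a, pts_b: lists of (x, y); pts_a[i] is putatively matched to pts_b[i].
    Returns (translation, inlier_indices).
    """
    rng = random.Random(seed)
    best_inliers = []
    for _ in range(iterations):
        # 1. Minimal sample: one correspondence determines a translation.
        i = rng.randrange(len(pts_a))
        tx = pts_b[i][0] - pts_a[i][0]
        ty = pts_b[i][1] - pts_a[i][1]
        # 2. Consensus set: matches whose residual is below the threshold.
        inliers = [
            k for k, (a, b) in enumerate(zip(pts_a, pts_b))
            if ((a[0] + tx - b[0]) ** 2 + (a[1] + ty - b[1]) ** 2) ** 0.5 < threshold
        ]
        # 3. Keep the hypothesis supported by the most matches.
        if len(inliers) > len(best_inliers):
            best_inliers = inliers
    # 4. Refit on all inliers (mean offset) for a less noisy estimate.
    n = len(best_inliers)
    tx = sum(pts_b[k][0] - pts_a[k][0] for k in best_inliers) / n
    ty = sum(pts_b[k][1] - pts_a[k][1] for k in best_inliers) / n
    return (tx, ty), best_inliers


# Five matches consistent with a shift of (10, 5), plus one wrong match.
pts_a = [(0, 0), (1, 0), (0, 1), (2, 2), (3, 1), (5, 5)]
pts_b = [(10, 5), (11, 5), (10, 6), (12, 7), (13, 6), (0, 0)]
translation, inliers = ransac_translation(pts_a, pts_b)
print(translation, inliers)
```

The last correspondence disagrees with the dominant shift, so it is left out of the returned inlier set.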

| Year | Paper | Link | Code |
|---|---|---|---|
| [ECCV20] | Making Affine Correspondences Work in Camera Geometry Computation | arXiv | github |
| [arXiv20] | AdaLAM: Revisiting Handcrafted Outlier Detection | arXiv | github |
| [arXiv20] | Multi-View Optimization of Local Feature Geometry | arXiv | - |
| [CVPR19] | MAGSAC: Marginalizing Sample Consensus | PDF | Github |
| [CVPR16] | Progressive Feature Matching with Alternate Descriptor Selection and Correspondence Enrichment | PDF | - |
| [CVPR13] | Robust Feature Matching with Alternate Hough and Inverted Hough Transforms | PDF | - |
| [ECCV12] | Improving Image-Based Localization by Active Correspondence Search | PDF | - |
| [CVPR05] | Matching with PROSAC – Progressive Sample Consensus | PDF | - |
| [CVPR05] | Two-View Geometry Estimation Unaffected by a Dominant Plane | PDF | Github |

| Year | Paper | Link | Code |
|---|---|---|---|
| [ECCV20] | Online Invariance Selection for Local Feature Descriptors | arXiv | github |
| [CVPR20] | SuperGlue: Learning Feature Matching with Graph Neural Networks | arXiv | Github |
| [CVPR20] | High-dimensional Convolutional Networks for Geometric Pattern Recognition | arXiv, youtube | - |
| [CVPR20] | ACNe: Attentive Context Normalization for Robust Permutation-Equivariant Learning | arXiv | github |
| [arXiv20] | RANSAC-Flow: generic two-stage image alignment | arXiv, youtube | page, Github |
| [ICCV19] | NG-RANSAC for Epipolar Geometry from Sparse Correspondences | arXiv | Github |
| [ICCV19] | Learning Two-View Correspondences and Geometry Using Order-Aware Network | arXiv | Github |
| [CVPR18] | Learning to Find Good Correspondences | - | Github |

| Year | Paper | Link | Code |
|---|---|---|---|
| [arXiv20] | Deep Global Registration | arXiv, youtube | - |
| [Access18] | Multi-Temporal Remote Sensing Image Registration Using Deep Convolutional Features | PDF | Github |

Global retrieval

Since global retrieval usually has to search over a large number of candidates, there are several ways to generate a single descriptor for each image.

1. Feature aggregation

When only hand-crafted local descriptors were available, people usually aggregated a set of local descriptors into a single descriptor per image.
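
One classical aggregation scheme from this family is VLAD (see "All about VLAD" in the table below). The sketch below is a simplified, dependency-free version under the assumption that the visual words (`centroids`) were already learned, for example with k-means; it omits refinements such as intra-normalization. The function name and toy data are illustrative only.

```python
import math


def vlad(descriptors, centroids):
    """Aggregate local descriptors into one VLAD vector.

    descriptors: list of D-dimensional local descriptors.
    centroids:   list of K visual words (D-dimensional), assumed precomputed.
    Returns a flat, L2-normalised list of length K * D.
    """
    k, d = len(centroids), len(centroids[0])
    residual_sums = [[0.0] * d for _ in range(k)]
    for desc in descriptors:
        # Hard-assign the descriptor to its nearest visual word.
        nearest = min(
            range(k),
            key=lambda c: sum((x - y) ** 2 for x, y in zip(desc, centroids[c])),
        )
        # Accumulate the residual (descriptor minus centroid) for that word.
        for j in range(d):
            residual_sums[nearest][j] += desc[j] - centroids[nearest][j]
    flat = [v for row in residual_sums for v in row]
    norm = math.sqrt(sum(v * v for v in flat))
    return [v / norm for v in flat] if norm > 0 else flat


# Two visual words and three local descriptors (toy 2-D values).
centroids = [(0.0, 0.0), (10.0, 10.0)]
descriptors = [(1.0, 0.0), (0.0, 1.0), (10.0, 12.0)]
global_descriptor = vlad(descriptors, centroids)
print(len(global_descriptor))
```

The output length depends only on the vocabulary size and descriptor dimension, so images with different numbers of keypoints still yield descriptors of the same size.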

| Year | Paper | Link | Code |
|---|---|---|---|
| [ICCV13] <br> [IJCV15] | To aggregate or not to aggregate: Selective match kernels for image search <br> Image search with selective match kernels: aggregation across single and multiple images | ICCV <br> IJCV | Official: matlab, from DELF (tensorflow) |
| [CVPR13] | All about VLAD | PDF | - |
| [ECCV10] | Improving the Fisher kernel for large-scale image classification | PDF | - |
| [CVPR07] | Object retrieval with large vocabularies and fast spatial matching | PDF | - |
| [CVPR06] | Fisher kernels on visual vocabularies for image categorization | PDF | - |

A similar idea, but using deep learning to adapt the classical algorithms.

| Year | Paper | Link | Code |
|---|---|---|---|
| [ECCV16] | CNN Image Retrieval Learns from BoW: Unsupervised Fine-Tuning with Hard Examples | PDF | - |
| [CVPR16] | NetVLAD: CNN architecture for weakly supervised place recognition | Page | Github |

2. Real-valued descriptor

A single real-valued representation computed from the whole image.

| Year | Paper | Link | Code |
|---|---|---|---|
| [ECCV20] | Learning and aggregating deep local descriptors for instance-level recognition | arXiv | github |
| [ECCV20] | Predicting Visual Overlap of Images Through Interpretable Non-Metric Box Embeddings | arXiv | github |
| [ECCV20] | Smooth-AP: Smoothing the Path Towards Large-Scale Image Retrieval | arXiv | github |
| [ECCV20] | SOLAR: Second-Order Loss and Attention for Image Retrieval | arXiv | - |
| [ECCV20] | Unifying Deep Local and Global Features for Efficient Image Search | arXiv | - |
| [arXiv19] | ACTNET: end-to-end learning of feature activations and multi-stream aggregation for effective instance image retrieval | arXiv | - |
| [TIP19] | REMAP: Multi-layer entropy-guided pooling of dense CNN features for image retrieval | arXiv | - |
| [ICCV19] | Learning with Average Precision: Training Image Retrieval with a Listwise Loss | arXiv | Github |
| [CVPR19] | Detect-to-Retrieve: Efficient Regional Aggregation for Image Search | PDF | Github |
| [TPAMI18] | Fine-tuning CNN Image Retrieval with No Human Annotation | arXiv | Github |
| [IJCV17] | End-to-end Learning of Deep Visual Representations for Image Retrieval | arXiv | Github |
| [ICCV17] | Large-Scale Image Retrieval with Attentive Deep Local Features | - | Github |
| [ECCV16] | CNN Image Retrieval Learns from BoW: Unsupervised Fine-Tuning with Hard Examples | arXiv | Github |

3. Binary descriptor and quantization

For a more compact representation, a binary descriptor can be generated by hashing or thresholding. Quantization is also very popular in large-scale image retrieval.
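
As a minimal illustration of the thresholding route, the sketch below binarizes real-valued global descriptors at zero and ranks database images by Hamming distance. The zero threshold, the function names, and the toy vectors are assumptions; the learned hashing methods and product quantization listed in the table below are considerably more elaborate.

```python
def binarize(descriptor, threshold=0.0):
    """Turn a real-valued descriptor into a bit code packed into a Python int.

    Each dimension contributes one bit: 1 if the value exceeds the threshold.
    """
    code = 0
    for value in descriptor:
        code = (code << 1) | (1 if value > threshold else 0)
    return code


def hamming(code_a, code_b):
    """Number of differing bits between two binary codes."""
    return bin(code_a ^ code_b).count("1")


# Toy 4-D descriptors (hypothetical values).
query = [0.3, -1.2, 0.8, -0.1]
database = {
    "img_a": [0.5, -0.7, 0.9, -0.3],
    "img_b": [-0.4, 0.6, 0.2, 0.1],
}

query_code = binarize(query)
ranking = sorted(database, key=lambda name: hamming(query_code, binarize(database[name])))
print(ranking)
```

Here each descriptor shrinks to 4 bits, and comparing two codes reduces to an XOR followed by a bit count.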

| Year | Paper | Link | Code |
|---|---|---|---|
| [ICCVW19] | DAME WEB: DynAmic MEan with Whitening Ensemble Binarization for Landmark Retrieval without Human Annotation | PDF | Github |
| [CVPR19] | FastAP: Deep Metric Learning to Rank | PDF | Github |
| [CVPR18] | Hashing as Tie-Aware Learning to Rank | PDF | Github |
| [AAAI18] | Deep Region Hashing for Generic Instance Search from Image | - | - |
| [TPAMI18] | Supervised Learning of Semantics-Preserving Hash via Deep Convolutional Neural Networks | - | - |
| [TPAMI13] | Iterative Quantization: A Procrustean Approach to Learning Binary Codes for Large-Scale Image Retrieval | PDF | - |
| [TPAMI10] | Product quantization for nearest neighbor search | PDF | - |

4. Pre-processing/Post-processing

Anything that can boost performance in the pre- or post-processing stage, such as rectification, re-ranking, or query expansion.
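
Query expansion is easy to sketch. The variant below is the common "average query expansion": the query descriptor is averaged with its top-ranked neighbours and the search is run again. It assumes L2-normalized global descriptors compared by dot product; `top_k`, the helper names, and the toy 2-D descriptors are hypothetical.

```python
import math


def l2_normalize(v):
    """Scale a vector to unit Euclidean length."""
    norm = math.sqrt(sum(x * x for x in v))
    return [x / norm for x in v]


def rank(query, database):
    """Return database indices sorted by decreasing dot-product similarity."""
    scores = [sum(q * d for q, d in zip(query, item)) for item in database]
    return sorted(range(len(database)), key=lambda i: -scores[i])


def average_query_expansion(query, database, top_k=2):
    """Re-rank after averaging the query with its top_k first-pass neighbours."""
    first_pass = rank(query, database)
    neighbours = [database[i] for i in first_pass[:top_k]]
    # Average the query with its neighbours dimension by dimension, then renormalise.
    mean = [sum(dims) / (len(neighbours) + 1) for dims in zip(query, *neighbours)]
    return rank(l2_normalize(mean), database)


def unit(angle_deg):
    """Toy 2-D unit descriptor pointing at the given angle."""
    a = math.radians(angle_deg)
    return [math.cos(a), math.sin(a)]


query = unit(0)
database = [unit(10), unit(20), unit(-35), unit(40)]
initial = rank(query, database)
expanded = average_query_expansion(query, database)
print(initial, expanded)
```

In this toy example the expanded query drifts towards its two closest neighbours, so the item at 40 degrees overtakes the one at -35 degrees in the second pass.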

| Year | Paper | Link | Code |
|---|---|---|---|
| [arXiv20] | Image Stylization for Robust Features | arXiv | - |
| [ECCV20] | Single-Image Depth Prediction Makes Feature Matching Easier | arXiv | github |
| [CVPR19] | Local features and visual words emerge in activations | PDF | - |
| [CVPR12] | Object retrieval and localization with spatially-constrained similarity measure and k-NN re-ranking | PDF | - |

5. 3D point cloud

| Year | Paper | Link | Code |
|---|---|---|---|
| [CVPR18] | PointNetVLAD: Deep Point Cloud Based Retrieval for Large-Scale Place Recognition | arXiv | Github |

Multi-tasking local and global descriptors

Some works cover both local descriptors and global retrieval, since the two tasks share similar network activations and applications.

| Year | Paper | Link | Code |
|---|---|---|---|
| [arXiv20] | UR2KiD: Unifying Retrieval, Keypoint Detection, and Keypoint Description without Local Correspondence Supervision | arXiv | - |
| [CVPR19] | ContextDesc: Local Descriptor Augmentation with Cross-Modality Context | - | Github |
| [CVPR19] | From Coarse to Fine: Robust Hierarchical Localization at Large Scale with HF-Net | arXiv | Github |
| [ICCV17] | Large-Scale Image Retrieval with Attentive Deep Local Features (DELF) | - | Github |

Review papers

| Year | Paper | Link | Code |
|---|---|---|---|
| [arXiv18] | From handcrafted to deep local features | arXiv | - |
| [CVPR17] | Comparative Evaluation of Hand-Crafted and Learned Local Features | PDF | - |

Metric learning

| Year | Paper | Link | Code |
|---|---|---|---|
| [arXiv20] | Metric learning: cross-entropy vs. pairwise losses | arXiv | - |
| [arXiv19] | A Metric Learning Reality Check | arXiv | - |

SfM

| Year | Paper | Link | Code |
|---|---|---|---|
| [arXiv20] | Reducing Drift in Structure from Motion using Extended Features | arXiv | - |

MVS

| Year | Paper | Link | Code |
|---|---|---|---|
| [CVPR20] | Fast-MVSNet: Sparse-to-Dense Multi-View Stereo With Learned Propagation and Gauss-Newton Refinement | arXiv | github |
| [CVPR20] | BlendedMVS: A Large-scale Dataset for Generalized Multi-view Stereo Networks | arXiv | github |

View Synthesis/Novel view/Image completion

| Year | Paper | Link | Code |
|---|---|---|---|
| [ECCV20] | Flow-edge Guided Video Completion | arXiv | link |
| [arXiv20] | Reference Pose Generation for Visual Localization via Learned Features and View Synthesis | arXiv | - |
| [CVPR20] | BlendedMVS: A Large-scale Dataset for Generalized Multi-view Stereo Networks | arXiv | github |

Segmentation localization

| Year | Paper | Link | Code |
|---|---|---|---|
| [ICCV19] | Fine-Grained Segmentation Networks: Self-Supervised Segmentation for Improved Long-Term Visual Localization | arXiv | github |

Benchmarks

Local matching

| Year | Paper | Link | Code | Note |
|---|---|---|---|---|
| [arXiv2020] | Image Matching across Wide Baselines: From Paper to Practice | arXiv | github | |
| [CVPR17] | HPatches: A benchmark and evaluation of handcrafted and learned local descriptors | arXiv | Github | Hpatches |
| [TPAMI11] | Discriminative learning of local image descriptors | Page | - | UBC/Brown dataset (subsets: Liberty (New York), Notre Dame (Paris) and Half Dome (Yosemite)) |
| [CVPR08] | On Benchmarking Camera Calibration and MultiView Stereo for High Resolution Imagery | | | |

Global retrieval

| Year | Paper | Link | Code | Note |
|---|---|---|---|---|
| [CVPR18] | Revisiting Oxford and Paris: Large-Scale Image Retrieval Benchmarking | Page | Github | ROxford5k, RParis6k |
| [CVPR07] | Object retrieval with large vocabularies and fast spatial matching | Page | - | Oxford5k |
| [CVPR08] | Lost in Quantization: Improving Particular Object Retrieval in Large Scale Image Databases | Page | - | Paris6k |

Localization (both local matching and global retrieval)

| Year | Paper | Link | Code | Note |
|---|---|---|---|---|
| [ECCV20] | Map-based Localization for Autonomous Driving | web | github1, github2 | - |
| [CVPR18] | Benchmarking 6DOF Outdoor Visual Localization in Changing Conditions | PDF, Page | Github | Aachen-day-night, Robotcar, CMU-seasons |

Toolbox

| Year | Paper | Link |
|---|---|---|
| [2020] | Kapture | github |
| [2020] | hloc - the hierarchical localization toolbox | github |
| [2020] | pyslamv2 | github |