Yandex Cup 2022: Like Prediction, 2nd place solution

This solution uses a two-stage recommender system: candidate selection with several different methods, followed by ranking with GBDT.
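The two-stage idea can be illustrated with a minimal sketch. It is not code from this repository: the function names, the source labels, and the toy scoring function are hypothetical, and the real second stage is a trained GBDT rather than a hand-written score.

```python
def merge_candidates(candidate_lists):
    """Stage 1: union the candidates from several generators.

    candidate_lists maps a (hypothetical) source name to its item list.
    Returns a dict: item -> set of sources that proposed it.
    """
    merged = {}
    for source, items in candidate_lists.items():
        for item in items:
            merged.setdefault(item, set()).add(source)
    return merged


def rank_candidates(merged, score_fn, top_k=None):
    """Stage 2: order merged candidates by a scoring function.

    score_fn stands in for the trained ranker; ties are broken by item id
    so the output is deterministic.
    """
    ordered = sorted(merged, key=lambda item: (-score_fn(item, merged[item]), item))
    return ordered if top_k is None else ordered[:top_k]


candidates = {"cooc": ["a", "b"], "als": ["b", "c"]}
merged = merge_candidates(candidates)
# Toy score: prefer items proposed by more sources.
ranking = rank_candidates(merged, lambda item, sources: len(sources))
```

Keeping the set of sources per candidate is a design choice of this sketch: which generator proposed an item can itself serve as a ranking feature.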

Environment and running

Hardware

All experiments were run on a machine with 512 GB RAM and an A100 GPU. The most memory-intensive step is model training, which takes ~250 GB RAM at peak. The GPU is only needed for fast calculation of co-occurrence features with cudf; it is possible to use pandas instead (set the environment variable USE_CUDF=0). The full pipeline, including inference, takes ~8 hours when the stages are executed sequentially with a GPU.
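A switch of this kind is often implemented by choosing the dataframe module at import time. The sketch below shows one way it could look; only the variable name USE_CUDF comes from this README. The helper names and the assumption that any value other than "0" keeps the GPU path are hypothetical and may differ from the actual code.

```python
import importlib
import os


def dataframe_backend_name(environ=os.environ):
    """Return the module name to use for dataframes.

    Assumption: the GPU path (cudf) is the default and USE_CUDF=0
    switches to pandas; the repository may parse the variable differently.
    """
    return "pandas" if environ.get("USE_CUDF", "1") == "0" else "cudf"


def load_dataframe_backend(environ=os.environ):
    """Import and return the selected module (cudf or pandas)."""
    return importlib.import_module(dataframe_backend_name(environ))
```

Because cudf mirrors much of the pandas API, downstream code can then call something like `xd = load_dataframe_backend()` and use `xd.DataFrame` for either backend, although not every pandas operation has a cudf equivalent.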

Candidate selection

Next-item co-occurrence
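The README does not describe this method in detail. One common reading of "next-item co-occurrence" is to count how often item B directly follows item A in user histories and to propose the most frequent followers of a user's last item. The sketch below implements that reading only; the function names, the `top_k` parameter, and the filtering of already-seen items are assumptions, not the repository's implementation.

```python
from collections import Counter, defaultdict


def next_item_counts(histories):
    """Count, for every item, which items directly follow it."""
    counts = defaultdict(Counter)
    for history in histories:
        for current, following in zip(history, history[1:]):
            counts[current][following] += 1
    return counts


def next_item_candidates(counts, history, top_k=10):
    """Propose the most frequent followers of the user's last item."""
    if not history:
        return []
    seen = set(history)  # assumption: do not re-recommend seen items
    followers = counts.get(history[-1], Counter()).most_common()
    return [item for item, _ in followers if item not in seen][:top_k]


histories = [["a", "b", "c"], ["a", "b", "d"], ["x", "b", "c"]]
counts = next_item_counts(histories)
candidates = next_item_candidates(counts, ["a", "b"])
```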

Smart co-occurrence

Implicit BM25 (Item2Item)

Implicit ALS

Popular items

Last artist items

Features

Ranker

tl;dr: LightGBM with the lambdarank objective. Some things to notice:
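A general LightGBM detail (not specific to this repository): the lambdarank objective needs to know which rows belong to the same query, here presumably the same user. Rows of one query must be contiguous, and the sizes of the groups are passed through the `group` argument of `lightgbm.Dataset`. The sketch below computes those sizes in plain Python; the helper name and the toy user ids are hypothetical, and the parameter dictionary shows only the objective named above, not the author's actual settings.

```python
from itertools import groupby


def query_group_sizes(user_ids):
    """Return the size of each run of identical user ids.

    The input must already be sorted (or at least grouped) by user,
    matching the row order of the feature matrix.
    """
    return [sum(1 for _ in rows) for _, rows in groupby(user_ids)]


# One row per (user, candidate) pair, grouped by user.
row_user_ids = ["u1", "u1", "u2", "u3", "u3", "u3"]
group = query_group_sizes(row_user_ids)

# Only the objective is taken from the text; other parameters are omitted.
lambdarank_params = {"objective": "lambdarank"}
```

The resulting list would be passed as `lightgbm.Dataset(features, label=labels, group=group)`, where `features` and `labels` are the candidate feature matrix and the like/no-like targets in the same row order.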

Final ensemble

The final submission is generated by blending 3 submission files with an inverse rank blend (see blend.py for an example).
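For readers unfamiliar with the technique, here is a minimal sketch of an inverse rank blend for a single user. It is not a copy of blend.py: the function name, the optional `weights` and `top_k` parameters, and the tie-breaking rule are assumptions made for illustration.

```python
from collections import defaultdict


def inverse_rank_blend(rankings, weights=None, top_k=None):
    """Blend several ranked item lists for one user.

    Every list gives each of its items a score of weight / rank
    (rank starts at 1); items are returned by descending total score.
    """
    if weights is None:
        weights = [1.0] * len(rankings)
    scores = defaultdict(float)
    for ranking, weight in zip(rankings, weights):
        for rank, item in enumerate(ranking, start=1):
            scores[item] += weight / rank
    # Break ties by item id so the result is deterministic.
    ordered = sorted(scores, key=lambda item: (-scores[item], item))
    return ordered if top_k is None else ordered[:top_k]


submissions = [["a", "b", "c"], ["b", "a", "d"], ["b", "c", "a"]]
blended = inverse_rank_blend(submissions)
```

In this toy example "b" ends up first because it is ranked first in two of the three lists, while "d" ends up last because it appears only once and in third position. An item missing from a list simply receives no score from it.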

Features and LightGBM parameters were pretty much the same between all three models.

first (0.0849 lb, 0.0845 cv, 0.49 recall)

second (0.0854 lb, 0.0852 cv, 0.62 recall)

third (0.08608 lb, 0.0856 cv, 0.64 recall)

Bonus - DVC pipeline flow chart

flowchart LR
        node1["calculate_als_candidates@test"]
        node2["calculate_als_candidates@val"]
        node3["calculate_artist_candidates@test"]
        node4["calculate_artist_candidates@val"]
        node5["calculate_cooc_candidates@test"]
        node6["calculate_cooc_candidates@val"]
        node7["calculate_cooc_smart_candidates@test"]
        node8["calculate_cooc_smart_candidates@val"]
        node9["calculate_cooc_stats"]
        node10["calculate_cooc_stats_for_smart"]
        node11["calculate_popular_candidates@test"]
        node12["calculate_popular_candidates@val"]
        node13["calculate_similar_candidates@test"]
        node14["calculate_similar_candidates@val"]
        node15["create_artist_features"]
        node16["create_item_features"]
        node17["create_submission"]
        node18["create_submission_cv"]
        node19["create_user_artist_features@test"]
        node20["create_user_artist_features@val"]
        node21["create_user_features@test"]
        node22["create_user_features@val"]
        node23["create_user_history_als_features@test"]
        node24["create_user_history_als_features@val"]
        node25["create_user_history_artist_features@test"]
        node26["create_user_history_artist_features@val"]
        node27["create_user_history_cooc_features@test"]
        node28["create_user_history_cooc_features@val"]
        node29["create_user_history_features@test"]
        node30["create_user_history_features@val"]
        node31["create_user_history_similarity_features@test"]
        node32["create_user_history_similarity_features@val"]
        node33["merge_candidates@test"]
        node34["merge_candidates@val"]
        node35["merge_candidates_and_features@test"]
        node36["merge_candidates_and_features@val"]
        node37["prepare_data"]
        node38["split_test_by_chunks"]
        node39["train_als_candidates"]
        node40["train_cooc_candidates"]
        node41["train_lightgbm"]
        node42["train_lightgbm_cv"]
        node43["train_popular_candidates"]
        node44["train_similar_candidates"]
        node1-->node33
        node2-->node34
        node5-->node33
        node6-->node34
        node7-->node33
        node8-->node34
        node9-->node27
        node9-->node28
        node10-->node7
        node10-->node8
        node11-->node33
        node12-->node34
        node13-->node33
        node14-->node34
        node15-->node35
        node15-->node36
        node16-->node35
        node16-->node36
        node19-->node35
        node20-->node36
        node21-->node35
        node22-->node36
        node23-->node35
        node24-->node36
        node27-->node35
        node28-->node36
        node29-->node35
        node30-->node36
        node31-->node35
        node32-->node36
        node33-->node23
        node33-->node25
        node33-->node27
        node33-->node31
        node33-->node35
        node34-->node24
        node34-->node26
        node34-->node28
        node34-->node32
        node34-->node36
        node35-->node17
        node35-->node38
        node36-->node41
        node36-->node42
        node37-->node1
        node37-->node2
        node37-->node3
        node37-->node4
        node37-->node5
        node37-->node6
        node37-->node7
        node37-->node8
        node37-->node9
        node37-->node10
        node37-->node11
        node37-->node12
        node37-->node13
        node37-->node14
        node37-->node15
        node37-->node16
        node37-->node19
        node37-->node20
        node37-->node21
        node37-->node22
        node37-->node23
        node37-->node24
        node37-->node25
        node37-->node26
        node37-->node27
        node37-->node28
        node37-->node29
        node37-->node30
        node37-->node31
        node37-->node32
        node37-->node39
        node37-->node40
        node37-->node41
        node37-->node43
        node37-->node44
        node38-->node18
        node39-->node1
        node39-->node2
        node39-->node23
        node39-->node24
        node40-->node5
        node40-->node6
        node41-->node17
        node42-->node18
        node43-->node11
        node43-->node12
        node44-->node13
        node44-->node14
        node44-->node31
        node44-->node32