Top Papers in Computer Vision, NLP, Speech, Multimodal AI, Core ML, RecSys, and Graph ML

Distilled AI: https://aman.ai/papers/
aman AI: https://aman.ai/

👉🏼 I've put together a summary of key papers in #AI and segregated them into (i) need-to-know and (ii) good-to-know.

🔹 Vision

  • Image Classification (from CNN architectures such as AlexNet, VGGNet, InceptionNet, and ResNet to Transformer architectures such as ViT, DeiT, BEiT, and MAE)
  • Object Detection (YOLO v1-v8, Fast R-CNN, Faster R-CNN, Mask R-CNN, CenterNet, Pix2Seq, DETR, Detic, Focal Loss)
  • Semantic/Instance Segmentation (U-Net, Mask R-CNN, Segment Anything)
  • NeRF (Instant NeRF, Block-NeRF)
  • SSL Contrastive Learning (SimCLR, MoCo, DINO v1 & v2)

🔹 NLP

  • Transformers (original paper)
  • Semantic Representation Encoders (BERT and its variants: RoBERTa, DistilBERT, ELECTRA, XLNet, MPNet, ALBERT)
  • Autoregressive Decoders (GPT-n, Llama 1/2/3, Alpaca, Vicuna)
  • Augmented LMs (RAG, Toolformer, HuggingGPT, Gorilla)
  • Supervised Fine-tuning (Instruction tuning/FLAN, LIMA, LESS)
  • LLM Alignment (RLHF/InstructGPT, PPO, DPO, KTO, GPO, IPO)
  • Encoder + Decoder Architectures (T0, T5, BART)
  • Machine Translation (M2M-100, NLLB-200)
  • Contrastive Learning (SNCSE, InfoNCE, Sentence-BERT)
  • Prompting (CoT, Auto-CoT, Self-Consistency, ToT, GoT, ReAct, APE, ART)
  • PEFT (Prefix-tuning, Adapters, LoRA, LLaMA-Adapter v1 and v2, QLoRA, QA-LoRA, DoRA, NOLA)

🔹 Speech

  • SSL Pre-Training (WavLM, AudioMAE, HuBERT)
  • Automatic Speech Recognition/Keyword Spotting (GMM-HMM, DNN-HMM, all-neural architectures such as LAS/Whisper, streaming architectures such as RNN-T/Transformer-T)
  • Speaker Identification (i/d/x-vectors, GE2E loss, AAM loss)
  • Text-to-Speech (HiFi-GAN, Tacotron v1 and v2, Voicebox)
  • Text-to-Audio/Music (MusicGen, AudioGen)

🔹 Multimodal

  • SSL Pre-Training (ViLT, MLIM, UNITER, LXMERT, VisualBERT, data2vec v1 and v2, I-Code, VL-BEiT, ImageBind)
  • V+L Prompting (Flamingo, Frozen, InstructBLIP)
  • Text-to-Image (DALL-E 1/2/3, Imagen, Latent Diffusion, Make-A-Scene, Make-A-Video)
  • Translation (SeamlessM4T)
  • Contrastive Learning (InfoNCE, CLIP, CLAP, AudioCLIP)

🔹 Core ML

  • Training Regularizer (Dropout)
  • Training/Inference Efficiency (ZeRO, ZeRO-Infinity, FlashAttention, FlashAttention-2)
  • Training Stability (Batch/Layer/Group/Instance Norm, Residual/Skip Connections)
  • Explainable AI (Guided Backprop, Grad-CAM, CAV, Influence functions, Representer points, TracIn)

🔹 RecSys

  • ML-based Collaborative Filtering (Factorization Machines)
  • DL-based Algorithms (Collaborative Deep Learning, Wide & Deep, DNNs for YouTube Recommendations, Product-based DNNs, NCF, Deep & Cross v1 and v2, DeepFM, Deep Interest Network, Behavior Sequence Transformer)

🔹 Graph ML

  • Factorization-based Algorithms (LLE, LAP, HOPE)
  • Random Walk-based Algorithms (Node2vec)
  • Deep Learning-based Algorithms (SDNE, GraphSAGE, EGNN, GCN, GAT)

This post is licensed under CC BY 4.0 by the author.