
A List of Key Computer Vision Papers

AIstarter 2025. 5. 2. 00:28

This post collects the key papers in computer vision in one place.

Contents
1 Image Classification
2 Video Classification
3 RNN-based Models
4 Transformers
5 Object Detection
6 Segmentation
7 Metric Learning
8 Multimodal Learning
9 Generative Models

 

📌 1. Image Classification

| Paper title | Model / Keyword | Link |
| --- | --- | --- |
| ImageNet Classification with Deep Convolutional Neural Networks | AlexNet | paper |
| Visualizing and Understanding Convolutional Networks | ZFNet | paper |
| Very Deep Convolutional Networks for Large-Scale Image Recognition | VGG | paper |
| Going Deeper with Convolutions | GoogLeNet (Inception v1) | paper |
| Deep Residual Learning for Image Recognition | ResNet | paper |
| Inception-v4, Inception-ResNet and the Impact of Residual Connections on Learning | Inception-v4 / Inception-ResNet | paper |

 

📌 2. Video Classification

| Paper title | Model / Keyword | Link |
| --- | --- | --- |
| Two-Stream Convolutional Networks for Action Recognition in Videos | Two-stream | paper |
| Convolutional Two-Stream Network Fusion for Video Action Recognition | Two-stream fusion | paper |
| Temporal Segment Networks: Towards Good Practices for Deep Action Recognition | TSN | paper |
| Hidden Two-Stream Convolutional Networks for Action Recognition | Hidden Two-Stream | paper |
| Learning Spatiotemporal Features with 3D Convolutional Networks | C3D | paper |
| A Closer Look at Spatiotemporal Convolutions for Action Recognition | R(2+1)D | paper |
| Temporal 3D ConvNets: New Architecture and Transfer Learning for Video Classification | T3D | paper |
| Quo Vadis, Action Recognition? A New Model and the Kinetics Dataset | I3D | paper |
| Rethinking Spatiotemporal Feature Learning: Speed-Accuracy Trade-offs in Video Classification | S3D | paper |
| SlowFast Networks for Video Recognition | SlowFast | paper |
| X3D: Expanding Architectures for Efficient Video Recognition | X3D | paper |
| Multiscale Vision Transformers | MViT | paper |

 

📌 3. Recurrent Neural Networks and Video Models (RNN + Video)

Base RNN Models

| Paper title | Model / Keyword | Link |
| --- | --- | --- |
| Learning representations by back-propagating errors | RNN (BPTT) | paper |
| Long Short-Term Memory | LSTM | paper |
| Learning Phrase Representations using RNN Encoder–Decoder for Statistical Machine Translation | GRU | paper |

RNN-based Spatio-Temporal Modeling

| Paper title | Model / Keyword | Link |
| --- | --- | --- |
| Long-term Recurrent Convolutional Networks for Visual Recognition and Description | LRCN | paper |
| Convolutional LSTM Network: A Machine Learning Approach for Precipitation Nowcasting | ConvLSTM | paper |
| Show, Attend and Tell: Neural Image Caption Generation with Visual Attention | Visual Attention | paper |

 

📌 4. Transformers

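| Paper title | Model / Keyword | Link |
| --- | --- | --- |
| Attention Is All You Need | Transformer | paper |
| BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding | BERT | paper |
| An Image is Worth 16x16 Words: Transformers for Image Recognition at Scale | ViT | paper |
| Training data-efficient image transformers & distillation through attention | DeiT | paper |
| Swin Transformer: Hierarchical Vision Transformer using Shifted Windows | Swin | paper |
| Is Space-Time Attention All You Need for Video Understanding? | TimeSformer | paper |
| ViViT: A Video Vision Transformer | ViViT | paper |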

 

📌 5. Object Detection

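| Paper title | Model / Keyword | Link |
| --- | --- | --- |
| Rich feature hierarchies for accurate object detection and semantic segmentation | R-CNN | paper |
| Fast R-CNN | Fast R-CNN | paper |
| Faster R-CNN: Towards Real-Time Object Detection with Region Proposal Networks | Faster R-CNN | paper |
| You Only Look Once: Unified, Real-Time Object Detection | YOLO | paper |
| SSD: Single Shot MultiBox Detector | SSD | paper |
| End-to-End Object Detection with Transformers | DETR | paper |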

 

📌 6. Segmentation

| Paper title | Model / Keyword | Link |
| --- | --- | --- |
| Learning Deconvolution Network for Semantic Segmentation | DeconvNet | paper |
| U-Net: Convolutional Networks for Biomedical Image Segmentation | U-Net | paper |
| Mask R-CNN | Mask R-CNN | paper |
| Segmenter: Transformer for Semantic Segmentation | Segmenter | paper |
| Vision Transformers for Dense Prediction | DPT | paper |

 

📌 7. Metric Learning

| Paper title | Model / Keyword | Link |
| --- | --- | --- |
| FaceNet: A Unified Embedding for Face Recognition and Clustering | Triplet Loss | paper |
| A Simple Framework for Contrastive Learning of Visual Representations | SimCLR | paper |

 

📌 8. Multimodal Learning

| Paper title | Model / Keyword | Link |
| --- | --- | --- |
| ViLBERT: Pretraining Task-Agnostic Visiolinguistic Representations for Vision-and-Language Tasks | ViLBERT | paper |
| Learning Transferable Visual Models From Natural Language Supervision | CLIP | paper |
| VATT: Transformers for Multimodal Self-Supervised Learning from Raw Video, Audio and Text | VATT | paper |
| VideoBERT: A Joint Model for Video and Language Representation Learning | VideoBERT | paper |

 

📌 9. Generative Models

| Paper title | Model / Keyword | Link |
| --- | --- | --- |
| Pixel Recurrent Neural Networks | PixelRNN | paper |
| Auto-Encoding Variational Bayes | VAE | paper |
| Generative Adversarial Nets | GAN | paper |
| Unsupervised Representation Learning with Deep Convolutional Generative Adversarial Networks | DCGAN | paper |
| Wasserstein GAN | WGAN | paper |
| Image-to-Image Translation with Conditional Adversarial Networks | Pix2pix | paper |
| Unpaired Image-to-Image Translation using Cycle-Consistent Adversarial Networks | CycleGAN | paper |
| StarGAN: Unified Generative Adversarial Networks for Multi-Domain Image-to-Image Translation | StarGAN | paper |
| A Style-Based Generator Architecture for Generative Adversarial Networks | StyleGAN | paper |
| NeRF: Representing Scenes as Neural Radiance Fields for View Synthesis | NeRF | paper |