A Roundup of Key Computer Vision Papers
AIstarter
2025. 5. 2. 00:28
This page collects the key papers in computer vision in one place.
Contents
1 | Image Classification
2 | Video Classification
3 | RNN-based Models
4 | Transformers
5 | Object Detection
6 | Segmentation
7 | Metric Learning
8 | Multimodal Learning
9 | Generative Models
📌 1. Image Classification
Paper title | Model/Keyword | Link
ImageNet Classification with Deep Convolutional Neural Networks | AlexNet | paper
Visualizing and Understanding Convolutional Networks | ZFNet | paper
Very Deep Convolutional Networks for Large-Scale Image Recognition | VGG | paper
Going Deeper with Convolutions | GoogLeNet (Inception v1) | paper
Deep Residual Learning for Image Recognition | ResNet | paper
Inception-v4, Inception-ResNet and the Impact of Residual Connections on Learning | Inception-v4, Inception-ResNet | paper
📌 2. Video Classification
Paper title | Model/Keyword | Link
Two-Stream Convolutional Networks for Action Recognition in Videos | Two-stream | paper
Convolutional Two-Stream Network Fusion for Video Action Recognition | Two-stream fusion | paper
Temporal Segment Networks: Towards Good Practices for Deep Action Recognition | TSN | paper
Hidden Two-Stream Convolutional Networks for Action Recognition | Hidden Two-Stream | paper
Learning Spatiotemporal Features with 3D Convolutional Networks | C3D | paper
A Closer Look at Spatiotemporal Convolutions for Action Recognition | R(2+1)D | paper
Temporal 3D ConvNets: New Architecture and Transfer Learning for Video Classification | T3D | paper
Quo Vadis, Action Recognition? A New Model and the Kinetics Dataset | I3D | paper
Rethinking Spatiotemporal Feature Learning: Speed-Accuracy Trade-offs in Video Classification | S3D | paper
SlowFast Networks for Video Recognition | SlowFast | paper
X3D: Expanding Architectures for Efficient Video Recognition | X3D | paper
Multiscale Vision Transformers | MViT | paper
📌 3. Recurrent Neural Networks and Video Models (RNN + Video)
Paper title | Model/Keyword | Link
Base RNN Models | |
Learning representations by back-propagating errors | RNN (BPTT) | paper
Long Short-Term Memory | LSTM | paper
Learning Phrase Representations using RNN Encoder–Decoder for Statistical Machine Translation | GRU | paper
RNN-based Spatio-Temporal Modeling | |
Long-term Recurrent Convolutional Networks for Visual Recognition and Description | LRCN | paper
Convolutional LSTM Network: A Machine Learning Approach for Precipitation Nowcasting | ConvLSTM | paper
Show, Attend and Tell | Visual Attention | paper
📌 4. Transformers
Paper title | Model/Keyword | Link
Attention Is All You Need | Transformer | paper
BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding | BERT | paper
An Image is Worth 16x16 Words | ViT | paper
Training data-efficient image transformers & distillation through attention | DeiT | paper
Swin Transformer | Swin | paper
Is Space-Time Attention All You Need for Video Understanding? | TimeSformer | paper
ViViT: A Video Vision Transformer | ViViT | paper
📌 5. Object Detection
Paper title | Model/Keyword | Link
Rich feature hierarchies for accurate object detection and semantic segmentation | R-CNN | paper
Fast R-CNN | Fast R-CNN | paper
Faster R-CNN | Faster R-CNN | paper
You Only Look Once | YOLO | paper
SSD: Single Shot MultiBox Detector | SSD | paper
End-to-End Object Detection with Transformers | DETR | paper
📌 6. Segmentation
Paper title | Model/Keyword | Link
Learning Deconvolution Network for Semantic Segmentation | DeconvNet | paper
U-Net: Convolutional Networks for Biomedical Image Segmentation | U-Net | paper
Mask R-CNN | Mask R-CNN | paper
Segmenter: Transformer for Semantic Segmentation | Segmenter | paper
Vision Transformers for Dense Prediction | DPT | paper
📌 7. Metric Learning
Paper title | Model/Keyword | Link
FaceNet: A Unified Embedding for Face Recognition and Clustering | Triplet Loss | paper
A Simple Framework for Contrastive Learning of Visual Representations | SimCLR | paper
📌 8. Multimodal Learning
Paper title | Model/Keyword | Link
ViLBERT: Pretraining Task-Agnostic Visiolinguistic Representations for Vision-and-Language Tasks | ViLBERT | paper
Learning Transferable Visual Models From Natural Language Supervision | CLIP | paper
VATT: Transformers for Multimodal Self-Supervised Learning from Raw Video, Audio and Text | VATT | paper
VideoBERT: A Joint Model for Video and Language Representation Learning | VideoBERT | paper
📌 9. Generative Models
Paper title | Model/Keyword | Link
Pixel Recurrent Neural Networks | PixelRNN | paper
Auto-Encoding Variational Bayes | VAE | paper
Generative Adversarial Nets | GAN | paper
Unsupervised Representation Learning with Deep Convolutional Generative Adversarial Networks | DCGAN | paper
Wasserstein GAN | WGAN | paper
Image-to-Image Translation with Conditional Adversarial Networks | pix2pix | paper
Unpaired Image-to-Image Translation using Cycle-Consistent Adversarial Networks | CycleGAN | paper
StarGAN: Unified Generative Adversarial Networks for Multi-Domain Image-to-Image Translation | StarGAN | paper
A Style-Based Generator Architecture for Generative Adversarial Networks | StyleGAN | paper
NeRF: Representing Scenes as Neural Radiance Fields for View Synthesis | NeRF | paper