Computer Vision - ECCV 2022 [electronic resource] : 17th European Conference, Tel Aviv, Israel, October 23-27, 2022, Proceedings, Part XXXVII / edited by Shai Avidan, Gabriel Brostow, Moustapha Cissé, Giovanni Maria Farinella, Tal Hassner.
- 1st ed. 2022.
- LVI, 753 p. 244 illus., 242 illus. in color. online resource.
- Lecture Notes in Computer Science, 1611-3349 ; 13697.
Most and Least Retrievable Images in Visual-Language Query Systems -- Sports Video Analysis on Large-Scale Data -- Grounding Visual Representations with Texts for Domain Generalization -- Bridging the Visual Semantic Gap in VLN via Semantically Richer Instructions -- StoryDALL-E: Adapting Pretrained Text-to-Image Transformers for Story Continuation -- VQGAN-CLIP: Open Domain Image Generation and Editing with Natural Language Guidance -- Semantic-Aware Implicit Neural Audio-Driven Video Portrait Generation -- End-to-End Active Speaker Detection -- Emotion Recognition for Multiple Context Awareness -- Adaptive Fine-Grained Sketch-Based Image Retrieval -- Quantized GAN for Complex Music Generation from Dance Videos -- Uncertainty-Aware Multi-modal Learning via Cross-Modal Random Network Prediction -- Localizing Visual Sounds the Easy Way -- Learning Visual Styles from Audio-Visual Associations -- Remote Respiration Monitoring of Moving Person Using Radio Signals -- Camera Pose Estimation and Localization with Active Audio Sensing -- PACS: A Dataset for Physical Audiovisual Commonsense Reasoning -- VoViT: Low Latency Graph-Based Audio-Visual Voice Separation Transformer -- Telepresence Video Quality Assessment -- MultiMAE: Multi-modal Multi-task Masked Autoencoders -- AudioScopeV2: Audio-Visual Attention Architectures for Calibrated Open-Domain On-Screen Sound Separation -- Audio-Visual Segmentation -- Unsupervised Night Image Enhancement: When Layer Decomposition Meets Light-Effects Suppression -- Relationformer: A Unified Framework for Image-to-Graph Generation -- GAMa: Cross-view Video Geo-localization -- Revisiting a kNN-based Image Classification System with High-capacity Storage -- Geometric Representation Learning for Document Image Rectification -- S2-VER: Semi-Supervised Visual Emotion Recognition -- Image Coding for Machines with Omnipotent Feature Learning -- Feature Representation Learning for Unsupervised Cross-Domain Image Retrieval -- Fashionformer: A Simple, Effective and Unified Baseline for Human Fashion Segmentation and Recognition -- Semantic-Guided Multi-Mask Image Harmonization -- Learning an Isometric Surface Parameterization for Texture Unwrapping -- Towards Regression-Free Neural Networks for Diverse Compute Platforms -- Relationship Spatialization for Depth Estimation -- Image2Point: 3D Point-Cloud Understanding with 2D Image Pretrained Models -- FAR: Fourier Aerial Video Recognition -- Translating a Visual LEGO Manual to a Machine-Executable Plan -- Fabric Material Recovery from Video Using Multi-Scale Geometric Auto-Encoder -- MegBA: A GPU-Based Distributed Library for Large-Scale Bundle Adjustment -- The One Where They Reconstructed 3D Humans and Environments in TV Shows.
The 39-volume set, comprising LNCS volumes 13661 through 13699, constitutes the refereed proceedings of the 17th European Conference on Computer Vision, ECCV 2022, held in Tel Aviv, Israel, during October 23-27, 2022. The 1645 papers presented in these proceedings were carefully reviewed and selected from a total of 5804 submissions. The papers deal with topics such as computer vision; machine learning; deep neural networks; reinforcement learning; object recognition; image classification; image processing; object detection; semantic segmentation; human pose estimation; 3D reconstruction; stereo vision; computational photography; neural networks; image coding; image reconstruction; motion estimation.
9783031198366
10.1007/978-3-031-19836-6 doi
Computer vision.
TA1634
006.37