9 results for "M. Akin Yilmaz"
Search Results
2. End-to-End Rate-Distortion Optimized Learned Hierarchical Bi-Directional Video Compression
- Author
-
Mustafa Akın Yılmaz and Ahmet Murat Tekalp (ORCID 0000-0003-1465-8121 & YÖK ID 26207); College of Engineering, Department of Electrical and Electronics Engineering
- Subjects
Bidirectional control, Image coding, Video compression, Motion compensation, Optimization, Entropy, Video codecs, Learned video compression, Learned bi-directional motion compensation, Flow field sub-sampling, Flow vector prediction, End-to-end optimization, Computer science, Artificial intelligence, Electrical and electronic engineering, Computer Graphics and Computer-Aided Design, Software - Abstract
Conventional video compression (VC) methods are based on motion-compensated transform coding, where the steps of motion estimation, mode and quantization parameter selection, and entropy coding are optimized individually due to the combinatorial nature of the end-to-end optimization problem. Learned VC allows end-to-end rate-distortion (R-D) optimized training of the nonlinear transform, motion, and entropy models simultaneously. Most works on learned VC consider end-to-end optimization of a sequential video codec based on an R-D loss averaged over pairs of successive frames. It is well known in conventional VC that hierarchical, bi-directional coding outperforms sequential compression because of its ability to use both past and future reference frames. This paper proposes a learned hierarchical bi-directional video codec (LHBDC) that combines the benefits of hierarchical motion-compensated prediction and end-to-end optimization. Experimental results show that we achieve the best R-D results reported for learned VC schemes to date in both PSNR and MS-SSIM. Compared to conventional video codecs, our end-to-end optimized codec outperforms both the x265 and SVT-HEVC encoders ("veryslow" preset) in PSNR and MS-SSIM, as well as the HM 16.23 reference software in MS-SSIM. We present ablation studies showing performance gains due to proposed novel tools such as learned masking, flow-field subsampling, and temporal flow vector prediction. The models and instructions to reproduce our results can be found at https://github.com/makinyilmaz/LHBDC/. (A minimal sketch of bi-directional motion-compensated prediction follows this record.) Funding: Scientific and Technological Research Council of Turkey (TÜBİTAK); Turkish Academy of Sciences (TÜBA)
- Published
- 2022
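The LHBDC abstract above centers on bi-directional motion-compensated prediction: warping a past and a future reference frame toward the current frame and blending them with a learned mask. The sketch below illustrates that operation in PyTorch under stated assumptions; it is not the authors' code, the function and tensor names (backward_warp, ref_past, flow_past, mask) are invented for illustration, and the learned networks that would estimate the flows and mask are omitted.

```python
import torch
import torch.nn.functional as F

def backward_warp(ref, flow):
    """Warp a reference frame toward the current frame with a dense flow field.

    ref:  (N, C, H, W) reference frame
    flow: (N, 2, H, W) displacement in pixels; channel 0 = x, channel 1 = y
    """
    n, _, h, w = ref.shape
    # Base sampling grid in pixel coordinates.
    ys, xs = torch.meshgrid(torch.arange(h), torch.arange(w), indexing="ij")
    base = torch.stack((xs, ys), dim=0).float().to(ref.device)   # (2, H, W)
    coords = base.unsqueeze(0) + flow                            # (N, 2, H, W)
    # Normalize coordinates to [-1, 1], as grid_sample expects.
    gx = 2.0 * coords[:, 0] / (w - 1) - 1.0
    gy = 2.0 * coords[:, 1] / (h - 1) - 1.0
    grid = torch.stack((gx, gy), dim=-1)                         # (N, H, W, 2)
    return F.grid_sample(ref, grid, align_corners=True)

def bidirectional_prediction(ref_past, ref_future, flow_past, flow_future, mask):
    """Blend warped past and future references with a learned mask in [0, 1]."""
    warped_past = backward_warp(ref_past, flow_past)
    warped_future = backward_warp(ref_future, flow_future)
    return mask * warped_past + (1.0 - mask) * warped_future
```

In the paper's hierarchical setting, the two references would be previously coded frames on either side of the current frame in the GOP; here they are just tensors.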
3. Self-Organized Variational Autoencoders (Self-VAE) for Learned Image Compression
- Author
-
Onur Keleş, Hilal Güven, Serkan Kıranyaz, Ahmet Murat Tekalp (ORCID 0000-0003-1465-8121 & YÖK ID 26207), Mustafa Akın Yılmaz, Junaid Malik; College of Engineering, Graduate School of Sciences and Engineering, Department of Electrical and Electronics Engineering
- Subjects
End-to-end learned image compression, Perceptual quality metrics, Rate-distortion performance, Self-organized operational layer, Variational autoencoder, Computer science, Computer vision, Artificial intelligence, Compression, JPEG, Deep learning, Image compression - Abstract
In end-to-end optimized learned image compression, it is standard practice to use a convolutional variational autoencoder with generalized divisive normalization (GDN) to transform images into a latent space. Recently, Operational Neural Networks (ONNs), which learn the best non-linearity from a set of alternatives, and their "self-organized" variants, Self-ONNs, which approximate any non-linearity via a Taylor series, have been proposed to address the limitations of convolutional layers and a fixed nonlinear activation. In this paper, we propose to replace the convolutional and GDN layers in the variational autoencoder with self-organized operational layers, yielding a novel self-organized variational autoencoder (Self-VAE) architecture that benefits from stronger non-linearity. The experimental results demonstrate that the proposed Self-VAE yields improvements in both rate-distortion performance and perceptual image quality. (A minimal sketch of a self-organized operational layer follows this record.) Funding: Scientific and Technological Research Council of Turkey (TÜBİTAK); Koç Üniversitesi İş Bankası Yapay Zeka Uygulama ve Araştırma Merkezi (KUIS AI)/Koç University İş Bank Artificial Intelligence Center (KUIS AI); Turkish Academy of Sciences (TÜBA)
- Published
- 2021
- Full Text
- View/download PDF
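The Self-VAE abstract above replaces convolution + GDN with self-organized operational layers that approximate a non-linearity by a truncated Taylor series. A minimal sketch of one such layer follows, assuming the common Self-ONN formulation y = sum_k conv_k(x^k); the class name SelfOrganizedConv2d and all hyperparameters are illustrative, not the authors' implementation.

```python
import torch
import torch.nn as nn

class SelfOrganizedConv2d(nn.Module):
    """Approximates a learnable non-linearity with a truncated Taylor series:
    each power of the input gets its own convolution and the outputs are summed."""

    def __init__(self, in_ch, out_ch, kernel_size=3, q_order=3, padding=1):
        super().__init__()
        self.convs = nn.ModuleList(
            nn.Conv2d(in_ch, out_ch, kernel_size, padding=padding)
            for _ in range(q_order)
        )

    def forward(self, x):
        out = 0.0
        for k, conv in enumerate(self.convs, start=1):
            out = out + conv(x ** k)   # k-th Taylor term
        return out

# Usage: a drop-in replacement for a Conv2d + fixed-activation pair.
layer = SelfOrganizedConv2d(3, 64)
y = layer(torch.randn(1, 3, 32, 32))   # (1, 64, 32, 32)
```

With q_order=1 the layer reduces to an ordinary convolution; higher orders add the stronger non-linearity the abstract refers to.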
4. DFPN: Deformable Frame Prediction Network
- Author
-
Mustafa Akın Yılmaz and Ahmet Murat Tekalp (ORCID 0000-0003-1465-8121 & YÖK ID 26207); College of Engineering, Department of Electrical and Electronics Engineering
- Subjects
Network architecture, Computer science, Task oriented, Motion modeling, Attention, Deep learning, Deformable convolution, Video frame prediction, Algorithm, Forecasting, Data compression - Abstract
Learned frame prediction is a current problem of interest in computer vision and video processing/compression. Although several deep network architectures have been proposed for learned frame prediction, to the best of our knowledge there is no prior work based on using deformable convolutions for frame prediction. To this end, we propose a deformable frame prediction network (DFPN) for task-oriented implicit motion modeling and next-frame prediction. Experimental results demonstrate that the proposed DFPN model achieves state-of-the-art results in next-frame prediction on sequences with global motion. (A minimal sketch of deformable-convolution prediction follows this record.) Funding: Scientific and Technological Research Council of Turkey (TÜBİTAK); Turkish Academy of Sciences (TÜBA); Koç Üniversitesi İş Bankası Yapay Zeka Uygulama ve Araştırma Merkezi (KUIS AI)/Koç University İş Bank Artificial Intelligence Center (KUIS AI)
- Published
- 2021
- Full Text
- View/download PDF
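DFPN's key idea, per the abstract above, is implicit motion modeling via deformable convolution. The sketch below shows the mechanism using torchvision.ops.DeformConv2d: a small convolutional head predicts per-position sampling offsets from stacked input frames, and the deformable convolution aggregates pixels at those offsets. This is an assumed toy model, not the DFPN architecture; all names and sizes are illustrative.

```python
import torch
import torch.nn as nn
from torchvision.ops import DeformConv2d

class DeformablePredictor(nn.Module):
    def __init__(self, num_input_frames=4, feat=32, k=3):
        super().__init__()
        in_ch = 3 * num_input_frames                  # stacked RGB frames
        # Offset head: 2 values (dx, dy) per kernel tap, predicted per position.
        self.offset_head = nn.Conv2d(in_ch, 2 * k * k, k, padding=k // 2)
        self.deform = DeformConv2d(in_ch, feat, k, padding=k // 2)
        self.out = nn.Conv2d(feat, 3, 3, padding=1)

    def forward(self, frames):
        x = torch.cat(frames, dim=1)                  # (N, 3*T, H, W)
        offsets = self.offset_head(x)                 # implicit motion as offsets
        return self.out(torch.relu(self.deform(x, offsets)))

frames = [torch.randn(1, 3, 64, 64) for _ in range(4)]
pred = DeformablePredictor()(frames)                  # predicted next frame
```

Because the offsets are trained only through the prediction loss, the motion model is task-oriented rather than supervised by ground-truth flow.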
5. On the Computation of PSNR for a Set of Images or Video
- Author
-
Onur Keleş, Mustafa Akın Yılmaz, Ahmet Murat Tekalp (ORCID 0000-0003-1465-8121 & YÖK ID 26207), Cansu Korkmaz, Zafer Doğan (ORCID 0000-0002-5078-4590 & YÖK ID 280658); College of Engineering, Graduate School of Sciences and Engineering, Department of Electrical and Electronics Engineering
- Subjects
Arithmetic mean, Geometric mean, MSE, PSNR, RGB-PSNR, Y-channel PSNR, Exponential distribution, Computer science, Image and video processing, Electrical and electronic engineering, Telecommunications, Transportation science and technology, RGB color model, Noise (video), Algorithm, Multimedia, Image restoration - Abstract
When comparing learned image/video restoration and compression methods, it is common to report peak signal-to-noise ratio (PSNR) results. However, there is no generally agreed-upon practice for computing PSNR for sets of images or video. Some authors report the average of individual image/frame PSNRs, which is equivalent to computing a single PSNR from the geometric mean of individual image/frame mean-square errors (MSE). Others compute a single PSNR from the arithmetic mean of frame MSEs for each video. Furthermore, some compute the MSE/PSNR of the Y channel only, while others compute MSE/PSNR over RGB channels. This paper investigates different approaches to computing PSNR for sets of images, single video, and sets of video, and the relations between them. We show that the difference between PSNR computed from the arithmetic vs. geometric mean of MSE depends on the distribution of MSE over the set of images or video, and that this distribution is task-dependent. In particular, the two methods yield larger differences in restoration problems, where the MSE is exponentially distributed, and smaller differences in compression problems, where the MSE distribution is narrower. We hope this paper will motivate the community to clearly describe how they compute reported PSNR values to enable consistent comparison. (A small numerical illustration follows this record.) Note: accepted for publication in Picture Coding Symposium (PCS) 2021.
- Published
- 2021
- Full Text
- View/download PDF
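The following small numerical example, with hypothetical MSE values, illustrates the paper's central observation: averaging per-frame PSNRs is identical to computing one PSNR from the geometric mean of per-frame MSEs, and both differ from the PSNR of the arithmetic-mean MSE, most visibly when the MSE distribution is skewed.

```python
import numpy as np

mse = np.array([10.0, 20.0, 400.0])     # hypothetical per-frame MSEs, 8-bit peak
peak = 255.0
psnr = lambda m: 10.0 * np.log10(peak ** 2 / m)

avg_of_psnrs  = psnr(mse).mean()                    # common reporting practice
psnr_geo_mean = psnr(np.exp(np.log(mse).mean()))    # PSNR of geometric-mean MSE
psnr_ari_mean = psnr(mse.mean())                    # PSNR of arithmetic-mean MSE

print(avg_of_psnrs, psnr_geo_mean)   # both ~31.79 dB (identical by algebra)
print(psnr_ari_mean)                 # ~26.57 dB for this skewed MSE set
```

The identity holds because the mean of 10*log10(peak^2/m_i) equals 10*log10 of peak^2 over the geometric mean of m_i: averaging logs is the log of a geometric mean.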
6. Video Frame Prediction via Deep Learning
- Author
-
Mustafa Akın Yılmaz and Ahmet Murat Tekalp (ORCID 0000-0003-1465-8121 & YÖK ID 26207); College of Engineering, Department of Electrical and Electronics Engineering
- Subjects
Computer science, Deep learning, Backpropagation through time, Pattern recognition, Artificial intelligence, Residual, Electrical and electronic engineering, Telecommunications, Convolutional neural network, Frame prediction, Recurrent network architectures, Stateful training, Convolutional network architectures - Abstract
This paper provides new results over our previous work presented at ICIP 2019 on the performance of learned frame prediction architectures and associated training methods. More specifically, we show that using an end-to-end residual connection in the fully convolutional neural network (FCNN) improves prediction performance. To provide comparative results, we trained a residual FCNN, a convolutional RNN (CRNN), and a convolutional long short-term memory (CLSTM) network for next-frame prediction using the mean-square loss. We performed both stateless and stateful training for the recurrent networks. Experimental results show that the residual FCNN architecture performs best in terms of peak signal-to-noise ratio (PSNR) at the expense of higher training and test (inference) computational complexity. The CRNN can be stably and efficiently trained using the stateful truncated backpropagation-through-time procedure, and requires an order of magnitude less inference runtime to achieve acceptable performance in near real time. (A minimal sketch of the residual connection follows this record.) Funding: Scientific and Technological Research Council of Turkey (TÜBİTAK); Turkish Academy of Sciences (TÜBA)
- Published
- 2020
- Full Text
- View/download PDF
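A minimal sketch of the end-to-end residual connection described above: the network predicts only the change from the last observed frame, and the final output adds that residual back. This is an assumed simplification (grayscale frames stacked as channels), not the authors' trained architecture.

```python
import torch
import torch.nn as nn

class ResidualFCNN(nn.Module):
    def __init__(self, num_input_frames=4, feat=64, depth=4):
        super().__init__()
        layers = [nn.Conv2d(num_input_frames, feat, 3, padding=1), nn.ReLU()]
        for _ in range(depth - 2):
            layers += [nn.Conv2d(feat, feat, 3, padding=1), nn.ReLU()]
        layers += [nn.Conv2d(feat, 1, 3, padding=1)]
        self.body = nn.Sequential(*layers)

    def forward(self, frames):                 # frames: (N, T, H, W), grayscale
        residual = self.body(frames)           # predicted change
        return frames[:, -1:] + residual       # last observed frame + residual

pred = ResidualFCNN()(torch.randn(2, 4, 64, 64))   # (2, 1, 64, 64) next frame
```

Learning the residual rather than the full frame is what the abstract credits for the PSNR improvement.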
7. Effect of Architectures and Training Methods on the Performance of Learned Video Frame Prediction
- Author
-
A. Murat Tekalp and M. Akin Yilmaz
- Subjects
Computer science, Computer vision and pattern recognition, Image and video processing, Pattern recognition, Convolutional neural network, Memory management, Recurrent neural network, Backpropagation through time, Artificial intelligence - Abstract
We analyze the performance of feedforward vs. recurrent neural network (RNN) architectures and associated training methods for learned frame prediction. To this effect, we trained a residual fully convolutional neural network (FCNN), a convolutional RNN (CRNN), and a convolutional long short-term memory (CLSTM) network for next-frame prediction using the mean-square loss. We performed both stateless and stateful training for the recurrent networks. Experimental results show that the residual FCNN architecture performs best in terms of peak signal-to-noise ratio (PSNR) at the expense of higher training and test (inference) computational complexity. The CRNN can be trained stably and very efficiently using the stateful truncated backpropagation-through-time procedure, and it requires an order of magnitude less inference runtime to achieve near real-time frame prediction with acceptable performance. (A minimal sketch of stateful truncated backpropagation through time follows this record.) Note: accepted for publication at IEEE ICIP 2019.
- Published
- 2020
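The stateful truncated backpropagation through time mentioned above can be sketched as follows: the recurrent state is carried across consecutive training chunks (stateful) but detached from the autograd graph at chunk boundaries (truncated), so memory stays bounded while temporal context is preserved. This is a generic, assumed training loop; the model(frame, state) -> (pred, state) convention is invented for illustration.

```python
import torch

def train_stateful_tbptt(model, optimizer, loss_fn, video, chunk_len=8):
    """video: (T, N, C, H, W) long training sequence."""
    state = None                                   # model defines initial state
    for t0 in range(0, video.size(0) - 1, chunk_len):
        chunk = video[t0 : t0 + chunk_len + 1]     # overlap by one target frame
        if chunk.size(0) < 2:
            break
        loss = 0.0
        for t in range(chunk.size(0) - 1):
            pred, state = model(chunk[t], state)
            loss = loss + loss_fn(pred, chunk[t + 1])   # next-frame target
        optimizer.zero_grad()
        loss.backward()
        optimizer.step()
        # Truncation point: keep state values, drop their gradient history.
        state = tuple(s.detach() for s in state) if isinstance(state, tuple) \
                else state.detach()
```

Stateless training would instead reset the state to None for every chunk, discarding long-range temporal context.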
8. End-to-end rate-distortion optimization for bi-directional learned video compression
- Author
-
Mustafa Akın Yılmaz and Ahmet Murat Tekalp (ORCID 0000-0003-1465-8121 & YÖK ID 26207); College of Engineering, Department of Electrical and Electronics Engineering
- Subjects
Image compression, Computer science, Computer vision and pattern recognition, Image and video processing, Motion estimation, Codec, Entropy encoding, Quantization (image processing), Quantization (signal processing), Motion compensation, Bi-directional motion compensation, Deep learning, End-to-end optimization, Group of pictures, Video compression, Rate-distortion optimization, Algorithm, Data compression - Abstract
Conventional video compression methods employ a linear transform and block motion model, and the steps of motion estimation, mode and quantization parameter selection, and entropy coding are optimized individually due to the combinatorial nature of the end-to-end optimization problem. Learned video compression allows end-to-end rate-distortion optimized training of all nonlinear modules, quantization parameters, and the entropy model simultaneously. While previous work on learned video compression considered training a sequential video codec based on end-to-end optimization of a cost averaged over pairs of successive frames, it is well known in conventional video compression that hierarchical, bi-directional coding outperforms sequential compression. In this paper, we propose, for the first time, end-to-end optimization of a hierarchical, bi-directional motion-compensated learned codec by accumulating the cost function over fixed-size groups of pictures (GOP). Experimental results show that the rate-distortion performance of our proposed learned bi-directional GOP coder outperforms the state-of-the-art end-to-end optimized learned sequential compression, as expected. (A minimal sketch of GOP-level loss accumulation follows this record.) Note: accepted for publication at IEEE ICIP 2020.
- Published
- 2020
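The GOP-level training described above replaces the per-frame-pair cost with a rate-distortion loss accumulated over a whole group of pictures. A minimal sketch under assumed interfaces follows; codec(frame, refs) -> (reconstruction, bits) is an invented signature, the reference structure is simplified to a running list of coded frames rather than the paper's hierarchical bi-directional ordering, and lmbda is the usual rate-distortion trade-off weight.

```python
import torch

def gop_rd_loss(codec, gop, lmbda=0.01):
    """gop: list of (N, C, H, W) frames; returns total R-D cost over the GOP."""
    total = 0.0
    refs = [gop[0]]                                # anchor (intra-coded) frame
    for frame in gop[1:]:
        recon, bits = codec(frame, refs)           # rate as estimated bits
        distortion = torch.mean((recon - frame) ** 2)
        total = total + bits + lmbda * distortion  # per-frame L = R + lambda * D
        refs.append(recon)                         # coded frames become references
    return total                                   # backpropagate once per GOP
```

Accumulating the loss over the GOP lets gradients account for the fact that coded frames serve as references for later frames, which per-pair training cannot capture.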
9. NTIRE 2019 Challenge on Video Super-Resolution: Methods and Results
- Author
-
Shuhang Gu, Greg Shakhnarovich, Fatih Porikli, Kyoung Mu Lee, Zhongyuan Wang, Yulun Zhang, Seokil Hong, M. Akin Yilmaz, Kuldeep Purohit, Si Miao, A. S. Mandal, Yapeng Tian, A. Murat Tekalp, Norimichi Ukita, Sanghyun Son, Junjun Jiang, Yun Fu, A.N. Rajagopalan, Chenliang Xu, Gyeongsik Moon, Sungyong Baik, Ankit Shukla, Zhe Hu, Chen Change Loy, Manoj Sharma, Chao Li, Ding Yukang, Dong Un Kang, Yongxin Zhu, Santanu Chaudhury, Ratheesh Kalarot, Hang Dong, Avinash Upadhyay, Muhammad Haris, Ke Yu, Thomas S. Huang, Megh Makwana, Wang Xintao, Xinyi Zhang, Peng Yi, Jiahui Yu, Kwanyoung Kim, Kui Jiang, Chao Dong, Xiao Huo, Rudrabha Mukhopadhyay, Yuchen Fan, Dongliang He, Ding Liu, Se Young Chun, Shilei Wen, Ajay Pratap Singh, Xiao Liu, Kelvin C.K. Chan, Cansu Korkmaz, Jiayi Ma, Radu Timofte, Anuj Badhwar, Seungjun Nah, Dheeraj Khanna; College of Engineering, Department of Electrical and Electronics Engineering
- Subjects
Computer science, Image resolution, Image restoration, Video signal processing, Superresolution, Optical resolving power, Bicubic interpolation, Computer vision, Artificial intelligence - Abstract
This paper reviews the first NTIRE challenge on video super-resolution (restoration of rich details in low-resolution video frames) with a focus on the proposed solutions and results. A new REalistic and Diverse Scenes dataset (REDS) was employed. The challenge was divided into two tracks: Track 1 employed the standard bicubic downscaling setup, while Track 2 used realistic dynamic motion blur. The two tracks had 124 and 104 registered participants, respectively, and a total of 14 teams competed in the final testing phase. The proposed methods gauge the state of the art in video super-resolution.
- Published
- 2019
- Full Text
- View/download PDF