9 results for "M. Akin Yilmaz"
Search Results
2. End-to-End Rate-Distortion Optimized Learned Hierarchical Bi-Directional Video Compression
- Author
-
Mustafa Akın Yılmaz and Ahmet Murat Tekalp (ORCID 0000-0003-1465-8121 & YÖK ID 26207); College of Engineering, Department of Electrical and Electronics Engineering
- Subjects
Bidirectional control, Image coding, Video compression, Motion compensation, Optimization, Entropy, Video codecs, Learned video compression, Learned bi-directional motion compensation, Flow field sub-sampling, Flow vector prediction, End-to-end optimization, Computer science, Artificial intelligence, Electrical and electronic engineering, Computer Graphics and Computer-Aided Design, Software - Abstract
Conventional video compression (VC) methods are based on motion-compensated transform coding, where the steps of motion estimation, mode and quantization parameter selection, and entropy coding are optimized individually due to the combinatorial nature of the end-to-end optimization problem. Learned VC allows end-to-end rate-distortion (R-D) optimized training of the nonlinear transform, motion, and entropy models simultaneously. Most works on learned VC consider end-to-end optimization of a sequential video codec based on an R-D loss averaged over pairs of successive frames. It is well known in conventional VC that hierarchical, bi-directional coding outperforms sequential compression because of its ability to use both past and future reference frames. This paper proposes a learned hierarchical bi-directional video codec (LHBDC) that combines the benefits of hierarchical motion-compensated prediction and end-to-end optimization. Experimental results show that we achieve the best R-D results reported for learned VC schemes to date in both PSNR and MS-SSIM. Compared to conventional video codecs, our end-to-end optimized codec outperforms both the x265 and SVT-HEVC encoders ("veryslow" preset) in PSNR and MS-SSIM, as well as the HM 16.23 reference software in MS-SSIM. We present ablation studies showing performance gains due to proposed novel tools such as learned masking, flow-field subsampling, and temporal flow vector prediction. The models and instructions to reproduce our results can be found at https://github.com/makinyilmaz/LHBDC/. (A minimal sketch of bi-directional motion-compensated prediction follows this record.) Funding: Scientific and Technological Research Council of Turkey (TÜBİTAK); Turkish Academy of Sciences (TÜBA)
- Published
- 2022
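The LHBDC abstract above centers on bi-directional motion-compensated prediction: warping a past and a future reference frame toward the current frame and blending them with a learned mask. The sketch below illustrates that operation in PyTorch under stated assumptions; it is not the authors' code, the function and tensor names (backward_warp, ref_past, flow_past, mask) are invented for illustration, and the learned networks that would estimate the flows and mask are omitted.

```python
import torch
import torch.nn.functional as F

def backward_warp(ref, flow):
    """Warp a reference frame toward the current frame with a dense flow field.

    ref:  (N, C, H, W) reference frame
    flow: (N, 2, H, W) displacement in pixels; channel 0 = x, channel 1 = y
    """
    n, _, h, w = ref.shape
    # Base sampling grid in pixel coordinates.
    ys, xs = torch.meshgrid(torch.arange(h), torch.arange(w), indexing="ij")
    base = torch.stack((xs, ys), dim=0).float().to(ref.device)   # (2, H, W)
    coords = base.unsqueeze(0) + flow                            # (N, 2, H, W)
    # Normalize coordinates to [-1, 1], as grid_sample expects.
    gx = 2.0 * coords[:, 0] / (w - 1) - 1.0
    gy = 2.0 * coords[:, 1] / (h - 1) - 1.0
    grid = torch.stack((gx, gy), dim=-1)                         # (N, H, W, 2)
    return F.grid_sample(ref, grid, align_corners=True)

def bidirectional_prediction(ref_past, ref_future, flow_past, flow_future, mask):
    """Blend warped past and future references with a learned mask in [0, 1]."""
    warped_past = backward_warp(ref_past, flow_past)
    warped_future = backward_warp(ref_future, flow_future)
    return mask * warped_past + (1.0 - mask) * warped_future
```

In the paper's hierarchical setting, the two references would be previously coded frames on either side of the current frame in the GOP; here they are just tensors.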
3. Self-Organized Variational Autoencoders (Self-VAE) for Learned Image Compression
- Author
-
Onur Keleş, Hilal Güven, Serkan Kıranyaz, Ahmet Murat Tekalp (ORCID 0000-0003-1465-8121 & YÖK ID 26207), Mustafa Akın Yılmaz, Junaid Malik; College of Engineering, Graduate School of Sciences and Engineering, Department of Electrical and Electronics Engineering
- Subjects
End-to-end learned image compression, Perceptual quality metrics, Rate-distortion performance, Self-organized operational layer, Variational autoencoder, Computer science, Computer vision, Artificial intelligence, Compression, JPEG, Deep learning, Image compression - Abstract
In end-to-end optimized learned image compression, it is standard practice to use a convolutional variational autoencoder with generalized divisive normalization (GDN) to transform images into a latent space. Recently, Operational Neural Networks (ONNs), which learn the best non-linearity from a set of alternatives, and their "self-organized" variants, Self-ONNs, which approximate any non-linearity via a Taylor series, have been proposed to address the limitations of convolutional layers and a fixed nonlinear activation. In this paper, we propose to replace the convolutional and GDN layers in the variational autoencoder with self-organized operational layers, yielding a novel self-organized variational autoencoder (Self-VAE) architecture that benefits from stronger non-linearity. The experimental results demonstrate that the proposed Self-VAE yields improvements in both rate-distortion performance and perceptual image quality. (A minimal sketch of a self-organized operational layer follows this record.) Funding: Scientific and Technological Research Council of Turkey (TÜBİTAK); Koç Üniversitesi İş Bankası Yapay Zeka Uygulama ve Araştırma Merkezi (KUIS AI)/Koç University İş Bank Artificial Intelligence Center (KUIS AI); Turkish Academy of Sciences (TÜBA)
- Published
- 2021
- Full Text
- View/download PDF
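The Self-VAE abstract above replaces convolution + GDN with self-organized operational layers that approximate a non-linearity by a truncated Taylor series. A minimal sketch of one such layer follows, assuming the common Self-ONN formulation y = sum_k conv_k(x^k); the class name SelfOrganizedConv2d and all hyperparameters are illustrative, not the authors' implementation.

```python
import torch
import torch.nn as nn

class SelfOrganizedConv2d(nn.Module):
    """Approximates a learnable non-linearity with a truncated Taylor series:
    each power of the input gets its own convolution and the outputs are summed."""

    def __init__(self, in_ch, out_ch, kernel_size=3, q_order=3, padding=1):
        super().__init__()
        self.convs = nn.ModuleList(
            nn.Conv2d(in_ch, out_ch, kernel_size, padding=padding)
            for _ in range(q_order)
        )

    def forward(self, x):
        out = 0.0
        for k, conv in enumerate(self.convs, start=1):
            out = out + conv(x ** k)   # k-th Taylor term
        return out

# Usage: a drop-in replacement for a Conv2d + fixed-activation pair.
layer = SelfOrganizedConv2d(3, 64)
y = layer(torch.randn(1, 3, 32, 32))   # (1, 64, 32, 32)
```

With q_order=1 the layer reduces to an ordinary convolution; higher orders add the stronger non-linearity the abstract refers to.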
4. DFPN: Deformable Frame Prediction Network
- Author
-
Mustafa Akın Yılmaz and Ahmet Murat Tekalp (ORCID 0000-0003-1465-8121 & YÖK ID 26207); College of Engineering, Department of Electrical and Electronics Engineering
- Subjects
Network architecture, Computer science, Task oriented, Motion modeling, Attention, Deep learning, Deformable convolution, Video frame prediction, Algorithm, Forecasting, Data compression - Abstract
Learned frame prediction is a current problem of interest in computer vision and video processing/compression. Although several deep network architectures have been proposed for learned frame prediction, to the best of our knowledge there is no prior work based on using deformable convolutions for frame prediction. To this end, we propose a deformable frame prediction network (DFPN) for task-oriented implicit motion modeling and next-frame prediction. Experimental results demonstrate that the proposed DFPN model achieves state-of-the-art results in next-frame prediction on sequences with global motion. (A minimal sketch of deformable-convolution prediction follows this record.) Funding: Scientific and Technological Research Council of Turkey (TÜBİTAK); Turkish Academy of Sciences (TÜBA); Koç Üniversitesi İş Bankası Yapay Zeka Uygulama ve Araştırma Merkezi (KUIS AI)/Koç University İş Bank Artificial Intelligence Center (KUIS AI)
- Published
- 2021
- Full Text
- View/download PDF
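DFPN's key idea, per the abstract above, is implicit motion modeling via deformable convolution. The sketch below shows the mechanism using torchvision.ops.DeformConv2d: a small convolutional head predicts per-position sampling offsets from stacked input frames, and the deformable convolution aggregates pixels at those offsets. This is an assumed toy model, not the DFPN architecture; all names and sizes are illustrative.

```python
import torch
import torch.nn as nn
from torchvision.ops import DeformConv2d

class DeformablePredictor(nn.Module):
    def __init__(self, num_input_frames=4, feat=32, k=3):
        super().__init__()
        in_ch = 3 * num_input_frames                  # stacked RGB frames
        # Offset head: 2 values (dx, dy) per kernel tap, predicted per position.
        self.offset_head = nn.Conv2d(in_ch, 2 * k * k, k, padding=k // 2)
        self.deform = DeformConv2d(in_ch, feat, k, padding=k // 2)
        self.out = nn.Conv2d(feat, 3, 3, padding=1)

    def forward(self, frames):
        x = torch.cat(frames, dim=1)                  # (N, 3*T, H, W)
        offsets = self.offset_head(x)                 # implicit motion as offsets
        return self.out(torch.relu(self.deform(x, offsets)))

frames = [torch.randn(1, 3, 64, 64) for _ in range(4)]
pred = DeformablePredictor()(frames)                  # predicted next frame
```

Because the offsets are trained only through the prediction loss, the motion model is task-oriented rather than supervised by ground-truth flow.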
5. On the Computation of PSNR for a Set of Images or Video
- Author
-
Onur Keleş, Mustafa Akın Yılmaz, Ahmet Murat Tekalp (ORCID 0000-0003-1465-8121 & YÖK ID 26207), Cansu Korkmaz, Zafer Doğan (ORCID 0000-0002-5078-4590 & YÖK ID 280658); College of Engineering, Graduate School of Sciences and Engineering, Department of Electrical and Electronics Engineering
- Subjects
Arithmetic mean, Geometric mean, MSE, PSNR, RGB-PSNR, Y-channel PSNR, Exponential distribution, Computer science, Image and video processing, Electrical and electronic engineering, Telecommunications, Transportation science and technology, RGB color model, Noise (video), Algorithm, Multimedia, Image restoration - Abstract
When comparing learned image/video restoration and compression methods, it is common to report peak signal-to-noise ratio (PSNR) results. However, there is no generally agreed-upon practice for computing PSNR for sets of images or video. Some authors report the average of individual image/frame PSNRs, which is equivalent to computing a single PSNR from the geometric mean of individual image/frame mean-square errors (MSE). Others compute a single PSNR from the arithmetic mean of frame MSEs for each video. Furthermore, some compute the MSE/PSNR of the Y channel only, while others compute MSE/PSNR over RGB channels. This paper investigates different approaches to computing PSNR for sets of images, single video, and sets of video, and the relations between them. We show that the difference between PSNR computed from the arithmetic vs. geometric mean of MSE depends on the distribution of MSE over the set of images or video, and that this distribution is task-dependent. In particular, the two methods yield larger differences in restoration problems, where the MSE is exponentially distributed, and smaller differences in compression problems, where the MSE distribution is narrower. We hope this paper will motivate the community to clearly describe how they compute reported PSNR values to enable consistent comparison. (A small numerical illustration follows this record.) Note: accepted for publication in Picture Coding Symposium (PCS) 2021.
- Published
- 2021
- Full Text
- View/download PDF
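The following small numerical example, with hypothetical MSE values, illustrates the paper's central observation: averaging per-frame PSNRs is identical to computing one PSNR from the geometric mean of per-frame MSEs, and both differ from the PSNR of the arithmetic-mean MSE, most visibly when the MSE distribution is skewed.

```python
import numpy as np

mse = np.array([10.0, 20.0, 400.0])     # hypothetical per-frame MSEs, 8-bit peak
peak = 255.0
psnr = lambda m: 10.0 * np.log10(peak ** 2 / m)

avg_of_psnrs  = psnr(mse).mean()                    # common reporting practice
psnr_geo_mean = psnr(np.exp(np.log(mse).mean()))    # PSNR of geometric-mean MSE
psnr_ari_mean = psnr(mse.mean())                    # PSNR of arithmetic-mean MSE

print(avg_of_psnrs, psnr_geo_mean)   # both ~31.79 dB (identical by algebra)
print(psnr_ari_mean)                 # ~26.57 dB for this skewed MSE set
```

The identity holds because the mean of 10*log10(peak^2/m_i) equals 10*log10 of peak^2 over the geometric mean of m_i: averaging logs is the log of a geometric mean.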
6. Video Frame Prediction via Deep Learning
- Author
-
Mustafa Akın Yılmaz and Ahmet Murat Tekalp (ORCID 0000-0003-1465-8121 & YÖK ID 26207); College of Engineering, Department of Electrical and Electronics Engineering
- Subjects
Computer science, Deep learning, Backpropagation through time, Pattern recognition, Artificial intelligence, Residual, Electrical and electronic engineering, Telecommunications, Convolutional neural network, Frame prediction, Recurrent network architectures, Stateful training, Convolutional network architectures - Abstract
This paper provides new results over our previous work presented at ICIP 2019 on the performance of learned frame prediction architectures and associated training methods. More specifically, we show that using an end-to-end residual connection in the fully convolutional neural network (FCNN) improves prediction performance. To provide comparative results, we trained a residual FCNN, a convolutional RNN (CRNN), and a convolutional long short-term memory (CLSTM) network for next-frame prediction using the mean-square loss. We performed both stateless and stateful training for the recurrent networks. Experimental results show that the residual FCNN architecture performs best in terms of peak signal-to-noise ratio (PSNR) at the expense of higher training and test (inference) computational complexity. The CRNN can be stably and efficiently trained using the stateful truncated backpropagation-through-time procedure, and requires an order of magnitude less inference runtime to achieve acceptable performance in near real time. (A minimal sketch of the residual connection follows this record.) Funding: Scientific and Technological Research Council of Turkey (TÜBİTAK); Turkish Academy of Sciences (TÜBA)
- Published
- 2020
- Full Text
- View/download PDF
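A minimal sketch of the end-to-end residual connection described above: the network predicts only the change from the last observed frame, and the final output adds that residual back. This is an assumed simplification (grayscale frames stacked as channels), not the authors' trained architecture.

```python
import torch
import torch.nn as nn

class ResidualFCNN(nn.Module):
    def __init__(self, num_input_frames=4, feat=64, depth=4):
        super().__init__()
        layers = [nn.Conv2d(num_input_frames, feat, 3, padding=1), nn.ReLU()]
        for _ in range(depth - 2):
            layers += [nn.Conv2d(feat, feat, 3, padding=1), nn.ReLU()]
        layers += [nn.Conv2d(feat, 1, 3, padding=1)]
        self.body = nn.Sequential(*layers)

    def forward(self, frames):                 # frames: (N, T, H, W), grayscale
        residual = self.body(frames)           # predicted change
        return frames[:, -1:] + residual       # last observed frame + residual

pred = ResidualFCNN()(torch.randn(2, 4, 64, 64))   # (2, 1, 64, 64) next frame
```

Learning the residual rather than the full frame is what the abstract credits for the PSNR improvement.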
7. Effect of Architectures and Training Methods on the Performance of Learned Video Frame Prediction
- Author
-
A. Murat Tekalp and M. Akin Yilmaz
- Subjects
Computer science, Computer vision and pattern recognition, Image and video processing, Pattern recognition, Convolutional neural network, Memory management, Recurrent neural network, Backpropagation through time, Artificial intelligence - Abstract
We analyze the performance of feedforward vs. recurrent neural network (RNN) architectures and associated training methods for learned frame prediction. To this effect, we trained a residual fully convolutional neural network (FCNN), a convolutional RNN (CRNN), and a convolutional long short-term memory (CLSTM) network for next-frame prediction using the mean-square loss. We performed both stateless and stateful training for the recurrent networks. Experimental results show that the residual FCNN architecture performs best in terms of peak signal-to-noise ratio (PSNR) at the expense of higher training and test (inference) computational complexity. The CRNN can be trained stably and very efficiently using the stateful truncated backpropagation-through-time procedure, and it requires an order of magnitude less inference runtime to achieve near real-time frame prediction with acceptable performance. (A minimal sketch of stateful truncated backpropagation through time follows this record.) Note: accepted for publication at IEEE ICIP 2019.
- Published
- 2020
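The stateful truncated backpropagation through time mentioned above can be sketched as follows: the recurrent state is carried across consecutive training chunks (stateful) but detached from the autograd graph at chunk boundaries (truncated), so memory stays bounded while temporal context is preserved. This is a generic, assumed training loop; the model(frame, state) -> (pred, state) convention is invented for illustration.

```python
import torch

def train_stateful_tbptt(model, optimizer, loss_fn, video, chunk_len=8):
    """video: (T, N, C, H, W) long training sequence."""
    state = None                                   # model defines initial state
    for t0 in range(0, video.size(0) - 1, chunk_len):
        chunk = video[t0 : t0 + chunk_len + 1]     # overlap by one target frame
        if chunk.size(0) < 2:
            break
        loss = 0.0
        for t in range(chunk.size(0) - 1):
            pred, state = model(chunk[t], state)
            loss = loss + loss_fn(pred, chunk[t + 1])   # next-frame target
        optimizer.zero_grad()
        loss.backward()
        optimizer.step()
        # Truncation point: keep state values, drop their gradient history.
        state = tuple(s.detach() for s in state) if isinstance(state, tuple) \
                else state.detach()
```

Stateless training would instead reset the state to None for every chunk, discarding long-range temporal context.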
8. End-to-end rate-distortion optimization for bi-directional learned video compression
- Author
-
Mustafa Akın Yılmaz and Ahmet Murat Tekalp (ORCID 0000-0003-1465-8121 & YÖK ID 26207); College of Engineering, Department of Electrical and Electronics Engineering
- Subjects
Image compression, Computer science, Computer vision and pattern recognition, Image and video processing, Motion estimation, Codec, Entropy encoding, Quantization (image processing), Quantization (signal processing), Motion compensation, Bi-directional motion compensation, Deep learning, End-to-end optimization, Group of pictures, Video compression, Rate-distortion optimization, Algorithm, Data compression - Abstract
Conventional video compression methods employ a linear transform and block motion model, and the steps of motion estimation, mode and quantization parameter selection, and entropy coding are optimized individually due to the combinatorial nature of the end-to-end optimization problem. Learned video compression allows end-to-end rate-distortion optimized training of all nonlinear modules, quantization parameters, and the entropy model simultaneously. While previous work on learned video compression considered training a sequential video codec based on end-to-end optimization of a cost averaged over pairs of successive frames, it is well known in conventional video compression that hierarchical, bi-directional coding outperforms sequential compression. In this paper, we propose, for the first time, end-to-end optimization of a hierarchical, bi-directional motion-compensated learned codec by accumulating the cost function over fixed-size groups of pictures (GOP). Experimental results show that the rate-distortion performance of our proposed learned bi-directional GOP coder outperforms the state-of-the-art end-to-end optimized learned sequential compression, as expected. (A minimal sketch of GOP-level loss accumulation follows this record.) Note: accepted for publication at IEEE ICIP 2020.
- Published
- 2020
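The GOP-level training described above replaces the per-frame-pair cost with a rate-distortion loss accumulated over a whole group of pictures. A minimal sketch under assumed interfaces follows; codec(frame, refs) -> (reconstruction, bits) is an invented signature, the reference structure is simplified to a running list of coded frames rather than the paper's hierarchical bi-directional ordering, and lmbda is the usual rate-distortion trade-off weight.

```python
import torch

def gop_rd_loss(codec, gop, lmbda=0.01):
    """gop: list of (N, C, H, W) frames; returns total R-D cost over the GOP."""
    total = 0.0
    refs = [gop[0]]                                # anchor (intra-coded) frame
    for frame in gop[1:]:
        recon, bits = codec(frame, refs)           # rate as estimated bits
        distortion = torch.mean((recon - frame) ** 2)
        total = total + bits + lmbda * distortion  # per-frame L = R + lambda * D
        refs.append(recon)                         # coded frames become references
    return total                                   # backpropagate once per GOP
```

Accumulating the loss over the GOP lets gradients account for the fact that coded frames serve as references for later frames, which per-pair training cannot capture.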
9. NTIRE 2019 Challenge on Video Super-Resolution: Methods and Results
- Author
-
Shuhang Gu, Greg Shakhnarovich, Fatih Porikli, Kyoung Mu Lee, Zhongyuan Wang, Yulun Zhang, Seokil Hong, M. Akin Yilmaz, Kuldeep Purohit, Si Miao, A. S. Mandal, Yapeng Tian, A. Murat Tekalp, Norimichi Ukita, Sanghyun Son, Junjun Jiang, Yun Fu, A.N. Rajagopalan, Chenliang Xu, Gyeongsik Moon, Sungyong Baik, Ankit Shukla, Zhe Hu, Chen Change Loy, Manoj Sharma, Chao Li, Ding Yukang, Dong Un Kang, Yongxin Zhu, Santanu Chaudhury, Ratheesh Kalarot, Hang Dong, Avinash Upadhyay, Muhammad Haris, Ke Yu, Thomas S. Huang, Megh Makwana, Wang Xintao, Xinyi Zhang, Peng Yi, Jiahui Yu, Kwanyoung Kim, Kui Jiang, Chao Dong, Xiao Huo, Rudrabha Mukhopadhyay, Yuchen Fan, Dongliang He, Ding Liu, Se Young Chun, Shilei Wen, Ajay Pratap Singh, Xiao Liu, Kelvin C.K. Chan, Cansu Korkmaz, Jiayi Ma, Radu Timofte, Anuj Badhwar, Seungjun Nah, Dheeraj Khanna; College of Engineering, Department of Electrical and Electronics Engineering
- Subjects
Computer science, Image resolution, Image restoration, Video signal processing, Superresolution, Optical resolving power, Bicubic interpolation, Computer vision, Artificial intelligence - Abstract
This paper reviews the first NTIRE challenge on video super-resolution (restoration of rich details in low-resolution video frames) with a focus on the proposed solutions and results. A new REalistic and Diverse Scenes dataset (REDS) was employed. The challenge was divided into two tracks: Track 1 employed the standard bicubic downscaling setup, while Track 2 used realistic dynamic motion blur. The two tracks had 124 and 104 registered participants, respectively, and a total of 14 teams competed in the final testing phase. The proposed methods gauge the state of the art in video super-resolution.
- Published
- 2019
- Full Text
- View/download PDF