Search for author "Orti, Enrique": 440 results

Search Results

1. Parallel Reduced Order Modeling for Digital Twins using High-Performance Computing Workflows

2. Mapping Parallel Matrix Multiplication in GotoBLAS2 to the AMD Versal ACAP for Deep Learning

3. Performance Analysis of Matrix Multiplication for Deep Learning on the Edge

4. Fast Truncated SVD of Sparse and Dense Matrices on Graphics Processors

5. GreenLightningAI: An Efficient AI System with Decoupled Structural and Quantitative Knowledge

6. Automatic Generators for a Family of Matrix Multiplication Routines with Apache TVM

7. Co-Design of the Dense Linear Algebra Software Stack for Multicore Processors

8. Toward matrix multiplication for deep learning inference on the Xilinx Versal

9. Inference with Transformer Encoders on ARM and RISC-V Multicore Processors

10. Tall-and-Skinny QR Factorization for Clusters of GPUs Using High-Performance Building Blocks

11. Enabling Dynamic and Intelligent Workflows for HPC, Data Analytics, and AI Convergence

14. High performance and energy efficient inference for deep learning on ARM processors

15. GEMM-Like Convolution for Deep Learning Inference on the Xilinx Versal

16. Performance Analysis of Convolution Algorithms for Deep Learning on Edge Processors

17. Resiliency in Numerical Algorithm Design for Extreme Scale Simulations

18. Compressed Basis GMRES on High Performance GPUs

19. Ginkgo: A Modern Linear Operator Algebra Framework for High Performance Computing

20. Reproducibility of Parallel Preconditioned Conjugate Gradient in Hybrid Programming Environments

21. High Performance and Portable Convolution Operators for ARM-based Multicore Processors

22. DMR API: Improving cluster productivity by turning applications into malleable

23. Exploiting nested task-parallelism in the $\mathcal{H}$-LU factorization

26. QR Factorization Using Malleable BLAS on Multicore Processors

27. Performance Analysis of Matrix Multiplication for Deep Learning on the Edge

30. Programming Parallel Dense Matrix Factorizations with Look-Ahead and OpenMP

33. Look-Ahead in the Two-Sided Reduction to Compact Band Forms for Symmetric Eigenvalue Problems and the SVD

34. A Case for Malleable Thread-Level Linear Algebra Libraries: The LU Factorization with Partial Pivoting

35. Balanced and Compressed Coordinate Layout for the Sparse Matrix-Vector Product on GPUs

37. Multiprecision Block-Jacobi for Iterative Triangular Solves

38. Structure-Aware Calculation of Many-Electron Wave Function Overlaps on Multicore Processors

39. Architecture-Aware Optimization of an HEVC decoder on Asymmetric Multicore Processors

40. Multi-Threaded Dense Linear Algebra Libraries for Low-Power Asymmetric Multicore Processors

42. Revisiting Conventional Task Schedulers to Exploit Asymmetry in ARM big.LITTLE Architectures for Dense Linear Algebra

43. Performance and Energy Optimization of Matrix Multiplication on Asymmetric big.LITTLE Processors

44. Architecture-Aware Configuration and Scheduling of Matrix Multiplication on Asymmetric Multicore Processors

45. Evaluating Asymmetric Multicore Systems-on-Chip using Iso-Metrics

46. Cholesky and Gram-Schmidt Orthogonalization for Tall-and-Skinny QR Factorizations on Graphics Processors

48. Improved Accuracy and Parallelism for MRRR-based Eigensolvers -- A Mixed Precision Approach

49. Concurrent and Accurate RNA Sequencing on Multicore Platforms

50. Evaluating the NVIDIA Tegra Processor as a Low-Power Alternative for Sparse GPU Computations
