Pushing the Limits for 2D Convolution Computation On CUDA-enabled GPUs

Peng, Chen; Mohamed, Wahib; Shinichiro, Takizawa; Satoshi, Matsuoka

https://ipsj.ixsq.nii.ac.jp/records/186051

Name / File	License
IPSJ-HPC18163022.pdf (4.1 MB)	Copyright (c) 2018 by the Information Processing Society of Japan
Open access

Item type

SIG Technical Reports(1)

Publication date

2018-02-21

Title

Pushing the Limits for 2D Convolution Computation On CUDA-enabled GPUs

Language

eng

Keywords

Subject scheme

Other

Subject

GPU

Resource type

Resource type identifier

http://purl.org/coar/resource_type/c_18gh

Resource type

technical report

Author affiliation

Tokyo Institute of Technology / AIST-Tokyo Tech Real World Big-Data Computation Open Innovation Laboratory, National Institute of Advanced Industrial Science and Technology

Author affiliation

National Institute of Advanced Industrial Science and Technology

Author affiliation

AIST-Tokyo Tech Real World Big-Data Computation Open Innovation Laboratory, National Institute of Advanced Industrial Science and Technology

Author affiliation

Tokyo Institute of Technology / AIST-Tokyo Tech Real World Big-Data Computation Open Innovation Laboratory, National Institute of Advanced Industrial Science and Technology

Author names

Peng, Chen
Mohamed, Wahib
Shinichiro, Takizawa
Satoshi, Matsuoka

Abstract

Description type

Other

Description

The 2D convolution operator is the computational bottleneck in a variety of image processing and machine learning applications. We propose an algorithm that computes convolution by using the register file to cache image data (a technique known as register cache), rather than the user-managed scratch-pad memory. We take advantage of CUDA's warp shuffle functions to accelerate the intra-warp communication of partial results. Unlike GEMM-based, FFT-based, or Winograd methods, our algorithm executes the convolution computation without using any GPU memory as a workspace, and it applies to all filter shapes. Our algorithm performs better than state-of-the-art 2D convolution implementations. Using a single Titan Xp GPU, it is on average 4.7x faster than NPP (NVIDIA Performance Primitives) and 1.8x faster than the highly optimized ArrayFire library.
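
The register-cache and warp-shuffle idea in the abstract can be illustrated with a small CPU emulation. The sketch below is not the authors' kernel: it is plain Python that mimics a single 32-lane warp, and every name in it (`shfl_down`, `conv2d_reference`, `conv2d_warp_tile`) is hypothetical. It assumes that each lane keeps one image column in its own "registers", that partial sums move one lane at a time in the manner of CUDA's `__shfl_down_sync`, and that the filter is applied without flipping (the sliding-window form common in image-processing libraries). Tiling, halos, and memory access patterns, which a real GPU implementation would need, are left out.

```python
WARP_SIZE = 32  # lanes per warp on current NVIDIA GPUs


def shfl_down(values, delta):
    """Emulate a full-warp shuffle-down: lane i reads the value of lane i + delta.

    Lanes whose source lane would fall outside the warp keep their own value,
    which mirrors the behaviour documented for __shfl_down_sync.
    """
    n = len(values)
    return [values[i + delta] if i + delta < n else values[i] for i in range(n)]


def conv2d_reference(image, filt):
    """Direct 'valid' sliding-window sum (no filter flip), used as ground truth."""
    fh, fw = len(filt), len(filt[0])
    out_h = len(image) - fh + 1
    out_w = len(image[0]) - fw + 1
    return [
        [
            sum(filt[m][k] * image[r + m][c + k] for m in range(fh) for k in range(fw))
            for c in range(out_w)
        ]
        for r in range(out_h)
    ]


def conv2d_warp_tile(image, filt):
    """Emulated single-warp tile: the image must be exactly WARP_SIZE columns wide."""
    fh, fw = len(filt), len(filt[0])
    assert all(len(row) == WARP_SIZE for row in image)

    # "Register cache": lane j holds column j of the tile in private storage.
    regs = [[row[lane] for row in image] for lane in range(WARP_SIZE)]

    out_h = len(image) - fh + 1
    out_w = WARP_SIZE - fw + 1  # only lanes 0 .. out_w - 1 end up with valid sums
    out = []
    for r in range(out_h):
        acc = [0] * WARP_SIZE
        # Walk the filter columns from right to left. Before each step the
        # partial sums shift one lane to the left, so lane j finally holds
        # sum over m, k of filt[m][k] * image[r + m][j + k].
        for k in range(fw - 1, -1, -1):
            acc = shfl_down(acc, 1)  # first pass shifts zeros, so it is a no-op
            acc = [
                acc[lane] + sum(filt[m][k] * regs[lane][r + m] for m in range(fh))
                for lane in range(WARP_SIZE)
            ]
        out.append(acc[:out_w])
    return out


if __name__ == "__main__":
    image = [[(r * 7 + c * 3) % 11 for c in range(WARP_SIZE)] for r in range(6)]
    filt = [[1, 2, 1], [0, 1, 0], [-1, 0, 2]]
    print(conv2d_warp_tile(image, filt) == conv2d_reference(image, filt))
```

In this emulation no lane reads another lane's pixels directly; only the running partial sums travel between lanes, which is the role the abstract assigns to the warp shuffle functions. The integer test data keeps the comparison against the direct computation exact.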

Bibliographic record ID

Source identifier type

NCID

Source identifier

AN10463942

Bibliographic information

IPSJ SIG Technical Reports, High Performance Computing (HPC) (研究報告ハイパフォーマンスコンピューティング（HPC）)

Vol. 2018-HPC-163, No. 22, pp. 1-9, issued 2018-02-21

ISSN

Source identifier type

ISSN

Source identifier

2188-8841

Notice

SIG Technical Reports are nonrefereed and hence may later appear in any journals, conferences, symposia, etc.

Publisher

Information Processing Society of Japan
