mor build, output

main
Jeff Moe 2024-02-06 09:10:55 -07:00
parent 1704c80e7c
commit 3eb675a66f
14 changed files with 888 additions and 128 deletions

@ -0,0 +1,52 @@
RocmBandwidthTest Version: 2.6.0
Launch Command is: rocm-bandwidth-test -e
Device Index: 0
Device Type: CPU
Device Name: AMD EPYC 7662 64-Core Processor
Allocatable Memory Size (KB): 263788012
Device Index: 1
Device Type: GPU
Device Name: AMD Radeon Graphics
Device BDF: c3:0.0
Device UUID: GPU-9391a2630862a05e
Allocatable Memory Size (KB): 25149440
Allocatable Memory Size (KB): 25149440
Device Index: 2
Device Type: GPU
Device Name: AMD Radeon Graphics
Device BDF: 83:0.0
Device UUID: GPU-b8774af2c31f0c3b
Allocatable Memory Size (KB): 25149440
Allocatable Memory Size (KB): 25149440
Device Index: 3
Device Type: GPU
Device Name: AMD Radeon Graphics
Device BDF: 48:0.0
Device UUID: GPU-bdc8ded57bfb9196
Allocatable Memory Size (KB): 25149440
Allocatable Memory Size (KB): 25149440
Device Index: 4
Device Type: GPU
Device Name: AMD Radeon Graphics
Device BDF: 03:0.0
Device UUID: GPU-ffecdebca16a2c8e
Allocatable Memory Size (KB): 25149440
Allocatable Memory Size (KB): 25149440
Device Index: 5
Device Type: GPU
Device Name: AMD Radeon Graphics
Device BDF: 06:0.0
Device UUID: GPU-74d6901254ac9411
Allocatable Memory Size (KB): 25149440
Allocatable Memory Size (KB): 25149440
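
The sizes above are reported in KB. As a rough sketch, assuming the tool's "KB" means 1024 bytes (an assumption; the output does not say), the values can be converted to GiB with a small helper. The function name `kb_to_gib` is made up for illustration.

```shell
#!/bin/sh
# Hypothetical helper: convert a size in KB to GiB, assuming 1 KB = 1024 bytes.
kb_to_gib() {
    awk -v kb="$1" 'BEGIN { printf "%.2f\n", kb / 1024 / 1024 }'
}

kb_to_gib 25149440     # per-GPU value from the listing above
kb_to_gib 263788012    # CPU (host) value from the listing above
```

Under that assumption, each GPU reports just under 24 GiB and the host roughly 251.6 GiB.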

@ -0,0 +1,81 @@
....................................................................................................................................................................................................................................................................................................................................................
RocmBandwidthTest Version: 2.6.0
Launch Command is: rocm-bandwidth-test (rocm_bandwidth -a + rocm_bandwidth -A)
Device: 0, AMD EPYC 7662 64-Core Processor
Device: 1, AMD Radeon Graphics, GPU-9391a2630862a05e, c3:0.0
Device: 2, AMD Radeon Graphics, GPU-b8774af2c31f0c3b, 83:0.0
Device: 3, AMD Radeon Graphics, GPU-bdc8ded57bfb9196, 48:0.0
Device: 4, AMD Radeon Graphics, GPU-ffecdebca16a2c8e, 03:0.0
Device: 5, AMD Radeon Graphics, GPU-74d6901254ac9411, 06:0.0
Inter-Device Access
D/D 0 1 2 3 4 5
0 1 1 1 1 1 1
1 1 1 1 1 1 1
2 1 1 1 1 1 1
3 1 1 1 1 1 1
4 1 1 1 1 1 1
5 1 1 1 1 1 1
Inter-Device Numa Distance
D/D 0 1 2 3 4 5
0 0 20 20 20 20 20
1 20 0 40 40 40 40
2 20 40 0 40 40 40
3 20 40 40 0 40 40
4 20 40 40 40 0 40
5 20 40 40 40 40 0
Unidirectional copy peak bandwidth GB/s
D/D 0 1 2 3 4 5
0 N/A 27.995 27.994 27.998 27.952 27.995
1 25.704 1068.612 23.989 23.990 23.989 23.989
2 25.704 25.704 1118.099 23.990 23.990 23.989
3 25.703 25.704 25.704 1069.976 23.990 23.990
4 25.703 25.703 25.704 25.704 1106.677 23.989
5 25.703 23.990 25.703 25.703 25.703 1035.311
Bidirectional copy peak bandwidth GB/s
D/D 0 1 2 3 4 5
0 N/A 46.184 45.644 46.055 45.643 45.740
1 46.184 N/A 47.968 47.965 47.978 47.979
2 45.644 47.968 N/A 47.957 47.977 47.979
3 46.055 47.965 47.957 N/A 47.954 47.979
4 45.643 47.978 47.977 47.954 N/A 47.977
5 45.740 47.979 47.979 47.979 47.977 N/A
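
To compare links at a glance, the unidirectional matrix can be reduced to its slowest and fastest GPU-to-GPU entries. The sketch below is illustrative only: the function name `p2p_range` is made up, and it assumes the matrix is fed in exactly as printed above (a `D/D` header row, then one row per device, with device 0 being the CPU).

```shell
#!/bin/sh
# Hypothetical helper: print the min and max GPU-to-GPU bandwidth
# from a rocm-bandwidth-test style matrix read on stdin.
p2p_range() {
    awk '
        $1 == "D/D" || $1 == 0 { next }      # skip the header row and the CPU row
        {
            for (c = 1; c + 2 <= NF; c++) {  # field c+2 holds column c; column 0 (CPU) is skipped
                if (c == $1) continue         # skip the diagonal (device-local copy)
                v = $(c + 2) + 0
                if (n == 0 || v < min) min = v
                if (n == 0 || v > max) max = v
                n++
            }
        }
        END { printf "min=%.3f max=%.3f\n", min, max }
    '
}

# Unidirectional matrix copied from the output above.
p2p_range <<'EOF'
D/D 0 1 2 3 4 5
0 N/A 27.995 27.994 27.998 27.952 27.995
1 25.704 1068.612 23.989 23.990 23.989 23.989
2 25.704 25.704 1118.099 23.990 23.990 23.989
3 25.703 25.704 25.704 1069.976 23.990 23.990
4 25.703 25.703 25.704 25.704 1106.677 23.989
5 25.703 23.990 25.703 25.703 25.703 1035.311
EOF
```

For this run the GPU-to-GPU entries span roughly 24.0 to 25.7 GB/s.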

@ -0,0 +1,529 @@
ROCk module is loaded
=====================
HSA System Attributes
=====================
Runtime Version: 1.1
System Timestamp Freq.: 1000.000000MHz
Sig. Max Wait Duration: 18446744073709551615 (0xFFFFFFFFFFFFFFFF) (timestamp count)
Machine Model: LARGE
System Endianness: LITTLE
Mwaitx: DISABLED
DMAbuf Support: YES
==========
HSA Agents
==========
*******
Agent 1
*******
Name: AMD EPYC 7662 64-Core Processor
Uuid: CPU-XX
Marketing Name: AMD EPYC 7662 64-Core Processor
Vendor Name: CPU
Feature: None specified
Profile: FULL_PROFILE
Float Round Mode: NEAR
Max Queue Number: 0(0x0)
Queue Min Size: 0(0x0)
Queue Max Size: 0(0x0)
Queue Type: MULTI
Node: 0
Device Type: CPU
Cache Info:
L1: 32768(0x8000) KB
Chip ID: 0(0x0)
ASIC Revision: 0(0x0)
Cacheline Size: 64(0x40)
Max Clock Freq. (MHz): 2000
BDFID: 0
Internal Node ID: 0
Compute Unit: 128
SIMDs per CU: 0
Shader Engines: 0
Shader Arrs. per Eng.: 0
WatchPts on Addr. Ranges:1
Features: None
Pool Info:
Pool 1
Segment: GLOBAL; FLAGS: FINE GRAINED
Size: 263788012(0xfb915ec) KB
Allocatable: TRUE
Alloc Granule: 4KB
Alloc Alignment: 4KB
Accessible by all: TRUE
Pool 2
Segment: GLOBAL; FLAGS: KERNARG, FINE GRAINED
Size: 263788012(0xfb915ec) KB
Allocatable: TRUE
Alloc Granule: 4KB
Alloc Alignment: 4KB
Accessible by all: TRUE
Pool 3
Segment: GLOBAL; FLAGS: COARSE GRAINED
Size: 263788012(0xfb915ec) KB
Allocatable: TRUE
Alloc Granule: 4KB
Alloc Alignment: 4KB
Accessible by all: TRUE
ISA Info:
*******
Agent 2
*******
Name: gfx1100
Uuid: GPU-9391a2630862a05e
Marketing Name: AMD Radeon Graphics
Vendor Name: AMD
Feature: KERNEL_DISPATCH
Profile: BASE_PROFILE
Float Round Mode: NEAR
Max Queue Number: 128(0x80)
Queue Min Size: 64(0x40)
Queue Max Size: 131072(0x20000)
Queue Type: MULTI
Node: 1
Device Type: GPU
Cache Info:
L1: 32(0x20) KB
L2: 6144(0x1800) KB
L3: 98304(0x18000) KB
Chip ID: 29772(0x744c)
ASIC Revision: 0(0x0)
Cacheline Size: 64(0x40)
Max Clock Freq. (MHz): 2526
BDFID: 49920
Internal Node ID: 1
Compute Unit: 96
SIMDs per CU: 2
Shader Engines: 6
Shader Arrs. per Eng.: 2
WatchPts on Addr. Ranges:4
Coherent Host Access: FALSE
Features: KERNEL_DISPATCH
Fast F16 Operation: TRUE
Wavefront Size: 32(0x20)
Workgroup Max Size: 1024(0x400)
Workgroup Max Size per Dimension:
x 1024(0x400)
y 1024(0x400)
z 1024(0x400)
Max Waves Per CU: 32(0x20)
Max Work-item Per CU: 1024(0x400)
Grid Max Size: 4294967295(0xffffffff)
Grid Max Size per Dimension:
x 4294967295(0xffffffff)
y 4294967295(0xffffffff)
z 4294967295(0xffffffff)
Max fbarriers/Workgrp: 32
Packet Processor uCode:: 528
SDMA engine uCode:: 19
IOMMU Support:: None
Pool Info:
Pool 1
Segment: GLOBAL; FLAGS: COARSE GRAINED
Size: 25149440(0x17fc000) KB
Allocatable: TRUE
Alloc Granule: 4KB
Alloc Alignment: 4KB
Accessible by all: FALSE
Pool 2
Segment: GLOBAL; FLAGS: EXTENDED FINE GRAINED
Size: 25149440(0x17fc000) KB
Allocatable: TRUE
Alloc Granule: 4KB
Alloc Alignment: 4KB
Accessible by all: FALSE
Pool 3
Segment: GROUP
Size: 64(0x40) KB
Allocatable: FALSE
Alloc Granule: 0KB
Alloc Alignment: 0KB
Accessible by all: FALSE
ISA Info:
ISA 1
Name: amdgcn-amd-amdhsa--gfx1100
Machine Models: HSA_MACHINE_MODEL_LARGE
Profiles: HSA_PROFILE_BASE
Default Rounding Mode: NEAR
Default Rounding Mode: NEAR
Fast f16: TRUE
Workgroup Max Size: 1024(0x400)
Workgroup Max Size per Dimension:
x 1024(0x400)
y 1024(0x400)
z 1024(0x400)
Grid Max Size: 4294967295(0xffffffff)
Grid Max Size per Dimension:
x 4294967295(0xffffffff)
y 4294967295(0xffffffff)
z 4294967295(0xffffffff)
FBarrier Max Size: 32
*******
Agent 3
*******
Name: gfx1100
Uuid: GPU-b8774af2c31f0c3b
Marketing Name: AMD Radeon Graphics
Vendor Name: AMD
Feature: KERNEL_DISPATCH
Profile: BASE_PROFILE
Float Round Mode: NEAR
Max Queue Number: 128(0x80)
Queue Min Size: 64(0x40)
Queue Max Size: 131072(0x20000)
Queue Type: MULTI
Node: 2
Device Type: GPU
Cache Info:
L1: 32(0x20) KB
L2: 6144(0x1800) KB
L3: 98304(0x18000) KB
Chip ID: 29772(0x744c)
ASIC Revision: 0(0x0)
Cacheline Size: 64(0x40)
Max Clock Freq. (MHz): 2526
BDFID: 33536
Internal Node ID: 2
Compute Unit: 96
SIMDs per CU: 2
Shader Engines: 6
Shader Arrs. per Eng.: 2
WatchPts on Addr. Ranges:4
Coherent Host Access: FALSE
Features: KERNEL_DISPATCH
Fast F16 Operation: TRUE
Wavefront Size: 32(0x20)
Workgroup Max Size: 1024(0x400)
Workgroup Max Size per Dimension:
x 1024(0x400)
y 1024(0x400)
z 1024(0x400)
Max Waves Per CU: 32(0x20)
Max Work-item Per CU: 1024(0x400)
Grid Max Size: 4294967295(0xffffffff)
Grid Max Size per Dimension:
x 4294967295(0xffffffff)
y 4294967295(0xffffffff)
z 4294967295(0xffffffff)
Max fbarriers/Workgrp: 32
Packet Processor uCode:: 528
SDMA engine uCode:: 19
IOMMU Support:: None
Pool Info:
Pool 1
Segment: GLOBAL; FLAGS: COARSE GRAINED
Size: 25149440(0x17fc000) KB
Allocatable: TRUE
Alloc Granule: 4KB
Alloc Alignment: 4KB
Accessible by all: FALSE
Pool 2
Segment: GLOBAL; FLAGS: EXTENDED FINE GRAINED
Size: 25149440(0x17fc000) KB
Allocatable: TRUE
Alloc Granule: 4KB
Alloc Alignment: 4KB
Accessible by all: FALSE
Pool 3
Segment: GROUP
Size: 64(0x40) KB
Allocatable: FALSE
Alloc Granule: 0KB
Alloc Alignment: 0KB
Accessible by all: FALSE
ISA Info:
ISA 1
Name: amdgcn-amd-amdhsa--gfx1100
Machine Models: HSA_MACHINE_MODEL_LARGE
Profiles: HSA_PROFILE_BASE
Default Rounding Mode: NEAR
Default Rounding Mode: NEAR
Fast f16: TRUE
Workgroup Max Size: 1024(0x400)
Workgroup Max Size per Dimension:
x 1024(0x400)
y 1024(0x400)
z 1024(0x400)
Grid Max Size: 4294967295(0xffffffff)
Grid Max Size per Dimension:
x 4294967295(0xffffffff)
y 4294967295(0xffffffff)
z 4294967295(0xffffffff)
FBarrier Max Size: 32
*******
Agent 4
*******
Name: gfx1100
Uuid: GPU-bdc8ded57bfb9196
Marketing Name: AMD Radeon Graphics
Vendor Name: AMD
Feature: KERNEL_DISPATCH
Profile: BASE_PROFILE
Float Round Mode: NEAR
Max Queue Number: 128(0x80)
Queue Min Size: 64(0x40)
Queue Max Size: 131072(0x20000)
Queue Type: MULTI
Node: 3
Device Type: GPU
Cache Info:
L1: 32(0x20) KB
L2: 6144(0x1800) KB
L3: 98304(0x18000) KB
Chip ID: 29772(0x744c)
ASIC Revision: 0(0x0)
Cacheline Size: 64(0x40)
Max Clock Freq. (MHz): 2526
BDFID: 18432
Internal Node ID: 3
Compute Unit: 96
SIMDs per CU: 2
Shader Engines: 6
Shader Arrs. per Eng.: 2
WatchPts on Addr. Ranges:4
Coherent Host Access: FALSE
Features: KERNEL_DISPATCH
Fast F16 Operation: TRUE
Wavefront Size: 32(0x20)
Workgroup Max Size: 1024(0x400)
Workgroup Max Size per Dimension:
x 1024(0x400)
y 1024(0x400)
z 1024(0x400)
Max Waves Per CU: 32(0x20)
Max Work-item Per CU: 1024(0x400)
Grid Max Size: 4294967295(0xffffffff)
Grid Max Size per Dimension:
x 4294967295(0xffffffff)
y 4294967295(0xffffffff)
z 4294967295(0xffffffff)
Max fbarriers/Workgrp: 32
Packet Processor uCode:: 528
SDMA engine uCode:: 19
IOMMU Support:: None
Pool Info:
Pool 1
Segment: GLOBAL; FLAGS: COARSE GRAINED
Size: 25149440(0x17fc000) KB
Allocatable: TRUE
Alloc Granule: 4KB
Alloc Alignment: 4KB
Accessible by all: FALSE
Pool 2
Segment: GLOBAL; FLAGS: EXTENDED FINE GRAINED
Size: 25149440(0x17fc000) KB
Allocatable: TRUE
Alloc Granule: 4KB
Alloc Alignment: 4KB
Accessible by all: FALSE
Pool 3
Segment: GROUP
Size: 64(0x40) KB
Allocatable: FALSE
Alloc Granule: 0KB
Alloc Alignment: 0KB
Accessible by all: FALSE
ISA Info:
ISA 1
Name: amdgcn-amd-amdhsa--gfx1100
Machine Models: HSA_MACHINE_MODEL_LARGE
Profiles: HSA_PROFILE_BASE
Default Rounding Mode: NEAR
Default Rounding Mode: NEAR
Fast f16: TRUE
Workgroup Max Size: 1024(0x400)
Workgroup Max Size per Dimension:
x 1024(0x400)
y 1024(0x400)
z 1024(0x400)
Grid Max Size: 4294967295(0xffffffff)
Grid Max Size per Dimension:
x 4294967295(0xffffffff)
y 4294967295(0xffffffff)
z 4294967295(0xffffffff)
FBarrier Max Size: 32
*******
Agent 5
*******
Name: gfx1100
Uuid: GPU-ffecdebca16a2c8e
Marketing Name: AMD Radeon Graphics
Vendor Name: AMD
Feature: KERNEL_DISPATCH
Profile: BASE_PROFILE
Float Round Mode: NEAR
Max Queue Number: 128(0x80)
Queue Min Size: 64(0x40)
Queue Max Size: 131072(0x20000)
Queue Type: MULTI
Node: 4
Device Type: GPU
Cache Info:
L1: 32(0x20) KB
L2: 6144(0x1800) KB
L3: 98304(0x18000) KB
Chip ID: 29772(0x744c)
ASIC Revision: 0(0x0)
Cacheline Size: 64(0x40)
Max Clock Freq. (MHz): 2526
BDFID: 768
Internal Node ID: 4
Compute Unit: 96
SIMDs per CU: 2
Shader Engines: 6
Shader Arrs. per Eng.: 2
WatchPts on Addr. Ranges:4
Coherent Host Access: FALSE
Features: KERNEL_DISPATCH
Fast F16 Operation: TRUE
Wavefront Size: 32(0x20)
Workgroup Max Size: 1024(0x400)
Workgroup Max Size per Dimension:
x 1024(0x400)
y 1024(0x400)
z 1024(0x400)
Max Waves Per CU: 32(0x20)
Max Work-item Per CU: 1024(0x400)
Grid Max Size: 4294967295(0xffffffff)
Grid Max Size per Dimension:
x 4294967295(0xffffffff)
y 4294967295(0xffffffff)
z 4294967295(0xffffffff)
Max fbarriers/Workgrp: 32
Packet Processor uCode:: 528
SDMA engine uCode:: 19
IOMMU Support:: None
Pool Info:
Pool 1
Segment: GLOBAL; FLAGS: COARSE GRAINED
Size: 25149440(0x17fc000) KB
Allocatable: TRUE
Alloc Granule: 4KB
Alloc Alignment: 4KB
Accessible by all: FALSE
Pool 2
Segment: GLOBAL; FLAGS: EXTENDED FINE GRAINED
Size: 25149440(0x17fc000) KB
Allocatable: TRUE
Alloc Granule: 4KB
Alloc Alignment: 4KB
Accessible by all: FALSE
Pool 3
Segment: GROUP
Size: 64(0x40) KB
Allocatable: FALSE
Alloc Granule: 0KB
Alloc Alignment: 0KB
Accessible by all: FALSE
ISA Info:
ISA 1
Name: amdgcn-amd-amdhsa--gfx1100
Machine Models: HSA_MACHINE_MODEL_LARGE
Profiles: HSA_PROFILE_BASE
Default Rounding Mode: NEAR
Default Rounding Mode: NEAR
Fast f16: TRUE
Workgroup Max Size: 1024(0x400)
Workgroup Max Size per Dimension:
x 1024(0x400)
y 1024(0x400)
z 1024(0x400)
Grid Max Size: 4294967295(0xffffffff)
Grid Max Size per Dimension:
x 4294967295(0xffffffff)
y 4294967295(0xffffffff)
z 4294967295(0xffffffff)
FBarrier Max Size: 32
*******
Agent 6
*******
Name: gfx1100
Uuid: GPU-74d6901254ac9411
Marketing Name: AMD Radeon Graphics
Vendor Name: AMD
Feature: KERNEL_DISPATCH
Profile: BASE_PROFILE
Float Round Mode: NEAR
Max Queue Number: 128(0x80)
Queue Min Size: 64(0x40)
Queue Max Size: 131072(0x20000)
Queue Type: MULTI
Node: 5
Device Type: GPU
Cache Info:
L1: 32(0x20) KB
L2: 6144(0x1800) KB
L3: 98304(0x18000) KB
Chip ID: 29772(0x744c)
ASIC Revision: 0(0x0)
Cacheline Size: 64(0x40)
Max Clock Freq. (MHz): 2526
BDFID: 1536
Internal Node ID: 5
Compute Unit: 96
SIMDs per CU: 2
Shader Engines: 6
Shader Arrs. per Eng.: 2
WatchPts on Addr. Ranges:4
Coherent Host Access: FALSE
Features: KERNEL_DISPATCH
Fast F16 Operation: TRUE
Wavefront Size: 32(0x20)
Workgroup Max Size: 1024(0x400)
Workgroup Max Size per Dimension:
x 1024(0x400)
y 1024(0x400)
z 1024(0x400)
Max Waves Per CU: 32(0x20)
Max Work-item Per CU: 1024(0x400)
Grid Max Size: 4294967295(0xffffffff)
Grid Max Size per Dimension:
x 4294967295(0xffffffff)
y 4294967295(0xffffffff)
z 4294967295(0xffffffff)
Max fbarriers/Workgrp: 32
Packet Processor uCode:: 528
SDMA engine uCode:: 19
IOMMU Support:: None
Pool Info:
Pool 1
Segment: GLOBAL; FLAGS: COARSE GRAINED
Size: 25149440(0x17fc000) KB
Allocatable: TRUE
Alloc Granule: 4KB
Alloc Alignment: 4KB
Accessible by all: FALSE
Pool 2
Segment: GLOBAL; FLAGS: EXTENDED FINE GRAINED
Size: 25149440(0x17fc000) KB
Allocatable: TRUE
Alloc Granule: 4KB
Alloc Alignment: 4KB
Accessible by all: FALSE
Pool 3
Segment: GROUP
Size: 64(0x40) KB
Allocatable: FALSE
Alloc Granule: 0KB
Alloc Alignment: 0KB
Accessible by all: FALSE
ISA Info:
ISA 1
Name: amdgcn-amd-amdhsa--gfx1100
Machine Models: HSA_MACHINE_MODEL_LARGE
Profiles: HSA_PROFILE_BASE
Default Rounding Mode: NEAR
Default Rounding Mode: NEAR
Fast f16: TRUE
Workgroup Max Size: 1024(0x400)
Workgroup Max Size per Dimension:
x 1024(0x400)
y 1024(0x400)
z 1024(0x400)
Grid Max Size: 4294967295(0xffffffff)
Grid Max Size per Dimension:
x 4294967295(0xffffffff)
y 4294967295(0xffffffff)
z 4294967295(0xffffffff)
FBarrier Max Size: 32
*** Done ***
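
The `BDFID` values reported by `rocminfo` line up with the `Device BDF` strings printed by `rocm-bandwidth-test -e` if they are read as the usual packed PCI encoding (bus in bits 15:8, device in bits 7:3, function in bits 2:0). That layout is an assumption here, though it matches all five GPUs in this output. A minimal sketch, with `bdfid_to_bdf` as a made-up helper name:

```shell
#!/bin/sh
# Hypothetical helper: decode a decimal BDFID into bus:device.function,
# printed in the same style as rocm-bandwidth-test (for example c3:0.0).
# Assumes bus = bits 15:8, device = bits 7:3, function = bits 2:0.
bdfid_to_bdf() {
    id=$1
    printf '%02x:%x.%x\n' $(( (id >> 8) & 0xff )) $(( (id >> 3) & 0x1f )) $(( id & 0x7 ))
}

# BDFID values of Agents 2 through 6 from the rocminfo output above.
for id in 49920 33536 18432 768 1536; do
    bdfid_to_bdf "$id"
done
```

This makes it easier to match an HSA agent in `rocminfo` to a device index in the bandwidth matrices.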

@ -0,0 +1,12 @@
diff --git a/src/libhsakmt.ver b/src/libhsakmt.ver
index 15c2916..c04cefe 100644
--- a/src/libhsakmt.ver
+++ b/src/libhsakmt.ver
@@ -81,6 +81,7 @@ hsaKmtWaitOnEvent_Ext;
hsaKmtWaitOnMultipleEvents_Ext;
hsaKmtReplaceAsanHeaderPage;
hsaKmtReturnAsanHeaderPage;
+hsaKmtGetAMDGPUDeviceHandle;
local: *;
};
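
The patch adds `hsaKmtGetAMDGPUDeviceHandle` to the symbol list that sits ahead of `local: *;` in the linker version script. Without it, builds fail with the `undefined reference to hsaKmtGetAMDGPUDeviceHandle` linker error seen elsewhere in this commit. As a quick sanity check before building, a sketch like the following could confirm that a version script lists a symbol. The helper name `ver_lists_symbol` is made up, and the check is a plain text match that does not tell the exported section apart from anything else in the file.

```shell
#!/bin/sh
# Hypothetical helper: succeed if version script $1 has a line "<symbol>;" for symbol $2.
ver_lists_symbol() {
    grep -Eq "^[[:space:]]*$2;[[:space:]]*\$" "$1"
}

# Example usage, assuming the ROCT-Thunk-Interface checkout is the current directory.
if [ -f src/libhsakmt.ver ]; then
    if ver_lists_symbol src/libhsakmt.ver hsaKmtGetAMDGPUDeviceHandle; then
        echo "symbol already listed"
    else
        echo "symbol missing, apply the patch first"
    fi
fi
```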

@ -19,11 +19,7 @@ cmake -B build -G Ninja \
-DCPACK_SOURCE_TZ=OFF \
-DROCM_CCACHE_BUILD=ON \
-DROCM_DIR=/opt/rocm \
-Dhip_DIR=/home/jebba/devel/ROCm/hip
-Dhip_DIR=/opt/rocm/share/rocm/cmake
ninja -C build package
sudo dpkg -i build/comgr_2.6.0.99999-local_amd64.deb
exit
# XXX
hip_DIR hip_DIR-NOTFOUND
-Dhip_DIR=/opt/rocm/share/rocm/cmake

@ -18,7 +18,8 @@ cmake -B build -G Ninja \
-DCPACK_SOURCE_TBZ2=OFF \
-DCPACK_SOURCE_TGZ=OFF \
-DCPACK_SOURCE_TXZ=OFF \
-DCPACK_SOURCE_TZ=OFF
-DCPACK_SOURCE_TZ=OFF \
-DHIPCC_BACKWARD_COMPATIBILITY=ON
ninja -C build package
sudo dpkg -i build/hipcc_1.0.0.99999-local_amd64.deb

@ -1,27 +1,23 @@
#!/bin/bash
rm -rf rocm_bandwidth_test
git clone https://github.com/ROCm/rocm_bandwidth_test
cd rocm_bandwidth_test/
git checkout rocm-6.0.2
rm -rf build
cmake -B build -G Ninja \
-DCMAKE_BUILD_TYPE=Release \
-DCMAKE_CXX_COMPILER=clang++ \
-DCMAKE_CXX_FLAGS="-I/opt/rocm/include/hsa" \
-DCMAKE_C_COMPILER=clang \
-DCMAKE_INSTALL_PREFIX=/opt/rocm \
-DCPACK_PACKAGING_INSTALL_PREFIX=/opt/rocm \
-DCPACK_BINARY_DEB=ON \
-DCPACK_BINARY_STGZ=OFF \
-DCPACK_BINARY_TGZ=OFF \
-DCPACK_BINARY_TZ=OFF \
-DCPACK_GENERATOR=DEB \
-DCPACK_PACKAGING_INSTALL_PREFIX=/opt/rocm \
-DCPACK_SOURCE_TBZ2=OFF \
-DCPACK_SOURCE_TGZ=OFF \
-DCPACK_SOURCE_TZ=OFF \
-DCPACK_SOURCE_TXZ=OFF \
-DCPACK_SOURCE_TXZ=OFF \
-DCPACK_GENERATOR=DEB \
-DCMAKE_CXX_COMPILER=clang++ \
-DCMAKE_C_COMPILER=clang \
-DCMAKE_CXX_FLAGS="-I/opt/rocm/include/hsa"
-DCPACK_SOURCE_TZ=OFF
ninja -C build package
sudo dpkg -i build/rocm-bandwidth-test_1.4.0.99999-local_amd64.deb

@ -19,5 +19,3 @@ cmake -B build -G Ninja \
ninja -C build package
sudo dpkg -i build/rocminfo_1.0.0.99999-local_amd64.deb
exit
/usr/bin/ld: /opt/rocm/lib/libhsa-runtime64.so.1.12.0: undefined reference to hsaKmtGetAMDGPUDeviceHandle

@ -23,5 +23,3 @@ cmake -B build -G Ninja \
ninja -C build package
sudo dpkg -i build/hsa-rocr_1.12.0-local_amd64.deb \
build/hsa-rocr-dev_1.12.0-local_amd64.deb
exit
-DINCLUDE_PATH_COMPATIBILITY=ON

@ -2,6 +2,7 @@ git clone --recursive https://github.com/ROCm/ROCT-Thunk-Interface
cd ROCT-Thunk-Interface/
git checkout rocm-6.0.2
rm -rf build
# XXX PATCH
cmake -B build -G Ninja \
-DBUILD_SHARED_LIBS=ON \
-DCMAKE_BUILD_TYPE=Release \

@ -8,7 +8,7 @@ msgid ""
msgstr ""
"Project-Id-Version: tinyrocs 0\n"
"Report-Msgid-Bugs-To: \n"
"POT-Creation-Date: 2024-02-06 08:18-0700\n"
"POT-Creation-Date: 2024-02-06 09:09-0700\n"
"PO-Revision-Date: YEAR-MO-DA HO:MI+ZONE\n"
"Last-Translator: FULL NAME <EMAIL@ADDRESS>\n"
"Language: en\n"
@ -70,3 +70,28 @@ msgstr ""
#: ../../../_source/output.rst:48
msgid "Output with four GPUs:"
msgstr ""
#: ../../../_source/output.rst:54
msgid "rocm-bandwidth-test"
msgstr ""
#: ../../../_source/output.rst:56
msgid "``rocm-bandwidth-test``"
msgstr ""
#: ../../../_source/output.rst:62
msgid "``rocm-bandwidth-test -e``"
msgstr ""
#: ../../../_source/output.rst:63
msgid "Devices."
msgstr ""
#: ../../../_source/output.rst:69
msgid "rocminfo"
msgstr ""
#: ../../../_source/output.rst:71
msgid "``rocminfo``"
msgstr ""

@ -8,7 +8,7 @@ msgid ""
msgstr ""
"Project-Id-Version: tinyrocs 0\n"
"Report-Msgid-Bugs-To: \n"
"POT-Creation-Date: 2024-02-05 19:45-0700\n"
"POT-Creation-Date: 2024-02-06 08:57-0700\n"
"PO-Revision-Date: YEAR-MO-DA HO:MI+ZONE\n"
"Last-Translator: FULL NAME <EMAIL@ADDRESS>\n"
"Language: en\n"
@ -103,164 +103,195 @@ msgid "Build ``amd-smi``."
msgstr ""
#: ../../../_source/toolchain-6.0.2.rst:72
msgid "device-libs"
msgstr ""
#: ../../../_source/toolchain-6.0.2.rst:73
msgid "Build ``device-libs``."
msgstr ""
#: ../../../_source/toolchain-6.0.2.rst:75
msgid ""
"Note, building against the ``amd-stg-open`` or ``amd-staging`` branch "
"includes an ``amd/`` directory that has ``device-libs`` to build. Release "
"``6.0.2`` does not have these directories, so the packages need to be built "
"from other repos, which is kind of broken, afaict."
msgstr ""
#: ../../../_source/toolchain-6.0.2.rst:85
msgid "roct-thunk-interface"
msgstr ""
#: ../../../_source/toolchain-6.0.2.rst:86
#: ../../../_source/toolchain-6.0.2.rst:73
msgid ""
"This needs a patchlet or other applications (e.g. ``rocminfo``) won't be "
"able to build. Just needs a one-liner:"
msgstr ""
#: ../../../_source/toolchain-6.0.2.rst:79
msgid "Build ``roct-thunk-interface``."
msgstr ""
#: ../../../_source/toolchain-6.0.2.rst:93
#: ../../../_source/toolchain-6.0.2.rst:86
msgid "device-libs"
msgstr ""
#: ../../../_source/toolchain-6.0.2.rst:87
msgid "Build ``device-libs``."
msgstr ""
#: ../../../_source/toolchain-6.0.2.rst:89
msgid ""
"Using the deprecated device-libs repository, as it is what is used for "
"release ``6.0.2``. In later releases, this package is built under the ``llvm-"
"project/amd`` directory."
msgstr ""
#: ../../../_source/toolchain-6.0.2.rst:98
msgid "rocr-runtime"
msgstr ""
#: ../../../_source/toolchain-6.0.2.rst:94
msgid ""
"Build ``rocr-runtime``. Needs hsakmtConfig.cmake from ROCT-Thunk-Interface "
"first."
#: ../../../_source/toolchain-6.0.2.rst:99
msgid "Build ``rocr-runtime``."
msgstr ""
#: ../../../_source/toolchain-6.0.2.rst:100
#: ../../../_source/toolchain-6.0.2.rst:101
msgid ""
"This has an option for ``TARGET_DEVICES``. By default all targets are built. "
"This adds a *lot* of time to the build for devices that won't be used. But "
"if they aren't included, other packages further down the toolchain may "
"complain, so include them all for now."
msgstr ""
#: ../../../_source/toolchain-6.0.2.rst:106
msgid "List of possible targets:"
msgstr ""
#: ../../../_source/toolchain-6.0.2.rst:108
msgid ""
"``gfx700;gfx701;gfx702;gfx801;gfx802;gfx803;gfx805;gfx810;gfx900;gfx902;"
"gfx904;gfx906;gfx908;gfx909;gfx90a;gfx90c;gfx940;gfx941;gfx942;gfx1010;"
"gfx1011;gfx1012;gfx1013;gfx1030;gfx1031;gfx1032;gfx1033;gfx1034;gfx1035;"
"gfx1036;gfx1100;gfx1101;gfx1102;gfx1103``"
msgstr ""
#: ../../../_source/toolchain-6.0.2.rst:110
msgid "The AMD Radeon 7900 XTX target is ``gfx1100``."
msgstr ""
#: ../../../_source/toolchain-6.0.2.rst:115
msgid ""
"For some reason, this is installing headers to ``/usr/hsa`` instead of ``/"
"opt/rocm``. It is ignoring the ``PREFIX``. Workaround..."
msgstr ""
#: ../../../_source/toolchain-6.0.2.rst:105
#: ../../../_source/toolchain-6.0.2.rst:120
msgid "hipcc"
msgstr ""
#: ../../../_source/toolchain-6.0.2.rst:121
msgid "hipcc built under clr. This seems better."
msgstr ""
#: ../../../_source/toolchain-6.0.2.rst:128
msgid "comgr"
msgstr ""
#: ../../../_source/toolchain-6.0.2.rst:106
#: ../../../_source/toolchain-6.0.2.rst:129
msgid "AKA ``ROCm-CompilerSupport``."
msgstr ""
#: ../../../_source/toolchain-6.0.2.rst:108
#: ../../../_source/toolchain-6.0.2.rst:131
msgid "Build ``comgr``."
msgstr ""
#: ../../../_source/toolchain-6.0.2.rst:110
#: ../../../_source/toolchain-6.0.2.rst:133
msgid ""
"This is another that in latest HEAD uses ``llvm-project/amd/`` directory, "
"but in ``6.0.2`` this isn't available."
msgstr ""
#: ../../../_source/toolchain-6.0.2.rst:117
#: ../../../_source/toolchain-6.0.2.rst:136
msgid "Failing to find ``hip`` directory. XXX"
msgstr ""
#: ../../../_source/toolchain-6.0.2.rst:142
msgid "Has non-fatal (?) ``hip_DIR-NOTFOUND`` in cmake."
msgstr ""
#: ../../../_source/toolchain-6.0.2.rst:121
#: ../../../_source/toolchain-6.0.2.rst:146
msgid "LLVM Pass Two"
msgstr ""
#: ../../../_source/toolchain-6.0.2.rst:122
#: ../../../_source/toolchain-6.0.2.rst:147
msgid "XXX Skip this XXX."
msgstr ""
#: ../../../_source/toolchain-6.0.2.rst:129
msgid "hipcc"
msgstr ""
#: ../../../_source/toolchain-6.0.2.rst:130
msgid "hipcc built under clr. This seems better."
msgstr ""
#: ../../../_source/toolchain-6.0.2.rst:137
#: ../../../_source/toolchain-6.0.2.rst:154
msgid "clr"
msgstr ""
#: ../../../_source/toolchain-6.0.2.rst:138
#: ../../../_source/toolchain-6.0.2.rst:155
msgid "OpenCL and more."
msgstr ""
#: ../../../_source/toolchain-6.0.2.rst:140
#: ../../../_source/toolchain-6.0.2.rst:157
msgid ""
"``file STRINGS file \"/home/jebba/devel/ROCm/hip/VERSION\" cannot be read.``"
msgstr ""
#: ../../../_source/toolchain-6.0.2.rst:147
#: ../../../_source/toolchain-6.0.2.rst:164
msgid "rocminfo"
msgstr ""
#: ../../../_source/toolchain-6.0.2.rst:148
#: ../../../_source/toolchain-6.0.2.rst:165
msgid "Yes, ``rocminfo``"
msgstr ""
#: ../../../_source/toolchain-6.0.2.rst:155
#: ../../../_source/toolchain-6.0.2.rst:172
msgid "rocBLAS"
msgstr ""
#: ../../../_source/toolchain-6.0.2.rst:156
#: ../../../_source/toolchain-6.0.2.rst:173
msgid "Needed before hipBLAS."
msgstr ""
#: ../../../_source/toolchain-6.0.2.rst:158
#: ../../../_source/toolchain-6.0.2.rst:175
msgid "Set up this once:"
msgstr ""
#: ../../../_source/toolchain-6.0.2.rst:175
#: ../../../_source/toolchain-6.0.2.rst:192
msgid "rocprim"
msgstr ""
#: ../../../_source/toolchain-6.0.2.rst:176
#: ../../../_source/toolchain-6.0.2.rst:193
msgid "``rocprim``."
msgstr ""
#: ../../../_source/toolchain-6.0.2.rst:183
#: ../../../_source/toolchain-6.0.2.rst:200
msgid "rocsparse"
msgstr ""
#: ../../../_source/toolchain-6.0.2.rst:184
#: ../../../_source/toolchain-6.0.2.rst:201
msgid "``rocsparse``."
msgstr ""
#: ../../../_source/toolchain-6.0.2.rst:191
#: ../../../_source/toolchain-6.0.2.rst:208
msgid "rocsolver"
msgstr ""
#: ../../../_source/toolchain-6.0.2.rst:192
#: ../../../_source/toolchain-6.0.2.rst:209
msgid "``rocsolver`` for hipBLAS."
msgstr ""
#: ../../../_source/toolchain-6.0.2.rst:199
#: ../../../_source/toolchain-6.0.2.rst:216
msgid "hipBLAS"
msgstr ""
#: ../../../_source/toolchain-6.0.2.rst:200
#: ../../../_source/toolchain-6.0.2.rst:217
msgid "``hipBLAS`` plz."
msgstr ""
#: ../../../_source/toolchain-6.0.2.rst:207
#: ../../../_source/toolchain-6.0.2.rst:224
msgid "rocm-bandwidth-test"
msgstr ""
#: ../../../_source/toolchain-6.0.2.rst:208
#: ../../../_source/toolchain-6.0.2.rst:225
msgid "``rocm-bandwidth-test``."
msgstr ""
#: ../../../_source/toolchain-6.0.2.rst:215
#: ../../../_source/toolchain-6.0.2.rst:232
msgid "HOLD"
msgstr ""
#: ../../../_source/toolchain-6.0.2.rst:216
#: ../../../_source/toolchain-6.0.2.rst:233
msgid "Don't upgrade over these files. Debian has higher epochs."
msgstr ""
#: ../../../_source/toolchain-6.0.2.rst:218
#: ../../../_source/toolchain-6.0.2.rst:235
msgid "``apt-mark hold hipcc llvm rocm-cmake rocm-device-libs rocminfo``"
msgstr ""

@ -6,37 +6,37 @@ System _output.
amd-smi
=======
``amd-smi bad-pages``
----------------
---------------------
.. literalinclude:: _static/_output/amd-smi-bad-pages.txt
:language: output
:language: BashSession
``amd-smi firmware``
----------------
--------------------
.. literalinclude:: _static/_output/amd-smi-firmware.txt
:language: output
:language: BashSession
``amd-smi list``
----------------
.. literalinclude:: _static/_output/amd-smi-list.txt
:language: output
:language: BashSession
``amd-smi metric``
--------------------
.. literalinclude:: _static/_output/amd-smi-metric.txt
:language: output
:language: BashSession
``amd-smi static``
------------------
.. literalinclude:: _static/_output/amd-smi-static.txt
:language: output
:language: BashSession
``amd-smi topology``
--------------------
.. literalinclude:: _static/_output/amd-smi-topology.txt
:language: output
:language: BashSession
Pytorch
=======
@ -48,5 +48,28 @@ Find scriptlet in Applications Pytorch section.
Output with four GPUs:
.. literalinclude:: _static/_output/verify-pytorch.txt
:language: output
:language: BashSession
rocm-bandwidth-test
===================
``rocm-bandwidth-test``
-----------------------
.. literalinclude:: _static/_output/rocm-bandwidth-test.txt
:language: BashSession
``rocm-bandwidth-test -e``
--------------------------
Devices.
.. literalinclude:: _static/_output/rocm-bandwidth-test-devices.txt
:language: BashSession
rocminfo
========
``rocminfo``
---------------------
.. literalinclude:: _static/_output/rocminfo.txt
:language: BashSession

@ -68,31 +68,46 @@ Build ``amd-smi``.
:language: bash
device-libs
-----------
Build ``device-libs``.
Note, building against the ``amd-stg-open`` or ``amd-staging`` branch
includes an ``amd/`` directory that has ``device-libs`` to build.
Release ``6.0.2`` does not have these directories, so the packages
need to be built from other repos, which is kind of broken, afaict.
.. literalinclude:: _static/toolchain/rocm-6.0.2/build-device-libs.sh
:language: bash
roct-thunk-interface
--------------------
This needs a patchlet or other applications (e.g. ``rocminfo``) won't
be able to build. Just needs a one-liner:
.. literalinclude:: _static/toolchain/patch/roct.patch
:language: diff
Build ``roct-thunk-interface``.
.. literalinclude:: _static/toolchain/rocm-6.0.2/build-roct-thunk-interface.sh
:language: bash
device-libs
-----------
Build ``device-libs``.
Using the deprecated device-libs repository, as it is what is used
for release ``6.0.2``. In later releases, this package is built
under the ``llvm-project/amd`` directory.
.. literalinclude:: _static/toolchain/rocm-6.0.2/build-device-libs.sh
:language: bash
rocr-runtime
------------
Build ``rocr-runtime``.
Needs hsakmtConfig.cmake from ROCT-Thunk-Interface first.
This has an option for ``TARGET_DEVICES``. By default all targets are built.
This adds a *lot* of time to the build for devices that won't be used.
But if they aren't included, other packages further down the toolchain may
complain, so include them all for now.
List of possible targets:
``gfx700;gfx701;gfx702;gfx801;gfx802;gfx803;gfx805;gfx810;gfx900;gfx902;gfx904;gfx906;gfx908;gfx909;gfx90a;gfx90c;gfx940;gfx941;gfx942;gfx1010;gfx1011;gfx1012;gfx1013;gfx1030;gfx1031;gfx1032;gfx1033;gfx1034;gfx1035;gfx1036;gfx1100;gfx1101;gfx1102;gfx1103``
The AMD Radeon 7900 XTX target is ``gfx1100``.
.. literalinclude:: _static/toolchain/rocm-6.0.2/build-rocr-runtime.sh
:language: bash
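
Since the target list is a semicolon-separated string, a script that trims it can first check that the GPU it cares about is still included. A minimal sketch, with `has_target` as a made-up helper name and a shortened list for readability:

```shell
#!/bin/sh
# Shortened copy of the target list above, for illustration only.
targets='gfx900;gfx906;gfx1030;gfx1100;gfx1101'

# Hypothetical helper: succeed if target $1 appears as a whole entry in list $2.
has_target() {
    case ";$2;" in
        *";$1;"*) return 0 ;;
        *) return 1 ;;
    esac
}

if has_target gfx1100 "$targets"; then
    echo "gfx1100 (Radeon 7900 XTX) is in the list"
fi
```

Wrapping the list in semicolons before matching avoids false hits on prefixes such as `gfx110`.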
@ -101,6 +116,30 @@ For some reason, this is installing headers to ``/usr/hsa`` instead of
``/opt/rocm``. It is ignoring the ``PREFIX``. Workaround...
hipcc
-----
hipcc built under clr. This seems better.
.. literalinclude:: _static/toolchain/rocm-6.0.2/build-hipcc.sh
:language: bash
rocminfo
--------
Yes, ``rocminfo``
.. literalinclude:: _static/toolchain/rocm-6.0.2/build-rocminfo.sh
:language: bash
rocm-bandwidth-test
-------------------
``rocm-bandwidth-test``.
.. literalinclude:: _static/toolchain/rocm-6.0.2/build-rocm-bandwidth-test.sh
:language: bash
comgr
-----
AKA ``ROCm-CompilerSupport``.
@ -110,6 +149,8 @@ Build ``comgr``.
This is another that in latest HEAD uses ``llvm-project/amd/`` directory,
but in ``6.0.2`` this isn't available.
Failing to find ``hip`` directory. XXX
.. literalinclude:: _static/toolchain/rocm-6.0.2/build-comgr.sh
:language: bash
@ -125,14 +166,6 @@ XXX Skip this XXX.
:language: bash
hipcc
-----
hipcc built under clr. This seems better.
.. literalinclude:: _static/toolchain/rocm-6.0.2/build-hipcc.sh
:language: bash
clr
---
OpenCL and more.
@ -143,14 +176,6 @@ OpenCL and more.
:language: bash
rocminfo
--------
Yes, ``rocminfo``
.. literalinclude:: _static/toolchain/rocm-6.0.2/build-rocminfo.sh
:language: bash
rocBLAS
-------
Needed before hipBLAS.
@ -203,14 +228,6 @@ hipBLAS
:language: bash
rocm-bandwidth-test
-------------------
``rocm-bandwidth-test``.
.. literalinclude:: _static/toolchain/rocm-6.0.2/build-rocm-bandwidth-test.sh
:language: bash
HOLD
----
Don't upgrade over these files. Debian has higher epochs.