more build, output
parent 1704c80e7c
commit 3eb675a66f
@@ -0,0 +1,52 @@

RocmBandwidthTest Version: 2.6.0

Launch Command is: rocm-bandwidth-test -e


Device Index: 0
Device Type: CPU
Device Name: AMD EPYC 7662 64-Core Processor
Allocatable Memory Size (KB): 263788012

Device Index: 1
Device Type: GPU
Device Name: AMD Radeon Graphics
Device BDF: c3:0.0
Device UUID: GPU-9391a2630862a05e
Allocatable Memory Size (KB): 25149440
Allocatable Memory Size (KB): 25149440

Device Index: 2
Device Type: GPU
Device Name: AMD Radeon Graphics
Device BDF: 83:0.0
Device UUID: GPU-b8774af2c31f0c3b
Allocatable Memory Size (KB): 25149440
Allocatable Memory Size (KB): 25149440

Device Index: 3
Device Type: GPU
Device Name: AMD Radeon Graphics
Device BDF: 48:0.0
Device UUID: GPU-bdc8ded57bfb9196
Allocatable Memory Size (KB): 25149440
Allocatable Memory Size (KB): 25149440

Device Index: 4
Device Type: GPU
Device Name: AMD Radeon Graphics
Device BDF: 03:0.0
Device UUID: GPU-ffecdebca16a2c8e
Allocatable Memory Size (KB): 25149440
Allocatable Memory Size (KB): 25149440

Device Index: 5
Device Type: GPU
Device Name: AMD Radeon Graphics
Device BDF: 06:0.0
Device UUID: GPU-74d6901254ac9411
Allocatable Memory Size (KB): 25149440
Allocatable Memory Size (KB): 25149440

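The `rocm-bandwidth-test -e` listing above is regular enough to parse mechanically. As a rough sketch only: the `parse_devices` and `kb_to_gib` helpers below are hypothetical (they are not part of `rocm-bandwidth-test`), and the conversion assumes the tool's "KB" means 1024 bytes. The sample text is a shortened copy of the first two devices from the listing.

```python
SAMPLE = """\
RocmBandwidthTest Version: 2.6.0
Launch Command is: rocm-bandwidth-test -e

Device Index: 0
Device Type: CPU
Device Name: AMD EPYC 7662 64-Core Processor
Allocatable Memory Size (KB): 263788012

Device Index: 1
Device Type: GPU
Device Name: AMD Radeon Graphics
Device BDF: c3:0.0
Device UUID: GPU-9391a2630862a05e
Allocatable Memory Size (KB): 25149440
Allocatable Memory Size (KB): 25149440
"""


def parse_devices(text):
    """Group 'key: value' lines into one dict per 'Device Index' block."""
    devices = []
    for line in text.splitlines():
        line = line.strip()
        if not line or ":" not in line:
            continue
        # Split on the first colon only, so values like 'c3:0.0' stay intact.
        key, value = (part.strip() for part in line.split(":", 1))
        if key == "Device Index":
            devices.append({"index": int(value), "pools_kb": []})
        elif not devices:
            continue  # header lines before the first device block
        elif key == "Allocatable Memory Size (KB)":
            # A device may report more than one pool, so collect them all.
            devices[-1]["pools_kb"].append(int(value))
        else:
            devices[-1][key] = value
    return devices


def kb_to_gib(kb):
    """Convert KB to GiB, assuming 1 KB = 1024 bytes (an assumption)."""
    return kb / (1024 * 1024)


for dev in parse_devices(SAMPLE):
    sizes = ", ".join(f"{kb_to_gib(kb):.2f} GiB" for kb in dev["pools_kb"])
    print(dev["index"], dev["Device Type"], sizes)
```

Under that assumption, each GPU pool of 25149440 KB works out to just under 24 GiB.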
@@ -0,0 +1,81 @@
....................................................................................................................................................................................................................................................................................................................................................
RocmBandwidthTest Version: 2.6.0

Launch Command is: rocm-bandwidth-test (rocm_bandwidth -a + rocm_bandwidth -A)


Device: 0, AMD EPYC 7662 64-Core Processor
Device: 1, AMD Radeon Graphics, GPU-9391a2630862a05e, c3:0.0
Device: 2, AMD Radeon Graphics, GPU-b8774af2c31f0c3b, 83:0.0
Device: 3, AMD Radeon Graphics, GPU-bdc8ded57bfb9196, 48:0.0
Device: 4, AMD Radeon Graphics, GPU-ffecdebca16a2c8e, 03:0.0
Device: 5, AMD Radeon Graphics, GPU-74d6901254ac9411, 06:0.0

Inter-Device Access

D/D       0         1         2         3         4         5

0         1         1         1         1         1         1

1         1         1         1         1         1         1

2         1         1         1         1         1         1

3         1         1         1         1         1         1

4         1         1         1         1         1         1

5         1         1         1         1         1         1


Inter-Device Numa Distance

D/D       0         1         2         3         4         5

0         0         20        20        20        20        20

1         20        0         40        40        40        40

2         20        40        0         40        40        40

3         20        40        40        0         40        40

4         20        40        40        40        0         40

5         20        40        40        40        40        0


Unidirectional copy peak bandwidth GB/s

D/D       0           1           2           3           4           5

0         N/A         27.995      27.994      27.998      27.952      27.995

1         25.704      1068.612    23.989      23.990      23.989      23.989

2         25.704      25.704      1118.099    23.990      23.990      23.989

3         25.703      25.704      25.704      1069.976    23.990      23.990

4         25.703      25.703      25.704      25.704      1106.677    23.989

5         25.703      23.990      25.703      25.703      25.703      1035.311


Bidirectional copy peak bandwidth GB/s

D/D       0           1           2           3           4           5

0         N/A         46.184      45.644      46.055      45.643      45.740

1         46.184      N/A         47.968      47.965      47.978      47.979

2         45.644      47.968      N/A         47.957      47.977      47.979

3         46.055      47.965      47.957      N/A         47.954      47.979

4         45.643      47.978      47.977      47.954      N/A         47.977

5         45.740      47.979      47.979      47.979      47.977      N/A


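The `D/D` matrices above all share one layout: a header row of device indices, then one row per device. As a minimal sketch (the `parse_matrix` helper is hypothetical and not part of `rocm-bandwidth-test`), the following Python reads one such table into a dictionary keyed by `(row, column)` and summarises the GPU-to-GPU cells of the unidirectional table. It makes no assumption about which axis is the copy source; it only relies on device 0 being the CPU, as the device list shows.

```python
UNIDIRECTIONAL = """\
D/D 0 1 2 3 4 5
0 N/A 27.995 27.994 27.998 27.952 27.995
1 25.704 1068.612 23.989 23.990 23.989 23.989
2 25.704 25.704 1118.099 23.990 23.990 23.989
3 25.703 25.704 25.704 1069.976 23.990 23.990
4 25.703 25.703 25.704 25.704 1106.677 23.989
5 25.703 23.990 25.703 25.703 25.703 1035.311
"""


def parse_matrix(text):
    """Return {(row, col): value} for one 'D/D' table; 'N/A' becomes None."""
    rows = [line.split() for line in text.splitlines() if line.strip()]
    header, body = rows[0], rows[1:]
    if header[0] != "D/D":
        raise ValueError("expected a header row starting with 'D/D'")
    cols = [int(c) for c in header[1:]]
    matrix = {}
    for cells in body:
        row = int(cells[0])
        for col, cell in zip(cols, cells[1:]):
            matrix[(row, col)] = None if cell == "N/A" else float(cell)
    return matrix


matrix = parse_matrix(UNIDIRECTIONAL)
# Device 0 is the CPU in this listing, so GPU-to-GPU cells exclude index 0
# and the diagonal (a device copying to itself).
gpu_pairs = [v for (r, c), v in matrix.items() if r != c and r != 0 and c != 0]
print(f"GPU-to-GPU: min {min(gpu_pairs)} GB/s, max {max(gpu_pairs)} GB/s")
```

The same function works on the access, NUMA distance and bidirectional tables, since blank lines between rows are skipped and every non-`N/A` cell parses as a number.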
@@ -0,0 +1,529 @@
ROCk module is loaded
|
||||
=====================
|
||||
HSA System Attributes
|
||||
=====================
|
||||
Runtime Version: 1.1
|
||||
System Timestamp Freq.: 1000.000000MHz
|
||||
Sig. Max Wait Duration: 18446744073709551615 (0xFFFFFFFFFFFFFFFF) (timestamp count)
|
||||
Machine Model: LARGE
|
||||
System Endianness: LITTLE
|
||||
Mwaitx: DISABLED
|
||||
DMAbuf Support: YES
|
||||
|
||||
==========
|
||||
HSA Agents
|
||||
==========
|
||||
*******
|
||||
Agent 1
|
||||
*******
|
||||
Name: AMD EPYC 7662 64-Core Processor
|
||||
Uuid: CPU-XX
|
||||
Marketing Name: AMD EPYC 7662 64-Core Processor
|
||||
Vendor Name: CPU
|
||||
Feature: None specified
|
||||
Profile: FULL_PROFILE
|
||||
Float Round Mode: NEAR
|
||||
Max Queue Number: 0(0x0)
|
||||
Queue Min Size: 0(0x0)
|
||||
Queue Max Size: 0(0x0)
|
||||
Queue Type: MULTI
|
||||
Node: 0
|
||||
Device Type: CPU
|
||||
Cache Info:
|
||||
L1: 32768(0x8000) KB
|
||||
Chip ID: 0(0x0)
|
||||
ASIC Revision: 0(0x0)
|
||||
Cacheline Size: 64(0x40)
|
||||
Max Clock Freq. (MHz): 2000
|
||||
BDFID: 0
|
||||
Internal Node ID: 0
|
||||
Compute Unit: 128
|
||||
SIMDs per CU: 0
|
||||
Shader Engines: 0
|
||||
Shader Arrs. per Eng.: 0
|
||||
WatchPts on Addr. Ranges:1
|
||||
Features: None
|
||||
Pool Info:
|
||||
Pool 1
|
||||
Segment: GLOBAL; FLAGS: FINE GRAINED
|
||||
Size: 263788012(0xfb915ec) KB
|
||||
Allocatable: TRUE
|
||||
Alloc Granule: 4KB
|
||||
Alloc Alignment: 4KB
|
||||
Accessible by all: TRUE
|
||||
Pool 2
|
||||
Segment: GLOBAL; FLAGS: KERNARG, FINE GRAINED
|
||||
Size: 263788012(0xfb915ec) KB
|
||||
Allocatable: TRUE
|
||||
Alloc Granule: 4KB
|
||||
Alloc Alignment: 4KB
|
||||
Accessible by all: TRUE
|
||||
Pool 3
|
||||
Segment: GLOBAL; FLAGS: COARSE GRAINED
|
||||
Size: 263788012(0xfb915ec) KB
|
||||
Allocatable: TRUE
|
||||
Alloc Granule: 4KB
|
||||
Alloc Alignment: 4KB
|
||||
Accessible by all: TRUE
|
||||
ISA Info:
|
||||
*******
|
||||
Agent 2
|
||||
*******
|
||||
Name: gfx1100
|
||||
Uuid: GPU-9391a2630862a05e
|
||||
Marketing Name: AMD Radeon Graphics
|
||||
Vendor Name: AMD
|
||||
Feature: KERNEL_DISPATCH
|
||||
Profile: BASE_PROFILE
|
||||
Float Round Mode: NEAR
|
||||
Max Queue Number: 128(0x80)
|
||||
Queue Min Size: 64(0x40)
|
||||
Queue Max Size: 131072(0x20000)
|
||||
Queue Type: MULTI
|
||||
Node: 1
|
||||
Device Type: GPU
|
||||
Cache Info:
|
||||
L1: 32(0x20) KB
|
||||
L2: 6144(0x1800) KB
|
||||
L3: 98304(0x18000) KB
|
||||
Chip ID: 29772(0x744c)
|
||||
ASIC Revision: 0(0x0)
|
||||
Cacheline Size: 64(0x40)
|
||||
Max Clock Freq. (MHz): 2526
|
||||
BDFID: 49920
|
||||
Internal Node ID: 1
|
||||
Compute Unit: 96
|
||||
SIMDs per CU: 2
|
||||
Shader Engines: 6
|
||||
Shader Arrs. per Eng.: 2
|
||||
WatchPts on Addr. Ranges:4
|
||||
Coherent Host Access: FALSE
|
||||
Features: KERNEL_DISPATCH
|
||||
Fast F16 Operation: TRUE
|
||||
Wavefront Size: 32(0x20)
|
||||
Workgroup Max Size: 1024(0x400)
|
||||
Workgroup Max Size per Dimension:
|
||||
x 1024(0x400)
|
||||
y 1024(0x400)
|
||||
z 1024(0x400)
|
||||
Max Waves Per CU: 32(0x20)
|
||||
Max Work-item Per CU: 1024(0x400)
|
||||
Grid Max Size: 4294967295(0xffffffff)
|
||||
Grid Max Size per Dimension:
|
||||
x 4294967295(0xffffffff)
|
||||
y 4294967295(0xffffffff)
|
||||
z 4294967295(0xffffffff)
|
||||
Max fbarriers/Workgrp: 32
|
||||
Packet Processor uCode:: 528
|
||||
SDMA engine uCode:: 19
|
||||
IOMMU Support:: None
|
||||
Pool Info:
|
||||
Pool 1
|
||||
Segment: GLOBAL; FLAGS: COARSE GRAINED
|
||||
Size: 25149440(0x17fc000) KB
|
||||
Allocatable: TRUE
|
||||
Alloc Granule: 4KB
|
||||
Alloc Alignment: 4KB
|
||||
Accessible by all: FALSE
|
||||
Pool 2
|
||||
Segment: GLOBAL; FLAGS: EXTENDED FINE GRAINED
|
||||
Size: 25149440(0x17fc000) KB
|
||||
Allocatable: TRUE
|
||||
Alloc Granule: 4KB
|
||||
Alloc Alignment: 4KB
|
||||
Accessible by all: FALSE
|
||||
Pool 3
|
||||
Segment: GROUP
|
||||
Size: 64(0x40) KB
|
||||
Allocatable: FALSE
|
||||
Alloc Granule: 0KB
|
||||
Alloc Alignment: 0KB
|
||||
Accessible by all: FALSE
|
||||
ISA Info:
|
||||
ISA 1
|
||||
Name: amdgcn-amd-amdhsa--gfx1100
|
||||
Machine Models: HSA_MACHINE_MODEL_LARGE
|
||||
Profiles: HSA_PROFILE_BASE
|
||||
Default Rounding Mode: NEAR
|
||||
Default Rounding Mode: NEAR
|
||||
Fast f16: TRUE
|
||||
Workgroup Max Size: 1024(0x400)
|
||||
Workgroup Max Size per Dimension:
|
||||
x 1024(0x400)
|
||||
y 1024(0x400)
|
||||
z 1024(0x400)
|
||||
Grid Max Size: 4294967295(0xffffffff)
|
||||
Grid Max Size per Dimension:
|
||||
x 4294967295(0xffffffff)
|
||||
y 4294967295(0xffffffff)
|
||||
z 4294967295(0xffffffff)
|
||||
FBarrier Max Size: 32
|
||||
*******
|
||||
Agent 3
|
||||
*******
|
||||
Name: gfx1100
|
||||
Uuid: GPU-b8774af2c31f0c3b
|
||||
Marketing Name: AMD Radeon Graphics
|
||||
Vendor Name: AMD
|
||||
Feature: KERNEL_DISPATCH
|
||||
Profile: BASE_PROFILE
|
||||
Float Round Mode: NEAR
|
||||
Max Queue Number: 128(0x80)
|
||||
Queue Min Size: 64(0x40)
|
||||
Queue Max Size: 131072(0x20000)
|
||||
Queue Type: MULTI
|
||||
Node: 2
|
||||
Device Type: GPU
|
||||
Cache Info:
|
||||
L1: 32(0x20) KB
|
||||
L2: 6144(0x1800) KB
|
||||
L3: 98304(0x18000) KB
|
||||
Chip ID: 29772(0x744c)
|
||||
ASIC Revision: 0(0x0)
|
||||
Cacheline Size: 64(0x40)
|
||||
Max Clock Freq. (MHz): 2526
|
||||
BDFID: 33536
|
||||
Internal Node ID: 2
|
||||
Compute Unit: 96
|
||||
SIMDs per CU: 2
|
||||
Shader Engines: 6
|
||||
Shader Arrs. per Eng.: 2
|
||||
WatchPts on Addr. Ranges:4
|
||||
Coherent Host Access: FALSE
|
||||
Features: KERNEL_DISPATCH
|
||||
Fast F16 Operation: TRUE
|
||||
Wavefront Size: 32(0x20)
|
||||
Workgroup Max Size: 1024(0x400)
|
||||
Workgroup Max Size per Dimension:
|
||||
x 1024(0x400)
|
||||
y 1024(0x400)
|
||||
z 1024(0x400)
|
||||
Max Waves Per CU: 32(0x20)
|
||||
Max Work-item Per CU: 1024(0x400)
|
||||
Grid Max Size: 4294967295(0xffffffff)
|
||||
Grid Max Size per Dimension:
|
||||
x 4294967295(0xffffffff)
|
||||
y 4294967295(0xffffffff)
|
||||
z 4294967295(0xffffffff)
|
||||
Max fbarriers/Workgrp: 32
|
||||
Packet Processor uCode:: 528
|
||||
SDMA engine uCode:: 19
|
||||
IOMMU Support:: None
|
||||
Pool Info:
|
||||
Pool 1
|
||||
Segment: GLOBAL; FLAGS: COARSE GRAINED
|
||||
Size: 25149440(0x17fc000) KB
|
||||
Allocatable: TRUE
|
||||
Alloc Granule: 4KB
|
||||
Alloc Alignment: 4KB
|
||||
Accessible by all: FALSE
|
||||
Pool 2
|
||||
Segment: GLOBAL; FLAGS: EXTENDED FINE GRAINED
|
||||
Size: 25149440(0x17fc000) KB
|
||||
Allocatable: TRUE
|
||||
Alloc Granule: 4KB
|
||||
Alloc Alignment: 4KB
|
||||
Accessible by all: FALSE
|
||||
Pool 3
|
||||
Segment: GROUP
|
||||
Size: 64(0x40) KB
|
||||
Allocatable: FALSE
|
||||
Alloc Granule: 0KB
|
||||
Alloc Alignment: 0KB
|
||||
Accessible by all: FALSE
|
||||
ISA Info:
|
||||
ISA 1
|
||||
Name: amdgcn-amd-amdhsa--gfx1100
|
||||
Machine Models: HSA_MACHINE_MODEL_LARGE
|
||||
Profiles: HSA_PROFILE_BASE
|
||||
Default Rounding Mode: NEAR
|
||||
Default Rounding Mode: NEAR
|
||||
Fast f16: TRUE
|
||||
Workgroup Max Size: 1024(0x400)
|
||||
Workgroup Max Size per Dimension:
|
||||
x 1024(0x400)
|
||||
y 1024(0x400)
|
||||
z 1024(0x400)
|
||||
Grid Max Size: 4294967295(0xffffffff)
|
||||
Grid Max Size per Dimension:
|
||||
x 4294967295(0xffffffff)
|
||||
y 4294967295(0xffffffff)
|
||||
z 4294967295(0xffffffff)
|
||||
FBarrier Max Size: 32
|
||||
*******
|
||||
Agent 4
|
||||
*******
|
||||
Name: gfx1100
|
||||
Uuid: GPU-bdc8ded57bfb9196
|
||||
Marketing Name: AMD Radeon Graphics
|
||||
Vendor Name: AMD
|
||||
Feature: KERNEL_DISPATCH
|
||||
Profile: BASE_PROFILE
|
||||
Float Round Mode: NEAR
|
||||
Max Queue Number: 128(0x80)
|
||||
Queue Min Size: 64(0x40)
|
||||
Queue Max Size: 131072(0x20000)
|
||||
Queue Type: MULTI
|
||||
Node: 3
|
||||
Device Type: GPU
|
||||
Cache Info:
|
||||
L1: 32(0x20) KB
|
||||
L2: 6144(0x1800) KB
|
||||
L3: 98304(0x18000) KB
|
||||
Chip ID: 29772(0x744c)
|
||||
ASIC Revision: 0(0x0)
|
||||
Cacheline Size: 64(0x40)
|
||||
Max Clock Freq. (MHz): 2526
|
||||
BDFID: 18432
|
||||
Internal Node ID: 3
|
||||
Compute Unit: 96
|
||||
SIMDs per CU: 2
|
||||
Shader Engines: 6
|
||||
Shader Arrs. per Eng.: 2
|
||||
WatchPts on Addr. Ranges:4
|
||||
Coherent Host Access: FALSE
|
||||
Features: KERNEL_DISPATCH
|
||||
Fast F16 Operation: TRUE
|
||||
Wavefront Size: 32(0x20)
|
||||
Workgroup Max Size: 1024(0x400)
|
||||
Workgroup Max Size per Dimension:
|
||||
x 1024(0x400)
|
||||
y 1024(0x400)
|
||||
z 1024(0x400)
|
||||
Max Waves Per CU: 32(0x20)
|
||||
Max Work-item Per CU: 1024(0x400)
|
||||
Grid Max Size: 4294967295(0xffffffff)
|
||||
Grid Max Size per Dimension:
|
||||
x 4294967295(0xffffffff)
|
||||
y 4294967295(0xffffffff)
|
||||
z 4294967295(0xffffffff)
|
||||
Max fbarriers/Workgrp: 32
|
||||
Packet Processor uCode:: 528
|
||||
SDMA engine uCode:: 19
|
||||
IOMMU Support:: None
|
||||
Pool Info:
|
||||
Pool 1
|
||||
Segment: GLOBAL; FLAGS: COARSE GRAINED
|
||||
Size: 25149440(0x17fc000) KB
|
||||
Allocatable: TRUE
|
||||
Alloc Granule: 4KB
|
||||
Alloc Alignment: 4KB
|
||||
Accessible by all: FALSE
|
||||
Pool 2
|
||||
Segment: GLOBAL; FLAGS: EXTENDED FINE GRAINED
|
||||
Size: 25149440(0x17fc000) KB
|
||||
Allocatable: TRUE
|
||||
Alloc Granule: 4KB
|
||||
Alloc Alignment: 4KB
|
||||
Accessible by all: FALSE
|
||||
Pool 3
|
||||
Segment: GROUP
|
||||
Size: 64(0x40) KB
|
||||
Allocatable: FALSE
|
||||
Alloc Granule: 0KB
|
||||
Alloc Alignment: 0KB
|
||||
Accessible by all: FALSE
|
||||
ISA Info:
|
||||
ISA 1
|
||||
Name: amdgcn-amd-amdhsa--gfx1100
|
||||
Machine Models: HSA_MACHINE_MODEL_LARGE
|
||||
Profiles: HSA_PROFILE_BASE
|
||||
Default Rounding Mode: NEAR
|
||||
Default Rounding Mode: NEAR
|
||||
Fast f16: TRUE
|
||||
Workgroup Max Size: 1024(0x400)
|
||||
Workgroup Max Size per Dimension:
|
||||
x 1024(0x400)
|
||||
y 1024(0x400)
|
||||
z 1024(0x400)
|
||||
Grid Max Size: 4294967295(0xffffffff)
|
||||
Grid Max Size per Dimension:
|
||||
x 4294967295(0xffffffff)
|
||||
y 4294967295(0xffffffff)
|
||||
z 4294967295(0xffffffff)
|
||||
FBarrier Max Size: 32
|
||||
*******
|
||||
Agent 5
|
||||
*******
|
||||
Name: gfx1100
|
||||
Uuid: GPU-ffecdebca16a2c8e
|
||||
Marketing Name: AMD Radeon Graphics
|
||||
Vendor Name: AMD
|
||||
Feature: KERNEL_DISPATCH
|
||||
Profile: BASE_PROFILE
|
||||
Float Round Mode: NEAR
|
||||
Max Queue Number: 128(0x80)
|
||||
Queue Min Size: 64(0x40)
|
||||
Queue Max Size: 131072(0x20000)
|
||||
Queue Type: MULTI
|
||||
Node: 4
|
||||
Device Type: GPU
|
||||
Cache Info:
|
||||
L1: 32(0x20) KB
|
||||
L2: 6144(0x1800) KB
|
||||
L3: 98304(0x18000) KB
|
||||
Chip ID: 29772(0x744c)
|
||||
ASIC Revision: 0(0x0)
|
||||
Cacheline Size: 64(0x40)
|
||||
Max Clock Freq. (MHz): 2526
|
||||
BDFID: 768
|
||||
Internal Node ID: 4
|
||||
Compute Unit: 96
|
||||
SIMDs per CU: 2
|
||||
Shader Engines: 6
|
||||
Shader Arrs. per Eng.: 2
|
||||
WatchPts on Addr. Ranges:4
|
||||
Coherent Host Access: FALSE
|
||||
Features: KERNEL_DISPATCH
|
||||
Fast F16 Operation: TRUE
|
||||
Wavefront Size: 32(0x20)
|
||||
Workgroup Max Size: 1024(0x400)
|
||||
Workgroup Max Size per Dimension:
|
||||
x 1024(0x400)
|
||||
y 1024(0x400)
|
||||
z 1024(0x400)
|
||||
Max Waves Per CU: 32(0x20)
|
||||
Max Work-item Per CU: 1024(0x400)
|
||||
Grid Max Size: 4294967295(0xffffffff)
|
||||
Grid Max Size per Dimension:
|
||||
x 4294967295(0xffffffff)
|
||||
y 4294967295(0xffffffff)
|
||||
z 4294967295(0xffffffff)
|
||||
Max fbarriers/Workgrp: 32
|
||||
Packet Processor uCode:: 528
|
||||
SDMA engine uCode:: 19
|
||||
IOMMU Support:: None
|
||||
Pool Info:
|
||||
Pool 1
|
||||
Segment: GLOBAL; FLAGS: COARSE GRAINED
|
||||
Size: 25149440(0x17fc000) KB
|
||||
Allocatable: TRUE
|
||||
Alloc Granule: 4KB
|
||||
Alloc Alignment: 4KB
|
||||
Accessible by all: FALSE
|
||||
Pool 2
|
||||
Segment: GLOBAL; FLAGS: EXTENDED FINE GRAINED
|
||||
Size: 25149440(0x17fc000) KB
|
||||
Allocatable: TRUE
|
||||
Alloc Granule: 4KB
|
||||
Alloc Alignment: 4KB
|
||||
Accessible by all: FALSE
|
||||
Pool 3
|
||||
Segment: GROUP
|
||||
Size: 64(0x40) KB
|
||||
Allocatable: FALSE
|
||||
Alloc Granule: 0KB
|
||||
Alloc Alignment: 0KB
|
||||
Accessible by all: FALSE
|
||||
ISA Info:
|
||||
ISA 1
|
||||
Name: amdgcn-amd-amdhsa--gfx1100
|
||||
Machine Models: HSA_MACHINE_MODEL_LARGE
|
||||
Profiles: HSA_PROFILE_BASE
|
||||
Default Rounding Mode: NEAR
|
||||
Default Rounding Mode: NEAR
|
||||
Fast f16: TRUE
|
||||
Workgroup Max Size: 1024(0x400)
|
||||
Workgroup Max Size per Dimension:
|
||||
x 1024(0x400)
|
||||
y 1024(0x400)
|
||||
z 1024(0x400)
|
||||
Grid Max Size: 4294967295(0xffffffff)
|
||||
Grid Max Size per Dimension:
|
||||
x 4294967295(0xffffffff)
|
||||
y 4294967295(0xffffffff)
|
||||
z 4294967295(0xffffffff)
|
||||
FBarrier Max Size: 32
|
||||
*******
|
||||
Agent 6
|
||||
*******
|
||||
Name: gfx1100
|
||||
Uuid: GPU-74d6901254ac9411
|
||||
Marketing Name: AMD Radeon Graphics
|
||||
Vendor Name: AMD
|
||||
Feature: KERNEL_DISPATCH
|
||||
Profile: BASE_PROFILE
|
||||
Float Round Mode: NEAR
|
||||
Max Queue Number: 128(0x80)
|
||||
Queue Min Size: 64(0x40)
|
||||
Queue Max Size: 131072(0x20000)
|
||||
Queue Type: MULTI
|
||||
Node: 5
|
||||
Device Type: GPU
|
||||
Cache Info:
|
||||
L1: 32(0x20) KB
|
||||
L2: 6144(0x1800) KB
|
||||
L3: 98304(0x18000) KB
|
||||
Chip ID: 29772(0x744c)
|
||||
ASIC Revision: 0(0x0)
|
||||
Cacheline Size: 64(0x40)
|
||||
Max Clock Freq. (MHz): 2526
|
||||
BDFID: 1536
|
||||
Internal Node ID: 5
|
||||
Compute Unit: 96
|
||||
SIMDs per CU: 2
|
||||
Shader Engines: 6
|
||||
Shader Arrs. per Eng.: 2
|
||||
WatchPts on Addr. Ranges:4
|
||||
Coherent Host Access: FALSE
|
||||
Features: KERNEL_DISPATCH
|
||||
Fast F16 Operation: TRUE
|
||||
Wavefront Size: 32(0x20)
|
||||
Workgroup Max Size: 1024(0x400)
|
||||
Workgroup Max Size per Dimension:
|
||||
x 1024(0x400)
|
||||
y 1024(0x400)
|
||||
z 1024(0x400)
|
||||
Max Waves Per CU: 32(0x20)
|
||||
Max Work-item Per CU: 1024(0x400)
|
||||
Grid Max Size: 4294967295(0xffffffff)
|
||||
Grid Max Size per Dimension:
|
||||
x 4294967295(0xffffffff)
|
||||
y 4294967295(0xffffffff)
|
||||
z 4294967295(0xffffffff)
|
||||
Max fbarriers/Workgrp: 32
|
||||
Packet Processor uCode:: 528
|
||||
SDMA engine uCode:: 19
|
||||
IOMMU Support:: None
|
||||
Pool Info:
|
||||
Pool 1
|
||||
Segment: GLOBAL; FLAGS: COARSE GRAINED
|
||||
Size: 25149440(0x17fc000) KB
|
||||
Allocatable: TRUE
|
||||
Alloc Granule: 4KB
|
||||
Alloc Alignment: 4KB
|
||||
Accessible by all: FALSE
|
||||
Pool 2
|
||||
Segment: GLOBAL; FLAGS: EXTENDED FINE GRAINED
|
||||
Size: 25149440(0x17fc000) KB
|
||||
Allocatable: TRUE
|
||||
Alloc Granule: 4KB
|
||||
Alloc Alignment: 4KB
|
||||
Accessible by all: FALSE
|
||||
Pool 3
|
||||
Segment: GROUP
|
||||
Size: 64(0x40) KB
|
||||
Allocatable: FALSE
|
||||
Alloc Granule: 0KB
|
||||
Alloc Alignment: 0KB
|
||||
Accessible by all: FALSE
|
||||
ISA Info:
|
||||
ISA 1
|
||||
Name: amdgcn-amd-amdhsa--gfx1100
|
||||
Machine Models: HSA_MACHINE_MODEL_LARGE
|
||||
Profiles: HSA_PROFILE_BASE
|
||||
Default Rounding Mode: NEAR
|
||||
Default Rounding Mode: NEAR
|
||||
Fast f16: TRUE
|
||||
Workgroup Max Size: 1024(0x400)
|
||||
Workgroup Max Size per Dimension:
|
||||
x 1024(0x400)
|
||||
y 1024(0x400)
|
||||
z 1024(0x400)
|
||||
Grid Max Size: 4294967295(0xffffffff)
|
||||
Grid Max Size per Dimension:
|
||||
x 4294967295(0xffffffff)
|
||||
y 4294967295(0xffffffff)
|
||||
z 4294967295(0xffffffff)
|
||||
FBarrier Max Size: 32
|
||||
*** Done ***
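The `BDFID` values in the `rocminfo` output above line up with the `Device BDF` strings printed by `rocm-bandwidth-test -e` if they are read as the conventional packed PCI address (8 bits of bus, 5 bits of device, 3 bits of function). That packing is an assumption here rather than something the output states, and `decode_bdfid` is a hypothetical helper, but the sketch below reproduces all five GPU addresses:

```python
def decode_bdfid(bdfid):
    """Unpack a 16-bit PCI bus/device/function id into 'bb:d.f' form."""
    bus = (bdfid >> 8) & 0xFF      # upper 8 bits
    device = (bdfid >> 3) & 0x1F   # next 5 bits
    function = bdfid & 0x07        # lowest 3 bits
    return f"{bus:02x}:{device:x}.{function:x}"


# BDFID values reported for Agents 2 to 6 in the rocminfo output.
for bdfid in (49920, 33536, 18432, 768, 1536):
    print(bdfid, decode_bdfid(bdfid))
```

For example, 49920 is `0xc300`, which decodes to `c3:0.0`, the BDF listed for device 1.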
@@ -0,0 +1,12 @@
diff --git a/src/libhsakmt.ver b/src/libhsakmt.ver
index 15c2916..c04cefe 100644
--- a/src/libhsakmt.ver
+++ b/src/libhsakmt.ver
@@ -81,6 +81,7 @@ hsaKmtWaitOnEvent_Ext;
hsaKmtWaitOnMultipleEvents_Ext;
hsaKmtReplaceAsanHeaderPage;
hsaKmtReturnAsanHeaderPage;
+hsaKmtGetAMDGPUDeviceHandle;

local: *;
};
@@ -19,11 +19,7 @@ cmake -B build -G Ninja \
-DCPACK_SOURCE_TZ=OFF \
-DROCM_CCACHE_BUILD=ON \
-DROCM_DIR=/opt/rocm \
-Dhip_DIR=/home/jebba/devel/ROCm/hip
-Dhip_DIR=/opt/rocm/share/rocm/cmake

ninja -C build package
sudo dpkg -i build/comgr_2.6.0.99999-local_amd64.deb
exit
# XXX
hip_DIR hip_DIR-NOTFOUND
-Dhip_DIR=/opt/rocm/share/rocm/cmake
@@ -18,7 +18,8 @@ cmake -B build -G Ninja \
-DCPACK_SOURCE_TBZ2=OFF \
-DCPACK_SOURCE_TGZ=OFF \
-DCPACK_SOURCE_TXZ=OFF \
-DCPACK_SOURCE_TZ=OFF
-DCPACK_SOURCE_TZ=OFF \
-DHIPCC_BACKWARD_COMPATIBILITY=ON

ninja -C build package
sudo dpkg -i build/hipcc_1.0.0.99999-local_amd64.deb
@@ -1,27 +1,23 @@
#!/bin/bash
rm -rf rocm-bandwidth-test
git clone https://github.com/ROCm/rocm_bandwidth_test
cd rocm_bandwidth_test/
git checkout rocm-6.0.2
rm -rf build

cmake -B build -G Ninja \
-DCMAKE_BUILD_TYPE=Release \
-DCMAKE_CXX_COMPILER=clang++ \
-DCMAKE_CXX_FLAGS="-I/opt/rocm/include/hsa" \
-DCMAKE_C_COMPILER=clang \
-DCMAKE_INSTALL_PREFIX=/opt/rocm \
-DCPACK_PACKAGING_INSTALL_PREFIX=/opt/rocm \
-DCPACK_BINARY_DEB=ON \
-DCPACK_BINARY_STGZ=OFF \
-DCPACK_BINARY_TGZ=OFF \
-DCPACK_BINARY_TZ=OFF \
-DCPACK_GENERATOR=DEB \
-DCPACK_PACKAGING_INSTALL_PREFIX=/opt/rocm \
-DCPACK_SOURCE_TBZ2=OFF \
-DCPACK_SOURCE_TGZ=OFF \
-DCPACK_SOURCE_TZ=OFF \
-DCPACK_SOURCE_TXZ=OFF \
-DCPACK_SOURCE_TXZ=OFF \
-DCPACK_GENERATOR=DEB \
-DCMAKE_CXX_COMPILER=clang++ \
-DCMAKE_C_COMPILER=clang \
-DCMAKE_CXX_FLAGS="-I/opt/rocm/include/hsa"
-DCPACK_SOURCE_TZ=OFF

ninja -C build package
sudo dpkg -i build/rocm-bandwidth-test_1.4.0.99999-local_amd64.deb
@@ -19,5 +19,3 @@ cmake -B build -G Ninja \

ninja -C build package
sudo dpkg -i build/rocminfo_1.0.0.99999-local_amd64.deb
exit
/usr/bin/ld: /opt/rocm/lib/libhsa-runtime64.so.1.12.0: undefined reference to hsaKmtGetAMDGPUDeviceHandle
@@ -23,5 +23,3 @@ cmake -B build -G Ninja \
ninja -C build package
sudo dpkg -i build/hsa-rocr_1.12.0-local_amd64.deb \
build/hsa-rocr-dev_1.12.0-local_amd64.deb
exit
-DINCLUDE_PATH_COMPATIBILITY=ON
@@ -2,6 +2,7 @@ git clone --recursive https://github.com/ROCm/ROCT-Thunk-Interface
cd ROCT-Thunk-Interface/
git checkout rocm-6.0.2
rm -rf build
# XXX PATCH
cmake -B build -G Ninja \
-DBUILD_SHARED_LIBS=ON \
-DCMAKE_BUILD_TYPE=Release \
@@ -8,7 +8,7 @@ msgid ""
msgstr ""
"Project-Id-Version: tinyrocs 0\n"
"Report-Msgid-Bugs-To: \n"
"POT-Creation-Date: 2024-02-06 08:18-0700\n"
"POT-Creation-Date: 2024-02-06 09:09-0700\n"
"PO-Revision-Date: YEAR-MO-DA HO:MI+ZONE\n"
"Last-Translator: FULL NAME <EMAIL@ADDRESS>\n"
"Language: en\n"
@@ -70,3 +70,28 @@ msgstr ""
#: ../../../_source/output.rst:48
msgid "Output with four GPUs:"
msgstr ""

#: ../../../_source/output.rst:54
msgid "rocm-bandwidth-test"
msgstr ""

#: ../../../_source/output.rst:56
msgid "``rocm-bandwidth-test``"
msgstr ""

#: ../../../_source/output.rst:62
msgid "``rocm-bandwidth-test -e``"
msgstr ""

#: ../../../_source/output.rst:63
msgid "Devices."
msgstr ""

#: ../../../_source/output.rst:69
msgid "rocminfo"
msgstr ""

#: ../../../_source/output.rst:71
msgid "``rocminfo``"
msgstr ""

@@ -8,7 +8,7 @@ msgid ""
msgstr ""
"Project-Id-Version: tinyrocs 0\n"
"Report-Msgid-Bugs-To: \n"
"POT-Creation-Date: 2024-02-05 19:45-0700\n"
"POT-Creation-Date: 2024-02-06 08:57-0700\n"
"PO-Revision-Date: YEAR-MO-DA HO:MI+ZONE\n"
"Last-Translator: FULL NAME <EMAIL@ADDRESS>\n"
"Language: en\n"
@@ -103,164 +103,195 @@ msgid "Build ``amd-smi``."
msgstr ""
|
||||
|
||||
#: ../../../_source/toolchain-6.0.2.rst:72
|
||||
msgid "device-libs"
|
||||
msgstr ""
|
||||
|
||||
#: ../../../_source/toolchain-6.0.2.rst:73
|
||||
msgid "Build ``device-libs``."
|
||||
msgstr ""
|
||||
|
||||
#: ../../../_source/toolchain-6.0.2.rst:75
|
||||
msgid ""
|
||||
"Note, building against the ``amd-stg-open`` or ``amd-staging`` branch "
|
||||
"includes and ``amd/`` directory that has ``device-libs`` to build. Release "
|
||||
"``6.0.2`` does not have these directories, so the packages need to be build "
|
||||
"from other repos, which is kind of broken, afaict."
|
||||
msgstr ""
|
||||
|
||||
#: ../../../_source/toolchain-6.0.2.rst:85
|
||||
msgid "roct-thunk-interface"
|
||||
msgstr ""
|
||||
|
||||
#: ../../../_source/toolchain-6.0.2.rst:86
|
||||
#: ../../../_source/toolchain-6.0.2.rst:73
|
||||
msgid ""
|
||||
"This needs a patchlet or other applications (e.g. ``rocminfo``) won't be "
|
||||
"able to build. Just needs a one-liner:"
|
||||
msgstr ""
|
||||
|
||||
#: ../../../_source/toolchain-6.0.2.rst:79
|
||||
msgid "Build ``roct-thunk-interface``."
|
||||
msgstr ""
|
||||
|
||||
#: ../../../_source/toolchain-6.0.2.rst:93
|
||||
#: ../../../_source/toolchain-6.0.2.rst:86
|
||||
msgid "device-libs"
|
||||
msgstr ""
|
||||
|
||||
#: ../../../_source/toolchain-6.0.2.rst:87
|
||||
msgid "Build ``device-libs``."
|
||||
msgstr ""
|
||||
|
||||
#: ../../../_source/toolchain-6.0.2.rst:89
|
||||
msgid ""
|
||||
"Using the deprecated device-libs repository, as it is what is used for "
|
||||
"release ``6.0.2``. In later releases, this package is built under the ``llvm-"
|
||||
"project/amd`` directory."
|
||||
msgstr ""
|
||||
|
||||
#: ../../../_source/toolchain-6.0.2.rst:98
|
||||
msgid "rocr-runtime"
|
||||
msgstr ""
|
||||
|
||||
#: ../../../_source/toolchain-6.0.2.rst:94
|
||||
msgid ""
|
||||
"Build ``rocr-runtime``. Needs hsakmtConfig.cmake from ROCT-Thunk-Interface "
|
||||
"first."
|
||||
#: ../../../_source/toolchain-6.0.2.rst:99
|
||||
msgid "Build ``rocr-runtime``."
|
||||
msgstr ""
|
||||
|
||||
#: ../../../_source/toolchain-6.0.2.rst:100
|
||||
#: ../../../_source/toolchain-6.0.2.rst:101
|
||||
msgid ""
|
||||
"This has an option for ``TARGET_DEVICES``. By default all targets are built. "
|
||||
"This adds a *lot* of time to the build for devices that won't be used. But "
|
||||
"if they aren't included, other packages further down the toolchain may "
|
||||
"complain, so include them all for now."
|
||||
msgstr ""
|
||||
|
||||
#: ../../../_source/toolchain-6.0.2.rst:106
|
||||
msgid "List of possible targets:"
|
||||
msgstr ""
|
||||
|
||||
#: ../../../_source/toolchain-6.0.2.rst:108
|
||||
msgid ""
|
||||
"``gfx700;gfx701;gfx702;gfx801;gfx802;gfx803;gfx805;gfx810;gfx900;gfx902;"
|
||||
"gfx904;gfx906;gfx908;gfx909;gfx90a;gfx90c;gfx940;gfx941;gfx942;gfx1010;"
|
||||
"gfx1011;gfx1012;gfx1013;gfx1030;gfx1031;gfx1032;gfx1033;gfx1034;gfx1035;"
|
||||
"gfx1036;gfx1100;gfx1101;gfx1102;gfx1103``"
|
||||
msgstr ""
|
||||
|
||||
#: ../../../_source/toolchain-6.0.2.rst:110
|
||||
msgid "The AMD Radeon 7900 XTX target is ``gfx1100``."
|
||||
msgstr ""
|
||||
|
||||
#: ../../../_source/toolchain-6.0.2.rst:115
|
||||
msgid ""
|
||||
"For some reason, this is installing headers to ``/usr/hsa`` instead of ``/"
|
||||
"opt/rocm``. It is ignoring the ``PREFIX``. Workaround..."
|
||||
msgstr ""
|
||||
|
||||
#: ../../../_source/toolchain-6.0.2.rst:105
|
||||
#: ../../../_source/toolchain-6.0.2.rst:120
|
||||
msgid "hipcc"
|
||||
msgstr ""
|
||||
|
||||
#: ../../../_source/toolchain-6.0.2.rst:121
|
||||
msgid "hipcc built under clr. This seems better."
|
||||
msgstr ""
|
||||
|
||||
#: ../../../_source/toolchain-6.0.2.rst:128
|
||||
msgid "comgr"
|
||||
msgstr ""
|
||||
|
||||
#: ../../../_source/toolchain-6.0.2.rst:106
|
||||
#: ../../../_source/toolchain-6.0.2.rst:129
|
||||
msgid "AKA ``ROCm-CompilerSupport``."
|
||||
msgstr ""
|
||||
|
||||
#: ../../../_source/toolchain-6.0.2.rst:108
|
||||
#: ../../../_source/toolchain-6.0.2.rst:131
|
||||
msgid "Build ``comgr``."
|
||||
msgstr ""
|
||||
|
||||
#: ../../../_source/toolchain-6.0.2.rst:110
|
||||
#: ../../../_source/toolchain-6.0.2.rst:133
|
||||
msgid ""
|
||||
"This is another that in latest HEAD uses ``llvm-project/amd/`` directory, "
|
||||
"but in ``6.0.2`` this isn't available."
|
||||
msgstr ""
|
||||
|
||||
#: ../../../_source/toolchain-6.0.2.rst:117
|
||||
#: ../../../_source/toolchain-6.0.2.rst:136
|
||||
msgid "Failing to find ``hip`` directory. XXX"
|
||||
msgstr ""
|
||||
|
||||
#: ../../../_source/toolchain-6.0.2.rst:142
|
||||
msgid "Has non-fatal (?) ``hip_DIR-NOTFOUND`` in cmake."
|
||||
msgstr ""
|
||||
|
||||
#: ../../../_source/toolchain-6.0.2.rst:121
|
||||
#: ../../../_source/toolchain-6.0.2.rst:146
|
||||
msgid "LLVM Pass Two"
|
||||
msgstr ""
|
||||
|
||||
#: ../../../_source/toolchain-6.0.2.rst:122
|
||||
#: ../../../_source/toolchain-6.0.2.rst:147
|
||||
msgid "XXX Skip this XXX."
|
||||
msgstr ""
|
||||
|
||||
#: ../../../_source/toolchain-6.0.2.rst:129
|
||||
msgid "hipcc"
|
||||
msgstr ""
|
||||
|
||||
#: ../../../_source/toolchain-6.0.2.rst:130
|
||||
msgid "hipcc built under clr. This seems better."
|
||||
msgstr ""
|
||||
|
||||
#: ../../../_source/toolchain-6.0.2.rst:137
|
||||
#: ../../../_source/toolchain-6.0.2.rst:154
|
||||
msgid "clr"
|
||||
msgstr ""
|
||||
|
||||
#: ../../../_source/toolchain-6.0.2.rst:138
|
||||
#: ../../../_source/toolchain-6.0.2.rst:155
|
||||
msgid "OpenCL and more."
|
||||
msgstr ""
|
||||
|
||||
#: ../../../_source/toolchain-6.0.2.rst:140
|
||||
#: ../../../_source/toolchain-6.0.2.rst:157
|
||||
msgid ""
|
||||
"``file STRINGS file \"/home/jebba/devel/ROCm/hip/VERSION\" cannot be read.``"
|
||||
msgstr ""
|
||||
|
||||
#: ../../../_source/toolchain-6.0.2.rst:147
|
||||
#: ../../../_source/toolchain-6.0.2.rst:164
|
||||
msgid "rocminfo"
|
||||
msgstr ""
|
||||
|
||||
#: ../../../_source/toolchain-6.0.2.rst:148
|
||||
#: ../../../_source/toolchain-6.0.2.rst:165
|
||||
msgid "Yes, ``rocminfo``"
|
||||
msgstr ""
|
||||
|
||||
#: ../../../_source/toolchain-6.0.2.rst:155
|
||||
#: ../../../_source/toolchain-6.0.2.rst:172
|
||||
msgid "rocBLAS"
|
||||
msgstr ""
|
||||
|
||||
#: ../../../_source/toolchain-6.0.2.rst:156
|
||||
#: ../../../_source/toolchain-6.0.2.rst:173
|
||||
msgid "Needed before hipBLAS."
|
||||
msgstr ""
|
||||
|
||||
#: ../../../_source/toolchain-6.0.2.rst:158
|
||||
#: ../../../_source/toolchain-6.0.2.rst:175
|
||||
msgid "Set up this once:"
|
||||
msgstr ""
|
||||
|
||||
#: ../../../_source/toolchain-6.0.2.rst:175
|
||||
#: ../../../_source/toolchain-6.0.2.rst:192
|
||||
msgid "rocprim"
|
||||
msgstr ""
|
||||
|
||||
#: ../../../_source/toolchain-6.0.2.rst:176
|
||||
#: ../../../_source/toolchain-6.0.2.rst:193
|
||||
msgid "``rocprim``."
|
||||
msgstr ""
|
||||
|
||||
#: ../../../_source/toolchain-6.0.2.rst:183
|
||||
#: ../../../_source/toolchain-6.0.2.rst:200
|
||||
msgid "rocsparse"
|
||||
msgstr ""
|
||||
|
||||
#: ../../../_source/toolchain-6.0.2.rst:184
|
||||
#: ../../../_source/toolchain-6.0.2.rst:201
|
||||
msgid "``rocsparse``."
|
||||
msgstr ""
|
||||
|
||||
#: ../../../_source/toolchain-6.0.2.rst:191
|
||||
#: ../../../_source/toolchain-6.0.2.rst:208
|
||||
msgid "rocsolver"
|
||||
msgstr ""
|
||||
|
||||
#: ../../../_source/toolchain-6.0.2.rst:192
|
||||
#: ../../../_source/toolchain-6.0.2.rst:209
|
||||
msgid "``rocsolver`` for hipBLAS."
|
||||
msgstr ""
|
||||
|
||||
#: ../../../_source/toolchain-6.0.2.rst:199
|
||||
#: ../../../_source/toolchain-6.0.2.rst:216
|
||||
msgid "hipBLAS"
|
||||
msgstr ""
|
||||
|
||||
#: ../../../_source/toolchain-6.0.2.rst:200
|
||||
#: ../../../_source/toolchain-6.0.2.rst:217
|
||||
msgid "``hipBLAS`` plz."
|
||||
msgstr ""
|
||||
|
||||
#: ../../../_source/toolchain-6.0.2.rst:207
|
||||
#: ../../../_source/toolchain-6.0.2.rst:224
|
||||
msgid "rocm-bandwidth-test"
|
||||
msgstr ""
|
||||
|
||||
#: ../../../_source/toolchain-6.0.2.rst:208
|
||||
#: ../../../_source/toolchain-6.0.2.rst:225
|
||||
msgid "``rocm-bandwidth-test``."
|
||||
msgstr ""
|
||||
|
||||
#: ../../../_source/toolchain-6.0.2.rst:215
|
||||
#: ../../../_source/toolchain-6.0.2.rst:232
|
||||
msgid "HOLD"
|
||||
msgstr ""
|
||||
|
||||
#: ../../../_source/toolchain-6.0.2.rst:216
|
||||
#: ../../../_source/toolchain-6.0.2.rst:233
|
||||
msgid "Don't upgrade over these files. Debian has higher epochs."
|
||||
msgstr ""
|
||||
|
||||
#: ../../../_source/toolchain-6.0.2.rst:218
|
||||
#: ../../../_source/toolchain-6.0.2.rst:235
|
||||
msgid "``apt-mark hold hipcc llvm rocm-cmake rocm-device-libs rocminfo``"
|
||||
msgstr ""
|
||||
|
|
|
@ -6,37 +6,37 @@ System _output.
|
|||
amd-smi
|
||||
=======
|
||||
``amd-smi bad-pages``
|
||||
----------------
|
||||
---------------------
|
||||
|
||||
.. literalinclude:: _static/_output/amd-smi-bad-pages.txt
|
||||
:language: output
|
||||
:language: BashSession
|
||||
|
||||
``amd-smi firmware``
|
||||
----------------
|
||||
--------------------
|
||||
|
||||
.. literalinclude:: _static/_output/amd-smi-firmware.txt
|
||||
:language: output
|
||||
:language: BashSession
|
||||
|
||||
``amd-smi list``
|
||||
----------------
|
||||
|
||||
.. literalinclude:: _static/_output/amd-smi-list.txt
|
||||
:language: output
|
||||
:language: BashSession
|
||||
|
||||
``amd-smi metric``
|
||||
--------------------
|
||||
.. literalinclude:: _static/_output/amd-smi-metric.txt
|
||||
:language: output
|
||||
:language: BashSession
|
||||
|
||||
``amd-smi static``
|
||||
------------------
|
||||
.. literalinclude:: _static/_output/amd-smi-static.txt
|
||||
:language: output
|
||||
:language: BashSession
|
||||
|
||||
``amd-smi topology``
|
||||
--------------------
|
||||
.. literalinclude:: _static/_output/amd-smi-topology.txt
|
||||
:language: output
|
||||
:language: BashSession
|
||||
|
||||
Pytorch
|
||||
=======
|
||||
|
@ -48,5 +48,28 @@ Find scriptlet in Applications Pytorch section.
|
|||
Output with four GPUs:
|
||||
|
||||
.. literalinclude:: _static/_output/verify-pytorch.txt
|
||||
:language: output
|
||||
:language: BashSession
|
||||
|
||||
rocm-bandwidth-test
|
||||
===================
|
||||
``rocm-bandwidth-test``
|
||||
-----------------------
|
||||
|
||||
.. literalinclude:: _static/_output/rocm-bandwidth-test.txt
|
||||
:language: BashSession
|
||||
|
||||
``rocm-bandwidth-test -e``
|
||||
--------------------------
|
||||
Devices.
|
||||
|
||||
.. literalinclude:: _static/_output/rocm-bandwidth-test-devices.txt
|
||||
:language: BashSession
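
A quick sanity check on the device listing is to count the GPU entries. The sketch below assumes the listing contains one ``Device Type: GPU`` line per GPU, as in the captured output; the helper name ``count_gpus`` is hypothetical and is not part of ``rocm-bandwidth-test``.

```shell
#!/bin/sh
# count_gpus (hypothetical helper): read a device listing on stdin and
# print how many entries report "Device Type: GPU".
count_gpus() {
    grep -c 'Device Type:[[:space:]]*GPU'
}

# Demo on a trimmed sample shaped like the captured listing
# (one CPU entry, two GPU entries).
printf '%s\n' \
    'Device Index: 0' 'Device Type: CPU' \
    'Device Index: 1' 'Device Type: GPU' \
    'Device Index: 2' 'Device Type: GPU' | count_gpus
```

On a host with ROCm installed, the same helper could be fed directly with ``rocm-bandwidth-test -e | count_gpus``.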
|
||||
|
||||
rocminfo
|
||||
========
|
||||
``rocminfo``
|
||||
---------------------
|
||||
|
||||
.. literalinclude:: _static/_output/rocminfo.txt
|
||||
:language: BashSession
|
||||
|
||||
|
|
|
@ -68,31 +68,46 @@ Build ``amd-smi``.
|
|||
:language: bash
|
||||
|
||||
|
||||
device-libs
|
||||
-----------
|
||||
Build ``device-libs``.
|
||||
|
||||
Note, building against the ``amd-stg-open`` or ``amd-staging`` branch
|
||||
includes an ``amd/`` directory that has ``device-libs`` to build.
|
||||
Release ``6.0.2`` does not have these directories, so the packages
|
||||
need to be built from other repos, which appears to be broken, as far as I can tell.
|
||||
|
||||
.. literalinclude:: _static/toolchain/rocm-6.0.2/build-device-libs.sh
|
||||
:language: bash
|
||||
|
||||
|
||||
roct-thunk-interface
|
||||
--------------------
|
||||
This needs a small patch, or other applications (e.g. ``rocminfo``) won't
|
||||
be able to build. The patch is a one-liner:
|
||||
|
||||
.. literalinclude:: _static/toolchain/patch/roct.patch
|
||||
:language: diff
|
||||
|
||||
Build ``roct-thunk-interface``.
|
||||
|
||||
.. literalinclude:: _static/toolchain/rocm-6.0.2/build-roct-thunk-interface.sh
|
||||
:language: bash
|
||||
|
||||
|
||||
device-libs
|
||||
-----------
|
||||
Build ``device-libs``.
|
||||
|
||||
This uses the deprecated ``device-libs`` repository, because that is what
|
||||
release ``6.0.2`` uses. In later releases, this package is built
|
||||
under the ``llvm-project/amd`` directory.
|
||||
|
||||
.. literalinclude:: _static/toolchain/rocm-6.0.2/build-device-libs.sh
|
||||
:language: bash
|
||||
|
||||
|
||||
rocr-runtime
|
||||
------------
|
||||
Build ``rocr-runtime``.
|
||||
Needs ``hsakmtConfig.cmake`` from ``ROCT-Thunk-Interface`` first.
|
||||
|
||||
This has an option for ``TARGET_DEVICES``. By default all targets are built.
|
||||
This adds a *lot* of time to the build for devices that won't be used.
|
||||
But if they aren't included, other packages further down the toolchain may
|
||||
complain, so include them all for now.
|
||||
|
||||
List of possible targets:
|
||||
|
||||
``gfx700;gfx701;gfx702;gfx801;gfx802;gfx803;gfx805;gfx810;gfx900;gfx902;gfx904;gfx906;gfx908;gfx909;gfx90a;gfx90c;gfx940;gfx941;gfx942;gfx1010;gfx1011;gfx1012;gfx1013;gfx1030;gfx1031;gfx1032;gfx1033;gfx1034;gfx1035;gfx1036;gfx1100;gfx1101;gfx1102;gfx1103``
|
||||
|
||||
The AMD Radeon 7900 XTX target is ``gfx1100``.
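
For reference, a reduced target list could be assembled as below. This is only a sketch: it assumes ``TARGET_DEVICES`` is passed to CMake as a ``-D`` cache variable, which the build script itself does not confirm here, and the helper name ``join_targets`` is hypothetical. As noted above, the build currently keeps all targets, so this is not what the script does.

```shell
#!/bin/sh
# join_targets (hypothetical helper): join gfx target names with ';',
# the separator used in the target list above.
join_targets() {
    ( IFS=';'; printf '%s\n' "$*" )
}

# Assumed reduced list: only the Radeon 7900 XTX target.
TARGET_DEVICES=$(join_targets gfx1100)

# Print the flag rather than invoking cmake; passing it as -D is an assumption.
printf '%s\n' "-DTARGET_DEVICES=${TARGET_DEVICES}"
```

Several targets can be joined the same way, e.g. ``join_targets gfx1100 gfx1101``.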
|
||||
|
||||
.. literalinclude:: _static/toolchain/rocm-6.0.2/build-rocr-runtime.sh
|
||||
:language: bash
|
||||
|
@ -101,6 +116,30 @@ For some reason, this is installing headers to ``/usr/hsa`` instead of
|
|||
``/opt/rocm``. It is ignoring the ``PREFIX``. Workaround...
|
||||
|
||||
|
||||
hipcc
|
||||
-----
|
||||
hipcc built under clr. This seems better.
|
||||
|
||||
.. literalinclude:: _static/toolchain/rocm-6.0.2/build-hipcc.sh
|
||||
:language: bash
|
||||
|
||||
|
||||
rocminfo
|
||||
--------
|
||||
Yes, ``rocminfo``
|
||||
|
||||
.. literalinclude:: _static/toolchain/rocm-6.0.2/build-rocminfo.sh
|
||||
:language: bash
|
||||
|
||||
|
||||
rocm-bandwidth-test
|
||||
-------------------
|
||||
``rocm-bandwidth-test``.
|
||||
|
||||
.. literalinclude:: _static/toolchain/rocm-6.0.2/build-rocm-bandwidth-test.sh
|
||||
:language: bash
|
||||
|
||||
|
||||
comgr
|
||||
-----
|
||||
AKA ``ROCm-CompilerSupport``.
|
||||
|
@ -110,6 +149,8 @@ Build ``comgr``.
|
|||
This is another that in latest HEAD uses ``llvm-project/amd/`` directory,
|
||||
but in ``6.0.2`` this isn't available.
|
||||
|
||||
Failing to find ``hip`` directory. XXX
|
||||
|
||||
.. literalinclude:: _static/toolchain/rocm-6.0.2/build-comgr.sh
|
||||
:language: bash
|
||||
|
||||
|
@ -125,14 +166,6 @@ XXX Skip this XXX.
|
|||
:language: bash
|
||||
|
||||
|
||||
hipcc
|
||||
-----
|
||||
hipcc built under clr. This seems better.
|
||||
|
||||
.. literalinclude:: _static/toolchain/rocm-6.0.2/build-hipcc.sh
|
||||
:language: bash
|
||||
|
||||
|
||||
clr
|
||||
---
|
||||
OpenCL and more.
|
||||
|
@ -143,14 +176,6 @@ OpenCL and more.
|
|||
:language: bash
|
||||
|
||||
|
||||
rocminfo
|
||||
--------
|
||||
Yes, ``rocminfo``
|
||||
|
||||
.. literalinclude:: _static/toolchain/rocm-6.0.2/build-rocminfo.sh
|
||||
:language: bash
|
||||
|
||||
|
||||
rocBLAS
|
||||
-------
|
||||
Needed before hipBLAS.
|
||||
|
@ -203,14 +228,6 @@ hipBLAS
|
|||
:language: bash
|
||||
|
||||
|
||||
rocm-bandwidth-test
|
||||
-------------------
|
||||
``rocm-bandwidth-test``.
|
||||
|
||||
.. literalinclude:: _static/toolchain/rocm-6.0.2/build-rocm-bandwidth-test.sh
|
||||
:language: bash
|
||||
|
||||
|
||||
HOLD
|
||||
----
|
||||
Don't upgrade over these files. Debian has higher epochs.
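
A dry-run sketch of the hold step, using the package names from the ``apt-mark hold`` command in this document. The ``hold_cmd`` helper is hypothetical; it only prints the command so it can be reviewed before being run as root.

```shell
#!/bin/sh
# Package names taken from the apt-mark hold command in this document.
HOLD_PKGS="hipcc llvm rocm-cmake rocm-device-libs rocminfo"

# hold_cmd (hypothetical helper): print the apt-mark command instead of
# running it, so nothing changes on the system (dry run).
hold_cmd() {
    printf 'apt-mark hold %s\n' "$HOLD_PKGS"
}

hold_cmd
```

After running the printed command as root, ``apt-mark showhold`` lists the held packages.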
|
||||
|
|