@niwinanto niwinanto commented Apr 9, 2025

VEC/ACC/FIFO registers can be copied to each other, which can be used as an alternative to a costly stack spill. However, the problem with having a composite register class is that the sub-register indices corresponding to vectors and accumulators do not cover the correspondingly sized registers, which leads to undefined uses if we use the same sub-register indices for both.

The idea is to mimic the vector register composition hierarchy for accumulators (only for the smaller types, because 2048 does not matter) by creating dummy 256-bit accumulators and reverting the separate sub-register indices. Now, vector and accumulator registers have the same lane masks.
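To make the lane-mask argument concrete, here is a small Python toy model. It is only a sketch of the idea and does not reproduce how TableGen actually assigns lane masks. The class names (`vec1024`, `acc512`, `acc256_dummy`, ...) and the 256-bit index names `sub_256_lo`/`sub_256_hi` are assumptions made up for illustration; only `sub_512_lo`/`sub_512_hi` appear in this PR.

```python
from typing import Dict, FrozenSet, Tuple

# Toy hierarchy: register class -> {sub-register index -> sub-register class}.
# A class with no entry is treated as a leaf (it has no sub-registers).
Hierarchy = Dict[str, Dict[str, str]]
Lane = Tuple[str, ...]  # path of sub-register indices down to a leaf


def leaf_lanes(h: Hierarchy, reg: str, prefix: Lane = ()) -> FrozenSet[Lane]:
    """Return every leaf reachable from `reg`, named by its index path."""
    subregs = h.get(reg, {})
    if not subregs:
        return frozenset({prefix})
    lanes = set()
    for idx, sub in subregs.items():
        lanes |= leaf_lanes(h, sub, prefix + (idx,))
    return frozenset(lanes)


def lanes_of_index(h: Hierarchy, reg: str, idx: str) -> FrozenSet[Lane]:
    """Leaves covered by sub-register index `idx` of register class `reg`."""
    return leaf_lanes(h, h[reg][idx], (idx,))


# Hypothetical vector hierarchy: 1024 -> 512 -> 256.
vector: Hierarchy = {
    "vec1024": {"sub_512_lo": "vec512", "sub_512_hi": "vec512"},
    "vec512": {"sub_256_lo": "vec256", "sub_256_hi": "vec256"},
}

# Hypothetical accumulator hierarchy without 256-bit registers.
acc_before: Hierarchy = {
    "acc1024": {"sub_512_lo": "acc512", "sub_512_hi": "acc512"},
}

# Same hierarchy after adding dummy 256-bit accumulators.
acc_after: Hierarchy = {
    "acc1024": {"sub_512_lo": "acc512", "sub_512_hi": "acc512"},
    "acc512": {"sub_256_lo": "acc256_dummy", "sub_256_hi": "acc256_dummy"},
}

vec_lanes = lanes_of_index(vector, "vec1024", "sub_512_lo")
print(vec_lanes == lanes_of_index(acc_before, "acc1024", "sub_512_lo"))
print(vec_lanes == lanes_of_index(acc_after, "acc1024", "sub_512_lo"))
```

In this model, a shared `sub_512_lo` names two leaves on the vector side but a single leaf on the accumulator side until the dummy 256-bit level is added, after which both sides agree.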

@niwinanto niwinanto marked this pull request as draft April 9, 2025 13:54
@niwinanto niwinanto changed the title from "[Draft][wip]Niwin.lanemask.regbank" to "[Draft][wip]Mimic vector composition hierarchy for accumulators." Apr 9, 2025
def mCMs : AIE2PAcc1024RegisterClass<(add mCMm)>;
} // let SubRegIndices = [sub_512_lo, sub_512_hi], CoveredBySubRegs = 1

def eCML : AIE2PVector1024RegisterClass<(add cml0, cml1, cml2, cml3, cml4)>;

Curious: was this change also needed, or were the dummy registers enough on their own?

@niwinanto niwinanto force-pushed the niwin.lanemask.regbank branch from 0fb8751 to 0575190 Compare April 10, 2025 12:09
@andcarminati

A description of the idea would be nice, so that we can help.

@@ -33,8 +33,8 @@ def sub_512_hi : SubRegIndex<512, 512>;
def sub_512_acc_lo : SubRegIndex<512, 0>;
def sub_512_acc_hi : SubRegIndex<512, 512>;

def sub_1024_acc_lo : SubRegIndex<1024, 0>;
def sub_1024_acc_hi : SubRegIndex<1024, 1024>;
def sub_1024_lo : SubRegIndex<1024, 0>;

I think this will lead to another problem. I once heard that we cannot reuse this for different register classes; maybe it is related to the same problem that we have.

@niwinanto niwinanto force-pushed the niwin.lanemask.regbank branch from 0575190 to e8e737d Compare April 29, 2025 08:23
@niwinanto niwinanto changed the title from "[Draft][wip]Mimic vector composition hierarchy for accumulators." to "[Draft][wip] Support composite vector register class for register allocation." Apr 29, 2025
@niwinanto

Preliminary Core_StackSize results.

|----------------------------------------|----------------------------|----------------------------|----------------------------|----------------------------|----------------------------|----------------------------|----------------------------|-------------------------|----------------|----------------|----------------|--------------------|--------------------|----------------|--------------------|--------------------|-----------------|---------------------------|---------------------------|---------------------------|---------------|---------------|---------------|---------------|---------------|---------------|---------------|----------------|----------------|----------------|----------------|----------------|-------------------|-------------------|------------------------------|--------------------|------------------------|------------------------|------------------------|------------------------|---------------|---------------|---------------|-------------------|---------------|---------------|------------------------|------------------------|------------------------|------------------------|------------------------|------------------------|------------------------|--------------------|-----------------|------------------------|---------------|--------------------|--------------|
| Core_StackSize                         | ReduceMeanAxis_1_aie2_bf16 | ReduceMeanAxis_2_aie2_bf16 | ReduceMeanAxis_3_aie2_bf16 | ReduceMeanAxis_4_aie2_bf16 | ReduceMeanAxis_5_aie2_bf16 | ReduceMeanAxis_6_aie2_bf16 | ReduceMeanAxis_7_aie2_bf16 | GeluTemplated_aie2_bf16 | Conv2D_bf16_20 | Conv2D_bf16_21 | Conv2D_bf16_22 | GEMM_bfp16_0_AIE2p | GEMM_bfp16_1_AIE2p | Mish_aie2_int8 | Conv2D_bfp16_OC8_1 | Conv2D_bfp16_OC8_7 | Elu_aie2_bf16_0 | Conv2D_bfp16_PSUM_FLOAT_0 | Conv2D_bfp16_PSUM_FLOAT_1 | Conv2D_bfp16_PSUM_FLOAT_2 | Conv2D_bf16_0 | Conv2D_bf16_1 | Conv2D_bf16_3 | Conv2D_bf16_4 | Conv2D_bf16_5 | Conv2D_bf16_6 | Conv2D_bf16_7 | Conv2D_bf16_16 | Conv2D_bf16_17 | Conv2D_bf16_18 | Conv2D_bf16_19 | Conv2D_bf16_23 | Conv2D_bf16_OW8_0 | Conv2D_bf16_OW8_1 | LayerNormC8Part2_aie2_int8_0 | Conv2D_bfp16_OC8_5 | ArgMaxAxis_2_aie2_bf16 | ArgMaxAxis_4_aie2_bf16 | ArgMinAxis_2_aie2_bf16 | ArgMinAxis_4_aie2_bf16 | Sin_aie2_bf16 | Sqrt_int8_0   | Sqrt_int8_1   | Rsqrt_aie2_int8_0 | Conv2D_DW_0   | Conv2D_DW_1   | Conv2D_bf16_FC_AIE2p_0 | Conv2D_bf16_FC_AIE2p_1 | Conv2D_bf16_FC_AIE2p_2 | Conv2D_bf16_FC_AIE2p_3 | Conv2D_bf16_FC_AIE2p_4 | Conv2D_bf16_FC_AIE2p_5 | FullyConnect_aie2_bf16 | Mish_aie2_bfloat16 | Erf_aie2_int8_0 | FullyConnect_aie2_int8 | Ceil_bfloat16 | SoftSign_aie2_int8 | Average diff |
|----------------------------------------|----------------------------|----------------------------|----------------------------|----------------------------|----------------------------|----------------------------|----------------------------|-------------------------|----------------|----------------|----------------|--------------------|--------------------|----------------|--------------------|--------------------|-----------------|---------------------------|---------------------------|---------------------------|---------------|---------------|---------------|---------------|---------------|---------------|---------------|----------------|----------------|----------------|----------------|----------------|-------------------|-------------------|------------------------------|--------------------|------------------------|------------------------|------------------------|------------------------|---------------|---------------|---------------|-------------------|---------------|---------------|------------------------|------------------------|------------------------|------------------------|------------------------|------------------------|------------------------|--------------------|-----------------|------------------------|---------------|--------------------|--------------|
| mllib_composedreg_full_no_change_peano |                        320 |                        320 |                        320 |                        320 |                        320 |                        320 |                        320 |                     704 |            640 |            640 |            640 |               4416 |               4416 |           1856 |               1152 |               1152 |            3968 |                      1088 |                      1088 |                      1088 |           512 |           512 |           512 |           512 |           512 |           512 |           512 |            512 |            512 |            512 |            512 |            512 |               512 |               512 |                         1664 |               1216 |                   1280 |                   1280 |                   1280 |                   1280 |          1408 |           832 |           832 |              1344 |           448 |           448 |                    640 |                    640 |                    640 |                    640 |                    640 |                    640 |                   1280 |                832 |             640 |                    640 |           896 |               1536 | +0.00%       |
|----------------------------------------|----------------------------|----------------------------|----------------------------|----------------------------|----------------------------|----------------------------|----------------------------|-------------------------|----------------|----------------|----------------|--------------------|--------------------|----------------|--------------------|--------------------|-----------------|---------------------------|---------------------------|---------------------------|---------------|---------------|---------------|---------------|---------------|---------------|---------------|----------------|----------------|----------------|----------------|----------------|-------------------|-------------------|------------------------------|--------------------|------------------------|------------------------|------------------------|------------------------|---------------|---------------|---------------|-------------------|---------------|---------------|------------------------|------------------------|------------------------|------------------------|------------------------|------------------------|------------------------|--------------------|-----------------|------------------------|---------------|--------------------|--------------|
| mllib_composedreg_full_6_peano         |                        448 |                        448 |                        448 |                        448 |                        448 |                        448 |                        448 |                     640 |            576 |            576 |            576 |               3968 |               3968 |           1664 |               1024 |               1024 |            3520 |                       960 |                       960 |                       960 |           448 |           448 |           448 |           448 |           448 |           448 |           448 |            448 |            448 |            448 |            448 |            448 |               448 |               448 |                         1408 |               1024 |                   1024 |                   1024 |                   1024 |                   1024 |          1088 |           640 |           640 |              1024 |           320 |           320 |                    448 |                    448 |                    448 |                    448 |                    448 |                    448 |                    896 |                512 |             384 |                    384 |           384 |                384 | -1.30%       |
|----------------------------------------|----------------------------|----------------------------|----------------------------|----------------------------|----------------------------|----------------------------|----------------------------|-------------------------|----------------|----------------|----------------|--------------------|--------------------|----------------|--------------------|--------------------|-----------------|---------------------------|---------------------------|---------------------------|---------------|---------------|---------------|---------------|---------------|---------------|---------------|----------------|----------------|----------------|----------------|----------------|-------------------|-------------------|------------------------------|--------------------|------------------------|------------------------|------------------------|------------------------|---------------|---------------|---------------|-------------------|---------------|---------------|------------------------|------------------------|------------------------|------------------------|------------------------|------------------------|------------------------|--------------------|-----------------|------------------------|---------------|--------------------|--------------|
| Total diff                             | REGR(+40.00%)              | REGR(+40.00%)              | REGR(+40.00%)              | REGR(+40.00%)              | REGR(+40.00%)              | REGR(+40.00%)              | REGR(+40.00%)              | IMPR(-9.09%)            | IMPR(-10.00%)  | IMPR(-10.00%)  | IMPR(-10.00%)  | IMPR(-10.14%)      | IMPR(-10.14%)      | IMPR(-10.34%)  | IMPR(-11.11%)      | IMPR(-11.11%)      | IMPR(-11.29%)   | IMPR(-11.76%)             | IMPR(-11.76%)             | IMPR(-11.76%)             | IMPR(-12.50%) | IMPR(-12.50%) | IMPR(-12.50%) | IMPR(-12.50%) | IMPR(-12.50%) | IMPR(-12.50%) | IMPR(-12.50%) | IMPR(-12.50%)  | IMPR(-12.50%)  | IMPR(-12.50%)  | IMPR(-12.50%)  | IMPR(-12.50%)  | IMPR(-12.50%)     | IMPR(-12.50%)     | IMPR(-15.38%)                | IMPR(-15.79%)      | IMPR(-20.00%)          | IMPR(-20.00%)          | IMPR(-20.00%)          | IMPR(-20.00%)          | IMPR(-22.73%) | IMPR(-23.08%) | IMPR(-23.08%) | IMPR(-23.81%)     | IMPR(-28.57%) | IMPR(-28.57%) | IMPR(-30.00%)          | IMPR(-30.00%)          | IMPR(-30.00%)          | IMPR(-30.00%)          | IMPR(-30.00%)          | IMPR(-30.00%)          | IMPR(-30.00%)          | IMPR(-38.46%)      | IMPR(-40.00%)   | IMPR(-40.00%)          | IMPR(-57.14%) | IMPR(-75.00%)      | -1.30%       |
|----------------------------------------|----------------------------|----------------------------|----------------------------|----------------------------|----------------------------|----------------------------|----------------------------|-------------------------|----------------|----------------|----------------|--------------------|--------------------|----------------|--------------------|--------------------|-----------------|---------------------------|---------------------------|---------------------------|---------------|---------------|---------------|---------------|---------------|---------------|---------------|----------------|----------------|----------------|----------------|----------------|-------------------|-------------------|------------------------------|--------------------|------------------------|------------------------|------------------------|------------------------|---------------|---------------|---------------|-------------------|---------------|---------------|------------------------|------------------------|------------------------|------------------------|------------------------|------------------------|------------------------|--------------------|-----------------|------------------------|---------------|--------------------|--------------|
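For reference, the per-benchmark entries in the "Total diff" row are consistent with a plain relative change of the second row against the first. The Python sketch below assumes that formula; the tool that produced the table is not shown here, and the function name and the `SAME` label are made up for illustration.

```python
def stack_size_diff(baseline: int, candidate: int) -> str:
    """Format the relative stack-size change of `candidate` vs `baseline`.

    Assumed formula: (candidate - baseline) / baseline, in percent.
    A larger stack is reported as a regression, a smaller one as an
    improvement. The "SAME" label is an assumption; the table above
    has no unchanged entries.
    """
    change = (candidate - baseline) / baseline * 100.0
    if change > 0:
        label = "REGR"
    elif change < 0:
        label = "IMPR"
    else:
        label = "SAME"
    return f"{label}({change:+.2f}%)"


# A few (baseline, candidate) pairs copied from the table above.
samples = {
    "ReduceMeanAxis_1_aie2_bf16": (320, 448),
    "GeluTemplated_aie2_bf16": (704, 640),
    "Ceil_bfloat16": (896, 384),
    "SoftSign_aie2_int8": (1536, 384),
}

for name, (old, new) in samples.items():
    print(f"{name}: {stack_size_diff(old, new)}")
```

The "Average diff" column is not reproduced by averaging only the columns shown, so it is left out of this sketch.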

…to mimic vector register hierarchy

Subregister indices corresponding to vectors and accumulators do not cover the
correspondingly sized registers, which leads to undefined uses if we use the
same subregister indices for both. Having separate subregister indices solves
this problem. However, with that approach, we cannot allocate a vector register
for an accumulator (or vice versa). The idea is to mimic the vector register
composition hierarchy for accumulators (only for the smaller types, because
2048 does not matter) by creating dummy 256-bit accumulators and reverting the
separate subregister indices. Now, vector and accumulator registers have the
same lane masks.

VEC/ACC/FIFO registers can be copied to each other, which can be used as an
alternative to a costly stack spill.

Upon spilling the composed register, we might need to select the spill/reload
instruction based on the register chosen by the register allocator.

ItineraryRegPairs information is missing for the FIFO store registers (sf) with
VMOV_alu_mv_mv_x, so we might see wrongly scheduled code. The FIFO store
registers are removed from the composed register class for the time being.
@niwinanto niwinanto force-pushed the niwin.lanemask.regbank branch from e8e737d to d953477 Compare April 29, 2025 15:23