Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

ARM64 SVE target failed tests #1799

Open
fbarbari opened this issue Oct 5, 2023 · 9 comments
Open

ARM64 SVE target failed tests #1799

fbarbari opened this issue Oct 5, 2023 · 9 comments

Comments

@fbarbari
Copy link

fbarbari commented Oct 5, 2023

Hello,
I compiled Highway 1.0.7 on a Fujitsu A64FX and these tests failed.

400 - HwyCryptoTestGroup/HwyCryptoTest.TestAllAES/NEON  # GetParam() = 268435456 (ILLEGAL)
404 - HwyCryptoTestGroup/HwyCryptoTest.TestAllAESInverse/NEON  # GetParam() = 268435456 (ILLEGAL)
408 - HwyCryptoTestGroup/HwyCryptoTest.TestAllCLMul/NEON  # GetParam() = 268435456 (ILLEGAL)

I configured Highway with these flags with cmake 3.26.3:

cmake -S . -B build -DCMAKE_BUILD_TYPE=Release -DCMAKE_INSTALL_PREFIX=`realpath ~/local` -DHWY_SYSTEM_GTEST=ON -DCMAKE_CXX_FLAGS_RELEASE="-O3 -DNDEBUG -mcpu=a64fx"

When compiling (with clang 16.0.6), these are the detected targets:

Compiled HWY_TARGETS:   SVE_256 SVE
HWY_ATTAINABLE_TARGETS: SVE_256 SVE NEON NEON_WITHOUT_AES EMU128
HWY_BASELINE_TARGETS:   SVE NEON NEON_WITHOUT_AES EMU128
HWY_STATIC_TARGET:      SVE
HWY_BROKEN_TARGETS:    
HWY_DISABLED_TARGETS:  
Current CPU supports:   SVE NEON NEON_WITHOUT_AES EMU128 SCALAR
@jan-wassenberg
Copy link
Member

Thanks for reporting the issue. Looks like AES related tests, and the NEON target indeed includes AES instructions. I see from the A64FX microarchitecture guide that AESD/AESE etc are supported.

Are you able to run with ctest --output-on-failure option to print details on what exactly is failing? And if it actually is a SIGILL exception, can you run the crypto_test inside a debugger to see which instruction it is?

@fbarbari
Copy link
Author

fbarbari commented Oct 5, 2023

ctest --output-on-failure option output:

Test project /ccc/cont005/home/cineca/barbarif/highway-1.0.7/build
    Start 400: HwyCryptoTestGroup/HwyCryptoTest.TestAllAES/NEON  # GetParam() = 268435456
1/3 Test #400: HwyCryptoTestGroup/HwyCryptoTest.TestAllAES/NEON  # GetParam() = 268435456 ..........***Exception: Illegal  0.17 sec
Running main() from /ccc/cont005/home/cineca/barbarif/googletest-1.13.0/googletest/src/gtest_main.cc
Note: Google Test filter = HwyCryptoTestGroup/HwyCryptoTest.TestAllAES/NEON
[==========] Running 1 test from 1 test suite.
[----------] Global test environment set-up.
[----------] 1 test from HwyCryptoTestGroup/HwyCryptoTest
[ RUN      ] HwyCryptoTestGroup/HwyCryptoTest.TestAllAES/NEON

    Start 404: HwyCryptoTestGroup/HwyCryptoTest.TestAllAESInverse/NEON  # GetParam() = 268435456
2/3 Test #404: HwyCryptoTestGroup/HwyCryptoTest.TestAllAESInverse/NEON  # GetParam() = 268435456 ...***Exception: Illegal  0.17 sec
Running main() from /ccc/cont005/home/cineca/barbarif/googletest-1.13.0/googletest/src/gtest_main.cc
Note: Google Test filter = HwyCryptoTestGroup/HwyCryptoTest.TestAllAESInverse/NEON
[==========] Running 1 test from 1 test suite.
[----------] Global test environment set-up.
[----------] 1 test from HwyCryptoTestGroup/HwyCryptoTest
[ RUN      ] HwyCryptoTestGroup/HwyCryptoTest.TestAllAESInverse/NEON

    Start 408: HwyCryptoTestGroup/HwyCryptoTest.TestAllCLMul/NEON  # GetParam() = 268435456
3/3 Test #408: HwyCryptoTestGroup/HwyCryptoTest.TestAllCLMul/NEON  # GetParam() = 268435456 ........***Exception: Illegal  0.18 sec
Running main() from /ccc/cont005/home/cineca/barbarif/googletest-1.13.0/googletest/src/gtest_main.cc
Note: Google Test filter = HwyCryptoTestGroup/HwyCryptoTest.TestAllCLMul/NEON
[==========] Running 1 test from 1 test suite.
[----------] Global test environment set-up.
[----------] 1 test from HwyCryptoTestGroup/HwyCryptoTest
[ RUN      ] HwyCryptoTestGroup/HwyCryptoTest.TestAllCLMul/NEON


crypto_test source:

hwy::N_NEON::AESRound(hwy::N_NEON::Vec128<unsigned char, 16ul>, hwy::N_NEON::Vec128<unsigned char, 16ul>) ()

crypto_test asm:

aese   v0.16b, v1.16b

(do you need the asm of the whole function?)

@jan-wassenberg
Copy link
Member

Thanks for confirming. Interesting, it does seem to be an AES instruction that is faulting.
It is possible that AES instructions are disabled/trapped by the OS? Does /proc/cpuinfo mention "aes"?

@fbarbari
Copy link
Author

fbarbari commented Oct 6, 2023

Actually, it doesn't support AES instructions, to my surprise! These are the flags reported in /proc/cpuinfo: fp asimd evtstrm sha1 sha2 crc32 atomics fphp asimdhp cpuid asimdrdm fcma dcpop sve. So I guess it's not Highway's fault.

@jan-wassenberg
Copy link
Member

Interesting, thanks for confirming. Indeed we are detecting that the CPU supports AES, according to the OS, via this code in targets.cc:

  const CapBits hw = getauxval(AT_HWCAP);
#if HWY_ARCH_ARM_A64
  // .. but not necessarily AES, which is required for HWY_NEON.
#if defined(HWCAP_AES)
  if (hw & HWCAP_AES) {
    bits |= HWY_NEON;

Can you confirm via printf or abort that this bits |= HWY_NEON; line is in fact reached?

If so, it's very curious that getauxval would be incorrect. Our code seems to match an example.

Are there any other flags we could check to detect this properly on your system?

@fbarbari
Copy link
Author

fbarbari commented Oct 6, 2023

I'm sorry, I think I don't know how to do that. I modified the code you mentioned with

...
printf("Called arm::detectTargets()\n");
#if HWY_ARCH_ARM_A64
  bits |= HWY_NEON_WITHOUT_AES;  // aarch64 always has NEON and VFPv4..

  // .. but not necessarily AES, which is required for HWY_NEON.
#if defined(HWCAP_AES)
printf("'bits' before : %d (0x%16d)\n", bits, bits);
  if (hw & HWCAP_AES) {
    bits |= HWY_NEON;
printf("'bits' after : %d (0x%016d)\n", bits, bits);
  }
#else
printf("No HWCAP_AES defined\n");
#endif  // HWCAP_AES
...

and run ctest --verbose but nothing showed up on the console. The same happened by manually running crypto_test, targets_test and hwy_list_targets.

@jan-wassenberg
Copy link
Member

Thanks for adding the printfs! I see that we only add HWY_NEON to HWY_BASELINE_NEON if the compiler sets __ARM_FEATURE_AES, which looks correct.

As to why the printf is skipped: we are compiling with clang, but the runtime dispatch is only known to work with (and thus enabled for) GCC. If !HWY_HAVE_RUNTIME_DISPATCH [1], then we assume the presence of all CPU features enabled by compiler flags instead of detecting. This is risky but reasonable because the compiler could also be generating those instructions, and if they weren't supported by the CPU, the binary would crash anyway.

But it seems that your -mcpu sets __ARM_FEATURE_AES, which might be a misunderstanding; I think that's NEON AES, which your CPU indeed does not support.

To work around this, we could still run the CPU feature detection, and thus skip the HWY_NEON target which requires AES.
In targets.cc, could you change the second HWY_ARCH_ARM && HWY_HAVE_RUNTIME_DISPATCH (in DetectTargets) to HWY_ARCH_ARM and confirm that fixes the problem?

1: The prereqs for HWY_HAVE_RUNTIME_DISPATCH are HWY_COMPILER_GCC_ACTUAL && HWY_OS_LINUX && !defined(TOOLCHAIN_MISS_SYS_AUXV_H)

@fbarbari
Copy link
Author

fbarbari commented Oct 31, 2023

Hello again, sorry for the delay.

Are there any other flags we could check to detect this properly on your system?

These are the ARM-related flags obtained from clang++ -mcpu=a64fx -dM -E - </dev/null

#define __AARCH64EL__ 1
#define __AARCH64_CMODEL_SMALL__ 1
#define __ARM_64BIT_STATE 1
#define __ARM_ACLE 200
#define __ARM_ALIGN_MAX_STACK_PWR 4
#define __ARM_ARCH 8
#define __ARM_ARCH_ISA_A64 1
#define __ARM_ARCH_PROFILE 'A'
#define __ARM_FEATURE_AES 1
#define __ARM_FEATURE_ATOMICS 1
#define __ARM_FEATURE_CLZ 1
#define __ARM_FEATURE_CRC32 1
#define __ARM_FEATURE_CRYPTO 1
#define __ARM_FEATURE_DIRECTED_ROUNDING 1
#define __ARM_FEATURE_DIV 1
#define __ARM_FEATURE_FMA 1
#define __ARM_FEATURE_FP16_SCALAR_ARITHMETIC 1
#define __ARM_FEATURE_FP16_VECTOR_ARITHMETIC 1
#define __ARM_FEATURE_IDIV 1
#define __ARM_FEATURE_LDREX 0xF
#define __ARM_FEATURE_NUMERIC_MAXMIN 1
#define __ARM_FEATURE_QRDMX 1
#define __ARM_FEATURE_SHA2 1
#define __ARM_FEATURE_SVE 1
#define __ARM_FEATURE_SVE_VECTOR_OPERATORS 2
#define __ARM_FEATURE_UNALIGNED 1
#define __ARM_FP 0xE
#define __ARM_FP16_ARGS 1
#define __ARM_FP16_FORMAT_IEEE 1
#define __ARM_NEON 1
#define __ARM_NEON_FP 0xE
#define __ARM_NEON_SVE_BRIDGE 1
#define __ARM_PCS_AAPCS64 1
#define __ARM_SIZEOF_MINIMAL_ENUM 4
#define __ARM_SIZEOF_WCHAR_T 4

In targets.cc, could you change the second HWY_ARCH_ARM && HWY_HAVE_RUNTIME_DISPATCH (in DetectTargets) to HWY_ARCH_ARM and confirm that fixes the problem?

I changed the one on line 456 (hope it's what you meant). And it outputs this:

highway-1.0.7/hwy/targets.cc:462:11: error: use of undeclared identifier 'arm'
  bits |= arm::DetectTargets();
          ^
1 error generated.

So I tried to also change the one on line 333 to HWY_ARCH_ARM, it compiles and writes this during configuration:

Config: emu128:0 scalar:0 static:0 all_attain:0 is_test:0
Compiled HWY_TARGETS:   SVE_256 SVE
HWY_ATTAINABLE_TARGETS: SVE_256 SVE NEON NEON_WITHOUT_AES EMU128
HWY_BASELINE_TARGETS:   SVE NEON NEON_WITHOUT_AES EMU128
HWY_STATIC_TARGET:      SVE
HWY_BROKEN_TARGETS:    
HWY_DISABLED_TARGETS:  
Called arm::detectTargets()
'bits' before : 536870912 (0x        20000000)
WARNING: CPU supports 0x6000000028000000, software requires 0x2000000038000000
Current CPU supports:   SVE NEON_WITHOUT_AES EMU128 SCALAR

I'm still working with clang 16.0.6. I also made some quick attempts with gcc 11.1.0 but there were some compilation errors.

Hope this helps

@jan-wassenberg
Copy link
Member

A quick update, we are getting close to enabling runtime dispatch for aarch64+clang, which would fix that.

Per the output you provided above, we are correctly detecting that the system somehow does not support AES, even though the CPU should. That should prevent the SIGILL crash.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

2 participants