Choosing NEON over SVE when fixed size vectors are used where possible #2060

Open
Ryo-not-rio opened this issue Apr 4, 2024 · 7 comments

Comments

@Ryo-not-rio

I've noticed quite a severe performance hit when writing Highway code that uses fixed-size vectors whose size is smaller than the number of lanes available in SVE. This occurred when porting NEON code written for 128-bit vectors to Highway on an SVE machine that has 256-bit SVE vectors. Would it be possible for Highway to choose NEON vectors for fixed-size vectors whose specified size is smaller than or equal to 128 bits?

@johnplatts
Contributor

It is possible to bitcast SVE vectors to NEON vectors and vice versa on GCC and Clang releases that have support for the arm_neon_sve_bridge.h header, including Clang 14 and later and GCC 14 and later.

On compilers that support the arm_neon_sve_bridge.h header, a uint8x16_t vector can be bitcast to an svuint8_t vector with svset_neonq_u8(svundef_u8(), v), and an svuint8_t vector can be bitcast to a uint8x16_t vector with svget_neonq_u8(v).

It is also possible to re-implement the HWY_SVE2_128 target to use the fixed-size vector, mask, and tuple types in arm_neon-inl.h (which are wrappers around fixed-size NEON vectors) instead of the SVE scalable vector, mask, and tuple types in arm_sve-inl.h. This works on compilers that support the arm_neon_sve_bridge.h header, because full SVE vectors are exactly 16 bytes on the HWY_SVE2_128 target.

@johnplatts
Contributor

Here is a link to a Compiler Explorer snippet that demonstrates the use of the ARM NEON SVE Bridge intrinsics (which are defined in the arm_neon_sve_bridge.h header) to convert between NEON vectors and SVE vectors on the HWY_NEON target:
https://godbolt.org/z/EK8h36Err

@jan-wassenberg
Member

If I understand correctly, the issue is that we use FixedTag<uint32_t, 4>, which on SVE requires Load/Store etc. to do extra work to limit each operation to 128 bits.

+1 to John's comment that SVE2_128 would work when running on Neoverse V2, but I think this use case is running on V1 which actually has 256-bit vectors.

I don't have experience with the SVE/NEON bridge, that sounds interesting. But perhaps I don't fully understand the use case. If we are porting from NEON code, why not just use the NEON target? Is the issue that dynamic dispatch chooses SVE, even though for this use case NEON would be better?

If so, we can either set HWY_DISABLED_TARGETS (HWY_NEON|HWY_NEON_WITHOUT_AES), or call hwy::DisableTargets at runtime to influence the dynamic dispatch.
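To illustrate how disabling targets influences dispatch, here is a minimal, self-contained sketch of bitmask-based target selection. It is a toy model only: the `kToy*` constants and `ChooseTarget` are hypothetical names invented for this example, not Highway's real `HWY_*` constants or its dispatch code. It assumes that a lower bit means a more preferred target and that a scalar fallback is always available.

```cpp
#include <cstdint>

// Hypothetical target bits for illustration only. These are NOT Highway's
// real HWY_* constants. Assumption: a lower bit is a more preferred target.
constexpr std::uint32_t kToySve256 = 1u << 0;
constexpr std::uint32_t kToyNeon = 1u << 1;
constexpr std::uint32_t kToyScalar = 1u << 2;  // fallback, always available

// Isolates the lowest set bit of `bits` (the most preferred target).
constexpr std::uint32_t LowestBit(std::uint32_t bits) {
  return bits & (~bits + 1u);
}

// Picks the most preferred target that the CPU supports and that has not
// been disabled. The scalar fallback can never be disabled in this toy model.
constexpr std::uint32_t ChooseTarget(std::uint32_t supported,
                                     std::uint32_t disabled) {
  return LowestBit((supported & ~disabled) | kToyScalar);
}
```

In this model, a CPU that supports both toy targets dispatches to `kToySve256` by default, and passing `kToySve256` as the disabled mask makes the same CPU fall back to `kToyNeon`.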

@Ryo-not-rio
Author

Yes, the use case is running on V1, where scalable vectors are used in some parts of the code and fixed-size vectors in others. We haven't tested dynamic dispatch, only static dispatch, but even with dynamic dispatch I imagine there is currently no way to use NEON vectors in some parts of the code and SVE in others. Is my understanding correct, or is there actually a way to specify this?

@jan-wassenberg
Member

hm, if the code is isolated and not alternating between SVE/NEON in the same function or source file, it is easy to compile one source file with SVE disabled (so it would use NEON on Arm), and the other one not.

I suppose we could compile both NEON and SVE in the SVE target, and whenever the N in Simd<T, N, kPow2> is <= 16/sizeof(T), only enable the NEON functions. This would probably require quite a few updates to the SFINAE conditions in both files, disabling SVE for small vectors, and disabling NEON for non-capped.

@Ryo-not-rio
Author

I think that would be the ideal solution, but for now, how would one specify whether to use NEON or SVE on a per-function basis? I don't envision mixing NEON and SVE in one function, so a way to specify it per function would most likely be enough.

@jan-wassenberg
Member

It can work like this.

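// Vectors of at most 16 bytes: NeonType is a placeholder for the NEON type.
template <class D, HWY_IF_V_SIZE_LE_D(D, 16)>
NeonType Func(D d) { return NeonType(); }

// Vectors larger than 16 bytes: SveType is a placeholder for the SVE type.
template <class D, HWY_IF_V_SIZE_GT_D(D, 16)>
SveType Func(D d) { return SveType(); }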

For functions that do not take a D = Simd<T, N, kPow2> argument, we rely on normal C++ overloading, because NeonType and SveType are not the same type.
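The size-based overload selection described above can be tried out without Highway. The following is a minimal sketch using plain `std::enable_if`; `ToyTag`, `Backend`, and this `Func` are hypothetical stand-ins invented for the example, not Highway's real `Simd<T, N, kPow2>`, vector types, or `HWY_IF_V_SIZE_*` macros. It assumes a tag that carries the lane type and a fixed lane count, and picks the "NEON" overload when the vector fits in 16 bytes (that is, when N <= 16 / sizeof(T)).

```cpp
#include <cstddef>
#include <cstdint>
#include <type_traits>

// Hypothetical tag: T is the lane type, N the (capped) number of lanes.
template <typename T, std::size_t N>
struct ToyTag {
  static constexpr std::size_t kBytes = N * sizeof(T);
};

// Stands in for "which implementation was selected".
enum class Backend { kNeon, kSve };

// Enabled only when the vector fits in 128 bits.
template <class D, typename std::enable_if<(D::kBytes <= 16), int>::type = 0>
constexpr Backend Func(D) {
  return Backend::kNeon;
}

// Enabled only for vectors larger than 128 bits.
template <class D, typename std::enable_if<(D::kBytes > 16), int>::type = 0>
constexpr Backend Func(D) {
  return Backend::kSve;
}
```

Because the two conditions are mutually exclusive, exactly one overload is viable for any tag, so a call such as `Func(ToyTag<std::uint32_t, 4>{})` resolves at compile time with no runtime branch.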

3 participants