
OrderedDemote2To() f64->f32 ? #1903

Open
Pflugshaupt opened this issue Dec 15, 2023 · 3 comments


@Pflugshaupt

I'm currently migrating my DSP codebase from my own attempt at a library to Highway. Things have mostly gone well, but one thing puzzles me: I have some algorithms that work on float lanes but have to do intermediate calculations at double precision. My own library allowed twice-as-wide f64 aggregates for that, but I see that Highway won't do Twice<d> on full-width tags.
That's fair enough, so I went with PromoteLowerTo() and PromoteUpperTo() to convert each float vector into two double vectors. However, to go back to float later, I found that OrderedDemote2To() is curiously missing for double to float. Is there a specific reason for that, or am I missing some other function? I just want to convert N double lanes to N float lanes using half as many registers. It seems like something that would come up quite often with algorithms requiring full float precision results.

I ended up writing this, but it seems a bit silly:

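        // d is the float tag of the result; a and b are double vectors that
        // each have half as many lanes as d.
        auto dbl2float = [](auto d, auto a, auto b) HWY_ATTR {
            const Half<decltype(d)> hd;
            // Combine(d, hi, lo): b fills the upper half, a the lower half.
            return Combine(d, DemoteTo(hd, b), DemoteTo(hd, a));
        };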
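
To make the intended lane ordering concrete, here is a minimal scalar sketch of the round trip described above. It deliberately does not use Highway: std::array stands in for a vector register, and the names Lanes, ModelPromoteLower, ModelPromoteUpper and ModelOrderedDemote2 are hypothetical helpers invented for illustration, not Highway functions. It only models the assumption that the lower half of the float result comes from a and the upper half from b.

```cpp
#include <array>
#include <cstddef>

// Hypothetical scalar stand-in for a vector register: N lanes of type T.
template <typename T, std::size_t N>
using Lanes = std::array<T, N>;

// Models the promote-lower step: widen the lower N/2 float lanes to double.
template <std::size_t N>
Lanes<double, N / 2> ModelPromoteLower(const Lanes<float, N>& v) {
  Lanes<double, N / 2> out{};
  for (std::size_t i = 0; i < N / 2; ++i) {
    out[i] = static_cast<double>(v[i]);
  }
  return out;
}

// Models the promote-upper step: widen the upper N/2 float lanes to double.
template <std::size_t N>
Lanes<double, N / 2> ModelPromoteUpper(const Lanes<float, N>& v) {
  Lanes<double, N / 2> out{};
  for (std::size_t i = 0; i < N / 2; ++i) {
    out[i] = static_cast<double>(v[N / 2 + i]);
  }
  return out;
}

// Models the workaround: narrow two double "registers" into one float
// "register" with twice as many lanes, a in the lower half, b in the upper.
template <std::size_t H>
Lanes<float, 2 * H> ModelOrderedDemote2(const Lanes<double, H>& a,
                                        const Lanes<double, H>& b) {
  Lanes<float, 2 * H> out{};
  for (std::size_t i = 0; i < H; ++i) {
    out[i] = static_cast<float>(a[i]);      // lower half from a
    out[H + i] = static_cast<float>(b[i]);  // upper half from b
  }
  return out;
}
```

In this model, promoting a float array with both helpers and passing the two results to ModelOrderedDemote2 returns the original lanes in their original order, which is the property the lambda above relies on.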
@jan-wassenberg
Member

Hi, we don't have f64->f32 OrderedDemote2To because x86 and SVE can't do that very efficiently and we did not yet have a use-case.

However, RVV and NEON could do this a bit more efficiently. Would you be interested in having a go at adding support? That would involve updating quick_reference.md to mention that f64->f32 is supported; adding ForShrinkableVectors<TestFloatOrderedDemote2To>()(float()); at demote_test.cc:678; copying your implementation to generic_ops-inl.h with the usual #if (defined(HWY_NATIVE_ 'include guard'; and adding implementations to rvv-inl.h and arm_neon-inl.h.

@Pflugshaupt
Author

Ok, I'll give it a try once I'm done migrating to Highway and have gained some more experience with it. That'll be in January. Thanks for letting me know I'm not missing a different way to do f64->f32. One issue might be that I have zero experience with RISC-V/RVV.

@jan-wassenberg
Member

Sounds good :)
No worries, RVV already has an existing function for that; it may be enough simply to enable f64->f32 in the template SFINAE. It would also be fine to write a TODO instead; in the meantime, that target would be covered by the generic code.
