Releases: google/highway

1.1.0

18 Feb 01:33
  • Add BitCastScalar, DispatchedTarget, Foreach
  • Add Div/Mod and MaskedDiv/ModOr, SaturatedAbs, SaturatedNeg
  • Add InterleaveWholeLower/Upper, Dup128VecFromValues
  • Add IsInteger, IsIntegerLaneType, RemoveVolatile, RemoveCvRef
  • Add MaskedAdd/Sub/Mul/Div/Gather/Min/Max/SatAdd/SatSubOr
  • Add MaskFalse, IfNegativeThenNegOrUndefIfZero, PromoteEven/OddTo
  • Add ReduceMin/Max, 8-bit reductions, f16 <-> f64 conversions
  • Add Span, AlignedArray, matrix-vector mul
  • Add SumsOf2/4, I8 SumsOf8, SumsOfAdjQuadAbsDiff, SumsOfShuffledQuadAbsDiff
  • Add ThreadPool, hierarchical profiler
  • Build: use bazel_platforms
  • Enable clang16 Arm/PPC runtime dispatch, F16 for GCC AVX3_SPR
  • Extend Dot to f32*bf16, FMA to integer
  • Fix: RVV 8-bit overflow, UB in vqsort, big-endian bugs, PPC HTM
  • Improved codegen in various ops, fp16/bf16 tests and conversions
  • New targets: HWY_Z14, HWY_Z15
  • Test: add foreign_arch builders, CodeQL
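
As an illustration of the saturating ops added in 1.1.0, the following is a minimal scalar sketch of what SaturatedAbs and SaturatedNeg compute for a single signed lane. The helper names `ScalarSaturatedAbs` and `ScalarSaturatedNeg` are hypothetical and are not part of Highway; the sketch assumes the usual saturation rule, where the one value whose negation does not fit (the lane minimum) is clamped to the lane maximum.

```cpp
#include <cstdint>
#include <limits>

// Hypothetical scalar reference for one lane; NOT Highway's implementation.
// Negates v, clamping the single overflowing input (the minimum) to the maximum.
template <typename T>
constexpr T ScalarSaturatedNeg(T v) {
  static_assert(std::numeric_limits<T>::is_signed, "signed lanes only");
  return v == std::numeric_limits<T>::min()
             ? std::numeric_limits<T>::max()
             : static_cast<T>(-v);
}

// Absolute value with the same clamping: |min| becomes max instead of wrapping.
template <typename T>
constexpr T ScalarSaturatedAbs(T v) {
  return v < 0 ? ScalarSaturatedNeg(v) : v;
}

// Compile-time checks of the edge case that distinguishes these from Abs/Neg.
static_assert(ScalarSaturatedAbs<int8_t>(-128) == 127, "abs(min) saturates");
static_assert(ScalarSaturatedNeg<int16_t>(-32768) == 32767, "neg(min) saturates");
```

For every input other than the lane minimum, the results match plain negation and absolute value.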

1.0.7

30 Aug 07:06
  • Add LoadNOr, GatherIndexN, ScatterIndexN
  • Add additional float<->int conversions
  • Codegen improvements for 8-bit shift, PPC Compress/Expand
  • Fixes for MSVC, PPC, RVV, WASM, GCC 13, GCC 8.2, i686, f16 type, QEMU 7.2
  • Support CMake args in Debian packaging

1.0.6

11 Aug 15:01
  • Add MaskedGatherIndex, MaskedScatterIndex, LoadN, StoreN
  • Add SatWidenMulPairwiseAdd, SumOfMulQuadAccumulate, PromoteUpperLowerTo
  • Add F64 for Wasm, F64 AbsDiff
  • Add F16 support to AVX3_SPR, RVV tuple (both not yet enabled)
  • Validate all D args in x86 function signatures
  • License: now dual Apache2/BSD3
  • Doc: new users, vcpkg install instructions, AVX10 plans
  • Doc: advice on dynamic dispatch plus -march flags
  • Build: avoid installing hwy_test if !HWY_ENABLE_TESTS
  • Codegen: improved PPC9 Find*True, variable-length CopyBytes
  • Fix: GCC 8.2, MSVC, ICC, PPC9, SVE, arm64 MSVC issues
  • Fix: IfNegativeThenElse, MulFixedPoint15, Debian changelog format
  • Tests: faster builds (split up), use release builds

1.0.5

19 Jul 16:10
  • Add Insert/ExtractBlock, BroadcastBlock/Lane, NumBlocks
  • Add integer Le/Ge and [Neg]MulAdd, extend DemoteTo/PromoteTo
  • Add Leading/TrailingZeroCount, HighestSetBitIndex, ReverseBits
  • Add MaskedLoadOr, tuple Get/Set/Create, ReduceSum, WidenMulPairwiseAdd
  • Add [ZeroExtend]ResizeBitCast, BitwiseIfThenElse, Find[Known]LastTrue
  • Add AESRoundInv, AESKeyGenAssist
  • Add contrib/math Atan2/SinCos, contrib/unroller
  • Add fp16/bf16 support (Armv8, SVE, RVV), HWY_DYNAMIC_POINTER
  • Add OrderedTruncate2To, Per4LaneBlockShuffle, TwoTablesLookupLanes
  • Add SlideUp/Down[Blocks/Lanes], Slide1Up/Down, ReverseLaneBytes
  • Add SetBeforeFirst, SetAtOrBefore/AfterFirst, SetOnlyFirst
  • Add 8-bit Reverse2/4/8, Shl/Shr, RotateRight, Reverse, Mul
  • Add 8/16-bit DupEven/Odd, TableLookupLanes
  • Add F64 ApproximateReciprocal[Sqrt], 32/64-bit SaturatedAdd/Sub
  • Build: Support Bazel modules
  • Codegen improvements
  • Compiler: support Clang 15/16
  • Doc: add GitHub Pages, support policy, evaluation
  • Doc: publish AVX-512 throttling/startup findings
  • Release: add signing
  • Test: add GCC to GitHub Actions
  • VQSort: small-N speedups (fix seeding, func ptr, 8-wide network)
  • VQSort: add BenchAllColdSort, VQSortStatic
  • VQSort: fix subnormal/inf/NaN, support fp16, fix KV types
  • Workarounds: RVV VXRM, x87 excess precision, missing intrinsics
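
To make the bit-manipulation ops added in 1.0.5 concrete, here is a scalar sketch of ReverseBits and the leading/trailing zero counts for a single 32-bit lane. The `Scalar*` helper names are hypothetical and the loops are written for clarity, not speed; they are not how Highway implements these ops. Returning the lane width (32) for a zero input is an assumption of this sketch.

```cpp
#include <cstdint>

// Hypothetical scalar references for one 32-bit lane; NOT Highway's code.

// Reverses the order of all 32 bits: bit 0 becomes bit 31, and so on.
inline uint32_t ScalarReverseBits(uint32_t x) {
  uint32_t r = 0;
  for (int i = 0; i < 32; ++i) {
    r = (r << 1) | (x & 1u);  // append the current lowest bit of x
    x >>= 1;
  }
  return r;
}

// Counts zero bits above the highest set bit; yields 32 when x == 0
// (an assumption of this sketch).
inline uint32_t ScalarLeadingZeroCount(uint32_t x) {
  uint32_t n = 0;
  for (uint32_t bit = 0x80000000u; bit != 0 && (x & bit) == 0; bit >>= 1) {
    ++n;
  }
  return n;
}

// Trailing zeros of x are the leading zeros of the bit-reversed value.
inline uint32_t ScalarTrailingZeroCount(uint32_t x) {
  return ScalarLeadingZeroCount(ScalarReverseBits(x));
}
```

The last helper shows how the three ops relate: reversing the bits turns a trailing-zero count into a leading-zero count.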

1.0.4

17 Mar 15:33
  • Add PPC8..10, SSE2, AVX3_ZEN4, NEON_WITHOUT_AES targets
  • Add Expand, LoadExpand, integer AbsDiff, SumsOf8AbsDiff
  • Improved Half/Twice support, codegen for Shift*Same
  • Support Wasm in Godbolt
  • Faster KV128 sorting
  • Fix armv7 build config, CMake config mode
  • Update RVV intrinsics for 1.0-draft

1.0.3

19 Jan 15:20
  • Add RearrangeToOddPlusEven, Xor3, 8-bit CompressStore, HWY_ASSUME
  • Add contrib/bit_pack for 8/16-bit lanes
  • Add WASM_EMU256 target
  • Documentation improvements
  • Allow opting out of C++ stdlib usage for Compiler Explorer
  • Update for new RVV intrinsics; faster WASM min/max and extmul/q15mul
  • Fix UB, GCC atomic

1.0.2

28 Oct 11:05
  • Add ExclusiveNeither, FindKnownFirstTrue, Ne128
  • Add 16-bit SumOfLanes/ReorderWidenMulAccumulate/ReorderDemote2To
  • Faster sort for low-entropy input, improved pivot selection
  • Add GN build system, Highway FAQ, k32v32 type to vqsort
  • CMake: Support find_package(GTest), add rvv-inl.h, add HWY_ENABLE_TESTS
  • Fix MIPS and C++20 build, Apple LLVM 10.3 detection, EMU128 AllTrue on RVV
  • Fix missing exec_prefix, RVV build, warnings, libatomic linking
  • Work around GCC 10.4 issue, disabled RDCYCLE, armv7 with vfpv3
  • Documentation/example improvements
  • Support static dispatch to SVE2_128 and SVE_256

1.0.1

24 Aug 16:43
  • Add Eq128, i64 Mul, unsigned->float ConvertTo
  • Faster sort for few unique keys, more robust pivot selection
  • Fix: floating-point generator for sort tests, Min/MaxOfLanes for i16
  • Fix: avoid always_inline in debug, link atomic
  • GCC warnings: string.h, maybe-uninitialized, ignored-attributes
  • GCC warnings: preprocessor int overflow, spurious use-after-free/overflow
  • Doc: <=HWY_AVX3, Full32/64/128, how to use generic-inl

1.0.0

27 Jul 14:56
  • ABI change: 64-bit target values, more room for expansion
  • Add CompressBlocksNot, CompressNot, Lt128Upper, Min/Max128Upper, TruncateTo
  • Add HWY_SVE2_128 target
  • Sort speedups especially for 128-bit
  • Documentation clarifications
  • Faster NEON CountTrue/FindFirstTrue/AllFalse/AllTrue
  • Improved SVE codegen
  • Fix u16x8 ConcatEven/Odd, SSSE3 i64 Lt
  • MSVC 2017 workarounds
  • Support for runtime dispatch on Arm/GCC/Linux

The 1.0 release signals an increased focus on backwards compatibility.
Applications using documented functionality will remain compatible with future updates that have the same major version number.

0.17.0

02 Jun 17:45
  • Add ExtractLane, InsertLane, IsInf, IsFinite, IsNaN
  • Add StoreInterleaved2, LoadInterleaved2/3/4, BlendedStore, SafeFillN
  • Add MulFixedPoint15, Or3
  • Add Copy[If], Find[If], Generate, Replace[If] algos
  • Add HWY_EMU128 target (replaces HWY_SCALAR)
  • HWY_RVV is feature-complete
  • Add HWY_ENABLE_CONTRIB build flag, HWY_NATIVE_FMA, HWY_WANT_SSSE3/SSE4 macros
  • Extend ConcatOdd/Even and StoreInterleaved* to all types
  • Allow CappedTag<T, nonPowerOfTwo>
  • Sort speedups: 2x for AVX2, 1.09x for AVX3; avoid x86 malloc
  • Expand documentation
  • Fix RDTSCP crash in nanobenchmark
  • Fix XCR0 check (was ignoring AVX3 on ICL)
  • Support Arm/RISC-V timers
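
For readers unfamiliar with Q1.15 arithmetic, the following scalar sketch shows the kind of computation MulFixedPoint15 (added in 0.17.0) performs on one pair of 16-bit lanes. It assumes round-to-nearest, implemented by adding half of the divisor before shifting. The helper name `ScalarMulFixedPoint15` is hypothetical, and the rounding shown is an assumption of this sketch rather than a statement of Highway's exact behavior on every target.

```cpp
#include <cstdint>

// Hypothetical scalar reference for one lane; NOT Highway's implementation.
// Inputs are Q1.15 fixed-point: the int16_t value v represents v / 32768.
inline int16_t ScalarMulFixedPoint15(int16_t a, int16_t b) {
  // The full product has 30 fractional bits and fits in 32 bits.
  const int32_t product = static_cast<int32_t>(a) * static_cast<int32_t>(b);
  // Add half of 2^15 for rounding (assumed), then drop 15 fractional bits.
  // Right-shifting a negative value is arithmetic on mainstream compilers
  // and is guaranteed to be so from C++20 onwards.
  // a == b == -32768 overflows the result and is excluded from this sketch.
  return static_cast<int16_t>((product + 16384) >> 15);
}
```

For example, 16384 represents 0.5, so multiplying it by itself yields 8192, which represents 0.25.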