Commit 3639122b authored by Gustavo Valiente

Initial commit

parent a2ce7e1c
What's new in v2.2 development master:
* Nothing yet
What's new in v2.1:
* Various bug fixes
* Documentation has been significantly improved. The public API is now almost
fully documented.
* Added support for MIPS MSA instruction set.
* Added support for PowerPC VSX v2.06 and v2.07 instruction sets.
* Added support for x86 AVX512BW, AVX512DQ and AVX512VL instruction sets.
* Added support for 64-bit little-endian PowerPC.
* Added support for arbitrary width vectors in `extract()` and `insert()`.
* Added support for arbitrary source vectors to `to_int8()`, `to_uint8()`,
`to_int16()`, `to_uint16()`, `to_int32()`, `to_uint32()`, `to_int64()`,
`to_uint64()`, `to_float32()`, `to_float64()`.
* Added support for per-element integer shifts to `shift_r()` and `shift_l()`.
Fallback paths are provided for SSE2-AVX instruction sets that lack hardware
per-element integer shift support.
* Made `shuffle_bytes16()`, `shuffle_zbytes16()`, `permute_bytes16()` and
`permute_zbytes16()` more generic.
* New functions: `popcnt()`, `reduce_popcnt()`, `for_each()`, `to_mask()`.
* Xcode is now supported.
* The library has been refactored in such a way that older compilers are able
to optimize vector emulation code paths much better than before.
* Deprecation: implicit conversion operators to native vector types have been
deprecated and a replacement method has been provided instead. The implicit
conversion operators may lead to wrong code being accepted without a compile
error on Clang.
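
The per-element shift entry above is easiest to picture with a scalar model. The following is a minimal sketch, not the library's implementation: `shift_l_per_element` is a hypothetical helper, lanes are held in a `std::array`, and the choice to return 0 for shift counts of 32 or more is an assumption of this sketch rather than documented library behavior.

```cpp
#include <array>
#include <cstddef>
#include <cstdint>

// Scalar model of a per-element left shift: lane i of `a` is shifted by
// lane i of `count`. Hypothetical helper, not part of libsimdpp.
template <std::size_t N>
std::array<std::uint32_t, N> shift_l_per_element(
    const std::array<std::uint32_t, N>& a,
    const std::array<std::uint32_t, N>& count)
{
    std::array<std::uint32_t, N> r{};
    for (std::size_t i = 0; i < N; ++i) {
        // Shifting a 32-bit value by 32 or more is undefined in C++,
        // so this sketch defines the result as 0 in that case.
        r[i] = count[i] < 32 ? a[i] << count[i] : 0u;
    }
    return r;
}
```

A fallback path on an instruction set without hardware per-element shifts has to produce the same lane-by-lane result by other means, for example by combining several uniform shifts or by processing lanes individually.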
What's new in v2.0:
* Various bug fixes.
* Intel compiler is now supported on Windows. Newer versions of other compilers
are now supported.
What's new in v2.0~rc2:
* Various bug fixes.
What's new in v2.0~rc1:
* Expression template-based backend.
* Support scalar arguments to certain functions.
* Support for vectors much longer than the native vector type. The only
limitation is that the length must be a power of 2.
* Operator overloads are provided that fully mimic free functions.
* New functions: `store_masked()`, `store_u()`, `shuffle4x2()`, `shuffle2x2()`,
`test_bits_any()`.
* PowerPC Altivec, NEONv2 support. More compilers are now supported.
* API break: The built-in dispatcher has been rewritten to generate hard
link-time dependencies on the versioned functions. It is no longer susceptible
to lazy static variable initialization breaking function registration. The user
must supply the list of generated architectures via predefined macro values.
See examples/dispatcher for an example.
* API break: `SIMDPP_USER_ARCH_INFO` now accepts any expression, not only a
function.
* API break: `int128` and `int256` types have been removed. On some archs
it's more efficient to have different physical representations for vectors
with different element widths.
* API break: `broadcast()` family of functions has been renamed to `splat()`.
* API break: `permute()` family of functions has been renamed to `permute2()`
and `permute4()` depending on the number of template arguments taken.
* API break: value conversion functions such as `to_float32x4()` have been
renamed and now return a vector with the same number of elements as the
source vector.
* API break: certain functions return "empty" expressions instead of vectors.
* API break: saturated add and sub are now called `add_sat()` and `sub_sat()`.
* API break: the `zero()` and `ones()` member functions of the vector types
have been removed.
* API break: the functions in the `sse::` and `neon::` namespaces have been
replaced with equivalent functions within the main `simdpp` namespace.
* Various other enhancements.
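
Two of the entries above, vectors longer than the native type and operator overloads that mirror free functions, can be illustrated together. This is a minimal sketch under stated assumptions: `Native4`, `Wide` and the `add()` overloads are hypothetical stand-ins written for this example and are not the library's types. The power-of-2 rule is enforced with a `static_assert`.

```cpp
#include <array>
#include <cstddef>
#include <cstdint>

// Hypothetical stand-in for one native 128-bit register holding four
// 32-bit lanes.
struct Native4 {
    std::array<std::uint32_t, 4> lane;
};

inline Native4 add(const Native4& a, const Native4& b)
{
    Native4 r{};
    for (std::size_t i = 0; i < 4; ++i) {
        r.lane[i] = a.lane[i] + b.lane[i];
    }
    return r;
}

// A vector of N lanes stored as N/4 native-sized chunks.
template <std::size_t N>
struct Wide {
    static_assert(N >= 4 && (N & (N - 1)) == 0,
                  "length must be a power of 2 and at least one native vector");
    std::array<Native4, N / 4> chunk;
};

// The wide operation is the native operation applied chunk by chunk.
template <std::size_t N>
Wide<N> add(const Wide<N>& a, const Wide<N>& b)
{
    Wide<N> r{};
    for (std::size_t i = 0; i < N / 4; ++i) {
        r.chunk[i] = add(a.chunk[i], b.chunk[i]);
    }
    return r;
}

// Operator overload that simply forwards to the free function.
template <std::size_t N>
Wide<N> operator+(const Wide<N>& a, const Wide<N>& b)
{
    return add(a, b);
}
```

With these definitions `a + b` and `add(a, b)` produce identical results for any `Wide<N>`, and `Wide<12>` fails to compile because 12 is not a power of 2.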
Boost Software License - Version 1.0 - August 17th, 2003
Permission is hereby granted, free of charge, to any person or organization
obtaining a copy of the software and accompanying documentation covered by
this license (the "Software") to use, reproduce, display, distribute,
execute, and transmit the Software, and to prepare derivative works of the
Software, and to permit third-parties to whom the Software is furnished to
do so, all subject to the following:
The copyright notices in the Software and this entire statement, including
the above license grant, this restriction and the following disclaimer,
must be included in all copies of the Software, in whole or in part, and
all derivative works of the Software, unless such copies or derivative
works are solely in the form of machine-executable object code generated by
a source language processor.
THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
FITNESS FOR A PARTICULAR PURPOSE, TITLE AND NON-INFRINGEMENT. IN NO EVENT
SHALL THE COPYRIGHT HOLDERS OR ANYONE DISTRIBUTING THE SOFTWARE BE LIABLE
FOR ANY DAMAGES OR OTHER LIABILITY, WHETHER IN CONTRACT, TORT OR OTHERWISE,
ARISING FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER
DEALINGS IN THE SOFTWARE.
libsimdpp
=========
[![Travis build status](https://travis-ci.org/p12tic/libsimdpp.svg?branch=master)](https://travis-ci.org/p12tic/libsimdpp "Travis build status")
[![Appveyor build status](https://img.shields.io/appveyor/ci/p12tic/libsimdpp/master.svg)](https://ci.appveyor.com/project/p12tic/libsimdpp "Appveyor build status")
[![Join the chat at https://gitter.im/libsimdpp/Lobby](https://badges.gitter.im/libsimdpp/Lobby.svg)](https://gitter.im/libsimdpp/Lobby?utm_source=badge&utm_medium=badge&utm_campaign=pr-badge&utm_content=badge)
libsimdpp is a portable header-only zero-overhead C++ low level SIMD library.
The library presents a single interface over SIMD instruction sets present in
x86, ARM, PowerPC and MIPS architectures. On architectures that support
different SIMD instruction sets the library allows the same source code files
to be compiled for each SIMD instruction set and then hooked into an internal
or third-party dynamic dispatch mechanism. This allows the capabilities of the
processor to be queried at runtime and the most efficient implementation to be
selected.
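
The dispatch idea described above can be sketched in plain C++ without the library. Everything in this example is hypothetical: the `Arch` flags, the `sum_*` functions and `select_sum()` are illustrative names, and the three implementations share one scalar loop where a real build would compile the same source once per instruction set. It is not the library's built-in dispatcher.

```cpp
#include <cstddef>
#include <cstdint>

// Hypothetical capability flags. A real program would fill these in from
// CPUID or an operating system query.
enum Arch : unsigned {
    ARCH_NONE = 0,
    ARCH_SSE2 = 1u << 0,
    ARCH_AVX2 = 1u << 1
};

using SumFn = std::uint32_t (*)(const std::uint32_t*, std::size_t);

// Shared scalar loop used by all stand-in implementations below.
inline std::uint32_t sum_impl(const std::uint32_t* p, std::size_t n)
{
    std::uint32_t s = 0;
    for (std::size_t i = 0; i < n; ++i) {
        s += p[i];
    }
    return s;
}

// Stand-ins for one source file compiled once per instruction set.
std::uint32_t sum_scalar(const std::uint32_t* p, std::size_t n) { return sum_impl(p, n); }
std::uint32_t sum_sse2(const std::uint32_t* p, std::size_t n) { return sum_impl(p, n); }
std::uint32_t sum_avx2(const std::uint32_t* p, std::size_t n) { return sum_impl(p, n); }

// Pick the most capable implementation that the processor supports.
SumFn select_sum(unsigned caps)
{
    if (caps & ARCH_AVX2) return &sum_avx2;
    if (caps & ARCH_SSE2) return &sum_sse2;
    return &sum_scalar;
}
```

A caller would typically run `select_sum()` once at startup, cache the returned pointer, and call through it afterwards so that the capability check is not repeated on every call.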
The library sits somewhere in the middle between programming directly in SIMD
intrinsics and even higher-level SIMD libraries. As much control as possible
is given to the developer, so that it's possible to exactly predict what code
the compiler will generate.
No API-breaking changes are planned for the foreseeable future.
Documentation
-------------
Online documentation is provided
[here](http://p12tic.github.io/libsimdpp/v2.2-dev/libsimdpp/w/).
Compiler and instruction set support
------------------------------------
Note: this section describes the current branch only, which may be unstable or
otherwise unfit for use. For available releases, please see the
[libsimdpp wiki](https://github.com/p12tic/libsimdpp/wiki).
The library supports the following architectures and instruction sets:
- x86, x86-64: SSE2, SSE3, SSSE3, SSE4.1, AVX, AVX2, FMA3, FMA4, AVX512F,
AVX512BW, AVX512DQ, AVX512VL, XOP, popcnt
- ARM 32-bit: NEON, NEONv2
- ARM 64-bit: NEON, NEONv2
- PowerPC 32-bit big-endian: Altivec, VSX v2.06, VSX v2.07
- PowerPC 64-bit little-endian: Altivec, VSX v2.06, VSX v2.07
- MIPS 32-bit little-endian: MSA
- MIPS 64-bit little-endian: MSA
The primary development of the library happens in C++11. A C++98-compatible
version of the library is provided on the
[cxx98](https://github.com/p12tic/libsimdpp/tree/cxx98) branch.
Supported compilers:
- C++11 version:
  - GCC: 4.8-7.x
  - Clang: 3.3-4.0
  - Xcode 7.0-9.x
  - MSVC: 2013, 2015, 2017
  - ICC (on both Linux and Windows): 2013, 2015, 2016, 2017
- C++98 version:
  - GCC: 4.4-7.x
  - Clang: 3.3-4.0
  - Xcode 7.0-9.x
  - MSVC: 2013, 2015, 2017
  - ICC (on both Linux and Windows): 2013, 2015, 2016, 2017
Newer versions of the aforementioned compilers will generally work with either
the C++11 or the C++98 version of the library. Older versions of these compilers
will generally work with the C++98 version of the library.
Some compiler versions are not supported on some instruction sets due to
compiler bugs or incompletely implemented instruction sets. See
simdpp/detail/workarounds.h for more details.
- MSVC and ICC are only supported on x86 and x86-64.
- AVX is not supported on Clang 3.6 or GCC 4.4.
- AVX2 is not supported on Clang 3.6.
- AVX512F is not supported on:
  - GCC 5.x and older
  - Clang 5.0 and older
  - MSVC
- NEON armv7 is not supported on Clang 3.3 and older.
- NEON aarch64 is not supported on GCC 4.8 and older.
- Altivec on little-endian PPC is not supported on GCC 5.x and older.
- VSX on big-endian PPC is not supported on GCC 5.x and older.
- MSA is not supported on GCC 6.x and older.
Contributing
------------
Contributions are welcome. Please see CONTRIBUTING.md for more information.
License
-------
The library may be freely used in commercial and non-commercial software. The
code is distributed under the Boost Software License, Version 1.0. Some
internal development scripts are licensed under different licenses -- see
comments in these files. The documentation is licensed under CC-BY-SA.
> Boost Software License - Version 1.0 - August 17th, 2003
>
> Permission is hereby granted, free of charge, to any person or organization
> obtaining a copy of the software and accompanying documentation covered by
> this license (the "Software") to use, reproduce, display, distribute,
> execute, and transmit the Software, and to prepare derivative works of the
> Software, and to permit third-parties to whom the Software is furnished to
> do so, all subject to the following:
>
> The copyright notices in the Software and this entire statement, including
> the above license grant, this restriction and the following disclaimer,
> must be included in all copies of the Software, in whole or in part, and
> all derivative works of the Software, unless such copies or derivative
> works are solely in the form of machine-executable object code generated by
> a source language processor.
>
> THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
> IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
> FITNESS FOR A PARTICULAR PURPOSE, TITLE AND NON-INFRINGEMENT. IN NO EVENT
> SHALL THE COPYRIGHT HOLDERS OR ANYONE DISTRIBUTING THE SOFTWARE BE LIABLE
> FOR ANY DAMAGES OR OTHER LIABILITY, WHETHER IN CONTRACT, TORT OR OTHERWISE,
> ARISING FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER
> DEALINGS IN THE SOFTWARE.
# Copyright (C) 2013 Povilas Kanapickas <povilas@radix.lt>
#
# Distributed under the Boost Software License, Version 1.0.
# (See accompanying file LICENSE_1_0.txt or copy at
# http://www.boost.org/LICENSE_1_0.txt)
file(GLOB_RECURSE HEADERS RELATIVE ${CMAKE_CURRENT_SOURCE_DIR} *.h *.inl)
foreach(FILE ${HEADERS})
    get_filename_component(FILE_PATH "${FILE}" PATH)
    install(FILES "${FILE}" DESTINATION "${SIMDPP_INCLUDEDIR}/simdpp/${FILE_PATH}")
endforeach()
# Don't enable header tests by default because configuring it takes excessive
# amount of time
set(ENABLE_HEADER_TESTS "0")
if(${ENABLE_HEADER_TESTS} STREQUAL "1")
    simdpp_get_compilable_archs(COMPILABLE_ARCHS)
    set(HEADER_TESTS "")
    add_custom_target(check_headers)
    foreach(ARCH ${COMPILABLE_ARCHS})
        simdpp_get_arch_info(CXX_FLAGS DEFINES_LIST SUFFIX ${ARCH})
        foreach(FILE ${HEADERS})
            # Skip .inl files. FILE is a relative path, so match on the
            # extension instead of comparing the whole string to ".inl".
            if("${FILE}" MATCHES "\\.inl$")
                continue()
            endif()
            # Build a unique test name from the header path and arch suffix
            string(REPLACE "/" "_" TEST "${FILE}")
            string(REPLACE "." "_" TEST "${TEST}")
            set(TEST "${TEST}${SUFFIX}")
            set(TEST_OUT "check_headers/test_header_compiles_${TEST}")
            string(REPLACE "-" "_" TEST_TARGET "check_headers_${TEST}")
            file(MAKE_DIRECTORY "${CMAKE_BINARY_DIR}/check_headers")
            separate_arguments(CXX_FLAGS)
            # Compile the header on its own to check that it is self-contained
            add_custom_command(
                OUTPUT ${TEST_OUT}
                COMMAND ${CMAKE_CXX_COMPILER}
                        -DLIBSIMDPP_SIMD_H
                        -I "${CMAKE_SOURCE_DIR}"
                        ${CXX_FLAGS} -x c++ -std=c++11 -g2 -Wall
                        ${CMAKE_SOURCE_DIR}/simdpp/${FILE}
                        -c -o ${CMAKE_BINARY_DIR}/${TEST_OUT}
                DEPENDS ${FILE} )
            add_custom_target(${TEST_TARGET} DEPENDS ${TEST_OUT})
            add_dependencies(check_headers "${TEST_TARGET}")
        endforeach()
    endforeach()
endif()
/* Copyright (C) 2017 Povilas Kanapickas <povilas@radix.lt>
Distributed under the Boost Software License, Version 1.0.
(See accompanying file LICENSE_1_0.txt or copy at
http://www.boost.org/LICENSE_1_0.txt)
*/
#ifndef LIBSIMDPP_SIMD_CAPABILITIES_H
#define LIBSIMDPP_SIMD_CAPABILITIES_H
#ifndef LIBSIMDPP_SIMD_H
#error "This file must be included through simd.h"
#endif
#include <simdpp/setup_arch.h>
#if SIMDPP_USE_SSE2 || SIMDPP_USE_NEON || SIMDPP_USE_ALTIVEC || SIMDPP_USE_MSA
#define SIMDPP_HAS_INT8_SIMD 1
#define SIMDPP_HAS_INT16_SIMD 1
#define SIMDPP_HAS_INT32_SIMD 1
#else
#define SIMDPP_HAS_INT8_SIMD 0
#define SIMDPP_HAS_INT16_SIMD 0
#define SIMDPP_HAS_INT32_SIMD 0
#endif
#if SIMDPP_USE_SSE2 || SIMDPP_USE_NEON || SIMDPP_USE_VSX_207 || SIMDPP_USE_MSA
#define SIMDPP_HAS_INT64_SIMD 1
#else
#define SIMDPP_HAS_INT64_SIMD 0
#endif
#if SIMDPP_USE_SSE2 || SIMDPP_USE_NEON_FLT_SP || (SIMDPP_USE_NEON && SIMDPP_64_BITS) || SIMDPP_USE_ALTIVEC || SIMDPP_USE_MSA
#define SIMDPP_HAS_FLOAT32_SIMD 1
#else
#define SIMDPP_HAS_FLOAT32_SIMD 0
#endif
#if SIMDPP_USE_SSE2 || (SIMDPP_USE_NEON && SIMDPP_64_BITS) || SIMDPP_USE_VSX_206 || SIMDPP_USE_MSA
#define SIMDPP_HAS_FLOAT64_SIMD 1
#else
#define SIMDPP_HAS_FLOAT64_SIMD 0
#endif
#if SIMDPP_USE_NULL || SIMDPP_USE_AVX512F || (SIMDPP_USE_NEON && SIMDPP_64_BITS) || SIMDPP_USE_VSX_206 || SIMDPP_USE_MSA
#define SIMDPP_HAS_FLOAT64_TO_UINT32_CONVERSION 1
#else
#define SIMDPP_HAS_FLOAT64_TO_UINT32_CONVERSION 0
#endif
#if SIMDPP_USE_NULL || SIMDPP_USE_AVX512DQ || (SIMDPP_USE_NEON && SIMDPP_64_BITS) || SIMDPP_USE_VSX_207 || SIMDPP_USE_MSA
#define SIMDPP_HAS_INT64_TO_FLOAT64_CONVERSION 1
#define SIMDPP_HAS_INT64_TO_FLOAT32_CONVERSION 1
#define SIMDPP_HAS_UINT64_TO_FLOAT64_CONVERSION 1
#define SIMDPP_HAS_UINT64_TO_FLOAT32_CONVERSION 1
#define SIMDPP_HAS_FLOAT32_TO_INT64_CONVERSION 1
#define SIMDPP_HAS_FLOAT32_TO_UINT64_CONVERSION 1
#define SIMDPP_HAS_FLOAT64_TO_INT64_CONVERSION 1
#define SIMDPP_HAS_FLOAT64_TO_UINT64_CONVERSION 1
#else
#define SIMDPP_HAS_INT64_TO_FLOAT64_CONVERSION 0
#define SIMDPP_HAS_INT64_TO_FLOAT32_CONVERSION 0
#define SIMDPP_HAS_UINT64_TO_FLOAT64_CONVERSION 0
#define SIMDPP_HAS_UINT64_TO_FLOAT32_CONVERSION 0
#define SIMDPP_HAS_FLOAT32_TO_INT64_CONVERSION 0
#define SIMDPP_HAS_FLOAT32_TO_UINT64_CONVERSION 0
#define SIMDPP_HAS_FLOAT64_TO_INT64_CONVERSION 0
#define SIMDPP_HAS_FLOAT64_TO_UINT64_CONVERSION 0
#endif
#if SIMDPP_USE_NULL || SIMDPP_USE_SSSE3 || SIMDPP_USE_NEON || SIMDPP_USE_ALTIVEC || SIMDPP_USE_MSA
#define SIMDPP_HAS_INT8_SHIFT_L_BY_VECTOR 1
#define SIMDPP_HAS_UINT8_SHIFT_L_BY_VECTOR 1
#define SIMDPP_HAS_INT16_SHIFT_L_BY_VECTOR 1
#define SIMDPP_HAS_UINT16_SHIFT_L_BY_VECTOR 1
#else
#define SIMDPP_HAS_INT8_SHIFT_L_BY_VECTOR 0
#define SIMDPP_HAS_UINT8_SHIFT_L_BY_VECTOR 0
#define SIMDPP_HAS_INT16_SHIFT_L_BY_VECTOR 0
#define SIMDPP_HAS_UINT16_SHIFT_L_BY_VECTOR 0
#endif
#if SIMDPP_USE_NULL || SIMDPP_USE_SSE2 || SIMDPP_USE_NEON || SIMDPP_USE_ALTIVEC || SIMDPP_USE_MSA
#define SIMDPP_HAS_INT32_SHIFT_L_BY_VECTOR 1
#define SIMDPP_HAS_UINT32_SHIFT_L_BY_VECTOR 1
#else
#define SIMDPP_HAS_INT32_SHIFT_L_BY_VECTOR 0
#define SIMDPP_HAS_UINT32_SHIFT_L_BY_VECTOR 0
#endif
#if SIMDPP_USE_NULL || SIMDPP_USE_SSSE3 || SIMDPP_USE_NEON || SIMDPP_USE_ALTIVEC || SIMDPP_USE_MSA
#define SIMDPP_HAS_INT8_SHIFT_R_BY_VECTOR 1
#define SIMDPP_HAS_UINT8_SHIFT_R_BY_VECTOR 1
#define SIMDPP_HAS_UINT16_SHIFT_R_BY_VECTOR 1
#else
#define SIMDPP_HAS_INT8_SHIFT_R_BY_VECTOR 0
#define SIMDPP_HAS_UINT8_SHIFT_R_BY_VECTOR 0
#define SIMDPP_HAS_UINT16_SHIFT_R_BY_VECTOR 0
#endif
#if SIMDPP_USE_NULL || SIMDPP_USE_AVX512BW || SIMDPP_USE_NEON || SIMDPP_USE_ALTIVEC || SIMDPP_USE_MSA
#define SIMDPP_HAS_INT16_SHIFT_R_BY_VECTOR 1
#else
#define SIMDPP_HAS_INT16_SHIFT_R_BY_VECTOR 0
#endif
#if SIMDPP_USE_NULL || SIMDPP_USE_SSE2 || SIMDPP_USE_NEON || SIMDPP_USE_ALTIVEC || SIMDPP_USE_MSA
#define SIMDPP_HAS_INT32_SHIFT_R_BY_VECTOR 1
#define SIMDPP_HAS_UINT32_SHIFT_R_BY_VECTOR 1
#else
#define SIMDPP_HAS_INT32_SHIFT_R_BY_VECTOR 0
#define SIMDPP_HAS_UINT32_SHIFT_R_BY_VECTOR 0
#endif
#endif
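
The `SIMDPP_HAS_*` macros defined in the header above always expand to 0 or 1, so user code can branch on them with a plain `#if`. The following is a minimal sketch of that pattern, assuming the macro is visible to the translation unit. `scale()` and `uses_float64_simd()` are hypothetical names, and the fallback definition of the macro exists only so that the sketch compiles without the library.

```cpp
#include <cstddef>

// Assumption: when the library header is not included, pretend that
// double-precision SIMD is unavailable so this sketch compiles on its own.
#ifndef SIMDPP_HAS_FLOAT64_SIMD
#define SIMDPP_HAS_FLOAT64_SIMD 0
#endif

// Hypothetical user function that multiplies every element by k.
void scale(double* p, std::size_t n, double k)
{
#if SIMDPP_HAS_FLOAT64_SIMD
    // Placeholder: a vectorized loop would go here. The scalar loop keeps
    // the sketch correct on every target.
    for (std::size_t i = 0; i < n; ++i) {
        p[i] *= k;
    }
#else
    // Scalar path for targets without double-precision SIMD.
    for (std::size_t i = 0; i < n; ++i) {
        p[i] *= k;
    }
#endif
}

// Reports which branch was compiled in.
constexpr bool uses_float64_simd()
{
    return SIMDPP_HAS_FLOAT64_SIMD != 0;
}
```

Because each macro is defined in both branches of the header, testing it with `#if` rather than `#ifdef` is the form that distinguishes 0 from 1.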
/* Copyright (C) 2013-2014 Povilas Kanapickas <povilas@radix.lt>
Distributed under the Boost Software License, Version 1.0.