
Absolute value functions performance
Recently, while working on code that needed to compute the absolute value of every float in a large array, I had to decide which function to use.
Here is a simple test on an array of 10,000,000 floats. It shows that a small custom template function is the fastest of the scalar options; only the SSE version (see the bottom of this page) is faster.
Method | Time |
std::abs() | 341 ms |
fabsf() | 322 ms |
template | 293 ms |
SSE | 7 ms |
In this test, std::abs() is actually the slowest of the four.
Test code (using Ogre for timing and logging):
#include <algorithm>
#include <cmath>      // std::abs(float), fabsf
#include <limits>
#include <random>
#include <sstream>
#include <vector>
#include <OgreLogManager.h>
#include <OgreTimer.h>

// Custom absolute value: returns t for positive values, -t otherwise.
template<typename T>
T abst(T t)
{
    return t > 0 ? t : -t;
}

void measureABS()
{
    // Fill the array with pseudo-random floats (fixed seed for repeatability).
    std::vector<float> randNumbers(10000000);
    std::mt19937 randGen(0x31597625);
    // uniform_real_distribution requires b - a <= max(), so each bound is
    // halved; the values still cover both signs over a very wide range.
    std::uniform_real_distribution<float> dist(std::numeric_limits<float>::lowest() / 2,
                                               std::numeric_limits<float>::max() / 2);
    std::generate(std::begin(randNumbers), std::end(randNumbers),
                  [&dist, &randGen] { return dist(randGen); });

    // Note: t is written but never read, so an optimizing compiler may
    // remove the loops below entirely.
    float t = 0.0f;

    // std::abs()
    Ogre::Timer t_std;
    for (std::vector<float>::const_iterator it = randNumbers.begin(); it != randNumbers.end(); ++it)
    {
        t = std::abs(*it);
    }
    const size_t duration_std = t_std.getMillisecondsCPU();

    // fabsf()
    Ogre::Timer t_fabsf;
    for (std::vector<float>::const_iterator it = randNumbers.begin(); it != randNumbers.end(); ++it)
    {
        t = fabsf(*it);
    }
    const size_t duration_fabsf = t_fabsf.getMillisecondsCPU();

    // Custom template
    Ogre::Timer t_template;
    for (std::vector<float>::const_iterator it = randNumbers.begin(); it != randNumbers.end(); ++it)
    {
        t = abst<float>(*it);
    }
    const size_t duration_template = t_template.getMillisecondsCPU();

    // Log the three timings.
    std::stringstream ss;
    ss << "ABS test:" << std::endl
       << "std::abs(): " << duration_std << "ms" << std::endl
       << "fabsf(): " << duration_fabsf << "ms" << std::endl
       << "template: " << duration_template << "ms" << std::endl;
    Ogre::LogManager::getSingleton().logMessage(ss.str());
}
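For readers without Ogre, the same kind of measurement can be sketched with only the standard library. This is a minimal sketch, not the code that produced the numbers above: `timeAbsMs` and `makeTestData` are hypothetical helper names, `std::chrono::steady_clock` stands in for `Ogre::Timer`, and the value range is an arbitrary assumption. The results are summed into a `sink` so the compiler cannot discard the loop as dead code.

```cpp
#include <chrono>
#include <cmath>
#include <cstddef>
#include <random>
#include <vector>

// Hypothetical helper: applies f to every element, sums the results so the
// loop has an observable effect, and returns elapsed wall-clock milliseconds.
template <typename F>
double timeAbsMs(const std::vector<float>& data, F f, double& sink)
{
    const auto start = std::chrono::steady_clock::now();
    double sum = 0.0;
    for (float x : data) {
        sum += f(x);
    }
    const auto stop = std::chrono::steady_clock::now();
    sink = sum;  // hand the result back to the caller
    return std::chrono::duration<double, std::milli>(stop - start).count();
}

// Hypothetical helper: n pseudo-random floats in [-1000, 1000)
// (the range is an assumption made for this sketch).
std::vector<float> makeTestData(std::size_t n)
{
    std::mt19937 gen(0x31597625);
    std::uniform_real_distribution<float> dist(-1000.0f, 1000.0f);
    std::vector<float> v(n);
    for (float& x : v) {
        x = dist(gen);
    }
    return v;
}
```

A call such as `timeAbsMs(data, [](float x) { return std::abs(x); }, sink)` times one variant; passing a different lambda times another on the same data. Timings from a harness like this depend heavily on compiler and optimization flags, so they are only comparable within a single build.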
Edit:
Using SSE:
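// Requires #include <xmmintrin.h> (SSE intrinsics).
Ogre::Timer t_sse;
// -0.0f has only the sign bit set, so ANDNOT with it clears the sign bit.
const __m128 ad = _mm_set_ps1(-0.0f);
alignas(16) float result[4];
// Four floats per iteration over the whole array (10000000 is divisible by 4);
// a bound of 10000000 / 4 with i += 4 would cover only the first quarter.
for (int i = 0; i < 10000000; i += 4)
{
    // Load four consecutive floats in memory order.
    const __m128 a = _mm_loadu_ps(&randNumbers[i]);
    const __m128 b = _mm_andnot_ps(ad, a);  // (~ad) & a
    _mm_store_ps(result, b);
}
const size_t duration_sse = t_sse.getMilliseconds();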
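The SSE snippet works because an IEEE 754 float stores its sign in the top bit: `-0.0f` is a mask with only that bit set, and `_mm_andnot_ps(mask, v)` computes `(~mask) & v`, clearing the sign of all four lanes at once. As a hedged sketch of how this might be wrapped for arrays of any length, the hypothetical function `absArraySse` below writes the results to an output buffer and finishes any leftover elements with `std::fabs`. It assumes an x86 target with SSE available.

```cpp
#include <cmath>
#include <cstddef>
#include <xmmintrin.h>  // SSE intrinsics

// Hypothetical helper: writes |in[i]| to out[i] for every i in [0, n).
void absArraySse(const float* in, float* out, std::size_t n)
{
    // Only the sign bit is set in -0.0f; ANDNOT with it clears that bit.
    const __m128 signMask = _mm_set1_ps(-0.0f);

    std::size_t i = 0;
    // Main loop: four floats per iteration, no alignment requirement.
    for (; i + 4 <= n; i += 4) {
        const __m128 v = _mm_loadu_ps(in + i);
        _mm_storeu_ps(out + i, _mm_andnot_ps(signMask, v));  // (~mask) & v
    }
    // Tail: the remaining 0 to 3 elements, handled one at a time.
    for (; i < n; ++i) {
        out[i] = std::fabs(in[i]);
    }
}
```

For example, `absArraySse(randNumbers.data(), out.data(), randNumbers.size())` would process a whole vector, given an `out` vector of the same size. The unaligned load and store are used here so the function accepts any pointer; with 16-byte aligned buffers, `_mm_load_ps` and `_mm_store_ps` could be used instead.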