diff --git a/contents/fastest-way-to-determine-if-an-integers-square-root-is-an-integer.md b/contents/fastest-way-to-determine-if-an-integers-square-root-is-an-integer.md new file mode 100644 index 0000000..6b71224 --- /dev/null +++ b/contents/fastest-way-to-determine-if-an-integers-square-root-is-an-integer.md @@ -0,0 +1,239 @@ +# 判断一个integer数的平方根是否依然是integer的最快方法 + +## 问题 + +我在寻找一种可以快速判断一个 `long` 类型的数的平方根是不是依然是 `Integer` 类型的方法。 + +思路1: 使用内置的 `Math.sqrt()` 函数,但是我想知道有没有一种方法,通过把处理范围限制在 `Integer` 类型来更快地得出答案。 +思路2: 维护一个 Lookup Table。这个不太现实,因为有 $$2^{31.5}$$ 个数的平方小于 $$2^{63}$$。 + +我现在在用下面的这个简单直接的方法 + + public final static boolean isPerfectSquare(long n) + { + if (n < 0) + return false; + + long tst = (long)(Math.sqrt(n) + 0.5); + return tst*tst == n; + } + +注:我在很多 [Project Euluer](https://projecteuler.net/) 问题里使用了这个函数。因此,别人无需维护此代码。 而且这种小型优化确实会有一些作用,因为我需要在在不到一分钟的时间内完成每一种算法,并且在某些问题中将需要数百万次调用此函数。 + +我已经尝试了以下解决方案: + +1. 经过若干次尝试后,我发现给 `Math.sqrt()` 的结果加上 `0.5` 没太有必要,至少在我电脑上看不出区别。 +2. [平方根倒数速算法](https://zh.wikipedia.org/wiki/%E5%B9%B3%E6%96%B9%E6%A0%B9%E5%80%92%E6%95%B0%E9%80%9F%E7%AE%97%E6%B3%95)很快,但是 `n >= 410881` 时结果会出错。但是我们可以像 [BobbyShaftoe 建议的那样](https://stackoverflow.com/users/38426/bobbyshaftoe),用 FISR 法处理 `n < 410881` 的情况。 +3. Newton 的方法比 `Math.sqrt()` 慢很多。这是因为 `Math.sqrt()` 使用了和 Newton 相似的方法,但是它是在硬件上实现的,所以它比 Java 快很多。 +4. 一种改进过 Newton 的方法,其中使用了一些 tricks 来设置只有 Integer 类型会参与运算,因此需要一些技巧来避免溢出(我希望此函数可以用于所有正的64位有符号数),但它仍然比 `Math.sqrt()` 慢。 +5. 二分查找更慢。因为二分查找平均需要16次通过才能找到64位数字的平方根。 +6. 根据 John 的测试,在 C++ 里使用 `or` 比用 `switch` 计算得更快,但是在 Java 和 C# 里两种运算符没有区别。 +7. 我也尝试了 Lookup Table(一个有 64 个布尔值的 private static array)。除了 `switch` 和 `or` 运算符,我想说 `if(lookup[(int)(n&0x3F)]) { test } else return false;` 只是有一点点慢,主要是因为 Java 会[检查 array 的边界](https://stackoverflow.com/questions/299079/why-is-this-code-with-several-or-statements-slightly-faster-than-using-a-lookup-t#299205)。 + + +## 回答 + +我找到了一种在我的 x86 CPU 电脑和 C++ 环境里比你的 6bits+Carmack+sqrt 代码快了 35% 的方法。你的结果可能会有所不同,尤其是因为我不知道Java的因素将如何发挥作用。 + +我的方法分成三步。 + +1. 首先,过滤掉明显的答案。这包括负数并查看最后 4 位(我发现查看最后 6 位没用)。我还对 `0` 返回 `True`。你看我的代码时,注意我的输入类型是 `int64` + if( x < 0 || (x&2) || ((x & 7) == 5) || ((x & 11) == 8) ) + return false; + if( x == 0 ) + return true; + + +2. 第二步,检查这个数是不是 `255` 的平方模(square modulo)。因为那个数是三个不同质数的乘积,所以只有大约1/8的余数是平方数。然而,根据我的经验,调用模数运算符(`%`)的代价大于得到的好处,所以我使用涉及 255 这个数(255=2^8-1)的技巧来计算余数。 + int64 y = x; + y = (y & 4294967295LL) + (y >> 32); + y = (y & 65535) + (y >> 16); + y = (y & 255) + ((y >> 8) & 255) + (y >> 16); + // 当前,0>= 32; + if((x & 65535) == 0) + x >>= 16; + if((x & 255) == 0) + x >>= 8; + if((x & 15) == 0) + x >>= 4; + if((x & 3) == 0) + x >>= 2; + 这时为了让我们的数字是一个平方数,它必须等于 `1 mod 8`。 + if((x & 7) != 1) + return false; + 以下是 [Hensel’s lemma](http://en.wikipedia.org/wiki/Hensel%27s_lemma) 的基本结构(代码未测试。如果不能用就把 `t` 的值改成 `2` 或者 `8`。 + int64 t = 4, r = 1; + t <<= 1; r += ((x - r * r) & t) >> 1; + t <<= 1; r += ((x - r * r) & t) >> 1; + t <<= 1; r += ((x - r * r) & t) >> 1; + 用一个循环不断重复以上代码,直到 `t == 2^33`。 + +我的想法是,在每次迭代时给 `r` 加上一位,也就是 `x` 「当前」的平方根,每个平方根都是一个越来越大的 $$2^n$$ 的整除数。最终,`r` 和 `t/2` 就是 `x mod t/2` 的两个平方根。(注意:如果 `r` 是 `x` 的平方根,则 `-r` 也是 `x` 的平方根。在取模数时也是如此,但是要注意,对某些数 `mod` 时,这些数的根数会超过 2。值得注意的是,这包括 2 的幂。)考虑到我们讨论的平方根的范围仅限于小于 $$2^{32}$$ 的情况,所以只要检查 r 和 t/2-r 是不是真正的平方根就可以了。以下是我的 loop 的代码。 + + int64 r, t, z; + r = start[(x >> 3) & 1023]; + do { + z = x - r * r; + if( z == 0 ) + return true; + if( z < 0 ) + return false; + t = z & (-z); + r += (z & t) >> 1; + if( r > (t >> 1) ) + r = t - r; + } while( t <= (1LL << 33) ); + +这里有三方面可以帮助计算加速:(1)预先计算好的起始值(等价于十次迭代);(2)尽早结束循环;(3)跳过某些 `t` 值。 +对于第三个方面,我关注了 `z = r - x * x`,并设置 `t` 为 2 的最大幂除以 `z`。这让我可以跳过那些无论如何都不会影响 `r` 值的 `t` 值。在我的这里,预计算的起始值会挑出「最小的正数」平方根模 8192。 + +以下是完整的经过测试的代码,包括预先计算的表。 + + + typedef signed long long int int64; + + int start[1024] = + {1,3,1769,5,1937,1741,7,1451,479,157,9,91,945,659,1817,11, + 1983,707,1321,1211,1071,13,1479,405,415,1501,1609,741,15,339,1703,203, + 129,1411,873,1669,17,1715,1145,1835,351,1251,887,1573,975,19,1127,395, + 1855,1981,425,453,1105,653,327,21,287,93,713,1691,1935,301,551,587, + 257,1277,23,763,1903,1075,1799,1877,223,1437,1783,859,1201,621,25,779, + 1727,573,471,1979,815,1293,825,363,159,1315,183,27,241,941,601,971, + 385,131,919,901,273,435,647,1493,95,29,1417,805,719,1261,1177,1163, + 1599,835,1367,315,1361,1933,1977,747,31,1373,1079,1637,1679,1581,1753,1355, + 513,1539,1815,1531,1647,205,505,1109,33,1379,521,1627,1457,1901,1767,1547, + 1471,1853,1833,1349,559,1523,967,1131,97,35,1975,795,497,1875,1191,1739, + 641,1149,1385,133,529,845,1657,725,161,1309,375,37,463,1555,615,1931, + 1343,445,937,1083,1617,883,185,1515,225,1443,1225,869,1423,1235,39,1973, + 769,259,489,1797,1391,1485,1287,341,289,99,1271,1701,1713,915,537,1781, + 1215,963,41,581,303,243,1337,1899,353,1245,329,1563,753,595,1113,1589, + 897,1667,407,635,785,1971,135,43,417,1507,1929,731,207,275,1689,1397, + 1087,1725,855,1851,1873,397,1607,1813,481,163,567,101,1167,45,1831,1205, + 1025,1021,1303,1029,1135,1331,1017,427,545,1181,1033,933,1969,365,1255,1013, + 959,317,1751,187,47,1037,455,1429,609,1571,1463,1765,1009,685,679,821, + 1153,387,1897,1403,1041,691,1927,811,673,227,137,1499,49,1005,103,629, + 831,1091,1449,1477,1967,1677,697,1045,737,1117,1737,667,911,1325,473,437, + 1281,1795,1001,261,879,51,775,1195,801,1635,759,165,1871,1645,1049,245, + 703,1597,553,955,209,1779,1849,661,865,291,841,997,1265,1965,1625,53, + 1409,893,105,1925,1297,589,377,1579,929,1053,1655,1829,305,1811,1895,139, + 575,189,343,709,1711,1139,1095,277,993,1699,55,1435,655,1491,1319,331, + 1537,515,791,507,623,1229,1529,1963,1057,355,1545,603,1615,1171,743,523, + 447,1219,1239,1723,465,499,57,107,1121,989,951,229,1521,851,167,715, + 1665,1923,1687,1157,1553,1869,1415,1749,1185,1763,649,1061,561,531,409,907, + 319,1469,1961,59,1455,141,1209,491,1249,419,1847,1893,399,211,985,1099, + 1793,765,1513,1275,367,1587,263,1365,1313,925,247,1371,1359,109,1561,1291, + 191,61,1065,1605,721,781,1735,875,1377,1827,1353,539,1777,429,1959,1483, + 1921,643,617,389,1809,947,889,981,1441,483,1143,293,817,749,1383,1675, + 63,1347,169,827,1199,1421,583,1259,1505,861,457,1125,143,1069,807,1867, + 2047,2045,279,2043,111,307,2041,597,1569,1891,2039,1957,1103,1389,231,2037, + 65,1341,727,837,977,2035,569,1643,1633,547,439,1307,2033,1709,345,1845, + 1919,637,1175,379,2031,333,903,213,1697,797,1161,475,1073,2029,921,1653, + 193,67,1623,1595,943,1395,1721,2027,1761,1955,1335,357,113,1747,1497,1461, + 1791,771,2025,1285,145,973,249,171,1825,611,265,1189,847,1427,2023,1269, + 321,1475,1577,69,1233,755,1223,1685,1889,733,1865,2021,1807,1107,1447,1077, + 1663,1917,1129,1147,1775,1613,1401,555,1953,2019,631,1243,1329,787,871,885, + 449,1213,681,1733,687,115,71,1301,2017,675,969,411,369,467,295,693, + 1535,509,233,517,401,1843,1543,939,2015,669,1527,421,591,147,281,501, + 577,195,215,699,1489,525,1081,917,1951,2013,73,1253,1551,173,857,309, + 1407,899,663,1915,1519,1203,391,1323,1887,739,1673,2011,1585,493,1433,117, + 705,1603,1111,965,431,1165,1863,533,1823,605,823,1179,625,813,2009,75, + 1279,1789,1559,251,657,563,761,1707,1759,1949,777,347,335,1133,1511,267, + 833,1085,2007,1467,1745,1805,711,149,1695,803,1719,485,1295,1453,935,459, + 1151,381,1641,1413,1263,77,1913,2005,1631,541,119,1317,1841,1773,359,651, + 961,323,1193,197,175,1651,441,235,1567,1885,1481,1947,881,2003,217,843, + 1023,1027,745,1019,913,717,1031,1621,1503,867,1015,1115,79,1683,793,1035, + 1089,1731,297,1861,2001,1011,1593,619,1439,477,585,283,1039,1363,1369,1227, + 895,1661,151,645,1007,1357,121,1237,1375,1821,1911,549,1999,1043,1945,1419, + 1217,957,599,571,81,371,1351,1003,1311,931,311,1381,1137,723,1575,1611, + 767,253,1047,1787,1169,1997,1273,853,1247,413,1289,1883,177,403,999,1803, + 1345,451,1495,1093,1839,269,199,1387,1183,1757,1207,1051,783,83,423,1995, + 639,1155,1943,123,751,1459,1671,469,1119,995,393,219,1743,237,153,1909, + 1473,1859,1705,1339,337,909,953,1771,1055,349,1993,613,1393,557,729,1717, + 511,1533,1257,1541,1425,819,519,85,991,1693,503,1445,433,877,1305,1525, + 1601,829,809,325,1583,1549,1991,1941,927,1059,1097,1819,527,1197,1881,1333, + 383,125,361,891,495,179,633,299,863,285,1399,987,1487,1517,1639,1141, + 1729,579,87,1989,593,1907,839,1557,799,1629,201,155,1649,1837,1063,949, + 255,1283,535,773,1681,461,1785,683,735,1123,1801,677,689,1939,487,757, + 1857,1987,983,443,1327,1267,313,1173,671,221,695,1509,271,1619,89,565, + 127,1405,1431,1659,239,1101,1159,1067,607,1565,905,1755,1231,1299,665,373, + 1985,701,1879,1221,849,627,1465,789,543,1187,1591,923,1905,979,1241,181}; + + bool bad255[512] = + {0,0,1,1,0,1,1,1,1,0,1,1,1,1,1,0,0,1,1,0,1,0,1,1,1,0,1,1,1,1,0,1, + 1,1,0,1,0,1,1,1,1,1,1,1,1,1,1,1,1,0,1,0,1,1,1,0,1,1,1,1,0,1,1,1, + 0,1,0,1,1,0,0,1,1,1,1,1,0,1,1,1,1,0,1,1,0,0,1,1,1,1,1,1,1,1,0,1, + 1,1,1,1,0,1,1,1,1,1,0,1,1,1,1,0,1,1,1,0,1,1,1,1,0,0,1,1,1,1,1,1, + 1,1,1,1,1,1,1,0,0,1,1,1,1,1,1,1,0,0,1,1,1,1,1,0,1,1,0,1,1,1,1,1, + 1,1,1,1,1,1,0,1,1,0,1,0,1,1,0,1,1,1,1,1,1,1,1,1,1,1,0,1,1,0,1,1, + 1,1,1,0,0,1,1,1,1,1,1,1,0,0,1,1,1,1,1,1,1,1,1,1,1,1,1,0,0,1,1,1, + 1,0,1,1,1,0,1,1,1,1,0,1,1,1,1,1,0,1,1,1,1,1,0,1,1,1,1,1,1,1,1, + 0,0,1,1,0,1,1,1,1,0,1,1,1,1,1,0,0,1,1,0,1,0,1,1,1,0,1,1,1,1,0,1, + 1,1,0,1,0,1,1,1,1,1,1,1,1,1,1,1,1,0,1,0,1,1,1,0,1,1,1,1,0,1,1,1, + 0,1,0,1,1,0,0,1,1,1,1,1,0,1,1,1,1,0,1,1,0,0,1,1,1,1,1,1,1,1,0,1, + 1,1,1,1,0,1,1,1,1,1,0,1,1,1,1,0,1,1,1,0,1,1,1,1,0,0,1,1,1,1,1,1, + 1,1,1,1,1,1,1,0,0,1,1,1,1,1,1,1,0,0,1,1,1,1,1,0,1,1,0,1,1,1,1,1, + 1,1,1,1,1,1,0,1,1,0,1,0,1,1,0,1,1,1,1,1,1,1,1,1,1,1,0,1,1,0,1,1, + 1,1,1,0,0,1,1,1,1,1,1,1,0,0,1,1,1,1,1,1,1,1,1,1,1,1,1,0,0,1,1,1, + 1,0,1,1,1,0,1,1,1,1,0,1,1,1,1,1,0,1,1,1,1,1,0,1,1,1,1,1,1,1,1, + 0,0}; + + inline bool square( int64 x ) { + // Quickfail + if( x < 0 || (x&2) || ((x & 7) == 5) || ((x & 11) == 8) ) + return false; + if( x == 0 ) + return true; + + // Check mod 255 = 3 * 5 * 17, for fun + int64 y = x; + y = (y & 4294967295LL) + (y >> 32); + y = (y & 65535) + (y >> 16); + y = (y & 255) + ((y >> 8) & 255) + (y >> 16); + if( bad255[y] ) + return false; + + // Divide out powers of 4 using binary search + if((x & 4294967295LL) == 0) + x >>= 32; + if((x & 65535) == 0) + x >>= 16; + if((x & 255) == 0) + x >>= 8; + if((x & 15) == 0) + x >>= 4; + if((x & 3) == 0) + x >>= 2; + + if((x & 7) != 1) + return false; + + // Compute sqrt using something like Hensel's lemma + int64 r, t, z; + r = start[(x >> 3) & 1023]; + do { + z = x - r * r; + if( z == 0 ) + return true; + if( z < 0 ) + return false; + t = z & (-z); + r += (z & t) >> 1; + if( r > (t >> 1) ) + r = t - r; + } while( t <= (1LL << 33) ); + + return false; + } + + +Stack Overflow 原链接: https://stackoverflow.com/questions/295579/fastest-way-to-determine-if-an-integers-square-root-is-an-integer +