Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Create fastest-way-to-determine-if-an-integers-square-root-is-an-inte… #118

Open
wants to merge 2 commits into
base: master
Choose a base branch
from
Open
Changes from all commits
Commits
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
Original file line number Diff line number Diff line change
@@ -0,0 +1,239 @@
# 判断一个integer数的平方根是否依然是integer的最快方法

## 问题

我在寻找一种可以快速判断一个 `long` 类型的数的平方根是不是依然是 `Integer` 类型的方法。

思路1: 使用内置的 `Math.sqrt()` 函数,但是我想知道有没有一种方法,通过把处理范围限制在 `Integer` 类型来更快地得出答案。
思路2: 维护一个 Lookup Table。这个不太现实,因为有 $$2^{31.5}$$ 个数的平方小于 $$2^{63}$$。

我现在在用下面的这个简单直接的方法

public final static boolean isPerfectSquare(long n)
{
if (n < 0)
return false;

long tst = (long)(Math.sqrt(n) + 0.5);
return tst*tst == n;
}

注:我在很多 [Project Euluer](https://projecteuler.net/) 问题里使用了这个函数。因此,别人无需维护此代码。 而且这种小型优化确实会有一些作用,因为我需要在在不到一分钟的时间内完成每一种算法,并且在某些问题中将需要数百万次调用此函数。

我已经尝试了以下解决方案:

1. 经过若干次尝试后,我发现给 `Math.sqrt()` 的结果加上 `0.5` 没太有必要,至少在我电脑上看不出区别。
2. [平方根倒数速算法](https://zh.wikipedia.org/wiki/%E5%B9%B3%E6%96%B9%E6%A0%B9%E5%80%92%E6%95%B0%E9%80%9F%E7%AE%97%E6%B3%95)很快,但是 `n >= 410881` 时结果会出错。但是我们可以像 [BobbyShaftoe 建议的那样](https://stackoverflow.com/users/38426/bobbyshaftoe),用 FISR 法处理 `n < 410881` 的情况。
3. Newton 的方法比 `Math.sqrt()` 慢很多。这是因为 `Math.sqrt()` 使用了和 Newton 相似的方法,但是它是在硬件上实现的,所以它比 Java 快很多。
4. 一种改进过 Newton 的方法,其中使用了一些 tricks 来设置只有 Integer 类型会参与运算,因此需要一些技巧来避免溢出(我希望此函数可以用于所有正的64位有符号数),但它仍然比 `Math.sqrt()` 慢。
5. 二分查找更慢。因为二分查找平均需要16次通过才能找到64位数字的平方根。
6. 根据 John 的测试,在 C++ 里使用 `or` 比用 `switch` 计算得更快,但是在 Java 和 C# 里两种运算符没有区别。
7. 我也尝试了 Lookup Table(一个有 64 个布尔值的 private static array)。除了 `switch` 和 `or` 运算符,我想说 `if(lookup[(int)(n&0x3F)]) { test } else return false;` 只是有一点点慢,主要是因为 Java 会[检查 array 的边界](https://stackoverflow.com/questions/299079/why-is-this-code-with-several-or-statements-slightly-faster-than-using-a-lookup-t#299205)。


## 回答

我找到了一种在我的 x86 CPU 电脑和 C++ 环境里比你的 6bits+Carmack+sqrt 代码快了 35% 的方法。你的结果可能会有所不同,尤其是因为我不知道Java的因素将如何发挥作用。

我的方法分成三步。

1. 首先,过滤掉明显的答案。这包括负数并查看最后 4 位(我发现查看最后 6 位没用)。我还对 `0` 返回 `True`。你看我的代码时,注意我的输入类型是 `int64`
if( x < 0 || (x&2) || ((x & 7) == 5) || ((x & 11) == 8) )
return false;
if( x == 0 )
return true;


2. 第二步,检查这个数是不是 `255` 的平方模(square modulo)。因为那个数是三个不同质数的乘积,所以只有大约1/8的余数是平方数。然而,根据我的经验,调用模数运算符(`%`)的代价大于得到的好处,所以我使用涉及 255 这个数(255=2^8-1)的技巧来计算余数。
int64 y = x;
y = (y & 4294967295LL) + (y >> 32);
y = (y & 65535) + (y >> 16);
y = (y & 255) + ((y >> 8) & 255) + (y >> 16);
// 当前,0<y<511. 更多代码可以进一步让数值变小.
我还用了一个预先算好的 table 来判断余数是不是平方数
if( bad255[y] )
return false;
// 这个 table 的长度是 512


3. 最后,使用一种和 [Hensel’s lemma](http://en.wikipedia.org/wiki/Hensel%27s_lemma) 类似的方法计算平方根。这种方法需要一些修改才能使用。在计算之前,我先把 `2` 的幂用二分查找的方法先除掉。
if((x & 4294967295LL) == 0)
x >>= 32;
if((x & 65535) == 0)
x >>= 16;
if((x & 255) == 0)
x >>= 8;
if((x & 15) == 0)
x >>= 4;
if((x & 3) == 0)
x >>= 2;
这时为了让我们的数字是一个平方数,它必须等于 `1 mod 8`。
if((x & 7) != 1)
return false;
以下是 [Hensel’s lemma](http://en.wikipedia.org/wiki/Hensel%27s_lemma) 的基本结构(代码未测试。如果不能用就把 `t` 的值改成 `2` 或者 `8`。
int64 t = 4, r = 1;
t <<= 1; r += ((x - r * r) & t) >> 1;
t <<= 1; r += ((x - r * r) & t) >> 1;
t <<= 1; r += ((x - r * r) & t) >> 1;
用一个循环不断重复以上代码,直到 `t == 2^33`。

我的想法是,在每次迭代时给 `r` 加上一位,也就是 `x` 「当前」的平方根,每个平方根都是一个越来越大的 $$2^n$$ 的整除数。最终,`r` 和 `t/2` 就是 `x mod t/2` 的两个平方根。(注意:如果 `r` 是 `x` 的平方根,则 `-r` 也是 `x` 的平方根。在取模数时也是如此,但是要注意,对某些数 `mod` 时,这些数的根数会超过 2。值得注意的是,这包括 2 的幂。)考虑到我们讨论的平方根的范围仅限于小于 $$2^{32}$$ 的情况,所以只要检查 r 和 t/2-r 是不是真正的平方根就可以了。以下是我的 loop 的代码。

int64 r, t, z;
r = start[(x >> 3) & 1023];
do {
z = x - r * r;
if( z == 0 )
return true;
if( z < 0 )
return false;
t = z & (-z);
r += (z & t) >> 1;
if( r > (t >> 1) )
r = t - r;
} while( t <= (1LL << 33) );

这里有三方面可以帮助计算加速:(1)预先计算好的起始值(等价于十次迭代);(2)尽早结束循环;(3)跳过某些 `t` 值。
对于第三个方面,我关注了 `z = r - x * x`,并设置 `t` 为 2 的最大幂除以 `z`。这让我可以跳过那些无论如何都不会影响 `r` 值的 `t` 值。在我的这里,预计算的起始值会挑出「最小的正数」平方根模 8192。

以下是完整的经过测试的代码,包括预先计算的表。


typedef signed long long int int64;

int start[1024] =
{1,3,1769,5,1937,1741,7,1451,479,157,9,91,945,659,1817,11,
1983,707,1321,1211,1071,13,1479,405,415,1501,1609,741,15,339,1703,203,
129,1411,873,1669,17,1715,1145,1835,351,1251,887,1573,975,19,1127,395,
1855,1981,425,453,1105,653,327,21,287,93,713,1691,1935,301,551,587,
257,1277,23,763,1903,1075,1799,1877,223,1437,1783,859,1201,621,25,779,
1727,573,471,1979,815,1293,825,363,159,1315,183,27,241,941,601,971,
385,131,919,901,273,435,647,1493,95,29,1417,805,719,1261,1177,1163,
1599,835,1367,315,1361,1933,1977,747,31,1373,1079,1637,1679,1581,1753,1355,
513,1539,1815,1531,1647,205,505,1109,33,1379,521,1627,1457,1901,1767,1547,
1471,1853,1833,1349,559,1523,967,1131,97,35,1975,795,497,1875,1191,1739,
641,1149,1385,133,529,845,1657,725,161,1309,375,37,463,1555,615,1931,
1343,445,937,1083,1617,883,185,1515,225,1443,1225,869,1423,1235,39,1973,
769,259,489,1797,1391,1485,1287,341,289,99,1271,1701,1713,915,537,1781,
1215,963,41,581,303,243,1337,1899,353,1245,329,1563,753,595,1113,1589,
897,1667,407,635,785,1971,135,43,417,1507,1929,731,207,275,1689,1397,
1087,1725,855,1851,1873,397,1607,1813,481,163,567,101,1167,45,1831,1205,
1025,1021,1303,1029,1135,1331,1017,427,545,1181,1033,933,1969,365,1255,1013,
959,317,1751,187,47,1037,455,1429,609,1571,1463,1765,1009,685,679,821,
1153,387,1897,1403,1041,691,1927,811,673,227,137,1499,49,1005,103,629,
831,1091,1449,1477,1967,1677,697,1045,737,1117,1737,667,911,1325,473,437,
1281,1795,1001,261,879,51,775,1195,801,1635,759,165,1871,1645,1049,245,
703,1597,553,955,209,1779,1849,661,865,291,841,997,1265,1965,1625,53,
1409,893,105,1925,1297,589,377,1579,929,1053,1655,1829,305,1811,1895,139,
575,189,343,709,1711,1139,1095,277,993,1699,55,1435,655,1491,1319,331,
1537,515,791,507,623,1229,1529,1963,1057,355,1545,603,1615,1171,743,523,
447,1219,1239,1723,465,499,57,107,1121,989,951,229,1521,851,167,715,
1665,1923,1687,1157,1553,1869,1415,1749,1185,1763,649,1061,561,531,409,907,
319,1469,1961,59,1455,141,1209,491,1249,419,1847,1893,399,211,985,1099,
1793,765,1513,1275,367,1587,263,1365,1313,925,247,1371,1359,109,1561,1291,
191,61,1065,1605,721,781,1735,875,1377,1827,1353,539,1777,429,1959,1483,
1921,643,617,389,1809,947,889,981,1441,483,1143,293,817,749,1383,1675,
63,1347,169,827,1199,1421,583,1259,1505,861,457,1125,143,1069,807,1867,
2047,2045,279,2043,111,307,2041,597,1569,1891,2039,1957,1103,1389,231,2037,
65,1341,727,837,977,2035,569,1643,1633,547,439,1307,2033,1709,345,1845,
1919,637,1175,379,2031,333,903,213,1697,797,1161,475,1073,2029,921,1653,
193,67,1623,1595,943,1395,1721,2027,1761,1955,1335,357,113,1747,1497,1461,
1791,771,2025,1285,145,973,249,171,1825,611,265,1189,847,1427,2023,1269,
321,1475,1577,69,1233,755,1223,1685,1889,733,1865,2021,1807,1107,1447,1077,
1663,1917,1129,1147,1775,1613,1401,555,1953,2019,631,1243,1329,787,871,885,
449,1213,681,1733,687,115,71,1301,2017,675,969,411,369,467,295,693,
1535,509,233,517,401,1843,1543,939,2015,669,1527,421,591,147,281,501,
577,195,215,699,1489,525,1081,917,1951,2013,73,1253,1551,173,857,309,
1407,899,663,1915,1519,1203,391,1323,1887,739,1673,2011,1585,493,1433,117,
705,1603,1111,965,431,1165,1863,533,1823,605,823,1179,625,813,2009,75,
1279,1789,1559,251,657,563,761,1707,1759,1949,777,347,335,1133,1511,267,
833,1085,2007,1467,1745,1805,711,149,1695,803,1719,485,1295,1453,935,459,
1151,381,1641,1413,1263,77,1913,2005,1631,541,119,1317,1841,1773,359,651,
961,323,1193,197,175,1651,441,235,1567,1885,1481,1947,881,2003,217,843,
1023,1027,745,1019,913,717,1031,1621,1503,867,1015,1115,79,1683,793,1035,
1089,1731,297,1861,2001,1011,1593,619,1439,477,585,283,1039,1363,1369,1227,
895,1661,151,645,1007,1357,121,1237,1375,1821,1911,549,1999,1043,1945,1419,
1217,957,599,571,81,371,1351,1003,1311,931,311,1381,1137,723,1575,1611,
767,253,1047,1787,1169,1997,1273,853,1247,413,1289,1883,177,403,999,1803,
1345,451,1495,1093,1839,269,199,1387,1183,1757,1207,1051,783,83,423,1995,
639,1155,1943,123,751,1459,1671,469,1119,995,393,219,1743,237,153,1909,
1473,1859,1705,1339,337,909,953,1771,1055,349,1993,613,1393,557,729,1717,
511,1533,1257,1541,1425,819,519,85,991,1693,503,1445,433,877,1305,1525,
1601,829,809,325,1583,1549,1991,1941,927,1059,1097,1819,527,1197,1881,1333,
383,125,361,891,495,179,633,299,863,285,1399,987,1487,1517,1639,1141,
1729,579,87,1989,593,1907,839,1557,799,1629,201,155,1649,1837,1063,949,
255,1283,535,773,1681,461,1785,683,735,1123,1801,677,689,1939,487,757,
1857,1987,983,443,1327,1267,313,1173,671,221,695,1509,271,1619,89,565,
127,1405,1431,1659,239,1101,1159,1067,607,1565,905,1755,1231,1299,665,373,
1985,701,1879,1221,849,627,1465,789,543,1187,1591,923,1905,979,1241,181};

bool bad255[512] =
{0,0,1,1,0,1,1,1,1,0,1,1,1,1,1,0,0,1,1,0,1,0,1,1,1,0,1,1,1,1,0,1,
1,1,0,1,0,1,1,1,1,1,1,1,1,1,1,1,1,0,1,0,1,1,1,0,1,1,1,1,0,1,1,1,
0,1,0,1,1,0,0,1,1,1,1,1,0,1,1,1,1,0,1,1,0,0,1,1,1,1,1,1,1,1,0,1,
1,1,1,1,0,1,1,1,1,1,0,1,1,1,1,0,1,1,1,0,1,1,1,1,0,0,1,1,1,1,1,1,
1,1,1,1,1,1,1,0,0,1,1,1,1,1,1,1,0,0,1,1,1,1,1,0,1,1,0,1,1,1,1,1,
1,1,1,1,1,1,0,1,1,0,1,0,1,1,0,1,1,1,1,1,1,1,1,1,1,1,0,1,1,0,1,1,
1,1,1,0,0,1,1,1,1,1,1,1,0,0,1,1,1,1,1,1,1,1,1,1,1,1,1,0,0,1,1,1,
1,0,1,1,1,0,1,1,1,1,0,1,1,1,1,1,0,1,1,1,1,1,0,1,1,1,1,1,1,1,1,
0,0,1,1,0,1,1,1,1,0,1,1,1,1,1,0,0,1,1,0,1,0,1,1,1,0,1,1,1,1,0,1,
1,1,0,1,0,1,1,1,1,1,1,1,1,1,1,1,1,0,1,0,1,1,1,0,1,1,1,1,0,1,1,1,
0,1,0,1,1,0,0,1,1,1,1,1,0,1,1,1,1,0,1,1,0,0,1,1,1,1,1,1,1,1,0,1,
1,1,1,1,0,1,1,1,1,1,0,1,1,1,1,0,1,1,1,0,1,1,1,1,0,0,1,1,1,1,1,1,
1,1,1,1,1,1,1,0,0,1,1,1,1,1,1,1,0,0,1,1,1,1,1,0,1,1,0,1,1,1,1,1,
1,1,1,1,1,1,0,1,1,0,1,0,1,1,0,1,1,1,1,1,1,1,1,1,1,1,0,1,1,0,1,1,
1,1,1,0,0,1,1,1,1,1,1,1,0,0,1,1,1,1,1,1,1,1,1,1,1,1,1,0,0,1,1,1,
1,0,1,1,1,0,1,1,1,1,0,1,1,1,1,1,0,1,1,1,1,1,0,1,1,1,1,1,1,1,1,
0,0};

inline bool square( int64 x ) {
// Quickfail
if( x < 0 || (x&2) || ((x & 7) == 5) || ((x & 11) == 8) )
return false;
if( x == 0 )
return true;

// Check mod 255 = 3 * 5 * 17, for fun
int64 y = x;
y = (y & 4294967295LL) + (y >> 32);
y = (y & 65535) + (y >> 16);
y = (y & 255) + ((y >> 8) & 255) + (y >> 16);
if( bad255[y] )
return false;

// Divide out powers of 4 using binary search
if((x & 4294967295LL) == 0)
x >>= 32;
if((x & 65535) == 0)
x >>= 16;
if((x & 255) == 0)
x >>= 8;
if((x & 15) == 0)
x >>= 4;
if((x & 3) == 0)
x >>= 2;

if((x & 7) != 1)
return false;

// Compute sqrt using something like Hensel's lemma
int64 r, t, z;
r = start[(x >> 3) & 1023];
do {
z = x - r * r;
if( z == 0 )
return true;
if( z < 0 )
return false;
t = z & (-z);
r += (z & t) >> 1;
if( r > (t >> 1) )
r = t - r;
} while( t <= (1LL << 33) );

return false;
}


Stack Overflow 原链接: https://stackoverflow.com/questions/295579/fastest-way-to-determine-if-an-integers-square-root-is-an-integer