加入收藏 | 设为首页 | 会员中心 | 我要投稿 云计算网_泰州站长网 (http://www.0523zz.com/)- 视觉智能、AI应用、CDN、行业物联网、智能数字人!
当前位置: 首页 > 站长学院 > MsSql教程 > 正文

sql-server – 检查是否存在EXISTS优于COUNT! ……不是吗?

发布时间:2021-03-14 21:05:55 所属栏目:MsSql教程 来源:网络整理
导读:副标题#e# 我经常阅读当必须检查行的存在时,应始终使用EXISTS而不是COUNT. 然而,在最近的几个场景中,我测量了使用计数时的性能提升. 模式如下: LEFT JOIN ( SELECT someID,COUNT(*) FROM someTable GROUP BY someID) AS Alias ON ( Alias.someID = mainTab
副标题[/!--empirenews.page--]

我经常阅读当必须检查行的存在时,应始终使用EXISTS而不是COUNT.

然而,在最近的几个场景中,我测量了使用计数时的性能提升.
模式如下:

LEFT JOIN (
    SELECT
        someID,COUNT(*)
    FROM someTable
    GROUP BY someID
) AS Alias ON (
    Alias.someID = mainTable.ID
)

我不熟悉告诉SQL Server“内部”发生了什么的方法,所以我想知道是否存在一个带有EXISTS的无法解决的缺陷,这对我已经完成的测量非常有意义(可以说是RBAR吗?!).

你对这种现象有一些解释吗?

编辑:

这是您可以运行的完整脚本:

SET NOCOUNT ON
SET STATISTICS IO OFF

DECLARE @tmp1 TABLE (
    ID INT UNIQUE
)


DECLARE @tmp2 TABLE (
    ID INT,X INT IDENTITY,UNIQUE (ID,X)
)

; WITH T(n) AS (
    SELECT
        ROW_NUMBER() OVER (ORDER BY (SELECT NULL))
    FROM master.dbo.spt_values AS S
),tally(n) AS (
    SELECT
        T2.n * 100 + T1.n
    FROM T AS T1
    CROSS JOIN T AS T2
    WHERE T1.n <= 100
    AND T2.n <= 100
)
INSERT @tmp1
SELECT n
FROM tally AS T1
WHERE n < 10000


; WITH T(n) AS (
    SELECT
        ROW_NUMBER() OVER (ORDER BY (SELECT NULL))
    FROM master.dbo.spt_values AS S
),tally(n) AS (
    SELECT
        T2.n * 100 + T1.n
    FROM T AS T1
    CROSS JOIN T AS T2
    WHERE T1.n <= 100
    AND T2.n <= 100
)
INSERT @tmp2
SELECT T1.n
FROM tally AS T1
CROSS JOIN T AS T2
WHERE T1.n < 10000
AND T1.n % 3 <> 0
AND T2.n < 1 + T1.n % 15

PRINT '
COUNT Version:
'

WAITFOR DELAY '00:00:01'

SET STATISTICS IO ON
SET STATISTICS TIME ON

SELECT
    T1.ID,CASE WHEN n > 0 THEN 1 ELSE 0 END AS DoesExist
FROM @tmp1 AS T1
LEFT JOIN (
    SELECT
        T2.ID,COUNT(*) AS n
    FROM @tmp2 AS T2
    GROUP BY T2.ID
) AS T2 ON (
    T2.ID = T1.ID
)
WHERE T1.ID BETWEEN 5000 AND 7000
OPTION (RECOMPILE) -- Required since table are filled within the same scope

SET STATISTICS TIME OFF

PRINT '

EXISTS Version:'

WAITFOR DELAY '00:00:01'

SET STATISTICS TIME ON

SELECT
    T1.ID,CASE WHEN EXISTS (
        SELECT 1
        FROM @tmp2 AS T2
        WHERE T2.ID = T1.ID
    ) THEN 1 ELSE 0 END AS DoesExist
FROM @tmp1 AS T1
WHERE T1.ID BETWEEN 5000 AND 7000
OPTION (RECOMPILE) -- Required since table are filled within the same scope

SET STATISTICS TIME OFF

在SQL Server 2008R2(七个64位)上我得到了这个结果

COUNT版本:

Table ‘#455F344D’. Scan count 1,logical reads 8,physical reads 0,read-ahead reads 0,lob logical reads 0,lob physical reads 0,lob read-ahead reads 0.
Table ‘#492FC531’. Scan count 1,logical reads 30,lob read-ahead reads 0.

SQL Server Execution Times:
CPU time = 0 ms,elapsed time = 81 ms.

EXISTS版本:

Table ‘#492FC531’. Scan count 1,logical reads 96,lob read-ahead reads 0.
Table ‘#455F344D’. Scan count 1,elapsed time = 76 ms.

解决方法

I’ve often read when one had to check existence of a row should always be done with EXISTS instead of with a COUNT.

任何事情都是非常罕见的,特别是涉及到数据库时.在SQL中有许多表达相同语义的方法.如果有一个有用的经验法则,可能是使用最自然的语法编写查询(并且,是的,这是主观的),并且只有在您获得的查询计划或性能不可接受时才考虑重写.

对于它的价值,我自己对这个问题的看法是存在查询最自然地使用EXISTS来表达.这也是我的经验,EXISTS tends to optimize better比OUTER JOIN拒绝NULL替代.使用COUNT(*)和过滤= 0是另一种选择,恰好在SQL Server查询优化器中有一些支持,但我个人发现这在更复杂的查询中是不可靠的.无论如何,对于我来说,EXISTS似乎比任何一种替代品更自然.

I was wondering if there was a unheralded flaw with EXISTS that gave perfectly sense to the measurements I’ve done

您的特定示例很有趣,因为它突出了优化程序处理CASE表达式(特别是EXISTS测试)中的子查询的方式.

CASE表达式中的子查询

考虑以下(完全合法的)查询:

DECLARE @Base AS TABLE (a integer NULL);
DECLARE @When AS TABLE (b integer NULL);
DECLARE @Then AS TABLE (c integer NULL);
DECLARE @Else AS TABLE (d integer NULL);

SELECT
    CASE
        WHEN (SELECT W.b FROM @When AS W) = 1
            THEN (SELECT T.c FROM @Then AS T)
        ELSE (SELECT E.d FROM @Else AS E)
    END
FROM @Base AS B;

semantics of CASE是WHEN / ELSE条款以文本顺序进行评估generally.在上面的查询中,如果满足WHEN子句,如果ELSE子查询返回多行,则SQL Server返回错误是不正确的.为了尊重这些语义,优化器会生成一个使用传递谓词的计划:

只有当传递谓词返回false时,才会计算嵌套循环连接的内侧.总体效果是CASE表达式按顺序进行测试,并且仅在没有满足先前表达式的情况下才评估子查询.

带有EXISTS子查询的CASE表达式

在CASE子查询使用EXISTS的情况下,逻辑存在测试实现为半连接,但是在后面的子句需要时,必须保留通常被半连接拒绝的行.流经这种特殊类型的半连接的行获取一个标志,以指示半连接是否找到匹配.此标志称为探测列.

实现的细节是逻辑子查询被相关联接(‘apply’)替换为探测列.该工作由查询优化器中的简化规则执行,该规则称为RemoveSubqInPrj(在投影中删除子查询).我们可以使用跟踪标志8606查看详细信息:

SELECT
    T1.ID,CASE
        WHEN EXISTS 
        (
            SELECT 1
            FROM #T2 AS T2
            WHERE T2.ID = T1.ID
        ) THEN 1 
    ELSE 0
    END AS DoesExist
FROM #T1 AS T1
WHERE T1.ID BETWEEN 5000 AND 7000
OPTION (QUERYTRACEON 3604,QUERYTRACEON 8606);

(编辑:云计算网_泰州站长网)

【声明】本站内容均来自网络,其相关言论仅代表作者个人观点,不代表本站立场。若无意侵犯到您的权利,请及时与联系站长删除相关内容!

推荐文章
    热点阅读