2024年4月12日发(作者:)

custid companyname

----------- ----------------

22 Customer DTDMN

57 Customer WVAXS

Next, run the following code to insert a new order to the Orders table with a NULL customer ID:

INSERT INTO

(custid, empid, orderdate, requireddate, shippeddate, shipperid,

freight, shipname, shipaddress, shipcity, shipregion,

shippostalcode, shipcountry)

VALUES(NULL, 1, '20090212', '20090212',

'20090212', 1, 123.00, N'abc', N'abc', N'abc',

N'abc', N'abc', N'abc');

Run the query that is supposed to return customers who did not place orders again:

SELECT custid, companyname

FROM ers AS C

WHERE custid NOT IN(SELECT

FROM AS O);

This time the query returns an empty set. Keeping in mind what you've read in the section about NULLs in , try to explain why the query

returns an empty set. Also try to think of ways to get customers 22 and 57 in the output, and in general, to figure out best practices you can

follow to avoid such problems, assuming there is a problem here.

Obviously, the culprit in this story is the NULL customer ID that was added to the Orders table and is now returned by the subquery among

the known customer IDs.

Let's start with the part that behaves the way you expect it to. The IN predicate returns TRUE for a customer who placed orders (for

example, customer 85) because such a customer is returned by the subquery. The NOT operator is used to negate the IN predicate; hence,

the NOT TRUE becomes FALSE, and the customer is not returned by the outer query. This means that when a customer ID appears in the

Orders table, you can tell for sure that the customer placed orders, and therefore you don't want to see it in the output. However, having a

NULL customer ID in the Orders table, you can't tell for sure whether a certain customer ID does not appear in Orders, as explained shortly.

The IN predicate returns UNKNOWN (the truth value UNKNOWN like the truth values TRUE and FALSE) for a customer such as 22 that

does not appear in the set of known customer IDs in Orders. The IN predicate returns UNKNOWN for such a customer because comparing

it with all known customer IDs yields FALSE, and comparing it with the NULL in the set yields UNKNOWN. FALSE OR UNKNOWN yields

UNKNOWN. As a more tangible example, consider the expression 22 NOT IN (1, 2, NULL). This expression can be rephrased as NOT 22

IN (1, 2, NULL). You can expand the last expression to NOT (22 = 1 OR 22 = 2 OR 22 = NULL). Evaluate each individual expression in the

parentheses to its truth value and you get NOT (FALSE OR FALSE OR UNKNOWN), which translates to NOT UNKNOWN, which

evaluates to UNKNOWN.

The logical meaning of UNKNOWN here before you apply the NOT operator is that you can't tell for sure whether the customer ID appears

in the set, because the NULL could represent that customer ID as well as anything else. The tricky part is that negating the UNKNOWN

with the NOT operator still yields UNKNOWN, and UNKNOWN in a query filter is filtered out. This means that in a case where it is

unknown whether a customer ID appears in a set, it is also unknown whether it doesn't appear in the set.

In short, when you use the NOT IN predicate against a subquery that returns at least one NULL, the outer query always returns an empty

set. Values from the outer table that are known to appear in the set are not returned because the outer query is supposed to return values

that do not appear in the set. Values that do not appear in the set of known values are not returned because you can never tell for sure that

the value is not in the set that includes the NULL.

So, what practices can you follow to avoid such trouble?

First, when a column is not supposed to allow NULLs, it is important to define it as NOT NULL. Enforcing data integrity is much more

important than many people realize.

Second, in all queries that you write you should consider all three possible truth values of a three-valued logic (TRUE, FALSE, and

UNKNOWN). Think explicitly about whether the query might process NULLs, and if so, whether the default treatment of NULLs is suitable

for your needs. When it isn't, you need to intervene. For example, in our case the outer query returns an empty set because of the

comparison with NULL. If you want to check whether a customer ID appears in the set of known values and ignore the NULLs, you should

exclude the NULLs—either explicitly or implicitly. An example of explicitly excluding the NULLs is by adding the predicate IS NOT

NULL to the subquery, like so:

SELECT custid, companyname

FROM ers AS C

WHERE custid NOT IN(SELECT

FROM AS O

WHERE IS NOT NULL);

An example of excluding the NULLs implicitly is using the NOT EXISTS predicate instead of NOT IN, like so:

SELECT custid, companyname

FROM ers AS C

WHERE NOT EXISTS

(SELECT *

FROM AS O

WHERE = );

Recall that unlike IN, EXISTS uses two-valued predicate logic. EXISTS always returns TRUE or FALSE and never UNKNOWN. When the

subquery stumbles into a NULL in , the expression evaluates to UNKNOWN and the row is filtered out. As far as the EXISTS

predicate is concerned, the NULL cases are eliminated naturally, as though they weren't there. So EXISTS ends up handling only known

customer IDs. Therefore, it's safer to use NOT EXISTS than NOT IN.

When you're done experimenting, run the following code for cleanup:

DELETE FROM WHERE custid IS NULL;

DBCC CHECKIDENT('', RESEED, 11077);