A Bayesian Solution to the Conflict of Narrowness and Precision in Direct Inference

The conflict of narrowness and precision in direct inference occurs if a body of evidence contains estimates for frequencies in a certain reference class and less precise estimates for frequencies in a narrower reference class. To develop a solution to this conflict, I draw on ideas developed by Paul Thorn and John Pollock. First, I argue that Kyburg and Teng’s solution to the conflict of narrowness and precision leads to unreasonable direct inference probabilities. I then show that Thorn’s recent solution to the conflict also leads to unreasonable direct inference probabilities. Based on my analysis of Thorn’s approach, I propose using a natural distribution as the prior in a Bayesian analysis of the data obtained directly from studying members of the narrowest reference class.


Introduction
In direct inference, the probability that a certain individual belongs to a target class is equated with the relative frequency of the target class in a suitable reference class (Venn 1888; Reichenbach 1949; Pollock 1990; Kyburg 1961). For instance, from the premises that (1) Roland is a male Austrian and that (2) 27.3% of male Austrians smoke, it follows by direct inference that Roland smokes with probability 0.273. Knowledge of frequencies in narrower reference classes to which the individual belongs defeats this direct inference. If we add the premises that (3) Roland is 32 years old and that (4) 35.7% of male Austrians aged 32 smoke, then it follows by direct inference that Roland smokes with probability 0.357. The single-case probability here is the rational degree of belief, or rational credence, of an agent who entertains the relevant body of evidence. In what follows, I will call it the direct inference probability. The direct inference probability expresses the agent's lack of knowledge and uncertainty regarding Roland's smoking status. It is not a physical probability, i.e., it does not express objective randomness in the world. For the sake of brevity, I will often not mention 'target class' or 'frequency' explicitly. For instance, instead of speaking of 'the estimate for the frequency of the target class in the narrower reference class', I will speak of 'the estimate for the narrower reference class'.
As numerous authors have pointed out, striving for narrow reference classes and striving for precise estimates of frequencies in reference classes are competing aims (Venn 1888; Reichenbach 1949; Kyburg and Teng 2001). Narrower reference classes should be preferred in direct inference, because frequencies of the target class in narrower reference classes are more relevant to the direct inference probability. However, narrower reference classes have fewer members. Hence, it is harder to take a sufficiently large sample of individuals that belong to a narrow reference class. As a result, statistical estimates for frequencies of the target class in a narrow reference class are often less precise. As John Venn, the pioneer of reference class reasoning, puts it:
[...]; but whilst cautioning us against appealing to too wide a class, it seems to suggest that we cannot go wrong in the opposite direction, that is in taking too narrow a class. And yet we do avoid any such extremes. John Smith is not only an Englishman; he may also be a native of such a part of England, be living in such a Presidency, and so on. An indefinite number of such additional characteristics might be brought out into notice, many of which at any rate have some bearing upon the question of vitality. Why do we reject any consideration of these narrower classes? We do reject them, but it is for what may be termed a practical rather than a theoretical reason. Now many of the attributes of any individual are so rare that to take them into account would be at variance with the fundamental assumption of our science, viz. that we [are] properly concerned only with averages of large numbers (Venn 1888, p. 220).
Simply rejecting these narrower reference classes is a luxury we often cannot afford. In personalized medicine, for instance, the narrowest reference class for which reliable statistics can be compiled is often not narrow enough. Probabilistic relationships between the narrowest reference class for which reliable statistics can be compiled and the target class are often not strong enough to be useful for medical diagnosis and prediction (Manolio et al. 2009; Salari et al. 2012).
Hence, a body of evidence often contains estimates for frequencies in a certain reference class and less precise estimates for frequencies in a narrower reference class. In this case, I say that a conflict of narrowness and precision obtains. Solving the conflict amounts to specifying the rational degree of belief in the face of such a conflict. What should our degree of belief that Roland smokes be if our body of evidence contains, for instance, in addition to (1) Roland is a male Austrian and Roland is 32 years old and (2) 27.3% of male Austrians smoke, also (3) the frequency of smokers among 32-year-old male Austrians is between 20% and 50%?
To develop a new solution to the conflict of precision and narrowness, I draw on ideas developed by Paul Thorn and John Pollock. To determine direct inference probabilities, Thorn (2012, 2016) and Pollock (1990, 2011) consider arbitrary subsets of the broader reference class. I show that Thorn's approach leads to unreasonable direct inference probabilities and propose a remedy.
The paper is organised as follows. In Sect. 2, I present Kyburg and Teng's approach to direct inference and argue that it leads to unreasonable direct inference probabilities. In Sect. 3, I present Thorn's and Pollock's approaches to direct inference. I then show that Thorn's solution to the conflict of narrowness and precision leads to unreasonable direct inference probabilities, and I analyse the causes of this failure. In Sect. 4, I propose a new Bayesian solution to the conflict of precision and narrowness that employs a natural prior.

Kyburg and Teng's Approach to Direct Inference
In this section, I present Kyburg and Teng's solution to the conflict of narrowness and precision. I argue that the resulting direct inference probabilities are often unreasonable. Instead of only considering point-valued information, Kyburg and Teng also consider interval-valued information about frequencies in reference classes. Let $T$ be a target class, $R, R', R_1, \ldots, R_5$ be reference classes, and $c$ an individual. Suppose that $\mathrm{freq}(T \mid R) \in [x, y]$ states that the relative frequency of individuals in $R$ that are also in $T$ lies in the interval $[x, y]$. Finally, assume that $\mathrm{PROB}(A)$ is the rational degree of belief that the proposition $A$ is true. Following Kyburg and Teng, I call two intervals $[x, y]$ and $[u, v]$ conflicting if and only if neither is a subset of the other, i.e., if and only if $[x, y] \not\subseteq [u, v]$ and $[u, v] \not\subseteq [x, y]$. Kyburg and Teng (2001) propose the following reference class rules.
If a body of evidence contains information about many different reference classes, the criteria of specificity and precision may interact. For such complex direct inference scenarios, Kyburg and Teng (2001) suggest sharpening the body of evidence before drawing any direct inference. They propose the following procedure to sharpen a body of evidence. In a first step, they apply the Criterion of Specificity to the whole body of evidence. This step rules out all intervals that conflict with the interval for the most specific reference class. In a second step, they apply the Criterion of Specificity to the second most specific reference class in the remaining set. They iterate this procedure until the intervals for any two remaining reference classes $R' \subset R$ no longer conflict. Application of the Criterion of Precision then yields the tightest of these intervals. Finally, they apply the combination rule to the remaining competing reference classes. The following example illustrates this procedure. Suppose that $c$ belongs to the reference classes $R_1, R_2, R_3, R_4, R_5$. Our body of evidence contains the following information about frequencies in those reference classes. [...]
1. The Ace Urn Company orders a proportion of .51 red balls;
2. The proportion of red balls ordered by the Taiwan Division is somewhere in [.01, .52].
An agent draws a ball from an urn labeled "Ace Urn Company-Made in Taiwan". What is the probability that the agent will draw a red ball? Kyburg and Teng's approach leads to the direct inference probability 0.51. Suppose a brilliant chef offers us a delicious meal if either (1) we guess the color of the ball correctly or (2) we roll a fair die and it comes up 6. We may choose between the two options. I agree with Stone that we should choose the first option and guess "black". Kyburg and Teng, however, recommend "red".
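To make the sharpening procedure concrete for this urn, here is a simplified two-class sketch in Python. The helper names `conflicting` and `sharpen_two` are hypothetical; the sketch covers only one broader and one narrower class, applies Specificity and then Precision as described above, and omits the combination rule, so it is not a full implementation of Kyburg and Teng's procedure.

```python
def conflicting(a, b):
    """Two intervals conflict iff neither is a subset of the other."""
    a_in_b = b[0] <= a[0] and a[1] <= b[1]
    b_in_a = a[0] <= b[0] and b[1] <= a[1]
    return not a_in_b and not b_in_a

def sharpen_two(broad, narrow):
    """Simplified two-class sharpening (hypothetical helper): Specificity, then Precision."""
    if conflicting(broad, narrow):
        return narrow  # Criterion of Specificity: the narrower class wins a conflict
    # No conflict (one interval is nested in the other): Criterion of Precision keeps the tighter one
    return min(broad, narrow, key=lambda iv: iv[1] - iv[0])

ace = (0.51, 0.51)     # Ace Urn Company as a whole
taiwan = (0.01, 0.52)  # Taiwan Division
print(sharpen_two(ace, taiwan))  # (0.51, 0.51)
```

Because $[.51, .51] \subseteq [.01, .52]$, the two intervals do not conflict, so precision selects the point value 0.51 even though the narrower class leaves open that very few Taiwanese balls are red.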

Subset Approaches
For determining direct inference probabilities, Pollock (2011) and Thorn (2016) consider arbitrary subsets of the broader reference class. In this section, I present Thorn's solution to the conflict of narrowness and precision. I show that it leads to unreasonable direct inference probabilities and explain why. Finally, I argue that Pollock's approach to direct inference encounters the very same problems.

Thorn's Epistemic Utility Argument
Thorn (2016) gives the following epistemic utility argument for equating the direct inference probability with the expected frequency in the narrowest reference class. Denote by '$S_c$' the conjunction of all properties that the individual $c$ is known to have. Suppose that we assign to all individuals $d_1, \ldots, d_n$ with the properties $S_c$ the same direct inference probability $\mathrm{PROB}(d_i \in T) = v_i = v$. Thorn calls such a policy principled. Let $V(d_i \in T) \in \{0, 1\}$ be the truth value of the proposition $d_i \in T$. Assume that we measure epistemic inaccuracy by the average squared deviation of the predicted values from the true values,

$$S = \frac{1}{n}\sum_{i=1}^{n} \big(v - V(d_i \in T)\big)^2.$$

Then $S$ is minimised if and only if $v = \mathrm{freq}(T \mid S_c)$. Hence, setting the direct inference probability to the frequency in the narrowest reference class maximises epistemic accuracy in the class $S_c$. Thorn (2016) shows that this result holds for a much more general class of accuracy measures (so-called proper scoring rules). Often, however, $\mathrm{freq}(T \mid S_c)$ will be unknown. In these cases, the expected value of $\mathrm{freq}(T \mid S_c)$ maximises expected accuracy in the class $S_c$ (Thorn 2016). Thorn calls the expected value of $\mathrm{freq}(T \mid S_c)$ the expected frequency. Following Thorn, we denote it by '$E[\mathrm{freq}(T \mid S_c)]$'. For this reason, he entertains the following direct inference rule: If $R'$ is the narrowest reference class the individual $c$ is known to belong to and $E[\mathrm{freq}(T \mid R')] = r$, then $\mathrm{PROB}(c \in T) = r$.²

This concludes the presentation of Thorn's argument. Thorn (2016) himself identifies and discusses four non-trivial assumptions in his argument. First, the expected frequency in the narrowest reference class minimises expected inaccuracy only if accuracy is measured by proper scoring rules. If epistemic inaccuracy is measured by the average absolute deviation $S' = \frac{1}{n}\sum_{i=1}^{n} |v - V(d_i \in T)|$, then the expected frequency in the narrowest reference class does not minimise $S'$.
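The contrast between the squared score $S$ and the absolute score $S'$ can be checked numerically. The following Python sketch assumes a hypothetical class $S_c$ of ten individuals, three of whom belong to $T$ (so $\mathrm{freq}(T \mid S_c) = 0.3$), and searches a grid of common forecasts $v$; the function names are illustrative only. Exact fractions avoid floating-point ties.

```python
from fractions import Fraction

def squared_inaccuracy(v, truth_values):
    """Average squared deviation S of a common forecast v from 0/1 truth values."""
    return sum((v - t) ** 2 for t in truth_values) / len(truth_values)

def absolute_inaccuracy(v, truth_values):
    """Average absolute deviation S' of a common forecast v from 0/1 truth values."""
    return sum(abs(v - t) for t in truth_values) / len(truth_values)

# Hypothetical class S_c: 10 individuals, 3 of them in T, so freq(T|S_c) = 0.3.
truth = [1] * 3 + [0] * 7
grid = [Fraction(k, 100) for k in range(101)]  # candidate forecasts 0.00, 0.01, ..., 1.00

best_sq = min(grid, key=lambda v: squared_inaccuracy(v, truth))
best_abs = min(grid, key=lambda v: absolute_inaccuracy(v, truth))
print(float(best_sq), float(best_abs))  # 0.3 0.0
```

The squared score is minimised at the frequency itself, whereas the absolute score is minimised at an extreme value (here 0), because with $f = \mathrm{freq}(T \mid S_c)$ we have $S'(v) = f + v(1 - 2f)$, which is linear in $v$.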
Second, the expected frequency in the narrowest reference class minimises expected inaccuracy only if we make predictions about all individuals with $S_c$. Third, expected accuracy is relative to the distribution employed to calculate it. Maximising expected accuracy is only a legitimate aim if the distribution is empirically accurate, i.e., if it matches the relative frequencies in the world. Fourth, even if the distribution employed to calculate the expected frequency is accurate, one must care about expected accuracy. Thorn's assumptions are by no means uncontroversial. Below, I discuss ways in which they may fail. Regarding Thorn's second assumption, two alternatives seem at least equally plausible. We could aim to make the best decision for the single case $c$, or we could require maximum accuracy in the class of all individuals for which predictions are being made. In both cases, setting $\mathrm{PROB}(c \in T) = \mathrm{freq}(T \mid S_c)$ does not maximise accuracy. Regarding the third assumption, I show in Sect. 4.1 that Thorn's distribution for determining the expected value and expected accuracy is inaccurate. Finally, expected accuracy clearly matters in the long run, but should it matter in the short run? Opinions are divided here. On the one hand, Pollock claims that many single cases add up to a long run. For this reason, they should not receive special treatment.
People sometimes protest at this point that they are not interested in the general case. They are concerned with some inference they are only going to make once. They want to know why they should reason this way in the single case. But all cases are single cases. If you reason in this way in single cases, you will tend to get them right (Pollock 2011, p. 32).
On the other hand, Williamson does not require that direct inference probabilities maximise expected accuracy. His direct inference probabilities should minimise worst-case expected loss (Williamson 2013). Minimising worst-case loss is a cautious strategy.³ A detailed discussion of Thorn's assumptions goes beyond the scope of the present paper. In what follows, I mainly show that Thorn's third assumption is violated: his distribution employed to calculate expected accuracy is inaccurate. Hence, maximising accuracy relative to Thorn's distribution is the wrong thing to do.
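Footnote 3 illustrates Williamson's equivocation norm: once calibration has located $\mathrm{PROB}(c \in T)$ in a confidence interval, the value closest to 0.5 is selected. A minimal Python sketch of that selection step only; the function name `equivocate` is hypothetical, and the sketch ignores how the confidence level itself is chosen.

```python
def equivocate(lower, upper, centre=0.5):
    """Return the value in [lower, upper] closest to `centre` (0.5, the most equivocal value)."""
    return min(max(centre, lower), upper)

# Footnote 3: m = 0.25 with a 95%-confidence interval [0.1, 0.4].
print(equivocate(0.1, 0.4))  # 0.4
print(equivocate(0.3, 0.7))  # 0.5, since the interval contains 0.5
```

Clamping 0.5 into the interval is the whole selection rule; the sketch makes visible why a wide interval can move the direct inference probability far from the maximum likelihood estimate 0.25.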

Thorn's Solution to the Conflict of Narrowness and Precision
Consider now the conflict of narrowness and precision. Assume that $c$ belongs to two reference classes $R' \subset R$. Suppose that (1) we have precise information $\mathrm{freq}(T \mid R) = x$ about the frequency in the broader reference class and that (2) we have imprecise information $\mathrm{freq}(T \mid R') = v_1 \vee \ldots \vee \mathrm{freq}(T \mid R') = v_n$ about the frequency in the narrower reference class. According to Thorn's approach (see Sect. 3.1), $\mathrm{PROB}(c \in T) = E[\mathrm{freq}(T \mid R')]$. To determine the direct inference probability, we therefore need to calculate the expected frequency $E[\mathrm{freq}(T \mid R')]$. Thorn proposes the following method to do this.

Footnote 3: In the following example, minimising worst-case loss and maximising accuracy lead to different direct inference probabilities. Suppose that we only know that $c$ belongs to the reference class $E$. Suppose further that a statistical trial yields the maximum likelihood estimate $m$ for $\mathrm{freq}(T \mid E)$, i.e., setting $\mathrm{freq}(T \mid E) = m$ maximises the probability of the observed outcome of the trial. Williamson recommends calibrating $\mathrm{PROB}(c \in T)$ to the maximum likelihood estimate $m$ only to a certain extent. Suppose that $[m - a, m + a]$ is an $x$%-confidence interval for $\mathrm{freq}(T \mid E)$, where $x$ is the confidence level at which the agent grants that $\mathrm{freq}(T \mid E) \in [m - a, m + a]$. If the agent grants that $\mathrm{freq}(T \mid E) \in [m - a, m + a]$, Williamson's calibration norm locates $\mathrm{PROB}(c \in T)$ in the interval $[m - a, m + a]$. Williamson's equivocation norm then selects the most cautious value in $[m - a, m + a]$, i.e., whichever is closest to 0.5, for $\mathrm{PROB}(c \in T)$. If confidence intervals are wide and estimates are less precise, then Williamson's direct inferences may differ considerably from the maximum likelihood estimate $m$. For instance, if $m = 0.25$ and a 95%-confidence interval is $[0.1, 0.4]$, then Williamson's approach leads to $\mathrm{PROB}(c \in T) = 0.4$.
In what follows, we set $V = \{v_1, \ldots, v_n\}$, so that $\mathrm{freq}(T \mid R') = v_1 \vee \ldots \vee \mathrm{freq}(T \mid R') = v_n$ becomes $\mathrm{freq}(T \mid R') \in V$. According to Thorn's Method 1, $E[\mathrm{freq}(T \mid R')]$ is a weighted average of the values in $V$ (Thorn 2016). The weights are the probabilities that $\mathrm{freq}(T \mid R')$ has value $v_i$, i.e., $\mathrm{PROB}(\mathrm{freq}(T \mid R') = v_i)$. Thorn proposes to determine these probabilities by drawing direct inferences for the reference class $R'$. Since this strategy treats reference classes as individuals and subsumes them under other reference classes, I call the resulting direct inferences meta direct inferences.
According to Thorn, the most appropriate (meta) reference class for $R'$ in this case is the set of all subsets of $R$ that have the same size as $R'$ and whose relative frequency of the target class is among the values in $V$. Hence,
$$\mathrm{PROB}(\mathrm{freq}(T \mid R') = v_i) = \mathrm{freq}\big(\{S : \mathrm{freq}(T \mid S) = v_i\} \mid \{S : S \subseteq R \wedge |S| = |R'| \wedge \mathrm{freq}(T \mid S) \in V\}\big) := p_i$$
(Thorn 2016). Note that [...]. Means of calculating the $z_i$ are well known in finite combinatorics. In fact, the $z_i$ are hypergeometrically distributed (see Equation (5) in the Appendix). We will come back to this in Sect. 3.4. Thorn's reasoning can be summarized in the following two steps.
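Step 1: Meta direct inference to determine the weights
Premise 1: $p_i = \mathrm{freq}(\{S : \mathrm{freq}(T \mid S) = v_i\} \mid \{S : S \subseteq R \wedge |S| = |R'| \wedge \mathrm{freq}(T \mid S) \in V\})$
Premise 2: $R' \in \{S : S \subseteq R \wedge |S| = |R'| \wedge \mathrm{freq}(T \mid S) \in V\}$
Conclusion: $\mathrm{PROB}(\mathrm{freq}(T \mid R') = v_i) = p_i$
Step 2: Build a weighted average of the possible values for the narrower reference class
$E[\mathrm{freq}(T \mid R')] = \sum_{i=1}^{n} p_i \cdot v_i$

The method then generalizes to the case $|R'| \in W$, where $W$ is an arbitrary set (see Thorn 2016, Theorem 5). Calculating $\mathrm{PROB}(\mathrm{freq}(T \mid R') = v_i)$ by the above meta direct inference is initially plausible. If the frequency of smokers in Austria is 30% and Roland is Austrian, then the probability that Roland smokes is 0.3. Analogously, if the frequency of subsets $S$ of $R$ such that $\mathrm{freq}(T \mid S) = v_i$ is $p_i$ and if $R'$ is a subset of $R$, then the probability that $\mathrm{freq}(T \mid R') = v_i$ is $p_i$. As long as no better (meta) reference class for $R'$ can be found, conclusions drawn by Thorn's approach remain plausible. In Sect. 4.1, I show that a better reference class can be found.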
Thorn illustrates Method 1 with the following example:
Example 1. Suppose Bill is a member of Company B. Company B has 100 members and 25 members are NCOs. Suppose also that Bill is a member of the command unit of Company B. The command unit has 10 members. Either 20% or 30% of the command unit are NCOs. What is the probability that Bill is an NCO?
The example can be formalised as follows: $|R| = 100$, $\mathrm{freq}(T \mid R) = 0.25$, $|R'| = 10$ and $\mathrm{freq}(T \mid R') = 0.2 \vee \mathrm{freq}(T \mid R') = 0.3$. Following his Method 1, Thorn derives that the probability that Bill is an NCO, $\mathrm{PROB}(c \in T)$, is 0.2485.⁴ As Thorn correctly claims, this result does not depend much on the size of Company B. In addition, as we will shortly see, if $|R'|$ is sufficiently large, $E[\mathrm{freq}(T \mid R')]$ does not depend much on the size of the command unit either.
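The two steps of Method 1 can be sketched in Python under the assumption, stated in the text, that the weights $p_i$ are proportional to the hypergeometric counts $z_i$ of size-$|R'|$ subsets of $R$ with frequency $v_i$. The helper `method1` is hypothetical and expects the $v_i$ as exact fractions. This plain weighting gives $24/97 \approx 0.247$ for Example 1, close to the reported 0.2485; the small gap presumably reflects details of Thorn's computation (footnote 4, not reproduced here) that this sketch does not model.

```python
from fractions import Fraction
from math import comb

def method1(size_r, t_in_r, size_r_prime, values):
    """Sketch of Thorn's Method 1 (hypothetical helper); `values` must be Fractions.

    Step 1: weight p_i proportional to z_i, the number of subsets S of R with
    |S| = |R'| and freq(T|S) = v_i. Step 2: weighted average of the v_i.
    """
    z = []
    for v in values:
        k = v * size_r_prime              # T-members a subset of size |R'| must contain
        if k.denominator != 1:
            raise ValueError("each v_i must be attainable in a class of size |R'|")
        k = int(k)
        z.append(comb(t_in_r, k) * comb(size_r - t_in_r, size_r_prime - k))
    total = sum(z)
    weights = [Fraction(zi, total) for zi in z]
    expected = sum(w * v for w, v in zip(weights, values))
    return weights, expected

# Example 1: |R| = 100 with 25 NCOs, |R'| = 10, freq(T|R') = 0.2 or 0.3.
weights, expected = method1(100, 25, 10, [Fraction(1, 5), Fraction(3, 10)])
print([round(float(w), 4) for w in weights], float(expected))
```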

Counter-Examples to Thorn's Method
In this section, I give some numerical examples in which Thorn's Method 1 leads to unreasonable direct inference probabilities.⁵ These examples show that the weights $p_i$ are too extreme to serve as a basis for direct inference.
[...] class. In this case, the value closest to $\mathrm{freq}(T \mid R)$ gets most of the weight. In many cases it gets almost all of the weight. For instance, if $\mathrm{freq}(T \mid R') = 0.5 \vee \mathrm{freq}(T \mid R') = 0.7$, then $E[\mathrm{freq}(T \mid R')] = 0.507$.⁸ Of course, many more such examples can be given, but I take those given here to be sufficient to show that Thorn's Method 1 is not correct.
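The extremity of the weights in the last example can be reproduced in a few lines of Python. The excerpt does not restate the class sizes used for it; the sketch assumes those of Example 1 ($|R| = 100$, 25 members of $T$, $|R'| = 10$), under which it reproduces the quoted value 0.507.

```python
from math import comb

# Assumed sizes (those of Example 1): |R| = 100, 25 T-members in R, |R'| = 10.
N, K, n = 100, 25, 10
candidates = (0.5, 0.7)  # freq(T|R') = 0.5 or freq(T|R') = 0.7

# z_i: number of size-n subsets of R whose T-frequency equals v_i.
counts = {v: comb(K, round(v * n)) * comb(N - K, n - round(v * n)) for v in candidates}
total = sum(counts.values())
weights = {v: c / total for v, c in counts.items()}
expected = sum(v * w for v, w in weights.items())
print({v: round(w, 4) for v, w in weights.items()}, round(expected, 3))
```

The value 0.7 receives only about 3.4% of the weight, so the expected frequency barely moves away from 0.5.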

Diagnosis
The main reason why Thorn's Method 1 leads to unreasonable direct inference probabilities is this: to build the needed expected frequency, Thorn relies on relative frequencies in arbitrary subsets of the broader reference class. Since, as I show in this section, these relative frequencies vary very little, the weights $p_i$ are extreme.
Suppose that $\mathrm{freq}(T \mid R) = r$. Then the relative frequencies $\mathrm{freq}(T \mid S)$ in subsets $S \subseteq R$ cluster around $r$ (see Appendix). Surprisingly, however, these relative frequencies are quite uniform. If the sets $R$ and $R'$ are sufficiently large, then for almost all subsets $S \subseteq R$, $\mathrm{freq}(T \mid S)$ is very close to $r$. In other words, the variance of the distribution of $\mathrm{freq}(T \mid S)$ around its expected value $r$ is small. Figure 1 illustrates this "concentration" or "peaking" property. For a more precise statement, see Theorem 1 in the Appendix. Moreover, if the sizes of $R$ and $R'$ tend to infinity, then for every $\epsilon > 0$, $\mathrm{freq}(\{S : S \subseteq R \wedge |S| = |R'| \wedge |r - \mathrm{freq}(T \mid S)| > \epsilon\})$ tends to 0. That is, for almost all subsets $S \subseteq R$, $|\mathrm{freq}(T \mid S) - \mathrm{freq}(T \mid R)|$ is smaller than any fixed $\epsilon > 0$ (think of $\epsilon$ as, for instance, $1/1{,}000{,}000{,}000$).⁹ In Example 1, for instance, the variance of the distribution of $\mathrm{freq}(T \mid S)$ is approximately 0.017. It follows that for about 95% of sets $S \in \{S : S \subseteq R \wedge |S| = |R'|\}$, $\mathrm{freq}(T \mid S) \in [0, 0.5]$. Although this is a reasonable spread around 0.25, it still leads to problems one, two and four for Thorn's Method 1 as discussed in Sect. 3.3. Compared to values smaller than 0.5, values for $\mathrm{freq}(T \mid R')$ higher than 0.5 get almost no weight. Worse still, since the variance of the relevant distribution tends to zero as the sizes of $R$ and $R'$ grow, this tendency is magnified. This "peaking" around the expected value is responsible for the third problem for Thorn's Method 1 as discussed in Sect. 3.3. For instance, if $|R| = 1000$ and $|R'| = 100$ in Thorn's example, then the variance is approximately 0.002, and for about 95% of sets $S \in \{S : S \subseteq R \wedge |S| = |R'|\}$, $\mathrm{freq}(T \mid S) \in [0.17, 0.33]$.
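The variances quoted for Example 1 and its enlarged version can be recomputed from the hypergeometric counts of subsets. The following Python sketch compares a direct enumeration with the standard hypergeometric closed form $r(1-r)(|R| - |R'|)/(|R'|(|R| - 1))$; the 95% bands use a normal approximation, which is an assumption of this sketch rather than something stated in the text. The function names are illustrative.

```python
from math import comb, sqrt

def subset_freq_variance(N, K, n):
    """Variance of freq(T|S) over all size-n subsets S of a class of size N with K T-members.

    Computed by direct enumeration of the hypergeometric counts.
    """
    total = comb(N, n)
    mean = second_moment = 0.0
    for k in range(max(0, n - (N - K)), min(K, n) + 1):
        share = comb(K, k) * comb(N - K, n - k) / total  # fraction of subsets with k T-members
        mean += share * (k / n)
        second_moment += share * (k / n) ** 2
    return second_moment - mean ** 2

def hypergeometric_variance(N, K, n):
    """Closed form r(1 - r)/n * (N - n)/(N - 1), with r = K/N."""
    r = K / N
    return r * (1 - r) / n * (N - n) / (N - 1)

for N, K, n in [(100, 25, 10), (1000, 250, 100)]:
    var = subset_freq_variance(N, K, n)
    half = 1.96 * sqrt(var)  # normal approximation to a central 95% band
    r = K / N
    print(N, n, round(var, 4), (round(max(0.0, r - half), 2), round(min(1.0, r + half), 2)))
```

For $|R| = 100$, $|R'| = 10$ the variance is about 0.017, and for $|R| = 1000$, $|R'| = 100$ about 0.0017 (which the text rounds to 0.002); the printed bands are close to the intervals quoted above.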
I conclude that, to determine the probabilities $p_i$, Thorn uses a distribution with very low variance. The resulting direct inference probabilities are therefore extreme. There are two (not mutually exclusive) options to improve Thorn's approach. First, one may consider special subsets of the broader reference class. The resulting distribution may then have sufficiently high variance. In Sect. 4.1, I propose such a special class of subsets. Second, to obtain more balanced probabilities, one may combine the distribution obtained from considering subsets of the broader reference class with a second distribution. Indeed, the fact that the weights $p_i$ are determined solely by the distribution of subsets of the broader reference class is a second cause of the failure of Thorn's Method 1. Thorn ignores the fact that in most cases there is evidence that establishes which values are epistemically possible for the narrower reference class. There has to be data or other evidence for the fact that, for instance, $\mathrm{freq}(T \mid R') = 0.5 \vee \mathrm{freq}(T \mid R') = 0.7$. If this evidence is quite good, then accurate weights should be less sensitive to the distribution of subsets of the broader reference class. In Sect. 4.2, I propose a Bayesian way to combine the probabilities obtained from the distribution of subsets of the broader reference class with probabilities obtained from data on the narrower reference class.

The Case of No Information Concerning the Narrower Reference Class
Suppose that no information about the frequency in the narrower reference class is available. To determine the direct inference probability in this case, Thorn applies Method 1 to the logical truth $\mathrm{freq}(T \mid R') = \frac{0}{|R'|} \vee \mathrm{freq}(T \mid R') = \frac{1}{|R'|} \vee \ldots \vee \mathrm{freq}(T \mid R') = \frac{|R'|}{|R'|}$. Suppose that $\mathrm{freq}(T \mid R) = r$ and $R' \subset R$. Then:

$$E[\mathrm{freq}(T \mid R')] = \sum_{i=0}^{|R'|} \frac{i}{|R'|} \cdot \mathrm{PROB}\Big(\mathrm{freq}(T \mid R') = \frac{i}{|R'|}\Big).$$

To determine $\mathrm{PROB}(\mathrm{freq}(T \mid R') = \frac{i}{|R'|})$, Thorn applies Step 1 of Method 1. Method 2 then yields the following result (Thorn 2016): If $\mathrm{freq}(T \mid R) = r$ and $R' \subset R$, then $E[\mathrm{freq}(T \mid R')] = r$. I agree that the conclusion $E[\mathrm{freq}(T \mid R')] = r$ is reasonable in such cases.¹⁰ However, I think that the reasoning leading to this conclusion is faulty. To determine $E[\mathrm{freq}(T \mid R')]$, the set of all subsets of the broader reference class is the wrong (meta) reference class for $R'$. In Sect. 4.1, I will propose a more suitable (meta) reference class. Nevertheless, the expected values of two different distributions may agree. By a lucky coincidence, Thorn's approach yields reasonable direct inference probabilities in this case. In other cases there is no such luck, and Thorn's approach yields the wrong direct inference probabilities (see Sect. 3.3).

Fig. 1 Plot of $f : \{0, \ldots, |R|\} \to [0, 1]$, $f(s) := \mathrm{PROB}(X_{\mathrm{Freq}} = s)$
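That this procedure returns $r$ can be checked with exact arithmetic: weighting every possible value $i/|R'|$ by the number of size-$|R'|$ subsets of $R$ with $i$ members of $T$ gives an expected frequency of exactly $r$. A small Python sketch; the function name is hypothetical, and the weights are taken to be the subset counts as in Step 1.

```python
from fractions import Fraction
from math import comb

def method2_expected_frequency(N, K, n):
    """Expected freq(T|R') over V = {0/n, ..., n/n}, weighting each value by subset counts.

    N = |R|, K = number of T-members in R, n = |R'|.
    """
    weights = [comb(K, k) * comb(N - K, n - k) for k in range(n + 1)]  # comb returns 0 if k > K
    return Fraction(sum(w * k for k, w in enumerate(weights)), n * sum(weights))

print(method2_expected_frequency(100, 25, 10))  # 1/4
```

The result follows from the mean of the hypergeometric distribution, $n \cdot K/N$, divided by $n$; as argued in the text, agreeing with $r$ does not by itself vindicate the choice of meta reference class.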

Pollock's Approach to Direct Inference
Like Thorn, Pollock motivates his theory of direct inference by considering the distribution of arbitrary subsets of sets.
Suppose we have a set of 10,000,000 objects. I announce that I am going to select a subset, and ask you how many members it will have. Most people will protest that there is no way to answer this question. It could have any number of members from 0 to 10,000,000. However, if you answer, Approximately 5,000,000, you will almost certainly be right. This is because, although there are subsets of all sizes from 0 to 10,000,000, there are many more subsets whose sizes are approximately 5,000,000 than there are of any other size. In fact, 99% of the subsets have cardinalities differing from 5,000,000 by less than .08%. (Pollock 2011, p. 329)
Pollock (2011) equates the direct inference probability with what he calls the expectable value of the narrowest reference class. He calculates the expectable value within his theory of probable probabilities. These probabilities extrapolate the combinatorial probabilities for finite sets employed by Thorn to infinite sets. The expectable value only exists if the variance of the distribution of arbitrary subsets tends to zero as the size of the broader reference class tends to infinity. In Sect. 4.1, I show that the reasoning underlying Pollock's and Thorn's approaches to direct inference is faulty. Considering arbitrary subsets of sets is not appropriate for determining direct inference probabilities. Consequently, although they may be accurate in some cases, in general one cannot trust the correctness of Pollock's direct inference probabilities.
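Pollock's figure can be checked with a normal approximation, which is an assumption of this sketch: if a subset of a 10,000,000-element set is chosen uniformly from all subsets, its size follows a binomial distribution with $p = 1/2$, mean 5,000,000 and standard deviation $\sqrt{N}/2$.

```python
from math import sqrt
from statistics import NormalDist

N = 10_000_000
# Choosing uniformly among all 2**N subsets makes the subset size Binomial(N, 1/2).
mean_size = N / 2
sd_size = sqrt(N) / 2
z99 = NormalDist().inv_cdf(0.995)  # two-sided 99% quantile of the standard normal (about 2.576)
half_width = z99 * sd_size         # about 99% of subset sizes lie within mean_size +/- half_width
relative = half_width / mean_size
print(round(half_width), f"{relative:.3%}")
```

The half-width comes out at roughly 4,070 elements, i.e. about 0.08% of 5,000,000, in line with the quoted claim.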

Remedy: Natural Distributions
In this section, I propose a new Bayesian solution to the conflict of narrowness and precision. I argue that the meta direct inference to determine the weights (Step 1 in Thorn's Method 1) can be defeated. The set of all subclasses of the broader reference class $R$ that people actually use in direct inference is a more suitable (meta) reference class for $R'$ than the set of arbitrary subsets $\mathrm{POW}(R, |R'|)$. The probabilities obtained from this (meta) reference class yield a natural prior distribution for my Bayesian approach.

Reference Classes are Exceptional Subsets
Thorn's Step 1 in Method 1 is based on meta direct inference. To draw the relevant direct inference, he subsumes $R'$ under the reference class of all subsets of the broader reference class $R$. Since I believe in the cogency of direct inference, refuting Thorn's Method 1 requires that the meta direct inference be defeated by a narrower or a competing reference class for $R'$. Indeed, the set of all reference classes people actually use in direct inference is such a narrower reference class. Let $\mathrm{RefA}(T)$ be the set of all reference classes with respect to the target class $T$ that are actually used in direct inference, and let
$$p^*_i = \mathrm{freq}\big(\{S : \mathrm{freq}(T \mid S) = v_i\} \mid \{S : S \subseteq R \wedge |S| = |R'| \wedge \mathrm{freq}(T \mid S) \in V \wedge S \in \mathrm{RefA}(T)\}\big).$$

Subclass defeat for the meta direct inference to determine the weights
Premise 1: $p_i = \mathrm{freq}(\{S : \mathrm{freq}(T \mid S) = v_i\} \mid \{S : S \subseteq R \wedge |S| = |R'| \wedge \mathrm{freq}(T \mid S) \in V\})$
Premise 2: $R' \in \{S : S \subseteq R \wedge |S| = |R'| \wedge \mathrm{freq}(T \mid S) \in V\}$
Premise 3: $R' \in \mathrm{RefA}(T)$
Premise 4: [...]

Premise 4 is reasonable. The fact that for almost all subsets $S$ of the broader reference class $R$, $\mathrm{freq}(T \mid S)$ is close to $\mathrm{freq}(T \mid R)$ is difficult to reconcile with experience (see also Wallmann and Williamson 2017). In practice, we often find subsets that contain a rather different relative frequency of the target class than the original set. For instance, smoking rates in the United States vary strongly with gender, age, education, poverty status and many other factors. But according to the distribution of frequencies over all subsets, such variations are almost impossible (see Theorem 1 in the Appendix). In other words, Thorn's probabilities $\mathrm{PROB}$ for frequencies in narrower reference classes do not match the frequencies we actually observe in narrower reference classes, i.e., they are inaccurate. But why is this the case?
In direct inference, we consider certain classes of individuals because we believe that they are causally related to the target class. Our past success in detecting causally relevant classes, together with the fact that almost all subclasses of reference classes are not causally relevant to the target class, suggests that we are quite successful in detecting "exceptional" subsets. Since they causally interact with the target class, these exceptional classes tend to be difference makers, i.e., the target class and the reference class tend to be probabilistically dependent. Therefore, frequencies within sub-reference classes that we actually use in direct inference do not cluster around a single value. They cluster around multiple values. Hence, Premise 4 is plausible: the variance among frequencies in sub-reference classes that we actually use in direct inference is higher than in subsets in general. Therefore, the new weights $p^*_i$ are more balanced than the $p_i$. We call distributions that describe how frequencies are distributed across the sub-reference classes we actually use in direct inference natural distributions (for the concept of natural distributions in a different context, see Paris et al. 2000). We should use natural distributions in direct inference because they yield the best long-run epistemic consequences in the intended class: the class of all direct inferences that we actually draw. In the absence of further knowledge, the natural distribution maximises expected accuracy in the class of all direct inferences that we actually draw (see Sect. 3.1).
Grant that the natural distribution is the most suitable one for direct inference. This fact is of little help for drawing direct inferences in practice if there is no way to determine the natural distribution. How can we find out about the natural distribution? Paris et al. (2000) discuss two ways to estimate natural distributions in general. The first is empirical experimentation. We could (1) draw a sample of all direct inferences actually drawn, such as those based on $\mathrm{freq}(T \mid R) = x$, (2) study direct inferences in which subclasses $S \subset R$ were employed, and finally (3) consider the frequencies $\mathrm{freq}(T \mid S)$ for such $S$. The distribution of these frequencies is then an estimate of the natural distribution. Second, we may propose some reasonable properties that natural distributions are supposed to have. For instance, we may assume the default independence $E[\mathrm{freq}(T \mid R')] = \mathrm{freq}(T \mid R) = x$.
A detailed discussion of how to find the natural distribution is beyond the scope of the present paper, but I think I have said enough to make the three crucial points needed here. First, when building expected frequencies for the narrowest reference class, one should consider the sub-reference classes that we actually use in direct inference (rather than arbitrary subclasses of the broader reference class). Second, this natural distribution differs from Thorn's distribution. Third, it is possible, at least in principle, to determine the natural distribution.

A Bayesian Solution to the Conflict of Narrowness and Precision
On the one hand, in the empirical sciences, frequencies $\mathrm{freq}(T \mid R')$ within reference classes are in most cases estimated by a suitable statistical procedure from a sample of members of $R'$. For instance, suppose that an observed random sample¹¹ (with replacement) of 16 $R'$-individuals contains 8 $T$-individuals. The likelihood of such a sample is $\mathrm{freq}(T \mid R')^8 \cdot (1 - \mathrm{freq}(T \mid R'))^8$. The likelihood is maximised for $\mathrm{freq}(T \mid R') = 0.5$. It is much smaller if, for instance, $\mathrm{freq}(T \mid R') = 0.1$. On the other hand, it should not be ignored that $R'$ belongs to the set of all sub-reference classes of $R$ that people actually use. Thus, I reformulate the conflict of narrowness and precision as follows.
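The likelihood comparison in this example is easy to verify. The Python sketch below assumes an ordered sample drawn with replacement, as in the text; the binomial coefficient is omitted because it cancels in ratios and does not affect the maximiser. The function name is illustrative.

```python
def likelihood(freq, successes, trials):
    """Likelihood of an ordered sample drawn with replacement: freq^k * (1 - freq)^(n - k)."""
    return freq ** successes * (1 - freq) ** (trials - successes)

# 8 T-individuals in a random sample of 16 R'-individuals.
ratio = likelihood(0.5, 8, 16) / likelihood(0.1, 8, 16)
grid = [i / 100 for i in range(1, 100)]
mle = max(grid, key=lambda f: likelihood(f, 8, 16))
print(round(ratio), mle)  # 3545 0.5
```

The hypothesis $\mathrm{freq}(T \mid R') = 0.5$ makes the observed sample roughly 3,545 times as likely as $\mathrm{freq}(T \mid R') = 0.1$.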
Conflict of narrowness and precision, reformulated. How should the following information be aggregated to obtain an estimate of $\mathrm{freq}(T \mid R')$: (1) frequencies in a sample of the narrower reference class and (2) probabilities obtained from the natural distribution of sub-reference classes of $R$?
I propose to assign more weight to estimates based on data than to estimates based on the fact that the narrower reference class is a sub-reference class of the broader reference class. This is especially reasonable if the estimate based on data directly about the narrower reference class and the estimate for the broader reference class have almost the same precision.
Data over Expectation Principle. When determining an estimate of $\mathrm{freq}(T\mid R')$, frequencies of the target class in narrower reference classes based on data should get more weight than probabilities derived from the fact that the narrower reference class is a sub-reference class of the broader reference class.
To satisfy the Data over Expectation Principle, I propose to use the natural distribution of sub-reference classes in the broader reference class as the prior distribution in a Bayesian analysis of the data for the narrower reference class. Analysed within Bayesian statistics, samples from narrower reference classes lead to a posterior distribution for $\mathrm{freq}(T\mid R')$. In contrast to Kyburg and Teng's approach to direct inference, not every point in an interval is treated equally (see Sect. 2).
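¹¹ A sample is random if and only if in each draw of the sample every member of the population has the same probability of entering the sample.

Let $D$ be a sequence of observations of whether certain individuals in the narrower reference class belong to the target class, and let $l(D \mid \mathrm{freq}(T\mid R') = v_i)$ be the likelihood of these observations given that the relative frequency of the target class in $R'$ is $v_i$. Then, with the natural distribution assigning prior probability $p^*_i$ to $\mathrm{freq}(T\mid R') = v_i$, the posterior probability is

$$P(\mathrm{freq}(T\mid R') = v_i \mid D) = \frac{p^*_i \times l(D \mid \mathrm{freq}(T\mid R') = v_i)}{\mathcal{T}}$$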
where $\mathcal{T} = \sum_{i=1}^{n} p^*_i \times l(D \mid \mathrm{freq}(T\mid R') = v_i)$ is a normalizing constant. The direct inference probability is the expected value of the posterior distribution:

$$E[\mathrm{freq}(T\mid R') \mid D] = \sum_{i=1}^{n} v_i \times \frac{p^*_i \times l(D \mid \mathrm{freq}(T\mid R') = v_i)}{\mathcal{T}} \tag{3}$$

Modulo the prior distribution, the posterior distribution accounts for the fact that different values of $\mathrm{freq}(T\mid R')$ explain to different degrees why we observe a particular sample. In our example, $\mathrm{freq}(T\mid R') = 0.5$ explains the observation of 8 $T$-individuals in a sample of 16 much better than $\mathrm{freq}(T\mid R') = 0.1$ does. Hence, modulo the prior probabilities, the posterior probability of $\mathrm{freq}(T\mid R') = 0.5$ is much higher than that of $\mathrm{freq}(T\mid R') = 0.1$.

Bayesian statistics has always been subject to the criticism that the posterior probability is subjective because it strongly depends on the chosen prior distribution. However, since it contains the information about the frequency of the target class in the broader reference class, the natural distribution of sub-reference classes in the broader reference class is a reasonable prior distribution for $\mathrm{freq}(T\mid R')$.

Equation (3) captures core intuitions about the conflict of narrowness and precision. If only a small sample of the narrower reference class is available (the case in which the estimate for the narrower reference class is rather imprecise), the frequency in the broader reference class will influence the direct inference probability. As the sample size increases, the frequency in the broader reference class loses influence on the direct inference probability. Again, a Bayesian line of reasoning is not viable in Thorn's and Pollock's approaches: the low variance of the distribution for subsets of broader reference classes makes it almost impossible to update the prior on the basis of data directly about the narrower reference class.
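Equation (3) can be evaluated numerically. Because the natural distribution has not yet been determined (the $p^*_i$ remain open), the sketch below uses an invented stand-in prior: a Beta(3, 7)-shaped profile discretised on 100 grid values $v_i$, whose mean is about 0.3 and which thus plays the role of an assumed broader-class frequency $\mathrm{freq}(T\mid R) = 0.3$. The likelihood is the binomial kernel from the 16-individual example, evaluated in logarithms to avoid numerical underflow for large samples. The function names and all numbers are hypothetical; this is a sketch of the computation, not a claim about any actual natural distribution.

```python
import math

def posterior(values, prior, successes, n):
    """Posterior over candidate values v_i of freq(T|R') given prior weights p*_i
    and a sample with `successes` T-individuals among n (Bayes' theorem)."""
    # log(p*_i) + log l(D | freq(T|R') = v_i); logs avoid underflow for large n
    log_w = [math.log(p) + successes * math.log(v) + (n - successes) * math.log(1 - v)
             for v, p in zip(values, prior)]
    top = max(log_w)
    w = [math.exp(x - top) for x in log_w]
    norm = sum(w)  # normalizing constant, rescaled by exp(top), which cancels
    return [x / norm for x in w]

def direct_inference_probability(values, prior, successes, n):
    """Expected value of the posterior distribution of freq(T|R'), as in Eq. (3)."""
    post = posterior(values, prior, successes, n)
    return sum(v * q for v, q in zip(values, post))

# Hypothetical stand-in for the natural distribution: 100 grid points with a
# Beta(3, 7)-shaped profile, so that its mean is close to freq(T|R) = 0.3.
values = [(i + 0.5) / 100 for i in range(100)]
raw = [v ** 2 * (1 - v) ** 6 for v in values]
total = sum(raw)
prior = [r / total for r in raw]

for n in (16, 160, 1600):
    # samples in which exactly half of the R'-individuals belong to T
    print(n, round(direct_inference_probability(values, prior, n // 2, n), 3))
```

For comparison, with a continuous Beta(3, 7) prior the posterior means for samples of 8/16, 80/160 and 800/1600 would be 11/26 ≈ 0.42, 83/170 ≈ 0.49 and 803/1610 ≈ 0.50. The printed values should therefore move away from the prior mean 0.3 and toward the observed sample frequency 0.5 as the sample grows, which is the behaviour of Eq. (3) described above.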

Conclusions and Future Work
Kyburg and Teng's approach leads to unreasonable direct inference probabilities. Thorn's approach and Pollock's approach are more promising. However, as my examples in Sect. 3.3 show, Thorn's Method 1 leads to unreasonable direct inference probabilities. For instance, if reference classes have sufficiently many members, then it leads to Kyburg and Teng's unreasonable Criterion of Precision.
The main reason for this is that Thorn considers arbitrary subsets of the broader reference class. However, for almost all subsets of the broader reference class, the frequency of the target class is very close to the frequency of the target class in the broader reference class. Consequently, the probability distribution employed to build expected frequencies has too low a variance. This point is more general and applies to any approach to direct inference that is based on combinatorial probabilities. In particular, it applies to Pollock's approach to direct inference.
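The concentration claim can be illustrated with a small combinatorial computation. The sketch assumes a hypothetical broader class $R$ with $N = 10{,}000$ members, $K = 3{,}000$ of which belong to $T$, and considers all subsets of size $k = 1{,}000$; the share of such subsets $S$ with $\mathrm{freq}(T\mid S) = j/k$ is given by the hypergeometric distribution. The sizes and function names are invented, and the sketch does not reconstruct Thorn's Method 1 or Pollock's calculations; it only illustrates the general combinatorial point. It also uses this distribution as a prior in a discrete Bayesian update with the sample of 8 $T$-individuals among 16.

```python
import math

def subset_frequency_distribution(N, K, k):
    """Share of all k-element subsets S of a class with N members (K of them in T)
    for which freq(T|S) = j/k, i.e. the hypergeometric distribution."""
    total = math.comb(N, k)
    return {j / k: math.comb(K, j) * math.comb(N - K, k - j) / total
            for j in range(max(0, k - (N - K)), min(K, k) + 1)}

def mean_after_update(prior, successes, n):
    """Posterior mean of freq(T|R') after a discrete Bayesian update on `prior`."""
    w = {v: p * v ** successes * (1 - v) ** (n - successes)
         for v, p in prior.items() if p > 0}
    norm = sum(w.values())
    return sum(v * x for v, x in w.items()) / norm

dist = subset_frequency_distribution(N=10_000, K=3_000, k=1_000)
share_near = sum(p for v, p in dist.items() if abs(v - 0.3) <= 0.05)
print(f"share of subsets with |freq - 0.3| <= 0.05: {share_near:.4f}")
print(f"posterior mean after 8 of 16: {mean_after_update(dist, 8, 16):.3f}")
```

With these numbers the standard deviation of $\mathrm{freq}(T\mid S)$ across subsets is about 0.014 (its variance is $x(1-x)/k \cdot (N-k)/(N-1)$ with $x = 0.3$), so nearly all subsets lie within a few hundredths of 0.3, and the posterior mean after observing 8 of 16 stays close to 0.3 instead of moving toward 0.5.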
In addition, Thorn's Method 1 is of limited practical applicability to diagnosis and prediction in the empirical sciences. It is silent about the case in which a sample of the target class in the narrower reference class is available. These samples lead to statistical estimates for relative frequencies of the target class in the narrower reference class.
In response to these two shortcomings, I developed a new Bayesian solution to the conflict of narrowness and precision that is based on two main assumptions. First, to determine expected values, the natural distribution should be employed instead of the distribution of frequencies in arbitrary subclasses, i.e., the distribution of frequencies in those sub-reference classes of the broader reference class that we actually use in direct inference. Second, probabilities obtained from the natural distribution need to be aggregated with estimates of the frequencies of the target class in the narrower reference class obtained from data. The resulting approach equates the direct inference probability with the expected value of the posterior distribution in the narrower reference class. A reasonable prior is given by the natural distribution. However, further research is needed to determine the relevant natural distribution, i.e., to determine the $p^*_i$.