Simpson's paradox: spoiler

Q: If drug B has a higher success rate (percentage of cures) than drug A when given to women, and also when given to men, does it have a higher success rate when given to people in general?
2 Apr 1997

A: Not necessarily, e.g.

     Women             Men
     Success  Failure  Success  Failure
A    85       31       4        5
B    3        1        1        1

Then for women B (75%) is better than A (73%) and for men B (50%) is again better than A (44%), but for people in general A (71%) is better than B (67%).
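The percentages can be checked with a few lines of Python. This is only a sketch for verifying the arithmetic; the names `trial` and `success_rate` are illustrative and not part of the original puzzle.

```python
from fractions import Fraction

# (successes, failures) for each drug and gender, taken from the table above.
trial = {
    "A": {"women": (85, 31), "men": (4, 5)},
    "B": {"women": (3, 1), "men": (1, 1)},
}

def success_rate(groups):
    """Pooled success rate over an iterable of (successes, failures) pairs."""
    groups = list(groups)
    successes = sum(s for s, _ in groups)
    people = sum(s + f for s, f in groups)
    return Fraction(successes, people)

for drug, by_gender in trial.items():
    women = success_rate([by_gender["women"]])
    men = success_rate([by_gender["men"]])
    overall = success_rate(by_gender.values())
    print(f"{drug}: women {float(women):.0%}, men {float(men):.0%}, "
          f"overall {float(overall):.0%}")
# A: women 73%, men 44%, overall 71%
# B: women 75%, men 50%, overall 67%
```

The reversal comes from the weighting: drug A was given almost entirely to women, the group with the higher cure rate, so its pooled rate sits close to its rate for women.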

Q: What's the smallest possible "paradoxical" situation (i.e. smallest total number of people)? There are two versions of the problem depending on whether we allow entries to be 0.
2 Apr 1997

A: When I first considered this puzzle I found the following two examples by hand, and wondered whether they were minimal:

Zeros not allowed (20 people):

     Women             Men
     Success  Failure  Success  Failure
A    3        4        1        5
B    1        1        1        4

Zeros allowed (9 people):

     Women             Men
     Success  Failure  Success  Failure
A    2        1        0        1
B    1        0        1        3

In 2011 I found that the first of these was quoted in "Impossible? Surprising Solutions to Counterintuitive Conundrums" by Julian Havil (published 2008, ISBN 978-0-691-13131-3, available from Amazon for example). This inspired me to settle the question with a quick exhaustive computer search.

It turns out that for the zeros-not-allowed version the 20-person example above is not quite the best possible. The minimum possible total is 19, and there are two essentially different solutions:

     Women             Men
     Success  Failure  Success  Failure
A    2        1        2        5
B    3        2        1        3

and

     Women             Men
     Success  Failure  Success  Failure
A    2        1        3        5
B    3        2        1        2

For the zeros-allowed version the 9-person example above is minimal, and is essentially the only solution.
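A search of this kind can be sketched in Python as follows. This is not the program used for the results above, just one possible reconstruction, and it rests on some assumptions: all comparisons are strict, every drug must be given to at least one woman and at least one man (so that each rate is defined), and a reversal in either direction counts as a paradox. The helper names `compositions`, `compare`, `is_paradox` and `smallest_paradox` are made up for this sketch.

```python
from itertools import count

def compositions(total, parts, minimum):
    """Yield every tuple of `parts` integers >= minimum that sums to `total`."""
    if parts == 1:
        if total >= minimum:
            yield (total,)
        return
    for first in range(minimum, total - minimum * (parts - 1) + 1):
        for rest in compositions(total - first, parts - 1, minimum):
            yield (first,) + rest

def compare(s1, f1, s2, f2):
    """Sign of (rate 1 - rate 2), or None if either group is empty."""
    n1, n2 = s1 + f1, s2 + f2
    if n1 == 0 or n2 == 0:
        return None
    diff = s1 * n2 - s2 * n1  # cross-multiplication avoids rounding
    return (diff > 0) - (diff < 0)

def is_paradox(cells):
    """cells = A women (S, F), A men (S, F), B women (S, F), B men (S, F)."""
    a_ws, a_wf, a_ms, a_mf, b_ws, b_wf, b_ms, b_mf = cells
    women = compare(a_ws, a_wf, b_ws, b_wf)
    men = compare(a_ms, a_mf, b_ms, b_mf)
    if women is None or men is None:
        return False
    overall = compare(a_ws + a_ms, a_wf + a_mf, b_ws + b_ms, b_wf + b_mf)
    # Same strict winner in both subgroups, opposite strict winner overall.
    return women == men != 0 and overall == -women

def smallest_paradox(allow_zeros):
    """Return (smallest total, all paradoxical tables with that total)."""
    minimum = 0 if allow_zeros else 1
    for total in count(1):
        hits = [c for c in compositions(total, 8, minimum) if is_paradox(c)]
        if hits:
            return total, hits

for allow_zeros in (True, False):
    total, hits = smallest_paradox(allow_zeros)
    print(f"zeros allowed: {allow_zeros}, minimum total: {total}, "
          f"tables found: {len(hits)}")
```

Under these assumptions the search is meant to reproduce the totals of 9 and 19 quoted above. The raw list of tables includes mirror images (swapping the drugs, the genders, and so on), so its length is larger than the number of essentially different solutions.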
20 Mar 2011

Q: Here's another striking version. A greengrocer sells apples at a fixed price per fruit, and oranges similarly. Each day an apple costs more than an orange. I buy fruit on several days. On average, did my apples cost me more per fruit than my oranges?
20 Mar 2011

A: Not necessarily. For example:

Simpson's paradox can be interpreted as a sign that we're asking the wrong question. In the drugs trial we shouldn't be asking whether drug A is better than drug B, but rather why both drugs are more effective on women than on men. At the greengrocer's we shouldn't be asking whether apples cost more than oranges, but rather why the prices of both fruits changed so much on Tuesday.
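The greengrocer effect is easy to reproduce numerically. The prices and quantities in the sketch below are invented purely for illustration and are not the author's example: both prices jump on Tuesday, most of the apples are bought on Monday and most of the oranges on Tuesday.

```python
# Hypothetical prices in pence per fruit: each day an apple costs more than an orange.
prices = {
    "Monday": {"apple": 20, "orange": 10},
    "Tuesday": {"apple": 100, "orange": 90},
}
# Hypothetical purchases: apples mostly on the cheap day, oranges mostly on the dear day.
bought = {
    "Monday": {"apple": 9, "orange": 1},
    "Tuesday": {"apple": 1, "orange": 9},
}

def average_price(fruit):
    """Average price paid per fruit over all days."""
    spent = sum(prices[day][fruit] * bought[day][fruit] for day in prices)
    quantity = sum(bought[day][fruit] for day in prices)
    return spent / quantity

print(average_price("apple"))   # 28.0
print(average_price("orange"))  # 82.0
```

With these made-up numbers the apples average 28p each and the oranges 82p each, even though an apple was dearer than an orange on both days.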
20 Mar 2011

Q: Call the above situation a 2-level paradox, because we're measuring the drugs' effectiveness at two levels: the gender level and the overall population level. Is it possible to have a 3-level paradox? For example, is it possible that drug A has a higher success rate on people, but drug B has a higher success rate on women and on men, but drug A has a higher success rate on each of young women, old women, young men and old men?
Haidar Al-Dhalimy, 2 Apr 2021

A: Here's a 3-level paradox:

     Women                               Men
     Young             Old               Young             Old
     Success  Failure  Success  Failure  Success  Failure  Success  Failure
A    2        3        3        1        1        3        1        1
B    1        2        5        2        1        4        4        5

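The better drug for a given subpopulation (shown in red text in the original diagram) is A for young women (40% vs 33%), old women (75% vs 71%), young men (25% vs 20%) and old men (50% vs 44%); B for women (60% vs 56%) and for men (36% vs 33%); and A for people in general (47% vs 46%).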

By exhaustive search this example is minimal for the zeros-not-allowed version, although there are other minimal examples.
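The three levels can be checked mechanically. The sketch below pools the cells of the table at each level and reports which drug has the higher success rate; the names `trial`, `pooled` and `better_drug` are illustrative only.

```python
from itertools import product

# (successes, failures) per drug, gender and age group, from the table above.
trial = {
    "A": {("women", "young"): (2, 3), ("women", "old"): (3, 1),
          ("men", "young"): (1, 3), ("men", "old"): (1, 1)},
    "B": {("women", "young"): (1, 2), ("women", "old"): (5, 2),
          ("men", "young"): (1, 4), ("men", "old"): (4, 5)},
}

def pooled(drug, gender=None, age=None):
    """Return (successes, people) for a drug, pooling any attribute left as None."""
    cells = [sf for (g, a), sf in trial[drug].items()
             if gender in (None, g) and age in (None, a)]
    return sum(s for s, _ in cells), sum(s + f for s, f in cells)

def better_drug(gender=None, age=None):
    """Name of the drug with the strictly higher success rate, or 'tie'."""
    sa, na = pooled("A", gender, age)
    sb, nb = pooled("B", gender, age)
    # Compare sa/na with sb/nb by cross-multiplication to avoid rounding.
    if sa * nb == sb * na:
        return "tie"
    return "A" if sa * nb > sb * na else "B"

print("people:", better_drug())                    # A
for gender in ("women", "men"):
    print(f"{gender}:", better_drug(gender))       # B, B
for gender, age in product(("women", "men"), ("young", "old")):
    print(f"{age} {gender}:", better_drug(gender, age))  # A for all four
```

The table contains 39 people in total: 15 given drug A and 24 given drug B.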

Inspired by this example I got a bit carried away and found a 6-level paradox. This diagram is transposed relative to the tables above, so each column aggregates pairs of subpopulations from the previous column. This diagram is also available as a PDF document.

By exhaustive search this example is minimal for the zeros-not-allowed version, although there are probably other minimal examples.

I produced a 6-level example because the table still fits on one page, but I stopped there because I'm not sure we'd learn very much by going further. I can't really see any structure in these examples, and I'm just using a brute-force search so I don't have a method for constructing them. I don't even have a bound on the size of minimal examples at a given level.

In this 6-level example the effect simply reverses at each level. However, I think that in fact we could specify any pattern of red text in the diagram, and then find a population where colouring the better drug in each subpopulation yields that pattern, although I haven't proved it.
