Monday, September 18, 2023

Correlation doesn't mean causation

Correlation: A mutual relationship or connection between two variables. 


When there is a positive correlation, an increase in one variable is associated with an increase in the other. (For instance, scientists might correlate an increase in time spent watching TV with an increase in risk of obesity.) 

Where there is an inverse correlation, an increase in one value is associated with a decrease in the other. (Scientists might correlate an increase in TV watching with a decrease in time spent exercising each week.) 
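As a minimal sketch of the two cases above, the following Python snippet computes the Pearson correlation coefficient for two small data sets. The numbers are invented purely for illustration (they do not come from any study), and the helper name `pearson` is our own.

```python
from math import sqrt

def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / sqrt(var_x * var_y)

# Hypothetical weekly figures for five people (illustrative only).
tv_hours = [5, 10, 15, 20, 25]
body_mass_index = [22, 24, 25, 27, 30]   # tends to rise with TV time
exercise_hours = [6, 5, 4, 2, 1]         # tends to fall with TV time

r_positive = pearson(tv_hours, body_mass_index)
r_inverse = pearson(tv_hours, exercise_hours)
print(f"TV vs. BMI:      r = {r_positive:+.2f}")
print(f"TV vs. exercise: r = {r_inverse:+.2f}")
```

A coefficient near +1 indicates a strong positive correlation and one near -1 a strong inverse correlation. Neither value says anything about which variable, if any, is the cause.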

A correlation between two variables does not necessarily mean that one is causing the other. Thus, correlation is not sufficient for causation.

Different possibilities of causation

1. Direct causation: A causes B

A pool stick strikes a billiard ball, causing it to jump.

2. Reverse causation: B causes A

Consider the correlation between recreational drug use and psychiatric disorders. Perhaps the drugs cause the disorders, or else people use drugs to self-medicate for preexisting conditions, in which case the disorder brings about the drug use.

Children that watch a lot of TV are the most violent. Clearly, TV makes children more violent. 

This could easily be the other way round; that is, violent children may like watching TV more than less violent ones do!

3. Concomitant causes: A and B both cause C

Consider COVID-19 in a person with a preexisting condition: John gets COVID, but he is 76 years old and has a heart condition.

Is COVID a direct cause of death? 

Before we answer, let's keep in mind these four conditions:
1. COVID must precede John's death (IT DOES).
2. It must be nearly impossible for COVID to be present and John not to die (FALSE: John could survive both COVID and his existing heart condition). At this point the answer is already NO, because all four conditions must hold together, but let's review the last two:
3. The cause must make a difference (IT DOES, to some extent).
4. There must be no common cause (AND HERE THERE IS ONE: the heart condition!).

So, this clearly shows that COVID CANNOT BE THE DIRECT CAUSE OF DEATH.
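The reasoning above is a conjunction: all four conditions must hold together. A minimal sketch in Python, where the dictionary keys are our own shorthand for the conditions and the truth values are the ones given in the text:

```python
# Truth values taken from the four conditions discussed above.
# The key names are shorthand chosen for this sketch.
conditions = {
    "cause_precedes_effect": True,      # 1. COVID precedes John's death
    "effect_nearly_inevitable": False,  # 2. John could have survived COVID
    "cause_makes_a_difference": True,   # 3. it does, to some extent
    "no_common_cause": False,           # 4. the heart condition is one
}

# Direct causation requires every condition to hold at once.
is_direct_cause = all(conditions.values())
failed = [name for name, holds in conditions.items() if not holds]

print("Direct cause:", is_direct_cause)
print("Failed conditions:", ", ".join(failed))
```

A single failed condition is enough to reject direct causation, which is why the answer was already settled at condition 2.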

4. Cyclic causation: A causes B and B causes A

In a predator-prey relationship, predator numbers affect prey numbers, but prey numbers (i.e., the food supply of predators) also affect predator numbers.

5. Indirect causation: A causes C which causes B  

6. Fringe case: A and B are consequences of a common cause, but do not cause each other

Example (from psychology): The relationship between anxiety and shyness shows a statistical value (strength of correlation) of +.59. Therefore, it may be concluded that shyness, causally speaking, influences anxiety.

Yet there is a catch: the so-called "self-consciousness score" shows a sharper correlation (+.73) with shyness, which brings up a possible "third variable" known as "self-consciousness". So now we have shyness, anxiety and self-consciousness together. When three such closely related measures are found, it suggests that each may be part of a cluster of correlated values, each influencing the others to some extent.
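The common-cause pattern can be illustrated with a small simulation. This is only a sketch under stated assumptions: the data are synthetic, the model (one hidden variable driving both A and B, plus independent noise) is hypothetical, and it does not reproduce the shyness and anxiety figures quoted above. It shows that A and B can correlate strongly even though neither influences the other, and that the correlation largely disappears once the common cause is controlled for using the standard first-order partial correlation formula.

```python
import random
from math import sqrt

def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / sqrt(var_x * var_y)

random.seed(0)  # fixed seed so the run is repeatable
n = 5000

# Hypothetical model: one hidden common cause drives both observed variables.
common = [random.gauss(0, 1) for _ in range(n)]
a = [c + random.gauss(0, 0.5) for c in common]  # A depends only on the common cause
b = [c + random.gauss(0, 0.5) for c in common]  # B depends only on the common cause

r_ab = pearson(a, b)
r_ac = pearson(a, common)
r_bc = pearson(b, common)

# Partial correlation of A and B after controlling for the common cause.
partial_ab = (r_ab - r_ac * r_bc) / sqrt((1 - r_ac ** 2) * (1 - r_bc ** 2))

print(f"corr(A, B)                = {r_ab:.2f}")
print(f"corr(A, B | common cause) = {partial_ab:.2f}")
```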

So, the first conclusion above (that shyness causally influences anxiety) is false.

Ishmael hits Ahab with his car. Ahab is rushed to the hospital and is sent into surgery. During the course of the operation, the surgeon is careless and causes Ahab more injuries.

No causation: There is no connection between A and B (coincidence)

See the two curves in the chart above (consumption of margarine and the divorce rate in Maine over a 10-year period). It looks as if Americans' fondness for margarine were correlated with the divorce rate in Maine, but this is an instance of two unrelated data sets showing a coincidental pattern.

Confounding (in statistics): A situation where one or more unrecognized variables (conditions or events) were responsible for some effect. 

This could give the faulty impression that the effect was due to something else. Confounding often occurs when researchers did not “control” for the possibility that other variables were or could be at work. 

Example: The estimated risk ratio for CVD (cardiovascular disease) in obese as compared to non-obese persons is RR = 0.153/0.086 ≈ 1.79 (the two risks shown are rounded), suggesting that obese persons are 1.79 times as likely to develop CVD as non-obese persons.
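The arithmetic behind a risk ratio can be sketched in a few lines of Python. The function name `risk_ratio` is our own, and the non-obese risk of 0.086 is an assumption: it is the value consistent with a ratio close to 1.79. Because both inputs are rounded, the printed result may differ from 1.79 in the last digit.

```python
def risk_ratio(risk_exposed, risk_unexposed):
    """Outcome risk in the exposed group divided by the risk in the unexposed group."""
    return risk_exposed / risk_unexposed

risk_obese = 0.153      # CVD risk among obese persons, as quoted in the text
risk_non_obese = 0.086  # assumption: the value that gives a ratio near 1.79

rr = risk_ratio(risk_obese, risk_non_obese)
print(f"RR = {rr:.2f}")
```

A ratio above 1 means the outcome is more common in the exposed group; by itself it does not show that the exposure is the cause.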

However, it is well known that the risk of CVD also increases with age. Could any (or all) of the apparent association between obesity and incident CVD be attributable to age? If the obese group in our sample is older than the non-obese group, then all or part of the increased CVD risk in obese persons could be due to their greater age rather than their obesity. If age is another risk factor for CVD, and if obese and non-obese persons differ in age, then the association between obesity and CVD will be overestimated, because of the additional burden of being older. Thus, age meets the definition of a confounder: it is associated with both the primary risk factor (obesity) and the outcome (CVD). In fact, in this data set, subjects who were 50+ were more likely to be obese (200/400 = 0.500) than subjects younger than 50 (100/600 = 0.167), as demonstrated by the table below.
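The two proportions quoted above can be checked directly. This short sketch uses only the counts given in the text; the variable names are our own.

```python
# Counts quoted in the text: obese subjects / all subjects in each age group.
obese_older, total_older = 200, 400      # subjects aged 50+
obese_younger, total_younger = 100, 600  # the younger subjects

prevalence_older = obese_older / total_older
prevalence_younger = obese_younger / total_younger

# How much more common obesity is in the older group.
prevalence_ratio = prevalence_older / prevalence_younger

print(f"older group:   {prevalence_older:.3f}")
print(f"younger group: {prevalence_younger:.3f}")
print(f"ratio:         {prevalence_ratio:.1f}")
```

Because obesity is unevenly distributed across the age groups, age is associated with the exposure, which is one of the two requirements for a confounder.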

