Welcome to my statistics blog! I have a special interest in how statistics can lie, but I’ll write about anything math-related.

## Statistics Blog Index

Animal Testing Statistics: Is Experimentation Necessary?

Traveling Salesman Problem & TSP Art

Why do we use sigma / sqrt (n)?

Statistics Made Easy. Six invaluable tips to make sure you pass your statistics class!

Misleading Statistics Examples in advertising and in the news. A roundup of the most famous, the funniest, and the most blatant rubbish in the news and in advertising.

Cool Math Games for Statistics and Probability. Some of my favorite (fun!) math games from around the web.

Square Root Biased Sampling at the Airport. Perhaps this will be the end to me getting special attention *every* time I fly.

Even physicians don’t understand statistics. That’s right — don’t believe everything your doctor tells you. Especially if it’s about cancer.

## Even Physicians Don’t Understand Statistics

If you’re feeling overwhelmed by your statistics class, you aren’t alone. Even doctors don’t really understand statistics. The ramifications can be serious, as an article in the journal Psychological Science in the Public Interest points out.

One example highlighted in the report concerns mammogram results. A woman has a routine mammogram and is informed by her physician that the test is positive. As a woman who has *had* two abnormal mammograms, I know it can be a distressing experience. But the main question on my mind was *“If my mammogram is positive, what are the odds that I actually have cancer?”*

As a statistician, I know the odds are in my favor (not that it makes for a less distressing experience). However, in the report, the authors gave gynecologists (women’s reproductive health specialists) all of the information needed to answer that question (if a woman has a positive mammogram reading, what are the odds that she has cancer?). An alarming 80 percent of physicians got the answer wrong. The correct answer in the case given to them was 10 percent:

- 20 percent of doctors answered 10 percent
- 20 percent of doctors answered 1 percent
- 60 percent of doctors answered **81 or 90 percent**

The last fact is perhaps the most shocking. A woman walking into one of these physicians’ offices with a positive mammogram result would be told that cancer was almost certain (81 or 90 percent) when in fact her chances of having cancer were relatively small. A large number of women report depression and anxiety months after receiving an abnormal mammogram, so these physicians are putting needless extra stress on their patients.
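The calculation behind that 10 percent answer is Bayes’ theorem. The exact figures handed to the gynecologists aren’t listed above, so the inputs in this minimal sketch are an assumption: a 1 percent prevalence, a 90 percent sensitivity, and a 9 percent false-positive rate, which are the numbers I recall being quoted for this scenario. The function name is just illustrative.

```python
def positive_predictive_value(prevalence, sensitivity, false_positive_rate):
    """P(cancer | positive test) via Bayes' theorem."""
    true_positives = prevalence * sensitivity
    false_positives = (1 - prevalence) * false_positive_rate
    return true_positives / (true_positives + false_positives)


# Assumed inputs (not stated in the post): 1% prevalence,
# 90% sensitivity, 9% false-positive rate.
ppv = positive_predictive_value(0.01, 0.90, 0.09)
print(f"Chance of cancer given a positive mammogram: {ppv:.1%}")
```

With those assumed inputs the result comes out a little over 9 percent, or roughly 1 in 10. The false positives among the many healthy women swamp the true positives among the few who are ill, which is exactly the step that is easy to miss.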

The main problem isn’t that statistics is hard; it’s that numbers in news reports and medical journals are often presented in confusing ways (just take a look at statistics on any major medical site to see examples). Even government information can be deceptive. In 1995, the UK government warned physicians that the risk of potentially lethal blood clots in the legs had doubled with the advent of new brands of contraceptives. That sounds very alarming until you read that the risk had risen from 1 in 7000 to 2 in 7000. That’s still a relatively low risk for a drug.
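The gap between “doubled” and “still rare” is the gap between relative and absolute risk. Here is a small sketch, assuming a baseline of 1 in 7000 and an exact doubling; the variable names are my own.

```python
from fractions import Fraction

baseline_risk = Fraction(1, 7000)   # risk before the new contraceptives
new_risk = 2 * baseline_risk        # "doubled"

# Relative increase: how much bigger the new risk is than the old one.
relative_increase = (new_risk - baseline_risk) / baseline_risk

# Absolute increase: how many extra cases per woman that actually means.
absolute_increase = new_risk - baseline_risk

print(f"Relative increase: {float(relative_increase):.0%}")
print(f"Absolute increase: {absolute_increase} "
      f"(about {float(absolute_increase * 10000):.1f} extra cases per 10,000 women)")
```

A 100 percent relative increase makes the headline, but the absolute increase is one extra case per 7000 women.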

## Online Statistics Book List

A lot of students find this statistics blog through searches for “Online Statistics Book.” After all, with 600+ articles on elementary statistics, you’ve practically got a textbook for free (although you can check out our PDF version here). However, not everyone visiting this statistics blog is looking for a basic online statistics book. There *are* a lot of resources out there to help you if you’re looking for something a little more advanced. Here are some of my favorites:

- Basic/Elementary Statistics: The Practically Cheating Statistics Handbook. More than just a handbook: it includes hundreds of articles on just about any elementary statistics topic you can think of, all in step-by-step format. Plus, if you buy this online statistics book you get three other statistics books for FREE (Excel Statistics and two calculator guides).
- Statistical Reasoning: Carnegie Mellon. The course includes all expository text, simulations, case studies, comprehension tests, interactive learning exercises, and the StatTutor labs. All for completely FREE!
- Mathematical Statistics: MIT. This free online mathematical statistics book is a graduate-level statistics course based upon a book in progress by MIT Professor Dudley. Copies of the book chapters and sections are presented as lecture notes on the website and can be browsed, downloaded and printed. Also included on the site are problem sets and exams.
- Statistics for Applications: MIT. This free online course includes lecture notes and exams.
- Probability and Statistics in Engineering: MIT. Another free course with excellent lecture notes.

If you prefer an online interactive environment to learn R and statistics, this *free R Tutorial by Datacamp* is a great way to get started. If you're somewhat comfortable with R and are interested in going deeper into statistics, try *this Statistics with R track*.

*Facebook page* and I'll do my best to help!

I’m not exactly sure what you mean by “option”. It usually depends on what data you put into your hypothesis test.

I am totally lost with my statistics class, I’ve got 2 weeks left in this class & I am stuck!! I am looking for someone to help me with my homework, 1 quiz & my final!!! For example this is one of my homework questions:

Find the critical value(s) for a left-tailed z-test with α = 0.02.

If you can help & explain it to me in ENGLISH it would help me a lot!!
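For the critical-value question above: in a left-tailed z-test, the critical value is the z-score that has an area of α to its left under the standard normal curve. A minimal sketch using Python’s standard library (one way to look it up, instead of a z-table):

```python
from statistics import NormalDist

alpha = 0.02  # significance level from the question

# Left-tailed test: the whole rejection region sits in the left tail,
# so the critical value is the alpha-quantile of the standard normal.
z_critical = NormalDist().inv_cdf(alpha)

print(f"Critical value: z = {z_critical:.3f}")
```

This gives a value of about -2.05. You would reject the null hypothesis if your test statistic falls below it.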

What formula should I use when I have different respondents for my independent and dependent variables?

What are you trying to find? (i.e. standard deviation etc.)

I have a series of sales values

$2,450,271 $2,007,882 $2,604,400 $2,565,980

Using an Excel formula or function, how do I determine the optimum sales value from this series?

Can anyone explain how to detect outliers using IQR for nominal and categorical values?

Please tell me how to use the Ornstein-Uhlenbeck process in trading. Explain with an example.

Hi there. Please help…

If power is “the probability of detecting a real effect if a real effect exists in nature,” explain what its mutually exclusive alternative is. (Hint: What is a more common name for the probability of not detecting a real effect if a real effect exists in nature?)

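Given that a real effect exists, there are two possibilities: your test either detects it or it doesn’t. Power is the probability of detecting it, so the mutually exclusive alternative is failing to detect it. That is a Type II error, and its probability is usually written β, which means power = 1 − β.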

An online game has a daily bonus.

There are 4 different possible bonuses. These are a Card, Chips, Reward points, and a 2X multiplier.

Each day you need to select one of three “boxes” to get your daily bonus.

For control purposes, the same box is chosen every day. For illustration, say it’s the one on the far left.

What is the probability that one of the bonuses has not come up in 700 attempts?

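Assuming the four bonuses are equally likely and each day is independent, it’s tiny. The chance that one particular bonus never comes up in 700 attempts is 0.75^700 (about 3.5 × 10^-88), and the chance that at least one of the four is missing is roughly four times that.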
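The 0.75^700 figure is the chance that one particular bonus never appears. If the question is whether *any* of the four is missing, inclusion-exclusion gives the exact answer. This is a minimal sketch, assuming each day’s bonus is independent and equally likely to be any of the four (so which box you pick doesn’t matter); the function name is hypothetical.

```python
from math import comb


def prob_some_bonus_missing(n_draws, n_bonuses=4):
    """P(at least one bonus type never appears in n_draws independent, uniform draws)."""
    total = 0.0
    for k in range(1, n_bonuses + 1):
        # Probability that a given set of k bonus types never appears,
        # counted over all sets of size k with alternating sign.
        p_k_missing = ((n_bonuses - k) / n_bonuses) ** n_draws
        total += (-1) ** (k + 1) * comb(n_bonuses, k) * p_k_missing
    return total


p = prob_some_bonus_missing(700)
print(f"P(some bonus missing after 700 days) = {p:.2e}")
```

At 700 draws the first term, 4 × 0.75^700, dominates completely, so the answer is on the order of 10^-87. If a bonus really hasn’t shown up in 700 days, the equal-probability assumption is almost certainly wrong.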

Hi, sorry, I’m going to ask a basic question to make sure I’m on the right track. If we want to calculate the proportion of “yes” answers in yes/no data, do we need to include the missing data in the denominator as well (e.g. yes: 2, no: 6, no data: 2)? So will the “yes” proportion be 2/10 or 2/8?

Thank you

I wouldn’t count it in the denominator. So yes — use 2/8.

What test should be used to compare 3 different populations, when you only have 1 sample mean per population?

A z-score can’t be used, right? It’s only for comparing 2 populations, and one-way ANOVA needs several sample means.

For example, I want to compare the prevalence of wheezing in my country in 2013 to that of 2002 and that of 1994.

Why not use ANOVA? It sounds like you do have three sample means (one from each population) unless I am misunderstanding what data you have. ANOVA can be used to test the hypothesis that the means are the same from three or more different populations: those from 1994, 2002 and 2013.

Hello, Can anyone provide a step by step example of how to go about filling in the missing areas in this Anova output table for my homework assignment please? I’m sorry I couldn’t copy the actual box separation lines.

The following table is the output from an ANOVA analysis comparing 4 groups. There were a total of 160 subjects. Use this information to fill in the question marks in the output table.

I would greatly appreciate it.

Thank You

| Source | df | Sum Sq | Mean Sq | F | p |
|---|---|---|---|---|---|
| Between | ? | ? | 13.786 | ? | <0.001 |
| Within | ? | 137.47 | ? | | |
| Total | 159 | ? | | | |
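The missing cells in a one-way ANOVA table follow from a handful of identities: df between = groups − 1, df within = df total − df between, Mean Sq = Sum Sq / df, and F = Mean Sq between / Mean Sq within. Here is a rough sketch of the arithmetic, assuming that 13.786 is the between-groups Mean Sq and 137.47 is the within-groups Sum Sq, as the column order in the question suggests.

```python
k_groups = 4          # number of groups being compared
n_total = 160         # total number of subjects
ms_between = 13.786   # assumed: between-groups Mean Sq from the table
ss_within = 137.47    # assumed: within-groups Sum Sq from the table

# Degrees of freedom
df_between = k_groups - 1
df_total = n_total - 1
df_within = df_total - df_between

# Sums of squares and mean squares
ss_between = ms_between * df_between
ms_within = ss_within / df_within
ss_total = ss_between + ss_within

# F statistic
f_stat = ms_between / ms_within

print(f"Between: df={df_between}, Sum Sq={ss_between:.3f}, F={f_stat:.2f}")
print(f"Within:  df={df_within}, Mean Sq={ms_within:.4f}")
print(f"Total:   df={df_total}, Sum Sq={ss_total:.3f}")
```

The degrees of freedom are a good first check: 3 + 156 should add back up to the 159 given in the Total row.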