“Testimonials! That’s what you need when you write an ad for the frum community!” The extremely perky graphic artist who was interviewing for a job with me was certain. “People want to read stories about how it works. Numbers are boring.”

She has a point. Numbers can be boring, unless we make the numbers tell a story. The problem with testimonials, though, is that no matter how glowing they are, they only describe one person’s experience. In research, we call this anecdotal evidence. Anecdotes can be compelling, but they are not enough to demonstrate the efficacy of any treatment or product.

In 1987, Francine Shapiro was upset. She had some serious issues to mull over, so she took a walk in a busy park. She was thinking about some disturbing memories that just wouldn’t leave the forefront of her brain. As her eyes scanned the scene in the park, she noticed that her distressing emotions began to fade.

Research often starts with an anecdote. Shapiro noticed an interesting phenomenon: her bad mood began to fade as she scanned the scene in the park. Shapiro was curious about this. What was making her bad mood dissipate? As an experienced researcher, Shapiro created a hypothesis: she thought it had something to do with the way her eyes were flicking back and forth, back and forth, over the scene. She believed that this eye movement was allowing her brain to sort the troubling images that were stored in the visual part of her brain, and move them into the long-term storage area instead.

We can generate alternative hypotheses about this story. Many things could have accounted for Shapiro’s change in mood. She could have been helped by being out in nature. There is some research to suggest that simply being outdoors, in a rural setting, has a positive effect on mood. Walking is a form of exercise, and exercise is known to lower depressive symptoms. Maybe walking was the therapeutic process here, and the eye movements were just coincidental? Perhaps nothing at all was changing her mood. We know that moods naturally dissipate over time, because our brain can’t sustain distressing emotion for very long. Perhaps Shapiro was experiencing the natural ebb and flow of emotions.

Shapiro founded a well-respected form of psychotherapy known as Eye Movement Desensitization and Reprocessing (EMDR). It has been demonstrated to be effective in hundreds of studies. But how do we know? How do we determine that the intervention we are using is actually effective? For example, suppose we have a belief that Product XYZ, when rubbed into a child’s temples, helps him concentrate. We faithfully take the child out of the classroom every day, rub Product XYZ on his temples, and we notice a change in his learning and behavior. Then we find out that Product XYZ is a fraud. It’s just colored water! What changed the child’s behavior? Something else about the encounter: the break from the classroom? The Product XYZ administrator, who’s just a really cool lady? Before we spend $400.00 on a 12-ounce bottle of Product XYZ, we need to know if it is indeed helpful.

Enter statistics. The job of statistics is to help us describe, organize, and interpret information. The information we study is vast — we can study improvement in reading scores in a school district when a new curriculum is put into place, how quickly people solve math problems when they are sleep deprived, how many side effects one drug has as compared to another, or even the average price of a kosher dairy meal in Teaneck, New Jersey. How do we find these things out? Data are collected, organized, summarized, and then interpreted.

Descriptive statistics does exactly what it sounds like. We use descriptive statistics to describe the characteristics of a collection of data, called a data set. Suppose we want to know what the typical level of reading proficiency is for fifth graders in a certain school. We can test each fifth grader, get their score, and use calculations such as mean, median, and mode to ascertain what the average score is, what the mid-level score is, and what the most frequent score is.
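
For readers who like to see the arithmetic, here is a minimal sketch of those three calculations using Python's built-in `statistics` module. The ten scores are invented for illustration; they are not data from any real school.

```python
import statistics

# Hypothetical reading scores for ten fifth graders (made up for illustration).
scores = [72, 85, 85, 90, 64, 78, 85, 91, 70, 80]

mean_score = statistics.mean(scores)      # the average score
median_score = statistics.median(scores)  # the mid-level score
mode_score = statistics.mode(scores)      # the most frequent score

print("Mean:", mean_score)
print("Median:", median_score)
print("Mode:", mode_score)
```

With these made-up scores, the mean is 80, the median is 82.5, and the mode is 85. The three numbers answer three slightly different questions about the same class.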

Inferential statistics is the next step. Very often, it’s too cumbersome to collect data from everyone we are curious about. It’s easier to collect the test scores of random fifth-grade classrooms, and then make an inference, or an educated prediction, about all fifth graders, based on our smaller sample.
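
The idea of inferring from a sample can be sketched in a few lines of Python. Everything here is an assumption made for illustration: the "population" is 1,000 simulated scores rather than real data, and the sketch samples individual students at random rather than whole classrooms, which keeps the code short.

```python
import random
import statistics

rng = random.Random(42)  # fixed seed so the sketch is repeatable

# Simulated "population": reading scores for 1,000 fifth graders (not real data).
population = [rng.randint(50, 100) for _ in range(1000)]

# Test only a random sample of 50 students instead of all 1,000.
sample = rng.sample(population, 50)

population_mean = statistics.mean(population)
sample_mean = statistics.mean(sample)

print(f"Mean of all 1,000 scores: {population_mean:.1f}")
print(f"Mean of the 50 sampled:   {sample_mean:.1f}")
```

The sample mean will rarely match the population mean exactly, but with a fair random sample it tends to land close. That closeness is what lets us make an educated prediction about everyone from a smaller group.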

Remember Francine Shapiro, and her eye movements? She started with a hypothesis — something about eye movement, while thinking about traumatic memories, helps the traumatic memories be processed better. Her next steps are twofold. She has to do some research, building up a knowledge base for why this might work, as well as conduct some experiments to see if her hypothesized treatment does what she thinks it will do.

The thing about studying most psychological processes and disturbances is that many things resolve on their own. So how can we know if a treatment is indeed effective? It would be great if we could try every single treatment out on every single person, but that would be cumbersome, and would have its own problems. For one thing, too many treatments are also detrimental! Enter the world of probability.

Suppose Shloimy Shlemazel and Lucky Levi toss a coin to resolve a dispute. Lucky Levi calls “heads,” and Shloimy Shlemazel calls “tails.” If they toss the coin 10 times, approximately how many times should the coin land on “heads” and how many times should it land on “tails”? The law of probability tells us that it’s most likely that the coin will land on “heads” five times, and “tails” five times. However, it’s possible that the coin could land on “heads” six times, and “tails” four times.

Suppose Shloimy Shlemazel suspects that Lucky Levi is cheating. At what point might you agree with him? If the coin lands on “heads” seven times? Eight times? All ten times? At some point, you’re going to say, “Hey, wait a minute. Something is wrong. Ten coin tosses and ten ‘heads’? It’s possible but it’s not probable.”
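
To put numbers on "possible but not probable," here is a small sketch that computes the chance of each result, assuming a fair coin (heads half the time) and independent tosses. The function name `prob_heads` is just a label chosen for this example.

```python
from math import comb

def prob_heads(k, n=10, p=0.5):
    """Probability of exactly k heads in n tosses of a coin that lands heads with probability p."""
    return comb(n, k) * p**k * (1 - p)**(n - k)

# How likely is each of Lucky Levi's possible winning streaks?
for k in range(5, 11):
    print(f"{k} heads out of 10: {prob_heads(k):.4f}")
```

Under these assumptions, exactly five heads comes up about 25% of the time, seven heads about 12% of the time, and ten heads out of ten only about once in 1,024 games. Shloimy's suspicion grows more reasonable with every extra "heads."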

Our job as intervention developers is to beat probability. Let’s assume that 50% of second graders who are struggling with reading are simply “late bloomers,” and will learn to read on their own, with no intervention, by the end of second grade. If we provide a treatment to 100 second graders, and fifty of them indeed get better, was the treatment effective? No. The law of probability tells us they would have gotten better anyway. In order for us to say the intervention was effective, we’d need some larger number. If 75% of them get better using our treatment, we have beaten probability. It’s unlikely that they all just got better due to chance. It’s more likely that the intervention had something to do with their improvement.
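
The same reasoning can be checked with a short calculation. This sketch assumes, as in the example above, that each of the 100 children independently has a 50% chance of improving with no help at all, and asks how often chance alone would produce a given number of successes. The function name `prob_at_least` is a label invented for this example.

```python
from math import comb

def prob_at_least(k, n, p):
    """Probability of k or more successes in n independent tries, each succeeding with probability p."""
    return sum(comb(n, i) * p**i * (1 - p)**(n - i) for i in range(k, n + 1))

# Chance that 50 or more of 100 "late bloomers" improve with no treatment at all.
print(f"50 or more improve by chance: {prob_at_least(50, 100, 0.5):.4f}")

# Chance that 75 or more improve with no treatment at all.
print(f"75 or more improve by chance: {prob_at_least(75, 100, 0.5):.10f}")
```

Under these assumptions, seeing fifty or more children improve happens more than half the time with no treatment, so it tells us nothing. Seeing seventy-five or more improve by chance alone has odds of less than one in a million, which is why that result points to the intervention.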

Shaindy gets a letter from her child’s school. Turns out, all of the children in the school were exposed to hepatitis. Shaindy has two choices of treatments: NIHIL and HEPEX. Both are on the market. NIHIL is heavily advertised in local publications. A celebrity says that she gave her children NIHIL and they didn’t develop hepatitis. A well-known principal also endorses NIHIL, saying that her children are routinely dosed with it, and they didn’t develop hepatitis. HEPEX, on the other hand, has no advertisements. It has been tested in what’s known as an outcome study. The researchers took 500 people who were exposed to hepatitis, and administered HEPEX. Only two of them developed hepatitis. They also administered a fake shot of HEPEX to 500 more people who had been exposed to hepatitis. Thirty-five of them developed the disease.

Which medication should Shaindy choose? If you say HEPEX, you’re a responsible mother. The outcome study is the only way to test whether or not the treatment actually works. Remember, not all children exposed to an illness will develop it. The fact that the celebrity’s children didn’t get hepatitis might just mean that they never would have gotten it, NIHIL or no NIHIL.
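
Here is a rough sketch of what the HEPEX numbers say when we turn them into rates. The counts come straight from the outcome study described above; the "relative risk reduction" at the end is one common way to summarize such a comparison, offered here as an illustration rather than as the researchers' own analysis.

```python
# Figures taken from the HEPEX outcome study described above.
treated_total, treated_sick = 500, 2
placebo_total, placebo_sick = 500, 35

treated_rate = treated_sick / treated_total  # share who got sick despite HEPEX
placebo_rate = placebo_sick / placebo_total  # share who got sick after the fake shot

# How much lower is the risk with HEPEX, relative to the fake shot?
relative_risk = treated_rate / placebo_rate
risk_reduction = 1 - relative_risk

print(f"Hepatitis rate with HEPEX:     {treated_rate:.1%}")
print(f"Hepatitis rate with fake shot: {placebo_rate:.1%}")
print(f"Relative risk reduction:       {risk_reduction:.0%}")
```

The fake-shot group shows that about 7% of exposed people get sick with no real treatment, while only 0.4% got sick with HEPEX. Notice that 93% of the fake-shot group stayed healthy too, which is exactly why a celebrity's healthy children prove nothing about NIHIL.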

Case histories and anecdotes are fun and exciting to read. “I used to be 400 pounds. Then I discovered bitter cherry extract, and now I’m a size two and I swim fourteen miles a day.” That’s much more interesting than reading about 2,000 obese women, some of whom were administered this diet, some of whom were on the wait list, and the dry percentages of how many of them lost how much weight. The problem with testimonials, though, is that you never know how many failures there were. You only read the anecdotes about the successes.


Remember Probability?

Malky is having a very hard time with her son Shaya. She reads about an approach called Kangaroo Parenting, which is supposed to help children who are insecurely attached to their mothers feel more attached. She buys the programs, reads the books, and sure enough, Shaya gets better. She writes a glowing testimonial. But what if Malky is the only person whose child gets better? What if 500 people took the Kangaroo class, and 499 people saw their child stay the same, or even get worse? What percentage of people are successful?

This is the problem with testimonial-based evidence. It doesn’t take probability into account, it doesn’t help us weigh our likelihood of success based on percentages, and it doesn’t offer any generalizable outcome data.

The gold standard in medical or psychological research is the randomized controlled double-blind study. In this type of study, people are divided randomly into two or more groups. Some people are administered the intervention we are studying. We call these people the “experimental group.” Some people are administered a placebo — a fake intervention that is known to be irrelevant. We call these people the “control group.”
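
The "divided randomly" step is simple enough to sketch in code. This is only an illustration of random assignment, not a full study protocol: the participant IDs are invented, and the function name `assign_groups` is a label chosen for this example.

```python
import random

def assign_groups(participants, seed=None):
    """Shuffle participants and split them into two groups of (nearly) equal size."""
    rng = random.Random(seed)
    shuffled = list(participants)  # copy, so the original list is left untouched
    rng.shuffle(shuffled)
    half = len(shuffled) // 2
    return shuffled[:half], shuffled[half:]

# Twenty hypothetical participants, identified only by code numbers.
participant_ids = [f"P{n:03d}" for n in range(1, 21)]
experimental_group, control_group = assign_groups(participant_ids, seed=1)

print("Experimental group:", experimental_group)
print("Control group:     ", control_group)
```

Because chance decides who lands in which group, the researcher cannot steer the most promising patients toward the real treatment, even unintentionally.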

We expect that some people in both groups will get better. However, if a significant number of people in the experimental group get better, we know that this intervention was effective. What’s a “significant number”? It’s usually set before the experiment, but it is often 65% or better.


Placebo Effect

There are some psychological effects that are known to skew medical and psychological research. The most relevant one in this case is the “placebo effect.” Sometimes, when people know that they are receiving a treatment, they get better simply because they believe in the treatment. For example, if we tell a group of patients that they are receiving an experimental drug to treat their depression, when they are actually getting a placebo, some patients might feel so hopeful about the efficacy of the medication, they begin to get better.

There are other reasons for the placebo effect, as well. However, the randomized double-blind control group design defeats the placebo effect, since neither the researcher nor the patient knows whether or not the medication is the “real thing.”


Researcher Bias

Researchers are people too! Geeky people, for the most part, but still humans. If an intervention developer has put her whole life and soul into an intervention, she’s likely to believe strongly in it. She might ignore evidence that demonstrates it’s ineffective, or subconsciously skew the results of her research in many ways. This is why the double-blind study is designed so that neither the researcher nor the participants know who is getting the “real” treatment, and who is in the control group.


Lies and Statistics

“Buy GLEAM toothpaste, the chosen tool of dentists! 100% of dentists surveyed use GLEAM toothpaste.” Sounds good, right? After all, dentists would know about the quality of toothpaste!

Sounds great… until we read the fine print on the bottom of the ad, and find out that exactly two dentists were surveyed, and both of them have a financial interest in GLEAM. The ad contained the exact truth — 100% of dentists surveyed use GLEAM. They just didn’t survey many dentists, and they didn’t survey any unbiased dentists! That’s how you can lie with statistics.

Because statistics and evaluation research can be so boring and cumbersome, it’s easy for a sophisticated statistician to use the numbers to mislead the public. It takes a skilled researcher to parse the numbers and see if they’ve been manipulated. There’s a famous quote, often attributed to Benjamin Disraeli, equating statistics with lies. That’s because statistics can be like a painting covering a crack on the wall: as important for what they conceal as for what they reveal.

Statistics are a tool. Experimental research is a tool. Tools need to be used responsibly. Used properly, however, statistics and experimental research can help us make our parenting, clinical, educational, and medical decisions with more precision. The next time you read an ad with a glowing testimonial, call the company and ask for outcome data from a double-blind, controlled experimental study. If those numbers don’t exist, tread cautiously.