From the Desk of the Director: Popeye the Statistician

Raise your hand if you like spinach. Now, did you like spinach when you were a kid? What changed? I, for example, learned to like the flavor and probably bought into the not-so-subtle marketing campaign of the Popeye cartoons. Well, the statistical program called ‘R’ is the spinach of the Investigative Biology course. At first, almost everyone wants to spit it out, but eventually students realize that it is good for them. In fact, during our last meeting my undergraduate teaching assistants suggested that we start giving T-shirts with the following slogan to those who complete the Investigative Biology course: “We ‘R’ survivors.”

‘R’, distributed by the R Project, is free software that has become one of the most popular statistical packages in the biological sciences. It runs on most computer operating systems, and it has a companion interface called RStudio that organizes graphs, scripts and variables in a user-friendly way. Every semester, over 400 students learn how to use ‘R’ for data analysis in this lab course. It is through ‘R’ that students improve their statistical, mathematical, and analytical skills. But every semester I have the one student who, with mixed anger and frustration, asks me the question:

“Why do we need to learn how to program in ‘R’?”

As a college student I didn’t have to learn how to program in ‘R’. That’s because ‘R’ hadn’t really taken off yet. I learned how to use other statistical software, such as MINITAB and SAS. For my Ph.D. data analysis I wrote a complex mixed model in SAS, and I felt good about mastering this difficult statistical program. After receiving my Ph.D., I accepted a postdoctoral fellowship at the Eidgenössische Technische Hochschule (ETH for short) in Zurich, Switzerland. As the reviews of the papers from my Ph.D. work were arriving, I wanted to re-check the data analysis to rebut some comments. Sadly, I realized that none of my colleagues in Zurich had a site license for the relatively expensive SAS software. I had mastered a technology that was not transferable, and after I left my alma mater my data became inaccessible. I can still see myself sitting at my desk in Zurich, drowning my sorrows in fondue.

Fast forward to the present: my Investigative Biology students collect data from their own experiments. They analyze these data and present them in peer-reviewed papers and posters. A couple of years ago I noticed that some TAs walked around the lab rooms with their own laptops and analyzed the students’ data in JMP, SAS, STATA or whatever other software they were familiar with. Seeing that, I realized we cannot train the next generation of scientists without teaching them statistical programming in software that will be accessible wherever they end up after graduation. After my own experience with losing access to paid licenses, I decided to teach ‘R’ to our mostly underclassman students.

No matter how difficult or frustrating the learning experience may be at the beginning, I know that in the long run biology students benefit from understanding the language of ‘R’. Yes, ‘R’ is like a universal foreign language that they can speak from the Alps to the Rockies and, more importantly, from genetics to ecology. At first, some colleagues doubted that a class composed mostly of freshmen would be able to tackle this task. But recently, a couple of years after we introduced ‘R’ in Investigative Biology, other courses have started to use it because we had such success teaching it. I frequently receive notes from alumni of the course, telling me how they got a research position or internship because they could demonstrate how to analyze data in ‘R’.

So when that one student, with mixed anger and frustration, asked this semester, “Why do we need to learn how to program in ‘R’?” I smiled and answered, “Because it is good for you,” and went to pick up a batch of 400 “We ‘R’ survivors” T-shirts. – Dr. Sarvary


