Birth and Rise of R
R is a language and environment for statistical computing and graphics.
• R was initially written by Robert Gentleman and Ross Ihaka.
• The core group with write access to the R source comprises Douglas Bates, John Chambers, Peter Dalgaard, Seth Falcon, Robert Gentleman, Kurt Hornik, Stefano Iacus, Ross Ihaka, Friedrich Leisch, Uwe Ligges, Martin Maechler, Duncan Murdoch, Paul Murrell, Martyn Plummer, Brian Ripley, Deepayan Sarkar, Duncan Temple Lang, Luke Tierney, Simon Urbanek, and Thomas Lumley.
History of R:
The history of R is one of good fortune and good choices. In 1992, Gentleman – then a professor at the University of Waterloo in Canada – traveled 8600 miles to the University of Auckland to lecture for three months. One day, he found himself needing a manual for a particularly tricky piece of software and Ihaka – still a professor of statistics in those days – was the only one in the department who had a copy. In time, they both realized an interest in what Ihaka calls “playing academic fun and games” with statistical computing languages.
They had questions about programming languages they wanted to answer. In particular, both Ihaka and Gentleman shared a common knowledge of the language called “Scheme”, and both found the language useful in a variety of ways. Scheme, however, was unwieldy to type and lacked desired functionality. Again, convenience brought good fortune. Each was familiar with another language, called “S”, and S provided the kind of syntax they wanted. With no blend of the two languages commercially available, Gentleman suggested building something themselves.
Around that time, the University of Auckland needed a language to use in its undergraduate statistics courses, as the school’s current tool had reached the end of its useful life. There was one major caveat: the program needed to run on Macintosh. According to Gentleman, the Department of Statistics took inventory and decided that “that thing Ross and Robert are working on”, which happened to run on Macintosh, was better than their current language. The professors called it R, both as a nod to S and in reference to their forenames.
Ihaka and Gentleman kept the project secret from the wider community until August 1993, when an email to the S-news mailing list drew it into the public eye. A Canadian professor had a familiar problem: he needed a Macintosh version of S. Ihaka decided it was time to let R see the light of day. Soon after, a usable version of R appeared on StatLib, an online system for distributing statistical software and data.
Though what we have today is free software, in the mid-1990s Ihaka and Gentleman were seriously considering turning R into a commercial product. Ultimately, they decided that selling it would be more trouble than it was worth.
Ihaka and Gentleman agreed with the idea of making free software – meaning that people would be free to use, change, and distribute it as they like. In 1995, the duo made R’s source code available under a free software license.
Evolution of the software:
As the language improved, more users joined – and more users meant less room for bugs to hide. As fixes and functions poured in, the names of the submitters began to look familiar. The usual suspects contributed so often that Ihaka and Gentleman gave them the ability to edit the source code directly, because it was easier than managing all the changes themselves. By mid-1997, 11 people – including Ihaka, Gentleman, Martin Mächler, Peter Dalgaard, Kurt Hornik, Friedrich Leisch, and Thomas Lumley – had the keys to R’s source code. The group styled themselves the “R Core” team.
“The users were the developers in those days,” Ihaka says. As more of them joined the community, they needed a way to show off what they had done and to download contributions they found useful. In March 1997, Hornik and Leisch, of the Vienna University of Economics and Business, made a Herculean contribution to the R Project by building the Comprehensive R Archive Network (CRAN). This network made the essential information and files of R available for download in one place. Most importantly, users could browse packages – R’s version of code libraries – and download the ones they needed.
CRAN makes R shine. Most of the functionality of R is contained in the packages stored in CRAN, which can be loaded and used when needed. This makes R more versatile than other statistical software. Closed-source software, such as SAS and SPSS, can only be updated by their official developers, whereas R has a community churning out updates all the time.
In 2000, the R Project released R version 1.0.0, the first version they felt was ready for public usage. The following year, several influential statisticians published papers on data science, and 2003 saw the first academic journal dedicated to this growing field. For those people now identified as data scientists, R, CRAN and the wider community provided the means to explore and familiarize themselves with statistical tools and techniques. In turn, those data scientists added packages to help with data types and models from fields as diverse as ecology, linguistics, bioinformatics and network science.
Future of R
We have now caught up with R’s story so far, but it is by no means the end of the tale. What might the future have in store?
Lumley was unsure if another computing language would be coming to bury R anytime soon, but he felt that any successor would have to absorb CRAN and its stockpile of code. Gentleman agreed, saying: “There are really good algorithms in R, and no one should be re-implementing them.”
The future of CRAN is a popular topic for speculation as the network is starting to creak under the weight of its own success. The archive now holds more than 12,000 packages and is growing near-exponentially. With CRAN growing unabated and – in Peter Dalgaard’s words – “the original Core team approaching pensionable age”, the maintenance of R and CRAN will at some point need to change.
Ultimately, of course, the future of R will be determined by its community – the people who, over the last quarter of a century, have donated years of their lives to improving the source code, crafting clever packages and helping new users get started. These donations of time and effort did not come with the promise of future monetary rewards.
R is free, open source software that was created for fun, reared by committee, and developed by the masses. That such software could survive and flourish for 25 years is a credit to its quality, to its creators and to its users. “People in the past would have said you couldn’t do something like this,” says Lumley. “Now it’s clear that you can.”