First off, let's answer some burning questions about Stata and R, and about some other things. These are very important questions (that I could think of reasonably quickly).
Do you favour Frequentist or Bayesian statistics?
Why evidence synthesis for medical research?
Are there other statistical packages we could be using?
Will you be sharing any data or results on this blog?
What if someone spots an error in your work?
Stata and R are both statistical packages. You feed in data, and then usually write code to analyse the data.
StataCorp LLC created Stata, and calls it:
"An integrated statistics, graphics, and data management solution for anyone who analyzes data"
R, which grew out of the S language developed at Bell Laboratories, is described by the R Project as:
"An integrated suite of software facilities for data manipulation, calculation and graphical display"
Stata and R both have great facilities for cleaning data, running most statistical tests on it, producing graphs and tables, and these days exporting results straight to Word, PDF, LaTeX or Excel.
One major difference between the two packages is that StataCorp charges people to use Stata, whereas R is completely free. Both packages allow anyone who uses them to create and distribute statistical programs; indeed, the programs I run most frequently are often those written by people using the packages, not the people who created the packages. Stata and R are both great packages to manipulate and analyse data. They both allow the user to write code, and then use this code to do everything one could want to the data.
Why is code so great? From my experience:
Everything is reproducible: you can start with your initial data, clean it up, analyse it, produce tables and graphs, and save all of the output. This is great both within organisations (e.g. when someone needs help) and for academics at peer-review. Although code is not currently routinely checked at peer-review, it could be.
If the initial data changes a little, no problem, just run the code again.
You can check through what's been done months after you've forgotten what it was you did.
Errors can be found and fixed without having to completely redo the analysis.
Repeating an analysis every month becomes as simple as loading new data and clicking "run" on the code.
The more code you know, the less you need to type, and the quicker things get done.
Basically anything statistical. Analyse and manipulate data however you like, produce tables and graphs, anything where you have some numbers (or letters or words) and you want to do something with them.
Apart from the fact that Stata costs money and R does not, there aren't too many differences. Stata is more intuitive for people who have used spreadsheets, since at any time you can click a button and load up a view of your data in a spreadsheet, but R allows you to do more at once, as many datasets can be loaded in at the same time. For some purposes Stata is faster, for others R is faster, both in terms of how much code is needed and how fast it runs. Both packages have a good community of users who develop programs within each, so whether Stata or R is better may depend more on the purpose you are using a stats package for, rather than a blanket "one is better than the other".
One point that sometimes comes up is that Stata limits the number of variables (columns) allowed in any one dataset, whereas R is limited only by your computer. The newest iteration of Stata (version 15, out June 2017) has 3 versions, ranging from cheapest to most expensive. The cheapest version, Stata/IC, has a maximum of 2,048 variables and 2.14 billion observations (rows). The most expensive version, Stata/MP, has a maximum of 120,000 variables and 20 billion observations. For many non-genetic purposes, Stata is absolutely fine. For genetic purposes, where there legitimately can be tens of thousands of variables, R is generally considered better, which is why genetics research does seem to favour R.
R is free, and can be downloaded from the R Project website. Stata costs a varying amount depending on whether you are a student, business or institution, whether you want an annual or perpetual license, and which version of Stata you want. For a student, Stata costs $198 for Stata/IC (entry-level), $395 for Stata/SE (mid-level), $695 for Stata/MP 2-core and $995 for Stata/MP 4-core (top-level), all in US dollars and all perpetual licenses. For a single university staff member, those prices jump to $595 to $1,495 for an individual license.