Open Science

I'm a big believer in Freedom. That is, among other things, the freedom to understand - "free as in free speech, not as in free beer", as the Free Software Foundation describes it. In this post, I'd like to talk about how this freedom applies to Science: a movement I (and others) call Open Science. The idea for this piece stemmed from a presentation I gave last year, first to the EPOPé unit (Obstetrical, Perinatal and Pediatric Epidemiology Research Team) at INSERM where I work, and then to the Early Career Research group of the RECAP Preterm project, in which I am also involved. I do not claim that this piece contains an exhaustive description of all the ideas within it; rather, it is meant to act as a starting point for discussion (feel free to comment!) and for your own exploration...

The long story

The problem with science

Science - at least in my domain (public health/medical research) - essentially works like this:

  • A researcher has an idea they would like to develop.

  • The researcher seeks funding. Funding normally comes from an entity that is funded publicly through taxation (e.g. the research councils such as the MRC or the ESRC in the UK, or European Union funding programmes such as Horizon 2020) or through charitable donations (e.g. the Wellcome Trust, the Fondation de France or the Fondation pour la Recherche Médicale). This funding allows researchers to investigate their ideas and write articles.

  • Articles are then submitted to journals with the objective of sharing them with a wider audience.

  • Journals, however, have publishers, and the publishers see this as a business where they can make money. Thus, they often charge (high) fees to access the journals and/or the individual articles.

  • Academics working in universities or other institutions are normally able to access the scientific articles they want as their institution will pay the publisher's fees. However, other members of the public -- that is, people who originally contributed to paying for the research -- are unable to access the article without paying the additional fee.

To me, something doesn't seem right about this situation. As Jon Tennant said,

Getting published is like going to a restaurant, bringing all of your own ingredients, cooking the meal yourself and then being charged $40 for the waiter to bring it out on a plate to you.

Indeed, if we look closer at the whole academic publishing industry, we find that it essentially became privatised in the 1960s and 1970s, and that publishers' profits have increased extraordinarily in the intervening period. Using Reed-Elsevier (now known as RELX) as an example, we can see that their profits increased enormously over the period 1991 to 2013, and that the actual profit margins are enormous! To put that into context, the comparable figure for supermarkets (think of the real baddies like Asda, Walmart or Amazon -- who are now the biggest retailers in the world according to Forbes) is somewhere in the region of two to five percent.

Figure: Operating profits for Reed-Elsevier, 1991-2013. A: all journals. B: biomedical journals. Source: PLOS One, doi:10.1371/journal.pone.0127502

An open parallel

Now, when I initially planned to give this talk, I was given a date in October. But then it was brought forward to September, which meant I spent most of my summer holiday trying to figure out what I was going to say. The last part of this was on a sailing trip with some friends in the Baltic, and I was struck by the parallels between the existing scientific publishing model and the model of merchant trading that existed several centuries ago.

The model worked like this. The owner was based somewhere -- say, in the UK. He (it was normally a 'he') would employ a captain to be in charge of the ship when it was away from its home port. The captain would often have shares in the cargo, thus incentivising the return of the ship at the end of the voyage. Under the captain, there would be some officers (better paid, maintaining hierarchy) and then the crew, who might have been coerced into working. The whole system operated as a hierarchical pyramid, with wages paid according to status, and order maintained according to the authority of the captain backed by 'law' which could be enforced upon landing.

However, I went sailing with a bunch of friends that I know through shared political interests, and we organised things differently. For starters, as shown in the picture, we ran up the Jolly Roger. This reminded me of the old piratical ways of maintaining order that I'd read about in an article by Peter Leeson. As there was no government to enforce the rule-making of the ships' owners (indeed, there were no owners), alternative means of running things were required. And, in fact, this was almost certainly a more democratic way of doing things: everyone was important and everyone had a voice in decision-making. Or, in other words, everyone contributed to daily life on board the ship, and all resources were shared. The captain was democratically elected - and, alongside, there was a quartermaster. These two roles differed slightly: in times of urgency, be they the midst of a storm or when attacking another ship, there was clearly a need for someone to be able to make decisions rapidly, and the captain's decisions needed to be followed. But when time was less pressing -- for example, after the battle, when treasures were to be shared, or in maintaining supplies over the course of an entire voyage when food etc. needed to be rationed -- the quartermaster stepped to the fore, and it was his or her opinion that was sought and respected as that of the ultimate arbiter. And, importantly, this role was also one that was democratically elected.

I think this parallels nicely with the open-publishing model I aspire to use. Everyone is contributing towards science (paying money to fund science - or in our analogy, contributing to the running of the ship) and, when an article is finished, everyone can benefit: the resources are shared. There are clearly expenses, as publishing on the internet is not without cost (running webservers 24/7 costs money), so it's perhaps ok that part of the funding goes towards paying an open-access fee: that is, a payment to the publisher to cover costs; a similar thing would have happened on the boat (food and supplies had to be obtained somehow).

What is "open access"?

So what actually is open access? The first mainstream initiative to try to define this happened in 2002 and became known as the Budapest Open Access Initiative. They produced a statement, part of which I reproduce here as I consider it key to this discussion:

By "open access" to this literature, we mean its free availability on the public internet, permitting any users to read, download, copy, distribute, print, search, or link to the full texts of these articles, crawl them for indexing, pass them as data to software, or use them for any other lawful purpose, without financial, legal, or technical barriers other than those inseparable from gaining access to the internet itself. The only constraint on reproduction and distribution, and the only role for copyright in this domain, should be to give authors control over the integrity of their work and the right to be properly acknowledged and cited.

This philosophy of openness is identical to that seen in the Free Software world, which first saw a version of Richard Stallman's "Free Software definition" published in 1986. The classic line from that is:

To understand the concept, you should think of "free" as in "free speech," not as in "free beer".

And, indeed, "free" does not translate simply as "without cost": if you look in a dictionary, you will find at least 11 or 12 different definitions of "free", only one of which means "without payment". For open-access publishing and for science, it is clear that there are costs involved. Consequently, these need to be paid by somebody and can still be restrictive for scientists who may not have such funds available (e.g. for scientific analyses that are being done on a voluntary basis without external funding, or students trying to publish work that they've previously done for a dissertation, etc).

Different types of open access

As a consequence of this issue of cost, various different models of open access exist. The most common model (the one described above) is often known as "gold" open access, but this is not the only possibility for sharing articles. The table below lists some examples of the different types of open access articles:

Types of open access

Name       Description
Gold       no restrictions on access (anyone can access, any time), open access license.
Green      closed access (journal); authors’ version in institutional repository.
Hybrid     subscription (restricted) journal but article has special license permitting open access.
Delayed    subscription (restricted) journal, article becomes open access after a certain period of time (sometimes for a limited period only).
Platinum   no restrictions on access, no publication fees, open access license.
Bronze     no publication fees, no restrictions on access – but entirely at publisher’s discretion (could be reversed).
Black      freely available on ‘pirate’ sites (e.g. Sci-Hub).

A good one to highlight is Sci-Hub, which is "the first website in the world to provide mass & public access to research papers" and was started by Alexandra Elbakyan from Kazakhstan - incidentally, named as one of Nature's top 10 people in Science in 2016. I'll let you follow the links if you're interested in finding out more, but if you want to just find where Sci-Hub is now, https://whereisscihub.now.sh is a good website to know about. You may additionally find that Sci-Hub is inaccessible from wherever you are located: this may be because it is blocked locally by your ISP (for example, this is the case where I live, as I use a big provider who clearly have commercial interests aligned with those such as Elsevier and others who don't like Sci-Hub). There's an easy way to get around this - and, again, it relates to freedom: use Tor (see the 'Tor' Wikipedia page) by downloading Tor Browser and "Protect yourself against tracking, surveillance, and censorship."

Licensing

I've already mentioned that the concept of "freedom" in science originated with computer scientists - particularly, following publication of Richard Stallman's "Free Software definition". This document, and the ideas behind it, are closely tied to the Free Software Foundation and to the development of the GNU Operating System and the GNU project ("GNU" being a recursive acronym for "GNU's Not Unix" - with Unix being an operating system that preceded what is now known as Linux... But here we're probably getting off topic slightly). The history is in fact fascinating and is well documented on the "about GNU" page; the part relevant to the current story relates to the development of the GNU General Public License (GNU GPL), a licence designed to protect each of the Four Freedoms that computer software should meet in order to be called "free software". These freedoms are:

  • the freedom to use the software for any purpose,
  • the freedom to change the software to suit your needs,
  • the freedom to share the software with your friends and neighbors, and
  • the freedom to share the changes you make.

For media, though, the same considerations aren't necessarily always important. Hence, Creative Commons was set up in America by a bunch of lawyers (including Lawrence Lessig) who believed that the ideas of sharing and openness were important beyond just computers. Creative Commons provides a range of licences from which it is possible to pick and choose the parts that are important to the creator. The principal components are choices relating to:

  • attribution: whether the creator needs to be recognised;
  • share-alike: whether the piece needs to have a similar copyright notice if adapted and shared;
  • non-commercial: whether others are allowed to copy and distribute the piece of work for commercial purposes.

Of these possible conditions, the last is perhaps the most contentious, as preventing the use of work in commercial distributions can be viewed as taking away the freedom of others, and hence the piece of work is no longer considered a "free cultural work". NB, it's also possible to have a "no-derivatives" clause, which prohibits others from building on the work.
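To make the pick-and-choose idea concrete, here is a minimal sketch in Python of how those choices combine into a licence name. It is an illustration only: the names `LicenceChoices`, `cc_licence_code` and `is_free_cultural_work` are my own inventions and not part of any Creative Commons tool. It also assumes the current suite of six standard licences, in which attribution is always included, and in which share-alike and no-derivatives cannot be combined.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class LicenceChoices:
    """Hypothetical container for the optional conditions discussed above."""
    share_alike: bool = False
    non_commercial: bool = False
    no_derivatives: bool = False


def cc_licence_code(choices: LicenceChoices) -> str:
    """Build the short licence code, e.g. 'CC BY-NC-SA'."""
    if choices.share_alike and choices.no_derivatives:
        # Share-alike only makes sense if adaptations are allowed at all.
        raise ValueError("share-alike and no-derivatives cannot be combined")
    parts = ["BY"]  # assumption: attribution is part of every standard licence
    if choices.non_commercial:
        parts.append("NC")
    if choices.share_alike:
        parts.append("SA")
    if choices.no_derivatives:
        parts.append("ND")
    return "CC " + "-".join(parts)


def is_free_cultural_work(choices: LicenceChoices) -> bool:
    """Non-commercial and no-derivatives both restrict others' freedom."""
    return not (choices.non_commercial or choices.no_derivatives)
```

For example, `cc_licence_code(LicenceChoices(non_commercial=True, share_alike=True))` gives `"CC BY-NC-SA"`, and `is_free_cultural_work` returns `False` for those same choices because of the non-commercial condition.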

Encouraging open access

Open access publications - and open science more generally - are now becoming broadly accepted as the way of the future. To my mind, this is more true in Europe than in North America (or, certainly, the United States), although organisations on both sides of the Atlantic are now mandating publication of scientific articles in open access formats.

Specific examples include the following:

  • Plan S - a coalition of funding bodies mandating open access publication
  • Projekt DEAL - a "publish and read" model agreed by German research institutions with Wiley and Springer Nature (but notably not with Elsevier)
  • Various others in Europe - for example, Sweden, Netherlands, Hungary, and Norway have all cancelled agreements with Elsevier.
  • California - mandated open access publication, and also cancelled their agreements with Elsevier

Plan S is probably the one with the most relevance to me. It states:

With effect from 2021, all scholarly publications on the results from research funded by public or private grants provided by national, regional and international research councils and funding bodies, must be published in Open Access Journals, on Open Access Platforms, or made immediately available through Open Access Repositories without embargo.

The participants include European funding streams (e.g. European Union and the European Research Council), international organisations (e.g. the World Health Organisation, UNICEF), charitable organisations (e.g. the Bill and Melinda Gates Foundation, Wellcome Trust) and various national organisations (e.g. from UK, Sweden, Norway, Poland, Luxembourg, the Netherlands and others).

Pros and cons of open access

From what I've said so far, you may think - or at least, may think that I think - that open access is complete magic, the bee's knees, the best thing since sliced bread. Well, it's true, there are lots of benefits of open access - for example:

  • Articles are available to all with unrestricted access
  • There's the possibility for unrestricted reuse - and thus it's truly possible to build on scientific knowledge
  • There are no word limits (at least, for web publication - which is the majority of cases)
  • Articles can be available sooner, for example through appearance in pre-print or other repositories.

There are, of course, not so good aspects too. Some examples are:

  • Many open-access journals are still building reputation, so aren't considered as "good" as others (never mind that it's the article quality that matters).
  • Open-access journals may have low(er) impact factors - for the above reasons, but also because they publish many more articles that may be "good" science but just don't have as many citations. This of course is a problem with the Impact Factor and not with open-access publication, but that's another story.
  • Articles may be "lost in the sea" due to there being lots of publications (same issue as mentioned above).
  • Open access may be costly - particularly for authors.

But, to summarise, there's probably one factor that is more important than any other when it comes to open access publications, and that is transparency. Open access articles mean that science is transparent: anyone can see the results, there's no hiding (of course, there are plenty of other important things that can be done to improve transparency, like open peer-review, pre-print repositories and so on. But I'm not going to talk about those here either).

Beyond open access ...

Openness and freedom are not restricted to just publishing of scholarly articles or to computer science. The ideas go a lot further than this.

... Open science

The FAIR initiative describes the circle of data principles: Findable-Accessible-Interoperable-Reusable.

... Open data

Numerous projects are working to make data accessible to all. Examples include various biomedical databases such as BioProject, GenBank, or OMIM. Many are hosted by large governmental organisations such as the US National Center for Biotechnology Information, but there are also others. Indeed, the European Union recently (just this week) released a draft of its plans for the future in relation to data.

... Open tools

And finally, there are of course many tools that have been created openly for use by scientists. Not least are statistical tools like the R language for statistical programming, which contains many different modular components depending upon what exactly you wish to do, as well as different front ends such as the increasingly popular graphical user interface RStudio or (my preferred method for using R) Emacs Speaks Statistics (ESS) for the GNU Emacs text (and more) editor.

Other tools that people may use are LibreOffice - a free alternative to micro$oft office; and the Mozilla Firefox web browser - although I still recommend Tor Browser as mentioned above if you really care about your own privacy (and you should!). I also use LaTeX as a type-setting tool for my print documents - you basically just write and the program takes care of the presentation for you (it also does presentations, which is how I started off this article). And git is a version control system that I've now been using for many years, which allows me to keep track of changes over time in my files.

References / Further Reading