This is an update of a blog post that I wrote on January 11th, 2009, when I was Chief Executive at BBSRC; its text forms the first three paragraphs here, slightly modified and updated. It raises generally interesting points that are seemingly not well known, and that – based on a number of recent discussions – I consider worth re-sharing. Original links to CiteULike no longer work and have been removed.
The availability of many records in digital format opens up many possibilities (Weinberger, 2007), not least in bibliometrics. For this blog I shall look briefly at the distribution of scientific activity between individuals, as encapsulated by the question ‘if n individuals have published 1 scientific paper in a particular time period, how many individuals have published 2 papers or 10 papers or 100 papers?’
Now one might wonder whether one should expect there to be any regularities in such a (quantised) distribution, but there are. The question was posed and answered most pertinently by Alfred Lotka in 1926 (Lotka, 1926), and the relationship is known as Lotka’s Law. Lotka observed, from a study of papers listed in Chemical Abstracts and in Auerbach’s Geschichtstafeln der Physik, that the number of persons making n contributions is given by 1/n^a of those making a single contribution, with a equalling approximately 2. Thus for every 100 people who have published 1 paper, 25 have published 2 papers and 1 person has published 10 papers. In other words, the distribution of scientific productivity is best described by an inverse square law (a specific case of an inverse power law, more generally referred to as a Zipf distribution). Although this is not universally true, it is a reasonable approximation and has some interesting mechanistic bases. The consequences, as recognised in Lotka’s original survey, included the fact that about 60% of authors contributed only one paper (and note that all joint papers were taken to have been written by the ‘senior’ author only). Nowadays this non-normal distribution would be seen as a long-tail phenomenon, as popularised in Chris Anderson’s excellent book (Anderson, 2006). (Similar distributions apply to citations and grant income too, since nowadays they all tend to be more-or-less closely related, and a later BBSRC blog found that BBSRC funding at that time rather precisely followed the 1/n² distribution.)
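For readers who like to check the arithmetic, here is a minimal Python sketch of the inverse square relationship. The function names, and the choice to cut the sum off at 100,000 papers per author, are my own illustrative assumptions rather than anything in Lotka's paper. It reproduces the 100 : 25 : 1 example and shows that, with an exponent of exactly 2, the share of authors with a single paper comes out at roughly 60%.

```python
def lotka_authors(single_paper_authors: float, n: int, a: float = 2.0) -> float:
    """Expected number of authors with exactly n papers, given the number with one paper."""
    return single_paper_authors / n ** a


def single_paper_share(a: float = 2.0, max_papers: int = 100_000) -> float:
    """Fraction of all authors who have exactly one paper.

    The sum over n is truncated at max_papers (an arbitrary cut-off chosen for illustration).
    """
    total = sum(1.0 / n ** a for n in range(1, max_papers + 1))
    return 1.0 / total


for n in (1, 2, 10):
    print(n, lotka_authors(100, n))   # 100.0, 25.0 and 1.0 authors respectively

print(round(single_paper_share(), 3))  # 0.608, i.e. about 60% of authors
```

Changing the exponent a away from 2 shows how sensitive the single-paper share is to the steepness of the tail.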
A similar empirical Law, known as Bradford’s Law, describes the pattern first noted by Samuel C. Bradford in 1934 (reprinted in 1985) that “estimates the exponentially diminishing returns of extending a search for references in science journals. One formulation is that if journals in a field are sorted by number of articles into three groups, each with about one-third of all articles, then the number of journals in each group will be proportional to 1:n:n² [see Wikipedia].” Put another way, while many of the papers in a scientific field (however defined) may well be published in a set of m core journals, about two thirds of pertinent ones will be much more widely distributed, over m*(n+n²) journals (Hjørland and Nicolaisen, 2005; Nicolaisen and Hjørland, 2007). Although usually interpreted in a very different way (“most of the literature in a field is in a small set of core journals”), this too is in fact better seen as another long-tail phenomenon since the latter third are in fact extremely widespread. Indeed this focus on ‘core journals’ (especially the so-called ‘vanity journals’) in a field (often in the context of library holdings) is seen as tending to favour dominant theories and views while suppressing views other than the mainstream at a given time (leading to a ‘spiral of silence’). It is another example of the balkanisation of the literature (Kostoff, 2002), to which I had (then recently) alluded in two open access papers (now published (Hull et al. 2008; Kell 2009)). The opening gambit of the manuscript version of the 2008 paper (Hull et al. 2008) read “Most scientists now manage the bulk of their information electronically, organizing their publications and citations using digital libraries”. The most perceptive of its referees began by responding “neither I, nor most people I know, use such systems” (and we modified the remark).
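The 1:n:n² formulation is easy to play with numerically. The sketch below is only an illustration: the figures of 10 core journals, a multiplier of 5 and 3000 articles are made up for the purpose, and the function name is hypothetical. It lists the number of journals in each of the three zones and the average number of articles per journal, which makes the long tail rather plain.

```python
def bradford_zones(core_journals: int, multiplier: float, total_articles: float):
    """Return (journals, articles per journal) for the three Bradford zones.

    Each zone is assumed to hold about one third of the articles, with journal
    counts in the ratio 1 : n : n^2, where n is the multiplier.
    """
    per_zone = total_articles / 3
    zones = []
    for k in range(3):
        journals = core_journals * multiplier ** k
        zones.append((journals, per_zone / journals))
    return zones


# Hypothetical field: m = 10 core journals, n = 5, 3000 relevant articles.
zones = bradford_zones(core_journals=10, multiplier=5, total_articles=3000)
for journals, density in zones:
    print(journals, density)          # 10 100.0, then 50 20.0, then 250 4.0

# Journals outside the core, i.e. m*(n + n^2), carrying two thirds of the articles.
outside_core = sum(journals for journals, _ in zones[1:])
print(outside_core)                   # 300
```

In this made-up case a reader who stops at the 10 core journals sees a third of the literature, while the remaining two thirds are spread over 300 others.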
{I’m happy to say that this has by now changed!} The BBSRC blog finished “BBSRC has long been committed to the development of tools (including e-tools) that will help biologists, including through its Tools and Resources Strategy Panel and Committee structure. It is to be hoped that the increased development and exploitation of electronic tools to help deal with the flood of words and data will assist our community in increasing yet further both its adventure and its productivity”. Those sentiments remain and if anything are rather increased in the era of modern AI.
—
Some recent developments, not least critiques of some of our work on Long COVID (that I have dealt with in some previous blogs here, here and here) might consequently be seen in the light of recent productivity measures as to who might better know what they are talking about. However, thinking further about the Zipf distribution of abilities more generally, and writing this during the 2024 Summer Olympics, I recognise that this almost certainly applies whatever the field of endeavour. Consider soccer (football), where in England the professional game has four top divisions (Premier League, Championship, League 1, and League 2). A player in League 2 is both a professional and almost certainly far better than anyone most people will have played with as amateurs. However, compared with the top players in the Premier League that person is frankly of a far lower quality. Not all professionals have equal ability! Now while this is clearly true of scientists too, as per the above, it is also likely to be true of musicians, administrators, founders of start-ups, venture capitalists, chiropractors (I am lucky to have found a superb one), and anything else one might contemplate. Possibly most worrying of all, it is presumably true of mainstream medical professionals too; that is why obtaining a diversity of views and ‘second opinions’ is likely to prove valuable. Especially in an era in which disinformation and fake news have become widespread, finding what is true, or at least most believable (Kell & Welch, 2018), can be hard. The job of the scientist is, or should be, to seek such truths.
Anderson, C. M. (2006) The long tail: how endless choice is creating unlimited demand. London, Random House.
Bradford, S. C. (1934) Sources of information on specific subjects. Engineering, 137, 85-86 (reprinted in J. Information Science, 10, 173 – 180 (1985)).
Hjørland, B. & Nicolaisen, J. (2005) Bradford’s law of scattering: Ambiguities in the concept of “subject”. LNCS, 3507, 96-106.
Huber, J. C. (2001) A new method for analyzing scientific productivity. J Am Soc Inf Sci Technol, 52, 1089-1099.
Hull, D., Pettifer, S. R. & Kell, D. B. (2008) Defrosting the digital library: bibliographic tools for the next generation web. PLoS Comput Biol, 4, e1000204. doi:10.1371/journal.pcbi.1000204
Kell, D. B. (2009) Iron behaving badly: inappropriate iron chelation as a major contributor to the aetiology of vascular and other progressive inflammatory and degenerative diseases. BMC Med Genom, 2, 2.
Kell, D. B. & Welch, G. R. (2018) Belief: the baggage behind our being. OSF preprints pnxcs https://osf.io/pnxcs/
Kostoff, R. N. (2002) Overcoming specialization. Bioscience, 52, 937-941.
Lotka, A. J. (1926) The frequency distribution of scientific productivity. J Washington Acad Sci, 16, 317-323.
Nicolaisen, J. & Hjørland, B. (2007) Practical potentials of Bradford’s Law: a critical examination of the received view. J. Documentation 56, 674-692.
Weinberger, D. (2007) Everything is miscellaneous: the power of the new digital disorder. New York, Times Books.