Hal R. Varian
University of California, Berkeley
June 1, 1997
It is widely expected that a great deal of scholarly communication will move to an electronic format. The Internet offers much lower costs of reproduction and distribution than print, the scholarly community has excellent connectivity, and the current system of journal pricing seems to be too expensive. Each of these factors is helping to push journals from paper to electronic media.
In this paper I want to speculate about the impact this movement will have on the form of scholarly communication. How will electronic journals evolve?
Each new medium has started by emulating the medium it replaced. Eventually the capabilities added by the new medium allow it to evolve in innovative, and often surprising, ways. Alexander Graham Bell thought that the telephone would be used to broadcast music into homes. Thomas Edison thought that recordings would be mostly of speech rather than music. Marconi thought that radio's most common use would be two-way communication rather than broadcast.
The first use of the Internet for academic communication has been as a replacement for the printed page. But there are obviously many more possibilities.
In order to understand how journals might evolve, it is helpful to start with an understanding of the demand and supply for scholarly communication today.
The academic reward system is structured to encourage the production of ideas. It does this by rewarding the production and dissemination of ``good'' ideas---ideas that are widely read and acknowledged.
Scholarly publications are produced by researchers as part of their jobs. At most universities and research organizations, publication counts significantly towards salary and job security (e.g., tenure). Not all publications are created equal: competition for space in top-ranked journals is intense.
The demand for space in those journals is intense because they are highly visible and widely read. Publication in a top-flight journal is an important measure of visibility. In some fields, citation data have become an important observable proxy for ``impact.'' Citations are a way of proving that the articles that one publishes are, in fact, read.
Scholarly communication also serves as an input to academic research. It is important to know what other researchers in your area are doing so as to improve your own work and to avoid duplicating their work. Hence, scholars generally want access to a broad range of academic journals.
The ability of universities to attract top-flight researchers depends in part on the size of the library's collection. Threats to cancel journal subscriptions are met with cries of outrage by faculty.
[Tenopir and King(1996)] have provided a comprehensive overview of the economics of journal production. According to their estimates, the ``first-copy'' costs of an academic article are between $2,000 and $4,000. The bulk of these costs are labor costs, mostly the clerical costs of managing submission, review, editing, typesetting, and setup.
The marginal cost of printing and mailing an issue of a journal is on the order of $3. A special-purpose, nontechnical academic journal that publishes 4 issues per year with 10 articles per issue would have fixed costs of about $120,000. The variable costs of printing and mailing would be about $12 per subscriber per year. Such a journal might have a subscriber list of about 600, which leads to a break-even price of $212. Of course, many journals of this size are sold by for-profit firms and the actual prices may be much higher: prices of $600 or more are not uncommon for journals of this nature.
If the variable costs of printing and shipping were eliminated, the break-even price would fall to $200. This illustrates the following point: fixed costs dominate the production of academic journals; reductions in printing and distribution costs due to electronic distribution will have a negligible effect on break-even prices.
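The break-even arithmetic above can be checked directly. A minimal sketch in Python, using the figures quoted in the text (40 articles per year at a $3,000 first-copy cost, the midpoint of the $2,000--$4,000 range; 600 subscribers; $3 per printed issue):

```python
# Break-even subscription price for the hypothetical journal described above.
issues_per_year = 4
articles_per_issue = 10
first_copy_cost = 3_000          # per article, midpoint of the $2,000-$4,000 range
subscribers = 600
print_and_mail_per_issue = 3     # marginal cost per subscriber per issue

fixed_costs = issues_per_year * articles_per_issue * first_copy_cost   # $120,000
variable_per_subscriber = issues_per_year * print_and_mail_per_issue   # $12/year

break_even_print = fixed_costs / subscribers + variable_per_subscriber
break_even_electronic = fixed_costs / subscribers   # printing/mailing eliminated

print(break_even_print)       # 212.0
print(break_even_electronic)  # 200.0
```

The calculation makes the point visually: eliminating distribution costs shaves only $12 off a $212 price, because the fixed first-copy costs dominate.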
Of course, if many new journals are produced and distributed electronically the resulting competition may chip away at the $600 monopoly prices. But if these new journals use the same manuscript-handling processes the $200 cost-per-subscription will remain the effective floor to journal prices.
There are two other costs that should be mentioned. First is the cost of archiving. [Cooper(1989)] estimates that the present value of the storage cost of a single issue of a journal to a typical library is between $25 and $40.
Another interesting figure is yearly cost-per-article read. This varies widely by field, but we can offer a few order-of-magnitude guesses. According to a chart in [Lesk(1997)], p. 218, 22% of scientific papers published in 1984 were not cited in the ensuing 10-year period. The figure rises to 48% for social science papers, and a remarkable 93% for humanities papers!
[Odlyzko(1997)] estimates that the cost per reader of a mathematical article may be on the order of $200. By comparison, the director of a major medical library has told me that his policy is to cancel journals for which the cost per article read appears to be over $50.
It is not commonly appreciated that one of the major impacts of online publication is that use can be easily and precisely monitored. Will academic administrators really pay subscription rates implying costs per reading of several hundred dollars?
It seems clear that reduction in the costs of academic communication can only be achieved by re-engineering the manuscript handling process. Here I use re-engineering in both its original sense---rethinking the process---and its popular sense---reducing labor costs.
The current process of manuscript handling is not particularly mysterious. The American Economic Review works something like this. The author sends 3 paper copies of an article to the main office in Princeton. The editor assigns each manuscript to a co-editor, based on the topic of the manuscript and the expertise of the co-editor. (The editor also reviews manuscripts in his own area of expertise.) The editor is assisted in these tasks by a staff of 2-3 FTE clerical workers.
The manuscripts arrive in the co-editor's office, and the co-editor assigns each to two or more reviewers. The co-editor is assisted in this task by a half-time clerical worker. After some nudging, the referees usually report back and the co-editor decides whether the article merits publication. At the AER about 12% of submitted articles are accepted.
Typically the author revises accepted articles for both content and form, and the article is again sent to the referees for further review. In most cases the article is then accepted and sent to the main office for further processing. At the main office, the article is copyedited and further prepared for publication. It is then sent to be typeset. The proof sheets are sent to the author for checking. After corrections are made, the article is sent to the production facilities where it is printed, bound, and mailed.
Much of the cost in this process is the cost of coordinating the communication: the author sends the paper to the editor, the editor sends it to the co-editor, the co-editor sends it to referees, etc. These costs require postage and time, but most importantly they require coordination. This is the role played by the clerical assistants.
Universal use of electronic mail could undoubtedly save significant costs in this component of the publication process. The major enabling technologies are standards for document representation (e.g., Microsoft Word, PostScript, SGML) and multimedia email.
[Revelt(1996)] sampled Internet working paper sites and prepared a summary table. According to his survey, PostScript and PDF are the most popular formats for eprints with TeX being common in technical areas and HTML for non-technical areas. It is likely that standardization on 2-3 formats would be adequate for most authors and readers. My personal recommendation would be to standardize on Adobe PDF since it is readily available, flexible and inexpensive.
With respect to email, the market seems to be rapidly converging to MIME as a standard for email inclusion; I expect this convergence to be complete within a year or two.
This means that the standards are essentially in place to move to electronic document management during the editorial and refereeing process. Obviously new practices would have to be developed to ensure security and document integrity. Systems for timestamping documents such as Electronic Postmarks are readily available; the main barrier to their adoption is the training necessary to use them.
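One simple building block for document integrity (not a full timestamping service such as the Electronic Postmark mentioned above) is a cryptographic digest: if the editorial office records a hash of each submitted file, any later alteration becomes detectable. A minimal sketch in Python; the manuscript contents here are hypothetical:

```python
import hashlib

def document_fingerprint(data: bytes) -> str:
    """Return a SHA-256 digest of a manuscript's bytes.

    Recording this digest at submission time lets editor and author later
    verify that the file has not been altered; a timestamping service would
    additionally bind the digest to a date.
    """
    return hashlib.sha256(data).hexdigest()

manuscript = b"Title: A Hypothetical Submission\n..."
fingerprint = document_fingerprint(manuscript)

# Any change to the manuscript, however small, changes the fingerprint.
assert document_fingerprint(manuscript) == fingerprint
assert document_fingerprint(manuscript + b" ") != fingerprint
```

The digest alone proves only that a document is unchanged; binding it to a trusted date is what the timestamping services add.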
If all articles were submitted and distributed electronically, I would guess that the costs of the editorial process would drop by about 50% due to the reduction in clerical labor, postage, photocopying, etc.
Once the manuscript was accepted for publication, it would still have to be copyedited and converted to a uniform style. In most academic publishing copyediting is rather light, but there are exceptions. Conversion to a uniform style is still rather expensive due to the idiosyncrasies of authors' word-processing systems and writing habits.
It is possible that journals could distribute electronic style sheets that would help authors achieve a uniform style, but experience thus far has not given great reason for optimism on this front. Journals that accept electronic submissions report significant costs in conversion to a uniform style.
One question that should be taken seriously is whether these conversion costs for uniform style are worth it. Typesetting costs are about $15-$25 per page for moderately technical material. Markup costs probably require 2-3 hours of a copyeditor's time. This means that preparation costs for a 20-page article are on the order of $500. If a hundred people read the article, is the uniform style worth $5 apiece to them? Or, more to the point, if 10 people read the article is the uniform style worth $50 apiece?
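The per-reader arithmetic is worth making explicit. A minimal sketch, assuming the mid-range figures just quoted: $20 per typeset page (within the stated $15--$25 range) and roughly $100 of copyeditor time for the 2--3 hours of markup (the hourly rate is my assumption):

```python
# Preparation cost per reader for a uniformly styled article,
# using the figures from the text.
pages = 20
typesetting_per_page = 20   # midpoint of the $15-$25 range
copyediting = 100           # assumed cost of 2-3 hours of copyeditor time

preparation_cost = pages * typesetting_per_page + copyediting   # ~$500

for readers in (100, 10):
    print(readers, preparation_cost / readers)
# 100 readers -> $5.00 per reader; 10 readers -> $50.00 per reader
```

The smaller the readership, the harder it is to justify professional formatting, which is the point developed in the next paragraphs.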
The advent of desktop publishing dramatically reduced the cost of small-scale publication. But it is not obvious that the average quality of published documents went up. The earlier movement from hand-set type to digital typography had the same impact. As [Knuth(1979)] observes, digitally typeset documents cost less but had lower quality than hand-set documents.
My own guess about this benefit-cost tradeoff is that the quality of professionally formatted documents isn't worth the cost for material that is only read by small numbers of individuals. The larger the audience, the more beneficial and cost-effective formatting becomes. This may suggest a two-tiered approach: articles that are formatted by authors are published very inexpensively. Of these, the ``classics'' can be ``reprinted'' in professionally designed formats.
A further issue arises in some subjects. Author-formatted documents may be adequate for reading, but they are not adequate for archiving. It is very useful to be able to search and manipulate subcomponents of an article such as abstracts and references. This means that the article must be formatted in a way that these subcomponents can be identified. Standard Generalized Markup Language (SGML) allows for such formatting, but it is rather unlikely that it could be used by most authors, at least using tools available today.
The benefits from structured markup are significant, but it is also quite costly so the benefit-cost tradeoff is far from clear. We return to this point below.
In summary, re-engineering the manuscript handling process by moving to electronic submission and review may save close to half of the first-copy costs of journal production. If we take the $2,000 first-copy costs per article as representative, this moves the first-copy costs to about $1,000. Moving the formatting responsibility to authors would reduce quality, but would also save even more on first-copy costs. For journals with small readership this tradeoff may be worth it. Indeed, many humanities journals have moved to online publication for reasons of reduced cost.
[Odlyzko(1997)] estimates that the cost of [Ginsparg(1996)]'s electronic preprint server is between $5 and $75 per paper. These papers are formatted entirely by the authors (mostly using TeX) and are not refereed. Creation and electronic distribution of scholarly work can be very inexpensive; one has to wonder whether the value added by traditional publishing practices is really worth it.
Up until now we have only considered the costs of preparing the manuscript for publication. If the material were subsequently distributed electronically there would be further savings. We can classify these as follows:
The big issue facing those who want to publish an electronic journal is how to get the ball rolling. People will publish in electronic journals when there are lots of readers; people will read electronic journals when there is lots of high-quality material published there.
This kind of ``chicken and egg'' problem is known in economics as a ``network externalities'' problem. We say a good (such as an electronic journal) exhibits network externalities if an individual's value for the product depends on how many other people use it. Telephones, faxes, and email all exhibit network externalities. Electronic journals exhibit an indirect form of network externalities, since the readers' value depends on how many authors publish in the journal and the number of authors who publish depends on how many readers there are.
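The chicken-and-egg dynamic can be made concrete with a toy adoption model (my own illustrative construction, not drawn from the network-externalities literature). Suppose each potential participant joins only if the journal's value exceeds a personal threshold drawn uniformly from [0, 1], and that, because the externality is indirect, value grows with both authors and readers, so it is proportional to the square of the participation rate x. Adoption then follows x_{t+1} = min(a x_t^2, 1), which has a critical mass at x = 1/a:

```python
# Toy model of the "chicken and egg" problem (illustrative assumptions only).
# Value is proportional to the square of the participation rate x, reflecting
# the indirect externality: readers attract authors and authors attract readers.
def adoption_path(seed: float, a: float = 2.0, steps: int = 30) -> float:
    """Iterate x -> min(a * x**2, 1) and return the long-run participation rate."""
    x = seed
    for _ in range(steps):
        x = min(a * x * x, 1.0)
    return x

print(adoption_path(0.4))  # below the critical mass 1/a = 0.5: collapses toward 0
print(adoption_path(0.6))  # above the critical mass: grows to full adoption
```

The model's point is that a journal seeded below critical mass withers regardless of its quality, which is why the launch strategies discussed next all amount to subsidizing early participants.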
There are several ways around this problem, most of which involve discounts for initial purchasers. You can give the journal away for a while, and eventually charge for it, as the Wall Street Journal has done. You can pay authors to publish in it, as the Bell Journal of Economics did when it started. It is important to realize that the payment doesn't have to be a monetary one. A very attractive form of payment is to offer ``prizes'' for the best articles published each year in the journal. The prizes can offer a nominal amount of money, but the real value is being able to list such a prize on one's vita. In order to be credible, such prizes should be juried and promoted widely. This may be a very nice way to overcome young authors' reluctance to publish in electronic journals.
Let us now speculate a bit about what will happen when all academic publication is electronic. I suggest that 1) publications will have much more general forms; 2) new filtering and refereeing mechanisms will be used; 3) archiving and standardization will remain a problem.
The fundamental problem with specialized academic communication is that it is specialized. The number of readers of many academic publications is less than 100. Despite these small numbers, the academic undertaking may still be worthwhile. Progress in academic research comes by dividing problems up into small pieces and investigating these pieces in depth. Painstaking examination of minute topics provides the building blocks for grand theories.
However, there is much to be said for the viewpoint that academic research may be excessively narrow. It is said that a ghost named ``Pedro'' haunts the bell tower at Berkeley, and that undergraduates make offerings to Pedro at the Campanile on the evening before exams. Pedro, the story goes, was a graduate student in linguistics who wanted to write his thesis on Sanskrit. In fact, it was a thesis about one word in Sanskrit. And it was not even about the whole word, but about one of its forms in a particularly obscure declension. Alas, his thesis committee rejected Pedro's topic as ``too broad.''
However, the narrowness of academic publication is not entirely due to the process of research; it is also due to the costs of publication. Editors encourage short articles, partly to save on publication costs, but mostly to save on the attention costs of the readers. Physics Letters is widely read because the articles are required to be short. But one way authors achieve the required brevity is to remove all ``unnecessary'' words ... such as conjunctions, prepositions, and articles.
Electronic publication eliminates the physical costs of length, but not the attention costs. Brevity will still be a virtue for some readers; depth will be a virtue for others. Electronic publication allows for mass customization of articles, much like the famous ``inverted triangle'' in journalism: there can be a one-paragraph abstract, a one-page executive summary, a four-page overview, a 20-page article, and a 50-page appendix. User interfaces can be devised to read this ``stretchtext.''
Some of these textual components can be targeted towards generalists in a field, some towards specialists. It is even possible that some components could be directed towards readers who are outside the academic specialty represented.
This possibility for variable-depth documents that can have multiple representations is very exciting. Well-written articles could appeal to both specialists and to those outside the specialty. The curse of the small audience could be overcome if the full flexibility of electronic publication were exploited.
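A variable-depth document of this kind is straightforward to represent: store the layers in order of increasing detail and let the reader (or the interface) choose how far to expand. A minimal sketch, with hypothetical layer names and placeholder text:

```python
# Minimal representation of a "stretchtext" article: layers are stored in
# order of increasing depth, mirroring journalism's inverted triangle.
# Layer names and contents are hypothetical illustrations.
article = [
    ("abstract", "One-paragraph statement of the question and the answer."),
    ("summary",  "One-page executive summary for generalists."),
    ("overview", "Four-page overview of methods and findings."),
    ("full",     "Twenty-page article with the complete argument."),
    ("appendix", "Fifty-page appendix: proofs, data, simulations."),
]

def render(article, depth):
    """Return the text of all layers up to and including the requested depth."""
    return "\n\n".join(text for _, text in article[: depth + 1])

print(render(article, 1))  # abstract plus executive summary only
```

A generalist might stop at depth 1 while a specialist expands to the appendix; the document itself is the same object in both cases.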
As I noted earlier, one of the critical functions of the academic publishing system is to filter. Work cannot be cumulative unless authors have some faith that prior literature is accurate. Peer review helps ensure that work meets appropriate standards for publication.
There is a recognized pecking order among journals, with high-quality journals in each discipline having a reputation for being more selective than others. This pecking order helps researchers focus their attention on areas that are thought by their profession to be particularly important.
In the last 25 years many new journals have been introduced, with the majority coming from the private sector. Nowadays almost anything can be published somewhere ... the only issue is where. Publication itself conveys little information about quality.
Many new journals are published by for-profit publishers. They make money by selling journal subscriptions, which generally means publishing more articles. But the value of peer review comes from being selective, a goal almost diametrically opposed to increasing the output of published articles.
I mentioned above that one of the significant implications of electronic publication was that monitoring costs are much lower. It will be possible to tell with some certainty what is being read. This will allow for more accurate benefit/cost comparisons with respect to purchase decisions. But perhaps even more importantly it will allow for better evaluation of the significance of academic research.
Citation counts are often used as a measure of the impact of articles and journals. Studies in economics ([Laband and Piette(1994)]) indicate that most citations go to articles published in a few journals. More and more articles are being published, a smaller and smaller fraction of which are read ([de Sola Pool(1983)]). It is not clear that the filtering function of peer review is working appropriately in the current environment.
Academic hiring and promotion policies contribute an additional complication. Researchers choose narrower and narrower specialties, making it more and more difficult to judge achievement locally. Outside letters of evaluation have become essentially worthless due to lack of privacy guarantees. The only thing left is the publication record, and quantity of publication is easier to convey to non-experts than quality of publication.
The result is that young academics are encouraged to publish as much as possible in their first 5-6 years. Accurate measures of the impact of a young researcher's work, such as citation counts, cannot be accumulated in this short a time period. One reform that would probably help matters significantly would be to put an upper limit on the number of papers submitted as part of tenure review. Rather than submitting everything published in the last 6 years, assistant professors would submit only their five best articles. This would, I suggest, lead to higher quality work, and higher quality decisions on the part of review boards.
If we currently suffer from a glut of information, electronic publication will only make matters worse. Reduced cost of publication and dissemination is likely to make more and more material available. This isn't necessarily bad; it simply means that the filtering tools will have to be improved.
I would argue that there are two dimensions on which journals filter papers: interest and correctness. The first thing a referee should ask is ``is this interesting?'' If the paper is interesting, the next question should be ``is this correct?'' Interest is relatively easy to judge; correctness is substantially more difficult. But there isn't much value in determining correctness if interest is lacking.
When publication was a costly activity, it was appropriate to evaluate papers prior to publication. Ideally only interesting and correct manuscripts would undergo the expensive transformation of publication. Furthermore, publication is a binary signal: either a manuscript is published or not.
Electronic publication is cheap. Essentially everything should be published, in the sense of being made available for download. The filtering process will take place ex post, so as to help users determine which articles are worth downloading and reading. As indicated above, the existing peer review system could simply be translated to this new medium. But the electronic media offer possibilities not easily accomplished in print media. Other models of filtering may be more effective and efficient.
Allow me to sketch one such model for electronic publishing that is based on some of the considerations above. Obviously it is only one model; many models should and will be tried. However, I think that the model I suggest has some interesting features.
First, the journal assembles a board of editors. The function of the board is not merely to provide a list of luminaries to grace the front cover of the journal; its members will actually have to do some work.
Authors submit (electronic) papers to the journal. These papers have three parts: a one-paragraph abstract, a 5-page summary, and a 20-30 page conventional paper. The abstract is a standard part of academic papers and needs no further discussion. The summary is modeled after the Papers and Proceedings issue of the American Economic Review: it should describe what question the author addresses, what methods were used to answer the question, and what the author found. The summary should be aimed at as broad an audience as possible. This summary would then be linked to the supporting evidence: mathematical proofs, econometric analysis, data sets, simulations, etc. The supporting evidence could be quite technical, and would probably end up being similar in structure to currently published papers.
Initially, I imagine that authors would write a traditional paper and pull out parts of the introduction and conclusion to construct the summary section. This would be fine to get started, though I hope that the structure would evolve beyond this.
The submitted materials will be read by 2-3 members of the editorial board who will rate them with respect to how interesting they are. The editors will only be required to evaluate the 5-page summary, and are not necessarily responsible for evaluating the correctness of the entire article. There will be a common ``curve'' used by the editors; e.g., at most 10% of the articles would get the highest score. The ``Editorial score'' will be attached to the paper and it will be made available on the server. Editors will be anonymous; only the score will be made public.
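The common curve could be enforced mechanically: convert each board member's raw interest ratings to percentile ranks and map the top decile to the highest score. A sketch of this idea; the text specifies only the 10% cap at the top, so the equal-width bands for scores 1 through 4 are my assumption:

```python
# Map raw editorial ratings onto a common 1-5 curve, with at most 10% of
# papers receiving the top score.  The equal-width bands below the top
# decile are an illustrative assumption.
def curved_scores(raw_ratings):
    """Convert raw ratings to 1-5 scores; at most 10% receive a 5."""
    n = len(raw_ratings)
    sorted_ratings = sorted(raw_ratings)
    scores = []
    for r in raw_ratings:
        p = sorted_ratings.index(r) / n        # fraction rated below (ties share a rank)
        scores.append(5 if p >= 0.9 else 1 + int(p / 0.225))
    return scores

raw = [3.1, 7.4, 5.0, 9.2, 2.8, 6.6, 8.1, 4.4, 5.9, 1.7]
print(curved_scores(raw))   # exactly one paper (the 9.2) receives a 5
```

Because the curve is relative rather than absolute, editors with different personal scales still produce comparable published scores.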
Note that all papers will be accepted; the current binary system of ``publish or not'' is replaced by a scale of (say) 1-5. Authors will be notified of the rating they received from the editors and can withdraw the paper at this point if they choose to do so. However, once they agree to have their paper posted it cannot be withdrawn (unless it is published elsewhere), although new versions of it can be posted and linked to the old one.
Subscribers to the journal can search all parts of the on-line papers. They can also ask to be notified by email of all papers that receive scores higher than some threshold or that contain certain keywords. When subscribers read a paper, they also score it with respect to its interest, and summary statistics of these scores are also (anonymously) attached to the paper.
Since all evaluations are available online, it would be possible to use them in quite creative ways. For example, I might be interested in seeing the ratings of all readers with whom my own judgments are closely correlated. (See [Konstan et al.(1997)Konstan, Miller, Maltz, Herlocker, Gordon, and Riedl] for elaboration of this scheme.) Or I might be interested in seeing all papers that were highly rated by Fellows of the Econometric Society or the Economic History Society.
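A minimal version of such a recommender, in the spirit of the GroupLens work cited above (though the data and implementation here are my own illustrative sketch): correlate my past ratings with each other reader's, then weight their ratings of papers I have not yet read.

```python
# Sketch of a collaborative-filtering recommender over paper ratings.
# ratings: reader -> {paper: score on the 1-5 interest scale}.  All data
# here is illustrative.
from math import sqrt

ratings = {
    "me":    {"p1": 5, "p2": 1, "p3": 4},
    "alice": {"p1": 5, "p2": 2, "p3": 4, "p4": 5},
    "bob":   {"p1": 1, "p2": 5, "p3": 2, "p4": 1},
}

def mean(scores):
    return sum(scores.values()) / len(scores)

def correlation(a, b):
    """Pearson correlation over the papers both readers have rated."""
    common = sorted(set(a) & set(b))
    xs, ys = [a[p] for p in common], [b[p] for p in common]
    mx, my = sum(xs) / len(xs), sum(ys) / len(ys)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = sqrt(sum((x - mx) ** 2 for x in xs))
    sy = sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (sx * sy)

def predicted_score(reader, paper):
    """Mean-centered prediction, weighted by correlation with each other reader."""
    mine = ratings[reader]
    num = den = 0.0
    for other, theirs in ratings.items():
        if other == reader or paper not in theirs:
            continue
        w = correlation(mine, theirs)
        num += w * (theirs[paper] - mean(theirs))
        den += abs(w)
    return mean(mine) + num / den

print(predicted_score("me", "p4"))  # high: alice agrees with me and rated p4 a 5
```

Because bob's tastes are negatively correlated with mine, his low rating of p4 actually raises my predicted interest, which is exactly the kind of inference a flat average would miss.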
This sort of ``social recommender'' system will help people to focus their attention on research that their peers---whoever they may be---find interesting. Papers that are deemed ``interesting'' can then be evaluated with respect to their correctness.
Authors can submit papers that comment upon or extend previous work. When they do so, they submit a paper in the ordinary way with links to the paper in question, as well as to other papers in this general area. This discussion of a topic forms a thread that can be traversed using standard software tools. See [Harnad(1995)] for more on this topic.
Papers that are widely read and commented upon will certainly be evaluated carefully for their correctness. Papers that aren't read may not be correct, but that presumably has low social cost. The length of the thread attached to a paper indicates how many people have (carefully) read it. If many people have read the paper and found it correct, a researcher may have some faith that the results satisfy conventional standards for scientific accuracy.
This model is unlike the conventional publishing model, but it addresses many of the same design considerations. The primary components are: