Not only, but also. (aka "Craig Venter has Left the Building")

Well, as you've probably guessed, the weekend was an absolute blast. And not only that, around the time I started to wake up again I got some startlingly excellent news. Celera Genomics (as was) has decided to admit that they can't make a living selling their version of the genome. Slashdot's take on this (which is mainly people's comments rather than in itself being terribly informative) is here. It does, however, link to a Business Week story here and a New York Times story (free registration required) here.

They are publishing their data freely on the usual public databases once existing subscriptions run out.

In a sense this is now fairly irrelevant, but as I spent a couple of years trying, with a small army of other people, to spike their business plan, I find it hard not to take a continuing personal interest.

All opinions expressed after this point are those of the author and are not necessarily factual. This is how I remember it, and while I have a fairly good memory it's not infallible. All points should be readily (dis)provable by someone with a few minutes to spare, though.

In 1998, Applied Biosystems and Craig Venter founded Celera Genomics and issued a press release claiming that they were going to spend a year building an organisation that would produce an assembled sequence of the human genome in another year, with each point in it covered by an average of 12 sequencing reactions - that's called 12x coverage in the jargon - and then continue to 20x coverage. They would then make their money back by charging people for access to the sequence (that's right - no sneaky peeking) and applying for patents on a certain number (not originally specified, but subsequently 500 was suggested) of human genes. The Human Genome Project <Sobchak>on which I was then working </Sobchak> was at this point about halfway done according to its original timetable, and the instant suggestion was that we should shut up shop and go home. This was seriously advocated by many in the US government, who saw Celera's approach as far better than the federal government putting up cash for the same job. The general principle that the government should not directly compete with private industry was deployed in debates. It was stated - wrongly, in my view - that Celera's whole-genome-shotgun approach could provide comparable quality to the more hierarchical approach the Project was taking.

Interestingly, I can't now find a copy of that initial press release. Why this is interesting is something I'll come back to.

The killer blow came from my employers, the Wellcome Trust. Now, this isn't entirely true in that they weren't directly my employers. I was an employee of Genome Research Limited, a contract research company owned by a trust set up jointly by the Medical Research Council and the Wellcome Trust. The building I worked in, the Sanger Centre, was leased from a company called Hinxton Hall Limited, owned by a trust likewise. The same trust. What it amounted to is that the Sanger was at that time a semi-detached arm of the Wellcome Trust. These days an intermediate level has been removed and the Wellcome Trust Sanger Institute is officially an arm of the Wellcome Trust, and all the above-mentioned bodies have their registered offices at the Wellcome's building on Euston Road. That's by the by, though. What they did was simple, but devastatingly effective.

Some weeks earlier, they had heard a presentation by Jane Rogers and John Sulston, both of the Sanger, supporting a request for a doubling of funding to allow the Sanger to take on a third of the human genome rather than the sixth we were at that point contracted to.

As another digression, this contract was with HUGO, the Human Genome Organisation, which co-ordinated the effort the consortium of centres made to get the genome completed. Centres came to HUGO with funding and HUGO would make sure they had a patch - maybe a whole or even multiple chromosomes, maybe less - to get working on that duplicated nobody else's. There were quality stipulations (specific problem areas would be marked as such and unmarked areas would have an error rate less than 1 in 10 000) and data release requirements (the Bermuda Principles - sequence data would be placed in the public domain within 24 hours of their production) in addition, which everybody agreed to and which everybody stuck to. Apart from, it was widely rumoured, one centre. This was The Institute for Genomic Research, or TIGR for short. You can pronounce this Tiger or Tigger depending on how sarcastic you're feeling - the name's clearly a backronym, and they used to have the worst logo in all of science: a tiger climbing a double helix. Whenever we felt insufficiently superior we'd just download their logo and snigger. Sorry, download their logo and SNIGR.

Oh look. There's a copy of it here:

TIGR had problems sticking to the Bermuda Principles. They wanted more time to do further analysis and produce a more coherent sequence before exposing it to the public. There's nothing wrong with that except that it wasn't what they'd agreed to. The founder, and at that point still the head, of TIGR was one Dr J Craig Venter.

To return to the main thread, the Trust had been going to announce a few weeks later that this application had been successful, but after a series of hurried phone calls it was pushed out with great fanfare (IIRC) the next morning. The reaction was impressive. Many people who had been wavering notably on whether the public project should continue became resolute in its defence. The New Scientist's editorial on the subject started "The Wellcome Trust don't get mad, they get even."

The doubts which had been expressed the previous day about the ability of the Project to match Celera's timetable, and about the wisdom of continuing when Celera was willing to do the job at zero upfront cost, largely disappeared. Although there were continuing attempts to cut federal funding for the US segment, they continually failed in the face of one single unstated objection: the Wellcome Trust were going ahead to sequence one-third to completion and to abandon the Project would mean that the US was not involved in what was at that time the most important scientific project in the world. National pride would not permit it, and national pride means a lot to politicians. I have never seen this stated publicly by anyone other than me, but I am sure that it's the case.

The race was on. In spite of denials from all those involved, it was a race. The finishing line wasn't where most people thought it was, though. Who had a "complete" sequence first was close to irrelevant. The crucial points were, firstly, whether Celera would be able to cherrypick important genes for patenting (to prevent that, as much data as possible had to be collected and assembled as soon as possible, and enough analysis performed to provide a shield of Prior Art) and, secondly, that a high-quality version was available free to all without subscription charges. We had an advantage in that the hierarchical approach allows continual assembly of small sections, as the data become available, to provide the sequence of that section. A whole-genome shotgun has to have all of the sequencing done beforehand and then a single large assembly operation performed. Celera wouldn't have anything to analyse and spot genes on until right at the end of their operation, whereas we already had a lot and could rejig our operation to prioritise getting as much as possible as fast as possible.

To this end, a previous version of the plan was resurrected. We had initially been going to produce a "draft" version of the genome, with 6x sequencing, and then do another 6x for each area and perform further directed finishing work (this last bit being my job) to reach a final version. It had subsequently been decided to abandon the two-stage approach and just do 12x sequencing in one pass followed by finishing. We reverted to the first version, which had the advantage of producing raw mid-quality sequence as fast as possible, but at the cost of decoupling sequencing from finishing and therefore demanding the storage of huge numbers of DNA specimens in deep-freeze. But so be it. Needs must when the devil drives.
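As a rough illustration of what those coverage figures mean, here is a minimal sketch. It assumes the textbook Lander-Waterman idealisation (reads land uniformly at random, so the number of reads hitting any one base is approximately Poisson-distributed). That model and the function names below are assumptions made for the example, not anything the Project is described here as having used.

```python
import math


def coverage(num_reads: int, read_length: int, genome_size: int) -> float:
    """Average coverage: total bases sequenced divided by genome length."""
    return num_reads * read_length / genome_size


def expected_uncovered_fraction(cov: float) -> float:
    """Idealised Poisson model: chance that a given base is hit by no read at all."""
    return math.exp(-cov)


if __name__ == "__main__":
    # Compare the 6x "draft" depth with the 12x target discussed above.
    for cov in (6, 12):
        frac = expected_uncovered_fraction(cov)
        print(f"{cov}x coverage: roughly {frac:.1e} of bases expected to be unsequenced")
```

Under this idealisation a 6x draft leaves about 0.25% of bases untouched by any read, against well under a thousandth of a percent at 12x. Real genomes are less kind (repeats and regions that clone badly are not random), which is one reason directed finishing work was still needed.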

An automated analysis program was also instituted, whereby every night matches were sought between known gene or EST sequences and sequence from our databases, and as much information as possible was stuck up on the web. Prior publication in patent terms, remember, depends only on it being available where people could read it, not on whether anyone actually did. Our pages were available on the website, properly linked to and searchable-for by anyone. We were as prepared as we could be.

Celera had, IIRC, about 230 ABI3700 sequence readers (according to a Slashdot post of the time, apparently made by an insider). These were absolutely the dog's bollocks and their relationship with ABI meant that they got them first, hot off the production line. We had, in addition to older machines which were still being used and still useful, about 100 (I believe - the ones in the lab next to us were identified with Sanger numbers in the high eighties, and they weren't quite the most recent at that point) and were one-third of the project. It seemed that in terms of raw throughput we had the edge.

Celera was floated as a separate entity on the stock exchange in 1999. Towards the end of the year their price started to go through the roof as people started talking about the impending completion of the human genome. Around the same time we became aware that the logs of our ftp servers were recording systematic access by Celera's machines. They were lifting everything we produced. This was public-domain information (in accordance with the Bermuda Principles) so they were entirely entitled to do this, but it was very puzzling behaviour for a company that had scorned our approach and loudly trumpeted the superiority of their technique and technology. Patent approval is supposed to be based on innovation, and having lifted our assemblies must surely leave them open to challenges on that front . . . shouldn't it? After all, the Wellcome Trust had stated that their lawyers might well be challenging patents in the courts, and the Trust was even then a £20 billion organisation. Could it be that they weren't going to go down the patent road after all?

In mid-March there was a joint statement by Bill and Tony (aww - Bill and Tony - we all liked them both back then, of course, even if they were a bit smarmy) on the future of the genome. In spite of its lack of teeth, this had an astonishing effect. No action was going to be taken, but both premiers expressed an aspiration that human sequence data be available without oppressive conditions. Celera's share price went through the floor. It lost about eighty percent in a few days. In June, there was a joint announcement that draft versions had been completed, but Celera's share price had recovered as much as it was ever going to - to about $90 a share, and it wouldn't stay there long.

At this point, Celera had a complete genome assembly - a single coherent statement of the genome. We hadn't actually been intent on making one of them, because it hadn't seemed useful. A graduate student at UCSC, however, had thought differently and had approached UCSC for cash to build a network of about 200 or 250 commodity PCs to integrate all the Project's sequence and mapping data into such a model. It took him about a month to write the assembly program, GigAssembler, and it ran to completion (according to reports I heard) three days before the joint announcement. As Celera stopped theirs on the morning of the announcement, I would guess that this man, James Kent, was the producer of the first whole-genome human assembly and deserves a lot more public recognition than he's had. If you ever meet him, buy him a drink from me and claim the cash back when you see me. The inevitable Slashdot article's here. The browser for this database had a web interface, and it was an important resource for us during the later stages of the Project.

There was a long quiet period after this, while people analysed and wrote papers. There was also talk of a lawsuit by disgruntled Celera shareholders against the management for not disclosing that they'd been in talks over a joint release. These talks had come to nothing because the goals of a corporation and our goals were too distant. I don't know what happened about the suit, but I didn't hear of it ever reaching a court or being settled, so it might well have been dropped quietly. Celera had said they were going to be bolstering their assembly with more data during this time, and then moving on to do some mouse sequencing. We were hard at work building our version from 6x towards a fairly uniform 12x and getting on with the important finishing work to ensure consistent high quality and completeness.

The next big surprise came the next year. In 2001 Nature and Science published, the same week, papers by Us and Them respectively describing our different draft versions. Somewhat surprisingly, it turned out that while their version was based on 12x coverage, they'd only done 5x of that themselves. The rest was derived from what they'd lifted - entirely legitimately, remember - from our databases. There was actually more of our data in their assembly than there was of their own. This was incorporated as "faux" reads - overlapping sections of sequence representing our (roughly) 7x (by that time) coverage, fed into their assembly engine along with their data. This was pointed out in a subsequent paper by Sanger analysts. A Celera rebuttal stated that they would have had a better assembly if they had not used our data. Whether or not that last point is true, one fact is undisputed: the draft assembly their paper was based on was not produced using the whole-genome technique their business plan had envisaged. Furthermore, they had produced 5x rather than 12x coverage in a year, and according to their website they never increased that to anything like the 20x that their initial press release had described. In fact, they never increased it at all. What they still advertise is 5x plus our data: "Using approximately 5x of Celera human sequence data combined with BAC data from GenBank" is what it says as I write. They were never, so far as I am aware, able to justify sequence-level patents (result!) and their ability to sell a sequence database was considerably hampered by the fact that most of it was available for free over the road. As far as I'm aware Celera's subscription income never covered their operating costs. In spite of the hype, they have never posted a profit.

Two years later, the Project wrapped, having exceeded its original targets for accuracy and completeness in spite of the fact that when they were set the technology required had not been invented. Furthermore it was finished two years before the original timetable and some number of hundreds of millions of dollars below budget.

In due course, Celera fired Dr J Craig Venter and de-emphasised the genome database trade, turning instead to drug discovery as a potential cash cow. I wish them well in this, as there are many fields in which new drugs are needed and in short supply. Venter did not respond to the public announcement, as his yacht was in choppy water near the Bahamas. It's tough at the top. My heart bleeds for him. To be fair to the guy, though, his share of the float income for Celera was reputedly about $100 million, and if he kept any of it then it was a trivial amount. He was already rich from previous ventures and clearly didn't feel the need for any more. All, or basically all, of it went to fund a couple of new research centres which allow him and others to work on other useful problems. He's a high-calibre scientist and I doubt we've heard the last of him.

Now, of course, they've stopped even taking subscriptions for access to their data. As they're a stock-owned corporation with an obligation to maximise shareholder value, I can only assume that the number of people prepared to pay has fallen below the cost of running that part of their operation. And I don't imagine that that's much of a cost at all, given that all of the sequencing and assembly was done over four years ago.

Celera Genomics, as I write, is trading at $9.20 a share. In the last 12 months it's been between $9.09 and $14.73. At peak, shares were changing hands for more than $250 each. But it was always a castle built on a swampsand. When light dawned on investors, the price fell in a pair of crashes to thirtysomething dollars, and then to somethingteen, where it has remained to this day. Their business plan depended crucially on us giving up and going home, and entirely on us leaving the way open to them acquiring patent rights over swathes of the genome. Without that, they had no reliable revenue stream and no credible way to cover their setup costs.

I've been mellowing a bit towards Venter over the last couple of years. But I've just been looking at an old Wired article where he's quoted as saying they'd get sequence-level patents on human genes, and I find myself unable not to think, once more, "What a tosser". I'm glad they lost. I'm just sorry so many investors were misled and lost their shirts in the process. Analysts were recommending them as shares well after they'd lost most of their value - presumably on the grounds that they must surely now be underpriced - when it was obvious (surely?!) to anyone looking at what it was they did that they hadn't a chance of ever covering even their operating costs from their subscription revenue. Indeed, looking at their income and their declared cash reserves following their flotation, I was pretty sure that they must be getting more from interest than from sales - meaning that they were primarily an investment management company rather than a biotechnology company.

Craig Venter's problem was that he was so eager to see his method (which, to be fair, he sincerely believed to be better) used in practice that he sold his soul to corporate capitalists, who owe no loyalty except to the god that is Shareholder Value, and tried to deliver them a unique and precious resource that, insofar as it can be owned, should belong to everyone. For that, he has a permanent, well-deserved and very high position on my personal shitlist. Poor Craig. I hope he has his long spoon with him.

PS: While digging, I found that MJ (who made some rather nice thermal cyclers that we used to use at the Sanger) were sued by ABI and Roche, filed for bankruptcy and were taken over by Bio-Rad, who made our confocal system. Except that that division has been sold off to Zeiss. Whenever I try to keep track of what companies are up to these days I get a headache. They're constantly taking each other over, swapping names about between divisions and selling bits of themselves to each other. It's worse than Whitby.

I have a feeling I may have a paper copy of that press release somewhere in the house. If I find it, I'll put it up. If I remember it correctly, then with the benefit of hindsight it should make quite entertaining reading.
a) That was interesting. I'm glad the Project won out, there's something seriously unpleasant about the idea of patenting sequences in my book.

b) That's a seriously awful logo!

c) I like the comparison of companies to goths ;)
Thank you.

1. Unlike drugs, where people can develop alternatives, genes are finite in number - and fewer by far than people had assumed. The potential to lock up development and study would have been huge.

2. Yes. Humility? We've heard of it.

3. If I hadn't been writing that paragraph a week after it the likeness wouldn't have occurred to me.
wow, very interesting
that tiger logo -- the tiger looks like it's falling down the helix rather than climbing up it, to me. Very appropriate.
Re: wow, very interesting
Yes. Very precarious.

TIGR are still going, and unlike their errant founder were full participants in the HGP. There's a Wikipedia page here. If you look at their site you'll see that they have a much more businesslike logo these days.
Didn't it turn out that Venter got them to mostly sequence *his* DNA?

(God, years of avoiding anything to do with genetics like the plague has made me very behind on the scandal.)
Yes. Which is a bit megalomaniacal.

Not that there'd ever *cough* be such an ethical breach on our side, of course.