Testing for COVID-19 Infection

Accurate testing for SARS-CoV-2 infection, by which we mean testing with few false positives as well as few false negatives, is important not only for clinical management of individual cases but for epidemiological case tracing, limiting spread of infection and informing public health strategies. In the latter situations, tests that are rapid, cheap and easy to perform are particularly desirable.

Two main forms of testing for SARS-CoV-2 infection are in use.

  • Antigen testing, which includes the self-administered immunochromatographic lateral flow test (LFT), detects viral coat material and is developed by raising a specific antibody against the antigen target. It therefore measures active production of the viral protein that constitutes the antigen. Developing such a test requires knowledge of how the virus behaves in the host and is reliant upon generating a sensitive and specific antibody to be used for the test.
  • Molecular tests, such as polymerase chain reaction (PCR), loop-mediated isothermal amplification (LAMP) and clustered regularly interspaced short palindromic repeats (CRISPR) tests, amplify viral RNA material. These tests are potentially specific for different variant mutations of SARS-CoV-2. The gold standard test is considered to be lab-based reverse transcription viral PCR (RTPCR). Rapid testing, such as the LAMP test, skips some of the time-consuming stages of formal PCR and is therefore useful for screening and epidemiology.

Many different studies have reported on the performance of different brands of rapid antigen and molecular test. This article discusses a Cochrane review of diagnostic test accuracy that collected data from 64 studies that investigated 16 different antigen tests and five different rapid molecular tests. In all there were 24087 samples. Of these, 7415 were found on the subsequent gold standard RTPCR test to be positive. No study actually included clinical diagnosis of infection as a standard or criterion of infection.

Antigen tests had a sensitivity in symptomatic cases of 72% (95% CI 63.7 to 79%). The sensitivity was higher in the first week of infection (78.3%) than in the second week (50%). The overall specificity was 99.6% in both symptomatic and asymptomatic people (though if a symptomatic person has a false positive result, one wonders what they actually had).

Expressed a different way, as positive predictive value (PPV), the results were less impressive: even using the brands with the best sensitivity, the PPV was only 84-90% at 5% population prevalence, and at 0.5% prevalence in asymptomatic people it was only 11-28%.

Molecular tests had an average sensitivity of 73% (66.8 to 78.4%) and specificity was 97.2 to 99.7% depending on brand. In most trials the samples were collected in labs rather than in real life conditions. There are no data about symptoms.

For reference, WHO considers a rapid test with >80% sensitivity and >97% specificity to be an appropriate replacement for a lab test.

The authors note that in low prevalence settings the dramatically reduced PPV means that confirmatory tests may be required, and that data are needed from the settings in which tests are intended to be used, as well as on user dependence (i.e. self-testing).

Journal Club Conclusions

Why the huge difference between positive predictive value and specificity?

Specificity = 1 minus false positive rate (FPR)

FPR = false positives/all negative cases (i.e. true negatives and false positives)

So high specificity means that false positives are few compared with true negatives.

PPV = true positives/all positive results (i.e. true positives and false positives)

So high PPV means that false positives are few compared with true positives.

The difference between specificity and PPV is essentially that specificity is in relation to the total number of people who are actually uninfected, while PPV is in relation to the total number of positive test results. PPV could be much worse than specificity if true negatives are much more common than true positives, because even a small false positive rate applied to a very large uninfected group can produce as many false positives as there are true positives. This would happen if infection rates are very low in the population.
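To make the dependence on prevalence concrete, here is a minimal Python sketch that derives PPV from sensitivity, specificity and prevalence using Bayes' theorem. The function name `ppv` is ours, and the inputs are the pooled antigen test figures quoted above (72% sensitivity, 99.6% specificity). The output is therefore illustrative only and is not expected to reproduce the brand-specific PPV ranges reported in the review.

```python
def ppv(sensitivity: float, specificity: float, prevalence: float) -> float:
    """Positive predictive value: true positives / all positive results."""
    true_pos = sensitivity * prevalence                    # infected and test positive
    false_pos = (1.0 - specificity) * (1.0 - prevalence)   # uninfected but test positive
    return true_pos / (true_pos + false_pos)


# Pooled antigen test figures from the text; prevalences are the trial
# prevalence (about 31%) and the two population scenarios discussed.
for prevalence in (0.31, 0.05, 0.005):
    value = ppv(0.72, 0.996, prevalence)
    print(f"prevalence {prevalence:.1%}: PPV = {value:.1%}")
```

With sensitivity and specificity held fixed, the PPV falls steadily as prevalence falls, which is the whole of the effect described above.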

Is it disingenuous to quote specificity based on trials where infection rates were 31% when in the real world the infection rates are perhaps two orders of magnitude lower than that? If someone wanted to argue that they were being denied going to work, going to school or going on holiday based on a test where the predictive value was only 11-28%, would they have a good case? Is it worthless as a screening tool? If the policy is to go on to a molecular test if the result is positive, is this also invalid if the PPV of the molecular test is similarly low?

Clearly, the use of tests should be tailored to the information that can be provided. In an outbreak of high prevalence, one wants a sensitive test to pick up people who might have infection after contact tracing. One could use a highly specific rapid screen and send only those who test negative for lab testing, to make completely sure. The priority is not to miss positive cases.

If medicolegally one wants to prove workers were not the source of a case or outbreak, when the prevalence of infection is low, one may as well go straight to lab testing as there are too many false positives in such a situation.

In a situation where someone wants to question a positive result, it is not clear that rapid molecular testing is superior to antigen testing, as its specificity is not clearly higher than that of some antigen tests, and a lab-based PCR may again be necessary.

There are also biological as well as statistical issues. For example, antigen tests may theoretically have false positives if the nature of the generated antibody to a viral coat antigen is not clear. If the trial was done in the summer time with no winter flu or coronavirus common colds in a setting where one in three subjects have COVID and none has any other type of infection, is the generated antibody really shown to be specific for SARS-CoV-2? The same may apply to demographic factors, such as expecting the test to have the same performance in children or in care homes as in a healthy adult test population.

On the other hand, an antigen test, which measures active production of viral protein, may correlate better with actual infectivity than a molecular test that detects a bit of residual RNA, and may therefore be more biologically useful for epidemiological control.

Finally, in extremely high prevalence trial populations where there is no actual clinical corroboration, the absolute reliance upon lab PCR as a gold standard may be of concern.


Distinguishing Encephalitis from Encephalopathy


Encephalitis may be defined as infection or inflammation of the brain substance, resulting typically in disturbed sensorium and perhaps seizures or focal neurological deficits and sometimes pyrexia. Encephalopathy, on the other hand, represents disturbed sensorium not due to an infective or inflammatory cerebral process, and its causes range from toxins and drugs to metabolic upset, non-cerebral sepsis, cerebral hypoperfusion and post-ictal states.

Distinguishing the two is important because considerable morbidity and mortality is associated with delayed treatment with appropriate antiviral, antibiotic or immune therapy for encephalitis, and with delayed treatment of the various other causes of encephalopathy.

The paper presented, “To what extent can clinical characteristics be used to distinguish encephalitis from encephalopathy of other causes? Results from a prospective observational study” by Else Quist-Paulsen et al., attempts to use clinical and rapidly available investigatory findings to distinguish the two conditions by a prospective observational study on 136 patients.

They identified candidate patients on the basis they had a lumbar puncture, and then excluded those with no evidence of encephalopathy. Their criteria for encephalitis were:

  • Pyrexia
  • Encephalopathy > 24 hours with no other cause identified and 2 of:
  • CSF WCC >= 5 x 10^6/l
  • New onset seizures
  • New onset focal neurological findings
  • CT/MRI consistent with encephalitis
  • EEG consistent with encephalitis

The gold standard by which to gauge their test would surely be a definitive diagnosis but, as is commonly the case in clinically suspected encephalitis, such a diagnosis was only made in 10 of 19 patients. In some of the patients with non-encephalitis encephalopathy, the diagnosis was also vague, e.g. “aseptic meningitis” (which could be encephalitis), “epilepsy” (which could be autoimmune encephalitis), “headache/migraine”, “unspecified disorientation or coma”.

Subsequent analysis of specific features in the two groups then becomes somewhat difficult because the criteria themselves become the gold standard and because some of the specific features were themselves among the criteria. Interestingly, systemic features of infection, such as a raised blood white cell count or CRP, argued against encephalitis because general sepsis was a common cause of encephalopathy. Nausea and personality change were more common in their encephalitis group.

They used ROC curves to look at the predictive value of these specific features and their combinations, but these were again judged against their “testing variable”, their criteria, not against some objective gold standard. It would have been better to look at them only in the 10 diagnosed cases rather than all 19, but then the total number of cases would be even lower.

The diversity of diagnosis of their cases was interesting, especially that Lyme disease and TB were as common as VZV and more common than HSV. Only one of their cases had NMDA receptor antibodies, but we do not know that all the patients had this test and a full battery of other autoimmune antibody tests. Many might have been put in the encephalopathy with seizures category. Since encephalitis can be associated with meningism, some “aseptic meningitis” patients might have been viral but with negative testing, or even autoimmune with a migrainous headache and stiff neck.

The group felt that the study was very worthwhile, but that the real goal would be a clearer guide as to which cases of encephalitis warrant antimicrobial therapy or immune therapy. This would require clarity on the gold standard diagnosis and many more patients.

The Journal Club discussion on which this post is based was presented by Dr Aram Aslanyan, Specialist Registrar in Neurology at Queens Hospital, Romford, Essex.


Making a Differential Diagnosis using a Clinical Database


A great deal of time is spent in medicine reading and writing case reports. Essentially, clinical features are listed and a diagnosis made. Excluding those cases that point to a novel means of treatment, a case report is often noteworthy simply because the diagnosis is rare, or because the clinical features were most unlikely to be associated with the diagnosis. This hardly seems a reliable method of archiving medical knowledge.

Much less time is spent on attempting a method of diagnosis that is more systematic than the recalling of case reports. One can see that if one did wish to move medical diagnosis into the information age, natural instinct would be to use an internet search engine to enter a list of clinical features and see what disease diagnoses were associated with these terms. Unfortunately, internet search engines concern themselves only with the popularity of search terms and because of the dominance of case reports such practice may be likely to throw up the least likely cause of those features, or that which is most “titillating” to those who most perform internet searches.

There have been attempts to provide a more balanced means of linking clinical features with diseases and hence making clinical diagnoses. Rare diseases with a large number of different clinical features are least easily diagnosed by clinical experience or key investigations, and so the focus of these attempts has been on rare genetic diseases using ever-expanding databases such as Orphanet, Online Mendelian Inheritance in Man (OMIM), the London Dysmorphology Database and the Pictures of Standard Syndromes and Undiagnosed Malformations (POSSUM).

One method of searching for clinical features on these databases is simple text matching. A way of quantifying the match is the feature vector method, which calculates the mathematical overlap between the Query (the clinical features of the case) and the Disease (the clinical features of the disease). A vector of the query is calculated with dimensions for each feature and a value of 1 if present and 0 if absent. The same is done for the disease. The dot product of the two vectors is the strength of the match (a 1 for both query and disease will sway the two vectors in a common direction, and a 0 for both will leave their relationship unchanged, while a 0 and a 1 will make one move away from the other).
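The feature vector idea can be sketched in a few lines of Python. The vocabulary, query and disease feature sets below are hypothetical, chosen only for illustration. Note that a plain dot product simply counts shared features and does not penalise mismatches, so the sketch also shows cosine similarity, which normalises by vector length so that unmatched features pull the score down. The use of cosine normalisation is our assumption about how the "moving away" behaviour described above would be obtained, not something stated by the paper.

```python
import math


def feature_vector(features, vocabulary):
    """1 if the feature is present, 0 if absent, one dimension per vocabulary term."""
    return [1 if term in features else 0 for term in vocabulary]


def dot(a, b):
    """Dot product: for binary vectors, the number of shared features."""
    return sum(x * y for x, y in zip(a, b))


def cosine(a, b):
    """Dot product normalised by vector lengths; mismatches reduce the score."""
    norm = math.sqrt(dot(a, a)) * math.sqrt(dot(b, b))
    return dot(a, b) / norm if norm else 0.0


# Hypothetical vocabulary and annotations, for illustration only.
vocabulary = ["ataxia", "optic atrophy", "seizures", "spasticity"]
query = feature_vector({"ataxia", "optic atrophy"}, vocabulary)
disease_a = feature_vector({"ataxia", "optic atrophy", "spasticity"}, vocabulary)
disease_b = feature_vector({"seizures"}, vocabulary)

print(dot(query, disease_a), round(cosine(query, disease_a), 3))
print(dot(query, disease_b), round(cosine(query, disease_b), 3))
```

Here the query shares two features with the first hypothetical disease and none with the second, so the first ranks higher on either measure.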

A potentially better quantification of matching is to take into account the different specificities of different clinical features. If a clinical feature is present in only a few diseases, its annotation (the linkage of a clinical feature to a disease) is more specific for that disease (in database terms this is called the information content (IC)) and so that linkage should have more weighting. The IC is simply the negative log of the frequency of the annotation. For example, AV block is a term that annotates 3 diseases in the 4813-disease OMIM database. The frequency is 3/4813. The natural log (ln) of this is -7.38, so the IC is 7.38. A much more general term will have many annotations and a much lower IC, tending towards zero. The ICs of all the clinical features of the query can be summed or otherwise combined to provide an overall match.
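The AV block arithmetic can be checked with a short Python sketch. The function name `information_content` is ours; the counts are those given in the text.

```python
import math


def information_content(n_annotated: int, n_total: int) -> float:
    """Negative natural log of the fraction of diseases annotated by a term."""
    return -math.log(n_annotated / n_total)


# AV block annotates 3 of the 4813 diseases in the example from the text.
print(round(information_content(3, 4813), 2))

# A term that annotates every disease carries no information.
print(information_content(4813, 4813) == 0)
```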

The authors of the presented paper have described a further refinement of this method. This is called the Ontology Similarity Search (OSS). Instead of simply matching the text of terms, they fit clinical features into a standardised language within an ontological framework. This means that the features are related to one another in a hierarchy, with more general terms higher in the hierarchy and more specific subcategories of those general terms lower in the hierarchy. While “parent” terms obviously have many “child” terms, child terms can also belong to multiple parent terms. For example, optic atrophy could be a child of demyelinating disease and also a child of visual disturbance. Their ontology is called the Human Phenotype Ontology (HPO) and has around 9000 terms.

The advantage of using the ontology is that if a clinical feature of a case does not fit the clinical features of the disease, but shares a parent term with one of the features of the disease, instead of scoring a zero match, this scores as a match but less so than if the match was with the specific terms. The method specifically finds the most informative common ancestor of the two different clinical features, and uses the IC of that term. Being a more general term, it will be a feature of more diseases and so have a lower IC. (In the database, ancestor terms are implicitly annotated when child terms are annotated.) The overall strength of match is the average of all the ICs: there will always be an IC for each feature, even if it is just that they are both a feature of “any disease”, which of course has an IC of zero and would bring down the average.
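The following Python sketch illustrates this scoring on a toy example. The ontology, the term names and the four disease annotations are all hypothetical and far smaller than the HPO, and the code is our reading of the description above rather than the authors' implementation. Each query term is matched to the disease term with which it shares the most informative common ancestor, and the resulting ICs are averaged.

```python
import math

# Hypothetical toy ontology: each term maps to its parent terms.
PARENTS = {
    "any feature": [],
    "eye movement abnormality": ["any feature"],
    "motor abnormality": ["any feature"],
    "horizontal gaze palsy": ["eye movement abnormality"],
    "vertical gaze palsy": ["eye movement abnormality"],
    "spasticity": ["motor abnormality"],
}

# Hypothetical disease annotations (direct annotations only).
DISEASES = {
    "disease A": {"horizontal gaze palsy"},
    "disease B": {"vertical gaze palsy"},
    "disease C": {"spasticity"},
    "disease D": {"spasticity"},
}


def ancestors(term):
    """The term itself plus every term above it in the hierarchy."""
    found = {term}
    for parent in PARENTS[term]:
        found |= ancestors(parent)
    return found


def information_content(term):
    """-ln of the fraction of diseases annotated by the term, counting the
    implicit annotation of ancestors whenever a child term is annotated."""
    annotated = sum(
        1 for terms in DISEASES.values()
        if any(term in ancestors(t) for t in terms)
    )
    return -math.log(annotated / len(DISEASES))


def term_similarity(a, b):
    """IC of the most informative common ancestor of two terms."""
    return max(information_content(t) for t in ancestors(a) & ancestors(b))


def oss(query, disease_terms):
    """Average over query terms of the best match among the disease's terms."""
    best = [max(term_similarity(q, d) for d in disease_terms) for q in query]
    return sum(best) / len(best)


query = ["horizontal gaze palsy"]
for name, terms in DISEASES.items():
    print(name, round(oss(query, terms), 3))
```

In this toy setup an exact match scores highest, a match through the shared parent "eye movement abnormality" scores lower, and a match only through the root term scores zero.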

Summary of the Paper

The presented paper, Clinical Diagnostics in Human Genetics with Semantic Similarity Searches in Ontologies by Köhler et al. (Am J Hum Genet. 2009 Oct 9; 85(4): 457–464), describes a further refinement of the method using a statistical treatment. For a given disease, if random clinical features from the HPO were selected one would expect a lower OSS score than for a patient who actually had the disease. If the OSS for random features were repeated many times, a distribution would be created and so one could then look at the real patient OSS and determine a p-value on this distribution. If the real OSS was higher than 95% of the random OSS scores, the p-value would be lower than 0.05 and indicate a likely match. Furthermore, if the same features were compared with different diseases and their random OSS distributions, a ranking of the likelihood of diseases could be determined by ranking the corresponding p-values. They call this the OSS-PV.
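The p-value step can be sketched as a simple Monte Carlo procedure in Python. This is an illustration of the idea rather than the authors' implementation. The function `empirical_p_value`, the 20-term vocabulary, the disease term set and the stand-in scoring function (a plain count of shared terms, used here in place of the OSS to keep the example self-contained) are all hypothetical.

```python
import random


def empirical_p_value(observed, score_fn, all_terms, query_size,
                      n_samples=10000, seed=0):
    """Fraction of random queries of the same size that score at least
    as highly against the disease as the real query did."""
    rng = random.Random(seed)
    at_least = 0
    for _ in range(n_samples):
        random_query = rng.sample(all_terms, query_size)
        if score_fn(random_query) >= observed:
            at_least += 1
    return at_least / n_samples


# Hypothetical vocabulary of 20 terms and a disease annotated by 4 of them.
all_terms = [f"term{i}" for i in range(20)]
disease_terms = {"term0", "term1", "term2", "term3"}


def overlap_score(query):
    """Stand-in for the OSS: number of query terms annotated to the disease."""
    return sum(1 for term in query if term in disease_terms)


patient_query = ["term0", "term1", "term2"]
p = empirical_p_value(overlap_score(patient_query), overlap_score,
                      all_terms, len(patient_query))
print(p)
```

Repeating this against every disease in the database and sorting by p-value would give the kind of ranking the authors describe.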

Since they considered it too onerous to enter, within the framework of the terms of the HPO, the clinical features of real patients with known diseases, they used simulated patients. This was done for 44 diseases, where they created a “patient” having a disease with a selection of the clinical features of the disease weighted by how commonly those features were found in that disease. For each disease 100 patients were created, so if from the clinical literature a feature is found in 1% of cases with the disease, 1 of the 100 simulated patients would have that feature.

They added “noise” to the process by adding to the patients some random features that were not part of the disease, and “imprecision” to the process by replacing some features with their HPO parent terms.

Then they looked at the rank position of the true disease among all the 5000 or so database diseases found by the different methods. The closer the rank position to the true position (first!), the better the method performed.

Unsurprisingly, the performance of the feature vector method, as shown by box plots of rankings for all 44 diseases tested, was found to suffer when imprecise terms were used, because that was the point of using the ontological system. The OSS-PV method more modestly outperformed the raw OSS method when noise and imprecision were added.

As the authors point out, the OSS method potentially suffers from the fact that it only matches query terms with disease terms. If a disease also had many terms that did not match the query terms, surely the overall match would be less specific. This can be taken into account by performing a symmetrical similarity search, where the OSS is the average of the matches of the query to the disease and the matches of the disease to the query. However, they did not use this method in their presented data, only stating that when they used it the symmetrical OSS-PV still significantly outperformed the feature-vector method. They do not state that it still outperforms the symmetrical raw OSS.

Another point raised by the paper is that if one finds on a disease search that no disease fits the features with a p-value less than 0.05, exploration could be made of other clinical features, or child features of the entered clinical features that would have a higher information content and provide a more significant match. Going back and looking for a specific feature, or performing a specific investigation, would be an example of this.

Journal Discussion

As described in the introduction, any attempt to quantify and rationalise differential diagnosis should be lauded and this paper clearly describes progressive refinements of this process. It is almost negligent to have all the data available on thousands of diseases and not to use them because the unaided human mind simply cannot store so much information.

However, a number of further refinements and limitations present themselves.

First, the matching of terms is still semantic rather than systematic. While a knowledge-based approach, it nevertheless does not rely on understanding of disease pathophysiologies and pathognomonic features. Some clinical features that share a close parent may in fact best distinguish diseases rather than be considered loosely positively associated features. This may apply particularly in neurology where there is a more systematic approach. For example, upper motoneurone lesion and lower motoneurone lesion may be considered together and share a common parent in “motor neurone lesion”, but apart from the case of motoneurone disease, they split the differential diagnosis more than upper motoneurone lesion and no motor lesion at all. They are semantically similar but nosologically opposite. Horizontal supranuclear gaze palsy and vertical supranuclear gaze palsy may share a strong information content parent, but may be the feature that best separates Gaucher disease from Niemann-Pick disease.

This leads to the second point. The frequency, or sensitivity, of a clinical feature in a disease is not considered, although ironically considered when creating the simulated patients with the 44 tested diseases. In large part this reflects the lack of clinical data in the databases themselves. It is regrettable that case reports are not combined into case series which contain information on the frequencies of occurrence of clinical features, or when there are case series, these data are not actually collected systematically. If a clinical feature occurs in 1% of cases of one disease and 100% of cases of another disease, clearly the annotation of the feature for the second disease should be considered far stronger than for the first. Instead, because there are no such data, they are given equal weight; the weighting only considers whether or not the feature is also found in a number of other diseases, not how commonly it is found in those diseases.

There is no consideration of how common the disease is in the first place. While restricting themselves to rare and genetic diseases by definition, there can be a frustrating tendency for searches to throw up the least likely diagnosis. It is often the case in practice that the clinician does not know in advance that the patient has a rare genetic disease, and a diagnostic tool should be most useful to those with least intimate knowledge of the database. Thus, when entering the features dystonia, spastic hemiparesis and spastic dysarthria in a case of cerebral palsy, it comes as a surprise when the top diagnosis is cleft palate-lateral synechia syndrome.

Finally, the methods assume that clinical features are independent. In fact, many clinical features are strongly interdependent and tend to occur together. The presence of the second feature is not really very informative if the first is already present. This problem would be common to most forms of differential diagnosis calculators, including those using Bayesian methods, and could only be solved if there were data on the interdependence of clinical features in different diseases; currently it is hard to find even raw frequency data for most diseases.

The point that the authors raise about using their App to find features that would be more specific in making a diagnosis is an interesting one, and opens a new approach to diagnosis and refinement of the process of often expensive and sometimes risk-associated investigation. One could imagine the improvements in medical care that would arise from use of an App that gave a differential diagnosis based on initial clinical information and then showed the relative power of different investigations in narrowing that differential.

A further use of these methods would be in creating diagnostic criteria. While clinical practice is rightly focused on the most likely diagnosis in a patient, clinical research is focused on a group of patients where the diagnosis is certain, i.e. specificity at the expense of sensitivity. Currently, diagnostic criteria seem to be set largely by “workshops” – gatherings of the great and the good usually in an exotic location who draw up a list of features, create two categories of importance and then decide how many features are required for a “definite diagnosis”. Using a quantified method such as that described in this paper for every study patient and including only patients where the diagnosis reaches a threshold p-value score would seem to be a far more reliable method.

The paper on which this journal club article is based was presented by Dr John McAuley, Consultant and Honorary Senior Lecturer in Neurology at Queens Hospital, Romford.



Coronavirus is obviously not a neurological disease, apart from an isolated case report of associated encephalitis, which is to be expected very rarely with viral infections. But because it is so topical, the paper Clinical Characteristics of Coronavirus Disease 2019 in China, published in haste in the New England Journal of Medicine on 3rd March 2020, was nevertheless presented.


A novel enveloped RNA virus of coronavirus type, similar to SARS coronavirus, was first identified as causing viral pneumonia in early December 2019 and named as Covid-19 disease. It is believed to have first been transmitted through livestock in a large market in Wuhan, Hubei province. It is thought in general that such viruses are endemic in wildlife, such as in bats, and mutate to become transmissible to other animals and to humans.

As of Friday 6th March, there were 100,645 confirmed cases worldwide, and 3411 deaths linked to the virus. There were 55,753 cases who had recovered. In Hubei province, for the first day since the outbreak no new cases had been reported.

The details are unclear, but the fact that the UK government is said to be moving from a containment to a delay phase suggests that at least some UK cases have been identified that appear to have had no contact with potential sufferers in China, Iran, Italy or other hotspots, nor with other UK individuals known to have the disease.

Journal Club Article

The paper discussed is an early report focusing on numbers affected, initial outcomes and clinical presentation. It was approved by the Chinese authorities.

Data were sourced from records of laboratory confirmed cases using assay of nasal and pharyngeal swabs between 11th December 2019 and 29th January 2020. Only certain hospitals were sampled, so data were by no means collected from all cases. In all, 14.2% of all known hospitalised cases were included in the study. It is not clear how widely the population was screened by these laboratory tests; all the patients in this study were hospitalised.

26% of these cases had not had contact with Wuhan residents, indicating widespread serial human to human transmission.

Clinical information is as follows:

  • Incubation period (presumably from ascertaining the likely time of exposure) was a median of 4 days (interquartile range 2 to 7 days).
  • Fever in only 44% on admission, but developed later.
  • Cough in 68%.
  • Viraemic symptoms occurred in some patients, but upper respiratory tract symptoms, lymphadenopathy and rash were very rare.
  • CT chest abnormalities were very common (86%) in both mild and severe cases.
  • Lymphopaenia was common (83%).
  • Only 1% of cases were under 15 years old.

Of these hospitalised cases, 926 were considered mild and 173 severe. The main factors predicting this were advanced age and comorbid disease (especially coronary heart disease, diabetes, COPD and hypertension), also breathlessness at 38% versus 15% (unsurprisingly as this would be a criterion for severity). Similarly, inflammatory markers and markers of multi-organ involvement were associated with more severe disease. The main complicating feature of severe cases was acute respiratory distress syndrome, occurring in 16%.

In severe cases there was a 25% risk of the composite outcome of intensive care admission, mechanical ventilation or death (death alone 8%). Only 0.1% of cases categorised as non-severe died. The overall death rate was 1.4%. The national statistics at the time had a death rate of 3.2%.

By the data cut-off point, 95% of mild cases and 89% of severe cases were still hospitalised; the median lengths of hospital stay were 11 and 13 days respectively. Perhaps mild cases were hospitalised for purposes of isolation.

Journal Club Discussion

The paper reports likely ascertainment bias from milder cases not being tested. Nevertheless, the scale of the morbidity and mortality of the disease is not underestimated. Ascertainment bias becomes more relevant if one expects a pandemic and most of the population to become exposed. By these means the population risk can be inferred.

The paper also reports the fact that many patients were still in hospital, and perhaps very unwell, by the study’s end point. In the study, the number of cases requiring intensive care treatment is three times the death rate. Perhaps the death rate of already infected cases may climb. On the other hand, ARDS, the major serious complication of coronavirus infection, has a mortality of around 40%, and since 16% had this condition and 8% died, perhaps few more would be expected to die.

There does appear to be an opportunity for more information to be gleaned from these data or similar studies. The large number of cases could be randomised to treatments not yet shown to be effective, such as oseltamivir, steroids and intravenous immunoglobulin. Fewer than half of cases had these treatments, but the numbers were nevertheless appreciable. It would have been helpful to know the death rates for patients who did or did not have these treatments rather than only the end point rates, as in reality some of these treatments might be most relevant when patients have already reached the ITU admission end point.

A follow up study would give better indicators of important epidemiological issues such as ultimate death rates and morbidity, the possibility of reinfection versus lasting immunity and any signs that more recently infected cases, where transmission has been via several human hosts, have any milder disease than those directly exposed to the transmitting animals.

A population based study that tested all individuals in high risk areas would determine the likely proportion of individuals who have been infected but not become very symptomatic.

Worldwide, we would also want to know how ambient temperature and sunlight levels affect transmissibility.

One suspects that epidemiologists in charge of advising governments have more information than is released to the public, and various advanced tools to model infection spread, but from the recent explosion of cases in Italy and now elsewhere, where talk is of delay rather than containment, there is little confidence that the slowing up of cases in China is going to be replicated worldwide.

From the death rates reported in Italy, there appears to be no clear evidence that the disease is becoming milder, but from the delay of many days from exposure to developing critical illness, perhaps it is too early to tell.

The lack of cases in hot or southern hemisphere countries would suggest a seasonal effect of the virus, and some reassurance to northern hemisphere countries approaching Spring. But in Australia there were already 40 cases confirmed by 4th March and at least three cases had had no recent foreign travel and no traceable contact.

It seems that one scenario for the UK is that the infection eventually replicates that of Hubei province, which has a similar population to the UK and had around 11,000 cases with few new cases to come, and with around a 1-3% mortality rate, mainly in the elderly and infirm for whom ‘flu’ is also a significant source of mortality. With around 20% of cases classed as severe, this would require an extra 2000 of some form of high dependency inpatient beds for several days and spread over only a month or two.

However, we do not have an explanation for the slowing of new infection rates in China. It could be that most of the local population has already been exposed and most were resistant to severe symptoms, or it could be that containment measures have been very effective. If the latter is the explanation and is in reality only delaying inevitable spread through the population, or if containment is not replicated to the same degree in Western countries and if there is no seasonal dip in transmission, one could imagine hundreds of thousands of cases in the UK spread over the next year. And with a current mortality rate seemingly up to 3% this is unlikely to drop when there are insufficient hospital resources to manage such numbers.

The paper on which this journal club article is based was presented by Dr Bina Patel, Specialist Registrar in Neurology at Queens Hospital, Romford.

Anticonvulsant Medications for Status Epilepticus

Status epilepticus is a medical emergency with significant morbidity and mortality and, in circumstances where benzodiazepines alone have failed to terminate seizures, has traditionally been treated with anticonvulsants such as phenytoin or phenobarbitone. Other intravenously administered antiepileptics have also been found to be effective.

There is a lack of comparative data on different anticonvulsants and this blinded prospective study “Randomised Trial of Three Anticonvulsant Medications for Status Epilepticus” by Kapur et al. (2019) compares three options: fosphenytoin (a prodrug of phenytoin which is more expensive but more soluble, can be given intravenously faster with fewer extravasation problems, and can also be given intramuscularly), valproate and levetiracetam.

Study Details

Patients in the study had to be over 2 years of age, and had to have convulsive status (persistent or recurrent convulsions) for at least 5 minutes, and then more convulsions between 5-30 minutes after an adequate dose of benzodiazepine (5 minutes to have allowed the benzodiazepines to work and less than 30 minutes, after which point another dose of benzodiazepines could have been tried instead). Patients were randomised by stratifying for age.

Patients with major trauma or anoxia, etc., were excluded, as were pregnant women (give levetiracetam and consider magnesium).

The doses of the intravenous anticonvulsants levetiracetam (60 mg/kg) and valproate (40 mg/kg) seemed very high.

The primary successful outcome was absence of clinical seizure activity and improved responsiveness at 60 min after infusion start.

Analysis was based on assuming equal prior probability of success for the three treatments, then using the binomial probability of positive or negative outcome to calculate the posterior probabilities. An iterative method was then used from these three separate probabilities to calculate the probability that a given treatment was better than the other two, or worse than the other two.
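The kind of calculation described above can be sketched as follows. This is only an illustration under stated assumptions, not the trial's actual analysis: it assumes a uniform Beta(1, 1) prior for each drug's response rate, uses Monte Carlo sampling in place of whatever iterative method the authors used, and the patient counts are hypothetical.

```python
import random


def prob_best_and_worst(outcomes, draws=20000, seed=0):
    """Estimate P(best) and P(worst) for each treatment.

    outcomes maps a treatment name to (successes, failures).
    Assumes a uniform Beta(1, 1) prior, so the posterior for each
    response rate is Beta(1 + successes, 1 + failures).
    """
    rng = random.Random(seed)
    names = list(outcomes)
    best = dict.fromkeys(names, 0)
    worst = dict.fromkeys(names, 0)
    for _ in range(draws):
        # Draw one plausible response rate per treatment from its posterior.
        sample = {n: rng.betavariate(1 + s, 1 + f)
                  for n, (s, f) in outcomes.items()}
        best[max(sample, key=sample.get)] += 1
        worst[min(sample, key=sample.get)] += 1
    p_best = {n: best[n] / draws for n in names}
    p_worst = {n: worst[n] / draws for n in names}
    return p_best, p_worst


# Hypothetical counts: three drugs with identical observed response rates.
equal = {"drug_a": (47, 53), "drug_b": (47, 53), "drug_c": (47, 53)}
p_best_equal, p_worst_equal = prob_best_and_worst(equal)

# Hypothetical counts: one drug clearly better than the other two.
skewed = {"drug_a": (90, 10), "drug_b": (50, 50), "drug_c": (50, 50)}
p_best_skewed, p_worst_skewed = prob_best_and_worst(skewed)
```

With identical observed response rates, each drug's probability of being best settles near one third, which is the pattern of the per-protocol results reported below.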

The sample size was set on the basis of correctly identifying with 90% probability a difference when one treatment was 15% better than the other two (65% response for the best and 50% response for the other two).

A total of 400 patients were enrolled. The intention to treat population was only 384 because some patients were enrolled more than once. Nearly a third of patients were then excluded because treatment did not follow the protocol, e.g. not status epilepticus such as functional seizures, did not receive the correct amount of benzodiazepine or anticonvulsant or wrong timing with respect to benzodiazepine.

Half the patients were unblinded to avoid suboptimal management.

In the per-protocol population, 47% of patients responded to each of the three treatments, with probability of most effective treatment distributed as follows: levetiracetam (0.34), fosphenytoin (0.35), valproate (0.31). There was also an “adjudicated population” outcome, which was perhaps based on an adjudicator clinician looking retrospectively at the notes, whether following the protocol or having had previous treatment or not, and deciding if the treatment worked. Although the data were similar, it did seem that levetiracetam may have been worse: its probability of being the least effective treatment was 0.51, versus 0.29 and 0.2 for the other two. The gap of 31 percentage points between 0.51 and 0.2 (valproate) is more than their threshold of meaningful difference of 15% for best treatment.

Secondary outcomes included requirement for admission to ICU (87% for levetiracetam and only 71% for valproate).

Regarding safety, there were 4.7% deaths in the levetiracetam group and 1.6% in the valproate group, with fosphenytoin in the middle. Life-threatening hypotension, a known issue with phenytoin, occurred in 3.2% of the fosphenytoin group but only 0.7% for levetiracetam and 1.6% for valproate. Cardiac arrhythmia only occurred in one patient. Acute respiratory depression occurred in 12.8% with fosphenytoin and 8% with levetiracetam and valproate. None of these differences reached significance.

The conclusion was that there was no difference between the drugs.

Journal Club Discussion

The study was welcome as it was on an important practical topic. The group wondered about the high doses used, and whether our own guidelines should reflect these doses. The trial was powered for the primary efficacy outcome and then stopped. However, it was always going to be as likely that any differences between the drugs would lie in their side effects as in their efficacy, and it is a shame that the powering did not reflect this, so that what may have been real differences in respiratory depression or hypotension never reached significance.

The vagaries of statistics are illustrated by the per-protocol efficacies, which seem identical, and the adjudicated population efficacies, where the probability of levetiracetam being the worst drug was actually 31 percentage points higher than that of valproate.

Negative study results always make us turn to how the study was powered: were there no differences seen because there are no differences, or because too few patients were studied (i.e. too low power)? When powering a study, a judgement must always be made on what level of difference would be considered meaningful; otherwise, if any difference were accepted as meaningful, it would require an infinite population to prove there is no difference. They chose a meaningful 15% difference for one drug being better than the other two, but if they had chosen one drug worse than the other two, the 31 percentage point difference in the adjudicated population would have been more than their set level. There should have been more explanation of their adjudicated population, and perhaps more explanation of the advantage of using Bayesian probabilities in addition to a simple comparison of means and standard errors of success rates.

In real practice, there should perhaps be tailoring of treatment to the patient. If a patient is already on therapeutic levels of phenytoin, is more of the same going to be the best choice? If a patient is a female of child bearing potential, is valproate the best choice when the patient often ends up on the oral equivalent of the status treatment they received? On reviewing the data in this study and knowing that the levetiracetam dose was very high, valproate might shade the other two choices, especially in men.

The Journal Club on which this article is based was presented by Dr Katie Yoganathan, SpR in Neurology at Queens Hospital, Romford.

Galcanezumab in Chronic Migraine

Migraine is one of the most common neurological conditions, and chronic migraine is a condition that, while less common than episodic migraine, is nevertheless a major cause of loss of quality of life in otherwise well individuals.

Once analgesic overuse headache has been effectively treated, and tension type headache excluded, chronic migraine is treated with migraine preventative medications, often very effectively. However, there is a proportion of patients who remain resistant to single or combination preventative treatments.

A novel target for migraine treatment is the calcitonin gene related peptide (CGRP) receptor on the smooth muscle of blood vessels in the head. CGRP is released from trigeminal ganglion efferents to the blood vessels to cause potent vasodilation as part of the trigeminovascular response (analogous to the “triple response” of pain, redness and swelling of skin inflammation). Blocking the receptor may therefore block this response. Monoclonal antibodies raised against the receptor, or against CGRP itself, have been explored as migraine treatments.

This study describes a double blind trial of galcanezumab, one such monoclonal antibody targeting CGRP. The paper does not discuss the relative hypothetical or actual benefits versus other monoclonal antibody migraine therapies already marketed or in development.

Study Design

Around 270 patients were given each of two doses of galcanezumab by monthly subcutaneous injection, and 560 were given normal saline placebo. To be enrolled on the study, patients had to have 15+ headache days per month, at least 8 of which had to be migraine days. They needed at least 1 headache free day per month. If a patient failed >3 other preventatives, they were excluded. Before the study, patients had to stop all their existing migraine preventatives except propranolol or topiramate at least 30 days before study start.

Migraine days were defined as >30 minutes of migraine or probable migraine according to ICHD-3 beta criteria (even though the duration criterion of the latter is 4+ hours). If a patient thought it was a migraine and it did not satisfy the criteria but responded to a triptan, that also counted as a migraine day.

Over 90% of patients completed the study. Only 15% of patients were on topiramate or propranolol (not specified if this was the same proportion in the three treatment groups).

The primary outcome measure was migraine days per month. At the start of treatment, this was around 19 days. Placebo reduced this by 2.7 days per month, low dose galcanezumab by 4.8 days and high dose by 4.6 days. Therefore, compared to placebo, the drug on average reduced migraine by 2 days per month. There were only about 2 extra non migraine headache days per month on average.

There were many secondary measures. Of note, 4.5% of placebo patients had a 75% reduction in migraine days, compared with 7% of low dose and 8.8% of high dose patients, while 0.5% of placebo patients had a 100% response, compared with 0.7% of low dose and 1.3% of high dose patients (not significantly different).

There was no overall quality of life measure, but there was a migraine related quality of life measure that showed significantly more improvement, about 25% more improvement than placebo. There was a patient global disease severity 7 point scale, where there was a 0.6 point improvement from placebo, and 0.8 for low dose and 0.9 for high dose, only the latter reaching significance.

The side effect profiles were similar between placebo and drug, notably common in both groups! However, there were no concerning side effects, nor indeed any characteristic enough to tend to unblind the patients or investigators.

Journal Club Discussion

The Journal Club thought it was strange that the study would exclude the very patients in whom the drug would mainly be used, namely those who had failed >3 conventional treatments. The focus was clearly on maximising benefit as measured by the study. By the same token, patients had to stop any preventatives before the study, even if they were partially beneficial, apart from topiramate and propranolol.

It was furthermore strange that only 15% of the recruited patients were on the two most common treatments for chronic migraine. Had they only been tried on the others, or had they had side effects? In real practice, there are usually at least some marginal benefits from preventatives and patients often remain on them.

It is therefore possible that many patients were treatment naïve as far as preventatives were concerned. This makes the 2 fewer migraine days per month vs placebo (from an initial 19 days per month) an all the more modest magnitude of benefit.

It is difficult to reconcile the cost of the drug with the fact that patients on average will still have 15 migraine days a month. Most patients would not consider this a treatment success, and certainly not such that a patient would happily be discharged from specialist care. In terms of patients having a 75%+ reduction in migraine days, generally the minimum level of meaningful benefit in a pain study, the excess over placebo was only 3-4% of patients.

The lack of a general quality of life measure means that cost benefit analysis cannot be performed. The quality of life measure used was specific for migraine and likely to show much larger differences; a cured migraine sufferer might have a near 0% to 100% swing on this scale, but another individual considering the range from death to total disability to perfect health might assign curing migraine only a swing from 90% to 100%.

A major aspect of migraine care is what happens when treatment is stopped. Patients do not want lifelong medication, let alone lifelong monthly injections. Fortunately we find that after six months of treatment, traditional preventatives can often be withdrawn. Although the study mentioned that there was an open label period and then a wash out period, we do not know any of these results; presumably they are to be held back for another publication. Is there rebound migraine on treatment withdrawal? Any funding body would want to know if the patients would likely need the treatment for 3-6 months or for many years.

As a final point, it was queried whether the definition of migraine is sufficiently specific; perhaps this limits the observed benefit in this and similar studies. Some headaches recorded as migraine may be tension type headache and therefore not responsive to specific anti-migraine treatment. The table below shows the relevant criteria.

ICHD-3 Headache Diagnostic Criteria

Probable Migraine | Probable Tension Type Headache | Definite Tension Type Headache
2+ of: | 2+ of: | All of:
4-72 hours duration | 30 min to 7 days duration | 30 min to 7 days duration
2+ of: unilateral; pulsating; moderate+ severity; avoid routine physical activity | 2+ of: bilateral; pressing or tightening; moderate- severity; not aggravated by routine activity | 2+ of: bilateral; pressing or tightening; moderate- severity; not aggravated by routine activity
Nausea, or photo plus phonophobia | No nausea; not both phono and photophobia | No nausea; not both phono and photophobia


A headache is diagnosed as a migraine if it fits probable migraine and is not a better fit with another headache diagnosis, which presumably means definite rather than probable tension type headache. The severities and durations overlap so they cannot distinguish. One of photophobia or phonophobia overlaps. So a unilateral, pressing headache with avoidance of routine activity, with no nausea, no photophobia and no phonophobia, is classified as migraine as long as it lasts 4 hours, but it seemed that some of the migraine days were half an hour of headache. Also a headache not satisfying these criteria is a migraine if there is a response to triptans, but we have seen the large placebo response already from the main data. In general practice a tension type headache might be unilateral, and might interfere with routine activity if at the more severe end of the scale; certainly a neck ache or jaw (including temporalis muscle) ache from which a tension headache may arise may have these features.

The paper on which this Journal Club article is based was presented by Dr Piriyankan Ananthavarathan, Specialist Registrar in Neurology at Barking, Havering and Redbridge University Hospitals Trust.

Disease Modifying Therapies in Multiple Sclerosis: Background for General Readers

Multiple sclerosis (MS) is a presumed autoimmune condition of demyelination and often inflammation of the central nervous system. Its evolution is very variable; some patients suffer episodes lasting weeks to months with complete or near complete recovery in between, and the periods between episodes may span months to decades (relapsing remitting MS). Other patients accumulate progressive disability as a result of or between episodes (secondary progressive MS). Still other patients, around 10% in total, do not suffer episodes but instead undergo a gradually progressive course with variable rapidity, but usually noticeable over the course of months to years (primary progressive MS). Patients with MS can evolve from one category to another; some in fact at a certain point remain clinically stable indefinitely.

For many decades, its immune basis has prompted trials of various immunomodulatory agents to try to reverse or at least arrest the progression of multiple sclerosis. Some have been shown not to work, e.g. corticosteroids, immunoglobulin. Some work but have largely been overtaken by newer, more expensive, therapies. For example, azathioprine is a traditional commonly used immunosuppressant and in a Cochrane review was found to reduce relapses by around 20% each year for three years of therapy, and to reduce disease progression in secondary progressive disease by 44% (though with wide confidence intervals of 7-64%). There were the expected side effects but no increased risk of malignancy. However, it remains possible that there could be a cumulative risk of malignancy for treatment durations above ten years. In the 1990s, beta-interferon became widely used but was never compared directly with azathioprine. With the 21st century came the introduction of “biological therapies”, typically monoclonal antibodies against specific immune system antigen targets. There has also been a reintroduction of non-biological therapies originally used to treat haematological malignancy or to prevent organ transplant rejection.

These new therapies, called disease modifying therapies, as opposed to symptomatic treatments or short courses of steroids for relapses, are now conceptually, though not biochemically or mechanistically, divided into two groups: those better tolerated or with fewer risks of causing malignancy or infections but less effective, and those with more risk of cancer and serious infection, including reactivation of the JC virus to cause fatal progressive multifocal leukoencephalopathy, but with greater efficacy.

The former group includes beta-interferons, glatiramer acetate and fingolimod. Fingolimod is an agent derived, like ciclosporin, from the toxins of fungi that parasitise insects and has the convenience of oral administration, but is now not routinely recommended because of severe relapses on withdrawal, and cardiac and infection risks. The latter group includes the biological agents natalizumab (which targets a cell adhesion molecule on lymphocytes), rituximab and ocrelizumab (which target CD20 to deplete B-cells) and alemtuzumab (which targets CD52 expressed on more mature B and T cells) and the oral non-biological anti-tumour agent cladribine, which is activated by deoxycytidine kinase and thus interferes with DNA synthesis. Another non-biological oral agent, dimethyl fumarate, acts as an immunomodulatory rather than immunosuppressive agent and sits somewhere between the two groups, having oral administration convenience and better efficacy than the first group, but also possessing the increased PML and Fanconi renal syndrome risk of the second group.

Recent studies indicate that higher strength DMTs may slow disability progression in secondary progressive MS, as well as reduce the number of relapses. There have also been trials in primary progressive MS but these, most notably using rituximab, were not clearly positive. For a study looking at ocrelizumab on primary progressive MS, see the accompanying Journal Club review.


Cost of Disease Modifying Therapies

The disease modifying therapies are extremely expensive and, given MS is unfortunately not a rare disease, have a significant impact upon the health economy.

For example, in relation to the accompanying paper review of ocrelizumab for primary progressive MS, this drug is not really expensive compared to similar medications, having a list price of £4790 per 300 mg vial, with four infusions a year. There are many further costs associated with imaging, screening, monitoring and admission for infusions.

Normally, cost effectiveness is justified at around £35,000 per Quality-Adjusted Life Year (QALY). This means the cost would be justified at £35,000 a year if each year it gave 100% quality of life to patients who would otherwise die or have zero quality of life. Clearly ocrelizumab does not do that; it appears to preserve at least 0.5 or 1 out of 10 on a disability scale in 6% of patients on an ongoing basis, giving a quality of life benefit per patient of very roughly 0.6% and a cost per QALY estimate of over £3 million. Of course, there are other considerations such as wider health economy costs of disability, the fact that some patients might have been prevented from deteriorating by more than 1 point on the EDSS, and the potential costs of monitoring for and treating cancer and PML complications in a relatively young patient population even after treatment is stopped. Note that there was actually no significant difference in this study in the SF 36, with both groups remaining surprisingly little changed after about 2 years, which probably fits with the 0.6% mean improvement figure calculated above.
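The rough estimate above can be reproduced in a few lines. This is a back-of-envelope sketch only: it assumes one 300 mg vial per infusion, counts drug cost alone, and treats a 1-point preservation on a 10-point scale as a 10% quality of life gain; none of these assumptions comes from a formal health economic model.

```python
def cost_per_qaly(annual_cost, qaly_gain_per_year):
    """Annual cost divided by the mean quality-adjusted life years gained per year."""
    if qaly_gain_per_year <= 0:
        raise ValueError("QALY gain must be positive")
    return annual_cost / qaly_gain_per_year


# Figures quoted in the text.
vial_price_gbp = 4790          # list price per 300 mg vial
infusions_per_year = 4         # assumed: one vial per infusion
annual_drug_cost = vial_price_gbp * infusions_per_year

# 6% of patients preserve about 1 point out of 10 on the disability scale.
fraction_benefiting = 0.06
gain_if_benefiting = 1 / 10
mean_gain = fraction_benefiting * gain_if_benefiting   # about 0.006, i.e. 0.6%

estimate = cost_per_qaly(annual_drug_cost, mean_gain)
print(f"Annual drug cost: £{annual_drug_cost:,}")
print(f"Cost per QALY: about £{estimate:,.0f}")
```

Adding imaging, monitoring and infusion admission costs would push the figure higher, while counting patients protected from larger deteriorations would pull it lower.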

If the NHS, or the health economies of other countries, do not consider a tighter subset of primary progressive patients who might respond better, it is difficult to balance this with other medical, or indeed social care, conditions that require resourcing.

Ocrelizumab versus Placebo in Primary Progressive Multiple Sclerosis

Recent studies indicate that higher strength disease modifying therapies (DMTs) may slow disability progression in secondary progressive multiple sclerosis (MS), as well as reduce the number of relapses. There have also been trials in primary progressive MS but these, most notably using rituximab, were not clearly positive. For a more general review, please see the post Disease modifying therapies in multiple sclerosis.

The study being reviewed in this post, by Montalban et al., 2019 is on rituximab’s sister compound, ocrelizumab, and targets younger patients with more active disease, which seemed to be a subgroup that might have responded to rituximab.

Study Design

There were 732 patients randomly assigned to ocrelizumab or placebo in a 2:1 ratio. Inclusion criteria were a diagnosis of primary progressive MS according to established criteria and age 18 to 55 years. Their disability had to range from moderate disability but still no walking impairment to impaired walking but able to walk 20m, perhaps with crutches (EDSS 3.0 to 6.5). The disease duration had to be within 10-15 years. They should never have had any relapses.

Pairs of ocrelizumab or placebo infusions were given every 24 weeks for at least five courses. The main end point was the percentage of patients with disability progression, defined as an increase of at least 1 point on the EDSS scale sustained for 12 weeks, or 0.5 points at the more disabled end of the scale.

Only if this primary end point was reached would the study be continued to test secondary end points such as 24 week sustained disability progression, timed walk at week 120, change in volume of MRI brain lesions, and change in quality of life on the SF36 score.


Patients had a mean disease duration of around 6 years, and 3% more patients having ocrelizumab had gadolinium enhancing lesions on MRI (27% versus 24%).

39.3% of placebo patients had increased disability sustained for a period of 12 weeks, and only 32.9% of ocrelizumab patients (p=0.03, relative risk reduction 24%). This was similar when confirming sustained disability over 24 weeks.
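For orientation, the raw proportions can be turned into the usual effect measures. This sketch uses only the two percentages quoted above. Note that it gives a relative reduction of about 16%, not 24%; the quoted 24% presumably derives from a time-to-event (hazard ratio) analysis, which this simple calculation does not attempt to reproduce.

```python
def risk_summary(control_rate, treated_rate):
    """Return absolute risk reduction, relative risk reduction and number needed to treat."""
    arr = control_rate - treated_rate
    if arr <= 0:
        raise ValueError("treated rate must be lower than control rate")
    rrr = arr / control_rate
    nnt = 1 / arr
    return arr, rrr, nnt


# Proportions with 12-week confirmed disability progression, from the text.
arr, rrr, nnt = risk_summary(control_rate=0.393, treated_rate=0.329)
print(f"Absolute risk reduction: {arr:.1%}")
print(f"Relative risk reduction (raw proportions): {rrr:.1%}")
print(f"Number needed to treat: {nnt:.1f}")
```

On these figures, roughly 16 patients would need treatment over the trial period for one to avoid confirmed progression.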

On the timed walk, there was a mean 39% slower performance after 120 weeks in patients on ocrelizumab and 55% slower in patients on placebo (p=0.04). There was no difference in quality of life (SF36 – physical component; a 0.7 out of 100 deterioration on ocrelizumab and 1.1 out of 100 on placebo).

There were three potentially relevant deaths in the ocrelizumab group (out of 486 patients), two from pneumonia and one from cancer, and none in the placebo group, but the overall rate of serious infections was not really different. Cancer rate was 2.3% versus 0.8%, but obviously this would have to be monitored over further decades. Even during one year of open label extension there were two further cancers in the ocrelizumab group. The overall rate of neoplasms to date is 0.4% per 100 patient years, double the baseline rate, but this reflects a short time in a large number of patients.

In summary, a modest reduction in disability was seen on ocrelizumab, namely preserving against 0.5 to 1 point loss on the EDSS scale in 6% of patients.

Journal Club Discussion

We focused mainly on the figure (see below) where it seems that ocrelizumab stopped about 5% of patients deteriorating in the first 12 to 24 weeks, from about 9% down to 4%, and then this difference was maintained throughout until the end of the trial where about 60% of patients still had not deteriorated. The plateau at 3-4 years is probably because of the end of the trial (see below), not a stable MS population.


The journal club were surprised at the focus on a 12 week primary end point. Patients would have progressed from zero to 3-6 out of 10 on the EDSS scale over a mean period of 6 years, yet they were apparently measuring progression of 0.5 to 1 point over just three months. This impression arose from some confusion over the phrase in the paper describing the primary end point as “percentage of patients with disability progression confirmed at 12 weeks”, and then in the results “percentage of patients with 12-week confirmed disability progression (primary end point) was 32.9% with ocrelizumab versus 39.3% with placebo.” It might seem that the primary end point was recorded at 12 weeks following treatment initiation. In fact the primary end point was recorded at the end of the study, which was stopped after over 2 years when a predefined proportion of patients had deteriorated. It means that over 2+ years, 32.9% of patients had a deterioration that was sustained over at least 12 weeks, i.e. not a relapse.

The graph shows the numbers of patients remaining without disability progression at different times, starting at 487 and dropping to 462 at 12 weeks for ocrelizumab, a fall of 5.1% of patients, and from 244 to 232 for placebo, a fall of 4.9%. Then at 24 weeks, this was 7.6% versus 13.1%. Some of the dropouts might be due to stopping from tolerability, but this was a small amount, possibly accounting for the small numbers of drop-outs between assessments every 12 weeks. For a 12 week confirmed disability progression, clearly there will be a lag in identifying patients whose increase in disability is sustained for 12 weeks. It seems that the time points do not add this 12 weeks because there is a first jump at 12 weeks in both groups. However, these numbers drop down to zero, not to the 60% of patients that appear not to have dropped out. This is likely to be because of patients dropping out because they started the study later and the study was terminated for them before 216 weeks. Nevertheless, factors such as drop outs due to tolerability and end of study probably explain the difference between the figures in the results and the plateau levels on the graphs.

What is interesting is that the difference between ocrelizumab and placebo diverged very early on the graph, and not really further over 2 years. While the 12-week sustained disability was designed to eliminate the possibility that the study is scoring relapses in previously primary progressive disease, or some other temporary factor such as injury from a fall or intercurrent infection, there is nevertheless a suspicion that ocrelizumab was mainly working well on a small subset with more active disease. The extra 3% with gadolinium enhanced lesions – a proportional difference of about 12% – unfortunately suggests a potential issue with randomisation; this might precisely be the group who could respond better.

It is noteworthy therefore that in its most recent NICE appraisal, the criteria for considering ocrelizumab are not those in this study, but a subset of primary progressive patients with enhancing disease on MRI imaging.

The journal club article described in this post was kindly presented by Dr Bina Patel, Specialist Registrar in Neurology.

Detection of Brain Activation in Vegetative State by Standard Electroencephalography

This paper by Claassen et al., 2019 looks at EEG pattern changes in response to verbally given movement commands to see if there is a subset of vegetative state patients who are cognitively responsive and yet who have no motor response. The hope is that this might predict eventual outcome.

The study took 104 patients who had had acute brain injury. Most (85%) had non traumatic brain injury, which in general carries a more predictably bad prognosis. These patients were either in a vegetative state or in a somewhat better minimally responsive state, e.g. localising to pain but not obeying commands.

The EEG testing was performed within a few days of initial ITU referral.

In a trial, a patient was asked eight times to open and close their hand repeatedly for 10 seconds and then relax their hand for 10 seconds while recording ongoing EEG activity. Two second time blocks were analysed in the frequency domain by calculating the power spectral density (PSD), looking at the relative strength of signal in each EEG lead in four different frequency ranges (delta, theta, alpha and beta).
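To make the frequency-domain step concrete, here is a minimal sketch of relative band power for a single 2-second block from one EEG lead. It is not the paper's pipeline: the band boundaries in Hz are conventional values assumed for illustration, the spectrum is a plain periodogram from a naive discrete Fourier transform, and the input is a synthetic sine wave rather than real EEG.

```python
import cmath
import math

# Assumed (conventional) band boundaries in Hz; the paper's exact limits may differ.
BANDS = {"delta": (1, 4), "theta": (4, 8), "alpha": (8, 13), "beta": (13, 30)}


def band_powers(samples, sampling_rate):
    """Relative spectral power in each band for one block of samples."""
    n = len(samples)
    mean = sum(samples) / n
    centred = [x - mean for x in samples]  # remove the DC offset
    powers = dict.fromkeys(BANDS, 0.0)
    for k in range(1, n // 2 + 1):
        freq = k * sampling_rate / n
        # Naive DFT coefficient for frequency bin k.
        coeff = sum(x * cmath.exp(-2j * math.pi * k * t / n)
                    for t, x in enumerate(centred))
        power = abs(coeff) ** 2 / n
        for name, (low, high) in BANDS.items():
            if low <= freq < high:
                powers[name] += power
    total = sum(powers.values())
    return {name: (p / total if total else 0.0) for name, p in powers.items()}


# Synthetic 2-second block: a pure 10 Hz (alpha range) sine sampled at 128 Hz.
rate = 128
block = [math.sin(2 * math.pi * 10 * t / rate) for t in range(2 * rate)]
relative = band_powers(block, rate)
print(relative)
```

In a real recording, one such set of four values per lead per 2-second block would form the feature vector passed to the classifier.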

A “machine learning algorithm” was used to distinguish the “move” PSDs from the “stop moving” PSDs.

Patients were considered to show EEG activation if the algorithm consistently showed a significantly greater than chance (0.5) level of ability to distinguish the “move” command from the “stop moving” command.

Outcome was determined by the standard Glasgow Outcome Scale after 12 months, with values >=4 (being able to be left up to 8 hours alone) defined as a good outcome.

Ultimately, patients who had at least one record showing EEG activation had a 44% chance of good outcome as defined above, whereas only 14% of patients without EEG activation had a good outcome (with 5% missing data).

Journal Club Discussion

Some of the patients were under some sedation for safety reasons, which could influence their responsiveness in a more reversible manner unrelated to their brain injury and also affect their EEG, although this would be unlikely to affect the change in pattern of EEG over several seconds, other than through the patient’s genuine response level.

It might have been worthwhile to record surface EMG of the forearm flexors, just to confirm there was no difference in EMG activity between “EEG activation” patients and those with no EEG change. In a patient with critical illness neuromyopathy, a little movement or muscle activation might not easily be seen.

Because patients were just taken consecutively, rather than being matched according to their coma severity, there could be poor matching, and this was indeed present: the patients who were subsequently found to be “EEG responsive”, and eventually to have a better outcome, were less likely to be in the worst comatose category at initial enrollment (50% vs 55%) and more likely to be in the best category (31% versus 23%). Although the odds ratios were not statistically significant, a non-significant result is not positive evidence that the groups had the same initial severity.

In fact, if one stratified patients according to the initial three clinical severity categories, would that have more powerfully predicted better outcome than “EEG responsive” or not, making the test redundant?

On technical appraisal of the methodology, it seems that the power spectral densities were individual 2-second blocks, with all the comparisons and averaging being done subsequently by the machine learning pattern recognition algorithm.

Statistically, the paper used the single value of the area under the curve (AUC) of the receiver operating characteristic (see below). This means that across a range of sensitivities (or true positive (TP) rates, where the algorithm correctly decides that there is enough of a difference between the “move” and “stop moving” patterns), there is a corresponding range of false positive (FP) rates. How convex the curve describing this range is relates to how good the test is. A value of 1 means perfect classification, 0.5 is just random (the straight diagonal in the figure below), and 0 means the pattern change is actually reliably identifying the stop pattern when it was supposed to identify the move pattern.

[Figure: ROC curves. Source: “Receiver operating characteristic”, Wikipedia]
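The three landmark AUC values described above (1, 0.5 and 0) can be illustrated with a minimal sketch. It uses the standard rank interpretation of the AUC: the probability that a randomly chosen “move” epoch receives a higher classifier score than a randomly chosen “stop moving” epoch. The scores are made up for illustration, and this is not necessarily how the paper's software computed the value.

```python
def auc(positive_scores, negative_scores):
    """Area under the ROC curve via pairwise comparison.

    Counts the fraction of (positive, negative) pairs in which the
    positive example scores higher; ties count as half a win.
    """
    wins = 0.0
    for pos in positive_scores:
        for neg in negative_scores:
            if pos > neg:
                wins += 1.0
            elif pos == neg:
                wins += 0.5
    return wins / (len(positive_scores) * len(negative_scores))


# Made-up classifier scores, for illustration only (not data from the paper).
perfect = auc([0.9, 0.8, 0.7], [0.3, 0.2, 0.1])   # "move" always scores higher
chance = auc([0.2, 0.8], [0.4, 0.6])              # no systematic separation
inverted = auc([0.3, 0.2, 0.1], [0.9, 0.8, 0.7])  # "move" always scores lower

print(perfect, chance, inverted)
```

The inverted case is worth noting: an AUC near 0 is as informative as one near 1, because the classifier is reliably separating the two patterns and has merely labelled them the wrong way round.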

This is shown in their fig. 3 (below), which seems to show the AUC values for each of the 5 “move” 2-second samples (hence the varying level across each peak and trough) followed by each of the 5 “stop moving” samples, with the whole thing repeated 8 times. However, they say that the graph is shown “for descriptive purposes only”, so we do not know how it relates to the real data! We do not know if these are actual averages for all the controls, all the EEG responsive patients (which they call cognitive motor dissociation (CMD)) and all the non-EEG-responsive patients. If they are averages, they would have to be across all the first 2-second epochs and then all the second 2-second epochs, etc.

[Figure: fig. 3 of the paper]

Where this is important is that although the algorithm provides a discrete yes-no answer, the confidence of this answer is a continuous variable, and there is a suspicion that this confidence level may fall into a continuous range with healthy volunteers at one end and the most unresponsive EEG patient at the other, rather than there being three discrete modal peaks of normal, EEG responsive and EEG unresponsive. If the former, the inevitable variability about a single mode makes the test far less useful as a predictor of outcome in individual patients. At best, it could be an independent predictor that, combined with other predictors, could build up a reasonably confident prognosis.

A major issue with patients in a vegetative state is when to withdraw support. In the UK, in patients with non-traumatic acute brain injury, persistent vegetative state is defined as such around 3 months after injury, and this is the time when conversations along these lines may be had, on the basis that if the patient has not “woken” by this time, the chance that they will eventually do so with a reasonable quality of life becomes remote. No-one is ever going to think about withdrawing support at 6 days post-injury on the basis of an “EEG unresponsive” result.

This Journal Club post was presented by Dr Rubika Balendra, Specialist Registrar in Neurology at Barking Havering and Redbridge University Hospitals NHS Trust.



Double-Blind Double-Dummy Randomised Study of Continuous Intrajejunal infusion of Levodopa-Carbidopa Intestinal Gel in Advanced Parkinson’s Disease

Background

Levodopa, a pro-drug of dopamine, has been used successfully to treat symptoms of Parkinson’s disease for fifty years and remains the mainstay of medical management. However, after years of treatment, with increasing loss of dopaminergic presynaptic terminals, symptomatic control may become more brittle, with sudden and unpredictable “on” and “off” treatment times during the day, or with involuntary movements called dyskinesia. There are theoretical reasons, and some animal model and clinical evidence, why intermittent oral delivery of levodopa may increase susceptibility to these problems through unphysiologically wide fluctuations in synaptic dopamine; unfortunately the plasma half-life of levodopa after an oral dose is as little as an hour. As a result, other long acting medicines have been introduced, but they may come with other side effects and are simply not as powerful as levodopa.

Relatively steady state levels of levodopa can be achieved by direct intrajejunal delivery. Unfortunately, levodopa is not stable in solution and the gel used to keep levodopa in suspension in a form that can be delivered is very expensive to produce. A year’s treatment in the UK was estimated by NHS England in 2015 to cost around £28,000. As a result, despite there now being substantial evidence of the treatment’s effectiveness, there has been a debate about its cost effectiveness. Calculations of the cost effectiveness in terms of cost per quality-adjusted life year (QALY) gained vary considerably. The calculations depend not only on the cost of treatment versus standard treatment and the difference in quality of life, but also on carer costs and other costs. So if a treatment is less effective, the patient may be more disabled and cost more. It is unclear, however, how figures on cost of disability can be applied to an estimate of how much less effective the treatment is at all points of the severity scale. As far as I am aware there is no actual study showing how much is saved in non-medication costs in patients on levodopa-carbidopa intestinal gel (LCIG); the information is instead extrapolated.
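To make the dependence on carer costs concrete, here is a minimal sketch of the usual cost per QALY calculation (the incremental cost-effectiveness ratio). Only the £28,000 annual LCIG cost comes from the text above; every other figure is a hypothetical placeholder chosen purely to show the arithmetic, not an estimate from any study.

```python
def cost_per_qaly(cost_new, cost_standard, qaly_new, qaly_standard):
    """Incremental cost per QALY gained (new treatment versus standard)."""
    delta_qaly = qaly_new - qaly_standard
    if delta_qaly == 0:
        raise ValueError("No QALY difference: the ratio is undefined.")
    return (cost_new - cost_standard) / delta_qaly


# Annual figures in pounds. Only LCIG_DRUG_COST is taken from the text.
LCIG_DRUG_COST = 28_000
ORAL_DRUG_COST = 2_000        # hypothetical placeholder
LCIG_CARE_COST = 10_000       # hypothetical placeholder (carer and other costs)
ORAL_CARE_COST = 14_000       # hypothetical placeholder
QALY_LCIG, QALY_ORAL = 0.6, 0.5  # hypothetical placeholders

# Counting medication costs only.
drug_only = cost_per_qaly(LCIG_DRUG_COST, ORAL_DRUG_COST, QALY_LCIG, QALY_ORAL)

# Counting medication plus carer and other costs.
with_care = cost_per_qaly(
    LCIG_DRUG_COST + LCIG_CARE_COST,
    ORAL_DRUG_COST + ORAL_CARE_COST,
    QALY_LCIG,
    QALY_ORAL,
)

print(f"Cost per QALY, drug costs only:      £{drug_only:,.0f}")
print(f"Cost per QALY, including care costs: £{with_care:,.0f}")
```

Whatever the true values, the structure shows why the estimates vary so much: the result is a ratio with a small denominator, so modest changes in the assumed QALY gain or in the extrapolated care savings move the answer a long way.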

In one sense, the QALY gain might be counted twice; once for the intrinsic value of the gain in quality of life, and again for the reduction in disability that resulted in the improved quality of life. In another, this might be a fair way to handle such analysis compared to a treatment that improved quality of life without reducing disability cost.

It is important in such calculations to use reliable data on the magnitude of benefit gained, rather than just to show that there is a gain. This is best achieved by a randomised controlled study with a control arm and is exemplified by the study of Olanow et al., the subject of this journal club.

Study Design

Sixty-six of sixty-eight candidate patients underwent the trial. Patients were selected on the basis of having idiopathic Parkinson’s disease (IPD) for five or more years, having optimised therapy (meaning a trial of levodopa, a dopamine agonist and one other type of anti-parkinsonian therapy), at least three hours of “off” time daily, and no clinically significant psychiatric abnormalities.

At first, I assumed that the trial was a cross-over design; in fact it was not. Patients all had jejunostomy procedures but were randomised to LCIG plus placebo oral levodopa, or placebo LCIG plus oral levodopa. They were assessed after a four week stabilisation period before intervention, and then 12 weeks afterwards. Then the two groups were compared.

Patients who were on CR preparations or COMT inhibitors were switched to equivalent immediate release preparations. The LCIG dose was the same as the total daily levodopa dose, delivered over 16 hours of the waking day in the normal fashion for jejunal delivery.

Study Findings

On looking at the graph, labelled figure 2B in the MS, it is immediately obvious that both LCIG patients and oral patients improved very dramatically and then levelled off, despite previously being “optimised” on oral therapy. Our possible suspicions about what “optimised” means are confirmed. As explained by the authors, the doctors had the opportunity to increase the LCIG or oral levodopa during the study, and this was done in a number of cases after the 4 week stabilisation period. In fact the oral medication patients had their medication dose increased more (a mean of 250 mg daily versus 100 mg daily). Despite this, neither group had an increased “on” time with troublesome dyskinesia.


The main message of the study is that after the 12 weeks, the improvement was greater with LCIG, with a mean of around 1.9 hours less “off” time and 1.8 hours more “on” time without troublesome dyskinesia. I suppose if there is no change in “on” time with dyskinesia, it is obvious that the two values will be similar, as one state is replaced by the other.
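The accounting behind that last remark can be written out as a small sketch. It assumes (my assumption, not something stated in the paper as summarised here) that the diary divides a fixed waking day into exactly three states: “off”, “on” without troublesome dyskinesia, and “on” with troublesome dyskinesia. If the total is fixed, the three changes must sum to zero.

```python
def implied_change_on_with_dyskinesia(delta_off, delta_on_without):
    """Change in 'on' time with troublesome dyskinesia implied by the other two.

    Assumes the three diary states add up to a fixed waking day, so the
    three changes (in hours) must sum to zero.
    """
    return -(delta_off + delta_on_without)


# Between-group differences quoted in the text (hours per day).
delta_off = -1.9          # less "off" time on LCIG
delta_on_without = 1.8    # more "on" time without troublesome dyskinesia

residual = implied_change_on_with_dyskinesia(delta_off, delta_on_without)
print(f"Implied change in 'on' time with troublesome dyskinesia: {residual:+.1f} h")
```

The small residual is within what rounding of the two reported means would produce, which fits the finding that “on” time with troublesome dyskinesia did not increase.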

Regarding quality of life, there was an 11 point versus 4 point improvement in PDQ-39 (a PD quality of life measure). This seems quite important.

Strangely, on the UPDRS there was an improvement in part II (activities of daily living) on LCIG and a worsening on oral, but actually twice as much improvement in part III (motor examination measured in the “on” state) on oral therapy. Possibly this means that there is a subtle side effect of oral therapy, increased during the trial, that adversely affects wellbeing, but the increased “hit” of levodopa made their best “on” state better than with LCIG.


It is not clear how the withdrawal of COMT inhibitors made patients in either treatment arm suboptimally treated and therefore in need of increased treatment during the study. It would be important to ascertain if by chance the oral arm had had more COMT inhibitors withdrawn.

The main advantage of this study is that having the control arm at least allows us to appreciate that optimised does not really mean optimised. The patients were clearly underdosed; one has to wonder how much better the oral patients could have been if there had been the opportunity to optimise them properly by adjusting top up dopamine agonists, adjusting the frequency rather than just the dose quantities and by introducing, reintroducing or optimising COMT inhibition. After all, studies on COMT inhibitors show a reduction in “off” time of about an hour compared to baseline “optimised” therapy.

A parsimonious interpretation of the data is that LCIG simply has better bioavailability than oral levodopa: the patients were underdosed, switching to LCIG delivered an effectively stronger dose, and the same benefit could have been replicated by giving more oral treatment. In fact this may well have been the case, explaining the 150 mg more levodopa per day given to oral patients, but the facility to change doses during the study meant that its effect would have been minimised.

While the power of the study was easily enough to demonstrate a clinically meaningful difference, I wonder if a cross-over design might have allowed intra-patient comparisons and a clearer effect, and eliminated or elucidated the improvement effect from oral therapy. In this design, each patient would have placebo LCIG for half the time, and placebo oral for the other half. The direction of change at the cross-over point would be the key parameter. The patients’ doses would be matched at this cross-over point, and then not changed over the second half. This design would be confounded by a bioavailability effect, but that effect could at least be measured by the increase in oral dosing during the first half, and there might be an overdose effect of switching from oral to LCIG during the second half of the trial.

Studies looking at the cost effectiveness of LCIG should primarily take data from those like this one, rather than those that use an open label design showing an improvement compared to baseline “optimised” therapy of four hours “off” time reduction. The increased benefit in PDQ shown in this study is nevertheless quite persuasive that there is some real helpful feature of continuous intrajejunal delivery, at least in the short term.

There are other studies that show long term benefits of LCIG but they have not had the same design. Obviously, this design conducted over too long a period would not be ethical; presumably the principle is that all patients after 12 weeks would be offered LCIG, having already had their PEJ tubes inserted. On the other hand, in a longer term study, one would hope that every ongoing effort would be made to optimise therapy in the oral therapy group.

In practice, one must balance benefit versus side effects. Not all patients will want a PEJ tube, or to carry a large cartridge and pump. Virtually all patients had side effects, more serious ones in 13-20%. In 3% the treatment was discontinued as a result of surgical complications, 24% had tube dislocations, 21% insertion complications, 10% stoma complications, 8% pump malfunctions and 7% peritoneal problems. There are reports of neuropathy from LCIG but in this study there were three possible cases in the placebo group and only one in the treatment group.

Finally, LCIG is not the only advanced therapy available. There are no direct comparisons between LCIG and deep brain stimulation or apomorphine pump therapy to guide which treatment to select in individual patients, although the different inclusion and exclusion criteria do provide some help in choosing which therapy is appropriate for which patient. For example, age over 70 and history of depression exclude deep brain stimulation but not LCIG.
