Inhaled Levodopa


Levodopa was the first major, and arguably still the best, treatment for Parkinson’s disease. Being similar in structure to tyrosine, it is absorbed through the gut by an amino acid transporter; its half-life in the blood is short, at 0.7 to 1.4 hours. This short half-life helps to limit peripheral dopaminergic effects, which dopa decarboxylase inhibitors reduce further.

The fact that the clinical action of levodopa lasts far longer than its plasma half-life is presumably due to presynaptic storage and controlled release as dopamine. It is no wonder, then, that as the disease progresses and the remaining presynaptic terminals are increasingly lost, control of PD with oral levodopa becomes more brittle, and the clinician resorts to frequent dosing, longer acting agonists, delayed release preparations and even continuous intrajejunal delivery.

This review article, Profile of inhaled levodopa and its potential in the treatment of Parkinson’s disease, describes a further new delivery method: inhaled levodopa. This route avoids the variability of gastric emptying and competition with dietary amino acids for absorption.

CVT301 inhaled levodopa is a complex of molecules in a large low density porous particle that can aerosolise and avoid phagocytic destruction in the lungs. The particles are readily absorbed through the alveolar membranes. The Arcus inhaler works by delivering the particles in dry powder form and requires the patient only to breathe in rather than also to activate a pump.

Phase I study

In a Phase I study in healthy volunteers (who also took oral carbidopa), conducted after safety trials in animals, plasma levodopa concentrations rose within 10 minutes of inhalation.

Phase II Study

In this study, patients with Parkinson’s disease and significant off periods had a median Tmax (time to maximum concentration) of 15 minutes versus 66 minutes for oral levodopa, with clinical improvement (finger tapping) within 5 minutes and lasting 90-100 minutes.

Cough occurred as a side effect in 25%; it was described as mild and often settled after the initial dose. There was less dyskinesia with 50 mg inhaled than with 100 mg oral, but the study did not quantify the relative benefit on finger tapping tasks, so perhaps the oral dose was simply larger, with more benefit and more side effects.

There were no differences in lung function between oral and inhaled groups after 4 weeks.

Phase III Study

A 12 week double blind, placebo controlled study randomised 339 patients with motor fluctuations to placebo, a 35 mg dose or a 50 mg dose. Chronic lung disease was a contraindication.

Mean change in motor UPDRS at 12 weeks, from pre-treatment to 30 minutes after treatment, was -9.83 versus -5.91 for placebo (p = 0.009). There was no reduction in off time according to patient diaries; 85% of subjects completed the study. Side effects comprised cough (14.5% versus 1.9% for placebo), URTI (6.1% v 2.7%), nausea (5.3% v 2.7%), sputum discolouration (5.3% v 0%) and dyskinesia (3.5% v 0%). Only two subjects discontinued due to cough. No statistically significant differences in FEV1 or lung diffusion capacity were found (but is this evidence of no effect on the lungs, or merely no evidence of an effect?).

Long-term studies

In a longer term open label study of 408 subjects, the treatment arm used a mean of 2.3 doses a day. After dosing at four weeks into the study, subjects experienced mean UPDRS III changes of -5.7 at 10 min, -12.0 at 20 min, -15.5 at 30 min and -16.1 at 60 min; at 60 weeks the equivalent changes were -5.0, -11.5, -15.3 and -14.8.

In another study, FEV1 and diffusion capacity for carbon monoxide were not statistically different (!). Dyskinesia was worse: 5.5% versus 3.1% in the standard of care group. Two of (I think) 100 subjects discontinued due to cough.

In a study of other subjects with morning off periods, those receiving 50 mg inhaled levodopa turned on in 25 minutes versus 35 minutes on placebo, and an on event was 35% more likely than after placebo.

The C max of the higher strength (50 mg) is 500 ng/ml while 100 mg oral achieves 700-1000 ng/ml. As the authors comment, there are no head to head comparisons of inhaled versus oral.

The authors conclude that inhaled levodopa is an option for rescue therapy.

Journal Club discussion

The data demonstrate only biological efficacy, though it does seem fast acting. There are no data on the quantitative effect versus oral levodopa, or even on the speed of action versus oral levodopa, dispersible levodopa or subcutaneous apomorphine. There appears to be no carbidopa or benserazide in the preparation, so patients will need to take oral levodopa at around the same time, or at least to have had a decent amount fairly recently.

On a Google search in March 2023, the cost is $1223 for 60 capsules of what is stated to be 45 mg, though I think this is equivalent to the 35 mg dose in the study. This might seem to be about a month’s supply, but no: one dose is two capsules. Patients took an average of two doses a day, and the company specifies a maximum of five doses a day, i.e. ten capsules, which is $204 a day!

In comparison, oral levodopa is $13 for 90 100 mg tablets. The 100 mg tablet appears to be up to double the strength of the higher-strength inhaled dose, which we presume is two capsules. So the cost of inhaled levodopa is over 500 times that of standard treatment. It seems almost provocative not to attempt a comparison with standard care when it is so much cheaper. We can surmise that inhaled levodopa might work about 15 minutes more quickly. The magnitude of benefit is demonstrated biologically, but there is no comparison with oral. Measurements were made when the patients were rendered off, to maximise the benefit, rather than in a real world scenario of unpredictable sudden offs. Would plasma concentrations as little as half that of 100 mg oral really rescue such patients?
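As a back-of-envelope check of the arithmetic above, using the prices quoted in the text (the per-dose and per-day figures are derived from those prices, not from any official source):

```python
# Cost comparison using the prices quoted above (March 2023 search).
inhaled_pack_price = 1223.0                         # USD for 60 capsules
inhaled_dose_price = inhaled_pack_price / 60 * 2    # one dose = 2 capsules

oral_pack_price = 13.0                              # USD for 90 x 100 mg tablets
oral_tablet_price = oral_pack_price / 90

max_daily_cost = 5 * inhaled_dose_price             # company maximum: 5 doses/day
print(f"inhaled, maximum daily cost: ${max_daily_cost:.0f}")   # ~$204

# If one 100 mg oral tablet is roughly double the strength of an inhaled
# dose, a strength-matched comparison uses half a tablet per dose.
ratio = inhaled_dose_price / (oral_tablet_price / 2)
print(f"strength-adjusted cost ratio: {ratio:.0f}x")           # >500x
```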

Regarding safety, there are one year data. No worsened lung function was demonstrated, but this is not evidence of no side effects unless the study was adequately powered and the type II error quantified. We have many examples of preliminary data not showing dangers revealed subsequently by post-marketing surveillance.

Posted in Parkinson's Disease

Revolution in acute ischaemic stroke care: a practical guide to mechanical thrombectomy


Stroke is the most common cause of disability in Western countries, and its lifetime risk is up to 25%. While managing acute stroke patients in hyperacute stroke units is regarded as benefiting short and long term outcome, specific therapeutic options are limited. The first major option for treatment of ischaemic stroke was intravenous thrombolysis, paralleling its earlier development in acute myocardial infarction. However, while use in the latter indication was widespread in the 1990s, it has only been widely used for stroke in the last ten years. This is probably because of the narrower therapeutic window and the more severe consequences of haemorrhagic complications in the brain. In addition, its benefits are actually relatively modest. In randomised clinical trials of use within three hours (bearing in mind that in the first hour a stroke often spontaneously recovers – termed a TIA), using NICE data (June 2007), 15% of patients have an improved outcome, but with a 7% higher risk of brain haemorrhage and a 1.7% rate of major haemorrhage considerably worsening the stroke (≥4 points on the National Institutes of Health Stroke Scale). When delivered between 3 and 4.5 hours after stroke onset, the benefits were not even clearly significant.

So it is not surprising that there has been a move, just like in cardiology a decade or two earlier, away from intravenous thrombolysis and towards direct intra-arterial catheter treatment. This article discusses this new treatment and the ramifications for delivery of such a service.

The paper, Revolution in acute ischaemic stroke care: a practical guide to mechanical thrombectomy, summarises recent evidence in favour of this treatment and the infrastructure required to manage patients in this way.

The Procedure

While the first such devices were approved for use in 2004, studies published since 2010 on new generation devices, reflecting technical developments and the improved expertise that comes with experience, show major improvements in outcome. The HERMES collaboration meta-analysis revealed that 46% of patients had a good outcome with functional independence (grades 0-2 on the modified Rankin scale) compared with 26.5% on best medical treatment. Most patients in both groups received intravenous thrombolysis, since in most study protocols patients had iv thrombolysis before going on to thrombectomy an hour or so later. Mortality and the risk of brain haemorrhage did not differ between the two groups. The benefit appeared to persist in patients over 80, and in patients who did not receive iv thrombolysis, though the numbers here were smaller. While the window for thrombectomy was within 6 hours, there may still be improved outcomes up to 7.3 hours after symptom onset; in general, faster intervention leads to greater benefit.

The procedure involves a number of variations depending on the Neuroradiologist and the particular nature of the thrombus. It may be done under general anaesthesia or local anaesthesia and sedation with anaesthetic support. A large gauge catheter is directed to the internal carotid via a femoral puncture, and an intermediate catheter inside it is directed to the Circle of Willis. Then a microcatheter inside that serves as a guide wire to the actual clot. The microcatheter is then removed and a stent retriever is placed within the clot, and pulled back to draw the clot to the intermediate catheter. Suction is applied to this catheter to remove the clot entirely. Some techniques involve directly removing the clot by suction on the intermediate catheter. A balloon may be located on the distal end of the clot to prevent forward movement. When removing the clot reveals a tight lumen, there is the further option to perform angioplasty or stenting to open the vessel. The same can apply to a more proximal carotid stenosis occurring in tandem with the more distal thrombus.


The main complications are technical, including vessel perforation (1.6%), other symptomatic intracranial haemorrhage (3-9%), subarachnoid haemorrhage (0.6-5%), arterial dissection (0.6-3.9%) and distal emboli (1-9%). In addition, there can be vasospasm or issues related to the puncture site. While the total incidence of complications is 15%, there is not always an actual adverse clinical consequence.

While the time window for thrombectomy is wider than for intravenous treatment, the other selection criteria are stricter:
  • There should be a documented anterior circulation large vessel occlusion of the middle cerebral or carotid artery. (There is only limited evidence for efficacy in basilar occlusion.)
  • There should be good collateral cerebral circulation.
  • There should be relatively normal extracranial arterial anatomy from the technical viewpoint of passing the catheter.
  • There should be significant clinical deficit at the time of treatment; this parallels the criteria for intravenous treatment, though a large vessel occlusion with minimal clinical deficit nevertheless incurs a significant risk of clinical deterioration.
  • There should be a lack of extensive early ischaemic change on CT (a threshold of 5 on the ASPECTS score). The role of more advanced imaging, e.g. CT perfusion to establish salvageable brain, is yet to be clarified.
  • Consideration should be given to pre-stroke functional status and the potential of benefit.
  • Patients should have had iv thrombolysis within 4.5 hours of symptom onset.

Testing for COVID-19 Infection

Accurate testing for SARS-CoV-2 infection, by which we mean testing with few false positives as well as few false negatives, is important not only for clinical management of individual cases but for epidemiological case tracing, limiting spread of infection and informing public health strategies. In the latter situations, tests that are rapid, cheap and easy to perform are particularly desirable.

Two main forms of testing for SARS-CoV-2 infection are in use.

  • Antigen testing, which includes the self-administered immunochromatographic lateral flow test (LFT), detects viral coat material and is developed by raising a specific antibody against the antigen target. It therefore measures active production of the viral protein that constitutes the antigen. Developing such a test requires knowledge of how the virus behaves in the host and relies upon generating a sensitive and specific antibody for use in the test.
  • Molecular tests, such as polymerase chain reaction (PCR), loop-mediated isothermal amplification (LAMP) and clustered regularly interspaced short palindromic repeats (CRISPR) tests amplify viral RNA material. These tests are potentially specific for different variant mutations of SARS-CoV-2. The gold standard test is considered to be lab based reverse transcription viral PCR (RTPCR). Rapid testing, such as the LAMP test, skips some of the time-consuming stages of formal PCR and therefore is useful for screening and epidemiology.

Many different studies have reported on the performance of different brands of rapid antigen and molecular test. This article discusses a Cochrane review of diagnostic test accuracy that collected data from 64 studies that investigated 16 different antigen tests and five different rapid molecular tests. In all there were 24087 samples. Of these, 7415 were found on the subsequent gold standard RTPCR test to be positive. No study actually included clinical diagnosis of infection as a standard or criterion of infection.

Antigen tests had a sensitivity in symptomatic cases of 72% (95% CI 63.7 to 79%). The sensitivity was higher in the first week of infection (78.3%) than in the second week (50%). The overall specificity was 99.6% in both symptomatic and asymptomatic subjects (obviously, if symptomatic, one wonders what those with false positive results actually had).

Analysed a different way, the positive predictive value (PPV) was only 84-90% using the best-sensitivity brands at 5% population prevalence, and at 0.5% prevalence in asymptomatic people the PPV was only 11-28%!

Molecular tests had an average sensitivity of 73% (66.8 to 78.4%) and specificity was 97.2 to 99.7% depending on brand. In most trials the samples were collected in labs rather than in real life conditions. There are no data about symptoms.

For reference, WHO considers >80% sensitivity and >97% specificity an appropriate test as a replacement for a lab test.

The authors note that in low prevalence settings the dramatically reduced PPV means that confirmatory tests may be required, and that data are needed for the settings in which tests are intended to be used, as well as on user dependence (i.e. self-testing).

Journal Club Conclusions

Why the huge difference between positive predictive value and specificity?

Specificity = 1 minus false positive rate (FPR)

FPR = false positives/all actual negative cases (i.e. true negatives plus false positives)

So high specificity means false positives are few compared with true negatives

PPV = true positives/all positive results (i.e. true positives plus false positives)

So high PPV means false positives are few compared with true positives

The difference between specificity and PPV is essentially that specificity is measured against the total number of actual negative cases, while PPV is measured against the total number of positive test results. PPV can be much worse than specificity if true negatives are much more common than true positives. This happens when infection rates in the population are very low.
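The point can be checked numerically. The sensitivity and specificity below are the symptomatic antigen figures quoted in the review; the prevalences are illustrative:

```python
def ppv(sensitivity, specificity, prevalence):
    """Positive predictive value via Bayes' rule."""
    true_pos = sensitivity * prevalence
    false_pos = (1 - specificity) * (1 - prevalence)
    return true_pos / (true_pos + false_pos)

# The same test (72% sensitive, 99.6% specific) at falling prevalence:
# the PPV collapses even though specificity never changes.
for prev in (0.31, 0.05, 0.005):
    print(f"prevalence {prev:6.1%}: PPV = {ppv(0.72, 0.996, prev):.1%}")
```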

Is it disingenuous to quote specificity based on trials where infection rates were 31% when in the real world the infection rates are perhaps two orders of magnitude lower than that? If someone wanted to argue that they were being denied going to work, going to school or going on holiday based on a test where the predictive value was only 11-28%, would they have a good case? Is it worthless as a screening tool? If the policy is to go on to a molecular test if the result is positive, is this also invalid if the PPV of the molecular test is similarly low?

Clearly, the use of tests should be tailored to the information that can be provided. In an outbreak of high prevalence, one wants a sensitive test to pick up people who might have infection after contact tracing. One could have a specific screen and lab test only those negative to make completely sure. The priority is not to miss positive cases.

If, medicolegally, one wants to prove workers were not the source of a case or outbreak when the prevalence of infection is low, one may as well go straight to lab testing, as there are too many false positives in such a situation.

In a situation where someone wants to question a positive result, it is not clear that rapid molecular testing is superior to antigen testing, since its specificity is not clearly higher than that of some antigen tests, and a lab based PCR may again be necessary.

There are also biological as well as statistical issues. For example, antigen tests may theoretically have false positives if the nature of the generated antibody to a viral coat antigen is not clear. If the trial was done in the summer time with no winter flu or coronavirus common colds in a setting where one in three subjects have COVID and none has any other type of infection, is the generated antibody really shown to be specific for SARS-CoV-2? The same may apply to demographic factors, such as expecting the test to have the same performance in children or in care homes as in a healthy adult test population.

On the other hand, an antigen test may correlate better with actual infectivity than detection of a bit of residual RNA, and therefore be more biologically useful for epidemiological control.

Finally, in extremely high prevalence trial populations where there is no actual clinical corroboration, the absolute reliance upon lab PCR as a gold standard may be of concern.

Posted in Infectious Diseases

Distinguishing Encephalitis from Encephalopathy


Encephalitis may be defined as infection or inflammation of the brain substance, resulting typically in disturbed sensorium and perhaps seizures or focal neurological deficits, and sometimes pyrexia. Encephalopathy, on the other hand, represents disturbed sensorium not due to an infective or inflammatory cerebral process; its causes range from toxins and drugs to metabolic upset, non-cerebral sepsis, cerebral hypoperfusion and post-ictal states.

Distinguishing the two is important because considerable morbidity and mortality is associated with delayed antiviral, antibiotic or immune therapy for encephalitis, and with delayed treatment of the various other causes of encephalopathy.

The paper presented, “To what extent can clinical characteristics be used to distinguish encephalitis from encephalopathy of other causes? Results from a prospective observational study” by Else Quist-Paulsen et al., attempts to use clinical and rapidly available investigatory findings to distinguish the two conditions by a prospective observational study on 136 patients.

They identified candidate patients on the basis they had a lumbar puncture, and then excluded those with no evidence of encephalopathy. Their criteria for encephalitis were:

  • Pyrexia
  • Encephalopathy > 24 hours with no other cause identified and 2 of:
  • CSF WCC ≥5 × 10⁶/l
  • New onset seizures
  • New onset focal neurological findings
  • CT/MRI consistent with encephalitis
  • EEG consistent with encephalitis

The gold standard by which to gauge their test would surely be a definitive diagnosis but, as is commonly the case in clinically suspected encephalitis, such a diagnosis was only made in 10 of 19 patients. In some of the patients with non-encephalitis encephalopathy, the diagnosis was also vague, e.g. “aseptic meningitis” (which could be encephalitis), “epilepsy” (which could be autoimmune encephalitis), “headache/migraine”, “unspecified disorientation or coma”.

Subsequent analysis of specific features in the two groups then becomes somewhat difficult, because the criteria themselves become the gold standard and because some specific features were themselves criteria. Interestingly, systemic features of infection such as raised blood white cells or CRP argued against encephalitis, because general sepsis was a common cause of encephalopathy. Nausea and personality change were more common in their encephalitis group.

They used ROC curves to look at the predictive value of these specific features and their combinations, but these were again based against their “testing variable”, their criteria, not on some objective gold standard. It would have been better to look at them only in the 10 diagnosed cases rather than all 19, but then the total number of cases would be even lower.

The diversity of diagnoses among their cases was interesting, especially that Lyme disease and TB were as common as VZV and more common than HSV. Only one of their cases had NMDA receptor antibodies, but we do not know whether all the patients had this test and a full battery of other autoimmune antibody tests. Many might have been put in the encephalopathy-with-seizures category. Since encephalitis can be associated with meningism, some “aseptic meningitis” patients might have been viral but with negative testing, or even autoimmune with a migrainous headache and stiff neck.

The group felt that the study was very worthwhile, but a clearer guide as to which cases of encephalitis warrant antimicrobial or immune therapy would be the clear goal. This would require clarity on the gold standard diagnosis and many more patients.

The Journal Club discussion on which this post is based was presented by Dr Aram Aslanyan, Specialist Registrar in Neurology at Queens Hospital, Romford, Essex.

Posted in Infectious Diseases, Inflammatory/Auto-Immune Diseases

Making a Differential Diagnosis using a Clinical Database


A great deal of time is spent in medicine reading and writing case reports. Essentially, clinical features are listed and a diagnosis made. Excluding those cases that point to a novel means of treatment, a case report is often noteworthy simply because the diagnosis is rare, or because the clinical features were most unlikely to be associated with the diagnosis. This hardly seems a reliable method of archiving medical knowledge.

Much less time is spent on attempting a method of diagnosis that is more systematic than the recall of case reports. If one did wish to move medical diagnosis into the information age, the natural instinct would be to use an internet search engine: enter a list of clinical features and see what disease diagnoses are associated with those terms. Unfortunately, internet search engines concern themselves only with the popularity of search terms, and because of the dominance of case reports such a practice is likely to throw up the least likely cause of those features, or whatever is most “titillating” to those who most perform internet searches.

There have been attempts to provide a more balanced means of linking clinical features with diseases and hence making clinical diagnoses. Rare diseases with a large number of different clinical features are least easily diagnosed by clinical experience or key investigations, and so the focus of these attempts has been on rare genetic diseases, using ever-expanding databases such as Orphanet, Online Mendelian Inheritance in Man (OMIM), the London Dysmorphology Database and the Pictures of Standard Syndromes and Undiagnosed Malformations (POSSUM).

One method of searching for clinical features on these databases is simple text matching. A way of quantifying the match is the feature vector method, which calculates the mathematical overlap between the Query (the clinical features of the case) and the Disease (the clinical features of the disease). A vector is constructed for the query, with a dimension for each feature and a value of 1 if present and 0 if absent. The same is done for the disease. The dot product of the two vectors is the strength of the match: each feature present in both query and disease adds to the score, while a feature absent from either contributes nothing.
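The feature vector match can be sketched as follows; the feature list and example features here are invented for illustration, not drawn from any of the databases above:

```python
# Binary feature vectors over a shared feature list, scored by dot product.
FEATURES = ["seizures", "ataxia", "optic atrophy", "AV block"]

def feature_vector(present):
    """1 where the feature is present, 0 where absent."""
    return [1 if f in present else 0 for f in FEATURES]

def match_score(query, disease):
    """Dot product: counts features present in both query and disease."""
    return sum(q * d for q, d in zip(feature_vector(query),
                                     feature_vector(disease)))

# Only "ataxia" is shared, so the match strength is 1.
print(match_score({"seizures", "ataxia"}, {"ataxia", "optic atrophy"}))
```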

A potentially better quantification of matching takes into account the different specificities of different clinical features. If a clinical feature is present in only a few diseases, its annotation (the linkage of a clinical feature to a disease) is more specific for those diseases; in database terms this is called the information content (IC), and such a linkage should carry more weight. The IC is simply the negative log of the frequency of the annotation. For example, AV block is a term that annotates 3 diseases in the 4813-disease OMIM database. The frequency is 3/4813; its natural log is -7.38, so the IC is 7.38. A much more general term will have many annotations and a much lower negative log, tending towards zero. The ICs of all the clinical features of the query can be summed or otherwise combined to provide an overall match.
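The AV block example works out as follows (figures as quoted above):

```python
import math

def information_content(n_annotated, n_total_diseases):
    """IC = minus the natural log of the annotation frequency."""
    return -math.log(n_annotated / n_total_diseases)

# AV block annotates 3 of the 4813 diseases in OMIM:
print(round(information_content(3, 4813), 2))      # 7.38
# A term annotating every disease carries no information:
print(information_content(4813, 4813) == 0)        # True
```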

The authors of the presented paper have described a further refinement of this method. This is called the Ontology Similarity Search (OSS). Instead of simply matching the text of terms, they fit clinical features into a standardised language within an ontological framework. This means that the features are related to one another in a hierarchy, with more general terms higher in the hierarchy and more specific subcategories of those general terms lower in the hierarchy. While “parent” terms obviously have many “child” terms, child terms can also belong to multiple parent terms. For example, optic atrophy could be a child of demyelinating disease and also a child of visual disturbance. Their ontology is called the Human Phenotype Ontology (HPO) and has around 9000 terms.

The advantage of using the ontology is that if a clinical feature of a case does not match a clinical feature of the disease, but shares a parent term with one of the features of the disease, instead of scoring a zero match it scores as a match, though less strongly than a match with the specific term. The method specifically finds the most informative common ancestor of the two different clinical features, and uses the IC of that term. Being a more general term, it will be a feature of more diseases and so have a lower IC. (In the database, ancestor terms are implicitly annotated when child terms are annotated.) The overall strength of match is the average of all the ICs; there will always be an IC for each feature, even if it is just that they are both a feature of “any disease”, which of course has an IC of zero and would bring down the average.
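The most-informative-common-ancestor idea can be sketched with a toy ontology; the hierarchy and annotation frequencies below are invented for illustration, not taken from the HPO:

```python
import math

# child -> parents (a term may have several parents, as in the HPO)
PARENTS = {
    "optic atrophy": {"visual disturbance", "demyelinating disease"},
    "nystagmus": {"visual disturbance"},
    "visual disturbance": {"phenotypic abnormality"},
    "demyelinating disease": {"phenotypic abnormality"},
    "phenotypic abnormality": set(),       # root term
}

# Pretend annotation frequencies (fraction of diseases annotated).
FREQ = {
    "optic atrophy": 0.01,
    "nystagmus": 0.02,
    "visual disturbance": 0.2,
    "demyelinating disease": 0.1,
    "phenotypic abnormality": 1.0,         # root: IC = 0
}

def ancestors(term):
    """The term plus all its ancestors (implicit annotation)."""
    seen, stack = {term}, [term]
    while stack:
        for parent in PARENTS[stack.pop()]:
            if parent not in seen:
                seen.add(parent)
                stack.append(parent)
    return seen

def ic(term):
    return -math.log(FREQ[term])

def best_match_ic(query_term, disease_terms):
    """IC of the most informative ancestor shared with any disease term."""
    disease_anc = set().union(*(ancestors(t) for t in disease_terms))
    common = ancestors(query_term) & disease_anc
    return max(ic(t) for t in common)

def oss(query_terms, disease_terms):
    """Average best-match IC over all query features."""
    return sum(best_match_ic(q, disease_terms)
               for q in query_terms) / len(query_terms)

# "nystagmus" does not match "optic atrophy" exactly, but they share
# "visual disturbance", so the match scores its IC rather than zero.
print(round(oss({"nystagmus"}, {"optic atrophy"}), 3))   # 1.609
```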

Summary of the Paper

The presented paper, Clinical Diagnostics in Human Genetics with Semantic Similarity Searches in Ontologies by Köhler et al. (Am J Hum Genet. 2009 Oct 9; 85(4): 457–464), describes a further refinement of the method using a statistical treatment. For a given disease, randomly selected clinical features from the HPO would be expected to yield a lower OSS score than the features of a patient who actually had the disease. If the OSS for random features were computed many times, a distribution would be created; one could then locate the real patient’s OSS on this distribution and determine a p-value. If the real OSS was higher than 95% of the random OSS scores, the p-value would be below 0.05 and indicate a likely match. Furthermore, if the same features were compared with different diseases and their random OSS distributions, the diseases could be ranked by their corresponding p-values. They call this the OSS-PV.
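The p-value construction can be sketched as follows. The scoring function here is a simple overlap count standing in for the real ontology similarity score, and the feature vocabulary is invented:

```python
import random

def empirical_p_value(patient_features, disease_features, all_features,
                      score_fn, n_draws=2000, rng=None):
    """Fraction of random same-size feature sets scoring at least as
    high as the patient's features against this disease."""
    rng = rng or random.Random(0)
    patient_score = score_fn(patient_features, disease_features)
    null_scores = [score_fn(rng.sample(all_features, len(patient_features)),
                            disease_features)
                   for _ in range(n_draws)]
    return sum(s >= patient_score for s in null_scores) / n_draws

# Stand-in score: count of shared features.
overlap = lambda q, d: len(set(q) & set(d))

vocab = [f"feature_{i}" for i in range(200)]
disease = vocab[:10]                    # the disease's 10 features
patient = vocab[:4] + [vocab[50]]       # shares 4 of them, plus one other

p = empirical_p_value(patient, disease, vocab, overlap)
print(p < 0.05)   # True: random 5-feature sets rarely overlap 4 times
```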

Since they considered it too onerous to enter, within the framework of the terms of the HPO, the clinical features of real patients with known diseases, they used simulated patients. This was done for 44 diseases, where they created a “patient” having a disease with a selection of the clinical features of the disease weighted by how commonly those features were found in that disease. For each disease 100 patients were created, so if from the clinical literature a feature is found in 1% of cases with the disease, 1 of the 100 simulated patients would have that feature.

They added “noise” to the process by adding to the patients some random features that were not part of the disease, and “imprecision” to the process by replacing some features with their HPO parent terms.
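The simulated-patient construction, including the noise and imprecision steps, can be sketched as follows; the features, frequencies and parent mapping here are invented, whereas the paper’s own are derived from the clinical literature and the HPO:

```python
import random

def simulate_patient(disease_freqs, parent_of, vocab, rng,
                     n_noise=2, p_imprecise=0.25):
    """Draw disease features by literature frequency, add random noise
    features, then blur some features to their parent terms."""
    feats = [f for f, freq in disease_freqs.items() if rng.random() < freq]
    noise_pool = [f for f in vocab if f not in disease_freqs]
    feats += rng.sample(noise_pool, n_noise)          # "noise"
    return [parent_of.get(f, f) if rng.random() < p_imprecise else f
            for f in feats]                           # "imprecision"

# Invented example disease with per-feature frequencies:
freqs = {"seizures": 0.9, "ataxia": 0.5, "AV block": 0.01}
parents = {"AV block": "cardiac conduction abnormality"}
vocab = list(freqs) + ["cough", "rash", "tremor", "dysarthria"]

print(simulate_patient(freqs, parents, vocab, random.Random(1)))
```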

Then they looked at the rank position of the true disease among all the 5000 or so database diseases found by the different methods. The closer the rank position to the true position (first!), the better the method performed.

Unsurprisingly, the performance of the feature vector method, as shown by box plots of rankings for all 44 diseases tested, was found to suffer when imprecise terms were used, because that was the point of using the ontological system. The OSS-PV method more modestly outperformed the raw OSS method when noise and imprecision were added.

As the authors point out, the OSS method potentially suffers from the fact that it only matches query terms with disease terms. If a disease also had many terms that did not match the query terms, surely the overall match would be less specific. This can be taken into account by performing a symmetrical similarity search, where the OSS is the average of the matches of the query to the disease and the matches of the disease to the query. However, they did not use this method in their presented data, only stating that when they used it the symmetrical OSS-PV still significantly outperformed the feature-vector method. They do not state that it still outperforms the symmetrical raw OSS.

Another point raised by the paper is that if one finds on a disease search that no disease fits the features with a p-value less than 0.05, exploration could be made of other clinical features, or child features of the entered clinical features that would have a higher information content and provide a more significant match. Going back and looking for a specific feature, or performing a specific investigation, would be an example of this.

Journal Discussion

As described in the introduction, any attempt to quantify and rationalise differential diagnosis should be lauded and this paper clearly describes progressive refinements of this process. It is almost negligent to have all the data available on thousands of diseases and not to use them because the unaided human mind simply cannot store so much information.

However, a number of further refinements and limitations present themselves.

First, the matching of terms is still semantic rather than systematic. While a knowledge-based approach, it nevertheless does not rely on understanding of disease pathophysiology and pathognomonic features. Some clinical features that share a close parent may in fact best distinguish diseases, rather than being loosely positively associated. This may apply particularly in neurology, where the approach is more systematic. For example, upper motoneurone lesion and lower motoneurone lesion may be considered together, sharing a common parent in “motoneurone lesion”, but apart from the case of motoneurone disease they split the differential diagnosis more than upper motoneurone lesion and no motor lesion at all. They are semantically similar but nosologically opposite. Horizontal supranuclear gaze palsy and vertical supranuclear gaze palsy may share a high information content parent, but may be the feature that best separates Gaucher disease from Niemann-Pick disease.

This leads to the second point. The frequency, or sensitivity, of a clinical feature in a disease is not considered, although it was, ironically, considered when creating the simulated patients for the 44 tested diseases. In large part this reflects the lack of clinical data in the databases themselves. It is regrettable that case reports are not combined into case series containing information on the frequencies of clinical features, and that where case series do exist, these data are not collected systematically. If a clinical feature occurs in 1% of cases of one disease and 100% of cases of another, the annotation of that feature should clearly be weighted far more strongly for the second disease than for the first. Instead, because no such data exist, they are given equal weight; the weighting only considers whether or not the feature is also found in a number of other diseases, not how commonly it is found in those diseases.

Third, there is no consideration of how common the disease is in the first place. While the authors restrict themselves, by definition, to rare and genetic diseases, there can be a frustrating tendency for searches to throw up the least likely diagnosis. In practice the clinician often does not know in advance that the patient has a rare genetic disease, and a diagnostic tool should be most useful to those with the least intimate knowledge of the database. Thus, when entering the features dystonia, spastic hemiparesis and spastic dysarthria in a case of cerebral palsy, it comes as a surprise when the top diagnosis is cleft palate-lateral synechia syndrome.

Finally, the methods assume that clinical features are independent. In fact, many clinical features are strongly interdependent and tend to occur together; if the first feature is present, the second adds little further information. This problem is common to most differential diagnosis calculators, including those using Bayesian methods, and could only be solved with data on the interdependence of clinical features in different diseases; currently it is hard to find even raw frequency data for most diseases.
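A toy calculation (with invented numbers, not taken from the paper) shows how large the overcounting can be when two supposedly independent features in fact nearly always co-occur:

```python
# Two features, each seen in 90% of disease cases and 10% of
# non-disease cases, with a 1% prior probability of the disease.

prior_odds = 0.01 / 0.99

# Treating the features as independent multiplies their likelihood
# ratios: (0.9/0.1) * (0.9/0.1) = 81.
naive_odds = prior_odds * (0.9 / 0.1) * (0.9 / 0.1)
naive_posterior = naive_odds / (1 + naive_odds)

# If feature 2 is almost always present whenever feature 1 is
# (in both groups), it adds essentially no evidence, and the honest
# combined likelihood ratio is only ~9.
dependent_odds = prior_odds * (0.9 / 0.1)
dependent_posterior = dependent_odds / (1 + dependent_odds)

print(round(naive_posterior, 2), round(dependent_posterior, 2))  # 0.45 0.08
```

The independence assumption here inflates the posterior probability of the disease more than five-fold.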

The point that the authors raise about using their App to find features that would be more specific in making a diagnosis is an interesting one, and opens a new approach to diagnosis and refinement of the process of often expensive and sometimes risk-associated investigation. One could imagine the improvements in medical care that would arise from use of an App that gave a differential diagnosis based on initial clinical information and then showed the relative power of different investigations in narrowing that differential.

A further use of these methods would be in creating diagnostic criteria. While clinical practice is rightly focused on the most likely diagnosis in a patient, clinical research is focused on a group of patients where the diagnosis is certain, i.e. specificity at the expense of sensitivity. Currently, diagnostic criteria seem to be set largely by “workshops” – gatherings of the great and the good usually in an exotic location who draw up a list of features, create two categories of importance and then decide how many features are required for a “definite diagnosis”. Using a quantified method such as that described in this paper for every study patient and including only patients where the diagnosis reaches a threshold p-value score would seem to be a far more reliable method.

The paper on which this journal club article is based was presented by Dr John McAuley, Consultant and Honorary Senior Lecturer in Neurology at Queens Hospital, Romford.



Clinical Characteristics of Coronavirus Disease 2019 in China

Coronavirus is obviously not a neurological disease, apart from an isolated case report of encephalitis associated with the condition (to be expected, very rarely, with viral infections), but because it is so topical this paper, Clinical Characteristics of Coronavirus Disease 2019 in China, published in haste in the New England Journal of Medicine on 3rd March 2020, was nevertheless presented.


A novel enveloped RNA virus of coronavirus type, similar to the SARS coronavirus, was first identified as causing viral pneumonia in early December 2019; the disease it causes was named Covid-19. It is believed to have first been transmitted through livestock in a large market in Wuhan, Hubei province. Such viruses are generally thought to be endemic in wildlife, such as bats, and to mutate to become transmissible to other animals and to humans.

As of Friday 6th March, there were 100,645 confirmed cases worldwide, and 3411 deaths linked to the virus. There were 55,753 cases who had recovered. In Hubei province, for the first day since the outbreak no new cases had been reported.

The details are unclear, but the fact that the UK government is said to be moving from a containment to a delay phase suggests that at least some UK cases have been identified with no apparent contact with potential sufferers in China, Iran, Italy or other hotspots, nor with other UK individuals known to have the disease.

Journal Club Article

The paper discussed is an early report focusing on numbers affected, initial outcomes and clinical presentation. It was approved by the Chinese authorities.

Data were sourced from records of laboratory-confirmed cases, using assays of nasal and pharyngeal swabs, between 11th December 2019 and 29th January 2020. Certain hospitals were sampled, so by no means were data collected from all cases. In all, 14.2% of all known hospitalised cases were included in the study. It is not clear how widespread the screening of the population by these laboratory tests was; all the patients in this study were hospitalised.

26% of these cases had not had contact with Wuhan residents, indicating widespread serial human to human transmission.

Clinical information is as follows:

  • Incubation period (presumably estimated from the likely time of exposure) was a median of 4 days (interquartile range 2 to 7 days).
  • Fever was present in only 44% on admission, but often developed later.
  • Cough in 68%.
  • Viraemic symptoms occurred in some patients, but upper respiratory tract symptoms, lymphadenopathy and rash were very rare.
  • CT chest abnormalities were very common (86%) in both mild and severe cases.
  • Lymphopaenia was common (83%).
  • Only 1% of cases were under 15 years old.

Of these hospitalised cases, 926 were considered mild and 173 severe. The main factors predicting severity were advanced age and comorbid disease (especially coronary heart disease, diabetes, COPD and hypertension), as well as breathlessness (38% versus 15%; unsurprisingly, as this would be a criterion for severity). Similarly, inflammatory markers and markers of multi-organ involvement were associated with more severe disease. The main complicating feature of severe cases was acute respiratory distress syndrome, occurring in 16%.

In severe cases there was a 25% risk of intensive care admission, mechanical ventilation or death (death alone 8%). Only 0.1% of cases categorised as non-severe died. The overall death rate was 1.4%; the national statistics at the time gave a death rate of 3.2%.

By the data cut-off point, 95% of mild cases and 89% of severe cases were still hospitalised; the median lengths of hospital stay were 11 and 13 days respectively. Perhaps mild cases were hospitalised for purposes of isolation.

Journal Club Discussion

The paper reports likely ascertainment bias from milder cases not being tested. Nevertheless, the scale of the morbidity and mortality of the disease is not underestimated. Ascertainment bias becomes more relevant if one expects a pandemic and most of the population to become exposed. By these means the population risk can be inferred.

The paper also notes that many patients were still in hospital, and perhaps very unwell, at the study’s end point. In the study, the proportion of cases requiring intensive care treatment was three times the death rate, so the death rate among already-infected cases may yet climb. On the other hand, ARDS, the major serious complication of coronavirus infection, has a mortality of around 40%; since 16% had this condition and 8% died, perhaps few more would be expected to die.

More information could be gleaned from these data or similar studies. The large number of cases could be randomised to treatments of unclear effectiveness, such as oseltamivir, steroids and intravenous immunoglobulin. Fewer than half of the cases had these treatments, but the numbers were nevertheless appreciable. It would have been helpful to know the death rates for patients who did or did not have these treatments, rather than only the end-point rates, since in reality some of these treatments might be most relevant once patients have already reached the ITU admission end point.

A follow up study would give better indicators of important epidemiological issues such as ultimate death rates and morbidity, the possibility of reinfection versus lasting immunity and any signs that more recently infected cases, where transmission has been via several human hosts, have any milder disease than those directly exposed to the transmitting animals.

A population based study that tested all individuals in high risk areas would determine the likely proportion of individuals who have been infected but not become very symptomatic.

Worldwide, we would also want to know how ambient temperature and sunlight levels affect transmissibility.

One suspects that epidemiologists in charge of advising governments have more information than is released to the public, and various advanced tools to model infection spread, but from the recent explosion of cases in Italy and now elsewhere, where talk is of delay rather than containment, there is little confidence that the slowing up of cases in China is going to be replicated worldwide.

From the death rates reported in Italy, there appears to be no clear evidence that the disease is becoming milder, but from the delay of many days from exposure to developing critical illness, perhaps it is too early to tell.

The lack of cases in hot or southern-hemisphere countries would suggest a seasonal effect of the virus, offering some reassurance to northern-hemisphere countries approaching spring. But in Australia there were already 40 cases confirmed by 4th March, and at least three had had no recent foreign travel and no traceable contact.

It seems that one scenario for the UK is that the infection eventually replicates that of Hubei province, which has a similar population to the UK and had around 11,000 cases with few new cases to come, with around a 1-3% mortality rate, mainly in the elderly and infirm for whom flu is also a significant source of mortality. With around 20% of cases classed as severe, this would require an extra 2000 high-dependency inpatient beds of some form, each occupied for several days, concentrated into only a month or two.

However, we do not have an explanation for the slowing of new infection rates in China. It could be that most of the local population has already been exposed and most were resistant to severe symptoms, or it could be that containment measures have been very effective. If the latter is the explanation and is in reality only delaying inevitable spread through the population, or if containment is not replicated to the same degree in Western countries and if there is no seasonal dip in transmission, one could imagine hundreds of thousands of cases in the UK spread over the next year. And with a current mortality rate seemingly up to 3% this is unlikely to drop when there are insufficient hospital resources to manage such numbers.

The paper on which this journal club article is based was presented by Dr Bina Patel, Specialist Registrar in Neurology at Queens Hospital, Romford.


Anticonvulsant Medications for Status Epilepticus

Status epilepticus is a medical emergency with significant morbidity and mortality and, in circumstances where benzodiazepines alone have failed to terminate seizures, has traditionally been treated with anticonvulsants such as phenytoin or phenobarbitone. Other intravenously administered antiepileptics have also been found to be effective.

There is a lack of comparative data on different anticonvulsants, and this blinded prospective study, “Randomised Trial of Three Anticonvulsant Medications for Status Epilepticus” by Kapur et al. (2019), compares three options: fosphenytoin (a prodrug of phenytoin, which is more expensive but more soluble, can be given intravenously faster with fewer extravasation problems, and can also be given intramuscularly), valproate and levetiracetam.

Study Details

Patients in the study had to be over 2 years of age and had to have convulsive status epilepticus (persistent or recurrent convulsions) for at least 5 minutes, with further convulsions between 5 and 30 minutes after an adequate dose of benzodiazepine (at least 5 minutes to have allowed the benzodiazepine to work, and less than 30 minutes, after which point another dose of benzodiazepine could have been tried instead). Patients were randomised with stratification by age.

Patients with major trauma or anoxia, etc., were excluded, as were pregnant women (give levetiracetam and consider magnesium).

The doses of the intravenous anticonvulsants levetiracetam (60 mg/kg) and valproate (40 mg/kg) seemed very high.

The primary successful outcome was absence of clinical seizure activity and improved responsiveness at 60 min after infusion start.

Analysis assumed an equal prior probability of success for the three treatments, then used the binomial probability of positive or negative outcomes to calculate posterior probabilities. From these three separate probabilities, an iterative method was used to calculate the probability that a given treatment was better than the other two, or worse than the other two.

The sample size was set on the basis of correctly identifying with 90% probability a difference when one treatment was 15% better than the other two (65% response for the best and 50% response for the other two).
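As a sketch of this kind of analysis (the paper's exact iterative method is not spelled out here): with a uniform Beta(1, 1) prior on each arm's response rate, the posterior after s successes in n patients is Beta(1 + s, 1 + n - s), and Monte Carlo sampling from the three posteriors is one standard way to estimate the probability that each treatment is the most effective. The figures below are illustrative, roughly matching the ~47% per-arm response reported later.

```python
import random

def prob_best(results, draws=100_000, seed=0):
    """results: list of (successes, n) pairs, one per treatment arm.
    Returns the Monte Carlo probability that each arm has the highest
    true response rate, under independent Beta(1, 1) priors."""
    rng = random.Random(seed)
    wins = [0] * len(results)
    for _ in range(draws):
        # Draw one plausible response rate per arm from its posterior.
        sample = [rng.betavariate(1 + s, 1 + n - s) for s, n in results]
        wins[sample.index(max(sample))] += 1
    return [w / draws for w in wins]

# Three arms of ~130 patients, each responding at ~47%:
print(prob_best([(61, 130), (61, 130), (61, 130)]))  # each close to 1/3
```

With identical observed response rates, each arm's probability of being best hovers around one third, much as in the paper's per-protocol result.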

A total of 400 patients were enrolled. The intention-to-treat population was only 384 because some patients were enrolled more than once. Nearly a third of patients were then excluded because treatment did not follow the protocol: for example, the patient was not in status epilepticus (such as functional seizures), did not receive the correct amount of benzodiazepine or anticonvulsant, or received it with the wrong timing relative to the benzodiazepine.

Half the patients were unblinded to avoid suboptimal management.

In the per-protocol population, 47% of patients responded to each of the three treatments, with the probability of being the most effective treatment distributed as follows: levetiracetam 0.34, fosphenytoin 0.35, valproate 0.31. There was also an “adjudicated population” outcome, perhaps based on an adjudicating clinician looking retrospectively at the notes, regardless of protocol adherence or previous treatment, and deciding whether the treatment worked. Although the data were similar, levetiracetam did appear worse on this outcome, with a probability of being the least effective treatment of 0.51, versus 0.29 and 0.20; 0.51 exceeds valproate’s 0.20 by 31 percentage points, more than their 15% threshold for a meaningful difference in best treatment.

Secondary outcomes included requirement for admission to ICU (87% for levetiracetam and only 71% for valproate).

Regarding safety, there were 4.7% deaths in the levetiracetam group and 1.6% in the valproate group, with fosphenytoin in the middle. Hypotension, a known issue with phenytoin, was life-threatening in 3.2% of the fosphenytoin group, versus only 0.7% for levetiracetam and 1.6% for valproate. Cardiac arrhythmia occurred in only one patient. Acute respiratory depression occurred in 12.8% with fosphenytoin and 8% with levetiracetam and valproate. None of these differences reached significance.

The conclusion was that there was no difference between the drugs.

Journal Club Discussion

The study was welcome as it addressed an important practical topic. The group wondered about the high doses used, and whether our own guidelines should reflect them. The trial was powered for the primary efficacy outcome and then stopped. However, it was always as likely that any differences between the drugs would lie in their side effects as in their efficacy, and it is a shame that the powering did not reflect this, so that what may have been real differences in respiratory depression or hypotension never reached significance.

The vagaries of statistics are illustrated by the per-protocol efficacies, which seem identical, and the adjudicator population efficacies, where there was actually a 31% greater chance of levetiracetam being the worst drug compared to valproate.

Negative study results always make us turn to how the study was powered: were no differences seen because there are no differences, or because too few patients were studied (i.e. the power was too low)? When powering a study, a judgement must always be made about what level of difference would be considered meaningful; if any difference at all were accepted as meaningful, an infinite population would be required to prove there is no difference. The authors chose a meaningful 15% difference for one drug being better than the other two, but had they chosen one drug being worse than the other two, the 31% difference in the adjudicated population would have exceeded their set level. There should have been more explanation of the adjudicated population, and perhaps of the advantage of using Bayesian probabilities over a simple comparison of means and standard errors of success rates.

In real practice, there should perhaps be tailoring of treatment to the patient. If a patient is already on therapeutic levels of phenytoin, is more of the same going to be the best choice? If the patient is a female of child-bearing potential, is valproate the best choice, given that patients often end up on the oral equivalent of the status treatment they received? On reviewing the data in this study, and knowing that the levetiracetam dose was very high, valproate might shade the other two choices, especially in men.

The Journal Club on which this article is based was presented by Dr Katie Yoganathan, SpR in Neurology at Queens Hospital, Romford.


Galcanezumab in Chronic Migraine

Migraine is one of the most common neurological conditions, and chronic migraine is a condition that, while less common than episodic migraine, is nevertheless a major cause of loss of quality of life in otherwise well individuals.

Once analgesia headache has been effectively treated, and tension type headache excluded, chronic migraine is treated with migraine preventative medications, often very effectively. However there are a proportion of patients who remain resistant to single or combination preventative treatments.

A novel target for migraine treatment is the calcitonin gene-related peptide (CGRP) receptor on the smooth muscle of blood vessels in the head. CGRP is released from trigeminal ganglion efferents to the blood vessels to cause potent vasodilation as part of the trigeminovascular response (analogous to the “triple response” of pain, redness and swelling in skin inflammation). Blocking the receptor may therefore block this response. Monoclonal antibodies raised against the receptor, or against CGRP itself, have been explored as migraine treatments.

This study describes a double-blind trial of galcanezumab, one such monoclonal antibody, targeting CGRP itself. The paper does not discuss the relative hypothetical or actual benefits versus other monoclonal antibody migraine therapies already marketed or in development.

Study Design

Around 270 patients received each of the two doses of galcanezumab by monthly subcutaneous injection, and 560 received normal saline placebo. To be enrolled, patients had to have 15 or more headache days per month, at least 8 of which had to be migraine days, and at least 1 headache-free day per month. Patients who had failed more than 3 other preventatives were excluded. Before the study, patients had to stop all their existing migraine preventatives, except propranolol or topiramate, at least 30 days before the study start.

Migraine days were defined as more than 30 minutes of migraine or probable migraine according to ICHD-3 beta criteria (even though the duration criterion of the latter is 4+ hours). If a patient thought an episode was a migraine that did not satisfy the criteria, but it responded to a triptan, that also counted as a migraine day.

Over 90% of patients completed the study. Only 15% of patients were on topiramate or propranolol (not specified if this was the same proportion in the three treatment groups).

The primary outcome measure was migraine days per month. At the start of treatment, this was around 19 days. Placebo reduced this by 2.7 days per month, low dose galcanezumab by 4.8 days and high dose by 4.6 days. Therefore, compared to placebo, the drug on average reduced migraine by 2 days per month. There were only about 2 extra non migraine headache days per month on average.

There were many secondary measures. Of note, 4.5% of placebo patients had a 75% reduction in migraine days, and 7% of low dose and 8.8% of high dose patients, while 0.5% of placebo patients had a 100% response, and 0.7% of low dose and 1.3 % of high dose patients (not significantly different).

There was no overall quality of life measure, but there was a migraine related quality of life measure that showed significantly more improvement, about 25% more improvement than placebo. There was a patient global disease severity 7 point scale, where there was a 0.6 point improvement from placebo, and 0.8 for low dose and 0.9 for high dose, only the latter reaching significance.

The side effect profiles were similar between placebo and drug, notably common in both groups! However, there were no concerning side effects, nor indeed any characteristic enough to tend to unblind the patients or investigators.


Journal Club Discussion

The Journal Club thought it strange that the study would exclude the very patients in whom the drug would mainly be used, namely those who had failed more than 3 conventional treatments. The focus was clearly on maximising benefit as measured by the study. By the same token, patients had to stop any preventatives before the study, even if they were partially beneficial, apart from topiramate and propranolol.

It was furthermore strange that only 15% of the recruited patients were on the two most common treatments for chronic migraine. Had they only been tried on the others, or had they had side effects? In real practice, there are usually at least some marginal benefits from preventatives and patients often remain on them.

It is therefore possible that many patients were treatment naïve as far as preventatives were concerned. This makes the 2 fewer migraine days per month vs placebo (from an initial 19 days per month) an all the more modest magnitude of benefit.

It is difficult to reconcile the cost of the drug with the fact that patients on average will still have 15 migraine days a month. Most patients would not consider this a treatment success, and certainly not such that a patient would happily be discharged from specialist care. In terms of patients having a 75%+ reduction in migraine days, generally the minimum level of meaningful benefit in a pain study, the excess over placebo was only 3-4% of patients.

The lack of a general quality of life measure means that cost benefit analysis cannot be performed. The quality of life measure used was specific for migraine and likely to show much larger differences; a cured migraine sufferer might have a near 0% to 100% swing on this scale, but another individual considering the range from death to total disability to perfect health might assign curing migraine only a swing from 90% to 100%.

A major aspect of migraine care is what happens when treatment is stopped. Patients do not want lifelong medication, let alone lifelong monthly injections. Fortunately we find that after six months of treatment, traditional preventatives can often be withdrawn. Although the study mentioned that there was an open label period and then a wash out period, we do not know any of these results; presumably they are to be held back for another publication. Is there rebound migraine on treatment withdrawal? Any funding body would want to know if the patients would likely need the treatment for 3-6 months or for many years.

As a final point, it was queried whether the definition of migraine is sufficiently specific; perhaps this limits the observed benefit in this and similar studies. Some headaches recorded as migraine may be tension type headache and therefore not responsive to specific anti-migraine treatment. The table below shows the relevant criteria.

ICHD-3 Headache Diagnostic Criteria

Probable Migraine (2+ of the following):

  • Duration 4-72 hours.
  • 2+ of: moderate+ severity; avoidance of routine physical activity.
  • Nausea, or photophobia plus phonophobia.

Probable Tension Type Headache (2+ of the following):

  • Duration 30 min to 7 days.
  • 2+ of: pressing or tightening quality; moderate- severity; not aggravated by routine activity.
  • No nausea; not both phonophobia and photophobia.

Definite Tension Type Headache (all of the following):

  • Duration 30 min to 7 days.
  • 2+ of: pressing or tightening quality; moderate- severity; not aggravated by routine activity.
  • No nausea; not both phonophobia and photophobia.


A headache is diagnosed as a migraine if it fits probable migraine and is not a better fit with another headache diagnosis, which presumably means definite rather than probable tension type headache. The severities and durations overlap, so they cannot distinguish the two, and one of photophobia or phonophobia overlaps. So a unilateral, pressing headache with avoidance of routine activity, with no nausea, no photophobia and no phonophobia, is classified as migraine as long as it lasts 4 hours; yet it seemed that some of the migraine days comprised only half an hour of headache. Also, a headache not satisfying these criteria counts as a migraine if there is a response to triptans, but we have already seen the large placebo response in the main data. In general practice a tension type headache might be unilateral, and might interfere with routine activity at the more severe end of the scale; certainly a neck ache or jaw ache (including temporalis muscle) from which a tension headache may arise can have these features.

The paper on which this Journal Club article is based was presented by Dr Piriyankan Ananthavarathan, Specialist Registrar in Neurology at Barking, Havering and Redbridge University Hospitals Trust.


Disease Modifying Therapies in Multiple Sclerosis: Background for General Readers

Multiple sclerosis (MS) is a presumed autoimmune condition of demyelination and often inflammation of the central nervous system. Its evolution is very variable; some patients suffer episodes lasting weeks to months with complete or near complete recovery in between, and the periods between episodes may span months to decades (relapsing remitting MS). Other patients accumulate progressive disability as a result of or between episodes (secondary progressive MS). Still other patients, around 10% in total, do not suffer episodes but instead undergo a gradually progressive course with variable rapidity, but usually noticeable over the course of months to years (primary progressive MS). Patients with MS can evolve from one category to another; some in fact at a certain point remain clinically stable indefinitely.

For many decades, its immune basis has prompted trials of various immunomodulatory agents to try to reverse, or at least arrest, the progression of multiple sclerosis. Some have been shown not to work, e.g. corticosteroids and immunoglobulin. Some work but have largely been overtaken by newer, more expensive therapies. For example, azathioprine is a traditional, commonly used immunosuppressant; in a Cochrane review it was found to reduce relapses by around 20% in each of three years of therapy, and to reduce disease progression in secondary progressive disease by 44% (though with wide confidence intervals of 7-64%). There were the expected side effects but no increased risk of malignancy, although a cumulative risk of malignancy for treatment durations above ten years remains possible. In the 1990s, beta-interferon became widely used but was never compared directly with azathioprine. With the 21st century came the introduction of “biological therapies”, typically monoclonal antibodies against specific immune-system antigen targets. There has also been a reintroduction of non-biological therapies originally used to treat haematological malignancy or to prevent organ transplant rejection.

These new therapies, called disease modifying therapies, as opposed to symptomatic treatments or short courses of steroids for relapses, are now conceptually, though not biochemically or mechanistically, divided into two groups: those better tolerated or with fewer risks of causing malignancy or infections but less effective, and those with more risk of cancer and serious infection, including reactivation of the JC virus to cause fatal progressive multifocal leukoencephalopathy, but with greater efficacy.

The former group includes the beta-interferons, glatiramer acetate and fingolimod. Fingolimod is an agent derived, like ciclosporin, from toxins of fungi that parasitise insects, and has the convenience of oral administration, but is now not routinely recommended because of severe relapses on withdrawal, and cardiac and infection risks. The latter group includes the biological agents natalizumab (which targets a cell adhesion molecule on lymphocytes), rituximab and ocrelizumab (which target CD20 to deplete B cells) and alemtuzumab (which targets CD52, expressed on more mature B and T cells), and the oral non-biological anti-tumour agent cladribine, which blocks deoxycytidine kinase and thus interferes with DNA synthesis. Another non-biological oral agent, dimethyl fumarate, acts as an immunomodulatory rather than immunosuppressive agent and sits somewhere between the two groups, having oral administration convenience and better efficacy than the first group, but also the increased PML and Fanconi renal syndrome risk of the second group.

Recent studies indicate that higher strength DMTs may slow disability progression in secondary progressive MS, as well as reduce the number of relapses. There have also been trials in primary progressive MS but these, most notably using rituximab, were not clearly positive. For a study looking at ocrelizumab on primary progressive MS, see the accompanying Journal Club review.


Cost of Disease Modifying Therapies

The disease modifying therapies are extremely expensive and, given MS is unfortunately not a rare disease, have a significant impact upon the health economy.

For example, in relation to the accompanying review of ocrelizumab for primary progressive MS, this drug is not unusually expensive compared to similar medications, having a list price of £4790 per 300 mg vial, with four infusions a year. There are many further costs associated with imaging, screening, monitoring and admission for infusions.

Normally, cost effectiveness is justified at around £35,000 per quality-adjusted life year (QALY). This means the cost would be justified at £35,000 a year if, for each year, the treatment gave 100% quality of life to patients who would otherwise die or have zero quality of life. Clearly ocrelizumab does not do that; it appears to preserve 0.5 or 1 point out of 10 on a disability scale in 6% of patients on an ongoing basis, giving a mean quality-of-life benefit per patient of very roughly 0.6% and a cost estimate of over £3 million per QALY. Of course, there are other considerations, such as the wider health-economy costs of disability, the fact that some patients might have been prevented from deteriorating by more than 1 point on the EDSS, and the potential costs of monitoring for and treating cancer and PML complications in a relatively young patient population even after treatment is stopped. Note that there was actually no significant difference in this study in the SF-36, with both groups remaining surprisingly little changed after about 2 years, which probably fits with the 0.6% mean improvement figure calculated above.
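The back-of-envelope arithmetic can be made explicit. The inputs are the figures quoted in the text (list price, four infusions a year, ~6% of patients preserving roughly 1 point on a 10-point disability scale), with the crude assumption that 1 disability point equates to 10% of quality of life:

```python
list_price = 4790             # £ per 300 mg vial (from the text)
infusions_per_year = 4
annual_cost = list_price * infusions_per_year    # £19,160, drug alone

responders = 0.06             # extra patients kept from progressing
qol_gain_per_responder = 0.1  # ~1 point on a 10-point disability scale

mean_qol_gain = responders * qol_gain_per_responder  # ~0.006, i.e. 0.6%
cost_per_qaly = annual_cost / mean_qol_gain

print(f"£{cost_per_qaly:,.0f} per QALY")  # over £3 million
```

Even ignoring the further costs of imaging, monitoring and infusion admissions, the figure is nearly a hundred-fold above the usual £35,000 threshold.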

Unless the NHS, or the health economies of other countries, restrict treatment to a tighter subset of primary progressive patients who might respond better, it is difficult to balance this cost against other medical, or indeed social care, conditions that require resourcing.


Ocrelizumab versus Placebo in Primary Progressive Multiple Sclerosis

Recent studies indicate that higher strength disease modifying therapies (DMTs) may slow disability progression in secondary progressive multiple sclerosis (MS), as well as reduce the number of relapses. There have also been trials in primary progressive MS but these, most notably using rituximab, were not clearly positive. For a more general review, please see the post Disease modifying therapies in multiple sclerosis.

The study reviewed in this post, by Montalban et al. (2017), is of rituximab's sister compound, ocrelizumab, and targets younger patients with more active disease, a subgroup that seemed as if it might have responded to rituximab.

Study Design

There were 732 patients randomly assigned to ocrelizumab or placebo in a 2:1 ratio. Inclusion criteria were a diagnosis of primary progressive MS according to established criteria and age 18 to 55 years. Disability had to range from moderate disability with no walking impairment up to impaired walking but still able to walk 20 m, perhaps with crutches (EDSS 3.0 to 6.5). Disease duration had to be less than 10 to 15 years, depending on the level of disability. Patients must never have had any relapses.

Pairs of ocrelizumab or placebo infusions were given every 24 weeks for at least five courses. The primary end point was the percentage of patients with disability progression, defined as an increase of at least 1 point on the EDSS sustained for 12 weeks, or 0.5 points at the more disabled end of the scale.

Only if this primary end point was reached would the study continue to test secondary end points such as 24-week sustained disability progression, timed walk at week 120, change in volume of MRI brain lesions, and change in quality of life on the SF-36 score.


Results

Patients had a mean disease duration of around 6 years, and 3 percentage points more patients receiving ocrelizumab had gadolinium-enhancing lesions on MRI (27% versus 24%).

Disability progression sustained for a period of 12 weeks occurred in 39.3% of placebo patients but only 32.9% of ocrelizumab patients (p=0.03; relative risk reduction 24%). The result was similar when sustained disability was confirmed over 24 weeks.

On the timed walk, performance was a mean 39% slower after 120 weeks in patients on ocrelizumab and 55% slower in patients on placebo (p=0.04). There was no difference in quality of life (SF-36 physical component; a 0.7 out of 100 deterioration on ocrelizumab and 1.1 out of 100 on placebo).

There were three potentially relevant deaths in the ocrelizumab group (out of 486 patients), two from pneumonia and one from cancer, and none in the placebo group, but the overall rate of serious infections was not really different. The cancer rate was 2.3% versus 0.8%, but obviously this would have to be monitored over further decades. Even during one year of open label extension there were two further cancers in the ocrelizumab group. The overall rate of neoplasms to date is 0.4 per 100 patient-years, double the baseline rate, but this reflects a short time in a large number of patients.

In summary, a modest reduction in disability was seen on ocrelizumab, namely protection against a 0.5 to 1 point loss on the EDSS scale in 6% of patients.



Discussion

We focused mainly on the figure (see below), where it seems that ocrelizumab stopped about 5% of patients deteriorating in the first 12 to 24 weeks, from about 9% down to 4%, and this difference was then maintained until the end of the trial, at which point about 60% of patients still had not deteriorated. The plateau at 3-4 years probably reflects the end of the trial rather than a stable MS population.


The journal club were surprised at the apparent focus on a 12-week primary end point: patients would have progressed from zero to 3-6 out of 10 on the EDSS scale over a mean period of 6 years, yet progression of 0.5 to 1 point seemed to be measured over just three months. This arose from some confusion over the phrase in the paper describing the primary end point as the "percentage of patients with disability progression confirmed at 12 weeks", and then in the results: the "percentage of patients with 12-week confirmed disability progression (primary end point) was 32.9% with ocrelizumab versus 39.3% with placebo." This might suggest that the primary end point was recorded 12 weeks after treatment initiation. In fact it was recorded at the end of the study, which was stopped after more than 2 years once a predefined proportion of patients had deteriorated. It means that over 2+ years, 32.9% of patients had a deterioration that was sustained for at least 12 weeks, i.e. not a relapse.

The graph shows the numbers of patients remaining without disability progression at different times, starting at 487 and dropping to 462 at 12 weeks for ocrelizumab, a fall of 5.1%, and from 244 to 232 for placebo, a fall of 4.9%. At 24 weeks, the cumulative falls were 7.6% versus 13.1%. Some of the drop might be due to patients stopping for reasons of tolerability, but this was small, possibly accounting for the small numbers of drop-outs between assessments every 12 weeks. For a 12-week confirmed disability progression, there will clearly be a lag in identifying patients whose increase in disability is sustained for 12 weeks, yet the time points do not appear to include this extra 12 weeks, because there is a first jump at 12 weeks in both groups. The at-risk numbers eventually drop to zero, however, not to the 60% of patients who appear not to have progressed; this is likely because patients who enrolled later were censored when the study was terminated for them before week 216. Nevertheless, factors such as drop-outs due to tolerability and the end of the study probably explain the difference between the figures in the results and the plateau levels on the graphs.
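The at-risk arithmetic quoted above can be reproduced directly. This is a sketch using the patient counts as read off the trial figure.

```python
# Early falls in the at-risk numbers, as read off the trial figure.
def fraction_lost(at_start: int, remaining: int) -> float:
    """Fraction of the starting group no longer counted as progression-free."""
    return (at_start - remaining) / at_start

ocrelizumab_12wk = fraction_lost(487, 462)
placebo_12wk = fraction_lost(244, 232)

print(f"Week 12: ocrelizumab {ocrelizumab_12wk:.1%}, placebo {placebo_12wk:.1%}")
# Matches the ~5.1% versus ~4.9% quoted above.
```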

What is interesting is that the difference between ocrelizumab and placebo diverged very early on the graph, and did not really widen further over 2 years. While the 12-week sustained disability measure was designed to eliminate the possibility that the study was scoring relapses in previously primary progressive disease, or some other temporary factor such as injury from a fall or intercurrent infection, there is nevertheless a suspicion that ocrelizumab was mainly working well in a small subset with more active disease. The extra 3 percentage points of patients with gadolinium-enhancing lesions – a proportional difference of about 12% – unfortunately suggests a potential issue with randomisation; this might be precisely the group who could respond better.

It is noteworthy, therefore, that in its most recent NICE appraisal the criteria for considering ocrelizumab are not those of this study, but a subset of primary progressive patients with enhancing disease on MRI.

The journal club article described in this post was kindly presented by Dr Bina Patel, Specialist Registrar in Neurology.
