We started the dive at Xwejni Bay. We made our way out for about 20 minutes until we came to the drop-off. That first peek at the depths below is an exhilarating experience in itself. Then, we let ourselves descend along this submerged cliff until we reached the bottom.

From here, we proceeded westward along the cliff’s wall, making our way to the Double Arch – a marvel of underwater geomorphology that is aptly described by its name.

We were greeted by a number of amberjacks that seemed very much unfazed by our presence – no doubt thanks to our peacefully silent gliding made possible by the use of bubble-free rebreathers.

Following this we proceeded with a gradual ascent and a leisurely trip back for a total runtime of 142 minutes. A dive to be repeated! A short video clip is found below.

Firstly, a few words of note.

The configuration of the two rebreathers is different. To mention but two examples: the JJ uses an axial canister whereas the X-CCR uses a radial one, and the JJ is fitted with a DSV whereas the X-CCR uses a BOV. The available CE test data are for the units set up as per the above respective configurations.

The CE testing documentation for the JJ states that tests were carried out at a pitch angle of 74^{o} (with the exception of the testing of hydrostatic imbalance vs. pitch angle, of course, because in that case the pitch angle *needs* to be varied since that is precisely what is being tested). Tests of the X-CCR were presented at pitch angles of 0^{o} (i.e. “in trim”) and 90^{o} (i.e. vertically upright). The reader should note that in some cases, a perfect, direct comparison of the two units is not possible due to the different pitch angles at which data were collected for each respective unit. This should be borne in mind throughout.

Tests at 100m using trimix were carried out using a similar (but slightly different) mix for each unit. In the case of the JJ it was 11/65 whereas for the X-CCR a (slightly less dense) mix of 10/70 was used.

In each of the below figures, I have included the EN14143 threshold in red. You do not want to approach or (worse) exceed the threshold. Exceeding the established threshold compromises the ability to obtain CE certification. The further away the data lie from this threshold, the better.

**WORK OF BREATHING**

As the name implies, work of breathing denotes the amount of work required to breathe. When we draw in a breath, our muscles cause our lungs to expand, and if this process encounters any resistance, the work required to expand the lungs will be higher.

Work of breathing (WOB) is described via the simple equation: WOB=0.5+0.03×RMV, where RMV stands for Respiratory Minute Volume. The latter simply means the volume of gas inhaled (or exhaled) per minute. The higher the RMV, the higher the WOB.
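As a quick numerical illustration of the relation quoted above (a minimal sketch; the function name is my own, and I attach no units beyond those implied by the text):

```python
def wob(rmv):
    """Work of breathing per the relation quoted above: WOB = 0.5 + 0.03 * RMV."""
    return 0.5 + 0.03 * rmv

# Higher RMV -> higher WOB, as stated in the text.
print(f"{wob(40):.2f}")  # RMV of 40 -> 1.70
print(f"{wob(75):.2f}")  # RMV of 75 -> 2.75
```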

Based on the available data, both at 40m (using air) and 100m (using trimix), the WOB for the JJ and the X-CCR is virtually identical, with the JJ’s figures (at 74^{o}) running approximately between the X-CCR’s 0^{o} and 90^{o} tests.

*If* the performance between the two units were identical, this would kind of be expected, since 74^{o} sits between the other two angles (0^{o} and 90^{o}). However, it is very important to note that one cannot comment further basing on this plot; **a perfect comparison cannot be drawn from this plot since the pitch angles at which the tests for the two respective units were carried out are different**. **A more direct comparison can be made when considering the next test: hydrostatic imbalance.**

**HYDROSTATIC IMBALANCE**

The counterlungs (CLs) are rarely located at the exact same depth as the diver’s lungs; this means that there will be a difference between the ambient pressure acting upon the CLs and that acting upon the diver’s lungs. This results in the diver breathing at lower or higher volumes, which they will attempt to resist via muscle tensioning.

For example, in the case of a back-mounted configuration, the CLs lie atop the diver’s back, which means that when the diver is swimming in trim, the ambient pressure on the CLs will be lower than that acting upon the diver’s lungs, simply because the CLs are located higher up in the water column (i.e. they are positioned at a shallower depth). As a result, drawing in a breath will encounter some resistance (negative imbalance) such that breathing is performed at lower volume and inhalation feels difficult.

On the other hand, for a chest-mounted configuration, the CLs lie at greater depth, so the pressure on them is higher than on the diver’s lungs. In this case, the inverse occurs: drawing in a breath will feel easier (as it is easier for gas to move from higher pressure to lower pressure) and breathing is performed at higher volume, but exhalation feels more difficult (as this time round the diver is pushing gas out against a pressure gradient).
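To put a rough number on this effect, the static pressure difference is just ρ·g·Δh. A minimal sketch with illustrative values (the seawater density and the 15 cm counterlung offset are my assumptions, not figures from the CE documents):

```python
# Hydrostatic pressure difference between counterlungs and lungs.
# All numbers are illustrative assumptions, NOT from the CE test documents.
RHO_SEAWATER = 1025.0  # kg/m^3: typical seawater density (assumption)
G = 9.81               # m/s^2: gravitational acceleration
DEPTH_OFFSET = 0.15    # m: counterlungs ~15 cm shallower than the lungs (assumption)

delta_p = RHO_SEAWATER * G * DEPTH_OFFSET  # pressure difference in pascals
print(f"{delta_p:.0f} Pa (~{delta_p / 100:.0f} mbar)")  # -> 1508 Pa (~15 mbar)
```

Even a modest ~15 cm offset therefore corresponds to a pressure difference of the order of 15 mbar, which is why hydrostatic imbalance is noticeable to the diver.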

The hydrostatic imbalance test measures this difference in pressure, and the test is typically carried out with respect to the suprasternal notch. The CE test involves two test cases:

(1) Maintaining the same roll angle but varying the pitch angle.

(2) Maintaining the same pitch angle but varying the roll angle.

**Part 1: Varying Pitch Angle**

This test maintains a fixed roll angle of 0^{o} whilst varying the pitch angle.

**The JJ shows better performance than the X-CCR between about -45^{o} and 90^{o}**, i.e. in the range that is arguably most commonly encountered by divers. The JJ has a single restriction at -90^{o} (i.e. fully inverted).

In practical terms, this restriction is a moot point as (1) divers will rarely be diving inverted, with their head upside down, and (2) if they do, they can use the flow-stop to shut off the ADV and add diluent via the diluent MAV as required.

**Part 2: Varying Roll Angle**

This test maintains a fixed pitch angle of 0^{o} whilst varying the roll angle. Here, **throughout virtually the entire range of roll angles, the JJ performs better than the X-CCR**, but both units are within EN14143 specs.


For the interested reader, note that there are more data available for the JJ, as its tests were carried out at a large number of pitch and roll angles, i.e. a more continuous range of angles. These additional data points are shown on a graph that can be found on page 13 of the JJ’s CE documentation.

**ELASTANCE**

Elastance describes the resistance of an inflatable vessel of some sort (e.g. the lungs) to expansion when a force is applied. If it takes a lot of force to make the vessel expand, then it has high elastance, and vice versa. Equivalently, elastance denotes the tendency of said vessel to recoil back to its original volume once the force that had been acting upon it is removed.
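In symbols, elastance is the pressure change required per unit change in volume, E = ΔP/ΔV. A toy illustration (the numbers are entirely made up):

```python
def elastance(delta_p, delta_v):
    """Elastance = pressure change / volume change, e.g. in mbar per litre."""
    return delta_p / delta_v

# Made-up numbers: 2 mbar of pressure needed to expand the breathing loop by 0.5 L.
print(elastance(2.0, 0.5))  # -> 4.0 (mbar/L); the higher the value, the 'stiffer' the loop
```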

**40m (Air)**

**At pitch angle=0^{o}, the X-CCR performs slightly better than the JJ (pitch angle=74^{o}).**

**100m (Trimix):**

**At pitch angle=0^{o}, the X-CCR performs slightly better than the JJ (pitch angle=74^{o}).**

**PEAK TO PEAK RESPIRATORY PRESSURE**

**40m (Air)**

**At both pitch angles of 0^{o} and 90^{o}, the X-CCR performs better than the JJ**.

**100m (Trimix)**

**At pitch angle=0^{o}, the X-CCR performs slightly better than the JJ (pitch angle=74^{o}).**

**SOURCES**

**JJ-CCR Test Results:**
http://jj-ccr.com/wp-content/uploads/2018/06/QQ-14-01561-JJ-CCR-June-2014-V1.pdf

**X-CCR Test Results:**
https://www.facebook.com/XCCRrebreather/photos/a.1244863105537034/1888105841212754/?type=3&theater

https://www.facebook.com/XCCRrebreather/photos/a.1244863105537034/1888106431212695/?type=3&theater

https://www.facebook.com/XCCRrebreather/photos/a.1244863105537034/1888106047879400/?type=3&theater

**Meg-CCR data:**
https://www.megccr.com/wp-content/uploads/2013/10/Importance-of-Testing-Standards.pdf

DISCLAIMER:

This post may not be free of error. Any use of this data for dive planning purposes is the sole responsibility of the reader, and can result in serious injury or death. The author assumes no responsibility for the use, be it for diving or any other purpose on the part of the reader, of any of the content presented on this website.


We took in the beauty of the site of the remains for a few minutes before returning the same way we came, and exited once again at the Inland Sea. We encountered some interesting marine life during this dive, including some big fish busy at their hunting game.

The rocks of the former Azure Window might have lost the glowing-white look they had when they first slipped beneath the waves, but the relatively new landscape is as stunning as the first day it was formed.

We departed in hot and sunny conditions – a typical Maltese August day. The sea was as inviting as it gets: a deep alluring blue that calms the soul.

On our way to the dive site we enjoyed tremendous views of Gozo’s majestic cliffs, glowing in morning sunlight.

After a final pre-dive check on the rebreathers, we jumped in the water for a 75-minute dive (max depth: 38m).

It was a beautiful dive. The underwater landscape at the site of the collapsed window is terrific – and whilst this was not my first dive here (indeed I had dived the site with friends very soon after the window’s collapse, and a few more times since then), it did not fail to impress me yet again. This time round I could also spend more time at depth since I was diving a CCR, and since both entry and exit points were right above the site (it being a boat dive), we could spend all our time at the site, with no swimming to a faraway exit point required. Still, one thing is for sure: next time we’ll be spending even more time here!

Upon surfacing we were welcomed by a complete change in weather conditions: stormy skies with thunder, lightning and rain.

We all agreed that we simply have to visit and dive this site again very soon.

We spent a solid 90 minutes practising multiple skills and emergency drills; a most productive and valuable dive which will certainly be repeated in the near future!


A staple of discussion amongst divers is the question of which decompression algorithm they use. From Haldanean to RGBM, to Bühlmann + Gradient Factors (GFs) and VPM-B (and any proprietary combination you care to throw in between), the options are considerable.

One of the pertinent questions on the topic is the issue of whether to adopt:

(1) **A bubble modelling approach** (e.g. VPM-B) that inserts deep stops into one’s dive profile (‘Deep Stops Profile’ hereafter), or

(2) **A dissolved-gas model** (e.g. a Bühlmann algorithm) that cuts down on deep stops in favour of longer shallower stops (‘Shallow Stops Profile’ hereafter).

In this article I assume that the reader has some basic familiarity with the two classes of model in question, so I will not dwell on that here. I will only say that from a theoretical standpoint, there are arguments that would favour either approach. So how could we decide between these two options? In science, such a decision can only be taken on the basis of empiricism, i.e. experiment. In other words, carry out a survey of an appropriate sample of dives on profiles dictated by both approaches, and pick the one that results in a smaller probability of decompression sickness (DCS) cases following the choice of an appropriate statistical test. This was the motivation of a 2008 study by the U.S. Navy Experimental Diving Unit (NEDU), authored by David Doolette, Wayne A. Gerth and Keith A. Gault and made public in 2011.

Much has been discussed already about this study, particularly on online fora, sometimes leading to very heated back-and-forths. Part of the discussion revolved around the question of equivalence (or, as the opponents argued, lack thereof) between the deep stop profiles adopted in the NEDU study and the actual deep stops profiles followed by divers in real-world situations; the arguments raised were answered there. **This article has nothing to do with that matter.**

**In this post, I wish to focus on** a different question altogether, namely **the interpretation of aspects of the NEDU study, specifically the statistical significance of the probability of DCS (for the two approaches) presented by the study**. I am referring in particular to Figure 2 of that report, which is reproduced below, and which shows the DCS incidence for the two approaches:

Subsequent sections of the NEDU report include an investigation of Venous Gas Emboli (VGE) grades and a theoretical discussion of the reported results on DCS and VGE outcomes, but the main thrust of this study is the incidence of DCS as presented in the above figure, which, at least in my experience, is also the result most often referred to by divers who are aware of this study.

Let me clarify an important point at the outset: my aim with this article is neither to support nor dismiss this report. The NEDU investigation is a careful study and the authors’ effort should be commended. The reason for this article is that I have time and again encountered arguments amongst divers that misinterpret the findings, very often because they’ve either not read the study itself, or because of a misunderstanding of the subtleties of statistical analysis. What I intend to achieve with this post is to elucidate what that plot is actually telling us, and the statistical significance associated with it.

Without further ado, let us dive in.

The authors are interested in finding out whether deep stops profiles dictated by a bubble model are more efficient than shallow stops profiles dictated by a gas content model. The gas-content model in use was VVAL 18 Thalmann, whereas the bubble model was BVM(3).

There are many different kinds of statistical analysis you can employ, and oftentimes the choice is dictated by the type of the experiment being carried out. The statistical analysis used in the NEDU study involves what is known as an **exact test**. More specifically, the test in use is known as **Fisher’s exact test**, named after its inventor, biologist and statistician Ronald Fisher.

What an exact test allows you to do in practice is the following: Suppose you want to find out whether there is a relationship between two given phenomena. You begin by assuming what is known as the **null hypothesis**. The null hypothesis declares: “There is no relationship between the two phenomena.” Next, you try to reject this null hypothesis (i.e. nullify it, hence the name). If you manage to reject/nullify it, in effect you can say that there *is* a relationship between the two phenomena in question; in other words, you’ve found support for your **alternative hypothesis** that there *is* a relationship.

An exact test gives you the ability to be specific about your claim. First, before anything else, you choose what we call a **significance level**. Let us say you choose a significance level of 5%. That means that the study will have a 5% probability that it rejects the null hypothesis *even if it were true*. So in other words, **at a significance level of 5%, you can expect your study to mistakenly reject the null hypothesis 5% of the time in the long run**.

Now, let us say you are ready to accept this 5% “risk” of mistakenly rejecting the null hypothesis. You next carry out your experiment. Following this, you run your test statistic (e.g. Fisher’s exact test) on your result, and you get out **a number that represents the probability of obtaining your result IF the null hypothesis (which you want to reject) is actually true**. This number is known as the **p-value**. As you can infer, you don’t want this to be a large number.

**If the p-value you get is small, specifically smaller than the 5% significance level** you’ve adopted, then there’s too small a probability that you would obtain your result if the null hypothesis were true.

You therefore conclude that the null hypothesis is implausible, and **you reject the null hypothesis**. On the other hand, **if the p-value is higher than your adopted significance level, you cannot reject the null hypothesis**.

Many research papers, including this study, employ this 5% threshold. This is merely an accepted convention, not some deeply revered number. Mathematically, this threshold is denoted by the Greek symbol α (alpha), such that an alpha level (also known as significance level) of 5% is written as α=0.05 (because 5% = 5/100 = 0.05). If the result of your test statistic yields an outcome (p-value) smaller than 0.05, then your result is statistically significant. Say you get a p-value of 0.02. 0.02 is less than 0.05, so you’re good.

So in summary: **the p-value obtained by your test statistic has to be less than the adopted significance level, α, if your result is to be deemed statistically significant. **

Now, one last word. If you’re not happy with a 5% threshold, i.e. if you feel that the chance of mistakenly rejecting the null hypothesis 5% of the time is too high for your comfort, then you can choose to adopt a smaller number (e.g. 1% or 0.01) for your study. That means that your p-value has to be less than 0.01 in order for your result to be statistically significant. The choice of threshold boils down to the following question: **what level of risk of wrongly rejecting the null hypothesis am I happy with?** 5%? 3%? 1%? Many papers use 5%. Some choose a more stringent threshold.

In physics, and in particular particle physics, the convention is that we use a p-value less than 0.003 (what we call a 3-sigma event) to say that a given result constitutes evidence for a phenomenon (e.g. evidence for a new particle), and a p-value less than 0.0000003 (a 5-sigma event) to call the result a discovery. This is much more stringent than most scientific papers.
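These sigma thresholds can be reproduced from the Gaussian tail probability (note that the quoted 0.003 corresponds to the *two-sided* 3-sigma tail, 2 × 0.00135, whilst the quoted 0.0000003 is the *one-sided* 5-sigma tail). A small stdlib-only sketch:

```python
import math

def tail_p(n_sigma, sides=1):
    """Gaussian tail probability of an n-sigma deviation (one- or two-sided)."""
    return sides * 0.5 * math.erfc(n_sigma / math.sqrt(2.0))

print(f"3-sigma, two-sided: p = {tail_p(3, sides=2):.4f}")  # ~0.0027, i.e. the ~0.003 cutoff
print(f"5-sigma, one-sided: p = {tail_p(5):.1e}")           # ~2.9e-07, i.e. the ~0.0000003 cutoff
```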

Now, the way you proceed with this article from this point onwards depends on you. **If you feel like going into the details** of what a Fisher’s exact test is, and in particular would like to see some numbers, **then the green box below is for you**. If not, I am giving you the option to close the box (upper right corner) and proceed with the rest of this article.

Right, we are ready to move on to the NEDU study and apply our newfound knowledge of statistical testing.

Putting the NEDU study within the statistical framework we detailed above, this is what it’s saying:

In such a case, we will be using the same statistical test chosen by the authors, a *one-sided* Fisher’s exact test, with the alternative hypothesis being “greater” (i.e. we expect a larger probability of cases with a good outcome). If we do NOT find a statistically significant result according to this test, we do not have strong support (alternatively, our evidence is weak) that a deep stops profile is better. And we would say: “OK, we’ve tried to find evidence that a deep stops profile is better, but we have not found it, so we have no statistical grounds to abandon the shallow stops profile.”

If, on the other hand, while testing in this direction (that a deep stops profile is more efficient than a shallow stops profile) we indeed find support that a deep stops profile might be more efficient than a shallow stops profile, we say we have a statistically significant result. Now, we might be tempted to shout “hey, clearly a deep stops profile is better”. However, really, we should be careful. We have tested only in one direction. We decided a priori that our null hypothesis was going to be that “a deep stops profile is AS efficient as, or LESS efficient than, a shallow stops profile”, and are interested only in the result that a deep stops profile is more efficient than a shallow stops profile.

However, we really ought to entertain the possibility that a shallow stops profile might be better than a deep stops profile, given that our departure point should be that we do not know. Accounting for this possibility by adopting a two-sided (or two-tailed) test reduces the statistical significance of a positive result in the first direction we were testing. The takeaway here is the following:

(1) **Testing in the same direction as the result you expect and obtaining a p-value larger than your significance level (a “negative” result)** constitutes a lack of statistical significance; you are “free” to stick to the “old” approach you have been using until now (pending a larger study perhaps). You cannot reject the null hypothesis because you don’t have a statistically significant result.

(2) **Testing in the same direction as the result you expect and obtaining a p-value less than your significance level (a “positive” result)** constitutes statistical significance, but does not provide enough justification to deem the supported approach conclusively better than the other UNLESS you also test in the opposite direction.

In summary, **ideally we should carry out what is known as a two-sided test, testing in both directions of possibility**.

As it turns out, and for a good reason which will be described in the next section, the authors of the study did not proceed with the hypothesis framework described in the “NEDU Original Framework” box above.

Given that the end-result of this study was DCS, the authors rightly tried to minimise unnecessary injury to test subjects. Therefore, they decided in advance that once they would reach the midway point (i.e. once they would reach 188 test dives out of the envisaged total of 375), they would “pause” and analyse the results, and if a significantly greater incidence of DCS was found for the deep stops profile than for the shallow stops profile, the trial would be put to a stop right there and then.

Putting their midpoint analysis in a statistical framework:

In this case, we shall again be using the one-sided Fisher’s exact test (chosen by the authors), with the alternative hypothesis being “less” (i.e. fewer cases of good outcome). If we test in this direction and indeed find that there are fewer good outcomes, then we might say, “hey, we tested in this direction (fewer cases of good outcomes) and we find support for that. We don’t want to take further risk and possibly injure our subjects. So we’ll stop our experiment here”.

However, as we saw above (in the orange box), finding support **in the direction you are testing** is NOT sufficient to conclude that this is necessarily the better approach.

Basically, we have excluded testing in the other direction (alternative hypothesis being “greater”). We have not tested the possibility that a deep stops profile might be better (i.e. gives a larger probability of good outcomes) than a shallow stops profile. Not testing in this other direction **as well** means that **we cannot decisively choose which is the best out of the two options of dive profile**.

Let me be clear: from an ethical point of view, in terms of protecting the test subjects of the study, the approach of stopping the trial if you found one-sided statistical significance that the new approach (a deep stops profile) yields a smaller probability of good outcomes is justifiable. **Given the context within which the study was carried out, namely the fact that the U.S. Navy would depart from continuing to use shallow-stop profiles ONLY in case of the “finding of significantly lower P_{DCS} [probability of DCS] for the bubble model schedule [deep-stops profiles] than the gas content model schedule [shallow-stops profile]”, the choice of a one-sided test is appropriate**, as the authors point out.

What the above approach does NOT inform us about, is which of the two approaches (deep stops or shallow stops) is “the best”. In order to come to solid conclusions, we need to carry out a two-sided test that entertains both possibilities, because we simply do not know a priori which is the best of the two. **When trying to decide between two possible approaches, with the point of departure being that we do not yet know if there is any difference between the two and are equally interested in either outcome, then we carry out a two-tailed test. **

Let us say we are trying to establish whether there is any difference between two approaches, let’s call them A and B. Is the incidence of successes (good outcomes) equal for both? We would like to show that it is not. Our null hypothesis is that there is no difference between the two approaches, and that both yield the same result. (A “neutral” null hypothesis.)

So next we carry out the experiment. Let us say, for the purpose of this example, that we find A seems to be giving us a greater number of successes than B. We then run our two-sided test statistic, say a two-tailed Fisher exact test, and get a low p-value, smaller than our adopted significance level. In such a case, we can say that there is a difference between the outcomes of A and B at a statistically significant level. (If our p-value is found to be higher than the significance level, all we can say is that we do not have a statistically significant result. Full stop.)
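As a sketch of this procedure with entirely hypothetical counts (approach A: 40 successes out of 100 trials; approach B: 20 out of 100), using scipy’s implementation of Fisher’s exact test:

```python
from scipy.stats import fisher_exact

# Hypothetical 2x2 table (illustration only): rows = approaches A and B,
# columns = successes and failures out of 100 trials each.
table = [[40, 60],
         [20, 80]]

odds_ratio, p_two_sided = fisher_exact(table, alternative="two-sided")
print(f"two-sided p = {p_two_sided:.4f}")  # well below 0.05 for these counts
```

For these made-up numbers the two-sided p-value falls well below 0.05, so we would conclude that A and B differ at a statistically significant level.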

In other words, within a statistical framework, we would like to pose our question about the two decompression approaches as:

However, as we’ve mentioned before, the authors carried out a **one-sided** Fisher’s exact test (with α=0.05) at the midpoint analysis. What they are interested in here is the question of whether the deep stops profile is less efficient than a shallow stops profile (that’s the alternative hypothesis), lest they continue with the experiment and end up hurting the test subjects. They want to reject (the null hypothesis) that the deep stops profile is as efficient as, or more efficient than, a shallow stops profile. Indeed, when the problem is framed this way, there is statistically significant support for their alternative hypothesis (smaller probability of DCS-free outcomes in a deep stops profile than in a shallow stops profile), and that justified the ending of their trial on ethical grounds. However, as we have seen, this result does not conclusively tell us which approach (deep stops profile or shallow stops profile) is more efficient. It merely tells us that, up to the number of cases they tested, they found a result that supports their expected result.

To make this as clear as possible, let me remind you what we said earlier on: just like many research papers, this report adopts the conventional significance level of 0.05 (i.e. it has a 5% chance of mistakenly rejecting the null hypothesis); that means that **the test result is deemed as statistically significant if the associated p-value is less than 0.05**. If it’s more than 0.05, it is deemed NOT to constitute a statistically significant finding. One could argue that 0.0489 is fairly close to 0.05, but strictly speaking, it passes this test because it’s below the adopted threshold. (The nature of p-values is such that the reader should then decide for themselves whether they find the adopted cutoff and the test result acceptable or not.)

In all this, it is important to keep in mind the context of the original question, and remember that a **one-sided** test is being used.

**If we were to frame our question as in the two-sided framework box above, adopting a two-sided test (i.e. testing in both directions of possibility), we would find a p-value of 0.087**, i.e. there is an 8.7% probability of finding such a difference between the two algorithms *even if* the null hypothesis is true, i.e. even if the deep stops profile is, after all, as efficient as a shallow stops profile. 8.7% exceeds our 5% threshold. This means that **the result of a two-sided test applied to the hypothesis as stated in the two-sided framework box is not statistically significant**.

*(In the above, I have run the math and coded it up myself, but in the interest of space I have only presented the results.)*
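For readers who would like to see how such numbers can be reproduced, here is one way of doing it (my own sketch using scipy, not the authors’ code), applying Fisher’s exact test to the DCS counts discussed in this article: 10 cases in 198 deep-stops dives versus 3 cases in 192 shallow-stops dives:

```python
from scipy.stats import fisher_exact

# 2x2 table: rows = schedules, columns = (DCS, no DCS).
table = [[10, 198 - 10],   # deep stops:    10 DCS in 198 dives
         [ 3, 192 -  3]]   # shallow stops:  3 DCS in 192 dives

_, p_one = fisher_exact(table, alternative="greater")    # one-sided: deep stops worse
_, p_two = fisher_exact(table, alternative="two-sided")  # both directions entertained
print(f"one-sided p = {p_one:.4f}, two-sided p = {p_two:.4f}")
# This article quotes ~0.0489 (one-sided, just under 0.05) and 0.087 (two-sided).
```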

Purely for purposes of illustration, let us take this analysis a bit further. Given the numbers of this study:

(1) **Had there been one more DCS case amongst the deep-stop profiles, i.e. a result of 11/198 DCS cases** (instead of 10/198), a two-sided Fisher exact test would have yielded a p-value of 0.054, i.e. **still NOT a statistically significant result** (adopting the p=0.05 cutoff).

(2) **Had there been one less DCS case amongst the shallow-stop profiles, i.e. a result of 2/192 DCS cases** (instead of 3/192), a two-sided Fisher exact test would have yielded a p-value of 0.036, i.e. **a statistically significant result**.

In case (2), we would have some certainty that there is a significant difference between the two approaches, because testing in both directions yields only a 3.6% chance of obtaining such a result even if the null hypothesis (that there was no difference) were true.
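These two counterfactuals can be checked with the same kind of two-sided test (again a scipy sketch of my own, using the counts stated above):

```python
from scipy.stats import fisher_exact

# Counterfactual (1): 11/198 deep-stops DCS cases vs 3/192 shallow-stops cases.
_, p1 = fisher_exact([[11, 187], [3, 189]], alternative="two-sided")
# Counterfactual (2): 10/198 deep-stops DCS cases vs 2/192 shallow-stops cases.
_, p2 = fisher_exact([[10, 188], [2, 190]], alternative="two-sided")

print(f"(1) p = {p1:.3f}, (2) p = {p2:.3f}")
# The text quotes 0.054 for (1) (not significant) and 0.036 for (2) (significant).
```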

However, there’s a lesson to be learnt here. As you can see, **a tiny change in numbers (essentially just one datapoint) can shift the significance of the result. This is why the ideal way forward would be to collect more data, i.e. test more subjects.** The problem is that more data comes at the risk of harming more test divers. And that’s where ethical considerations come into play. This, I hope, helps you appreciate that indeed, the authors had to face quite a difficult decision. The options are:

(1) Carry out a larger study (i.e. collect more data) and run a two-sided test, potentially getting a statistically significant result that would help us learn more about the difference in efficiency between the two profiles, or

(2) Place more importance on the safety of our subjects, and as soon as we suspect that there might be a risk of injury if we were to continue with our test, we stop.

Given that, as the authors themselves state in the introduction, “whether one approach is more efficient than the other is unknown”, a two-sided statistical test is desirable. Indeed, here I point the reader to Ruxton & Neuhäuser (2010, with whom I wholeheartedly agree) who point out that “we very rarely find ourselves in a position where we are comfortable with using a one-tailed test”. The nature of the NEDU study, however, entailed a real possibility of further injury to test subjects, and so the study foregoes two-sided statistical significance in favour of protecting the divers under study, utilising a one-sided test that delivers a statistically significant result with less data than a more demanding two-sided test would.

As a complete aside which you can skip (hence the fainter font colour), moving away from this study to speak more generally, it seems that a problem in biology literature is that authors often opt to use a one-sided test inappropriately. Lombardi & Hurlbert (2009) carried out a survey of every study published in 2005 in the two journals “Oecologia” and “Animal Behaviour” and found that 17% of the quoted p-values were derived from a one-sided test, whilst in 22% of cases the reader would not be able to tell whether a one- or two-sided test had been carried out. Ruxton & Neuhäuser (2010) also report that from their survey of a total of 359 papers in the journal “Ecology”, 17 (i.e. 5%) employed a one-sided test, and with the exception of one study, Ruxton & Neuhäuser (2010) find that this choice of one-sided testing was not appropriate.

There is one final aspect about Fig. 2 of the NEDU study that we haven’t yet mentioned. You might notice that each bar has a vertical line running through it. This is what is known as a **confidence interval**. Specifically, the authors tell us in the caption that they are using a “Binomial 95% CI” (CI=Confidence Interval). What does this mean?

Let’s say you wanted to present the result of some study you’ve carried out. You could show the mean of your data, for example. That’s a result encapsulated in a single point estimate. However, your readers would also be interested in having an estimate about other plausible values that the parameter in question might take in the population you’re sampling. That’s where the confidence interval comes in.

The confidence interval is giving us a range of plausible values that is likely to encompass the value of the population parameter we are interested in (the DCS incidence in the case of the NEDU study). This will become clearer by means of an example. But before we do that, let me just formally define the level, *C*, of the confidence interval. The level *C* of the confidence interval tells us the probability that the interval we produced contains the true value of the parameter of interest. Here’s an example. **Suppose the level C of the confidence interval is 95%. That means that the probability that the interval (drawn on the figure) contains the true value of DCS incidence is 95%.** As you can appreciate, the smaller the interval is, the better, because the narrower the range of plausible values would be.
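The report’s caption does not say which method was used to compute its “Binomial 95% CI”; one common exact choice is the Clopper–Pearson interval, sketched below for the deep-stops incidence quoted in this article (10 DCS cases in 198 dives; the function is my own, not the authors’):

```python
from scipy.stats import beta

def clopper_pearson(k, n, conf=0.95):
    """Exact (Clopper-Pearson) binomial confidence interval for k successes in n trials."""
    a = (1.0 - conf) / 2.0
    lo = beta.ppf(a, k, n - k + 1) if k > 0 else 0.0
    hi = beta.ppf(1.0 - a, k + 1, n - k) if k < n else 1.0
    return lo, hi

# Deep-stops DCS incidence as quoted in this article: 10 cases in 198 dives.
lo, hi = clopper_pearson(10, 198)
print(f"point estimate {10 / 198:.3f}, 95% CI ({lo:.3f}, {hi:.3f})")
```

The interval brackets the point estimate of roughly 0.05, giving the reader a feel for the range of plausible incidence values rather than a single number.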

In scientific experiments it is of paramount importance that if a human decision is involved, the person taking that decision and logging the data is blind as to which trial the data came from, so as to eliminate unconscious bias. The authors properly admit that, for reasons of practicality, this study was not blind*: “*This large man-trial had unique potential for response or diagnosis bias because it was not practical to conceal the diverging DCS incidence on the two schedules, not possible to blind diver-subjects to the schedules, and some DCS presented as subjective symptoms only.*”

Appendix D opens by saying that “*the tables below give the case narratives written by the attending Diving medical Officer*”. Reading through the narratives, one encounters phrases such as the following examples (bold font is my own):

- “*34 year old active duty Navy diver presenting with right shoulder pain beginning 2 hours after surfacing from <gap> dive* **under profile A1**”
- “*37 year old, active duty, male diver with 14 year history of Navy diving and no previous history of DCS injury completed a 170/30* **experimental dive** **(profile A2)**…”
- “*A 37 year old active duty male diver completed* **experimental** *170/30 “deep stops” decompression dive*…”
- “*The test diver surfaced from a 170/30* **(A-2 profile)** **research profile** *at 1223 hrs*.”

As is clear from the above, the DMO knew which dive profile was being followed on a given dive. Despite the best and purest of intentions, a DMO who feels they are assessing an “experimental” dive profile might be more wary than usual, a situation that can lead to subconscious bias. If, on the other hand, the DMO had more faith in the “experimental” profile, this could lead to bias in the opposite direction. So we have a situation where neither the test subject nor the assessor (DMO) is blind to the category in which the dive falls (shallow stops or deep stops). Of course, for added safety, whoever treats the diver should know the case history, so the need for a DMO who is fully aware of the dive history is perfectly understandable. However, this does nothing to prevent potential bias from creeping into the study.

A possible way to mitigate this would be to have two DMOs: DMO 1 is blind to the study and case history, and simply writes down their assessment, which is in turn used for the study. DMO 2 is fully aware of the case history and is given overriding authority in the sense that, for extra safety, if DMO 2 feels the diver should undergo recompression treatment, it is *this* decision that is followed (while DMO 1 is kept blind). This is still not a perfect solution, but it helps to mitigate subconscious bias; DMO 1 bases their decision ONLY upon the symptoms presented, and not on any concern they might have about the fact that the dive was “experimental”.

Bias can go *both* ways, and trying to predict all possible factors of bias is mostly a hypothetical exercise and not a secure way of preventing it. The only way to be sure it’s eliminated is by designing (ideally double) blindness into an experiment.

**The authors comment, however, that the deep stops also generated higher VGE grades than shallow stops, and these are not subject to bias (e.g. a diver cannot increase their VGE grade just by thinking they might be bent).*

I wanted to mention one last thing about the kind of statistical analysis we’ve discussed in this article. The methodology we’ve talked about can be referred to as “classical hypothesis testing”. As we have seen, it is an approach that strives to reject a hypothesis (the null hypothesis), in the process giving us confidence in the alternative hypothesis. It does NOT, however, prove the alternative hypothesis. It is tempting to think that rejecting the null hypothesis automatically means proving the alternative hypothesis, but our alternative hypothesis is just that: an alternative. It is not necessarily the only one; there could very well be others.

So where do we go from here? What is the whole point of this lengthy analysis and discussion of Figure 2 of the NEDU study? What should one take from all this?

First off, on the basis of these numbers, specifically the aspect of this study dealing with the incidence of clinical DCS, **we should be careful about claiming that we know for sure which is the best decompression modelling approach. The data do not really provide a definitive conclusion about this.** I would not be comfortable claiming otherwise; **I would very much like to see a statistically significant result from a two-sided test**. Moreover, the study has some shortcomings, such as the issue of blindness which, admittedly, can be hard to implement in a study of this kind. The study also has strengths; the subjects being Navy divers means that there is a degree of commonality in their fitness that helps to make the sample somewhat more uniform.

There may very well be other studies in progress right now whose results will eventually help us gain a better idea, but until such studies are completed and published, we cannot tell for sure. They might end up confirming the suggestions of this study, or not.

If I were to summarise the points going for and against this experiment, this would be it:

**FOR:**

(1) The **sample is probably fairly uniform**, in that it consists of experienced Navy divers with a degree of commonality in their fitness. (This reduces dependence on variables such as poor cardiovascular health, being overweight, etc.)

(2) A **one-sided test yields a statistically significant result**.

(3) **VGE counts seem to agree** (although VGE counts are not tantamount to DCS; the endpoint is clinical DCS).

**AGAINST:**

(1) **If we apply a two-sided test, we do not get a statistically significant result**. The study did not use a two-sided test (a test which makes it harder to achieve statistical significance). Carrying out a two-sided test effectively means that we would not be focusing on just one direction: our starting point is that we do not know whether there is a difference between the two decompression approaches and, if there is, in which direction. If we take this approach, the experiment does not yield a statistically significant result. More data would be desirable, as can also be appreciated from the fairly wide confidence intervals.

(2) **The study is not a blind experiment**. Neither the subject nor the assessor (of whether a diver has suffered DCS or not) was blind as to which dive profile was followed.

(3) The combination of (1) and (2), i.e. the **combination of being a non-blind experiment and making use of a one-sided test**, decreases the robustness of the study.
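The one- versus two-sided distinction in point (1) can be made concrete with a small sketch. This is not the test the NEDU authors used, just a standard two-proportion z-test on made-up counts; it shows that for this symmetric test the two-sided p-value is exactly twice the one-sided one, which is how a result can clear the 0.05 bar one-sided yet fail it two-sided.

```python
import math

def norm_cdf(x):
    """Standard normal CDF, computed via the error function."""
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))

def two_proportion_test(k1, n1, k2, n2):
    """Two-proportion z-test.

    Returns (z, one-sided p, two-sided p), where the one-sided
    alternative is that group 2 has the higher incidence.
    """
    p1, p2 = k1 / n1, k2 / n2
    pooled = (k1 + k2) / (n1 + n2)
    se = math.sqrt(pooled * (1.0 - pooled) * (1.0 / n1 + 1.0 / n2))
    z = (p2 - p1) / se
    p_one = 1.0 - norm_cdf(z)
    p_two = 2.0 * (1.0 - norm_cdf(abs(z)))
    return z, p_one, p_two

# Made-up counts for illustration only (NOT the study's data):
z, p_one, p_two = two_proportion_test(4, 200, 10, 200)
```

Halving the p-value is precisely the "discount" a one-sided test buys, and it is only legitimate when the direction of the effect was fixed before looking at the data.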

For my diving, **I do take into account the findings of the NEDU study**.

Specifically, I am diving on a Bühlmann ZHL-16C + GF algorithm, with a GF selection of 30/70. However, for added conservatism, once I’m in the shallows I usually try to clear the subsequent (shallower) stop while still holding the stop below. Moreover, I tend to pad my last (shallowest) stop by a few minutes as a further precaution. In this regard, my usual dive profile ends up looking more like a 30/65 than a 30/70. As for the low GF of 30, I am comfortable bringing that up to 40 or 45 (in effect moving further away from deep stops), but that’s as high as I’m happy to increase it for now.
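For readers unfamiliar with what the two GF numbers actually do, here is a minimal sketch, assuming the common implementation in which GF low applies at the first (deepest) stop, GF high at the surface, and the factor is interpolated linearly in between. The a/b coefficients shown are those of the first Bühlmann ZH-L16 nitrogen compartment, used purely for illustration; this is not a dive-planning tool.

```python
def gf_at_depth(depth_m, first_stop_m, gf_low=0.30, gf_high=0.70):
    """Linearly interpolate the gradient factor from gf_low at the
    first (deepest) stop to gf_high at the surface."""
    if depth_m >= first_stop_m:
        return gf_low
    return gf_high - (gf_high - gf_low) * depth_m / first_stop_m

def allowed_tissue_pressure(p_amb_bar, a, b, gf):
    """Maximum tolerated inert-gas tension for one compartment:
    ambient pressure plus a GF-scaled fraction of the Buhlmann
    overpressure margin (M-value minus ambient pressure)."""
    m_value = p_amb_bar / b + a  # unmodified Buhlmann limit
    return p_amb_bar + gf * (m_value - p_amb_bar)

# Illustrative use: compartment-1 N2 coefficients, at the surface (1 bar),
# with a 30/70 GF setting and a (hypothetical) first stop at 21 m:
limit = allowed_tissue_pressure(1.0, a=1.1696, b=0.5578,
                                gf=gf_at_depth(0.0, 21.0))
```

With GF = 1 the full Bühlmann M-value is allowed; with GF = 0 no supersaturation at all; a GF high of 0.70 permits 70% of the margin at the surface, which is where the added conservatism of lower GF values comes from.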

Pending further studies, I do not have evidence that deep stops are more efficient, so overly stressing their importance is not a sensible approach. There are also other reasons for my choice, such as the seemingly higher median VGE grade described in the second section of this study (although, admittedly, a higher VGE grade is not tantamount to DCS) as well as *theoretical* considerations, but this article is long enough as it is, so I will not go into these topics here. Perhaps a post for another time. In the meantime, **despite my current choice, I’m keeping my mind open**.

Always remember that no set of gradient factors is universal in its ability to protect one from DCS. Each person’s physiology is different, and each of us can tolerate a different amount of decompression stress. Also, no two dives are exactly the same. The above simply seems to be working fine for me. Whichever dive algorithm you follow, you cannot guarantee you will not get bent. Your computer does not know how fit you are, what your weight and age are, whether you’re well-hydrated, cold, well-rested or stressed out. What you should do is take as many precautions as you can. I like to joke that diving is the sole activity in my life where I wholly embrace conservatism.

I dive because I enjoy this activity. I simply do not understand people who try to get out of the water as quickly as possible towards the end of a dive. Unless you’re freezing cold or otherwise uncomfortable, what could possibly be so annoying about staying a few extra minutes in the water beyond what your computer prescribes? Aren’t those few additional minutes a worthy precaution against landing for a few hours in a hyperbaric chamber? **Cutting down on deco time just to be the first one out of the water is nothing to be proud of.**

I hope this post will help clear any misunderstandings and misinterpretation (one way or another) of the NEDU data.

**Always strive to stay informed, and if you ever hear anyone claiming loudly with an air of certainty the superiority of their approach, be sceptical… VERY sceptical**.

**Disclaimer:**

This post may not be free of error. Any use of this data for dive planning purposes is the sole responsibility of the reader, and can result in serious injury or death. The author assumes no responsibility for the use, be it for diving or any other purpose on the part of the reader, of any of the content presented on this website.

*Joseph is an astrophysicist by profession who divides his time between thinking & teaching about space, breathing underwater, and taking pictures of both (and anything in between).*
