House Price Crash Forum

Archived

This topic is now archived and is closed to further replies.

cica

Statistics Question


I believe that for many years Concorde was the safest airliner in the world, measured by number of flights per crash (many flights and no crashes). That is only one measure of safety: I have just read that its tyre damage rates were sky high, which was a possible flag for concern.

Once it crashed in Paris, its small number of flights relative to other major airliners meant that its 1 crash made it statistically a bit of a death trap.

In statistics, can you apply a process like this to this situation?...

A Boeing 747 might have had 50,000 flights with 1 crash.

Concorde has 1,000 flights and no crashes. What does this data say about this situation? Not necessarily much but we could do a little experiment...

Assume that the 1001st flight will crash leaving it with 1 crash per 1,001 flights which would then highlight a possible issue with the data.

Does doing this measure the quality of the data at all? Does it in any way measure the weakness of the data?

I've tried studying concepts such as statistical significance but never truly understood it.


I would guess that the absence of one of the stats in one or more of the data sets makes them useless for comparison purposes.


You could say that at flight 1,000, all we know is that the probability of a crash is probably smaller than about 1/1,000. That's not quite the full story, because datasets have statistical fluctuations: the true probability could be 1 crash in 950 flights and you could still have got to flight 1,000 without a crash.

I believe that in risk analysis, though, it's not just the probability you use but the product of probability and consequences. So an unlikely scenario with catastrophic consequences can be considered more risky than a likely scenario with minor consequences.


I believe for many years Concorde was the safest airliner in the world based on number of flights per crash (many flights and no crashes) (one measurement of safety - I have just read that tyre damage rates were sky high which was a possible flag for concern).

Once it crashed in Paris, its small number of flights relative to other major airliners meant that its 1 crash made it statistically a bit of a death trap.

<snip>

The way I might phrase the comparison between the two airliners, as a layman not a statistician, would be

Concorde crash rate - better than 1 in 1,000

747 crash rate - better than 1 in 49,999


The sample size is insufficient to be meaningful.

That never stopped a political statement being made about a thing... EVER.


Working out the likelihood of an event which has not occurred can seem like an oxymoron, as in the example above, where you have observed 1,000 flights with no crashes recorded. But the key is to turn the issue around a bit and ask: "What are the chances of no crash occurring in 1,000 flights if the likelihood of a crash on any particular flight was x%?"

So, if the supposed likelihood of a crash on a particular Concorde flight was, say, 50%, then the chance of seeing the observed 1,000 flights without incident is very small indeed, small enough that you can confidently state that the likelihood of a crash is almost certainly much less than 50% per flight. Note 'almost certainly': it is not completely assured.

Ultimately, there are agreed standards for deciding the point at which you can say 'statistically, this is true'. In the above example you could confidently state "The likelihood of Concorde crashing on any particular flight is substantially less than 50%" without issue, because the chance of that statement being wrong is below the agreed threshold.

Generally, the best plan is to state the confidence with which you can state that something is correct. This leads you into a bit more maths as in order to do this you need to start talking about what you have observed (zero crashes) versus what you might have expected to see in your observed dataset 1,000 flights, given a particular assumption (50% crash rate).

So if Concorde truly had a 50% chance of crashing on any particular flight, you might expect to see 50% of 1,000 = 500 crashes (you would have run out of Concordes, and probably paying passengers, long before this of course). How likely is what you have observed (0 crashes) versus what you might have expected to observe if the 50% assumption is correct? (There is a vanishingly small likelihood of observing 0 in 1,000 flights with an expected value of 500.) How significant is your result in light of this? (Very significant: you could confidently debunk a rival operator who, e.g., claimed that Concorde has a 50% chance of crashing!)
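
To put rough numbers on that, here is a minimal Python sketch. It assumes every flight is independent and has the same crash probability p, which is a simplification, and the function name is just for illustration. Under that assumption the chance of n flights with no crash is (1 - p)^n.

```python
def prob_no_crash(p, n):
    """Chance of n crash-free flights if each flight independently
    crashes with probability p (a simplifying assumption)."""
    return (1.0 - p) ** n


# Try a few hypothetical per-flight crash probabilities against 1,000 clean flights.
for p in (0.5, 0.01, 1 / 950, 1 / 1000, 1 / 50_000):
    chance = prob_no_crash(p, 1000)
    print(f"p = {p:.6f}: P(no crash in 1,000 flights) = {chance:.3g}")
```

At p = 50% the result is vanishingly small, so that hypothesis can be thrown out. At p = 1 in 1,000 it comes out at a bit over a third, so 1,000 clean flights on their own cannot rule out a crash rate of that size.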

A bigger dataset, in this case more flights, simply means you can home in on the chance of crashing with a greater degree of confidence. Ideally, from a stats point of view, you would wish to conduct an infinite number of flights, at which point the chance of crashing would be known with certainty. In the absence of this possibility, you can get a very good estimate, within agreed confidence boundaries, with a (much) smaller set of data.

What happens on future observations is in one sense of no material impact to the data already gathered. If your treatment is correct, it will accommodate future data within the calculated boundaries. If it doesn't, it could flag either an issue with your treatment or that something has changed with the flights (e.g. plane in poorer mechanical condition, more terrorists as passengers, damaging ash clouds). That's not really a weakness in the data though, it's a caveat to be aware of. Clearly, if the 1,001st flight takes place with only half the fuel required, it's not really valid to use the previously observed probabilities as a guide to your survival chances on this flight.


That never stopped a political statement being made about a thing... EVER.

If politicians and newspapers could be held to account for misuse of statistics..


To answer the OP.. use a model.

There may not have been many crashes, but there would be plenty of incidents that could have caused crashes. Generally, a crash is the result of several incidents in combination - modern planes don't just fall out of the sky..

('Damage from debris on the runway' was a known incident with a known frequency)

So, if you have a list of 'incidents', you can work out what combinations are likely to cause a crash, and therefore the modelled probability of crashes. Actual crash rates are too low to be of much use. It's the modelled probability that gets the certification, and Concorde would not get this certification now, or at the time of the crash.


For populations in which an event has not happened, you might find the rule of three useful for calculating an upper bound on the probable likelihood of a crash. Basically, what it says is that if a crash hasn't happened in n flights then with 95% confidence the true rate of crashes will be less than 3/n.

You could of course derive a statistic for the case where a single adverse event in a series had occurred using the same methodology.
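
A minimal Python sketch of where the 3/n figure comes from, assuming independent flights with a constant crash probability (the function names are made up for illustration). It finds, by bisection, the largest per-flight probability at which seeing k or fewer crashes in n flights still has at least a 5% chance, for k = 0 and for the single-event case k = 1.

```python
from math import comb


def binom_cdf(k, n, p):
    """P(X <= k) for X ~ Binomial(n, p). Intended for small k only."""
    return sum(comb(n, i) * p**i * (1 - p) ** (n - i) for i in range(k + 1))


def upper_bound(k, n, conf=0.95):
    """Largest p for which k or fewer events in n trials is still
    at least (1 - conf) likely, found by bisection on [0, 1]."""
    lo, hi = 0.0, 1.0
    for _ in range(100):
        mid = (lo + hi) / 2
        # The CDF falls as p rises, so move up while the outcome is still plausible.
        if binom_cdf(k, n, mid) > 1 - conf:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2


n = 1000
print("rule of three, 3/n:      ", 3 / n)
print("exact bound, no crashes: ", upper_bound(0, n))
print("exact bound, one crash:  ", upper_bound(1, n))
```

For k = 0 the exact bound comes out just under 3/n, which is the rule of three. With one event in the series the bound is noticeably higher.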

The Concorde example is a little more difficult, as the crash is the last one in the series which terminated the "experiment", so the assumptions of the underlying model would be slightly different - the question being what is the maximum underlying probability of an accident per flight for us to be 95% sure that the first accident would be after no less than n flights or something like that.


Chi-squared seems to be dragging itself from the bowels of my memory when thinking about this problem.


There is a concept in statistics of a "confidence interval" which describes where you expect a "real" value to be, when you only have a limited observation.

For example, if a particular space launch vehicle has a record of 500 launches, and 499 were successful, what is the probability of launch success?

The naive answer is that this particular rocket has a 99.8% success rate, and based on the data it is correct. However, it doesn't take account of the quality of the data.

A more precise answer is to state that we can be 95% confident that the success rate of this particular type of rocket is between 98.9% and 99.99%. The real probability of success may still be outside this range, due to some freak coincidence in the launches that were recorded.

We can calculate the range based on the particular degree of confidence that is required. 95% confidence is used by convention, but you could calculate a 90% or 99% or 99.9% confidence range instead. The higher the confidence you require, the larger the range will be.
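
One common way to get such a range is the exact (Clopper-Pearson) binomial interval. That this is the method behind the figures above is an assumption, but a stdlib-only Python sketch of it gives much the same range for 499 successes in 500 launches. The function names are illustrative.

```python
from math import comb


def binom_cdf(k, n, p):
    """P(X <= k) for X ~ Binomial(n, p)."""
    if k < 0:
        return 0.0
    return sum(comb(n, i) * p**i * (1 - p) ** (n - i) for i in range(min(k, n) + 1))


def solve_decreasing(f, target, steps=60):
    """Bisection on [0, 1] for a function f that decreases in p: find f(p) = target."""
    lo, hi = 0.0, 1.0
    for _ in range(steps):
        mid = (lo + hi) / 2
        if f(mid) > target:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2


def clopper_pearson(successes, n, conf=0.95):
    """Exact two-sided confidence interval for a binomial success probability."""
    alpha = 1 - conf
    # Lower limit: P(X >= successes) = alpha/2, i.e. P(X <= successes - 1) = 1 - alpha/2.
    lower = 0.0 if successes == 0 else solve_decreasing(
        lambda p: binom_cdf(successes - 1, n, p), 1 - alpha / 2)
    # Upper limit: P(X <= successes) = alpha/2.
    upper = 1.0 if successes == n else solve_decreasing(
        lambda p: binom_cdf(successes, n, p), alpha / 2)
    return lower, upper


for conf in (0.90, 0.95, 0.99):
    low, high = clopper_pearson(499, 500, conf)
    print(f"{conf:.0%} interval: {low:.4%} to {high:.4%}")
```

Running it for several confidence levels also shows the range widening as the required confidence goes up.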


I've always wondered how they can say 'statistically speaking, given the size of the universe, extraterrestrial life is certain'.

How can they say such a thing? I don't care how big the universe is, one known planet with life on it is still just one known planet.


Chi-squared seems to be dragging itself from the bowels of my memory when thinking about this problem.

Not valid, due to the very low count for the Concorde accidents. Generally, you need a count of at least 5 in every "box", or at least 5 in 80% of the boxes. I wouldn't use chi-squared in this case. You also have to consider the way in which the data series was terminated: the one accident was not just anywhere in the series, but was the "terminator" of the "experiment".


If there's one in our solar system and there's literally billions of solar systems then there must be trillions of critters out there.

The thing that goes against that apparent logic is the fact that all life on earth is related...so may have occurred just once despite all the potential chances in the ideal environment.

I just turned left and headed for the blue planet! Completely incorrectly called "Earth"! I would have called it "Water"! :blink:


They do this sort of thing to work out whether to fix known issues on certain planes or not.

Watched a very interesting documentary on it a few years ago. There is a known issue with the 747, IIRC, which had already caused a crash. They work out the maximum payout in the event of another crash, how many other crashes are likely to occur before the planes are all out of service, and the cost of fixing the problem in all the 747s still flying.

It was a simple equation to work out whether to fix the problem or not based on cost.

Unless things have changed, I imagine this still occurs.


If there's one in our solar system and there's literally billions of solar systems then there must be trillions of critters out there.

The thing that goes against that apparent logic is the fact that all life on earth is related...so may have occurred just once despite all the potential chances in the ideal environment.

Problem is this -

The moment life arises, what does it use as food? The answer being, pre-biotic chemicals. Which basically means that once life gets a start, it makes it impossible for it to re-arise.

What we do observe is that life (or traces thereof) appears on Earth right at the start of the geological record. That's an important observation, as it suggests that life appeared as soon as conditions allowed it. But complex life - things with backbones - took several billion years to appear. From which the most likely hypothesis is that bacteria will be very common on other planets, perhaps even elsewhere in the solar system, but complex life will be rare.

Of course, the really interesting step, from 'things that walk' to 'things that can post on HPC', took about 350 million years... which is unhelpful.


They do this sort of thing to work out whether to fix known issues on certain planes or not.

Watched a very interesting documentary on it a few years ago. There is a known issue with the 747, IIRC, which had already caused a crash. They work out the maximum payout in the event of another crash, how many other crashes are likely to occur before the planes are all out of service, and the cost of fixing the problem in all the 747s still flying.

It was a simple equation to work out whether to fix the problem or not based on cost.

Unless things have changed, I imagine this still occurs.

It's the model..

People don't like the idea of cost-benefit analysis, in which you have to assign a cash value to people's lives.. but it's hard to do anything else. Unless you have a zero-risk policy ('Don't even get out of bed').


Working out the likelihood of an event which has not occurred can seem like an oxymoron, as in the example above, where you have observed 1,000 flights with no crashes recorded. But the key is to turn the issue around a bit and ask: "What are the chances of no crash occurring in 1,000 flights if the likelihood of a crash on any particular flight was x%?"

So, if the supposed likelihood of a crash on a particular Concorde flight was, say, 50%, then the chance of seeing the observed 1,000 flights without incident is very small indeed, small enough that you can confidently state that the likelihood of a crash is almost certainly much less than 50% per flight. Note 'almost certainly': it is not completely assured.

Nailed it. I knew it was something like that but I never felt I quite grasped it. Thank you.


Statistics can be manipulated in any way you want.

It's even possible to have the answer and ignore it. Just read that in medieval times, to treat the plague, doctors would wear wax-covered coats to stop the "plague atoms" from gripping onto the clothes and infecting the person. Apparently this was quite successful, but not for the reasons they thought: the wax-covered clothes made it difficult for the fleas which spread the disease to infect the individual. The flea issue was noted by a monk who suffered the plague, but the significance of his observation wasn't realised.

Naturally this can give you a false answer if your premise is that the plague is carried by "atoms" as statistically the wearing of the wax clothing will have a lower infection rate. If someone had actually picked up on what the cause was perhaps Europe would have organised rat hunting gangs to curb plague outbreaks. Instead Europe struggled on for a couple more centuries before someone else figured it out.


It's the model..

People don't like the idea of cost-benefit analysis, in which you have to assign a cash value to people's lives.. but it's hard to do anything else. Unless you have a zero-risk policy ('Don't even get out of bed').

I think they even put a maximum cash value on your life on each airline ticket, don't they!?

In $$ of course, if I remember right.

I suppose that does make it pretty upfront and honest.


Nailed it. I knew it was something like that but I never felt I quite grasped it. Thank you.

Happy to help. Chumpus Rex (as usual!) explains the key idea much better than me: confidence intervals. How confident (statistically speaking) are you that, to use the same example, the chance of crashing on a particular flight is 50%, having observed 1,000 flights with no incidents? Agreed boundaries for statistical truth (usually defined as a certain number of standard deviations from your best guess) mean that in this instance you could exclude the idea that the true value for the probability of crashing is 50%.

[an aside on the use of confidence intervals]

Conclusive statistical proof for the existence of a postulated new particle, the Higgs Boson, was announced last year. On what basis can scientists decide if they have actually observed a new particle or not? By use of confidence intervals.

At CERN and other laboratories, various types of particles are counted. There are many, many potential sources of error that mean that a particular observed particle may be erroneously recorded: dodgy electronics, misreadings on the selection criteria, random effects from things like cosmic rays, etc. You may count 10 particles of a type which has not previously been seen (say, it has some property which makes it distinct from all others), but you can't really be sure that you really have observed 10, so you must quote a confidence interval on that observed number. So you may think you have discovered a new particle: you observe 10 particles, and the standard deviation (a means of quoting the confidence interval) is, say, 3 events. Well, the agreed standard for statistical truth is 5 standard deviations, which means that you can confidently state that the true number of particles you observed was somewhere between 10 - (5*3) = -5 events and 10 + (5*3) = 25 events.

Can you confidently state that you have observed anything at all, when your results are consistent with the possibility that the true value is zero? No. This is what leads to the idea of being able to cite proof for the existence of a particle (or anything) - the ability to exclude zero. Say your dataset increases (which reduces errors, all else being equal) and you make a new measurement of the same particle and get the following result: 6 events recorded, but now your standard deviation is 1 event. Well, the agreed standard of 5 standard deviations now means that you can state that your actual value lies somewhere between 6 - (5*1) = 1 event and 6 + (5*1) = 11 events. You have now managed to reach the holy grail of a particle-hunter: you have excluded the possibility, to agreed statistical standards, that your true number of observed events is zero. So you definitely have observed something. Collect your Nobel prize.
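
The arithmetic in the two examples above can be sketched in a few lines of Python. This just encodes the simplified rule described in this post (quote the count plus or minus 5 standard deviations and check whether zero is excluded). It is an illustration of that rule, not how a real particle-physics analysis is done, and the function name is made up.

```python
def five_sigma_range(observed, sigma, n_sigma=5):
    """Return (low, high, excludes_zero) for observed +/- n_sigma * sigma."""
    low = observed - n_sigma * sigma
    high = observed + n_sigma * sigma
    return low, high, low > 0


print(five_sigma_range(10, 3))  # (-5, 25, False): still consistent with zero
print(five_sigma_range(6, 1))   # (1, 11, True): zero is excluded
```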


Statistics can be manipulated in any way you want.

Depends upon your audience. You might get away with it with the general public, but if you try to mislead someone who knows about statistics they will spot it and know that you are lying.


They do this sort of thing to work out whether to fix known issues on certain planes or not.

Watched a very interesting documentary on it a few years ago. There is a known issue with the 747, IIRC, which had already caused a crash. They work out the maximum payout in the event of another crash, how many other crashes are likely to occur before the planes are all out of service, and the cost of fixing the problem in all the 747s still flying.

It was a simple equation to work out whether to fix the problem or not based on cost.

Unless things have changed, I imagine this still occurs.

Do you have any links to this? I'm curious to read about this, as the aviation industry appears to rely on its safety record more than most. They must also have factored in that losing a plane, say, every 5 years due to the flaw wasn't going to dent public confidence in said aircraft?
