Understanding Data – Scale Invariance
In the last post, we discussed some basics of looking at a bunch of numbers, and we continue that thought in this post. We had talked about the mode. But if you look hard, the mode is not as simple as it seems. It depends on some number repeating itself. If none of the numbers in a bunch repeats, then either there is no mode or every number is a mode, which is trivial.
Exact repetition of a data point happens only for discrete variables, for instance the number of planets in a solar system. If you consider a continuous variable, say the mass of all planets in all solar systems, you would find no repetition. The same goes for the height of the trees we took up in the last post. If you think two trees are both five feet tall, you are just approximating: look deeper and you would find that they differ by inches, or centimeters, or at least millimeters. The same holds for the age of the trees. It is we who lack the ability to measure precisely, and hence we approximate when we measure. So for a bunch of numbers that reflect a continuous variable, the mode is inevitably trivial.
On the other hand, a bunch of numbers arising from a discrete variable, such as number of orders, marks scored, or number of planets in a solar system, could have a non-trivial mode. I say “could” because if we have a company with 20 people and a bunch of numbers denoting their birthdays, it could happen that no two people share the same birthday. Truly discrete variables could yield a non-trivial mode, but we will come to that soon. For approximated continuous variables, the mode depends entirely on two things: how we approximate, and how we calculate it, where the unit system we choose is the key.
Take age as the variable.
There are two problems here.
- First, we approximate it to a number of years, or at most a number of days, and lose the precision of minutes, seconds and so on.
- Second, and much more profound, is the definition of the units themselves. Why was a second defined the way it is in SI units? Someone just chose it; why they chose what they did is another story, and we will come to that later.
Now one may ask why I am nitpicking about these things. The first point means that modes calculated from approximated measurements aren’t absolute modes; they depend on the scale. Suppose the number of months in a year were 10 instead of 12. Then the age of a person on the new year scale would be quite different from the age on the previous scale: a tree that is 60 months old is five years old on the old scale but six on the new one. Such a rearrangement of data points could throw up a new mode. In fact it is so bad that if you pick a number, I could possibly work out a way to make it the mode by changing the base scale. This can be avoided by using the number of months to denote age instead of the number of years. But that also reduces the number of repetitions, so the probability of the mode being trivial increases. As the scale becomes smaller and smaller, the measurement tends to the continuous variable, time, and as mentioned before the mode becomes inevitably trivial.
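To see the mode move with the year length, here is a minimal Python sketch. The list of tree ages is made-up data for illustration only, and the helper names `modes` and `age_in_years` are my own, not anything from the earlier post. Ages are stored in months and then converted to completed years under a 12-month and a 10-month year.

```python
from collections import Counter

def modes(values):
    """Return (sorted list of the most frequent values, how often they occur)."""
    counts = Counter(values)
    top = max(counts.values())
    return sorted(v for v, c in counts.items() if c == top), top

def age_in_years(months, months_per_year):
    """Completed years under a calendar with the given year length."""
    return months // months_per_year

# Hypothetical tree ages in months (invented data, purely illustrative).
ages_in_months = [50, 55, 58, 60, 61, 62, 70, 71, 72, 73]

years_12 = [age_in_years(m, 12) for m in ages_in_months]
years_10 = [age_in_years(m, 10) for m in ages_in_months]

print("12-month years:", modes(years_12))
print("10-month years:", modes(years_10))
print("months:        ", modes(ages_in_months))
```

With this particular data the mode is 5 under 12-month years and 7 under 10-month years, while in months every value occurs exactly once, so the mode is trivial.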
But the problem doesn’t stop at the variability of the mode; it goes much deeper. Suppose we have to come up with a mathematical model that takes age as an input, say a model that calculates the probability of death with current age as one of the relevant parameters. The probability shouldn’t vary with an arbitrary choice of the base scale. The probability of death of a man who is 60 years old (on our current scale) can’t change just because someone chose to measure his age in 10-month years. So the results of such a model should be base or scale invariant. Without even knowing what the model is, I have come up with a law about the invariance of the model: if the model is not base or scale invariant, it is a suspect model. This is exactly why I was nitpicking about choosing the right year scale.
In fact this base or scale independence applies to all laws, even the laws of physics. When the base or scale changes, say the number of months in a year, we need to see how the measured parameter changes, in this case age. When you measure age in 10-month years instead of 12-month years, the base scale reduces from 12 to 10 while the measured value of age increases. For instance, someone who is 21 months old is aged 2 in the new scale but 1 in the old scale. So age increases as the scale decreases. Such variables are called contravariant, as the direction of change is contrary to the increase or decrease of the scale. Are there things that respond in the same direction as the scale, that is, covariant ones? Let us say we measure the rate at which cancer spreads in the human body. If the cancer spread to the entire body in 5 years in the current scale, it would have taken 6 years in the new scale, which means the rate of spread per year decreases in the new scale, and hence the rate is covariant.
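The two directions of change can be checked with a few lines of Python. This is only a sketch: the function name `completed_years` and the variable names are my own, and the 60-month spread time is simply the 5-year figure from the example expressed in months.

```python
def completed_years(age_months, months_per_year):
    """Age in completed years for a given year length."""
    return age_months // months_per_year

# Contravariant: the year shrinks from 12 to 10 months, the measured age grows.
age_months = 21
old_age = completed_years(age_months, 12)
new_age = completed_years(age_months, 10)
print("age:", old_age, "->", new_age)

# Covariant: the year shrinks, and the rate of spread per year shrinks too.
spread_months = 60                    # 5 old years, which is 6 new years
old_rate = 1 / (spread_months / 12)   # fraction of the body per old year
new_rate = 1 / (spread_months / 10)   # fraction of the body per new year
print("rate per year:", old_rate, "->", new_rate)
```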
Now let us come to the model. How should a model that calculates a probability, which has a fixed range from 0 to 1, behave when supplied with contravariant or covariant variables? For instance, if the model calculates the probability of death for a person aged between 40 and 50 years as 0.5, then the model in the new scale should yield the same probability for a person aged between 48 and 60 years. If that is not the case, there must be a problem with the model.
Suppose the model has these boundaries: a minimum probability of death of 0.1 at any age, and a maximum feasible age of 150, at and above which the probability of death is 1. And assume that between 0 and 150 the probability depends on age: the higher the age, the greater the probability of death.
The equation could be:
0.1 + 0.9 (x/150), where x is the age in years. This yields probabilities from 0.1 at age 0 to 1 at age 150.
In the new scale this model becomes:
0.1 + 0.9 (x/180), as the maximum age of 150 equals 180 in the new scale. This yields probabilities from 0.1 at age 0 to 1 at age 180, which tallies well.
What about an age like 15, which is 18 in the new scale?
Old scale: probability would be 0.1 + 0.9 (15/150) = 0.19
New scale: probability would be 0.1 + 0.9 (18/180) = 0.19
So this model works fine with the change of scale.
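The check need not stop at one age. Here is a small Python sketch that repeats the comparison for several ages; the function name `linear_model` and the conversion factor `K` are names I have introduced for illustration.

```python
import math

def linear_model(age, max_age):
    """0.1 + 0.9 * (age / max_age), the linear model from the text."""
    return 0.1 + 0.9 * (age / max_age)

K = 12 / 10  # one old (12-month) year equals 1.2 new (10-month) years

for old_age in (0, 15, 45, 100, 150):
    new_age = old_age * K
    p_old = linear_model(old_age, 150)
    p_new = linear_model(new_age, 150 * K)
    # isclose absorbs floating-point rounding in the conversion
    print(old_age, new_age, p_old, p_new, math.isclose(p_old, p_new))
```

Every row should report a match, because the conversion factor appears in both the age and the maximum age and cancels in the ratio.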
Let us look at a different model: 0.1 + 0.9 ((x*x)/(150*150)). This goes from 0.1 at age 0 to 1 at age 150, and in the same way in the new scale, with 180 in place of 150, from 0.1 at age 0 to 1 at age 180.
But what about 15, or 18 in the new scale?
Old scale: probability would be 0.1 + 0.9 ((15*15)/(150*150)) = 0.109
New scale: probability would be 0.1 + 0.9 ((18*18)/(180*180)) = 0.109
So this model also works fine with the change of scale.
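Why both of these models survive the change of scale can be written out in a couple of lines. The notation is my own assumption, not from the post: x is the age, M is the maximum age, k = 12/10 is the conversion factor between the scales, and n is 1 for the linear model and 2 for the squared one.

```latex
% Assumed notation: x = age, M = maximum age, k = 12/10, n = 1 or 2.
\begin{align*}
p_n(x; M) &= 0.1 + 0.9\left(\frac{x}{M}\right)^{n} \\
p_n(kx; kM) &= 0.1 + 0.9\left(\frac{kx}{kM}\right)^{n}
             = 0.1 + 0.9\left(\frac{x}{M}\right)^{n}
             = p_n(x; M)
\end{align*}
```

The age only ever enters through the ratio x/M, so the factor k cancels whatever the power n is.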
But take this model: 0.1 + 0.9 ((x+x*x)/(150+150*150))
This goes from 0.1 at age 0 to 1 at age 150, and the same is true for the new scaled model: 0.1 + 0.9 ((x+x*x)/(180+180*180)).
But what about 15, and 18 in the new scale?
Old scale: probability would be 0.1 + 0.9 ((15+15*15)/(150+150*150)) ≈ 0.10954
New scale: probability would be 0.1 + 0.9 ((18+18*18)/(180+180*180)) ≈ 0.10945
So at the outset the model seems fine, and it works at the boundaries, but the results are not scale invariant in the middle.
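The gap is small at 15, so it is worth looking at a few more ages. Here is a Python sketch; `mixed_model` and `K` are names I made up for illustration. It compares the old-scale and new-scale results of this third model.

```python
def mixed_model(age, max_age):
    """0.1 + 0.9 * (age + age^2) / (max_age + max_age^2), the third model."""
    return 0.1 + 0.9 * (age + age**2) / (max_age + max_age**2)

K = 12 / 10  # one old (12-month) year equals 1.2 new (10-month) years

for old_age in (0, 15, 75, 150):
    new_age = old_age * K
    p_old = mixed_model(old_age, 150)
    p_new = mixed_model(new_age, 150 * K)
    print(f"old age {old_age:>3}: old scale {p_old:.5f}, "
          f"new scale {p_new:.5f}, gap {p_old - p_new:.5f}")
```

The two scales agree at ages 0 and 150 but disagree in between. The reason is that x and x*x pick up different powers of the conversion factor, so it no longer cancels between numerator and denominator.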
This is the reason I was nitpicking. This is not a problem only for actuarial models predicting death probabilities; it is also true for physical laws. Physical laws should be invariant when we change the units. Of course you are allowed to change the coefficients and constants, as we did in our model by replacing 150 with 180. But even there we had a constraining logic: we simply replaced the maximum age in the old scale with the maximum age in the new scale. So all the coefficients and constants in fundamental physical laws should have a certain logic behind them, and as the scale changes they need to change according to that logic, just as the numerical value of the speed of light changes if we redefine the units of distance and time. This is called scale invariance.
Now there is a subtle point we missed in the above discussion. The end result in the discussion was a probability, which is unitless. If the end result were also an age, then the answer would certainly differ between scales, but in such a manner that if the answer were 15 in one scale it would be 18 in the other.
So scale invariance is a subtle concept when you are dealing with models created to predict the world using certain input data in any discipline.
There is much more to this. We conveniently chose the ages 15 and 18 and hence didn’t worry much about the discontinuities of approximation and how they affect a continuous model (a polynomial in this case). We will talk about that in a later post.