General Insurance

Types

What do you understand about General Insurance?
Insurance contracts that do not come under the ambit of life insurance are called general insurance. The different forms of general insurance are fire, marine, motor, accident and other miscellaneous non-life insurance. A general insurance policy reimburses the insured for a financial loss caused by the events specified in that policy.

What do you know about motor insurance?
Vehicle insurance or motor insurance is a type of insurance policy which safeguards you financially in case your vehicle (car or two-wheeler) sustains damages due to natural or man-made calamities such as earthquake, flood, lightning and theft among others. Motor insurance is a unique insurance policy meant for vehicle owners to protect them from incurring any financial losses that may arise due to damage or theft of the vehicle. Whether you have a private car, a commercial vehicle, or a two-wheeler, you can purchase a motor insurance policy.

What is Third party motor insurance?
As per The Motor Vehicles Act, third-party car insurance is mandatory while driving a vehicle in India. It reimburses the third-parties for losses/damages caused by the insured four-wheeler. Third-party Car Insurance is the most basic and compulsory insurance plan for your car. It provides financial and legal assistance if you injure a third party or damage their vehicle/property.

What is property and casualty insurance?
Property insurance and casualty insurance (also known as P&C insurance) are types of coverage that help protect you and the property you own. Property insurance helps cover stuff you own like your home or your car. Casualty insurance means that the policy includes liability coverage to help protect you if you’re found legally responsible for an accident that causes injuries to another person or damage to another person’s belongings. Property and casualty insurance are typically bundled together into one insurance policy.

Elaborate on marine insurance.
Marine insurance refers to a contract of indemnity. It is an assurance that the goods dispatched from the country of origin to the land of destination are insured. Marine insurance covers the loss/damage of ships, cargo, terminals, and includes any other means of transport by which goods are transferred, acquired, or held between the points of origin and the final destination.

What are the factors affecting Motor Insurance Premium?
Here is the list of factors affecting motor insurance premium:
1) Brand of the vehicle: A vehicle of a brand which has higher value than another will be charged a higher premium than the latter.
2) Gender and Age: Statistically speaking, young men are more prone to accidents than young women and thus, there are more chances of accidents leading to insurance claims. Hence, insurance is given to young men at a higher premium as compared to young women.
3) Location: Location is one of the primary factors that insurance providers look at when issuing a policy to a customer. For instance, if a person lives in an accident-prone area, chances are that his premium will be higher in comparison to someone living in a relatively quieter area with a low percentage of such incidents. Moreover, car thefts are common in many areas with high unemployment and a low literacy rate, and that is also a point of consideration.
4) Type of Engine: Does your engine run on diesel? Then you might have to pay a high premium because diesel vehicles are costlier than petrol vehicles which directly impacts the Insured Declared Value. Thus, the higher the price of the vehicle, the higher the IDV will be. This will also lead to an increased premium.
5) Safety Fittings: If you take care of your vehicle, your vehicle will thank you with low premium rates. If your car is fitted with safety amenities such as a gear lock, GPS, and airbags, the risk of it getting stolen or badly damaged goes down. This gives the insurance provider confidence that there won’t be frequent insurance claims, and the provider will consider these amenities while setting the premium rates.

What is home insurance? What are the factors affecting home insurance premiums?
Home insurance is a type of property insurance that provides coverage to the policyholder from the unforeseen loss or damage caused to the house structure as well as its content. Home insurance is popularly known as homeowner’s insurance. It is a sort of property insurance covering private residences. Property insurance policy is the insurance that protects the physical goods and the equipment of the business or home against any loss from theft, fire, and any other perils. It can be an all-risk coverage policy that gives protection against all the risks.
Factors affecting home insurance premium are:
a) The extent of coverage – With additional coverage, the extent of protection to your home will also increase along with the premium.
b) The location and size of your house – A house that’s located in a safer area is more economical to insure than a house that’s located in a place prone to floods or earthquakes, or where the rate of theft is higher. And, with a bigger carpet area, the premium also tends to rise.
c) The value of your belongings – If you’re insuring high-value possessions like expensive jewelry or valuables, then the premium payable also rises correspondingly.
d) The security measures in place – A house that has a good deal of safety measures in place costs less to insure than a house that doesn’t come with any security or safeguards. For example, a house with fire-fighting equipment in place will cost less to insure than one without.

What are the factors that affect your premium towards health insurance?
a) Pre-existing medical conditions: The policyholder or applicant will need to provide their own health records to ensure that there aren’t any pre-existing medical conditions. But, if they do have any pre-existing conditions, then the company can either choose to allow those in their policies or can decide not to cover it at all. If the insurance company cannot cover it under the health insurance, then the policyholder will need to bear the costs, thereby, increasing and affecting the premium.
b) Gender: Many policies have a difference in premium rates for men and women. The 3 reasons for this, experts say, are – Women are more likely to visit doctors, take prescriptions, and be subject to chronic diseases.
c) Age: Most young individuals have premiums at much lower rates since they have fewer identified and unidentified diseases than older individuals. Young policyholders are less likely to have health problems and are more likely not to visit a doctor.
d) Choice of profession: Policyholders working in environments with hazardous substances, radiation or chemicals, and in jobs with a high risk of injury such as construction, end up paying higher premiums, since insurance companies consider them more prone to health risks such as cardiovascular diseases.
e) Marital status: It’s still unclear if married people live longer and healthier lives, but the insurance premiums are generally lower in rates. The men generally reap better benefits with this status change.

Modelling

What is the concept of Hypothesis?
A hypothesis is where we make a statement about something; for example the mean lifetime of smokers is less than that of non-smokers. A hypothesis test is where we collect a representative sample and examine it to see if our hypothesis holds true. The standard approach to carrying out a statistical test involves the following steps:
1) Specify the hypothesis to be tested
2) Select a suitable statistical model
3) Design and carry out an experiment/study
4) Calculate a test statistic
5) Calculate the probability value
6) Determine the conclusion of the test.

How to handle large datasets?
The steps to handle large data sets are:
a) Develop a well-defined set of objectives which need to be met by the results of the data analysis.
b) Identify the data items required for the analysis.
c) Collection of the data from appropriate sources.
d) Processing and formatting data for analysis, eg inputting into a spreadsheet, database or other model.
e) Cleaning data, eg addressing unusual, missing or inconsistent values.
f) Exploratory data analysis, which may include: (a) Descriptive analysis; producing summary statistics on central tendency and spread of the data. (b) Inferential analysis; estimating summary parameters of the wider population of data, testing hypotheses. (c) Predictive analysis; analyzing data to make predictions about future events or other data sets.
g) Modeling the data.
h) Communicating the results.
i) Monitoring the process; updating the data and repeating the process if required.
Throughout the process, the modeling team needs to ensure that any relevant professional guidance has been complied with. For example, the Financial Reporting Council has issued a Technical Actuarial Standard (TAS) on the principles for Technical Actuarial Work (TAS100) which includes principles for the use of data in technical actuarial work. Further, the modeling team should also remain aware of any legal requirement to be complied with such as Solvency II. Such legal requirements may include aspects around consumer/customer data protection and gender discrimination.

What is the meaning of p-value?
A p-value is a statistical measure used to assess a hypothesis against observed data. It is the probability of obtaining results at least as extreme as those observed, assuming that the null hypothesis is true. The lower the p-value, the greater the statistical significance of the observed difference, so the p-value can be read as a measure of the evidence against the null hypothesis. Equivalently, it is the smallest significance level at which the null hypothesis can be rejected.
A p-value of 0.05 or less is conventionally treated as statistically significant. It indicates that, if the null hypothesis were true, results this extreme would occur no more than 5% of the time; it is not the probability that the null hypothesis is correct. In that case we reject the null hypothesis in favor of the alternative hypothesis.
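As an illustration of the test statistic and p-value steps, here is a minimal sketch of a two-sided one-sample z-test using only the Python standard library. The function name, the sample figures and the assumption that the population standard deviation is known are all hypothetical choices made for the example.

```python
from math import sqrt
from statistics import NormalDist


def z_test_p_value(sample_mean, mu0, sigma, n):
    """Two-sided p-value for H0: population mean == mu0 (sigma assumed known)."""
    # Test statistic: distance of the sample mean from mu0 in standard errors
    z = (sample_mean - mu0) / (sigma / sqrt(n))
    # Probability, under H0, of a statistic at least as extreme as |z|
    p = 2 * (1 - NormalDist().cdf(abs(z)))
    return z, p


# Hypothetical figures: 100 observations with mean 52, testing H0: mean = 50
z, p = z_test_p_value(sample_mean=52.0, mu0=50.0, sigma=10.0, n=100)
print(f"z = {z:.2f}, p-value = {p:.4f}")
print("Reject H0 at the 5% level" if p <= 0.05 else "Do not reject H0")
```

With these made-up inputs the test statistic is 2 standard errors from the hypothesized mean, which gives a p-value just under 0.05.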

What does Z-score mean?
A z-score, also known as a standard score, tells you how far a data point is from the mean. More precisely, it measures how many standard deviations a raw score lies below or above the population mean. A z-score of 0 indicates that the data point is equal to the mean.
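The calculation is z = (x - mean) / standard deviation. A small sketch, assuming the data set is treated as the whole population (the data values are made up for illustration):

```python
from statistics import mean, pstdev


def z_scores(data):
    """Return the z-score of every point, treating data as the population."""
    mu = mean(data)
    sigma = pstdev(data)  # population standard deviation
    return [(x - mu) / sigma for x in data]


scores = [2, 4, 4, 4, 5, 5, 7, 9]  # mean 5, standard deviation 2
for x, z in zip(scores, z_scores(scores)):
    print(f"value {x}: z = {z:+.2f}")
```

A negative z-score means the value lies below the mean and a positive one means it lies above.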

What is regression analysis? What is the error term in that?
Regression analysis is a set of statistical methods used for the estimation of relationships between a dependent variable and one or more independent variables. It can be utilized to assess the strength of the relationship between variables and for modeling the future relationship between them.
In order to create predictions about your data, regression analysis will provide you an equation for a graph. For instance, if you’ve gained weight recently, it can estimate how much you’ll weigh in 10 years if you keep gaining weight at the same rate.
The error term is the difference between the actual observed value and the theoretical value predicted by the regression model. It captures the variation in the dependent variable that the model does not explain, and so reflects the margin of error within a statistical model.

Define Skewness.
Skewness is a measure of the asymmetry of a distribution. A distribution is asymmetrical when its left and right sides are not mirror images. A distribution can have right (or positive), left (or negative), or zero skewness. Skewness describes the shape of the distribution and indicates how far it departs from symmetry. A lognormal distribution, for instance, would have considerable right skew whereas a normal distribution has zero skew.

What is the formula of correlation?
The correlation coefficient of two random variables X and Y, written corr(X,Y) or ρ(X,Y), is defined by:
Corr(X,Y) = Cov(X,Y) / [SD(X) * SD(Y)].
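The formula can be applied directly to paired data. The sketch below is one possible implementation, using population (divide-by-n) versions of the covariance and standard deviations; the data values are invented for the example.

```python
from math import sqrt


def corr(xs, ys):
    """Correlation coefficient: Cov(X,Y) / (SD(X) * SD(Y))."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys)) / n
    sd_x = sqrt(sum((x - mean_x) ** 2 for x in xs) / n)
    sd_y = sqrt(sum((y - mean_y) ** 2 for y in ys) / n)
    return cov / (sd_x * sd_y)


ages = [25, 35, 45, 55, 65]          # hypothetical policyholder ages
claims = [1.0, 1.4, 1.5, 2.1, 2.6]   # hypothetical claim amounts
print(f"corr = {corr(ages, claims):.3f}")
```

The result always lies between -1 and 1, with values near the ends indicating a strong linear relationship.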

What is the difference between deterministic and stochastic models?
Deterministic models are mathematical models in which the outcomes are fully determined by the relationship between states and events. There is no room for random variation in a deterministic model. A deterministic model helps you make a point estimate of the payments that will need to be made in future, but it does not take into consideration any variations that may show up in due course of time.
A stochastic model, in contrast, allows room for random variation in one or more inputs over time and produces a different output every time it is run. The random variation is usually based on historical data. Running the model many times gives a range of outcomes, so the modeler can say with some level of confidence that the payouts in future will fall within that range. On the downside, stochastic models may be computationally very complex to run.
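The contrast can be sketched with a toy claims model. Everything here is an assumption made for illustration: the portfolio size, the claim probability, the average claim and the choice of exponentially distributed claim sizes.

```python
import random


def deterministic_claims(n_policies, claim_prob, avg_claim):
    """Point estimate: expected number of claims times average claim size."""
    return n_policies * claim_prob * avg_claim


def stochastic_claims(n_policies, claim_prob, avg_claim, rng):
    """One simulated year: each policy claims at random, with a random size."""
    total = 0.0
    for _ in range(n_policies):
        if rng.random() < claim_prob:
            total += rng.expovariate(1 / avg_claim)  # mean size = avg_claim
    return total


rng = random.Random(42)  # fixed seed so the run is repeatable
point_estimate = deterministic_claims(1000, 0.05, 2000.0)
sims = sorted(stochastic_claims(1000, 0.05, 2000.0, rng) for _ in range(500))
low, high = sims[int(0.05 * len(sims))], sims[int(0.95 * len(sims))]
print(f"Deterministic estimate: {point_estimate:,.0f}")
print(f"Stochastic 5%-95% range: {low:,.0f} to {high:,.0f}")
```

The deterministic model returns the same single figure every time, while the simulated totals differ from run to run and give a range around it.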

What is Gumbel distribution?
The Gumbel distribution (also known as the type I extreme value distribution) is one of the extreme value distributions (EVDs), which are frequently used to model maximums and minimums. For example, it might be used to represent the distribution of the maximum level of a river in a particular year, given a list of the annual maximum values for the past ten years. It is useful in predicting the chance that an extreme earthquake, flood or other natural disaster will occur.
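For maxima, the Gumbel cumulative distribution function is F(x) = exp(-exp(-(x - mu) / beta)), with location mu and scale beta > 0. The sketch below uses it to estimate the chance of exceeding a given level; the parameter values for the river example are hypothetical and would normally be fitted to the observed annual maxima.

```python
from math import exp


def gumbel_cdf(x, mu, beta):
    """P(annual maximum <= x) under a Gumbel (maximum) distribution."""
    return exp(-exp(-(x - mu) / beta))


def exceedance_probability(level, mu, beta):
    """P(annual maximum > level)."""
    return 1 - gumbel_cdf(level, mu, beta)


# Hypothetical fitted parameters for annual maximum river level (metres)
mu, beta = 4.0, 0.5
print(f"P(max level > 6m) = {exceedance_probability(6.0, mu, beta):.4f}")
```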

What is overfitting?
A statistical model is said to be overfitted when it fits the training data very closely but fails to produce reliable predictions on test data. This happens when the model begins to learn from the noise and erroneous entries in the data set rather than from the underlying pattern, for example because the model is too flexible for the amount of data available. The result is high variance when the model is evaluated on test data. Non-parametric and non-linear methods are especially prone to overfitting, since these types of machine learning algorithms have more latitude in how they build the model from the dataset, making it possible for them to produce unrealistic models.
For example, decision trees are a non-parametric machine learning algorithm that is very flexible and prone to overfitting the training data. This problem can be addressed by pruning the tree after it has been trained, in order to remove some of the detail it has picked up. Pruning is a technique that reduces the size of a decision tree by removing sections of the tree that are non-critical or redundant for classifying instances.

What do you understand about the Markov Chain?
A Markov chain is a representation of an object moving randomly through a discrete number of possible states. The Markov property means that the future value of a process is independent of the past history and only depends on the current value. Markov chains are used in ranking of websites in web searches. Markov chains model the probabilities of linking to a list of sites from other sites on that list; a link represents a transition.
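A Markov chain is fully described by its transition matrix, where entry P[i][j] is the probability of moving from state i to state j in one step. The sketch below uses a hypothetical three-level no-claims discount scheme (not taken from the text above) in which a policyholder moves up a level after a claim-free year and down a level after a claim, with an assumed claim probability of 0.2.

```python
def step(dist, P):
    """Advance a probability distribution over states by one transition."""
    n = len(P)
    return [sum(dist[i] * P[i][j] for i in range(n)) for j in range(n)]


# Hypothetical discount levels: 0%, 25%, 50%. Each row sums to 1.
P = [
    [0.2, 0.8, 0.0],  # at 0%: stay on a claim, otherwise move up
    [0.2, 0.0, 0.8],  # at 25%: drop on a claim, otherwise move up
    [0.0, 0.2, 0.8],  # at 50%: drop on a claim, otherwise stay
]

dist = [1.0, 0.0, 0.0]  # a new policyholder starts at 0% discount
for year in range(1, 4):
    dist = step(dist, P)
    print(f"Year {year}: {[round(p, 3) for p in dist]}")
```

Each step depends only on the current distribution over states, which is exactly the Markov property.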

Where does the mean, median and mode lie in positively and negatively skewed distributions?
In a positively skewed distribution, the mean is typically greater than the median and the median is greater than the mode (Mean > Median > Mode).
In a negatively skewed distribution, the mean is typically less than the median and the median is less than the mode (Mean < Median < Mode).
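These orderings can be checked on a small data set. The sketch below computes the moment coefficient of skewness (third central moment divided by the variance raised to the power 1.5) alongside the mean, median and mode; the claim figures are invented for illustration.

```python
from statistics import mean, median, mode


def skewness(data):
    """Moment coefficient of skewness: m3 / m2**1.5."""
    n = len(data)
    mu = mean(data)
    m2 = sum((x - mu) ** 2 for x in data) / n
    m3 = sum((x - mu) ** 3 for x in data) / n
    return m3 / m2 ** 1.5


claims = [1, 2, 2, 3, 3, 3, 4, 5, 9, 18]  # a few large claims: right skew
print(f"skewness = {skewness(claims):.2f}")
print(f"mean = {mean(claims)}, median = {median(claims)}, mode = {mode(claims)}")
```

Mirroring the data (multiplying every value by -1) flips the sign of the skewness and reverses the ordering of mean and median.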

What is correlation?
Linear correlation between a pair of variables looks at the strength of the linear relationship between them. The word correlation is used in everyday life to denote some form of association. However, a correlation between two variables does not necessarily imply that a change in one variable is the reason for a change in the values of the other i.e. it does not imply causation.

What is regression?
Regression analysis is a statistical method that helps us to analyze and understand the relationship between two or more variables of interest. The process that is adapted to perform regression analysis helps to understand which factors are important, which factors can be ignored, and how they are influencing each other.
To apply regression analysis successfully, we need to understand the following terms:
a) Dependent Variable: This is the variable that we are trying to understand or forecast.
b) Independent Variable: These are factors that influence the analysis or target variable and provide us with information regarding the relationship of the variables with the target variable.

Briefly explain the various types of distributions.
Binomial distribution – The binomial distribution describes the number of successes in a fixed number of independent trials, where each trial has the same probability of success. It gives the probability of observing a specified number of successful outcomes in a specified number of trials.
Example – The simplest real life example of binomial distribution is the number of students that passed or failed in a college. Here the pass implies success and fail implies failure. Another example is the probability of winning a lottery ticket. Here the winning of reward implies success and not winning implies failure.
Poisson distribution – A Poisson distribution is a probability distribution that is used to show how many times an event is likely to occur over a specified period.
Example – Call centers use the Poisson distribution to model the number of expected calls per hour that they’ll receive so they know how many call center reps to keep on staff.
Normal Distribution – Normal distribution, also known as the Gaussian distribution, is a probability distribution that is symmetric about the mean, showing that data near the mean are more frequent in occurrence than data far from the mean. In graph form, normal distribution will appear as a bell curve.
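Example: The heights of people in a population. Note that the outcome of a single dice roll or coin toss is not normally distributed, although the average of a large number of rolls or tosses is approximately normal.
LogNormal distribution – A log-normal distribution is a continuous distribution of a random variable whose natural logarithm is normally distributed.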
Example: One of the most common applications where log-normal distributions are used in finance is in the analysis of stock prices.

What is the difference between Poisson and Binomial distribution?
The difference between Poisson and Binomial distributions is as follows:
1) In a Binomial distribution, there is a fixed number of trials. In a Poisson distribution, there could be any number of events that occur during a certain time interval.
2) The Poisson distribution is uni-parametric, i.e. characterized by a single parameter m, whereas the binomial distribution is bi-parametric, i.e. characterized by two parameters, n and p.
3) In a binomial distribution, mean > variance, while in a Poisson distribution, mean = variance.
4) Each trial in a binomial distribution has only two outcomes, success or failure, so the number of successes can be at most n. On the other hand, the count in a Poisson distribution has no upper limit.
5) Binomial distribution example: flip a coin 3 times, Poisson distribution example: how many customers will arrive at a store in a given hour?
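The differences above can be seen numerically from the two probability functions. This is a minimal sketch using the standard formulas; the parameter values are chosen only for illustration.

```python
from math import comb, exp, factorial


def binomial_pmf(k, n, p):
    """P(exactly k successes in n trials, success probability p)."""
    return comb(n, k) * p ** k * (1 - p) ** (n - k)


def poisson_pmf(k, m):
    """P(exactly k events when the expected number of events is m)."""
    return exp(-m) * m ** k / factorial(k)


# Binomial: 3 coin flips. Mean n*p exceeds variance n*p*(1-p).
n, p = 3, 0.5
print(f"P(2 heads in 3 flips) = {binomial_pmf(2, n, p)}")
print(f"binomial mean = {n * p}, variance = {n * p * (1 - p)}")

# Poisson: customers per hour with mean 4. Mean and variance are both m.
print(f"P(2 customers) = {poisson_pmf(2, 4):.4f}")

# With many trials and a small p, the binomial is close to a Poisson with m = n*p
print(f"binomial(n=1000, p=0.004) at k=2: {binomial_pmf(2, 1000, 0.004):.4f}")
```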

What is GLM?
Generalized linear models (GLMs) relate the response variable which we want to predict, to the explanatory variables or factors (called predictors, covariates or independent variables) about which we have information. A GLM thus generalizes linear regression: it allows the linear model to be linked to the response variable via a link function, and allows the variance of each measurement to be a function of its predicted value.

How is GLM used in life or general insurance?
They are used to:
a) determine which rating factors to use (rating factors are measurable or categorical factors that are used as proxies for risk in setting premiums, eg age or gender)
b) estimate an appropriate premium to charge for a particular policy given the level of risk present.

Which distribution will two-sample mean follow?
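From the Central Limit Theorem, we know that as the sample size n gets larger and larger, the sample mean approximately follows a Normal Distribution, and the larger n gets, the smaller the standard deviation of the sample mean becomes. Thus, the difference between two sample means will also approximately follow a Normal Distribution. If sigma is not known and has to be estimated from the samples, the standardized difference is compared against a t-distribution instead.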

What is a multi-state model?
Multi-state models are representations of a process that, at any given time, can be in any one of a number of states, such as one describing the life history of an individual. This might refer to a variety of potential outcomes for a single person or the interdependence of multiple people.

What is the difference between multiple state models and multiple risk models?
Multi-state models are models for a process, for example describing a life history of an individual, which at any time occupies one of a few possible states. This can describe several possible events for a single individual, or the dependence between several individuals.
Multiple risk models take into account the effects of various risks within children’s lives and the environments that impact their overall development. The greater the number of risk factors in a child’s life, the more likely that child is to face adversity or experience negative effects developmentally.

Suppose you are given a large dataset. How will you decide which distribution applies to it?
First observe the data and determine if it is discrete or continuous. If the data is discrete, then you can apply distributions like Binomial, Negative Binomial and Poisson distribution to the given set of data. If the mean is almost equal to the variance, the distribution can be thought to have come from a Poisson Distribution. If the mean is greater than variance, then the data can be thought to have come from a Binomial Distribution. If the mean is lesser than variance, then we can consider a Negative Binomial Distribution.
If the data is continuous, you can consider distributions like Normal, Cauchy, Exponential, Beta distribution. If the data is symmetric you can consider that the data is taken from a Normal or a Cauchy Distribution. If the tail probability is high, Cauchy Distribution can be considered, else Normal Distribution can be applied to it. If the data appears to be positively skewed, you can consider an Exponential Distribution and a Beta Distribution in case it appears to be negatively skewed.
After you have chosen a candidate distribution, you can use probability plots such as Q-Q plots to check whether your data fit it. The distribution matches your data if the points fall along the straight line in the graph. Visually, this procedure is straightforward, and it is informally known as the “fat pencil” test.
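For count data, the mean-versus-variance rule described above can be sketched as a simple first screen. The function name and the 10% tolerance for "almost equal" are assumptions made for illustration, and the result is only a starting suggestion to be checked with a probability plot.

```python
from statistics import mean, pvariance


def suggest_count_distribution(counts, tolerance=0.1):
    """Suggest a discrete distribution from the variance-to-mean ratio."""
    m = mean(counts)
    if m == 0:
        raise ValueError("mean is zero; the ratio is undefined")
    ratio = pvariance(counts) / m
    if abs(ratio - 1) <= tolerance:
        return "Poisson"            # mean roughly equal to variance
    if ratio < 1:
        return "Binomial"           # mean greater than variance
    return "Negative Binomial"      # mean less than variance


print(suggest_count_distribution([0, 1, 2, 3, 4]))    # variance equals mean
print(suggest_count_distribution([2, 2, 2, 3, 2]))    # variance below mean
print(suggest_count_distribution([0, 0, 0, 1, 10]))   # variance above mean
```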

Can you explain the cumulative distribution function to me, and when do we use it in statistics?
The Cumulative Distribution Function (CDF) of a real-valued random variable X, evaluated at x, is the probability that X will take a value less than or equal to x, i.e. F(x) = P(X <= x).
In Statistics, the cumulative distribution function is used to describe the probability distribution of random variables, and its values are often presented in a table. It can be used for a discrete, continuous or mixed variable. It is obtained by summing the probability function (for a discrete variable) or integrating the probability density function (for a continuous variable) up to x.
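For a discrete variable, the CDF is just a running total of the individual probabilities. A minimal sketch, using a fair six-sided die as an assumed example:

```python
from itertools import accumulate


def cdf_from_pmf(pmf):
    """Build F(x) = P(X <= x) from a dict mapping each value to its probability."""
    xs = sorted(pmf)
    return dict(zip(xs, accumulate(pmf[x] for x in xs)))


die_pmf = {face: 1 / 6 for face in range(1, 7)}
die_cdf = cdf_from_pmf(die_pmf)
for face, prob in die_cdf.items():
    print(f"P(X <= {face}) = {prob:.3f}")
```

The CDF is non-decreasing and reaches 1 at the largest possible value.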

What is the degree of confidence?
Degree of confidence (or confidence level) represents the probability that the procedure used to construct a confidence interval produces an interval that captures the true population parameter. So, in statistics, a confidence interval is a range of values which, over repeated sampling, would contain the population parameter a given percentage of the time. Confidence levels of 95% or 99% are frequently used by analysts. For example, if a statistical model produces a point estimate of 10.00 with a 95% confidence interval of 9.50 to 10.50, then intervals constructed in this way would contain the true value in 95% of repeated samples.
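A confidence interval for a mean can be sketched as point estimate plus or minus a multiple of the standard error. The version below assumes a normal approximation (reasonable for large samples; a t-distribution would be more appropriate for a sample as small as the made-up one used here).

```python
from math import sqrt
from statistics import NormalDist, mean, stdev


def mean_confidence_interval(data, confidence=0.95):
    """Normal-approximation confidence interval for the population mean."""
    n = len(data)
    # Critical value, e.g. about 1.96 for 95% confidence
    z = NormalDist().inv_cdf(0.5 + confidence / 2)
    half_width = z * stdev(data) / sqrt(n)
    centre = mean(data)
    return centre - half_width, centre + half_width


sample = [9.8, 10.1, 10.4, 9.9, 10.2, 9.7, 10.3, 10.0]  # hypothetical data
for level in (0.95, 0.99):
    low, high = mean_confidence_interval(sample, level)
    print(f"{level:.0%} interval: {low:.2f} to {high:.2f}")
```

A higher degree of confidence gives a wider interval for the same data.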

How is lognormal distribution used for actuarial science in insurance?
Lognormal distribution is a probability distribution that is used as a model for claim size distributions; it has a range from zero to infinity. Log-normal distributions are positively skewed with long right tails, which typically arise from low mean values and high variances in the random variables. It is also used to model a variety of natural events, including the distribution of revenue, the number of moves in a chess game, how long it takes to repair a maintainable system, and more.

What is the difference between exponential and weibull distributions?
The exponential distribution is a special case of the Weibull distribution, the case corresponding to a constant failure rate. The Weibull distribution with shape parameter 1 and scale parameter b ∈ (0, infinity) is the exponential distribution with scale parameter b, i.e. with rate 1/b.

What is the pdf of exponential distribution?
The pdf of the exponential distribution with rate parameter lambda > 0 is f(x) = lambda * e^(-lambda*x) for x >= 0, and 0 otherwise.
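The exponential pdf, and its relationship to the Weibull distribution, can be checked numerically. The sketch below uses the standard two-parameter Weibull density with shape k and scale b; the evaluation points are arbitrary.

```python
from math import exp


def exponential_pdf(x, lam):
    """Exponential density with rate lam: lam * e^(-lam * x) for x >= 0."""
    return lam * exp(-lam * x) if x >= 0 else 0.0


def weibull_pdf(x, shape, scale):
    """Weibull density for x > 0 (x = 0 with shape < 1 is not handled)."""
    if x < 0:
        return 0.0
    return (shape / scale) * (x / scale) ** (shape - 1) * exp(-((x / scale) ** shape))


# With shape 1 and scale b, the Weibull density matches exponential with rate 1/b
b = 2.0
for x in (0.5, 1.0, 3.0):
    print(f"x={x}: exponential={exponential_pdf(x, 1 / b):.5f}, "
          f"weibull={weibull_pdf(x, 1.0, b):.5f}")
```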

What is the linear regression equation?
The linear regression equation is given by:
Y = alpha + beta*X + e, where,
Y = Response variable.
X = Explanatory variable.
alpha = Intercept parameter.
beta = Slope parameter.
e = Uncorrelated error variables with mean 0 and common variance sigma^2.
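The parameters alpha and beta are usually estimated by least squares, which gives beta = Sxy / Sxx and alpha = mean(Y) - beta * mean(X). A minimal sketch with made-up data points:

```python
def fit_line(xs, ys):
    """Least-squares estimates of alpha (intercept) and beta (slope)."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    s_xy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    s_xx = sum((x - mean_x) ** 2 for x in xs)
    beta = s_xy / s_xx
    alpha = mean_y - beta * mean_x
    return alpha, beta


xs = [1, 2, 3, 4, 5]
ys = [3.1, 4.9, 7.2, 8.8, 11.0]  # hypothetical observations
alpha, beta = fit_line(xs, ys)
# Residuals are the estimates of the error term e for each observation
residuals = [y - (alpha + beta * x) for x, y in zip(xs, ys)]
print(f"alpha = {alpha:.3f}, beta = {beta:.3f}")
print("residuals:", [round(r, 3) for r in residuals])
```

With an intercept in the model, the least-squares residuals always sum to zero, which is consistent with the error term having mean 0.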

Reinsurance

Explain the concept of Reinsurance.
Reinsurance is a form of insurance purchased by insurance companies in order to mitigate risk. Essentially, reinsurance can limit the amount of loss an insurer can potentially suffer. In other words, it protects insurance companies from financial ruin, thereby protecting the companies’ customers from uncovered losses. The claims on an insurance company must be met in full, but, to protect itself from large claims, the company itself may take out an insurance policy; such a policy is called a Reinsurance policy.

State the types of reinsurance and explain them?
There are two types of reinsurance:
1) Proportional reinsurance – Under proportional reinsurance, the insurer and reinsurer split the claim in pre-defined proportions.
2) Excess of loss reinsurance – Under individual excess of loss, the insurer will pay any claim in full up to an amount M, the retention level. Any amount above M will be met by the reinsurer.
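The two arrangements split a claim as follows. In this sketch the retained proportion, the retention level M and the claim amounts are all hypothetical.

```python
def proportional_split(claim, retained_share):
    """Insurer keeps a fixed proportion of every claim; reinsurer pays the rest."""
    insurer = claim * retained_share
    return insurer, claim - insurer


def excess_of_loss_split(claim, retention):
    """Insurer pays up to the retention level M; reinsurer pays any excess."""
    insurer = min(claim, retention)
    return insurer, claim - insurer


for claim in (800, 1500):
    print(f"claim {claim}:")
    print("  proportional (75% retained):", proportional_split(claim, 0.75))
    print("  excess of loss (M = 1000):  ", excess_of_loss_split(claim, 1000))
```

Under excess of loss the reinsurer pays nothing on claims below M, whereas under proportional reinsurance it shares in every claim.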

Run-Off

Explain Run-off.
Run-off triangles are a method used to model claims experience. They’re specifically used to estimate the future claims that will be reported based on those already reported. When a claim event occurs, there will be some time before it is reported or notified to the insurer – this is known as a claim delay. The insurer will incur numerous claims in a calendar year, and each of those claims will have a claim delay. The run off triangles are used to estimate how much or how many claims have been incurred in a reporting period (e.g. financial year), but are not yet reported and a reserve is held for this. It’s called an IBNR – incurred but not reported reserve.

Explain the Bornhuetter-Ferguson method.
The Bornhuetter-Ferguson method combines the estimated loss ratio with a projection method. It, therefore, improves on the crude use of a loss ratio by taking account of the information provided by the latest development pattern of the claims, whilst the addition of the loss ratio to a projection method serves to add some stability against distortions in the development pattern.
Bornhuetter-Ferguson is one of the most widely used loss reserve valuation methods, second only to the chain-ladder method. It combines features of the chain ladder and expected loss ratio methods and assigns weights for the percentage of losses paid and losses incurred.
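In its usual form, the method sets the reserve equal to the expected ultimate loss (premium times loss ratio) multiplied by the proportion of claims still to develop, 1 - 1/f, where f is the cumulative development factor to ultimate. The sketch below follows that form; the function name and all figures are hypothetical.

```python
def bornhuetter_ferguson(claims_to_date, premium, loss_ratio, dev_factor):
    """Return (ultimate, reserve) using the Bornhuetter-Ferguson approach.

    dev_factor is the cumulative development factor from the current
    development period to ultimate, e.g. taken from a chain-ladder projection.
    """
    expected_ultimate = premium * loss_ratio   # a priori estimate
    proportion_outstanding = 1 - 1 / dev_factor  # share not yet developed
    reserve = expected_ultimate * proportion_outstanding
    return claims_to_date + reserve, reserve


ultimate, reserve = bornhuetter_ferguson(
    claims_to_date=120.0, premium=250.0, loss_ratio=0.8, dev_factor=1.65
)
print(f"reserve = {reserve:.2f}, ultimate = {ultimate:.2f}")
```

Because the reserve depends on the loss ratio rather than on the claims reported so far, an unusually high or low early figure has less effect than it would under a pure projection.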

How many methods are there in run-off triangles? What are they?
There are 4 methods in run-off triangles. They are the basic chain-ladder method, the index chain-ladder method, the Bornhuetter-Ferguson method and the Average Cost Per Claim method.
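As an illustration of the first of these, the basic chain-ladder method estimates development factors from the cumulative triangle and uses them to project each origin year to ultimate. This is a minimal sketch on a made-up 3x3 triangle, assuming claims are fully developed by the last development year.

```python
def chain_ladder(triangle):
    """Basic chain ladder on cumulative claims (row i has one fewer entry than row i-1)."""
    n = len(triangle)
    # Development factor j: ratio of column j+1 to column j over rows that have both
    factors = []
    for j in range(n - 1):
        rows = [row for row in triangle if len(row) > j + 1]
        factors.append(sum(r[j + 1] for r in rows) / sum(r[j] for r in rows))
    # Project the latest figure in each row to ultimate
    ultimates = []
    for row in triangle:
        value = row[-1]
        for j in range(len(row) - 1, n - 1):
            value *= factors[j]
        ultimates.append(value)
    reserves = [u - row[-1] for u, row in zip(ultimates, triangle)]
    return factors, ultimates, reserves


# Hypothetical cumulative claims by origin year (rows) and development year (columns)
triangle = [
    [100, 150, 165],
    [110, 165],
    [120],
]
factors, ultimates, reserves = chain_ladder(triangle)
print("development factors:", [round(f, 3) for f in factors])
print("ultimate claims:", [round(u, 2) for u in ultimates])
print("total reserve:", round(sum(reserves), 2))
```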

Why would you prefer a simple chain-ladder method over the Bornhuetter-Ferguson method and vice versa?
We would prefer the basic chain-ladder method over the Bornhuetter-Ferguson method because of the simplicity of its calculations. Conversely, we would prefer the Bornhuetter-Ferguson method over the basic chain-ladder method because it also takes the loss ratio into account, which makes the reserve estimate more stable and less sensitive to distortions in the development pattern.

Is the loss ratio good? Doesn’t it contradict the principle of prudence? Why?
Accounting for the loss ratio is a very good practice and a very important one too. Loss ratios are useful for evaluating the viability and health of an insurance company. A company is healthy when it receives more in premiums than it pays out in claims; a high loss ratio, where claims are large relative to premiums, may be a sign that it is struggling financially.
It doesn’t contradict the principle of prudence per se, in spite of being a conservative approach, because it gives us a realistic figure of reserves and protects us against extremities or anything unforeseen.