Introduction to Linear Regression
Linear regression is a statistical method used to model the relationship between a dependent variable (Y) and one or more independent variables (X). The goal is to find the best-fitting line (or hyperplane) that minimizes the sum of squared differences between predicted and actual values.
plt.xlabel('X')
plt.ylabel('y')
plt.title
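To make "minimizes the sum of squared differences" concrete, here is a minimal sketch of the closed-form least-squares fit for a single independent variable. The function name `fit_line` and the sample data are assumptions made for illustration; they do not come from the original notes.

```python
def fit_line(xs, ys):
    """Return (slope, intercept) of the least-squares line through the points."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Sum of squared deviations of x, and cross-deviations of x and y
    s_xx = sum((x - mean_x) ** 2 for x in xs)
    s_xy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = s_xy / s_xx
    intercept = mean_y - slope * mean_x
    return slope, intercept


# Hypothetical data lying exactly on the line y = 2x + 1
xs = [0.0, 1.0, 2.0, 3.0, 4.0]
ys = [1.0, 3.0, 5.0, 7.0, 9.0]
slope, intercept = fit_line(xs, ys)
print(slope, intercept)  # prints 2.0 1.0
```

With noisy data the same formulas apply; the line simply no longer passes through every point.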
1. Confidence Intervals in Linear Regression
A confidence interval (CI) gives a range of plausible values for a true population parameter. In linear regression, it expresses the uncertainty in each estimated coefficient: a 95% CI comes from a procedure that captures the true coefficient in 95% of repeated samples.
# 95% confidence intervals for the coefficients
print(results.conf_int(alpha=0.05))
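To show what sits behind a coefficient's confidence interval, the sketch below computes the interval for the slope of a one-variable regression by hand: estimate plus or minus a t critical value times the standard error. The function name, the data, and the hard-coded critical value 2.306 (two-sided 95% for 8 degrees of freedom) are assumptions for illustration, not the source's method.

```python
import math


def slope_confidence_interval(xs, ys, t_crit):
    """Return (slope, lower, upper) for the slope of a simple linear regression."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    s_xx = sum((x - mean_x) ** 2 for x in xs)
    s_xy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = s_xy / s_xx
    intercept = mean_y - slope * mean_x
    # Residual variance uses n - 2 degrees of freedom (two fitted parameters)
    ssr = sum((y - (intercept + slope * x)) ** 2 for x, y in zip(xs, ys))
    residual_var = ssr / (n - 2)
    std_err = math.sqrt(residual_var / s_xx)
    return slope, slope - t_crit * std_err, slope + t_crit * std_err


# Hypothetical noisy data, n = 10, so there are 8 degrees of freedom
xs = [1, 2, 3, 4, 5, 6, 7, 8, 9, 10]
ys = [2.1, 3.9, 6.2, 7.8, 10.1, 12.2, 13.8, 16.1, 18.0, 19.9]
T_CRIT_95_DF8 = 2.306  # assumed: two-sided 95% critical value of t with 8 df

slope, lower, upper = slope_confidence_interval(xs, ys, T_CRIT_95_DF8)
print(f"slope = {slope:.3f}, 95% CI = ({lower:.3f}, {upper:.3f})")
```

A narrower interval means the slope is estimated more precisely; more data or less noise both shrink the standard error.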
2. P-values in Linear Regression
The p-value tests the null hypothesis that a given coefficient equals zero (no effect). A smaller p-value indicates stronger evidence against the null hypothesis.
print(results.summary())  # The summary table includes p-values
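One way to see what a p-value means is a permutation test, sketched below. This is a swapped-in technique, not the t-test that a regression summary table reports: it shuffles y to simulate "no effect" and counts how often a shuffled slope is at least as extreme as the observed one. All names and data here are hypothetical.

```python
import random


def slope(xs, ys):
    """Least-squares slope for a single independent variable."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    s_xx = sum((x - mean_x) ** 2 for x in xs)
    s_xy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    return s_xy / s_xx


def permutation_p_value(xs, ys, n_permutations=2000, seed=0):
    """Two-sided permutation p-value for the null hypothesis slope == 0."""
    rng = random.Random(seed)  # fixed seed so the sketch is reproducible
    observed = abs(slope(xs, ys))
    shuffled = list(ys)
    extreme = 0
    for _ in range(n_permutations):
        rng.shuffle(shuffled)  # breaks any real link between x and y
        if abs(slope(xs, shuffled)) >= observed:
            extreme += 1
    # Add one to numerator and denominator so the estimate is never exactly zero
    return (extreme + 1) / (n_permutations + 1)


# Hypothetical data with a strong linear trend
xs = [1, 2, 3, 4, 5, 6, 7, 8, 9, 10]
ys = [2.1, 3.9, 6.2, 7.8, 10.1, 12.2, 13.8, 16.1, 18.0, 19.9]
p_value = permutation_p_value(xs, ys)
print(f"permutation p-value: {p_value:.4f}")
```

Because almost no shuffle reproduces a slope as steep as the observed one, the p-value is small, which is strong evidence against the null hypothesis of no effect.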
3. Sensitivity and Specificity
These metrics are typically used in classification problems to measure model performance. Sensitivity (true positive rate) is the fraction of actual positives the model identifies correctly, while specificity (true negative rate) is the fraction of actual negatives it identifies correctly.
Linear regression predicts continuous values, so these metrics do not apply to it directly; they apply to classification models such as logistic regression. Here is a quick example for binary classification (logistic regression):
# Predictions
y_pred_class = model_class.predict(X_class)
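Given predicted and true class labels, sensitivity and specificity follow directly from the confusion-matrix counts. The sketch below assumes binary labels coded as 0 and 1; the function name and label lists are hypothetical stand-ins for the model output above.

```python
def sensitivity_specificity(y_true, y_pred):
    """Return (sensitivity, specificity) for binary labels coded 0/1."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)  # true positives
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)  # false negatives
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)  # true negatives
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)  # false positives
    sensitivity = tp / (tp + fn)  # share of actual positives detected
    specificity = tn / (tn + fp)  # share of actual negatives detected
    return sensitivity, specificity


# Hypothetical labels: 4 actual positives, 6 actual negatives
y_true = [1, 1, 1, 1, 0, 0, 0, 0, 0, 0]
y_pred = [1, 1, 1, 0, 0, 0, 0, 0, 1, 1]
sens, spec = sensitivity_specificity(y_true, y_pred)
print(f"sensitivity = {sens:.2f}, specificity = {spec:.2f}")
```

The sketch does not guard against a class being absent; with no actual positives or no actual negatives the corresponding ratio is undefined.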
4. Precision and Recall
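Precision is the fraction of predicted positives that are truly positive (relevant instances among the retrieved instances), while recall is the fraction of actual positives that the model finds (relevant instances that have been retrieved). Recall is the same quantity as sensitivity.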
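The two definitions differ only in their denominators, which a short sketch makes visible. It assumes binary labels coded as 0 and 1; the function name and the data are hypothetical.

```python
def precision_recall(y_true, y_pred):
    """Return (precision, recall) for binary labels coded 0/1."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)  # true positives
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)  # false positives
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)  # false negatives
    precision = tp / (tp + fp)  # denominator: everything predicted positive
    recall = tp / (tp + fn)     # denominator: everything actually positive
    return precision, recall


# Hypothetical labels: 5 predicted positives, 4 actual positives
y_true = [1, 1, 1, 1, 0, 0, 0, 0, 0, 0]
y_pred = [1, 1, 1, 0, 0, 0, 0, 0, 1, 1]
precision, recall = precision_recall(y_true, y_pred)
print(f"precision = {precision:.2f}, recall = {recall:.2f}")
```

Here the model finds most positives (recall 0.75) but two of its five positive calls are wrong (precision 0.60), which is the usual trade-off between the two metrics.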
5. Accuracy in Regression Models
Accuracy is the fraction of predictions that are correct. In regression, predictions are continuous and rarely match the true value exactly, so accuracy is not a meaningful metric; error metrics such as Mean Squared Error (MSE) are used instead. For classification, accuracy measures overall correctness.
# Predicted values
y_pred_reg = model.predict(X)
# Calculate MSE as the error metric for regression
mse = mean_squared_error(y, y_pred_reg)
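The contrast between the two kinds of metric can be sketched without any library. The helper names `mse_manual` and `accuracy` and all the values are hypothetical; `mse_manual` computes the mean of the squared errors, the quantity that MSE names.

```python
def mse_manual(y_true, y_pred):
    """Mean squared error: average of squared prediction errors."""
    return sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true)


def accuracy(labels_true, labels_pred):
    """Fraction of class labels predicted exactly right."""
    correct = sum(1 for t, p in zip(labels_true, labels_pred) if t == p)
    return correct / len(labels_true)


# Regression: continuous targets, so measure the size of the errors
y_true = [3.0, 5.0, 7.0, 9.0]
y_pred = [2.5, 5.0, 7.5, 10.0]
print(mse_manual(y_true, y_pred))  # prints 0.375

# Classification: discrete labels, so count exact matches
labels_true = [1, 0, 1, 1, 0]
labels_pred = [1, 0, 0, 1, 0]
print(accuracy(labels_true, labels_pred))  # prints 0.8
```

Only one of the four regression predictions matches its target exactly, so accuracy would report 25% even though the errors are small; MSE reflects how far off the predictions are.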
6. Incidence and Prevalence
- Incidence refers to the number of new cases of a disease or condition during a specific period of time.
- Prevalence refers to the total number of cases (new and existing) of a disease or condition at a specific time.
In epidemiology, these metrics help quantify the risk of health events. Linear regression can model the relationship between these factors and outcomes.
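A small worked calculation shows how the two measures differ. All numbers are hypothetical, and the sketch assumes a closed population with no recoveries or deaths during the period; the function names are illustrative only.

```python
def incidence_proportion(new_cases, population_at_risk):
    """New cases during the period divided by those at risk at its start."""
    return new_cases / population_at_risk


def point_prevalence(total_cases, population):
    """All cases (new and existing) at a point in time divided by the population."""
    return total_cases / population


# Hypothetical town followed for one year
population = 10_000
existing_cases_at_start = 200
new_cases_during_year = 50

# People who already have the condition are not at risk of becoming new cases
at_risk = population - existing_cases_at_start
incidence = incidence_proportion(new_cases_during_year, at_risk)

# Assumes nobody recovered or died, so end-of-year cases = existing + new
prevalence = point_prevalence(existing_cases_at_start + new_cases_during_year, population)

print(f"incidence over the year: {incidence:.4f}")
print(f"prevalence at year end:  {prevalence:.4f}")
```

Incidence describes how fast new cases arise, while prevalence describes the burden at one moment, which is why a long-lasting condition can have low incidence and high prevalence.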
7. Quantifying Risk Using Regression
Risk quantification is common in healthcare, finance, and insurance. Logistic regression, in particular, is widely used for modeling risk (such as the risk of a disease).
With linear regression, risk is instead estimated as a continuous value from the independent variables.
# Linear Regression Model
model_risk = LinearRegression()
model_risk.fit(X_risk, y_risk)
# Predicted risk
predicted_risk = model_risk.predict(X_risk)
# Plotting
plt.scatter(X_risk, y_risk, color='green')
plt.plot(X_risk, predicted_risk, color='purple')
plt.xlabel('Exposure')
plt.ylabel('Risk')
plt.title('Risk Quantification using Linear Regression')
plt.show()