Forum: Univariate Linear Regression
Introduction:
Welcome to the Univariate Linear Regression Forum! This space is dedicated to discussions and knowledge sharing on key statistical concepts and performance metrics in regression analysis. Whether you're a researcher, data analyst, or student, feel free to ask questions, share insights, and collaborate on various statistical and machine learning topics.
1. Confidence Interval
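A confidence interval gives a range of plausible values for a population parameter, such as a regression coefficient, at a stated confidence level (e.g., 95%). If the sampling and estimation were repeated many times, about 95% of the intervals built this way would contain the true value. Discuss methods to compute confidence intervals in regression and their link to hypothesis testing: a 95% interval for a coefficient that excludes 0 corresponds to rejecting H0: β = 0 at the 0.05 level.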
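import statsmodels.api as sm
# Fit model: x and y are the predictor and response arrays
X = sm.add_constant(x)  # add an intercept column
model = sm.OLS(y, X).fit()
print(model.conf_int(alpha=0.05))  # 95% confidence intervals for intercept and slope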
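For a single predictor, the OLS interval for the slope is the estimate plus or minus a Student's t critical value (with n − 2 degrees of freedom) times the slope's standard error. The plain-Python sketch below computes that interval by hand, assuming simple OLS with an intercept. The function name `slope_confidence_interval` and the data are hypothetical, and the critical value is passed in rather than computed, so in practice it would come from a t table or a call such as `scipy.stats.t.ppf(0.975, n - 2)`.

```python
import math


def slope_confidence_interval(x, y, t_crit):
    """Return (slope, lower, upper) for the OLS slope of y on x.

    t_crit is the two-sided Student's t critical value with n - 2 degrees
    of freedom (about 3.182 for 95% confidence when n = 5).
    """
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    slope = sxy / sxx
    intercept = y_bar - slope * x_bar
    # Residual variance uses n - 2 degrees of freedom (intercept and slope are estimated)
    sse = sum((yi - (intercept + slope * xi)) ** 2 for xi, yi in zip(x, y))
    se_slope = math.sqrt(sse / (n - 2) / sxx)
    margin = t_crit * se_slope
    return slope, slope - margin, slope + margin


# Hypothetical illustrative data: n = 5, so df = 3
x = [1, 2, 3, 4, 5]
y = [2.1, 3.9, 6.2, 7.8, 10.1]
slope, lower, upper = slope_confidence_interval(x, y, t_crit=3.182)
print(f"slope = {slope:.3f}, 95% CI = ({lower:.3f}, {upper:.3f})")
```

With this data the whole interval lies above 0, so the slope differs significantly from 0 at the 5% level.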
2. P-Values
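P-values help determine statistical significance in regression analysis. For a coefficient, the p-value is the probability, assuming the null hypothesis (the coefficient equals 0) is true, of obtaining a test statistic at least as extreme as the one observed. A small p-value (typically below 0.05) is taken as evidence against the null hypothesis. Engage in discussions on interpreting p-values and their role in model validation.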
print(model.pvalues)  # p-values for the model coefficients
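For the slope in a univariate OLS model, the reported p-value comes from a t-test of H0: β1 = 0. Written out, with n observations, fitted values ŷ_i, and T_{n-2} a Student's t variable with n − 2 degrees of freedom:

```latex
t = \frac{\hat\beta_1}{\widehat{\mathrm{SE}}(\hat\beta_1)}, \qquad
\widehat{\mathrm{SE}}(\hat\beta_1) = \sqrt{\frac{\sum_{i=1}^{n} (y_i - \hat y_i)^2 / (n - 2)}{\sum_{i=1}^{n} (x_i - \bar x)^2}}, \qquad
p = 2\,P\left(T_{n-2} \ge |t|\right)
```

A large |t|, and hence a small p, means the slope estimate lies many standard errors away from 0.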
3. Confusion Matrix
While primarily used in classification problems, confusion matrices can also help in threshold-based regression applications, where continuous predictions are converted into classes by comparing them with a cutoff. Discuss how to construct and interpret confusion matrices to evaluate model performance.
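from sklearn.metrics import confusion_matrix
y_true = [1, 0, 1, 1, 0, 1, 0, 0, 1, 0]  # actual classes
y_pred = [1, 0, 1, 0, 0, 1, 0, 1, 1, 0]  # predicted classes (e.g. thresholded model outputs)
print(confusion_matrix(y_true, y_pred))  # rows: actual 0/1, columns: predicted 0/1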
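In the binary case, `confusion_matrix` with labels 0 and 1 puts actual classes in rows and predicted classes in columns, i.e. [[TN, FP], [FN, TP]]. The plain-Python sketch below counts the same four cells and also shows the threshold-based use: continuous regression outputs become class labels by comparison with a cutoff. The helper names, the 0.5 cutoff, and the `scores` values are hypothetical choices for illustration.

```python
def threshold_predictions(scores, cutoff=0.5):
    """Turn continuous model outputs into 0/1 labels (1 if score >= cutoff)."""
    return [1 if s >= cutoff else 0 for s in scores]


def confusion_counts(y_true, y_pred):
    """Return (tn, fp, fn, tp) for binary labels coded as 0/1."""
    tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    return tn, fp, fn, tp


y_true = [1, 0, 1, 1, 0, 1, 0, 0, 1, 0]
y_pred = [1, 0, 1, 0, 0, 1, 0, 1, 1, 0]
tn, fp, fn, tp = confusion_counts(y_true, y_pred)
# Same layout as sklearn's confusion_matrix for labels [0, 1]
print([[tn, fp], [fn, tp]])

# Hypothetical continuous outputs from a regression model, thresholded at 0.5
scores = [0.9, 0.2, 0.7, 0.4, 0.1, 0.8, 0.3, 0.6, 0.55, 0.05]
print(confusion_counts(y_true, threshold_predictions(scores)))
```

With these scores, the 0.5 cutoff reproduces `y_pred` exactly, so both calls give the same counts.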
4. Sensitivity & Specificity
- Sensitivity (True Positive Rate): Measures the model's ability to correctly identify positive cases: TP / (TP + FN).
- Specificity (True Negative Rate): Measures the model's ability to correctly identify negative cases: TN / (TN + FP).
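Both rates come straight from the confusion-matrix counts. A minimal sketch, assuming the four counts have already been tallied (the function names are illustrative):

```python
def sensitivity(tp, fn):
    """True positive rate: share of actual positives that are predicted positive."""
    return tp / (tp + fn)


def specificity(tn, fp):
    """True negative rate: share of actual negatives that are predicted negative."""
    return tn / (tn + fp)


# Counts from the example in section 3: TN = 4, FP = 1, FN = 1, TP = 4
print(f"Sensitivity: {sensitivity(tp=4, fn=1)}, Specificity: {specificity(tn=4, fp=1)}")
```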
5. Positive and Negative Predictive Values (PPV & NPV)
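- Positive Predictive Value (PPV): Probability that a positive prediction is correct: TP / (TP + FP).
- Negative Predictive Value (NPV): Probability that a negative prediction is correct: TN / (TN + FN).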
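tn, fp, fn, tp = confusion_matrix(y_true, y_pred).ravel()  # counts from the example in section 3
print(f"PPV: {tp / (tp + fp)}, NPV: {tn / (tn + fn)}")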
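Unlike sensitivity and specificity, PPV and NPV depend on how common the condition is. By Bayes' theorem they can be computed from sensitivity, specificity, and prevalence, as in the sketch below (the function name is illustrative and the 0.9 / 0.9 / 0.1 inputs are hypothetical):

```python
def predictive_values(sens, spec, prev):
    """Return (PPV, NPV) from sensitivity, specificity and prevalence via Bayes' theorem."""
    true_pos = sens * prev                # P(predicted +, condition present)
    false_pos = (1 - spec) * (1 - prev)   # P(predicted +, condition absent)
    true_neg = spec * (1 - prev)          # P(predicted -, condition absent)
    false_neg = (1 - sens) * prev         # P(predicted -, condition present)
    return true_pos / (true_pos + false_pos), true_neg / (true_neg + false_neg)


ppv, npv = predictive_values(sens=0.9, spec=0.9, prev=0.1)
print(f"PPV: {ppv:.3f}, NPV: {npv:.3f}")
```

Even with 90% sensitivity and specificity, a prevalence of 10% gives a PPV of only 0.5: half of the positive predictions are false alarms.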
6. Precision and Recall
- Precision: The proportion of correctly predicted positive cases among all predicted positives (the same quantity as PPV).
- Recall: The proportion of actual positive cases that were correctly predicted (the same quantity as sensitivity).
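from sklearn.metrics import precision_score, recall_score
print(f"Precision: {precision_score(y_true, y_pred)}, Recall: {recall_score(y_true, y_pred)}")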
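When positive predictions come from thresholding a continuous score, moving the cutoff trades precision against recall. A plain-Python sketch with hypothetical scores (the helper name and data are illustrative, not from the original post):

```python
def precision_recall_at(y_true, scores, cutoff):
    """Precision and recall when scores >= cutoff are predicted positive."""
    y_pred = [1 if s >= cutoff else 0 for s in scores]
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    # Precision is undefined when nothing is predicted positive; report NaN in that case
    precision = tp / (tp + fp) if tp + fp else float("nan")
    recall = tp / (tp + fn)
    return precision, recall


y_true = [1, 1, 1, 0, 1, 0, 0, 0]
scores = [0.9, 0.8, 0.7, 0.6, 0.4, 0.3, 0.2, 0.1]
for cutoff in (0.25, 0.5, 0.75):
    precision, recall = precision_recall_at(y_true, scores, cutoff)
    print(f"cutoff={cutoff}: Precision: {precision:.2f}, Recall: {recall:.2f}")
```

Raising the cutoff from 0.25 to 0.75 moves precision from about 0.67 to 1.0 while recall falls from 1.0 to 0.5.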
7. Accuracy
Accuracy measures the proportion of correctly predicted cases among total cases. However, it may not be a reliable metric for imbalanced datasets. Join discussions on when and how to use accuracy in model evaluation.
from sklearn.metrics import accuracy_score
accuracy = accuracy_score(y_true, y_pred)  # (TP + TN) / total
print(f"Accuracy: {accuracy}")
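To see why accuracy can mislead on imbalanced data, consider a hypothetical set with 95 negatives and 5 positives and a trivial model that always predicts the negative class:

```python
y_true = [0] * 95 + [1] * 5   # hypothetical imbalanced labels: 5% positives
y_pred = [0] * 100            # trivial model: always predict negative

# Accuracy: share of all cases predicted correctly
accuracy = sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)
# Recall: share of actual positives predicted positive
recall = sum(t == 1 and p == 1 for t, p in zip(y_true, y_pred)) / sum(y_true)
print(f"Accuracy: {accuracy}, Recall: {recall}")
```

The model scores 95% accuracy yet never finds a positive case, which is why sensitivity and specificity (or precision and recall) are usually reported alongside accuracy.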
8. Incidence & Prevalence
- Incidence: The rate at which new cases occur in a population over a specified period.
- Prevalence: The proportion of a population that has the condition (new and existing cases) at a given time.
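new_cases, existing_cases, population = 50, 200, 10_000  # hypothetical counts for illustration
incidence = new_cases / population  # new cases during the period / population
prevalence = (new_cases + existing_cases) / population  # all current cases / population
print(f"Incidence: {incidence}, Prevalence: {prevalence}")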
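Incidence is often reported in two forms: cumulative incidence (new cases divided by the population at risk at the start of the period, which excludes people who already have the condition) and incidence rate (new cases divided by the total person-time at risk). A sketch with hypothetical counts; all numbers, including the person-years figure, are illustrative:

```python
population = 10_000
existing_cases = 200    # already have the condition at the start, so not at risk
new_cases = 50          # cases arising during a one-year follow-up
person_years = 9_780    # hypothetical total time at risk actually observed

at_risk = population - existing_cases
cumulative_incidence = new_cases / at_risk          # proportion over the period
incidence_rate = new_cases / person_years * 1_000   # cases per 1,000 person-years
print(f"Cumulative incidence: {cumulative_incidence:.4f}")
print(f"Incidence rate: {incidence_rate:.2f} per 1,000 person-years")
```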
9. Quantifying Risk
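Regression models are often used to estimate risk. Discuss different risk measures such as:
- Risk Ratios (RR): the risk of the outcome in an exposed group divided by the risk in an unexposed group.
- Odds Ratios (OR): the odds of the outcome in the exposed group divided by the odds in the unexposed group; exponentiated logistic regression coefficients are odds ratios.
- Hazard Ratios (HR): the ratio of instantaneous event rates between groups, as estimated by Cox proportional hazards regression.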
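a, b, c, d = 20, 80, 10, 90  # hypothetical 2x2 counts: exposed cases/non-cases, unexposed cases/non-cases
print(f"Odds Ratio: {(a * d) / (b * c)}")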
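For a 2×2 table of exposure against outcome, the risk ratio compares the outcome risk in the exposed and unexposed groups, while the odds ratio compares their odds; the two are close only when the outcome is rare. A sketch with hypothetical counts, where a and b are cases and non-cases among the exposed and c and d among the unexposed (function names are illustrative):

```python
def risk_ratio(a, b, c, d):
    """Risk in exposed, a / (a + b), divided by risk in unexposed, c / (c + d)."""
    return (a / (a + b)) / (c / (c + d))


def odds_ratio(a, b, c, d):
    """Odds in exposed, a / b, divided by odds in unexposed, c / d, i.e. ad / bc."""
    return (a * d) / (b * c)


# Common outcome: OR (2.25) overstates RR (2.0)
print(f"Risk Ratio: {risk_ratio(20, 80, 10, 90)}, Odds Ratio: {odds_ratio(20, 80, 10, 90)}")
# Rare outcome: OR is close to RR
print(f"Risk Ratio: {risk_ratio(2, 998, 1, 999)}, Odds Ratio: {odds_ratio(2, 998, 1, 999)}")
```

Hazard ratios need time-to-event data rather than a single 2×2 table, so they are not covered by this sketch.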