Prediction Of Heart-Failure Using Machine Learning

Parvez Sohail
Jan 1, 2022

More than 300,000 deaths occur every year due to heart failure. The heart is a vital organ of the human body: it pumps blood through the arteries and veins, delivering it to every part of the body.

A heart attack, also called a myocardial infarction, occurs when a part of the heart muscle is damaged or dies because blood flow to it is reduced or completely blocked. Every 40 seconds, someone in the United States has a heart attack.

Cardiovascular disease is one of the most common causes of heart failure. Cardiovascular diseases kill approximately 17 million people globally every year, and they mainly exhibit as myocardial infarctions and heart failures. Cardiovascular disease (CVD) is a class of diseases that involve the heart or blood vessels. CVD includes coronary artery diseases (CAD) such as angina and myocardial infarction (commonly known as a heart attack).

Other CVDs include

  • Stroke
  • Heart Failure
  • Hypertensive Heart Disease
  • Rheumatic Heart Disease
  • Cardiomyopathy
  • and others


  1. Introduction
  2. What is Heart Failure?
  3. Serum Creatinine
  4. Ejection Fraction
  5. Types of Methods
  6. Conclusion


We are going to classify whether or not a patient with heart failure will die from it. This project replicates a state-of-the-art paper from BMC.

In the paper, the authors analyze a dataset of 299 patients with heart failure collected in 2015. They apply different ML classifiers both to predict patient survival and to rank the features corresponding to the most important risk factors. The feature ranking focuses especially on two features: serum creatinine and ejection fraction.

What is Heart Failure?

The term “heart failure” makes it sound like the heart is no longer working at all and there’s nothing that can be done. In fact, it is a chronic, progressive condition in which the heart muscle is unable to pump enough blood to meet the body’s needs for blood and oxygen. In simple terms, heart failure means that the heart isn’t pumping as well as it should be.

At first, the heart tries to make up for this by:

  • Enlarging: The heart stretches to contract more strongly and keep up with the demand to pump more blood. Over time this causes the heart to become enlarged.
  • Developing more muscle mass: The increase in muscle mass occurs because the contracting cells of the heart get bigger. This lets the heart pump more strongly, at least initially.
  • Pumping faster: This helps increase the heart’s output.

The body also tries to compensate in other ways:

  • The blood vessels narrow to keep blood pressure up, trying to make up for the heart’s loss of power.
  • The body diverts blood away from less important tissues and organs (like the kidneys) and toward the heart and the brain.

These temporary measures mask the problem of heart failure, but they don’t solve it. Heart failure continues and worsens until these compensating processes no longer work.

Eventually the heart and body just can’t keep up, and the person experiences the fatigue, breathing problems or other symptoms that usually prompt a trip to the doctor.

The body’s compensation mechanisms help explain why some people may not become aware of their condition until years after their heart begins its decline. (It’s also a good reason to have a regular checkup with your doctor.)

Heart failure can involve the heart’s left side, right side or both sides. However, it usually affects the left side first.

Serum Creatinine

Diagnostic serum creatinine studies are used to determine renal function (the term used to describe how well the kidneys work). The reference interval is 0.6–1.3 mg/dL (53–115 μmol/L). Measuring serum creatinine is a simple test, and it is the most commonly used indicator of renal function.

A rise in blood creatinine concentration is a late marker, observed only with marked damage to functioning nephrons.

Therefore, this test is unsuitable for detecting early-stage kidney disease. A better estimation of kidney function is given by calculating the estimated glomerular filtration rate (eGFR). eGFR can be accurately calculated without a 24-hour urine collection using serum creatinine concentration and some or all of the following variables: sex, age, weight, and race, as suggested by the American Diabetes Association. Many laboratories will automatically calculate eGFR when a creatinine test is requested.
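As a rough illustration of how eGFR can be derived from serum creatinine, here is a minimal sketch using the 2021 CKD-EPI creatinine equation. This is an assumption on my part: the text above does not say which equation is meant, and this particular one uses only creatinine, age, and sex (not weight or race). The function name is made up for this example, and the code is for illustration only, not for clinical use.

```python
def egfr_ckd_epi_2021(serum_creatinine_mg_dl, age_years, is_female):
    """Estimate GFR in mL/min/1.73 m^2 with the 2021 CKD-EPI creatinine equation.

    Illustrative sketch only; the equation choice is an assumption, not
    something stated in the article.
    """
    kappa = 0.7 if is_female else 0.9          # sex-specific creatinine scale
    alpha = -0.241 if is_female else -0.302    # sex-specific exponent
    ratio = serum_creatinine_mg_dl / kappa

    egfr = (
        142.0
        * min(ratio, 1.0) ** alpha      # applies when creatinine is below kappa
        * max(ratio, 1.0) ** -1.200     # applies when creatinine is above kappa
        * 0.9938 ** age_years           # eGFR declines with age
    )
    if is_female:
        egfr *= 1.012
    return egfr


# Made-up example patients: same age and sex, different creatinine levels
normal = egfr_ckd_epi_2021(0.9, 50, is_female=False)
elevated = egfr_ckd_epi_2021(2.0, 50, is_female=False)
print(f"creatinine 0.9 mg/dL -> eGFR {normal:.0f}")
print(f"creatinine 2.0 mg/dL -> eGFR {elevated:.0f}")
```

The higher creatinine value yields a lower estimated filtration rate, which is the relationship the paragraph above describes.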

What is the role of the kidney in Heart Failure?

Renal dysfunction is common in patients with heart failure and is associated with high morbidity and mortality.

Cardiac and renal dysfunction may worsen each other through multiple mechanisms such as fluid overload and increased venous pressure, hypo-perfusion, neurohormonal and inflammatory activation, and concomitant treatment. The interaction between cardiac and renal dysfunction may be critical for disease progression and prognosis.

Renal dysfunction is conventionally defined by a reduced glomerular filtration rate, calculated from serum creatinine levels. This definition has limitations as serum creatinine is dependent on age, gender, muscle mass, volume status, and renal hemodynamics. Changes in serum creatinine related to treatment with diuretics or angiotensin-converting enzyme inhibitors are not necessarily associated with worse outcomes.

New biomarkers might be of additional value to detect early deterioration in renal function and to improve the prognostic assessment, but they need further validation. Thus, the evaluation of renal function in patients with heart failure is important as it may reflect their hemodynamic status and provide a better prognostic assessment. The prevention of renal dysfunction with new therapies might also improve outcomes although strong evidence is still lacking.

Ejection Fraction

An ejection fraction (EF) is the volumetric fraction (or portion of the total) of fluid (usually blood) ejected from a chamber (usually the heart) with each contraction (or heartbeat). Thus understood, ejection fraction may be used to measure a fluid of any viscosity discharged from a hollow organ to another cavity or outside of the body. Blood, bile, and urine are commonly studied under this mathematical platform.

For example, the chamber may be the cardiac atrium, ventricle, gall bladder, or leg veins, although if unspecified the term usually refers to the left ventricle of the heart.

What is the role of Ejection Fraction in Heart Failure?

EF is widely used as a measure of the pumping efficiency of the heart and is used to classify heart failure types. It is also used as an indicator of the severity of heart failure, although it has recognized limitations.

The EF of the left heart, known as the left ventricular ejection fraction (LVEF), is calculated by dividing the volume of blood pumped from the left ventricle per beat (stroke volume) by the volume of blood collected in the left ventricle at the end of diastolic filling (end-diastolic volume). LVEF is an indicator of the effectiveness of pumping into the systemic circulation.
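The LVEF calculation described above can be sketched in a few lines. The function name and the example volumes are made up for illustration; the stroke volume is taken here as end-diastolic volume minus end-systolic volume.

```python
def ejection_fraction(end_diastolic_volume_ml, end_systolic_volume_ml):
    """Return the ejection fraction as a value between 0 and 1."""
    if end_diastolic_volume_ml <= 0:
        raise ValueError("end-diastolic volume must be positive")
    # Stroke volume: blood pumped out of the ventricle in one beat
    stroke_volume_ml = end_diastolic_volume_ml - end_systolic_volume_ml
    return stroke_volume_ml / end_diastolic_volume_ml


# Hypothetical example: 120 mL at the end of filling, 50 mL left after contraction
lvef = ejection_fraction(120.0, 50.0)
print(f"LVEF = {lvef:.1%}")  # 70 / 120, about 58.3%
```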

The EF of the right heart, or right ventricular ejection fraction (RVEF), is a measure of the efficiency of pumping into the pulmonary circulation. A heart that cannot pump sufficient blood to meet the body’s requirements (i.e., heart failure) will often, but not invariably, have a reduced ventricular ejection fraction.

Types of Methods

The goal is to train a model that classifies whether a given heart failure patient will experience a death event or not. The following methods have been used; for the code, you can refer here.

  1. Machine Learning Methods for the binary classification (Survival prediction classifier).
  2. Biostatistics and machine learning methods for feature ranking.
  3. Survival machine learning prediction on serum creatinine and ejection fraction alone.

1. Survival Prediction Classifier

For classification, 10 different machine learning algorithms have been used:

  • One linear statistical method (Linear Regression).
  • Three tree-based methods (Random Forests, One Rule, Decision Tree).
  • One Artificial Neural Network (perceptron).
  • Two Support Vector Machines (Linear and Gaussian radial kernel).
  • One instance-based learning model (K-Nearest Neighbors).
  • One probabilistic classifier (Naive Bayes).
  • An ensemble boosting method (Gradient Boosting).

The evaluation metrics are:

  • Matthews correlation coefficient (MCC): The MCC takes the dataset imbalance into account and generates a high score only if the predictor performs well on both the majority of negative data instances and the majority of positive data instances. Therefore, we give more importance to the MCC than to the other confusion matrix metrics and rank the results based on the MCC.
  • Receiver operating characteristic (ROC) area under the curve.
  • Precision-recall (PR) area under the curve.
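To see why the MCC is preferred on imbalanced data, here is a small sketch that computes accuracy and MCC from the four confusion matrix counts. The counts are invented for illustration and are not results from the paper.

```python
import math


def accuracy(tp, tn, fp, fn):
    """Fraction of all predictions that are correct."""
    return (tp + tn) / (tp + tn + fp + fn)


def mcc(tp, tn, fp, fn):
    """Matthews correlation coefficient, in the range [-1, 1]."""
    denominator = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    if denominator == 0:
        return 0.0  # common convention when a row or column of the matrix is empty
    return (tp * tn - fp * fn) / denominator


# Invented imbalanced case: 10 positives, 90 negatives.
# The classifier finds almost all negatives but only 1 of the 10 positives.
tp, fn, tn, fp = 1, 9, 89, 1
print(f"accuracy = {accuracy(tp, tn, fp, fn):.2f}")
print(f"MCC      = {mcc(tp, tn, fp, fn):.2f}")
```

With these counts the accuracy is 0.90 while the MCC is only about 0.19, because the classifier does well on the negative class only.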

The results for the survival prediction classifier are

2. Feature Ranking

For the feature ranking, two approaches are used

  • Biostatistics
  • Machine Learning


In the Biostatistics approach, three different tests have been used:

  1. Mann-Whitney U test: The test involves the calculation of a statistic, usually called U, whose distribution under the null hypothesis is known.

How is the Mann-Whitney U test used in the prediction of Heart Failure?

The Mann–Whitney U test (or Wilcoxon rank–sum test), applied to each feature in relation to the death event (target), detects whether we can reject the null hypothesis that the distribution of each feature for the groups of samples defined by death event is the same. A low p-value of this test (close to 0) means that the analyzed feature strongly relates to death events, while a high p-value (close to 1) means the opposite.
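As a minimal sketch of the idea, the U statistic can be computed by comparing every value in one group with every value in the other. The p-value below uses the normal approximation without tie correction, which is a simplification I am assuming for brevity (statistical libraries offer exact and tie-corrected versions). The sample values are invented, not taken from the dataset.

```python
import math


def mann_whitney_u(x, y):
    """Return (U, two-sided p-value) using the normal approximation."""
    n1, n2 = len(x), len(y)
    # Count pairs where the x value exceeds the y value; ties count as half
    u1 = sum(1.0 if a > b else 0.5 if a == b else 0.0 for a in x for b in y)
    u = min(u1, n1 * n2 - u1)

    mean_u = n1 * n2 / 2.0
    sd_u = math.sqrt(n1 * n2 * (n1 + n2 + 1) / 12.0)
    z = (u - mean_u) / sd_u
    p_value = math.erfc(abs(z) / math.sqrt(2.0))  # two-sided tail probability
    return u, p_value


# Invented serum creatinine values (mg/dL) for two groups of patients
survived = [0.9, 1.0, 1.1, 0.8, 1.2, 1.0]
died = [1.8, 2.1, 1.3, 2.5, 1.9, 1.6]

u, p = mann_whitney_u(survived, died)
print(f"U = {u}, p = {p:.4f}")
```

In this toy sample every value in the second group is larger than every value in the first, so U is 0 and the p-value is small. These are toy numbers, not the study's results.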

Here are the results

  2. Pearson correlation coefficient: It is the ratio between the covariance of two variables and the product of their standard deviations; thus it is essentially a normalized measurement of the covariance, such that the result always has a value between −1 and 1.

How is the Pearson correlation coefficient used in the prediction of Heart Failure?

The Pearson correlation coefficient (or Pearson product-moment correlation coefficient, PCC) indicates the linear correlation between the elements of two lists. In this dataset, it is applied to each feature in relation to the death event (target), so that the same position in both lists refers to the same patient. The absolute value of the PCC is high (close to 1) if the elements of the two lists are linearly correlated, and low (close to 0) otherwise.
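The definition above translates directly into code. This is a small sketch with invented values: one list holds a feature, the other holds the death event as 0 or 1, and the same index in both lists refers to the same hypothetical patient.

```python
import math


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length lists."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    # Covariance and standard deviations share the same 1/n factor, so it cancels
    covariance = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    spread_x = math.sqrt(sum((a - mean_x) ** 2 for a in x))
    spread_y = math.sqrt(sum((b - mean_y) ** 2 for b in y))
    return covariance / (spread_x * spread_y)


# Invented values: ejection fraction (%) and death event (1 = died)
ejection_fraction = [60, 55, 50, 45, 30, 25, 20, 35]
death_event = [0, 0, 0, 0, 1, 1, 1, 1]

r = pearson(ejection_fraction, death_event)
print(f"PCC = {r:.2f}, |PCC| = {abs(r):.2f}")
```

The coefficient is negative here because lower ejection fraction goes with death events in the toy data; for ranking features, the absolute value is what matters.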

Here are the results

  3. Chi-square test: A chi-square test for independence compares two variables in a contingency table to see if they are related.

How is the chi-square test used in the prediction of Heart Failure?

The chi-square test (or χ² test) between two features checks how likely an observed distribution is due to chance. In this dataset, it is applied to each feature in relation to the death event (target). A low p-value (close to 0) means that the two features have a strong relationship; a high p-value (close to 1) means, instead, that the null hypothesis of independence cannot be discarded.
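For a binary feature and a binary target, the test reduces to a 2×2 contingency table. The sketch below computes the statistic by hand and, since a 2×2 table has one degree of freedom, obtains the p-value from the complementary error function. No continuity correction is applied, which is a simplification I am assuming here, and the counts are invented.

```python
import math


def chi_square_2x2(table):
    """Return (chi-square statistic, p-value) for a 2x2 table of counts.

    Assumes every row and column total is non-zero.
    """
    (a, b), (c, d) = table
    total = a + b + c + d
    row_totals = (a + b, c + d)
    col_totals = (a + c, b + d)

    chi2 = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            # Count expected in this cell if feature and target were independent
            expected = row_totals[i] * col_totals[j] / total
            chi2 += (observed - expected) ** 2 / expected

    # Survival function of the chi-square distribution with 1 degree of freedom
    p_value = math.erfc(math.sqrt(chi2 / 2.0))
    return chi2, p_value


# Invented counts. Rows: feature present / absent. Columns: died / survived.
related = [[30, 10], [10, 30]]
unrelated = [[20, 20], [20, 20]]

for name, table in (("related", related), ("unrelated", unrelated)):
    chi2, p = chi_square_2x2(table)
    print(f"{name}: chi2 = {chi2:.1f}, p = {p:.4f}")
```

The first table gives a large statistic and a p-value near 0; the second gives a statistic of 0 and a p-value of 1, so independence cannot be rejected.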

Here are the results

Machine Learning

The other approach to feature importance, or ranking, is Machine Learning.

With ML, we want to find the features that best classify the death event. For this, we use the best-performing algorithm among the 10 different models.

Here, we are going to use Random Forest to find the feature importance.
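A minimal sketch of ranking features with a Random Forest follows, assuming scikit-learn and NumPy are installed. The data is synthetic and generated inside the example; the column names, the risk formula, and the hyperparameters are all assumptions for illustration and do not reproduce the real dataset or the paper's setup.

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier

rng = np.random.default_rng(0)
n_patients = 300

# Synthetic stand-ins for clinical features (NOT the real dataset)
serum_creatinine = rng.normal(1.4, 0.5, n_patients)
ejection_fraction = rng.normal(38.0, 10.0, n_patients)
noise = rng.normal(0.0, 1.0, n_patients)  # deliberately unrelated to the target

# Hypothetical target: higher creatinine and lower EF raise the risk
risk = 2.0 * (serum_creatinine - 1.4) - 0.15 * (ejection_fraction - 38.0)
death_event = (risk + rng.normal(0.0, 0.5, n_patients) > 0).astype(int)

X = np.column_stack([serum_creatinine, ejection_fraction, noise])
names = ["serum_creatinine", "ejection_fraction", "noise"]

model = RandomForestClassifier(n_estimators=200, random_state=0)
model.fit(X, death_event)

# feature_importances_ holds one impurity-based score per column, summing to 1
ranking = sorted(zip(names, model.feature_importances_),
                 key=lambda pair: pair[1], reverse=True)
for name, importance in ranking:
    print(f"{name:20s} {importance:.3f}")
```

Because the target is built from the first two columns only, the unrelated column should end up at the bottom of the ranking.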

3. Survival machine learning prediction on serum creatinine and ejection fraction alone.

The authors also investigate whether machine learning can precisely predict patients' survival by using the top two ranked features alone. They therefore elaborated another computational pipeline with an initial phase of feature ranking, followed by a binary classification phase based on the top two features selected.

All the different methods employed for feature ranking identified serum creatinine and ejection fraction as the top two features. So we then performed a survival prediction on these two features by employing three algorithms:

  • Random Forests
  • Gradient Boosting
  • SVM radial.
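The two-feature pipeline can be sketched as follows, assuming scikit-learn and NumPy are installed. The data is again synthetic, and the split ratio, hyperparameters, and feature scaling for the SVM are my own assumptions rather than the paper's settings.

```python
import numpy as np
from sklearn.ensemble import GradientBoostingClassifier, RandomForestClassifier
from sklearn.metrics import matthews_corrcoef
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

rng = np.random.default_rng(0)
n_patients = 300

# Synthetic stand-ins for the two top-ranked features (NOT the real dataset)
serum_creatinine = rng.normal(1.4, 0.5, n_patients)
ejection_fraction = rng.normal(38.0, 10.0, n_patients)
risk = 2.0 * (serum_creatinine - 1.4) - 0.15 * (ejection_fraction - 38.0)
death_event = (risk + rng.normal(0.0, 0.5, n_patients) > 0).astype(int)

X = np.column_stack([serum_creatinine, ejection_fraction])
X_train, X_test, y_train, y_test = train_test_split(
    X, death_event, test_size=0.3, random_state=0, stratify=death_event
)

models = {
    "Random Forests": RandomForestClassifier(n_estimators=200, random_state=0),
    "Gradient Boosting": GradientBoostingClassifier(random_state=0),
    # The radial-kernel SVM is sensitive to feature scale, so standardize first
    "SVM radial": make_pipeline(StandardScaler(), SVC(kernel="rbf")),
}

scores = {}
for name, model in models.items():
    model.fit(X_train, y_train)
    scores[name] = matthews_corrcoef(y_test, model.predict(X_test))
    print(f"{name:18s} MCC = {scores[name]:.2f}")
```

The MCC is used as the score here because it is the main ranking metric in this article; the printed values describe the synthetic data only.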


From the analysis above, we may conclude that serum creatinine and ejection fraction alone play a major role in predicting the survival of heart failure patients.


