Overview

Dataset statistics

Number of variables: 11
Number of observations: 1025
Missing cells: 0
Missing cells (%): 0.0%
Duplicate rows: 302
Duplicate rows (%): 29.5%
Total size in memory: 53.2 KiB
Average record size in memory: 53.1 B

Variable types

Numeric: 4
Categorical: 7

Alerts

Dataset has 302 (29.5%) duplicate rows [Duplicates]
chest_pain_type_non-anginal pain is highly correlated with chest_pain_type_typical angina [High correlation]
chest_pain_type_typical angina is highly correlated with chest_pain_type_non-anginal pain and 1 other field [High correlation]
thalassemia_normal is highly correlated with thalassemia_reversable defect and 1 other field [High correlation]
thalassemia_reversable defect is highly correlated with thalassemia_normal [High correlation]
diagnosis is highly correlated with chest_pain_type_typical angina and 1 other field [High correlation]
diagnosis is highly correlated with thalassemia_normal and 1 other field [High correlation]
age is highly correlated with max_heart_rate_achieved and 1 other field [High correlation]
max_heart_rate_achieved is highly correlated with age and 3 other fields [High correlation]
num_major_vessels is highly correlated with age [High correlation]
chest_pain_type_typical angina is highly correlated with max_heart_rate_achieved and 4 other fields [High correlation]
exercise_induced_angina_yes is highly correlated with max_heart_rate_achieved and 2 other fields [High correlation]
thalassemia_normal is highly correlated with chest_pain_type_typical angina and 2 other fields [High correlation]
thalassemia_reversable defect is highly correlated with thalassemia_normal and 1 other field [High correlation]
diagnosis is highly correlated with max_heart_rate_achieved and 4 other fields [High correlation]
st_depression has 329 (32.1%) zeros [Zeros]

Reproduction

Analysis started: 2022-09-04 02:44:30.893872
Analysis finished: 2022-09-04 02:44:35.966887
Duration: 5.07 seconds
Software version: pandas-profiling v3.2.0
Download configuration: config.json

Variables

age
Real number (ℝ≥0)

HIGH CORRELATION

Distinct: 41
Distinct (%): 4.0%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 54.43414634
Minimum: 29
Maximum: 77
Zeros: 0
Zeros (%): 0.0%
Negative: 0
Negative (%): 0.0%
Memory size: 8.1 KiB
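The first block for each numeric variable is a set of simple counts. Below is a minimal sketch in plain Python. It uses only the standard library, so it runs without pandas. The function name `numeric_overview` and the convention that a missing entry is `None` are assumptions for illustration. Neither is the pandas-profiling API.

```python
import math


def numeric_overview(values):
    """Counts from the first statistics block of a numeric variable.

    Assumes `values` is a list of floats in which a missing entry is None.
    Percentages are taken over all rows (an assumption; with no missing
    values, as in this dataset, the choice makes no difference).
    """
    n = len(values)
    present = [v for v in values if v is not None]
    finite = [v for v in present if math.isfinite(v)]

    def with_pct(k):
        # (count, percentage of all rows)
        return k, 100.0 * k / n

    return {
        "distinct": with_pct(len(set(present))),
        "missing": with_pct(n - len(present)),
        "infinite": with_pct(len(present) - len(finite)),
        "zeros": with_pct(sum(1 for v in finite if v == 0)),
        "negative": with_pct(sum(1 for v in finite if v < 0)),
        "mean": sum(finite) / len(finite),
        "minimum": min(finite),
        "maximum": max(finite),
    }
```

For st_depression, 329 zeros out of 1025 rows give 100 × 329 / 1025 ≈ 32.1%. This is the figure behind the Zeros alert.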

Quantile statistics

Minimum: 29
5-th percentile: 39
Q1: 48
median: 56
Q3: 61
95-th percentile: 68
Maximum: 77
Range: 48
Interquartile range (IQR): 13
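The quantile block can be reproduced with linear interpolation between order statistics. That is the default method in pandas and NumPy. Whether the report uses that default is an assumption here.

Range and IQR are simple differences. For age, the range is 77 - 29 = 48 and the IQR is 61 - 48 = 13. The helper names below are hypothetical.

```python
import math


def quantile(sorted_values, q):
    """Linear-interpolation quantile of an already sorted list (0 <= q <= 1)."""
    pos = (len(sorted_values) - 1) * q
    lo = math.floor(pos)
    hi = min(lo + 1, len(sorted_values) - 1)
    # Interpolate between the two neighbouring order statistics.
    return sorted_values[lo] + (sorted_values[hi] - sorted_values[lo]) * (pos - lo)


def quantile_statistics(values):
    s = sorted(values)
    q1, q3 = quantile(s, 0.25), quantile(s, 0.75)
    return {
        "minimum": s[0],
        "5-th percentile": quantile(s, 0.05),
        "Q1": q1,
        "median": quantile(s, 0.5),
        "Q3": q3,
        "95-th percentile": quantile(s, 0.95),
        "maximum": s[-1],
        "range": s[-1] - s[0],
        "IQR": q3 - q1,
    }
```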

Descriptive statistics

Standard deviation: 9.072290233
Coefficient of variation (CV): 0.1666654268
Kurtosis: -0.5256178129
Mean: 54.43414634
Median Absolute Deviation (MAD): 6
Skewness: -0.2488659017
Sum: 55795
Variance: 82.30645008
Monotonicity: Not monotonic
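The descriptive statistics follow standard definitions. The sketch below uses:

- the sample variance (n - 1 denominator);
- the median of |x - median| for MAD;
- the bias-adjusted skewness and excess-kurtosis estimators that pandas uses for `Series.skew()` and `Series.kurt()`.

That the report calls exactly these estimators is an assumption, and `descriptive_stats` is a hypothetical name.

As a check on the age column, CV = 9.072290233 / 54.43414634 ≈ 0.16667 and variance = 9.072290233² ≈ 82.306. Both match the values above.

```python
import math
import statistics


def descriptive_stats(values):
    """Descriptive statistics for a list of finite numbers.

    Assumes at least four values that are not all equal (the kurtosis
    correction divides by (n - 2)(n - 3), and skewness divides by m2).
    """
    n = len(values)
    mean = statistics.fmean(values)
    var = statistics.variance(values)        # sample variance, n - 1 denominator
    std = math.sqrt(var)
    median = statistics.median(values)

    # Central moments with denominator n, used by the skewness/kurtosis estimators.
    m2 = sum((x - mean) ** 2 for x in values) / n
    m3 = sum((x - mean) ** 3 for x in values) / n
    m4 = sum((x - mean) ** 4 for x in values) / n

    # Adjusted Fisher-Pearson skewness.
    skew = (m3 / m2 ** 1.5) * math.sqrt(n * (n - 1)) / (n - 2)
    # Bias-corrected excess kurtosis (0 for a normal distribution).
    g2 = m4 / m2 ** 2 - 3.0
    kurt = ((n + 1) * g2 + 6) * (n - 1) / ((n - 2) * (n - 3))

    # "Not monotonic" means neither non-decreasing nor non-increasing.
    increasing = all(a <= b for a, b in zip(values, values[1:]))
    decreasing = all(a >= b for a, b in zip(values, values[1:]))

    return {
        "mean": mean,
        "std": std,
        "variance": var,
        "cv": std / mean,                                    # coefficient of variation
        "mad": statistics.median([abs(x - median) for x in values]),
        "skewness": skew,
        "kurtosis": kurt,
        "sum": sum(values),
        "monotonic": increasing or decreasing,
    }
```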
Histogram with fixed size bins (bins=41)
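The bin counts in the histogram captions follow a visible pattern:

- age has 41 distinct values and 41 bins;
- st_depression has 40 distinct values and 40 bins;
- cholesterol and max_heart_rate_achieved have more than 50 distinct values and 50 bins.

This is consistent with using min(50, number of distinct values) equal-width bins. The sketch below implements that rule under this assumption. `fixed_width_histogram` and its `max_bins` parameter are hypothetical names.

```python
import bisect


def fixed_width_histogram(values, max_bins=50):
    """Equal-width histogram with min(max_bins, number of distinct values) bins.

    Assumes at least two distinct values. Like numpy.histogram, each bin is
    half-open [left, right) except the last, which also includes the maximum.
    """
    bins = min(max_bins, len(set(values)))
    lo, hi = min(values), max(values)
    width = (hi - lo) / bins
    # Set the last edge to the exact maximum to avoid floating-point drift.
    edges = [lo + i * width for i in range(bins)] + [hi]
    counts = [0] * bins
    for x in values:
        # Index of the last edge <= x, clamped so the maximum lands in the last bin.
        idx = min(bisect.bisect_right(edges, x) - 1, bins - 1)
        counts[idx] += 1
    return edges, counts
```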
Common values
Value  Count  Frequency (%)
58     68     6.6%
57     57     5.6%
54     53     5.2%
59     46     4.5%
52     43     4.2%
51     39     3.8%
56     39     3.8%
62     37     3.6%
60     37     3.6%
44     36     3.5%
Other values (31)  570  55.6%

Minimum 10 values
Value  Count  Frequency (%)
29     4      0.4%
34     6      0.6%
35     15     1.5%
37     6      0.6%
38     12     1.2%
39     14     1.4%
40     11     1.1%
41     32     3.1%
42     26     2.5%
43     26     2.5%

Maximum 10 values
Value  Count  Frequency (%)
77     3      0.3%
76     3      0.3%
74     3      0.3%
71     11     1.1%
70     14     1.4%
69     9      0.9%
68     12     1.2%
67     31     3.0%
66     25     2.4%
65     27     2.6%
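Each numeric variable lists three value tables:

- the ten most common values, with an "Other values" remainder;
- the ten smallest distinct values;
- the ten largest distinct values.

The sketch below builds them with `collections.Counter`. It assumes frequencies are percentages of all rows; for age, that gives 68 / 1025 ≈ 6.6% for the value 58. `value_tables` is a hypothetical name.

```python
from collections import Counter


def value_tables(values, top=10):
    """Rows of (value, count, frequency %) for the three value tables.

    Returns the most common values (plus an "Other values (k)" row when some
    values are left out), the `top` smallest distinct values and the `top`
    largest distinct values (largest first).
    """
    n = len(values)
    counts = Counter(values)

    def row(value):
        return (value, counts[value], 100.0 * counts[value] / n)

    # Counter.most_common breaks ties by first appearance; other tools may differ.
    common = [row(value) for value, _ in counts.most_common(top)]
    shown = {value for value, _, _ in common}
    rest = [value for value in counts if value not in shown]
    if rest:
        other = sum(counts[value] for value in rest)
        common.append((f"Other values ({len(rest)})", other, 100.0 * other / n))

    distinct = sorted(counts)
    minimum = [row(value) for value in distinct[:top]]
    maximum = [row(value) for value in reversed(distinct[-top:])]
    return common, minimum, maximum
```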

cholesterol
Real number (ℝ≥0)

Distinct: 152
Distinct (%): 14.8%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 246
Minimum: 126
Maximum: 564
Zeros: 0
Zeros (%): 0.0%
Negative: 0
Negative (%): 0.0%
Memory size: 8.1 KiB

Quantile statistics

Minimum: 126
5-th percentile: 175
Q1: 211
median: 240
Q3: 275
95-th percentile: 330
Maximum: 564
Range: 438
Interquartile range (IQR): 64

Descriptive statistics

Standard deviation: 51.59251021
Coefficient of variation (CV): 0.2097256512
Kurtosis: 3.996803049
Mean: 246
Median Absolute Deviation (MAD): 33
Skewness: 1.074072778
Sum: 252150
Variance: 2661.787109
Monotonicity: Not monotonic
Histogram with fixed size bins (bins=50)
Common values
Value  Count  Frequency (%)
204    21     2.0%
234    21     2.0%
197    19     1.9%
212    18     1.8%
254    17     1.7%
269    16     1.6%
177    14     1.4%
240    14     1.4%
282    14     1.4%
239    13     1.3%
Other values (142)  858  83.7%

Minimum 10 values
Value  Count  Frequency (%)
126    3      0.3%
131    3      0.3%
141    3      0.3%
149    8      0.8%
157    4      0.4%
160    3      0.3%
164    3      0.3%
166    4      0.4%
167    4      0.4%
168    3      0.3%

Maximum 10 values
Value  Count  Frequency (%)
564    3      0.3%
417    3      0.3%
409    3      0.3%
407    4      0.4%
394    3      0.3%
360    3      0.3%
354    3      0.3%
353    4      0.4%
342    4      0.4%
341    4      0.4%

max_heart_rate_achieved
Real number (ℝ≥0)

HIGH CORRELATION

Distinct: 91
Distinct (%): 8.9%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 149.1141463
Minimum: 71
Maximum: 202
Zeros: 0
Zeros (%): 0.0%
Negative: 0
Negative (%): 0.0%
Memory size: 8.1 KiB

Quantile statistics

Minimum: 71
5-th percentile: 108
Q1: 132
median: 152
Q3: 166
95-th percentile: 182
Maximum: 202
Range: 131
Interquartile range (IQR): 34

Descriptive statistics

Standard deviation: 23.00572375
Coefficient of variation (CV): 0.1542826372
Kurtosis: -0.08882248803
Mean: 149.1141463
Median Absolute Deviation (MAD): 16
Skewness: -0.5137771771
Sum: 152842
Variance: 529.2633251
Monotonicity: Not monotonic
Histogram with fixed size bins (bins=50)
Common values
Value  Count  Frequency (%)
162    35     3.4%
160    31     3.0%
163    29     2.8%
173    28     2.7%
152    28     2.7%
144    26     2.5%
132    26     2.5%
150    25     2.4%
125    25     2.4%
143    23     2.2%
Other values (81)  749  73.1%

Minimum 10 values
Value  Count  Frequency (%)
71     4      0.4%
88     3      0.3%
90     3      0.3%
95     4      0.4%
96     7      0.7%
97     4      0.4%
99     3      0.3%
103    8      0.8%
105    10     1.0%
106    3      0.3%

Maximum 10 values
Value  Count  Frequency (%)
202    4      0.4%
195    3      0.3%
194    3      0.3%
192    3      0.3%
190    4      0.4%
188    3      0.3%
187    3      0.3%
186    6      0.6%
185    3      0.3%
184    3      0.3%

st_depression
Real number (ℝ≥0)

ZEROS

Distinct: 40
Distinct (%): 3.9%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 1.071512195
Minimum: 0
Maximum: 6.2
Zeros: 329
Zeros (%): 32.1%
Negative: 0
Negative (%): 0.0%
Memory size: 8.1 KiB

Quantile statistics

Minimum: 0
5-th percentile: 0
Q1: 0
median: 0.8
Q3: 1.8
95-th percentile: 3.4
Maximum: 6.2
Range: 6.2
Interquartile range (IQR): 1.8

Descriptive statistics

Standard deviation: 1.175053255
Coefficient of variation (CV): 1.096630781
Kurtosis: 1.314470889
Mean: 1.071512195
Median Absolute Deviation (MAD): 0.8
Skewness: 1.210899388
Sum: 1098.3
Variance: 1.380750152
Monotonicity: Not monotonic
Histogram with fixed size bins (bins=40)
Common values
Value  Count  Frequency (%)
0      329    32.1%
1.2    58     5.7%
1      51     5.0%
0.6    47     4.6%
0.8    44     4.3%
1.4    44     4.3%
1.6    37     3.6%
0.2    37     3.6%
1.8    36     3.5%
2      32     3.1%
Other values (30)  310  30.2%

Minimum 10 values
Value  Count  Frequency (%)
0      329    32.1%
0.1    23     2.2%
0.2    37     3.6%
0.3    10     1.0%
0.4    30     2.9%
0.5    15     1.5%
0.6    47     4.6%
0.7    3      0.3%
0.8    44     4.3%
0.9    10     1.0%

Maximum 10 values
Value  Count  Frequency (%)
6.2    3      0.3%
5.6    4      0.4%
4.4    4      0.4%
4.2    6      0.6%
4      12     1.2%
3.8    4      0.4%
3.6    15     1.5%
3.5    3      0.3%
3.4    10     1.0%
3.2    8      0.8%

num_major_vessels
Categorical

HIGH CORRELATION

Distinct: 5
Distinct (%): 0.5%
Missing: 0
Missing (%): 0.0%
Memory size: 8.1 KiB

Length

Max length: 1
Median length: 1
Mean length: 1
Min length: 1

Characters and Unicode

Total characters: 1025
Distinct characters: 5
Distinct categories: 1
Distinct scripts: 1
Distinct blocks: 1
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.
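For a categorical column stored as strings, the length and character statistics can be derived as below.

Python's `unicodedata` module exposes each character's general category (every digit here is `Nd`, "Decimal Number"). It does not expose a character's script or block, so this sketch covers categories only. The `CATEGORY_NAMES` table is a small hand-written subset, and `text_summary` is a hypothetical name.

"Unique" counts values that occur exactly once. It is 0 for every categorical column in this report.

```python
import statistics
import unicodedata
from collections import Counter

# Long names for a few Unicode general categories (subset, for display only).
CATEGORY_NAMES = {"Nd": "Decimal Number", "Lu": "Uppercase Letter",
                  "Ll": "Lowercase Letter", "Zs": "Space Separator"}


def text_summary(values):
    """Length and character statistics for a list of strings."""
    lengths = [len(v) for v in values]
    chars = Counter(ch for v in values for ch in v)
    # Weight each distinct character's category by how often it occurs.
    categories = Counter()
    for ch, k in chars.items():
        categories[unicodedata.category(ch)] += k
    return {
        "max_length": max(lengths),
        "median_length": statistics.median(lengths),
        "mean_length": statistics.fmean(lengths),
        "min_length": min(lengths),
        "total_characters": sum(chars.values()),
        "distinct_characters": len(chars),
        "categories": {CATEGORY_NAMES.get(c, c): k for c, k in categories.items()},
        "unique_values": sum(1 for k in Counter(values).values() if k == 1),
    }
```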

Unique

Unique: 0
Unique (%): 0.0%

Sample

1st row: 2
2nd row: 0
3rd row: 0
4th row: 1
5th row: 3

Common Values

Value  Count  Frequency (%)
0      578    56.4%
1      226    22.0%
2      134    13.1%
3      69     6.7%
4      18     1.8%

Length

Histogram of lengths of the category

Category Frequency Plot

Value  Count  Frequency (%)
0      578    56.4%
1      226    22.0%
2      134    13.1%
3      69     6.7%
4      18     1.8%

Most occurring characters

Value  Count  Frequency (%)
0      578    56.4%
1      226    22.0%
2      134    13.1%
3      69     6.7%
4      18     1.8%

Most occurring categories

Value           Count  Frequency (%)
Decimal Number  1025   100.0%

Most frequent character per category

Decimal Number
Value  Count  Frequency (%)
0      578    56.4%
1      226    22.0%
2      134    13.1%
3      69     6.7%
4      18     1.8%

Most occurring scripts

Value   Count  Frequency (%)
Common  1025   100.0%

Most frequent character per script

Common
Value  Count  Frequency (%)
0      578    56.4%
1      226    22.0%
2      134    13.1%
3      69     6.7%
4      18     1.8%

Most occurring blocks

Value  Count  Frequency (%)
ASCII  1025   100.0%

Most frequent character per block

ASCII
Value  Count  Frequency (%)
0      578    56.4%
1      226    22.0%
2      134    13.1%
3      69     6.7%
4      18     1.8%

chest_pain_type_non-anginal pain
Categorical

HIGH CORRELATION

Distinct: 2
Distinct (%): 0.2%
Missing: 0
Missing (%): 0.0%
Memory size: 8.1 KiB

Length

Max length: 1
Median length: 1
Mean length: 1
Min length: 1

Characters and Unicode

Total characters: 1025
Distinct characters: 2
Distinct categories: 1
Distinct scripts: 1
Distinct blocks: 1

Unique

Unique: 0
Unique (%): 0.0%

Sample

1st row: 0
2nd row: 0
3rd row: 0
4th row: 0
5th row: 0

Common Values

Value  Count  Frequency (%)
0      741    72.3%
1      284    27.7%

Length

Histogram of lengths of the category

Category Frequency Plot

Value  Count  Frequency (%)
0      741    72.3%
1      284    27.7%

Most occurring characters

Value  Count  Frequency (%)
0      741    72.3%
1      284    27.7%

Most occurring categories

Value           Count  Frequency (%)
Decimal Number  1025   100.0%

Most frequent character per category

Decimal Number
Value  Count  Frequency (%)
0      741    72.3%
1      284    27.7%

Most occurring scripts

Value   Count  Frequency (%)
Common  1025   100.0%

Most frequent character per script

Common
Value  Count  Frequency (%)
0      741    72.3%
1      284    27.7%

Most occurring blocks

Value  Count  Frequency (%)
ASCII  1025   100.0%

Most frequent character per block

ASCII
Value  Count  Frequency (%)
0      741    72.3%
1      284    27.7%

chest_pain_type_typical angina
Categorical

HIGH CORRELATION

Distinct: 2
Distinct (%): 0.2%
Missing: 0
Missing (%): 0.0%
Memory size: 8.1 KiB

Length

Max length: 1
Median length: 1
Mean length: 1
Min length: 1

Characters and Unicode

Total characters: 1025
Distinct characters: 2
Distinct categories: 1
Distinct scripts: 1
Distinct blocks: 1

Unique

Unique: 0
Unique (%): 0.0%

Sample

1st row: 1
2nd row: 1
3rd row: 1
4th row: 1
5th row: 1

Common Values

Value  Count  Frequency (%)
0      528    51.5%
1      497    48.5%

Length

Histogram of lengths of the category

Category Frequency Plot

Value  Count  Frequency (%)
0      528    51.5%
1      497    48.5%

Most occurring characters

Value  Count  Frequency (%)
0      528    51.5%
1      497    48.5%

Most occurring categories

Value           Count  Frequency (%)
Decimal Number  1025   100.0%

Most frequent character per category

Decimal Number
Value  Count  Frequency (%)
0      528    51.5%
1      497    48.5%

Most occurring scripts

Value   Count  Frequency (%)
Common  1025   100.0%

Most frequent character per script

Common
Value  Count  Frequency (%)
0      528    51.5%
1      497    48.5%

Most occurring blocks

Value  Count  Frequency (%)
ASCII  1025   100.0%

Most frequent character per block

ASCII
Value  Count  Frequency (%)
0      528    51.5%
1      497    48.5%

exercise_induced_angina_yes
Categorical

HIGH CORRELATION

Distinct: 2
Distinct (%): 0.2%
Missing: 0
Missing (%): 0.0%
Memory size: 8.1 KiB

Length

Max length: 1
Median length: 1
Mean length: 1
Min length: 1

Characters and Unicode

Total characters: 1025
Distinct characters: 2
Distinct categories: 1
Distinct scripts: 1
Distinct blocks: 1

Unique

Unique: 0
Unique (%): 0.0%

Sample

1st row: 0
2nd row: 1
3rd row: 1
4th row: 0
5th row: 0

Common Values

Value  Count  Frequency (%)
0      680    66.3%
1      345    33.7%

Length

Histogram of lengths of the category

Category Frequency Plot

Value  Count  Frequency (%)
0      680    66.3%
1      345    33.7%

Most occurring characters

Value  Count  Frequency (%)
0      680    66.3%
1      345    33.7%

Most occurring categories

Value           Count  Frequency (%)
Decimal Number  1025   100.0%

Most frequent character per category

Decimal Number
Value  Count  Frequency (%)
0      680    66.3%
1      345    33.7%

Most occurring scripts

Value   Count  Frequency (%)
Common  1025   100.0%

Most frequent character per script

Common
Value  Count  Frequency (%)
0      680    66.3%
1      345    33.7%

Most occurring blocks

Value  Count  Frequency (%)
ASCII  1025   100.0%

Most frequent character per block

ASCII
Value  Count  Frequency (%)
0      680    66.3%
1      345    33.7%

thalassemia_normal
Categorical

HIGH CORRELATION

Distinct: 2
Distinct (%): 0.2%
Missing: 0
Missing (%): 0.0%
Memory size: 8.1 KiB

Length

Max length: 1
Median length: 1
Mean length: 1
Min length: 1

Characters and Unicode

Total characters: 1025
Distinct characters: 2
Distinct categories: 1
Distinct scripts: 1
Distinct blocks: 1

Unique

Unique: 0
Unique (%): 0.0%

Sample

1st row: 0
2nd row: 0
3rd row: 0
4th row: 0
5th row: 1

Common Values

Value  Count  Frequency (%)
1      544    53.1%
0      481    46.9%

Length

Histogram of lengths of the category

Category Frequency Plot

Value  Count  Frequency (%)
1      544    53.1%
0      481    46.9%

Most occurring characters

Value  Count  Frequency (%)
1      544    53.1%
0      481    46.9%

Most occurring categories

Value           Count  Frequency (%)
Decimal Number  1025   100.0%

Most frequent character per category

Decimal Number
Value  Count  Frequency (%)
1      544    53.1%
0      481    46.9%

Most occurring scripts

Value   Count  Frequency (%)
Common  1025   100.0%

Most frequent character per script

Common
Value  Count  Frequency (%)
1      544    53.1%
0      481    46.9%

Most occurring blocks

Value  Count  Frequency (%)
ASCII  1025   100.0%

Most frequent character per block

ASCII
Value  Count  Frequency (%)
1      544    53.1%
0      481    46.9%

thalassemia_reversable defect
Categorical

HIGH CORRELATION

Distinct: 2
Distinct (%): 0.2%
Missing: 0
Missing (%): 0.0%
Memory size: 8.1 KiB

Length

Max length: 1
Median length: 1
Mean length: 1
Min length: 1

Characters and Unicode

Total characters: 1025
Distinct characters: 2
Distinct categories: 1
Distinct scripts: 1
Distinct blocks: 1

Unique

Unique: 0
Unique (%): 0.0%

Sample

1st row: 1
2nd row: 1
3rd row: 1
4th row: 1
5th row: 0

Common Values

Value  Count  Frequency (%)
0      615    60.0%
1      410    40.0%

Length

Histogram of lengths of the category

Category Frequency Plot

Value  Count  Frequency (%)
0      615    60.0%
1      410    40.0%

Most occurring characters

Value  Count  Frequency (%)
0      615    60.0%
1      410    40.0%

Most occurring categories

Value           Count  Frequency (%)
Decimal Number  1025   100.0%

Most frequent character per category

Decimal Number
Value  Count  Frequency (%)
0      615    60.0%
1      410    40.0%

Most occurring scripts

Value   Count  Frequency (%)
Common  1025   100.0%

Most frequent character per script

Common
Value  Count  Frequency (%)
0      615    60.0%
1      410    40.0%

Most occurring blocks

Value  Count  Frequency (%)
ASCII  1025   100.0%

Most frequent character per block

ASCII
Value  Count  Frequency (%)
0      615    60.0%
1      410    40.0%

diagnosis
Categorical

HIGH CORRELATION

Distinct: 2
Distinct (%): 0.2%
Missing: 0
Missing (%): 0.0%
Memory size: 8.1 KiB

Length

Max length: 1
Median length: 1
Mean length: 1
Min length: 1

Characters and Unicode

Total characters: 1025
Distinct characters: 2
Distinct categories: 1
Distinct scripts: 1
Distinct blocks: 1

Unique

Unique: 0
Unique (%): 0.0%

Sample

1st row: 0
2nd row: 0
3rd row: 0
4th row: 0
5th row: 0

Common Values

Value  Count  Frequency (%)
1      526    51.3%
0      499    48.7%

Length

Histogram of lengths of the category

Category Frequency Plot

Value  Count  Frequency (%)
1      526    51.3%
0      499    48.7%

Most occurring characters

Value  Count  Frequency (%)
1      526    51.3%
0      499    48.7%

Most occurring categories

Value           Count  Frequency (%)
Decimal Number  1025   100.0%

Most frequent character per category

Decimal Number
Value  Count  Frequency (%)
1      526    51.3%
0      499    48.7%

Most occurring scripts

Value   Count  Frequency (%)
Common  1025   100.0%

Most frequent character per script

Common
Value  Count  Frequency (%)
1      526    51.3%
0      499    48.7%

Most occurring blocks

Value  Count  Frequency (%)
ASCII  1025   100.0%

Most frequent character per block

ASCII
Value  Count  Frequency (%)
1      526    51.3%
0      499    48.7%

Interactions


Correlations


Spearman's ρ

Spearman's rank correlation coefficient (ρ) measures monotonic correlation between two variables, so it is better than Pearson's r at capturing nonlinear monotonic relationships. Its value lies between -1 and +1: -1 indicates a perfectly negative monotonic relationship, 0 indicates no monotonic correlation, and +1 indicates a perfectly positive monotonic relationship.

To calculate ρ for two variables X and Y, one divides the covariance of the rank variables of X and Y by the product of their standard deviations.

Pearson's r

Pearson's correlation coefficient (r) measures linear correlation between two variables. Its value lies between -1 and +1: -1 indicates a perfectly negative linear relationship, 0 indicates no linear correlation, and +1 indicates a perfectly positive linear relationship. Furthermore, r is invariant under separate changes in location and (positive) scale of the two variables. For an exact linear relationship y = a + bx with b ≠ 0, r is therefore +1 or -1 however steep the line is; the slope affects only the sign.

To calculate r for two variables X and Y, one divides the covariance of X and Y by the product of their standard deviations.
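The same definition can be written out directly. The normalising factor of the covariance and of each standard deviation cancels, so raw sums of deviations are enough. The name `pearson` is illustrative.

```python
import math


def pearson(x, y):
    """Pearson's r: covariance of x and y over the product of their standard deviations."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    # The 1/(n - 1) factors in the covariance and both standard deviations
    # cancel, so plain sums of (products of) deviations suffice.
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sd_x = math.sqrt(sum((a - mean_x) ** 2 for a in x))
    sd_y = math.sqrt(sum((b - mean_y) ** 2 for b in y))
    return cov / (sd_x * sd_y)
```

Shifting y and multiplying it by a positive constant leaves r unchanged. Multiplying by a negative constant flips its sign.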

Kendall's τ

Like Spearman's rank correlation coefficient, the Kendall rank correlation coefficient (τ) measures ordinal association between two variables. Its value lies between -1 and +1: -1 indicates perfect negative association, 0 indicates no association, and +1 indicates perfect positive association.

To calculate τ for two variables X and Y, one counts the concordant and discordant pairs of observations. τ is the number of concordant pairs minus the number of discordant pairs, divided by the total number of pairs, n(n - 1)/2.
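Most columns in this dataset are binary, so tied pairs are common. The formula above (often called τ_a) cannot reach ±1 when ties are present.

A common tie-aware variant, τ_b, keeps the same numerator but replaces the denominator with sqrt((n0 - n1)(n0 - n2)), where:

- n0 = n(n - 1)/2;
- n1 is the number of pairs tied in X;
- n2 is the number of pairs tied in Y.

`scipy.stats.kendalltau` computes τ_b by default. Whether this report uses that variant is an assumption. Below is a direct O(n²) sketch with a hypothetical function name.

```python
from math import sqrt


def kendall_tau_b(x, y):
    """Kendall's tau-b by direct pair counting, O(n^2).

    Without ties this equals (concordant - discordant) / (n(n - 1)/2).
    """
    n = len(x)
    concordant = discordant = ties_x = ties_y = 0
    for i in range(n):
        for j in range(i + 1, n):
            dx = (x[i] > x[j]) - (x[i] < x[j])   # sign of x[i] - x[j]
            dy = (y[i] > y[j]) - (y[i] < y[j])   # sign of y[i] - y[j]
            if dx == 0:
                ties_x += 1
            if dy == 0:
                ties_y += 1
            if dx * dy > 0:
                concordant += 1
            elif dx * dy < 0:
                discordant += 1
    n0 = n * (n - 1) // 2
    return (concordant - discordant) / sqrt((n0 - ties_x) * (n0 - ties_y))
```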

Cramér's V (φc)

Cramér's V is an association measure for nominal random variables. The coefficient ranges from 0 to 1, with 0 indicating independence and 1 indicating perfect association. The empirical estimators used for Cramér's V are known to be biased, even for large samples. We use the bias-corrected measure proposed by Bergsma (2013).
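The bias correction (Bergsma, 2013) works in two steps:

- It subtracts from φ² = χ²/n its approximate expected value under independence, (k - 1)(r - 1)/(n - 1).
- It adjusts the table dimensions r and k in the same spirit.

The sketch below follows those formulas for two lists of category labels. It computes Pearson's χ² without Yates' continuity correction; whether the report applies one to 2x2 tables is not stated here. `cramers_v_corrected` is a hypothetical name.

```python
import math
from collections import Counter


def cramers_v_corrected(x, y):
    """Bias-corrected Cramér's V (Bergsma, 2013) for two equal-length label lists.

    Uses Pearson's chi-squared statistic without a continuity correction.
    """
    n = len(x)
    row_totals, col_totals = Counter(x), Counter(y)
    observed = Counter(zip(x, y))
    r, k = len(row_totals), len(col_totals)
    if r < 2 or k < 2:
        raise ValueError("each variable needs at least two categories")

    # Pearson's chi-squared over every cell of the r x k contingency table.
    chi2 = 0.0
    for a, row_total in row_totals.items():
        for b, col_total in col_totals.items():
            expected = row_total * col_total / n
            chi2 += (observed[(a, b)] - expected) ** 2 / expected

    # Bias corrections for phi^2 and for the table dimensions.
    phi2_corr = max(0.0, chi2 / n - (k - 1) * (r - 1) / (n - 1))
    r_corr = r - (r - 1) ** 2 / (n - 1)
    k_corr = k - (k - 1) ** 2 / (n - 1)
    return math.sqrt(phi2_corr / min(k_corr - 1, r_corr - 1))
```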

Phik (φk)

Phik (φk) is a newer correlation coefficient that works consistently across categorical, ordinal, and interval variables, captures non-linear dependency, and reduces to the Pearson correlation coefficient when the input follows a bivariate normal distribution. Extensive documentation is available for the phik package.

Missing values

A simple visualization of nullity by column.
Nullity matrix is a data-dense display which lets you quickly visually pick out patterns in data completion.

Sample

First rows

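index | age | cholesterol | max_heart_rate_achieved | st_depression | num_major_vessels | chest_pain_type_non-anginal pain | chest_pain_type_typical angina | exercise_induced_angina_yes | thalassemia_normal | thalassemia_reversable defect | diagnosis
0 | 52 | 212 | 168 | 1.0 | 2 | 0 | 1 | 0 | 0 | 1 | 0
1 | 53 | 203 | 155 | 3.1 | 0 | 0 | 1 | 1 | 0 | 1 | 0
2 | 70 | 174 | 125 | 2.6 | 0 | 0 | 1 | 1 | 0 | 1 | 0
3 | 61 | 203 | 161 | 0.0 | 1 | 0 | 1 | 0 | 0 | 1 | 0
4 | 62 | 294 | 106 | 1.9 | 3 | 0 | 1 | 0 | 1 | 0 | 0
5 | 58 | 248 | 122 | 1.0 | 0 | 0 | 1 | 0 | 1 | 0 | 1
6 | 58 | 318 | 140 | 4.4 | 3 | 0 | 1 | 0 | 0 | 0 | 0
7 | 55 | 289 | 145 | 0.8 | 1 | 0 | 1 | 1 | 0 | 1 | 0
8 | 46 | 249 | 144 | 0.8 | 0 | 0 | 1 | 0 | 0 | 1 | 0
9 | 54 | 286 | 116 | 3.2 | 2 | 0 | 1 | 1 | 1 | 0 | 0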

Last rows

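index | age | cholesterol | max_heart_rate_achieved | st_depression | num_major_vessels | chest_pain_type_non-anginal pain | chest_pain_type_typical angina | exercise_induced_angina_yes | thalassemia_normal | thalassemia_reversable defect | diagnosis
1015 | 58 | 216 | 131 | 2.2 | 3 | 0 | 1 | 1 | 0 | 1 | 0
1016 | 65 | 282 | 174 | 1.4 | 1 | 0 | 0 | 0 | 1 | 0 | 0
1017 | 53 | 282 | 95 | 2.0 | 2 | 0 | 1 | 1 | 0 | 1 | 0
1018 | 41 | 172 | 158 | 0.0 | 0 | 0 | 1 | 0 | 0 | 1 | 0
1019 | 47 | 204 | 143 | 0.1 | 0 | 0 | 1 | 0 | 1 | 0 | 1
1020 | 59 | 221 | 164 | 0.0 | 0 | 0 | 0 | 1 | 1 | 0 | 1
1021 | 60 | 258 | 141 | 2.8 | 1 | 0 | 1 | 1 | 0 | 1 | 0
1022 | 47 | 275 | 118 | 1.0 | 1 | 0 | 1 | 1 | 1 | 0 | 0
1023 | 50 | 254 | 159 | 0.0 | 0 | 0 | 1 | 0 | 1 | 0 | 1
1024 | 54 | 188 | 113 | 1.4 | 1 | 0 | 1 | 0 | 0 | 1 | 0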

Duplicate rows

Most frequently occurring

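index | age | cholesterol | max_heart_rate_achieved | st_depression | num_major_vessels | chest_pain_type_non-anginal pain | chest_pain_type_typical angina | exercise_induced_angina_yes | thalassemia_normal | thalassemia_reversable defect | diagnosis | # duplicates
9 | 38 | 175 | 173 | 0.0 | 4 | 1 | 0 | 0 | 1 | 0 | 1 | 8
0 | 29 | 204 | 202 | 0.0 | 0 | 0 | 0 | 0 | 1 | 0 | 1 | 4
3 | 35 | 183 | 182 | 1.4 | 0 | 0 | 1 | 0 | 1 | 0 | 1 | 4
4 | 35 | 192 | 174 | 0.0 | 0 | 0 | 0 | 0 | 1 | 0 | 1 | 4
5 | 35 | 198 | 130 | 1.6 | 0 | 0 | 1 | 1 | 0 | 1 | 0 | 4
10 | 38 | 231 | 182 | 3.8 | 0 | 0 | 0 | 1 | 0 | 1 | 0 | 4
12 | 39 | 219 | 140 | 1.2 | 0 | 0 | 1 | 0 | 0 | 1 | 0 | 4
13 | 39 | 220 | 152 | 0.0 | 0 | 1 | 0 | 0 | 1 | 0 | 1 | 4
15 | 40 | 167 | 114 | 2.0 | 0 | 0 | 1 | 1 | 0 | 1 | 0 | 4
17 | 40 | 223 | 181 | 0.0 | 0 | 0 | 1 | 0 | 0 | 1 | 0 | 4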
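The "# duplicates" column appears to give how many times each identical row occurs (8 for the first row shown). There are two common ways to summarise duplicates:

- the number of distinct rows that occur more than once;
- the number of surplus copies beyond each first occurrence.

The report does not say which of these its 302 figure uses, so the sketch below returns both. `duplicate_summary` is a hypothetical name, and rows are assumed to be tuples of field values.

```python
from collections import Counter


def duplicate_summary(rows, top=10):
    """Summarise exact duplicate records.

    `rows` is a list of hashable records (for example tuples of field values).
    Returns (most_frequent, n_duplicated_rows, n_extra_copies), where
    most_frequent lists (row, occurrences) for rows seen more than once.
    """
    counts = Counter(rows)
    repeated = [(row, c) for row, c in counts.items() if c > 1]
    # sorted() is stable (also with reverse=True), so equal counts keep first-seen order.
    repeated = sorted(repeated, key=lambda pair: pair[1], reverse=True)
    n_extra_copies = sum(c - 1 for _, c in repeated)
    return repeated[:top], len(repeated), n_extra_copies
```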