Overview

Dataset statistics

Number of variables: 9
Number of observations: 4601
Missing cells: 0
Missing cells (%): 0.0%
Duplicate rows: 410
Duplicate rows (%): 8.9%
Total size in memory: 323.6 KiB
Average record size in memory: 72.0 B
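
The two derived figures above follow from the raw counts. A minimal sketch that recomputes them, assuming 1 KiB = 1024 bytes and rounding to one decimal place; the variable names are illustrative and not part of the report:

```python
# Figures copied from the dataset statistics above.
n_rows = 4601
n_duplicate_rows = 410
total_size_kib = 323.6

# Share of duplicate rows, as a percentage with one decimal place.
duplicate_pct = round(100 * n_duplicate_rows / n_rows, 1)

# Average record size in bytes (assumes 1 KiB = 1024 bytes).
avg_record_bytes = round(total_size_kib * 1024 / n_rows, 1)

print(duplicate_pct, avg_record_bytes)
```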

Variable types

Numeric: 8
Categorical: 1

Alerts

Dataset has 410 (8.9%) duplicate rows [Duplicates]
word_freq_000 is highly correlated with char_freq_$ [High correlation]
char_freq_$ is highly correlated with word_freq_000 [High correlation]
word_freq_your is highly correlated with spam [High correlation]
spam is highly correlated with word_freq_your [High correlation]
word_freq_remove has 3794 (82.5%) zeros [Zeros]
word_freq_free has 3360 (73.0%) zeros [Zeros]
word_freq_business has 3638 (79.1%) zeros [Zeros]
word_freq_you has 1374 (29.9%) zeros [Zeros]
word_freq_your has 2178 (47.3%) zeros [Zeros]
word_freq_000 has 3922 (85.2%) zeros [Zeros]
word_freq_hp has 3511 (76.3%) zeros [Zeros]
char_freq_$ has 3201 (69.6%) zeros [Zeros]
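
The zero counts and percentages in these alerts can be reproduced with a simple count. The sketch below is illustrative only: `zero_share` is a hypothetical helper, not a function of the profiling tool, and the toy column is made up.

```python
def zero_share(values):
    """Return (count, percentage) of entries equal to zero."""
    zeros = sum(1 for v in values if v == 0)
    return zeros, round(100 * zeros / len(values), 1)


# Hypothetical toy column, not taken from the dataset.
toy_column = [0.0, 0.0, 0.32, 0.0, 1.5]
print(zero_share(toy_column))
```

Applied to a column with 3794 zeros among 4601 values, the same helper yields the 82.5% reported for word_freq_remove.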

Reproduction

Analysis started: 2022-09-07 20:17:44.501815
Analysis finished: 2022-09-07 20:17:53.775389
Duration: 9.27 seconds
Software version: pandas-profiling v3.2.0
Download configuration: config.json
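
The reported duration is the difference between the two timestamps. A short check using only the standard library; the variable names are illustrative:

```python
from datetime import datetime

# Timestamps copied from the reproduction details above.
started = datetime.fromisoformat("2022-09-07 20:17:44.501815")
finished = datetime.fromisoformat("2022-09-07 20:17:53.775389")

# Elapsed wall-clock time in seconds.
duration_seconds = (finished - started).total_seconds()
print(round(duration_seconds, 2))
```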

Variables

word_freq_remove
Real number (ℝ≥0)

ZEROS

Distinct: 173
Distinct (%): 3.8%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 0.1142077809
Minimum: 0
Maximum: 7.27
Zeros: 3794
Zeros (%): 82.5%
Negative: 0
Negative (%): 0.0%
Memory size: 36.1 KiB

Quantile statistics

Minimum: 0
5-th percentile: 0
Q1: 0
Median: 0
Q3: 0
95-th percentile: 0.74
Maximum: 7.27
Range: 7.27
Interquartile range (IQR): 0

Descriptive statistics

Standard deviation: 0.3914413548
Coefficient of variation (CV): 3.42744909
Kurtosis: 75.41343865
Mean: 0.1142077809
Median Absolute Deviation (MAD): 0
Skewness: 6.765580469
Sum: 525.47
Variance: 0.1532263342
Monotonicity: Not monotonic
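
Several of the descriptive statistics are linked by simple identities: the mean is the sum divided by the number of observations, the coefficient of variation is the standard deviation divided by the mean, and the variance is the squared standard deviation. A minimal sketch checking these against the figures reported for word_freq_remove; the variable names and the tolerance are illustrative choices:

```python
import math

# Figures reported for word_freq_remove.
n = 4601
total = 525.47                     # Sum
std = 0.3914413548                 # Standard deviation
reported_mean = 0.1142077809
reported_cv = 3.42744909
reported_variance = 0.1532263342

mean = total / n                   # mean = sum / n
cv = std / mean                    # CV = standard deviation / mean
variance = std ** 2                # variance = standard deviation squared

# Compare with a loose relative tolerance because the report rounds its output.
for name, computed, reported in [
    ("mean", mean, reported_mean),
    ("cv", cv, reported_cv),
    ("variance", variance, reported_variance),
]:
    print(name, math.isclose(computed, reported, rel_tol=1e-6))
```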
Histogram with fixed size bins (bins=50)

Value               Count  Frequency (%)
0                   3794   82.5%
0.08                30     0.7%
0.05                21     0.5%
0.5                 19     0.4%
0.32                19     0.4%
0.19                18     0.4%
0.25                16     0.3%
0.1                 14     0.3%
0.16                14     0.3%
0.4                 14     0.3%
Other values (163)  642    14.0%

Value               Count  Frequency (%)
0                   3794   82.5%
0.02                4      0.1%
0.03                11     0.2%
0.04                8      0.2%
0.05                21     0.5%
0.06                12     0.3%
0.07                7      0.2%
0.08                30     0.7%
0.09                10     0.2%
0.1                 14     0.3%

Value               Count  Frequency (%)
7.27                2      < 0.1%
5.4                 1      < 0.1%
4.54                1      < 0.1%
4.08                1      < 0.1%
4                   1      < 0.1%
3.27                1      < 0.1%
3.12                2      < 0.1%
3.07                1      < 0.1%
2.98                1      < 0.1%
2.94                2      < 0.1%

word_freq_free
Real number (ℝ≥0)

ZEROS

Distinct: 253
Distinct (%): 5.5%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 0.2488480765
Minimum: 0
Maximum: 20
Zeros: 3360
Zeros (%): 73.0%
Negative: 0
Negative (%): 0.0%
Memory size: 36.1 KiB

Quantile statistics

Minimum: 0
5-th percentile: 0
Q1: 0
Median: 0
Q3: 0.1
95-th percentile: 1.34
Maximum: 20
Range: 20
Interquartile range (IQR): 0.1

Descriptive statistics

Standard deviation: 0.8257917011
Coefficient of variation (CV): 3.31845724
Kurtosis: 196.4249754
Mean: 0.2488480765
Median Absolute Deviation (MAD): 0
Skewness: 10.76359403
Sum: 1144.95
Variance: 0.6819319337
Monotonicity: Not monotonic
Histogram with fixed size bins (bins=50)
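
With fixed size bins, the observed range is split into equally wide intervals; for a range of 0 to 20 and 50 bins, each bin is 0.4 wide. The sketch below illustrates the idea with a hypothetical helper and made-up data. It is not the plotting code used by the report and assumes every value lies between `lo` and `hi`.

```python
def fixed_width_histogram(values, bins, lo, hi):
    """Count values into `bins` equal-width bins spanning [lo, hi]."""
    width = (hi - lo) / bins
    counts = [0] * bins
    for v in values:
        # The maximum value is placed in the last bin rather than a new one.
        index = min(int((v - lo) / width), bins - 1)
        counts[index] += 1
    return width, counts


# Hypothetical toy data on the same 0..20 range as word_freq_free.
toy_values = [0, 0, 0.1, 0.32, 7.69, 20]
width, counts = fixed_width_histogram(toy_values, bins=50, lo=0, hi=20)
print(width, counts[0], counts[19], counts[49])
```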
Value               Count  Frequency (%)
0                   3360   73.0%
0.1                 33     0.7%
0.32                31     0.7%
0.25                24     0.5%
0.23                23     0.5%
0.38                21     0.5%
0.19                19     0.4%
0.14                18     0.4%
0.08                17     0.4%
0.58                17     0.4%
Other values (243)  1038   22.6%

Value               Count  Frequency (%)
0                   3360   73.0%
0.01                2      < 0.1%
0.02                4      0.1%
0.03                4      0.1%
0.04                1      < 0.1%
0.05                9      0.2%
0.06                6      0.1%
0.07                3      0.1%
0.08                17     0.4%
0.09                14     0.3%

Value               Count  Frequency (%)
20                  2      < 0.1%
16.66               1      < 0.1%
10.16               1      < 0.1%
10                  1      < 0.1%
7.69                2      < 0.1%
7.35                2      < 0.1%
6.52                1      < 0.1%
6.45                1      < 0.1%
6.25                2      < 0.1%
6.09                1      < 0.1%

word_freq_business
Real number (ℝ≥0)

ZEROS

Distinct: 197
Distinct (%): 4.3%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 0.1425863943
Minimum: 0
Maximum: 7.14
Zeros: 3638
Zeros (%): 79.1%
Negative: 0
Negative (%): 0.0%
Memory size: 36.1 KiB

Quantile statistics

Minimum: 0
5-th percentile: 0
Q1: 0
Median: 0
Q3: 0
95-th percentile: 0.82
Maximum: 7.14
Range: 7.14
Interquartile range (IQR): 0

Descriptive statistics

Standard deviation: 0.444055329
Coefficient of variation (CV): 3.11428963
Kurtosis: 45.67377543
Mean: 0.1425863943
Median Absolute Deviation (MAD): 0
Skewness: 5.688642099
Sum: 656.04
Variance: 0.1971851352
Monotonicity: Not monotonic
Histogram with fixed size bins (bins=50)

Value               Count  Frequency (%)
0                   3638   79.1%
0.08                27     0.6%
0.32                26     0.6%
0.37                24     0.5%
0.19                20     0.4%
0.1                 19     0.4%
0.2                 18     0.4%
0.17                18     0.4%
0.7                 17     0.4%
0.44                17     0.4%
Other values (187)  777    16.9%

Value               Count  Frequency (%)
0                   3638   79.1%
0.01                2      < 0.1%
0.02                3      0.1%
0.03                5      0.1%
0.04                5      0.1%
0.05                7      0.2%
0.06                8      0.2%
0.07                7      0.2%
0.08                27     0.6%
0.09                14     0.3%

Value               Count  Frequency (%)
7.14                1      < 0.1%
5.12                1      < 0.1%
5.06                1      < 0.1%
4.87                1      < 0.1%
4.81                1      < 0.1%
4.5                 1      < 0.1%
3.88                2      < 0.1%
3.84                1      < 0.1%
3.73                2      < 0.1%
3.57                1      < 0.1%

word_freq_you
Real number (ℝ≥0)

ZEROS

Distinct: 575
Distinct (%): 12.5%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 1.662099544
Minimum: 0
Maximum: 18.75
Zeros: 1374
Zeros (%): 29.9%
Negative: 0
Negative (%): 0.0%
Memory size: 36.1 KiB

Quantile statistics

Minimum: 0
5-th percentile: 0
Q1: 0
Median: 1.31
Q3: 2.64
95-th percentile: 4.76
Maximum: 18.75
Range: 18.75
Interquartile range (IQR): 2.64

Descriptive statistics

Standard deviation: 1.775480665
Coefficient of variation (CV): 1.068215602
Kurtosis: 5.257394368
Mean: 1.662099544
Median Absolute Deviation (MAD): 1.31
Skewness: 1.591674269
Sum: 7647.32
Variance: 3.152331591
Monotonicity: Not monotonic
Histogram with fixed size bins (bins=50)

Value               Count  Frequency (%)
0                   1374   29.9%
1.31                36     0.8%
2                   24     0.5%
2.56                24     0.5%
3.33                23     0.5%
1.29                21     0.5%
3.84                21     0.5%
1.2                 19     0.4%
1.36                18     0.4%
1.85                17     0.4%
Other values (565)  3024   65.7%

Value               Count  Frequency (%)
0                   1374   29.9%
0.01                1      < 0.1%
0.02                2      < 0.1%
0.03                2      < 0.1%
0.05                4      0.1%
0.06                1      < 0.1%
0.07                7      0.2%
0.08                2      < 0.1%
0.09                4      0.1%
0.1                 5      0.1%

Value               Count  Frequency (%)
18.75               1      < 0.1%
14.28               2      < 0.1%
14                  1      < 0.1%
12.5                2      < 0.1%
12.19               1      < 0.1%
11.11               1      < 0.1%
10.63               1      < 0.1%
9.72                1      < 0.1%
9.52                2      < 0.1%
9.09                4      0.1%

word_freq_your
Real number (ℝ≥0)

HIGH CORRELATION
ZEROS

Distinct: 401
Distinct (%): 8.7%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 0.8097609215
Minimum: 0
Maximum: 11.11
Zeros: 2178
Zeros (%): 47.3%
Negative: 0
Negative (%): 0.0%
Memory size: 36.1 KiB

Quantile statistics

Minimum: 0
5-th percentile: 0
Q1: 0
Median: 0.22
Q3: 1.27
95-th percentile: 3.17
Maximum: 11.11
Range: 11.11
Interquartile range (IQR): 1.27

Descriptive statistics

Standard deviation: 1.200809812
Coefficient of variation (CV): 1.482918945
Kurtosis: 9.009506008
Mean: 0.8097609215
Median Absolute Deviation (MAD): 0.22
Skewness: 2.435527176
Sum: 3725.71
Variance: 1.441944204
Monotonicity: Not monotonic
Histogram with fixed size bins (bins=50)

Value               Count  Frequency (%)
0                   2178   47.3%
1.36                22     0.5%
0.42                22     0.5%
0.64                21     0.5%
0.7                 21     0.5%
1.23                20     0.4%
1.16                20     0.4%
1.35                19     0.4%
1.08                18     0.4%
1.25                18     0.4%
Other values (391)  2242   48.7%

Value               Count  Frequency (%)
0                   2178   47.3%
0.01                1      < 0.1%
0.02                4      0.1%
0.03                1      < 0.1%
0.04                3      0.1%
0.05                2      < 0.1%
0.06                6      0.1%
0.07                4      0.1%
0.08                5      0.1%
0.09                5      0.1%

Value               Count  Frequency (%)
11.11               1      < 0.1%
10.71               1      < 0.1%
9.52                1      < 0.1%
9.09                1      < 0.1%
8.69                1      < 0.1%
8                   11     0.2%
7.4                 1      < 0.1%
7.14                2      < 0.1%
6.89                1      < 0.1%
6.66                4      0.1%

word_freq_000
Real number (ℝ≥0)

HIGH CORRELATION
ZEROS

Distinct: 164
Distinct (%): 3.6%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 0.1016452945
Minimum: 0
Maximum: 5.45
Zeros: 3922
Zeros (%): 85.2%
Negative: 0
Negative (%): 0.0%
Memory size: 36.1 KiB

Quantile statistics

Minimum: 0
5-th percentile: 0
Q1: 0
Median: 0
Q3: 0
95-th percentile: 0.73
Maximum: 5.45
Range: 5.45
Interquartile range (IQR): 0

Descriptive statistics

Standard deviation: 0.3502864186
Coefficient of variation (CV): 3.446164628
Kurtosis: 46.80785977
Mean: 0.1016452945
Median Absolute Deviation (MAD): 0
Skewness: 5.713775498
Sum: 467.67
Variance: 0.122700575
Monotonicity: Not monotonic
Histogram with fixed size bins (bins=50)

Value               Count  Frequency (%)
0                   3922   85.2%
0.34                26     0.6%
0.36                19     0.4%
0.08                16     0.3%
0.6                 16     0.3%
0.48                14     0.3%
0.85                14     0.3%
0.09                12     0.3%
0.39                11     0.2%
0.15                11     0.2%
Other values (154)  540    11.7%

Value               Count  Frequency (%)
0                   3922   85.2%
0.01                2      < 0.1%
0.02                1      < 0.1%
0.03                1      < 0.1%
0.04                4      0.1%
0.05                10     0.2%
0.06                6      0.1%
0.07                4      0.1%
0.08                16     0.3%
0.09                12     0.3%

Value               Count  Frequency (%)
5.45                1      < 0.1%
4.76                1      < 0.1%
4.32                1      < 0.1%
4.01                1      < 0.1%
3.62                1      < 0.1%
3.57                1      < 0.1%
3.38                2      < 0.1%
3.17                1      < 0.1%
2.95                1      < 0.1%
2.85                1      < 0.1%

word_freq_hp
Real number (ℝ≥0)

ZEROS

Distinct: 395
Distinct (%): 8.6%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 0.5495044556
Minimum: 0
Maximum: 20.83
Zeros: 3511
Zeros (%): 76.3%
Negative: 0
Negative (%): 0.0%
Memory size: 36.1 KiB

Quantile statistics

Minimum: 0
5-th percentile: 0
Q1: 0
Median: 0
Q3: 0
95-th percentile: 3.06
Maximum: 20.83
Range: 20.83
Interquartile range (IQR): 0

Descriptive statistics

Standard deviation: 1.671349342
Coefficient of variation (CV): 3.041557398
Kurtosis: 43.6036337
Mean: 0.5495044556
Median Absolute Deviation (MAD): 0
Skewness: 5.716843443
Sum: 2528.27
Variance: 2.793408624
Monotonicity: Not monotonic
Histogram with fixed size bins (bins=50)

Value               Count  Frequency (%)
0                   3511   76.3%
0.49                14     0.3%
0.34                10     0.2%
1.58                10     0.2%
0.64                9      0.2%
2.22                9      0.2%
0.9                 9      0.2%
1.78                9      0.2%
2.63                9      0.2%
0.44                8      0.2%
Other values (385)  1003   21.8%

Value               Count  Frequency (%)
0                   3511   76.3%
0.02                3      0.1%
0.03                1      < 0.1%
0.04                3      0.1%
0.05                4      0.1%
0.08                1      < 0.1%
0.09                2      < 0.1%
0.1                 2      < 0.1%
0.11                2      < 0.1%
0.13                4      0.1%

Value               Count  Frequency (%)
20.83               1      < 0.1%
20                  2      < 0.1%
18.18               1      < 0.1%
16.66               6      0.1%
15.38               5      0.1%
14.28               1      < 0.1%
13.93               1      < 0.1%
13.04               3      0.1%
12.88               1      < 0.1%
12.5                2      < 0.1%

char_freq_$
Real number (ℝ≥0)

HIGH CORRELATION
ZEROS

Distinct: 504
Distinct (%): 11.0%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 0.07581069333
Minimum: 0
Maximum: 6.003
Zeros: 3201
Zeros (%): 69.6%
Negative: 0
Negative (%): 0.0%
Memory size: 36.1 KiB

Quantile statistics

Minimum: 0
5-th percentile: 0
Q1: 0
Median: 0
Q3: 0.052
95-th percentile: 0.377
Maximum: 6.003
Range: 6.003
Interquartile range (IQR): 0.052

Descriptive statistics

Standard deviation: 0.2458820113
Coefficient of variation (CV): 3.243368456
Kurtosis: 199.9536916
Mean: 0.07581069333
Median Absolute Deviation (MAD): 0
Skewness: 11.16314105
Sum: 348.805
Variance: 0.0604579635
Monotonicity: Not monotonic
Histogram with fixed size bins (bins=50)

Value               Count  Frequency (%)
0                   3201   69.6%
0.118               16     0.3%
0.061               15     0.3%
0.031               13     0.3%
0.158               12     0.3%
0.014               12     0.3%
0.062               10     0.2%
0.056               9      0.2%
0.107               9      0.2%
0.157               9      0.2%
Other values (494)  1295   28.1%

Value               Count  Frequency (%)
0                   3201   69.6%
0.003               1      < 0.1%
0.004               1      < 0.1%
0.005               4      0.1%
0.006               2      < 0.1%
0.007               2      < 0.1%
0.008               3      0.1%
0.009               3      0.1%
0.01                2      < 0.1%
0.011               4      0.1%

Value               Count  Frequency (%)
6.003               1      < 0.1%
5.3                 2      < 0.1%
4.017               1      < 0.1%
3.305               1      < 0.1%
3.26                1      < 0.1%
3.125               1      < 0.1%
2.33                1      < 0.1%
2.038               1      < 0.1%
1.961               1      < 0.1%
1.785               1      < 0.1%

spam
Categorical

HIGH CORRELATION

Distinct: 2
Distinct (%): < 0.1%
Missing: 0
Missing (%): 0.0%
Memory size: 36.1 KiB
email: 2788
spam: 1813

Length

Max length: 5
Median length: 5
Mean length: 4.605955227
Min length: 4
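
The mean length follows directly from the two category counts, since "email" has 5 characters and "spam" has 4. A minimal sketch of that arithmetic; the dictionary and variable names are illustrative:

```python
# Category counts reported for the spam variable.
counts = {"email": 2788, "spam": 1813}

total_rows = sum(counts.values())
# Every occurrence of a label contributes len(label) characters.
total_characters = sum(len(label) * n for label, n in counts.items())
mean_length = total_characters / total_rows

print(total_rows, total_characters, round(mean_length, 9))
```

The character total computed this way matches the "Total characters" figure in the report.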

Characters and Unicode

Total characters: 21192
Distinct characters: 7
Distinct categories: 1
Distinct scripts: 1
Distinct blocks: 1
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Unique: 0
Unique (%): 0.0%

Sample

1st row: spam
2nd row: spam
3rd row: spam
4th row: spam
5th row: spam

Common Values

Value  Count  Frequency (%)
email  2788   60.6%
spam   1813   39.4%

Length

Histogram of lengths of the category

Category Frequency Plot

Value  Count  Frequency (%)
email  2788   60.6%
spam   1813   39.4%

Most occurring characters

Value  Count  Frequency (%)
m      4601   21.7%
a      4601   21.7%
e      2788   13.2%
i      2788   13.2%
l      2788   13.2%
s      1813   8.6%
p      1813   8.6%

Most occurring categories

Value             Count  Frequency (%)
Lowercase Letter  21192  100.0%

Most frequent character per category

Lowercase Letter
Value  Count  Frequency (%)
m      4601   21.7%
a      4601   21.7%
e      2788   13.2%
i      2788   13.2%
l      2788   13.2%
s      1813   8.6%
p      1813   8.6%

Most occurring scripts

Value  Count  Frequency (%)
Latin  21192  100.0%

Most frequent character per script

Latin
Value  Count  Frequency (%)
m      4601   21.7%
a      4601   21.7%
e      2788   13.2%
i      2788   13.2%
l      2788   13.2%
s      1813   8.6%
p      1813   8.6%

Most occurring blocks

Value  Count  Frequency (%)
ASCII  21192  100.0%

Most frequent character per block

ASCII
Value  Count  Frequency (%)
m      4601   21.7%
a      4601   21.7%
e      2788   13.2%
i      2788   13.2%
l      2788   13.2%
s      1813   8.6%
p      1813   8.6%

Interactions

[Figures: 64 interaction plots]

Correlations


Spearman's ρ

Spearman's rank correlation coefficient (ρ) is a measure of monotonic correlation between two variables, and is therefore better at catching nonlinear monotonic correlations than Pearson's r. Its value lies between -1 and +1, with -1 indicating total negative monotonic correlation, 0 indicating no monotonic correlation, and +1 indicating total positive monotonic correlation.

To calculate ρ for two variables X and Y, one divides the covariance of the rank variables of X and Y by the product of their standard deviations.
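
The calculation described above can be sketched in plain Python. This is an illustration only, not the code the report uses: `average_ranks`, `pearson` and `spearman_rho` are hypothetical helper names, ties are assumed to receive the mean of their rank positions, and the toy data is made up.

```python
import math


def average_ranks(values):
    """Rank values 1..n; tied values share the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    start = 0
    while start < len(order):
        end = start
        # Extend the run while the next sorted value ties with the current one.
        while end + 1 < len(order) and values[order[end + 1]] == values[order[start]]:
            end += 1
        shared_rank = (start + end) / 2 + 1
        for position in range(start, end + 1):
            ranks[order[position]] = shared_rank
        start = end + 1
    return ranks


def pearson(x, y):
    """Covariance divided by the product of the standard deviations."""
    mean_x, mean_y = sum(x) / len(x), sum(y) / len(y)
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    # The 1/n factors of the covariance and variances cancel, so sums suffice.
    return cov / math.sqrt(var_x * var_y)


def spearman_rho(x, y):
    """Pearson correlation of the rank variables."""
    return pearson(average_ranks(x), average_ranks(y))


# Hypothetical data: y grows monotonically but not linearly with x.
x = [0.0, 0.32, 1.5, 4.0]
y = [v ** 3 for v in x]
print(spearman_rho(x, y), pearson(x, y))
```

On this toy data the rank-based coefficient reaches its maximum while the linear one does not, which is the difference between the two measures that the paragraph describes.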

Pearson's r

Pearson's correlation coefficient (r) is a measure of linear correlation between two variables. Its value lies between -1 and +1, with -1 indicating total negative linear correlation, 0 indicating no linear correlation, and +1 indicating total positive linear correlation. Furthermore, r is invariant under separate changes in location and (positive) scale of the two variables, implying that for a linear function the angle to the x-axis does not affect r.

To calculate r for two variables X and Y, one divides the covariance of X and Y by the product of their standard deviations.
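
A minimal sketch of this formula in plain Python, using population (divide-by-n) covariance and standard deviations; `pearson_r` is a hypothetical helper and the data is made up. The second call illustrates the invariance under a change of location and positive scale.

```python
import math


def pearson_r(x, y):
    """Covariance of x and y divided by the product of their standard deviations."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y)) / n
    std_x = math.sqrt(sum((a - mean_x) ** 2 for a in x) / n)
    std_y = math.sqrt(sum((b - mean_y) ** 2 for b in y) / n)
    return cov / (std_x * std_y)


# Hypothetical toy data.
x = [0.0, 1.0, 2.0, 3.0, 4.0]
y = [1.0, 3.0, 2.0, 5.0, 4.0]

r = pearson_r(x, y)
# Shifting and rescaling x by a positive factor leaves r unchanged.
r_rescaled = pearson_r([10 * a + 7 for a in x], y)
print(r, r_rescaled)
```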

Kendall's τ

Like Spearman's rank correlation coefficient, the Kendall rank correlation coefficient (τ) measures ordinal association between two variables. Its value lies between -1 and +1, with -1 indicating total negative correlation, 0 indicating no correlation, and +1 indicating total positive correlation.

To calculate τ for two variables X and Y, one determines the number of concordant and discordant pairs of observations. τ is given by the number of concordant pairs minus the number of discordant pairs, divided by the total number of pairs.
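
The pair-counting definition above corresponds to the variant usually called tau-a. The sketch below implements exactly that definition; `kendall_tau_a` is a hypothetical helper and the data is made up. Statistical libraries often report tie-adjusted variants instead, so their output can differ when the data contains ties.

```python
from itertools import combinations


def kendall_tau_a(x, y):
    """(concordant - discordant) / total number of pairs."""
    concordant = discordant = 0
    pairs = list(combinations(range(len(x)), 2))
    for i, j in pairs:
        # A pair is concordant when x and y move in the same direction.
        direction = (x[i] - x[j]) * (y[i] - y[j])
        if direction > 0:
            concordant += 1
        elif direction < 0:
            discordant += 1
        # Tied pairs (direction == 0) count only towards the total.
    return (concordant - discordant) / len(pairs)


# Hypothetical toy data: one of the six pairs is out of order.
tau = kendall_tau_a([1, 2, 3, 4], [1, 3, 2, 4])
print(tau)
```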

Phik (φk)

Phik (φk) is a correlation coefficient that works consistently between categorical, ordinal and interval variables, captures non-linear dependency and reverts to the Pearson correlation coefficient in the case of a bivariate normal input distribution. Extensive documentation is available from the Phik project.

Missing values

A simple visualization of nullity by column.
Nullity matrix is a data-dense display which lets you quickly visually pick out patterns in data completion.

Sample

First rows

      word_freq_remove  word_freq_free  word_freq_business  word_freq_you  word_freq_your  word_freq_000  word_freq_hp  char_freq_$   spam
   0              0.00            0.32                0.00           1.93            0.96           0.00           0.0        0.000   spam
   1              0.21            0.14                0.07           3.47            1.59           0.43           0.0        0.180   spam
   2              0.19            0.06                0.06           1.36            0.51           1.16           0.0        0.184   spam
   3              0.31            0.31                0.00           3.18            0.31           0.00           0.0        0.000   spam
   4              0.31            0.31                0.00           3.18            0.31           0.00           0.0        0.000   spam
   5              0.00            0.00                0.00           0.00            0.00           0.00           0.0        0.000   spam
   6              0.00            0.96                0.00           3.85            0.64           0.00           0.0        0.054   spam
   7              0.00            0.00                0.00           0.00            0.00           0.00           0.0        0.000   spam
   8              0.30            0.00                0.00           1.23            2.00           0.00           0.0        0.203   spam
   9              0.38            0.00                0.00           1.67            0.71           0.19           0.0        0.081   spam

Last rows

      word_freq_remove  word_freq_free  word_freq_business  word_freq_you  word_freq_your  word_freq_000  word_freq_hp  char_freq_$   spam
4591               0.0             0.0                 0.0           6.89            0.00            0.0           0.0          0.0  email
4592               0.0             0.0                 0.0           0.62            0.00            0.0           0.0          0.0  email
4593               0.0             0.0                 0.0           0.00            0.00            0.0           0.0          0.0  email
4594               0.0             0.0                 0.0           6.45            0.00            0.0           0.0          0.0  email
4595               0.0             0.0                 0.0           3.57            1.19            0.0           0.0          0.0  email
4596               0.0             0.0                 0.0           0.62            0.00            0.0           0.0          0.0  email
4597               0.0             0.0                 0.0           6.00            2.00            0.0           0.0          0.0  email
4598               0.0             0.0                 0.0           1.50            0.30            0.0           0.0          0.0  email
4599               0.0             0.0                 0.0           1.93            0.32            0.0           0.0          0.0  email
4600               0.0             0.0                 0.0           4.60            0.65            0.0           0.0          0.0  email

Duplicate rows

Most frequently occurring

      word_freq_remove  word_freq_free  word_freq_business  word_freq_you  word_freq_your  word_freq_000  word_freq_hp  char_freq_$   spam  # duplicates
   0               0.0             0.0                 0.0           0.00             0.0            0.0          0.00        0.000  email           692
   1               0.0             0.0                 0.0           0.00             0.0            0.0          0.00        0.000   spam            55
 176               0.0             0.0                 0.0           2.00             8.0            0.0          0.00        0.000  email            11
 221               0.0             0.0                 0.0           3.84             0.0            0.0          0.00        0.000  email             8
 269               0.0             0.0                 0.7           1.40             1.4            0.0          0.00        0.000  email             8
 382               0.5             0.1                 0.0           1.31             0.7            0.6          0.00        0.158   spam             8
 195               0.0             0.0                 0.0           2.56             0.0            0.0          0.00        0.000  email             7
 228               0.0             0.0                 0.0           4.34             0.0            0.0          0.00        0.000  email             7
 239               0.0             0.0                 0.0           5.55             0.0            0.0          0.00        0.000  email             7
  58               0.0             0.0                 0.0           0.00             0.0            0.0          9.52        0.000  email             6
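
A table like the one above can be derived by counting how often each complete row occurs and keeping the rows seen more than once. The sketch below shows the idea with the standard library; `duplicate_groups` is a hypothetical helper, the rows are made up, and the profiling tool's own procedure may differ.

```python
from collections import Counter


def duplicate_groups(rows):
    """Return (row, count) for rows occurring more than once, most frequent first."""
    counts = Counter(tuple(row) for row in rows)
    repeated = [(row, n) for row, n in counts.items() if n > 1]
    return sorted(repeated, key=lambda item: -item[1])


# Hypothetical two-column rows (a frequency value and the class label).
toy_rows = [
    (0.0, "email"),
    (0.0, "email"),
    (0.0, "spam"),
    (2.0, "email"),
    (0.0, "email"),
    (0.0, "spam"),
]
print(duplicate_groups(toy_rows))
```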