Dataset statistics
Number of variables | 9 |
---|---|
Number of observations | 4601 |
Missing cells | 0 |
Missing cells (%) | 0.0% |
Duplicate rows | 410 |
Duplicate rows (%) | 8.9% |
Total size in memory | 323.6 KiB |
Average record size in memory | 72.0 B |
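The percentage and the average record size in this table follow directly from the raw counts. A minimal arithmetic check, using only the figures reported above (the variable names are illustrative, not part of the report):

```python
# Figures copied from the "Dataset statistics" table.
n_observations = 4601
n_duplicate_rows = 410
total_size_kib = 323.6

# Share of rows flagged as duplicates, as a percentage.
duplicate_pct = 100 * n_duplicate_rows / n_observations

# Average bytes per record: total size converted from KiB to bytes,
# then divided by the number of rows.
avg_record_bytes = total_size_kib * 1024 / n_observations

print(f"Duplicate rows: {duplicate_pct:.1f}%")
print(f"Average record size: {avg_record_bytes:.1f} B")
```

Rounded to one decimal place, both results agree with the reported 8.9% and 72.0 B.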
Variable types
Numeric | 8 |
---|---|
Categorical | 1 |
Dataset has 410 (8.9%) duplicate rows | Duplicates |
word_freq_000 is highly correlated with char_freq_$ | High correlation |
char_freq_$ is highly correlated with word_freq_000 | High correlation |
word_freq_your is highly correlated with spam | High correlation |
spam is highly correlated with word_freq_your | High correlation |
word_freq_remove has 3794 (82.5%) zeros | Zeros |
word_freq_free has 3360 (73.0%) zeros | Zeros |
word_freq_business has 3638 (79.1%) zeros | Zeros |
word_freq_you has 1374 (29.9%) zeros | Zeros |
word_freq_your has 2178 (47.3%) zeros | Zeros |
word_freq_000 has 3922 (85.2%) zeros | Zeros |
word_freq_hp has 3511 (76.3%) zeros | Zeros |
char_freq_$ has 3201 (69.6%) zeros | Zeros |
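The zero percentages in these alerts are the zero counts divided by the 4601 observations. A small sketch that recomputes them from the counts listed above (the helper name `zero_share` is hypothetical):

```python
N_OBSERVATIONS = 4601  # number of rows reported in the dataset statistics

# Zero counts copied from the alerts above.
zero_counts = {
    "word_freq_remove": 3794,
    "word_freq_free": 3360,
    "word_freq_business": 3638,
    "word_freq_you": 1374,
    "word_freq_your": 2178,
    "word_freq_000": 3922,
    "word_freq_hp": 3511,
    "char_freq_$": 3201,
}


def zero_share(count, total=N_OBSERVATIONS):
    """Return the percentage of rows whose value is exactly zero."""
    return 100 * count / total


for name, count in zero_counts.items():
    print(f"{name}: {zero_share(count):.1f}% zeros")
```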
Reproduction
Analysis started | 2022-09-07 20:17:44.501815 |
---|---|
Analysis finished | 2022-09-07 20:17:53.775389 |
Duration | 9.27 seconds |
Software version | pandas-profiling v3.2.0 |
word_freq_remove
Distinct | 173 |
---|---|
Distinct (%) | 3.8% |
Missing | 0 |
Missing (%) | 0.0% |
Infinite | 0 |
Infinite (%) | 0.0% |
Mean | 0.1142077809 |
Minimum | 0 |
---|---|
Maximum | 7.27 |
Zeros | 3794 |
Zeros (%) | 82.5% |
Negative | 0 |
Negative (%) | 0.0% |
Memory size | 36.1 KiB |
Quantile statistics
Minimum | 0 |
---|---|
5-th percentile | 0 |
Q1 | 0 |
median | 0 |
Q3 | 0 |
95-th percentile | 0.74 |
Maximum | 7.27 |
Range | 7.27 |
Interquartile range (IQR) | 0 |
Descriptive statistics
Standard deviation | 0.3914413548 |
---|---|
Coefficient of variation (CV) | 3.42744909 |
Kurtosis | 75.41343865 |
Mean | 0.1142077809 |
Median Absolute Deviation (MAD) | 0 |
Skewness | 6.765580469 |
Sum | 525.47 |
Variance | 0.1532263342 |
Monotonicity | Not monotonic |
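The descriptive statistics above are linked by simple identities: the mean is the sum divided by the number of observations, the variance is the square of the standard deviation, and the coefficient of variation is the standard deviation divided by the mean. A short consistency check, using only the figures reported above and the dataset's 4601 observations (the variable names are illustrative):

```python
# Figures copied from the tables above.
n = 4601                  # number of observations
total = 525.47            # Sum
mean = 0.1142077809       # Mean
std = 0.3914413548        # Standard deviation
minimum, maximum = 0, 7.27
q1, q3 = 0, 0

# Quantities that can be re-derived from the figures above.
derived = {
    "mean": total / n,            # sum / number of observations
    "variance": std ** 2,         # square of the standard deviation
    "cv": std / mean,             # coefficient of variation
    "range": maximum - minimum,   # max - min
    "iqr": q3 - q1,               # Q3 - Q1
}

for name, value in derived.items():
    print(f"{name}: {value:.6g}")
```

The derived values match the reported mean, variance, CV, range and IQR up to rounding.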
Histogram with fixed size bins (bins=50)
Value | Count | Frequency (%) |
0 | 3794 | |
0.08 | 30 | 0.7% |
0.05 | 21 | 0.5% |
0.5 | 19 | 0.4% |
0.32 | 19 | 0.4% |
0.19 | 18 | 0.4% |
0.25 | 16 | 0.3% |
0.1 | 14 | 0.3% |
0.16 | 14 | 0.3% |
0.4 | 14 | 0.3% |
Other values (163) | 642 | 14.0% |
Value | Count | Frequency (%) |
0 | 3794 | |
0.02 | 4 | 0.1% |
0.03 | 11 | 0.2% |
0.04 | 8 | 0.2% |
0.05 | 21 | 0.5% |
0.06 | 12 | 0.3% |
0.07 | 7 | 0.2% |
0.08 | 30 | 0.7% |
0.09 | 10 | 0.2% |
0.1 | 14 | 0.3% |
Value | Count | Frequency (%) |
7.27 | 2 | |
5.4 | 1 | |
4.54 | 1 | |
4.08 | 1 | |
4 | 1 | |
3.27 | 1 | |
3.12 | 2 | |
3.07 | 1 | |
2.98 | 1 | |
2.94 | 2 |
word_freq_free
Distinct | 253 |
---|---|
Distinct (%) | 5.5% |
Missing | 0 |
Missing (%) | 0.0% |
Infinite | 0 |
Infinite (%) | 0.0% |
Mean | 0.2488480765 |
Minimum | 0 |
---|---|
Maximum | 20 |
Zeros | 3360 |
Zeros (%) | 73.0% |
Negative | 0 |
Negative (%) | 0.0% |
Memory size | 36.1 KiB |
Quantile statistics
Minimum | 0 |
---|---|
5-th percentile | 0 |
Q1 | 0 |
median | 0 |
Q3 | 0.1 |
95-th percentile | 1.34 |
Maximum | 20 |
Range | 20 |
Interquartile range (IQR) | 0.1 |
Descriptive statistics
Standard deviation | 0.8257917011 |
---|---|
Coefficient of variation (CV) | 3.31845724 |
Kurtosis | 196.4249754 |
Mean | 0.2488480765 |
Median Absolute Deviation (MAD) | 0 |
Skewness | 10.76359403 |
Sum | 1144.95 |
Variance | 0.6819319337 |
Monotonicity | Not monotonic |
Histogram with fixed size bins (bins=50)
Value | Count | Frequency (%) |
0 | 3360 | |
0.1 | 33 | 0.7% |
0.32 | 31 | 0.7% |
0.25 | 24 | 0.5% |
0.23 | 23 | 0.5% |
0.38 | 21 | 0.5% |
0.19 | 19 | 0.4% |
0.14 | 18 | 0.4% |
0.08 | 17 | 0.4% |
0.58 | 17 | 0.4% |
Other values (243) | 1038 | 22.6% |
Value | Count | Frequency (%) |
0 | 3360 | |
0.01 | 2 | < 0.1% |
0.02 | 4 | 0.1% |
0.03 | 4 | 0.1% |
0.04 | 1 | < 0.1% |
0.05 | 9 | 0.2% |
0.06 | 6 | 0.1% |
0.07 | 3 | 0.1% |
0.08 | 17 | 0.4% |
0.09 | 14 | 0.3% |
Value | Count | Frequency (%) |
20 | 2 | |
16.66 | 1 | |
10.16 | 1 | |
10 | 1 | |
7.69 | 2 | |
7.35 | 2 | |
6.52 | 1 | |
6.45 | 1 | |
6.25 | 2 | |
6.09 | 1 |
word_freq_business
Distinct | 197 |
---|---|
Distinct (%) | 4.3% |
Missing | 0 |
Missing (%) | 0.0% |
Infinite | 0 |
Infinite (%) | 0.0% |
Mean | 0.1425863943 |
Minimum | 0 |
---|---|
Maximum | 7.14 |
Zeros | 3638 |
Zeros (%) | 79.1% |
Negative | 0 |
Negative (%) | 0.0% |
Memory size | 36.1 KiB |
Quantile statistics
Minimum | 0 |
---|---|
5-th percentile | 0 |
Q1 | 0 |
median | 0 |
Q3 | 0 |
95-th percentile | 0.82 |
Maximum | 7.14 |
Range | 7.14 |
Interquartile range (IQR) | 0 |
Descriptive statistics
Standard deviation | 0.444055329 |
---|---|
Coefficient of variation (CV) | 3.11428963 |
Kurtosis | 45.67377543 |
Mean | 0.1425863943 |
Median Absolute Deviation (MAD) | 0 |
Skewness | 5.688642099 |
Sum | 656.04 |
Variance | 0.1971851352 |
Monotonicity | Not monotonic |
Histogram with fixed size bins (bins=50)
Value | Count | Frequency (%) |
0 | 3638 | |
0.08 | 27 | 0.6% |
0.32 | 26 | 0.6% |
0.37 | 24 | 0.5% |
0.19 | 20 | 0.4% |
0.1 | 19 | 0.4% |
0.2 | 18 | 0.4% |
0.17 | 18 | 0.4% |
0.7 | 17 | 0.4% |
0.44 | 17 | 0.4% |
Other values (187) | 777 | 16.9% |
Value | Count | Frequency (%) |
0 | 3638 | |
0.01 | 2 | < 0.1% |
0.02 | 3 | 0.1% |
0.03 | 5 | 0.1% |
0.04 | 5 | 0.1% |
0.05 | 7 | 0.2% |
0.06 | 8 | 0.2% |
0.07 | 7 | 0.2% |
0.08 | 27 | 0.6% |
0.09 | 14 | 0.3% |
Value | Count | Frequency (%) |
7.14 | 1 | |
5.12 | 1 | |
5.06 | 1 | |
4.87 | 1 | |
4.81 | 1 | |
4.5 | 1 | |
3.88 | 2 | |
3.84 | 1 | |
3.73 | 2 | |
3.57 | 1 |
word_freq_you
Distinct | 575 |
---|---|
Distinct (%) | 12.5% |
Missing | 0 |
Missing (%) | 0.0% |
Infinite | 0 |
Infinite (%) | 0.0% |
Mean | 1.662099544 |
Minimum | 0 |
---|---|
Maximum | 18.75 |
Zeros | 1374 |
Zeros (%) | 29.9% |
Negative | 0 |
Negative (%) | 0.0% |
Memory size | 36.1 KiB |
Quantile statistics
Minimum | 0 |
---|---|
5-th percentile | 0 |
Q1 | 0 |
median | 1.31 |
Q3 | 2.64 |
95-th percentile | 4.76 |
Maximum | 18.75 |
Range | 18.75 |
Interquartile range (IQR) | 2.64 |
Descriptive statistics
Standard deviation | 1.775480665 |
---|---|
Coefficient of variation (CV) | 1.068215602 |
Kurtosis | 5.257394368 |
Mean | 1.662099544 |
Median Absolute Deviation (MAD) | 1.31 |
Skewness | 1.591674269 |
Sum | 7647.32 |
Variance | 3.152331591 |
Monotonicity | Not monotonic |
Histogram with fixed size bins (bins=50)
Value | Count | Frequency (%) |
0 | 1374 | |
1.31 | 36 | 0.8% |
2 | 24 | 0.5% |
2.56 | 24 | 0.5% |
3.33 | 23 | 0.5% |
1.29 | 21 | 0.5% |
3.84 | 21 | 0.5% |
1.2 | 19 | 0.4% |
1.36 | 18 | 0.4% |
1.85 | 17 | 0.4% |
Other values (565) | 3024 |
Value | Count | Frequency (%) |
0 | 1374 | |
0.01 | 1 | < 0.1% |
0.02 | 2 | < 0.1% |
0.03 | 2 | < 0.1% |
0.05 | 4 | 0.1% |
0.06 | 1 | < 0.1% |
0.07 | 7 | 0.2% |
0.08 | 2 | < 0.1% |
0.09 | 4 | 0.1% |
0.1 | 5 | 0.1% |
Value | Count | Frequency (%) |
18.75 | 1 | < 0.1% |
14.28 | 2 | |
14 | 1 | < 0.1% |
12.5 | 2 | |
12.19 | 1 | < 0.1% |
11.11 | 1 | < 0.1% |
10.63 | 1 | < 0.1% |
9.72 | 1 | < 0.1% |
9.52 | 2 | |
9.09 | 4 |
word_freq_your
Distinct | 401 |
---|---|
Distinct (%) | 8.7% |
Missing | 0 |
Missing (%) | 0.0% |
Infinite | 0 |
Infinite (%) | 0.0% |
Mean | 0.8097609215 |
Minimum | 0 |
---|---|
Maximum | 11.11 |
Zeros | 2178 |
Zeros (%) | 47.3% |
Negative | 0 |
Negative (%) | 0.0% |
Memory size | 36.1 KiB |
Quantile statistics
Minimum | 0 |
---|---|
5-th percentile | 0 |
Q1 | 0 |
median | 0.22 |
Q3 | 1.27 |
95-th percentile | 3.17 |
Maximum | 11.11 |
Range | 11.11 |
Interquartile range (IQR) | 1.27 |
Descriptive statistics
Standard deviation | 1.200809812 |
---|---|
Coefficient of variation (CV) | 1.482918945 |
Kurtosis | 9.009506008 |
Mean | 0.8097609215 |
Median Absolute Deviation (MAD) | 0.22 |
Skewness | 2.435527176 |
Sum | 3725.71 |
Variance | 1.441944204 |
Monotonicity | Not monotonic |
Histogram with fixed size bins (bins=50)
Value | Count | Frequency (%) |
0 | 2178 | |
1.36 | 22 | 0.5% |
0.42 | 22 | 0.5% |
0.64 | 21 | 0.5% |
0.7 | 21 | 0.5% |
1.23 | 20 | 0.4% |
1.16 | 20 | 0.4% |
1.35 | 19 | 0.4% |
1.08 | 18 | 0.4% |
1.25 | 18 | 0.4% |
Other values (391) | 2242 |
Value | Count | Frequency (%) |
0 | 2178 | |
0.01 | 1 | < 0.1% |
0.02 | 4 | 0.1% |
0.03 | 1 | < 0.1% |
0.04 | 3 | 0.1% |
0.05 | 2 | < 0.1% |
0.06 | 6 | 0.1% |
0.07 | 4 | 0.1% |
0.08 | 5 | 0.1% |
0.09 | 5 | 0.1% |
Value | Count | Frequency (%) |
11.11 | 1 | < 0.1% |
10.71 | 1 | < 0.1% |
9.52 | 1 | < 0.1% |
9.09 | 1 | < 0.1% |
8.69 | 1 | < 0.1% |
8 | 11 | |
7.4 | 1 | < 0.1% |
7.14 | 2 | < 0.1% |
6.89 | 1 | < 0.1% |
6.66 | 4 | 0.1% |
word_freq_000
Distinct | 164 |
---|---|
Distinct (%) | 3.6% |
Missing | 0 |
Missing (%) | 0.0% |
Infinite | 0 |
Infinite (%) | 0.0% |
Mean | 0.1016452945 |
Minimum | 0 |
---|---|
Maximum | 5.45 |
Zeros | 3922 |
Zeros (%) | 85.2% |
Negative | 0 |
Negative (%) | 0.0% |
Memory size | 36.1 KiB |
Quantile statistics
Minimum | 0 |
---|---|
5-th percentile | 0 |
Q1 | 0 |
median | 0 |
Q3 | 0 |
95-th percentile | 0.73 |
Maximum | 5.45 |
Range | 5.45 |
Interquartile range (IQR) | 0 |
Descriptive statistics
Standard deviation | 0.3502864186 |
---|---|
Coefficient of variation (CV) | 3.446164628 |
Kurtosis | 46.80785977 |
Mean | 0.1016452945 |
Median Absolute Deviation (MAD) | 0 |
Skewness | 5.713775498 |
Sum | 467.67 |
Variance | 0.122700575 |
Monotonicity | Not monotonic |
Histogram with fixed size bins (bins=50)
Value | Count | Frequency (%) |
0 | 3922 | |
0.34 | 26 | 0.6% |
0.36 | 19 | 0.4% |
0.08 | 16 | 0.3% |
0.6 | 16 | 0.3% |
0.48 | 14 | 0.3% |
0.85 | 14 | 0.3% |
0.09 | 12 | 0.3% |
0.39 | 11 | 0.2% |
0.15 | 11 | 0.2% |
Other values (154) | 540 | 11.7% |
Value | Count | Frequency (%) |
0 | 3922 | |
0.01 | 2 | < 0.1% |
0.02 | 1 | < 0.1% |
0.03 | 1 | < 0.1% |
0.04 | 4 | 0.1% |
0.05 | 10 | 0.2% |
0.06 | 6 | 0.1% |
0.07 | 4 | 0.1% |
0.08 | 16 | 0.3% |
0.09 | 12 | 0.3% |
Value | Count | Frequency (%) |
5.45 | 1 | |
4.76 | 1 | |
4.32 | 1 | |
4.01 | 1 | |
3.62 | 1 | |
3.57 | 1 | |
3.38 | 2 | |
3.17 | 1 | |
2.95 | 1 | |
2.85 | 1 |
word_freq_hp
Distinct | 395 |
---|---|
Distinct (%) | 8.6% |
Missing | 0 |
Missing (%) | 0.0% |
Infinite | 0 |
Infinite (%) | 0.0% |
Mean | 0.5495044556 |
Minimum | 0 |
---|---|
Maximum | 20.83 |
Zeros | 3511 |
Zeros (%) | 76.3% |
Negative | 0 |
Negative (%) | 0.0% |
Memory size | 36.1 KiB |
Quantile statistics
Minimum | 0 |
---|---|
5-th percentile | 0 |
Q1 | 0 |
median | 0 |
Q3 | 0 |
95-th percentile | 3.06 |
Maximum | 20.83 |
Range | 20.83 |
Interquartile range (IQR) | 0 |
Descriptive statistics
Standard deviation | 1.671349342 |
---|---|
Coefficient of variation (CV) | 3.041557398 |
Kurtosis | 43.6036337 |
Mean | 0.5495044556 |
Median Absolute Deviation (MAD) | 0 |
Skewness | 5.716843443 |
Sum | 2528.27 |
Variance | 2.793408624 |
Monotonicity | Not monotonic |
Histogram with fixed size bins (bins=50)
Value | Count | Frequency (%) |
0 | 3511 | |
0.49 | 14 | 0.3% |
0.34 | 10 | 0.2% |
1.58 | 10 | 0.2% |
0.64 | 9 | 0.2% |
2.22 | 9 | 0.2% |
0.9 | 9 | 0.2% |
1.78 | 9 | 0.2% |
2.63 | 9 | 0.2% |
0.44 | 8 | 0.2% |
Other values (385) | 1003 | 21.8% |
Value | Count | Frequency (%) |
0 | 3511 | |
0.02 | 3 | 0.1% |
0.03 | 1 | < 0.1% |
0.04 | 3 | 0.1% |
0.05 | 4 | 0.1% |
0.08 | 1 | < 0.1% |
0.09 | 2 | < 0.1% |
0.1 | 2 | < 0.1% |
0.11 | 2 | < 0.1% |
0.13 | 4 | 0.1% |
Value | Count | Frequency (%) |
20.83 | 1 | < 0.1% |
20 | 2 | < 0.1% |
18.18 | 1 | < 0.1% |
16.66 | 6 | |
15.38 | 5 | |
14.28 | 1 | < 0.1% |
13.93 | 1 | < 0.1% |
13.04 | 3 | |
12.88 | 1 | < 0.1% |
12.5 | 2 | < 0.1% |
char_freq_$
Distinct | 504 |
---|---|
Distinct (%) | 11.0% |
Missing | 0 |
Missing (%) | 0.0% |
Infinite | 0 |
Infinite (%) | 0.0% |
Mean | 0.07581069333 |
Minimum | 0 |
---|---|
Maximum | 6.003 |
Zeros | 3201 |
Zeros (%) | 69.6% |
Negative | 0 |
Negative (%) | 0.0% |
Memory size | 36.1 KiB |
Quantile statistics
Minimum | 0 |
---|---|
5-th percentile | 0 |
Q1 | 0 |
median | 0 |
Q3 | 0.052 |
95-th percentile | 0.377 |
Maximum | 6.003 |
Range | 6.003 |
Interquartile range (IQR) | 0.052 |
Descriptive statistics
Standard deviation | 0.2458820113 |
---|---|
Coefficient of variation (CV) | 3.243368456 |
Kurtosis | 199.9536916 |
Mean | 0.07581069333 |
Median Absolute Deviation (MAD) | 0 |
Skewness | 11.16314105 |
Sum | 348.805 |
Variance | 0.0604579635 |
Monotonicity | Not monotonic |
Histogram with fixed size bins (bins=50)
Value | Count | Frequency (%) |
0 | 3201 | |
0.118 | 16 | 0.3% |
0.061 | 15 | 0.3% |
0.031 | 13 | 0.3% |
0.158 | 12 | 0.3% |
0.014 | 12 | 0.3% |
0.062 | 10 | 0.2% |
0.056 | 9 | 0.2% |
0.107 | 9 | 0.2% |
0.157 | 9 | 0.2% |
Other values (494) | 1295 |
Value | Count | Frequency (%) |
0 | 3201 | |
0.003 | 1 | < 0.1% |
0.004 | 1 | < 0.1% |
0.005 | 4 | 0.1% |
0.006 | 2 | < 0.1% |
0.007 | 2 | < 0.1% |
0.008 | 3 | 0.1% |
0.009 | 3 | 0.1% |
0.01 | 2 | < 0.1% |
0.011 | 4 | 0.1% |
Value | Count | Frequency (%) |
6.003 | 1 | |
5.3 | 2 | |
4.017 | 1 | |
3.305 | 1 | |
3.26 | 1 | |
3.125 | 1 | |
2.33 | 1 | |
2.038 | 1 | |
1.961 | 1 | |
1.785 | 1 |
spam
Distinct | 2 |
---|---|
Distinct (%) | < 0.1% |
Missing | 0 |
Missing (%) | 0.0% |
Memory size | 36.1 KiB |
spam |
---|
Length
Max length | 5 |
---|---|
Median length | 5 |
Mean length | 4.605955227 |
Min length | 4 |
Characters and Unicode
Total characters | 21192 |
---|---|
Distinct characters | 7 |
Distinct categories | 1 |
Distinct scripts | 1 |
Distinct blocks | 1 |
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.
Unique
Unique | 0 |
---|---|
Unique (%) | 0.0% |
Sample
1st row | spam |
---|---|
2nd row | spam |
3rd row | spam |
4th row | spam |
5th row | spam |
Common Values
Value | Count | Frequency (%) |
2788 | ||
spam | 1813 |
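The length figures can be cross-checked against these category counts. With two categories, a maximum length of 5 and a minimum length of 4, the 2788-row category must be 5 characters long and "spam" (1813 rows) 4 characters long. A minimal sketch of that check (the dictionary layout is illustrative):

```python
# Category row counts keyed by label length in characters.
# Counts come from the table above; the lengths are inferred from the
# reported max (5) and min (4) lengths, "spam" being the 4-character label.
rows_by_label_length = {5: 2788, 4: 1813}

total_rows = sum(rows_by_label_length.values())
total_characters = sum(
    length * rows for length, rows in rows_by_label_length.items()
)
mean_length = total_characters / total_rows

print(total_rows, total_characters, round(mean_length, 9))
```

The totals reproduce the reported 4601 observations, 21192 total characters and mean length of 4.605955227.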
Length
Histogram of lengths of the category
Category Frequency Plot
Most occurring characters
Value | Count | Frequency (%) |
m | 4601 | |
a | 4601 | |
e | 2788 | |
i | 2788 | |
l | 2788 | |
s | 1813 | 8.6% |
p | 1813 | 8.6% |
Most occurring categories
Value | Count | Frequency (%) |
Lowercase Letter | 21192 |
Most frequent character per category
Lowercase Letter
Value | Count | Frequency (%) |
m | 4601 | |
a | 4601 | |
e | 2788 | |
i | 2788 | |
l | 2788 | |
s | 1813 | 8.6% |
p | 1813 | 8.6% |
Most occurring scripts
Value | Count | Frequency (%) |
Latin | 21192 |
Most frequent character per script
Latin
Value | Count | Frequency (%) |
m | 4601 | |
a | 4601 | |
e | 2788 | |
i | 2788 | |
l | 2788 | |
s | 1813 | 8.6% |
p | 1813 | 8.6% |
Most occurring blocks
Value | Count | Frequency (%) |
ASCII | 21192 |
Most frequent character per block
ASCII
Value | Count | Frequency (%) |
m | 4601 | |
a | 4601 | |
e | 2788 | |
i | 2788 | |
l | 2788 | |
s | 1813 | 8.6% |
p | 1813 | 8.6% |
Spearman's ρ
Spearman's rank correlation coefficient (ρ) measures monotonic correlation between two variables and is therefore better at catching nonlinear monotonic correlations than Pearson's r. Its value lies between -1 and +1: -1 indicates total negative monotonic correlation, 0 indicates no monotonic correlation, and +1 indicates total positive monotonic correlation. To calculate ρ for two variables X and Y, one divides the covariance of the rank variables of X and Y by the product of their standard deviations.
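The calculation described above can be sketched in plain Python: rank both variables, then apply the Pearson formula to the ranks. This is an illustrative sketch, not the report's implementation; the helper names and the toy data are assumptions, and tied values receive the average of their ranks.

```python
from statistics import mean


def average_ranks(values):
    """Rank values from 1..n, giving tied values the mean of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j to cover the whole run of tied values.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        shared = (i + j) / 2 + 1  # mean of the 1-based positions i+1..j+1
        for k in range(i, j + 1):
            ranks[order[k]] = shared
        i = j + 1
    return ranks


def pearson(x, y):
    """Covariance of x and y divided by the product of their spreads."""
    mx, my = mean(x), mean(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = sum((a - mx) ** 2 for a in x) ** 0.5
    sy = sum((b - my) ** 2 for b in y) ** 0.5
    return cov / (sx * sy)


def spearman(x, y):
    """Spearman's rho: the Pearson correlation of the rank variables."""
    return pearson(average_ranks(x), average_ranks(y))


x = [1, 2, 3, 4, 5]
y = [v ** 3 for v in x]  # monotonic but nonlinear toy data
print(spearman(x, y), pearson(x, y))
```

On this toy data ρ is 1 up to floating-point error, because the relationship is perfectly monotonic, while Pearson's r stays below 1 because the relationship is not linear.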
Pearson's r
Pearson's correlation coefficient (r) is a measure of linear correlation between two variables. Its value lies between -1 and +1: -1 indicates total negative linear correlation, 0 indicates no linear correlation, and +1 indicates total positive linear correlation. Furthermore, r is invariant under separate changes in location and scale of the two variables, implying that for a linear function the angle to the x-axis does not affect r. To calculate r for two variables X and Y, one divides the covariance of X and Y by the product of their standard deviations.
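A minimal sketch of this formula, together with a check of the location and scale invariance mentioned above. The sample values are the word_freq_you and word_freq_your columns of the ten rows shown under "First rows", so the resulting r describes only that small sample, not the full dataset; the shift and scale constants are arbitrary assumptions, and the scale factors are kept positive because a negative factor would flip the sign of r.

```python
from statistics import mean


def pearson(x, y):
    """Covariance of x and y divided by the product of their spreads."""
    mx, my = mean(x), mean(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = sum((a - mx) ** 2 for a in x) ** 0.5
    sy = sum((b - my) ** 2 for b in y) ** 0.5
    return cov / (sx * sy)


# word_freq_you and word_freq_your from the ten "First rows" records.
you = [1.93, 3.47, 1.36, 3.18, 3.18, 0.00, 3.85, 0.00, 1.23, 1.67]
your = [0.96, 1.59, 0.51, 0.31, 0.31, 0.00, 0.64, 0.00, 2.00, 0.71]

r = pearson(you, your)

# Shift and rescale each variable separately (positive scale factors).
r_transformed = pearson([2 * v + 3 for v in you], [5 * v - 1 for v in your])

print(r, r_transformed)
```

Both calls return the same value up to floating-point error, which is the invariance property in action.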
Kendall's τ
Similarly to Spearman's rank correlation coefficient, the Kendall rank correlation coefficient (τ) measures ordinal association between two variables. Its value lies between -1 and +1: -1 indicates total negative correlation, 0 indicates no correlation, and +1 indicates total positive correlation. To calculate τ for two variables X and Y, one determines the number of concordant and discordant pairs of observations. τ is the number of concordant pairs minus the number of discordant pairs, divided by the total number of pairs.
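The pair-counting definition above can be sketched directly. This follows the formula exactly as stated in the text (often called τ-a); library implementations commonly use a tie-adjusted variant, so results on data with many ties may differ. The function name and toy data are illustrative.

```python
from itertools import combinations


def kendall_tau_a(x, y):
    """(concordant - discordant) / total number of pairs."""
    concordant = discordant = 0
    for (x1, y1), (x2, y2) in combinations(zip(x, y), 2):
        direction = (x1 - x2) * (y1 - y2)
        if direction > 0:
            concordant += 1   # both variables move the same way
        elif direction < 0:
            discordant += 1   # the variables move in opposite ways
        # direction == 0 means a tie, which counts as neither
    n_pairs = len(x) * (len(x) - 1) // 2
    return (concordant - discordant) / n_pairs


x = [1, 2, 3, 4]
y = [1, 3, 2, 4]  # one swapped pair out of six
print(kendall_tau_a(x, y))
```

Here five of the six pairs are concordant and one is discordant, so τ = (5 - 1) / 6.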
Phik (φk)
Phik (φk) is a new and practical correlation coefficient that works consistently between categorical, ordinal and interval variables, captures non-linear dependency and reverts to the Pearson correlation coefficient in case of a bivariate normal input distribution.
A simple visualization of nullity by column.
The nullity matrix is a data-dense display that lets you quickly pick out patterns in data completion.
First rows
 | word_freq_remove | word_freq_free | word_freq_business | word_freq_you | word_freq_your | word_freq_000 | word_freq_hp | char_freq_$ | spam |
---|---|---|---|---|---|---|---|---|---|
0 | 0.00 | 0.32 | 0.00 | 1.93 | 0.96 | 0.00 | 0.0 | 0.000 | spam |
1 | 0.21 | 0.14 | 0.07 | 3.47 | 1.59 | 0.43 | 0.0 | 0.180 | spam |
2 | 0.19 | 0.06 | 0.06 | 1.36 | 0.51 | 1.16 | 0.0 | 0.184 | spam |
3 | 0.31 | 0.31 | 0.00 | 3.18 | 0.31 | 0.00 | 0.0 | 0.000 | spam |
4 | 0.31 | 0.31 | 0.00 | 3.18 | 0.31 | 0.00 | 0.0 | 0.000 | spam |
5 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 | 0.0 | 0.000 | spam |
6 | 0.00 | 0.96 | 0.00 | 3.85 | 0.64 | 0.00 | 0.0 | 0.054 | spam |
7 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 | 0.0 | 0.000 | spam |
8 | 0.30 | 0.00 | 0.00 | 1.23 | 2.00 | 0.00 | 0.0 | 0.203 | spam |
9 | 0.38 | 0.00 | 0.00 | 1.67 | 0.71 | 0.19 | 0.0 | 0.081 | spam |
Last rows
 | word_freq_remove | word_freq_free | word_freq_business | word_freq_you | word_freq_your | word_freq_000 | word_freq_hp | char_freq_$ | spam |
---|---|---|---|---|---|---|---|---|---|
4591 | 0.0 | 0.0 | 0.0 | 6.89 | 0.00 | 0.0 | 0.0 | 0.0 | |
4592 | 0.0 | 0.0 | 0.0 | 0.62 | 0.00 | 0.0 | 0.0 | 0.0 | |
4593 | 0.0 | 0.0 | 0.0 | 0.00 | 0.00 | 0.0 | 0.0 | 0.0 | |
4594 | 0.0 | 0.0 | 0.0 | 6.45 | 0.00 | 0.0 | 0.0 | 0.0 | |
4595 | 0.0 | 0.0 | 0.0 | 3.57 | 1.19 | 0.0 | 0.0 | 0.0 | |
4596 | 0.0 | 0.0 | 0.0 | 0.62 | 0.00 | 0.0 | 0.0 | 0.0 | |
4597 | 0.0 | 0.0 | 0.0 | 6.00 | 2.00 | 0.0 | 0.0 | 0.0 | |
4598 | 0.0 | 0.0 | 0.0 | 1.50 | 0.30 | 0.0 | 0.0 | 0.0 | |
4599 | 0.0 | 0.0 | 0.0 | 1.93 | 0.32 | 0.0 | 0.0 | 0.0 | |
4600 | 0.0 | 0.0 | 0.0 | 4.60 | 0.65 | 0.0 | 0.0 | 0.0 |
Most frequently occurring
 | word_freq_remove | word_freq_free | word_freq_business | word_freq_you | word_freq_your | word_freq_000 | word_freq_hp | char_freq_$ | spam | # duplicates |
---|---|---|---|---|---|---|---|---|---|---|
0 | 0.0 | 0.0 | 0.0 | 0.00 | 0.0 | 0.0 | 0.00 | 0.000 | | 692 |
1 | 0.0 | 0.0 | 0.0 | 0.00 | 0.0 | 0.0 | 0.00 | 0.000 | spam | 55 |
176 | 0.0 | 0.0 | 0.0 | 2.00 | 8.0 | 0.0 | 0.00 | 0.000 | | 11 |
221 | 0.0 | 0.0 | 0.0 | 3.84 | 0.0 | 0.0 | 0.00 | 0.000 | | 8 |
269 | 0.0 | 0.0 | 0.7 | 1.40 | 1.4 | 0.0 | 0.00 | 0.000 | | 8 |
382 | 0.5 | 0.1 | 0.0 | 1.31 | 0.7 | 0.6 | 0.00 | 0.158 | spam | 8 |
195 | 0.0 | 0.0 | 0.0 | 2.56 | 0.0 | 0.0 | 0.00 | 0.000 | | 7 |
228 | 0.0 | 0.0 | 0.0 | 4.34 | 0.0 | 0.0 | 0.00 | 0.000 | | 7 |
239 | 0.0 | 0.0 | 0.0 | 5.55 | 0.0 | 0.0 | 0.00 | 0.000 | | 7 |
58 | 0.0 | 0.0 | 0.0 | 0.00 | 0.0 | 0.0 | 9.52 | 0.000 | | 6 |
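A duplicate table of this kind can be built by grouping identical rows and counting each group. The sketch below is an assumption about the general approach, not a description of pandas-profiling internals. It uses the ten records from the "First rows" table as sample data, with the row index left out so that only the column values are compared.

```python
from collections import Counter

# The ten "First rows" records as tuples of their nine column values.
rows = [
    (0.00, 0.32, 0.00, 1.93, 0.96, 0.00, 0.0, 0.000, "spam"),
    (0.21, 0.14, 0.07, 3.47, 1.59, 0.43, 0.0, 0.180, "spam"),
    (0.19, 0.06, 0.06, 1.36, 0.51, 1.16, 0.0, 0.184, "spam"),
    (0.31, 0.31, 0.00, 3.18, 0.31, 0.00, 0.0, 0.000, "spam"),
    (0.31, 0.31, 0.00, 3.18, 0.31, 0.00, 0.0, 0.000, "spam"),
    (0.00, 0.00, 0.00, 0.00, 0.00, 0.00, 0.0, 0.000, "spam"),
    (0.00, 0.96, 0.00, 3.85, 0.64, 0.00, 0.0, 0.054, "spam"),
    (0.00, 0.00, 0.00, 0.00, 0.00, 0.00, 0.0, 0.000, "spam"),
    (0.30, 0.00, 0.00, 1.23, 2.00, 0.00, 0.0, 0.203, "spam"),
    (0.38, 0.00, 0.00, 1.67, 0.71, 0.19, 0.0, 0.081, "spam"),
]

# Count how often each distinct row occurs.
counts = Counter(rows)

# Distinct row patterns that occur more than once, most frequent first.
duplicated_groups = [(row, n) for row, n in counts.most_common() if n > 1]

# Rows that repeat an earlier row (every occurrence after the first).
repeated_rows = sum(n - 1 for n in counts.values())

print(len(duplicated_groups), repeated_rows)
```

In this sample two row patterns occur twice each: the all-zero row and the row shared by indices 3 and 4. Whether a report counts duplicated groups or repeated rows is a design choice, and the two numbers can differ on larger data.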