Dataset statistics
| Number of variables | 13 |
|---|---|
| Number of observations | 6099 |
| Missing cells | 0 |
| Missing cells (%) | 0.0% |
| Duplicate rows | 0 |
| Duplicate rows (%) | 0.0% |
| Total size in memory | 619.6 KiB |
| Average record size in memory | 104.0 B |
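The overview counts above can be recomputed from the raw rows. Below is a minimal standard-library sketch. It assumes rows are stored as equal-length tuples and that `None` marks a missing cell. `dataset_overview` is a hypothetical helper, not part of pandas-profiling, and the rows are toy data.

```python
from collections import Counter


def dataset_overview(rows):
    """Overview counts for a table stored as a list of equal-length tuples.

    ASSUMPTION: None marks a missing cell.
    """
    n_obs = len(rows)
    n_vars = len(rows[0]) if rows else 0
    total_cells = n_obs * n_vars
    missing = sum(cell is None for row in rows for cell in row)
    # Every repeat of an identical row after its first occurrence is a duplicate.
    duplicates = sum(count - 1 for count in Counter(rows).values())
    return {
        "variables": n_vars,
        "observations": n_obs,
        "missing_cells": missing,
        "missing_cells_pct": 100.0 * missing / total_cells if total_cells else 0.0,
        "duplicate_rows": duplicates,
        "duplicate_rows_pct": 100.0 * duplicates / n_obs if n_obs else 0.0,
    }


# Toy rows (hypothetical), not taken from this dataset.
rows = [
    ("2004-03-10", 18, 13.6),
    ("2004-03-10", 19, 13.3),
    ("2004-03-10", 19, 13.3),  # exact repeat of the previous row
    ("2004-03-10", 20, None),  # one missing cell
]
print(dataset_overview(rows))
```

For this dataset the same logic would report 0 missing cells and 0 duplicate rows, as in the table.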
Variable types
| DateTime | 2 |
|---|---|
| Categorical | 2 |
| Numeric | 9 |
| PT08.S1(CO) is highly correlated with PT08.S2(NMHC) and 3 other fields | High correlation |
|---|---|
| PT08.S2(NMHC) is highly correlated with PT08.S1(CO) and 3 other fields | High correlation |
| PT08.S3(NOx) is highly correlated with PT08.S1(CO) and 2 other fields | High correlation |
| PT08.S4(NO2) is highly correlated with PT08.S1(CO) and 3 other fields | High correlation |
| PT08.S5(O3) is highly correlated with PT08.S1(CO) and 2 other fields | High correlation |
| T is highly correlated with PT08.S4(NO2) and 2 other fields | High correlation |
| RH is highly correlated with T | High correlation |
| AH is highly correlated with PT08.S4(NO2) and 1 other field | High correlation |
| PT08.S1(CO) is highly correlated with PT08.S2(NMHC) and 3 other fields | High correlation |
| PT08.S2(NMHC) is highly correlated with PT08.S1(CO) and 3 other fields | High correlation |
| PT08.S3(NOx) is highly correlated with PT08.S1(CO) and 2 other fields | High correlation |
| PT08.S4(NO2) is highly correlated with PT08.S1(CO) and 4 other fields | High correlation |
| PT08.S5(O3) is highly correlated with PT08.S1(CO) and 3 other fields | High correlation |
| T is highly correlated with PT08.S4(NO2) and 2 other fields | High correlation |
| RH is highly correlated with T | High correlation |
| AH is highly correlated with PT08.S4(NO2) and 1 other field | High correlation |
| PT08.S1(CO) is highly correlated with PT08.S2(NMHC) and 2 other fields | High correlation |
| PT08.S2(NMHC) is highly correlated with PT08.S1(CO) and 3 other fields | High correlation |
| PT08.S3(NOx) is highly correlated with PT08.S1(CO) and 2 other fields | High correlation |
| PT08.S4(NO2) is highly correlated with PT08.S2(NMHC) and 1 other field | High correlation |
| PT08.S5(O3) is highly correlated with PT08.S1(CO) and 2 other fields | High correlation |
| AH is highly correlated with PT08.S4(NO2) | High correlation |
| Month is highly correlated with PT08.S4(NO2) and 2 other fields | High correlation |
| Hour is highly correlated with PT08.S2(NMHC) and 1 other field | High correlation |
| PT08.S1(CO) is highly correlated with PT08.S2(NMHC) and 3 other fields | High correlation |
| PT08.S2(NMHC) is highly correlated with Hour and 4 other fields | High correlation |
| PT08.S3(NOx) is highly correlated with PT08.S1(CO) and 3 other fields | High correlation |
| PT08.S4(NO2) is highly correlated with Month and 6 other fields | High correlation |
| PT08.S5(O3) is highly correlated with PT08.S1(CO) and 3 other fields | High correlation |
| T is highly correlated with Month and 3 other fields | High correlation |
| RH is highly correlated with Hour and 1 other field | High correlation |
| AH is highly correlated with Month and 2 other fields | High correlation |
| DateTime has unique values | Unique |
| Hour has 287 (4.7%) zeros | Zeros |
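Alerts of the form "X is highly correlated with Y and N other fields" can be produced by thresholding each pairwise coefficient. The sketch below is hypothetical. The 0.9 cut-off, the use of the absolute value, the table-order naming of partners, and the helper name `high_correlation_alerts` are all assumptions for illustration, not pandas-profiling's internals.

```python
# Hypothetical sketch of threshold-based "High correlation" alerts.
# ASSUMPTIONS: |coefficient| >= 0.9 counts as "high", and partners are named
# in table-column order; neither is taken from pandas-profiling's source.
THRESHOLD = 0.9


def high_correlation_alerts(columns, corr, threshold=THRESHOLD):
    """columns: names in table order; corr: {(a, b): coefficient} per pair."""
    partners = {name: [] for name in columns}
    for (a, b), value in corr.items():
        if abs(value) >= threshold:
            partners[a].append(b)
            partners[b].append(a)

    alerts = []
    for name in columns:
        others = sorted(partners[name], key=columns.index)
        if not others:
            continue
        message = f"{name} is highly correlated with {others[0]}"
        rest = len(others) - 1
        if rest:
            # Singular "field" when exactly one other partner remains.
            message += f" and {rest} other field{'s' if rest > 1 else ''}"
        alerts.append(message)
    return alerts


# Toy coefficients (hypothetical), not the values behind the table above.
columns = ["T", "RH", "AH"]
corr = {("T", "RH"): -0.92, ("T", "AH"): 0.95, ("RH", "AH"): 0.10}
for line in high_correlation_alerts(columns, corr):
    print(line)
```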
Reproduction
| Analysis started | 2022-09-01 23:47:45.574097 |
|---|---|
| Analysis finished | 2022-09-01 23:47:55.809870 |
| Duration | 10.24 seconds |
| Software version | pandas-profiling v3.2.0 |
Date
Date
| Distinct | 341 |
|---|---|
| Distinct (%) | 5.6% |
| Missing | 0 |
| Missing (%) | 0.0% |
| Memory size | 47.8 KiB |
| Minimum | 2004-03-10 00:00:00 |
|---|---|
| Maximum | 2005-04-04 00:00:00 |
Histogram with fixed size bins (bins=50)
Month
Categorical
| Distinct | 12 |
|---|---|
| Distinct (%) | 0.2% |
| Missing | 0 |
| Missing (%) | 0.0% |
| Memory size | 47.8 KiB |
| March | 1054 |
|---|---|
| May | 543 |
| June | 537 |
| July | 511 |
| January | 501 |
| Other values (7) | 2953 |
Length
| Max length | 9 |
|---|---|
| Median length | 7 |
| Mean length | 5.916871618 |
| Min length | 3 |
Characters and Unicode
| Total characters | 36087 |
|---|---|
| Distinct characters | 26 |
| Distinct categories | 2 |
| Distinct scripts | 1 |
| Distinct blocks | 1 |
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.
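Counting characters and grouping them by Unicode General Category is enough to reproduce the totals and the uppercase/lowercase split reported for this column. Below is a minimal sketch using Python's standard `unicodedata` module. It assumes the column is a list of strings, and `character_profile` is a hypothetical helper. The standard library does not expose the Unicode script or block properties, so those tables are not covered here.

```python
import unicodedata
from collections import Counter


def character_profile(values):
    """Character totals and Unicode General Category counts for a text column."""
    chars = Counter(ch for value in values for ch in value)
    categories = Counter()
    for ch, count in chars.items():
        # e.g. "Lu" = Uppercase Letter, "Ll" = Lowercase Letter
        categories[unicodedata.category(ch)] += count
    total = sum(chars.values())
    return {
        "total_characters": total,
        "distinct_characters": len(chars),
        "mean_length": total / len(values),
        "most_common": chars.most_common(3),
        "categories": dict(categories),
    }


print(character_profile(["March", "May", "June"]))
```

The mean length is simply total characters divided by observations. For Month, 36087 / 6099 ≈ 5.9169, which matches the Length table.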
Unique
| Unique | 0 |
|---|---|
| Unique (%) | 0.0% |
Sample
| 1st row | March |
|---|---|
| 2nd row | March |
| 3rd row | March |
| 4th row | March |
| 5th row | March |
Common Values
| Value | Count | Frequency (%) |
|---|---|---|
| March | 1054 | 17.3% |
| May | 543 | 8.9% |
| June | 537 | 8.8% |
| July | 511 | 8.4% |
| January | 501 | 8.2% |
| April | 484 | 7.9% |
| February | 482 | 7.9% |
| November | 459 | 7.5% |
| December | 417 | 6.8% |
| September | 401 | 6.6% |
| Other values (2) | 710 | 11.6% |
Length
Histogram of lengths of the category
| Value | Count | Frequency (%) |
|---|---|---|
| march | 1054 | 17.3% |
| may | 543 | 8.9% |
| june | 537 | 8.8% |
| july | 511 | 8.4% |
| january | 501 | 8.2% |
| april | 484 | 7.9% |
| february | 482 | 7.9% |
| november | 459 | 7.5% |
| december | 417 | 6.8% |
| september | 401 | 6.6% |
| Other values (2) | 710 | 11.6% |
Most occurring characters
| Value | Count | Frequency (%) |
|---|---|---|
| e | 4727 | 13.1% |
| r | 4616 | 12.8% |
| a | 3081 | 8.5% |
| u | 2779 | 7.7% |
| b | 2095 | 5.8% |
| y | 2037 | 5.6% |
| c | 1807 | 5.0% |
| M | 1597 | 4.4% |
| J | 1549 | 4.3% |
| m | 1277 | 3.5% |
| Other values (16) | 10522 | 29.2% |
Most occurring categories
| Value | Count | Frequency (%) |
|---|---|---|
| Lowercase Letter | 29988 | 83.1% |
| Uppercase Letter | 6099 | 16.9% |
Most frequent character per category
Lowercase Letter
| Value | Count | Frequency (%) |
|---|---|---|
| e | 4727 | 15.8% |
| r | 4616 | 15.4% |
| a | 3081 | 10.3% |
| u | 2779 | 9.3% |
| b | 2095 | 7.0% |
| y | 2037 | 6.8% |
| c | 1807 | 6.0% |
| m | 1277 | 4.3% |
| t | 1111 | 3.7% |
| h | 1054 | 3.5% |
| Other values (8) | 5404 | 18.0% |
Uppercase Letter
| Value | Count | Frequency (%) |
|---|---|---|
| M | 1597 | 26.2% |
| J | 1549 | 25.4% |
| A | 858 | 14.1% |
| F | 482 | 7.9% |
| N | 459 | 7.5% |
| D | 417 | 6.8% |
| S | 401 | 6.6% |
| O | 336 | 5.5% |
Most occurring scripts
| Value | Count | Frequency (%) |
|---|---|---|
| Latin | 36087 | 100.0% |
Most frequent character per script
Latin
| Value | Count | Frequency (%) |
|---|---|---|
| e | 4727 | 13.1% |
| r | 4616 | 12.8% |
| a | 3081 | 8.5% |
| u | 2779 | 7.7% |
| b | 2095 | 5.8% |
| y | 2037 | 5.6% |
| c | 1807 | 5.0% |
| M | 1597 | 4.4% |
| J | 1549 | 4.3% |
| m | 1277 | 3.5% |
| Other values (16) | 10522 | 29.2% |
Most occurring blocks
| Value | Count | Frequency (%) |
|---|---|---|
| ASCII | 36087 | 100.0% |
Most frequent character per block
ASCII
| Value | Count | Frequency (%) |
|---|---|---|
| e | 4727 | 13.1% |
| r | 4616 | 12.8% |
| a | 3081 | 8.5% |
| u | 2779 | 7.7% |
| b | 2095 | 5.8% |
| y | 2037 | 5.6% |
| c | 1807 | 5.0% |
| M | 1597 | 4.4% |
| J | 1549 | 4.3% |
| m | 1277 | 3.5% |
| Other values (16) | 10522 | 29.2% |
Weekday
Categorical
| Distinct | 7 |
|---|---|
| Distinct (%) | 0.1% |
| Missing | 0 |
| Missing (%) | 0.0% |
| Memory size | 47.8 KiB |
| Saturday | 1005 |
|---|---|
| Sunday | 946 |
| Monday | 899 |
| Friday | 863 |
| Thursday | 804 |
| Other values (2) | 1582 |
Length
| Max length | 9 |
|---|---|
| Median length | 8 |
| Mean length | 7.110345958 |
| Min length | 6 |
Characters and Unicode
| Total characters | 43366 |
|---|---|
| Distinct characters | 17 |
| Distinct categories | 2 |
| Distinct scripts | 1 |
| Distinct blocks | 1 |
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.
Unique
| Unique | 0 |
|---|---|
| Unique (%) | 0.0% |
Sample
| 1st row | Wednesday |
|---|---|
| 2nd row | Wednesday |
| 3rd row | Wednesday |
| 4th row | Wednesday |
| 5th row | Wednesday |
Common Values
| Value | Count | Frequency (%) |
|---|---|---|
| Saturday | 1005 | 16.5% |
| Sunday | 946 | 15.5% |
| Monday | 899 | 14.7% |
| Friday | 863 | 14.1% |
| Thursday | 804 | 13.2% |
| Tuesday | 796 | 13.1% |
| Wednesday | 786 | 12.9% |
Length
Histogram of lengths of the category
Category Frequency Plot
| Value | Count | Frequency (%) |
|---|---|---|
| saturday | 1005 | 16.5% |
| sunday | 946 | 15.5% |
| monday | 899 | 14.7% |
| friday | 863 | 14.1% |
| thursday | 804 | 13.2% |
| tuesday | 796 | 13.1% |
| wednesday | 786 | 12.9% |
Most occurring characters
| Value | Count | Frequency (%) |
|---|---|---|
| a | 7104 | 16.4% |
| d | 6885 | 15.9% |
| y | 6099 | 14.1% |
| u | 3551 | 8.2% |
| r | 2672 | 6.2% |
| n | 2631 | 6.1% |
| s | 2386 | 5.5% |
| e | 2368 | 5.5% |
| S | 1951 | 4.5% |
| T | 1600 | 3.7% |
| Other values (7) | 6119 | 14.1% |
Most occurring categories
| Value | Count | Frequency (%) |
|---|---|---|
| Lowercase Letter | 37267 | 85.9% |
| Uppercase Letter | 6099 | 14.1% |
Most frequent character per category
Lowercase Letter
| Value | Count | Frequency (%) |
|---|---|---|
| a | 7104 | 19.1% |
| d | 6885 | 18.5% |
| y | 6099 | 16.4% |
| u | 3551 | 9.5% |
| r | 2672 | 7.2% |
| n | 2631 | 7.1% |
| s | 2386 | 6.4% |
| e | 2368 | 6.4% |
| t | 1005 | 2.7% |
| o | 899 | 2.4% |
| Other values (2) | 1667 | 4.5% |
Uppercase Letter
| Value | Count | Frequency (%) |
|---|---|---|
| S | 1951 | 32.0% |
| T | 1600 | 26.2% |
| M | 899 | 14.7% |
| F | 863 | 14.1% |
| W | 786 | 12.9% |
Most occurring scripts
| Value | Count | Frequency (%) |
|---|---|---|
| Latin | 43366 | 100.0% |
Most frequent character per script
Latin
| Value | Count | Frequency (%) |
|---|---|---|
| a | 7104 | 16.4% |
| d | 6885 | 15.9% |
| y | 6099 | 14.1% |
| u | 3551 | 8.2% |
| r | 2672 | 6.2% |
| n | 2631 | 6.1% |
| s | 2386 | 5.5% |
| e | 2368 | 5.5% |
| S | 1951 | 4.5% |
| T | 1600 | 3.7% |
| Other values (7) | 6119 | 14.1% |
Most occurring blocks
| Value | Count | Frequency (%) |
|---|---|---|
| ASCII | 43366 | 100.0% |
Most frequent character per block
ASCII
| Value | Count | Frequency (%) |
|---|---|---|
| a | 7104 | 16.4% |
| d | 6885 | 15.9% |
| y | 6099 | 14.1% |
| u | 3551 | 8.2% |
| r | 2672 | 6.2% |
| n | 2631 | 6.1% |
| s | 2386 | 5.5% |
| e | 2368 | 5.5% |
| S | 1951 | 4.5% |
| T | 1600 | 3.7% |
| Other values (7) | 6119 | 14.1% |
DateTime
Date
| Distinct | 6099 |
|---|---|
| Distinct (%) | 100.0% |
| Missing | 0 |
| Missing (%) | 0.0% |
| Memory size | 47.8 KiB |
| Minimum | 2004-03-10 18:00:00 |
|---|---|
| Maximum | 2005-04-04 14:00:00 |
Histogram with fixed size bins (bins=50)
Hour
| Distinct | 24 |
|---|---|
| Distinct (%) | 0.4% |
| Missing | 0 |
| Missing (%) | 0.0% |
| Infinite | 0 |
| Infinite (%) | 0.0% |
| Mean | 11.99294966 |
| Minimum | 0 |
|---|---|
| Maximum | 23 |
| Zeros | 287 |
| Zeros (%) | 4.7% |
| Negative | 0 |
| Negative (%) | 0.0% |
| Memory size | 47.8 KiB |
Quantile statistics
| Minimum | 0 |
|---|---|
| 5-th percentile | 1 |
| Q1 | 6 |
| median | 12 |
| Q3 | 18 |
| 95-th percentile | 22 |
| Maximum | 23 |
| Range | 23 |
| Interquartile range (IQR) | 12 |
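The quantile table can be checked with the standard library. Below is a minimal sketch. It assumes linear interpolation between order statistics, which is the default in NumPy and pandas. `statistics.quantiles(..., method="inclusive")` follows that rule. `quantile_summary` is a hypothetical helper.

```python
import statistics


def quantile_summary(data):
    """Quantile table using linear interpolation between order statistics."""
    xs = sorted(data)
    # method="inclusive" places the p-quantile at position p * (n - 1),
    # the same rule as NumPy's default "linear" method.
    twentieths = statistics.quantiles(xs, n=20, method="inclusive")  # 5%, 10%, ..., 95%
    q1, median, q3 = statistics.quantiles(xs, n=4, method="inclusive")
    return {
        "minimum": xs[0],
        "p5": twentieths[0],
        "q1": q1,
        "median": median,
        "q3": q3,
        "p95": twentieths[-1],
        "maximum": xs[-1],
        "range": xs[-1] - xs[0],
        "iqr": q3 - q1,
    }


print(quantile_summary(range(24)))  # each hour value 0..23 exactly once
```

Hour has far more than one row per value, so its quartiles land on whole hours (6, 12, 18). The uniform toy input above gives fractional positions such as Q1 = 5.75 instead.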
Descriptive statistics
| Standard deviation | 6.880900037 |
|---|---|
| Coefficient of variation (CV) | 0.5737454279 |
| Kurtosis | -1.10497181 |
| Mean | 11.99294966 |
| Median Absolute Deviation (MAD) | 6 |
| Skewness | -0.1144539213 |
| Sum | 73145 |
| Variance | 47.34678531 |
| Monotonicity | Not monotonic |
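The descriptive statistics follow standard formulas. Below is a minimal sketch. It assumes a sample variance with an n - 1 denominator, and the bias-adjusted skewness and excess-kurtosis estimators that pandas uses for `Series.skew()` and `Series.kurt()`. Treat the choice of estimator as an assumption about how the report computes these values. `descriptive_stats` is a hypothetical helper.

```python
import math
import statistics


def descriptive_stats(data):
    """Needs a list of at least 4 values that are not all equal."""
    n = len(data)
    mean = statistics.fmean(data)
    variance = statistics.variance(data)  # sample variance, n - 1 denominator
    std = math.sqrt(variance)
    median = statistics.median(data)

    # Central moments with an n denominator.
    m2 = sum((x - mean) ** 2 for x in data) / n
    m3 = sum((x - mean) ** 3 for x in data) / n
    m4 = sum((x - mean) ** 4 for x in data) / n
    g1 = m3 / m2 ** 1.5
    g2 = m4 / m2 ** 2 - 3.0
    # Bias-adjusted skewness and excess kurtosis (assumed to match pandas).
    skewness = math.sqrt(n * (n - 1)) / (n - 2) * g1
    kurtosis = (n - 1) / ((n - 2) * (n - 3)) * ((n + 1) * g2 + 6.0)

    steps = list(zip(data, data[1:]))
    monotonic = all(a <= b for a, b in steps) or all(a >= b for a, b in steps)
    return {
        "std": std,
        "cv": std / mean,  # coefficient of variation
        "kurtosis": kurtosis,
        "mean": mean,
        "mad": statistics.median(abs(x - median) for x in data),  # median absolute deviation
        "skewness": skewness,
        "sum": sum(data),
        "variance": variance,
        "monotonic": monotonic,
    }


print(descriptive_stats([1, 2, 3, 4, 5]))
```

As a consistency check against the Hour table, CV = 6.880900037 / 11.99294966 ≈ 0.5737 and variance = 6.880900037² ≈ 47.35.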
Histogram with fixed size bins (bins=24)
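A fixed-size-bin histogram splits the range [min, max] into equal-width bins. Below is a minimal sketch. It assumes NumPy-style bins, which are half-open except for the last one, which also includes the maximum. `fixed_width_histogram` is a hypothetical helper.

```python
def fixed_width_histogram(data, bins):
    """Counts per equal-width bin over [min(data), max(data)]."""
    lo, hi = min(data), max(data)
    if hi == lo:
        raise ValueError("data must span a non-zero range")
    width = (hi - lo) / bins
    counts = [0] * bins
    for x in data:
        # Bins are [edge_i, edge_{i+1}); clamp so x == hi (and any rounding
        # overshoot) lands in the last bin, as numpy.histogram does for the max.
        index = min(int((x - lo) / width), bins - 1)
        counts[index] += 1
    edges = [lo + i * width for i in range(bins + 1)]
    return counts, edges


hours = list(range(24))  # one observation per hour value 0..23
counts, edges = fixed_width_histogram(hours, bins=24)
print(counts)
```

With 24 bins over the range 0 to 23, each bin is 23/24 wide, so each integer hour falls into its own bin.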
Common Values
| Value | Count | Frequency (%) |
|---|---|---|
| 22 | 292 | 4.8% |
| 23 | 292 | 4.8% |
| 0 | 287 | 4.7% |
| 15 | 286 | 4.7% |
| 7 | 284 | 4.7% |
| 12 | 283 | 4.6% |
| 1 | 280 | 4.6% |
| 16 | 280 | 4.6% |
| 13 | 279 | 4.6% |
| 14 | 276 | 4.5% |
| Other values (14) | 3260 | 53.5% |
Minimum 10 values
| Value | Count | Frequency (%) |
|---|---|---|
| 0 | 287 | 4.7% |
| 1 | 280 | 4.6% |
| 2 | 269 | 4.4% |
| 3 | 25 | 0.4% |
| 4 | 145 | 2.4% |
| 5 | 260 | 4.3% |
| 6 | 274 | 4.5% |
| 7 | 284 | 4.7% |
| 8 | 231 | 3.8% |
| 9 | 246 | 4.0% |
Maximum 10 values
| Value | Count | Frequency (%) |
|---|---|---|
| 23 | 292 | 4.8% |
| 22 | 292 | 4.8% |
| 21 | 271 | 4.4% |
| 20 | 253 | 4.1% |
| 19 | 236 | 3.9% |
| 18 | 250 | 4.1% |
| 17 | 268 | 4.4% |
| 16 | 280 | 4.6% |
| 15 | 286 | 4.7% |
| 14 | 276 | 4.5% |
PT08.S1(CO)
| Distinct | 840 |
|---|---|
| Distinct (%) | 13.8% |
| Missing | 0 |
| Missing (%) | 0.0% |
| Infinite | 0 |
| Infinite (%) | 0.0% |
| Mean | 1093.204952 |
| Minimum | 667 |
|---|---|
| Maximum | 1667 |
| Zeros | 0 |
| Zeros (%) | 0.0% |
| Negative | 0 |
| Negative (%) | 0.0% |
| Memory size | 47.8 KiB |
Quantile statistics
| Minimum | 667 |
|---|---|
| 5-th percentile | 835 |
| Q1 | 955 |
| median | 1070 |
| Q3 | 1212 |
| 95-th percentile | 1438 |
| Maximum | 1667 |
| Range | 1000 |
| Interquartile range (IQR) | 257 |
Descriptive statistics
| Standard deviation | 182.2290977 |
|---|---|
| Coefficient of variation (CV) | 0.1666925286 |
| Kurtosis | -0.3126180085 |
| Mean | 1093.204952 |
| Median Absolute Deviation (MAD) | 128 |
| Skewness | 0.5033105253 |
| Sum | 6667457 |
| Variance | 33207.44405 |
| Monotonicity | Not monotonic |
Histogram with fixed size bins (bins=50)
Common Values
| Value | Count | Frequency (%) |
|---|---|---|
| 973 | 23 | 0.4% |
| 938 | 22 | 0.4% |
| 1100 | 22 | 0.4% |
| 988 | 21 | 0.3% |
| 969 | 20 | 0.3% |
| 1016 | 20 | 0.3% |
| 1065 | 20 | 0.3% |
| 1111 | 19 | 0.3% |
| 984 | 19 | 0.3% |
| 1021 | 19 | 0.3% |
| Other values (830) | 5894 | 96.6% |
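Minimum 10 values
| Value | Count | Frequency (%) |
|---|---|---|
| 667 | 1 | < 0.1% |
| 683 | 1 | < 0.1% |
| 692 | 1 | < 0.1% |
| 695 | 2 | < 0.1% |
| 703 | 1 | < 0.1% |
| 729 | 1 | < 0.1% |
| 732 | 2 | < 0.1% |
| 738 | 1 | < 0.1% |
| 740 | 1 | < 0.1% |
| 741 | 1 | < 0.1% |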
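Maximum 10 values
| Value | Count | Frequency (%) |
|---|---|---|
| 1667 | 2 | < 0.1% |
| 1664 | 1 | < 0.1% |
| 1657 | 1 | < 0.1% |
| 1651 | 1 | < 0.1% |
| 1643 | 1 | < 0.1% |
| 1642 | 1 | < 0.1% |
| 1636 | 1 | < 0.1% |
| 1633 | 2 | < 0.1% |
| 1625 | 1 | < 0.1% |
| 1621 | 1 | < 0.1% |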
PT08.S2(NMHC)
| Distinct | 978 |
|---|---|
| Distinct (%) | 16.0% |
| Missing | 0 |
| Missing (%) | 0.0% |
| Infinite | 0 |
| Infinite (%) | 0.0% |
| Mean | 928.6655189 |
| Minimum | 440 |
|---|---|
| Maximum | 1504 |
| Zeros | 0 |
| Zeros (%) | 0.0% |
| Negative | 0 |
| Negative (%) | 0.0% |
| Memory size | 47.8 KiB |
Quantile statistics
| Minimum | 440 |
|---|---|
| 5-th percentile | 601 |
| Q1 | 758.5 |
| median | 913 |
| Q3 | 1089 |
| 95-th percentile | 1319 |
| Maximum | 1504 |
| Range | 1064 |
| Interquartile range (IQR) | 330.5 |
Descriptive statistics
| Standard deviation | 220.6129523 |
|---|---|
| Coefficient of variation (CV) | 0.237559108 |
| Kurtosis | -0.6364182336 |
| Mean | 928.6655189 |
| Median Absolute Deviation (MAD) | 165 |
| Skewness | 0.2717299101 |
| Sum | 5663931 |
| Variance | 48670.07472 |
| Monotonicity | Not monotonic |
Histogram with fixed size bins (bins=50)
Common Values
| Value | Count | Frequency (%) |
|---|---|---|
| 880 | 20 | 0.3% |
| 776 | 19 | 0.3% |
| 800 | 18 | 0.3% |
| 896 | 17 | 0.3% |
| 803 | 17 | 0.3% |
| 985 | 17 | 0.3% |
| 849 | 16 | 0.3% |
| 931 | 16 | 0.3% |
| 962 | 16 | 0.3% |
| 826 | 16 | 0.3% |
| Other values (968) | 5927 | 97.2% |
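Minimum 10 values
| Value | Count | Frequency (%) |
|---|---|---|
| 440 | 1 | < 0.1% |
| 449 | 1 | < 0.1% |
| 454 | 1 | < 0.1% |
| 459 | 1 | < 0.1% |
| 460 | 1 | < 0.1% |
| 465 | 2 | < 0.1% |
| 466 | 1 | < 0.1% |
| 470 | 2 | < 0.1% |
| 474 | 1 | < 0.1% |
| 476 | 1 | < 0.1% |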
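Maximum 10 values
| Value | Count | Frequency (%) |
|---|---|---|
| 1504 | 1 | < 0.1% |
| 1503 | 1 | < 0.1% |
| 1501 | 1 | < 0.1% |
| 1500 | 1 | < 0.1% |
| 1499 | 1 | < 0.1% |
| 1497 | 1 | < 0.1% |
| 1496 | 1 | < 0.1% |
| 1495 | 1 | < 0.1% |
| 1493 | 1 | < 0.1% |
| 1492 | 2 | < 0.1% |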
PT08.S3(NOx)
| Distinct | 904 |
|---|---|
| Distinct (%) | 14.8% |
| Missing | 0 |
| Missing (%) | 0.0% |
| Infinite | 0 |
| Infinite (%) | 0.0% |
| Mean | 820.3277586 |
| Minimum | 360 |
|---|---|
| Maximum | 1374 |
| Zeros | 0 |
| Zeros (%) | 0.0% |
| Negative | 0 |
| Negative (%) | 0.0% |
| Memory size | 47.8 KiB |
Quantile statistics
| Minimum | 360 |
|---|---|
| 5-th percentile | 525 |
| Q1 | 675.5 |
| median | 800 |
| Q3 | 944 |
| 95-th percentile | 1185 |
| Maximum | 1374 |
| Range | 1014 |
| Interquartile range (IQR) | 268.5 |
Descriptive statistics
| Standard deviation | 197.1205218 |
|---|---|
| Coefficient of variation (CV) | 0.2402948331 |
| Kurtosis | -0.2934934982 |
| Mean | 820.3277586 |
| Median Absolute Deviation (MAD) | 133 |
| Skewness | 0.4288411019 |
| Sum | 5003179 |
| Variance | 38856.50013 |
| Monotonicity | Not monotonic |
Histogram with fixed size bins (bins=50)
Common Values
| Value | Count | Frequency (%) |
|---|---|---|
| 705 | 19 | 0.3% |
| 765 | 19 | 0.3% |
| 800 | 19 | 0.3% |
| 733 | 19 | 0.3% |
| 846 | 19 | 0.3% |
| 737 | 18 | 0.3% |
| 751 | 18 | 0.3% |
| 702 | 18 | 0.3% |
| 793 | 18 | 0.3% |
| 830 | 17 | 0.3% |
| Other values (894) | 5915 | 97.0% |
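Minimum 10 values
| Value | Count | Frequency (%) |
|---|---|---|
| 360 | 1 | < 0.1% |
| 370 | 1 | < 0.1% |
| 381 | 1 | < 0.1% |
| 384 | 1 | < 0.1% |
| 396 | 1 | < 0.1% |
| 404 | 1 | < 0.1% |
| 407 | 1 | < 0.1% |
| 410 | 1 | < 0.1% |
| 415 | 1 | < 0.1% |
| 417 | 1 | < 0.1% |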
Maximum 10 values
| Value | Count | Frequency (%) |
|---|---|---|
| 1374 | 1 | < 0.1% |
| 1373 | 2 | < 0.1% |
| 1370 | 2 | < 0.1% |
| 1368 | 1 | < 0.1% |
| 1366 | 3 | < 0.1% |
| 1365 | 1 | < 0.1% |
| 1364 | 1 | < 0.1% |
| 1363 | 1 | < 0.1% |
| 1361 | 3 | < 0.1% |
| 1359 | 1 | < 0.1% |
PT08.S4(NO2)
| Distinct | 1405 |
|---|---|
| Distinct (%) | 23.0% |
| Missing | 0 |
| Missing (%) | 0.0% |
| Infinite | 0 |
| Infinite (%) | 0.0% |
| Mean | 1430.273323 |
| Minimum | 601 |
|---|---|
| Maximum | 2337 |
| Zeros | 0 |
| Zeros (%) | 0.0% |
| Negative | 0 |
| Negative (%) | 0.0% |
| Memory size | 47.8 KiB |
Quantile statistics
| Minimum | 601 |
|---|---|
| 5-th percentile | 880 |
| Q1 | 1192 |
| median | 1444 |
| Q3 | 1662 |
| 95-th percentile | 1968.1 |
| Maximum | 2337 |
| Range | 1736 |
| Interquartile range (IQR) | 470 |
Descriptive statistics
| Standard deviation | 329.5066174 |
|---|---|
| Coefficient of variation (CV) | 0.2303801742 |
| Kurtosis | -0.5563477879 |
| Mean | 1430.273323 |
| Median Absolute Deviation (MAD) | 233 |
| Skewness | -0.01741131692 |
| Sum | 8723237 |
| Variance | 108574.6109 |
| Monotonicity | Not monotonic |
Histogram with fixed size bins (bins=50)
Common Values
| Value | Count | Frequency (%) |
|---|---|---|
| 1488 | 15 | 0.2% |
| 1418 | 15 | 0.2% |
| 1552 | 14 | 0.2% |
| 1539 | 14 | 0.2% |
| 1580 | 13 | 0.2% |
| 1490 | 13 | 0.2% |
| 1307 | 13 | 0.2% |
| 1467 | 13 | 0.2% |
| 1382 | 13 | 0.2% |
| 1374 | 13 | 0.2% |
| Other values (1395) | 5963 | 97.8% |
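Minimum 10 values
| Value | Count | Frequency (%) |
|---|---|---|
| 601 | 1 | < 0.1% |
| 605 | 1 | < 0.1% |
| 621 | 1 | < 0.1% |
| 637 | 1 | < 0.1% |
| 640 | 1 | < 0.1% |
| 642 | 1 | < 0.1% |
| 647 | 1 | < 0.1% |
| 652 | 1 | < 0.1% |
| 655 | 1 | < 0.1% |
| 660 | 1 | < 0.1% |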
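Maximum 10 values
| Value | Count | Frequency (%) |
|---|---|---|
| 2337 | 1 | < 0.1% |
| 2332 | 1 | < 0.1% |
| 2319 | 1 | < 0.1% |
| 2316 | 2 | < 0.1% |
| 2311 | 1 | < 0.1% |
| 2306 | 1 | < 0.1% |
| 2305 | 1 | < 0.1% |
| 2301 | 1 | < 0.1% |
| 2288 | 1 | < 0.1% |
| 2283 | 1 | < 0.1% |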
PT08.S5(O3)
| Distinct | 1399 |
|---|---|
| Distinct (%) | 22.9% |
| Missing | 0 |
| Missing (%) | 0.0% |
| Infinite | 0 |
| Infinite (%) | 0.0% |
| Mean | 1007.403837 |
| Minimum | 288 |
|---|---|
| Maximum | 2108 |
| Zeros | 0 |
| Zeros (%) | 0.0% |
| Negative | 0 |
| Negative (%) | 0.0% |
| Memory size | 47.8 KiB |
Quantile statistics
| Minimum | 288 |
|---|---|
| 5-th percentile | 501 |
| Q1 | 756 |
| median | 979 |
| Q3 | 1238 |
| 95-th percentile | 1598.1 |
| Maximum | 2108 |
| Range | 1820 |
| Interquartile range (IQR) | 482 |
Descriptive statistics
| Standard deviation | 334.890392 |
|---|---|
| Coefficient of variation (CV) | 0.332429141 |
| Kurtosis | -0.4601943517 |
| Mean | 1007.403837 |
| Median Absolute Deviation (MAD) | 240 |
| Skewness | 0.3322983599 |
| Sum | 6144156 |
| Variance | 112151.5747 |
| Monotonicity | Not monotonic |
Histogram with fixed size bins (bins=50)
Common Values
| Value | Count | Frequency (%) |
|---|---|---|
| 836 | 18 | 0.3% |
| 825 | 15 | 0.2% |
| 905 | 15 | 0.2% |
| 826 | 14 | 0.2% |
| 949 | 14 | 0.2% |
| 926 | 14 | 0.2% |
| 807 | 14 | 0.2% |
| 940 | 13 | 0.2% |
| 891 | 13 | 0.2% |
| 1019 | 13 | 0.2% |
| Other values (1389) | 5956 | 97.7% |
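Minimum 10 values
| Value | Count | Frequency (%) |
|---|---|---|
| 288 | 1 | < 0.1% |
| 307 | 1 | < 0.1% |
| 310 | 1 | < 0.1% |
| 313 | 2 | < 0.1% |
| 322 | 2 | < 0.1% |
| 326 | 1 | < 0.1% |
| 328 | 1 | < 0.1% |
| 332 | 2 | < 0.1% |
| 341 | 1 | < 0.1% |
| 342 | 1 | < 0.1% |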
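Maximum 10 values
| Value | Count | Frequency (%) |
|---|---|---|
| 2108 | 1 | < 0.1% |
| 2026 | 1 | < 0.1% |
| 2023 | 1 | < 0.1% |
| 2021 | 2 | < 0.1% |
| 2020 | 2 | < 0.1% |
| 2016 | 1 | < 0.1% |
| 2008 | 1 | < 0.1% |
| 2004 | 1 | < 0.1% |
| 1999 | 1 | < 0.1% |
| 1972 | 1 | < 0.1% |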
T
| Distinct | 414 |
|---|---|
| Distinct (%) | 6.8% |
| Missing | 0 |
| Missing (%) | 0.0% |
| Infinite | 0 |
| Infinite (%) | 0.0% |
| Mean | 18.18083292 |
| Minimum | -1.9 |
|---|---|
| Maximum | 41.1 |
| Zeros | 1 |
| Zeros (%) | < 0.1% |
| Negative | 8 |
| Negative (%) | 0.1% |
| Memory size | 47.8 KiB |
Quantile statistics
| Minimum | -1.9 |
|---|---|
| 5-th percentile | 4.4 |
| Q1 | 11.8 |
| median | 17.6 |
| Q3 | 24.2 |
| 95-th percentile | 34.5 |
| Maximum | 41.1 |
| Range | 43 |
| Interquartile range (IQR) | 12.4 |
Descriptive statistics
| Standard deviation | 8.858076248 |
|---|---|
| Coefficient of variation (CV) | 0.4872205958 |
| Kurtosis | -0.4899169847 |
| Mean | 18.18083292 |
| Median Absolute Deviation (MAD) | 6.2 |
| Skewness | 0.3076622934 |
| Sum | 110884.9 |
| Variance | 78.46551482 |
| Monotonicity | Not monotonic |
Histogram with fixed size bins (bins=50)
Common Values
| Value | Count | Frequency (%) |
|---|---|---|
| 21.3 | 42 | 0.7% |
| 20.8 | 39 | 0.6% |
| 20.2 | 39 | 0.6% |
| 13.5 | 38 | 0.6% |
| 12 | 36 | 0.6% |
| 17.8 | 35 | 0.6% |
| 13.4 | 35 | 0.6% |
| 16.3 | 35 | 0.6% |
| 19.3 | 34 | 0.6% |
| 12.3 | 34 | 0.6% |
| Other values (404) | 5732 | 94.0% |
Minimum 10 values
| Value | Count | Frequency (%) |
|---|---|---|
| -1.9 | 1 | < 0.1% |
| -1.3 | 2 | < 0.1% |
| -1.2 | 1 | < 0.1% |
| -0.6 | 1 | < 0.1% |
| -0.5 | 1 | < 0.1% |
| -0.2 | 1 | < 0.1% |
| -0.1 | 1 | < 0.1% |
| 0 | 1 | < 0.1% |
| 0.2 | 1 | < 0.1% |
| 0.3 | 3 | < 0.1% |
Maximum 10 values
| Value | Count | Frequency (%) |
|---|---|---|
| 41.1 | 3 | < 0.1% |
| 41 | 1 | < 0.1% |
| 40.9 | 1 | < 0.1% |
| 40.6 | 2 | < 0.1% |
| 40.5 | 1 | < 0.1% |
| 40.4 | 5 | 0.1% |
| 40.3 | 4 | 0.1% |
| 40.2 | 3 | < 0.1% |
| 40.1 | 6 | 0.1% |
| 40 | 2 | < 0.1% |
RH
| Distinct | 734 |
|---|---|
| Distinct (%) | 12.0% |
| Missing | 0 |
| Missing (%) | 0.0% |
| Infinite | 0 |
| Infinite (%) | 0.0% |
| Mean | 48.07504509 |
| Minimum | 9.2 |
|---|---|
| Maximum | 88.7 |
| Zeros | 0 |
| Zeros (%) | 0.0% |
| Negative | 0 |
| Negative (%) | 0.0% |
| Memory size | 47.8 KiB |
Quantile statistics
| Minimum | 9.2 |
|---|---|
| 5-th percentile | 20 |
| Q1 | 34.3 |
| median | 47.9 |
| Q3 | 61.45 |
| 95-th percentile | 77.2 |
| Maximum | 88.7 |
| Range | 79.5 |
| Interquartile range (IQR) | 27.15 |
Descriptive statistics
| Standard deviation | 17.42229095 |
|---|---|
| Coefficient of variation (CV) | 0.3623978078 |
| Kurtosis | -0.8709640956 |
| Mean | 48.07504509 |
| Median Absolute Deviation (MAD) | 13.6 |
| Skewness | 0.03965543715 |
| Sum | 293209.7 |
| Variance | 303.536222 |
| Monotonicity | Not monotonic |
Histogram with fixed size bins (bins=50)
Common Values
| Value | Count | Frequency (%) |
|---|---|---|
| 47.8 | 21 | 0.3% |
| 39.4 | 20 | 0.3% |
| 49.4 | 20 | 0.3% |
| 34.5 | 19 | 0.3% |
| 44.1 | 19 | 0.3% |
| 43 | 19 | 0.3% |
| 42.8 | 19 | 0.3% |
| 53.1 | 18 | 0.3% |
| 45.9 | 18 | 0.3% |
| 58.4 | 18 | 0.3% |
| Other values (724) | 5908 | 96.9% |
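Minimum 10 values
| Value | Count | Frequency (%) |
|---|---|---|
| 9.2 | 1 | < 0.1% |
| 9.3 | 1 | < 0.1% |
| 9.8 | 1 | < 0.1% |
| 9.9 | 2 | < 0.1% |
| 10.4 | 1 | < 0.1% |
| 11.1 | 1 | < 0.1% |
| 11.6 | 1 | < 0.1% |
| 12.3 | 1 | < 0.1% |
| 12.6 | 2 | < 0.1% |
| 12.7 | 1 | < 0.1% |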
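Maximum 10 values
| Value | Count | Frequency (%) |
|---|---|---|
| 88.7 | 1 | < 0.1% |
| 87.1 | 1 | < 0.1% |
| 87 | 1 | < 0.1% |
| 86.5 | 2 | < 0.1% |
| 86 | 1 | < 0.1% |
| 85.7 | 2 | < 0.1% |
| 85.5 | 1 | < 0.1% |
| 85.4 | 2 | < 0.1% |
| 85.3 | 1 | < 0.1% |
| 85.1 | 2 | < 0.1% |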
AH
| Distinct | 4949 |
|---|---|
| Distinct (%) | 81.1% |
| Missing | 0 |
| Missing (%) | 0.0% |
| Infinite | 0 |
| Infinite (%) | 0.0% |
| Mean | 0.9931203968 |
| Minimum | 0.1847 |
|---|---|
| Maximum | 2.0297 |
| Zeros | 0 |
| Zeros (%) | 0.0% |
| Negative | 0 |
| Negative (%) | 0.0% |
| Memory size | 47.8 KiB |
Quantile statistics
| Minimum | 0.1847 |
|---|---|
| 5-th percentile | 0.38559 |
| Q1 | 0.70425 |
| median | 0.9688 |
| Q3 | 1.2666 |
| 95-th percentile | 1.70552 |
| Maximum | 2.0297 |
| Range | 1.845 |
| Interquartile range (IQR) | 0.56235 |
Descriptive statistics
| Standard deviation | 0.3982713569 |
|---|---|
| Coefficient of variation (CV) | 0.4010302861 |
| Kurtosis | -0.5950051195 |
| Mean | 0.9931203968 |
| Median Absolute Deviation (MAD) | 0.2801 |
| Skewness | 0.2641157285 |
| Sum | 6057.0413 |
| Variance | 0.1586200737 |
| Monotonicity | Not monotonic |
Histogram with fixed size bins (bins=50)
Common Values
| Value | Count | Frequency (%) |
|---|---|---|
| 0.9722 | 5 | 0.1% |
| 0.8736 | 5 | 0.1% |
| 0.9129 | 4 | 0.1% |
| 1.3689 | 4 | 0.1% |
| 1.0451 | 4 | 0.1% |
| 0.8193 | 4 | 0.1% |
| 0.8394 | 4 | 0.1% |
| 0.9385 | 4 | 0.1% |
| 0.9462 | 4 | 0.1% |
| 0.9033 | 4 | 0.1% |
| Other values (4939) | 6057 | 99.3% |
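Minimum 10 values
| Value | Count | Frequency (%) |
|---|---|---|
| 0.1847 | 1 | < 0.1% |
| 0.1862 | 1 | < 0.1% |
| 0.191 | 1 | < 0.1% |
| 0.1975 | 1 | < 0.1% |
| 0.1988 | 1 | < 0.1% |
| 0.2029 | 1 | < 0.1% |
| 0.2031 | 1 | < 0.1% |
| 0.2062 | 1 | < 0.1% |
| 0.2086 | 1 | < 0.1% |
| 0.2157 | 1 | < 0.1% |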
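Maximum 10 values
| Value | Count | Frequency (%) |
|---|---|---|
| 2.0297 | 1 | < 0.1% |
| 2.029 | 1 | < 0.1% |
| 2.0224 | 1 | < 0.1% |
| 2.019 | 1 | < 0.1% |
| 2.0184 | 1 | < 0.1% |
| 2.0183 | 1 | < 0.1% |
| 2.0181 | 1 | < 0.1% |
| 2.0177 | 1 | < 0.1% |
| 2.0127 | 1 | < 0.1% |
| 2.012 | 1 | < 0.1% |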
Spearman's ρ
Spearman's rank correlation coefficient (ρ) is a measure of monotonic correlation between two variables. It is therefore better than Pearson's r at catching nonlinear monotonic correlations. Its value lies between -1 and +1: -1 indicates total negative monotonic correlation, 0 indicates no monotonic correlation, and +1 indicates total positive monotonic correlation. To calculate ρ for two variables X and Y, divide the covariance of the rank variables of X and Y by the product of their standard deviations.
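This definition translates directly into code: rank each variable, giving tied values their average rank, and then take Pearson's r of the ranks. Below is a minimal sketch. `average_ranks`, `pearson_r` and `spearman_rho` are hypothetical helpers.

```python
import math


def average_ranks(values):
    """1-based ranks; tied values share the mean of the ranks they occupy."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    start = 0
    while start < len(order):
        end = start
        # Extend the run while the next value ties with the current one.
        while end + 1 < len(order) and values[order[end + 1]] == values[order[start]]:
            end += 1
        mean_rank = (start + end) / 2 + 1
        for k in range(start, end + 1):
            ranks[order[k]] = mean_rank
        start = end + 1
    return ranks


def pearson_r(x, y):
    mx, my = sum(x) / len(x), sum(y) / len(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)


def spearman_rho(x, y):
    """Pearson's r computed on the rank variables of x and y."""
    return pearson_r(average_ranks(x), average_ranks(y))


x = [1, 2, 3, 4, 5]
y = [v ** 2 for v in x]    # monotonic but not linear
print(spearman_rho(x, y))  # 1.0 up to rounding
print(pearson_r(x, y))     # below 1, since the relation is not linear
```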
Pearson's r
Pearson's correlation coefficient (r) is a measure of linear correlation between two variables. Its value lies between -1 and +1: -1 indicates total negative linear correlation, 0 indicates no linear correlation, and +1 indicates total positive linear correlation. Furthermore, r is invariant under separate changes in location and scale of the two variables. This means that, for an exact linear relationship, the steepness of the line does not affect r; only the sign of its slope does. To calculate r for two variables X and Y, divide the covariance of X and Y by the product of their standard deviations.
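Below is a minimal sketch that transcribes this definition directly. `pearson_r` is a hypothetical helper. The (n - 1) factors in the sample covariance and the sample standard deviations cancel, so plain sums of deviations are enough. The usage lines illustrate the location/scale invariance. Python 3.10+ also provides `statistics.correlation` for the same quantity.

```python
import math


def pearson_r(x, y):
    """cov(X, Y) / (std(X) * std(Y)); the (n - 1) factors cancel."""
    mx = sum(x) / len(x)
    my = sum(y) / len(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)


x = [1, 2, 3, 4]
print(pearson_r(x, [3 * v + 7 for v in x]))     # steep positive line: 1 up to rounding
print(pearson_r(x, [-0.5 * v + 2 for v in x]))  # shallow negative line: -1 up to rounding
print(pearson_r(x, [2, 1, 4, 3]))               # approximately 0.6
```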
Kendall's τ
Like Spearman's rank correlation coefficient, the Kendall rank correlation coefficient (τ) measures ordinal association between two variables. Its value lies between -1 and +1: -1 indicates total negative correlation, 0 indicates no correlation, and +1 indicates total positive correlation. To calculate τ for two variables X and Y, count the concordant and discordant pairs of observations. τ is the number of concordant pairs minus the number of discordant pairs, divided by the total number of pairs.
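Counting pairs as described gives Kendall's τ-a. Below is a minimal O(n²) sketch, where `kendall_tau_a` is a hypothetical helper. Pairs that are tied in X or Y count as neither concordant nor discordant. Library implementations such as `scipy.stats.kendalltau` default to the tie-adjusted τ-b, which equals τ-a when there are no ties.

```python
from itertools import combinations


def kendall_tau_a(x, y):
    """(concordant - discordant) / total pairs, over all n(n-1)/2 pairs."""
    concordant = discordant = 0
    for i, j in combinations(range(len(x)), 2):
        sign = (x[i] - x[j]) * (y[i] - y[j])
        if sign > 0:
            concordant += 1
        elif sign < 0:
            discordant += 1
        # sign == 0: a tie in x or y, counted as neither
    total_pairs = len(x) * (len(x) - 1) // 2
    return (concordant - discordant) / total_pairs


print(kendall_tau_a([1, 2, 3, 4], [1, 3, 2, 4]))  # 5 concordant, 1 discordant: 4/6
```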
Cramér's V (φc)
Cramér's V is an association measure for nominal random variables. The coefficient ranges from 0 to 1, with 0 indicating independence and 1 indicating perfect association. The empirical estimators used for Cramér's V have been shown to be biased, even for large samples. We use a bias-corrected measure proposed by Bergsma in 2013.
Phik (φk)
Phik (φk) is a new and practical correlation coefficient that works consistently between categorical, ordinal and interval variables. It captures non-linear dependency and reverts to the Pearson correlation coefficient in the case of a bivariate normal input distribution. Extensive documentation is available for the phik package.
A simple visualization of nullity by column.
The nullity matrix is a data-dense display that lets you quickly and visually pick out patterns in data completeness.
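Below is a minimal sketch of Cramér's V, as described in the correlation section, computed from a contingency table of counts. It covers both the plain estimator and the bias correction attributed to Bergsma (2013). The correction follows its commonly cited form. This is an illustration, not necessarily the report's exact code path; for example, no continuity correction is applied to the chi-squared statistic. `cramers_v` is a hypothetical helper.

```python
import math


def cramers_v(table, bias_corrected=True):
    """Cramér's V for an r x k table of counts with no empty rows or columns."""
    r, k = len(table), len(table[0])
    n = sum(sum(row) for row in table)
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]

    # Pearson chi-squared statistic against the independence expectation.
    chi2 = 0.0
    for i in range(r):
        for j in range(k):
            expected = row_totals[i] * col_totals[j] / n
            chi2 += (table[i][j] - expected) ** 2 / expected
    phi2 = chi2 / n

    if not bias_corrected:
        return math.sqrt(phi2 / min(k - 1, r - 1))

    # Bias correction (Bergsma, 2013): shrink phi^2 and the table dimensions.
    phi2_corr = max(0.0, phi2 - (k - 1) * (r - 1) / (n - 1))
    k_corr = k - (k - 1) ** 2 / (n - 1)
    r_corr = r - (r - 1) ** 2 / (n - 1)
    denom = min(k_corr - 1, r_corr - 1)
    return math.sqrt(phi2_corr / denom) if denom > 0 else 0.0


counts = [[8, 2], [3, 7]]  # toy 2 x 2 table (hypothetical)
print(cramers_v(counts, bias_corrected=False))  # plain estimator
print(cramers_v(counts))                        # bias-corrected, smaller here
```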
First rows
| | Date | Month | Weekday | DateTime | Hour | PT08.S1(CO) | PT08.S2(NMHC) | PT08.S3(NOx) | PT08.S4(NO2) | PT08.S5(O3) | T | RH | AH |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 2004-03-10 | March | Wednesday | 2004-03-10 18:00:00 | 18 | 1360.0 | 1046.0 | 1056.0 | 1692.0 | 1268.0 | 13.6 | 48.9 | 0.7578 |
| 1 | 2004-03-10 | March | Wednesday | 2004-03-10 19:00:00 | 19 | 1292.0 | 955.0 | 1174.0 | 1559.0 | 972.0 | 13.3 | 47.7 | 0.7255 |
| 2 | 2004-03-10 | March | Wednesday | 2004-03-10 20:00:00 | 20 | 1402.0 | 939.0 | 1140.0 | 1555.0 | 1074.0 | 11.9 | 54.0 | 0.7502 |
| 3 | 2004-03-10 | March | Wednesday | 2004-03-10 21:00:00 | 21 | 1376.0 | 948.0 | 1092.0 | 1584.0 | 1203.0 | 11.0 | 60.0 | 0.7867 |
| 4 | 2004-03-10 | March | Wednesday | 2004-03-10 22:00:00 | 22 | 1272.0 | 836.0 | 1205.0 | 1490.0 | 1110.0 | 11.2 | 59.6 | 0.7888 |
| 5 | 2004-03-10 | March | Wednesday | 2004-03-10 23:00:00 | 23 | 1197.0 | 750.0 | 1337.0 | 1393.0 | 949.0 | 11.2 | 59.2 | 0.7848 |
| 6 | 2004-03-11 | March | Thursday | 2004-03-11 08:00:00 | 8 | 1333.0 | 900.0 | 1136.0 | 1517.0 | 1102.0 | 10.8 | 57.4 | 0.7408 |
| 7 | 2004-03-11 | March | Thursday | 2004-03-11 09:00:00 | 9 | 1351.0 | 960.0 | 1079.0 | 1583.0 | 1028.0 | 10.5 | 60.6 | 0.7691 |
| 8 | 2004-03-11 | March | Thursday | 2004-03-11 10:00:00 | 10 | 1233.0 | 827.0 | 1218.0 | 1446.0 | 860.0 | 10.8 | 58.4 | 0.7552 |
| 9 | 2004-03-11 | March | Thursday | 2004-03-11 11:00:00 | 11 | 1179.0 | 762.0 | 1328.0 | 1362.0 | 671.0 | 10.5 | 57.9 | 0.7352 |
Last rows
| | Date | Month | Weekday | DateTime | Hour | PT08.S1(CO) | PT08.S2(NMHC) | PT08.S3(NOx) | PT08.S4(NO2) | PT08.S5(O3) | T | RH | AH |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 6089 | 2005-04-04 | April | Monday | 2005-04-04 05:00:00 | 5 | 888.0 | 528.0 | 1077.0 | 987.0 | 578.0 | 10.4 | 59.9 | 0.7550 |
| 6090 | 2005-04-04 | April | Monday | 2005-04-04 06:00:00 | 6 | 1031.0 | 730.0 | 760.0 | 1129.0 | 905.0 | 9.5 | 63.1 | 0.7531 |
| 6091 | 2005-04-04 | April | Monday | 2005-04-04 07:00:00 | 7 | 1384.0 | 1221.0 | 470.0 | 1600.0 | 1457.0 | 9.7 | 61.9 | 0.7446 |
| 6092 | 2005-04-04 | April | Monday | 2005-04-04 08:00:00 | 8 | 1446.0 | 1362.0 | 415.0 | 1777.0 | 1705.0 | 13.5 | 48.9 | 0.7553 |
| 6093 | 2005-04-04 | April | Monday | 2005-04-04 09:00:00 | 9 | 1297.0 | 1102.0 | 507.0 | 1375.0 | 1583.0 | 18.2 | 36.3 | 0.7487 |
| 6094 | 2005-04-04 | April | Monday | 2005-04-04 10:00:00 | 10 | 1314.0 | 1101.0 | 539.0 | 1374.0 | 1729.0 | 21.9 | 29.3 | 0.7568 |
| 6095 | 2005-04-04 | April | Monday | 2005-04-04 11:00:00 | 11 | 1163.0 | 1027.0 | 604.0 | 1264.0 | 1269.0 | 24.3 | 23.7 | 0.7119 |
| 6096 | 2005-04-04 | April | Monday | 2005-04-04 12:00:00 | 12 | 1142.0 | 1063.0 | 603.0 | 1241.0 | 1092.0 | 26.9 | 18.3 | 0.6406 |
| 6097 | 2005-04-04 | April | Monday | 2005-04-04 13:00:00 | 13 | 1003.0 | 961.0 | 702.0 | 1041.0 | 770.0 | 28.3 | 13.5 | 0.5139 |
| 6098 | 2005-04-04 | April | Monday | 2005-04-04 14:00:00 | 14 | 1071.0 | 1047.0 | 654.0 | 1129.0 | 816.0 | 28.5 | 13.1 | 0.5028 |