This repository was archived by the owner on Dec 22, 2021. It is now read-only.
Open
Changes from 3 commits
Commits
23 commits
5d5b186
is JSON data preparation
aliamcami Mar 23, 2019
cd0ac0c
Quantitative analysts for json values
aliamcami Mar 23, 2019
970ea0e
Readme with overview of the findings about the quantitative analysts
aliamcami Mar 23, 2019
0272b1c
Sample comparasions for quantity of valid json values
aliamcami Mar 31, 2019
c5ec9b9
Data prep saving other samples
aliamcami Mar 31, 2019
c3fb738
Update readme with future questions
aliamcami Mar 31, 2019
1a5bcdb
Add of 'domain' column to data prep
aliamcami Mar 31, 2019
efc051e
Update jsJson_dataPrep to include an extra column with the md5 of val…
aliamcami Mar 31, 2019
2b617de
Rename 'isJson_Sample_Comparasion' to 'isJson_Quantitative_Comparasion'
aliamcami Mar 31, 2019
68700ec
Rename folder from ''2019_03_aliamcami_greatest_values_are_json' to '…
aliamcami Mar 31, 2019
0820cea
Removal of outdated notebook
aliamcami Mar 31, 2019
327429a
Add analyse for the correlation the domain and the value have with ea…
aliamcami Mar 31, 2019
4e18a11
Readme update - Quantitative_Comparasion overview
aliamcami Mar 31, 2019
a509bff
DataPrep cleanup and new 'json_keys' and 'json_schema' columns to dat…
aliamcami Apr 4, 2019
9e48a03
Remove Quantitative comparison and Add value distribution notebook
aliamcami Apr 8, 2019
699b066
Fix typo
aliamcami Apr 8, 2019
46c31d0
Removed fixed names, session organization, removed false positives fo…
aliamcami Apr 8, 2019
df6d843
Value distribution with new data that filtered json false positives
aliamcami Apr 8, 2019
b77dccf
Add new notebookt 'isJson_Occurrence_of_operation_symbols_domains.ipynb'
aliamcami Apr 17, 2019
2be179c
Clean run of the dataPrep with all columns
aliamcami Apr 17, 2019
f30f68a
Add isJson_Identify_Source.ipynb
aliamcami Apr 22, 2019
e1ee1f2
Remove isJson_correlation_domain_and_value.ipynb
aliamcami Apr 22, 2019
3b80915
Add isJson_Script_Domain_Output.ipynb and update readme
aliamcami Apr 22, 2019
26 changes: 26 additions & 0 deletions analyses/2019_03_aliamcami_greatest_values_are_json/README.md
@@ -0,0 +1,26 @@
# Overview

All of the largest values are valid JSON, but they make up only a small percentage of the whole dataset.

### Most of the data has a small value_len
(mean = 1356 for the 10% sample)
- 95.58% of the data has a value_len smaller than the mean
- 4.42% has a value_len bigger than the mean
- 9.35% are valid JSON
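
A minimal sketch of how shares like these could be computed, assuming each record is a string and `value_len` is its character length. The helper names `is_valid_json` and `share_below_mean` are hypothetical, and counting only objects and arrays as JSON is an assumption made here to avoid trivial matches; it is not necessarily what the notebooks do.

```python
import json
from statistics import mean


def is_valid_json(value: str) -> bool:
    """Return True if value parses as a JSON object or array.

    Assumption: bare scalars such as "1" or "true" are not counted,
    since they would inflate the number of "JSON" values.
    """
    try:
        parsed = json.loads(value)
    except (TypeError, ValueError):
        # json.JSONDecodeError is a subclass of ValueError.
        return False
    return isinstance(parsed, (dict, list))


def share_below_mean(values: list[str]) -> float:
    """Fraction of values whose length is strictly below the mean length."""
    lengths = [len(v) for v in values]
    avg = mean(lengths)
    return sum(length < avg for length in lengths) / len(lengths)
```

Because a few very long values pull the mean up, most values can sit below it, which is the pattern the percentages above describe.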

### Values above the mean
- 61.54% are NOT valid JSON
- 38.46% are valid JSON

### Values that are 1 standard deviation (std) above the mean
(std = 26310 for the 10% sample)
- 0.11% are NOT valid JSON
- 99.88% are valid JSON
- The larger the value, the more likely it is to be valid JSON

### Values 4 std above the mean
- 100% are valid JSON
- The largest non-JSON value has a length of 104653
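
The threshold comparisons above (mean, mean + 1 std, mean + 4 std) can be sketched as follows. This is an illustration under stated assumptions, not the notebook code: `json_share_above` and `_is_json` are hypothetical names, and the population standard deviation (`pstdev`) is used here, whereas a pandas-based analysis would use the sample standard deviation by default.

```python
import json
from statistics import mean, pstdev


def _is_json(value):
    """True if value parses as a JSON object or array (assumed definition)."""
    try:
        return isinstance(json.loads(value), (dict, list))
    except (TypeError, ValueError):
        return False


def json_share_above(values, k):
    """Share of valid-JSON values among those with len > mean + k * std.

    Returns None when no value exceeds the threshold.
    """
    lengths = [len(v) for v in values]
    threshold = mean(lengths) + k * pstdev(lengths)
    selected = [v for v in values if len(v) > threshold]
    if not selected:
        return None
    return sum(_is_json(v) for v in selected) / len(selected)
```

Calling `json_share_above(values, 0)`, `json_share_above(values, 1)` and `json_share_above(values, 4)` on the same sample gives the three shares that this README compares.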

### Summary
The 46745 values with the greatest value_len are all valid JSON. That is 9.35% of the filtered sample (value_len > mean) and 0.41% of the original 10% sample.
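
One way to obtain a "top N are all valid JSON" count is to sort the values by length in descending order and count until the first non-JSON value appears. The sketch below assumes that approach; `top_json_run` and `is_valid_json` are hypothetical names, and the JSON check (objects and arrays only) is an assumption.

```python
import json
from itertools import takewhile


def is_valid_json(value):
    """True if value parses as a JSON object or array (assumed definition)."""
    try:
        return isinstance(json.loads(value), (dict, list))
    except (TypeError, ValueError):
        return False


def top_json_run(values):
    """Count the longest values that are valid JSON before the first non-JSON one."""
    ordered = sorted(values, key=len, reverse=True)
    return sum(1 for _ in takewhile(is_valid_json, ordered))
```

The length of the first value that stops the run corresponds to the largest non-JSON value reported above.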