Currently focusing on broad classification of astronomical objects. Project in hibernation as of May 2018. Like a bear.
- Python 3
- SQLite command line tool (optional)
- Set the LCML environment variable to the repo checkout's path (e.g., export LCML=/Users/*/code/light_curve_ml)
- cd $LCML && pip install -e . --user
See instructions in conf/dev/ubuntu_install.txt
Supervised and unsupervised machine learning pipelines are run via the
run_pipeline.py entry point. It expects the path to a job (config) file and a
file name for logger output. For example:
```
python3 lcml/pipeline/run_pipeline.py --path conf/local/supervised/macho.json --logFileName super_macho.log
```
The pipeline expects a job file (macho.json in the above example) specifying the
configuration of the pipeline and a detailed declaration of experiment parameters.
The specified job file supersedes and overrides the default job file,
conf/common/pipeline.json, recursively on a per-field basis, so any, all, or none
of the default fields may be overridden.
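The per-field recursive override behaves like an ordinary recursive dict merge. The sketch below illustrates the idea only; it is not the repo's actual merge code:

```python
def merge_job_config(default: dict, override: dict) -> dict:
    """Overlay `override` onto `default`, recursing into nested dicts.

    Illustrative sketch; the pipeline's real merge logic lives in the repo.
    """
    merged = dict(default)
    for key, value in override.items():
        if isinstance(value, dict) and isinstance(merged.get(key), dict):
            # Descend into nested sections so sibling defaults survive
            merged[key] = merge_job_config(merged[key], value)
        else:
            # Values from the job file win at the leaves
            merged[key] = value
    return merged
```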
Job files have the following structure:
- globalParams - Parameters used across multiple pipeline stages
- database - All db config and table names
- loadData - Stage converting raw data into coherent light curves
- preprocessData - Stage cleaning and preprocessing light curves
- extractFeatures - Stage extracting features from cleaned light curves
- postprocessFeatures - Stage further processing extracted features
- modelSearch - Stage testing several ML models with differing hyperparameters
  - function - search function name
  - model - ML model spec including non-searched parameters
  - params - parameters controlling the model search
- serialization - Stage persisting ML model and metadata to disk
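Put together, a job file's top-level shape looks roughly like the following sketch. All values here are illustrative placeholders, not the defaults from conf/common/pipeline.json:

```json
{
  "globalParams": { "dataLimit": 1000 },
  "database": { "dbPath": "data/lcml.db" },
  "loadData": { "skip": false, "params": {}, "writeTable": "raw_lcs" },
  "preprocessData": { "skip": false, "params": {}, "writeTable": "clean_lcs" },
  "extractFeatures": { "skip": false, "params": {}, "writeTable": "features" },
  "postprocessFeatures": { "skip": true, "params": {} },
  "modelSearch": {
    "function": "gridSearch",
    "model": { "name": "randomForest" },
    "params": { "folds": 5 }
  },
  "serialization": { "params": { "modelSavePath": "models/rf.pkl" } }
}
```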
Pipeline 'stages' are customizable processors. Each stage definition has the following components:
- skip - Boolean determining whether the stage should execute
- params - stage-specific parameters
- writeTable - name of the db table to which output is written
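For instance, a single stage entry inside a job file might read as follows; the parameter names are hypothetical:

```json
{
  "preprocessData": {
    "skip": false,
    "params": { "removeBogusValues": true, "stdLimit": 5 },
    "writeTable": "clean_light_curves"
  }
}
```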
Some representative job files provided in this repo include:
- local/supervised/fast_macho.json - Runs a tiny portion of the MACHO dataset through all supervised stages. Useful for pipeline debugging and integration testing.
- local/supervised/macho.json - Full supervised learning pipeline for the MACHO dataset. Uses the feets library for feature extraction and random forests for classification.
- local/supervised/ogle3.json - Ditto for OGLE3
- local/unsupervised/macho.json - Unsupervised learning pipeline for MACHO focused on Mini-batch KMeans and Agglomerative clustering
- lcml.data.acquisistion - Scripts used to acquire and/or process various datasets including MACHO, OGLE3, Catalina, and Gaia
- lcml.poc - One-off proof-of-concept scripts for various libraries
The LoggingManager class allows for convenient customization of Python Logger
objects. The default logging config is specified in conf/common/logging.json.
This config should contain the following main keys:
- basicConfig - values passed to logging.basicConfig
- handlers - handler definitions with a type attribute, which may be either stream or file
- modules - list of module-specific logger level settings
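For orientation, a config with these keys could look roughly like this; the specific values and handler fields are assumptions, not the shipped defaults:

```json
{
  "basicConfig": {
    "level": "INFO",
    "format": "%(asctime)s - %(name)s - %(levelname)s - %(message)s"
  },
  "handlers": [
    { "type": "stream" },
    { "type": "file" }
  ],
  "modules": [
    { "name": "lcml.pipeline", "level": "DEBUG" }
  ]
}
```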
Main modules should initialize the manager by invoking LoggingManager.initLogging
at the start of execution before logger objects have been created.
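A minimal usage sketch follows; the import path and the initLogging signature are assumptions, so check the repo for the real ones:

```python
import logging

# Import path is an assumption; locate LoggingManager in the repo.
from lcml.utils.logging_manager import LoggingManager

def main():
    # Configure logging first, before any Logger objects are created.
    # The keyword argument is assumed; the actual signature may differ.
    LoggingManager.initLogging(logFileName="run.log")

    logger = logging.getLogger(__name__)
    logger.info("Logging initialized")

if __name__ == "__main__":
    main()
```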