Data lab #8
base: master
Conversation
autokaggle/auto_ml.py
# TODO: Further clean the design of this file
class AutoKaggle(BaseEstimator):
    pipeline = None
Move the class variables to instance variables.
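A minimal sketch of the requested change, assuming the attributes visible in the diff (`pipeline`, `m_hparams_base`, `p_hparams_base`) are only used as per-instance state:

```python
from sklearn.base import BaseEstimator


class AutoKaggle(BaseEstimator):
    # TODO: Further clean the design of this file
    def __init__(self, config=None, **kwargs):
        # Former class variables become per-instance attributes set in __init__.
        self.config = config
        self.pipeline = None
        self.m_hparams_base = None
        self.p_hparams_base = None
```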
autokaggle/auto_ml.py
    p_hparams_base = None

    def __init__(self, config=None, **kwargs):
        """
Follow autokeras doc string style.
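A hedged sketch of the constructor documented with Google-style sections (`Args:`), as used in autokeras; the descriptions of `config` and `**kwargs` are assumptions:

```python
def __init__(self, config=None, **kwargs):
    """Initialize the AutoKaggle estimator.

    Args:
        config: A configuration object holding the search settings.
            Defaults to None.
        **kwargs: Additional configuration options.
    """
```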
autokaggle/auto_ml.py
import hyperopt
from hyperopt import tpe, hp, fmin, Trials, STATUS_OK, STATUS_FAIL
from sklearn.model_selection import cross_val_score
from autokaggle.ensemblers import RankedEnsembler, StackingEnsembler
Import modules instead of classes.
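One way to follow the suggestion, importing modules and qualifying names at the point of use (a sketch, not the final import list):

```python
from sklearn import model_selection
from autokaggle import ensemblers

# Call sites then qualify the names, e.g.
#   model_selection.cross_val_score(estimator, x, y)
#   ensemblers.RankedEnsembler / ensemblers.StackingEnsembler
```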
autokaggle/auto_ml.py
    m_hparams_base = None
    p_hparams_base = None

    def __init__(self, config=None, **kwargs):
Explicitly list all the arguments instead of using kwargs.
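A sketch of the constructor with every option spelled out; apart from `config`, the parameter names here are placeholders, not the project's actual options:

```python
def __init__(self, config=None, verbose=True, time_limit=None):
    # Every accepted option becomes an explicit, documented parameter instead
    # of being unpacked from **kwargs. `verbose` and `time_limit` are
    # illustrative names only.
    self.config = config
    self.verbose = verbose
    self.time_limit = time_limit
```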
autokaggle/auto_ml.py
            x: A numpy.ndarray instance containing the training data.
            y: training label vector.
            time_limit: remaining time budget.
            data_info: meta-features of the dataset, which is an numpy.ndarray describing the
A list of strings (specify the type).
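The Args section could then read as below, combining the original wording with the corrected type:

```python
"""
Args:
    x: A numpy.ndarray instance containing the training data.
    y: Training label vector.
    time_limit: Remaining time budget.
    data_info: A list of strings describing the meta-features of the
        dataset.
"""
```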
autokaggle/auto_ml.py
        self.pipeline.fit(x_train, y_train)

    def resample(self, x, y):
        if self.config.balance_class_dist:
Add doc strings.
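A possible doc string, inferred from the visible `balance_class_dist` check and the `return x, y` at the end of the method:

```python
def resample(self, x, y):
    """Resample the training data.

    If `config.balance_class_dist` is enabled, the samples are re-drawn so
    that the class distribution is roughly balanced; otherwise x and y are
    returned unchanged.

    Args:
        x: A numpy.ndarray instance containing the training data.
        y: Training label vector.

    Returns:
        The (possibly resampled) x and y.
    """
```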
        return x, y

    def subsample(self, x, y, sample_percent):
        # TODO: Add way to balance the subsample
Add doc string to subsample.
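Similarly for `subsample`, assuming `sample_percent` is the fraction of rows kept:

```python
def subsample(self, x, y, sample_percent):
    """Subsample a fraction of the training data.

    Args:
        x: A numpy.ndarray instance containing the training data.
        y: Training label vector.
        sample_percent: Fraction of the samples to keep.

    Returns:
        The subsampled grid_train_x and grid_train_y.
    """
    # TODO: Add way to balance the subsample
```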
autokaggle/auto_ml.py
        return grid_train_x, grid_train_y

    def search(self, x, y, prep_space, model_space):
        grid_train_x, grid_train_y = self.subsample(x, y, sample_percent=self.config.subsample_ratio)
Set the maximum line length to 85, and check it in CI using flake8.
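The flagged line is well over 85 characters. flake8 enforces the limit once `max-line-length = 85` is configured (e.g. in `setup.cfg` or `.flake8`), and the call itself can simply be wrapped:

```python
def search(self, x, y, prep_space, model_space):
    grid_train_x, grid_train_y = self.subsample(
        x, y, sample_percent=self.config.subsample_ratio)
```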
autokaggle/auto_ml.py
        np.random.shuffle(best_trials)

        if self.config.diverse_ensemble:
            estimator_list = self.pick_diverse_estimators(best_trials, self.config.num_estimators_ensemble)
Remove the second argument.
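A sketch of the change, with the method reading the ensemble size from `self.config` so the call site passes only the trials:

```python
def pick_diverse_estimators(self, best_trials):
    # The ensemble size now comes from the config rather than a second
    # argument; the call site becomes
    #     estimator_list = self.pick_diverse_estimators(best_trials)
    num_estimators_ensemble = self.config.num_estimators_ensemble
    # ... existing selection logic, unchanged ...
```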
autokaggle/auto_ml.py
        return np.array(data_info)


class AutoKaggleClassifier(AutoKaggle):
Rename to "Classifier".
autokaggle/auto_ml.py
        return score_metric, skf


class AutoKaggleRegressor(AutoKaggle):
Rename to "Regressor".
        self.ensembling_algo = hyperopt.rand.suggest if ensembling_algo == 'random' else hyperopt.tpe.suggest
        self.num_p_hparams = num_p_hparams

    def update(self, options):
Add doc string.
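A possible doc string, assuming `options` is the dictionary whose items feed the `setattr(self, k, v)` call shown in the next excerpt:

```python
def update(self, options):
    """Update the configuration in place.

    Args:
        options: A dictionary mapping configuration attribute names to new
            values; each pair is applied to the config via setattr.
    """
```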
autokaggle/config.py
        setattr(self, k, v)


knn_classifier_params = {
Use all capital letters for constants.
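The rename would look like the following; the dictionary contents here are illustrative only, and the real hyperparameter entries stay as they are:

```python
from hyperopt import hp

# Module-level constants use all capital letters (PEP 8).
KNN_CLASSIFIER_PARAMS = {
    # Illustrative entry only; the actual search-space definition is unchanged.
    'n_neighbors': hp.choice('n_neighbors', list(range(1, 30))),
}
```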
autokaggle/ensemblers.py
}


class RankedEnsembler:
Extract a base class whose methods raise NotImplementedError.
- Extend the object class.
- Rename to RankEnsembleModel.
- Add doc strings.
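A sketch of the requested refactor. The base-class name and the `fit`/`predict` interface are assumptions; only `RankEnsembleModel` comes from the review:

```python
class EnsembleModel(object):
    """Base class for ensemblers.

    Concrete ensemblers must implement fit and predict.
    """

    def fit(self, predictions, y):
        raise NotImplementedError

    def predict(self, predictions):
        raise NotImplementedError


class RankEnsembleModel(EnsembleModel):
    """Ensembler that combines the top-ranked estimators (was RankedEnsembler)."""
```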
        self.stacking_estimator = self.search(predictions, y_val)
        self.stacking_estimator.fit(predictions, y_val)

    def search(self, x, y):
Add a doc string.
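A possible doc string, inferred from the call site where the result is assigned to `self.stacking_estimator` and then fit on the base-model predictions:

```python
def search(self, x, y):
    """Search for a stacking estimator to combine the base predictions.

    Args:
        x: Predictions of the base estimators, used as the stacker's
            training features.
        y: The corresponding validation labels.

    Returns:
        The selected stacking estimator (the caller fits it on x and y).
    """
```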
autokaggle/preprocessor.py
LEVEL_HIGH = 32


class TabularPreprocessor(TransformerMixin):
Rename
No description provided.