Multiple tab components #226

Merged
merged 14 commits into from
Aug 26, 2024
102 changes: 100 additions & 2 deletions README.md
@@ -587,13 +587,111 @@ trainer.fit(
)
```

**7. Tabular with a multi-target loss**
**7. A two-tower model**

This is a popular model in the context of recommendation systems. Let's say we
have a tabular dataset formed by triples (user features, item features,
target). We can create a two-tower model where the user and item features are
passed through two separate models and then "fused" via a dot product.

<p align="center">
<img width="350" src="docs/figures/arch_7.png">
</p>
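The "fusion via a dot product" idea can be sketched in plain NumPy, independently of the library. This is a minimal illustration under stated assumptions, not the `ModelFuser` implementation: each tower is a hypothetical dense network with random weights (ReLU on hidden layers, linear output), the hidden widths `[16, 8]` mirror the `mlp_hidden_dims` used in the README example, and a sigmoid turns the dot product into a purchase probability. The helper names `init_tower`, `mlp_tower` and `two_tower_proba` are made up for this sketch.

```python
import numpy as np


def init_tower(in_dim, hidden_dims, rng):
    """Random (weight, bias) pairs for a dense tower with the given widths."""
    dims = [in_dim] + list(hidden_dims)
    return [
        (rng.normal(scale=0.1, size=(d_in, d_out)), np.zeros(d_out))
        for d_in, d_out in zip(dims[:-1], dims[1:])
    ]


def mlp_tower(x, layers):
    """Forward pass: ReLU on the hidden layers, linear output layer."""
    h = x
    for i, (w, b) in enumerate(layers):
        h = h @ w + b
        if i < len(layers) - 1:
            h = np.maximum(h, 0.0)
    return h


def two_tower_proba(x_user, x_item, user_layers, item_layers):
    """Fuse the two embeddings with a row-wise dot product, then a sigmoid."""
    u = mlp_tower(x_user, user_layers)  # (n_rows, 8) user embeddings
    v = mlp_tower(x_item, item_layers)  # (n_rows, 8) item embeddings
    logits = np.sum(u * v, axis=1)      # one dot product per (user, item) row
    return 1.0 / (1.0 + np.exp(-logits))


rng = np.random.default_rng(0)
x_user = rng.normal(size=(5, 3))  # hypothetical encoded user features
x_item = rng.normal(size=(5, 3))  # hypothetical encoded item features
user_layers = init_tower(3, [16, 8], rng)
item_layers = init_tower(3, [16, 8], rng)
print(two_tower_proba(x_user, x_item, user_layers, item_layers).shape)  # (5,)
```

The dot product is only defined when both towers end in the same width (8 here), which is why the two `TabMlp` towers in the example share their last hidden dimension.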


```python
import numpy as np
import pandas as pd

from pytorch_widedeep import Trainer
from pytorch_widedeep.preprocessing import TabPreprocessor
from pytorch_widedeep.models import TabMlp, WideDeep, ModelFuser

# Let's create the interaction dataset
# user_features dataframe
np.random.seed(42)
user_ids = np.arange(1, 101)
ages = np.random.randint(18, 60, size=100)
genders = np.random.choice(["male", "female"], size=100)
locations = np.random.choice(["city_a", "city_b", "city_c", "city_d"], size=100)
user_features = pd.DataFrame(
{"id": user_ids, "age": ages, "gender": genders, "location": locations}
)

# item_features dataframe
item_ids = np.arange(1, 101)
prices = np.random.uniform(10, 500, size=100).round(2)
colors = np.random.choice(["red", "blue", "green", "black"], size=100)
categories = np.random.choice(["electronics", "clothing", "home", "toys"], size=100)

item_features = pd.DataFrame(
{"id": item_ids, "price": prices, "color": colors, "category": categories}
)

# Interactions dataframe
interaction_user_ids = np.random.choice(user_ids, size=1000)
interaction_item_ids = np.random.choice(item_ids, size=1000)
purchased = np.random.choice([0, 1], size=1000, p=[0.7, 0.3])
interactions = pd.DataFrame(
{
"user_id": interaction_user_ids,
"item_id": interaction_item_ids,
"purchased": purchased,
}
)
user_item_purchased = interactions.merge(
user_features, left_on="user_id", right_on="id"
).merge(item_features, left_on="item_id", right_on="id")

# Users
tab_preprocessor_user = TabPreprocessor(
cat_embed_cols=["gender", "location"],
continuous_cols=["age"],
)
X_user = tab_preprocessor_user.fit_transform(user_item_purchased)
tab_mlp_user = TabMlp(
column_idx=tab_preprocessor_user.column_idx,
cat_embed_input=tab_preprocessor_user.cat_embed_input,
continuous_cols=["age"],
mlp_hidden_dims=[16, 8],
mlp_dropout=[0.2, 0.2],
)

# Items
tab_preprocessor_item = TabPreprocessor(
cat_embed_cols=["color", "category"],
continuous_cols=["price"],
)
X_item = tab_preprocessor_item.fit_transform(user_item_purchased)
tab_mlp_item = TabMlp(
column_idx=tab_preprocessor_item.column_idx,
cat_embed_input=tab_preprocessor_item.cat_embed_input,
continuous_cols=["price"],
mlp_hidden_dims=[16, 8],
mlp_dropout=[0.2, 0.2],
)

two_tower_model = ModelFuser([tab_mlp_user, tab_mlp_item], fusion_method="dot")

model = WideDeep(deeptabular=two_tower_model)

trainer = Trainer(model, objective="binary")

trainer.fit(
X_tab=[X_user, X_item],
# take the target from the merged frame so its rows match X_user / X_item
target=user_item_purchased.purchased.values,
n_epochs=1,
batch_size=32,
)
```

**8. Tabular with a multi-target loss**

This one is "a bonus" to illustrate the use of multi-target losses rather
than a genuinely different architecture.

<p align="center">
<img width="200" src="docs/figures/arch_7.png">
<img width="200" src="docs/figures/arch_8.png">
</p>
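The general idea behind a multi-target loss can be sketched as a weighted sum of per-target losses. This is a NumPy illustration of the technique, not pytorch-widedeep's own multi-target loss classes: binary cross-entropy per target and equal default weights are assumptions, and the helper names are hypothetical. The two binary columns stand in for `target` and the `target2` column that the snippets script adds to the dataframe.

```python
import numpy as np


def binary_cross_entropy(y_true, y_prob, eps=1e-7):
    """Mean binary cross-entropy for a single target column."""
    y_prob = np.clip(y_prob, eps, 1.0 - eps)  # avoid log(0)
    return float(
        -np.mean(y_true * np.log(y_prob) + (1.0 - y_true) * np.log(1.0 - y_prob))
    )


def multi_target_bce(y_true, y_prob, weights=None):
    """Weighted sum of per-target BCE; inputs have shape (n_rows, n_targets)."""
    n_targets = y_true.shape[1]
    if weights is None:
        weights = [1.0] * n_targets  # equal weighting (an assumption)
    return sum(
        w * binary_cross_entropy(y_true[:, j], y_prob[:, j])
        for j, w in enumerate(weights)
    )


# Two binary targets per row, e.g. "target" and "target2"
y = np.array([[1, 0], [0, 1], [1, 1], [0, 0]], dtype=float)
p = np.array([[0.9, 0.2], [0.3, 0.7], [0.8, 0.6], [0.1, 0.4]])
print(round(multi_target_bce(y, p), 4))
```

Keeping the per-target terms separate makes it easy to re-weight one target against another without touching the model.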


2 changes: 1 addition & 1 deletion VERSION
@@ -1 +1 @@
1.6.2
1.6.3
1 change: 0 additions & 1 deletion docs/examples.rst
@@ -17,5 +17,4 @@ them to address different problems
* `HyperParameter Tuning With RayTune <https://github.com/jrzaurin/pytorch-widedeep/blob/master/examples/notebooks/10_hyperParameter_tuning_w_raytune_n_wnb.ipynb>`__
* `Model Uncertainty Prediction <https://github.com/jrzaurin/pytorch-widedeep/blob/master/examples/notebooks/13_Model_Uncertainty_prediction.ipynb>`__
* `Bayesian Models <https://github.com/jrzaurin/pytorch-widedeep/blob/master/examples/notebooks/14_bayesian_models.ipynb>`__
* `Deep Imbalanced Regression <https://github.com/jrzaurin/pytorch-widedeep/blob/master/examples/notebooks/15_DIR-LDS_and_FDS.ipynb>`__

Binary file modified docs/figures/arch_7.png
Binary file added docs/figures/arch_8.png
847 changes: 0 additions & 847 deletions examples/notebooks/15_DIR-LDS_and_FDS.ipynb

This file was deleted.

53 changes: 0 additions & 53 deletions examples/scripts/california_housing_fds_lds.py

This file was deleted.

85 changes: 84 additions & 1 deletion examples/scripts/readme_snippets.py
@@ -407,7 +407,90 @@ def output_dim(self):
)


# 7. Simply Tabular with a multi-target loss
# 7. A Two tower model
np.random.seed(42)

# user_features dataframe
user_ids = np.arange(1, 101)
ages = np.random.randint(18, 60, size=100)
genders = np.random.choice(["male", "female"], size=100)
locations = np.random.choice(["city_a", "city_b", "city_c", "city_d"], size=100)
user_features = pd.DataFrame(
{"id": user_ids, "age": ages, "gender": genders, "location": locations}
)

# item_features dataframe
item_ids = np.arange(1, 101)
prices = np.random.uniform(10, 500, size=100).round(2)
colors = np.random.choice(["red", "blue", "green", "black"], size=100)
categories = np.random.choice(["electronics", "clothing", "home", "toys"], size=100)

item_features = pd.DataFrame(
{"id": item_ids, "price": prices, "color": colors, "category": categories}
)

# Interactions dataframe
interaction_user_ids = np.random.choice(user_ids, size=1000)
interaction_item_ids = np.random.choice(item_ids, size=1000)
purchased = np.random.choice([0, 1], size=1000, p=[0.7, 0.3])
interactions = pd.DataFrame(
{
"user_id": interaction_user_ids,
"item_id": interaction_item_ids,
"purchased": purchased,
}
)
user_item_purchased = interactions.merge(
user_features, left_on="user_id", right_on="id"
).merge(item_features, left_on="item_id", right_on="id")


# Users
tab_preprocessor_user = TabPreprocessor(
cat_embed_cols=["gender", "location"],
continuous_cols=["age"],
)
X_user = tab_preprocessor_user.fit_transform(user_item_purchased)
tab_mlp_user = TabMlp(
column_idx=tab_preprocessor_user.column_idx,
cat_embed_input=tab_preprocessor_user.cat_embed_input,
continuous_cols=["age"],
mlp_hidden_dims=[16, 8],
mlp_dropout=[0.2, 0.2],
)

# Items
tab_preprocessor_item = TabPreprocessor(
cat_embed_cols=["color", "category"],
continuous_cols=["price"],
)
X_item = tab_preprocessor_item.fit_transform(user_item_purchased)
tab_mlp_item = TabMlp(
column_idx=tab_preprocessor_item.column_idx,
cat_embed_input=tab_preprocessor_item.cat_embed_input,
continuous_cols=["price"],
mlp_hidden_dims=[16, 8],
mlp_dropout=[0.2, 0.2],
)

two_tower_model = ModelFuser([tab_mlp_user, tab_mlp_item], fusion_method="dot")

model = WideDeep(deeptabular=two_tower_model)

trainer = Trainer(
model,
objective="binary",
)

trainer.fit(
X_tab=[X_user, X_item],
# take the target from the merged frame so its rows match X_user / X_item
target=user_item_purchased.purchased.values,
n_epochs=1,
batch_size=32,
)


# 8. Simply Tabular with a multi-target loss

# let's add a second target to the dataframe
df["target2"] = [random.choice([0, 1]) for _ in range(100)]
17 changes: 8 additions & 9 deletions mkdocs/mkdocs.yml
@@ -52,15 +52,14 @@ nav:
- 12_ZILNLoss_origkeras_vs_pytorch_widedeep: examples/12_ZILNLoss_origkeras_vs_pytorch_widedeep.ipynb
- 13_model_uncertainty_prediction: examples/13_model_uncertainty_prediction.ipynb
- 14_bayesian_models: examples/14_bayesian_models.ipynb
- 15_DIR-LDS_and_FDS: examples/15_DIR-LDS_and_FDS.ipynb
- 16_Self-Supervised Pre-Training pt 1: examples/16_Self_Supervised_Pretraning_pt1.ipynb
- 16_Self-Supervised Pre-Training pt 2: examples/16_Self_Supervised_Pretraning_pt2.ipynb
- 17_Usign-a-custom-hugging-face-model: examples/17_Usign_a_custom_hugging_face_model.ipynb
- 18_feature_importance_via_attention_weights: examples/18_feature_importance_via_attention_weights.ipynb
- 19_wide_and_deep_for_recsys_pt1: examples/19_wide_and_deep_for_recsys_pt1.ipynb
- 19_wide_and_deep_for_recsys_pt2: examples/19_wide_and_deep_for_recsys_pt2.ipynb
- 20_load_from_folder_functionality: examples/20_load_from_folder_functionality.ipynb
- 21-Using-huggingface-within-widedeep: examples/21_Using_huggingface_within_widedeep.ipynb
- 15_Self-Supervised Pre-Training pt 1: examples/16_Self_Supervised_Pretraning_pt1.ipynb
- 15_Self-Supervised Pre-Training pt 2: examples/16_Self_Supervised_Pretraning_pt2.ipynb
- 16_Using-a-custom-hugging-face-model: examples/17_Usign_a_custom_hugging_face_model.ipynb
- 17_feature_importance_via_attention_weights: examples/18_feature_importance_via_attention_weights.ipynb
- 18_wide_and_deep_for_recsys_pt1: examples/19_wide_and_deep_for_recsys_pt1.ipynb
- 18_wide_and_deep_for_recsys_pt2: examples/19_wide_and_deep_for_recsys_pt2.ipynb
- 19_load_from_folder_functionality: examples/20_load_from_folder_functionality.ipynb
- 20-Using-huggingface-within-widedeep: examples/21_Using_huggingface_within_widedeep.ipynb
- Contributing: contributing.md

theme: