
CTGenerator

oschulte edited this page Jul 4, 2017 · 5 revisions

Functionality

Solves the Contingency Table Problem described in Qian et al. CIKM 2014. Implements the solution in that paper, which uses the Fast Moebius Transform.
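
The core idea of the Moebius step can be illustrated on relationship indicators alone. Counts where a relationship is true (T) or unspecified (*) are cheap to obtain, and counts where it is false (F) follow by subtraction, N(R = F) = N(R = *) - N(R = T), applied one relationship dimension at a time. The sketch below is a simplified, hypothetical illustration: the function name is invented, and the attribute columns that a real contingency table carries are omitted. It is not the CTGenerator code.

```python
def fast_moebius_transform(star_counts, k):
    """Turn counts indexed by {'T','*'}^k into counts indexed by {'T','F'}^k.

    Hypothetical helper: star_counts maps a k-tuple over {'T', '*'} to a count,
    where '*' means "relationship value unspecified" (true or false).
    """
    table = dict(star_counts)  # copy so the caller's table is untouched
    for i in range(k):
        # One pass per relationship: afterwards position i holds 'T' or 'F'.
        new_table = {}
        for key, count in table.items():
            if key[i] == 'T':
                new_table[key] = count
            else:  # key[i] == '*', since position i has not been processed yet
                true_key = key[:i] + ('T',) + key[i + 1:]
                false_key = key[:i] + ('F',) + key[i + 1:]
                # N(R_i = F) = N(R_i = *) - N(R_i = T), other positions fixed
                new_table[false_key] = count - table[true_key]
        table = new_table
    return table


# Toy counts for two relationships R1, R2 over 20 groundings.
counts = {('T', 'T'): 2, ('T', '*'): 5, ('*', 'T'): 4, ('*', '*'): 20}
print(fast_moebius_transform(counts, 2))
# -> {('T', 'T'): 2, ('T', 'F'): 3, ('F', 'T'): 2, ('F', 'F'): 13}
```

Each of the k passes touches every row once, so the work is on the order of k·2^k row updates rather than one full join per combination of true and false relationships.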

Input

Required Arguments

The procedure is passed connection objects for different databases.

  • con_std connects to a data_db database with the original data (e.g. unielwin_std). [This should be renamed con_data.]
  • con_setup connects to a metadata database setup_db (e.g. unielwin_std_setup). The metadata comprise first-order random variables, called functor nodes (e.g. 1Nodes, RNodes, FNodes). Optional arguments (tables in setup_db):
    • FunctorSet: restricts the computation to the functor nodes listed in it. Default: contains all functor nodes.
    • Groundings: contains population variables (e.g. Student). The contingency tables are expanded with the corresponding entity IDs (e.g. student-id), so that the computation returns counts for individuals. Default: empty.
  • con_bn connects to a bn_db database that contains metadata for learning (e.g. the lattice of relationship chains).
  • con_ct connects to a ct_db database that holds the contingency tables constructed by the dynamic programming algorithm. [bn_db and ct_db should be merged.]

Output

  • After running CTGenerator, ct_db contains the contingency table for the first-order random variables listed in setup_db.FunctorSet, computed from the data in data_db. If setup_db.Groundings contains population variables, the contingency table lists counts for each tuple of population members.
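
To picture the effect of Groundings, the in-memory sketch below counts rows by their functor values and, when a grounding is given, prepends the entity-id column to the count key, so counts become per individual. The column names (s_id, c_id, grade) and the function are hypothetical, only Registered = T rows are shown, and the real tables live in ct_db as SQL tables rather than Python objects.

```python
from collections import Counter


def contingency_table(rows, functors, grounding=None):
    """Count rows by the values of `functors` (hypothetical helper).

    If `grounding` names an entity-id column, it is prepended to the key,
    mimicking a Groundings entry: counts are then reported per individual.
    """
    key_cols = ([grounding] if grounding else []) + list(functors)
    return Counter(tuple(row[col] for col in key_cols) for row in rows)


# Toy Registered(S, C) rows; only the Registered = T case is shown.
registered = [
    {"s_id": "s1", "c_id": "c1", "grade": "A"},
    {"s_id": "s1", "c_id": "c2", "grade": "B"},
    {"s_id": "s2", "c_id": "c1", "grade": "A"},
    {"s_id": "s2", "c_id": "c3", "grade": "A"},
]

print(contingency_table(registered, ["grade"]))                    # population-level counts
print(contingency_table(registered, ["grade"], grounding="s_id"))  # counts per student
```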

Program Flow

Assumes the following steps have already been taken:

  1. Run the script transfer.sql, which transfers metadata from setup_db to bn_db.
  2. Generate the relationship chain lattice in bn_db.
  3. Generate further metadata using metadata_3.sql, or a variant of it, depending on whether link analysis is on or off.
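
The relationship chain lattice from step 2 can be sketched as follows, assuming a chain is a set of relationships connected through shared population variables and that chains of length l + 1 extend chains of length l by one connected relationship. The function name, the relationship names, and the dictionary representation are hypothetical; in CTGenerator the lattice is stored as tables in bn_db.

```python
def relationship_chain_lattice(relationships):
    """Group connected sets of relationships by size (hypothetical helper).

    relationships: dict mapping a relationship name to the set of
    population variables it connects, e.g. {"Registered": {"S", "C"}}.
    Returns a dict: chain length -> set of frozensets of relationship names.
    """
    lattice = {1: {frozenset([r]) for r in relationships}}
    length = 1
    while lattice[length]:
        next_level = set()
        for chain in lattice[length]:
            chain_vars = set().union(*(relationships[r] for r in chain))
            for r, r_vars in relationships.items():
                # Extend only with a new relationship sharing a population variable.
                if r not in chain and r_vars & chain_vars:
                    next_level.add(chain | {r})
        length += 1
        lattice[length] = next_level
    del lattice[length]  # drop the final, empty level
    return lattice


rels = {"Registered": {"S", "C"}, "RA": {"P", "S"}, "Teaches": {"P", "C"}}
for size, chains in relationship_chain_lattice(rels).items():
    print(size, sorted(sorted(c) for c in chains))
```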

Then does the following:

  1. Builds contingency tables for each population variable (BuildCT_Pvars).
  2. For each relationship chain length, builds the contingency tables for chains of that length (BuildCT_Rnodes_join).

If link analysis is off, the procedure uses simple table joins. If it is on, it performs a virtual join using the Moebius Transform.
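
The difference between the two modes can be shown for a single relationship. With link analysis off, joining Registered with its entity tables yields counts for Registered = T only. With it on, counts for Registered = F are also needed; the sketch obtains them without enumerating unregistered pairs, taking the product of the entity-table marginals (the Registered = * count) and subtracting the Registered = T count. This is a toy, single-relationship illustration with hypothetical names and data; the Moebius Transform generalizes the same subtraction to chains of several relationships.

```python
from collections import Counter

students = {"s1": "hi", "s2": "lo", "s3": "hi"}  # s_id -> intelligence
courses = {"c1": "easy", "c2": "hard"}           # c_id -> difficulty
registered = [("s1", "c1"), ("s2", "c1"), ("s3", "c2")]


def ct_true(students, courses, registered):
    """Link analysis off: plain join of Registered with its entity tables."""
    return Counter((students[s], courses[c]) for s, c in registered)


def ct_with_false(students, courses, registered):
    """Link analysis on: add Registered = F rows without joining the complement."""
    true_counts = ct_true(students, courses, registered)
    s_marg = Counter(students.values())
    c_marg = Counter(courses.values())
    table = {}
    for i_val, n_s in s_marg.items():
        for d_val, n_c in c_marg.items():
            star = n_s * n_c  # all (student, course) pairs with these values
            t = true_counts[(i_val, d_val)]  # Counter returns 0 if absent
            table[(i_val, d_val, "T")] = t
            table[(i_val, d_val, "F")] = star - t
    return table


print(ct_true(students, courses, registered))
print(ct_with_false(students, courses, registered))
```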

TODO

  • Make this a self-contained repository.
  • Add screenshots
  • Add a gallery of examples