ICML2020 Causal Inference Papers: Author Presentations (Online)

Wednesday, 2020/07/22, 19:00 - 21:00

#Algorithms, #MachineLearning

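Event details

Overview

Many papers related to causal inference were accepted at the International Conference on Machine Learning (ICML2020), a top international conference on machine learning. In this study session, the authors of those papers that were written by Japanese researchers present the background and content of their work in Japanese, with plenty of time for each talk.

The list of accepted papers is available here.

The intended audience is researchers, students, engineers, and data scientists who regularly read papers on machine learning and/or causal inference, but anyone is welcome to attend.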

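Format

We will use Zoom. Please install it yourself in advance. The URL will be sent to registrants through the connpass message function on the day of the event (7/22), before it starts. Each talk is about 40 minutes long, and the presentation materials (in Japanese or English) will be published after the event.

Q&A will be handled through sli.do. Use it as follows.

Access the URL sent through the connpass message function.
If you have a question for a presenter, post it there at any time (anonymous posting is allowed).
Questions are visible to other participants. If someone has posted a question you also want answered, you can "like" it.
After each talk, as time permits, the presenter will answer questions, starting with those that have the most likes.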

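Notes

This study session is intended for technical exchange, so we decline participation that is not aimed at sharing knowledge and interacting with other participants. If the purpose of participation is judged to be inappropriate, the organizers may cancel the registration.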

Timetable

Time	Content
19:00 - 19:05	Opening remarks and announcements
19:05 - 19:45	Talk 1: Counterfactual Cross-Validation
19:45 - 20:25	Talk 2: Few-shot Domain Adaptation by Causal Mechanism Transfer
20:25 - 21:05	Talk 3: Statistically Efficient Off-Policy Policy Gradients

Note: Breaks will be taken as needed. The time allocation and content may change on the day without notice.

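Presentation details

Title: Counterfactual Cross-Validation: Stable Model Selection Procedure for Causal Inference Models

Presenter: Yuta Saito (齋藤優太; 4th-year undergraduate, Department of Industrial Engineering and Economics, Tokyo Institute of Technology)

Profile: Works mainly on removing bias from information retrieval systems using techniques that combine causal inference and machine learning. Also collaborates with Japanese companies such as CyberAgent, Sony, ZOZO, and SMN on real-world deployment and empirical studies in the area of causal inference x machine learning.

Paper link: https://arxiv.org/abs/1909.05299

Slide link: https://speakerdeck.com/usaito/counterfactual-cross-validation-stable-model-selection-procedure-for-causal-inference-models-gong-kai-yong

Paper summary: Causal effect prediction methods formulated as machine-learning-style generalization error minimization are proliferating, and these methods have many hyperparameters. Against this background, choosing the best prediction method and hyperparameters for each setting is increasingly important, yet research on how to do so has made little progress. This work proposes a data-driven method for model selection and hyperparameter tuning of causal effect predictors that uses only observable data. Using benchmark datasets, we also demonstrate that the proposed method finds better-performing models within the candidate set than existing heuristic evaluation metrics do.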

We study the model selection problem in conditional average treatment effect (CATE) prediction. Unlike previous works on this topic, we focus on preserving the rank order of the performance of candidate CATE predictors to enable accurate and stable model selection. To this end, we analyze the model performance ranking problem and formulate guidelines to obtain a better evaluation metric. We then propose a novel metric that can identify the ranking of the performance of CATE predictors with high confidence. Empirical evaluations demonstrate that our metric outperforms existing metrics in both model selection and hyperparameter tuning tasks.
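To make the model selection problem concrete, the following minimal sketch ranks candidate CATE predictors using only observable quantities. It is not the metric proposed in the paper: it uses the simple IPW-style transformed outcome as a plug-in target, assumes a randomized treatment with known propensity 0.5, and computes expectations exactly over a tiny toy population. All names (`outcome`, `candidates`, `observable_metric`, and so on) are hypothetical.

```python
from itertools import product

# Hypothetical toy population: binary covariate x, randomized binary treatment t.
PROPENSITY = 0.5  # assumed known, as in a randomized experiment


def outcome(x, t):
    """Deterministic toy outcome; the true CATE is 1 for x=0 and 3 for x=1."""
    return 2.0 * x + t * (1.0 + 2.0 * x)


def true_cate(x):
    """Ground truth, which is never observable in real data."""
    return outcome(x, 1) - outcome(x, 0)


def transformed_outcome(y, t, e=PROPENSITY):
    """IPW-style plug-in target; its mean given x equals the true CATE."""
    return y * (t - e) / (e * (1.0 - e))


# Hypothetical candidate CATE predictors to choose between.
candidates = {
    "constant_2": lambda x: 2.0,
    "linear": lambda x: 1.0 + 1.5 * x,
    "zero": lambda x: 0.0,
}


def observable_metric(predict):
    """Expected squared error against the plug-in target (uses only y and t)."""
    total = 0.0
    for x, t in product((0, 1), (0, 1)):
        prob = 0.5 * (PROPENSITY if t == 1 else 1.0 - PROPENSITY)
        target = transformed_outcome(outcome(x, t), t)
        total += prob * (target - predict(x)) ** 2
    return total


def oracle_metric(predict):
    """Expected squared error against the true CATE (unavailable in practice)."""
    return sum(0.5 * (true_cate(x) - predict(x)) ** 2 for x in (0, 1))


ranking_observable = sorted(candidates, key=lambda k: observable_metric(candidates[k]))
ranking_oracle = sorted(candidates, key=lambda k: oracle_metric(candidates[k]))
print(ranking_observable, ranking_oracle)
```

In this toy setting the two rankings agree because the observable metric differs from the oracle metric only by a constant that does not depend on the candidate. That holds in expectation; with finite samples the plug-in target is noisy and the ranking can become unstable, which is the kind of difficulty the abstract refers to.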

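Title: Few-shot Domain Adaptation by Causal Mechanism Transfer

Presenter: Takeshi Teshima (手嶋毅志; 2nd-year doctoral student, Graduate School of Frontier Sciences, The University of Tokyo)

Profile: I conduct theoretical research on methodologies for statistical machine learning from small amounts of data. In particular, I am pursuing approaches that make use of causal information in learning.

Paper link: https://arxiv.org/abs/2002.03497

Slide link: https://takeshi-teshima.github.io/talks/2020-07-22/few-shot-domain-adaptation-by-causal-mechanism-transfer.pdf

Paper summary: This work considers how to realize domain adaptation (DA), a methodology for learning an accurate predictor even when only a small amount of data is at hand. Domain adaptation makes use of "additional data that follow a different but related probability distribution" when the data at hand are limited. The most important question in developing a domain adaptation method is what relationship to assume between the different datasets as their "relatedness" (the transfer assumption). This work explores the possibility of using, as the transfer assumption, that "the causal model behind the data distributions is shared." As a hypothetical application, suppose we want to learn a predictor of disease from medical records, specialized to one region. Even if we try to use data from other regions, the data distributions themselves may differ greatly, for example because lifestyles differ. Nevertheless, we can expect the same disease to have the same mechanism regardless of region. The ideal goal of this research is to provide a method that learns accurately in such situations by exploiting the prior knowledge that "the underlying causal mechanism is identical." Specifically, the paper assumes that a structural causal model lies behind the probability distribution of each domain and proposes the transfer assumption that "its structural equations are shared across domains." We develop a domain adaptation method that exploits this assumption, clarify through theoretical analysis how the proposed method contributes to statistical learning, and report proof-of-concept experiments on real data that confirm the validity of the method.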

We study few-shot supervised domain adaptation (DA) for regression problems, where only a few labeled target domain data and many labeled source domain data are available. Many of the current DA methods base their transfer assumptions on either parametrized distribution shift or apparent distribution similarities, e.g., identical conditionals or small distributional discrepancies. However, these assumptions may preclude the possibility of adaptation from intricately shifted and apparently very different distributions. To overcome this problem, we propose mechanism transfer, a meta-distributional scenario in which a data generating mechanism is invariant among domains. This transfer assumption can accommodate nonparametric shifts resulting in apparently different distributions while providing a solid statistical basis for DA. We take the structural equations in causal modeling as an example and propose a novel DA method, which is shown to be useful both theoretically and experimentally. Our method can be seen as the first attempt to fully leverage the structural causal models for DA.
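As a rough illustration of how a shared-mechanism assumption could help when target data are scarce, the sketch below recombines independent noise terms recovered from a few target samples and maps them back through the mechanism to obtain more training points. This is only one possible use of the assumption and is not presented as the paper's procedure. The structural equations are hypothetical and taken as known here, whereas in practice the mechanism would have to be estimated from source-domain data.

```python
from itertools import product

# Assumed shared structural equations (hypothetical, taken as known here):
#   x = s1,  y = 2 * x + s2,  with independent noise terms s1 and s2.


def mechanism(s1, s2):
    """Map independent noise terms to an observable (x, y) pair."""
    x = s1
    y = 2.0 * x + s2
    return (x, y)


def inverse_mechanism(x, y):
    """Recover the noise terms that generated an observed (x, y) pair."""
    return (x, y - 2.0 * x)


def augment(target_samples):
    """Recombine noise terms across samples and push them through the mechanism."""
    noises = [inverse_mechanism(x, y) for x, y in target_samples]
    s1_values = [s1 for s1, _ in noises]
    s2_values = [s2 for _, s2 in noises]
    # Independence of s1 and s2 justifies pairing any s1 with any s2.
    return [mechanism(s1, s2) for s1, s2 in product(s1_values, s2_values)]


few_target_samples = [(0.0, 1.0), (1.0, 1.0), (2.0, 5.0)]
augmented = augment(few_target_samples)
print(len(augmented), augmented)
```

With n target samples and two noise terms, the recombination yields n^2 points, including the original ones. A regressor could then be fit on the augmented set. The quality of such augmentation depends entirely on how well the mechanism is known or estimated.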

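Title: Statistically Efficient Off-Policy Policy Gradients

Presenter: Masatoshi Uehara (上原雅俊; 3rd-year doctoral student, Department of Statistics, Harvard University)

Profile: Mainly researches the boundary between reinforcement learning and causal inference.

Paper links: 1. https://arxiv.org/abs/2002.04014 2. https://arxiv.org/abs/1908.08526

Paper summary: In applied fields such as medicine and economics, methods that use past time-series data to evaluate policies and learn an optimal policy (off-policy evaluation and off-policy learning) are becoming increasingly important. Theoretical research is also very active in the causal inference and reinforcement learning communities. However, well-known existing estimators (sequential IPW and marginal structural models) suffer from the curse of horizon: the estimation error blows up exponentially as the horizon grows. In micro-randomized trials using mobile apps, for example, the horizon reaches the hundreds, and the curse of horizon becomes a serious problem. This work derives asymptotic lower bounds on the error of policy evaluation and policy gradient estimation in MDPs and shows that these lower bounds in fact depend on the horizon polynomially rather than exponentially. We then propose estimators that achieve the lower bounds, (partially) resolving the curse of horizon. The proposed method is a meta-algorithm that combines two nuisance estimators (the marginal density ratio and the Q-function) and has a robustness property known as double robustness. We also establish the convergence rate of a learning method that uses the proposed policy gradient estimator and show that this rate, too, can be made to depend polynomially on the horizon. Time permitting, the talk will also give an overview of how this work relates to important recent papers on off-policy learning in the causal inference and reinforcement learning communities.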
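The curse of horizon for sequential importance weighting can be seen in a small exact calculation. The sketch below assumes hypothetical two-action policies that do not depend on the state, so the per-step importance weights are independent and identically distributed. It then computes the variance of their product, which is the cumulative weight a sequential IPW estimator multiplies into the return.

```python
# Hypothetical two-action policies, identical at every time step.
behavior_policy = [0.5, 0.5]  # policy that generated the logged data
target_policy = [0.8, 0.2]    # policy we want to evaluate


def per_step_second_moment(target, behavior):
    """E_b[(pi_e(a) / pi_b(a))^2] for a single time step."""
    return sum(pe * pe / pb for pe, pb in zip(target, behavior))


def cumulative_weight_variance(horizon):
    """Variance of the product of `horizon` independent per-step weights.

    Each weight has mean 1 under the behavior policy, so the variance of
    the product is (second moment) ** horizon - 1.
    """
    return per_step_second_moment(target_policy, behavior_policy) ** horizon - 1.0


variances = {h: cumulative_weight_variance(h) for h in (1, 10, 100)}
for horizon, variance in variances.items():
    print(horizon, variance)
```

Even with this mild mismatch between the two policies, the variance grows geometrically in the horizon, which is why horizons in the hundreds are problematic for estimators built on cumulative weights.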

1. Policy gradient methods in reinforcement learning update policy parameters by taking steps in the direction of an estimated gradient of policy value. In this paper, we consider the statistically efficient estimation of policy gradients from off-policy data, where the estimation is particularly non-trivial. We derive the asymptotic lower bound on the feasible mean-squared error in both Markov and non-Markov decision processes and show that existing estimators fail to achieve it in general settings. We propose a meta-algorithm that achieves the lower bound without any parametric assumptions and exhibits a unique 3-way double robustness property. We discuss how to estimate nuisances that the algorithm relies on. Finally, we establish guarantees on the rate at which we approach a stationary point when we take steps in the direction of our new estimated policy gradient.

2. Off-policy evaluation (OPE) in reinforcement learning allows one to evaluate novel decision policies without needing to conduct exploration, which is often costly or otherwise infeasible. We consider for the first time the semiparametric efficiency limits of OPE in Markov decision processes (MDPs), where actions, rewards, and states are memoryless. We show existing OPE estimators may fail to be efficient in this setting. We develop a new estimator based on cross-fold estimation of q-functions and marginalized density ratios, which we term double reinforcement learning (DRL). We show that DRL is efficient when both components are estimated at fourth-root rates and is also doubly robust when only one component is consistent. We investigate these properties empirically and demonstrate the performance benefits due to harnessing memorylessness.
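The double robustness mentioned in the abstracts can be checked exactly in a one-step simplification. The sketch below is not the DRL estimator from the papers: it assumes a hypothetical bandit with two actions and no state, and computes the exact expectation of a standard doubly robust estimate when either the density ratio model or the Q-function model is misspecified.

```python
# Hypothetical one-step problem (a bandit with two actions and no state).
behavior_policy = [0.5, 0.5]
target_policy = [0.8, 0.2]
true_reward = [1.0, 0.0]  # expected reward of each action

true_ratio = [pe / pb for pe, pb in zip(target_policy, behavior_policy)]
true_value = sum(pe * r for pe, r in zip(target_policy, true_reward))


def expected_dr_estimate(ratio_model, q_model):
    """Exact expectation of the doubly robust estimate under the behavior policy.

    Per-sample estimate: ratio(a) * (r - q(a)) + sum over a' of pi_e(a') * q(a').
    """
    correction = sum(
        pb * ratio_model[a] * (true_reward[a] - q_model[a])
        for a, pb in enumerate(behavior_policy)
    )
    direct = sum(pe * q_model[a] for a, pe in enumerate(target_policy))
    return correction + direct


wrong_ratio = [1.0, 1.0]  # deliberately misspecified density ratio
wrong_q = [0.3, 0.3]      # deliberately misspecified Q-function

only_ratio_correct = expected_dr_estimate(true_ratio, wrong_q)
only_q_correct = expected_dr_estimate(wrong_ratio, true_reward)
both_wrong = expected_dr_estimate(wrong_ratio, wrong_q)
print(true_value, only_ratio_correct, only_q_correct, both_wrong)
```

The estimate stays unbiased when either component is correct and becomes biased only when both are wrong. The multi-step MDP setting treated in the papers replaces the one-step ratio with marginalized density ratios and adds cross-fold estimation, neither of which is shown here.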