This article is the 19th-day entry of the BASE Advent Calendar 2022.

Introduction

Hello from the DataStrategy team! In this post, I'll introduce our work on using a machine learning model to infer the categories of the products handled by shops created on BASE.

Note: the code in this article has been simplified for illustration.

TL;DR

- We built a model that predicts the category (fashion, food, etc.) of products registered in shops created on BASE.
- The classifier is a fine-tuned BERT model (using product images is future work).
- We used AWS Ground Truth for annotation.
- We built one binary classifier per category and assign multiple labels across the 8 categories.
- Models are trained on an in-house on-premises GPU machine, and inference runs on AWS Batch.

Product categories

Every day, shops created on BASE register and sell tens of thousands of products of every kind, including fashion items, interior goods, and food.
However, it is not easy to know which category each of these products belongs to. For detailed analyses such as "How many fashion products have been sold through BASE?" or "What kinds of products tend to be involved in fraudulent payments?", we currently often check products one by one by hand, or fall back on the shop category set by each shop.

Unfortunately, a shop's category does not necessarily match the categories of the products it sells. Moreover, setting a shop category is optional, and many shops have not set one at all.

For this reason, the DataStrategy team built a new batch processing system that uses a machine learning model to automatically infer the categories of the products registered in BASE shops each day.

Building the dataset

To build a machine learning model, we need datasets for training and testing. When we started, there was no reasonably large dataset with accurately labeled product categories, so we began by creating one.

Designing the label set

Before creating the dataset, we first decided what labels to assign to products. Starting from the existing shop categories and looking through actual products, we organized the labels into 8 major categories (interior, fashion, sports, electronics, cosmetics, food and drink, services) and roughly 100 mid-level categories beneath them (cookware, T-shirts, golf equipment, sweets, etc.), aiming to keep gaps and overlaps to a minimum. We then decided to start by building a classifier for just the 8 major categories.

Sampling the data

We created the training and test datasets by randomly sampling products already published in BASE shops and annotating them. To avoid skewing toward products of any particular season, we sampled evenly from a period spanning more than one year.
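One way to sample "evenly" across a period of more than a year is to draw the same number of products from each calendar month. The sketch below is a minimal illustration under assumptions: each product is represented as an `(item_id, published_date)` pair, a fixed per-month quota is acceptable, and `sample_evenly_by_month` is a hypothetical helper. The article does not say exactly how its sampling was implemented.

```python
import random
from collections import defaultdict
from datetime import date


def sample_evenly_by_month(
    items: list[tuple[str, date]], per_month: int, seed: int = 0
) -> list[str]:
    """Draw up to `per_month` item IDs from every calendar month in `items`.

    Using the same quota for each month keeps seasonal products
    from dominating the sample.
    """
    by_month: dict[tuple[int, int], list[str]] = defaultdict(list)
    for item_id, published in items:
        by_month[(published.year, published.month)].append(item_id)

    rng = random.Random(seed)  # fixed seed for a reproducible sample
    sample: list[str] = []
    for month in sorted(by_month):
        bucket = by_month[month]
        # Months with fewer items than the quota contribute all of them.
        sample.extend(rng.sample(bucket, min(per_month, len(bucket))))
    return sample
```

For example, with 14 months of data, at least three products per month, and `per_month=3`, the function returns 42 IDs, three from each month.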
On top of that, to secure enough samples for all 8 major categories while keeping annotation costs down, we added a step that filters out products classified into categories that already had enough samples (described below).

Annotation with AWS Ground Truth

We annotated the dataset using the vendor workforce of AWS Ground Truth. We chose AWS Ground Truth because we already use AWS routinely, so there was little new to learn; because its S3 integration makes datasets and workflows easy to manage; and because AWS pre-screens the vendor workforce for quality and security procedures.

We wanted annotators to judge each product from both its text and its image, but the labeling tools available in Ground Truth support only one of the two, text or images. As a workaround, we created a single image per product by adding the product title and description as a caption to the product image *1, and set up the annotation job as a single-image multi-label *2 classification task.

(Figure: a sample product image captioned for annotation)

Incidentally, when you create a Ground Truth job, the setting for the number of workers is collapsed by default, but we recommend adjusting it based on the task difficulty and the accuracy you need. (Its default value is 3, so if you forget to change it, the job can end up costing roughly three times as much money and time as the dataset size alone would suggest.)

(Figure: the Ground Truth worker count setting)

Filtering annotation targets

If we simply annotate randomly sampled data, annotation effort concentrates on fashion products, which make up the majority of products on BASE. Fashion products, however, are relatively easy to classify and do not need that many samples. Conversely, securing enough samples for the hard-to-classify minority categories would require annotating a huge amount of data, which is not cost-effective.

So we built classifiers one by one, starting with the categories for which we had already secured enough samples. Whenever a model's test performance was good enough, we used it to exclude products predicted to belong to that category from subsequent annotation rounds.

Because the category distribution of the training data obtained this way differs greatly from the true prior distribution of categories, we adjusted sample sizes when building minibatches during training.

Training and testing the model

Fine-tuning BERT

For the classifier, we decided to fine-tune a pretrained BERT model. Advantages of BERT and other neural-network-based models include:

- They can take context and word order in the text into account.
- They do not require elaborate preprocessing.
- They are widely used in competitions such as Kaggle, and their strong performance on classification tasks is well established.
- Easy-to-use libraries exist (mainly Hugging Face Transformers), making it easy to swap base models or tasks and to modify models.
- Easy-to-use pretrained Japanese models are available.

We have not done it yet, but we are also considering eventually turning this into a multimodal model that uses product images as well as text *3.

We split the labeled dataset into train/eval *4, lightly preprocess the text formed by concatenating each product's title and description, wrap it in a PyTorch Dataset, and then pass it to the transformers Trainer class together with the base model and various parameters. Training was done on an in-house on-premises GPU machine (RTX 3090).
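The train/eval split is done per shop (see footnote *4), so that no shop's products appear on both sides. A minimal sketch of such a group-wise holdout split, plus the title/description concatenation, in plain Python. The field names `shop_id`, `title`, and `description` and both helper functions are hypothetical, and a single shuffled holdout is used here instead of the k folds of scikit-learn's `GroupKFold`.

```python
import random
from collections import defaultdict


def split_by_shop(rows: list[dict], eval_ratio: float = 0.2, seed: int = 0):
    """Split rows into (train, eval) so that no shop_id appears on both sides."""
    by_shop: dict = defaultdict(list)
    for row in rows:
        by_shop[row["shop_id"]].append(row)

    shop_ids = sorted(by_shop)
    random.Random(seed).shuffle(shop_ids)
    # Whole shops go to eval; keep at least one so eval is never empty.
    n_eval = max(1, round(len(shop_ids) * eval_ratio))
    eval_shops = set(shop_ids[:n_eval])

    train, evaluation = [], []
    for shop_id, shop_rows in by_shop.items():
        (evaluation if shop_id in eval_shops else train).extend(shop_rows)
    return train, evaluation


def to_text(row: dict) -> str:
    """Concatenate title and description into one model input string."""
    return f'{row["title"]}\n{row["description"]}'
```

Splitting by shop presumably matters because products from the same shop tend to share wording and templates, so a row-level random split could leak near-duplicates into the evaluation set.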
Example implementation of the fine-tuning step:
```python
import pandas as pd
from transformers import (
    AutoModelForSequenceClassification,
    AutoTokenizer,
    EarlyStoppingCallback,
    ProgressCallback,
    Trainer,
    TrainingArguments,
)

# ItemCategoryDataset, ItemCategoryCollator and the metrics method are
# omitted from this simplified sample.


class BertClassifier:
    """Binary classifier for one category: target_label vs. not_<target_label>."""

    def __init__(self, target_label: str, model: str, tokenizer: str, num_labels: int):
        self.model = AutoModelForSequenceClassification.from_pretrained(model, num_labels=num_labels)
        self.tokenizer = AutoTokenizer.from_pretrained(tokenizer)
        self.item_category_collator = ItemCategoryCollator(self.tokenizer)
        # Readable label names; the inference pipeline returns these strings.
        self.model.config.id2label = {0: f"not_{target_label}", 1: target_label}
        self.model.config.label2id = {name: i for i, name in self.model.config.id2label.items()}

    def fit(
        self,
        df_train: pd.DataFrame,
        df_eval: pd.DataFrame,
        early_stopping_patience: int,
        training_args: TrainingArguments,
    ) -> None:
        dataset_train = ItemCategoryDataset(df_train)
        dataset_eval = ItemCategoryDataset(df_eval)
        trainer = Trainer(
            model=self.model,
            # EarlyStoppingCallback requires load_best_model_at_end=True
            # (with matching evaluation/save strategies) in training_args.
            args=training_args,
            compute_metrics=self.metrics,
            train_dataset=dataset_train,
            eval_dataset=dataset_eval,
            tokenizer=self.tokenizer,
            data_collator=self.item_category_collator,
            callbacks=[
                EarlyStoppingCallback(early_stopping_patience=early_stopping_patience),
                ProgressCallback(),
            ],
        )
        # Exclude auxiliary model outputs when gathering eval predictions.
        trainer.train(ignore_keys_for_eval=["last_hidden_state", "hidden_states", "attentions"])

    def evaluate(self, df_test, batch_size):
        ...

    def inference(self, df_inference, batch_size):
        ...
```

Evaluating the model

To test the models, we prepared two test sets: a random sample of products from the same period as the training dataset, and a random sample of relatively recent products *5. For each product, which can have multiple correct labels, we ran all 8 binary classifiers and evaluated performance by comparing the labels whose model output exceeded a threshold of 0.5 against the correct labels.

Product category classification is a multi-class, multi-label task. The labels are imbalanced, but unlike fraud detection, the costs of misclassification are not strongly asymmetric. With this in mind, we focused on the macro and micro F1 scores, while also consulting macro/micro precision and recall as well as per-label precision and recall *6.

Example test results:

| scores | precision | recall | f1-score |
|---|---|---|---|
| micro_avg | 0.882 | 0.885 | 0.884 |
| macro_avg | 0.746 | 0.742 | 0.732 |
| weighted_avg | 0.884 | 0.885 | 0.882 |

What the numbers mean intuitively (↑ = a high value means):

- ↑ Micro recall: products are detected without omissions overall (whatever product you look at, it has the right labels).
- ↑ Micro precision: few misclassifications overall (few nonsensical labels).
- ↑ Macro recall: no class has an extreme number of missed detections (few categories that are particularly hard to get labeled).
- ↑ Macro precision: no class has an extreme number of misclassifications (few categories with especially many nonsensical labels).

Because the fashion categories, which make up most of the products, score highly, the micro F1 score is high. However, some hard-to-classify minority categories score low, so the macro F1 score is relatively low. Our next challenge is to improve the models for the hard categories and raise the macro F1 score while keeping the micro F1 score where it is.

Building the pipeline with gokart

Because we build the multi-label classifier by combining several binary classifiers, the overall workflow became somewhat complicated, as shown below (in reality it is even more chaotic, since we also ran various experiments behind the scenes, such as validating image-based models).

(Figure: model creation pipeline)
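Because the multi-label prediction is simply the set of categories whose binary model outputs more than 0.5, the final combination step and the micro/macro F1 scores used in the evaluation can be sketched in a few lines of plain Python. The function names and the dict-of-scores input format are assumptions for illustration; on a binarized label matrix, scikit-learn's `f1_score` with `average="micro"` or `average="macro"` computes the same quantities.

```python
def to_labels(scores: dict[str, float], threshold: float = 0.5) -> set[str]:
    """scores maps each category to the output of that category's binary model."""
    return {c for c, p in scores.items() if p > threshold}


def prf(tp: int, fp: int, fn: int) -> tuple[float, float, float]:
    """Precision, recall, F1; 0.0 when a denominator is zero."""
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return precision, recall, f1


def micro_macro_f1(y_true: list[set[str]], y_pred: list[set[str]], categories: list[str]):
    counts = {c: [0, 0, 0] for c in categories}  # tp, fp, fn per category
    for true, pred in zip(y_true, y_pred):
        for c in categories:
            if c in pred and c in true:
                counts[c][0] += 1
            elif c in pred:
                counts[c][1] += 1
            elif c in true:
                counts[c][2] += 1
    # Micro: pool the counts over all categories, then compute a single F1.
    tp, fp, fn = (sum(v[i] for v in counts.values()) for i in range(3))
    micro = prf(tp, fp, fn)[2]
    # Macro: compute F1 per category, then take the unweighted mean.
    macro = sum(prf(*v)[2] for v in counts.values()) / len(categories)
    return micro, macro
```

Micro averaging pools true and false positives over all categories, so the dominant fashion categories drive it; macro averaging gives every category equal weight, which is why a few hard minority categories pull it down.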
To keep these workflows organized and make it easy for other members to reproduce the work on their own machines, we used Data Version Control to version raw data such as images, and implemented the workflows with Gokart, an open-source pipeline tool developed by M3, Inc.

Because gokart integrates easily with S3, the inference system can load the trained model checkpoints directly, which made deploying and versioning models smooth.

Batch inference with AWS Batch

The DataStrategy team normally runs batch jobs on Fargate, but this job needs GPU instances and Fargate unfortunately does not support GPUs yet, so we used AWS Batch with EC2 GPU instances (g4dn).

Products to be classified are continuously stored in SQS, and once a day AWS Batch launches several GPU instances that process them in a batch.
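The article does not describe how the GPU instances divide the queued products among themselves. If several instances read from the same SQS queue, each one can simply keep pulling small batches until the queue is empty (the competing-consumers pattern). In this stdlib sketch, `queue.Queue` stands in for SQS and threads stand in for GPU instances; both helpers are hypothetical. With the real SQS API, a worker receives at most 10 messages per `ReceiveMessage` call and must delete each message after processing it.

```python
import queue
import threading


def drain_in_batches(q: queue.Queue, batch_size: int, handle_batch) -> None:
    """Pull up to `batch_size` items at a time until the queue is empty."""
    while True:
        batch = []
        while len(batch) < batch_size:
            try:
                batch.append(q.get_nowait())
            except queue.Empty:
                break
        if not batch:
            return
        handle_batch(batch)


def run_workers(item_ids, n_workers: int = 3, batch_size: int = 16) -> list:
    q: queue.Queue = queue.Queue()
    for item_id in item_ids:
        q.put(item_id)

    processed = []
    lock = threading.Lock()

    def handle_batch(batch):
        # Placeholder for running the classifiers on one minibatch.
        with lock:
            processed.extend(batch)

    workers = [
        threading.Thread(target=drain_in_batches, args=(q, batch_size, handle_batch))
        for _ in range(n_workers)
    ]
    for w in workers:
        w.start()
    for w in workers:
        w.join()
    return processed
```

Because every worker takes items from the shared queue, each product is processed exactly once no matter how many instances are launched.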
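For the inference step, transformers' TextClassificationPipeline keeps the implementation quite clean.

Example implementation of the inference step:

```python
import pandas as pd
from torch.utils.data import Dataset  # base class of ItemCategoryDataset (omitted here)
from transformers import TextClassificationPipeline
from transformers.modeling_utils import PreTrainedModel
from transformers.pipelines.pt_utils import KeyDataset
from transformers.tokenization_utils import PreTrainedTokenizer


def inference(
    df_inference_target: pd.DataFrame,
    model: PreTrainedModel,
    tokenizer: PreTrainedTokenizer,
    target_label: str,
    batch_size: int = 16,
    device: int = 0,
) -> pd.Series:
    """Return, for each row, the probability that the item belongs to target_label."""
    result = []
    classifier = TextClassificationPipeline(
        model=model,
        tokenizer=tokenizer,
        framework="pt",
        task="item_category",
        batch_size=batch_size,
        device=device,
        num_workers=2,
    )
    # ItemCategoryDataset yields dicts with a "text" field (title + description).
    dataset = ItemCategoryDataset(df_inference_target)
    tokenizer_kwargs = {
        "padding": True,
        "truncation": True,
        "max_length": tokenizer.model_max_length,
    }
    for out in classifier(KeyDataset(dataset, "text"), batch_size=batch_size, **tokenizer_kwargs):
        # The pipeline returns only the top label and its score; with two
        # labels, the other label's probability is 1 - score.
        if out["label"] == target_label:
            score = out["score"]
        else:
            score = 1.0 - out["score"]
        result.append(score)
    # Keep the input index so the scores line up with df_inference_target.
    return pd.Series(result, index=df_inference_target.index)
```

The inference results are stored in our in-house data warehouse, where they can be analyzed together with other data or turned into dashboards in Looker.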
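The `1.0 - out["score"]` branch in the inference loop works because each model has exactly two labels: for a two-label model the pipeline applies softmax by default and returns only the top label with its probability, so the probability of the other label is the complement. A small stdlib check of that identity; `pipeline_like_output` only imitates the pipeline's output format and is not part of transformers.

```python
import math


def softmax(logits: list[float]) -> list[float]:
    m = max(logits)  # subtract the max for numerical stability
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def pipeline_like_output(logits: list[float], id2label: dict[int, str]) -> dict:
    """Imitate the pipeline: return only the top label and its probability."""
    probs = softmax(logits)
    best = max(range(len(probs)), key=probs.__getitem__)
    return {"label": id2label[best], "score": probs[best]}


def target_probability(out: dict, target_label: str) -> float:
    # Same conversion as the inference loop; valid only for two labels.
    return out["score"] if out["label"] == target_label else 1.0 - out["score"]
```

Whichever label wins, `target_probability` recovers the softmax probability of the target class, so every category's model yields a comparable score.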
Conclusion

In this post, I introduced our work on a product category inference system built with machine learning models.

Looking ahead

- More fine-grained category classification
- Expanding the dataset (improving accuracy)
- Using product images (going multimodal)
- Monitoring model accuracy and updating models continuously
- Out-of-distribution detection

Many challenges like these remain, but for now we can analyze data at the resolution of individual products, which we previously could not see. We expect to use the results in many ways: as features for other existing machine learning models, to improve search and recommendation accuracy, and to infer the genre of shops that have not set a category.

The DataStrategy team is actively hiring machine learning engineers to work with us. We also hold casual interviews, so feel free to get in touch.

Job openings / BASE, Inc.

Tomorrow's articles will be by @yuzuy and @ayako-hotehama. Please check them out!

*1: We referred to this article: https://qiita.com/mo256man/items/b6e17b5a66d1ea13b5e3
*2: We used multi-label classification because some products, such as product sets and sportswear, belong to more than one category.
*3: We tried various naive approaches, such as normalizing the outputs of timm models and concatenating them, but none of them outperformed the BERT-only model. Sad.
*4: To keep any shop from being split across train and eval, we split with GroupKFold grouped by shop ID. The test data used for performance evaluation also excludes the shops used during training.
*5: We wanted to see the effect of data drift over registration time. In practice the effect was small, so we eventually merged both into a single test set.
*6: For the definitions of each score, the scikit-learn documentation is a helpful reference.