WebHack#40 Introduction to Japanese tokenizers



If you believe building relationships with people who are in the same industry helps you know yourself and achieve what you couldn't do alone, this month's event is for you!

If you value continuous learning, read WebHack Monthly and follow @WebHackMeetup to keep receiving valuable content and early updates.

Event Url


PASSWORD: bekind

For the best experience, please:

  • Join on PC (not on mobile phones)
  • Use Chrome browser (not Safari or Firefox)
  • Start at 19:30 in Tokyo time (GMT+9)

Feel free to contribute to the real-time collaborative event notes!




In this talk, Wanasit will share what he learned about Japanese NLP after trying to build a Japanese tokenizer from scratch.

Natural Language Processing (NLP), or text processing, for Japanese has many challenges. One of the most basic and obvious problems is tokenization (i.e., splitting text into a list of words).

Unlike English, where words are typically separated by spaces, Japanese text (e.g. 日本語の自然言語処理を行うには…) has no such rule of thumb for splitting. Tokenizers and NLP tools for Japanese therefore need to be a lot more sophisticated.
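To make the contrast concrete, here is a minimal Python sketch. It is not the approach presented in the talk: the tiny dictionary `TOY_DICTIONARY` and the function names are hypothetical, chosen only for illustration. It shows that whitespace splitting leaves a Japanese phrase as one chunk, while even a naive greedy longest-match lookup needs a word list to find boundaries. Real tokenizers rely on far larger dictionaries and on statistical models to resolve ambiguous splits.

```python
def whitespace_tokenize(text):
    """Split on whitespace; a reasonable first pass for English."""
    return text.split()


# Hypothetical toy dictionary, just large enough to cover the example phrase.
TOY_DICTIONARY = {"日本語", "の", "自然", "言語", "処理", "を", "行う", "に", "は"}


def longest_match_tokenize(text, dictionary):
    """Greedy longest-match: at each position take the longest dictionary word.

    Characters that start no dictionary word are emitted as single-character tokens.
    """
    max_len = max(len(word) for word in dictionary)
    tokens = []
    i = 0
    while i < len(text):
        # Try the longest candidate first, shrinking down to one character.
        for length in range(min(max_len, len(text) - i), 0, -1):
            candidate = text[i:i + length]
            if length == 1 or candidate in dictionary:
                tokens.append(candidate)
                i += length
                break
    return tokens


english = "Natural language processing is fun"
japanese = "日本語の自然言語処理を行うには"

print(whitespace_tokenize(english))
print(whitespace_tokenize(japanese))  # a single unsplit chunk
print(longest_match_tokenize(japanese, TOY_DICTIONARY))
```

Greedy matching is only a starting point: it picks the wrong split whenever a longer dictionary word overlaps the intended boundary, which is one reason Japanese tokenizers need to be more sophisticated.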


Wanasit Tanakitrungruang, Engineering Manager, Indeed

Wanasit works as an Engineering Manager on the Search Quality team at Indeed. His team focuses on improving NLP for job descriptions and helping people get jobs by making Indeed's search better.

He also works on language and text processing projects in his free time (and this talk is related to one of his personal projects).


Time Session
19:30 - 19:35 Opening
19:35 - 20:00 Talk
20:00 - 20:15 Q & A
20:15 - 20:30 Networking
20:30 Good night!


This event will be live streamed, and the link will be sent to attendees three days before. Please make sure to register in order to receive the link.

