2020/11/10 (Tue) 19:30 - 21:00

WebHack#40 Introduction to Japanese tokenizers

Online

Event details

WebHack

If you believe building relationships with people who are in the same industry helps you know yourself and achieve what you couldn't do alone, this month's event is for you!

If you value continuous learning, read WebHack Monthly and follow @WebHackMeetup to keep receiving valuable content and early updates.

Event URL

https://indeed.zoom.us/s/95236959786

PASSWORD: bekind

For the best experience, please:

Join on PC (not on mobile phones)
Use Chrome browser (not Safari or Firefox)
Join at 19:30 Tokyo time (GMT+9)

Feel free to contribute to the real-time collaborative event notes!

Language

English

Description

In this talk, Wanasit will share what he learned about Japanese NLP after trying to build a Japanese tokenizer from scratch.

Doing Natural Language Processing (NLP) or text processing for Japanese has many challenges. One of the most basic and obvious problems is tokenization (i.e. splitting text into a list of words).

Unlike English, where words are typically separated by spaces, Japanese text (e.g. 日本語の自然言語処理を行うには…) has no such rule of thumb for splitting. It requires tokenizers and NLP tools to be a lot more sophisticated.
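To make the difference concrete, here is a minimal sketch of a greedy longest-match tokenizer. It is not the approach presented in the talk; the tiny vocabulary and the function name are assumptions made up purely for illustration. Production tokenizers usually rely on a large dictionary and compare competing segmentations rather than always taking the longest match.

```python
# Toy vocabulary, assumed purely for illustration (not a real dictionary).
VOCAB = {"日本語", "の", "自然", "言語", "処理", "自然言語処理", "を", "行う", "に", "は"}


def tokenize_longest_match(text, vocab=VOCAB):
    """Greedy longest-match: at each position, take the longest vocabulary entry."""
    max_len = max(len(word) for word in vocab)
    tokens = []
    i = 0
    while i < len(text):
        # Try the longest possible candidate first, then shrink it.
        for size in range(min(max_len, len(text) - i), 0, -1):
            candidate = text[i:i + size]
            if candidate in vocab:
                break
        else:
            # No vocabulary entry starts here: emit the single character.
            candidate = text[i]
        tokens.append(candidate)
        i += len(candidate)
    return tokens


# English: whitespace already marks the word boundaries.
english = "natural language processing".split()
print(english)   # ['natural', 'language', 'processing']

# Japanese: boundaries have to be inferred from a vocabulary.
japanese = tokenize_longest_match("日本語の自然言語処理を行うには")
print(japanese)  # ['日本語', 'の', '自然言語処理', 'を', '行う', 'に', 'は']
```

Note that the result depends entirely on the vocabulary: with "自然言語処理" removed from the toy set, the same text would split into "自然", "言語", "処理" instead. Deciding which segmentation is right is part of what makes Japanese tokenization hard.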

Speaker

Wanasit Tanakitrungruang, Engineering Manager, Indeed

Wanasit works as an Engineering Manager on the Search Quality team at Indeed. His team focuses on improving NLP for job descriptions and helps people get jobs by making Indeed's search better.

He also works on language and text processing projects in his free time (this talk is related to one of his personal projects).

Schedule

Time	Session
19:30 - 19:35	Opening
19:35 - 20:00	Talk
20:00 - 20:15	Q & A
20:15 - 20:30	Networking
20:30	Good night!

Venue

This event will be live-streamed, and the link will be sent to attendees three days before the event. Please make sure to register in order to receive the link.