podcast-data-lab/core-nlp-research

Natural Language Processing of Podcast Data

This repository contains research into podcasts using natural language processing (NLP).

Podcast data is vast and growing daily. Podcasts can be studied through several data sources, chiefly the audio files themselves, transcripts of the audio, podcast descriptions, and other metadata obtained from a podcast's RSS feed.

Phase 1: Named entity recognition of podcast and episode text descriptions

The first phase of this research deals with textual data from the descriptions of a podcast and its episodes, obtained from RSS feeds. Named entities are extracted from the descriptions and attached to the resulting podcast file.
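
As a rough illustration of this phase, the sketch below reads a podcast description and its episode descriptions from an RSS feed and attaches extracted entities to a podcast record. It is not the repository's code. The feed content, the record layout, and the names `annotate_feed` and `naive_entities` are all hypothetical, and `naive_entities` is a deliberately crude capitalised-word heuristic standing in for a real named entity recogniser such as Stanford CoreNLP.

```python
import re
import xml.etree.ElementTree as ET
from typing import Callable, Dict, List

# Hypothetical, minimal RSS feed used only for this illustration.
SAMPLE_FEED = """<rss version="2.0">
  <channel>
    <title>Example Show</title>
    <description>A weekly show recorded in Nairobi.</description>
    <item>
      <title>Episode 1</title>
      <description>We talk with Ada Lovelace about computing.</description>
    </item>
  </channel>
</rss>"""


def naive_entities(text: str) -> List[str]:
    """Placeholder extractor (NOT CoreNLP): returns runs of capitalised words.

    It produces false positives such as sentence-initial words, so a real
    NER model should be swapped in for actual research.
    """
    return re.findall(r"\b[A-Z][a-z]+(?:\s+[A-Z][a-z]+)*\b", text)


def annotate_feed(
    xml_text: str,
    extract: Callable[[str], List[str]] = naive_entities,
) -> Dict:
    """Parse an RSS document and attach entities to the podcast and each episode."""
    channel = ET.fromstring(xml_text).find("channel")
    if channel is None:
        raise ValueError("RSS document has no <channel> element")

    podcast = {
        "title": channel.findtext("title", default=""),
        "entities": extract(channel.findtext("description", default="")),
        "episodes": [],
    }
    # Each <item> in the channel is one episode.
    for item in channel.findall("item"):
        podcast["episodes"].append(
            {
                "title": item.findtext("title", default=""),
                "entities": extract(item.findtext("description", default="")),
            }
        )
    return podcast


if __name__ == "__main__":
    print(annotate_feed(SAMPLE_FEED))
```

The extractor is passed in as a function so that the placeholder heuristic can be replaced by a call to an actual NER pipeline without changing the feed-handling code.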

Research Notes

About

NLP research of Podcast Data using Stanford CoreNLP.
