PodcastInsights

PodcastInsights is a Python tool that parses podcast RSS feeds, downloads episodes, and transcribes them to text for further analysis. It is intended as a foundation for building insights and analytics from podcast content.

Features

Core Functionality

Parse podcast RSS feeds to extract episode metadata
Download podcast episodes from feeds
Transcribe audio to text using Whisper speech recognition
Store audio files and transcripts in an organized directory layout (data/audio/ and data/transcripts/ by default)
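
To illustrate the feed-parsing step, here is a minimal sketch that pulls episode metadata out of an RSS document using only the standard library. It is not the project's feed_parser.py; the function name parse_episodes, the returned dictionary keys, and the sample feed are assumptions made for this example.

```python
import xml.etree.ElementTree as ET

# Hypothetical two-episode feed used only for this illustration.
SAMPLE_FEED = """<rss version="2.0">
  <channel>
    <title>Example Show</title>
    <item>
      <title>Episode 1</title>
      <pubDate>Mon, 01 Jan 2024 00:00:00 GMT</pubDate>
      <enclosure url="https://example.com/ep1.mp3" type="audio/mpeg" length="123"/>
    </item>
    <item>
      <title>Episode 2</title>
      <enclosure url="https://example.com/ep2.mp3" type="audio/mpeg" length="456"/>
    </item>
  </channel>
</rss>"""


def parse_episodes(feed_xml, max_episodes=None):
    """Return a list of dicts with title, publication date, and audio URL."""
    root = ET.fromstring(feed_xml)
    episodes = []
    for item in root.iter("item"):
        # In RSS 2.0 the audio file is carried by the <enclosure> element.
        enclosure = item.find("enclosure")
        if enclosure is None:
            continue  # skip items that have no downloadable audio
        episodes.append({
            "title": (item.findtext("title") or "").strip(),
            "published": item.findtext("pubDate"),  # None when the tag is absent
            "audio_url": enclosure.get("url"),
        })
        if max_episodes is not None and len(episodes) >= max_episodes:
            break
    return episodes


if __name__ == "__main__":
    for episode in parse_episodes(SAMPLE_FEED, max_episodes=5):
        print(episode["title"], episode["audio_url"])
```

Real feeds often carry extra namespaced tags (for example iTunes metadata), which this sketch ignores.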

Usage Examples

from podcast_processor import PodcastProcessor

# Initialize the processor
processor = PodcastProcessor()

# Process a podcast feed (with a limit of 5 episodes)
results = processor.process_feed("https://feeds.megaphone.fm/darknetdiaries", max_episodes=5)

# Print results
for result in results:
    print(f"\nTitle: {result['title']}")
    print(f"Audio: {result['audio_path']}")
    print(f"Transcript: {result['transcript_path']}")
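
How PodcastProcessor produces each transcript file is not shown in this README. As a rough sketch of what the transcription step could look like when calling the openai-whisper package directly, the following uses whisper.load_model and model.transcribe; the helper names transcript_path_for and transcribe_to_file, and the .txt output format, are assumptions for illustration only.

```python
from pathlib import Path


def transcript_path_for(audio_path, transcript_dir="data/transcripts"):
    """Map an audio file to a .txt transcript path with the same stem."""
    return Path(transcript_dir) / (Path(audio_path).stem + ".txt")


def transcribe_to_file(audio_path, transcript_dir="data/transcripts", model_name="base"):
    """Transcribe one audio file with Whisper and write the text to disk."""
    # Imported lazily so the path helper above works without Whisper installed.
    import whisper  # provided by the openai-whisper package

    model = whisper.load_model(model_name)
    # Whisper decodes the audio through FFmpeg, so FFmpeg must be on PATH.
    result = model.transcribe(str(audio_path))

    out_path = transcript_path_for(audio_path, transcript_dir)
    out_path.parent.mkdir(parents=True, exist_ok=True)
    out_path.write_text(result["text"].strip() + "\n", encoding="utf-8")
    return out_path
```

Loading the model once and reusing it across episodes would avoid repeated start-up cost; this sketch reloads it per call to stay short.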

Installation

Prerequisites

Python 3.12+
FFmpeg (required for audio processing)
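
Before running the tool, it can help to confirm that FFmpeg is actually reachable. The check below is a small sketch using the standard library; the function name ffmpeg_available is an assumption and is not part of the project.

```python
import shutil


def ffmpeg_available(executable="ffmpeg"):
    """Return True if the given executable can be found on PATH."""
    return shutil.which(executable) is not None


if __name__ == "__main__":
    if ffmpeg_available():
        print("FFmpeg found")
    else:
        print("FFmpeg not found: install it and make sure it is on PATH")
```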

Setup

Using Conda (recommended)

# Clone the repository
git clone https://github.com/yourusername/podcast-insights.git
cd podcast-insights

# Create the conda environment
conda env create -f environment.yaml

# Activate the environment
conda activate podcast-processor

Using pip

# Clone the repository
git clone https://github.com/yourusername/podcast-insights.git
cd podcast-insights

# Create a virtual environment
python -m venv venv

# Activate the environment
# On Windows:
venv\Scripts\activate
# On macOS/Linux:
source venv/bin/activate

# Install dependencies
pip install -r requirements.txt

Project Structure

podcast-insights/
├── podcast_processor/      # Core package
│   ├── __init__.py
│   ├── processor.py        # Main processing functionality
│   ├── feed_parser.py      # RSS feed parsing utilities
│   ├── audio_handler.py    # Audio download and processing
│   └── transcriber.py      # Speech-to-text transcription
├── scripts/                # Utility scripts
├── data/                   # Default data storage
│   ├── audio/              # Downloaded audio files
│   └── transcripts/        # Generated transcripts
└── tests/                  # Test suite
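
As an illustration of how files could be placed into the data/ layout above, the sketch below derives an audio path and a transcript path from an episode title. The slug rule, the .mp3 and .txt extensions, and the function names are assumptions for this example; the project's own naming scheme may differ.

```python
import re
from pathlib import Path


def slugify(title):
    """Turn an episode title into a filesystem-safe, lowercase slug."""
    slug = re.sub(r"[^a-z0-9]+", "-", title.lower()).strip("-")
    return slug or "untitled"


def episode_paths(title, data_dir="data", audio_ext=".mp3"):
    """Return the audio and transcript paths for one episode."""
    base = Path(data_dir)
    slug = slugify(title)
    return {
        "audio_path": base / "audio" / f"{slug}{audio_ext}",
        "transcript_path": base / "transcripts" / f"{slug}.txt",
    }


if __name__ == "__main__":
    paths = episode_paths("Ep. 1: Hello, World!")
    print(paths["audio_path"])
    print(paths["transcript_path"])
```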

Configuration

Set custom paths and preferences by editing the config.yaml file:

output_directory: "data"
whisper_model: "base"  # Options: tiny, base, small, medium, large
max_episodes: 10  # Default limit for batch processing
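
The sketch below shows one way these three settings could be merged with defaults and validated once they have been loaded into a dictionary. It is not the project's configuration code; resolve_config, DEFAULTS, and VALID_MODELS are hypothetical names, and the validation rules are assumptions based on the options listed above.

```python
# Defaults mirror the values shown in the config.yaml example.
DEFAULTS = {
    "output_directory": "data",
    "whisper_model": "base",
    "max_episodes": 10,
}
VALID_MODELS = {"tiny", "base", "small", "medium", "large"}


def resolve_config(overrides=None):
    """Merge user settings over the defaults and reject invalid values."""
    config = {**DEFAULTS, **(overrides or {})}

    unknown = set(config) - set(DEFAULTS)
    if unknown:
        raise ValueError(f"Unknown config keys: {sorted(unknown)}")
    if config["whisper_model"] not in VALID_MODELS:
        raise ValueError(f"Unsupported whisper_model: {config['whisper_model']!r}")
    if not isinstance(config["max_episodes"], int) or config["max_episodes"] < 1:
        raise ValueError("max_episodes must be a positive integer")
    return config


if __name__ == "__main__":
    print(resolve_config({"whisper_model": "small", "max_episodes": 5}))
```

With PyYAML installed, the overrides dictionary could come from yaml.safe_load applied to the contents of config.yaml.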

Planned Features

Near-term Roadmap

Episode metadata extraction (show notes, timestamps, etc.)
Basic content analysis (topic identification, keyword extraction)
Simple web interface for browsing podcasts and transcripts

Future Enhancements

Speaker diarization (identifying different speakers in transcripts)
Topic segmentation (dividing episodes into thematic sections)
Sentiment analysis of podcast content
Cross-episode thematic analysis
Automated summarization and key point extraction
Content-based recommendation engine
Transcript search and indexing
Potential integration with LLM-based RAG systems for intelligent querying

Contributing

Contributions are welcome! Please feel free to submit a Pull Request.

License

This project is licensed under the MIT License - see the LICENSE file for details.
