Portfolio Dataset

World Publishing Houses — Publishing Intelligence Dataset

A structured dataset and product prototype for tracking publishers, translated books, translation paths, market events, and rights opportunities across international publishing markets.

  • Data Product
  • Publishing Intelligence
  • Translation Metadata
  • Rights Discovery
  • Portfolio Dataset

Problem

International publishing metadata is fragmented across retailers, publishers, libraries, rights catalogs, award announcements, and market news. A reader may see that a book exists, but not whether it is available in English, who translated it, which publisher handled the edition, or whether the attribution is reliable.

For translators, publishers, bookstores, and rights professionals, the same fragmentation makes it difficult to compare national markets, identify translation gaps, track rights movement, and understand where acquisition opportunities may exist.

Product framing

World Publishing Houses uses the dataset as the foundation for a publishing intelligence product. The goal is not only to collect book data, but to make trust, provenance, translation paths, and reader-facing availability visible.

Rows marked curated_needs_check are treated as demonstration or research leads, not as fully verified records.

Dataset Overview

Coverage built for a portfolio-ready data product.

The pilot dataset focuses on Denmark and Iceland and models works, publishers, translations, market events, rights signals, and sources as connected product entities.

Works

55: Books and works represented in the pilot dataset.

Publishers

29: Publisher and imprint records across the pilot markets.

Translations

55: Translation records with attribution and availability fields.

Events

60: Release, award, market, and publishing signal events.

Rights Signals

19: Rows designed for watchlist and acquisition workflows.

Pilot Countries

2: Denmark and Iceland as early country templates.

Data Model

A schema designed for discovery, trust, and professional workflows.

  • Country
  • Publisher
  • Work
  • Translation
  • Source
  • Event
  • Rights Opportunity

Core relationships

  • A country has many publishers.
  • A publisher can be associated with many works.
  • A work can have many translation records.
  • A translation can have source and verification metadata.
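
The relationships above can be sketched as nested record types. This is a minimal illustration, not the dataset's actual schema: the class layout and the placeholder values are assumptions, and only the field names verification_status, source_url, translator, original_language, and translated_language come from the trust fields listed in this case study.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class Translation:
    translated_language: str
    translator: Optional[str] = None  # None when attribution is unknown
    verification_status: str = "curated_needs_check"
    source_url: Optional[str] = None


@dataclass
class Work:
    title: str
    original_language: str
    # A work can have many translation records.
    translations: List[Translation] = field(default_factory=list)


@dataclass
class Publisher:
    name: str
    # A publisher can be associated with many works.
    works: List[Work] = field(default_factory=list)


@dataclass
class Country:
    name: str
    # A country has many publishers.
    publishers: List[Publisher] = field(default_factory=list)


# Placeholder values only; these are not rows from the dataset.
work = Work(title="Example Work", original_language="da")
work.translations.append(Translation(translated_language="en"))
publisher = Publisher(name="Example Publisher", works=[work])
country = Country(name="Denmark", publishers=[publisher])
```

Defaulting verification_status to curated_needs_check is a design choice in this sketch: a new row counts as a research lead until someone attaches a source and reviews it.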

Product signals

  • Events capture releases, awards, market activity, and publishing signals.
  • Rights Watchlist rows identify potential opportunities for professional users.
  • Reader buckets translate metadata into product-facing availability categories.
  • Source rows preserve provenance for review and trust decisions.

Trust and verification

The dataset separates evidence levels instead of flattening every row into the same confidence level. This matters because publishing metadata can be incomplete, inconsistent, or source-dependent.

  • verified_public_source
  • curated_needs_check

Verified public-source rows can support reader-facing confidence. Curated rows are useful for portfolio demonstration, research planning, and rights-watchlist exploration, but they need manual review before being treated as authoritative.
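
One way to keep the two evidence levels apart in code is to split rows before anything reaches a reader-facing view. The sketch below assumes rows are plain dictionaries and uses a hypothetical work_title key. Only the two status values come from this case study.

```python
VERIFIED = "verified_public_source"
NEEDS_CHECK = "curated_needs_check"


def split_by_evidence(rows):
    """Return (reader_facing, research_leads) without dropping any row."""
    reader_facing, research_leads = [], []
    for row in rows:
        if row.get("verification_status") == VERIFIED:
            reader_facing.append(row)
        else:
            # Missing or unknown statuses are treated as unverified leads.
            research_leads.append(row)
    return reader_facing, research_leads


# Placeholder rows for illustration; not taken from the dataset.
rows = [
    {"work_title": "Example A", "verification_status": VERIFIED},
    {"work_title": "Example B", "verification_status": NEEDS_CHECK},
    {"work_title": "Example C"},
]
reader_facing, research_leads = split_by_evidence(rows)
```

Rows with a missing status fall into the research-lead list by default, so an incomplete record cannot be shown as verified by accident.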

Trust fields

  • verification_status
  • source_url
  • source_name
  • translation_path
  • translator
  • original_language
  • translated_language
  • rights_signal
  • reader_bucket
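
A small row-level check shows how these fields could back a QA pass. This is a sketch under stated assumptions: the helper name trust_issues is hypothetical, and the rule that a verified row must carry a source_url is an assumed policy, not a documented one.

```python
TRUST_FIELDS = (
    "verification_status", "source_url", "source_name", "translation_path",
    "translator", "original_language", "translated_language",
    "rights_signal", "reader_bucket",
)
KNOWN_STATUSES = ("verified_public_source", "curated_needs_check")


def trust_issues(row):
    """List human-readable problems found in one row (empty list means none)."""
    issues = [f"missing field: {name}" for name in TRUST_FIELDS if name not in row]
    status = row.get("verification_status")
    if status not in KNOWN_STATUSES:
        issues.append(f"unknown verification_status: {status!r}")
    # Assumed rule: a verified row must cite where its evidence came from.
    if status == "verified_public_source" and not row.get("source_url"):
        issues.append("verified row has no source_url")
    return issues


# A row with every field present but blank, marked as verified.
row = dict.fromkeys(TRUST_FIELDS, "")
row["verification_status"] = "verified_public_source"
issues = trust_issues(row)
```

Returning a list of issues rather than raising on the first problem lets a reviewer see everything wrong with a row in one pass.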

Reader mode

  • Find books available in English now.
  • Discover upcoming translations.
  • Explore books not yet available in English.
  • Browse by country, publisher, or translator.
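
The first three reader views correspond to availability buckets. The sketch below shows one possible mapping from a raw availability value to a bucket. The input values ("available", "upcoming", "not_available") and the bucket names are hypothetical, since the case study does not list the dataset's actual vocabulary.

```python
def reader_bucket(english_availability):
    """Map a raw availability value to a reader-facing bucket name."""
    value = (english_availability or "").strip().lower()
    if value == "available":
        return "available_in_english_now"
    if value == "upcoming":
        return "upcoming_translation"
    if value == "not_available":
        return "not_yet_in_english"
    # Blank or unrecognized values are surfaced for review, not guessed.
    return "needs_review"
```

For example, reader_bucket(" Available ") returns "available_in_english_now", while a blank value returns "needs_review". The extra bucket keeps uncertain rows out of the reader-facing views.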

Professional mode

  • Track rights and acquisition signals.
  • Monitor publisher activity.
  • Identify translation gaps.
  • Compare countries and market coverage.
  • Follow market events and awards.
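
Identifying translation gaps and comparing countries can start as a simple count. The sketch below assumes each row carries a country key and a reader_bucket key, and that untranslated works use a hypothetical bucket value of "not_yet_in_english". The rows are placeholders, not dataset records.

```python
from collections import Counter


def translation_gaps_by_country(rows):
    """Count works per country that have no English edition recorded."""
    gaps = Counter()
    for row in rows:
        if row["reader_bucket"] == "not_yet_in_english":
            gaps[row["country"]] += 1
    return gaps


# Placeholder rows for illustration only.
rows = [
    {"country": "Denmark", "reader_bucket": "not_yet_in_english"},
    {"country": "Denmark", "reader_bucket": "available_in_english_now"},
    {"country": "Iceland", "reader_bucket": "not_yet_in_english"},
    {"country": "Iceland", "reader_bucket": "not_yet_in_english"},
]
gaps = translation_gaps_by_country(rows)
```

Calling gaps.most_common() then ranks countries by the number of untranslated works, which is the kind of ordering a rights watchlist could use.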

Dashboard Preview

A calm overview for country coverage and rights signals.

Dashboard preview summarizing country coverage, works, publishers, translations, events, and rights-watchlist signals.

Sample Rows

A small sample that preserves verification status.

This table shows a small portfolio sample only. Rows marked curated_needs_check are research leads and should not be presented as fully verified.

Country | Work Title | Author | Publisher | Original Language | English Availability | Translator | Verification Status | Reader Bucket | Rights Signal

Product insight

This dataset can support country pages, publisher pages, translated book discovery, translator attribution pages, rights dashboards, bookstore buying-intelligence views, trust badges, and translation-chain transparency.

For machine learning work, it also creates natural opportunities for entity resolution, classification, metadata conflict detection, embeddings-based search, and recommendation workflows.
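
As a starting point for entity resolution, publisher names that differ only in casing, spacing, or small spelling variations can be flagged with the standard library alone. This is a rough baseline rather than a production matcher: the function names and the 0.85 threshold are assumptions, and every flagged pair would still need manual review.

```python
from difflib import SequenceMatcher


def normalize(name):
    """Lowercase and collapse whitespace so trivial differences do not matter."""
    return " ".join(name.lower().split())


def likely_same_publisher(a, b, threshold=0.85):
    """Flag two name strings as candidate duplicates for manual review."""
    ratio = SequenceMatcher(None, normalize(a), normalize(b)).ratio()
    return ratio >= threshold
```

With placeholder names, likely_same_publisher("Example Forlag", "example  forlag") is flagged as a candidate match because the normalized strings are identical, while clearly different names fall below the threshold.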

My role

I designed this as a product-minded data portfolio asset, acting as product researcher, data modeler, QA reviewer, dashboard designer, and ML/Data Science portfolio builder.

I planned the data structure, researched source-backed examples, separated verified and curated records, and mapped how the dataset could support both reader-facing and professional workflows.

Skills Demonstrated

What this case study is meant to show.

  • Data Modeling
  • Dataset Design
  • Metadata Architecture
  • Product Analytics
  • QA Thinking
  • Source Verification
  • Dashboard Design
  • International Publishing Research
  • Translation Metadata
  • Product Strategy

Foundation for a larger publishing intelligence platform.

The pilot dataset is designed to grow into a platform that supports readers, publishers, translators, and bookstores without hiding uncertainty or provenance.