Importing Papers

paperKB supports several ways to add papers to a knowledge base (KB). To get started, navigate to your KB → Docs → Import.

Import methods

PubMed IDs (PMIDs)

Paste one or more PMIDs (one per line, or comma/space separated). paperKB will fetch metadata from PubMed and full text from PMC when available.
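
To illustrate the accepted input formats, here is a minimal Python sketch of how pasted text could be split into individual PMIDs. The function name parse_pmids is hypothetical and this is not paperKB's actual parser; it only assumes that PMIDs are plain digit strings and that duplicates should be dropped.

```python
import re
from typing import List


def parse_pmids(raw: str) -> List[str]:
    """Split pasted text into unique PMIDs, keeping first-seen order.

    Hypothetical helper for illustration; not part of paperKB.
    """
    # Separators may be commas, spaces, tabs, or newlines, in any mix.
    tokens = re.split(r"[\s,]+", raw.strip())
    seen = set()
    pmids = []
    for token in tokens:
        # A PMID is a plain run of ASCII digits; skip anything else.
        if re.fullmatch(r"[0-9]+", token) and token not in seen:
            seen.add(token)
            pmids.append(token)
    return pmids
```

For example, pasting "12345678, 23456789" on one line and "34567890" on the next would yield three PMIDs under these assumptions.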

DOIs

Paste DOIs in any common format (10.1234/..., doi:10.1234/..., or full URLs). Metadata is fetched from CrossRef and PubMed.
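
The formats listed above all reduce to the same bare identifier. The following Python sketch shows one way to do that reduction; normalize_doi and the prefix list are assumptions made for illustration, not paperKB's own code. Lowercasing is used here because DOIs are case-insensitive.

```python
import re
from typing import Optional

# Common ways a DOI is written before the bare "10.xxxx/..." form (assumed list).
_PREFIXES = (
    "https://doi.org/",
    "http://doi.org/",
    "https://dx.doi.org/",
    "http://dx.doi.org/",
    "doi:",
)


def normalize_doi(raw: str) -> Optional[str]:
    """Return the bare, lowercased DOI, or None if the input is not DOI-like."""
    value = raw.strip()
    lowered = value.lower()
    for prefix in _PREFIXES:
        if lowered.startswith(prefix):
            value = value[len(prefix):].strip()
            break
    # A DOI starts with "10.", a 4-9 digit registrant code, a slash, and a suffix.
    if re.fullmatch(r"10\.\d{4,9}/\S+", value):
        return value.lower()
    return None
```

With this sketch, "10.1234/abc", "doi:10.1234/abc", and "https://doi.org/10.1234/abc" all normalize to "10.1234/abc".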

PDF upload

Drag and drop PDF files or click to browse. For each PDF, paperKB will:

  1. Extract text using pdftotext.
  2. Search for a DOI in the first few pages.
  3. If a DOI is found, fetch full metadata from CrossRef/PubMed.
  4. If no DOI is found, create a source using the filename as the title.
  5. Chunk the extracted text for search and embedding.
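
Steps 2 to 4 can be sketched in Python as follows. This is an illustration under stated assumptions, not paperKB's implementation: the regular expression, the trailing-punctuation heuristic, and the names find_doi and describe_source are all hypothetical, and the text is assumed to have been extracted already (for example by pdftotext).

```python
import re
from pathlib import Path
from typing import Optional

# Heuristic DOI pattern: "10.", a registrant code, a slash, then non-space characters.
DOI_PATTERN = re.compile(r"\b10\.\d{4,9}/[^\s\"<>]+")


def find_doi(text: str) -> Optional[str]:
    """Return the first DOI-like string in the text, or None."""
    match = DOI_PATTERN.search(text)
    if match is None:
        return None
    # Trailing punctuation usually belongs to the sentence, not the DOI.
    return match.group(0).rstrip(".,;)")


def describe_source(pdf_path: str, first_pages_text: str) -> dict:
    """Pick the DOI if one is found, else fall back to the filename as title."""
    doi = find_doi(first_pages_text)
    if doi is not None:
        # The title would be filled in later by a metadata lookup.
        return {"doi": doi, "title": None}
    return {"doi": None, "title": Path(pdf_path).stem}
```

The fallback explains why a PDF without a detectable DOI appears in the KB under its filename: renaming the file before upload gives it a more useful title.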

URL import

Paste the URL of a paper. paperKB will attempt to extract metadata and content from the page.

Zotero sync

Connect your Zotero library from Settings → External Keys:

  1. Generate an API key at zotero.org/settings/keys (the key needs read access to your library and files).
  2. Add it in Settings → External Keys with service "Zotero".
  3. On the Import page, select the Zotero tab, choose a library and collection, and import.
  4. You can also set up auto-sync to keep a Zotero collection in sync with your KB.

What happens after import

After a paper is imported, a background worker processes it:

  1. Text extraction — full text is parsed from PMC XML or PDF.
  2. Chunking — text is split into overlapping passages (~200 words each).
  3. Embedding — each chunk is embedded for semantic search.
  4. Observation extraction — an LLM extracts structured entities and relationships.
  5. Citation graph — references are parsed and linked.
  6. PageRank — citation-based importance scores are computed.
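
As an illustration of the chunking step, here is a minimal Python sketch of splitting text into overlapping word windows. The 200-word size follows the description above; the 40-word overlap and the function name chunk_words are assumptions, since the actual overlap used by paperKB is not stated here.

```python
from typing import List


def chunk_words(text: str, size: int = 200, overlap: int = 40) -> List[str]:
    """Split text into passages of up to `size` words.

    Consecutive passages share `overlap` words. The overlap default is an
    assumed value chosen for illustration.
    """
    if size <= 0 or not 0 <= overlap < size:
        raise ValueError("size must be positive and 0 <= overlap < size")
    words = text.split()
    step = size - overlap  # how far the window advances each time
    chunks = []
    for start in range(0, len(words), step):
        chunks.append(" ".join(words[start:start + size]))
        # Stop once the window has reached the end of the text.
        if start + size >= len(words):
            break
    return chunks
```

Overlap means a sentence that straddles a chunk boundary still appears whole in at least one passage, which tends to help search and embedding quality.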

Monitor progress on the Status tab in KB settings.

Bulk operations

From the Docs list, you can select multiple papers and:

  • Add them to a collection
  • Remove them from a collection
  • Remove them from the KB entirely