<h1><img src="icon.png" alt="Alt text" width="40" height="40"> Logseq PDF Extract </h1>

A plugin for improving PDF workflow in Logseq. It now mainly features:

And more features are planned. PRs are welcome!

🛠 Installation

Search for "PDF Extract" in the Logseq plugin store and install it. Alternatively, install it manually by downloading the latest release from GitHub Releases.

If you are using this plugin for the first time, follow these steps after installation:

<details> <summary>❗ To enable TeX OCR of area highlights</summary> </details> <details> <summary>❗ To enable Zotero-related features </summary>

Make sure Zotero is running and the plugin ZotServer is successfully installed in Zotero:

Now you can import items by using the slash command /PDF: show search panel or by pressing Ctrl+Alt+z. The shortcut only works in editing mode.

To view the imported PDFs, you might need to specify two paths in your settings:

<details> <summary> Two things possibly required by PDF "open" buttons: </summary>

And that's it! 🎉

This tells Logseq where to look for the PDFs managed by Zotero. Otherwise, Logseq might crash when you click a PDF open button, because the PDF file is outside the current graph folder. For the detailed mechanism, see my explanation in this PR.

</details> </details>

🚀 A Quick Guide

1. Import Zotero Items 📚

For a comparison between this and Logseq's native /Zotero command, see #6.

Currently, this plugin supports quickly importing the items selected in Zotero, or importing items found by searching in a popup panel, as shown below. More features are planned, and PRs are welcome. If Better BibTeX is enabled in Zotero, citation keys can also be imported (experimental; see alias_citationKey in Settings for details).

Import items selected in Zotero

Search panel:

An item page is created during import, but page creation is aborted if the page @{original-title}.md already exists. By default, the panel is initialized with the items currently selected in Zotero. The search panel is responsive: it queries Zotero automatically once you stop typing for a while (customize search_delay in Settings). It matches any part (or combination) of the following (according to Zotero's documentation):

Some examples:

2. Annotation Extraction 📝

For any highlight, this feature replaces ((uuid)) with its linked content, wrapped in a customizable template. For area highlights, $\LaTeX$ OCR is performed first and its result is used as the content (experimental). Batch extraction is supported.

Highlights extraction

<blockquote> <details> <summary><bold>Text Highlights from PDF</bold></summary> Here we explain what happens when you use `Ctrl+Alt+i` to convert `((uuid))` links in a block.

In the default case,

- ((uuid))

will be converted to

- pdf-ref:: ((uuid))
  > The original content of ((uuid))
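
The conversion above can be sketched in TypeScript as follows. This is a minimal illustration, not the plugin's actual code: the function name `extractHighlights`, the `lookup` map (UUID to highlighted text, assumed to be fetched elsewhere), the lowercase UUID pattern, and listing every converted reference in `pdf-ref::` are hypothetical choices made for the example. The excerpt is wrapped with the same `> {{excerpt}}` form as the default `excerpt_style` template.

```typescript
// Hypothetical sketch: turn ((uuid)) block references into excerpts and
// record the converted references in a pdf-ref:: property line.
const UUID_REF =
  /\(\(([0-9a-f]{8}-[0-9a-f]{4}-[0-9a-f]{4}-[0-9a-f]{4}-[0-9a-f]{12})\)\)/g;

function extractHighlights(
  block: string,
  lookup: Map<string, string>, // uuid -> highlighted text (assumed input)
  template = "> {{excerpt}}",
): string {
  // Only references we can resolve are converted.
  const refs = [...block.matchAll(UUID_REF)]
    .map((m) => m[1])
    .filter((uuid) => lookup.has(uuid));
  if (refs.length === 0) return block; // nothing to extract

  // A replacer callback avoids "$" patterns being interpreted in the text.
  const body = block.replace(UUID_REF, (whole: string, uuid: string) => {
    const text = lookup.get(uuid);
    return text === undefined ? whole : template.split("{{excerpt}}").join(text);
  });

  const props = refs.map((uuid) => `((${uuid}))`).join(", ");
  return `pdf-ref:: ${props}\n${body}`;
}
```

For a block containing only `((uuid))`, this yields the two-line `pdf-ref:: ((uuid))` / `> ...` form shown in the default example.
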
</details> <details> <summary><bold>Area Highlights from PDF</bold></summary>

It's possible to extract TeX formulas from area highlights. The OCR service is provided by Hugging Face. The OCR model is Norm/nougat-latex-base.

Two ways to invoke OCR:

</details> </blockquote> </details> </blockquote> <br> <details> <summary><h3>3. Open PDF from Any Path (under development 🚧)</h3></summary>

With Zotero integration enabled, we can open PDFs under Zotero's linked attachment base directory even if they are not in the assets folder. Logseq provides a macro {{zotero-linked-file your_pdf_path}} which is rendered as a button. <br> <img src="pdfOpenButton.png" width="250" > <br>

Here is how we could take advantage of it:

Caution! Buttons are delicate. If Logseq cannot find the PDF specified by a button, it may crash (possibly losing data). Dynamic updating might be implemented in the future, but there is no easy solution so far. One idea is to record the Zotero item key so that the button can be updated from Zotero. PRs or ideas are welcome.

<details> <summary>How it works and when I use it.</summary>

> Personally, I love this hack because, by creating multiple profiles, we could in principle open any PDF no matter where it is located on your PC. For example, we could insert buttons as "bookmarks" linked to any PDF without importing them. However, this feature depends on enhancements to the multi-profile feature, as proposed in [this PR](https://github.com/logseq/logseq/pull/10430). Without them, it's better to ignore this function.
>
> Maybe with more Logseq APIs published in the future, we could create various buttons, such as a button that links to a specific page of a PDF, or even a "non-highlight" button that eliminates the need for highlighting. If you have any ideas, PRs are welcome.

</details> </details>

⚙ Settings

search_delay

The default delay between the user's input and the search is 100 ms.

To optimize performance and avoid unnecessary queries from the responsive search panel, a delay is added between the user's input and the query: the panel waits for the specified duration after the user stops typing in the search box before starting a new search in Zotero. This prevents a search from being triggered on every keystroke, which reduces unnecessary load. If your Zotero library has relatively few items, feel free to reduce the delay as much as you like.
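
The delay described above is a debounce. Below is a minimal TypeScript sketch using a hypothetical `debounce` helper, not the plugin's real code: every keystroke cancels the pending timer and starts a new one, so only the last input within `delayMs` reaches Zotero. The `Timer` parameter is an illustrative addition that lets a fake clock stand in for `setTimeout` in tests.

```typescript
// Hypothetical timer abstraction so the debounce logic can be tested
// without waiting for real time to pass.
interface Timer {
  set(callback: () => void, ms: number): unknown;
  clear(handle: unknown): void;
}

const realTimer: Timer = {
  set: (callback, ms) => setTimeout(callback, ms),
  clear: (handle) => clearTimeout(handle as ReturnType<typeof setTimeout>),
};

// Returns a wrapper that calls `fn` only after `delayMs` ms have passed
// without another call; any earlier pending call is cancelled.
function debounce<A extends unknown[]>(
  fn: (...args: A) => void,
  delayMs: number,
  timer: Timer = realTimer,
): (...args: A) => void {
  let pending: unknown = undefined;
  return (...args: A): void => {
    if (pending !== undefined) timer.clear(pending);
    pending = timer.set(() => {
      pending = undefined;
      fn(...args);
    }, delayMs);
  };
}
```

Typing "z", "zo", "zot" in quick succession would then send a single query for "zot" once the delay elapses; a smaller `search_delay` simply shortens that wait.
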

insert_button: insert PDF open button when importing Zotero items

Turn this on and you'll get a button to open a PDF every time you import an item from Zotero that has a PDF attached. If an item has more than one PDF, you'll get more than one button.
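
As a sketch of the one-button-per-PDF rule, assume a hypothetical `Attachment` shape with a `path` and a MIME `contentType` (the plugin's real data model may differ). Each PDF attachment is then mapped to one `{{zotero-linked-file ...}}` macro. The exact argument format (for example, whether the path must be quoted or prefixed) is an assumption of this example.

```typescript
// Hypothetical attachment shape, for illustration only.
interface Attachment {
  path: string;
  contentType: string;
}

// One open button per PDF attachment; other attachment types are skipped.
function pdfButtons(attachments: Attachment[]): string[] {
  return attachments
    .filter((a) => a.contentType === "application/pdf")
    .map((a) => `{{zotero-linked-file ${a.path}}}`);
}
```
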

alias_citationKey (Experimental)

Lots of people use Better BibTeX to handle BibTeX keys in Zotero.

If you turn this on, the citation key will be used as the alias for an item page. This idea came from sawhney17/logseq-citation-manager.

For example, if the citation key is Smith2021, then the item page will have alias:: [[Smith2021]]. Also, the item will be inserted as [[Smith2021]] at the cursor, instead of the full title.
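
The example above can be sketched in TypeScript, with two assumptions: item pages are named `@` plus the title (as in `@{original-title}` above), and without a citation key the inserted text is a link to that page. `itemRefs` and the `ItemRefs` shape are hypothetical names used only for illustration.

```typescript
// Hypothetical result shape for importing one item.
interface ItemRefs {
  pageName: string; // item page name, e.g. "@Some Title"
  properties: string[]; // extra property lines for the item page
  inserted: string; // text inserted at the cursor
}

function itemRefs(title: string, citationKey?: string): ItemRefs {
  const pageName = `@${title}`;
  if (!citationKey) {
    // No citation key: link to the item page by its full title.
    return { pageName, properties: [], inserted: `[[${pageName}]]` };
  }
  // With a citation key: use it as an alias and as the inserted link.
  return {
    pageName,
    properties: [`alias:: [[${citationKey}]]`],
    inserted: `[[${citationKey}]]`,
  };
}
```
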

unwanted_keys

A list of item page property keys that you don't want to import. For example, if you don't want to import original-title, then add original-title to the list. Separate keys with commas or newlines, like this:

original-title, date,
item-type
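
One way to interpret this setting is sketched below in TypeScript. The helper names `parseUnwantedKeys` and `dropUnwanted` are hypothetical, and the item's properties are assumed to be a flat key/value map. The sketch splits on commas and newlines, trims whitespace, ignores empty entries such as the one left by a trailing comma, and then drops matching keys.

```typescript
// Parse the unwanted_keys setting into a set of property keys.
function parseUnwantedKeys(setting: string): Set<string> {
  return new Set(
    setting
      .split(/[,\n]/) // commas or newlines separate keys
      .map((key) => key.trim()) // also strips a stray "\r" from CRLF input
      .filter((key) => key.length > 0), // skip empty entries, e.g. after "date,"
  );
}

// Return a copy of the item properties without the unwanted keys.
function dropUnwanted(
  props: Record<string, string>,
  unwanted: Set<string>,
): Record<string, string> {
  return Object.fromEntries(
    Object.entries(props).filter(([key]) => !unwanted.has(key)),
  );
}
```
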

excerpt_style: Template for Annotation Excerpts

This is where you decide how the inserted text should look. Use {{excerpt}} as a placeholder, which will be replaced by the excerpt. By default, it looks like this:

> {{excerpt}}

area_style: Template for Inserting TeX

When inserting TeX, you can also customize the style with a template. Two placeholders are provided, uuid and tex, which are replaced by the UUID of the area highlight and the recognized TeX, respectively. The default template is

((uuid))\n$$tex$$

For example, use $$tex$$ as the template, and the original area highlights will be replaced by the LaTeX OCR results. More complex templates using hiccup syntax might be possible, but I haven't tested them.
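
To make the placeholder substitution concrete, here is a minimal sketch with a hypothetical `fillAreaTemplate` helper. It assumes the `\n` in the default template stands for a line break and that placeholders are matched as plain substrings. The plugin's real matching rules may differ: in this sketch, a literal `tex` inside `\text` would also be replaced.

```typescript
// Hypothetical sketch of filling the area_style template.
function fillAreaTemplate(template: string, uuid: string, tex: string): string {
  // Substitute uuid first: a UUID is hex digits and dashes, so it can never
  // contain "tex". split/join also leaves "$" characters in the TeX untouched.
  return template.split("uuid").join(uuid).split("tex").join(tex);
}
```

With the default template, an area highlight keeps its `((uuid))` reference and gains a display-math block below it; with `$$tex$$`, only the math block remains.
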

Possible Improvements

Import as Logseq pages:

Search Panel:

Search Syntax:


Proof of concept:


Not planned yet

Known Issues

Q & A

How is this different from Logseq's native /Zotero command?

This plugin is designed as a fully local substitute for the /Zotero command.

It is also designed to be more user-friendly and feature-rich. For example, it supports importing multiple items at once and lets you search for items in a popup panel. It also supports importing citation keys and will offer a more customizable import process in the future.

How to customize the template for a page created by the plugin?

It is still under development. See this discussion. Please share there what exactly you want for a template (because I still don't understand the needs well).

So far, you can filter out unwanted properties of an item page; see unwanted_keys in Settings.

Why is the OCR service slow sometimes?

The OCR service is provided by Hugging Face, and it needs some time to initialize when it hasn't been used for a while.

Why can't I change the page title format @xxx in the Zotero settings?

One thing we should keep in mind is that the settings in your Zotero profile won't affect this plugin. Insert page name with prefix:, Notes under block of: and other options are not exposed to plugins in Logseq.

These settings can only influence the behavior of the native /Zotero command.

Acknowledgements

TeX OCR

Zotero API

Icon

Search Panel GUI

Coding Assistance

Support

Find this plugin useful? Buy me a coffee ☕️ (unfortunately, I lost access to my Buymeacoffee account; @Hira G., thanks for supporting me!) or support my favorite Logseq plugins and their developers. That's also a great help to me.

Both projects are not only feature-rich but also continue to evolve through active development.

Development