Harvest: WordPress Plugin

This plugin is intended for use by WordPress developers rather than average users. If you’d like help integrating it into your current site then please get in touch with us at support@vialibri.net.

If you are using WooCommerce or custom post types to list your books then we have other plugins that will take care of most of the integration for you. You can find the documentation for these on our Harvest page.

Our aim with this plugin is to make it as easy as possible to format book data correctly for harvest by viaLibri. All you need to do is implement two filters in your WordPress plugin or theme and our plugin will take care of the rest. For the plugin to function correctly both of these filters must be implemented.

If you’re not familiar with the use of filters in WordPress then it’s worth reading the documentation before going any further.

Installing the plugin

The plugin isn’t available from the WordPress repository and must be downloaded from us directly. You can get a zip file containing the most recent version of the plugin from here.

You can add this to your WordPress installation in the normal way. Don’t forget to activate it too!

Where do I put my filter implementation?

If you’re already building a custom theme or plugin for the site then feel free to add your implementation of the filters to that. If you haven’t added any custom code yet, then the best place to put your filter implementation is in a new plugin.

There’s plenty of detail in the WordPress documentation about creating a plugin, but you can get away with these steps:

  1. Create a file called vialibri-harvest-filters.php.
  2. Save it in the plugins directory of the site on the server. This is usually wp-content/plugins.
  3. Edit the file, putting <?php on the first line and /* Plugin Name: viaLibri Harvest Filters */ on the second line. The rest of the file will be your implementation of the filters described below.
  4. Activate the plugin in the site’s admin area.
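
As a rough starting point, the file from the steps above might look like the sketch below. The callback names my_vialibri_id_list and my_vialibri_book_data are placeholders invented for this illustration, and both functions return empty arrays until you fill them in using the sections that follow.

```php
<?php
/* Plugin Name: viaLibri Harvest Filters */

// Placeholder callback names (an assumption); any unique names will do.
function my_vialibri_id_list($ids) {
  // Replace with the IDs of all books currently for sale.
  return array();
}

function my_vialibri_book_data($books, $pageNumber, $pageSize) {
  // Replace with one page of book data, most recently updated first.
  return array();
}

// add_filter() only exists once WordPress has loaded, so this guard lets the
// file be linted or run on its own without a fatal error.
if (function_exists('add_filter')) {
  add_filter('vialibri_harvest_id_list', 'my_vialibri_id_list');
  add_filter('vialibri_harvest_book_data', 'my_vialibri_book_data', 10, 3);
}
```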

Checking your filter implementation

The plugin adds a new page to the Settings menu in your WordPress admin area called “viaLibri Harvest”. This page will show you all the data that’s collected from the filters and will also let you know if the plugin’s detected any problems with the format of the data.

Any books that have errors displayed on the settings page won’t be passed on to viaLibri, so it’s important to fix any errors that show up there.

The vialibri_harvest_id_list filter

Use this filter to supply an array containing the IDs of all the books that are currently for sale on your site. These are the internal IDs that your database system uses, and they correspond to the source_id values in the second filter outlined below. The IDs can be provided as strings or integers. The ordering doesn’t matter.

Examples

In this first example we just pass in a hard-coded array of values:

add_filter('vialibri_harvest_id_list', 'harvest_id_list');

function harvest_id_list($ids) {
  return [
    1234,
    1235,
    5673,
    8090
  ];
}

Note that we ignore the $ids parameter that’s passed in. This will always be an empty array and you can always ignore it.

Of course, on a real site you’d probably want to fetch the values from a database. Here’s what that might look like on a WooCommerce site:

add_filter('vialibri_harvest_id_list', 'harvest_id_list');

function harvest_id_list($ids) {

  $q = new WP_Query([
    'post_type' => 'product',
    'post_status' => 'publish', // Only books currently for sale.
    'fields' => 'ids', // Return the ID only, rather than the full object.
    'posts_per_page' => -1 // Turn off paging so that we get all books.
  ]);

  return $q->posts;
}

Check out the WP_Query documentation for more details of querying custom post types.

Note: We use WooCommerce as an example here, but if you’re using WooCommerce you might be better off using our dedicated WooCommerce plugin.

The vialibri_harvest_book_data filter

The plugin uses this filter to capture data on the books that are for sale on your site. You should return an array of books, with each book being represented by an associative array of values.

These books should be returned in descending order of the date they were last updated, so that the most recently updated book comes first.

This filter is different from the previous one in that it provides two extra parameters: a page number and a page size. The page number starts at 1 and you should use this parameter to return a page of books at a time. The page size parameter tells you how many books should be returned per page. The plugin will keep going through the pages until it runs out of books, so you’ll need to make sure that using a page number that’s too high does not result in an error and just returns an empty array.

Examples

A hard-coded example would look like this:

// Note the extra parameters here. These need to be included.
add_filter('vialibri_harvest_book_data', 'harvest_book_data', 10, 3);

function harvest_book_data($books, $pageNumber, $pageSize) {
  if($pageNumber === 1) {
    return array(
      array(
        'date_update' => '2016-12-08 19:10:05',
        'author' => 'George Orwell',
        'title' => 'Nineteen Eighty-Four',
        'description' => 'Nineteen Eighty-Four, often published as 1984, is a dystopian novel by English author George Orwell published in 1949. The novel is set in Airstrip One (formerly known as Great Britain), a province of the superstate Oceania in a world of perpetual war...',
        'source_id' => 137,
        'sku_dealer_item_id' => 'ABC123',
        'year' => '1949',
        'edition' => 'First edition',
        'publisher' => 'Secker & Warburg',
        'price' => 1234.56,
        'keywords' => 'dystopian, sci-fi', // Can be a string or an array.
        'isbn' => '9780547249643',
        'first_edition' => true,
        'signed' => false,
        'dust_jacket' => true,
        'item_url' => 'https://www.example.com/1984/',
        'image_url' => 'https://www.example.com/1984.jpg'
      )
    );
  } else {
    return array();
  }
}

Again we ignore the first parameter, $books. This will always be an empty array and you can always ignore it.
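
The ordering and paging rules described above can also be sketched with a plain array before bringing a database into it. The helper name harvest_page_of_books is made up for this illustration. It assumes every book already has a date_update value in the Y-m-d H:i:s format, which sorts correctly as a plain string.

```php
<?php
// Hypothetical helper: returns one page of $allBooks, newest update first.
function harvest_page_of_books(array $allBooks, $pageNumber, $pageSize) {
  // Y-m-d H:i:s strings sort chronologically when compared as plain strings.
  usort($allBooks, function ($a, $b) {
    return strcmp($b['date_update'], $a['date_update']);
  });

  // Page numbers start at 1, so page 1 begins at offset 0.
  $offset = (max(1, (int) $pageNumber) - 1) * (int) $pageSize;

  // array_slice() returns an empty array when the offset is past the end,
  // which is what a page number that's too high should produce.
  return array_slice($allBooks, $offset, (int) $pageSize);
}

$all = array(
  array('source_id' => 1, 'date_update' => '2016-12-01 10:00:00'),
  array('source_id' => 2, 'date_update' => '2016-12-08 19:10:05'),
  array('source_id' => 3, 'date_update' => '2016-11-20 08:30:00'),
);
$page1 = harvest_page_of_books($all, 1, 2); // source_id 2, then source_id 1
$page3 = harvest_page_of_books($all, 3, 2); // empty array
```

On a real site the database query would normally do the sorting and slicing, as in the WooCommerce example that follows.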

The following example shows what this filter might look like on a site using WooCommerce. It assumes that all product titles are formatted like “George Orwell: Nineteen Eighty-Four”, i.e. “Author: Title”.

This example only shows a very simple integration. If you are using custom product attributes to store other bits of book data then you’ll want to adapt this example to bring that data in too.

add_filter('vialibri_harvest_book_data', 'harvest_book_data', 10, 3);

function harvest_book_data($books, $pageNumber, $pageSize) {
  $q = new WP_Query(array(
    'post_type' => 'product',
    'post_status' => 'publish', // Only books currently for sale.
    'posts_per_page' => $pageSize,
    'paged' => $pageNumber,
    'order' => 'DESC',
    'orderby' => 'modified' // Most recently modified first.
  ));

  $books = array();

  foreach ($q->posts as $post) {
    $product = new WC_Product($post);

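    // get_the_terms() returns false or a WP_Error when a product has no
    // terms, so fall back to an empty array before mapping.
    $tag_terms = get_the_terms($post, 'product_tag');
    $cat_terms = get_the_terms($post, 'product_cat');
    $tags = array_map(
      function ($t) { return $t->name; },
      is_array($tag_terms) ? $tag_terms : array()
    );
    $cats = array_map(
      function ($t) { return $t->name; },
      is_array($cat_terms) ? $cat_terms : array()
    );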

    // We assume that all product titles are of the form "Author: Title".
    // You'll want to update this to fit your naming conventions.
    $split_title = explode(':', $product->get_title(), 2);
    $sku = $product->get_sku();
    $image_id = $product->get_image_id();

    $books[] = array(
      'date_update' => $post->post_modified, // Already in the right format.
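      // If the title has no colon, treat the whole string as the title.
      'author' => isset($split_title[1]) ? trim($split_title[0]) : '',
      'title' => trim(isset($split_title[1]) ? $split_title[1] : $split_title[0]),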
      'description' => $post->post_content,
      'source_id' => $post->ID,

      // Use SKU if available, and fall back to ID if there's no SKU.
      'sku_dealer_item_id' => $sku ? $sku : $post->ID,

      // get_price() takes sale prices into account.
      'price' => $product->get_price(),
      'keywords' => array_merge($cats, $tags),
      'item_url' => $product->get_permalink(),
      'image_url' => $image_id
                        ? wp_get_attachment_image_src($image_id, 'full')[0]
                        : null
    );
  }

  return $books;
}

Note: We use WooCommerce as an example here, but if you’re using WooCommerce you might be better off using our dedicated WooCommerce plugin.

Paging

The idea of paging the filter response is to reduce the load on your site and avoid timeouts when fetching large numbers of books. As the books are in reverse chronological order, we would ideally only have to fetch a page or two to get all the changes since we last checked.
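
To illustrate why the ordering matters, here is a rough sketch of how a consumer of the filter could stop early. It is not the plugin's actual code. The names collect_changes_since, $fetchPage and $lastChecked are invented for this illustration, with $fetchPage standing in for your vialibri_harvest_book_data callback.

```php
<?php
// Illustration only: gathers books updated after $lastChecked by walking
// the pages in order and stopping at the first book that hasn't changed.
function collect_changes_since(callable $fetchPage, $lastChecked, $pageSize) {
  $changed = array();

  for ($page = 1; ; $page++) {
    $books = $fetchPage(array(), $page, $pageSize);
    if (empty($books)) {
      break; // Ran out of books.
    }
    foreach ($books as $book) {
      // Books arrive newest first, so the first unchanged one ends the scan.
      if (strcmp($book['date_update'], $lastChecked) <= 0) {
        return $changed;
      }
      $changed[] = $book;
    }
  }

  return $changed;
}
```

If the books were returned in any other order, a scan like this could not stop early and every page would have to be fetched each time.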

Fields

Fields with stars (*) are required.

  • date_update * – The date that the book’s info was last updated. This must be in the format Y-m-d H:i:s, e.g. 2016-12-16 13:03:12. The timezone doesn’t matter and we don’t need to know what it is; you just need to use the same timezone consistently.
  • author – The author of the book.
  • title * – The title of the book.
  • description – A description of the book. This field should merge, in the preferred order, all data that is needed as part of the book description, such as publisher, condition, place of publication, edition, format, comments, publication date, etc. Data which is included in the edition and publisher fields will not be displayed unless it has also been added to the description field. However, it is possible for our system to automatically add the publisher and year fields to the start of all your descriptions. Let us know if you’d like us to turn that on for your account.
  • source_id * – Your database’s internal ID for the book.
  • sku_dealer_item_id * – Dealer’s inventory code for the book. This must always be provided, so if you don’t use any sort of inventory code just use source_id instead.
  • year – Publication year. This can be given as just four digits, but you may include some text as well. Any extra text will be ignored.
  • edition – A description of the edition of the book.
  • publisher – The book’s publisher.
  • price * – Either as a number or a string. This should not include a currency symbol or any extra formatting. Just the price.
  • keywords – A set of topics or areas that are relevant for this book. Can be provided as a string or an array.
  • isbn – The book’s ISBN (if it has one).
  • first_edition – A boolean representing whether this book is a first edition or not.
  • signed – A boolean representing whether this book is signed or not.
  • dust_jacket – A boolean representing whether this book has a dust jacket or not.
  • item_url * – The full URL for the book on your website.
  • image_url – The full URL for an image of the book. This should be the largest version of the image available, so if you’re using wp_get_attachment_image_src you need to pass 'full' as the image size. If you don’t have an image for this particular book then leave this field blank. Do not give us the URL for a placeholder image.
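
While developing, it can help to check your own data against the starred fields before looking at the settings page. The sketch below is one possible way to do that. The function name harvest_book_problems is made up, and the checks are only a reading of the field list above, so the errors shown on the “viaLibri Harvest” settings page remain the authority.

```php
<?php
// Hypothetical helper: returns a list of problems found in one book array.
function harvest_book_problems(array $book) {
  $problems = array();

  // The starred fields from the list above.
  $required = array(
    'date_update', 'title', 'source_id',
    'sku_dealer_item_id', 'price', 'item_url',
  );
  foreach ($required as $field) {
    if (!isset($book[$field]) || $book[$field] === '') {
      $problems[] = "Missing required field: $field";
    }
  }

  // date_update must round-trip through the Y-m-d H:i:s format.
  if (isset($book['date_update'])) {
    $raw = (string) $book['date_update'];
    $d = DateTime::createFromFormat('Y-m-d H:i:s', $raw);
    if (!$d || $d->format('Y-m-d H:i:s') !== $raw) {
      $problems[] = 'date_update is not in Y-m-d H:i:s format';
    }
  }

  // price should be a bare number with no currency symbol or separators.
  if (isset($book['price']) && !is_numeric($book['price'])) {
    $problems[] = 'price is not a plain number';
  }

  return $problems;
}
```

An empty array means none of these particular checks failed; it doesn't guarantee that the plugin will accept the book.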

All text fields (including description) should be provided as plain text, not HTML. We do make an effort to clean HTML elements from the text you supply, but including them may result in strange characters being displayed in your search listings on viaLibri.
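
If your descriptions are stored as HTML, one way to reduce them to plain text is sketched below using only built-in PHP functions. The helper name harvest_plain_text is invented for this example. Inside WordPress, the core function wp_strip_all_tags() does a similar job of removing tags.

```php
<?php
// Hypothetical helper: reduces an HTML fragment to plain text.
function harvest_plain_text($html) {
  // Turn common block-level breaks into spaces so words don't run together.
  $text = preg_replace('/<(br|\/p|\/div|\/li)\b[^>]*>/i', ' ', (string) $html);
  $text = strip_tags($text);
  // Decode entities such as &amp; into the characters they stand for.
  $text = html_entity_decode($text, ENT_QUOTES | ENT_HTML5, 'UTF-8');
  // Collapse runs of whitespace left behind by the removed markup.
  return trim(preg_replace('/\s+/', ' ', $text));
}
```

You could then wrap each text value in the helper when building a book array, for example 'description' => harvest_plain_text($post->post_content).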