
Filesystem sync + cloud storage + android app

I want to share some ideas I have for after the 1.10 release. One of the most requested features is better synchronization. Google Reader did provide one option, but it is about to be shut down. TinyTinyRSS is another option, but it requires sysadmin skills to set up. Native Liferea synchronization would be a better choice. This post collects some first ideas on how to do it in a very easy, simple and light-weight way.

Sync Services

IMO the easiest solution, and one often suggested by users, would be SFTP / WebDAV / cloud storage that the user mounts and that Liferea just syncs files to. Some well-known options would be:
Service      Free storage   Availability
Ubuntu One   5GB            Native in Ubuntu, known to work elsewhere too
Dropbox      2GB            Packages for Debian, Ubuntu, Fedora (link)
SpiderOak    2GB            Packages for Debian, Fedora, Slackware (link)
Wuala        2GB            Installer for Debian, Ubuntu, Fedora, Redhat, Centos, OpenSuse (link)

Sync Schema

At the moment I'm wondering how to implement a synchronization schema that can sync several Liferea instances and a mobile client. Consider this simple schema as a starting point:

[Figure: Sync Concept Chart]

The most important implicit points:
  1. We do not sync all items. Just the reading state!!!
  2. Users want synchronization so they do not read things twice
  3. Users want synchronization so they never lose their subscriptions
  4. Users want synchronization to keep important stuff (flagged items and newsbin)
  5. Only one client at a time. Locking with lock expiration.
  6. We rely on RSS/Atom GUIDs to have consistent item IDs across different sync clients
  7. We simplify read state sync by only listing unread ranges
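
Point 7 could be sketched as follows. The post does not say what a range runs over, so this is only one possible reading: it assumes each feed's items can be put in a stable order and numbered, which is an assumption of this sketch and not something the schema above defines. The names `to_ranges` and `is_unread` are hypothetical.

```python
from typing import Iterable, List, Tuple

Range = Tuple[int, int]  # inclusive (start, end) item positions


def to_ranges(unread: Iterable[int]) -> List[Range]:
    """Collapse unread item positions into inclusive (start, end) ranges."""
    ranges: List[Range] = []
    for pos in sorted(set(unread)):
        if ranges and pos == ranges[-1][1] + 1:
            # Position is adjacent to the current run: extend it.
            ranges[-1] = (ranges[-1][0], pos)
        else:
            # Gap found: start a new run.
            ranges.append((pos, pos))
    return ranges


def is_unread(pos: int, ranges: List[Range]) -> bool:
    """An item is unread if its position falls inside any listed range."""
    return any(start <= pos <= end for start, end in ranges)
```

With such an encoding, a feed where only a handful of recent items are unread needs a single short entry, and everything outside the listed ranges is implicitly read.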

XML File Layout

So an implementation might be working with a set of simple XML files in a directory structure like this:
clients/
clients/01d263e0-dde9-11e2-a28f-0800200c9a66.xml    (Client 1 state)
clients/0b6f7af0-dde9-11e2-a28f-0800200c9a66.xml    (Client 2 state)
clients/lock.xml                                    (might be missing)
data/feedlist.opml
data/read-states.xml
data/items/flagged-chunk1.xml
data/items/flagged-chunk2.xml
data/items/flagged-chunk3.xml
data/items/newsbin-wtwxzo34-chunk1.xml
data/items/newsbin-wtwxzo34-chunk2.xml

Sync Logic

Each client can read the files at any time and relies on them being written atomically. Since they are XML, incomplete files should not parse; in that case the client can cancel the sync and read again later. If a client can obtain all files it should:
  • Check if sync replay is needed (another client wrote more recently)
  • Merge changes in the feed list, fetch new feeds initially
  • Merge all read states that do not match yet
  • Merge all new flagged item chunks.
  • Merge all new newsbin chunks.
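
As a rough illustration of the reading side, here is a minimal Python sketch of "cancel if a file does not parse" plus the feed list merge. It assumes data/feedlist.opml is ordinary OPML where each subscription is an outline element with an xmlUrl attribute; the function names are hypothetical, and only newly added feeds are handled (removals and folder structure are left out).

```python
import xml.etree.ElementTree as ET
from typing import Dict, Optional


def load_xml(path: str) -> Optional[ET.Element]:
    """Return the parsed root, or None if the file is missing or incomplete."""
    try:
        return ET.parse(path).getroot()
    except (ET.ParseError, OSError):
        # A half-written file is not well-formed XML: skip this sync round.
        return None


def feed_urls(opml_root: ET.Element) -> Dict[str, str]:
    """Map xmlUrl -> title for every outline that carries an xmlUrl."""
    feeds: Dict[str, str] = {}
    for outline in opml_root.iter("outline"):
        url = outline.get("xmlUrl")
        if url:  # folders have no xmlUrl and are skipped
            feeds[url] = outline.get("title") or outline.get("text") or url
    return feeds


def new_subscriptions(local: ET.Element, remote: ET.Element) -> Dict[str, str]:
    """Feeds present in the synced list but not yet subscribed locally."""
    known = feed_urls(local)
    return {u: t for u, t in feed_urls(remote).items() if u not in known}
```

A client would call load_xml on every file first and only start merging once none of them returned None; the feeds returned by new_subscriptions are the ones to fetch initially.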
If a client wants to sync (e.g. periodically, on user request or on shutdown) it should:
  • Acquire the lock (even though this might not make sense on directories that are synced with a delay)
  • Update the client meta data
  • Update read states
  • Add new flagged/newsbin item chunks if needed.
  • Remove items from older chunks if needed.
  • Join chunk files if there are too many.
  • Release the lock.
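
The writing side might look roughly like the sketch below: a lock with expiration stored in clients/lock.xml, and atomic writes via a temporary file that is renamed into place. The content of lock.xml is not defined in this post, so the lock element with its client and expires attributes, the 300 second lifetime and all function names are assumptions made for illustration.

```python
import os
import tempfile
import time
import xml.etree.ElementTree as ET
from typing import Optional

LOCK_TTL = 300  # seconds until a lock expires (hypothetical value)


def write_atomic(path: str, root: ET.Element) -> None:
    """Write XML to a temp file in the same directory, then rename it over the target."""
    fd, tmp = tempfile.mkstemp(dir=os.path.dirname(path) or ".", suffix=".tmp")
    try:
        with os.fdopen(fd, "wb") as f:
            ET.ElementTree(root).write(f, encoding="utf-8", xml_declaration=True)
        os.replace(tmp, path)  # readers see either the old or the new file
    except Exception:
        os.unlink(tmp)
        raise


def acquire_lock(sync_dir: str, client_id: str, now: Optional[float] = None) -> bool:
    """Take the lock unless another client holds one that has not expired."""
    now = time.time() if now is None else now
    path = os.path.join(sync_dir, "clients", "lock.xml")
    try:
        holder = ET.parse(path).getroot()
        if holder.get("client") != client_id and float(holder.get("expires", "0")) > now:
            return False
    except (OSError, ET.ParseError, ValueError):
        pass  # missing or unreadable lock file: treat the lock as free
    write_atomic(path, ET.Element("lock", client=client_id, expires=str(now + LOCK_TTL)))
    return True


def release_lock(sync_dir: str, client_id: str) -> None:
    """Remove lock.xml, but only if this client is the current holder."""
    path = os.path.join(sync_dir, "clients", "lock.xml")
    try:
        if ET.parse(path).getroot().get("client") == client_id:
            os.remove(path)
    except (OSError, ET.ParseError):
        pass
```

Note that the check and the write in acquire_lock are not one atomic step, and on a directory that syncs with a delay two clients can both believe they hold the lock. The expiration only makes sure that a crashed client does not block everyone else forever.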