Initial Prototype Spec for Journal

To be implemented as an extension for web browsers (Chrome and Firefox). A mobile app will follow shortly after the web version works.

For clarity: each audio file is a stream, and an event is a length of time covered by one or more streams.
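
A minimal sketch of these two terms as data, written in Python purely for illustration. The class names and fields (Stream, Event, owner, start, duration) are assumptions and not part of this spec.

```python
from dataclasses import dataclass, field
from datetime import datetime, timedelta


@dataclass
class Stream:
    """One recorded audio file (hypothetical fields)."""
    owner: str           # user who recorded the stream
    start: datetime      # world time at which the recording began
    duration: timedelta  # length of the recording

    @property
    def end(self) -> datetime:
        return self.start + self.duration


@dataclass
class Event:
    """A length of time covered by one or more streams."""
    streams: list[Stream] = field(default_factory=list)

    @property
    def start(self) -> datetime:
        # The event begins when its earliest stream begins.
        return min(s.start for s in self.streams)

    @property
    def end(self) -> datetime:
        # The event ends when its last stream ends.
        return max(s.end for s in self.streams)


mic = Stream("sam", datetime(2015, 3, 1, 10, 0, 0), timedelta(minutes=30))
other = Stream("alex", datetime(2015, 3, 1, 10, 5, 0), timedelta(minutes=40))
event = Event([mic, other])
```

Here the event runs from the start of the earliest stream to the end of the latest one.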



The extension will let the user record microphone audio and computer audio as separate channels, started with a single button click. The computer audio is used to line up (sync) the recording with other users’ recordings and to determine which streams belong together as an ‘event’. This grouping will need to become more sophisticated later, with permissions and so on. For the prototype, all data is open, since the prototype will be used to build this project itself and the project is open.
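
The spec does not say how the computer audio will be used to line up recordings. One common approach, shown here only as an assumption, is cross-correlation: slide one recording against the other and keep the shift where the shared audio matches best. The function name best_offset and the brute-force search are illustrative, and a real implementation would likely use an FFT-based library.

```python
def best_offset(reference: list[float], other: list[float], max_lag: int) -> int:
    """Return the lag (in samples) at which `other` best matches `reference`.

    A positive lag means `other` started recording `lag` samples
    after `reference` did.
    """
    best_lag, best_score = 0, float("-inf")
    for lag in range(-max_lag, max_lag + 1):
        score = 0.0
        for i, sample in enumerate(other):
            j = i + lag  # sample i of `other` is compared with sample j of `reference`
            if 0 <= j < len(reference):
                score += sample * reference[j]
        if score > best_score:
            best_lag, best_score = lag, score
    return best_lag


# Toy data: the second recording starts three samples into the first.
reference = [0.0, 0.0, 1.0, 3.0, -2.0, 4.0, 0.0, 0.0, 0.0, 0.0]
other = reference[3:]
lag = best_offset(reference, other, max_lag=5)
```

With the toy data above, the search recovers the three-sample delay between the two recordings.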


Main Screen Tabs:

•  Record Event
•  Stop Recording
•  List of previously used tags to click on to add
•  Space to enter new tags. Tags always take the form of a tag category and tag data, such as ‘speaker:Sam’. There will be fields for (at least) the following, which will be entered into the system as, e.g., ‘topics:data sharing’:
•   Known participants
•   Topics
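
The ‘category:data’ tag form described above could be handled roughly as follows. This is a sketch only; the function names parse_tag and format_tag are hypothetical, and rejecting tags with an empty category or empty data is an assumption.

```python
def parse_tag(raw: str) -> tuple[str, str]:
    """Split a tag such as 'speaker:Sam' into (category, data)."""
    category, sep, data = raw.partition(":")
    category, data = category.strip(), data.strip()
    if not sep or not category or not data:
        raise ValueError(f"expected 'category:data', got {raw!r}")
    return category, data


def format_tag(category: str, data: str) -> str:
    """Build the stored form of a tag from its two fields."""
    return f"{category.strip()}:{data.strip()}"


speaker = parse_tag("speaker:Sam")
topic = format_tag("topics", "data sharing")
```

Splitting on the first colon only means the data part may itself contain colons, such as a time.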

My Streams
•  List of all events (audio files to start) with option to Rename. Shows status of upload to server and which server is used.
•  Way to add documents of any kind to the stream, such as text to-do lists, with the option to tag each with a ‘used at’ time, which will differ from the ‘created’ time in the document itself.

My Contacts
•  Shows a list of all my contacts/friends with icons showing if they are sharing with me.
•  Dialog for adding friends. Responding to friend requests should also be possible here, or via email.

•  Enter username and password
•  Allow sharing of location of recorded data
•  Specify where to store data, with login information for each service (Google Docs, Dropbox, or an Amazon S3 bucket to start)



This is where the user can combine streams into ‘events’, which are the equivalent of EDLs (Edit Decision Lists) in video editing: descriptions of which media are to be used, which portions, and when.
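
As a rough illustration of the EDL idea, an event could be stored as a list of entries, each naming a stream, the portion of it to use, and when that portion plays. The Clip class and its field names are assumptions made for this sketch, not a format defined by the spec.

```python
from dataclasses import dataclass


@dataclass
class Clip:
    """One EDL-style entry (hypothetical fields)."""
    stream_id: str      # which stream (media) to use
    source_in: float    # seconds into the stream where the portion begins
    source_out: float   # seconds into the stream where the portion ends
    event_start: float  # seconds into the event at which the portion plays
    muted: bool = False

    @property
    def length(self) -> float:
        return self.source_out - self.source_in


def event_length(clips: list[Clip]) -> float:
    """Total length of the event: the latest point at which any clip ends."""
    return max((c.event_start + c.length for c in clips), default=0.0)


clips = [
    Clip("a", source_in=0.0, source_out=600.0, event_start=0.0),
    Clip("b", source_in=30.0, source_out=330.0, event_start=120.0, muted=True),
]
total = event_length(clips)
audible = [c.stream_id for c in clips if not c.muted]
```

Because the entries only describe the media, the underlying audio files are never modified when an event is edited.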

The events are edited/packaged by the user and are also imported from other users. This is an area where more user testing will be crucial.

This tab allows the user to search and browse any available timelines. Here the user can play back any audio or video, laid out top to bottom rather than left to right so that text can flow alongside it, and decide what to play and what to mute, much as a music editor like GarageBand or Logic handles musical timelines.

Done Recording/Upload

Once done, the website will list the events, and users can choose to send an event for transcription by a known company, which will use this interface to do the transcription in an environment that labels each speaker. The invoice will go to whoever clicks the ‘transcribe’ button, at something like $1.25 per minute ($0.25 of which is for the time coding).
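
The pricing arithmetic can be sketched as below, using the approximate rates quoted above. Billing by the second and rounding to the nearest cent are assumptions; the spec does not say how partial minutes are charged.

```python
from decimal import Decimal, ROUND_HALF_UP

RATE_PER_MINUTE = Decimal("1.25")         # approximate total rate from the spec
TIME_CODING_PER_MINUTE = Decimal("0.25")  # portion of the rate for time coding
CENT = Decimal("0.01")


def transcription_cost(duration_seconds: int) -> dict[str, Decimal]:
    """Estimate the invoice for transcribing a recording of the given length."""
    minutes = Decimal(duration_seconds) / Decimal(60)
    total = (minutes * RATE_PER_MINUTE).quantize(CENT, rounding=ROUND_HALF_UP)
    time_coding = (minutes * TIME_CODING_PER_MINUTE).quantize(CENT, rounding=ROUND_HALF_UP)
    return {
        "total": total,
        "time_coding": time_coding,
        "transcription": total - time_coding,
    }


quote = transcription_cost(45 * 60)  # a 45-minute event
```

For a 45-minute event this gives a total of $56.25, of which $11.25 is for time coding.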

Users will be notified when the transcription is done and will be able to visit the website to do keyword searches and scroll through the full text and/or the audio, which will be in sync.

The Audio

The audio file will be named after the start time and date, with the duration in the title and ‘a’ at the end. If the system recorded more than one stream, then ‘b’ and so on will be appended to the names of the other streams. The primary, highest-quality stream will always be ‘a’.
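
One possible reading of this naming rule is sketched below. The exact timestamp layout, the duration format, and the file extension are all assumptions, since the spec only fixes the order of the parts (start time and date, duration, stream letter).

```python
import string
from datetime import datetime, timedelta


def stream_filename(start: datetime, duration: timedelta, index: int, ext: str = "wav") -> str:
    """Name a stream file; index 0 is the primary stream 'a', 1 is 'b', and so on."""
    if not 0 <= index < 26:
        raise ValueError("this sketch only covers streams 'a' to 'z'")
    letter = string.ascii_lowercase[index]
    total = int(duration.total_seconds())
    hours, rem = divmod(total, 3600)
    minutes, seconds = divmod(rem, 60)
    stamp = start.strftime("%Y-%m-%d_%H-%M-%S")
    return f"{stamp}_{hours:02d}h{minutes:02d}m{seconds:02d}s_{letter}.{ext}"


start = datetime(2015, 3, 1, 10, 0, 0)
length = timedelta(minutes=30, seconds=5)
primary = stream_filename(start, length, 0)
secondary = stream_filename(start, length, 1)
```

Putting the date first in year-month-day order means the files also sort chronologically by name.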

The recorded audio is tagged, in the document metadata, with the created time at the start of the recording (using the correct time from a publicly available time server) as well as the completed time. A time stream running along the document, aligning document time with world time, will be added to the meta-information of the document (the EXIF data), so that any annotations added to the document are also placed on the real-world timeline.
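
The alignment between document time and world time can be illustrated with a simple offset from the created time. This sketch assumes the recording runs at a constant rate with no gaps, and the function names are hypothetical.

```python
from datetime import datetime, timedelta, timezone


def world_time(created: datetime, offset_seconds: float) -> datetime:
    """World time of a point `offset_seconds` into the recording."""
    return created + timedelta(seconds=offset_seconds)


def document_offset(created: datetime, moment: datetime) -> float:
    """Seconds into the recording that correspond to a world-time moment."""
    return (moment - created).total_seconds()


created = datetime(2015, 3, 1, 10, 0, 0, tzinfo=timezone.utc)
annotation_at = world_time(created, 754.5)            # annotation made 12 min 34.5 s in
back_again = document_offset(created, annotation_at)  # and mapped back to document time
```

Storing the created time in UTC, as assumed here, keeps recordings from users in different time zones on one timeline.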

This recorded audio goes to a server, in the highest quality possible, and a link is generated. With Dropbox this is easy, and it is probably similar with Amazon or Google storage.

The various feeds from the same ‘event’ are combined into a multi-stream in which the listener can see who is who, based on who recorded which stream. This means that transcription can ‘know’ who is who as well.
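
A simplified sketch of this combining step follows. It groups streams purely by overlapping time, which is an assumption: the spec intends the shared computer audio to help decide what belongs together. The tuple layout (owner, start, end) is also hypothetical.

```python
def group_into_events(streams: list[tuple[str, float, float]]) -> list[list[tuple[str, float, float]]]:
    """Group (owner, start_seconds, end_seconds) streams that overlap in time."""
    events = []
    current, current_end = [], None
    for stream in sorted(streams, key=lambda s: s[1]):
        owner, start, end = stream
        if current and start <= current_end:
            # Overlaps the event being built, so it joins that event.
            current.append(stream)
            current_end = max(current_end, end)
        else:
            # No overlap: close the previous event and start a new one.
            if current:
                events.append(current)
            current, current_end = [stream], end
    if current:
        events.append(current)
    return events


streams = [("sam", 0, 1800), ("kim", 5000, 5600), ("alex", 300, 2700)]
events = group_into_events(streams)
# Each stream's owner doubles as the speaker label for that stream.
speakers = [[owner for owner, _, _ in event] for event in events]
```

In this example the recordings by Sam and Alex overlap and form one event, while Kim's later recording forms its own.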



Proposed flow:

different journal architecture to discuss