Work in progress: January 2000
Sculpt by Chris Gunn is a standard VE Lab demonstration application. It shows two advanced user interface technologies, a force feedback haptic input device and a stereoscopic immersive display, being used for the task of sculpting a clay model. Sculpt has proved to be an easily approachable introduction to the technology with a high "wow!" factor for new users.
Within the VE Lab there are two projects extending Sculpt into a networked application running on multiple hosts. The first is to generate a standard MPEG-4 stream from within Sculpt that can be viewed by any MPEG-4 compatible viewer. The second, and the subject of this report, is netsculpt, a version that allows multiple participants on different workstations to work together on sculpting a single model. (The two projects are independent: netsculpt does not use MPEG-4.)
Netsculpt was designed to explore three design concepts in particular:
Netsculpt is now complete and stable on SGI IRIX workstations within the VE Lab local area network (LAN), but has not yet been tested over a wide area network (WAN). Communication between hosts is by IP multicast, using both unreliable and reliable protocols as appropriate. Collaborative peer to peer groups can be formed transparently to the users, although there is an unresolved problem with late joining. Overall, netsculpt is a successful demonstration of the key design concepts.
The design concepts behind netsculpt should be valid in other areas. While not a product itself, netsculpt has most of the essential characteristics that would be required for a real distributed CAD or VR system:
Netsculpt uses IP multicasting for all network communication. In multicasting, one transmitted packet is delivered to many receivers, in contrast to the better known and more widespread one to one unicast model of UDP and TCP. Multicast is used extensively in routing protocols and in network/service configuration and discovery systems such as JINI. It is also used for real time applications such as streaming audio and video where TCP is inappropriate. Netsculpt uses multicast for the same reasons: automatic configuration and real time transport of data.
The core IP multicast protocols are best effort (unreliable) like UDP, as opposed to the guaranteed, in-order delivery of TCP. There is currently no single Internet standard for reliable multicasting between applications. Netsculpt uses LRMP, the Lightweight Reliable Multicast Protocol developed at INRIA in France and also used in the Java Shared Data Toolkit.
As an IP protocol, multicast can potentially be used between hosts anywhere on the global Internet. In reality, multicast can only be assumed to work between hosts on a single LAN. While all operating systems since about 1995 have implemented multicast on LANs, wide area multicast requires support from routers as well. Multicast routing between LANs is not available at the VE Lab, hence the lack of WAN testing for netsculpt.
Netsculpt uses a single multicast address for all communication, and two different port numbers to distinguish between reliable and unreliable traffic. The address is configured at runtime rather than being hard coded, so multiple netsculpt sessions can exist without interference on a single network by using different addresses.
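The addressing scheme described above can be sketched in a few lines of Python. This is only an illustration and not the netsculpt code: the port numbers and function names are assumptions, the group address is the example quoted elsewhere in this report, and in the real system the reliable port would be handled by the LRMP library rather than a raw socket.

```python
import socket

# All values below are hypothetical examples, not netsculpt's real settings.
GROUP = "239.255.0.8"      # example group address, supplied at runtime
UNRELIABLE_PORT = 5000     # assumed port for unreliable traffic
RELIABLE_PORT = 5001       # assumed port for reliable traffic


def is_multicast(group: str) -> bool:
    """True if the address lies in the IPv4 multicast range 224.0.0.0/4."""
    return (socket.inet_aton(group)[0] & 0xF0) == 0xE0


def membership_request(group: str, interface: str = "0.0.0.0") -> bytes:
    """Build the 8-byte ip_mreq structure passed to IP_ADD_MEMBERSHIP."""
    return socket.inet_aton(group) + socket.inet_aton(interface)


def open_receiver(group: str, port: int) -> socket.socket:
    """Open a UDP socket bound to `port` and joined to `group`."""
    if not is_multicast(group):
        raise ValueError(f"{group} is not an IPv4 multicast address")
    sock = socket.socket(socket.AF_INET, socket.SOCK_DGRAM, socket.IPPROTO_UDP)
    # Allow several participants on one host to share the port.
    sock.setsockopt(socket.SOL_SOCKET, socket.SO_REUSEADDR, 1)
    sock.bind(("", port))
    sock.setsockopt(socket.IPPROTO_IP, socket.IP_ADD_MEMBERSHIP,
                    membership_request(group))
    return sock
```

A second session on the same network would call open_receiver with a different group address and the same two ports, which is why the address is a runtime parameter rather than a constant.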
Many distributed shared memory or remote procedure call systems have the goal of making communication over the network identical in semantics and usage to communication within the program itself. Netsculpt does not. Instead, the network is exposed to the application programmer and the design makes use of comparatively low level socket primitives rather than some form of abstraction layer. This has the disadvantage of requiring more work from the programmer, with the advantage of a closer fit between the application requirements and the communication protocols, important for a real time system. (Then again, the author is comfortable with such programming - others might prefer a toolkit approach.)
Multicast protocols, whether peer to peer or client/server, also have the advantage that they are not "fate-sharing" as is TCP. In a system based on unicast TCP, if one host crashes or otherwise exits, all other hosts connected must cope with the abrupt disconnection: sharing the crash. In a system based around multicast, an important host can be shut down and restarted, or replaced by another, without others being affected except by a short delay in response. Such transparency and roll-over capabilities are especially important for long-lived "24/7" applications.
Overall the use of multicast in netsculpt has been successful. The implementation has some ugly aspects, but solving these would not have been possible without extensive modifications to the existing code. Similar problems occur when adding network communication to any existing application, whether multicast or unicast.
Collaboration in netsculpt is built around multicast groups, creating a peer to peer system rather than a client/server (e.g. the Web) or player/source (MPEG-4) system. A group is identified by a unique multicast address that must be known to all participants. It is automatically created when the first netsculpt participant starts, and persists until every participant has exited. Multicast addresses can be registered in the DNS like any other Internet entity, so it would be possible to identify long-lived sessions by a name such as sculpt.act.cmis.csiro.au. To date netsculpt uses only raw IP numbers such as 239.255.0.8.
With all participants needing to know only a multicast group address, there is no server that must be running for the lifetime of the group, and nobody has to worry about whether one is running or not. Creating the group, joining the group, and exiting are all transparent to the participants. The group persists while any member remains active, even if the original creator of the group exits.
For a distributed system with active participants, being able to leave without notification is actually a problem. Netsculpt requires regular "heartbeat" packets from all participants so that those who have dropped out can be identified by their silence and removed from the shared state.
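As a rough sketch of how silence can be turned into a departure, the table below records when each participant was last heard from and drops anyone who has been quiet for too long. The class name, the peer identifiers, and the five second timeout are all assumptions made for illustration and are not taken from the netsculpt source.

```python
from typing import Dict, List

TIMEOUT_SECONDS = 5.0  # assumed value, chosen only for this example


class PeerTable:
    """Tracks when each participant was last heard from."""

    def __init__(self, timeout: float = TIMEOUT_SECONDS) -> None:
        self.timeout = timeout
        self.last_seen: Dict[str, float] = {}

    def heard_from(self, peer: str, now: float) -> None:
        """Record a heartbeat; an unknown peer is added to the table."""
        self.last_seen[peer] = now

    def expire(self, now: float) -> List[str]:
        """Remove and return the peers silent for longer than the timeout."""
        gone = sorted(p for p, t in self.last_seen.items()
                      if now - t > self.timeout)
        for p in gone:
            del self.last_seen[p]
        return gone
```

Calling expire once per frame with the current time would give the application the list of participants to remove from the shared state.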
For the small number of users in a demonstration system the advantages of multicast compared to a client/server design are not obvious. The major advantage is in an environment with numerous short-lived collaborative groups, since there is no need for a special server host. Forming such sessions is quicker and easier, and not dependent on one particular host remaining active. For a large scale system, removal of a single point of failure is an obvious advantage.
There is one unresolved problem related to groups in netsculpt: late joining. In the current implementation, participants may start up in any order, but all participants must have joined before any actual sculpting takes place. For the same reason, a participant cannot rejoin once they have left. The details of this problem are covered in a later section.
Other than the late joining problem, netsculpt successfully demonstrates that peer to peer multicast allows collaborative groups to be conveniently formed and can remove the need to code special server versions.
A collaborative system needs some way to indicate the presence of others: how many and what they are currently doing. Some systems such as Active Worlds use avatars or floating heads, others a list of usernames. In netsculpt the virtual presence of another user is represented by their sculpting stylus.
Every user of Sculpt is familiar with the stylus that appears on screen. Netsculpt extends this by displaying a stylus image for every other participant in the session as well. Like the user's own stylus, the position and orientation are updated in real time and each image shows the tool tip shape and current painting color. Each user can see at a glance roughly how many other people have joined the session, which part of the model they are working on, and what they are doing.
These remote stylus images are "ghosts" with no physical presence. Netsculpt does not do any collision detection or haptic feedback for intersecting stylus images. While this doesn't correspond to what would happen in the real world, from the user interface point of view there is no reason to slow users down by making them dodge around each other in close working conditions. And for the implementor, it is much easier to code.
All participants in a netsculpt session multicast their current stylus position, orientation, and tool type every frame. Unreliable multicast is used, since a dropped packet only causes a brief stutter in the stylus motion. These stylus packets also serve as "heartbeat" or "I'm still alive" indicators. A user may not be actively updating the model and still be part of the group, but if they stop sending stylus packets for a number of seconds the rest of the group assumes that they have exited and removes them from the display.
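A stylus packet of this kind can be very small. The following sketch shows one possible fixed-size layout using network byte order. The field list follows the description above, but the exact layout, the sender identifier, and the use of a quaternion for orientation are assumptions and do not describe the actual netsculpt wire format.

```python
import struct
from typing import NamedTuple, Tuple

# Hypothetical layout: sender id, position (x, y, z),
# orientation quaternion (w, x, y, z), tool type.
_FORMAT = "!I3f4fB"
PACKET_SIZE = struct.calcsize(_FORMAT)  # 4 + 12 + 16 + 1 bytes


class StylusState(NamedTuple):
    sender_id: int
    position: Tuple[float, float, float]
    orientation: Tuple[float, float, float, float]
    tool_type: int


def encode(state: StylusState) -> bytes:
    """Pack one stylus sample into a datagram payload."""
    return struct.pack(_FORMAT, state.sender_id, *state.position,
                       *state.orientation, state.tool_type)


def decode(data: bytes) -> StylusState:
    """Unpack a payload; raises struct.error if the size is wrong."""
    f = struct.unpack(_FORMAT, data)
    return StylusState(f[0], f[1:4], f[4:8], f[8])
```

Because every packet carries the complete current state rather than a change, a lost packet is simply superseded by the next one, which is what makes unreliable delivery acceptable here.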
Latency has not been detectable in netsculpt, which is not surprising given the fast LAN environment of the VE Lab. Latency problems are not expected to arise in a WAN environment either. Stylus movements are generally slow, so the decreased packet rate in a WAN can be compensated for by dead reckoning prediction. Styli do not collide, so inconsistent or missing updates will not cause any user interface discontinuities between different hosts.
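Dead reckoning in its simplest form is linear extrapolation from the last two received samples. The sketch below illustrates the idea for position only. It is not part of netsculpt, which has not needed prediction on the LAN, and the function names are hypothetical.

```python
from typing import Tuple

Vec3 = Tuple[float, float, float]


def estimate_velocity(p0: Vec3, t0: float, p1: Vec3, t1: float) -> Vec3:
    """Velocity implied by two timestamped position samples."""
    dt = t1 - t0
    if dt <= 0:
        raise ValueError("samples must be in increasing time order")
    return tuple((b - a) / dt for a, b in zip(p0, p1))


def predict(p_last: Vec3, velocity: Vec3, t_last: float, t_now: float) -> Vec3:
    """Extrapolate the last known position forward to time t_now."""
    dt = t_now - t_last
    return tuple(p + v * dt for p, v in zip(p_last, velocity))
```

Each frame the receiver would draw the remote stylus at the predicted position, and snap or blend to the true position when the next packet arrives.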
One improvement would be to transmit personal information, such as the user name, and make these details visible on request to other users. A stylus image is anonymous and only shows actions, not who is performing them. In the VE Lab this is not a problem since the hosts are physically co-located and participants can just talk to each other about what they are doing, but this is obviously not practical if the hosts, even on a LAN, are in different rooms.
Transmitting user input device values over the network is proving useful in other areas. The protocol and code used in netsculpt were first used to demonstrate remote stylus images being exchanged between Performer programs on IRIX and Java3D programs on Windows, and these earlier programs can eavesdrop on the stylus actions of a netsculpt session. (Although without being able to see the actual model updates this is not very interesting.) The same protocol is also being used to allow a video hand tracking system running on a PC to act as a 3D/6DOF input device for an SGI viewer program. The SGI program was written first, for use with a Polhemus or Phantom input device. Replacing these with the camera system required no changes at all to the viewer, not even a recompilation, since the network packet format is the same.
Netsculpt implements two types of model updates: geometry deformations (sculpting) and texture pixel changes (painting). Sculpt also allows for the sketching of 3D lines, but these are not distributed in netsculpt.
Netsculpt uses the LRMP protocol for reliable multicast. This is a member of the SRM family of protocols, in which reliability does not depend on any particular host. With LRMP, even if a host transmits an update and then immediately crashes, other hosts can retransmit that update should it be required. (Although the particular implementation of LRMP used by netsculpt does not implement this yet.) As with the underlying multicast group, reliable multicast protocols allow the shared state to persist even after the originator has left.
Netsculpt uses a simple application-specific protocol for transmitting updates. Changes to the geometry or texture of the model are batched into packets and transmitted every frame. So far, the rate at which data is generated by any copy of netsculpt has never exceeded 50 kilobytes per second. As with virtual presence, latency has not been detectable due to the high speed LAN used for testing.
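The batching step might look something like the following sketch, which packs the geometry changes made during one frame into as few packets as possible. The record layout (a vertex index and a displacement), the payload limit, and the function names are assumptions for illustration only. The real netsculpt update format is not described in this report.

```python
import struct
from typing import Iterable, List, Tuple

MAX_PAYLOAD = 1400                # assumed limit, below a typical Ethernet MTU
_RECORD = struct.Struct("!I3f")   # hypothetical: vertex index + (dx, dy, dz)

VertexUpdate = Tuple[int, float, float, float]


def batch_updates(updates: Iterable[VertexUpdate],
                  max_payload: int = MAX_PAYLOAD) -> List[bytes]:
    """Pack one frame's updates into packets of at most max_payload bytes."""
    per_packet = max_payload // _RECORD.size
    if per_packet < 1:
        raise ValueError("max_payload is smaller than one record")
    records = [_RECORD.pack(i, dx, dy, dz) for i, dx, dy, dz in updates]
    return [b"".join(records[k:k + per_packet])
            for k in range(0, len(records), per_packet)]


def unbatch(packet: bytes) -> List[VertexUpdate]:
    """Recover the individual records from one received packet."""
    return [_RECORD.unpack_from(packet, offset)
            for offset in range(0, len(packet), _RECORD.size)]
```

Each packet returned by batch_updates would then be handed to the reliable multicast layer for delivery to the group.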
Transmission of model updates is the area most likely to have problems for netsculpt in a WAN environment. There is a reason why there is no Internet standard for reliable multicast yet: it is very hard to do well. Netsculpt will probably not scale without a more extensive redesign of the core code.
Netsculpt does demonstrate that reliable multicast is practical for collaborative systems on small networks. There is more work for the application programmer than a comparable system using unicast TCP, but the extra effort is less than that required to add network capabilities to the application in the first place.
Any collaborative application requires consistency of data between the distributed systems to be useful, and the subject has been intensively researched in many application domains. In the case of netsculpt, there is a single 3D model which is being updated in real time by multiple users. Various approaches to the consistency problem have been discussed at length in the VE Lab. Netsculpt adopts the well known "Ostrich Algorithm" of ignoring a potential problem in the hope that it won't occur. There is no locking, turn-taking, or other synchronisation between the participants in a netsculpt session. Updates are simply applied to the model as they arrive.
This algorithm - or rather, the lack of any - has been successful in netsculpt. It was also adopted in the Distributed Interactive Virtual Environment (DIVE) system in the upgrade from version 2 to version 3. For the implementor, exposing the network in this way makes the code considerably simpler. It also makes the distributed system as a whole more robust, since there is no need to consider tokens being lost, locks claimed and never released, or other problems.
Simultaneous updates are indeed possible in netsculpt, and the behaviour of the application if this happens is not predictable due to network latency. But while this may be disturbing to anyone interested in the implementation of distributed systems, it is not an issue for the application users because it matches their mental model of how the system should work. The stylus images in netsculpt show the proximity of other users and hence the possibility of interference. Who gets to update a particular area can then be resolved informally using communication channels outside the scope of netsculpt. If two users do try to update at the same time anyway, the result is unpredictable but this is exactly what the users would expect to happen in the real world and therefore not seen as a defect in the application.
Netsculpt demonstrates that consistency and synchronisation at all times is not essential to a collaborative modelling system. Human users generate updates at a slow rate as compared to, say, a distributed database; and are able to anticipate and avoid likely conflicts themselves. It certainly helps that netsculpt is a demonstrator application rather than a serious production tool, but even in more demanding environments it is worth exploring the concept of on demand or occasional consistency rather than attempting global and continuous perfection.
Netsculpt fails to meet one important design goal: it has a late joining problem. All participants in a netsculpt session must join before any updates can be made to the model, and should a participant leave the session for any reason they cannot rejoin.
More accurately, late joining is possible but leads to inconsistencies in the shared model. When a netsculpt session begins, the shared model is a standard blank ball of clay which is then updated over time. A late joiner should see the current model with all updates applied since the session began. In the current implementation of netsculpt, they instead get the original blank sphere and only see those updates transmitted after joining.
The first consequence is obviously that the model seen by a late joiner is not the same as that seen by the rest of the group. The second is that if they in turn begin to update the model, those updates are based on the wrong geometry and when transmitted to other users will appear to overwrite, or even revert, the model instead of maintaining a consistent work flow.
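Both consequences can be reproduced with a toy model. In the sketch below the clay ball is reduced to three vertices, each holding a radius, and an update carries the new absolute radius of one vertex as computed from the sender's local copy. That update format is an assumption made to keep the example short, and none of the names come from netsculpt itself.

```python
from typing import Dict, Tuple

Model = Dict[int, float]      # vertex index -> radius
Update = Tuple[int, float]    # (vertex index, new absolute radius)


def blank_model(vertices: int = 3) -> Model:
    """The standard starting shape: every vertex at radius 1.0."""
    return {i: 1.0 for i in range(vertices)}


def sculpt(model: Model, vertex: int, delta: float) -> Update:
    """Deform the local model and return the update to multicast."""
    model[vertex] += delta
    return (vertex, model[vertex])


def apply_update(model: Model, update: Update) -> None:
    """Apply a received update as it arrives, with no locking."""
    vertex, value = update
    model[vertex] = value


def simulate() -> Tuple[Model, Model]:
    early = blank_model()
    sculpt(early, 0, -0.5)     # sent before the late host exists, so lost to it
    sculpt(early, 1, +0.25)    # likewise never seen by the late host

    late = blank_model()       # late joiner starts from the blank sphere
    apply_update(late, sculpt(early, 2, -0.125))   # first update it receives

    # The late joiner now sculpts vertex 0 from its stale local geometry,
    # and the resulting update is applied by the early participant.
    apply_update(early, sculpt(late, 0, -0.125))
    return early, late
```

After simulate runs, the two models still disagree at vertex 1, and the early participant's deep dent at vertex 0 (radius 0.5) has been replaced by the late joiner's shallow one (radius 0.875), which to the early participant looks like most of the dent being undone.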
Solving this will require a redesign of the protocol used for geometry updates, and more extensive changes to Sculpt itself. The goal will be to design an application independent protocol for general use in collaborative systems, and for various reasons that work will probably not use Sculpt as the test case. Once the new protocol is designed, netsculpt will be modified to use it, with the aim of demonstrating at least limited interoperability between netsculpt and other 3D programs at the VE Lab.
The new protocol and algorithms have yet to be determined. One approach is to have a designated host which is accepted as the authoritative source. This is simple, but contrary to the concept of a true peer to peer system unless the authoritative host can be chosen dynamically and replaced if needed. User initiated synchronisation would be a good choice from the perspective of user interface design, as some delays or restrictions on action while the application stabilised would be tolerated. The last option (to date) is background consistency, where all hosts exchange state in a continuous trickle of information during application and network idle time slots.
Netsculpt is now complete, as far as any software project is ever really finished. Future work will fall into three areas: porting to Windows, WAN testing, and solving the late joiner problem.
Sculpt now runs on Windows NT PCs as well as IRIX, as do the networking libraries used for multicasting. Netsculpt will likewise be ported to Windows, with the goal of demonstrating interoperability between the IRIX and Windows versions. Porting to Windows will also increase the possible number of simultaneous users in a session within the VE Lab, allowing tests of performance under heavier and more complex loads.
Testing over some form of WAN is necessary to determine whether the multicast protocols used will scale to larger systems without significant latency or reliability problems. This testing may end up being done with one of the network simulation packages available, avoiding the complexity of trying to organise a geographically dispersed sculpting session. Such packages are now well established as a testing tool for other aspects of Internet protocols and applications.
Lastly, there is the serious late joining problem to solve. This is not just a problem for netsculpt, and the solution will probably be first developed as part of a separate project and only later adapted to netsculpt.