From Macro to Micro: Autonomous Multiscale Image Fusion for Robotic Surgery



I. INTRODUCTION
Minimally Invasive Surgery (MIS), performed through a small number of 'keyhole' incisions, has become the standard of care for many general surgical procedures, reducing trauma, blood loss and other complications, and offering patients the prospect of a faster recovery with less postoperative pain. These improvements for the patient, however, demand higher dexterity and more complex instrument control from the surgeon. Keyhole incisions constrain the motion of surgical instruments, while the loss of stereo vision when using a laparoscope or endoscope means that depth perception is much poorer than in traditional open surgery. The desire to tackle these issues has been the main driver behind the development of robotic MIS systems with stereo vision. In particular, the da Vinci® robot (Intuitive Surgical Inc., CA) is a successful surgical platform, used widely in the treatment of gynaecological and urological cancers. While human guidance is essential for MIS, recent studies [1] have suggested that automation of some surgical subtasks, particularly those that are tedious and repetitive or require high precision, can be beneficial in improving accuracy and reducing the cognitive load of the surgeon. For example, several studies have investigated automation of surgical suturing subtasks, including using a suturing tool under fluorescence guidance [2], and other studies have explored areas such as autonomous tissue dissection [3].
A less-explored potential beneficiary of automation is optical biopsy. High-resolution probe-based confocal laser endomicroscopy (pCLE) and optical coherence tomography (OCT) offer visualisation of cellular-scale tissue details in situ, providing a real-time alternative to conventional biopsy and histopathology. pCLE has been applied extensively to diagnostic and surgical procedures in the gastrointestinal tract and abdominal organs [4], [5], and there are a number of potential applications in surgery, particularly for the identification of tumour resection margins. However, the small Field-of-View (FoV) of the probes (typically less than 1.0 mm) means that scanning is needed if significant areas of tissue are to be analysed. To stitch individual images together into large-FoV mosaics, several mosaicing algorithms have been developed, including both real-time [6] and more robust offline processing [7] approaches. However, the operator is required to maintain optimal probe orientation while performing smooth and controlled scanning motions with sub-millimetre accuracy. For pCLE in particular, which requires direct tissue contact, the probe-tissue force is critical [8], both to ensure good quality images and to limit the amount of tissue deformation occurring during scanning. Mosaicing under manual control is therefore extremely challenging.
To address this issue, several studies have attempted to robotise pCLE probes with the aim of achieving consistent imaging over large areas of tissue. These developments have included dedicated robotic mechanisms for 2D scanning [9]–[11] and for maintaining the desired probe-tissue contact force [8]. Integration with existing robotic systems, such as the da Vinci® [8], [12], has also been investigated. However, even with robotic assistance, limited depth perception and poor ergonomics mean that it can still be difficult for clinicians to perform a continuous and smooth scan during tele-operated surgery; maintaining the probe in a specific orientation and ensuring continuous contact with the surface remains difficult and tedious.
OCT differs from pCLE in that it provides cross-sectional images of the tissue's structure. The nominal depth range is typically several millimetres, although penetration into scattering tissue is limited to 1-2 mm in practice. OCT has achieved significant clinical success in ophthalmology, and endoscopic OCT has extended the range of applications to areas such as the gastrointestinal tract [13]. It has also been used in laparoscopic and robotic prostatectomy [14] for characterising neurovascular bundles along the prostate in real-time. As for pCLE, it is necessary to mosaic OCT images if a large FoV is required, and this shares many of the difficulties of pCLE mosaicing. However, since OCT is a cross-sectional technique that does not require tissue contact, it can also be used to determine the distance between the probe tip and the tissue. Several studies have suggested the potential benefits of combining cross-sectional OCT images and fluorescence microscopy (e.g., [15]), although most of the designs for combined probes require sacrificing the resolution of the fluorescence channel.

Fig. 1. An overview of our experimental setup and the steps involved in autonomous optical biopsy probe scanning and multi-scale fusion. The robotic system consists of a set of dVRK controllers, and both the pCLE and OCT probes are grasped by a da Vinci® PSM. The microscopy system consists of an endomicroscope (pCLE) system, an OCT system and a PC used to capture and process pCLE and OCT images. The data streams from the different imaging modalities are processed for visualisation and servoing purposes. From a pair of stereo images, a surface of the scene is reconstructed as a point cloud. By stitching pCLE images, a mosaic image can be created, and a 3D volume can be built from OCT images. These results are fused into a unified window for multi-scale visualisation.
Here, by mounting the two probes side-by-side, we combine the advantages of both techniques to provide complementary diagnostic information: high-resolution surface information and lower-resolution sub-surface information.
Autonomous scanning systems have the potential to support the use of optical biopsy in surgical operations by reducing the cognitive load on surgeons and improving the feasibility of scanning over larger surfaces. Zhang et al. [16] proposed such an autonomous framework using the da Vinci® robot, where both the pCLE images and the laparoscope were used to guide the scanning. Estimation of the required pCLE probe position for good surface contact was performed through stereo surface reconstruction. However, due to inaccuracies of this reconstruction, manual adjustment may be required in order to maintain consistent contact forces and hence good pCLE images. To address this issue, in this paper the scanning motion is visually servoed not only using pCLE images (in the direction transverse to the tissue surface) but also using OCT images (in the axial direction). With this approach, the robot is able to ensure that optimal contact force is maintained between the pCLE probe and the tissue, assisting with smooth motion (avoiding stick-slip) and helping to form continuous mosaics. Furthermore, to facilitate intraoperative tissue diagnosis and identification, a 3D visualisation method is proposed to fuse the reconstruction of the 3D tissue surface with 2D endomicroscopy mosaics and 3D OCT volumes on-the-fly, an improvement over the work of [16]. This 3D fusion approach is designed to provide the surgeon with intuitive real-time visualisation of multi-scale image information, supporting surgical diagnosis and decision-making. The framework has been validated with a series of phantom and ex vivo tissue experiments, with the results demonstrating the potential clinical value of the approach.

A. Framework Overview
The system links custom pCLE and OCT systems with a stereo laparoscope and a patient side manipulator (PSM) of a da Vinci® surgical robot with dVRK controllers. The dVRK controllers (da Vinci Research Kit, Medical Motion) allow the conventional master console of the da Vinci® to be bypassed and replaced by the autonomous control system. They are connected to a host PC via an IEEE 1394 FireWire interface in a daisy-chain topology. The stereoscopic system provides standard-definition (720×576) video streaming (for both left and right channels) at 25 Hz, which is captured by the host PC using a Kona 4 PCIe frame grabber (AJA Video Systems).
The pCLE and OCT systems are developed with in-house designs, both consisting of a flexible fibre optic probe connected to an external base unit. The distal tips of the fibre probes are held by a pick-up mount which can be grasped by da Vinci® instruments. The pCLE system is a high-speed line-scanning endomicroscope [17] coupled to a Cellvizio UHD fibre probe (Mauna Kea Technologies). It excites tissue at a wavelength of 488 nm and collects fluorescence emission above 500 nm. It is used with typical fluorescent stains which can be excited at 488 nm, such as acriflavine. The probe, which consists of a 30,000-core fibre imaging bundle and a microlens, provides a FoV of 240 µm and a fibre-sampling limited resolution of approximately 2.4 µm. Since each fibre core within the bundle transmits a 'pixel' of the image, no scanning mechanism is needed at the distal tip of the probe; it is entirely passive. The external line-scanning system illuminates and images a single line of the tissue (via the bundle) at a time, a technique which ensures light is predominantly collected only from an in-focus plane approximately 9 µm in depth. The line-scan design (rather than point-scanning as in conventional confocal microscopes) sacrifices some axial resolution in exchange for a maximum frame rate of 120 fps, which is beneficial for this application as it allows for higher speed scanning.
The OCT system uses a swept source laser (Axsun Technologies, Billerica, MA) with a central wavelength of 1300 nm. The laser sweeps over a wavelength range of 110 nm at a frequency of 100 kHz, with each sweep allowing acquisition of a single axial line (an A-scan) through the tissue by the technique of spectral domain low coherence interferometry. The forward-viewing fibre probe is 13 mm long (fabricated at the Institute of Applied Physics, Russia) and has a 2.7 mm diameter rigid tip. This tip incorporates copper wires and an NdFeB magnet, providing a mechanism for scanning the fibre at 40 Hz and hence generating 2D images. A GRIN micro-lens focuses the beam onto the tissue, and the assembly is covered with Teflon tubing. The FoV of the probe is approximately 1 mm.
Both optical biopsy systems are controlled via a single software interface, developed in LabVIEW (National Instruments), running on a secondary PC. Images are acquired and processed in real time, resized to 300×300 (pCLE) and 90×300 pixels (OCT), corresponding to real dimensions of 240 × 240 µm and 900 × 3000 µm respectively, and are streamed to the host PC at 40 Hz via a TCP/IP connection. The software component of the framework is implemented using the Robot Operating System (ROS) across two computers, with the workload of robot motion control separated from image processing and 3D visualisation. ROS uses TCP/IP to communicate between the pCLE/OCT PC and the servoing PC; this has a round-trip time of about 5 ms. The 2D mosaic processing runs at 10 Hz (100 ms). Therefore, for lateral motion compensation using pCLE images, the delay is approximately 110 ms. For vertical motion compensation using OCT distance estimation, the delay is approximately 40 ms.
As shown in Fig. 1, the stereo images captured by the camera are used for 3D tissue surface reconstruction via a stereo matching method. Images captured by the pCLE system are registered pairwise by normalised cross-correlation and combined by the dead-leaf method. Concurrently, OCT images are mapped into a 3D volume based on the current probe pose, providing a 3D volumetric reconstruction over the scanned tissue. The pCLE mosaic, OCT volume and reconstructed surface are fused together on-the-fly to provide both macro- and micro-views of the scanned region.
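The pairwise registration and dead-leaf combination step can be sketched as follows. This is a minimal illustration rather than the system's implementation, and it assumes the inter-frame shifts have already been estimated by normalised cross-correlation; under the dead-leaf rule, each newly registered frame simply overwrites the mosaic canvas at its accumulated position.

```python
import numpy as np

def build_mosaic(frames, shifts, canvas_shape):
    """Compose frames into a mosaic with the dead-leaf rule.

    frames : list of 2D arrays (pCLE images)
    shifts : per-frame (dy, dx) shift relative to the PREVIOUS frame,
             e.g. estimated by normalised cross-correlation
    canvas_shape : size of the mosaic canvas in pixels
    """
    canvas = np.zeros(canvas_shape, dtype=float)
    y, x = canvas_shape[0] // 2, canvas_shape[1] // 2  # start at the centre
    for frame, (dy, dx) in zip(frames, shifts):
        y, x = y + dy, x + dx                  # integrate pairwise shifts
        h, w = frame.shape
        canvas[y:y + h, x:x + w] = frame       # newest frame lands on top
    return canvas
```

In the real system the shift of each incoming frame would be estimated on-the-fly and the canvas updated at the mosaic frame rate (10 Hz in the text).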
To generate a mosaic over a defined area, the probe is scanned over a pre-defined trajectory using the robot, with the trajectory planned in the coordinate system of the mosaic. A visual control component closes the loop by comparing the current and desired probe poses and driving the robot to minimise their difference. This results in the desired region of tissue being imaged regardless of kinematic errors or tissue motion and deformation.
Separate from the visual servoing loop, the end-effector pose of the robot (in Cartesian space) is then read and set via a dVRK-ROS component which is connected to a low-level PID controller implemented by the SAW package using the cisst library [18].
The scanning trajectory is planned on a 2D plane with the aim of generating a mosaic over the desired area of tissue. A spiral trajectory is preferable to a raster pattern as it avoids sudden changes in direction and hence exerts less deformation on the tissue. The trajectory is an Archimedean spiral r = bθ, where the distance between two successive spiral loops (∆r) is predefined. This defines the constant b:

b = ∆r / (2π).    (1)

In order to reduce the likelihood of producing gaps in the mosaic image, whilst preventing excessive over-sampling, ∆r is set to half of the endomicroscope's FoV (240 µm). The radius of the spiral trajectory r_sp represents the size of the scanned region, and the total length of the spiral trajectory l_sp is calculated by integrating the arc length up to θ_sp = r_sp / b:

l_sp = (b/2) [ θ_sp √(θ_sp² + 1) + ln(θ_sp + √(θ_sp² + 1)) ].    (2)

Finally, the k-th trajectory point is defined as:

p_k = ( b θ_k cos θ_k, b θ_k sin θ_k ),    (3)

where θ_k is obtained using equations (1) and (2) by solving the arc-length condition

l(θ_k) = k ∆l    (4)

for a fixed spacing ∆l between successive waypoints, so that the points are evenly spaced along the spiral.
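The spiral planning above can be sketched as follows. This is a minimal illustration under the stated definitions (the loop spacing ∆r fixes the spiral constant, and waypoints are stepped so they are approximately equally spaced along the arc); the waypoint spacing `delta_l` is an assumed parameter, not taken from the text.

```python
import math

def spiral_waypoints(delta_r, r_sp, delta_l):
    """Generate 2D waypoints along an Archimedean spiral r = b*theta.

    delta_r : spacing between successive spiral loops (m)
    r_sp    : radius of the scanned region (m)
    delta_l : approximate arc-length spacing between waypoints (m)
    """
    b = delta_r / (2.0 * math.pi)          # loop spacing sets the spiral constant
    points, theta = [], 0.0
    while b * theta <= r_sp:
        r = b * theta
        points.append((r * math.cos(theta), r * math.sin(theta)))
        # advance theta so the arc length between waypoints stays ~delta_l
        # (dl = b * sqrt(1 + theta^2) * dtheta for an Archimedean spiral)
        theta += delta_l / (b * math.sqrt(1.0 + theta * theta))
    return points

# Half-FoV loop spacing (120 um) over a 1 mm radius region
pts = spiral_waypoints(delta_r=0.12e-3, r_sp=1.0e-3, delta_l=60e-6)
```

The radius grows monotonically with θ, so the scan naturally spirals outwards from the initial contact point at the mosaic centre.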

B. Closed Loop Scanning
At each iteration of the control loop, the desired robot end-effector command is defined as:

T^{E*}_B = T^{E*}_{M*} T^{M*}_{P*} T^{P*}_P T^P_M T^M_E T^E_B,    (5)

where T^2_1 denotes a transformation from coordinate frame {1} to {2}. The definition of the different coordinate systems is illustrated in Fig. 2. T^E_B is the end-effector pose in robot base coordinates, calculated using forward kinematics. The transformation T^{P*}_P between the current and desired probe positions is calculated in every iteration during closed-loop control from the displacement between the current and desired probe positions, and is defined as:

T^{P*}_P = [ I  ∆t ; 0ᵀ  1 ],    (6)

where ∆t = (∆t_x, ∆t_y, ∆t_z)ᵀ is a displacement vector and I is the 3×3 identity matrix. In order to obtain the displacement vector, we use the information from either or both of the pCLE and OCT images. The remaining transformations in equation (5) are constants that can be measured or calibrated in advance.
In particular, the end-effector to marker transformation T^M_E is calibrated using a standard hand-eye calibration method [19]. The marker to probe transformation T^P_M is determined from the CAD model of the adapter. We also note that T^{E*}_{M*} = (T^M_E)^{-1} and T^{M*}_{P*} = (T^P_M)^{-1}. Continuous detection of the marker is not required during local scanning with closed-loop servoing, as the marker is only used to determine the transformation between the robot end-effector and the probe, which does not change over time with a rigid setup. Note that the scanning surface is assumed to be planar, as each individual scanning region is usually small (about 2×2 mm).
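The closed-loop pose update composes the calibrated constant transforms (the hand-eye result T^M_E and the CAD-derived T^P_M) with the servo displacement. The numpy sketch below is an illustrative reconstruction: the composition order is inferred from the frame definitions in the text, not quoted from the paper.

```python
import numpy as np

def make_T(R=None, t=(0.0, 0.0, 0.0)):
    """Build a 4x4 homogeneous transform from a rotation matrix and translation."""
    T = np.eye(4)
    if R is not None:
        T[:3, :3] = R
    T[:3, 3] = t
    return T

def desired_end_effector(T_E_B, T_M_E, T_P_M, dt):
    """Apply a probe-frame displacement dt and map it back to an end-effector command.

    T_E_B : end-effector pose in base coordinates (forward kinematics)
    T_M_E : end-effector-to-marker transform (hand-eye calibration)
    T_P_M : marker-to-probe transform (from the adapter CAD model)
    dt    : (dtx, dty, dtz) displacement from the pCLE/OCT servoing
    """
    T_Pstar_P = make_T(t=dt)  # pure translation between current and desired probe poses
    inv = np.linalg.inv
    return inv(T_M_E) @ inv(T_P_M) @ T_Pstar_P @ T_P_M @ T_M_E @ T_E_B
```

With dt = 0 the chain collapses and the command reduces to the current pose, which is a useful sanity check on the composition.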
1) Servoing using pCLE Images: To allow a continuous mosaic to be generated over the desired area of tissue, the visual servoing loop is closed on the mosaic image. The pCLE mosaicing is performed using an approach similar to the standard real-time technique described in [6], [17], using normalised cross-correlation to estimate the relative shift between each pair of consecutive frames. At the beginning of each scanning procedure, the probe position in the mosaic image, p_m(t = 0) = (0, 0), is located at the centre of the image. During scanning, an estimate of the current probe position p_m(t) at time t is obtained from the mosaic (i.e. from pairwise image shifts integrated over time). Next, by comparing p_m(t) with the k-th desired probe position in the trajectory p*_m(k), the displacement between the current and desired position is calculated as:

(∆t_x, ∆t_y)ᵀ = k_m R(α) ( p*_m(k) − p_m(t) ),    (7)

where R(α) is a 2D rotation by the angle α between the probe and the mosaic image, and k_m is a constant that converts the displacement from pixels to real distance. To calibrate the rotation angle α, we drive the robot in a horizontal line scan using the laparoscope; the angle α between the scanned line and the desired horizontal line can then be calculated.
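The conversion from a mosaic-space position error to a lateral probe displacement can be sketched as below. This is a minimal illustration consistent with the description (rotate the pixel error by the calibrated angle α, then scale by the pixel-to-metre factor); the numeric values in the usage line are assumptions derived from the stated 300 px / 240 µm mosaic resolution.

```python
import numpy as np

def lateral_displacement(p_current, p_desired, alpha, k_m):
    """Convert a mosaic-space error (pixels) into a probe-frame displacement (metres).

    alpha : calibrated rotation between probe and mosaic image axes (rad)
    k_m   : pixel-to-metre conversion factor of the mosaic
    """
    R = np.array([[np.cos(alpha), -np.sin(alpha)],
                  [np.sin(alpha),  np.cos(alpha)]])
    err_px = np.asarray(p_desired, dtype=float) - np.asarray(p_current, dtype=float)
    return k_m * (R @ err_px)   # (dtx, dty)

# 300 px across a 240 um FoV -> 0.8 um per pixel (illustrative)
dt = lateral_displacement((0, 0), (100, 0), alpha=0.0, k_m=0.8e-6)
```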
2) Servoing using OCT Images: The distance from the OCT probe to the tissue can be estimated by detecting the top surface of the tissue in the OCT cross-sectional image, as can be seen in Fig. 1. The OCT probe is mounted slightly higher than the pCLE probe, so that the top surface of the tissue appears at a non-zero depth in the OCT image. Since the top surface usually returns the most intense signal, it can be found by simple peak detection, taking the first peak above a user-defined threshold. To mitigate the influence of noise, a Kalman filter is applied to assist accurate distance estimation for servoing. Here, we use a constant velocity model for the Kalman filter, as the motion of the robot along the depth direction is smooth. At the beginning of each scan, we assume that good initial contact has been made by the user (i.e. clear pCLE images can be seen), and the current OCT distance estimate d_oct is recorded. This distance is then set as the desired distance d*_oct, and the robot is required to maintain this distance during scanning. The displacement along the z-axis is defined as:

∆t_z = k_o ( d_oct − d*_oct ),    (8)

where k_o is a constant that converts the distance from pixels to real distance.
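The axial distance estimation can be sketched as below: a first-above-threshold surface detector on an A-scan intensity profile, smoothed by a one-dimensional constant-velocity Kalman filter. The noise parameters `q` and `r` are illustrative assumptions, not values from the text.

```python
import numpy as np

def detect_surface_row(a_scan, threshold):
    """Return the index of the first sample above threshold (the tissue top surface)."""
    above = np.flatnonzero(np.asarray(a_scan) >= threshold)
    return int(above[0]) if above.size else None

class ConstantVelocityKF:
    """1D constant-velocity Kalman filter smoothing the OCT distance estimate."""
    def __init__(self, q=1e-3, r=1.0):
        self.x = np.zeros(2)              # state: [distance, velocity]
        self.P = np.eye(2)
        self.q, self.r = q, r
    def step(self, z, dt=0.025):          # 40 Hz OCT stream
        F = np.array([[1.0, dt], [0.0, 1.0]])
        self.x = F @ self.x               # predict
        self.P = F @ self.P @ F.T + self.q * np.eye(2)
        H = np.array([[1.0, 0.0]])        # we only measure the distance
        S = H @ self.P @ H.T + self.r
        K = (self.P @ H.T) / S            # Kalman gain
        self.x = self.x + (K * (z - H @ self.x)).ravel()
        self.P = (np.eye(2) - K @ H) @ self.P
        return self.x[0]
```

The filtered distance would then be compared with the recorded desired distance to produce the axial displacement; the sign convention depends on the probe z-axis.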

C. Multiscale Fusion and Visualisation
In this work, three different imaging modalities are fused together in a unified visualisation framework including both macro- and micro-views. For the macro-view, a pair of stereo images is used to reconstruct a 3D surface of the scanning region using an efficient stereo matching method [20]. The micro-view then consists of a mosaic image obtained from pCLE images overlaid onto a 3D volume reconstructed from OCT images.
To position the mosaic image relative to the OCT volume, we take into account the known lateral offset between the OCT and pCLE probes due to the design of the pick-up mount, as shown in Fig. 2. The size of the volume is set according to the defined size of the scanning region such that it will be large enough to contain all OCT images during scanning. The size of each voxel in the volume is set equal to the size of the OCT image pixels, such that each voxel represents approximately 10 µm in the plane of the cross-section, and 10 µm along the out-of-plane direction. A 2D-3D mapping allows each pixel in each OCT image to be mapped to a voxel in the volume. For each scan, the volume is initialised such that all voxels are set to invisible. The pose of each OCT image in the volume coordinate frame is then obtained from the current position of the probe in the mosaic image coordinate frame, taking into account the known offset between the two probes. The value of the voxel is simply set equal to the intensity of the corresponding pixel in the OCT image. If the voxel has already been set using a previous OCT image, its value is updated by averaging the new and current values.
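The pixel-to-voxel mapping with visibility flags and averaging on revisit can be sketched as below. This is a simplified illustration: slices are assumed axis-aligned at a given lateral voxel index, whereas the system places each slice using the probe pose in the mosaic frame.

```python
import numpy as np

class OCTVolume:
    """Accumulate 2D OCT cross-sections into a voxel grid (~10 um voxels)."""
    def __init__(self, shape):
        self.vox = np.zeros(shape, dtype=np.float32)   # intensities
        self.seen = np.zeros(shape, dtype=bool)        # False = invisible voxel

    def insert_slice(self, image, x_index):
        """Map each pixel of a (depth x lateral) cross-section to voxels.

        Voxels revisited by a later slice are updated by averaging the
        new and current values, as described in the text.
        """
        nz, ny = image.shape
        img = image.T.astype(np.float32)               # -> (lateral, depth)
        sub = self.vox[x_index, :ny, :nz]
        hit = self.seen[x_index, :ny, :nz]
        self.vox[x_index, :ny, :nz] = np.where(hit, 0.5 * (sub + img), img)
        self.seen[x_index, :ny, :nz] = True
```

Voxels whose `seen` flag is still False stay invisible in the rendered volume.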
The mosaic image is then scaled according to its real size and placed in the same visualisation framework as the OCT volume. To register the OCT volume and pCLE mosaics with the macro surface reconstruction, we make use of a circular-dot marker attached to the adaptor, shown in Fig. 2(a). The different coordinate frames are defined in Fig. 2(b). When a scan is started, the initial pose of the marker in camera coordinates T^M_C is recorded. Since there is a known transformation T^P_M between the marker and the probe and a known offset T^I_V between the two probes, the registration matrices T^I_C and T^V_C can be calculated as:

T^I_C = T^I_P T^P_M T^M_C,  T^V_C = (T^I_V)^{-1} T^I_C.    (9)

It should be noted that T^I_P = I, as the probe is located at the centre of the mosaic image when the scan starts. With the transformations T^I_C and T^V_C, the mosaic image and volume can be registered with the surface reconstruction.

A. Accuracy of pCLE Visual Servoing
To evaluate the accuracy and consistency of the pCLE visual servoing, we manufactured a test phantom containing a known grid pattern. The phantom was printed by a laser printer on a sheet of paper and coated with a fluorescent marker, making it visible in the pCLE channel. Every square in the grid pattern had a line thickness of 35 µm and a width of 208 µm, as shown in Fig. 3. To avoid the effect of printing inaccuracies, which can be seen in Fig. 3 where the printed grid pattern is not the same as the design, a benchtop microscope was used to take several measurements of the width of individual squares. The average of the measurements was then taken as the ground truth width of a square. For the results shown below, the measured ground truth width of a square was 213.24 µm (compared to an intended width of 208 µm).
To test the consistency of the visual servoing, 10 scans were performed over the same phantom, generating 10 mosaic images. To maintain the independence of the trials, the phantom was repositioned between each run. In each mosaic image, we measured the width of every square and compared it with the ground truth value, taking 36 measurements along the horizontal and vertical directions for each image. Due to the intrinsic inaccuracy of the servoing, squares in the pattern can be misaligned, as shown in Fig. 3. The fraction of misaligned squares is presented in Table I. The Root Mean Square Error (RMSE) and Interquartile Mean Error (IME) were then calculated for each trial, with the IME ignoring the effect of outliers. These results are shown in Table I, where the total RMSE is 14.8 µm and the total IME is 10.0 µm. Compared to the FoV of the probe (240 µm), both the qualitative and quantitative results show that the proposed pCLE servoing method can achieve high accuracy and good consistency using the da Vinci® robot, for the case of a non-deforming sample.
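The two error statistics can be computed as sketched below. The text does not spell out its exact IME definition, so the interquartile trimming used here (dropping the lowest and highest quartile of absolute errors before averaging) is one common reading, and the example width values are illustrative only.

```python
import numpy as np

def rmse(errors):
    """Root Mean Square Error over all measurements."""
    e = np.asarray(errors, dtype=float)
    return float(np.sqrt(np.mean(e ** 2)))

def interquartile_mean_error(errors):
    """Mean absolute error over the interquartile range (outliers discarded)."""
    e = np.sort(np.abs(np.asarray(errors, dtype=float)))
    q = e.size // 4
    return float(np.mean(e[q:e.size - q]))

# Errors of measured square widths against the 213.24 um ground truth
widths = np.array([210.0, 214.0, 213.0, 216.0])   # illustrative values only
errs = widths - 213.24
```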
Tendon-driven robots, such as the da Vinci® system, are often considered incapable of performing tasks that require such high precision, due to backlash and kinematic inaccuracy. To demonstrate how the proposed method improves the robot's capability for pCLE scanning, we compare scanning results with and without the pCLE servoing. As shown in Figs. 4(a) and (b), although the kinematic model appears to show that the robot follows the desired trajectory correctly, the actual mosaic image does not show the corresponding motion, indicating that there is an error in the kinematic model. With pCLE servoing, comparison of the trajectory and the resulting mosaic image demonstrates that the controller corrects the kinematic error in order to follow the desired trajectory in the space of the mosaic image.
We then tested the framework on ex vivo porcine stomach tissue stained with acriflavine. Unlike the phantom experiments, deformation is now caused by probe interaction with the tissue during scanning. In Fig. 4(g), the kinematic trajectory indicates that the robot attempts to compensate for this deformation, since the deviation is 0.4 mm more than for the non-deforming phantom shown in Fig. 4(c). It can also be seen that the executed scanning trajectory, ideally a spiral, is typically 'squared-off', as shown in Figs. 4(c) and (g). The most likely causes of this are backlash in the tendon-driven robot and the limited bandwidth of the visual servoing control of the robot position.

B. Robustness to Unexpected Motion
The system is also somewhat robust to unexpected motion and deformation of the scanning target. To show this qualitatively, we manually moved the phantom to interrupt the scanning and mosaicing process. The first row of Figs. 4(e) and (f) shows that the servoing is able to recover following the unexpected phantom motion. To confirm the capability of the system under unexpected motion of the tissue, the tissue was placed on a motorised translation stage which was moved laterally at different velocities during the robotic scanning. The mosaic results are shown in Fig. 5. The maximal velocity of lateral motion that did not cause failure (discontinuities in the mosaic image) was 0.5 mm/s. When the velocity was increased further to 0.63 mm/s, there was an obvious discontinuity when the system attempted to correct the motion and continue the mosaicing. Improving the system to deal with faster motion would require an increase in the mosaic processing frame rate, which is currently set to 10 frames per second. In practice, however, this is limited not by the mosaic algorithm but by the use of a tendon-driven robot that is not capable of fast and accurate motion. If the mosaic frame rate were increased without an increase in velocity, the mosaic algorithm would fail, as the positional shift between frames would be too small to detect accurately.

C. Validation of OCT Distance Servoing
In order to obtain high quality pCLE images, as shown in Fig. 6, the probe should be kept in gentle contact with the tissue at all times. In the probe holder, the OCT probe is positioned with an axial offset, slightly higher than the pCLE probe. Optimal pCLE images should therefore be obtained when the distance to the tissue, given by the position of the top surface in the OCT cross-sectional image, is held at a fixed value around this offset (the exact distance required is determined experimentally). To evaluate how effective the OCT distance servoing is at maintaining continuous tissue contact, and hence good image quality, we used a translation stage to move a phantom in a cyclic motion with a constant linear velocity along the axial direction of the OCT probe (both towards and away from the probe). Trials were performed with different linear velocities ranging from 16 to 80 µm/s. The results in Fig. 7 show how well the robot maintains a desired distance relative to the moving phantom under different peak velocities (16, 32 and 80 µm/s). When the phantom moves, the deviation between the current and desired distances can generally be kept within 4 pixels (approximately 40 µm). When the phantom stops, the current distance successfully converges to the desired one.
To determine whether the OCT distance servoing improves the robustness of the pCLE scanning, a spiral scanning task on a curved-surface phantom was tested with and without the distance servoing. Fig. 8 shows an example where the mosaicing cannot continue without OCT distance servoing, as the pCLE probe loses contact with the phantom and the image becomes blurred. In contrast, as shown in the zoomed region of Fig. 8, when contact begins to be lost with distance servoing enabled, the robot is able to recover.

D. OCT Volume Rendering and Fusion
To validate the OCT volume rendering, we scanned a surgical needle with a thickness of 0.3 mm. In this experiment, the OCT distance servoing was disabled so that only pCLE images were used for the control. As shown in Fig. 9, a segment of the needle is scanned and reconstructed in an OCT volume. From the volume, we can clearly see that the needle is placed on a flat tissue surface. To visualise both macro and micro information, a multiscale image fusion is shown in Fig. 10. From the stereo reconstruction at macro scale, we can zoom in to the scanned region at micro scale, where a mosaic image and an OCT volume are presented.

Fig. 4. Illustration of how pCLE servoing improves mosaicing results. The left column shows the scanning trajectory based on the kinematic reading of the robot's end-effector, and the right column shows the corresponding mosaic results on the same phantom. The first row is the result from using only the kinematic model, while the second row uses closed loop scanning. The effect of unexpected motion on the mosaic image is shown in the third row. The unexpected motion is shown in the red dashed circle in (f) before the closed-loop servoing recovers and returns the probe to the desired trajectory relative to the tissue surface. The diameter of the spiral trajectory from the kinematic model is about 0.93 mm, and from the pCLE servoing on the printed pattern and porcine tissue it is 1.41 mm and 1.85 mm, respectively.

Fig. 5. Mosaic results obtained using the framework under unexpected motion with various velocities. The 'arm' connecting the mosaic over the planned region and the additionally scanned region (caused by the motion) is enlarged to show that it is continuous, suggesting that the probe correctly returned to its planned trajectory on the tissue surface.

Fig. 6. Demonstration of the correlation between the distance to the tissue surface measured in the OCT image (bottom row) and the quality of pCLE images (top row). As the surface moves closer to the probe, from left to right in the figure, the probe-to-tissue distance measured by the OCT probe becomes smaller while the pCLE image quality improves. Note that the pCLE images are autocontrasted.

IV. CONCLUSION AND FUTURE WORK
In this article, we have presented an automated scanning framework for pCLE and OCT optical biopsy using the da Vinci® surgical robot. It is capable of generating large-area mosaics of both pCLE and OCT images. A crucial feature is that pCLE images are used to close the control loop, and mosaicing results from both static and deforming phantoms demonstrated that this effectively compensates for kinematic errors. Furthermore, by using OCT images to maintain a constant distance to the tissue, and hence ensure consistent contact between the pCLE probe and the tissue, the system is able to compensate for target motion along the axial direction. This visual servoing allows for the correction of errors due to tissue deformation, robot positioning and grasping of the pick-up probe. The accuracy of this correction is better than the FoV of the pCLE probe, resulting in continuous 2D mosaics without gaps or discontinuities, which are a common problem for open loop control. Finally, the resulting high-resolution tissue maps at micro-scale can be fused in real-time with a stereo reconstruction at macro-scale from the laparoscopic image feed, providing the surgeon with a multi-scale 3D view of the operating site. This augmented visualisation provides a range of potential benefits for intraoperative tissue characterisation and surgical planning.

Fig. 7. The desired OCT distance, shown in red, is assumed to be constant during servoing. As the phantom starts moving, the deviation between the desired position and the current position measured by the OCT channel initially increases and then decreases towards zero as the system corrects for the motion. Finally, the deviation converges to zero when the phantom stops. 1 pixel in the OCT image corresponds to approximately 10 µm in real space.
Future work will investigate complementary servoing using the laparoscopic image, both for robot motion to the desired scan site and to improve robustness to loss of contact (and hence loss of pCLE images for servoing), as well as an investigation of the potential of the system to handle patient motion. A study comparing the results to master-slave controlled scanning will confirm the practical clinical benefits of the system in providing the surgeon with real-time intraoperative cellular-scale tissue analysis.

Fig. 8. Demonstration of pCLE scanning with and without OCT distance compensation on a curved surface. Without the OCT distance servoing, the mosaic could only continue for a short distance before the probe lost contact, the pCLE image was degraded and the lateral visual servoing failed. When OCT distance servoing was turned on, a mosaic was generated over the whole trajectory.

Fig. 9. A volume reconstruction of a surgical needle with a thickness of 0.3 mm. The red rectangle indicates the segment that has been reconstructed in the microscopic volume shown on the left.

ACKNOWLEDGMENT
Lin Zhang is financially supported by the China Scholarship Council (CSC) and the Hamlyn Centre during his PhD study. This work was partly funded by EPSRC Grant EP/N019318/1: REBOT: Robotic Endobronchial Optical Tomography; and EP/N022521/1: Translational Alliance: SMART-Endomicroscopy. Data underpinning this study was generated by code which is openly accessible via https://github.com/hamlyn-centre/auto_scan.