Vascular tree disentanglement and vessel type classification are two crucial steps of the graph-based method for retinal artery-vein (A/V) separation. Existing approaches treat them as two independent tasks and mostly rely on ad hoc rules (e.g. change of vessel directions) and hand-crafted features (e.g. color, thickness) to handle them respectively. However, we argue that the two tasks are highly correlated and should be handled jointly since knowing the A/V type can unravel those highly entangled vascular trees, which in turn helps to infer the types of connected vessels that are hard to classify based on only appearance. Therefore, designing features and models isolatedly for the two tasks often leads to a suboptimal solution of A/V separation. In view of this, this paper proposes a multi-task siamese network which aims to learn the two tasks jointly and thus yields more robust deep features for accurate A/V separation. Specifically, we first introduce Convolution Along Vessel (CAV) to extract the visual features by convolving a fundus image along vessel segments, and the geometric features by tracking the directions of blood flow in vessels. The siamese network is then trained to learn multiple tasks: i) classifying A/V types of vessel segments using visual features only, and ii) estimating the similarity of every two connected segments by comparing their visual and geometric features in order to disentangle the vasculature into individual vessel trees. Finally, the results of two tasks mutually correct each other to accomplish final A/V separation. Experimental results demonstrate that our method can achieve accuracy values of 94.7%, 96.9%, and 94.5% on three major databases (DRIVE, INSPIRE, WIDE) respectively, which outperforms recent state-of-the-arts.