Quote:
cairo-xlib tessellates the high-level paths from the user into trapezoids and sends those to the X server. The DDX then rasterises the trapezoids into a mask and composites that onto the destination. Both Nvidia and glamor use trapezoid shaders to avoid rasterising with the CPU, SNA uses the same high-speed scanline rasteriser as cairo-image (both try to eliminate the intermediate mask), and EXA uses the slow pixman trapezoid rasterisation routines and the extra compositing step. (For -intel the CPU is faster at generating the RLE opacity mask and sending it as geometry to the GPU than the current GPUs are at executing the branch-heavy trapezoid shader. The ultimate question is whether we can tolerate using MSAA and have GPUs that are fast enough...)
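To make the two passes described above concrete, here is a rough Python sketch of the "rasterise a trapezoid into a mask, then composite the mask" model. It is only an illustration: the function names, the single-channel float buffers and the 4x4 supersampling are assumptions made for this example, not how pixman or any driver actually implements it.

```python
def trapezoid_mask(width, height, top, bottom, left, right, samples=4):
    """Pass 1: rasterise one trapezoid into a coverage mask.

    top/bottom are y coordinates; left/right are lines ((x1, y1), (x2, y2))
    bounding the trapezoid. Coverage is estimated by point sampling
    (hypothetical choice for this sketch), giving values in [0, 1].
    """
    def x_at(line, y):
        (x1, y1), (x2, y2) = line
        return x1 + (x2 - x1) * (y - y1) / (y2 - y1)

    mask = [[0.0] * width for _ in range(height)]
    for py in range(height):
        for px in range(width):
            hits = 0
            for sy in range(samples):
                y = py + (sy + 0.5) / samples
                if not (top <= y < bottom):
                    continue
                xl, xr = x_at(left, y), x_at(right, y)
                for sx in range(samples):
                    x = px + (sx + 0.5) / samples
                    if xl <= x < xr:
                        hits += 1
            mask[py][px] = hits / (samples * samples)
    return mask


def composite_over(dst, mask, src):
    """Pass 2: blend a constant source value onto dst through the mask."""
    for row_d, row_m in zip(dst, mask):
        for i, m in enumerate(row_m):
            row_d[i] = src * m + row_d[i] * (1.0 - m)


# A trapezoid whose left edge sits at x = 1.5, so column 1 is half covered.
dst = [[0.0] * 4 for _ in range(4)]
mask = trapezoid_mask(4, 4, top=1, bottom=3,
                      left=((1.5, 0), (1.5, 4)), right=((3, 0), (3, 4)))
composite_over(dst, mask, src=1.0)
```

The point of the sketch is the cost structure: the whole mask is built first and then read back in a second pass, which is the extra compositing step the quote attributes to EXA.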
cairo-image rasterises directly from the general complex polygon computed for the path (convert the curves into straight lines, convolve with a pen, etc.). This essentially folds the two passes performed by cairo-xlib into one and eliminates the very computationally expensive Bentley-Ottmann routine for tessellating trapezoids. On the downside, cairo-image only uses a single core (and no GPU offload) for its rasterisation. Also, more work can be done for cairo-image to process the path without requiring an intermediate polygonisation (e.g. walk splines within the scanline rasteriser, use a hairline renderer for thin pens, compute offset curves, etc.).
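For comparison, a simplified sketch of the single-pass idea: flatten the curve into line segments, then walk the polygon's edges scanline by scanline and blend each row straight onto the destination, with no trapezoids and no full-size mask. This is not cairo's actual scan converter; `flatten_cubic`, `add_span`, `fill_polygon`, the uniform flattening step and the 4 sub-rows per pixel are all assumptions made for the illustration.

```python
def flatten_cubic(p0, p1, p2, p3, segments=16):
    """Approximate a cubic Bezier with straight segments (uniform steps in t)."""
    pts = []
    for i in range(segments + 1):
        t = i / segments
        u = 1.0 - t
        x = u*u*u*p0[0] + 3*u*u*t*p1[0] + 3*u*t*t*p2[0] + t*t*t*p3[0]
        y = u*u*u*p0[1] + 3*u*u*t*p1[1] + 3*u*t*t*p2[1] + t*t*t*p3[1]
        pts.append((x, y))
    return pts


def add_span(cover, x0, x1, weight):
    """Add the horizontal overlap of [x0, x1) with each pixel, scaled by weight."""
    x0 = max(x0, 0.0)
    x1 = min(x1, float(len(cover)))
    if x1 <= x0:
        return
    for px in range(int(x0), min(int(x1), len(cover) - 1) + 1):
        overlap = min(x1, px + 1) - max(x0, px)
        if overlap > 0:
            cover[px] += weight * overlap


def fill_polygon(dst, polygon, src, subrows=4):
    """Fill a closed polygon (non-zero winding) directly onto dst, row by row."""
    n = len(polygon)
    edges = [(polygon[i], polygon[(i + 1) % n]) for i in range(n)]
    for py in range(len(dst)):
        cover = [0.0] * len(dst[0])          # one row of coverage, not a full mask
        for s in range(subrows):
            y = py + (s + 0.5) / subrows
            crossings = []
            for (x1, y1), (x2, y2) in edges:
                if y1 != y2 and min(y1, y2) <= y < max(y1, y2):
                    x = x1 + (x2 - x1) * (y - y1) / (y2 - y1)
                    crossings.append((x, 1 if y2 > y1 else -1))
            crossings.sort()
            winding = 0
            for (x, d), (x_next, _) in zip(crossings, crossings[1:]):
                winding += d
                if winding != 0:
                    add_span(cover, x, x_next, 1.0 / subrows)
        for px, c in enumerate(cover):
            if c > 0.0:
                a = min(c, 1.0)
                dst[py][px] = src * a + dst[py][px] * (1.0 - a)


# Fill a shape whose top edge is a flattened curve.
canvas = [[0.0] * 8 for _ in range(8)]
curve = flatten_cubic((1, 6), (1, 0), (7, 0), (7, 6))
fill_polygon(canvas, curve, src=1.0)
```

Only one row of coverage is alive at a time, and the edge list is consumed as-is, which is roughly why this route can skip both the tessellation step and the separate mask composite.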
The next step to speed up cairo-xlib would be to eliminate the trapezoids and send paths directly to X: fix the protocol to be more useful for cairo, which would coincidentally also enable separate render threads within cairo. Nvidia would then couple up their driver to use their existing NV_path acceleration, and I would do something similar for SNA (as usual, look at the early experiments in cairo-drm) if the GPU were not the bottleneck.