SIGCOMM '22 Posters and Demos Paper #97 Reviews and Comments
===========================================================================
Paper #97: Accelerating Kubernetes with In-network Caching


Review #97A
===========================================================================

Overall merit
-------------
3. Weak accept

Reviewer expertise
------------------
4. Expert

Paper summary
-------------
The paper proposes substituting etcd, the fault-tolerant key-value store used in Kubernetes, with an in-network caching layer on programmable switches. The authors observe that most accesses to this key-value store are reads, so a read-optimised key-value store is needed. They introduce NetCRAQ, a P4 implementation of CRAQ, a read-optimised chain-replication system.

Comments for author
-------------------
Dear authors, thank you very much for submitting your poster to SIGCOMM. I found the insight quite interesting that Kubernetes accesses to etcd are mostly reads, and that a read-optimised fault-tolerant key-value store is therefore a good fit. I have the following questions/suggestions, though:

- Although the motivation for your work is Kubernetes, the main contribution and evaluation of your poster concern the implementation of CRAQ on programmable switches. NetCRAQ has merit as a system on its own. Why did you decide to couple it so tightly with Kubernetes?
- Given that your motivation is Kubernetes, I found it odd that your evaluation did not include any end-to-end experiment with Kubernetes. What are the end-to-end performance implications of using NetCRAQ in Kubernetes? Do you get lower latency in placement decisions?
- How big are the key-value pairs that Kubernetes needs to store? Given that you have limited resources on a P4 switch, can you always accommodate Kubernetes' needs with registers?
- How does your system scale? Is the information stored in NetCRAQ per container?
  How many containers can you support given the limited amount of memory on the data plane?
- Why did you run experiments only on the behavioural model? An evaluation on an actual testbed would make your argument much more convincing.


Review #97B
===========================================================================

Overall merit
-------------
3. Weak accept

Reviewer expertise
------------------
2. Some familiarity

Paper summary
-------------
This poster paper aims to extend a Kubernetes component, called etcd, with in-network caching (inspired by an earlier work called NetChain) to improve its performance and scalability. Etcd uses the Raft protocol to maintain consistency among multiple nodes, which can significantly increase the latency of write queries. This paper presents a framework called NetCRAQ that integrates a variant of NetChain with a different replication method (called CRAQ) into the Kubernetes architecture. The results show that NetCRAQ can achieve higher throughput than NetChain.

Comments for author
-------------------
I think integrating in-network offloading capabilities into Kubernetes could benefit many other researchers. I have the following questions; I hope they help the authors improve their work.

- Are you planning to release your source code? And potentially upstream your changes to Kubernetes?
- Which node is being reported in Figure 3? Is it the reference node (i.e., distance from tail = 0)?
- The text mentions that the evaluation uses a realistic workload. Is it a well-known benchmark or workload?
- The evaluation mainly focuses on the read/write ratio; could skewness and/or key-value sizes affect your results?


Review #97C
===========================================================================

Overall merit
-------------
3. Weak accept

Reviewer expertise
------------------
1.
No familiarity

Paper summary
-------------
This poster presents a new Kubernetes architecture that leverages in-network caching to speed up its key-value store.

Comments for author
-------------------
The design can outperform the state-of-the-art approach, NetChain. This is good news. However, I think the goal of the poster is not clear. Is it speeding up the key-value store? Key-value stores have many applications, and there is no direct link between key-value stores and Kubernetes. Why restrict the usage to Kubernetes? It would also be better if the poster showed the latency of read and write requests; latency is an important metric too.