Zero-copy in the Linux operating system
Let’s start with the general I/O process of Linux
This is a procedure that reads from a disk file and writes the data out through a socket; the corresponding system calls are as follows.
read(file, tmp_buf, len);
write(socket, tmp_buf, len);
The program uses the read() system call, which switches the system from user mode to kernel mode, and the data on disk is read into the kernel read buffer by DMA (Direct Memory Access): the DMA controller transfers the disk data directly to memory over the bus, without involving the CPU. Once the requested data has been completely stored in the kernel read buffer, the system switches back from kernel mode to user mode and the CPU copies the data from the kernel read buffer to the user buffer.

When the program then uses the write() system call, the system switches from user mode to kernel mode and the CPU copies the data from the user buffer to the network buffer (socket buffer). The system switches back from kernel mode to user mode, and the data in the socket buffer is transferred by DMA to the buffer of the NIC driver for sending.

As you can see, the ordinary copy process goes through four user/kernel mode switches (context switches) and two CPU copies in memory, which is comparatively expensive in system resources.
Memory mapping method I/O
tmp_buf = mmap(file, len);
write(socket, tmp_buf, len);
This is the system-call sequence used. The principle of this method is to map the kernel buffer's memory into the user address space, which means the program can read and manipulate the kernel's data directly from user mode.
The mmap() system call first reads the disk data into the kernel buffer by DMA, and then, through memory mapping, makes the user buffer and the kernel read buffer refer to the same memory, so there is no need for the CPU to copy data from the kernel read buffer to a separate user buffer. When write() is then called, the CPU copies the data from the kernel buffer (which is also the mapped user buffer) directly into the network send buffer (socket buffer), and DMA passes it on to the NIC driver for sending. You can see that memory mapping reduces the CPU copies from two to one, while the user/kernel mode switches (context switches) remain four; the advantage is that the application can still read and modify the data.
Kernel space internal transfer I/O
sendfile(socket, file, len);
The sendfile() system call allows the I/O transfer to happen entirely inside kernel space.
The sendfile() system call also causes a switch from user mode to kernel mode, but unlike the memory mapping method, user space can neither see nor modify the data: this is a pure data transfer. Reading from disk into the kernel read buffer is done by DMA, copying from the kernel read buffer to the network send buffer still requires the CPU, and moving from the network send buffer to the NIC buffer is again DMA. That leaves one CPU copy and only two user/kernel mode switches. This is a clear improvement over the memory mapping approach; the trade-off is that the program cannot modify the data, it simply transfers it.
Upgraded version - kernel space internal transfer I/O
Still the system call sendfile().
sendfile(socket, file, len);
In Linux kernel 2.4 and later, the socket buffer descriptor was adjusted to support DMA scatter-gather ("gather copy"). The interface is unchanged for the user, but the internal operation is different.
With hardware support, we only need to append the buffer descriptor (the address and length of the data in the page cache) to the socket buffer, and the DMA controller can gather the data directly from the page cache and send it to the NIC. After the sendfile() system call is initiated, the disk data is read into the kernel buffer by DMA, and the gather-capable DMA engine then assembles the data from the kernel buffer and hands it to the NIC; only the descriptors pass through the socket buffer. As you can see, in this mode the CPU is no longer involved in copying the data at all, so this is true zero-copy, though it requires hardware (NIC) support to achieve.
Subsequent optimization-splice() system call
The splice() system call is very similar to sendfile(): the user application must have two open file descriptors, one for the input and one for the output. Unlike sendfile(), splice() is not limited to sending data from a file to a socket; it can move data between other kinds of descriptors, with the restriction that at least one end of a splice() call must be a pipe, so a file-to-file transfer is done by splicing through a pipe. In other words, sendfile() is essentially a special case of splice(). Since Linux 2.6.23, sendfile() no longer has a separate implementation of its own: the API and its behavior remain, but they are implemented on top of the splice() mechanism. The underlying transfer process is no different from that of sendfile().
Zero-copy in Java NIO
Memory-mapped I/O in NIO
The first thing to explain is that a Channel in Java NIO is often described as corresponding to a kernel buffer (a read buffer or a socket buffer), and a Buffer to the user buffer, but this is only an analogy. The operating system's read/write buffers (in Linux, the page cache) have no direct relationship with Java NIO: they are kernel memory that the drivers and I/O devices interact with directly, and the programmer normally cannot touch them.
Let’s look at a piece of code:
File file = new File("test.zip");
RandomAccessFile raf = new RandomAccessFile(file, "rw");
FileChannel fileChannel = raf.getChannel();
MappedByteBuffer buffer = fileChannel.map(FileChannel.MapMode.READ_ONLY, 0, fileChannel.size());
The FileChannel.map() method in NIO actually uses the operating system's memory-mapping mechanism: the OS maps a region of memory onto the disk file, so reading and writing that memory is equivalent to reading and writing the file directly. The actual reads still go through the OS page cache, and writes rely on the OS flushing dirty pages back to disk.
This approach is suitable for reading large files and also for modifying the file content, but if the file is subsequently sent via a SocketChannel, the CPU still has to copy the data.
processData(); // after data processing is complete, open a SocketChannel
SocketChannel socketChannel = SocketChannel.open(new InetSocketAddress("", 1234));
// This still requires the CPU to copy the contents of the kernel buffer to the network buffer
socketChannel.write(buffer);
Zero copy in NIO
File file = new File("test.zip");
RandomAccessFile raf = new RandomAccessFile(file, "rw");
FileChannel fileChannel = raf.getChannel();
SocketChannel socketChannel = SocketChannel.open(new InetSocketAddress("", 1234));
// transferTo() transfers data directly between channels
fileChannel.transferTo(0, fileChannel.size(), socketChannel);
This approach is zero-copy in NIO. Let’s analyze the principle:
Buffers in NIO live in user space; even a DirectBuffer is just a block of native memory allocated with malloc, outside the JVM heap but still in user space.
transferTo() is implemented through the sendfile() system call (on Linux; Windows uses a different mechanism), and as described above, this is far more efficient than copying back and forth between the kernel buffer and a user buffer. transferFrom() is implemented the same way.
From this we can draw a conclusion: raw I/O in Java tends to be less efficient than in C or C++, because the JVM usually introduces an extra layer of memory copying. Even so, Java remains promising for middleware such as message queues: Kafka, for example, written largely in Java and Scala, achieves satisfactory throughput and stability.