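Linux's generic_file_buffered_write is the entry point of the buffered write path in the VFS layer. It first copies user data into the page cache, and the data is later persisted to disk by background or synchronous writeback. The function wraps the whole sequence: walking the iov_iter, looking up or allocating page cache pages, copying the data, and marking the pages dirty. The listings below are condensed for readability, and details differ between kernel versions.

    // mm/filemap.c
    ssize_t generic_file_buffered_write(struct kiocb *iocb,
                                        struct iov_iter *from)
    {
        struct file *file = iocb->ki_filp;
        struct address_space *mapping = file->f_mapping;
        struct inode *inode = mapping->host;
        ssize_t written = 0;
        ssize_t status;

        do {
            status = generic_perform_write(iocb, from);
            if (likely(status > 0))
                written += status;
            else if (written)
                break;          /* report the partial write */
            else
                return status;  /* nothing written: return the error */
        } while (iov_iter_count(from));

        return written;
    }

The outer loop keeps the write going when a single generic_perform_write call does not consume all of the user data. In practice generic_perform_write already splits the work page by page internally, so the outer loop is mostly a safety net.

    // mm/filemap.c
    ssize_t generic_perform_write(struct kiocb *iocb, struct iov_iter *i)
    {
        struct file *file = iocb->ki_filp;
        struct address_space *mapping = file->f_mapping;
        const struct address_space_operations *a_ops = mapping->a_ops;
        loff_t pos = iocb->ki_pos;
        ssize_t written = 0;
        ssize_t status = 0;

        do {
            struct page *page;
            unsigned long offset;   /* offset into the page cache page */
            size_t bytes;           /* bytes to write to this page */
            size_t copied;          /* bytes actually copied from user */
            void *fsdata = NULL;

            bytes = iov_iter_count(i);
            offset = pos & (PAGE_SIZE - 1);
            bytes = min(bytes, (size_t)PAGE_SIZE - offset);
            if (bytes == 0)
                break;

            /* write_begin returns a status; the locked page comes back via &page */
            status = a_ops->write_begin(file, mapping, pos, bytes,
                                        &page, &fsdata);
            if (unlikely(status < 0))
                break;

            copied = copy_page_from_iter_atomic(page, offset, bytes, i);
            flush_dcache_page(page);

            status = a_ops->write_end(file, mapping, pos, bytes, copied,
                                      page, fsdata);
            if (unlikely(status < 0))
                break;
            written += status;
            if (status != bytes)
                break;              /* short write: stop here */
            pos += status;
            cond_resched();
        } while (iov_iter_count(i));

        return written ? written : status;
    }

The core job of the write_begin callback is to find or create the target page in the page cache. For ext4, ext4_write_begin calls grab_cache_page_write_begin, which looks the page up in the mapping's xarray (the radix tree in older kernels). If the page is absent, it allocates a fresh page cache page and adds it to the mapping.

    // mm/filemap.c
    struct page *grab_cache_page_write_begin(struct address_space *mapping,
                                             pgoff_t index, unsigned flags)
    {
        struct page *page;
        int fgp_flags = FGP_LOCK | FGP_WRITE | FGP_CREAT | FGP_STABLE;

        page = pagecache_get_page(mapping, index, fgp_flags,
                                  mapping_gfp_mask(mapping));
        if (page)
            wait_for_stable_page(page);
        return page;
    }

The FGP_CREAT flag tells pagecache_get_page that, if the page does not exist, the kernel should allocate a new one through __page_cache_alloc and insert it into the mapping's xarray and the LRU list through add_to_page_cache_lru. The writer already holds the inode lock at this point, and FGP_LOCK returns the page locked, so concurrent writers do not conflict.

The data copy is done by copy_page_from_iter_atomic. It uses kmap_atomic to map the page into the kernel virtual address space and then copies the data from the user-space iov_iter with copyin. Note that "atomic" here refers to the copy itself: inside the kmap_atomic section page faults and preemption are disabled, so the copy cannot sleep and may come back short if a user page is not resident. It does not mean that the write operation as a whole is non-preemptible.

    // lib/iov_iter.c
    size_t copy_page_from_iter_atomic(struct page *page, unsigned offset,
                                      size_t bytes, struct iov_iter *i)
    {
        char *kaddr = kmap_atomic(page);
        size_t n;

        /*
         * Simplified to a single user buffer; the real code walks every
         * segment of the iterator. copyin() returns the number of bytes
         * that could NOT be copied.
         */
        n = bytes - copyin(kaddr + offset, i->ubuf + i->iov_offset, bytes);
        kunmap_atomic(kaddr);
        iov_iter_advance(i, n);
        return n;
    }

The write_end callback finishes the job once the data has been copied. For ext4, ext4_write_end calls block_write_end, which marks the written range dirty. If the write extends the file, i_size must be updated as well, which also bumps the inode's iversion. The listing below condenses these steps into one function; in mainline the i_size update is done by the caller (for example generic_write_end), not by block_write_end itself.

    // fs/buffer.c
    int block_write_end(struct file *file, struct address_space *mapping,
                        loff_t pos, unsigned len, unsigned copied,
                        struct page *page, void *fsdata)
    {
        struct inode *inode = mapping->host;
        unsigned start = pos & (PAGE_SIZE - 1);

        if (unlikely(copied < len)) {
            /* short copy: zero the part of the range that was not filled */
            if (!PageUptodate(page))
                zero_user(page, start + copied, len - copied);
        }
        if (!PageUptodate(page))
            SetPageUptodate(page);
        if (pos + copied > inode->i_size)
            i_size_write(inode, pos + copied);
        set_page_dirty(page);
        return copied;
    }

set_page_dirty, through __set_page_dirty_buffers, marks the page and its buffers dirty and sets the PAGECACHE_TAG_DIRTY tag in the xarray. It also marks the inode dirty, which queues the inode on the dirty list of its BDI (backing device info), so that the writeback threads can later scan for the tagged pages and write them back.

The page locking model of buffered writes deserves attention: write_begin takes the page lock and write_end releases it. While the lock is held, anything else that needs it, such as an mmap page fault or the page cache invalidation done for direct I/O, blocks on that page. The window between write_begin and write_end should therefore stay short, which is why a large write is cut into loop iterations of at most PAGE_SIZE bytes each.

For appends to a large file, generic_perform_write advances pos on every iteration. When the application writes in chunks smaller than a page, the next write is likely to hit the page that was just written: pagecache_get_page finds it, and write_begin returns the existing page without paying for an allocation. This behavior matters a great deal for sequential write performance.

Finally, if the filesystem has DAX (direct access) enabled, generic_file_buffered_write is not executed at all; the write goes through dax_iomap_rw and bypasses the page cache. generic_file_buffered_write therefore runs for files that are not in DAX mode and, apart from the buffered fallback of a direct write, not opened with O_DIRECT. This is the core data flow of the kernel's buffered write path.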
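The per-page splitting in generic_perform_write can be tried out in user space. The following is a minimal sketch, not kernel code: `SKETCH_PAGE_SIZE`, `struct chunk` and `split_write` are hypothetical names introduced for illustration, and the page size is assumed to be 4096 bytes. It only reproduces the `offset = pos & (PAGE_SIZE - 1)` and `bytes = min(bytes, PAGE_SIZE - offset)` arithmetic, so that one can see which page each piece of a write lands in.

```c
#include <assert.h>
#include <stddef.h>

/* Assumed page size for this user-space sketch (not taken from the kernel). */
#define SKETCH_PAGE_SIZE 4096UL

/* One per-page piece of a larger write (hypothetical helper type). */
struct chunk {
    unsigned long index;   /* page index within the file */
    unsigned long offset;  /* byte offset inside that page */
    size_t bytes;          /* bytes that land in that page */
};

/*
 * Split a write of `count` bytes at file position `pos` into per-page
 * chunks, using the same arithmetic as the generic_perform_write loop.
 * Stores at most `max` chunks in `out` and returns the total number of
 * chunks the write needs (which may exceed `max`).
 */
static size_t split_write(unsigned long long pos, size_t count,
                          struct chunk *out, size_t max)
{
    size_t n = 0;

    assert(out != NULL || max == 0);

    while (count > 0) {
        /* Offset of pos inside its page. */
        unsigned long offset = (unsigned long)(pos & (SKETCH_PAGE_SIZE - 1));
        /* Never cross a page boundary in one step. */
        size_t bytes = SKETCH_PAGE_SIZE - offset;

        if (bytes > count)
            bytes = count;

        if (n < max) {
            out[n].index = (unsigned long)(pos / SKETCH_PAGE_SIZE);
            out[n].offset = offset;
            out[n].bytes = bytes;
        }
        n++;
        pos += bytes;
        count -= bytes;
    }
    return n;
}
```

For a 5000-byte write at position 4000, `split_write` reports three chunks: 96 bytes at offset 4000 of page 0, a full 4096 bytes in page 1, and the remaining 808 bytes at the start of page 2. Only the first and last chunks are partial, which is why only those pages may need existing contents to be preserved around the written range.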