Comparing virtual disk implementations on Linux and Windows

Posted: 2007-05-02 01:57

I took a quick look at how virtual disks are implemented on Linux; here is a brief comparison with the Windows approach.

1 Implementation principle

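  The underlying principle is the same on both systems, which is no surprise since both are just tools for the same job: a file is used to emulate a hard disk. Memory, a network connection, a serial port, a parallel port, or any other workable backing store could be used instead.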
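To make "a file emulates a disk" concrete, here is a minimal user-space sketch, not taken from either driver. It assumes 512-byte sectors; the names vdisk_read_sector and vdisk_write_sector are hypothetical helpers made up for this illustration.

```c
#include <assert.h>
#include <stdint.h>
#include <stdio.h>
#include <string.h>

#define SECTOR_SIZE 512 /* assumed sector size for this sketch */

/*
 * Hypothetical helper: store one sector in the backing file.
 * A sector number is turned into a byte offset in the file.
 * The cast to long limits the image size where long is 32 bits.
 */
int vdisk_write_sector(FILE *backing, uint64_t sector, const unsigned char *buf)
{
    assert(backing != NULL && buf != NULL);
    if (fseek(backing, (long)(sector * SECTOR_SIZE), SEEK_SET) != 0)
        return -1;
    return fwrite(buf, 1, SECTOR_SIZE, backing) == SECTOR_SIZE ? 0 : -1;
}

/* Hypothetical helper: load one sector from the backing file. */
int vdisk_read_sector(FILE *backing, uint64_t sector, unsigned char *buf)
{
    assert(backing != NULL && buf != NULL);
    if (fseek(backing, (long)(sector * SECTOR_SIZE), SEEK_SET) != 0)
        return -1;
    return fread(buf, 1, SECTOR_SIZE, backing) == SECTOR_SIZE ? 0 : -1;
}
```

A real driver does the same offset arithmetic, but issues the file I/O from kernel mode and, for an encrypted disk, transforms the buffer before writing and after reading.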

  The two also differ:

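  a. The Windows disk driver acts as an adapter: its upper interface implements the disk operations, and its lower interface implements the file operations. This resembles the USB stack, except that USB is logically split into two modules. (A file-backed virtual disk could also be split in two: one .sys implementing the upper interface and one .sys implementing the lower interface, arranged in order in the device stack and exchanging data through shared memory.)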

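   Main entry points and flow:

   DriverEntry -> Dispatch -> per-virtual-disk thread -> ZwWriteFile/ZwReadFile -> encrypt/decrypt -> ZwClose

   Most readers know the Windows side well, so I will not go into detail here; see FileDisk (simple and clear) or TrueCrypt (more mature) for reference.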

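  b. Linux takes a different approach: the driver calls dm_register_target to register a target_type. The type is user-defined (for example cyliu_type), and its name field holds the type name. The dm (device mapper) driver is the framework for this class of disk driver, and reads and writes to the disk go through the bio interface. Because Linux also treats devices as files, applications see everything as file operations, which gives the Linux file system framework good extensibility and generality.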

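Inside the kernel, device mapper provides a framework for mapping logical devices onto physical devices. Once the user has defined a mapping policy in user space and written a target driver plugin that handles the actual I/O requests as needed, building something like the LVM logical volume manager is straightforward. Device mapper exposes its interface through ioctl: a user-space program uses the device mapper library to send ioctl commands to the device mapper character device, which is how it communicates into the kernel. The same ioctl interface also provides an outward event notification mechanism that lets a target driver pass certain I/O-related events to user space. Implementing a virtual disk on Linux is therefore fairly simple: the driver only has to encrypt and decrypt the data, and the existing Linux drivers do everything else.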

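The mapped_device structure represents a mapped device. Its main fields are the locks for that device, the registered request queue, several memory pools, and a pointer to the device's mapping table. The mapping table is represented by the dm_table structure, which contains an array of dm_target structures; each dm_target describes the mapping from the mapped_device to one of its target devices.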

struct mapped_device {
   struct rw_semaphore io_lock;
   struct semaphore suspend_lock;
   spinlock_t pushback_lock;
   rwlock_t map_lock;
   atomic_t holders;
   atomic_t open_count;

   unsigned long flags;

   request_queue_t *queue;
   struct gendisk *disk;
   char name[16];

   void *interface_ptr;

   /*
   * A list of ios that arrived while we were suspended.
   */
   atomic_t pending;
   wait_queue_head_t wait;
   struct bio_list deferred;
   struct bio_list pushback;

   /*
   * The current mapping.
   */
   struct dm_table *map;
   /*
   * io objects are allocated from here.
   */
   mempool_t *io_pool;
   mempool_t *tio_pool;

   struct bio_set *bs;

   /*
   * Event handling.
   */
   atomic_t event_nr;
   wait_queue_head_t eventq;

   /*
   * freeze/thaw support require holding onto a super block
   */
   struct super_block *frozen_sb;
   struct block_device *suspended_bdev;

   /* forced geometry settings */
   struct hd_geometry geometry;
};

struct dm_table {
   struct mapped_device *md;
   atomic_t holders;

   /* btree table */
   unsigned int depth;
   unsigned int counts[MAX_DEPTH]; /* in nodes */
   sector_t *index[MAX_DEPTH];

   unsigned int num_targets;
   unsigned int num_allocated;
   sector_t *highs;
   struct dm_target *targets;

   /*
   * Indicates the rw permissions for the new logical
   * device. This should be a combination of FMODE_READ
   * and FMODE_WRITE.
   */
   int mode;

   /* a list of devices used by this table */
   struct list_head devices;

   /*
   * These are optimistic limits taken from all the
   * targets, some targets will need smaller limits.
   */
   struct io_restrictions limits;

   /* events get handed up using this callback */
   void (*event_fn)(void *);
   void *event_context;
};

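Within dm_table, these dm_target entries are organized as a B-tree to speed up lookup when an I/O request is mapped. Each dm_target records the start address and length of the region of the mapped device's logical space that its target device covers, and holds a pointer to the target_type structure with the operations for that target device.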
struct dm_target {
   struct dm_table *table;
   struct target_type *type;

   /* target limits */
   sector_t begin;
   sector_t len;

   /* FIXME: turn this into a mask, and merge with io_restrictions */
   /* Always a power of 2 */
   sector_t split_io;

   /*
   * These are automatically filled in by
   * dm_table_get_device.
   */
   struct io_restrictions limits;

   /* target specific data */
   void *private;

   /* Used to provide an error string from the ctr */
   char *error;
};
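The lookup that the highs array in dm_table supports can be sketched in user space as follows. This is a simplification, not the kernel code: it swaps the B-tree for a plain binary search, and it assumes highs[i] holds the last logical sector covered by targets[i], sorted in ascending order. The function name demo_find_target is hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

typedef uint64_t sector_t; /* stand-in for the kernel type */

/*
 * Return the index of the target whose range contains `sector`,
 * or -1 if the sector lies beyond the last target.
 * Finds the first i with highs[i] >= sector.
 */
int demo_find_target(const sector_t *highs, size_t num_targets, sector_t sector)
{
    size_t lo = 0;
    size_t hi = num_targets;

    assert(highs != NULL || num_targets == 0);
    while (lo < hi) {
        size_t mid = lo + (hi - lo) / 2;
        if (highs[mid] < sector)
            lo = mid + 1; /* target mid ends before the sector */
        else
            hi = mid;     /* target mid may contain the sector */
    }
    return lo < num_targets ? (int)lo : -1;
}
```

With three targets covering sectors 0-99, 100-299, and 300-1023, highs would be {99, 299, 1023}, and a request for sector 100 resolves to the second target.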

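The target_type structure mainly holds the name of the target driver plugin for a target device, the methods that construct and destroy target devices of that type, and the methods that remap I/O requests and complete I/O for them. The specific target device itself is represented by the private field of dm_target, which points to the structure for the target device that the mapped device maps to. That structure differs from one target type to another; for the simplest type, the linear mapping target, it is the linear_c structure.

The linear target's definition is quite simple: it contains only a pointer to the dm_dev structure for the underlying physical device and start, the offset within that device in sectors.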

struct target_type {
   const char *name;
   struct module *module;
   unsigned version[3];
   dm_ctr_fn ctr;
   dm_dtr_fn dtr;
   dm_map_fn map;
   dm_endio_fn end_io;
   dm_flush_fn flush;
   dm_presuspend_fn presuspend;
   dm_postsuspend_fn postsuspend;
   dm_preresume_fn preresume;
   dm_resume_fn resume;
   dm_status_fn status;
   dm_message_fn message;
   dm_ioctl_fn ioctl;
};
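The linear target mentioned above shows what a map method has to do. Below is a user-space sketch under stated assumptions: demo_target, demo_linear, and demo_bio are cut-down stand-ins for dm_target, linear_c, and bio, the physical device is reduced to an integer id, and the return value 0 is only this sketch's convention for "remapped".

```c
#include <assert.h>
#include <stdint.h>

typedef uint64_t sector_t; /* stand-in for the kernel type */

/* Stand-in for dm_target: the logical range plus target-specific data. */
struct demo_target {
    sector_t begin;  /* first logical sector of this target */
    sector_t len;    /* number of sectors it covers */
    void *private;   /* points to a demo_linear here */
};

/* Stand-in for linear_c: which device, and where the mapping starts on it. */
struct demo_linear {
    int dev_id;      /* hypothetical id replacing the dm_dev pointer */
    sector_t start;  /* offset on the physical device, in sectors */
};

/* Stand-in for bio: only the fields a remap touches. */
struct demo_bio {
    int dev_id;
    sector_t sector;
};

/* Redirect the bio to the underlying device and shift its sector. */
int demo_linear_map(const struct demo_target *ti, struct demo_bio *bio)
{
    const struct demo_linear *lc = ti->private;

    /* The caller must only pass sectors that fall inside this target. */
    assert(bio->sector >= ti->begin && bio->sector - ti->begin < ti->len);

    bio->dev_id = lc->dev_id;
    bio->sector = lc->start + (bio->sector - ti->begin);
    return 0;
}
```

An encrypting target would follow the same shape, with the data transformation added around the remap.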

/*
* main unit of I/O for the block layer and lower layers (ie drivers and
* stacking drivers)
*/
struct bio {
   sector_t bi_sector; /* device address in 512 byte
   sectors */
   struct bio *bi_next; /* request queue link */
   struct block_device *bi_bdev;
   unsigned long bi_flags; /* status, command, etc */
   unsigned long bi_rw; /* bottom bits READ/WRITE,
   * top bits priority
   */

   unsigned short bi_vcnt; /* how many bio_vec's */
   unsigned short bi_idx; /* current index into bvl_vec */

   /* Number of segments in this BIO after
   * physical address coalescing is performed.
   */
   unsigned short bi_phys_segments;

   /* Number of segments after physical and DMA remapping
   * hardware coalescing is performed.
   */
   unsigned short bi_hw_segments;

   unsigned int bi_size; /* residual I/O count */

   /*
   * To keep track of the max hw size, we account for the
   * sizes of the first and last virtually mergeable segments
   * in this bio
   */
   unsigned int bi_hw_front_size;
   unsigned int bi_hw_back_size;

   unsigned int bi_max_vecs; /* max bvl_vecs we can hold */

   struct bio_vec *bi_io_vec; /* the actual vec list */

   bio_end_io_t *bi_end_io;
   atomic_t bi_cnt; /* pin count */

   void *bi_private;

   bio_destructor_t *bi_destructor; /* destructor */
};
Creation in the kernel

Below, with reference to the code, is a brief walk-through of how a mapped device is created in the kernel:

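1. Using the parameters passed in through the ioctl interface that the kernel exposes to user space, dev_create creates the mapped device structure. This step is simple: it mainly allocates the required memory from the kernel, including the mapped device itself and the memory pools reserved for I/O, uses the kernel's blk_queue_make_request to register dm_request as the request function of the mapped device's queue, and registers the mapped device with the kernel as a disk block device.

2. dm_hash_insert is called to insert the new mapped device into a global hash table in device mapper, which holds every mapped device currently created in the kernel.

3. A user-space command invokes table_load through ioctl. Using the parameters passed from user space, this function builds the mapping table for the specified mapped device and the target devices it maps to. It first constructs the dm_table and dm_target structures, then calls dm_table_add_target in dm-table.c to initialize them from the user-supplied parameters. Based on the target type named in the parameters, it calls that type's constructor ctr to build the in-memory structure for the target device, and then updates the B-tree maintained in dm_table from the resulting dm_target. When this is done, the finished dm_table is stored in the hash_cell structure that represents the mapped device in the global hash table.

4. Finally, do_resume is invoked through ioctl to bind the mapped device to its mapping table. In practice this means dm_swap_table assigns the current dm_table pointer to the map field of mapped_device and then updates the field that records the mapped device's state.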

After these four main steps, device mapper has set up a mapped device in the kernel: a logical block device that users can work with.
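The create, load, and resume steps can be modelled in a few lines of user-space C. This is only a toy model of the ordering, not kernel code: every demo_ name is hypothetical, the global hash table from step 2 is left out, and the active flag is an assumed stand-in for the state field mentioned in step 4.

```c
#include <assert.h>
#include <stddef.h>

/* Stand-in for dm_table; the contents do not matter for this model. */
struct demo_table {
    unsigned int num_targets;
};

/* Stand-in for mapped_device plus the table slot kept in hash_cell. */
struct demo_mapped_device {
    struct demo_table *map;     /* live table, like mapped_device.map */
    struct demo_table *new_map; /* loaded but not yet bound */
    int active;                 /* assumed state flag for this sketch */
};

/* Step 1: create the device with no table bound. */
void demo_dev_create(struct demo_mapped_device *md)
{
    assert(md != NULL);
    md->map = NULL;
    md->new_map = NULL;
    md->active = 0;
}

/* Step 3: load a table; it is stored but I/O does not use it yet. */
void demo_table_load(struct demo_mapped_device *md, struct demo_table *t)
{
    assert(md != NULL && t != NULL);
    md->new_map = t;
}

/* Step 4: resume swaps the loaded table in and marks the device active. */
void demo_do_resume(struct demo_mapped_device *md)
{
    assert(md != NULL);
    if (md->new_map != NULL) {
        md->map = md->new_map;
        md->new_map = NULL;
    }
    md->active = (md->map != NULL);
}
```

The point of the two-slot design is that loading a table and switching to it are separate operations, so a new table can be prepared while the old one is still in use.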

I/O flow

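At its core, device mapper forwards I/O requests from the logical mapped device to the appropriate target devices, according to the mapping and the I/O handling rules described by the target drivers. It handles every block read and write request directed at a mapped device that arrives through the generic_make_request and submit_bio interfaces of the kernel's block I/O subsystem. Requests are processed from top to bottom through the device mapper device tree by forwarding. When a bio is forwarded down from a mapped device in the tree, one or more clones of the bio are created and sent to the target devices below. The same process repeats at each level of the tree; in theory the forwarding can continue through as many levels as the tree has. When a target driver at some level finishes a bio, it reports a completion event for that bio to the mapped device above it. This repeats level by level until the event reaches the root mapped device, at which point device mapper completes the original bio on the root mapped device and the whole I/O request ends.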

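As a bio is forwarded level by level through the device tree, it eventually ends at one or more leaf target nodes. Because a single bio may not span multiple target devices (that is, multiple physical extents), at each level device mapper uses the target mapping information the user supplied for the mapped device to clone one or more bios, splitting the request and forwarding the pieces to the corresponding target devices. Each clone is first handed to the target driver for that mapped device, which filters or otherwise processes it according to its I/O handling rules, and is then submitted to the target device. All of this happens in the dm_request function in dm.c. A target driver can do any of the following with these bios: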

   1. Queue the bios inside the driver for later processing;

   2. Redirect a bio to one or more target devices, or to different sectors on each target device;

   3. Return an error status to device mapper.
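The rule that a bio may not span several target devices means a request has to be cut at target boundaries before it is forwarded. The following user-space sketch illustrates only that splitting step; it is not the dm_request code. It assumes the targets are sorted and contiguous, ignores the split_io limit, and uses hypothetical demo_ names. Clone sectors are still logical sectors of the mapped device; remapping to the physical device is the target driver's job.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

typedef uint64_t sector_t; /* stand-in for the kernel type */

/* Stand-in for dm_target: only the logical range. */
struct demo_target {
    sector_t begin;
    sector_t len;
};

/* One piece of the original request, destined for a single target. */
struct demo_clone {
    size_t target_index;
    sector_t sector; /* logical start sector of this piece */
    sector_t len;    /* number of sectors in this piece */
};

/*
 * Split [sector, sector + len) along target boundaries.
 * Returns the number of clones written to `out`, or -1 if the range
 * is not fully covered by the table or `out` is too small.
 */
int demo_split_io(const struct demo_target *targets, size_t num_targets,
                  sector_t sector, sector_t len,
                  struct demo_clone *out, size_t max_out)
{
    size_t n = 0;

    assert(targets != NULL && out != NULL);
    for (size_t i = 0; i < num_targets && len > 0; i++) {
        sector_t end = targets[i].begin + targets[i].len; /* one past last */
        sector_t chunk;

        if (sector >= end)
            continue;               /* request starts after this target */
        if (sector < targets[i].begin)
            return -1;              /* hole in the table */
        if (n == max_out)
            return -1;              /* no room for another clone */

        chunk = end - sector;       /* sectors left in this target */
        if (chunk > len)
            chunk = len;

        out[n].target_index = i;
        out[n].sector = sector;
        out[n].len = chunk;
        n++;

        sector += chunk;
        len -= chunk;
    }
    return len == 0 ? (int)n : -1;
}
```

For targets covering sectors 0-99 and 100-299, a 30-sector request starting at sector 90 becomes two clones: 10 sectors for the first target and 20 for the second.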

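2 Mode of operation

  The path of an operation is: user-space application accesses the disk -> kernel enters the file system -> virtual disk driver -> file operation -> file system -> physical disk driver -> physical disk.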


yuanyuan, posted 2007-05-07 09:31: Thanks, I learned something.

lxcsyh, posted 2009-01-12 10:02: I was just about to work on something like this, so this was eye-opening. Linux drivers seem very different from Windows ones; there is not even an I/O manager. Looks like I have a lot to learn.
