linux block devices, part 5 - 白红宇



Published: 2019-05-02


linux 3.1.5

blk-merge.c

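blk_recalc_rq_segments/__blk_recalc_rq_segments: count the number of physical segments in a request. Pages above the queue's bounce limit (high pages) are counted too, but each one always starts a new segment. Called by blk-core.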

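blk_recount_segments: counts the segments of this bio only, not those of the bios chained after it via bi_next; called by drivers, filesystems and the other merge functions.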
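The counting rule can be illustrated with a small user-space model. This is a simplified sketch, not the kernel code: `struct vec`, `struct seg_result` and `count_segments` are hypothetical names, each chunk carries a made-up physical address, and the queue limits are passed as plain parameters. It follows the 3.1 logic as described above: a high page always opens a new segment, and with clustering enabled a chunk is only folded into the current segment if it is physically contiguous, stays within the max segment size and does not cross the segment boundary mask. The first and last segment sizes correspond to what the kernel records in bi_seg_front_size and bi_seg_back_size.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for struct bio_vec: a physically addressed chunk. */
struct vec {
    uint64_t phys;   /* physical start address */
    unsigned len;    /* length in bytes */
    bool high;       /* page lies above the queue's bounce pfn */
};

struct seg_result {
    unsigned nr_segs;     /* physical segments found */
    unsigned front_size;  /* size of the first segment (cf. bi_seg_front_size) */
    unsigned back_size;   /* size of the last segment (cf. bi_seg_back_size) */
};

/* __BIO_SEG_BOUNDARY-style test: [start, end) stays inside one boundary window. */
static bool same_window(uint64_t start, uint64_t end, uint64_t mask)
{
    return (start | mask) == ((end - 1) | mask);
}

struct seg_result count_segments(const struct vec *v, size_t n, bool cluster,
                                 unsigned max_seg_size, uint64_t boundary_mask)
{
    struct seg_result r = {0, 0, 0};
    unsigned seg_size = 0;
    bool highprv = true;   /* forces the first vec to open a segment */

    for (size_t i = 0; i < n; i++) {
        const struct vec *bv = &v[i];
        /* short-circuit order matters: at i == 0 highprv is true, so v[i - 1] is not read */
        if (!bv->high && !highprv && cluster &&
            seg_size + bv->len <= max_seg_size &&
            v[i - 1].phys + v[i - 1].len == bv->phys &&
            same_window(v[i - 1].phys, bv->phys + bv->len, boundary_mask)) {
            seg_size += bv->len;   /* extend the current segment */
            continue;
        }
        if (r.nr_segs == 1)
            r.front_size = seg_size;   /* closing the first segment */
        r.nr_segs++;
        seg_size = bv->len;            /* open a new segment */
        highprv = bv->high;
    }
    if (r.nr_segs == 1)
        r.front_size = seg_size;
    r.back_size = seg_size;
    return r;
}
```

With two contiguous 4 KiB chunks followed by a distant 512-byte chunk, clustering yields two segments (8192 bytes, then 512 bytes); a 4 KiB boundary mask or a 4 KiB max segment size splits them into three.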

blk_phys_contig_segment:

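If clustering is disabled (cluster is 0), return 0.

If the bio's bi_seg_back_size plus the next bio's bi_seg_front_size exceeds the max segment size, return 0.

bi_seg_back_size is the size of the bio's last physical segment, i.e. the tail part that does not fill a whole max-size segment.

bi_seg_front_size is the size of the bio's first physical segment (if the whole bio fits in one segment, that is the whole bio).

If the bio is empty or carries no data, return 1.

If the bio and the next bio are not physically contiguous, return 0.

BIO_SEG_BOUNDARY: check whether the joined range stays within one segment boundary; if so, return 1.

A return value of 1 means the two can be merged (they fit in one segment); 0 means they cannot.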
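The decision order above can be written as a small function. A minimal sketch, assuming a hypothetical flattened `struct bio_info` that records the first and last bio_vec of each bio plus the two segment sizes; in the kernel these values come from the bios themselves and from the queue limits. `phys_contig_segment` is likewise a hypothetical name.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical, flattened view of a bio for this sketch. */
struct bio_info {
    bool has_data;
    uint64_t last_vec_phys;   /* start of the bio's last bio_vec */
    unsigned last_vec_len;
    uint64_t first_vec_phys;  /* start of the bio's first bio_vec */
    unsigned first_vec_len;
    unsigned seg_front_size;  /* size of the first physical segment */
    unsigned seg_back_size;   /* size of the last physical segment */
};

/* Returns 1 if the tail of bio and the head of nxt fit in one segment. */
int phys_contig_segment(bool cluster, unsigned max_seg_size, uint64_t boundary_mask,
                        const struct bio_info *bio, const struct bio_info *nxt)
{
    if (!cluster)
        return 0;
    if (bio->seg_back_size + nxt->seg_front_size > max_seg_size)
        return 0;
    if (!bio->has_data)
        return 1;
    /* last vec of bio must end exactly where the first vec of nxt starts */
    if (bio->last_vec_phys + bio->last_vec_len != nxt->first_vec_phys)
        return 0;
    /* boundary check: start of bio's last vec .. end of nxt's first vec */
    return (bio->last_vec_phys | boundary_mask) ==
           ((nxt->first_vec_phys + nxt->first_vec_len - 1) | boundary_mask);
}
```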

blk_rq_map_sg: called by drivers or by blk-lib.

Maps a request onto a scatterlist (sglist) and returns the number of segments.

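ll_new_hw_segment: checks whether the bio's segments can be added to the request without exceeding the queue's max segment count; if so, adds them to the request's nr_phys_segments, otherwise marks the request REQ_NOMERGE.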

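ll_back_merge_fn/ll_front_merge_fn: check the max sectors limit, then call ll_new_hw_segment to merge the bio into the request. They are called by bio_attempt_back_merge/bio_attempt_front_merge, which add the bio at the tail and at the head of the request respectively.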
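The checks done on a back merge can be sketched as follows. This assumes a hypothetical `struct req_state` holding only the request's sector and segment counts; `back_merge_ok` combines the max-sectors test of ll_back_merge_fn with the max-segments test of ll_new_hw_segment, and leaves out the integrity check and the `q->last_merge` reset. Updating the request's data length is the caller's job and is not shown.

```c
#include <stdbool.h>

/* Hypothetical, flattened request state for this sketch. */
struct req_state {
    unsigned sectors;          /* blk_rq_sectors(req) */
    unsigned nr_phys_segments; /* req->nr_phys_segments */
    bool nomerge;              /* stands in for REQ_NOMERGE */
};

/* Returns 1 if a bio of bio_sectors sectors / bio_segs segments may be appended. */
int back_merge_ok(struct req_state *req, unsigned bio_sectors, unsigned bio_segs,
                  unsigned max_sectors, unsigned max_segments)
{
    if (req->sectors + bio_sectors > max_sectors)
        goto no_merge;                        /* request would grow too large */
    if (req->nr_phys_segments + bio_segs > max_segments)
        goto no_merge;                        /* too many segments */
    req->nr_phys_segments += bio_segs;        /* account the new segments */
    return 1;
no_merge:
    req->nomerge = true;                      /* don't try this request again */
    return 0;
}
```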

ll_merge_requests_fn: checks whether a request and the next request can be merged, and updates the segment count.
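The segment accounting for a request-to-request merge can be sketched like this. `merged_segments` is a hypothetical helper; `contig` stands for the result of a blk_phys_contig_segment-style check on the last bio of req and the first bio of next. The special/integrity checks and the update of the front/back segment sizes are omitted. When the two ends fit in one segment, the total drops by one.

```c
#include <stdbool.h>

/* Returns the merged segment count, or -1 if the merge must be refused. */
int merged_segments(unsigned req_sectors, unsigned next_sectors,
                    unsigned req_segs, unsigned next_segs, bool contig,
                    unsigned max_sectors, unsigned max_segments)
{
    int total;

    if (req_sectors + next_sectors > max_sectors)
        return -1;                     /* merged request would be too large */
    total = (int)(req_segs + next_segs);
    if (contig)
        total--;                       /* req's tail and next's head share a segment */
    if (total > (int)max_segments)
        return -1;
    return total;
}
```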

blk_rq_set_mixed_merge: sets the request's mixed-merge flag (REQ_MIXED_MERGE).

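blk_account_io_merge: updates the partition's I/O statistics and drops the request's reference to the partition, freeing the partition if no references remain.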

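attempt_merge:

Checks whether the request and the next request are mergeable.

Does not merge when only one of them is a discard request, or only one is a secure discard request.

Does not merge requests with different data directions, requests for different disks, or when next is a special request.

Calls ll_merge_requests_fn to check whether they can be merged and to update the segment count.

Sets the mixed-merge flag if necessary.

Appends next's bios to the tail of the request's bio list.

Calls elv_merge_requests to let the elevator layer do its part.

Calls blk_account_io_merge for next.

Calls __blk_put_request to free next.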
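The pre-checks in attempt_merge can be modelled as a predicate. A sketch with a hypothetical `struct rq` and illustrative flag bits (not the kernel's REQ_* values); `may_attempt_merge` is a hypothetical name. Besides the checks listed above, it also requires next to start at the sector where req ends, a contiguity check the 3.1 source performs as well; the rq_mergeable test is left out.

```c
#include <stdbool.h>
#include <stdint.h>

/* Illustrative flag bits for this sketch only. */
enum { F_DISCARD = 1u << 0, F_SECURE = 1u << 1 };

struct rq {
    uint64_t pos;        /* start sector */
    unsigned sectors;    /* length in sectors */
    int dir;             /* 0 = read, 1 = write */
    int disk;            /* identifies rq_disk */
    bool special;        /* requests with ->special set are never merged */
    unsigned flags;
};

/* Checks made before ll_merge_requests_fn is consulted. */
bool may_attempt_merge(const struct rq *req, const struct rq *next)
{
    if ((req->flags & F_DISCARD) != (next->flags & F_DISCARD))
        return false;                         /* discard vs. normal I/O */
    if ((req->flags & F_SECURE) != (next->flags & F_SECURE))
        return false;                         /* secure vs. plain discard */
    if (req->pos + req->sectors != next->pos)
        return false;                         /* next must start where req ends */
    if (req->dir != next->dir || req->disk != next->disk || next->special)
        return false;
    return true;
}
```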

attempt_back_merge/attempt_front_merge/blk_attempt_req_merge:

Call attempt_merge to try the merge.

Called from blk-core's make request path.

blk-settings.c:

blk_queue_prep_rq: sets the queue's prepare-request function; called by drivers.

blk_queue_unprep_rq: sets the unprepare function; called by drivers.

blk_queue_merge_bvec/blk_queue_softirq_done/blk_queue_rq_timeout/blk_queue_rq_timed_out/blk_queue_lld_busy: register the corresponding request-handling callbacks; called by drivers.

blk_set_default_limits: sets the default limit values; called by drivers and by blk_queue_make_request in blk-settings.

blk_queue_make_request: sets the queue's make_request_fn and initializes some fields; called by drivers and by blk-core (blk_init_allocated_queue_node).

blk_queue_bounce_limit: sets the queue's bounce-buffer limit; called by drivers and from blk-settings.

blk_queue_max_hw_sectors/blk_limits_max_hw_sectors: set max_hw_sectors, and set max_sectors to the smaller of max_hw_sectors and the default of 1024 sectors; called by drivers and from blk-settings.
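The max_sectors rule fits in a few lines. A sketch assuming 512-byte sectors, 4 KiB pages and a default of 1024 sectors; `struct limits` and `set_max_hw_sectors` are hypothetical names. The kernel version also raises a value smaller than one page to one page (and logs a message); the sketch reproduces only that arithmetic.

```c
#define SECTOR_SHIFT 9
#define ASSUMED_PAGE_SHIFT 12          /* assumption: 4 KiB pages */
#define DEF_MAX_SECTORS 1024u          /* default soft limit, in sectors */

struct limits {                        /* hypothetical subset of queue_limits */
    unsigned max_hw_sectors;
    unsigned max_sectors;
};

void set_max_hw_sectors(struct limits *l, unsigned max_hw_sectors)
{
    /* below one page: raise to one page worth of sectors */
    if (((unsigned long long)max_hw_sectors << SECTOR_SHIFT) < (1ull << ASSUMED_PAGE_SHIFT))
        max_hw_sectors = 1u << (ASSUMED_PAGE_SHIFT - SECTOR_SHIFT);
    l->max_hw_sectors = max_hw_sectors;
    /* soft limit = min(hardware limit, default) */
    l->max_sectors = max_hw_sectors < DEF_MAX_SECTORS ? max_hw_sectors : DEF_MAX_SECTORS;
}
```

A controller advertising 2048 sectors therefore still gets regular requests capped at 1024 sectors, while the full hardware limit stays recorded in max_hw_sectors.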

blk_queue_max_discard_sectors: sets the max discard sectors value.

blk_queue_max_segments/blk_queue_max_segment_size/blk_queue_logical_block_size/blk_queue_physical_block_size/blk_queue_alignment_offset/blk_limits_io_min/blk_queue_io_min/blk_limits_io_opt/blk_queue_io_opt/blk_queue_stack_limits: called by drivers to set the individual limits.

blk_queue_stack_limits/blk_stack_limits: for stacking drivers, compute the limits common to all underlying devices (all limits are combined in a single function).
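How stacked limits combine can be sketched with a few fields. `struct qlimits` and `stack_limits` are hypothetical; the idea is that maximum-size limits take the smallest non-zero value (0 meaning unset, like the kernel's min_not_zero) and block sizes take the largest, so the top device never issues I/O that some underlying device cannot handle. The real blk_stack_limits covers many more fields and also checks alignment.

```c
/* Hypothetical subset of struct queue_limits. */
struct qlimits {
    unsigned max_sectors;
    unsigned max_segments;
    unsigned logical_block_size;
    unsigned physical_block_size;
};

/* 0 means "no limit set", as with the kernel's min_not_zero(). */
static unsigned min_not_zero(unsigned a, unsigned b)
{
    if (a == 0)
        return b;
    if (b == 0)
        return a;
    return a < b ? a : b;
}

static unsigned max_u(unsigned a, unsigned b)
{
    return a > b ? a : b;
}

/* Fold bottom device b into top limits t so t satisfies every device seen so far. */
void stack_limits(struct qlimits *t, const struct qlimits *b)
{
    t->max_sectors = min_not_zero(t->max_sectors, b->max_sectors);
    t->max_segments = min_not_zero(t->max_segments, b->max_segments);
    t->logical_block_size = max_u(t->logical_block_size, b->logical_block_size);
    t->physical_block_size = max_u(t->physical_block_size, b->physical_block_size);
}
```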

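blk_queue_dma_pad: sets the DMA pad mask.

blk_queue_update_dma_pad: updates the pad mask, raising it if the new mask is larger.

blk_queue_dma_drain: sets up a DMA drain buffer. Some DMA devices, such as ATAPI devices, need an actual buffer to receive the excess data of a transfer; called by the ATAPI/SCSI code.

blk_queue_segment_boundary: sets the segment boundary mask.

blk_queue_dma_alignment/blk_queue_update_dma_alignment: set/update the DMA alignment.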
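The pad and alignment masks can be illustrated as below. A sketch with hypothetical names: the two update functions only ever raise a mask (the behaviour of the update variants), `pad_len` computes how many bytes must be appended so that a transfer length becomes a multiple of pad_mask + 1, and `dma_aligned` tests an address against an alignment mask.

```c
#include <stdint.h>

struct dma_cfg {
    unsigned pad_mask;    /* e.g. 3 => pad transfers to a multiple of 4 bytes */
    unsigned alignment;   /* e.g. 511 => buffers must be 512-byte aligned */
};

/* Update variants: keep the stricter (larger) mask. */
void update_pad(struct dma_cfg *c, unsigned mask)
{
    if (mask > c->pad_mask)
        c->pad_mask = mask;
}

void update_alignment(struct dma_cfg *c, unsigned mask)
{
    if (mask > c->alignment)
        c->alignment = mask;
}

/* Bytes to append so that len becomes a multiple of pad_mask + 1. */
unsigned pad_len(unsigned len, unsigned pad_mask)
{
    if (!(len & pad_mask))
        return 0;                     /* already padded */
    return (pad_mask & ~len) + 1;
}

/* An address is usable if none of its low bits under the mask are set. */
int dma_aligned(uintptr_t addr, unsigned alignment)
{
    return (addr & alignment) == 0;
}
```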

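blk_queue_flush: sets the queue's flush flags (REQ_FLUSH/REQ_FUA); called by drivers.

blk_queue_flush_queueable: sets whether flush requests on the queue are queueable.

blk_settings_init: sets blk_max_low_pfn/blk_max_pfn when the block subsystem is initialized.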

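blk-sysfs.c: sysfs operations for the block queue, i.e. show/store handlers for its attributes; when some attributes change, the queue (or something else) has to be woken up or updated.

blk_register_queue: registers a queue, at add-disk time.

blk_unregister_queue: unregisters a queue, at del-disk time.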

blk-tag.c: currently used by SCSI.

Builds the queue's tag map and assigns tags to requests.
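Tag allocation boils down to finding and claiming a free bit in a bitmap. A single-threaded sketch with a hypothetical 64-bit `struct tag_map` and hypothetical `tag_get`/`tag_put`; the kernel keeps a larger bitmap in the queue's tag structure and performs the search and claim with the queue lock held, using atomic bit operations.

```c
/* Hypothetical tag map: bit i set => tag i in use; depth must be <= 64 here. */
struct tag_map {
    unsigned long long bits;
    int depth;
};

/* Returns the lowest free tag and marks it busy, or -1 if all tags are busy. */
int tag_get(struct tag_map *m)
{
    for (int tag = 0; tag < m->depth; tag++) {
        if (!(m->bits & (1ULL << tag))) {
            m->bits |= 1ULL << tag;
            return tag;
        }
    }
    return -1;
}

/* Releases a tag when its request completes. */
void tag_put(struct tag_map *m, int tag)
{
    m->bits &= ~(1ULL << tag);
}
```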

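blk-throttle.c: controls the I/O bandwidth of a request queue.

Three main functions are called by blk-core:

blk_throtl_init: when blk-core creates a queue.

blk_throtl_bio: on blk-core's make request path, for each incoming bio.

blk_throtl_exit: when the queue is released (via its sysfs kobject).

throtl_init: the module init function. It allocates the workqueue and registers blkio_policy_throtl on blkio_list. (blkio_list is the blk cgroup policy list. The blk cgroup is a cgroup subsystem with many control files (cfts), some of which control throttling. Each cft appears as a file in the subsystem's directory with its own read/write handlers, and those handlers eventually call the functions defined in blkio_policy_throtl.)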

blkio_policy_throtl:

Its policy ID is BLKIO_POLICY_THROTL.

It contains the update functions for the bps and iops limits.

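blk_throtl_init:

Allocates and initializes throtl_data (td), initializes blk_throtl_work, and sets td's queue to the queue created by blk-core.

Calls throtl_alloc_tg to create a throtl_grp (tg); this also calls blkio_alloc_blkg_stats to allocate the blk group's per-CPU stats, and initializes the tg.

Sets td's root_tg to the newly created tg.

Calls throtl_init_add_tg_lists to add the tg to td's tg_list. That function calls __throtl_tg_fill_dev_details to set the tg's dev (taken from td's queue), calls blkiocg_add_blkio_group to add the tg to blkio_root_cgroup, and initializes the tg's bps and iops.

Sets q->td = td.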

blk_throtl_exit:

Stops the workqueue.

Removes the tgs from the cgroup.

Removes the tgs from td.

If needed (when nr_undestroyed_grps > 0), waits for an RCU grace period with synchronize_rcu.

Frees td.

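blk_throtl_bio: adds a bio to a tg.

A bio that has already been throttled (REQ_THROTTLED set) is passed straight through.

Looks up the tg (keyed by td, in the blkcg). If it exists and has no bps or iops limit in this direction, updates the cgroup stats and returns 0. (For blk-core, returning 0 with the bio still in place means the queue's make_request_fn will be called; if the bio was instead queued on the tg, it is not called.)

Gets the tg (an existing one or a newly created one).

If the tg already has bios queued in this direction, sets update_disptime = false (no need to schedule the tg work now) and queues the bio.

Calls tg_may_dispatch to check whether the bio can be dispatched now. If it returns 1 (within limits), throtl_charge_bio updates the tg's bytes_disp and io_disp and the cgroup stats, and throtl_trim_slice is called as well, so that later bios are not delayed for a long time.

Otherwise calls throtl_add_bio_tg to add the bio to the tg.

If update_disptime is true, calls tg_update_disptime to update the tg's disptime, then throtl_schedule_next_dispatch to schedule the tg work.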

xchg: atomically exchanges a value; because it is atomic, it is also used simply to set a value.
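User space has no xchg(), but C11's atomic_exchange has the same exchange semantics, so it is used here as a stand-in. The sketch shows the two uses mentioned: taking a value exactly once, and setting a value atomically while getting the old one back. `take_pending` and `set_flag` are hypothetical names.

```c
#include <stdatomic.h>
#include <stddef.h>

/* Take a pending pointer exactly once: the first caller gets the old
 * value and leaves NULL behind, later callers get NULL. */
void *take_pending(_Atomic(void *) *slot)
{
    return atomic_exchange(slot, NULL);
}

/* Exchange used as an atomic store that also reports the previous value. */
int set_flag(atomic_int *flag, int value)
{
    return atomic_exchange(flag, value);
}
```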

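throtl_unlink_blkio_group: destroys a tg; used when a cgroup is destroyed.

throtl_schedule_delayed_work: if there are queued bios (nr queued > 0) or the limits have changed, schedules the work on kthrotld_workqueue.

blk_throtl_work: the work function; calls throtl_dispatch.

throtl_dispatch: calls throtl_process_limit_change to handle limit changes; if total_nr_queued is non-zero, calls throtl_select_dispatch to dispatch the queued bios.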



roundup: ((x + (y - 1)) / y) * y rounds x up to a multiple of y.

rounddown: x - x % y rounds x down to a multiple of y.
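The two formulas in C, for unsigned operands. With a slice of 100 jiffies, an elapsed time of 250 rounds up to 300 and down to 200. The function names are hypothetical; in the kernel these are the roundup()/rounddown() macros.

```c
/* Round x up to the next multiple of y (y > 0). */
unsigned long round_up_to(unsigned long x, unsigned long y)
{
    return ((x + (y - 1)) / y) * y;
}

/* Round x down to the previous multiple of y (y > 0). */
unsigned long round_down_to(unsigned long x, unsigned long y)
{
    return x - x % y;
}
```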

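tg_may_dispatch: returns 1 if the bio can be dispatched immediately, or 0 if it must wait for wait jiffies.

If the tg's bps[rw] and iops[rw] are both -1, there is no limit; return 1.

If the tg's current slice has been used up, start a new one; otherwise, if the slice end is earlier than now + throtl_slice, extend the slice end to that time.

Call tg_with_in_bps_limit and tg_with_in_iops_limit for the limit checks. tg_with_in_bps_limit compares bytes_disp (bytes already dispatched) plus the bio size with the number of bytes the bps limit allows for the elapsed slice time; if that fits, the bio passes, otherwise the wait time is computed from the excess bytes. tg_with_in_iops_limit does the same with io_disp against the iops limit. If both pass, return 1 with wait set to 0.

Otherwise take the larger of the iops and bps wait times and use it to extend the slice end.

io_disp is the number of read (or write) I/Os dispatched.

bps is the limit in bytes per second; bytes_disp counts the bytes dispatched.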
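The bps check and its wait time can be sketched as follows. This assumes HZ = 1000 and a slice of HZ/10 jiffies, and `within_bps_limit` is a hypothetical name. The elapsed time is rounded up to whole slices before the byte budget is computed, and the rounding slack is added back to the wait. The iops check follows the same pattern, comparing io_disp + 1 with the iops budget.

```c
#include <stdbool.h>
#include <stdint.h>

#define HZ_ASSUMED 1000u                  /* assumption: HZ = 1000 */
#define SLICE_ASSUMED (HZ_ASSUMED / 10)   /* assumption: slice = HZ/10 jiffies */

/* May a bio of 'size' bytes go now? 'elapsed' = jiffies since slice start.
 * bps must be non-zero. On failure *wait receives the jiffies to wait. */
bool within_bps_limit(uint64_t bps, uint64_t bytes_disp, uint64_t size,
                      unsigned long elapsed, unsigned long *wait)
{
    /* a slice that just started counts as one full slice */
    unsigned long elapsed_rnd = elapsed ? elapsed : SLICE_ASSUMED;
    elapsed_rnd = ((elapsed_rnd + SLICE_ASSUMED - 1) / SLICE_ASSUMED) * SLICE_ASSUMED;

    uint64_t allowed = bps * elapsed_rnd / HZ_ASSUMED;   /* byte budget so far */
    if (bytes_disp + size <= allowed) {
        *wait = 0;
        return true;
    }
    uint64_t extra = bytes_disp + size - allowed;         /* bytes over budget */
    unsigned long w = (unsigned long)(extra * HZ_ASSUMED / bps);
    if (!w)
        w = 1;
    *wait = w + (elapsed_rnd - elapsed);   /* add back the rounding slack */
    return false;
}
```

For example, at 1,000,000 bytes/s and 50 jiffies into a slice, the budget is 100,000 bytes; a 4096-byte bio on top of a full budget has to wait about 4 jiffies plus the 50 jiffies of rounding slack.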

throtl_charge_bio: once a bio may be dispatched, updates io_disp/bytes_disp and the tg's per-CPU cgroup stats.

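blk_throtl_bio: when blk-core's make request path calls it and it returns non-zero, the queue's make_request_fn is not called (the bio is ended with an error); otherwise it is called, unless blk_throtl_bio has queued the bio and cleared *biop.

blk_throtl_bio itself returns non-zero only when getting the tg fails; whether the bio is queued or dispatched immediately, it returns 0.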

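throtl_trim_slice: trims the iops/bps accounting (bytes_disp/io_disp) of the current slice, so that bios do not stay queued for a long time (I don't fully understand this part).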
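One way to read the trimming is sketched below: the budget belonging to whole slices that have already elapsed is subtracted from bytes_disp/io_disp, and slice_start moves forward, so old history stops counting against new bios when the slice keeps being extended. This assumes HZ = 1000 and a slice of HZ/10 jiffies; `struct slice_state` and `trim_slice` are hypothetical, and the slice_end adjustment the kernel also makes is left out.

```c
#include <stdint.h>

#define HZ_ASSUMED 1000u                  /* assumption: HZ = 1000 */
#define SLICE_ASSUMED (HZ_ASSUMED / 10)   /* assumption: slice = HZ/10 jiffies */

struct slice_state {
    unsigned long slice_start;   /* jiffies */
    uint64_t bytes_disp;         /* bytes charged in this slice */
    unsigned io_disp;            /* I/Os charged in this slice */
};

/* Forget the budget of whole slices that have elapsed since slice_start. */
void trim_slice(struct slice_state *s, unsigned long now, uint64_t bps, unsigned iops)
{
    unsigned long nr_slices = (now - s->slice_start) / SLICE_ASSUMED;
    if (!nr_slices)
        return;
    uint64_t bytes_trim = bps * SLICE_ASSUMED * nr_slices / HZ_ASSUMED;
    uint64_t io_trim = (uint64_t)iops * SLICE_ASSUMED * nr_slices / HZ_ASSUMED;
    if (!bytes_trim && !io_trim)
        return;
    s->bytes_disp = s->bytes_disp >= bytes_trim ? s->bytes_disp - bytes_trim : 0;
    s->io_disp = s->io_disp >= io_trim ? s->io_disp - (unsigned)io_trim : 0;
    s->slice_start += nr_slices * SLICE_ASSUMED;   /* slice now starts later */
}
```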

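If nr_queued > 0 or the bio cannot be dispatched now, the bio is queued (added to the tg's bio list), and the tg is inserted into the tg tree (tg_service_tree).

If nr_queued was 0 before the insertion (i.e. this is the tg's first queued bio), the dispatch time must be updated and dispatch scheduled: the tg work is scheduled using td's minimum dispatch time, which is taken from the tg tree. Once that disptime has passed, blk_throtl_work runs, and it calls throtl_dispatch.

If the bio was queued, *biop is set to NULL, so blk-core's make request path does not process it.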
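The calling contract can be shown with a toy throttler. Everything here is hypothetical (`struct bio`, `parked`, `throttle_bio`, `submit`): the throttler returns non-zero only on the error path; otherwise it returns 0 and either leaves the bio alone, so the caller submits it, or takes ownership of it and clears the caller's pointer.

```c
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>

struct bio { int id; struct bio *next; };   /* hypothetical stand-in */

static struct bio *parked;   /* bios held back by the throttler (LIFO, sketch only) */

int throttle_bio(struct bio **biop, bool queue_gone, bool over_limit)
{
    if (queue_gone)
        return -ENODEV;          /* the only non-zero return */
    if (over_limit) {
        (*biop)->next = parked;  /* the throttler now owns the bio */
        parked = *biop;
        *biop = NULL;            /* tell the caller not to submit it */
    }
    return 0;
}

/* Caller side: -1 = error, 0 = throttled (submitted later), 1 = submit now. */
int submit(struct bio *bio, bool queue_gone, bool over_limit)
{
    if (throttle_bio(&bio, queue_gone, over_limit))
        return -1;               /* the bio would be ended with an error */
    if (!bio)
        return 0;                /* queued; the dispatch work resubmits it */
    return 1;                    /* here the core would call q->make_request_fn */
}
```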

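throtl_dispatch:

If it was triggered by a limit change, calls throtl_process_limit_change.

If the queue is not empty, calls throtl_select_dispatch (-> throtl_dispatch_tg -> tg_dispatch_one_bio) to collect the dispatchable bios into a list (bio_list_on_stack). From each tg it takes at most 75% reads and 25% writes of the per-round quota. If the tg still has bios left, it is inserted into the tree again.

Calls throtl_schedule_next_dispatch to start the next round.

If the list (bio_list_on_stack) is non-empty, calls generic_make_request on each bio.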
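The read/write split per tg can be sketched as plain arithmetic. The quota of 8 bios per tg per round is an assumption (it matches the throtl_grp_quantum value in the 3.x sources I have seen); "ready" means consecutive bios at the head of the list that currently pass tg_may_dispatch. `split_quantum` is a hypothetical name. An unused read share is not handed over to writes.

```c
#define GRP_QUANTUM 8u   /* assumption: bios per tg per dispatch round */

struct dispatch_count {
    unsigned reads;
    unsigned writes;
};

/* How many reads/writes one tg may release in one round. */
struct dispatch_count split_quantum(unsigned reads_ready, unsigned writes_ready)
{
    unsigned max_reads = GRP_QUANTUM * 3 / 4;        /* 75% for reads */
    unsigned max_writes = GRP_QUANTUM - max_reads;   /* remaining 25% for writes */
    struct dispatch_count d;

    d.reads = reads_ready < max_reads ? reads_ready : max_reads;
    d.writes = writes_ready < max_writes ? writes_ready : max_writes;
    return d;
}
```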

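blk-timeout.c:

DECLARE_FAULT_ATTR(fail_io_timeout): defines the fault-injection attribute for I/O timeouts, together with its sysfs handlers.

blk_delete_timer: deletes the request's timer.

blk_rq_timed_out: timeout handling. Calls the queue's rq_timed_out_fn and acts on its return value (complete the request or restart the timer); called by blk_rq_timed_out_timer and by the abort path.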
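The handling of the driver's verdict can be sketched as a switch. The enum names mirror BLK_EH_NOT_HANDLED / BLK_EH_HANDLED / BLK_EH_RESET_TIMER, but the types and `on_timeout` are hypothetical; the function only reports what the block layer would do.

```c
/* Mirrors the driver verdicts returned by rq_timed_out_fn. */
enum eh_ret { EH_NOT_HANDLED, EH_HANDLED, EH_RESET_TIMER };

enum action { ACT_NONE, ACT_COMPLETE, ACT_REARM, ACT_BAD };

/* What happens to a timed-out request for a given driver verdict. */
enum action on_timeout(int ret)
{
    switch (ret) {
    case EH_HANDLED:
        return ACT_COMPLETE;   /* complete the request */
    case EH_RESET_TIMER:
        return ACT_REARM;      /* put the request back on the timeout list */
    case EH_NOT_HANDLED:
        return ACT_NONE;       /* the low-level driver finishes it itself */
    default:
        return ACT_BAD;        /* unexpected value: logged as an error */
    }
}
```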

blk_add_timer: adds the request to the timeout list.

Reposted from: http://utrkb.baihongyu.com/
