Linux Network Architecture

   

The interfaces available for queuing disciplines and filters are mostly independent of the functionality available within an element.

18.4.1 Handles

All elements within the traffic-control tree can be addressed by 32-bit identifiers called handles. For example, the instances of the queuing disciplines discussed further below are marked with 32-bit identifiers, divided into a 16-bit major number and a 16-bit minor number. These numbers have nothing to do with the major and minor numbers for device files. An identifier is unique within one network device, but the same identifier can be used again on other network devices.

The minor number for a queuing discipline is always zero, except for the input queuing discipline number ffff:fff1, TC_H_INGRESS (in include/linux/pkt_sched.h), and the top queue of the output queuing discipline number ffff:ffff, TC_H_ROOT. Major numbers are assigned by the user and are in the range from 0x0001 to 0x7fff. If the user specifies major number 0, then the kernel allocates a major number between 0x8000 and 0xffff.

For classes, the major number corresponds to the associated queuing discipline, while the minor number specifies the class within that queuing discipline. In this case, the minor number can be in the range from 0x0 to 0xffff. Minor numbers are unique only within all classes of a queuing discipline.

include/linux/pkt_sched.h defines several macros you can use to mask major and minor numbers.
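The split of a handle into its two halves can be illustrated with a minimal user-space sketch. The mask and helper names below (HANDLE_MAJOR_MASK, handle_major(), handle_minor(), handle_make()) are hypothetical and chosen only for illustration; the kernel header provides its own macros for this purpose.

```c
#include <stdint.h>

/* Upper 16 bits hold the major number, lower 16 bits the minor number. */
#define HANDLE_MAJOR_MASK 0xFFFF0000U
#define HANDLE_MINOR_MASK 0x0000FFFFU

/* Hypothetical helper: extract the major number of a handle. */
static inline uint32_t handle_major(uint32_t handle)
{
    return (handle & HANDLE_MAJOR_MASK) >> 16;
}

/* Hypothetical helper: extract the minor number of a handle. */
static inline uint32_t handle_minor(uint32_t handle)
{
    return handle & HANDLE_MINOR_MASK;
}

/* Hypothetical helper: build a handle from major and minor number. */
static inline uint32_t handle_make(uint32_t major, uint32_t minor)
{
    return ((major << 16) & HANDLE_MAJOR_MASK) | (minor & HANDLE_MINOR_MASK);
}
```

With these helpers, a class written as 1:20 in hexadecimal notation corresponds to handle_make(0x1, 0x20), and a queuing discipline 1:0 yields a minor number of zero.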

18.4.2 Queuing Disciplines

The functions supplied by a queuing discipline are defined in the Qdisc_ops structure in include/net/pkt_sched.h:

struct Qdisc_ops
{
        struct Qdisc_ops        *next;
        struct Qdisc_class_ops  *cl_ops;
        char                    id[IFNAMSIZ];
        int                     priv_size;

        int                     (*enqueue) (struct sk_buff *, struct Qdisc *);
        struct sk_buff *        (*dequeue) (struct Qdisc *);
        int                     (*requeue) (struct sk_buff *, struct Qdisc *);
        int                     (*drop) (struct Qdisc *);

        int                     (*init) (struct Qdisc *, struct rtattr *arg);
        void                    (*reset) (struct Qdisc *);
        void                    (*destroy) (struct Qdisc *);
        int                     (*change) (struct Qdisc *, struct rtattr *arg);

        int                     (*dump) (struct Qdisc *, struct sk_buff *);
};

The first four entries are a link to a list (struct Qdisc_ops *next), a reference to the class-related operations (struct Qdisc_class_ops *cl_ops), which will be described later, an identifier (char id[IFNAMSIZ]), and a value used internally (int priv_size).

The following functions are available externally:

enqueue ()

include/net/pkt_sched.h

The function enqueue() is used to pass packets to a queuing discipline. The return value is zero (NET_XMIT_SUCCESS, see include/linux/netdevice.h) if the packet is accepted by the queuing discipline. If this or another packet is discarded while the packet is being queued, then the return value is nonzero:

  • NET_XMIT_DROP: The packet just passed was discarded.

  • NET_XMIT_CN: A packet was discarded, for example because of buffer overflow (CN stands for "congestion").

  • NET_XMIT_POLICED: A packet was discarded because the policing mechanism detected violation of a rule (e.g., the admissible rate was exceeded).

  • NET_XMIT_BYPASS: The passed packet was accepted, but won't leave the queuing discipline over the regular dequeue() function.

dequeue()

include/net/pkt_sched.h

When the function dequeue() is invoked, the queuing discipline returns a pointer to a packet (skb), which may be sent next. The return value NULL doesn't mean that there are no more packets waiting in the queuing discipline; it means only that there are no packets ready to be sent at the time of the call. The total number of packets waiting in a queuing discipline is stated in q.qlen of the Qdisc structure (struct Qdisc *q; q->q.qlen). This value should also be valid when a queuing discipline manages more than one queue.
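The interplay of enqueue() and dequeue() can be sketched with a simplified user-space FIFO. This is not kernel code: the types toy_pkt and toy_fifo, the limit TOY_LIMIT, and the return codes TOY_XMIT_SUCCESS and TOY_XMIT_DROP are assumptions made for this illustration and only mimic the behavior described above.

```c
#include <stddef.h>

#define TOY_LIMIT 4               /* assumed maximum queue length */

enum toy_xmit { TOY_XMIT_SUCCESS = 0, TOY_XMIT_DROP = 1 };

struct toy_pkt {
    int id;
    struct toy_pkt *next;
};

struct toy_fifo {
    struct toy_pkt *head, *tail;
    unsigned int qlen;            /* packets currently waiting */
};

/* Accept the packet if there is room; otherwise report a drop. */
static int toy_enqueue(struct toy_pkt *pkt, struct toy_fifo *q)
{
    if (q->qlen >= TOY_LIMIT)
        return TOY_XMIT_DROP;     /* caller keeps ownership in this sketch */
    pkt->next = NULL;
    if (q->tail)
        q->tail->next = pkt;
    else
        q->head = pkt;
    q->tail = pkt;
    q->qlen++;
    return TOY_XMIT_SUCCESS;
}

/* Return the next packet to send, or NULL if none is ready. */
static struct toy_pkt *toy_dequeue(struct toy_fifo *q)
{
    struct toy_pkt *pkt = q->head;

    if (!pkt)
        return NULL;
    q->head = pkt->next;
    if (!q->head)
        q->tail = NULL;
    pkt->next = NULL;
    q->qlen--;
    return pkt;
}
```

In this sketch a NULL result from toy_dequeue() always means an empty queue; a real queuing discipline that shapes traffic may return NULL while qlen is still greater than zero.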

requeue()

include/net/pkt_sched.h

The requeue() function puts a previously removed packet back into the queue. In contrast to enqueue(), however, the packet should be placed at the position in the queuing discipline where it had been before, and the counter of packets running through this queuing discipline should not be increased. This function is intended for cases where a packet was removed by dequeue() to send it, but eventually couldn't be sent for an unexpected reason.
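The difference from enqueue() can be sketched as follows for a simple list-based queue. The types toy_pkt and toy_queue and the counter name packets are hypothetical; the sketch only assumes that dequeue() takes packets from the head of the list.

```c
#include <stddef.h>

struct toy_pkt {
    int id;
    struct toy_pkt *next;
};

struct toy_queue {
    struct toy_pkt *head, *tail;
    unsigned int qlen;       /* packets currently waiting */
    unsigned long packets;   /* packets ever accepted by enqueue */
};

/* Put a packet back at the head, where dequeue took it from. */
static int toy_requeue(struct toy_pkt *pkt, struct toy_queue *q)
{
    pkt->next = q->head;
    q->head = pkt;
    if (!q->tail)
        q->tail = pkt;
    q->qlen++;               /* the packet is waiting again ... */
    /* ... but q->packets is left alone: it was counted on enqueue. */
    return 0;
}
```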

drop()

include/net/pkt_sched.h

This function removes a packet from the queue and discards it.

reset()

include/net/pkt_sched.h

The reset() function sets a queuing discipline back into the initial state (empty queues, reset counters, delete timers, etc.). If this queuing discipline manages other queuing disciplines, then their reset() functions will also be invoked.

init()

include/net/pkt_sched.h

The init() function is used to initialize a new, instantiated queuing discipline.

destroy()

include/net/pkt_sched.h

The destroy() function frees the resources that had been reserved during the initialization and runtime of the queuing discipline.

change()

include/net/pkt_sched.h

The change() function can be used to change parameters of a queuing discipline.

dump()

include/net/pkt_sched.h

The dump() function serves to output configuration parameters and statistics of a queuing discipline.

The central structure of each queuing discipline, which is referred to by all functions introduced so far, is the structure struct Qdisc (include/net/pkt_sched.h), printed as follows:

struct Qdisc
{
        int                     (*enqueue) (struct sk_buff *skb, struct Qdisc *dev);
        struct sk_buff *        (*dequeue) (struct Qdisc *dev);
        unsigned                flags;
#define TCQ_F_BUILTIN   1
#define TCQ_F_THROTTLED 2
#define TCQ_F_INGRES    4
        struct Qdisc_ops        *ops;
        struct Qdisc            *next;
        u32                     handle;
        atomic_t                refcnt;
        struct sk_buff_head     q;
        struct net_device       *dev;

        struct tc_stats         stats;
        int                     (*reshape_fail) (struct sk_buff *skb, struct Qdisc *q);

        /* This field is deprecated, but it is still used by CBQ
         * and it will live until better solution will be invented.
         */
        struct Qdisc            *__parent;

        char                    data[0];
};

In addition to a reference to the Qdisc_ops structure, there is a pointer to link Qdisc structures and a handle for unique marking of an instance of the structure within the kernel. For a simple queuing discipline with only one queue, the entry struct sk_buff_head q; represents the header of this queue. Each queuing discipline is always allocated to exactly one network device, which is referred to by struct net_device *dev.

The function reshape_fail() can be used to implement more complex traffic-shaping mechanisms. When an outer queue passes a packet to an inner queue, it can happen that the packet has to be discarded, for example when there is no buffer space available. If the outer queuing discipline implements the callback function reshape_fail(), then it can be invoked by the inner queuing discipline in this case. Subsequently, the outer queuing discipline can select a different class.
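The callback mechanism can be sketched in user space as shown below. All names (toy_qdisc, inner_enqueue(), outer_reshape_fail(), fallback_hits) are hypothetical, and the sketch assumes that the outer queuing discipline stores its callback in the inner discipline's structure and that a return value of zero means the outer discipline took over the packet.

```c
#include <stddef.h>

struct toy_pkt { int len; };

struct toy_qdisc {
    unsigned int qlen, limit;
    unsigned int fallback_hits;   /* times the outer qdisc rerouted a packet */
    /* Optional callback of the outer qdisc; returns 0 if it took the packet. */
    int (*reshape_fail)(struct toy_pkt *pkt, struct toy_qdisc *inner);
    struct toy_qdisc *parent;
};

/* Outer qdisc: pretend to move the packet to a different class. */
static int outer_reshape_fail(struct toy_pkt *pkt, struct toy_qdisc *inner)
{
    (void)pkt;
    inner->parent->fallback_hits++;
    return 0;
}

/* Inner qdisc: on overflow, give the outer qdisc a chance before dropping. */
static int inner_enqueue(struct toy_pkt *pkt, struct toy_qdisc *q)
{
    if (q->qlen < q->limit) {
        q->qlen++;
        return 0;                 /* accepted */
    }
    if (q->reshape_fail && q->reshape_fail(pkt, q) == 0)
        return 0;                 /* outer qdisc handled it */
    return 1;                     /* dropped */
}
```

If no callback is installed, the inner discipline in this sketch simply reports the drop to its caller.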

The structure struct tc_stats (include/linux/pkt_sched.h) contained in struct Qdisc serves to carry along statistics (in addition to the q.qlen entry described earlier for the number of queued packets). The following counters exist in the structure tc_stats:

        __u64   bytes;          /* Number of enqueued bytes */
        __u32   packets;        /* Number of enqueued packets */
        __u32   drops;          /* Packets dropped because of lack of resources */
        __u32   overlimits;     /* Number of throttle events when this
                                 * flow goes out of allocated bandwidth */
        __u32   bps;            /* Current flow byte rate */
        __u32   pps;            /* Current flow packet rate */
        __u32   qlen;
        __u32   backlog;

These statistics can be inaccurate if a queuing discipline manages additional inner queuing disciplines. This is the case, for example, when a packet was dropped in an inner queuing discipline, because the number of enqueued bytes can then deviate from the real value. If a queuing discipline has several classes, then separate statistics can be maintained for each class.

A queuing discipline can be added in either of the following two ways:

pktsched_init()

net/sched/sch_api.c

This function is used when a queuing discipline is compiled statically into the kernel. In this case, the RT-NETLINK interface, which will be introduced later, is initialized, and the function register_qdisc() is invoked. Unless additional queuing disciplines were selected when the kernel was configured, only the bfifo and pfifo queuing disciplines (defined in net/sched/sch_fifo.c) are registered here.

register_qdisc()

net/sched/sch_api.c

This function is invoked either by the function pktsched_init() described above or by init_module(), if the queuing discipline is to be included as a module. Initially, this function checks whether a queuing discipline with the same identification (the id[IFNAMSIZ] entry of the Qdisc_ops structure) already exists. If this is not the case, then the new queuing discipline is appended to the end of the list, and its functions are assigned.
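The duplicate check and the append step can be sketched in user space as follows. The names toy_ops, toy_ops_base, and toy_register() as well as the return value -1 for a duplicate are assumptions for this illustration; locking and the assignment of the function pointers are left out.

```c
#include <stddef.h>
#include <string.h>

#define TOY_NAMSIZ 16

struct toy_ops {
    struct toy_ops *next;
    char id[TOY_NAMSIZ];
};

static struct toy_ops *toy_ops_base;   /* head of the registration list */

/* Returns 0 on success, -1 if an entry with the same id is registered. */
static int toy_register(struct toy_ops *ops)
{
    struct toy_ops **pp;

    /* Walk to the end of the list, rejecting duplicates on the way. */
    for (pp = &toy_ops_base; *pp != NULL; pp = &(*pp)->next)
        if (strcmp((*pp)->id, ops->id) == 0)
            return -1;

    ops->next = NULL;
    *pp = ops;                         /* append at the end of the list */
    return 0;
}
```

Using a pointer to the link field (pp) lets the same loop find duplicates and the insertion point, so the empty list needs no special case.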

18.4.3 Classes

Classes can be thought of as logically independent elements, but they relate closely to queuing disciplines as far as the implementation is concerned. Rather than independent files that implement classes, classes are always offered by queuing disciplines. In addition, notice that the classification (i.e., allocation of packets to a class) is handled by the filters described later (packet classifiers), which are logically separate from classes.

As with queuing disciplines, unique identifiers are used to address a class within the kernel. However, there are two identifying options for classes: The classid of type u32 serves primarily to identify a class by the user and the configuration tools in the user space; this option will be discussed in Section 18.7. In addition, there is an internal identification of the type unsigned long, which can be used for general identification of a class within the kernel. In this case, various classids can be mapped from the user space onto an internal identification, if other filter information plays a role (e.g., specific fields of the skb structure).

Queuing disciplines that supply classes offer various functions, including functions to bind queues to classes and functions to change or dump a class configuration. The functions introduced below are defined in the sch_* files and exported over the structure Qdisc_class_ops (include/net/pkt_sched.h), which Qdisc_ops references through its cl_ops entry (except for the qdisc_graft() function, which builds on top of the former):

graft()

include/net/pkt_sched.h

The graft() function serves to bind a queuing discipline to a class. The return value is the queuing discipline that was previously bound to that class.

get()

include/net/pkt_sched.h

The get() function maps the classid to the internal identification; this is its return value. If a usage counter exists within the class, then get() increments this counter by one.

put()

include/net/pkt_sched.h

In contrast to get(), the put() function decrements the usage counter. If this causes the usage counter to reach zero, then put() can remove the class.

qdisc_graft()

net/sched/sch_api.c

This function is used in all cases where a new queuing discipline should be attached to the traffic-control tree. It initially checks whether there is a parent or whether the queuing discipline itself should form the root of the traffic-control tree. In the latter case, the function dev_graft_qdisc() from net/sched/sch_api.c is invoked. If a parent is present, then the get() function is invoked first to map the classid to the internal identification. Subsequently, the graft() function is invoked to bind the new queuing discipline to the class. Finally, put() is invoked to decrement the reference counter of the class again.
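The sequence get(), graft(), put() for the case with a parent can be sketched in user space as follows. The class table, the helper names (toy_get(), toy_graft(), toy_put(), toy_attach()), and the choice of a table index plus one as internal identification are assumptions for this illustration; a real queuing discipline is free to use, for example, a pointer value instead.

```c
#include <stddef.h>
#include <stdint.h>

struct toy_qdisc { uint32_t handle; };

struct toy_class {
    uint32_t classid;
    int refcnt;
    struct toy_qdisc *leaf;
};

#define TOY_NCLASSES 2
static struct toy_class toy_classes[TOY_NCLASSES] = {
    { 0x00010001U, 0, NULL },
    { 0x00010002U, 0, NULL },
};

/* get(): map classid to an internal identification (0 means "not found"). */
static unsigned long toy_get(uint32_t classid)
{
    for (int i = 0; i < TOY_NCLASSES; i++) {
        if (toy_classes[i].classid == classid) {
            toy_classes[i].refcnt++;
            return (unsigned long)(i + 1);
        }
    }
    return 0;
}

/* put(): drop the reference taken by get(). */
static void toy_put(unsigned long cl)
{
    toy_classes[cl - 1].refcnt--;
}

/* graft(): bind a new qdisc to the class and hand back the old one. */
static struct toy_qdisc *toy_graft(unsigned long cl, struct toy_qdisc *new_q)
{
    struct toy_qdisc *old = toy_classes[cl - 1].leaf;

    toy_classes[cl - 1].leaf = new_q;
    return old;
}

/* The get/graft/put sequence; returns 0 on success, -1 if no such class. */
static int toy_attach(uint32_t classid, struct toy_qdisc *new_q,
                      struct toy_qdisc **old_q)
{
    unsigned long cl = toy_get(classid);

    if (cl == 0)
        return -1;
    *old_q = toy_graft(cl, new_q);
    toy_put(cl);
    return 0;
}
```

The reference taken by toy_get() only protects the class while the new queuing discipline is being bound, which is why the counter is back at its previous value after toy_attach() returns.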

leaf()

include/net/pkt_sched.h

This function returns a pointer to the queuing discipline currently bound to that class.

change()

include/net/pkt_sched.h

The change() function is used to change class parameters or create new classes, provided that the queuing discipline allows this.

delete()

include/net/pkt_sched.h

This function checks whether the class is still referenced, and it deletes the class if this is no longer the case.

walk()

include/net/pkt_sched.h

This function walks through the linked list of all the classes of a queuing discipline and, if it is implemented, invokes a callback function to fetch configuration data and statistical parameters.
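A walk with a callback can be sketched in user space as follows. The structures toy_class and toy_walker, their fields, and the example callback toy_sum_minor() are hypothetical and only illustrate the pattern of passing a callback plus bookkeeping state through the list traversal.

```c
#include <stddef.h>
#include <stdint.h>

struct toy_class {
    uint32_t classid;
    struct toy_class *next;
};

/* Walker state: callback plus bookkeeping, passed through the walk. */
struct toy_walker {
    int stop;                 /* set when the callback asks to end the walk */
    int count;                /* classes visited so far */
    int (*fn)(struct toy_class *cl, struct toy_walker *w);
};

static void toy_walk(struct toy_class *list, struct toy_walker *w)
{
    if (w->fn == NULL)
        return;               /* nothing to do without a callback */
    for (struct toy_class *cl = list; cl != NULL && !w->stop; cl = cl->next) {
        if (w->fn(cl, w) < 0)
            w->stop = 1;
        w->count++;
    }
}

/* Example callback: sum up the minor numbers of all classes. */
static uint32_t toy_minor_sum;

static int toy_sum_minor(struct toy_class *cl, struct toy_walker *w)
{
    (void)w;
    toy_minor_sum += cl->classid & 0xFFFFU;
    return 0;
}
```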

tcf_chain()

include/net/pkt_sched.h

Figure 18-2 shows that each class is bound to at least one filter. The function tcf_chain() returns a pointer to the beginning of the linked list of filters bound to that class.

bind_tcf()

include/net/pkt_sched.h

This function tells the queuing discipline that a filter is going to be bound to the class. This means that the function is similar to the get() function, but can be used in some cases where we have to run additional checks.

unbind_tcf()

include/net/pkt_sched.h

This function is the counterpart of the previous function, bind_tcf(), which means that it represents an extension of the put() function.

dump_class()

include/net/pkt_sched.h

Like the dump() function for queuing disciplines, the function dump_class() serves to output configuration parameters and statistical data for a class.

18.4.4 Filters

Filters decide which class a packet passed to the enqueue() function of a queuing discipline belongs to.

To make this decision, a filter uses the classify() function. This function and other filter functions, which will be described below, are exported over the tcf_proto_ops (include/net/pkt_cls.h) structure:

classify()

include/net/pkt_cls.h

This function classifies a packet (i.e., the filter checks whether there is a filtering rule that could be applied to the packet). The following return values are possible (see include/linux/pkt_cls.h):

  • TC_POLICE_OK: The packet was accepted by the filter.

  • TC_POLICE_RECLASSIFY: The packet violates agreed parameters (e.g., a maximum rate) and should be allocated to a different class. However, the packet is not dropped yet, to enable the queuing discipline to transport the packet over a different class.

  • TC_POLICE_SHOT: The packet was accepted by the filter, but the filter dropped it, because it violated agreed parameters.

  • TC_POLICE_UNSPEC: The rule applied by the filter doesn't match the packet, and it should be passed to the next filter or filter element.

In addition, the classify() function in the structure tcf_result (include/net/pkt_cls.h) returns the classid and, if present, the internal identification of the pertaining class. The internal identification can then simply be made available, if a separate instance of the filter exists for each class. If the internal identification is not written to the result structure, then the classid has to be mapped to the internal identification in the queuing discipline (normally by use of a linear search). In some cases, the filter can be informed about the internal identification while binding to a class, so that no mapping cost occurs.
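How a queuing discipline might use such a result can be sketched in user space as follows. The structures toy_result and toy_class and the function toy_resolve() are hypothetical, and the sketch assumes that the internal identification is a table index plus one, with zero meaning that the filter did not supply it.

```c
#include <stddef.h>
#include <stdint.h>

struct toy_result {
    unsigned long cl;         /* internal identification, 0 if not provided */
    uint32_t classid;         /* class identifier as seen from user space */
};

struct toy_class {
    uint32_t classid;
    unsigned long packets;
};

/* Resolve a classification result to a class of this qdisc. */
static struct toy_class *toy_resolve(struct toy_class *classes, size_t n,
                                     const struct toy_result *res)
{
    if (res->cl != 0 && res->cl <= n)
        return &classes[res->cl - 1];           /* fast path: no lookup */

    for (size_t i = 0; i < n; i++)              /* slow path: linear search */
        if (classes[i].classid == res->classid)
            return &classes[i];
    return NULL;
}
```

The fast path shows why it pays off when the filter already knows the internal identification: the linear search over all classes is skipped for every packet.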

init()

include/net/pkt_cls.h

This function initializes a filter.

destroy()

include/net/pkt_cls.h

The destroy() function removes a filter. To remove bindings to a class, it will have to invoke unbind_tcf().

get()

include/net/pkt_cls.h

Again, the get() function is used to map identifiers; in this case, it maps a handle of a filter element to an internal filter identification.

put()

include/net/pkt_cls.h

The put() function is invoked to unreference a filter.

change()

include/net/pkt_cls.h

This function serves to configure a new filter or change the configuration of an existing filter. bind_tcf() is used to bind new filters to classes.

delete()

include/net/pkt_cls.h

In contrast to the destroy() function, this function is used to delete one single element of a filter. The difference between a filter and a filter element will be discussed later.

walk()

include/net/pkt_cls.h

As with classes, the walk() function walks through all elements and invokes callback functions to get configuration data and statistical parameters.

dump()

include/net/pkt_cls.h

The dump() function serves to output configuration parameters and statistical data of a filter or filter elements.

When a packet is passed to a queuing discipline with several classes, the queuing discipline invokes the function tc_classify() from include/net/pkt_cls.h. This function checks whether the filter accepts the protocol specified in skb->protocol and then invokes the filter's classify() function. The return values are identical to those of the classify() function.

The central structure of filters within Linux traffic control is struct tcf_proto in include/net/pkt_cls.h. The entry struct tcf_proto *next can be used to link several filters to a list. In addition, there are entries for the accepted protocol, for the classid of the appropriate class, and for a priority. The priority can be used to order filters that can be applied to the same protocol. For this purpose, the filters are walked through from small prio values towards larger values, and a packet is handled by the first filter whose rules match.
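The walk through a filter list ordered by priority can be sketched in user space as follows. The structures, the return codes TOY_OK and TOY_UNSPEC, the use of protocol value 0 for "any protocol", and the two example rules are assumptions for this illustration and do not reproduce the kernel's tc_classify().

```c
#include <stddef.h>
#include <stdint.h>

enum { TOY_OK = 0, TOY_UNSPEC = -1 };

struct toy_pkt { uint16_t protocol; uint16_t dport; };

struct toy_filter {
    struct toy_filter *next;  /* list is kept sorted by ascending prio */
    uint16_t protocol;        /* 0 is used here to mean "any protocol" */
    uint32_t prio;
    uint32_t classid;
    int (*classify)(const struct toy_pkt *pkt, const struct toy_filter *f);
};

/* Walk the chain; the first filter that matches decides the class. */
static int toy_classify(const struct toy_pkt *pkt,
                        const struct toy_filter *chain, uint32_t *classid)
{
    for (const struct toy_filter *f = chain; f != NULL; f = f->next) {
        if (f->protocol != 0 && f->protocol != pkt->protocol)
            continue;                      /* filter not responsible */
        int ret = f->classify(pkt, f);
        if (ret != TOY_UNSPEC) {
            *classid = f->classid;
            return ret;
        }
    }
    return TOY_UNSPEC;
}

/* Example rule: match packets sent to destination port 80. */
static int toy_match_port80(const struct toy_pkt *pkt,
                            const struct toy_filter *f)
{
    (void)f;
    return pkt->dport == 80 ? TOY_OK : TOY_UNSPEC;
}

/* Example rule: match every packet. */
static int toy_match_all(const struct toy_pkt *pkt,
                         const struct toy_filter *f)
{
    (void)pkt;
    (void)f;
    return TOY_OK;
}
```

Because the list is sorted by priority, a catch-all rule with a larger prio value placed at the end acts as a default class for packets that no earlier filter matched.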

In addition, a filter can be split internally into filter elements, and handles of the type u32 are allocated to these internal elements. How filters are split and managed (i.e., in linear lists or in more efficient data structures such as hash tables) depends on the implementation.

As with queuing disciplines, there are two functions available to add new filters. The function tc_filter_init() (net/sched/cls_api.c) is used when a filter is compiled statically into the kernel. The function register_tcf_proto_ops() (net/sched/cls_api.c) is invoked from within tc_filter_init() and also when a filter is embedded as a module. It initially checks whether a filter of the same type (kind element in the tcf_proto_ops structure) already exists. If this is not the case, then the new filter is appended to the end of the filter list, and its functions are assigned.


       
