From 48816322ad4d9ce195aaddd10f0ce98c944af193 Mon Sep 17 00:00:00 2001 From: Gerrit Renker Date: Sat, 23 Aug 2008 13:28:27 +0200 Subject: dccp: Empty the write queue when disconnecting dccp_disconnect() can be called due to several reasons: 1. when the connection setup failed (inet_stream_connect()); 2. when shutting down (inet_shutdown(), inet_csk_listen_stop()); 3. when aborting the connection (dccp_close() with 0 linger time). In case (1) the write queue is empty. This patch empties the write queue, if in case (2) or (3) it was not yet empty. This avoids triggering the write-queue BUG_TRAP in sk_stream_kill_queues() later on. It also seems natural to do: when breaking an association, to delete all packets that were originally intended for the soon-disconnected end (compare with call to tcp_write_queue_purge in tcp_disconnect()). Signed-off-by: Gerrit Renker --- net/dccp/proto.c | 2 ++ 1 file changed, 2 insertions(+) (limited to 'net/dccp/proto.c') diff --git a/net/dccp/proto.c b/net/dccp/proto.c index 1ca3b26eed0f..ae66473b11fa 100644 --- a/net/dccp/proto.c +++ b/net/dccp/proto.c @@ -309,7 +309,9 @@ int dccp_disconnect(struct sock *sk, int flags) sk->sk_err = ECONNRESET; dccp_clear_xmit_timers(sk); + __skb_queue_purge(&sk->sk_receive_queue); + __skb_queue_purge(&sk->sk_write_queue); if (sk->sk_send_head != NULL) { __kfree_skb(sk->sk_send_head); sk->sk_send_head = NULL; -- cgit v1.2.2 From 432649916b0435b608fb3e1fcb97347ac294d38d Mon Sep 17 00:00:00 2001 From: Gerrit Renker Date: Sat, 23 Aug 2008 13:28:27 +0200 Subject: dccp: Toggle debug output without module unloading This sets the sysfs permissions so that root can toggle the `debug' parameter available for nearly every DCCP module. This is useful since there are various module inter-dependencies. The debug flag can now be toggled at runtime using echo 1 > /sys/module/dccp/parameters/dccp_debug echo 1 > /sys/module/dccp_ccid2/parameters/ccid2_debug echo 1 > /sys/module/dccp_ccid3/parameters/ccid3_debug echo 1 > /sys/module/dccp_tfrc_lib/parameters/tfrc_debug The last is not very useful yet, since no code at the moment calls the tfrc_debug() macro. Signed-off-by: Gerrit Renker --- net/dccp/proto.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) (limited to 'net/dccp/proto.c') diff --git a/net/dccp/proto.c b/net/dccp/proto.c index ae66473b11fa..d0bd34819761 100644 --- a/net/dccp/proto.c +++ b/net/dccp/proto.c @@ -1030,7 +1030,7 @@ MODULE_PARM_DESC(thash_entries, "Number of ehash buckets"); #ifdef CONFIG_IP_DCCP_DEBUG int dccp_debug; -module_param(dccp_debug, bool, 0444); +module_param(dccp_debug, bool, 0644); MODULE_PARM_DESC(dccp_debug, "Enable debug messages"); EXPORT_SYMBOL_GPL(dccp_debug); -- cgit v1.2.2 From 828755cee087e4a34f45d6c9db661ccd0631cc6d Mon Sep 17 00:00:00 2001 From: Gerrit Renker Date: Thu, 4 Sep 2008 07:30:19 +0200 Subject: dccp: Per-socket initialisation of feature negotiation This provides feature-negotiation initialisation for both DCCP sockets and DCCP request_sockets, to support feature negotiation during connection setup. It also resolves a FIXME regarding the congestion control initialisation. Thanks to Wei Yongjun for help with the IPv6 side of this patch. Signed-off-by: Gerrit Renker Acked-by: Ian McDonald --- net/dccp/proto.c | 1 + 1 file changed, 1 insertion(+) (limited to 'net/dccp/proto.c') diff --git a/net/dccp/proto.c b/net/dccp/proto.c index d0bd34819761..1cdf4ae99605 100644 --- a/net/dccp/proto.c +++ b/net/dccp/proto.c @@ -193,6 +193,7 @@ int dccp_init_sock(struct sock *sk, const __u8 ctl_sock_initialized) dccp_init_xmit_timers(sk); + INIT_LIST_HEAD(&dp->dccps_featneg); /* * FIXME: We're hardcoding the CCID, and doing this at this point makes * the listening (master) sock get CCID control blocks, which is not -- cgit v1.2.2 From 702083839b607f390dbed5d2304eb8fc5f4c85ac Mon Sep 17 00:00:00 2001 From: Gerrit Renker Date: Thu, 4 Sep 2008 07:30:19 +0200 Subject: dccp: Cleanup routines for feature negotiation This inserts the required de-allocation routines for memory allocated by feature negotiation in the socket destructors, replacing dccp_feat_clean() in one instance. Signed-off-by: Gerrit Renker Acked-by: Ian McDonald --- net/dccp/proto.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) (limited to 'net/dccp/proto.c') diff --git a/net/dccp/proto.c b/net/dccp/proto.c index 1cdf4ae99605..dafcefd86594 100644 --- a/net/dccp/proto.c +++ b/net/dccp/proto.c @@ -268,7 +268,7 @@ void dccp_destroy_sock(struct sock *sk) dp->dccps_hc_rx_ccid = dp->dccps_hc_tx_ccid = NULL; /* clean up feature negotiation state */ - dccp_feat_clean(dmsk); + dccp_feat_list_purge(&dp->dccps_featneg); } EXPORT_SYMBOL_GPL(dccp_destroy_sock); -- cgit v1.2.2 From 86349c8d9c6892b57aff4549256ab1aa65aed0f0 Mon Sep 17 00:00:00 2001 From: Gerrit Renker Date: Thu, 4 Sep 2008 07:30:19 +0200 Subject: dccp: Registration routines for changing feature values Two registration routines, for SP and NN features, are provided by this patch, replacing a previous routine which was used for both feature types. These are internal-only routines and therefore start with `__feat_register'. It further exports the known limits of Sequence Window and Ack Ratio as symbolic constants. Signed-off-by: Gerrit Renker Acked-by: Ian McDonald --- net/dccp/proto.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) (limited to 'net/dccp/proto.c') diff --git a/net/dccp/proto.c b/net/dccp/proto.c index dafcefd86594..01332fe7a99a 100644 --- a/net/dccp/proto.c +++ b/net/dccp/proto.c @@ -202,7 +202,7 @@ int dccp_init_sock(struct sock *sk, const __u8 ctl_sock_initialized) * setsockopt(CCIDs-I-want/accept). -acme */ if (likely(ctl_sock_initialized)) { - int rc = dccp_feat_init(dmsk); + int rc = dccp_feat_init(sk); if (rc) return rc; -- cgit v1.2.2 From 71bb49596bbf4e5a3328e1704d18604e822ba181 Mon Sep 17 00:00:00 2001 From: Gerrit Renker Date: Thu, 4 Sep 2008 07:30:19 +0200 Subject: dccp: Query supported CCIDs This provides a data structure to record which CCIDs are locally supported and three accessor functions: - a test function for internal use which is used to validate CCID requests made by the user; - a copy function so that the list can be used for feature-negotiation; - documented getsockopt() support so that the user can query capabilities. The data structure is a table which is filled in at compile-time with the list of available CCIDs (which in turn depends on the Kconfig choices). Using the copy function for cloning the list of supported CCIDs is useful for feature negotiation, since the negotiation is now with the full list of available CCIDs (e.g. {2, 3}) instead of the default value {2}. This means negotiation will not fail if the peer requests to use CCID3 instead of CCID2. Signed-off-by: Gerrit Renker Acked-by: Ian McDonald --- net/dccp/proto.c | 2 ++ 1 file changed, 2 insertions(+) (limited to 'net/dccp/proto.c') diff --git a/net/dccp/proto.c b/net/dccp/proto.c index 01332fe7a99a..b4b10cbd8880 100644 --- a/net/dccp/proto.c +++ b/net/dccp/proto.c @@ -649,6 +649,8 @@ static int do_dccp_getsockopt(struct sock *sk, int level, int optname, case DCCP_SOCKOPT_GET_CUR_MPS: val = dp->dccps_mss_cache; break; + case DCCP_SOCKOPT_AVAILABLE_CCIDS: + return ccid_getsockopt_builtin_ccids(sk, len, optval, optlen); case DCCP_SOCKOPT_SERVER_TIMEWAIT: val = dp->dccps_server_timewait; break; -- cgit v1.2.2 From 093e1f46cf162913d05e1d4eeb01baa3e297b683 Mon Sep 17 00:00:00 2001 From: Gerrit Renker Date: Thu, 4 Sep 2008 07:30:19 +0200 Subject: dccp: Resolve dependencies of features on choice of CCID This provides a missing link in the code chain, as several features implicitly depend and/or rely on the choice of CCID. Most notably, this is the Send Ack Vector feature, but also Ack Ratio and Send Loss Event Rate (also taken care of). For Send Ack Vector, the situation is as follows: * since CCID2 mandates the use of Ack Vectors, there is no point in allowing endpoints which use CCID2 to disable Ack Vector features such a connection; * a peer with a TX CCID of CCID2 will always expect Ack Vectors, and a peer with a RX CCID of CCID2 must always send Ack Vectors (RFC 4341, sec. 4); * for all other CCIDs, the use of (Send) Ack Vector is optional and thus negotiable. However, this implies that the code negotiating the use of Ack Vectors also supports it (i.e. is able to supply and to either parse or ignore received Ack Vectors). Since this is not the case (CCID-3 has no Ack Vector support), the use of Ack Vectors is here disabled, with a comment in the source code. An analogous consideration arises for the Send Loss Event Rate feature, since the CCID-3 implementation does not support the loss interval options of RFC 4342. To make such use explicit, corresponding feature-negotiation options are inserted which signal the use of the loss event rate option, as it is used by the CCID3 code. Lastly, the values of the Ack Ratio feature are matched to the choice of CCID. The patch implements this as a function which is called after the user has made all other registrations for changing default values of features. The table is variable-length, the reserved (and hence for feature-negotiation invalid, confirmed by considering section 19.4 of RFC 4340) feature number `0' is used to mark the end of the table. Signed-off-by: Gerrit Renker Acked-by: Ian McDonald --- net/dccp/proto.c | 3 +++ 1 file changed, 3 insertions(+) (limited to 'net/dccp/proto.c') diff --git a/net/dccp/proto.c b/net/dccp/proto.c index b4b10cbd8880..46cb3490d48e 100644 --- a/net/dccp/proto.c +++ b/net/dccp/proto.c @@ -278,6 +278,9 @@ static inline int dccp_listen_start(struct sock *sk, int backlog) struct dccp_sock *dp = dccp_sk(sk); dp->dccps_role = DCCP_ROLE_LISTEN; + /* do not start to listen if feature negotiation setup fails */ + if (dccp_feat_finalise_settings(dp)) + return -EPROTO; return inet_csk_listen_start(sk, backlog); } -- cgit v1.2.2 From 668144f7b41716a9efe1b398e15ead32a26cd101 Mon Sep 17 00:00:00 2001 From: Gerrit Renker Date: Thu, 4 Sep 2008 07:30:19 +0200 Subject: dccp: Deprecate old setsockopt framework The previous setsockopt interface, which passed socket options via struct dccp_so_feat, is complicated/difficult to use. Continuing to support it leads to ugly code since the old approach did not distinguish between NN and SP values. This patch removes the old setsockopt interface and replaces it with two new functions to register NN/SP values for feature negotiation. These are essentially wrappers around the internal __feat_register functions, with checking added to avoid * wrong usage (type); * changing values while the connection is in progress. Signed-off-by: Gerrit Renker --- net/dccp/proto.c | 53 ++--------------------------------------------------- 1 file changed, 2 insertions(+), 51 deletions(-) (limited to 'net/dccp/proto.c') diff --git a/net/dccp/proto.c b/net/dccp/proto.c index 46cb3490d48e..108d56bd25c5 100644 --- a/net/dccp/proto.c +++ b/net/dccp/proto.c @@ -470,44 +470,6 @@ static int dccp_setsockopt_service(struct sock *sk, const __be32 service, return 0; } -/* byte 1 is feature. the rest is the preference list */ -static int dccp_setsockopt_change(struct sock *sk, int type, - struct dccp_so_feat __user *optval) -{ - struct dccp_so_feat opt; - u8 *val; - int rc; - - if (copy_from_user(&opt, optval, sizeof(opt))) - return -EFAULT; - /* - * rfc4340: 6.1. Change Options - */ - if (opt.dccpsf_len < 1) - return -EINVAL; - - val = kmalloc(opt.dccpsf_len, GFP_KERNEL); - if (!val) - return -ENOMEM; - - if (copy_from_user(val, opt.dccpsf_val, opt.dccpsf_len)) { - rc = -EFAULT; - goto out_free_val; - } - - rc = dccp_feat_change(dccp_msk(sk), type, opt.dccpsf_feat, - val, opt.dccpsf_len, GFP_KERNEL); - if (rc) - goto out_free_val; - -out: - return rc; - -out_free_val: - kfree(val); - goto out; -} - static int do_dccp_setsockopt(struct sock *sk, int level, int optname, char __user *optval, int optlen) { @@ -530,20 +492,9 @@ static int do_dccp_setsockopt(struct sock *sk, int level, int optname, err = 0; break; case DCCP_SOCKOPT_CHANGE_L: - if (optlen != sizeof(struct dccp_so_feat)) - err = -EINVAL; - else - err = dccp_setsockopt_change(sk, DCCPO_CHANGE_L, - (struct dccp_so_feat __user *) - optval); - break; case DCCP_SOCKOPT_CHANGE_R: - if (optlen != sizeof(struct dccp_so_feat)) - err = -EINVAL; - else - err = dccp_setsockopt_change(sk, DCCPO_CHANGE_R, - (struct dccp_so_feat __user *) - optval); + DCCP_WARN("sockopt(CHANGE_L/R) is deprecated: fix your app\n"); + err = 0; break; case DCCP_SOCKOPT_SERVER_TIMEWAIT: if (dp->dccps_role != DCCP_ROLE_SERVER) -- cgit v1.2.2 From 20f41eee82864e308a5499308a1722dc3181cc3a Mon Sep 17 00:00:00 2001 From: Gerrit Renker Date: Thu, 4 Sep 2008 07:30:19 +0200 Subject: dccp: Feature negotiation for minimum-checksum-coverage This provides feature negotiation for server minimum checksum coverage which so far has been missing. Since sender/receiver coverage values range only from 0...15, their type has also been reduced in size from u16 to u4. Feature-negotiation options are now generated for both sender and receiver coverage, i.e. when the peer has `forgotten' to enable partial coverage then feature negotiation will automatically enable (negotiate) the partial coverage value for this connection. Signed-off-by: Gerrit Renker Acked-by: Ian McDonald --- net/dccp/proto.c | 53 ++++++++++++++++++++++++++++++++++++++++------------- 1 file changed, 40 insertions(+), 13 deletions(-) (limited to 'net/dccp/proto.c') diff --git a/net/dccp/proto.c b/net/dccp/proto.c index 108d56bd25c5..47b137a3feaf 100644 --- a/net/dccp/proto.c +++ b/net/dccp/proto.c @@ -470,6 +470,42 @@ static int dccp_setsockopt_service(struct sock *sk, const __be32 service, return 0; } +static int dccp_setsockopt_cscov(struct sock *sk, int cscov, bool rx) +{ + u8 *list, len; + int i, rc; + + if (cscov < 0 || cscov > 15) + return -EINVAL; + /* + * Populate a list of permissible values, in the range cscov...15. This + * is necessary since feature negotiation of single values only works if + * both sides incidentally choose the same value. Since the list starts + * lowest-value first, negotiation will pick the smallest shared value. + */ + if (cscov == 0) + return 0; + len = 16 - cscov; + + list = kmalloc(len, GFP_KERNEL); + if (list == NULL) + return -ENOBUFS; + + for (i = 0; i < len; i++) + list[i] = cscov++; + + rc = dccp_feat_register_sp(sk, DCCPF_MIN_CSUM_COVER, rx, list, len); + + if (rc == 0) { + if (rx) + dccp_sk(sk)->dccps_pcrlen = cscov; + else + dccp_sk(sk)->dccps_pcslen = cscov; + } + kfree(list); + return rc; +} + static int do_dccp_setsockopt(struct sock *sk, int level, int optname, char __user *optval, int optlen) { @@ -502,20 +538,11 @@ static int do_dccp_setsockopt(struct sock *sk, int level, int optname, else dp->dccps_server_timewait = (val != 0); break; - case DCCP_SOCKOPT_SEND_CSCOV: /* sender side, RFC 4340, sec. 9.2 */ - if (val < 0 || val > 15) - err = -EINVAL; - else - dp->dccps_pcslen = val; + case DCCP_SOCKOPT_SEND_CSCOV: + err = dccp_setsockopt_cscov(sk, val, false); break; - case DCCP_SOCKOPT_RECV_CSCOV: /* receiver side, RFC 4340 sec. 9.2.1 */ - if (val < 0 || val > 15) - err = -EINVAL; - else { - dp->dccps_pcrlen = val; - /* FIXME: add feature negotiation, - * ChangeL(MinimumChecksumCoverage, val) */ - } + case DCCP_SOCKOPT_RECV_CSCOV: + err = dccp_setsockopt_cscov(sk, val, true); break; default: err = -ENOPROTOOPT; -- cgit v1.2.2 From 73bbe095bbb9ce5f94d5475bad54c7ccd8573b1b Mon Sep 17 00:00:00 2001 From: Gerrit Renker Date: Thu, 4 Sep 2008 07:30:19 +0200 Subject: dccp: Tidy up setsockopt calls This splits the setsockopt calls into two groups, depending on whether an integer argument (val) is required and whether routines being called do their own locking. Some options (such as setting the CCID) use u8 rather than int, so that for these the test with regard to integer-sizeof can not be used. The second switch-case statement now only has those statements which need locking and which make use of `val'. Signed-off-by: Gerrit Renker Acked-by: Ian McDonald Acked-by: Arnaldo Carvalho de Melo Reviewed-by: Eugene Teo --- net/dccp/proto.c | 23 ++++++++++++----------- 1 file changed, 12 insertions(+), 11 deletions(-) (limited to 'net/dccp/proto.c') diff --git a/net/dccp/proto.c b/net/dccp/proto.c index 47b137a3feaf..e29bbf914057 100644 --- a/net/dccp/proto.c +++ b/net/dccp/proto.c @@ -512,7 +512,17 @@ static int do_dccp_setsockopt(struct sock *sk, int level, int optname, struct dccp_sock *dp = dccp_sk(sk); int val, err = 0; - if (optlen < sizeof(int)) + switch (optname) { + case DCCP_SOCKOPT_PACKET_SIZE: + DCCP_WARN("sockopt(PACKET_SIZE) is deprecated: fix your app\n"); + return 0; + case DCCP_SOCKOPT_CHANGE_L: + case DCCP_SOCKOPT_CHANGE_R: + DCCP_WARN("sockopt(CHANGE_L/R) is deprecated: fix your app\n"); + return 0; + } + + if (optlen < (int)sizeof(int)) return -EINVAL; if (get_user(val, (int __user *)optval)) @@ -523,15 +533,6 @@ static int do_dccp_setsockopt(struct sock *sk, int level, int optname, lock_sock(sk); switch (optname) { - case DCCP_SOCKOPT_PACKET_SIZE: - DCCP_WARN("sockopt(PACKET_SIZE) is deprecated: fix your app\n"); - err = 0; - break; - case DCCP_SOCKOPT_CHANGE_L: - case DCCP_SOCKOPT_CHANGE_R: - DCCP_WARN("sockopt(CHANGE_L/R) is deprecated: fix your app\n"); - err = 0; - break; case DCCP_SOCKOPT_SERVER_TIMEWAIT: if (dp->dccps_role != DCCP_ROLE_SERVER) err = -EOPNOTSUPP; @@ -548,8 +549,8 @@ static int do_dccp_setsockopt(struct sock *sk, int level, int optname, err = -ENOPROTOOPT; break; } - release_sock(sk); + return err; } -- cgit v1.2.2 From fade756f18d42694e3acb00e3471ab43002cba16 Mon Sep 17 00:00:00 2001 From: Gerrit Renker Date: Thu, 4 Sep 2008 07:30:19 +0200 Subject: dccp: Set per-connection CCIDs via socket options With this patch, TX/RX CCIDs can now be changed on a per-connection basis, which overrides the defaults set by the global sysctl variables for TX/RX CCIDs. To make full use of this facility, the remaining patches of this patch set are needed, which track dependencies and activate negotiated feature values. Note on the maximum number of CCIDs that can be registered: ----------------------------------------------------------- The maximum number of CCIDs that can be registered on the socket is constrained by the space in a Confirm/Change feature negotiation option. The space in these in turn depends on the size of header options as defined in RFC 4340, 5.8. Since this is a recurring constant, it has been moved from ackvec.h into linux/dccp.h, clarifying its purpose. Relative to this size, the maximum number of CCID identifiers that can be present in a Confirm option (which always consumes 1 byte more than a Change option, cf. 6.1) is 2 bytes less than the maximum TLV size: one for the CCID-feature-type and one for the selected value. Signed-off-by: Gerrit Renker --- net/dccp/proto.c | 34 ++++++++++++++++++++++++++++++++++ 1 file changed, 34 insertions(+) (limited to 'net/dccp/proto.c') diff --git a/net/dccp/proto.c b/net/dccp/proto.c index e29bbf914057..2cd56df44d8e 100644 --- a/net/dccp/proto.c +++ b/net/dccp/proto.c @@ -506,6 +506,36 @@ static int dccp_setsockopt_cscov(struct sock *sk, int cscov, bool rx) return rc; } +static int dccp_setsockopt_ccid(struct sock *sk, int type, + char __user *optval, int optlen) +{ + u8 *val; + int rc = 0; + + if (optlen < 1 || optlen > DCCP_FEAT_MAX_SP_VALS) + return -EINVAL; + + val = kmalloc(optlen, GFP_KERNEL); + if (val == NULL) + return -ENOMEM; + + if (copy_from_user(val, optval, optlen)) { + kfree(val); + return -EFAULT; + } + + lock_sock(sk); + if (type == DCCP_SOCKOPT_TX_CCID || type == DCCP_SOCKOPT_CCID) + rc = dccp_feat_register_sp(sk, DCCPF_CCID, 1, val, optlen); + + if (!rc && (type == DCCP_SOCKOPT_RX_CCID || type == DCCP_SOCKOPT_CCID)) + rc = dccp_feat_register_sp(sk, DCCPF_CCID, 0, val, optlen); + release_sock(sk); + + kfree(val); + return rc; +} + static int do_dccp_setsockopt(struct sock *sk, int level, int optname, char __user *optval, int optlen) { @@ -520,6 +550,10 @@ static int do_dccp_setsockopt(struct sock *sk, int level, int optname, case DCCP_SOCKOPT_CHANGE_R: DCCP_WARN("sockopt(CHANGE_L/R) is deprecated: fix your app\n"); return 0; + case DCCP_SOCKOPT_CCID: + case DCCP_SOCKOPT_RX_CCID: + case DCCP_SOCKOPT_TX_CCID: + return dccp_setsockopt_ccid(sk, optname, optval, optlen); } if (optlen < (int)sizeof(int)) -- cgit v1.2.2 From c8041e264b3db6944d37b87969fbe6458cb30cfd Mon Sep 17 00:00:00 2001 From: Gerrit Renker Date: Thu, 4 Sep 2008 07:30:19 +0200 Subject: dccp: API to query the current TX/RX CCID This provides function to query the current TX/RX CCID dynamically, without reliance on the minisock value, using dynamic information available in the currently loaded CCID module. This query function is then used to (a) provide the getsockopt part for getting/setting CCIDs via sockopts; (b) replace the current test for "which CCID is in use" in probe.c. Signed-off-by: Gerrit Renker Acked-by: Ian McDonald --- net/dccp/proto.c | 10 ++++++++++ 1 file changed, 10 insertions(+) (limited to 'net/dccp/proto.c') diff --git a/net/dccp/proto.c b/net/dccp/proto.c index 2cd56df44d8e..6550452d59e2 100644 --- a/net/dccp/proto.c +++ b/net/dccp/proto.c @@ -667,6 +667,16 @@ static int do_dccp_getsockopt(struct sock *sk, int level, int optname, break; case DCCP_SOCKOPT_AVAILABLE_CCIDS: return ccid_getsockopt_builtin_ccids(sk, len, optval, optlen); + case DCCP_SOCKOPT_TX_CCID: + val = ccid_get_current_tx_ccid(dp); + if (val < 0) + return -ENOPROTOOPT; + break; + case DCCP_SOCKOPT_RX_CCID: + val = ccid_get_current_rx_ccid(dp); + if (val < 0) + return -ENOPROTOOPT; + break; case DCCP_SOCKOPT_SERVER_TIMEWAIT: val = dp->dccps_server_timewait; break; -- cgit v1.2.2 From 3a53a9adfa269da7fa40fc476f09e46155c0143d Mon Sep 17 00:00:00 2001 From: Gerrit Renker Date: Thu, 4 Sep 2008 07:30:19 +0200 Subject: dccp: Integration of dynamic feature activation - part 1 (socket setup) This first patch out of three replaces the hardcoded default settings with initialisation code for the dynamic feature negotiation. Note on retransmitting Confirm options: --------------------------------------- This patch also defers flushing the client feature-negotiation queue, due to the following considerations. As long as the client is in PARTOPEN, it needs to retransmit the Confirm options for the Change options received on the DCCP-Response from the server. Otherwise, if the packet containing the Confirm options gets dropped in the network, the connection aborts due to undefined feature negotiation state. Thanks to Leandro Melo de Sales who reported a bug in an earlier revision of the patch set, resulting from not retransmitting the Confirm options. The patch now ensures that the client feature-negotiation queue is flushed only when entering the OPEN state. Since confirmed Change options are removed as soon as they are confirmed (in the DCCP-Response), this ensures that Confirm options are retransmitted. Signed-off-by: Gerrit Renker Acked-by: Ian McDonald --- net/dccp/proto.c | 46 ++++++---------------------------------------- 1 file changed, 6 insertions(+), 40 deletions(-) (limited to 'net/dccp/proto.c') diff --git a/net/dccp/proto.c b/net/dccp/proto.c index 6550452d59e2..0d420790b795 100644 --- a/net/dccp/proto.c +++ b/net/dccp/proto.c @@ -67,6 +67,9 @@ void dccp_set_state(struct sock *sk, const int state) case DCCP_OPEN: if (oldstate != DCCP_OPEN) DCCP_INC_STATS(DCCP_MIB_CURRESTAB); + /* Client retransmits all Confirm options until entering OPEN */ + if (oldstate == DCCP_PARTOPEN) + dccp_feat_list_purge(&dccp_sk(sk)->dccps_featneg); break; case DCCP_CLOSED: @@ -175,7 +178,6 @@ EXPORT_SYMBOL_GPL(dccp_state_name); int dccp_init_sock(struct sock *sk, const __u8 ctl_sock_initialized) { struct dccp_sock *dp = dccp_sk(sk); - struct dccp_minisock *dmsk = dccp_msk(sk); struct inet_connection_sock *icsk = inet_csk(sk); dccp_minisock_init(&dp->dccps_minisock); @@ -194,45 +196,9 @@ int dccp_init_sock(struct sock *sk, const __u8 ctl_sock_initialized) dccp_init_xmit_timers(sk); INIT_LIST_HEAD(&dp->dccps_featneg); - /* - * FIXME: We're hardcoding the CCID, and doing this at this point makes - * the listening (master) sock get CCID control blocks, which is not - * necessary, but for now, to not mess with the test userspace apps, - * lets leave it here, later the real solution is to do this in a - * setsockopt(CCIDs-I-want/accept). -acme - */ - if (likely(ctl_sock_initialized)) { - int rc = dccp_feat_init(sk); - - if (rc) - return rc; - - if (dmsk->dccpms_send_ack_vector) { - dp->dccps_hc_rx_ackvec = dccp_ackvec_alloc(GFP_KERNEL); - if (dp->dccps_hc_rx_ackvec == NULL) - return -ENOMEM; - } - dp->dccps_hc_rx_ccid = ccid_hc_rx_new(dmsk->dccpms_rx_ccid, - sk, GFP_KERNEL); - dp->dccps_hc_tx_ccid = ccid_hc_tx_new(dmsk->dccpms_tx_ccid, - sk, GFP_KERNEL); - if (unlikely(dp->dccps_hc_rx_ccid == NULL || - dp->dccps_hc_tx_ccid == NULL)) { - ccid_hc_rx_delete(dp->dccps_hc_rx_ccid, sk); - ccid_hc_tx_delete(dp->dccps_hc_tx_ccid, sk); - if (dmsk->dccpms_send_ack_vector) { - dccp_ackvec_free(dp->dccps_hc_rx_ackvec); - dp->dccps_hc_rx_ackvec = NULL; - } - dp->dccps_hc_rx_ccid = dp->dccps_hc_tx_ccid = NULL; - return -ENOMEM; - } - } else { - /* control socket doesn't need feat nego */ - INIT_LIST_HEAD(&dmsk->dccpms_pending); - INIT_LIST_HEAD(&dmsk->dccpms_conf); - } - + /* control socket doesn't need feat nego */ + if (likely(ctl_sock_initialized)) + return dccp_feat_init(sk); return 0; } -- cgit v1.2.2 From b235dc4abbc1356284bd0dc730efa711f394e0e2 Mon Sep 17 00:00:00 2001 From: Gerrit Renker Date: Thu, 4 Sep 2008 07:30:19 +0200 Subject: dccp ccid-2: Phase out the use of boolean Ack Vector sysctl This removes the use of the sysctl and the minisock variable for the Send Ack Vector feature, which is now handled fully dynamically via feature negotiation; i.e. when CCID2 is enabled, Ack Vectors are automatically enabled (as per RFC 4341, 4.). Using a sysctl in parallel to this implementation would open the door to crashes, since much of the code relies on tests of the boolean minisock / sysctl variable. Thus, this patch replaces all tests of type if (dccp_msk(sk)->dccpms_send_ack_vector) /* ... */ with if (dp->dccps_hc_rx_ackvec != NULL) /* ... */ The dccps_hc_rx_ackvec is allocated by the dccp_hdlr_ackvec() when feature negotiation concluded that Ack Vectors are to be used on the half-connection. Otherwise, it is NULL (due to dccp_init_sock/dccp_create_openreq_child), so that the test is a valid one. The activation handler for Ack Vectors is called as soon as the feature negotiation has concluded at the * server when the Ack marking the transition RESPOND => OPEN arrives; * client after it has sent its ACK, marking the transition REQUEST => PARTOPEN. Adding the sequence number of the Response packet to the Ack Vector has been removed, since (a) connection establishment implies that the Response has been received; (b) the CCIDs only look at packets received in the (PART)OPEN state, i.e. this entry will always be ignored; (c) it can not be used for anything useful - to detect loss for instance, only packets received after the loss can serve as pseudo-dupacks. Signed-off-by: Gerrit Renker Acked-by: Ian McDonald --- net/dccp/proto.c | 3 +-- 1 file changed, 1 insertion(+), 2 deletions(-) (limited to 'net/dccp/proto.c') diff --git a/net/dccp/proto.c b/net/dccp/proto.c index 0d420790b795..775eaa3d0c49 100644 --- a/net/dccp/proto.c +++ b/net/dccp/proto.c @@ -207,7 +207,6 @@ EXPORT_SYMBOL_GPL(dccp_init_sock); void dccp_destroy_sock(struct sock *sk) { struct dccp_sock *dp = dccp_sk(sk); - struct dccp_minisock *dmsk = dccp_msk(sk); /* * DCCP doesn't use sk_write_queue, just sk_send_head @@ -225,7 +224,7 @@ void dccp_destroy_sock(struct sock *sk) kfree(dp->dccps_service_list); dp->dccps_service_list = NULL; - if (dmsk->dccpms_send_ack_vector) { + if (dp->dccps_hc_rx_ackvec != NULL) { dccp_ackvec_free(dp->dccps_hc_rx_ackvec); dp->dccps_hc_rx_ackvec = NULL; } -- cgit v1.2.2 From 51c7d4fa2675c106a980ddcdbe308b54b5151945 Mon Sep 17 00:00:00 2001 From: Gerrit Renker Date: Thu, 4 Sep 2008 07:30:19 +0200 Subject: dccp: Implement both feature-local and feature-remote Sequence Window feature This adds full support for local/remote Sequence Window feature, from which the * sequence-number-validity (W) and * acknowledgment-number-validity (W') windows derive as specified in RFC 4340, 7.5.3. Specifically, the following changes are introduced: * integrated new socket fields into dccp_sk; * updated the update_gsr/gss routines with regard to these fields; * updated handler code: the Sequence Window feature is located at the TX side, so the local feature is meant if the handler-rx flag is false; * the initialisation of `rcv_wnd' in reqsk is removed, since - rcv_wnd is not used by the code anywhere; - sequence number checks are not done in the LISTEN state (cf. 7.5.3); - dccp_check_req checks the Ack number validity more rigorously; * the `struct dccp_minisock' became empty and is now removed. Until the handshake completes with activating negotiated values, the local/remote Sequence-Window values are undefined and thus can not reliably be estimated. This issue is addressed in a separate patch. Signed-off-by: Gerrit Renker Acked-by: Ian McDonald --- net/dccp/proto.c | 2 -- 1 file changed, 2 deletions(-) (limited to 'net/dccp/proto.c') diff --git a/net/dccp/proto.c b/net/dccp/proto.c index 775eaa3d0c49..392a5d822b33 100644 --- a/net/dccp/proto.c +++ b/net/dccp/proto.c @@ -180,8 +180,6 @@ int dccp_init_sock(struct sock *sk, const __u8 ctl_sock_initialized) struct dccp_sock *dp = dccp_sk(sk); struct inet_connection_sock *icsk = inet_csk(sk); - dccp_minisock_init(&dp->dccps_minisock); - icsk->icsk_rto = DCCP_TIMEOUT_INIT; icsk->icsk_syn_retries = sysctl_dccp_request_retries; sk->sk_state = DCCP_CLOSED; -- cgit v1.2.2 From 2faae5587f692fd5c79856ca4c4b90944ee0472a Mon Sep 17 00:00:00 2001 From: Gerrit Renker Date: Thu, 4 Sep 2008 07:30:19 +0200 Subject: dccp ccid-2: Use feature-negotiation to report Ack Ratio changes This uses the new feature-negotiation framework to signal Ack Ratio changes, as required by RFC 4341, sec. 6.1.2. This raises some problems for CCID-2 since it can at the moment not cope gracefully with Ack Ratio of e.g. 2. A FIXME has thus been added which reverts to the existing policy of bypassing the Ack Ratio sysctl. Signed-off-by: Gerrit Renker Acked-by: Ian McDonald --- net/dccp/proto.c | 1 - 1 file changed, 1 deletion(-) (limited to 'net/dccp/proto.c') diff --git a/net/dccp/proto.c b/net/dccp/proto.c index 392a5d822b33..11905e0cf8f7 100644 --- a/net/dccp/proto.c +++ b/net/dccp/proto.c @@ -189,7 +189,6 @@ int dccp_init_sock(struct sock *sk, const __u8 ctl_sock_initialized) dp->dccps_rate_last = jiffies; dp->dccps_role = DCCP_ROLE_UNDEFINED; dp->dccps_service = DCCP_SERVICE_CODE_IS_ABSENT; - dp->dccps_l_ack_ratio = dp->dccps_r_ack_ratio = 1; dccp_init_xmit_timers(sk); -- cgit v1.2.2 From 146993cf5174472644ed11bd5fb539f0af8bfa49 Mon Sep 17 00:00:00 2001 From: Gerrit Renker Date: Thu, 4 Sep 2008 07:30:19 +0200 Subject: dccp: Refine the wait-for-ccid mechanism This extends the existing wait-for-ccid routine so that it may be used with different types of CCID. It further addresses the problems listed below. The code looks if the write queue is non-empty and grants the TX CCID up to `timeout' jiffies to drain the queue. It will instead purge that queue if * the delay suggested by the CCID exceeds the time budget; * a socket error occurred while waiting for the CCID; * there is a signal pending (eg. annoyed user pressed Control-C); * the CCID does not support delays (we don't know how long it will take). D e t a i l s [can be removed] ------------------------------- DCCP's sending mechanism functions a bit like non-blocking I/O: dccp_sendmsg() will enqueue up to net.dccp.default.tx_qlen packets (default=5), without waiting for them to be released to the network. Rate-based CCIDs, such as CCID3/4, can impose sending delays of up to maximally 64 seconds (t_mbi in RFC 3448). Hence the write queue may still contain packets when the application closes. Since the write queue is congestion-controlled by the CCID, draining the queue is also under control of the CCID. There are several problems that needed to be addressed: 1) The queue-drain mechanism only works with rate-based CCIDs. If CCID2 for example has a full TX queue and becomes network-limited just as the application wants to close, then waiting for CCID2 to become unblocked could lead to an indefinite delay (i.e., application "hangs"). 2) Since each TX CCID in turn uses a feedback mechanism, there may be changes in its sending policy while the queue is being drained. This can lead to further delays during which the application will not be able to terminate. 3) The minimum wait time for CCID3/4 can be expected to be the queue length times the current inter-packet delay. For example if tx_qlen=100 and a delay of 15 ms is used for each packet, then the application would have to wait for a minimum of 1.5 seconds before being allowed to exit. 4) There is no way for the user/application to control this behaviour. It would be good to use the timeout argument of dccp_close() as an upper bound. Then the maximum time that an application is willing to wait for its CCIDs to can be set via the SO_LINGER option. These problems are addressed by giving the CCID a grace period of up to the `timeout' value. The wait-for-ccid function is, as before, used when the application (a) has read all the data in its receive buffer and (b) if SO_LINGER was set with a non-zero linger time, or (c) the socket is either in the OPEN (active close) or in the PASSIVE_CLOSEREQ state (client application closes after receiving CloseReq). In addition, there is a catch-all case by calling __skb_queue_purge() after waiting for the CCID. This is necessary since the write queue may still have data when (a) the host has been passively-closed, (b) abnormal termination (unread data, zero linger time), (c) wait-for-ccid could not finish within the given time limit. Signed-off-by: Gerrit Renker --- net/dccp/proto.c | 15 ++++++++++++++- 1 file changed, 14 insertions(+), 1 deletion(-) (limited to 'net/dccp/proto.c') diff --git a/net/dccp/proto.c b/net/dccp/proto.c index 11905e0cf8f7..8c125ffab1c5 100644 --- a/net/dccp/proto.c +++ b/net/dccp/proto.c @@ -735,7 +735,7 @@ int dccp_sendmsg(struct kiocb *iocb, struct sock *sk, struct msghdr *msg, goto out_discard; skb_queue_tail(&sk->sk_write_queue, skb); - dccp_write_xmit(sk,0); + dccp_write_xmit(sk); out_release: release_sock(sk); return rc ? : len; @@ -958,9 +958,22 @@ void dccp_close(struct sock *sk, long timeout) /* Check zero linger _after_ checking for unread data. */ sk->sk_prot->disconnect(sk, 0); } else if (sk->sk_state != DCCP_CLOSED) { + /* + * Normal connection termination. May need to wait if there are + * still packets in the TX queue that are delayed by the CCID. + */ + dccp_flush_write_queue(sk, &timeout); dccp_terminate_connection(sk); } + /* + * Flush write queue. This may be necessary in several cases: + * - we have been closed by the peer but still have application data; + * - abortive termination (unread data or zero linger time), + * - normal termination but queue could not be flushed within time limit + */ + __skb_queue_purge(&sk->sk_write_queue); + sk_stream_wait_close(sk, timeout); adjudge_to_death: -- cgit v1.2.2 From d6da3511d6b558d0b017777b61dc08b8fbc06ea4 Mon Sep 17 00:00:00 2001 From: Tomasz Grobelny Date: Thu, 4 Sep 2008 07:30:19 +0200 Subject: dccp: Policy-based packet dequeueing infrastructure This patch adds a generic infrastructure for policy-based dequeueing of TX packets and provides two policies: * a simple FIFO policy (which is the default) and * a priority based policy (set via socket options). Both policies honour the tx_qlen sysctl for the maximum size of the write queue (can be overridden via socket options). The priority policy uses skb->priority internally to assign an u32 priority identifier, using the same ranking as SO_PRIORITY. The skb->priority field is set to 0 when the packet leaves DCCP. The priority is supplied as ancillary data using cmsg(3), the patch also provides the requisite parsing routines. Signed-off-by: Tomasz Grobelny Signed-off-by: Gerrit Renker --- net/dccp/proto.c | 67 +++++++++++++++++++++++++++++++++++++++++++++++++++++--- 1 file changed, 64 insertions(+), 3 deletions(-) (limited to 'net/dccp/proto.c') diff --git a/net/dccp/proto.c b/net/dccp/proto.c index 8c125ffab1c5..b56efdd2a421 100644 --- a/net/dccp/proto.c +++ b/net/dccp/proto.c @@ -189,6 +189,7 @@ int dccp_init_sock(struct sock *sk, const __u8 ctl_sock_initialized) dp->dccps_rate_last = jiffies; dp->dccps_role = DCCP_ROLE_UNDEFINED; dp->dccps_service = DCCP_SERVICE_CODE_IS_ABSENT; + dp->dccps_tx_qlen = sysctl_dccp_tx_qlen; dccp_init_xmit_timers(sk); @@ -541,6 +542,20 @@ static int do_dccp_setsockopt(struct sock *sk, int level, int optname, case DCCP_SOCKOPT_RECV_CSCOV: err = dccp_setsockopt_cscov(sk, val, true); break; + case DCCP_SOCKOPT_QPOLICY_ID: + if (sk->sk_state != DCCP_CLOSED) + err = -EISCONN; + else if (val < 0 || val >= DCCPQ_POLICY_MAX) + err = -EINVAL; + else + dp->dccps_qpolicy = val; + break; + case DCCP_SOCKOPT_QPOLICY_TXQLEN: + if (val < 0) + err = -EINVAL; + else + dp->dccps_tx_qlen = val; + break; default: err = -ENOPROTOOPT; break; @@ -648,6 +663,12 @@ static int do_dccp_getsockopt(struct sock *sk, int level, int optname, case DCCP_SOCKOPT_RECV_CSCOV: val = dp->dccps_pcrlen; break; + case DCCP_SOCKOPT_QPOLICY_ID: + val = dp->dccps_qpolicy; + break; + case DCCP_SOCKOPT_QPOLICY_TXQLEN: + val = dp->dccps_tx_qlen; + break; case 128 ... 191: return ccid_hc_rx_getsockopt(dp->dccps_hc_rx_ccid, sk, optname, len, (u32 __user *)optval, optlen); @@ -690,6 +711,43 @@ int compat_dccp_getsockopt(struct sock *sk, int level, int optname, EXPORT_SYMBOL_GPL(compat_dccp_getsockopt); #endif +static int dccp_msghdr_parse(struct msghdr *msg, struct sk_buff *skb) +{ + struct cmsghdr *cmsg = CMSG_FIRSTHDR(msg); + + /* + * Assign an (opaque) qpolicy priority value to skb->priority. + * + * We are overloading this skb field for use with the qpolicy subystem. + * The skb->priority is normally used for the SO_PRIORITY option, which + * is initialised from sk_priority. Since the assignment of sk_priority + * to skb->priority happens later (on layer 3), we overload this field + * for use with queueing priorities as long as the skb is on layer 4. + * The default priority value (if nothing is set) is 0. + */ + skb->priority = 0; + + for (; cmsg != NULL; cmsg = CMSG_NXTHDR(msg, cmsg)) { + + if (!CMSG_OK(msg, cmsg)) + return -EINVAL; + + if (cmsg->cmsg_level != SOL_DCCP) + continue; + + switch (cmsg->cmsg_type) { + case DCCP_SCM_PRIORITY: + if (cmsg->cmsg_len != CMSG_LEN(sizeof(__u32))) + return -EINVAL; + skb->priority = *(__u32 *)CMSG_DATA(cmsg); + break; + default: + return -EINVAL; + } + } + return 0; +} + int dccp_sendmsg(struct kiocb *iocb, struct sock *sk, struct msghdr *msg, size_t len) { @@ -705,8 +763,7 @@ int dccp_sendmsg(struct kiocb *iocb, struct sock *sk, struct msghdr *msg, lock_sock(sk); - if (sysctl_dccp_tx_qlen && - (sk->sk_write_queue.qlen >= sysctl_dccp_tx_qlen)) { + if (dccp_qpolicy_full(sk)) { rc = -EAGAIN; goto out_release; } @@ -734,7 +791,11 @@ int dccp_sendmsg(struct kiocb *iocb, struct sock *sk, struct msghdr *msg, if (rc != 0) goto out_discard; - skb_queue_tail(&sk->sk_write_queue, skb); + rc = dccp_msghdr_parse(msg, skb); + if (rc != 0) + goto out_discard; + + dccp_qpolicy_push(sk, skb); dccp_write_xmit(sk); out_release: release_sock(sk); -- cgit v1.2.2 From 7d1af6a8d935678248d057564e75e1452409a53c Mon Sep 17 00:00:00 2001 From: Tomasz Grobelny Date: Thu, 4 Sep 2008 07:30:19 +0200 Subject: dccp qpolicy: Parameter checking of cmsg qpolicy parameters Ensure that cmsg->cmsg_type value is valid for qpolicy that is currently in use. Signed-off-by: Tomasz Grobelny Signed-off-by: Gerrit Renker --- net/dccp/proto.c | 4 ++++ 1 file changed, 4 insertions(+) (limited to 'net/dccp/proto.c') diff --git a/net/dccp/proto.c b/net/dccp/proto.c index b56efdd2a421..a3caa11aa836 100644 --- a/net/dccp/proto.c +++ b/net/dccp/proto.c @@ -735,6 +735,10 @@ static int dccp_msghdr_parse(struct msghdr *msg, struct sk_buff *skb) if (cmsg->cmsg_level != SOL_DCCP) continue; + if (cmsg->cmsg_type <= DCCP_SCM_QPOLICY_MAX && + !dccp_qpolicy_param_ok(skb->sk, cmsg->cmsg_type)) + return -EINVAL; + switch (cmsg->cmsg_type) { case DCCP_SCM_PRIORITY: if (cmsg->cmsg_len != CMSG_LEN(sizeof(__u32))) -- cgit v1.2.2 From 34a081be8e14b7ada70e069b65b05d54db4af497 Mon Sep 17 00:00:00 2001 From: Gerrit Renker Date: Thu, 4 Sep 2008 07:30:19 +0200 Subject: dccp tfrc: Let dccp_tfrc_lib do the sampling work This migrates more TFRC-related code into the dccp_tfrc_lib: * sampling of the packet size `s' (which is only needed until the first loss interval is computed (ccid3_first_li)); * updating the byte-counter `bytes_recvd' in between sending feedbacks. The result is a better separation of CCID-3 specific and TFRC specific code, which aids future integration with ECN and e.g. CCID-4. Further changes: ---------------- * replaced magic number of 536 with equivalent constant TCP_MIN_RCVMSS; (this constant is also used when no estimate for `s' is available). Signed-off-by: Gerrit Renker --- net/dccp/proto.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) (limited to 'net/dccp/proto.c') diff --git a/net/dccp/proto.c b/net/dccp/proto.c index a3caa11aa836..ecf3be961e11 100644 --- a/net/dccp/proto.c +++ b/net/dccp/proto.c @@ -185,7 +185,7 @@ int dccp_init_sock(struct sock *sk, const __u8 ctl_sock_initialized) sk->sk_state = DCCP_CLOSED; sk->sk_write_space = dccp_write_space; icsk->icsk_sync_mss = dccp_sync_mss; - dp->dccps_mss_cache = 536; + dp->dccps_mss_cache = TCP_MIN_RCVMSS; dp->dccps_rate_last = jiffies; dp->dccps_role = DCCP_ROLE_UNDEFINED; dp->dccps_service = DCCP_SERVICE_CODE_IS_ABSENT; -- cgit v1.2.2