kernelnewbies.org

Linux_3.14 - Linux Kernel Newbies

  • ️Sat Dec 30 2017

Linux 3.14 was released on Sun, 30 Mar 2014.

Summary: This release includes the deadline task scheduling policy for real-time tasks, a memory compression mechanism that is now considered stable, a port of the locking validator to userspace, the ability to store properties such as compression for each inode in Btrfs, trigger support for tracing events, improvements to userspace probing, kernel address space randomization, TCP automatic coalescing of certain kinds of connections, a new network packet scheduler to fight bufferbloat, new drivers and many other small improvements.

1. Prominent features

1.1. Deadline scheduling class for better real-time scheduling

Operating systems traditionally provide scheduling priorities for processes: the higher the priority of a process, the more scheduling time it can get with respect to other processes with lower priorities. In Linux, users usually set scheduling priorities from a value of -20 to 19 using nice(2) (in addition, Linux supports the notion of scheduling classes: each class provides different scheduling policies; for example, there is a SCHED_FIFO class with a "first in, first out" policy, and a SCHED_RR class with a round-robin policy).

The approach of process priorities is, however, not well suited for real-time tasks. Evidence Srl and the ReTiS Lab have created an alternative designed around real-time concepts: deadline scheduling, implemented as a new scheduling policy, SCHED_DEADLINE.

Deadline scheduling does away with the notion of process priorities. Instead, processes provide three parameters: runtime, period, and deadline. A SCHED_DEADLINE task is guaranteed to receive "runtime" microseconds of execution time every "period" microseconds, and these "runtime" microseconds are available within "deadline" microseconds from the beginning of the period. The task scheduler uses that information to run the process with the earliest deadline, a behavior closer to the requirements of real-time systems. For more details about the scheduling algorithms, read the documentation.
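As a rough illustration of the "run the task with the earliest deadline" idea, here is a minimal Python sketch that simulates a single CPU in abstract time ticks. It is a toy model and not the kernel implementation: the names `Task`, `admissible` and `edf_schedule` are hypothetical, and the simple bandwidth check (the sum of runtime/period must not exceed 1) is an assumption made for the example.

```python
from dataclasses import dataclass


@dataclass
class Task:
    name: str
    runtime: int   # execution time needed in every period (ticks)
    deadline: int  # relative deadline, counted from the start of the period
    period: int    # length of one period (ticks)


def admissible(tasks):
    """Toy single-CPU bandwidth check: total runtime/period must be <= 1."""
    return sum(t.runtime / t.period for t in tasks) <= 1.0


def edf_schedule(tasks, ticks):
    """Simulate `ticks` time units and return the task name run at each tick
    (None when the CPU is idle). Raises RuntimeError on a deadline miss."""
    remaining = {t.name: 0 for t in tasks}     # work left in the current job
    abs_deadline = {t.name: 0 for t in tasks}  # absolute deadline of that job
    timeline = []
    for now in range(ticks):
        for t in tasks:
            if now % t.period == 0:            # a new period starts: release a job
                remaining[t.name] = t.runtime
                abs_deadline[t.name] = now + t.deadline
        ready = [t for t in tasks if remaining[t.name] > 0]
        for t in ready:
            if now >= abs_deadline[t.name]:
                raise RuntimeError(f"{t.name} missed its deadline at tick {now}")
        if not ready:
            timeline.append(None)
            continue
        # Earliest deadline first: pick the ready task whose deadline is closest.
        current = min(ready, key=lambda t: abs_deadline[t.name])
        remaining[current.name] -= 1
        timeline.append(current.name)
    return timeline
```

For example, with a task A (runtime 1, deadline 2, period 4) and a task B (runtime 2, deadline 4, period 4), A runs first in every period because its deadline is closer, then B uses the following two ticks.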

Recommended LWN article: Deadline scheduling: coming soon?

Recommended page on Wikipedia: SCHED_DEADLINE

Documentation: Documentation/scheduler/sched-deadline.txt

Code: commit 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13

1.2. zram: Memory compression mechanism considered stable

zram provides RAM block devices. Everything written to these block devices gets compressed. If zram block devices are used as swap, when the system tries to move parts of memory to swap it will be effectively moving memory from one part of the RAM to another, except that the data will be compressed before being copied to the destination. This effectively works as a cheap memory compression mechanism to improve responsiveness in systems with limited amounts of memory. Zram is being used by TV companies, Android 4.4, Cyanogenmod, Chrome OS, Lubuntu...

Zram has been in staging since Linux 2.6.33. In this release, zram has been moved out of staging to drivers/block/zram.
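The idea of a RAM-backed device that compresses everything written to it can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the class `CompressedRamStore` and the 4096-byte page size are made up for the example, and Python's `zlib` module stands in for whatever compressor the kernel driver actually uses.

```python
import zlib

PAGE_SIZE = 4096  # assumed page size, for illustration only


class CompressedRamStore:
    """Toy stand-in for a zram-like device: pages stay in memory, compressed."""

    def __init__(self):
        self._pages = {}  # page index -> compressed bytes

    def write_page(self, index, data):
        if len(data) != PAGE_SIZE:
            raise ValueError("a page must be exactly PAGE_SIZE bytes")
        # The data is compressed before it is stored in RAM.
        self._pages[index] = zlib.compress(data)

    def read_page(self, index):
        # Reading gives back the original, uncompressed page.
        return zlib.decompress(self._pages[index])

    def stored_bytes(self):
        """Memory actually used by the compressed pages."""
        return sum(len(blob) for blob in self._pages.values())
```

Writing a highly compressible page (for instance, one filled with a single repeated byte) and comparing `stored_bytes()` with `PAGE_SIZE` shows where the memory saving comes from; incompressible data would save little or nothing.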

Code: commit, commit, commit

1.3. Btrfs: inode properties

This release adds infrastructure in Btrfs to attach name/value pairs to inodes as xattrs. The purpose of these pairs is to store properties for inodes, such as compression. These properties can be inherited: when a directory inode has inheritable properties set, these are added to new inodes created under that directory. Subvolumes can also have properties associated with them, and they can be inherited from their parent subvolume. This release adds one specific property implementation, named "compression", which is inheritable and whose value can be "lzo" or "zlib".
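The inheritance rule can be modelled with a small Python sketch. It is a conceptual model only, not Btrfs code: the `Inode` class and the `VALID_VALUES` and `INHERITABLE` tables are hypothetical names, and the only property modelled is the "compression" property with the two values mentioned above.

```python
# Allowed values per property, and which properties new inodes inherit.
VALID_VALUES = {"compression": {"lzo", "zlib"}}
INHERITABLE = {"compression"}


class Inode:
    """Toy inode that carries name/value properties."""

    def __init__(self, name, parent=None):
        self.name = name
        self.props = {}
        if parent is not None:
            # A new inode copies the inheritable properties of its directory
            # at creation time.
            for key, value in parent.props.items():
                if key in INHERITABLE:
                    self.props[key] = value

    def set_prop(self, key, value):
        if key not in VALID_VALUES:
            raise KeyError(f"unknown property: {key}")
        if value not in VALID_VALUES[key]:
            raise ValueError(f"invalid value for {key}: {value}")
        self.props[key] = value
```

In this model, a file created under a directory whose "compression" is "lzo" starts with the same property, while files created before the property was set are not changed.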

Code: commit

1.4. Trigger support for tracing events

The tracing infrastructure in the Linux kernel makes it easy to register probe functions as events (for more details, see Documentation/trace/events.txt). This release allows these events to conditionally trigger "commands". These commands can take various forms; examples would be enabling or disabling other trace events, or invoking a stack trace whenever the trace event is hit. Any given trigger can additionally have an event filter: the command will only be invoked if the event being invoked passes the associated filter.

For example, the following trigger dumps a stacktrace the first 5 times a kmalloc request happens with a size >= 64K:

{{{
# echo 'stacktrace:5 if bytes_req >= 65536' > \
    /sys/kernel/debug/tracing/events/kmem/kmalloc/trigger
}}}

For more details, see Section 6 in Documentation/trace/events.txt.
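A trigger string such as the one in the kmalloc example is made of a command, an optional count and an optional filter. The following Python sketch composes such a string and writes it to a trigger file; the helper names `build_trigger` and `install_trigger` are hypothetical, and the sketch only covers the `command:count if filter` form shown in that example.

```python
def build_trigger(command, count=None, event_filter=None):
    """Compose a trigger string of the form 'command[:count][ if filter]'."""
    text = command
    if count is not None:
        text += f":{count}"
    if event_filter:
        text += f" if {event_filter}"
    return text


def install_trigger(trigger_path, trigger):
    """Write the trigger string to the given event 'trigger' file.

    Writing to files under /sys/kernel/debug/tracing normally needs root.
    """
    with open(trigger_path, "w") as trigger_file:
        trigger_file.write(trigger + "\n")
```

Calling `build_trigger("stacktrace", 5, "bytes_req >= 65536")` yields the same string that the `echo` command in the example writes.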

Recommended LWN article: Triggers for tracing

Code: commit 1, 2, 3, 4, 5, 6, 7

1.5. Userspace probes access to all arguments

Userspace probes are a Linux 3.5 feature that allows setting tracing probes in userspace programs at runtime. This release makes it possible to fetch other types of arguments for uprobes: memory, stack, dereference, bitfield, retval and file offset. For more details see here.

Code: commit 1, 2, 3

1.6. Userspace locking validator

The Linux kernel has had (since 2.6.18) a lock validator that can find locking issues at runtime. This release makes it possible to run the Linux locking validator in userspace, so that locking issues in userspace programs can be debugged. For more details, see the recommended LWN link.

Recommended LWN article: User-space lockdep

Code: commit 1, 2, 3, 4, 5, 6, 7

1.7. Kernel address space randomization

This release allows randomizing the physical and virtual address at which the kernel image is decompressed, as a security feature that deters exploit attempts relying on knowledge of the location of kernel internals.

Recommended LWN article: Kernel address space layout randomization

Code: commit 1, 2, 3, 4, 5, 6, 7, 8, 9

1.8. TCP automatic corking

When applications do consecutive small write()/sendmsg() system calls, the Linux kernel will try to coalesce these small writes as much as possible to lower the total number of packets sent. This feature is called "automatic corking". Automatic corking is done if at least one prior packet for the flow is waiting in the Qdisc queues or the device transmit queue. Applications can still use TCP_CORK for optimal behavior when they know how/when to uncork their sockets. A new sysctl (/proc/sys/net/ipv4/tcp_autocorking) has been added to control this feature, which defaults to enabled. For benchmarks and more details see the commit link.
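The coalescing rule can be pictured with a heavily simplified Python model. This is an assumption-laden sketch, not the kernel logic: the function `simulate_autocorking` and its event format are invented for the example, packet size limits are ignored, and a "tx_done" event stands for a queued packet of the flow leaving the transmit queue.

```python
def simulate_autocorking(events):
    """Return the sizes of the packets sent for one flow.

    `events` is a sequence whose items are either an int (a small write of
    that many bytes) or the string "tx_done" (a queued packet was sent).
    """
    packets = []   # sizes of the packets handed to the queue, in order
    in_queue = 0   # packets of this flow still waiting to be transmitted
    pending = 0    # bytes held back so they can be coalesced
    for event in events:
        if event == "tx_done":
            if in_queue > 0:
                in_queue -= 1
            if in_queue == 0 and pending > 0:
                # Queue drained: push everything that was held back as one packet.
                packets.append(pending)
                in_queue += 1
                pending = 0
        elif in_queue > 0:
            # A prior packet is still queued: cork this write.
            pending += event
        else:
            # Nothing queued: send the write right away.
            packets.append(event)
            in_queue += 1
    if pending > 0:
        packets.append(pending)
    return packets
```

In this model, four small writes issued while an earlier packet is still queued leave the host as fewer, larger packets, whereas writes issued on an idle flow are sent immediately.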

Code: commit

1.9. Antibufferbloat: "Proportional Integral controller Enhanced" packet scheduler

Bufferbloat is a phenomenon where excess buffers in the network cause high latency and jitter. As more and more interactive applications (e.g. voice over IP, real-time video streaming and financial transactions) run on the Internet, high latency and jitter degrade application performance. There have been a number of features and improvements in the Linux kernel network stack that try to address this problem.

This release adds a new network packet scheduler, PIE (Proportional Integral controller Enhanced), that can effectively control the average queueing latency to a target value. Simulation results, theoretical analysis and Linux testbed results have shown that PIE can ensure low latency and achieve high link utilization under various congestion situations. The design incurs very small overhead. For more information, see the technical paper about PIE in the IEEE Conference on High Performance Switching and Routing 2013. You can also refer to the IETF draft submission. All relevant code, documents, test scripts and results can be found at ftp://ftpeng.cisco.com/pie/.

Code: commit

2. Drivers and architectures

All the driver and architecture-specific changes can be found in the Linux_3.14-DriversArch page

3. Core

  • Tool for suspend/resume performance analysis and optimization commit

  • futexes: Increase hash table size for better performance commit

  • IPC queues: remove limits for the amount of system-wide queues that were added in 93e6f119c0ce commit

  • kexec: add sysctl to disable future kexec usage commit

  • lib: introduce arch optimized hash library commit

  • locking: Optimize lock_bh functions commit

  • scheduler: Drop sysctl_numa_balancing_settle_count sysctl commit

  • scheduler: add tracepoints related to NUMA task migration commit

  • stackprotector: Introduce CONFIG_CC_STACKPROTECTOR_STRONG commit

  • swap: add a simple detector for inappropriate swapin readahead commit

  • sysfs, kernfs: add skeletons for kernfs commit

  • rcutorture: Add --bootargs argument to specify additional boot arguments commit, add --buildonly dry-run capability commit, add --kmake-arg argument to kvm.sh commit, add --no-initrd argument to kvm.sh commit, add --qemu-args argument to kvm.sh commit, add KVM-based test framework commit, add SRCU Kconfig-fragment files commit, add datestamp argument to kvm.sh commit, add per-Kconfig fragment boot parameters commit, add per-version default Kconfig fragments and module parameters commit, add v3.12 version, which adds sysidle testing commit, eliminate --rcu-kvm argument commit, eliminate configdir argument from kvm-recheck.sh script commit, remove decorative qemu argument commit

  • cpufreq: add boost frequency support commit, commit

4. Memory management

  • /proc/meminfo: provide estimated available memory commit

  • Add overcommit_kbytes sysctl variable, which allows finer-grained configuration than overcommit_ratio in machines with lots of memory commit

  • Document improved handling of swappiness==0 (implemented a long time ago) commit
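The estimated available memory is exported as the MemAvailable field of /proc/meminfo. A minimal Python sketch for reading it could look like this; the helper names `parse_meminfo` and `available_kb` are hypothetical, and the parser simply takes the first number after each field name.

```python
def parse_meminfo(text):
    """Parse /proc/meminfo-style text into a {field name: integer} dict."""
    values = {}
    for line in text.splitlines():
        name, sep, rest = line.partition(":")
        if not sep:
            continue  # not a "Name: value" line
        fields = rest.split()
        if fields:
            values[name.strip()] = int(fields[0])
    return values


def available_kb(text):
    """Return MemAvailable in kB, or None on kernels that do not report it."""
    return parse_meminfo(text).get("MemAvailable")
```

On a running system the input would come from `open("/proc/meminfo").read()`; kernels older than this release do not have the field, which is why `available_kb` may return None.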

5. Block layer

  • Immutable bio vecs commit

  • rbd: add support for single-major device number allocation scheme commit, enable extended devt in single-major mode commit

  • Device Mapper
    • dm cache policy mq: introduce three promotion threshold tunables commit

    • dm cache: add block sizes and total cache blocks to status output commit

    • dm cache: add policy name to status output commit

6. File systems

  • Btrfs
    • Incompatible format change to remove hole extents commit

    • Add a few mount options so that features can be changed on remounts: "barrier" commit, "datacow" commit, "datasum" commit, "noautodefrag" commit, "nodiscard" commit, "noenospc_debug" commit, "noflushoncommit" commit, "noinode_cache" commit, "treelog" commit

    • Publish btrfs internal information in sysfs, some of the features can be changed commit 1, 2, 3, 4, 5, 6, 7, 8

    • Add ioctls to query/change feature bits online commit

    • Various performance improvements, see each commit for details commit, commit, commit, commit, commit, commit, commit, commit, commit, commit, commit, commit

    • Add ioctl to export size of global metadata reservation, for better btrfs df reporting commit

  • f2fs
    • Add a new mount option: inline_data commit

    • Add a sysfs entry to control max_discards commit

    • Improve write performance under frequent fsync calls commit

    • Introduce sysfs entry to control in-place-update policy commit

  • hfsplus: add HFSX subfolder count support commit

  • exofs: Allow O_DIRECT open commit

  • ext4: enable punch hole for bigalloc commit

  • Ceph: Add ACL support commit, commit

  • XFS: Allow logical-sector sized O_DIRECT commit

  • 9P: Introduction of a new cache=mmap model. commit

7. Networking

  • ipv6 addrconf: add IFA_F_NOPREFIXROUTE flag to suppress creation of IP6 routes commit

  • ipv6: enable anycast addresses as source addresses for datagrams commit

  • ipv6: router reachability probing commit

  • ipv6: send Change Status Report after DAD is completed commit

  • ipv6: support IPV6_PMTU_INTERFACE on sockets commit

  • ipv6: add the option to use anycast addresses as source addresses in echo reply commit

  • mac80211
    • Add support for QoS mapping commit

    • Tx frame latency statistics commit

    • Add generic cipher scheme support commit

    • Vendor command support commit

  • macvtap: Add support of packet capture on macvtap device. commit

  • net-gre-gro: Add GRE support to the GRO stack commit

  • net-sysfs: add support for device-specific rx queue sysfs attributes commit

  • Add GRO support for UDP encapsulating protocols commit

  • Add GRO support for vxlan traffic commit

  • Add NETDEV_PRECHANGEMTU to notify before mtu change happens commit

  • if_arp: add ARPHRD_6LOWPAN type commit

  • net_tstamp: Add SIOCGHWTSTAMP ioctl to match SIOCSHWTSTAMP commit

  • netconf: add proxy-arp support commit, add support for IPv6 proxy_ndp commit

  • netfilter
    • Add IPv4/6 IPComp extension match support commit

    • Introduce l2tp match extension commit

    • nf_nat: add full port randomization support commit

    • nf_tables: add "inet" table for IPv4/IPv6 commit

    • nf_tables: add nfproto support to meta expression commit

    • nf_tables: add support for multi family tables commit

    • nfnetlink_queue: enable UID/GID socket info retrieval commit

    • nft: add queue module commit

    • nft_ct: Add support to set the connmark commit

    • nft_meta: add l4proto support commit

    • nft_reject: support for IPv6 and TCP reset commit

  • numa: add a sysctl for numa_balancing commit

  • openvswitch: Allow user space to announce ability to accept unaligned Netlink messages commit, enable memory mapped Netlink i/o commit

  • packet: improve socket create/bind latency in some cases commit, introduce PACKET_QDISC_BYPASS socket option commit, use percpu mmap tx frame pending refcount commit

  • tcp: metrics: New netlink attribute for src IP and dumped in netlink reply commit

  • sunrpc: add an "info" file for the dummy gssd pipe commit

  • tun: Add support for RFS on tun flows commit

  • pktgen, xfrm: Add statistics counting when transforming commit

  • rtnetlink: provide api for getting and setting slave info commit

  • IB: Add flow steering support for IPoIB UD traffic commit, ethernet L2 attributes in verbs/cm structures commit

  • NFC: NCI: Add set_config API commit

  • af_packet: Add Queue mapping mode to af_packet fanout operation commit

  • batman-adv: add bonding again commit

  • bonding: add netlink attribute support: ad_info commit, ad_select commit, all_slaves_active commit, arp_all_targets commit, arp_interval commit, arp_ip_target commit, arp_validate commit, downdelay commit, fail_over_mac commit, lacp_rate commit, lp_interval commit, miimon commit, min_links commit, num_grat_arp commit, packets_per_slave commit, primary commit, resend_igmp commit, updelay commit, use_carrier commit, xmit_hash_policy commit

  • bonding: add sysfs /slave dir for bond slave devices. commit

  • bonding: add option lp_interval for loading module commit

  • cfg80211: Add support for QoS mapping commit

  • filter: bpf_dbg: add minimal bpf debugger commit

8. Virtualization

  • Add support for Hyper-V reference time counter commit

  • virtio-net: auto-tune mergeable rx buffer size for improved performance commit

  • virtio-net: initial rx sysfs support, export mergeable rx buffer size commit

  • xen/pvh: Support ParaVirtualized Hardware extensions (v3). commit

  • xen-netfront: add support for IPv6 offloads commit

  • xen/events: Add the hypervisor interface for the FIFO-based event channels commit

  • xen: balloon: enable for ARM commit

9. Security

  • Smack: Make the syslog control configurable commit

  • audit
    • Added exe field to audit core dump signal log commit

    • Add audit_backlog_wait_time configuration option commit

    • Allow unlimited backlog queue commit

    • log on errors from filter user rules commit

    • log task info on feature change commit

10. Crypto

  • Support for AMD Cryptographic Coprocessor which can be used to accelerate or offload encryption operations such as SHA, AES and more commit 1, 2, 3, 4, 5, 6, 7

  • mxs - Add Freescale MXS DCP driver commit, remove the old DCP driver commit

  • aesni: AVX and AVX2 version of AESNI-GCM encode and decode commit

11. Tracing/perf

  • perf kvm: Introduce option -v for perf kvm command. commit, make perf kvm diff support --guestmount. commit

  • perf probe: Support basic dwarf-based operations on uprobe events commit

  • perf record: add --initial-delay option commit, default -t option to no inheritance commit, make per-cpu mmaps the default. commit, rename --initial-delay to --delay commit, rename --no-delay to --no-buffering commit

  • perf report: Add --header/--header-only options commit

  • perf script: add --header/--header-only options commit, add an option to print the source line number commit, print callchains and symbols if they exist commit, print comm, fork and exit events also commit, print mmap events also commit

  • perf timechart: Add --highlight option commit, add backtrace support commit, add backtrace support to CPU info commit, add option to limit number of tasks commit, add support for -P and -T in timechart recording commit, add support for displaying only tasks related data commit, always try to print at least 15 tasks commit, group figures and add title with details commit

  • perf tools: Add 'build-test' make target commit, add build and install plugins targets commit, allow '--inherit' as the negation of '--no-inherit' commit

  • perf trace: Add support for syscalls vs raw_syscalls commit

  • perf ui/tui: Implement header window commit

  • perf stat: Add event unit and scale support commit

12. Other news sites that track the changes of this release