author: David Howells <dhowells@redhat.com>, 2008-11-13 18:39:23 -0500
committer: James Morris <jmorris@namei.org>, 2008-11-13 18:39:23 -0500
commit: d84f4f992cbd76e8f39c488cf0c5d123843923b1
tree: fc4a0349c42995715b93d0f7a3c78e9ea9b3f36e (/lib)
parent: 745ca2475a6ac596e3d8d37c2759c0fbe2586227

CRED: Inaugurate COW credentials

Inaugurate copy-on-write credentials management.  This uses RCU to manage
the credentials pointer in the task_struct with respect to accesses by
other tasks.  A process may only modify its own credentials, and so does
not need locking to access or modify its own credentials.

A mutex (cred_replace_mutex) is added to the task_struct to control the
effect of PTRACE_ATTACHED on credential calculations, particularly with
respect to execve().

With this patch, the contents of an active credentials struct may not be
changed directly; rather a new set of credentials must be prepared,
modified and committed using something like the following sequence of
events:

	struct cred *new = prepare_creds();
	int ret = blah(new);
	if (ret < 0) {
		abort_creds(new);
		return ret;
	}
	return commit_creds(new);

There are some exceptions to this rule: the keyrings pointed to by the
active credentials may be instantiated - keyrings violate the COW rule as
managing COW keyrings is tricky, given that it is possible for a task to
directly alter the keys in a keyring in use by another task.

To help enforce this, various pointers to sets of credentials, such as
those in the task_struct, are declared const.  The purpose of this is
compile-time discouragement of altering credentials through those
pointers.  Once a set of credentials has been made public through one of
these pointers, it may not be modified, except under special
circumstances:

  (1) Its reference count may be incremented and decremented.

  (2) The keyrings to which it points may be modified, but not replaced.

The only safe way to modify anything else is to create a replacement and
commit using the functions described in Documentation/credentials.txt
(which will be added by a later patch).

This patch and the preceding patches have been tested with the LTP SELinux
testsuite.

This patch makes several logical sets of alteration:

 (1) execve().

     This now prepares and commits credentials in various places in the
     security code rather than altering the current creds directly.

 (2) Temporary credential overrides.

     do_coredump() and sys_faccessat() now prepare their own credentials
     and temporarily override the ones currently on the acting thread,
     whilst preventing interference from other threads by holding
     cred_replace_mutex on the thread being dumped.

     This will be replaced in a future patch by something that hands down
     the credentials directly to the functions being called, rather than
     altering the task's objective credentials.

 (3) LSM interface.

     A number of functions have been changed, added or removed:

     (*) security_capset_check(), ->capset_check()
     (*) security_capset_set(), ->capset_set()

	 Removed in favour of security_capset().

     (*) security_capset(), ->capset()

	 New.  This is passed a pointer to the new creds, a pointer to the
	 old creds and the proposed capability sets.  It should fill in the
	 new creds or return an error.  All pointers, barring the pointer
	 to the new creds, are now const.

     (*) security_bprm_apply_creds(), ->bprm_apply_creds()

	 Changed; now returns a value, which will cause the process to be
	 killed if it's an error.

     (*) security_task_alloc(), ->task_alloc_security()

	 Removed in favour of security_prepare_creds().

     (*) security_cred_free(), ->cred_free()

	 New.  Free security data attached to cred->security.

     (*) security_prepare_creds(), ->cred_prepare()

	 New.  Duplicate any security data attached to cred->security.

     (*) security_commit_creds(), ->cred_commit()

	 New.  Apply any security effects for the upcoming installation of
	 new security by commit_creds().

     (*) security_task_post_setuid(), ->task_post_setuid()

	 Removed in favour of security_task_fix_setuid().

     (*) security_task_fix_setuid(), ->task_fix_setuid()

	 Fix up the proposed new credentials for setuid().  This is used by
	 cap_set_fix_setuid() to implicitly adjust capabilities in line
	 with setuid() changes.  Changes are made to the new credentials,
	 rather than the task itself as in security_task_post_setuid().

     (*) security_task_reparent_to_init(), ->task_reparent_to_init()

	 Removed.  Instead the task being reparented to init is referred
	 directly to init's credentials.

	 NOTE!  This results in the loss of some state: SELinux's osid no
	 longer records the sid of the thread that forked it.

     (*) security_key_alloc(), ->key_alloc()
     (*) security_key_permission(), ->key_permission()

	 Changed.  These now take cred pointers rather than task pointers
	 to refer to the security context.

 (4) sys_capset().

     This has been simplified and uses less locking.  The LSM functions it
     calls have been merged.

 (5) reparent_to_kthreadd().

     This gives the current thread the same credentials as init by simply
     using commit_thread() to point that way.

 (6) __sigqueue_alloc() and switch_uid()

     __sigqueue_alloc() can't stop the target task from changing its creds
     beneath it, so this function gets a reference to the currently
     applicable user_struct which it then passes into the sigqueue struct
     it returns if successful.

     switch_uid() is now called from commit_creds(), and possibly should be
     folded into that.  commit_creds() should take care of protecting
     __sigqueue_alloc().

 (7) [sg]et[ug]id() and co and [sg]et_current_groups.

     The set functions now all use prepare_creds(), commit_creds() and
     abort_creds() to build and check a new set of credentials before
     applying it.

     security_task_set[ug]id() is called inside the prepared section.  This
     guarantees that nothing else will affect the creds until we've
     finished.

     The calling of set_dumpable() has been moved into commit_creds().

     Much of the functionality of set_user() has been moved into
     commit_creds().

     The get functions all simply access the data directly.

 (8) security_task_prctl() and cap_task_prctl().

     security_task_prctl() has been modified to return -ENOSYS if it
     doesn't want to handle a function, or otherwise return the return
     value directly rather than through an argument.

     Additionally, cap_task_prctl() now prepares a new set of credentials,
     even if it doesn't end up using it.

 (9) Keyrings.

     A number of changes have been made to the keyrings code:

     (a) switch_uid_keyring(), copy_keys(), exit_keys() and suid_keys()
	 have all been dropped and built in to the credentials functions
	 directly.  They may want separating out again later.

     (b) key_alloc() and search_process_keyrings() now take a cred pointer
	 rather than a task pointer to specify the security context.

     (c) copy_creds() gives a new thread within the same thread group a new
	 thread keyring if its parent had one, otherwise it discards the
	 thread keyring.

     (d) The authorisation key now points directly to the credentials to
	 extend the search into rather than pointing to the task that
	 carries them.

     (e) Installing thread, process or session keyrings causes a new set of
	 credentials to be created, even though it's not strictly necessary
	 for process or session keyrings (they're shared).

(10) Usermode helper.

     The usermode helper code now carries a cred struct pointer in its
     subprocess_info struct instead of a new session keyring pointer.  This
     set of credentials is derived from init_cred and installed on the new
     process after it has been cloned.

     call_usermodehelper_setup() allocates the new credentials and
     call_usermodehelper_freeinfo() discards them if they haven't been
     used.  A special cred function (prepare_usermodeinfo_creds()) is
     provided specifically for call_usermodehelper_setup() to call.

     call_usermodehelper_setkeys() adjusts the credentials to sport the
     supplied keyring as the new session keyring.

(11) SELinux.

     SELinux has a number of changes, in addition to those to support the
     LSM interface changes mentioned above:

     (a) selinux_setprocattr() no longer does its check for whether the
	 current ptracer can access processes with the new SID inside the
	 lock that covers getting the ptracer's SID.  Whilst this lock
	 ensures that the check is done with the ptracer pinned, the result
	 is only valid until the lock is released, so there's no point
	 doing it inside the lock.

(12) is_single_threaded().

     This function has been extracted from selinux_setprocattr() and put
     into a file of its own in the lib/ directory as join_session_keyring()
     now wants to use it too.

     The code in SELinux just checked to see whether a task shared
     mm_structs with other tasks (CLONE_VM), but that isn't good enough.
     We really want to know if they're part of the same thread group
     (CLONE_THREAD).

(13) nfsd.

     The NFS server daemon now has to use the COW credentials to set the
     credentials it is going to use.  It really needs to pass the
     credentials down to the functions it calls, but it can't do that until
     other patches in this series have been applied.

Signed-off-by: David Howells <dhowells@redhat.com>
Acked-by: James Morris <jmorris@namei.org>
Signed-off-by: James Morris <jmorris@namei.org>

Diffstat (limited to 'lib')
-rw-r--r-- lib/Makefile | 2
1 file changed, 1 insertion, 1 deletion

diff --git a/lib/Makefile b/lib/Makefile
index 7cb65d85aeb0..80fe8a3ec12a 100644
--- a/lib/Makefile
+++ b/lib/Makefile
@@ -11,7 +11,7 @@ lib-y := ctype.o string.o vsprintf.o cmdline.o \
 	 rbtree.o radix-tree.o dump_stack.o \
 	 idr.o int_sqrt.o extable.o prio_tree.o \
 	 sha1.o irq_regs.o reciprocal_div.o argv_split.o \
-	 proportions.o prio_heap.o ratelimit.o show_mem.o
+	 proportions.o prio_heap.o ratelimit.o show_mem.o is_single_threaded.o
 
 lib-$(CONFIG_MMU) += ioremap.o
 lib-$(CONFIG_SMP) += cpumask.o
                                                                                
How to conserve battery power using laptop-mode
-----------------------------------------------

Document Author: Bart Samwel (bart@samwel.tk)
Date created: January 2, 2004
Last modified: December 06, 2004

Introduction
------------

Laptop mode is used to minimize the time that the hard disk needs to be spun up,
to conserve battery power on laptops. It has been reported to cause significant
power savings.

Contents
--------

* Introduction
* Installation
* Caveats
* The Details
* Configuration
* Tips & Tricks
* Configuration file for control and ACPI battery scripts
* Control script
* ACPI integration
* Monitoring tool


Installation
------------

To use laptop mode, you don't need to set any kernel configuration
options. Simply install all the files included in this document, and
laptop mode will automatically be started when you're on battery. For
your convenience, a tarball containing an installer can be downloaded at:

http://www.xs4all.nl/~bsamwel/laptop_mode/tools/

To configure laptop mode, you need to edit the configuration file, which is
located in /etc/default/laptop-mode on Debian-based systems, or in
/etc/sysconfig/laptop-mode on other systems.

Unfortunately, automatic enabling of laptop mode does not work for
laptops that don't have ACPI. On those laptops, you need to start laptop
mode manually. To start laptop mode, run "laptop_mode start", and to
stop it, run "laptop_mode stop". (Note: The laptop mode tools package now
has experimental support for APM; you might want to try that first.)


Caveats
-------

* The downside of laptop mode is that you have a chance of losing up to 10
  minutes of work. If you cannot afford this, don't use it! The supplied ACPI
  scripts automatically turn off laptop mode when the battery almost runs out,
  so that you won't lose any data at the end of your battery life.

* Most desktop hard drives have a very limited lifetime measured in spindown
  cycles, typically about 50,000 cycles (it's usually listed on the spec sheet).
  Check your drive's rating, and don't wear down your drive's lifetime if you
  don't need to.

* If you mount some of your ext3/reiserfs filesystems with the -n option, then
  the control script will not be able to remount them correctly. You must set
  DO_REMOUNTS=0 in the control script, otherwise it will remount them with the
  wrong options -- or it will fail because it cannot write to /etc/mtab.

* If you have your filesystems listed as type "auto" in fstab, like I did, then
  the control script will not recognize them as filesystems that need remounting.
  You must list the filesystems with their true type instead.

* It has been reported that some versions of the mutt mail client use file access
  times to determine whether a folder contains new mail. If you use mutt and
  experience this, you must disable the noatime remounting by setting the option
  DO_REMOUNT_NOATIME to 0 in the configuration file.
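
The "auto" caveat above can be checked before enabling laptop mode. The
following is a minimal sketch, not part of the laptop mode tools; the function
name list_auto_filesystems and its optional path argument are made up for
illustration. It prints the device and mount point of every fstab entry whose
type field is "auto":

```shell
# Hypothetical helper (not shipped with laptop mode): list fstab entries
# that the control script would skip because their type is "auto".
# The fstab path is a parameter so the function can be tried on a copy.
list_auto_filesystems () {
	local fstab="${1:-/etc/fstab}"
	# Skip comment lines; field 3 of an fstab entry is the filesystem type.
	awk '!/^[ \t]*#/ && NF >= 3 && $3 == "auto" { print $1, $2 }' "$fstab"
}

# Typical use:
#   list_auto_filesystems            # inspect /etc/fstab
#   list_auto_filesystems ./fstab    # inspect a copy
```

Any entry it prints would need its true filesystem type filled in before the
control script can remount it.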


The Details
-----------

Laptop mode is controlled by the knob /proc/sys/vm/laptop_mode. This knob is
present for all kernels that have the laptop mode patch, regardless of any
configuration options. When the knob is set, any physical disk I/O (that might
have caused the hard disk to spin up) causes Linux to flush all dirty blocks. The
result of this is that after a disk has spun down, it will not be spun up
anymore to write dirty blocks, because those blocks had already been written
immediately after the most recent read operation. The value of the laptop_mode
knob determines the time between the occurrence of disk I/O and when the flush
is triggered. A sensible value for the knob is 5 seconds. Setting the knob to
0 disables laptop mode.
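
As a minimal sketch of driving the knob by hand, the value can be written with
a small wrapper like the one below. The function name set_laptop_mode and its
optional second argument are illustrative assumptions, not part of the laptop
mode tools; the second argument exists only so the function can be tried
against an ordinary file.

```shell
# Hypothetical helper: write a flush delay (in seconds) to the laptop_mode
# knob. A value of 0 disables laptop mode.
set_laptop_mode () {
	local seconds="$1"
	local knob="${2:-/proc/sys/vm/laptop_mode}"
	case "$seconds" in
		''|*[!0-9]*)
			echo "set_laptop_mode: not a non-negative integer: $seconds" >&2
			return 1
			;;
	esac
	if [ ! -w "$knob" ] ; then
		echo "set_laptop_mode: cannot write to $knob" >&2
		return 1
	fi
	echo "$seconds" > "$knob"
}

# Typical use, as root:
#   set_laptop_mode 5    # flush dirty blocks 5 seconds after disk I/O
#   set_laptop_mode 0    # disable laptop mode
```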

To increase the effectiveness of the laptop_mode strategy, the laptop_mode
control script increases dirty_expire_centisecs and dirty_writeback_centisecs in
/proc/sys/vm to about 10 minutes (by default), which means that pages that are
dirtied are not forced to be written to disk as often. The control script also
changes the dirty background ratio, so that background writeback of dirty pages
is not done anymore. Combined with a higher commit value (also 10 minutes) for
ext3 or ReiserFS filesystems (also done automatically by the control script),
this results in concentration of disk activity in a small time interval which
occurs only once every 10 minutes, or whenever the disk is forced to spin up by
a cache miss. The disk can then be spun down in the periods of inactivity.
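
The unit conversion behind these settings is easy to get wrong, so here is a
small worked example. It assumes, as the sysctl names suggest, that the two
dirty_* knobs are expressed in centiseconds and that the journal commit
interval is expressed in seconds; the function name seconds_to_centisecs is
made up for this sketch.

```shell
# Illustrative arithmetic only. MAX_AGE mirrors the configuration option of
# the same name; nothing here touches /proc or remounts anything.
seconds_to_centisecs () {
	echo $((100 * $1))
}

MAX_AGE=600    # 10 minutes, the default
AGE="$(seconds_to_centisecs "$MAX_AGE")"
echo "dirty_expire_centisecs and dirty_writeback_centisecs: $AGE"
echo "ext3/ReiserFS mount option: commit=$MAX_AGE"
```

With the default of 600 seconds, both sysctls end up at 60000.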

If you want to find out which process caused the disk to spin up, you can
gather information by setting the flag /proc/sys/vm/block_dump. When this flag
is set, Linux reports all disk read and write operations that take place, and
all block dirtyings done to files. This makes it possible to debug why a disk
needs to spin up, and to increase battery life even more. The output of
block_dump is written to the kernel output, and it can be retrieved using
"dmesg". When you use block_dump and your kernel logging level also includes
kernel debugging messages, you probably want to turn off klogd, otherwise
the output of block_dump will be logged, causing disk activity that is not
normally there.
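
Raw block_dump output can be long, so it helps to count messages per process.
The sketch below is an assumption-laden illustration: it assumes message lines
shaped like "name(pid): READ block N on dev", "name(pid): WRITE block N on
dev" or "name(pid): dirtied inode ...", optionally preceded by a bracketed
timestamp, and the function name summarize_block_dump is hypothetical.

```shell
# Hypothetical helper: read kernel messages on stdin and print
# "<count> <process> <operation>", busiest first.
summarize_block_dump () {
	# Drop an optional leading "[timestamp]" prefix.
	sed -e 's/^\[[^]]*\] *//' |
	awk '/^[^ ]+\([0-9]+\): (READ|WRITE|dirtied) / {
		# Turn "name(pid):" into "name".
		sub(/\([0-9]+\):$/, "", $1)
		count[$1 " " $2]++
	}
	END {
		for (key in count)
			print count[key], key
	}' |
	sort -k1,1nr -k2
}

# Typical use, as root:
#   echo 1 > /proc/sys/vm/block_dump
#   ... wait for the unexpected spin-up ...
#   dmesg | summarize_block_dump
#   echo 0 > /proc/sys/vm/block_dump
```

Process names that contain spaces are not handled by this sketch.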


Configuration
-------------

The laptop mode configuration file is located in /etc/default/laptop-mode on
Debian-based systems, or in /etc/sysconfig/laptop-mode on other systems. It
contains the following options:

MAX_AGE:

Maximum time, in seconds, of hard drive spindown time that you are
comfortable with. Worst case, it's possible that you could lose this
amount of work if your battery fails while you're in laptop mode.

MINIMUM_BATTERY_MINUTES:

Automatically disable laptop mode if the remaining number of minutes of
battery power is less than this value. Default is 10 minutes.

AC_HD/BATT_HD:

The idle timeout that should be set on your hard drive when laptop mode
is active (BATT_HD) and when it is not active (AC_HD). The defaults are
20 seconds (value 4) for BATT_HD and 2 hours (value 244) for AC_HD. The
possible values are those listed in the manual page for "hdparm" for the
"-S" option.

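The "-S" values are not plain seconds, which is why 4 means 20 seconds and 244
means 2 hours. The sketch below decodes a value according to the encoding
described in the hdparm manual page (1 to 240 are multiples of 5 seconds, 241
to 251 are multiples of 30 minutes); the function name hdparm_S_to_seconds is
made up for illustration, so check the manual page of your hdparm version
before relying on it.

```shell
# Hypothetical helper: translate an "hdparm -S" value into seconds.
hdparm_S_to_seconds () {
	local v="$1"
	if [ "$v" -eq 0 ] ; then
		echo "disabled"
	elif [ "$v" -ge 1 ] && [ "$v" -le 240 ] ; then
		# Multiples of 5 seconds: 5 seconds up to 20 minutes.
		echo $((v * 5))
	elif [ "$v" -ge 241 ] && [ "$v" -le 251 ] ; then
		# 1 to 11 units of 30 minutes.
		echo $(((v - 240) * 30 * 60))
	elif [ "$v" -eq 252 ] ; then
		echo $((21 * 60))
	elif [ "$v" -eq 255 ] ; then
		echo $((21 * 60 + 15))
	else
		# 253 is vendor-defined, 254 is reserved.
		echo "vendor-defined or reserved"
	fi
}

hdparm_S_to_seconds 4      # BATT_HD default
hdparm_S_to_seconds 244    # AC_HD default
```
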
HD:

The devices for which the spindown timeout should be adjusted by laptop mode.
Default is /dev/hda. If you specify multiple devices, separate them by a space.

READAHEAD:

Disk readahead, in 512-byte sectors, while laptop mode is active. A large
readahead can prevent disk accesses for things like executable pages (which are
loaded on demand while the application executes) and sequentially accessed data
(MP3s).
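
To pick a READAHEAD value it can help to convert between sectors and
kilobytes. This is plain arithmetic under the assumption stated above that the
value counts 512-byte sectors; the function name is illustrative.

```shell
# Hypothetical helper: size in KB of a readahead given in 512-byte sectors.
readahead_sectors_to_kb () {
	echo $(($1 * 512 / 1024))
}

readahead_sectors_to_kb 16384    # the 8MB value suggested for MP3 playback
readahead_sectors_to_kb 4096     # the value shown in the sample config file
```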

DO_REMOUNTS:

The control script automatically remounts any mounted journaled filesystems
with appropriate commit interval options. When this option is set to 0, this
feature is disabled.

DO_REMOUNT_NOATIME:

When remounting, should the filesystems be remounted with the noatime option?
Normally, this is set to "1" (enabled), but there may be programs that require
access time recording.

DIRTY_RATIO:

The percentage of memory that is allowed to contain "dirty" or unsaved data
before a writeback is forced, while laptop mode is active. Corresponds to
the /proc/sys/vm/dirty_ratio sysctl.

DIRTY_BACKGROUND_RATIO:

The percentage of memory that is allowed to contain "dirty" or unsaved data
after a forced writeback triggered by exceeding DIRTY_RATIO has completed. Set
this nice and low. This corresponds to the /proc/sys/vm/dirty_background_ratio
sysctl.

Note that the behaviour of dirty_background_ratio is quite different
when laptop mode is active and when it isn't. When laptop mode is inactive,
dirty_background_ratio is the threshold percentage at which background writeouts
start taking place. When laptop mode is active, however, background writeouts
are disabled, and the dirty_background_ratio only determines how much writeback
is done when dirty_ratio is reached.

DO_CPU:

Enable CPU frequency scaling when in laptop mode. (Requires CPUFreq to be set
up. See Documentation/cpu-freq/user-guide.txt for more info. Disabled by
default.)

CPU_MAXFREQ:

When on battery, what is the maximum CPU speed that the system should use? Legal
values are "slowest" for the slowest speed that your CPU is able to operate at,
or a value listed in /sys/devices/system/cpu/cpu0/cpufreq/scaling_available_frequencies.
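
To see which numeric value "slowest" would correspond to on a given machine,
the frequency list can be inspected directly. The sketch below simply takes
the minimum of the listed frequencies; it is an illustration with a made-up
function name, and it is not necessarily how the control script itself
resolves "slowest".

```shell
# Hypothetical helper: print the lowest frequency in a
# scaling_available_frequencies file (space-separated values).
slowest_frequency () {
	local list="${1:-/sys/devices/system/cpu/cpu0/cpufreq/scaling_available_frequencies}"
	# One value per line, keep only numbers, smallest first.
	tr -s ' ' '\n' < "$list" | grep -E '^[0-9]+$' | sort -n | head -n 1
}

# Typical use:
#   slowest_frequency
```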


Tips & Tricks
-------------

* Bartek Kania reports getting up to 50 minutes of extra battery life (on top
  of his regular 3 to 3.5 hours) using a spindown time of 5 seconds (BATT_HD=1).

* You can spin down the disk while playing MP3, by setting disk readahead
  to 8MB (READAHEAD=16384). Effectively, the disk will read a complete MP3 at
  once, and will then spin down while the MP3 is playing. (Thanks to Bartek
  Kania.)

* Drew Scott Daniels observed: "I don't know why, but when I decrease the number
  of colours that my display uses it consumes less battery power. I've seen
  this on powerbooks too. I hope that this is a piece of information that
  might be useful to the Laptop Mode patch or it's users."

* In syslog.conf, you can prefix entries with a dash ``-'' to omit syncing the
  file after every logging. When you're using laptop-mode and your disk doesn't
  spin down, this is a likely culprit.

* Richard Atterer observed that laptop mode does not work well with noflushd
  (http://noflushd.sourceforge.net/); it seems that noflushd prevents laptop-mode
  from doing its thing.

* If you're worried about your data, you might want to consider using a USB
  memory stick or something like that as a "working area". (Be aware though
  that flash memory can only handle a limited number of writes, and overuse
  may wear out your memory stick pretty quickly. Do _not_ use journalling
  filesystems on flash memory sticks.)
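
For the syslog.conf tip above, the following sketch lists log targets that are
still synced after every message, that is, file actions that lack the leading
dash. It assumes the classic "selector action" line format; the function name
list_synced_logs and its optional path argument are made up for illustration.

```shell
# Hypothetical helper: print syslog.conf file targets that are not prefixed
# with "-". Device targets such as /dev/console are listed as well and can
# be ignored.
list_synced_logs () {
	local conf="${1:-/etc/syslog.conf}"
	# Skip comments; the action is the last field of a rule line.
	awk '!/^[ \t]*#/ && NF >= 2 && $NF ~ /^\// { print $NF }' "$conf"
}

# Typical use:
#   list_synced_logs
```

Prefixing a listed target with a dash in syslog.conf (for example,
-/var/log/mail.log) turns off the per-message sync for that file.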


Configuration file for control and ACPI battery scripts
-------------------------------------------------------

This allows the tunables to be changed for the scripts via an external
configuration file.

It should be installed as /etc/default/laptop-mode on Debian, and as
/etc/sysconfig/laptop-mode on Red Hat, SUSE, Mandrake, and other work-alikes.

--------------------CONFIG FILE BEGIN-------------------------------------------
# Maximum time, in seconds, of hard drive spindown time that you are
# comfortable with. Worst case, it's possible that you could lose this
# amount of work if your battery fails you while in laptop mode.
#MAX_AGE=600

# Automatically disable laptop mode when the number of minutes of battery
# that you have left goes below this threshold.
MINIMUM_BATTERY_MINUTES=10

# Read-ahead, in 512-byte sectors. You can spin down the disk while playing MP3/OGG
# by setting the disk readahead to 8MB (READAHEAD=16384). Effectively, the disk
# will read a complete MP3 at once, and will then spin down while the MP3/OGG is
# playing.
#READAHEAD=4096

# Shall we remount journaled fs. with appropriate commit interval? (1=yes)
#DO_REMOUNTS=1

# And shall we add the "noatime" option to that as well? (1=yes)
#DO_REMOUNT_NOATIME=1

# Dirty synchronous ratio.  At this percentage of dirty pages the process
# which calls write() does its own writeback
#DIRTY_RATIO=40

#
# Allowed dirty background ratio, in percent.  Once DIRTY_RATIO has been
# exceeded, the kernel will wake pdflush which will then reduce the amount
# of dirty memory to dirty_background_ratio.  Set this nice and low, so once
# some writeout has commenced, we do a lot of it.
#
#DIRTY_BACKGROUND_RATIO=5

# kernel default dirty buffer age
#DEF_AGE=30
#DEF_UPDATE=5
#DEF_DIRTY_BACKGROUND_RATIO=10
#DEF_DIRTY_RATIO=40
#DEF_XFS_AGE_BUFFER=15
#DEF_XFS_SYNC_INTERVAL=30
#DEF_XFS_BUFD_INTERVAL=1

# This must be adjusted manually to the value of HZ in the running kernel
# on 2.4, until the XFS people change their 2.4 external interfaces to work in
# centisecs. This can be automated, but it's a work in progress that still
# needs some fixes. On 2.6 kernels, XFS uses USER_HZ instead of HZ for
# external interfaces, and that is currently always set to 100. So you don't
# need to change this on 2.6.
#XFS_HZ=100

# Should the maximum CPU frequency be adjusted down while on battery?
# Requires CPUFreq to be set up.
# See Documentation/cpu-freq/user-guide.txt for more info
#DO_CPU=0

# When on battery what is the maximum CPU speed that the system should
# use? Legal values are "slowest" for the slowest speed that your
# CPU is able to operate at, or a value listed in:
# /sys/devices/system/cpu/cpu0/cpufreq/scaling_available_frequencies
# Only applicable if DO_CPU=1.
#CPU_MAXFREQ=slowest

# Idle timeout for your hard drive (man hdparm for valid values, -S option)
# Default is 2 hours on AC (AC_HD=244) and 20 seconds for battery (BATT_HD=4).
#AC_HD=244
#BATT_HD=4

# The drives for which to adjust the idle timeout. Separate them by a space,
# e.g. HD="/dev/hda /dev/hdb".
#HD="/dev/hda"

# Set the spindown timeout on a hard drive?
#DO_HD=1

--------------------CONFIG FILE END---------------------------------------------


Control script
--------------

Please note that this control script works for the Linux 2.4 and 2.6 series (thanks
to Kiko Piris).

--------------------CONTROL SCRIPT BEGIN----------------------------------------
#!/bin/bash

# start or stop laptop_mode, best run by a power management daemon when
# ac gets connected/disconnected from a laptop
#
# install as /sbin/laptop_mode
#
# Contributors to this script:   Kiko Piris
#				 Bart Samwel
#				 Micha Feigin
#				 Andrew Morton
#				 Herve Eychenne
#				 Dax Kelson
#
# Original Linux 2.4 version by: Jens Axboe

#############################################################################

# Source config
if [ -f /etc/default/laptop-mode ] ; then
	# Debian
	. /etc/default/laptop-mode
elif [ -f /etc/sysconfig/laptop-mode ] ; then
	# Others
	. /etc/sysconfig/laptop-mode
fi

# Don't raise an error if the config file is incomplete
# set defaults instead:

# Maximum time, in seconds, of hard drive spindown time that you are
# comfortable with. Worst case, it's possible that you could lose this
# amount of work if your battery fails you while in laptop mode.
MAX_AGE=${MAX_AGE:-'600'}

# Read-ahead, in kilobytes
READAHEAD=${READAHEAD:-'4096'}

# Shall we remount journaled fs. with appropriate commit interval? (1=yes)
DO_REMOUNTS=${DO_REMOUNTS:-'1'}

# And shall we add the "noatime" option to that as well? (1=yes)
DO_REMOUNT_NOATIME=${DO_REMOUNT_NOATIME:-'1'}

# Shall we adjust the idle timeout on a hard drive?
DO_HD=${DO_HD:-'1'}

# Adjust idle timeout on which hard drive?
HD="${HD:-/dev/hda}"

# spindown time for HD (hdparm -S values)
AC_HD=${AC_HD:-'244'}
BATT_HD=${BATT_HD:-'4'}

# Dirty synchronous ratio.  At this percentage of dirty pages the process which
# calls write() does its own writeback
DIRTY_RATIO=${DIRTY_RATIO:-'40'}

# cpu frequency scaling
# See Documentation/cpu-freq/user-guide.txt for more info
DO_CPU=${DO_CPU:-'0'}
CPU_MAXFREQ=${CPU_MAXFREQ:-'slowest'}

#
# Allowed dirty background ratio, in percent.  Once DIRTY_RATIO has been
# exceeded, the kernel will wake pdflush which will then reduce the amount
# of dirty memory to dirty_background_ratio.  Set this nice and low, so once
# some writeout has commenced, we do a lot of it.
#
DIRTY_BACKGROUND_RATIO=${DIRTY_BACKGROUND_RATIO:-'5'}

# kernel default dirty buffer age
DEF_AGE=${DEF_AGE:-'30'}
DEF_UPDATE=${DEF_UPDATE:-'5'}
DEF_DIRTY_BACKGROUND_RATIO=${DEF_DIRTY_BACKGROUND_RATIO:-'10'}
DEF_DIRTY_RATIO=${DEF_DIRTY_RATIO:-'40'}
DEF_XFS_AGE_BUFFER=${DEF_XFS_AGE_BUFFER:-'15'}
DEF_XFS_SYNC_INTERVAL=${DEF_XFS_SYNC_INTERVAL:-'30'}
DEF_XFS_BUFD_INTERVAL=${DEF_XFS_BUFD_INTERVAL:-'1'}

# This must be adjusted manually to the value of HZ in the running kernel
# on 2.4, until the XFS people change their 2.4 external interfaces to work in
# centisecs. This can be automated, but it's a work in progress that still needs
# some fixes. On 2.6 kernels, XFS uses USER_HZ instead of HZ for external
# interfaces, and that is currently always set to 100. So you don't need to
# change this on 2.6.
XFS_HZ=${XFS_HZ:-'100'}

#############################################################################

KLEVEL="$(uname -r |
             {
	       IFS='.' read a b c
	       echo $a.$b
	     }
)"
case "$KLEVEL" in
	"2.4"|"2.6")
		;;
	*)
		echo "Unhandled kernel version: $KLEVEL ('uname -r' = '$(uname -r)')" >&2
		exit 1
		;;
esac

if [ ! -e /proc/sys/vm/laptop_mode ] ; then
	echo "Kernel is not patched with laptop_mode patch." >&2
	exit 1
fi

if [ ! -w /proc/sys/vm/laptop_mode ] ; then
	echo "You do not have enough privileges to enable laptop_mode." >&2
	exit 1
fi

# Remove an option (the first parameter) of the form option=<number> from
# a mount options string (the rest of the parameters).
parse_mount_opts () {
	OPT="$1"
	shift
	echo ",$*," | sed		\
	 -e 's/,'"$OPT"'=[0-9]*,/,/g'	\
	 -e 's/,,*/,/g'			\
	 -e 's/^,//'			\
	 -e 's/,$//'
}

# Remove an option (the first parameter) without any arguments from
# a mount option string (the rest of the parameters).
parse_nonumber_mount_opts () {
	OPT="$1"
	shift
	echo ",$*," | sed		\
	 -e 's/,'"$OPT"',/,/g'		\
	 -e 's/,,*/,/g'			\
	 -e 's/^,//'			\
	 -e 's/,$//'
}

# Find out the state of a yes/no option (e.g. "atime"/"noatime") in
# fstab for a given filesystem, and use this state to replace the
# value of the option in another mount options string. The device
# is the first argument, the option name the second, and the default
# value the third. The remainder is the mount options string.
#
# Example:
# parse_yesno_opts_wfstab /dev/hda1 atime atime defaults,noatime
#
# If fstab contains, say, "rw" for this filesystem, then the result
# will be "defaults,atime".
parse_yesno_opts_wfstab () {
	L_DEV="$1"
	OPT="$2"
	DEF_OPT="$3"
	shift 3
	L_OPTS="$*"
	PARSEDOPTS1="$(parse_nonumber_mount_opts $OPT $L_OPTS)"
	PARSEDOPTS1="$(parse_nonumber_mount_opts no$OPT $PARSEDOPTS1)"
	# Watch for a default atime in fstab
	FSTAB_OPTS="$(awk '$1 == "'$L_DEV'" { print $4 }' /etc/fstab)"
	if echo "$FSTAB_OPTS" | grep "$OPT" > /dev/null ; then
		# option specified in fstab: extract the value and use it
		if echo "$FSTAB_OPTS" | grep "no$OPT" > /dev/null ; then
			echo "$PARSEDOPTS1,no$OPT"
		else
			# no$OPT not found -- so we must have $OPT.
			echo "$PARSEDOPTS1,$OPT"
		fi
	else
		# option not specified in fstab -- choose the default.
		echo "$PARSEDOPTS1,$DEF_OPT"
	fi
}

# Find out the state of a numbered option (e.g. "commit=NNN") in
# fstab for a given filesystem, and use this state to replace the
# value of the option in another mount options string. The device
# is the first argument, and the option name the second. The
# remainder is the mount options string in which the replacement
# must be done.
#
# Example:
# parse_mount_opts_wfstab /dev/hda1 commit defaults,commit=7
#
# If fstab contains, say, "commit=3,rw" for this filesystem, then the
# result will be "defaults,commit=3".
parse_mount_opts_wfstab () {
	L_DEV="$1"
	OPT="$2"
	shift 2
	L_OPTS="$*"
	PARSEDOPTS1="$(parse_mount_opts $OPT $L_OPTS)"
	# Watch for a default commit in fstab
	FSTAB_OPTS="$(awk '$1 == "'$L_DEV'" { print $4 }' /etc/fstab)"
	if echo "$FSTAB_OPTS" | grep "$OPT=" > /dev/null ; then
		# option specified in fstab: extract the value, and use it
		echo -n "$PARSEDOPTS1,$OPT="
		echo ",$FSTAB_OPTS," | sed \
		 -e 's/.*,'"$OPT"'=//'	\
		 -e 's/,.*//'
	else
		# option not specified in fstab: set it to 0
		echo "$PARSEDOPTS1,$OPT=0"
	fi
}

deduce_fstype () {
	MP="$1"
	# My root filesystem unfortunately has
	# type "unknown" in /etc/mtab. If we encounter
	# "unknown", we try to get the type from fstab.
	cat /etc/fstab |
	grep -v '^#' |
	while read FSTAB_DEV FSTAB_MP FSTAB_FST FSTAB_OPTS FSTAB_DUMP FSTAB_PASS ; do
		if [ "$FSTAB_MP" = "$MP" ]; then
			echo $FSTAB_FST
			exit 0
		fi
	done
}

if [ $DO_REMOUNT_NOATIME -eq 1 ] ; then
	NOATIME_OPT=",noatime"
fi

case "$1" in
	start)
		AGE=$((100*$MAX_AGE))
		XFS_AGE=$(($XFS_HZ*$MAX_AGE))
		echo -n "Starting laptop_mode"

		if [ -d /proc/sys/vm/pagebuf ] ; then
			# (For 2.4 and early 2.6.)
			# This only needs to be set, not reset -- it is only used when
			# laptop mode is enabled.
			echo $XFS_AGE > /proc/sys/vm/pagebuf/lm_flush_age
			echo $XFS_AGE > /proc/sys/fs/xfs/lm_sync_interval
		elif [ -f /proc/sys/fs/xfs/lm_age_buffer ] ; then
			# (A couple of early 2.6 laptop mode patches had these.)
			# The same goes for these.
			echo $XFS_AGE > /proc/sys/fs/xfs/lm_age_buffer
			echo $XFS_AGE > /proc/sys/fs/xfs/lm_sync_interval
		elif [ -f /proc/sys/fs/xfs/age_buffer ] ; then
			# (2.6.6)
			# But not for these -- they are also used in normal
			# operation.
			echo $XFS_AGE > /proc/sys/fs/xfs/age_buffer
			echo $XFS_AGE > /proc/sys/fs/xfs/sync_interval
		elif [ -f /proc/sys/fs/xfs/age_buffer_centisecs ] ; then
			# (2.6.7 upwards)
			# And not for these either. These are in centisecs,
			# not USER_HZ, so we have to use $AGE, not $XFS_AGE.
			echo $AGE > /proc/sys/fs/xfs/age_buffer_centisecs
			echo $AGE > /proc/sys/fs/xfs/xfssyncd_centisecs
			echo 3000 > /proc/sys/fs/xfs/xfsbufd_centisecs
		fi

		case "$KLEVEL" in
			"2.4")
				echo 1					> /proc/sys/vm/laptop_mode
				echo "30 500 0 0 $AGE $AGE 60 20 0"	> /proc/sys/vm/bdflush
				;;
			"2.6")
				echo 5					> /proc/sys/vm/laptop_mode
				echo "$AGE"				> /proc/sys/vm/dirty_writeback_centisecs
				echo "$AGE"				> /proc/sys/vm/dirty_expire_centisecs
				echo "$DIRTY_RATIO"			> /proc/sys/vm/dirty_ratio
				echo "$DIRTY_BACKGROUND_RATIO"		> /proc/sys/vm/dirty_background_ratio
				;;
		esac
		if [ $DO_REMOUNTS -eq 1 ]; then
			cat /etc/mtab | while read DEV MP FST OPTS DUMP PASS ; do
				PARSEDOPTS="$(parse_mount_opts "$OPTS")"
				if [ "$FST" = 'unknown' ]; then
					FST=$(deduce_fstype $MP)
				fi
				case "$FST" in
					"ext3"|"reiserfs")
						PARSEDOPTS="$(parse_mount_opts commit "$OPTS")"
						mount $DEV -t $FST $MP -o remount,$PARSEDOPTS,commit=$MAX_AGE$NOATIME_OPT
						;;
					"xfs")
						mount $DEV -t $FST $MP -o remount,$OPTS$NOATIME_OPT
						;;
				esac
				if [ -b $DEV ] ; then
					blockdev --setra $(($READAHEAD * 2)) $DEV
				fi
			done
		fi
		if [ $DO_HD -eq 1 ] ; then
			for THISHD in $HD ; do
				/sbin/hdparm -S $BATT_HD $THISHD > /dev/null 2>&1
				/sbin/hdparm -B 1 $THISHD > /dev/null 2>&1
			done
		fi
		if [ $DO_CPU -eq 1 -a -e /sys/devices/system/cpu/cpu0/cpufreq/cpuinfo_min_freq ]; then
			if [ $CPU_MAXFREQ = 'slowest' ]; then
				CPU_MAXFREQ=`cat /sys/devices/system/cpu/cpu0/cpufreq/cpuinfo_min_freq`
			fi
			echo $CPU_MAXFREQ > /sys/devices/system/cpu/cpu0/cpufreq/scaling_max_freq
		fi
		echo "."
		;;
	stop)
		U_AGE=$((100*$DEF_UPDATE))
		B_AGE=$((100*$DEF_AGE))
		echo -n "Stopping laptop_mode"
		echo 0 > /proc/sys/vm/laptop_mode
		if [ -f /proc/sys/fs/xfs/age_buffer -a ! -f /proc/sys/fs/xfs/lm_age_buffer ] ; then
			# These need to be restored, if there are no lm_*.
			echo $(($XFS_HZ*$DEF_XFS_AGE_BUFFER))	 	> /proc/sys/fs/xfs/age_buffer
			echo $(($XFS_HZ*$DEF_XFS_SYNC_INTERVAL)) 	> /proc/sys/fs/xfs/sync_interval
		elif [ -f /proc/sys/fs/xfs/age_buffer_centisecs ] ; then
			# These need to be restored as well.
			echo $((100*$DEF_XFS_AGE_BUFFER))	> /proc/sys/fs/xfs/age_buffer_centisecs
			echo $((100*$DEF_XFS_SYNC_INTERVAL))	> /proc/sys/fs/xfs/xfssyncd_centisecs
			echo $((100*$DEF_XFS_BUFD_INTERVAL))	> /proc/sys/fs/xfs/xfsbufd_centisecs
		fi
		case "$KLEVEL" in
			"2.4")
				echo "30 500 0 0 $U_AGE $B_AGE 60 20 0"	> /proc/sys/vm/bdflush
				;;
			"2.6")
				echo "$U_AGE"				> /proc/sys/vm/dirty_writeback_centisecs
				echo "$B_AGE"				> /proc/sys/vm/dirty_expire_centisecs
				echo "$DEF_DIRTY_RATIO"			> /proc/sys/vm/dirty_ratio
				echo "$DEF_DIRTY_BACKGROUND_RATIO"	> /proc/sys/vm/dirty_background_ratio
				;;
		esac
		if [ $DO_REMOUNTS -eq 1 ] ; then
			cat /etc/mtab | while read DEV MP FST OPTS DUMP PASS ; do
				# Reset commit and atime options to defaults.
				if [ "$FST" = 'unknown' ]; then
					FST=$(deduce_fstype $MP)
				fi
				case "$FST" in
					"ext3"|"reiserfs")
						PARSEDOPTS="$(parse_mount_opts_wfstab $DEV commit $OPTS)"
						PARSEDOPTS="$(parse_yesno_opts_wfstab $DEV atime atime $PARSEDOPTS)"
						mount $DEV -t $FST $MP -o remount,$PARSEDOPTS
						;;
					"xfs")
						PARSEDOPTS="$(parse_yesno_opts_wfstab $DEV atime atime $OPTS)"
						mount $DEV -t $FST $MP -o remount,$PARSEDOPTS
						;;
				esac
				if [ -b $DEV ] ; then
					blockdev --setra 256 $DEV
				fi
			done
		fi
		if [ $DO_HD -eq 1 ] ; then
			for THISHD in $HD ; do
				/sbin/hdparm -S $AC_HD $THISHD > /dev/null 2>&1
				/sbin/hdparm -B 255 $THISHD > /dev/null 2>&1
			done
		fi
		if [ $DO_CPU -eq 1 -a -e /sys/devices/system/cpu/cpu0/cpufreq/cpuinfo_min_freq ]; then
			echo `cat /sys/devices/system/cpu/cpu0/cpufreq/cpuinfo_max_freq` > /sys/devices/system/cpu/cpu0/cpufreq/scaling_max_freq
		fi
		echo "."
		;;
	*)
		echo "Usage: $0 {start|stop}" >&2
		exit 1
		;;

esac

exit 0
--------------------CONTROL SCRIPT END------------------------------------------
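
As a quick illustration of the option-stripping helpers in the control script
above, the following sketch repeats parse_mount_opts and
parse_nonumber_mount_opts and runs them on sample option strings. The sample
strings are made up for this example and do not come from a real /etc/mtab.

```shell
#!/bin/sh
# Copies of the two helpers from the control script.

# Remove an option of the form option=<number> from a mount options string.
parse_mount_opts () {
	OPT="$1"
	shift
	echo ",$*," | sed \
	 -e 's/,'"$OPT"'=[0-9]*,/,/g' \
	 -e 's/,,*/,/g' \
	 -e 's/^,//' \
	 -e 's/,$//'
}

# Remove an option without any arguments from a mount options string.
parse_nonumber_mount_opts () {
	OPT="$1"
	shift
	echo ",$*," | sed \
	 -e 's/,'"$OPT"',/,/g' \
	 -e 's/,,*/,/g' \
	 -e 's/^,//' \
	 -e 's/,$//'
}

# Sample option strings (illustrative only).
parse_mount_opts commit rw,commit=7,noatime            # prints: rw,noatime
parse_nonumber_mount_opts noatime rw,noatime,commit=7  # prints: rw,commit=7
```

Wrapping the string in leading and trailing commas before running sed lets a
single pattern match the option at the start, middle, or end of the list.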


ACPI integration
----------------

Dax Kelson submitted this so that the ACPI acpid daemon will
kick off the laptop_mode script and run hdparm. The part that
automatically disables laptop mode when the battery is low was
written by Jan Topinski.

-----------------/etc/acpi/events/ac_adapter BEGIN------------------------------
event=ac_adapter
action=/etc/acpi/actions/ac.sh %e
----------------/etc/acpi/events/ac_adapter END---------------------------------


-----------------/etc/acpi/events/battery BEGIN---------------------------------
event=battery.*
action=/etc/acpi/actions/battery.sh %e
----------------/etc/acpi/events/battery END------------------------------------


----------------/etc/acpi/actions/ac.sh BEGIN-----------------------------------
#!/bin/bash

# ac on/offline event handler

status=`awk '/^state: / { print $2 }' /proc/acpi/ac_adapter/$2/state`

case $status in
        "on-line")
                /sbin/laptop_mode stop
                exit 0
        ;;
        "off-line")
                /sbin/laptop_mode start
                exit 0
        ;;
esac
---------------------------/etc/acpi/actions/ac.sh END--------------------------
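
The state lookup in ac.sh can be tried without ACPI hardware. The sketch below
feeds the same awk expression a sample line in the format that ac.sh expects
from /proc/acpi/ac_adapter/*/state. The helper name ac_action and the sample
text are assumptions made for illustration, and the function only prints the
command that ac.sh would run instead of running it.

```shell
#!/bin/sh
# ac_action is a hypothetical helper: it reads adapter state text on stdin
# and prints the laptop_mode command that ac.sh would run for that state.
ac_action () {
	status=$(awk '/^state: / { print $2 }')
	case $status in
		"on-line")
			echo "/sbin/laptop_mode stop"
			;;
		"off-line")
			echo "/sbin/laptop_mode start"
			;;
	esac
}

# Sample state lines (illustrative only).
echo "state:                   off-line" | ac_action   # prints: /sbin/laptop_mode start
echo "state:                   on-line" | ac_action    # prints: /sbin/laptop_mode stop
```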


---------------------------/etc/acpi/actions/battery.sh BEGIN-------------------
#! /bin/bash

# Automatically disable laptop mode when the battery almost runs out.

BATT_INFO=/proc/acpi/battery/$2/state

if [[ -f /proc/sys/vm/laptop_mode ]]
then
   LM=`cat /proc/sys/vm/laptop_mode`
   if [[ $LM -gt 0 ]]
   then
     if [[ -f $BATT_INFO ]]
     then
        # Source the config file only now that we know we need it
        if [ -f /etc/default/laptop-mode ] ; then
                # Debian
                . /etc/default/laptop-mode
        elif [ -f /etc/sysconfig/laptop-mode ] ; then
                # Others
                . /etc/sysconfig/laptop-mode
        fi
        MINIMUM_BATTERY_MINUTES=${MINIMUM_BATTERY_MINUTES:-'10'}

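        ACTION="`cat $BATT_INFO | grep charging | cut -c 26-`"
        # Compare as strings; -eq would evaluate both sides arithmetically.
        if [[ "$ACTION" == "discharging" ]]
        then
           PRESENT_RATE=`cat $BATT_INFO | grep "present rate:" | sed  "s/.* \([0-9][0-9]* \).*/\1/" `
           REMAINING=`cat $BATT_INFO | grep "remaining capacity:" | sed  "s/.* \([0-9][0-9]* \).*/\1/" `
           # Only estimate the remaining minutes when the rate is known and
           # nonzero, so the division cannot fail.
           if (( ${PRESENT_RATE:-0} > 0 && ${REMAINING:-0} * 60 / ${PRESENT_RATE:-1} < $MINIMUM_BATTERY_MINUTES ))
           then
              /sbin/laptop_mode stop
           fi
        fi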
     else
       logger -p daemon.warning "You are using laptop mode and your battery interface $BATT_INFO is missing. This may lead to loss of data when the battery runs out. Check kernel ACPI support and /proc/acpi/battery folder, and edit /etc/acpi/actions/battery.sh to set BATT_INFO to the correct path."
     fi
   fi
fi
---------------------------/etc/acpi/actions/battery.sh END--------------------
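
The remaining-time estimate in battery.sh is the remaining capacity times 60,
divided by the present rate, which gives minutes. A minimal sketch of that
arithmetic on a made-up state file follows. The sample values, the temporary
file, and the helper name minutes_left are assumptions made for illustration;
the file layout mimics the /proc/acpi/battery/*/state lines that the script
parses.

```shell
#!/bin/sh
# minutes_left is a hypothetical helper: it takes the path of a battery state
# file and prints the estimated minutes of battery time left.
minutes_left () {
	rate=$(grep "present rate:" "$1" | sed "s/.* \([0-9][0-9]*\) .*/\1/")
	remaining=$(grep "remaining capacity:" "$1" | sed "s/.* \([0-9][0-9]*\) .*/\1/")
	# A zero or missing rate would make the division fail, so print nothing.
	[ "${rate:-0}" -gt 0 ] || return 1
	echo $(( $remaining * 60 / $rate ))
}

# Sample state file with invented values.
SAMPLE=$(mktemp)
cat > "$SAMPLE" <<'EOF'
present:                 yes
capacity state:          ok
charging state:          discharging
present rate:            1500 mA
remaining capacity:      300 mAh
present voltage:         11800 mV
EOF

minutes_left "$SAMPLE"    # 300 * 60 / 1500, prints: 12
rm -f "$SAMPLE"
```

With the default MINIMUM_BATTERY_MINUTES of 10, an estimate of 12 minutes
would leave laptop mode enabled.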


Monitoring tool
---------------

Bartek Kania submitted this tool. It measures how much time your disk spends
spun up and how much time it spends spun down.

---------------------------dslm.c BEGIN-----------------------------------------
/*
 * Simple Disk Sleep Monitor
 *  by Bartek Kania
 * Licenced under the GPL
 */
#include <unistd.h>
#include <stdlib.h>
#include <stdio.h>
#include <fcntl.h>
#include <errno.h>
#include <time.h>
#include <string.h>
#include <signal.h>
#include <sys/ioctl.h>
#include <linux/hdreg.h>

#ifdef DEBUG
#define D(x) x
#else
#define D(x)
#endif

/* Set from the SIGINT handler, so it must be safe to write asynchronously. */
volatile sig_atomic_t endit = 0;

/* Check if the disk is in powersave-mode
 * Most of the code is stolen from hdparm.
 * 1 = active, 0 = standby/sleep, -1 = unknown */
int check_powermode(int fd)
{
    unsigned char args[4] = {WIN_CHECKPOWERMODE1,0,0,0};
    int state;

    if (ioctl(fd, HDIO_DRIVE_CMD, &args)
	&& (args[0] = WIN_CHECKPOWERMODE2) /* try again with 0x98 */
	&& ioctl(fd, HDIO_DRIVE_CMD, &args)) {
	if (errno != EIO || args[0] != 0 || args[1] != 0) {
	    state = -1; /* "unknown"; */
	} else
	    state = 0; /* "sleeping"; */
    } else {
	state = (args[2] == 255) ? 1 : 0;
    }
    D(printf(" drive state is:  %d\n", state));

    return state;
}

char *state_name(int i)
{
    if (i == -1) return "unknown";
    if (i == 0) return "sleeping";
    if (i == 1) return "active";

    return "internal error";
}

char *myctime(time_t time)
{
    char *ts = ctime(&time);
    ts[strlen(ts) - 1] = 0;

    return ts;
}

void measure(int fd)
{
    time_t start_time;
    int last_state;
    time_t last_time;
    int curr_state;
    time_t curr_time = 0;
    time_t time_diff;
    time_t active_time = 0;
    time_t sleep_time = 0;
    time_t unknown_time = 0;
    time_t total_time = 0;
    int changes = 0;
    float tmp;

    printf("Starting measurements\n");

    last_state = check_powermode(fd);
    start_time = last_time = time(0);
    printf("  System is in state %s\n\n", state_name(last_state));

    while(!endit) {
	sleep(1);
	curr_state = check_powermode(fd);

	if (curr_state != last_state || endit) {
	    changes++;
	    curr_time = time(0);
	    time_diff = curr_time - last_time;

	    if (last_state == 1) active_time += time_diff;
	    else if (last_state == 0) sleep_time += time_diff;
	    else unknown_time += time_diff;

	    last_state = curr_state;
	    last_time = curr_time;

	    printf("%s: State-change to %s\n", myctime(curr_time),
		   state_name(curr_state));
	}
    }
    changes--; /* Compensate for SIGINT */

    total_time = time(0) - start_time;
    printf("\nTotal running time:  %lus\n", (unsigned long)total_time);
    printf(" State changed %d times\n", changes);

    tmp = (float)sleep_time / (float)total_time * 100;
    printf(" Time in sleep state:   %lus (%.2f%%)\n", sleep_time, tmp);
    tmp = (float)active_time / (float)total_time * 100;
    printf(" Time in active state:  %lus (%.2f%%)\n", active_time, tmp);
    tmp = (float)unknown_time / (float)total_time * 100;
    printf(" Time in unknown state: %lus (%.2f%%)\n", unknown_time, tmp);
}

void ender(int s)
{
    endit = 1;
}

void usage()
{
    puts("usage: dslm [-w <time>] <disk>");
    exit(0);
}

int main(int argc, char **argv)
{
    int fd;
    char *disk = 0;
    int settle_time = 60;

    /* Parse the simple command-line */
    if (argc == 2)
	disk = argv[1];
    else if (argc == 4) {
	settle_time = atoi(argv[2]);
	disk = argv[3];
    } else
	usage();

    /* open() returns -1 on failure; 0 is a valid file descriptor. */
    if ((fd = open(disk, O_RDONLY|O_NONBLOCK)) < 0) {
	printf("Can't open %s, because: %s\n", disk, strerror(errno));
	exit(-1);
    }

    if (settle_time) {
	printf("Waiting %d seconds for the system to settle down to "
	       "'normal'\n", settle_time);
	sleep(settle_time);
    } else
	puts("Not waiting for system to settle down");

    signal(SIGINT, ender);

    measure(fd);

    close(fd);

    return 0;
}
---------------------------dslm.c END-------------------------------------------