author     Linus Torvalds <torvalds@linux-foundation.org>    2016-07-26 20:29:07 -0400
committer  Linus Torvalds <torvalds@linux-foundation.org>    2016-07-26 20:29:07 -0400
commit     6453dbdda30428a3c56568c96fe70ea3612f07e2
tree       9a3c6087a2832c36e8c49296fb05f95b877e0111  /kernel/power
parent     27b79027bc112a63ad4004eb83c6acacae08a0de
parent     bc841e260c95608921809a2c7481cf6f03bec21a
Merge tag 'pm-4.8-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm
Pull power management updates from Rafael Wysocki:
 "Again, the majority of changes go into the cpufreq subsystem, but there
  are no big features this time. The cpufreq changes that stand out somewhat
  are the governor interface rework and improvements related to the handling
  of frequency tables. Apart from those, there are fixes and new device/CPU
  IDs in drivers, cleanups and an improvement of the new schedutil governor.

  Next, there are some changes in the hibernation core, including a fix for
  a nasty problem related to the MONITOR/MWAIT usage by CPU offline during
  resume from hibernation, a few core improvements related to memory
  management during resume, a couple of additional debug features and
  cleanups.

  Finally, we have some fixes and cleanups in the devfreq subsystem, generic
  power domains framework improvements related to system suspend/resume,
  support for some new chips in intel_idle and in the power capping RAPL
  driver, a new version of the AnalyzeSuspend utility and some assorted
  fixes and cleanups.

  Specifics:

   - Rework the cpufreq governor interface to make it more straightforward
     and modify the conservative governor to avoid using transition
     notifications (Rafael Wysocki).

   - Rework the handling of frequency tables by the cpufreq core to make it
     more efficient (Viresh Kumar).

   - Modify the schedutil governor to reduce the number of wakeups it causes
     to occur in cases when the CPU frequency doesn't need to be changed
     (Steve Muckle, Viresh Kumar).

   - Fix some minor issues and clean up code in the cpufreq core and
     governors (Rafael Wysocki, Viresh Kumar).

   - Add Intel Broxton support to the intel_pstate driver (Srinivas
     Pandruvada).

   - Fix problems related to the config TDP feature and to the validity of
     the MSR_HWP_INTERRUPT register in intel_pstate (Jan Kiszka, Srinivas
     Pandruvada).

   - Make intel_pstate update the cpu_frequency tracepoint even if the
     frequency doesn't change to avoid confusing powertop (Rafael Wysocki).

   - Clean up the usage of __init/__initdata in intel_pstate, mark some of
     its internal variables as __read_mostly and drop an unused structure
     element from it (Jisheng Zhang, Carsten Emde).

   - Clean up the usage of some duplicate MSR symbols in intel_pstate and
     turbostat (Srinivas Pandruvada).

   - Update/fix the powernv, s3c24xx and mvebu cpufreq drivers (Akshay
     Adiga, Viresh Kumar, Ben Dooks).

   - Fix a regression (introduced during the 4.5 cycle) in the pcc-cpufreq
     driver by reverting the problematic commit (Andreas Herrmann).

   - Add support for Intel Denverton to intel_idle, clean up Broxton support
     in it and make it explicitly non-modular (Jacob Pan, Jan Beulich, Paul
     Gortmaker).

   - Add support for Denverton and Ivy Bridge server to the Intel RAPL power
     capping driver and make it more careful about the handling of MSRs that
     may not be present (Jacob Pan, Xiaolong Wang).

   - Fix resume from hibernation on x86-64 by making the CPU offline during
     resume avoid using MONITOR/MWAIT in the "play dead" loop which may lead
     to an inadvertent "revival" of a "dead" CPU and a page fault leading to
     a kernel crash from it (Rafael Wysocki).

   - Make memory management during resume from hibernation more
     straightforward (Rafael Wysocki).

   - Add debug features that should help to detect problems related to
     hibernation and resume from it (Rafael Wysocki, Chen Yu).

   - Clean up hibernation core somewhat (Rafael Wysocki).

   - Prevent KASAN from instrumenting the hibernation core which leads to
     large numbers of false-positives from it (James Morse).

   - Prevent PM (hibernate and suspend) notifiers from being called during
     the cleanup phase if they have not been called during the corresponding
     preparation phase which is possible if one of the other notifiers
     returns an error at that time (Lianwei Wang).

   - Improve suspend-related debug printout in the tasks freezer and clean
     up suspend-related console handling (Roger Lu, Borislav Petkov).

   - Update the AnalyzeSuspend script in the kernel sources to version 4.2
     (Todd Brandt).

   - Modify the generic power domains framework to make it handle system
     suspend/resume better (Ulf Hansson).

   - Make the runtime PM framework avoid resuming devices synchronously when
     user space changes the runtime PM settings for them and improve its
     error reporting (Rafael Wysocki, Linus Walleij).

   - Fix error paths in devfreq drivers (exynos, exynos-ppmu, exynos-bus)
     and in the core, make some devfreq code explicitly non-modular and
     change some of it into tristate (Bartlomiej Zolnierkiewicz, Peter Chen,
     Paul Gortmaker).

   - Add DT support to the generic PM clocks management code and make it
     export some more symbols (Jon Hunter, Paul Gortmaker).

   - Make the PCI PM core code slightly more robust against possible driver
     errors (Andy Shevchenko).

   - Make it possible to change DESTDIR and PREFIX in turbostat (Andy
     Shevchenko)"

* tag 'pm-4.8-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm: (89 commits)
  Revert "cpufreq: pcc-cpufreq: update default value of cpuinfo_transition_latency"
  PM / hibernate: Introduce test_resume mode for hibernation
  cpufreq: export cpufreq_driver_resolve_freq()
  cpufreq: Disallow ->resolve_freq() for drivers providing ->target_index()
  PCI / PM: check all fields in pci_set_platform_pm()
  cpufreq: acpi-cpufreq: use cached frequency mapping when possible
  cpufreq: schedutil: map raw required frequency to driver frequency
  cpufreq: add cpufreq_driver_resolve_freq()
  cpufreq: intel_pstate: Check cpuid for MSR_HWP_INTERRUPT
  intel_pstate: Update cpu_frequency tracepoint every time
  cpufreq: intel_pstate: clean remnant struct element
  PM / tools: scripts: AnalyzeSuspend v4.2
  x86 / hibernate: Use hlt_play_dead() when resuming from hibernation
  cpufreq: powernv: Replacing pstate_id with frequency table index
  intel_pstate: Fix MSR_CONFIG_TDP_x addressing in core_get_max_pstate()
  PM / hibernate: Image data protection during restoration
  PM / hibernate: Add missing braces in __register_nosave_region()
  PM / hibernate: Clean up comments in snapshot.c
  PM / hibernate: Clean up function headers in snapshot.c
  PM / hibernate: Add missing braces in hibernate_setup()
  ...
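Editor's note: the notifier fix credited to Lianwei Wang above works by counting
how many callbacks actually ran during the preparation phase and then replaying
the cleanup event to exactly that many callbacks. A minimal sketch of the
pattern, distilled from the hibernate() changes in the hibernate.c diff below
(error handling trimmed; the helper and event names are exactly the ones added
by this merge):

    int error, nr_calls = 0;

    /* Preparation: passing -1 means "call every registered notifier";
     * nr_calls returns how many of them were actually invoked. */
    error = __pm_notifier_call_chain(PM_HIBERNATION_PREPARE, -1, &nr_calls);
    if (error)
            nr_calls--;     /* the callback that failed must not see the cleanup event */

    /* ... hibernation work ... */

    /* Cleanup: deliver the event only to the first nr_calls callbacks,
     * i.e. only to those that saw the preparation event. */
    __pm_notifier_call_chain(PM_POST_HIBERNATION, nr_calls, NULL);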
Diffstat (limited to 'kernel/power')
-rw-r--r--   kernel/power/Makefile      2
-rw-r--r--   kernel/power/console.c     8
-rw-r--r--   kernel/power/hibernate.c   101
-rw-r--r--   kernel/power/main.c        11
-rw-r--r--   kernel/power/power.h       11
-rw-r--r--   kernel/power/process.c     3
-rw-r--r--   kernel/power/snapshot.c    940
-rw-r--r--   kernel/power/suspend.c     10
-rw-r--r--   kernel/power/swap.c        6
-rw-r--r--   kernel/power/user.c        14
10 files changed, 632 insertions, 474 deletions
diff --git a/kernel/power/Makefile b/kernel/power/Makefile
index cb880a14cc39..eb4f717705ba 100644
--- a/kernel/power/Makefile
+++ b/kernel/power/Makefile
@@ -1,6 +1,8 @@
 
 ccflags-$(CONFIG_PM_DEBUG) := -DDEBUG
 
+KASAN_SANITIZE_snapshot.o := n
+
 obj-y += qos.o
 obj-$(CONFIG_PM) += main.o
 obj-$(CONFIG_VT_CONSOLE_SLEEP) += console.o
diff --git a/kernel/power/console.c b/kernel/power/console.c
index aba9c545a0e3..0e781798b0b3 100644
--- a/kernel/power/console.c
+++ b/kernel/power/console.c
@@ -126,17 +126,17 @@ out:
 	return ret;
 }
 
-int pm_prepare_console(void)
+void pm_prepare_console(void)
 {
 	if (!pm_vt_switch())
-		return 0;
+		return;
 
 	orig_fgconsole = vt_move_to_console(SUSPEND_CONSOLE, 1);
 	if (orig_fgconsole < 0)
-		return 1;
+		return;
 
 	orig_kmsg = vt_kmsg_redirect(SUSPEND_CONSOLE);
-	return 0;
+	return;
 }
 
 void pm_restore_console(void)
diff --git a/kernel/power/hibernate.c b/kernel/power/hibernate.c
index 9021387c6ff4..a881c6a7ba74 100644
--- a/kernel/power/hibernate.c
+++ b/kernel/power/hibernate.c
@@ -52,6 +52,7 @@ enum {
 #ifdef CONFIG_SUSPEND
 	HIBERNATION_SUSPEND,
 #endif
+	HIBERNATION_TEST_RESUME,
 	/* keep last */
 	__HIBERNATION_AFTER_LAST
 };
@@ -409,6 +410,11 @@ int hibernation_snapshot(int platform_mode)
 	goto Close;
 }
 
+int __weak hibernate_resume_nonboot_cpu_disable(void)
+{
+	return disable_nonboot_cpus();
+}
+
 /**
  * resume_target_kernel - Restore system state from a hibernation image.
  * @platform_mode: Whether or not to use the platform driver.
@@ -433,7 +439,7 @@ static int resume_target_kernel(bool platform_mode)
 	if (error)
 		goto Cleanup;
 
-	error = disable_nonboot_cpus();
+	error = hibernate_resume_nonboot_cpu_disable();
 	if (error)
 		goto Enable_cpus;
 
@@ -642,12 +648,39 @@ static void power_down(void)
 		cpu_relax();
 }
 
+static int load_image_and_restore(void)
+{
+	int error;
+	unsigned int flags;
+
+	pr_debug("PM: Loading hibernation image.\n");
+
+	lock_device_hotplug();
+	error = create_basic_memory_bitmaps();
+	if (error)
+		goto Unlock;
+
+	error = swsusp_read(&flags);
+	swsusp_close(FMODE_READ);
+	if (!error)
+		hibernation_restore(flags & SF_PLATFORM_MODE);
+
+	printk(KERN_ERR "PM: Failed to load hibernation image, recovering.\n");
+	swsusp_free();
+	free_basic_memory_bitmaps();
+ Unlock:
+	unlock_device_hotplug();
+
+	return error;
+}
+
 /**
  * hibernate - Carry out system hibernation, including saving the image.
  */
 int hibernate(void)
 {
-	int error;
+	int error, nr_calls = 0;
+	bool snapshot_test = false;
 
 	if (!hibernation_available()) {
 		pr_debug("PM: Hibernation not available.\n");
@@ -662,9 +695,11 @@ int hibernate(void)
 	}
 
 	pm_prepare_console();
-	error = pm_notifier_call_chain(PM_HIBERNATION_PREPARE);
-	if (error)
+	error = __pm_notifier_call_chain(PM_HIBERNATION_PREPARE, -1, &nr_calls);
+	if (error) {
+		nr_calls--;
 		goto Exit;
+	}
 
 	printk(KERN_INFO "PM: Syncing filesystems ... ");
 	sys_sync();
@@ -697,8 +732,12 @@ int hibernate(void)
 		pr_debug("PM: writing image.\n");
 		error = swsusp_write(flags);
 		swsusp_free();
-		if (!error)
-			power_down();
+		if (!error) {
+			if (hibernation_mode == HIBERNATION_TEST_RESUME)
+				snapshot_test = true;
+			else
+				power_down();
+		}
 		in_suspend = 0;
 		pm_restore_gfp_mask();
 	} else {
@@ -709,12 +748,18 @@ int hibernate(void)
 	free_basic_memory_bitmaps();
  Thaw:
 	unlock_device_hotplug();
+	if (snapshot_test) {
+		pr_debug("PM: Checking hibernation image\n");
+		error = swsusp_check();
+		if (!error)
+			error = load_image_and_restore();
+	}
 	thaw_processes();
 
 	/* Don't bother checking whether freezer_test_done is true */
 	freezer_test_done = false;
  Exit:
-	pm_notifier_call_chain(PM_POST_HIBERNATION);
+	__pm_notifier_call_chain(PM_POST_HIBERNATION, nr_calls, NULL);
 	pm_restore_console();
 	atomic_inc(&snapshot_device_available);
  Unlock:
@@ -740,8 +785,7 @@ int hibernate(void)
  */
 static int software_resume(void)
 {
-	int error;
-	unsigned int flags;
+	int error, nr_calls = 0;
 
 	/*
 	 * If the user said "noresume".. bail out early.
@@ -827,35 +871,20 @@ static int software_resume(void)
 	}
 
 	pm_prepare_console();
-	error = pm_notifier_call_chain(PM_RESTORE_PREPARE);
-	if (error)
+	error = __pm_notifier_call_chain(PM_RESTORE_PREPARE, -1, &nr_calls);
+	if (error) {
+		nr_calls--;
 		goto Close_Finish;
+	}
 
 	pr_debug("PM: Preparing processes for restore.\n");
 	error = freeze_processes();
 	if (error)
 		goto Close_Finish;
-
-	pr_debug("PM: Loading hibernation image.\n");
-
-	lock_device_hotplug();
-	error = create_basic_memory_bitmaps();
-	if (error)
-		goto Thaw;
-
-	error = swsusp_read(&flags);
-	swsusp_close(FMODE_READ);
-	if (!error)
-		hibernation_restore(flags & SF_PLATFORM_MODE);
-
-	printk(KERN_ERR "PM: Failed to load hibernation image, recovering.\n");
-	swsusp_free();
-	free_basic_memory_bitmaps();
- Thaw:
-	unlock_device_hotplug();
+	error = load_image_and_restore();
 	thaw_processes();
  Finish:
-	pm_notifier_call_chain(PM_POST_RESTORE);
+	__pm_notifier_call_chain(PM_POST_RESTORE, nr_calls, NULL);
 	pm_restore_console();
 	atomic_inc(&snapshot_device_available);
 	/* For success case, the suspend path will release the lock */
@@ -878,6 +907,7 @@ static const char * const hibernation_modes[] = {
 #ifdef CONFIG_SUSPEND
 	[HIBERNATION_SUSPEND] = "suspend",
 #endif
+	[HIBERNATION_TEST_RESUME] = "test_resume",
 };
 
 /*
@@ -924,6 +954,7 @@ static ssize_t disk_show(struct kobject *kobj, struct kobj_attribute *attr,
 #ifdef CONFIG_SUSPEND
 	case HIBERNATION_SUSPEND:
 #endif
+	case HIBERNATION_TEST_RESUME:
 		break;
 	case HIBERNATION_PLATFORM:
 		if (hibernation_ops)
@@ -970,6 +1001,7 @@ static ssize_t disk_store(struct kobject *kobj, struct kobj_attribute *attr,
 #ifdef CONFIG_SUSPEND
 	case HIBERNATION_SUSPEND:
 #endif
+	case HIBERNATION_TEST_RESUME:
 		hibernation_mode = mode;
 		break;
 	case HIBERNATION_PLATFORM:
@@ -1115,13 +1147,16 @@ static int __init resume_offset_setup(char *str)
 
 static int __init hibernate_setup(char *str)
 {
-	if (!strncmp(str, "noresume", 8))
+	if (!strncmp(str, "noresume", 8)) {
 		noresume = 1;
-	else if (!strncmp(str, "nocompress", 10))
+	} else if (!strncmp(str, "nocompress", 10)) {
 		nocompress = 1;
-	else if (!strncmp(str, "no", 2)) {
+	} else if (!strncmp(str, "no", 2)) {
 		noresume = 1;
 		nohibernate = 1;
+	} else if (IS_ENABLED(CONFIG_DEBUG_RODATA)
+		   && !strncmp(str, "protect_image", 13)) {
+		enable_restore_image_protection();
 	}
 	return 1;
 }
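Editor's note: with the hibernate.c changes above, selecting the new test_resume
mode makes hibernate() write the image and then immediately check and restore
from it (swsusp_check() followed by load_image_and_restore()) instead of
powering down, so the whole resume path can be exercised in one run. A small
user-space sketch of driving it through the standard sysfs power interface
(assumes root and CONFIG_HIBERNATION; the sysfs paths are the usual ones,
everything else here is illustrative):

    #include <stdio.h>

    /* Write a value to a sysfs attribute; returns 0 on success. */
    static int write_sysfs(const char *path, const char *val)
    {
            FILE *f = fopen(path, "w");

            if (!f)
                    return -1;
            fputs(val, f);
            return fclose(f);
    }

    int main(void)
    {
            /* Select the mode added by this merge, then trigger hibernation;
             * the kernel writes the image and immediately tries to load and
             * restore it instead of powering off. */
            if (write_sysfs("/sys/power/disk", "test_resume"))
                    return 1;
            return write_sysfs("/sys/power/state", "disk") ? 1 : 0;
    }

The related protect_image option parsed in hibernate_setup() above is a boot
parameter (hibernate=protect_image, effective only with CONFIG_DEBUG_RODATA);
it arms the set_memory_ro()-based protection of restored image pages
implemented in the snapshot.c diff further down.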
diff --git a/kernel/power/main.c b/kernel/power/main.c
index 27946975eff0..5ea50b1b7595 100644
--- a/kernel/power/main.c
+++ b/kernel/power/main.c
@@ -38,12 +38,19 @@ int unregister_pm_notifier(struct notifier_block *nb)
 }
 EXPORT_SYMBOL_GPL(unregister_pm_notifier);
 
-int pm_notifier_call_chain(unsigned long val)
+int __pm_notifier_call_chain(unsigned long val, int nr_to_call, int *nr_calls)
 {
-	int ret = blocking_notifier_call_chain(&pm_chain_head, val, NULL);
+	int ret;
+
+	ret = __blocking_notifier_call_chain(&pm_chain_head, val, NULL,
+					     nr_to_call, nr_calls);
 
 	return notifier_to_errno(ret);
 }
+int pm_notifier_call_chain(unsigned long val)
+{
+	return __pm_notifier_call_chain(val, -1, NULL);
+}
 
 /* If set, devices may be suspended and resumed asynchronously. */
 int pm_async_enabled = 1;
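Editor's note: for notifier users, the visible effect of the main.c change
above is a guarantee rather than a new API: a callback now receives
PM_POST_HIBERNATION (or PM_POST_SUSPEND/PM_POST_RESTORE) only if it previously
received the matching *_PREPARE event. A sketch of a consumer relying on that
pairing; my_pm_notifier and the resource handling are illustrative, not part
of the patch:

    #include <linux/notifier.h>
    #include <linux/suspend.h>

    /* Illustrative PM notifier: the names here are hypothetical. */
    static int my_pm_notifier(struct notifier_block *nb, unsigned long event,
                              void *unused)
    {
            switch (event) {
            case PM_HIBERNATION_PREPARE:
                    /* acquire resources / quiesce the device */
                    break;
            case PM_POST_HIBERNATION:
                    /* release unconditionally: after this fix, POST is only
                     * delivered to callbacks that saw the PREPARE event */
                    break;
            }
            return NOTIFY_OK;
    }

    static struct notifier_block my_pm_nb = {
            .notifier_call = my_pm_notifier,
    };

    /* e.g. from module init: register_pm_notifier(&my_pm_nb); */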
diff --git a/kernel/power/power.h b/kernel/power/power.h
index efe1b3b17c88..242d8b827dd5 100644
--- a/kernel/power/power.h
+++ b/kernel/power/power.h
@@ -38,6 +38,8 @@ static inline char *check_image_kernel(struct swsusp_info *info)
 }
 #endif /* CONFIG_ARCH_HIBERNATION_HEADER */
 
+extern int hibernate_resume_nonboot_cpu_disable(void);
+
 /*
  * Keep some memory free so that I/O operations can succeed without paging
  * [Might this be more than 4 MB?]
@@ -59,6 +61,13 @@ extern int hibernation_snapshot(int platform_mode);
 extern int hibernation_restore(int platform_mode);
 extern int hibernation_platform_enter(void);
 
+#ifdef CONFIG_DEBUG_RODATA
+/* kernel/power/snapshot.c */
+extern void enable_restore_image_protection(void);
+#else
+static inline void enable_restore_image_protection(void) {}
+#endif /* CONFIG_DEBUG_RODATA */
+
 #else /* !CONFIG_HIBERNATION */
 
 static inline void hibernate_reserved_size_init(void) {}
@@ -200,6 +209,8 @@ static inline void suspend_test_finish(const char *label) {}
 
 #ifdef CONFIG_PM_SLEEP
 /* kernel/power/main.c */
+extern int __pm_notifier_call_chain(unsigned long val, int nr_to_call,
+				    int *nr_calls);
 extern int pm_notifier_call_chain(unsigned long val);
 #endif
 
diff --git a/kernel/power/process.c b/kernel/power/process.c
index 0c2ee9761d57..8f27d5a8adf6 100644
--- a/kernel/power/process.c
+++ b/kernel/power/process.c
@@ -89,6 +89,9 @@ static int try_to_freeze_tasks(bool user_only)
 		       elapsed_msecs / 1000, elapsed_msecs % 1000,
 		       todo - wq_busy, wq_busy);
 
+		if (wq_busy)
+			show_workqueue_state();
+
 		if (!wakeup) {
 			read_lock(&tasklist_lock);
 			for_each_process_thread(g, p) {
diff --git a/kernel/power/snapshot.c b/kernel/power/snapshot.c
index 3a970604308f..d90df926b59f 100644
--- a/kernel/power/snapshot.c
+++ b/kernel/power/snapshot.c
@@ -38,6 +38,43 @@
38 38
39#include "power.h" 39#include "power.h"
40 40
41#ifdef CONFIG_DEBUG_RODATA
42static bool hibernate_restore_protection;
43static bool hibernate_restore_protection_active;
44
45void enable_restore_image_protection(void)
46{
47 hibernate_restore_protection = true;
48}
49
50static inline void hibernate_restore_protection_begin(void)
51{
52 hibernate_restore_protection_active = hibernate_restore_protection;
53}
54
55static inline void hibernate_restore_protection_end(void)
56{
57 hibernate_restore_protection_active = false;
58}
59
60static inline void hibernate_restore_protect_page(void *page_address)
61{
62 if (hibernate_restore_protection_active)
63 set_memory_ro((unsigned long)page_address, 1);
64}
65
66static inline void hibernate_restore_unprotect_page(void *page_address)
67{
68 if (hibernate_restore_protection_active)
69 set_memory_rw((unsigned long)page_address, 1);
70}
71#else
72static inline void hibernate_restore_protection_begin(void) {}
73static inline void hibernate_restore_protection_end(void) {}
74static inline void hibernate_restore_protect_page(void *page_address) {}
75static inline void hibernate_restore_unprotect_page(void *page_address) {}
76#endif /* CONFIG_DEBUG_RODATA */
77
41static int swsusp_page_is_free(struct page *); 78static int swsusp_page_is_free(struct page *);
42static void swsusp_set_page_forbidden(struct page *); 79static void swsusp_set_page_forbidden(struct page *);
43static void swsusp_unset_page_forbidden(struct page *); 80static void swsusp_unset_page_forbidden(struct page *);
@@ -67,25 +104,32 @@ void __init hibernate_image_size_init(void)
67 image_size = ((totalram_pages * 2) / 5) * PAGE_SIZE; 104 image_size = ((totalram_pages * 2) / 5) * PAGE_SIZE;
68} 105}
69 106
70/* List of PBEs needed for restoring the pages that were allocated before 107/*
108 * List of PBEs needed for restoring the pages that were allocated before
71 * the suspend and included in the suspend image, but have also been 109 * the suspend and included in the suspend image, but have also been
72 * allocated by the "resume" kernel, so their contents cannot be written 110 * allocated by the "resume" kernel, so their contents cannot be written
73 * directly to their "original" page frames. 111 * directly to their "original" page frames.
74 */ 112 */
75struct pbe *restore_pblist; 113struct pbe *restore_pblist;
76 114
77/* Pointer to an auxiliary buffer (1 page) */ 115/* struct linked_page is used to build chains of pages */
78static void *buffer;
79 116
80/** 117#define LINKED_PAGE_DATA_SIZE (PAGE_SIZE - sizeof(void *))
81 * @safe_needed - on resume, for storing the PBE list and the image, 118
82 * we can only use memory pages that do not conflict with the pages 119struct linked_page {
83 * used before suspend. The unsafe pages have PageNosaveFree set 120 struct linked_page *next;
84 * and we count them using unsafe_pages. 121 char data[LINKED_PAGE_DATA_SIZE];
85 * 122} __packed;
86 * Each allocated image page is marked as PageNosave and PageNosaveFree 123
87 * so that swsusp_free() can release it. 124/*
125 * List of "safe" pages (ie. pages that were not used by the image kernel
126 * before hibernation) that may be used as temporary storage for image kernel
127 * memory contents.
88 */ 128 */
129static struct linked_page *safe_pages_list;
130
131/* Pointer to an auxiliary buffer (1 page) */
132static void *buffer;
89 133
90#define PG_ANY 0 134#define PG_ANY 0
91#define PG_SAFE 1 135#define PG_SAFE 1
@@ -94,6 +138,19 @@ static void *buffer;
94 138
95static unsigned int allocated_unsafe_pages; 139static unsigned int allocated_unsafe_pages;
96 140
141/**
142 * get_image_page - Allocate a page for a hibernation image.
143 * @gfp_mask: GFP mask for the allocation.
144 * @safe_needed: Get pages that were not used before hibernation (restore only)
145 *
146 * During image restoration, for storing the PBE list and the image data, we can
147 * only use memory pages that do not conflict with the pages used before
148 * hibernation. The "unsafe" pages have PageNosaveFree set and we count them
149 * using allocated_unsafe_pages.
150 *
151 * Each allocated image page is marked as PageNosave and PageNosaveFree so that
152 * swsusp_free() can release it.
153 */
97static void *get_image_page(gfp_t gfp_mask, int safe_needed) 154static void *get_image_page(gfp_t gfp_mask, int safe_needed)
98{ 155{
99 void *res; 156 void *res;
@@ -113,9 +170,21 @@ static void *get_image_page(gfp_t gfp_mask, int safe_needed)
113 return res; 170 return res;
114} 171}
115 172
173static void *__get_safe_page(gfp_t gfp_mask)
174{
175 if (safe_pages_list) {
176 void *ret = safe_pages_list;
177
178 safe_pages_list = safe_pages_list->next;
179 memset(ret, 0, PAGE_SIZE);
180 return ret;
181 }
182 return get_image_page(gfp_mask, PG_SAFE);
183}
184
116unsigned long get_safe_page(gfp_t gfp_mask) 185unsigned long get_safe_page(gfp_t gfp_mask)
117{ 186{
118 return (unsigned long)get_image_page(gfp_mask, PG_SAFE); 187 return (unsigned long)__get_safe_page(gfp_mask);
119} 188}
120 189
121static struct page *alloc_image_page(gfp_t gfp_mask) 190static struct page *alloc_image_page(gfp_t gfp_mask)
@@ -130,11 +199,22 @@ static struct page *alloc_image_page(gfp_t gfp_mask)
130 return page; 199 return page;
131} 200}
132 201
202static void recycle_safe_page(void *page_address)
203{
204 struct linked_page *lp = page_address;
205
206 lp->next = safe_pages_list;
207 safe_pages_list = lp;
208}
209
133/** 210/**
134 * free_image_page - free page represented by @addr, allocated with 211 * free_image_page - Free a page allocated for hibernation image.
135 * get_image_page (page flags set by it must be cleared) 212 * @addr: Address of the page to free.
213 * @clear_nosave_free: If set, clear the PageNosaveFree bit for the page.
214 *
215 * The page to free should have been allocated by get_image_page() (page flags
216 * set by it are affected).
136 */ 217 */
137
138static inline void free_image_page(void *addr, int clear_nosave_free) 218static inline void free_image_page(void *addr, int clear_nosave_free)
139{ 219{
140 struct page *page; 220 struct page *page;
@@ -150,17 +230,8 @@ static inline void free_image_page(void *addr, int clear_nosave_free)
150 __free_page(page); 230 __free_page(page);
151} 231}
152 232
153/* struct linked_page is used to build chains of pages */ 233static inline void free_list_of_pages(struct linked_page *list,
154 234 int clear_page_nosave)
155#define LINKED_PAGE_DATA_SIZE (PAGE_SIZE - sizeof(void *))
156
157struct linked_page {
158 struct linked_page *next;
159 char data[LINKED_PAGE_DATA_SIZE];
160} __packed;
161
162static inline void
163free_list_of_pages(struct linked_page *list, int clear_page_nosave)
164{ 235{
165 while (list) { 236 while (list) {
166 struct linked_page *lp = list->next; 237 struct linked_page *lp = list->next;
@@ -170,30 +241,28 @@ free_list_of_pages(struct linked_page *list, int clear_page_nosave)
170 } 241 }
171} 242}
172 243
173/** 244/*
174 * struct chain_allocator is used for allocating small objects out of 245 * struct chain_allocator is used for allocating small objects out of
175 * a linked list of pages called 'the chain'. 246 * a linked list of pages called 'the chain'.
176 * 247 *
177 * The chain grows each time when there is no room for a new object in 248 * The chain grows each time when there is no room for a new object in
178 * the current page. The allocated objects cannot be freed individually. 249 * the current page. The allocated objects cannot be freed individually.
179 * It is only possible to free them all at once, by freeing the entire 250 * It is only possible to free them all at once, by freeing the entire
180 * chain. 251 * chain.
181 * 252 *
182 * NOTE: The chain allocator may be inefficient if the allocated objects 253 * NOTE: The chain allocator may be inefficient if the allocated objects
183 * are not much smaller than PAGE_SIZE. 254 * are not much smaller than PAGE_SIZE.
184 */ 255 */
185
186struct chain_allocator { 256struct chain_allocator {
187 struct linked_page *chain; /* the chain */ 257 struct linked_page *chain; /* the chain */
188 unsigned int used_space; /* total size of objects allocated out 258 unsigned int used_space; /* total size of objects allocated out
189 * of the current page 259 of the current page */
190 */
191 gfp_t gfp_mask; /* mask for allocating pages */ 260 gfp_t gfp_mask; /* mask for allocating pages */
192 int safe_needed; /* if set, only "safe" pages are allocated */ 261 int safe_needed; /* if set, only "safe" pages are allocated */
193}; 262};
194 263
195static void 264static void chain_init(struct chain_allocator *ca, gfp_t gfp_mask,
196chain_init(struct chain_allocator *ca, gfp_t gfp_mask, int safe_needed) 265 int safe_needed)
197{ 266{
198 ca->chain = NULL; 267 ca->chain = NULL;
199 ca->used_space = LINKED_PAGE_DATA_SIZE; 268 ca->used_space = LINKED_PAGE_DATA_SIZE;
@@ -208,7 +277,8 @@ static void *chain_alloc(struct chain_allocator *ca, unsigned int size)
208 if (LINKED_PAGE_DATA_SIZE - ca->used_space < size) { 277 if (LINKED_PAGE_DATA_SIZE - ca->used_space < size) {
209 struct linked_page *lp; 278 struct linked_page *lp;
210 279
211 lp = get_image_page(ca->gfp_mask, ca->safe_needed); 280 lp = ca->safe_needed ? __get_safe_page(ca->gfp_mask) :
281 get_image_page(ca->gfp_mask, PG_ANY);
212 if (!lp) 282 if (!lp)
213 return NULL; 283 return NULL;
214 284
@@ -222,44 +292,44 @@ static void *chain_alloc(struct chain_allocator *ca, unsigned int size)
222} 292}
223 293
224/** 294/**
225 * Data types related to memory bitmaps. 295 * Data types related to memory bitmaps.
226 * 296 *
227 * Memory bitmap is a structure consiting of many linked lists of 297 * Memory bitmap is a structure consiting of many linked lists of
228 * objects. The main list's elements are of type struct zone_bitmap 298 * objects. The main list's elements are of type struct zone_bitmap
229 * and each of them corresonds to one zone. For each zone bitmap 299 * and each of them corresonds to one zone. For each zone bitmap
230 * object there is a list of objects of type struct bm_block that 300 * object there is a list of objects of type struct bm_block that
231 * represent each blocks of bitmap in which information is stored. 301 * represent each blocks of bitmap in which information is stored.
232 * 302 *
233 * struct memory_bitmap contains a pointer to the main list of zone 303 * struct memory_bitmap contains a pointer to the main list of zone
234 * bitmap objects, a struct bm_position used for browsing the bitmap, 304 * bitmap objects, a struct bm_position used for browsing the bitmap,
235 * and a pointer to the list of pages used for allocating all of the 305 * and a pointer to the list of pages used for allocating all of the
236 * zone bitmap objects and bitmap block objects. 306 * zone bitmap objects and bitmap block objects.
237 * 307 *
238 * NOTE: It has to be possible to lay out the bitmap in memory 308 * NOTE: It has to be possible to lay out the bitmap in memory
239 * using only allocations of order 0. Additionally, the bitmap is 309 * using only allocations of order 0. Additionally, the bitmap is
240 * designed to work with arbitrary number of zones (this is over the 310 * designed to work with arbitrary number of zones (this is over the
241 * top for now, but let's avoid making unnecessary assumptions ;-). 311 * top for now, but let's avoid making unnecessary assumptions ;-).
242 * 312 *
243 * struct zone_bitmap contains a pointer to a list of bitmap block 313 * struct zone_bitmap contains a pointer to a list of bitmap block
244 * objects and a pointer to the bitmap block object that has been 314 * objects and a pointer to the bitmap block object that has been
245 * most recently used for setting bits. Additionally, it contains the 315 * most recently used for setting bits. Additionally, it contains the
246 * pfns that correspond to the start and end of the represented zone. 316 * PFNs that correspond to the start and end of the represented zone.
247 * 317 *
248 * struct bm_block contains a pointer to the memory page in which 318 * struct bm_block contains a pointer to the memory page in which
249 * information is stored (in the form of a block of bitmap) 319 * information is stored (in the form of a block of bitmap)
250 * It also contains the pfns that correspond to the start and end of 320 * It also contains the pfns that correspond to the start and end of
251 * the represented memory area. 321 * the represented memory area.
252 * 322 *
253 * The memory bitmap is organized as a radix tree to guarantee fast random 323 * The memory bitmap is organized as a radix tree to guarantee fast random
254 * access to the bits. There is one radix tree for each zone (as returned 324 * access to the bits. There is one radix tree for each zone (as returned
255 * from create_mem_extents). 325 * from create_mem_extents).
256 * 326 *
257 * One radix tree is represented by one struct mem_zone_bm_rtree. There are 327 * One radix tree is represented by one struct mem_zone_bm_rtree. There are
258 * two linked lists for the nodes of the tree, one for the inner nodes and 328 * two linked lists for the nodes of the tree, one for the inner nodes and
259 * one for the leave nodes. The linked leave nodes are used for fast linear 329 * one for the leave nodes. The linked leave nodes are used for fast linear
260 * access of the memory bitmap. 330 * access of the memory bitmap.
261 * 331 *
262 * The struct rtree_node represents one node of the radix tree. 332 * The struct rtree_node represents one node of the radix tree.
263 */ 333 */
264 334
265#define BM_END_OF_MAP (~0UL) 335#define BM_END_OF_MAP (~0UL)
@@ -305,9 +375,8 @@ struct bm_position {
305struct memory_bitmap { 375struct memory_bitmap {
306 struct list_head zones; 376 struct list_head zones;
307 struct linked_page *p_list; /* list of pages used to store zone 377 struct linked_page *p_list; /* list of pages used to store zone
308 * bitmap objects and bitmap block 378 bitmap objects and bitmap block
309 * objects 379 objects */
310 */
311 struct bm_position cur; /* most recently used bit position */ 380 struct bm_position cur; /* most recently used bit position */
312}; 381};
313 382
@@ -321,12 +390,12 @@ struct memory_bitmap {
321#endif 390#endif
322#define BM_RTREE_LEVEL_MASK ((1UL << BM_RTREE_LEVEL_SHIFT) - 1) 391#define BM_RTREE_LEVEL_MASK ((1UL << BM_RTREE_LEVEL_SHIFT) - 1)
323 392
324/* 393/**
325 * alloc_rtree_node - Allocate a new node and add it to the radix tree. 394 * alloc_rtree_node - Allocate a new node and add it to the radix tree.
326 * 395 *
327 * This function is used to allocate inner nodes as well as the 396 * This function is used to allocate inner nodes as well as the
328 * leave nodes of the radix tree. It also adds the node to the 397 * leave nodes of the radix tree. It also adds the node to the
329 * corresponding linked list passed in by the *list parameter. 398 * corresponding linked list passed in by the *list parameter.
330 */ 399 */
331static struct rtree_node *alloc_rtree_node(gfp_t gfp_mask, int safe_needed, 400static struct rtree_node *alloc_rtree_node(gfp_t gfp_mask, int safe_needed,
332 struct chain_allocator *ca, 401 struct chain_allocator *ca,
@@ -347,12 +416,12 @@ static struct rtree_node *alloc_rtree_node(gfp_t gfp_mask, int safe_needed,
347 return node; 416 return node;
348} 417}
349 418
350/* 419/**
351 * add_rtree_block - Add a new leave node to the radix tree 420 * add_rtree_block - Add a new leave node to the radix tree.
352 * 421 *
353 * The leave nodes need to be allocated in order to keep the leaves 422 * The leave nodes need to be allocated in order to keep the leaves
354 * linked list in order. This is guaranteed by the zone->blocks 423 * linked list in order. This is guaranteed by the zone->blocks
355 * counter. 424 * counter.
356 */ 425 */
357static int add_rtree_block(struct mem_zone_bm_rtree *zone, gfp_t gfp_mask, 426static int add_rtree_block(struct mem_zone_bm_rtree *zone, gfp_t gfp_mask,
358 int safe_needed, struct chain_allocator *ca) 427 int safe_needed, struct chain_allocator *ca)
@@ -417,17 +486,18 @@ static int add_rtree_block(struct mem_zone_bm_rtree *zone, gfp_t gfp_mask,
417static void free_zone_bm_rtree(struct mem_zone_bm_rtree *zone, 486static void free_zone_bm_rtree(struct mem_zone_bm_rtree *zone,
418 int clear_nosave_free); 487 int clear_nosave_free);
419 488
420/* 489/**
421 * create_zone_bm_rtree - create a radix tree for one zone 490 * create_zone_bm_rtree - Create a radix tree for one zone.
422 * 491 *
423 * Allocated the mem_zone_bm_rtree structure and initializes it. 492 * Allocated the mem_zone_bm_rtree structure and initializes it.
424 * This function also allocated and builds the radix tree for the 493 * This function also allocated and builds the radix tree for the
425 * zone. 494 * zone.
426 */ 495 */
427static struct mem_zone_bm_rtree * 496static struct mem_zone_bm_rtree *create_zone_bm_rtree(gfp_t gfp_mask,
428create_zone_bm_rtree(gfp_t gfp_mask, int safe_needed, 497 int safe_needed,
429 struct chain_allocator *ca, 498 struct chain_allocator *ca,
430 unsigned long start, unsigned long end) 499 unsigned long start,
500 unsigned long end)
431{ 501{
432 struct mem_zone_bm_rtree *zone; 502 struct mem_zone_bm_rtree *zone;
433 unsigned int i, nr_blocks; 503 unsigned int i, nr_blocks;
@@ -454,12 +524,12 @@ create_zone_bm_rtree(gfp_t gfp_mask, int safe_needed,
454 return zone; 524 return zone;
455} 525}
456 526
457/* 527/**
458 * free_zone_bm_rtree - Free the memory of the radix tree 528 * free_zone_bm_rtree - Free the memory of the radix tree.
459 * 529 *
460 * Free all node pages of the radix tree. The mem_zone_bm_rtree 530 * Free all node pages of the radix tree. The mem_zone_bm_rtree
461 * structure itself is not freed here nor are the rtree_node 531 * structure itself is not freed here nor are the rtree_node
462 * structs. 532 * structs.
463 */ 533 */
464static void free_zone_bm_rtree(struct mem_zone_bm_rtree *zone, 534static void free_zone_bm_rtree(struct mem_zone_bm_rtree *zone,
465 int clear_nosave_free) 535 int clear_nosave_free)
@@ -492,8 +562,8 @@ struct mem_extent {
492}; 562};
493 563
494/** 564/**
495 * free_mem_extents - free a list of memory extents 565 * free_mem_extents - Free a list of memory extents.
496 * @list - list of extents to empty 566 * @list: List of extents to free.
497 */ 567 */
498static void free_mem_extents(struct list_head *list) 568static void free_mem_extents(struct list_head *list)
499{ 569{
@@ -506,10 +576,11 @@ static void free_mem_extents(struct list_head *list)
506} 576}
507 577
508/** 578/**
509 * create_mem_extents - create a list of memory extents representing 579 * create_mem_extents - Create a list of memory extents.
510 * contiguous ranges of PFNs 580 * @list: List to put the extents into.
511 * @list - list to put the extents into 581 * @gfp_mask: Mask to use for memory allocations.
512 * @gfp_mask - mask to use for memory allocations 582 *
583 * The extents represent contiguous ranges of PFNs.
513 */ 584 */
514static int create_mem_extents(struct list_head *list, gfp_t gfp_mask) 585static int create_mem_extents(struct list_head *list, gfp_t gfp_mask)
515{ 586{
@@ -565,10 +636,10 @@ static int create_mem_extents(struct list_head *list, gfp_t gfp_mask)
565} 636}
566 637
567/** 638/**
568 * memory_bm_create - allocate memory for a memory bitmap 639 * memory_bm_create - Allocate memory for a memory bitmap.
569 */ 640 */
570static int 641static int memory_bm_create(struct memory_bitmap *bm, gfp_t gfp_mask,
571memory_bm_create(struct memory_bitmap *bm, gfp_t gfp_mask, int safe_needed) 642 int safe_needed)
572{ 643{
573 struct chain_allocator ca; 644 struct chain_allocator ca;
574 struct list_head mem_extents; 645 struct list_head mem_extents;
@@ -607,8 +678,9 @@ memory_bm_create(struct memory_bitmap *bm, gfp_t gfp_mask, int safe_needed)
607} 678}
608 679
609/** 680/**
610 * memory_bm_free - free memory occupied by the memory bitmap @bm 681 * memory_bm_free - Free memory occupied by the memory bitmap.
611 */ 682 * @bm: Memory bitmap.
683 */
612static void memory_bm_free(struct memory_bitmap *bm, int clear_nosave_free) 684static void memory_bm_free(struct memory_bitmap *bm, int clear_nosave_free)
613{ 685{
614 struct mem_zone_bm_rtree *zone; 686 struct mem_zone_bm_rtree *zone;
@@ -622,14 +694,13 @@ static void memory_bm_free(struct memory_bitmap *bm, int clear_nosave_free)
622} 694}
623 695
624/** 696/**
625 * memory_bm_find_bit - Find the bit for pfn in the memory 697 * memory_bm_find_bit - Find the bit for a given PFN in a memory bitmap.
626 * bitmap
627 * 698 *
628 * Find the bit in the bitmap @bm that corresponds to given pfn. 699 * Find the bit in memory bitmap @bm that corresponds to the given PFN.
629 * The cur.zone, cur.block and cur.node_pfn member of @bm are 700 * The cur.zone, cur.block and cur.node_pfn members of @bm are updated.
630 * updated. 701 *
631 * It walks the radix tree to find the page which contains the bit for 702 * Walk the radix tree to find the page containing the bit that represents @pfn
632 * pfn and returns the bit position in **addr and *bit_nr. 703 * and return the position of the bit in @addr and @bit_nr.
633 */ 704 */
634static int memory_bm_find_bit(struct memory_bitmap *bm, unsigned long pfn, 705static int memory_bm_find_bit(struct memory_bitmap *bm, unsigned long pfn,
635 void **addr, unsigned int *bit_nr) 706 void **addr, unsigned int *bit_nr)
@@ -658,10 +729,9 @@ static int memory_bm_find_bit(struct memory_bitmap *bm, unsigned long pfn,
658 729
659zone_found: 730zone_found:
660 /* 731 /*
661 * We have a zone. Now walk the radix tree to find the leave 732 * We have found the zone. Now walk the radix tree to find the leaf node
662 * node for our pfn. 733 * for our PFN.
663 */ 734 */
664
665 node = bm->cur.node; 735 node = bm->cur.node;
666 if (((pfn - zone->start_pfn) & ~BM_BLOCK_MASK) == bm->cur.node_pfn) 736 if (((pfn - zone->start_pfn) & ~BM_BLOCK_MASK) == bm->cur.node_pfn)
667 goto node_found; 737 goto node_found;
@@ -754,14 +824,14 @@ static bool memory_bm_pfn_present(struct memory_bitmap *bm, unsigned long pfn)
754} 824}
755 825
756/* 826/*
757 * rtree_next_node - Jumps to the next leave node 827 * rtree_next_node - Jump to the next leaf node.
758 * 828 *
759 * Sets the position to the beginning of the next node in the 829 * Set the position to the beginning of the next node in the
760 * memory bitmap. This is either the next node in the current 830 * memory bitmap. This is either the next node in the current
761 * zone's radix tree or the first node in the radix tree of the 831 * zone's radix tree or the first node in the radix tree of the
762 * next zone. 832 * next zone.
763 * 833 *
764 * Returns true if there is a next node, false otherwise. 834 * Return true if there is a next node, false otherwise.
765 */ 835 */
766static bool rtree_next_node(struct memory_bitmap *bm) 836static bool rtree_next_node(struct memory_bitmap *bm)
767{ 837{
@@ -790,14 +860,15 @@ static bool rtree_next_node(struct memory_bitmap *bm)
790} 860}
791 861
792/** 862/**
793 * memory_bm_rtree_next_pfn - Find the next set bit in the bitmap @bm 863 * memory_bm_rtree_next_pfn - Find the next set bit in a memory bitmap.
864 * @bm: Memory bitmap.
794 * 865 *
795 * Starting from the last returned position this function searches 866 * Starting from the last returned position this function searches for the next
796 * for the next set bit in the memory bitmap and returns its 867 * set bit in @bm and returns the PFN represented by it. If no more bits are
797 * number. If no more bit is set BM_END_OF_MAP is returned. 868 * set, BM_END_OF_MAP is returned.
798 * 869 *
799 * It is required to run memory_bm_position_reset() before the 870 * It is required to run memory_bm_position_reset() before the first call to
800 * first call to this function. 871 * this function for the given memory bitmap.
801 */ 872 */
802static unsigned long memory_bm_next_pfn(struct memory_bitmap *bm) 873static unsigned long memory_bm_next_pfn(struct memory_bitmap *bm)
803{ 874{
@@ -819,11 +890,10 @@ static unsigned long memory_bm_next_pfn(struct memory_bitmap *bm)
819 return BM_END_OF_MAP; 890 return BM_END_OF_MAP;
820} 891}
821 892
822/** 893/*
823 * This structure represents a range of page frames the contents of which 894 * This structure represents a range of page frames the contents of which
824 * should not be saved during the suspend. 895 * should not be saved during hibernation.
825 */ 896 */
826
827struct nosave_region { 897struct nosave_region {
828 struct list_head list; 898 struct list_head list;
829 unsigned long start_pfn; 899 unsigned long start_pfn;
@@ -832,15 +902,42 @@ struct nosave_region {
832 902
833static LIST_HEAD(nosave_regions); 903static LIST_HEAD(nosave_regions);
834 904
905static void recycle_zone_bm_rtree(struct mem_zone_bm_rtree *zone)
906{
907 struct rtree_node *node;
908
909 list_for_each_entry(node, &zone->nodes, list)
910 recycle_safe_page(node->data);
911
912 list_for_each_entry(node, &zone->leaves, list)
913 recycle_safe_page(node->data);
914}
915
916static void memory_bm_recycle(struct memory_bitmap *bm)
917{
918 struct mem_zone_bm_rtree *zone;
919 struct linked_page *p_list;
920
921 list_for_each_entry(zone, &bm->zones, list)
922 recycle_zone_bm_rtree(zone);
923
924 p_list = bm->p_list;
925 while (p_list) {
926 struct linked_page *lp = p_list;
927
928 p_list = lp->next;
929 recycle_safe_page(lp);
930 }
931}
932
835/** 933/**
836 * register_nosave_region - register a range of page frames the contents 934 * register_nosave_region - Register a region of unsaveable memory.
837 * of which should not be saved during the suspend (to be used in the early 935 *
838 * initialization code) 936 * Register a range of page frames the contents of which should not be saved
937 * during hibernation (to be used in the early initialization code).
839 */ 938 */
840 939void __init __register_nosave_region(unsigned long start_pfn,
841void __init 940 unsigned long end_pfn, int use_kmalloc)
842__register_nosave_region(unsigned long start_pfn, unsigned long end_pfn,
843 int use_kmalloc)
844{ 941{
845 struct nosave_region *region; 942 struct nosave_region *region;
846 943
@@ -857,12 +954,13 @@ __register_nosave_region(unsigned long start_pfn, unsigned long end_pfn,
857 } 954 }
858 } 955 }
859 if (use_kmalloc) { 956 if (use_kmalloc) {
860 /* during init, this shouldn't fail */ 957 /* During init, this shouldn't fail */
861 region = kmalloc(sizeof(struct nosave_region), GFP_KERNEL); 958 region = kmalloc(sizeof(struct nosave_region), GFP_KERNEL);
862 BUG_ON(!region); 959 BUG_ON(!region);
863 } else 960 } else {
864 /* This allocation cannot fail */ 961 /* This allocation cannot fail */
865 region = memblock_virt_alloc(sizeof(struct nosave_region), 0); 962 region = memblock_virt_alloc(sizeof(struct nosave_region), 0);
963 }
866 region->start_pfn = start_pfn; 964 region->start_pfn = start_pfn;
867 region->end_pfn = end_pfn; 965 region->end_pfn = end_pfn;
868 list_add_tail(&region->list, &nosave_regions); 966 list_add_tail(&region->list, &nosave_regions);
@@ -923,10 +1021,12 @@ static void swsusp_unset_page_forbidden(struct page *page)
923} 1021}
924 1022
925/** 1023/**
926 * mark_nosave_pages - set bits corresponding to the page frames the 1024 * mark_nosave_pages - Mark pages that should not be saved.
927 * contents of which should not be saved in a given bitmap. 1025 * @bm: Memory bitmap.
1026 *
1027 * Set the bits in @bm that correspond to the page frames the contents of which
1028 * should not be saved.
928 */ 1029 */
929
930static void mark_nosave_pages(struct memory_bitmap *bm) 1030static void mark_nosave_pages(struct memory_bitmap *bm)
931{ 1031{
932 struct nosave_region *region; 1032 struct nosave_region *region;
@@ -956,13 +1056,13 @@ static void mark_nosave_pages(struct memory_bitmap *bm)
956} 1056}
957 1057
958/** 1058/**
959 * create_basic_memory_bitmaps - create bitmaps needed for marking page 1059 * create_basic_memory_bitmaps - Create bitmaps to hold basic page information.
960 * frames that should not be saved and free page frames. The pointers 1060 *
961 * forbidden_pages_map and free_pages_map are only modified if everything 1061 * Create bitmaps needed for marking page frames that should not be saved and
962 * goes well, because we don't want the bits to be used before both bitmaps 1062 * free page frames. The forbidden_pages_map and free_pages_map pointers are
963 * are set up. 1063 * only modified if everything goes well, because we don't want the bits to be
1064 * touched before both bitmaps are set up.
964 */ 1065 */
965
966int create_basic_memory_bitmaps(void) 1066int create_basic_memory_bitmaps(void)
967{ 1067{
968 struct memory_bitmap *bm1, *bm2; 1068 struct memory_bitmap *bm1, *bm2;
@@ -1007,12 +1107,12 @@ int create_basic_memory_bitmaps(void)
1007} 1107}
1008 1108
1009/** 1109/**
1010 * free_basic_memory_bitmaps - free memory bitmaps allocated by 1110 * free_basic_memory_bitmaps - Free memory bitmaps holding basic information.
1011 * create_basic_memory_bitmaps(). The auxiliary pointers are necessary 1111 *
1012 * so that the bitmaps themselves are not referred to while they are being 1112 * Free memory bitmaps allocated by create_basic_memory_bitmaps(). The
1013 * freed. 1113 * auxiliary pointers are necessary so that the bitmaps themselves are not
1114 * referred to while they are being freed.
1014 */ 1115 */
1015
1016void free_basic_memory_bitmaps(void) 1116void free_basic_memory_bitmaps(void)
1017{ 1117{
1018 struct memory_bitmap *bm1, *bm2; 1118 struct memory_bitmap *bm1, *bm2;
@@ -1033,11 +1133,13 @@ void free_basic_memory_bitmaps(void)
1033} 1133}
1034 1134
1035/** 1135/**
1036 * snapshot_additional_pages - estimate the number of additional pages 1136 * snapshot_additional_pages - Estimate the number of extra pages needed.
1037 * be needed for setting up the suspend image data structures for given 1137 * @zone: Memory zone to carry out the computation for.
1038 * zone (usually the returned value is greater than the exact number) 1138 *
1139 * Estimate the number of additional pages needed for setting up a hibernation
1140 * image data structures for @zone (usually, the returned value is greater than
1141 * the exact number).
1039 */ 1142 */
1040
1041unsigned int snapshot_additional_pages(struct zone *zone) 1143unsigned int snapshot_additional_pages(struct zone *zone)
1042{ 1144{
1043 unsigned int rtree, nodes; 1145 unsigned int rtree, nodes;
@@ -1055,10 +1157,10 @@ unsigned int snapshot_additional_pages(struct zone *zone)
1055 1157
1056#ifdef CONFIG_HIGHMEM 1158#ifdef CONFIG_HIGHMEM
1057/** 1159/**
1058 * count_free_highmem_pages - compute the total number of free highmem 1160 * count_free_highmem_pages - Compute the total number of free highmem pages.
1059 * pages, system-wide. 1161 *
1162 * The returned number is system-wide.
1060 */ 1163 */
1061
1062static unsigned int count_free_highmem_pages(void) 1164static unsigned int count_free_highmem_pages(void)
1063{ 1165{
1064 struct zone *zone; 1166 struct zone *zone;
@@ -1072,11 +1174,12 @@ static unsigned int count_free_highmem_pages(void)
1072} 1174}
1073 1175
1074/** 1176/**
1075 * saveable_highmem_page - Determine whether a highmem page should be 1177 * saveable_highmem_page - Check if a highmem page is saveable.
1076 * included in the suspend image.
1077 * 1178 *
1078 * We should save the page if it isn't Nosave or NosaveFree, or Reserved, 1179 * Determine whether a highmem page should be included in a hibernation image.
1079 * and it isn't a part of a free chunk of pages. 1180 *
1181 * We should save the page if it isn't Nosave or NosaveFree, or Reserved,
1182 * and it isn't part of a free chunk of pages.
1080 */ 1183 */
1081static struct page *saveable_highmem_page(struct zone *zone, unsigned long pfn) 1184static struct page *saveable_highmem_page(struct zone *zone, unsigned long pfn)
1082{ 1185{
@@ -1102,10 +1205,8 @@ static struct page *saveable_highmem_page(struct zone *zone, unsigned long pfn)
1102} 1205}
1103 1206
1104/** 1207/**
1105 * count_highmem_pages - compute the total number of saveable highmem 1208 * count_highmem_pages - Compute the total number of saveable highmem pages.
1106 * pages.
1107 */ 1209 */
1108
1109static unsigned int count_highmem_pages(void) 1210static unsigned int count_highmem_pages(void)
1110{ 1211{
1111 struct zone *zone; 1212 struct zone *zone;
@@ -1133,12 +1234,14 @@ static inline void *saveable_highmem_page(struct zone *z, unsigned long p)
1133#endif /* CONFIG_HIGHMEM */ 1234#endif /* CONFIG_HIGHMEM */
1134 1235
1135/** 1236/**
1136 * saveable_page - Determine whether a non-highmem page should be included 1237 * saveable_page - Check if the given page is saveable.
1137 * in the suspend image.
1138 * 1238 *
1139 * We should save the page if it isn't Nosave, and is not in the range 1239 * Determine whether a non-highmem page should be included in a hibernation
1140 * of pages statically defined as 'unsaveable', and it isn't a part of 1240 * image.
1141 * a free chunk of pages. 1241 *
1242 * We should save the page if it isn't Nosave, and is not in the range
1243 * of pages statically defined as 'unsaveable', and it isn't part of
1244 * a free chunk of pages.
1142 */ 1245 */
1143static struct page *saveable_page(struct zone *zone, unsigned long pfn) 1246static struct page *saveable_page(struct zone *zone, unsigned long pfn)
1144{ 1247{
@@ -1167,10 +1270,8 @@ static struct page *saveable_page(struct zone *zone, unsigned long pfn)
1167} 1270}
1168 1271
1169/** 1272/**
1170 * count_data_pages - compute the total number of saveable non-highmem 1273 * count_data_pages - Compute the total number of saveable non-highmem pages.
1171 * pages.
1172 */ 1274 */
1173
1174static unsigned int count_data_pages(void) 1275static unsigned int count_data_pages(void)
1175{ 1276{
1176 struct zone *zone; 1277 struct zone *zone;
@@ -1190,7 +1291,8 @@ static unsigned int count_data_pages(void)
1190 return n; 1291 return n;
1191} 1292}
1192 1293
1193/* This is needed, because copy_page and memcpy are not usable for copying 1294/*
1295 * This is needed, because copy_page and memcpy are not usable for copying
1194 * task structs. 1296 * task structs.
1195 */ 1297 */
1196static inline void do_copy_page(long *dst, long *src) 1298static inline void do_copy_page(long *dst, long *src)
@@ -1201,12 +1303,12 @@ static inline void do_copy_page(long *dst, long *src)
1201 *dst++ = *src++; 1303 *dst++ = *src++;
1202} 1304}
1203 1305
1204
1205/** 1306/**
1206 * safe_copy_page - check if the page we are going to copy is marked as 1307 * safe_copy_page - Copy a page in a safe way.
1207 * present in the kernel page tables (this always is the case if 1308 *
1208 * CONFIG_DEBUG_PAGEALLOC is not set and in that case 1309 * Check if the page we are going to copy is marked as present in the kernel
1209 * kernel_page_present() always returns 'true'). 1310 * page tables (this always is the case if CONFIG_DEBUG_PAGEALLOC is not set
1311 * and in that case kernel_page_present() always returns 'true').
1210 */ 1312 */
1211static void safe_copy_page(void *dst, struct page *s_page) 1313static void safe_copy_page(void *dst, struct page *s_page)
1212{ 1314{
@@ -1219,10 +1321,8 @@ static void safe_copy_page(void *dst, struct page *s_page)
1219 } 1321 }
1220} 1322}
1221 1323
1222
1223#ifdef CONFIG_HIGHMEM 1324#ifdef CONFIG_HIGHMEM
1224static inline struct page * 1325static inline struct page *page_is_saveable(struct zone *zone, unsigned long pfn)
1225page_is_saveable(struct zone *zone, unsigned long pfn)
1226{ 1326{
1227 return is_highmem(zone) ? 1327 return is_highmem(zone) ?
1228 saveable_highmem_page(zone, pfn) : saveable_page(zone, pfn); 1328 saveable_highmem_page(zone, pfn) : saveable_page(zone, pfn);
@@ -1243,7 +1343,8 @@ static void copy_data_page(unsigned long dst_pfn, unsigned long src_pfn)
1243 kunmap_atomic(src); 1343 kunmap_atomic(src);
1244 } else { 1344 } else {
1245 if (PageHighMem(d_page)) { 1345 if (PageHighMem(d_page)) {
1246 /* Page pointed to by src may contain some kernel 1346 /*
1347 * The page pointed to by src may contain some kernel
1247 * data modified by kmap_atomic() 1348 * data modified by kmap_atomic()
1248 */ 1349 */
1249 safe_copy_page(buffer, s_page); 1350 safe_copy_page(buffer, s_page);
@@ -1265,8 +1366,8 @@ static inline void copy_data_page(unsigned long dst_pfn, unsigned long src_pfn)
1265} 1366}
1266#endif /* CONFIG_HIGHMEM */ 1367#endif /* CONFIG_HIGHMEM */
1267 1368
1268static void 1369static void copy_data_pages(struct memory_bitmap *copy_bm,
1269copy_data_pages(struct memory_bitmap *copy_bm, struct memory_bitmap *orig_bm) 1370 struct memory_bitmap *orig_bm)
1270{ 1371{
1271 struct zone *zone; 1372 struct zone *zone;
1272 unsigned long pfn; 1373 unsigned long pfn;
@@ -1315,12 +1416,11 @@ static struct memory_bitmap orig_bm;
1315static struct memory_bitmap copy_bm; 1416static struct memory_bitmap copy_bm;
1316 1417
1317/** 1418/**
1318 * swsusp_free - free pages allocated for the suspend. 1419 * swsusp_free - Free pages allocated for hibernation image.
1319 * 1420 *
1320 * Suspend pages are alocated before the atomic copy is made, so we 1421 * Image pages are allocated before snapshot creation, so they need to be
1321 * need to release them after the resume. 1422 * released after resume.
1322 */ 1423 */
1323
1324void swsusp_free(void) 1424void swsusp_free(void)
1325{ 1425{
1326 unsigned long fb_pfn, fr_pfn; 1426 unsigned long fb_pfn, fr_pfn;
@@ -1351,6 +1451,7 @@ loop:
1351 1451
1352 memory_bm_clear_current(forbidden_pages_map); 1452 memory_bm_clear_current(forbidden_pages_map);
1353 memory_bm_clear_current(free_pages_map); 1453 memory_bm_clear_current(free_pages_map);
1454 hibernate_restore_unprotect_page(page_address(page));
1354 __free_page(page); 1455 __free_page(page);
1355 goto loop; 1456 goto loop;
1356 } 1457 }
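[Editor's note] The hibernate_restore_unprotect_page() and hibernate_restore_protection_end() calls added in these hunks belong to the new restore-time image data protection feature; the hooks themselves are defined elsewhere in this series. A hedged sketch of the shape such hooks can take, assuming set_memory_ro()/set_memory_rw() are available for the region:

static bool hibernate_restore_protection_active;

static void hibernate_restore_protect_page(void *page_address)
{
        /* Catch stray writes to a freshly loaded image page. */
        if (hibernate_restore_protection_active)
                set_memory_ro((unsigned long)page_address, 1);
}

static void hibernate_restore_unprotect_page(void *page_address)
{
        /* The page is about to be freed or reused; allow writes again. */
        if (hibernate_restore_protection_active)
                set_memory_rw((unsigned long)page_address, 1);
}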
@@ -1362,6 +1463,7 @@ out:
1362 buffer = NULL; 1463 buffer = NULL;
1363 alloc_normal = 0; 1464 alloc_normal = 0;
1364 alloc_highmem = 0; 1465 alloc_highmem = 0;
1466 hibernate_restore_protection_end();
1365} 1467}
1366 1468
1367/* Helper functions used for the shrinking of memory. */ 1469/* Helper functions used for the shrinking of memory. */
@@ -1369,7 +1471,7 @@ out:
1369#define GFP_IMAGE (GFP_KERNEL | __GFP_NOWARN) 1471#define GFP_IMAGE (GFP_KERNEL | __GFP_NOWARN)
1370 1472
1371/** 1473/**
1372 * preallocate_image_pages - Allocate a number of pages for hibernation image 1474 * preallocate_image_pages - Allocate a number of pages for hibernation image.
1373 * @nr_pages: Number of page frames to allocate. 1475 * @nr_pages: Number of page frames to allocate.
1374 * @mask: GFP flags to use for the allocation. 1476 * @mask: GFP flags to use for the allocation.
1375 * 1477 *
@@ -1419,7 +1521,7 @@ static unsigned long preallocate_image_highmem(unsigned long nr_pages)
1419} 1521}
1420 1522
1421/** 1523/**
1422 * __fraction - Compute (an approximation of) x * (multiplier / base) 1524 * __fraction - Compute (an approximation of) x * (multiplier / base).
1423 */ 1525 */
1424static unsigned long __fraction(u64 x, u64 multiplier, u64 base) 1526static unsigned long __fraction(u64 x, u64 multiplier, u64 base)
1425{ 1527{
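[Editor's note] The helper documented above amounts to one widening multiply and one 64-bit divide; a hedged plain-C equivalent (the kernel performs the division with do_div()):

#include <stdint.h>

/* Approximate x * (multiplier / base) in integer arithmetic. */
static unsigned long __fraction(uint64_t x, uint64_t multiplier, uint64_t base)
{
        x *= multiplier;        /* assumes the product fits in 64 bits */
        return (unsigned long)(x / base);
}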
@@ -1429,8 +1531,8 @@ static unsigned long __fraction(u64 x, u64 multiplier, u64 base)
1429} 1531}
1430 1532
1431static unsigned long preallocate_highmem_fraction(unsigned long nr_pages, 1533static unsigned long preallocate_highmem_fraction(unsigned long nr_pages,
1432 unsigned long highmem, 1534 unsigned long highmem,
1433 unsigned long total) 1535 unsigned long total)
1434{ 1536{
1435 unsigned long alloc = __fraction(nr_pages, highmem, total); 1537 unsigned long alloc = __fraction(nr_pages, highmem, total);
1436 1538
@@ -1443,15 +1545,15 @@ static inline unsigned long preallocate_image_highmem(unsigned long nr_pages)
1443} 1545}
1444 1546
1445static inline unsigned long preallocate_highmem_fraction(unsigned long nr_pages, 1547static inline unsigned long preallocate_highmem_fraction(unsigned long nr_pages,
1446 unsigned long highmem, 1548 unsigned long highmem,
1447 unsigned long total) 1549 unsigned long total)
1448{ 1550{
1449 return 0; 1551 return 0;
1450} 1552}
1451#endif /* CONFIG_HIGHMEM */ 1553#endif /* CONFIG_HIGHMEM */
1452 1554
1453/** 1555/**
1454 * free_unnecessary_pages - Release preallocated pages not needed for the image 1556 * free_unnecessary_pages - Release preallocated pages not needed for the image.
1455 */ 1557 */
1456static unsigned long free_unnecessary_pages(void) 1558static unsigned long free_unnecessary_pages(void)
1457{ 1559{
@@ -1505,7 +1607,7 @@ static unsigned long free_unnecessary_pages(void)
1505} 1607}
1506 1608
1507/** 1609/**
1508 * minimum_image_size - Estimate the minimum acceptable size of an image 1610 * minimum_image_size - Estimate the minimum acceptable size of an image.
1509 * @saveable: Number of saveable pages in the system. 1611 * @saveable: Number of saveable pages in the system.
1510 * 1612 *
1511 * We want to avoid attempting to free too much memory too hard, so estimate the 1613 * We want to avoid attempting to free too much memory too hard, so estimate the
@@ -1535,7 +1637,7 @@ static unsigned long minimum_image_size(unsigned long saveable)
1535} 1637}
1536 1638
1537/** 1639/**
1538 * hibernate_preallocate_memory - Preallocate memory for hibernation image 1640 * hibernate_preallocate_memory - Preallocate memory for hibernation image.
1539 * 1641 *
1540 * To create a hibernation image it is necessary to make a copy of every page 1642 * To create a hibernation image it is necessary to make a copy of every page
1541 * frame in use. We also need a number of page frames to be free during 1643 * frame in use. We also need a number of page frames to be free during
@@ -1708,10 +1810,11 @@ int hibernate_preallocate_memory(void)
1708 1810
1709#ifdef CONFIG_HIGHMEM 1811#ifdef CONFIG_HIGHMEM
1710/** 1812/**
1711 * count_pages_for_highmem - compute the number of non-highmem pages 1813 * count_pages_for_highmem - Count non-highmem pages needed for copying highmem.
1712 * that will be necessary for creating copies of highmem pages. 1814 *
1713 */ 1815 * Compute the number of non-highmem pages that will be necessary for creating
1714 1816 * copies of highmem pages.
1817 */
1715static unsigned int count_pages_for_highmem(unsigned int nr_highmem) 1818static unsigned int count_pages_for_highmem(unsigned int nr_highmem)
1716{ 1819{
1717 unsigned int free_highmem = count_free_highmem_pages() + alloc_highmem; 1820 unsigned int free_highmem = count_free_highmem_pages() + alloc_highmem;
@@ -1724,15 +1827,12 @@ static unsigned int count_pages_for_highmem(unsigned int nr_highmem)
1724 return nr_highmem; 1827 return nr_highmem;
1725} 1828}
1726#else 1829#else
1727static unsigned int 1830static unsigned int count_pages_for_highmem(unsigned int nr_highmem) { return 0; }
1728count_pages_for_highmem(unsigned int nr_highmem) { return 0; }
1729#endif /* CONFIG_HIGHMEM */ 1831#endif /* CONFIG_HIGHMEM */
1730 1832
1731/** 1833/**
1732 * enough_free_mem - Make sure we have enough free memory for the 1834 * enough_free_mem - Check if there is enough free memory for the image.
1733 * snapshot image.
1734 */ 1835 */
1735
1736static int enough_free_mem(unsigned int nr_pages, unsigned int nr_highmem) 1836static int enough_free_mem(unsigned int nr_pages, unsigned int nr_highmem)
1737{ 1837{
1738 struct zone *zone; 1838 struct zone *zone;
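[Editor's note] The middle of count_pages_for_highmem() is elided between the two hunks above; reconstructed as a hedged sketch from the visible head (the free_highmem computation) and tail (return nr_highmem), it just clamps the count:

static unsigned int count_pages_for_highmem(unsigned int nr_highmem)
{
        unsigned int free_highmem = count_free_highmem_pages() + alloc_highmem;

        /* Enough free highmem: no normal pages are needed for the copies. */
        if (free_highmem >= nr_highmem)
                nr_highmem = 0;
        else
                nr_highmem -= free_highmem;     /* one normal page per leftover */

        return nr_highmem;
}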
@@ -1751,10 +1851,11 @@ static int enough_free_mem(unsigned int nr_pages, unsigned int nr_highmem)
1751 1851
1752#ifdef CONFIG_HIGHMEM 1852#ifdef CONFIG_HIGHMEM
1753/** 1853/**
1754 * get_highmem_buffer - if there are some highmem pages in the suspend 1854 * get_highmem_buffer - Allocate a buffer for highmem pages.
1755 * image, we may need the buffer to copy them and/or load their data. 1855 *
1856 * If there are some highmem pages in the hibernation image, we may need a
1857 * buffer to copy them and/or load their data.
1756 */ 1858 */
1757
1758static inline int get_highmem_buffer(int safe_needed) 1859static inline int get_highmem_buffer(int safe_needed)
1759{ 1860{
1760 buffer = get_image_page(GFP_ATOMIC | __GFP_COLD, safe_needed); 1861 buffer = get_image_page(GFP_ATOMIC | __GFP_COLD, safe_needed);
@@ -1762,13 +1863,13 @@ static inline int get_highmem_buffer(int safe_needed)
1762} 1863}
1763 1864
1764/** 1865/**
1765 * alloc_highmem_image_pages - allocate some highmem pages for the image. 1866 * alloc_highmem_image_pages - Allocate some highmem pages for the image.
1766 * Try to allocate as many pages as needed, but if the number of free 1867 *
1767 * highmem pages is lesser than that, allocate them all. 1868 * Try to allocate as many pages as needed, but if the number of free highmem
1869 * pages is less than that, allocate them all.
1768 */ 1870 */
1769 1871static inline unsigned int alloc_highmem_pages(struct memory_bitmap *bm,
1770static inline unsigned int 1872 unsigned int nr_highmem)
1771alloc_highmem_pages(struct memory_bitmap *bm, unsigned int nr_highmem)
1772{ 1873{
1773 unsigned int to_alloc = count_free_highmem_pages(); 1874 unsigned int to_alloc = count_free_highmem_pages();
1774 1875
@@ -1787,25 +1888,24 @@ alloc_highmem_pages(struct memory_bitmap *bm, unsigned int nr_highmem)
1787#else 1888#else
1788static inline int get_highmem_buffer(int safe_needed) { return 0; } 1889static inline int get_highmem_buffer(int safe_needed) { return 0; }
1789 1890
1790static inline unsigned int 1891static inline unsigned int alloc_highmem_pages(struct memory_bitmap *bm,
1791alloc_highmem_pages(struct memory_bitmap *bm, unsigned int n) { return 0; } 1892 unsigned int n) { return 0; }
1792#endif /* CONFIG_HIGHMEM */ 1893#endif /* CONFIG_HIGHMEM */
1793 1894
1794/** 1895/**
1795 * swsusp_alloc - allocate memory for the suspend image 1896 * swsusp_alloc - Allocate memory for hibernation image.
1796 * 1897 *
1797 * We first try to allocate as many highmem pages as there are 1898 * We first try to allocate as many highmem pages as there are
1798 * saveable highmem pages in the system. If that fails, we allocate 1899 * saveable highmem pages in the system. If that fails, we allocate
1799 * non-highmem pages for the copies of the remaining highmem ones. 1900 * non-highmem pages for the copies of the remaining highmem ones.
1800 * 1901 *
1801 * In this approach it is likely that the copies of highmem pages will 1902 * In this approach it is likely that the copies of highmem pages will
1802 * also be located in the high memory, because of the way in which 1903 * also be located in the high memory, because of the way in which
1803 * copy_data_pages() works. 1904 * copy_data_pages() works.
1804 */ 1905 */
1805 1906static int swsusp_alloc(struct memory_bitmap *orig_bm,
1806static int 1907 struct memory_bitmap *copy_bm,
1807swsusp_alloc(struct memory_bitmap *orig_bm, struct memory_bitmap *copy_bm, 1908 unsigned int nr_pages, unsigned int nr_highmem)
1808 unsigned int nr_pages, unsigned int nr_highmem)
1809{ 1909{
1810 if (nr_highmem > 0) { 1910 if (nr_highmem > 0) {
1811 if (get_highmem_buffer(PG_ANY)) 1911 if (get_highmem_buffer(PG_ANY))
@@ -1855,7 +1955,8 @@ asmlinkage __visible int swsusp_save(void)
1855 return -ENOMEM; 1955 return -ENOMEM;
1856 } 1956 }
1857 1957
1858 /* During allocating of suspend pagedir, new cold pages may appear. 1958 /*
1959 * During allocation of the suspend pagedir, new cold pages may appear.
1859 * Kill them. 1960 * Kill them.
1860 */ 1961 */
1861 drain_local_pages(NULL); 1962 drain_local_pages(NULL);
@@ -1918,12 +2019,14 @@ static int init_header(struct swsusp_info *info)
1918} 2019}
1919 2020
1920/** 2021/**
1921 * pack_pfns - pfns corresponding to the set bits found in the bitmap @bm 2022 * pack_pfns - Prepare PFNs for saving.
1922 * are stored in the array @buf[] (1 page at a time) 2023 * @bm: Memory bitmap.
2024 * @buf: Memory buffer to store the PFNs in.
2025 *
2026 * PFNs corresponding to set bits in @bm are stored in the area of memory
2027 * pointed to by @buf (1 page at a time).
1923 */ 2028 */
1924 2029static inline void pack_pfns(unsigned long *buf, struct memory_bitmap *bm)
1925static inline void
1926pack_pfns(unsigned long *buf, struct memory_bitmap *bm)
1927{ 2030{
1928 int j; 2031 int j;
1929 2032
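[Editor's note] The hunk above elides the body of pack_pfns(); per its new kerneldoc, it fills one page worth of PFNs from the bitmap and stops at BM_END_OF_MAP. A hedged reconstruction (unlikely() and page_key_read() are existing kernel helpers):

static inline void pack_pfns(unsigned long *buf, struct memory_bitmap *bm)
{
        int j;

        for (j = 0; j < PAGE_SIZE / sizeof(long); j++) {
                buf[j] = memory_bm_next_pfn(bm);
                if (unlikely(buf[j] == BM_END_OF_MAP))
                        break;
                /* Save page key for data page (s390 only). */
                page_key_read(buf + j);
        }
}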
@@ -1937,22 +2040,21 @@ pack_pfns(unsigned long *buf, struct memory_bitmap *bm)
1937} 2040}
1938 2041
1939/** 2042/**
1940 * snapshot_read_next - used for reading the system memory snapshot. 2043 * snapshot_read_next - Get the address to read the next image page from.
2044 * @handle: Snapshot handle to be used for the reading.
1941 * 2045 *
1942 * On the first call to it @handle should point to a zeroed 2046 * On the first call, @handle should point to a zeroed snapshot_handle
1943 * snapshot_handle structure. The structure gets updated and a pointer 2047 * structure. The structure is populated then, and a pointer to it should
1944 * to it should be passed to this function every next time. 2048 * be passed to this function on every subsequent call.
1945 * 2049 *
1946 * On success the function returns a positive number. Then, the caller 2050 * On success, the function returns a positive number. Then, the caller
1947 * is allowed to read up to the returned number of bytes from the memory 2051 * is allowed to read up to the returned number of bytes from the memory
1948 * location computed by the data_of() macro. 2052 * location computed by the data_of() macro.
1949 * 2053 *
1950 * The function returns 0 to indicate the end of data stream condition, 2054 * The function returns 0 to indicate the end of the data stream condition,
1951 * and a negative number is returned on error. In such cases the 2055 * and negative numbers are returned on errors. If that happens, the structure
1952 * structure pointed to by @handle is not updated and should not be used 2056 * pointed to by @handle is not updated and should not be used any more.
1953 * any more.
1954 */ 2057 */
1955
1956int snapshot_read_next(struct snapshot_handle *handle) 2058int snapshot_read_next(struct snapshot_handle *handle)
1957{ 2059{
1958 if (handle->cur > nr_meta_pages + nr_copy_pages) 2060 if (handle->cur > nr_meta_pages + nr_copy_pages)
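[Editor's note] Per the reworked kerneldoc, a caller drives snapshot_read_next() in a loop like the hedged sketch below; write_chunk() is a hypothetical sink standing in for whatever consumes the image, not a kernel function:

struct snapshot_handle handle;
int ret;

memset(&handle, 0, sizeof(handle));             /* first call: zeroed handle */
while ((ret = snapshot_read_next(&handle)) > 0) {
        /* Up to ret bytes may be read from data_of(handle). */
        if (write_chunk(data_of(handle), ret))  /* hypothetical sink */
                break;
}
if (ret < 0)
        pr_err("hibernation image read failed: %d\n", ret);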
@@ -1981,7 +2083,8 @@ int snapshot_read_next(struct snapshot_handle *handle)
1981 2083
1982 page = pfn_to_page(memory_bm_next_pfn(&copy_bm)); 2084 page = pfn_to_page(memory_bm_next_pfn(&copy_bm));
1983 if (PageHighMem(page)) { 2085 if (PageHighMem(page)) {
1984 /* Highmem pages are copied to the buffer, 2086 /*
2087 * Highmem pages are copied to the buffer,
1985 * because we can't return with a kmapped 2088 * because we can't return with a kmapped
1986 * highmem page (we may not be called again). 2089 * highmem page (we may not be called again).
1987 */ 2090 */
@@ -1999,53 +2102,41 @@ int snapshot_read_next(struct snapshot_handle *handle)
1999 return PAGE_SIZE; 2102 return PAGE_SIZE;
2000} 2103}
2001 2104
2002/** 2105static void duplicate_memory_bitmap(struct memory_bitmap *dst,
2003 * mark_unsafe_pages - mark the pages that cannot be used for storing 2106 struct memory_bitmap *src)
2004 * the image during resume, because they conflict with the pages that
2005 * had been used before suspend
2006 */
2007
2008static int mark_unsafe_pages(struct memory_bitmap *bm)
2009{ 2107{
2010 struct zone *zone; 2108 unsigned long pfn;
2011 unsigned long pfn, max_zone_pfn;
2012 2109
2013 /* Clear page flags */ 2110 memory_bm_position_reset(src);
2014 for_each_populated_zone(zone) { 2111 pfn = memory_bm_next_pfn(src);
2015 max_zone_pfn = zone_end_pfn(zone); 2112 while (pfn != BM_END_OF_MAP) {
2016 for (pfn = zone->zone_start_pfn; pfn < max_zone_pfn; pfn++) 2113 memory_bm_set_bit(dst, pfn);
2017 if (pfn_valid(pfn)) 2114 pfn = memory_bm_next_pfn(src);
2018 swsusp_unset_page_free(pfn_to_page(pfn));
2019 } 2115 }
2020
2021 /* Mark pages that correspond to the "original" pfns as "unsafe" */
2022 memory_bm_position_reset(bm);
2023 do {
2024 pfn = memory_bm_next_pfn(bm);
2025 if (likely(pfn != BM_END_OF_MAP)) {
2026 if (likely(pfn_valid(pfn)))
2027 swsusp_set_page_free(pfn_to_page(pfn));
2028 else
2029 return -EFAULT;
2030 }
2031 } while (pfn != BM_END_OF_MAP);
2032
2033 allocated_unsafe_pages = 0;
2034
2035 return 0;
2036} 2116}
2037 2117
2038static void 2118/**
2039duplicate_memory_bitmap(struct memory_bitmap *dst, struct memory_bitmap *src) 2119 * mark_unsafe_pages - Mark pages that were used before hibernation.
2120 *
2121 * Mark the pages that cannot be used for storing the image during restoration,
2122 * because they conflict with the pages that had been used before hibernation.
2123 */
2124static void mark_unsafe_pages(struct memory_bitmap *bm)
2040{ 2125{
2041 unsigned long pfn; 2126 unsigned long pfn;
2042 2127
2043 memory_bm_position_reset(src); 2128 /* Clear the "free"/"unsafe" bit for all PFNs */
2044 pfn = memory_bm_next_pfn(src); 2129 memory_bm_position_reset(free_pages_map);
2130 pfn = memory_bm_next_pfn(free_pages_map);
2045 while (pfn != BM_END_OF_MAP) { 2131 while (pfn != BM_END_OF_MAP) {
2046 memory_bm_set_bit(dst, pfn); 2132 memory_bm_clear_current(free_pages_map);
2047 pfn = memory_bm_next_pfn(src); 2133 pfn = memory_bm_next_pfn(free_pages_map);
2048 } 2134 }
2135
2136 /* Mark pages that correspond to the "original" PFNs as "unsafe" */
2137 duplicate_memory_bitmap(free_pages_map, bm);
2138
2139 allocated_unsafe_pages = 0;
2049} 2140}
2050 2141
2051static int check_header(struct swsusp_info *info) 2142static int check_header(struct swsusp_info *info)
@@ -2063,11 +2154,9 @@ static int check_header(struct swsusp_info *info)
2063} 2154}
2064 2155
2065/** 2156/**
2066 * load header - check the image header and copy data from it 2157 * load_header - Check the image header and copy the data from it.
2067 */ 2158 */
2068 2159static int load_header(struct swsusp_info *info)
2069static int
2070load_header(struct swsusp_info *info)
2071{ 2160{
2072 int error; 2161 int error;
2073 2162
@@ -2081,8 +2170,12 @@ load_header(struct swsusp_info *info)
2081} 2170}
2082 2171
2083/** 2172/**
2084 * unpack_orig_pfns - for each element of @buf[] (1 page at a time) set 2173 * unpack_orig_pfns - Set bits corresponding to given PFNs in a memory bitmap.
2085 * the corresponding bit in the memory bitmap @bm 2174 * @bm: Memory bitmap.
2175 * @buf: Area of memory containing the PFNs.
2176 *
2177 * For each element of the array pointed to by @buf (1 page at a time), set the
2178 * corresponding bit in @bm.
2086 */ 2179 */
2087static int unpack_orig_pfns(unsigned long *buf, struct memory_bitmap *bm) 2180static int unpack_orig_pfns(unsigned long *buf, struct memory_bitmap *bm)
2088{ 2181{
@@ -2095,7 +2188,7 @@ static int unpack_orig_pfns(unsigned long *buf, struct memory_bitmap *bm)
2095 /* Extract and buffer page key for data page (s390 only). */ 2188 /* Extract and buffer page key for data page (s390 only). */
2096 page_key_memorize(buf + j); 2189 page_key_memorize(buf + j);
2097 2190
2098 if (memory_bm_pfn_present(bm, buf[j])) 2191 if (pfn_valid(buf[j]) && memory_bm_pfn_present(bm, buf[j]))
2099 memory_bm_set_bit(bm, buf[j]); 2192 memory_bm_set_bit(bm, buf[j]);
2100 else 2193 else
2101 return -EFAULT; 2194 return -EFAULT;
@@ -2104,13 +2197,9 @@ static int unpack_orig_pfns(unsigned long *buf, struct memory_bitmap *bm)
2104 return 0; 2197 return 0;
2105} 2198}
2106 2199
2107/* List of "safe" pages that may be used to store data loaded from the suspend
2108 * image
2109 */
2110static struct linked_page *safe_pages_list;
2111
2112#ifdef CONFIG_HIGHMEM 2200#ifdef CONFIG_HIGHMEM
2113/* struct highmem_pbe is used for creating the list of highmem pages that 2201/*
2202 * struct highmem_pbe is used for creating the list of highmem pages that
2114 * should be restored atomically during the resume from disk, because the page 2203 * should be restored atomically during the resume from disk, because the page
2115 * frames they have occupied before the suspend are in use. 2204 * frames they have occupied before the suspend are in use.
2116 */ 2205 */
@@ -2120,7 +2209,8 @@ struct highmem_pbe {
2120 struct highmem_pbe *next; 2209 struct highmem_pbe *next;
2121}; 2210};
2122 2211
2123/* List of highmem PBEs needed for restoring the highmem pages that were 2212/*
2213 * List of highmem PBEs needed for restoring the highmem pages that were
2124 * allocated before the suspend and included in the suspend image, but have 2214 * allocated before the suspend and included in the suspend image, but have
2125 * also been allocated by the "resume" kernel, so their contents cannot be 2215 * also been allocated by the "resume" kernel, so their contents cannot be
2126 * written directly to their "original" page frames. 2216 * written directly to their "original" page frames.
@@ -2128,11 +2218,11 @@ struct highmem_pbe {
2128static struct highmem_pbe *highmem_pblist; 2218static struct highmem_pbe *highmem_pblist;
2129 2219
2130/** 2220/**
2131 * count_highmem_image_pages - compute the number of highmem pages in the 2221 * count_highmem_image_pages - Compute the number of highmem pages in the image.
2132 * suspend image. The bits in the memory bitmap @bm that correspond to the 2222 * @bm: Memory bitmap.
2133 * image pages are assumed to be set. 2223 *
2224 * The bits in @bm that correspond to image pages are assumed to be set.
2134 */ 2225 */
2135
2136static unsigned int count_highmem_image_pages(struct memory_bitmap *bm) 2226static unsigned int count_highmem_image_pages(struct memory_bitmap *bm)
2137{ 2227{
2138 unsigned long pfn; 2228 unsigned long pfn;
@@ -2149,24 +2239,25 @@ static unsigned int count_highmem_image_pages(struct memory_bitmap *bm)
2149 return cnt; 2239 return cnt;
2150} 2240}
2151 2241
2152/**
2153 * prepare_highmem_image - try to allocate as many highmem pages as
2154 * there are highmem image pages (@nr_highmem_p points to the variable
2155 * containing the number of highmem image pages). The pages that are
2156 * "safe" (ie. will not be overwritten when the suspend image is
2157 * restored) have the corresponding bits set in @bm (it must be
2158 * unitialized).
2159 *
2160 * NOTE: This function should not be called if there are no highmem
2161 * image pages.
2162 */
2163
2164static unsigned int safe_highmem_pages; 2242static unsigned int safe_highmem_pages;
2165 2243
2166static struct memory_bitmap *safe_highmem_bm; 2244static struct memory_bitmap *safe_highmem_bm;
2167 2245
2168static int 2246/**
2169prepare_highmem_image(struct memory_bitmap *bm, unsigned int *nr_highmem_p) 2247 * prepare_highmem_image - Allocate memory for loading highmem data from image.
2248 * @bm: Pointer to an uninitialized memory bitmap structure.
2249 * @nr_highmem_p: Pointer to the number of highmem image pages.
2250 *
2251 * Try to allocate as many highmem pages as there are highmem image pages
2252 * (@nr_highmem_p points to the variable containing the number of highmem image
2253 * pages). The pages that are "safe" (ie. will not be overwritten when the
2254 * hibernation image is restored entirely) have the corresponding bits set in
2255 * @bm (it must be uninitialized).
2256 *
2257 * NOTE: This function should not be called if there are no highmem image pages.
2258 */
2259static int prepare_highmem_image(struct memory_bitmap *bm,
2260 unsigned int *nr_highmem_p)
2170{ 2261{
2171 unsigned int to_alloc; 2262 unsigned int to_alloc;
2172 2263
@@ -2201,39 +2292,42 @@ prepare_highmem_image(struct memory_bitmap *bm, unsigned int *nr_highmem_p)
2201 return 0; 2292 return 0;
2202} 2293}
2203 2294
2295static struct page *last_highmem_page;
2296
2204/** 2297/**
2205 * get_highmem_page_buffer - for given highmem image page find the buffer 2298 * get_highmem_page_buffer - Prepare a buffer to store a highmem image page.
2206 * that suspend_write_next() should set for its caller to write to.
2207 * 2299 *
2208 * If the page is to be saved to its "original" page frame or a copy of 2300 * For a given highmem image page get a buffer that snapshot_write_next() should
2209 * the page is to be made in the highmem, @buffer is returned. Otherwise, 2301 * return to its caller to write to.
2210 * the copy of the page is to be made in normal memory, so the address of
2211 * the copy is returned.
2212 * 2302 *
2213 * If @buffer is returned, the caller of suspend_write_next() will write 2303 * If the page is to be saved to its "original" page frame or a copy of
2214 * the page's contents to @buffer, so they will have to be copied to the 2304 * the page is to be made in the highmem, @buffer is returned. Otherwise,
2215 * right location on the next call to suspend_write_next() and it is done 2305 * the copy of the page is to be made in normal memory, so the address of
2216 * with the help of copy_last_highmem_page(). For this purpose, if 2306 * the copy is returned.
2217 * @buffer is returned, @last_highmem page is set to the page to which 2307 *
2218 * the data will have to be copied from @buffer. 2308 * If @buffer is returned, the caller of snapshot_write_next() will write
2309 * the page's contents to @buffer, so they will have to be copied to the
2310 * right location on the next call to suspend_write_next() and it is done
2311 * with the help of copy_last_highmem_page(). For this purpose, if
2312 * @buffer is returned, @last_highmem_page is set to the page to which
2313 * the data will have to be copied from @buffer.
2219 */ 2314 */
2220 2315static void *get_highmem_page_buffer(struct page *page,
2221static struct page *last_highmem_page; 2316 struct chain_allocator *ca)
2222
2223static void *
2224get_highmem_page_buffer(struct page *page, struct chain_allocator *ca)
2225{ 2317{
2226 struct highmem_pbe *pbe; 2318 struct highmem_pbe *pbe;
2227 void *kaddr; 2319 void *kaddr;
2228 2320
2229 if (swsusp_page_is_forbidden(page) && swsusp_page_is_free(page)) { 2321 if (swsusp_page_is_forbidden(page) && swsusp_page_is_free(page)) {
2230 /* We have allocated the "original" page frame and we can 2322 /*
2323 * We have allocated the "original" page frame and we can
2231 * use it directly to store the loaded page. 2324 * use it directly to store the loaded page.
2232 */ 2325 */
2233 last_highmem_page = page; 2326 last_highmem_page = page;
2234 return buffer; 2327 return buffer;
2235 } 2328 }
2236 /* The "original" page frame has not been allocated and we have to 2329 /*
2330 * The "original" page frame has not been allocated and we have to
2237 * use a "safe" page frame to store the loaded page. 2331 * use a "safe" page frame to store the loaded page.
2238 */ 2332 */
2239 pbe = chain_alloc(ca, sizeof(struct highmem_pbe)); 2333 pbe = chain_alloc(ca, sizeof(struct highmem_pbe));
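[Editor's note] After the chain_alloc() above (body elided by the hunk), the function queues a deferred move by linking a highmem_pbe onto the global list; a hedged sketch, where safe_copy is a placeholder for the "safe" frame taken from safe_highmem_bm or safe_pages_list:

pbe->orig_page = page;          /* where the data must finally land      */
pbe->copy_page = safe_copy;     /* "safe" frame holding it until restore */
pbe->next = highmem_pblist;     /* push onto the deferred-restore list   */
highmem_pblist = pbe;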
@@ -2263,11 +2357,12 @@ get_highmem_page_buffer(struct page *page, struct chain_allocator *ca)
2263} 2357}
2264 2358
2265/** 2359/**
2266 * copy_last_highmem_page - copy the contents of a highmem image from 2360 * copy_last_highmem_page - Copy the most recent highmem image page.
2267 * @buffer, where the caller of snapshot_write_next() has place them, 2361 *
2268 * to the right location represented by @last_highmem_page . 2362 * Copy the contents of a highmem image page from @buffer, where the caller
2363 * of snapshot_write_next() has stored them, to the right location
2364 * represented by @last_highmem_page.
2269 */ 2365 */
2270
2271static void copy_last_highmem_page(void) 2366static void copy_last_highmem_page(void)
2272{ 2367{
2273 if (last_highmem_page) { 2368 if (last_highmem_page) {
@@ -2294,17 +2389,13 @@ static inline void free_highmem_data(void)
2294 free_image_page(buffer, PG_UNSAFE_CLEAR); 2389 free_image_page(buffer, PG_UNSAFE_CLEAR);
2295} 2390}
2296#else 2391#else
2297static unsigned int 2392static unsigned int count_highmem_image_pages(struct memory_bitmap *bm) { return 0; }
2298count_highmem_image_pages(struct memory_bitmap *bm) { return 0; }
2299 2393
2300static inline int 2394static inline int prepare_highmem_image(struct memory_bitmap *bm,
2301prepare_highmem_image(struct memory_bitmap *bm, unsigned int *nr_highmem_p) 2395 unsigned int *nr_highmem_p) { return 0; }
2302{
2303 return 0;
2304}
2305 2396
2306static inline void * 2397static inline void *get_highmem_page_buffer(struct page *page,
2307get_highmem_page_buffer(struct page *page, struct chain_allocator *ca) 2398 struct chain_allocator *ca)
2308{ 2399{
2309 return ERR_PTR(-EINVAL); 2400 return ERR_PTR(-EINVAL);
2310} 2401}
@@ -2314,27 +2405,27 @@ static inline int last_highmem_page_copied(void) { return 1; }
2314static inline void free_highmem_data(void) {} 2405static inline void free_highmem_data(void) {}
2315#endif /* CONFIG_HIGHMEM */ 2406#endif /* CONFIG_HIGHMEM */
2316 2407
2408#define PBES_PER_LINKED_PAGE (LINKED_PAGE_DATA_SIZE / sizeof(struct pbe))
2409
2317/** 2410/**
2318 * prepare_image - use the memory bitmap @bm to mark the pages that will 2411 * prepare_image - Make room for loading hibernation image.
2319 * be overwritten in the process of restoring the system memory state 2412 * @new_bm: Uninitialized memory bitmap structure.
2320 * from the suspend image ("unsafe" pages) and allocate memory for the 2413 * @bm: Memory bitmap with unsafe pages marked.
2321 * image. 2414 *
2415 * Use @bm to mark the pages that will be overwritten in the process of
2416 * restoring the system memory state from the suspend image ("unsafe" pages)
2417 * and allocate memory for the image.
2322 * 2418 *
2323 * The idea is to allocate a new memory bitmap first and then allocate 2419 * The idea is to allocate a new memory bitmap first and then allocate
2324 * as many pages as needed for the image data, but not to assign these 2420 * as many pages as needed for image data, but without specifying what those
2325 * pages to specific tasks initially. Instead, we just mark them as 2421 * pages will be used for just yet. Instead, we mark them all as allocated and
2326 * allocated and create a lists of "safe" pages that will be used 2422 * create a list of "safe" pages to be used later. On systems with high
2327 * later. On systems with high memory a list of "safe" highmem pages is 2423 * memory a list of "safe" highmem pages is created too.
2328 * also created.
2329 */ 2424 */
2330 2425static int prepare_image(struct memory_bitmap *new_bm, struct memory_bitmap *bm)
2331#define PBES_PER_LINKED_PAGE (LINKED_PAGE_DATA_SIZE / sizeof(struct pbe))
2332
2333static int
2334prepare_image(struct memory_bitmap *new_bm, struct memory_bitmap *bm)
2335{ 2426{
2336 unsigned int nr_pages, nr_highmem; 2427 unsigned int nr_pages, nr_highmem;
2337 struct linked_page *sp_list, *lp; 2428 struct linked_page *lp;
2338 int error; 2429 int error;
2339 2430
2340 /* If there is no highmem, the buffer will not be necessary */ 2431 /* If there is no highmem, the buffer will not be necessary */
@@ -2342,9 +2433,7 @@ prepare_image(struct memory_bitmap *new_bm, struct memory_bitmap *bm)
2342 buffer = NULL; 2433 buffer = NULL;
2343 2434
2344 nr_highmem = count_highmem_image_pages(bm); 2435 nr_highmem = count_highmem_image_pages(bm);
2345 error = mark_unsafe_pages(bm); 2436 mark_unsafe_pages(bm);
2346 if (error)
2347 goto Free;
2348 2437
2349 error = memory_bm_create(new_bm, GFP_ATOMIC, PG_SAFE); 2438 error = memory_bm_create(new_bm, GFP_ATOMIC, PG_SAFE);
2350 if (error) 2439 if (error)
@@ -2357,14 +2446,15 @@ prepare_image(struct memory_bitmap *new_bm, struct memory_bitmap *bm)
2357 if (error) 2446 if (error)
2358 goto Free; 2447 goto Free;
2359 } 2448 }
2360 /* Reserve some safe pages for potential later use. 2449 /*
2450 * Reserve some safe pages for potential later use.
2361 * 2451 *
2362 * NOTE: This way we make sure there will be enough safe pages for the 2452 * NOTE: This way we make sure there will be enough safe pages for the
2363 * chain_alloc() in get_buffer(). It is a bit wasteful, but 2453 * chain_alloc() in get_buffer(). It is a bit wasteful, but
2364 * nr_copy_pages cannot be greater than 50% of the memory anyway. 2454 * nr_copy_pages cannot be greater than 50% of the memory anyway.
2455 *
2456 * Also, nr_copy_pages cannot be less than allocated_unsafe_pages.
2365 */ 2457 */
2366 sp_list = NULL;
2367 /* nr_copy_pages cannot be lesser than allocated_unsafe_pages */
2368 nr_pages = nr_copy_pages - nr_highmem - allocated_unsafe_pages; 2458 nr_pages = nr_copy_pages - nr_highmem - allocated_unsafe_pages;
2369 nr_pages = DIV_ROUND_UP(nr_pages, PBES_PER_LINKED_PAGE); 2459 nr_pages = DIV_ROUND_UP(nr_pages, PBES_PER_LINKED_PAGE);
2370 while (nr_pages > 0) { 2460 while (nr_pages > 0) {
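[Editor's note] A quick worked example of the reservation size computed above, with invented numbers: one linked page holds PBES_PER_LINKED_PAGE struct pbe entries, so the PBE count is rounded up to whole pages.

/* Illustration only. On a typical 64-bit config: PAGE_SIZE = 4096,
 * LINKED_PAGE_DATA_SIZE = 4088, sizeof(struct pbe) = 24, so
 * PBES_PER_LINKED_PAGE = 170. */
unsigned int nr_pbes  = nr_copy_pages - nr_highmem - allocated_unsafe_pages;
unsigned int reserved = DIV_ROUND_UP(nr_pbes, PBES_PER_LINKED_PAGE);
/* e.g. nr_pbes = 25000  ->  reserved = DIV_ROUND_UP(25000, 170) = 148 */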
@@ -2373,12 +2463,11 @@ prepare_image(struct memory_bitmap *new_bm, struct memory_bitmap *bm)
2373 error = -ENOMEM; 2463 error = -ENOMEM;
2374 goto Free; 2464 goto Free;
2375 } 2465 }
2376 lp->next = sp_list; 2466 lp->next = safe_pages_list;
2377 sp_list = lp; 2467 safe_pages_list = lp;
2378 nr_pages--; 2468 nr_pages--;
2379 } 2469 }
2380 /* Preallocate memory for the image */ 2470 /* Preallocate memory for the image */
2381 safe_pages_list = NULL;
2382 nr_pages = nr_copy_pages - nr_highmem - allocated_unsafe_pages; 2471 nr_pages = nr_copy_pages - nr_highmem - allocated_unsafe_pages;
2383 while (nr_pages > 0) { 2472 while (nr_pages > 0) {
2384 lp = (struct linked_page *)get_zeroed_page(GFP_ATOMIC); 2473 lp = (struct linked_page *)get_zeroed_page(GFP_ATOMIC);
@@ -2396,12 +2485,6 @@ prepare_image(struct memory_bitmap *new_bm, struct memory_bitmap *bm)
2396 swsusp_set_page_free(virt_to_page(lp)); 2485 swsusp_set_page_free(virt_to_page(lp));
2397 nr_pages--; 2486 nr_pages--;
2398 } 2487 }
2399 /* Free the reserved safe pages so that chain_alloc() can use them */
2400 while (sp_list) {
2401 lp = sp_list->next;
2402 free_image_page(sp_list, PG_UNSAFE_CLEAR);
2403 sp_list = lp;
2404 }
2405 return 0; 2488 return 0;
2406 2489
2407 Free: 2490 Free:
@@ -2410,10 +2493,11 @@ prepare_image(struct memory_bitmap *new_bm, struct memory_bitmap *bm)
2410} 2493}
2411 2494
2412/** 2495/**
2413 * get_buffer - compute the address that snapshot_write_next() should 2496 * get_buffer - Get the address to store the next image data page.
2414 * set for its caller to write to. 2497 *
2498 * Get the address that snapshot_write_next() should return to its caller to
2499 * write to.
2415 */ 2500 */
2416
2417static void *get_buffer(struct memory_bitmap *bm, struct chain_allocator *ca) 2501static void *get_buffer(struct memory_bitmap *bm, struct chain_allocator *ca)
2418{ 2502{
2419 struct pbe *pbe; 2503 struct pbe *pbe;
@@ -2428,12 +2512,14 @@ static void *get_buffer(struct memory_bitmap *bm, struct chain_allocator *ca)
2428 return get_highmem_page_buffer(page, ca); 2512 return get_highmem_page_buffer(page, ca);
2429 2513
2430 if (swsusp_page_is_forbidden(page) && swsusp_page_is_free(page)) 2514 if (swsusp_page_is_forbidden(page) && swsusp_page_is_free(page))
2431 /* We have allocated the "original" page frame and we can 2515 /*
2516 * We have allocated the "original" page frame and we can
2432 * use it directly to store the loaded page. 2517 * use it directly to store the loaded page.
2433 */ 2518 */
2434 return page_address(page); 2519 return page_address(page);
2435 2520
2436 /* The "original" page frame has not been allocated and we have to 2521 /*
2522 * The "original" page frame has not been allocated and we have to
2437 * use a "safe" page frame to store the loaded page. 2523 * use a "safe" page frame to store the loaded page.
2438 */ 2524 */
2439 pbe = chain_alloc(ca, sizeof(struct pbe)); 2525 pbe = chain_alloc(ca, sizeof(struct pbe));
@@ -2450,22 +2536,21 @@ static void *get_buffer(struct memory_bitmap *bm, struct chain_allocator *ca)
2450} 2536}
2451 2537
2452/** 2538/**
2453 * snapshot_write_next - used for writing the system memory snapshot. 2539 * snapshot_write_next - Get the address to store the next image page.
2540 * @handle: Snapshot handle structure to guide the writing.
2454 * 2541 *
2455 * On the first call to it @handle should point to a zeroed 2542 * On the first call, @handle should point to a zeroed snapshot_handle
2456 * snapshot_handle structure. The structure gets updated and a pointer 2543 * structure. The structure is populated then, and a pointer to it should
2457 * to it should be passed to this function every next time. 2544 * be passed to this function on every subsequent call.
2458 * 2545 *
2459 * On success the function returns a positive number. Then, the caller 2546 * On success, the function returns a positive number. Then, the caller
2460 * is allowed to write up to the returned number of bytes to the memory 2547 * is allowed to write up to the returned number of bytes to the memory
2461 * location computed by the data_of() macro. 2548 * location computed by the data_of() macro.
2462 * 2549 *
2463 * The function returns 0 to indicate the "end of file" condition, 2550 * The function returns 0 to indicate the "end of file" condition. Negative
2464 * and a negative number is returned on error. In such cases the 2551 * numbers are returned on errors, in which case the structure pointed to by
2465 * structure pointed to by @handle is not updated and should not be used 2552 * @handle is not updated and should not be used any more.
2466 * any more.
2467 */ 2553 */
2468
2469int snapshot_write_next(struct snapshot_handle *handle) 2554int snapshot_write_next(struct snapshot_handle *handle)
2470{ 2555{
2471 static struct chain_allocator ca; 2556 static struct chain_allocator ca;
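[Editor's note] The write side mirrors the read side. A hedged sketch of a loader feeding the image back, with read_chunk() as a hypothetical source (not a kernel function); note the mandatory snapshot_write_finalize() after the loop, per the kerneldoc further down:

struct snapshot_handle handle;
int ret;

memset(&handle, 0, sizeof(handle));
while ((ret = snapshot_write_next(&handle)) > 0) {
        /* Up to ret bytes may be written to data_of(handle). */
        if (read_chunk(data_of(handle), ret) < ret)     /* hypothetical source */
                break;
}
snapshot_write_finalize(&handle);       /* flush a trailing highmem page */
if (ret >= 0 && !snapshot_image_loaded(&handle))
        ret = -ENODATA;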
@@ -2491,6 +2576,8 @@ int snapshot_write_next(struct snapshot_handle *handle)
2491 if (error) 2576 if (error)
2492 return error; 2577 return error;
2493 2578
2579 safe_pages_list = NULL;
2580
2494 error = memory_bm_create(&copy_bm, GFP_ATOMIC, PG_ANY); 2581 error = memory_bm_create(&copy_bm, GFP_ATOMIC, PG_ANY);
2495 if (error) 2582 if (error)
2496 return error; 2583 return error;
@@ -2500,6 +2587,7 @@ int snapshot_write_next(struct snapshot_handle *handle)
2500 if (error) 2587 if (error)
2501 return error; 2588 return error;
2502 2589
2590 hibernate_restore_protection_begin();
2503 } else if (handle->cur <= nr_meta_pages + 1) { 2591 } else if (handle->cur <= nr_meta_pages + 1) {
2504 error = unpack_orig_pfns(buffer, &copy_bm); 2592 error = unpack_orig_pfns(buffer, &copy_bm);
2505 if (error) 2593 if (error)
@@ -2522,6 +2610,7 @@ int snapshot_write_next(struct snapshot_handle *handle)
2522 copy_last_highmem_page(); 2610 copy_last_highmem_page();
2523 /* Restore page key for data page (s390 only). */ 2611 /* Restore page key for data page (s390 only). */
2524 page_key_write(handle->buffer); 2612 page_key_write(handle->buffer);
2613 hibernate_restore_protect_page(handle->buffer);
2525 handle->buffer = get_buffer(&orig_bm, &ca); 2614 handle->buffer = get_buffer(&orig_bm, &ca);
2526 if (IS_ERR(handle->buffer)) 2615 if (IS_ERR(handle->buffer))
2527 return PTR_ERR(handle->buffer); 2616 return PTR_ERR(handle->buffer);
@@ -2533,22 +2622,23 @@ int snapshot_write_next(struct snapshot_handle *handle)
2533} 2622}
2534 2623
2535/** 2624/**
2536 * snapshot_write_finalize - must be called after the last call to 2625 * snapshot_write_finalize - Complete the loading of a hibernation image.
2537 * snapshot_write_next() in case the last page in the image happens 2626 *
2538 * to be a highmem page and its contents should be stored in the 2627 * Must be called after the last call to snapshot_write_next() in case the last
2539 * highmem. Additionally, it releases the memory that will not be 2628 * page in the image happens to be a highmem page and its contents should be
2540 * used any more. 2629 * stored in highmem. Additionally, it recycles bitmap memory that's not
2630 * necessary any more.
2541 */ 2631 */
2542
2543void snapshot_write_finalize(struct snapshot_handle *handle) 2632void snapshot_write_finalize(struct snapshot_handle *handle)
2544{ 2633{
2545 copy_last_highmem_page(); 2634 copy_last_highmem_page();
2546 /* Restore page key for data page (s390 only). */ 2635 /* Restore page key for data page (s390 only). */
2547 page_key_write(handle->buffer); 2636 page_key_write(handle->buffer);
2548 page_key_free(); 2637 page_key_free();
2549 /* Free only if we have loaded the image entirely */ 2638 hibernate_restore_protect_page(handle->buffer);
2639 /* Do that only if we have loaded the image entirely */
2550 if (handle->cur > 1 && handle->cur > nr_meta_pages + nr_copy_pages) { 2640 if (handle->cur > 1 && handle->cur > nr_meta_pages + nr_copy_pages) {
2551 memory_bm_free(&orig_bm, PG_UNSAFE_CLEAR); 2641 memory_bm_recycle(&orig_bm);
2552 free_highmem_data(); 2642 free_highmem_data();
2553 } 2643 }
2554} 2644}
@@ -2561,8 +2651,8 @@ int snapshot_image_loaded(struct snapshot_handle *handle)
2561 2651
2562#ifdef CONFIG_HIGHMEM 2652#ifdef CONFIG_HIGHMEM
2563/* Assumes that @buf is ready and points to a "safe" page */ 2653/* Assumes that @buf is ready and points to a "safe" page */
2564static inline void 2654static inline void swap_two_pages_data(struct page *p1, struct page *p2,
2565swap_two_pages_data(struct page *p1, struct page *p2, void *buf) 2655 void *buf)
2566{ 2656{
2567 void *kaddr1, *kaddr2; 2657 void *kaddr1, *kaddr2;
2568 2658
@@ -2576,15 +2666,15 @@ swap_two_pages_data(struct page *p1, struct page *p2, void *buf)
2576} 2666}
2577 2667
2578/** 2668/**
2579 * restore_highmem - for each highmem page that was allocated before 2669 * restore_highmem - Put highmem image pages into their original locations.
2580 * the suspend and included in the suspend image, and also has been 2670 *
2581 * allocated by the "resume" kernel swap its current (ie. "before 2671 * For each highmem page that was in use before hibernation and is included in
2582 * resume") contents with the previous (ie. "before suspend") one. 2672 * the image, and also has been allocated by the "restore" kernel, swap its
2673 * current contents with the previous (ie. "before hibernation") ones.
2583 * 2674 *
2584 * If the resume eventually fails, we can call this function once 2675 * If the restore eventually fails, we can call this function once again and
2585 * again and restore the "before resume" highmem state. 2676 * restore the highmem state as seen by the restore kernel.
2586 */ 2677 */
2587
2588int restore_highmem(void) 2678int restore_highmem(void)
2589{ 2679{
2590 struct highmem_pbe *pbe = highmem_pblist; 2680 struct highmem_pbe *pbe = highmem_pblist;
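[Editor's note] restore_highmem() relies on swap_two_pages_data() above, a classic three-copy exchange through scratch space; the same idea on plain buffers:

#include <string.h>

/* Exchange the contents of two buffers using @buf as scratch space,
 * as swap_two_pages_data() does with kmapped pages and copy_page(). */
static void swap_two_buffers(void *p1, void *p2, void *buf, size_t size)
{
        memcpy(buf, p1, size);  /* stash p1               */
        memcpy(p1, p2, size);   /* p2 overwrites p1       */
        memcpy(p2, buf, size);  /* stashed p1 lands in p2 */
}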
diff --git a/kernel/power/suspend.c b/kernel/power/suspend.c
index 5b70d64b871e..0acab9d7f96f 100644
--- a/kernel/power/suspend.c
+++ b/kernel/power/suspend.c
@@ -266,16 +266,18 @@ static int suspend_test(int level)
266 */ 266 */
267static int suspend_prepare(suspend_state_t state) 267static int suspend_prepare(suspend_state_t state)
268{ 268{
269 int error; 269 int error, nr_calls = 0;
270 270
271 if (!sleep_state_supported(state)) 271 if (!sleep_state_supported(state))
272 return -EPERM; 272 return -EPERM;
273 273
274 pm_prepare_console(); 274 pm_prepare_console();
275 275
276 error = pm_notifier_call_chain(PM_SUSPEND_PREPARE); 276 error = __pm_notifier_call_chain(PM_SUSPEND_PREPARE, -1, &nr_calls);
277 if (error) 277 if (error) {
278 nr_calls--;
278 goto Finish; 279 goto Finish;
280 }
279 281
280 trace_suspend_resume(TPS("freeze_processes"), 0, true); 282 trace_suspend_resume(TPS("freeze_processes"), 0, true);
281 error = suspend_freeze_processes(); 283 error = suspend_freeze_processes();
@@ -286,7 +288,7 @@ static int suspend_prepare(suspend_state_t state)
286 suspend_stats.failed_freeze++; 288 suspend_stats.failed_freeze++;
287 dpm_save_failed_step(SUSPEND_FREEZE); 289 dpm_save_failed_step(SUSPEND_FREEZE);
288 Finish: 290 Finish:
289 pm_notifier_call_chain(PM_POST_SUSPEND); 291 __pm_notifier_call_chain(PM_POST_SUSPEND, nr_calls, NULL);
290 pm_restore_console(); 292 pm_restore_console();
291 return error; 293 return error;
292} 294}
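[Editor's note] The pattern introduced here (and repeated in kernel/power/user.c below) records how many PM notifiers actually ran on the PREPARE event, so a failure only rolls back the callbacks that saw it. Assuming __pm_notifier_call_chain(event, nr_to_call, nr_calls) invokes at most nr_to_call notifiers (-1 meaning all) and reports the count through nr_calls, the shape is:

int error, nr_calls = 0;

error = __pm_notifier_call_chain(PM_SUSPEND_PREPARE, -1, &nr_calls);
if (error)
        nr_calls--;     /* do not notify the callback that failed */

/* ... suspend work ... */

/* Only the nr_calls notifiers that saw PREPARE get the POST event. */
__pm_notifier_call_chain(PM_POST_SUSPEND, nr_calls, NULL);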
diff --git a/kernel/power/swap.c b/kernel/power/swap.c
index c1aaac431055..a3b1e617bcdc 100644
--- a/kernel/power/swap.c
+++ b/kernel/power/swap.c
@@ -350,6 +350,12 @@ static int swsusp_swap_check(void)
350 if (res < 0) 350 if (res < 0)
351 blkdev_put(hib_resume_bdev, FMODE_WRITE); 351 blkdev_put(hib_resume_bdev, FMODE_WRITE);
352 352
353 /*
354 * Update the resume device to the one actually used,
355 * so the test_resume mode can use it in case it is
356 * invoked from hibernate() to test the snapshot.
357 */
358 swsusp_resume_device = hib_resume_bdev->bd_dev;
353 return res; 359 return res;
354} 360}
355 361
diff --git a/kernel/power/user.c b/kernel/power/user.c
index 526e8911460a..35310b627388 100644
--- a/kernel/power/user.c
+++ b/kernel/power/user.c
@@ -47,7 +47,7 @@ atomic_t snapshot_device_available = ATOMIC_INIT(1);
47static int snapshot_open(struct inode *inode, struct file *filp) 47static int snapshot_open(struct inode *inode, struct file *filp)
48{ 48{
49 struct snapshot_data *data; 49 struct snapshot_data *data;
50 int error; 50 int error, nr_calls = 0;
51 51
52 if (!hibernation_available()) 52 if (!hibernation_available())
53 return -EPERM; 53 return -EPERM;
@@ -74,9 +74,9 @@ static int snapshot_open(struct inode *inode, struct file *filp)
74 swap_type_of(swsusp_resume_device, 0, NULL) : -1; 74 swap_type_of(swsusp_resume_device, 0, NULL) : -1;
75 data->mode = O_RDONLY; 75 data->mode = O_RDONLY;
76 data->free_bitmaps = false; 76 data->free_bitmaps = false;
77 error = pm_notifier_call_chain(PM_HIBERNATION_PREPARE); 77 error = __pm_notifier_call_chain(PM_HIBERNATION_PREPARE, -1, &nr_calls);
78 if (error) 78 if (error)
79 pm_notifier_call_chain(PM_POST_HIBERNATION); 79 __pm_notifier_call_chain(PM_POST_HIBERNATION, --nr_calls, NULL);
80 } else { 80 } else {
81 /* 81 /*
82 * Resuming. We may need to wait for the image device to 82 * Resuming. We may need to wait for the image device to
@@ -86,13 +86,15 @@ static int snapshot_open(struct inode *inode, struct file *filp)
86 86
87 data->swap = -1; 87 data->swap = -1;
88 data->mode = O_WRONLY; 88 data->mode = O_WRONLY;
89 error = pm_notifier_call_chain(PM_RESTORE_PREPARE); 89 error = __pm_notifier_call_chain(PM_RESTORE_PREPARE, -1, &nr_calls);
90 if (!error) { 90 if (!error) {
91 error = create_basic_memory_bitmaps(); 91 error = create_basic_memory_bitmaps();
92 data->free_bitmaps = !error; 92 data->free_bitmaps = !error;
93 } 93 } else
94 nr_calls--;
95
94 if (error) 96 if (error)
95 pm_notifier_call_chain(PM_POST_RESTORE); 97 __pm_notifier_call_chain(PM_POST_RESTORE, nr_calls, NULL);
96 } 98 }
97 if (error) 99 if (error)
98 atomic_inc(&snapshot_device_available); 100 atomic_inc(&snapshot_device_available);