gpu: nvgpu: update priv_cmdbuff computation

Update the priv_cmdbuff computation to take into account the amount of
memory semaphores take. Since semaphores always require more memory than
sync-pts the sync-pt computation has been dropped.

Bug 1732449
JIRA DNVGPU-12

Change-Id: Ic05c26b4d1ed9cbd03d3239655c4607bb418396c
Signed-off-by: Alex Waterman <alexw@nvidia.com>
Reviewed-on: http://git-master/r/1141420
Reviewed-by: Terje Bergstrom <tbergstrom@nvidia.com>
Tested-by: Terje Bergstrom <tbergstrom@nvidia.com>
author: Alex Waterman <alexw@nvidia.com> 2016-05-04 13:07:36 -0400
committer: Terje Bergstrom <tbergstrom@nvidia.com> 2016-05-06 15:13:52 -0400
commit: 70d531388205852865d48469cfbd9d0c996acd53
tree: 7b2db040485e844e4015e4dfc8a970b14278fbee /drivers
parent: 1fc23d1280d2777c1a32544e787f257769cf8834
1 files changed, 19 insertions, 11 deletions
diff --git a/drivers/gpu/nvgpu/gk20a/channel_gk20a.c b/drivers/gpu/nvgpu/gk20a/channel_gk20a.c
index 697861e2..0d7a6bec 100644
--- a/drivers/gpu/nvgpu/gk20a/channel_gk20a.c
+++ b/drivers/gpu/nvgpu/gk20a/channel_gk20a.c
@@ -1278,17 +1278,25 @@ static int channel_gk20a_alloc_priv_cmdbuf(struct channel_gk20a *c)
        u32 size;
        int err = 0;
 
-        /* Kernel can insert gpfifos before and after user gpfifos.
-           Before user gpfifos, kernel inserts fence_wait, which takes
-           syncpoint_a (2 dwords) + syncpoint_b (2 dwords) = 4 dwords.
-           After user gpfifos, kernel inserts fence_get, which takes
-           wfi (2 dwords) + syncpoint_a (2 dwords) + syncpoint_b (2 dwords)
-           = 6 dwords.
-           Worse case if kernel adds both of them for every user gpfifo,
-           max size of priv_cmdbuf is :
-           (gpfifo entry number * (2 / 3) * (4 + 6) * 4 bytes */
-        size = roundup_pow_of_two(
-                c->gpfifo.entry_num * 2 * 12 * sizeof(u32) / 3);
+        /*
+         * Compute the amount of priv_cmdbuf space we need. In general the worst
+         * case is the kernel inserts both a semaphore pre-fence and post-fence.
+         * Any sync-pt fences will take less memory so we can ignore them for
+         * now.
+         *
+         * A semaphore ACQ (fence-wait) is 8 dwords: semaphore_a, semaphore_b,
+         * semaphore_c, and semaphore_d. A semaphore INCR (fence-get) will be 10
+         * dwords: all the same as an ACQ plus a non-stalling intr which is
+         * another 2 dwords.
+         *
+         * Lastly the number of gpfifo entries per channel is fixed so at most
+         * we can use 2/3rds of the gpfifo entries (1 pre-fence entry, one
+         * userspace entry, and one post-fence entry). Thus the computation is:
+         *
+         *   (gpfifo entry number * (2 / 3) * (8 + 10) * 4 bytes.
+         */
+        size = roundup_pow_of_two(c->gpfifo.entry_num *
+                                  2 * 18 * sizeof(u32) / 3);
 
        err = gk20a_gmmu_alloc_map(ch_vm, size, &q->mem);
        if (err) {
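
To make the arithmetic in the new comment concrete, here is a minimal user-space sketch of the size computation. It is not the driver code: `roundup_pow2()` is a hypothetical stand-in for the kernel's `roundup_pow_of_two()`, `priv_cmdbuf_size()` is an illustrative helper name, and the entry count of 1024 used in the note below is an assumed value rather than one taken from the commit.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Hypothetical stand-in for the kernel's roundup_pow_of_two().
 * Assumes 0 < n <= 2^31 so the shift cannot overflow.
 */
static uint32_t roundup_pow2(uint32_t n)
{
	uint32_t p = 1;

	while (p < n)
		p <<= 1;
	return p;
}

/*
 * Illustrative helper (not in the driver): bytes of priv_cmdbuf space
 * for a channel with entry_num gpfifo entries, where each user entry
 * may be surrounded by a pre-fence and a post-fence that together take
 * dwords_per_pair dwords. Mirrors the expression in the diff:
 *   entry_num * 2 * dwords_per_pair * sizeof(u32) / 3
 */
static uint32_t priv_cmdbuf_size(uint32_t entry_num, uint32_t dwords_per_pair)
{
	uint32_t per_entry = 2 * dwords_per_pair * (uint32_t)sizeof(uint32_t);

	/* Guard the 32-bit multiplication against overflow. */
	assert(per_entry != 0 && entry_num <= UINT32_MAX / per_entry);

	return roundup_pow2(entry_num * per_entry / 3);
}
```

With an assumed 1024 gpfifo entries, `priv_cmdbuf_size(1024, 12)` (the constant used before this change) gives 32768 bytes, while `priv_cmdbuf_size(1024, 18)` (8 dwords for the ACQ plus 10 for the INCR) gives 49152 bytes before rounding and 65536 bytes after rounding up to a power of two.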