From 401b704e73599a36bfdc8c778dab85f94d74ed1d Mon Sep 17 00:00:00 2001 From: Palmer Dabbelt Date: Fri, 17 Nov 2017 12:14:52 -0800 Subject: Speed up Dhrystone on the HiFive1 There's a handful of things that went wrong here: * The read-only data sections were mapped to flash, which is very slow. I just put them in the data segment, so they end up in the scratchpad. This is about a 10x hit, so it's really important. * The toolchain was an old version, which didn't have a fast memcpy implementation on 32-bit systems. This is about a 2x hit. * Some compiler flags were incorrect, including * -Os instead of -O3 * Missing -mexplicit-relocs * Missing -DNOENUM * Missing -falign-functions=4 I haven't checked how much those hurt With this, I get $ make software BOARD=freedom-e300-hifive1 PROGRAM=dhrystone LINK_TARGET=dhrystone $ make upload BOARD=freedom-e300-hifive1 PROGRAM=dhrystone LINK_TARGET=dhrystone Execution starts, 10000000 runs through Dhrystone Execution ends Final values of the variables used in the benchmark: Int_Glob: 5 should be: 5 Bool_Glob: 1 should be: 1 Ch_1_Glob: A should be: A Ch_2_Glob: B should be: B Arr_1_Glob[8]: 7 should be: 7 Arr_2_Glob[8][7]: 10000010 should be: Number_Of_Runs + 10 Ptr_Glob-> Ptr_Comp: -2147470264 should be: (implementation-dependent) Discr: 0 should be: 0 Enum_Comp: 2 should be: 2 Int_Comp: 17 should be: 17 Str_Comp: DHRYSTONE PROGRAM, SOME STRING should be: DHRYSTONE PROGRAM, SOME STRING Next_Ptr_Glob-> Ptr_Comp: -2147470264 should be: (implementation-dependent), same as above Discr: 0 should be: 0 Enum_Comp: 1 should be: 1 Int_Comp: 18 should be: 18 Str_Comp: DHRYSTONE PROGRAM, SOME STRING should be: DHRYSTONE PROGRAM, SOME STRING Int_1_Loc: 5 should be: 5 Int_2_Loc: 13 should be: 13 Int_3_Loc: 7 should be: 7 Enum_Loc: 1 should be: 1 Str_1_Loc: DHRYSTONE PROGRAM, 1'ST STRING should be: DHRYSTONE PROGRAM, 1'ST STRING Str_2_Loc: DHRYSTONE PROGRAM, 2'ND STRING should be: DHRYSTONE PROGRAM, 2'ND STRING Microseconds for one run through Dhrystone: 1.3 Dhrystones per Second: 714285.6 which is 1.55 DMIPS/MHz at 262 MHz. It's still a bit slower than our current stuff, but I don't remember what was actually in the HiFive1 so I'm not sure what we should be getting. I verified the clock is accurate with a stopwatch. I haven't bothered to go look through the binary, but I think we're about 10 cycles off so it should be managable. --- riscv-gnu-toolchain | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) (limited to 'riscv-gnu-toolchain') diff --git a/riscv-gnu-toolchain b/riscv-gnu-toolchain index 65cb174..bf5697a 160000 --- a/riscv-gnu-toolchain +++ b/riscv-gnu-toolchain @@ -1 +1 @@ -Subproject commit 65cb174d37e2efe99e094eb317113442c79ed3ce +Subproject commit bf5697a1a6509705b50dcc1f67b8c620a7b21ec4 -- cgit v1.2.1-18-gbd029