From 401b704e73599a36bfdc8c778dab85f94d74ed1d Mon Sep 17 00:00:00 2001
From: Palmer Dabbelt
Date: Fri, 17 Nov 2017 12:14:52 -0800
Subject: Speed up Dhrystone on the HiFive1

There's a handful of things that went wrong here:

* The read-only data sections were mapped to flash, which is very slow.
  I just put them in the data segment, so they end up in the scratchpad.
  This is about a 10x hit, so it's really important.
* The toolchain was an old version, which didn't have a fast memcpy
  implementation on 32-bit systems.  This is about a 2x hit.
* Some compiler flags were incorrect, including
  * -Os instead of -O3
  * Missing -mexplicit-relocs
  * Missing -DNOENUM
  * Missing -falign-functions=4
  I haven't checked how much those hurt.

With this, I get

    $ make software BOARD=freedom-e300-hifive1 PROGRAM=dhrystone LINK_TARGET=dhrystone
    $ make upload BOARD=freedom-e300-hifive1 PROGRAM=dhrystone LINK_TARGET=dhrystone
    Execution starts, 10000000 runs through Dhrystone
    Execution ends

    Final values of the variables used in the benchmark:

    Int_Glob:            5
            should be:   5
    Bool_Glob:           1
            should be:   1
    Ch_1_Glob:           A
            should be:   A
    Ch_2_Glob:           B
            should be:   B
    Arr_1_Glob[8]:       7
            should be:   7
    Arr_2_Glob[8][7]:    10000010
            should be:   Number_Of_Runs + 10
    Ptr_Glob->
      Ptr_Comp:          -2147470264
            should be:   (implementation-dependent)
      Discr:             0
            should be:   0
      Enum_Comp:         2
            should be:   2
      Int_Comp:          17
            should be:   17
      Str_Comp:          DHRYSTONE PROGRAM, SOME STRING
            should be:   DHRYSTONE PROGRAM, SOME STRING
    Next_Ptr_Glob->
      Ptr_Comp:          -2147470264
            should be:   (implementation-dependent), same as above
      Discr:             0
            should be:   0
      Enum_Comp:         1
            should be:   1
      Int_Comp:          18
            should be:   18
      Str_Comp:          DHRYSTONE PROGRAM, SOME STRING
            should be:   DHRYSTONE PROGRAM, SOME STRING
    Int_1_Loc:           5
            should be:   5
    Int_2_Loc:           13
            should be:   13
    Int_3_Loc:           7
            should be:   7
    Enum_Loc:            1
            should be:   1
    Str_1_Loc:           DHRYSTONE PROGRAM, 1'ST STRING
            should be:   DHRYSTONE PROGRAM, 1'ST STRING
    Str_2_Loc:           DHRYSTONE PROGRAM, 2'ND STRING
            should be:   DHRYSTONE PROGRAM, 2'ND STRING

    Microseconds for one run through Dhrystone: 1.3
    Dhrystones per Second:                      714285.6

which is 1.55 DMIPS/MHz at 262 MHz.  It's still a bit slower than our
current stuff, but I don't remember what was actually in the HiFive1 so
I'm not sure what we should be getting.  I verified the clock is
accurate with a stopwatch.  I haven't bothered to go look through the
binary, but I think we're about 10 cycles off so it should be
manageable.
---
 software/dhrystone/Makefile | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

(limited to 'software')

diff --git a/software/dhrystone/Makefile b/software/dhrystone/Makefile
index d401720..4602653 100644
--- a/software/dhrystone/Makefile
+++ b/software/dhrystone/Makefile
@@ -5,10 +5,10 @@ C_SRCS := dhry_stubs.c dhry_printf.c
 HEADERS := dhry.h
 
 DHRY_SRCS := dhry_1.c dhry_2.c
-DHRY_CFLAGS := -O2 -DTIME -fno-inline -fno-builtin-printf -Wno-implicit -mcmodel=medany -march=$(RISCV_ARCH) -mabi=$(RISCV_ABI)
+DHRY_CFLAGS := -O3 -DTIME -fno-inline -fno-builtin-printf -Wno-implicit -mcmodel=medany -march=$(RISCV_ARCH) -mabi=$(RISCV_ABI)
 
 XLEN ?= 32
-CFLAGS := -Os -fno-common -mcmodel=medany -march=$(RISCV_ARCH) -mabi=$(RISCV_ABI)
+CFLAGS := -O3 -fno-common -mcmodel=medany -march=$(RISCV_ARCH) -mabi=$(RISCV_ABI) -mexplicit-relocs -DNOENUM -falign-functions=4
 LDFLAGS := -Wl,--wrap=scanf -Wl,--wrap=printf -march=$(RISCV_ARCH) -mabi=$(RISCV_ABI) -mcmodel=medany
 
 DHRY_OBJS := $(patsubst %.c,%.o,$(DHRY_SRCS))
-- 
cgit v1.2.1-18-gbd029
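
Note on the quoted score: the 1.55 DMIPS/MHz figure in the commit message can be reproduced from the reported Dhrystones per second and the 262 MHz clock. The sketch below assumes the usual convention that one DMIPS equals 1757 Dhrystones per second (the VAX 11/780 reference score); the function names are illustrative and not part of the repository.

```python
# Values copied from the benchmark output in the commit message.
DHRYSTONES_PER_SECOND = 714285.6
CLOCK_MHZ = 262.0

# Assumption: the conventional reference score of 1757 Dhrystones/s per DMIPS.
VAX_DHRYSTONES_PER_SECOND = 1757.0


def dmips_per_mhz(dhrystones_per_second, clock_mhz):
    """Normalize a Dhrystone score to DMIPS, then divide by the clock in MHz."""
    dmips = dhrystones_per_second / VAX_DHRYSTONES_PER_SECOND
    return dmips / clock_mhz


def cycles_per_run(dhrystones_per_second, clock_mhz):
    """Clock cycles spent on one pass through the Dhrystone loop."""
    return clock_mhz * 1e6 / dhrystones_per_second


if __name__ == "__main__":
    print(f"{dmips_per_mhz(DHRYSTONES_PER_SECOND, CLOCK_MHZ):.2f} DMIPS/MHz")
    print(f"{cycles_per_run(DHRYSTONES_PER_SECOND, CLOCK_MHZ):.0f} cycles per run")
```

At roughly 367 cycles per run, the "about 10 cycles off" estimate in the message corresponds to a gap of a few percent.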