author | Palmer Dabbelt <palmer@dabbelt.com> | 2017-11-17 12:14:52 -0800 |
---|---|---|
committer | Palmer Dabbelt <palmer@dabbelt.com> | 2017-11-17 15:24:01 -0800 |
commit | 401b704e73599a36bfdc8c778dab85f94d74ed1d | |
tree | 5d4d4851f1a9ff03864ab87de11bacc05804ddbd /software | |
parent | b53187a0434fe5e8dce288f55cfca36b292552e4 |
Speed up Dhrystone on the HiFive1
There's a handful of things that went wrong here:

* The read-only data sections were mapped to flash, which is very slow.
  I just put them in the data segment, so they end up in the scratchpad.
  This is about a 10x hit, so it's really important.
* The toolchain was an old version, which didn't have a fast memcpy
  implementation on 32-bit systems. This is about a 2x hit.
* Some compiler flags were incorrect, including
  * -Os instead of -O3
  * Missing -mexplicit-relocs
  * Missing -DNOENUM
  * Missing -falign-functions=4
  I haven't checked how much those hurt.
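
To illustrate why the memcpy implementation matters on a 32-bit core, here is a minimal sketch contrasting a byte-at-a-time copy with a word-at-a-time copy. This is not the toolchain's actual memcpy; the names `copy_bytes`, `copy_words` and `word_t` are hypothetical, and the `may_alias` attribute assumes GCC or Clang.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Assumption: GCC/Clang extension. A 32-bit type that may alias any object. */
typedef uint32_t __attribute__((may_alias)) word_t;

/* Byte-at-a-time copy: one load and one store per byte. */
void *copy_bytes(void *dst, const void *src, size_t n)
{
    unsigned char *d = dst;
    const unsigned char *s = src;

    assert(n == 0 || (dst != NULL && src != NULL));
    while (n--)
        *d++ = *s++;
    return dst;
}

/* Word-at-a-time copy: moves 4 bytes per load/store pair when both
 * pointers are 4-byte aligned, then finishes the tail byte by byte.
 * Unaligned inputs fall through to the byte loop unchanged. */
void *copy_words(void *dst, const void *src, size_t n)
{
    unsigned char *d = dst;
    const unsigned char *s = src;

    assert(n == 0 || (dst != NULL && src != NULL));
    if ((((uintptr_t)d | (uintptr_t)s) % sizeof(word_t)) == 0) {
        word_t *dw = (word_t *)d;
        const word_t *sw = (const word_t *)s;

        for (; n >= sizeof(word_t); n -= sizeof(word_t))
            *dw++ = *sw++;
        d = (unsigned char *)dw;
        s = (const unsigned char *)sw;
    }
    while (n--)
        *d++ = *s++;
    return dst;
}
```

Dhrystone copies 30-character strings on every iteration, so for aligned buffers the word loop issues roughly a quarter of the loads and stores of the byte loop.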
With this, I get
$ make software BOARD=freedom-e300-hifive1 PROGRAM=dhrystone LINK_TARGET=dhrystone
$ make upload BOARD=freedom-e300-hifive1 PROGRAM=dhrystone LINK_TARGET=dhrystone
Execution starts, 10000000 runs through Dhrystone
Execution ends
Final values of the variables used in the benchmark:
Int_Glob: 5
should be: 5
Bool_Glob: 1
should be: 1
Ch_1_Glob: A
should be: A
Ch_2_Glob: B
should be: B
Arr_1_Glob[8]: 7
should be: 7
Arr_2_Glob[8][7]: 10000010
should be: Number_Of_Runs + 10
Ptr_Glob->
Ptr_Comp: -2147470264
should be: (implementation-dependent)
Discr: 0
should be: 0
Enum_Comp: 2
should be: 2
Int_Comp: 17
should be: 17
Str_Comp: DHRYSTONE PROGRAM, SOME STRING
should be: DHRYSTONE PROGRAM, SOME STRING
Next_Ptr_Glob->
Ptr_Comp: -2147470264
should be: (implementation-dependent), same as above
Discr: 0
should be: 0
Enum_Comp: 1
should be: 1
Int_Comp: 18
should be: 18
Str_Comp: DHRYSTONE PROGRAM, SOME STRING
should be: DHRYSTONE PROGRAM, SOME STRING
Int_1_Loc: 5
should be: 5
Int_2_Loc: 13
should be: 13
Int_3_Loc: 7
should be: 7
Enum_Loc: 1
should be: 1
Str_1_Loc: DHRYSTONE PROGRAM, 1'ST STRING
should be: DHRYSTONE PROGRAM, 1'ST STRING
Str_2_Loc: DHRYSTONE PROGRAM, 2'ND STRING
should be: DHRYSTONE PROGRAM, 2'ND STRING
Microseconds for one run through Dhrystone: 1.3
Dhrystones per Second: 714285.6
which is 1.55 DMIPS/MHz at 262 MHz. It's still a bit slower than our
current stuff, but I don't remember what was actually in the HiFive1 so
I'm not sure what we should be getting. I verified the clock is
accurate with a stopwatch. I haven't bothered to go look through the
binary, but I think we're about 10 cycles off so it should be manageable.
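
The DMIPS/MHz figure can be reproduced from the benchmark output with a short calculation. The sketch below assumes the conventional normalization of 1757 Dhrystones per second (the VAX 11/780 score) per DMIPS, which the commit message does not spell out; the function names are hypothetical.

```c
#include <assert.h>

/* Assumption: the conventional VAX 11/780 reference score,
 * 1757 Dhrystones per second = 1 DMIPS. */
#define VAX_DHRYSTONES_PER_SECOND 1757.0

/* Normalize a Dhrystones-per-second score to DMIPS per MHz of clock. */
double dmips_per_mhz(double dhrystones_per_second, double clock_mhz)
{
    assert(clock_mhz > 0.0);
    return dhrystones_per_second / VAX_DHRYSTONES_PER_SECOND / clock_mhz;
}

/* Average number of clock cycles spent on one Dhrystone iteration. */
double cycles_per_run(double dhrystones_per_second, double clock_mhz)
{
    assert(dhrystones_per_second > 0.0);
    return clock_mhz * 1e6 / dhrystones_per_second;
}
```

With the numbers reported above, `dmips_per_mhz(714285.6, 262.0)` comes out at about 1.55 and `cycles_per_run(714285.6, 262.0)` at about 367 cycles per iteration, which gives a sense of scale for the estimated gap of about 10 cycles.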
Diffstat (limited to 'software')
-rw-r--r-- | software/dhrystone/Makefile | 4 |
1 files changed, 2 insertions, 2 deletions
diff --git a/software/dhrystone/Makefile b/software/dhrystone/Makefile
index d401720..4602653 100644
--- a/software/dhrystone/Makefile
+++ b/software/dhrystone/Makefile
@@ -5,10 +5,10 @@ C_SRCS := dhry_stubs.c dhry_printf.c
 HEADERS := dhry.h
 
 DHRY_SRCS := dhry_1.c dhry_2.c
-DHRY_CFLAGS := -O2 -DTIME -fno-inline -fno-builtin-printf -Wno-implicit -mcmodel=medany -march=$(RISCV_ARCH) -mabi=$(RISCV_ABI)
+DHRY_CFLAGS := -O3 -DTIME -fno-inline -fno-builtin-printf -Wno-implicit -mcmodel=medany -march=$(RISCV_ARCH) -mabi=$(RISCV_ABI)
 
 XLEN ?= 32
-CFLAGS := -Os -fno-common -mcmodel=medany -march=$(RISCV_ARCH) -mabi=$(RISCV_ABI)
+CFLAGS := -O3 -fno-common -mcmodel=medany -march=$(RISCV_ARCH) -mabi=$(RISCV_ABI) -mexplicit-relocs -DNOENUM -falign-functions=4
 LDFLAGS := -Wl,--wrap=scanf -Wl,--wrap=printf -march=$(RISCV_ARCH) -mabi=$(RISCV_ABI) -mcmodel=medany
 
 DHRY_OBJS := $(patsubst %.c,%.o,$(DHRY_SRCS))