|
There's a handful of things that went wrong here:
* The read-only data sections were mapped to flash, which is very slow.
I just put them in the data segment, so they end up in the scratchpad.
This is about a 10x hit, so it's really important.
* The toolchain was an old version, which didn't have a fast memcpy
implementation on 32-bit systems. This is about a 2x hit.
* Some compiler flags were incorrect, including
* -Os instead of -O3
* Missing -mexplicit-relocs
* Missing -DNOENUM
* Missing -falign-functions=4
I haven't checked how much those hurt
With this, I get
$ make software BOARD=freedom-e300-hifive1 PROGRAM=dhrystone LINK_TARGET=dhrystone
$ make upload BOARD=freedom-e300-hifive1 PROGRAM=dhrystone LINK_TARGET=dhrystone
Execution starts, 10000000 runs through Dhrystone
Execution ends
Final values of the variables used in the benchmark:
Int_Glob: 5
should be: 5
Bool_Glob: 1
should be: 1
Ch_1_Glob: A
should be: A
Ch_2_Glob: B
should be: B
Arr_1_Glob[8]: 7
should be: 7
Arr_2_Glob[8][7]: 10000010
should be: Number_Of_Runs + 10
Ptr_Glob->
Ptr_Comp: -2147470264
should be: (implementation-dependent)
Discr: 0
should be: 0
Enum_Comp: 2
should be: 2
Int_Comp: 17
should be: 17
Str_Comp: DHRYSTONE PROGRAM, SOME STRING
should be: DHRYSTONE PROGRAM, SOME STRING
Next_Ptr_Glob->
Ptr_Comp: -2147470264
should be: (implementation-dependent), same as above
Discr: 0
should be: 0
Enum_Comp: 1
should be: 1
Int_Comp: 18
should be: 18
Str_Comp: DHRYSTONE PROGRAM, SOME STRING
should be: DHRYSTONE PROGRAM, SOME STRING
Int_1_Loc: 5
should be: 5
Int_2_Loc: 13
should be: 13
Int_3_Loc: 7
should be: 7
Enum_Loc: 1
should be: 1
Str_1_Loc: DHRYSTONE PROGRAM, 1'ST STRING
should be: DHRYSTONE PROGRAM, 1'ST STRING
Str_2_Loc: DHRYSTONE PROGRAM, 2'ND STRING
should be: DHRYSTONE PROGRAM, 2'ND STRING
Microseconds for one run through Dhrystone: 1.3
Dhrystones per Second: 714285.6
which is 1.55 DMIPS/MHz at 262 MHz. It's still a bit slower than our
current stuff, but I don't remember what was actually in the HiFive1 so
I'm not sure what we should be getting. I verified the clock is
accurate with a stopwatch. I haven't bothered to go look through the
binary, but I think we're about 10 cycles off so it should be managable.
|