Bootloader failure


Hi All,

I support about 100 students using ATMega128 boards with bootloaders, and we use BLIPS as the uploader. This has worked pretty well. However, recently I had two failures where it goes through the motions but fails to verify. If the "Verify after program" box is unticked, to all intents and purposes one would think that the new program has been uploaded. However, the old program remains (without any damage) and runs quite happily.

The bootloader is firmly in control and can be used for executing the application program etc.

Each board had been erased only about 100 times, so this is not a case of exceeding the maximum number of erase cycles.

I will be trying to read the code out of these controllers, to see what has happened to the bootloader to fail in this weird manner, before I try & reflash them.

Has anyone else experienced failures of this type?

Lee de Vries

Charles Darwin, Lord Kelvin & Murphy are always lurking about!
Lee -.-
Riddle me this...How did the serpent move around before the fall?


I presume that you have access to at least one JTAG/ISP programmer? If so, then on the problematic boards just get the bootloader to do an erase without a reprogram, then extract the flash with ISP and see if it has returned to all 0xFFs.
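Checking the extracted image by eye is tedious, so here is a minimal host-side sketch in C of the "all 0xFF" test. It assumes the flash dump has already been loaded into a byte buffer (for example from a binary file saved by the programmer); the function names are made up for illustration and are not part of any programmer's software.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Return the offset of the first byte that is not 0xFF,
 * or len if the whole region reads back as erased. */
size_t first_unerased_byte(const uint8_t *image, size_t len)
{
    assert(image != NULL || len == 0);
    for (size_t i = 0; i < len; i++) {
        if (image[i] != 0xFF) {
            return i;
        }
    }
    return len;
}

/* Non-zero when every byte in the region is 0xFF. */
int region_is_erased(const uint8_t *image, size_t len)
{
    return first_unerased_byte(image, len) == len;
}
```

Run it over the application section only; the bootloader section should of course still contain code. If the erase was refused, the first non-0xFF offset will most likely sit at the very start of the old program.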


Hi Cliff,
Funny about that, I just got out of bed having thought the problem over in my sleep, and came up with the same plan of attack. I thought I would check whether any gurus had given it some thought, and there you were.
Will report my findings!
Lee


LDEVRIES wrote:
Hi All,

I support about 100 students using ATMega128 boards with bootloaders, and we use BLIPS as the uploader. This has worked pretty well. However, recently I had two failures where it goes through the motions but fails to verify. If the "Verify after program" box is unticked, to all intents and purposes one would think that the new program has been uploaded. However, the old program remains (without any damage) and runs quite happily.

The bootloader is firmly in control and can be used for executing the application program etc.

Each board had been erased only about 100 times, so this is not a case of exceeding the maximum number of erase cycles.

I will be trying to read the code out of these controllers, to see what has happened to the bootloader to fail in this weird manner, before I try & reflash them.

Has anyone else experienced failures of this type?

Lee de Vries


I used PonyProg/ISP to read the bootloader & fuses on one of the offending ATMega128 controller boards.
The bootloader was quite OK, but BLB11, BLB12, BLB01 & BLB02 were all programmed (i.e. cleared to 0).
We normally leave these bits unprogrammed. So somehow these fuse bits have inadvertently been cleared, which disables writing to the application section.
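To make it easier to interpret a lock byte read back from a board, here is a small host-side sketch in C. The bit positions (BLB01 in bit 2 up to BLB12 in bit 5) and the meaning of each bit are as I recall them from the ATmega128 datasheet, so please check them against your own copy; the function names are made up for illustration. A programmed bit reads as 0.

```c
#include <stdint.h>

/* Assumed bit positions in the ATmega128 lock byte (verify in the datasheet). */
#define LOCK_BLB01 2  /* 0 = SPM may not write the application section      */
#define LOCK_BLB02 3  /* 0 = LPM in the boot section may not read the app   */
#define LOCK_BLB11 4  /* 0 = SPM may not write the boot loader section      */
#define LOCK_BLB12 5  /* 0 = LPM in the app section may not read the boot   */

/* Helper: non-zero when the given lock bit is programmed (reads as 0). */
static int lock_bit_programmed(uint8_t lock_byte, unsigned bit)
{
    return (lock_byte & (1u << bit)) == 0;
}

/* Non-zero when the bootloader is barred from writing the application. */
int app_write_blocked(uint8_t lock_byte)
{
    return lock_bit_programmed(lock_byte, LOCK_BLB01);
}

/* Non-zero when code in the boot section cannot read the application,
 * which would also make a read-back verify fail. */
int app_read_from_boot_blocked(uint8_t lock_byte)
{
    return lock_bit_programmed(lock_byte, LOCK_BLB02);
}

/* Non-zero when SPM can no longer write the boot loader section. */
int boot_write_blocked(uint8_t lock_byte)
{
    return lock_bit_programmed(lock_byte, LOCK_BLB11);
}
```

With all four BLB bits programmed and LB1/LB2 untouched the lock byte would read 0xC3, and both `app_write_blocked` and `app_read_from_boot_blocked` report true, which would fit the symptoms: the upload appears to run, the verify fails, and the old program stays put.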

When I tried to set the fuses with PonyProg/ISP, they would not reset until I issued a Reset command using PonyProg.

After that, the controller & bootloader worked as normal.

The students do not have access to any other programmer, so they could not have inadvertently changed any fuse bits.
When I get hold of another failed board I will confirm whether the same problem occurred.
I have no idea of the mechanism causing the re-write of the fuses. Open to suggestions.

Lee


My posting, above, seems to be missing my comments (???).

Well, glad you found that the fuse bits are hosed. I don't have an idea of how that happened. As I recall, neither BLIPS nor any other ISP can alter the fuse bits.

Must have been cosmic rays. Or a student plugged in a virgin AVR chip?


Note that while AVRs generally cannot change their own fuses from within running code, there are some where the lock bits *can* be changed by software executing SPM. The fact that these all have a bootloader in them (and hence some SPMs in the BLS) means that, if the code goes rogue, there is a chance it might hit the SPM with the right values for the lock bits to be changed.

Cliff


Hi Cliff,
Unfortunately, the Technical Officer who helps me has just reloaded bootloaders onto about three other controllers that had failed.
I would have to agree with you that it is probably a "branch on telephone numbers" that has caused the fuses to be changed. It would be nice to know "how" SPM might change the fuses.

All these failures occurred in the week before projects were due to be handed in of course (Murphy's law).

Watch this space for updates if I get any further lead on it.

Lee


Quote:

It would be nice to know "how"
SPM might change the fuses.

Search your copy of the 128 datasheet for "Setting the Boot Loader Lock Bits by SPM". In the 06/08 copy of the datasheet this is on page 281.
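As I read that section, the lock bits are written by storing BLBSET and SPMEN in SPMCSR and executing SPM within four clock cycles, with R0 holding the new value: a 0 in bits 5..2 of R0 programs the corresponding boot lock bit. Below is a rough host-side model of that behaviour in C, purely to illustrate why a stray SPM is dangerous. It is not AVR code, `model_spm_lock_write` is a made-up name, it ignores the four-cycle timing window, and the bit numbers are from my memory of the datasheet.

```c
#include <stdint.h>

/* Assumed SPMCSR bit numbers on the ATmega128 (verify in the datasheet). */
#define SPMEN   0
#define BLBSET  3

/* Bits 5..2 of the lock byte hold BLB12, BLB11, BLB02 and BLB01. */
#define BLB_MASK 0x3Cu

/* Model: return the lock byte after an SPM executes with the given
 * SPMCSR and R0 contents. SPM can only clear (program) lock bits;
 * it can never set them back to 1. */
uint8_t model_spm_lock_write(uint8_t lock_byte, uint8_t spmcsr, uint8_t r0)
{
    const uint8_t arm = (uint8_t)((1u << BLBSET) | (1u << SPMEN));

    /* Only the exact BLBSET|SPMEN pattern in the low five bits
     * selects a lock-bit write in this model. */
    if ((spmcsr & 0x1Fu) != arm) {
        return lock_byte;
    }
    /* Zeros in R0 bits 5..2 clear the matching lock bits. */
    return (uint8_t)(lock_byte & (r0 | (uint8_t)~BLB_MASK));
}
```

In this model any R0 value with zeros in bits 5..2 will program lock bits, and nothing the running code does can undo it, so it only takes one unlucky pass through an SPM with the wrong register contents to end up with the symptoms described above.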

Cliff


Thanks Cliff, obviously the mechanism exists. So all one can assume is that the code has gone awry, probably in some common code that they have used.
This gives me a really good starting point for next time it happens.
Thanks guys!
Lee


Lee,

One way to minimize this is to ensure that the entire bootloader contains only ONE spm opcode, which all the various procedures call upon to actually do the self-programming action. Then you wrap it in code that needs certain variables to hold certain "key" values for it to actually operate. The callers set those key values just before it is called and clear them after. If something goes rogue and falls into the routine, it would be unlucky for the key values to be in place (of course it could fall into the calling code, so this is not a perfect protection).
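A minimal sketch of that idea in C, written so it can be run on a PC rather than on the AVR. Everything here is illustrative: the real `spm` instruction is replaced by a stub that just counts calls, the function and variable names are invented, and the key values 0xAA and 0x55 are simply the ones Lee mentions using later in this thread.

```c
#include <stdint.h>

#define KEY_A 0xAAu
#define KEY_B 0x55u

/* Key variables: only valid for the duration of one legitimate call. */
static volatile uint8_t spm_key_a;
static volatile uint8_t spm_key_b;

/* Stand-in for the one and only spm opcode; counts how often it ran. */
static unsigned spm_executed;

static void do_spm_stub(uint8_t command)
{
    (void)command;          /* on the AVR this would load SPMCSR and spm */
    spm_executed++;
}

/* The single guarded entry point. Returns 0 on success, -1 if the keys
 * were not in place. The keys are cleared on every pass, good or bad. */
int guarded_spm(uint8_t command)
{
    int keys_ok = (spm_key_a == KEY_A) && (spm_key_b == KEY_B);

    spm_key_a = 0;
    spm_key_b = 0;
    if (!keys_ok) {
        return -1;
    }
    do_spm_stub(command);
    return 0;
}

/* A legitimate caller: arm the keys immediately before the call. */
int bootloader_spm_request(uint8_t command)
{
    spm_key_a = KEY_A;
    spm_key_b = KEY_B;
    return guarded_spm(command);
}

/* Accessor so a test harness can see how many times the stub ran. */
unsigned spm_count(void)
{
    return spm_executed;
}
```

Code that jumps straight into `guarded_spm` finds the keys cleared and bounces off. As noted above, a stray jump into `bootloader_spm_request` itself would still get through, so this reduces the odds rather than eliminating them.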


Thanks Cliff,
I have now re-written it with the one spm inside a wrapper which is protected with the $aa & $55 keywords, which are initialized in a separate subroutine.
So code that branches on telephone numbers will have to do quite a few things to be able to rewrite the BLB fuses.
There were in fact six spm instructions in the original code.
Regards,
Lee
