What are my options for low-level debugging on the AVR (ATmega128)?


First posting. I hope this is the correct subsection!

 

Quick summary. I am helping a friend port working code from an Arduino based Teensy(3.5) to real hardware which is ATmega128 based. The code (so far) runs fine on the Teensy but crashes and reboots on the real hardware.

 

I have bought an ATmega128 dev board. It has two 10 pin connectors: one for SPI programming; and the other is labelled JTAG. All pins are brought out to headers. It was really good value for around £10 off Ebay. I am programming directly using the usbTiny programmer, and I am not using a boot loader.

 

I have lots of experience with microprocessors and microcontrollers of various types, however much less when it comes to the AVR family. I am comfortable with assembly code, low level monitor debuggers, logic analysers, etc.

 

We are using the MegaCore Arduino plug-in (from GitHub) for the Arduino GUI, which supports a number of non standard AVR cores such as the ATmega128. It is gcc based.

 

We have narrowed down the crash to the first call to lcd.print("message") in code run as part of setup. Could be a memory leak or stack corruption, I have no idea. The serial port sign on message keeps being printed, suggesting a crash and reboot.

 

The Teensy breadboard setup uses an I2C based colour OLED. The OLED uses the Adafruit library which itself uses hardware I2C. It appears to work OK.

 

Try#1.

 

I modified this library for this project to use bit banged I2C, since on the real hardware the hardware I2C pins are connected elsewhere and cannot be freed. There are spare pins brought to a header, so these have been used. This version of the code does not run on the ATmega128: there is no sign on message and the serial based GUI never appears. I was concerned about memory usage, since the ATmega128 only has 4k bytes of RAM, so I tried to optimise away all fixed strings which were needlessly consuming RAM. The available free RAM went up, but we couldn't get the code to run.

 

Try #2.

 

Let's use the 20 x 4 I2C LCD module, I thought. So I modified yet another Adafruit library to make this bit banged too. The example sketch prints to all 4 lines and runs correctly (on both my dev board and my friend's real hardware). So we integrated this code and replaced the OLED code. This is where we are now. No sign of anything being printed on the LCD, just the crash and reboot. If the LCD code is #ifdef'ed out, the code runs and we can operate the simple serial based (VT100) text GUI. The "lcd" (bit banged LCD library) setup is called from setup and so occurs early in the code flow. The crash is the same on both sets of hardware.

 

So I could have easily done something wrong when I modified the Adafruit library, so here was try #3.

 

On my dev board (only) connect up the I2C LCD to the hardware I2C pins and use the Adafruit LCD library. I was expecting this to solve all the problems, but no, it still crashes and reboots.

 

So now I am very puzzled, and unsure how to debug this.

 

One method I have used for other processors is to run a command line based simulator which prints out lots of trace information, such as code executed, writes and reads. The log file can be post processed using Linux tools to extract all reads and writes to look for anything unusual. I did download some code called "avr_sim" which runs under Linux. I gave it the Arduino hex file, it uses CPU time, but generates no output. I am doing something wrong, and really need to find a support group to progress this angle of attack.

 

I note that the dev board has a JTAG connector, however the only JTAG units I have are for Altera and Xilinx FPGAs. It would be great to hear that some clever hackers have managed to get these working with the AVR, but I suspect it is unlikely.

 

Since this looks like some kind of memory corruption, I think I need a low level debug solution.

 

I would appreciate any suggestions, however I would like to minimise cost ($$$), so I don't really want to buy an AVR JTAG pod.

 

 

 



JTAG and you can get one for a few $, let's see if I can find a link for you.

 

 A few here

John Samperi

Ampertronics Pty. Ltd.

https://www.ampertronics.com.au

* Electronic Design * Custom Products * Contract Assembly

Last Edited: Thu. Jun 23, 2022 - 10:16 PM

Welcome!

migry wrote:
One method I have used for other processors is to run a command line based simulator which prints out lots of trace information, such as code executed, writes and reads. The log file can be post processed using Linux tools to extract all reads and writes to look for anything unusual.
Valgrind on most OS other than Windows.

migry wrote:
It would be great to hear that some clever hackers have managed to get these working with the AVR, but I suspect it is unlikely.
The AVR JTAG debug stream has been somewhat reversed (USB bridge to JTAG)

migry wrote:
Since this looks like some kind of memory corruption, I think I need a low level debug solution.
One can debug an application by assertions and/or a logic analyzer.

migry wrote:
so I don't really want to buy an AVR JTAG pod.
MPLAB Snap is relatively inexpensive.

 

P.S.

migry wrote:
ATmega128
AVR DA or AVR DB are the follow-on to mega128.

 


Valgrind Home

due to 

Hardware and software tools for embedded developers | Static Analysis and Metrics Tools (and dynamic analysis) by Jack Ganssle

[near bottom]

Matthew MacClary uses these checkers:

[specific GCC arguments, a linter, Valgrind]

A linter identifies instances of a computer language's ambiguities and known defect patterns (a linter pattern matches); an instance of an ambiguity may be no issue on one CPU whereas is an issue on a different CPU (or computer language run-time, RTOS, or framework)

 

JTAG Debugger (ECE 4760) (Cornell University)

>>> LURA <<<

Don't have a URL for first generation AVR JTAG debuggers as clones of AVRJTAGICE.

 

Adding Automatic Debugging to Firmware for Embedded Systems by Jack Ganssle

EDN - 8 tips for squashing bugs using ASSERT in C - Jacob Beningo

Generating Unique Error Codes | The Embedded Muse 306 by Jack Ganssle

Using Asserts in Embedded Systems | Interrupt (Memfault)

Troubleshooting real-time software issues using a logic analyzer - Embedded.com

May have a spare port on that mega128 PCBA.

 

PG164100 - MPLAB(R) Snap In-Circuit Debugger | Welcome to Microchip Technology | Microchip Technology Inc. (microchipDIRECT)

MPLAB Snap | AVR Freaks

 

Migration from the megaAVR® to AVR® Dx Microcontroller Families

 

"Dare to be naïve." - Buckminster Fuller


Thank you!

 

The low prices are a rather unexpected and pleasant surprise.

 

As I explained, I am unfamiliar with the AVR and various tool sets. My only experience is via pre-built Arduino boards, where you are isolated from the underlying architecture, which is fine for most of my projects.

 

The official Altera JTAG pods are rather expensive, however it is possible to buy a cheap Chinese clone for $10. Some have had issues with them, but I have used them successfully.

 

On Ebay UK I found a UK seller of "AVR JTAG Studio USB Flux Workshop". At around $20 it might be worth a punt.

 

Someone else has listed a "AVR Jtag In-Circuit Emulator Mk-II" for $130. It does however look genuine, nevertheless more than I want to invest, after all I am only helping a friend to port his code.

 

Now the question is, what software tools are able to use the JTAG pods?

Would any of these tools allow me to generate the full trace of program execution?

 

 


The great thing about 128 is that it is one of about the first 10 AVR with JTAG that ever existed, around the turn of the millennium. Atmel made a JTAG debugger for them and ill-advisedly did not "lock" the design. So anyone could clone it (and did) so since then you've been able to get a clone of that JTAG for about $10-$20. They didn't repeat the mistake so after that all AVRs with debug have required an official Atmel/Microchip debug interface.

 

So you are lucky you picked an ancient old dinosaur to work with! 


migry wrote:
Someone else has listed a "AVR Jtag In-Circuit Emulator Mk-II" for $130. It does however look genuine,
Only the Atmel AVRJTAGICE mkII is original; its EOL leads to clones.

migry wrote:
nevertheless more than I want to invest,
Atmel-ICE may be reasonable though MPLAB PICkit 4 is more so.

migry wrote:
what software tools are able to use the JTAG pods?
Nearly numerous; can aid by stating the operating system.

migry wrote:
Would any of these tools allow me to generate the full trace of program execution?
No as AVR OCD lacks trace; in lieu of, an AVR simulator.

edit : AVR OCD does have application data in the stream to the debugger.

 


The ones at Waveshare enclosure Atmel-ICE PCBA :

Atmel-ICE-C, ATMEL-ICE-PCBA, Powerful development tool for debugging and programming Atmel SAM and AVR microcontrollers

East :

Atmel-ice-c Kit -ice-c Development Programming -ice-pcba Pi - AliExpress

West :

Cost Effective Original Atmel-ICE-C Kits PCBA Inside Debugger Supports JTAG SWD | eBay

 

PG164140 - MPLAB PICkit 4 In-Circuit Debugger | Microchip Technology Inc. | World's Largest Inventory of Microchip Products

 

edit :

ATmega128A datasheet

[page 363]

29.16.1. OCDR – On-chip Debug Register

IDR Events | Application Output | Trace | Microchip Studio

 

"Dare to be naïve." - Buckminster Fuller

Last Edited: Sun. Jun 26, 2022 - 07:19 PM

Thank you for your reply and the various links regarding debug methodology.

 

gchapman wrote:

AVR DA or AVR DB are the follow-on to mega128.

 

We want to port the code to existing hardware. The project started out of frustration with the original device and poor support by the vendor. Originally he wanted to make "a better mouse trap" using a Teensy, but I was interested to see if we could port to the old hardware. Thanks to MegaCore we could use the familiar Arduino environment for the ATmega128 which the original hardware uses. Thanks to the vendor we have full schematics, which greatly simplifies things.

 

gchapman wrote:

MPLAB Snap is relatively inexpensive.

 

I realised that Atmel was now part of Microchip, but I had no idea that this product existed. It's pretty cheap, but I already have a PicKit II and III for the now abandoned (by me) PIC family. The $10 options for JTAG ICE from China seem more appealing at the moment.

 

gchapman wrote:

 

The AVR JTAG debug stream has been somewhat reversed (USB bridge to JTAG)

 

Aha! Interesting. I'll have to have a google just out of curiosity. I am very familiar with the JTAG protocol. I could be wrong, but looks like the ICE uses a serial data stream, which it possibly translates into toggling of the JTAG pins. Hmmm, I wonder if anyone has tried to write code to interpret the data stream to make a clone ICE? More googling to do :-)

 

Anyway I'm still on an AVR learning curve. Lots more research to do!


clawson wrote:
So you are lucky you picked an ancient old dinosaur to work with! 
Restocking of AVRe+ may be underway.

 

ATmega128A | Octopart

https://www.microchipdirect.com/product/search/all/AVR128DB64

 

"Dare to be naïve." - Buckminster Fuller


clawson wrote:

So you are lucky you picked an ancient old dinosaur to work with! 

 

There was no choice involved. The old original hardware happened to use this version of the chip. For the development of my friend's new project, I am pretty sure that he will stick with the Teensy.

 

Looks like you can buy an AVR JTAG ICE for cheap from China, so I am now looking forward to seeing whether it can help me debug the program crash.

 

Interestingly the Chinese Altera JTAG pod clones appear to be clones of the original Altera design, which used a PIC micro (if I remember correctly). Perhaps Altera made the same mistake :-) If so that was lucky, as the cheap Altera clone JTAG pods really make the Altera FPGA boards a great low cost choice for learning about this technology.

 

Off-topic. Under Linux, I compiled simavr from sources and finally got this compiled version to run my "Blink" Intelhex Arduino compiled code! It even printed out what was going to go to the serial port to the terminal. It can generate a VCD which can be viewed post simulation, but the way to add traces is less than intuitive and I am having to dig through the source code to understand the syntax. There is a thread on this forum and I will try posting there.


FYI if you just get the $10 JTAG MK1 clone then you are restricted to only a few OLDER chips like the Mega128 (see specs for JTAG debugger).

 

If you are intending on using AVRs then an Atmel ICE would be a great investment, as it will do more than just JTAG debugging, including newer interfaces for newer chips and ARM chips.

 

I built my own JTAG MK1 many years ago, as I needed one for debugging over a 4-day long weekend and I had all the bits in my junk box. I then got more powerful debuggers like the old JTAG ICE MK2, also a few Mk3 and a couple of Atmel ICEs. Now semiretired, so I have given away a few things.

 

Under Windows you have the powerful Atmel Studio 7 for the IDE; under Linux you can use the newer MPLAB-X with the above tools, as well as using the MPLAB PICkit™ 4 or the cheaper MPLAB Snap instead of the above hardware.

John Samperi

Ampertronics Pty. Ltd.

https://www.ampertronics.com.au

* Electronic Design * Custom Products * Contract Assembly


js wrote:
under Linux you can use ...
AVaRICE or Bloom; Python for EDBG (later JTAGICE3, and, subsequent)

Some IDE have an AVR GDB client and/or a Python interface; Microchip Studio has an AVR GDB server.

Visual Studio Code has some AVR; greater integration is forthcoming by the Visual Studio Code Embedded Tools extension.

 

Supported Targets - Bloom (search for ATmega128A)

pyedbglib · PyPI

Microchip Studio 7.0.2542 | AVR Freaks

How do you debug avr code on non-windows platforms (Lazarus IDE for Free Pascal)

AVR in VS Code | AVR Freaks

AVR Studio On Mac & Linux? | AVR Freaks

Visual Studio Code Embedded | AVR Freaks

 

"Dare to be naïve." - Buckminster Fuller


Just curious.....

What did you do to the fuses?

The mega128 has a Mega103 compatibility fuse. If that is not changed, the chip behaves like a mega103 and not a mega128.

It might be that that is acting up......

I have not seen a comment on changing fuses so if they are all at default it might be you need to change a couple more to get the processor running the way it should be for the code.....
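One way to check the compatibility fuse once the fuse bytes have been read back with a programmer is sketched below in Python. It assumes the ATmega128 extended fuse layout from the datasheet (bit 1 = M103C, bit 0 = WDTON) and the usual AVR convention that a fuse bit reads 0 when it is programmed. The helper name and the two example byte values are only for illustration; 0xFD is believed to be the factory default, but that should be confirmed against the datasheet.

```python
# Hypothetical helper: decode the ATmega128 extended fuse byte.
# Assumed layout (verify against the datasheet): bit 1 = M103C, bit 0 = WDTON.
# AVR fuse bits are active low: a 0 bit means "programmed" (feature enabled).

EXT_FUSE_BITS = {1: "M103C", 0: "WDTON"}


def decode_ext_fuse(value):
    """Return {fuse_name: True if programmed} for an extended fuse byte."""
    return {
        name: ((value >> bit) & 1) == 0
        for bit, name in EXT_FUSE_BITS.items()
    }


if __name__ == "__main__":
    for efuse in (0xFD, 0xFF):
        state = decode_ext_fuse(efuse)
        mode = "ATmega103 compatibility" if state["M103C"] else "native ATmega128"
        print(f"efuse=0x{efuse:02X}: M103C programmed={state['M103C']} -> {mode}")
```

If the value read back has bit 1 clear, the chip would be running in ATmega103 compatibility mode and code built for the ATmega128 may not behave as expected.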


Regarding JTAG debuggers.

 

The $10 JTAGICE-1 clone ones only work with old AVRs.  JTAGICE-1 is NOT supported by AS7.0 or MPLABX.

 

SNAP is cheap.  You need to make a cable and a case for it.

PicKit4 is good value.   You need to make a cable.

ATMEL-ICE is fairly expensive.   You need to buy version with cable and adapter.

 

The JTAGICE-1 clones plug straight into your board but need to use obsolete software.

 

The modern alternatives are future-proof.  i.e. they work with modern software and all AVRs, ARMs, ...

 

David.


meslomp wrote:

Just curious.....

What did you do to the fuses?

 

Ha ha! I chuckled when I read your reply. Well what I did was use them to brick the Atmega128 dev board when I first started to play with it. I did say I was new to "stand-alone" AVRs (without the protection of the Arduino GUI or built-in programmer).

 

BTW I did see the fuse you mention, understand that it was for backwards compatibility, and I definitely need to check the datasheet to understand it 100%. It would also be trivial to unset this fuse and see if this fixes the problem. Fingers crossed!

 

Here are more details for those interested.

 

The cheap dev board arrived with a "blink" sketch installed, so the LED blinked when powered. I tried a simple serial sketch, but there was no text in the TeraTerm window. So I copied the standard blink sketch and modified it to flash the two LEDs on this board, using "delay(1000)" for a 1 second on/off delay. I compiled in the Arduino GUI and then found the Arduino build folder in my Temp folder and copied the hex file. I programmed the hex using the usbTiny. No flash. Oh hold on, I saw an LED go on, but it's not flashing. I waited longer, and saw that the LEDs were flashing on and off, just very slowly. In a moment of lucidity, I had a suspicion that this might be a fuse related clock setting, remembering the pain of the PIC16 days and their fuses. Yep, the datasheet explains that the chip is normally supplied with internal oscillator and divided by 8, giving me a 1MHz clock rate (external xtal is 8MHz). So my 1s on/off became 8s on/off and I was too quick to misinterpret the result. Not the first time, I have to admit.
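The effect of building for one clock and running at another can be put into numbers. The Python sketch below assumes that the sketch's timing and UART settings are derived from the clock frequency chosen at compile time (8MHz here), while the chip actually runs at 1MHz. The function names and the 9600 baud figure are hypothetical and only used for illustration.

```python
# Assumption: delays and baud rates are computed for the compile-time clock,
# so they scale with the ratio between assumed and actual clock frequency.

def actual_delay_s(requested_s, f_assumed_hz, f_actual_hz):
    """Real duration of a delay that was calibrated for f_assumed_hz."""
    return requested_s * f_assumed_hz / f_actual_hz


def actual_baud(configured_baud, f_assumed_hz, f_actual_hz):
    """Real UART bit rate when the divider was computed for f_assumed_hz."""
    return configured_baud * f_actual_hz / f_assumed_hz


if __name__ == "__main__":
    print(actual_delay_s(1.0, 8_000_000, 1_000_000))  # 8.0 seconds per second requested
    print(actual_baud(9600, 8_000_000, 1_000_000))    # 1200.0 baud instead of 9600
```

The second figure might also explain the silent serial sketch: a terminal set to the configured rate would see nothing sensible if the UART is really running eight times slower.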

 

So now I needed to program the fuses. Clearly the binary only contained Flash data. I found an online fuse calculator and selected what I thought was an appropriate setting. What was strange is that the fuse values appeared to be inverted (active low?) and I did get a little confused. I was able to find the magic command line command for usbTiny and I programmed the fuses to enable the external xtal, but nothing. Oh, and now usbTiny can't connect. I appear to have bricked my dev board. Oops! So now I was slightly panicking and googled for an answer. I think eventually I found the answer on this forum, which was to connect an external xtal oscillator to the XTAL IN pin. I found an 8MHz oscillator module, wired it up, and connected the output to each of the 3 convenient IC pin socket holes which allow using a different xtal. Connecting to point one, then two, then three, finally the LEDs started to flash, and this time at the correct rate. usbTiny now also was happy to connect. To cut a long story short, after much messing around, trying different "fuse settings" web sites, I finally managed to work out the correct hex value to enable the external 8MHz xtal. Phew! I tried the simple serial sketch again, and this time it worked like a charm. Now I was in a good position to try my friend's code to see if I could help him. I did notice the Mega103 fuse setting, but I left it alone.

 

So it's all a learning curve, and to be honest I find it fun, sometimes frustrating, but I always find it very satisfying when things start to work.


david.prentice wrote:
SNAP is cheap.  You need to make a cable and a case for it.
Several have created 3D printer command files for MPLAB Snap enclosures; IIRC, there are web services for 3D printing.

david.prentice wrote:
PicKit4 is good value.   You need to make a cable.
The ones at Waveshare package an MPLAB PICkit 4 with a PIC cable.

david.prentice wrote:
The JTAGICE-1 clones plug straight into your board but need to use obsolete software.
AVaRICE was moved to GitHub along with AVRDUDE; AVaRICE is an AVR GDB server.

david.prentice wrote:
The modern alternatives are future-proof.
AVRxt and UPDI are a pair that will exist for quite sometime.

 


MPLAB SNAP Case | Microchip

leads to

"mplab snap"-things - Search - Thingiverse

 

PICkit 4 - Waveshare Wiki

 

https://github.com/avrdudes/avarice/blob/main/src/jtag1.h

 

Instruction Set Summary | AVR® Instruction Set Manual

 

"Dare to be naïve." - Buckminster Fuller


david.prentice wrote:

Regarding JTAG debuggers.

 

Thank you David, that is a very helpful posting. You might have saved me wasted time and money. I see that SNAPs can be found on Ebay UK for around $50 to $70. A little more than I want to pay, but I'll have a think.

 

 


migry wrote:
...usbTiny ...

I found an 8MHz oscillator module, wired it up, and connected the output to each of the 3 convenient IC pin socket holes which allow using a different xtal.

Pololu - 5.10. Using the clock output to revive AVRs for Pololu USB AVR Programmer v2.1 though several times the price of a USBtiny.

Overview | USBtinyISP | Adafruit Learning System

 


[TUT][SOFT] Recovering from a "locked out" AVR | AVR Freaks

 

"Dare to be naïve." - Buckminster Fuller


migry wrote:
I see that SNAPs can be found on Ebay UK for around $50 to $70. A little more than I want to pay, but I'll have a think.

 

You should buy tools and evaluation boards from regular Distributors.

 

I can often buy cheaper from Farnell UK than from Microchip.  And the packet arrives within about 16 hours.

 

Chinese goods are cheaper on Ebay but genuine Atmel, ST, TI,  ... are cheaper from the Distributor.


Just FYI. 

 

I found a UK seller selling an "as new" SNAP module for £20 plus postage. Seemed a reasonable price, so I have bought it.

 

BTW, I live in the UK. I tried to find a way to edit my location in my profile, but couldn't find where to set it.

 

This is only a hobby for me, so I tend to avoid manufacturers' official programmers, as they tend to be a little pricey. I cheaped out (some time ago) by buying the usbTiny, which until a few days ago, I hardly ever used. I know that there are better programmers, likely costlier, but this cheap programmer has worked for me.

 

Having said that I do have a genuine PicKit II and III since Microchip were selling them reasonably priced, and the III was heavily discounted when I bought it. If I was planning to work with AVRs more regularly, then even the SNAP at full retail cost I would have considered very reasonable. 

 

Since I'm helping a friend for a one off "project", then I want to keep any costs as low as possible. My main cost will be the number of hours I spend trying to find a solution to the crash, but since this is a hobby, I will get the satisfaction of helping my friend out, which makes it worthwhile for me, as he has certainly helped me often in the past.

 

I don't want to sound like I am trying to be awkward and disagree, but I have bought various small and cheap FPGA and CPU dev boards from China. Yes there is a delay in getting the boards, but the saving over buying in the UK (if even available here) is too tempting for me to pass up. I will admit that I can be a bit of a cheapskate! If I was doing this professionally, then yes, I would buy official products.


Seriously.   Buying XMINI, XPRO, Curiosity Evaluation boards are cheap.   And they are cheap from Distributors.

 

Ok,  my SNAP was very cheap i.e. when they first came out.

 

I could buy this new for £27.36 + VAT https://uk.farnell.com/microchip/pg164100/snap-in-ckt-debugger-mcu/dp/2915518

I would avoid a secondhand item from Ebay.

 

Yes,  Arduino clones,  small modules, ... are excellent quality and price from China.

You just have to compare what you get for your money.

 

David.




I live in the UK. I tried to find a way to edit my location in my profile, but couldn't find where to set it 

I added UK in your location, you can add anything else you want. DON'T FORGET to SAVE the new settings if you make any changes.

 

John Samperi

Ampertronics Pty. Ltd.

https://www.ampertronics.com.au

* Electronic Design * Custom Products * Contract Assembly


You click on your username in the top-right corner.  Then you hit [Account Settings].


Thank you! I had actually visited this page, but didn't make the connection. I was assuming that this information was set in "Accounts".

 

I thought it best to indicate my location, especially since I got told off in another forum for not indicating that I lived in the UK.

 

So far all the replies have been very helpful. This is a pleasant surprise since I now get nervous about posting in a new forum due to some negative experiences elsewhere. It may not appear so, but I have lots of experience in the micro processor/controller arena, but when it comes to the AVR I am very much a noob.


OK.  He's sucked in.  Let the hazing begin!


OK I have made some progress. I will give details in case the methodology is of interest to others in a similar position. This is just my approach, YMMV.

 

So I wanted to use a technique which I have used in the past, where I run the code using a software simulator and get a full trace of every instruction executed, with memory and I/O read and write information. The (very) large log file can be manipulated using various unix tools to look for unusual activity.

 

I did some googling and having eliminated various candidates, I came across "simavr" (GitHub link).

 

So good points were: source code was available if modifications were to be needed; it supports the ATmega128; recent GitHub activity suggesting ongoing development and/or support; advanced capabilities for hooking up peripherals.

 

Bad points: lack of a beginner's "getting started" guide; if compiling from scratch, extra support libraries need to be downloaded.

 

My debug environment is a Ubuntu machine running in a VM (Virtual Machine) under Windows 10.

 

I used the software download GUI and installed the "simavr" binary. I copied over the output files from the "arduino_build" folder under Win10. Executing the command in a shell gives basic usage information. The good news is that it can read in Intelhex, so this is perfect because the arduino_build folder contains an Intelhex file. You have to define the CPU type, however the usage does not tell you what this switch is! It is "-mcu <cpu_type>". You also have to give the CPU frequency, but again this usage does not indicate what the  switch is (it's "-freq"). That's a minor quibble, easily fixed in the source. I also copied over my "Blink" sketch and used this as a starting point.

 

migry@Ubuntu:~/AVR$ simavr -mcu atmega128 -freq 16000000 -ff Blink/Blink.ino.hex 
Loaded 1 section of ihex
Load HEX flash 00000000, 2566

The CPU went to full/half(?)  throttle and nothing happened. I had to hit ctrl-C.

 

A little more googling explained that this tool can link with gdb (actually avr-gdb). This isn't clearly explained in the GitHub or supplied PDF manual, but information is found in various blogs. It connects via a port (1234) to gdb. This is new to me. Unfortunately I have very rarely used gdb, so I have no idea of even the simplest commands. So a bit of a learning curve, and TBH I was reluctant to dive into gdb. I'm sure that it is an excellent tool, but I didn't pursue this avenue. I also had no idea if gdb could give the trace I was looking for. The "simavr" tool does have a trace switch, but when I tried it, nothing changed, and there was no output file. Also the usage does not give a version, so I had no idea which binary version had been installed.

 

So I decided to bite the bullet and compile the source from scratch. This was a simple download from GitHub. I've "built" tools like this before under Linux. Some go smoothly and some are a nightmare. YMMV. This build/make went reasonably well, but I did have to find the missing libraries and install them, but this required more googling to find out where they were. I should have kept notes, to share with others. Perhaps I will go back and try again in order to create some notes to help others.

 

This version appears to be called "run_avr" and has quite a few extra options, but still no indication of version. In particular there is now support for input and output VCD (Value Change Dump). You can add traces to be sent to the VCD, but once again the syntax is not clear. I had to dive into the source code to try to figure out what the syntax was and what options were allowed. I'm still not sure. At least "-mcu" now "--mcu" and "-freq" now "--freq" are shown in the usage.

 

 

 

migry@Ubuntu:~/AVR$ ./src/simavr/run_avr --mcu atmega128 --freq 16000000 -ff Blink/Blink.ino.hex 
Loaded 1 section(s) of ihex
Load HEX flash 00000000, 2566 at 00000000
Hello..
Hello..
Hello..
Hello..
Hello..
^Csignal caught, simavr terminating

I hit ctrl-C to stop it. Woah! The output which would go to the serial port now appears in the shell. Wow! It's working. I did try to add signals to the VCD, but failed miserably, so the output VCD was always empty. When I added the trace option, I was clearly told that this needed to be enabled in the makefile and the project re-built. I did so by uncommenting a line in the makefile. This was painless.

 

Loaded 1 section(s) of ihex
Load HEX flash 00000000, 2566 at 00000000
avr_run_one: 0000: jmp 0x0000a7
avr_run_one: 014e: clr r1[00]
014e: 									SREG = .Z......
                                       ->> r1=00 
avr_run_one: 0150: out SREG, r1[00]
0150: 									SREG = ........
                                       ->> SREG=00 
avr_run_one: 0152: ldi YL, 0xff
                                       ->> YL=ff YH=00 
avr_run_one: 0154: ldi YH, 0x10
                                       ->> YL=ff YH=10 
avr_run_one: 0156: out SPH, YH[10]
                                       ->> SPL=ff SPH=10 
avr_run_one: 0158: out SPL, YL[ff]
                                       ->> SPL=ff SPH=10 
avr_run_one: 015a: ldi r17, 0x01
                                       ->> r17=01 
avr_run_one: 015c: ldi XL, 0x00
                                       ->> XL=00 XH=00 
avr_run_one: 015e: ldi XH, 0x01
                                       ->> XL=00 XH=01 
avr_run_one: 0160: ldi ZL, 0xe6
                                       ->> ZL=e6 ZH=00 
avr_run_one: 0162: ldi ZH, 0x09
                                       ->> ZL=e6 ZH=09 
avr_run_one: 0164: ldi r16, 0x00
                                       ->> r16=00 
avr_run_one: 0166: out io:5b, r16[00]
                                       ->> io:5b=00 
avr_run_one: 0168: rjmp .2 [016e]
avr_run_one: 016e: cpi XL[00], 0x20
016e: 									SREG = C.N.S...
avr_run_one: 0170: cpc XH[01], r17[01] = ff
0170: 									SREG = C.N.SH..
avr_run_one: 0172: brne .-5 [016a]	; Will branch

OK this is exactly what I am looking for! Brilliant! I have a tiny amount of familiarity with the AVR instruction set and architecture, but using the list file from the arduino_build folder to cross reference helped enormously (it shows the source C together with the compiled assembly).
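The startup trace above can be cross-checked with a little arithmetic. The Python sketch below assumes this is the usual avr-libc data copy loop, with Z as the flash source (0x09e6), X as the RAM destination (0x0100), and the cpi/cpc pair comparing X against the end address 0x0120. It also assumes that the initial values for .data are the last bytes of the hex image (2566 bytes were loaded). The function name is hypothetical.

```python
# Values are taken from the Blink trace; the interpretation is an assumption.

def data_section(z_start, x_start, x_end, image_size):
    """Estimate the size of the initialised-data copy and where it ends in flash."""
    size = x_end - x_start          # bytes copied from flash to RAM
    flash_end = z_start + size      # first flash address after the copied block
    return {
        "size": size,
        "flash_end": flash_end,
        "ends_at_image_end": flash_end == image_size,
    }


if __name__ == "__main__":
    blink = data_section(z_start=0x09E6, x_start=0x0100, x_end=0x0120,
                         image_size=2566)
    print(blink)  # {'size': 32, 'flash_end': 2566, 'ends_at_image_end': True}
```

The numbers agree, which suggests the trace is being read correctly. Applied to a larger image, the same subtraction gives a rough idea of how much of the 4k RAM is taken by initialised data before the program even starts.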

 

It has taken me a little time to understand the output format, and figure out what are reads and writes, but it's starting to make sense. Since I have the source I can re-format the output if I really need to. So I extracted all the writes and then all the I/O accesses. I used sort and uniq to get a condensed picture of CPU activity.
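For anyone who prefers a script to a grep/sort/uniq pipeline, the same kind of summary can be sketched in Python. This assumes the trace format shown above, where each "->>" line lists the registers or I/O locations reported after an instruction. Note that such a line can list both halves of a register pair even when only one half changed, so the counts are "times reported", not strictly "times written". The function names and the embedded sample are only for illustration.

```python
import re
from collections import Counter

# Lines that report state after an instruction look like: "   ->> SPL=ff SPH=10"
REPORT_LINE = re.compile(r"^\s*->>\s+(.*)$")
NAME_VALUE = re.compile(r"(\S+)=([0-9a-fA-F]+)")


def count_reported(lines):
    """Count how often each register or I/O name appears in '->>' lines."""
    counts = Counter()
    for line in lines:
        match = REPORT_LINE.match(line)
        if match:
            for name, _value in NAME_VALUE.findall(match.group(1)):
                counts[name] += 1
    return counts


def io_only(counts):
    """Keep only the entries the trace labels as raw I/O addresses (io:xx)."""
    return {name: n for name, n in counts.items() if name.startswith("io:")}


SAMPLE = """\
avr_run_one: 0156: out SPH, YH[10]
                                       ->> SPL=ff SPH=10
avr_run_one: 0158: out SPL, YL[ff]
                                       ->> SPL=ff SPH=10
avr_run_one: 0166: out io:5b, r16[00]
                                       ->> io:5b=00
""".splitlines()

if __name__ == "__main__":
    counts = count_reported(SAMPLE)
    for name, n in sorted(counts.items()):
        print(f"{n:6d} {name}")
    print(io_only(counts))
```

On a real log the list of lines would come from the trace file instead of SAMPLE; anything that shows up with an unexpected name or an unusually high count is a candidate for a closer look.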

 

Now I moved over to the firmware which I wanted to debug.

 

migry@Ubuntu:~/AVR$ ./src/simavr/run_avr  --mcu atmega128 --freq 16000000 -ff ZZ210/ZZ210.hex 
Loaded 1 section(s) of ihex
Load HEX flash 00000000, 17578 at 00000000
.[2JTechnoloya Teensy Controller.
Version 16-Apr-2022.
r0=65 r1=00 r2=00 r3=00 r4=00 r5=00 r6=00 r7=00 r8=00 r9=00 r10=00 r11=00 r12=00 r13=00 r14=00 r15=00
r16=00 r17=03 r18=01 r19=80 r20=0d r21=00 r22=36 r23=01 r24=f7 r25=02 XL=f8 XH=02 YL=90 YH=03 ZL=65 ZH=67 
Y+00=00 Y+01=00 Y+02=00 Y+03=00 Y+04=00 Y+05=00 Y+06=00 Y+07=00 Y+08=00 Y+09=00 Y+10=00 Y+11=10 Y+12=0e Y+13=10 Y+14=0e Y+15=03 Y+16=00 Y+17=0a Y+18=00 Y+19=0a 
*** CYCLE 49366949PC ceca
Segmentation fault (core dumped)

 

OK, so this is great. The code crashes, just like it does on the real ATmega128 hardware (both professional kit and also my cheap Chinese dev board). Hmmm, it does appear to crash "simavr" too :-)

 

So now I realised something. The latest version of the code supports ".elf" format, and this format of file is also created in the arduino_build folder. Let's try it and enable tracing...

 

0000: __vectors                 jmp 0x000394
0728: __dtors_end               clr r1[00]
0728: 									SREG = .Z......
                                       ->> r1=00 
072a: __dtors_end               out SREG, r1[00]
072a: 									SREG = ........
                                       ->> SREG=00 
072c: __dtors_end               ldi YL, 0xff
                                       ->> YL=ff YH=00 
072e: __dtors_end               ldi YH, 0x10
                                       ->> YL=ff YH=10 
0730: __dtors_end               out SPH, YH[10]
                                       ->> SPL=ff SPH=10 
0732: __dtors_end               out SPL, YL[ff]
                                       ->> SPL=ff SPH=10 
0734: __do_copy_data            ldi r17, 0x02
                                       ->> r17=02 
0736: __do_copy_data            ldi XL, 0x00
                                       ->> XL=00 XH=00 
0738: __do_copy_data            ldi XH, 0x01
                                       ->> XL=00 XH=01 
073a: __do_copy_data            ldi ZL, 0xd2
                                       ->> ZL=d2 ZH=00 
073c: __do_copy_data            ldi ZH, 0x42
                                       ->> ZL=d2 ZH=42 
073e: __do_copy_data            ldi r16, 0x00
                                       ->> r16=00 
0740: __do_copy_data            out io:5b, r16[00]
                                       ->> io:5b=00 
0742: __do_copy_data            rjmp .2 [0748]
0748: __do_copy_data            cpi XL[00], 0xd8
0748: 									SREG = C....H..
074a: __do_copy_data            cpc XH[01], r17[02] = fe
074a: 									SREG = C.N.SH..
074c: __do_copy_data            brne .-5 [0744]	; Will branch

OMG! It's got even better. I can now clearly see in which function the code is being executed from!

 

I can compare to the arduino_build list file, which I edit in Gvim.

 

 

I'm liking "simavr" more and more! Now I feel that I have the correct tool to allow me to progress the real debug.

 

I hope this might be interesting to readers, or useful to anyone finding this thread.

 

 


I will mention just one thing I found unusual about "simavr" and the output format. AVR experts may be able to clarify.

 

07a6: setup                     call 0x0004a8
                                       ->> SPL=f9 SPH=10
0950: _Z14hardware_setupv       ldi r22, 0x02
                                       ->> r22=02
0952: _Z14hardware_setupv       ldi r24, 0x07
                                       ->> r24=07
0954: _Z14hardware_setupv       call 0x0016f9
                                       ->> SPL=f7 SPH=10
2df2: pinMode                   push YL[90] (@10f6)
                                       ->> SPL=f6 SPH=10

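At first I thought that there might be a bug with the call and jmp addresses. The call gives an address, but this isn't what is printed on the next line. I quickly determined that the two differ by a one-bit shift: the address printed on the next line is the call operand doubled (e.g. call 0x0016f9 lands at 2df2).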

 

Interestingly the list file from the arduino_build folder uses the same addresses as shown in the "simavr" trace log.

 

 

Now I know from the datasheet that the ATmega128 has 64k words of flash, i.e. 128k bytes.

It looks as if "simavr" and the Arduino list file are showing the address as a byte address and not a word address. Then "simavr" shows the call and jmp destination as a word address.

I later had to look up the ELPM instruction, which appears to allow accessing the ROM using byte addressing.

 

It is just a shame, since I might want to search on the call hex value, but this won't work because the address printed for the function is the hex value doubled. Since the names are extracted from the ".elf" this isn't really a problem, but it seems inconsistent. I did wonder why the Arduino list file also used byte addressing for the addresses, however I can see that the call assembly also shows the byte address rather than the word address.
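
To make the relationship concrete, here is a minimal C sketch of the conversions as I understand them. The function names are made up for illustration and are not part of "simavr" or avr-gcc. The ELPM part assumes what the datasheet describes for the ATmega128: a 17-bit byte address split into bit 0 of RAMPZ (the io:5b register written in the start-up trace) and the 16-bit Z pointer.

```c
#include <assert.h>
#include <stdint.h>

/* Word address (call/jmp operand in the trace) -> byte address
   (line prefix in the trace and in the list file). */
uint32_t word_to_byte_addr(uint32_t word_addr)
{
    assert(word_addr <= 0xFFFFul);      /* ATmega128: 64K words of flash */
    return word_addr << 1;
}

/* Byte address -> word address. */
uint32_t byte_to_word_addr(uint32_t byte_addr)
{
    assert((byte_addr & 1ul) == 0);     /* instructions start on even byte addresses */
    return byte_addr >> 1;
}

/* For ELPM the 17-bit byte address is split into RAMPZ bit 0 and the Z pointer. */
uint8_t elpm_rampz(uint32_t byte_addr)
{
    assert(byte_addr <= 0x1FFFFul);     /* 128K bytes of flash */
    return (uint8_t)(byte_addr >> 16);
}

uint16_t elpm_z(uint32_t byte_addr)
{
    return (uint16_t)(byte_addr & 0xFFFFul);
}
```

With the values from the trace, word_to_byte_addr(0x16f9) gives 0x2df2 (pinMode) and word_to_byte_addr(0x04a8) gives 0x0950 (hardware_setup), so halving the address in the left-hand column should give something searchable against the call operands.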

 

Perhaps someone can give some further explanation?

Is there an accepted standard in the AVR world?


Surely you have the actual C, C++, ... source code for your ATmega128 project.

And you can create and save a copy of the generated ELF file.

 

So you would study this high level source code first.

Run the ELF file in a suitable IDE.

Perhaps you set breakpoints and examine the program variables, memory, AVR registers, ...

 

I understand that you have bought a SNAP debugger.   It should work nicely in AS7.0 running on your Windows 10 PC.

 

It should also work in MPLABX (on Linux) but the debugging will be less pleasant.

 

Seriously.   Porting code from a Teensy 3.5 to an ATmega128 is largely a question of using appropriate variables and expressions.

e.g. native 32-bit int expressions on an ARM seldom overflow or underflow.   Native 16-bit calculations on an AVR require care.

e.g. ARM has a nice flat address space.   AVR has separate Flash, EEPROM, SRAM, areas.

 

David.


migry, nice work.   You could also run sim_avr under gdb and when it seg-faults use gdb "up" command until you get to the simavr code where it happened, then look at the sim variables.

 

EDIT: when I say under gdb, I don't mean using the simavr gdb-target mode, but the main program under gdb.  You'll have to compile and link simavr with "-g".

 

I don't think gdb can do the type of trace you want.  You can trace specific locations and log those to a file.

 

EDIT: I looked into these years ago and ended up (mostly for fun) writing my own simulator.   I have breakpoints, tracing, and working on trap.  Years away from github though.

Last Edited: Sun. Jun 26, 2022 - 05:58 PM

David, thank you for replying. I do not want to appear ungrateful for your help and pointers, since I think I am heading in the right direction. I have addressed your concerns below (hopefully).

 

david.prentice wrote:

Surely you have the actual C, C++, ... source code for your ATmega128 project.

And you can create and save a copy of the generated ELF file.

 

Yes, we wrote the source code from scratch. It served as a way to help my friend learn to code with C. We have also coded in a defensive manner to avoid standard issues such as array overflow (use of strcpy etc.).

When we compile for the ATmega128 in the Arduino GUI by selecting the MegaCore plug-in we get all the standard files, Intelhex and Elf. I was using the ".elf" with "simavr" in order to get the symbols.

 

david.prentice wrote:

So you would study this high level source code first.

 

Well it's our code and we know it very well. We wrote (well, I forced my friend to write) a test plan for the end of phase#1, and he ran these tests, recording bugs and other issues. I am not a software developer by trade, but I have tried to help my friend work in a more structured way to minimise any bugs as we go along.

When it comes to the Adafruit LCD I2C library(*), I have studied the code, and here I am less comfortable because I am a C++ noob (well possibly more advanced than noob, but I make no claims).

 

(*)This appears to trigger the crash.

 

david.prentice wrote:

Run the ELF file in a suitable IDE.

Perhaps you set breakpoints and examine the program variables, memory, AVR registers, ...

 

OK. I am using "simavr" as a simulator. It has no GUI, but I actually prefer this.

Please bear in mind, I am not familiar with the AVR world, and so I have no knowledge of the various tools which AVR experts would use in my position. That's why I posted, to ask for advice. While I can google to find tools, there's nothing like getting advice from experts who can recommend specific tools.

 

david.prentice wrote:

I understand that you have bought a SNAP debugger.   It should work nicely in AS7.0 running on your Windows 10 PC.

It should also work in MPLABX (on Linux) but the debugging will be less pleasant.

 

OK thank you for the suggestion. AS7 sounds like the tool I need to use (and learn how to use).

 

Once I have the JTAG SNAP module, I will certainly install Atmel Studio.

If it already has a simulation capability, then I don't know this.

 

I will use the JTAG module to stop (real hardware) before the "lcd.print("crash")" line and examine memory which I have identified as relevant from the "simavr" work. I could possibly already do this using "simavr" and connecting/controlling it via gdb, but I was reluctant to go this route, simply because I don't know this tool. Some might criticise me for being lazy and tell me to just learn the damn tool, but I will decide when I am ready to do this, possibly if other avenues don't pan out. I literally just had a play with avr-gdb. It has so many commands (from help) that I am floundering. I will need to find a good YT video to learn the basics.

 

david.prentice wrote:

Seriously.   Porting code from a Teensy 3.5 to an ATmega128 is largely a question of using appropriate variables and expressions.

e.g. native 32-bit int expressions on an ARM seldom overflow or underflow.   Native 16-bit calculations on an AVR require care.

e.g. ARM has a nice flat address space.   AVR has separate Flash, EEPROM, SRAM, areas.

 

I am knowledgeable enough to know that the code should port easily. That's an advantage of using the Arduino environment, support code and libraries, they usually have been well tested on many variants of CPU (both Teensy and AVR).

 

We have been careful in coding, w.r.t. variable types and sizes, only using things like uint32_t, uint16_t, int8_t, etc. to try to avoid portability bugs, but this is not an area of expertise, so yes there could be problems.
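
As a small illustration of the width issue David describes (a sketch, not code from our project; the function names are made up): on the AVR an int is 16 bits, so a product that fits comfortably on the Teensy can wrap. Below, the 16-bit result is modelled with uint16_t so the effect is the same on any host.

```c
#include <assert.h>
#include <stdint.h>

/* What a 16-bit multiply keeps: only the low 16 bits of the product. */
uint16_t mul_u16_truncated(uint16_t a, uint16_t b)
{
    return (uint16_t)((uint32_t)a * b);
}

/* Portable version: widen first, so AVR and ARM give the same answer. */
uint32_t mul_u32(uint16_t a, uint16_t b)
{
    return (uint32_t)a * (uint32_t)b;
}

/* Debug variant: trips the assert if the 16-bit result would have wrapped. */
uint16_t mul_u16_checked(uint16_t a, uint16_t b)
{
    uint32_t wide = (uint32_t)a * b;
    assert(wide <= UINT16_MAX);
    return (uint16_t)wide;
}
```

For example, 1000 * 100 is 100000 when widened, but only 34464 survives in 16 bits. Using explicit widths like this is what we have tried to do throughout.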

 

My biggest concern was the difference in the amount of RAM, and initially I thought that this was certainly the reason for the crash. I no longer think this. This concern caused us to thoroughly go through the code base and move all string constants to ROM (PROGMEM in Arduino speak). On the Teensy "const char" data is automatically put into ROM, but not with the ATmega128 version of the compiler. This required code changes (mainly adding PROGMEM for string constants). We now have plenty of free RAM (~2.5k out of 4k).
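
For anyone following along, the change looks roughly like this. It is a sketch rather than our actual code: copy_from_flash is a made-up helper, and the #else branch is only an assumption of mine so that the same file also builds on a PC, where flash and RAM share one address space. On the AVR side PROGMEM and pgm_read_byte come from avr-libc; in a real Arduino sketch the F() macro or strcpy_P would usually do the same job.

```c
#include <assert.h>
#include <stddef.h>

#ifdef __AVR__
#include <avr/pgmspace.h>               /* avr-libc: PROGMEM, pgm_read_byte */
#else
#define PROGMEM                         /* host build: nothing special needed */
#define pgm_read_byte(p) (*(const unsigned char *)(p))
#endif

/* Kept in flash on the AVR instead of being copied into RAM at start-up. */
const char sign_on[] PROGMEM = "Version 16-Apr-2022.";

/* Copy a flash-resident string into a RAM buffer, always NUL-terminating.
   Returns the number of characters copied (excluding the terminator). */
size_t copy_from_flash(char *dst, size_t dst_size, const char *src_flash)
{
    size_t n = 0;
    assert(dst_size > 0);
    while (n + 1 < dst_size) {
        char c = (char)pgm_read_byte(src_flash + n);
        if (c == '\0') {
            break;
        }
        dst[n++] = c;
    }
    dst[n] = '\0';
    return n;
}
```

The point is that a PROGMEM string cannot be read through an ordinary pointer on the AVR, so every place that used the string had to be touched, not just its definition.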

 

Once we #ifdef out the "lcd" module, the code runs, we can operate the VT100-like serial GUI, and the peripherals on the real hardware are all operating as we expect. 

Again initially I thought the bug was in my modification of the Adafruit I2C library (uses software bit bang I2C), but using the original Adafruit library (uses hardware I2C) the crash still happens.

 

Ironically there is no LCD (or OLED) on the real hardware. The OLED was added to the Teensy version of the project to give some sort of status screen. For the ATmega128 the LCD was wanted only for printing debug messages to the LCD to help during code development. We are now running various tests on the real hardware to confirm that we can operate all the peripherals, and so far everything is working. Basic "mission mode" operation is also working correctly.

 

My friend felt compelled to develop a Teensy version, in order to bring the hardware into the 21st century and add features which the original hardware (not his design BTW) and software does not support and is unlikely ever to. Certainly the final Teensy code base will not port back to the old hardware, as he wants to add use of SD card, digital audio (I2S), and ethernet connectivity. My friend intends to release the project for anyone to use. For this reason, and given he admits to being on a big learning curve, we are trying to keep the code as clean and professional as possible.

 

The port comes from his frustration with the current hardware/software and bugs introduced with every new release of firmware.

I looked on it as a challenge to port our new code base (now at end of phase #2) to the old hardware, just to prove that the old hardware could be re-purposed.

 

 

 

 

 

 

 


Hi MattRW, thank you for your comments.

 

This encouraged me "to have a go" using gdb, a tool which I am horribly unfamiliar with.

 

migry@Ubuntu:~/AVR/ZZ210$ avr-gdb ZZ210.elf 
GNU gdb (GDB) 7.11.1

...
Reading symbols from ZZ210.elf...done.
(gdb) target remote :1234
Remote debugging using :1234
0x00000000 in __vectors ()
(gdb) c
Continuing.

Now although the code has crashed in "simavr" avr-gdb hasn't given me a prompt. So I need to ctrl-C. The crash is in the usual place.

 

^C
Program received signal SIGINT, Interrupt.
0x0000cecc in ?? ()
(gdb) up
Initial frame selected; you cannot go up.
(gdb) 

I had more of a play and there is a problem, not serious, in that since the code/elf was compiled under Win10 all the paths to the source are for the Win10 machine and not the VM. avr-gdb can't display source code as a consequence.

 

So I will try to install the Arduino GUI in this Linux VM. I think that this is possible. It will take a day or two to install Arduino, libraries and MegaCore, I guess, but then I should get access to source code in the avr-gdb debugger.

 

It's an interesting quandary. You know that there are tools out there that do more or less what you want, but you really want feature 'X'. So do you use someone else's codebase and try to understand the code, not always easy or pleasant, or do you say "what the heck" and start your own version of the tool? Yes, it can turn out to be a lot more effort than first imagined :-) . But good on you for trying. Hopefully it was fun while you were working on it?


migry wrote:
For the ATmega128 the LCD was wanted only for printing debug messages to the LCD to help during code development.
Alternatives :

  • logic analyzer
  • spare peripheral to Data Visualizer (Microchip Studio extension, MPLAB X plug-in, EXE or exec)

 

P.S.

migry wrote:
We have been careful in coding, w.r.t. variable types and sizes, only using things like uint32_t, uint16_t, int8_t, etc. to try to avoid portability bugs, but this is not an area of expertise, so yes there could be problems.
A linter will mow through the source code to identify instances of

  • ambiguities
  • known defect patterns

 


Protocol decoder:numbers_and_state - sigrok

https://onlinedocs.microchip.com/?find=Data%20Visualizer

Connecting to Data Gateway Interface | Atmel-ICE

Pinouts for Interfaces | MPLAB® PICkit™ 4 In-Circuit Debugger User's Guide

[mid-page]

Table 2. Pinouts for Data Stream Interfaces

 

"Dare to be naïve." - Buckminster Fuller


migry,  Sorry I think I did not communicate well.  I intended to steer you to something like ...

 

$ gdb  ./src/simavr/run_avr
(gdb) run --mcu atmega128 --freq 16000000 -ff ZZ210/ZZ210.hex
...
segfault
(gdb) up
...
[shows context in the source for simavr]
(gdb) p [some structure showing cpu context]->pc

I do this all the time.  I look at the pc and registers in the context of where the simulator crashed.


Ok MattRW, thanks for the further suggestion.

 

BTW previously I was running "run_avr" in another xterm and I was connected remotely from the avr-gdb tool. I wondered whether the crash messed up the stack so much avr-gdb was unable to make sense of the situation?

 

I tried again using "gdb" (which AFAIK debugs i686 code on the Linux VM), and I was able to use the run command in the way you show, but now when I did the "up" command it was clear that I was in the source of "run_avr", i.e. I was debugging the "simavr" simulator and not the AVR code. Also note after the AVR crash occurs I have to ctrl-C gdb to get a prompt.

migry@Ubuntu:~/AVR/ZZ210V5$ gdb ~/AVR/src/simavr/run_avr 

GNU gdb (Ubuntu 8.1-0ubuntu3) 8.1.0.20180409-git
Copyright (C) 2018 Free Software Foundation, Inc.
License GPLv3+: GNU GPL version 3 or later <http://gnu.org/licenses/gpl.html>
This is free software: you are free to change and redistribute it.
There is NO WARRANTY, to the extent permitted by law.  Type "show copying"
and "show warranty" for details.
This GDB was configured as "i686-linux-gnu".
Type "show configuration" for configuration details.
For bug reporting instructions, please see:
<http://www.gnu.org/software/gdb/bugs/>.
Find the GDB manual and other documentation resources online at:
<http://www.gnu.org/software/gdb/documentation/>.
For help, type "help".
Type "apropos word" to search for commands related to "word"...
Reading symbols from /home/migry/AVR/src/simavr/run_avr...done.

(gdb) run  --mcu atmega128 -f 16000000  ~/AVR/ZZ210V5/ZZ210.elf

Starting program: /home/migry/AVR/src/simavr/run_avr --mcu atmega128 -f 16000000  ~/AVR/ZZ210V5/ZZ210.elf
Loaded 17106 bytes at 0
Loaded 472 bytes at 800100
.[2JTechnoloya Teensy Controller.
Version 16-Apr-2022.
r0=65 r1=00 r2=00 r3=00 r4=00 r5=00 r6=00 r7=00 r8=00 r9=00 r10=00 r11=00 r12=00 r13=00 r14=00 r15=00
r16=00 r17=03 r18=01 r19=80 r20=0d r21=00 r22=36 r23=01 r24=f7 r25=02 XL=f8 XH=02 YL=90 YH=03 ZL=65 ZH=67 
Y+00=00 Y+01=00 Y+02=00 Y+03=00 Y+04=00 Y+05=00 Y+06=00 Y+07=00 Y+08=00 Y+09=00 Y+10=00 Y+11=10 Y+12=0e Y+13=10 Y+14=0e Y+15=03 Y+16=00 Y+17=0a Y+18=00 Y+19=0a 
*** CYCLE 49366949PC ceca
*** 2ce4: delay                     RESET -1; sp 10ed
*** 2bc0: __empty                   RESET -2; sp 10ef
*** 2ce0: delay                     RESET -3; sp 10ed
*** 2cb8: micros                    RESET -4; sp 10ef
*** 2c88: micros                    RESET -5; sp 10ed
*** 2c88: micros                    RESET -6; sp 10ed
*** 2ce4: delay                     RESET -7; sp 10ed
*** 2bc0: __empty                   RESET -8; sp 10ef
*** 2ce0: delay                     RESET -9; sp 10ed
*** 2cb8: micros                    RESET -10; sp 10ef
*** 2c88: micros                    RESET -11; sp 10ed
*** 2ce4: delay                     RESET -12; sp 10ed
*** 2bc0: __empty                   RESET -13; sp 10ef
*** 2ce0: delay                     RESET -14; sp 10ed
*** 2cb8: micros                    RESET -15; sp 10ef
*** 2c88: micros                    RESET -16; sp 10ed
*** 2ce4: delay                     RESET -17; sp 10ed
*** 2bc0: __empty                   RESET -18; sp 10ef
*** 2ce0: delay                     RESET -19; sp 10ed
*** 2cb8: micros                    RESET -20; sp 10ef
*** 2c88: micros                    RESET -21; sp 10ed
*** 2c88: micros                    RESET -22; sp 10ed
*** 2ce4: delay                     RESET -23; sp 10ed
*** 2bc0: __empty                   RESET -24; sp 10ef
*** 2ce0: delay                     RESET -25; sp 10ed
*** 2cb8: micros                    RESET -26; sp 10ef
*** 2c88: micros                    RESET -27; sp 10ed
*** 2ce4: delay                     RESET -28; sp 10ed
*** 2bc0: __empty                   RESET -29; sp 10ef
*** 2ce0: delay                     RESET -30; sp 10ed
*** 2cb8: micros                    RESET -31; sp 10ef
Stack Ptr 10f7/10ff = 8 
avr_gdb_init listening on port 1234

^C

Program received signal SIGINT, Interrupt.
0xb7fd6d09 in __kernel_vsyscall ()
(gdb) up
#1  0xb7e233f1 in __GI___select (nfds=4, readfds=0xbfffdd9c, writefds=0x0, exceptfds=0x0, timeout=0xbfffdd88)
    at ../sysdeps/unix/sysv/linux/select.c:41
41	../sysdeps/unix/sysv/linux/select.c: No such file or directory.
(gdb) up
#2  0xb7f3c487 in gdb_network_handler (g=0x427cc0, dosleep=dosleep@entry=50000) at sim/sim_gdb.c:800
800		int ret = select(max, &read_set, NULL, NULL, &timo);
(gdb) up
#3  0xb7f3d63c in avr_gdb_processor (avr=0x40b350, sleep=50000) at sim/sim_gdb.c:939
939		return gdb_network_handler(g, sleep);
(gdb) up
#4  0xb7f4046e in avr_callback_run_gdb (avr=0x40b350) at sim/sim_avr.c:290
290		avr_gdb_processor(avr, avr->state == cpu_Stopped ? 50000 : 0);
(gdb) up
#5  0xb7f4057f in avr_run (avr=0x40b350) at sim/sim_avr.c:405
405		avr->run(avr);
(gdb) up
#6  0x0040102e in main (argc=<optimised out>, argv=<optimised out>) at sim/run_avr.c:281
281			int state = avr_run(avr);
(gdb) up
Initial frame selected; you cannot go up.

 

 


migry.   gdb stopped because of the ^C (SIGINT).   It looks like you hit that.  I thought you said it was crashing on a segfault. If not, this approach may be wasting time.     


OK, here's an update. I have found the bug.

 

I am embarrassed to say that it was a stupid programming mistake in a trivial piece of code which I wrote.

I think this is a classic bug in C, in that I wrote to an array "out of bounds". The first byte following the end of the array was written to as a consequence of the bug in the code.

 

By chance the memory after the array was used by the "lcd" C++ object. The first byte of this object was corrupted. The first part of the "lcd" object appears to be 2 vectors, which is then followed by the private variables found in the "lcd.h" file. I am unsure as to what these vectors are, however the first appears to point to an array of addresses in ROM, one of which is "lcd.write()". Corrupting a vector causes a nasty crash. Ironically if the memory following the array was used for simple variables, the corruption might have had zero impact and would have gone unnoticed. The fact that a vector was corrupted at least made the CPU go haywire!
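
My best guess (an assumption, I have not confirmed it against the compiler output) is that the first "vector" is the hidden pointer which C++ adds to objects with virtual methods, pointing at the table of function addresses. The effect of the one-byte overwrite can be modelled in plain C. The struct layout, the names and the pointer value used when trying it out are hypothetical; the real layout is whatever the linker chose.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define SR_MAX_PORT 16

/* Hypothetical model of the RAM layout: the array, then the object after it. */
struct ram_model {
    uint8_t  output_port[SR_MAX_PORT];  /* the array that was overrun */
    uint16_t lcd_vptr;                  /* first two bytes of the "lcd" object (16-bit AVR pointer) */
};

/* Reproduce the effect of output_port[SR_MAX_PORT] = false.
   The byte is reached through an unsigned char view of the whole struct,
   so the model itself stays within defined behaviour. */
void stray_write(struct ram_model *m)
{
    unsigned char *raw = (unsigned char *)m;
    assert(offsetof(struct ram_model, lcd_vptr) == SR_MAX_PORT);
    raw[SR_MAX_PORT] = 0;
}

/* Returns 1 if the pointer still holds the value it was given at start-up. */
int vptr_intact(const struct ram_model *m, uint16_t original)
{
    return m->lcd_vptr == original;
}
```

On the little-endian AVR the byte after the array is the low byte of that pointer, so a made-up value such as 0x0212 would become 0x0200. The next call through the table then fetches its "function address" from the wrong place, which fits the crash on the first lcd.print().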


#define SR_MAX_PORT 16

...

   for (uint8_t i=0; i < SR_MAX_PORT; i++) 
   {
      output_port[SR_MAX_PORT] = false;     // <=== index SHOULD be i NOT SR_MAX_PORT
   }
   output_port[SR_P1PTX]  = PTX_OFF;
   output_port[SR_P2PTX]  = LED_OFF;
   output_port[SR_P3PTX]  = LED_OFF;
   output_port[SR_P1PLX]  = CTC_OFF;   
   load_serial_outputs();
   

It looks like a cut and paste problem (I probably copied the definition text), in that the array index is the value used in the definition (SR_MAX_PORT) when it should be the loop index 'i'.

The code output_port[SR_MAX_PORT] = false; writes one byte into the memory following the array.
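
For completeness, here is the corrected loop as a self-contained sketch. The element type (bool) is an assumption since the declaration is not shown above, and set_output_port is a hypothetical helper which is not in our code; it just shows one way to funnel writes through a single checked place while debugging.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define SR_MAX_PORT 16

/* Element type is an assumption; the original declaration is not shown. */
bool output_port[SR_MAX_PORT];

/* Clear every element: the subscript is the loop index i, never SR_MAX_PORT. */
void clear_output_ports(void)
{
    for (uint8_t i = 0; i < SR_MAX_PORT; i++)
    {
        output_port[i] = false;
    }
}

/* Hypothetical helper: every write goes through one bounds check. */
void set_output_port(uint8_t index, bool value)
{
    assert(index < SR_MAX_PORT);
    output_port[index] = value;
}
```

Valid indices run from 0 to SR_MAX_PORT - 1, so set_output_port(SR_MAX_PORT, false) would stop at the assert instead of silently writing past the array.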

 

I called my friend using Skype to explain that I had done some debugging and knew where the problem was but didn't yet know why. He said that by chance he had the code open, so I took him to the module where the above source code is, and explained that there was a write which I didn't understand, which looked like accessing the array out of bounds "as we have discussed", when he spotted the bug which I describe above.

 

After correcting the bug, the code now runs and the LCD is operational on both my dev board (uses hardware I2C) and my friend's original hardware (uses the new bit bang I2C library).

 

For those curious as to how I found the bug (or rather narrowed down the bug's location), I will make a further posting with more details.

 


Well done!

migry wrote:
I think this is a classic bug in C, in that I wrote to an array "out of bounds".
Such is a common part of coding standards and linters (pattern matching of common defects)

ARR30-C. Do not form or use out-of-bounds pointers or array subscripts - SEI CERT C Coding Standard - Confluence

C: Memory access should be explicitly bounded to prevent buffer overflows (SonarSource)

 

"Dare to be naïve." - Buckminster Fuller


gchapman wrote:

Well done!

migry wrote:
I think this is a classic bug in C, in that I wrote to an array "out of bounds".
Such is a common part of coding standards and linters (pattern matching of common defects)

ARR30-C. Do not form or use out-of-bounds pointers or array subscripts - SEI CERT C Coding Standard - Confluence

C: Memory access should be explicitly bounded to prevent buffer overflows (SonarSource)

 

 

And I apologise for not taking heed of the posters who recommended linting tools.

 

I will now pursue finding a lint tool and confirming that it would have caught this bug.

 

I briefly checked in my Linux VM and there doesn't appear to be a lint. Perhaps there is a gcc switch to make it do extra, "lint like" checks?

 

I just opened the project in Win10 by running the Arduino GUI. I raised the compiler warnings setting from "Default" to "All", and wouldn't you just believe it ...

 

 

I couldn't copy and paste from the Arduino compile output window, neither could I find the log file. Google didn't help either. I guess one of those quirks of the Arduino GUI.

 

OK, so no need to pursue finding a linting tool. It's already there in gcc!


Excellent!

That sorta reminds me of a bug from 30 years ago with an LCD.  All of my strings had some marker to indicate the end of the line, which was used to put up the messages (I think it was a 4 line LCD).  Anyhow, everything worked fine except for this occasional annoying "flicker".  I used a debugger all day to "prove" what was happening was not possible---tracing the program showed no issue, but what I was tracing/stepping seemed to prove the issue couldn't be happening.  Finally, after several hours, I realized using the debugger was actually fixing (preventing) the problem.  I had accidentally left out one of the markers in one message, so when that message was up, the routine did not terminate properly, it kept going through memory until it found a marker (causing the lcd to blank out or maybe show a bad char, due to some 8 bit int finally wrapping around). However, it would be rewritten fast enough that the problem just looked like flicker.   By pure coincidence, it so happened that the debugger inserted the same missing marker value (prob 0x00 or 0xFF) as part of the debugging process.  This was encountered before the variable could wrap around, preventing the bounds issue.  

When in the dark remember-the future looks brighter than ever.   I look forward to being able to predict the future!


migry wrote:
I briefly checked in my Linux VM and there doesn't appear to be a lint.
Lint is in developer packages.

migry wrote:
Perhaps there is a gcc switch to make it do extra, "lint like" checks?
Yes; likewise for Clang.

migry wrote:
It's already there in gcc!
Minimally essential in GCC and a bit more in Clang; a linter is much closer to complete.

 


Ubuntu – Software Packages in "jammy", Subsection devel (search for 'lint')

MPLAB X linter plug-in is a variant of Cppcheck :

Come Join Us (MPLAB Now Supports AVRs) | Page 8 | AVR Freaks

Online Demo - Cppcheck

 

Static Analyzer Options (Using the GNU Compiler Collection (GCC))

linter-gcc-with-avr

CodeChecker

 

"Dare to be naïve." - Buckminster Fuller


MattRW wrote:

migry.   gdb stopped because of the ^C (SIGINT).   It looks like you hit that.  I thought you said it was crashing on a segfault. If not, this approach may be wasting time.     

 

Sorry I'm not explaining it clearly. Also I'm getting slightly different behaviour today as compared to yesterday, in that the "run_avr" SIGSEGV no longer occurs.

 

Bottom line is although "run_avr" supports remote debugging using avr-gdb, in this case it's not giving useful information, presumably as a consequence of the simulated crash of the AVR code. When the AVR code crashes, "run_avr" does not give control back to gdb to allow inspection of stack frame and variables. At this point there is no prompt in gdb and the only way to do anything is to hit ctrl-C. Also my lack of knowledge of gdb means that I cannot do the most basic of status reports.

 

I need to learn more about (avr-)gdb breakpoints and stop the code simulation cleanly before the crash. I think then avr-gdb will be able to give useful information.

 


Can't help thinking you'd find life simpler if you simply used MPLABX and the simulator it contains ;-)


Or if you just used AS7.0 on the Native Windows PC on your desk.

 

Incidentally,  the Array Bounds violation would have been just as damaging on Teensy 3.5

It is the kind of problem that lurks.

 

Not that easy to find with a hardware debugger.   (Unless you know the exact address that gets corrupted)

 

Much better to just write HLL code with correct widths, array sizes etc.

And if the Compiler issues a warning you need to investigate.

 

I would concentrate on porting your friend's HLL code first.

You can play with blinking LEDs at a later date.

 

David.

Last Edited: Tue. Jun 28, 2022 - 08:52 AM

david.prentice wrote:
Or if you just used AS7.0 on the Native Windows PC on your desk.
Except that #26 suggests he's using Ubuntu not Windows. (hence my MPLABX suggestion)


From #26.

migry wrote:

My debug environment is a Ubuntu machine running in a VM (Virtual Machine) under Windows 10.

 

I understood this to be a Windows10 PC.    i.e. he has a Win10 licence.

 

It would be different if it was a Linux PC running Win10 in a VM without a Win10 licence.

 

The Operating System is academic.   My point was that you verify HLL code when porting.   Especially if your original HLL Teensy code was working.

 

David.


david.prentice wrote:
Not that easy to find with a hardware debugger.
Concur

david.prentice wrote:
(Unless you know the exact address that gets corrupted)
By debugger "breadcrumbs" to section code then resection to locate the defect.

 


Adding Automatic Debugging to Firmware for Embedded Systems by Jack Ganssle

[assertions]

[after 3/4 page]

Side Effects

... (you do use Lint, don't you? It's an essential part of any development environment) ...

 

IDR Events | Application Output | Trace | Microchip Studio

 

"Dare to be naïve." - Buckminster Fuller


david.prentice wrote:
It would be different if it was a Linux PC running Win10 in a VM without a Win10 licence.
An activation key enables Windows updates; Microsoft's VHD are useful.

 

Importing Virtual Machines and disk images | Qemu/KVM Virtual Machines - Proxmox VE

[mid-display]

Step-by-step example of a Windows OVF import

Microsoft provides Virtual Machines downloads to get started with Windows development. We are going to use one of these to demonstrate the OVF import feature.

 

"Dare to be naïve." - Buckminster Fuller


Just to clarify (FWIW) my PC runs Win10, but I use Oracle VMs to be able to have a Linux (Ubuntu) desktop on the same machine. Some projects related to old retro CPUs and microcontrollers I prefer to do on the Linux VMs, because I like to have all the usual Unix tools at hand and a lot of user code appears to come from a Linux environment (configure/build/make files etc.). BTW I have Unix tools installed under Windows too, including Cygwin64 which I mostly use when I need to use the "find" Unix command.

 

I received the SNAP JTAG board yesterday, and installed Microchip Studio and MPLABX (IDE and IPE). These are clearly very large and sophisticated tools, and I got the impression that to use them efficiently you need to invest a lot of time.

 

Thanks to a YT video I was able to get a simple program working in MPLABX, and after some fighting I was able to program the ATmega128 using the SNAP and I was able to use the debugger.

 

Again thanks to another YT video I discovered the "arduino" plug-in for MCS7. I was able to get "Blink" to build, however the ZZ210 code failed to build due to not finding various include files. I made many efforts to manually update the include search path, but the damn tool kept overwriting my entries, and drove me crazy. I tried to switch off whatever "auto" feature was doing this, but clearly I was unable to stop this behaviour. There are so many menus and options in the menus that it made my head spin. In the end I gave up.

 

I might offend fans of this software, but I'll simply say "it's not for me". Clearly YMMV.

 

I fully understand that to get the best out of many software tools, you have to invest the time to learn them. So I can see that someone who is proficient in the use of this tool would be able to use it to debug the kind of bug I had.

 

At the end of the day we all are free to choose whatever tools and methodology we prefer, even if they are not the most efficient in the eyes of other people. Let's not fall out because we all have different opinions.


david.prentice wrote:

Incidentally,  the Array Bounds violation would have been just as damaging on Teensy 3.5

It is the kind of problem that lurks.

 

I fully agree.

 

It has been a bit of an eye opener to say the least.

 

In the next session I have with my friend, he will be setting the Arduino warnings to "All" and we will go through each reported warning one by one and each will be fixed. Yes there are more than the one that was shown above! A side effect of the project is that I am teaching my friend to be able to program in C, however I am not a professional software developer (I'm a hardware person) so I am going to be a less than perfect teacher(*). Fixing these warnings will be a useful part of the sessions and my friend's (software development) education.

 

(*) Due to observations from training sessions related to my day job, I mentioned to my friend that (adult) students will rarely criticise teachers (well possibly not in the anonymous feedback). To teach you only have to be a bit more knowledgeable than the student to "get away with it". But as we also discussed the difference at school between an excellent teacher and an inadequate teacher can be "life changing" to a child.


migry wrote:
Just to clarify (FWIW) my PC runs Win10, but I use Oracle VMs to be able to have a Linux (Ubuntu) desktop on the same machine.
WSL 2 in Windows 10 is an alternative; USB is a work-in-progress in WSL 2 Windows 11.

migry wrote:
Some projects related to old retro CPUs and microcontrollers I prefer to do on the Linux VMs, because I like to have all the usual Unix tools at hand and a lot of user code appears to come from a Linux environment (configure/build/make files etc.).
For the ones at Atmel, Windows was the choice as Linux was approximately half decade of age when AVR was announced; WinAVR includes some of those Unix tools.

EOL AVR have AVR Studio 4 (zero price, up to AVRJTAGICE2 inclusive) or VisualGDB (low price, AVaRICE); NRND/EOL distributors still stock EOL AVR.

migry wrote:
BTW I have Unix tools installed under Windows too, including Cygwin64 which I mostly use when I need to use the "find" Unix command.
MSYS is another; some prefer MSYS for building GCC.

migry wrote:
Again thanks to another YT video I discovered the "arduino" plug-in for MCS7.
Visual Studio Code has an Arduino extension.

 


Run Linux GUI apps with WSL | Microsoft Docs

WSL: Run Linux GUI Apps - YouTube (17m16s)

Connect USB devices | Microsoft Docs (USB/IP is the work-around for WSL 2 in recent Windows 11; alternative is USB pass-through in your preferred VMM)

 

Fixing AS4.19 so it works with WinAVR | AVR Freaks

Reviving AT90S8515 Project Using AVRStudio 4.18 | AVR Freaks

AVRISP MKII not working in Windows 10 | AVR Freaks (AVR Studio 4.19)

WinAVR / Code / [r299] /trunk/avrtest (another AVR simulator)

 

MSYS2

 

AVR Studio On Mac & Linux? | AVR Freaks

AVR in VS Code | AVR Freaks

 

WARNING - long post. Skip now if not interested.

 

For those interested in a more detailed explanation of how I found the bug, here are the details.

 

Disclaimer. There are many different methodologies and tools for software debugging. The method which I used may not be the best or most efficient. I hope that the information might be useful to someone in the same position as me.

 

Quick summary. The code, developed initially for a Teensy in the Arduino environment and now ported to an ATmega128, was crashing and rebooting (the repeated sign-on message from the serial port gave this away). The problem had been narrowed down to one line of code [ first use of lcd.print("TEST TEXT"); ]. My personal preference was to use a simulator to see what was going on. I have previously used simple non-GUI simulators, compiled and running in a Linux environment, quite effectively for other processors. Searching online I found a number of AVR simulators, but was drawn to "simavr" since the source code was on GitHub and it seemed to have recent pushes, suggesting that it was still live and supported. With a little effort and a support library install, I was able to compile the source, the second time with "trace" enabled by uncommenting a line in the Makefile.

 

The latest version of "simavr" supports reading the ".elf" file. In my previous use of Arduino (other boards) I had used the built in programmer. My ATmega128 has no boot loader. Since I was now using an usbTiny programmer I had to find the arduino build folder and grab the intelhex from there to program the ATmega128, using command line avrdude (for those interested). I noted with interest the other files which included an ".elf" file and an assembly list file.

 

When either the .hex or .elf is loaded into "simavr" and run, the AVR code crashes and there is a register dump. So this mimics what is seen on real hardware. Using the ".elf" causes each line of the trace file to be prepended with the name of the function the code comes from. This was extremely helpful and did not (could not) happen when using the simple ".hex" file as input. The log file is 4 GB in length. In hindsight I could have removed some debug LED flashing code we added when trying to find the location of the crash. The reason is that the LED flash code also had delay(1000), which resulted in a lot of unnecessary tracing of the delay() and micros() functions in the log file. The crash occurs early in the code, where the "oled_setup()" function is being called from Arduino "setup()".

 

I focused on the last code executed before the register dump.

 

3308: _ZN5Print5writeEPKc.part.2 tst r0[00]
3308:                                                                   SREG = .Z.....I
                                       ->> r0=00
330a: _ZN5Print5writeEPKc.part.2 brne .-3 [3306]        ; Will not branch
330c: _ZN5Print5writeEPKc.part.2 sbiw ZL:ZH[0144], 0x01
330c:                                                                   SREG = .......I
                                       ->> ZL=43 ZH=01
330e: _ZN5Print5writeEPKc.part.2 movw r20:r21, ZL:ZH[0143]
                                       ->> r20=43 r21=01
3310: _ZN5Print5writeEPKc.part.2 sub r20[43], r22[36] = 0d
3310:                                                                   SREG = .....H.I
                                       ->> r20=0d
3312: _ZN5Print5writeEPKc.part.2 sbc r21[01], r23[01] = 00
3312:                                                                   SREG = .......I
                                       ->> r21=00
3314: _ZN5Print5writeEPKc.part.2 movw XL:XH, r24:r25[02f7]
                                       ->> XL=f7 XH=02
3316: _ZN5Print5writeEPKc.part.2 ld ZL, X[02f7]++
                                       ->> XL=f8 XH=02 ZL=00 ZH=01
3318: _ZN5Print5writeEPKc.part.2 ld ZH, X[02f8]
                                       ->> XL=f8 XH=02 ZL=00 ZH=02
331a: _ZN5Print5writeEPKc.part.2 ld r0, (Z+2[0202])=[65]
                                       ->> r0=65
331c: _ZN5Print5writeEPKc.part.2 ld ZH, (Z+3[0203])=[67]
                                       ->> ZL=00 ZH=67
331e: _ZN5Print5writeEPKc.part.2 mov ZL, r0[65] = 65
                                       ->> ZL=65 ZH=67
3320: _ZN5Print5writeEPKc.part.2 ijmp Z[ceca]
avr_run_one: ceca: RESET
r0=65 r1=00 r2=00 r3=00 r4=00 r5=00 r6=00 r7=00 r8=00 r9=00 r10=00 r11=00 r12=00 r13=00 r14=00 r15=00
r16=00 r17=03 r18=01 r19=80 r20=0d r21=00 r22=36 r23=01 r24=f7 r25=02 XL=f8 XH=02 YL=90 YH=03 ZL=65 ZH=67
Y+00=00 Y+01=00 Y+02=00 Y+03=00 Y+04=00 Y+05=00 Y+06=00 Y+07=00 Y+08=00 Y+09=00 Y+10=00 Y+11=10 Y+12=0e Y+13=10 Y+14=0e Y+15=03 Y+16=00 Y+17=0a Y+18=00 Y+19=0a
*** CYCLE 49366949PC ceca

 

I wrote some notes and reformatted the text to make it more readable (to me).

 

Just a little further back in the log file

 

2d26: delay                     pop r9 (@10f6)[00]
                                       ->> r9=00 SPL=f6 SPH=10
2d28: delay                     pop r8 (@10f7)[00]
                                       ->> r8=00 SPL=f7 SPH=10
2d2a: delay                     ret
                                       ->> SPL=f9 SPH=10
0cda: _Z10oled_setupv           ldi r22, 0x36
                                       ->> r22=36
0cdc: _Z10oled_setupv           ldi r23, 0x01
                                       ->> r23=01
0cde: _Z10oled_setupv           ldi r24, 0xf7
                                       ->> r24=f7
0ce0: _Z10oled_setupv           ldi r25, 0x02
                                       ->> r25=02
0ce2: _Z10oled_setupv           call 0x0019b5
                                       ->> SPL=f7 SPH=10
336a: _ZN5Print5printEPKc       cp r22[36], r1[00] = 36
336a:                                                                   SREG = .......I
336c: _ZN5Print5printEPKc       cpc r23[01], r1[00] = 01
336c:                                                                   SREG = .......I
336e: _ZN5Print5printEPKc       breq .2 [3374]  ; Will not branch
3370: _ZN5Print5printEPKc       jmp 0x001982
3304: _ZN5Print5writeEPKc.part.2 movw ZL:ZH, r22:r23[0136]

 

OK, so thanks to the labels that come from using the ".elf", I can confirm that we are in "oled_setup()", and the "Print" hints at the C++ Print class. The "write" suggests trying to write the first character of the string in lcd.print("TEST TEXT").

 

After some searching in the low level code of the Arduino library(?) source files, in "Print.h" I found the following code.

 

    virtual size_t write(uint8_t) = 0;
    size_t write(const char *str) {
      if (str == NULL) return 0;
      return write((const uint8_t *)str, strlen(str));
    }

 

Going back to the call.

 

0cda: _Z10oled_setupv           ldi r22, 0x36
                                       ->> r22=36
0cdc: _Z10oled_setupv           ldi r23, 0x01
                                       ->> r23=01
0cde: _Z10oled_setupv           ldi r24, 0xf7
                                       ->> r24=f7
0ce0: _Z10oled_setupv           ldi r25, 0x02
                                       ->> r25=02
0ce2: _Z10oled_setupv           call 0x0019b5

 

I wondered what these four values were used for. I noted that r22 and r23 appeared to be a pointer to a text(?) string which was being scanned using a loop to find a NUL. Aha, the strlen() function.

 

3304: _ZN5Print5writeEPKc.part.2 movw ZL:ZH, r22:r23[0136] ->> ZL=36 ZH=01
3306: _ZN5Print5writeEPKc.part.2 ld r0, Z[0136]++ ->> r0=54 ZL=37 ZH=01
3308: _ZN5Print5writeEPKc.part.2 tst r0[54]
3308:                                                                   SREG = .......I ->> r0=54
330a: _ZN5Print5writeEPKc.part.2 brne .-3 [3306]        ; Will branch
3306: _ZN5Print5writeEPKc.part.2 ld r0, Z[0137]++ ->> r0=45 ZL=38 ZH=01
3308: _ZN5Print5writeEPKc.part.2 tst r0[45]
3308:                                                                   SREG = .......I ->> r0=45
330a: _ZN5Print5writeEPKc.part.2 brne .-3 [3306]        ; Will branch
3306: _ZN5Print5writeEPKc.part.2 ld r0, Z[0138]++ ->> r0=53 ZL=39 ZH=01
3308: _ZN5Print5writeEPKc.part.2 tst r0[53]
3308:                                                                   SREG = .......I ->> r0=53

Z is initialised to 0x0136, then the location pointed to by Z is read and Z is incremented. 0x54, 0x45 and 0x53 are ASCII for "TES".
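
For anyone who finds the assembly hard to follow, the traced loop corresponds roughly to the C++ below. This is only a sketch: the name traced_strlen is made up for illustration, and the real code is whatever avr-gcc inlined into Print::write(const char *). The comments map each statement to the instruction seen in the trace.

```cpp
#include <cstddef>

// Hypothetical helper mirroring the traced loop; not the avr-libc source.
std::size_t traced_strlen(const char *str) {
    const char *z = str;      // movw ZL:ZH, r22:r23
    char c;
    do {
        c = *z++;             // ld r0, Z+   (read, then post-increment Z)
    } while (c != '\0');      // tst r0 / brne back to the ld
    --z;                      // sbiw Z, 1   (step back onto the NUL)
    return static_cast<std::size_t>(z - str);  // sub / sbc leave the length in r21:r20
}
```

In the trace the subtraction was 0x0143 - 0x0136 = 0x0d, so the string passed to print() on that run was 13 characters long.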

 

I didn't know what the other parameter was, but guessed a vector of 0x02f7 pointing into RAM (since the address was similar in value to the string pointer).

 

So I now used search in "vim" and used a simple regexp to find similar addresses  /02f[0-9a-f] .

 

In "__do_clear_bss" these locations were written with zero (0x00). This is the BSS initialisation done by the C startup code. OK that makes sense. Might be a global variable?

 

Then I found the following.

 

0d72: _GLOBAL__sub_I_line1      ldi r25, 0x02
                                       ->> r25=02
0d74: _GLOBAL__sub_I_line1      jmp 0x000f47
1e8e: _ZN17LiquidCrystal_I2CC2Ehhh movw ZL:ZH, r24:r25[02f7]
                                       ->> ZL=f7 ZH=02
1e90: _ZN17LiquidCrystal_I2CC2Ehhh st (Z+3[02fa]), r1[00]
1e92: _ZN17LiquidCrystal_I2CC2Ehhh st (Z+2[02f9]), r1[00]
1e94: _ZN17LiquidCrystal_I2CC2Ehhh ldi r24, 0xa8
                                       ->> r24=a8
1e96: _ZN17LiquidCrystal_I2CC2Ehhh ldi r25, 0x02
                                       ->> r25=02
1e98: _ZN17LiquidCrystal_I2CC2Ehhh st (Z+1[02f8]), r25[02]
1e9a: _ZN17LiquidCrystal_I2CC2Ehhh st (Z+0[02f7]), r24[a8]
1e9c: _ZN17LiquidCrystal_I2CC2Ehhh st (Z+4[02fb]), r22[27]
1e9e: _ZN17LiquidCrystal_I2CC2Ehhh st (Z+9[0300]), r20[14]
1ea0: _ZN17LiquidCrystal_I2CC2Ehhh st (Z+10[0301]), r18[04]
1ea2: _ZN17LiquidCrystal_I2CC2Ehhh st (Z+11[0302]), r1[00]
1ea4: _ZN17LiquidCrystal_I2CC2Ehhh ret
                                       ->> SPL=ff SPH=10
076e: __do_global_ctors         cpi YL[90], 0x90
076e:                                                             

 

This looks like some kind of initialisation to known values. Z appears to be loaded with the base address of this object/structure and it is 0x02f7. 

0x2f7/8 is loaded with a802

The name LiquidCrystal_I2C is clearly related to the Adafruit LCD I2C hardware library. Could it be the "begin()" initialiser?
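
The function labels in the trace are mangled C++ names, and demangling them answers the question directly. Below is a minimal sketch for a host PC, assuming a GCC or Clang toolchain that provides abi::__cxa_demangle in <cxxabi.h>; the helper name demangle is made up here. GCC clone suffixes such as ".part.2" should be stripped from the label before passing it in.

```cpp
#include <cxxabi.h>
#include <cstdlib>
#include <string>

// Demangle an Itanium C++ ABI symbol name (the scheme GCC uses).
// Returns the input unchanged if it cannot be demangled.
std::string demangle(const std::string &mangled) {
    int status = 0;
    char *out = abi::__cxa_demangle(mangled.c_str(), nullptr, nullptr, &status);
    if (status != 0 || out == nullptr) {
        return mangled;
    }
    std::string result(out);
    std::free(out);  // the buffer was allocated with malloc
    return result;
}
```

With this, "_Z10oled_setupv" comes out as oled_setup(), "_ZN5Print5writeEPKc" as Print::write(char const*), and "_ZN17LiquidCrystal_I2CC2Ehhh" as LiquidCrystal_I2C::LiquidCrystal_I2C(unsigned char, unsigned char, unsigned char). The "C2" in the last one marks a constructor, so the code at 0x1e8e is the constructor of the global object being run from __do_global_ctors rather than begin(). The avr-c++filt tool from binutils does the same job on the command line.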

 

Not immediately but I built up the following picture.

 

__do_clear_bss
     |
     V
===============================================
02f7 a8  00 (hardware setup)
02f8 02
02f9 00
02fa 00
02fb 27                   [_Addr]
02fc     00 init_priv     [_displayfunction]
02fd                      [_displaycontrol]
02fe     02 ::begin       [_displaymode]
02ff     04               [_numlines]
0300 14                   [_cols]
0301 04                   [_rows]
0302 00                   [_backlightval]
===============================================

This matches the LiquidCrystal_I2C object and its variables (members?) from the dot h file.

I have no idea what the first 4 bytes are, but assumed they were part of the object's temporary housekeeping added by the compiler.
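
A likely explanation for those 4 bytes, offered as a sketch rather than a confirmed reading of the MegaCore sources: a class with virtual functions carries a hidden pointer to its table of virtual functions, and the stock Arduino AVR core's Print class also declares a private int write_error. The structs below are simplified stand-ins (PrintLike, LcdLike, PlainPrint and hidden_bytes are made-up names), meant only to show that the hidden pointer exists and sits alongside the base class data.

```cpp
#include <cstddef>
#include <cstdint>

// Simplified stand-ins for illustration; NOT the real Arduino classes.
struct PrintLike {
    virtual std::size_t write(std::uint8_t) = 0;  // virtual => hidden vtable pointer
    int write_error = 0;                          // assumed to mirror Print::write_error
};

struct LcdLike : PrintLike {
    std::size_t write(std::uint8_t) override { return 1; }
    std::uint8_t addr = 0x27;  // first data member declared by the derived class
};

// Same data member as PrintLike but no virtual functions, for comparison.
struct PlainPrint {
    int write_error = 0;
};

// Extra bytes a polymorphic object carries (vtable pointer plus any padding).
constexpr std::size_t hidden_bytes() {
    return sizeof(PrintLike) - sizeof(PlainPrint);
}
```

On AVR, where data pointers and int are both 2 bytes, this would give a 2-byte vtable pointer at 0x02f7/8 followed by 2 bytes of write_error at 0x02f9/a. That fits the constructor trace: a8 02 stored to the first pair and zeros stored to Z+2 and Z+3.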

 

class LiquidCrystal_I2C : public Print {
public:
  LiquidCrystal_I2C(uint8_t lcd_Addr,uint8_t lcd_cols,uint8_t lcd_rows);
  void begin(uint8_t cols, uint8_t rows, uint8_t charsize = LCD_5x8DOTS );

...

	
private:
  void init_priv();
  void send(uint8_t, uint8_t);
  void write4bits(uint8_t);
  void expanderWrite(uint8_t);
  void pulseEnable(uint8_t);
  uint8_t _Addr;
  uint8_t _displayfunction;
  uint8_t _displaycontrol;
  uint8_t _displaymode;
  uint8_t _numlines;
  uint8_t _cols;
  uint8_t _rows;
  uint8_t _backlightval;
};

 

So I continued the search.

 

2ebe: digitalWrite              pop r17 (@10f7)[03]
                                       ->> r17=03 SPL=f7 SPH=10
2ec0: digitalWrite              ret
                                       ->> SPL=f9 SPH=10
0b88: _Z14hardware_setupv       ldi ZL, 0xe7
                                       ->> ZL=e7 ZH=07
0b8a: _Z14hardware_setupv       ldi ZH, 0x02
                                       ->> ZL=e7 ZH=02
0b8c: _Z14hardware_setupv       st (Z+16[02f7]), r1[00]
0b8e: _Z14hardware_setupv       ldi r24, 0x01
                                       ->> r24=01
0b90: _Z14hardware_setupv       st (Z+15[02f6]), r24[01]
0b92: _Z14hardware_setupv       st (Z+14[02f5]), r24[01]
0b94: _Z14hardware_setupv       st (Z+13[02f4]), r24[01]
0b96: _Z14hardware_setupv       st (Z+12[02f3]), r24[01]
0b98: _Z14hardware_setupv       call 0x000486
                                       ->> SPL=f7 SPH=10
090c: _Z19load_serial_outputsv  push r17[03] (@10f6)
                                       ->> SPL=f6 SPH=10
090e: _Z19load_serial_outputsv  push YL[90] (@10f5)

 

OK, this is where I am setting all array elements of a 16 length boolean array to zero, although I don't see the for loop. Oh, I guess that the compiler has optimised it away. [ NOPE I now know this is where the bug is!!!! ]

I noted the write of zero to 0x02f7 which puzzled me as I thought that this is part of the LiquidCrystal_I2C object.

 

Going back to the instructions before the crash.

 

3314: _ZN5Print5writeEPKc.part.2 movw XL:XH, r24:r25[02f7]
                                       ->> XL=f7 XH=02
3316: _ZN5Print5writeEPKc.part.2 ld ZL, X[02f7]++
                                       ->> XL=f8 XH=02 ZL=00 ZH=01
3318: _ZN5Print5writeEPKc.part.2 ld ZH, X[02f8]
                                       ->> XL=f8 XH=02 ZL=00 ZH=02

We see our friend 0x02f7. The bytes pointed to are loaded into Z which then has a value of 0x0200.

 

331a: _ZN5Print5writeEPKc.part.2 ld r0, (Z+2[0202])=[65]
                                       ->> r0=65
331c: _ZN5Print5writeEPKc.part.2 ld ZH, (Z+3[0203])=[67]
                                       ->> ZL=00 ZH=67
331e: _ZN5Print5writeEPKc.part.2 mov ZL, r0[65] = 65
                                       ->> ZL=65 ZH=67
3320: _ZN5Print5writeEPKc.part.2 ijmp Z[ceca]

But then Z is dereferenced to find another pointer which is used to call a function. The final ijmp is the crash. 

 

Hold on, 0x2f7/8 were initialised to the bytes a8 02 (low byte first, so the pointer 0x02a8), but 0x2f7 was then written with a single zero, leaving 00 02 (the pointer 0x0200).

 

I had an idea. What would have happened if 0x2f7 was still 0xa8?

 

More searching.

02a8/9/a/b/c/d are initialised to 0xf8,0x0f,0x55,0x19,0x43,0x0f in "__do_copy_data".

 

So if Z held 0x02a8, the new vector in Z (from Z+2 and Z+3) would be 0x1955. OK (see my previous post on byte and word address confusion), let's double it to 0x32aa.

 

Now I need to search the Arduino lst file.

 

0x1955 *2 = 0x32aa -> <Print::write(unsigned char const*, unsigned int)>:      ***** BINGO *****

0x0ff8 *2 = 0x1ff0 -> inline size_t LiquidCrystal_I2C::write(uint8_t value) { send(value, Rs);

0x0f43 *2 = 0x1e86 -> void LiquidCrystal_I2C::load_custom_character(uint8_t char_num, uint8_t *rows){ createChar(char_num, rows);

Clearly it's an array of pointer to functions in ROM.
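
The arithmetic used above can be written down as two small helpers. This is just a sketch with made-up names (le16, word_to_byte_addr): AVR stores 16-bit values low byte first, and code pointers hold word addresses, whereas the trace and the lst file show byte addresses.

```cpp
#include <cstdint>

// Combine two bytes read from consecutive RAM addresses (low byte first).
constexpr std::uint16_t le16(std::uint8_t low, std::uint8_t high) {
    return static_cast<std::uint16_t>(low | (high << 8));
}

// AVR code pointers count 16-bit words; listings and the trace use byte addresses.
constexpr std::uint32_t word_to_byte_addr(std::uint16_t word_addr) {
    return static_cast<std::uint32_t>(word_addr) * 2u;
}

// Table entries found at 0x02a8..0x02ad in "__do_copy_data".
static_assert(word_to_byte_addr(le16(0xf8, 0x0f)) == 0x1ff0, "entry 0");
static_assert(word_to_byte_addr(le16(0x55, 0x19)) == 0x32aa, "entry 1");
static_assert(word_to_byte_addr(le16(0x43, 0x0f)) == 0x1e86, "entry 2");
```

The same rule explains the crash address: the bytes 0x65 and 0x67 read from 0x0202/0x0203 form the word address 0x6765, and doubling that gives 0xceca, the PC shown in the register dump. It also matches "call 0x0019b5" landing at 0x336a in the trace.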

 

OK so when lcd.print("TEST TEXT") is called the first character is sent to the LCD "write()" function.

 

So the bug was the write of 0x00 to 0x02f7, corrupting the first byte of the LiquidCrystal_I2C object called "lcd".

The first 4 bytes appear to be pointers, so the first pointer is being corrupted. Never a good thing !!!
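
The corruption can be replayed on a PC without any undefined behaviour by treating SRAM as a flat byte array. This is a sketch using the addresses from the trace; the array size, the constant names and the value 16 for the array length are assumptions for illustration.

```cpp
#include <array>
#include <cstddef>
#include <cstdint>

// Addresses taken from the trace; the flat array is only a stand-in for SRAM.
constexpr std::size_t kOutputPort = 0x02e7;  // base address used in hardware_setup()
constexpr std::size_t kPortCount  = 16;      // assumed array length
constexpr std::size_t kLcdObject  = 0x02f7;  // start of the "lcd" object

using Ram = std::array<std::uint8_t, 0x1100>;

// Read a 16-bit little-endian pointer from the simulated RAM.
std::uint16_t read_ptr(const Ram &ram, std::size_t addr) {
    return static_cast<std::uint16_t>(ram[addr] | (ram[addr + 1] << 8));
}

// Returns the object's first pointer after the stray store has happened.
std::uint16_t vptr_after_stray_write() {
    Ram ram{};
    ram[kLcdObject]     = 0xa8;            // constructor: st (Z+0[02f7]), r24[a8]
    ram[kLcdObject + 1] = 0x02;            // constructor: st (Z+1[02f8]), r25[02]
    ram[kOutputPort + kPortCount] = 0x00;  // bug: st (Z+16[02f7]), r1[00]
    return read_ptr(ram, kLcdObject);
}
```

The one-past-the-end index lands exactly on the first byte of the object, so the pointer changes from 0x02a8 to 0x0200, which is the value Z held just before the crash.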

 

At this point I had no idea why zero was being written to this location, it was only when speaking to my friend that he spotted (in a for loop) the code

 

      output_port[SR_MAX_PORT] = false;     // Logic 0 on the output of the ULN2003

The SR_MAX_PORT should have had the index variable 'i'.

This explains why there was no obvious for loop in the assembly: it has been optimised away to a single write. This particular write stores one byte past the end of the memory reserved for the array, since valid indices are only 0 to SR_MAX_PORT-1.
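
For completeness, here is a sketch of the corrected loop. The loop header and the value of SR_MAX_PORT are assumptions (the post only mentions a 16-element boolean array), and clear_output_ports is a made-up wrapper so the snippet stands on its own.

```cpp
#include <cstddef>

constexpr std::size_t SR_MAX_PORT = 16;  // assumed value
bool output_port[SR_MAX_PORT];

// Buggy form described above, kept as a comment because it writes out of bounds:
//   output_port[SR_MAX_PORT] = false;

void clear_output_ports() {
    for (std::size_t i = 0; i < SR_MAX_PORT; i++) {
        output_port[i] = false;  // index with the loop variable, not the array size
    }
}
```

A range-based loop, for (bool &port : output_port) port = false;, avoids the index altogether and makes this particular slip impossible.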

 

So by complete chance the object named "lcd" came immediately after the boolean array "output_port".

 

The irony is that any number of other variables could have come after the "output_port" array and the corruption would likely have been corrected since many get re-initialised later.
