[C] How to find size of an array but using only the base address of the array?

Total votes: 0

Hi everyone,

 

I need a little help with this one. I have a utility on the PC that generates some arrays for me, and I put those arrays in a megaAVR's program memory. The number of elements in these arrays varies a lot: an array could be quite large (10000 bytes) or small (5 bytes). Consider it random. I just put the C file that contains these arrays alongside the other files and hit compile. The program takes care of everything else. The program only knows the names of these arrays; it has no idea how many elements are in each one.

 

For example, to keep it short, consider something like this.

 

const __flash uint8_t data_A[] = {
    0xff,0xff,0xff,0xff
};

const __flash uint8_t data_B[] = {
    0xff,0xff,0xff,0xff,0xff
};

const __flash uint8_t data_C[] = {
    0xff,0xff,0xff
};

const __flash uint8_t data_D[] = {
    0xff,0xff,0xff,0xff,0xff,0xff
};

const __flash uint8_t data_E[] = {
    0xff,0xff
};

Normally I can find the addresses of the first and last elements of these arrays using the sizeof operator, like this:

const __flash uint8_t *startAddress[5];
const __flash uint8_t *endAddress[5];

int main(void)
{
    startAddress[0] = data_A;
    endAddress[0] = &data_A[sizeof(data_A)-1];

    startAddress[1] = data_B;
    endAddress[1] = &data_B[sizeof(data_B)-1];

    startAddress[2] = data_C;
    endAddress[2] = &data_C[sizeof(data_C)-1];

    startAddress[3] = data_D;
    endAddress[3] = &data_D[sizeof(data_D)-1];

    startAddress[4] = data_E;
    endAddress[4] = &data_E[sizeof(data_E)-1];

    while(1);
}

Knowing the addresses of the first and last elements of each array, I can do all sorts of operations on them.

This works okay, but I want to find the end addresses with a loop. If that can be done, it opens the door to something further I want to do.

const __flash uint8_t *startAddress[] = {data_A,data_B,data_C,data_D,data_E};
const __flash uint8_t *endAddress[sizeof(startAddress)/2];

int main(void)
{
    for(uint8_t x=0; x < (sizeof(startAddress)/2); x++)
    {
        endAddress[x] =  ???  // What can we write here so that endAddress contains the address's of last element of all the data arrays.
    }

    while (1);
}

At this point the startAddress array contains the base address of every array. Can I use this somehow to fill in the end addresses, without using the array names here? If this can't be done, let me know so I don't try to do something that is not possible.

 

Thanks.

This topic has a solution.

“Everyone knows that debugging is twice as hard as writing a program in the first place. So if you're as clever as you can be when you write it, how will you ever debug it?” - Brian W. Kernighan
“Perfection is achieved, not when there is nothing more to add, but when there is nothing left to take away.” - Antoine de Saint-Exupery

Last Edited: Tue. Aug 10, 2021 - 05:31 PM
Total votes: 0

Can you update your utility to generate the following? 

const __flash uint8_t nelt_A = 4;
const __flash uint8_t nelt_B = 5;
...

 

Total votes: 1

No, you can't know the size of an array just from its start address.  You might want to create a structure that holds the array start address and its size.  Then you can create an array of those structures.  That would make your endAddress data elements redundant, since you can always calculate endAddress from startAddress and sizeof.
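
A minimal sketch of that structure-based layout, assuming avr-gcc's `__flash` qualifier. The `FLASH` macro, `array_info_t`, and `last_element` are illustrative names, not something from this thread. The key point is that the table is built while the names still have array type, so `sizeof` still works:

```c
#include <stddef.h>
#include <stdint.h>

/* avr-gcc defines __FLASH when the __flash address space is available;
 * on other compilers FLASH expands to nothing so the sketch still builds. */
#ifdef __FLASH
#define FLASH __flash
#else
#define FLASH
#endif

const FLASH uint8_t data_A[] = { 0xff, 0xff, 0xff, 0xff };
const FLASH uint8_t data_B[] = { 0xff, 0xff, 0xff, 0xff, 0xff };

/* Hypothetical descriptor: start address plus size in bytes. */
typedef struct {
    const FLASH uint8_t *start;
    uint16_t size;
} array_info_t;

/* sizeof is applied here, where data_A and data_B are still arrays. */
const array_info_t arrays[] = {
    { data_A, sizeof(data_A) },
    { data_B, sizeof(data_B) },
};

/* Address of the last element, derived on demand instead of stored. */
const FLASH uint8_t *last_element(const array_info_t *info)
{
    return info->start + info->size - 1;
}
```

In this sketch the table itself lives in RAM; whether to place it in flash as well is a separate choice.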

 

Also, don't use magic numbers like 2.  Use a define such as PTR_SIZE, preferably defined with a sizeof for clarity and universality.
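
One possible way to avoid the magic 2 is an element-count macro built from `sizeof`. This is only a sketch: `ARRAY_LEN` and `pointer_count` are made-up names, and the `__flash` qualifier is left out so the snippet builds on any compiler:

```c
#include <stddef.h>
#include <stdint.h>

/* Element count of a true array (not a pointer). Hypothetical helper name. */
#define ARRAY_LEN(a) (sizeof(a) / sizeof((a)[0]))

static const uint8_t data_A[] = { 0xff, 0xff, 0xff, 0xff };
static const uint8_t data_B[] = { 0xff, 0xff, 0xff, 0xff, 0xff };

const uint8_t *startAddress[] = { data_A, data_B };

/* Sized from the element count, whatever the pointer width is. */
const uint8_t *endAddress[ARRAY_LEN(startAddress)];

/* Loop bound that stays correct if entries are added to startAddress. */
size_t pointer_count(void)
{
    return ARRAY_LEN(startAddress);
}
```

The same macro works for the loop bound, so the code no longer depends on pointers being 2 bytes wide.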

 

EDIT: another approach that was done a lot in the past is to put the length of the array at the beginning of the array.  So reserve 2 bytes at the beginning of the array to hold the array size (in bytes or elements, whatever works for you).  Then you know that the actual start of the array data is startAddress + sizeof(int) (using byte arithmetic, otherwise doing startAddress + X will point to X*sizeof(array element), which you don't want when you're finding the beginning of the array data).
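
A rough sketch of the length-prefix idea, assuming a 2-byte little-endian length written by the generator. The byte order and the names `LEN_PREFIX_BYTES`, `payload_len`, `payload_start`, and `payload_last` are assumptions for illustration; `__flash` is omitted so it builds anywhere:

```c
#include <stddef.h>
#include <stdint.h>

#define LEN_PREFIX_BYTES 2u

/* Generated array: 2-byte little-endian payload length, then the payload. */
const uint8_t data_A[] = {
    0x04, 0x00,             /* payload length = 4 */
    0xff, 0xff, 0xff, 0xff  /* payload */
};

/* Payload length read from the prefix. */
uint16_t payload_len(const uint8_t *base)
{
    return (uint16_t)(base[0] | ((uint16_t)base[1] << 8));
}

/* First payload byte: skip the prefix using byte arithmetic. */
const uint8_t *payload_start(const uint8_t *base)
{
    return base + LEN_PREFIX_BYTES;
}

/* Last payload byte; only meaningful when the payload is non-empty. */
const uint8_t *payload_last(const uint8_t *base)
{
    return payload_start(base) + payload_len(base) - 1;
}
```

With this layout the base address alone is enough at run time, because the size travels with the data.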

Last Edited: Mon. Aug 9, 2021 - 10:35 PM
Total votes: 0

Yeah, this is what I will have to do as a last resort, if nothing else works.

Total votes: 0

kk6gm wrote:
EDIT: another approach that was done a lot in the past is to put the length of the array at the beginning of the array.  So reserve 2 bytes at the beginning of the array to hold the array size (in bytes or elements, whatever works for you).  Then you know that the actual start of the array data is startAddress + sizeof(int).

This is nice. Ultimately I would have to edit the PC utility so that it emits the size of each array for the micro to read, which is what I was trying to avoid. Thanks for the idea.

 

Total votes: 1

I use a macro.  e.g.

#define M0(x) {x, #x, sizeof(x)}
typedef struct { const unsigned char *data; const char *name; uint32_t sz; } gif_detail_t;
gif_detail_t gifs[] = {

    M0(teakettle_128x128x10_gif),  // 21155
//    M0(llama_driver_gif),          //758945
    M0(horse_128x96x8_gif),        //  7868
    M0(globe_rotating_gif),        // 90533
    M0(bottom_128x128x17_gif),     // 51775
    M0(irish_cows_green_beer_gif), // 29798
    //    M0(cliff_100x100_gif),   //406564

    //    M0(llama_driver_gif),    //758945
//    M0(marilyn_240x240_gif),       // 40843
};

I only have to pass the name of the array. The macro provides the array address, a human-readable name, and the array size, which I can pass as arguments to the rendering function.

 

David.

Last Edited: Mon. Aug 9, 2021 - 10:41 PM
Total votes: 0

How about this:

 

#include <avr/io.h>

const __flash uint8_t data_A[] = { 0xff, 0xff };
const __flash uint8_t data_B[] = { 0xff, 0xff };

const __flash uint8_t *addrs[] = { data_A, data_B };
const __flash uint8_t sizes[] = { sizeof(data_A), sizeof(data_B) };

 

Total votes: 1

Heisen wrote:
At this point the startAddress array contains the base address of every array. Can I use this somehow to fill in the end addresses?

 

No, you can't. 

 

What you need is an array type. Once you have lost the array type (allowed it to decay to a pointer), the size information is lost. It is your responsibility to preserve (store) the sizes while the arrays are still arrays.
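
To illustrate the decay, here is a small sketch (the function names are made up for this example) that applies `sizeof` once to the array itself and once to a pointer parameter that received it:

```c
#include <stddef.h>
#include <stdint.h>

static const uint8_t data_A[] = { 0xff, 0xff, 0xff, 0xff };

/* Here data_A still has type "array of 4 uint8_t", so sizeof yields 4. */
size_t size_as_array(void)
{
    return sizeof(data_A);
}

/* The parameter is only a pointer: sizeof yields the pointer width
 * (2 on most AVRs), regardless of how long the array was. */
size_t size_after_decay(const uint8_t *p)
{
    return sizeof(p);
}
```

Calling `size_after_decay(data_A)` returns the same value for every array, which is why the size has to be captured before the name is turned into a pointer.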

Dessine-moi un mouton

Last Edited: Tue. Aug 10, 2021 - 02:28 AM
Total votes: 0

I must be missing something here : This is ALL done at compile time:

 

const __flash uint8_t *startAddress[] = { data_A,data_B,data_C,data_D,data_E };
const __flash uint8_t *endAddress[sizeof(startAddress) / 2];

int main(void)
{
    for (uint8_t x = 0; x < (sizeof(startAddress) / 2); x++) {
        /* You CANNOT write to flash here surely */
        endAddress[x] = ? ? ?  // What can we write here so that endAddress contains the address's of last element of all the data arrays.
    }
    while (1);
}

 

 

 

Total votes: 0

Heisen wrote:
Ultimately I would have to edit the PC utility to get me the size of the arrays so it can be read by the micro. Which is what I was trying to avoid.
Why? Surely it's trivial? The thing that generates the bytes of data just needs to count them and then fill in the length field at the end. If your current code serializes the data to a generated file as it goes, perhaps it needs to serialize it to a RAM buffer with a placeholder for the size at the start, then go back and fill that in when the final size is known, and then serialize the buffer to disk.
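
The placeholder-then-back-patch step could look roughly like this. It is written in C only for illustration (the real utility is a separate PC program), and the function name, the 2-byte little-endian length, and the buffer handling are all assumptions:

```c
#include <stddef.h>
#include <stdint.h>

#define LEN_PREFIX_BYTES 2u

/* Hypothetical generator step: copy `n` payload bytes into `out` behind a
 * 2-byte placeholder, then go back and fill the placeholder with the count.
 * Returns the total number of bytes written, or 0 if `out` is too small
 * or the payload does not fit a 16-bit length. */
size_t serialize_with_length(const uint8_t *payload, size_t n,
                             uint8_t *out, size_t out_cap)
{
    if (n > UINT16_MAX || out_cap < n + LEN_PREFIX_BYTES) {
        return 0;
    }

    size_t pos = LEN_PREFIX_BYTES;      /* leave room for the length field */
    for (size_t i = 0; i < n; i++) {
        out[pos++] = payload[i];        /* count bytes as they are emitted */
    }

    size_t count = pos - LEN_PREFIX_BYTES;
    out[0] = (uint8_t)(count & 0xffu);  /* back-patch, little-endian */
    out[1] = (uint8_t)(count >> 8);
    return pos;
}
```

The buffer returned by this step is what would then be written out as the initializer of the generated array.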

Total votes: 0

You can store information at compile-time.   e.g. my M0() macro.

 

Or you pass specific array address and size at runtime.

 

Alternatively you can embed the array size in a struct.    Which is effectively what a Pascal-style string or a C++ String does.

 

With C strings you use strlen() to look for the NUL-terminator.

If you have random binary data in an array you can't search for a "terminator"

 

David.

Total votes: 0


Forgot to mention that C++ has std::array ;-)

 

 

Note that you have to use C++17 or above to declare a std::array without spelling out its element type and length (class template argument deduction). If, for example, you use C++14 you will see something like:

    d:\c\arrays\arrays\arrays.cpp(4): error C2955: 'std::array': use of class template requires template argument list
    c:\program files (x86)\microsoft visual studio\2017\professional\vc\tools\msvc\14.16.27023\include\array(22): note: see declaration of 'std::array'
    d:\c\arrays\arrays\arrays.cpp(4): error C2514: 'std::array': class has no constructors
    c:\program files (x86)\microsoft visual studio\2017\professional\vc\tools\msvc\14.16.27023\include\array(22): note: see declaration of 'std::array'
    d:\c\arrays\arrays\arrays.cpp(8): error C2662: 'std::array<_Ty,_Size>::size_type std::array<_Ty,_Size>::size(void) noexcept const': cannot convert 'this' pointer from 'std::array' to 'const std::array<_Ty,_Size> &'
    d:\c\arrays\arrays\arrays.cpp(8): note: Reason: cannot convert from 'std::array' to 'const std::array<_Ty,_Size>'
    d:\c\arrays\arrays\arrays.cpp(8): note: Conversion requires a second user-defined-conversion operator or constructor

(this is because previously you had to specify type and length with something like "std::array<int, 5> data{17, 5, 19, 23, 46};" which would just bring you back to the same issue of needing to know the "5" value at the start).

Total votes: 0

david.prentice wrote:

I use a macro.  e.g.

#define M0(x) {x, #x, sizeof(x)}

Nice macro. Thanks for showing, I surely have a use case for this one.

MattRW wrote:

how about this :

#include <avr/io.h>

const __flash uint8_t data_A[] = { 0xff, 0xff };
const __flash uint8_t data_B[] = { 0xff, 0xff };

const __flash uint8_t *addrs[] = { data_A, data_B };
const __flash uint8_t sizes[] = { sizeof(data_A), sizeof(data_B) };

Yes, this will work. I was trying something else, but now I am thinking clearly and there are much simpler solutions like this one. I must not complicate things.

N.Winterbottom wrote:

I must be missing something here : This is ALL done at compile time:

const __flash uint8_t *startAddress[] = { data_A,data_B,data_C,data_D,data_E };
const __flash uint8_t *endAddress[sizeof(startAddress) / 2];

int main(void)
{
    for (uint8_t x = 0; x < (sizeof(startAddress) / 2); x++) {
        /* You CANNOT write to flash here surely */
        endAddress[x] = ? ? ?  // What can we write here so that endAddress contains the address's of last element of all the data arrays.
    }
    while (1);
}

I was not writing to flash. The pointers are in SRAM, and that is where I am writing while the program is running. The const __flash uint8_t is the type the pointers point to, not the storage of the pointers themselves.

clawson wrote:

Why? Surely it's trivial? The thing that generates the bytes of data just needs to count them and then fill in the length field at the end.

True. The PC app is a Windows Forms App (.NET Framework), in which I am not fluent. I wanted to avoid editing it, but I have to do it.

 

Thanks everyone for eliminating my fuzzy thinking and restoring clarity.

Total votes: 0

@ Heisen

 

You would make life easier if you showed an example of the data.

 

If these arrays have names,  you can do something similar to my M0() example.   i.e. create an array of structs.

 

But if you have a name, you just ensure that you pass array_name (which decays to the address of its first element) and sizeof(array_name) to any function that needs to know.
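
As a sketch of passing the address and size together (the `sum_bytes` consumer and the sample data are invented for this example):

```c
#include <stddef.h>
#include <stdint.h>

static const uint8_t data_A[] = { 0x01, 0x02, 0x03, 0x04 };

/* Hypothetical consumer: it only sees a pointer, so the caller must
 * supply the length explicitly. */
uint16_t sum_bytes(const uint8_t *data, size_t len)
{
    uint16_t sum = 0;
    for (size_t i = 0; i < len; i++) {
        sum = (uint16_t)(sum + data[i]);
    }
    return sum;
}

/* The call site still knows the array type, so sizeof gives the length. */
uint16_t sum_data_A(void)
{
    return sum_bytes(data_A, sizeof(data_A));
}
```

For arrays whose elements are wider than one byte, the length argument would be the element count rather than `sizeof` of the whole array.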

 

In my example I don't know the size of each GIF image.   The value in the comment was hand-edited after displaying the GIF.

 

However it is possible to calculate the size of the GIF data because GIF is a published format.

I suspect that your array data has a format that you could decode.

If it is truly random data the only method is sizeof(a_valid_array_name)

 

David.

Total votes: 0

Just a thought, but it appears from your example that your data is all 0xff's. If so, why couldn't you put, say, a 0x00 at the end of each array and then step through each array, counting until you reach the 0x00?

 

Or have I completely misunderstood the problem?

Happy Trails,

Mike

JaxCoder.com

Total votes: 0

Heisen wrote:
True. The PC app is written in Windows Form App(.NET Framework) at which I am not fluent. I wanted to avoid editing that. But have to do it.
Time to learn Python perhaps? It's universal - you can port the stuff between Windows and Linux - you aren't tied to M$ technologies. Learn once, use many.

This reply has been marked as the solution. 
Total votes: 0

After thinking it over, your macro seems to be the cleanest solution to my very fuzzy question. Thanks David.

#include <avr/io.h>

#define M0(x) {x, &x[sizeof(x)-1]}

const __flash uint8_t data_A[] = {
    0xff,0xff,0xff,0xff
};

const __flash uint8_t data_B[] = {
    0xff,0xff,0xff,0xff,0xff
};

const __flash uint8_t data_C[] = {
    0xff,0xff,0xff
};

const __flash uint8_t data_D[] = {
    0xff,0xff,0xff,0xff,0xff,0xff
};

const __flash uint8_t data_E[] = {
    0xff,0xff
};

typedef struct
{
    const __flash uint8_t *startAddress;
    const __flash uint8_t *endAddress;
}address;

address data[] = {
    M0(data_A),
    M0(data_B),
    M0(data_C),
    M0(data_D),
    M0(data_E),
};

int main(void)
{
    while (1);
}


Last Edited: Tue. Aug 10, 2021 - 05:39 PM
Total votes: 0

mike32217 wrote:

Just a thought but it appears from your example that your data is all 0xffs', if this is so why couldn't you put say a 0x00 at the end of each array and then look through each array counting until you access the 0x00?

 

Or have I completely misunderstood the problem?

The real data is not all 0xff's; there might be a 0x00 in there. I just typed 0xff in a hurry to show a dummy example.
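
A small sketch of why a terminator scan breaks on binary data. The sample bytes and the function name are made up; the array has four payload bytes plus an appended 0x00:

```c
#include <stddef.h>
#include <stdint.h>

/* Binary payload that legitimately contains 0x00 in the middle,
 * followed by an appended 0x00 "terminator". */
static const uint8_t data_A[] = { 0x12, 0x00, 0x34, 0x56, 0x00 };

/* strlen-style scan: counts bytes up to the first 0x00. */
size_t scan_for_terminator(const uint8_t *p)
{
    size_t n = 0;
    while (p[n] != 0x00) {
        n++;
    }
    return n;
}
```

The scan stops at the embedded 0x00 and reports 1 byte instead of 4, so a sentinel only works when the sentinel value can never occur in the data.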


Last Edited: Tue. Aug 10, 2021 - 05:34 PM
Total votes: 0

clawson wrote:

Time to learn Python perhaps?

But Python is made from C and I know C.

Just kidding.

Total votes: 0

>Now this works okay but I want to find the end address's with a loop. Reason being if this can be done then it opens doors for me to do something even further.

 

 

Not sure what the loop is going to do for you, as the arrays are filled just the same, so I assume you want to change the order at runtime, or do some other iteration over sequences of arrays, or something. You could just as well let the PC app create these start/end arrays, since it has all the info needed to do so:

 

https://godbolt.org/z/MszzWhzha

 

The dataInfo array can be iterated in any order you want, and the underlying info remains as-is (and all in flash in this case).

 

Common code split into a function:

https://godbolt.org/z/MGTTMjvMq

 

 

 

Last Edited: Tue. Aug 10, 2021 - 07:52 PM
Total votes: 0

I'm curious. Why in many of the suggested solutions are start/end rather than start/length being used. Surely the latter is "simpler"? Or did I miss some requirement where the need to know the end address was stated?

Total votes: 0

clawson wrote:
I'm curious. Why in many of the suggested solutions are start/end rather than start/length being used. Surely the latter is "simpler"? Or did I miss some requirement where the need to know the end address was stated?

 

I am not entirely sure about this one, but here it is. Suppose I have a very large array, say

uint8_t data[5000];

I am reading a chunk of data at a time, and the chunk size varies every time. If I use the length of the array to iterate through it, that would look like this.

data[x]

where x can go from 0 to 4999

Whenever I come back to read the next chunk of data, the program first has to add the value of x to the base pointer of data. We have a fast instruction for this purpose, but it only works for adding up to 63 to the base address; beyond that the compiler uses different instructions due to opcode limitations. That is okay, but right now I am doing it another way.

 

Instead of incrementing x and letting it go beyond 63, I copy the base address of the array to a pointer and read the data through that pointer, advancing it after each read. I read the data sequentially, and I am sure that nowhere in the program do I read a chunk of more than 63 bytes at once. It's only possible like this (not completely sure, though):

 

// where x = 0

ptr[x++]
ptr[x++]
ptr[x++]
ptr[x++]

ptr += x;

if(ptr == endAddress)
{
    ptr = startAddress; // To read data again from the beginning
}

Now whenever I come back to continue reading the data, the program just has to read the address in the pointer variable to resume. It never has to add more than 63 to that pointer at any point in the program, which seems cleaner because the compiler will not generate slightly poorer code for adding more than 63 to an address.

 

This way, to detect the end of the array I need the end address, so that I can do the comparison.

 

This may or may not help in generating slightly faster code. I'd have to test to properly see the difference.
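
A rough sketch of the pointer-walking reader described above. `reader_t`, `reader_init`, and `reader_next` are made-up names, and `__flash` is left out so it builds anywhere. One assumption differs from the snippet above: `end` here points one past the last element, because after the final read the pointer has already moved beyond the last byte, so that is the value it can be compared against directly:

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical reader state: [start, end) is a half-open range, so `end`
 * points one past the last element and is never dereferenced. */
typedef struct {
    const uint8_t *start;
    const uint8_t *end;
    const uint8_t *cur;
} reader_t;

void reader_init(reader_t *r, const uint8_t *data, size_t len)
{
    assert(len > 0);        /* an empty array cannot be read or wrapped */
    r->start = data;
    r->end = data + len;
    r->cur = data;
}

/* Read one byte and advance; wrap to the start after the last byte. */
uint8_t reader_next(reader_t *r)
{
    uint8_t value = *r->cur++;
    if (r->cur == r->end) {
        r->cur = r->start;  /* read the data again from the beginning */
    }
    return value;
}
```

Resuming later only needs `cur`, so no index is ever added to the base address. Whether this actually produces faster code on the AVR would still have to be measured.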

 


Last Edited: Wed. Aug 11, 2021 - 12:58 PM
Total votes: 0

OK but why not just:

ptr[x++]
ptr[x++]
ptr[x++]
ptr[x++]

if(x == size_of_array)
{
    x = 0; // To read data again from the beginning
}

(during its life x ranges between 0 and 4999 for a 5000 element array)

Last Edited: Wed. Aug 11, 2021 - 02:36 PM
Total votes: 0

clawson wrote:

I'm curious. Why in many of the suggested solutions are start/end rather than start/length being used. Surely the latter is "simpler"? Or did I miss some requirement where the need to know the end address was stated?

I asked about that as well.  It seems an odd approach.

Total votes: 1

kk6gm wrote:

clawson wrote:

 

I'm curious. Why in many of the suggested solutions are start/end rather than start/length being used. Surely the latter is "simpler"? Or did I miss some requirement where the need to know the end address was stated?

I asked about that as well.  It seems an odd approach.

 

While I would probably also opt for storing the length instead of a past-the-end pointer, this is nothing more than an old, well-entrenched habit.

 

On the contrary, formal logic says that storing the end pointer is a more "natural" approach for more reasons than one. Storing length is actually an "odd" approach.

 

Two pointers are a uniform, homogeneous pair (as opposed to a pointer and an integer). Two pointers form a range of iterators: a start iterator and an end iterator, which is as useful in C as it is in C++.

 

Also, the end pointer can be obtained without resorting to a `sizeof`-based fraction using the following neat trick

 

T array[] = { /* whatever */ };
T *begin = array, *end = (&array)[1];

And while formally it is illegal, it is worth pushing for its legalization :)
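
For comparison, here is a sketch of a form that stays within the standard: the one-past-the-end pointer is built from the array name plus its element count, then used as the end of a half-open range. `ARRAY_LEN`, `table`, and `sum_range` are illustrative names only:

```c
#include <stddef.h>
#include <stdint.h>

#define ARRAY_LEN(a) (sizeof(a) / sizeof((a)[0]))

static const uint16_t table[] = { 100, 200, 300 };

/* One-past-the-end pointer formed by plain pointer arithmetic,
 * which the C standard permits as long as it is not dereferenced. */
static const uint16_t *const table_begin = table;
static const uint16_t *const table_end = table + ARRAY_LEN(table);

/* Walk the half-open range [begin, end). */
uint32_t sum_range(const uint16_t *begin, const uint16_t *end)
{
    uint32_t sum = 0;
    for (const uint16_t *p = begin; p != end; ++p) {
        sum += *p;
    }
    return sum;
}
```

This does rely on `sizeof`, which is exactly what the trick above avoids, but it yields the same address without the formally undefined step.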


Last Edited: Wed. Aug 11, 2021 - 10:47 PM
Total votes: 0

AndreyT wrote:

T *begin = array, *end = (&array)[1];

Assuming we're talking about inclusive pointers and you could legally write through end, wouldn't the above code produce end as the address of the next element after the final element?

I.e. an address outside that array?

 

Total votes: 0

N.Winterbottom wrote:

AndreyT wrote:

 

T *begin = array, *end = (&array)[1];

Assuming we're talking about inclusive pointers and you could legally write through end, wouldn't the above code produce end as the address of the next element after the final element ?

I.e An address outside that array ?

 

 

Yes, it is an address outside the array. What I want to achieve here is an inclusive-exclusive range [begin, end), i.e. `end` is a one-past-the-end pointer, which cannot be dereferenced.

Total votes: 0

AndreyT wrote:

 

Also, the end pointer can be obtained without resorting to a `sizeof`-based fraction using the following neat trick

 

T array[] = { /* whatever */ };
T *begin = array, *end = (&array)[1];

And while formally it is illegal, it is worth pushing for its legalization :)

 

That's a cute trick.  I like it.