CS50x – Week 4

Written by on September 28, 2017 in Coding, CS50

ENHANCE!

Last week we learnt about searching or sorting algorithms and their computational complexity.

This week we will learn more about strings, memory and delve briefly into image manipulation.

Strings

Up until this point we’ve been using strings as a data type. int and float are basic data types that take a definitive amount of data, (4 bytes each). We know that strings are in essence an array of chars and since they can be any length, we won’t be able to know how much memory to allocate until we know the length of the string.

int main(void)
{
    printf("s: ");
    string s = get_string();

    printf("t: ");
    string t = get_string();

    if (s == t)
    {
        printf("same\n");
    }
    else
    {
        printf("different\n");
    }
}

If we put the same string for s and t we should expect the program to print ‘same’.

However the program says that they are different. Let’s look at another example.

int main(void)
{
    printf("s: ");
    string s = get_string();
    if (s == NULL)
    {
        return 1;
    }

    string t = s;

    if (strlen(t) > 0)
    {
        t[0] = toupper(t[0]);
    }

    printf("s: %s\n", s);
    printf("t: %s\n", t);

    return 0;
}

With this code we should expect only t should become capitalised.

However we see that both s and t have been capitalised. We can conclude from this that strings definitely do not work the same way that other data types do. Let’s look at another example, a very common function.


void swap(int a, int b);

int main(void)
{
    int x = 1;
    int y = 2;

    printf("x is %i\n", x);
    printf("y is %i\n", y);
    printf("Swapping...\n");
    swap(x, y);
    printf("Swapped.\n");
    printf("x is %i\n", x);
    printf("y is %i\n", y);
}

void swap(int a, int b)
{
    int tmp = a;
    a = b;
    b = tmp;
}

This code should swap the two variables.

However we find that the output is still not what we would expect. The reason for this is memory allocation.

Memory

Every time we run a program, no matter how small or complex, the operating system allocates a chunk of memory for that program. It can be shown like so.

The memory allocation can be thought of as a big rectangle, made up of smaller rectangles or a grid of bytes. The top layer of this is ‘text’, this is where the machine code of our program is located in memory. The next layer is the data that our program will use. Under that we have what is known as the heap and stack. The heap and stack are special in that their sizes are not defined. The stack is where the real-time allocation of our variables are allocated and stored within memory. Every function, including main and each function that is subsequently called along with the variables that they will use, are allocated memory here.

If we look at the earlier example of Swap, this is what is allocated when the program first starts. You can see a chunk of memory has been allocated for main and it’s variables.

Once the program reaches a point where swap is called then that function is allocated a chunk of memory along with the variables it will use. As we have passed in variables into the function a copy of those variables will be used within the function, where a is a copy of x and b is a copy of y.

The function does its operation and the two variables are swapped in memory. Once the function is completed it is cleared from memory.

Once the function is cleared from memory we can see that it did not swap the two variables in main. It had in fact only swapped the copied variables within the swap function. Once that was cleared from memory that swap was in fact lost.

So how do we go about solving this? One way is to understand what happens when we declare a string. When we declare a string we are creating a variable that holds an address. An address in memory where our string starts, since we don’t know how big a string will be.

So earlier when we tried to compare strings we declared string s which created a variable which has an address at 123, then we declared string t which held a different address at 321. So when we tried to see if s == t, they in fact we not because 123!=321.

In the next example we tried to capitalise our second string, we declared string s, then we declared string t = s. This meant that s and t held the same address. So when we changed t to uppercase, s which had the same address, also changed.

We can refer s and t to ‘pointers’ because they ‘point’ to a memory address. In C we denote pointers with a ‘*’ when we declare a variable. a string is in fact a char *.
Pointers become very handy when you have a large program with lots of variables and functions. Instead of creating copies of variables every time you pass them into functions if you have pointers you only need to point to where that variable is stored in memory to manipulate it.

Lets try the swap function again this time with pointers.

void swap(int *a, int *b);

int main(void)
{
    int x = 1;
    int y = 2;

    printf("x is %i\n", x);
    printf("y is %i\n", y);
    printf("Swapping...\n");
    swap(&x, &y);
    printf("Swapped!\n");
    printf("x is %i\n", x);
    printf("y is %i\n", y);
}

void swap(int *a, int *b)
{
    int tmp = *a;
    *a = *b;
    *b = tmp;
}

You can see that our swap function has two pointers passed in as variables, denoted by the int *. Also in the swap function itself we are using the pointers as variables instead of the copied variables. When we call swap we use & of x and y to say lets pass in the address of this variable instead of the variable itself. Lets look at the output now.

It works now because we are not copying in values into a swap function, but the address where the pointer is pointing to, so we swap the address the of the variable.

Simple Memory Management

We can also define the amount of memory we wish to give a variable. We use function called malloc. malloc will allocate an amount of memory we define. We usually use malloc with another function called sizeof which returns the size of a data type. This ensures that we use the correct amount of memory for the given data type. Using malloc give us a lot more freedom with how we can manage our memory usage and is an essential tool in optimising large scale programs.
However having that much freedom with memory also has risks. One of those risks is when we declare our pointers. If we declare a pointer without using malloc we are pointing to ‘somewhere’ in memory which could be an unreadable area of memory or a part of another program or even a operating system service. If we tried to use a pointer without pointing it some specific block of memory our program will definitely produce unexpected results. Another one of the major risks is memory leaks. A memory leak occurs when we allocate memory for use but then do not free up that memory once we have finished with it. That allocated bit of memory will still persist whilst our program is running. Memory is a finite resource and if for example you have a system that is running 24/hr a day and a program kept allocating memory without freeing it up after it was done, the amount of usable memory would run out, causing the whole system to crash.

CS50 has a tool we can use valgrind which allows us to see how much memory we allocate during our program and how much of it we have freed. It is a good idea to run valgrind and clear up any memory leaks before we finalise our programs.

Returning back to the Heap and Stack in the memory. The stack is the part of memory where we store the functions and variables and the heap is where we allocate the memory using malloc. If we grow our stack too much usually by using a recursive function that calls itself too many times we can end up in a situation called a stack overflow. When we use malloc to allocate large amounts of memory without every using free we can end up in a situation called heap overflow. Both of these will cause our system to crash.

Image Manipulation

A very common application of pointers is image manipulation.
When we zoom in to an image, we can see if it made up of a grid of squares.

Each square in the grid is a pixel. Images have a finite amount of pixels thus have a finite amount of information. Each pixel can hold information, most basic of which is 0 or 1.

Where 1 is white and 0 is black we can create a black and white image of something. This type of image is usually known as a bitmap image.

There are other types of image files, most common is the JPEG. JPEG is useful because it uses compression, which makes images a smaller file size by using fewer bits and losing some of the information. JPEG have a signature which means all JPEG have the same first three bytes, 255,216,255. Back in week 2 we touched upon 2 numbering systems that we use in computing. Binary and Decimal. There is a 3rd type that is very commonly used, Hexadecimal. Where binary uses 2 values, and decimal 10. Hexadecimal uses 16 values, 0-9 and then a, b, c, d, e and f, to represent 10-15. 255 in decimal is 1111 1111 and 216 is 1101 1000. Each of the 4 bits can hold 16 values and so map perfectly to hexadecimal. 1111 is f and 1101 is d and 1000 is 8. So 255 in decimal becomes ff in hexadecimal and 216 becomes d8. We use a particular convention to denote hexadecimal numbers, we precede them with 0x. So we write them as 0xff and 0xd8. Knowing that a particular file starts with a signature we can recover those files from raw binary data. For example files that have been deleted but not overwritten.

Bitmap files are another type of image file. They are older and have a less complex data structure. They are less efficient due to not using any compression and are easier to manipulate and work with since each pixel has some number of bytes. The file header is what goes at the beginning of each bitmap. It looks like this.

Files are essentially a sequence of bits and if each byte has some offset from the beginning we can specify exactly what we need to find in order to validate the file. The first 5 parts of that list are collectively known as BITMAPFILEHEADER. This collective structure is known as a struct. Structs allow us to group together variables, so that they can be managed more easily. The next 11 parts are put into a struct called BITMAPINFOHEADER. After that are more structs called RGBTRIPLE which are actually the pixels of our image. Each RGBTRIPLE has three variables, a value for RED, GREEN and BLUE. We can access the values of the struct using a dot notation.


    typedef struct RGBTRIPLE
    {
        BYTE rgbtBlue;
        BYTE rgbtGreen;
        BYTE rgbtRed;
    }
    RGBTRIPLE color;
    color.rgbtBlue = 255;
    color.rgbtGreen = 255;
    color.rgbtRed = 255;

We can declare a struct using the keyword typedef.

Problem Set 4

In this pset we will be resizing and recovering images. The tasks outline are as follows:

  1. Implement Whodunit
  2. Implement either of:
    • Resize, less comfortable
    • Resize, more comfortable
  3. Implement Recover

Whodunit

Whodunit asks us to reveal the hidden message within an image. The image provided is like this. 

The main objective of whodunit is to understand the code provided and make the necessary changes to reveal the hidden message.

After going through the code given, which basically copies an image. It reads a file/image at every pixel and writes them to a new file. The solution to this is to modify the code where the new file written. The clue image has a large amount of red pixels. So we modify the values that are written in the new file and nullify the red values. I decided that if the pixel is red we will change that pixel to black.

see source
// iterate over pixels in scanline
for (int j = 0; j < bi.biWidth; j++)
{
// temporary storage
RGBTRIPLE triple;
// read RGB triple from infile
fread(&triple, sizeof(RGBTRIPLE), 1, inptr);
//Filter out the red noise, setting it to black
if (triple.rgbtRed == 0xff)
{
triple.rgbtRed = 0;
triple.rgbtBlue = 0;
triple.rgbtGreen = 0;
}
// write RGB triple to outfile
fwrite(&triple, sizeof(RGBTRIPLE), 1, outptr);
}
view raw whodunit.c hosted with ❤ by GitHub

Resize

Resize asks us to resize an image. There are two version that are offered. The less comfortable asks us to resize the image by a integer scalar. This would ensure that can only make the images larger. The more comfortable version asks us to use a float scalar which means we need to be able to scale it larger or smaller. I went for the more comfortable version and it turned out to be a lot more challenging.

We needed to modify the code that was provided with whodunit. The main thing that we have to understand and change is how changing the file size and the height and width of the image we need to update our BITMAPFILEHEADER and BITMAPINFOHEADER.

The next issue was when we need to scale our image by our scalar. If our scalar was above 1 and a whole number it is easy to resize, we simply increase the number of pixels by the scalar. However when we have floating point value scalars it isn’t a simple increase the number of pixels by x amount. We need some way to choose which pixels we carry over to our new image. For example if our scalar is 0.5 and our image has an area of 5 black pixels in a row. We can’t simply halve it an have 2.5 black pixels. We need to either scale the pixels either down to 2 or up to 3.

see source
/**
* Copies a BMP piece by piece, just because.
*/
#include <stdio.h>
#include <stdlib.h>
#include "bmp.h"
int main(int argc, char *argv[])
{
// ensure proper usage
if (argc != 4)
{
fprintf(stderr, "Usage: ./copy n infile outfile\n");
return 1;
}
//get float value
float scale = atof(argv[1]);
// remember filenames
char *infile = argv[2];
char *outfile = argv[3];
// open input file
FILE *inptr = fopen(infile, "r");
if (inptr == NULL)
{
fprintf(stderr, "Could not open %s.\n", infile);
return 2;
}
// open output file
FILE *outptr = fopen(outfile, "w");
if (outptr == NULL)
{
fclose(inptr);
fprintf(stderr, "Could not create %s.\n", outfile);
return 3;
}
// create and read infile's BITMAPFILEHEADER
BITMAPFILEHEADER bf_in;
fread(&bf_in, sizeof(BITMAPFILEHEADER), 1, inptr);
// create and read infile's BITMAPINFOHEADER
BITMAPINFOHEADER bi_in;
fread(&bi_in, sizeof(BITMAPINFOHEADER), 1, inptr);
// ensure infile is (likely) a 24-bit uncompressed BMP 4.0
if (bf_in.bfType != 0x4d42 || bf_in.bfOffBits != 54 || bi_in.biSize != 40 ||
bi_in.biBitCount != 24 || bi_in.biCompression != 0)
{
fclose(outptr);
fclose(inptr);
fprintf(stderr, "Unsupported file format.\n");
return 4;
}
// determine padding for scanlines for the infile
int in_padding = (4 - (bi_in.biWidth * sizeof(RGBTRIPLE)) % 4) % 4;
//an array to store our input image.
RGBTRIPLE image[bi_in.biWidth][abs(bi_in.biHeight)];
// iterate over infile's scanlines
for (int i = 0, biHeight = abs(bi_in.biHeight); i < biHeight; i++)
{
// iterate over pixels in scanline
for (int j = 0; j < bi_in.biWidth; j++)
{
// temporary storage
RGBTRIPLE triple;
// read RGB triple from infile
fread(&triple, sizeof(RGBTRIPLE), 1, inptr);
// store the image in an array
image[j][i] = triple;
}
// skip over padding, if any
fseek(inptr, in_padding, SEEK_CUR);
}
// create the outfile's BITMAPFILEHEADER
BITMAPFILEHEADER bf_out;
bf_out = bf_in;
// create the outfile's BITMAPINFOHEADER
BITMAPINFOHEADER bi_out;
bi_out = bi_in;
//calculate the new width and height;
bi_out.biWidth = bi_in.biWidth * scale;
bi_out.biHeight = bi_in.biHeight * scale;
// determine padding for scanlines for the infile
int out_padding = (4 - (bi_out.biWidth * sizeof(RGBTRIPLE)) % 4) % 4;
//change the image size
bi_out.biSizeImage = ((sizeof(RGBTRIPLE) * bi_out.biWidth) + out_padding) * abs(bi_out.biHeight);
//change the file size
bf_out.bfSize = bi_out.biSizeImage + sizeof(BITMAPFILEHEADER) + sizeof(BITMAPINFOHEADER);
// write outfile's BITMAPFILEHEADER
fwrite(&bf_out, sizeof(BITMAPFILEHEADER), 1, outptr);
// write outfile's BITMAPINFOHEADER
fwrite(&bi_out, sizeof(BITMAPINFOHEADER), 1, outptr);
// iterate over outfile's scanlines
for (int i = 0, biHeight = abs(bi_out.biHeight); i < biHeight; i++)
{
// iterate over pixels in scanline
for (int j = 0; j < bi_out.biWidth; j++)
{
// temporary storage
RGBTRIPLE triple;
//determine which pixel to use for the outfile
triple = image[(int) (j / scale)][(int) (i / scale)];
// write RGB triple to outfile
fwrite(&triple, sizeof(RGBTRIPLE), 1, outptr);
}
// then add it back (to demonstrate how)
for (int k = 0; k < out_padding; k++)
{
fputc(0x00, outptr);
}
}
// close infile
fclose(inptr);
// close outfile
fclose(outptr);
// success
return 0;
}
view raw resize.c hosted with ❤ by GitHub

Recover

This problem set asks us to recover a series of photographs from raw binary data. When we delete files from our device it is not permanently deleted. It is usually stored in an area is not readily accessible. Another thing to note is that photos taken on a digital camera or phone are saved in contiguously on the memory card.

The task gives us a file of raw binary data. Our job is to go through the file and recover our images. Luckily we know that the files we need to recover are JPEG and as mentioned earlier JPEG file all start with their signature. We also know that JPEG are stored in blocks. This means we know that the JPEG file size will be a multiple of the block size.

The solution to this is to read through the file and check for the JPEG signature. Once it has been found we need to read through the file at every block size and write to a file until we reach another JPEG signature. When we reach a new signature we know that the previous image is recovered and we can close the file and create a new image file.

see source
/**
* recover.c
*/
#include <stdio.h>
#define BLOCK_SIZE 512
int main(int argc, char *argv[])
{
if (argc != 2)
{
fprintf(stderr, "Please enter the correct number of arguments\n");
return 1;
}
// open input file
FILE *file = fopen(argv[1], "r");
if (file == NULL)
{
fprintf(stderr, "Could not open %s.\n", argv[1]);
return 2;
}
//create block
unsigned char block[BLOCK_SIZE];
//create outfile file
int jpgcount = 0;
FILE* pic = NULL;
int jpg_found = 0;
//read through the file til the end
while(fread(block,BLOCK_SIZE,1, file)==1)
{
//Check for jpg signiture
if (block[0] == 0xff && block[1] == 0xd8 && block[2] == 0xff &&
(block[3] == 0xe0 || block[3] == 0xe1 || block[3] == 0xe2 || block[3] == 0xe3
|| block[3] == 0xe4 || block[3] == 0xe5 || block[3] == 0xe6 || block[3] == 0xe7
|| block[3] == 0xe8 || block[3] == 0xe9 || block[3] == 0xea || block[3] == 0xeb
|| block[3] == 0xec || block[3] == 0xed || block[3] == 0xee || block[3] == 0xef))
{
//check to see if a picture has already been found, if so close it
if(jpg_found ==1)
{
fclose(pic);
}
else
{
jpg_found = 1;
}
//construct filename
char filename[8];
sprintf(filename,"%03d.jpg",jpgcount);
pic = fopen(filename, "a");
jpgcount++;
}
//write the file
if(jpg_found == 1)
{
fwrite(&block, BLOCK_SIZE, 1, pic);
}
}
//close files
fclose(file);
fclose(pic);
//success
return 0;
}
view raw recover.c hosted with ❤ by GitHub

This is CS50x and that was week 4

Leave a Reply

Your email address will not be published. Required fields are marked *

Comments«