How to Calculate Statistical Median using a C++ Program Example

by Koscica Dusko on March 27, 2014

Statistics is an essential part of math. It has many practical applications, but it can be very hard to understand for non-mathematicians.

In this tutorial, we’ll focus on how to implement an algorithm that calculates the statistical median.

The following are the high-level tasks that will be performed in our example program:

  • Input the size of the array. All elements are of type double.
  • Once you know the size, create the array with exactly enough elements. This way you will not waste memory.
  • The elements of the array will be entered by the user one at a time, and the array will be kept sorted at all times.
  • Print all elements of the array in sorted order, and calculate the statistical median.

When we say that we calculate the median in statistical terms, we mean that we find the value for which half of the elements of the array are less than or equal to it and half are greater than or equal to it.

If the array has an odd number of elements, the median is the single middle element of the sorted array, so it is a value that belongs to the array. If the array has an even number of elements, we take the two middle elements of the sorted array and find their average; in that case the median does not have to be a value that appears in the array.
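To make the odd/even rule concrete: for {1, 3, 7, 9} the two middle values are 3 and 7, so the median is 5, which is not in the set, while for {5, 1, 3} the sorted middle value is 3. The sketch below expresses the same rule with std::vector and std::sort instead of the hand-written insertion sort used later in the article; the function name median is our own, hypothetical choice, not part of the program that follows.

```cpp
#include <algorithm>
#include <cstddef>
#include <stdexcept>
#include <vector>

// Hypothetical helper: returns the statistical median of the values.
// The vector is taken by value, so the caller's data stays unsorted.
double median(std::vector<double> values)
{
    if (values.empty())
        throw std::invalid_argument("median of an empty set is undefined");

    std::sort(values.begin(), values.end());
    const std::size_t half = values.size() / 2;

    if (values.size() % 2 == 0)
        // Even count: average the two middle values.
        return (values[half - 1] + values[half]) / 2.0;

    // Odd count: the single middle value.
    return values[half];
}
```

For example, median({1.0, 3.0, 7.0, 9.0}) gives 5.0 and median({5.0, 1.0, 3.0}) gives 3.0.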

Also, this algorithm might not be the fastest, but sorting the array is a good investment when we also want to calculate other important properties of the same set of numbers.
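As one illustration of that investment: once the values are sorted, equal values sit next to each other, so counting how often each value occurs takes a single pass over the array. This is a minimal sketch assuming a std::vector input that is already sorted; the name frequencies is hypothetical, and values are compared with ==, so only exactly equal doubles are counted together.

```cpp
#include <cstddef>
#include <utility>
#include <vector>

// Hypothetical helper: given an already sorted vector, returns
// (value, count) pairs. Equal values are adjacent after sorting,
// so one linear pass that measures each run of equal values is enough.
std::vector<std::pair<double, int>> frequencies(const std::vector<double>& sorted)
{
    std::vector<std::pair<double, int>> result;
    for (std::size_t i = 0; i < sorted.size(); )
    {
        std::size_t j = i;
        while (j < sorted.size() && sorted[j] == sorted[i])
            ++j;                                   // skip over the run
        result.emplace_back(sorted[i], static_cast<int>(j - i));
        i = j;                                     // continue after the run
    }
    return result;
}
```

For the sorted set {1, 2, 2, 5, 5, 5} this yields the pairs (1, 1), (2, 2) and (5, 3).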

For very large sets, a plain array would not be an appropriate container: arrays are useful, but they have their limits as well. To apply the program in the real world, you should analyze it more carefully and look for suitable applications.
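One possible direction for larger inputs, sketched here as an alternative and not as the article's method: std::vector grows as values arrive, so the count does not have to be known up front, and std::nth_element selects the middle element without fully sorting the data, with linear average cost instead of the n log n of a full sort. It still keeps every value in memory. The function name medianSelect is hypothetical.

```cpp
#include <algorithm>
#include <cstddef>
#include <stdexcept>
#include <vector>

// Hypothetical helper: median by selection instead of sorting.
// std::nth_element places at position 'half' the element a full sort
// would put there, with no larger element before it and no smaller one after it.
double medianSelect(std::vector<double> values)
{
    if (values.empty())
        throw std::invalid_argument("median of an empty set is undefined");

    const std::size_t half = values.size() / 2;
    auto mid = values.begin() + static_cast<std::ptrdiff_t>(half);
    std::nth_element(values.begin(), mid, values.end());
    const double upper = *mid;

    if (values.size() % 2 != 0)
        return upper;                  // odd count: the middle element

    // Even count: the lower middle value is the largest element
    // in the left part, which holds only elements <= *mid.
    const double lower = *std::max_element(values.begin(), mid);
    return (lower + upper) / 2.0;
}
```

For example, medianSelect({9.0, 1.0, 7.0, 3.0}) gives 5.0, the same result as sorting first.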

Example Code to Calculate Median

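#include <iostream>

using namespace std;

void   intakeOfElements(double*, int);
double calculateMedian (double*, int);
void   printElements   (double*, int);

int
main()
{
	cout << "How many elements will you input->";
	int nElements;

	// The median of an empty set is undefined, and a negative size
	// cannot be allocated, so accept only a positive count.
	if (!(cin >> nElements) || nElements <= 0)
	{
		cerr << "Please enter a positive whole number." << endl;
		return 1;
	}

	double *ptrArray = new double[nElements];

	intakeOfElements(ptrArray, nElements);

	double dMedian   = calculateMedian(ptrArray, nElements);

	printElements(ptrArray, nElements);

	cout << "The median of the set is = "
	     << dMedian
	     << endl;

	delete [] ptrArray;

	return 0;
}

void
intakeOfElements(double* ptr,
	         int     iN )
{
	double *ptrI,
	       *ptrJ,
	        dTemp;

	for (ptrI = ptr; ptrI < ptr + iN; ptrI++)
	{
		cout << "Next element->"; cin >> dTemp;

		// Walk back over the already sorted part [ptr, ptrI), shifting
		// every element larger than dTemp one place to the right.
		// ptrJ never moves before ptr, so no pointer leaves the array.
		for (ptrJ = ptrI;
		     ptrJ > ptr && *(ptrJ - 1) > dTemp;
		     *ptrJ = *(ptrJ - 1), ptrJ--);

		// ptrJ now points at the free slot where dTemp belongs.
		*ptrJ = dTemp;
	}
}

double
calculateMedian(double* ptr,
		int     iN )
{
	// The array is already sorted, so the median sits in the middle.
	double dResult;
	int iHalf = iN / 2;
	if (iN % 2 == 0)
	{
		// Even count: average the two middle elements.
		dResult = (*(ptr + iHalf - 1) + *(ptr + iHalf)) / 2.0;
	}
	else
	{
		// Odd count: the middle element itself.
		dResult = *(ptr + iHalf);
	}
	return dResult;
}

void
printElements(double* ptr,
	      int     iN )
{
	// The comma operator prints the current element, then advances d.
	for (double* d = ptr;
	     d < ptr + iN;
	     cout << *d << endl, d++);
}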

Explanation of the Code

The main function does the following:

  • nElements holds the size of the array.
  • We create the array ptrArray with exactly the right number of elements.
  • The function intakeOfElements reads the elements from the user. It also keeps the array sorted.
  • After the elements are sorted, we call the function calculateMedian, which finds the value we are looking for.
  • We print the elements of the sorted array on the screen, and then print the median.
  • Finally, we free the array with the delete [] operator.

Now we will look at those functions and explain how they work:

  • The most important function is intakeOfElements. It takes a pointer and an int, and returns void.
  • In the function we have two pointers of type double*, ptrI and ptrJ, and one double variable, dTemp, which holds the value just read.
  • The first pointer, ptrI, advances from the start of the array toward its end.
  • The start is the address passed in as ptr. The end is ptr + iN, one past the last element; stopping there prevents ptrI from going beyond the right limit of the array.
  • After this we read one element after another. Each new value is kept in dTemp, and ptrJ walks back from the current position toward the beginning of the array. The elements it passes are already sorted, and each one greater than dTemp is shifted one place to the right. When ptrJ finds an element that is not greater than dTemp, or reaches the start of the array, dTemp is written into the freed slot. So the filled part of the array is always sorted, and every new element looks for its place in the ordered part from the back. In other words, it is inserted at its appropriate place; this is insertion sort.
  • The function calculateMedian takes two arguments: a pointer to the beginning of the array and the number of elements in that array.
  • The result, dResult, is returned to the main function as a double.
  • Once the array is sorted, calculating the median is easy. Even though this might not be the fastest way to achieve that task, it pays off when we also want to calculate the frequency of each element or remove repeated elements.
  • printElements() displays the elements. The pointer d starts at the address of the first element, and ptr + iN marks the end of the array, so that you don’t go past its limits.
  • Each element of the array is printed, one after another, and the pointer is moved toward the end marker. It is also possible to write this loop without the “,” operator, which some readers may find easier to follow.
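For readers who prefer the standard library, the two loops described above can be sketched with std::vector: std::upper_bound finds the first element greater than the new value, which is the same slot where the backward scan in intakeOfElements stops, and a range-based for loop prints without the comma operator. The names insertSorted and printAll are hypothetical, and this is an alternative sketch rather than the article's program.

```cpp
#include <algorithm>
#include <iostream>
#include <vector>

// Hypothetical helper: inserts value so that 'sorted' stays ascending.
// upper_bound returns the first element greater than value, so equal
// values keep their input order, as with the backward scan.
void insertSorted(std::vector<double>& sorted, double value)
{
    auto pos = std::upper_bound(sorted.begin(), sorted.end(), value);
    sorted.insert(pos, value);
}

// Hypothetical helper: prints one element per line, like printElements,
// using a range-based for loop instead of the comma operator.
void printAll(const std::vector<double>& values, std::ostream& out = std::cout)
{
    for (double d : values)
        out << d << '\n';
}
```

Inserting into a vector still shifts the later elements, so the total cost stays quadratic, just like the original insertion sort; the gain is readability, not speed.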

Additional Exercises:

  1. Find the average (arithmetic mean) of the set, and also calculate the geometric and harmonic means.
  2. Find how often each element is repeated in an array.
  3. Figure out which element is repeated most often in an array.
  4. Find the element that has the lowest frequency in an array.
  5. Print the elements of the original set in the order they were input, without sorting them.
  6. Reduce the array so that it contains each element only once.
  7. If the average of the set is denoted avgValue, calculate the sum of (avgValue - dArray[i]) * (avgValue - dArray[i]), where i goes from zero to the end of the array. Then denote the median of the set as medValue and calculate the similar sum of (medValue - dArray[i]) * (medValue - dArray[i]).
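As a starting point for exercises 1 and 7, here is one possible sketch. It assumes a non-empty std::vector<double>, strictly positive values for the geometric mean and no zeros for the harmonic mean; all function names are hypothetical. The geometric mean is computed through logarithms so that the running product cannot overflow.

```cpp
#include <cmath>
#include <vector>

// Exercise 1 (hypothetical helpers; v is assumed to be non-empty).
double arithmeticMean(const std::vector<double>& v)
{
    double sum = 0.0;
    for (double x : v) sum += x;
    return sum / v.size();
}

// Assumes every element is positive: exp(average of the logarithms).
double geometricMean(const std::vector<double>& v)
{
    double logSum = 0.0;
    for (double x : v) logSum += std::log(x);
    return std::exp(logSum / v.size());
}

// Assumes no element is zero: n divided by the sum of reciprocals.
double harmonicMean(const std::vector<double>& v)
{
    double recipSum = 0.0;
    for (double x : v) recipSum += 1.0 / x;
    return v.size() / recipSum;
}

// Exercise 7: sum of (center - x) * (center - x) over all elements.
// Call it once with avgValue and once with medValue to compare.
double sumOfSquaredDeviations(const std::vector<double>& v, double center)
{
    double sum = 0.0;
    for (double x : v) sum += (center - x) * (center - x);
    return sum;
}
```

For {1, 2, 6} the average is 3 and the median is 2; the sum around the average is 4 + 1 + 9 = 14, while the sum around the median is 1 + 0 + 16 = 17.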


6 comments

1 J.O. Williams March 28, 2014 at 10:02 am

Excellent article.

A natural follow-up for this article is finding the median by sorting the data and picking the exact middle value. I know that some places have legislated that the median is the exact middle value (which depends on whether the data is even or odd in number).

2 duskoKoscica March 30, 2014 at 9:30 am

THX a lot!

This article is a bit more on the C side of programming, but it could still be called C++. The next logical step could be using the vector container instead of an array, writing it in pure C, or perhaps developing a complete set of methods and classes. It is up to you; it depends on what level you are at.
It is just a bare push toward the goal.
For those who did not like it, you could also try replacing cout<<*d<<endl, d++ with something like cout<<*d++<<endl.
But it is up to the reader to decide where to go from this point.
I also recommend some reading to those who don't know enough about statistics. The article is intended to show a few tricks that would not be so easily achieved in some other languages.
This is one of the reasons why C++, with its beautiful and powerful syntax, can achieve great things while producing fast code.

3 engeland April 2, 2014 at 3:02 am

Thank you for your work. I agree with duskoKoscica that this is more of a C approach. Those of you who want to see it done the C++ way should look into the Boost Accumulators template library here.

4 duskoKoscica April 2, 2014 at 8:47 am

To be more precise, this could be part of developing one or a few methods for one of the classes. I also wanted to show how to do it in C++11, which I omitted: you could do something like for(type i: tContainer) cout<<i<<endl; or, if you want to modify the elements in place, something like for(type& i: tContainer) i=something;. As one might notice, developing an OOP solution would take way more than just one class with a few methods. But this is where it gets tough to explain things through OOP… THX and have a nice day!!!

5 Terry April 5, 2014 at 5:14 am

I have been in a few situations where results from statistics have helped me create better programs! Just look at the case of creating Morse code from a text file. It is faster if you know which language you are dealing with.

6 dusKO^2scica April 17, 2014 at 4:50 am

This is not connected to the article, but somehow I think it is natural for people to like things like this: lunar eclipse

It is the moon! It looks just too cool not to be noticed. Planets look cool too…
