Class 3: Functions, Modules, Headers, Primes

So let's shoot for a goal today. We probably won't make it, but like I've said: This is how we grow: by being defeated, decisively, by constantly greater beings.

A man always has two reasons for doing anything: a good reason and the real reason.

J. P. Morgan

Today's Goal (good reason):

Generate a random prime from a given range.

So if the bounds were 1 to 11 I would want to have equal chance of selecting 2, 3, 5, 7, or 11.

The Real Goals (today's intellectual gifts):

Converse about software engineering
Learn how to make modular code in C++
Practice making functions
Practice logical code control
Learn about random generation
Making header files
Basic timing practices
Beginning to think about complexity

Thinking about a large task

So when you think about today's goal, what do you imagine?

Group-Chat Can you as a group break up the task of generating a random prime from some range into doable chunks?

One of the reasons that programing languages provide us the ability to create functions is to allow dividing up tasks into manageable action chunks.

We want our code to be comprehensible, because that is the first step on the way to maintainability. The first step to comprehensibility is to break computational tasks into comprehensible chunks (represented as functions and classes) and name those. Such functions then provide the basic vocabulary of computation, just as the types (built-in and user-defined) provide the basic vocabulary of data. […]

The number of errors in code correlates strongly with the amount of code and the complexity of the code. Both problems can be addressed by using more and shorter functions. Using a function to do a specific task often saves us from writing a specific piece of code in the middle of other code; making it a function forces us to name the activity and document its dependencies.

Bjarne Stroustrup, designer of C++

Here are the small chunks I imagine when thinking about solving this task (Version 1):

Generate a random number in a range.
Test if the number is prime.
Repeat until prime.

An alternate version that I can imagine is this (Version 2):

Generate all primes up to a value.
Make a list of the primes in my range.
Randomly select an element from the list.

Generating "Randomness"

When we are young we play games, shuffle decks of cards, roll dice, play "Mad-Libs" (code this), and amuse ourselves with surprising outcomes. Despite growing up with so many sources of randomness it is a difficult task to do honorably in a computer. Computers are dumb and predictable. So we usually settle for "Pseudo-Random", that is, not actually random but random enough to fool us.

I won't talk too deeply about randomness today, but I invite you to think and chat about it after class or in office hours etc.

A google search reveals the function int rand( void ) in the <cstdlib>. Here it is in action:

Research-Task 1: Why is it not random? How can you fix it? (Hint: find an example using srand)

Random-Task 2: Alter this code to generate a number from 1 to 6 (roll a die). Hint 23 % 5 == 3.

Research-Task 3: Using an array of ints can you test the distribution of your die roller? (Hint: int histogram[6] = {0,0,0,0,0,0} defines an array of 6 values.)

SPOILER ALERT: if you want to see my die-rolling distribution tester. ONLY LOOK ONCE YOU HAVE WRITTEN YOURS OR TRIED FOR A WHILE.

Extra-Credit (outside of class): It has been stated that rand() is not random enough for cryptographic purposes. How would you confirm this? Can you come up with some tests?

Random-Task 4: Finally, convert your die-roller into a function which takes a lower bound and an upper bound and generates a number between the two inclusive.

A two file C++ program!

OK, so I want to offer some subtle thoughts on making a program. If you think about your final product you can begin to imagine the interface. So in our case there will be a main function that probably has a line that looks like this: int my_prime = random_prime(1,11);.

By thinking of the program from the end result to the details is a form of "top-down" thinking as opposed to "bottom-up" thinking. If you are a top-down thinker (not all of you will be) then you might want to consider Test-Driven Development as a philosophy that you ascribe to. It is abbreviated TDD and has a huge following out there. The idea is this: never write a line of code until you have made a failing test. Then make the test work. Then make it pretty. Then make another failing test, etc., etc..

That's nice and all, but you guys don't really know how to pull something like that off yet. So here is a thought for you: how would I call random_prime(1,11) in main.cpp but have the function defined in some file called random.cpp?

Cloud-9 Task 5: Fork this repo, cisc220/first-module, and spawn a cloud9 workspace from it.

Terminal Task 6: Take the average of 34 and 36 by compiling and running this program. The compile command is g++ main.cpp avg.cpp then execute with ./a.out.

Recognize for a moment that you have made a function definition in one file and called it from a different file. Modular code is now way more likely to happen in your C++ future!

Reading someone else's code

One bizarre truth of our field is that creating new code is easier than reading someone else's code. Even if that person is yourself (from the past).

I've implemented random prime generation in two ways.

Mini-Task 7: Create a cloud9 workspace from the github repo: cisc220/random-prime

Mini-Group-Task 8: Look at main.cpp then primes.h and random.h. Describe out loud what this code does and how it works.

New Feature Task 9: Compile this code using g++ main.cpp primes.cpp random.cpp -std=c++11. Generate 5 random primes smaller than 100.

So -std=c++11 is new for us. It is a compiler flag which specifies which standard of C++ we will run. Every couple of years a new set of features is rolled out. In this case I wanted to use auto so I compiled with -std=c++11. Some C++ standards are c++98, c++03, c++11, c++14, c++17 those last two aren't available everywhere. Something you might consider doing is googling the new features in each release, and trying to remember/store/make useful the main ideas of each release.

Refactor Task 10: Go to primes.cpp and see if you can make a new function: bool is_prime_square_root(unsigned long n). This should act like the brute force method but stop searching at \(\sqrt{n}\).

Timing Code

We know that speed is something to worry about, but for now it might seem like a sort of vague objective. Let's give you some tools to begin making it real.

Timing Task 11: Alter main.cpp to not ask for bounds but to have them set in code (I have some comments you can use). Also set the number of primes you generate to 10000. Then compile and use the following command: time ./a.out >> output

This gave us something like:

        
real    0m0.201s
user    0m0.156s
sys     0m0.044s

Time Task 12: Now change the number of primes generated to 1, and the method in is_prime to is_prime_brute_force. Time it now.

Time Task 13: Double the size of the values of lower and upper. Time it now. Did the amount of time change? Did it double? Did it quadruple?

Time Task 14: Now switch the primality testing method to your square root version. Time it now. How did that impact the performance?

In general the algorithm you use will have more impact on the speed of your code than anything else. A better algorithm on a smartphone can outpace super-computers running brute force algorithms.

We will generally want to know the following: How much does doubling the size of my problem change the running time of my algorithm?

If you see little or no impact then you are probably running a logarithmic-time algorithm (or possibly a constant-time algorithm).
If the time doubles then you are probaly running a linear-time algorithm.
If it quadruples then you are probably running a quadratic-time algorithm.

This is the very beginning of complexity theory and for now we can treat the rough indicators above as loose definitions of those terms. We'll talk more as we progress.

Warm-Up Interview Question