CS245

Baby names in Kotlin

In this assignment you will use functional programming in Kotlin to, more or less, reimplement a program that you might remember from Data Structures, namely Baby Names.

Overview

The idea of this assignment is to answer user questions about how common baby names were in the United States over the period 2000-2009. The data is broken down into two categories, boys and girls, according to their assigned gender at birth.

The data files are available over HTTP at

        https://cs.brynmawr.edu/cs245/Data/names2000.csv
        https://cs.brynmawr.edu/cs245/Data/names2001.csv
        https://cs.brynmawr.edu/cs245/Data/names2002.csv
        https://cs.brynmawr.edu/cs245/Data/names2003.csv
        https://cs.brynmawr.edu/cs245/Data/names2004.csv
        https://cs.brynmawr.edu/cs245/Data/names2005.csv
        https://cs.brynmawr.edu/cs245/Data/names2006.csv
        https://cs.brynmawr.edu/cs245/Data/names2007.csv
        https://cs.brynmawr.edu/cs245/Data/names2008.csv
        https://cs.brynmawr.edu/cs245/Data/names2009.csv
    

Data in the files looks like:

    1,Jacob,34471,Emily,25953
    2,Michael,32035,Hannah,23080
    3,Matthew,28572,Madison,19967
    4,Joshua,27538,Ashley,17997
    5,Christopher,24931,Sarah,17697
    6,Nicholas,24652,Alexis,17629
    7,Andrew,23639,Samantha,17266
    8,Joseph,22825,Jessica,15709
    9,Daniel,22312,Elizabeth,15094
    10,Tyler,21503,Taylor,15078

Where the columns are:
  1. The row number. Each file is sorted by frequency, most common first, so the most common name for each assigned gender is in row 1.
  2. Boy name
  3. Boy frequency
  4. Girl name
  5. Girl frequency
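Each row carries one boy's name and one girl's name, so one possible first step is to split every row into two small records. The sketch below assumes each row has exactly the five comma-separated fields shown and no header line, and that you supply the year of each file yourself; `NameEntry`, `parseRow`, `splitByGender`, and `lookup` are hypothetical names, not part of the assignment.

```kotlin
// One name's count in one year for one assigned gender (hypothetical class).
data class NameEntry(val year: Int, val name: String, val count: Int)

// Parse "rank,boyName,boyCount,girlName,girlCount" into a (boy, girl) pair.
// Assumes exactly five fields; the rank column is ignored.
fun parseRow(year: Int, line: String): Pair<NameEntry, NameEntry> {
    val f = line.split(",").map { it.trim() }
    require(f.size == 5) { "Expected 5 fields, got ${f.size}: $line" }
    return Pair(
        NameEntry(year, f[1], f[2].toInt()),
        NameEntry(year, f[3], f[4].toInt())
    )
}

// Turn one year's lines into (boys, girls) lists; blank lines are skipped.
fun splitByGender(year: Int, lines: List<String>): Pair<List<NameEntry>, List<NameEntry>> =
    lines.filter { it.isNotBlank() }
        .map { parseRow(year, it) }
        .unzip()

// Case-insensitive exact lookup of a name within one list.
fun lookup(entries: List<NameEntry>, query: String): List<NameEntry> =
    entries.filter { it.name.equals(query, ignoreCase = true) }
```

Calling `splitByGender(2000, readURL("https://cs.brynmawr.edu/cs245/Data/names2000.csv"))` would give the boys' and girls' entries for 2000; concatenating the results for all ten years gives the two data structures.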
Your task is to read the data files into two data structures, one containing boys' names and one containing girls' names. Then read a set of names (case insensitive) from the command line and show information about each name in the set, for each assigned gender.

For example:

    UNIX> java -jar hw4.jar Aaron Devon marlen
    Aaron
    Boy   Aaron            2   19088   0.5524
         alphabetic position 1

    Devon
    Boy   Devon            2    5695   0.1648
         alphabetic position 284
    Girl  Devon            2     665   0.0234
         alphabetic position 317

    marlen
    Girl  Marlen           1     212   0.0074
         alphabetic position 760

The above example uses only the years 2000 and 2001. In this output, you see that the name Aaron is used as a boy's name in 2 years, a total of 19088 times, for 0.55% of all boys' names. Alphabetically, Aaron is the first boy's name. Aaron is not used as a girl's name.

Marlen is not used as a boy's name; it is used for girls in only one year (unspecified), only 212 times, and it is in position 760 alphabetically.
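The numbers on each output line can be computed from per-year records of one gender. The following is a minimal sketch under these assumptions: a hypothetical `NameEntry` record holding a year, a name, and a count; each name appears at most once per year per gender; the percentage is 100 * (this name's total) / (sum of all counts for that gender over the same years); and the alphabetic position is the 1-based rank among that gender's distinct names, which fits Aaron being position 1. `NameSummary` and `summarize` are hypothetical names.

```kotlin
// One name's count in one year for one assigned gender (hypothetical class).
data class NameEntry(val year: Int, val name: String, val count: Int)

// Everything the report prints for one name in one gender.
data class NameSummary(
    val name: String,
    val years: Int,       // number of distinct years the name appears
    val total: Int,       // sum of counts over those years
    val percent: Double,  // 100 * total / (sum of all counts for this gender)
    val position: Int     // 1-based alphabetic position among this gender's names
)

// Aggregate the per-year entries of ONE gender into one summary per name,
// returned in alphabetical order.
fun summarize(entries: List<NameEntry>): List<NameSummary> {
    val grandTotal = entries.sumOf { it.count.toLong() }
    val byName = entries.groupBy { it.name }
    val alphabetical = byName.keys.sorted()
    return alphabetical.mapIndexed { i, name ->
        val rows = byName.getValue(name)
        val total = rows.sumOf { it.count }
        NameSummary(
            name = name,
            years = rows.map { it.year }.distinct().size,
            total = total,
            percent = 100.0 * total / grandTotal,
            position = i + 1
        )
    }
}
```

Computing the summaries once per gender, up front, means each query only has to search an already-finished list.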

In addition to names your program must work for prefixes of names. Given a prefix, print information as above about each name that matches the prefix. For instance:

    java -jar hw4.jar syd
    syd
    Girl  Sydney           2   19879   0.6981
         alphabetic position 982
    Girl  Sydnee           2     923   0.0324
         alphabetic position 981
    Girl  Sydni            2     867   0.0304
         alphabetic position 983
    Girl  Sydnie           2     657   0.0231
         alphabetic position 984
    
Again, the data in this example is for only 2000 and 2001.
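Prefix matching fits naturally into `filter`. The sketch below assumes one already-aggregated record per name (the hypothetical `Summary` class) and orders matches by total count, most frequent first, as the "syd" example does. `matchPrefix` and `formatSummary` are hypothetical names; the format string only approximates the column layout of the examples, and `Locale.US` is passed so the decimal point is always a period.

```kotlin
import java.util.Locale

// Minimal stand-in for one name's aggregated data (hypothetical).
data class Summary(
    val name: String, val years: Int, val total: Int,
    val percent: Double, val position: Int
)

// All names starting with `prefix`, ignoring case, most frequent first.
fun matchPrefix(names: List<Summary>, prefix: String): List<Summary> =
    names.filter { it.name.startsWith(prefix, ignoreCase = true) }
        .sortedByDescending { it.total }

// Two output lines per match, in the style of the examples above.
fun formatSummary(gender: String, s: Summary): String =
    "%-5s %-15s %2d %7d %8.4f\n     alphabetic position %d"
        .format(Locale.US, gender, s.name, s.years, s.total, s.percent, s.position)
```

Applying `matchPrefix` to the boys' list and to the girls' list, then printing each match with `formatSummary`, yields output shaped like the examples.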

Your output must include all of this information -- but for all 10 years. Also, I am confident that you can improve on my output presentation.

Getting the data

Reading from files is a naturally serial process that does not map particularly well onto recursion. Hence, I am giving you the following function, which uses a loop to read data from the web. It should be the only use of a loop construct in your program.
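    import java.io.BufferedReader
    import java.io.InputStreamReader
    import java.net.URL

    // Read every line at `url` into a list of Strings.
    // On error, print the problem and return whatever was read so far.
    val readURL = { url: String ->
        val r = mutableListOf<String>()   // element type must be given; it cannot be inferred
        try {
            val oracle = URL(url)
            // `use` closes the reader (and the underlying stream) when done
            BufferedReader(InputStreamReader(oracle.openStream())).use { br ->
                loop@ while (true) {
                    val line = br.readLine() ?: break@loop   // null means end of input
                    r.add(line)
                }
            }
        } catch (ee: Exception) {
            println("Problem ${ee}")
        }
        r
    }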
The call to use this function looks like:
    readURL("https://cs.brynmawr.edu/cs245/Data/names2000.csv")
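Because the ten URLs differ only in the year, they can be generated rather than typed out, and every file can be read without writing another loop. In this sketch, `urlFor` and `loadAll` are hypothetical helpers; `loadAll` takes the reading function as a parameter, so you can pass `readURL` for the real data or a fake function when testing offline.

```kotlin
// URL for one year's file, following the pattern of the list above.
fun urlFor(year: Int): String = "https://cs.brynmawr.edu/cs245/Data/names$year.csv"

// Read every year 2000..2009 with associateWith instead of a loop.
// `read` maps a URL to its lines; readURL has a compatible type.
fun loadAll(read: (String) -> List<String>): Map<Int, List<String>> =
    (2000..2009).associateWith { read(urlFor(it)) }
```

With the real data, `val raw = loadAll(readURL)` would give a map from each year to that year's lines.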

Requirements and Suggestions

For those of you who might remember requirements of the Baby Names assignment from Data Structures, forget those requirements. The only requirements are those that appear here.

You may use MutableList for storing data. However, be careful with its use. For instance, my readURL function above uses a mutable list, but given the same input file, the resulting list is always the same. You should aim for something similar. (My usual approach is to allow the list to change within a single function; outside of that function the list is immutable.)
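The Kotlin standard library packages this "mutable inside, immutable outside" approach as `buildList` (stable since Kotlin 1.6): the list can be changed only inside the builder block, and the caller receives a read-only `List`. The helper below, `cleanLines`, is a hypothetical example.

```kotlin
// Keep only the non-blank lines, trimmed. Mutation is confined to the
// buildList block; callers receive a read-only List<String>.
fun cleanLines(raw: List<String>): List<String> = buildList {
    raw.forEach { line ->
        if (line.isNotBlank()) add(line.trim())
    }
}
```

The same result could be written as `raw.filter { it.isNotBlank() }.map { it.trim() }`, which avoids mutation entirely.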

I encourage you to make liberal use of the functional constructs in Kotlin: forEach, map, filter, ... More often than not, clever use of these functions lets you avoid loops and recursion altogether. My implementation has only 5 recursive functions, and most of those are related to my linked list implementation. (Again, you are not required to write your own linked list; I expect you will be far better served by List or MutableList.)
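As a small illustration of answering questions without loops or recursion, the hypothetical helpers below work on a list of (name, count) pairs using `sumOf`, `filter`, `sortedByDescending`, `map`, and `forEach`.

```kotlin
// Total of all counts.
fun totalCount(names: List<Pair<String, Int>>): Int = names.sumOf { it.second }

// Names used more than `threshold` times, most frequent first.
fun popular(names: List<Pair<String, Int>>, threshold: Int): List<String> =
    names.filter { it.second > threshold }
        .sortedByDescending { it.second }
        .map { it.first }

// Print each pair on its own line.
fun printAll(names: List<Pair<String, Int>>) =
    names.forEach { (name, count) -> println("$name $count") }
```

For the three boys' rows Jacob (34471), Michael (32035), and Tyler (21503) from the sample data, `totalCount` gives 88009 and `popular(..., 30000)` gives `[Jacob, Michael]`.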

While I do not require you to define and use classes, I do encourage you to do so. Moreover, if you do something like leaving all of the data in the list-of-String representation returned by readURL, I doubt you will be able to succeed with this task. Worse, even if you do succeed, you will not get a good grade. In my implementation I wrote 5 classes (several of these were for my singly linked list). Suggestion: DO NOT write your own list class. Use the Kotlin List, then use functions defined on lists such as map, filter, and forEach. The assignment will be far easier that way.

Feel free to have everything in a single file. Kotlin, unlike Java, is content for you to do so.

Suggestion: get a clear picture of all of the work that you think needs to be done. Then build everything bottom up. That is, take your idea and ask the question: what do I need to do? Each time you know what you need to do, ask: what do I need for that? Keep doing this until you reach the point where the answer involves 10 or fewer lines of code and no variables. Then write that code and test it.

Put another way, write little functions that can be composed into bigger functions. Write little tests of your little functions. Write, test, write, test, put together, test, ... By doing so, you always have a program that does something. Even better, when you get to the end, if you are doing this correctly, there should be little left to do. Everything should work because you have tested everything along the way.
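The Kotlin standard library does not provide a composition operator for ordinary function values, but a short extension function can supply one. In this sketch, `andThen`, `trimmed`, `lowered`, `normalize`, and `testAll` are hypothetical names; the point is that each little function gets its own little test (here plain `check` calls rather than a test framework) before being composed.

```kotlin
// Compose two one-argument functions: apply the receiver first, then g.
infix fun <A, B, C> ((A) -> B).andThen(g: (B) -> C): (A) -> C {
    val f = this
    return { a -> g(f(a)) }
}

// Little functions...
val trimmed: (String) -> String = { it.trim() }
val lowered: (String) -> String = { it.lowercase() }

// ...composed into a bigger one, e.g. for normalizing a command-line query.
val normalize: (String) -> String = trimmed andThen lowered

// Little tests of the little functions, then of the composition.
fun testAll() {
    check(trimmed("  Aaron ") == "Aaron")
    check(lowered("Aaron") == "aaron")
    check(normalize("  MARLEN ") == "marlen")
}
```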

Soapbox: One advantage of taking this approach to programming is that when you are learning a language, you are always asking a question that is reasonably easy to answer. For instance, when I wrote an implementation of this assignment, I constantly faced the question "how do I do that?" But my questions were always small enough that it was easy to get an answer from my reference books or the web.

For example, in my development of this program I first wrote something to hold information about a single name. Then I tested that. Then I put 2 of those objects into a list, and tested the list. Then I wrote something to find a single item within the list, ... I expect your process will be quite different.
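The first three steps described above might look like the following. `BabyName`, `sample`, and `findName` are hypothetical names, and the sample counts are simply the two-year totals from the earlier example output; `find` returns the first matching element, or `null` if there is none.

```kotlin
// Step 1: something to hold information about a single name.
data class BabyName(val name: String, val count: Int)

// Step 2: two of those objects in a list.
val sample: List<BabyName> = listOf(BabyName("Aaron", 19088), BabyName("Devon", 5695))

// Step 3: find a single item within the list, ignoring case; null if absent.
fun findName(names: List<BabyName>, query: String): BabyName? =
    names.find { it.name.equals(query, ignoreCase = true) }
```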

Electronic Submissions

Your submission will be handed in using the submit script.

If you write your program on computers other than those in the lab, be aware that your program will be graded based on how it runs on the department's Linux server, not how it runs on your computer. The most likely problems are failing to submit everything and hard-coding file locations that are not correct on the Linux servers.

The submission should include the following items:

README:
This file should follow the format of this sample README
Source files
All of them (you might have only one)
Data files used:
Be sure to include any non-standard data files used. You should not have any.
Script file
Include with your submission the output from your program on the following names: aa, Dev, sy, michelle
DO NOT INCLUDE:
Data files that are read from the class site.

Again: once you have everything you want to submit in the a4 directory within /home/YOU/cs245/, do the following:

  1. Go to the directory /home/YOU/cs245
  2. Enter /home/gtowell/bin/submit -c 245 -p 4 -d a4
If this worked you will get a message with the word "success". If you cannot achieve success and the deadline is approaching, send me email. We can set up a meeting to work out your problems. The email will establish that you intended to submit. Once you send the email, do not change the files that you were trying to submit.