v1.0 with GPU support

2017-02-07 01:33:38 -07:00 · d8f460b43c
parent 0fea36b252
commit d8f460b43c
4 changed files with 807 additions and 766 deletions
--- a/RELEASE_NOTES.txt
+++ b/RELEASE_NOTES.txt
@ -1,3 +1,56 @@
+2017 02/06
+
+Graphics Processing Units (GPUs) are now supported with the introduction of the Python library TensorFlow. The end 
+result is a staggering improvement in performance. In one comparison of a 40 core Intel Xeon motherboard versus a 2000
+core Nvidia GPU card, using a dataset of 10,000 data points (rows) x 9 features (columns), the wall time was reduced from 
+50 hours to less than 4 minutes. On CPU-only computers, the performance on a single core is as much as 10x improved due 
+to the vectorisation of the data and application of the C-based TensorFlow maths library.
+
+To install TensorFlow, I recommend visiting https://www.tensorflow.org/get_started/ for instructions. Installation is 
+straightforward on Ubuntu, but unfortunately can be rather challenging on OSX. Have patience. Review the forums. It's worth the effort.
+
+I owe many thanks to the expertise of Iurii Milovanov, a contract developer whom I engaged for this effort. While the
+number of lines of Karoo GP modified was initially fewer than a dozen (replacing the multi-core pprocess calls), I asked 
+Iurii to also rewrite the test functions. As such, both Training and Testing are now fully GPU enabled. Thank you!
+
+A number of other changes have been integrated, including:
+
+ - Karoo GP is now developed against Python 2.7 as provided with Ubuntu Desktop 16.04.1.
+   
+ - A number of Python methods have been deleted, added, modified and/or renamed, in particular in the category 
+   'fx_fitness_'. If you have built your own code based on the Karoo methods, please review this section carefully.
+   
+ - The user engaged 'bal'ance function (pause menu) has been rebuilt to anticipate exact quantities instead of 
+   percentages, enabling the user to define precisely how many of each of the four genetic operators will be applied 
+   with the construction of each subsequent population.
+   
+ - Activation of the 'test' is now conducted with only the letter 't' and the option to engage a specific number of 
+   'c'ores is removed. Therefore, the 't'imer mode is also removed, as this was a means to discover the optimal number 
+   for multi-core processing which is now automated by TensorFlow.
+   
+ - The libraries 'pprocess' and 'time' are no longer required or imported.
+ 
+ - The population_* files (.csv) are now deposited into unique directories created with the launch of each run. A .txt
+ 	 file is also written to each directory which captures the run-time configuration of Karoo GP. This enables truly
+ 	 scriptable runs of Karoo.
+ 	 
+ - The Server interface to Karoo GP (karoo_gp_server.py) now terminates completely, kicking back to the command line.
+   This enables bash or cron launches of multiple sequential or parallel runs, enabling the exploration of multiple 
+   runs with identical configuration, or that of varied configuration parameters.
+   
+Finally, Karoo GP is now a 1.0 release. I never know when to transition from beta to real, so please forgive me if I 
+jumped the gun. But with GPU support and the revised Server script, I have found Karoo GP to be a fully functional, 
+powerful machine learning tool. I hope you will agree --kai
+
+
+2016 09/20
+
+Fixed the genetic operator (b)alance function to work with larger than 100 trees per population.
+
+Introduced the pause for all runtime modes in the Desktop application, such that the user can apply configurations prior
+to the run (eg: change the balance of the genetic operators or the number of engaged cores).
+
+
 2016 09/19b

 After another 2 hours of trouble shooting, I learned that sympy.subs throws the 'zoo' error for a divide-by-zero if 
@ -31,11 +84,11 @@ With the 09/14 update I failed to upload the new coefficients.csv file to the fi
 this will be the means by which the user can define the constants desired for the Karoo GP run. If you had run Karoo GP 
 v0.9.2.0 in the past 24 hours without this file, it would have complained. My apology.

-Also, a bit of a roadmap for the 2nd half of 2016, into 2017
+Also, a bit of a road map for the 2nd half of 2016, into 2017
 - validate the new (faster) sympy.lambdify and fully replace the current (slower) sympy.subs
 - replace the row-by-row dictionaries with vectors for what should be a significant performance increase
 - complete the introduction of constants in a manner more well defined than is currently supported
- - investigate replacing pprocess with the multicore library
+ - investigate replacing pprocess with the multi-core library
 - introduce Theano or Tensor Flow for GPU support
 
 I welcome any assistance with these, if anyone has experience and time.
@ -47,8 +100,8 @@ In karoo_gp_base_class.py
 - Removed redundant lines in the method 'fx_karoo_data_load()'
 - Added support for the Sympy 'lambdify' function in 'fx_karoo_data_load' (see explanation below)
 - Added a draft means of catching divide-by-zero errors in the new 'lambdify' function
- - Discovered the prior 'fx_eval_subs' incorrected applied a value of 1 to the variable 'result' as a means to
- 	replace the 'zoo' function for divide by zero errors. However, this could inadvertantly undermine the success of
+ - Discovered the prior 'fx_eval_subs' incorrectly applied a value of 1 to the variable 'result' as a means to
+ 	replace the 'zoo' function for divide by zero errors. However, this could inadvertently undermine the success of
 	Classification and Regression runs. My apology for not catching this sooner.

 "While attending the CHEAPR 2016 workshop hosted by the Center for Cosmology and Astro-Particle Physics, The Ohio State
@ -59,7 +112,7 @@ to use, but terribly slow as it relies upon an internal, Python mathematical lib
 seeing only a 2x performance increase. Clearly, there are yet other barriers to remove.

 In the new 'fx_eval_subs' method you will find both sympy.subs (active) and sympy.lambdify. While preliminary tests 
-worked well, I witnessed an erractic outcome which I yet need to reproduce and investigate. Feel free to comment the
+worked well, I witnessed an erratic outcome which I yet need to reproduce and investigate. Feel free to comment the
 .subs and uncomment the .lambdify sections and take it for a spin.

 I believe there are 2 more steps to increase performance: removing the dictionaries which contain each row, such that
@ -111,7 +164,7 @@ In karoo_gp_base_class.py
 - added (y/n) to "Are you certain you want to quit?" message --thanks Hunter!
 
 In karoo_gp_main.py
- - reset default evoluationary balance to .1/.1/.1/.7
+ - reset default evolutionary balance to .1/.1/.1/.7



@ -214,7 +267,7 @@ user interface that in the original versions were not present, as follows:
 	This script now auto-scales to any number of columns and rows (within the limit of your computer's capability), 
 	and features a text-based user interface. This script is designed to be used following karoo_data_sort.py.

-	karoo_multiclassifier.py
+	karoo_multi-classifier.py
 	This script functions as before, but with a minor bug fixed in which the final class was mislabeled.

 	karoo_iris_plot.py
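
The 2017 02/06 notes above say that karoo_gp_server.py now returns to the command line when it finishes, so that bash or cron can launch several runs in sequence. A minimal sketch of such a batch driver follows. The flags (-ker, -pop, -gen, -fil) are the ones defined by argparse in karoo_gp_server.py in this commit; the helper names build_command and run_batch, the 'python' interpreter name, and the choice of population sizes are assumptions made for illustration and are not part of Karoo GP.

```python
import itertools
import subprocess


def build_command(kernel, pop_max, gen_max, filename, script='karoo_gp_server.py'):
    """Return the argv list for one Server run (hypothetical helper)."""
    return ['python', script,
            '-ker', kernel,
            '-pop', str(pop_max),
            '-gen', str(gen_max),
            '-fil', filename]


def run_batch(kernels, populations, gen_max, filename, dry_run=True):
    """Build, and optionally launch, one run per kernel/population pair."""
    commands = [build_command(k, p, gen_max, filename)
                for k, p in itertools.product(kernels, populations)]
    if not dry_run:
        for cmd in commands:
            # Blocks until the Server script exits, so runs happen in sequence.
            subprocess.call(cmd)
    return commands


if __name__ == '__main__':
    # Dry run: only print the command lines that would be launched.
    for cmd in run_batch(['m'], [100, 200], 10, 'files/data_MATCH.csv'):
        print(' '.join(cmd))
```

With dry_run left at True the sketch only prints the command lines, which is a cheap way to check a parameter sweep before committing machine time to it.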
--- a/karoo_gp_base_class.py
+++ b/karoo_gp_base_class.py
--- a/karoo_gp_main.py
+++ b/karoo_gp_main.py
@ -1,8 +1,8 @@
 # Karoo GP Main (desktop)
 # Use Genetic Programming for Classification and Symbolic Regression
-# by Kai Staats, MSc UCT / AIMS; see LICENSE.md
-# Much thanks to Emmanuel Dufourq and Arun Kumar for their support, guidance, and free psychotherapy sessions
-# version 0.9.2.1
+# by Kai Staats, MSc; see LICENSE.md
+# Thanks to Emmanuel Dufourq and Arun Kumar for support during 2014-15 devel; TensorFlow support provided by Iurii Milovanov
+# version 1.0

 '''
 A word to the newbie, expert, and brave--
@ -34,6 +34,7 @@ If you include the path to an external dataset, it will auto-load at launch:
 import sys # sys.path.append('modules/') to add the directory 'modules' to the current path 
 import karoo_gp_base_class; gp = karoo_gp_base_class.Base_GP()

+
 #++++++++++++++++++++++++++++++++++++++++++
 #   User Defined Configuration            |
 #++++++++++++++++++++++++++++++++++++++++++
@ -50,10 +51,10 @@ gp.karoo_banner()

 print ''

-menu = ['b','r','c','m','p','']
+menu = ['c','r','m','p','']
 while True:
 	try:
-		gp.kernel = raw_input('\t Select (r)egression, (c)lassification, (m)atching, or (p)lay (default m): ')
+		gp.kernel = raw_input('\t Select (c)lassification, (r)egression, (m)atching, or (p)lay (default m): ')
 		if gp.kernel not in menu: raise ValueError()
 		gp.kernel = gp.kernel or 'm'; break
 	except ValueError: print '\t\033[32m Select from the options given. Try again ...\n\033[0;0m'
@ -139,7 +140,7 @@ else: # if any other kernel is selected
 		except ValueError: print '\t\033[32m Enter a number from 1 including 100. Try again ...\n\033[0;0m'
 		except KeyboardInterrupt: sys.exit()
 		
-	menu = ['i','g','m','s','db','t','']
+	menu = ['i','g','m','s','db','']
 	while True:
 		try:
 			gp.display = raw_input('\t Display (i)nteractive, (g)eneration, (m)inimal, or (s)ilent (default m): ')
@ -150,13 +151,12 @@ else: # if any other kernel is selected
 		

 # define the ratio between types of mutation, where all sum to 1.0; can be adjusted in 'i'nteractive mode
-gp.evolve_repro = int(0.1 * gp.tree_pop_max) # percentage of subsequent population to be generated through Reproduction
-gp.evolve_point = int(0.1 * gp.tree_pop_max) # percentage of subsequent population to be generated through Point Mutation
-gp.evolve_branch = int(0.1 * gp.tree_pop_max) # percentage of subsequent population to be generated through Branch Mutation
-gp.evolve_cross = int(0.7 * gp.tree_pop_max) # percentage of subsequent population to be generated through Crossover
+gp.evolve_repro = int(0.1 * gp.tree_pop_max) # quantity of a population generated through Reproduction
+gp.evolve_point = int(0.1 * gp.tree_pop_max) # quantity of a population generated through Point Mutation
+gp.evolve_branch = int(0.1 * gp.tree_pop_max) # quantity of a population generated through Branch Mutation
+gp.evolve_cross = int(0.7 * gp.tree_pop_max) # quantity of a population generated through Crossover

 gp.tourn_size = 10 # qty of individuals entered into each tournament (standard 10); can be adjusted in 'i'nteractive mode
-gp.cores = 1 # replace '1' with 'int(gp.core_count)' to auto-set to max; can be adjusted in 'i'nteractive mode
 gp.precision = 4 # the number of floating points for the round function in 'fx_fitness_eval'; hard coded


@ -182,8 +182,8 @@ gp.fx_karoo_construct(tree_type, tree_depth_base) # construct the first populati
 if gp.kernel != 'p': print '\n We have constructed a population of', gp.tree_pop_max,'Trees for Generation 1\n'

 else: # EOL for Play mode
-	gp.fx_eval_tree_print(gp.tree) # print the current Tree
-	gp.fx_tree_archive(gp.population_a, 'a') # save this one Tree to disk
+	gp.fx_display_tree(gp.tree) # print the current Tree
+	gp.fx_archive_tree_write(gp.population_a, 'a') # save this one Tree to disk
 	sys.exit()
 	

@ -206,11 +206,13 @@ if gp.display != 's':
 	if gp.display == 'i': gp.fx_karoo_pause(0)

 gp.fx_fitness_gym(gp.population_a) # 1) extract polynomial from each Tree; 2) evaluate fitness, store; 3) display
-gp.fx_tree_archive(gp.population_a, 'a') # save the first generation of Trees to disk
+gp.fx_archive_tree_write(gp.population_a, 'a') # save the first generation of Trees to disk

 # no need to continue if only 1 generation or fewer than 10 Trees were designated by the user
 if gp.tree_pop_max < 10 or gp.generation_max == 1:
-	gp.fx_karoo_eol(); sys.exit()
+	gp.fx_archive_params_write('Desktop') # save run-time parameters to disk
+	gp.fx_karoo_eol()
+	sys.exit()
 	

 #++++++++++++++++++++++++++++++++++++++++++
@ -238,14 +240,14 @@ for gp.generation_id in range(2, gp.generation_max + 1): # loop through 'generat
 	gp.fx_karoo_crossover() # method 4 - Crossover Reproduction
 	gp.fx_eval_generation() # evaluate all Trees in a single generation
 	
-	gp.population_a = gp.fx_evo_pop_copy(gp.population_b, ['GP Tree by Kai Staats, Generation ' + str(gp.generation_id)])
+	gp.population_a = gp.fx_evolve_pop_copy(gp.population_b, ['GP Tree by Kai Staats, Generation ' + str(gp.generation_id)])
 	

 #++++++++++++++++++++++++++++++++++++++++++
 #   "End of line, man!" --CLU             |
 #++++++++++++++++++++++++++++++++++++++++++

-gp.fx_tree_archive(gp.population_b, 'f') # save the final generation of Trees to disk
+gp.fx_archive_tree_write(gp.population_b, 'f') # save the final generation of Trees to disk
 gp.fx_karoo_eol()

 	
--- a/karoo_gp_server.py
+++ b/karoo_gp_server.py
@ -1,8 +1,8 @@
 # Karoo GP Server
 # Use Genetic Programming for Classification and Symbolic Regression
-# by Kai Staats, MSc UCT / AIMS; see LICENSE.md
-# Much thanks to Emmanuel Dufourq and Arun Kumar for their support, guidance, and free psychotherapy sessions
-# version 0.9.2.1
+# by Kai Staats, MSc; see LICENSE.md
+# Thanks to Emmanuel Dufourq and Arun Kumar for support during 2014-15 devel; TensorFlow support provided by Iurii Milovanov
+# version 1.0

 '''
 A word to the newbie, expert, and brave--
@ -14,7 +14,7 @@ of its intent and design.
 KAROO GP SERVER
 This is the Karoo GP server application. It can be internally scripted, fully command-line configured, or a combination
 of both. If this is your first time using Karoo GP, please run the desktop application karoo_gp_main.py first in order 
-that you come to understand its full functionality.
+that you come to understand the full functionality of this particular Genetic Programming platform.

 To launch Karoo GP server:

@ -52,18 +52,18 @@ import argparse
 import karoo_gp_base_class; gp = karoo_gp_base_class.Base_GP()

 ap = argparse.ArgumentParser(description = 'Karoo GP Server')
-ap.add_argument('-ker', action = 'store', dest = 'kernel', default = 'm', help = '[r,c,m] fitness function: (r)egression, (c)lassification, or (m)atching')
+ap.add_argument('-ker', action = 'store', dest = 'kernel', default = 'm', help = '[c,r,m] fitness function: (c)lassification, (r)egression, or (m)atching')
 ap.add_argument('-typ', action = 'store', dest = 'type', default = 'r', help = '[f,g,r] Tree type: (f)ull, (g)row, or (r)amped half/half')
 ap.add_argument('-bas', action = 'store', dest = 'depth_base', default = 3, help = '[3...10] maximum Tree depth for the initial population')
 ap.add_argument('-max', action = 'store', dest = 'depth_max', default = 3, help = '[3...10] maximum Tree depth for the entire run')
 ap.add_argument('-min', action = 'store', dest = 'depth_min', default = 3, help = '[3...100] minimum number of nodes')
 ap.add_argument('-pop', action = 'store', dest = 'pop_max', default = 100, help = '[10...1000] maximum population')
 ap.add_argument('-gen', action = 'store', dest = 'gen_max', default = 10, help = '[1...100] number of generations')
-ap.add_argument('-fil', action = 'store', dest = 'filename', default = 'files/data_MATCH.csv', help = '/path/to_your/data.csv')
+ap.add_argument('-fil', action = 'store', dest = 'filename', default = 'files/data_MATCH.csv', help = '/path/to_your/[data].csv')

 args = ap.parse_args()

-# set the same parameters found in the Karoo GP desktop application, but potentially passed from the command line
+# pass the argparse defaults and/or user inputs to the required variables
 gp.kernel = str(args.kernel)
 tree_type = str(args.type)
 tree_depth_base = int(args.depth_base)
@ -73,14 +73,13 @@ gp.tree_pop_max = int(args.pop_max)
 gp.generation_max = int(args.gen_max)
 filename = str(args.filename)

-gp.display = 'n' # display mode is set to (s)ilent
-gp.evolve_repro = int(0.1 * gp.tree_pop_max) # percentage of subsequent population to be generated through Reproduction
-gp.evolve_point = int(0.1 * gp.tree_pop_max) # percentage of subsequent population to be generated through Point Mutation
-gp.evolve_branch = int(0.2 * gp.tree_pop_max) # percentage of subsequent population to be generated through Branch Mutation
-gp.evolve_cross = int(0.6 * gp.tree_pop_max) # percentage of subsequent population to be generated through Crossover
+gp.display = 's' # display mode is set to (s)ilent
+gp.evolve_repro = int(0.1 * gp.tree_pop_max) # quantity of a population generated through Reproduction
+gp.evolve_point = int(0.0 * gp.tree_pop_max) # quantity of a population generated through Point Mutation
+gp.evolve_branch = int(0.2 * gp.tree_pop_max) # quantity of a population generated through Branch Mutation
+gp.evolve_cross = int(0.7 * gp.tree_pop_max) # quantity of a population generated through Crossover

 gp.tourn_size = 10 # qty of individuals entered into each tournament (standard 10); can be adjusted in 'i'nteractive mode
-gp.cores = 1 # replace '1' with 'int(gp.core_count)' to auto-set to max; can be adjusted in 'i'nteractive mode
 gp.precision = 4 # the number of floating points for the round function in 'fx_fitness_eval'; hard coded

 # run Karoo GP
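
The settings above turn fractions of the population into whole-number quantities with int(), which truncates toward zero. A small sketch of that arithmetic follows, using the Server fractions from this commit (0.1, 0.0, 0.2, 0.7). The helpers operator_quantities and shortfall are hypothetical names introduced here for illustration; they are not Karoo GP methods.

```python
def operator_quantities(tree_pop_max, fractions):
    """Map (repro, point, branch, cross) fractions to truncated integer counts."""
    names = ('repro', 'point', 'branch', 'cross')
    return dict((name, int(frac * tree_pop_max))
                for name, frac in zip(names, fractions))


def shortfall(tree_pop_max, quantities):
    """Number of Trees not assigned to any operator after truncation."""
    return tree_pop_max - sum(quantities.values())


if __name__ == '__main__':
    server_fractions = (0.1, 0.0, 0.2, 0.7)  # values set in karoo_gp_server.py
    for pop in (100, 15):
        qty = operator_quantities(pop, server_fractions)
        print(pop, qty, shortfall(pop, qty))
```

For population sizes that the fractions do not divide evenly, the four quantities can sum to less than the population, which is worth checking before a scripted run.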