Sunday, January 22, 2017

I work in computer science, specifically machine learning (basically, trying to glean actionable insights by recognizing patterns in data). Some buzzwords are:

-Scalable: This means the algorithm (the theoretical description of the procedure) or its implementation can run in a "reasonable" amount of time on very large datasets (millions of data points). This is very relative, but with the current trend in machine learning being to work on bigger and bigger datasets, many papers need to claim that the techniques they propose are scalable.

-Fast: Broadly speaking, this means that the algorithm performs slightly better in theory than the next best algorithm known to solve the problem. It may still take a long time to run in practice. In particular, it doesn't refer to an absolute measure of time.

-Data-driven: An adjective used to describe a choice made with machine learning techniques (that is, based on a pattern in the data). Basically, "we used machine learning." Within machine learning, this is sometimes used to make it clear that certain choices were made for principled reasons rather than through dark art on the part of the researchers. But it's more often used by people in other fields who realize that machine learning and "data science" are hot right now and want to claim usage of them.

All of this is a little tongue in cheek. I like my algorithms to be scalable, fast, and data-driven too. These terms can be overused (especially in academic papers trying to get past conference reviewers), but machine learning does yield impressive results in many fields, even music and the arts.

One term that gets used in a derogatory way in computer science in general is "hacky". This might refer to a piece of code that was made to work by trying unprincipled or sub-optimal techniques until something worked, as opposed to being actually elegant or cleverly written. More importantly, many people refer to most of programming as "hacking", perhaps a bit unfairly. (This comes both from within computer science, where theoretical computer scientists may look down on some efforts from the empirical community, something that is true even at a more specific level within the machine learning community, and from non-computer scientists.)

There is value in trying to insist on maximum elegance, since elegant work is easier for other people to read and use themselves. However, I also think that "hacking" is important: no one starts off knowing everything, and it is a very valuable skill to get interesting things to work by any means necessary. I think "hacky" should carry more of a connotation of "scrappy".
