
From Graduate Student to Data Scientist: My Two Cents on Making a Successful Transition

Prologue

I graduated with a Master of Engineering (MEng) in Industrial Engineering and Operations Research from Berkeley in 2016. Berkeley was awesome. Period. However, getting that first data science job for someone like me, having no prior work experience and an undergraduate degree in Mechanical Engineering, wasn’t a piece of cake. Nevertheless, just as with most things in life, things have a magical way of falling into place. Looking back, it’s evident that all those stressful moments, rejections and setbacks were just guideways directing me to the right path. This path has me currently working as a Senior Data Scientist at eHealth in San Francisco.

Every year, I make it a point to go to the Alumni Brunch at Berkeley to share my story with current students, with the intention of telling them all the things I wish I had been privy to when I was in their position. Over the years, I found myself covering the same points, which gradually gained structure and eventually culminated in a short monologue that you may or may not have heard if you’ve interacted with me at one of these events.

It’s funny how many variations of the same question I’ve heard, but if I had to rephrase it, it’s just, “How do I do what you did?” The first time I answered this question, I jumped right in, but I later realized that before I attempt to enlighten you with the nitty-gritty details, you need to know the following:

1. You cannot replicate someone else’s experience. Everyone has their own story, so what worked for me might not work for you. Any advice you get, including this, is just a personal opinion (sample size = 1). I recommend talking to as many people as you can, taking the best points from each person’s advice and then mapping out your own journey.

2. You do not need a job before graduation. Yes, it’s nice to have one but you still have time after graduation. If you’re an international student in America, you have three whole months after graduation to find a job. If you’re looking for a job as a software developer or management consultant, you need to start applying as early as September but hiring for Data Science tends to get more active as you get closer to graduation. I say this because there is a lot of peer pressure when you just start your program in the Fall to apply for jobs, attend career fairs, attend info sessions and network with everyone. While this is useful, it must be done in moderation. Talk to people in different companies, ask what skills they are looking for, try to make some connections but don’t make this your top priority in the Fall. At this time, your primary focus should be learning new things, enhancing your technical skill set and setting up a good foundation for the Spring semester. Having the right skills in your arsenal will give you pertinent talking points and make you an interesting connection for someone.

3. Keep internship options open. As you must have deduced from the point above, I didn’t get a job before graduation. A month after graduation, I got three full-time job offers and an internship offer. I picked the internship because I was blown away by the vision and knowledge of my interviewers, and I knew they’d be good mentors. By the end of the internship, I not only had a full-time job, but the skills section on my resume had tripled (adding Spark, git, bash, Impala and Neo4j alongside the R, SQL and Python that I knew after graduation). The short-term risk of choosing an internship was worth the long-term reward. Also, don’t limit yourself geographically; look for opportunities around the country. I love living in San Francisco now, but the one year I spent in New York City/Jersey City was incredible as well. If you do go the internship route and are an international student in America, talk to your manager about the possibility of converting the internship to a full-time role so you can start your H-1B visa application process in April. Also, make sure you work for a company enrolled in E-Verify, because the STEM OPT extension requires an E-Verify employer.

4. Your current major doesn’t matter. I firmly believe that anyone can get a data science job irrespective of their current major by focusing on the skills mentioned below. I won’t lie: it’s definitely easier with a major like Electrical Engineering & Computer Science (EECS) or Industrial Engineering & Operations Research (IEOR), because you can pick up relevant skills while earning credits towards your major. But if you’re ready to put in some extra work, it’s far from impossible with any other major.

5. Finally, I cannot tell you which courses or projects to choose, since the offerings keep changing, but I’m going to tell you the skills you need. The onus is on you to pick the courses and projects that will help you gain these skills.

Main Story

Now, let’s dive into the things you need to do. There are other roles such as Business Analyst or Data Analyst that might only require a subset of the skills but if you pick these up, getting those jobs should be easy as well.

1. Machine Learning Algorithms

As a data scientist, you’ll definitely be doing some kind of Machine Learning. It’s not good enough to have an overview of how these algorithms work; you need a theoretical understanding of their underpinnings. For example, how does a decision tree make a split? What’s the cost function of a logistic regression model? How does the k-means algorithm cluster data points? A strong theoretical understanding helps you gauge the advantages and disadvantages of picking a particular approach for a problem. Deep Learning is becoming very popular these days; however, deep learning roles tend to be very specialized. While it’s good to know, I believe that a strong understanding of Machine Learning can help you get by in most Data Science positions right now. Personally, my first step into Deep Learning territory happened after graduation, through an online course that I took because of an upcoming project during my internship. I’m sure you will stumble upon Deep Learning use cases on the job, so I definitely recommend picking up the skills at University if you have the bandwidth. You’re here to learn after all, but I would still advise focusing first on Machine Learning.
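To make the three questions above concrete, here is a minimal sketch in plain Python with no libraries. The function names and the toy data are my own illustration and are assumptions, not taken from any course or library; production implementations such as the ones in scikit-learn are considerably more refined. It shows a decision tree choosing a split by weighted Gini impurity (one common criterion among several), the log loss that logistic regression minimizes, and one assign-then-update iteration of k-means on one-dimensional points.

```python
import math


def gini(labels):
    """Gini impurity of a list of class labels: 1 - sum(p_k ** 2)."""
    n = len(labels)
    if n == 0:
        return 0.0
    counts = {}
    for y in labels:
        counts[y] = counts.get(y, 0) + 1
    return 1.0 - sum((c / n) ** 2 for c in counts.values())


def best_split(xs, ys):
    """Try every midpoint between sorted unique values of one feature and
    return (threshold, weighted_gini) for the split with the lowest impurity."""
    values = sorted(set(xs))
    best = (None, float("inf"))
    for lo, hi in zip(values, values[1:]):
        t = (lo + hi) / 2
        left = [y for x, y in zip(xs, ys) if x <= t]
        right = [y for x, y in zip(xs, ys) if x > t]
        score = (len(left) * gini(left) + len(right) * gini(right)) / len(ys)
        if score < best[1]:
            best = (t, score)
    return best


def log_loss(y_true, p_pred):
    """Average binary cross-entropy, the cost logistic regression minimizes."""
    eps = 1e-15  # clip probabilities so log() never sees 0 or 1
    total = 0.0
    for y, p in zip(y_true, p_pred):
        p = min(max(p, eps), 1 - eps)
        total += -(y * math.log(p) + (1 - y) * math.log(1 - p))
    return total / len(y_true)


def kmeans_step(points, centroids):
    """One k-means iteration on 1-D points: assign each point to its nearest
    centroid, then move each centroid to the mean of its assigned points."""
    clusters = [[] for _ in centroids]
    for p in points:
        idx = min(range(len(centroids)), key=lambda i: abs(p - centroids[i]))
        clusters[idx].append(p)
    # An empty cluster keeps its old centroid.
    return [sum(c) / len(c) if c else centroids[i] for i, c in enumerate(clusters)]


# Toy data (hypothetical): one feature, two cleanly separated classes.
xs = [1.0, 2.0, 3.0, 10.0, 11.0, 12.0]
ys = [0, 0, 0, 1, 1, 1]
print(best_split(xs, ys))             # (6.5, 0.0): a perfectly pure split
print(log_loss([1, 0], [0.9, 0.1]))   # small loss for confident, correct predictions
print(kmeans_step(xs, [0.0, 5.0]))    # [1.5, 9.0] after the first iteration
```

Calling kmeans_step repeatedly until the centroids stop moving gives the full algorithm; on this toy data the second call already settles at [2.0, 11.0].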

Resources — Free online course by Trevor Hastie and Rob Tibshirani (authors of the Elements of Statistical Learning textbook). Andrew Ng’s Machine Learning and Deep Learning courses on Coursera. Use these video lectures alongside your regular University classes to solidify your theoretical knowledge.

2. Programming

The next thing you need to know is how to implement these algorithms using a language such as Python or R. If you’re just starting out, I recommend Python since it’s a much more general-purpose programming language. Learning to code these algorithms from scratch definitely solidifies your theoretical understanding, but it’s not a must. If you’re working in Python, you should know how to use libraries such as scikit-learn (sklearn), H2O and Keras to implement the algorithm you want. You must also know how to filter and transform raw data to prepare it for modeling. I say the above assuming you already know fundamental programming concepts such as variables, if-else statements, loops, and functions. If not, you might want to start there. At University, I worked primarily in Jupyter Notebooks and ad-hoc R scripts for my classes and projects, so the above knowledge sufficed. Hence, I’d definitely recommend mastering the above first. But keep in mind that most practical data science applications require deploying models to production, which entails being familiar with object-oriented programming, unit testing, git and some bash scripting. I didn’t get much time to dive into these concepts or use them during my degree, but I did spend some time reading about them. This helped: I remember not feeling completely out of my comfort zone when I was exposed to them during my internship.
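As a small illustration of what filtering and transforming raw data can look like, here is a sketch that uses only the Python standard library. The records, column names and the prepare function are all made up for this example; in practice you would more likely use pandas and scikit-learn for the same steps.

```python
from statistics import mean, pstdev

# Hypothetical raw records; None marks a missing value.
raw_rows = [
    {"age": 34, "plan": "basic", "churned": 0},
    {"age": None, "plan": "premium", "churned": 1},
    {"age": 52, "plan": "premium", "churned": 1},
    {"age": 41, "plan": "basic", "churned": 0},
]


def prepare(rows):
    """Drop incomplete rows, standardize age, and one-hot encode plan."""
    # Filter: keep only rows where every field is present.
    clean = [r for r in rows if all(v is not None for v in r.values())]

    # Transform a numeric column to zero mean and unit variance.
    ages = [r["age"] for r in clean]
    mu, sigma = mean(ages), pstdev(ages)

    # Transform a categorical column into 0/1 indicator columns.
    plans = sorted({r["plan"] for r in clean})

    features, labels = [], []
    for r in clean:
        z_age = (r["age"] - mu) / sigma if sigma else 0.0
        one_hot = [1.0 if r["plan"] == p else 0.0 for p in plans]
        features.append([z_age] + one_hot)
        labels.append(r["churned"])
    return features, labels


X, y = prepare(raw_rows)
print(len(X), y)  # 3 rows survive the filter; labels are [0, 1, 0]
```

Each row of X is now a purely numeric list (standardized age followed by one indicator per plan), which is the shape most modeling libraries expect.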

3. SQL

You must must must know SQL. Every company has structured data and you need to know how to query it. Even Hive and Impala, which query data stored on distributed Hadoop clusters, use SQL-like syntax. Beyond very basic SQL, you must be familiar with concepts such as joins, sub-queries and window functions. No matter where you are, understanding your company’s data and key metrics usually begins with SQL.
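Here is a minimal sketch of a join, a sub-query and a window function, run through Python’s built-in sqlite3 module so it needs no database server. The customers and orders tables and their contents are hypothetical, and the window function assumes the bundled SQLite is version 3.25 or newer.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE customers (id INTEGER PRIMARY KEY, name TEXT);
CREATE TABLE orders (id INTEGER PRIMARY KEY, customer_id INTEGER, amount REAL);
INSERT INTO customers VALUES (1, 'Ana'), (2, 'Ben'), (3, 'Cy');
INSERT INTO orders VALUES
    (1, 1, 50.0), (2, 1, 70.0), (3, 2, 20.0), (4, 2, 90.0), (5, 2, 40.0);
""")

# Join: total spend per customer. LEFT JOIN keeps customers with no orders.
totals = conn.execute("""
    SELECT c.name, COALESCE(SUM(o.amount), 0) AS total
    FROM customers AS c
    LEFT JOIN orders AS o ON o.customer_id = c.id
    GROUP BY c.id, c.name
    ORDER BY c.id
""").fetchall()

# Sub-query: orders larger than the overall average order amount.
above_avg = conn.execute("""
    SELECT id, amount
    FROM orders
    WHERE amount > (SELECT AVG(amount) FROM orders)
    ORDER BY id
""").fetchall()

# Window function: rank each customer's orders by amount, largest first.
ranked = conn.execute("""
    SELECT customer_id, amount,
           RANK() OVER (PARTITION BY customer_id ORDER BY amount DESC) AS rnk
    FROM orders
    ORDER BY customer_id, rnk
""").fetchall()

print(totals)     # Ana 120.0, Ben 150.0, Cy 0
print(above_avg)  # orders 2 and 4 exceed the average of 54.0
print(ranked)
```

The window function is the piece that differs most from a GROUP BY: it computes a rank within each customer without collapsing the rows.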

Resources — SQL Tutorial by Mode Analytics.

4. Statistics and Probability

If you religiously follow step one, you’ll soon realize that you can’t ignore topics like maximum likelihood, sampling, and p-values, and you’ll start reading more about them. If I were to list topics, you should definitely know Bayes’ Theorem, Hypothesis Testing, and Design of Experiments. A/B Testing has become intrinsic to product development, and as a data scientist you must know how to set up experiments and validate their results. Now, don’t be overwhelmed: these topics take a while to sink in, but having a bird’s-eye view is always a good idea. To be honest, it took me a while, plus on-the-job experience, to get fully comfortable with these concepts, but starting early only helps.
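Two of these topics fit in a few lines of standard-library Python. The sketch below is an illustration under simple assumptions, with function names and numbers of my own choosing: Bayes’ Theorem for a binary hypothesis, and a two-sided two-proportion z-test with a pooled standard error, which is one common way (not the only one) to compare conversion rates in an A/B test.

```python
import math


def bayes_posterior(prior, likelihood, false_positive_rate):
    """P(H | E) = P(E | H) P(H) / [P(E | H) P(H) + P(E | not H) P(not H)]."""
    evidence = likelihood * prior + false_positive_rate * (1 - prior)
    return likelihood * prior / evidence


def two_proportion_z_test(conv_a, n_a, conv_b, n_b):
    """Two-sided z-test for a difference in conversion rates between
    variants A and B, using the pooled standard error."""
    p_a, p_b = conv_a / n_a, conv_b / n_b
    pooled = (conv_a + conv_b) / (n_a + n_b)
    se = math.sqrt(pooled * (1 - pooled) * (1 / n_a + 1 / n_b))
    z = (p_b - p_a) / se
    # Two-sided p-value under the standard normal: 2 * (1 - Phi(|z|)).
    p_value = math.erfc(abs(z) / math.sqrt(2))
    return z, p_value


# Hypothetical numbers: a rare condition (1% prior) and a test that fires
# 90% of the time when it is present and 5% of the time when it is not.
print(bayes_posterior(prior=0.01, likelihood=0.9, false_positive_rate=0.05))

# Hypothetical A/B test: 100/1000 conversions for A versus 130/1000 for B.
z, p = two_proportion_z_test(100, 1000, 130, 1000)
print(z, p)
```

In the first example the posterior comes out at roughly 15%, far below the 90% hit rate, which is the usual lesson about base rates. In the second, the p-value falls just under the conventional 0.05 threshold, so whether you call it a win should depend on the threshold and sample size you committed to before running the experiment.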

Resources — A/B Testing course by Google on Udacity.

5. Capstone Project (Data Science Process)

Last on the list, but the most important. Your Capstone Project is where you tie together all the above knowledge and present some original work. It is also where you learn about the entire lifecycle of a data science project, from working with messy raw data to validating a model you built. Use your project to stand out from everyone else who has the above skills. If your project doesn’t require machine learning, make it require machine learning. There are always clever ways to tailor your project to incorporate it. Try to get a project with a company: you can then present that experience much like an internship, work with a practical dataset and solve a real business problem. One of the things I loved about the MEng program at Berkeley was the flexibility of the projects and the opportunity to apply for company-sponsored projects. Furthermore, try to enroll in classes with projects so that you can list them on your resume upon completion.

Epilogue

Also, remember:

1. It’s easy for me to write that you need all these skills, but the truth is that I didn’t know all of the above when I got my first data science opportunity. I learned a lot of things over the years, and I’m still learning every day. Try your best to gain as many of the above skills as you can. Be confident about what you know, know it well, and be cognizant of the things you don’t know. For the things you don’t know in detail, have a high-level idea of what they are so you can have a smart conversation with someone. Having a vague idea is always better than having no idea. There’s always room to learn things on the job, and employers know that.

2. You’re in University for only a short period of time so try to have fun as well. Work hard and play harder. Travel for spring break, go out to the city, explore what your college town and the University have to offer. Don’t burn yourself out.

3. Know what’s happening in the field. I do this by subscribing to data science newsletters that send out a list of top articles every week. The stuff you read makes for really good talking points during interviews and keeps you informed. Here’s one I recommend: www.datascienceweekly.org/. If reading is not your thing, firstly, kudos on making it this far, and secondly, there are a bunch of great podcasts out there as well, such as Data Skeptic or Linear Digressions.

4. Finally, keep picking up skills every day, keep learning every day, keep persevering every day and keep trying every day. Don’t stop doing this even when you think things are going south. Trust me, everything will work out for you just as it did for me. Before you know it, you’ll be at the Alumni Brunch, or your own version of the event, sharing your success story with others, and I really hope you do pay it forward.

All the best!

I hope this helps. If you have any questions you can reach out to me through LinkedIn — www.linkedin.com/in/joelprince. Happy to chat!

 


Date added: 2019-02-26
