Data Science Summer 2019 Internship

Your challenge:  Help tell the story of books through data for the world's largest trade book publisher.


At Penguin Random House, the Data Science & Analytics group is an agile team composed of data scientists, data engineers, front-end developers, and industry experts capable of tackling any data-oriented problem.


As a data science summer intern on the team, you will have an opportunity to contribute to a variety of high-profile projects while working closely with senior data scientists and key decision makers to help solve analytical problems of strategic value.  Our major areas of focus include price elasticity and marketing personalization.  Your domain of expertise will be equal parts feature engineering and statistical analysis or machine learning.


Qualifications include:

  • A bachelor's degree in statistics, economics, mathematics, computer science, business analytics, or the quantitative social sciences
  • Experience applying predictive modeling or machine learning techniques to "real world" data
  • Ability to communicate complex analytical concepts to a non-technical business audience
  • Data munging skills, including computing aggregates and performing table joins (e.g. SQL, dplyr)
  • Exposure to version control systems, preferably Git
  • Availability to work 28 hours a week


Preferred qualifications:

  • A master's degree in a related field
  • Two years' experience working with scripting languages such as Python or R
    • A good understanding of scikit-learn, pandas, Theano, or TensorFlow
    • Familiarity with the R packages ggplot2, CausalImpact, dplyr, data.table, (b)lmer, lasso/glmnet, rstanarm, loo, bayesplot
  • Experience with Stan or other general-purpose modeling tools
  • Experience with experimental design and/or causal inference
  • Experience extracting data from APIs
  • Experience working with time series data
  • Comfortable iterating with Jupyter Notebook and the RStudio IDE, as well as shell scripting
  • Experience developing, testing, and deploying web applications using Shiny
  • Experience with automated feature engineering and large datasets (>1TB)


Please apply online and make sure to include a cover letter and resume.  Provide a link to your GitHub profile for a code sample, whether related to a Kaggle attempt, a school project, or a general open-source contribution.  Standalone samples are also accepted.


Penguin Random House is the leading adult and children’s publishing house in North America, the United Kingdom and many other regions around the world.  In publishing the best books in every genre and subject for all ages, we are committed to quality, excellence in execution, and innovation throughout the entire publishing process: editorial, design, marketing, publicity, sales, production, and distribution.  Our vibrant and diverse international community of nearly 250 publishing brands and imprints includes Ballantine Bantam Dell, Berkley, Clarkson Potter, Crown, DK, Doubleday, Dutton, Grosset & Dunlap, Little Golden Books, Knopf, Modern Library, Pantheon, Penguin Books, Penguin Press, Penguin Random House Audio, Penguin Young Readers, Portfolio, Puffin, Putnam, Random House, Random House Children’s Books, Riverhead, Ten Speed Press, Viking, and Vintage, among others.  More information can be found at


Penguin Random House values the array of talents and perspectives that a diverse workforce brings. All qualified applicants will receive consideration for employment without regard to race, national origin, religion, age, color, sex, sexual orientation, gender identity, disability, or protected veteran status.



Company: Penguin Random House LLC 

Country: United States of America 

State/Region: New York 

City: New York 

Postal Code: 10019 

Job ID: 30375



Job Segment: Database, Part Time, Seasonal, Engineer, Intern, Technology, Retail, Engineering, Entry Level