Instructor Contact and General Information

 
Instructor: Luís Finotti
Office: Ayres Hall 251
Phone: 974-1321 (don't leave messages! -- e-mail me if I don't answer!)
e-mail: lfinotti@utk.edu
Office Hours: By appointment only, in-person or with Zoom.
Textbook: A. Adhikari, J. DeNero, D. Wagner, Computational and Inferential Thinking: The Foundations of Data Science, 2nd edition, 2021.
Prerequisites: None.
Class Meeting Time: MWF 11:30-12:20 at Perkins Hall 319.
Exams: Final: May 14 (Tuesday) from 3:30 to 5:30.
Grade: 80% for Assignments, 20% for Final Exam.
See here for letter grade ranges.
 


Course Information

Course Content

Data 201 is a first course in data science. In this course we will cover some basic aspects of data manipulation and interpretation. We will cover:

  1. causality;
  2. exploring and visualizing data;
  3. randomness and sampling;
  4. hypothesis testing;
  5. inference;
  6. predictions and classification.

We will also learn some basic Python, including basic usage of Pandas, a data analysis Python package, and matplotlib, for data visualization. (We will use Jupyter Notebooks to write our code and notes.)

We will cover most of our textbook, although we will likely skip Chapters 14 and 18.

Method of Instruction

Mondays and Wednesdays are lectures in the classroom, while Fridays are online coding lab sessions (via Zoom; the link is available in Canvas) where students are guided through a practical exercise on their own laptops. These sessions will often go over the assignments and are optional, although strongly recommended, especially for those with little or no coding experience.

Course Objectives

By the end of this course you should be able to:

Assessment

We will have 10 labs (computer projects to be turned in) and a final exam only. The labs, with lowest score dropped, account for 80% of your grade and the final exam accounts for 20%.
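As a rough sketch of the arithmetic above (not the official Canvas computation), the weighting can be written in a few lines of Python. It assumes that every lab and the final exam are scored from 0 to 100 and that all labs carry equal weight; neither assumption is stated in this syllabus, and the function name `course_score` is made up for illustration.

```python
def course_score(lab_scores, final_exam):
    """Estimate the course score: 80% labs (lowest dropped), 20% final exam.

    Assumptions (not stated in the syllabus): all scores are on a 0-100
    scale and every lab carries the same weight.
    """
    if len(lab_scores) < 2:
        raise ValueError("need at least two lab scores to drop the lowest")
    kept = sorted(lab_scores)[1:]          # drop the single lowest lab score
    lab_average = sum(kept) / len(kept)    # plain average of the remaining labs
    return 0.8 * lab_average + 0.2 * final_exam


# Hypothetical student: nine perfect labs, one missed lab, 50 on the final.
example = course_score([100] * 9 + [0], final_exam=50)
print(round(example, 2))
```

Because the lowest lab is dropped, the missed lab in this made-up example does not lower the lab average.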

Labs

Each lab is posted on Canvas on Monday. We work on it and discuss it in Friday’s meeting (via Zoom), and it is due on the next Monday evening, although you can turn it in early.

The labs will be Jupyter notebooks that you will fill with code, mostly to analyze some data set(s) using the techniques we learned in class that week.

The Friday meetings for the labs are over Zoom. To access the Zoom meeting, log in to Canvas and click on “Zoom” on the left panel. There you will find the link to join the meeting.

The labs will be recorded and available in Canvas. Again, just click on the Zoom link in Canvas, and then on "Cloud Recordings" to access the recordings.

See schedule below for the tentative lab topics and meeting dates.

If you have completed and turned in the lab before Friday, you do not have to attend lab.

Final Exam

The final exam will be in class, but done in Canvas using Proctorio. You will have to bring your laptop and ID to class, and answer all questions in class.

It will contain mostly multiple-choice questions, mainly covering:

A practice final (which does not count for your grade) will be available in the last week of classes to give you an idea of what the final exam will be like.

Attendance

As observed above, attendance at the Friday labs is optional.

Although attendance at lectures is also not strictly required, it is highly encouraged. We will discuss finer points in class that reading the book on your own likely would not provide. We will also have more practice in class, and you will be able to ask questions.

So, although there is no penalty for missing class, I will give extra-credit points to those with high attendance. More precisely, those with an attendance rate of 90% or higher (for lectures only, labs do not count) will receive 5 extra points (out of 100) on their final course score. Those with an attendance rate of 75% or lower will receive no points, while those in between will receive a proportionally scaled number of points. For instance, someone with an 82.5% rate, halfway between 90% and 75%, will receive 2.5 extra points.
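The scaling between 75% and 90% can be sketched in Python as follows. This assumes the scaling is linear, which is consistent with the 82.5% example above; the function name `attendance_bonus` and its parameter names are made up for illustration.

```python
def attendance_bonus(rate_percent, low=75.0, high=90.0, max_points=5.0):
    """Extra-credit points for a lecture attendance rate given in percent.

    Assumes linear scaling between `low` and `high`, matching the example
    of 82.5% earning 2.5 points.
    """
    if rate_percent >= high:
        return max_points              # 90% or higher: full 5 points
    if rate_percent <= low:
        return 0.0                     # 75% or lower: no points
    # In between: proportional share of the maximum.
    return max_points * (rate_percent - low) / (high - low)


print(attendance_bonus(82.5))  # 2.5
print(attendance_bonus(95))    # 5.0
print(attendance_bonus(70))    # 0.0
```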

Note that I might forget to take attendance every once in a while (feel free to remind me!), and that’s why someone with 90% still gets 5 points. (The original idea was to give points for those with attendance between 100% and 90%.) So, the extra 10% will compensate for days I forget to take attendance.

Note: Canvas has a strange way of computing grades, so when the attendance grades are added before the final, it will seem like some grades go down. This is just due to the way that Canvas completely disregards grades before their due dates. So, don’t worry if you see this happening. The attendance grade can only improve your grade.

Files

On the left panel of Canvas you will find a link called Files, where I will put many of the files used in the course. In particular you will find two folders: Chapters and Reference.

The Chapters folder contains the files I will use in our lectures. You can open the files for each chapter and follow along with your own notebook, if you wish. We have a Jupyter notebook for each chapter in its own folder. The folder for each chapter will also contain the data sets used in the examples of the corresponding chapter/notebook. These notebooks follow each chapter of our textbook very closely. Besides some small modifications, the main difference is that all code was converted to use Pandas instead of the book’s own (and not commonly used) library for data analysis, called datascience. (Pandas is the standard library for data science, while only the book uses the datascience library! You might as well learn what you will actually use in real life.)

For each chapter you will see two Jupyter notebooks. The one that has class in its name is the one we will use during class, and has blank sections of code for us to fill in as we go. You should use this one to follow along! This will give you some practice with what we’ve learned and help you better retain the material. The file that does not have class as part of its name has all the code spots already filled in, and can be used as a reference. (After we finish the class, both files should look identical or very close.)

Important: These chapter files might change! I might change them as I prepare or review lectures before teaching, or if I decide to change something when teaching. So, these files only become “official” after being covered in class.

The Reference folder contains notebooks with examples for Pandas, NumPy, and functions we will write and use during the course. You can always refer to them when trying to remember how to do something we covered in class. They do not contain as much explanation as the notebooks that first introduce them, but should serve you well as references.

Schedule

Here is a very tentative schedule.

Monday | Wednesday | Friday | Chapters
01/22: Course Overview | 01/24: Causality | 01/26: Install Anaconda | 1, 2
01/29: Python/Jupyter | 01/31: Data Types | 02/02: Lab 1: Python/Jupyter | 3, 4
02/05: Sequences | 02/07: Sequences | 02/09: Lab 2: Data Types | 5
02/12: Data Frames | 02/14: Data Frames | 02/16: Lab 3: Arrays | 6
02/19: Visualization | 02/21: Visualization | 02/23: Lab 4: Data Frames | 7
02/26: Functions | 02/28: Functions | 03/01: No class | 8
03/04: Functions | 03/06: Randomness | 03/08: Lab 5: Plots | 8, 9
03/11: Spring Break | 03/13: Spring Break | 03/15: Spring Break |
03/18: Probability/Sampling | 03/20: Hypotheses | 03/22: Lab 6: Functions | 9, 10
03/25: Hypotheses | 03/27: A/B Testing | 03/29: Spring Recess |
04/01: A/B Testing | 04/03: Predictions | 04/05: Lab 7: Probability/Sampling | 11, 12
04/08: Predictions | 04/10: Predictions | 04/12: Lab 8: Hypotheses | 13
04/15: Predictions | 04/17: Bootstrapped Regression | 04/19: Lab 9: A/B Testing | 15
04/22: Classification | 04/24: Classification | 04/26: Lab 10: Bootstrap and Predictions | 16
04/29: Classification | 05/01: Catch-up | 05/03: Review | 17
05/06: In-class Review | | |

Most likely the labs will fit this schedule, and as mentioned above, the corresponding homework is due on Mondays. Here are the due dates for the labs (which will also be available in Canvas):

Lab | Topic | Lab Date | Due Date
1 | Python and Jupyter | Friday 02/02 | Monday 02/05
2 | Data Types | Friday 02/09 | Monday 02/12
3 | Arrays | Friday 02/16 | Monday 02/19
4 | Data Frames | Friday 02/23 | Monday 02/26
5 | Plots | Friday 03/08 | Monday 03/11
6 | Functions | Friday 03/22 | Monday 03/25
7 | Sampling and Probability | Friday 04/05 | Monday 04/08
8 | Hypotheses | Friday 04/12 | Monday 04/15
9 | A/B Testing | Friday 04/19 | Monday 04/22
10 | Bootstrap and Predictions | Friday 04/26 | Monday 04/29

Course Tools

Python

We will use Python for all computations and data manipulation in this course. We will not assume you know Python (or any programming), and we will go over the basics in the course.

Python is one of the most popular languages today (if not the most popular), especially due to its simplicity and extensibility. Therefore there are countless resources available online, including many YouTube videos. These can be used to help supplement (or give a different perspective to) what we will cover in class.

One of the easiest ways to install and launch Python is to use Anaconda. In our first Friday meeting, we will help you install it, but you can install it yourself before then as well. Note that if you are already familiar with Python (and the pip package installer), you do not need Anaconda.

The following Python packages will be used in our course:

We will not use the datascience package from the book! Pandas does the job better and is the standard library for data analysis and manipulation with Python.

Jupyter

We will also use Jupyter notebooks. We will use JupyterLab for notebooks, and not the older Jupyter Notebooks. These run in your browser and allow us to have richly formatted text, graphics, and computer code (Python code, for us) in the same document, making them quite convenient for data analysis.

Again, these are quite popular and there are countless resources online, but we will cover the basics as well.

For entering text, you can use Markdown for basic formatting and LaTeX for math symbols and formulas. Markdown is pretty simple, while LaTeX is quite powerful and the standard tool for typesetting mathematics.

Ed (Discussion Board)

We will use Ed for online discussions. The advantage of Ed (over other discussion boards) is that it allows us (or simply me) to use math symbols efficiently and with good looking results (unlike Canvas) and to post (and run!) formatted code. It also allows anonymous posts (also unlike Canvas).

Ed also uses Markdown (mostly) and allows the use of LaTeX.

It also allows us to enter code in Code Blocks and Snippets. (See the Quick Start Guide.) With Snippets we can even run the code, using Python. Please use code blocks whenever entering code!

You can access Ed through here: https://edstem.org/us/courses/51113/discussion/. (There is also a link at the “Navigation” section on the top of this page.)

To keep things organized, I’ve set up a few different categories for our discussions:

I urge you to use Ed often for discussions! (This is especially true for Feedback!) If you are ever thinking of sending me an e-mail, first consider whether it could be posted there. That way my answer might help others who have the same questions as you and will always be available to all. (Of course, if it is something personal, such as your grades, you should e-mail me instead.)

Note that you can post anonymously. (Just be careful to check the proper box!) But please don’t post anonymously if you don’t feel compelled to, as it would help me to know you, individually, much better.

Students can (and should!) reply to and comment on posts on Ed. Discussion is encouraged here!

Also, please don’t forget to choose the appropriate category for your question. And make sure to choose between Question and Post.

When replying/commenting/contributing to a discussion, please do so in the appropriate place. If it is an answer to the question, use the Answer area. If you have a comment, question, or suggestion, you can use the Comment area.

You can also use Ed for Private Posts, by checking the corresponding box. Posts marked as private can be viewed only by the student who posted and by me. Only use this when what you have to ask cannot be shared with all, e.g., if you are sharing something from your HW. Otherwise, don’t make it private, as other students might have the same questions as you.

You should receive an invitation to join our class in Ed via your “@vols.utk.edu” e-mail address before classes start. If you don’t, you can sign up here: https://edstem.org/us/join/SmhrgK. If you’ve registered with a different e-mail (e.g., @tennessee.edu) you do not need to register again, but you can consolidate your different e-mails (like @vols.utk.edu and @tennessee.edu) in Ed, so that it knows it is the same person. (Only if you want to! It is recommended but not required as long as you have access to our course there!) Just click on the Account icon on the top right of Ed, select Emails, and then Add email address.


Course Policies

Homework Policy

Homework (Labs/Assignments) will be posted on Canvas. These are Jupyter notebooks where you will fill “blanks” with the appropriate Python code to perform the requested tasks.

As previously mentioned, these will be posted on Mondays. On the following Friday we will have a lab (over Zoom) where we can discuss and work on this assignment. Then you will submit your (filled) Jupyter notebook on the following Monday.

The original plan is to have 10 assignments/labs to be turned in, with the (tentative) dates posted in the section Schedule above. I will drop the lowest score.

Your HW scores will account for 80% of your grade.

Assignments are individual! You can work with someone else and ask for help, but you should be able to understand the ideas and write your own code!

Also, it should go without saying that you cannot copy code you find on the internet or code produced by AI tools, like ChatGPT. It is usually easy to spot these and you will be penalized for doing it! If you are having a hard time, ask me (or a colleague) for help instead.

Note: Do not use Canvas’ assignment comments to contact me about HW. Please just write me an email (at lfinotti@utk.edu) or use Ed.

Late Homework/Lab Policy

Late homework (labs) will incur a penalty of 10% per day, unless a valid excuse is provided. Moreover, labs over 7 (calendar) days late will not be graded (and receive 0 points), unless again a valid excuse is provided.
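To make the late rule concrete, here is a minimal Python sketch of how a late lab score might be computed. It rests on two assumptions that this syllabus does not spell out: the 10% is taken from the total points possible (rather than from the score earned), and a partial day late counts as a full day. The function name `late_lab_score` and its parameters are hypothetical; the grade shown in Canvas is what actually counts.

```python
import math


def late_lab_score(raw_score, days_late, max_score=100.0,
                   penalty_percent_per_day=10, cutoff_days=7, excused=False):
    """Estimate a lab score after the late penalty.

    Assumptions (not stated in the syllabus): the penalty is a percentage
    of `max_score`, and partial days are rounded up to whole days.
    """
    if excused or days_late <= 0:
        return raw_score               # on time, or a valid excuse was accepted
    if days_late > cutoff_days:
        return 0.0                     # over 7 calendar days late: not graded
    whole_days = math.ceil(days_late)
    penalty = max_score * (penalty_percent_per_day * whole_days) / 100
    return max(0.0, raw_score - penalty)


print(late_lab_score(90, days_late=2))   # 70.0
print(late_lab_score(90, days_late=8))   # 0.0
```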

Note that Canvas automatically applies the penalty. If you provided an excuse for a late HW that was accepted, it is your responsibility to check Canvas to make sure that your grade is correct (with no extra penalty). If it is not, you must contact me by email so that I can manually fix it.

Communications and E-Mail Policy

You are required to set up notifications for Ed and for Canvas to be sent to you immediately.

On Ed, click on the Account icon on the top left, then Settings. In the new page click on Notifications. Under New Thread Digest, set the drop down box to Instant. I will consider a post in Ed official communication in this course, and I will assume all have read every single post there!

For Canvas, check this page and/or this video on how to set your notifications. Set notifications for Announcements to “right away”! (Basically: click on the profile button on the left, under UT’s “T”, then click “Notifications”. Click on the check mark ("notify me right away") for Announcements.)

Moreover, I may send e-mails with important information directly to you. I will use the e-mail given to me by the registrar and set up automatically in Canvas. (If that is not your preferred address, please make sure to forward your university e-mail to it!)

All three (notifications from Ed, notifications from Canvas, and e-mails) are official communications for this course and it’s your responsibility to check them often!

Feedback

Please post all comments and suggestions regarding the course using Ed. Usually these should be posted as Post and put in the Feedback category. These can be posted anonymously (or not), just make sure to check the appropriate option. Other students and I will be able to respond and comment. If you prefer to keep the conversation private (between us), you can send me an e-mail (not anonymous), or a private message in Ed (possibly anonymous).


Conduct

All students should be familiar with the Hilltopics Student Code of Conduct and maintain their Academic Integrity. From Hilltopics Academics:

Integrity

Study, preparation, and presentation should involve at all times the student’s own work, unless it has been clearly specified that work is to be a team effort. Academic honesty requires that the student present their own work in all academic projects, including tests, papers, homework, and class presentation. When incorporating the work of other scholars and writers into a project, the student must accurately cite the source of that work. For additional information see the applicable catalog or the UT Libraries site. See also the Student Code of Conduct and Honor Statement (below).

All students should follow the Honor Statement (also from Hilltopics Academics):

Honor Statement

"An essential feature of the University of Tennessee, Knoxville, is a commitment to maintaining an atmosphere of intellectual integrity and academic honesty. As a student of the university, I pledge that I will neither knowingly give nor receive any inappropriate assistance in academic work, thus affirming my own personal commitment to honor and integrity."

You should also be familiar with the Classroom Behavior Expectations.

We are on an honor system in this course!

 

Disabilities

Students with disabilities who need special accommodations should contact Student Disability Services and bring me the appropriate letter/forms.

 

Campus Syllabus

Please, see also the Campus Syllabus (Fall 2023).

 
