Lecture 4: The Internet
Wednesday, 11 October 2006
We've talked briefly about podcasts
We're at the point where we should talk about such things as the podcast!
What is this podcast?
student answer: "The ability to take a program or video program off of an Internet and watch it on your iPod"
Podcasting is best explained by a demonstration.
The point is not necessarily to download the information via a webpage
You want to see this information through a podcasting client such as iTunes.
At the link above, click the "Subscribe" link.
This brings up the podcast listing in iTunes
We've been posting videos, audio, and Videos of the Week on the podcast.
Podcast - the new buzzword for distributing audio, video, and other content over the Internet. Such distribution was possible before, but what makes podcasting special is a relatively new technology called RSS.
RSS - Really Simple Syndication, a way to subscribe to online content.
To publish content to our podcast, we edit our podcast's RSS feed, which is an XML file.
Later on we will be working with a derivative of XML called XHTML in the Web Development portion of the class.
In the XML file, we define the content: where to download the file, summary of the content, title, description.
When you browse a podcast directory, all you are seeing is a listing of what the publisher has put in its RSS file. When you download an episode, you aren't downloading it from iTunes itself but from the location that the XML file specifies.
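To make this concrete, here is a minimal sketch of generating an RSS 2.0 feed with Python's standard `xml.etree.ElementTree` module. The feed title, URLs, and episode details are hypothetical. Real podcast feeds, such as ones submitted to the iTunes directory, usually carry extra tags beyond the core channel/item/enclosure elements shown here. The `enclosure` element tells a podcast client where to download the media file.

```python
import xml.etree.ElementTree as ET

def build_podcast_feed(title, link, description, episodes):
    """Return an RSS 2.0 document (as a string) listing podcast episodes."""
    rss = ET.Element("rss", version="2.0")
    channel = ET.SubElement(rss, "channel")
    ET.SubElement(channel, "title").text = title
    ET.SubElement(channel, "link").text = link
    ET.SubElement(channel, "description").text = description
    for ep in episodes:
        item = ET.SubElement(channel, "item")
        ET.SubElement(item, "title").text = ep["title"]
        ET.SubElement(item, "description").text = ep["summary"]
        # The enclosure says where the media file lives, how big it is, and its type.
        ET.SubElement(item, "enclosure", url=ep["url"],
                      length=str(ep["bytes"]), type=ep["mime"])
    return ET.tostring(rss, encoding="unicode")

# Hypothetical feed with a single episode.
feed = build_podcast_feed(
    "Example Course Podcast",
    "http://example.com/podcast/",
    "Lecture videos and audio.",
    [{"title": "Lecture 4: The Internet",
      "summary": "Domains, email, and the Web.",
      "url": "http://example.com/podcast/lecture4.mp3",
      "bytes": 12345678,
      "mime": "audio/mpeg"}],
)
print(feed)
```

A podcast client reads this file, shows the titles and descriptions, and fetches each enclosure URL directly from wherever it is hosted.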
Remember, you don't need to have a video or PDFs on your podcast.
Indeed, podcasts started as a sort of "Radio" of the Internet.
In the beginning, people would record their own content, compress it as an MP3 and publish it online in their personal podcast.
Internet Communication: Victor Cajiao
Victor does a weekly podcast and took an interest in what we do here.
We've invited him here (virtually!) tonight to discuss what it takes to start your own podcast.
We will communicate with Victor via Skype
Skype is a program you can use to send instant messages to someone else
But more importantly, you can use it for VOIP
VOIP: Voice Over IP. The ability, essentially, to make phone calls over the Internet.
Calls between Skype users are free ($0.00 a minute, even for a long-distance call)
You can even make a video connection to have a video conference of sorts.
Video quality doesn't tend to be as good as audio quality, but it's still free.
To make a call with Skype, you need at least a microphone and speakers.
If your speakers are too close to your microphone, you may get feedback: an annoying echo and noise that can keep people from enjoying a VOIP call.
We recommend using headphones to avoid this problem.
David: We have your video going through the lecture hall here. Where are you geographically?
V: I'm located in beautiful Southern California, where it's still warm!
D: Please explain a bit about podcasts
V: It's all about content first. You and I have a voice, and we can let people know what we're passionate about. Whether it's journalism, comedy, or something else, there is no formula today for making a successful podcast. The first thing you need is a good story to tell and good content. The rest is just simple technology. You're going to need some type of platform to start your podcast, like your computer. Start with something simple, like a headset mic, and use it to record your audio. Next you need some kind of sound card, which most computers have today. If you have a Mac, GarageBand has built-in capabilities, so you'd be all set. Audacity lets you mix together your voice, sound effects, and music and combine all of them into a podcast in an MP3 file. You record your voice, mix the elements, save the file as an MP3, and now you have your content saved somewhere for people to listen to. How do you get it from your hands onto the Internet? You need a hosting provider or blog provider. Just as in blogging, where you're writing your journal, there are providers that give podcasters the ability to store MP3 files and distribute them. Switchpod.com is free; it lets you insert some commercials and will host your files at no charge. Bandwidth is an issue, but using a provider such as Switchpod helps. Listeners can then find your podcast using a podcatcher such as iTunes or Juice. Someone opens up the program, sees "Oh! Victor has a new podcast available," and downloads the episode directly from your server. It's good to practice with family and friends first. We're not professionals, so it's a good idea to get feedback first (and to make sure the file downloads without errors).
How do you promote your podcast? Let podcasting directories know you're there. There are three major ones: the iTunes Store, which will review your content and, unless it is questionable, list you in its directory; Podcastalley.com; and podcastpickle.com. You and I as listeners can go there and find podcasts on all kinds of topics. As a matter of fact, there are not one but two different podcasts on knitting. If you're looking to be a star or get rich off podcasting, this is not likely the time for that to happen. Fewer than 10% of all podcasts have more than 100 listeners. Lots of people are listening, but only a few are listening to any one type of show. One of my podcasts is for typical Mac and PC users. My other podcast, Immigration Tales, deals with integrating into one country or another. I'm passionate about these things. Even if I had only 100 listeners, I would still do it.
D: That was fantastic. Would you mind fielding questions?
Q: Thank you, Victor. What do you see as the future of video podcasting?
V: That is a huge future. The thing about video is that it is a lot more complex. Videos are a lot of fun to watch, but editing them takes a lot of time. I don't see video as being geared as heavily toward people like you and me, who can buy a $10 microphone and get our message out.
Q: As someone new to podcasting, it doesn't seem to me that you can put out a podcast and get a huge demand. Is there really demand for one lone individual's podcast?
V: You'd be surprised. Some shows take off for unknown reasons. Keith and the Girl, which is somewhat R-rated, took off. Get yourself into the podcasting community from within: send emails to other podcasters asking them to play a short promo for your podcast. The podcast community grows organically; if you have good content, they will come.
Q: Is there a possibility for podcasting generating income in the future?
V: I sure hope so! I think there is. There are plenty of podcasts today that get 8-9 million downloads a month and are generating some dollars. The question is how to combine the podcasting model with more traditional commercial media and have someone pay for it.
D: Thanks so much for joining us today. We'll be sure to post links to your podcast tonight.
Some links Victor mentioned:
Juice Podcast Receiver: http://juicereceiver.sourceforge.net/index.php
iTunes Music Store: http://www.apple.com/itunes/
Podcast Alley: http://podcastalley.com/
Podcast Pickle: http://podcastpickle.com/
Keith and the Girl Podcast: http://keithandthegirl.com/
Typical PC User Podcast: http://typicalpcuser.com/
Typical Mac User Podcast: http://typicalmacuser.com/
Podcast 411: http://podcast411.com/
Samson USB Podcasting Mic:
One of the things you can do with Skype today is dial any landline or cellular phone in the US for pennies per minute. It used to be free but it is still cheap.
You will recall that a couple of weeks ago we showed the internals of a hard drive.
Dan found a video on YouTube that demonstrates the internals of a hard drive.
The creator removed the top of the hard drive but left the cable connected.
That was a lot of noise and time for deleting a folder. Why?
Seeking to the folder
It looks like the read/write head was going back and forth a lot. Why?
It may have information on different parts of the disk. The folder may be fragmented.
Folders don't require a lot of storage space; they're represented efficiently. But what's inside of folders?
There were a lot of files in that folder that it had to delete as well.
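A toy model shows why fragmentation causes more back and forth. This sketch treats the disk as a line of numbered tracks and counts only how far the head has to move. Real drives also wait for the platter to rotate. The block layouts are made up for illustration.

```python
def total_head_travel(start_track, tracks):
    """Sum of the distances the head moves while visiting tracks in order."""
    travel = 0
    position = start_track
    for track in tracks:
        travel += abs(track - position)
        position = track
    return travel

# Hypothetical layouts of one file stored in five blocks.
contiguous = [100, 101, 102, 103, 104]   # blocks side by side
fragmented = [100, 900, 150, 700, 300]   # blocks scattered across the disk

print(total_head_travel(0, contiguous))  # 104
print(total_head_travel(0, fragmented))  # 2600
```

The same five blocks cost about 25 times more head movement when they are scattered, which is the audible back-and-forth in the video.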
Speaking of podcasts, thank you for subscribing to ours.
Our podcast has been so popular that we have been booted off of Harvard's servers.
We got a plan through Dreamhost.com with 1.6TB (terabytes) of data transfer. None of our videos is anywhere near this large (each is hundreds of megabytes), but downloads by many subscribers add up.
Remember a floppy? 1.44MB. Roughly 1000 times that is a gigabyte, and roughly 1000 times a gigabyte is a terabyte.
Four days into our contract, we had used 1.4 of our 1.6TB!
As with cellular phone plans, we would be charged for overages. Luckily, we've been able to upgrade our contract to 8TB.
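The storage arithmetic above is easy to check. This sketch uses decimal units (1 GB = 1000 MB), matching the "roughly 1000 times" rule of thumb. Some operating systems use binary units (1024) instead, which gives somewhat different numbers. The 200 MB video size is a hypothetical stand-in for "hundreds of megabytes." The sketch also assumes the 1.6TB quota counts data transferred to viewers.

```python
MB = 10**6          # decimal units: 1 MB = 1,000,000 bytes
GB = 1000 * MB
TB = 1000 * GB

quota = 1600 * GB                # the 1.6TB plan mentioned in the lecture
used_after_four_days = 1400 * GB
video_size = 200 * MB            # hypothetical size per video

print(quota // video_size)               # 8000 full downloads fit in the quota
print(used_after_four_days // 4 // GB)   # 350 GB transferred per day, on average
print(TB // GB, GB // MB)                # 1000 1000
```

At that average daily rate, a few thousand downloads a week is enough to exhaust a quota measured in terabytes.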
In 11 terms of teaching this class, this is the first time we've had the opportunity to talk about TB in more than just passing.
The Internet: Let's talk about the topic we've skirted until now!
What is it?
It links all computers together via a network. It is THE network of networks.
How they are connected is one of the foci of tonight.
Today we'll focus mostly on the higher level details.
Next week will be more technical.
What is a domain?
It's an address of sorts
cnn.com and harvard.edu are domains
It is an English-like phrase that describes a network of computers that is usually geographically related.
You might assume that harvard.edu computers are geographically related.
Subdomains: fas. (in fas.harvard.edu), dce., eecs., law., post., ...
Separates a domain into smaller sections so that it's easier to administer.
At Harvard, the EECS department gets its own subdomain and can manage it.
Email addresses can have subdomains, e.g., username@fas.harvard.edu.
Top-Level Domains (TLD)
.com, .edu, .gov, .info, .mil, etc.
In general, this tells you a bit about the website: .edu is educational institution, .gov is government, .mil is military, .com is commercial, etc.
Whitehouse.gov is the correct website. Whitehouse.com used to give you something very different!
There used to be restrictions on who could register .com, .net, and .org domains, but there no longer are.
Country codes, originally used to create obvious hierarchies (such as .jp for Japan and .co.uk for commercial UK sites), are now getting a bit more relaxed as well.
You can register your own domain name!
Spend a little bit of money at a domain name registrar (such as godaddy.com).
You can get, for example, davidmalan.jp!
So let's break apart a URL
www. cnn .com
www - subdomain or hostname
Hostname - the name of one particular computer on the Internet
cnn - domain
com - top level domain
cnn.com - hostname
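Reading a name from right to left gives the breakdown above. This is a deliberately naive sketch that assumes the last label is the TLD and the one before it is the domain. It gets names like cnn.co.uk wrong, because it would report "co" as the domain. Real software that needs the registered domain usually consults the Public Suffix List instead.

```python
def split_hostname(name):
    """Naively split a name like 'www.cnn.com' into TLD, domain, and subdomains."""
    labels = name.lower().split(".")
    if len(labels) < 2:
        raise ValueError("need at least a domain and a top-level domain")
    return {
        "tld": labels[-1],           # rightmost label, e.g. 'com'
        "domain": labels[-2],        # next label to the left, e.g. 'cnn'
        "subdomains": labels[:-2],   # anything further left, e.g. 'www' or 'fas'
    }

print(split_hostname("www.cnn.com"))
# {'tld': 'com', 'domain': 'cnn', 'subdomains': ['www']}
print(split_hostname("fas.harvard.edu"))
# {'tld': 'edu', 'domain': 'harvard', 'subdomains': ['fas']}
```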
When you send an email to username@fas.harvard.edu
fas - subdomain, not hostname
When we send an email, we don't care which computer it gets sent to; we care about the location - we want it to go to fas.
Typical email address:
username @ domain.tld
username @ subdomain.domain.tld
Valid characters: hyphen, underscore, alphanumerics, period
No slashes, spaces, dollar signs, "weird characters"
A period has a different meaning on the left side of an @ sign, and on the right.
On the right it separates subdomain/domain/tld
On the left it is part of the person's chosen username
Capitalization does not matter in practice - domain names are case insensitive, and mail servers almost always treat usernames that way too
Which are syntactically valid?
daffy email@example.com - NO, there's a space
dave@cbs - NO, missing TLD
jay@NBC.com - YES
firstname.lastname@example.org - YES
email@example.com - YES
user@yahoo!.com - NO, exclamation point
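The syntax rules above can be turned into a quick checker. This is a sketch of the lecture's simplified rules only, not the full email standard (RFC 5322), which allows more characters. It assumes the right side uses letters, digits, and hyphens, separated by periods, with at least a domain and a TLD. The function and pattern names are hypothetical.

```python
import re

# Left of @: letters, digits, period, underscore, hyphen (the lecture's list).
LOCAL_PART = r"[A-Za-z0-9._-]+"
# Right of @: labels separated by periods, with at least a domain and a TLD.
DOMAIN_PART = r"[A-Za-z0-9-]+(?:\.[A-Za-z0-9-]+)+"

def looks_like_email(address):
    """Return True if address matches the simplified email syntax."""
    return re.fullmatch(LOCAL_PART + "@" + DOMAIN_PART, address) is not None

for candidate in ["daffy email@example.com", "dave@cbs", "jay@NBC.com",
                  "firstname.lastname@example.org", "user@yahoo!.com"]:
    print(candidate, looks_like_email(candidate))
```

The space, the missing TLD, and the exclamation point each cause a mismatch, matching the answers above.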
Consider an address ending in @franklin.ma.us. This denotes a certain geographical location (Franklin in MA, United States). So why don't we use .com.us to refer to companies within the United States?
We were the first ones there.
Who Invented the Internet? Not Al Gore.
Originally a US military project to get computers to communicate across the country
Other countries sometimes do denote their country code in the TLD: cnn.co.uk (United Kingdom), cnn.co.jp (Japan)
Can you live in UK or Japan and own a .com?
There are no restrictions.
Question: I'm going to more and more websites that don't have www. in front of them!
Long story short, back when the Internet was first becoming popular, you would see addresses such as http://www.cnn.com
Gradually you saw http:// being dropped.
Even www. is a bit of a mouthful and somewhat gratuitous.
Now it is very easy to make a website that works both with and without the www.
Peer-to-peer ("P2P"). Two computers connected together; more generally, a network of computers connected in an ad hoc fashion.
A direct connection of sorts from one computer to another.
Client and Server relationship.
Much like a restaurant: you, the client, ask the server for information about the food or to order a dish.
This is very much how the Internet works. You are the client and make a request to, say, cnn.com to show the main page and the server will serve to you the page that you requested.
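The request/response pattern can be tried out on a single machine. This sketch uses Python's standard `http.server` and `urllib.request` modules to run a tiny web server and a client in the same program. The handler class name, the page path, and the reply text are hypothetical. A real site like cnn.com would sit on other machines across the Internet.

```python
import threading
import urllib.request
from http.server import BaseHTTPRequestHandler, HTTPServer

class MenuHandler(BaseHTTPRequestHandler):
    """A tiny server: answers every GET request with a short text page."""

    def do_GET(self):
        body = f"You asked for {self.path}".encode("utf-8")
        self.send_response(200)
        self.send_header("Content-Type", "text/plain; charset=utf-8")
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)

    def log_message(self, format, *args):
        pass  # keep the demo's output quiet

# Port 0 asks the operating system for any free port on this machine.
server = HTTPServer(("127.0.0.1", 0), MenuHandler)
threading.Thread(target=server.serve_forever, daemon=True).start()

# The client requests a page; an empty ProxyHandler bypasses any configured proxy.
url = f"http://127.0.0.1:{server.server_address[1]}/index.html"
opener = urllib.request.build_opener(urllib.request.ProxyHandler({}))
with opener.open(url) as response:
    reply = response.read().decode("utf-8")
print(reply)  # You asked for /index.html

server.shutdown()
server.server_close()
```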
Even with programs such as Skype, there is a central server that lets two users communicate without either one having to know where the other user is, physically or logically.
Skype's server knows where David is, knows where Victor is, and will make the connection between the two.
AOL Instant Messenger uses a similar scheme. What does this imply about the privacy of your instant messages?
Not very much privacy! Since your information is being sent through a server it is possible for AOL to read it. Whether or not they do this is not known.
Privacy concerns are true with other services as well, such as email.
Email is sent through servers, so it could be recorded as well.
However, cost usually prevents companies from snooping.
From our brief chat with Victor we found that we were not only transferring audio but video. Which costs more in terms of space, an instant message or a video of YOUR face?
This could be a problem if all of this information had to be channeled through Skype's central server.
Ideally, the server informs the two clients how they can directly contact each other and avoid channeling a lot of information through a central server.
This direct connection can cause some problems. If you have a home router, it can make it appear that there is only one machine in your home when there might be five. There are ways around this, but for now just consider that the server only puts the clients in touch with each other and then leaves the clients to their own devices.
A LAN (Local Area Network) is a whole bunch of computers connected together, usually within one geographic area, such as a building.
A WAN (Wide Area Network) is used to connect LANs together. For example, a company might have two headquarters buildings, each with its own LAN. The WAN is the connection between the two LANs.
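Skype's actual protocol is proprietary and more involved. This is only a hypothetical sketch of the introduction idea: a directory maps each signed-in user to an address and hands each party the other's address so they can connect directly. The class and method names are invented, and the IP addresses come from ranges reserved for documentation. The home-router problem (NAT) is ignored.

```python
class RendezvousServer:
    """Hypothetical directory server: remembers where each user can be reached."""

    def __init__(self):
        self.directory = {}  # username -> (ip_address, port)

    def sign_in(self, username, address):
        self.directory[username] = address

    def introduce(self, caller, callee):
        """Return (address for the caller to contact, address for the callee to contact)."""
        if caller not in self.directory or callee not in self.directory:
            raise LookupError("both users must be signed in")
        return self.directory[callee], self.directory[caller]

server = RendezvousServer()
server.sign_in("david", ("203.0.113.5", 40000))
server.sign_in("victor", ("198.51.100.7", 41000))

for_david, for_victor = server.introduce("david", "victor")
print(for_david)   # ('198.51.100.7', 41000)
print(for_victor)  # ('203.0.113.5', 40000)
```

After the introduction, the heavy audio and video traffic can flow between the two addresses without passing through the server.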
Usually a website is not hosted by one large computer, but by many small, efficient, and relatively cheap computers.
This way the load can be balanced across the computers
If one goes down, the whole website doesn't go down
Having a single domain name hides this behind an abstraction: we don't have to see how many servers they have in a pool.
They can give every computer its own hostname: alpha.cnn.com, beta.cnn.com, and gamma.cnn.com, for example. These are not subdomains but the names of individual computers.
But if we go to cnn.com, our request will go to one of alpha, beta, or gamma without our knowing which one it is.
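The lecture doesn't say how a site like cnn.com spreads requests across its machines. One simple strategy is round robin, where each new request goes to the next server in turn. This hypothetical sketch uses the alpha/beta/gamma hostnames from the example and stops using a server once it is marked as down.

```python
class RoundRobinBalancer:
    """Hand each incoming request to the next healthy server in rotation."""

    def __init__(self, servers):
        self.servers = list(servers)
        self.next_index = 0

    def mark_down(self, server):
        """Stop sending requests to a server that has failed."""
        self.servers.remove(server)
        self.next_index %= max(len(self.servers), 1)

    def pick(self):
        if not self.servers:
            raise RuntimeError("no servers available")
        server = self.servers[self.next_index]
        self.next_index = (self.next_index + 1) % len(self.servers)
        return server

pool = RoundRobinBalancer(["alpha.cnn.com", "beta.cnn.com", "gamma.cnn.com"])
first = [pool.pick() for _ in range(4)]
print(first)  # ['alpha.cnn.com', 'beta.cnn.com', 'gamma.cnn.com', 'alpha.cnn.com']

pool.mark_down("beta.cnn.com")   # one machine fails; the site stays up
after = [pool.pick() for _ in range(3)]
print(after)  # ['gamma.cnn.com', 'alpha.cnn.com', 'gamma.cnn.com']
```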
Remember that a hostname is the name of a particular computer on the Internet.
What does this refer to?
Do not (do not!) send an email message in all capital letters.
Please take the time to turn it off.
It implies shouting.
Spam is a huge problem.
It uses up so much bandwidth.
You may not think so because every email is quite small.
But because it's so cheap to spam millions of people, even a 1% response rate brings in quite a profit.
Popular types: loans, Nigerian scam emails, porn, debt
Not only a sociological problem but a technical one as well.
What do you often see in spam?
Can't email back or reply to it (bogus return addresses)
You should NOT click links that offer to unsubscribe!
You're only verifying your email address
Instead of saying "Hey! I want out!" You're saying "Hey! I exist"
How do spammers get email addresses?
Any published emails on a webpage can be found by using a 'bot' that crawls the Internet just looking for email addresses.
Would it make sense to make your email address a link instead?
Even though the address may be invisible to the user, it is embedded in the HTML and can still be harvested
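Harvesting bots scan the raw HTML, not the page as you see it. That is why a mailto link does not hide an address. This sketch runs a simple pattern over a made-up page snippet. The pattern is a rough approximation and not what any particular bot uses.

```python
import re

# A hypothetical web page: one visible address, one tucked inside a mailto link.
page = """
<p>Contact the webmaster: webmaster@example.com</p>
<p><a href="mailto:jane.doe@example.org">Email Jane</a></p>
"""

# Roughly what a harvester looks for: name@domain.tld anywhere in the source.
EMAIL_PATTERN = re.compile(r"[A-Za-z0-9._-]+@[A-Za-z0-9-]+(?:\.[A-Za-z0-9-]+)+")

print(EMAIL_PATTERN.findall(page))
# ['webmaster@example.com', 'jane.doe@example.org']
```

Only the link text "Email Jane" is shown on screen, yet the address behind it is found just as easily. An address drawn in an image gives such a pattern nothing to match.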
Often what people do is make an email into an image rather than having the text itself on a webpage.
If you show some information as an image, it is harder for a computer to harvest data from it
You may have noticed that when you sign up for services on some sites, a cryptic image (a CAPTCHA) asks you to copy the word or text in it. This prevents bots from creating new accounts without a human behind them.
Spam detection is getting better, but email was not designed with this problem in mind.
This was not raised as an important issue initially
Simply because no one foresaw the Internet becoming as large as it has become.
:-) - smile
:-( - frown
:'( - cry
:-D - laughing or Super Smile
:-O - surprise
The World Wide Web
What is the WWW?
The network of all networks? NO! This is the Internet
The Web and the Internet are not the same thing.
The Internet is an infrastructure, in a physical form.
It is the backbone on which the WWW resides; the Web is just one thing you can do on the Internet.
Email and instant messaging are other services that reside on the Internet.
A URL is something you've probably typed every day.
Please don't call it an "Earl"
Canonical form: protocol://machine/path
The most common protocol is "http"
You can have other protocols for other services (e.g., "ftp").
HTTP: HyperText Transfer Protocol
Just like we communicate in English, web browsers and web servers communicate in HTTP
If you see slashes in a URL, it's usually because you have something like http://www.foo.com/bar/index.html
bar is just a folder on the foo.com web server
index.html is just a file in the bar folder
index.html tends to be the default name for most webpages.
C:\Program Files\ (Windows path)
/Users/Library (Mac path)
The slashes in a URL, like those in a folder path on your computer, separate the parts of a path.
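Python's standard `urllib.parse.urlsplit` pulls a URL apart into the protocol://machine/path pieces described above. The URL here is a hypothetical example in that form.

```python
from urllib.parse import urlsplit

# A hypothetical URL in protocol://machine/path form.
url = "http://www.foo.com/bar/index.html"
parts = urlsplit(url)

print(parts.scheme)   # http             (the protocol)
print(parts.netloc)   # www.foo.com      (the machine)
print(parts.path)     # /bar/index.html  (the path on that machine)

# Splitting the path on "/" gives the folder(s) followed by the file name.
print(parts.path.strip("/").split("/"))  # ['bar', 'index.html']
```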
Which of these are valid?
http:\\www.bar.com - NO! The slashes go the wrong direction
Forward slash: /
Back slash: \
www.bar.com - NO! Not officially a URL, since it's missing the protocol. A web browser helps us cheat by adding the protocol for us.
http://www.bar.com/menus/friday.html - yes
http://www.bar.com/?item=pizza&topping=cheese - yes
http://CNN/ - NO! No TLD.
ftp:/ftp.bar.com/pub/setup.exe - NO! Only one slash after the colon instead of two.
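The answers above can be checked mechanically. This sketch mirrors the lecture's rules: a protocol, then "://", then a hostname containing a period (i.e., with a TLD). The formal URL syntax (RFC 3986) is looser and would accept a name like http://CNN/. The function name is hypothetical.

```python
from urllib.parse import urlsplit

def looks_like_url(text):
    """Apply the lecture's checks: protocol, '://', and a hostname with a TLD."""
    # Requiring "://" rejects backslashes, a single slash, or no protocol at all.
    if "://" not in text:
        return False
    parts = urlsplit(text)
    host = parts.hostname or ""   # urlsplit lowercases the hostname
    # A hostname with no period (like "CNN") has no TLD by the lecture's rule.
    return bool(parts.scheme) and "." in host

examples = [
    r"http:\\www.bar.com",
    "www.bar.com",
    "http://www.bar.com/menus/friday.html",
    "http://www.bar.com/?item=pizza&topping=cheese",
    "http://CNN/",
    "ftp:/ftp.bar.com/pub/setup.exe",
]
for example in examples:
    print(example, looks_like_url(example))
```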
Please note that Problem Set 3 is due October 25, NOT October 11.