Unit2-30 get_page function for Python 3.x

In Python 3.2 the urllib2 module no longer exists, so a get_page(url) function built on it won't work. Instead, import urlopen from urllib.request and call urlopen(url, data, timeout), which takes a URL as input and returns an object with read() and readline() methods. Convert the result of read() into a string with str(), then pass it to print_all_links to see the result.
Here is the code I use in Python 3.2, which differs from Python 2.x in a few places:

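    from urllib.request import urlopen

    # The name get_next_target and the variable start_link are assumed here;
    # the lines that defined them are missing from the post as it appears.
    def get_next_target(content):
        '''This function takes a string that represents the HTML content of a webpage
        and returns the url of a link it finds in it, and the index of the last position
        of that url. If no link is found it returns None, 0.'''
        start_link = content.find('<a href=')
        if start_link == -1:
            return None, 0
        start_index_quote = content.find('"', start_link)
        end_ind_quote = content.find('"', start_index_quote + 1)
        url = content[start_index_quote+1:end_ind_quote]
        return url, end_ind_quote

    def print_all_links(page):
        while True:
            url, end_pos = get_next_target(page)
            # check if the url is anything but the empty string or None
            if url:
                print(url)
                page = page[end_pos:]
            else:
                break

    def test():
        # open the url
        html = urlopen('http://www.xkcd.com/')
        # read from opened url and store in page
        page = html.read()
        # convert page into the string type and pass it to the print_all_links function
        print_all_links(str(page))

    test()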
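
To try the link-finding loop above without a network connection, here is a minimal sketch that runs on a hard-coded HTML string. The names find_next_link and collect_links and the sample HTML are made up for illustration and are not from the post. It returns the links as a list instead of printing them, so the result is easy to check.

```python
def find_next_link(content):
    """Return (url, end_index) for the first <a href="..."> in content, or (None, 0)."""
    start_link = content.find('<a href=')
    if start_link == -1:
        return None, 0
    start_quote = content.find('"', start_link)
    end_quote = content.find('"', start_quote + 1)
    return content[start_quote + 1:end_quote], end_quote


def collect_links(page):
    """Collect every link target in page, in order of appearance."""
    links = []
    while True:
        url, end_pos = find_next_link(page)
        if not url:
            break
        links.append(url)
        # continue searching after the link that was just found
        page = page[end_pos:]
    return links


# Hypothetical sample input, used only to exercise the functions.
sample = '<p><a href="http://a.example/">A</a> and <a href="http://b.example/">B</a></p>'
print(collect_links(sample))
```

Like the original loop, this stops at the first empty href, so it is only a sketch of the idea and not a real HTML parser.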


Please correct me if I have misunderstood anything or done something wrong in my code. I know that it works, but I am not sure whether the way I implemented it is the right way.

Cyberax

This is my implementation of the get_page(page) function. When I saw that function used in Professor Dave's class, I thought it was a standard function from the Standard Library.
Later I realized that it is not built in, and I found the urllib2 library with its urlopen() function and read() method, so I implemented it myself. I hope it helps someone. Greetings to everyone starting out at Udacity, and I wish you easy learning of some very useful stuff.

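    def get_page(page):
        import urllib2  # Python 2 only; this module does not exist in Python 3
        source = urllib2.urlopen(page)
        # read the whole page and return it
        return source.read()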


Drazen Lazar...

Awesome! I had the same question. It's so useful to practice this in the terminal. BTW, I now see why no one uses the mailto html anymore for e-mail addresses, scanning for them in pages is way too easy!

(22 Jan '13, 12:12)

Thanks for providing the code sample for Python 3.2. Personally I am using Python 2.7.3, since another intro text I am using avoids Python 3 for now, as does the free Google class on Python (they just say to avoid Python 3 for now). But the next text I will be working through starts with Python 3, so I hope to be able to see the differences after learning them both. I do know that in Python 3 print is now a function and has to have parentheses, and that the way parameters and tuples are handled has changed, so I can't wait to take a look at that sometime. But for now I am plugging along with version 2.7.3 =)
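
If I read the comment right, the two differences it mentions are the print function and the removal of tuple parameters. A small sketch of both in Python 3 follows; the function name dist and the values are made up for illustration.

```python
# Python 3: print is a function, so the parentheses are required.
# (In Python 2, `print "hello"` without parentheses was a statement.)
print("hello", "world", sep=", ")


# Python 2 allowed `def dist((x, y)):`. Python 3 removed tuple parameters,
# so the tuple is unpacked inside the function instead.
def dist(point):
    x, y = point
    return (x * x + y * y) ** 0.5


print(dist((3, 4)))
```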


What are you discussing?

Could you possibly include a little preamble to postings that seem to reach way beyond CS101 Unit 2 level, just saying what you will be discussing? It's difficult to see if there's anything here that would help me with the problem I'm working on (missing test expression and 'else command' as required for Unit2-30 quiz).

Christian Mi...

I guess the info I need would be how the problem you are clarifying relates to the problem as stated in the quiz.

(30 Apr '12, 06:01)

This is not specifically about Unit2-30. But if you do a Google search for "get_page python 3", it is one of the first things that comes up. I am using it in Unit3-34. At some point during the course you might find yourself trying to write code on your own computer. When you do, the get_page function will come in handy.

(12 May '12, 11:42)

I use the following function to implement get_page. The main change I made was converting the source/html to a string (note the use of decode [many thanks to http://groups.google.com/group/comp.lang.python/browse_thread/thread/b88239182f368505 for this fix]). Using the "with" keyword ensures the page is closed after you are done reading from it.

    def get_page(page):
        from urllib.request import urlopen
        with urlopen(page) as f:
            # decode the bytes into a string (UTF-8 is assumed here)
            html = f.read().decode('utf-8')
        return html
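
To see why decode matters, here is a small sketch that compares str() and decode() on a bytes value like the one read() returns in Python 3. The sample bytes are made up for illustration.

```python
# read() on a urlopen() response returns bytes in Python 3.
raw = b'<a href="http://example.com/">link</a>'

# str() on bytes gives the printable representation, including the b'...' wrapper.
as_repr = str(raw)

# decode() turns the bytes into the actual text (UTF-8 assumed here).
as_text = raw.decode('utf-8')

print(as_repr)
print(as_text)
```

Searching for links still works on the str() version because the double quotes survive, but the text carries the extra b'...' wrapper and escape sequences for any non-ASCII characters, so decode() is usually the cleaner choice.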


Marlen Brunner

I tried Marlen Brunner's function but kept getting this error message:

Traceback (most recent call last):
File "vm_main.py", line 26, in <module>
import main
File "/tmp/vmuser_yylzngfzkx/main.py", line 52, in <module>
get_page(page)
File "/tmp/vmuser_yylzngfzkx/main.py", line 45, in get_page
from urllib.request import urlopen
ImportError: No module named request

I also tried Cyberax's and Drazen Lazar's functions and got similar responses. I think it has to do with the imports. If this method is covered in a later unit, then I'll just learn it then.
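
An "ImportError: No module named request" most likely means the code is running under Python 2, where urllib.request does not exist and urlopen lives in urllib2. A minimal sketch of an import that works on either version follows. The try/except fallback and the UTF-8 decoding with replacement of bad bytes are assumptions on my part, not something from the posts above.

```python
try:
    # Python 3
    from urllib.request import urlopen
except ImportError:
    # Python 2
    from urllib2 import urlopen


def get_page(url):
    """Fetch url and return its content as text (UTF-8 assumed)."""
    response = urlopen(url)
    try:
        # undecodable bytes are replaced instead of raising an error
        return response.read().decode('utf-8', 'replace')
    finally:
        response.close()
```

Usage would be something like get_page('http://www.xkcd.com/'), which needs a network connection.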

    import urllib.request
    from urllib.request import urlopen

Would this be a better idea?

Paritosh Tri...

@cyberax I tried your code. It doesn't work :/ I don't know why.
