# Contest Announcement and Submissions


## Honorable Mention Submission

So, instead of doing a couple of exam questions in my learning time tonight, I decided to make a contest entry.

Background: Among my many hobbies is amateur radio. I have been licensed for more than 20 years but have drifted in and out of the pursuit quite a bit. Last fall I started drifting in again. Not having any equipment, and having moved into a new home since my last stint, one of the things I needed to procure and install was an antenna of some sort. Since my home is in a bit of a restricted community (any of you who are hams will understand what I'm talking about), I had to come up with a fairly unobtrusive antenna, and I didn't want to spend a whole lot of cash on it either. After much research and deliberation I went with a telescoping fiberglass vertical (again, for those in the know, it's an S9v). Basically, this is a somewhat limber 31-foot tapered fiberglass tube with a wire in it. It works pretty well with my radio and tuner. I've only had a little time on it (class here is taking up all hobby time at the moment), but I have had good contacts to California and Argentina from my home here in Illinois on 10 meters.

Anyhow, this background finally brings me to the point. This antenna, being inexpensive, fairly invisible behind my house and all, doesn't handle higher winds all that well. They say it's good up to about 40 mph winds, but I don't want to test it that far; past that it'll break and I'll have repairs and parts to order. Since it's designed as a portable antenna, it's very easy to put up and take down, and that's what I do: put it up when I know the wind is going to be okay and take it down when I know the wind is going to be too much for it. Initially I looked around the web for some service that would just send an automated email with a 12-hour wind forecast. I didn't find much that was suitable for my needs. The Weather Channel has emailing, but only for current weather as far as I could find.
I did find a service that had a wind forecast for the next couple of days, but it was mingled in with a bunch of other stuff I didn't really need: an HTML email clogged with ads, etc. I also knew about NOAA's NDFD service (National Digital Forecast Database), which supplies various forecast info in a nicely structured XML format. I had looked at that a bit and made a half-hearted attempt at a Windows Phone 7 app to grab it, but that was quite over my head. Until... this class, with its superb coverage of pulling web pages, parsing out text, and the various goings-on with Python.

Thus, I submit: The Wind Gust Forecast Emailer (WGFE).

When the contest was announced I had thought about this, but at the time I was in the thick of the Unit 6 homework, which wasn't going all that well for me (turns out it went fine; I got 100%). Today at work the thought came up again and certain things came together in my head. A bit of guessing, a lot of googling, much testing, and about three hours later I have Windows 7 Task Scheduler running a Python script that emails me nothing but the next 12 hours' maximum wind gust speed forecast in mph.

Hmm, how much to discuss here and how much to leave for the reader? The NOAA NDFD has a facility where you can enter some parameters in a web form (lots of parameters), click submit, and receive an XML document with what you requested. I ran that for my zip code and time frame and requested only the wind gust forecast for 12 hours. That provided me a base URL on which I would only need to change the dates to get future days' forecasts. It's a fairly huge URL, and to me (being more of a SQL person) it basically seems to pass a bunch of parameters to the server, which pulls and formats the output for you based on those parameters. I knew I had to import urllib; we've used that before with the crawling. I basically recycled get_page to pull back the XML document. One wrinkle was filling the appropriate dates into the URL.
Googling led me to the datetime library, and a bit of work on that plus some string concatenation made it possible to dynamically insert a date into the URL that I pass to get_page. Next, pulling out the wind gust figures: for 12 hours there are five, one every three hours (i.e., 12, 3, 6, 9, 12). I was able to commandeer get_all_links and get_next_target to extract the five values into a list. Then I had to write a new procedure to pick out the maximum number from that list. Rather than splitting and sorting the list, I thought it was easier to set a variable to 0, loop through the list, and whenever the next value is bigger than the variable's current value, reassign it and try the next one. Once the maximum is established, I quickly convert it from knots to mph, round it up, and make it a string for later use. What was left was the emailing part. More googling led me very easily to smtplib and a basic template that I modified to suit my situation.

I don't really know what's required for the submission, but I'll give a basic walkthrough of the code for anyone who is interested. First the libraries are imported: urllib, datetime, smtplib, string (that one came with the smtplib example, so I kept it). I'm a noob, so they are on separate lines; I'm sure there is a way to put them all on one. Next, the five procedures are entered; more on them later. After that, some variables are set:

- now, which feeds date, to set a date for the URL
- zipcode, the easy entry point for others who want to use this
- address, to hold the email address used for the emailing
- url, to build the value to send to get_page

Then the action, starting from the bottom: send_mail is called with WindGustForecast and date. WindGustForecast calls find_max on get_speeds (which is just get_all_links) applied to report, which is the URL. get_speeds goes through the document using get_next_speed (which is really just get_next_target) and makes the list of forecasts.
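A minimal sketch of the find-the-maximum and knots-to-mph steps described above (the list values here are illustrative; the real script extracts them from the NDFD XML):

```python
import math

def find_max(speeds):
    # Loop-and-reassign approach described above: start at 0 and
    # keep the largest value seen so far.
    maximum = 0
    for s in speeds:
        value = int(s)
        if value > maximum:
            maximum = value
    return maximum

def knots_to_mph(knots):
    # 1 knot is about 1.15078 mph; round up as described.
    return int(math.ceil(knots * 1.15078))

speeds = ['12', '18', '15', '9', '11']   # hypothetical gust values in knots
max_gust = str(knots_to_mph(find_max(speeds)))
print(max_gust)  # '21'
```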
That result goes back to find_max, which returns the maximum value. Finally, that maximum value and the date are passed to send_mail, where magic happens and my ISP is forced into SMTP'ing the info to me. The automating of this is done in Windows Task Scheduler; I'm familiar with it from work, where I have in the past scheduled various batch files to do things. Making it run Python was a little bit of a struggle, but some searching and testing ended in a working system. Python code is here. I hope I've submitted this correctly. I'm sure this is somewhat subpar compared to what other individuals and teams are going to submit, and I'm sure it can be improved a whole lot, which I'll probably work on from time to time going forward. It's been super fun developing this and actually seeing test emails in my inbox. Amazing, really.

In closing, I want to thank Dr. Evans, Dr. Thrun, Mr. Chapman, all of their support staff, and all of you Udacians for providing this immensely important opportunity and wonderful community to the world. It's a very interesting time we live in.

Participants making this submission: rrburton

Update: I made a bit of an update. To better handle the date and time variables, I created a procedure set_vars, which takes the current datetime and produces the appropriate start and end strings to insert into the URL, and I modified the URL formula to suit. I also modified the body text of the email (in send_mail) to read a bit better and use the more appropriate start string. I found that all of the imports could be placed on a single line separated by commas. Lastly, I added a few comments, including the CC, etc., at the top.

answered 04 Apr '12, 00:12 rrburton

love this submission (04 Apr '12, 10:02) elssar

I just wanted to say I think this is a really cool submission! Great job, @rrburton!
You've actually helped me decide on what I'm going to do... well, not exactly yet, but I AM going to do something that will help me in my day-to-day life. There is an excellent blog post from one of the GitHub Python open source projects Dave & Peter linked to from the notes. The author talks about working on things that are useful to YOU. I think you have done precisely that. (04 Apr '12, 11:48) Joe Balsamo

Great idea! I was wondering if there were any other hams in the class, but I never got around to asking. I'm in IL too, AB9PR, good on qrz.com. I also use a ground-mounted vertical (the Zero-Five) with pretty good results. Haven't been in the shack in a while, especially with the demands of this course. Maybe we can try for a QSO some time. Good luck, OM. 73. -- Derick (10 Apr '12, 22:21) R. F. Derick...

Thanks for the reply, Derick; super to see another ham here! FYI, I'm N9JVA and I'm good on qrz also (though I don't have anything there). Being only a Tech(+), I only really have 10m on HF, so I try for weekend afternoons if it's not too windy (upgrading is another thing I need to squeeze in somewhere). Not sure if we could make that short a hop, but it would be fun to try. Good luck and 73s. (10 Apr '12, 22:50) rrburton

WINNER!

# Frivolo.us (search engine)

Frivolo.us is a search engine that grants fundamental rights to algorithms, provides advanced search capabilities, and intelligently supports a person's supernatural predispositions.

Frivolous is a replacement for traditional search engines. Search engines today discriminate in favor of "better" algorithms and shun others by using pejoratives like exponential time, impractical, and vulnerable.

## Search like you used to.

On the surface, frivolous looks hauntingly similar to other search engines: it features a list of links with page titles and snippets of text below each link, offers spell checking, and sorts results using the PageRank™ algorithm.

Note: Spell checking works by looking at the words in the index instead of a dictionary of words, so any misspellings in the index will lead to wrong recommendations.
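The note doesn't say how candidates are ranked; a common approach, sketched here with a minimal edit-distance function, is to suggest the indexed word closest to the query term:

```python
def edit_distance(a, b):
    # Classic dynamic-programming Levenshtein distance.
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        cur = [i]
        for j, cb in enumerate(b, 1):
            cur.append(min(prev[j] + 1,                  # deletion
                           cur[j - 1] + 1,               # insertion
                           prev[j - 1] + (ca != cb)))    # substitution
        prev = cur
    return prev[-1]

def suggest(word, index_words):
    # Suggest the indexed word with the smallest edit distance;
    # a misspelled index word would win here too, as the note warns.
    return min(index_words, key=lambda w: edit_distance(word, w))

print(suggest('pyton', ['python', 'search', 'engine']))  # python
```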

## Relive the old days.

Ever wonder how cool it would be to search the web back in 1998, when search was in its infancy? Well, now you can. Using our platform for recruiting unused and abandoned algorithms, we have successfully implemented 1998 technology to work today. We introduce the altavista hashtag. Feel the thrill of exploiting vulnerable algorithms yesterday, today!

Note: This counts the number of times the words in the query appear in a webpage as described in the AltaVista lecture.
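As a sketch of that counting scheme (assuming each page is represented as a simple list of words):

```python
def altavista_score(query, page_words):
    # Count how many times each query word appears in the page,
    # as in the AltaVista lecture's occurrence-counting ranking.
    query_words = query.lower().split()
    return sum(page_words.count(w) for w in query_words)

page = 'the quick brown fox jumps over the lazy dog the end'.split()
print(altavista_score('the fox', page))  # 4
```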

Our search engine is clever enough to provide answers to general queries.

Note: I use DuckDuckGo's API to answer general queries.

## Equations abound.

Most search engines today can understand mathematics. To level the playing field, our research team has developed the nerd hashtag for all your symbolic manipulation needs.

Note: I used the eval function so that SymPy could deal with the equations in the query.

## Search evolved. Literally.

Our search algorithm can evolve a random text string: it repeatedly mutates characters at random and selects the fittest string from the children using the edit distance algorithm. This is triggered by the weasel hashtag.

Note: I got this idea from Richard Dawkins' "Growing Up in the Universe" series.
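A minimal version of that weasel-style evolution might look like the sketch below. The target phrase is Dawkins' classic example; the mutation rate, brood size, and mismatch-count fitness (a stand-in for full edit distance, valid here since all strings share the target's length) are assumptions, not the submission's actual parameters:

```python
import random

def mutate(s, alphabet, rate=0.05):
    # Copy the string, flipping each character with a small probability.
    return ''.join(random.choice(alphabet) if random.random() < rate else c
                   for c in s)

def evolve(target, children=100, seed=0):
    # Hill-climb toward the target: breed mutated copies each generation
    # and keep the fittest (fewest character mismatches). Including the
    # parent in the brood guarantees fitness never regresses.
    random.seed(seed)
    alphabet = sorted(set(target))
    current = ''.join(random.choice(alphabet) for _ in target)
    generations = 0
    while current != target:
        brood = [current] + [mutate(current, alphabet) for _ in range(children)]
        current = min(brood, key=lambda s: sum(a != b for a, b in zip(s, target)))
        generations += 1
    return current, generations

final, gens = evolve('METHINKS IT IS LIKE A WEASEL')
print(final)  # METHINKS IT IS LIKE A WEASEL
```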

## Palindrome search. A new kind of search.

Have you ever wanted to search for the page with the most palindromes? Palindrome search is for all palindrome lovers. Heck, it even checks your query for any palindromes. Try that with your search engine.
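A sketch of the palindrome check and the per-page scoring this feature implies (the scoring rule here is an assumption; the submission's actual code is in its repository):

```python
def is_palindrome(word):
    # Reads the same forwards and backwards; single letters don't count.
    w = word.lower()
    return len(w) > 1 and w == w[::-1]

def palindrome_score(page_words):
    # A page's score: how many palindromic words it contains.
    return sum(1 for w in page_words if is_palindrome(w))

query = 'madam in the civic level'
print([w for w in query.split() if is_palindrome(w)])  # ['madam', 'civic', 'level']
print(palindrome_score('noon kayak dog'.split()))      # 2
```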

## TED Talks built-in. Amazing.

Are you bored? Do you want to search the latest and greatest ideas that could change our world for the better? Then TED is the place to go. By including TED to our search engine, you now have the chance to manipulate the results to your liking. You're welcome.

Note: The program visits http://feeds.feedburner.com/tedtalks_video and extracts all the links, descriptions, and titles using BeautifulSoup.
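The extraction step could be sketched like this; the submission uses BeautifulSoup on the live feed, while this stand-in uses the standard library on a hypothetical RSS snippet of the same shape:

```python
import xml.etree.ElementTree as ET

# Hypothetical miniature of the TED RSS feed's structure.
rss = """<rss><channel>
  <item><title>A talk</title><link>http://example.com/a</link>
        <description>First talk.</description></item>
  <item><title>B talk</title><link>http://example.com/b</link>
        <description>Second talk.</description></item>
</channel></rss>"""

def extract_items(rss_text):
    # Pull (title, link, description) out of each <item> element.
    root = ET.fromstring(rss_text)
    return [(i.findtext('title'), i.findtext('link'), i.findtext('description'))
            for i in root.iter('item')]

for title, link, desc in extract_items(rss):
    print(title, link)
```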

But that is just the beginning. What if we want the altavista hashtag with the ted hashtag? Easy.

Bam! The TED hashtag is pretty special because it temporarily changes the index used by the other hashtags.

Wikipedia articles have a lot of external links, which act as references for the article. The thing is, they usually link to relevant sites, at least for general searches. Frivolous can now fetch these links to augment search results.

Note: Printing the results can take time because it still has to download the webpage and retrieve the title of the page. It uses Wikipedia's API.

## We understand your supernatural predispositions.

Millions of people are born unlucky. They wake up in the morning, spill orange juice on their suit, trip on a curb, and miss the bus to school. But once they try searching on today's search engines, they are automatically given the best results. Why are search engines trying to change one's destiny? If you are destined to be unlucky, then it is your right to stay unlucky! How dare they fiddle with your life. Starting today, we are proudly releasing the unlucky hashtag. You're lucky you're unlucky.

The unlucky hashtag, like the ted hashtag, is special, which means it can be combined with other hashtags. Here's a combination of the unlucky and altavista hashtags:

or unlucky, ted, and palindrome hashtags:

## Make your search engine solve hard things.

Have you always wanted to impress your crossword puzzle buddies? Do you want to know what it feels like to be a crossword puzzle rock star? Well, you're in luck, because we have just implemented the crossword hashtag!

Note: Inspired by Wolfram|Alpha's crossword puzzle solver. Like the spelling suggestion, it also depends heavily on the indexed words.
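One plausible reading of a crossword lookup against the indexed words, using '?' for unknown letters (the pattern syntax is an assumption, not necessarily the submission's):

```python
import re

def crossword_matches(pattern, index_words):
    # '?' marks unknown letters, e.g. 'c?t' matches 'cat' and 'cot'.
    regex = re.compile('^' + pattern.replace('?', '.') + '$')
    return [w for w in index_words if regex.match(w)]

words = ['cat', 'cot', 'cart', 'dog']
print(crossword_matches('c?t', words))  # ['cat', 'cot']
```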

## We partner with other search companies.

We're pleased to announce that as of April 15, 2012, we've formed our first search partnership. Search is a tough problem to solve and we think that the best way to tackle search giants is for search start-ups to collaborate. Now introducing the all-new and improved searchwithpeter.info hashtag which combines the website's best search results with ours. There has never been a better time to search the web.

Note: The program doesn't actually access searchwithpeter.info. It goes directly to udacity-forums.com.

## Open source.

The source code can be found at GitHub.

## Team members:

jtalon - talon.jag at gmail dot com

## Note:

All images, text (including the spaces between them), and code are licensed under a Creative Commons CC BY-NC-SA license.

Jag Talon


ding ding ding! We have a winner! (13 Apr '12, 13:36)

impressive! how much coding did you know before CS101??? (17 Apr '12, 13:04)

Nice, it has good ideas. And you write in an entertaining way as well. Thanks. (20 Apr '12, 18:06)
WINNER!

# DaveDaveFind

I've been teaching myself Python for a few months, but Udacity's CS101 course was my first formal introduction to computer science concepts. Before this course, I was okay at hacking things together on my own, but Udacity has helped me clean up my code, think beyond Python to the fundamentals of computing, and understand how to break big problems into little parts.

At the end of class, I was a little disappointed that we never implemented our web crawler. For my project, I wanted to use as much of our original search code as possible to build a web application that searches the Udacity site, forums, and course materials. The result is DaveDaveFind. As Peter once pointed out, it's important for any new search engine to have a good name. Unfortunately, "searchwithpeter.info" was already taken by a much more useful site, so I decided to call my site DaveDaveFind.

DaveDaveFind searches the full text of the Udacity website, CS101 forums, course documents, and lecture transcripts. It supports multi-word lookup, but sometimes works better for single-word searches. If DaveDaveFind finds your search query inside a video transcript, it will try to link inside the video to the moment the query occurs. If your search query is a common Python-related term, it will try to look up information in the Python documentation. Try searching for the name of a built-in function or standard library module, like str or urllib.

The search box also accepts commands inspired by bang syntax. Try typing --forum before your search query to search the CS101 discussion forum, --python to search Python documentation, or --daverank to show the DaveRank™ for each result underneath its URL. There are also some hidden commands, but you'll have to read the development blog or look at the code to find them!

I learned to use the Bottle web framework and Google App Engine in order to build this project.
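The --command dispatch described above could be parsed roughly like this (a sketch; the helper name and the 'default' fallback are assumptions, not DaveDaveFind's actual code):

```python
def parse_query(raw):
    # Split off a leading '--command' from the rest of the query;
    # anything without a command falls through to the default search.
    parts = raw.strip().split()
    if parts and parts[0].startswith('--'):
        return parts[0][2:], ' '.join(parts[1:])
    return 'default', ' '.join(parts)

print(parse_query('--forum unit 6 homework'))  # ('forum', 'unit 6 homework')
print(parse_query('urllib'))                   # ('default', 'urllib')
```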
I also used the DuckDuckGo API, BeautifulSoup, and the Robot Exclusion Rules Parser. The web app uses styles from Twitter Bootstrap. I learned to use all these free tools by reading their documentation (plus a lot of trial and error). Update: it looks like CS253 is going to teach Google App Engine, if you're not already enrolled.

I also kept a blog with lots of notes on this project. I hope other Udacity students will use it as a resource, realize that the code we wrote in class isn't too far from a working application, and avoid a lot of the dumb mistakes I made. All the code is available in this GitHub repository. My Udacity ID is ecmendenhall@gmail.com, and my forum ID is ecmendenhall.

answered 15 Apr '12, 15:37 E.C. Mendenhall

great entry :) very useful. I was thinking of doing something with video, but I found the subtitles too imprecise for what I was planning. I wanted to build a 'composer' with videos: you type a phrase, the code breaks it up into words and finds video snippets of people (celebrities, politicians, Udacians, etc.) saying those words, then it would stitch them together... lol (17 Apr '12, 14:03) Gian Carlo M...

That sounds like a tough project! I think your entry is pretty useful, too. And it looks good; Bootstrap is a great tool. (19 Apr '12, 17:26) E.C. Mendenhall

the webapp looks BEAUTIFUL!!!! However, going to your appspot URL, I'm unable to enter a query string. I tried using the Chrome browser and Firefox 10, but when I click into the search box, I don't get a caret, nor do characters show up when I type something on my keyboard. Am I doing something wrong? Thanks and nice work! (21 Apr '12, 18:32) Alexander Co...

You're not doing anything wrong! I've heard this from a couple other users, but I haven't been able to replicate it yet. It's frustrating. What OS are you using? (21 Apr '12, 19:17) E.C. Mendenhall
## Submission: ZhuFangZhi

What I did: ZhuFangZhi, a search engine for housing rentals! (Currently, it supports most cities in China.)

Try it: copy any row of the following (or choose an arbitrary place in China), paste it into the text box at ZhuFangZhi, and press enter:

屯三里 春熙路 上海 徐家汇 北京 海淀 上地 环岛

Framework of ZhuFangZhi:

- Web: nginx + supervisor + tornado (Python)
- Database: MySQL
- Crawler: a script written in Python

The crawler, called zfz-bot, collects listings from several big sites in China that provide housing rental information and stores each URL and the corresponding information (like rental fee, address, etc.) in the database. When a user accesses ZhuFangZhi's website and queries for a place, the nginx server passes the place information to the tornado server, which retrieves the relevant rental information from the database and generates the result page.

What does ZhuFangZhi mean in Chinese? It's homophonic with a Chinese phrase meaning renting a house.

Source Code License: I grant a Creative Commons CC BY-NC-SA license to both this description and the source code.

My Udacity ID: i@liangsun.org

answered 15 Apr '12, 10:35 Liang Sun

holy cow!! (15 Apr '12, 10:49) elssar
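The store-and-query flow ZhuFangZhi describes could be sketched with the standard library like this (the schema, table name, and sample data are assumptions; the real site uses MySQL behind tornado):

```python
import sqlite3

# In-memory stand-in for the MySQL database zfz-bot writes to.
db = sqlite3.connect(':memory:')
db.execute('CREATE TABLE listings (url TEXT, place TEXT, fee INTEGER, address TEXT)')

def store_listing(url, place, fee, address):
    # What the crawler would do for each listing it collects.
    db.execute('INSERT INTO listings VALUES (?, ?, ?, ?)', (url, place, fee, address))

def query_place(place):
    # What the web server would run for a user's query.
    cur = db.execute('SELECT url, fee, address FROM listings WHERE place = ?', (place,))
    return cur.fetchall()

store_listing('http://example.com/1', '上海', 3000, 'Xuhui District')
store_listing('http://example.com/2', '北京', 4500, 'Haidian District')
print(query_place('上海'))  # [('http://example.com/1', 3000, 'Xuhui District')]
```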

## SUBMISSION

UPDATES: I had some time, so I went on to make some improvements.

UPDATE TWO!
The site is now running fully on Django. Search is a bit broken, but I'll try to fix it soon. The original code can still be seen on GitHub in the commits before 23 Apr.

• Search engine improved (no longer case-sensitive)
• You can add items to search by clicking on the ingredient/part list
• Social media buttons added (fixed, actually)
• Some redesign using Bootswatch :)

## Project Name: HackingPot

Objective: I really enjoy doing some DIY/hacking projects. The thing is, I don't have much time to do them. But, once in a while, I have a weekend off and I really feel like building something, but then I face a problem: what should I build??? Normally I would look at my parts bin and waste hours searching the web for cool projects. With HackingPot, I can do this easily!

All I need to do is enter some 'ingredients' (or components/parts) and HackingPot searches selected DIY/tutorial websites (in this version, only Make: Projects, and specifically electronics-related ones; that's why the code specifies starting points/targets), ranking each project by its own algorithm that determines how close my 'ingredient list' is to the project's needed materials. There are currently 317 projects listed.
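The ingredient-matching idea could be sketched like this (the coverage-fraction score and the sample projects are assumptions; HackingPot's actual ranking algorithm is in its repository):

```python
def rank_projects(ingredients, projects):
    # Score each project by the fraction of its needed materials
    # covered by the user's ingredient list.
    have = set(i.lower() for i in ingredients)
    scored = []
    for name, materials in projects.items():
        needed = set(m.lower() for m in materials)
        scored.append((len(have & needed) / len(needed), name))
    return sorted(scored, reverse=True)

projects = {
    'LED cube': ['led', 'arduino', 'resistor'],
    'FM bug': ['transistor', 'coil', 'battery'],
}
# 'LED cube' ranks first: 2 of its 3 materials are on hand.
print(rank_projects(['LED', 'Arduino', 'battery'], projects))
```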

Idea origin: I've been wanting to build something like this for over 6 months, but I have a business background and my computer skills were very limited until I did CS101 (they still are, but I have seen a great improvement!).

Next steps: During CS253 I plan on turning this into a real web application, using a real database (and not Pickle, like I did on this version). I have a list of possible improvements already that I couldn't manage to do due to a lack of available time:

• Spell checker and query tips. Overall improvement of the search engine.
• Hash table implementation
• Overall overhaul of the code (it does the job, but I know it is horrible!)
• Nicer design.. :)

Tools used: I used Django, Python, BeautifulSoup and Bootstrap.

Code and Demo: My code is available at GitHub under Creative Commons BY-NC-SA License. There is also a DEMO. If you use it, let me know what you think! :] Any code improvements would be appreciated also.

Finishing thoughts: I would like to thank all of the Udacity Staff for giving the world the opportunity to learn such a great course for free, with such an extremely talented group of professionals. This is really game changing. Long live Udacity!

Submission by: Gian Carlo / @gcmartinelli / gcmartinelli AT gmail


Hello, Gian Carlo! Our ideas are indeed very similar :D Let me just add that your interface is very nice and clean! Congratulations! I hope you enjoy my reciPY as much as I will enjoy using your HackingPot ;) Cheers! (17 Apr '12, 14:34)

Thank you, Amarals! You should check out Bootstrap; it is really easy to deploy if you know some HTML and CSS. (17 Apr '12, 14:56)

btw, feel free to grab my code and adapt it to your project! (17 Apr '12, 14:59)
## Submission: Adjective Crawler for Books

Repository Link: https://github.com/astenolit/adjective_crawler

The Idea. I am a Spanish teacher of Spanish and French Language and Literature in a secondary school in Madrid (Spain). From the first moment I was engaged in this awesome trip of learning how to write a search engine in Python, I had in mind the idea of using it in my domain. How about doing a "book crawler" instead of a web one? I thought searching for strings in a book and analyzing the results could help in understanding the book itself. The idea was nice, and after the final exam I quickly wrote the code, thanks to all the things I've learned in this awesome course. But after running it, I realised that the statistical output was a bit useless. So I changed the code to search only for adjectives: it made much more sense to look for the adjectives surrounding a given string.

Why it can be useful. Knowing the adjectives surrounding a given word, or string of words, gives us a glimpse of the features of a novel's character, and likewise for any given noun. For instance: if we look for adjectives surrounding the word 'Spain' in a text and find the adjective 'beautiful' many times, we can infer that the narrator, or the writer, has a good view of the country. In addition, knowing how many times a given string appears, and at what positions in the text, gives us an idea of the importance of a character or place: when it appears, disappears, etc. For instance: if we enter the name 'Dulcinea', a character from "The Ingenious Gentleman Don Quixote of La Mancha", the classical masterpiece of Miguel de Cervantes (Spain, 1605), we'll discover that she appears mostly in the middle part of the book, not at the beginning nor the end. Finally, for statistical purposes, we can know the total number of words in a given book.

How it works.
First, it was necessary to have a comprehensive database of adjectives, so I wrote code to extract all the adjectives from a dictionary. I noticed that all the entries are in caps and adjectives are marked, as usual, with 'a.'. So, inspired by the things I've learned in cs101, I wrote code to extract only the capitalized words preceded by 'a.'. This code is also available in the repository. This way, I have a huge number of adjectives to work with; the result is in the file "adjectives.txt".

The core of the code is the procedure adjective_crawler, which takes 4 inputs:

- A string 'archivo' -> the book, in plain .txt format
- A string 'cadena' -> the string to search for in the book
- An integer 'field' -> the number of words before and after the 'cadena' string to look into
- An integer 'letters' -> the minimum number of letters an adjective must have to be counted

Example. I chose the classical book "The Ingenious Gentleman Don Quixote of La Mancha" by Miguel de Cervantes Saavedra (Spain, 1547-1616) for mostly two reasons: it is the most important masterpiece of Spanish literature, and it is also huge and hard to read! It is composed of two books; I chose the first one. I ran the code with different strings to search and also modified the range of adjectives, and I discovered some interesting things. Searching the string "Dulcinea del Toboso" for adjectives in a range of 2 words before and after the string, with at least 4 letters each (notice that the code allows searching for a string composed of multiple words):

    adjective_crawler('don_quixote_part_1.txt', 'Dulcinea del Toboso', 2, 4)

The code displays:

    #########################################################
    ##            Adjective Crawler for Books             ##
    #########################################################
    STATISTICS
    **********
    The file: 'don_quixote_part_1.txt' is composed by 198575 words
    The target string: 'Dulcinea del Toboso' appears 34 times in the text.
      11 times in the first third of the text (32.35%)
      19 times in the second third of the text (55.88%)
       4 times in the third third of the text (11.76%)
    #########################################################
    In a range of 2 word(s) before and after the targetted string,
    12 adjectives with at least 4 letters have been founded:
      17 times: lady        3 times: distant
       2 times: peerless    2 times: beloved
       1 times: sovereign   1 times: sole
       1 times: often       1 times: neither
       1 times: lovely      1 times: fair
       1 times: even        1 times: beautiful
    #########################################################

According to the above data, we can infer some interesting things. 'Dulcinea' was a 'lady', and very beautiful, because many adjectives related to her are in that semantic field. In addition, Dulcinea (by the way, she is the cherished love of Don Quixote) is 'distant'; in fact, Quixote hardly met her: she lives in a faraway town. If we look at the percentages, we can observe that this character is more relevant in the second third of the book, but at the end of the book Dulcinea hardly appears. And we can get all this data without even reading the whole book! I hope this simple code will be useful for linguistics and for students as well. Thank you, Udacity, for all the things I've learnt. I'll try to stay udacious too.

Enrique Contreras (astenolit) enconva@gmail.com

answered 14 Apr '12, 07:54 ENRIQUE CONT...

very nice! great application of what we learnt in this course! :) (17 Apr '12, 13:59) Gian Carlo M...

Thank you! ;-) (17 Apr '12, 14:50) ENRIQUE CONT...
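The core idea of adjective_crawler, counting known adjectives within a window of words around each occurrence of the target, can be sketched like this (single-word target for simplicity; the tiny adjective set stands in for adjectives.txt):

```python
def adjective_crawler_sketch(words, target, field, adjectives, min_letters):
    # Count known adjectives within `field` words of each occurrence
    # of the target word.
    counts = {}
    for i, w in enumerate(words):
        if w.lower() == target.lower():
            window = words[max(0, i - field):i] + words[i + 1:i + 1 + field]
            for cand in window:
                c = cand.lower().strip('.,;:')
                if c in adjectives and len(c) >= min_letters:
                    counts[c] = counts.get(c, 0) + 1
    return counts

adjectives = {'peerless', 'lovely', 'distant'}   # stand-in for adjectives.txt
text = 'the peerless Dulcinea was lovely and the distant Dulcinea waited'.split()
print(adjective_crawler_sketch(text, 'Dulcinea', 2, adjectives, 4))
# {'peerless': 1, 'lovely': 1, 'distant': 1}
```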
From the questions that are trickling in, with all due respect, it seems some of us are not seeing the bigger picture. I think Udacity is not trying to find that one killer app, in the winner-takes-all mentality that has been deeply rooted in all our systems (even academia). The point is: look, we want to see if this course alone can PRODUCE students who can develop nice, usable products. As long as you create something, you are already a winner! It will make more sense if you apply most of the things we've learnt in class, rather than something from outside (including the language used). IMHO, I think this contest is more about proving that the course was useful than proving that the winner-to-be is truly a guru! The moral lesson here is: when you learn something, go out there and apply it in the real world! So take it easy, friends :-)

answered 31 Mar '12, 08:26 ProfNandaa

Well said! I don't really think I stand a chance of winning the contest as a newbie, but if I can come up with some code that actually does something useful, I will feel like a winner every time I see my code execute something. Six weeks ago most of us didn't even know about Python, and now we have our own search engine and are thinking about what real-world programs to write... That is just mind-blowing! I am so grateful for this course; I feel like it really gave me new super powers! ;) (31 Mar '12, 12:36) Alja Isakovic

@iAlja Yes, I agree with this line of thinking. I'm not interested in "gaming the system" to win a trip to Palo Alto! (though it's a cool town and I'd love to meet some of y'all up there, visit Google, Stanford, etc.) But yeah, I am motivated to do something. Not sure what yet. For me, dreaming up the idea is harder than coding, sometimes. I love this place, though. This class has been amazing. (31 Mar '12, 13:16) Joe Balsamo

## Submission

When we were making the Urank procedure the drawings by David were illustrative, but I wished to be able to manipulate and visualize pages and links in a more systematic way. Since we are dealing with graphs I thought to myself "this is a job for Graphviz". My submission is a simple set of procedures that write a file in the DOT language that can be read by Graphviz to produce some pretty images.

The code is hosted here.

## DOT language

The idea is REALLY simple, a graph is just a set of lines describing nodes and edges. Each node and edge can have some options that modify the way they are displayed. Here is an example:

digraph G {
node1 [label="This is a BIG node", fontsize = 30];
node2 [label="This is a small node", fontsize = 5];
node1 -> node2;
}


This example makes a simple directed graph (digraph) with two nodes and one edge from node1 to node2.

## Code

My code has only two files (plus a README). In the search_engine.py file you can find the code we built in this class. There is nothing new here. The graphviz.py file is my contribution.

All the procedures are straightforward. Most of them just concatenate strings so the graph_dot procedure can return a string with all the information needed to specify the graph in the DOT language. The one procedure of a different kind is lookup_graph which makes a lucky_search of a keyword and returns a graph in which the result from the lookup is the center and all ingoing and outgoing links are displayed.

There are two more procedures (write_dot_file and write_dot_lookup) that actually write the .dot file usable by Graphviz.
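A stripped-down sketch of that string-concatenation approach (the fontsize scaling formula is an assumption; the real procedures also handle the lookup variant and more display options):

```python
def graph_dot(graph, ranks):
    # Build a DOT digraph string, scaling each node's font size
    # by its rank so higher-ranked pages stand out.
    lines = ['digraph G {']
    for page in graph:
        size = 10 + 40 * ranks.get(page, 0)   # hypothetical scaling
        lines.append('  "%s" [fontsize = %.1f];' % (page, size))
    for page, links in graph.items():
        for target in links:
            lines.append('  "%s" -> "%s";' % (page, target))
    lines.append('}')
    return '\n'.join(lines)

graph = {'A': ['B', 'C'], 'B': ['C'], 'C': []}
ranks = {'A': 0.2, 'B': 0.3, 'C': 0.5}
print(graph_dot(graph, ranks).splitlines()[0])  # digraph G {
```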

## Examples

The "web" we used in the class is crawled by default and there are three variables defined with the results: index, ranks and graph. To produce a file named, for example, "web.dot" all you need to do is

write_dot_file("web.dot", graph, ranks)


Now you can use Graphviz (in a shell outside the Python interpreter) to make the actual image

dot -Tsvg web.dot -o web.svg


This will produce the file web.svg.

Notice how each node and edge is scaled according to its rank. If you open the actual svg image in your browser, you will see that each node is clickable and links to the corresponding page.
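As a hypothetical illustration of that scaling (not the actual graphviz.py code): a node's font size can grow linearly with its rank, and the DOT URL attribute is what makes each node clickable in the svg output. The example urls below are made up:

```python
def node_line(url, rank, max_rank):
    # Map rank into a font-size range, and attach the url as a click target.
    fontsize = 8 + 30 * (rank / max_rank if max_rank else 0)
    return '"%s" [fontsize=%.1f, URL="%s"];' % (url, fontsize, url)

ranks = {"http://a.example/": 0.9, "http://b.example/": 0.1}
top = max(ranks.values())
for url in sorted(ranks):
    print(node_line(url, ranks[url], top))
```

Note that the URL attribute only survives into formats that support links, such as svg; a png rendering silently drops it.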

There is also a way to run a lucky_search and produce a .dot file with the incoming and outgoing links of the result. For example

# the order of the parameters is (output-file,index,ranks,keyword,graph)
write_dot_lookup("Kidnap.dot", index, ranks, "Kidnap", graph)


... and using Graphviz to make an svg image gives you (complete image here)

## To Do

There are A LOT of things that can be improved, especially when crawling the "real world". I tried crawling the web with max_depth = 1 from http://www.xkcd.com and the images produced are terrible, partly because some pages have many links and this clutters the images, but also because there are some glitches in the generated dot file that I haven't tracked down. At the end of my README file you can find some ideas that could improve this submission. However, I won't be able to work on them for at least the next 3-4 weeks (I'm getting married in less than 14 days!! so I won't have time to even look at a computer screen until I'm back home). I would like to improve this code in the hope that it can one day produce illustrative images for future Udacians ;)

My big thanks to David, Peter and all of you! This has truly been an extraordinary experience!

- Andrés García Saravia Ortíz de Montellano

Why is this one not accepted?

(10 Apr '12, 00:32)
I have no idea. Perhaps it's just an oversight that will be corrected soon?

(10 Apr '12, 01:34)
I like this one myself - I actually created a similar python script in the past for visualising the hierarchies produced by our internal software (basically Department/Team hierarchy for a company). Since these hierarchies can be huge, I grouped together teams with the same set of parents, and set the area to be equal to n*num_teams (note, area rather than width). A bit of colouring in, and it produced a really handy reference for debugging some of our larger clients (finding "missing" teams, finding loops in the hierarchy structure, etc etc).

Tl;dr: Love it :)

(10 Apr '12, 06:06)
11 Submission

Before I start, I would just like to give a massive Thank you to the whole Udacity team! I've taken both courses (CS101 and CS373) and I feel that I have learnt a lot! It has been a lot of fun, and I think you guys are on your way to change education forever!

--- Video & source code ---

Video explaining how the crawler works and what its applications can be: http://youtu.be/8yM3M5rmi7o

Source code: http://pastebin.com/QNFyu4sa (single file, Creative Commons CC BY-NC-SA license)

--- Background information ---

I'm a Business Administration student in Barcelona, Spain. I've felt really passionate about technology since I was a little kid, and finally I get the opportunity to study Computer Science! I'm really excited! What I wanted to do is combine both of my passions: Business Strategy and Technology / Computer Science. I felt like this was a great opportunity, so I've programmed a crawler that crawls Collective Buying sites (more info in the video and http://en.wikipedia.org/wiki/Group_buying).

--- Why is this interesting? ---

Two reasons:

Since there are so many Collective Buying sites out there, you could build a crawler that crawls all of them and aggregates all of the deals. You could then list them on your website, refer traffic to each Collective Buying site, and have a stake in their revenues for each deal they sell to a user you referred to them.

Imagine you are one of these Collective Buying companies. Wouldn't it be great to receive, in real time, each day, a report highlighting the deals your competitors have, their average price, discount rate, how many people are buying from them, and what their sales forecast for the year is? This would be really cool, so I've decided to build this kind of crawler.

--- How to use ---

If you just want to try it out, it's really simple:

Go to line 366 in the code. You will see 6 variables that start with "switch_". Turn the first four True / False to limit the amount of pages you want the crawler to crawl. Turn the last two True / False to display a global report and/or a sales forecast. Example:

switch_seed_all = False
switch_crawl_shopping = True
switch_crawl_travel = True
switch_crawl_local = True
switch_print_global_report = True
switch_print_forecast = True

That's all there is to it! If you want to fiddle around with the code, you are most welcome ;)

--- Udacity ID ---

michael.gradek@gmail.com

answered 11 Apr '12, 16:08 Michael Gradek

Just watched your video, loved it! It's really neat that just after crawling the website you give earning forecasts for it. (11 Apr '12, 20:18) Shashank

@Shashank Thanks! I appreciate your comment! (12 Apr '12, 02:33) Michael Gradek

Michael, this is great! Congratulations! I'm from Brazil and a business guy also, but I graduated some years ago (FGV-2008). I really liked your entry because you thought about building a business around it. You might want to research http://save.me, a Brazilian startup that was bought about a year ago, only 2 months after launching. They aggregate group buying sites' offers, like you propose. If there aren't any solutions like that in Spain you should really pursue it as a biz! :) I'm sure we'll be seeing cool web apps being built by Udacians who could never have put together a prototype before, always depending on convincing/paying other people to do it for them. (13 Apr '12, 20:19) Gian Carlo M...

@Gian Carlo Thanks!! I made some great friends last year from FGV when I was on my exchange semester :) I think you are 100% right; it's amazing how such an introductory course on building a crawler can open up so many opportunities. I can't wait till the Web App Engineering course starts :) It will also be incredibly interesting to see how people apply computer science to their areas of expertise. This can be really insightful and illustrate how diversity creates innovation; I'm really excited about this! Thanks for your comment, and best of luck with your business projects! (14 Apr '12, 12:00) Michael Gradek