Unit 1 question: selecting subsequences in strings

When selecting a sequence of chars in a string you use string[start:stop].
Since <start> is the position of the first character in the sequence, why isn't <stop> the position of the last character?

Why was <start> designed as inclusive and <stop> as exclusive?

asked 16 Mar '12, 20:20


alan wheeler-2


3 Answers:

One reason has to do with lists and strings being indexed from 0 instead of 1 (and most languages out there do this--not just Python, blame C for this). With an exclusive stop, str[0:len(str)] gives you the entire string, with no len(str) - 1 needed.

Another reason is for processing strings. For example, suppose you want to cut a string at indexes A, B, and C. The pieces are str[0:A], str[A:B], str[B:C], and str[C:], because each stop is also the next start. I don't have to add 1 everywhere.
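
A minimal Python sketch of that splitting argument. The string and the cut points A, B, C are made-up values for illustration, not anything from the lecture:

```python
# Hypothetical example values; any 0 <= A <= B <= C <= len(s) would work.
s = "selecting subsequences"
A, B, C = 4, 9, 15

# Each stop index is reused as the next start index, with no +1 adjustments.
pieces = [s[0:A], s[A:B], s[B:C], s[C:]]

# The pieces are adjacent and non-overlapping, so they rebuild the original.
assert "".join(pieces) == s

# With 0-based indexing, the full-range slice is the whole string.
assert s[0:len(s)] == s
```

With an inclusive stop, each piece would need a start of "previous stop + 1", which is the off-by-one bookkeeping this design avoids.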

Finally, it also allows you to take substrings that are empty strings, e.g. str[A:A], although I can't think of a good reason off-hand.

Sometimes string functions take a start index and a length instead (I prefer this, but start and end + 1 seems more common).
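
As a rough illustration of the last two points, here is a small Python sketch. The helper name substr is hypothetical (Python has no such built-in); it only shows that the start-and-length style sits directly on top of start:stop:

```python
s = "subsequence"
start, stop = 3, 7

# For in-bounds indexes, the length of a slice is simply stop - start.
assert len(s[start:stop]) == stop - start

# The empty slice is just the zero-length case, not a special one.
assert s[start:start] == ""


def substr(text, start, length):
    """Hypothetical start-and-length helper built on start:stop slicing."""
    return text[start:start + length]


# Both styles select the same characters.
assert substr(s, 3, 4) == s[3:7]
```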


answered 16 Mar '12, 21:37


Charles Lin

APL used 0-based indexing before C did. :-)

(17 Mar '12, 11:46) Kenneth I. L...

Fundamentally, the answer is "because that's the choice that the language author made." I agree with you that it's an odd choice, but then I'm used to a language that does it the other way. (Panorama PanTalk. It also uses a semicolon instead of the colon.)

There's no hope of reversing the decision, but we are free to think in terms of a different name for the second parameter. You could call it "upto" or "limit" or "until", for instance.

EDIT: On later thought, "startscan" and "stopscan" seem like good mnemonic names for the two parameters. If one were really bugged by the built-in syntax, one could write a wrapper function that adds 1 to its second parameter and then calls the built-in version. A bit slower to execute, of course, but Python isn't the right language for high-performance computation.
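
For what it's worth, such a wrapper might look like the sketch below. The name inclusive_slice is made up for illustration, and it assumes non-negative indexes:

```python
def inclusive_slice(seq, first, last):
    """Hypothetical wrapper: return seq[first] through seq[last], both included.

    Assumes first and last are non-negative positions in seq.
    """
    # The built-in stop is exclusive, so step one past the last wanted item.
    return seq[first:last + 1]


assert inclusive_slice("String indexing", 2, 5) == "ring"
```

One caveat with this sketch: a negative last does not behave as you might hope. With last = -1, last + 1 becomes 0 and the result is empty, so the wrapper would need extra handling for negative indexes.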


answered 16 Mar '12, 20:34


Kenneth I. L...

edited 17 Mar '12, 11:41

I went back to see if he said anything about this in the lecture in case I missed it, but he didn't say why it's this way.
It's just strange.

It would be nice to know why it is done this way.

(16 Mar '12, 20:51) alan wheeler-2

My first guess is that Guido van Rossum was used to doing it that way in the previous language he used (ABC, nee SETL?), or chose to do it the way some larger community (Java?) was then doing it.

My second guess is that it arose in the misty past, when some subroutine author wrote a tight code loop to test various syntaxes and found that his code was slightly more elegant with this parameter definition. (Other test cases would have given other results.) The choice was later locked into Python as a built-in function.

The least likely hypothesis is that someone ran tests with a large number of programmers writing a lot of code, and discovered that this choice improved productivity or programmer satisfaction. That's the kind of testing Bell Labs did to arrive at an optimal arrangement of digits on early telephones. There is a human factors research community building models of human attention and action that can simulate repetitive behaviors and predict error rates, but so far their models apply better to cockpit design than to programming language development.

But I'm talking through my hat, with no knowledge of the real history. Anyone know the answer?

(16 Mar '12, 21:26) Kenneth I. L...

One way to think about this is to think about the indexes being between the letters. Like this:

 p = "String indexing"

is

 | S | t | r | i | n | g |   | i | n | d | e | x | i | n | g |
 0   1   2   3   4   5   6   7   8   9   10  11  12  13  14  15

This gives a reasonable model for the examples above - things like

p[2:6] => ring
p[7:12]+p[12:15] == p[7:15]

...and so on. If you want, you can see every index as a running count: how many characters have there been in this string before this slot? This also explains how you can start from 0 and still slice up to the length of the string.
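
The between-the-letters model can be checked directly in Python. This is just a sketch using the same string p as above:

```python
p = "String indexing"

# Indexes mark the gaps between characters, so p[2:6] spans gap 2 to gap 6.
assert p[2:6] == "ring"

# Adjacent slices that share a boundary join without overlap or gap.
assert p[7:12] + p[12:15] == p[7:15] == "indexing"

# Each index counts how many characters come before that gap.
for i in range(len(p) + 1):
    assert len(p[:i]) == i

# The last gap is numbered len(p), which is why p[0:len(p)] is the whole string.
assert p[0:len(p)] == p
```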


answered 17 Mar '12, 14:16


JohanG-Sweden

edited 17 Mar '12, 14:16

Asked: 16 Mar '12, 20:20

Seen: 267 times

Last updated: 17 Mar '12, 14:16