Make NSXMLParser your friend..

Update:

Since this is my most-read blog article and it’s now two years old, one would think that things have changed, and they have.
Two things: First, if you are free to choose the data format of your API, use JSON or a plist. JSON is especially nice, because it’s very compact and can be parsed with YAJL or TouchJSON.
Second, and most importantly, please use something less painful.. like WonderXML. It really, really helps. However, if you still want to use NSXMLParser, read on..


For a demo of this approach, have a look at http://svn.liip.ch/repos/public/iphone/MyBeer

As promised, here is a little How-I-did-it / How-To.

First off: I am not an experienced SAX user.. so this approach might be tackling the problem from the wrong end, but it is one that DOM users will feel comfortable with ;)

Let’s assume we want to parse the following XML:

transit.xml

<root>
    <schedules>
        <schedule id="0">
            <from>SourceA</from>
            <to>DestinationA</to>
            <links>
                <link id="0">
                    <departure>2008-01-01 01:01</departure>
                    <arrival>2008-01-01 01:02</arrival>
                    <info>With food</info>
                    <parts>
                        <part id="0">
                            <departure>2008-01-01 01:01</departure>
                            <arrival>2008-01-01 01:02</arrival>
                            <vehicle>Walk</vehicle>
                        </part>
                        <part id="1">
                            <departure>2008-01-01 01:01</departure>
                            <arrival>2008-01-01 01:02</arrival>
                            <trackfrom>1</trackfrom>
                            <trackto>2</trackto>
                            <vehicle>Train</vehicle>
                        </part>
                    </parts>
                </link>
                <link id="1">
                    ...
                </link>
                <link id="2">
                    ...
                </link>
            </links>
        </schedule>
        <schedule id="1">
            ...
        </schedule>
        <schedule id="2">
            ...
        </schedule>
    </schedules>
</root>

In human-readable terms: we have multiple schedules, each with from/to etc. Each schedule consists of multiple links (different connections for the same route) with departure/arrival etc. Each link in turn consists of multiple parts/sections, whose child elements are not guaranteed to be present (only some parts have <trackfrom>/<trackto>, for example)..

With a naive “let’s find the element called ‘part’” approach, you won’t get anywhere: a <part> on its own doesn’t tell you which link and which schedule it belongs to..

The Basics

So what do we want to achieve? We want a list/array of schedules, each with the given members. One member is a list/array of links, which in turn have their own members and a list/array of parts with their respective members.

This is also the basic idea behind my approach: for every node that contains other nodes, use a new class/object (an array would also work, but it’s kind of crappy..)

Now we have a Schedule class, a Link class and a Part class.
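The post only shows the Link class. Judging by the properties the parser assigns further down (departure, arrival, vehicle, trackfrom, trackto on a part; from, to on a schedule), the other two classes could look roughly like this minimal sketch. The exact layout is an assumption, and so is the name addLink: for Schedule’s counterpart to Link’s addPart:. Like the rest of the code here, it uses manual retain/release (compile with -fno-objc-arc).

```objc
#import <Foundation/Foundation.h>

@class Link; // Schedule only stores Link objects in an array, so a forward declaration suffices

// One <part>: a leg of a connection. trackfrom/trackto stay nil when the XML has no such element.
@interface Part : NSObject {
    NSString *departure;
    NSString *arrival;
    NSString *vehicle;
    NSString *trackfrom;
    NSString *trackto;
}
@property (nonatomic, retain) NSString *departure;
@property (nonatomic, retain) NSString *arrival;
@property (nonatomic, retain) NSString *vehicle;
@property (nonatomic, retain) NSString *trackfrom;
@property (nonatomic, retain) NSString *trackto;
@end

// One <schedule>: from/to plus its list of <link> nodes.
@interface Schedule : NSObject {
    NSString *from;
    NSString *to;
    NSMutableArray *links;
}
@property (nonatomic, retain) NSString *from;
@property (nonatomic, retain) NSString *to;
@property (nonatomic, readonly) NSMutableArray *links;
- (void)addLink:(Link *)link; // hypothetical name, mirroring Link's addPart:
@end

@implementation Part
@synthesize departure, arrival, vehicle, trackfrom, trackto;

- (void)dealloc {
    // Release what the retain properties are still holding on to
    [departure release];
    [arrival release];
    [vehicle release];
    [trackfrom release];
    [trackto release];
    [super dealloc];
}
@end

@implementation Schedule
@synthesize from, to, links;

- (id)init {
    if ((self = [super init])) {
        links = [[NSMutableArray alloc] init]; // empty list, filled via addLink:
    }
    return self;
}

- (void)addLink:(Link *)link {
    [links addObject:link]; // the array retains the link
}

- (void)dealloc {
    [from release];
    [to release];
    [links release];
    [super dealloc];
}
@end
```

Optional elements such as <trackfrom> need no special handling: if the parser never sees them, the property simply stays nil.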

This is an example of the Link class interface:

Link.h

#import <Foundation/Foundation.h>
#import "Part.h"

@interface Link : NSObject {
    NSString *departure;
    NSString *arrival;
    NSString *info;
    NSMutableArray *parts;
}

@property (nonatomic, retain) NSString *departure;
@property (nonatomic, retain) NSString *arrival;
@property (nonatomic, retain) NSString *info;
@property (nonatomic, readonly) NSMutableArray *parts;

- (void)addPart:(Part *)part;

@end

We add a helper method for adding parts, because it just feels nicer when dealing with arrays: instead of later writing [foo.myArray addObject:..], we write [foo addMe:..].

We also make life easier by using retain properties: the synthesized setters retain the new value and release the old one for us.
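The post never shows the implementation behind addPart: (one of the commenters below asks about it). A minimal sketch of Link.m could look like this, assuming manual retain/release like the rest of the code: init creates the empty array, addPart: appends to it, and dealloc releases everything. To keep it compilable on its own, the interface from Link.h is repeated in the same file, with @class Part instead of #import "Part.h".

```objc
#import <Foundation/Foundation.h>

@class Part; // parts are only stored in the array, so a forward declaration is enough here

@interface Link : NSObject {
    NSString *departure;
    NSString *arrival;
    NSString *info;
    NSMutableArray *parts;
}
@property (nonatomic, retain) NSString *departure;
@property (nonatomic, retain) NSString *arrival;
@property (nonatomic, retain) NSString *info;
@property (nonatomic, readonly) NSMutableArray *parts;
- (void)addPart:(Part *)part;
@end

@implementation Link
@synthesize departure, arrival, info, parts;

- (id)init {
    if ((self = [super init])) {
        // Create the list up front, so addPart: always has somewhere to put things
        parts = [[NSMutableArray alloc] init];
    }
    return self;
}

- (void)addPart:(Part *)part {
    [parts addObject:part]; // the array retains the part
}

- (void)dealloc {
    [departure release];
    [arrival release];
    [info release];
    [parts release];
    [super dealloc];
}
@end
```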

The Parser setup

A short introduction to SAX:

The parser goes through the document node by node and does not keep track of the nesting for you. That means we first get root, then schedules, then schedule, then from, then to, then links, then link, then departure etc. As soon as the parser hands you a <departure> node, for example, you no longer know which schedule you are in, or even whether it belongs to a link or to a part. As long as you have a clearly defined structure in which every element is always present, you could get away with a counter, but as soon as you have repeated nodes with no fixed count, you have a problem.

What we do about it is give the parser some kind of memory: we keep track of which container nodes we are currently inside.
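To make the problem concrete, here is a sketch of a more generic kind of memory, which is not the approach this post uses: a stack of the currently open element names, pushed in didStartElement: and popped in didEndElement:. PathTracker is a hypothetical name, and the code assumes manual retain/release. It shows that the two <from> elements in the test input can only be told apart by state the delegate keeps itself.

```objc
#import <Foundation/Foundation.h>

// Generic illustration (not this post's approach): remember every open element,
// so each callback knows its full path, e.g. "root/schedule/from".
@interface PathTracker : NSObject <NSXMLParserDelegate> {
    NSMutableArray *stack; // names of the currently open elements, outermost first
    NSMutableArray *paths; // the path of every element start, in document order
}
@property (nonatomic, readonly) NSMutableArray *paths;
@end

@implementation PathTracker
@synthesize paths;

- (id)init {
    if ((self = [super init])) {
        stack = [[NSMutableArray alloc] init];
        paths = [[NSMutableArray alloc] init];
    }
    return self;
}

- (void)parser:(NSXMLParser *)parser didStartElement:(NSString *)elementName namespaceURI:(NSString *)namespaceURI qualifiedName:(NSString *)qName attributes:(NSDictionary *)attributeDict {
    [stack addObject:elementName];                           // entering a node: push
    [paths addObject:[stack componentsJoinedByString:@"/"]]; // record where we are
}

- (void)parser:(NSXMLParser *)parser didEndElement:(NSString *)elementName namespaceURI:(NSString *)namespaceURI qualifiedName:(NSString *)qName {
    [stack removeLastObject]; // leaving a node: pop
}

- (void)dealloc {
    [stack release];
    [paths release];
    [super dealloc];
}
@end
```

A full stack is more general but pushes the “what does this path mean” logic into string comparisons; the current* members used below keep exactly the state this particular document needs.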

In our parser, we have 5 members (4 for the parser’s memory, plus the list of results) and 1 method (to make actual use of the parser):

@property (nonatomic, retain) NSMutableString *currentProperty;
@property (nonatomic, retain) Schedule *currentSchedule;
@property (nonatomic, retain) Link *currentLink;
@property (nonatomic, retain) Part *currentPart;
@property (nonatomic, readonly) NSMutableArray *schedules;

- (void)parseJourneyData:(NSData *)data parseError:(NSError **)err;

(Yes, currentProperty needs to be an NSMutableString: the text of a single node can arrive in several chunks, which we append to it.)

Your parseJourneyData method should look similar to the following:

parseJourneyData

- (void)parseJourneyData:(NSData *)data parseError:(NSError **)err {
    NSXMLParser *parser = [[NSXMLParser alloc] initWithData:data];

    // Create our schedule list. schedules is readonly, so we set the ivar directly
    // (plain alloc/init, no extra retain) and release any list from a previous run
    [schedules release];
    schedules = [[NSMutableArray alloc] init];

    [parser setDelegate:self]; // The parser calls the delegate methods in this class
    [parser setShouldProcessNamespaces:NO]; // We don't care about namespaces
    [parser setShouldReportNamespacePrefixes:NO]; // ...nor about namespace prefixes
    [parser setShouldResolveExternalEntities:NO]; // We just want data, no other stuff

    [parser parse]; // Parse that data..

    if (err && [parser parserError]) {
        // Retain + autorelease, so the error stays valid after we release the parser
        *err = [[[parser parserError] retain] autorelease];
    }

    [parser release];
}

Now we need the three delegate methods that do the actual work:

- (void)parser:(NSXMLParser *)parser didStartElement:(NSString *)elementName namespaceURI:(NSString *)namespaceURI qualifiedName:(NSString *)qName attributes:(NSDictionary *)attributeDict

- (void)parser:(NSXMLParser *)parser didEndElement:(NSString *)elementName namespaceURI:(NSString *)namespaceURI qualifiedName:(NSString *)qName

- (void)parser:(NSXMLParser *)parser foundCharacters:(NSString *)string

- (void)parser:(NSXMLParser *)parser foundCharacters:(NSString *)string

The parser calls this method when it reads something between tags (text, that is). For example, with <from>blah</from> it would read “blah”. The method may be called several times for a single node, so we append each chunk. As you will see later, we only set currentProperty when we enter a node we care about, so checking it tells us whether the text belongs to a property we want to keep (this also skips the whitespace between tags). This looks something like this:

Parser

- (void)parser:(NSXMLParser *)parser foundCharacters:(NSString *)string
{
    if (self.currentProperty) {
        [currentProperty appendString:string];
    }
}

- (void)parser:(NSXMLParser *)parser didStartElement:(NSString *)elementName namespaceURI:(NSString *)namespaceURI qualifiedName:(NSString *)qName attributes:(NSDictionary *)attributeDict

This is called when the parser finds an opening element. Here we have a few cases to distinguish:

It’s either a standard property of the schedule (like <from>) or a deeper nested node (like <links>); the same goes for all the other container nodes.

How? We only set a member while we are inside that node. That means currentPart is set only after we have entered a <part>; otherwise it’s nil. The same goes for the others.

We then need to check them in reverse order of their nesting level. Why? Because inside a <part>, currentLink is set as well. If we checked currentLink before currentPart, it would also evaluate to YES/true, and elements with the same name (like <departure>) would end up in the wrong object. If we aren’t inside any node, then a new top-level node is probably coming, which is handled in the final else..

When we hit a nested node, we need to allocate the respective member of our class, so we can use it when the parser gets deeper into it.

This will look like this:

Parser

- (void)parser:(NSXMLParser *)parser didStartElement:(NSString *)elementName namespaceURI:(NSString *)namespaceURI qualifiedName:(NSString *)qName attributes:(NSDictionary *)attributeDict {
    if (qName) {
        elementName = qName;
    }

    if (self.currentPart) { // Are we in a <part>?
        // Check for standard nodes
        if ([elementName isEqualToString:@"departure"] || [elementName isEqualToString:@"arrival"] || [elementName isEqualToString:@"vehicle"] || [elementName isEqualToString:@"trackfrom"] || [elementName isEqualToString:@"trackto"]) {
            self.currentProperty = [NSMutableString string];
        }
    } else if (self.currentLink) { // Are we in a <link>?
        // Check for standard nodes
        if ([elementName isEqualToString:@"departure"] || [elementName isEqualToString:@"arrival"] || [elementName isEqualToString:@"info"]) {
            self.currentProperty = [NSMutableString string];
        // Check for deeper nested node
        } else if ([elementName isEqualToString:@"part"]) {
            // Create the element; autorelease it, because the retain property keeps it alive
            self.currentPart = [[[Part alloc] init] autorelease];
        }
    } else if (self.currentSchedule) { // Are we in a <schedule>?
        // Check for standard nodes
        if ([elementName isEqualToString:@"from"] || [elementName isEqualToString:@"to"]) {
            self.currentProperty = [NSMutableString string];
        // Check for deeper nested node
        } else if ([elementName isEqualToString:@"link"]) {
            self.currentLink = [[[Link alloc] init] autorelease]; // Create the element
        }
    } else { // We are outside of everything, so we need a new <schedule>
        // Check for deeper nested node
        if ([elementName isEqualToString:@"schedule"]) {
            self.currentSchedule = [[[Schedule alloc] init] autorelease];
        }
    }
}

- (void)parser:(NSXMLParser *)parser didEndElement:(NSString *)elementName namespaceURI:(NSString *)namespaceURI qualifiedName:(NSString *)qName

Basically, the same things apply as for didStartElement above. This time, we need to clean things up and assign the values we collected :) This is a bit of a pity, since it’s a lot of code (for not so much).

It’s the same checking structure..

If we are in a deeper nested node (like <link>) and we hit the closing element of that node (like </link>), then we need to add this element to its parent (like <schedule>) and set it to nil.

See for yourself:

Parser

- (void)parser:(NSXMLParser *)parser didEndElement:(NSString *)elementName namespaceURI:(NSString *)namespaceURI qualifiedName:(NSString *)qName {
    if (qName) {
        elementName = qName;
    }

    if (self.currentPart) { // Are we in a <part>?
        // Check for standard nodes
        if ([elementName isEqualToString:@"departure"]) {
            self.currentPart.departure = self.currentProperty;
        } else if ([elementName isEqualToString:@"arrival"]) {
            self.currentPart.arrival = self.currentProperty;
        } else if ([elementName isEqualToString:@"vehicle"]) {
            self.currentPart.vehicle = self.currentProperty;
        } else if ([elementName isEqualToString:@"trackfrom"]) {
            self.currentPart.trackfrom = self.currentProperty;
        } else if ([elementName isEqualToString:@"trackto"]) {
            self.currentPart.trackto = self.currentProperty;
        // Are we at the end?
        } else if ([elementName isEqualToString:@"part"]) {
            [self.currentLink addPart:self.currentPart]; // Add to parent
            self.currentPart = nil; // Set nil
        }
    } else if (self.currentLink) { // Are we in a <link>?
        // Check for standard nodes
        if ([elementName isEqualToString:@"departure"]) {
            self.currentLink.departure = self.currentProperty;
        } else if ([elementName isEqualToString:@"arrival"]) {
            self.currentLink.arrival = self.currentProperty;
        } else if ([elementName isEqualToString:@"info"]) {
            self.currentLink.info = self.currentProperty;
        // Are we at the end?
        } else if ([elementName isEqualToString:@"link"]) {
            // Add to parent (Schedule's add method is assumed to be called addLink:)
            [self.currentSchedule addLink:self.currentLink];
            self.currentLink = nil; // Set nil
        }
    } else if (self.currentSchedule) { // Are we in a <schedule>?
        // Check for standard nodes
        if ([elementName isEqualToString:@"from"]) {
            self.currentSchedule.from = self.currentProperty;
        } else if ([elementName isEqualToString:@"to"]) {
            self.currentSchedule.to = self.currentProperty;
        // Are we at the end?
        } else if ([elementName isEqualToString:@"schedule"]) { // Corrected thanks to Muhammad Ishaq
            [schedules addObject:self.currentSchedule]; // Add to the result list
            self.currentSchedule = nil; // Set nil
        }
    }

    // We reset the currentProperty, for the next text nodes..
    self.currentProperty = nil;
}

Finally..

Well, that’s it. You can expand or shrink this principle as you like. You can also add a maxElements counter, like in the SeismicXML example of the iPhone SDK, to get only a certain number of elements, and stop the parser with [parser abortParsing]. It is important that you don’t abort while inside a deeper nested node, because this could leave half-built objects behind. Wait until the current container has been closed instead..
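Here is a minimal sketch of such a counter, assuming you count finished <schedule> nodes. The check happens in didEndElement:, right after a schedule has been closed, so the parser is never aborted halfway through a nested node. LimitedScheduleCounter and maxElements are hypothetical names; in the real parser, the counting would sit next to [schedules addObject:...]. Per Apple’s documentation, an aborted parse makes -parse return NO and is reported to parser:parseErrorOccurred:, so you may want to treat that error as expected.

```objc
#import <Foundation/Foundation.h>

// Hypothetical delegate: stops after maxElements complete <schedule> nodes.
@interface LimitedScheduleCounter : NSObject <NSXMLParserDelegate> {
    NSUInteger maxElements;     // limit, in the spirit of SeismicXML's counter
    NSUInteger parsedSchedules; // completed </schedule> elements so far
}
@property (nonatomic, readonly) NSUInteger parsedSchedules;
- (id)initWithMaxElements:(NSUInteger)limit;
@end

@implementation LimitedScheduleCounter
@synthesize parsedSchedules;

- (id)initWithMaxElements:(NSUInteger)limit {
    if ((self = [super init])) {
        maxElements = limit;
    }
    return self;
}

- (void)parser:(NSXMLParser *)parser didEndElement:(NSString *)elementName namespaceURI:(NSString *)namespaceURI qualifiedName:(NSString *)qName {
    if ([elementName isEqualToString:@"schedule"]) {
        // In the real parser, the finished Schedule is added to the result list here
        parsedSchedules++;
        if (parsedSchedules >= maxElements) {
            [parser abortParsing]; // safe point: no schedule is half-built
        }
    }
}
@end
```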

Please note that I wrote this while watching TV, so you may need to fix some syntax errors ;) But I hope you get the idea..

41 Comments to Make NSXMLParser your friend..

  1. August 4, 2008 at 2:56 pm
    BW

    I think I’m missing something, at what point is the external XML files called?

  2. August 4, 2008 at 3:41 pm

    Right here:
    - (void)parseJourneyData:(NSData *)data parseError:(NSError **)err {
    NSXMLParser *parser = [[NSXMLParser alloc] initWithData:data];

    You can also load it with “initWithContentsOfURL:”, from a string with [NSData initWithBytes:length:] (and provide the xml string) or from a file with [NSData initWithContentsOfFile:]

  3. August 4, 2008 at 3:43 pm

    And this file is included in your controller/wherever class, where you downloaded and need your data. You then call this parser with
    [myParser parseJourneyData:myData parseError:&err];

  4. August 11, 2008 at 9:35 am

    As far as I can see you are not releasing properly; as in e.g.

    self.currentPart = [[Part alloc] init]; // Create the element

    the Part instance should be released just after the assignment because the currentPart property has a retain attribute.

  5. August 11, 2008 at 4:50 pm

    David: You are absolutely right. I actually copied the property part from some real code and wrote the other in the blog editor. Will correct it, as soon as I find some time. Thanks!

  6. August 25, 2008 at 1:25 pm
    matt

    Thanks for this excellent tutorial – this is exactly what I’ve been searching for.

  7. September 29, 2008 at 9:45 pm

    Thanks for this write up. I am just starting out with Cocoa and event-driven XML but will be sure to give this a second read tomorrow.

    Any particular reason you used the dot syntax?

  8. October 2, 2008 at 2:05 pm
    Agustin

    @Marc:

    Do you have the code please??

    is giving me an error

  9. October 7, 2008 at 6:50 am

    Thanks so much, I got the link from a friend, I will have a try~

  10. February 25, 2009 at 7:30 am

    awesome! but shouldn’t the last else if check for a “schedule” at the end instead of a “link” i.e.

    else if (self.currentSchedule) { // Are we in a ?
    // Check for standard nodes
    if ([elementName isEqualToString:@"from"]) {
    self.currentSchedule.from = self.currentProperty;
    } else if ([elementName isEqualToString:@"to"]) {
    self.currentSchedule.to = self.currentProperty;
    // Are we at the end?
    } else if ([elementName isEqualToString:@"link"]) { // <<<< shouldn’t this line check for a @”schedule” instead of a @”link”
    [schedules addObject:self.currentSchedule]; // Add to the result node
    self.currentSchedule = nil; // Set nil
    }
    }

  11. February 27, 2009 at 2:43 pm

    Muhammad Ishaq: Oh, yes, silly me, that is the typical copy/paste error :)

  12. April 17, 2009 at 9:57 pm
    greg

    Do you have this code in full? I am having trouble understanding the snippets you have been so kind to show us and how they fit into the whole project.

    Thanks for the great tutorial!

  13. April 19, 2009 at 2:02 pm
    marc

    greg: I’m currently doing my annual army service, will come back to your post in 2 wks

  14. April 23, 2009 at 9:27 pm
    Nate

    I would love to see the complete code too. I’m getting errors when I’m trying to use my object classes…such a noob.

  15. April 30, 2009 at 2:38 am
    Parisman

    Looks like there is a new XML/HTML parser on the block. May be worth a look…

  16. May 6, 2009 at 4:49 pm
    Matt

    I would like to know when and how to release the objects properly. Here there are apparently some memory leaks.

  17. May 8, 2009 at 9:18 pm
    Thierry

    it crashes for me when release the parser on iPhone.
    and if I don’t release it after parse, instruments find a leak…

  18. May 9, 2009 at 4:10 pm
    marc

    As I mentioned in the blog post and earlier in the comments, I wrote this while sitting in front of the tv – in my blog editor. So there might be a few syntax errors and of course memory leaks.
    I try to setup a very small example of how it’s supposed to be connected.

  19. June 18, 2009 at 1:29 pm
    Daniel

    I think this is what I need to solve my problem but the example is too complicated to understand :-(

  20. July 13, 2009 at 10:41 am
    Danial

    This is exactly what I have been searching for! I agree it is a tad complicated to follow but I will persist. However what I do not see at present is how you would drill into the various classes? So if you decided to populate a table with schedules that you then display, click on one and drill into that etc etc. :-(

  21. July 28, 2009 at 3:06 pm
    MK

    in the didEndElement method, the assignment,
    [schedules addObject:self.currentSchedule];

    Does anybody get this while printing the contents of the schedules array?

  22. July 28, 2009 at 3:10 pm
    MK

    by this i meant Schedule: 0x1164d60 , was in ‘html’ tags.. :(

  23. January 25, 2010 at 12:06 pm
    pradeep joshua

    Hi Marc;
    I saw your demo project video on transport application ans saw some data(means:Make NSXMLParser your friend) load in this application. you had parse some xml in this project.
    my problem is “how to stored in this xml data on the xml file on a single array.”. i have one xml file. format to be same use for u. how to parse in this xml then how to stored the data.

    nombre this tag car name

    mydoubt: does not set the value(car details) on aparticular car name. suppose ALFA ROMEO 159 is car name.this car models are two details blow xml. To display first car name:ALFA ROMEO 159 details:detail-1 and second car name:ALFA ROMEO 159 details:detail-2.

    i am try use in your concept. but not set value. how many class create .how to stored value.how to set value from array;

    i am trying 2weeks marc.help me marc.to send my mail id

    ALFA ROMEO 147

    5 Puertas Compacto
    105 CV
    150 CV
    18.820
    23.620

    /pub/fotos/vehiculoNuevo/180/9/371/2009/24_5/01.jpg

    ALFA ROMEO 159

    4 Puertas Berlina
    120 CV
    209 CV
    27.970
    36.330
    /pub/fotos/vehiculoNuevo/180/9/429/2009/4_4/01.jpg

    5 Puertas Familiar grande
    120 CV
    209 CV
    29.470
    37.830
    /pub/fotos/vehiculoNuevo/180/9/429/2009/2_5/01.jpg

    ALFA ROMEO BRERA

    3 Puertas Coupe
    185 CV
    260 CV
    36.200
    45.510
    /pub/fotos/vehiculoNuevo/180/9/448/2009/6_3/01.jpg

    ALFA ROMEO GT

    2 Puertas Coupe
    140 CV
    165 CV
    29.040
    35.020
    /pub/fotos/vehiculoNuevo/180/9/325/2009/6_2/01.jpg

    ALFA ROMEO MITO

    3 Puertas Compacto
    79 CV
    170 CV
    14.150
    21.000

    /pub/fotos/vehiculoNuevo/180/9/594/2009/24_3/01.jpg

    ALFA ROMEO SPIDER

    2 Puertas Cabrio
    185 CV
    260 CV
    38.700
    48.010
    /pub/fotos/vehiculoNuevo/180/9/500/2009/1_2/01.jpg

  24. May 3, 2010 at 1:29 am
    Idrissa

    Hello Marc,
    this is a useful tutorial, my concern is where do you put the Data in the case where you are programming for the iPhone? cheers

  25. June 23, 2010 at 1:55 pm
    FLHippy

    This article has not made XMLParser my friend. Everything seems to compile fine without warnings but the delegate methods are never called. Sure wish I could get this worked out :)

  26. June 26, 2010 at 6:25 am
    Dilip

    Hello,
    I need simple example of xml parsing.
    code & xml file (with 1 tag).
    pls help.

  27. June 29, 2010 at 10:36 am
    Frank

    Nice tutorial!
    Thx a lot.

    PS: just a note.. it is nice to have a link to the source code of the project in case you get some error or something =)

  28. July 17, 2010 at 6:44 pm

    Hi guys. It’s been a while since I checked on this. Please make sure you read the Update on top of the page for more information. NSXMLParser is kind of … yeah, not the way to go I’d say :)

  29. July 21, 2010 at 9:46 pm
    Georgia

    Hi!
    Thanks for the tutorial – it’s been super helpful.

    I’m currently having an issue where the application crashes after the parser runs successfully. I think it must be something with memory management or pointers or maybe just how the parser is exiting?

    Any suggestions would be greatly appreciated!

  30. November 26, 2010 at 4:25 am

    @Frank. hai, u got the source code rite?! just post it..

  31. February 4, 2011 at 1:41 pm

    Thanks! Will try and use this in my project, is writing my first application for mac osx now, great tutorial!

  32. March 2, 2011 at 10:56 pm
    Susan

    If you would start *ONLY* doing cut/paste from actual/working code… all the typos would disappear.

  33. March 11, 2011 at 12:04 am
    Limunada

    Nice tutorial, thank you!

  34. March 11, 2011 at 11:34 am

    You, Sir, have been of invaluable service to me.

  35. March 13, 2011 at 12:27 pm

    Susan: You can always use the provided example above if you just want to copy/paste. I’m not a big fan of copy/paste coding however and sincerely hope that you got the point that this entry explains a concept and is not a ready-made library for your app.

  36. March 17, 2011 at 1:21 am
    Paul

    Hi Marc

    Im new at this. Is it worth using an alternate method as mentioned in your update. Can you recommend the best option for xml parsing?

    Thanks for sharing

    P

  37. June 28, 2011 at 10:11 pm
    luke

    This is the worst xml tutorial I have came across yet, its inconsistent and generally hard to follow.

  38. July 7, 2011 at 3:20 pm

    I see a call to
            - (void)addPart:(Part *)part;
    but nothing about implementation.
    I can’t imagine it’s supposed to be empty.

    The whole “tutorial” here (which is really just bad copy-paste) is missing quite a lot, and the link to “MyBeer” (top of the page) blocks direct downloads (via right-click). If I want it, I have to open each and every page and copy from there. Not cool.

  39. July 7, 2011 at 3:40 pm

    And it gets better. MyBeer is 100% worthless. You shouldn’t link to that not-working garbage.

  40. July 8, 2011 at 7:47 pm

    I’d really give SVN a shot on this one..

  41. July 8, 2011 at 7:49 pm

    just link me up with a better one that serves you everything on a silver platter. (or just use the mentioned libraries at the top, SAX is not for everyone)
