Update:
Since this is my most read blog article and it’s now 2 years old, one would think that things have changed, and they have.
Two things: First, if you are free to choose the data representation of your API, use JSON or plist. JSON is especially nice, because it is very compact and can be parsed with YAJL or TouchJSON.
Second, and foremost, please use something less painful, like WonderXML. It really, really helps. However, if you are still up for using NSXMLParser, read on.
For a demo on this, you may use http://svn.liip.ch/repos/public/iphone/MyBeer
As promised, here is a little How-I-did-it / How-To.
First off: I am not an experienced SAX user. So this approach might be tackling the problem from the wrong end, but it is the way DOM users feel comfortable with ;)
Let’s assume we want to parse the following XML:
transit.xml
<root>
  <schedules>
    <schedule id="0">
      <from>SourceA</from>
      <to>DestinationA</to>
      <links>
        <link id="0">
          <departure>2008-01-01 01:01</departure>
          <arrival>2008-01-01 01:02</arrival>
          <info>With food</info>
          <parts>
            <part id="0">
              <departure>2008-01-01 01:01</departure>
              <arrival>2008-01-01 01:02</arrival>
              <vehicle>Walk</vehicle>
            </part>
            <part id="1">
              <departure>2008-01-01 01:01</departure>
              <arrival>2008-01-01 01:02</arrival>
              <trackfrom>1</trackfrom>
              <trackto>2</trackto>
              <vehicle>Train</vehicle>
            </part>
          </parts>
        </link>
        <link id="1">
          ...
        </link>
        <link id="2">
          ...
        </link>
      </links>
    </schedule>
    <schedule id="1">
      ...
    </schedule>
    <schedule id="2">
      ...
    </schedule>
  </schedules>
</root>
In human readable format, this means: We have multiple schedules with from/to etc. These schedules consist of multiple links (different connections for the same route) with departure/arrival etc. These links in turn consist of multiple parts/sections with various elements, some of which are not guaranteed to be present.
With the "let's find the element called 'part'" approach, you won't get anywhere, because the same element names (departure, arrival) appear at several nesting levels.
The Basics
So what do we want to achieve? We want a list/array of Schedules, which have the given members. One member is a list/array of Links, which also consist of the given members and a list/array of Parts with the respective members.
This is also the basic idea behind my approach: for every new node container, use a new class/object (an array will also work, but it's kinda crap, because you lose the named members).
Now we have a Schedule class, a Link class and a Part class.
This is an example of the Link class interface:
Link.h
#import "Part.h"

@interface Link : NSObject {
    NSString *departure;
    NSString *arrival;
    NSString *info;
    NSMutableArray *parts;
}
@property (nonatomic, retain) NSString *departure;
@property (nonatomic, retain) NSString *arrival;
@property (nonatomic, retain) NSString *info;
@property (readonly, retain) NSMutableArray *parts;
- (void)addPart:(Part *)part;
@end
We use an accessor method for the parts, because it just feels better when dealing with arrays. (Instead of later using [foo.parts addObject:..] we have [foo addPart:..])
Also we make it easier for us, using retain properties..
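The interface above only declares addPart:, so here is a minimal sketch of what the implementation side could look like. It is not the original Link.m: it assumes ARC (the article itself uses manual retain/release), uses a cut-down stand-in Part class with a single field, and switches the string properties to copy. DemoLinkPartCount is a hypothetical helper that only exists to show the call.

```objc
#import <Foundation/Foundation.h>

// Cut-down stand-in for the Part class (assumption: the real one has more fields).
@interface Part : NSObject
@property (nonatomic, copy) NSString *vehicle;
@end

@implementation Part
@end

@interface Link : NSObject
@property (nonatomic, copy) NSString *departure;
@property (nonatomic, copy) NSString *arrival;
@property (nonatomic, copy) NSString *info;
@property (nonatomic, strong, readonly) NSMutableArray *parts;
- (void)addPart:(Part *)part;
@end

@implementation Link

- (instancetype)init {
    self = [super init];
    if (self) {
        // Create the container once, so addPart: can always append to it.
        _parts = [[NSMutableArray alloc] init];
    }
    return self;
}

- (void)addPart:(Part *)part {
    // NSMutableArray throws on nil, so guard against it.
    if (part) {
        [_parts addObject:part];
    }
}

@end

// Hypothetical usage helper: builds a link with one part and returns the part count.
static NSUInteger DemoLinkPartCount(void) {
    Link *link = [[Link alloc] init];
    Part *walk = [[Part alloc] init];
    walk.vehicle = @"Walk";
    [link addPart:walk];
    return link.parts.count;
}
```

The copy attribute is a deliberate choice in this sketch: the parser hands over NSMutableString values, and copying them means a later append cannot change a value that was already stored.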
The Parser setup
A short introduction into SAX:
The parsing goes node by node and is not nesting-sensitive. That means that first we get root, then schedules, then schedule, then from, then to, then links, then link, then departure etc. As soon as the parser hands you a node (say, <departure>), you no longer know which schedule you were in. As long as you have a clearly defined structure where every element is always present, you could keep track using a counter, but as soon as you have multiple nodes with no defined count, you have a problem.
What we do is sometimes called recursive parsing. What does this mean? We implement some kind of memory: the parser remembers which container (schedule, link, part) it is currently inside.
In our parser, we have 5 members and 1 method (to make actual use of the parser):
@property (nonatomic, retain) NSMutableString *currentProperty;
@property (nonatomic, retain) Schedule *currentSchedule;
@property (nonatomic, retain) Link *currentLink;
@property (nonatomic, retain) Part *currentPart;
@property (nonatomic, readonly) NSMutableArray *schedules;
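- (void)parseJourneyData:(NSData *)data parseError:(NSError **)error;
(Yes, currentProperty needs to be an NSMutableString, because the parser may deliver the text of one node in several pieces, which we append to it.)
Your parseJourneyData method should look similar to the following: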
parseJourneyData
- (void)parseJourneyData:(NSData *)data parseError:(NSError **)err {
    NSXMLParser *parser = [[NSXMLParser alloc] initWithData:data];
    // schedules is a readonly property, so we assign the instance variable directly
    [schedules release];
    schedules = [[NSMutableArray alloc] init];     // Create our schedule list
    [parser setDelegate:self];                     // The parser calls methods in this class
    [parser setShouldProcessNamespaces:NO];        // We don't care about namespaces
    [parser setShouldReportNamespacePrefixes:NO];  // ... nor about their prefixes
    [parser setShouldResolveExternalEntities:NO];  // We just want data, no other stuff
    [parser parse];                                // Parse that data..
    if (err && [parser parserError]) {
        *err = [parser parserError];
    }
    [parser release];
}
Now we need those delegate methods.
- (void)parser:(NSXMLParser *)parser didStartElement:(NSString *)elementName namespaceURI:(NSString *)namespaceURI qualifiedName:(NSString *)qName attributes:(NSDictionary *)attributeDict
- (void)parser:(NSXMLParser *)parser didEndElement:(NSString *)elementName namespaceURI:(NSString *)namespaceURI qualifiedName:(NSString *)qName
- (void)parser:(NSXMLParser *)parser foundCharacters:(NSString *)string
- (void)parser:(NSXMLParser *)parser foundCharacters:(NSString *)string
This method is called by the parser when it reads something between tags (text, that is). For example, with <from>blah</from> it would read "blah". It is possible that this method is called multiple times for one node, so we append instead of assigning. As you will see later, we set the property "currentProperty" only if we find a node we care about. That's why we check this property first: if it is nil, we don't need the text. This will then look something like this:
Parser
- (void)parser:(NSXMLParser *)parser foundCharacters:(NSString *)string
{
    if (self.currentProperty) {
        [currentProperty appendString:string];
    }
}
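To see why appending matters, here is a small self-contained sketch that collects the text of a single <info> node and counts how often foundCharacters is called for it. TextCollector, chunkCount and CollectInfoText are hypothetical names made up for this illustration, and the code assumes ARC. How many chunks you get depends on the parser and the input (entities such as &amp; are a typical trigger), so the sketch does not rely on a particular count.

```objc
#import <Foundation/Foundation.h>

// Hypothetical delegate that only cares about <info> nodes.
@interface TextCollector : NSObject <NSXMLParserDelegate>
@property (nonatomic, strong) NSMutableString *currentProperty;
@property (nonatomic, copy) NSString *info;
@property (nonatomic, assign) NSUInteger chunkCount;
@end

@implementation TextCollector

- (void)parser:(NSXMLParser *)parser didStartElement:(NSString *)elementName
  namespaceURI:(NSString *)namespaceURI qualifiedName:(NSString *)qName
    attributes:(NSDictionary *)attributeDict {
    if ([elementName isEqualToString:@"info"]) {
        // Start with an empty buffer for the node we care about.
        self.currentProperty = [NSMutableString string];
    }
}

- (void)parser:(NSXMLParser *)parser foundCharacters:(NSString *)string {
    if (self.currentProperty) {
        self.chunkCount += 1;                        // May be more than one per node
        [self.currentProperty appendString:string];  // Append, never assign
    }
}

- (void)parser:(NSXMLParser *)parser didEndElement:(NSString *)elementName
  namespaceURI:(NSString *)namespaceURI qualifiedName:(NSString *)qName {
    if ([elementName isEqualToString:@"info"]) {
        self.info = self.currentProperty;  // The complete text, however many chunks
    }
    self.currentProperty = nil;
}

@end

// Hypothetical helper: parses the given XML string and returns the delegate.
static TextCollector *CollectInfoText(NSString *xml) {
    TextCollector *collector = [[TextCollector alloc] init];
    NSData *data = [xml dataUsingEncoding:NSUTF8StringEncoding];
    NSXMLParser *parser = [[NSXMLParser alloc] initWithData:data];
    parser.delegate = collector;
    [parser parse];
    return collector;
}
```

Calling CollectInfoText(@"<link><info>With food &amp; drinks</info></link>") and logging chunkCount shows how the text arrived; assigning the string instead of appending would keep only the last chunk.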
- (void)parser:(NSXMLParser *)parser didStartElement:(NSString *)elementName namespaceURI:(NSString *)namespaceURI qualifiedName:(NSString *)qName attributes:(NSDictionary *)attributeDict
This is called when the parser finds an opening element. Here we have a few cases we need to distinguish:
It's a standard property of the schedule (like <from> etc.) or it's a deeper nested node (like <links>). The same goes for all the other nodes.
How to? We define that we only set a member if we are in that node. That means currentPart is only set once we have entered a <part>, otherwise it's nil. The same goes for the others.
We then need to check them in reverse order of their nesting level. Why? Because if we checked for currentLink before currentPart, currentLink would also evaluate to YES/true while we are inside a part, and hence we would have a problem if there are elements with the same name (like <departure>). If we aren't in any node, then there is probably a new main node coming -> that's the final else.
When we hit a nested node, we need to allocate the respective member of our class, so we can use it when the parser gets deeper into it.
This will look like this:
Parser
- (void)parser:(NSXMLParser *)parser didStartElement:(NSString *)elementName namespaceURI:(NSString *)namespaceURI qualifiedName:(NSString *)qName attributes:(NSDictionary *)attributeDict {
    if (qName) {
        elementName = qName;
    }
    if (self.currentPart) { // Are we in a <part>?
        // Check for standard nodes
        if ([elementName isEqualToString:@"departure"] || [elementName isEqualToString:@"arrival"] || [elementName isEqualToString:@"vehicle"] || [elementName isEqualToString:@"trackfrom"] || [elementName isEqualToString:@"trackto"] ) {
            self.currentProperty = [NSMutableString string];
        }
    } else if (self.currentLink) { // Are we in a <link>?
        // Check for standard nodes
        if ([elementName isEqualToString:@"departure"] || [elementName isEqualToString:@"arrival"] || [elementName isEqualToString:@"info"]) {
            self.currentProperty = [NSMutableString string];
        // Check for deeper nested node
        } else if ([elementName isEqualToString:@"part"]) {
            // Create the element; autoreleased, because the retain property keeps it alive
            self.currentPart = [[[Part alloc] init] autorelease];
        }
    } else if (self.currentSchedule) { // Are we in a <schedule>?
        // Check for standard nodes
        if ([elementName isEqualToString:@"from"] || [elementName isEqualToString:@"to"]) {
            self.currentProperty = [NSMutableString string];
        // Check for deeper nested node
        } else if ([elementName isEqualToString:@"link"]) {
            self.currentLink = [[[Link alloc] init] autorelease]; // Create the element
        }
    } else { // We are outside of everything, so we need a <schedule>
        // Check for deeper nested node
        if ([elementName isEqualToString:@"schedule"]) {
            self.currentSchedule = [[[Schedule alloc] init] autorelease];
        }
    }
}
- (void)parser:(NSXMLParser *)parser didEndElement:(NSString *)elementName namespaceURI:(NSString *)namespaceURI qualifiedName:(NSString *)qName
Basically, the same things apply as for didStartElement above. This time, we need to clean things up and assign the values if they are set :) This is a bit of a pity, since it's a lot of code for not so much.
It's the same checker structure.
If we are in a deeper nested node (like <link>) and we hit the ending element of that nested node (like </link>), then we need to add this element to the parent (like <schedule>) and set it to nil.
See for yourself:
Parser
- (void)parser:(NSXMLParser *)parser didEndElement:(NSString *)elementName namespaceURI:(NSString *)namespaceURI qualifiedName:(NSString *)qName {
    if (qName) {
        elementName = qName;
    }
    if (self.currentPart) { // Are we in a <part>?
        // Check for standard nodes
        if ([elementName isEqualToString:@"departure"]) {
            self.currentPart.departure = self.currentProperty;
        } else if ([elementName isEqualToString:@"arrival"]) {
            self.currentPart.arrival = self.currentProperty;
        } else if ([elementName isEqualToString:@"vehicle"]) {
            self.currentPart.vehicle = self.currentProperty;
        } else if ([elementName isEqualToString:@"trackfrom"]) {
            self.currentPart.trackfrom = self.currentProperty;
        } else if ([elementName isEqualToString:@"trackto"]) {
            self.currentPart.trackto = self.currentProperty;
        // Are we at the end?
        } else if ([elementName isEqualToString:@"part"]) {
            [currentLink addPart:self.currentPart]; // Add to parent
            self.currentPart = nil; // Set nil
        }
    } else if (self.currentLink) { // Are we in a <link>?
        // Check for standard nodes
        if ([elementName isEqualToString:@"departure"]) {
            self.currentLink.departure = self.currentProperty;
        } else if ([elementName isEqualToString:@"arrival"]) {
            self.currentLink.arrival = self.currentProperty;
        } else if ([elementName isEqualToString:@"info"]) {
            self.currentLink.info = self.currentProperty;
        // Are we at the end?
        } else if ([elementName isEqualToString:@"link"]) {
            // Add to parent (assumes Schedule declares addLink:, analogous to addPart: on Link)
            [currentSchedule addLink:self.currentLink];
            self.currentLink = nil; // Set nil
        }
    } else if (self.currentSchedule) { // Are we in a <schedule>?
        // Check for standard nodes
        if ([elementName isEqualToString:@"from"]) {
            self.currentSchedule.from = self.currentProperty;
        } else if ([elementName isEqualToString:@"to"]) {
            self.currentSchedule.to = self.currentProperty;
        // Are we at the end?
        } else if ([elementName isEqualToString:@"schedule"]) { // Corrected thanks to Muhammad Ishaq
            [schedules addObject:self.currentSchedule]; // Add to the result node
            self.currentSchedule = nil; // Set nil
        }
    }
    // We reset the currentProperty, for the next text nodes..
    self.currentProperty = nil;
}
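Since the snippets above are spread over several sections, here is a compact sketch that puts the same idea into one piece you can run. It is deliberately reduced to two levels (schedule and link, no parts) and is not the code from my real project: MiniSchedule, MiniLink, MiniParser and ParseMiniTransit are names made up for this example, and it assumes ARC, so there is no release/autorelease.

```objc
#import <Foundation/Foundation.h>

@interface MiniLink : NSObject
@property (nonatomic, copy) NSString *departure;
@end
@implementation MiniLink
@end

@interface MiniSchedule : NSObject
@property (nonatomic, copy) NSString *from;
@property (nonatomic, strong, readonly) NSMutableArray *links;
@end
@implementation MiniSchedule
- (instancetype)init {
    if ((self = [super init])) { _links = [[NSMutableArray alloc] init]; }
    return self;
}
@end

@interface MiniParser : NSObject <NSXMLParserDelegate>
@property (nonatomic, strong) NSMutableString *currentProperty;
@property (nonatomic, strong) MiniSchedule *currentSchedule;
@property (nonatomic, strong) MiniLink *currentLink;
@property (nonatomic, strong, readonly) NSMutableArray *schedules;
- (BOOL)parseData:(NSData *)data;
@end

@implementation MiniParser

- (BOOL)parseData:(NSData *)data {
    _schedules = [[NSMutableArray alloc] init];
    NSXMLParser *parser = [[NSXMLParser alloc] initWithData:data];
    parser.delegate = self;
    return [parser parse];
}

- (void)parser:(NSXMLParser *)parser didStartElement:(NSString *)elementName
  namespaceURI:(NSString *)namespaceURI qualifiedName:(NSString *)qName
    attributes:(NSDictionary *)attributeDict {
    if (self.currentLink) {                 // Deepest level first
        if ([elementName isEqualToString:@"departure"]) {
            self.currentProperty = [NSMutableString string];
        }
    } else if (self.currentSchedule) {
        if ([elementName isEqualToString:@"from"]) {
            self.currentProperty = [NSMutableString string];
        } else if ([elementName isEqualToString:@"link"]) {
            self.currentLink = [[MiniLink alloc] init];
        }
    } else if ([elementName isEqualToString:@"schedule"]) {
        self.currentSchedule = [[MiniSchedule alloc] init];
    }
}

- (void)parser:(NSXMLParser *)parser foundCharacters:(NSString *)string {
    if (self.currentProperty) {
        [self.currentProperty appendString:string];
    }
}

- (void)parser:(NSXMLParser *)parser didEndElement:(NSString *)elementName
  namespaceURI:(NSString *)namespaceURI qualifiedName:(NSString *)qName {
    if (self.currentLink) {
        if ([elementName isEqualToString:@"departure"]) {
            self.currentLink.departure = self.currentProperty;
        } else if ([elementName isEqualToString:@"link"]) {
            [self.currentSchedule.links addObject:self.currentLink]; // Add to parent
            self.currentLink = nil;
        }
    } else if (self.currentSchedule) {
        if ([elementName isEqualToString:@"from"]) {
            self.currentSchedule.from = self.currentProperty;
        } else if ([elementName isEqualToString:@"schedule"]) {
            [_schedules addObject:self.currentSchedule];             // Add to result
            self.currentSchedule = nil;
        }
    }
    self.currentProperty = nil; // Reset for the next text node
}

@end

// Hypothetical helper: parses an XML string and returns the filled parser.
static MiniParser *ParseMiniTransit(NSString *xml) {
    MiniParser *miniParser = [[MiniParser alloc] init];
    [miniParser parseData:[xml dataUsingEncoding:NSUTF8StringEncoding]];
    return miniParser;
}
```

Feed it a shortened version of the XML from the top of the article and then walk miniParser.schedules: each entry is a MiniSchedule whose links array holds the MiniLink objects in document order. Adding the part level means one more property and one more branch at the top of each if chain.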
Finally..
Well, that's it. You can expand or shrink this principle as you like. You can also add a maxElements counter, like in the SeismicXML example of the iPhone SDK, to get only a certain number of elements. You can abort the parser with [parser abortParsing]; It is important that you don't abort while in a deeper nested node, because this could lead to inconsistencies (a half-filled object that never gets added to its parent). Let the nested nodes finish, or skip them, and abort only once the enclosing element has ended.
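As a rough sketch of such a limit: the delegate below counts finished <schedule> elements and calls abortParsing only in didEndElement, right after a schedule has been closed, so nothing is left half-built. LimitedCounter, maxSchedules and CountSchedulesUpTo are hypothetical names for this illustration (this is not the SeismicXML code), and ARC is assumed.

```objc
#import <Foundation/Foundation.h>

// Hypothetical delegate that stops after a given number of complete schedules.
@interface LimitedCounter : NSObject <NSXMLParserDelegate>
@property (nonatomic, assign) NSUInteger maxSchedules;
@property (nonatomic, assign) NSUInteger finishedSchedules;
@end

@implementation LimitedCounter

- (void)parser:(NSXMLParser *)parser didEndElement:(NSString *)elementName
  namespaceURI:(NSString *)namespaceURI qualifiedName:(NSString *)qName {
    if (self.finishedSchedules >= self.maxSchedules) {
        return; // Defensive: ignore anything reported after we asked to stop
    }
    if ([elementName isEqualToString:@"schedule"]) {
        // In the full parser, the schedule would be added to the result right here.
        self.finishedSchedules += 1;
        if (self.finishedSchedules >= self.maxSchedules) {
            [parser abortParsing]; // Safe point: no nested node is open
        }
    }
}

@end

// Hypothetical helper: returns how many schedules were completed before stopping.
static NSUInteger CountSchedulesUpTo(NSUInteger max, NSString *xml) {
    LimitedCounter *counter = [[LimitedCounter alloc] init];
    counter.maxSchedules = max;
    NSXMLParser *parser = [[NSXMLParser alloc] initWithData:[xml dataUsingEncoding:NSUTF8StringEncoding]];
    parser.delegate = counter;
    [parser parse]; // The return value reflects the abort, so it is ignored here
    return counter.finishedSchedules;
}
```

Keep in mind that an aborted parse also sets parserError, so code that treats every parser error as a failure needs to make an exception for the abort you triggered yourself.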
Please note, that I wrote this, while watching TV, so you may need to fix some syntax errors ;) But I hope you get the idea..
I think I'm missing something: at what point is the external XML file loaded?
Right here:
- (void)parseJourneyData:(NSData *)data parseError:(NSError **)err {
NSXMLParser *parser = [[NSXMLParser alloc] initWithData:data];
You can also load it with "initWithContentsOfURL:", or create the data from a string with [[NSData alloc] initWithBytes:length:] (and provide the XML string), or from a file with [[NSData alloc] initWithContentsOfFile:]
And this file is included in your controller/wherever class, where you downloaded and need your data. You then call this parser with
[myParser parseJourneyData:myData parseError:&err];
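For illustration, here is a small sketch of the data sources mentioned in this reply. The helper names (DataFromXMLString, DataFromFile, IsWellFormedXML) are made up for this example, the file path is whatever you pass in, and ARC is assumed. The last helper runs NSXMLParser without a delegate, which is a quick way to check that the data parses at all before you hand it to your own parser class.

```objc
#import <Foundation/Foundation.h>

// XML that you already have as a string (e.g. built in code or read from a text field).
static NSData *DataFromXMLString(NSString *xml) {
    return [xml dataUsingEncoding:NSUTF8StringEncoding];
}

// XML stored in a file; returns nil if the file cannot be read.
static NSData *DataFromFile(NSString *path) {
    return [NSData dataWithContentsOfFile:path];
}

// Quick sanity check: does the data parse as XML at all?
static BOOL IsWellFormedXML(NSData *data) {
    if (data == nil) {
        return NO;
    }
    NSXMLParser *parser = [[NSXMLParser alloc] initWithData:data];
    return [parser parse]; // No delegate set: we only care about success or failure
}
```

Whichever source you use, the result is an NSData object that you can pass to parseJourneyData:parseError: as shown above.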
As far as I can see you are not releasing properly; as in e.g.
self.currentPart = [[Part alloc] init]; // Create the element
the Part instance should be released just after the assignment because the currentPart property has a retain attribute.
David: You are absolutely right. I actually copied the property part from some real code and wrote the other in the blog editor. Will correct it, as soon as I find some time. Thanks!
Thanks for this excellent tutorial – this is exactly what I’ve been searching for.
Thanks for this write-up. I am just starting out with Cocoa and event-driven XML, but I will be sure to give this a second read tomorrow.
Any particular reason you used the dot syntax?
@Marc:
Do you have the code please??
is giving me an error
Thanks so much, I got the link from a friend, I will have a try~
awesome! but shouldn’t the last else if check for a “schedule” at the end instead of a “link” i.e.
else if (self.currentSchedule) { // Are we in a ?
// Check for standard nodes
if ([elementName isEqualToString:@"from"]) {
self.currentSchedule.from = self.currentProperty;
} else if ([elementName isEqualToString:@"to"]) {
self.currentSchedule.to = self.currentProperty;
// Are we at the end?
} else if ([elementName isEqualToString:@"link"]) { // <<<< shouldn’t this line check for a @”schedule” instead of a @”link”
[schedules addObject:self.currentSchedule]; // Add to the result node
self.currentSchedule = nil; // Set nil
}
}
Muhammad Ishaq: Oh, yes, silly me, that is the typical copy/paste error :)
Do you have this code in full? I am having trouble understanding the snippets you have been so kind to show us and how they fit into the whole project.
Thanks for the great tutorial!
greg: I’m currently doing my annual army service, will come back to your post in 2 wks
I would love to see the complete code too. I’m getting errors when I’m trying to use my object classes…such a noob.
Looks like there is a new XML/HTML parser on the block. May be worth a look…
I would like to know when and how to release the objects properly. Here there are apparently some memory leaks.
it crashes for me when release the parser on iPhone.
and if I don’t release it after parse, instruments find a leak…
As I mentioned in the blog post and earlier in the comments, I wrote this while sitting in front of the tv – in my blog editor. So there might be a few syntax errors and of course memory leaks.
I'll try to set up a very small example of how it's supposed to be connected.
I think this is what I need to solve my problem but the example is too complicated to understand :-(
This is exactly what I have been searching for! I agree it is a tad complicated to follow but I will persist. However what I do not see at present is how you would drill into the various classes? So if you decided to populate a table with schedules that you then display, click on one and drill into that etc etc. :-(
in the didEndElement method, the assignment,
[schedules addObject:self.currentSchedule];
Does anybody get this while printing the contents of the schedules array?
By this I meant <Schedule: 0x1164d60>; it was in 'html' tags.. :(
Hi Marc;
I saw your demo project video of the transport application and saw some data (meaning: Make NSXMLParser your friend) loaded in this application. You parsed some XML in this project.
My problem is how to store the data of an XML file in a single array. I have one XML file, in the same format as the one you use. How do I parse this XML, and then how do I store the data?
The nombre tag holds the car name.
My doubt: the values (car details) are not set for a particular car name. Suppose ALFA ROMEO 159 is the car name. This car has two sets of model details in the XML below. It should display first car name: ALFA ROMEO 159, details: detail-1, and second car name: ALFA ROMEO 159, details: detail-2.
I am trying to use your concept, but the values are not set. How many classes do I need to create? How do I store the values? How do I set the values from the array?
I have been trying for 2 weeks, Marc. Help me, Marc. Please send it to my mail ID.
ALFA ROMEO 147
5 Puertas Compacto
105 CV
150 CV
18.820
23.620
/pub/fotos/vehiculoNuevo/180/9/371/2009/24_5/01.jpg

ALFA ROMEO 159
4 Puertas Berlina
120 CV
209 CV
27.970
36.330
/pub/fotos/vehiculoNuevo/180/9/429/2009/4_4/01.jpg
5 Puertas Familiar grande
120 CV
209 CV
29.470
37.830
/pub/fotos/vehiculoNuevo/180/9/429/2009/2_5/01.jpg

ALFA ROMEO BRERA
3 Puertas Coupe
185 CV
260 CV
36.200
45.510
/pub/fotos/vehiculoNuevo/180/9/448/2009/6_3/01.jpg

ALFA ROMEO GT
2 Puertas Coupe
140 CV
165 CV
29.040
35.020
/pub/fotos/vehiculoNuevo/180/9/325/2009/6_2/01.jpg

ALFA ROMEO MITO
3 Puertas Compacto
79 CV
170 CV
14.150
21.000
/pub/fotos/vehiculoNuevo/180/9/594/2009/24_3/01.jpg

ALFA ROMEO SPIDER
2 Puertas Cabrio
185 CV
260 CV
38.700
48.010
/pub/fotos/vehiculoNuevo/180/9/500/2009/1_2/01.jpg
Hello Marc,
this is a useful tutorial, my concern is where do you put the Data in the case where you are programming for the iPhone? cheers
This article has not made XMLParser my friend. Everything seems to compile fine without warnings but the delegate methods are never called. Sure wish I could get this worked out :)
Hello,
I need simple example of xml parsing.
code & xml file (with 1 tag).
pls help.
Nice tutorial!
Thx a lot.
PS: just a note.. it is nice to have a link to the source code of the project in case you get some error or something =)
Hi guys. It’s been a while since I checked on this. Please make sure you read the Update on top of the page for more information. NSXMLParser is kind of … yeah, not the way to go I’d say :)
Hi!
Thanks for the tutorial – it’s been super helpful.
I’m currently having an issue where the application crashes after the parser runs successfully. I think it must be something with memory management or pointers or maybe just how the parser is exiting?
Any suggestions would be greatly appreciated!
@Frank. hai, u got the source code rite?! just post it..
Thanks! Will try and use this in my project, is writing my first application for mac osx now, great tutorial!
If you would start *ONLY* doing cut/paste from actual/working code… all the typos would disappear.
Nice tutorial, thank you!
You, Sir, have been of invaluable service to me.
Susan: You can always use the provided example above if you just want to copy/paste. I’m not a big fan of copy/paste coding however and sincerely hope that you got the point that this entry explains a concept and is not a ready-made library for your app.
Hi Marc
Im new at this. Is it worth using an alternate method as mentioned in your update. Can you recommend the best option for xml parsing?
Thanks for sharing
P
This is the worst XML tutorial I have come across yet; it's inconsistent and generally hard to follow.
I see a call to
- (void)addPart:(Part *)part;
but nothing about implementation.
I can’t imagine it’s supposed to be empty.
The whole “tutorial” here (which is really just bad copy-paste) is missing quite a lot, and the link to “MyBeer” (top of the page) blocks direct downloads (via right-click). If I want it, I have to open each and every page and copy from there. Not cool.
And it gets better. MyBeer is 100% worthless. You shouldn’t link to that not-working garbage.
I’d really give SVN a shot on this one..
just link me up with a better one that serves you everything on a silver platter. (or just use the mentioned libraries at the top, SAX is not for everyone)