Adium

Ticket #11160 (assigned defect)

Opened 3 months ago

Last modified 2 weeks ago

URLs with String Formatting escapes in them do not get linked.

Reported by: earthmkii Assigned to: earthmkii (accepted)
Priority: high Milestone: Adium 1.3.3
Component: AutoHyperlinks Version: 1.3.2svn
Severity: normal Keywords:
Cc: Patch: None
Pending: 0

Description

Related to #11059.

Links such as http://example.com/hi%uthere do not get linked after r25236 (although the application hang they caused has been cured).

A workaround is needed to properly handle these links.

Change History

09/28/2008 10:36:50 PM changed by earthmkii

  • status changed from new to assigned.

(follow-up: ↓ 3 ) 10/23/2008 11:03:26 AM changed by naixn

This bug also exists in the URI part, though the link is at least partially linked :

http://node%20with%20spaces.example.com/
gets highlited like :
http://node%20with%20spaces. example.com/

(in reply to: ↑ 2 ; follow-up: ↓ 4 ) 10/23/2008 04:59:43 PM changed by earthmkii

Replying to naixn:

This bug also exists in the URI part, though the link is at least partially linked : http://node%20with%20spaces.example.com/
gets highlited like :
http://node%20with%20spaces. example.com/

That would be by design, as spaces are technically illegal characters in domain names.

This bug deals with C-style string formatting escapes - URL escapes work properly in URL paths after domain names.

(in reply to: ↑ 3 ) 10/23/2008 06:06:33 PM changed by naixn

Replying to earthmkii:

That would be by design, as spaces are technically illegal characters in domain names. This bug deals with C-style string formatting escapes - URL escapes work properly in URL paths after domain names.

That's true for spaces, but for example : http://t%65st.example.com/ is correct, as %65 is e, and it becomes http://test.example.com.

But AutoHyperlinks doesn't like it :
http://t%65st.example.com
becomes
http://t%65st. example.com
Instead of
http://t%65st.example.com

10/23/2008 06:10:15 PM changed by boredzo

naixn: Then please file another ticket. This one is for string formatting sequences, not URL escape sequences.

10/23/2008 06:22:14 PM changed by naixn

Ok, sorry for the bad comments then.

Ticked filled : #11294

12/21/2008 01:47:21 PM changed by earthmkii

This looks to be breaking either in -[NSURL initWithString:] or CFURLCreateStringByReplacingPercentEscapesUsingEncoding() (if I try to urldecode the string first, then reencode all escapable sequences).

  • I know of no way to safely pass these functions a string with potential formatting escapes in them.
  • Working around this would require writing a custom parser to safely convert all potential formatting escapes (both C and Obj-C) to be preceded by a urlencoded ampersand. This seems like a lot of work for an edge case.
  • Do we need to file this with Apple?

(follow-up: ↓ 9 ) 12/26/2008 01:00:08 AM changed by thekancer

I'm new to the Adium codebase and somewhat new to Obj-C, but felt like trying to help out with some of these annoying bugs.

I saw this one and was reminded of similar problems in other projects. So I experimented with LinkDriver.app and the AH scanner code and the problem is definitely in the CFURLCreateStringByAddingPercentEscapes method (AHMarkedHyperlink.m)

Technically speaking, this is the correct behaviour. Any %'s that aren't there to represent encoded characters should be encoded itself (i.e. %25 and not just %). Naturally there is no way for CFURLCreateStringByAddingPercentEscapes to know what is encoded and what isn't. I'm pretty sure having regular % characters in a URI is also against some RFC, but that is beside the point... people will always have malformed URIs / URLs.

I experimented a bit and just did a manual encode of any % characters. Literally just a replace. The code I used to experiment with is (AHMarkedHyperlink.m):

- (void)setURLFromString:(NSString *)inString
{
	NSString	*linkString;
	NSString	*doubleEncString;
	
	doubleEncString = [inString stringByReplacingOccurrencesOfString:@"%" withString:@"%25"];

	linkString = (NSString *)CFURLCreateStringByAddingPercentEscapes(kCFAllocatorDefault,
	                                        (CFStringRef)doubleEncString,
	                                        (CFStringRef)@"#%[]",
	                                        NULL,
	                                        kCFStringEncodingUTF8); // kCFStringEncodingISOLatin1 );

	[linkURL release];
	linkURL = [[NSURL alloc] initWithString:linkString];

	[linkString release];

}

Very hackish experiment but it worked. If you build LinkDriver.app it should stand up to fuzzing. http://example.com/hi%uthere is now handled properly as an NSURL and is returned as http://example.com/hi%25uthere

Literally just double-encoding, replacing % with its URLencoded equivalent. I would guess that it should only be doing this on % tokens that don't have a 2-digit hex suffix.

I hope this helps the maintainer of this ticket. I would submit proper code changes but I am not familiar enough with everything yet. Cheers.

(in reply to: ↑ 8 ) 12/26/2008 01:26:42 AM changed by earthmkii

Replying to thekancer:

Technically speaking, this is the correct behaviour. Any %'s that aren't there to represent encoded characters should be encoded itself (i.e. %25 and not just %). Naturally there is no way for CFURLCreateStringByAddingPercentEscapes to know what is encoded and what isn't. I'm pretty sure having regular % characters in a URI is also against some RFC, but that is beside the point... people will always have malformed URIs / URLs.

That's exactly the issue here - both CFURLCreateStringByReplacingPercentEscapes() and -[NSURL initWithString:] work as intended, and return null for a (technically) malformed URL.

I experimented a bit and just did a manual encode of any % characters. Literally just a replace. The code I used to experiment with is (AHMarkedHyperlink.m):

...

Literally just double-encoding, replacing % with its URLencoded equivalent. I would guess that it should only be doing this on % tokens that don't have a 2-digit hex suffix. I hope this helps the maintainer of this ticket. I would submit proper code changes but I am not familiar enough with everything yet. Cheers.

Thats the gist of it - to make an effective workaround, we'd need an efficient way to encode all '%'s not already part of a valid escape sequence. (Your solution, btw, is effectively the same as removing the percent symbol from the 3rd argument of CFURLCreateStringByAddingPercentEscapes())

If you wanna have a go at writing a workaround and submit a patch, be my guest. :)