Parsing Obsession

by 22. April 2010 07:40

This past weekend I had an interview in Austin, TX with a pretty badass company.
The architects hit me with a question that made me cock my head to the side like
a cocker-spaniel that just heard a violin. The question wasn't brain surgery, but
it was complex enough that it required a lot of thought, and I got a little obsessed.
I had to code it today.

Here was the question (not verbatim, but close):

Write a method that returns a list of strings from a line from a comma delimited
file. Here's the data:

Bill,Brown,"austin, tx", 123, """Jr."""

Output should be:
Bill
Brown
austin, tx
123
"Jr."

I started to pseudocode it using some regex, and they didn't seem to think that would
work. I went a couple more routes. In the end we discussed and the one architect
said he had solved this by rolling through each character in the line. That seemed
like a perfectly reasonable way, but I was really hung up on a couple of things.

1.) I think that sounds like a lot of CPU cycles for this operation
2.) I wanted to do this without rolling through each character, because that's how I
    roll.
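
For comparison, here is a rough sketch of what the character-by-character approach could look like. This is my guess at the idea, not the architect's actual code: the class name CharWalkParser and the rule of skipping spaces that pad the start of a field are assumptions made so the output matches the sample above.

```csharp
using System;
using System.Collections.Generic;
using System.Text;

// Hypothetical helper: walks the line once, tracking whether the
// current character sits inside a quoted field.
public static class CharWalkParser
{
    public static List<string> ParseLine(string line)
    {
        List<string> fields = new List<string>();
        StringBuilder current = new StringBuilder();
        bool inQuotes = false;

        for (int i = 0; i < line.Length; i++)
        {
            char c = line[i];
            if (inQuotes)
            {
                if (c != '"')
                {
                    current.Append(c); // commas in here are plain data
                }
                else if (i + 1 < line.Length && line[i + 1] == '"')
                {
                    current.Append('"'); // doubled quote = one literal quote
                    i++;
                }
                else
                {
                    inQuotes = false; // closing quote
                }
            }
            else if (c == '"')
            {
                inQuotes = true; // opening quote
            }
            else if (c == ',')
            {
                fields.Add(current.ToString()); // field boundary
                current.Length = 0;
            }
            else if (c == ' ' && current.Length == 0)
            {
                // Assumption: ignore padding between a comma and the field.
            }
            else
            {
                current.Append(c);
            }
        }

        fields.Add(current.ToString()); // last field has no trailing comma
        return fields;
    }
}
```

Fed the interview line, this returns the five values listed above. It touches every character exactly once, which is the cost I was hung up on.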


So today my OCD got the best of me... and here's what fell out:

using System;
using System.Collections.Generic;
using System.Diagnostics;
using System.Text;
using System.Text.RegularExpressions;

namespace TestLib
{
    public class ParserThing
    {
        const string QUO = @"""";
        const string DELIMITER = ",";

        // Matches runs of two or more quotes (escaped quotes). Built once, not on every call.
        static readonly Regex regx = new Regex(@"""{2,}");

        public ParserThing()
        {
            //string dirtyLine = @"Bob,Brown,""""Jr."""",""dayton,oh,N"",123-45-6789";
            string dirtyLine = @"Bill,Black,""""""Sr."""",,\/some/thing\/,"""",,""123 Street Ave.,West. SomeCity,OH 45454"",""000000-0000""";
            int executions = 100;
            List<string> words = new List<string>();

            // Stopwatch is more precise than subtracting two DateTime.Now values.
            Stopwatch timer = Stopwatch.StartNew();
            for (int i = 1; i <= executions; i++)
            {
                words = CleanIt(dirtyLine);
            }
            timer.Stop();

            Debug.WriteLine("Execution Time: " + timer.ElapsedMilliseconds);
            foreach (string word in words)
            {
                Debug.WriteLine("Word: " + word);
            }
        }

        public static List<string> CleanIt(string rawLine)
        {
            List<string> retVal = new List<string>();
            StringBuilder finalWord = new StringBuilder();
            bool isInQuoteBlock = false;

            // Split on every comma first, then stitch quoted fields back together.
            string[] words = rawLine.Split(new string[] { DELIMITER }, StringSplitOptions.None);
            for (int i = 0, l = words.Length - 1; i <= l; i++)
            {
                string word = words[i];
                if (word.Contains(QUO))
                {
                    // Runs of quotes collapse to a single quote; lone quotes are
                    // just field wrappers and get dropped.
                    string tmpWord = regx.IsMatch(word)
                        ? regx.Replace(word, QUO)
                        : word.Replace(QUO, "");
                    finalWord.Append(tmpWord);

                    if (word.StartsWith(QUO) && !word.EndsWith(QUO))
                    {
                        // this is a partial word
                        finalWord.Append(DELIMITER);
                        isInQuoteBlock = true;
                    }
                    else if (word.EndsWith(QUO) || i == l || !isInQuoteBlock)
                    {
                        // this is the end of a block (or a stray quote in a plain word)
                        isInQuoteBlock = false;
                        retVal.Add(finalWord.ToString());
                        finalWord.Length = 0; // clear sb
                    }
                    else
                    {
                        // quote in the middle of a block: put back the comma the split ate
                        finalWord.Append(DELIMITER);
                    }
                }
                else if (isInQuoteBlock)
                {
                    finalWord.Append(word + DELIMITER);
                }
                else
                {
                    retVal.Add(word);
                }
            }

            // An unterminated quote would otherwise lose whatever was collected.
            if (finalWord.Length > 0)
            {
                retVal.Add(finalWord.ToString());
            }
            return retVal;
        }
    }
}

Important to note: I wrote this in a pretty short amount of time. There are probably some shortcuts I could take in here, and I don't like the if/else if/else use in this. I'm not going to spend any more time on it, but damnit it works, it works well, and it's reasonably fast. On my box, I can parse 100k iterations in about 1200ms.

... I didn't get the job, btw, but that's ok, it wasn't the right time. I got to see Austin, and I got to interview at a really awesome place. I'm hating blogengine, btw, I need to change this blog to something else.


.NET | Development | How To

.NET Triplet and Pair classes… and a lesson in lousy namespacing.

by 20. February 2010 20:16

 

This week I was working on a project and came across a class I’d never used in C# called a Triplet. You can look up the Triplet on MSDN.

The Triplet is just a class with three public fields: First, Second, and Third. Each of them holds an instance of an object. That’s it. That’s all this thing does. The very first thing I noticed once I saw what it did was the namespace it’s in. You’d think it would be in System.Collections, or System.Collections.Specialized or something, but no… it’s in System.Web.UI along with things like the System.Web.UI.Control class. WTFBBQ?

I would love to hear a rational explanation for this. I am pretty anal about namespacing and I don’t like having to stretch the tasteful bounds of a namespace to put something in. I will on occasion, but this is a bit much.

I downloaded Reflector to see if I could find any other gems stored in that namespace. Lo and behold… I find Pair. Care to guess what Pair is? Exactly.. it’s the same as Triplet… but with…. TWO fields.

Upon obsessing over this I started to think about why these classes even exist at all.  What is wrong with object[] objArray = new object[2]; ? Is that so complicated that there needs to be a type, and a horribly misplaced type at that?
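
To put the two options side by side, here is a small sketch. NamedPair is a hypothetical stand-in I made up with the same shape as Pair, so the snippet doesn't need a reference to System.Web; the sample values are invented too.

```csharp
using System;

// Hypothetical stand-in shaped like a pair of loosely typed values.
// It is NOT the System.Web.UI class itself.
public class NamedPair
{
    public object First;
    public object Second;
}

public static class PairVersusArray
{
    public static string Describe()
    {
        // The array version: the index is the only hint about what lives where.
        object[] objArray = new object[2];
        objArray[0] = "Austin";
        objArray[1] = 78701;

        // The named version: same data, same lack of type safety.
        NamedPair pair = new NamedPair { First = "Austin", Second = 78701 };

        // Either way everything comes back as object and has to be cast.
        string fromArray = (string)objArray[0] + " " + (int)objArray[1];
        string fromPair = (string)pair.First + " " + (int)pair.Second;

        return fromArray == fromPair ? fromArray : "mismatch";
    }
}
```

Both versions hold the same two objects and both need casts on the way out. The only thing the class adds is a name for each slot.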

In light of my findings I have created the following class I want added to the .NET Framework …. The Duodecuple.

using System;

// Stay consistent with completely nonsensical namespacing
namespace System.Workflow.Runtime.DebugEngine
{
    // that's "12" for those of you who don't feel like going to wikipedia
    public class Duodecuple
    {
        public object First { get; set; }
        public object Second { get; set; }
        public object Third { get; set; }
        public object Fourth { get; set; }
        public object Fifth { get; set; }
        public object Sixth { get; set; }
        public object Seventh { get; set; }
        public object Eighth { get; set; }
        public object Ninth { get; set; }
        public object Tenth { get; set; }
        public object Eleventh { get; set; }
        public object Twelfth { get; set; }

        /// <summary>
        /// Constructor. Since we're talking C# 3.0+ (.NET 3.5) here you can just
        /// instantiate with ... new Duodecuple { First = whatever, Second = anotherObject };
        /// No sense in having all those constructors
        /// </summary>
        public Duodecuple() { }
    }
}

you're welcome.

-Mike

 


.NET | Complaining | Development

ASP.NET UserControls that are AWESOME: Using Events, properties, and methods.

by 28. September 2009 07:21

UserControls are underused
I am always shocked to see how little UserControls are used by some ASP.NET developers. To me, the UserControl is where the bulk of the real power in ASP.NET is. It took me a long time to get used to them. Coming from a Classic ASP background, the loss of Server Side Includes was not easy to deal with. Thankfully, UserControls gave me back a commanding portion of the functionality I needed.

A Better Model / View / Controller
It’s really difficult to implement an MVC pattern in ASP.NET that I can look at and be proud of. Most of the implementations I have seen are very restrictive frameworks that require a lot of adherence to naming conventions, use of ISAPI filters, XML documents that have to be manually maintained as db or class structure changes, problems with Medium / Low Trust environments, and big performance losses.

ASP.NET, I think, had MVC in mind when they developed the paradigm in which they expect pages to operate, but they sure put a boatload of tools in to derail that. Your .aspx page is your view. Unfortunately, they tightly coupled a controller (aspx.cs, code-behind) to every view, to the point that they are almost synonymous. Then the model, ugh... you SHOULD be writing a model and a controller to change the views, but WONDERFUL tools like the SqlDataSource and XmlDataSource promote bypassing the model or data layer and putting business logic in the code-behind. I’m not going to go on about this...

What you see in my example is certainly not an MVC solution, but it will get you closer to a good and flexible pattern that you can rely on.

UserControls vs. ServerControls

UserControls are not ServerControls and vice-versa. A UserControl is a .ascx(.cs) file and for all intents and purposes functions very similarly to a .aspx(.cs) page. It has a similar life-cycle and similar properties. The difference is that a ServerControl is typically a small, contained piece of functionality ... think of a TextBox, or a Button control. ServerControls are usually an element made for a narrow purpose. A UserControl, on the other hand, is typically a collection of controls for a specific business purpose. In short, if you extended a TextBox to use specifically for a first name, you could build in usage and length rules, etc. Great example of a ServerControl. Comparatively, if you wanted a reusable login form for your website, that is something you would want as a UserControl. UserControls are typically specific to the project or solution, and are very flexible. This is what we will focus on.

Explaining The Code
This would be a good time to download the .ZIP file and get into the code:

The web-app is a basic phonebook. There is no database; all of the entries are stored in a hashtable in Session.

Default.aspx - This is the main view. The UserControls are registered at the top of the .aspx file. In the CodeBehind you will see one method, and that is the EventHandler for one of the UserControls.

/controls - Folder for UserControls

/controls/ViewDataControl.ascx - This control is used for just SHOWING the data, really just contains a gridview. The CodeBehind contains several methods, the public “Refresh” method is the important one because that is exposed in order for the data in the control to be reloaded.

/controls/AddControl.ascx - This is the big one. This UserControl is the form where a person will add a new name to the phonebook. There are a few very important key points to note:
  • This control contains two properties. They allow abstracted access to the TextBox’s .Text properties. This allows the .aspx page to read values from the form directly whenever it needs to.
  • This control contains a public DELEGATE and a public EVENT. This event is fired when the “add” button is pressed. This is how we are able to notify the .aspx page that something has happened.
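
If you don't have the ZIP handy, the shape of that code-behind is roughly the following. Treat it as a sketch under assumptions: the member names (Name, Phone, EntryAddedHandler, SimulateAddClick) are mine, and the class is a plain stand-in so it runs without ASP.NET. The real control derives from System.Web.UI.UserControl and reads its values from TextBoxes.

```csharp
using System;

// Stand-in for AddControl.ascx.cs. Plain strings replace the TextBoxes.
public class AddControlSketch
{
    private string _name = "";
    private string _phone = "";

    // Read-only properties: the page can read the form values
    // without knowing which controls hold them.
    public string Name { get { return _name; } }
    public string Phone { get { return _phone; } }

    // The delegate defines the signature every subscriber must match.
    public delegate void EntryAddedHandler(object sender, EventArgs e);

    // The event the page subscribes to.
    public event EntryAddedHandler EntryAdded;

    // Fires the event, but only if somebody is listening.
    protected virtual void OnEntryAdded(EventArgs e)
    {
        EntryAddedHandler handler = EntryAdded;
        if (handler != null)
        {
            handler(this, e);
        }
    }

    // Stand-in for the button-click handler: store the data, then fire the event.
    public void SimulateAddClick(string name, string phone)
    {
        _name = name;
        _phone = phone;
        OnEntryAdded(EventArgs.Empty);
    }
}
```

The null check in OnEntryAdded matters: raising an event that has no subscribers throws a NullReferenceException.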

/App_Code/PhoneBookDataHandler.cs - This class is what actually interacts with the Session data (the hashtable). We break logic like this out from the .ascx.cs and .aspx.cs files because these functions are universal to the application. The .aspx.cs and .ascx.cs files SHOULD NOT (not just in this app, either) contain business logic. Those files are there to control the rendering of the page. I said above that I believe these files were also intended to be “controllers” for some perversion of an MVC pattern.
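
A data handler along those lines might look something like this sketch. The method names, the "PhoneBook" key, and the dictionary standing in for Session are all assumptions so the snippet runs on its own; the real class in the ZIP talks to the actual Session object.

```csharp
using System;
using System.Collections;
using System.Collections.Generic;

// Sketch only: one place that knows how the phonebook is stored.
public static class PhoneBookDataHandlerSketch
{
    private const string SessionKey = "PhoneBook"; // hypothetical key name

    // Stand-in for Session state so the example needs no web context.
    private static readonly Dictionary<string, object> FakeSession =
        new Dictionary<string, object>();

    // Fetch the hashtable, creating it on first use.
    private static Hashtable GetBook()
    {
        object stored;
        if (!FakeSession.TryGetValue(SessionKey, out stored))
        {
            stored = new Hashtable();
            FakeSession[SessionKey] = stored;
        }
        return (Hashtable)stored;
    }

    // Adds an entry, or overwrites the number if the name already exists.
    public static void AddEntry(string name, string phone)
    {
        GetBook()[name] = phone;
    }

    public static Hashtable GetEntries()
    {
        return GetBook();
    }
}
```

Neither the page nor the controls need to know a hashtable is involved. If the storage changes later, only this class changes.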

Back to the Default.aspx(.cs) files. - You can probably now see how the Default.aspx page is just the glue that holds the whole thing together. It’s nothing more than a conductor for the UserControls. The UserControls are put on the .aspx page, we then tell the .aspx page that when AddControl.ascx’s “OnEntryAdded” event fires, run the .Refresh() method on ViewDataControl.ascx. ViewDataControl.ascx then grabs the data from Session and puts it in the GridView.
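
Stripped of ASP.NET, that glue comes down to something like the sketch below. The two stub classes and the ucViewData name are stand-ins I made up so it runs by itself, and the stub uses the built-in EventHandler delegate for brevity. In the real page the subscription would normally sit in the markup as an OnEntryAdded="ucAddControl_EntryAdded" attribute on the control's tag.

```csharp
using System;

// Minimal stand-in for the add form: it only knows how to announce an add.
public class AddControlStub
{
    public event EventHandler EntryAdded;

    public void RaiseEntryAdded()
    {
        EventHandler handler = EntryAdded;
        if (handler != null)
        {
            handler(this, EventArgs.Empty);
        }
    }
}

// Minimal stand-in for the grid control: counts how often it was refreshed.
public class ViewDataControlStub
{
    public int RefreshCount;

    public void Refresh()
    {
        RefreshCount++; // the real control would reload data and rebind the GridView
    }
}

// Stand-in for Default.aspx.cs: the conductor that wires the two together.
public class DefaultPageSketch
{
    public readonly AddControlStub ucAddControl = new AddControlStub();
    public readonly ViewDataControlStub ucViewData = new ViewDataControlStub();

    public DefaultPageSketch()
    {
        // Code equivalent of the markup attribute described above.
        ucAddControl.EntryAdded += ucAddControl_EntryAdded;
    }

    // The page's only method: when an entry is added, refresh the view.
    protected void ucAddControl_EntryAdded(object sender, EventArgs e)
    {
        ucViewData.Refresh();
    }
}
```

Neither control knows the other exists. Only the page knows about both, which is what makes the controls reusable elsewhere.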

WTF is Business Logic?
Many applications are written for a certain purpose, industry, or business. For this reason the “entities” (like a customer number) adhere universally to certain rules, and those rules have to be validated throughout the application. You do not want that type of logic repeated every time you have to do that, you just want to call a method that does it for you.
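
As a tiny illustration, here is a made-up rule pulled into one method. The class name and the "exactly eight digits" rule are invented for the example; your business will have its own.

```csharp
using System;

// Hypothetical business rule, kept in one place instead of in every code-behind.
public static class CustomerRules
{
    // Assumed rule for this example: a customer number is exactly 8 digits.
    public static bool IsValidCustomerNumber(string value)
    {
        if (string.IsNullOrEmpty(value) || value.Length != 8)
        {
            return false;
        }

        foreach (char c in value)
        {
            if (c < '0' || c > '9')
            {
                return false; // anything but an ASCII digit fails
            }
        }
        return true;
    }
}
```

Every page or control that accepts a customer number calls CustomerRules.IsValidCustomerNumber, so when the rule changes it changes in one spot.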

You will need to view the code to understand this, most likely. I recommend viewing in this order:

  • Default.aspx
    • Look at the UserControls being registered at the top of the page
    • Look at Line 16 where we tell the OnEntryAdded event to be handled by the method ucAddControl_EntryAdded
  • Default.aspx.cs
    • Look at the ucAddControl_EntryAdded method. All it does is run the .Refresh() method on the ViewData UserControl
  • ViewDataControl.ascx.cs
    • Notice the methods to get data simply ask the PhoneBookDataHandler class for the data.
    • View PhoneBookDataHandler.cs if you want to see the interaction with Session and the HashTable, but it’s not critical to understanding the UserControls.
  • AddControl.ascx
    • Just take a look at what controls are on the page, the text-boxes and buttons. The RequiredFieldValidators are irrelevant to the tutorial.
  • AddControl.ascx.cs
    • This is the most important file of the entire tutorial.
    • Lines 19 & 20
      • Public properties that allow read-only access to the TextBox’s .Text property
    • Line 26
      • The delegate that defines the signature of the event
    • Line 32
      • The Event that broadcasts when an entry is added
    • Line 43
      • VERY IMPORTANT: This method is what actually fires the event.
    • Line 56
      • Just the button-click event handler. Calls methods that add the data, then fires the event.


I hope this helps you write awesome apps!

-Mike

AwesomeUserControls.zip (7.31 kb)


.NET | Development | How To

SweetLIB is LIVE!

by 19. September 2009 21:27

I've finally uploaded SweetLIB.

You can go view the project here (http://sweetlib.codeplex.com/).

SweetLIB is a library I've had for probably 3 years but never really had the balls to publish. I've done a lot to it.

The Major Features:

  • ADO Wrapper and Factory (helps make apps db agnostic)
  • Helpers for
    • Regex validation of data (zip codes, credit cards, email addys, etc)
    • Finding controls in control trees
    • Finding nodes in Tree controls
  • Object Validation classes
  • Web page error handling with base classes

If you use it, let me know what you think. There is an included web project that serves as documentation right now, but it's not all that great.


Development

