Author Topic: Email address checking for near hits  (Read 2778 times)

Mike

  • Jackass In Charge
  • Posts: 11257
  • Karma: +168/-32
  • Ex Asshole - a better and more caring person.
Email address checking for near hits
« on: March 16, 2012, 12:06:17 PM »
I am amazed at people's inability to type their email address (or have it form filled).  We get a ton of stuff like @yaho.com, @gmal.com, @sbcglobal.com (should be .net), etc.

Anyone know of a JavaScript package that will check for near hits?  What I want to do is detect a near hit, alert the user to it, and give them the option of changing it.

Mike

Re: Email address checking for near hits
« Reply #1 on: March 16, 2012, 12:15:03 PM »
Hmmm, looks like Levenshtein distance is what I'm looking for.  Found a JS implementation at http://webreflection.blogspot.com/2009/02/levenshtein-algorithm-revisited-25.html

Think I'll have to try it and see how well it'll work.
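
For reference, a minimal sketch of the idea, assuming the textbook dynamic-programming formulation rather than the code at that link (the function name `levenshtein` is just illustrative):

```javascript
// Levenshtein distance: the minimum number of single-character
// insertions, deletions, and substitutions needed to turn a into b.
// Textbook two-row dynamic programming; not the linked implementation.
function levenshtein(a, b) {
  if (a === b) return 0;
  if (a.length === 0) return b.length;
  if (b.length === 0) return a.length;

  // prev[j] = distance between the first i-1 chars of a and the first j chars of b.
  let prev = [];
  for (let j = 0; j <= b.length; j++) prev[j] = j;

  for (let i = 1; i <= a.length; i++) {
    const curr = [i]; // turning i chars of a into "" takes i deletions
    for (let j = 1; j <= b.length; j++) {
      const cost = a[i - 1] === b[j - 1] ? 0 : 1;
      curr[j] = Math.min(
        prev[j] + 1,       // delete a[i-1]
        curr[j - 1] + 1,   // insert b[j-1]
        prev[j - 1] + cost // substitute (free if the chars match)
      );
    }
    prev = curr;
  }
  return prev[b.length];
}

console.log(levenshtein("gmai.com", "gmail.com"));         // 1
console.log(levenshtein("sbcglobal.com", "sbcglobal.net")); // 3
```

It compares every prefix of one string against every prefix of the other, so it is fine for short strings like domains but not something to run against huge inputs.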

Mike

Re: Email address checking for near hits
« Reply #2 on: March 16, 2012, 02:01:33 PM »
Hmmm, this might be difficult.  I need to figure out how close is "close".

Results of a test looking at just the domains:

gmai.com vs gmail.com: 1
gmai.com vs sbcglobal.net: 10
gmai.com vs yahoo.com: 5
sbcglobal.com vs gmail.com: 7
sbcglobal.com vs sbcglobal.net: 3
sbcglobal.com vs yahoo.com: 8
yaho.com vs gmail.com: 4
yaho.com vs sbcglobal.net: 11
yaho.com vs yahoo.com: 1
gmail.com vs gmail.com: 0
gmail.com vs sbcglobal.net: 10
gmail.com vs yahoo.com: 5
mikemill.org vs gmail.com: 9
mikemill.org vs sbcglobal.net: 11
mikemill.org vs yahoo.com: 11

Right now I'm thinking that a distance between 1 and 3 counts as close.  Maybe for ones like the 5s I can do a bit more, like separating the second-level domain (2LD) from the TLD and seeing how those compare.
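
A rough sketch of that split, assuming hypothetical helpers `splitDomain` and `compareDomains` and a plain split at the last dot (so something like yahoo.co.uk would need extra handling):

```javascript
// Compact Levenshtein distance (two-row dynamic programming).
function levenshtein(a, b) {
  let prev = Array.from({ length: b.length + 1 }, (_, j) => j);
  for (let i = 1; i <= a.length; i++) {
    const curr = [i];
    for (let j = 1; j <= b.length; j++) {
      const cost = a[i - 1] === b[j - 1] ? 0 : 1;
      curr[j] = Math.min(prev[j] + 1, curr[j - 1] + 1, prev[j - 1] + cost);
    }
    prev = curr;
  }
  return prev[b.length];
}

// Hypothetical helper: split "sbcglobal.com" into
// { sld: "sbcglobal", tld: "com" } at the last dot.
function splitDomain(domain) {
  const dot = domain.lastIndexOf(".");
  if (dot === -1) return { sld: domain, tld: "" };
  return { sld: domain.slice(0, dot), tld: domain.slice(dot + 1) };
}

// Hypothetical helper: distance for the whole domain plus each part.
function compareDomains(input, known) {
  const a = splitDomain(input.toLowerCase());
  const b = splitDomain(known.toLowerCase());
  return {
    whole: levenshtein(input.toLowerCase(), known.toLowerCase()),
    sld: levenshtein(a.sld, b.sld),
    tld: levenshtein(a.tld, b.tld),
  };
}

console.log(compareDomains("sbcglobal.com", "sbcglobal.net")); // { whole: 3, sld: 0, tld: 3 }
console.log(compareDomains("gmai.com", "yahoo.com"));          // { whole: 5, sld: 5, tld: 0 }
```

Split this way, the 3 for sbcglobal.com is entirely in the TLD (the name itself matches exactly), while the 5 for gmai.com vs yahoo.com is entirely in the name, which suggests the two cases could be scored differently.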

charlie

  • Jackass In Charge
  • Posts: 7903
  • Karma: +84/-53
Re: Email address checking for near hits
« Reply #3 on: March 16, 2012, 02:06:40 PM »
You mean that script just compares how many characters are different? (I didn't click the link.)

I would think you could have separate rules specific to web addresses that would make it more accurate and helpful.

Mike

Re: Email address checking for near hits
« Reply #4 on: March 16, 2012, 02:09:03 PM »
From what I understand about Levenshtein, it gives the number of "edits" needed.  If you know a better method I'm ALL ears.  What I really don't want to do is create a list of known mistakes and check whether the input matches one of them.  I'd rather try to identify mistakes that haven't been found yet.

charlie

Re: Email address checking for near hits
« Reply #5 on: March 16, 2012, 02:22:58 PM »
Well, I think there are some obvious things to check for. For example, if xxxx.com was input but is not a valid domain, but xxxx.net is, then that really shouldn't be a 3 since that's an obvious thing to warn the user about.

Perhaps just doing between 1 and 3 is close enough to avoid all the extra work, though. You're just warning them and they can continue with their input, right?

Mike

Re: Email address checking for near hits
« Reply #6 on: March 16, 2012, 05:15:21 PM »
We already check for syntax and a valid domain with either an A or MX record.  So at this point there is nothing indicating that the address is wrong.

Yeah, this would be advisory only.

Steve

  • This 49%er supports Romney
  • Just a Jackass
  • *
  • Posts: 16120
  • Karma: +31/-410
  • Mr. Mom
Re: Email address checking for near hits
« Reply #7 on: March 16, 2012, 06:37:41 PM »
Why not keep it simple? For example, make a short list of the common domains. Compare the input against this list from the @ on. If there isn't an exact match, have the script compare it to the domains in the list and kick back a result when it is similar to one found in the list.

Then just say "hey fuck nut, did you mean gmail.com?"

I mean it's their own fuckup, why kill yourself trying to hold their hands?
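
A minimal sketch of that short-list check, assuming a hypothetical `suggestDomain` helper, a three-entry list taken from the domains mentioned in this thread, and the 1-to-3 Levenshtein threshold floated earlier:

```javascript
// Compact Levenshtein distance (two-row dynamic programming).
function levenshtein(a, b) {
  let prev = Array.from({ length: b.length + 1 }, (_, j) => j);
  for (let i = 1; i <= a.length; i++) {
    const curr = [i];
    for (let j = 1; j <= b.length; j++) {
      const cost = a[i - 1] === b[j - 1] ? 0 : 1;
      curr[j] = Math.min(prev[j] + 1, curr[j - 1] + 1, prev[j - 1] + cost);
    }
    prev = curr;
  }
  return prev[b.length];
}

// Illustrative list only; a real one would be longer.
const KNOWN_DOMAINS = ["gmail.com", "yahoo.com", "sbcglobal.net"];

// Hypothetical helper: returns the closest known domain if the typed
// domain is a near hit (distance 1..maxDistance), otherwise null.
function suggestDomain(email, known = KNOWN_DOMAINS, maxDistance = 3) {
  const at = email.lastIndexOf("@");
  if (at === -1) return null; // not an email address at all
  const domain = email.slice(at + 1).trim().toLowerCase();

  let best = null;
  let bestDistance = Infinity;
  for (const candidate of known) {
    const d = levenshtein(domain, candidate);
    if (d === 0) return null; // exact match, nothing to suggest
    if (d < bestDistance) {
      best = candidate;
      bestDistance = d;
    }
  }
  return bestDistance <= maxDistance ? best : null;
}

console.log(suggestDomain("bob@yaho.com"));      // "yahoo.com"
console.log(suggestDomain("bob@sbcglobal.com")); // "sbcglobal.net"
console.log(suggestDomain("bob@mikemill.org"));  // null
```

A non-null result would only drive an advisory "did you mean ...?" prompt; the user can still keep what they typed.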

Mike

Re: Email address checking for near hits
« Reply #8 on: March 16, 2012, 07:23:54 PM »
Cause the simple version requires way too much oversight.

As to why:  Because it affects my users and potential users, which affects the school's income.  The two most common places I've seen problems are when people are requesting information and when they are applying to the school.  If the email fails and they don't get the information, they might get pissed/annoyed and not finish, which means lost revenue.

Now we've already got people monitoring the bounces, so there are followups, but those don't come for 3 days on average.

If I can spend a day and come up with a solution that addresses the issue before it becomes an issue, that means fewer user annoyances and fewer man-hours spent on errors, which means an overall benefit to the organization.

webwhy

  • Jackass IV
  • Posts: 608
  • Karma: +15/-10
Re: Email address checking for near hits
« Reply #9 on: March 21, 2012, 12:38:51 AM »
Mailcheck.js was just released.  I haven't looked at it yet.

Mike

Re: Email address checking for near hits
« Reply #10 on: March 21, 2012, 01:04:03 AM »
https://github.com/Kicksend/mailcheck

Thanks.  Looks exactly like what I'm looking for.